SciPy 2014: an extremely useful conference with a diversity problem

I just got back home from the SciPy 2014 conference in Austin, TX. Here are my thoughts after this year’s conference.

About SciPy in general

Since my first SciPy in 2012, I’ve decided to prioritise programming conferences over scientific ones, because the value I get for my time is just that much higher. At a scientific conference, I certainly become aware of way-cool stuff going on in other research labs in my area. Once I get home, however, I go back to whatever I was doing. In contrast, at programming conferences, I become aware of new tools and practices that change the way I do science. In his keynote this year, Greg Wilson said of Software Carpentry, “We save researchers a day a week for the rest of their careers.” I feel the same way about SciPy in general.

In the 2012 sprints, I learned about GitHub Pull Requests and code review, the lingua franca of open source development today. I can’t express how useful that’s been. I also started my ongoing collaboration with the scikit-image project, which has enabled my research to reach far more users than I ever could have achieved on my own.

No scientific conference I’ve been to has had such an impact on my science output, nor can I imagine one doing so.

This year’s highlights

This year was no different. Without further ado, here are my top hits from this year’s conference:

  • Michael Droettboom talked about his continuous benchmarking project, Airspeed Velocity. It is hilariously named and incredibly useful. asv checks out successive commits from your Git repo, runs benchmarks (which you define) on each, and plots your code’s performance over time. It’s an incredible guard against feature creep slowing down your code base. (See the benchmark sketch just after this list.)
  • IPython recently unveiled their modal version 2.0 interface, sending vimmers worldwide into fits of rejoicing. But a few key bindings are just wrong from a vim perspective. Most egregiously, i, which should enter edit mode, interrupts the kernel when pressed twice! That’s just evil. Paul Ivanov goes all in on vim keybindings in his hilarious and life-changing IPython vimception talk. The title is more appropriate than I realised. Must-watch.
  • Damián Avila revealed (heh) his live IPython presentations with Reveal.js, forever changing how Python programmers present their work. I was actually aware of this before the conference and used it in my own talk, but you should definitely watch his talk and get the extension from his repo.
  • Min RK gave an excellent tutorial on IPython parallel (repo, videos 1, 2, 3). It’s remarkable what the IPython team have achieved thanks to their decoupling of the interactive shell and the computation “kernel”. (I still hate that word.)
  • Brian Granger and Jon Frederic gave an excellent tutorial on IPython interactive widgets (notebook here). They provide a simple and fast way to interactively explore your data. I’ve already started using these on my own problems.
  • Aaron Meurer gave the best introduction to the Python packaging problem that I’ve ever seen, and how Continuum’s conda project solves it. I think we still need an in-depth tutorial on how package distributors should use conda, but for users, conda is just awesome, and this talk explains why.
  • Matt Rocklin has a gift for crystal clear speaking, despite his incredible speed, and it was on full display in his (and Mark Wiebe’s) talk on Blaze, Continuum’s new array abstraction library. I’m not sure I’ll be using Blaze immediately but it’s certainly on my radar now!
  • Lightning talks are always fun: days 1, 2, 3. Watch out for Fernando Pérez’s announcement of Project Jupyter, the evolution of the IPython notebook, and for Damon McDougall’s riveting history of waffles. (You think I’m joking.)
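
To give a flavour of asv, here’s a minimal sketch of what a benchmark file might look like, following the conventions in asv’s documentation (the suite class, array size, and operation here are made up for illustration):

# benchmarks/benchmarks.py -- asv times any function or method named time_*
import numpy as np

class MedianSuite:
    def setup(self):
        # setup() runs before each timed call and is excluded from the timing
        self.data = np.random.rand(1000, 1000)

    def time_median(self):
        np.median(self.data)

Running asv run over a range of commits, followed by asv publish, then generates the performance-over-time plots.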

Apologies if I’ve missed anyone: with three tracks, an added unofficial one for the World Cup matches ;), and my own talk preparations, “overwhelming” does not begin to describe the conference! I will second Aaron Meurer’s assertion that there were no bad talks. Which brings us to…

On my to-watch

Jake Vanderplas recently wrote a series of blog posts (last one here, with links to earlier posts) comparing frequentist and Bayesian approaches to statistical analysis, in which he makes a compelling argument that we should all relegate frequentism to the dustbin of history. As such, I intend to go over Chris Fonnesbeck’s tutorial (2, 3) and talk about Bayesian analysis in Python using PyMC.

David Sanders also did a Julia tutorial (part 2) that was scheduled at the same time as my own scikit-image tutorial, but I’m interested to see how the Julia ecosystem is progressing, so that should be a good place to start. (Although I’m still bitter that they went with 1-based indexing!)

The reproducible science tutorial (part 2) generated quite a bit of buzz so I would be interested to go over that one as well.

For those interested in computing education or in geoscience, the conference had dedicated tracks for each of those, so you are bound to find something you like (not least, Lorena Barba’s and Greg Wilson’s keynotes). Have a look at the full listing of videos here. These might be easier to navigate by looking at the conference schedule.

The SciPy experience

I want to close this post with a few reflections on the conference itself.

SciPy is broken up into three “stages”: two days of tutorials, three days of talks, and two days of sprints. Above, I covered the tutorials and talks, but to me, the sprints are what truly distinguish SciPy. The spirit of collaboration is unparalleled, and an astonishing amount of value is generated in those two days, either in the shape of code, or in introducing newcomers to new projects and new ways to collaborate in programming.

My biggest regret of the conference was not giving a lightning talk urging people to come to the sprints. I repeatedly asked people whether they were coming to the sprints, and almost invariably the answer was that they didn’t feel they were good enough to contribute. To reiterate my previous statements: (1) when I attended my first sprint in 2012, I had never done a pull request; (2) sprints are an excellent way to introduce newcomers to projects and to the pull-request development model. All the buzz around the sprints was about how welcoming all of the teams were, but I think there is a massive number of missed opportunities because this is not made obvious to attendees before the sprints.

Lastly, a few notes on diversity. During the conference, April Wright, a student in evolutionary biology at UT Austin, wrote a heartbreaking blog post about how excluded she felt from a conference where only 15% of attendees were women. That particular incident was joyfully resolved, with plenty of SciPyers reaching out to April and inviting her along to sprints and other events. But it highlighted just how poorly we are doing in terms of diversity. Andy Terrel, one of the conference organisers, pointed out that 15% is much better than 2012’s three (women, not percent!), but (a) that is still extremely low, and (b) I was horrified to read this because I was there in 2012… and I did not notice that anything was wrong. How could it, in 2012, seem normal to be at a professional conference with effectively zero women around? It doesn’t matter what one says about the background percentage of women in our industry and so on… Maybe SciPy is doing all it can about diversity. (Though I doubt it.) The point is that a scene like that should feel like one of those deserted cityscapes in post-apocalyptic movies. As long as it doesn’t, as long as SciPy feels normal, we will continue to have diversity problems. I hope my fellow SciPyers look at these numbers, feel as appalled as I have, and try to improve.

… And on cue, while I was writing this post, Andy Terrel wrote a great post of his own about this very topic:

http://andy.terrel.us/blog/2014/07/17/

I still consider SciPy a fantastic conference. Jonathan Eisen (@phylogenomics), whom I admire, would undoubtedly boycott it because of the problems outlined above, but I am heartened that the organising committee is taking this as a serious problem and trying hard to fix it. I hope next time is even better.

A clever use of SciPy’s ndimage.generic_filter for n-dimensional image processing

This year I am privileged to be a mentor in the Google Summer of Code for the scikit-image project, as part of the Python Software Foundation organisation. Our student, Vighnesh Birodkar, recently came up with a clever use of SciPy’s ndimage.generic_filter that is certainly worth sharing widely.

Vighnesh is tasked with implementing region adjacency graphs and graph-based methods for image segmentation. He initially wrote specific functions for 2D and 3D images, and I suggested that he should merge them: either with n-dimensional code, or, at the very least, by making 2D a special case of 3D. He chose the former and produced extremely elegant code: three nested for loops and a large number of neighbour computations were replaced by a function call and a simple loop. Read on to find out how.

Iterating over an array of unknown dimension is not trivial a priori, but thankfully, someone else has already solved that problem: NumPy’s nditer and ndindex allow one to iterate efficiently through every point of an n-dimensional array. However, that still leaves the problem of finding neighbours, needed to determine which regions are adjacent to each other. Again, this is not trivial to do in nD.
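
As a quick illustration (a throwaway sketch; the array is made up), np.ndindex yields every coordinate tuple for a given shape, whatever the number of dimensions:

import numpy as np

arr = np.arange(8).reshape(2, 2, 2)  # the same loop works for any dimensionality
for idx in np.ndindex(arr.shape):
    print(idx)  # (0, 0, 0), (0, 0, 1), ..., (1, 1, 1)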

scipy.ndimage provides a suitable function, generic_filter. Typically, a filter is used to iterate a “selector” (called a structuring element) over an array, compute some function of all the values covered by the structuring element, and replace the central value by the output of the function. For example, using the structuring element:


import numpy as np

fp = np.array([[0, 1, 0],
               [1, 1, 1],
               [0, 1, 0]], np.uint8)

and the function np.median on a 2D image produces a median filter over a pixel’s immediate neighbors. That is,


import functools
from scipy.ndimage import generic_filter

median_filter = functools.partial(generic_filter,
                                  function=np.median,
                                  footprint=fp)
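
For example (a quick sketch on a made-up image, with fp and the imports as above):

image = np.random.rand(16, 16)
smoothed = median_filter(image)  # same shape as image; each pixel becomes the median over the footprint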

Here, we don’t want to create an output array, but an output graph. What to do? As it turns out, Python’s pass-by-reference allowed Vighnesh to do this quite easily using the “extra_arguments” keyword to generic_filter: we can write a filter function that receives the graph and updates it when two distinct values are adjacent! generic_filter passes all values covered by a structuring element as a flat array, in the array order of the structuring element. So Vighnesh wrote the following function:


def _add_edge_filter(values, g):
    """Add an edge between first element in `values` and
    all other elements of `values` in the graph `g`.
    `values[0]` is expected to be the central value of
    the footprint used.

    Parameters
    ----------
    values : array
        The array to process.
    g : RAG
        The graph to add edges in.

    Returns
    -------
    0.0 : float
        Always returns 0.

    """
    values = values.astype(int)
    current = values[0]
    for value in values[1:]:
        g.add_edge(current, value)
    return 0.0

Then, using the footprint:


fp = np.array([[0, 0, 0],
               [0, 1, 1],
               [0, 1, 0]], np.uint8)

(or its n-dimensional analog), this filter is called as follows on labels, the image containing the region labels:


from scipy.ndimage import filters

filters.generic_filter(labels,
                       function=_add_edge_filter,
                       footprint=fp,
                       mode='nearest',
                       extra_arguments=(g,))

This is a rather unconventional use of generic_filter, which is normally used for its output array. Note how the return value of the filter function, _add_edge_filter, is just 0! In our case, the output array contains all 0s, but we use the filter exclusively for its side-effect: adding an edge to the graph g when there is more than one unique value in the footprint. That’s cool.

Continuing, in this first RAG implementation, Vighnesh wanted to segment according to average color, so he further needed to iterate over each pixel/voxel/hypervoxel and keep a running total of the color and the pixel count. He used elements of the graph node dictionaries for this, updating them while iterating over the image with ndindex:


for index in np.ndindex(labels.shape):
    current = labels[index]
    g.node[current]['pixel count'] += 1
    g.node[current]['total color'] += image[index]
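
With those totals in hand, the mean color of each region is one more loop away (a sketch, assuming g is a networkx-style graph, as in the pull request):

for n in g:
    g.node[n]['mean color'] = (g.node[n]['total color'] /
                               g.node[n]['pixel count'])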

Thus, together, NumPy’s nditer and ndindex and scipy.ndimage’s generic_filter provide a powerful way to perform a large variety of operations on n-dimensional arrays… a much larger variety than I’d realised!

You can see Vighnesh’s complete pull request here and follow his blog here.

If you use NumPy arrays and their massive bag of tricks, please cite the paper below!

Stéfan van der Walt, S. Chris Colbert, & Gaël Varoquaux (2011). The NumPy array: a structure for efficient numerical computation. Computing in Science and Engineering, 13(2), 22-30. arXiv: 1102.1523v1

Elsevier et al’s pricing douchebaggery exposed

Ted Bergstrom and a few colleagues have just come out with an epic paper in which they reveal how much for-profit academic publishing companies charge university libraries, numbers that had previously been kept secret. The paper is ostensibly in the field of economics, but it could be more accurately described as “sticking-it-to-the-man-ology”.

This paragraph in the footnotes was particularly concise and fun to read:

Elsevier contested our contract request from Washington State University on the grounds that their pricing policy was a trade secret, and brought suit against the university. The Superior Court judge ruled that Washington State University could release the contracts to us. Elsevier and Springer also contested our request for contracts from the University of Texas (UT) System. The Texas state attorney general opined that the UT System was required to release copies of all of these contracts.

In other words: in the interest of full disclosure, suck it, Elsevier!

The executive summary is that these companies will extort as much as they possibly can from individual universities, then do everything to keep that number a secret so that they can more freely extort even more from other universities. You can read more about it in Science.

Oh, and in the ultimate twist of irony, the paper itself is behind PNAS’s paywall. How much did your university pay for that?

Bergstrom, T., Courant, P., McAfee, R., & Williams, M. (2014). Evaluating big deal journal bundles. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1403006111

An update on mixing Java and Python with Fiji

Two weeks ago I posted about invoking ImageJ functions from Python using Fiji’s Jython interpreter. A couple of updates on the topic:

First, I’ve made a repository with a template project encapsulating my tips from that post. It’s very simple to get a Fiji Jython script working from that template. As an example, here’s a script to evaluate segmentations using the metric used by the SNEMI3D segmentation challenge (a slightly modified version of the adapted Rand error).

Second, this entire discussion might be rendered obsolete by two incredible projects from the CellProfiler team: Python-Javabridge, which allows Python to interact seamlessly with Java code, and Python-Bioformats, which uses Python-Javabridge to read Bioformats images into Python. I have yet to play with them, but both look like cleaner alternatives to interact with ImageJ than my Jython scripting! At some point I’ll write a post exploring these tools, but if you get to it before me, please mention it in the comments!

Get the best of both worlds with Fiji’s Jython interpreter

Fiji is just ImageJ, with batteries included. It contains plugins to do virtually anything you would want to do to an image. Since my go-to programming language is Python, my favorite feature of Fiji is its language-agnostic API, which supports a plethora of languages, including Java, JavaScript, Clojure, and of course Python; seven languages in all. (Find these under Plugins/Scripting/Script Editor.) Read on to learn more about the ins and outs of using Python to drive Fiji.

Among the plugin smorgasbord of Fiji is the Bio-Formats importer, which can open any proprietary microscopy file under the sun. (And there are a lot of them!) Below I will use Jython to open some .lifs, do some processing, and output some .pngs that I can process further using Python/NumPy/scikit-image. (A .lif is a Leica Image File, because there were not enough image file formats before Leica came along.)

The first thing to note is that Jython is not Python, and it is certainly not Python 2.7. In fact, the Fiji Jython interpreter implements Python 2.5, which means no argparse. Not to worry though, as argparse is implemented in a single, pure Python file distributed under the Python license. So:

Tip #1: copy argparse.py into your project.

This way you’ll have access to the state of the art in command-line argument processing from within the Jython interpreter.

To get Fiji to run your code, you simply feed it your source file on the command line. So, let’s try it out with a simple example, echo.py:

import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=
                                     "Parrot back your arguments.")
    parser.add_argument('args', nargs="*", help="The input arguments.")
    args = parser.parse_args()
    for arg in args.args:
        print(arg)

Now we can just run this:

$ fiji echo.py hello world
hello
world

But sadly, Fiji captures any -h calls, which defeats the purpose of using argparse in the first place!

$ fiji echo.py -h
Usage: /Applications/Fiji.app/Contents/MacOS/fiji-macosx [<Java options>.. --] [<ImageJ options>..] [<files>..]

Java options are passed to the Java Runtime, ImageJ
options to ImageJ (or Jython, JRuby, ...).

In addition, the following options are supported by ImageJ:
General options:
--help, -h
	show this help
--dry-run
	show the command line, but do not run anything
--debug
	verbose output

(… and so on, the output is quite huge.)

(Note also that I aliased the Fiji binary, that long path under /Applications, to a simple fiji command; I recommend you do the same.)

However, we can work around this by calling help using Python as the interpreter, and only using Fiji to actually run the file:

$ python echo.py -h
usage: echo.py [-h] [args [args ...]]

Parrot back your arguments.

positional arguments:
  args        The input arguments.

optional arguments:
  -h, --help  show this help message and exit

That’s more like it! Now we can start to build something a bit more interesting, for example, something that converts arbitrary image files to png:

import argparse
from ij import IJ # the IJ class has utility methods for many common tasks.

def convert_file(fn):
    """Convert the input file to png format.

    Parameters
    ----------
    fn : string
        The filename of the image to be converted.
    """
    imp = IJ.openImage(fn)
    # imp is the common name for an ImagePlus object,
    # ImageJ's base image class
    fnout = fn.rsplit('.', 1)[0] + '.png'
    IJ.saveAs(imp, 'png', fnout)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Convert TIFF to PNG.")
    parser.add_argument('images', nargs='+', help="Input images.")

    args = parser.parse_args()
    for fn in args.images:
        convert_file(fn)

Boom, we’re done. But wait: we’ve actually broken Python interpreter compatibility, since ij is not a Python library!

$ python convert2png.py -h
Traceback (most recent call last):
  File "convert.py", line 2, in <module>
    from ij import IJ # the IJ class has utility methods for many common tasks.
ImportError: No module named ij

Which brings us to:

Tip #2: only import Java API functions within the functions that use them.

By moving the from ij import IJ statement into the convert_file function, we maintain compatibility with Python, and can continue to use argparse’s helpful documentation strings.
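
For reference, here’s the full converter rearranged along those lines; this is the same code as above, with only the import moved:

import argparse

def convert_file(fn):
    """Convert the input file to png format."""
    from ij import IJ  # Java import now hidden inside the function
    imp = IJ.openImage(fn)
    fnout = fn.rsplit('.', 1)[0] + '.png'
    IJ.saveAs(imp, 'png', fnout)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Convert TIFF to PNG.")
    parser.add_argument('images', nargs='+', help="Input images.")
    args = parser.parse_args()
    for fn in args.images:
        convert_file(fn)

Now python convert2png.py -h prints the argparse help instead of a traceback.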

Next, we want to use the Bio-Formats importer, which is class BF in loci.plugins. Figuring out the class hierarchy for arbitrary plugins is tricky, but you can find it here for core ImageJ (using lovely 1990s-style frames) and here for Bio-Formats, and Curtis Rueden has made this list for other common plugins.

When you try to open a file with Bio-Formats importer using the Fiji GUI, you get the following dialog:

[Image: the Bio-Formats import window]

That’s a lot of options, and we actually want to set some of them. If you look at the BF.openImagePlus documentation, you can see that this is done through an ImporterOptions class located in loci.plugins.in. You’ll notice that “in” is a reserved word in Python, so from loci.plugins.in import ImporterOptions is not a valid Python statement. Yay! My workaround:

Tip #3: move your Fiji imports to an external file.

So I have a jython_imports.py file with just:

from ij import IJ
from loci.plugins import BF
from loci.plugins.in import ImporterOptions

Then, inside the convert_file() function, we just do:

from jython_imports import IJ, BF, ImporterOptions

This way, the main file remains Python-compatible until the convert_file() function is actually called, regardless of whatever funky and unpythonic stuff is happening in jython_imports.py.

On to the options. If you untick “Open files individually”, it will open up all matching files in a directory, regardless of your input filename! Not good. So now we play a pattern-matching game in which we match the option description in the above dialog with the ImporterOptions API calls. In this case, we setUngroupFiles(True). To specify a filename, we setId(filename). Additionally, because we want all of the images in the .lif file, we setOpenAllSeries(True).

Next, each image in the series is 3D and has three channels, but we are only interested in a summed z-projection of the first channel. There’s a set of ImporterOptions methods tantalizingly named setCBegin, setCEnd, and setCStep, but this is where I found the documentation sorely lacking. The functions take (int s, int value) as arguments, but what’s s??? Are the limits closed or open? Code review is a wonderful thing, and this would not have passed it. To figure things out:

Tip #4: use Fiji’s interactive Jython interpreter to figure things out quickly.

You can find the Jython interpreter under Plugins/Scripting/Jython Interpreter. It’s no IPython, but it is extremely helpful to answer the questions I had above. My hypothesis was that s was the series, and that the intervals would be closed. So:

>>> from loci.plugins import BF
>>> from loci.plugins.in import ImporterOptions
>>> opts = ImporterOptions()
>>> opts.setId("myFile.lif")
>>> opts.setOpenAllSeries(True)
>>> opts.setUngroupFiles(True)
>>> imps = BF.openImagePlus(opts)

Now we can play around, with one slight annoyance: the interpreter won’t echo the value of your last statement, so you have to print it explicitly:

>>> len(imps)
>>> print(len(imps))
18

Which is what I expected, as there are 18 series in my .lif file. The image shape is given by the getDimensions() method of the ImagePlus class:

>>> print(imps[0].getDimensions())
array('i', [1024, 1024, 3, 31, 1])

>>> print(imps[1].getDimensions())
array('i', [1024, 1024, 3, 34, 1])

That’s (x, y, channels, z, time).

Now, let’s try the same thing with setCEnd, assuming closed interval:

>>> opts.setCEnd(0, 0) ## only read channels up to 0 for series 0?
>>> opts.setCEnd(2, 0) ## only read channels up to 0 for series 2?
>>> imps = BF.openImagePlus(opts)
>>> print(imps[0].getDimensions())
array('i', [1024, 1024, 1, 31, 1])

>>> print(imps[1].getDimensions())
array('i', [1024, 1024, 3, 34, 1])

>>> print(imps[2].getDimensions())
array('i', [1024, 1024, 1, 30, 1])

Nothing there to disprove my hypothesis! So we move on to the final step, which is to z-project the stack by summing the intensity over all z values. This is normally accessed via Image/Stacks/Z Project in the Fiji GUI, and I found the corresponding ij.plugin.ZProjector class by searching for “proj” in the ImageJ documentation. A ZProjector object has a setMethod method that usefully takes an int as an argument, with no explanation in its docstring as to which int translates to which method (sum, average, max, etc.). A little more digging in the source code reveals some class static variables, AVG_METHOD, MAX_METHOD, and so on.

Tip #5: don’t be afraid to look at the source code. It’s one of the main advantages of working in open-source.

So:

>>> from ij.plugin import ZProjector
>>> proj = ZProjector()
>>> proj.setMethod(ZProjector.SUM_METHOD)
>>> proj.setImage(imps[0])
>>> proj.doProjection()
>>> impout = proj.getProjection()
>>> print(impout.getDimensions())
array('i', [1024, 1024, 1, 1, 1])

The output is actually a float-typed image, which will get rescaled to [0, 255] uint8 on save if we don’t fix it. So, to wrap up, we convert the image to 16 bits (making sure to turn off scaling), use the series title to generate a unique filename, and save as a PNG:

>>> from ij.process import ImageConverter
>>> ImageConverter.setDoScaling(False)
>>> conv = ImageConverter(impout)
>>> conv.convertToGray16()
>>> title = imps[0].getTitle().rsplit(" ", 1)[-1]
>>> IJ.saveAs(impout, 'png', "myFile-" + title + ".png")

You can see the final result of my sleuthing in lif2png.py and jython_imports.py. If you would do something differently, pull requests are always welcome.

Before I sign off, let me recap my tips:

1. copy argparse.py into your project;

2. only import Java API functions within the functions that use them;

3. move your Fiji imports to an external file;

4. use Fiji’s interactive Jython interpreter to figure things out quickly; and

5. don’t be afraid to look at the source code.

And let me add a few final comments: once I started digging into all of Fiji’s plugins, I found documentation of wildly variable quality and, worse, virtually zero consistency between the interfaces of the various plugins. Some work on “the currently active image”, some take an ImagePlus instance as input, and others still a filename or a directory name. Outputs are equally variable. This has been a huge pain when trying to work with these plugins.

But, on the flip side, this is the most complete collection of image processing functions anywhere. Along with the seamless access to all those functions from Jython and other languages, that makes Fiji very worthy of your attention.

Acknowledgements

This post was possible thanks to the help of Albert Cardona, Johannes Schindelin, Wayne Rasband, and Jan Eglinger, who restlessly respond to (it seems) every query on the ImageJ mailing list. Thanks!

References

Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, Tinevez JY, White DJ, Hartenstein V, Eliceiri K, Tomancak P, & Cardona A (2012). Fiji: an open-source platform for biological-image analysis. Nature methods, 9 (7), 676-82 PMID: 22743772

Linkert M, Rueden CT, Allan C, Burel JM, Moore W, Patterson A, Loranger B, Moore J, Neves C, Macdonald D, Tarkowska A, Sticco C, Hill E, Rossner M, Eliceiri KW, & Swedlow JR (2010). Metadata matters: access to image data in the real world. The Journal of cell biology, 189 (5), 777-82 PMID: 20513764

Best practices addendum: find and follow the conventions of your programming community

The bioinformatics community is all atwitter about the recent PLOS Biology article, Best Practices for Scientific Computing. Its main points should be obvious to most quasi-experienced programmers, but I can certainly remember a time when they did not seem so obvious to me (last week I think). As such, it’s a valuable addition to the written record on scientific computing. One of their code snippets, however, is pretty annoying:

def scan(op, values, seed=None):
    # Apply a binary operator cumulatively to the values given
    # from lowest to highest, returning a list of results.
    # For example, if "op" is "add" and "values" is "[1,3,5]",
    # the result is "[1, 4, 9]" (i.e., the running total of the
    # given values). The result always has the same length as
    # the input.
    # If "seed" is given, the result is initialized with that
    # value instead of with the first item in "values", and
    # the final item is omitted from the result.
    # Ex : scan(add, [1, 3, 5] , seed=10)
    # produces [10, 11, 14]
    ...implementation...

First, this code ignores the article’s own advice, (1b) make names consistent, distinctive, and meaningful. I would argue that “scan” here is neither distinctive (many other operations could be called “scan”) nor meaningful (the function’s purpose is not at all clear from the name). My suggestion would be “cumulative_reduce”.

It also does not address another important piece of advice that I would add to their list, maybe as (1d): Find out, and follow, the conventions of the programming community you’re joining. This will allow others to use and assess your code more readily, and you to contribute to other code libraries more easily. Here, although they have made efforts to make their advice language-agnostic, the authors have chosen Python to illustrate their point. Python happens to have strong style and documentation prescriptions in the form of Python Enhancement Proposals PEP 8 (Style Guide for Python Code) and PEP 257 (Docstring Conventions). Following PEP 8 and PEP 257, the above comments become an actual docstring (which is attached to the function automatically by documentation-generating tools):

def cumulative_reduce(op, values, seed=None):
    """Apply a binary operator cumulatively to the values given.

    The operator is applied from left to right.

    For example, if "op" is "add" and "values" is "[1,3,5]",
    the result is "[1, 4, 9]" (i.e., the running total of the
    given values). The result always has the same length as
    the input.

    If "seed" is given, the result is initialized with that
    value instead of with the first item in "values", and
    the final item is omitted from the result.
    Ex: cumulative_reduce(add, [1, 3, 5], seed=10)
    produces [10, 11, 14]
    """
    ...implementation...

In addition, the Scientific Python community in particular has adopted a few docstring conventions of their own, including the NumPy docstring conventions, which divide the docstring into meaningful sections using ReStructured Text, and the doctest convention to format examples, so the documentation acts as unit tests. So, to further refine their example code:

def cumulative_reduce(op, values, seed=None):
    """Apply a binary operator cumulatively to the values given.

    The operator is applied from left to right.

    Parameters
    ----------
    op : binary function
        An operator taking as input two values of the type contained in
        `values` and returning a value of the same type.
    values : list
        The list of input values.
    seed : type contained in `values`, optional
        A seed to start the reduce operation.

    Returns
    -------
    reduced : list, same type as `values`
        The accumulated list.

    Examples
    --------
    >>> add = lambda x, y: x + y
    >>> cumulative_reduce(add, [1, 3, 5])
    [1, 4, 9]

    If "seed" is given, the result is initialized with that
    value instead of with the first item in "values", and
    the final item is omitted from the result.

    >>> cumulative_reduce(add, [1, 3, 5], seed=10)
    [10, 11, 14]
    """
    ...implementation...
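
For completeness, here’s one possible implementation consistent with that docstring (my own sketch; the paper elides the implementation):

def cumulative_reduce(op, values, seed=None):
    if seed is None:
        result, rest = [values[0]], values[1:]
    else:
        result, rest = [seed], values[:-1]
    for value in rest:
        result.append(op(result[-1], value))
    return result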

Obviously, these conventions are specific to scientific Python. But the key is that other communities will have their own, and you should find out what those conventions are and adopt them. When in Rome, do as the Romans do. It’s actually taken me quite a few years of scientific programming to realise this (and internalise it). I hope this post will help someone get up to speed more quickly than I have.

(Incidentally, the WordPress/Chrome/OSX spell checker doesn’t bat an eye at “atwitter”. That’s awesome.)

Reference

Greg Wilson, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, Kathryn D. Huff, Ian M. Mitchell, Mark D. Plumbley, Ben Waugh, Ethan P. White, & Paul Wilson (2014). Best Practices for Scientific Computing. PLoS Biology, 12(1). DOI: 10.1371/journal.pbio.1001745

Our environmental future

Another link post, to a worthwhile article by Veronique Greenwood for Aeon (emphases mine):

For much of the thousands of years of human existence, our species has treated the world more or less as an open system. [...] the general faith was that there were, say, more whales somewhere [...] more trees somewhere [...]. Even today, in the face of imminent climate change, we continue to function as though there’s more atmosphere somewhere, ready to whisk off our waste to someplace else. It is time, though, to think of the world as a closed system. When you look at the resources involved in maintaining even a single member of a developed society, it’s hard to avoid the knowledge that this cannot continue. Last year, Tim De Chant, an American journalist who runs the blog Per Square Mile, made striking depictions of the space required if everyone in the world lived like the inhabitants of a number of countries. If we all lived like Americans, even four planet Earths would not be enough.

The article does suggest, however, that a change of mindset will push us to inventive solutions to our environmental problems. I hope she’s right.

My review of the Roost laptop stand

In short: it’s awesome; the best stand I have ever used, by a wide margin. Read on for details.

The Roost is an ingeniously designed laptop stand that folds away to nothing, so you can always carry it with you. It’s another Kickstarter success story. (It’s the third I’ve participated in, after the Elevate iPhone dock and the Pebble watch. I absolutely love the Kickstarter economy.)

Here’s a picture of the Roost in its carry bag. You can see that it’s just tiny:

[Image: the Roost folded away in its carry bag]

And unwrapped:

[Image: the Roost unfolded]

And yet for all its diminutive size, this stand gives my laptop wicked air:

[Images: my laptop before and after being mounted on the Roost]

The laptop screen actually sits higher (closer to eye level) than on other stands I’ve used from Griffin or Xbrand, despite the Roost being much lighter and smaller. Folding and unfolding the Roost is fantastically easy, smooth, and fast. It’s just excellent design.

The laptop is held up by two tiny tabs that latch underneath the display’s hinge:

[Image: the Roost’s tabs latching under the display hinge]

Ingenious!

If you’re at all thinking about purchasing a laptop stand, I can’t recommend the Roost highly enough. Buy it now.

Speed up your Mac’s wake-up time using pmset. Do it again after upgrading to Mavericks

Last year I got a 15″ Retina MacBook Pro, an excellent machine. However, it was taking way longer than my 13″ MBP to wake up from sleep. After a few months of just accepting it as a flaw of the new machines and the cost of being an early adopter, I finally decided to look into the problem. Sure enough, I came across this excellent post from OS X Daily:

Is Your Mac Slow to Wake from Sleep? Try this pmset Workaround

Oooh, sweet goodness: basically, after 1h10min asleep, your Mac goes into a “deep sleep” mode that dumps the contents of RAM into your HDD/SSD and powers off the RAM. On wake, it needs to load up all the RAM contents again. This is slow when your machine has 16GB of RAM! Thankfully, you can make your Mac wait any amount of time before going into deep sleep. This will eat up your battery a bit more, but it’s worth it. Just type this into the Terminal:

sudo pmset -a standbydelay 86400

This changes the time to deep sleep to 24h. Since I rarely spend more than 24h without using my computer, I now have instant-on every time I open up my laptop!
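
To check the current value at any time (for example, after an OS update), you can query the power management settings:

pmset -g | grep standbydelay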

Finally, the reason I wrote this now: upgrading to Mavericks sneakily resets your standbydelay to 4200. (Or, at least, it did for me.) Just run the above command again and you’ll be set, at least until the next OS upgrade comes along!

Update: the original source of this tip appears to be a post from Erv Walter on his site, Ewal.net. It goes into a lot more detail about the origin of this sleep mode, which indeed did not exist when I bought my previous MacBook Pro.