Category Archives: Research Blogging

A clever use of SciPy’s ndimage.generic_filter for n-dimensional image processing

This year I am privileged to be a mentor in the Google Summer of Code for the scikit-image project, as part of the Python Software Foundation organisation. Our student, Vighnesh Birodkar, recently came up with a clever use of SciPy’s ndimage.generic_filter that is certainly worth sharing widely.

Vighnesh is tasked with implementing region adjacency graphs and graph based methods for image segmentation. He initially wrote specific functions for 2D and 3D images, and I suggested that he should merge them: either with n-dimensional code, or, at the very least, by making 2D a special case of 3D. He chose the former, and produced extremely elegant code. Three nested for loops and a large number of neighbour computations were replaced by a function call and a simple loop. Read on to find out how.

Iterating over an array of unknown dimension is not trivial a priori, but thankfully, someone else has already solved that problem: NumPy’s nditer and ndindex functions allow one to efficiently iterate through every point of an n-dimensional array. However, that still leaves the problem of finding neighbors, to determine which regions are adjacent to each other. Again, this is not trivial to do in nD.

scipy.ndimage provides a suitable function, generic_filter. Typically, a filter is used to iterate a “selector” (called a structuring element) over an array, compute some function of all the values covered by the structuring element, and replace the central value by the output of the function. For example, using the structuring element:

fp = np.array([[0, 1, 0],
               [1, 1, 1],
               [0, 1, 0]], np.uint8)

and the function np.median on a 2D image produces a median filter over a pixel’s immediate neighbors. That is,

import functools
median_filter = functools.partial(generic_filter,
                                  function=np.median,
                                  footprint=fp)

Here, we don’t want to create an output array, but an output graph. What to do? As it turns out, Python’s pass-by-reference allowed Vighnesh to do this quite easily using the “extra_arguments” keyword to generic_filter: we can write a filter function that receives the graph and updates it when two distinct values are adjacent! generic_filter passes all values covered by a structuring element as a flat array, in the array order of the structuring element. So Vighnesh wrote the following function:

def _add_edge_filter(values, g):
    """Add an edge between first element in `values` and
    all other elements of `values` in the graph `g`.
    `values[0]` is expected to be the central value of
    the footprint used.

    Parameters
    ----------
    values : array
        The array to process.
    g : RAG
        The graph to add edges in.

    Returns
    -------
    0.0 : float
        Always returns 0.

    """
    values = values.astype(int)
    current = values[0]
    for value in values[1:]:
        g.add_edge(current, value)
    return 0.0

Then, using the footprint:

fp = np.array([[0, 0, 0],
               [0, 1, 1],
               [0, 1, 0]], np.uint8)

(or its n-dimensional analog), this filter is called as follows on labels, the image containing the region labels:

filters.generic_filter(labels,
                       function=_add_edge_filter,
                       footprint=fp,
                       mode='nearest',
                       extra_arguments=(g,))

This is a rather unconventional use of generic_filter, which is normally used for its output array. Note how the return value of the filter function, _add_edge_filter, is just 0! In our case, the output array contains all 0s, but we use the filter exclusively for its side-effect: adding an edge to the graph g when there is more than one unique value in the footprint. That’s cool.

Continuing, in this first RAG implementation, Vighnesh wanted to segment according to average color, so he further needed to iterate over each pixel/voxel/hypervoxel and keep a running total of the color and the pixel count. He used elements in the graph node dictionary for this and updated them using ndindex:

for index in np.ndindex(labels.shape):
    current = labels[index]
    g.node[current]['pixel count'] += 1
    g.node[current]['total color'] += image[index]

Thus, together, numpy’s nditer, ndindex, and scipy.ndimage’s generic_filter provide a powerful way to perform a large variety of operations on n-dimensional arrays… Much larger than I’d realised!

You can see Vighnesh’s complete pull request here and follow his blog here.

If you use NumPy arrays and their massive bag of tricks, please cite the paper below!

Stefan Van Der Walt, S. Chris Colbert, & Gaël Varoquaux (2011). The NumPy array: a structure for efficient numerical computation Computing in Science and Engineering 13, 2 (2011) 22-30 arXiv: 1102.1523v1

Get the best of both worlds with Fiji’s Jython interpreter

Fiji is just ImageJ, with batteries included. It contains plugins to do virtually anything you would want to do to an image. Since my go-to programming language is Python, my favorite feature of Fiji is its language-agnostic API, which supports a plethora of languages, including Java, Javascript, Clojure, and of course Python; 7 languages in all. (Find these under Plugins/Scripting/Script Editor.) Read on to learn more about the ins and outs of using Python to drive Fiji.

Among the plugin smorgasbord of Fiji is the Bio-Formats importer, which can open any proprietary microscopy file under the sun. (And there’s a lot of them!) Below I will use Jython to open some .lifs, do some processing, and output some .pngs that I can process further using Python/NumPy/scikit-image. (A .lif is a Leica Image File, because there were not enough image file formats before Leica came along.)

The first thing to note is that Jython is not Python, and it is certainly not Python 2.7. In fact, the Fiji Jython interpreter implements Python 2.5, which means no argparse. Not to worry though, as argparse is implemented in a single, pure Python file distributed under the Python license. So:

Tip #1: copy argparse.py into your project.

This way you’ll have access the state of the art in command line argument processing from within the Jython interpreter.

To get Fiji to run your code, you simply feed it your source file on the command line. So, let’s try it out with a simple example, echo.py:

import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=
                                     "Parrot back your arguments.")
    parser.add_argument('args', nargs="*", help="The input arguments.")
    args = parser.parse_args()
    for arg in args.args:
        print(arg)

Now we can just run this:

$ fiji echo.py hello world
hello
world

But sadly, Fiji captures any -h calls, which defeats the purpose of using argparse in the first place!

$ fiji echo.py -h
Usage: /Applications/Fiji.app/Contents/MacOS/fiji-macosx [<Java options>.. --] [<ImageJ options>..] [<files>..]

Java options are passed to the Java Runtime, ImageJ
options to ImageJ (or Jython, JRuby, ...).

In addition, the following options are supported by ImageJ:
General options:
--help, -h
	show this help
--dry-run
	show the command line, but do not run anything
--debug
	verbose output

(… and so on, the output is quite huge.)

(Note also that I aliased the Fiji binary, that long path under /Applications, to a simple fiji command; I recommend you do the same.)

However, we can work around this by calling help using Python as the interpreter, and only using Fiji to actually run the file:

$ python echo.py -h
usage: echo.py [-h] [args [args ...]]

Parrot back your arguments.

positional arguments:
  args        The input arguments.

optional arguments:
  -h, --help  show this help message and exit

That’s more like it! Now we can start to build something a bit more interesting, for example, something that converts arbitrary image files to png:

import argparse
from ij import IJ # the IJ class has utility methods for many common tasks.

def convert_file(fn):
    """Convert the input file to png format.

    Parameters
    ----------
    fn : string
        The filename of the image to be converted.
    """
    imp = IJ.openImage(fn)
    # imp is the common name for an ImagePlus object,
    # ImageJ's base image class
    fnout = fn.rsplit('.', 1)[0] + '.png'
    IJ.saveAs(imp, 'png', fnout)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Convert TIFF to PNG.")
    parser.add_argument('images', nargs='+', help="Input images.")

    args = parser.parse_args()
    for fn in args.images:
        convert_file(fn)

Boom, we’re done. But wait, we actually broke the Python interpreter compatibility, since ij is not a Python library!

$ python convert2png.py -h
Traceback (most recent call last):
  File "convert.py", line 2, in <module>
    from ij import IJ # the IJ class has utility methods for many common tasks.
ImportError: No module named ij

Which brings us to:

Tip #2: only import Java API functions within the functions that use them.

By moving the from ij import IJ statement into the convert function, we maintain compatibility with Python, and can continue to use argparse’s helpful documentation strings.

Next, we want to use the Bio-Formats importer, which is class BF in loci.plugins. Figuring out the class hierarchy for arbitrary plugins is tricky, but you can find it here for core ImageJ (using lovely 1990s-style frames) and here for Bio-Formats, and Curtis Rueden has made this list for other common plugins.

When you try to open a file with Bio-Formats importer using the Fiji GUI, you get the following dialog:

BioFormats import window
BioFormats import window

That’s a lot of options, and we actually want to set some of them. If you look at the BF.openImagePlus documentation, you can see that this is done through an ImporterOptions class located in loci.plugins.in. You’ll notice that “in” is a reserved word in Python, so from loci.plugins.in import ImporterOptions is not a valid Python statement. Yay! My workaround:

Tip #3: move your Fiji imports to an external file.

So I have a jython_imports.py file with just:

from ij import IJ
from loci.plugins import BF
from loci.plugins.in import ImporterOptions

Then, inside the convert_files() function, we just do:

from jython_imports import IJ, BF, ImporterOptions

This way, the main file remains Python-compatible until the convert() function is actually called, regardless of whatever funky and unpythonic stuff is happening in jython_imports.py.

Onto the options. If you untick “Open files individually”, it will open up all matching files in a directory, regardless of your input filename! Not good. So now we play a pattern-matching game in which we match the option description in the above dialog with the ImporterOptions API calls. In this case, we setUngroupFiles(True). To specify a filename, we setId(filename). Additionally, because we want all of the images in the .lif file, we setOpenAllSeries(True).

Next, each image in the series is 3D and has three channels, but we are only interested in a summed z-projection of the first channel. There’s a set of ImporterOptions methods tantalizingly named setCBegin, setCEnd, and setCStep, but this is where I found the documentation sorely lacking. The functions take (int s, int value) as arguments, but what’s s??? Are the limits closed or open? Code review is a wonderful thing, and this would not have passed it. To figure things out:

Tip #4: use Fiji’s interactive Jython interpreter to figure things out quickly.

You can find the Jython interpreter under Plugins/Scripting/Jython Interpreter. It’s no IPython, but it is extremely helpful to answer the questions I had above. My hypothesis was that s was the series, and that the intervals would be closed. So:

>>> from loci.plugins import BF
>>> from loci.plugins.in import ImporterOptions
>>> opts = ImporterOptions()
>>> opts.setId("myFile.lif")
>>> opts.setOpenAllSeries(True)
>>> opts.setUngroupFiles(True)
>>> imps = BF.openImagePlus(opts)

Now we can play around, with one slight annoyance: the interpreter won’t print the output of your last statement, so you have to specify it:

>>> len(imps)
>>> print(len(imps))
18

Which is what I expected, as there are 18 series in my .lif file. The image shape is given by the getDimensions() method of the ImagePlus class:

>>> print(imps[0].getDimensions())
array('i', [1024, 1024, 3, 31, 1])

>>> print(imps[1].getDimensions())
array('i', [1024, 1024, 3, 34, 1])

That’s (x, y, channels, z, time).

Now, let’s try the same thing with setCEnd, assuming closed interval:

>>> opts.setCEnd(0, 0) ## only read channels up to 0 for series 0?
>>> opts.setCEnd(2, 0) ## only read channels up to 0 for series 2?
>>> imps = BF.openImagePlus(opts)
>>> print(imps[0].getDimensions())
array('i', [1024, 1024, 1, 31, 1])

>>> print(imps[1].getDimensions())
array('i', [1024, 1024, 3, 34, 1])

>>> print(imps[2].getDimensions())
array('i', [1024, 1024, 1, 30, 1])

Nothing there to disprove my hypothesis! So we move on to the final step, which is to z-project the stack by summing the intensity over all z values. This is normally accessed via Image/Stacks/Z Project in the Fiji GUI, and I found the corresponding ij.plugin.ZProjector class by searching for “proj” in the ImageJ documentation. A ZProjector object has a setMethod method that usefully takes an int as an argument, with no explanation in its docstring as to which int translates to which method (sum, average, max, etc.). A little more digging in the source code reveals some class static variables, AVG_METHOD, MAX_METHOD, and so on.

Tip #5: don’t be afraid to look at the source code. It’s one of the main advantages of working in open-source.

So:

>>> from ij.plugin import ZProjector
>>> proj = ZProjector()
>>> proj.setMethod(ZProjector.SUM_METHOD)
>>> proj.setImage(imps[0])
>>> proj.doProjection()
>>> impout = proj.getProjection()
>>> print(impout.getDimensions())
array('i', [1024, 1024, 1, 1, 1])

The output is actually a float-typed image, which will get rescaled to [0, 255] uint8 on save if we don’t fix it. So, to wrap up, we convert the image to 16 bits (making sure to turn off scaling), use the series title to generate a unique filename, and save as a PNG:

>>> from ij.process import ImageConverter
>>> ImageConverter.setDoScaling(False)
>>> conv = ImageConverter(impout)
>>> conv.convertToGray16()
>>> title = imps[0].getTitle().rsplit(" ", 1)[-1]
>>> IJ.saveAs(impout, 'png', "myFile-" + title + ".png")

You can see the final result of my sleuthing in lif2png.py and jython_imports.py. If you would do something differently, pull requests are always welcome.

Before I sign off, let me recap my tips:

1. copy argparse.py into your project;

2. only import Java API functions within the functions that use them;

3. move your Fiji imports to an external file;

4. use Fiji’s interactive Jython interpreter to figure things out quickly; and

5. don’t be afraid to look at the source code.

And let me add a few final comments: once I started digging into all of Fiji’s plugins, I found documentation of very variable quality, and worse, virtually zero consistency between the interfaces to each plugin. Some work on “the currently active image”, some take an ImagePlus instance as input, and others still a filename or a directory name. Outputs are equally variable. This has been a huge pain when trying to work with these plugins.

But, on the flipside, this is the most complete collection of image processing functions anywhere. Along with the seamless access to all those functions from Jython and other languages, that makes Fiji very worthy of your attention.

Acknowledgements

This post was possible thanks to the help of Albert Cardona, Johannes Schindelin, Wayne Rasband, and Jan Eglinger, who restlessly respond to (it seems) every query on the ImageJ mailing list. Thanks!

References

Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, Tinevez JY, White DJ, Hartenstein V, Eliceiri K, Tomancak P, & Cardona A (2012). Fiji: an open-source platform for biological-image analysis. Nature methods, 9 (7), 676-82 PMID: 22743772

Linkert M, Rueden CT, Allan C, Burel JM, Moore W, Patterson A, Loranger B, Moore J, Neves C, Macdonald D, Tarkowska A, Sticco C, Hill E, Rossner M, Eliceiri KW, & Swedlow JR (2010). Metadata matters: access to image data in the real world. The Journal of cell biology, 189 (5), 777-82 PMID: 20513764

Why PLOS ONE is no longer my default journal

Time-to-publication at the world’s biggest scientific journal has grown dramatically, but the nail in the coffin was its poor production policies.

When PLOS ONE was announced in 2006, its charter immediately resonated with me. This would be the first journal where only scientific accuracy mattered. Judgments of “impact” and “interest” would be left to posterity, which is the right strategy when publishing is cheap and searching and filtering are easy. The whole endeavour would be a huge boon to “in-between” scientists straddling established fields — such as bioinformaticians.

My first first-author paper, Joint Genome-Wide Profiling of miRNA and mRNA Expression in Alzheimer’s Disease Cortex Reveals Altered miRNA Regulation, went through a fairly standard journal loop. We first submitted it to Genome Biology, which (editorially) deemed it uninteresting to a sufficiently broad readership; then to RNA, which (editorially) decided that our sample size was too small; and finally to PLOS ONE, where it went out to review. After a single revision loop, it was accepted for publication. It’s been cited more than 15 times a year, which is modest but above the Journal Impact Factor for Genome Biology — which means that the editors made a bad call rejecting it outright. (I’m not bitter!)

Overall, it was a very positive first experience at PLOS. Time to acceptance was under 3 months, time to publication under 4. The reviewers were no less harsh than in my previous experiences, so I felt (and still feel) that the reputation of PLOS ONE as a “junk” journal was (is) highly undeserved. (Update: There’s been a big hullabaloo about a recent sting targeting open access journals with a fake paper. PLOS ONE came away unscathed. See also the take of Mike Eisen, co-founder of PLOS.) And the number of citations certainly vindicated PLOS ONE’s approach of ignoring apparent impact.

So, when looking for a home for my equally-awkward postdoc paper (not quite computer vision, not quite neuroscience), PLOS ONE was a natural first choice.

The first thing to go wrong was the time to publication, about 6 months. Still better than many top-tier journals, but no longer a crushing advantage. And it’s not just me: there’s been plenty of discussion about time-to-publication steadily increasing at PLOS ONE. But I was not too worried about the publication time, since I’d put my paper up on the arXiv (and revised it at each round of peer-review, so you can see the revision history there — but not on PLOS ONE).

But, after multiple rounds of review, the time came for production, at which point they messed up two things: they did not include my present address; and they messed up Figure 1, which is supposed to be a small, single-column, illustrative figure, and which they made page-width. The effect is almost comical, and my first impression seeing page 2 would be to think that the authors are trying to mask their incompetence with giant pictures. (We’re not, I swear!)

Figure 1 of my paper on arXiv (left) and PLOS ONE (right)
Figure 1 of our paper on arXiv (left) and PLOS ONE (right)

Both of these mistakes could have been avoided if PLOS ONE did not have a policy of not letting you see the camera-ready pdf before it is published, and of not allowing corrections to papers unless they are technical or scientific, regardless of fault. Not to mention they could have, you know, actually looked at the dimensions embedded in the submitted TIFFs. With a $1,300 publication fee, PLOS could afford to take a little bit of extra care with production. Both of the above policies are utterly unnecessary — the added cost of sending authors a production proof is close to nil, and keeping track of revisions on online publications is also trivial (see the 22 year old arXiv for an example).

We scientists live and die by our papers. We don’t want the culmination of years of work to be marred by a silly, easily-fixed formatting error, ossified by an unwieldy bureaucracy. I’ve been an avid promoter of PLOS (and PLOS ONE in particular) over the past few years, but I’m sad to say that’s not where my next paper will end up.

Ultimately, PLOS ONE’s model, groundbreaking though it was, is already being supplanted by newcomers. PeerJ offers everything PLOS ONE does at a fraction of the cost, and further includes a preprint service and open peer-review. Ditto for F1000 Research, which in addition offers unlimited revisions (a topic close to my heart ;). And both use the excellent MathJAX to render mathematical formulas, unlike PLOS’s archaic use of embedded images. They get my vote for the journals of the future.

[Note: the views expressed herein are mine alone — no co-authors were harmed consulted in the writing of this blog post.]

References

Nunez-Iglesias J, Liu CC, Morgan TE, Finch CE, & Zhou XJ (2010). Joint genome-wide profiling of miRNA and mRNA expression in Alzheimer’s disease cortex reveals altered miRNA regulation. PloS one, 5 (2) PMID: 20126538

Kravitz DJ, & Baker CI (2011). Toward a new model of scientific publishing: discussion and a proposal. Frontiers in computational neuroscience, 5 PMID: 22164143

Juan Nunez-Iglesias, Ryan Kennedy, Toufiq Parag, Jianbo Shi, & Dmitri B. Chklovskii (2013). Machine learning of hierarchical clustering to segment 2D and 3D images arXiv arXiv: 1303.6163v3

Nunez-Iglesias J, Kennedy R, Parag T, Shi J, & Chklovskii DB (2013). Machine Learning of Hierarchical Clustering to Segment 2D and 3D Images. PloS one, 8 (8) PMID: 23977123

Randomise your samples!

ResearchBlogging.orgMicroarrays certainly get a lot of flak for being noisy sources of data. It’s certainly a valid concern, since a single microarray usually measures the expression levels of tens of thousands of genes, and only a few biological samples are examined. There’s no hope of accurately estimating the levels of that many variables with so few samples. Eric Blalock and his colleagues, however, made a compelling case in 2005 that the fault lies not with the technology itself, but with the statistical inferences drawn from the generated data. How then to reconcile the wild variability between published microarray results from different labs with the apparent validity of the technology?

Hyuna Yang and colleagues seem to have at least part of the answer. They had five different research centers analyse the exact same RNA samples, and collected the raw fluorescence values—before normalisation or any other kind of analysis. After a long (and, dare I say, tedious) analysis, they actually found that batch processing effects had a significant effect on the list of affected genes detected. The authors do a good job of explaining what batch effects are, so I’ll open the floor to them: Continue reading Randomise your samples!