Vighnesh is tasked with implementing region adjacency graphs and graph based methods for image segmentation. He initially wrote specific functions for 2D and 3D images, and I suggested that he should merge them: either with n-dimensional code, or, at the very least, by making 2D a special case of 3D. He chose the former, and produced extremely elegant code. Three nested for loops and a large number of neighbour computations were replaced by a function call and a simple loop. Read on to find out how.
Iterating over an array of unknown dimension is not trivial a priori, but thankfully, someone else has already solved that problem: NumPy’s nditer and ndindex functions allow one to efficiently iterate through every point of an n-dimensional array. However, that still leaves the problem of finding neighbors, to determine which regions are adjacent to each other. Again, this is not trivial to do in nD.
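A quick sketch of the ndindex half of this (my own toy example, not Vighnesh's code): the same loop body works unchanged on arrays of any dimension.

```python
import numpy as np

def count_nonzero_nd(arr):
    """Count nonzero entries by visiting every coordinate,
    whatever the dimensionality of `arr`."""
    total = 0
    for idx in np.ndindex(arr.shape):  # idx is an n-tuple of coordinates
        if arr[idx] != 0:
            total += 1
    return total

a2 = np.array([[0, 1], [2, 0]])  # 2D
a3 = np.zeros((2, 2, 2))         # 3D
a3[0, 1, 1] = 5
print(count_nonzero_nd(a2), count_nonzero_nd(a3))
```

There is no branching on `arr.ndim` anywhere, which is exactly what makes the n-dimensional refactor so clean.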
scipy.ndimage provides a suitable function, generic_filter. Typically, a filter is used to iterate a “selector” (called a structuring element) over an array, compute some function of all the values covered by the structuring element, and replace the central value by the output of the function. For example, one can use a cross-shaped structuring element that covers a pixel and its four immediate neighbours.
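A toy sketch of that conventional use (my own illustration; the image and footprint values are made up): taking the maximum over a cross-shaped footprint is a grey-scale dilation.

```python
import numpy as np
from scipy import ndimage as ndi

image = np.array([[0, 0, 0, 0],
                  [0, 9, 0, 0],
                  [0, 0, 0, 0]])

# Cross-shaped structuring element: the centre and its 4-connected neighbours.
footprint = np.array([[0, 1, 0],
                      [1, 1, 1],
                      [0, 1, 0]])

# At each position, the function receives the covered values as a flat array;
# its return value becomes the output pixel. Here, the 9 spreads to its
# four neighbours.
out = ndi.generic_filter(image, max, footprint=footprint,
                         mode='constant', cval=0)
print(out)
```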
Here, we don’t want to create an output array, but an output graph. What to do? As it turns out, the fact that Python passes objects by reference (so the filter function can mutate a graph that the caller still holds) allowed Vighnesh to do this quite easily, using the “extra_arguments” keyword to generic_filter: we can write a filter function that receives the graph and updates it when two distinct values are adjacent! generic_filter passes all values covered by a structuring element as a flat array, in the array order of the structuring element. So Vighnesh wrote the following function:
def _add_edge_filter(values, g):
    """Add an edge between the first element in `values` and
    all other elements of `values` in the graph `g`.

    `values[0]` is expected to be the central value of
    the footprint used.

    Parameters
    ----------
    values : array
        The array to process.
    g : RAG
        The graph to add edges in.

    Returns
    -------
    0.0 : float
        Always returns 0.
    """
    values = values.astype(int)
    current = values[0]
    for value in values[1:]:
        g.add_edge(current, value)
    return 0.0
This is a rather unconventional use of generic_filter, which is normally used for its output array. Note how the return value of the filter function, _add_edge_filter, is just 0! In our case, the output array contains all 0s, but we use the filter exclusively for its side-effect: adding an edge to the graph g when there is more than one unique value in the footprint. That’s cool.
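Here is a self-contained miniature of the trick. The footprint, label image, and helper are my own, and a plain dict of sets stands in for scikit-image's RAG class:

```python
import numpy as np
from scipy import ndimage as ndi
from collections import defaultdict

def add_edge_filter(values, graph):
    """Record adjacency between the centre label (values[0]) and the
    other labels under the footprint; the returned 0 just fills a
    throwaway output array."""
    centre = int(values[0])
    for v in values[1:]:
        if int(v) != centre:
            graph[centre].add(int(v))
            graph[int(v)].add(centre)
    return 0.0

labels = np.array([[1, 1, 2],
                   [1, 3, 2],
                   [3, 3, 2]])

# Centre + right + down: every adjacent pair is seen exactly once.
footprint = np.array([[0, 0, 0],
                      [0, 1, 1],
                      [0, 1, 0]])

graph = defaultdict(set)
ndi.generic_filter(labels, add_edge_filter, footprint=footprint,
                   mode='nearest', extra_arguments=(graph,))
print(sorted(graph[1]))  # labels adjacent to region 1
```

The output array (all zeros) is discarded; the useful result is the mutated graph.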
Continuing, in this first RAG implementation, Vighnesh wanted to segment according to average color, so he further needed to iterate over each pixel/voxel/hypervoxel and keep a running total of the color and the pixel count. He used elements in the graph node dictionary for this and updated them using ndindex:
for index in np.ndindex(labels.shape):
current = labels[index]
g.node[current]['pixel count'] += 1
g.node[current]['total color'] += image[index]
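In self-contained form (plain dicts standing in for the graph's node attribute dictionaries, with a made-up single-channel “colour” image):

```python
import numpy as np

labels = np.array([[0, 0, 1],
                   [0, 1, 1]])
image = np.array([[10., 20., 30.],
                  [40., 50., 60.]])

stats = {int(lab): {'pixel count': 0, 'total color': 0.0}
         for lab in np.unique(labels)}

# The same accumulation loop, dimension-agnostic thanks to ndindex.
for index in np.ndindex(labels.shape):
    current = int(labels[index])
    stats[current]['pixel count'] += 1
    stats[current]['total color'] += image[index]

# The mean colour of each region is then a single division away.
means = {lab: s['total color'] / s['pixel count']
         for lab, s in stats.items()}
print(means)
```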
Thus, together, NumPy’s nditer and ndindex and scipy.ndimage’s generic_filter provide a powerful way to perform a large variety of operations on n-dimensional arrays… a much larger variety than I’d realised!
You can see Vighnesh’s complete pull request here and follow his blog here.
If you use NumPy arrays and their massive bag of tricks, please cite the paper below!
van der Walt S, Colbert SC, & Varoquaux G (2011). The NumPy array: a structure for efficient numerical computation. Computing in Science and Engineering, 13 (2), 22-30. arXiv: 1102.1523v1
Among the plugin smorgasbord of Fiji is the Bio-Formats importer, which can open any proprietary microscopy file under the sun. (And there’s a lot of them!) Below I will use Jython to open some .lifs, do some processing, and output some .pngs that I can process further using Python/NumPy/scikit-image. (A .lif is a Leica Image File, because there were not enough image file formats before Leica came along.)
The first thing to note is that Jython is not Python, and it is certainly not Python 2.7. In fact, the Fiji Jython interpreter implements Python 2.5, which means no argparse. Not to worry though, as argparse is implemented in a single, pure Python file distributed under the Python license. So:
Tip #1: copy argparse.py into your project.
This way you’ll have access to the state of the art in command-line argument processing from within the Jython interpreter.
To get Fiji to run your code, you simply feed it your source file on the command line. So, let’s try it out with a simple example, echo.py:
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=
                                     "Parrot back your arguments.")
    parser.add_argument('args', nargs="*", help="The input arguments.")
    args = parser.parse_args()
    for arg in args.args:
        print(arg)
Now we can just run this:
$ fiji echo.py hello world
hello
world
But sadly, Fiji captures any -h calls, which defeats the purpose of using argparse in the first place!
$ fiji echo.py -h
Usage: /Applications/Fiji.app/Contents/MacOS/fiji-macosx [<Java options>.. --] [<ImageJ options>..] [<files>..]
Java options are passed to the Java Runtime, ImageJ
options to ImageJ (or Jython, JRuby, ...).
In addition, the following options are supported by ImageJ:
--help, -h
	show this help
--dry-run
	show the command line, but do not run anything
(… and so on, the output is quite huge.)
(Note also that I aliased the Fiji binary, that long path under /Applications, to a simple fiji command; I recommend you do the same.)
However, we can work around this by calling help using Python as the interpreter, and only using Fiji to actually run the file:
$ python echo.py -h
usage: echo.py [-h] [args [args ...]]

Parrot back your arguments.

positional arguments:
  args        The input arguments.

optional arguments:
  -h, --help  show this help message and exit
That’s more like it! Now we can start to build something a bit more interesting, for example, something that converts arbitrary image files to png:
import argparse
from ij import IJ  # the IJ class has utility methods for many common tasks.

def convert(fn):
    """Convert the input file to png format.

    fn : string
        The filename of the image to be converted.
    """
    imp = IJ.openImage(fn)
    # imp is the common name for an ImagePlus object,
    # ImageJ's base image class
    fnout = fn.rsplit('.', 1)[0] + '.png'
    IJ.saveAs(imp, 'png', fnout)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Convert TIFF to PNG.")
    parser.add_argument('images', nargs='+', help="Input images.")
    args = parser.parse_args()
    for fn in args.images:
        convert(fn)
Boom, we’re done. But wait, we actually broke the Python interpreter compatibility, since ij is not a Python library!
$ python convert2png.py -h
Traceback (most recent call last):
File "convert.py", line 2, in <module>
from ij import IJ # the IJ class has utility methods for many common tasks.
ImportError: No module named ij
Which brings us to:
Tip #2: only import Java API functions within the functions that use them.
By moving the from ij import IJ statement into the convert function, we maintain compatibility with Python, and can continue to use argparse’s helpful documentation strings.
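The pattern in miniature (runnable under plain CPython, where ij of course doesn't exist; the filename is made up):

```python
def convert(fn):
    # The Java import happens at call time, not at module load time,
    # so this file still imports (and argparse still runs) under CPython.
    from ij import IJ
    imp = IJ.openImage(fn)
    IJ.saveAs(imp, 'png', fn.rsplit('.', 1)[0] + '.png')

# Defining `convert` succeeds anywhere; only *calling* it outside
# Fiji raises ImportError.
try:
    convert('example.tif')
except ImportError:
    print('ij is only importable inside Fiji')
```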
Next, we want to use the Bio-Formats importer, which is class BF in loci.plugins. Figuring out the class hierarchy for arbitrary plugins is tricky, but you can find it here for core ImageJ (using lovely 1990s-style frames) and here for Bio-Formats, and Curtis Rueden has made this list for other common plugins.
When you try to open a file with Bio-Formats importer using the Fiji GUI, you get the following dialog:
That’s a lot of options, and we actually want to set some of them. If you look at the BF.openImagePlus documentation, you can see that this is done through an ImporterOptions class located in loci.plugins.in. You’ll notice that “in” is a reserved word in Python, so from loci.plugins.in import ImporterOptions is not a valid Python statement. Yay! My workaround:
Tip #3: move your Fiji imports to an external file.
So I have a jython_imports.py file with just:
from ij import IJ
from loci.plugins import BF
from loci.plugins.in import ImporterOptions
Then, inside the convert_files() function, we just do:
from jython_imports import IJ, BF, ImporterOptions
This way, the main file remains Python-compatible until the convert() function is actually called, regardless of whatever funky and unpythonic stuff is happening in jython_imports.py.
Onto the options. If you untick “Open files individually”, it will open up all matching files in a directory, regardless of your input filename! Not good. So now we play a pattern-matching game in which we match the option description in the above dialog with the ImporterOptions API calls. In this case, we setUngroupFiles(True). To specify a filename, we setId(filename). Additionally, because we want all of the images in the .lif file, we setOpenAllSeries(True).
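Collected in one place, the options setup might look like this (Jython only, to be run inside Fiji: as noted above, the loci.plugins.in import is not valid CPython, which is what Tip #3 works around; the filename is a placeholder):

```python
from loci.plugins import BF
from loci.plugins.in import ImporterOptions  # parses under Fiji's Jython, not CPython

opts = ImporterOptions()
opts.setId("myFile.lif")       # the file to open (placeholder name)
opts.setUngroupFiles(True)     # don't sweep in every matching file in the folder
opts.setOpenAllSeries(True)    # read every series in the .lif
imps = BF.openImagePlus(opts)  # a list of ImagePlus objects, one per series
```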
Next, each image in the series is 3D and has three channels, but we are only interested in a summed z-projection of the first channel. There’s a set of ImporterOptions methods tantalizingly named setCBegin, setCEnd, and setCStep, but this is where I found the documentation sorely lacking. The functions take (int s, int value) as arguments, but what’s s??? Are the limits closed or open? Code review is a wonderful thing, and this would not have passed it. To figure things out:
Tip #4: use Fiji’s interactive Jython interpreter to figure things out quickly.
You can find the Jython interpreter under Plugins/Scripting/Jython Interpreter. It’s no IPython, but it is extremely helpful to answer the questions I had above. My hypothesis was that s was the series, and that the intervals would be closed. So I set a starting channel with setCBegin on a test file and checked the dimensions of the images that came back.
Now, let’s try the same thing with setCEnd, assuming closed interval:
>>> opts.setCEnd(0, 0) ## only read channels up to 0 for series 0?
>>> opts.setCEnd(2, 0) ## only read channels up to 0 for series 2?
>>> imps = BF.openImagePlus(opts)
>>> for imp in imps:
...     print imp.getDimensions()  # width, height, nChannels, nSlices, nFrames
array('i', [1024, 1024, 1, 31, 1])
array('i', [1024, 1024, 3, 34, 1])
array('i', [1024, 1024, 1, 30, 1])
Nothing there to disprove my hypothesis! So we move on to the final step, which is to z-project the stack by summing the intensity over all z values. This is normally accessed via Image/Stacks/Z Project in the Fiji GUI, and I found the corresponding ij.plugin.ZProjector class by searching for “proj” in the ImageJ documentation. A ZProjector object has a setMethod method that usefully takes an int as an argument, with no explanation in its docstring as to which int translates to which method (sum, average, max, etc.). A little more digging in the source code reveals some class static variables, AVG_METHOD, MAX_METHOD, and so on.
Tip #5: don’t be afraid to look at the source code. It’s one of the main advantages of working in open-source.
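Reading off the ZProjector source, the projection step might be sketched like this (Jython, to be run inside Fiji; imp is one of the images returned by BF.openImagePlus, and the method names are taken from the class’s source):

```python
from ij.plugin import ZProjector

zp = ZProjector(imp)                  # wrap the stack to be projected
zp.setMethod(ZProjector.SUM_METHOD)   # one of the class-static *_METHOD ints
zp.doProjection()
impout = zp.getProjection()           # the projected ImagePlus
```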
The output is actually a float-typed image, which will get rescaled to [0, 255] uint8 on save if we don’t fix it. So, to wrap up, we convert the image to 16 bits (making sure to turn off scaling), use the series title to generate a unique filename, and save as a PNG:
>>> from ij.process import ImageConverter
>>> ImageConverter.setDoScaling(False)  # don't rescale values on conversion
>>> conv = ImageConverter(impout)
>>> conv.convertToGray16()
>>> title = impout.getTitle().rsplit(" ", 1)[-1]
>>> IJ.saveAs(impout, 'png', "myFile-" + title + ".png")
You can see the final result of my sleuthing in lif2png.py and jython_imports.py. If you would do something differently, pull requests are always welcome.
Before I sign off, let me recap my tips:
1. copy argparse.py into your project;
2. only import Java API functions within the functions that use them;
3. move your Fiji imports to an external file;
4. use Fiji’s interactive Jython interpreter to figure things out quickly; and
5. don’t be afraid to look at the source code.
And let me add a few final comments: once I started digging into all of Fiji’s plugins, I found documentation of very variable quality, and worse, virtually zero consistency between the interfaces to each plugin. Some work on “the currently active image”, some take an ImagePlus instance as input, and others still a filename or a directory name. Outputs are equally variable. This has been a huge pain when trying to work with these plugins.
But, on the flipside, this is the most complete collection of image processing functions anywhere. Along with the seamless access to all those functions from Jython and other languages, that makes Fiji very worthy of your attention.
Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, Tinevez JY, White DJ, Hartenstein V, Eliceiri K, Tomancak P, & Cardona A (2012). Fiji: an open-source platform for biological-image analysis. Nature methods, 9 (7), 676-82 PMID: 22743772
Linkert M, Rueden CT, Allan C, Burel JM, Moore W, Patterson A, Loranger B, Moore J, Neves C, Macdonald D, Tarkowska A, Sticco C, Hill E, Rossner M, Eliceiri KW, & Swedlow JR (2010). Metadata matters: access to image data in the real world. The Journal of cell biology, 189 (5), 777-82 PMID: 20513764
Time-to-publication at the world’s biggest scientific journal has grown dramatically, but the nail in the coffin was its poor production policies.
When PLOS ONE was announced in 2006, its charter immediately resonated with me. This would be the first journal where only scientific accuracy mattered. Judgments of “impact” and “interest” would be left to posterity, which is the right strategy when publishing is cheap and searching and filtering are easy. The whole endeavour would be a huge boon to “in-between” scientists straddling established fields — such as bioinformaticians.
My first first-author paper, Joint Genome-Wide Profiling of miRNA and mRNA Expression in Alzheimer’s Disease Cortex Reveals Altered miRNA Regulation, went through a fairly standard journal loop. We first submitted it to Genome Biology, which (editorially) deemed it uninteresting to a sufficiently broad readership; then to RNA, which (editorially) decided that our sample size was too small; and finally to PLOS ONE, where it went out to review. After a single revision loop, it was accepted for publication. It’s been cited more than 15 times a year, which is modest but above the Journal Impact Factor for Genome Biology — which means that the editors made a bad call rejecting it outright. (I’m not bitter!)
Overall, it was a very positive first experience at PLOS. Time to acceptance was under 3 months, time to publication under 4. The reviewers were no less harsh than in my previous experiences, so I felt (and still feel) that the reputation of PLOS ONE as a “junk” journal was (is) highly undeserved. (Update: There’s been a big hullabaloo about a recent sting targeting open access journals with a fake paper. PLOS ONE came away unscathed. See also the take of Mike Eisen, co-founder of PLOS.) And the number of citations certainly vindicated PLOS ONE’s approach of ignoring apparent impact.
So, when looking for a home for my equally-awkward postdoc paper (not quite computer vision, not quite neuroscience), PLOS ONE was a natural first choice.
The first thing to go wrong was the time to publication, about 6 months. Still better than many top-tier journals, but no longer a crushing advantage. And it’s not just me: there’s been plenty of discussion about time-to-publication steadily increasing at PLOS ONE. But I was not too worried about the publication time, since I’d put my paper up on the arXiv (and revised it at each round of peer-review, so you can see the revision history there — but not on PLOS ONE).
But, after multiple rounds of review, the time came for production, at which point they messed up two things: they did not include my present address, and they messed up Figure 1, which is supposed to be a small, single-column, illustrative figure, but which they made page-width. The effect is almost comical: a reader’s first impression on seeing page 2 might well be that the authors are trying to mask their incompetence with giant pictures. (We’re not, I swear!)
Both of these mistakes could have been avoided if PLOS ONE did not have a policy of not letting you see the camera-ready pdf before it is published, and of not allowing corrections to papers unless they are technical or scientific, regardless of fault. Not to mention they could have, you know, actually looked at the dimensions embedded in the submitted TIFFs. With a $1,300 publication fee, PLOS could afford to take a little bit of extra care with production. Both of the above policies are utterly unnecessary — the added cost of sending authors a production proof is close to nil, and keeping track of revisions on online publications is also trivial (see the 22 year old arXiv for an example).
We scientists live and die by our papers. We don’t want the culmination of years of work to be marred by a silly, easily-fixed formatting error, ossified by an unwieldy bureaucracy. I’ve been an avid promoter of PLOS (and PLOS ONE in particular) over the past few years, but I’m sad to say that’s not where my next paper will end up.
Ultimately, PLOS ONE’s model, groundbreaking though it was, is already being supplanted by newcomers. PeerJ offers everything PLOS ONE does at a fraction of the cost, and further includes a preprint service and open peer-review. Ditto for F1000 Research, which in addition offers unlimited revisions (a topic close to my heart ;). And both use the excellent MathJAX to render mathematical formulas, unlike PLOS’s archaic use of embedded images. They get my vote for the journals of the future.
[Note: the views expressed herein are mine alone — no co-authors were harmed (or consulted) in the writing of this blog post.]
Nunez-Iglesias J, Liu CC, Morgan TE, Finch CE, & Zhou XJ (2010). Joint genome-wide profiling of miRNA and mRNA expression in Alzheimer’s disease cortex reveals altered miRNA regulation. PloS one, 5 (2) PMID: 20126538
Kravitz DJ, & Baker CI (2011). Toward a new model of scientific publishing: discussion and a proposal. Frontiers in computational neuroscience, 5 PMID: 22164143
Juan Nunez-Iglesias, Ryan Kennedy, Toufiq Parag, Jianbo Shi, & Dmitri B. Chklovskii (2013). Machine learning of hierarchical clustering to segment 2D and 3D images arXiv arXiv: 1303.6163v3
Nunez-Iglesias J, Kennedy R, Parag T, Shi J, & Chklovskii DB (2013). Machine Learning of Hierarchical Clustering to Segment 2D and 3D Images. PloS one, 8 (8) PMID: 23977123
Microarrays certainly get a lot of flak for being noisy sources of data. It’s certainly a valid concern, since a single microarray usually measures the expression levels of tens of thousands of genes, and only a few biological samples are examined. There’s no hope of accurately estimating the levels of that many variables with so few samples. Eric Blalock and his colleagues, however, made a compelling case in 2005 that the fault lies not with the technology itself, but with the statistical inferences drawn from the generated data. How then to reconcile the wild variability between published microarray results from different labs with the apparent validity of the technology?
Hyuna Yang and colleagues seem to have at least part of the answer. They had five different research centers analyse the exact same RNA samples, and collected the raw fluorescence values—before normalisation or any other kind of analysis. After a long (and, dare I say, tedious) analysis, they actually found that batch processing effects had a significant effect on the list of affected genes detected. The authors do a good job of explaining what batch effects are, so I’ll open the floor to them: