Continuous integration in Python, Volume 0: automated tests with pytest

I just finished the process of setting up continuous integration from scratch for one of my projects, cellom2tif, a simple image file converter/liberator. I thought I would write a blog post about that process, but it has slowly mutated into a hefty document that I thought would work better as a series. I’ll cover automated testing, test coverage, and how to get these to run automatically for your project with Travis-CI and Coveralls.

Without further ado, here goes the first post: how to set up automated testing for your Python project using pytest.

Automated tests, and why you need them

Software engineering is hard, and it’s incredibly easy to mess things up, so you should write tests for all your functions to ensure that nothing obviously stupid is going wrong. Tests can take a lot of different forms, but here’s a really basic example. Suppose this is a function in your file, maths.py:

def square(x):
    return x ** 2

Then, elsewhere in your package, you should have a file named test_maths.py with the following function definition:

from maths import square
def test_square():
    x = 4
    assert square(x) == 16

This way, if someone (such as your future self) comes along and messes with the code in square(), test_square will tell you whether they broke it.

Testing in Python with pytest

A whole slew of testing frameworks, such as nose, will then traverse your project, look for files and functions whose names begin with test_, run them, and report any errors or assertion failures.

I’ve chosen to use pytest as my framework because:

  • it is a strict superset of both nose and Python’s built-in unittest, so that if you run it on projects set up with those, it’ll work out of the box;
  • its output is more readable than nose’s; and
  • its fixtures provide a very nice syntax for test setup and cleanup.
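
As a taste of that fixture syntax, here’s a minimal sketch (using the `yield` style of later pytest versions; in the pytest 2.x era the same idea was spelled with `@pytest.yield_fixture`):

```python
import pytest

@pytest.fixture
def sample_data():
    # setup: runs before each test that requests this fixture
    data = [1, 2, 3]
    yield data
    # cleanup: runs after the test finishes, pass or fail
    data.clear()

def test_sum(sample_data):
    assert sum(sample_data) == 6
```

Any test function that names `sample_data` as an argument gets the set-up value, and the cleanup code runs automatically afterwards.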

For a quick intro to pytest, you can watch Andreas Pelme’s talk at this year’s EuroPython conference:

But the basics are very simple: sprinkle files whose names begin with test_ throughout your project, each containing one or more test_function() definitions; then type py.test on the command line (at your project root directory), and voilà! Pytest will traverse all your subdirectories, gather up all the test files and functions, and run your tests.

Here’s the output for the minimal maths project described above:

~/projects/maths $ py.test
============================= test session starts ==============================
platform darwin -- Python 2.7.8 -- py-1.4.20 -- pytest-2.5.2
collected 1 items

test_maths.py .

=========================== 1 passed in 0.03 seconds ===========================

In addition to the test functions described above, there is a Python standard called doctest in which properly formatted usage examples in your documentation are automatically run as tests. Here's an example:

def square(x):
    """Return the square of a number `x`.

    ...

    >>> square(5)
    25
    """
    return x ** 2

(See my post on NumPy docstring conventions for what should go in the ellipsis above.)

Depending on the complexity of your code, doctests, test functions, or both will be the appropriate route. Pytest supports doctests with the --doctest-modules flag. (This runs both your standard tests and doctests.)

~/projects/maths $ py.test --doctest-modules
============================= test session starts ==============================
platform darwin -- Python 2.7.8 -- py-1.4.20 -- pytest-2.5.2
collected 2 items

maths.py .
test_maths.py .

=========================== 2 passed in 0.06 seconds ===========================

Test-driven development

That was easy! And yet most people, my past self included, neglect tests, thinking they’ll do them eventually, when the software is ready. This is backwards. You’ve probably heard the phrase “Test-driven development (TDD)”; this is what they’re talking about: writing your tests before you’ve written the functionality to pass them. It might initially seem like wasted effort, like you’re not making progress in what you actually want to do, which is write your software. But it’s not:
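
In practice, the cycle is: write a failing test first, then write just enough code to make it pass. A toy sketch (cube() is a made-up example, not from the maths project above):

```python
# 1. Write the test first. Running it at this point fails with a
#    NameError, because cube() doesn't exist yet.
def test_cube():
    assert cube(3) == 27

# 2. Now write the simplest code that makes the test pass.
def cube(x):
    return x ** 3

test_cube()  # passes silently
```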

By spending a bit of extra effort to prevent bugs down the road, you will get to where you want to go faster.

That’s it for volume 0! Watch out for the next post: ensuring your tests are thorough by measuring test coverage.

Use pmset to restore your Mac’s instant-wake after upgrading to Yosemite

I just upgraded to Mac OS X Yosemite Beta 3, and sure enough, Apple has once again broken my time-to-deep-sleep setting. After I upgraded, I noticed that my computer took a few seconds to wake from sleep, rather than the usual instant-on that defined OS X for so long. I checked pmset, Mac’s power management command-line utility:

$ pmset -g
Active Profiles:
Battery Power       -1*
AC Power        -1
Currently in use:
 standbydelay         4200
 standby              1
 halfdim              1
 hibernatefile        /var/vm/sleepimage
 gpuswitch            2
 darkwakes            0
 disksleep            10
 sleep                0
 autopoweroffdelay    14400
 hibernatemode        3
 autopoweroff         1
 ttyskeepawake        1
 displaysleep         10
 acwake               0
 lidwake              1

Sure enough, the standbydelay setting, the time until the computer switches from “sleep” to “deep sleep”, is back to 4,200 seconds (1 hour 10 minutes), not the 24 hours that I had set it to last year.

So, as before, if you want to speed up your wake up time in everyday use of your laptop, run the following command in the Terminal:

sudo pmset -a standbydelay 86400

Check in for a new post come next upgrade! (Mammoth?)

Read microscopy images to numpy arrays with python-bioformats

The python-bioformats library lets you seamlessly read microscopy images into numpy arrays from pretty much any file format.

I recently explained how to use Fiji’s Jython interpreter to open BioFormats images, do some processing, and save the result to a standard format such as TIFF. Since then, the
CellProfiler team has released an incredible tool, the python-bioformats library. It uses the Java Native Interface (JNI) to access the Java BioFormats library and give you back a numpy array within Python. In other words, it’s magic.

Some of the stuff I was doing in the Jython interpreter was not going to fly for a 400GB image file produced by Leica (namely, setting the flag setOpenAllSeries(True)). This file contained 3D images of multiple zebrafish embryos, obtained every 30 minutes for three days. I needed to process each image sequentially.

The first problem was that even reading the metadata from the file resulted in a Java out-of-memory error! With the help of Lee Kamentsky, one of the creators of python-bioformats, I figured out that Java allocates a maximum memory footprint of just 256MB. With the raw metadata string occupying 27MB, this was not enough to contain the full structure of the parsed metadata tree. The solution was simply to set a much larger maximum memory allocation to the JVM:

import javabridge as jv, bioformats as bf
jv.start_vm(class_path=bf.JARS, max_heap_size='8G')

Once that was done, it was straightforward enough to get an XML string of metadata to get information about all the images contained in the file:

md = bf.get_omexml_metadata('Long time Gfap 260314.lif')

This was my first experience with XML, and I found the official schema extremely intimidating. Thankfully, as usual, Python makes common things easy, and parsing XML is a very common thing. Using the xml package from the Python standard library, it is easy enough to get the information I want from the XML string:

from xml.etree import ElementTree as ETree
mdroot = ETree.fromstring(md)

You might have gathered that the ElementTree object contains a rooted tree with all the information about our file. Each node has a tag, attributes, and children. The root contains information about each series:

>>> print(mdroot[300].tag, mdroot[300].attrib)
('{}Image', {'ID': 'Image:121', 'Name': '41h to 47.5 hpSCI/Pos021_S001'})

And each series contains information about its acquisition, physical measurements, and pixel measurements:

>>> for a in mdroot[300]:
...     print((a.tag, a.attrib))
('{}AcquiredDate', {}),
('{}InstrumentRef', {'ID': 'Instrument:121'}),
('{}ObjectiveSettings', {'RefractiveIndex': '1.33', 'ID': 'Objective:121:0'}),
('{}Pixels', {'SizeT': '14', 'DimensionOrder': 'XYCZT', 'PhysicalSizeY': '0.445197265625', 'PhysicalSizeX': '0.445197265625', 'PhysicalSizeZ': '1.9912714979001302', 'SizeX': '1024', 'SizeY': '1024', 'SizeZ': '108', 'SizeC': '2', 'Type': 'uint8', 'ID': 'Pixels:121'})

I only need a fraction of this metadata, so I wrote a function, parse_xml_metadata, to parse out the image names, their size in pixels, and their physical resolution.
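
The post doesn’t show parse_xml_metadata itself, but a minimal sketch is easy to write from the XML structure above. (This is my guess at what such a parser looks like, not the actual lesion.lifio implementation; the return layout matches the TZYXC order used below.)

```python
from xml.etree import ElementTree as ETree

def parse_xml_metadata(xml_string):
    """Return image names, sizes (t, z, y, x, c), and physical
    resolutions (z, y, x) parsed from an OME-XML metadata string."""
    names, sizes, resolutions = [], [], []
    root = ETree.fromstring(xml_string)
    for child in root:
        # OME-XML tags carry a namespace prefix in braces, so we
        # match on the local tag name only.
        if child.tag.endswith('Image'):
            names.append(child.attrib['Name'])
            for grandchild in child:
                if grandchild.tag.endswith('Pixels'):
                    att = grandchild.attrib
                    sizes.append(tuple(int(att['Size' + c])
                                       for c in 'TZYXC'))
                    resolutions.append(tuple(float(att['PhysicalSize' + c])
                                             for c in 'ZYX'))
    return names, sizes, resolutions
```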

Armed with this knowledge, it is then straightforward to preallocate a numpy array for each image and read the image from disk:

from __future__ import division
from matplotlib import pyplot as plt, cm
from lesion.lifio import parse_xml_metadata
import numpy as np

filename = 'Long time Gfap 260314.lif'
rdr = bf.ImageReader(filename, perform_init=True)
names, sizes, resolutions = parse_xml_metadata(md)
idx = 50  # arbitrary series for demonstration
size = sizes[idx]
nt, nz = size[:2]
image5d = np.empty(size, np.uint8)
for t in range(nt):
    for z in range(nz):
        image5d[t, z] =, t=t, series=idx, rescale=False)
plt.imshow(image5d[nt//2, nz//2, :, :, 0], cmap=cm.gray)

2D slice from 5D volume

Boom. Using Python BioFormats, I’ve read in a small(ish) part of a quasi-terabyte image file into a numpy array, ready for further processing.

Note: the dimension order here is time, z, y, x, channel, or TZYXC, which I think is the most efficient way to read these files in. My wrapper allows arbitrary dimension order, so it’ll be good to use it to figure out the fastest way to iterate through the volume.

In my case, I’m looking to extract statistics using scikit-image’s profile_line function, and plot their evolution over time. Here’s the min/max intensity profile along the embryo for a sample stack:

min/max vs time

I still need to clean up the code, in particular to detect bad images (no prizes for guessing which timepoint was bad here), but my point for now is that, thanks to Python BioFormats, doing your entire bioimage analysis in Python just got a heck of a lot easier.

SciPy 2014: an extremely useful conference with a diversity problem

I just got back home from the SciPy 2014 conference in Austin, TX. Here are my thoughts after this year’s conference.

About SciPy in general

Since my first SciPy in 2012, I’ve decided to prioritise programming conferences over scientific ones, because the value I get for my time is just that much higher. At a scientific conference, I certainly become aware of way-cool stuff going on in other research labs in my area. Once I get home, however, I go back to whatever I was doing. In contrast, at programming conferences, I become aware of new tools and practices that change the way I do science. In his keynote this year, Greg Wilson said of Software Carpentry, “We save researchers a day a week for the rest of their careers.” I feel the same way about SciPy in general.

In the 2012 sprints, I learned about GitHub Pull Requests and code review, the lingua franca of open source development today. I can’t express how useful that’s been. I also started my ongoing collaboration with the scikit-image project, which has enabled my research to reach far more users than I ever could have achieved on my own.

No scientific conference I’ve been to has had such an impact on my science output, nor can I imagine one doing so.

This year’s highlights

This year was no different. Without further ado, here are my top hits from this year’s conference:

  • Michael Droettboom talked about his continuous benchmarking project, Airspeed Velocity. It is hilariously named and incredibly useful. asv checks out code from your Git repo at regular intervals and runs benchmarks (which you define), and plots your code’s performance over time. It’s an incredible guard against feature creep slowing down your code base.
  • IPython recently unveiled their modal version 2.0 interface, sending vimmers worldwide rejoicing. But a few key bindings are just wrong from a vim perspective. Most egregiously, i, which should enter edit mode, interrupts the kernel when pressed twice! That’s just evil. Paul Ivanov goes all in with vim keybindings with his hilarious and life-changing IPython vimception talk. The title is more appropriate than I realised. Must-watch.
  • Damián Avila revealed (heh) his live IPython presentations with Reveal.js, forever changing how Python programmers present their work. I was actually aware of this before the conference and used it in my own talk, but you should definitely watch his talk and get the extension from his repo.
  • Min RK gave an excellent tutorial on IPython parallel (repo, videos 1, 2, 3). It’s remarkable what the IPython team have achieved thanks to their decoupling of the interactive shell and the computation “kernel”. (I still hate that word.)
  • Brian Granger and Jon Frederic gave an excellent tutorial on IPython interactive widgets (notebook here). They provide a simple and fast way to interactively explore your data. I’ve already started using these on my own problems.
  • Aaron Meurer gave the best introduction to the Python packaging problem that I’ve ever seen, and how Continuum’s conda project solves it. I think we still need an in-depth tutorial on how package distributors should use conda, but for users, conda is just awesome, and this talk explains why.
  • Matt Rocklin has a gift for crystal clear speaking, despite his incredible speed, and it was on full display in his (and Mark Wiebe’s) talk on Blaze, Continuum’s new array abstraction library. I’m not sure I’ll be using Blaze immediately but it’s certainly on my radar now!
  • Lightning talks are always fun: days 1, 2, 3. Watch out for Fernando Pérez’s announcement of Project Jupyter, the evolution of the IPython notebook, and for Damon McDougall’s riveting history of waffles. (You think I’m joking.)

Apologies if I’ve missed anyone: with three tracks, an added one with the World Cup matches ;) , and my own talk preparations, “overwhelming” does not begin to describe the conference! I will second Aaron Meurer’s assertion that there were no bad talks. Which brings us to…

On my to-watch

Jake Vanderplas recently wrote a series of blog posts (last one here, with links to earlier posts) comparing frequentist and Bayesian approaches to statistical analysis, in which he makes a compelling argument that we should all relegate frequentism to the dustbin of history. As such, I intend to go over Chris Fonnesbeck’s tutorial (2, 3) and talk about Bayesian analysis in Python using PyMC.

David Sanders also did a Julia tutorial (part 2) that was scheduled at the same time as my own scikit-image tutorial, but I’m interested to see how the Julia ecosystem is progressing, so that should be a good place to start. (Although I’m still bitter that they went with 1-based indexing!)

The reproducible science tutorial (part 2) generated quite a bit of buzz so I would be interested to go over that one as well.

For those interested in computing education or in geoscience, the conference had dedicated tracks for each of those, so you are bound to find something you like (not least, Lorena Barba’s and Greg Wilson’s keynotes). Have a look at the full listing of videos here. These might be easier to navigate by looking at the conference schedule.

The SciPy experience

I want to close this post with a few reflections on the conference itself.

SciPy is broken up into three “stages”: two days of tutorials, three days of talks, and two days of sprints. Above, I covered the tutorials and talks, but to me, the sprints are what truly distinguish SciPy. The spirit of collaboration is unparalleled, and an astonishing amount of value is generated in those two days, either in the shape of code, or in introducing newcomers to new projects and new ways to collaborate in programming.

My biggest regret of the conference was not giving a lightning talk urging people to come to the sprints. I repeatedly asked people whether they were coming to the sprints, and almost invariably the answer was that they didn’t feel they were good enough to contribute. To reiterate my previous statements: (1) when I attended my first sprint in 2012, I had never done a pull request; (2) sprints are an excellent way to introduce newcomers to projects and to the pull request development model. All the buzz around the sprints was how welcoming all of the teams were, but I think there is a massive number of missed opportunities because this is not made obvious to attendees before the sprints.

Lastly, a few notes on diversity. During the conference, April Wright, a student in evolutionary biology at UT Austin, wrote a heartbreaking blog post about how excluded she felt from a conference where only 15% of attendees were women. That particular incident was joyfully resolved, with plenty of SciPyers reaching out to April and inviting her along to sprints and other events. But it highlighted just how poorly we are doing in terms of diversity. Andy Terrel, one of the conference organisers, pointed out that 15% is much better than 2012’s three (women, not percent!), but (a) that is still extremely low, and (b) I was horrified to read this because I was there in 2012… And I did not notice that anything was wrong. How can it be, in 2012, that it can seem normal to be at a professional conference and have effectively zero women around? It doesn’t matter what one says about the background percentage of women in our industry and so on… Maybe SciPy is doing all it can about diversity. (Though I doubt it.) The point is that a scene like that should feel like one of those deserted cityscapes in post-apocalyptic movies. As long as it doesn’t, as long as SciPy feels normal, we will continue to have diversity problems. I hope my fellow SciPyers look at these numbers, feel appalled as I have, and try to improve.

… And on cue, while I was writing this post, Andy Terrel wrote a great post of his own about this very topic:

I still consider SciPy a fantastic conference. Jonathan Eisen (@phylogenomics), whom I admire, would undoubtedly boycott it because of the problems outlined above, but I am heartened that the organising committee is taking this as a serious problem and trying hard to fix it. I hope next time it is even better.

A clever use of SciPy’s ndimage.generic_filter for n-dimensional image processing

This year I am privileged to be a mentor in the Google Summer of Code for the scikit-image project, as part of the Python Software Foundation organisation. Our student, Vighnesh Birodkar, recently came up with a clever use of SciPy’s ndimage.generic_filter that is certainly worth sharing widely.

Vighnesh is tasked with implementing region adjacency graphs and graph based methods for image segmentation. He initially wrote specific functions for 2D and 3D images, and I suggested that he should merge them: either with n-dimensional code, or, at the very least, by making 2D a special case of 3D. He chose the former, and produced extremely elegant code. Three nested for loops and a large number of neighbour computations were replaced by a function call and a simple loop. Read on to find out how.

Iterating over an array of unknown dimension is not trivial a priori, but thankfully, someone else has already solved that problem: NumPy’s nditer and ndindex functions allow one to efficiently iterate through every point of an n-dimensional array. However, that still leaves the problem of finding neighbors, to determine which regions are adjacent to each other. Again, this is not trivial to do in nD.
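
For example, np.ndindex generates every coordinate of an array of any dimensionality with the same loop:

```python
import numpy as np

# ndindex yields every coordinate tuple of a shape, in array order,
# so one loop covers 2D, 3D, or any higher dimension.
coords = list(np.ndindex((2, 3)))
print(coords)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```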

scipy.ndimage provides a suitable function, generic_filter. Typically, a filter is used to iterate a “selector” (called a structuring element) over an array, compute some function of all the values covered by the structuring element, and replace the central value by the output of the function. For example, using the structuring element:

fp = np.array([[0, 1, 0],
               [1, 1, 1],
               [0, 1, 0]], np.uint8)

and the function np.median on a 2D image produces a median filter over a pixel’s immediate neighbors. That is,

import functools
import numpy as np
from scipy.ndimage import generic_filter

median_filter = functools.partial(generic_filter, function=np.median,
                                  footprint=fp)

Here, we don’t want to create an output array, but an output graph. What to do? As it turns out, Python’s pass-by-reference allowed Vighnesh to do this quite easily using the “extra_arguments” keyword to generic_filter: we can write a filter function that receives the graph and updates it when two distinct values are adjacent! generic_filter passes all values covered by a structuring element as a flat array, in the array order of the structuring element. So Vighnesh wrote the following function:

def _add_edge_filter(values, g):
    """Add an edge between first element in `values` and
    all other elements of `values` in the graph `g`.
    `values[0]` is expected to be the central value of
    the footprint used.

    Parameters
    ----------
    values : array
        The array to process.
    g : RAG
        The graph to add edges in.

    Returns
    -------
    0.0 : float
        Always returns 0.
    """
    values = values.astype(int)
    current = values[0]
    for value in values[1:]:
        g.add_edge(current, value)
    return 0.0

Then, using the footprint:

fp = np.array([[0, 0, 0],
               [0, 1, 1],
               [0, 1, 0]], np.uint8)

(or its n-dimensional analog), this filter is called along these lines on labels, the image containing the region labels:

from scipy import ndimage as ndi
ndi.generic_filter(labels, _add_edge_filter, footprint=fp,
                   mode='nearest', extra_arguments=(g,))
This is a rather unconventional use of generic_filter, which is normally used for its output array. Note how the return value of the filter function, _add_edge_filter, is just 0! In our case, the output array contains all 0s, but we use the filter exclusively for its side-effect: adding an edge to the graph g when there is more than one unique value in the footprint. That’s cool.
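
The trick is easy to reproduce end-to-end with a stand-in graph class (EdgeSet below is illustrative, not scikit-image’s actual RAG):

```python
import numpy as np
from scipy import ndimage as ndi

class EdgeSet:
    """Minimal stand-in for a graph: just records undirected edges."""
    def __init__(self):
        self.edges = set()

    def add_edge(self, u, v):
        if u != v:
            self.edges.add((min(u, v), max(u, v)))

def _add_edge_filter(values, graph):
    # values arrives as a flat float array: central pixel first,
    # then its footprint neighbours (in footprint array order).
    values = values.astype(int)
    for value in values[1:]:
        graph.add_edge(values[0], value)
    return 0.0  # the output array is ignored; we want only the side-effect

labels = np.array([[1, 1, 2],
                   [1, 2, 2],
                   [3, 3, 2]])
fp = np.array([[0, 0, 0],
               [0, 1, 1],
               [0, 1, 0]], np.uint8)
g = EdgeSet()
ndi.generic_filter(labels, _add_edge_filter, footprint=fp,
                   mode='nearest', extra_arguments=(g,))
edges = sorted((int(u), int(v)) for u, v in g.edges)
print(edges)  # → [(1, 2), (1, 3), (2, 3)]
```

Regions 1, 2, and 3 all touch each other in the label image, and exactly those three adjacencies come out of the filter.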

Continuing, in this first RAG implementation, Vighnesh wanted to segment according to average color, so he further needed to iterate over each pixel/voxel/hypervoxel and keep a running total of the color and the pixel count. He used elements in the graph node dictionary for this and updated them using ndindex:

for index in np.ndindex(labels.shape):
    current = labels[index]
    g.node[current]['pixel count'] += 1
    g.node[current]['total color'] += image[index]

Thus, together, numpy’s nditer, ndindex, and scipy.ndimage’s generic_filter provide a powerful way to perform a large variety of operations on n-dimensional arrays… Much larger than I’d realised!

You can see Vighnesh’s complete pull request here and follow his blog here.

If you use NumPy arrays and their massive bag of tricks, please cite the paper below!

Stéfan van der Walt, S. Chris Colbert, & Gaël Varoquaux (2011). The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), 22-30. arXiv: 1102.1523v1

Elsevier et al’s pricing douchebaggery exposed

Ted Bergstrom and a few colleagues have just come out with an epic paper in which they reveal how much for-profit academic publishing companies charge university libraries, numbers that had previously been kept secret. The paper is ostensibly in the field of economics, but it could be more accurately described as “sticking-it-to-the-man-ology”.

This paragraph in the footnotes was particularly concise and fun to read:

Elsevier contested our contract request from Washington State University on the grounds that their pricing policy was a trade secret, and brought suit against the university. The Superior Court judge ruled that Washington State University could release the contracts to us. Elsevier and Springer also contested our request for contracts from the University of Texas (UT) System. The Texas state attorney general opined that the UT System was required to release copies of all of these contracts.

In other words: in the interest of full disclosure, suck it, Elsevier!

The executive summary is that these companies will extort as much as they possibly can from individual universities, then do everything to keep that number a secret so that they can more freely extort even more from other universities. You can read more about it in Science.

Oh, and in the ultimate twist of irony, the paper itself is behind PNAS’s paywall. How much did your university pay for that?

Bergstrom, T., Courant, P., McAfee, R., & Williams, M. (2014). Evaluating big deal journal bundles Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.1403006111

An update on mixing Java and Python with Fiji

Two weeks ago I posted about invoking ImageJ functions from Python using Fiji’s Jython interpreter. A couple of updates on the topic:

First, I’ve made a repository with a template project encapsulating my tips from that post. It’s very simple to get a Fiji Jython script working from that template. As an example, here’s a script to evaluate segmentations using the metric used by the SNEMI3D segmentation challenge (a slightly modified version of the adapted Rand error).

Second, this entire discussion might be rendered obsolete by two incredible projects from the CellProfiler team: Python-Javabridge, which allows Python to interact seamlessly with Java code, and Python-Bioformats, which uses Python-Javabridge to read Bioformats images into Python. I have yet to play with them, but both look like cleaner alternatives to interact with ImageJ than my Jython scripting! At some point I’ll write a post exploring these tools, but if you get to it before me, please mention it in the comments!

Get the best of both worlds with Fiji’s Jython interpreter

Fiji is just ImageJ, with batteries included. It contains plugins to do virtually anything you would want to do to an image. Since my go-to programming language is Python, my favorite feature of Fiji is its language-agnostic API, which supports a plethora of languages, including Java, Javascript, Clojure, and of course Python; 7 languages in all. (Find these under Plugins/Scripting/Script Editor.) Read on to learn more about the ins and outs of using Python to drive Fiji.

Among the plugin smorgasbord of Fiji is the Bio-Formats importer, which can open any proprietary microscopy file under the sun. (And there’s a lot of them!) Below I will use Jython to open some .lifs, do some processing, and output some .pngs that I can process further using Python/NumPy/scikit-image. (A .lif is a Leica Image File, because there were not enough image file formats before Leica came along.)

The first thing to note is that Jython is not Python, and it is certainly not Python 2.7. In fact, the Fiji Jython interpreter implements Python 2.5, which means no argparse. Not to worry though, as argparse is implemented in a single, pure Python file distributed under the Python license. So:

Tip #1: copy into your project.

This way you’ll have access to the state of the art in command line argument processing from within the Jython interpreter.

To get Fiji to run your code, you simply feed it your source file on the command line. So, let’s try it out with a simple example,

import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=
                                     "Parrot back your arguments.")
    parser.add_argument('args', nargs="*", help="The input arguments.")
    args = parser.parse_args()
    for arg in args.args:
        print(arg)

Now we can just run this:

$ fiji hello world

But sadly, Fiji captures any -h calls, which defeats the purpose of using argparse in the first place!

$ fiji -h
Usage: /Applications/ [<Java options>.. --] [<ImageJ options>..] [<files>..]

Java options are passed to the Java Runtime, ImageJ
options to ImageJ (or Jython, JRuby, ...).

In addition, the following options are supported by ImageJ:
General options:
--help, -h
	show this help
	show the command line, but do not run anything
	verbose output

(… and so on, the output is quite huge.)

(Note also that I aliased the Fiji binary, that long path under /Applications, to a simple fiji command; I recommend you do the same.)

However, we can work around this by calling help using Python as the interpreter, and only using Fiji to actually run the file:

$ python -h
usage: [-h] [args [args ...]]

Parrot back your arguments.

positional arguments:
  args        The input arguments.

optional arguments:
  -h, --help  show this help message and exit

That’s more like it! Now we can start to build something a bit more interesting, for example, something that converts arbitrary image files to png:

import argparse
from ij import IJ # the IJ class has utility methods for many common tasks.

def convert_file(fn):
    """Convert the input file to png format.

    Parameters
    ----------
    fn : string
        The filename of the image to be converted.
    """
    imp = IJ.openImage(fn)
    # imp is the common name for an ImagePlus object,
    # ImageJ's base image class
    fnout = fn.rsplit('.', 1)[0] + '.png'
    IJ.saveAs(imp, 'png', fnout)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Convert TIFF to PNG.")
    parser.add_argument('images', nargs='+', help="Input images.")

    args = parser.parse_args()
    for fn in args.images:
        convert_file(fn)

Boom, we’re done. But wait, we actually broke the Python interpreter compatibility, since ij is not a Python library!

$ python -h
Traceback (most recent call last):
  File "", line 2, in <module>
    from ij import IJ # the IJ class has utility methods for many common tasks.
ImportError: No module named ij

Which brings us to:

Tip #2: only import Java API functions within the functions that use them.

By moving the from ij import IJ statement into the convert function, we maintain compatibility with Python, and can continue to use argparse’s helpful documentation strings.
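
The same pattern keeps any module importable under plain CPython when some functions need the Jython/ImageJ runtime; a minimal sketch (file names and messages illustrative):

```python
import argparse

def convert_file(fn):
    """Convert `fn` to PNG; requires the Jython/ImageJ runtime."""
    from ij import IJ  # deferred: only fails if conversion is attempted
    imp = IJ.openImage(fn)
    IJ.saveAs(imp, 'png', fn.rsplit('.', 1)[0] + '.png')

# Building the parser works under CPython, so `python script -h` works:
parser = argparse.ArgumentParser(description="Convert images to PNG.")
parser.add_argument('images', nargs='+', help="Input images.")
```

Under CPython, `parser.parse_args(['-h'])` prints help normally; only actually calling convert_file() raises ImportError.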

Next, we want to use the Bio-Formats importer, which is class BF in loci.plugins. Figuring out the class hierarchy for arbitrary plugins is tricky, but you can find it here for core ImageJ (using lovely 1990s-style frames) and here for Bio-Formats, and Curtis Rueden has made this list for other common plugins.

When you try to open a file with Bio-Formats importer using the Fiji GUI, you get the following dialog:

BioFormats import window
BioFormats import window

That’s a lot of options, and we actually want to set some of them. If you look at the BF.openImagePlus documentation, you can see that this is done through an ImporterOptions class located in You’ll notice that “in” is a reserved word in Python, so from import ImporterOptions is not a valid Python statement. Yay! My workaround:

Tip #3: move your Fiji imports to an external file.

So I have a file with just:

from ij import IJ
from loci.plugins import BF
from import ImporterOptions

Then, inside the convert_files() function, we just do:

from jython_imports import IJ, BF, ImporterOptions

This way, the main file remains Python-compatible until the convert() function is actually called, regardless of whatever funky and unpythonic stuff is happening in

Onto the options. If you untick “Open files individually”, it will open up all matching files in a directory, regardless of your input filename! Not good. So now we play a pattern-matching game in which we match the option description in the above dialog with the ImporterOptions API calls. In this case, we setUngroupFiles(True). To specify a filename, we setId(filename). Additionally, because we want all of the images in the .lif file, we setOpenAllSeries(True).

Next, each image in the series is 3D and has three channels, but we are only interested in a summed z-projection of the first channel. There’s a set of ImporterOptions methods tantalizingly named setCBegin, setCEnd, and setCStep, but this is where I found the documentation sorely lacking. The functions take (int s, int value) as arguments, but what’s s??? Are the limits closed or open? Code review is a wonderful thing, and this would not have passed it. To figure things out:

Tip #4: use Fiji’s interactive Jython interpreter to figure things out quickly.

You can find the Jython interpreter under Plugins/Scripting/Jython Interpreter. It’s no IPython, but it is extremely helpful to answer the questions I had above. My hypothesis was that s was the series, and that the intervals would be closed. So:

>>> from loci.plugins import BF
>>> from loci.plugins.in import ImporterOptions
>>> opts = ImporterOptions()
>>> opts.setId("myFile.lif")
>>> opts.setOpenAllSeries(True)
>>> opts.setUngroupFiles(True)
>>> imps = BF.openImagePlus(opts)

Now we can play around, with one slight annoyance: the interpreter won’t print the output of your last statement, so you have to specify it:

>>> len(imps)
>>> print(len(imps))
18

Which is what I expected, as there are 18 series in my .lif file. The image shape is given by the getDimensions() method of the ImagePlus class:

>>> print(imps[0].getDimensions())
array('i', [1024, 1024, 3, 31, 1])

>>> print(imps[1].getDimensions())
array('i', [1024, 1024, 3, 34, 1])

That’s (x, y, channels, z, time).
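That 5-element array is easy to misread, so a tiny helper to label the dimensions can save some squinting. This is a hypothetical convenience function of my own, not part of the ImageJ API, but it is plain Python and so runs in the Jython interpreter too:

```python
def dims_dict(dims):
    # ImagePlus.getDimensions() returns (x, y, channels, z, time).
    x, y, c, z, t = dims
    return {'x': x, 'y': y, 'channels': c, 'z': z, 't': t}

print(dims_dict([1024, 1024, 3, 31, 1]))
# {'x': 1024, 'y': 1024, 'channels': 3, 'z': 31, 't': 1}
```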

Now, let’s try the same thing with setCEnd, assuming closed interval:

>>> opts.setCEnd(0, 0) ## only read channels up to 0 for series 0?
>>> opts.setCEnd(2, 0) ## only read channels up to 0 for series 2?
>>> imps = BF.openImagePlus(opts)
>>> print(imps[0].getDimensions())
array('i', [1024, 1024, 1, 31, 1])

>>> print(imps[1].getDimensions())
array('i', [1024, 1024, 3, 34, 1])

>>> print(imps[2].getDimensions())
array('i', [1024, 1024, 1, 30, 1])

Nothing there to disprove my hypothesis! So we move on to the final step, which is to z-project the stack by summing the intensity over all z values. This is normally accessed via Image/Stacks/Z Project in the Fiji GUI, and I found the corresponding ij.plugin.ZProjector class by searching for “proj” in the ImageJ documentation. A ZProjector object has a setMethod method that usefully takes an int as an argument, with no explanation in its docstring as to which int translates to which method (sum, average, max, etc.). A little more digging in the source code reveals some class static variables, AVG_METHOD, MAX_METHOD, and so on.

Tip #5: don’t be afraid to look at the source code. It’s one of the main advantages of working in open-source.


>>> from ij.plugin import ZProjector
>>> proj = ZProjector()
>>> proj.setMethod(ZProjector.SUM_METHOD)
>>> proj.setImage(imps[0])
>>> proj.doProjection()
>>> impout = proj.getProjection()
>>> print(impout.getDimensions())
array('i', [1024, 1024, 1, 1, 1])

The output is actually a float-typed image, which will get rescaled to [0, 255] uint8 on save if we don’t fix it. So, to wrap up, we convert the image to 16 bits (making sure to turn off scaling), use the series title to generate a unique filename, and save as a PNG:

>>> from ij.process import ImageConverter
>>> ImageConverter.setDoScaling(False)
>>> conv = ImageConverter(impout)
>>> conv.convertToGray16()
>>> title = imps[0].getTitle().rsplit(" ", 1)[-1]
>>> IJ.saveAs(impout, 'png', "myFile-" + title + ".png")
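The title-to-filename step is plain Python, so it can be checked outside Fiji. The series title string below is just a plausible example of what Bio-Formats might produce, not output I am quoting:

```python
# Take the last whitespace-separated token of the series title
# to build a unique output name per series.
series_title = "myFile.lif - Series 5"  # hypothetical example title
suffix = series_title.rsplit(" ", 1)[-1]
out_name = "myFile-" + suffix + ".png"
print(out_name)  # myFile-5.png
```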

You can see the final result of my sleuthing in the project repository. If you would do something differently, pull requests are always welcome.

Before I sign off, let me recap my tips:

1. copy into your project;

2. only import Java API functions within the functions that use them;

3. move your Fiji imports to an external file;

4. use Fiji’s interactive Jython interpreter to figure things out quickly; and

5. don’t be afraid to look at the source code.

And let me add a few final comments: once I started digging into all of Fiji’s plugins, I found documentation of very variable quality, and worse, virtually zero consistency between the interfaces to each plugin. Some work on “the currently active image”, some take an ImagePlus instance as input, and others still a filename or a directory name. Outputs are equally variable. This has been a huge pain when trying to work with these plugins.

But, on the flipside, this is the most complete collection of image processing functions anywhere. Along with the seamless access to all those functions from Jython and other languages, that makes Fiji very worthy of your attention.


This post was possible thanks to the help of Albert Cardona, Johannes Schindelin, Wayne Rasband, and Jan Eglinger, who restlessly respond to (it seems) every query on the ImageJ mailing list. Thanks!


Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, Tinevez JY, White DJ, Hartenstein V, Eliceiri K, Tomancak P, & Cardona A (2012). Fiji: an open-source platform for biological-image analysis. Nature methods, 9 (7), 676-82 PMID: 22743772

Linkert M, Rueden CT, Allan C, Burel JM, Moore W, Patterson A, Loranger B, Moore J, Neves C, Macdonald D, Tarkowska A, Sticco C, Hill E, Rossner M, Eliceiri KW, & Swedlow JR (2010). Metadata matters: access to image data in the real world. The Journal of cell biology, 189 (5), 777-82 PMID: 20513764

Best practices addendum: find and follow the conventions of your programming community

The bioinformatics community is all atwitter about the recent PLOS Biology article, Best Practices for Scientific Computing. Its main points should be obvious to most quasi-experienced programmers, but I can certainly remember a time when they did not seem so obvious to me (last week I think). As such, it’s a valuable addition to the written record on scientific computing. One of their code snippets, however, is pretty annoying:

def scan(op, values, seed=None):
# Apply a binary operator cumulatively to the values given
# from lowest to highest, returning a list of results.
# For example, if "op" is "add" and "values" is "[1,3,5]",
# the result is "[1, 4, 9]" (i.e., the running total of the
# given values). The result always has the same length as
# the input.
# If "seed" is given, the result is initialized with that
# value instead of with the first item in "values", and
# the final item is omitted from the result.
# Ex : scan(add, [1, 3, 5] , seed=10)
# produces [10, 11, 14]

First, this code ignores the article’s own advice, (1b) make names consistent, distinctive, and meaningful.  I would argue that “scan” here is neither distinctive (many other operations could be called “scan”) nor meaningful (the function purpose is not at all clear from the name). My suggestion would be “cumulative_reduce”.

It also does not address another important piece of advice that I would add to their list, maybe as (1d): Find out, and follow, the conventions of the programming community you’re joining. This will allow others to use and assess your code more readily, and you to contribute to other code libraries more easily. Here, although they have made efforts to make their advice language-agnostic, the authors have chosen Python to illustrate their point. Python happens to have strong style and documentation prescriptions in the form of Python Enhancement Proposals PEP-8: Style Guide for Python Code and PEP-257: Docstring conventions. Following PEP-8 and PEP-257, the above comments become an actual docstring (which is attached to the function automatically by documentation-generating tools):

def cumulative_reduce(op, values, seed=None):
    """Apply a binary operator cumulatively to the values given.

    The operator is applied from left to right.

    For example, if "op" is "add" and "values" is "[1,3,5]",
    the result is "[1, 4, 9]" (i.e., the running total of the
    given values). The result always has the same length as
    the input.

    If "seed" is given, the result is initialized with that
    value instead of with the first item in "values", and
    the final item is omitted from the result.
    Ex: cumulative_reduce(add, [1, 3, 5], seed=10)
    produces [10, 11, 14]
    """

In addition, the Scientific Python community in particular has adopted a few docstring conventions of its own, including the NumPy docstring conventions, which divide the docstring into meaningful sections using reStructuredText, and the doctest convention for formatting examples, so that the documentation doubles as a set of unit tests. So, to further refine their example code:

def cumulative_reduce(op, values, seed=None):
    """Apply a binary operator cumulatively to the values given.

    The operator is applied from left to right.

    Parameters
    ----------
    op : binary function
        An operator taking as input two values of the type contained in
        `values` and returning a value of the same type.
    values : list
        The list of input values.
    seed : type contained in `values`, optional
        A seed to start the reduce operation.

    Returns
    -------
    reduced : list, same type as `values`
        The accumulated list.

    Examples
    --------
    >>> add = lambda x, y: x + y
    >>> cumulative_reduce(add, [1, 3, 5])
    [1, 4, 9]

    If "seed" is given, the result is initialized with that
    value instead of with the first item in "values", and
    the final item is omitted from the result.

    >>> cumulative_reduce(add, [1, 3, 5], seed=10)
    [10, 11, 14]
    """
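Neither version of the snippet shows a function body. For completeness, here is one implementation consistent with the documented behavior; it is my own sketch, not code from the paper (Python 3's itertools.accumulate(values, op) reproduces the no-seed case):

```python
def cumulative_reduce(op, values, seed=None):
    """Apply a binary operator cumulatively to the values given."""
    if seed is None:
        # Initialize with the first item; accumulate over the rest.
        result = [values[0]]
        rest = values[1:]
    else:
        # Initialize with the seed; the final item is omitted so the
        # result keeps the same length as the input.
        result = [seed]
        rest = values[:-1]
    for value in rest:
        result.append(op(result[-1], value))
    return result

add = lambda x, y: x + y
print(cumulative_reduce(add, [1, 3, 5]))           # [1, 4, 9]
print(cumulative_reduce(add, [1, 3, 5], seed=10))  # [10, 11, 14]
```

Running python -m doctest on a file containing the docstring above would then execute both examples as tests.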

Obviously, these conventions are specific to scientific Python. But the key is that other communities will have their own, and you should find out what those conventions are and adopt them. When in Rome, do as the Romans do. It’s actually taken me quite a few years of scientific programming to realise this (and internalise it). I hope this post will help someone get up to speed more quickly than I have.

(Incidentally, the WordPress/Chrome/OSX spell checker doesn’t bat an eye at “atwitter”. That’s awesome.)


Greg Wilson, DA Aruliah, C Titus Brown, Neil P Chue Hong, Matt Davis, Richard T Guy, Steven HD Haddock, Kathryn D Huff, Ian M Mitchell, Mark D Plumbley, Ben Waugh, Ethan P White, & Paul Wilson (2014). Best Practices for Scientific Computing PLoS Biol, 12 (1) DOI: 10.1371/journal.pbio.1001745

Our environmental future

Another link post, to a worthwhile article by Veronique Greenwood for Aeon (emphases mine):

For much of the thousands of years of human existence, our species has treated the world more or less as an open system. [...] the general faith was that there were, say, more whales somewhere [...] more trees somewhere [...]. Even today, in the face of imminent climate change, we continue to function as though there’s more atmosphere somewhere, ready to whisk off our waste to someplace else. It is time, though, to think of the world as a closed system. When you look at the resources involved in maintaining even a single member of a developed society, it’s hard to avoid the knowledge that this cannot continue. Last year, Tim De Chant, an American journalist who runs the blog Per Square Mile, made striking depictions of the space required if everyone in the world lived like the inhabitants of a number of countries. If we all lived like Americans, even four planet Earths would not be enough.

The article does suggest, however, that a change of mindset will push us to inventive solutions to our environmental problems. I hope she’s right.