
Get the best of both worlds with Fiji's Jython interpreter

Fiji is just ImageJ, with batteries included. It contains plugins to do virtually anything you would want to do to an image. Since my go-to programming language is Python, my favorite feature of Fiji is its language-agnostic API, which supports a plethora of languages, including Java, JavaScript, Clojure, and of course Python (seven languages in all). (Find these under Plugins/Scripting/Script Editor.) Read on to learn more about the ins and outs of using Python to drive Fiji.

Among Fiji's smorgasbord of plugins is the Bio-Formats importer, which can open any proprietary microscopy file under the sun. (And there are a lot of them!) Below I will use Jython to open some .lifs, do some processing, and output some .pngs that I can process further using Python/NumPy/scikit-image. (A .lif is a Leica Image File, because there were not enough image file formats before Leica came along.)
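To give you a taste, here is a minimal Jython sketch of that workflow. The file paths are placeholders, but BF.openImagePlus and IJ.saveAs are the standard Bio-Formats and ImageJ calls:

from loci.plugins import BF
from ij import IJ

# Bio-Formats returns one ImagePlus per series in the .lif file
imps = BF.openImagePlus("/path/to/experiment.lif")
for i, imp in enumerate(imps):
    # ...do some processing on imp here...
    IJ.saveAs(imp, "PNG", "/path/to/output-series-%d.png" % i)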

Read more…

Best practices addendum: find and follow the conventions of your programming community

The bioinformatics community is all atwitter about the recent PLOS Biology article, Best Practices for Scientific Computing. Its main points should be obvious to most quasi-experienced programmers, but I can certainly remember a time when they did not seem so obvious to me (last week I think). As such, it's a valuable addition to the written record on scientific computing. One of their code snippets, however, is pretty annoying:

def scan(op, values, seed=None):
    # Apply a binary operator cumulatively to the values given
    # from lowest to highest, returning a list of results.
    # For example, if "op" is "add" and "values" is "[1,3,5]",
    # the result is "[1, 4, 9]" (i.e., the running total of the
    # given values). The result always has the same length as
    # the input.
    # If "seed" is given, the result is initialized with that
    # value instead of with the first item in "values", and
    # the final item is omitted from the result.
    # Ex: scan(add, [1, 3, 5], seed=10)
    # produces [10, 11, 14]
    ...implementation...

First, this code ignores the article's own advice, "(1b) make names consistent, distinctive, and meaningful." I would argue that "scan" here is neither distinctive (many other operations could be called "scan") nor meaningful (the function's purpose is not at all clear from the name). My suggestion would be "cumulative_reduce".
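For concreteness, here is one way the renamed function might look. This implementation is my own sketch of the behavior the comments describe, not code from the article:

def cumulative_reduce(op, values, seed=None):
    # Start from the seed if given, otherwise from the first value;
    # when seeded, the final input item is omitted so that the output
    # always has the same length as the input.
    if seed is None:
        result, rest = [values[0]], values[1:]
    else:
        result, rest = [seed], values[:-1]
    for v in rest:
        result.append(op(result[-1], v))
    return result

With operator.add, cumulative_reduce(add, [1, 3, 5]) gives [1, 4, 9] and cumulative_reduce(add, [1, 3, 5], seed=10) gives [10, 11, 14], matching the examples above.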

Read more…

Our environmental future

Another link post, to a worthwhile article by Veronique Greenwood for Aeon (emphases mine):

For much of the thousands of years of human existence, our species has treated the world more or less as an open system. [...] the general faith was that there were, say, more whales somewhere [...] more trees somewhere [...]. Even today, in the face of imminent climate change, we continue to function as though there’s more atmosphere somewhere, ready to whisk off our waste to someplace else. It is time, though, to think of the world as a closed system. When you look at the resources involved in maintaining even a single member of a developed society, it’s hard to avoid the knowledge that this cannot continue. Last year, Tim De Chant, an American journalist who runs the blog Per Square Mile, made striking depictions of the space required if everyone in the world lived like the inhabitants of a number of countries. If we all lived like Americans, even four planet Earths would not be enough.

The article does suggest, however, that a change of mindset will push us to inventive solutions to our environmental problems. I hope she's right.

Speed up your Mac's wake-up time using pmset. Do it again after upgrading to Mavericks

Last year I got a 15" Retina MacBook Pro, an excellent machine. However, it was taking way longer than my 13" MBP to wake up from sleep. After a few months of just accepting it as a flaw of the new machines and the cost of being an early adopter, I finally decided to look into the problem. Sure enough, I came across this excellent post from OS X Daily:

Is Your Mac Slow to Wake from Sleep? Try this pmset Workaround

Oooh, sweet goodness: basically, after 1h10min asleep, your Mac goes into a "deep sleep" mode that dumps the contents of RAM into your HDD/SSD and powers off the RAM. On wake, it needs to load up all the RAM contents again. This is slow when your machine has 16GB of RAM! Thankfully, you can make your Mac wait any amount of time before going into deep sleep. This will eat up your battery a bit more, but it's worth it. Just type this into the Terminal:

sudo pmset -a standbydelay 86400

This changes the time to deep sleep to 24h. Since I rarely spend more than 24h without using my computer, I now have instant-on every time I open up my laptop!
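If you want to verify the change (or see what your machine is currently set to), pmset can report its active settings; piping through grep is just my way of narrowing the output:

pmset -g | grep standbydelay

And as the title of this post suggests, an OS upgrade like Mavericks can reset this value, so it's worth running that check again afterwards.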

Read more…

OSX software watch: use Photosweeper to remove duplicates in your image collection

It's no secret that the photo management problem is a huge mess. As new cameras, software, and online storage and sharing services come and go, our collections end up strewn all over the place, often in duplicate. This eats up precious storage space and makes finding that one photo an exercise in frustration.

Peter Nixey has an excellent post on the disappointing state of affairs (to put it kindly) and an excellent follow-up on how Dropbox could fix it. You should definitely read those.

But, while Apple and/or Dropbox get their act together (I'm not holding my breath), you have to make sense of your photos in your Pictures folder, in your Dropbox Photos folder, in various other Dropbox shared folders, on your Desktop, in your Lightroom, Aperture, and iPhoto collections, and so on. A lot of these might be duplicated because, for example, you were just trying out Lightroom and didn't want to commit to it so you put your pics there but also in Aperture. And by you I mean I.

So, the first step to photo sanity is to get rid of these duplicates. Thankfully, there is an excellent OSX app called Photosweeper made for just this purpose. I used it yesterday to clear 34GB of wasted space on my HDD. (I was too excited to take screenshots of the process, unfortunately!)

There's a lot to love about Photosweeper. First, it is happy to look at all the sources I mentioned above, and compare pics across them. Second, it lets you automatically define a priority for which version of a duplicate photo to save. In my case, I told it to keep iPhoto images first (since these are most likely to have ratings, captions, and so on), then Aperture, then whatever's on my HDD somewhere. If a duplicate was found within iPhoto, it should keep the most recent one.

But, third, what makes Photosweeper truly useful: it won't do a thing without letting you review everything, and it offers a great reviewing interface. It places duplicates side-by-side, marking which photo it will keep and which it will trash. Best of all, this view shows everything you need to make sure you're not deleting a high-res original in favour of the downscaled version you emailed your family: filename, date, resolution, DPI, and file size. Click on each file and the full path (even within an iPhoto or Aperture library) becomes visible. This is in stark contrast to iPhoto's lame "hey, this is a duplicate file" dialog that shows you two downscaled versions of the images with no further information.

Once you bite the bullet, it does exactly the right thing with every duplicate: iPhoto duplicates get put in the iPhoto Trash, Lightroom duplicates get marked "Rejected" and put in a special "Trash (Photosweeper)" collection, and filesystem duplicates get moved to the OSX Trash. Lesser software might have moved all the iPhoto files to the OSX Trash, leaving the iPhoto library broken.

In all, I was really impressed with Photosweeper. 34GB is nothing to sniff at and getting rid of those duplicates is the first step to consolidating all my files. It does this in a very accountable, safe way. At no point did I get that sinking feeling of "there is no undo."

Finally, I should mention that Photosweeper also has a "photo similarity" mode that finds not only duplicates, but very similar series of photos. This is really good for when you snapped 15 pics of the same thing so that one might turn out ok. But I'm too much of a digital hoarder to take that step!

Photosweeper currently sells for $10 on the Mac App Store.

All journals should require authors to publish their raw data

This is just a link post. The excellent and excellently-named Data Colada blog has a brilliant analysis of scientific fraud exposed by the raw data. Figures can obscure flaws that are immediately obvious in the numbers. (Although, Matt Terry's awesome and hilarious Yoink might alleviate this.) In this case, the giveaways were averages of four numbers turning out to be integers every single time, and two independent experiments giving almost exactly the same distribution of values. (Frankly, if you can't simulate random sampling from an underlying distribution, you don't belong in the fraud world!)
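To see why the integer averages are such a red flag, here is a toy simulation. The uniform integer measurements are my own assumed model, not the actual data from the post:

import random

# How often is the mean of four integer-valued measurements an integer?
trials = 100000
hits = sum(1 for _ in range(trials)
           if sum(random.randint(1, 100) for _ in range(4)) % 4 == 0)
print("Integer means: %.1f%%" % (100.0 * hits / trials))  # roughly 25%

Under a model like this, you would expect only about a quarter of the means to land on integers, so seeing it happen every single time is vanishingly unlikely.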

The post demonstrates the importance of publishing as much data (and code) as possible with a paper. Words are fuzzy; data and code are precise.

See here for more.

Why PLOS ONE is no longer my default journal

Time-to-publication at the world's biggest scientific journal has grown dramatically, but the nail in the coffin was its poor production policies.

When PLOS ONE was announced in 2006, its charter immediately resonated with me. This would be the first journal where only scientific accuracy mattered. Judgments of "impact" and "interest" would be left to posterity, which is the right strategy when publishing is cheap and searching and filtering are easy. The whole endeavour would be a huge boon to "in-between" scientists straddling established fields — such as bioinformaticians.

My first first-author paper, Joint Genome-Wide Profiling of miRNA and mRNA Expression in Alzheimer's Disease Cortex Reveals Altered miRNA Regulation, went through a fairly standard journal loop. We first submitted it to Genome Biology, which (editorially) deemed it uninteresting to a sufficiently broad readership; then to RNA, which (editorially) decided that our sample size was too small; and finally to PLOS ONE, where it went out to review. After a single revision loop, it was accepted for publication. It's been cited more than 15 times a year, which is modest but above the Journal Impact Factor for Genome Biology — which means that the editors made a bad call rejecting it outright. (I'm not bitter!)

Read more…