# A high-profile endorsement of F1000 Research

Michael Eisen speaking about F1000 Research to Nature:

“They are doing lots of things that PLOS should have done five years ago.”

I recently ranted about PLOS ONE (while still endorsing their mission) for this very reason. It’s good to know that the very top knows they need to adapt.

# Speed up your Mac’s wake up time using pmset. Do it again after upgrading to Mavericks

Last year I got a 15″ Retina MacBook Pro, an excellent machine. However, it was taking way longer than my 13″ MBP to wake up from sleep. After a few months of just accepting it as a flaw of the new machines and the cost of being an early adopter, I finally decided to look into the problem. Sure enough, I came across this excellent post from OS X Daily:

Is Your Mac Slow to Wake from Sleep? Try this pmset Workaround

Oooh, sweet goodness: basically, after 1h10min asleep, your Mac goes into a “deep sleep” mode that dumps the contents of RAM into your HDD/SSD and powers off the RAM. On wake, it needs to load up all the RAM contents again. This is slow when your machine has 16GB of RAM! Thankfully, you can make your Mac wait any amount of time before going into deep sleep. This will eat up your battery a bit more, but it’s worth it. Just type this into the Terminal:

sudo pmset -a standbydelay 86400

This changes the time to deep sleep to 24h (86400 seconds). Since I rarely spend more than 24h without using my computer, I now have instant-on every time I open up my laptop!

Finally, the reason I wrote this now: upgrading to Mavericks sneakily resets your standbydelay to 4200 seconds (the 1h10min mentioned above). (Or, at least, it did for me.) Just run the above command again and you’ll be set, at least until the next OS upgrade comes along!
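Since the reset is silent, a quick check after each OS upgrade can save some head-scratching. Here’s a minimal Python sketch that pulls standbydelay out of the text printed by `pmset -g`. I’m assuming that output lists one setting per line, as a name followed by a number; the function names and the SAMPLE text are made up for illustration and not copied from a real machine.

```python
DESIRED_DELAY = 86400  # seconds (24h), the value set by the command above

# Hypothetical excerpt of `pmset -g` output; the real layout is an assumption.
SAMPLE = """\
Currently in use:
 standby              1
 standbydelay         4200
 hibernatemode        3
"""


def parse_standbydelay(pmset_output):
    """Return standbydelay in seconds, or None if no such setting is listed."""
    for line in pmset_output.splitlines():
        fields = line.split()
        # Only accept an exact "standbydelay <digits>" pair.
        if len(fields) >= 2 and fields[0] == "standbydelay" and fields[1].isdigit():
            return int(fields[1])
    return None


def needs_reset(pmset_output, desired=DESIRED_DELAY):
    """True when standbydelay is listed and differs from the desired value."""
    current = parse_standbydelay(pmset_output)
    return current is not None and current != desired


print(parse_standbydelay(SAMPLE))  # → 4200
print(needs_reset(SAMPLE))         # → True
```

To use it for real, you’d feed it the captured output of `pmset -g` instead of SAMPLE and re-run the sudo command whenever `needs_reset` comes back True.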

Update: the original source of this tip appears to be a post from Erv Walter on his site, Ewal.net. It goes into a lot more detail about the origin of this sleep mode, which indeed did not exist when I bought my previous MacBook Pro.

# OSX software watch: use Photosweeper to remove duplicates in your image collection

It’s no secret that the photo management problem is a huge mess. As new cameras, software, and online storage and sharing services come and go, our collections end up strewn all over the place, often in duplicate. This eats up precious storage space and makes finding that one photo an exercise in frustration.

Peter Nixey has an excellent post on the disappointing state of affairs (to put it kindly) and an excellent follow-up on how Dropbox could fix it. You should definitely read those.

But, while Apple and/or Dropbox get their act together (I’m not holding my breath), you have to make sense of your photos in your Pictures folder, in your Dropbox Photos folder, in various other Dropbox shared folders, on your Desktop, in your Lightroom, Aperture, and iPhoto collections, and so on. A lot of these might be duplicated because, for example, you were just trying out Lightroom and didn’t want to commit to it so you put your pics there but also in Aperture. And by you I mean I.

So, the first step to photo sanity is to get rid of these duplicates. Thankfully, there is an excellent OSX app called Photosweeper made for just this purpose. I used it yesterday to clear 34GB of wasted space on my HDD. (I was too excited to take screenshots of the process, unfortunately!)

There’s a lot to love about Photosweeper. First, it is happy to look at all the sources I mentioned above, and compare pics across them. Second, it lets you automatically define a priority for which version of a duplicate photo to save. In my case, I told it to keep iPhoto images first (since these are most likely to have ratings, captions, and so on), then Aperture, then whatever’s on my HDD somewhere. If a duplicate was found within iPhoto, it should keep the most recent one.

But, third, what makes Photosweeper truly useful: it won’t do a thing without letting you review everything, and it offers a great reviewing interface. It places duplicates side-by-side, marking which photo it will keep and which it will trash. Best of all, this view shows everything you need to make sure you’re not deleting a high-res original in favour of the downscaled version you emailed your family: filename, date, resolution, DPI, and file size. Click on each file and the full path (even within an iPhoto or Aperture library) becomes visible. This is in stark contrast to iPhoto’s lame “hey, this is a duplicate file” dialog that shows you two downscaled versions of the images with no further information.

Once you bite the bullet, it does exactly the right thing with every duplicate: iPhoto duplicates get put in the iPhoto Trash, Lightroom duplicates get marked “Rejected” and put in a special “Trash (Photosweeper)” collection, and filesystem duplicates get moved to the OSX Trash. Lesser software might have moved all the iPhoto files to the OSX Trash, leaving the iPhoto library broken.

In all, I was really impressed with Photosweeper. 34GB is nothing to sniff at and getting rid of those duplicates is the first step to consolidating all my files. It does this in a very accountable, safe way. At no point did I get that sinking feeling of “there is no undo.”

Finally, I should mention that Photosweeper also has a “photo similarity” mode that finds not only duplicates, but very similar series of photos. This is really good for when you snapped 15 pics of the same thing so that one might turn out ok. But I’m too much of a digital hoarder to take that step!

Photosweeper currently sells for $10 on the Mac App Store.

# All journals should require authors to publish their raw data

This is just a link post. The excellent and excellently-named Data Colada blog has a brilliant analysis of scientific fraud exposed by the raw data. Figures can obscure flaws that are immediately obvious in the numbers. (Although, Matt Terry’s awesome and hilarious Yoink might alleviate this.) In this case, the giveaways were averages of four numbers turning out to be integers every single time, and two independent experiments giving almost exactly the same distribution of values. (Frankly, if you can’t simulate random sampling from an underlying distribution, you don’t belong in the fraud world!)

The post demonstrates the importance of publishing as much data (and code) as possible with a paper. Words are fuzzy; data and code are precise.

See here for more.

# Why PLOS ONE is no longer my default journal

Time-to-publication at the world’s biggest scientific journal has grown dramatically, but the nail in the coffin was its poor production policies.

When PLOS ONE was announced in 2006, its charter immediately resonated with me. This would be the first journal where only scientific accuracy mattered. Judgments of “impact” and “interest” would be left to posterity, which is the right strategy when publishing is cheap and searching and filtering are easy. The whole endeavour would be a huge boon to “in-between” scientists straddling established fields, such as bioinformaticians.

My first first-author paper, Joint Genome-Wide Profiling of miRNA and mRNA Expression in Alzheimer’s Disease Cortex Reveals Altered miRNA Regulation, went through a fairly standard journal loop. We first submitted it to Genome Biology, which (editorially) deemed it uninteresting to a sufficiently broad readership; then to RNA, which (editorially) decided that our sample size was too small; and finally to PLOS ONE, where it went out to review. After a single revision loop, it was accepted for publication. It’s been cited more than 15 times a year, which is modest but above the Journal Impact Factor for Genome Biology, which means that the editors made a bad call rejecting it outright. (I’m not bitter!)

Overall, it was a very positive first experience at PLOS. Time to acceptance was under 3 months, time to publication under 4. The reviewers were no less harsh than in my previous experiences, so I felt (and still feel) that the reputation of PLOS ONE as a “junk” journal was (is) highly undeserved. (Update: There’s been a big hullabaloo about a recent sting targeting open access journals with a fake paper. PLOS ONE came away unscathed. See also the take of Mike Eisen, co-founder of PLOS.) And the number of citations certainly vindicated PLOS ONE’s approach of ignoring apparent impact.

So, when looking for a home for my equally-awkward postdoc paper (not quite computer vision, not quite neuroscience), PLOS ONE was a natural first choice.

The first thing to go wrong was the time to publication, about 6 months. Still better than many top-tier journals, but no longer a crushing advantage. And it’s not just me: there’s been plenty of discussion about time-to-publication steadily increasing at PLOS ONE. But I was not too worried about the publication time, since I’d put my paper up on the arXiv (and revised it at each round of peer-review, so you can see the revision history there, but not on PLOS ONE).

But, after multiple rounds of review, the time came for production, at which point they messed up two things: they did not include my present address; and they messed up Figure 1, which is supposed to be a small, single-column, illustrative figure, and which they made page-width. The effect is almost comical, and my first impression seeing page 2 would be to think that the authors are trying to mask their incompetence with giant pictures. (We’re not, I swear!)

Figure 1 of our paper on arXiv (left) and PLOS ONE (right)

Both of these mistakes could have been avoided if PLOS ONE did not have a policy of not letting you see the camera-ready pdf before it is published, and of not allowing corrections to papers unless they are technical or scientific, regardless of fault. Not to mention they could have, you know, actually looked at the dimensions embedded in the submitted TIFFs. With a $1,300 publication fee, PLOS could afford to take a little bit of extra care with production. Both of the above policies are utterly unnecessary: the added cost of sending authors a production proof is close to nil, and keeping track of revisions on online publications is also trivial (see the 22-year-old arXiv for an example).

We scientists live and die by our papers. We don’t want the culmination of years of work to be marred by a silly, easily-fixed formatting error, ossified by an unwieldy bureaucracy. I’ve been an avid promoter of PLOS (and PLOS ONE in particular) over the past few years, but I’m sad to say that’s not where my next paper will end up.

Ultimately, PLOS ONE’s model, groundbreaking though it was, is already being supplanted by newcomers. PeerJ offers everything PLOS ONE does at a fraction of the cost, and further includes a preprint service and open peer-review. Ditto for F1000 Research, which in addition offers unlimited revisions (a topic close to my heart ;). And both use the excellent MathJAX to render mathematical formulas, unlike PLOS’s archaic use of embedded images. They get my vote for the journals of the future.

[Note: the views expressed herein are mine alone; no co-authors were ~~harmed~~ consulted in the writing of this blog post.]

References

Nunez-Iglesias J, Liu CC, Morgan TE, Finch CE, & Zhou XJ (2010). Joint genome-wide profiling of miRNA and mRNA expression in Alzheimer’s disease cortex reveals altered miRNA regulation. PLoS ONE, 5 (2). PMID: 20126538

Kravitz DJ, & Baker CI (2011). Toward a new model of scientific publishing: discussion and a proposal. Frontiers in Computational Neuroscience, 5. PMID: 22164143

Nunez-Iglesias J, Kennedy R, Parag T, Shi J, & Chklovskii DB (2013). Machine learning of hierarchical clustering to segment 2D and 3D images. arXiv: 1303.6163v3

Nunez-Iglesias J, Kennedy R, Parag T, Shi J, & Chklovskii DB (2013). Machine learning of hierarchical clustering to segment 2D and 3D images. PLoS ONE, 8 (8). PMID: 23977123

# Tesla makes a better place

No sooner do I berate Tesla for not supporting battery swapping than they go and announce battery swap stations! It’d be nice if they weren’t proprietary, but I’ll take it.

# A sad day in the fight against climate change

Apparently Better Place is preparing for bankruptcy. I wrote an optimistic post about Better Place years ago, when they were just about to launch. They created a swappable battery system for electric cars along with corresponding battery swap stations. In my opinion, these were the most credible cure for range anxiety in electric vehicles. Batteries take a long time to charge, even on Tesla’s Supercharger stations, which they ludicrously refer to as “super-quick” just because they can give you half a charge in half an hour. A swap in Better Place’s stations took two minutes.

Adoption of electric vehicles will remain minuscule until the range problem can be fixed. With transport accounting for about 20% of CO2 emissions worldwide, significant EV adoption would be a massive boon to the fight against climate change. And with Better Place out of the picture, that goal became just a little bit less real.

# Apparently I’m the only one excited about Sony’s new big e-ink tablet

Both The Verge and Techcrunch are quite negative about Sony’s big new device… So much so that they compelled me to write this post. Someone should give Sony some positive coverage! I think it’s an excellent idea, and if Sony needs some enthusiastic testers down under, they should send this thing my way. ;)

For those of you who haven’t seen it, Sony’s new device is dubbed “e-paper”, has a roomy 13.3″ e-ink-like display, and lets you write on said display with the included stylus.

Both of the above publications seem to think that everything is fine in tablet-land and that the iPad and company already serve whatever need Sony’s new tablet tries to fill. Well, I’m here to tell them that the iPad (et al.) sucks for some things. Three of those are: (1) taking handwritten notes, (2) reading (some) pdfs in full-page view, and (3) reading in full daylight. By the sound of it, Sony’s new tablet will excel at all three, and that has me excited.

After typing “up, enter” more times than I care to admit, I decided to figure out the automatic way. Here’s my bash one-liner to get automatic download resume in curl:

export ec=18; while [ $ec -eq 18 ]; do /usr/bin/curl -O -C - "http://www.example.com/a-big-archive.zip"; export ec=$?; done 
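If you’d rather cap the number of retries (so a server that never cooperates can’t keep the loop spinning forever), the same idea can be sketched in Python. This is only a rough equivalent of the one-liner: the function name `download_with_resume`, the `run` hook, and the `max_attempts` default are all made up for illustration.

```python
import subprocess

CURL_PARTIAL_FILE = 18  # curl's exit code when a transfer is cut short


def download_with_resume(url, run=None, max_attempts=50):
    """Re-run `curl -O -C - url` while it exits with code 18.

    `run` is a hypothetical hook that takes the argument list and returns
    an exit code; by default it shells out to curl. Returns the last exit code.
    """
    if run is None:
        run = lambda argv: subprocess.run(argv).returncode
    argv = ["curl", "-O", "-C", "-", url]
    exit_code = CURL_PARTIAL_FILE
    attempts = 0
    # Same condition as the bash loop, plus the retry cap.
    while exit_code == CURL_PARTIAL_FILE and attempts < max_attempts:
        exit_code = run(argv)
        attempts += 1
    return exit_code
```

Calling `download_with_resume("http://www.example.com/a-big-archive.zip")` would then behave like the bash loop, except that it gives up after 50 interrupted attempts and hands back the last exit code.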

Explanation: the exit code curl chucks when a download is interrupted is 18, and $? gives you the exit code of the last command in bash. So, while the exit code is 18, keep trying to download the file, maintaining the filename (-O) and resuming where the previous download left off (-C -, where the trailing dash tells curl to work out the resume offset by itself).

# h5cat: quickly preview HDF5 file contents from the command-line

As a first attempt at writing actually useful blog posts, I’ll publicise a small Python script I wrote to peek inside HDF5 files when HDFView is overkill. Sometimes you just want to know how many dimensions a stored array has, or its exact path within the HDF hierarchy.

The “codebase” is currently tiny enough that it all fits below:

    #!/usr/bin/env python
    # The __future__ import lets the print() calls work on Python 2 and 3.
    from __future__ import print_function

    import argparse
    import h5py
    from numpy import array

    # Options live in a parent parser so other scripts can reuse them.
    arguments = argparse.ArgumentParser(add_help=False)
    arggroup = arguments.add_argument_group('HDF5 cat options')
    arggroup.add_argument('-g', '--group', metavar='GROUP',
                          help='Preview only path given by GROUP')
    arggroup.add_argument('-v', '--verbose', action='store_true', default=False,
                          help='Include array printout.')

    if __name__ == '__main__':
        parser = argparse.ArgumentParser(
            description='Preview the contents of an HDF5 file',
            parents=[arguments]
        )
        parser.add_argument('fin', nargs='+', help='The input HDF5 files.')
        args = parser.parse_args()
        for fin in args.fin:
            print('>>>', fin)
            f = h5py.File(fin, 'r')
            if args.group is not None:
                groups = [args.group]
            else:
                # Collect the path of every group and dataset in the file.
                groups = []
                f.visit(groups.append)
            for g in groups:
                print('\n ', g)
                # Only datasets have a shape and dtype worth reporting.
                if isinstance(f[g], h5py.Dataset):
                    a = f[g]
                    print(' shape: ', a.shape, '\n type: ', a.dtype)
                    if args.verbose:
                        a = array(f[g])
                        print(a)

h5cat is available on GitHub under an MIT license. Here’s an example use case:

    $ h5cat -v -g vi single-channel-tr3-0-0.00.lzf.h5
    >>> single-channel-tr3-0-0.00.lzf.h5

      vi
     shape:  (3, 1)
     type:  float64
    [[ 0.        ]
     [ 0.06224902]
     [ 2.23062383]]