SciPy 2014: an extremely useful conference with a diversity problem

I just got back home from the SciPy 2014 conference in Austin, TX. Here are my thoughts after this year's conference.

About SciPy in general

Since my first SciPy in 2012, I've decided to prioritise programming conferences over scientific ones, because the value I get for my time is just that much higher. At a scientific conference, I certainly become aware of way-cool stuff going on in other research labs in my area. Once I get home, however, I go back to whatever I was doing. In contrast, at programming conferences, I become aware of new tools and practices that change the way I do science. In his keynote this year, Greg Wilson said of Software Carpentry, "We save researchers a day a week for the rest of their careers." I feel the same way about SciPy in general.

In the 2012 sprints, I learned about GitHub Pull Requests and code review, the lingua franca of open source development today. I can't express how useful that's been. I also started my ongoing collaboration with the scikit-image project, which has enabled my research to reach far more users than I ever could have achieved on my own.

No scientific conference I've been to has had such an impact on my science output, nor can I imagine one doing so.

This year's highlights

This year was no different. Without further ado, here are my top hits from this year's conference:

  • Michael Droettboom talked about his continuous benchmarking project, Airspeed Velocity. It is hilariously named and incredibly useful. asv checks out code from your Git repo at regular intervals, runs benchmarks that you define, and plots your code's performance over time. It's an incredible guard against feature creep slowing down your code base.
  • IPython recently unveiled their modal version 2.0 interface, sending vimmers worldwide rejoicing. But a few key bindings are just wrong from a vim perspective. Most egregiously, i, which should enter edit mode, interrupts the kernel when pressed twice! That's just evil. Paul Ivanov goes all in with vim keybindings with his hilarious and life-changing IPython vimception talk. The title is more appropriate than I realised. Must-watch.
  • Damián Avila revealed (heh) his live IPython presentations with Reveal.js, forever changing how Python programmers present their work. I was actually aware of this before the conference and used it in my own talk, but you should definitely watch his talk and get the extension from his repo.
  • Min RK gave an excellent tutorial on IPython parallel (repo, videos 1, 2, 3). It's remarkable what the IPython team have achieved thanks to their decoupling of the interactive shell and the computation "kernel". (I still hate that word.)
  • Brian Granger and Jon Frederic gave an excellent tutorial on IPython interactive widgets (notebook here). They provide a simple and fast way to interactively explore your data. I've already started using these on my own problems.
  • Aaron Meurer gave the best introduction to the Python packaging problem that I've ever seen, and how Continuum's conda project solves it. I think we still need an in-depth tutorial on how package distributors should use conda, but for users, conda is just awesome, and this talk explains why.
  • Matt Rocklin has a gift for crystal clear speaking, despite his incredible speed, and it was on full display in his (and Mark Wiebe's) talk on Blaze, Continuum's new array abstraction library. I'm not sure I'll be using Blaze immediately but it's certainly on my radar now!
  • Lightning talks are always fun: days 1, 2, 3. Watch out for Fernando Pérez's announcement of Project Jupyter, the evolution of the IPython notebook, and for Damon McDougall's riveting history of waffles. (You think I'm joking.)
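To make the Airspeed Velocity item above a bit more concrete, here is a minimal sketch of what a benchmark file might look like. It relies on asv's convention of timing methods whose names start with time_ and calling setup beforehand. The file name, class name, and workload are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
# benchmarks/benchmarks.py  (hypothetical file name and contents)
import random


class SortSuite:
    """Hypothetical suite: asv discovers and times methods named time_*."""

    def setup(self):
        # Runs before the timed methods, so building the data is not measured.
        rng = random.Random(0)  # fixed seed keeps runs comparable
        self.data = [rng.random() for _ in range(10000)]

    def time_sorted(self):
        # Only the body of a time_* method is timed.
        sorted(self.data)

    def time_sum(self):
        sum(self.data)
```

The class is plain Python, so it can also be exercised by hand (create a SortSuite, call setup, then call a time_ method) before handing it to asv.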

Apologies if I've missed anyone: with three tracks, an unofficial fourth for the World Cup matches ;), and my own talk preparations, "overwhelming" does not begin to describe the conference! I will second Aaron Meurer's assertion that there were no bad talks. Which brings us to...

On my to-watch list

Jake Vanderplas recently wrote a series of blog posts (last one here, with links to earlier posts) comparing frequentist and Bayesian approaches to statistical analysis, in which he makes a compelling argument that we should all relegate frequentism to the dustbin of history. As such, I intend to go over both Chris Fonnesbeck's tutorial (2, 3) and his talk on Bayesian analysis in Python using PyMC.

David Sanders also did a Julia tutorial (part 2) that was scheduled at the same time as my own scikit-image tutorial, but I'm interested to see how the Julia ecosystem is progressing, so that should be a good place to start. (Although I'm still bitter that they went with 1-based indexing!)

The reproducible science tutorial (part 2) generated quite a bit of buzz so I would be interested to go over that one as well.

For those interested in computing education or in geoscience, the conference had dedicated tracks for each of those, so you are bound to find something you like (not least, Lorena Barba's and Greg Wilson's keynotes). Have a look at the full listing of videos here. These might be easier to navigate by looking at the conference schedule.

The SciPy experience

I want to close this post with a few reflections on the conference itself.

SciPy is broken up into three "stages": two days of tutorials, three days of talks, and two days of sprints. Above, I covered the tutorials and talks, but to me, the sprints are what truly distinguish SciPy. The spirit of collaboration is unparalleled, and an astonishing amount of value is generated in those two days, either in the shape of code, or in introducing newcomers to new projects and new ways to collaborate in programming.

My biggest regret of the conference was not giving a lightning talk urging people to come to the sprints. I repeatedly asked people whether they were coming to the sprints, and almost invariably the answer was that they didn't feel they were good enough to contribute. To reiterate my previous statements: (1) when I attended my first sprint in 2012, I had never done a pull request; (2) sprints are an excellent way to introduce newcomers to projects and to the pull request development model. All the buzz around the sprints was how welcoming all of the teams were, but I think there is a massive number of missed opportunities because this is not made obvious to attendees before the sprints.

Lastly, a few notes on diversity. During the conference, April Wright, a student in evolutionary biology at UT Austin, wrote a heartbreaking blog post about how excluded she felt from a conference where only 15% of attendees were women. That particular incident was joyfully resolved, with plenty of SciPyers reaching out to April and inviting her along to sprints and other events. But it highlighted just how poorly we are doing in terms of diversity. Andy Terrel, one of the conference organisers, pointed out that 15% is much better than 2012's three (women, not percent!), but (a) that is still extremely low, and (b) I was horrified to read this because I was there in 2012... And I did not notice that anything was wrong. How can it be, in 2012, that it can seem normal to be at a professional conference and have effectively zero women around? It doesn't matter what one says about the background percentage of women in our industry and so on... Maybe SciPy is doing all it can about diversity. (Though I doubt it.) The point is that a scene like that should feel like one of those deserted cityscapes in post-apocalyptic movies. As long as it doesn't, as long as SciPy feels normal, we will continue to have diversity problems. I hope my fellow SciPyers look at these numbers, feel appalled as I have, and try to improve.

... And on cue, while I was writing this post, Andy Terrel wrote a great post of his own about this very topic: http://andy.terrel.us/blog/2014/07/17/

I still consider SciPy a fantastic conference. Jonathan Eisen (@phylogenomics), whom I admire, would undoubtedly boycott it because of the problems outlined above, but I am heartened that the organising committee is taking this as a serious problem and trying hard to fix it. I hope next time it is even better.
