Tag Archives: diversity

Calling out SciPy on diversity (even though it hurts)

Over the past few weeks, I’ve been heavily promoting the SciPy conference, a meeting about scientific programming in Python. I’ve been telling everyone who would listen that they should submit a talk abstract and go, because scientific programming is increasingly common in any scientist’s work and SciPy massively improves how you do that.

I have also been guiltily ommitting that the speaker and attendee diversity at SciPy is shockingly bad. Last year, for example, 15% of attendees were women, and that was an improvement over the ratio three years ago, when just 3% (!!!) were women.

I rationalised continuing to promote this conference because there was talk from past organisers about making efforts to improve. (And indeed, the past three years have been on an upward trajectory.)

A couple of days ago, however, the full list of keynote speakers was announced, and lo and behold, it’s three white guys. I have to acknowledge that they are extremely accomplished in the SciPy universe, and, if diversity were not more generally a problem at this conference and in tech in general, I wouldn’t bat an eye. Excellent choice of speakers, really. Looking forward to it.

But diversity is a problem. It’s an enormous problem. I’m inclined to call it catastrophic.

Let me try to quantify it. Men and women are equally capable scientific programmers. So out of a total pool of 100 potential SciPy attendees/contributors, 50 are women and 50 are men. Now, let’s assume the male side of the community is working at near-optimum capacity, so, 50 of 50 those men are at SciPy. 15% of attendees being women means just 9 of the 50 potential female contributors are making it out to the conference (9/59 ≈ 15%). Or, slice it another way, a whopping (50 – 9) / 50 = 82% of women who could be contributing to SciPy are missing.

Now, think of where we would be if we took 82% of male science-Pythonistas and just erased their talks, their discussions, and their code contributions. The SciPy ecosystem would suck. Yet that is exactly how many coders are missing from our community.

Now, we can sit here and play female conference speaker bingo until the cows come home to roost, but that is missing the point: these are all only excuses for not doing enough. “Not my fault” is not good enough anymore. It is everyone’s fault who does not make an active and prolonged effort to fix things.

The keynote speakers are an excellent place to make a difference, because you can eliminate all sorts of confounders. I have a certain sympathy, for example, for the argument that one should pick the best abstracts/scholarship recipients, rather than focus on race or gender. (Though the process should be truly blind to remove pervasive bias, as studies and the experience of orchestra auditions have repeatedly shown.) For keynotes though, organisers are free to pursue any agenda they like. For example, they can make education a theme, and get Lorena Barba and Greg Wilson in, as they did last year.

Until the gender ratio begins to even remotely approach 1:1, diversity as an agenda should be a priority for the organisers. This doesn’t mean invite the same number of women and men to give keynotes. This means keep inviting qualified women until you have at least one confirmed female keynote speaker, and preferably two. Then, and only then, you can look into inviting men.

Women have been found to turn down conference invitations more often than men, irrespective of ability or accomplishment. I don’t know why, but I suspect one reason is lack of role models, in the form of previous female speakers. That’s why this keynote roster is so disappointing. There’s tons of accomplished female Pythonistas out there, and there would be even more if we all made a concerted effort to improve things.

I don’t want to retread the same territory that Jonathan Eisen (@phylogenomics) has already covered in “Calling attention to meeting with skewed gender ratios, even when it hurts“. In particular, see that article for links to many others with ideas to improve gender ratios. But this is my contribution in the exact same series: love SciPy. See my previous posts for illustration.

Even looking back at my recent post, when I looked for a picture that I thought captured the collegial, collaborative feel of the conference, I unintentionally picked one featuring only men. This needs to improve, massively, if I’m going to keep supporting this conference. I really hope the organisers place diversity at the centre of their agenda for every decision going forward.

I thank Jonathan Eisen, Andy Ray Terrel, and April Wright for comments on earlier versions of this article.

SciPy 2014: an extremely useful conference with a diversity problem

I just got back home from the SciPy 2014 conference in Austin, TX. Here are my thoughts after this year’s conference.

About SciPy in general

Since my first SciPy in 2012, I’ve decided to prioritise programming conferences over scientific ones, because the value I get for my time is just that much higher. At a scientific conference, I certainly become aware of way-cool stuff going on in other research labs in my area. Once I get home, however, I go back to whatever I was doing. In contrast, at programming conferences, I become aware of new tools and practices that change the way I do science. In his keynote this year, Greg Wilson said of Software Carpentry, “We save researchers a day a week for the rest of their careers.” I feel the same way about SciPy in general.

In the 2012 sprints, I learned about GitHub Pull Requests and code review, the lingua franca of open source development today. I can’t express how useful that’s been. I also started my ongoing collaboration with the scikit-image project, which has enabled my research to reach far more users than I ever could have achieved on my own.

No scientific conference I’ve been to has had such an impact on my science output, nor can I imagine one doing so.

This year’s highlights

This year was no different. Without further ado, here are my top hits from this year’s conference:

  • Michael Droettboom talked about his continuous benchmarking project, Airspeed Velocity. It is hilariously named and incredibly useful. asv checks out code from your Git repo at regular intervals and runs benchmarks (which you define), and plots your code’s performance over time. It’s an incredible guard against feature creep slowing down your code base.
  • IPython recently unveiled their modal version 2.0 interface, sending vimmers worldwide rejoicing. But a few key bindings are just wrong from a vim perspective. Most egregiously, i, which should enter edit mode, interrupts the kernel when pressed twice! That’s just evil. Paul Ivanov goes all in with vim keybindings with his hilarious and life-changing IPython vimception talk. The title is more appropriate than I realised. Must-watch.
  • Damián Avila revealed (heh) his live IPython presentations with Reveal.js, forever changing how Python programmers present their work. I was actually aware of this before the conference and used it in my own talk, but you should definitely watch his talk and get the extension from his repo.
  • Min RK gave an excellent tutorial on IPython parallel (repo, videos 1, 2, 3). It’s remarkable what the IPython team have achieved thanks to their decoupling of the interactive shell and the computation “kernel”. (I still hate that word.)
  • Brian Granger and Jon Frederic gave an excellent tutorial on IPython interactive widgets (notebook here). They provide a simple and fast way to interactively explore your data. I’ve already started using these on my own problems.
  • Aaron Meurer gave the best introduction to the Python packaging problem that I’ve ever seen, and how Continuum’s conda project solves it. I think we still need an in-depth tutorial on how package distributors should use conda, but for users, conda is just awesome, and this talk explains why.
  • Matt Rocklin has a gift for crystal clear speaking, despite his incredible speed, and it was on full display in his (and Mark Wiebe’s) talk on Blaze, Continuum’s new array abstraction library. I’m not sure I’ll be using Blaze immediately but it’s certainly on my radar now!
  • Lightning talks are always fun: days 1, 2, 3. Watch out for Fernando Pérez’s announcement of Project Jupyter, the evolution of the IPython notebook, and for Damon McDougall’s riveting history of waffles. (You think I’m joking.)

Apologies if I’ve missed anyone: with three tracks, an added one with the World Cup matches ;) , and my own talk preparations, “overwhelming” does not begin to describe the conference! I will second Aaron Meurer’s assertion that there were no bad talks. Which brings us to…

On my to-watch

Jake Vanderplas recently wrote a series of blog posts (last one here, with links to earlier posts) comparing frequentist and Bayesian approaches to statistical analysis, in which he makes a compelling argument that we should all relegate frequentism to the dustbin of history. As such, I intend to go over Chris Fonnesbeck’s tutorial (2, 3) and talk about Bayesian analysis in Python using PyMC.

David Sanders also did a Julia tutorial (part 2) that was scheduled at the same time as my own scikit-image tutorial, but I’m interested to see how the Julia ecosystem is progressing, so that should be a good place to start. (Although I’m still bitter that they went with 1-based indexing!)

The reproducible science tutorial (part 2) generated quite a bit of buzz so I would be interested to go over that one as well.

For those interested in computing education or in geoscience, the conference had dedicated tracks for each of those, so you are bound to find something you like (not least, Lorena Barba’s and Greg Wilson’s keynotes). Have a look at the full listing of videos here. These might be easier to navigate by looking at the conference schedule.

The SciPy experience

I want to close this post with a few reflections on the conference itself.

SciPy is broken up into three “stages”: two days of tutorials, three days of talks, and two days of sprints. Above, I covered the tutorials and talks, but to me, the sprints are what truly distinguish SciPy. The spirit of collaboration is unparalleled, and an astonishing amount of value is generated in those two days, either in the shape of code, or in introducing newcomers to new projects and new ways to collaborate in programming.

My biggest regret of the conference was not giving a lightning talk urging people to come to the sprints. I repeatedly asked people whether they were coming to the sprints, and almost invariably the answer was that they didn’t feel they were good enough to contribute. To reiterate my previous statements: (1) when I attended my first sprint in 2012, I had never done a pull request; (2) sprints are an excellent way to introduce newcomers to projects and to the pull request development model. All the buzz around the sprints was how welcoming all of the teams were, but I think there is a massive number of missed opportunities because this is not made obvious to attendees before the sprints.

Lastly, a few notes on diversity. During the conference, April Wright, a student in evolutionary biology at UT Austin, wrote a heartbreaking blog post about how excluded she felt from a conference where only 15% of attendees were women. That particular incident was joyfully resolved, with plenty of SciPyers reaching out to April and inviting her along to sprints and other events. But it highlighted just how poorly we are doing in terms of diversity. Andy Terrel, one of the conference organisers, pointed out that 15% is much better than 2012’s three (women, not percent!), but (a) that is still extremely low, and (b) I was horrified to read this because I was there in 2012… And I did not notice that anything was wrong. How can it be, in 2012, that it can seem normal to be at a professional conference and have effectively zero women around? It doesn’t matter what one says about the background percentage of women in our industry and so on… Maybe SciPy is doing all it can about diversity. (Though I doubt it.) The point is that a scene like that should feel like one of those deserted cityscapes in post-apocalyptic movies. As long as it doesn’t, as long as SciPy feels normal, we will continue to have diversity problems. I hope my fellow SciPyers look at these numbers, feel appalled as I have, and try to improve.

… And on cue, while I was writing this post, Andy Terrel wrote a great post of his own about this very topic:
http://andy.terrel.us/blog/2014/07/17/

I still consider SciPy a fantastic conference. Jonathan Eisen (@phylogenomics), whom I admire, would undoubtedly boycott it because of the problems outlined above, but I am heartened that the organising committee is taking this as a serious problem and trying hard fix it. I hope next time it is even better.