Category Archives: conference

Summer School Announcement: ASPP Asia-Pacific 2018

The Advanced Scientific Programming in Python (ASPP) summer school has had 10 extremely successful iterations in Europe. (You can find past materials, schedules, and student evaluations at https://python.g-node.org/archives.) Now, thanks to the INCF, we will be holding its first iteration in Australia, to cater to the Asia Pacific region. (Note: the original ASPP will still take place in Europe next Northern summer; this is a fork of that school.)

Key details

  • The workshop runs January 14-21 at the Melbourne Brain Centre, University of Melbourne, Australia
  • topics include: git, contributing to open source software with github, testing, debugging, profiling, advanced NumPy, Cython, data visualisation.
  • hands-on learning using pair programming
  • free to attend (but students are responsible for travel, accommodation, and meals)
  • 30 student places, to be selected competitively
  • application deadline is Oct 31, 2017, 23:59 UTC.
  • website: https://melbournebioinformatics.org.au/aspp-asia-pacific
  • apply: https://melbournebioinformatics.org.au/aspp-asia-pacific/applications (make sure you read the FAQ on that page)

Background

Two-and-a-bit years ago, Tiziano Zito asked me if I could join the faculty at the 2015 ASPP school in Munich (then in its 8th iteration). It turned out to be a fantastic teaching experience, and, more importantly, it was a fantastic experience for the students. Students selected for the school fit a certain profile, neither novice nor advanced. As such, you can be sure that if you participate in the school, you will learn a great deal. We teach tools that will immediately improve your scientific practice. I decided that I wanted to replicate the school in Australia. Now it is finally here!

Course outline

Scientists spend increasingly more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only few scientists have been trained to use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques and best practices that are standard in industry, but especially tailored to the needs of a programming scientist. Lectures are devised to be interactive and to give the students enough time to acquire direct hands-on experience with the materials. Students will work in pairs throughout the school and will team up to practice the newly learned skills in a real programming project — an entertaining computer game.

We use the Python programming language for the entire course. Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. We show how clean language design, ease of extensibility, and the great wealth of open source libraries for scientific computing and data visualization are driving Python to becoming a standard tool for scientists.

Who is eligible?

This school is targeted at Master/PhD students and postdocs from all areas of science. Competence in Python or in another language such as Java, C/C++, MATLAB, or Mathematica is absolutely required. Basic knowledge of Python and of a version control system such as git, subversion, mercurial, or bazaar is assumed. Participants without any prior experience with Python and/or git should work through the proposed introductory material before the course.

We have strived to get a pool of students that is international and gender-balanced, and have succeeded, with gender parity in the last four schools.

More questions

If you have any questions, contact aspp@melbournebioinformatics.org.au.

Please circulate this announcement widely!

Thanks,

Juan.

EuroSciPy 2015 debrief

The videos from EuroSciPy 2015 are up! This marks a good time to write up my thoughts on the conference.
I’ve mentioned before that the yearly SciPy conference is stunningly useful. This year I couldn’t make it to Austin, but I did attend EuroSciPy, the European version of the same conference, in Cambridge, UK. It was spectacular.

Useful talks

The talk of the conference, for me, goes to Robin Wilson for recipy, which one can describe as a logging utility, if one wishes to make it sound as uninspiring as possible. Recipy’s strength is in its mind-boggling simplicity. Here is the unabridged usage guide:

import recipy

With this single line, your script will now generate an entry in a database every time it is run. It logs the start and end time, the working directory, the script’s git hash, any differences between the working copy and the last git commit (!), and the names of any input and output files. (File hashes are coming soon, I’m assured).
I don’t know about you but I have definitely lost count of the times I’ve looked at a file and wondered what script I ran to get it, or the input data that went into it. This library solves that problem with absolutely minimal friction for the user.
I also enjoyed Nicolas Rougier’s talk on ReScience, a new journal dedicated to replicated (and replicable) scientific analyses. It’s a venue to publish all those efforts to replicate a result you read in a paper. Given recent findings about how poorly most papers replicate, I think this is a really important outlet.
The other remarkable thing about it is that all review is open and done in the spirit of open source, on GitHub. Submission is by pull request, of course. With just one paper out so far, it’s a bit early to tell whether it’ll take off, but I really hope it does. I’ll be looking for stuff of my own to publish there, for sure. (Oh and by the way, they are looking for reviewers and editors!)
Another great talk was Philipp Rudiger on HoloViews, an object-oriented plotting framework. They define an arithmetic on figures: A * B overlays figure B on A, while B + C creates two subplots out of B and C (and automatically labels them). Their example notebooks rely a lot on IPython magic, which I’m not happy about and means I haven’t fully grokked the API, but it seems like a genuinely useful way to think about plotting.
A final highlight from the main session was Martin Weigert on Spimagine, his GPU-accelerated, 5D image analysis and visualisation framework. It was stupidly impressive. Although it’s a long-term project, I’m inclined to try to incorporate many of its components into scikit-image.

Tutorials

The tutorials are a great asset of both EuroSciPy and SciPy. I learn something new every year. The highlight for me was the Cython tutorial, in which Stefan Behnel demonstrated how easy it is to provide Python access to C++ code using Cython. (I have used Cython quite extensively, but only to speed up Python code, rather than wrap C or C++ code.)

Sprints

I was feeling a bit hypocritical for missing the sprints this year, since I had to run off before the Sunday. Emmanuelle Gouillart, another scikit-image core dev, suggested having a small, unofficial sprint on Friday evening. It grew and grew into a group of about 30 people (including about 10 new to sprinting) who all gathered at the Enthought Cambridge office to work on scikit-image or the SciPy lecture notes. A brilliant experience.
scikit-image sprint at Enthought
(By the way, nothing creepy going on with that dude hunching over one of our sprinters — that’s just husband-and-wife team Olivia and Robin Wilson! ;)

Final thoughts

As usual, I learned heaps and had a blast at this SciPy conference (my fourth). I hope it will remain a yearly ritual, and I hope someone reading this will give it a try next year!

The SciPy ecosystem belongs to everyone

I use Twitter favourites almost exclusively to mark posts that I know will be useful in some not-too-distant future; kind of like a Twitter Evernote. Recently I was looking through my list in search of this excellent blog post detailing how to build cross-platform binary distributions for conda.

I came across two other tweets from the EuroSciPy 2014 conference: this one by Ian Ozsvald about his IPython memory usage profiler, right next to this one by Alexandre Chabot about Aaron O’Leary’s notedown. I’d forgotten that this was how I came across these two tools, but since then I have contributed code to both (1, 2). I’d met Ian at EuroSciPy 2013, but I’ve never met Aaron, yet nevertheless there is my code in the latest version of his notedown library.

How remarkable the open-source Python community has become. Talks from Python conferences are posted to YouTube, usually as the conference is happening. (Add to that plenty of live tweeting.) Thus, even when I can’t attend the conferences, I can keep up with the latest open source libraries, from the other side of the world. And then I can grab the source code on GitHub, fiddle with it to my heart’s content, and submit a pull request to the author with my changes. After a short while, code that I wrote for my own utility is available for anyone else to use through PyPI or conda.

My point is: join us! Make your code open source, and conversely, when you need some functionality, don’t reinvent the wheel. See if there’s a library that almost meets your needs, and contribute!

Calling out SciPy on diversity (even though it hurts)

Over the past few weeks, I’ve been heavily promoting the SciPy conference, a meeting about scientific programming in Python. I’ve been telling everyone who would listen that they should submit a talk abstract and go, because scientific programming is increasingly common in any scientist’s work and SciPy massively improves how you do that.

I have also been guiltily ommitting that the speaker and attendee diversity at SciPy is shockingly bad. Last year, for example, 15% of attendees were women, and that was an improvement over the ratio three years ago, when just 3% (!!!) were women.

I rationalised continuing to promote this conference because there was talk from past organisers about making efforts to improve. (And indeed, the past three years have been on an upward trajectory.)

A couple of days ago, however, the full list of keynote speakers was announced, and lo and behold, it’s three white guys. I have to acknowledge that they are extremely accomplished in the SciPy universe, and, if diversity were not more generally a problem at this conference and in tech in general, I wouldn’t bat an eye. Excellent choice of speakers, really. Looking forward to it.

But diversity is a problem. It’s an enormous problem. I’m inclined to call it catastrophic.

Let me try to quantify it. Men and women are equally capable scientific programmers. So out of a total pool of 100 potential SciPy attendees/contributors, 50 are women and 50 are men. Now, let’s assume the male side of the community is working at near-optimum capacity, so, 50 of 50 those men are at SciPy. 15% of attendees being women means just 9 of the 50 potential female contributors are making it out to the conference (9/59 ≈ 15%). Or, slice it another way, a whopping (50 – 9) / 50 = 82% of women who could be contributing to SciPy are missing.

Now, think of where we would be if we took 82% of male science-Pythonistas and just erased their talks, their discussions, and their code contributions. The SciPy ecosystem would suck. Yet that is exactly how many coders are missing from our community.

Now, we can sit here and play female conference speaker bingo until the cows come home to roost, but that is missing the point: these are all only excuses for not doing enough. “Not my fault” is not good enough anymore. It is everyone’s fault who does not make an active and prolonged effort to fix things.

The keynote speakers are an excellent place to make a difference, because you can eliminate all sorts of confounders. I have a certain sympathy, for example, for the argument that one should pick the best abstracts/scholarship recipients, rather than focus on race or gender. (Though the process should be truly blind to remove pervasive bias, as studies and the experience of orchestra auditions have repeatedly shown.) For keynotes though, organisers are free to pursue any agenda they like. For example, they can make education a theme, and get Lorena Barba and Greg Wilson in, as they did last year.

Until the gender ratio begins to even remotely approach 1:1, diversity as an agenda should be a priority for the organisers. This doesn’t mean invite the same number of women and men to give keynotes. This means keep inviting qualified women until you have at least one confirmed female keynote speaker, and preferably two. Then, and only then, you can look into inviting men.

Women have been found to turn down conference invitations more often than men, irrespective of ability or accomplishment. I don’t know why, but I suspect one reason is lack of role models, in the form of previous female speakers. That’s why this keynote roster is so disappointing. There’s tons of accomplished female Pythonistas out there, and there would be even more if we all made a concerted effort to improve things.

I don’t want to retread the same territory that Jonathan Eisen (@phylogenomics) has already covered in “Calling attention to meeting with skewed gender ratios, even when it hurts“. In particular, see that article for links to many others with ideas to improve gender ratios. But this is my contribution in the exact same series: love SciPy. See my previous posts for illustration.

Even looking back at my recent post, when I looked for a picture that I thought captured the collegial, collaborative feel of the conference, I unintentionally picked one featuring only men. This needs to improve, massively, if I’m going to keep supporting this conference. I really hope the organisers place diversity at the centre of their agenda for every decision going forward.

I thank Jonathan Eisen, Andy Ray Terrel, and April Wright for comments on earlier versions of this article.

Go to SciPy 2015

SciPy is my favourite conference. My goal with this post is to convince someone to go who hasn’t had that chance yet.

Photo by Ian Rees (from the SciPy 2012 conference website)

Why SciPy?

Most scientists go to conferences in their own field: neuroscientists go to the monstrous Society for Neuroscience (SfN); Bioinformaticians go to RECOMB, ISMB, or PSB; and so on.

People go to these to keep up with the latest advances in their field, and often, to do a bit of networking.

SciPy is a different kind of conference. It changes the way you do science. You learn about the latest free and open source software to help you with your work. You learn to write functions and interfaces instead of scripts, and to write tests so you don’t break your code. You also learn to contribute these to bigger projects, maximising the reach and impact of your work (see “sprints”, below).

And you learn these things by doing them, with the people who are the best at this, rather than by reading books and blog posts. (Which maybe I shouldn’t knock, since I’m writing a book about all this and you are reading my blog!)

Attendees to SciPy have scientific software in common, but come from diverse fields, including physics, pure maths, data visualisation, geosciences, climatology, and yes, biology and bioinformatics. Mingling with such a diverse group is a fantastic way to get your creative juices flowing!

The conference lasts a full week and is broken up into three parts: tutorials, main conference, and sprints.

the tutorials

With a few exceptions, you won’t learn about your own field. But you will learn an enormous amount about tools that will help you be a better scientist. If you are a newbie to Python, you can go to the beginner tutorials track and learn about the fantastic scientific libraries available in Python. If you already use NumPy, SciPy, pandas, and the IPython notebook, you can go to the intermediate or advanced tracks, and learn new things about those. Even as an advanced SciPy user I still get tons of value from the tutorials. (Last year Min RK gave a wild demo of IPython parallel’s capabilities by crawling wikipedia remotely while building up a graph visualisation on his live notebook.) (Fast-forward to the 1h mark to see just the payoff.) Here’s last year’s tutorial schedule for an idea of what to expect.

the main conference track

You will also hear about the latest advances in the scientific libraries you know and use, and about libraries you didn’t know about but will find useful (such as scikit-bio, yt or epipy). The main conference track features software advances written by presenters from many different fields. Hearing about these from the authors of the software lets you ask much different questions compared to hearing someone say, for example, “we used the Matlab image processing toolbox”. If you ever had a feature request for your favourite library, or you wondered why they do something in a particular way, there’s no better opportunity to get some closure.

The crosstalk between different fields is phenomenal. Hearing how a diverse set of people deal with their software problems really opens your mind to completely different approaches to what you had previously considered.

the sprints

Finally, there’s two days of coding sprints. Even if you are a complete beginner in software development, do yourself a favour and participate in one of these.

Two and a half years after my first SciPy in 2012, I’m writing a scientific Python book for O’Reilly, and I can 100% trace it to participating in the scikit-image sprint that year. With their guidance, I wrote my first ever GitHub pull request and my first ever unit test. Both were tiny and cute, and I would consider them trivial now, but that seed grew into massive improvements in my code-writing practice and many more contributions to open source projects.

And this is huge: now, instead of giving up when a software package doesn’t do what I need it to do, I just look at the source code and figure out how I can add what I want. Someone else probably wants that functionality, and by putting it into a major software library instead of in my own code, I get it into the hands of many more users. It’s a bit counterintuitive but there is nothing more gratifying than having some random person you’ve never met complain when you break something! This never happens when all your code is in your one little specialised repository containing functionality for your one paper.

How SciPy

The SciPy calls for tutorials, talks, posters, and its plotting contest are all out. There’s specialised tracks and most of you reading this are probably interested in the computational biology and medicine track. It’s taken me a while to write this post, so there’s just one week left to submit something: the deadline is April 1st Update: the deadline for talks and posters has been extended to April 10th!

Even if you don’t get something in, I encourage you to participate. Everything I said above still applies if you’re not presenting. You might have a bit more trouble convincing your funders to pay for your travels, but if that’s the case I encourage you to apply for financial assistance from the conference.

I’ve written about SciPy’s diversity problem before, so I’m happy to report that this year there’s specific scholarships for women and minorities. (This may have been true last year, I forget.) And, awesomely, Matt Davis has offered to help first-time submitters with writing their proposals.

Give SciPy a try: submit here and/or register here. And feel free to email me or comment below if you have questions!

Update: A colleague pointed out that I should also mention the awesomeness of the conference venue, so here goes: Austin in July is awesome. If you love the heat like I do, well, it doesn’t get any better. If you don’t, don’t worry: the AT&T Conference Center AC is on friggin overdrive the whole time. Plus, there’s some nearby cold springs to swim in. The center itself is an excellent hotel and the conference organises massive discounts for attendees. There’s a couple of great restaurants on-site; and the Mexican and Texas BBQ in the area are incredible — follow some Enthought and Continuum folks around to experience amazing food. Finally, Austin is a great city to bike in: last time I rented a road bike for the whole week from Mellow Johnny’s, and enjoyed quite a few lunchtime and evening rides.

SciPy 2014: an extremely useful conference with a diversity problem

I just got back home from the SciPy 2014 conference in Austin, TX. Here are my thoughts after this year’s conference.

About SciPy in general

Since my first SciPy in 2012, I’ve decided to prioritise programming conferences over scientific ones, because the value I get for my time is just that much higher. At a scientific conference, I certainly become aware of way-cool stuff going on in other research labs in my area. Once I get home, however, I go back to whatever I was doing. In contrast, at programming conferences, I become aware of new tools and practices that change the way I do science. In his keynote this year, Greg Wilson said of Software Carpentry, “We save researchers a day a week for the rest of their careers.” I feel the same way about SciPy in general.

In the 2012 sprints, I learned about GitHub Pull Requests and code review, the lingua franca of open source development today. I can’t express how useful that’s been. I also started my ongoing collaboration with the scikit-image project, which has enabled my research to reach far more users than I ever could have achieved on my own.

No scientific conference I’ve been to has had such an impact on my science output, nor can I imagine one doing so.

This year’s highlights

This year was no different. Without further ado, here are my top hits from this year’s conference:

  • Michael Droettboom talked about his continuous benchmarking project, Airspeed Velocity. It is hilariously named and incredibly useful. asv checks out code from your Git repo at regular intervals and runs benchmarks (which you define), and plots your code’s performance over time. It’s an incredible guard against feature creep slowing down your code base.
  • IPython recently unveiled their modal version 2.0 interface, sending vimmers worldwide rejoicing. But a few key bindings are just wrong from a vim perspective. Most egregiously, i, which should enter edit mode, interrupts the kernel when pressed twice! That’s just evil. Paul Ivanov goes all in with vim keybindings with his hilarious and life-changing IPython vimception talk. The title is more appropriate than I realised. Must-watch.
  • Damián Avila revealed (heh) his live IPython presentations with Reveal.js, forever changing how Python programmers present their work. I was actually aware of this before the conference and used it in my own talk, but you should definitely watch his talk and get the extension from his repo.
  • Min RK gave an excellent tutorial on IPython parallel (repo, videos 1, 2, 3). It’s remarkable what the IPython team have achieved thanks to their decoupling of the interactive shell and the computation “kernel”. (I still hate that word.)
  • Brian Granger and Jon Frederic gave an excellent tutorial on IPython interactive widgets (notebook here). They provide a simple and fast way to interactively explore your data. I’ve already started using these on my own problems.
  • Aaron Meurer gave the best introduction to the Python packaging problem that I’ve ever seen, and how Continuum’s conda project solves it. I think we still need an in-depth tutorial on how package distributors should use conda, but for users, conda is just awesome, and this talk explains why.
  • Matt Rocklin has a gift for crystal clear speaking, despite his incredible speed, and it was on full display in his (and Mark Wiebe’s) talk on Blaze, Continuum’s new array abstraction library. I’m not sure I’ll be using Blaze immediately but it’s certainly on my radar now!
  • Lightning talks are always fun: days 1, 2, 3. Watch out for Fernando Pérez’s announcement of Project Jupyter, the evolution of the IPython notebook, and for Damon McDougall’s riveting history of waffles. (You think I’m joking.)

Apologies if I’ve missed anyone: with three tracks, an added one with the World Cup matches ;) , and my own talk preparations, “overwhelming” does not begin to describe the conference! I will second Aaron Meurer’s assertion that there were no bad talks. Which brings us to…

On my to-watch

Jake Vanderplas recently wrote a series of blog posts (last one here, with links to earlier posts) comparing frequentist and Bayesian approaches to statistical analysis, in which he makes a compelling argument that we should all relegate frequentism to the dustbin of history. As such, I intend to go over Chris Fonnesbeck’s tutorial (2, 3) and talk about Bayesian analysis in Python using PyMC.

David Sanders also did a Julia tutorial (part 2) that was scheduled at the same time as my own scikit-image tutorial, but I’m interested to see how the Julia ecosystem is progressing, so that should be a good place to start. (Although I’m still bitter that they went with 1-based indexing!)

The reproducible science tutorial (part 2) generated quite a bit of buzz so I would be interested to go over that one as well.

For those interested in computing education or in geoscience, the conference had dedicated tracks for each of those, so you are bound to find something you like (not least, Lorena Barba’s and Greg Wilson’s keynotes). Have a look at the full listing of videos here. These might be easier to navigate by looking at the conference schedule.

The SciPy experience

I want to close this post with a few reflections on the conference itself.

SciPy is broken up into three “stages”: two days of tutorials, three days of talks, and two days of sprints. Above, I covered the tutorials and talks, but to me, the sprints are what truly distinguish SciPy. The spirit of collaboration is unparalleled, and an astonishing amount of value is generated in those two days, either in the shape of code, or in introducing newcomers to new projects and new ways to collaborate in programming.

My biggest regret of the conference was not giving a lightning talk urging people to come to the sprints. I repeatedly asked people whether they were coming to the sprints, and almost invariably the answer was that they didn’t feel they were good enough to contribute. To reiterate my previous statements: (1) when I attended my first sprint in 2012, I had never done a pull request; (2) sprints are an excellent way to introduce newcomers to projects and to the pull request development model. All the buzz around the sprints was how welcoming all of the teams were, but I think there is a massive number of missed opportunities because this is not made obvious to attendees before the sprints.

Lastly, a few notes on diversity. During the conference, April Wright, a student in evolutionary biology at UT Austin, wrote a heartbreaking blog post about how excluded she felt from a conference where only 15% of attendees were women. That particular incident was joyfully resolved, with plenty of SciPyers reaching out to April and inviting her along to sprints and other events. But it highlighted just how poorly we are doing in terms of diversity. Andy Terrel, one of the conference organisers, pointed out that 15% is much better than 2012’s three (women, not percent!), but (a) that is still extremely low, and (b) I was horrified to read this because I was there in 2012… And I did not notice that anything was wrong. How can it be, in 2012, that it can seem normal to be at a professional conference and have effectively zero women around? It doesn’t matter what one says about the background percentage of women in our industry and so on… Maybe SciPy is doing all it can about diversity. (Though I doubt it.) The point is that a scene like that should feel like one of those deserted cityscapes in post-apocalyptic movies. As long as it doesn’t, as long as SciPy feels normal, we will continue to have diversity problems. I hope my fellow SciPyers look at these numbers, feel appalled as I have, and try to improve.

… And on cue, while I was writing this post, Andy Terrel wrote a great post of his own about this very topic:
http://andy.terrel.us/blog/2014/07/17/

I still consider SciPy a fantastic conference. Jonathan Eisen (@phylogenomics), whom I admire, would undoubtedly boycott it because of the problems outlined above, but I am heartened that the organising committee is taking this as a serious problem and trying hard fix it. I hope next time it is even better.