Skip to main content

Why citations are not enough for open source software

A few weeks ago I wrote about why you should cite open source tools. Although I think citations important, though, there are major problems in relying on them alone to support open source work.

The biggest problem is that papers describing a software library can only give credit to the contributors at the time that the paper was written. The preferred citation for the SciPy library is “Eric Jones, Travis Oliphant, Pearu Peterson, et al”, 2001. The “et al” is not an abbreviation here, but a fixed shorthand for all other contributors. Needless to say many, many people have contributed to the SciPy library since 2001 (GitHub counts 716 contributors as of this writing), and they are unable to get credit within the academic system for those contributions. (As an aside, Google counts about 1,200 citations to SciPy, which is a breathtaking undercounting of its value and influence, and reinforces my earlier point: cite open source software! Definitely don't use this post as an excuse not to cite it!!!)

Not surprisingly, we have had massive contributions to scikit-image since our 2014 paper, and those contributors miss out on the citations to our paper.

Read more…

Why you should cite open source tools

Every now and then, a moment or a sentence in a conversation sticks out at you, and lodges itself in the back of your brain for months or even years. In this case, the sentence is a tweet, and I fear that the only way to dislodge it is to talk about it publicly.

Last year, I complained on Twitter that a very prominent paper that was getting lots of attention used scikit-image, but failed to cite our paper. (Or the papers corresponding to many other open source packages.) I continued that scientists developing open source software depend on these citations to continue their work. (More on this in another post...) One response was that surely the developers of the open source scientific Python stack were not scientists per se, and that citations were not a priority for them.

I still sigh internally when I think of it.

That tweet manifests a pervasive perception that open source scientific software is written by God-like figures. These massively experienced software developers have easy access to funds for their work, and are at the service of all the other scientists, who are their users. I used to share this perception, but it is utterly false.

Read more…

Summer school announcement: 2nd Advanced Scientific Programming in Python (ASPP) Asia Pacific!

The Advanced Scientific Programming in Python (ASPP) summer school has had 10 successful iterations in Europe and one iteration here in Melbourne earlier this year. Another European iteration is starting next week in Camerino, Italy.

Now, thanks to the generous sponsorship of CSIRO, and the efforts of Benjamin Schwessinger and Genevieve Buckley, two alumni from the Melbourne school, and Kerensa McElroy, Agriculture Data School Coordinator at CSIRO, the Asia Pacific fork of ASPP gets its second iteration in Canberra, Jan 20-27, 2019.

Key details

  • The workshop runs January 20-27, 2019 at the Australian National University in Canberra, Australia.
  • topics include git, contributing to open source software with github, testing, debugging, profiling, advanced NumPy, Cython, and data visualisation.
  • hands-on learning using pair programming
  • free to attend (but students are responsible for travel, accommodation, and meals)
  • 30 student places, to be selected competitively
  • application deadline is Oct 7, 2018, 23:59 Anywhere On Earth
  • website: https://scipy-school.org
  • FAQ: https://scipy-school.org/faq
  • apply: https://scipy-school.org/applications

Read more…

The road to scikit-image 1.0

This is the first in a series of posts about the joint scikit-image, scikit-learn, and dask sprint that took place at the Berkeley Insitute of Data Science, May 28-Jun 1, 2018.

In addition to the dask and scikit-learn teams, the sprint brought together three core developers of scikit-image (Emmanuelle Gouillart, Stéfan van der Walt, and myself), and two newer contributors, Kira Evans and Mark Harfouche. Since we are rarely in the same timezone, let alone in the same room, we took the opportunity to discuss some high level goals using a framework suggested by Tracy Teal (via Chris Holdgraf): Vision, Mission, Values. I'll try do Chris's explanation of these ideas justice:

  • Vision: what are we trying to achieve? What is the future that we are trying to bring about?
  • Mission: what are we going to do about it? This is the plan needed to make the vision a reality.
  • Values: what are we willing to do, and not willing to do, to complete our mission?

So, on the basis of this framework, I'd like to review where scikit-image is now, where I think it needs to go, and the ideas that Emma, Stéfan, and I came up with during the sprint to get scikit-image there.

I will point out, from the beginning, that one of our values is that we are community-driven, and this is not a wishy-washy concept. (More below.) Therefore this blog post constitutes only a preliminary document, designed to kick-start an official roadmap for scikit-image 1.0 with more than a blank canvas. The roadmap will be debated on GitHub and the mailing list, open to discussion by anyone, and when completed will appear on our webpage. This post is not the roadmap.

Read more…

What do scientists know about open source?

A friend recently pointed out this great talk by Matt Bernius, What students know and don't know about open source. If you have even a minor interest in open source it's worth a watch, but the gist is: in the US alone, there are about 200,000 students enrolled in a computer science major. Open source communities are a great space to learn real-world programming, so why don't these numbers translate into massive contributions to open source?

At the core of the issue, Matt identifies two main problems: (1) colleges and universities simply don't teach open source, or even collaborative coding; and (2), many open source communities make newcomers feel unwelcome in a variety of ways.

I want to comment about this in the context of programming in science. That is, programming where the code is not the main product, but rather a useful tool to obtain a scientific result, for example in biology or physics. Here, we still see relatively little contribution to open source, for related but different cultural issues.

Read more…

1st ASPP Asia Pacific evaluation survey

In January of 2018, we had the first ASPP summer school outside of Europe. (This was a parallel workshop to the European one, which will be held in Italy in September 2018.) In general, it was a great success, with some caveats that we will elaborate on below.

First we want to note that this school was a bit different than the European ones, in that we only had attendees from Australian institutions, where the European school has broad international representation, including some from out of Europe. This was in some ways inevitable, as it is more expensive to travel to Australia from almost anywhere than to travel within Europe. On the other hand, we advertised relatively late, and we were unable to secure travel grants during the advertising period, so there is hope that a future edition would be able to attract a more international crowd from the Asia Pacific region.

Read more…

Summer School Announcement: ASPP Asia-Pacific 2018

The Advanced Scientific Programming in Python (ASPP) summer school has had 10 extremely successful iterations in Europe. (You can find past materials, schedules, and student evaluations at https://python.g-node.org/archives.) Now, thanks to the INCF, we will be holding its first iteration in Australia, to cater to the Asia Pacific region. (Note: the original ASPP will still take place in Europe next Northern summer; this is a fork of that school.)

Key details

  • The workshop runs January 14-21 at the Melbourne Brain Centre, University of Melbourne, Australia
  • topics include: git, contributing to open source software with github, testing, debugging, profiling, advanced NumPy, Cython, data visualisation.
  • hands-on learning using pair programming
  • free to attend (but students are responsible for travel, accommodation, and meals)
  • 30 student places, to be selected competitively
  • application deadline is Oct 31, 2017, 23:59 UTC.
  • website: https://melbournebioinformatics.org.au/aspp-asia-pacific
  • apply: https://melbournebioinformatics.org.au/aspp-asia-pacific/applications (make sure you read the FAQ on that page)

Read more…

Prettier LowLevelCallables with Numba JIT and decorators

In my recent post, I extolled the virtues of SciPy 0.19's LowLevelCallable. I did lament, however, that for generic_filter, the LowLevelCallable interface is a good deal uglier than the standard function interface. In the latter, you merely need to provide a function that takes the values within a pixel neighbourhood, and outputs a single value — an arbitrary function of the input values. That is a Wholesome and Good filter function, the way God intended.

In contrast, a LowLevelCallable takes the following signature:

int callback(double *buffer, intptr_t filter_size, 
             double *return_value, void *user_data)

That’s not very Pythonic at all. In fact, it’s positively Conic (TM). For those that don’t know, pointers are evil, so let’s aim to avoid their use.

Read more…

Numba in the real world

Numba is a just-in-time compiler (JIT) for Python code focused on NumPy arrays and scientific Python. I've seen various tutorials around the web and in conferences, but I have yet to see someone use Numba "in the wild". In the past few months, I've been using Numba in my own code, and I recently released my first real package using Numba, skan. The short version is that Numba is amazing and you should strongly consider it to speed up your scientific Python bottlenecks. Read on for the longer version.

Read more…