Best practices addendum: find and follow the conventions of your programming community

Juan Nunez-Iglesias

2014-01-09 21:11

The bioinformatics community is all atwitter about the recent PLOS Biology article, Best Practices for Scientific Computing. Its main points should be obvious to most quasi-experienced programmers, but I can certainly remember a time when they did not seem so obvious to me (last week I think). As such, it's a valuable addition to the written record on scientific computing. One of their code snippets, however, is pretty annoying:

def scan(op, values, seed=None):
# Apply a binary operator cumulatively to the values given
# from lowest to highest, returning a list of results.
# For example, if "op" is "add" and "values" is "[1,3,5]",
# the result is "[1, 4, 9]" (i.e., the running total of the
# given values). The result always has the same length as
# the input.
# If "seed" is given, the result is initialized with that
# value instead of with the first item in "values", and
# the final item is omitted from the result.
# Ex : scan(add, [1, 3, 5] , seed=10)
# produces [10, 11, 14]
...implementation...

First, this code ignores the article's own advice, (1b) make names consistent, distinctive, and meaningful. I would argue that "scan" here is neither distinctive (many other operations could be called "scan") nor meaningful (the function purpose is not at all clear from the name). My suggestion would be "cumulative_reduce".

It also does not address another important piece of advice that I would add to their list, maybe as (1d): Find out, and follow, the conventions of the programming community you're joining. This will allow others to use and assess your code more readily, and you to contribute to other code libraries more easily. Here, although they have made efforts to make their advice language-agnostic, the authors have chosen Python to illustrate their point. Python happens to have strong style and documentation prescriptions in the form of Python Enhancement Proposals PEP-8: Style Guide for Python Code and PEP-257: Docstring conventions. Following PEP-8 and PEP-257, the above comments become an actual docstring (which is attached to the function automatically by documentation-generating tools):

def cumulative_reduce(op, values, seed=None):
    """Apply a binary operator cumulatively to the values given.

    The operator is applied from left to right.

    For example, if "op" is "add" and "values" is "[1,3,5]",
    the result is "[1, 4, 9]" (i.e., the running total of the
    given values). The result always has the same length as
    the input.

    If "seed" is given, the result is initialized with that
    value instead of with the first item in "values", and
    the final item is omitted from the result.
    Ex : scan(add, [1, 3, 5] , seed=10)
    produces [10, 11, 14]
    """
    ...implementation...

In addition, the Scientific Python community in particular has adopted a few docstring conventions of their own, including the NumPy docstring conventions, which divide the docstring into meaningful sections using ReStructured Text, and the doctest convention to format examples, so the documentation acts as unit tests. So, to further refine their example code:

def cumulative_reduce(op, values, seed=None):
    """Apply a binary operator cumulatively to the values given.

    The operator is applied from left to right.

    Parameters
    ----------
    op : binary function
        An operator taking as input to values of the type contained in
        `values` and returning a value of the same type.
    values : list
        The list of input values.
    seed : type contained in `values`, optional
        A seed to start the reduce operation.

    Returns
    -------
    reduced : list, same type as `values`
        The accumulated list.

    Examples
    --------
    >>> add = lambda x, y: x + y
    >>> cumulative_reduce(add, [1, 3, 5])
    [1, 4, 9]

    If "seed" is given, the result is initialized with that
    value instead of with the first item in "values", and
    the final item is omitted from the result.

    >>> cumulative_reduce(add, [1, 3, 5], seed=10)
    [10, 11, 14]
    """
    ...implementation...

Obviously, these conventions are specific to scientific Python. But the key is that other communities will have their own, and you should find out what those conventions are and adopt them. When in Rome, do as the Romans do. It's actually taken me quite a few years of scientific programming to realise this (and internalise it). I hope this post will help someone get up to speed more quickly than I have.

(Incidentally, the Wordpress/Chrome/OSX spell checker doesn't bat an eye at "atwitter". That's awesome.)

Reference

Greg Wilson, DA Aruliah, C Titus Brown, Neil P Chue Hong, Matt Davis, Richard T Guy, Steven HD Haddock, Kathryn D Huff, Ian M Mitchell, Mark D Plumbley, Ben Waugh, Ethan P White, & Paul Wilson (2014). Best Practices for Scientific Computing PLoS Biol, 12 (1) DOI: 10.1371/journal.pbio.1001745

I Love Symposia!

Comments