Tag Archives: Numba

Prettier LowLevelCallables with Numba JIT and decorators

In my recent post, I extolled the virtues of SciPy 0.19’s LowLevelCallable. I did lament, however, that for generic_filter, the LowLevelCallable interface is a good deal uglier than the standard function interface. In the latter, you merely need to provide a function that takes the values within a pixel neighbourhood, and outputs a single value — an arbitrary function of the input values. That is a Wholesome and Good filter function, the way God intended.

In contrast, a LowLevelCallable takes the following signature:

int callback(double *buffer, intptr_t filter_size, 
             double *return_value, void *user_data)

That’s not very Pythonic at all. In fact, it’s positively Conic (TM). For those that don’t know, pointers are evil, so let’s aim to avoid their use.

“But Juan!”, you are no doubt exclaiming. “Juan! Didn’t you just tell us how to use pointers in Numba cfuncs, and tell us how great it was because it was so fast?”

Indeed I did. But it left a bad taste in my mouth. Although I felt that the tradeoff was worth it for such a phenomenal speed boost (300x!), I was unsatisfied. So I started immediately to look for a tidier solution. One that would let me write proper filter functions while still taking advantage of LowLevelCallables.

It turns out Numba cfuncs can call Numba jitted functions, so, with a little bit of decorator magic, it’s now ludicrously easy to write performant callables for SciPy using just pure Python. (If you don’t know what Numba JIT is, read my earlier post.) As in the last post, let’s look at grey_erosion as a baseline benchmark:

>>> import numpy as np
>>> footprint = np.array([[0, 1, 0],
...                       [1, 1, 1],
...                       [0, 1, 0]], dtype=bool)
>>> from scipy import ndimage as ndi
>>> %timeit ndi.grey_erosion(image, footprint=fp)
1 loop, best of 3: 160 ms per loop

Now, we write a decorator that uses Numba jit and Numba cfunc to make a LowLevelCallable suitable for passing directly into generic_filter:

>>> import numba
>>> from numba import cfunc, carray
>>> from numba.types import intc, CPointer, float64, intp, voidptr
>>> from scipy import LowLevelCallable
>>> def jit_filter_function(filter_function):
...     jitted_function = numba.jit(filter_function, nopython=True)
...     @cfunc(intc(CPointer(float64), intp, CPointer(float64), voidptr))
...     def wrapped(values_ptr, len_values, result, data):
...         values = carray(values_ptr, (len_values,), dtype=float64)
...         result[0] = jitted_function(values)
...         return 1
...     return LowLevelCallable(wrapped.ctypes)

If you haven’t seen decorators before, read this primer from Real Python. To summarise, we’ve written a function that takes as input a Python function, and outputs a LowLevelCallable. Here’s how to use it:

>>> @jit_filter_function
... def fmin(values):
...     result = np.inf
...     for v in values:
...         if v < result:
...             result = v
...     return result

As you can see, the fmin function definition looks just like a normal Python function. All the magic happens when we attach our @jit_filter_function decorator to the top of the function. Let’s see it in action:

>>> %timeit ndi.generic_filter(image, fmin, footprint=fp)
10 loops, best of 3: 92.9 ms per loop

Wow! numba.jit is actually over 70% faster than grey_erosion or the plain cfunc approach!

In case you want to use this, I’ve made a package available on PyPI, so you can actually pip install it right now with pip install llc (for low-level callable), and then:

>>> from llc import jit_filter_function

The source code is on GitHub. Currently it only covers ndi.generic_filter‘s signature, and only with Numba, but I hope to gradually expand it to cover all the functions that take LowLevelCallables in SciPy, as well as support Cython. Pull requests are welcome!

SciPy’s new LowLevelCallable is a game-changer

… and combines rather well with that other game-changing library I like, Numba.

I’ve lamented before that function calls are expensive in Python, and that this severely hampers many functions that should be insanely useful, such as SciPy’s ndimage.generic_filter.

To illustrate this, let’s look at image erosion, which is the replacement of each pixel in an image by the minimum of its neighbourhood. ndimage has a fast C implementation, which serves as a perfect benchmark against the generic version, using a generic filter with min as the operator. Let’s start with a 2048 x 2048 random image:

>>> import numpy as np
>>> image = np.random.random((2048, 2048))

and a neighbourhood “footprint” that picks out the pixels to the left and right, and above and below, the centre pixel:

>>> footprint = np.array([[0, 1, 0],
...                       [1, 1, 1],
...                       [0, 1, 0]], dtype=bool)

Now, we measure the speed of grey_erosion and generic_filter. Spoiler alert: it’s not pretty.

>>> from scipy import ndimage as ndi
>>> %timeit ndi.grey_erosion(image, footprint=footprint)
10 loops, best of 3: 118 ms per loop
>>> %timeit ndi.generic_filter(image, np.min, footprint=footprint)
1 loop, best of 3: 27 s per loop

As you can see, with Python functions, generic_filter is unusable for anything but the tiniest of images.

A few months ago, I was trying to get around this by using Numba-compiled functions, but the way to feed C functions to SciPy was different depending on which part of the library you were using. scipy.integrate used ctypes, while scipy.ndimage used PyCObjects or PyCapsules, depending on your Python version, and Numba only supported the former method at the time. (Plus, this topic starts to stretch my understanding of low-level Python, so I felt there wasn’t much I could do about it.)

Enter this pull request to SciPy from Pauli Virtanen, which is live in the most recent SciPy version, 0.19. It unifies all C-function interfaces within SciPy, and Numba already supports this format. It takes a bit of gymnastics, but it works! It really works!

(By the way, the release is full of little gold nuggets. If you use SciPy at all, the release notes are well worth a read.)

First, we need to define a C function of the appropriate signature. Now, you might think this is the same as the Python signature, taking in an array of values and returning a single value, but that would be too easy! Instead, we have to go back to some C-style programming with pointers and array sizes. From the generic_filter documentation:

This function also accepts low-level callback functions with one of the following signatures and wrapped in scipy.LowLevelCallable:

int callback(double *buffer, npy_intp filter_size, 
             double *return_value, void *user_data)
int callback(double *buffer, intptr_t filter_size, 
             double *return_value, void *user_data)

The calling function iterates over the elements of the input and output arrays, calling the callback function at each element. The elements within the footprint of the filter at the current element are passed through the buffer parameter, and the number of elements within the footprint through filter_size. The calculated value is returned in return_value. user_data is the data pointer provided to scipy.LowLevelCallable as-is.

The callback function must return an integer error status that is zero if something went wrong and one otherwise.

(Let’s leave aside that crazy reversal of Unix convention of the past 50 years in the last paragraph, except to note that our function must return 1 or it will be killed.)

So, we need a Numba cfunc that takes in:

  • a double pointer pointing to the values within the footprint,
  • a pointer-sized integer that specifies the number of values in the footprint,
  • a double pointer for the result, and
  • a void pointer, which could point to additional parameters, but which we can ignore for now.

The Numba type names are listed in this page. Unfortunately, at the time of writing, there’s no mention of how to make pointers there, but finding such a reference was not too hard. (Incidentally, it would make a good contribution to Numba’s documentation to add CPointer to the Numba types page.)

So, armed with all that documentation, and after much trial and error, I was finally ready to write that C callable:

>>> from numba import cfunc, carray
>>> from numba.types import intc, intp, float64, voidptr
>>> from numba.types import CPointer
>>> @cfunc(intc(CPointer(float64), intp,
...             CPointer(float64), voidptr))
... def nbmin(values_ptr, len_values, result, data):
...     values = carray(values_ptr, (len_values,), dtype=float64)
...     result[0] = np.inf
...     for v in values:
...         if v < result[0]:
...             result[0] = v
...     return 1

The only other tricky bits I had to watch out for while writing that function were as follows:

  • remembering that there’s two ways to de-reference a pointer in C: *ptr, which is not valid Python and thus not valid Numba, and ptr[0]. So, to place the result at the given double pointer, we use the latter syntax. (If you prefer to use Cython, the same rule applies.)
  • Creating an array out of the values_ptr and len_values variables, as shown here. That’s what enables the for v in values Python-style access to the array.

Ok, so now what you’ve been waiting for. How did we do? First, to recap, the original benchmarks:

>>> %timeit ndi.grey_erosion(image, footprint=footprint)
10 loops, best of 3: 118 ms per loop
>>> %timeit ndi.generic_filter(image, np.min, footprint=footprint)
1 loop, best of 3: 27 s per loop

And now, with our new Numba cfunc:

>>> %timeit ndi.generic_filter(image, LowLevelCallable(nbmin.ctypes), footprint=footprint)
10 loops, best of 3: 113 ms per loop

That’s right: it’s even marginally faster than the pure C version! I almost cried when I ran that.

Higher-order functions, ie functions that take other functions as input, enable powerful, concise, elegant expressions of various algorithms. Unfortunately, these have been hampered in Python for large-scale data processing because of Python’s function call overhead. SciPy’s latest update goes a long way towards redressing this.