Skip to content

When in doubt, use Cython

February 28, 2013

I’ll admit this up front: I was a Cython skeptic. I love C, and I had a lot of fun learning about the Python C API. I also saw some post somewhere that other C interfaces had a higher function call overhead, and really, all that work has to be good for something, right? I was wrong. There are still cases where you should write your own C extension, but you should move heaven and hell profiling first. This post isn’t going to talk at all about how to use Cython or the C API, only when to use them. Future posts will likely cover both topics.

I love C. I enjoy reading K&R, and I love the puzzles that programming in C provides. However, having lots of puzzles isn’t a good way to develop software. Rapid prototyping, extensive testing and profiling is the way to do it, e.g. Python. For the sake of this post, I’m assuming that you are working with a CPU-limited problem. Python doesn’t give you much control of memory, so it is probably the wrong tool for memory-limited problems. I’m also going to ignore potential parallelization optimization. That is for another post. Let’s say you’ve been smart: you wrote up a prototype in straight Python, profiled, optimized by exporting calculation chunks to NumPy, profiled some more, and you have function(s) that are killing your speed. What should you do next? Use Cython.

All of the code for this post can be found at my Git Hub in the FibCython folder. In particular, I’m not going to post the code for the C extensions on this post, but you can find it there.

I don’t know about you, but I have a tendency to end up with a couple of functions getting called a whole bunch of times stealing most of my precious cycles. Because of that, function call overhead matters to me. It’s not everything, but it matters. The first example I’m going to work with is a simple function that calculates the largest Fibonacci number less than n:

def fib(n):
  a, b = 0, 1
  while b < n:
    a, b = b, a + b
  return b

Not a very complicated function, but it will work for our purposes. Let’s run IPython’s %timeit on fib(1). [1] On my machine, the output looks like:

%timeit fib(1)
10000000 loops, best of 3: 148 ns per loop

All that function call does is initialize two variables, perform one comparison, and returns a value. That is about as simple of a function you will ever see. As a first pass, let’s just compile the code in cython. The only thing that is needed is a setup file. The file setup.py posted does that. After doing absolutely nothing but compiling identical code with Cython, we get a greater than factor of two speed up:

%timeit cyfib2(1)
10000000 loops, best of 3: 58.8 ns per loop

cyfib2 is simply the local name of fib2 from the file test_cython.pyx. The code is exactly as above. Cython can provide real speed by adding static typing to Python. Without the overhead of type checking, computations are able to be compiled in simplified C. Let’s add that everything we are seeing is an integer:

def fib(int n):
  cdef int a, b
  a, b = 0, 1
  while b < n:
    a, b = b, a + b
  return b

That took me two minutes. The developer time cost of using Cython is tiny. Let’s compile and run our test:

%timeit cyfib(1)
10000000 loops, best of 3: 40.6 ns per loop

That is a lot of speed for typing ‘int’ three times. But obviously, it isn’t worth the hassle of compiling Cython code for O(1) calls. Let’s run our three programs for the largest Fibonacci number less than 1000:

%timeit fib(1000)
1000000 loops, best of 3: 1.22 us per loop
%timeit cy2fib(1000)
1000000 loops, best of 3: 437 ns per loop
%timeit cyfib(1000)
10000000 loops, best of 3: 53.5 ns per loop

fib is pure Python and displays horrendous scaling. This is why you need to profile and patch performance critical pieces of your code. cy2fib is nothing but compiling pure Python code with Cython. That gives us close to a factor of 3 speedup. Finally, cyfib is our Python code annotated with the fact that we are dealing with integers. That gives us a factor of 20 speed up in total time and a factor of 100 in scaling. All that for very little work.

Now, are you ready for some REAL speed? I’ve compiled a C extension which calculates the same thing, with the same algorithm as the Python code. This only took me a few minutes because I could just use a template I had lying around, and there is no real reference counting/memory headaches to deal with. Let’s look at the function call overhead:

%timeit cfib(1)
10000000 loops, best of 3: 94.5 ns per loop

Dammit. The function call overhead is better than pure Python, but significantly slower than Cython. Obviously, I could simply read the outputted C code from Cython to figure out how they optimized the function call, or I could spend that time writing more fast Cython code. Let’s try running the two functions on a big number to get an estimate of the scaling:

%timeit cfib(1000000000)
10000000 loops, best of 3: 140 ns per loop
%timeit cyfib(1000000000)
10000000 loops, best of 3: 88.5 ns per loop

Basically, both implementations scale at effectively the same rate, so the C function will never overcome the loss in function call overhead. For more complicated functions, the function call overhead becomes irrelevant. However, I think this test pretty firmly puts the burden of proof on demonstrating the necessity of using the C API. I also wrote an exponential scaling version of the code in hopes that I would be able to defend C’s honor, but all it did was blow up my call stack. Don’t do that. Exponentially increasing recursion depth is not your friend.

There is at least one case where the C API can give you significant speed: when you have to perform element-wise calculations on an array. Cython gives you significant speed improvement for getting X[i], but it is still pretty slow. In my next post, I’ll go into more detail about the C API, optimizing array element access in Cython, and some nice tests. As a reminder, all of the code can be found here.

1. I’ve imported all functions in this post to my local namespace. That way, we aren’t also counting the expense of an unsuccessful hash or two.

Advertisements

From → Cython, Python C API

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: