Skip to content

Cython Objects… Fools Rush In

May 15, 2013

At long last, I am writing up a comparison between a Cython skip list class and a similar C++ class using Boost.Python as a wrapper. There are other tools to wrap C++ classes, but I don’t have time at the moment to compare them. In a previous post, I suggested that you should use Cython as a first go to whenever you need to speed up your functions. My recommendation for classes is going to be similar, but more tempered. As a first pass, directly compiling your Python code with Cython buys you some speed for almost no effort. However, if you are comfortable with C++, Cython extension types don’t save you much by way of headaches and don’t speed your code up that much.

Let’s set up our test real quick:

data = np.random.rand(1000).tolist()
def insert_delete(list, data):
  for datum in data:
    list.insert(datum)
  for datum in data:
    list.delete(datum)

That gives the following timings:

%timeit -n 10 insert_delet(pp_list, data)
10 loops, best of 3: 585 ms per loop

%timeit -n 10 insert_delete(cy_list, data)
10 loops, best of 3: 348 ms per loop

%timeit insert_delete(cye_list, data)
10 loops, best of 3: 112 ms per loop

%timeit insert_delete(cpp_list, data)
100 loops, best of 3: 2.74 ms per loop

pp_list is the pure Python implementation, cy_list is an un-decorated Cython implementation, cye_list is the Cython extension type, and cpp_list is our Boost.Python C++ skip list. The Cython skip list, which was just a straight copy of our pure Python code, is about 40% faster than the Python implementation; moving to a Cython extension type gives another factor of three; and the C++ implementation buys you a factor of forty or so.

Before I get into the code, I’m going to admit that the Cython implementation is probably non-optimal. At first blush, I would not have guessed that Cython references and dereferencing would be so much slower, so it would make sense to see if using the Cython syntax for pointers would be faster. However, we would probably have to trick the garbage collector somehow. We’re not getting into the realm of ugly hacks, and I didn’t attempt to go down that road.

I mentioned in my last skip list post that there is danger in making your tests dependent upon your random number generate. Well, that came back and bit me with the Cython skip lists. Because the Cython classes are compiled, they use their own instance of the NumPy random number generator, so calling np.random.seed() doesn’t influence the state of the object’s random number generator. If I were being thorough, I would have added a way to set the object’s seed and rewritten the tests. Excuses, I know, but I will be announcing the distraction that has made me less thorough soon. The Cython code is identical to the tested Python code, so I’m not concerned about correctness at the moment. However, I couldn’t do any new development before developing a test suite.

Cython extension types have basically all of the headaches of a strongly-typed object oriented language with none of the tools to deal with those headaches. I spent a moderate amount of time trying to find a way to implement both a floating point and an integer version of the class, but I wasn’t able to do so without simply repeating lots of code. Cython fused types might solve some of these headaches, but they aren’t to the point where they could be used for extension types. Let’s look at some code to see what I mean. To demonstrate my point, I’m simply going to vomit the code at you at first:

@cython.nonecheck(False)
cdef class _CySLNode(object):
  cdef double _value
  cdef _CySLNode left
  cdef _CySLNode right
  cdef _CySLNode up
  cdef _CySLNode down
  def __init__(self, value):
    self._value = value
    self.left = None
    self.right = None
    self.up = None
    self.down = None
  property value:
    def __get__(self):
      return self._value    
  property left:
    def __get__(self):
      return self.left
    def __set__(self, node):
      self.left = node
    def __del__(self):
      self.left = None
  property right:
    def __get__(self):
      return self.right
    def __set__(self, node):
      self.right = node
    def __del__(self):
      self.right = None
  property up:
    def __get__(self):
      return self.up
    def __set__(self, node):
      self.up = node
    def __del__(self):
      self.up = None
  property down:
    def __get__(self):
      return self.down
    def __set__(self, node):
      self.down = node
    def __del__(self):
      self.down = None
  def __call__(self, value):
    """
    Use __call__ to insert a node into a skip list.
    self.right is always defined since a skip list 
    is initialized with -inf and inf as the two sentinel 
    nodes.

    UNSAFE OUTSIDE OF SKIP LIST!
    """
    self.right.left = self.__class__(value)
    self.right.left.right = self.right
    self.right = self.right.left
    self.right.left = self    
  def __repr__(self):
    return "A skip list node with value {}".format(self.value)

cdef class CySkipList(object):
  cdef _CySLNode head
  cdef _CySLNode _tail
  def __init__(self):
    """
    Skip list class written in Cython for sorting integers.

    insert(value) inserts an integer into the list
    remove(value)
    """
    self.head = _CySLNode(float("-inf"))
    self._tail = _CySLNode(float("inf"))
    self.head.right = self._tail
    self._tail.left = self.head

  cdef _init_level(self):
    self.head.up = _CySLNode(float("-inf"))
    self.head.up.down = self.head
    self.head = self.head.up
    self.head.right = self._tail
    self._tail.left = self.head

  def insert(self, value):
    node = self.head
    while node.down:
      if value < node.right.value:
        node = node.down
      else:
        node = node.right
    while value > node.right.value:
      node = node.right
    node(value)
    self._raise_level(node.right)

  cdef _delete(self, node):
    node.right.left = node.left
    node.left.right = node.right

  def delete(self, value):
    node = self._find(value)
    if node is None:
      raise ValueError('{} not in skip list'.format(value))
    else:
      while node.up:
        self._delete(node)
        upnode = node.up
        del node.up
        node = upnode
        del node.down
        del upnode
      self._delete(node)

  cdef _find(self, value):
    node = self.head
    while node.down:
      if value < node.right.value:
        node = node.down
      else:
        node = node.right
    while value >= node.value:
      if value==node.value:
        return node
      else:
        node = node.right
    return None

  cdef _raise_level(self, node):
    if self._raise_check():
      up_check = node.left
      while up_check.up is None and up_check.left:
        up_check = up_check.left
        if up_check.up is not None:
          place_node = up_check.up
        else:
          self._init_level()
          place_node = self.head
        place_node(node.value)
        node.up = place_node.right
        place_node.right.down = node
        self._raise_level(self.head.right)

  cdef _raise_check(self):
    return (np.random.rand()>0.5)
  def __repr__(self):
    return "Skip list object"
  def reset(self):
    self.__init__()

One thing to notice about that code is that you have to explicitly define getters and setters for every variable you want exposed. Object oriented partisans will be happy, but I’m going to go with Guido van Rossum and say that data hiding is better in theory than in practice. Outside of that, I’m just going to let the code speak for itself. Parsing the implementation details will be left as an exercise for the reader.

The next test I want to run is memory usage. Because Python stores the namespace of each object in its own dictionary, the memory overhead of skip lists is pretty monumental. I’m running memory_profiler because heapy does not seem to like custom compiled objects without some additional headaches.

First, let’s look at the memory usage for the undecorated Cython class. Running memory_profiler gives us the following output:

python -m memory_profiler mem_prof.py 100
Filename: mem_prof.py

Line #    Mem usage  Increment   Line Contents
================================================
    12                           @profile
    13    18.457 MB   0.000 MB   def main():
    14    35.906 MB  17.449 MB     N = 1000
    15    35.906 MB   0.000 MB     data = np.random.rand(N).tolist()
    16                             
    17    18.617 MB -17.289 MB     s_list = Skip()
    18    35.922 MB  17.305 MB     for datum in data:
    19                               s_list.insert(datum)

We have to divide those numbers by the 100 runs, so we are using about 170 kB to store 1000 8 byte doubles. Ouch. Let’s see if the extension type does any better.

python -m memory_profiler mem_prof.py 100
Filename: mem_prof.py

Line #    Mem usage  Increment   Line Contents
================================================
    12                           @profile
    13    18.457 MB   0.000 MB   def main():
    14    27.543 MB   9.086 MB     N = 1000
    15    27.551 MB   0.008 MB     data = np.random.rand(N).tolist()
    16                           
    17    18.531 MB  -9.020 MB     s_list = Skip()
    18    27.672 MB   9.141 MB     for datum in data:
    19                               s_list.insert(datum)

We use close to a factor of two less memory with the Cython extension type. Since we also save about a factor of three in execution time, I’m going to hazard a guess that a lot of that time saved is loading and unloading memory. We’ll look into that in a moment. Skip lists have about ~2N nodes with four pointers per node, so we don’t expect them to be real memory efficient. A quick back of the envelope calculation suggests we should have ~ 2,000 nodes which take up 40 bytes (assuming 8 byte pointers and 8 byte doubles), so we’re still a little high in memory consumption. Let’s see how C++ does:

python -m memory_profiler mem_prof.py 100
Filename: mem_prof.py

Line #    Mem usage  Increment   Line Contents
================================================
    12                           @profile
    13    18.461 MB   0.000 MB   def main():
    14    24.859 MB   6.398 MB     N = 1000
    15    24.867 MB   0.008 MB     data = np.random.rand(N).tolist()
    16                             
    17    18.539 MB  -6.328 MB     s_list = Skip()
    18    24.922 MB   6.383 MB     for datum in data:
    19                               s_list.insert(datum)

That profiling result makes no sense to me. Our back of the envelope calculation indicates we should be using around 80 kB, and the profiler is saying we are using around 64 kB. After racking my brain, my only guess is that because of the way Python stores objects, including floating point numbers, the cost of the numbers themselves isn’t getting included in that 64 kB. Our skip list nodes use 32 bytes outside of the number itself, so that could be an explanation. If that explanation holds true, our memory profile results are pure overhead numbers, and they look even worse.

After looking through the perf results, the Cython skip lists seem to be worse in every regard except for instructions per cycle. Cython seems to be doing more with the data after it is loaded into cache, so it is able to perform more instructions before needing to load new data. Admittedly, porting Python code even to the Cython extension type was easier than writing a C++ template class from scratch, but I’m relatively confident that they would be effectively equally difficult to maintain/update. If you need a fast class in Python, I would recommend just skipping the Cython extension type until they have better tools for templates/generic programming and other types of polymorphisms.

I think I have one last skip list post left in me before I need to move onto something else. I’m still really curious what the performance hit would be to use smart pointers (Boost or C++11) instead of the raw pointers in the current implementation. Before doing that, I will need to move to a faster random number generator like one from Blitz++. After profiling the skip lists compiled with -O3, almost all of the execution time was spent in the random number generator. There is also some subtlety because what we really want is a random bit string and not simply a random integer. After that, this blog is likely to go in another direction, so please stick with me as I try to figure out what I’m doing here.

Advertisements

From → C++, Cython, Python C API

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: