# Testing C++ Skip Lists

We all know testing is important. And hard. Writing good tests that cover all likely use cases and run in a reasonable amount of time. You know what makes that harder? When your function returns a different value every time even with the same input. Random data structures and algorithms provide a unique challenge in writing good tests. This post is going to discuss how I went about writing up unit tests for my C++ skip list implementation.

Before getting into the actual details, let’s start with how to get Boost.Test up and running. I would recommend you start here. Sometime in between the Boost 1.34.1 he used and Boost 1.48, there was a change to the Boost unit testing library adding another required macro. Jason Sankey’s hello world needs to be modified by adding #define BOOST_TEST_MAIN:

#define BOOST_TEST_DYN_LINK #define BOOST_TEST_MAIN #define BOOST_TEST_MODULE Hello #include <boost/test/unit_test.hpp> int add(int i, int j) { return i + j; } BOOST_AUTO_TEST_CASE(universeInOrder) { BOOST_CHECK(add(2, 2) == 4); }

Before going over the tests let’s take a look at the code. Frist, the signature:

template <class T> class Node { protected: T value; public: Node * left; Node * right; Node * up; Node * down; Node(T value) : value(value) { left = right = up = down = NULL; }; T get_value() {return value;}; }; template <class T> class SkipList { //Insert node with value to right of node void list_insert(Node<T>* node, T Value); void raise_level(Node<T>* node, T value); void insert_level(); bool raise_check(void) { return (p_up > distribution(generator)); }; void delete_list(Node<T>* node); void delete_node(Node<T>* node); void copy(SkipList<T>& new_, const SkipList<T>& other); std::mt19937 generator; std::uniform_int_distribution<unsigned int> distribution; unsigned int rand_check; public: const T p_inf; const T n_inf; Node<T> * head; Node<T> * tail; int levels; int n_elem; float p_up; SkipList(); SkipList(const SkipList<T>& list_to_copy); SkipList<T>& operator=(const SkipList<T>& rhs); ~SkipList(); void set_seed(); void insert(const T value); bool remove(T value); Node<T> * find(T value); void reset(); };

I’ve made the classes slightly Pythonic in the sense that almost all of the members of the two classes are public. The only things that are private are those that would probably destroy the structure of the lists if invoked by an outside caller. The code is definitely more verbose–I think I typed the line `template <class T>`

approximately 100 times, but the only real difference in code complexity is the C++ object-oriented work. In particular, because we have dynamic memory allocation, we need a custom destructor, copy constructor and assignment operator. Those three functions make up another ~50 lines of code. The member function/methods are basically equivalent to the Python versions. For example, here is the implementation of insert:

template <class T> void SkipList<T>::insert(const T value) { if (value == n_inf || value == p_inf) return; rand_check = distribution(generator); Node<T> * node = head; while (value >= node->value) { if (value < node->right->get_value() && node->down) node = node->down; else node = node->right; } n_elem++; list_insert(node->left, value); if (raise_check()) raise_level(node->left, value); } template <class T> void SkipList<T>::raise_level(Node<T>* node, T value) { Node<T> * upnode = node->left; while (!(upnode->up) && upnode->left) upnode = upnode->left; if (upnode->up) upnode = upnode->up; else { insert_level(); upnode = head; } list_insert(upnode, value); upnode->right->down = node; node->up = upnode->right; if (raise_check()) raise_level(upnode->right, value); } template <class T> void SkipList<T>::insert_level() { head->up = new Node<T>(n_inf); head->up->down = head; head = head->up; tail->up = new Node<T>(p_inf); tail->up->down = tail; tail = tail->up; head->right = tail; tail->left = head; levels++; }

I’m using some bit-array trickery to allow me to only draw one random number on an insert, but other than that, the most significant difference in implementation is the fact that integer types don’t have a built-in infinity value. The right way to do this would be to define our own infinity template and overload the comparison operator. For now, I’m just using the numerical_limits to get +/- infinity if we can or just the largest/smallest representable value. That approach limits us to numeric types, but refactoring the code to be more general would not be difficult.

Since we are good developers committed to test-driven development, we are going to write up some good, consistent unit tests. We start writing up our unit tests for the private member functions before moving on to higher level tests, and we suddenly realize: we don’t know what our skip list is going to look like. In fact, the structure of our skip lists will likely change from run to run. When I tested the Python implementation, I simply set the random number generator seed to a fixed number and solved the skip list structure by hand. I have absolutely no desire to do that again. Also, that approach would mean that changing the (pseudo-)random number generator would break our tests. We write unit tests in part so that we can confirm we didn’t break anything when I tinker. We can start with some obvious invariants. First, let’s set up our test fixture so that we don’t have to recreate our skip lists over and over:

struct RandomSList { SkipList<int> s_list; RandomSList() { default_random_engine generator; uniform_int_distribution<int> rand_int(0, MAX_INSERT); for (int i=0; i<MAX_INSERT/10; i++) s_list.insert(rand_int(generator)); } }; BOOST_FIXTURE_TEST_SUITE(test_list_function, RandomSList) /* ... Our tests here ... */ BOOST_AUTO_TEST_SUITE_END()

We can break tests of random data structures into two categories: deterministic invariants and probabilistic tests. Unfortunately, a judgement call will need to be made in how frequently you want your test to fail on a correct implementation. Starting out, we know every element of every list better be less than or equal to it’s right neighbor, so we can start by testing that. A whole bunch of `BOOST_CHECK_LE(node->get_value(), node->right->get_value());`

later, we know that our skip list is properly ordered:

BOOST_AUTO_TEST_CASE(test_ordering) { s_list.insert(1003); s_list.insert(1003); // Ensure a duplicate insert was attempted Node<int> * row, * node; row = node = s_list.head; while (node->down) { node = row; while (node->right) { BOOST_CHECK_LE(node->get_value(), node->right->get_value()); if (node->up) BOOST_CHECK_EQUAL(node->get_value(), node->up->get_value()); if (node->down) BOOST_CHECK_EQUAL(node->get_value(), node->down->get_value()); node = node->right; } row = row->down; } }

We should also check that every connection is reciprocal, and versions of `BOOST_CHECK_EQUAL(node, node->right->left);`

will do the trick. We can add more tests to ensure that insert, delete, find and our various constructors are behaving as expected. However, we still don’t know if levels are being created properly.

In order to check the skip list’s levels, we really need to run a statistical test. If ten inserts in a row lead to elements in the second level of our skip list, we can never know if our implementation is broken or if we are simply one in a thousand unlucky. Thankfully, we can just run our test again and find out. The one statistical test I wrote tests for the rate at which nodes are promoted up a level. If this were deployed with large data sets, you would also want to test for correlations, but we’re not going that far. Our promotion decisions should be independent binomial trials, so we can use a normal approximation for the number of elements promoted to the next level given a number of total nodes. The test I wrote throws a warning if the number of promoted nodes is off by three standard deviations:

BOOST_AUTO_TEST_CASE(test_new_levels) { Node<int> * row, * node; row = s_list.head; while (row->down) row = row->down; while (1) { node = row; // Node count starts at -1 to avoid counting sentinels int nodes = -1, upnodes = 0; while (node->right) { if (node->up) upnodes++; nodes++; node = node->right; } if (nodes < 10) break; BOOST_WARN_MESSAGE(abs((float) s_list.p_up*nodes-upnodes) < 3*std_dev(nodes), "Level increase rate > 3*sigma away from expected.\n" << "Total nodes = " << nodes << "; Nodes pointing up = " << upnodes << "\n"); if (row->up) row = row->up; else break; } }

For the bottom list with 1000 elements, that test is going to fail >50% of the time if the probability of being promoted is >0.55 or <0.45. That is not incredibly sensitive, but it is always going to catch catastrophic failures.

As usual, all the code can be found at GitHub. Before you spend too much time twisting your brain around test_skip.cpp, I will forewarn you that file needs a lot of work. I’ve only been able to get Boost.Test to report the test and line number where the test failed. Because of that, the test program includes a ton of repeated code. There is also no testing of private members yet. In short, I’m relatively confident the skip list implementation is safe, but the tests aren’t strong enough to ensure that future versions are that way.

I ran some quick tests after wrapping the C++ code with Boost::Python, and I saw a 10-20x speed up in inserts/deletes and around a factor of 10 in memory savings. I was very impressed with Boost::Python, but it is not very portable since a user needs Boost::Python installed to compile your extension. In my next post, I will also wrap the C++ code in Cython and compare both to straight Cython and pure Python implementations. As a warning, the code presented here requires C++11, but I think the only real dependency is the random number generator. I've also done all the memory handling manually to get things up and running quickly. Don't worry, the code has been properly valgrind'ed and comes up clean. There are some significant object safety questions in the implementation, but as they say, we're all adults here. You can traverse the list yourself since the head and tail are exposed, but In a future post, I may try Boost and/or C++11 smart pointers and compare performance. It just seemed simpler to me to write the memory handling myself for a first pass.

EDIT: Adding the -O3 flag to g++ brings the speed up to about a factor of 100. More details soon.