Protobufs just reinvented JSON

I’m not a super big fan of protocol buffers, but that is neither here nor there. With the new protocol buffer release, you can now completely reproduce JSON in protocol buffers with proto maps:

message JSON {
  oneof value {
    int64 i = 1;
    double f = 2;
    string s = 3;
    JSON_List l = 4;
    JSON_Map m = 5;
  }
}

message JSON_Map {
  map<string, JSON> m = 1;
}

message JSON_List {
  repeated JSON val = 1;
}

I can’t imagine why on earth you would want to do that, but I’m amused that you can.
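For a sense of how the recursion plays out, here's a pure-Python sketch (no protobuf involved, just a dict-based stand-in for the oneof) that tags an arbitrary JSON value the same way the message above would:

```python
def to_json_message(value):
    """Mirror the JSON message's oneof as a tagged dict.

    A plain-Python stand-in, not generated protobuf code: the keys
    'i', 'f', 's', 'l', and 'm' correspond to the oneof fields in
    the message definition above.
    """
    if isinstance(value, bool):
        # Note: the message above has no bool (or null) field.
        raise TypeError("no bool field in the oneof")
    if isinstance(value, int):
        return {"i": value}
    if isinstance(value, float):
        return {"f": value}
    if isinstance(value, str):
        return {"s": value}
    if isinstance(value, list):
        return {"l": [to_json_message(v) for v in value]}
    if isinstance(value, dict):
        return {"m": {k: to_json_message(v) for k, v in value.items()}}
    raise TypeError("unsupported type: %r" % type(value))

print(to_json_message({"a": [1, 2.5, "x"]}))
```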

CS Outreach Mumblings

I think we can all agree that Computer Science has trouble recruiting and retaining people, especially those from less privileged backgrounds. I wonder how much of that is the fact that we put steep learning curves in front of people before they can do anything interesting. Here’s a short story of my experience trying to test out a script a friend of mine sent me. For a little background, I’m running Ubuntu 14.04 and trying to run a Python script.

I get the script in the email and try to execute it. Sadly, my friend is a Windows user, which means I have to find a program to remove all the superfluous carriage returns [1]. Once the file is cleaned up, I try again to run the script. Unsurprisingly, I don’t have all of the dependencies installed on my system. No big deal, I’ll just add them. Of course, one of the dependencies isn’t available through Ubuntu’s package manager and has to be installed from the Python Package Index (PyPI). I’m the sort who heeds sys-admin warnings not to install PyPI packages into my system’s Python [2]. To avoid doing that, I need to set up a virtualenv, which lets me install Python packages without installing them system-wide. Upon trying to install the first package dependency, pip fails, and I’m forced to rely on Google. Apparently, I have to upgrade pip, using pip, just to install something else with pip. Long story short, this kind of mundanity continued for close to an hour.

Trying to put myself in the shoes of someone who is just learning to program, I’m not sure why I would want to keep banging my head against the wall before I could even start playing with the code. The Python community is generally good about being welcoming and encouraging to newcomers. That’s certainly part of why I have a career as a programmer at all. However, this is something we need to be better at. When core tools require non-trivial expertise to use and no one fixes them, that sends the signal that we don’t care whether people without that expertise can use our tools. We can do better.

1. For whatever reason, Windows uses two characters to represent a new line in a file: a carriage return character ‘\r’ followed by a newline character ‘\n’. Unix-based systems (including Mac OS X) use only the newline character. Unfortunately, the extra carriage return characters Windows adds to files cause programs on Unix-based systems like my Ubuntu OS to choke and fail to read the file properly.

2. Ubuntu uses Python for a lot of system management tasks. If you install a package to your system’s root, you run the risk of version conflicts that can break your OS. In short, never `sudo pip install` anything.
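As an aside on footnote 1: if you’d rather not hunt for a dedicated program, stripping the carriage returns takes a couple of lines of Python. A minimal sketch (the filename is made up):

```python
def strip_carriage_returns(data):
    """Convert Windows '\r\n' line endings to Unix '\n'."""
    return data.replace(b"\r\n", b"\n")

# Write a small demo file with Windows line endings, then clean it
# up in place. Binary mode avoids Python's own newline translation.
with open("from_friend.py", "wb") as f:
    f.write(b"print('hello')\r\nprint('world')\r\n")

with open("from_friend.py", "rb") as f:
    cleaned = strip_carriage_returns(f.read())

with open("from_friend.py", "wb") as f:
    f.write(cleaned)
```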

C++ co-routines have nothing to do with concurrency

Yes, the title is a little bit link-baity. It’s OK.

I just watched this excellent talk about the proposal to add co-routines to the C++17 standard. It’s a really interesting talk and gives you a sense of how to efficiently implement the future monad in an imperative language that compiles to machine code.

Now, here’s the rub: the core of the proposal (N4402) is adding a few keywords to C++ that are named after a concurrency pattern (co-routines) in spite of having nothing specific to do with concurrency. What is the proposal really doing? Bringing Haskell do-notation to C++!!! Haskell’s do-notation is easily my favorite syntactic sugar in the history of all syntactic sugar. Basically, do-notation makes working with monads palatable. Unfortunately, I do not have the time/space/sense of self-loathing to be able to try to describe monads and why they matter, so I’m going to dedicate the rest of this post to pointing out the obvious similarities.
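To make the analogy concrete without any Haskell or C++: Python’s async/await is, morally, the same sugar. Each await unwraps a future and the runtime stitches the continuations together for you, just as each line of a do-block binds a monadic value. A hedged sketch with made-up coroutine names:

```python
import asyncio

# Hypothetical async work standing in for real futures.
async def fetch_user_id():
    return 42

async def fetch_profile(user_id):
    return {"id": user_id, "name": "Ada"}

# Each 'await' unwraps a future and sequences the next step,
# much like each '<-' line in a Haskell do-block.
async def main():
    user_id = await fetch_user_id()
    profile = await fetch_profile(user_id)
    return profile["name"]

print(asyncio.run(main()))  # Ada
```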

A Simple Graph Algorithm

I wrote a simple graph algorithm to solve an NPR word puzzle and wrote up the solution on my GitHub.

An update about me: I’m still alive, working at Google, and mostly writing C++ these days. It seems I updated my ‘About’ section already in spite of not posting anything new. I have a few mostly-written blog posts that I intend to finish. Unfortunately, almost all of my coding right now is either for work and confidential, or too trivial to be worth writing about (dear god does the world not need another blog post on a prime number filter in Haskell; it’s an awesome language, but you can thank me for not writing that).

You will be hunted down for your getattr tricks!!!

I recently inherited a codebase with… issues. One of my favorite pastimes for improving a codebase is to scroll through log files and fix bugs. Today, I found this error:

  File ".../my_module.py", line 1337, in foo_method
    bar = foo.title,
AttributeError: 'Foo' object has no attribute 'title'

No worries, grep -r will save me! Or wait… it won’t. Because someone wrote this code:

def call_obliquely(self, method_name, args):
  return getattr(self, 'method_'+method_name)(args)

Please don’t write that. Someday, someone somewhere is going to need to refactor that function. If they can’t find where you call it, they can’t take your usage into account. Using the full method name is slightly better, because at least grep -r will find it. However, our IDE-using brethren will still be unable to use their automated refactoring tools to rename the function, and really, all their silly mouse-clicks have to be for something.
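If you genuinely need indirect dispatch, an explicit table keeps every call site grep-able. A sketch (the class and method names here are invented for illustration):

```python
class Handler:
    def method_start(self, args):
        return ("start", args)

    def method_stop(self, args):
        return ("stop", args)

    def call_obliquely(self, method_name, args):
        # An explicit dict instead of getattr string-pasting: every
        # method appears by its full name, so grep -r and automated
        # refactoring tools can still find the references.
        dispatch = {
            "start": self.method_start,
            "stop": self.method_stop,
        }
        return dispatch[method_name](args)

print(Handler().call_obliquely("start", [1, 2]))  # ('start', [1, 2])
```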

Generators generate modularity

Work has limited my time to post/do in depth work, but I wanted to write something about one of my favorite features of Python: generators. My secret goal in our current code base is to slowly make everything generators up and down. Maybe that is a touch facetious, but I do think generators are a great way to hide state and generate modularity.

Let’s start with an example. Can anyone tell me what is wrong with this code?

output_list = []
current_node = get_my_first_node()
while current_node:
  output_list.append(do_stuff(current_node.data))
  current_node = current_node.next_node

That’s a pretty straightforward example of iterating through a linked list and processing the data somehow. And you are correct, my dear reader, we are doing three logically distinct things with intermingled code. We are iterating over a linked list, processing the data, and accumulating the results of do_stuff() all together. We’ve written do_stuff to pull out a little bit of the complexity, but we’ve still kind of coded ourselves into a corner. What if we wanted to make this lazy? Why can’t I use my beloved list comprehension? What if I only wanted the first 5 items?

Ok, I went over the top a little there for a moment, but such is life. The solution is to abstract away the linked list with a generator:

def linked_list_generator(first_node):
  current_node = first_node
  while current_node:
    yield current_node
    current_node = current_node.next_node

Look at that! All of our linked list logic is hidden behind this interface. The yield statement means that linked_list_generator(some_node) returns an iterable. More concretely, it means that we can write for node in linked_list_generator(some_node) and have that behave exactly as other iterables do. Now, we can replace our bad code above with a beautiful list comprehension:

first_node = get_my_first_node()
output_list = [do_stuff(node.data)
               for node in linked_list_generator(first_node)]

Isn’t that so much cleaner? We’ve now separated out our iterating logic, our processing logic and our accumulating logic. All without creating unnecessary mutable state.
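And because the generator is lazy, the “first 5 items” wish from earlier falls out for free with itertools.islice. A self-contained sketch (the Node class is a minimal stand-in for whatever your linked list actually looks like):

```python
import itertools

class Node:
    """Minimal linked-list node for illustration."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next_node = next_node

def linked_list_generator(first_node):
    current_node = first_node
    while current_node:
        yield current_node
        current_node = current_node.next_node

# Build a list holding 0..9 by prepending.
first_node = None
for value in reversed(range(10)):
    first_node = Node(value, first_node)

# Lazily process only the first 5 nodes; the rest are never visited.
first_five = [node.data * 2
              for node in itertools.islice(linked_list_generator(first_node), 5)]
print(first_five)  # [0, 2, 4, 6, 8]
```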

If you’re looking to learn more about generators, you should probably look here.

Never leave the house with a bare exception

Sometimes best practices are just for aesthetics, and sometimes best practices exist so you aren’t stuck in an infinite loop that keeps catching your KeyboardInterrupts. There is basically no excuse to ever write code that looks like this:

try:
  do_something()
except:
  do_something_else()

If you don’t believe me, try this code:

while True:
  try:
    pass
  except:
    continue

Helpful hint: ctrl-\ will crash your interpreter, which is the only way to get out of that loop. Today, I interacted with some code which is the moral equivalent of the above, except that the infinite loop seemed to involve looping through a series of about five files. Those files were littered with bare except clauses. Finding the block catching my KeyboardInterrupts was effectively impossible. Just don’t do it.

If you insist on catching a plethora of exceptions, try:

while True:
  try:
    pass
  except Exception:
    continue

That code will at least let you escape with a KeyboardInterrupt. If you are using a package which throws an exception not inheriting from Exception, I suggest you either write wrappers to catch the package-defined exceptions, or just use a better package.
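The reason Exception is the right dividing line: KeyboardInterrupt (along with SystemExit) inherits from BaseException but not from Exception, so `except Exception` lets it propagate. You can check this yourself:

```python
# KeyboardInterrupt sits outside the Exception branch of the
# hierarchy, which is exactly why 'except Exception' doesn't
# swallow ctrl-C the way a bare 'except:' does.
assert issubclass(KeyboardInterrupt, BaseException)
assert not issubclass(KeyboardInterrupt, Exception)
assert issubclass(SystemExit, BaseException)
assert not issubclass(SystemExit, Exception)
print("KeyboardInterrupt escapes 'except Exception'")
```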