experiences in serialization

The short of it: use Jerkson — you can easily serialize case classes:

// Jerkson (com.codahale.jerkson) serializes case classes out of the box
import com.codahale.jerkson.Json

case class Author(name: String)
case class Document(title: String, data: String, authors: List[Author])
...
val encoded = Json.generate(doc)
val decoded = Json.parse[Document](encoded)
decoded should equal(doc)

I’ve been working some more with Scala, and found that I needed to serialize some data (I’m working with Hadoop).

Unfortunately, as with most things in life, I’ve been presented with all too many choices:

I first tried Avro, since it’s part of the Hadoop project, so I thought it would be the most seamless way of getting things working.

Sadly, this was not to be. Avro-generated classes do not implement Hadoop’s Writable interface, so they can’t simply be dropped in. Instead, you’re required to change all of your mappers/reducers to take AvroKey/AvroValue-wrapped items, and to set your input/output via the AvroInputFormat/AvroOutputFormat classes. This, while tedious, would be fine, except that I hit a Scala compiler bug when trying to get it all working.

My other thought was to simply convert the Avro object instances into strings myself, and then output strings from Hadoop. Hacky, but hey, it would work.

Except: after digging through the Avro documentation, I couldn’t find a way to just turn my Avro structures into a serialized string. I could send them off to another server via an RPC, but dumping them to a file myself was out of the question. Sigh.

A note to serialization designers: please, please, please — give me an easy way to turn an object into a string.

I then started puttering around with the various JSON projects for Scala. Since there is no standard way of doing it, there are a lot of cobbled-together options, and I had to try several before finding one that worked. Jerkson, despite the odd name, “just works”. I specify my objects as case classes, and, magic, they can be serialized.

So now I’ve gone from outputting nice typed structures from Hadoop to just dumping strings and interpreting them myself. But I’m okay with that – it works.
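For the curious, a mapper that dumps these strings might look roughly like the sketch below. The class name and input handling are invented for illustration; the Json.generate call is the real Jerkson API, the Mapper signature is the standard Hadoop one, and the Document case class is the one from above.

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper
import com.codahale.jerkson.Json

// Hypothetical mapper: build a typed Document, then emit it as a JSON string.
class DocumentMapper extends Mapper[LongWritable, Text, Text, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    // Pretend each input line is the title of a document we want to emit.
    val doc = Document(title = value.toString, data = "", authors = Nil)
    context.write(new Text(doc.title), new Text(Json.generate(doc)))
  }
}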

Operator Overload

For the last week I’ve been using Scala to work on a pet project of mine. It’s a relatively nice language, but one quirk (which I’ve also noticed in Haskell) is the extensive use of operator overloading.

For example, here’s a code completion pop-up:

I want to share a rule of thumb that’s useful for these situations:

  • Operators are functions
  • Functions should have useful names

If I’m operating on 2 integers, then +, *, /, etc. all have perfectly obvious interpretations. If I’m operating on an XML node, then what do things like “/:”, “\\”, and “\” mean? And more importantly, is that meaning conveyed better than using an alternative such as ‘findAll’?
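To be concrete, here’s a tiny sketch using scala.xml (the document itself is invented): ‘\’ selects direct children and ‘\\’ selects all descendants, which is exactly the kind of thing a name like ‘findAll’ would say outright.

// Selecting nodes with scala.xml's overloaded operators (example data made up)
val library = <library><shelf><book id="1"/><book id="2"/></shelf></library>

val shelves = library \ "shelf"   // direct children named "shelf"
val books   = library \\ "book"   // all descendants named "book"
// vs. hypothetical named methods: library.child("shelf"), library.findAll("book")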

I hate to be curmudgeonly, but there’s a point where I’d rather type ‘append’ than ‘+:=’.

the buddhist nature of swimming

I’ve been reading The Empty Mirror. It’s a fascinating little book about a Dutchman and a year he spent in a Japanese Zen monastery a few years after WWII. It’s a really quick read, and I recommend it – I just happened upon it in the local used bookstore, but I think it’s worth the Amazon price too.

The book, as expected, focuses on the author’s time in the monastery, his interactions with the monks, and the trials, stresses, and achievements he gets out of the whole thing. Since I’ve also been swimming more frequently these days, it was natural for me to think about how the two activities are somewhat related. In some sense, swimming is my form of meditation – it’s an activity where you can completely empty your mind. That’s especially true when you’re working hard and the pain from your muscles wipes everything else out.

I’ve never really done an extensive survey on this, but from personal experience, swimmers are pretty mellow people. Maybe it’s due to their extensive Buddhist training? I should arrange for a conference between some monks and swimmers to investigate more. But first I should probably practice meditating (or swimming) some more.

Review: Dark Silicon and the End of Multicore Scaling

Paper link: p365-esmaeilzadeh

Premise

Single core scaling stopped some number of years ago. But we’re okay, because we now have the “multi-core revolution” (I realllly hate that phrase). Or are we?

It turns out that another, very real wall is blocking future progress — power density. Already, CPU designs strive mightily to shut down or slow down underutilized circuitry to save power. But we’re reaching the point where processors simply won’t ever be able to run “full-out”.

This paper explores the future of scaling by extrapolating from a wide range of processors over time, and the authors find, rather compellingly, that we’re pretty much boned.

Details

The paper is pretty dense, but it boils down to the creation of a model parameterized by processor features:

  • number of cores
  • number of threads
  • CPU frequency
  • L1/L2/DRAM cache size, speed, miss rate
  • DRAM fetch width

and code features:

  • % code that is parallel
  • % read instructions

You can fit this model based on existing processors and their benchmark performance, as well as their power consumption and die area.

Once the model is created, now the fun starts — we can see how performance will scale with future process improvements (moving to 32, 22, 16…nm), or if we assume highly parallel code, we can see how to optimize our CPU design (lots of cores)!
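To give a feel for what such a model predicts, here is a toy sketch (my own numbers and drastic simplifications, not the paper’s actual model): Amdahl’s-law speedup, with the usable core count capped by a fixed chip power budget.

// Toy dark-silicon model: speedup vs. cores when power limits how many can be lit.
// All parameters below are invented for illustration.
object DarkSiliconToy {
  // Amdahl's law: speedup on n cores when a fraction f of the code is parallel.
  def amdahl(f: Double, n: Int): Double = 1.0 / ((1.0 - f) + f / n)

  // Only as many cores as the power budget allows can actually run.
  def poweredCores(coresOnDie: Int, wattsPerCore: Double, budgetWatts: Double): Int =
    math.min(coresOnDie, (budgetWatts / wattsPerCore).toInt)

  def main(args: Array[String]): Unit = {
    val f = 0.95  // assume 95% of the workload is parallel
    for (cores <- List(16, 64, 128, 256)) {
      val lit = poweredCores(cores, wattsPerCore = 1.5, budgetWatts = 120)
      println("%4d cores on die, %3d powered -> %.1fx speedup"
        .format(cores, lit, amdahl(f, lit)))
    }
  }
}

Even with generous numbers, adding cores beyond what the budget can power buys you nothing, which is the “dark silicon” point in miniature.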

The short of it

If we make optimistic assumptions for how power consumption drops with process scaling, we still end up with the sobering conclusion that we’re rapidly reaching the point where we won’t be able to power all of the parts of a chip. 128 cores aren’t as useful if we can’t keep them all running.

On the plus side, I can buy 1000 watt power supplies now, so I’m looking forward to my 16 processor, 16 core machine of the future. Welcome to the multiprocessor revolution!

fun with shared libraries — version GLIBC_2.14 not found

The shared library system for Linux, especially when it comes to libc, is (choose one): archaic|complex|awesome. Today, while trying to run some software on our local cluster, I encountered this lovely error:

/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not found

Why was this occurring? For some reason, there was a version bump in glibc for the memcpy symbol. How memcpy’s interface could possibly have changed in a non-backward-compatible fashion, I don’t know. Still, since my local machine has a later libc than our cluster machines, I’m effectively boned.

There is a way around this, namely, I can ask the compiler to emit references to a specific, older version:

__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");

Unfortunately, this code has to appear before any other uses of memcpy! Rather than update a whole lot of code, I took advantage of the -include option for gcc. I simply created a file with my desired symbol override:

/* gcc-preinclude.h */

__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");

and forced the build (here, autotools) to always include that before any other code:

./configure CC="gcc -include /path/to/gcc-preinclude.h" ...

And away we go. A full rebuild and I’m back to a usable program again. Hurray!

Review: DThreads: Efficient Deterministic Multithreading

I’ve decided to try to be more proactive about reading papers from the community. Since I tend to forget everything I read, it’s best I write down my thoughts somewhere for later reference — I’ve decided to post these reviews here in case they prove useful to others.

PDF of the paper

Overview:

This paper follows one of the latest fads in systems (NB: academics have fads just like fashionistas) — deterministic threading. The main idea behind all of these papers is to let users make use of multi-core/multi-threaded programming, but at the same time have it behave deterministically. This is helpful because it ensures the behavior we see when testing is the same as the behavior we see in production, and it also makes it easier to reproduce test failures.

This is a non-trivial problem, and it has a lot of work behind it already. The contribution of DThreads is that it (a) claims high performance, and (b) provides a drop-in replacement for existing pthreads.

How does it work:

There’s a standard trick for deterministic threading, and DThreads uses it as well. It’s somewhat akin to transactions in databases. First, split each thread into a separate world that can’t communicate with the other threads, and buffer all of the updates that would go to shared memory. Then, at a synchronization point (mutex lock, condition variable, etc.), wait for all of the threads to finish their work, and apply the shared memory changes in a deterministic fashion.

The DThreads way of handling this is to run each thread as a separate process. Whenever a commit point is reached, the processes determine which parts of memory they’ve changed, and commit them to shared memory at that time.

There are a number of things that need to be handled to make this all possible — you need to provide a deterministic malloc, you have to ensure threads are created and destroyed in a deterministic way, etc.
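As a toy illustration of the buffer-then-commit idea (my own simplification in Scala, not DThreads’ actual mechanism), here each worker writes only to a private buffer, and the buffers are applied to shared state in a fixed order once everyone reaches the sync point:

// Buffer writes privately, then commit them in a deterministic order.
import scala.collection.mutable

object DeterministicCommitToy {
  def main(args: Array[String]): Unit = {
    val shared  = mutable.Map[String, Int]()
    val buffers = Array.fill(4)(mutable.Map[String, Int]())

    // Each "thread" touches only its own buffer, never `shared`.
    val threads = (0 until 4).map { id =>
      new Thread(new Runnable {
        def run(): Unit = { buffers(id)("winner") = id }  // stand-in for real work
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())     // the synchronization point

    // Commit buffered writes in a fixed order (by thread id), so the final
    // contents of `shared` are identical on every run, regardless of timing.
    for (id <- 0 until 4; (k, v) <- buffers(id))
      shared(k) = v

    println(shared)               // always Map(winner -> 3)
  }
}

With ordinary threads racing on the shared map, “winner” would depend on scheduling; with ordered commits it comes out the same every run, which is the whole point.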

Why aren’t we using it today?

Two reasons, really.

Speed

DThreads is faster than previous work on the subject, but it can still end up a lot slower than regular threading. This isn’t unexpected: you have to do more work, so you end up paying for it.

Correctness

DThreads, while being a drop-in replacement for pthreads, isn’t really a drop-in replacement for pthreads. Unfortunately for deterministic threading proponents, this type of code:

Thread 1:

while not done:
  sleep(0)

Thread 2:

... do some work ...
done = 1

where synchronization is handled outside the scope of the threading library is all too common in the wild. This forces the use of more expensive techniques than DThreads, which keeps deterministic threading, for the moment, an academic exercise.

What did I think of the paper?

It’s well written (which I expect of all SOSP papers), and it presents a bunch of interesting tricks for improving performance and making things work. The requirement that all synchronization be performed through pthreads calls is, on the face of it, reasonable, but sadly it’s far from realistic when it comes to running everyday applications.

Still, fun to read.