YAML is a simple data serialisation format that has made it into Ruby’s standard library. It’s quite nice, and easy to edit by hand for simple data structures.

However, there’s a significant problem related to the deserialisation of binary data, which I discovered several months ago. It’s a bit of a weird bug, too. I was reminded of the bug today, so here it is:

Ruby’s YAML.dump does a great job of escaping binary data where needed. Unfortunately, YAML.load cannot always read that output back in.
I’ve experimented a bit, and found that a parse error occurs when all of the following are true:

  • A string is used as a hash key; and
  • The string contains a byte 0x00; and
  • The string contains a byte 0x0a in a non-final position.
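
The three conditions above can be checked with a small round-trip script. This is only a sketch: `yaml_round_trip` and the sample keys are hypothetical names made up for illustration, and the outcome may well depend on the Ruby version and the YAML implementation behind it, so no expected output is shown.

```ruby
require 'yaml'

# Hypothetical helper (an assumption, not part of any library):
# dump an object, load it back in, and report what happened.
def yaml_round_trip(obj)
  restored = YAML.load(YAML.dump(obj))
  restored == obj ? :ok : :mismatch
rescue StandardError
  # A parse error surfaces as an exception raised by YAML.load
  :error
end

suspect_key = "a\x00b\nc"  # contains 0x00, and 0x0a in a non-final position
control_key = "a\x00bc\n"  # contains 0x00, but 0x0a only in the final position

puts yaml_round_trip({ suspect_key => 1 })  # string used as a hash key
puts yaml_round_trip({ control_key => 1 })  # control: newline is final
puts yaml_round_trip([suspect_key])         # control: same string, not a hash key
```

If only the first line prints `error`, the installed YAML library shows the behaviour described here.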

A rather odd set of circumstances, I’m sure you’ll agree. I analysed the failure behaviour by serialising and deserialising random data, then comparing all the inputs that failed until I had pinpointed the exact causes.
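
That kind of random round-trip search might look roughly like the sketch below. It is not the original script: `random_bytes`, `failing_keys` and `common_bytes` are hypothetical helpers, and it assumes the random data is used as a hash key, since that is where the failures showed up.

```ruby
require 'yaml'

# Hypothetical helper: a random binary string of the given length.
def random_bytes(length, rng)
  Array.new(length) { rng.rand(256) }.pack('C*')
end

# Round-trip many random hash keys and collect the ones that fail,
# either by raising an exception or by coming back different.
def failing_keys(trials, max_len, rng = Random.new)
  failures = []
  trials.times do
    key = random_bytes(rng.rand(1..max_len), rng)
    original = { key => true }
    begin
      failures << key unless YAML.load(YAML.dump(original)) == original
    rescue StandardError
      failures << key
    end
  end
  failures
end

# Byte values present in every one of the given strings.
def common_bytes(keys)
  keys.map { |k| k.bytes.uniq }.reduce { |a, b| a & b } || []
end

failures = failing_keys(1000, 16)
puts "#{failures.size} failing keys"
puts "bytes common to all failures: #{common_bytes(failures).inspect}"
```

Intersecting the byte values of the failing keys is just one simple way to narrow things down. Positional conditions, such as the newline not being the final byte, still have to be spotted by inspecting the failures directly.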

I’ve made a small set of test cases to demonstrate the problem.