How to break YAML in Ruby
YAML is a simple data serialisation language that has made it into Ruby’s standard library. It’s quite nice, and easy to edit for simple data structures.
However, there’s a significant problem related to the deserialisation of binary data, which I discovered several months ago. It’s a bit of a weird bug, too. I was reminded of the bug today, so here it is:
Ruby’s YAML.dump does a great job of escaping binary data where needed. Unfortunately, YAML.load cannot always read its own output back in.
I’ve experimented a bit, and found that a parse error occurs when
- A string is used as a hash key; and
- The string contains a byte 0x00; and
- The string contains a byte 0x0a in a non-final position.
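The three conditions above can be checked with a minimal round-trip sketch like the one below. The helper round_trip_status and the sample keys are my own illustrations, not the original test cases. The result may well depend on the Ruby version and YAML library in use, so no expected output is shown.

```ruby
require "yaml"

# Hypothetical helper (not from the original test cases): dump an object
# to YAML, load it back in, and report what happened.
def round_trip_status(obj)
  loaded = YAML.load(YAML.dump(obj))
  loaded == obj ? :ok : :mismatch
rescue StandardError
  # The exception class raised for a parse error varies between YAML
  # backends, so anything raised here is treated as a parse error.
  :parse_error
end

CASES = {
  # Hash key containing 0x00 and a non-final 0x0a: the suspect combination.
  "suspect key"        => { "a\x00b\nc" => 1 },
  # Control cases, each dropping one of the three conditions.
  "no 0x00 byte"       => { "ab\nc" => 1 },
  "0x0a only at end"   => { "a\x00bc\n" => 1 },
  "value, not a key"   => { "key" => "a\x00b\nc" }
}

CASES.each do |name, obj|
  puts "#{name}: #{round_trip_status(obj)}"
end
```

If the description above holds for your setup, only the first case should report a parse error; the three control cases should round-trip cleanly.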
A rather odd set of circumstances, I’m sure you’ll agree. I analysed the exact failure behaviour by serialising and deserialising random data, then comparing all the failed data until I had pinpointed the exact causes.
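The approach of serialising and deserialising random data can be sketched roughly as follows. This is a reconstruction under my own assumptions, not the script I originally used: collect_failures, the seed, and the key lengths are all made up for illustration.

```ruby
require "yaml"

# Hypothetical harness: use random binary strings as hash keys, round-trip
# each hash through YAML, and collect every key that fails to survive.
def collect_failures(trials:, seed: 1234, max_length: 16)
  rng = Random.new(seed) # seeded so that a run can be repeated exactly
  failures = []
  trials.times do
    key = rng.bytes(rng.rand(1..max_length))
    original = { key => true }
    begin
      failures << key unless YAML.load(YAML.dump(original)) == original
    rescue StandardError
      # A parse error (or any other exception) counts as a failure too.
      failures << key
    end
  end
  failures
end

failures = collect_failures(trials: 1000)
puts "#{failures.size} failing keys out of 1000"

# Bytes present in every failing key are candidates for the cause.
common = failures.map { |key| key.bytes.uniq }.reduce { |a, b| a & b } || []
puts "bytes common to all failures: " + common.map { |b| format("0x%02x", b) }.join(" ")
```

Intersecting the byte sets only hints at which bytes matter; working out that position matters as well (the non-final 0x0a) still takes a look at the failing keys themselves.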
I’ve made a small set of test cases to demonstrate the problem.