I spent some time on Friday trying to get to the bottom of a particularly strange effect in Ruby. Changing completely unrelated lines of code elsewhere in the project would change the behaviour of YAML and cause a test to fail.

Furthermore, whilst the puzzling behaviour was completely deterministic, it wasn’t reproducible on anyone else’s computer. Just mine.

We had a test that was effectively doing something like this:

expected_yaml = {:x => 1, :y => 2}.to_yaml
assert_equal expected_yaml, value_under_test

The value of expected_yaml was always

--- 
:x: 1
:y: 2

but value_under_test was sometimes

--- 
:x: 1
:y: 2

and sometimes

--- 
:y: 2
:x: 1

Remember that hashes don’t have a defined order (this was true of Ruby before 1.9; later versions preserve insertion order), so YAML is free to emit the elements in any order it wishes. Thus, YAML wasn’t doing anything wrong. In fact, the test itself was at fault, but I’ll look at that in a minute.

By adding four lines of code in a completely different test file, I could reliably reproduce the failure. Here they are:

def test_
end
def i_am_never_called
end

They both had to be there: comment out either of those methods, and the original test passed.

So why would defining a method that’s never used affect YAML? The answer lies in how value_under_test was being generated. The hash that was used to generate the YAML value under test was constructed differently from the hash from which the expected YAML was generated.

Take two hashes, a and b:

a = {:x => 1, :y => 2}
b = {:x => 1}
b[:y] = 2

They are equal in Ruby. They represent the same data. They are almost identical.

a == b # true

But YAML isn’t written in Ruby. Most of it is in the form of a C extension that works directly with Ruby’s internal models. And there, in those shark-infested waters, those two hashes are not the same.
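The same trap can be shown on Ruby 1.9 and later, where hashes keep insertion order. This is only a rough sketch and not the original failing case. It assumes a modern Ruby with the bundled yaml library, and it builds b with the keys inserted in the opposite order (unlike the a and b above) so that the difference is visible:

```ruby
require 'yaml'

# Two hashes holding the same data, built in a different key order.
a = {:x => 1, :y => 2}
b = {:y => 2}
b[:x] = 1

# Hash equality ignores order, so Ruby considers them equal.
hashes_equal = (a == b)

# The YAML text follows the order in which the keys are enumerated,
# so comparing the serialized strings gives a different answer.
yaml_equal = (a.to_yaml == b.to_yaml)

puts "hashes equal: #{hashes_equal}"
puts "YAML equal:   #{yaml_equal}"
```

Equality of the data and equality of its serialized text are separate questions, and the failing test was asking the second one.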

My hand-waving theory is that the spooky behaviour is to do with Ruby’s symbol table: defining those two extra methods added two more symbols and triggered an expansion of the symbol hash table, affecting the order in which YAML received the keys of the hash under test.

Going back to the actual test, I rewrote it to check that the generated YAML decoded into the expected hash values:

expected = {:x => 1, :y => 2}
assert_equal expected, YAML.load(value_under_test)
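Here is a self-contained sketch of why the decoded comparison is more robust. The literal string assigned to value_under_test is a hypothetical stand-in for whatever the real code produced, written with the keys in the "wrong" order, and it assumes a Ruby whose YAML.load accepts symbols (the default in current versions):

```ruby
require 'yaml'

expected = {:x => 1, :y => 2}

# Hypothetical output from the code under test, with the keys reversed.
value_under_test = "---\n:y: 2\n:x: 1\n"

# Decode the YAML back into a hash before comparing.
decoded = YAML.load(value_under_test)

# Comparing hashes ignores key order, so this check passes...
puts decoded == expected

# ...whereas comparing the raw YAML text is sensitive to key order.
puts value_under_test == expected.to_yaml
```

The first comparison depends only on the data that was serialized, which is what the test actually cares about.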

The moral of the story is, I suppose, not to rely on behaviour that is not defined. In this case, that was the order in which YAML emits the keys of a hash. It might be consistent most of the time, but, occasionally, the flap of a butterfly’s wing in a far-off part of the application can cause some completely different and quite perplexing results.