I was playing around with some XML parsing the other day, and allowed myself to become sidetracked. This is what resulted: a really easy way to extract data from an XML document. It’s called Xanimal for no better reason than that it contains X, M and L in that order.

It works by catching messages and building them up into XPath expressions, so that, for example, document.foo.bar[0] translates into '/foo/bar[1]' (note that it translates Ruby 0-indexing into XPath 1-indexing). It also allows iteration over nodes using each (or any other Enumerable method).
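
To make the idea concrete, here is a minimal sketch of message catching, not Xanimal's actual source. The `PathBuilder` class and its `to_xpath` method are names invented for this illustration, and it only builds the XPath string without touching any XML.

```ruby
# Hypothetical illustration only: PathBuilder is not part of Xanimal.
class PathBuilder
  # Plain identifiers become path steps. Names starting with to_ are left
  # alone so Ruby's implicit conversions (to_ary, to_str) keep working.
  STEP_NAME = /\A(?!to_)[A-Za-z_]\w*\z/

  def initialize(path = '')
    @path = path
  end

  # An unknown, argument-free message becomes a child step: .foo => "/foo"
  def method_missing(name, *args)
    return super unless args.empty? && STEP_NAME.match?(name.to_s)
    PathBuilder.new("#{@path}/#{name}")
  end

  def respond_to_missing?(name, include_private = false)
    STEP_NAME.match?(name.to_s) || super
  end

  # Ruby indexes from 0, XPath positions from 1, so add one.
  def [](index)
    PathBuilder.new("#{@path}[#{index + 1}]")
  end

  def to_xpath
    @path
  end
end

document = PathBuilder.new
puts document.foo.bar[0].to_xpath # prints /foo/bar[1]
```

Element names that clash with methods every Ruby object already has (class, for instance) would never reach method_missing, so they would need special handling that this sketch ignores.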

There’s also some potentially controversial functionality: if a non-alphabetic method is called on a node (e.g. +), the node is automatically coerced into a String, Float, or Integer, depending on the format of its text.
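
One way such coercion could work is sketched below. This is an assumption about the general approach, not the library's real code: `Coercion`, `coerce_text` and `TextNode` are hypothetical names, and the rules for what counts as an Integer or a Float are my own guess.

```ruby
# Hypothetical illustration only: none of these names come from Xanimal.
module Coercion
  INTEGER = /\A[-+]?\d+\z/
  FLOAT   = /\A[-+]?\d+\.\d+\z/

  # Choose a Ruby type based on what the text looks like.
  def self.coerce_text(text)
    stripped = text.strip
    case stripped
    when INTEGER then stripped.to_i
    when FLOAT   then stripped.to_f
    else text
    end
  end
end

class TextNode
  ALPHABETIC = /\A[A-Za-z_]/

  def initialize(text)
    @text = text
  end

  # Operator-style messages (+, *, < and so on) are forwarded to the
  # coerced value; alphabetic ones fall through to the default behaviour.
  def method_missing(name, *args, &block)
    return super if ALPHABETIC.match?(name.to_s)
    Coercion.coerce_text(@text).public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    (!ALPHABETIC.match?(name.to_s) &&
      Coercion.coerce_text(@text).respond_to?(name)) || super
  end
end

puts TextNode.new('2') + 3    # prints 5
puts TextNode.new('1.5') * 2  # prints 3.0
puts TextNode.new('ab') + 'c' # prints abc
```

Operators that every object already defines, such as ==, never reach method_missing, so a fuller version would have to override those explicitly.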

Here’s an example of how to use Xanimal:

require 'xanimal'

xml = %{
  <root>
    <a name="alpha">
      <b>1</b>
    </a>
    <a name="beta">
      <b>2</b>
      <b>3</b>
    </a>
  </root>
}

doc = Xanimal::Document.string(xml)

# Attributes
doc.root.a.attr(:name) # => "alpha"

# Specifying nodes by index
doc.root.a[1].b[1].to_i # => 3

# Enumeration with automatic coercion
doc.root.a[1].b.inject(0){ |sum, value| sum + value } # => 5

# Unanchored search
doc.any.b.to_i # => 1

# Doesn't work ... yet
# doc.root.a.b.inject(0){ |sum, value| sum + value } # => 6

Even in its basic state, I think it offers a few advantages over using an XML parser directly, at least in the situations where its features are adequate:

  • It looks like Ruby
  • It behaves like Ruby (e.g. by translating indexing offsets)
  • The XML parser is abstracted: I’ve used LibXML2, but it could be swapped for REXML or something else without altering client code.

The code is available via Subversion:

svn co http://paulbattley.googlecode.com/svn/xanimal/trunk xanimal

Contributions would be very welcome!