Xanimal: Easy XML parsing in Ruby

I was playing around with some XML parsing the other day, and allowed myself to become sidetracked. This is what resulted: a really easy way to extract data from an XML document. It’s called Xanimal for no better reason than that it contains X, M and L in that order.

It works by catching messages and building them up into XPath expressions, so that, for example, document.foo.bar[0] translates into '/foo/bar[1]' (note that it translates Ruby 0-indexing into XPath 1-indexing). It also allows iteration over nodes using each (or any other Enumerable method).
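
The message-catching idea can be sketched in a few lines. This is not Xanimal's code. `PathBuilder` is a hypothetical class, and it assumes `method_missing` is the hook that catches the messages. It only builds the XPath string and never queries a document.

```ruby
# A minimal sketch of turning chained method calls into XPath strings.
# PathBuilder is a hypothetical name, not Xanimal's own code, and it never
# touches an XML parser. Subclassing BasicObject keeps inherited methods
# such as #hash or #display from shadowing element names.
class PathBuilder < BasicObject
  ELEMENT_NAME = /\A[a-z_]\w*\z/i

  def initialize(steps = [])
    @steps = steps
  end

  # An unknown, argument-free call appends an element step: foo.bar -> /foo/bar
  def method_missing(name, *args)
    return super unless args.empty? && ELEMENT_NAME.match?(name.to_s)
    ::PathBuilder.new(@steps + [name.to_s])
  end

  # A 0-based Ruby index becomes a 1-based XPath positional predicate.
  def [](index)
    ::Kernel.raise ::ArgumentError, 'nothing to index' if @steps.empty?
    ::PathBuilder.new(@steps[0...-1] + ["#{@steps.last}[#{index + 1}]"])
  end

  def to_xpath
    '/' + @steps.join('/')
  end
end

doc = PathBuilder.new
puts doc.foo.bar[0].to_xpath      # prints /foo/bar[1]
puts doc.root.a[1].b[1].to_xpath  # prints /root/a[2]/b[2]
```

Inside a `BasicObject` subclass, top-level constants have to be written with a leading `::`, which is why the sketch refers to `::PathBuilder` and `::Kernel`.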

There’s also some potentially controversial functionality: if a non-alphabetic method is called on a node (e.g. +), the node is automatically coerced into a String, Float, or Integer, depending on its format.
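
Here is a minimal sketch of that coercion rule, under some stated assumptions. `TextNode` is a hypothetical wrapper around an element's text, and numbers are recognised with deliberately simple regular expressions. The `coerce` method uses Ruby's numeric coercion protocol, and it is only my guess at how an expression like `0 + node` inside `inject` could be made to work.

```ruby
# A minimal sketch of format-based coercion, assuming a node wraps the text
# content of an element. TextNode is a hypothetical name, not Xanimal's API.
class TextNode
  def initialize(text)
    @text = text.strip
  end

  # Integer if the text is all digits, Float if it looks like digits.digits,
  # otherwise the String itself.
  def value
    case @text
    when /\A[-+]?\d+\z/ then Integer(@text, 10)
    when /\A[-+]?\d+\.\d+\z/ then Float(@text)
    else @text
    end
  end

  # Non-alphabetic calls such as +, - and * are forwarded to the coerced
  # value; alphabetic names fall through to the usual NoMethodError.
  def method_missing(name, *args, &block)
    return super if name.to_s.match?(/\A[a-z_]/i)
    value.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.match?(/\A[a-z_]/i) || super
  end

  # Ruby's numeric coercion protocol: lets `0 + node` work as well as
  # `node + 0`, which is what an inject(0) sum needs.
  def coerce(other)
    [other, value]
  end
end

nodes = %w[2 3].map { |t| TextNode.new(t) }
puts nodes[0] + nodes[1]                         # prints 5
puts nodes.inject(0) { |sum, node| sum + node }  # prints 5
puts TextNode.new('1.5') * 2                     # prints 3.0
```

Operators that `Object` already defines, such as `==`, never reach `method_missing`, so a fuller version would need to override those explicitly.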

Here’s an example of how to use it:

require 'xanimal'

xml = %{
  <root>
    <a name="alpha">
      <b>1</b>
    </a>
    <a name="beta">
      <b>2</b>
      <b>3</b>
    </a>
  </root>
}

doc = Xanimal::Document.string(xml)

# Attributes
doc.root.a.attr(:name) # => "alpha"

# Specifying nodes by index
doc.root.a[1].b[1].to_i # => 3

# Enumeration with automatic coercion
doc.root.a[1].b.inject(0){ |sum, value| sum + value } # => 5

# Unanchored search
doc.any.b.to_i # => 1

# Doesn't work ... yet
# doc.root.a.b.inject(0){ |sum, value| sum + value } # => 6

Even in its basic state, for the subset of situations where its features are adequate, I think it provides a few advantages over using an XML parser directly:

  • It looks like Ruby
  • It behaves like Ruby (e.g. by translating indexing offsets)
  • The XML parser is abstracted away: I’ve used LibXML2, but it could be swapped for REXML or something else without altering client code.
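
To illustrate the abstraction point, here is a hedged sketch of a parser backend built on REXML, rather than the LibXML2 binding Xanimal actually uses. `RexmlBackend`, `values` and `first_value` are hypothetical names. The idea is that everything above the backend deals only in XPath strings, so any other backend exposing the same two methods could replace it.

```ruby
require 'rexml/document'

# A minimal sketch of hiding the parser behind a small interface. The names
# RexmlBackend, #values and #first_value are hypothetical; client code only
# passes XPath strings in, so another backend with the same two methods
# (e.g. one wrapping LibXML2) could be swapped in.
class RexmlBackend
  def initialize(xml)
    @doc = REXML::Document.new(xml)
  end

  # Text of every node matched by the XPath expression.
  def values(xpath)
    REXML::XPath.match(@doc, xpath).map { |node| node_text(node) }
  end

  # Text of the first matched node, or nil if nothing matches.
  def first_value(xpath)
    node = REXML::XPath.first(@doc, xpath)
    node && node_text(node)
  end

  private

  # Attribute nodes expose #value; elements expose #text.
  def node_text(node)
    node.is_a?(REXML::Attribute) ? node.value : node.text.to_s.strip
  end
end

xml = <<~XML
  <root>
    <a name="alpha"><b>1</b></a>
    <a name="beta"><b>2</b><b>3</b></a>
  </root>
XML

backend = RexmlBackend.new(xml)
puts backend.first_value('/root/a[1]/@name')      # prints alpha
puts backend.first_value('/root/a[2]/b[2]')       # prints 3
puts backend.values('/root/a/b').map(&:to_i).sum  # prints 6
```

The last query, `/root/a/b`, matches all three `b` elements, so its sum is the 6 that the "Doesn't work ... yet" line in the example is aiming for.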

The code is available via subversion:

svn co http://paulbattley.googlecode.com/svn/xanimal/trunk xanimal

Contributions would be very welcome!

Comments

  1. Michael Neumann

    Wrote at 2007-10-17 09:51 UTC using Firefox 2.0.0.6 on FreeBSD:

    You might get some further ideas from _why’s Hpricot. It’s capable of parsing HTML and XML, and has a very decent API. You can use CSS-style or XPath expressions.

    require 'hpricot'
    doc = Hpricot(xml)

    # Ruby style
    (doc/:root/:a).attr(:name) # => "alpha"

    # CSS style
    (doc/'root a').attr(:name) # => "alpha"

    # XPath
    (doc/'//root/a').attr('name') # => "alpha"

    Have I mentioned that Hpricot is very fast (C-based parser)?
  2. Paul Battley

    Wrote at 2007-10-17 19:45 UTC using Firefox 2.0.0.7 on Mac OS X:

    Hpricot can do XML too? I never realised. I guess that makes my effort a bit redundant!
