HTMLEntities now works with Ruby 1.9 and JRuby

If you’re using my HTMLEntities library—and it seems that quite a lot of people are—you may be glad to know that it now (as of version 4.1.0) works with both Ruby 1.9.1 and JRuby 1.3.1.

I’ve been aware for a while that it wasn’t compatible with Ruby 1.9. That’s not really surprising, given the new regular expression engine (Oniguruma) and the significant changes in character encoding handling between 1.8 and 1.9. I’ve finally done something about it.

There were two things I had to do to get regular expressions working in Ruby 1.9. One was to specify the encoding of the test files, which contain verbatim UTF-8 strings. I simply added the relevant directive at the top of those files:

# encoding: UTF-8
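
To illustrate what the directive changes, here is a minimal sketch. The string and variable name are made up for the example and are not taken from the test suite. With the magic comment on the first line, Ruby 1.9 reads the literal as UTF-8 and counts characters rather than bytes:

```ruby
# encoding: UTF-8
# Hypothetical example string; any verbatim non-ASCII literal would do.
price = "€20"

puts price.encoding   # UTF-8
puts price.length     # 3 (characters)
puts price.bytesize   # 5 (bytes: the euro sign takes three)
```

Without the directive, Ruby 1.9 assumes the source file is US-ASCII and should reject the non-ASCII bytes in the literal with an "invalid multibyte char" error.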

The second issue was that, because Oniguruma understands Unicode codepoints, I needed to use codepoint ranges instead of byte ranges. This was a bit tricky, as the escape isn’t covered in the Oniguruma syntax documentation, and I had to find it by trial and error. For future reference, you use \u{N}, where N is the hexadecimal codepoint. For example, this matches codepoints outside the printable ASCII range:

/[^\u{20}-\u{7E}]/
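
As a rough sketch of how such a pattern might be used, the following replaces every matching character with a decimal numeric entity. The constant and method names are hypothetical, and this is not the library’s actual implementation:

```ruby
# encoding: UTF-8
# Matches any codepoint outside printable ASCII (U+0020 to U+007E).
# Note that this includes control characters such as newlines.
NON_PRINTABLE_ASCII = /[^\u{20}-\u{7E}]/

# Hypothetical helper: replace each matching character with a
# decimal numeric character reference built from its codepoint.
def encode_non_ascii(text)
  text.gsub(NON_PRINTABLE_ASCII) { |char| format("&#%d;", char.ord) }
end

puts encode_non_ascii("caf\u00E9")    # caf&#233;
puts encode_non_ascii("plain text")   # plain text
```

Because the range is expressed in codepoints, a multibyte character such as é is matched once as a whole character rather than once per byte.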

As a bonus, I also tested it against JRuby and got it working there. The performance is, alas, noticeably worse on JRuby than on either 1.8 or 1.9. I suspect that’s due to the additional layers of indirection in the regular expression engine and string handling, but I’m not sure. Still, working is better than not working, so I count it as progress.

I’m very interested in hearing any feedback, good or bad, and especially if I’ve accidentally introduced any bugs in spite of the test suite.

Comments

  1. lopex

    Wrote at 2009-08-17 14:01 UTC using Firefox 3.5.1 on Windows XP:

    What JRuby and JDK versions did you use? Did you run it using the server compiler? Did you warm it up correctly? JRuby should be much faster than 1.8 and even faster than 1.9.
  2. Paul Battley

    Wrote at 2009-08-17 14:10 UTC using Firefox 3.5.2 on Mac OS X:

    1.3.1; 6; no; probably not.

    I was just eyeballing the time taken to run the tests (so after the normal Java start-up time, but without having given the JIT any time to do anything).

    I should probably write a proper performance test, shouldn’t I?
  3. Paul Battley

    Wrote at 2009-08-18 17:18 UTC using Firefox 3.5.2 on Mac OS X:

    I did a bit of proper benchmarking on a big document, warming up first by running the same task before I benchmarked it.

                       user     system      total        real
    Ruby 1.8.7:   17.290000   1.790000  19.080000 ( 19.086496)
    Ruby 1.9.1:   10.750000   0.040000  10.790000 ( 10.796971)
    JRuby 1.3.1:   7.272000   0.000000   7.272000 (  6.952000)

    Given time to start and warm up, JRuby is the clear winner in speed.
