HTML Entities

Ruby

Note: This project is now being hosted on RubyForge. For newer releases, visit HTMLEntities on RubyForge.

I needed to decode HTML entities in Ruby this morning (things like &yacute; and so on) and couldn't find any obvious, simple way to do it that would handle the wide range of named entities available in HTML 4.01.

In true open source itch-scratching style, I wrote a small library to handle it. It can cope with named entities, as well as decimal and hexadecimal numeric entities.

As always, it decodes to UTF-8 format.
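To make the three entity forms concrete, here is a minimal sketch of the idea in plain Ruby. It is not the library's code: the method name decode_entities_sketch and the tiny NAMED_CODEPOINTS table are made up for illustration, and the real HTML 4.01 table is far larger.

```ruby
# Illustrative subset only; HTML 4.01 defines many more named entities.
NAMED_CODEPOINTS = {
  'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34,
  'eacute' => 233, 'yacute' => 253
}.freeze

# Hypothetical helper, not part of the htmlentities library.
def decode_entities_sketch(text)
  # Three alternatives: hexadecimal (&#xE9;), decimal (&#233;), named (&eacute;).
  text.gsub(/&(?:#x([0-9a-f]+)|#([0-9]+)|([a-z][a-z0-9]*));/i) do
    match = Regexp.last_match
    codepoint =
      if match[1]
        match[1].to_i(16)
      elsif match[2]
        match[2].to_i(10)
      else
        # Entity names are case-sensitive, so the lookup is exact.
        NAMED_CODEPOINTS[match[3]]
      end
    # pack('U') produces the UTF-8 encoding of the code point.
    # Unknown names are left as they were. Out-of-range numbers are
    # not handled in this sketch.
    codepoint ? [codepoint].pack('U') : match[0]
  end
end

puts decode_entities_sketch('&eacute;lan, &#233;lan, &#xE9;lan')
```

Note that the hexadecimal branch accepts every digit from a to f, and that both numeric branches accept a single digit.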

Update 2005-08-23

As luck would have it, I needed to do the reverse operation today, so I've added that facility. In acknowledgement of the new interface, I've bumped the version number to 2.0, but decoding is the same as before.

Update 2005-10-31

I've made some small usability improvements: String#encode_entities now processes commands in the appropriate order automatically. Some code has been streamlined and cleaned up. Finally, it now comes as a tar.gz package with an installer.

Update 2005-11-07

Thanks to Moonwolf, I have fixed some important errors. I had omitted to process f as a hexadecimal digit. How embarrassing. One-digit entities now also work.

Usage

Full instructions can be found in the documentation.

Decoding

Very simple:

require 'htmlentities'
s = '&eacute;lan'
s.decode_entities # => 'élan'

Encoding

This is slightly more complicated, due to the various options. The encode_entities method takes a variable number of parameters, which tell it which instructions to carry out.

require 'htmlentities'
s = '<élan>'
s.encode_entities # => '&lt;élan&gt;'
s.encode_entities(:basic, :named) # => '&lt;&eacute;lan&gt;'
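Why the order of the instructions matters is easier to see in code. The following is only a sketch under my own assumptions, not the library's implementation: encode_entities_sketch, BASIC_ENTITIES and NAMED_ENTITIES are hypothetical names, and the named table is a tiny subset. Basic escaping has to run before named encoding, because otherwise the ampersand in a freshly written &eacute; would itself be escaped.

```ruby
# Hypothetical tables for illustration only.
BASIC_ENTITIES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze
NAMED_ENTITIES = { 233 => 'eacute', 253 => 'yacute' }.freeze

# Hypothetical helper, not part of the htmlentities library.
def encode_entities_sketch(text, *instructions)
  # With no arguments, only the basic markup characters are escaped.
  instructions = [:basic] if instructions.empty?
  result = text.dup

  # Step 1: basic escaping always runs first, whatever order was passed in.
  if instructions.include?(:basic)
    result = result.gsub(/[&<>"]/) { |char| BASIC_ENTITIES[char] }
  end

  # Step 2: replace non-ASCII characters that have a known name.
  if instructions.include?(:named)
    result = result.gsub(/[^\x00-\x7F]/) do |char|
      name = NAMED_ENTITIES[char.ord]
      name ? '&' + name + ';' : char
    end
  end

  result
end

puts encode_entities_sketch("<\u00e9lan>", :named, :basic)
```

Because the steps are applied in a fixed order inside the method, passing :named before :basic gives the same result as passing them the other way round.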

Download

Older versions

Comments


  1. Aaron

    Wrote at 2005-10-12 21:42 UTC using Safari 412.5 on Mac OS X:

    Exactly what I was looking for! Thanks!
  2. Porges

    Wrote at 2005-10-31 05:09 UTC using Firefox 1.4.1 on Linux:

    Kind of makes you wonder why something like this isn’t in the standard library…
  3. thoran

    Wrote at 2005-11-24 03:35 UTC using Firefox 1.0.7 on Mac OS X:

    Same. I assumed that it was lurking somewhere in the standard libraries.

    I had to go check before I posted this and found that there are methods to do as you require in the standard libraries and elsewhere. (It would be good if the search functionality on ruby-doc.org were clearer… The whole front page is a little messy actually. Also, http://www.ruby-forum.com/topic/908 was helpful.)

    Wot I found:

    1. html_escape as part of the ERB::Util package in the standard library. However there is no html_unescape in there as yours now provides.
    2. CGI::escapeHTML and CGI::unescapeHTML (as well as CGI::escapeElement and CGI::unescapeElement) are also in the standard library.
    3. I also found Common::html_escape as part of ruby-asp. Again, there’s no unescape!? See http://raa.ruby-lang.org/gonzui/markup/ruby-asp/lib/asp/common.rb?q=fundef:mode.

    The ‘problem’ with all of these is that it seems quite unRuby-like to pass the string as a parameter to the method. More Ruby-like is something which is bundled in Rails…

    4. There are String.prototype.escapeHTML() and String.prototype.unescapeHTML() in JavaScript, from script.aculo.us.

    So I think your solution, sending a message to a string, is the neatest of all the Ruby options.

    And after all that, I was really after url_encode and decode anyway! So, in case anyone else gets here looking for URL encoding and decoding, from the same libraries, and in the same order:

    1. Again, ERB::Util provides only half of the story: url_encode;
    2. CGI provides both CGI::escape and CGI::unescape;
    3. ruby-asp has both also: Common::url_encode and Common::url_decode;
    4. I haven’t checked script.aculo.us.

    For now I think I’ll use CGI since it’s in the standard library, in spite of them all having the same ‘problem’ with passing strings for URL encoding as well.
  4. Ruben

    Wrote at 2006-02-21 18:47 UTC using Safari 417.8 on Mac OS X:

    Fantastic! It’s working great so far. Thank you!
  5. JJ

    Wrote at 2006-02-22 12:50 UTC using Firefox 1.5.0.1 on Mac OS X:

    Thanks for saving me some money!
    I.e. Time is money. Thanks!
  6. Ally

    Wrote at 2006-09-07 10:27 UTC using Firefox 1.5.0.6 on Windows XP:

    Great little tool. The decode function is especially useful and a lot better than CGI::unescapeHTML, which misses quite a lot of entities.

    Nice work!
  7. Ken

    Wrote at 2006-09-23 22:08 UTC using Firefox 1.5.0.5 on Linux:

    Very helpful, thanks!
  8. Paul

    Wrote at 2007-01-28 22:33 UTC using Firefox 1.5.0.9 on Linux:

    Again, precisely what I was looking for. Very neat coding. You made me a very happy person, by its completeness and usability. I have put it in my repository, your unit-tests next to mine… it’s doing well there :)
  9. Marco

    Wrote at 2007-03-16 16:38 UTC using Konqueror 3.1 on Unknown OS:

    This library seems very useful, and I’m going to use it for the Alexandria program http://alexandria.rubyforge.org/
  10. Dan

    Wrote at 2007-04-11 14:35 UTC using Mozilla 1.8.1 on Mac OS X:

    Nicely done. I don’t know why CGI::unescapeHTML doesn’t support the named entities. It doesn’t make sense.

    While I appreciate the ease of using your functions (s.decode_entities), I’d still rather see it in CGI. After all, it’s not the string’s job to know how to decode itself.
  11. Scott

    Wrote at 2007-07-12 14:20 UTC using Firefox 2.0.0.4 on Windows XP:

    Can anyone tell me why I get this output when I run…

    string = "you&#8217;ve"
    coder.decode(string)
    => "you\342\200\231ve"

    I’m expecting it to be => you’ve. Am I doing something wrong? Is it a UTF-8 thing? If so, is there a way I can make sure my string is in the correct format before feeding it to coder?

    Many thanks.
  12. roop

    Wrote at 2007-12-05 10:40 UTC using Safari 419.3 on Mac OS X:

    Awesome. Thanks!
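For anyone puzzled by output like Scott's in comment 11: the digits 342, 200 and 231 are most likely the octal values of the three UTF-8 bytes that encode U+2019 (decimal 8217), the right single quotation mark, as printed by an older Ruby that escapes non-ASCII bytes when inspecting a string. In other words, the decoding probably worked and only the display looks odd. A quick check in plain Ruby, using no library code:

```ruby
# U+2019 RIGHT SINGLE QUOTATION MARK is code point 8217 in decimal.
apostrophe = [8217].pack('U')

# Its UTF-8 encoding is three bytes; render each one in octal.
octal_bytes = apostrophe.bytes.map { |byte| byte.to_s(8) }

puts octal_bytes.inspect # prints ["342", "200", "231"]
```

Printing the string with puts to a UTF-8 terminal, rather than inspecting it, should show the apostrophe itself.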
