Note: This project is now being hosted on RubyForge. For newer releases, visit HTMLEntities on RubyForge.
I needed to decode HTML entities in Ruby this morning (the things like &yacute; and so on) and couldn't find any obvious, simple way to do it that would handle the wide range of named entities available in HTML 4.01.
In true open source itch-scratching style, I wrote a small library to handle it. It can cope with named entities, as well as decimal and hexadecimal numeric entities.
As always, it decodes to UTF-8 format.
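To make the three entity forms concrete, here is a minimal sketch of how such a decoder might work. It is not the library's actual code: the method name `decode_entities_sketch`, the pattern, and the tiny `NAMED_ENTITIES` table are assumptions made for illustration, and HTML 4.01 defines roughly 250 named entities rather than the six listed here.

```ruby
# Hypothetical, heavily truncated table of entity names to code points.
NAMED_ENTITIES = {
  'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34,
  'eacute' => 233, 'yacute' => 253
}.freeze

# Matches &#xHH; (hexadecimal), &#DD; (decimal) or &name; (named).
ENTITY_PATTERN = /&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([a-zA-Z][a-zA-Z0-9]*));/

# Illustrative helper only; not a method provided by the library.
def decode_entities_sketch(text)
  text.gsub(ENTITY_PATTERN) do |match|
    hex, dec, name = $1, $2, $3
    codepoint =
      if hex
        hex.to_i(16)
      elsif dec
        dec.to_i
      else
        NAMED_ENTITIES[name]
      end
    # pack('U') turns a code point into a UTF-8 string.
    # Unknown names are left untouched.
    codepoint ? [codepoint].pack('U') : match
  end
end

decode_entities_sketch('&eacute;lan &#233;lan &#xE9;lan') # => "élan élan élan"
decode_entities_sketch('&bogus;')                         # => "&bogus;"
```

Leaving unrecognised names alone is a design choice in this sketch, so that stray ampersands in the input survive a round trip.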
As luck would have it, I needed to do the reverse operation today, so I've added that facility. In acknowledgement of the new interface, I've bumped the version number to 2.0, but decoding is the same as before.
I've made some small usability improvements: String#encode_entities now processes commands in the appropriate order automatically. Some code has been streamlined and cleaned up. Finally, it now comes as a tar.gz package with an installer.
Thanks to Moonwolf, I have fixed some important errors. I had omitted to process f as a hexadecimal digit. How embarrassing. One-digit entities now also work.
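For illustration, this class of bug is easy to reproduce with a regular expression. The patterns below are a hypothetical reconstruction, not the library's real code: `BUGGY_HEX` is an assumed expression that forgets `f` and demands at least two digits, and `hex_entity_codepoint` is a helper invented for this sketch.

```ruby
# Assumed patterns for illustration; the library's expressions may differ.
BUGGY_HEX   = /&#x([0-9a-e]{2,});/i # forgets "f" and rejects single digits
CORRECT_HEX = /&#x([0-9a-f]+);/i    # all sixteen digits, one or more of them

# Returns the code point of a hexadecimal entity, or nil if the
# pattern does not recognise it.
def hex_entity_codepoint(entity, pattern)
  m = pattern.match(entity)
  m && m[1].to_i(16)
end

hex_entity_codepoint('&#xff;', BUGGY_HEX)   # => nil
hex_entity_codepoint('&#xff;', CORRECT_HEX) # => 255
hex_entity_codepoint('&#x9;',  BUGGY_HEX)   # => nil
hex_entity_codepoint('&#x9;',  CORRECT_HEX) # => 9
```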
Full instructions can be found in the documentation.
    require 'htmlentities'

    s = '&eacute;lan'
    s.decode_entities # => 'élan'
Encoding is slightly more complicated, due to the various options. The encode_entities method takes a variable number of parameters, which tell it which instructions to carry out.
    require 'htmlentities'

    s = '<élan>'
    s.encode_entities # => '&lt;élan&gt;'
    s.encode_entities(:basic, :named) # => '&lt;&eacute;lan&gt;'
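The reason the order of instructions matters can be shown with a rough sketch. Everything here is an assumption made for illustration rather than the library's implementation: the method name `encode_entities_sketch`, the `:decimal` and `:hexadecimal` instruction names, the default of `:basic`, and the two-entry `NAMED` table. The point is that the basic characters must be escaped first, otherwise the ampersands produced by later steps would themselves be escaped.

```ruby
BASIC = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze
# Hypothetical, truncated table of characters with named entities.
NAMED = { "\u00E9" => '&eacute;', "\u00FD" => '&yacute;' }.freeze

# Fixed processing order, whatever order the caller uses.
ORDER = [:basic, :named, :decimal, :hexadecimal].freeze

# Illustrative helper only; not a method provided by the library.
def encode_entities_sketch(text, *instructions)
  instructions = [:basic] if instructions.empty? # assumed default
  unknown = instructions - ORDER
  raise ArgumentError, "unknown instruction(s): #{unknown.inspect}" unless unknown.empty?

  ORDER.select { |step| instructions.include?(step) }.reduce(text) do |result, step|
    case step
    when :basic       then result.gsub(/[&<>"]/) { |c| BASIC[c] }
    when :named       then result.gsub(/[^\x00-\x7F]/) { |c| NAMED.fetch(c, c) }
    when :decimal     then result.gsub(/[^\x00-\x7F]/) { |c| format('&#%d;', c.ord) }
    when :hexadecimal then result.gsub(/[^\x00-\x7F]/) { |c| format('&#x%x;', c.ord) }
    end
  end
end

encode_entities_sketch("<\u00E9lan>")                 # => "&lt;élan&gt;"
encode_entities_sketch("<\u00E9lan>", :named, :basic) # => "&lt;&eacute;lan&gt;"
```

Had `:named` run before `:basic` in the second call, the result would have been `&lt;&amp;eacute;lan&gt;`, which is why sorting the instructions internally is safer than trusting the caller's order.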
- htmlentities.tar.gz (2.2)
2005-08-03 14:06 UTC.