Metaphone
Ruby
About
Note: This is now part of the Text project, hosted on RubyForge. For newer releases, visit Text on RubyForge.
Metaphone encodes names into a phonetic form such that similar-sounding names have the same or similar Metaphone encodings.
As there are multiple implementations of Metaphone, each with its own bugs, I have based this on my reading of the specification. This implementation has been only lightly tested so far; please report any bugs found.
I have also compared this implementation with that found in PHP’s standard library. The present implementation follows the algorithm description, whilst PHP’s implementation mimics the behaviour of Lawrence Philips’s original BASIC implementation, which appears to contain bugs (specifically in the handling of CC and MB). The changes required for 100% compatibility with PHP are noted in the code, marked with [PHP]. It would be useful to compare the behaviour of other implementations as well.
The original system described by Lawrence Philips in Computer Language Vol. 7 No. 12, December 1990, pp 39-43:
The 16 consonant sounds:
B X S K J T F H L M N P R 0 W Y
0 represents the "th" sound.
Exceptions:
Initial kn-, gn-, pn-, ae- or wr- -> drop first letter
Initial x- -> change to "s"
Initial wh- -> change to "w"
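The initial-letter exceptions above can be sketched in a few lines of Ruby. This is only an illustration, not the code used in this library: the method name apply_initial_exceptions is hypothetical, and the sketch assumes the vowel-pair prefix in the first rule is "ae" (as in "aeon").

```ruby
# Hypothetical helper, not part of the Metaphone module's API.
# Applies only the initial-letter exceptions to a lower-cased word.
def apply_initial_exceptions(word)
  w = word.downcase
  case w
  when /\A(?:kn|gn|pn|ae|wr)/ then w[1..-1]        # drop first letter
  when /\Ax/                  then "s" + w[1..-1]  # initial x- becomes s
  when /\Awh/                 then "w" + w[2..-1]  # initial wh- becomes w
  else w
  end
end

apply_initial_exceptions("Knight")  # => "night"
apply_initial_exceptions("Xavier")  # => "savier"
apply_initial_exceptions("white")   # => "wite"
```

The result is an intermediate form; the transformations would still need to be applied to it.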
Transformations:
Vowels are kept only when they are the first letter.
B -> B unless at the end of a word after "m" as in "dumb"
C -> X (sh) if -cia- or -ch-
     S if -ci-, -ce- or -cy-
     K otherwise, including -sch-
D -> J if in -dge-, -dgy- or -dgi-
     T otherwise
F -> F
G -> silent if in -gh- and not at end or before a vowel
               in -gn- or -gned- (also see dge etc. above)
     J if before i or e or y if not double gg
     K otherwise
H -> silent if after vowel and no vowel follows
     H otherwise
J -> J
K -> silent if after "c"
     K otherwise
L -> L
M -> M
N -> N
P -> F if before "h"
     P otherwise
Q -> K
R -> R
S -> X (sh) if before "h" or in -sio- or -sia-
     S otherwise
T -> X (sh) if -tia- or -tio-
     0 (th) if before "h"
     silent if in -tch-
     T otherwise
V -> F
W -> silent if not followed by a vowel
     W if followed by a vowel
X -> KS
Y -> silent if not followed by a vowel
     Y if followed by a vowel
Z -> S
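One way to read the table is as an ordered list of substitutions. The sketch below illustrates that idea with a handful of the simpler rules only (th, ph, q, v, x, z and vowel removal). The names SKETCH_RULES and sketch_metaphone are hypothetical, and this is not the library's rule set. Letters without a rule in the sketch are simply upper-cased, which is wrong for context-dependent letters such as C, G, H, W and Y.

```ruby
# Partial, illustrative rule table; NOT the full algorithm.
# Input is lower-cased and replacements are upper-case, so text that
# has already been encoded is never matched again by a later rule.
SKETCH_RULES = [
  [/th/,             "0"],   # T -> 0 (th) if before "h"
  [/ph/,             "F"],   # P -> F if before "h"
  [/q/,              "K"],   # Q -> K
  [/v/,              "F"],   # V -> F
  [/x/,              "KS"],  # X -> KS
  [/z/,              "S"],   # Z -> S
  [/(?!\A)[aeiou]+/, ""],    # vowels are kept only as the first letter
].freeze

# Hypothetical helper: applies each rule in order, then upper-cases
# whatever letters no rule touched.
def sketch_metaphone(word)
  encoded = SKETCH_RULES.inject(word.downcase) do |w, (pattern, replacement)|
    w.gsub(pattern, replacement)
  end
  encoded.upcase
end

sketch_metaphone("Smith")   # => "SM0"
sketch_metaphone("Philip")  # => "FLP"
sketch_metaphone("Ethan")   # => "E0N"
```

Order matters in this style: the th and ph rules have to run before any rule that would consume the "t", "p" or "h" on its own.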
Usage
require 'metaphone'
Metaphone.metaphone('foo bar') # => "F BR"
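Because similar-sounding names share an encoding, one possible use is to group a list of names by key. The sketch below is only an illustration: group_by_sound is a hypothetical helper, and the encoder is passed in as a block so that the example does not depend on the library being loaded.

```ruby
# Hypothetical helper: groups names by whatever key the encoder returns.
def group_by_sound(names, &encoder)
  names.group_by { |name| encoder.call(name) }
end

# With the library loaded, the encoder could be, for example:
#   group_by_sound(names) { |n| Metaphone.metaphone(n) }
```

A lookup would then encode the search term with the same encoder and fetch the matching group from the resulting hash.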
Revision history
- 2005-05-18 0.4 More efficient double-letter removal method.
- 2005-05-18 0.3 More test cases, caught bug with initial SCH.
- 2005-05-18 0.2 Fixed a bug with GG.
- 2005-05-18 0.1 Initial release.
Licence
Copyright © 2005 Paul Battley
Usage of the works is permitted provided that this instrument is retained with the works, so that any entity that uses the works is notified of this instrument.
DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.
Download
2005-07-26 15:23 UTC. Comments: 1.
Per Olofsson
Wrote at 2006-08-23 12:24 UTC using Firefox 1.5.0.5 on Linux:
Just what I needed for searching names in my website project! Thanks!