Levenshtein

Ruby

About

Note: This is now part of the Text project, hosted on RubyForge. For newer releases, visit Text on RubyForge.

The Levenshtein distance is a measure of how similar two strings s and t are, calculated as the number of deletions/insertions/substitutions needed to transform s into t. The greater the distance, the more the strings differ.

The Levenshtein distance is also sometimes referred to as the easier-to-pronounce-and-spell ‘edit distance’.

Revision history

Licence

Copyright © 2005 Paul Battley

Usage of the works is permitted provided that this instrument is retained with the works, so that any entity that uses the works is notified of this instrument.

DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.

Download

Comments

Skip to the comment form

  1. nieruihan

    Wrote at 2005-08-26 05:39 UTC using Firefox 1.0.6 on Windows XP:

    Hi,

    The link to levenshtein.rb (http://po-ru.com/files/levenshtein/1.2/levenshtein.rb) returns:

    Not Found

    The requested URL /files/levenshtein/1.2/levenshtein.rb was not found on this server.
  2. Paul Battley

    Wrote at 2005-08-26 10:08 UTC using Safari 412.2.2 on Mac OS X:

    Thanks. It should be fixed now.
  3. Rocket Man

    Wrote at 2012-01-07 03:07 UTC using Internet Explorer 8.0 on Windows XP:

    Levenshtein you say?

    This reminds me of an interesting game I used to play with a friend way back when (actually when computers with word processors containing a good thesaurus first appeared). It’s a lot like calculating the Levenshtein distance (which I’d never heard of before now – thanks Paul!) but it involves working out the conceptual distance between two words. It works like this.

    You name a random, interesting word, let’s say ‘Charity’ and your friend names any random but conceptually far removed word, let’s say ‘Penetrate’. You agree a time limit, say 15 minutes, depending on how hot you are. Now, whoever goes first types the start word into the thesaurus, and by selecting from either a list of synonyms or antonyms at each turn, selecting a word from that list, and then finding synonyms or antonyms of it, in turn, you attempt to converge on the conceptual area to which your target word belongs. So from Charity you might move to the area related to ‘giving’ because you might feel that’s closer to ‘penetration’ than the other alternatives. There’s usually at least 30 – 50 steps and sometimes a great many more. Sometimes the challenge is to do it at all, but it’s always possible. This is a very deep, thoughtful, and intellectually challenging game, and not for those with limited patience (or vocabualry!), but it’s a great challenge if you have the temperament for it and of course you can play it as a form of solitaire, providing you don’t cheat by selecting two words that are conceptually closely related from the start.

    I was going to apologise in case this post is considered off topic, but I reckon I can get from topic to post in five minutes flat!

    Have fun with this one!!

Leave a comment

Please read the comment guidelines before posting. Comments are Gravatar-enabled. Your email address will not be published.

To prove that you’re human, type human in the Bot check field.

Trying to post some program output or a long code sample? Please use a paste service and link to it instead.