Approximate string processing

A few months ago, I was googling around and came across a concept (more like a field of study) of "approximate string processing". One of the key algorithms used in this field is called "Edit Distance" or "Levenshtein Distance" (named after the person who created the algorithm).

The edit distance between two strings is the minimum number of edit operations (i.e. insertions, deletions, and substitutions) of single characters needed to transform the first string into the second.

examples (I've bolded the changes for your viewing pleasure)




OriginalNewEdit Distance
AlanAllen2
RichTina3


I found some working code for this for Java, PL/SQL, VB and VBA.

It could be a concept you could use that works similiarly to Google's "Did you mean" concept. Many spell check programs are also using this algorithm.

Links you can find more information on this algorithm:
http://www.vldb.org/conf/2001/P491.pdf
http://www.merriampark.com/ld.htm
http://citeseer.ist.psu.edu/cache/papers/cs/11552/ http:zSzzSzwww.cs.princeton.eduzSz~ristadzSzpaperszSzpu-532-96.pdf/ristad97learning.pdf

Comments