ChaSen, Ruby, Ubuntu Linux
ChaSen is a morphological analyser for Japanese. For me, it’s particularly useful in the context of full-text search. Japanese is written without spaces between words, so it’s very hard for a computer to work out where to break up a sentence in order to index its components. ChaSen handles this beautifully, delivering a full analysis of the sentence that shows each component’s pronunciation, base form, and part of speech. It’s an example of standing on the shoulders of giants thanks to open source software: with such powerful tools available for free, it’s possible to achieve things that would otherwise be impossible.
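To make the indexing idea concrete, here is a minimal Ruby sketch of turning that kind of analysis into index terms. It assumes tab-separated output with one component per line (surface form, reading, base form, part of speech) and an `EOS` line closing each sentence, which is roughly what ChaSen prints by default; the exact layout depends on how ChaSen is configured. The sample text is hand-written for illustration rather than captured from a real run, and `Morpheme`, `parse_chasen_output`, and `index_terms` are hypothetical names, not part of ChaSen or of any Ruby binding.

```ruby
# Hypothetical container for one analysed component of a sentence.
Morpheme = Struct.new(:surface, :reading, :base_form, :part_of_speech)

# Parse ChaSen-style output: one component per line, tab-separated,
# with "EOS" marking the end of a sentence. The column order is an assumption.
def parse_chasen_output(text)
  text.each_line.map(&:chomp)
      .reject { |line| line.empty? || line == "EOS" }
      .map do |line|
        surface, reading, base_form, pos = line.split("\t")
        Morpheme.new(surface, reading, base_form, pos)
      end
end

# Keep only nouns (part of speech starting with 名詞) and index them
# by base form, so that inflected variants would share one index entry.
def index_terms(morphemes)
  morphemes.select { |m| m.part_of_speech.to_s.start_with?("名詞") }
           .map(&:base_form)
end

# Hand-written sample for 私は学生です ("I am a student"); illustrative only.
SAMPLE = "私\tワタシ\t私\t名詞-代名詞-一般\t\t\n" \
         "は\tハ\tは\t助詞-係助詞\t\t\n" \
         "学生\tガクセイ\t学生\t名詞-一般\t\t\n" \
         "です\tデス\tです\t助動詞\t特殊・デス\t基本形\n" \
         "EOS\n"

morphemes = parse_chasen_output(SAMPLE)
puts index_terms(morphemes).join(", ")
```

Filtering on part of speech is just one possible choice: a real index might also keep verbs and adjectives, or store the reading alongside the base form so that searches by pronunciation work too.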