Archive: 2005-08-19

  • ChaSen, Ruby, Ubuntu Linux

    ChaSen is a morphological analyser for Japanese. For me, it’s particularly useful in the context of full text search. Japanese is written without spaces between words, so a computer can’t easily tell where to break up a sentence in order to index its components. ChaSen handles this beautifully, delivering a full analysis of the sentence that shows each component’s pronunciation, basic form, and part of speech. It’s an example of standing on the shoulders of giants thanks to open source software: with such powerful tools available for free, it’s possible to achieve things that would otherwise be impossible.
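
To make the indexing idea concrete, here is a minimal Python sketch of turning an analysis like this into search terms. It assumes the analyser prints one component per line as tab-separated fields (surface form, pronunciation, basic form, part of speech, then any conjugation details) and ends each sentence with an `EOS` line; check that against your own ChaSen configuration. The sample text is hand-written for illustration rather than captured from a real run, and the names `Morpheme`, `parse_chasen`, and `index_terms` are hypothetical helpers, not part of ChaSen.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    surface: str    # the text as it appears in the sentence
    reading: str    # pronunciation
    base_form: str  # dictionary ("basic") form
    pos: str        # part of speech, e.g. "名詞-一般"


# Hand-written sample in the assumed tab-separated layout (illustrative only).
SAMPLE_OUTPUT = (
    "私\tワタシ\t私\t名詞-代名詞-一般\t\t\n"
    "は\tハ\tは\t助詞-係助詞\t\t\n"
    "本\tホン\t本\t名詞-一般\t\t\n"
    "を\tヲ\tを\t助詞-格助詞-一般\t\t\n"
    "読ん\tヨン\t読む\t動詞-自立\t五段・マ行\t連用タ接続\n"
    "だ\tダ\tだ\t助動詞\t特殊・タ\t基本形\n"
    "EOS\n"
)


def parse_chasen(output):
    """Parse analyser output into Morpheme records, skipping EOS markers."""
    morphemes = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not in the assumed layout, so ignore it
        morphemes.append(Morpheme(*fields[:4]))
    return morphemes


def index_terms(morphemes, keep=("名詞", "動詞", "形容詞")):
    """Return basic forms of content words (nouns, verbs, adjectives)."""
    return [m.base_form for m in morphemes if m.pos.split("-")[0] in keep]


if __name__ == "__main__":
    print(index_terms(parse_chasen(SAMPLE_OUTPUT)))
```

Indexing the basic form rather than the surface form means a search for 読む can also match text containing the inflected 読んだ. Dropping particles and auxiliaries is a design choice made for this sketch, not something ChaSen does for you.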

