ChaSen is a morphological analyser for Japanese. For me, it’s particularly useful in the context of full text search. Japanese doesn’t use spaces, so it’s very hard for a computer to work out where to break up the sentence in order to index the components. ChaSen handles this beautifully, delivering a full analysis of the sentence, showing each component’s pronunciation, basic form, and part of speech. It’s an example of standing on the shoulders of giants thanks to open source software: with such powerful tools available for free, it’s possible to achieve things that would otherwise be impossible.

I was trying to build the Ruby/ChaSen library on Ubuntu Linux. After a little trouble, I discovered that it was necessary to specify the library location:

> ruby extconf.rb -L/usr/lib
> make
> sudo make install

But every time I tried to require it in Ruby, I was rewarded with:

[...]/chasen.so: undefined symbol: _Znwj

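Apparently, the problem lies in mkmf, which chooses the wrong linker. _Znwj is the mangled name of the C++ operator new(unsigned int), so C++ code is involved, but the generated Makefile links the extension with the C compiler, which doesn’t pull in the C++ standard library where that symbol lives.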
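
To check what a symbol like this refers to, it can be passed through a demangler. This is a minimal sketch, assuming c++filt from GNU binutils is installed; the variable name is only for illustration:

```shell
# _Znwj is the symbol from the error message above.
# c++filt (GNU binutils, assumed to be installed) turns a mangled
# C++ symbol name back into a readable declaration.
demangled=$(echo '_Znwj' | c++filt)
echo "$demangled"   # prints: operator new(unsigned int)
```

A name that demangles to something from the C++ runtime is a strong hint that the shared object needs to be linked with g++ rather than gcc.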

The solution? After using extconf.rb to create the Makefile, edit Makefile before starting make.

In the line:

LDSHARED = $(CC) -shared

Change $(CC) to g++, so that the line reads LDSHARED = g++ -shared, and then run make as before. Don’t forget to run make clean first if there are already files lying around from previous failed attempts.
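
If you would rather not edit the file by hand, the same change can be scripted. This is only a sketch: patch_ldshared is a hypothetical helper name, it assumes GNU sed (for the -i option), and it assumes the LDSHARED line looks like the one shown above.

```shell
# Hypothetical helper: switch the link step in a mkmf-generated
# Makefile from $(CC) to g++. Assumes GNU sed for in-place editing.
patch_ldshared() {
    makefile="${1:-Makefile}"
    # Touch only the LDSHARED line; replace the literal $(CC) with g++
    # and leave the rest of the line (e.g. -shared) as it is.
    sed -i 's/^\(LDSHARED *= *\)\$(CC)/\1g++/' "$makefile"
}
```

After running extconf.rb, call patch_ldshared Makefile, then make clean, make and sudo make install as before.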

I hope that this brief explanation helps anyone else suffering from the same problem.