ChaSen, Ruby, Ubuntu Linux
ChaSen is a morphological analyser for Japanese. For me, it’s particularly useful in the context of full text search. Japanese doesn’t use spaces, so it’s very hard for a computer to work out where to break up the sentence in order to index the components. ChaSen handles this beautifully, delivering a full analysis of the sentence, showing each component’s pronunciation, basic form, and part of speech. It’s an example of standing on the shoulders of giants thanks to open source software: with such powerful tools available for free, it’s possible to achieve things that would otherwise be impossible.
I was trying to build the Ruby/ChaSen library on Ubuntu Linux. After a little trouble, I discovered that it was necessary to specify the library location:
> ruby extconf.rb -L/usr/lib
> make
> sudo make install
But every time I tried to require it in Ruby, I was rewarded with:
[...]/chasen.so: undefined symbol: _Znwj
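_Znwj is the mangled name of the C++ operator new(unsigned int), which suggests that the C++ runtime library was never linked into the extension. Apparently, the problem lies in mkmf, which is choosing the wrong linker: it links with the C compiler rather than g++.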
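To see which link command your own Ruby was built with, you can ask RbConfig. This is only a rough sketch: cxx_driver? is a hypothetical helper of mine, not part of mkmf, and it merely guesses from the command name. I am also assuming that mkmf takes its default LDSHARED from this same configuration.

```ruby
require 'rbconfig'

# Hypothetical helper (not part of mkmf): guess whether a link command
# starts with a C++ compiler driver such as g++, c++ or clang++.
def cxx_driver?(ldshared)
  first_word = ldshared.to_s.split.first.to_s
  File.basename(first_word).match?(/\A(g\+\+|c\+\+|clang\+\+)/)
end

# The link command recorded when this Ruby was built.
ldshared = RbConfig::CONFIG['LDSHARED']
puts "LDSHARED: #{ldshared.inspect}"
puts "Looks like a C++ driver? #{cxx_driver?(ldshared)}"
```

If the answer is false, an extension that depends on C++ code will probably hit the same undefined symbol unless the linker is changed.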
The solution? After using extconf.rb to create the Makefile, edit the Makefile before running make.
In the line:
LDSHARED = $(CC) -shared
Change $(CC) to g++, so that the line reads LDSHARED = g++ -shared, and then run make as before. Don’t forget to run make clean first if there are already files lying around from previous failed attempts.
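If you would rather not edit the file by hand, the same substitution can be scripted. This is a minimal sketch, assuming the Makefile contains an LDSHARED line that starts with $(CC), as mine did. patch_ldshared is a hypothetical helper, not part of Ruby/ChaSen or mkmf.

```ruby
# Hypothetical helper: swap $(CC) for g++ on the LDSHARED line only,
# leaving the rest of the Makefile text untouched.
def patch_ldshared(makefile_text)
  makefile_text.sub(/^LDSHARED\s*=\s*\$\(CC\)/, 'LDSHARED = g++')
end

# When run as a script with a path argument, rewrite that file in place.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  path = ARGV[0]
  original = File.read(path)
  patched = patch_ldshared(original)
  if patched == original
    warn "No matching LDSHARED line found in #{path}"
  else
    File.write(path, patched)
  end
end
```

Saved under a name of your choosing, it would be run with the Makefile path as its only argument, after extconf.rb and before make.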
I hope that this brief explanation helps anyone else suffering from the same problem.