The Kanji Project

Ruby, XSLT

Being a lazy person, I decided to prioritise my study of kanji by studying the most frequently occurring ones first.

To find out which those were, I set a spider loose on some Japanese websites (the kind I actually read), counting the occurrences of each kanji character until it had gathered about a million data points. I then produced a report in XML and ran it through some XSLT to style it, giving a list of the two thousand most frequent characters.
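The counting step can be sketched roughly as follows. This is not the original script, only a minimal illustration of the idea: the method names `count_kanji` and `top_kanji` are made up for this example, the fetching of pages by the spider is left out entirely, and "kanji" is approximated by the Unicode Han script (`\p{Han}`), which is somewhat broader than the kanji in everyday use.

```ruby
# Minimal sketch: tally kanji across a collection of page texts.
# Assumption: the texts are already downloaded and decoded as UTF-8.
# Assumption: \p{Han} (Unicode Han script) stands in for "kanji".

# Returns a hash mapping each Han character to its number of occurrences.
def count_kanji(texts)
  counts = Hash.new(0)
  texts.each do |text|
    text.scan(/\p{Han}/) { |ch| counts[ch] += 1 }
  end
  counts
end

# Returns the `limit` most frequent [character, count] pairs,
# highest count first; ties are broken by character order.
def top_kanji(counts, limit)
  counts.sort_by { |ch, n| [-n, ch] }.first(limit)
end

# Tiny usage example with two hypothetical page texts.
sample = ["日本語の本", "日本"]
counts = count_kanji(sample)
top_kanji(counts, 2).each { |ch, n| puts "#{ch}\t#{n}" }
```

With a real crawl, `top_kanji(counts, 2000)` would give the raw material for a list like the one below; kana and punctuation are ignored because they are not in the Han script.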

The list

Each entry on the list is cross-referenced to the corresponding entry in WWWJDIC so that you can easily look up the details of an unfamiliar character.

Here is the list of the top 2000 kanji. It’s a quarter of a megabyte in size, and may tax some browsers. I’m serving it compressed to save my bandwidth.

The raw data

I’ve had a few requests for the raw data, which I’m happy to oblige. It’s a gzipped XML file, and should be fairly easy to process into any format you desire.

Comments

  1. Snowtweety

    Wrote at 2007-07-27 16:22 UTC using Safari 419.3 on Mac OS X:

    What a great project! I found it helpful for refreshing my knowledge of Chinese radicals, since they share some of the same characters.
  2. a ruby nuby

    Wrote at 2007-11-06 06:09 UTC using Opera 9.24 on Windows 2000:

    Nice. I see the top 100 on kanji-a-day dot com as well. Any chance of seeing the Ruby script that made it all possible?

    cheers
    walter
