The Kanji Project

Ruby, XSLT

Being a lazy person, I decided to prioritise my study of kanji by learning the most frequently occurring ones first.

In order to find out what those were, I set a spider loose on some Japanese websites (the kind I actually read) and had it count the occurrences of each kanji character until it had gathered about a million data points. I then generated a report in XML and ran it through some XSLT to produce a styled list of the two thousand most frequent characters.
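The counting and ranking step might look something like the Ruby below. This is not the original script, which isn't reproduced here. It is a minimal sketch that assumes the page text has already been fetched and decoded to UTF-8 strings (the spidering is left out). It also assumes "kanji" means any character in the Unicode Han script, which should also pick up marks such as 々. The XML element and attribute names (`kanji-report`, `kanji`, `rank`, `char`, `count`) are made up for illustration.

```ruby
# Minimal sketch: count kanji, rank them by frequency, emit an XML report.
# ASSUMPTION: input is already-fetched UTF-8 text; the XML element and
# attribute names below are hypothetical, not the original report format.

# Tally every Han-script character across all text samples.
def count_kanji(texts)
  counts = Hash.new(0)
  texts.each do |text|
    text.scan(/\p{Han}/) { |ch| counts[ch] += 1 }
  end
  counts
end

# Return the top n [character, count] pairs, most frequent first.
# Ties are broken by character (code point order) so output is deterministic.
def top_kanji(counts, n)
  counts.sort_by { |ch, count| [-count, ch] }.first(n)
end

# Build the XML report. Kanji and integers contain no XML special
# characters, so plain interpolation is safe for these values.
def xml_report(ranked)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<kanji-report>"]
  ranked.each_with_index do |(ch, count), i|
    lines << %(  <kanji rank="#{i + 1}" char="#{ch}" count="#{count}"/>)
  end
  lines << "</kanji-report>"
  lines.join("\n")
end
```

For example, `top_kanji(count_kanji(["日本語の日本", "今日は日曜日"]), 3)` returns `[["日", 5], ["本", 2], ["今", 1]]`. The hiragana の and は are ignored, and the three characters tied at one occurrence are ordered by code point. With a million data points a plain `Hash` tally is small enough to keep in memory, since the number of distinct kanji is only in the thousands.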

The list

Each entry on the list is cross-referenced to the corresponding entry in WWWJDIC so that you can easily look up the details of an unfamiliar character.

Here is the list of the top 2000 kanji. It’s a quarter of a megabyte in size, and may tax some browsers. I’m serving it compressed to save my bandwidth.

The raw data

I’ve had a few requests for the raw data, and I’m happy to oblige. It’s a gzipped XML file, and should be fairly easy to process into any format you desire.
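As an illustration of that processing, here is a minimal Ruby sketch that decompresses the file with Zlib and flattens it into tab-separated text. The raw data's schema isn't described here, so the `<kanji char="…" count="…"/>` element shape is an assumption; adjust the XPath and attribute names to match the real file. It uses REXML, which ships with Ruby (as a bundled gem since Ruby 3.0).

```ruby
require "zlib"
require "rexml/document"

# Parse an XML report into [character, count] pairs.
# ASSUMPTION: each entry is a <kanji char="..." count="..."/> element;
# the real raw-data schema may use different names.
def parse_kanji_xml(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//kanji").map do |el|
    [el.attributes["char"], Integer(el.attributes["count"])]
  end
end

# Read and decompress a gzipped XML file, then parse it.
def read_kanji_counts(path)
  # Zlib.gunzip returns raw bytes; label them UTF-8 before parsing.
  xml = Zlib.gunzip(File.binread(path)).force_encoding(Encoding::UTF_8)
  parse_kanji_xml(xml)
end

# Render the pairs as tab-separated lines, one character per line.
def to_tsv(rows)
  rows.map { |ch, count| "#{ch}\t#{count}\n" }.join
end
```

Usage might be `File.write("kanji.tsv", to_tsv(read_kanji_counts("kanji.xml.gz")))`, where both filenames are placeholders.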

Comments

  1. Snowtweety

    Wrote at 2007-07-27 16:22 UTC using Safari 419.3 on Mac OS X:

    What a great project! I found it helpful for refreshing my knowledge of Chinese Radicals since they share some of the same characters.
  2. a ruby nuby

    Wrote at 2007-11-06 06:09 UTC using Opera 9.24 on Windows 2000:

    Nice. I see the top 100 on kanji-a-day dot com as well. Any chance of seeing the Ruby script that made it all possible?

    cheers
    walter
  3. Anki-guy

    Wrote at 2009-04-06 04:56 UTC using Firefox 3.0.8 on Windows XP:

    How does your list compare to the official joyo kanji? Or to the JLPT kanji?
  4. Oukila Mohamed Yassin

    Wrote at 2009-12-19 09:33 UTC using Firefox 3.5.6 on Windows XP:

    You’re awesome
  5. Hagen Patzke

    Wrote at 2013-02-21 22:03 UTC using Firefox 19.0 on Windows 7:

    ...still useful today. Many thanks from a learner!