The Kanji Project
Ruby, XSLT
Being a lazy person, I decided to prioritise my study of kanji by studying the most frequently occurring ones first.
To find out which those were, I set a spider loose on some Japanese websites (the kind I actually read), counting the number of occurrences of each kanji character until it had gathered about a million data points. I then wrote the results out as XML and ran that through some XSLT to produce a styled list of the two thousand most frequent characters.
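The spider itself isn't reproduced here, but the counting step can be sketched in a few lines of Ruby. This is only an illustration, not the original script: the method names `count_kanji` and `top_kanji` are made up for the example, and matching on `\p{Han}` is an assumption about what counts as a kanji (it matches any Han-script character, which is slightly broader than kanji in the strict sense).

```ruby
# Hypothetical sketch of the counting step; not the original spider.
# Assumption: a "kanji" is any character in the Unicode Han script.
KANJI = /\p{Han}/

# Add the kanji found in +text+ to +counts+ and return the hash.
# Passing the same hash in again accumulates totals across pages.
def count_kanji(text, counts = Hash.new(0))
  text.scan(KANJI) { |ch| counts[ch] += 1 }
  counts
end

# Return the +limit+ most frequent characters as [char, count] pairs,
# most frequent first; ties are broken by code point order.
def top_kanji(counts, limit)
  counts.sort_by { |ch, n| [-n, ch] }.first(limit)
end

# Example: tally two page texts, then print the two most frequent kanji.
counts = Hash.new(0)
["日本語の本を読む", "毎日、日記を書く"].each { |page| count_kanji(page, counts) }
top_kanji(counts, 2).each { |ch, n| puts "#{ch}\t#{n}" }
```

Sharing one hash between calls means the spider only has to hand over each page's text as it arrives, so the ranking can be left until the very end.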
The list
Each entry on the list is cross-referenced to the corresponding entry in WWWJDIC so that you can easily look up the details of an unfamiliar character.
Here is the list of the top 2000 kanji. It’s a quarter of a megabyte in size, and may tax some browsers. I’m serving it compressed to save my bandwidth.
The raw data
I’ve had a few requests for the raw data, which I’m happy to oblige. It’s a gzipped XML file, and should be fairly easy to process into any format you desire.
2005-08-10 17:43 UTC. Comments: 2.
Snowtweety
Wrote at 2007-07-27 16:22 UTC using Safari 419.3 on Mac OS X:
What a great project! I found it helpful for refreshing my knowledge of Chinese Radicals since they share some of the same characters.
a ruby nuby
Wrote at 2007-11-06 06:09 UTC using Opera 9.24 on Windows 2000:
Nice. I see the top 100 on kanji-a-day dot com as well. Any chance of seeing the Ruby script that made it all possible?
cheers
walter