KanjiBreak and its contributors release this database under the Creative Commons “No Rights Reserved” CC0 license.
Click to download SQLite database
Download the entire KanjiBreak database as a single SQLite3 database file! This database file can readily be consumed by nearly all programming environments.
Click to download CSV
Download the database as a CSV (comma-separated value) file! This file is readily imported into spreadsheet programs.
The SVG drawings used to denote the primitives are not included in the SQLite3 database. Those are derived from KanjiVG and are made available under the Creative Commons BY-SA license (same as KanjiVG) as a JSON database.
Both the SQLite database and the CSV spreadsheet contain the same three sets of data. In the CSV file, consecutive sets are separated by two empty rows.
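As a rough sketch of how the CSV layout described above could be read, the following Python splits the rows into sections wherever it sees a run of two or more empty rows. The function names `split_sections` and `load_sections` are made up for illustration, and the UTF-8 encoding is an assumption rather than something this page states.

```python
import csv


def split_sections(rows):
    """Group CSV rows into sections separated by two or more empty rows."""
    sections, current, blank_run = [], [], 0
    for row in rows:
        # csv.reader yields [] for a blank line; a row of empty cells also counts.
        if not any(cell.strip() for cell in row):
            blank_run += 1
            continue
        # A run of at least two blank rows closes the section in progress.
        if blank_run >= 2 and current:
            sections.append(current)
            current = []
        blank_run = 0
        current.append(row)
    if current:
        sections.append(current)
    return sections


def load_sections(path):
    """Read a CSV file (assumed UTF-8) and return its sections."""
    with open(path, newline="", encoding="utf-8") as handle:
        return split_sections(csv.reader(handle))
```

A single stray empty row is simply skipped here rather than treated as a separator, which is a design choice of this sketch.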
First, if you are logged in, the filename will be `KanjiBreak-hash1_[ALPHANUMERIC STRING]`. The alphanumeric string is the cryptographically secure representation of your username, so you can identify which breakdowns you contributed. If you downloaded the CSV spreadsheet, the first row will contain this username representation. If you downloaded the SQLite database, this information is unfortunately only available in the filename.
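Because the SQLite download carries the username representation only in its filename, a small helper can pull it back out. This is a minimal sketch: `user_hash_from_filename` is a hypothetical name, and it assumes the string consists only of ASCII letters and digits and is followed by a file extension or nothing at all.

```python
import re
from pathlib import Path

# Assumption: the string after "KanjiBreak-hash1_" is purely ASCII alphanumeric.
_HASH_RE = re.compile(r"KanjiBreak-hash1_([0-9A-Za-z]+)")


def user_hash_from_filename(path):
    """Return the username representation embedded in a download filename, or None."""
    match = _HASH_RE.match(Path(path).name)
    return match.group(1) if match else None
```

Files downloaded while logged out have no such string in their name, so the helper returns `None` for them.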
Second, the list of three-thousand-odd kanji and primitives. In the SQLite database, this is in the `targets` table. Each row represents a character capable of being broken down, or being in another character’s breakdown. (Caveat: a character can be in its own breakdown.) There are three columns in this data subset:
- a `target` column containing the kanji or primitive,
- a `primitive` column indicating whether the character is a primitive, and
- a `kanji` column indicating whether the character is a jōyō or jinmeiyō kanji.
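To illustrate reading this table from Python's built-in `sqlite3` module, here is a minimal sketch. The helper name `fetch_targets` is hypothetical, and the filter assumes the `primitive` flag is stored as an integer such as 0 or 1, which this page does not confirm.

```python
import sqlite3


def fetch_targets(conn, primitives_only=False):
    """Return (target, primitive, kanji) rows from the targets table."""
    sql = 'SELECT "target", "primitive", "kanji" FROM "targets"'
    if primitives_only:
        # Assumption: the flag is a truthy integer (e.g. 1) for primitives.
        sql += ' WHERE "primitive"'
    return conn.execute(sql).fetchall()
```

A typical call would open the downloaded file with `sqlite3.connect(...)`, passing whatever path the file was saved under, and then call `fetch_targets(conn, primitives_only=True)`.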
And third, what you really care about: the breakdowns. In the SQLite database, these are in the `deps` table. Each row has three columns:
- a `target` column indicating which kanji or primitive is being decomposed,
- a `user` column containing the cryptographic representation of the username that contributed this breakdown, and
- a `dependency` column containing one character in this user’s breakdown for this character.

All values for `target` and `dependency` will be entries in the previous dataset (in the `targets` table in SQLite).
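One way to check that guarantee on a downloaded copy is to look for values in `deps` that have no matching row in `targets`. The sketch below does this with a compound SQL query; `find_orphans` is a hypothetical helper name, and an empty result is what the description above leads one to expect.

```python
import sqlite3


def find_orphans(conn):
    """List deps values (target or dependency) that are missing from targets."""
    # SQLite evaluates compound operators left to right:
    # (targets in deps UNION dependencies in deps) EXCEPT known targets.
    query = '''
        SELECT "target" FROM "deps"
        UNION
        SELECT "dependency" FROM "deps"
        EXCEPT
        SELECT "target" FROM "targets"
        ORDER BY 1
    '''
    return [row[0] for row in conn.execute(query)]
```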
A user is only allowed to have a single breakdown per character, but since each breakdown can contain multiple characters, look for multiple rows with the same `target` and `user` values. Together, these rows denote that user’s entire breakdown for the given character.
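The grouping described above can be sketched in Python by collecting `dependency` values under each `(target, user)` pair. The function name `load_breakdowns` is hypothetical, and the order of characters within a breakdown simply follows whatever row order the database returns.

```python
import sqlite3
from collections import defaultdict


def load_breakdowns(conn):
    """Map each (target, user) pair to the list of characters in that breakdown."""
    breakdowns = defaultdict(list)
    query = 'SELECT "target", "user", "dependency" FROM "deps"'
    for target, user, dependency in conn.execute(query):
        # Rows sharing target and user belong to one breakdown.
        breakdowns[(target, user)].append(dependency)
    return dict(breakdowns)
```

Looking up `breakdowns[("明", some_user_hash)]` would then give that user's full breakdown for 明, with `some_user_hash` standing in for a real value from the `user` column.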
Since several primitives used by KanjiBreak lack Unicode representations (that is, in the Basic Multilingual Plane), I chose to use semi-unrelated English words to denote them in the data.
Below is the list of 45 such primitives. Those marked with “⭐️” are not in Kanji ABC (see details).