KanjiBreak and its contributors release this database under the Creative Commons “No Rights Reserved” CC0 license.
Click to download SQLite database
Download the entire KanjiBreak database as a single SQLite3 database file! This database file can readily be consumed by nearly all programming environments.
Click to download CSV
Download the database as a CSV (comma-separated value) file! This file is readily imported into spreadsheet programs.
The SVG drawings used to denote the primitives are not included in the SQLite3 database. Those are derived from KanjiVG and are made available under the Creative Commons BY-SA license (same as KanjiVG) as a JSON database.
Both the SQLite database and the CSV spreadsheet contain the same three sets of data. In the CSV file, the sets are separated from one another by two empty rows.
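As a rough sketch of reading the CSV export, the Python snippet below splits parsed rows into sets wherever two or more consecutive empty rows occur. The function name split_sections and the sample rows are made up for illustration, and the snippet assumes an empty row parses as either no cells or only blank cells; the real file may differ in such details.

```python
import csv
import io


def split_sections(rows):
    """Split CSV rows into sections separated by runs of two or more blank rows."""
    sections, current, blanks = [], [], 0
    for row in rows:
        # A row counts as blank if it has no cells or only empty cells.
        if not any(cell.strip() for cell in row):
            blanks += 1
            continue
        # Two or more blank rows before this row start a new section.
        if blanks >= 2 and current:
            sections.append(current)
            current = []
        blanks = 0
        current.append(row)
    if current:
        sections.append(current)
    return sections


# Invented sample data, NOT taken from the real export.
sample = "hash1_abc123\n\n\n日,0,1\n一,1,1\n\n\n日,hash1_abc123,一\n"
sections = split_sections(csv.reader(io.StringIO(sample)))
for index, section in enumerate(sections):
    print(index, len(section), "row(s)")
```

With a real download, the same function could be fed csv.reader over the opened file (opened with newline="" and encoding="utf-8", assuming the export is UTF-8).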
First, if you are logged in, the filename will be KanjiBreak-hash1_[ALPHANUMERIC STRING]. The alphanumeric string is a cryptographically secure representation of your username, so you can identify which breakdowns you contributed. If you downloaded the CSV spreadsheet, the first row will contain this username representation. If you downloaded the SQLite database, this information is unfortunately only available in the filename.
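Since the SQLite download carries the username representation only in its filename, a small helper can pull it out. This is a sketch under assumptions: the helper name username_hash, the file extensions, and the example filenames are hypothetical, and whether the user column stores the string with or without the hash1_ prefix is not stated here.

```python
import re
from pathlib import Path


def username_hash(filename):
    """Return the alphanumeric string after 'KanjiBreak-hash1_', or None if absent."""
    match = re.search(r"KanjiBreak-hash1_([A-Za-z0-9]+)", Path(filename).name)
    return match.group(1) if match else None


# Hypothetical filenames for illustration only.
print(username_hash("KanjiBreak-hash1_abc123DEF.sqlite3"))
print(username_hash("KanjiBreak.csv"))
```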
Second, the list of three-thousand-odd kanji and primitives. In the SQLite database, this is in the targets table. Each row represents a character that can be broken down or that can appear in another character’s breakdown. (Caveat: a character can be in its own breakdown.) There are three columns in this data subset:
target column containing the kanji or primitive,
primitive column indicating whether the character is a primitive, and
kanji column indicating whether the character is a jōyō or jinmeiyō kanji.
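The three columns above can be queried with Python's built-in sqlite3 module. The sketch below builds a tiny in-memory stand-in rather than opening the real file; the column types (text for target, 0/1 integers for the two flags) and the sample rows are assumptions, so check the actual schema before relying on it.

```python
import sqlite3

# In-memory stand-in for the downloaded file; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE targets (target TEXT, primitive INTEGER, kanji INTEGER)")
conn.executemany(
    "INSERT INTO targets VALUES (?, ?, ?)",
    # Invented rows: the last one mimics a primitive named by an English word.
    [("日", 1, 1), ("明", 0, 1), ("placeholder", 1, 0)],
)


def non_kanji_primitives(conn):
    """Return targets flagged as primitives but not as jōyō/jinmeiyō kanji."""
    rows = conn.execute(
        "SELECT target FROM targets WHERE primitive = 1 AND kanji = 0 ORDER BY target"
    )
    return [row[0] for row in rows]


print(non_kanji_primitives(conn))
```

To run this against a real download, replace ":memory:" with the path to the downloaded file and drop the CREATE and INSERT statements.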
And third, what you really care about: the breakdowns. In the SQLite database, these are in the deps table. Each row has three columns:
target column indicating which kanji or primitive is being decomposed,
user column containing the cryptographic representation of the username that contributed this breakdown, and
dependency column containing one character in this user’s breakdown for this character. All values for dependency will be entries in the previous dataset (in the targets table in SQLite).
A user is only allowed to have a single breakdown per character, but since each breakdown can contain multiple characters, look for multiple rows with the same target and user values: together, these rows denote that user’s entire breakdown for the given character.
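Reassembling breakdowns therefore amounts to grouping rows of the deps table by target and user. The sketch below does this in Python against an in-memory stand-in; the function name load_breakdowns, the column types, and the sample rows (including the user names) are assumptions made for illustration, and whether row order within a breakdown is meaningful is not stated here.

```python
import sqlite3
from collections import defaultdict

# In-memory stand-in for the deps table; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deps (target TEXT, user TEXT, dependency TEXT)")
conn.executemany(
    "INSERT INTO deps VALUES (?, ?, ?)",
    [
        ("明", "userA", "日"),
        ("明", "userA", "月"),
        ("明", "userB", "日"),
        ("日", "userA", "日"),  # a character can appear in its own breakdown
    ],
)


def load_breakdowns(conn):
    """Map each (target, user) pair to the list of that user's dependencies."""
    breakdowns = defaultdict(list)
    query = "SELECT target, user, dependency FROM deps ORDER BY rowid"
    for target, user, dependency in conn.execute(query):
        breakdowns[(target, user)].append(dependency)
    return dict(breakdowns)


breakdowns = load_breakdowns(conn)
for (target, user), parts in breakdowns.items():
    print(target, user, parts)
```

Keying on the (target, user) pair keeps different users' breakdowns of the same character separate, which matches the one-breakdown-per-user-per-character rule.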
Since several primitives used by KanjiBreak lack Unicode representations (that is, in the Basic Multilingual Plane), I chose to use semi-unrelated English words to denote them in the data.
Below is the list of 45 such primitives. Those marked with “⭐️” are not in Kanji ABC (see details).