Break down kanji with us!


Share the data!

KanjiBreak and its contributors release this database under the Creative Commons “No Rights Reserved” CC0 license.

Click to download SQLite database

Download the entire KanjiBreak database as a single SQLite3 database file! This database file can readily be consumed by nearly all programming environments.

Click to download CSV

Download the database as a CSV (comma-separated value) file! This file is readily imported into spreadsheet programs.

The SVG drawings used to denote the primitives are not included in the SQLite3 database. Those are derived from KanjiVG and are made available under the Creative Commons BY-SA license (same as KanjiVG) as a JSON database.

Technical details

Both the SQLite database and the CSV spreadsheet contain the same three sets of data. In the CSV file, consecutive sets are separated by two empty rows.
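As a rough illustration of the CSV layout, the sketch below splits CSV text into sections wherever two or more empty rows occur. The function name split_sections is made up for this example, and the sketch assumes an "empty row" is either a blank line or a row whose fields are all empty; the real file may differ in such details.

```python
import csv
import io


def split_sections(text):
    """Split CSV text into sections separated by two or more empty rows.

    Assumption: an "empty row" is a blank line or a row whose fields are
    all empty. A single empty row inside a section is simply skipped.
    """
    sections, current, blanks = [], [], 0
    for row in csv.reader(io.StringIO(text)):
        if not any(field.strip() for field in row):
            blanks += 1
            continue
        # A run of two or more empty rows closes the previous section.
        if blanks >= 2 and current:
            sections.append(current)
            current = []
        blanks = 0
        current.append(row)
    if current:
        sections.append(current)
    return sections
```

If you downloaded the file without being logged in, the username row may be absent, so it is safer to inspect the number of sections returned than to rely on fixed positions.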

First, if you are logged in, the filename will be KanjiBreak-hash1_[ALPHANUMERIC STRING]. The alphanumeric string is a cryptographically secure representation of your username, which lets you identify the breakdowns you contributed. If you downloaded the CSV spreadsheet, the first row contains this username representation. If you downloaded the SQLite database, this information is unfortunately available only in the filename.

Second, the list of three-thousand-odd kanji and primitives. In the SQLite database, this is in the targets table. Each row represents a character that can be broken down or that can appear in another character’s breakdown. (Caveat: a character can be in its own breakdown.) This data set has three columns:

  1. a target column containing the kanji or primitive,
  2. a primitive column indicating whether the target is a primitive, and
  3. a kanji column indicating whether the target is a jōyō or jinmeiyō kanji.
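A minimal sketch of reading the targets table with Python's built-in sqlite3 module follows. It assumes the primitive and kanji columns store their flags as the integers 0 and 1, which this page does not state, so check the actual values before relying on it. The function name count_targets and the filename in the comment are hypothetical.

```python
import sqlite3


def count_targets(conn):
    """Return (total rows, primitives, jōyō/jinmeiyō kanji) for the targets table.

    Assumption: the primitive and kanji flags are stored as 0 or 1.
    """
    total, primitives, kanji = conn.execute(
        "SELECT COUNT(*), "
        "COALESCE(SUM(primitive = 1), 0), "
        "COALESCE(SUM(kanji = 1), 0) "
        "FROM targets"
    ).fetchone()
    return total, primitives, kanji


# Usage (hypothetical filename; substitute the file you downloaded):
#   conn = sqlite3.connect("KanjiBreak.sqlite3")
#   print(count_targets(conn))
```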

And third, what you really care about: the breakdowns. In the SQLite database, these are in the deps table. Each row has three columns:

  1. a target column indicating which kanji or primitive is being decomposed,
  2. a user column containing the cryptographic representation of the username that contributed this breakdown, and
  3. a dependency column containing one character in this user’s breakdown for this character.

All values for target and dependency will be entries in the previous dataset (the targets table in SQLite). A user is only allowed to have a single breakdown per character, but since each breakdown can contain multiple characters, look for multiple rows with the same target and user values. Together, these rows make up that user’s entire breakdown for the given character.
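The grouping described above can be sketched in Python as follows: rows of the deps table that share the same target and user values are collected into one list per breakdown. The function name load_breakdowns is made up for this example, and the order of dependencies inside each list is whatever order the rows come back in, since this page does not say the order is meaningful.

```python
import sqlite3
from collections import defaultdict


def load_breakdowns(conn):
    """Map (target, user) to the list of dependencies in that user's breakdown."""
    breakdowns = defaultdict(list)
    query = 'SELECT target, "user", dependency FROM deps'
    for target, user, dependency in conn.execute(query):
        # Rows sharing target and user belong to the same breakdown.
        breakdowns[(target, user)].append(dependency)
    return dict(breakdowns)
```

To find your own contributions, filter the resulting keys by the alphanumeric string from your download's filename.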

Denoting primitives

Since several primitives used by KanjiBreak lack Unicode representations (that is, in the Basic Multilingual Plane), I chose to use semi-unrelated English words to denote them in the data.

Below is the list of 45 such primitives. Those marked with “⭐️” are not in Kanji ABC (see details).

every = TO11

inch⭐️ = EN7

fun = FR2

fishy⭐️ = FR12

marshall = B9

noble = B14

bird = C11

bless = D7

sing = F8

recommend = H9

careful = I14

shine = I19

school = L2

usual = L3

tall = M2

new = M16

mikado = M18

shizuku = M19

sky = N5

sure = O6

park = P4

pathos = P5

mourning = P8

detain = P15

discard = Q8

charity = Q14

thirst = S7

association = T2

tool = T18

exist = V4

left = V5

lament = V16

haru = V21

soldier = W2

judge = W5

soak = X3

supervise = Y6

frank = Z1

seal = Z2

not = Z5

queen = Z7

bureau = Z10

beautiful = Z17

return = Z18

anxious = Z22
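
Because these primitives appear in the data as English words rather than as single characters, one way to spot them is to test whether a target consists only of ASCII letters. The sketch below assumes that no other target in the data is written in plain ASCII letters, which this page does not guarantee. The function name is hypothetical, and the small lookup table copies only three entries from the list above.

```python
# A few entries copied from the list above (word used in the data -> code).
WORD_TO_CODE = {"every": "TO11", "inch": "EN7", "fun": "FR2"}


def is_word_primitive(target):
    """Guess whether a target is one of the English-word primitives.

    Assumption: only word-denoted primitives consist purely of ASCII letters.
    """
    return target.isascii() and target.isalpha()
```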