Yomichan/Yomitan Dictionaries

A comprehensive collection of Japanese and Chinese dictionaries for Yomichan/Yomitan, including terms, kanji/hanzi info, frequency, and variants with both monolingual and bilingual dictionaries available.

This repository contains dictionaries for Yomichan/Yomitan, a Japanese dictionary browser extension for Chrome, Firefox, and Edge. The repository was originally created to host the dictionaries I created, but I have since adapted this repository to serve as a hub for other dictionaries as well. If you have a dictionary you would like to share, please open an issue or pull request.

Please see the What Dictionaries Should I Install? section for information on what dictionaries to install.

My related dictionary resources:

Dictionary Collection

Dictionaries Folder Download

Changelog

Here is a folder containing all the Japanese, Mandarin, and Cantonese dictionaries that I personally use and would recommend. I usually keep up to date with the latest versions of the dictionaries, and the folder is configured to automatically download and import the regularly updating dictionaries such as Jitendex, JMnedict, and KANJIDIC. If there is a dictionary in the folder that is outdated, please let me know!

Tip

  • But scrolling to the next entry takes too long!
  • Here are some shortcuts that may or may not help you with this:
    • alt + scroll down
    • alt + pagedown
    • alt + down arrow

Important

The collection is not complete as there are some dictionaries I feel are outdated or unnecessary or have simply chosen not to install. For these as well as older and alternate versions of dictionaries, I recommend checking out Shoui's Dictionary Collection.

What Dictionaries Should I Install?

If you're a beginner, I recommend following Shoui's Yomichan Setup. I would say the bare minimum is:

  • Bilingual:
    • [JA-EN] jitendex-yomichan
    • [JA-EN] NEW 斎藤和英大辞典
    • [JA-EN] 新和英
  • Grammar:
    • [JA Grammar] dojg-consolidated-v1_01
  • Frequency:
    • [JA Freq] JPDB_2022-05-10T03_27_02.930Z
    • [JA Freq] Freq_CC100
    • [JA Freq] BCCWJ-LUW
  • Monolingual but still useful for coverage to a beginner! If a word shows up in one of these dictionaries but not an English one, you can just web search/translate to find out more information.
    • [JA-JA] 実用日本語表現辞典
    • [JA-JA] デジタル大辞泉
    • [JA-JA Encyclopedia] Pixiv
  • Kanji information
    • [Kanji] KANJIDIC_english
    • [Kanji] JPDB Kanji
  • Pitch accent
    • [Pitch] 大辞泉

If you're a bit more familiar with the language, I highly recommend following Kuri's Yomichan Setup which goes in depth with the technical setup process and recommends you install around thirty dictionaries, with a lot of thought having been put into the selections.

My personal recommendation is to install everything. With every dictionary installed, you know that any (actual) word you come across will almost certainly be in your Yomitan installation, and that you will have a good selection of definitions for almost any word. Most dictionaries on their own do not have that high a breadth of coverage, so having fewer dictionaries installed means you may run into confusion when you fail to look up a word. Some of the dictionaries cover quite different subject areas as well: hover over a cultural reference and you will be more likely to find it in the Pixiv dictionary than in any other.

Dictionaries Sort Script

If you install a lot of dictionaries and/or sync your dictionary collection across multiple devices, it can be quite the ordeal to change the sort order with the way Yomitan's UI is set up. I have written a script that will automatically sort your dictionaries for you. You can find it here: Yomitan Dictionaries Sort Script for Yomitan, or Yomitan Dictionaries Sort Script for Yomitan 24.2.10.0 and newer.

To use it, simply copy the script, open the Yomitan options page, open the console, paste the script, and press enter. It will automatically sort your dictionaries for you.

By default, the sort order used is the one that I use and it supports all the dictionaries in the folder. If you want to use a different sort order, you can edit the script to change the order variable at the top.
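
The actual script is JavaScript that runs in the browser console and is not reproduced here. As a rough illustration of what an order list does, here is a minimal Python sketch. The function name, the dictionary names, and the rule that unlisted dictionaries go last are all assumptions made for the example, not the script's real behavior.

```python
def sort_dictionaries(installed, order):
    """Sort installed dictionary names by their position in `order`.

    Names missing from `order` go last and keep their relative order
    (an assumption for this sketch).
    """
    position = {name: index for index, name in enumerate(order)}
    # sorted() is stable, so unlisted names stay in their original order
    return sorted(installed, key=lambda name: position.get(name, len(order)))


# Hypothetical preferred order and installed set (placeholder names)
order = ["Jitendex", "新和英", "デジタル大辞泉"]
installed = ["デジタル大辞泉", "Pixiv", "Jitendex", "新和英"]
sorted_names = sort_dictionaries(installed, order)
print(sorted_names)
```

Editing the order list is then the only change needed to get a different arrangement.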

Japanese

For an easy download of the dictionaries I use, check out this folder.

Do check out yomichan-dict-css for CSS that colors some term dictionaries to make them more immediately distinguishable.

Terms

JP-EN Term Dictionaries

Jitendex

Jitendex

Jitendex is a free and openly licensed Japanese-to-English dictionary built upon data from JMdict and other projects. It is the successor to JMdict for Yomichan.

JMDict

Download

The most extensive JP-EN dictionary using data from the EDRDG Project created by Jim Breen. It is recommended you use Jitendex unless you need a legacy version of the dictionary for technical reasons.

JMnedict

Download

A dictionary of Japanese proper names. The linked version is advantageous over the one linked on the Yomichan homepage as it clutters the search page much less when searching, so it's highly recommended.

Shoui Bilingual Dictionaries Collection

Download

There are various bilingual dictionaries in Shoui's bilingual folder. Check the readme in the folder for further information.

  • 新和英 (Recommended)
    • Same as the 研究社 新和英大辞典 第5版 with better deconjugation but lacking some additional sentences.
  • 研究社 新和英大辞典 第5版

New Saitou Japanese-English Dictionary

Find [JA-EN] NEW 斎藤和英大辞典 in the dictionary collection.

A bilingual dictionary by an anon, with lots of example sentences. To avoid cluttering the search page, you may want to limit the number of example sentences shown by using the following CSS. The selector li:nth-child(n + 5) hides the fifth list item onward; change the 5 to show more or fewer:

[data-dictionary='NEW斎藤和英大辞典'] ul.gloss-sc-ul > li:nth-child(n + 5) {
  display: none;
}

Japanese Monolingual Dictionaries

Shoui Monolingual Dictionaries Collection

Download

There are various monolingual dictionaries in Shoui's monolingual folder, authored by various people. Check the readme in the folder for further information, and check the explanation on learnjapanese.moe on how to use them. Currently contains:

  • 広辞苑 第七版
  • 三省堂国語辞典 第七版 (Recommended)
  • 実用日本語表現辞典 (Recommended)
  • 新明解国語辞典 第七版 (Recommended)
  • 明鏡国語辞典 第二版 (Recommended)
  • 旺文社国語辞典 第十一版 (Recommended)
    • Converted by irhello and shoui.
  • Weblio 古語辞典
    • Scraped/converted by 昔男/mk68.
  • 精選版 日本国語大辞典
  • 明鏡国語辞典
  • 旺文社国語辞典 第十一版 画像無し
  • 新明解国語辞典 第五版
  • 故事ことわざの辞典
    • Converted by Thermosphere with Yomichan Import
  • 広辞苑 第六版
  • 岩波国語辞典 第六版
  • 大辞林 第三版
  • ハイブリッド新辞林 v2
  • デジタル大辞泉
    • Converted by ッツ.
  • 新明解四字熟語辞典
    • Converted by ッツ.
  • 学研 四字熟語辞典
    • Converted by ッツ.
  • 日本語俗語辞書
    • Scraped/converted by Kartoffel.
  • 漢字源

Iwanami Kokugo Jiten

Find [JA-JA] 岩波国語辞典 第八版 in the dictionary collection.

A monolingual dictionary made by an anon, with very nice formatting and links for related terms.

Jitenon Dictionaries

Find them in the dictionary collection.

There are many dictionaries available at 辞典オンライン, and some of them have been scraped with stephenmk's jitenbot for use as Yomichan dictionaries. There are quite a few entries that aren't in other dictionaries, so I'd recommend installing these.

Includes the following dictionaries:

Sankoku Eighth Edition

三省堂国語辞典 第八版

Converted by Malte using stephenmk's jitenbot.

Daijirin Fourth Edition

大辞林第四版 | 大辞林第四版 画像無し

Converted by Malte using stephenmk's jitenbot.

Shinmeikai Eighth Edition

新明解第八版

Converted by Malte using stephenmk's jitenbot.

Meikyou Second Edition

Find [JA-JA] 明鏡国語辞典 第二版 in the dictionary collection.

Converted by dictionary anon, this dictionary has nice modern formatting.

Shinsenkoku Tenth Edition

Find [JA-JA] 新選国語辞典 第十版 in the dictionary collection.

Converted by dictionary anon, this dictionary has nice modern formatting.

Recommended custom CSS:

li[data-dictionary^='新選国語辞典'] th,
span[data-sc-shinsenkoku10='warichu'] {
  white-space: nowrap;
}
span[data-sc-shinsenkoku10='red'] {
  color: #e5007f;
}

Goo Thesaurus

使い方の分かる 類語例解辞典

Scraped by Malte from the online goo.ne thesaurus.

Goo Dialect Dictionary

全国方言辞典

Scraped by Malte from the online goo.ne dialect dictionary.

Note

This version has some (~650) entries formatted like そーだ instead of そうだ, which can be annoying.

新語時事用語辞典

新語時事用語辞典

Scraped from http://www.breaking-news-words.com/ by Malte.

「新語時事用語辞典」は、ニュースで今最も話題になっている最新のキーワードを時流に即して紹介する、速報・辞書サイトです。新聞で、テレビで、ネットで話題になっているキーワードや流行語をいち早く紹介します。

実用日本語表現辞典

実用日本語表現辞典

Scraped from http://www.practical-japanese.com/ by Malte. Updated version of the old yomichan dictionary.

Kanjipedia 同訓異義

Kanjipedia 同訓異義

Scraped by Julian and converted by Malte, sourced from Kanjipedia.

This dictionary differentiates the usages of words with the same reading but different kanji, for example 越える・超える・逾える・踰える.

漢検漢字辞典 第二版 Kanken Kanji Jiten 2nd Edition

Download (Updated 2024-03-04)

The 漢検漢字辞典 第二版 Kanken Kanji Jiten 2nd Edition is a kanji dictionary made by the 日本漢字能力検定協会 (Japanese Kanji Aptitude Test Association). It was converted by dictionary anonymous into a Yomitan dictionary.

Note

This dictionary will not work on outdated Yomitan/Yomichan/Yomibaba installations as it utilizes new additions to the schema. Please install Yomitan 23.12.29 or newer.

JA Wikipedia

Wikipedia for Yomitan

A conversion of the DBPedia short-abstract dumps of JA Wikipedia for Yomitan. This dictionary features over 1.2 million entries with each entry containing the abstract and a link to the Wikipedia article. Unfortunately there are no dumps of DBPedia after December 2022, so regular updates will not be possible until DBPedia starts updating again.

Pixiv

Pixiv for Yomitan

Last Updated: 2024-02

A complete scrape of the public dic.pixiv.net encyclopedia of over 500,000 entries, containing a brief summary and links to related articles for each entry. This dictionary is quite extensive and contains entries for a vast amount of terms that would not be in traditional dictionaries. For instance, 和泉妃愛 has an entry as does likely every notable VTuber, media franchise, and mountain in Japan.

niconico-pixiv Terms

(Obsolete)

Download

Using the information gathered by ncaq for use in an IME, this is a dictionary that can help parse terms that are in both niconico and pixiv's online dictionaries. These online dictionaries are sort of like encyclopedias of the internet, so many terms such as proper nouns not in traditional dictionaries will be found.

ルールベースで IME 辞書の役に立たなそうな単語を除外しています。

surasura Onomatopoeia

Download

A dictionary of onomatopoeia from surasura.com. Contains some onomatopoeia that are not in any other dictionaries. Credit to stephenmk for the idea to mark information using those emojis with his improved JMDict.

For each entry, it contains:

  • A few definitions
  • An extended explanation if available, marked with the ℹ️ emoji
  • A few example sentences marked with the 🇯🇵 flag emoji

複合語起源 Term Origins

Download | List of words

Compound kunyomi word origins/etymology, for example 陥る -> 落ち入る(おち|いる). Information comes from anonymous forum posts, so it may not be 100% accurate.

Sources:

Gogen Yurai

Find [JA-JA Origins] 語源由来辞典 in the dictionary collection.

語源由来辞典 etymology information parsed from https://gogen-yurai.jp/ by Seikou. Contains information about the origins of words.

対義語辞典オンライン Taigigo Jiten Online

Find [JA-JA Antonym] 対義語辞典オンライン (2024-02-07) in the dictionary collection.

A dictionary of antonyms from 対義語・反対語辞典オンライン, converted by 霜月.

類語辞典オンライン Ruigo Jiten Online

Find [JA-JA Thesaurus] 類語辞典オンライン (2024-02-09) in the dictionary collection.

A dictionary of synonyms from 類語辞典オンライン, converted by 霜月.

数え方辞典オンライン Kazoekata Jiten Online

Find [JA-JA Counters] 数え方辞典オンライン (2024-02-13) in the dictionary collection.

A dictionary of counters from 数え方辞典オンライン, converted by 霜月.

例解学習国語辞典 第十一版 Reikai Gakushuu Kokugo Jiten

Find [JA-JA] 例解学習国語辞典 第十一版 in the dictionary collection.

Converted by @SalwynnJP with data provided by Ludia.

Salwynn's notes/images

Improved entries (around ~30k duplicates) by comparing readings with other dicts (漢字遣い参考 / 明鏡国語辞典 第二版 etc...)

Ex : [売り出し] initial term // [売出し] - [売出] duplicates

Ex : 連らく initial term // [連絡] duplicate

Ex : とうきょうと initial term (only kana) // [東京都] duplicate

56k entries + some 使い分け section (up your Text scan length in yomitan's settings to scan these)

Includes 例文 / 熟語 / 成句 for most defs

Thoughts : For testing it several weeks, I think this dict deserves his first place, above 三省堂 国語辞典 第八版

Notes : This dict needs to get conjugation for yomitan. If there are other issues about this dict, let me know.

From Discord

現代国語例解辞典 第五版 Gendai Kokugo Reikai Jiten

Find [JA-JA] 現代国語例解辞典 第五版 in the dictionary collection.

  • Has hundreds of tables and charts for explaining usage contexts of similar words
  • Has many entries for onomatopoeia / mimetic words that are grouped by similar meanings

Converted by DAnon

Stats:
  • 62,873 vocabulary entries
  • 1,356 慣用句 entries
  • 276 助詞・助動詞 entries
  • 82 擬音語・擬態語 categories containing 1,138 subentries
  • 2,787 kanji entries

Kanji de Go

Kanji de Go for Yomitan

Kanji de Go (漢字で Go!) is a fun game quizzing people on rare/exotic kanji terms. Converted by Marv.

Grammar Dictionaries

aiko-tanaka Grammar Dictionaries

Download

A collection of grammar dictionaries scraped and converted by aiko-tanaka. A lot of manual work was put into making them parse well, so I'd recommend you install all of them. Contains:

  • Nihongo no sensei 毎日のんびり日本語教師
  • E de wakaru 絵でわかる日本語
  • Nihongo Kyoshi JLPT 文法解説まとめ
  • Donna Toki どんなときどう使う 日本語表現文型辞典
  • DoJG 日本語文法辞典(全集)

Term Frequency

jpdb Frequency Dictionary

Download

A frequency dictionary based on information scraped from https://jpdb.io in May of 2022. More information can be found here.

Due to the way the data was scraped, some terms are missing frequencies, and the jpdb dictionary itself is limited to terms in JMDict. For example, 経緯 only has an entry for the いきさつ reading even though けいい is the more common/correct reading, so this dictionary should not be used for sorting. However, the corpus of JPDB is quite good for immersion learners as it covers anime, dramas, light novels, visual novels, and web novels, so the frequencies will be relatively accurate to what you're actually reading. This dictionary is notable for displaying the frequencies of kana readings separately, so you can often get a sense of how often a word is written with kanji or not.

Aozora Bunko Jukugo Frequency

Download

A frequency dictionary created using data collected by vrtm based on the Aozora Bunko. Due to the methodology used, this dictionary does not cover words with kana in them but it covers many rare 熟語 not covered by other frequency dictionaries, such as 睽乖. The number in parentheses is the number of times the word appears in the corpus.

CC100

Find [JA Freq] Freq_CC100 in the dictionary collection.

Made by the mind behind arujisho, this uses the CC100 dataset which was made by crawling the web. Coverage is very wide, and there is reason behind the way readings are differentiated which is why I use this as my Yomichan sort dictionary.

Original message by Seikou

Hello everyone! Recently I tokenized the CC-100 Japanese dataset (which is a high quality dataset filtered from Commoncrawl web crawl data, and is about 70GB large) as a corpus using mecab(fugashi) and sudachi, resulting a frequency rank list of about 900k words. After filtering it using several monolingual dictionaries, I got a freq rank list of roughly 160k words.

BCCWJ

Download

From the publication:

The balanced corpus of contemporary written Japanese (BCCWJ) is Japan’s first 100 million words balanced corpus. It consists of three subcorpora (publication subcorpus, library subcorpus, and special-purpose subcorpus) and covers a wide range of text registers including books in general, magazines, newspapers, governmental white papers, best-selling books, an internet bulletin-board, a blog, school textbooks, minutes of the national diet, publicity newsletters of local governments, laws, and poetry verses.

It has extremely wide coverage with most terms you'll encounter having an entry in this list even if other frequency lists don't. In addition, it differentiates between readings quite well. Make sure to install the LUW version as it has more terms.

Innocent Ranked

Download

The Innocent Corpus from the Yomichan page but reordered to be sorted by rank. It is based on data from 5000+ novels. A weakness is that it does not differentiate based on reading, so all readings of a term will show the same value.

jpDicts Frequencies

Download

A frequency dictionary created using monolingual dictionary definitions as the corpus, so it might be useful for those who really like reading dictionaries. Made by Avratzzz.

Dictionaries used:
  • ハイブリッド新辞林 v2
  • 故事ことわざの辞典
  • 漢字源
  • 精選版 日本国語大辞典
  • 新明解四字熟語辞典
  • 学研 四字熟語辞典
  • 実用日本語表現辞典
  • 明鏡国語辞典
  • 旺文社国語辞典 第十一版
  • 新明解国語辞典 第五版
  • 大辞林 第三版
  • デジタル大辞泉
  • 岩波国語辞典 第六版
  • 広辞苑 第六版

Youtube Frequency Dictionaries

Find [JA Freq] YoutubeFreqV3 in the dictionary collection.

Download all domain-specific dictionaries

Using data from 40k manually transcribed YouTube videos we have created 16 domain specific frequency lists for YomiChan. Enjoy and feel free to share around. Created by @Zetta @Vexxed @Anonymous

Domain-specific frequency lists from Youtube Videos:

Domains:
  • Vlogs
  • Vehicles
  • Travel
  • TEDx
  • Sports
  • SciTech
  • Pets/Animals
  • Nonprofits
  • News
  • Music
  • HowtoStyle
  • Gaming
  • Film/Anime
  • Entertainment
  • Education
  • Comedy

Corpus of Everyday Japanese Conversation

Download

Important

Due to the limited nature of the original data set, this frequency list only goes up to around 20,000 in frequency. It is still useful to know the relative frequency of words in conversation, but the frequency values should not be compared to those from other more expansive frequency dictionaries. For a more complete list that is somewhat conversational, I recommend trying the Youtube frequency list.

This Yomichan frequency dictionary based on the Corpus of Everyday Japanese Conversation was converted by forsakeninfinity.

The Corpus of Everyday Japanese Conversation (CEJC) is a vocabulary and word count table based on 200 hours of recorded data (approximately from April 2016 to 2020).

Our project will develop a large-scale corpus of Japanese everyday conversation in a balanced manner. Since informants record their conversations in everyday situations by themselves, naturally occurring conversations can be collected. To build an empirical foundation for the corpus design, we conducted a survey of ordinary conversational behavior of about 250 adults.

Corpus of Spontaneous Japanese

Find it here

Converted by Malte, “The Corpus of Spontaneous Japanese” (or CSJ) is a database containing a large collection of Japanese spoken language data and information for use in linguistic research; jointly developed by NINJAL, NICT and the Tokyo Institute of Technology, the CSJ is world-class in both the quantity and quality of the available data. Goes up to 31,605 frequency.

NINJAL Web Japanese Corpus

Find it here

Converted by Malte. Goes up to 106,762 frequency.

Shoui Dictionaries Collection Misc. Frequency Dictionaries

Some other miscellaneous frequency dictionaries in the Shoui Dictionaries Collection.

  • Anime & J-drama
  • Narou Freq
  • Novels
  • VN Freq v2
  • Wikipedia v2
  • 国語辞典
  • Nier

OhTalkWho オタク Frequency Dictionaries

Download

Some frequency dictionaries made by the YouTuber OhTalkWho オタク.

  • Netflix
  • Top 100 Shonen
  • Top 100 Slice of Life
  • JLPT Level Tags
  • Novel 5k
    • This might just be innocent corpus with stars?
  • Visual Novels
    • Might be based off vnstats? It's different than the VN Freq v2 in Shoui's Dictionaries Collection.

Anacreon's Frequency Dictionaries

Download

Some frequency dictionaries made by Anacreon that are not rank-based, but rather percentage-based where the displayed value is the percent of that corpus you would be able to read if you knew every word with that percentage or lower. They are somewhat redundant with other previously mentioned dictionaries, but some people may prefer the percentage-based approach.

Frequency is displayed as a number between 0 (most frequent) and 100 (least frequent). Check out this graph; essentially, the number in these dictionaries is the Y axis of the graph. So if you were aiming to understand 95% of the words you come across, the most efficient way would be to mine all the words with a frequency less than or equal to 95.
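
To make the percentage-based idea concrete, here is a minimal Python sketch. It assumes the value is a cumulative share of word occurrences, computed from most to least frequent. The word counts are invented for the example, and Anacreon's actual method and data are not shown in this README.

```python
def coverage_percentages(counts):
    """Map each word to the cumulative % of the corpus covered once you
    know that word and every more frequent word."""
    total = sum(counts.values())
    # Most frequent first; ties broken by the word itself for determinism
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    percentages = {}
    running = 0
    for word, count in ranked:
        running += count
        percentages[word] = round(100 * running / total, 2)
    return percentages


# Toy corpus of 100 word occurrences (made-up counts)
toy_counts = {"の": 50, "する": 30, "猫": 15, "壟": 5}
percentages = coverage_percentages(toy_counts)

# Words to mine if the goal is to read about 95% of this toy corpus
to_mine = [word for word, pct in percentages.items() if pct <= 95]
print(percentages)
print(to_mine)
```

In this toy corpus the rarest word only lifts coverage from 95 to 100, which is why mining everything at or below a target value is an efficient cutoff.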

JLPT Vocab Frequency

yomichan-jlpt-vocab

A frequency dictionary based on unofficial JLPT lists from ten years ago. There are no official vocab lists for the JLPT exam so the numbers in this list should only be used as a guideline.

Kanji

Yomichan CSS for Kanji Dictionaries

Yomichan and KANJIDIC by default have a lot of bloat in the kanji dictionary viewer, like repeating the kanji stroke order image, frequency information, and unused table rows for every entry. For using multiple kanji dictionaries, you can use some CSS to make the kanji display more compact like it is for terms.

In Settings -> Popup Appearance -> Configure custom CSS... input the following CSS for more compact display of entries.

/* remove misc dict classifications/codepoints/stats */
.kanji-glyph-data > tbody > tr:nth-child(n + 3) {
  display: none;
}

/* remove stroke diagram, freq, header for next entries */
div.entry[data-type='kanji']:nth-child(n + 2) .kanji-glyph-container,
div.entry[data-type='kanji']:nth-child(n + 2) [data-section-type='frequencies'],
div.entry[data-type='kanji']:nth-child(n + 2) table.kanji-glyph-data > tbody > tr:first-child {
  display: none;
}

/* remove 'No data found' */
.kanji-info-table-item-value-empty {
  display: none;
}

/* reduce extra padding */
.kanji-glyph-data,
div.entry[data-type='kanji'],
div.entry[data-type='kanji']:nth-child(n + 2) .kanji-glyph-data > tbody > tr > *,
.kanji-glyph-data dl.kanji-readings-japanese,
div.entry[data-type='kanji']:nth-child(n + 2)
  .kanji-glyph-data
  dl.kanji-readings-chinese[data-count='0'] {
  padding-top: 0 !important;
  padding-bottom: 0 !important;
  margin-bottom: 0em;
  margin-top: 0 !important;
}
/* remove horizontal lines */
.entry + .entry[data-type='kanji'],
div#dictionary-entries > div.entry:nth-child(n + 2) .kanji-glyph-data > tbody > tr > * {
  border-top: none !important;
}
/* change decimal list */
.kanji-gloss-list {
  list-style-type: circle;
}

Kanji Info

KANJIDIC

Download

The KANJIDIC Project's KANJIDIC is the primary English kanji dictionary used in Yomichan and contains information about most kanji, notably English definitions, readings, and some other statistics like stroke count, JLPT, grade level.

Wiktionary Kanji

Download

Kanji information of around 18,000 characters from Wiktionary, notably:

  • 呉音, 漢音, 唐音, 宋音, 慣用音 onyomi readings of kanji (further reading)
  • 字源 - information about how and why a kanji is composed the way it is, including the type of composition it is
  • The meaning of the kanji (in Japanese)
  • The various 異体字 of the kanji

jpdb Kanji

Download

Kanji information of around 6,000 characters from https://jpdb.io:

  • The 15 most common vocab applicable
  • The kanji decomposition according to jpdb (has inaccuracies because it's meant for memorizing keywords)
  • 漢字検定 level
  • 旧字体/新字体/拡張新字体 character form

TheKanjiMap

Download | List of possible phonetic components

Information from TheKanjiMap:

  • Radical information for all radicals
  • Kanji decomposition (more accurate than JPDB)
  • List of all kanji that contain a kanji/component/radical
  • Reading hints based on possible phonetic components (computed based on information from KANJIDIC and the decomposition here)

Kanji Jitenon Online

Download

The online 漢字辞典オンライン kanji dictionary is an extensive Japanese kanji dictionary. It was converted into a Yomichan kanji dictionary by eurusdagr.

Kanji Variants

mozc

Download

A kanji dictionary made from the kanji variant information in Google's mozc Japanese IME. Includes information about:

  • 異体字
  • 印刷標準字体
  • 簡易慣用字体
  • 旧字体
  • 略字
  • 正字
  • 俗字
  • 別字
  • 本字

jitai

Download

A kanji dictionary made using the data from shinjigen-glyph. This allows you to see information about 旧字体, 新字体, 拡張新字体, and 標準字体 variants from the kanji page in Yomichan.

Kanji Frequency

Aozora Bunko Kanji Frequency

Download

A kanji frequency dictionary created using data collected by vrtm based on the Aozora Bunko. The number in parentheses is the number of times the kanji appears in the corpus.

Innocent Corpus Kanji Frequency

Download

Uses the innocent corpus frequency list that is distributed with Yomichan to create a rank-based kanji frequency dictionary. This was created because the existing one is an occurrence-based list and does not display ranks.

  • The displayed frequency in Yomichan will contain the frequency rank followed by the occurrence count, for example 4686 (57) for 壟, indicating it's the 4686th most common kanji and appeared 57 times total in the 5000+ novels in Innocent Corpus.
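
The conversion from an occurrence-based list to a rank-based one can be sketched in Python as follows. This is only an illustration: the function name and the counts are hypothetical, and the tie-breaking rule is an assumption. With just three kanji, the ranks here do not match the real dictionary.

```python
def rank_with_counts(occurrences):
    """Return {kanji: "rank (count)"}, where rank 1 is the most frequent.

    Ties are broken by code point order, which is an assumption made
    only to keep this sketch deterministic.
    """
    ordered = sorted(occurrences.items(), key=lambda item: (-item[1], item[0]))
    return {
        kanji: f"{rank} ({count})"
        for rank, (kanji, count) in enumerate(ordered, start=1)
    }


# Made-up occurrence counts for three kanji
toy_occurrences = {"人": 900, "日": 1200, "壟": 57}
display = rank_with_counts(toy_occurrences)
print(display["壟"])
```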

Wikipedia Kanji Frequency

Download

Rank-based kanji frequency data from a May 2015 dump of Japanese Wikipedia, containing around 20,000 kanji. Data gathered by scriptin.

jpdb Kanji Frequency

Download

Kanji frequency data from https://jpdb.io as a Yomichan frequency dictionary.

Yomitan CSS for Non-Japanese CJK Languages

Yomitan by default renders everything in Japanese leading to incorrect glyphs being rendered when using Yomitan with non-Japanese CJK languages. This can be fixed with some CSS.

/* Set Render Language */
* {
  /* 
  Optionally set the version(s) of Noto Sans or another font you want in your preferred order.
  e.g. JP, TC, SC, HK
  */
  /* prettier-ignore */
  font-family:
  'Noto Sans HK', 
  'Noto Sans TC',
  'Noto Sans SC',
  'Noto Sans JP',
  sans-serif;

  /* 
  ja (Japanese)
  zh-Hans (Simplified)
  zh-Hant (Traditional)
  zh-Hant-HK (Traditional Hong Kong)
   */
  -webkit-locale: 'zh-Hant-HK' !important;
}

/* In Hanzi popups and the search box, override the font */
.kanji-glyph,
#search-textbox {
  font-family: unset !important;
}
/* Set Render Language End */

Simply copy this CSS into Settings -> Popup Appearance -> Configure custom CSS... and change the font-family and -webkit-locale variables to the language you want.

  • Setting the font family here is optional; setting the webkit-locale should be enough to fix the issue. However your default system fonts may look bad so I recommend installing Noto Sans.
    • In the above CSS, the order of the fonts means that when a glyph is not found in the Noto Sans TC font, it would then try to find it in the Noto Sans SC font, and so on.
  • Note that Firefox users need to set the font-language-override property instead of the -webkit-locale property as it is not supported in Firefox.

Mandarin Chinese

For an easy download of the dictionaries I use, check out this folder.

For CSS to fix the rendering of non-Japanese characters in Yomitan, see this section.

Terms

Term Dictionaries

CEDICT

CC-CEDICT dictionary for Yomichan

There was a previous version but the formatting wasn't as great and it was kind of outdated. So I created this repository with some more modern formatting and also added proper hanzi functionality. The repository automatically updates every day from the newest data at MDBG.

Shoui's Chinese Yomichan Setup

Shoui's Chinese Yomichan Setup

These Chinese Yomichan dictionaries are hosted in Shoui's guide to setting up Yomichan for Chinese. Includes:

  • [ZH-EN] CEDICT (Outdated) (converted by an anon)
  • [ZH-JA] 中日大辞典 第二版 (converted by an anon)
  • [ZH-ZH] 兩岸詞典 (converted by Chrono7 on the Refold ZH Discord server)
  • [ZH-ZH] 漢語大詞典 (converted by Chrono7 on the Refold ZH Discord server)
  • [ZH-ZH] 萌典国语辞典 (简体字) (converted by Chrono7 on the Refold ZH Discord server)

Simplified Chinese Versions

Michel converted some of the above dictionaries to simplified Chinese.

  • Download 汉语大词典
    • The Hanyu Da Cidian is the most comprehensive Chinese dictionary, comparable to the Oxford English Dictionary.
  • Download 两岸词典
    • The Cross-Straits dictionary is a small mainland dictionary focusing on contemporary usage and the differences between Taiwan and Mainland Chinese.

ZH Wikipedia

Wikipedia for Yomitan

A conversion of the DBPedia short-abstract dumps of ZH Wikipedia for Yomitan. This dictionary features over 1.2 million entries with each entry containing the abstract and a link to the Wikipedia article. Unfortunately there are no dumps of DBPedia after December 2022, so regular updates will not be possible until DBPedia starts updating again.

Other Chinese Dictionaries

Download

These miscellaneous Chinese Yomichan dictionaries were converted by lix on the Refold ZH Discord server. Includes:

  • 萌典.pinyin
  • 萌典
  • 牛津英汉汉英词典
  • 现代汉语规范词典
  • 譯典通英漢雙向字典
  • 五南國語活用辭典

Wenlin ABC Chinese-English Comprehensive Dictionary

Download

Published in July 2003 and revised through 2005, the Wenlin ABC Chinese-English Comprehensive Dictionary was produced by the Wenlin Institute in cooperation with the ABC Chinese Dictionary Series Project at the University of Hawaii. It contains over 196,000 entries. This file was converted by rduwjjnh.

500 Common Chinese Idioms

500 idioms

A glossary of 500 chengyu with two example sentences for each idiom. The data is taken from this published book, and it was converted for Yomitan by Michel.

白水社 中国語辞典 Hakusuisya Chinese-Japanese Dictionary

Find [ZH-JA] 白水社 中国語辞典 in the dictionary collection.

From weblio's 白水社 中国語辞典, converted by 昔男.

I scraped the 白水社 中国語辞典 from weblio into a yomichan dictionary. 64k entries, I wouldn't say it's as extensive as 漢語大詞典 but it's pretty g. I only scraped Chinese-Japanese entries for the record. from Discord

Chinese Frequency

BLCU BCC Corpus

Find them in the dictionary collection.

Dictionaries (title: corpus):
  • BLCUmixed: A Balanced Mix from Magazines, Literature, Weibo, Tech
  • BLCUlit: Literature (Foreign and Domestic)
  • BLCUnews: Newspapers (《厦门日报》, 《厦门商报》, 《厦门晚报》, etc.)
  • BLCUsci: Scientific and Technological Academic Journals
  • BLCUcoll: Dialogue (Weibo and Movie/TV Subtitles)

A Yomichan frequency list made from the comprehensive Beijing Language and Culture University Corpus Center (BLCU BCC)'s corpus containing over 9 billion characters. Thanks to nadavspi and Michel who converted it for Yomichan.

SUBTLEX-CH Subtitle Corpus

Find [ZH Freq] SUBTLEX-CH in the dictionary collection.

This is a subtitles frequency list based on over 6,000 simplified Chinese movies and TV shows from the SUBTLEX frequency list, which was compiled by Ghent University. Thanks to nadavspi and Michel who converted it for Yomichan.

HSK Levels List

Find [ZH Freq] HSK in the dictionary collection.

This Yomichan HSK Levels frequency list is based on the official HSK word list from the Chinese Ministry of Education released in 2021, which was then OCRed and neatly formatted thanks to Andy Burke. Thanks to Michel who converted it for Yomichan.

Hanzi

See Yomichan CSS for Kanji Dictionaries for CSS used to reduce the clutter included by default in Yomichan.

Note

The default kanji stroke order font included with Yomichan is made for Japanese kanji stroke orders, and thus will contain incorrect glyphs and stroke orders for Chinese that may be misleading. You can change this by using some CSS:

.kanji-glyph {
  font-family: sans-serif; /* or whatever font you prefer for Chinese */
}

Wiktionary Hanzi

Download

Hanzi information for nearly 100,000 characters from ZH Wiktionary. Because the Wiktionary pages are complex, the dictionary displays most of the text on each page but excludes tables and similar elements, so pinyin readings may be missing for many characters. Note also that some rarely used characters have little information available: their wiki pages often consist of just Unicode information and code points, which were stripped from the dictionary.


Cantonese

For an easy download of the dictionaries I use, check out this folder.

For CSS to fix the rendering of non-Japanese characters in Yomitan, see this section.

Cantonese Terms

Words.hk

Words.hk for Yomitan

A conversion of the words.hk dictionary for Yomitan (formerly Yomichan). The words.hk dictionary data is fetched from words.hk, built, then released automatically every day.

CantoDict

Download

CantoDict was a Cantonese-English dictionary created and maintained by Adam Sheik and public contributors. It was abandoned, but the data was archived by awong-dev at https://github.com/awong-dev/cantodict-archive. This dictionary is based on the archived data.


Misc Dictionaries

Download

Thanks to richter_belmont on the Refold Cantonese Discord:

I converted all of the Migaku dictionaries from the "Learn Cantonese!" shared folder on Google Drive into Yomichan dictionaries. List of dictionaries available are:

  • Canto CEDICT
  • CC-Canto
  • CE Wiktionary
  • Words.hk C-C
  • Words.hk C-E

Cantonese Term Frequency

Words.hk Frequency

Words.hk for Yomitan

A conversion of the words.hk frequency information for Yomitan (formerly Yomichan).

Cifu

Download

Spoken and written Cantonese frequency dictionaries for Yomitan from Cifu.

  • Spoken data from HKCanCor (Luke and Wong, 2015), HKCAC (Leung and Law, 2001), CantoMap (Lai and Winterstein, 2019)

  • Written data from 3,841 chapters of amateur novels from the website https://www.shikoto.com/.

Paper with more information about their methodology: Lai, Regine and Winterstein, Grégoire (2020) "Cifu: a Frequency Lexicon of Hong Kong Cantonese", in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille: European Language Resources Association, pp. 3062-3070.

Japanese-German

Wadoku Jiten

和独辞典

Converted by Julian, 和独辞典 is a Japanese-German Yomichan dictionary based on the Wadoku dictionary.

Wadoku Daijiten

和独大辞典

Converted by Julian from the 和独大辞典.

  • Over 130,000 headwords of the modern Japanese language (early Meiji period to the present) with countless compounds and usage examples
  • Romanization of all headwords and of the compounds containing kanji
  • Approx. 70,000 example sentences from newspapers, magazines, advertising, scholarship, and literature, with source citations
  • Marked basic and advanced vocabulary
  • Historical and technical explanations
  • Notes on word origin and verified etymologies
  • Proverbs and idiomatic expressions
  • Specialist and group-specific language (children's and youth language, thieves' cant, dialectisms)
  • Expansions of abbreviations
  • Specialized vocabulary from fields including architecture · astronomy · biology and biochemistry · chemistry · computer technology · electrical engineering · flora and fauna (with scientific nomenclature) · geosciences · linguistics · mathematics · medicine · music · physics · law · sports · technology · economics and finance

Note

Relatively rough conversion, more or less the entries as you'd find them on the website ^^ - might get updated in the future (converting the "tags" to actual tags etc.). It ups the coverage that you'd get from only using JMdict German and can be a great tool to use alongside other dicts, for example as a source of example sentences.

Other

Indonesian-English

Kamata created an Indonesian Yomichan dictionary that shows the English definitions of Indonesian words. The data is from Wiktionary.

Japanese-Mongolian

Japanese-Mongolian/日・モ辞典

Download | No example sentences version

A Japanese to Mongolian dictionary scraped from 栗林均's site. It contains about 19,000 entries.

現代日・モ辞典: 橋本勝、エルデネ・プレブジャブ『現代日本語モンゴル語辞典』春風社、2001.


Korean

See Yomichan For Korean for a fork of Yomichan that supports Korean.

On that repository the following dictionaries are listed:

  • KRDICT (KR-EN / KR-JP / Monolingual)
  • Naver (KR-JP)

Note that there is an alternative, krdict-yomichan, but it is no longer necessary now that Lyroxide's fork of Yomichan supports conjugation.

Vietnamese-English

VNEDICT

VNEDICT by Paul Denisowski converted by Marsh Nguyễn for Yomichan.

stardict-vi

From OVDP (Open Vietnamese Dictionary Project).

The Free Vietnamese Dictionary Project

Converted by Marsh Nguyễn for Yomichan.

I've just converted a Vie-Vie dictionary to a Yomichan one. This dictionary is from 'The Free Vietnamese Dictionary Project' by the author Hồ Ngọc Đức. https://www.informatik.uni-leipzig.de/~duc/Dict/install.html

Chữ Nôm Dictionary

Chữ Nôm Dictionary converted by Marsh Nguyễn. The data comes from https://chunom.org/. The dictionary contains 1,569 entries.

Từ Điển Tiếng Việt Thông Dụng

Vie-Vie dictionary converted by Marsh Nguyễn. The dictionary data is from Từ Điển Tiếng Việt Thông Dụng and was sourced from https://github.com/vntk/dictionary/tree/master/data. It contains 42,012 entries.

English-English

Google Drive

Folder with some monolingual English dictionaries maintained by Umbrella including MacMillan, New Oxford American Dictionary, Cambridge, Longman, Wordset, and a frequency list from FLT.

Other

kaikki-to-yomitan

Custom dictionaries for the following languages (and some others) made from Wiktionary:

  • Albanian
  • Arabic
  • Ancient Greek
  • English
  • French
  • German
  • Greek
  • Indonesian
  • Italian
  • Japanese
  • Latin
  • Persian
  • Polish
  • Portuguese
  • Russian
  • Serbo-Croatian
  • Spanish

yomichan-dictionaries's People

Contributors

epistularum avatar marvnc avatar


yomichan-dictionaries's Issues

Separate Pixiv Into a New Repo + Auto Update

  • Figure out how artifacts work, or just commit the data to github repo since the initial scrape would take a lot longer than 6 hours and maybe you can't upload artifacts manually?
  • Rewrite scrape code (obviously) to be less bad
    • Save what the last scrape time was, only scrape entries more recently updated
  • Redesign dict (scrape images?)
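The "save the last scrape time" idea in the list above could look roughly like the sketch below. It is only an illustration under stated assumptions: the state file name, the `updated_at` field, and the shape of the entry list are all hypothetical, since the issue does not describe the scraped data.

```python
import json
from datetime import datetime
from pathlib import Path


def load_last_scrape(state_file):
    """Return the stored timestamp, or None on the first run."""
    path = Path(state_file)
    if not path.exists():
        return None
    state = json.loads(path.read_text(encoding="utf-8"))
    return datetime.fromisoformat(state["last_scrape"])


def save_last_scrape(state_file, when):
    """Persist the time of the scrape that just finished."""
    payload = {"last_scrape": when.isoformat()}
    Path(state_file).write_text(json.dumps(payload), encoding="utf-8")


def entries_to_scrape(entries, last_scrape):
    """Keep only entries updated after the previous scrape.

    `entries` is assumed to be a list of dicts with an ISO 8601
    `updated_at` string (a hypothetical field name).
    """
    if last_scrape is None:
        return list(entries)  # first run: everything is new
    return [
        entry
        for entry in entries
        if datetime.fromisoformat(entry["updated_at"]) > last_scrape
    ]
```

Committing the small state file next to the data would let a scheduled job pick up where the previous run stopped, which avoids repeating the long initial scrape.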

Add monolingual English dictionaries

Hi Marv, I found that Seth made a monolingual English dictionary from the OALD (Oxford Advanced Learner's Dictionary). It has IPA and comes in two versions, one with only definitions and another that also includes example sentences. The link is https://discord.com/channels/1171709317343346789/1180231433571283015/1211581296107851787
Images from the oald-extra dictionary
image
image

There's also a Google Drive folder with some monolingual English dictionaries made by Umbrella, including MacMillan, New Oxford American Dictionary, Cambridge, Longman, Wordset, and a frequency list from FLT. The link is https://drive.google.com/drive/folders/1APj14ap2yMv0WZvSCEGJq9jfMUpZy6Ao?usp=sharing and the Discord message with more info is https://discord.com/channels/1171709317343346789/1171709317867655200/1193788026980151327

Simplified Hanzi Dictionary Download Links Broken

The links in the Mandarin terms section for the Simplified versions of 汉语大词典 and 两岸词典 point to Discord CDN links that are broken.
These are the links in question:

Simplified Chinese Versions

Michel converted some of the above dictionaries to simplified Chinese.

  • Download 汉语大词典
    • The Hanyu Da Cidian is the most comprehensive Chinese dictionary, comparable to the Oxford
      English Dictionary.
  • Download 两岸词典
    • The Cross-Straits dictionary is a small mainland dictionary focusing on contemporary usage and
      the differences between Taiwan and Mainland Chinese.

Are these available elsewhere?

Not able to import dictionaries other than JMDict

When trying to import any of the dictionaries I get the following error and nothing imports.

Error: Dictionary has invalid data in 'term_bank_1.json' for value 'dictionary[0][5][1].type', validated against 'schema.additionalItems.items[5].oneOf[1].properties': 0 oneOf schemas matched

Tried this with jmdict extra, New Saitou Japanese-English Dictionary, and both pixiv dictionaries and they all error out.

Generate dictionary stats

Given a folder of dictionaries, read them all and generate table/tsvs of

  • Amount of entries
  • Amount of non redundant entries
  • Percentage of non redundant entries

Probably only support freq and term dicts at first.
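A minimal sketch of the stats idea above, limited to term dictionaries. It assumes each dictionary is a zip file containing `term_bank_N.json` files whose rows start with the term and its reading, and it treats an entry as "non redundant" when its (term, reading) pair appears in no other dictionary in the folder. That definition is an assumption, as the issue does not spell one out. Entries are counted as distinct (term, reading) pairs rather than raw rows.

```python
import json
import re
import zipfile
from collections import Counter
from pathlib import Path

TERM_BANK = re.compile(r"term_bank_\d+\.json$")


def read_keys(zip_path):
    """Return the set of (term, reading) pairs found in one dictionary zip."""
    keys = set()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if TERM_BANK.search(name):
                for row in json.loads(archive.read(name)):
                    keys.add((row[0], row[1]))
    return keys


def dictionary_stats(folder):
    """Return (name, entries, non_redundant, percent) for each zip in folder."""
    per_dict = {p.name: read_keys(p) for p in sorted(Path(folder).glob("*.zip"))}
    # Count how many dictionaries contain each (term, reading) pair.
    owners = Counter(key for keys in per_dict.values() for key in keys)
    rows = []
    for name, keys in per_dict.items():
        unique = sum(1 for key in keys if owners[key] == 1)
        percent = round(100.0 * unique / len(keys), 1) if keys else 0.0
        rows.append((name, len(keys), unique, percent))
    return rows


def to_tsv(rows):
    """Render the stats rows as tab-separated text with a header line."""
    header = "dictionary\tentries\tnon_redundant\tpercent"
    return "\n".join([header] + ["\t".join(map(str, row)) for row in rows])
```

Calling `print(to_tsv(dictionary_stats("dictionaries/")))` on a folder of zips (the folder name is a placeholder) would print one row per dictionary. Frequency dictionaries would need a similar reader for their `term_meta_bank_N.json` files.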

Add Seth's yomichan forks and dictionaries

Hi Marv I found three yomichan forks made by Seth (For German, French and Spanish), the dictionaries on each fork can be used on Yomitan developer build too, here's a screenshot from the German dictionary used in Yomitan.

image

Links:
https://github.com/seth-js/yomichan-de (German)
https://github.com/seth-js/yomichan-fr (French)
https://github.com/seth-js/yomichan-es (Spanish)

Finally, I think a link could be added to kaikki-to-yomitan (dictionaries from Wiktionary for a lot of languages that work on the Yomitan developer build). They're the same as the ones on Yezichak, but they have more language options and IPA if one wants.

Link: https://github.com/themoeway/kaikki-to-yomitan/releases/tag/beta

I think it would be great to add them

Better Kanji Vocab List Dictionary

New repo. Just scrape the words from every dict and make a list for each kanji. Maybe add features to the is-hanzi module (or create a new module using it to parse characters).
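The per-kanji grouping could be sketched as follows. This is an illustration only: `is_han` is a rough stand-in written for this example (it is not the is-hanzi module mentioned above) and covers just the CJK Unified Ideographs block and Extension A, and the input is assumed to be a plain list of words already collected from the dictionaries.

```python
from collections import defaultdict


def is_han(char):
    """Rough check: CJK Unified Ideographs and Extension A only."""
    code = ord(char)
    return 0x4E00 <= code <= 0x9FFF or 0x3400 <= code <= 0x4DBF


def words_by_kanji(words):
    """Map each kanji to the sorted list of words that contain it."""
    index = defaultdict(set)
    for word in words:
        for char in set(word):  # a set avoids counting a repeated kanji twice
            if is_han(char):
                index[char].add(word)
    return {kanji: sorted(found) for kanji, found in index.items()}
```

For example, `words_by_kanji(["日本", "日本語", "ねこ"])` lists both words under 日 and 本, only 日本語 under 語, and skips the kana-only word.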

Also maybe time to create that mega combined frequency list.

New dictionary addition request

Hello,

I did not know how to contact you so excuse me for creating an issue.

I just wanted to let you know that I have created this dictionary from the Corpus of Everyday Japanese Conversation recently released by the National Institute for Japanese Language and Linguistics:
https://github.com/n-manas/Corpus-of-Everyday-Japanese-Conversation---Yomichan-Frequency-Dictionary

Since it might be useful to other learners and your repository is quite referenced, I thought it would be a good idea to let you know so you can add it to the list if you want :)

Have a nice day!

CC100 not available?

Hello Marv, Can you check the link to CC100? It is showing content not available for me. I'm trying to download it on my device but it seems it isn't working.

Add Vicon dictionaries

Hi Marv, I found some Migaku dictionaries that were converted into Yomitan format, covering Target Language > English and English > different languages. The dictionary data comes from Lingoes. It includes Arabic, French, German, Greek, Italian, Latin, Russian, Portuguese, Korean, Hebrew, Spanish, and English. I put all of them in one of my Proton Drive (an alternative to Google Drive) accounts. The link is https://drive.proton.me/urls/3RY82EBQ2M#s7QNTK6lZKtZ
