yomitan-dict-stats's Introduction

Yomichan/Yomitan Dictionary Stats

This is a collection of information about the dictionaries available for Yomichan/Yomitan. For each dictionary it lists the description and metadata provided, along with the number of entries, which gives a rough idea of how extensive its coverage is.

You can run the script yourself by pasting the contents of generateStats.js into your browser's console while on the Yomitan options page.

Note

The entry counts naively total the number of term entries in each Yomitan dictionary. As a result, dictionaries such as jpdb and Jitendex can be overcounted: they include separate entries for many variants of a term, while other dictionaries might have only one entry for most terms.
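To make this caveat concrete, here is a minimal sketch contrasting a naive entry count with a count that collapses variants. The `entries` array, its `term` and `reading` fields, and both helper functions are hypothetical and made up for illustration; none of this is taken from generateStats.js or from Yomitan's actual data model.

```javascript
// Hypothetical sample data: a variant-heavy dictionary might store
// several entries for what a learner would consider one word.
const entries = [
  { term: '食べる', reading: 'たべる' },
  { term: '喰べる', reading: 'たべる' }, // variant spelling
  { term: 'たべる', reading: 'たべる' }, // kana-only form
  { term: '飲む', reading: 'のむ' },
];

// Naive count: every term entry counts once.
function naiveCount(list) {
  return list.length;
}

// Crude alternative: count distinct readings so that variants collapse.
// Note that this also merges unrelated homophones, so it undercounts.
function distinctReadingCount(list) {
  return new Set(list.map((entry) => entry.reading)).size;
}

console.log(naiveCount(entries));           // 4
console.log(distinctReadingCount(entries)); // 2
```

Neither number is "correct" on its own; the gap between them only hints at how much of a dictionary's entry count comes from variants.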

Japanese

Japanese Frequency Dictionaries

Title Entry Count Information
Wikipedia 853593 Author: Thermospore
Revision: frequency_v2
URL: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/Japanese2015_10000
Description:
v0.1: Only change is the shortened title + addition of metadata
v1: Reserved version; never completed
v2: Moved to a much larger dataset. List goes up to about 850k now...
BCCWJ-LUW 824594 Author: toasted-nutbread
Revision: 1
URL: https://github.com/toasted-nutbread/yomichan-bccwj-frequency-dictionary
Attribution: Copyright National Institute for Japanese Language and Linguistics

https://pj.ninjal.ac.jp/corpus_center/bccwj/en/freq-list.html
Description:
Long unit word frequencies from the Balanced Corpus of Contemporary Written Japanese (BCCWJ).
JPDB 515231 Author: jpdb, Marv
Revision: JPDB_by-frequency-global_2022-05-10T03:27:02.930Z
URL: https://jpdb.io
Description:
Generated via userscript: https://github.com/MarvNC/jpdb-freq-list
㋕ is used to indicate a frequency for a hiragana reading.
❌ is used to indicate that a term does not appear in the JPDB corpus.
Innocent Ranked 280000 Revision: frequency1
Novels 269987 Revision: yyyy
jpDicts (206k) 206621 Revision: frequency_jpDicts
Description:
Frequency list made from Japanese dictionaries (ハイブリッド新辞林 v2, 故事ことわざの辞典, 漢字源, 精選版 日本国語大辞典, 新明解四字熟語辞典, 学研 四字熟語辞典, 実用日本語表現辞典, 旺文社国語辞典 第十一版 画像無し, 大辞林 第三版, デジタル大辞泉, 岩波国語辞典 第六版, 広辞苑 第六版)
Youtube 187053 Revision: frequency3
青空文庫熟語 169623 Author: vtrm, Marv
Revision: aozoraBunko_2022-08-26T05:50:26.124Z
URL: https://www.aozora.gr.jp
Attribution: 青空文庫
Description:
Rank-based jukugo frequencies made from Aozora Bunko
Data from https://vtrm.net/japanese/kanji-jukugo-frequency/en
Created with https://github.com/MarvNC/yomichan-dictionaries
Caveats:
Jukugo which are absent from the dictionary entries are not reported in the data since the software has no way of knowing whether it encountered a legitimate jukugo or merely a juxtaposition of several words (e.g. when two or more nouns are combined together to form a new noun, or when a jukugo is used as an adverb).

Sometimes a compound word can be either a Sino-Japanese jukugo read in on’yomi, or a native Japanese word read in kun’yomi and sometimes accompanied by okurigana. For example, 蹌踉 can either be a taru-adjective or to-adverb of Chinese origin read そうろう, or the root of a Japanese verb whose dictionary form is 蹌踉めく and which is read よろめく. Keep in mind that the program I wrote doesn’t parse kana and doesn’t try to disambiguate kanji readings. Consequently, occurrences of 蹌踉 read そうろう and 蹌踉 read よろ aren’t distinguished and are grouped together in the statistics. So if you look at the data for kanji 蹌, the line corresponding to 蹌踉 refers to all occurrences of 蹌踉 in the corpus, whatever their respective readings are.

Due to the parsing method used and to the imperfect nature of Chinese character word segmentation algorithms, there is a small (negligible but non-zero) number of false positives and missed words.
CC100 160836 Author: xydustc
Revision: 1
Description:
Sudachipy Mode B & fugashi parsed CC100 dataset, filtered by dictionaries
国語辞典 156452 Revision: kokugojiten_freq
Netflix 129141 Revision: netflix.frequency
ウェブ NWJC 106754 Revision: NWJC_ver202202_2024-03-03T20:03:52.800154+00:00
URL: https://masayu-a.github.io/NWJC/
Description:
Converted programmatically from the dataset. See repo at https://github.com/Maltesaa/CSJ_and_NWJC_yomitan_freq_dict. Fork of https://github.com/forsakeninfinity/CEJC_yomichan_freq_dict
Anime & J-drama 99999 Revision: anime.frequency
VNs Freq 85374 Revision: frequency2019
Narou Freq 49269 Revision: frequency1
話し言葉 CSJ 42542 Revision: CSJ_ver201803_2024-03-03T20:04:01.044705+00:00
URL: https://clrd.ninjal.ac.jp/csj/en/index.html
Description:
Converted programmatically from the dataset. See repo at https://github.com/Maltesaa/CSJ_and_NWJC_yomitan_freq_dict. Fork of https://github.com/forsakeninfinity/CEJC_yomichan_freq_dict
VN Freq 35058 Revision: frequency1
Conversation Corpus 29528 Revision: CEJC_ver202209_2023-06-22T11:04:44.335323+00:00
URL: https://www2.ninjal.ac.jp/conversation/cejc/cejc-wc.html
Description:
Converted programmatically from the dataset. See repo at https://github.com/forsakeninfinity/CEJC_yomichan_freq_dict

Japanese Bilingual Dictionaries

Title Entry Count Information
Jitendex [2023-12-12] 290488 Author: stephenmk
Revision: 3.1
URL: jitendex.org
Attribution: © CC BY-SA 4.0 Stephen Kraus 2023

You are free to use, modify, and redistribute Jitendex files under the terms of the Creative Commons Attribution-ShareAlike License (V4.0)

Jitendex includes material from several copyrighted sources in compliance with the terms and conditions of those projects.

• JMdict (EDICT, etc.) dictionary data is provided by the Electronic Dictionaries Research Group. Visit edrdg.org for more information.
• Example sentences (Japanese and English) are provided by Tatoeba (https://tatoeba.org/). This data is licensed CC BY 2.0 FR.
• Positional information for the furigana displayed in headwords is provided by the JmdictFurigana project. This data is distributed under a Creative Commons Attribution-ShareAlike License.
新和英 152202 Revision: Shinwaei1
NEW斎藤和英大辞典 47504 Revision: saitoje.2023.03.26.1
Description:
Tip: use custom CSS to control how many example sentences are displayed

[data-dictionary="NEW斎藤和英大辞典"] ul.gloss-sc-ul>li:nth-child(n+5) {
  display: none;
}
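
The tip above hides every example sentence from the fifth onward, because `:nth-child(n+5)` matches the fifth list item and all later ones. As a rough sketch, the same rule can be generated for any limit. The helper name `buildExampleLimitCss` and its parameters are hypothetical; only the selector itself comes from the dictionary's description.

```javascript
// Hypothetical helper: build a CSS rule that hides every example sentence
// after the first `maxVisible` ones for the given dictionary title.
// Assumption: the title contains no double quotes, since it is not escaped.
function buildExampleLimitCss(dictionaryTitle, maxVisible) {
  if (!Number.isInteger(maxVisible) || maxVisible < 0) {
    throw new RangeError('maxVisible must be a non-negative integer');
  }
  // :nth-child(n+K) matches the K-th item and everything after it,
  // so K = maxVisible + 1 keeps exactly `maxVisible` items visible.
  const firstHidden = maxVisible + 1;
  return [
    `[data-dictionary="${dictionaryTitle}"] ul.gloss-sc-ul>li:nth-child(n+${firstHidden}) {`,
    '  display: none;',
    '}',
  ].join('\n');
}

// With a limit of 4 this reproduces the rule from the tip above.
console.log(buildExampleLimitCss('NEW斎藤和英大辞典', 4));
```

The printed rule can then be pasted into Yomitan's custom CSS setting, as the tip suggests.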

Japanese Monolingual Dictionaries

Title Entry Count Information
JA Wikipedia [2022-12-01] 1279999 Author: Wikipedians, DBPedia, Marv
Revision: wikipedia_2023-12-17T22:34:35.417Z
URL: https://ja.wikipedia.org/
Attribution: Wikipedia
Description:
Wikipedia short abstracts from the DBPedia dataset available at https://databus.dbpedia.org/dbpedia/text/short-abstracts.

Recommended custom CSS:
div.gloss-sc-div[data-sc-jawiki=red] {
  color: #e5007f;
}

Created with https://github.com/MarvNC/yomichan-dictionaries
JMnedict 666086 Author: yomitan-import
Revision: JMnedict.2023-12-17
URL: https://github.com/themoeway/yomitan-import
Attribution: This publication has included material from the JMdict (EDICT, etc.) dictionary files in accordance with the licence provisions of the Electronic Dictionaries Research Group. See http://www.edrdg.org/
Pixiv [2024-02-17] 591048 Author: Pixiv contributors, Marv
Revision: 1.0.0
URL: https://github.com/MarvNC/pixiv-yomitan
Attribution: https://dic.pixiv.net
Description:
Article summaries from the Pixiv encyclopedia (ピクシブ百科事典), 546036 articles included.
Pixiv dumps used to build this found at https://github.com/MarvNC/pixiv-dump.
Built with https://github.com/MarvNC/yomichan-dict-builder.
デジタル大辞泉 527290 Author: ッツ
Revision: daijisen_20210506;2021-07-27
URL: https://dictionary.goo.ne.jp
Attribution: 監修:松村明
編集委員:池上秋彦、金田弘、杉崎一雄、鈴木丹士郎、中嶋尚、林巨樹、飛田良文
編集協力:田中牧郎、曽根脩

© Shogakukan Inc.
https://daijisen.jp
Description:
30万4千項目以上(2021年04月現在)を収録、言葉の集大成といえる大型国語辞典。年3回。定期更新を行い、最新の項目と日々修正される最新のデータを提供しています。
大辞林 第四版 334751 Revision: daijirin2;2023-07-10
Attribution: © Sanseido Co., LTD. 2019
精選版 日本国語大辞典 286365 Revision: seisenban3
広辞苑 第七版 257602 Author: shoui520 & Thermospore
Revision: 3
URL: https://github.com/Thermospore/koj72yomi
Attribution: http://kojien.iwanami.co.jp/
Description:
If you find any bugs or things to be improved, don't hesitate to post an issue on the github repo!

--- rev.BETA1 ---
・Initial testing release

--- rev.BETA2 ---
・Minor gaiji and PoS adjustments
・Added PoS for phrases (ex ○席末を汚す)
・Extracted readings for ALPH entries (ex 5W1H or DoS)
・Cleaned out some empty quotes in the structured content
・Stripped out loan source words (ex "class" being listed as kanji for クラス)

--- rev.1 ---
・Initial public release
・Minor adjustment to loan source stripping
・Fixed some errors in the gaiji map

--- rev.2 ---
・Add support for 圏点 (ex in くつ‐かぶり 【沓冠】)
・Some disambiguation info is now extracted (ex you can now see(音節)and(感動詞)in the entries for "あ")
・Use structured content for <br> instead of \n
・Add margins to make hierarchical entries more readable, and adjust icons to suit (ex つう・ずる 【通ずる】)
・The alph icon is now supported (ex W-VHS)
・Removed the "oyko_link" section, as it was redundant (details/discussion: FooSoft/yomichan#1910 )

--- rev.3 ---
・All photos, diagrams, and mathematical figures are now supported! (see せいき‐きょくせん 【正規曲線】for an example of the math figs)
・Add tooltips to see the path + filename of gaiji/images
・Expand and improve gaiji mapping (thanks epistularum)
・Use relative font-size instead of absolute for sub/superscript (thanks stephenmk)

--- Upcoming (maybe...) ---
・Detect kanji forms mentioned in the body of the entry (ex「波乱」とも書く in は‐らん 【波瀾】)
・Functional xrefs? ( FooSoft/yomichan#2089 )
・Further expand gaiji map
・etc absurdly obscure/minor formatting improvements
ハイブリッド新辞林 140879 Revision: shinjirin
新選国語辞典 第十版 127191 Revision: shinsenkoku10;2023-08-07
URL: https://www.shogakukan.co.jp/books/09501409
Attribution: © Shogakukan Inc. 2022
Description:
Add custom CSS for enhanced formatting

li[data-dictionary^='新選国語辞典'] th,
span[data-sc-shinsenkoku10='warichu'] {
  white-space: nowrap;
}
span[data-sc-shinsenkoku10='red'] {
  color: #e5007f;
}
新明解国語辞典 第八版 100593 Revision: smk8;2023-07-09
Attribution: © Sanseido Co., LTD. 2020
三省堂国語辞典 第八版 93091 Revision: sankoku8;2023-07-19
Attribution: © Sanseido Co., LTD. 2021
旺文社国語辞典 第十一版 92217 Author: irhello, shoui
Revision: OUKOKU11_1.6
URL: https://learnjapanese.moe
Attribution: Ⓒ 2013 旺文社 株式会社
https://www.obunsha.co.jp/pr/oukoku/
Description:
<================================
的確さ、わかりやすさを追求!
王道の国語辞典をYomichanで検索!
<================================
現代国語例解辞典 第五版 86213 Author: DAnon
Revision: genkokr5;2024-02-16
URL: https://www.shogakukan.co.jp/books/09501035
Attribution: © Shogakukan Inc. 2016
Description:
〈 書籍の内容 〉
さらに親しみやすく、さらにわかりやすく。
『現代国語例解辞典』は、1985年の初版刊行以来、「親しみやすく、わかりやすい」と高校生から社会人まで幅広い層から大きな支持を得てきました。とくに教育現場では「国語教諭からもっとも選ばれる辞典」として定評があり、生涯学習辞典としても愛され続けています。
第五版は、国語辞典としては初めて、国立国語研究所の日本語コーパスを全面的に活用して改訂。コーパスを活用することで、『現代』の日本語(『国語』)がどのように使われているかという実状を豊富な『例』を挙げながら『解』することが可能になった結果、さらに親しみやすく、さらにわかりやすく、そして、さらに使いやすく生まれ変わりました。
【改訂のポイント】
■コーパスを活用した255のコラム(345語)を収録
■見出し語や表記欄にもコーパスを活用
■現代風の語釈と用例で、より身近に、より親しみやすく
■「結びつきの強い語(コロケーション)」欄で日本語に幅を
■「類語対比表」で日本語を豊かなものに
■日常生活に必須の7万1,000語を収録
■本文デザインを一新、現代風にアレンジ
■コーパスについて理解を深めるための、専門家による「コーパス」解説



〈 編集者からのおすすめ情報 〉
国立国語研究所の日本語コーパスを活用した厳選コラムは、読むだけで楽しく、読むだけで幅広い知識が身につきます。【テーマ例】「空揚げ」と書くか「唐揚げ」と書くか?/「○○を組む」の○○に入る言葉で最も多いのは?/「被害を被る」「違和感を感じる」は間違いか?など255のコラムを収録しています。
岩波国語辞典 第八版 76625 Revision: iwakoku8.2023.04.08.0
KO字源 73401 Revision: KO字源
明鏡国語辞典 第二版 73068 Revision: meikyo2.2023.07.22.0
Description:
▼ non-jōyō kanji
▽ jōyō kanji used with a non-standard reading
全訳漢辞海 60839 Revision: 全訳漢辞海
例解学習国語辞典 第十一版 56840 Revision: RGKo11 2024/02/10
Attribution: © 小学館
実用日本語表現辞典 55379 Revision: jitsuyou;2023-08-15
URL: http://www.practical-japanese.com/
Description:
Added conjugation
漢検漢字辞典 第二版 50974 Author: dictionary anonymous
Revision: kankenkj2;2024-01-15
Attribution: © The Japan Kanji Aptitude Testing Foundation
漢字源 48085 Revision: kanjigen1
weblio古語辞典 47958 Revision: Meikyou1
国語辞典オンライン 44970 Revision: jitenon-kokugo;2023-05-13
URL: https://kokugo.jitenon.jp/
Attribution: © 2014-2023 国語辞典オンライン
漢検 漢字辞典 40549 Revision: ;_;
新語時事用語辞典 18294 Revision: shingojijiyougojiten;2023-08-14
URL: http://www.breaking-news-words.com/
Description:
新聞やテレビで話題に上った、新語および時事的なキーワードを解説しています。 Added conjugation
使い方の分かる 類語例解辞典 17350 Author: 小学館辞典編集部
Revision: tsukaikatanowakaru-2023-08-09
URL: https://dictionary.goo.ne.jp/thsrs/
Attribution: 使い方の分かる 類語例解辞典
Description:
Scraped from dictionary.goo.ne.jp 2023-08-09
故事ことわざの辞典 15577 Author: Thermospore
Revision: kotowaza1
Description:
Generated using the current version of Yomichan Import as of March 7th, 2021, then hex edited to remove excessive whitespace
対義語辞典オンライン 13965 Revision: taigigo_240207
Attribution: taigigo.jitenon.jp
類語辞典オンライン 12151 Revision: ruigo_240209
Attribution: ruigo.jitenon.jp
故事・ことわざ・慣用句オンライン 8513 Revision: jitenon-kotowaza;2023-05-15
URL: https://kotowaza.jitenon.jp/
Attribution: © 2014-2023 故事・ことわざ・慣用句辞典オンライン
漢字でGo! [2024-03-04] 7866 Author: Marv
Revision: 2024-03-04
URL: https://github.com/MarvNC/kanjidego-yomitan-anki
Attribution: https://formidi.github.io/KanzideGoFAQ/
https://w.atwiki.jp/kanjidego/
Description:
From the Kanji de Go! unofficial wiki.
Built with https://github.com/MarvNC/yomichan-dict-builder
四字熟語辞典オンライン 7782 Revision: jitenon-yoji;2023-05-14
URL: https://yoji.jitenon.jp/
Attribution: © 2012-2023 四字熟語辞典オンライン
学研 四字熟語辞典 5484 Author: ッツ
Revision: gakken_yojijukugo;2021-07-12
URL: https://dictionary.goo.ne.jp
Attribution: 編集:学研

© Gakken
https://hon.gakken.jp
Description:
難解な四字熟語も理解できるように、また、手軽に調べられるように、四字熟語を広く捉え約2700項目を収録。文学作品の用例が豊富で、注記や類義語、対義語も充実。検定試験やクロスワードにも使えます。
日本語俗語辞書 4354 Author: Kartoffel
Revision: 1
Attribution: http://zokugo-dict.com/
Description:
I'll only say anything in the presence of my advocate
新明解四字熟語辞典 4194 Author: ッツ
Revision: shinmeikai_yojijukugo;2021-07-12
URL: https://dictionary.goo.ne.jp
Attribution: 編集:三省堂

© SANSEIDO Co.
https://dictionary.sanseido-publ.co.jp
Description:
業界最大語数を誇る「新明解四字熟語辞典」より厳選。座右の銘や新年の抱負に使える四字熟語約2000項目を収録しています。就活のエントリーシートやスピーチなど、日常生活のさまざまな場面で役立ちます。
YOJI-JUKUGO 4017 Revision: YOJI-JUKUGO
全国方言辞典 3738 Author: goo
Revision: zenkokuhougenjiten-2023-08-12
URL: https://dictionary.goo.ne.jp/dialect/
Attribution: 全国方言辞典
Description:
Scraped from dictionary.goo.ne.jp 2023-08-12
語源由来辞典 2795 Revision: Gogen
福日木健二字熟語 2306 Revision: 福日木健二字熟語
surasura 擬声語 1422 Author: surasura, Marv
Revision: surasura_2023-03-22T01:10:57.302Z
URL: http://sura-sura.com/
Attribution: surasura
Description:
Onomatopoeia info from http://sura-sura.com/
Parsed/converted by https://github.com/MarvNC/yomichan-dictionaries
数え方辞典オンライン 1312 Revision: count_240213
Attribution: count.jitenon.jp
漢字ペディア同訓異義 966 Revision: kanjipedia-doukunigi;2023-08-28
URL: https://www.kanjipedia.jp/sakuin/doukunigi/
Attribution: © 公益財団法人 日本漢字能力検定協会
Description:
Scraped from kanjipedia 2023-08-28
複合語起源 222 Author: 名無し, 名無し, seanblue, Marv
Revision: 複合語起源_2022-08-26T22:38:51.046Z
URL: https://github.com/MarvNC/yomichan-dictionaries
Description:
Sources:
https://jbbs.shitaraba.net/bbs/read.cgi/study/10958/1299762655
https://academy6.5ch.net/test/read.cgi/gengo/1228873581
https://community.wanikani.com/t/special-kanji-words-derived-from-other-words/35655

Created with https://github.com/MarvNC/yomichan-dictionaries

Japanese Grammar Dictionaries

Title Entry Count Information
毎日のんびり日本語教師 1479 Author: nihongobongo
Revision: nihongo_no_sensei_v_1.03 ;2022-04-30;embedded urls, p of speech indicators(N5-N0)
URL: https://nihongonosensei.net/?page_id=10246
Attribution: nihongo_no_sensei
Description:
term bank 1 contains N1, bank 2 N2, etc...**の大学で日本語教師をしています。日本語教育能力検定試験の解説、対策講座、**での生活や授業、日本語の文法の説明をしています。
絵でわかる日本語 1248 Author: nihongobongo
Revision: edewakaru_v1.03; 2022-09-01
URL: https://github.com/aiko-tanaka/Grammar-Dictionaries/
Attribution: https://www.edewakaru.com/archives/cat_179055.html
Description:
日本語文法・自動詞他動詞・口語形・間違えやすい日本語・擬音語・擬態語などを「絵」で説明します。日本語を勉強している人のためのブログです.
どんなときどう使う 日本語表現文型辞典 1082 Author: nihongobongo
Revision: donna_v1.04;2022-04-30(completed arrow internal links)
URL: https://itazuraneko.neocities.org/grammar/donnatoki.html
Attribution: itazuraneko
Description:
A well-regarded grammar reference covering grammar points from N5 to N1 of the 日本語能力試験.
JLPT文法解説まとめ 628 Author: nihongobongo
Revision: nihongo_kyoshi_v1.03; 2022-05-27; p.o.s. info
URL: https://nihongokyoshi-net.com/jlpt-grammars/
Attribution: 日本語NET
Description:
このページでは、JLPTに登場する文型を紹介しています。JLPTのレベル毎に50音順で並べています。

How to use: https://github.com/aiko-tanaka/Grammar-Dictionaries/
日本語文法辞典(全集) 535 Author: nihongobongo
Revision: DOJG_v1.01;2022-04-30;better formatting
Attribution: DOJG
Description:
DOJG-allVols

Japanese Kanji Dictionaries

Title Entry Count Information
漢字林 46210 Revision: 漢字林
漢字辞典オンライン 27693 Revision: jitenon-kanji;2023-08-17
URL: https://kanji.jitenon.jp/
Attribution: © 2014-2023 漢字辞典オンライン
Wiktionary漢字 18122 Author: Wiktionary, Wikimedia Foundation, Marv
Revision: Wiktionary漢字 2022-09-11T06:04:07.166Z
URL: https://ja.wiktionary.org/wiki/%E3%82%AB%E3%83%86%E3%82%B4%E3%83%AA:%E6%BC%A2%E5%AD%97
Attribution: JA Wiktionary
Description:
Kanji data from ja.wiktionary.org.
Parsed/converted by https://github.com/MarvNC/yomichan-dictionaries
KANJIDIC 10383 Author: yomitan-import
Revision: kanjidic2
URL: https://github.com/themoeway/yomitan-import
Attribution: This publication has included material from the JMdict (EDICT, etc.) dictionary files in accordance with the licence provisions of the Electronic Dictionaries Research Group. See http://www.edrdg.org/
TheKanjiMap Kanji Radicals/Composition 6911 Author: thekanjimap, Marv
Revision: thekanjimap_2023-02-04T03:44:35.926Z
URL: https://thekanjimap.com
Attribution: thekanjimap
Description:
Radical entries and kanji decomposition/compositions from thekanjimap.com.
Created with https://github.com/MarvNC/yomichan-dictionaries
JPDB Kanji 6494 Author: jpdb, Marv
Revision: jpdb_kanji_2022-08-26T22:38:14.736Z
URL: https://jpdb.io
Attribution: jpdb
Description:
Kanji data from JPDB
Created with https://github.com/MarvNC/yomichan-dictionaries
漢字ペディア 5635 Revision: 漢字ペディア
mozc Kanji Variants 1317 Author: Google, Marv
Revision: mozc_2022-08-26T22:38:27.927Z
URL: https://github.com/google/mozc
Attribution: Google
Description:
Data about kanji variants from Google's Japanese IME, mozc.
Created with https://github.com/MarvNC/yomichan-dictionaries
jitai 1174 Author: epistularum, Marv
Revision: jitai_2022-08-26T22:37:55.378Z
URL: https://github.com/epistularum/jitai
Description:
Data about 新字体/旧字体 and 標準字体/許容字体 in comparison to each other.
Created with https://github.com/MarvNC/yomichan-dictionaries

Japanese Kanji Frequency Dictionaries

Title Entry Count Information
Wikipedia Kanji 20932 Author: scriptin, Marv
Revision: kanjiFrequency1
URL: https://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
Attribution: JA Wikipedia
Description:
Rank-based kanji frequency data from a May 2015 dump of Japanese Wikipedia.
Data from https://github.com/scriptin/kanji-frequency
Modified by https://github.com/MarvNC/yomichan-dictionaries
青空文庫漢字 8488 Author: vtrm, Marv
Revision: aozoraBunko_2022-08-26T05:49:00.968Z
URL: https://www.aozora.gr.jp/
Attribution: 青空文庫
Description:
Rank-based kanji frequency data from the Aozora Bunko
Data from https://vtrm.net/japanese/kanji-jukugo-frequency/en
Created with https://github.com/MarvNC/yomichan-dictionaries
JPDB Kanji Freq 6494 Author: jpdb, Marv
Revision: jpdb_kanji_2022-08-26T22:38:10.913Z
URL: https://jpdb.io
Attribution: jpdb
Description:
Rank-based kanji frequency data from JPDB
Created with https://github.com/MarvNC/yomichan-dictionaries
Innocent Corpus Kanji 6430 Author: cb4960, Marv
Revision: kanjiFrequency1
URL: https://web.archive.org/web/20190309073023/https://forum.koohii.com/thread-9459.html#pid168613
Attribution: Innocent Corpus Novels
Description:
Rank-based kanji frequency data from the innocent corpus
Modified by https://github.com/MarvNC/yomichan-dictionaries

Japanese Pitch Accent Dictionaries

Title Entry Count Information
大辞林第四版 152193 Author: コツ
Revision: pitch_1.0.1.1
URL: https://kotu.io
Description:
For accurate pronunciation (vowel elongation etc..) and comparisons across multiple modern and historical dictionaries visit the linked URL.
大辞泉 88089 Author: コツ
Revision: pitch_1.0.0.1
URL: https://kotu.io
Description:
For accurate pronunciation (vowel elongation etc..) and comparisons across multiple modern and historical dictionaries visit the linked URL.
三省堂国語辞典第八番 77630 Author: コツ
Revision: pitch_1.0.1.1
URL: https://kotu.io
Description:
For accurate pronunciation (vowel elongation etc..) and comparisons across multiple modern and historical dictionaries visit the linked URL.
新明解第八版 75978 Author: コツ
Revision: pitch_1.0.2.1
URL: https://kotu.io
Description:
For accurate pronunciation (vowel elongation etc..) and comparisons across multiple modern and historical dictionaries visit the linked URL.
NHK 73100 Author: コツ
Revision: pitch_1.0.1.1
URL: https://kotu.io
Description:
For accurate pronunciation (vowel elongation etc..) and comparisons across multiple modern and historical dictionaries visit the linked URL.

Cantonese Dictionaries

Cantonese Term Frequency Dictionaries

Title Entry Count Information
Cifu Spoken 51798 Author: Regine Lai, Grégoire Winterstein, Marv
Revision: 2023-12-21
URL: https://github.com/MarvNC/yomichan-dictionaries
Attribution: Lai, Regine and Winterstein, Grégoire (2020) "Cifu: a Frequency Lexicon of Hong Kong Cantonese", in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille: European Language Resources Association, p. 3062--3070.
Description:
Cantonese Frequency List from Cifu:
https://github.com/gwinterstein/Cifu

Spoken data from HKCanCor (Luke and Wong, 2015), HKCAC (Leung and Law, 2001), CantoMap (Lai and Winterstein, 2019)
Converted by Marv
Cifu Written 51798 Author: Regine Lai, Grégoire Winterstein, Marv
Revision: 2023-12-21
URL: https://github.com/MarvNC/yomichan-dictionaries
Attribution: Lai, Regine and Winterstein, Grégoire (2020) "Cifu: a Frequency Lexicon of Hong Kong Cantonese", in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille: European Language Resources Association, p. 3062--3070.
Description:
Cantonese Frequency List from Cifu:
https://github.com/gwinterstein/Cifu

Written data from 3,841 chapters of amateur novels from the website https://www.shikoto.com/
Converted by Marv
Words.hk Frequency 41174 Author: Marv
Revision: 1.0
URL: https://github.com/MarvNC/wordshk-yomitan
Attribution: Words.hk & contributors (https://words.hk)
See license at https://words.hk/base/hoifong/
Description:
Converted from the free Words.hk dictionary found at https://words.hk/.
Converted using https://github.com/MarvNC/yomichan-dict-builder

Cantonese Term Dictionaries

Title Entry Count Information
CE Wiktionary 142685 Revision: 20240412
Canto CEDICT 105347 Revision: 20240412
CantoDict 86570 Author: CantoDict contributors, Marv
Revision: cantodict_2023-04-25T22:55:55.715Z
URL: http://www.cantonese.sheik.co.uk/
Attribution: CantoDict contributors
Description:
CantoDict was a Cantonese-English dictionary created and maintained by public contributors. It was abandoned, but the data was archived thanks to awong-dev at https://github.com/awong-dev/cantodict-archive.
Created with https://github.com/MarvNC/yomichan-dictionaries
Words.hk 粵典 [2024-02-03] 60129 Author: Marv
Revision: 1.0.0
URL: https://github.com/MarvNC/wordshk-yomitan
Attribution: Words.hk & contributors (https://words.hk)
See license at https://words.hk/base/hoifong/
Description:
Converted from the free Words.hk dictionary found at https://words.hk/.
This export contains 52457 entries.
Converted using https://github.com/MarvNC/yomichan-dict-builder
Words.hk C-E FS 50064 Revision: 20240412
Words.hk C-C FS 50061 Revision: 20240412
CC-Canto 34335 Revision: 20240412

Cantonese Honzi Dictionaries

Title Entry Count Information
Words.hk 粵典 漢字 [2024-02-10] 6638 Author: Marv
Revision: 1.0.0
URL: https://github.com/MarvNC/wordshk-yomitan
Attribution: Words.hk & contributors (https://words.hk)
See license at https://words.hk/base/hoifong/
Description:
Converted from the free Words.hk dictionary found at https://words.hk/.
Converted using https://github.com/MarvNC/yomichan-dict-builder

Mandarin Chinese

Mandarin Chinese Frequency Dictionaries

Title Entry Count Information
BLCUmixed 98089 Author: Beijing Language and Culture University compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan
Revision: zhfreq_mixed_2023-06-20
URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/
Attribution: Beijing Language and Culture University
Description:
This frequency list is taken from BearXiong's post on chinese-forums.com. The data itself comes from the 15-billion-character corpus compiled by the Beijing Language and Culture University.
BLCUsci 92779 Author: Beijing Language and Culture University compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan
Revision: zhfreq_sci_2023-06-20
URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/
Attribution: Beijing Language and Culture University
Description:
This frequency list is taken from BearXiong's post on chinese-forums.com. The data itself comes from the 15-billion-character corpus compiled by the Beijing Language and Culture University.
BLCUcoll 91797 Author: Beijing Language and Culture University compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan
Revision: zhfreq_coll_2023-06-20
URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/
Attribution: Beijing Language and Culture University
Description:
This frequency list is taken from BearXiong's post on chinese-forums.com. The data itself comes from the 15-billion-character corpus compiled by the Beijing Language and Culture University.
BLCUnews 91690 Author: Beijing Language and Culture University compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan
Revision: zhfreq_news_2023-06-20
URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/
Attribution: Beijing Language and Culture University
Description:
This frequency list is taken from BearXiong's post on chinese-forums.com. The data itself comes from the 15-billion-character corpus compiled by the Beijing Language and Culture University.
BLCUlit 90580 Author: Beijing Language and Culture University compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan
Revision: zhfreq_lit_2023-06-20
URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/
Attribution: Beijing Language and Culture University
Description:
This frequency list is taken from BearXiong's post on chinese-forums.com. The data itself comes from the 15-billion-character corpus compiled by the Beijing Language and Culture University.
SUBTLEX-CH 62096 Author: University of Ghent compiled the data, BearXiong converted it for Pleco, nadavspi and Michel converted it for yomichan
Revision: zhfreq_subs_2023-06-20
URL: https://www.chinese-forums.com/forums/topic/56816-sharing-a-pleco-word-frequency-user-dictionary/
Attribution: University of Ghent
Description:
This frequency list is taken from BearXiong's post on chinese-forums.com. The subtitles category is based on SUBTLEX, a frequency list compiled by the University of Ghent, Belgium.
Sinica 11549 Author: Sinica, Michel
Revision: Sinica_2024-04-16
URL: https://elearning.ling.sinica.edu.tw/eng_jindai.html
Attribution: Institute of Linguistics, Academia Sinica
Description:
This frequency list is taken from the sinica.edu.tw website and gives the frequencies of traditional characters in a Taiwanese corpus. Note that the website is from 2005; its encoding was obsolete and some characters were not rendered. It was also not possible to retrieve all entries for the rarest words, as only 300 of those occurring only once could be shown on the webpage. As a result, the full list has little more than 10 000 unique entries.
HSK 11107 Author: Chinese Ministry of Education, Andy Burke, Michel
Revision: HSK_2023-06-20
URL: https://github.com/andycburke/HSK-3.0-Word-List
Attribution: Chinese Ministry of Education
Description:
This frequency list is based on the new HSK word list (HSK 3.0, 2021). It is taken from Andy Burke's GitHub repository, which is itself taken from the Chinese Ministry of Education's original PDF. Levels 7 to 9 are not delineated.

Mandarin Chinese Bilingual Term Dictionaries

Title Entry Count Information
Wenlin ABC 207289 Author: Wenlin Institute, rduwjjnh
Revision: Wenlin_2024-04-14
URL: https://wenlin.co/wow/Project:Ci
Attribution: Wenlin Institute
Description:
Published in July 2003 and revised through 2005, the Wenlin ABC Chinese-English Comprehensive Dictionary was produced by the Wenlin Institute in cooperation with the ABC Chinese Dictionary Series Project at the University of Hawaii. It contains over 196,000 entries.
CC-CEDICT [2023-12-20] 198702 Author: MDBG, CC-CEDICT, Marv
Revision: 2023-12-20
URL: https://github.com/MarvNC/cc-cedict-yomitan
Attribution: https://cc-cedict.org/wiki/
Thanks go out to everyone who submitted new words or corrections. Special thanks go out to the CC-CEDICT editor team, who spend many hours doing research to maintain a high quality level:

goldyn_chyld - Matic Kavcic
richwarm - Richard Warmington
vermillion - Julien Baley
ycandau - Yves Candau
feilipu
and the editors who wish to remain anonymous
Special thanks to:

Craig Brelsford, for his extensive list of bird names
Erik Peterson, for his work as the editor of CEDICT
Paul Andrew Denisowski, the original creator of CEDICT
Description:
CC-CEDICT is a continuation of the CEDICT project started by Paul Denisowski in 1997 with the aim to provide a complete downloadable Chinese to English dictionary with pronunciation in pinyin for the Chinese characters.
This dictionary for Yomitan was converted from the data available at https://www.mdbg.net/chinese/dictionary?page=cc-cedict using https://github.com/MarvNC/cc-cedict-yomitan and https://github.com/MarvNC/yomichan-dict-builder.
CEDICT 193931 Revision: cc_cedict_14_3_2021
中日大辞典 第二版 146381 Revision: chuunichi1
Oxford 74663 Revision: Oxford_2024-04-17
Attribution: Oxford University Press
Description:
This is one of the Oxford English-Chinese dictionaries.
白水社**語辞典 63858 Revision: 白水社**語辞典_1
DrEye 30973 Revision: DrEye_2024-04-13
Attribution: Dr.Eye
Description:
Original name: 譯典通英漢雙向字典. This seems to be one of the Dr.Eye Chinese-English dictionaries, a series of commercial dictionaries available in Taiwan. It comes with example sentences and their translations.
500idioms 869 Author: The original authors created the content; Ooodman scraped the data on his github; Michel converted it to yomitan format
Revision: 500idioms_2024-04-13
URL: https://doi.org/10.4324/9780203839140
Attribution: Jiao, L., Kubler, C. C., & Zhang, W.
Description:
A glossary of 500 chengyu with two example sentences for each idiom

Mandarin Chinese Monolingual Term Dictionaries

Title Entry Count Information
ZH Wikipedia [2022-12-01] 1249877 Author: Wikipedians, DBPedia, Marv
Revision: wikipedia_2023-12-20T01:18:42.692Z
URL: https://github.com/MarvNC/wikipedia-yomitan
Attribution: https://zh.wikipedia.org/
Description:
Wikipedia short abstracts from the DBPedia dataset available at https://databus.dbpedia.org/dbpedia/text/short-abstracts.

Recommended custom CSS:
div.gloss-sc-div[data-sc-wikipedia=term-specifier] {
  color: #e5007f;
}
漢語大詞典 550544 Revision: 漢語大詞典_1
MoEdict 266956 Revision: MoEdict-2024_04-19
Attribution: Tang Feng
Description:
萌典 (mengdian) is a digital Chinese dictionary developed by Taiwanese free software programmer Tang Feng. It is one of the projects of Taiwan's open source community g0v Zero Hour Government. As a digital Chinese dictionary, Mengdian not only contains 160,000 entries in Mandarin, but also contains 20,000 entries in Taiwanese Hokkien, 14,000 entries in Taiwanese Hakka, and provides comparisons with English, French and German.
The website author Tang Feng released it into the public domain under the Creative Commons CC 0 license.
(Information taken from Wikipedia; the name MoEdict is an abbreviation of 'Ministry of Education's dictionary'.)
兩岸詞典 163091 Revision: 兩岸詞典_1
辭源 91538 Revision: 辭源
牛津英汉汉英词典 74663 Revision: lix 2
XiandaiGuifan 72841 Revision: XiandaiGuifan_2024-04-17
Attribution: 外语教学与研究出版社 (Foreign Language Teaching and Research Press)
Description:
Xiandai Hanyu Guifan Cidian (现代汉语规范词典) is a dictionary of Standard Chinese created as part of a proposal in the Eighth Five-year Plan of China. It is similar to Xiandai Hanyu Cidian, but with notable divergences. The third edition has entries for 12,000 characters and 72,000 words, with over 80,000 example usages. (Wikipedia)
Xiandai7 70861 Revision: Xiandai_2024-04-20
Attribution: 商务印书馆有限公司 (Commercial Press);外语教学与研究出版社 (Foreign Language Teaching and Research Press)
Description:
Xiandai Hanyu Cidian (现代汉语词典), also known as the Contemporary Chinese Dictionary, is an important one-volume dictionary of Standard Mandarin Chinese published by the Commercial Press, now in its 7th (2016) edition. It was the first People's Republic of China dictionary to be arranged according to Hanyu Pinyin, the phonetic standard for Standard Mandarin Chinese, with explanatory notes in simplified Chinese. It contains over 70,000 entries. (Information taken from Wikipedia)
Wunan 59935 Revision: Wunan_2024-04-13
Attribution: Wu-Nan Book Inc.
Description:
The Dictionary of Mandarin (五南国语活用辞典) is a dictionary published by the Wunan Book Publishing Company in Taiwan for readers with an intermediate level of language proficiency (Wikipedia). Please note that the pinyin was derived from the characters, as lix's file only had zhuyin.
譯典通英漢雙向字典 30973 Revision: lix 2

Mandarin Chinese Hanzi Dictionaries

Title Entry Count Information
ZH Wiktionary Hanzi 97641 Author: Wiktionary, Wikimedia Foundation, Marv
Revision: ZH_Wikt_Hanzi2023-03-05T21:01:41.039Z
URL: https://zh.wiktionary.org/wiki/Category:%E6%BC%A2%E5%AD%97%E5%AD%97%E5%85%83
Attribution: ZH Wiktionary
Description:
Hanzi data scraped from zh.wiktionary.org
Parsed/converted by https://github.com/MarvNC/yomichan-dictionaries
康熙字典 46836 Revision: 康熙字典
CC-CEDICT Hanzi [2023-12-20] 17740 Author: MDBG, CC-CEDICT, Marv
Revision: 2023-12-20
URL: https://github.com/MarvNC/cc-cedict-yomitan
Attribution: https://cc-cedict.org/wiki/
Thanks go out to everyone who submitted new words or corrections. Special thanks go out to the CC-CEDICT editor team, who spend many hours doing research to maintain a high quality level:

goldyn_chyld - Matic Kavcic
richwarm - Richard Warmington
vermillion - Julien Baley
ycandau - Yves Candau
feilipu
and the editors who wish to remain anonymous
Special thanks to:

Craig Brelsford, for his extensive list of bird names
Erik Peterson, for his work as the editor of CEDICT
Paul Andrew Denisowski, the original creator of CEDICT
Description:
CC-CEDICT is a continuation of the CEDICT project started by Paul Denisowski in 1997 with the aim to provide a complete downloadable Chinese to English dictionary with pronunciation in pinyin for the Chinese characters.
This dictionary for Yomitan was converted from the data available at https://www.mdbg.net/chinese/dictionary?page=cc-cedict using https://github.com/MarvNC/cc-cedict-yomitan and https://github.com/MarvNC/yomichan-dict-builder.
EDHCC 6371 Author: The original authors created the content; lxs602 made it available in dictionary format; Michel converted it for yomitan
Revision: EDHCC_2024-04-15
Attribution: Lawrence J. Howell; Hikaru Morimoto
Description:
The Etymological Dictionary of Han Chinese Characters contains approximately 6000 entries explaining the connections between glyph and original meanings in Old Chinese. By Lawrence J. Howell, with Hikaru Morimoto. Compiled into mdx dictionary format by lxs602 (https://github.com/lxs602/Chinese-Mandarin-Dictionaries). Converted to Yomitan format by Michel.
