kuromojin's Introduction

kuromojin

A high-level wrapper for kuromoji.js.

Features

  • Promise based API
  • Cache Layer
    • Fetch the dictionary only once
    • Return the same tokens for the same text
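
A "fetch only once" cache is commonly built by caching a promise. The following is a minimal sketch of that general technique, not kuromojin's actual source: `loadDictionary`, `getCachedDictionary`, and `loadCount` are hypothetical names, and the loader is a stand-in that performs no real I/O.

```javascript
// Hypothetical stand-in for an expensive dictionary load (no real I/O here).
let loadCount = 0;
function loadDictionary() {
    loadCount += 1;
    return new Promise(resolve => {
        setTimeout(() => resolve({ name: "fake-dictionary" }), 10);
    });
}

// Cache the promise itself, so concurrent callers share one in-flight load.
let cachedPromise = null;
function getCachedDictionary() {
    if (cachedPromise === null) {
        cachedPromise = loadDictionary();
    }
    return cachedPromise;
}

// Both calls resolve with the same object, and the loader runs only once.
Promise.all([getCachedDictionary(), getCachedDictionary()]).then(([a, b]) => {
    console.log(a === b, loadCount);
});
```

Caching the promise rather than the resolved value means that a second caller arriving while the first load is still in flight does not start another load.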

Installation

npm install kuromojin

Online Playground

📝 Requires a browser that supports DecompressionStream.

Usage

kuromojin exports two APIs.

  • getTokenizer() returns a Promise that resolves with kuromoji.js's tokenizer instance.
  • tokenize() returns a Promise that resolves with the analyzed tokens.
import {tokenize, getTokenizer} from "kuromojin";

getTokenizer().then(tokenizer => {
    // kuromoji.js's `tokenizer` instance
});

tokenize(text).then(tokens => {
    console.log(tokens)
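    /*
    [ {
        word_id: 509800,          // ID of the word in the dictionary
        word_type: 'KNOWN',       // word type (KNOWN if the word is in the dictionary, UNKNOWN otherwise)
        word_position: 1,         // start position of the word
        surface_form: '黒文字',    // surface form
        pos: '名詞',               // part of speech
        pos_detail_1: '一般',      // part-of-speech subcategory 1
        pos_detail_2: '*',        // part-of-speech subcategory 2
        pos_detail_3: '*',        // part-of-speech subcategory 3
        conjugated_type: '*',     // conjugation type
        conjugated_form: '*',     // conjugation form
        basic_form: '黒文字',      // base form
        reading: 'クロモジ',       // reading
        pronunciation: 'クロモジ'  // pronunciation
      } ]
    */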
});
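
Each token is a plain object, so ordinary array methods are enough for post-processing. The sketch below assumes tokens shaped like the output above. The `tokens` array is hand-written sample data containing only the fields it uses, and the helper names `surfaceFormsByPos` and `toReading` are hypothetical, not part of kuromojin.

```javascript
// Hand-written sample data shaped like the token objects above (only the
// fields used below); this is not real tokenizer output.
const tokens = [
    { surface_form: "黒文字", pos: "名詞", reading: "クロモジ" },
    { surface_form: "の", pos: "助詞", reading: "ノ" },
    { surface_form: "木", pos: "名詞", reading: "キ" }
];

// Keep only the surface forms of tokens with the given part of speech.
function surfaceFormsByPos(tokens, pos) {
    return tokens.filter(token => token.pos === pos).map(token => token.surface_form);
}

// Concatenate readings; fall back to the surface form when a reading is missing.
function toReading(tokens) {
    return tokens.map(token => token.reading || token.surface_form).join("");
}

console.log(surfaceFormsByPos(tokens, "名詞")); // [ '黒文字', '木' ]
console.log(toReading(tokens));                 // クロモジノキ
```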

For browser/global options

If window.kuromojin.dicPath is defined, kuromojin uses it as the default dictionary path.

import {getTokenizer} from "kuromojin";
// Affects all modules that use kuromojin.
window.kuromojin = {
    dicPath: "https://cdn.jsdelivr.net/npm/[email protected]/dict"
};
// this `getTokenizer` call uses the `window.kuromojin.dicPath` value set above
getTokenizer();
// which is equivalent to:
getTokenizer({dicPath: "https://cdn.jsdelivr.net/npm/[email protected]/dict"})
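
The lookup order described above can be pictured as follows. This is only a sketch under assumptions: `resolveDicPath` and `FALLBACK_DIC_PATH` are hypothetical names, the fallback value is a placeholder, and the precedence (explicit option, then global setting, then built-in default) is inferred from the description rather than taken from kuromojin's source.

```javascript
// Placeholder default; the real default path is not stated here.
const FALLBACK_DIC_PATH = "/path/to/default/dict";

// Pick the dictionary path: explicit option first, then the global setting.
function resolveDicPath(options = {}, globalObject = globalThis) {
    if (typeof options.dicPath === "string") {
        return options.dicPath;
    }
    const globalConfig = globalObject.kuromojin;
    if (globalConfig && typeof globalConfig.dicPath === "string") {
        return globalConfig.dicPath;
    }
    return FALLBACK_DIC_PATH;
}

// A plain object stands in for `window` so the sketch also runs outside a browser.
const fakeWindow = { kuromojin: { dicPath: "https://cdn.example.com/dict" } };
console.log(resolveDicPath({}, fakeWindow));
console.log(resolveDicPath({ dicPath: "/local/dict" }, fakeWindow));
console.log(resolveDicPath({}, {}));
```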

📝 Test dictionary URL

Note: backward compatibility for <= 1.1.0

kuromojin v1.1.0 exports tokenize as the default function.

kuromojin v2.0.0 removes the default export.

import kuromojin from "kuromojin";
// kuromojin === tokenize

Recommended: use import {tokenize} from "kuromojin" instead.

import {tokenize} from "kuromojin";

Note: kuromoji version is pinned

kuromojin pins kuromoji's version.

The aim is to dedupe kuromoji's dictionary: the dictionary is large, so duplicate copies of it should be avoided.

Related

Tests

npm test

Contributing

  1. Fork it!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request :D

License

MIT

kuromojin's People

Contributors

azu, dependabot[bot], georgeosddev


kuromojin's Issues

Update to kuromoji.js 0.1.2: HELP WANTED 🙇 🆘

kuromoji.js 0.1.2 has a breaking change.

We will update [email protected] → 0.1.2 and release it as a major update, kuromojin@3.

After that, we need to update the textlint rules that depend on kuromojin@2.
Some rules are broken and we need to fix them!

If you want to help, please leave a comment!

If you can help with the update PRs, please comment to say which ones you will take.

Plan

  1. Release kuromojin@3, which uses kuromoji.js 0.1.2
  2. Update all dependent rules: migrate kuromojin@2 to kuromojin@3
    • If there is no effect, publish as a minor update
    • Otherwise, publish as a major update
  3. Update all presets

Need to update rules

📝 azu/migrate-travis-ci-to-github-actions: Migrate Travis CI to GitHub Actions. We may also migrate Travis CI → GitHub Actions at the same time, using the Node.js CI settings.

Example of a broken rule:

https://gist.github.com/azu/8f4435141a5eefe9dcce41a3652ede0b

CDN not found because of path.join in Kuromoji.js

https://nodejs.org/api/path.html#path_path_join_paths
https://github.com/takuyaa/kuromoji.js/blob/71ea8473bd119546977f22c61e4d52da28ac30a6/src/loader/DictionaryLoader.js#L51
After path.join, "https://" becomes "https:/" and causes 404 error in a browser.

In my use case, I use kuromojin in a web application, and this turns "https://cdn.example.com/dict" into "https://myapp.com/cdn.example.com/dict", so getTokenizer cannot find the dictionary files.
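
The effect can be reproduced in Node.js. The sketch below uses the host names from the report; the file name and the `joinUrl` helper are illustrative assumptions, and the URL-based join is one possible workaround rather than the fix adopted by kuromoji.js.

```javascript
const path = require("node:path");

const dicPath = "https://cdn.example.com/dict";
const fileName = "base.dat.gz"; // illustrative file name

// path.posix.join normalizes the result, collapsing "//" into "/".
const joined = path.posix.join(dicPath, fileName);
console.log(joined); // https:/cdn.example.com/dict/base.dat.gz

// With a single slash after the scheme, the string is resolved relative to
// the page's own origin, as a browser would do.
const resolved = new URL(joined, "https://myapp.com/").href;
console.log(resolved); // https://myapp.com/cdn.example.com/dict/base.dat.gz

// Joining with the URL constructor keeps the scheme intact.
function joinUrl(base, file) {
    const baseWithSlash = base.endsWith("/") ? base : base + "/";
    return new URL(file, baseWithSlash).href;
}
console.log(joinUrl(dicPath, fileName)); // https://cdn.example.com/dict/base.dat.gz
```

Note that `joinUrl` only works for absolute URLs; a local file-system dictionary path would still need `path.join`.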
