

mosuka commented on May 20, 2024

Hi @djKooks,
To use it from the CLI, pass the path to the generated dictionary as a command-line argument, as shown at the following URL:

https://github.com/lindera-morphology/lindera-ko-dic-builder#tokenizing-text-using-produced-dictionary

To use it in your application, pass the path to the generated dictionary to Tokenizer::new(), as follows:

let mut tokenizer = Tokenizer::new(Mode::Normal, "/path/to/dictionary");
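Because the constructor receives the dictionary as a plain path string, a mistyped path is easy to miss. Below is a minimal sketch of checking the directory before handing it over, using only the standard library. `resolve_dict_dir` is a hypothetical helper, not part of lindera, and it only checks that the directory exists, not that it contains the files lindera expects.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not a lindera API): confirm that `dir` is an existing
/// directory and return its absolute path. The dictionary's internal layout is
/// NOT validated here.
fn resolve_dict_dir(dir: &str) -> io::Result<PathBuf> {
    let path = Path::new(dir);
    if path.is_dir() {
        // Turns a relative path such as "./lindera-ko-dic-..." into an absolute one.
        path.canonicalize()
    } else {
        Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("dictionary directory not found: {}", dir),
        ))
    }
}

fn main() {
    // "/path/to/dictionary" is the placeholder from the comment above.
    match resolve_dict_dir("/path/to/dictionary") {
        // The resulting path could then be passed on as a &str via `p.to_str()`.
        Ok(p) => println!("using dictionary at {}", p.display()),
        Err(e) => eprintln!("{}", e),
    }
}
```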

from lindera.

kination commented on May 20, 2024

@mosuka Ah, it works now. Thanks~


kination commented on May 20, 2024

@mosuka Thanks for the feedback. It works well 🙇
BTW, when I set up the dictionary as follows, Japanese tokenizing does not seem to work well. Do I need to set up a separate dictionary for each language?


mosuka commented on May 20, 2024

@djKooks,
Yes, you need to create a tokenizer for each dictionary.
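One way to follow that advice in an application is to build one tokenizer per language up front and pick the right one at tokenize time. The sketch below is hypothetical: `Lang`, `TokenizerRegistry`, and the dictionary paths are illustrative names, and the `build` closure stands in for whatever actually constructs a lindera tokenizer from a dictionary path. It exposes `get_mut` because the snippet above binds the tokenizer with `let mut`.

```rust
use std::collections::HashMap;

/// Languages the application supports; each gets its own dictionary.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Lang {
    Japanese,
    Korean,
}

/// Holds one tokenizer per language. `T` stands in for the real tokenizer type.
struct TokenizerRegistry<T> {
    tokenizers: HashMap<Lang, T>,
}

impl<T> TokenizerRegistry<T> {
    /// Builds one tokenizer from each (language, dictionary path) pair.
    fn new<F>(dicts: &[(Lang, &str)], mut build: F) -> Self
    where
        F: FnMut(&str) -> T,
    {
        let tokenizers = dicts
            .iter()
            .map(|&(lang, path)| (lang, build(path)))
            .collect();
        TokenizerRegistry { tokenizers }
    }

    /// Mutable access to the tokenizer built from that language's dictionary, if any.
    fn get_mut(&mut self, lang: Lang) -> Option<&mut T> {
        self.tokenizers.get_mut(&lang)
    }
}

fn main() {
    // A String stands in for a real tokenizer; the paths are placeholders.
    let mut registry = TokenizerRegistry::new(
        &[(Lang::Japanese, "/path/to/ja-dict"), (Lang::Korean, "/path/to/ko-dict")],
        |path| format!("tokenizer for {}", path),
    );
    println!("{:?}", registry.get_mut(Lang::Korean));
}
```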


kination commented on May 20, 2024

@mosuka Thanks 🙏
One more question: could you let me know what Mode is for?

I'm testing by switching between Mode::Normal and Mode::Decompose, but setting Mode::Decompose gives an error...

let mut tokenizer = LinderaTokenizer::new(Mode::Decompose, &self.dict);   // <- this does not work

Is this an appropriate value?


mosuka commented on May 20, 2024

@djKooks
What kind of error are you experiencing?
I'm trying it with the Lindera CLI, but I'm not getting any errors.

% echo "하네다공항한정토트백" | lindera -d ./lindera-ko-dic-2.1.1-20180720 -m decompose
하네다  NNP,인명,F,하네다,*,*,*,*
공항    NNG,장소,T,공항,*,*,*,*
한정    NNG,*,T,한정,*,*,*,*
토트백  NNG,*,T,토트백,Compound,*,*,토트/NNP/인명+백/NNG/*
EOS
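Each output line above is a surface form followed by comma-separated features, and the stream ends with `EOS`. If you consume this output from Rust rather than calling the API, a small parser could look like the sketch below. It is based only on the sample output here: it assumes whitespace (likely a tab) separates the surface from the features, and that a `Compound` entry's last field lists its parts as `surface/tag/...` joined by `+`. `Token`, `parse_line`, and `compound_parts` are hypothetical names.

```rust
/// One line of CLI output: the surface form plus its comma-separated features.
#[derive(Debug, PartialEq)]
struct Token<'a> {
    surface: &'a str,
    features: Vec<&'a str>,
}

/// Parses one output line; returns None for the "EOS" marker or a line with no separator.
fn parse_line(line: &str) -> Option<Token<'_>> {
    if line.trim() == "EOS" {
        return None;
    }
    // Split at the first whitespace character, then skip any padding spaces.
    let (surface, rest) = line.split_once(char::is_whitespace)?;
    Some(Token {
        surface,
        features: rest.trim_start().split(',').collect(),
    })
}

/// Extracts the part surfaces from an expression like "토트/NNP/인명+백/NNG/*".
fn compound_parts(expr: &str) -> Vec<&str> {
    expr.split('+')
        .filter_map(|part| part.split('/').next())
        .collect()
}

fn main() {
    let output = "하네다  NNP,인명,F,하네다,*,*,*,*\n\
                  토트백  NNG,*,T,토트백,Compound,*,*,토트/NNP/인명+백/NNG/*\n\
                  EOS";
    for token in output.lines().map(str::trim).filter_map(parse_line) {
        println!("{} -> {}", token.surface, token.features[0]);
        // Field 4 holds the entry type in this sample; "Compound" entries carry a breakdown.
        if token.features.get(4) == Some(&"Compound") {
            if let Some(expr) = token.features.last() {
                println!("  parts: {:?}", compound_parts(expr));
            }
        }
    }
}
```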


kination commented on May 20, 2024

@mosuka I'm using the API:

let tokenizer = Tokenizer::new(Mode::Normal, "/path/to/dict");   // <- this works
...
let tokenizer = Tokenizer::new(Mode::Decompose, "/path/to/dict");   // <- this does not work


mosuka commented on May 20, 2024

@djKooks Ah, how about this?

Mode::Decompose(Penalty::default())
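The suggested fix is consistent with `Decompose` being a tuple variant that carries a `Penalty`. In Rust, a bare `Mode::Decompose` then names the variant's constructor, a function `fn(Penalty) -> Mode`, rather than a `Mode` value, so passing it where a `Mode` is expected fails to compile, while `Mode::Normal` is a complete value. The self-contained sketch below mirrors that shape with simplified stand-in types. The `Penalty` fields and their zero defaults are placeholders, not lindera's actual definition.

```rust
/// Simplified stand-in for lindera's penalty settings; fields and defaults are placeholders.
#[derive(Debug, Clone, PartialEq, Default)]
struct Penalty {
    length_threshold: usize,
    length_penalty: i32,
}

/// `Normal` is a unit variant; `Decompose` must be built with a `Penalty`.
#[derive(Debug, Clone, PartialEq)]
enum Mode {
    Normal,
    Decompose(Penalty),
}

fn describe(mode: &Mode) -> String {
    match mode {
        Mode::Normal => "normal".to_string(),
        Mode::Decompose(p) => format!(
            "decompose (threshold {}, penalty {})",
            p.length_threshold, p.length_penalty
        ),
    }
}

fn main() {
    let normal = Mode::Normal;
    // A bare `Mode::Decompose` is this constructor function, not a `Mode` value.
    let make: fn(Penalty) -> Mode = Mode::Decompose;
    // Supplying the payload, as suggested above, produces an actual `Mode`.
    let decompose = make(Penalty::default());
    println!("{} / {}", describe(&normal), describe(&decompose));
}
```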

