Hello, I'm trying to build a tokenizer app that supports Korean/Japanese with linde

Build tokenizer for ko/ja, about lindera-morphology/lindera

Comments (8)

mosuka commented on May 20, 2024 1

Hi @djKooks,
To use it from the CLI, give the generated dictionary as a command line argument, as shown in the following URL.

https://github.com/lindera-morphology/lindera-ko-dic-builder#tokenizing-text-using-produced-dictionary

If you want to use it in your application, pass the path to the generated dictionary to Tokenizer::new() as follows.

let mut tokenizer = Tokenizer::new(Mode::Normal, "/path/to/dictionary");

from lindera.

kination commented on May 20, 2024 1

@mosuka ah, it works now. Thanks~

kination commented on May 20, 2024

@mosuka thanks for the feedback. It works well 🙇
BTW, when I set up the dictionary as follows, Japanese tokenization does not seem to work well. Do I need to set up a separate dictionary for each language?

mosuka commented on May 20, 2024

@djKooks,
Yes, you need to create a tokenizer for each dictionary.
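
One way to organize "one tokenizer per dictionary" is a small map keyed by language. The following is only a sketch of that idea: the `Tokenizer` struct here is a hypothetical stand-in, not the lindera type, and the language codes, helper names, and dictionary paths are placeholders chosen for illustration.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a dictionary-backed tokenizer (NOT the lindera type).
#[derive(Debug)]
struct Tokenizer {
    dict_path: String,
}

impl Tokenizer {
    fn new(dict_path: &str) -> Self {
        Tokenizer { dict_path: dict_path.to_string() }
    }
}

// Build one tokenizer per language code, each bound to its own dictionary path.
fn build_tokenizers(dicts: &[(&str, &str)]) -> HashMap<String, Tokenizer> {
    dicts
        .iter()
        .map(|(lang, path)| (lang.to_string(), Tokenizer::new(path)))
        .collect()
}

// Look up the tokenizer for a language; returns None if no dictionary was registered.
fn tokenizer_for<'a>(map: &'a HashMap<String, Tokenizer>, lang: &str) -> Option<&'a Tokenizer> {
    map.get(lang)
}

fn main() {
    // Placeholder paths; substitute the directories of your generated dictionaries.
    let tokenizers = build_tokenizers(&[("ko", "/path/to/ko-dic"), ("ja", "/path/to/ja-dic")]);
    if let Some(t) = tokenizer_for(&tokenizers, "ja") {
        println!("ja uses {}", t.dict_path);
    }
}
```

In a real application, the stand-in constructor would be replaced by the Tokenizer::new(Mode::Normal, "/path/to/dictionary") call shown earlier in this thread, called once per dictionary.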

kination commented on May 20, 2024

@mosuka thanks 🙏
One more question: could you let me know what Mode is for?

I'm testing by switching between Mode::Normal and Mode::Decompose, but setting Mode::Decompose produces an error...

let mut tokenizer = LinderaTokenizer::new(Mode::Decompose, &self.dict);   // <- this does not work

Is this an appropriate value?

mosuka commented on May 20, 2024

@djKooks
What kind of error are you experiencing?
I tried it with the Lindera CLI, and I'm not getting any errors.

% echo "하네다공항한정토트백" | lindera -d ./lindera-ko-dic-2.1.1-20180720 -m decompose
하네다  NNP,인명,F,하네다,*,*,*,*
공항    NNG,장소,T,공항,*,*,*,*
한정    NNG,*,T,한정,*,*,*,*
토트백  NNG,*,T,토트백,Compound,*,*,토트/NNP/인명+백/NNG/*
EOS

kination commented on May 20, 2024

@mosuka I'm trying to use the API:

let tokenizer = Tokenizer::new(Mode::Normal, "/path/to/dict");   // <- this works
...
let tokenizer = Tokenizer::new(Mode::Decompose, "/path/to/dict");   // <- this does not work

mosuka commented on May 20, 2024

@djKooks Ah, how about this?

Mode::Decompose(Penalty::default())
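
The reason the bare Mode::Decompose fails is most likely that the variant carries a value, so it has to be constructed with one. The sketch below reproduces that shape with stand-in types to show the difference. These are not the real lindera definitions: the `length_threshold` field and the `describe` helper are hypothetical and exist only for illustration.

```rust
// Stand-in types mirroring the shape discussed in this thread.
// NOT the real lindera definitions; the field below is hypothetical.
#[derive(Debug, Clone, PartialEq, Default)]
struct Penalty {
    // Hypothetical tuning knob, for illustration only.
    length_threshold: usize,
}

#[derive(Debug, Clone, PartialEq)]
enum Mode {
    // Unit variant: `Mode::Normal` is already a value of type `Mode`.
    Normal,
    // Tuple variant: it must be given a `Penalty` to become a `Mode`.
    Decompose(Penalty),
}

// Summarize a mode as text, reading the penalty carried by `Decompose`.
fn describe(mode: &Mode) -> String {
    match mode {
        Mode::Normal => "normal".to_string(),
        Mode::Decompose(p) => format!("decompose(threshold={})", p.length_threshold),
    }
}

fn main() {
    let normal = Mode::Normal;
    // Writing just `Mode::Decompose` names the variant's constructor function
    // (fn(Penalty) -> Mode), not a `Mode` value, so passing it where a `Mode`
    // is expected is a type error. Supplying a `Penalty` fixes that.
    let decompose = Mode::Decompose(Penalty::default());
    println!("{} / {}", describe(&normal), describe(&decompose));
}
```

Penalty::default() picks the default settings; a custom Penalty value can presumably be passed instead to tune how decomposition behaves.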
