Comments (8)
Hi @djKooks,
To use it from the CLI, give the generated dictionary as a command line argument, as shown in the following URL.
If you want to use it in your application, please give the path to the generated dictionary to Tokenizer::new() as follows.
let mut tokenizer = Tokenizer::new(Mode::Normal, "/path/to/dictionary");
from lindera.
@mosuka ah, it works now. Thanks~
@mosuka thanks for the feedback. It works well.
BTW, when I set up the dictionary as follows, Japanese tokenizing does not seem to work well. Do I need to set up a dictionary separately for each language?
@djKooks,
Yes, you need to create a tokenizer for each dictionary.
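One way to organize "one tokenizer per dictionary" is a small registry keyed by language. The sketch below is only an illustration under stated assumptions: DictTokenizer is a hypothetical stand-in that merely records its dictionary path (it is not lindera's Tokenizer), and build_tokenizers, tokenizer_for, and the language keys are names invented for this example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary-backed tokenizer (not lindera's type).
/// It only records which dictionary it was built from.
#[derive(Debug)]
struct DictTokenizer {
    dictionary_path: String,
}

impl DictTokenizer {
    fn new(dictionary_path: &str) -> Self {
        DictTokenizer {
            dictionary_path: dictionary_path.to_string(),
        }
    }
}

/// Builds one tokenizer per (language, dictionary path) pair.
fn build_tokenizers(dictionaries: &[(&str, &str)]) -> HashMap<String, DictTokenizer> {
    dictionaries
        .iter()
        .map(|(language, path)| (language.to_string(), DictTokenizer::new(path)))
        .collect()
}

/// Looks up the tokenizer registered for a language, if any.
fn tokenizer_for<'a>(
    tokenizers: &'a HashMap<String, DictTokenizer>,
    language: &str,
) -> Option<&'a DictTokenizer> {
    tokenizers.get(language)
}
```

In a real application the stand-in constructor would be replaced by the library's own tokenizer constructor, called once per dictionary, and text would be routed to the matching tokenizer by language.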
@mosuka thanks
One more thing: could you let me know what Mode is for?
I'm testing by switching between Mode::Normal and Mode::Decompose, but setting Mode::Decompose gives an error...
let mut tokenizer = LinderaTokenizer::new(Mode::Decompose, &self.dict); // <- this does not work
Is this an appropriate value?
@djKooks
What kind of error are you experiencing?
I'm trying with Lindera CLI, but I'm not getting any errors.
% echo "하네다공항한정토트백" | lindera -d ./lindera-ko-dic-2.1.1-20180720 -m decompose
하네다 NNP,인명,F,하네다,*,*,*,*
공항 NNG,장소,T,공항,*,*,*,*
한정 NNG,*,T,한정,*,*,*,*
토트백 NNG,*,T,토트백,Compound,*,*,토트/NNP/인명+백/NNG/*
EOS
@mosuka I'm trying to use the API
let tokenizer = Tokenizer::new(Mode::Normal, "/path/to/dict"); // <- this works
...
let tokenizer = Tokenizer::new(Mode::Decompose, "/path/to/dict"); // <- this does not work
@djKooks Ah, how about this?
Mode::Decompose(Penalty::default())
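The suggestion above implies that Decompose is an enum variant that carries a Penalty value, which would explain why the bare Mode::Decompose is rejected where a Mode is expected. The following is a minimal, self-contained sketch of that shape. The Penalty and Mode types here are stand-ins written for illustration, and the Penalty field names are hypothetical; they are not copied from lindera's source.

```rust
/// Stand-in for the penalty settings carried by decompose mode.
/// The field names are hypothetical and chosen only for this sketch.
#[derive(Debug, Clone, Default, PartialEq)]
struct Penalty {
    length_threshold: usize,
    length_penalty: i32,
}

/// Stand-in for a mode enum whose Decompose variant carries data.
#[derive(Debug, Clone, PartialEq)]
enum Mode {
    Normal,
    Decompose(Penalty),
}

/// Describes a mode, reading the penalty out of the Decompose variant.
fn describe(mode: &Mode) -> String {
    match mode {
        Mode::Normal => "normal".to_string(),
        Mode::Decompose(penalty) => format!(
            "decompose (threshold {}, penalty {})",
            penalty.length_threshold, penalty.length_penalty
        ),
    }
}

/// Constructs both modes the way the thread suggests.
fn example_modes() -> Vec<Mode> {
    vec![
        Mode::Normal,
        // `Mode::Decompose` on its own is a constructor function, not a `Mode`
        // value, so it has to be called with a `Penalty` argument.
        Mode::Decompose(Penalty::default()),
    ]
}
```

With an enum shaped like this, passing Mode::Decompose without arguments fails at compile time with a type mismatch, while Mode::Decompose(Penalty::default()) type-checks.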