Comments (3)
Hey!
Thanks for submitting the issue, it's great to see that people are trying to use this library.
The files were actually created from nltk's punkt tokenizer training data. At the time, this library was meant to be a 1:1 port of nltk's punkt tokenizer, so I ended up using their training data.
I'll follow up with what the keys actually mean when I get a chance to review the data structures; it's been a while.
If you wanted to use your own training data, we would have to write our own version of the nltk trainer. I attempted to do it at one point but found grabbing their data easier.
https://github.com/nltk/nltk/blob/develop/nltk/tokenize/punkt.py
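For context on what such training data holds, here is a rough sketch (not this library's actual export script) of dumping the four parameter sets that nltk's `PunktParameters` class stores (`abbrev_types`, `collocations`, `sent_starters`, `ortho_context`) to JSON. The function name `punkt_params_to_json` and the output key names are made up for illustration and may not match the keys in this library's training files. The stand-in object below only mimics the attribute names, so the snippet runs without nltk installed.

```python
import json
from types import SimpleNamespace


def punkt_params_to_json(params) -> str:
    """Serialize a PunktParameters-like object to a JSON string.

    Hypothetical helper: the output key names are illustrative only.
    """
    data = {
        # Lowercased abbreviations that should not end a sentence.
        "abbrev_types": sorted(params.abbrev_types),
        # Word pairs that tend to occur together across a period.
        "collocations": sorted(list(pair) for pair in params.collocations),
        # Words that frequently begin a sentence.
        "sent_starters": sorted(params.sent_starters),
        # Per-word integer bit flags describing observed capitalization contexts.
        "ortho_context": dict(sorted(params.ortho_context.items())),
    }
    return json.dumps(data, ensure_ascii=False, indent=2)


# Stand-in for a trained parameters object (values invented for the example).
example_params = SimpleNamespace(
    abbrev_types={"dr", "mr"},
    collocations={("u.s", "government")},
    sent_starters={"however"},
    ortho_context={"the": 46},
)

print(punkt_params_to_json(example_params))
```

With nltk installed, the same function should accept a real parameters object, for example one produced by training with nltk's `PunktTrainer`, assuming it exposes the same four attributes.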
from sentences.
Ah, ok, thank you! I'll probably just use the training file generated by nltk for the moment, in the interest of getting it done, but I'll take a look at implementing the trainer. Could be a fun problem.
Closing for now. Please feel free to ask follow-up questions and I'll re-open.