Bling Fire: high-speed text tokenization for Ruby
Add this line to your application’s Gemfile:
gem 'blingfire'
Create a model
model = BlingFire::Model.new
Tokenize words
model.text_to_words(text)
Tokenize sentences
model.text_to_sentences(text)
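The two calls above can be combined to break a document into sentences and then break each sentence into words. Below is a minimal sketch, assuming both methods return an Array of Strings. `tokenize_document` is a hypothetical helper, not part of the gem. It takes the model as an argument so it works with any object that responds to those two methods.

```ruby
# Hypothetical helper (not part of the blingfire gem).
# Assumes model.text_to_sentences and model.text_to_words each return
# an Array of Strings.
def tokenize_document(model, text)
  # Split into sentences first, then tokenize each sentence into words
  model.text_to_sentences(text).map do |sentence|
    { sentence: sentence, words: model.text_to_words(sentence) }
  end
end

# Usage with the gem installed:
#   require "blingfire"
#   model = BlingFire::Model.new
#   tokenize_document(model, "This is a sentence. Here is another.")
#   # one { sentence:, words: } Hash per sentence
```

Keeping the sentence next to its words makes it easy to map token positions back to the sentence they came from.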
BlingFire comes with a default model that follows the tokenization logic of NLTK, with a few changes. You can also download other models and load them from a file.
Load a model
model = BlingFire.load_model("bert_base_tok.bin")
Convert text to ids
model.text_to_ids(text)
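BERT-style models usually expect fixed-length inputs. The sketch below converts a batch of texts with the one-argument `text_to_ids` call shown above, then pads or truncates each row to a common length. It assumes `text_to_ids` returns an Array of Integers. `ids_for_batch`, `max_len`, and the default padding id of 0 are illustrative assumptions, not part of the gem, so check your model's vocabulary for the correct padding id.

```ruby
# Hypothetical helper (not part of the blingfire gem).
# Assumes model.text_to_ids(text) returns an Array of Integers.
# pad_id defaults to 0 as an assumption; verify it against your model's vocabulary.
def ids_for_batch(model, texts, max_len:, pad_id: 0)
  texts.map do |text|
    ids = model.text_to_ids(text).first(max_len) # truncate long inputs
    ids + [pad_id] * (max_len - ids.size)        # pad short inputs
  end
end

# Usage with the gem installed and a downloaded model file:
#   require "blingfire"
#   model = BlingFire.load_model("bert_base_tok.bin")
#   ids_for_batch(model, ["first text", "second text"], max_len: 16)
```

Every returned row has exactly `max_len` entries, so the batch can be stacked into a single matrix.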
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/blingfire.git
cd blingfire
bundle install
bundle exec rake vendor:all
bundle exec rake test