Comments (2)
@nu11us I recently developed a fast tokenizer for bertweet-base. You can try it by installing transformers from this branch:
git clone --single-branch --branch fast_tokenizers_BARTpho_PhoBERT_BERTweet https://github.com/datquocnguyen/transformers.git
cd transformers
pip3 install -e .
If you find it useful, please comment on the thread huggingface/transformers#17254 (comment), so that the fast tokenizer can be merged into the main transformers library soon.
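To confirm that the branch install actually gives you a fast tokenizer, a quick check like the one below may help. This is only a sketch: `load_bertweet_tokenizer` and `is_fast_tokenizer` are hypothetical helper names, and the Hub ID `vinai/bertweet-base` is assumed (swap in a local path if that is what you use). `is_fast` is the standard attribute on transformers tokenizers.

```python
def is_fast_tokenizer(tokenizer) -> bool:
    """Return True if the tokenizer reports itself as a fast (Rust-backed) one."""
    # Transformers tokenizers expose `is_fast`; default to False if it is missing.
    return bool(getattr(tokenizer, "is_fast", False))


def load_bertweet_tokenizer(model_id: str = "vinai/bertweet-base"):
    """Load the tokenizer, asking for the fast variant when one is available.

    Assumption: `model_id` is the Hub ID or a local directory of the model.
    """
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_id, use_fast=True)
```

With the branch installed, `is_fast_tokenizer(load_bertweet_tokenizer())` is expected to return `True`; if it returns `False`, the slow tokenizer was loaded instead.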
from bertweet.
bertweet-base should run without issue under the legacy mode: https://github.com/huggingface/transformers/tree/main/examples/legacy/token-classification
Here is an example for sequence labeling with bertweet-base:
cd transformers/examples/legacy/token-classification
# Experiment settings
TASK_NAME=ner          # informational only; not passed to run_ner.py
SEED=1000
OUTPUT_DIR=evalBERTweet_data/ner-wnut16-s1000-bertweet-base
MAX_LENGTH=128
BERT_MODEL=bertweet-base   # path to a local copy of the model, or use the Hub ID vinai/bertweet-base
BATCH_SIZE=32
NUM_EPOCHS=50
SAVE_STEPS=20          # unused here, since checkpoints are saved once per epoch
PEAK_LR=1e-5
WARMUP=200
METRIC=f1
DATA_DIR=NER/wnut16    # must contain train.txt, dev.txt and test.txt
LABELS=NER/wnut16/labels.txt
# Train, evaluate after every epoch, and reload the best checkpoint by F1 at the end
python3 run_ner.py \
--model_name_or_path $BERT_MODEL \
--output_dir $OUTPUT_DIR \
--labels $LABELS \
--seed $SEED \
--max_seq_length $MAX_LENGTH \
--per_device_train_batch_size $BATCH_SIZE \
--tokenizer_name $BERT_MODEL \
--num_train_epochs $NUM_EPOCHS \
--learning_rate $PEAK_LR \
--warmup_steps $WARMUP \
--data_dir $DATA_DIR \
--do_train \
--do_eval \
--do_predict \
--evaluation_strategy epoch \
--save_strategy epoch \
--save_total_limit 3 \
--metric_for_best_model $METRIC \
--load_best_model_at_end \
--overwrite_output_dir
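The command above expects a labels file at $LABELS. If you do not have one yet, something like the following could generate it from the data files. It is a rough sketch, not part of the legacy example: it assumes the files in DATA_DIR are CoNLL-style (one token per line, token in the first column and label in the last, blank lines between sentences), and `collect_labels` and `write_labels` are hypothetical helper names.

```python
from pathlib import Path
from typing import Iterable, List


def collect_labels(lines: Iterable[str]) -> List[str]:
    """Return the unique labels found in CoNLL-style lines, with "O" listed first."""
    labels = set()
    for line in lines:
        line = line.strip()
        # Skip sentence separators (blank lines) and document markers.
        if not line or line.startswith("-DOCSTART-"):
            continue
        # The label is assumed to be the last whitespace-separated column.
        labels.add(line.split()[-1])
    return sorted(labels, key=lambda label: (label != "O", label))


def write_labels(data_dir: str, splits=("train.txt", "dev.txt", "test.txt")) -> Path:
    """Write data_dir/labels.txt with one label per line and return its path."""
    data_path = Path(data_dir)
    lines: List[str] = []
    for name in splits:
        split_file = data_path / name
        if split_file.exists():  # tolerate missing splits
            lines.extend(split_file.read_text(encoding="utf-8").splitlines())
    out_file = data_path / "labels.txt"
    out_file.write_text("\n".join(collect_labels(lines)) + "\n", encoding="utf-8")
    return out_file
```

For the setup above you would call `write_labels("NER/wnut16")` once before running run_ner.py. Labels are gathered from all available splits so that a label appearing only in dev or test is not missing at evaluation time.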