Comments (2)

datquocnguyen commented on May 14, 2024

@nu11us I recently developed a fast tokenizer for bertweet-base. You can try it by installing transformers from this branch:
git clone --single-branch --branch fast_tokenizers_BARTpho_PhoBERT_BERTweet https://github.com/datquocnguyen/transformers.git

If you find it useful, please comment on the thread huggingface/transformers#17254 so that the fast tokenizer can be merged into the main transformers repository soon.
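The clone command above only fetches the branch; it still has to be installed. A minimal sketch of one way to do that, assuming an editable `pip` install is acceptable. The branch name and repository URL come from the comment above; `TARGET_DIR` and the function name are hypothetical and only used for illustration.

```shell
# Sketch: clone the fast-tokenizer branch and install it in editable mode.
# BRANCH and REPO_URL are taken from the comment above.
BRANCH=fast_tokenizers_BARTpho_PhoBERT_BERTweet
REPO_URL=https://github.com/datquocnguyen/transformers.git
TARGET_DIR=transformers-bertweet-fast   # hypothetical local directory name

install_fast_tokenizer_branch() {
    # Fetch only the one branch, then install the checkout with pip.
    git clone --single-branch --branch "$BRANCH" "$REPO_URL" "$TARGET_DIR" &&
        python3 -m pip install -e "$TARGET_DIR"
}

# Call explicitly when ready (needs network access):
# install_fast_tokenizer_branch
```

Installing into a fresh virtual environment is probably safer, since this replaces any transformers release already installed.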

datquocnguyen commented on May 14, 2024

bertweet-base should run without issue with the legacy token-classification example: https://github.com/huggingface/transformers/tree/main/examples/legacy/token-classification

Here is an example for sequence labeling with bertweet-base:

cd transformers/examples/legacy/token-classification

# TASK_NAME and SAVE_STEPS are defined here but not passed to run_ner.py below.
TASK_NAME=ner
SEED=1000
OUTPUT_DIR=evalBERTweet_data/ner-wnut16-s1000-bertweet-base
MAX_LENGTH=128
# Local model directory; on the Hugging Face Hub the model id is vinai/bertweet-base.
BERT_MODEL=bertweet-base
BATCH_SIZE=32
NUM_EPOCHS=50
SAVE_STEPS=20
PEAK_LR=1e-5
WARMUP=200
METRIC=f1
DATA_DIR=NER/wnut16
LABELS=NER/wnut16/labels.txt

# Train, evaluate and predict; the best checkpoint by $METRIC is reloaded at the end.
python3 run_ner.py \
--model_name_or_path "$BERT_MODEL" \
--output_dir "$OUTPUT_DIR" \
--labels "$LABELS" \
--seed "$SEED" \
--max_seq_length "$MAX_LENGTH" \
--per_device_train_batch_size "$BATCH_SIZE" \
--tokenizer_name "$BERT_MODEL" \
--num_train_epochs "$NUM_EPOCHS" \
--learning_rate "$PEAK_LR" \
--warmup_steps "$WARMUP" \
--data_dir "$DATA_DIR" \
--do_train \
--do_eval \
--do_predict \
--evaluation_strategy epoch \
--save_strategy epoch \
--save_total_limit 3 \
--metric_for_best_model "$METRIC" \
--load_best_model_at_end \
--overwrite_output_dir
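The command above expects a labels file at `NER/wnut16/labels.txt`, but the thread does not show how it was produced. A minimal sketch of one way to derive it, assuming CoNLL-style data files with one token per line, whitespace-separated columns, the tag in the last column, and blank lines between sentences. The function name `build_labels` and the `train.txt`/`dev.txt`/`test.txt` file names are assumptions, not something stated in the thread.

```shell
# Sketch: collect the distinct tags from CoNLL-style files.
# Assumes the tag is the last whitespace-separated column of each non-blank line.
build_labels() {
    # Usage: build_labels FILE... > labels.txt
    # NF skips blank sentence separators; sort -u removes duplicate tags.
    awk 'NF { print $NF }' "$@" | sort -u
}

# Example (paths follow DATA_DIR and LABELS above; file names are assumed):
# build_labels NER/wnut16/train.txt NER/wnut16/dev.txt NER/wnut16/test.txt > NER/wnut16/labels.txt
```

It is worth checking that the column separator in the data files matches what `run_ner.py` expects before training, since a mismatch would likely produce wrong labels rather than an obvious error.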
