Hi, I was planning on implementing the same POS tagger architecture using the …



Reproducing the POS Tagger Results using Huggingface Tokenizer Offsets (bertweet issue, closed)

vinairesearch commented on May 14, 2024



Comments (2)

datquocnguyen commented on May 14, 2024

@nu11us I recently developed a fast tokenizer for bertweet-base. You can try it by installing transformers from this branch:
git clone --single-branch --branch fast_tokenizers_BARTpho_PhoBERT_BERTweet https://github.com/datquocnguyen/transformers.git
cd transformers && pip install -e .

If you find it useful, please comment on huggingface/transformers#17254 so that the fast tokenizer can be merged into the main transformers library soon.


datquocnguyen commented on May 14, 2024

bertweet-base should run without issues using the legacy token-classification scripts: https://github.com/huggingface/transformers/tree/main/examples/legacy/token-classification
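The legacy example reads plain-text files from `--data_dir`. Assuming the layout its NER task expects (files named `train.txt`, `dev.txt` and `test.txt` directly inside the data directory), a quick sanity check before launching a long training run might look like the sketch below. `check_legacy_ner_data` is a hypothetical helper for illustration, not part of transformers.

```shell
# Hypothetical helper: report any split file missing from a legacy-style data dir.
# Assumption: the legacy NER task opens <data_dir>/train.txt, dev.txt and test.txt.
check_legacy_ner_data() {
  local data_dir="$1"
  local missing=0
  local split
  for split in train dev test; do
    if [ ! -f "$data_dir/$split.txt" ]; then
      # Print each missing file to stderr so every gap is listed, not just the first.
      echo "missing: $data_dir/$split.txt" >&2
      missing=1
    fi
  done
  # Non-zero status if anything was missing, so it can guard the training command.
  return "$missing"
}
```

With the configuration in the example, `check_legacy_ner_data NER/wnut16 && python3 run_ner.py ...` would only start training once all three splits are present.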

Here is an example for sequence labeling with bertweet-base:

cd transformers/examples/legacy/token-classification
 
TASK_NAME=ner
SEED=1000
OUTPUT_DIR=evalBERTweet_data/ner-wnut16-s1000-bertweet-base
MAX_LENGTH=128
BERT_MODEL=vinai/bertweet-base  # Hugging Face Hub ID of BERTweet-base
BATCH_SIZE=32
NUM_EPOCHS=50
SAVE_STEPS=20
PEAK_LR=1e-5
WARMUP=200
METRIC=f1
DATA_DIR=NER/wnut16
LABELS=NER/wnut16/labels.txt
 
python3 run_ner.py \
--model_name_or_path $BERT_MODEL \
--output_dir $OUTPUT_DIR \
--labels $LABELS \
--seed $SEED \
--per_device_train_batch_size $BATCH_SIZE \
--tokenizer_name $BERT_MODEL \
--num_train_epochs $NUM_EPOCHS \
--learning_rate $PEAK_LR \
--warmup_steps $WARMUP \
--data_dir $DATA_DIR \
--max_seq_length $MAX_LENGTH \
--do_train \
--do_eval \
--do_predict \
--evaluation_strategy epoch \
--save_strategy epoch \
--save_total_limit 3 \
--metric_for_best_model $METRIC \
--load_best_model_at_end \
--overwrite_output_dir
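`LABELS` should point to a file listing one tag per line. If a dataset does not ship with one, here is a minimal sketch for deriving it. It assumes CoNLL-style files where each non-blank line is a space-separated token followed by its tag (tag in the last column), sentences are separated by blank lines, and `-DOCSTART-` lines are document markers. `build_labels_file` is a hypothetical helper, not part of the legacy scripts.

```shell
# Hypothetical helper: collect the distinct tags from the train/dev/test splits.
# Assumption: "token tag" per line, tag in the last column, blank lines between sentences.
build_labels_file() {
  local data_dir="$1" labels_out="$2" split
  for split in train dev test; do
    # Skip splits that do not exist; NF > 1 drops blank lines and untagged tokens.
    [ -f "$data_dir/$split.txt" ] &&
      awk 'NF > 1 && $1 != "-DOCSTART-" { print $NF }' "$data_dir/$split.txt"
  done | LC_ALL=C sort -u > "$labels_out"  # fixed locale keeps the order deterministic
}
```

Running `build_labels_file "$DATA_DIR" "$LABELS"` before `run_ner.py` writes the tag set to the path the example expects. Collecting tags from all three splits rather than only `train.txt` helps avoid a tag that first appears at evaluation time, and single-column lines (for example, an unlabeled test split) are skipped rather than mistaken for tags.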

