Thanks for making the model available in huggingface hub. I tried to use it with some

Issue when fine-tuning the model from huggingface hub (bertweet, closed)

vinairesearch commented on May 13, 2024

Issue when fine-tuning the model from huggingface hub

Comments (7)

ioana-blue commented on May 13, 2024

I figured out what the problem is. I was fine-tuning with a max_seq_length of 512, while the BERTweet model was trained with 130. Once I used a sequence length below 130, it worked. I filed a feature request asking transformers to assert that the sequence length is less than max_position_embeddings. See huggingface/transformers#10015
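
A minimal sketch of the kind of check described above. It assumes the RoBERTa-style position numbering that BERTweet inherits in transformers, where real tokens are numbered starting at padding_idx + 1, so the usable length is max_position_embeddings - padding_idx - 1. The helper names are hypothetical and not part of transformers, and padding_idx = 1 is an assumption; in practice both values would be read from the model config (max_position_embeddings and pad_token_id).

```python
def max_usable_length(max_position_embeddings: int, padding_idx: int = 1) -> int:
    """Longest input that still indexes inside the position embedding table.

    Assumes positions of real tokens run from padding_idx + 1 upward,
    so the last token of a length-n input uses index padding_idx + n.
    """
    return max_position_embeddings - padding_idx - 1


def check_max_seq_length(max_seq_length: int,
                         max_position_embeddings: int,
                         padding_idx: int = 1) -> int:
    """Return max_seq_length unchanged, or raise before training starts."""
    limit = max_usable_length(max_position_embeddings, padding_idx)
    if max_seq_length > limit:
        raise ValueError(
            f"max_seq_length={max_seq_length} exceeds the model limit of {limit} "
            f"(max_position_embeddings={max_position_embeddings})"
        )
    return max_seq_length


if __name__ == "__main__":
    # 130 is the value reported in this thread for BERTweet.
    print(max_usable_length(130))
    try:
        check_max_seq_length(512, 130)
    except ValueError as err:
        print(err)
```

Failing early with a readable message is the point here: the same mistake otherwise surfaces much later as an opaque CUDA assertion.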

ioana-blue commented on May 13, 2024

It's not a GPU problem. I tried running on the CPU and it also crashes, with the following output:

***** Running training *****
  Num examples = 15383
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 1923
  0%|          | 0/1923 [00:00<?, ?it/s]
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL Error 1: unhandled cuda error

ioana-blue commented on May 13, 2024

I upgraded to the latest PyTorch (1.7.1); same issue.

datquocnguyen commented on May 13, 2024

Could you please try a newer transformers version?

datquocnguyen commented on May 13, 2024

I have no idea what happened.
You might also try deleting BERTweet from the transformers cache folder in ~/.cache/torch, so that it is automatically re-downloaded properly.

ioana-blue commented on May 13, 2024

Sure, I can try that as well. Meanwhile, I ran in interactive mode on a GPU and managed to get more informative errors (I haven't looked into why this happens yet):

Traceback (most recent call last):
  File "../models/jigsaw/tr-3.4//run_puppets.py", line 284, in <module>
    main()
  File "../models/jigsaw/tr-3.4//run_puppets.py", line 195, in main
    trainer.train(
  File "/dccstor/redrug_ier/envs/attack/lib/python3.8/site-packages/transformers/trainer.py", line 756, in train
    tr_loss += self.training_step(model, inputs)
  File "/dccstor/redrug_ier/envs/attack/lib/python3.8/site-packages/transformers/trainer.py", line 1056, in training_step
    loss = self.compute_loss(model, inputs)
  File "/dccstor/redrug_ier/envs/attack/lib/python3.8/site-packages/transformers/trainer.py", line 1080, in compute_loss
    outputs = model(**inputs)
  File "/dccstor/redrug_ier/envs/attack/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/dccstor/redrug_ier/envs/attack/lib/python3.8/site-packages/transformers/modeling_roberta.py", line 990, in forward
    outputs = self.roberta(
  File "/dccstor/redrug_ier/envs/attack/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/dccstor/redrug_ier/envs/attack/lib/python3.8/site-packages/transformers/modeling_roberta.py", line 674, in forward
    embedding_output = self.embeddings(
  File "/dccstor/redrug_ier/envs/attack/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/dccstor/redrug_ier/envs/attack/lib/python3.8/site-packages/transformers/modeling_roberta.py", line 121, in forward
    embeddings = inputs_embeds + position_embeddings + token_type_embeddings
RuntimeError: CUDA error: device-side assert triggered
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [616,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
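
The assertion srcIndex < srcSelectDimSize is how an out-of-range lookup into an embedding table shows up on the GPU. The sketch below is plain Python with no GPU or model download needed. It mimics the position numbering that RoBERTa-style models in transformers are understood to use (an assumption here: pad tokens keep padding_idx, real tokens count up from padding_idx + 1) and lists the position ids that would fall outside a table of a given size. The function names and the token id 5 are hypothetical, chosen only for illustration.

```python
from typing import List


def roberta_style_position_ids(input_ids: List[int], padding_idx: int = 1) -> List[int]:
    """Number real tokens from padding_idx + 1; pad tokens stay at padding_idx."""
    position_ids = []
    seen = 0  # count of non-pad tokens so far
    for token_id in input_ids:
        if token_id != padding_idx:
            seen += 1
            position_ids.append(seen + padding_idx)
        else:
            position_ids.append(padding_idx)
    return position_ids


def out_of_range_positions(input_ids: List[int],
                           table_size: int,
                           padding_idx: int = 1) -> List[int]:
    """Position ids that an embedding table with table_size rows cannot serve."""
    return [p for p in roberta_style_position_ids(input_ids, padding_idx)
            if p >= table_size]


if __name__ == "__main__":
    long_input = [5] * 512   # 512 non-pad tokens, as with max_seq_length=512
    short_input = [5] * 128
    print(len(out_of_range_positions(long_input, table_size=130)))
    print(len(out_of_range_positions(short_input, table_size=130)))
```

Under these assumptions, any input longer than 128 tokens asks a 130-row position table for a row it does not have, which is consistent with the failure at the position_embeddings line of the traceback.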

datquocnguyen commented on May 13, 2024

I am not sure the error comes from BERTweet: indexSelectLargeIndex: block: [616,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
