dhavaltaunk08 / nlp_scripts

Contains notebooks for various transformer-based models applied to different NLP tasks

Jupyter Notebook 90.96% Python 8.84% Shell 0.21%

nlp_scripts's Introduction

Hi 👋, I'm Dhaval Taunk

A passionate Data Scientist from India

📝 I regularly write articles on https://medium.com/@taunkdhaval08
💬 Ask me about Machine Learning, Deep Learning, Python
📫 How to reach me [email protected]

nlp_scripts's Issues

The test score on Kaggle seems quite low

I modified the script, predicted on test.tsv, and submitted the results to the Kaggle sentiment competition. The score is quite low, much lower than with BERT. How many epochs would work better? For BERT I used just 3 epochs, but increasing the number of epochs for this RoBERTa script does not seem to help.

How to load the fine-tuned model after saving it locally?

I noticed that the fine-tuned RoBERTa script saves the fine-tuned model locally with:

model_to_save = model
torch.save(model_to_save, output_model_file)
tokenizer.save_vocabulary(output_vocab_file)

How do I load this model from the local folder?
I tried this:

from transformers import RobertaModel, RobertaTokenizer
# from the local folder
roberta_model = RobertaModel.from_pretrained("./")
roberta_tokenizer = RobertaTokenizer.from_pretrained('./', truncation=True, do_lower_case=True)

But it failed with this error: OSError: does not appear to have a file named config.json. Checkout 'https://huggingface.co/.//main' for available files.
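One possible reading of the error, offered as a sketch rather than the repository's own method: from_pretrained expects a directory containing config.json and weight files, which is what save_pretrained writes, whereas torch.save(model, ...) writes a single pickled file whose counterpart is torch.load. The example below shows the state_dict variant of that round trip. TinyClassifier, its layer sizes, and the file name are hypothetical stand-ins for the fine-tuned model, not names from the script.

```python
import os
import tempfile

import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for the fine-tuned classifier."""

    def __init__(self, hidden: int = 8, num_labels: int = 2):
        super().__init__()
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, x):
        return self.classifier(x)


def save_weights(model: nn.Module, path: str) -> None:
    # Save only the parameters, not the pickled Python object.
    torch.save(model.state_dict(), path)


def load_weights(path: str) -> nn.Module:
    # Rebuild the architecture first, then load the saved parameters into it.
    model = TinyClassifier()
    model.load_state_dict(torch.load(path, map_location="cpu"))
    model.eval()  # switch to inference mode before predicting
    return model


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        weights_file = os.path.join(tmp_dir, "pytorch_model.bin")  # assumed name
        original = TinyClassifier()
        save_weights(original, weights_file)
        restored = load_weights(weights_file)
        print(torch.equal(original.classifier.weight, restored.classifier.weight))
```

With the real model, the same pattern would need the model class used during fine-tuning in place of TinyClassifier. Loading the whole pickled object with torch.load(output_model_file) should also work as long as that class is importable, although recent PyTorch releases may require passing weights_only=False for it.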

Multi-GPU support

The current code does not support multi-GPU environments.

I parallelized the model using torch.nn.DataParallel(model).cuda()

I got this error when I tried:

When should gradients be cleared?

Thank you. In the Transformers multilabel DistilBERT notebook, in the function "train(epoch)", you clear the gradients after the model's forward pass and before the backward pass on the loss, and you call "optimizer.zero_grad()" twice. Will this clear the gradients that the backward pass needs? Should it instead look like this, where the gradients are cleared only once, before the forward pass:

    optimizer.zero_grad()
    outputs = model(ids, mask, token_type_ids)

    loss = loss_fn(outputs, targets)
    if _%5000==0:
        print(f'Epoch: {epoch}, Loss:  {loss.item()}')

    loss.backward()
    optimizer.step()
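A small experiment can help answer this. In PyTorch, optimizer.zero_grad() resets the .grad buffers of the parameters and does not touch the autograd graph built by the forward pass, so calling it before loss.backward() should be harmless, while calling it between loss.backward() and optimizer.step() discards the gradients the step needs. The sketch below checks this on a toy nn.Linear model; the function name run_step, the toy data, and the learning rate are assumptions made for illustration, not code from the notebook.

```python
import torch
from torch import nn


def run_step(zero_grad_at: str):
    """Run one SGD step; return (initial_weight, updated_weight)."""
    torch.manual_seed(0)  # same initial weights on every call
    model = nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.ones(2, 4), torch.zeros(2, 1)
    initial = model.weight.detach().clone()

    if zero_grad_at == "before_forward":
        optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)  # forward pass
    if zero_grad_at == "after_forward":
        optimizer.zero_grad()  # only resets .grad, the graph is untouched
    loss.backward()  # fills .grad for every parameter
    if zero_grad_at == "after_backward":
        optimizer.zero_grad()  # throws away what backward() just computed
    optimizer.step()
    return initial, model.weight.detach().clone()


if __name__ == "__main__":
    _, w_before = run_step("before_forward")
    _, w_after = run_step("after_forward")
    start, w_lost = run_step("after_backward")
    print("same update either way:", torch.allclose(w_before, w_after))
    print("update lost:", torch.equal(start, w_lost))
```

Under these assumptions, a single zero_grad() call anywhere before loss.backward() gives the same update, so the second call is redundant rather than harmful.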

print(text)
print(repr(text))
text = " ".join(text.split())
print(text)
print(repr(text))

Output:

proving there is no cure for stupidity.
'proving there is no cure for stupidity.'
proving there is no cure for stupidity.
'proving there is no cure for stupidity.'

Maybe it makes a difference in certain cases? Or can I just remove this line?
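For what it is worth, " ".join(text.split()) only changes strings that contain leading or trailing whitespace, tabs, newlines, or runs of several spaces; a sentence that is already single-spaced passes through unchanged, which matches the output above. A minimal sketch, where normalize_whitespace and the sample strings are made up for illustration:

```python
def normalize_whitespace(text: str) -> str:
    # str.split() with no arguments splits on any run of whitespace and
    # drops leading/trailing whitespace; join() re-inserts single spaces.
    return " ".join(text.split())


samples = [
    "proving there is no cure for stupidity.",    # already clean
    "  proving   there is\tno cure\nfor stupidity. ",  # messy whitespace
]

for sample in samples:
    cleaned = normalize_whitespace(sample)
    print(repr(sample), "->", repr(cleaned), "| changed:", sample != cleaned)
```

Whether the line can be removed therefore depends on whether the dataset contains any such messy strings.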