
adaptabert's Issues

Any preprocessing steps for fine-tuning on social media microblogs?

Hi, I just read the paper and found it to be very nice work.

As far as I know, there is a lot of noise (including URLs, "@" mentions, and "#" hashtags) in tweets. So, before domain tuning, did you preprocess the corpus of a million unlabeled tweets? If yes, how did you perform the preprocessing?
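For concreteness, something like the toy normalization below is what I have in mind (placeholder tokens for URLs and @-mentions, "#" stripped from hashtags). I am only sketching an assumption here, not claiming this is what you did:

```python
import re

def normalize_tweet(text):
    """Toy tweet cleaner: replace URLs and @-mentions with placeholder
    tokens and keep only the word part of hashtags. This is an
    illustrative guess at common preprocessing, not the authors' pipeline."""
    text = re.sub(r"https?://\S+|www\.\S+", "<url>", text)   # URLs -> <url>
    text = re.sub(r"@\w+", "<user>", text)                   # @-mentions -> <user>
    text = re.sub(r"#(\w+)", r"\1", text)                    # "#nlp" -> "nlp"
    return re.sub(r"\s+", " ", text).strip()                 # collapse whitespace

print(normalize_tweet("Check this out http://t.co/abc via @user #NLProc"))
# -> "Check this out <url> via <user> NLProc"
```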

Thanks.

clarification on the BERT vocab

Hello,

I read the paper and I have one quick question regarding the vocabulary. When BERT is domain fine-tuned, my understanding is as follows: take the BERT pre-trained model (say, bert-base-uncased), which comes with its own config, vocab, and model weights. When you use this pre-trained model on your domain-specific corpus, all you're basically doing is extending the base model by continuing to train it on additional domain-specific documents (following the same (1) masked-LM and (2) NSP objectives).

It is not quite clear from the paper whether the vocab is also extended to cover OOV words in the new domain corpus, and whether the existing entries in the vocab are retrained too.

Similarly, does the vocab get modified during task fine-tuning, or does the model just learn new weights?
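My current assumption, illustrated below with the Hugging Face tokenizer (which may not be the library you actually use), is that the WordPiece vocab stays fixed and an OOV domain word simply gets split into existing subword pieces, so only the weights change. Is that right?

```python
# Illustration only: assumes the Hugging Face `transformers` package, which may
# differ from the BERT implementation used in this repo.
from transformers import BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")

# An out-of-vocabulary domain word is broken into existing WordPiece subwords
# rather than being added to the vocabulary.
print(tok.tokenize("retweeting"))

# The vocabulary size stays at the pretrained 30522 entries; domain fine-tuning
# would only update the weights, not this table.
print(len(tok.vocab))
```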

Kindly let me know if I am missing something.

Thank you.

example data for task-fine-tuning and domain-fine-tuning

Hi,

After going through the scripts for task fine-tuning and domain fine-tuning, I was wondering if you could provide a sample dataset.

A few months back, I was following this post for task fine-tuning. I am trying to relate it to your fine-tuning, since yours gives a speed-up and also domain-specific fine-tuning.
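For example, is the distinction roughly like the toy sketch below, with token-level BIO tags for task fine-tuning and raw unlabeled text for domain fine-tuning? The file names and exact pickle structure here are just my guesses, not what your loaders necessarily expect:

```python
import pickle

# Hypothetical toy examples -- the exact structure expected by the repo's
# data-loading code may differ; this only illustrates the distinction.

# Task fine-tuning: token-level NER annotations (CoNLL-style BIO tags).
task_example = (["Jim", "lives", "in", "New", "York"],
                ["B-PER", "O", "O", "B-LOC", "I-LOC"])

# Domain fine-tuning: raw, unlabeled sentences from the target domain.
domain_example = "omg just landed in nyc, so hyped"

with open("toy_task_data.pkl", "wb") as f:
    pickle.dump([task_example], f)
with open("toy_domain_data.pkl", "wb") as f:
    pickle.dump([domain_example], f)
```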

Thanks!

how to give unlabelled text data as input?

Hi @xhan77,
In domain_tuning.py, how do I provide unlabelled data, i.e. without annotations, for training? I see that the methods get_sep_twitter_train_examples and get_conll_train_examples are called in BERTDataset(), and both the Twitter and CoNLL .pkl files have B, I, O tags.
I have a custom annotated dataset (set 1) and another unannotated dataset (set 2).
How can I provide input that is only text with no tags? Please let me know.
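Would something like the sketch below be a reasonable workaround, i.e. attaching dummy "O" tags to every token of my unannotated set 2 so it can go through the same .pkl pipeline (assuming the masked-LM domain tuning never actually looks at the tags)? The file name and structure here are just my guesses:

```python
import pickle

# Hedged workaround sketch (not a confirmed answer): if the masked-LM objective
# used for domain tuning ignores the NER tags, unannotated text could be pushed
# through the same pickle pipeline by attaching a dummy "O" tag to every token.
# The file name and tuple structure below are hypothetical.
raw_sentences = ["first unannotated sentence here", "another raw sentence"]

dummy_tagged = []
for sent in raw_sentences:
    tokens = sent.split()
    tags = ["O"] * len(tokens)   # placeholder tags, never used by the MLM loss
    dummy_tagged.append((tokens, tags))

with open("unlabeled_set2.pkl", "wb") as f:
    pickle.dump(dummy_tagged, f)
```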
