xhan77 / adaptabert
Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling
Hello,
I read the paper and I have one quick question regarding the vocabulary. Here is my understanding of how BERT is domain fine-tuned: take a BERT pre-trained model (say, bert-base-uncased), which comes with its own config, vocab, and model weights. When you use this pre-trained model on your domain-specific corpus, you are essentially extending the base model by training it on additional domain-specific documents, following the same (1) masked LM and (2) NSP objectives.
It is not quite clear from the paper whether the vocab is also extended to cover OOV words in the new domain corpus, and whether the embeddings of the existing vocab words are retrained.
Similarly, does the vocab get modified during task fine-tuning, or does the model just learn new weights?
Kindly let me know if I am missing something.
Thank you.
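For reference, the masked-LM corruption that continued domain pretraining relies on can be sketched in a few lines. This is a toy, self-contained illustration of BERT-style masking (select ~15% of tokens; replace 80% of those with [MASK], 10% with a random vocab token, and leave 10% unchanged), not the repo's actual data pipeline; the function name and vocabulary here are made up for the example.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style masked-LM corruption (illustrative sketch).

    For each token selected with probability `mask_prob`:
      - 80% of the time, replace it with "[MASK]"
      - 10% of the time, replace it with a random vocab token
      - 10% of the time, keep it unchanged
    Returns the corrupted inputs and a labels list that holds the
    original token at masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = "[MASK]"
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)
            # else: token left unchanged, but still predicted
    return inputs, labels
```

Note that this only corrupts token positions; the vocabulary itself is untouched, which matches the usual practice of keeping the pre-trained WordPiece vocab fixed and letting OOV words be split into subword pieces.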
hi @xhan77
In domain-tuning.py, how do I provide unlabeled data, i.e., text without annotations, for training? I see you call the methods get_sep_twitter_train_examples
and get_conll_train_examples
in BERTDataset(), and both the Twitter and CoNLL .pkl files have B, I, O tags.
I have a custom annotated dataset 1 and another unannotated dataset 2.
How can I give input that is only text with no tags? Please let me know.
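One common workaround in this situation (an assumption about the repo's data format, not its documented API; check BERTDataset for the exact structure it expects) is to wrap each raw sentence with placeholder "O" tags so the labeled-example readers accept it. Since domain tuning trains only the masked LM, the tag column should not influence the loss. The helper below is hypothetical:

```python
def raw_text_to_examples(lines):
    """Wrap raw, unannotated sentences in a (tokens, labels) shape
    like the one produced by labeled readers such as
    get_conll_train_examples.

    The "O" labels are placeholders only: domain tuning optimizes the
    masked-LM objective, so these tags should never reach the loss.
    (Hypothetical helper -- verify against the repo's BERTDataset.)
    """
    examples = []
    for line in lines:
        tokens = line.split()
        if tokens:  # skip empty lines
            examples.append({"tokens": tokens, "labels": ["O"] * len(tokens)})
    return examples
```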
Hi,
For domain-specific fine-tuning [extending the bert-base model by feeding it domain-specific texts], I was wondering why not use run_lm_finetuning.py instead of domain-tuning.py?
Thanks!
Hi, I just read the paper and found it a nice work.
As far as I know, there is a lot of noise (including URLs, "@" mentions, and "#" hashtags) in tweets. So before domain tuning, did you preprocess the million-tweet unlabeled corpus? If yes, how did you perform the preprocessing?
Thanks.
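For what it's worth, a typical tweet-normalization pass looks something like the sketch below. This is an illustrative heuristic (replace URLs and @mentions with placeholder tokens, keep hashtag words, collapse whitespace), not the paper's actual preprocessing; the placeholder tokens are my own choice.

```python
import re

def clean_tweet(text):
    """Heuristic tweet normalization (illustrative sketch, not the
    paper's pipeline):
      - replace URLs with a <url> placeholder
      - replace @user mentions with a <user> placeholder
      - strip the '#' from hashtags but keep the word
      - collapse runs of whitespace
    """
    text = re.sub(r"https?://\S+|www\.\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    text = re.sub(r"#(\w+)", r"\1", text)
    return re.sub(r"\s+", " ", text).strip()
```

Whether placeholders like these help depends on the downstream tagger; some pipelines instead drop URLs and mentions entirely.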
Hi,
After going through the scripts for task fine-tuning and domain fine-tuning, I was wondering if you could provide a sample dataset example.
A few months back, I was following this post for task fine-tuning. I am trying to relate it to your fine-tuning, since yours gives a speed-up and also performs domain-specific fine-tuning.
Thanks!