
adaptabert's Issues

Any preprocessing steps for fine-tuning on social media microblogs?

Hi, I just read the paper and found it to be very nice work.

As far as I know, there is a lot of noise (including URLs, "@" mentions, and "#" hashtags) in tweets. So, before domain tuning, did you preprocess the corpus of a million unlabeled tweets? If yes, how did you perform the preprocessing?
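For concreteness, something like the toy normalization below is what I have in mind (placeholder tokens for URLs and @-mentions, "#" stripped from hashtags). I am only sketching an assumption here, not claiming this is what you did:

```python
import re

def normalize_tweet(text):
    """Toy tweet cleaner: replace URLs and @-mentions with placeholder
    tokens and keep only the word part of hashtags. This is an
    illustrative guess at common preprocessing, not the authors' pipeline."""
    text = re.sub(r"https?://\S+|www\.\S+", "<url>", text)   # URLs -> <url>
    text = re.sub(r"@\w+", "<user>", text)                   # @-mentions -> <user>
    text = re.sub(r"#(\w+)", r"\1", text)                    # "#nlp" -> "nlp"
    return re.sub(r"\s+", " ", text).strip()                 # collapse whitespace

print(normalize_tweet("Check this out http://t.co/abc via @user #NLProc"))
# -> "Check this out <url> via <user> NLProc"
```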

Thanks.

clarification on the BERT vocab

Hello,

I read the paper and I have one quick question regarding the vocabulary. When BERT is domain fine-tuned, my understanding is as follows: take the BERT pre-trained model (say, bert-base-uncased), which comes with its own config, vocab, and model weights. When you use this pre-trained model on your domain-specific corpus, all you're basically doing is extending the base model by continuing to train it on additional domain-specific documents (following the same (1) masked-LM and (2) NSP objectives).

It is not quite clear from the paper whether the vocab is also extended to cover OOV words in the new domain corpus, and whether the existing entries in the vocab are retrained too.

Similarly, does the vocab get modified during task fine-tuning, or does the model just learn new weights?
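My current assumption, illustrated below with the Hugging Face tokenizer (which may not be the library you actually use), is that the WordPiece vocab stays fixed and an OOV domain word simply gets split into existing subword pieces, so only the weights change. Is that right?

```python
# Illustration only: assumes the Hugging Face `transformers` package, which may
# differ from the BERT implementation used in this repo.
from transformers import BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")

# An out-of-vocabulary domain word is broken into existing WordPiece subwords
# rather than being added to the vocabulary.
print(tok.tokenize("retweeting"))

# The vocabulary size stays at the pretrained 30522 entries; domain fine-tuning
# would only update the weights, not this table.
print(len(tok.vocab))
```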

Kindly let me know if I am missing something.

Thank you.

example data for task-fine-tuning and domain-fine-tuning

Hi,

After going through the scripts for task fine-tuning and domain fine-tuning, I was wondering if you could provide a sample dataset.

A few months back, I was following this post for task fine-tuning. I am trying to relate it to your fine-tuning, since yours gives a speed-up and also domain-specific fine-tuning.
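For example, is the distinction roughly like the toy sketch below, with token-level BIO tags for task fine-tuning and raw unlabeled text for domain fine-tuning? The file names and exact pickle structure here are just my guesses, not what your loaders necessarily expect:

```python
import pickle

# Hypothetical toy examples -- the exact structure expected by the repo's
# data-loading code may differ; this only illustrates the distinction.

# Task fine-tuning: token-level NER annotations (CoNLL-style BIO tags).
task_example = (["Jim", "lives", "in", "New", "York"],
                ["B-PER", "O", "O", "B-LOC", "I-LOC"])

# Domain fine-tuning: raw, unlabeled sentences from the target domain.
domain_example = "omg just landed in nyc, so hyped"

with open("toy_task_data.pkl", "wb") as f:
    pickle.dump([task_example], f)
with open("toy_domain_data.pkl", "wb") as f:
    pickle.dump([domain_example], f)
```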

Thanks!

how to give unlabelled text data as input?

Hi @xhan77,
In domain_tuning.py, how do I provide unlabelled data, i.e. without annotations, for training? I see that the methods get_sep_twitter_train_examples and get_conll_train_examples are called in BERTDataset(), and both the Twitter and CoNLL .pkl files have B, I, O tags.
I have a custom annotated dataset (set 1) and another unannotated dataset (set 2).
How can I provide input that is only text with no tags? Please let me know.
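Would something like the sketch below be a reasonable workaround, i.e. attaching dummy "O" tags to every token of my unannotated set 2 so it can go through the same .pkl pipeline (assuming the masked-LM domain tuning never actually looks at the tags)? The file name and structure here are just my guesses:

```python
import pickle

# Hedged workaround sketch (not a confirmed answer): if the masked-LM objective
# used for domain tuning ignores the NER tags, unannotated text could be pushed
# through the same pickle pipeline by attaching a dummy "O" tag to every token.
# The file name and tuple structure below are hypothetical.
raw_sentences = ["first unannotated sentence here", "another raw sentence"]

dummy_tagged = []
for sent in raw_sentences:
    tokens = sent.split()
    tags = ["O"] * len(tokens)   # placeholder tags, never used by the MLM loss
    dummy_tagged.append((tokens, tags))

with open("unlabeled_set2.pkl", "wb") as f:
    pickle.dump(dummy_tagged, f)
```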
