
Comments (3)

JulianSlzr commented on May 13, 2024

Thanks for the suggestion! I updated to Transformers 3.3.1 and added DistilBERT & ALBERT. The main change is defining a `*BertForMaskedLMOptimized` class for speed, as sketched below. You can follow my example to add support for other MLMs. Pull requests welcome 🙂.
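The pattern looks roughly like this (a sketch against the Transformers 3.3.1 BERT classes, not the repo's exact code; `select_positions` is an illustrative argument name):

```python
import torch
from transformers import BertForMaskedLM

class BertForMaskedLMOptimized(BertForMaskedLM):
    def forward(self, input_ids, attention_mask=None, select_positions=None):
        # Run the encoder as usual to get per-token hidden states
        sequence_output = self.bert(
            input_ids, attention_mask=attention_mask
        )[0]  # (batch, seq_len, hidden)
        if select_positions is not None:
            # Gather only the hidden states we need scored (e.g. the [MASK]
            # slots), so the vocab-sized LM head runs on far fewer tokens
            idx = select_positions.unsqueeze(-1).expand(
                -1, -1, sequence_output.size(-1)
            )
            sequence_output = torch.gather(sequence_output, 1, idx)
        # LM head: transform + project to vocabulary logits
        prediction_scores = self.cls(sequence_output)
        return (prediction_scores,)
```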

I also ran the two on BLiMP. Though ALBERT outperforms RoBERTa on downstream tasks like GLUE, its BLiMP scoring is only on par with BERT's. Likewise, DistilBERT performs similarly to BERT on GLUE, but is much worse on BLiMP (78% vs. 84%).

Possible takeaways:

  • DistilBERT's knowledge-distillation objective and its initialization from alternating BERT layers have an effect. Performance on quantifiers and island effects degrades significantly (maybe the knowledge was encoded in the layers that were skipped? maybe the output probabilities are now too soft?).
  • Matching the pre-training corpus to the evaluation data is likely more important. We saw this with LibriSpeech in our paper. Here, ALBERT and BERT are trained on the same corpus, while RoBERTa is trained on a larger corpus that may cover BLiMP better.
Per-phenomenon BLiMP accuracies (rounded to three decimals):

| BLiMP phenomenon       | distilbert-base-cased | albert-xxlarge-v2 |
|------------------------|-----------------------|-------------------|
| anaphor_agreement      | 0.983                 | 0.956             |
| argument_structure     | 0.786                 | 0.838             |
| binding                | 0.734                 | 0.791             |
| control_raising        | 0.779                 | 0.865             |
| determiner             | 0.970                 | 0.940             |
| ellipsis               | 0.915                 | 0.874             |
| filler_gap             | 0.746                 | 0.819             |
| irregular_forms        | 0.956                 | 0.926             |
| island_effects         | 0.549                 | 0.750             |
| npi_licensing          | 0.790                 | 0.912             |
| quantifiers            | 0.590                 | 0.674             |
| subject_verb_agreement | 0.897                 | 0.881             |
| **overall**            | **0.783**             | **0.844**         |
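To reproduce, scoring follows the README's interface (a sketch assuming `get_pretrained` accepts these Transformers model names):

```python
import mxnet as mx
from mlm.models import get_pretrained
from mlm.scorers import MLMScorerPT

ctxs = [mx.cpu()]  # or, e.g., [mx.gpu(0)]
model, vocab, tokenizer = get_pretrained(ctxs, 'distilbert-base-cased')
scorer = MLMScorerPT(model, vocab, tokenizer, ctxs)
# Higher (less negative) pseudo-log-likelihood = more acceptable sentence
print(scorer.score_sentences(["Hello world!"]))
```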


gerardb7 commented on May 13, 2024

Grand job, thanks a lot!


Ago3 commented on May 13, 2024

Hi,

I'm extending the framework to include another PyTorch model. When using MLMScorerPT, we don't need to pass a vocab, do we? I couldn't find any function where it is actually used.
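For concreteness, this is the kind of call I mean (illustrative only; I haven't confirmed that passing `None` is safe):

```python
from mlm.scorers import MLMScorerPT

# model, tokenizer, ctxs loaded as usual; is None acceptable here,
# given that vocab appears unused for PyTorch models?
scorer = MLMScorerPT(model, None, tokenizer, ctxs)
scores = scorer.score_sentences(["Hello world!"])
```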

Thank you!

PS: Very cool work :)

