mlm-scoring's Introduction

Masked Language Model Scoring

This package uses masked LMs like BERT, RoBERTa, and XLM to score sentences and rescore n-best lists via pseudo-log-likelihood (PLL) scores, computed by masking each token in turn. We also support autoregressive LMs like GPT-2. Example uses include rescoring ASR and NMT hypotheses and judging linguistic acceptability.

Paper: Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff. "Masked Language Model Scoring", ACL 2020.
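
For intuition, here is a minimal sketch of how a PLL is computed, using plain 🤗 Transformers (an illustration under our own assumptions, not this package's implementation: it assumes a standalone transformers 4.x / torch install, the helper name pll is ours, and the package batches the masked copies for speed):

import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

def pll(sentence: str) -> float:
    """Sum of each token's log-probability when masked out one at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

print(pll("Hello world!"))  # comparable to the scorer outputs shown below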

Installation

Python 3.6+ is required. Clone this repository and install:

pip install -e .
pip install torch mxnet-cu102mkl  # Replace w/ your CUDA version; mxnet-mkl if CPU only.
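
As a quick sanity check (our suggestion, not an official step), you can confirm that both backends import cleanly:

python -c "import mxnet, torch; print(mxnet.__version__, torch.__version__)"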

Some models are provided via GluonNLP and others via 🤗 Transformers, so for now we require both MXNet and PyTorch. You can then import the library directly:

from mlm.scorers import MLMScorer, MLMScorerPT, LMScorer
from mlm.models import get_pretrained
import mxnet as mx
ctxs = [mx.cpu()] # or, e.g., [mx.gpu(0), mx.gpu(1)]

# MXNet MLMs (use names from mlm.models.SUPPORTED_MLMS)
model, vocab, tokenizer = get_pretrained(ctxs, 'bert-base-en-cased')
scorer = MLMScorer(model, vocab, tokenizer, ctxs)
print(scorer.score_sentences(["Hello world!"]))
# >> [-12.410664200782776]
print(scorer.score_sentences(["Hello world!"], per_token=True))
# >> [[None, -6.126736640930176, -5.501412391662598, -0.7825151681900024, None]]

# EXPERIMENTAL: PyTorch MLMs (use names from https://huggingface.co/transformers/pretrained_models.html)
model, vocab, tokenizer = get_pretrained(ctxs, 'bert-base-cased')
scorer = MLMScorerPT(model, vocab, tokenizer, ctxs)
print(scorer.score_sentences(["Hello world!"]))
# >> [-12.411025047302246]
print(scorer.score_sentences(["Hello world!"], per_token=True))
# >> [[None, -6.126738548278809, -5.501765727996826, -0.782496988773346, None]]

# MXNet LMs (use names from mlm.models.SUPPORTED_LMS)
model, vocab, tokenizer = get_pretrained(ctxs, 'gpt2-117m-en-cased')
scorer = LMScorer(model, vocab, tokenizer, ctxs)
print(scorer.score_sentences(["Hello world!"]))
# >> [-15.995375633239746]
print(scorer.score_sentences(["Hello world!"], per_token=True))
# >> [[-8.293947219848633, -6.387561798095703, -1.3138668537139893]]

(MXNet and PyTorch interfaces will be unified soon!)

Scoring

Run mlm score --help to see supported models and options. See examples/demo/format.json for the file format. In inputs, the "score" field is optional; outputs will add "score" fields containing PLL scores.
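
Purely as a hypothetical illustration (the field names below are our guesses; treat examples/demo/format.json as authoritative), an n-best input might look like:

{
    "utt-001": {
        "hyp-1": {"score": -1.23, "text": "hello world"},
        "hyp-2": {"text": "hello word"}
    }
}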

There are three score types, depending on the model:

  • Pseudo-log-likelihood score (PLL): BERT, RoBERTa, multilingual BERT, XLM, ALBERT, DistilBERT
  • Maskless PLL score: same models as above (add --no-mask)
  • Log-probability score: GPT-2 (see the sketch after this list)
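
By contrast with PLLs, an autoregressive LM scores each token given only its predecessors, so no masked copies are needed. A minimal sketch with plain 🤗 Transformers (our illustration; the package's LMScorer handles tokenization and start-of-text details itself, so its scores will differ):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def log_prob(sentence: str) -> float:
    """Log-probability of tokens 2..T, each conditioned on its prefix."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    # Position t predicts token t+1: shift logits left, targets right.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    return log_probs.gather(1, targets.unsqueeze(1)).sum().item()

print(log_prob("Hello world!"))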

We score hypotheses for 3 utterances of LibriSpeech dev-other on GPU 0 using BERT base (uncased):

mlm score \
    --mode hyp \
    --model bert-base-en-uncased \
    --max-utts 3 \
    --gpus 0 \
    examples/asr-librispeech-espnet/data/dev-other.am.json \
    > examples/demo/dev-other-3.lm.json

Rescoring

One can rescore n-best lists via log-linear interpolation. Run mlm rescore --help to see all options. The first input is a file with original scores; the second contains scores from mlm score.
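
The combined score is a weighted sum in the log domain. A minimal sketch of the idea (our own helper, assuming the common form AM + lambda * LM, where lambda is the --weight):

def interpolate(am_score: float, lm_score: float, weight: float) -> float:
    # weight (lambda) = 0 keeps the original ranking;
    # larger values trust the language model more.
    return am_score + weight * lm_score

# Keep the hypothesis with the best combined score:
hyps = [(-10.2, -12.4, "hello world"), (-9.8, -15.9, "hello word")]
best = max(hyps, key=lambda h: interpolate(h[0], h[1], weight=0.5))
print(best[2])  # "hello world"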

We rescore acoustic scores (from dev-other.am.json) using BERT's scores (from previous section), under different LM weights:

for weight in 0 0.5 ; do
    echo "lambda=${weight}"; \
    mlm rescore \
        --model bert-base-en-uncased \
        --weight ${weight} \
        examples/asr-librispeech-espnet/data/dev-other.am.json \
        examples/demo/dev-other-3.lm.json \
        > examples/demo/dev-other-3.lambda-${weight}.json
done

The original WER is 12.2% while the rescored WER is 8.5%.

Maskless finetuning

One can finetune masked LMs to give usable PLL scores without masking. See LibriSpeech maskless finetuning.

Development

Run pip install -e .[dev] to install extra testing packages. Then:

  • To run unit tests and coverage, run pytest --cov=src/mlm in the root directory.

mlm-scoring's Issues

Using mlm-scoring with other public PyTorch RoBERTa model

Hello,
This is probably a silly question, but I'm having a hard time adapting mlm-scoring to use other public PyTorch RoBERTa models that are not on the list of supported models. Do you have any tutorials/materials on how to use self-trained or other public RoBERTa models with mlm-scoring? Any help would be much appreciated, and I apologize in advance in case this information is in the repository and I missed it.

Kind regards,
Danielly

IndexError: too many indices for tensor of dimension 1

Hi there,

I'm using the PyTorch implementation with bert-base-uncased and I get the following error when the sentence contains only one token:

Traceback (most recent call last):
  File "bert.py", line 28, in <module>
    print(scorer.score_sentences(["Hello"]))
  File ".../mlm-scoring/src/mlm/scorers.py", line 167, in score_sentences
    return self.score(corpus, **kwargs)[0]
  File ".../mlm-scoring/src/mlm/scorers.py", line 757, in score
    out = out[list(range(split_size)), token_masked_ids]
IndexError: too many indices for tensor of dimension 1

It works fine with MXNet MLMs, but I need to use a community model from HuggingFace.

Thanks!

PyTorch models

Hi,
It seems that support for PyTorch models is currently limited to bert and xlm. Would it be possible to add support for lighter models, e.g. DistilBERT or ALBERT?
Do you think that using these models would hurt the performance of the scorers significantly?

Thanks!

ValueError: Model 'BertForMaskedLMOptimized' is not supported by the scorer 'RegressionFinetuner'.

Hi there,
I'm using community model 'bert-base-chinese' from HuggingFace to finetune masked LMs and I get the following error:
ValueError:
Model 'BertForMaskedLMOptimized' is not supported by the scorer 'RegressionFinetuner'.

  • MLMScorer supports MXNet GluonNLP MLMs: ['bert-base-en-uncased', 'bert-base-en-cased', 'roberta-base-en-cased', 'bert-large-en-uncased', 'bert-large-en-cased', 'roberta-large-en-cased', 'bert-base-en-uncased-owt', 'bert-base-multi-uncased', 'bert-base-multi-cased']
  • LMScorer supports MXNet GluonNLP LMs: ['gpt2-117m-en-cased', 'gpt2-345m-en-cased']
  • MLMScorerPT supports PyTorch Transformers MLMs:
    • 'albert-*' (wrapped by AlbertForMaskedLMOptimized)
    • 'bert-*' (wrapped by BertForMaskedLMOptimized)
    • 'distilbert-*' (wrapped by DistilBertForMaskedLMOptimized)
    • 'xlm-*' (some variants require 'lang' parameter; XLM-R not supported)

What can I do to solve this issue?
Thanks!

Understanding runtimes of different models

Hi,

I need to score a rather large number of sentences for a downstream task. I'm experimenting with models supported by Hugging Face, with no fine-tuning, e.g.:

    mlms_model, vocab, tokenizer = get_pretrained(ctxs, 'albert-base-v2')
    scorer = MLMScorerPT(mlms_model, vocab, tokenizer, ctxs)
    sentences = ... # 1847 sentences
    corpus = Corpus.from_text(sentences)
    scores = scorer.score(corpus, 1.0, 500)  # batch size lowered to avoid GPU out-of-memory errors

Depending on the model and scorer I get wildly different runtimes. On my computer, encoding 1847 sentences:

  • MXNet MLMs like 'bert-base-en-cased' and 'roberta-base-en-cased' with MLMScorer take 3-4 minutes
  • MXNet LMs like 'gpt2-117m-en-cased' with LMScorer take about 8-10 secs (for some reason I need to lower the batch size to around 50)
  • 'albert-base-v1' and 'albert-base-v2' with MLMScorerPT take 4-5 minutes.
  • 'distilbert-base-cased' and 'distilbert-base-uncased' take 1-2 minutes.

I expected, perhaps naively, that ALBERT and DistilBERT would be much faster due to reduced dimensionality and number of layers.

xlm-roberta example?

Thank you for the amazing work. I am trying to use XLM models for scoring, but I get the error below when using xlm-roberta-base/large.

(base) bill@ink-molly:~/MickeyProbes$ python probe_generation/sent_scoring.py
/home/bill/anaconda3/lib/python3.7/site-packages/mxnet/optimizer/optimizer.py:167: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.optimizer.LAMB
  Optimizer.opt_registry[name].__name__))
WARNING:root:Model 'xlm-roberta-large' not recognized as an MXNet model; treating as PyTorch model
Downloading: 100%|████████████████████████████████| 513/513 [00:00<00:00, 257kB/s]
Can't set hidden_size with value 1024 for XLMConfig {
  "architectures": [
    "XLMRobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "model_type": "xlm",
  "pad_token_id": 1
}

Traceback (most recent call last):
  File "probe_generation/sent_scoring.py", line 9, in <module>
    model, vocab, tokenizer = get_pretrained(ctxs, 'xlm-roberta-large')
  File "/home/bill/MickeyProbes/mlm-scoring/src/mlm/models/__init__.py", line 126, in get_pretrained
    model, loading_info = transformers.XLMWithLMHeadModel.from_pretrained(model_fullname, output_loading_info=True)
  File "/home/bill/anaconda3/lib/python3.7/site-packages/transformers/modeling_utils.py", line 854, in from_pretrained
    **kwargs,
  File "/home/bill/anaconda3/lib/python3.7/site-packages/transformers/configuration_utils.py", line 316, in from_pretrained
    return cls.from_dict(config_dict, **kwargs)
  File "/home/bill/anaconda3/lib/python3.7/site-packages/transformers/configuration_utils.py", line 403, in from_dict
    config = cls(**config_dict)
  File "/home/bill/anaconda3/lib/python3.7/site-packages/transformers/configuration_xlm.py", line 195, in __init__
    super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, **kwargs)
  File "/home/bill/anaconda3/lib/python3.7/site-packages/transformers/configuration_utils.py", line 215, in __init__
    raise err
  File "/home/bill/anaconda3/lib/python3.7/site-packages/transformers/configuration_utils.py", line 212, in __init__
    setattr(self, key, value)
AttributeError: can't set attribute

I am not sure if it is a version issue. Would you please provide an example for running xlm in the code? Thanks!

Increasing batch size when using command-line mlm score

I am trying to use this package's command-line interface in a similar fashion to the README's example:

mlm score \
    --mode hyp \
    --model bert-base-en-uncased \
    --gpus 0 \
    examples/asr-librispeech-espnet/data/dev-other.am.json \
    > examples/demo/dev-other-3.lm.json

However, I see that it uses only around 601 MB of GPU memory, which is much less than what the GPU is able to support (12 GB). Is there any way to increase the batch size when using mlm score? It seems that the --split-size argument would do something like this; is that right?

Hardcoded GPU 0?

Hi there,

I'm facing an issue with your PyTorch implementation and some input sentences. E.g.

s = 'RT @HISPANlCPROBS : When u walk straight into the kitchen to eat & ur mom hits u with the " ya saludaste " #ThanksgivingWithHispanics https://…'
print(scorer.score_sentences([s]))

gives the following error:

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.91 GiB total capacity; 451.65 MiB already allocated; 12.12 MiB free; 40.35 MiB cached)

I'm working on a server with three GPUs and tried setting ctxs = [mx.gpu(0)], ctxs = [mx.gpu(1)], ctxs = [mx.gpu(2)] and ctxs = [mx.cpu()] but I always get the same error about GPU 0. I'm wondering if this is hardcoded somewhere in your code? Changing the ctxs variable seems to have no effect.

Thanks.

Can't load tokenizer for 'xlm-roberta-large'

Dear authors,

I have tried to change the pre-trained model to 'xlm-roberta-large', but I got this OSError message:

Can't load tokenizer for 'xlm-roberta-large'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'xlm-roberta-large' is the correct path to a directory containing all relevant files for a XLMTokenizer tokenizer.

Could you guide me on how to solve this problem?

how to integrate models not available via huggingface or gluon?

Hi there,

As I understand your library, it works with models which are available from Hugging Face or Gluon.

Question: for a model that is not available in the model zoos of those two frameworks, e.g. a model I trained myself, how can I get this to work with a config.json, a pytorch_model.bin, and a vocab.txt file?

Best,

Phillip

Help for DistilRoBERTa

Hi,
I am trying to use the DistilRoBERTa model with this code. I can see there is a class for DistilBERT which is loaded from the transformers library here. Is there a way I could also use it for DistilRoBERTa, as it is not in the transformers model source but only has a model card?

GPT Models Scoring error

I tried scoring sentences with the models mentioned here. Every model works fine except for gpt2-117m-en-cased and gpt2-345m-en-cased. The following error pops up:

Traceback (most recent call last):
  File "sample.py", line 16, in <module>
    print(scorer.score_sentences(["Hello world!"]))
  File "/home/pandramish.vinay/mlm-scoring/src/mlm/scorers.py", line 148, in score_sentences
    return self.score(corpus, **kwargs)[0]
  File "/home/pandramish.vinay/mlm-scoring/src/mlm/scorers.py", line 396, in score
    dataset = self.corpus_to_dataset(corpus)
  File "/home/pandramish.vinay/mlm-scoring/src/mlm/scorers.py", line 364, in corpus_to_dataset
    ids_masked = self._ids_to_masked(ids_original)
  File "/home/pandramish.vinay/mlm-scoring/src/mlm/scorers.py", line 329, in _ids_to_masked
    mask_token_id = self._vocab.token_to_idx[self._vocab.mask_token]
AttributeError: 'Vocab' object has no attribute 'mask_token'

Any fixes ?

How to integrate this with MuRIL?

I wanted to use this library to compute scores for MuRIL, which is based on BERT's MLM. It's not on Hugging Face yet. How can I bridge the gap?

Where is vocab file?

I don't want to download the vocab file because I need to run offline, so I would like to pass a parameter to get_pretrained.
Having read the code, I don't think this is currently possible.
Would you fix it?

Applying domain MLM finetuning for rescoring

Hi, I am a little confused about rescoring for ASR and NMT.
Is the model further pretrained on a domain corpus before rescoring (i.e., MLM applied to domain data), or do you just use an open-source pretrained model (RoBERTa or BERT trained on the Wikipedia/BookCorpus data)?

"NotImplementedError" When trying to fine-tune any bert model

When trying to follow the steps stated in the Maskless finetuning section (I even tried to use the exact model stated in the steps),
I always receive:

     60     @staticmethod
     61     def _check_support(model) -> bool:
---> 62         raise NotImplementedError

Is the regression finetuner implemented for BERT models?

ERROR: No matching distribution found for mxnet-mkl

I cloned the repo locally, then ran pip install -e . and pip install mxnet-mkl, but I get the error:

ERROR: Could not find a version that satisfies the requirement mxnet-mkl (from versions: none)
ERROR: No matching distribution found for mxnet-mkl

How can I fix it?

Update to transformers 4.x

I'd quite like to use this library to score the output from my RoBERTa model, but my model is implemented with Hugging Face transformers version 4.x while this library requires 3.3.1 (and that also ended up installing tokenizers-0.8.1rc2 for some reason).

It would be nice if it could be upgraded to the latest version.
