anli's Introduction

Adversarial NLI

Papers

Dataset

Adversarial NLI: A New Benchmark for Natural Language Understanding

Annotations of the Dataset for Error Analysis

ANLIzing the Adversarial Natural Language Inference Dataset

Dataset

Version 1.0 is available here: https://dl.fbaipublicfiles.com/anli/anli_v1.0.zip.

Format

The dataset files are all in JSONL format (one JSON object per line). Below is one example (pretty-printed as JSON) with self-explanatory fields.
Note that each example (each line) in the files contains a uid field that represents a unique id across all the examples in all three rounds of ANLI.

{   
    "uid": "8a91e1a2-9a32-4fd9-b1b6-bd2ee2287c8f",
    "premise": "Javier Torres (born May 14, 1988 in Artesia, California) is an undefeated Mexican American professional boxer in the Heavyweight division.
                Torres was the second rated U.S. amateur boxer in the Super Heavyweight division and a member of the Mexican Olympic team.",
    "hypothesis": "Javier was born in Mexico",
    "label": "c",
    "reason": "The paragraph states that Javier was born in the California, US."
}
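
Each file can be read with a few lines of standard-library Python. The sketch below is a minimal example, assuming the archive extracts to an anli_v1.0/ directory with per-round subfolders (R1, R2, R3); adjust the path to your local copy.

import json
from collections import Counter

# Assumed location inside the extracted archive; adjust to your layout.
path = "anli_v1.0/R1/dev.jsonl"

examples = []
with open(path, "r", encoding="utf-8") as f:
    for line in f:                     # one JSON object per line
        examples.append(json.loads(line))

print(len(examples), "examples")
print(Counter(ex["label"] for ex in examples))  # labels are "e", "n", "c"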

Reason

The ANLI dataset contains a reason field for each example in the dev and test splits and for some examples in the train split. The reason is collected by asking the annotator: "Please write a reason for your statement belonging to the category and why you think it was difficult for the system."

Verifier Labels (Updated on May 11, 2022)

All the examples in our dev and test sets are verified by 2 verifiers, or by 3 if the first 2 disagree. We released additional verifier labels in verifier_labels/verifier_labels_R1-3.jsonl.
Please refer to the verifier_labels_readme or Sec 2.1, Appendix C and Figure 7 in the ANLI paper for more details about the verifier labels.
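
A minimal sketch for peeking at the verifier-label records without assuming their schema (the exact field names and their meaning are documented in the verifier_labels_readme):

import json

# Path relative to the repository root, as referenced above.
path = "verifier_labels/verifier_labels_R1-3.jsonl"

with open(path, "r", encoding="utf-8") as f:
    first_record = json.loads(f.readline())

# Print the available fields instead of assuming them; see the readme for details.
print(sorted(first_record.keys()))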

Annotations for Error Analysis

An in-depth error analysis of the dataset is available here: https://github.com/facebookresearch/anli/tree/main/anlizinganli

We use a fine-grained annotation scheme of the different aspects of inference that are responsible for the gold classification labels, and use it to hand-code all three of the ANLI development sets. These annotations can be used to answer a variety of interesting questions: which inference types are most common, which models have the highest performance on each reasoning type, and which types are the most challenging for state-of-the-art models?

Leaderboard

If you want to have your model added to the leaderboard, please reach out to us or submit a PR.

Model Publication A1 A2 A3
InfoBERT (RoBERTa Large) Wang et al., 2020 75.5 51.4 49.8
ALUM (RoBERTa Large) Liu et al., 2020 72.3 52.1 48.4
GPT-3 Brown et al., 2020 36.8 34.0 40.2
ALBERT (using the checkpoint in this codebase) Lan et al., 2019 73.6 58.6 53.4
XLNet Large Yang et al., 2019 67.6 50.7 48.3
RoBERTa Large Liu et al., 2019 73.8 48.9 44.4
BERT Large Devlin et al., 2018 57.4 48.3 43.5

(Updated on Jan 21 2021: The three entries at the bottom show the test set numbers from Table 3 in the ANLI paper. We recommend that you report test set results in your paper. Dev scores, obtained for the models in this code base, are reported below.)

Implementation

To facilitate research in the field of NLI, we provide an easy-to-use codebase for NLI data preparation and modeling. The code is built upon Transformers with a special focus on NLI.

We welcome researchers from various fields (linguistics, machine learning, cognitive science, psychology, etc.) to try NLI. You can use the code to reproduce the results in our paper or even as a starting point for your research.

Please read more in Start your NLI research.

An important detail of our experiments is that we combine SNLI+MNLI+FEVER-NLI and up-sample the different rounds of ANLI to train the models.
We highly recommend referring to the link above when reproducing the results or training your own models, so that your results are comparable to the ones on the leaderboard.
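
For illustration only, the up-sampling amounts to repeating the smaller ANLI rounds when the combined training list is built (the experiment names in the training logs further down this page use factors such as r1*10, r2*20, r3*10). The file names and factors in this sketch are assumptions, not the repository's exact configuration; the training script controls this via its --train_data and --train_weights arguments.

import json

def load_jsonl(path):
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f]

# Hypothetical file names and up-sampling factors, for illustration only.
datasets = [
    ("snli_train.jsonl", 1),
    ("mnli_train.jsonl", 1),
    ("fever_nli_train.jsonl", 1),
    ("anli_r1_train.jsonl", 10),  # the smaller ANLI rounds are repeated
    ("anli_r2_train.jsonl", 20),  # so they are not drowned out by SNLI/MNLI
    ("anli_r3_train.jsonl", 10),
]

combined = []
for path, factor in datasets:
    combined.extend(load_jsonl(path) * factor)  # up-sample by repetition

print(len(combined), "training examples after up-sampling")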

(Updated on May 11, 2022)
Thanks to Jared Contrascere, researchers can now use the notebook to run experiments quickly via Google Colab.

Pre-trained Models

Pre-trained NLI models can easily be loaded through the Hugging Face model hub.

Version information:

python==3.7
torch==1.7
transformers==3.0.2 or later (tested: 3.0.2, 3.1.0, 4.0.0)

Models: RoBERTa, ALBERT, BART, ELECTRA, XLNet.

The training data is a combination of SNLI, MNLI, FEVER-NLI, ANLI (R1, R2, R3). Please also cite the datasets if you are using the pre-trained model.

Please try the code snippet below.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

if __name__ == '__main__':
    max_length = 256

    premise = "Two women are embracing while holding to go packages."
    hypothesis = "The men are fighting outside a deli."

    hg_model_hub_name = "ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli"
    # hg_model_hub_name = "ynie/albert-xxlarge-v2-snli_mnli_fever_anli_R1_R2_R3-nli"
    # hg_model_hub_name = "ynie/bart-large-snli_mnli_fever_anli_R1_R2_R3-nli"
    # hg_model_hub_name = "ynie/electra-large-discriminator-snli_mnli_fever_anli_R1_R2_R3-nli"
    # hg_model_hub_name = "ynie/xlnet-large-cased-snli_mnli_fever_anli_R1_R2_R3-nli"

    tokenizer = AutoTokenizer.from_pretrained(hg_model_hub_name)
    model = AutoModelForSequenceClassification.from_pretrained(hg_model_hub_name)

    tokenized_input_seq_pair = tokenizer.encode_plus(premise, hypothesis,
                                                     max_length=max_length,
                                                     return_token_type_ids=True, truncation=True)

    input_ids = torch.Tensor(tokenized_input_seq_pair['input_ids']).long().unsqueeze(0)

    # Note: BART does not accept 'token_type_ids'; remove the line below (and the
    # token_type_ids argument in the model call) if you are using the BART checkpoint.
    token_type_ids = torch.Tensor(tokenized_input_seq_pair['token_type_ids']).long().unsqueeze(0)
    attention_mask = torch.Tensor(tokenized_input_seq_pair['attention_mask']).long().unsqueeze(0)

    outputs = model(input_ids,
                    attention_mask=attention_mask,
                    token_type_ids=token_type_ids,
                    labels=None)
    # Note:
    # "id2label": {
    #     "0": "entailment",
    #     "1": "neutral",
    #     "2": "contradiction"
    # },

    predicted_probability = torch.softmax(outputs[0], dim=1)[0].tolist()  # batch_size only one

    print("Premise:", premise)
    print("Hypothesis:", hypothesis)
    print("Entailment:", predicted_probability[0])
    print("Neutral:", predicted_probability[1])
    print("Contradiction:", predicted_probability[2])

If you are using our pre-trained model checkpoints with the above code snippet, you would expect to get the following numbers.

Huggingface Model Hub Checkpoint A1 (dev) A2 (dev) A3 (dev) A1 (test) A2 (test) A3 (test)
ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli 73.8 50.8 46.1 73.6 49.3 45.5
ynie/xlnet-large-cased-snli_mnli_fever_anli_R1_R2_R3-nli 73.4 52.3 50.8 70.0 51.4 49.8
ynie/albert-xxlarge-v2-snli_mnli_fever_anli_R1_R2_R3-nli 76.0 57.0 57.0 73.6 58.6 53.4

More details are available here.
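
As the comment in the snippet above notes, the BART checkpoint does not accept token_type_ids (see also the related issue further down this page). A minimal sketch that works across the listed checkpoints, assuming the tokenizer, model, premise, hypothesis and max_length defined above and a transformers version that supports calling the tokenizer directly (3.x or newer), is to let the tokenizer decide which inputs to return and forward exactly those:

import torch

# RoBERTa/BART tokenizers return only input_ids and attention_mask by default,
# while BERT-style tokenizers also return token_type_ids, so unpacking the
# encoding avoids passing token_type_ids to models that do not accept it.
encoded = tokenizer(premise, hypothesis,
                    max_length=max_length,
                    truncation=True,
                    return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoded)

probs = torch.softmax(outputs[0], dim=1)[0].tolist()
print(dict(zip(["entailment", "neutral", "contradiction"], probs)))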

Rules

When using this dataset, we ask that you obey some very simple rules:

  1. We want to make it easy for people to provide ablations on test sets without being rate limited, so we release labeled test sets with this distribution. We trust that you will act in good faith, and will not tune on the test set (this should really go without saying)! We may release unlabeled test sets later.

  2. Training data is for training, development data is for development, and test data is for reporting test numbers. This means that you should not e.g. train on the train+dev data from rounds 1 and 2 and then report an increase in performance on the test set of round 3.

  3. We will host a leaderboard on this page. If you want to be added to the leaderboard, please contact us and/or submit a PR with a link to your paper, a link to your code in a public repository (e.g. Github), together with the following information: number of parameters in your model, data used for (pre-)training, and your dev and test results for each round, as well as the total over all rounds.

Other NLI Reference

We used the following NLI resources to train the backend model for the adversarial collection:

Citations

Dataset

@inproceedings{nie-etal-2020-adversarial,
    title = "Adversarial {NLI}: A New Benchmark for Natural Language Understanding",
    author = "Nie, Yixin  and
      Williams, Adina  and
      Dinan, Emily  and
      Bansal, Mohit  and
      Weston, Jason  and
      Kiela, Douwe",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    year = "2020",
    publisher = "Association for Computational Linguistics",
}

Annotations of the Dataset for Error Analysis

@inproceedings{williams-etal-2020-anlizing,
  title = "ANLIzing the Adversarial Natural Language Inference Dataset",
  author = "Adina Williams and
    Tristan Thrush and
    Douwe Kiela",
  booktitle = "Proceedings of the 5th Annual Meeting of the Society for Computation in Linguistics",
  year = "2022",
  publisher = "Association for Computational Linguistics",
}

License

ANLI is licensed under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0). See the LICENSE file for details.

anli's People

Contributors

boxin-wbx, contracode, douwekiela, easonnie, tristanthrush, yicheng-w

anli's Issues

assertion failed train_data_weights train_data_list

I set up exactly as described and followed all steps. Then ...

$ python src/nli/training.py  --model_class_name "roberta-large" -n 1 -g 1 -nr 0 --max_length 256 --gradient_accumulation_steps 2 --per_gpu_train_batch_size 16 --per_gpu_eval_batch_size 16   --save_prediction --train_data snli_train:none,mnli_train:none    --train_weights 1  --eval_data snli_dev:none   --eval_frequency 2000    --experiment_name "roberta-large|snli|nli"
Downloading: 100%|██████████████████████████████████████████████████████████████████████| 899k/899k [00:00<00:00, 4.99MB/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 3.96MB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████| 482/482 [00:00<00:00, 289kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████| 1.43G/1.43G [04:09<00:00, 5.71MB/s]
Some weights of the model checkpoint at roberta-large were not used when initializing RobertaForSequenceClassification: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-large and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Load Jsonl: /home/qblocks/shan/anli/data/build/snli/train.jsonl
549367it [00:02, 191192.35it/s]
Load Jsonl: /home/qblocks/shan/anli/data/build/mnli/train.jsonl
392702it [00:02, 179624.89it/s]
Load Jsonl: /home/qblocks/shan/anli/data/build/snli/dev.jsonl
9842it [00:00, 186058.95it/s]
Traceback (most recent call last):
  File "src/nli/training.py", line 852, in <module>
    main()
  File "src/nli/training.py", line 375, in main
    mp.spawn(train, nprocs=args.gpus_per_node, args=(args,))  # spawn how many process in this node
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/qblocks/shan/anli/src/nli/training.py", line 478, in train
    assert len(train_data_weights) == len(train_data_list)
AssertionError

Hyperparameters

Hello!
Thank you for sharing a great piece of work.

I was wondering what the exact hyperparameters are for producing Table 3 for each of BERT, XLNet and RoBERTa.

Regards

Error loading pretrained model: xxx is a zip archive (did you mean to use torch.jit.load()?)

Hi,
When trying to load the ANLI pretrained model with the following code:
hg_model_hub_name = "ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli"
model = AutoModelForSequenceClassification.from_pretrained(hg_model_hub_name)
I got this error:
RuntimeError: /home/xd/.cache/torch/transformers/05b820b482aacc2f4ab9fc864c2acc45307024b2d4ebb2458686967bf2697fe3.d5336e8b7525ed52f95265359a6bdc77551bbf8e69e271a19d8c1817385fd345 is a zip archive (did you mean to use torch.jit.load()?)
After searching, I found that it may be caused by loading PyTorch 1.6 models with an older PyTorch version.
My env:
pytorch: 1.3
huggingface/transformers: 3.02
Did you train and save the model using PyTorch 1.6? What are the version requirements for PyTorch and huggingface/transformers?


Does 'bart-large-snli_mnli_fever_anli_R1_R2_R3-nli' output 'token_type_ids'?

Thanks for the code and model sharing!

Just wondering, does the tokenizer from ynie/bart-large-snli_mnli_fever_anli_R1_R2_R3-nli not have 'token_type_ids'?

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-79-bdf2d57ba3d0> in <module>
      1 sents, labels = shuffle2lists(sents, labels)
      2 classifier = TestClassifier(0.9, model, tokenizer)
----> 3 preds = classifier.batch_predict(sents, pos_refs, neg_refs)
      4 print(classification_report(labels, preds, labels=[0, 1, 2], target_names=['Negative', 'Nothing', 'Positive']))

<ipython-input-61-a375e4235623> in batch_predict(self, premises, hyp_pos, hyp_neg)
     31 
     32     def batch_predict(self, premises, hyp_pos, hyp_neg):
---> 33         nli_pos_scores = self.transform(premises, hyp_pos)
     34         nli_neg_scores = self.transform(premises, hyp_neg)
     35         assert(len(nli_pos_scores)==len(nli_neg_scores))

<ipython-input-61-a375e4235623> in transform(self, premises, hypotheses)
      8 
      9     def transform(self, premises, hypotheses):
---> 10         return nli_exp(self.tokenizer, self.model, premises, hypotheses)
     11 
     12     def transform2(self, premises, hypotheses):

<ipython-input-60-a0bed999f3f7> in nli_exp(tokenizer, model, premises, hypotheses)
     26     for i in range(len(premises)):
     27         for j in range(len(hypotheses)):
---> 28             predicted_probability, _ = get_prediction(tokenizer, model, premises[i], hypotheses[j])
     29             ent_scores.append(predicted_probability[0])
     30             neu_scores.append(predicted_probability[1])

<ipython-input-60-a0bed999f3f7> in get_prediction(tokenizer, model, premise, hypothesis, max_length)
      8     attention_mask = torch.Tensor(tokenized_input_seq_pair['attention_mask']).long().unsqueeze(0)
      9 
---> 10     outputs = model(input_ids,
     11                     attention_mask=attention_mask,
     12                     token_type_ids=token_type_ids,

d:\bo\envs\srp\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

TypeError: forward() got an unexpected keyword argument 'token_type_ids'

Questions regarding full score calculation

Hi!

I'm a bit confused regarding how the full ANLI (A1+A2+A3) score is calculated. My understanding is that it is a sample-weighted average of the A1, A2 and A3 sets. Is that correct?

So like this:

def anli(a1, a2, a3):
    return (a1 * 1000 + a2 * 1000 + a3 * 1200) / 3200

When I try this on the InfoBERT paper the calculation matches for some scores but not for others (from Table 2):

 # FreeLB
anli(73.3, 50.5, 46.8) # 56.24, matches the paper

# SMART
anli(72.4, 49.8, 50.3) # 57.05, matches the paper

# ALUM
anli(72.3, 52.1, 48.4) # 57.02, matches the paper

# InfoBERT
anli(75.0, 50.5, 47.7) # 57.1 while paper shows 58.3 ???

Question 1: Is this a mistake in the paper or is the way I calculate the total score wrong?

But what confused me again was Table 5 in the ALUM paper:

[image: Table 5 from the ALUM paper]

The authors say

Note that Nie et al. (2019) did not represent results for individual rounds, as signified by "-"

Comparing this with Table 3 in the original ANLI paper this doesn't seem to match:

[image: Table 3 from the ANLI paper]

Although it's not said very explicitly here, I interpret this table as follows:

A1 column -> test score for A1
A2 column -> test score for A2
A3 column -> test score for A3
ANLI column -> test score for ANLI
ANLI-E column -> test score for ANLI-E, a more difficult subset of the ANLI test set

while the authors of the ALUM paper interpret it as follows:

A1 column -> dev score for A1
A2 column -> dev score for A2
A3 column -> dev score for A3
ANLI column -> dev score for ANLI
ANLI-E column -> test score for ANLI

The same thing occurs in the SMART paper (Table 6):

[image: Table 6 from the SMART paper]

If my interpretation is right, this is a significant flaw in the paper, as they compare dev scores with test scores.

Question 2: Which interpretation of Table 3 in the ANLI paper is correct?

Thanks for your help.

[Question] Roberta-Large and MNLI dev results

Hi,

I am testing the model ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli uploaded by you, and I noticed that the MNLI dev results (matched and mismatched) are slightly different from those reported in the paper.

Some observations:

  • In my code, I use the huggingface datasets (https://github.com/huggingface/datasets) module and I load the data as follows: load_dataset('glue', 'mnli'). I checked this dataset version and did not find differences from the original MNLI dataset.
  • I am comparing with the results of Table 3 (last row), which reports (91.0 / 90.6) while in my execution reports (89.89/89.97).

Maybe I am doing something wrong in validation, or it might be a numeric-precision issue, but I want to confirm that it is not an issue with the uploaded model.

[Request] Pretrained Weights

Hi,

  • Since you used the Transformers library from Hugging Face to write your code, can you upload the fine-tuned models used in the paper to the Hugging Face model hub?

RuntimeError: No rendezvous handler for env://

Hi, with this command:

python src/nli/training.py \
    --model_class_name "roberta-large" \
    -n 1 \
    -g 1 \
    -nr 0 \
    --max_length 156 \
    --gradient_accumulation_steps 1 \
    --per_gpu_train_batch_size 16 \
    --per_gpu_eval_batch_size 16 \
    --save_prediction \
    --train_data snli_train:none,mnli_train:none \
    --train_weights 1,1 \
    --eval_data snli_dev:none \
    --eval_frequency 2000 \
    --experiment_name "roberta-large|snli|nli"

I got this error:

Traceback (most recent call last):
  File "src/nli/training.py", line 854, in <module>
    main()
  File "src/nli/training.py", line 377, in main
    mp.spawn(train, nprocs=args.gpus_per_node, args=(args,))  # spawn how many process in this node
  File "C:\Users\1111\AppData\Roaming\Python\Python37\site-packages\torch\multiprocessing\spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "C:\Users\1111\AppData\Roaming\Python\Python37\site-packages\torch\multiprocessing\spawn.py", line 157, in start_processes
    while not context.join():
  File "C:\Users\1111\AppData\Roaming\Python\Python37\site-packages\torch\multiprocessing\spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "C:\Users\1111\AppData\Roaming\Python\Python37\site-packages\torch\multiprocessing\spawn.py", line 19, in _wrap
    fn(i, *args)
  File "F:\3233\anli\src\nli\training.py", line 425, in train
    rank=args.global_rank
  File "C:\Users\1111\AppData\Roaming\Python\Python37\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "C:\Users\1111\AppData\Roaming\Python\Python37\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://

Could you please help me with this?

RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR

With the command

$ python src/nli/training.py  --model_class_name "roberta-large" -n 1 -g 1 -nr 0 --max_length 256 --gradient_accumulation_steps 2 --per_gpu_train_batch_size 16 --per_gpu_eval_batch_size 16   --save_prediction --train_data snli_train:none,mnli_train:none     --eval_data snli_dev:none   --eval_frequency 2000    --experiment_name "roberta-large|snli|nli"

I get the error

                                                                                                                           Pro:global_rank:0|local_rank:0|node_rank:0||Print: anli_r1_dev Acc: 488 1000 0.488   | 23999/40717 [17:36<11:31, 24.17it/s]
Pro:global_rank:0|local_rank:0|node_rank:0||Print: anli_r2_dev Acc: 395 1000 0.395
Pro:global_rank:0|local_rank:0|node_rank:0||Print: anli_r3_dev Acc: 504 1200 0.42
Save to Jsonl: /home/qblocks/shan/anli/saved_models/10-07-07:30:24_roberta-base|snli+mnli+fnli+r1*10+r2*20+r3*10|nli/predictions/e(0)|i(6000)|anli_r1_dev#(0.488)|anli_r2_dev#(0.395)|anli_r3_dev#(0.42)/anli_r1_dev.jsonl
Save to Jsonl: /home/qblocks/shan/anli/saved_models/10-07-07:30:24_roberta-base|snli+mnli+fnli+r1*10+r2*20+r3*10|nli/predictions/e(0)|i(6000)|anli_r1_dev#(0.488)|anli_r2_dev#(0.395)|anli_r3_dev#(0.42)/anli_r2_dev.jsonl
Save to Jsonl: /home/qblocks/shan/anli/saved_models/10-07-07:30:24_roberta-base|snli+mnli+fnli+r1*10+r2*20+r3*10|nli/predictions/e(0)|i(6000)|anli_r1_dev#(0.488)|anli_r2_dev#(0.395)|anli_r3_dev#(0.42)/anli_r3_dev.jsonl
Iteration:  72%|█████████████████████████████████████████████████▊                   | 29374/40717 [21:40<08:22, 22.58it/s]
Epoch:   0%|                                                                                         | 0/2 [21:40<?, ?it/s]
Traceback (most recent call last):
  File "src/nli/training.py", line 852, in <module>
    main()
  File "src/nli/training.py", line 368, in main
    train(0, args)
  File "src/nli/training.py", line 630, in train
    labels=batch['y'])
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qblocks/.local/lib/python3.6/site-packages/transformers/modeling_roberta.py", line 344, in forward
    output_hidden_states=output_hidden_states,
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qblocks/.local/lib/python3.6/site-packages/transformers/modeling_bert.py", line 762, in forward
    output_hidden_states=output_hidden_states,
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qblocks/.local/lib/python3.6/site-packages/transformers/modeling_bert.py", line 439, in forward
    output_attentions,
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qblocks/.local/lib/python3.6/site-packages/transformers/modeling_bert.py", line 389, in forward
    layer_output = self.output(intermediate_output, attention_output)
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qblocks/.local/lib/python3.6/site-packages/transformers/modeling_bert.py", line 345, in forward
    hidden_states = self.dense(hidden_states)
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/qblocks/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1612, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Model Size

Wanted to ask if there is a way to reduce the model size. I was trying to use the code locally instead of Google Colab, and my RAM could not load the model.

As far as I can see, ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli is roughly 1.5 GB; is this the quantized version?

Thanks in advance.
