Description of the bug:
I use run_multiple_choice.py and utils_multiple_choice.py to fine-tune RoBERTa on the train set and evaluate it on the dev set, and I find it strange that the logit for every choice of every example is the same. I cannot find where my error is. Is there a parameter I forgot to set or adjust?
To reproduce the issue:
I run run_multiple_choice.py on a single GPU with
python run_multiple_choice.py \
--data_dir './MuTual/data/mutual' \
--model_type roberta \
--model_name_or_path roberta-large \
--task_name mutual \
--output_dir ./OutputRoberta \
--do_train \
--evaluate_during_training \
--overwrite_cache \
--overwrite_output_dir \
--per_gpu_train_batch_size 1
Since I pass --evaluate_during_training, the evaluation results for the three epochs are:
1st epoch
05/29/2020 20:55:47 - INFO - main - MRR: = 0.6967644845748685
05/29/2020 20:55:47 - INFO - main - R4_1 = 0.23589164785553046
05/29/2020 20:55:47 - INFO - main - R4_2: = 0.6173814898419865
05/29/2020 20:55:47 - INFO - main - eval_loss = 1.3862943617073265
2nd epoch
05/29/2020 21:18:06 - INFO - main - MRR: = 0.8475357411587667
05/29/2020 21:18:06 - INFO - main - R4_1 = 0.24379232505643342
05/29/2020 21:18:06 - INFO - main - R4_2: = 0.2652370203160271
05/29/2020 21:18:06 - INFO - main - eval_loss = 1.3862943660031568
3rd epoch
05/29/2020 21:40:22 - INFO - main - MRR: = 1.0
05/29/2020 21:40:22 - INFO - main - R4_1 = 0.23927765237020315
05/29/2020 21:40:22 - INFO - main - R4_2: = 0.23927765237020315
05/29/2020 21:40:22 - INFO - main - eval_loss = 1.3862942457199097
Because the MRR, R4_1 and R4_2 values look weird, I printed the loss and logits during evaluation, and I found that the logits for all examples and all choices are identical. For example, in the third epoch the logits are [[2.822, 2.822, 2.822, 2.822], [2.822, 2.822, 2.822, 2.822], ...]. Note also that the eval_loss is almost exactly ln(4) ≈ 1.3863, which is the cross-entropy loss of a uniform distribution over four choices, consistent with the logits being constant.
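As a sanity check (plain Python, independent of the training script), the reported eval_loss can be reproduced from the constant logits alone: with four identical logits, softmax assigns probability 1/4 to each choice, so the cross-entropy loss is -log(1/4) = log(4), no matter which choice is the gold label.

```python
import math

# Four identical logits, as observed in the third epoch.
logits = [2.822, 2.822, 2.822, 2.822]

# Softmax over the choices: each choice gets probability 1/4.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# Cross-entropy loss for any gold label is -log(1/4) = log(4).
loss = -math.log(probs[0])
print(loss)  # ~1.3862943611, matching the reported eval_loss
```

This confirms the loss values are exactly what constant logits would produce, so the model's predictions carry no information about the choices.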