Hi, I am trying to reproduce your result for roberta-base on MNLI. Your res

Cannot reproduce result on MNLI about pet HOT 3 CLOSED

MatthewCYM commented on July 29, 2024

Cannot reproduce result on MNLI

Comments (3)

timoschick commented on July 29, 2024

Hi @MatthewCYM,

I think there are three differences regarding the settings:

1. alpha in our implementation is actually 1 - alpha in the paper (this is something I should definitely fix but haven't had the time to do yet). So if you want alpha = 1e-4 as in the paper, you actually need to set alpha = 1 - 1e-4 = 0.9999 (the default value). I would assume that this is the main reason for the performance differences.
2. We use fewer unlabeled examples (check out Section B.2 of the paper).
3. We train the final model not for 3 epochs, but for 5000 steps (see Table 5 in the paper). That is, you should set sc_max_steps 5000 instead of sc_num_train_epochs 3.

Additionally, as roberta-base requires much less memory, we actually didn't use gradient accumulation and instead directly set --pet_per_gpu_train_batch_size 4 --pet_per_gpu_unlabeled_batch_size 12, but this shouldn't have any impact on the results.

If fixing those three things still doesn't give you results similar to those reported in the paper, please let me know. Finally, if you want to reproduce the exact results from the paper, you may need to use v1.1.0 (--branch v1.1.0). Some things like random seed initialization and dataset shuffling are implemented a little bit differently in the current version.
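The alpha remapping from the first point can be sketched in a few lines of Python. This is only an illustration: the helper names paper_to_impl_alpha and combined_loss are hypothetical and not part of the pet code base, and the weighting in combined_loss (alpha on the classification loss, 1 - alpha on the language modeling loss) is an assumption about how the implementation combines the two terms.

```python
import math


def paper_to_impl_alpha(paper_alpha: float) -> float:
    """Map the paper's alpha to the value the implementation expects (1 - alpha)."""
    if not 0.0 <= paper_alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return 1.0 - paper_alpha


def combined_loss(ce_loss: float, mlm_loss: float, impl_alpha: float) -> float:
    """Assumed weighting: impl_alpha scales the classification (cross-entropy)
    loss and (1 - impl_alpha) scales the auxiliary language modeling loss."""
    return impl_alpha * ce_loss + (1.0 - impl_alpha) * mlm_loss


# The paper's alpha = 1e-4 corresponds to the implementation's default.
impl_alpha = paper_to_impl_alpha(1e-4)
assert math.isclose(impl_alpha, 0.9999)

# Passing 1e-4 directly would, under the assumed weighting, make the
# language modeling loss dominate instead of the classification loss.
print(combined_loss(ce_loss=1.0, mlm_loss=3.0, impl_alpha=impl_alpha))
print(combined_loss(ce_loss=1.0, mlm_loss=3.0, impl_alpha=1e-4))
```

Under that assumption, the first call stays close to the classification loss, while the second is almost entirely the language modeling loss, which would be consistent with a noticeable drop in accuracy.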

MatthewCYM commented on July 29, 2024

Hi,

Thank you for the answer! Another issue I encountered is that when I train iPET on MNLI with a single pattern, I get the following error:

Traceback (most recent call last):
  File "cli.py", line 282, in <module>
    main()
  File "cli.py", line 266, in main
    pet.train_ipet(pet_model_cfg, pet_train_cfg, pet_eval_cfg, ipet_cfg, sc_model_cfg, sc_train_cfg, sc_eval_cfg,
  File "/home/jiadong/yiming/pet/pet/modeling.py", line 191, in train_ipet
    generate_ipet_train_sets(train_data=train_data, unlabeled_data=unlabeled_data,
  File "/home/jiadong/yiming/pet/pet/modeling.py", line 679, in generate_ipet_train_sets
    subdir_train_set = generate_ipet_train_set(
  File "/home/jiadong/yiming/pet/pet/modeling.py", line 723, in generate_ipet_train_set
    logits = np.average(logits, axis=0, weights=weights)
  File "<array_function internals>", line 5, in average
  File "/home/yiming/anaconda3/envs/pet/lib/python3.8/site-packages/numpy/lib/function_base.py", line 409, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized

Do we have to use multiple patterns for the ipet?

Regards,
Matthew

timoschick commented on July 29, 2024

Yes, iPET requires at least two different patterns.
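The error in the traceback can be illustrated with a small pure-Python stand-in for a weighted average over model logits. This is a sketch, not the pet or NumPy code: weighted_average is a hypothetical helper that imitates the zero-sum check, and the idea that a single pattern leaves no other models to average over is an assumption about why the weights end up empty.

```python
from typing import List, Sequence


def weighted_average(rows: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Column-wise weighted average of per-model logits.

    Raises ZeroDivisionError when the weights sum to zero, imitating the
    check that produced the error in the traceback above.
    """
    if len(rows) != len(weights):
        raise ValueError("need exactly one weight per row of logits")
    total = sum(weights)
    if total == 0:
        raise ZeroDivisionError("Weights sum to zero, can't be normalized")
    n_cols = len(rows[0])
    return [
        sum(w * row[j] for row, w in zip(rows, weights)) / total
        for j in range(n_cols)
    ]


# Two other models contribute logits, weighted by (hypothetical) scores.
print(weighted_average([[2.0, 0.0], [0.0, 2.0]], [0.75, 0.25]))  # [1.5, 0.5]

# Assumed single-pattern case: no other models, so nothing to normalize by.
try:
    weighted_average([], [])
except ZeroDivisionError as err:
    print(f"failed as in the traceback: {err}")
```

A guard of this kind fails in the same way as the reported run; the practical fix remains the one given in the reply, namely training with at least two patterns.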
