Comments (3)

timoschick commented on July 29, 2024

Hi @MatthewCYM,

I think there are three differences regarding the settings:

  1. alpha in our implementation is actually 1 - alpha in the paper (this is something I should definitely fix but haven't had the time to do yet). So if you want alpha = 1e-4 as in the paper, you actually need to set alpha = 1 - 1e-4 = 0.9999 (the default value). I would assume that this is the main reason for the performance differences.

  2. We use fewer unlabeled examples (see Section B.2 of the paper).

  3. We train the final model not for 3 epochs, but for 5000 steps (see Table 5 in the paper). That is, you should set --sc_max_steps 5000 instead of --sc_num_train_epochs 3.
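
To make the first point concrete, here is a minimal sketch of the conversion between the two conventions. The helper name `paper_alpha_to_cli_alpha` is hypothetical and not part of the pet code base; it only encodes the "1 - alpha" relationship described above.

```python
def paper_alpha_to_cli_alpha(paper_alpha: float) -> float:
    """Convert alpha as defined in the paper to the value the implementation expects.

    Hypothetical helper: the implementation's alpha is 1 - (the paper's alpha).
    """
    if not 0.0 <= paper_alpha <= 1.0:
        raise ValueError("alpha must lie in the interval [0, 1]")
    return 1.0 - paper_alpha


# The paper's alpha = 1e-4 maps to roughly 0.9999, the default mentioned above.
cli_alpha = paper_alpha_to_cli_alpha(1e-4)
print(f"pass alpha = {cli_alpha:.4f} to the implementation")
```

The mapping is its own inverse, so applying the same function to an implementation value recovers the paper's value (up to floating-point rounding).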

Additionally, as roberta-base requires much less memory, we actually didn't use gradient accumulation and instead directly set --pet_per_gpu_train_batch_size 4 --pet_per_gpu_unlabeled_batch_size 12, but this shouldn't have any impact on the results.

If fixing those three things still doesn't give you results similar to those reported in the paper, please let me know. Finally, if you want to reproduce the exact results from the paper, you may need to use v1.1.0 (--branch v1.1.0). Some things like random seed initialization and dataset shuffling are implemented a little bit differently in the current version.

MatthewCYM commented on July 29, 2024

Hi,

Thank you for the answer! Another issue I encountered is that when I train iPET on MNLI with a single pattern, I get the following error:

Traceback (most recent call last):
  File "cli.py", line 282, in <module>
    main()
  File "cli.py", line 266, in main
    pet.train_ipet(pet_model_cfg, pet_train_cfg, pet_eval_cfg, ipet_cfg, sc_model_cfg, sc_train_cfg, sc_eval_cfg,
  File "/home/jiadong/yiming/pet/pet/modeling.py", line 191, in train_ipet
    generate_ipet_train_sets(train_data=train_data, unlabeled_data=unlabeled_data,
  File "/home/jiadong/yiming/pet/pet/modeling.py", line 679, in generate_ipet_train_sets
    subdir_train_set = generate_ipet_train_set(
  File "/home/jiadong/yiming/pet/pet/modeling.py", line 723, in generate_ipet_train_set
    logits = np.average(logits, axis=0, weights=weights)
  File "<array_function internals>", line 5, in average
  File "/home/yiming/anaconda3/envs/pet/lib/python3.8/site-packages/numpy/lib/function_base.py", line 409, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized

Do we have to use multiple patterns for iPET?

Regards,
Matthew

timoschick commented on July 29, 2024

Yes, iPET requires at least two different patterns.
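
As a rough illustration of why a single pattern ends in the ZeroDivisionError from the traceback, here is a pure-Python sketch. It assumes that each pattern's new training set is labeled by averaging the logits of the other patterns' models, so with one pattern there is nothing left to average and the weights sum to zero. Both `weighted_average` and `check_enough_patterns` are hypothetical helpers written for this example, not functions from pet or NumPy.

```python
from typing import List, Sequence


def weighted_average(rows: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Column-wise weighted average that fails the same way as the traceback above."""
    if len(rows) != len(weights):
        raise ValueError("need exactly one weight per row")
    total = sum(weights)
    if total == 0:
        # Also covers the empty case: no rows means no weights, so the sum is 0.
        raise ZeroDivisionError("Weights sum to zero, can't be normalized")
    n_cols = len(rows[0])
    return [sum(w * row[j] for row, w in zip(rows, weights)) / total for j in range(n_cols)]


def check_enough_patterns(pattern_ids: Sequence[int]) -> None:
    """Hypothetical early check giving a clearer error than the ZeroDivisionError."""
    if len(set(pattern_ids)) < 2:
        raise ValueError("iPET requires at least two different patterns")


# Two models' logits for one example, equally weighted: averaging works.
print(weighted_average([[1.0, 3.0], [3.0, 5.0]], [1.0, 1.0]))

# With a single pattern (assumed: no other models to draw on), nothing can be averaged.
try:
    weighted_average([], [])
except ZeroDivisionError as err:
    print(f"ZeroDivisionError: {err}")
```

Calling something like `check_enough_patterns` before training would surface the requirement immediately instead of failing after the first iPET generation.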
