
Comments (3)

timoschick commented on September 3, 2024

If everything else works as expected, you can ignore this error message. PET has its own truncation logic that ensures the mask token and the pattern are never truncated. Before this logic is applied, the entire sequence is tokenized without truncation, which is why some intermediate sequences are longer than the model's maximum sequence length.
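To illustrate the idea (this is a hypothetical sketch, not the actual PET code): the full text is tokenized first, and only the text portion is shortened afterwards, so the pattern and mask tokens always survive. The warning fires at the first step, before the custom truncation runs.

```python
# Hypothetical sketch of PET-style truncation (assumed names, not PET's API):
# the raw text is tokenized without truncation first, then trimmed so that
# pattern + text + mask fits in the model's maximum sequence length.

def truncate(pattern_tokens, text_tokens, mask_token, max_len):
    """Trim text_tokens so pattern + text + mask fits in max_len tokens."""
    budget = max_len - len(pattern_tokens) - 1  # reserve one slot for the mask
    return pattern_tokens + text_tokens[:budget] + [mask_token]

# The intermediate token list (here 2000 tokens) exceeds the model limit,
# which is what triggers the tokenizer's warning; the final sequence fits.
tokens = truncate(["It", "was"], ["a"] * 2000, "<mask>", max_len=512)
assert len(tokens) == 512
assert tokens[-1] == "<mask>"
```

The key point is that the over-long sequence only ever exists before truncation, so it is never fed to the model.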

from pet.

timoschick commented on September 3, 2024

Hi @Harry-hash, first, you'll need to check out the feature/genpet branch for that. There are a couple of new features that GenPET uses which, unfortunately, are not mentioned in the arXiv paper due to an ongoing anonymity period. To train a model with all of those features enabled, using the same hyperparameters as in the paper, you can use the following command:

python3 cli.py \
	--method pet \
	--wrapper_type generative \
	--pattern_ids 2 3 4 5 \
	--data_dir . \
	--model_type pegasus \
	--model_name_or_path google/pegasus-large \
	--task_name ${TASK} \
	--output_dir ${OUTPUT_DIR} \
	--train_examples ${NUM_EXAMPLES} \
	--test_examples 10000 \
	--unlabeled_examples 1000 \
	--do_eval \
	--learning_rate 1e-4 \
	--eval_set test \
	--pet_per_gpu_eval_batch_size 32 \
	--pet_per_gpu_train_batch_size 2 \
	--pet_gradient_accumulation_steps 4 \
	--output_max_seq_length ${OUTPUT_MAX_SEQ_LENGTH} \
	--pet_max_steps 250 \
	--pet_max_seq_length 512 \
	--sc_per_gpu_train_batch_size 2 \
	--sc_gradient_accumulation_steps 4 \
	--sc_per_gpu_eval_batch_size 32 \
	--sc_max_steps 250 \
	--sc_max_seq_length 512 \
	--optimizer adafactor \
	--epsilon 0.1 \
	--do_train \
	--pet_repetitions 1 \
	--train_data_seed ${TRAIN_DATA_SEED} \
	--multi_pattern_training \
	--untrained_model_scoring \
	--cutoff_percentage 0.2

Here,

  • ${TASK} is the name of the task (e.g., cnn-dailymail, see here);
  • ${OUTPUT_DIR} is the output directory;
  • ${NUM_EXAMPLES} is the number of training examples to use (in the paper, we experimented with 0, 10 and 100);
  • ${OUTPUT_MAX_SEQ_LENGTH} is the maximum length of the generated output sequence (32 for aeslc and gigaword, 64 for xsum and 128 for all other tasks);
  • ${TRAIN_DATA_SEED} is the seed used for initializing the RNG that selects the ${NUM_EXAMPLES} training examples. In the paper, we've used 0, 42 and 100.
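For concreteness, a minimal setup for the placeholders above might look like this (the values are example assumptions drawn from the options listed, not a recommended configuration):

```shell
# Example placeholder values (assumptions; adjust to your setup):
TASK=cnn-dailymail          # task name
OUTPUT_DIR=./output         # where models and results are written
NUM_EXAMPLES=10             # 0, 10 or 100 in the paper
OUTPUT_MAX_SEQ_LENGTH=128   # 32 for aeslc/gigaword, 64 for xsum, 128 otherwise
TRAIN_DATA_SEED=42          # 0, 42 or 100 in the paper

echo "training on ${NUM_EXAMPLES} examples of ${TASK} with seed ${TRAIN_DATA_SEED}"
```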

If you don't want to use the new features mentioned above, simply remove the last three flags (i.e., do not use --multi_pattern_training or --untrained_model_scoring, and do not provide a --cutoff_percentage).


Harry-hash commented on September 3, 2024

Thank you very much for your detailed instructions! @timoschick

But when I ran the code, many error messages appeared in the terminal saying "Token indices sequence length is longer than the specified maximum sequence length for this model (1070 > 1024). Running this sequence through the model will result in indexing errors". Is it because the max_length parameter is not specified somewhere during tokenization? I am using transformers==3.3.1.

