
Comments (3)

timoschick commented on September 3, 2024

If everything else works as expected, you can ignore this error message. PET has its own truncation logic that ensures the mask token and the pattern are never truncated. Before this logic is applied, the entire sequence is tokenized without truncation, which is why some intermediate sequences are longer than the model's maximum sequence length.
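To illustrate the idea (this is a hypothetical sketch, not the actual PET code): the full text is tokenized first, and only the text portion is shortened afterwards, so the pattern and mask tokens always survive. The warning fires at the first step, before the custom truncation runs.

```python
# Hypothetical sketch of PET-style truncation (assumed names, not PET's API):
# the raw text is tokenized without truncation first, then trimmed so that
# pattern + text + mask fits in the model's maximum sequence length.

def truncate(pattern_tokens, text_tokens, mask_token, max_len):
    """Trim text_tokens so pattern + text + mask fits in max_len tokens."""
    budget = max_len - len(pattern_tokens) - 1  # reserve one slot for the mask
    return pattern_tokens + text_tokens[:budget] + [mask_token]

# The intermediate token list (here 2000 tokens) exceeds the model limit,
# which is what triggers the tokenizer's warning; the final sequence fits.
tokens = truncate(["It", "was"], ["a"] * 2000, "<mask>", max_len=512)
assert len(tokens) == 512
assert tokens[-1] == "<mask>"
```

The key point is that the over-long sequence only ever exists before truncation, so it is never fed to the model.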

from pet.

timoschick commented on September 3, 2024

Hi @Harry-hash, first, you'll need to check out the feature/genpet branch for that. There are a couple of new features that GenPET uses which, unfortunately, are not mentioned in the arXiv paper due to an ongoing anonymity period. To train a model with all of those features enabled, using the same hyperparameters as in the paper, you can use the following command:

python3 cli.py \
	--method pet \
	--wrapper_type generative \
	--pattern_ids 2 3 4 5 \
	--data_dir . \
	--model_type pegasus \
	--model_name_or_path google/pegasus-large \
	--task_name ${TASK} \
	--output_dir ${OUTPUT_DIR} \
	--train_examples ${NUM_EXAMPLES} \
	--test_examples 10000 \
	--unlabeled_examples 1000 \
	--do_eval \
	--learning_rate 1e-4 \
	--eval_set test \
	--pet_per_gpu_eval_batch_size 32 \
	--pet_per_gpu_train_batch_size 2 \
	--pet_gradient_accumulation_steps 4 \
	--output_max_seq_length ${OUTPUT_MAX_SEQ_LENGTH} \
	--pet_max_steps 250 \
	--pet_max_seq_length 512 \
	--sc_per_gpu_train_batch_size 2 \
	--sc_gradient_accumulation_steps 4 \
	--sc_per_gpu_eval_batch_size 32 \
	--sc_max_steps 250 \
	--sc_max_seq_length 512 \
	--optimizer adafactor \
	--epsilon 0.1 \
	--do_train \
	--pet_repetitions 1 \
	--train_data_seed ${TRAIN_DATA_SEED} \
	--multi_pattern_training \
	--untrained_model_scoring \
	--cutoff_percentage 0.2

Here,

  • ${TASK} is the name of the task (e.g., cnn-dailymail, see here);
  • ${OUTPUT_DIR} is the output directory;
  • ${NUM_EXAMPLES} is the number of training examples to use (in the paper, we experimented with 0, 10 and 100);
  • ${OUTPUT_MAX_SEQ_LENGTH} is the maximum length of the generated output sequence (32 for aeslc and gigaword, 64 for xsum and 128 for all other tasks);
  • ${TRAIN_DATA_SEED} is the seed used for initializing the RNG that selects the ${NUM_EXAMPLES} training examples. In the paper, we've used 0, 42 and 100.
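For concreteness, a minimal setup for the placeholders above might look like this (the values are example assumptions drawn from the options listed, not a recommended configuration):

```shell
# Example placeholder values (assumptions; adjust to your setup):
TASK=cnn-dailymail          # task name
OUTPUT_DIR=./output         # where models and results are written
NUM_EXAMPLES=10             # 0, 10 or 100 in the paper
OUTPUT_MAX_SEQ_LENGTH=128   # 32 for aeslc/gigaword, 64 for xsum, 128 otherwise
TRAIN_DATA_SEED=42          # 0, 42 or 100 in the paper

echo "training on ${NUM_EXAMPLES} examples of ${TASK} with seed ${TRAIN_DATA_SEED}"
```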

If you don't want to use the new features mentioned above, simply remove the last three flags (i.e., do not use --multi_pattern_training or --untrained_model_scoring, and do not provide a --cutoff_percentage).


Harry-hash commented on September 3, 2024

Thank you very much for your detailed instructions! @timoschick

But when I ran the code, many error messages appeared in the terminal saying "Token indices sequence length is longer than the specified maximum sequence length for this model (1070 > 1024). Running this sequence through the model will result in indexing errors". Is it because the max_length parameter is not specified somewhere during tokenization? I am using transformers==3.3.1.

