
prophetnet's Introduction

A research project for natural language generation, containing the official implementations by MSRA NLC team.

ProphetNet: pretrained natural language generation models with future information

GLGE_baselines: natural language generation benchmark baselines

JGR: joint generator-ranker learning for natural language generation

GENIE: pretrained diffusion natural language generation models with continuous paragraph denoising

AR-Diffusion: auto-regressive diffusion model for text generation

CRITIC: LLMs validate and rectify themselves through interaction with external tools


Microsoft Open Source Code of Conduct

prophetnet's People

Contributors

dayihengliu, lx865712528, manavr123, qiweizhen, shenwzh3, v-twu, v-weizqi, weizhenq, yuyan2do, zubingou


prophetnet's Issues

CUDA OOM issue in fine-tuning

Hi, I am trying to fine-tune the 160GB pre-trained model on a custom dataset with the command below. Even with the smallest settings it goes out of memory after a few update steps. I also removed --fp16, but I don't see any memory improvement.

I tried max-source-positions of 512, 768, and 1024; update-freq of 1, 2, 4, and 8; batch-size of 1, 2, 4, and 8; and --fp16 enabled/disabled. When I remove --tensorboard-logdir $TENSORBOARD_LOGDIR it works, but I can't go beyond a batch size of 2, so training is slow overall.

Are there any other settings needed for a multi-GPU run?
I am wondering how this ran on 8 * NVIDIA V100 (16GB) GPUs with the settings given in the README.
Let me know.

OS - Ubuntu 16.04
CUDA - 10
Machine - 4x T4 GPUs (16GB each), AWS g4dn.12xlarge instance
Libraries:
pytorch-transformers==1.2.0
torch==1.4.0
fairseq==0.9.0

CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train --user-dir $USER_DIR --task translation_prophetnet \
--arch $ARCH --optimizer adam --adam-betas '(0.9, 0.999)' --clip-norm 0.1 --lr 0.00001 --min-lr 1e-09 \
--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 2048 --dropout 0.1 --attention-dropout 0.1 \
--weight-decay 0.01 --criterion $CRITERION --label-smoothing 0.1 --update-freq 1 --max-tokens 1024 \
--num-workers 40 --load-from-pretrained-model $PRETRAINED_MODEL --ddp-backend=no_c10d --max-epoch 10 \
--max-source-positions 768 --max-target-positions 256 --skip-invalid-size-inputs-valid-test --save-dir $SAVE_DIR \
--keep-last-epochs 10 --tensorboard-logdir $TENSORBOARD_LOGDIR $DATA_DIR \
--save-interval-updates 1000 --batch-size 1
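
As a generic PyTorch aid (not something from ProphetNet or fairseq itself), a probe like the following can be dropped into the training process to see how close each setting gets to the 16GB limit; treat it as a sketch, since it has to run inside the training script rather than from the command line:

import torch

def report_peak_gpu_memory():
    # Peak allocation per visible GPU since the start of the process;
    # call this near the point where the OOM usually happens.
    for d in range(torch.cuda.device_count()):
        peak_gib = torch.cuda.max_memory_allocated(d) / 1024 ** 3
        print(f"cuda:{d} peak allocated: {peak_gib:.2f} GiB")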

About the time cost of pre-training on the Wikipedia corpus

Hi, it is an awesome model and I am going to reproduce the results end to end.
Before I start working on this, can you tell me how many hours pre-training will take on my own hardware (dataset: the 16GB Wikipedia + BookCorpus corpus for 64 epochs; hardware: 8 * NVIDIA V100 (32GB))?

Evaluating causes an "infer language pair" error on the pretrained cnndm model

fairseq-generate cnndm/processed --path /e/workspace/ProphetNet/a.pt --user-dir prophetnet --task translation_prophetnet --batch-size 32 --gen-subset test --beam 5 --num-workers 4 --min-len 45 --max-len-b 110  --no-repeat-ngram-size 3 --lenpen 1.2 2>&1 > cnndm/output-ck9-pelt1.2-test-beam5.txt

I am using the above command for inference and evaluation, but it raises an error with the pre-trained CNN/Daily Mail model:

Traceback (most recent call last):
  File "D:\windows_program\conda\envs\p\Scripts\fairseq-generate-script.py", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-generate')())
  File "e:\fairseq\fairseq_cli\generate.py", line 270, in cli_main
    main(args)
  File "e:\fairseq\fairseq_cli\generate.py", line 36, in main
    return _main(args, sys.stdout)
  File "e:\fairseq\fairseq_cli\generate.py", line 57, in _main
    task = tasks.setup_task(args)
  File "e:\fairseq\fairseq\tasks\__init__.py", line 17, in setup_task
    return TASK_REGISTRY[args.task].setup_task(args, **kwargs)
  File "e:\fairseq\fairseq\tasks\translation.py", line 226, in setup_task
    raise Exception('Could not infer language pair, please provide it explicitly')
Exception: Could not infer language pair, please provide it explicitly

but in the fairseq-generate docs there are no such arguments (a small check is sketched below the version info)

fairseq 0.9.0
torch 1.5.1
model prophetnet_large_160G_cnndm_model.pt
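
As a small check (this uses a fairseq 0.9 internal helper, so treat it as a debugging aid rather than a documented API), the translation task infers the language pair from file names such as test.src-tgt.src.bin in the data directory:

from fairseq.data import data_utils

# Returns e.g. ('src', 'tgt') if files like test.src-tgt.src.idx exist in the
# directory; (None, None) reproduces the "Could not infer language pair" failure.
print(data_utils.infer_language_pair("cnndm/processed"))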

How to use the pretraining task of ProphetNet

I want to use the pretraining task of ProphetNet, which recovers the masked spans of the input sentence.
I followed Figure 1 in the paper.

For example, the input is But I [MASK][MASK] my life for some lovin\' and some gold and I only try to recover the first [MASK] (the sentence is from the pretraining corpus BookCorpus).
I use the following code based on HuggingFace:

from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration

# 'prophetnet' is a local directory holding the converted HuggingFace checkpoint
tokenizer = ProphetNetTokenizer.from_pretrained('prophetnet')
model = ProphetNetForConditionalGeneration.from_pretrained('prophetnet')

# the sentence is from the pretraining corpus BookCorpus
input_ids = tokenizer('But I traded all my life for some lovin\' and some gold', return_tensors="pt")['input_ids']
mask_id = input_ids[0][2]                    # id of the original token at position 2 ("traded"), kept as the target
input_ids[0][2:4] = tokenizer.pad_token_id   # mask the span "traded all" with the pad token

# decoder prefix following Figure 1 of the paper
decoder_input_ids = tokenizer('[MASK][MASK] I', return_tensors="pt")['input_ids']
# the way of MASS: decoder_input_ids = tokenizer('[MASK][MASK][MASK]', return_tensors="pt")['input_ids']

output = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
probs = output.logits[0][2]                  # logits at decoder position 2

# the rank of the target word in the vocabulary
print((probs[mask_id] < probs).sum())

However, the rank of traded is 15182 among the 30522 words in the vocabulary.
I also tried different masked words and masked spans, but the results are all unexpected.

So, I want to ask whether my way of recovering the mask contains some error.

ValueError: offset must be non-negative and no greater than buffer length (59260)

fairseq-generate $DATA_DIR --path $CHECK_POINT --user-dir prophetnet --task translation_prophetnet --batch-size 16 --gen-subset test --beam $BEAM --num-workers 4 --min-len 16 --max-len-b 120 --no-repeat-ngram-size 3 --lenpen $LENPEN 2>&1 > $OUTPUT_FILE
0%| | 0/15 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/bin/fairseq-generate", line 11, in <module>
    load_entry_point('fairseq==0.9.0', 'console_scripts', 'fairseq-generate')()
  File "/opt/conda/lib/python3.7/site-packages/fairseq_cli/generate.py", line 199, in cli_main
    main(args)
  File "/opt/conda/lib/python3.7/site-packages/fairseq_cli/generate.py", line 94, in main
    for sample in t:
  File "/opt/conda/lib/python3.7/site-packages/tqdm/std.py", line 1081, in __iter__
    for obj in iterable:
  File "/opt/conda/lib/python3.7/site-packages/fairseq/data/iterators.py", line 36, in __iter__
    for x in self.iterable:
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/fairseq/data/language_pair_dataset.py", line 183, in __getitem__
    tgt_item = self.tgt[index] if self.tgt is not None else None
  File "/opt/conda/lib/python3.7/site-packages/fairseq/data/indexed_dataset.py", line 475, in __getitem__
    np_array = np.frombuffer(self._bin_buffer, dtype=self._index.dtype, count=size, offset=ptr)
ValueError: offset must be non-negative and no greater than buffer length (59260)

Abstractive summarization using ProphetNet

How can I change the number of sentences in the summary predicted by ProphetNet?
I want to control both the length and the number of sentences for different use cases @yuyan2do. It would be really helpful if you could explain how.
Thanks

RuntimeError: unexpected EOF. Corrupted File?

Hello,

I performed the following:

  1. Cloned the ProphetNet repository
  2. Installed torch and fairseq
  3. Downloaded the ProphetNet-large-160GB pre-trained model
  4. Downloaded the CNN/DM data
  5. Preprocessed the CNN/DM data via preprocess_cnn_dm.py
  6. Used fairseq-preprocess to generate binaries

When I run fairseq-train, or fairseq-generate for inference, I get the following errors:
Train

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-train", line 11, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/train.py", line 333, in cli_main
    main(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/train.py", line 51, in main
    model = task.build_model(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/tasks/fairseq_task.py", line 185, in build_model
    return models.build_model(args, self)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/models/__init__.py", line 48, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/workspace/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 147, in build_model
    states = torch.load(args.load_from_pretrained_model, map_location='cpu')
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 709, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 1092436 more bytes. The file might be corrupted.

Inference

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 151, in load_checkpoint_to_cpu
    from fairseq.fb_pathmgr import fb_pathmgr
ModuleNotFoundError: No module named 'fairseq.fb_pathmgr'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-generate", line 11, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/generate.py", line 199, in cli_main
    main(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/generate.py", line 47, in main
    task=task,
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 179, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(filenames, arg_overrides, task)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 190, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 160, in load_checkpoint_to_cpu
    path, map_location=lambda s, l: default_restore_location(s, "cpu")
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 709, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 5239485 more bytes. The file might be corrupted.

Inputs:

Train

fairseq-train \
--fp16 \
--user-dir ./prophetnet --task translation_prophetnet --arch ngram_transformer_prophet_large \
--optimizer adam --adam-betas '(0.9, 0.999)' --clip-norm 0.1 \
--lr 0.0001 \
--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 1000 \
--dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
--criterion ngram_language_loss --label-smoothing 0.1 \
--update-freq 32  --max-sentences 2 \
--num-workers 4 \
--load-from-pretrained-model ../prophetnet_large_pretrained_160G_14epoch_model.pt \
--load-sep \
--ddp-backend=no_c10d --max-epoch 10 \
--max-source-positions 512 --max-target-positions 512 \
--skip-invalid-size-inputs-valid-test \
--seed 1 \
--save-dir ./cnndm/finetune_cnndm_checkpoints \
--keep-last-epochs 10 \
--tensorboard-logdir ./cnndm/finetune_cnndm_tensorboard \
./cnndm/processed

Inference

fairseq-generate \
./cnndm/processed \
--path ../prophetnet_large_pretrained_16G_64epoch_model.pt \
--user-dir prophetnet \
--task translation_prophetnet \
--batch-size 32 \
--gen-subset test \
--beam 5 \
--num-workers 4 \
--min-len 45 \
--max-len-b 110 \
--no-repeat-ngram-size 3 --lenpen 1.2 2>&1 > ../logs.output

Any idea how to handle this? Thank you.

Attention Maps

Is there any way to visualize the attention maps for the outputs during generation?
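
One possible route (a sketch against the HuggingFace port rather than this repo's fairseq code; the checkpoint name and the output attribute names are assumptions to verify against your transformers version) is to request attention tensors from a forward pass and plot them:

import torch
from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration

tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased")
model = ProphetNetForConditionalGeneration.from_pretrained("microsoft/prophetnet-large-uncased")

inputs = tokenizer("prophetnet predicts future n-grams .", return_tensors="pt")
decoder_inputs = tokenizer("it predicts", return_tensors="pt")

with torch.no_grad():
    outputs = model(
        input_ids=inputs["input_ids"],
        decoder_input_ids=decoder_inputs["input_ids"],
        output_attentions=True,   # ask the model to return attention tensors
        return_dict=True,
    )

# One tensor per layer, each of shape (batch, heads, tgt_len, src_len);
# these can be rendered as heat maps with matplotlib.
cross_attn = outputs.cross_attentions
print(len(cross_attn), cross_attn[0].shape)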

Wrong Tokenization in SquadQG Evaluation Scripts

Thanks for the great work.

I am reproducing the results reported in GLGE, but I find that the SquadQG evaluation script seems to use wrong tokenization.

In /script/evaluate/qg/eval_on_unilm_qg.py, the generated text is post-processed by fix_tokenization:

def fix_tokenization(text):
    input_tokens = text.split()
    output_tokens = []
    has_left_quote = False
    has_left_single_quote = False
    i = 0
    prev_dash = False
    while i < len(input_tokens):
        tok = input_tokens[i]
        flag_prev_dash = False
        if tok in _tok_dict.keys():
            output_tokens.append(_tok_dict[tok])
            i += 1
        elif tok == "\"":
            if has_left_quote:
                output_tokens.append("''")
            else:
                output_tokens.append("``")
            has_left_quote = not has_left_quote
            i += 1
        elif tok == "'" and len(output_tokens) > 0 and output_tokens[-1].endswith("n") and i < len(input_tokens) - 1 and input_tokens[i + 1] == "t":
            output_tokens[-1] = output_tokens[-1][:-1]
            output_tokens.append("n't")
            i += 2
        elif tok == "'" and i < len(input_tokens) - 1 and input_tokens[i + 1] in ("s", "d", "ll"):
            output_tokens.append("'"+input_tokens[i + 1])
            i += 2
        elif tok == "'":
            if has_left_single_quote:
                output_tokens.append("'")
            else:
                output_tokens.append("`")
            has_left_single_quote = not has_left_single_quote
            i += 1
        elif tok == "." and i < len(input_tokens) - 2 and input_tokens[i + 1] == "." and input_tokens[i + 2] == ".":
            output_tokens.append("...")
            i += 3
        elif tok == "," and len(output_tokens) > 0 and _is_digit(output_tokens[-1]) and i < len(input_tokens) - 1 and _is_digit(input_tokens[i + 1]):
            # $ 3 , 000 -> $ 3,000
            output_tokens[-1] += ','+input_tokens[i + 1]
            i += 2
        elif tok == "." and len(output_tokens) > 0 and output_tokens[-1].isdigit() and i < len(input_tokens) - 1 and input_tokens[i + 1].isdigit():
            # 3 . 03 -> $ 3.03
            output_tokens[-1] += '.'+input_tokens[i + 1]
            i += 2
        elif tok == "." and len(output_tokens) > 0 and len(output_tokens[-1]) == 1 and output_tokens[-1].isupper() and i < len(input_tokens) - 2 and len(input_tokens[i + 1]) == 1 and input_tokens[i + 1].isupper() and input_tokens[i + 2] == '.':
            # U . N . -> U.N.
            k = i+3
            while k+2 < len(input_tokens):
                if len(input_tokens[k + 1]) == 1 and input_tokens[k + 1].isupper() and input_tokens[k + 2] == '.':
                    k += 2
                else:
                    break
            output_tokens[-1] += ''.join(input_tokens[i:k])
            i += 2
        elif tok == "-":
            if i < len(input_tokens) - 1 and input_tokens[i + 1] == "-":
                output_tokens.append("--")
                i += 2
            elif i == len(input_tokens) - 1 or i == 0:
                output_tokens.append("-")
                i += 1
            elif output_tokens[-1] not in string.punctuation and input_tokens[i + 1][0] not in string.punctuation:
                output_tokens[-1] += "-"
                i += 1
                flag_prev_dash = True
            else:
                output_tokens.append("-")
                i += 1
        elif prev_dash and len(output_tokens) > 0 and tok[0] not in string.punctuation:
            output_tokens[-1] += tok
            i += 1
        else:
            output_tokens.append(tok)
            i += 1
        prev_dash = flag_prev_dash
    return " ".join(output_tokens)

For example, it turns . . . into ..., " into '', and 1 , 000 into 1,000.

However, the original data does not look like the sentences after fix_tokenization. Here are some samples from the test set:

What did Harff define as " short - lived outbursts by mobs . . . ? "
Who sang " Girls Love Beyoncé " in 2013 ?
What city in Montana has over 100 , 000 people ?

Moreover, I reproduced MASS-base and found that the results are higher if we disable fix_tokenization:

Model                                            BLEU   METEOR  ROUGE-L
MASS-base reported in GLGE                       20.1   24.4    49.4
MASS-base reproduced with fix_tokenization      20.69  24.92   49.21
MASS-base reproduced without fix_tokenization   22.54  25.03   50.27

I wonder whether I am missing something, or whether the reported results use incorrect tokenization.
I also hope that, if possible, the model outputs can be released to support fair and detailed comparisons.

Looking forward to your reply.

Variables needed for gradient computation have been modified by an in-place operation

The README.md suggests using torch version 1.3.0, but that version no longer seems to be listed among the previous PyTorch releases (link).

So I used the latest version of torch (1.7.1), and when I start training I get this RuntimeError.
I found that the error is raised at prophetnet/ngram_multihead_attention.py, line 255:

q = q * self.scaling

It looks like this operation is no longer allowed, so I fixed the problem as follows:

q_ = q * self.scaling

if self.bias_k is not None:
    assert self.bias_v is not None
    k = torch.cat([k, self.bias_k.repeat(1, bsz, 1)])
    v = torch.cat([v, self.bias_v.repeat(1, bsz, 1)])
    q = q_.contiguous().view(tgt_len, bsz * self.num_heads, self.head_dim).transpose(0, 1)

Is it possible to run ProphetNet on GPUs with 11GB of memory?

I tried to run ProphetNet on a 2080 Ti (11GB of memory) for the question generation task. However, even with max-sentences set to 1, it still runs out of memory. So I wonder whether it is possible to run this model on an 11GB GPU, since it has a similar structure and size to other pretrained models like BERT and UniLM, which I can run on 11GB GPUs.

Selecting additional scoring methods for fine-tuning

We have started to fine-tune the ProphetNet model on a custom dataset, using fairseq v0.9.0. Currently only perplexity is reported during training, but we would also like to validate the trained model on the BLEU-4, METEOR, and ROUGE metrics.
Can anyone provide any insights on this?
The "--scoring" parameter is not supported in fairseq v0.9.

How to pretrain for other languages

Hello, thanks for sharing this awesome paper.
I want to pretrain this model on Korean, but the project does not provide pretraining code.
If you don't mind me asking, would you provide the pretraining code?

Thanks :)

Truncated source text during inference

I fine-tuned the pre-trained ProphetNet model for 1 epoch on my own dataset on Google Colab for a summarization task.
For inference I used:

!fairseq-interactive processed \
--path $CHECK_POINT \
--user-dir prophetnet \
--max-source-positions 6000 --max-target-positions 512 \
--task translation_prophetnet \

Output:

Namespace(beam=5, bpe=None, buffer_size=1, cpu=False, criterion='cross_entropy', data='processed', dataset_impl=None, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, empty_cache_freq=0, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', input='-', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1, load_alignments=False, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=1, max_source_positions=6000, max_target_positions=512, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', path='drive/My Drive/deep_learning/nlp/covid19/prophetNet/finetune_checkpoints/checkpoint1.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, raw_text=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang=None, target_lang=None, task='translation_prophetnet', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, truncate_source=False, unkpen=0, unnormalized=False, upsample_primary=1, user_dir='prophetnet', warmup_updates=0, weight_decay=0.0)
| [src] dictionary: 30522 types
| [tgt] dictionary: 30522 types
| loading model(s) from drive/My Drive/deep_learning/nlp/covid19/prophetNet/finetune_checkpoints/checkpoint1.pt
tcmalloc: large alloc 1587781632 bytes == 0x34748000 @  0x7fe0e3b01b6b 0x7fe0e3b21379 0x7fe098a7f04e 0x7fe098a80f4a 0x7fe0d196e0c4 0x7fe0dfbbb5d9 0x551b15 0x5aa6ec 0x50abb3 0x50c5b9 0x508245 0x509642 0x5a55a1 0x5a58f8 0x4e07ee 0x50abe1 0x50c5b9 0x508245 0x58958c 0x5a067e 0x50d966 0x508245 0x50a080 0x50aa7d 0x50d390 0x508245 0x50a080 0x50aa7d 0x50c5b9 0x508245 0x50a080
tcmalloc: large alloc 1587781632 bytes == 0x93182000 @  0x7fe0e3b01b6b 0x7fe0e3b21379 0x7fe098a7f04e 0x7fe098a80f4a 0x7fe0d196e0c4 0x7fe0dfbbb5d9 0x551b15 0x5aa6ec 0x50abb3 0x50c5b9 0x508245 0x509642 0x5a55a1 0x5a58f8 0x4e07ee 0x50abe1 0x50c5b9 0x508245 0x58958c 0x5a067e 0x50d966 0x508245 0x50a080 0x50aa7d 0x50d390 0x508245 0x50a080 0x50aa7d 0x50c5b9 0x508245 0x50a080
| Type the input sentence and press return:
ion channels are integral membrane proteins involved in specialized physiological functions demanding a precise control of the membrane permeability as regards the exchange of water molecules, ions and even small solutes (metabolites and antibiotics) [1] [2] [3] . the modulation of channel current occurs in response to a diversity of cellular signals including changes in voltage across the cell membrane (voltage-gated ion channels), chemical stimulus (ligand-gated ion channels, phosphorylation), changes in temperature, mechanical deformation and interaction with other molecules in the cell. the physiological significance of some of these mechanisms reported in vitro has been questioned because they require extreme conditions hard to meet in vivo (unrealistic high voltages, non-physiological concentrations, etc.) [4] . accordingly, many studies have focused on the role of solution acidity [5, 6] , an elementary factor that crucially regulates ion channel activity, extensively studied both in vivo and in vitro [5] [6] [7] [8] . relevant examples of pore function modulation by ph include potassium and sodium channels, chloride channels, the mitochondrial voltage-dependent anion channel (vdac) or bacterial porins of the outer membrane of gram-negative bacteria (ompf, ompc, phoe of escherichia coli ) [7, 8] , among others.\nnarrow channels have pore dimensions comparable to the size of the permeating ions. this means that protons could block these channels current by steric reasons just occluding the channel eyelet [4] . in contrast, wide pores allowing the simultaneous passage of waa e-mail: [email protected] ter molecules and hydrated ions require more sophisticated mechanisms: protons regulate the channel conductance in a gradual way via complex networks of titratable residues involving inter-and intramolecular interactions [5] . recent studies show also that either narrow or wide channels may use hydrophobic gating to regulate ion transport across them [9] . efforts to understand those molecular interactions in ion channels are driven by the fact that proteins are highly cooperative structures [10, 11] . cooperative interactions are important factors for certain protein functions and imply some sort of communication among the system\'s components that allows either for a decisive response over a limited range of concentrations (positive cooperativity) or for a response that is less decisive but also less restricted with regard to concentration of the ligand (negative cooperativity) [12] .\nwe focus here on the changes in the ionic selectivity of membrane channels with ph, an issue still unaddressed by available all-atom md simulations and only partially explained by lower resolution mean field approaches [13] [14] [15] [16] . taking advantage of the fact that selectivity vs. ph curves display characteristic "sigmoidal dose response" shape [1, 17] we apply the hill formalism [18] , which is commonly used in biochemistry and pharmacology to analyze binding or kinetic data [19] . one could argue that proteins having a large number of ionizable residues (usually more than 100) would routinely present apparent cooperativity, reflecting the superposition of independent residue titrations rather than genuine cooperative mechanisms [20, 21] . we examine data from previous articles and from original experiments to show that this is not the case. . experimental data are taken from ref. [24] . the solid lines correspond to the fitting to eq. (1). 
and the sars-cov e) that exhibit contrasting cooperative features. later we discuss experiments where is n < 1, indicating negative cooperativity. in this case we aim to discriminate between actual physical interactions (as it is always the case for positive cooperativity) and apparent cooperativity (the so called spurious cooperativity).\nwild-type ompf, kindly provided by dr. s. bezrukov (nih, bethesda, usa), was isolated and purified from an e. coli culture. mutants d113c and d113r [22] were a generous gift from dr. h. miedema (wetsus, the netherlands). planar membranes were formed by the apposition of monolayers across orifices with diameters of 70-100 μm on a 15 μm thick teflon partition using diphytanoyl phosphatidylcholine. the orifices were pre-treated with a 1% solution of hexadecane in pentane. an electric potential was applied using ag/agcl electrodes in 2 m kcl, 1.5% agarose bridges assembled within standard 250 ml pipette tips. the potential was defined as positive when it was higher on the side of the protein addition (the cis side of the membrane chamber), whereas the trans side was set to ground. an axopatch 200b amplifier (molecular devices, sunnyvale, ca) in the voltage-clamp mode was used to measure the current and applied potentials. the chamber and the head stage were isolated from external noise sources with a double metal screen (amuneal manufacturing corp., philadelphia, pa). the ph was adjusted by adding hcl or koh and controlled during the experiments with a glp22 ph meter (crison instruments, barcelona). measurements were obtained at t = (23.0 ± 1.5) • c. the reversal potential measurements were corrected with the liquid junction potential calculated from henderson\'s equation, as described in detail elsewhere [23] .\nwhen a concentration gradient is set between both sides of the membrane, a net flux of ions through membrane pores (and hence an electric current) appears. the sign and magnitude of the applied voltage that is needed to make zero the electric current (the so-called reversal potential, v rev ) reveals the preferential passage of either positive or negative ions. in most ion channels the reversal potential changes substantially with the solution ph [24] [25] [26] [27] , as shown in fig. 1 with two different systems, namely the sars-cov e protein channel [28] ( fig. 1(a) ) and the pora protein (n. meningitidis) [24] ( fig. 1(b) ). in both cases, the channel discrimination for ions turns from weak cationic selectivity at neutral ph into anionic selectivity in acidic solutions. this can be explained considering that when the ph decreases, more and more acidic groups become protonated and the effective charge of the channel changes from negative to positive [24, 29] . we use the hill formalism to obtain information of how solution acidity regulates v rev . the theoretical curves fitted to the reversal potential data use the form [6, 13] \nin the two panels of fig. 1 we find a common pattern, the hill coefficient is slightly higher than 1 (positive cooperativity). this suggests that these proteins have developed high sensitivity mechanisms aiming to detect minimal changes in their environment [13] . furthermore, the effective pk of both curves (the ph that provokes a response halfway between the baseline (bottom) and maximum (top)) lies between 4 and 4.5, which is comparable to the typical pka of acidic residues (pka ∼ 4.4 and 4.0 for glutamic and aspartic acids, respectively) [14, 17, 24] . 
the similarities between the two panels are thought-provoking because the sars-co v e and the pora most probably have very different pore arrangement. the sars-cov e protein forms proteolipidic channels [29] . lipid molecules assemble with e proteins to form a combined tight arrangement in which the actual number of e monomers is unknown. experiments with different membrane compositions indicate that the protonation of residues in the transmembrane protein domain of e protein is not affected by the charge of the lipid polar heads [28] . therefore, positive cooperativity in this case fits with its canonical meaning in well-known oligomeric structures like hemoglobin [18, 30] : it most likely arises from the interaction between protein monomers. in contrast, the pora forms monomeric proteinaceous channels located in the outer membrane of neisseria meningitidis. in other monomeric proteins positive cooperativity has been linked either to interactions between distinct binding domains behaving as functional subunits (recoverin) or to concerted conformational changes (vdac) [13] . it is tempting to speculate that the interactions between matching clusters of charges acting as selectivity filter of the channel [24] may have cooperative nature, although the question remains open since no crystallographic structure of any complete pora protein has been resolved up to date.\nthe considerations made in the previous section emphasize the usefulness of the hill formalism as diagnostic tool to detect subtle inter-subunit or inter-domain communication in membrane proteins displaying positive cooperativity. however, in other protein channels showing negative cooperativity the analysis could be much more demanding. in this sense, the experiments performed in the bacterial porin ompf from e. coli, shown in fig. 2 (a) can be considered a case study. all measured curves show negative cooperativity (n < 1) but with the particularities that both the hill coefficient and the effective pk of the curves decrease significantly as salt concentration is increased. remarkably, diluted solutions show almost no cooperativity, as shown in the inset of fig. 2(a) . the question that we aim to investigate here is whether this negative cooperativity is genuine or it is a meaningless mathematical artifact that appears because of the superposition of independent titrations. this effect is illustrated in fig. 2(b) for the superposition of four independent and non-cooperative (n = 1) titration curves (lines) with pk from 3.5 to 5.0. the resulting superposed curve (circles) does present negative cooperativity (n = 0.75) with an averaged pk = 4.25. of note, the superposition of independent titrations can only produce apparent negative cooperativity and cannot yield curves with a hill coefficient n > 1, like those shown in fig. 1 and elsewhere [13] . although the superposition effect ( fig. 2(b) ) could give reason for the shape of the curves reported in fig. 2(a) , it cannot be invoked to explain two features of the negative cooperativity found in ompf. first, the origin of the low values attained by the effective pka in fig. 2 (a) at high concentrations, which differ from typical pka of acidic residues (somewhere between 4 and 5); and, second, why the effect of salt is the opposite of the well-known screening [31] : both the pka and the hill coefficient decrease with increasing salt concentration. we reported similar observations about the hill coefficient and pka in experiments involving ompf conductance and current noise [6] . 
there, we ascribed these effects to the competitive binding of salt cations and protons occurring in the channel narrow constriction [6] , formed by two acidic residues (d113 and e117) lined in front of a cluster of arginines, as shown in fig. 3 . interestingly, such competitive binding would also explain the findings reported here. the presence of cations around certain acidic residues increases the amount of protons needed to titrate the site, thus lowering the effective pk and changing the shape of the overall titration curve. clearly, such effects are more important the higher the concentration of salt.\ncomplementary insights can be obtained from an energetic analysis, having in mind that cooperativity could be interpreted as a competition between enthalpic and entropic effects [32] [33] [34] . a positive cooperative response requires a coupling of various stabilizing interactions that tighten the structure yielding an enthalpic benefit and an entropic cost. in contrast, negative cooperativity boosts the conformational freedom of the system, what occurs with a cost in enthalpy and a benefit in entropy [32] [33] [34] . in the case of a genuine negative cooperativity, the mechanism might be expected to be largely entropic in origin. recently, we have shown that this is the case [16] . the interaction of several receptors (binding sites) with different kinds of ligands (protons and cations) involves a multiplicity of arrangements in the channel that generates a significant contribution from the configurational entropy [16] . this entropic factor reinforces the existence of a genuine negative cooperativity in the ompf channel.\non the basis of the reasoning in which the ph titration shown in fig. 2(a) involves the interaction of different types of ligands and binding sites [6] , we could expect noticeable changes in the hill analysis of v rev if any of the critical residues allegedly involved are mutated. a number of previous studies suggest that the acidic residues d113 and e117 are key to control the channel sensitivity to ph [6, 16, 17] . in fact, the replacement of these two acidic residues with neutral cysteines (cc-mutant) eliminated the large conductance decrease found for wt ompf in low ph solutions [6] . for the sake of simplicity, we focus here only in the residue d113 studying two single-site mutants, the d113c (the aspartic acid is replaced with a neutral cysteine) and d113r (the aspartic acid is replaced with a positive arginine). figure 4 (a) shows the comparison between reversal potential experiments in wt ompf, d113c and d113r mutants in kcl 1.0/0.1 m. the importance of d113 in the mechanism of ph sensitivity is evident. just by changing the state of charge of this residue out of the 102 ionizable residues per ompf monomer, the effective pk increases from 2.4 to 3.3 (d113c) or to 3.8 (d113r).\nalso, the hill coefficient increases significantly from 0.43 (wt) to 0.79 (d113c) or to 0.86 (d113r). the substitution of one acidic residue with either neutral or positive residues almost eliminates the observed pk shift and negative cooperativity. one could argue that even in the most favorable case (d113r) the non-cooperative state is not regained, so that other residues (most probably e117 and others) may also participate in the process of competitive binding mentioned above. 
an alternative explanation could lie on the fact that the whole ompf trimer has 306 ionizable residues, so that we cannot completely rule out that the hill analysis contains a partial contribution of non-genuine apparent cooperativity similar to the situation depicted in fig. 2(b) . in fact, the existence of spurious cooperativity occurring along with genuine cooperativity is not an unexpected result, on the contrary, it is a landmark phenomenon when studying the regulation of biochemical processes in multiple-site systems [21] .\nbesides the mutation of critical channel residues, the competitive binding occurring in the central constriction of the channel can be probed with the addition of an extra ligand that alters the binding equilibrium and thus the cooperativity observed. taking advantage of the knowledge of an x-ray ompf structure showing a binding site for mg 2+ cations located between residues d113 and e117 [35] , we performed reversal potential experiments in wt ompf upon addition of millimolar concentrations of mgcl 2 . figure 4(b) shows the results obtained (green squares) compared to the measurements performed in the absence of mgcl 2 (blue circles). interestingly, the presence of mg 2+ reduces the measured reversal potential at neutral ph, showing a similar effect to that of the d113r mutant in fig. 4(a) . also, both the hill coefficient and effective pk increase compared to the control experiment. in contrast to mutated proteins, protons are able to titrate the site regardless the presence of mg 2+ ions and thus the reversal potential at low ph matches that of the control experiments (without mgcl 2 ). to complement this study, we replaced traces of mgcl 2 with lacl 3 , having in mind that la 3+ ions are well-known ion channel modulators showing stronger effects than mg 2+ [36] . in the case of lacl 3 no structure is available, but functional studies demonstrated that la 3+ ions interact with the residues located in the central constriction, being d113 and e117 the most plausible candidates [36] . as expected, lower concentrations of lacl 3 have similar effects to mgcl 2 in the ph titration of the reversal potential in ompf, as shown in fig. 4 (b) (red triangles). therefore, the presence of an extra ligand, mg 2+ or la 3+ ions, reduces the negative cooperativity observed, thus supporting the statement that the competitive binding between cations and protons has a central role in the observed negative cooperativity.\nby combining ph-dependent selectivity experiments performed in bacterial porins and viroporins we have shown that the hill formalism can be useful to analyze the cooperative behavior of these proteins. we show that in addition to the most commonly accepted notion of cooperativity (interaction between different subunits in oligomeric protein channels) alternative phenomena linked to either positive or negative cooperativity can appear in monomeric channels. we pay special attention to the bacterial porin ompf to demonstrate that one cannot rely on the hill coefficient of a single curve as the definite tool to assess genuine negative cooperative in multi-site systems like ion channels. a combination of different experiments, even involving site-directed mutagenesis, is mandatory to elucidate the origin of the underlying physical interaction. we present solid evidences that the observed negative cooperativity in ompf arises from genuine sources, namely a competitive binding between protons and cations. 
this mechanism could be linked to the ability of the protein to modulate ionic transport over a very wide range of ph values.
/pytorch/aten/src/ATen/native/BinaryOps.cpp:66: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
S-0	ion channels are integral membrane proteins involved in specialized physiological functions demanding a precise control of the membrane [UNK] as regards the exchange of water [UNK] ions and even small [UNK] [UNK] and [UNK] [UNK] [UNK] [UNK] . the modulation of channel current occurs in response to a diversity of cellular signals including changes in voltage across the cell membrane [UNK] ion [UNK] chemical stimulus [UNK] ion [UNK] [UNK] changes in [UNK] mechanical deformation and interaction with other molecules in the [UNK] the physiological significance of some of these mechanisms reported in vitro has been questioned because they require extreme conditions hard to meet in vivo [UNK] high [UNK] [UNK] [UNK] [UNK] [UNK] . [UNK] many studies have focused on the role of solution [UNK] [UNK] [UNK] , an elementary factor that [UNK] regulates ion channel [UNK] extensively studied both in vivo and in vitro [UNK] [UNK] [UNK] [UNK] . relevant examples of [UNK] function modulation by ph include potassium and sodium [UNK] chloride [UNK] the mitochondrial [UNK] [UNK] channel [UNK] or bacterial [UNK] of the outer membrane of [UNK] bacteria [UNK] [UNK] [UNK] of [UNK] coli ) [UNK] [UNK] , among [UNK] channels have [UNK] dimensions comparable to the size of the [UNK] [UNK] this means that [UNK] could block these channels current by [UNK] reasons just [UNK] the channel [UNK] [UNK] . in [UNK] wide [UNK] allowing the simultaneous passage of [UNK] [UNK] [UNK] ter molecules and [UNK] ions require more sophisticated [UNK] [UNK] regulate the channel [UNK] in a gradual way via complex networks of [UNK] residues involving [UNK] [UNK] interactions [UNK] . recent studies show also that either narrow or wide channels may use [UNK] [UNK] to regulate ion transport across them [UNK] . efforts to understand those molecular interactions in ion channels are driven by the fact that proteins are highly cooperative structures [UNK] [UNK] . cooperative interactions are important factors for certain protein functions and imply some sort of communication among the [UNK] components that allows either for a decisive response over a limited range of concentrations [UNK] [UNK] or for a response that is less decisive but also less restricted with regard to concentration of the ligand [UNK] [UNK] [UNK] [UNK] focus here on the changes in the ionic [UNK] of membrane channels with [UNK] an issue still [UNK] by available [UNK] md simulations and only partially explained by lower resolution mean field approaches [UNK] [UNK] [UNK] [UNK] . taking advantage of the fact that [UNK] [UNK] ph curves display characteristic [UNK] dose [UNK] shape [UNK] [UNK] we apply the hill [UNK] [UNK] , which is commonly used in biochemistry and [UNK] to analyze binding or kinetic data [UNK] . one could argue that proteins having a large number of [UNK] residues [UNK] more than [UNK] would routinely present apparent [UNK] reflecting the [UNK] of independent residue [UNK] rather than genuine cooperative mechanisms [UNK] [UNK] . we examine data from previous articles and from original experiments to show that this is not the [UNK] . experimental data are taken from [UNK] [UNK] . 
the solid lines correspond to the fitting to [UNK] [UNK] and the [UNK] [UNK] that exhibit contrasting cooperative [UNK] later we discuss experiments where is n < [UNK] indicating negative [UNK] in this case we aim to [UNK] between actual physical interactions [UNK] it is always the case for positive [UNK] and apparent [UNK] [UNK] so called [UNK] [UNK] [UNK] kindly provided by [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] was isolated and [UNK] from an [UNK] coli [UNK] mutants [UNK] and [UNK] [UNK] were a generous gift from [UNK] [UNK] [UNK] [UNK] the [UNK] [UNK] membranes were formed by the
H-0	-0.2399412989616394	as regards the exchange of water and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and even small [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions and [UNK] ions
P-0	-2.7302 -0.0446 -0.2181 -0.3097 -0.0106 -0.1876 -1.1719 -1.2706 -0.3719 -0.7857 -0.2937 -0.0816 -0.0913 -0.2056 -0.5902 -0.0909 -0.0400 -0.1110 -0.1614 -0.5916 -0.0750 -0.0325 -0.0797 -0.1369 -0.4495 -0.0491 -0.0299 -0.0561 -0.1128 -0.3546 -0.0463 -0.0303 -0.0545 -0.1077 -0.3349 -0.0436 -0.0341 -0.0660 -0.1046 -0.3250 -0.0411 -0.0336 -0.0808 -0.1042 -0.3279 -0.0428 -0.0372 -0.0932 -0.0984 -0.3113 -0.0400 -0.0388 -0.1061 -0.1002 -0.2925 -0.0411 -0.0401 -0.1281 -0.1054 -0.3048 -0.0390 -0.0385 -0.1393 -0.1092 -0.1049 -0.1784 -0.0846 -0.1399 -0.1188 -0.1976 -0.0568 -0.0543 -0.4345 -0.2147 -0.0860 -0.1199 -0.1128 -0.8991 -0.1517 -0.1772 -0.0925 -0.0963 -0.5904 -0.1255 -0.1643 -0.0649 -0.0832 -0.5793 -0.1275 -0.1028 -0.1191 -0.1094 -0.7388 -0.1258 -0.1548 -0.0931 -0.0954 -0.7337 -0.1254 -0.1445 -0.0834 -0.0993 -0.7416 -0.1353 -0.1513 -0.0820 -0.0982 -0.8389 -0.1400 -0.1481 -0.0744 -0.0987 -0.9415 -0.1444 -0.1458 -0.0750 -0.0948 -0.9963 -0.1626 -0.1267 -0.8353 -0.1377 -0.1581 -0.6441 -0.1505 -0.1643 -0.3590 -0.1340 -0.2158 -0.6541 -0.2159 -0.3484 -0.8315 -0.1638 -0.2550 -0.3416 -0.1492 -0.1483 -0.1478 -0.1887 -0.1234 -0.1476 -0.1580 -0.1924 -0.1688 -0.1243 -0.1764 -0.1423 -0.1215 -0.1487 -0.1192 -0.1183 -0.1394 -0.1117 -0.1162 -0.1392 -0.1076 -0.1127 -0.1365 -0.1039 -0.1134 -0.1267 -0.0956 -0.1106 -0.1229 -0.0919 -0.1085 -0.1212 -0.0881 -0.1077 -0.1203 -0.0827 -0.1073 -0.1195 -0.0809 -0.1039 -0.1130 -0.0753 -0.1044 -0.1101 -0.0727 -0.1009 -0.1104 -0.0719 -0.1005 -0.1044 -0.0692 -0.0914 -0.1000 -0.0626 -0.0877 -0.0726 -0.1075 -0.2392 -0.2199 -0.2573 -0.1619 -0.2580 -0.1570 -0.0950 -7.0815

Before being passed as input, the source text length was 2735; after, it was 593.
Every source text longer than approximately 600 tokens gets truncated, even though I set the source and target lengths to 6000 and 512 respectively.

Would appreciate help on this! Thank you.

Train model for Vietnamese summarization

I am using ProphetNet to train a model for Vietnamese summarization. My training script is below:

!fairseq-train \
--fp16 \
--user-dir $USER_DIR --task translation_prophetnet --arch $ARCH \
--encoder-layers 12 --decoder-layers 12 \
--encoder-embed-dim 768 --decoder-embed-dim 768 \
--optimizer adam --adam-betas '(0.9, 0.999)' --clip-norm 0.1 \
--lr 0.0001 \
--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 1000 \
--dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
--criterion $CRITERION --label-smoothing 0.1 \
--update-freq 32 --max-sentences 2 \
--num-workers 4 \
--load-sep \
--ddp-backend=no_c10d --max-epoch 1 \
--max-source-positions 512 --max-target-positions 512 \
--skip-invalid-size-inputs-valid-test \
--seed 1 \
--save-dir $MODELS/vnexpress \
$DATA_DIR

But when I run inference with the model, it shows this error:
Namespace(beam=10, bpe=None, cpu=False, criterion='cross_entropy', data='/content/ProphetNet/src/finetune_data/VnExpress_processed', dataset_impl=None, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, empty_cache_freq=0, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1.2, load_alignments=False, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=110, max_sentences=32, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=45.0, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=3, num_shards=1, num_workers=4, optimizer='nag', path='/content/drive/MyDrive/NLP2021/Models/vnexpress/checkpoint1.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, raw_text=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang=None, target_lang=None, task='translation_prophetnet', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, truncate_source=False, unkpen=0, unnormalized=False, upsample_primary=1, user_dir='/content/ProphetNet/src/prophetnet', warmup_updates=0, weight_decay=0.0) | [src] dictionary: 27671 types | [tgt] dictionary: 27671 types | loaded 4649 examples from: /content/ProphetNet/src/finetune_data/VnExpress_processed/test.src-tgt.src | loaded 4649 examples from: /content/ProphetNet/src/finetune_data/VnExpress_processed/test.src-tgt.tgt | /content/ProphetNet/src/finetune_data/VnExpress_processed test src-tgt 4649 examples | loading model(s) from /content/drive/MyDrive/NLP2021/Models/vnexpress/checkpoint1.pt tcmalloc: large alloc 1080451072 bytes == 0x251ec000 @ 0x7f20158f2b6b 0x7f2015912379 0x7f1fb843a2ea 0x7f1fb843bd9a 0x7f1fbaa9fdb3 0x7f2003c6dc20 0x551555 0x5a9dac 0x50a433 0x50beb4 0x507be4 0x508ec2 0x5a4c61 0x5a4fb8 0x4e012e 0x50a461 0x50beb4 0x507be4 0x588e5c 0x59fd0e 0x50d256 0x507be4 0x509900 0x50a2fd 0x50cc96 0x507be4 0x509900 0x50a2fd 0x50beb4 0x507be4 0x509900 0% 0/146 [00:00<?, ?it/s]Traceback (most recent call last): File "/usr/local/bin/fairseq-generate", line 8, in <module> sys.exit(cli_main()) File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/generate.py", line 199, in cli_main main(args) File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/generate.py", line 104, in main hypos = task.inference_step(generator, models, sample, prefix_tokens) File "/usr/local/lib/python3.6/dist-packages/fairseq/tasks/fairseq_task.py", line 265, in inference_step return generator.generate(models, sample, prefix_tokens=prefix_tokens) File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad return func(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/fairseq/sequence_generator.py", line 113, in generate return self._generate(model, sample, **kwargs) File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad return func(*args, 
**kwargs) File "/usr/local/lib/python3.6/dist-packages/fairseq/sequence_generator.py", line 295, in _generate tokens[:, :step + 1], encoder_outs, temperature=self.temperature, File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad return func(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/fairseq/sequence_generator.py", line 553, in forward_decoder temperature=temperature, File "/usr/local/lib/python3.6/dist-packages/fairseq/sequence_generator.py", line 584, in _decode_one tokens, encoder_out=encoder_out, incremental_state=self.incremental_states[model], File "/usr/local/lib/python3.6/dist-packages/fairseq/models/fairseq_model.py", line 228, in forward_decoder return self.decoder(prev_output_tokens, **kwargs) File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__ result = self.forward(*input, **kwargs) File "/content/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 615, in forward x_list, extra = self.extract_features(prev_output_tokens, encoder_out, incremental_state, **unused) File "/content/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 776, in extract_features real_positions=real_positions File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__ result = self.forward(*input, **kwargs) File "/content/ProphetNet/src/prophetnet/ngram_s2s_model.py", line 390, in forward real_positions=real_positions File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__ result = self.forward(*input, **kwargs) File "/content/ProphetNet/src/prophetnet/ngram_multihead_attention.py", line 339, in forward predict_relative_logits = self.ngram_relative_logits(h_ngram, attn_weights_ngram, real_positions, i_bucket_relative_stream) File "/content/ProphetNet/src/prophetnet/ngram_multihead_attention.py", line 190, in ngram_relative_logits assert real_positions[0][0] == S - 1, 'memory position is 1 2 3 4 5(S-1)' AssertionError: memory position is 1 2 3 4 5(S-1)

By the way, I format my data using PhoBERT. Here is a sample of my input data

tối 26/3 , thành_@@ ủ@@ y đà_@@ n@@ ẵng gửi công_văn yêu_cầu các cấp_@@ ủ@@ y_@@ đảng , chính_quyền tập_trung cao_độ phòng_chống dịch_bệnh co@@ vid@@ -@@ 19 , không để dịch_bệnh bùng_phát trên địa_bàn . [SEP] nhận_định co@@ vid@@ -@@ 19 tiếp_tục lây_lan nhanh hơn trong cả nước , thường_trực thành_@@ ủ@@ y nêu rõ khi dịch được kiểm_soát và không còn nguy_cơ lây_nhiễm , đại_hội đảng_bộ cấp cơ_sở sẽ diễn ra , nhưng quy_mô nhỏ , giảm thủ_tục , giảm lượng khách_@@ mời và không mời học_sinh đến chào_mừng đại_hội . [SEP] lãnh_đạo thành_phố được giao khuyến_cáo người dân hạn_chế tiếp_xúc , ít di_chuyển . [SEP] cùng_với việc dừng tất_cả hoạt_động hội_họp và sự_kiện tập_trung trên 20 người theo chỉ_đạo của thủ_tướng , đà_@@ n@@ ẵng cấm tụ_tập nhiều hơn 10 người ở bên ngoài các công_sở , trường_học , bệnh_viện . [SEP] thường_trực thành_@@ ủ@@ y yêu_cầu đóng_cửa toàn_bộ cơ_sở dịch_vụ , trừ cung_cấp lương_thực , thực_phẩm , dược_phẩm , khám chữa bệnh ; tạm dừng hoặc tổ_chức hạn_chế hoạt_động giao_thông công_cộng ; cho trẻ mầm_non , học_sinh , sinh_viên , học_viên tiếp_tục nghỉ học đến hết ngày 12/4 , theo đề_nghị của ban cán_sự đảng_@@ ub@@ nd thành_phố . [SEP] chánh_@@ văn_phòng thành_@@ ủ@@ y trầ@@ n_th@@ ắ@@ ng_lợi cho biết , ngành chức_năng sẽ xử_lý những người không khai_báo y_tế , không cách_ly và áp_dụng các biện_pháp phòng_chống dịch theo quy_định ; thông_tin sai sự_thật gây hoang_mang dư_luận . [SEP] tại quảng_@@ nam , nhằm ngăn_chặn n@@ co@@ v và đảm_bảo sức_@@ kh@@ ỏ@@ e cộng_đồng , t@@ p hội_@@ an đề_nghị chủ các quán , hàng ăn_uống , giải_khát , tiệm cà_phê có biện_pháp thích_hợp nhằm hạn_chế việc tập_trung đông người . [SEP] chính_quyền hội_@@ an yêu_cầu những nơi này giãn cách lượt người phục_vụ để đảm_bảo không quá 30 người cho một lượt . [SEP] cơ_sở kinh_doanh thực_hiện nghiêm_túc các biện_pháp y_tế để phòng_chống dịch_bệnh . " [SEP] những trường_hợp không chấp_hành thì đề_nghị có biện_@@ pháp_@@ đình_chỉ , đóng_cửa cơ_sở kinh_doanh " , văn_bản của thành_phố ngày 26/3 nêu rõ . [SEP] đến tối 26/3 , việt_@@ nam ghi_nhận 153 ca dương_tính với n@@ co@@ v , trong đó 20 người đã khỏi .

@yuyan2do

Loading the model

The model gets loaded every time fairseq-generate is called to produce a summary. Is there any way to avoid loading the model every time I want to run inference, i.e. to pre-load the model once and then run inference against it repeatedly?
Thanks.
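
One way to do this (a rough sketch built on the fairseq 0.9 Python API; the same functions appear in the tracebacks elsewhere in this thread, but the checkpoint path and user-dir location below are placeholders) is to load the ensemble once at startup and keep the generator around between requests:

from argparse import Namespace
from fairseq import checkpoint_utils, utils

# make the custom translation_prophetnet task visible, as --user-dir does
utils.import_user_module(Namespace(user_dir="prophetnet"))

# load the model and its task a single time
models, args, task = checkpoint_utils.load_model_ensemble_and_task(
    ["cnndm/finetune_cnndm_checkpoints/checkpoint_best.pt"]  # placeholder path
)
for model in models:
    model.eval()

generator = task.build_generator(args)
# From here on, batches built with task.build_dataset_for_inference(...) can be
# passed to task.inference_step(generator, models, sample) repeatedly, which is
# essentially what fairseq-interactive does internally.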

KeyError during inference with dialog-en model

Hi,

Using the fairseq CLI, I ran preprocessing on the test files only to generate the binaries, and then tried running inference with the prophetnet-dialog-en model.

Here is my code:
fairseq-preprocess \
--user-dir prophetnet \
--task translation_prophetnet \
--source-lang src --target-lang tgt \
--testpref tokenized_test \
--destdir processed --srcdict vocab.txt --tgtdict vocab.txt \
--workers 20

BEAM=5
LENPEN=1.5
CHECK_POINT=prophetnet-dialog-en.pt
TEMP_FILE=fairseq_outputs.txt
OUTPUT_FILE=sorted_outputs.txt

fairseq-generate processed --path $CHECK_POINT --user-dir prophetnet --task translation_prophetnet --batch-size 80 --gen-subset test --beam $BEAM --num-workers 4 --no-repeat-ngram-size 3 --lenpen $LENPEN 2>&1 > $TEMP_FILE
grep ^H $TEMP_FILE | cut -c 3- | sort -n | cut -f3- | sed "s/ ##//g" > $OUTPUT_FILE

I got the following error. Would appreciate any advice on this. Thank you!

/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py in _upgrade_state_dict(state)
--> 300     {"criterion_name": "CrossEntropyCriterion", "best_loss": state["best_loss"]}
KeyError: 'best_loss'

How to use custom/append to vocab.txt?

Hi, first of all, thank you for the awesome piece of work that you have shared.

I have fine-tuned ProphetNet for summarization on the Amazon Food Review dataset, and it works great.

I just wanted to know how we can update vocab.txt or use our own, so that no words are missing, for example in scientific, medical, or other domain-specific documents.

What steps need to be taken in such cases?

I am waiting for your reply.

Thank You.
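A quick check that is worth doing before touching the vocabulary: as far as I understand, ProphetNet uses a BERT-style WordPiece vocab.txt, so unseen domain terms are normally split into known subword pieces instead of being mapped to [UNK]. A minimal sketch of that check, using bert-base-uncased as a stand-in for the repository's vocab.txt (an assumption for illustration):

from transformers import BertTokenizer

# If a domain term splits into subwords without producing [UNK], the existing
# vocabulary can usually represent it and fine-tuning alone may be enough.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
for word in ["pharmacokinetics", "electroencephalogram", "ketogenic"]:
    pieces = tokenizer.tokenize(word)
    print(word, "->", pieces, "| contains [UNK]:", "[UNK]" in pieces)

If you do replace or extend vocab.txt, keep in mind that the corresponding embedding rows of the pretrained checkpoint would no longer line up and would have to be re-initialized or re-trained.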

Number of Parameters

I would like to compare the size of ProphetNet with other models. However, I could not find information on the number of parameters anywhere and could not deduce it myself as I am not knowledgeable enough. I would really appreciate your help. Thanks.
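If you have one of the released .pt checkpoints on disk, you can count the parameters directly. A minimal sketch, assuming the file follows the usual fairseq layout with the weight tensors stored under the "model" key (the file name is just an example):

import torch

# Load the checkpoint on CPU and sum the element counts of all weight tensors.
state = torch.load("prophetnet_large_pretrained_160G_14epoch_model.pt", map_location="cpu")
n_params = sum(t.numel() for t in state["model"].values())
print(f"{n_params / 1e6:.1f}M parameters")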

How to use provided model

If I understand correctly, you provide a ready-to-use model here? How do I use it? I am having trouble with this, as I'm quite new to torch models and fairseq.

Besides that, I ran the setup up to the fine-tuning step, but it takes forever even on Colab, so I figured there must be a quicker way. Thanks!

Question Generation: translation_prophetnet error

fairseq-generate: error: argument --task: invalid choice: 'translation_prophetnet' (choose from 'translation', 'translation_lev', 'translation_from_pretrained_xlm', 'multilingual_translation', 'semisupervised_translation', 'audio_pretraining', 'denoising', 'legacy_masked_lm', 'translation_moe', 'sentence_prediction', 'masked_lm', 'sentence_ranking', 'multilingual_masked_lm', 'language_modeling', 'cross_lingual_lm')

The given code says to use translation_prophetnet for inference, but it does not work and gives the error above. You had mentioned in another issue that you would provide a Colab notebook. Please do, as it would help us a lot; with the limited instructions, question generation is extremely complicated for newbies like me.

Thanks!!

https://colab.research.google.com/drive/1i8orWVxr1NRam612foQFsTt051Y3SUdo?usp=sharing
This is my colab so far. The last cell has the given error.

How can I build models for other languages?

I am using ProphetNet to build a new model for summarizing Vietnamese articles, but the paper only provides a pretrained model for English. How can I build a model for Vietnamese?

How to pretrain the ProphetNet?

Thanks for your wonderful work!

I wonder how to pretrain ProphetNet from scratch. Would you mind providing some instructions?

KeyError: "best loss", when loading checkpoint as Fairseq Model

Hi guys,

Thank you for the incredible work.

I tried to load this model from the larger checkpoint in the following manner:

from fairseq.models.transformer import TransformerModel

model = TransformerModel.from_pretrained(model_name_or_path=MODEL_DIR,  \
                                         checkpoint_file='prophetnet_large_pretrained_160G_14epoch_model.pt')

but was presented with a key error:

KeyError                                  Traceback (most recent call last)
<ipython-input-13-782ea15f21fd> in <module>()
      1 MODEL_DIR = '/content/drive/My Drive/src/models/'
----> 2 model = TransformerModel.from_pretrained(model_name_or_path=MODEL_DIR,                                         checkpoint_file='prophetnet_large_pretrained_160G_14epoch_model.pt')

4 frames
/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py in _upgrade_state_dict(state)
    298     if "optimizer_history" not in state:
    299         state["optimizer_history"] = [
--> 300             {"criterion_name": "CrossEntropyCriterion", "best_loss": state["best_loss"]}
    301         ]
    302         state["last_optimizer_state"] = state["optimizer"]

KeyError: 'best_loss'

Versions
fairseq==0.9.0
torch==1.4.0

Any advice on how to proceed would be greatly appreciated; I wish to load ProphetNet into a fairseq model so I can adapt the architecture to a custom task.
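The KeyError comes from fairseq's _upgrade_state_dict, which assumes an old-style checkpoint still carries legacy training fields such as best_loss. As far as I can tell, the released .pt files are mainly meant to be passed to fairseq-train via --load-from-pretrained-model, which then writes a complete fairseq checkpoint that loads normally. If you want to load the raw file directly anyway, one possible workaround (a best-effort sketch, not an official fix; whether it is enough depends on what the particular checkpoint actually stores, e.g. whether it contains an "args" namespace) is to add the missing legacy keys before loading:

import torch

ckpt_in = "prophetnet_large_pretrained_160G_14epoch_model.pt"
ckpt_out = "prophetnet_large_patched.pt"  # hypothetical output name

state = torch.load(ckpt_in, map_location="cpu")

# fairseq 0.9.0's _upgrade_state_dict reads these legacy keys whenever
# "optimizer_history" is absent; fill them with harmless defaults.
if "optimizer_history" not in state:
    state.setdefault("best_loss", float("inf"))
    state.setdefault("optimizer", {})  # copied into last_optimizer_state
    state.setdefault("extra_state", {"epoch": 0, "batch_offset": 0, "val_loss": None})

torch.save(state, ckpt_out)
print("saved", ckpt_out, "with keys:", sorted(state.keys()))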

KeyError: 'best_loss' at inference while using prophetnet code pretrained checkpoints.

I wanted to generate output for the test file with the pretrained weights, without fine-tuning the model. I have preprocessed test.src and test.tgt. At inference time, I am encountering the error "KeyError: 'best_loss'".

Traceback (most recent call last):
  File "/home/ankita-sontakke/anaconda3/bin/fairseq-generate", line 8, in <module>
    sys.exit(cli_main())
  File "/home/ankita-sontakke/anaconda3/lib/python3.8/site-packages/fairseq_cli/generate.py", line 379, in cli_main
    main(args)
  File "/home/ankita-sontakke/anaconda3/lib/python3.8/site-packages/fairseq_cli/generate.py", line 41, in main
    return _main(args, sys.stdout)
  File "/home/ankita-sontakke/anaconda3/lib/python3.8/site-packages/fairseq_cli/generate.py", line 88, in _main
    models, _model_args = checkpoint_utils.load_model_ensemble(
  File "/home/ankita-sontakke/anaconda3/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 250, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(
  File "/home/ankita-sontakke/anaconda3/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 279, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/home/ankita-sontakke/anaconda3/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 232, in load_checkpoint_to_cpu
    state = _upgrade_state_dict(state)
  File "/home/ankita-sontakke/anaconda3/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 374, in _upgrade_state_dict
    {"criterion_name": "CrossEntropyCriterion", "best_loss": state["best_loss"]}
KeyError: 'best_loss'

Provide generated outputs

Hi all,
Thanks for sharing the code and models. Is it possible to directly provide the generated outputs of the model? I am specifically interested in the summarization task and would like to just have the outputs instead of decoding them myself using the pretrained model. I understand Gigaword might be subject to license issues, but the CNN/DailyMail outputs would suffice.

Thanks!

Inference AttributeError: 'Namespace' object has no attribute 'max_source_positions'

Following the next steps:

For summarization task,

  1. download CNN\DM fine-tuned checkpoint
  2. preprocess your text with BERT-tokenization, and you can refer to our preprocess scripts
  3. use fairseq-generate or fairseq-interactive to generate summarization for your given text. For fairseq-generate, you can refer to our generate scripts. For fairseq-interactive, you can easily generate summarization for a typed-in text interactively. Detailed instructions can be found in fairseq manual

Originally posted by @qiweizhen in #1 (comment)

When I run inference (step 3) with fairseq-generate, it raises AttributeError: 'Namespace' object has no attribute 'max_source_positions'. I don't know how to solve it.

I literally pasted the inference script and installed every library at its respective version.
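One likely cause is that the args namespace stored in (or merged from) the checkpoint lacks max_source_positions. A small diagnostic sketch, assuming the standard fairseq checkpoint layout with an "args" entry; the file name is a placeholder, and the 512 values are assumptions rather than official settings:

import torch

# Inspect what the checkpoint's stored args actually contain.
state = torch.load("prophetnet_large_160G_cnndm_model.pt", map_location="cpu")
args = state.get("args")
print("max_source_positions:", getattr(args, "max_source_positions", None))
print("max_positions:", getattr(args, "max_positions", None))

# Possible workaround (an assumption, not an official fix): set the missing
# attributes on the namespace and save a patched copy of the checkpoint.
if args is not None and not hasattr(args, "max_source_positions"):
    args.max_source_positions = 512
    args.max_target_positions = 512
    torch.save(state, "prophetnet_cnndm_patched.pt")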

Loading .pt into fairseq model for customisation

Hi guys,

really incredible work, thank you.

May I please ask for a way of loading the available checkpoints into its fairseq model, such that someone can build upon your architecture?

Specifically, the "bpe" and "bpe_codes" arguments as below are what I'm trying to identify.

[screenshot of the from_pretrained call showing the "bpe" and "bpe_codes" arguments, 2020-04-05 at 13:59:44]

Invalid task choice error

If you run into an error similar to
fairseq-preprocess: error: argument --task: invalid choice:

then make sure that you:

  • run the commands fairseq-blabla commands from within the /src folder and NOT from the project root folder

  • download all data into cnndm (or gigawords, depending on which one you are trying to use) inside the src folder, e.g. src/cnndm/prophetnet_tokenized/..

Assertion Error in fine-tuning of Gigaword

Hi, thank you for distributing your code!
I tried to fine-tune the pre-trained ProphetNet (160G) on the English Gigaword summarization dataset.
I performed the pre-processing described in the README and then tried fine-tuning, but hit the following AssertionError:

  File "~/anaconda3/envs/py36pytorch14/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/fairseq_cli/train.py", line 333, in cli_main
    main(args)
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/fairseq_cli/train.py", line 86, in main
    train(args, trainer, task, epoch_itr)
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/fairseq_cli/train.py", line 126, in train
    for i, samples in enumerate(progress, start=epoch_itr.iterations_in_epoch):
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/tqdm/std.py", line 1127, in __iter__
    for obj in iterable:
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/fairseq/data/iterators.py", line 314, in __next__
    chunk.append(next(self.itr))
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/fairseq/data/iterators.py", line 43, in __next__
    return next(self.itr)
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/fairseq/data/iterators.py", line 36, in __iter__
    for x in self.iterable:
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/fairseq/data/language_pair_dataset.py", line 252, in collater
    input_feeding=self.input_feeding,
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/fairseq/data/language_pair_dataset.py", line 69, in collate
    move_eos_to_beginning=True,
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/fairseq/data/language_pair_dataset.py", line 22, in merge
    pad_idx, eos_idx, left_pad, move_eos_to_beginning,
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/fairseq/data/data_utils.py", line 44, in collate_tokens
    copy_tensor(v, res[i][size - len(v):] if left_pad else res[i][:len(v)])
  File "~/anaconda3/envs/py36pytorch14/lib/python3.6/site-packages/fairseq/data/data_utils.py", line 37, in copy_tensor
    assert src[-1] == eos_idx
AssertionError

pytorch version == 1.4.0

fairseq version == 0.9.0

In addition, when I tried to train the original Transformer (--arch transformer_wmt_en_de) with label_smoothed_cross_entropy, training succeeded.

Do you have any idea how to solve the above error?
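For reference, the assertion fires in fairseq's collater when a target sequence does not end with the EOS index. A generic data sanity check before re-running fairseq-preprocess (a sketch only: empty lines or mismatched line counts in the tokenized files are assumed possible causes, and the gigaword/prophetnet_tokenized paths are placeholders for wherever your tokenized splits live):

# Check that source/target line counts match and that no target line is empty.
for split in ("train", "valid"):
    src_path = f"gigaword/prophetnet_tokenized/{split}.src"
    tgt_path = f"gigaword/prophetnet_tokenized/{split}.tgt"
    with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
        src_lines, tgt_lines = fs.readlines(), ft.readlines()
    empties = [i for i, line in enumerate(tgt_lines, 1) if not line.strip()]
    print(split, "src:", len(src_lines), "tgt:", len(tgt_lines), "empty tgt lines:", empties[:10])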

Increasing --max-source-positions --max-target-positions

Hi again,

I was finetuning some data with --max-source-positions 1024 --max-target-positions 1024.

But it paused at epoch 001, 8%,
and showed: WARNING: overflow detected, setting loss scale to: 64.0
Is there any upper limit on --max-source-positions and --max-target-positions?

I am training with 4 Tesla T4 GPUs.

Please help.

1. Reporting a typo. 2. Wondering about the time cost. 3. Error at inference

  1. In https://github.com/microsoft/ProphetNet/tree/master/ProphetNet_Zh, in the data preprocessing example, the first line is:
    import transformers import BertTokenizer
    I think it should be from transformers import BertTokenizer.

  2. I'm running the fine-tuning code with this script:
DATA_DIR=mypath/bl1/prophetnet/processed2
USER_DIR=mypath/bl1/prophetnet/prophetnet
ARCH=ngram_transformer_prophet_large
CRITERION=ngram_language_loss
SAVE_DIR=mypath/bl1/prophetnet/saves/save2
TENSORBOARD_LOGDIR=mypath/bl1/prophetnet/logs/log2
PRETRAINED_MODEL=mypath/data/bert_model/prophetnet_zh.pt

fairseq-train \
--fp16 \
--user-dir $USER_DIR --task translation_prophetnet --arch $ARCH \
--optimizer adam --adam-betas '(0.9, 0.999)' --clip-norm 0.1 \
--lr 0.00001 --min-lr 1e-09 \
--lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 1000 \
--dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
--criterion $CRITERION --label-smoothing 0.1 \
--update-freq 1  --max-tokens 1400 --max-sentences 7 \
--num-workers 4 \
--load-from-pretrained-model $PRETRAINED_MODEL \
--ddp-backend=no_c10d --max-epoch 10 \
--max-source-positions 1024 --max-target-positions 512 \
--skip-invalid-size-inputs-valid-test \
--save-dir $SAVE_DIR \
--keep-last-epochs 10 \
--tensorboard-logdir $TENSORBOARD_LOGDIR \
$DATA_DIR

The log file and the terminal output both stay stagnant; nothing is printed. I wonder whether it is simply too slow to show output quickly, or whether I have written the commands wrong.
So I want to ask how much time this should take. (I have about 7,000 samples in the training set, 2,000 in validation, and 2,000 in test.)


3. When I directly use the downloaded pretrained model for inference with this script:

BEAM=5
LENPEN=1.5
CHECK_POINT=mypath/data/bert_model/prophetnet_zh.pt
TEMP_FILE=mypath/bl1/prophetnet/infers/infer2/fairseq_outputs.txt
OUTPUT_FILE=mypath/bl1/prophetnet/infers/infer2/sorted_outputs.txt

fairseq-generate mypath/bl1/prophetnet/processed2 --path $CHECK_POINT --user-dir mypath/bl1/prophetnet/prophetnet --task translation_prophetnet --batch-size 80 --gen-subset test --beam $BEAM --num-workers 4 --no-repeat-ngram-size 3 --lenpen $LENPEN 2>&1 > $TEMP_FILE
grep ^H $TEMP_FILE | cut -c 3- | sort -n | cut -f3- | sed "s/ ##//g" > $OUTPUT_FILE

I got this error message:

Traceback (most recent call last):
  File "mypath/anaconda3/envs/envfastsum/bin/fairseq-generate", line 33, in <module>
    sys.exit(load_entry_point('fairseq==0.9.0', 'console_scripts', 'fairseq-generate')())
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq_cli/generate.py", line 199, in cli_main
    main(args)
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq_cli/generate.py", line 47, in main
    task=task,
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 179, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(filenames, arg_overrides, task)
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 190, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 166, in load_checkpoint_to_cpu
    state = _upgrade_state_dict(state)
  File "mypath/anaconda3/envs/envfastsum/lib/python3.7/site-packages/fairseq/checkpoint_utils.py", line 300, in _upgrade_state_dict
    {"criterion_name": "CrossEntropyCriterion", "best_loss": state["best_loss"]}
KeyError: 'best_loss'

I've found other issues referring to this problem, but I haven't found any direct way to solve it. How can I solve this problem?

Train new model

Hi, thanks for your awesome model!
Could I ask how to train a whole new model for a specific task and language, for example summarizing Vietnamese articles?

Would you mind providing some instructions on that?

Edit: I have successfully fine-tuned the pretrained ProphetNet-X model to create a Vietnamese model. However, I also want to train a new model from scratch.

@qiweizhen @yuyan2do @dayihengliu

Abstractive Summarization using ProphetNet

I'm following these steps to summarize my document -

  1. download CNN\DM fine-tuned checkpoint
  2. preprocess your text with BERT-tokenization, and you can refer to our preprocess scripts
  3. use fairseq-generate or fairseq-interactive to generate summarization for your given text. For fairseq-generate, you can refer to our generate scripts. For fairseq-interactive, you can easily generate summarization for a typed-in text interactively. Detailed instructions can be found in fairseq manual

What is the --task argument for summarization?

Also, would this be sufficient if my processed input is in 2.txt?

fairseq-generate 2.txt --path content/drive/My Drive/prophetnet_large_160G_cnndm_model.pt --user-dir prophetnet --task summarization_prophetnet --batch-size 80 --gen-subset test --beam $BEAM --num-workers 4 --lenpen $LENPEN 2>&1 > $OUTPUT_FILE
