The mgs from wellecks

About PG Results

Hi Sean,

Thank you for sharing the source code. I am trying to reproduce the policy gradient (PG) results, but training ends quickly: the metric improves only at the first evaluation. Did you see the same behavior on your side across the 5 random seeds?

adam_epsilon=1e-08, chunk_size_train=1024, chunk_size_valid=1024, context_length=10, data_path='/Tmp/slurm.887459.0/wikitext103_raw_gpt2bpe.pkl', decoding_len_factor=1.3, decoding_max_length=500, deterministic_decoding='greedy', eval_context_length=10, fixed_length=-1, include_greedy=0, include_target=0, lr=6.25e-05, max_epochs=100, max_grad_norm=1.0, metric='lm', no_checkpoint=False, normalized_distance=1, num_samples=4, patience=10, pg_mle_mix=0.1, print_every=100, score_mle_model_path='./MLE_seed_101/', seed=101, stochastic_decoding='temp-1.0', test_model_path='./PG_seed_101_eval_lm/', token_limit_train=1024, token_limit_valid=1024, tokenizer_cache_path='../models/tokenizer_cache', train_model_path='./MLE_seed_101/', transformers_cache_path='../models/transformers_cache', valid_every=1000, warmup_steps=0, weight_decay=0.01)
===========================================================================================
Build model.
Initialized models with 124.44M parameters.
===========================================================================================
Load data.
===========================================================================================
train size: 874556 (290473 discarded) (max_length 1017) (124453 batches).
test size: 2162 (729 discarded) (max_length 541) (300 batches).
valid size: 1896 (565 discarded) (max_length 470) (262 batches).
===========================================================================================
Start training.
===========================================================================================
Epoch: 1, Batch: [100/124453], Loss: -0.022, lm: 0.688
Epoch: 1, Batch: [200/124453], Loss: -0.141, lm: 0.598
Epoch: 1, Batch: [300/124453], Loss: -0.213, lm: 0.572
Epoch: 1, Batch: [400/124453], Loss: -0.244, lm: 0.559
Epoch: 1, Batch: [500/124453], Loss: -0.301, lm: 0.526
Epoch: 1, Batch: [600/124453], Loss: -0.315, lm: 0.523
Epoch: 1, Batch: [700/124453], Loss: -0.336, lm: 0.519
Epoch: 1, Batch: [800/124453], Loss: -0.323, lm: 0.523
Epoch: 1, Batch: [900/124453], Loss: -0.327, lm: 0.520
===========================================================================================
| Validation | Step: 1000 | Loss: 3.380 | PPL: 29.372 | lm: 58.244 |
Update best sequence-level metric [lm]: [58.244].
Save the model.

After this first evaluation, the lowest sequence-level loss does not improve for 10 consecutive evaluations, so training stops (patience=10). The final evaluation results are very close.
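To make the stopping behavior concrete, here is a minimal sketch of patience-based early stopping. It assumes a lower-is-better metric and a counter that resets on every improvement; the function name `run_early_stopping` and the later metric values are hypothetical, and the actual logic in the repository may differ.

```python
def run_early_stopping(metric_values, patience=10):
    """Return (best_value, eval_index_where_training_stops).

    Assumes lower is better and that the no-improvement counter
    resets whenever a new best value is found.
    """
    best = float("inf")
    bad_evals = 0
    for eval_index, value in enumerate(metric_values, start=1):
        if value < best:
            best = value
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best, eval_index
    # Patience never ran out; training used every evaluation.
    return best, len(metric_values)


# First value is the logged validation lm metric; the ten 60.0 values
# are made-up placeholders standing in for "no improvement".
history = [58.244] + [60.0] * 10
best, stopped_at = run_early_stopping(history, patience=10)
print(best, stopped_at)  # 58.244 11
```

Under these assumptions, a run whose first evaluation is also its best stops at the 11th evaluation. With valid_every=1000 that would be roughly 11,000 steps out of 124,453 batches, which would be consistent with training ending well before one epoch.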

Test results:
test/lm 63.491
test/edit 0.901
test/nonterm 0.030
test/length_diffs 108.739
test/target/lengths 117.224
test/output/lengths 10.051
test/output/repetition-1 0.069
test/output/repetition-2 0.043
test/output/repetition-3 0.031
test/output/repetition-4 0.020

Best,

Dong
