wellecks / mgs
MLE-Guided Parameter Search (AAAI 2021)
Hi Sean,
Thank you for sharing the source code. I am trying to reproduce the policy gradient results, but training ends early: the best validation result is reached at the first evaluation. Do you see the same behavior on your side across the 5 random seeds? My arguments and log are below.
adam_epsilon=1e-08, chunk_size_train=1024, chunk_size_valid=1024, context_length=10, data_path='/Tmp/slurm.887459.0/wikitext103_raw_gpt2bpe.pkl', decoding_len_factor=1.3, decoding_max_length=500, deterministic_decoding='greedy', eval_context_length=10, fixed_length=-1, include_greedy=0, include_target=0, lr=6.25e-05, max_epochs=100, max_grad_norm=1.0, metric='lm', no_checkpoint=False, normalized_distance=1, num_samples=4, patience=10, pg_mle_mix=0.1, print_every=100, score_mle_model_path='./MLE_seed_101/', seed=101, stochastic_decoding='temp-1.0', test_model_path='./PG_seed_101_eval_lm/', token_limit_train=1024, token_limit_valid=1024, tokenizer_cache_path='../models/tokenizer_cache', train_model_path='./MLE_seed_101/', transformers_cache_path='../models/transformers_cache', valid_every=1000, warmup_steps=0, weight_decay=0.01)
===========================================================================================
Build model.
Initialized models with 124.44M parameters.
===========================================================================================
Load data.
===========================================================================================
train size: 874556 (290473 discarded) (max_length 1017) (124453 batches).
test size: 2162 (729 discarded) (max_length 541) (300 batches).
valid size: 1896 (565 discarded) (max_length 470) (262 batches).
===========================================================================================
Start training.
===========================================================================================
Epoch: 1, Batch: [100/124453], Loss: -0.022, lm: 0.688
Epoch: 1, Batch: [200/124453], Loss: -0.141, lm: 0.598
Epoch: 1, Batch: [300/124453], Loss: -0.213, lm: 0.572
Epoch: 1, Batch: [400/124453], Loss: -0.244, lm: 0.559
Epoch: 1, Batch: [500/124453], Loss: -0.301, lm: 0.526
Epoch: 1, Batch: [600/124453], Loss: -0.315, lm: 0.523
Epoch: 1, Batch: [700/124453], Loss: -0.336, lm: 0.519
Epoch: 1, Batch: [800/124453], Loss: -0.323, lm: 0.523
Epoch: 1, Batch: [900/124453], Loss: -0.327, lm: 0.520
===========================================================================================
| Validation | Step: 1000 | Loss: 3.380 | PPL: 29.372 | lm: 58.244 |
Update best sequence-level metric [lm]: [58.244].
Save the model.
After this first evaluation, the lowest sequence-level loss does not change for 10 consecutive evaluations (patience=10), so training stops. The final evaluation results are very close.
Test results:
test/lm 63.491
test/edit 0.901
test/nonterm 0.030
test/length_diffs 108.739
test/target/lengths 117.224
test/output/lengths 10.051
test/output/repetition-1 0.069
test/output/repetition-2 0.043
test/output/repetition-3 0.031
test/output/repetition-4 0.020
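
To make the stopping behavior concrete, here is a minimal sketch of patience-based early stopping. It is not taken from the repository: the function name `early_stop` is hypothetical, it assumes a lower `lm` value is better, and every metric value after the first (58.244, from the log above) is a made-up placeholder.

```python
def early_stop(metric_values, patience):
    """Simulate patience-based early stopping on a lower-is-better metric.

    Returns (best_index, stop_index). stop_index is None if training
    would not stop within the given evaluations.
    """
    best_value = float("inf")
    best_index = None
    evals_without_improvement = 0
    for index, value in enumerate(metric_values):
        if value < best_value:
            # New best: remember it and reset the patience counter.
            best_value = value
            best_index = index
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return best_index, index
    return best_index, None


# First value is from the log; the remaining ten are hypothetical placeholders.
history = [58.244] + [60.0] * 10
best_index, stop_index = early_stop(history, patience=10)
print(best_index, stop_index)
```

Under these assumptions, the best checkpoint is the one saved at the first evaluation (step 1000), and training stops 10 evaluations later, which would match a run that keeps only the step-1000 model.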
Best,
Dong