Code Monkey home page Code Monkey logo

dynamic-evaluation's Introduction

Dynamic evaluation for pytorch language models as implemented in Dynamic Evaluation of Neural Sequence Models (https://arxiv.org/abs/1709.07432).

Requirements: python 3 (tested in 3.6), pytorch (tested in version 0.1.12)

Instructions for use:

  1. Train a language model using an existing repository, such as the pytorch language modelling tutorial https://github.com/pytorch/examples/tree/master/word_language_model . This should save a .pt file with the trained model

  2. Copy the file dynamiceval.py into the repository

  3. run dynamic evaluation with: python dynamiceval.py --model modelname.pt

To replicate results in paper for AWD-LSTM + dynamic eval, train the language model using this repository https://github.com/salesforce/awd-lstm-lm . We used the original codebase from this repository, with the goal of exact replication of results from their paper (which we failed to achieve). The default settings and hyperparameters for dynamiceval.py are the same as used for results in paper for AWD-LSTM + dynamic eval on both WikiText-2 and PTB.

To get stronger results with any other model, you can run with:

python dynamiceval.py --model modelname.pt --grid

This will do a hyperparameter search on the validation set, takes a few hours on PTB

command line arguements:

--model (required) -filename of the trained model to be evaluated

--data -location of the data corpus

--grid -hyperparameter grid search over lambda and eta, gives both valid and test error

--gridfast -same as grid, but only uses first 30k validation tokens for search

--val -measure validation error instead of test error

--gpu -specify a gpu device, uses device 0 by default (set negative for cpu)

--bptt -sequence segment length for dynamic eval, also used for gradient statistics on training data

--batch_size -batch size for gradient statistics on training data

--lr -learning rate eta (ignored if --grid is set)

--lamb -decay rate lambda (ignored if --grid is set)

--epsilon -stabilization parameter epsilon

--max_batches -max number of batches for training gradient statistics (-1 uses full training set)

--ms

-The code used to obtain results in the paper for AWD-LSTMs inadvertently used the root sum squared gradients in place of the root mean squared gradients from the training data (this is equation 6 in the paper). Using root sum squared gradients and root mean squared gradients are equivalent under a hyperparameter transformation. By default, this code uses sum of squared gradients on training data, but the --ms option can be used for mean squared gradients, which is consistent with the equations in the paper. Using the --ms flag and scaling the hyperparameters --lr and --epsilon each by (1/sqrt(N)), where N is the number of training batches, yields equivalent results to not using the --ms flag.

dynamic-evaluation's People

Watchers

 avatar  avatar  avatar

Forkers

nkamsteve

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.