
lstm's Introduction

Long Short Term Memory Units

This is a self-contained package to train a language model on the word-level Penn Tree Bank dataset. It achieves 115 perplexity for a small model in 1 hour, and 81 perplexity for a big model in a day. An ensemble of 38 big models gives 69 perplexity. This code is derived from https://github.com/wojciechz/learning_to_execute (the same author, but a different company).

More information: http://arxiv.org/pdf/1409.2329v4.pdf
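
For reference, the perplexity numbers are exp of the average per-word negative log-likelihood; the training loop quoted in the issues below computes exactly that as torch.exp(perps:mean()). A minimal sketch, with made-up losses:

  -- Illustrative only: hypothetical per-step cross-entropy losses (natural log).
  require 'torch'
  local perps = torch.Tensor({4.7, 4.5, 4.4})
  print(torch.exp(perps:mean()))   -- perplexity of roughly 93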

lstm's People

Contributors

alexbw, soumith, wojzaremba

lstm's Issues

epoch

Why are the epoch values not integers? Isn't an epoch one full pass over the training data?

epoch = 0.004
epoch = 0.104
epoch = 0.204
epoch = 0.304
epoch = 0.404
epoch = 0.504
epoch = 0.604
epoch = 0.703
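
For context, the fractional values are expected: the training loop reports epoch as the fraction of one full pass completed, step / epoch_size (see the main() code quoted further down on this page). A minimal sketch, with a made-up epoch_size, of how those fractions arise:

  -- Illustrative only: epoch_size is made up, but the formula mirrors main.lua.
  local epoch_size = 250             -- steps in one full pass over the training data
  for step = 1, 3 do
    -- early in training, this ratio is a small fraction of 1
    print(string.format("epoch = %.3f", step / epoch_size))
  end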

Illegal memory access error when running main.lua

I am trying to run this LSTM using the code from this repo.
Running ./th lstm/main.lua

[pavels@jones-6 pavels]$ warning: fbcunn not found. Falling back to cunn
Using 1-th gpu
Loading /local/pavels/lstm_zaremba/data/ptb.train.txt, size of data = 598835
Loading /local/pavels/lstm_zaremba/data/ptb.valid.txt, size of data = 54291
Loading /local/pavels/lstm_zaremba/data/ptb.test.txt, size of data = 53432
Network parameters:
{
  max_grad_norm : 5
  seq_length : 20
  batch_size : 20
  lr : 1
  max_max_epoch : 13
  rnn_size : 200
  init_weight : 0.1
  decay : 2
  dropout : 0
  layers : 2
  vocab_size : 10000
  max_epoch : 4
}
Creating a RNN LSTM network.
Starting training.
epoch = 0.007, train perp. = 0.000, wps = 3693, dw:norm() = 4.650, lr = 1.000, since beginning = 0 mins.
THCudaCheck FAIL file=/local/pavels/torch/extra/cutorch/init.c line=230 error=77 : an illegal memory access was encountered
/local/pavels/torch/install/bin/luajit: cuda runtime error (77) : an illegal memory access was encountered at /local/pavels/torch/extra/cutorch/lib/THC/generic/THCStorage.c:147
^C

This happens on a fresh Torch installation. I tried using different training data, but the same error is triggered.

Typo in your paper?

Hi,

Is there a typo in your paper on page 5? It currently reads:

"We train the model for 55 epochs with a learning rate of 1; after 14 epochs we start to reduce the learning rate by a factor of 1.15 after each epoch."

Should it be,

"We train the model for 55 epochs, initially with a learning rate of 1; after 14 epochs we start to reduce the learning rate by a factor of 1.15 after each epoch."

I think that's consistent with lines 259-260, 32, and 36.

Regards,

Aj

Edit: I think there might be a similar typo for the medium model as well.

replicate(x_inp, batch_size)

I'm having trouble understanding replicate.

-- Stacks replicated, shifted versions of x_inp
-- into a single matrix of size x_inp:size(1) x batch_size.
local function replicate(x_inp, batch_size)
   local s = x_inp:size(1)
   local x = torch.zeros(torch.floor(s / batch_size), batch_size)
   for i = 1, batch_size do
     local start = torch.round((i - 1) * s / batch_size) + 1
     local finish = start + x:size(1) - 1
     x:sub(1, x:size(1), i, i):copy(x_inp:sub(start, finish))
   end
   return x
end

From the comment I expected that the output would be a matrix of size number_of_words by batch_size.

But the output is torch.floor(s / batch_size) by batch_size.

If i load the first 5 lines from ptb.train.txt x is:

x:view(14, 8) -- change view for printing
  1   2   3   4   5   6   7   8
  9  10  11  12  13  14  15  16
 17  18  19  20  21  22  23  24
 25  26  27  28  29  30  31  32
 33  34  35  36  37  38  39  28
 25  40  27  41  42  43  27  44
 33  45  46  47  25  48  27  28
 29  30  49  50  42  43  51  52
 53  54  55  56  36  37  38  43
 57  58  59  60  25  36  61  43
 62  63  64  65  66  67  68  69
 70  71  36  72  73  43  74  75
 76  36  47  43  77  78  65  79
 80  81  28  29  82  83  84  25
[torch.DoubleTensor of size 14x8]

And the output from replicate is:

replicate(x, 20)
  1   7  12  18  23  29  35  28  43  46  29  51  56  58  61  66  36  75  78  28
  2   8  13  19  24  30  36  25  27  47  30  52  36  59  43  67  72  76  65  29
  3   9  14  20  25  31  37  40  44  25  49  53  37  60  62  68  73  36  79  82
  4  10  15  21  26  32  38  27  33  48  50  54  38  25  63  69  43  47  80  83
  5  11  16  22  27  33  39  41  45  27  42  55  43  36  64  70  74  43  81  84
[torch.DoubleTensor of size 5x20]

Why is every second column shifted by one, e.g. 5-7, 11-12, 16-18, 22-23, etc.?
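
For reference, a minimal sketch of how replicate (quoted above) picks each column's start index in this example (s = 112, batch_size = 20): the alternating one-word skips come from rounding, since s / batch_size = 5.6 is not an integer:

  -- math.floor(x + 0.5) mirrors the torch.round(...) call in replicate for these values
  local s, batch_size = 112, 20      -- the 14 x 8 = 112 words above
  for i = 1, 6 do                    -- first few columns only
    local start = math.floor((i - 1) * s / batch_size + 0.5) + 1
    print(i, start)                  -- prints starts 1, 7, 12, 18, 23, 29
  end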

attempt to call field 'LogSoftMax_updateOutput' (a nil value)

Hello, thanks for your code. But when I try it, it always fails with the following problem:

/home/shi/torch/install/share/lua/5.1/nn/LogSoftMax.lua:4: attempt to call field 'LogSoftMax_updateOutput' (a nil value)
stack traceback:
/home/shi/torch/install/share/lua/5.1/nn/LogSoftMax.lua:4: in function 'func'
/home/shi/torch/install/share/lua/5.1/nngraph/gmodule.lua:252: in function 'neteval'
/home/shi/torch/install/share/lua/5.1/nngraph/gmodule.lua:287: in function 'forward'
main.lua:165: in function 'fp'
main.lua:246: in function 'main'
main.lua:280: in main chunk
[C]: in function 'dofile'
.../shi/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Any suggestions for that? Thanks!

LSTM length

I have time-series history data, but the length is only 28 readings taken over 6 months. Can I use an RNN (LSTM) to predict the next 3 months of readings?

If yes, how can I optimize the accuracy for this small data set?

Invalid device function dropout

I get the following error when running main.lua

stack traceback:
[C]: in function 'copy'
...iavash/torch-distro/install/share/lua/5.1/nn/Dropout.lua:16: in function 'func'
...h/torch-distro/install/share/lua/5.1/nngraph/gmodule.lua:169: in function 'neteval'
...h/torch-distro/install/share/lua/5.1/nngraph/gmodule.lua:199: in function 'forward'
main.lua:148: in function 'fp'
main.lua:224: in function 'main'
main.lua:259: in main chunk
[C]: in function 'f'
[string "local f = function() return dofile 'main.lua'..."]:1: in main chunk
[C]: in function 'xpcall'
...iavash/torch-distro/install/share/lua/5.1/trepl/init.lua:583: in function 'repl'
...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
[C]: at 0x00406170

In the bp function, we should set ds:fill(1), not ds:zero()

I am confused about the backpropagation process. In the bp function, when i equals params.seq_length, shouldn't ds equal 1 rather than 0? (Your reset_ds() sets ds to 0.) The outermost derivative of a composite function should be 1: dy = 1 * d(x^2) = 2x; using 0 would give dy = 0 * d(x^2) = 0. Is this a bug?

  reset_ds()
  for i = params.seq_length, 1, -1 do
    state.pos = state.pos - 1
    local x = state.data[state.pos]
    local y = state.data[state.pos + 1]
    local s = model.s[i - 1]
    local derr = transfer_data(torch.ones(1))
    local tmp = model.rnns[i]:backward({x, y, s},
                                       {derr, model.ds})[3]
    g_replace_table(model.ds, tmp)
    cutorch.synchronize()
  end
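
For reference, a minimal sketch of how the quoted loop seeds its gradients (this only restates the code above; it is not an authoritative answer): the unit gradient the chain rule requires is supplied per timestep through derr, while ds carries the gradient flowing back into the recurrent state, which has nothing to receive at the last step of the truncated window:

  require 'torch'
  -- What the loop above seeds at each timestep (illustration only):
  local derr = torch.ones(1)   -- gradient of the scalar error output: the expected "outermost 1"
  local ds_seed = 0            -- gradient of the state leaving step seq_length; no later step
                               -- exists inside the window, so zero is the natural seed
  -- g_replace_table(model.ds, tmp) then carries the state gradient from step i to step i - 1.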

questions about g_cloneManyTimes

Hi,

I have some questions about the function:

g_cloneManyTimes

I understand that this unrolls the LSTM unit through time.
However, is the clone operation really needed?

I.e., can I do something like this:
in function setup:
model.rnns = core_network

and

in function fp
model.err[i], model.s[i] = unpack(model.rnns:forward({x, y, s}))

Best,
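
For context, a minimal sketch (using the standard nn clone/share API, not necessarily the exact g_cloneManyTimes implementation) of why per-timestep clones are used: every clone shares the parameters, but keeps its own output and gradInput buffers, so the forward pass stored for each timestep in fp is still available when bp runs backward through time:

  require 'nn'

  -- Hypothetical stand-in for the core network built in main.lua.
  local core = nn.Linear(4, 4)

  -- One parameter-sharing clone per timestep: weights and gradients are shared,
  -- but each clone has its own output/gradInput state.
  local clones = {}
  for t = 1, 3 do
    clones[t] = core:clone('weight', 'bias', 'gradWeight', 'gradBias')
  end

  -- Reusing a single module instead would overwrite module.output at every
  -- timestep, losing the activations that backward() needs later.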

Why does the perplexity become confused after reloading the model?

I want to save the trained model after every epoch. I save the local model defined in this code, along with extra data (step, etc.). Every time I reload from the previously saved model, the perplexity becomes very low (158) and starts to increase (not decrease), and the norm always stays at 0. It does not behave as it did before saving. What am I missing? The only thing I changed is the main function:

local function main()
   --torch.setnumthreads(10)
   --print('threads: ', torch.getnumthreads())
   g_make_deterministic(1)
   
   state_train = {data=ptb.traindataset(params.batch_size)}
   state_valid = {data=ptb.validdataset(params.batch_size)}
   state_test  = {data=ptb.testdataset(params.batch_size)}
   params.vocab_size = ptb.vocab_size()
   print('Network parameters')
   print(params)
      
   local states = {state_train, state_valid, state_test}
   for _, state in pairs(states) do
      reset_state(state)
   end

   setup()

   -- load saved model before train
   local saved_model
   local file = io.open(params.model_path, "rb")
   if file then
      file:close()
      saved_model = torch.load(params.model_path)
      print('load from previous saved model')
   end
   
   model = saved_model or model
   collectgarbage()

   state_train.pos = model.state_train_pos or 1
   params.lr = model.lr or params.lr
   local step = model.step or 0
   local epoch = model.epoch or 0
   local total_cases = model.total_cases or 0
   local tics = model.tics or 0
   local beginning_time = torch.tic() - tics
   local start_time = torch.tic() - tics
   print('Starting training')
   local words_per_step = params.seq_length * params.batch_size
   local epoch_size = torch.floor(state_train.data:size(1) / params.seq_length)
   local perps
   while epoch < params.max_max_epoch do
      local perp = fp(state_train)
      if perps == nil then
         perps = torch.zeros(epoch_size):add(perp)
      end
      perps[step % epoch_size + 1] = perp
      step = step + 1
      bp(state_train)
      total_cases = total_cases + params.seq_length * params.batch_size
      epoch = step / epoch_size
      if step % torch.round(epoch_size / 10) == 10 then
         local wps = torch.floor(total_cases / torch.toc(start_time))
         local since_beginning = g_d(torch.toc(beginning_time) / 60)
         print('epoch = ' .. g_f3(epoch) ..
               ', train perp. = ' .. g_f3(torch.exp(perps:mean())) ..
               ', wps = ' .. wps ..
               ', dw:norm() = ' .. g_f3(model.norm_dw) ..
               ', lr = ' .. g_f3(params.lr) ..
               ', since beginning = ' .. since_beginning .. ' mins')

         -- save model to model_path file
         model.step = step
         model.epoch = epoch
         model.total_cases = total_cases
         model.tics = torch.tic() - beginning_time
         model.state_train_pos = state_train.pos
         model.lr = params.lr
         --clear_state()
         torch.save(params.model_path, model)
      end
      if step % epoch_size == 0 then
         run_valid()
         if epoch > params.max_epoch then
            params.lr = params.lr / params.decay
         end
      end
      if step % 33 == 0 then
         collectgarbage()
      end
   end
   run_test()
   print('training is over.')
end

When training from scratch, it works: tmpdir--17_3_6_15_27_50
When training from the previously saved model, it does not: tmpdir--17_3_6_15_29_26

Why? Does anyone know the reason?

why split gates along dimension 2?

Hi, I cannot understand why, in the following code, reshaped_gates is split along dimension 2 rather than dimension 1, since the output of reshaped_gates is a 4 x rnn_size matrix.

  -- Reshape to (batch_size, n_gates, hid_size)
  -- Then slice the n_gates dimension, i.e. dimension 2
  local reshaped_gates = nn.Reshape(4,params.rnn_size)(gates)
  local sliced_gates = nn.SplitTable(2)(reshaped_gates)

Could anybody explain it? Thanks in advance!
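
For reference, a minimal sketch (assuming the standard Torch nn package) of why dimension 2 is the gate dimension: once a mini-batch passes through, the reshaped tensor is batch_size x 4 x rnn_size, so the four gates lie along dimension 2, and nn.SplitTable(2) yields four batch_size x rnn_size tensors:

  require 'nn'

  local batch_size, rnn_size = 3, 5
  local gates = torch.randn(batch_size, 4 * rnn_size)      -- batched gate pre-activations

  local reshaped = nn.Reshape(4, rnn_size):forward(gates)  -- size: batch_size x 4 x rnn_size
  print(reshaped:size())                                   -- 3 x 4 x 5

  local sliced = nn.SplitTable(2):forward(reshaped)        -- table with one tensor per gate
  print(#sliced, sliced[1]:size())                         -- 4 tensors, each 3 x 5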
