
lstm's Introduction

Long Short Term Memory Units

This is a self-contained package to train a language model on the word-level Penn Tree Bank dataset. It achieves 115 perplexity for a small model in 1 hour, and 81 perplexity for a big model in a day. An ensemble of 38 big models gives 69 perplexity. This code is derived from https://github.com/wojciechz/learning_to_execute (the same author, but a different company).

More information: http://arxiv.org/pdf/1409.2329v4.pdf
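
For reference, the perplexity numbers are exp of the average per-word negative log-likelihood; the training loop quoted in the issues below computes exactly that as torch.exp(perps:mean()). A minimal sketch, with made-up losses:

  -- Illustrative only: hypothetical per-step cross-entropy losses (natural log).
  require 'torch'
  local perps = torch.Tensor({4.7, 4.5, 4.4})
  print(torch.exp(perps:mean()))   -- perplexity of roughly 93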

lstm's People

Contributors

alexbw, soumith, wojzaremba

lstm's Issues

epoch

Why are the epoch values not integers? Isn't an epoch one full pass over the training data?

epoch = 0.004
epoch = 0.104
epoch = 0.204
epoch = 0.304
epoch = 0.404
epoch = 0.504
epoch = 0.604
epoch = 0.703
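
For context, the fractional values are expected: the training loop reports epoch as the fraction of one full pass completed, step / epoch_size (see the main() code quoted further down on this page). A minimal sketch, with a made-up epoch_size, of how those fractions arise:

  -- Illustrative only: epoch_size is made up, but the formula mirrors main.lua.
  local epoch_size = 250             -- steps in one full pass over the training data
  for step = 1, 3 do
    -- early in training, this ratio is a small fraction of 1
    print(string.format("epoch = %.3f", step / epoch_size))
  end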

Illegal memory access error when running main.lua

I am trying to run this LSTM using the code from this repo.
Running ./th lstm/main.lua

[pavels@jones-6 pavels]$ warning: fbcunn not found. Falling back to cunn
Using 1-th gpu
Loading /local/pavels/lstm_zaremba/data/ptb.train.txt, size of data = 598835
Loading /local/pavels/lstm_zaremba/data/ptb.valid.txt, size of data = 54291
Loading /local/pavels/lstm_zaremba/data/ptb.test.txt, size of data = 53432
Network parameters:
{
  max_grad_norm : 5
  seq_length : 20
  batch_size : 20
  lr : 1
  max_max_epoch : 13
  rnn_size : 200
  init_weight : 0.1
  decay : 2
  dropout : 0
  layers : 2
  vocab_size : 10000
  max_epoch : 4
}
Creating a RNN LSTM network.
Starting training.
epoch = 0.007, train perp. = 0.000, wps = 3693, dw:norm() = 4.650, lr = 1.000, since beginning = 0 mins.
THCudaCheck FAIL file=/local/pavels/torch/extra/cutorch/init.c line=230 error=77 : an illegal memory access was encountered
/local/pavels/torch/install/bin/luajit: cuda runtime error (77) : an illegal memory access was encountered at /local/pavels/torch/extra/cutorch/lib/THC/generic/THCStorage.c:147
^C

This happens on a fresh Torch installation. I tried using different training data, but the same error is triggered.

Typo in your paper?

Hi,

Is there a typo in your paper on page 5? It currently reads:

"We train the model for 55 epochs with a learning rate of 1; after 14 epochs we start to reduce the learning rate by a factor of 1.15 after each epoch."

Should it be,

"We train the model for 55 epochs, initially with a learning rate of 1; after 14 epochs we start to reduce the learning rate by a factor of 1.15 after each epoch."

I think that's consistent with lines 259-260, 32, and 36.

Regards,

Aj

Edit: I think there might be a similar typo for the medium model as well.

replicate(x_inp, batch_size)

I'm having trouble understanding replicate.

-- Stacks replicated, shifted versions of x_inp
-- into a single matrix of size x_inp:size(1) x batch_size.
local function replicate(x_inp, batch_size)
   local s = x_inp:size(1)
   local x = torch.zeros(torch.floor(s / batch_size), batch_size)
   for i = 1, batch_size do
     local start = torch.round((i - 1) * s / batch_size) + 1
     local finish = start + x:size(1) - 1
     x:sub(1, x:size(1), i, i):copy(x_inp:sub(start, finish))
   end
   return x
end

From the comment I expected that the output would be a matrix of size number_of_words by batch_size.

But the output is torch.floor(s / batch_size) by batch_size.

If i load the first 5 lines from ptb.train.txt x is:

x:view(14, 8) -- change view for printing
  1   2   3   4   5   6   7   8
  9  10  11  12  13  14  15  16
 17  18  19  20  21  22  23  24
 25  26  27  28  29  30  31  32
 33  34  35  36  37  38  39  28
 25  40  27  41  42  43  27  44
 33  45  46  47  25  48  27  28
 29  30  49  50  42  43  51  52
 53  54  55  56  36  37  38  43
 57  58  59  60  25  36  61  43
 62  63  64  65  66  67  68  69
 70  71  36  72  73  43  74  75
 76  36  47  43  77  78  65  79
 80  81  28  29  82  83  84  25
[torch.DoubleTensor of size 14x8]

And the output from replicate is:

replicate(x, 20)
  1   7  12  18  23  29  35  28  43  46  29  51  56  58  61  66  36  75  78  28
  2   8  13  19  24  30  36  25  27  47  30  52  36  59  43  67  72  76  65  29
  3   9  14  20  25  31  37  40  44  25  49  53  37  60  62  68  73  36  79  82
  4  10  15  21  26  32  38  27  33  48  50  54  38  25  63  69  43  47  80  83
  5  11  16  22  27  33  39  41  45  27  42  55  43  36  64  70  74  43  81  84
[torch.DoubleTensor of size 5x20]

Why is every second column shifted by one, e.g. 5-7, 11-12, 16-18, 22-23, etc.?
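
For reference, a minimal sketch of how replicate (quoted above) picks each column's start index in this example (s = 112, batch_size = 20): the alternating one-word skips come from rounding, since s / batch_size = 5.6 is not an integer:

  -- math.floor(x + 0.5) mirrors the torch.round(...) call in replicate for these values
  local s, batch_size = 112, 20      -- the 14 x 8 = 112 words above
  for i = 1, 6 do                    -- first few columns only
    local start = math.floor((i - 1) * s / batch_size + 0.5) + 1
    print(i, start)                  -- prints starts 1, 7, 12, 18, 23, 29
  end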

attempt to call field 'LogSoftMax_updateOutput' (a nil value)

Hello, thanks for your code. But when I try it, it always fails with the following problem:

/home/shi/torch/install/share/lua/5.1/nn/LogSoftMax.lua:4: attempt to call field 'LogSoftMax_updateOutput' (a nil value)
stack traceback:
/home/shi/torch/install/share/lua/5.1/nn/LogSoftMax.lua:4: in function 'func'
/home/shi/torch/install/share/lua/5.1/nngraph/gmodule.lua:252: in function 'neteval'
/home/shi/torch/install/share/lua/5.1/nngraph/gmodule.lua:287: in function 'forward'
main.lua:165: in function 'fp'
main.lua:246: in function 'main'
main.lua:280: in main chunk
[C]: in function 'dofile'
.../shi/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Any suggestions for that? Thanks!

LSTM length

I have time-series history data, but the length is only 28 readings taken over 6 months. Can I use an RNN (LSTM) to predict the next 3 months of readings?

If yes, how can I optimize the accuracy for this small data set?

Invalid device function dropout

I get the following error when running main.lua

stack traceback:
[C]: in function 'copy'
...iavash/torch-distro/install/share/lua/5.1/nn/Dropout.lua:16: in function 'func'
...h/torch-distro/install/share/lua/5.1/nngraph/gmodule.lua:169: in function 'neteval'
...h/torch-distro/install/share/lua/5.1/nngraph/gmodule.lua:199: in function 'forward'
main.lua:148: in function 'fp'
main.lua:224: in function 'main'
main.lua:259: in main chunk
[C]: in function 'f'
[string "local f = function() return dofile 'main.lua'..."]:1: in main chunk
[C]: in function 'xpcall'
...iavash/torch-distro/install/share/lua/5.1/trepl/init.lua:583: in function 'repl'
...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
[C]: at 0x00406170

In the bp function, we should set ds:fill(1), not ds:zero()

I am confused about the backpropagation process. In the bp function, when i equals params.seq_length, shouldn't ds equal 1 rather than 0? (Your reset_ds() sets ds to 0.) The outermost derivative of a composite function should be 1: dy = 1 * d(x^2) = 2x; using 0 would give dy = 0 * d(x^2) = 0. Is this a bug?

  reset_ds()
  for i = params.seq_length, 1, -1 do
    state.pos = state.pos - 1
    local x = state.data[state.pos]
    local y = state.data[state.pos + 1]
    local s = model.s[i - 1]
    local derr = transfer_data(torch.ones(1))
    local tmp = model.rnns[i]:backward({x, y, s},
                                       {derr, model.ds})[3]
    g_replace_table(model.ds, tmp)
    cutorch.synchronize()
  end
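
For reference, a minimal sketch of how the quoted loop seeds its gradients (this only restates the code above; it is not an authoritative answer): the unit gradient the chain rule requires is supplied per timestep through derr, while ds carries the gradient flowing back into the recurrent state, which has nothing to receive at the last step of the truncated window:

  require 'torch'
  -- What the loop above seeds at each timestep (illustration only):
  local derr = torch.ones(1)   -- gradient of the scalar error output: the expected "outermost 1"
  local ds_seed = 0            -- gradient of the state leaving step seq_length; no later step
                               -- exists inside the window, so zero is the natural seed
  -- g_replace_table(model.ds, tmp) then carries the state gradient from step i to step i - 1.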

questions about g_cloneManyTimes

Hi,

I have some questions about the function:

g_cloneManyTimes

I understand that this unrolls the LSTM unit through time.
However, is the clone operation really needed?

I.e., can I do something like this:
in function setup:
model.rnns = core_network

and

in function fp
model.err[i], model.s[i] = unpack(model.rnns:forward({x, y, s}))

Best,
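
For context, a minimal sketch (using the standard nn clone/share API, not necessarily the exact g_cloneManyTimes implementation) of why per-timestep clones are used: every clone shares the parameters, but keeps its own output and gradInput buffers, so the forward pass stored for each timestep in fp is still available when bp runs backward through time:

  require 'nn'

  -- Hypothetical stand-in for the core network built in main.lua.
  local core = nn.Linear(4, 4)

  -- One parameter-sharing clone per timestep: weights and gradients are shared,
  -- but each clone has its own output/gradInput state.
  local clones = {}
  for t = 1, 3 do
    clones[t] = core:clone('weight', 'bias', 'gradWeight', 'gradBias')
  end

  -- Reusing a single module instead would overwrite module.output at every
  -- timestep, losing the activations that backward() needs later.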

Why does the perplexity become confused after reloading the model?

I want to save the trained model after every epoch. I save the local model defined in this code, along with extra data (step, etc.). Every time I reload from the previously saved model, the perplexity becomes very low (158) and starts to increase (not decrease), and the norm always stays at 0. It does not behave as it did before saving. What am I missing? The only thing I changed is the main function:

local function main()
   --torch.setnumthreads(10)
   --print('threads: ', torch.getnumthreads())
   g_make_deterministic(1)
   
   state_train = {data=ptb.traindataset(params.batch_size)}
   state_valid = {data=ptb.validdataset(params.batch_size)}
   state_test  = {data=ptb.testdataset(params.batch_size)}
   params.vocab_size = ptb.vocab_size()
   print('Network parameters')
   print(params)
      
   local states = {state_train, state_valid, state_test}
   for _, state in pairs(states) do
      reset_state(state)
   end

   setup()

   -- load saved model before train
   local saved_model
   local file = io.open(params.model_path, "rb")
   if file then
      file:close()
      saved_model = torch.load(params.model_path)
      print('load from previous saved model')
   end
   
   model = saved_model or model
   collectgarbage()

   state_train.pos = model.state_train_pos or 1
   params.lr = model.lr or params.lr
   local step = model.step or 0
   local epoch = model.epoch or 0
   local total_cases = model.total_cases or 0
   local tics = model.tics or 0
   local beginning_time = torch.tic() - tics
   local start_time = torch.tic() - tics
   print('Starting training')
   local words_per_step = params.seq_length * params.batch_size
   local epoch_size = torch.floor(state_train.data:size(1) / params.seq_length)
   local perps
   while epoch < params.max_max_epoch do
      local perp = fp(state_train)
      if perps == nil then
         perps = torch.zeros(epoch_size):add(perp)
      end
      perps[step % epoch_size + 1] = perp
      step = step + 1
      bp(state_train)
      total_cases = total_cases + params.seq_length * params.batch_size
      epoch = step / epoch_size
      if step % torch.round(epoch_size / 10) == 10 then
         local wps = torch.floor(total_cases / torch.toc(start_time))
         local since_beginning = g_d(torch.toc(beginning_time) / 60)
         print('epoch = ' .. g_f3(epoch) ..
               ', train perp. = ' .. g_f3(torch.exp(perps:mean())) ..
               ', wps = ' .. wps ..
               ', dw:norm() = ' .. g_f3(model.norm_dw) ..
               ', lr = ' .. g_f3(params.lr) ..
               ', since beginning = ' .. since_beginning .. ' mins')

         -- save model to model_path file
         model.step = step
         model.epoch = epoch
         model.total_cases = total_cases
         model.tics = torch.tic() - beginning_time
         model.state_train_pos = state_train.pos
         model.lr = params.lr
         --clear_state()
         torch.save(params.model_path, model)
      end
      if step % epoch_size == 0 then
         run_valid()
         if epoch > params.max_epoch then
            params.lr = params.lr / params.decay
         end
      end
      if step % 33 == 0 then
         collectgarbage()
      end
   end
   run_test()
   print('training is over.')
end

When training from scratch, it works: tmpdir--17_3_6_15_27_50
When training from the previously saved model, it does not: tmpdir--17_3_6_15_29_26

Why? Does anyone know the reason?

why split gates along dimension 2?

Hi, I cannot understand why, in the following code, reshaped_gates is split along dimension 2 rather than dimension 1, since the output of reshaped_gates is a 4 x rnn_size matrix.

  -- Reshape to (batch_size, n_gates, hid_size)
  -- Then slice the n_gates dimension, i.e. dimension 2
  local reshaped_gates = nn.Reshape(4,params.rnn_size)(gates)
  local sliced_gates = nn.SplitTable(2)(reshaped_gates)

Could anybody explain it? Thanks in advance!
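
For reference, a minimal sketch (assuming the standard Torch nn package) of why dimension 2 is the gate dimension: once a mini-batch passes through, the reshaped tensor is batch_size x 4 x rnn_size, so the four gates lie along dimension 2, and nn.SplitTable(2) yields four batch_size x rnn_size tensors:

  require 'nn'

  local batch_size, rnn_size = 3, 5
  local gates = torch.randn(batch_size, 4 * rnn_size)      -- batched gate pre-activations

  local reshaped = nn.Reshape(4, rnn_size):forward(gates)  -- size: batch_size x 4 x rnn_size
  print(reshaped:size())                                   -- 3 x 4 x 5

  local sliced = nn.SplitTable(2):forward(reshaped)        -- table with one tensor per gate
  print(#sliced, sliced[1]:size())                         -- 4 tensors, each 3 x 5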
