
neuralconvo's Introduction

Neural Conversational Model in Torch

This is an attempt at implementing Sequence to Sequence Learning with Neural Networks (seq2seq) and reproducing the results in A Neural Conversational Model (aka the Google chatbot).

The Google chatbot paper became famous after cleverly answering a few philosophical questions, such as:

Human: What is the purpose of living?
Machine: To live forever.

How it works

The model is based on two LSTM layers: one for encoding the input sentence into a "thought vector", and another for decoding that vector into a response. This architecture is called sequence-to-sequence, or seq2seq.

[Diagram: the seq2seq encoder/decoder architecture]
Source: http://googleresearch.blogspot.ca/2015/11/computer-respond-to-this-email.html
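
For illustration, here is a minimal sketch of that two-LSTM structure using the nn and rnn packages. This is not the project's exact seq2seq.lua; the sizes and layer choices are assumptions:

    require 'nn'
    require 'rnn'

    local vocabSize, hiddenSize = 7061, 1000  -- illustrative sizes

    -- Encoder: embed each input word id, run the sequence through an LSTM,
    -- and keep only the last hidden state: the "thought vector".
    local encoder = nn.Sequential()
    encoder:add(nn.LookupTable(vocabSize, hiddenSize))
    encoder:add(nn.SplitTable(1))
    encoder:add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))
    encoder:add(nn.SelectTable(-1))

    -- Decoder: another LSTM unrolled from that state, predicting a
    -- distribution over the vocabulary at each time step.
    local decoder = nn.Sequential()
    decoder:add(nn.LookupTable(vocabSize, hiddenSize))
    decoder:add(nn.SplitTable(1))
    decoder:add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))
    decoder:add(nn.Sequencer(nn.Linear(hiddenSize, vocabSize)))
    decoder:add(nn.Sequencer(nn.LogSoftMax()))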

In this experiment, we train the seq2seq model with movie dialogs from the Cornell Movie-Dialogs Corpus. The lines are shortened to the first sentence.

Sample conversation

Here's a sample conversation after training for 20 epochs on 50,000 examples, using the following command:

th train.lua --cuda --dataset 50000 --hiddenSize 1000

(Took 3 days to train on my GeForce GTX 780M.)

For OpenCL, use --opencl instead of --cuda. To train on the CPU, omit both flags.

WARNING: I can no longer reproduce those results. The cause seems to be a change in one of the dependencies. I'm currently working on a new implementation based on harvardnlp/seq2seq-attn.

me: Hello?
bot: Hi.

me: How are you?
bot: I'm fine.

me: What's your name?
bot: It's hard to describe.

me: How so?
bot: I'm not sure.

me: What color is the sky?
bot: It's blue.

me: What is your job?
bot: It's not that i'm a fucking werewolf!

me: What is the purpose of life?
bot: A gift.

me: Are you intelligent?
bot: Yes, well...

me: Are you a machine?
bot: That's a lie.

me: Are you human?
bot: No, i'm not.

me: What are you?
bot: I'm not sure.

me: Do you plan on taking over the world?
bot: No, i don't.

Phew! That was close. Good thing I didn't train it on the full dataset. Please experiment responsibly.

(Disclaimer: nonsensical responses have been removed.)

Installing

  1. Install Torch.

  2. Install the following additional Lua libs:

    luarocks install nn
    luarocks install rnn
    luarocks install penlight

    To train with CUDA, install the latest CUDA drivers and toolkit, then run:

    luarocks install cutorch
    luarocks install cunn

    To train with OpenCL, install the latest OpenCL Torch libs:

    luarocks install cltorch
    luarocks install clnn
  3. Download the Cornell Movie-Dialogs Corpus and extract all the files into data/cornell_movie_dialogs.

Training

th train.lua [-h / options]

The model will be saved to data/model.t7 after each epoch if it has improved (error decreased).
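
In sketch form, the save-on-improvement check amounts to this (variable names are illustrative, not the exact train.lua code):

    -- Checkpoint only when this epoch's error beats the best seen so far.
    if epochError < minError then
      minError = epochError
      torch.save("data/model.t7", model)
    end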

Options (some, not all)

  • --opencl use OpenCL for computation (requires torch-cl)
  • --cuda use CUDA for computation
  • --gpu [index] use the nth GPU for computation (e.g. on a 2015 MacBook, --gpu 0 uses the Intel GPU while --gpu 1 uses the far more powerful AMD GPU)
  • --dataset [size] control the size of the dataset
  • --maxEpoch [amount] specify the number of epochs to run
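
For example, to train with CUDA on the second GPU, capping the dataset at 30000 examples and stopping after 25 epochs (the values are illustrative):

    th train.lua --cuda --gpu 1 --dataset 30000 --maxEpoch 25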

Testing

To load the model and have a conversation:

th eval.lua
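
Run it as th -i eval.lua to stay in the Torch REPL after the model loads; there, as in the transcripts in the issues below, you talk to the bot through the say helper:

    th> say "Hello."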

License

MIT License

Copyright (c) 2016 Marc-Andre Cournoyer

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

neuralconvo's People

Contributors

chenb67, danpol, lfuelling, macournoyer, niiamon, spro, tigerneil


neuralconvo's Issues

File.lua:375: unknown object

Hi.

Thank you for your research; I'd like to reproduce it on my machine.

When I tried to execute your solution, I got the following error. Can you help me?

$> th train.lua --hiddenSize 1000 --dataset 5000
-- Loading dataset
data/vocab.t7 not found
-- Parsing Cornell movie dialogs data set ...
 [======================================== 387810/387810 ==============================>]  Tot: 2s117ms | Step: 0ms     
-- Pre-processing data
 [======================================== 5000/5000 ==================================>]  Tot: 1s178ms | Step: 0ms     
-- Removing low frequency words
 [======================================== 8151/8151 ==================================>]  Tot: 645ms | Step: 0ms       
Writing data/examples.t7 ...
 [======================================== 8151/8151 ==================================>]  Tot: 1s492ms | Step: 0ms 
Writing data/vocab.t7 ...

Dataset stats:
  Vocabulary size: 7061
         Examples: 8151

-- Epoch 1 / 50

 [================>....................... 1672/8151 ...................................]  ETA: 2h9m | Step: 1s201ms    
/home/y/torch/install/bin/luajit: /home/y/torch/install/share/lua/5.1/torch/File.lua:375: unknown objectStep: 1s551ms   
stack traceback:
        [C]: in function 'error'
        /home/y/torch/install/share/lua/5.1/torch/File.lua:375: in function 'readObject'
        ./dataset.lua:138: in function '(for generator)'
        train.lua:65: in main chunk
        [C]: in function 'dofile'
        /home/y/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x0804cae0

Error when trying to load model

There is a problem when I run th -i eval.lua.

This is what I've got:

Loading vocabulary from data/vocab.t7 ...
-- Loading model
/Users/daniel/torch/install/bin/luajit: /Users/daniel/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <torch.CudaTensor>
stack traceback:
    [C]: in function 'error'
    /Users/daniel/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
    /Users/daniel/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /Users/daniel/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /Users/daniel/torch/install/share/lua/5.1/torch/File.lua:353: in function 'readObject'
    /Users/daniel/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /Users/daniel/torch/install/share/lua/5.1/torch/File.lua:353: in function 'readObject'
    /Users/daniel/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
    eval.lua:26: in main chunk
    [C]: in function 'dofile'
    ...niel/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x0102afbbd0

The result is not like what you said

I used this command to train:

th train.lua --dataset 50000 --hiddenSize 1000

Though it ran on CPU and took weeks, it trained, and the test looks like this:
[screenshot: result]

cltorch and clnn support evaluation error

I actually succeeded, with @hughperkins's help, in making neuralconvo trainable under cltorch and clnn. The project is here.

However, after training, when I test it with eval.lua, the following error occurs. Any comments?

./seq2seq.lua:124: attempt to call method 'sort' (a nil value)
stack traceback:
    ./seq2seq.lua:124: in function 'eval'
    eval.lua:70: in function 'say'
    [string "_RESULT={say "hello"}"]:1: in main chunk
    [C]: in function 'xpcall'
    /Users/zhuxiaohu/torch/install/share/lua/5.1/trepl/init.lua:650: in function 'repl'
    ...aohu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:197: in main chunk
    [C]: at 0x0104929bc0

Maximum Vocabulary Size

Hi @macournoyer

We replace words with the "unknown" token once the number of unique words encountered equals the vocab size.

if self.maxVocabSize > 0 and self.wordsCount >= self.maxVocabSize then
    -- We've reached the maximum size for the vocab. Replace w/ unknown token
    return self.unknownToken
end

I think we might get better results if we replaced words based on their frequency in the corpus rather than their order of occurrence: rank words by frequency and keep the most frequent until we hit the vocab size, replacing the rest. What do you think?

This might be one reason for the inferior results when we restrict the vocabulary.
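
For what it's worth, a minimal sketch of the frequency-based alternative proposed above. The names (wordFreq, ids, maxVocabSize, unknownToken) are hypothetical, not taken from dataset.lua:

    -- Rank words by corpus frequency, keep the top maxVocabSize,
    -- and map everything else to the unknown token.
    local ranked = {}
    for word, count in pairs(wordFreq) do
      table.insert(ranked, {word = word, count = count})
    end
    table.sort(ranked, function(a, b) return a.count > b.count end)

    local keep = {}
    for i = 1, math.min(#ranked, maxVocabSize) do
      keep[ranked[i].word] = true
    end

    local function wordId(word)
      return keep[word] and ids[word] or unknownToken
    end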

Using LR decay with Adam

Hi @macournoyer

Do we need to use LR decay along with Adam? I would guess Adam already scales the learning rate, since it accumulates gradient statistics.

This is just my intuition; I was wondering if you tried both cases and came to a conclusion?
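
For reference, the two can be combined: optim's adam takes a base learningRate, and newer versions of optim also honour a learningRateDecay that anneals it on top of Adam's per-parameter scaling. A hedged sketch (whether the extra decay actually helps is exactly the open question here):

    require 'optim'

    -- Illustrative config only.
    local optimState = {
      learningRate = 0.001,
      learningRateDecay = 1e-4,  -- anneals roughly as lr / (1 + t * decay)
    }
    -- in the training loop: optim.adam(feval, params, optimState)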

"cltorch" and "clnn" required even when running with "--cuda"

I am running the training module using the "--cuda" parameter.

th train.lua --cuda --dataset 5000 --hiddenSize 100

But since the file "neuralconvo.lua" requires "cltorch" and "clnn", the training fails. Is this intended? Can we safely remove the requirement for "cltorch" and "clnn" from "neuralconvo.lua"?

Why do we append all the previous tokens of the decoder to generate next token?

Hi @macournoyer @chenb67 ,

In the 'eval' function of seq2seq.lua, we append the previous tokens generated by the decoder in order to generate the next token. Why do we do this?

My understanding was that the decoder should be able to remember the older tokens using its memory, so we may not need to feed all the generated tokens again; feeding only the latest generated token might be enough.


local prediction = self.decoder:forward(torch.Tensor(output))[#output]
-- ...
next_output = wordIds[1]
-- Here we are appending the previously generated tokens
table.insert(output, next_output)

Getting <unknown> as result

Hi all,

I have trained a model for some days with a dataset size of 50000. Everything was fine until I tried to eval it, and I got:

you> Hi
neuralconvo> <unknown>.
you> How are you?
neuralconvo> <unknown>.
you> What is your name?
neuralconvo> <unknown>.

Every response is <unknown>. This is weird and I cannot find the reason, because no error has occurred so far.
So do you have any thoughts on this?

Thanks
Vincent

cuda rnn training failed on assert after if torch.isTensor(input)

-- Epoch 1 / 50

torch/install/bin/luajit: ...o/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: expecting target table
stack traceback:
[C]: in function 'assert'
torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: in function 'forward'
./seq2seq.lua:74: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

SequencerCriterion.lua:42: expecting target table

stormy@stormy-desktop:/media/stormy/Digital-Data/AI/neuralconvo$ th train.lua --dataset 1000 --hiddenSize 1000
-- Loading dataset
Loading vocabulary from data/vocab.t7 ...

Dataset stats:
Vocabulary size: 25931
Examples: 83632

-- Epoch 1 / 50
/home/stormy/torch/install/bin/luajit: ...y/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: expecting target table
stack traceback:
[C]: in function 'assert'
...y/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: in function 'forward'
./seq2seq.lua:74: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
...ormy/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50

'dpnn' update problem

whenever I run th train.lua, I get the error message:

/auto/extra/b01902004/torch/install/bin/luajit: ...tra/b01902004/torch/install/share/lua/5.1/trepl/init.lua:383: ...tra/b01902004/torch/install/share/lua/5.1/trepl/init.lua:383: ...extra/b01902004/torch/install/share/lua/5.1/rnn/init.lua:4: Please update dpnn : luarocks install dpnn
stack traceback:
    [C]: in function 'error'
    ...tra/b01902004/torch/install/share/lua/5.1/trepl/init.lua:383: in function 'require'
    train.lua:1: in main chunk
    [C]: in function 'dofile'
    ...2004/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405be0

After updating dpnn via luarocks install dpnn and running th train.lua again, I get the exact same error message.

The output of luarocks show dpnn:

dpnn scm-1 - deep extensions to nn Modules and Criterions

sharedClone, type, outside, updateGradParameters, Serial, Inception, etc.

License:    BSD
Homepage:   https://github.com/Element-Research/dpnn

Parameter updates happening after every training example

Hi @macournoyer

We divide the training examples into mini-batches while training. However, it looks like we update the parameters of the network after every single training example.

self.encoder:updateGradParameters(self.momentum)
self.decoder:updateGradParameters(self.momentum)
self.decoder:updateParameters(self.learningRate)
self.encoder:updateParameters(self.learningRate)
self.encoder:zeroGradParameters()
self.decoder:zeroGradParameters()

Are we intentionally doing some kind of online training? I guess we could average the gradients over all the samples of a mini-batch and then update the parameters once?

Thanks,
Vikram
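
For what it's worth, a sketch of the averaging variant suggested above (hypothetical names, not the project's code). nn modules accumulate gradients across backward calls, so a single update scaled by 1/batchSize applies the mean gradient:

    encoder:zeroGradParameters()
    decoder:zeroGradParameters()

    for i = 1, batchSize do
      forwardBackward(batch[i])  -- each :backward() adds into gradParameters
    end

    -- One update with the averaged gradient:
    encoder:updateParameters(learningRate / batchSize)
    decoder:updateParameters(learningRate / batchSize)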

eval.lua sort nil value

Attempting to run eval.lua on a trained model (5000 dataset, 1000 hidden) throws a 'sort' nil value error. All the appropriate Torch/Lua modules are installed and functioning as far as I can tell, and I even tried rebuilding them with CMake 3.2.

user@neuralbox:~/neuralconvo$ th -i eval.lua
libthclnn_searchpath    /home/user/torch/install/lib/lua/5.1/libTHCLNN.so
Loading vocabulary from data/vocab.t7 ...
-- Loading model
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Tahiti

  ______             __   |  Torch7
 /_  __/__  ________/ /   |  Scientific computing for Lua.
  / / / _ \/ __/ __/ _ \  |  Type ? for help
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch
                          |  http://torch.ch

th> say "Hello."
./seq2seq.lua:123: attempt to call method 'sort' (a nil value)
stack traceback:
        ./seq2seq.lua:123: in function 'eval'
        eval.lua:66: in function 'say'
        [string "_RESULT={say "Hello."}"]:1: in main chunk
        [C]: in function 'xpcall'
        /home/user/torch/install/share/lua/5.1/trepl/init.lua:651: in function 'repl'
        ...user/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
        [C]: at 0x00406670
                                                                      [1.2571s]
th>

Unable to convert argument 3 from cdata

I'm using Lua 5.1 and Torch7. While trying to execute

th train.lua --cuda --dataset 50000 --hiddenSize 1000

I'm getting the below error:

Dataset stats:
  Vocabulary size: 25931
         Examples: 83632

-- Epoch 1 / 50

/home/siva/torch/install/bin/lua: unable to convert argument 3 from cdata<struct THCudaTensor*> to cdata<struct THCudaLongTensor*>
stack traceback:
        [C]: in function 'v'
        /home/siva/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'ClassNLLCriterion_updateOutput'
        ...torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:41: in function <...torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:29>
        (tail call): ?
        ...rch/install/share/lua/5.1/rnn/SequencerCriterion.lua:55: in function <...rch/install/share/lua/5.1/rnn/SequencerCriterion.lua:39>
        (tail call): ?
        ./seq2seq.lua:74: in function 'train'
        train.lua:85: in main chunk
        [C]: in function 'dofile'
        .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: ?


Segmentation fault (core dumped)

After successful installation of Torch and dependencies, I started training with th train.lua --cuda --dataset 5000 --hiddenSize 1000

and I got this in my terminal

[screenshot: seg_fault_torch]

If anyone needs more information, please comment.

'unknown object' error when I tried to load examples.t7

First of all, thanks for sharing your neural conversational model implementation.

I want to inspect the 'examples' variable in your source code. Specifically, I'd like to ask for a solution to an error that occurs when I try to load the 'examples.t7' file written by your code.

As a prerequisite for loading 'examples.t7' in Torch7 interactive mode, I first typed
require 'neuralconvo'
require 'cutorch'

Then, I typed
examples = torch.load('data/examples.t7')
and the following error occurred.

/home/kenkim/torch/install/share/lua/5.1/torch/File.lua:294: unknown object.

Any suggestions for inspecting the examples variable?

/rnn/SequencerCriterion.lua:42: expecting target table

Hello,
I tried to train convo with some variations of the following:

$ th train.lua --opencl --dataset 5000 --hiddenSize 100

I am running with OpenCL (ATI Radeon) or on plain CPU; both cause the following error:

libthclnn_searchpath /home/jack/torch-cl/install/lib/lua/5.1/libTHCLNN.so
-- Loading dataset
Loading vocabulary from data/vocab.t7 ...

Dataset stats:
Vocabulary size: 10682
Examples: 15877
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Oland

-- Epoch 1 / 50

/home/jack/torch-cl/install/bin/luajit: ...k/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: expecting target table
stack traceback:
[C]: in function 'assert'
...k/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: in function 'forward'
./seq2seq.lua:80: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
...k/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d30

Do you know what the issue is here? Unfortunately, I am quite new to Lua.

Multiple GPU support

Hi Friends,

Do we have support for running training on multiple GPUs to save time? I have a machine with 4 GPUs, but it looks like only one of them is being utilised.

Also, when I try to train over the complete dataset using the following command, it takes around 9 hours per epoch. Is this expected, or am I doing something wrong?

th train.lua --cuda --dataset 0 --hiddenSize 1000

Thanks

model:getParameters() used multiple times

For the function model:getParameters(), here's what the documentation (https://github.com/torch/nn/blob/master/doc/module.md#flatparameters-flatgradparameters-getparameters) says:

Since the storage of every weight and gradWeight is changed, this function should be called only once on a given network.

However, in train.lua, this function gets called once every epoch. I'm not entirely sure how this affects the model or the results though. In any case, just putting this out there.
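
The pattern the nn docs describe would look like this sketch (model is a stand-in for the network in question):

    -- Flatten exactly once, before the training loop; getParameters()
    -- re-flattens and reallocates the underlying storages on every call.
    local params, gradParams = model:getParameters()

    for epoch = 1, maxEpoch do
      -- reuse params/gradParams here (e.g. hand them to optim),
      -- without calling model:getParameters() again
    end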

why is opencl needed if I use 'th train.lua --cuda --gpu 0'

root@rzai00:/prj/neuralconvo# th train.lua --cuda --gpu 0
/home/rzai/torch/install/bin/luajit: /home/rzai/torch/install/share/lua/5.1/trepl/init.lua:384: /home/rzai/torch/install/share/lua/5.1/trepl/init.lua:384: module 'cltorch' not found:No LuaRocks module found for cltorch
no field package.preload['cltorch']
no file '/home/rzai/.luarocks/share/lua/5.1/cltorch.lua'
no file '/home/rzai/.luarocks/share/lua/5.1/cltorch/init.lua'
no file '/home/rzai/torch/install/share/lua/5.1/cltorch.lua'
no file '/home/rzai/torch/install/share/lua/5.1/cltorch/init.lua'
no file './cltorch.lua'
no file '/home/rzai/torch/install/share/luajit-2.1.0-beta1/cltorch.lua'
no file '/usr/local/share/lua/5.1/cltorch.lua'
no file '/usr/local/share/lua/5.1/cltorch/init.lua'
no file '/home/rzai/.luarocks/lib/lua/5.1/cltorch.so'
no file '/home/rzai/torch/install/lib/lua/5.1/cltorch.so'
no file '/home/rzai/torch/install/lib/cltorch.so'
no file './cltorch.so'
no file '/usr/local/lib/lua/5.1/cltorch.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
/home/rzai/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
train.lua:1: in main chunk
[C]: in function 'dofile'
...rzai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
root@rzai00:/prj/neuralconvo#

Question regarding encoder-decoder coupler

Hi there, I have a question regarding the encoder-decoder coupler suggested in the examples here. My question is specifically regarding the decoder.

At the first time step, the decoder accepts the final embedding outputted by the encoder. After this first time step, the input for the next step is the decoder's output from the previous time step.

The way I implemented a similar model in the past in Python, the final encoding from the encoder was fed to the decoder repeatedly at each time step. My understanding is that this is very useful for making sure the decoder consistently attends to the encoded vector, rather than greedily generating sequences according to its own output.

Have I understood the implementation correctly? If not, where is my understanding wrong? If so, what would be the best way to pass the encoded vector repeatedly to the decoder, rather than passing its own outputs back in at the next time step?

Thanks!

Train in another language

I want to train on text in another language. What files are needed, and in what format?

sizes do not match

zhangtekiMacBook-Air:lua_seq2seq zhangx$ th train.lua --opencl --dataset 5000 --hiddenSize 1000
libthclnn_searchpath /Users/zhangx/torch/install/lib/lua/5.1/libTHCLNN.so
-- Loading dataset
Loading vocabulary from data/vocab.t7 ...

Dataset stats:
Vocabulary size: 7061
Examples: 8151
Using Apple , OpenCL platform: Apple
Using OpenCL device: HD Graphics 5000

-- Epoch 1 / 50

/Users/zhangx/torch/install/bin/luajit: /Users/zhangx/torch/install/share/lua/5.1/nn/Container.lua:67:
In 3 module of nn.Sequential:
...hangx/torch/install/share/lua/5.1/rnn/recursiveUtils.lua:57: bad argument #2 to 'add' (sizes do not match at /tmp/luarocks_cltorch-scm-1-387/cltorch/cltorch/src/lib/THClTensorMathPointwise.cpp:156)
stack traceback:
[C]: in function 'add'
...hangx/torch/install/share/lua/5.1/rnn/recursiveUtils.lua:57: in function 'recursiveAdd'
/Users/zhangx/torch/install/share/lua/5.1/rnn/LSTM.lua:192: in function '_updateGradInput'
...gx/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:57: in function 'updateGradInput'
/Users/zhangx/torch/install/share/lua/5.1/rnn/Sequencer.lua:79: in function 'updateGradInput'
/Users/zhangx/torch/install/share/lua/5.1/nn/Module.lua:31: in function </Users/zhangx/torch/install/share/lua/5.1/nn/Module.lua:29>
[C]: in function 'xpcall'
/Users/zhangx/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/Users/zhangx/torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
./seq2seq.lua:90: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
...angx/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0109d61d10

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
[C]: in function 'error'
/Users/zhangx/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/Users/zhangx/torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
./seq2seq.lua:90: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
...angx/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0109d61d10

Word embeddings?

I'm new to Torch and to Lua. Looking at the code, I can't figure out which embeddings you used. E.g., is it word2vec? Also, where is this specific code located?

Invalid arguments: DoubleTensor number FloatTensor

I'm moderately convinced that this isn't a neuralconvo issue per se, but in case it is:

/root/torch/install/share/lua/5.1/optim/adam.lua:59: invalid arguments: DoubleTensor number FloatTensor 
expected arguments: *DoubleTensor* [DoubleTensor] double | *DoubleTensor* [DoubleTensor] [double] DoubleTensor
stack traceback:
    [C]: in function 'add'
    /root/torch/install/share/lua/5.1/optim/adam.lua:59: in function 'adam'
    train.lua:131: in main chunk
    [C]: in function 'dofile'
    /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405d50

Any idea?

Interpretation of 'decoderInputs' in dataset.lua

I'm looking at dataset.lua, where decoderInputs are being set up.

    -- One row per time step, one column per sample in the batch;
    -- the EOS token is trimmed from the targets, hence the -1.
    decoderInputs = torch.IntTensor(maxTargetOutputSeqLen-1,size):fill(0)

    for samplenb = 1, #targetSeqs do
      -- Drop the last token (EOS) from this target sequence
      trimmedEosToken = targetSeqs[samplenb]:sub(1,-2)
      for word = 1, trimmedEosToken:size(1) do
        if size == 1 then
          decoderInputs[word] = trimmedEosToken[word]
        else
          decoderInputs[word][samplenb] = trimmedEosToken[word]
        end
      end
    end

This tensor is then used in model.decoder:forward(decoderInputs) in train.lua. My question is, how (and why) is this tensor different from encoderInputs?

I ask this because when we evaluate the model using eval.lua, we simply pass the decoder's own output at each time step into model.decoder:forward (see Seq2Seq:eval(input), where we have the line
local prediction = self.decoder:forward(torch.Tensor(output))[#output] ).

Help would be appreciated.

Resume training from a saved model

Is there a way to resume training from where it was previously left off? For example, I trained a particular model for 50 epochs and saved it to model.t7. I tested it using eval.lua and discovered that it needed training for another 50 epochs. Is there an easy way for me to resume training such that the model is initialized with model.t7 and then continues to train?
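
There is no documented flag for this, but a sketch of one way to do it (hypothetical; train.lua would need a small change along these lines) is to load the checkpoint when it exists instead of building a fresh model:

    local path = require 'pl.path'  -- penlight is already a dependency

    local model
    if path.exists("data/model.t7") then
      model = torch.load("data/model.t7")  -- resume from saved weights
    else
      model = buildNewModel()              -- hypothetical constructor
    end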

weird result I got when I train the model for myself

I'm trying to reproduce the experimental results from the paper, using your code as a test, but the results seem very strange. Here are my command and settings:
th train.lua --cuda --dataset 50000 --hiddenSize 1000
I used the default settings in the project and expanded the dataset from 5000 to 50000. The 50000 dataset seems to be worse than 5000, which confuses me. I ran for 50 epochs, and the last log is:

-- Epoch 50 / 50    
[================================ 83632/83632 =======================>]  Tot: 1h35m | Step: 68ms
Finished in 1h35m 14.664488722944 examples/sec.
Epoch stats:
           LR= 1e-05    
  Errors: min= 3.814697265625e-06   
          max= 67.265952110291  
       median= 6.5410537719727  
         mean= 7.5180888369819  
          std= 5.9302086321892  

After training, I use th -i eval.lua --cuda to see how my model performs:

th> say 'Hi'
>> Hi walter really, sorry, sorry, sorry, sorry, sorry, sorry, sorry, sorry,    
                                                                      [0.0347s] 
th> say 'what\'s your name ?'
>> De fravio if it says who me more in your money first of mother, the place.   
                                                                      [0.0346s] 
th> say 'how old are you ?'
>> Twelve way right now.    
                                                                      [0.0122s] 
th> say 'what\'s 2 + 2 ?'
>> One. 
                                                                      [0.0111s] 
th> say 'how are you'
>> I'look here but it'look here but it'look here but it'look here but   
                                                                      [0.0347s] 
th> say 'good bye'
>> That don ?...

The results show that almost no response is understandable by a human; the logic and grammar are missing. Is there anything I missed that led to this result? Please give me some hints. Thanks a lot.

Issue with CUDA

Hi, you just fixed my issue with the lowercase penlight references (thank you!), which unblocked me enough to encounter the following CUDA issue... :)

It seems to work without CUDA, i.e. on the CPU, but with CUDA enabled I get the following:

$ th train.lua --cuda --dataset 5000 --hiddenSize 1000
-- Loading dataset  
data/vocab.t7 not found 
-- Parsing Cornell movie dialogs data set ...   
 [================== 387810/387810 ============>]ETA: 0ms | Step: 0ms           
-- Pre-processing data  
 [================== 5000/5000 ================>]ETA: 0ms | Step: 0ms           
-- Removing low frequency words 
 [================== 8151/8151 ================>]ETA: 0ms | Step: 0ms           
Writing data/examples.t7 ...    
 [================== 8151/8151 ================>]ETA: 0ms | Step: 0ms           
Writing data/vocab.t7 ...   

Dataset stats:  
  Vocabulary size: 7061 
         Examples: 8151 

-- Epoch 1 / 50 

/home/pender/torch/install/bin/luajit: ...der/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:47: bad argument #3 to 'ClassNLLCriterion_updateOutput' (torch.CudaTensor expected, got number)
stack traceback:
    [C]: in function 'ClassNLLCriterion_updateOutput'
    ...der/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:47: in function 'forward'
    ...r/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:27: in function 'forward'
    ./seq2seq.lua:69: in function 'train'
    train.lua:76: in main chunk
    [C]: in function 'dofile'
    ...nder/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00405ea0

Any suggestions?

Inconsistent Tensor Size

In 3 module of nn.Sequential:
...aidri/torch/install/share/lua/5.1/rnn/recursiveUtils.lua:57: inconsistent tensor size at /home/mhaidri/torch/pkg/torch/lib/TH/generic/THTensorMath.c:547
stack traceback:
[C]: in function 'add'
...aidri/torch/install/share/lua/5.1/rnn/recursiveUtils.lua:57: in function 'recursiveAdd'
/home/mhaidri/torch/install/share/lua/5.1/rnn/LSTM.lua:192: in function '_updateGradInput'
...ri/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:58: in function 'updateGradInput'
/home/mhaidri/torch/install/share/lua/5.1/rnn/Sequencer.lua:79: in function 'updateGradInput'
/home/mhaidri/torch/install/share/lua/5.1/nn/Module.lua:31: in function </home/mhaidri/torch/install/share/lua/5.1/nn/Module.lua:29>
[C]: in function 'xpcall'
/home/mhaidri/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/mhaidri/torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
./seq2seq.lua:90: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
...idri/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/mhaidri/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/mhaidri/torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
./seq2seq.lua:90: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
...idri/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

The wrong number of TOTAL_LINES in cornell_movie_dialogs.lua?

I downloaded the dataset from the website, and I found only 304713 lines in movie_lines.txt. But cornell_movie_dialogs.lua says TOTAL_LINES is 387810. What is the difference between them? Did I download the wrong version of the dataset, or has something changed?

License?

In the credits it says 'copyright xxx', which doesn't really add or remove any rights in particular, but does make it clear it's copyrighted. Normally, you'd need to explicitly grant license rights, e.g. to grant rights to make derivative works and so on. Otherwise, any derivative works would be in breach, per my understanding?

attempt to call method 'topk' (a nil value)

I trained the model with a different dataset (in the same format as the movie conversations) using OpenCL. While evaluating, I'm getting the below error.

me: say "Hello."

./seq2seq.lua:122: attempt to call method 'topk' (a nil value)
stack traceback:
    ./seq2seq.lua:122: in function 'eval'
    eval.lua:74: in function 'say'
    [string "_RESULT={say "Hello."}"]:1: in main chunk
    [C]: in function 'xpcall'
    /home/siva/torch-cl/install/share/lua/5.1/trepl/init.lua:651: in function 'repl'
    ...d/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
    [C]: at 0x00406670  

Printing the prediction tensor size gives the below output:

Prediction tensor size:      141874
[torch.LongStorage of size 1]

Mini-batch learning

Hi,

Cool project!
I looked in the training code and it seems like the model parameters are updated after every example. i.e. online learning.
Is it by design? we can probably achieve a significant speedup by using mini-batches.

attempt to call field 'TestSuite' (a nil value)

Command : th train.lua --cuda --dataset 5000 --hiddenSize 1000

Error :
/home/mcw/torch/install/bin/luajit: /home/mcw/torch/install/share/lua/5.1/trepl/init.lua:363: /home/mcw/torch/install/share/lua/5.1/trepl/init.lua:363: /home/mcw/torch/install/share/lua/5.1/trepl/init.lua:363: /home/mcw/torch/install/share/lua/5.1/nn/test.lua:12: attempt to call field 'TestSuite' (a nil value)
stack traceback:
[C]: in function 'error'
/home/mcw/torch/install/share/lua/5.1/trepl/init.lua:363: in function 'require'
train.lua:1: in main chunk
[C]: in function 'dofile'
.../mcw/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406670

Not enough memory?

I ran this on the CPU for a few hours, and when saving the model after training I got an error. Any ideas for a fix? I'm running this on a MacBook with 16 GB of RAM.

1482-sraval:neuralconvo-master sraval$ th train.lua --dataset 5000 --hiddenSize 1000
-- Loading dataset  
Loading vocabulary from data/vocab.t7 ...   

Dataset stats:  
  Vocabulary size: 7061 
         Examples: 8151 

-- Epoch 1 / 50 

 [=========================================== 8151/8151 h43m | Step: 690ms      

Finished in 1h43m 1.3066972012208 examples/sec. 

Epoch stats:    
           LR= 0.05 
  Errors: min= 1.1344377487802  
          max= 204.85985171145  
       median= 25.911087689104  
         mean= 33.523359973026  
          std= 27.159547015804  

(Saving model ...)  
/Users/sraval/torch/install/bin/luajit: not enough memory

"module 'pl.list' not found"

I got the following error after following your installation instructions (including luarocks install penlight, which seemed to work). Any ideas?

th train.lua --cuda --dataset 5000 --hiddenSize 1000
/home/pender/torch/install/bin/luajit: /home/pender/torch/install/share/lua/5.1/trepl/init.lua:363: /home/pender/torch/install/share/lua/5.1/trepl/init.lua:363: module 'pl.list' not found:No LuaRocks module found for pl.list
    no field package.preload['pl.list']
    no file '/home/pender/.luarocks/share/lua/5.1/pl/list.lua'
    no file '/home/pender/.luarocks/share/lua/5.1/pl/list/init.lua'
    no file '/home/pender/torch/install/share/lua/5.1/pl/list.lua'
    no file '/home/pender/torch/install/share/lua/5.1/pl/list/init.lua'
    no file './pl/list.lua'
    no file '/home/pender/torch/install/share/luajit-2.1.0-alpha/pl/list.lua'
    no file '/usr/local/share/lua/5.1/pl/list.lua'
    no file '/usr/local/share/lua/5.1/pl/list/init.lua'
    no file '/home/pender/.luarocks/lib/lua/5.1/pl/list.so'
    no file '/home/pender/torch/install/lib/lua/5.1/pl/list.so'
    no file './pl/list.so'
    no file '/usr/local/lib/lua/5.1/pl/list.so'
    no file '/usr/local/lib/lua/5.1/loadall.so'
    no file '/home/pender/.luarocks/lib/lua/5.1/pl.so'
    no file '/home/pender/torch/install/lib/lua/5.1/pl.so'
    no file './pl.so'
    no file '/usr/local/lib/lua/5.1/pl.so'
    no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
    [C]: in function 'error'
    /home/pender/torch/install/share/lua/5.1/trepl/init.lua:363: in function 'require'
    train.lua:1: in main chunk
    [C]: in function 'dofile'
    ...nder/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00405ea0

Multiple layers support

Is there a plan to support multiple LSTM layers? The paper seems to have used multiple layers in both the encoder and the decoder.
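
For reference, with the rnn package, stacking is a matter of adding another Sequencer-wrapped LSTM on top of the first, symmetrically in the encoder and decoder. A hypothetical sketch for the encoder side (encoder and hiddenSize as in the sketch near the top of this page):

    -- Two stacked LSTM layers; layer 2 consumes layer 1's hidden states.
    encoder:add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))
    encoder:add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))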

opencl training fail

I have never been successful at training.

th train.lua --opencl --dataset 50000 --hiddenSize 1000

-- Loading dataset
Loading vocabulary from data/vocab.t7 ...

Dataset stats:
Vocabulary size: 25931
Examples: 83632
libthclnn_searchpath /Users/SolarKing/Dev/torch-cl/install/lib/lua/5.1/libTHCLNN.so
Using Apple , OpenCL platform: Apple
Using OpenCL device: GeForce 9400M

-- Epoch 1 / 50

/Users/SolarKing/Dev/torch/install/bin/luajit: ...larKing/Dev/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
bad argument #3 to '?' (number expected, got nil)
stack traceback:
[C]: at 0x0ebe4500
[C]: in function '__newindex'
.../Dev/torch-cl/install/share/lua/5.1/clnn/LookupTable.lua:108: in function <.../Dev/torch-cl/install/share/lua/5.1/clnn/LookupTable.lua:99>
[C]: in function 'xpcall'
...larKing/Dev/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...arKing/Dev/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
./seq2seq.lua:71: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
.../Dev/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010e8bbbb0

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
[C]: in function 'error'
...larKing/Dev/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...arKing/Dev/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
./seq2seq.lua:71: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
.../Dev/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010e8bbbb0

Add pretrained model

Thank you for sharing this implementation.
Can you share your pretrained model on the full dataset? My GPU is weak (GTX 610M) and it would take a long time to train on the full dataset, but I really want to run several personal tests with it.

Problems with running the code

I am new to Torch and to Lua. I tried to run your code with the following command:

th train.lua --dataset 10000 --hiddenSize 128

But I got the following error:

-- Loading dataset
data/vocab.t7 not found
-- Parsing Cornell movie dialogs data set ...
/home/username/torch/install/bin/luajit: ./cornell_movie_dialogs.lua:49: table index is nil
stack traceback:
    ./cornell_movie_dialogs.lua:49: in function 'load'
    ./dataset.lua:55: in function 'load'
    ./dataset.lua:37: in function '__init'
    /home/username/torch/install/share/lua/5.1/torch/init.lua:91: in function </home/username/torch/install/share/lua/5.1/torch/init.lua:87>
    [C]: in function 'DataSet'
    train.lua:29: in main chunk
    [C]: in function 'dofile'
    ...rname/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405d50

How do I resolve this?

reversing the input sequence

In https://github.com/macournoyer/neuralconvo/blob/master/dataset.lua, line 213, why is the input sequence being reversed? Suppose we have the following conversation:

A: How are you
B: im fine

If I'm understanding this correctly, the following input-target pair is being added to the training set:
{input= you are how ; target= im fine}

According to my understanding, the following pair should be added:
{input= how are you ; target= im fine}
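
For context: reversing only the source sequence is a deliberate trick from the original seq2seq paper (Sutskever et al., 2014), which reported better results because it shortens the distance between the beginning of the input and the beginning of the output; the target is left in order. A minimal sketch of such a reversal:

    -- Reverse the input word ids; the target sequence stays as-is.
    local function reverseSequence(t)
      local n = t:size(1)
      local reversed = t.new(n)
      for i = 1, n do
        reversed[i] = t[n - i + 1]
      end
      return reversed
    end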

train.lua:169 cutorch.synchronize()

I am using Ubuntu 14.04 and ran train.lua in CPU mode; I got this:
/home/liqiang/Documents/Lua/torch/install/bin/luajit: train.lua:169: attempt to index global 'cutorch' (a nil value)
stack traceback:
train.lua:169: in main chunk
[C]: in function 'dofile'
.../Lua/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

If I don't use CUDA or OpenCL, then I should comment out line 169, since cutorch is for CUDA, right?

No CUDA capable device is detected

I am running this on a Google Cloud Linux instance (Ubuntu 14.04 Trusty LTS).

When I start training on CPU only, it runs fine.
I have installed CUDA (7.5) and cuDNN (v5.1 RC) and have followed the given instructions:

  • Install Torch.
  • luarocks install nn
  • luarocks install rnn
  • luarocks install penlight
  • luarocks install cutorch
  • luarocks install cunn

However, when I start training with GPU support, I receive the following error:

user1@instance-1:~/neuralconvo$ th train.lua --cuda
-- Loading dataset
Loading vocabulary from data/vocab.t7 ...
Dataset stats:
Vocabulary size: 35147
Examples: 221282
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-515/cutorch/lib/THC/THCGeneral.c line=20 error=38 : no CUDA-
capable device is detected
/home/user1/torch/install/bin/luajit: ...user1/torch/install/share/lua/5.1/trepl/init.lua:384:
cuda runtime error (38) : no CUDA-capable device is detected at /tmp/luarocks_cutorch-scm-1-515/cutorch/lib/T
HC/THCGeneral.c:20
stack traceback:
[C]: in function 'error'
...user1/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
train.lua:55: in main chunk
[C]: in function 'dofile'
...dita/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

What is going wrong and how can I fix this?
