
clnn's People

Contributors

gloine, hughperkins, pawni, petrjanda, szagoruyko


clnn's Issues

bad argument #1 to 'updateOutput' in Mac

Hi Hugh,
When I run neural_style.lua with OpenCL, I get the errors below. It seems some library is missing, right? I'd appreciate it if this issue could be looked at.

quans-Mac-mini:neural-style liuquan0722$ th neural_style.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -output_image profile.png -model_file models/nin_imagenet_conv.caffemodel -proto_file models/train_val.prototxt -gpu 0 -backend clnn -num_iterations 1000 -seed 123 -content_layers relu0,relu3,relu7,relu12 -style_layers relu0,relu3,relu7,relu12 -content_weight 10 -style_weight 1000 -image_size 512 -optimizer adam
Successfully loaded models/nin_imagenet_conv.caffemodel
MODULE data UNDEFINED
warning: module 'data [type 5]' not found
Changing line:  table.insert(model, {'pool4', nn.SpatialAveragePooling(6, 6, 1, 1):ceil()}) 
To line:    table.insert(model, {'pool4', nn.SpatialAveragePooling(6, 6, 1, 1):ceil():ceil()})  
cccp1: 96 96 1 1
cccp2: 96 96 1 1
conv2: 256 96 5 5
cccp3: 256 256 1 1
cccp4: 256 256 1 1
conv3: 384 256 3 3
cccp5: 384 384 1 1
cccp6: 384 384 1 1
conv4-1024: 1024 384 3 3
cccp7-1024: 1024 1024 1 1
cccp8-1024: 1000 1024 1 1
Using Apple , OpenCL platform: Apple
Using OpenCL device: ATI Radeon HD 6630M
Setting up content layer    1   :   relu0   
Setting up style layer      1   :   relu0   
WARNING: Skipping content loss  
Setting up content layer    8   :   relu3   
THClReduceAll.cl build log: 
<program source>:11:10: warning: unused variable 'in1'
  float *in1 = &_in1;
         ^
<program source>:12:10: warning: unused variable 'out'
  float *out = &_out;
         ^

/Users/liuquan0722/torch/install/bin/luajit: ...iuquan0722/torch/install/share/lua/5.1/nn/Sequential.lua:44: bad argument #1 to 'updateOutput' (input channels and nInputPlane dont match)
stack traceback:
    [C]: in function 'updateOutput'
    ...iuquan0722/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    neural_style.lua:176: in function 'main'
    neural_style.lua:497: in main chunk
    [C]: in function 'dofile'
    ...0722/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x0108aa0bc0

4 errors and 1 warning when running luajit -l clnn -e 'clnn.test()'

On 2016-03-16 I installed nn and cltorch and tested them successfully.
Besides, I succeeded in running jcjohnson's code to get styled images,
but the clnn test still shows 4 errors and a warning. I'm new to this and have no idea about the reason; can anyone explain it?

65/70 LookupTable_backward .............................................. [PASS]
66/70 SpatialAveragePooling_forward_ceil ................................ [PASS]
67/70 SpatialMaxPooling_forward_batch ................................... [PASS]
68/70 Sqrt_forward ...................................................... [PASS]
69/70 Sum_backward ...................................................... [PASS]
70/70 LookupTable_forward ............................................... [PASS]
Completed 111 asserts in 70 tests with 0 failures and 4 errors and 1 warning
--------------------------------------------------------------------------------
ELU_backward
 Function call failed
/home/gpu2/torch/install/share/lua/5.1/nn/THNN.lua:109: wrong number of arguments for function call
stack traceback:
    [C]: in function 'v'
    /home/gpu2/torch/install/share/lua/5.1/nn/THNN.lua:109: in function 'ELU_updateOutput'
    /home/gpu2/torch/install/share/lua/5.1/nn/ELU.lua:20: in function 'forward'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:200: in function 'v'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2619: in function </home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2617>
    [C]: in function 'xpcall'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:476: in function '_pcall'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:436: in function '_run'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:354: in function 'run'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2658: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x00406670

--------------------------------------------------------------------------------
ELU_forward
 Function call failed
/home/gpu2/torch/install/share/lua/5.1/nn/THNN.lua:109: wrong number of arguments for function call
stack traceback:
    [C]: in function 'v'
    /home/gpu2/torch/install/share/lua/5.1/nn/THNN.lua:109: in function 'ELU_updateOutput'
    /home/gpu2/torch/install/share/lua/5.1/nn/ELU.lua:20: in function 'forward'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:166: in function 'v'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2619: in function </home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2617>
    [C]: in function 'xpcall'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:476: in function '_pcall'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:436: in function '_run'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:354: in function 'run'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2658: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x00406670

--------------------------------------------------------------------------------
ELU_transposed
 Function call failed
/home/gpu2/torch/install/share/lua/5.1/nn/THNN.lua:109: wrong number of arguments for function call
stack traceback:
    [C]: in function 'v'
    /home/gpu2/torch/install/share/lua/5.1/nn/THNN.lua:109: in function 'ELU_updateOutput'
    /home/gpu2/torch/install/share/lua/5.1/nn/ELU.lua:20: in function 'forward'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:64: in function 'pointwise_transposed'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:216: in function 'v'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2619: in function </home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2617>
    [C]: in function 'xpcall'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:476: in function '_pcall'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:436: in function '_run'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:354: in function 'run'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2658: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x00406670

--------------------------------------------------------------------------------
mse_variablebatchsize
 Function call failed
/home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:83: Unexpected arguments passed to test function
stack traceback:
    [C]: in function 'assert'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:83: in function 'getMessage'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:126: in function 'assert'
    ...u2/torch/install/share/lua/5.1/clnn/testMSECriterion.lua:122: in function 'v'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2619: in function </home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2617>
    [C]: in function 'xpcall'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:476: in function '_pcall'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:436: in function '_run'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:354: in function 'run'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2658: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x00406670

--------------------------------------------------------------------------------
Should use TestSuite rather than plain lua table

--------------------------------------------------------------------------------
luajit: /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:362: An error was found while running tests!
stack traceback:
    [C]: in function 'assert'
    /home/gpu2/torch/install/share/lua/5.1/torch/Tester.lua:362: in function 'run'
    /home/gpu2/torch/install/share/lua/5.1/clnn/test.lua:2658: in function 'test'
    (command line):1: in main chunk
    [C]: at 0x00406670

CLNN master/thnn4 build error

Hi Hugh,
I believe there is a linker error in the current thnn4 / master branch.
The function GET_BLOCKS is in common.h, which has been compiled correctly:

[ 25%] Building CXX object CMakeFiles/clnn.dir/common.cpp.o
[100%] Linking CXX shared module libTHCLNN.so
Undefined symbols for architecture x86_64:
"GET_BLOCKS(THClState*, int)", referenced from:
_THNN_ClSpatialAveragePooling_updateOutput in SpatialAveragePooling.cpp.o

Thanks in advance for the help.

MSECriterion batch size limitation

While working on a NN with MSECriterion I stumbled upon a problem with variable batch size. One of the batches in my training set was shorter (dataset_size=7049, batch_size=50 => one batch had 49 items).

This would cause:

/Users/petr/torch/install/bin/luajit: ...s/petr/torch/install/share/lua/5.1/clnn/MSECriterion.lua:6: bad argument #1 to 'copy' (sizes do not match at /tmp/luarocks_cltorch-scm-1-27/cltorch/cltorch/src/lib/THClTensorCopy.cpp:136)
stack traceback:
        [C]: in function 'copy'
        ...s/petr/torch/install/share/lua/5.1/clnn/MSECriterion.lua:6: in function 'forward'
        ./util/trainer.lua:25: in function 'validateRegression'
        ./util/trainer.lua:136: in function 'train'
        main.lua:94: in main chunk
        [C]: in function 'dofile'
        ...petr/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x0103632ad0

As the criterion's workBuffer is initialised from the first batch, the sizes of course wouldn't match.

I worked around this by trimming my dataset, although I was wondering whether it might be worth redesigning the criterion to handle variable batch sizes (a sketch of the workaround is below).
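A minimal sketch of the trimming workaround, assuming the training set is held in a table with data/labels tensors (the trainSet field names are assumptions, not from the original report):

-- Hypothetical sketch: drop the trailing partial batch so every batch has
-- exactly batchSize items (dataset_size=7049, batch_size=50 -> keep 7000).
local batchSize = 50
local fullSize = trainSet.data:size(1)                            -- 7049
local trimmedSize = math.floor(fullSize / batchSize) * batchSize  -- 7000
trainSet.data = trainSet.data:narrow(1, 1, trimmedSize)
trainSet.labels = trainSet.labels:narrow(1, 1, trimmedSize)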

I would be happy to help with it, just wanted to get some thoughts on it first.

torch/install/share/lua/5.1/nn/THNN.lua:109: bad argument #8 to 'v' (cannot convert 'number' to 'struct THClTensor *')

Hey,

A change to torch/nn was made in this commit that results in errors when running the clnn tests:
torch/nn@26a5a7e

When running
luajit -l clnn -e 'clnn.test()'
I'm getting this error:
torch/install/share/lua/5.1/nn/THNN.lua:109: bad argument #8 to 'v' (cannot convert 'number' to 'struct THClTensor *')

For anybody else reading this: I've fixed it by pinning nn to the commit prior to the breaking change, because I'm not knowledgeable enough to make the necessary code changes to clnn:

git clone https://github.com/torch/nn.git
cd nn
git checkout e07d84d0f90fdf9035166785889941a4271657cf # this is the commit prior to the breaking change
luarocks make rocks/nn-scm-1.rockspec 

For reference, the corresponding commit in torch/cunn that adds the necessary code to make it compatible is
torch/cunn@f03b92f

Thanks a lot for all the work you've put into this library, by the way. It has enabled me to play around with neural-style :-)

Simple linear nn

Hi, GPU newbie here. I am able to execute jcjohnson's torch-rnn fine on my GPU:

Using Apple , OpenCL platform: Apple
Using OpenCL device: ATI Radeon HD 6770M
...

But I cannot replicate this when I try to port some of my own nets from CPU to OpenCL. I may be missing something simple. Can you please help?

For starters, there are just two layers:

model = nn.Sequential():type(dtype)
if opt.model == 'linear' then
model:add(nn.Reshape(3 * 224 * 224))
model:add(nn.Linear(3 * 224 * 224, #classes))

I have converted the dataset type as well
trainSet.data = trainSet.data:type(dtype)

and I use optim for SGD:

optim.sgd(feval, parameters, hyper)

Stack trace below:

In 1 module of nn.Sequential:
...Chandrachud/torch/install/share/lua/5.1/torch/Tensor.lua:458: bad argument #1 to 'set' (expecting number or torch.DoubleTensor or torch.DoubleStorage at /tmp/luarocks_torch-scm-1-4207/torch7/generic/Tensor.c:1125)
stack traceback:
[C]: in function 'set'
...Chandrachud/torch/install/share/lua/5.1/torch/Tensor.lua:458: in function 'view'
...s/Chandrachud/torch/install/share/lua/5.1/nn/Reshape.lua:43: in function <...s/Chandrachud/torch/install/share/lua/5.1/nn/Reshape.lua:31>
[C]: in function 'xpcall'
...Chandrachud/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...handrachud/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
5.magique.lua:312: in function 'opfunc'
...rs/Chandrachud/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
5.magique.lua:344: in function 'train'
5.magique.lua:430: in main chunk
[C]: in function 'dofile'
...chud/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0108cc8bc0
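For reference, a minimal sketch of how such a two-layer net is usually moved to the OpenCL backend. The value of dtype and the class count below are assumptions (placeholders), not taken from the report above:

require 'nn'
require 'cltorch'
require 'clnn'

local dtype = 'torch.ClTensor'    -- assumption: dtype should name the Cl tensor type
local numClasses = 10             -- placeholder for #classes

local model = nn.Sequential()
model:add(nn.Reshape(3 * 224 * 224))
model:add(nn.Linear(3 * 224 * 224, numClasses))
model:type(dtype)                 -- converts parameters and buffers

-- the data fed to forward() must be converted to the same type:
local input = torch.rand(2, 3, 224, 224):type(dtype)
local output = model:forward(input)
print(output:size())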

comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long')

luajit -l clnn -e 'clnn.test()'
Running 56 tests
|_______________________________________________________ ==> Abs_backwardUsing Apple , OpenCL platform: Apple
Using OpenCL device: Iris Pro
|_______________________________________________ ==> ClassNLLCriterionMultipleTargetTHClReduceAll.cl build log:
:9:10: warning: unused variable 'in1'
float *in1 = &_in1;
^
:10:10: warning: unused variable 'out'
float *out = &_out;
^

|_________________________ ==> SoftMax_backward/tmp/luarocks_clnn-scm-1-2347/clnn/SoftMax.cpp build log:
:38:20: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long')
for (int i=0; i<get_local_size(0); i++)
~^~~~~~~~~~~~~~~~~~
:63:20: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long')
for (int i=0; i<get_local_size(0); i++)
~^~~~~~~~~~~~~~~~~~

/tmp/luarocks_clnn-scm-1-2347/clnn/SoftMax.cpp build log:
:38:20: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long')
for (int i=0; i<get_local_size(0); i++)
~^~~~~~~~~~~~~~~~~~

__________________________________________________| ==> mseApply_3t_0s_0pt-2-2-2__out = 0.00040675208460443 * (_in1 - _in2) build log:
:37:12: warning: double precision constant requires cl_khr_fp64, casting to single precision
*out = 0.00040675208460443 * (_in1 - *in2);
^

Apply_3t_0s_0pt_-2_-2_-2__out = 0.00043487714720591 * (_in1 - _in2) build log:
:37:12: warning: double precision constant requires cl_khr_fp64, casting to single precision
*out = 0.00043487714720591 * (_in1 - *in2);
^

________________________________________________________ ==> Done

Completed 88 asserts in 56 tests with 0 errors


Error while running test-mnist2.lua

When I try to run the mnist2 test in clnn I get this error.

using luaexe: luajit
model.modules[#model].padding   nil
Using Intel(R) Corporation , OpenCL platform: Intel(R) OpenCL
Using OpenCL device: Intel(R) HD Graphics
# StochasticGradient: training
luajit: ...epLearning/torch/install/share/lua/5.1/nn/Sequential.lua:44: bad argument #1 (field padW does not exist)
stack traceback:
        [C]: in function 'updateOutput'
        ...epLearning/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        ...ng/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'train'
        test/test-mnist2.lua:117: in main chunk
        [C]: at 0x004058d0

Do you know what is happening here?
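A hedged guess: the model used by the test may predate torch/nn's rename of the padding field to padW/padH, so older modules lack the field the newer C code expects. A possible workaround sketch, using the model.modules table shown in the output above (field names assumed from the error message):

-- Copy the legacy 'padding' field onto padW/padH for any module that has it.
for _, m in ipairs(model.modules) do
  if m.padding ~= nil and m.padW == nil then
    m.padW = m.padding
    m.padH = m.padding
  end
end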

"getting latest SpatialAveragePooling" instruction still present in README.md

clnn-avgpool has already been ported to master and removed from the branches, making the instructions obsolete and leaving me confused about what I was doing wrong. The instructions should be removed.

These instructions still worked:
git clone https://github.com/hughperkins/nn.git -b avepool_plus_master nn-avepool
cd nn-avepool
luarocks make rocks/nn-scm-1.rockspec
cd ..

I used these instructions; should I reinstall torch/nn/cltorch/clnn, or is adding the avepool_plus_master branch harmless?

warning: unused function 'IndexToOffset_999_get'

I'm using clnn to train a ResNet model on an Intel GPU.
When the training starts I see the warnings below.

 THClReduce.cl build log: .................................... 32/634 ...................] ETA: 0ms | Step: 0ms
 <program source>:48:28: warning: unused function 'IndexToOffset_999_get'
  static inline unsigned int IndexToOffset_999_get(unsigned int linearId, global const TensorInfoCl    *info) {
                       ^

 THClReduce.cl build log:
 <program source>:67:19: warning: unused function 'IndexToOffset_999_get'
  static inline int IndexToOffset_999_get(int linearId, global const TensorInfoCl *info) {
              ^

   THClReduceAll.cl build log:
   <program source>:51:28: warning: unused function 'IndexToOffset_999_get'
    static inline unsigned int IndexToOffset_999_get(unsigned int linearId, global const TensorInfoCl  *info) {
                       ^
    <program source>:66:28: warning: unused function 'getLinearBlockId'
     static inline unsigned int getLinearBlockId() {
                       ^

This is what I'm doing:

  if opt.backend == 'cl' then
      require 'clnn'
      require 'cltorch'
      net = net:cl()
      -- cudnn.convert(net, cudnn) -- convert the net to cudnn
      -- What is the equivalent of cudnn.convert for clnn?
      criterion = criterion:cl()
  end

Is the above code right? Is there anything else that I need to do in order to use my Intel GPU?

Also, I see train Loss: nan, which should be a number. Should I also convert the training loss value to cl?

What else needs to be converted to cl?

Best,
Pramod
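For what it's worth, a minimal sketch of what typically has to live on the OpenCL backend; the net and criterion names follow the snippet above, while the per-batch variable names are assumptions about the surrounding training loop:

require 'cltorch'
require 'clnn'

net = net:cl()               -- model parameters and buffers
criterion = criterion:cl()   -- loss module

-- inside the training loop (hypothetical variable names):
-- inputs  = inputs:cl()     -- each batch of data
-- targets = targets:cl()    -- each batch of labels
-- local loss = criterion:forward(net:forward(inputs), targets)
-- 'loss' should come back as a plain Lua number, so it should not need converting.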

Porting Convolutions and im2col

Hi hughperkins,

Using my limited knowledge, I've been trying to port some functions over to the OpenCL kernel, using the guidelines suggested.

I noticed that the kernel for SpatialConvolutionMM is called from im2col.cpp:

std::string SpatialConvolutionMM_getKernelTemplate();
std::string uniqueName = "SpatialConvolutionMM::im2col";

SpatialFullConvolution is very similar to SpatialConvolution, and it seemed like it would make sense to follow what is already coded for SpatialConvolutionMM. Since there is already a kernel call in im2col, and no separate .cl file gets generated for SpatialFullConvolution by the port script, how should one call SpatialFullConvolution from im2col without breaking SpatialFullConvolution?

...or am I looking at it the wrong way? You can see my first shot at it here: ViliusT/clnn@master...ViliusT:feature/spatialFullConv

missing implementations on SpatialMaxPooling_updateGradInput

I'm porting some cunn code over to clnn and stumbled over the following error:

/Users/brunoro/dev/torch/install/bin/luajit: ...dev/torch/install/share/lua/5.1/nn/SpatialMaxPooling.lua:41: Not implemented at /Users/brunoro/dev/clnn/SpatialMaxPooling.cpp:166
stack traceback:
    [C]: in function 'SpatialMaxPooling_updateGradInput'
    ...dev/torch/install/share/lua/5.1/nn/SpatialMaxPooling.lua:41: in function 'updateGradInput'
    ...rs/brunoro/dev/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
    ...runoro/dev/torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
    neural_style.lua:259: in function 'opfunc'
    .../brunoro/dev/torch/install/share/lua/5.1/optim/lbfgs.lua:66: in function 'lbfgs'
    neural_style.lua:278: in function 'main'
    neural_style.lua:439: in main chunk
    [C]: in function 'dofile'
    .../dev/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x010a4d1190

Browsing through SpatialMaxPooling.cpp I found that there's actually some commented-out code just above the line that throws that exception. Are there any plans to implement those cases?

Also, a CNN noob question: what do those cases stand for? I'd be happy to implement them if anyone points me to a reference for what this method is actually doing.

Build failing on OS X: CL/cl.h not found

Hi,

I have a problem building clnn. When trying to use EasyCL, it can't find a header file:

In file included from clnn/SpatialMaxPooling.cpp:8:
/torch/install/include/easycl/EasyCL.h:12:10: fatal error: 'CL/cl.h' file not found

include "CL/cl.h"

Plus several similar errors.

My cltorch installation is fine and passes all tests. I've checked the torch installation folder and it has the lib/libEasyCL.dylib file, meaning that EasyCL itself was built. So it seems to be a problem of how clnn uses it.

bad argument #1 to 'set' (expecting number or torch.FloatTensor or torch.FloatStorage)

Hi,

I'm trying to run my NN using clnn. When executing the forward pass:

        local outputs = model:forward(inputs)

I get the following error:
torch/install/share/lua/5.1/torch/Tensor.lua:458: bad argument #1 to 'set' (expecting number or torch.FloatTensor or torch.FloatStorage at /tmp/luarocks_torch-scm-1-2862/torch7/generic/Tensor.c:1125)
stack traceback:
[C]: in function 'set'
torch/install/share/lua/5.1/torch/Tensor.lua:458: in function 'view'
torch/install/share/lua/5.1/nn/Reshape.lua:46: in function <torch/install/share/lua/5.1/nn/Reshape.lua:31>
[C]: in function 'xpcall'
torch/install/share/lua/5.1/nn/Container.lua:65: in function 'rethrowErrors'
torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:272: in function 'opfunc'
torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
train.lua:327: in function 'train'
train.lua:420: in main chunk
[C]: at 0x010a141bd0

I might be missing something fundamental; I'm not sure.

SpatialConvolutionMM gives incorrect output with nOutputPlane == 7

to reproduce:

  local batchSize = 1
  local inFeatures = 1
  local outFeatures = 7
  local sentenceLength = 3
  local kernelSize = 3
  local net = nn.SpatialConvolutionMM(inFeatures, outFeatures, 1, kernelSize)
  net:cl()
  local weights = net.weight
  weights:uniform(-1, 1)
  net.bias:zero()  -- to simplify test
  local input = torch.ClTensor(batchSize, inFeatures, sentenceLength, 1):uniform()
  local output = net:forward(input)
  print('weights:size()', weights:size())
  weights = weights:view(torch.LongStorage({outFeatures, inFeatures, kernelSize}))
  print('weights:size()', weights:size())
  print('output:size()', output:size())
  local outLength = sentenceLength - math.floor(kernelSize / 2) * 2
  local ourOut = torch.FloatTensor(batchSize, outFeatures, outLength, 1):zero()

  for b=1,batchSize do
    -- each output feature is independent from the other outputs
    for outFeature=1,outFeatures do
      -- each output point along the outS dimension is independent from the other outputs
      for outS=1,outLength do
        local sum = 0
        -- convolve is sum over kernel size, and over the input features
        for k=1,kernelSize do
          local inS = outS + (k - 1)
          for inFeature=1,inFeatures do
            local weight = weights[outFeature][inFeature][k]
            sum = sum + weight * input[b][inFeature][inS][1]
          end
        end
        ourOut[b][outFeature][outS][1] = sum
      end
    end
  end
  print('output[1]')
  print(output[1])
  print('ourOut[1]')
  print(ourOut[1])
  print('output[1] - ourOut[1]')
  print(output[1]:float() - ourOut[1])
  mytester:assertlt((output:float() - ourOut):abs():max(), 0.0001)

(embedded as a test case)

For other nOutputPlane values it seems to work OK...

Trouble getting ClassNLLCriterion to work

Hey,

First of all thanks for the great work on this, this project has been really helpful.

I have been struggling to get clnn to run StochasticGradient with ClassNLLCriterion.

I have been following this guide https://github.com/soumith/cvpr2015/blob/master/Deep%20Learning%20with%20Torch.ipynb

My first question is that regular nn's ClassNLLCriterion seems fine accepting 1D tensors, while clnn's needs 2D ones. I tried adjusting for this by adding a "net:add(nn.Reshape(10, 1))" as the last step of my neural network. Is this the correct approach?

Also, torch's ClassNLLCriterion accepts integers as the target, while, if I understand correctly, clnn's requires tensors with the correct label set to 1. I've converted the targets to be 1D vectors of 0's with a 1 on the correct label.
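For comparison, a minimal sketch of how plain nn's ClassNLLCriterion is normally fed (the numbers below are made up): the input is the output of LogSoftMax and the targets are class indices, not one-hot vectors.

local crit = nn.ClassNLLCriterion()
local logProbs = nn.LogSoftMax():forward(torch.randn(4, 10))  -- batch of 4, 10 classes
local targets = torch.LongTensor{3, 1, 7, 10}                 -- class indices
local loss = crit:forward(logProbs, targets)
print(loss)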

With these changes I can get it to run but I'm getting nonsensical error numbers when training (sometimes nan, sometimes impossibly high values).

Here's the full code. It works when not running through OpenCL (it gets a training error of 1.432 in about 60 seconds), so I think I did something wrong, but I can't figure out what.

require('nn')
require('cltorch')
require('clnn')

-- os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
-- os.execute('unzip cifar10torchsmall.zip')

net = nn.Sequential()

net:add(nn.SpatialConvolutionMM(3, 6, 5, 5))
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.SpatialConvolutionMM(6, 16, 5, 5))
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5))
net:add(nn.Linear(16*5*5, 120))
net:add(nn.Linear(120, 84))
net:add(nn.Linear(84, 10))
net:add(nn.LogSoftMax())
net:add(nn.Reshape(10, 1))

net = net:cl()


trainset = torch.load('cifar10-train.t7')
trainset.data = trainset.data:double()

testset = torch.load('cifar10-test.t7')
testset.data = testset.data:double()


mean = {} -- store the mean, to normalize the test set in the future
stdv  = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
    mean[i] = trainset.data[{ {}, {i}, {}, {}  }]:mean() -- mean estimation
    print('Channel ' .. i .. ', Mean: ' .. mean[i])
    trainset.data[{ {}, {i}, {}, {}  }]:add(-mean[i]) -- mean subtraction

    stdv[i] = trainset.data[{ {}, {i}, {}, {}  }]:std() -- std estimation
    print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
    trainset.data[{ {}, {i}, {}, {}  }]:div(stdv[i]) -- std scaling
end

for i=1,3 do -- over each image channel
    testset.data[{ {}, {i}, {}, {}  }]:add(-mean[i]) -- mean subtraction    
    testset.data[{ {}, {i}, {}, {}  }]:div(stdv[i]) -- std scaling
end


trainset.data = trainset.data:cl()

setmetatable(trainset, 
    {__index = function(t, i) 
        return {t.data[i], t.label[i]} 
    end}
);

function trainset:size() 
    return self.data:size(1) 
end

local labels = trainset.label

trainset.label = torch.Tensor(trainset.label:size(1), 10)

for i=1,trainset:size() do
    trainset.label[i] = torch.Tensor(10):fill(0)
    trainset.label[i][labels[i]] = 1
end

trainset.label = trainset.label:cl()

print(trainset)
--print(net:forward(trainset.data[4]))


criterion = nn.ClassNLLCriterion()
criterion = criterion:cl()


trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.001
trainer.maxIteration = 2


trainer:train(trainset)

If the differences between clnn's ClassNLLCriterion and nn's version are unintentional I'd love to help consolidate the interfaces.

temporal convolutions/pooling

Is a temporal convolution/pooling implementation coming in the near future? This would allow me to accelerate my model using the GPU.

Thanks

neural-style with clnn backend

I fixed neural-style to work with the clnn backend in my repo:
https://github.com/susloparovdenis/neural-style
When I run it with the -gpu 0 -backend clnn parameters I get the following warnings:

Successfully loaded /home/denis/Workspace/Art/neural-style//models/vgg_normalised.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Tahiti
Setting up style layer      2   :   relu1_1 
Setting up style layer      7   :   relu2_1 
THClReduceAll.cl build log: 
"/tmp/OCLI7nvej.cl", line 9: warning: variable "in1" was declared but never
          referenced
    float *in1 = &_in1;
           ^

"/tmp/OCLI7nvej.cl", line 10: warning: variable "out" was declared but never
          referenced
    float *out = &_out;
           ^


Setting up style layer      12  :   relu3_1 
Setting up style layer      21  :   relu4_1 
Setting up content layer    23  :   relu4_2 
Setting up style layer      30  :   relu5_1 

The resulting image differs a lot from the image produced with the nn backend.

Issue with using an older card or something in pre-reqs?

Hello,

I've been trying to run https://github.com/karpathy/char-rnn on a D2700 Atom computer with a GeForce 8400 GS, and I'm having problems with clnn specifically. Incidentally, I have been able to run the char-rnn project on the CPU, just not the GPU.

cltorch.test() passes, EasyCL tests pass. nn.test() passes. I've included the clnn test output.

From the output it seemed that maybe I had some sort of mismatch in the chain of dependent software, but I'm new to Torch/Lua/OpenCL so I'm pretty lost there. Alternatively, I thought this card might just be too old (OpenCL 1.1?) for some of the things that clnn is doing? Any help would be appreciated. I'm running Nvidia's 340.96 driver, as that's the one I could get working, on a clean install of Ubuntu 15.10.

clnntest.txt

clnn for neuralconvo support. LogSoftMax.lua:23

Hey, I am trying to add cltorch support to neuralconvo, and my code is here: cltorch support for the neural conversation model.

bash-3.2$ th train.lua --dataset 1000 --hiddenSize 100
libthclnn_searchpath    /Users/zhuxiaohu/torch/install/lib/lua/5.1/libTHCLNN.so
-- Loading dataset
data/vocab.t7 not found
-- Parsing Cornell movie dialogs data set ...
 [=============================================================== 387810/387810 =======>] Tot: 2s514ms | Step: 0ms
-- Pre-processing data
 [============================================================= 1000/1000 =============>] Tot: 242ms | Step: 0ms
-- Removing low frequency words
 [============================================================ 1569/1569 ==============>] Tot: 99ms | Step: 0ms
Writing data/examples.t7 ...
 [============================================================= 1569/1569 =============>] Tot: 323ms | Step: 0ms
Writing data/vocab.t7 ...

Dataset stats:
  Vocabulary size: 2536
         Examples: 1569
Using Apple , OpenCL platform: Apple
Using OpenCL device: HD Graphics 4000

It seems all right until the following error occurs:

kernel source:
1: // Threads per thread block
2: #define THCL_NONCONTIG_REDUCE_BLOCK_SIZE 32 * 16
3:
4: inline float modifyOp(float _in1) {
5:   float _out;
6:   float *in1 = &_in1;
7:   float *out = &_out;
8:   *out = *in1;
9:   return _out;
10: }
11:
12: inline float reduceOp(float _in1, float _in2) {
13:   // I guess the compiler can sort this stuff out :-P
14:   float _out;
15:   float *in1 = &_in1;
16:   float *in2 = &_in2;
17:   float *out = &_out;
18:   *out = *in1 + *in2;
19:   return _out;
20: }
21:
22: // kernel argument that defines tensor layout
23: typedef struct TensorInfoCl {
24:   // Extracts size/stride information for the kernel.
25:   // Successive dimensions can be collapsed if the size/strides match
26:   // up and thus there are no holes between the dimensions. This is used
27:   // to reduce the complexity of the problem.
28:   // The optional `reduceDim` indicates a reduction dimension for the
29:   // given tensor, so that the output size for this dimension will be 1.
30:
31:   int sizes[25];
32:   int strides[25];
33:   int offset;
34:   int dims;
35: } TensorInfoCl;
36: // Contiguous tensors of more than one dimension are collapsed down
37: // to one tensor
38:
39:
40: // Translate a linear index for the apply to a float* offset;
41: // specialized on `Dims` to reduce nvcc compilation time
42:
43:
44: inline int IndexToOffset_998_get(int linearId, global const TensorInfoCl *info) {
45:     return linearId + info->offset;
46: }
47:
48: inline int IndexToOffset_999_get(int linearId, global const TensorInfoCl *info) {
49:   int offset = info->offset;
50:
51:   // Use dynamic dims
52:   for (int i = info->dims - 1; i >= 0; --i) {
53:     int curDimIndex = linearId % info->sizes[i];
54:     int curDimOffset = curDimIndex * info->strides[i];
55:     offset += curDimOffset;
56:
57:     linearId /= info->sizes[i];
58:   }
59:
60:   return offset;
61: }
62:
63: inline int getLinearBlockId() {
64:   return get_group_id(2) * get_num_groups(1) * get_num_groups(0) +
65:     get_group_id(1) * get_num_groups(0) +
66:     get_group_id(0);
67: }
68:
69: // Block-wide reduction in shared memory helper; only /*threadIdx.x*/ get_local_id(0) == 0 will
70: // return the reduced value
71:
72: inline float reduceBlock( local float* smem,
73:                    int numVals,
74:                    float threadVal,
75:                    float init) {
76:   if (numVals == 0) {
77:     return init;
78:   }
79:
80:   if ((int)get_local_id(0) < numVals) {
81:     smem[ get_local_id(0)] = threadVal;
82:   }
83:
84:   // First warp will perform reductions across warps
85:   barrier(CLK_LOCAL_MEM_FENCE);
86:   if ((get_local_id(0) / 32) == 0) {
87:     float r = (int)get_local_id(0) < numVals ? smem[get_local_id(0)] : init;
88:
89:     for (int i = 32 + get_local_id(0); i < numVals; i += 32) {
90:       r = reduceOp(r, smem[i]);
91:     }
92:
93:     smem[get_local_id(0)] = r;
94:   }
95:
96:   // First thread will perform reductions across the block
97:   barrier(CLK_LOCAL_MEM_FENCE);
98:
99:   float r = init;
100:   if (get_local_id(0) == 0) {
101:     r = smem[0];
102:
103:     int numLanesParticipating = min(numVals, 32);
104:
105:     if (numLanesParticipating == 32) {
106:       // Unroll for 32 == 32 and numVals >= 32
107:       // #pragma unroll
108:       // unrolling by hand, so compiler-independent
109:
110:         r = reduceOp(r, smem[1]);
111:
112:         r = reduceOp(r, smem[2]);
113:
114:         r = reduceOp(r, smem[3]);
115:
116:         r = reduceOp(r, smem[4]);
117:
118:         r = reduceOp(r, smem[5]);
119:
120:         r = reduceOp(r, smem[6]);
121:
122:         r = reduceOp(r, smem[7]);
123:
124:         r = reduceOp(r, smem[8]);
125:
126:         r = reduceOp(r, smem[9]);
127:
128:         r = reduceOp(r, smem[10]);
129:
130:         r = reduceOp(r, smem[11]);
131:
132:         r = reduceOp(r, smem[12]);
133:
134:         r = reduceOp(r, smem[13]);
135:
136:         r = reduceOp(r, smem[14]);
137:
138:         r = reduceOp(r, smem[15]);
139:
140:         r = reduceOp(r, smem[16]);
141:
142:         r = reduceOp(r, smem[17]);
143:
144:         r = reduceOp(r, smem[18]);
145:
146:         r = reduceOp(r, smem[19]);
147:
148:         r = reduceOp(r, smem[20]);
149:
150:         r = reduceOp(r, smem[21]);
151:
152:         r = reduceOp(r, smem[22]);
153:
154:         r = reduceOp(r, smem[23]);
155:
156:         r = reduceOp(r, smem[24]);
157:
158:         r = reduceOp(r, smem[25]);
159:
160:         r = reduceOp(r, smem[26]);
161:
162:         r = reduceOp(r, smem[27]);
163:
164:         r = reduceOp(r, smem[28]);
165:
166:         r = reduceOp(r, smem[29]);
167:
168:         r = reduceOp(r, smem[30]);
169:
170:         r = reduceOp(r, smem[31]);
171:
172:     } else {
173:       for (int i = 1; i < numLanesParticipating; ++i) {
174:         r = reduceOp(r, smem[i]);
175:       }
176:     }
177:   }
178:
179:   return r;
180: }
181:
182:
183:
184:
185: inline int getReduceNoncontigDimSliceIndex() {
186:   // Each thread handles one slice
187:   return getLinearBlockId() * THCL_NONCONTIG_REDUCE_BLOCK_SIZE + /*threadIdx.x*/ get_local_id(0);
188: }
189:
190: // Kernel that handles an entire reduction of a slice of a tensor per each thread
191: kernel void
192: THClTensor_reduceNoncontigDim(global TensorInfoCl *out_info,
193:                               global float *out_data,
194:                               global TensorInfoCl *in_info,
195:                               global float *in_data,
196:                               int reductionStride,
197:                               int reductionSize,
198:                               int totalSlices,
199:                               float init) {
200:   const int sliceIndex = getReduceNoncontigDimSliceIndex();
201:
202:   if ((int)sliceIndex >= totalSlices) {
203:     return;
204:   }
205:
206:   // Each thread picks a point in `out` and `in` for which it is
207:   // producing the reduction
208:   const int outOffset =
209:     IndexToOffset_998_get(sliceIndex, &out_info[0]);
210:   const int inBaseOffset =
211:     IndexToOffset_998_get(sliceIndex, &in_info[0]);
212:
213:   // For each point in reductionSize, reduce into `r`
214:   int inOffset = inBaseOffset;
215:   float r = init;
216:
217:   for (int i = 0; (int)i < reductionSize; ++i) {
218:     r = reduceOp(r, modifyOp(in_data[inOffset]));
219:     inOffset += reductionStride;
220:   }
221:
222:   // Write out reduced value
223:   out_data[outOffset] = r;
224: }
225:
226: inline int getReduceContigDimSliceIndex() {
227:   // Each block handles one slice
228:   return getLinearBlockId();
229: }
230:
231: // Kernel that handles an entire reduction of a slice of a tensor per
232: // each block
233: kernel void
234: THClTensor_reduceContigDim(global TensorInfoCl *out_info,
235:                            global float *out_data,
236:                            global TensorInfoCl *in_info,
237:                            global float *in_data,
238:                            int reductionSize,
239:                            int totalSlices,
240:                            float init,
241:                            local float *smem) {
242:   const int sliceIndex = getReduceContigDimSliceIndex();
243:
244:   if ((int)sliceIndex >= totalSlices) {
245:     return;
246:   }
247:
248:   // Get the offset in `out` for the reduction
249:   const int outOffset =
250:     IndexToOffset_998_get(sliceIndex, &out_info[0]);
251:
252:   // Get the base offset in `in` for this block's reduction
253:   const int inBaseOffset =
254:     IndexToOffset_998_get(sliceIndex, &in_info[0]);
255:
256:   // Each thread in the block will reduce some subset of elements in
257:   // the slice. The elements are guaranteed contiguous starting at
258:   // `inBaseOffset`.
259:   float r = init;
260:   for (int i = /*threadIdx.x*/ get_local_id(0); (int)i < reductionSize; i += /*blockDim.x*/ get_local_size(0)) {
261:     r = reduceOp(r, modifyOp(in_data[inBaseOffset + i]));
262:   }
263:
264:   // Reduce within the block
265: //  extern __shared__ float smem[];
266:   r = reduceBlock(smem, /*blockDim.x*/ get_local_size(0), r, init);
267:
268:   if (/*threadIdx.x*/ get_local_id(0) == 0) {
269:     // Write out reduced value
270:     out_data[outOffset] = r;
271:   }
272: }
273:
274:

Invalid work group size, code -54
/Users/zhuxiaohu/torch/install/bin/luajit: ...huxiaohu/torch/install/share/lua/5.1/clnn/LogSoftMax.lua:23:
kernel source:
1: // Threads per thread block
2: #define THCL_NONCONTIG_REDUCE_BLOCK_SIZE 32 * 16
3:
4: inline float modifyOp(float _in1) {
5:   float _out;
6:   float *in1 = &_in1;
7:   float *out = &_out;
8:   *out = *in1;
9:   return _out;
10: }
11:
12: inline float reduceOp(float _in1, float _in2) {
13:   // I guess the compiler can sort this stuff out :-P
14:   float _out;
15:   float *in1 = &_in1;
16:   float *in2 = &_in2;
17:   float *out = &_out;
18:   *out = *in1 + *in2;
19:   return _out;
20: }
21:
22: // kernel argument that defines tensor layout
23: typedef struct TensorInfoCl {
24:   // Extracts size/stride information for the kernel.
25:   // Successive dimensions can be collapsed if the size/strides match
26:   // up and thus there are no holes between the dimensions. This is used
27:   // to reduce the complexity of the problem.
28:   // The optional `reduceDim` indicates a reduction dimension for the
29:   // given tensor, so that the output size for this dimension will be 1.
30:
31:   int sizes[25];
32:   int strides[25];
33:   int offset;
34:   int dims;
35: } TensorInfoCl;
36: // Contiguous tensors of more than one dimension are collapsed down
37: // to one tensor
38:
39:
40: // Translate a linear index for the apply to a float* offset;
41: // specialized on `Dims` to reduce nvcc compilation time
42:
43:
44: inline int IndexToOffset_998_get(int linearId, global const TensorInfoCl *info) {
45:     return linearId + info->offset;
46: }
47:
48: inline int IndexToOffset_999_get(int linearId, global const TensorInfoCl *info) {
49:   int offset = info->offset;
50:
51:   // Use dynamic dims
52:   for (int i = info->dims - 1; i >= 0; --i) {
53:     int curDimIndex = linearId -1180192744nfo->sizes[i];
54:     int curDimOffset = curDimIndex * info->strides[i];
55:     offset += curDimOffset;
56:
57:     linearId /= info->sizes[i];
58:   }
59:
60:   return offset;
61: }
62:
63: inline int getLinearBlockId() {
64:   return get_group_id(2) * get_num_groups(1) * get_num_groups(0) +
65:     ge
stack traceback:
    [C]: in function 'sum'
    ...huxiaohu/torch/install/share/lua/5.1/clnn/LogSoftMax.lua:23: in function 'updateOutput'
    ...s/zhuxiaohu/torch/install/share/lua/5.1/rnn/Recursor.lua:24: in function 'updateOutput'
    .../zhuxiaohu/torch/install/share/lua/5.1/rnn/Sequencer.lua:47: in function 'updateOutput'
    .../zhuxiaohu/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./seq2seq.lua:79: in function 'train'
    train.lua:82: in main chunk
    [C]: in function 'dofile'
    ...aohu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:143: in main chunk
    [C]: at 0x0109d1cbc0

Seems like I should implement some methods in clnn LogSoftMax.lua.

clnn with mac Intel HD4000

Hi, I am trying to run this code (https://github.com/karpathy/char-rnn), which has support for OpenCL. Macs normally ship with OpenCL preinstalled. However, when I try to run the code with the OpenCL option, it says to install the clnn and cltorch modules (they are installed) and, if they are installed, to check my OpenCL driver's configuration. This is on a MacBook Pro with an Intel HD4000 card running OS X Mavericks. Any idea what is happening? How do I fix it?

inplace ReLU does not work

th> m = nn.ReLU(true):cl()
                                                                      [0.0001s]
th> m:forward(torch.ClTensor(8))
Using Apple platform: Apple
Using device: HD Graphics 4000
/usr/local/share/lua/5.1/clnn/Threshold.lua:8: attempt to index field 'input' (a nil value)
stack traceback:
    /usr/local/share/lua/5.1/clnn/Threshold.lua:8: in function 'Threshold_updateOutput'
    /usr/local/share/lua/5.1/nn/Threshold.lua:20: in function 'forward'
    [string "_RESULT={m:forward(torch.ClTensor(8))}"]:1: in main chunk
    [C]: in function 'xpcall'
    /usr/local/share/lua/5.1/trepl/init.lua:630: in function 'repl'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
    [C]: at 0x0106c38400
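A hedged workaround sketch until the in-place path is fixed: construct the module without the in-place flag, which takes the ordinary Threshold path.

local m = nn.ReLU(false):cl()   -- or simply nn.ReLU():cl(); inplace defaults to false
local y = m:forward(torch.ClTensor(8))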

On Mac: "THCLNN.lua:11: bad argument #1 to 'load'"

see #22 (comment) :

Per @Salicylic :

After updating to the most recent clnn version, the build goes through, but now require 'clnn' fails:

th> require 'clnn'
libthclnn_searchpath    nil
...ta/Users/jan/torch/install/share/lua/5.1/clnn/THCLNN.lua:11: bad argument #1 to 'load' (string expected, got nil)
stack traceback:
    [C]: in function 'load'
    ...ta/Users/jan/torch/install/share/lua/5.1/clnn/THCLNN.lua:11: in main chunk
    [C]: in function 'require'
    ...Data/Users/jan/torch/install/share/lua/5.1/clnn/init.lua:5: in main chunk
    [C]: in function 'require'
    stdin:1: in main chunk
    [C]: at 0x010486aba0
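A small diagnostic sketch, assuming THCLNN.lua locates the shared library via package.searchpath over package.cpath (consistent with the "libthclnn_searchpath nil" line above): printing the search path can show whether libTHCLNN was actually installed where LuaJIT looks.

print(package.cpath)
print(package.searchpath('libTHCLNN', package.cpath))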

LookupTable broken

I think that some recent refactoring in torch/nn broke the backward pass for nn.LookupTable when running with cltorch. Here's a quick test case:

require 'torch'
require 'nn'
require 'cltorch'
require 'clnn'

local N, D, V = 3, 4, 5
local lookup_table = nn.LookupTable(V, D)
local x = torch.Tensor(N):random(V)
local dy = torch.randn(N, D)
local y = lookup_table:forward(x)
local dx = lookup_table:backward(x, dy)

local x_cl = x:cl()
local dy_cl = dy:cl()
lookup_table:cl()
local y_cl = lookup_table:forward(x_cl)
local dx_cl = lookup_table:backward(x_cl, dy_cl)

This crashes with the error:

/home/justin/torch/install/bin/luajit: /home/justin/torch/install/share/lua/5.1/nn/LookupTable.lua:73: attempt to call field 'LookupTable_accGradParameters' (a nil value)
stack traceback:
    /home/justin/torch/install/share/lua/5.1/nn/LookupTable.lua:73: in function 'accGradParameters'
    /home/justin/torch/install/share/lua/5.1/nn/Module.lua:31: in function 'backward'
    foo.lua:17: in main chunk
    [C]: in function 'dofile'
    ...stin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00406670

I tried updating torch, nn, cltorch, and clnn, but the error persists; I'm running on Ubuntu 14.04.
