A numeric optimization package for Torch. License: Other
This package contains several optimization routines and a logger for Torch:
Currently the running 'mean square' variable 'm' of rmsprop is initialized to zero. If alpha is close to one (e.g. the default value 0.99), the gradient is divided by a number < 1 during the first few iterations. Especially at the beginning of the optimization it might be beneficial not to amplify the gradient too much: with the current implementation the learning rate has to be set much smaller for rmsprop than for plain-vanilla SGD in order not to diverge, and I quite often see extreme error values during the first few rmsprop steps.
A simple solution could be to initialize 'm' with 1, e.g. :fill(1) instead of :zero(), or to allow an initialization value to be specified in the optimization state/options.
A different approach could be to estimate the mean over N timesteps without dividing, e.g. run a warm-up phase, or use a boolean 'reset' flag that, when true, initializes the mean to the gradient values of the next batch (instead of averaging over multiple steps).
BTW Does anybody know a publication dealing with a double-exponential smoothing rmsprop-alternative?
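To illustrate the effect with a scalar sketch (a simplified rmsprop update written from scratch, not the actual optim/rmsprop.lua code): with m initialized to 0 and alpha = 0.99, the first update divides the gradient by sqrt(0.01 * g^2) = 0.1 * |g|, amplifying the step roughly tenfold.

```python
def rmsprop_step(grad, m, lr=0.01, alpha=0.99, eps=1e-8):
    m = alpha * m + (1 - alpha) * grad * grad   # running mean of squared grads
    step = lr * grad / (m ** 0.5 + eps)
    return step, m

step_zero, _ = rmsprop_step(1.0, m=0.0)   # m initialized with :zero()
step_one, _ = rmsprop_step(1.0, m=1.0)    # m initialized with :fill(1)
print(step_zero, step_one)                # first step is ~10x larger with m = 0
```

With m = 0 the first step is about 0.1, versus about 0.01 with m = 1, which matches the divergence behavior described above.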
According to the TensorFlow implementation, it seems to me that line 53 of 'rmsprop.lua' needs to be modified as
state.tmp:sqrt(state.m):add(epsilon) --> state.tmp:sqrt(state.m+epsilon)
Is it okay to use the original one without modification when I try to train Inception-ResNet v2 from scratch using the same optimization parameters?
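For intuition, here is a scalar Python sketch of the two placements (my own toy code, with an exaggerated eps for illustration, not either implementation): they are nearly identical when m is large, but differ by orders of magnitude when m is much smaller than eps, which is exactly the early-iteration regime discussed in the previous issue.

```python
import math

def denom_outside(m, eps=1e-4):
    return math.sqrt(m) + eps      # sqrt(m) + eps placement

def denom_inside(m, eps=1e-4):
    return math.sqrt(m + eps)      # sqrt(m + eps) placement

m_tiny, m_large = 1e-12, 1.0
print(denom_outside(m_tiny), denom_inside(m_tiny))    # very different
print(denom_outside(m_large), denom_inside(m_large))  # nearly identical
```

The inside placement gives a much larger denominator for tiny m, so it damps the first few steps more aggressively.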
These two lines of code don't check the length of target, so self.mat will be wrongly updated if target:numel() == 1.
Note that feeding ConfusionMatrix with a target that is a dim-1 tensor of length 1 is possible because of this modification to nn.ClassNLLCriterion. I didn't submit a patch because I'm not sure whether we should support or disallow this kind of size.
Hi, are the developers interested in adding the Levenberg-Marquardt algorithm to the package? I have implemented it recently in Torch. Practice has shown that it is one of the best optimization algorithms for common (not deep) neural nets.
Hi,
This could be a good addition to your library:
Adasecant
"In this paper, we propose an adaptive learning rate algorithm, which utilizes stochastic curvature information of the loss function for automatically tuning the learning rates"
@clementfarabet elaborate here?
The current release 1.0.3-1 is bugged! Could we please release a new one where the gnuplot issue has been definitely fixed (by my commit)? I'm getting this bug on every machine.
Adagrad uses torch.sqrt applied to CudaTensors. Unfortunately, that's not supported, so it fails with:
t7> =torch.sqrt(t,torch.CudaTensor(10))
expected arguments: [DoubleTensor] DoubleTensor | double
stack traceback:
[C]: at 0x7f1057ba8140
[C]: at 0x7f1057bc9070
[C]: at 0x7f1063c67960
t7>
Currently, Logger:plot(...) always displays a plot window and optionally writes to a file, i.e. it's not possible to save a plot without displaying it. This can be inconvenient when running a batch of experiments, for example. I suggest introducing a new member variable self.showPlot = true to control the behaviour.
Is it possible for Logger:add(s) to add only a subset of the variables? For instance, I would like to call Logger:setNames({'loss', 'epoch', 'batch'}) and then do Logger:add({loss=0.5, batch=10}). The unknown field epoch should be set to a default value, say zero.
I think the current version of Logger is designed to only take an ordered array as argument (?). In the above example, if I do Logger:add({0.5, 10}), it will simply set epoch to 10 and leave batch blank. This behavior is quite a pain when logging lots of variables.
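To sketch the requested behavior (a hypothetical API written from scratch, not the current optim Logger): record values by name and fall back to a default for unspecified fields.

```python
class NamedLogger:
    """Toy logger: values are keyed by name, missing fields get a default."""
    def __init__(self, names, default=0):
        self.names = names
        self.default = default
        self.rows = []

    def add(self, **values):
        unknown = set(values) - set(self.names)
        assert not unknown, f"unknown fields: {unknown}"
        # Missing names fall back to the default instead of shifting positions.
        self.rows.append([values.get(n, self.default) for n in self.names])

log = NamedLogger(['loss', 'epoch', 'batch'])
log.add(loss=0.5, batch=10)   # 'epoch' falls back to the default
print(log.rows)               # [[0.5, 0, 10]]
```

Keying by name avoids the positional-shift problem described above, where {0.5, 10} silently assigns 10 to epoch.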
I try to give a vector of learning rates per layer, but it does not work. Here is my code:
local learningRates = {}
local params, gradParams = model:parameters()
print(params[1]:size())
for i=1, #params do
learningRates[i] = opt.LR
end
print("setting LR")
learningRates[#params] = opt.topLayerLR
-- print(learningRates)
learningRates = torch.Tensor(learningRates):reshape(#params,1)
Could you help me use learningRates in a proper way?
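For what it's worth, my reading of optim/sgd.lua is that config.learningRates is applied elementwise to the flattened parameter vector from getParameters(), so a tensor of length #params (one entry per layer) will not match. A hedged NumPy sketch of expanding per-layer rates to per-element rates (layer_sizes and layer_lrs are made-up values):

```python
import numpy as np

layer_sizes = [20, 5]        # e.g. numbers of elements in params[1], params[2]
layer_lrs = [0.01, 0.001]    # base LR for most layers, a smaller top-layer LR

# One learning rate per flattened parameter element, not per layer.
per_element_lrs = np.concatenate(
    [np.full(n, lr) for n, lr in zip(layer_sizes, layer_lrs)]
)
print(per_element_lrs.shape)  # matches the flattened parameter count (25,)
```

The resulting vector has the same length as the flattened parameters, which is what an elementwise rate option would need.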
Hi
I ran an experiment with optim.lbfgs which uses optim.lswolfe which in turn calls the function roots() in optim/polyinterp.lua, and my program crashed at line https://github.com/torch/optim/blob/master/polyinterp.lua#L35.
I found that this line won't pass in case n == 1, so it might be a bug
thanks
rushan chen
Hi all,
This is really my first time using minibatch mode, since I've been sticking to minibatch = 1 (SGD ;-) for a long time. Calling batchAdd() with a single input and a single label should be equivalent, but it seems there is a small bug in batchAdd() at Click me ;-)
error case:
for network model
net = nn.Sequential()
....
-- expected to output 8 predictions
net:add(nn.Reshape(8))
and with input and label shown below
input = torch.FloatTensor(1,4,12,12)
target = torch.FloatTensor(1,1)
The output data format is torch.LongTensor(8) and torch.FloatTensor(1,1).
Then we go to here ->> WARNING for the prediction and here for the label. Finally, we get pred as torch.LongTensor(4) but label as torch.FloatTensor(1). As a result, out of range is thrown here. x_x
Summary:
batchAdd() and add() work fine with minibatch > 2.
nn.Reshape(8) will squeeze the first dimension: 1x8x1x1 --> 8, which is one of the causes of this bug.
batchAdd is not all-covering; maybe a mutual check of preds and targets should be considered.
Just reporting the bug; if it's helpful to you, why not star the repo?
happy in hacking~ ;-)
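The squeeze described above is easy to reproduce in a NumPy sketch (illustrative shapes, not the actual ConfusionMatrix code): a batch of one loses its leading dimension, so a defensive batchAdd could normalize both tensors to 2-D before comparing them.

```python
import numpy as np

out = np.zeros((1, 8))    # network output for a single-sample minibatch
pred = out.reshape(8)     # nn.Reshape(8)-style squeeze: 1x8 -> 8
target = np.zeros((1, 1)) # target keeps its batch dimension

# A defensive batchAdd would first restore the batch dimension:
pred2d = pred.reshape(1, -1) if pred.ndim == 1 else pred
print(pred.shape, target.shape, pred2d.shape)
```

After normalization, pred2d and target agree on the batch dimension again, avoiding the shape mismatch between the prediction and the label.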
Hi,
The optim.plot() doesn't plot while I train from inside a Docker container. Any solutions?
Hello all,
Can I use ConfusionMatrix to compute top-n accuracy for my classifier or not?
If yes, how can I do that? Is there any sample code to do that?
Thanks
Hello,
I would like to plot the confusion matrix for the ResNet network https://github.com/facebook/fb.resnet.torch. I substituted the 1000-way classifier with a binary one.
My output is a tensor of size 32x2 and my target a tensor of size 32x1.
When I try to use ConfusionMatrix:batchAdd I get this error:
/home/jessica/torch/install/bin/luajit: ...ca/torch/install/share/lua/5.1/optim/ConfusionMatrix.lua:117: bad argument #1 to 'indexAdd' (out of range at /home/jessica/torch/pkg/torch/lib/TH/generic/THTensor.c:729)
stack traceback:
[C]: in function 'indexAdd'
...ca/torch/install/share/lua/5.1/optim/ConfusionMatrix.lua:117: in function 'batchAdd'
./train.lua:71: in function 'train'
main.lua:59: in main chunk
[C]: in function 'dofile'
...sica/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
Given that I am new to Torch and Lua, could you help me understand what is going wrong? Thank you.
Hello,
I have a neural network using 'nn' and training with 'optim' package.
When training my model, all CPUs are taken and working 100%.
I wonder whether the optim methods automatically find all available CPUs and perform parallel processing.
If so, is there an option to configure the number of CPUs used for training?
Thanks
The weightDecay in Torch refers to L2, yes?
So how can I implement L1 weight decay?
Can anyone specify what values config.dampening takes in optim.sgd?
I use optim/1.0.5-0 (with macOS as host).
Here's a torch session as proof (see the first bold section, before a ConfusionMatrix instance is printed, and the bold section after it is printed):
th> foo = optim.ConfusionMatrix({'1', '2'})
[0.0001s]
th> foo:add(1,2)
[0.0001s]
th> foo.totalValid
0
[0.0000s]
th> foo:add(1,1)
[0.0000s]
th> foo.totalValid
0
[0.0001s]
th> print(foo)
ConfusionMatrix:
[[ 1 0] 100.000% [class: 1]
[ 1 0]] 0.000% [class: 2]
th> foo.totalValid
0.5
[0.0000s]
I suppose that a ConfusionMatrix instance should calculate its 'totalValid' field properly even if that instance hasn't been displayed before the 'totalValid' field is referenced.
In the polyinterp function, cp is first an N-dim vector, and then is reshaped into an Nx2 matrix, only if no NaNs are found.
In the case where NaNs are found that statement crashes, as it expects an Nx2 matrix.
Not sure what the right behavior should be there. Is it normal that NaNs appear at that point?
I think it's time for a new rockspec.
Could we do without tags and just have a rockspec that gets updated as the repository gets updated?
While training with sgd works fine for the exact same architecture, rmsprop throws the following error:
qlua: /home/psxab5/torch/install/share/lua/5.1/optim/rmsprop.lua:49: calling 'addcmul' on bad self (sizes do not match at /tmp/luarocks_cutorch-scm-1-4946/cutorch/lib/THC/THCTensorMath.cu:231)
stack traceback:
[C]: at 0x7fcea4fe4e20
[C]: in function 'addcmul'
/home/psxab5/torch/install/share/lua/5.1/optim/rmsprop.lua:49: in function 'rmsprop'
./train.lua:62: in function 'train'
main.lua:91: in main chunk
This seems way too basic, but I've been looking at this for several hours and can't see a way around this:
I have a basic script which is creating a network and checking the gradient. When run on the CPU using double precision floats, the gradient check is good. On GPU, because CudaTensor objects use single precision floats, the analytic and numerical gradients don't match. This issue can also be replicated by setting the Tensor type on CPU to be FloatTensor. I feel like I have to be messing something up, but really don't see anything.
Here's the script:
require 'nn'
require 'optim'
require 'cunn'
local inputSize = 10
local outputSize = 10
local batchSize = 5
function check_gpu()
-- First we do everything on CPU
-- If you uncomment this line, the CPU estimates will be inconsistent too
-- torch.setdefaulttensortype('torch.FloatTensor')
local net = nn.Sequential()
-- :add(nn.Sigmoid())
:add(nn.Linear(inputSize, outputSize))
local w,dw = net:getParameters()
local inp = torch.randn(batchSize, inputSize)
local tgt = torch.zeros(batchSize)
for i=1,batchSize do
local i1 = torch.random(1, inputSize)
tgt[i] = i1
end
local crit = nn.CrossEntropyCriterion()
crit.sizeAverage = false
local feval = function(x)
if x ~= w then w:copy(x) end
local out = net:forward(inp)
local ce = crit:forward(out, tgt)
local gradOutput = crit:backward(out, tgt)
net:backward(inp, gradOutput)
return ce, dw
end
local diff_cpu = optim.checkgrad(feval, w, 1e-4)
print ('on cpu', diff_cpu)
-- Then we do everything on GPU
net:cuda()
-- inp = inp:type('torch.CudaDoubleTensor')
-- tgt = tgt:type('torch.CudaDoubleTensor')
inp = inp:cuda()
tgt = tgt:cuda()
w, dw = net:getParameters()
dw:zero()
crit:cuda()
local diff_gpu = optim.checkgrad(feval, w, 1e-4)
print ('on gpu now', diff_gpu)
end
check_gpu()
Here's some sample output:
on cpu 4.0399252194701e-10
on gpu now 0.003014417524189
Hi, it seems that the learningRate and learningRateDecay in Adam always get replaced by the default values 0.001 and 0 instead of the user-set values. Can anyone have a check?
Thanks!
In this line the column width defaults to %8d; can we customize it? It becomes tough to view when #classes is large.
lswolfe uses torch.Tensor by default internally and does not work with CudaTensors. If you set the default tensor type to CudaTensor it still doesn't work. This prevents using line searches with lbfgs when using CudaTensors.
When I try to use fista for optimization on the GPU, I get an error. It is also reported here.
I am currently using optim.adam to train my network. Say I train my network up to the xth epoch and save my model; what settings in the optim function should I save in order to continue training?
I notice that if I just load my saved model, the computed loss does not follow the trend (it actually goes back to the loss computed in the first epoch). There must be some state I need to reload in order to get back to a similar loss.
The way I compare the results is by computing the loss at epoch x + n, having saved my model at epoch x. Then I reload the saved model from epoch x, train for n more epochs, and compare the computed loss.
Technically speaking they should be similar. I hope someone can shed some light on this issue.
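As far as I can tell, the state table passed to optim.adam must be serialized together with the model, since it holds the running moment estimates and the step count used for bias correction; a fresh state restarts the bias correction and perturbs the first updates. A from-scratch Python sketch with scalar parameters (illustrative names, not the optim API):

```python
import pickle

state = {'t': 0, 'm': 0.0, 'v': 0.0}   # step count, 1st and 2nd moment estimates

def adam_step(grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    state['t'] += 1
    state['m'] = b1 * state['m'] + (1 - b1) * grad
    state['v'] = b2 * state['v'] + (1 - b2) * grad * grad
    mhat = state['m'] / (1 - b1 ** state['t'])   # bias correction depends on t
    vhat = state['v'] / (1 - b2 ** state['t'])
    return lr * mhat / (vhat ** 0.5 + eps)

adam_step(0.5, state)
saved = pickle.dumps(state)        # persist alongside the model checkpoint
restored = pickle.loads(saved)
print(restored['t'])               # resuming continues from the same step count
```

Restoring this state lets training resume with the same effective step sizes it had at save time.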
Say we log time, train accuracy, and test accuracy. We cannot plot time and accuracy in the same figure since the scales are different. I suggest adding the ability to select which categories to plot; in my example we probably want to plot only the accuracies.
Dear all,
are there any plans to include an implementation of RMSProp in the near future? I know that there are already several basic/toy implementations available elsewhere [1,2,3], but it would be nice to have a solid implementation included in the optim-package, making it easier to use in terms of installing additional packages etc...
Thanks,
Michael
[1] https://github.com/w-cheng/optimx/blob/master/rmsprop.lua
[2] https://github.com/y0ast/VariationalDeconvnet/blob/master/rmsprop.lua
[3] https://github.com/kaishengtai/torch-ntm/blob/master/rmsprop.lua
While playing with the example for MarginRankingCriterion at https://github.com/torch/nn/blob/master/doc/criterion.md#nn.MarginRankingCriterion, I noticed that optim does not seem to be able to handle cloned weights and biases, due to a size mismatch between the flattened parameters and the flattened gradient params. Here's a simple example:
require 'nn'
require 'optim'
p1_mlp = nn.Linear(5, 2)
p2_mlp = p1_mlp:clone('weight', 'bias')
prl = nn.ParallelTable()
prl:add(p1_mlp)
prl:add(p2_mlp)
mlp1 = nn.Sequential()
mlp1:add(prl)
mlp1:add(nn.DotProduct())
mlp2 = mlp1:clone('weight', 'bias')
mlpa = nn.Sequential()
prla = nn.ParallelTable()
prla:add(mlp1)
prla:add(mlp2)
mlpa:add(prla)
criterion = nn.MarginRankingCriterion(0.1)
x, y, z = torch.randn(5), torch.randn(5), torch.randn(5)
parameters, gradParameters = mlpa:getParameters()
print(parameters:size(), gradParameters:size()) -- show size difference
function feval(params)
local pred = mlpa:forward({{x, y}, {x, z}})
local err = criterion:forward(pred, 1)
local gradCriterion = criterion:backward(pred, 1)
mlpa:backward({{x, y}, {x, z}}, gradCriterion)
return err, gradParameters
end
optErr = optim.sgd(feval, parameters, {learningRate=0.01})
Which gives me:
12
[torch.LongStorage of size 1]
48
[torch.LongStorage of size 1]
luajit: .../share/lua/5.1/optim/sgd.lua:81: inconsistent tensor size at /tmp/luarocks_torch-scm-1-5221/torch7/lib/TH/generic/THTensorMath.c:456
stack traceback:
[C]: in function 'add'
.../share/lua/5.1/optim/sgd.lua:81: in function 'sgd'
example.lua:37: in main chunk
[C]: in function 'dofile'
.../lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406260
The example in the documentation uses updateParameters(), which handles the parameter update differently from optim's addition: see e.g. https://github.com/torch/optim/blob/master/sgd.lua#L81 versus the following in nn/Module.lua:
for i=1,#params do
params[i]:add(-learningRate, gradParams[i])
end
This is a bit confusing. Of course I could somehow resize the gradParameters in feval, but that doesn't seem like the right way to do this. I would say that if something works with the simple backward/updateParameters loop, it should also work with optim. Am I missing something here?
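If I understand Torch's sharing semantics correctly (this is an assumption, not a confirmed diagnosis), clone('weight', 'bias') shares the weight storages but not gradWeight/gradBias, so getParameters() flattens each shared weight storage once (12 elements) while the gradient buffers are all distinct (12 * 4 = 48 elements); cloning with 'gradWeight', 'gradBias' as well should make the sizes agree. A conceptual Python sketch of the counting:

```python
w = list(range(12))                     # one real weight storage (12 params)
params = [w, w, w, w]                   # 4 modules sharing the same storage
grads = [[0.0] * 12 for _ in range(4)]  # 4 independent gradient storages

def flat_size(tensors):
    """Count elements, counting each shared storage only once."""
    seen, total = set(), 0
    for t in tensors:
        if id(t) not in seen:
            seen.add(id(t))
            total += len(t)
    return total

print(flat_size(params), flat_size(grads))  # 12 vs 48, as in the error above
```

An elementwise update like sgd's params:add(-lr, gradParams) requires the two flattened vectors to be the same length, which is why the mismatch crashes.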
I've been using Adagrad normally, but I decided to try rmsprop to see if it improves accuracy. In our tests, rmsprop seemed to converge faster and to a higher maximum, so we started training our big models with it. However, I have noticed something strange during training. Seemingly at random, the accuracy suddenly drops precipitously and the loss shoots up. Sometimes I've even seen "infinity" in our testing results when this happens, as if one of the model parameters was accidentally set to infinity, causing a cascade of failed calculations. See these results:
This is one of the first rmsprop runs:
decayed learning rate by a factor 0.97 to 0.012665023782736
iteration 6800/17090000, seq_length = 500, loss = 25.60889482, loss/seq_len = 0.02560889, gradnorm = 1.3048e+01. Time Elapsed: 3070 seconds
iteration 6850/17090000, seq_length = 500, loss = 35.99438245, loss/seq_len = 0.03599438, gradnorm = 1.7849e+01. Time Elapsed: 3158 seconds
iteration 6900/17090000, seq_length = 500, loss = 14.20753793, loss/seq_len = 0.01420754, gradnorm = 1.6731e+01. Time Elapsed: 3185 seconds
iteration 6950/17090000, seq_length = 500, loss = 31.02228065, loss/seq_len = 0.03102228, gradnorm = 2.1421e+01. Time Elapsed: 3205 seconds
decayed learning rate by a factor 0.97 to 0.012285073069254
iteration 7000/17090000, seq_length = 500, loss = 126072.68073179, loss/seq_len = 126.07268073, gradnorm = 9.3243e+03. Time Elapsed: 3183 seconds
iteration 7050/17090000, seq_length = 500, loss = 71258.54748077, loss/seq_len = 71.25854748, gradnorm = 9.2335e+03. Time Elapsed: 6792 seconds
iteration 7100/17090000, seq_length = 500, loss = 59993.95191604, loss/seq_len = 59.99395192, gradnorm = 8.9946e+03. Time Elapsed: 3071 seconds
iteration 7150/17090000, seq_length = 500, loss = 80161.97462837, loss/seq_len = 80.16197463, gradnorm = 9.0648e+03. Time Elapsed: 3223 seconds
decayed learning rate by a factor 0.97 to 0.011916520877176
iteration 7200/17090000, seq_length = 500, loss = 62363.37415352, loss/seq_len = 62.36337415, gradnorm = 6.3187e+03. Time Elapsed: 3077 seconds
iteration 7250/17090000, seq_length = 500, loss = 77396.41234885, loss/seq_len = 77.39641235, gradnorm = 6.3629e+03. Time Elapsed: 2930 seconds
iteration 7300/17090000, seq_length = 500, loss = 66974.65153092, loss/seq_len = 66.97465153, gradnorm = 5.9655e+03. Time Elapsed: 2989 seconds
iteration 7350/17090000, seq_length = 500, loss = 34369.91119689, loss/seq_len = 34.36991120, gradnorm = 5.8163e+03. Time Elapsed: 2813 seconds
Notice what happens around iteration 7000. The loss just shoots up all of a sudden. If I check the testing results, the testing loss is "infinity". It goes back to normal in subsequent iterations. At first I thought it was a rare hardware issue, but then a different model did the same thing:
Iteration Time Training Loss Testing Loss Testing # Correct Testing # Wrong Testing # Total Accuracy
1000 3032 1.998393671 3.460828 8220 140937 149157 5.51
2000 3321 1.506352061 1.13135852 106180 42977 149157 71.19
3000 3389 0.6526988754 0.6081444923 126793 22364 149157 85.01
4000 3382 0.4032474733 0.4583896942 131588 17569 149157 88.22
5000 3075 2.197617545 17.48262351 60603 88554 149157 40.63
In this second example, I can see in the logs the point where the loss starts shooting up. It doesn't appear to be instantaneous; perhaps an error is made in one iteration that slowly cascades until it affects everything.
decayed learning rate by a factor 0.97 to 0.01825346
iteration 4400/17090000, seq_length = 500, loss = 0.38249470, gradnorm = 8.0499e+01. Time Elapsed: 3280 seconds
iteration 4450/17090000, seq_length = 500, loss = 0.37212085, gradnorm = 2.9393e+02. Time Elapsed: 3426 seconds
iteration 4500/17090000, seq_length = 500, loss = 0.36586265, gradnorm = 8.7689e+01. Time Elapsed: 3288 seconds
iteration 4550/17090000, seq_length = 500, loss = 0.35865728, gradnorm = 5.4034e+01. Time Elapsed: 3416 seconds
decayed learning rate by a factor 0.97 to 0.0177058562
iteration 4600/17090000, seq_length = 500, loss = 0.40036575, gradnorm = 7.8565e+01. Time Elapsed: 3327 seconds
iteration 4650/17090000, seq_length = 500, loss = 0.42660431, gradnorm = 2.2500e+02. Time Elapsed: 3309 seconds
iteration 4700/17090000, seq_length = 500, loss = 0.49915671, gradnorm = 4.2741e+03. Time Elapsed: 3237 seconds
iteration 4750/17090000, seq_length = 500, loss = 0.86534878, gradnorm = 3.5756e+03. Time Elapsed: 3251 seconds
decayed learning rate by a factor 0.97 to 0.017174680514
iteration 4800/17090000, seq_length = 500, loss = 1.24005108, gradnorm = 4.3706e+03. Time Elapsed: 3232 seconds
iteration 4850/17090000, seq_length = 500, loss = 1.22130984, gradnorm = 5.6758e+03. Time Elapsed: 3117 seconds
iteration 4900/17090000, seq_length = 500, loss = 6.12171381, gradnorm = 9.2302e+03. Time Elapsed: 3232 seconds
iteration 4950/17090000, seq_length = 500, loss = 11.80134205, gradnorm = 9.0186e+03. Time Elapsed: 3029 seconds
decayed learning rate by a factor 0.97 to 0.01665944009858
iteration 5000/17090000, seq_length = 500, loss = 17.11424646, gradnorm = 6.3805e+03. Time Elapsed: 3075 seconds
You can see the loss going down, and then it starts going up again slowly, which isn't totally unusual. But then it quickly spikes and never recovers! We didn't see any "infinity" values in this run, but the same curious sudden change in loss is visible. I wouldn't be surprised if there actually was an infinity in one of the iterations in between, where we don't record results.
Does anyone have any insight into what might be happening? I haven't ever seen something like this when using Adagrad - only with the models that we train using rmsprop.
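Not a diagnosis, but a common mitigation for spikes like the gradnorm jumps in these logs is to clip the gradient norm inside feval before the rmsprop update, so that one huge gradient cannot corrupt the running mean-square statistics. A minimal Python sketch of clip-by-norm:

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale the gradient so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return grad

g = [3000.0, 4000.0]              # norm 5000, like the spikes in the log
clipped = clip_by_norm(g, 10.0)
clipped_norm = math.sqrt(sum(x * x for x in clipped))
print(clipped_norm)               # ~10.0, direction preserved
```

Clipping preserves the gradient direction while bounding the step, which usually tames these one-off explosions without affecting normal iterations.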
Hi there,
I'm using Torch to implement the TransE model. I need to use a SplitTable, and I got an error like the one reported in torch/nn#568. I've tried to redefine the updateGradInput method, but using optim I get the following error in the backward pass:
/opt/torch/install/bin/luajit: /opt/torch/install/share/lua/5.1/optim/sgd.lua:82: inconsistent tensor size at /opt/torch/pkg/torch/lib/TH/generic/THTensorMath.c:500
stack traceback:
[C]: in function 'add'
/opt/torch/install/share/lua/5.1/optim/sgd.lua:82: in function 'optim_method'
I've tried training my model with the on-the-fly training procedure and it works. So I think the problem is related to optim; maybe it is not able to manage the weird case of the undefined updateGradInput method in SplitTable.
Thank you for your help.
Alessandro
Is there a way to have multiple plots in one figure? One plot works fine, but when I do something like
logger:add{['training loss'] = loss1 }; logger:add{['test loss'] = loss2}
it gives an error. So I define two loggers, but the disadvantage is that they produce two figures instead of one.
If Torch is installed with Lua 5.2, ConfusionMatrix won't work because there is no math.log10 in Lua 5.2:
https://github.com/torch/optim/blob/master/ConfusionMatrix.lua#L197
While training an autoencoder for mnist that uses optim.lbfgs, my program crashes at line 215 of polyinterp.lua with the error
bad argument #2 to '?' (too many indices provided at ~/torch/pkg/torch/generic/Tensor.c:894)
On inspection, I found that the cp variable is a one-dimensional tensor, but the program expects it to be two-dimensional.
require 'optim'
conf = optim.ConfusionMatrix(3)
conf:add(1,3)
conf:add(2,2)
conf:add(3,1)
conf:sensitivity() -- or conf:specificity()
Here's a pull request that fixes it: #28
I'm currently working on a project where two modules need to be optimized, and the two modules are somewhat related to each other. I'm wondering if it is possible to optimize them together using optim. For example, could I write a feval function whose input is a table of parameters, { paramFromModule1, paramFromModule2 }, and which returns a table of grads, { gradsFromModule1, gradsFromModule2 }?
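As far as I know, optim's first-order methods expect a single flat parameter tensor and a feval returning a single flat gradient, so tables won't work directly; the usual workaround is to concatenate both modules' parameters into one vector and split it inside feval. An illustrative NumPy sketch with two toy "modules" (quadratic losses standing in for the real networks):

```python
import numpy as np

p1, p2 = np.array([1.0, 2.0]), np.array([3.0])
x = np.concatenate([p1, p2])       # joint parameter vector for one optimizer

def feval(x):
    a, b = x[:2], x[2:]            # slices for the two modules' parameters
    loss = float((a ** 2).sum() + (b ** 2).sum())
    grad = np.concatenate([2 * a, 2 * b])   # gradients, concatenated the same way
    return loss, grad

loss, grad = feval(x)
x -= 0.1 * grad                    # one joint sgd-style step over both modules
print(loss, x)
```

A single optimizer step then updates both modules consistently, which is what joint optimization requires.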
I noticed the current configuration for plotting the confusion matrix is based on qt
Is it possible to plot the same in iTorch?
How does this come about?
Each function should be documented in the README, instead of just inline. There should also be a link for each optimization function to the original paper.
In the function optim.sgd the learning rate decay is implemented this way:
line 71: local clr = lr / (1 + nevals*lrd)
Here nevals is equal to state.evalCounter. However, evalCounter is increased on line 85 every time the optim method is called during training, i.e. on every mini-batch update. So nevals does not contain the number of iterations (epochs), but the number of mini-batch updates.
Is that a bug or your intention?
Thanks,
Petr
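For reference, here is the cited formula in a runnable form, showing that with per-call increments the rate after one epoch of 100 mini-batches has already decayed substantially (the numbers are made up for illustration):

```python
def effective_lr(lr, lrd, nevals):
    # line 71 of optim/sgd.lua: clr = lr / (1 + nevals * lrd)
    return lr / (1 + nevals * lrd)

lr, lrd = 0.1, 1e-2
batches_per_epoch = 100
after_one_iteration = effective_lr(lr, lrd, 1)                 # 1 call
after_one_epoch = effective_lr(lr, lrd, batches_per_epoch)     # 100 calls
print(after_one_iteration, after_one_epoch)
```

If the intent were per-epoch decay, the rate after one epoch would be 0.1 / 1.01, not 0.1 / 2, so the distinction matters a lot for typical lrd values.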
I have installed optim, but it could not be found.
lua: ./Network.lua:1: module 'optim' not found:
	no field package.preload['optim']
	no file '/usr/local/share/lua/5.2/optim.lua'
	no file '/usr/local/share/lua/5.2/optim/init.lua'
	no file '/usr/local/lib/lua/5.2/optim.lua'
	no file '/usr/local/lib/lua/5.2/optim/init.lua'
	no file '/usr/share/lua/5.2/optim.lua'
	no file '/usr/share/lua/5.2/optim/init.lua'
	no file './optim.lua'
	no file '/usr/local/lib/lua/5.2/optim.so'
	no file '/usr/lib/x86_64-linux-gnu/lua/5.2/optim.so'
	no file '/usr/lib/lua/5.2/optim.so'
	no file '/usr/local/lib/lua/5.2/loadall.so'
	no file './optim.so'
stack traceback:
	[C]: in function 'require'
	./Network.lua:1: in main chunk
	[C]: in function 'require'
	./AN4CTCTest.lua:4: in main chunk
	[C]: in ?
The installation log follows:
sherrie@sherrie-PC:~/CTCSR$ luarocks install optim
Installing https://raw.githubusercontent.com/torch/rocks/master/optim-1.0.5-0.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/optim-1.0.5-0.rockspec... switching to 'build' mode
Cloning into 'optim'...
remote: Counting objects: 50, done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 50 (delta 10), reused 22 (delta 6), pack-reused 0
Receiving objects: 100% (50/50), 40.67 KiB | 0 bytes/s, done.
Resolving deltas: 100% (10/10), done.
Checking connectivity... done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/sherrie/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0" && make
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /home/sherrie/torch/install
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_optim-1.0.5-0-9983/optim/build
cd build && make install
Install the project...
-- Install configuration: "Release"
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/checkgrad.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/adagrad.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/adadelta.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/polyinterp.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/rmsprop.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/lswolfe.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/nag.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/adam.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/rprop.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/init.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/cg.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/sgd.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/fista.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/adamax.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/asgd.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/lbfgs.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/Logger.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/ConfusionMatrix.lua
-- Installing: /home/sherrie/torch/install/lib/luarocks/rocks/optim/1.0.5-0/lua/optim/cmaes.lua
Updating manifest for /home/sherrie/torch/install/lib/luarocks/rocks
optim 1.0.5-0 is now built and installed in /home/sherrie/torch/install/ (license: BSD)
Could you help me with this error?
Hi guys,
Does the optim module use cuBLAS for speedup?
I'm running into a bug that appears and disappears for no apparent reason when using ConfusionMatrix.batchAdd. See these two consecutive runs:
COMMND asb ~ git mnist src th train.lua --printstep 20 --skiplog --cuda
72 of 45000 training records will be unused per epoch.
24 of 15000 validation records will be unused per epoch.
[2016-10-14 18:40:09] Finished epoch = 1, batch = 20, with loss = 1.1431102752686.
[2016-10-14 18:40:10] Finished epoch = 1, batch = 40, with loss = 0.98379397392273.
[2016-10-14 18:40:10] Finished epoch = 1, batch = 60, with loss = 0.69640064239502.
[2016-10-14 18:40:11] Finished epoch = 1, batch = 80, with loss = 0.53388464450836.
[2016-10-14 18:40:11] Finished epoch = 1, batch = 100, with loss = 0.42102938890457.
[2016-10-14 18:40:12] Finished epoch = 1, batch = 120, with loss = 0.69019424915314.
[2016-10-14 18:40:13] Finished epoch = 1, batch = 140, with loss = 0.28126338124275.
[2016-10-14 18:40:13] Finished epoch = 1, batch = 160, with loss = 0.31771036982536.
[2016-10-14 18:40:14] Finished epoch = 1, batch = 180, with loss = 0.36902123689651.
[2016-10-14 18:40:15] Finished epoch = 1, batch = 200, with loss = 0.15535597503185.
[2016-10-14 18:40:15] Finished epoch = 1, batch = 220, with loss = 0.26898837089539.
[2016-10-14 18:40:16] Finished epoch = 1, batch = 240, with loss = 0.2337928712368.
[2016-10-14 18:40:16] Finished epoch = 1, batch = 260, with loss = 0.19574552774429.
[2016-10-14 18:40:17] Finished epoch = 1, batch = 280, with loss = 0.37691986560822.
[2016-10-14 18:40:18] Finished epoch = 1, batch = 300, with loss = 0.27491936087608.
[2016-10-14 18:40:18] Finished epoch = 1, batch = 320, with loss = 0.36371386051178.
[2016-10-14 18:40:19] Finished epoch = 1, batch = 340, with loss = 0.15922805666924.
/home/asb/torch/install/bin/luajit: ...sb/torch/install/share/lua/5.1/optim/ConfusionMatrix.lua:117: bad argument #1 to
'indexAdd' (out of range at /home/asb/torch/pkg/torch/lib/TH/generic/THTensor.c:729)
stack traceback:
[C]: in function 'indexAdd'
...sb/torch/install/share/lua/5.1/optim/ConfusionMatrix.lua:117: in function 'batchAdd'
train.lua:153: in main chunk
[C]: in function 'dofile'
.../asb/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405b80
COMMND asb ~ git mnist src th train.lua --printstep 20 --skiplog --cuda master
72 of 45000 training records will be unused per epoch.
24 of 15000 validation records will be unused per epoch.
[2016-10-14 18:43:17] Finished epoch = 1, batch = 20, with loss = 2.0140626430511.
[2016-10-14 18:43:18] Finished epoch = 1, batch = 40, with loss = 0.97827231884003.
[2016-10-14 18:43:18] Finished epoch = 1, batch = 60, with loss = 0.62330090999603.
[2016-10-14 18:43:19] Finished epoch = 1, batch = 80, with loss = 0.73870342969894.
[2016-10-14 18:43:19] Finished epoch = 1, batch = 100, with loss = 0.61164426803589.
[2016-10-14 18:43:20] Finished epoch = 1, batch = 120, with loss = 0.40717771649361.
[2016-10-14 18:43:21] Finished epoch = 1, batch = 140, with loss = 0.46196541190147.
[2016-10-14 18:43:21] Finished epoch = 1, batch = 160, with loss = 0.7626816034317.
[2016-10-14 18:43:22] Finished epoch = 1, batch = 180, with loss = 0.42969378829002.
[2016-10-14 18:43:23] Finished epoch = 1, batch = 200, with loss = 0.42102152109146.
[2016-10-14 18:43:23] Finished epoch = 1, batch = 220, with loss = 0.34528177976608.
[2016-10-14 18:43:24] Finished epoch = 1, batch = 240, with loss = 0.32393988966942.
[2016-10-14 18:43:24] Finished epoch = 1, batch = 260, with loss = 0.25361078977585.
[2016-10-14 18:43:25] Finished epoch = 1, batch = 280, with loss = 0.35111820697784.
[2016-10-14 18:43:26] Finished epoch = 1, batch = 300, with loss = 0.35840207338333.
[2016-10-14 18:43:26] Finished epoch = 1, batch = 320, with loss = 0.19336950778961.
[2016-10-14 18:43:27] Finished epoch = 1, batch = 340, with loss = 0.23242954909801.
Total accuracy of classifier at completion of epoch 1 = 92.062784433365.
Mean accuracy across classes at completion of epoch 1 = 92.140758547009.
[2016-10-14 18:43:29] Finished epoch = 2, batch = 20, with loss = 0.45858466625214.
[2016-10-14 18:43:29] Finished epoch = 2, batch = 40, with loss = 0.22427660226822.
[2016-10-14 18:43:30] Finished epoch = 2, batch = 60, with loss = 0.2953850030899.
[2016-10-14 18:43:31] Finished epoch = 2, batch = 80, with loss = 0.2055009752512.
The first run fails while the second succeeds, with no changes whatsoever in between.
Moreover, argument #1 of indexAdd
, the one reported in the stack trace, is hard-coded to the value 1 in ConfusionMatrix.lua, so I am not sure how user code could even affect it.
My code is available here for reference.
Any ideas on how to debug this?
Thanks.
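A sketch that might help narrow this down: the out-of-range error from indexAdd suggests that on that particular batch some target (or prediction) index falls outside [1, nclasses], so a guard just before the batchAdd call could catch the offending batch. The names confusion, predictions, targets and nclasses are assumptions about the training loop, not the actual code:

```lua
-- Hypothetical guard before ConfusionMatrix:batchAdd; aborts with the
-- offending index range instead of the opaque indexAdd error.
local lo, hi = targets:min(), targets:max()
if lo < 1 or hi > nclasses then
   error(string.format('bad targets: min=%d, max=%d, expected 1..%d',
                       lo, hi, nclasses))
end
confusion:batchAdd(predictions, targets)
```

If the guard fires only on some runs, that would point at the data pipeline (e.g. label encoding or shuffling) rather than at ConfusionMatrix itself.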
Could I request a test problem for the Adam optimizer, just to understand how it works better? Thanks 👍
I tried to use the new Adam optimizer on the Rosenbrock test problem in optim
(the one used for adagrad), but it doesn't seem to work. I can't get it to converge with a range of different config
parameters, either on Rosenbrock or on the ML problem I'm working on, simple copy tasks using a neural Turing machine.
I expect I've misunderstood the Adam paper and I'm doing something wrong or really dumb. Does the objective necessarily have to be stochastic for Adam to be applicable?
If so, I expect it wouldn't work for Rosenbrock, nor for the LSTM used in the neural Turing machine, both of which are deterministic.
My failed Rosenbrock attempt:
require 'torch'
require 'optim'
require 'rosenbrock'
require 'l2'

x = torch.Tensor(2):fill(0)
fx = {}
config_adagrad = {learningRate = 1e-1}
config_adam = {
   learningRate = 1e-6,
   beta1 = 0.01,
   beta2 = 0.001
}
for i = 1, 10001 do
   -- x, f = optim.adagrad(rosenbrock, x, config_adagrad)
   x, f = optim.adam(rosenbrock, x, config_adam)
   if (i - 1) % 1000 == 0 then
      fx[i] = f[1]
   end
end
print()
print('Rosenbrock test')
print()
print('x='); print(x)
print('fx=')
print(fx)
OUTPUT
Rosenbrock test
x=
0.01 *
2.0243
0.0523
[torch.DoubleTensor of dimension 2]
fx=
1 1
1001 0.96578919291079
2001 0.96476690379526
3001 0.96406793828219
4001 0.96344671327807
5001 0.96285032358036
6001 0.96226242754115
7001 0.96167791102736
8001 0.96111328576091
9001 0.96050921334115
10001 0.9599246681675
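For what it's worth, the hyper-parameters above are far from the defaults suggested in the Adam paper (beta1 = 0.9, beta2 = 0.999), and a step size of 1e-6 is tiny for a 2-D problem, which alone could explain the near-flat loss. A sketch of the same loop with values closer to the paper's defaults (the exact learning rate here is a guess and may need tuning):

```lua
require 'torch'
require 'optim'
require 'rosenbrock'  -- from optim's test directory

x = torch.Tensor(2):fill(0)
-- defaults from Kingma & Ba (2015); only the step size is picked by hand
config_adam = {learningRate = 1e-2, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8}
for i = 1, 10001 do
   x, f = optim.adam(rosenbrock, x, config_adam)
end
print(x)  -- the Rosenbrock minimum is at (1, 1)
```

Nothing in the Adam update requires the objective to be stochastic; it should work on a deterministic function like Rosenbrock as well.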
As far as I understand, the point of weight decay is to keep the weights from growing too large in absolute value.
Shouldn't weight decay then be implemented with absolute values, to avoid negative weights growing in magnitude? Otherwise, if we have a large negative weight, the decay term will push it to become even larger in magnitude.
Current implementation:
dfdx:add(wd, x)
How I think it should be:
dfdx:add(wd, torch.abs(x))
This applies to the weight decay in both adagrad and sgd.
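For reference, here is a minimal scalar sketch of how the decay term enters a plain SGD step, assuming the usual order of operations (the real code works on tensors; names here are illustrative only):

```lua
-- Scalar analogue of:
--   dfdx:add(wd, x)     -- dfdx = dfdx + wd * x
--   x:add(-lr, dfdx)    -- x    = x - lr * dfdx
local lr, wd = 0.1, 0.01

local function sgd_step(x, dfdx)
   dfdx = dfdx + wd * x
   return x - lr * dfdx
end

-- With a zero loss gradient, the decay term alone moves a weight of
-- either sign toward zero: x becomes (1 - lr * wd) * x.
print(sgd_step( 2.0, 0.0))  -- 1.998
print(sgd_step(-2.0, 0.0))  -- -1.998
```

Since the decay term carries the sign of x itself, it is worth checking with a small example like this whether the current behavior is already the intended one.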