
Comments (17)

bigaidream commented on August 26, 2024

Thanks for your interest! We were not aware of torch-autograd until we finished the project.
In fact, my friend @MartinGaoR and I started working on the GPU code last week. We just finished learning Lua and Torch and are still playing around with the torch-autograd code.

We will let you know about any difficulties we run into while working on it. Thanks.

bigaidream commented on August 26, 2024

Hi @alexbw, I just finished a very early version of the DrMAD method based on torch-autograd: https://github.com/bigaidream-projects/drmad/tree/master/hypergrad_lua
One problem on the torch-autograd side is that torch.dot() is not yet supported.

alexbw commented on August 26, 2024

I can add torch.dot support easily, hopefully next week. We tend not to use that function regularly because it silently flattens matrices, which has led to bugs that are very difficult to find.

alexbw commented on August 26, 2024

Actually, I would recommend you just use regular matrix multiplies to implement the dot product.

th> a = torch.randn(10)
                                                                      [0.0025s]
th> a
 0.3137
 0.5367
 0.1144
 0.6870
-0.7452
 0.4552
 0.2735
-1.7104
-3.1059
 1.7010
[torch.DoubleTensor of size 10]

                                                                      [0.0022s]
th> torch.dot(a,a)
17.174257936419
                                                                      [0.0021s]
th> a*a
17.174257936419
                                                                      [0.0001s]

Using torch.dot can cause weird problems when you don't check the dimensionality of the inputs. For example:

th> b = torch.eye(3)
                                                                      [0.0001s]
th> b
 1  0  0
 0  1  0
 0  0  1
[torch.DoubleTensor of size 3x3]

                                                                      [0.0002s]
th> b*b
 1  0  0
 0  1  0
 0  0  1
[torch.DoubleTensor of size 3x3]

                                                                      [0.0002s]
th> torch.dot(b,b)
3
                                                                      [0.0001s]

Is there a reason I'm missing why torch.dot would be required?

bigaidream commented on August 26, 2024

In fact torch.dot() is not really required; I can use regular matrix multiplies.

Because the Python version of hypergrad uses dot(), and I found that torch-autograd does not support dot(), I assumed torch-autograd did not even support matrix multiplication.
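
For readers following along, here is a minimal sketch (not code from the repository) of what that replacement looks like inside a function differentiated by torch-autograd; the names dotLoss, params and v are illustrative, and it assumes the standard grad = require 'autograd' API:

local grad = require 'autograd'

-- dot(u, v) written without torch.dot: element-wise multiply, then sum.
-- For 1-D tensors this is the same value as the a*a example above.
local function dotLoss(params, v)
   return torch.sum(torch.cmul(params.u, v))
end

local dDotLoss = grad(dotLoss)

local params = { u = torch.randn(10) }
local v = torch.randn(10)
local grads, value = dDotLoss(params, v)
-- grads.u equals v, the usual gradient of a dot product with respect to u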

bigaidream commented on August 26, 2024

@alexbw, could you take a look at https://github.com/bigaidream-projects/drmad/blob/master/hypergrad_lua/drmad_mnist.lua#L221?
For some reason, torch-autograd does not seem to support this operation.

alexbw commented on August 26, 2024

Try not closing over the DV variable; pass it into the function as an argument instead.

bigaidream commented on August 26, 2024

It can now tune learning rates and L2 penalties on the GPU. We will do some refactoring later.

taineleau-zz commented on August 26, 2024

Hi @alexbw,
@bigaidream and I are refactoring DrMAD to support VGG/ResNet on CIFAR/ImageNet.

Could you please have a look at https://github.com/bigaidream-projects/drmad/blob/master/CIFAR10/cifar10_L2.lua#L85?

This line incurs a weird error that I have no idea how to fix:

==> loading data
    completed!
==> configuring model
/home/taineleau/torch/install/bin/luajit: .../install/share/lua/5.1/autograd/runtime/codegen/Node.lua:23: constant tensor with more than one dimension. is this an upvalue that should be a function argument?
stack traceback:
    [C]: in function 'error'
    .../install/share/lua/5.1/autograd/runtime/codegen/Node.lua:23: in function 'init'
    .../install/share/lua/5.1/autograd/runtime/codegen/Node.lua:11: in function 'new'
    ...install/share/lua/5.1/autograd/runtime/codegen/Graph.lua:24: in function 'cmul'
    cifar10_L2.lua:85: in function 'L2_norm'
    cifar10_L2.lua:118: in function 'fn'
    ...install/share/lua/5.1/autograd/runtime/codegen/Graph.lua:353: in function 'protectedFn'
    ...install/share/lua/5.1/autograd/runtime/codegen/Graph.lua:383: in function 'record'
    .../install/share/lua/5.1/autograd/runtime/codegen/init.lua:44: in function 'generateFn'
    .../install/share/lua/5.1/autograd/runtime/codegen/init.lua:140: in function 'dfTrain'
    cifar10_L2.lua:129: in function 'init'
    cifar10_L2.lua:205: in main chunk
    [C]: in function 'dofile'
    ...leau/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

But if we remove the torch.cmul() operation, as at https://github.com/bigaidream-projects/drmad/blob/master/CIFAR10/cifar10_L2.lua#L86, it runs fine.

alexbw commented on August 26, 2024

You're using params_L2 as an upvalue. Try passing it into the function as an argument.
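
For readers following along, a minimal sketch (not the actual drmad code) of the pattern being described here: torch-autograd's code generator rejects a multi-dimensional constant tensor captured as an upvalue (the "constant tensor with more than one dimension" error above), so the tensor has to travel through the function's argument list instead. The names L2_norm, params and params_L2 are illustrative.

local grad = require 'autograd'

-- Broken pattern (sketch): params_L2 captured as an upvalue triggers the
-- error shown in the stack trace above.
--
--   local params_L2 = torch.ones(10, 5) * 1e-3
--   local function L2_norm(params)
--      return torch.sum(torch.cmul(torch.cmul(params.w, params.w), params_L2))
--   end

-- Fixed pattern: params_L2 is passed in as a function argument.
local function L2_norm(params, params_L2)
   return torch.sum(torch.cmul(torch.cmul(params.w, params.w), params_L2))
end

local dL2 = grad(L2_norm)

local params    = { w = torch.randn(10, 5) }
local params_L2 = torch.ones(10, 5) * 1e-3
local grads, value = dL2(params, params_L2)
-- value is the weighted squared L2 norm of params.w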

taineleau-zz commented on August 26, 2024

@alexbw Thanks for your reply :) It works!

I somehow misunderstood the error message, so I first tried passing it in as part of params (commit eb38963), but got the same result.

Just now I figured out that it was because the function L2_norm() (commit 223dc19) used an upvalue. Now it works.

Sorry for making such a silly mistake.

alexbw commented on August 26, 2024

No worries :)

taineleau-zz commented on August 26, 2024

@alexbw Hi, I ran into a weird problem.

The loss drastically increases when using autograd's grad.functionalize, while the original version is steady and never larger than 5 (see the logs below).

I train them with the same script and only a few lines differ; see: autograd and original.

As a side note:

- I define the function dfTrain between L188-L200.
- sgd_m() is a function modified from optim.sgd(); the only difference is that I use a for-loop over gradParameters, since the return value of grad is a table (a sketch of this pattern follows below).
- This training script is based on cifar.torch.
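
A minimal sketch of that for-loop idea (not the actual sgd_m code), assuming params and grads are tables of tensors with matching keys, as torch-autograd returns them:

-- one plain SGD step over a table of gradients
local function sgd_step(params, grads, learningRate)
   for k, g in pairs(grads) do
      if torch.isTensor(g) then
         params[k]:add(-learningRate, g)       -- p := p - lr * g
      else
         sgd_step(params[k], g, learningRate)  -- recurse into nested tables
      end
   end
   return params
end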

Loss log using autograd's functionalize:

loss: ==1144.3304443359................... 81/390 ......................................] ETA: 8m42s | Step: 1s691ms
loss: ==935.63635253906................... 82/390 ......................................] ETA: 8m41s | Step: 1s691ms
loss: ==802.60095214844................... 83/390 ......................................] ETA: 8m39s | Step: 1s691ms
loss: ==1145.2346191406................... 84/390 ......................................] ETA: 8m37s | Step: 1s691ms
loss: ==1425.2674560547................... 85/390 ......................................] ETA: 8m35s | Step: 1s691ms
loss: ==796.73828125...................... 86/390 ......................................] ETA: 8m34s | Step: 1s691ms
loss: ==890.10321044922................... 87/390 ......................................] ETA: 8m32s | Step: 1s691ms
loss: ==2512.7094726562................... 88/390 ......................................] ETA: 8m30s | Step: 1s691ms
loss: ==2077.421875=>..................... 89/390 ......................................] ETA: 8m29s | Step: 1s691ms

Loss log using the original one:

loss: ==2.3497157096863................... 81/390 ......................................] ETA: 8m38s | Step: 1s677ms
loss: ==2.467734336853.................... 82/390 ......................................] ETA: 8m36s | Step: 1s677ms
loss: ==2.2728099822998................... 83/390 ......................................] ETA: 8m35s | Step: 1s677ms
loss: ==2.4845409393311................... 84/390 ......................................] ETA: 8m33s | Step: 1s677ms
loss: ==2.3753600120544................... 85/390 ......................................] ETA: 8m31s | Step: 1s677ms
loss: ==2.4003090858459................... 86/390 ......................................] ETA: 8m29s | Step: 1s677ms
loss: ==2.2653262615204................... 87/390 ......................................] ETA: 8m28s | Step: 1s677ms
loss: ==2.3047337532043................... 88/390 ......................................] ETA: 8m26s | Step: 1s677ms
loss: ==2.3490085601807................... 89/390 ......................................] ETA: 8m24s | Step: 1s677ms

taineleau-zz commented on August 26, 2024

@alexbw some updates for this issue.

I use functionalize() to obtain the gradients on ResNet (fb.resnet.torch), and it works well.

Comparing VGG (cifar.torch's vgg_bn_drop.lua) and ResNet (fb.resnet.torch's resnet.lua), I find that VGG uses nn.Dropout() while ResNet does not. Could that be the reason why functionalize() does not work on VGG?
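
For context, here is a minimal sketch of the functionalize() workflow being discussed, assuming the autograd.functionalize(model) API that returns a forward function plus a parameter table; the tiny model and the squared-error loss are only stand-ins for VGG/ResNet and the real criterion:

local grad = require 'autograd'
local nn = require 'nn'

-- a tiny stand-in model
local model = nn.Sequential()
model:add(nn.Linear(10, 4))
model:add(nn.Tanh())

-- turn the nn module into a pure function of (params, input)
local modelf, params = grad.functionalize(model)

local function f(params, x, y)
   local prediction = modelf(params, x)
   local err = prediction - y
   return torch.sum(torch.cmul(err, err))   -- squared error
end

local df = grad(f)
local grads, loss = df(params, torch.randn(10), torch.randn(4))
-- grads mirrors the structure of params, one gradient tensor per parameter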

alexbw commented on August 26, 2024

I don't see any obvious problems with VGG. All of your closures are used just for constructing the network, and then you return just the network. I would try a "binary search", splitting your network into halves, progressively, and see where the problem is.
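
A minimal sketch of that debugging idea (the helper name firstLayers is made up): keep only a prefix of the model's layers, run the failing step on that prefix, and halve the suspect range until the offending layer is isolated:

require 'nn'

-- keep only the first k layers of an nn.Sequential model
local function firstLayers(model, k)
   local sub = nn.Sequential()
   for i = 1, k do
      sub:add(model:get(i))
   end
   return sub
end

-- e.g. start with k = math.floor(#model.modules / 2) and narrow down from there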


taineleau-zz commented on August 26, 2024

@alexbw some updates.

FYI:

I managed to fix the issue above by putting vgg.lua (exactly the same file as in cifar.torch) under the training script of fb.resnet.torch, and magically the exploding-loss problem is fixed.

At least we are now sure that vgg_bn_drop.lua works well in autograd, and I am able to continue my experiments, though I still don't know why the training script of cifar.torch is not compatible with autograd.

taineleau-zz commented on August 26, 2024

Torch-autograd does not support Hessian-vector products for now. We are using Theano instead.
