
Comments (17)

bigaidream commented on August 26, 2024

Thanks for your interest! We were not aware of torch-autograd until we finished the project.
In fact, my friend @MartinGaoR and I started working on the GPU code last week. We just finished learning Lua and Torch and are still playing around with the torch-autograd code.

We will let you know about any difficulties we run into while working on it. Thanks.

bigaidream commented on August 26, 2024

Hi @alexbw, I just finished a very early version of the DrMAD method based on torch-autograd: https://github.com/bigaidream-projects/drmad/tree/master/hypergrad_lua
One problem on the torch-autograd side is that torch.dot() is not yet supported.

alexbw commented on August 26, 2024

I can add torch.dot support easily, hopefully next week. We tend not to use that function regularly because it silently flattens matrices, which has led to bugs that are very difficult to find.

alexbw commented on August 26, 2024

Actually, I would recommend you just use regular matrix multiplies to implement the dot product.

th> a = torch.randn(10)
                                                                      [0.0025s]
th> a
 0.3137
 0.5367
 0.1144
 0.6870
-0.7452
 0.4552
 0.2735
-1.7104
-3.1059
 1.7010
[torch.DoubleTensor of size 10]

                                                                      [0.0022s]
th> torch.dot(a,a)
17.174257936419
                                                                      [0.0021s]
th> a*a
17.174257936419
                                                                      [0.0001s]

Using torch.dot can cause weird problems when you don't check the dimensionality of the inputs. For example:

th> b = torch.eye(3)
                                                                      [0.0001s]
th> b
 1  0  0
 0  1  0
 0  0  1
[torch.DoubleTensor of size 3x3]

                                                                      [0.0002s]
th> b*b
 1  0  0
 0  1  0
 0  0  1
[torch.DoubleTensor of size 3x3]

                                                                      [0.0002s]
th> torch.dot(b,b)
3
                                                                      [0.0001s]

Is there a reason I'm missing why torch.dot would be required?

bigaidream commented on August 26, 2024

In fact torch.dot() is not really required; I can use regular matrix multiplies.

Because the Python version of hypergrad uses dot(), and I found that torch-autograd does not support dot(), I assumed torch-autograd did not even support matrix multiplication.
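
For readers following along, here is a minimal sketch (not code from the repository) of what that replacement looks like inside a function differentiated by torch-autograd; the names dotLoss, params and v are illustrative, and it assumes the standard grad = require 'autograd' API:

local grad = require 'autograd'

-- dot(u, v) written without torch.dot: element-wise multiply, then sum.
-- For 1-D tensors this is the same value as the a*a example above.
local function dotLoss(params, v)
   return torch.sum(torch.cmul(params.u, v))
end

local dDotLoss = grad(dotLoss)

local params = { u = torch.randn(10) }
local v = torch.randn(10)
local grads, value = dDotLoss(params, v)
-- grads.u equals v, the usual gradient of a dot product with respect to u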

bigaidream commented on August 26, 2024

@alexbw, could you take a look at https://github.com/bigaidream-projects/drmad/blob/master/hypergrad_lua/drmad_mnist.lua#L221?
For some reason, torch-autograd does not seem to support this operation.

alexbw commented on August 26, 2024

Try not closing over the DV variable; pass it into the function as an argument instead.

bigaidream commented on August 26, 2024

It can now tune learning rates and L2 penalties on the GPU. We will do some refactoring later.

taineleau-zz commented on August 26, 2024

Hi @alexbw,
@bigaidream and I are refactoring DrMAD to support VGG/ResNet on CIFAR/ImageNet.

Could you please have a look at https://github.com/bigaidream-projects/drmad/blob/master/CIFAR10/cifar10_L2.lua#L85?

This line incurs a weird error that I have no idea how to fix:

==> loading data
    completed!
==> configuring model
/home/taineleau/torch/install/bin/luajit: .../install/share/lua/5.1/autograd/runtime/codegen/Node.lua:23: constant tensor with more than one dimension. is this an upvalue that should be a function argument?
stack traceback:
    [C]: in function 'error'
    .../install/share/lua/5.1/autograd/runtime/codegen/Node.lua:23: in function 'init'
    .../install/share/lua/5.1/autograd/runtime/codegen/Node.lua:11: in function 'new'
    ...install/share/lua/5.1/autograd/runtime/codegen/Graph.lua:24: in function 'cmul'
    cifar10_L2.lua:85: in function 'L2_norm'
    cifar10_L2.lua:118: in function 'fn'
    ...install/share/lua/5.1/autograd/runtime/codegen/Graph.lua:353: in function 'protectedFn'
    ...install/share/lua/5.1/autograd/runtime/codegen/Graph.lua:383: in function 'record'
    .../install/share/lua/5.1/autograd/runtime/codegen/init.lua:44: in function 'generateFn'
    .../install/share/lua/5.1/autograd/runtime/codegen/init.lua:140: in function 'dfTrain'
    cifar10_L2.lua:129: in function 'init'
    cifar10_L2.lua:205: in main chunk
    [C]: in function 'dofile'
    ...leau/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

But if we remove the torch.cmul() operation, as at https://github.com/bigaidream-projects/drmad/blob/master/CIFAR10/cifar10_L2.lua#L86, it runs fine.

alexbw commented on August 26, 2024

You're using params_L2 as an upvalue. Try passing it into the function as an argument.
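
For readers following along, a minimal sketch (not the actual drmad code) of the pattern being described here: torch-autograd's code generator rejects a multi-dimensional constant tensor captured as an upvalue (the "constant tensor with more than one dimension" error above), so the tensor has to travel through the function's argument list instead. The names L2_norm, params and params_L2 are illustrative.

local grad = require 'autograd'

-- Broken pattern (sketch): params_L2 captured as an upvalue triggers the
-- error shown in the stack trace above.
--
--   local params_L2 = torch.ones(10, 5) * 1e-3
--   local function L2_norm(params)
--      return torch.sum(torch.cmul(torch.cmul(params.w, params.w), params_L2))
--   end

-- Fixed pattern: params_L2 is passed in as a function argument.
local function L2_norm(params, params_L2)
   return torch.sum(torch.cmul(torch.cmul(params.w, params.w), params_L2))
end

local dL2 = grad(L2_norm)

local params    = { w = torch.randn(10, 5) }
local params_L2 = torch.ones(10, 5) * 1e-3
local grads, value = dL2(params, params_L2)
-- value is the weighted squared L2 norm of params.w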

taineleau-zz commented on August 26, 2024

@alexbw Thanks for your reply :) It works!

I somehow misunderstood the error message, so I first tried passing it in as part of params (commit eb38963), but got the same result.

Just now I figured out that it was because the function L2_norm() (commit 223dc19) used an upvalue. Now it works.

Sorry for making such a silly mistake.

alexbw commented on August 26, 2024

No worries :)

taineleau-zz commented on August 26, 2024

@alexbw Hi, I ran into a weird problem.

The loss drastically increases when using autograd's grad.functionalize, while the original version is steady and never larger than 5 (see the logs below).

I train them with the same script and only a few lines differ; see: autograd and original.

As a side note:

- I define the function dfTrain between L188-L200.
- sgd_m() is a function modified from optim.sgd(); the only difference is that I use a for-loop over gradParameters, since the return value of grad is a table (a sketch of this pattern follows below).
- This training script is based on cifar.torch.
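
A minimal sketch of that for-loop idea (not the actual sgd_m code), assuming params and grads are tables of tensors with matching keys, as torch-autograd returns them:

-- one plain SGD step over a table of gradients
local function sgd_step(params, grads, learningRate)
   for k, g in pairs(grads) do
      if torch.isTensor(g) then
         params[k]:add(-learningRate, g)       -- p := p - lr * g
      else
         sgd_step(params[k], g, learningRate)  -- recurse into nested tables
      end
   end
   return params
end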

Loss log using autograd's functionalize:

loss: ==1144.3304443359................... 81/390 ......................................] ETA: 8m42s | Step: 1s691ms
loss: ==935.63635253906................... 82/390 ......................................] ETA: 8m41s | Step: 1s691ms
loss: ==802.60095214844................... 83/390 ......................................] ETA: 8m39s | Step: 1s691ms
loss: ==1145.2346191406................... 84/390 ......................................] ETA: 8m37s | Step: 1s691ms
loss: ==1425.2674560547................... 85/390 ......................................] ETA: 8m35s | Step: 1s691ms
loss: ==796.73828125...................... 86/390 ......................................] ETA: 8m34s | Step: 1s691ms
loss: ==890.10321044922................... 87/390 ......................................] ETA: 8m32s | Step: 1s691ms
loss: ==2512.7094726562................... 88/390 ......................................] ETA: 8m30s | Step: 1s691ms
loss: ==2077.421875=>..................... 89/390 ......................................] ETA: 8m29s | Step: 1s691ms

Loss log using the original one:

loss: ==2.3497157096863................... 81/390 ......................................] ETA: 8m38s | Step: 1s677ms
loss: ==2.467734336853.................... 82/390 ......................................] ETA: 8m36s | Step: 1s677ms
loss: ==2.2728099822998................... 83/390 ......................................] ETA: 8m35s | Step: 1s677ms
loss: ==2.4845409393311................... 84/390 ......................................] ETA: 8m33s | Step: 1s677ms
loss: ==2.3753600120544................... 85/390 ......................................] ETA: 8m31s | Step: 1s677ms
loss: ==2.4003090858459................... 86/390 ......................................] ETA: 8m29s | Step: 1s677ms
loss: ==2.2653262615204................... 87/390 ......................................] ETA: 8m28s | Step: 1s677ms
loss: ==2.3047337532043................... 88/390 ......................................] ETA: 8m26s | Step: 1s677ms
loss: ==2.3490085601807................... 89/390 ......................................] ETA: 8m24s | Step: 1s677ms

taineleau-zz commented on August 26, 2024

@alexbw some updates for this issue.

I use functionalize() to obtain the gradients on ResNet (fb.resnet.torch), and it works well.

Comparing VGG (cifar.torch's vgg_bn_drop.lua) and ResNet (fb.resnet.torch's resnet.lua), I find that VGG uses nn.Dropout() while ResNet does not. Could that be the reason why functionalize() does not work on VGG?
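
For context, here is a minimal sketch of the functionalize() workflow being discussed, assuming the autograd.functionalize(model) API that returns a forward function plus a parameter table; the tiny model and the squared-error loss are only stand-ins for VGG/ResNet and the real criterion:

local grad = require 'autograd'
local nn = require 'nn'

-- a tiny stand-in model
local model = nn.Sequential()
model:add(nn.Linear(10, 4))
model:add(nn.Tanh())

-- turn the nn module into a pure function of (params, input)
local modelf, params = grad.functionalize(model)

local function f(params, x, y)
   local prediction = modelf(params, x)
   local err = prediction - y
   return torch.sum(torch.cmul(err, err))   -- squared error
end

local df = grad(f)
local grads, loss = df(params, torch.randn(10), torch.randn(4))
-- grads mirrors the structure of params, one gradient tensor per parameter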

alexbw commented on August 26, 2024

I don't see any obvious problems with VGG. All of your closures are used just for constructing the network, and then you return just the network. I would try a "binary search", splitting your network into halves, progressively, and see where the problem is.
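
A minimal sketch of that debugging idea (the helper name firstLayers is made up): keep only a prefix of the model's layers, run the failing step on that prefix, and halve the suspect range until the offending layer is isolated:

require 'nn'

-- keep only the first k layers of an nn.Sequential model
local function firstLayers(model, k)
   local sub = nn.Sequential()
   for i = 1, k do
      sub:add(model:get(i))
   end
   return sub
end

-- e.g. start with k = math.floor(#model.modules / 2) and narrow down from there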


taineleau-zz commented on August 26, 2024

@alexbw some updates.

FYI:

I managed to fix the issue above by putting vgg.lua (exactly the same file as in cifar.torch) under the training script of fb.resnet.torch, and magically the exploding-loss problem is fixed.

At least we are now sure that vgg_bn_drop.lua works well in autograd, and I am able to continue my experiments, though I still don't know why the training script of cifar.torch is not compatible with autograd.

taineleau-zz commented on August 26, 2024

Torch-autograd does not support Hessian-vector products for now. We are using Theano instead.
