xnor-net's People

Contributors

mrastegari, schmmd

xnor-net's Issues

Problem about prediction score when testing an image

When I used the pre-trained model to test an image, I wrote the code below:

predictions = model:forward(img:cuda())
print(predictions:exp())

and then I got
[torch.CudaTensor of size 4x1000]
which means the final prediction score tensor is 4x1000.
However, we know the last layer is
nn.View(1000)
My test result has three extra rows. Why?

Convolution using XNOR & bitcounting

Hey @mrastegari,
For the forward pass through the convolution layers in XNOR-Net, I guess the shared code uses the default Torch/nn strategy. I was wondering whether your code for doing convolution with XNOR + bitcounting operations is publicly available?
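For readers wondering what "XNOR + bitcounting" means in practice, here is a minimal sketch of a single dot product between two sign vectors. It is not taken from the repository; the helper names `pack_signs` and `xnor_dot` and the bit encoding (bit set for +1, clear for -1) are assumptions made for illustration.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] is +1."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n sign vectors stored as packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the two signs match
    matches = bin(agree).count("1")     # bitcount of the matching positions
    return 2 * matches - n              # matches minus mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))   # ordinary multiply-accumulate
result = xnor_dot(pack_signs(a), pack_signs(b), len(a))
```

A binary convolution would apply this to every filter position; the sketch only shows why the result equals the ordinary dot product of the +1/-1 values.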

Reproducing resnet18 results

Hi,
I am trying to compare against your method for a paper, and I couldn't get it to work on resnet18. Could you please share your resnet18 code so I can run your model?

About the paper

Hi @mrastegari

Thank you for the nice paper and code! The work is really impressive. I have a question about the paper. As mentioned in another issue (#11), Torch does not support bit operations. But the paper states: "This results in 58× faster convolutional operations (in terms of number of the high precision operations) and 32× memory savings". I would appreciate it if you could explain how these values (i.e., 58x and 32x) are computed. Thank you! Wish you all the best!

Best,
Yongcheng
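One way to arrive at numbers of this size is sketched below. The cost model is an assumption, not something stated in this thread: a 64-bit word operation is taken to replace 64 multiply-accumulates, with one extra full-precision operation per output for scaling, and the layer shape (256 channels, 3x3 kernel) is an illustrative choice. The function names are hypothetical.

```python
def theoretical_speedup(channels, kernel_area, word_bits=64):
    """Ratio of full-precision ops to binary ops for one output value (assumed model)."""
    full_precision_ops = channels * kernel_area
    # one word-wide XNOR+bitcount covers word_bits products; +1 for the scaling multiply
    binary_ops = full_precision_ops / word_bits + 1
    return full_precision_ops / binary_ops


def memory_saving(float_bits=32, binary_bits=1):
    """Storage ratio between a float32 weight and a 1-bit weight."""
    return float_bits / binary_bits


speedup = theoretical_speedup(channels=256, kernel_area=3 * 3)
saving = memory_saving()
print(round(speedup, 2), saving)
```

Under these assumptions the speedup comes out at about 62x, slightly above the quoted 58x, and the memory ratio is exactly 32x. Whether the remaining gap is implementation overhead is a guess that only the authors can confirm.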

backward gradient is vanished

Hi, thanks for your excellent work. I have been focused on it for a while. I think the core is the gradient optimization, but I still haven't managed to reproduce your experiment. Could you give me some advice?

I built the network with the block mentioned in your paper (B->A->C->P), and the backward pass uses full-precision data (weights and gradients) for ||r|| < 1.

XNOR net doesn't converge

Has anyone successfully run xnor-net? I ran the code dozens of times, but it never converged. The error is always "nan". Any idea how to make the training converge?

problems on reproducing BWN result..

Hi,

I am having trouble reproducing the AlexNet BWN result. I used the configuration you suggested in the previous closed issue (1 GPU, default LR, batch size 128, epoch size 10000), but I still got 45% after 55 epochs.
Could you please help me out? Thanks a lot!

BTW, the pretrained model works fine (56.82%). I also tried training with the configuration from the paper (batch size 512, LR 0.1, decayed by 0.01 every 4 epochs), but that doesn't work either.

Here is my training result.
[image: training result]

Darknet / XNOR

Is there any implementation code for xnor_layer.c and xnor_layer.h for Darknet framework?

#include <stdio.h>   // fprintf
#include <stdlib.h>  // calloc
#include <string.h>  // memcpy
#include "xnor_layer.h"
#include "binary_convolution.h"
#include "convolutional_layer.h"

layer make_xnor_layer(int batch, int h, int w, int c, int n, int size, int stride, int pad, ACTIVATION activation, int batch_normalize)
{
    int i;
    layer l = {0};
    l.type = XNOR;

    l.h = h;
    l.w = w;
    l.c = c;
    l.n = n;
    l.batch = batch;
    l.stride = stride;
    l.size = size;
    l.pad = pad;
    l.batch_normalize = batch_normalize;

    l.filters = calloc(c*n*size*size, sizeof(float));
    l.biases = calloc(n, sizeof(float));

    int out_h = convolutional_out_height(l);
    int out_w = convolutional_out_width(l);
    l.out_h = out_h;
    l.out_w = out_w;
    l.out_c = n;
    l.outputs = l.out_h * l.out_w * l.out_c;
    l.inputs = l.w * l.h * l.c;

    l.output = calloc(l.batch*out_h * out_w * n, sizeof(float));

    if(batch_normalize){
        l.scales = calloc(n, sizeof(float));
        for(i = 0; i < n; ++i){
            l.scales[i] = 1;
        }

        l.mean = calloc(n, sizeof(float));
        l.variance = calloc(n, sizeof(float));

        l.rolling_mean = calloc(n, sizeof(float));
        l.rolling_variance = calloc(n, sizeof(float));
    }

    l.activation = activation;

    fprintf(stderr, "XNOR Layer: %d x %d x %d image, %d filters -> %d x %d x %d image\n", h,w,c,n, out_h, out_w, n);

    return l;
}

void forward_xnor_layer(const layer l, network_state state)
{
    int b = l.n;
    int c = l.c;
    int ix = l.w;
    int iy = l.h;
    int wx = l.size;
    int wy = l.size;
    int s = l.stride;
    int pad = l.pad * (l.size/2);

    // MANDATORY: Make the binary layer
    ai2_bin_conv_layer al = ai2_make_bin_conv_layer(b, c, ix, iy, wx, wy, s, pad);

    // OPTIONAL: You need to set the real-valued input like:
    ai2_setFltInput_unpadded(&al, state.input);
    // The above function will automatically binarize the input for the layer (channel wise).
    // If commented: using the default 0-valued input.

    ai2_setFltWeights(&al, l.filters);
    // The above function will automatically binarize the weights for the layer (channel wise).
    // If commented: using the default 0-valued weights.

    // MANDATORY: Call forward
    ai2_bin_forward(&al);

    // OPTIONAL: Inspect outputs
    float *output = ai2_getFltOutput(&al);  // output is of size l.px * l.py where px and py are the padded outputs

    memcpy(l.output, output, l.outputs*sizeof(float));
    // MANDATORY: Free layer
    ai2_free_bin_conv_layer(&al);
}

Questions About Mean Centering & Clamping

Hello,

I am trying to implement BWN in an AlexNet-like network and am a little confused by the following code:

    if opt.binaryWeight then
     meancenterConvParms(convNodes)
     clampConvParms(convNodes)
     realParams:copy(parameters)
     binarizeConvParms(convNodes)
    end

I understand the binarizeConvParms operation, which does the approximation explained in the paper. However, the paper doesn't mention mean centering or clamping. Could anyone explain the rationale behind them?
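To make the question concrete, here is a minimal sketch of what the three steps might do to a single filter, judging only from the function names in the snippet above. The helpers `mean_center`, `clamp` and `binarize` are hypothetical stand-ins, not the repository's Lua code, and the real functions may reduce over different dimensions.

```python
def mean_center(weights):
    """Subtract the filter's mean so the weights are balanced around zero."""
    mean = sum(weights) / len(weights)
    return [w - mean for w in weights]


def clamp(weights, low=-1.0, high=1.0):
    """Clip each weight into [low, high]."""
    return [min(max(w, low), high) for w in weights]


def binarize(weights):
    """Replace each weight by alpha * sign(w), with alpha the mean absolute value."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


filter_weights = [0.9, 1.7, -0.3, 0.1]
centered = mean_center(filter_weights)   # roughly [0.3, 1.1, -0.9, -0.5]
clamped = clamp(centered)                # 1.1 is clipped to 1.0
binary = binarize(clamped)               # every entry is +alpha or -alpha
```

In the snippet from the issue, the real-valued parameters are copied into realParams before binarizeConvParms runs, which suggests the binarized values are only used for the forward and backward pass while the real-valued copy is what gets updated. That reading is an inference from the code, not a statement from the authors.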

Trained on large network

Hello, I've always been confused about how BWN and XNOR-Net can be trained on large neural networks such as vgg-16 or resnet-50.

I find it quite difficult to convert all the layers to binarized layers at once, because gradients often explode or vanish during training. I think converting one layer at a time might solve the problem. But is there any approach that avoids training the binarized layers separately?

Questions about the updateBinaryGradWeight function

Hi, thanks for your excellent work. I am reading your code for a better understanding of the paper and I have two questions about the updateBinaryGradWeight function.
I guess this function implements the formula "∂C/∂Wi = ∂C/∂W'i * (1/n + (∂sign/∂Wi) * alpha)" in the third paragraph on page 6. I can match the code with this formula except for the "1-1/s[2]". What does it mean?
To figure this out I wrote a test, and unfortunately now I have another problem :). Here is my code and the result.

local x = torch.Tensor(2,1):fill(1);
local y = x:expand(2,10);
y:add(1);
print(y)

11 11 11 11 11 11 11 11 11 11
11 11 11 11 11 11 11 11 11 11
[torch.DoubleTensor of size 2x10]

It seems like there may be something wrong when I use add() or mul() after expand(). Is there anything wrong with the way I understand the code?
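One plausible explanation for the 11s, assuming expand() returns a view that shares memory with the original tensor (a column stride of 0) rather than a copy, is that the in-place add visits each of the 10 logical columns and hits the same underlying element every time. The `ExpandedView` class below is a hypothetical toy model of that behaviour, not Torch code.

```python
class ExpandedView:
    """A rows x cols view over a storage holding one value per row (column stride 0)."""

    def __init__(self, storage, cols):
        self.storage = storage   # shared with the original tensor, not copied
        self.cols = cols

    def add_(self, value):
        # naive in-place add: visits every logical element of the view
        for row in range(len(self.storage)):
            for _ in range(self.cols):
                self.storage[row] += value   # same memory cell for every column

    def to_rows(self):
        return [[self.storage[row]] * self.cols for row in range(len(self.storage))]


x = [1.0, 1.0]              # plays the role of torch.Tensor(2,1):fill(1)
y = ExpandedView(x, cols=10)
y.add_(1.0)                 # each stored value is incremented 10 times
print(y.to_rows()[0])
```

If this is what happens, copying the expanded tensor into its own memory (for example with clone()) before calling add() or mul() should give the expected 2s.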

Mini XNOR-Net for MNIST

Hi, I really like this project, and wonder if you can give me any advice on how to make a smaller architecture to train on MNIST?

On the last page of your paper, it's written that B-A-C-P is the basic unit, so should novices just use that on its own as an exercise to understand the paper?

Thanks a lot 👍

How to make nn.Linear binary weights

Hi

If I add nn.Linear to the network, can it be binarized? I have passed it through BinActiveZ. Is this all I need to do? What should I change in updateBinaryGradWeight and binarizeConvParms?
binarizeConvParms only binarizes the ConvNodes, right?

XNOR-Net in Caffe

Is there any implementation of XNOR-Net in Caffe?
I'm very interested in embedding a net in a phone.

Run pre-trained Xnor model failed

I followed the README to test the pre-trained models. I get 56.8% accuracy from alexnet_BWN, but only 10% from alexnet_XNOR. Did I do something wrong? Has anyone else had this problem?

cudnnFindConvolutionForwardAlgorithm failed

Hey,

I'm getting the above error a couple of seconds after the first training epoch starts:

nClasses:   1000                                                                
nTest:  50000                                                                   
==> doing epoch on training data:                                               
==> online epoch # 1                                                            

cudnnFindConvolutionForwardAlgorithm failed:    2    convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 -padA2,2 -convStrideA4,4 CUDNN_DATA_FLOAT
/home/drodo/torch/install/bin/luajit: /home/drodo/torch/install/share/lua/5.1/threads/threads.lua:179: [thread 2 endcallback] /home/drodo/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:                                                   
/home/drodo/torch/install/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionForwardAlgorithm failed, sizes:  convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 -padA2,2 -convStrideA4,4 CUDN
stack traceback:                                                                
[C]: in function 'error'                                                    
/home/drodo/torch/install/share/lua/5.1/cudnn/find.lua:483: in function 'forwardAlgorithm'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:190: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
[C]: in function 'xpcall'                                                   
/home/drodo/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/drodo/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
/home/drodo/xnornet/XNOR-Net/train.lua:176: in function </home/drodo/xnornet/XNOR-Net/train.lua:157>
[C]: in function 'xpcall'                                                   
/home/drodo/torch/install/share/lua/5.1/threads/threads.lua:174: in function 'dojob'
/home/drodo/torch/install/share/lua/5.1/threads/threads.lua:223: in function 'addjob'
/home/drodo/xnornet/XNOR-Net/train.lua:108: in function 'train'             
main.lua:50: in main chunk                                                  
[C]: in function 'dofile'                                                   
...rodo/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670   

The data set is prepared exactly as indicated in README.md, and the CUDA config is also operational. Has anyone come across a similar error running this ConvNet?

Cheers & Thanks,

--
Dimitrios

Why network files are big ?

The pretrained AlexNet models provided in the links are ~450 MB. Is this because Lua stores the network in an unoptimized manner? The paper mentions a network size of 7.5 MB in Fig 4.
Thanks for clarifying.
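A back-of-the-envelope check shows how far apart the two storage formats are. The parameter count below is an assumed round figure for AlexNet, not a number taken from this thread, and `storage_megabytes` is a hypothetical helper.

```python
def storage_megabytes(num_params, bits_per_param):
    """Size in MB (10**6 bytes) of num_params values at the given bit width."""
    return num_params * bits_per_param / 8 / 1e6


ASSUMED_PARAMS = 61_000_000   # assumption: rough AlexNet parameter count

float_mb = storage_megabytes(ASSUMED_PARAMS, 32)   # every weight kept as float32
packed_mb = storage_megabytes(ASSUMED_PARAMS, 1)   # every weight packed into 1 bit
print(float_mb, packed_mb)
```

With this assumed count, bit-packed weights land near the 7.5 MB figure, while float32 storage is 32 times larger. A checkpoint that keeps each +1/-1 weight as a float will not shrink on disk. The sketch does not account for the rest of the ~450 MB, which would have to come from whatever else is saved in the file.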

xnor and bitcount operations

Thanks for your excellent work! I cannot find the xnor and bitcount operations from equation 11. Does models/alexnetxnor.lua only use the general SpatialConvolution in Torch?

BinActiveZ:updateGradInput() dependance on input

I'm not familiar with Lua, but I'm assuming updateGradInput() simply zeroes gradOutput wherever input is outside the [-1,1] range. Shouldn't it also multiply gradOutput by the input values in that range?

I can see that BinActive doesn't bother computing the K tensor as per the paper, presumably because it didn't make a massive difference. But are you also using a different estimator for the derivative of the sign() function for binary activations than the one used for kernels (updateBinaryGradWeight)?
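The behaviour described in the first paragraph can be sketched as follows. This follows the poster's reading of updateGradInput() and is not copied from BinActiveZ.lua; the function names are hypothetical.

```python
def sign(x):
    return 1.0 if x >= 0 else -1.0


def bin_active_forward(inputs):
    """Forward pass: binarize each activation to +1 or -1."""
    return [sign(x) for x in inputs]


def bin_active_backward(inputs, grad_output):
    """Backward pass: pass the gradient through where |x| <= 1, zero it elsewhere."""
    return [g if -1.0 <= x <= 1.0 else 0.0 for x, g in zip(inputs, grad_output)]


inputs = [-1.5, -0.2, 0.4, 2.0]
grad_output = [0.1, 0.2, 0.3, 0.4]
outputs = bin_active_forward(inputs)
grad_input = bin_active_backward(inputs, grad_output)
```

Under this reading the gradient inside the range is passed through unchanged, which corresponds to treating sign() as a clipped identity whose slope is 1 on [-1,1]. Multiplying by the input values would be a different estimator.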

Trained model file is very large

Hi, I trained alexnet_XNOR myself.
(I have a smaller GPU, so the flags I used were "-nGPU 1 -batchSize 400".)
It seems to have converged and everything is fine (42.1% top-1 accuracy on the validation dataset).
But the saved model takes up a lot of disk space:
model_epoch.t7 needs 4.8 GB
optimState_epoch.t7 needs 0.75 GB
and we need 55 epochs!

I don't know why your downloadable pre-trained model needs only 0.5 GB.
Could you help me save disk space?

Bug util.lua

Hi,
I noticed you modified the util.lua file, adding m:add(1/(n)):mul(1-1/s[2]):mul(n) in the updateBinaryGradWeight function. However, I don't understand the origin of this modification or where in your paper you discuss it. Moreover, I am training a binarized version of ResNet, and the training diverges with the mul(n).

In addition, could you also upload the model you used for training the binarized ResNet?
Thank you.

BWN/XNOR SqueezeNet training

Hi,

I've been trying to train SqueezeNet in both configurations(bwn and xnor), but I can't get past 31% (24% respectively) top-1 accuracy (I was expecting accuracies similar to alexnet). I tried something similar to the GoogLenet variant depicted in the paper (I replaced the expand layers with straightforward convolutions with kernel sizes of 3x3, so there is no branching).

Have you tried to train this model? If so, could you please tell me how you did it?

Thank you,
Alex

ping @mrastegari

BinConvolution doesn't seem to match paper

(*) means conv operation, o is element-wise product

  # https://github.com/allenai/XNOR-Net/blob/master/models/alexnetxnor.lua#L16
   local function BinConvolution(nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH)
         local C= nn.Sequential()
          C:add(nn.SpatialBatchNormalization(nInputPlane,1e-4,false))
          C:add(activation())
          C:add(cudnn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH))
       return C
   end

In this implementation, the step after the input activation() is a direct convolution, cudnn.SpatialConvolution(), with these parameters. But the paper's algorithm for input binarization is:

I (*) W ~= (sign(I) (*) sign(W)) o K a = (sign(I) (*) sign(W)) o (A (*) k) a
where
A = torch.mean(input.abs(), 1, keepdim=True)
k = an averaging kernel with value 1/(w*h)

so
A (*) k averages each input element with its neighboring elements. This is missing in the current implementation, where only (sign(I) (*) sign(W)) o a is calculated.

C:add(activation())
C:add(cudnn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH))

To capture the convolution of A with k from the paper, I would expect pseudo code like this in the BinConvolution() function, in Python:

# inside BinConvolution.forward(x), with: import torch.nn.functional as F
A = x.abs().mean(dim=1, keepdim=True)  # shape N, 1, H, W; taken before binarization
x = BinActiveZ(x)                      # sign_I, shape N, Cin, H, W
# <=== === === === === === === === === === START
kH, kW = self.conv.weight.shape[2:]    # kernel height and width
# averaging kernel k with value 1/(kH*kW)
k = torch.full((1, 1, kH, kW), 1.0 / (kH * kW), dtype=x.dtype, device=x.device)
# use the same stride and padding as self.conv so that K matches its output size
K = F.conv2d(A, k, stride=self.conv.stride, padding=self.conv.padding)  # shape N, 1, Hout, Wout

# now calculate sign_I (*) sign_W o Ka
# since self.conv.weight is already binarized by binarizeConvParams() before the batch starts,
# the `a` in `Ka` is included in `self.conv(x)`. The only missing part is `mul(K)`
# Hence:
x = self.conv(x).mul(K)
# <=== === === === === === === === === === END

Can you check if my understanding of the discrepancy is correct?

Data pre-processing for the pre-trained models

Hi,

For the pretrained models (XNOR-Net and BWN), what are the corresponding data pre-processing procedures and their parameters? For example, if the pre-processing subtracts an average image from the input image, what is the average image?

Thanks.

Model size has no reduction

I found that the model parameters and memory usage did not decrease, and the network did not run faster. I'm confused!

Where is the XNOR operator implemented?

I was looking for the implementation of XNOR operator, but couldn't find any. I only saw regular Conv layers applied to binary valued inputs and weights. Does anyone know if the convolution layer is implemented at all, which uses XNOR and bitcount to replace regular matrix multiplication?

Thanks!

Where is the Scalar Multiplication and ReLU Activations in alexnetxnor.lua?

Hello!
I am trying to figure out the structure of your network from this code after reading your paper.

However, I couldn't find the scalar multiplications (average of the weights / input data) in the model you've created in alexnetxnor.lua.
Also, there aren't any ReLU functions in the binarized convolution layers.
Is there anything wrong with the way I understand the network?

As far as I can tell from the paper, scalar multiplication is the core idea that made it possible to get noticeably better results than BinaryConnect or other BNNs.

Thank you.

Binary Convolutional layer

Hi, thanks for sharing the code. Where can I find the implementation of the binary convolutional layer? I can only find "BinActiveZ.lua".

Error loading images

Hello,

I get an error when I try to use either of the trained models: the images from the ImageNet 2012 validation dataset are not loaded, and the message "Error could not load image" is printed.
I pre-processed the images as described in the readme file.

Mihai

Having trouble reproducing the reported accuracy

I ran the code with the command th main.lua -data ./images -nGPU 2 -batchSize 512 -netType alexnet -binaryWeight -dropout 0.1 after changing the learning rate policy to

1, 4, 1e-1, 5e-4,
5, 8, 1e-3, 5e-4,
9, 12, 1e-5, 0.
13, 16, 1e-7, 0

I tried to use this to get the same result for BWN (AlexNet) as reported in the paper. However, the resulting top-1 training accuracy after the first epoch is 7.82%, far from what is reported. The top-5 training accuracy is 19.72%. Is there anything I missed?
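For readers unfamiliar with this kind of schedule table, the sketch below shows one way it could be read. The column meaning (first epoch, last epoch, learning rate, weight decay) is an assumption based on the values in the post, and `params_for_epoch` is a hypothetical helper, not the repository's function.

```python
# rows copied from the post; column meaning is assumed:
# (first epoch, last epoch, learning rate, weight decay)
REGIME = [
    (1, 4, 1e-1, 5e-4),
    (5, 8, 1e-3, 5e-4),
    (9, 12, 1e-5, 0.0),
    (13, 16, 1e-7, 0.0),
]


def params_for_epoch(epoch, regime=REGIME):
    """Return (learning_rate, weight_decay) for an epoch, or None past the table."""
    for first, last, learning_rate, weight_decay in regime:
        if first <= epoch <= last:
            return learning_rate, weight_decay
    return None


print(params_for_epoch(1), params_for_epoch(10), params_for_epoch(17))
```

Read this way, the learning rate drops by a factor of 100 every 4 epochs and the table only covers 16 epochs, so the accuracy after epoch 1 reflects the very first stage of a short schedule.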
