allenai / xnor-net

ImageNet classification using binary Convolutional Neural Networks

Home Page: https://xnor.ai/

License: Other

Lua 100.00%

xnor-net's Issues

Problem about prediction score when testing an image

When I used the pre-trained model to test an image, I wrote the code below:

predictions = model:forward(img:cuda())
print(predictions:exp())

and then I got
[torch.CudaTensor of size 4x1000]
which means the final prediction score has size 4x1000.
However, we know the last layer is
nn.View(1000)
My test result has three extra rows. Why?

Issue in the function updateBinaryGradWeight

In your code "m:add(1/n):mul(1-1/s[2])", why do you multiply m by (1-1/s[2])? And why do you multiply by n after that? I can't understand this. @mrastegari

Convolution using XNOR & bitcounting

Hey @mrastegari ,
For the forward pass through the convolution layers in the XNOR net, I guess the shared code uses the default Torch/nn strategy. I was wondering whether your code for doing convolution using XNOR + bitcounting operations is publicly available?
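For readers wondering what "XNOR + bitcounting" means in practice, here is a minimal sketch of a binary dot product, the inner step of a binary convolution. It is an illustration in plain Python, not the authors' implementation; the helper names `pack_signs` and `xnor_dot` are made up for this example.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into one int: bit i is 1 for +1, 0 for -1."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors stored as packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the two signs match
    matches = bin(agree).count("1")     # bitcount (popcount)
    return 2 * matches - n              # matches minus mismatches


a = [1, -1, -1, 1, 1, -1, 1, 1]
b = [1, 1, -1, -1, 1, -1, -1, 1]
reference = sum(x * y for x, y in zip(a, b))            # ordinary multiply-accumulate
result = xnor_dot(pack_signs(a), pack_signs(b), len(a))  # same value via XNOR + bitcount
```

Both paths give the same number; the binary path replaces n multiplications with one XNOR and one bitcount per machine word.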

Reproducing resnet18 results

Hi,
I am trying to compare against your results for a paper, and I couldn't get it to work on resnet18. Could you please share your resnet18 code so I can run your model?

Trained XNOR-Network model for other frameworks?

Does anyone know how to convert the .t7 data format to formats compatible with TensorFlow, Theano, etc.? I have successfully read the file in Lua, and also read it using Python (torchfile.py ==> https://github.com/bshillingford/python-torchfile)

But I would like to convert it to h5 and run it with Keras etc., so I can define my own operations for the network. Can anyone help me out, please?

I don't know much Lua. 🦃

About the paper

Hi @mrastegari

Thank you for the nice paper and code! The work is really impressive. I have a question about the paper. As mentioned in another issue (#11), Torch does not support bit operations. But in the paper, there is a statement: "This results in 58× faster convolutional operations (in terms of number of the high precision operations) and 32× memory savings". I would appreciate it if you could explain how these values (i.e., 58x and 32x) are computed. Thank you! Wish you all the best!

Best,
Yongcheng
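A rough sketch of where numbers of this size can come from, based on my recollection of the paper's efficiency analysis rather than on anything in this repository. The assumptions are: a standard convolution costs c·N_W·N_I multiply-accumulates, the binary version costs c·N_W·N_I/64 binary operations (64 per clock on a 64-bit CPU) plus N_I non-binary scaling operations, and the example layer has c = 256 channels with a 3x3 filter. The function name is hypothetical.

```python
def theoretical_speedup(c, n_w, ops_per_cycle=64):
    """Speedup S = c*N_W*N_I / (c*N_W*N_I/ops_per_cycle + N_I); N_I cancels out."""
    return ops_per_cycle * c * n_w / (c * n_w + ops_per_cycle)


# Assumed example layer: 256 input channels, 3x3 filters.
speedup = theoretical_speedup(c=256, n_w=3 * 3)

# Memory: one 32-bit float per weight versus one bit per weight.
memory_saving = 32 / 1

print(f"theoretical speedup: {speedup:.2f}x, memory saving: {memory_saving:.0f}x")
```

Under these assumptions the theoretical figure is about 62x; as I understand it, the 58x quoted in the paper is the value obtained in a CPU implementation once overheads are included, so it would not be reproducible from this Torch code.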

can you share your CIFAR-10 model?

backward gradient is vanished

Hi, thanks for your excellent work. I have been focused on this work for a while. I think the core is the gradient optimization, but I still haven't been able to reproduce your experiment. Could you give me some advice?

I built the network with the block mentioned in your paper (B->A->C->P), and the backward pass uses full-precision data (weights & gradients) for ||r|| < 1.

XNOR net doesn't converge

Has anyone successfully run xnor-net? I have run the code dozens of times, but it has never converged. The error is always "nan". Any idea how to make the training converge?

how to train on CIFAR-10 dataset

how to train on CIFAR-10 dataset?

Are weights in 1st and last layers binary?

According to this line, it seems that the first and last layers of the Binary-Weight-Network are not binary. Am I missing something?

thanks for providing the code to reproduce your work.

ping @mrastegari

How to train on custom data?

Hello,

I want to train the XNOR-Net model on my custom data. How should I proceed?

problems on reproducing BWN result..

Hi,

I am having trouble reproducing the AlexNet BWN result. I used the configuration you suggested in the previous closed issue (1 GPU, default LR, batch size 128, epoch size 10000), but I still only got 45% after 55 epochs.
Could you please help me out? Thanks a lot!

BTW, the pretrained model works fine (56.82%). I also tried training with the configuration from the paper (batch size 512, LR 0.1, decay 0.01 every 4 epochs), and that doesn't work either.

Here is my training result.

performing prediction on single image.

Hello, I want to use the model alexnet_BWN.t7 to perform prediction on a single image. How can I do so?
Please help!

Darknet / XNOR

Is there any implementation code for xnor_layer.c and xnor_layer.h for Darknet framework?

#include <stdio.h>   /* fprintf */
#include <stdlib.h>  /* calloc */
#include <string.h>  /* memcpy */

#include "xnor_layer.h"
#include "binary_convolution.h"
#include "convolutional_layer.h"

layer make_xnor_layer(int batch, int h, int w, int c, int n, int size, int stride, int pad, ACTIVATION activation, int batch_normalize)
{
    int i;
    layer l = {0};
    l.type = XNOR;

    l.h = h;
    l.w = w;
    l.c = c;
    l.n = n;
    l.batch = batch;
    l.stride = stride;
    l.size = size;
    l.pad = pad;
    l.batch_normalize = batch_normalize;

    l.filters = calloc(c*n*size*size, sizeof(float));
    l.biases = calloc(n, sizeof(float));

    int out_h = convolutional_out_height(l);
    int out_w = convolutional_out_width(l);
    l.out_h = out_h;
    l.out_w = out_w;
    l.out_c = n;
    l.outputs = l.out_h * l.out_w * l.out_c;
    l.inputs = l.w * l.h * l.c;

    l.output = calloc(l.batch*out_h * out_w * n, sizeof(float));

    if(batch_normalize){
        l.scales = calloc(n, sizeof(float));
        for(i = 0; i < n; ++i){
            l.scales[i] = 1;
        }

        l.mean = calloc(n, sizeof(float));
        l.variance = calloc(n, sizeof(float));

        l.rolling_mean = calloc(n, sizeof(float));
        l.rolling_variance = calloc(n, sizeof(float));
    }

    l.activation = activation;

    fprintf(stderr, "XNOR Layer: %d x %d x %d image, %d filters -> %d x %d x %d image\n", h,w,c,n, out_h, out_w, n);

    return l;
}

void forward_xnor_layer(const layer l, network_state state)
{
    int b = l.n;
    int c = l.c;
    int ix = l.w;
    int iy = l.h;
    int wx = l.size;
    int wy = l.size;
    int s = l.stride;
    int pad = l.pad * (l.size/2);

    // MANDATORY: Make the binary layer
    ai2_bin_conv_layer al = ai2_make_bin_conv_layer(b, c, ix, iy, wx, wy, s, pad);

    // OPTIONAL: You need to set the real-valued input like:
    ai2_setFltInput_unpadded(&al, state.input);
    // The above function will automatically binarize the input for the layer (channel wise).
    // If commented: using the default 0-valued input.

    ai2_setFltWeights(&al, l.filters);
    // The above function will automatically binarize the weights for the layer.
    // If commented: using the default 0-valued weights.

    // MANDATORY: Call forward
    ai2_bin_forward(&al);

    // OPTIONAL: Inspect outputs
    float *output = ai2_getFltOutput(&al);  // output is of size l.px * l.py where px and py are the padded outputs

    memcpy(l.output, output, l.outputs*sizeof(float));
    // MANDATORY: Free layer
    ai2_free_bin_conv_layer(&al);
}

Questions About Mean Centering & Clamping

Hello,

I am trying to implement BWN in an AlexNet-like network and was a little confused by the following code:

    if opt.binaryWeight then
     meancenterConvParms(convNodes)
     clampConvParms(convNodes)
     realParams:copy(parameters)
     binarizeConvParms(convNodes)
    end

I understand the binarizeConvParms operation, which does the approximations explained in the paper. However, the paper doesn't mention mean centering and clamping. Could anyone explain the rationale behind them?
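To make the question concrete, here is a small sketch of what the three steps could look like for a single filter. This is a guess from the function names only, not a transcription of the Lua code: it assumes mean centering subtracts the filter's mean, clamping restricts the real-valued weights to [-1, 1], and binarization uses the paper's approximation W ≈ α·sign(W) with α the mean absolute value. All Python names here are hypothetical.

```python
def mean_center(w):
    """Subtract the mean so the weights are balanced around zero (assumed behavior)."""
    mu = sum(w) / len(w)
    return [x - mu for x in w]


def clamp(w, lo=-1.0, hi=1.0):
    """Keep real-valued weights inside [lo, hi] (assumed behavior)."""
    return [min(max(x, lo), hi) for x in w]


def binarize(w):
    """Paper's approximation: alpha * sign(w), with alpha = mean(|w|)."""
    alpha = sum(abs(x) for x in w) / len(w)
    return [alpha if x >= 0 else -alpha for x in w]


real_w = [0.9, 2.1, -0.2, 0.0]   # toy real-valued filter
centered = mean_center(real_w)   # mean 0.7 removed
clamped = clamp(centered)        # 1.4 is cut back to 1.0
binary_w = binarize(clamped)     # two positive and two negative entries
signs = [1 if b > 0 else -1 for b in binary_w]
```

In this toy case the uncentered filter would binarize to three positive signs and one negative; after centering the signs are balanced, which is one plausible motivation for the step.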

Trained on large network

Hello, I've always been confused about how BWN and XNOR-Net can be trained on large neural networks such as vgg-16 or resnet-50.

I find it quite difficult to change all the layers into binarized layers at once, because exploding or vanishing gradients often occur during training. I think that changing one layer at a time may solve the problem. But is there any approach that deals with it without having to train the binarized layers separately?

Questions about the updateBinaryGradWeight function

Hi, thanks for your excellent work. I am reading your code to better understand the paper, and I have two questions about the updateBinaryGradWeight function.
I guess this function implements the formula "∂C/∂Wi = ∂C/∂W'i * (1/n + (∂sign/∂Wi) * alpha)" in the third paragraph on page 6. I can match the code with this formula except for the "1-1/s[2]". What does it mean?
To figure this out I wrote a test, and now unfortunately I have another problem : ). Here is my code and the result.

local x = torch.Tensor(2,1):fill(1);
local y = x:expand(2,10);
y:add(1);
print(y)

11 11 11 11 11 11 11 11 11 11
11 11 11 11 11 11 11 11 11 11
[torch.DoubleTensor of size 2x10]

It seems like there may be something wrong when I use add() or mul() after expand(). Is there anything wrong with the way I understand the code?
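One likely explanation, offered as a sketch rather than a confirmed diagnosis: as far as I understand Torch7, expand() does not copy memory. It returns a view in which all ten columns of a row alias the same storage element, so an in-place add(1) that visits every logical element touches that one element ten times, giving 1 + 10 = 11. The toy class below imitates this aliasing in plain Python; `ExpandedView` is a hypothetical stand-in, not a Torch API.

```python
class ExpandedView:
    """Toy stand-in for a 2x1 tensor expanded to 2xN without copying."""

    def __init__(self, storage, ncols):
        self.storage = storage  # one real element per row
        self.ncols = ncols      # logical number of columns

    def add_(self, value):
        # The in-place op walks over every logical element, but all columns
        # of a row point at the same storage slot.
        for row in range(len(self.storage)):
            for _ in range(self.ncols):
                self.storage[row] += value
        return self

    def tolist(self):
        return [[self.storage[r]] * self.ncols for r in range(len(self.storage))]


x = [1, 1]
shared = ExpandedView(x, 10).add_(1).tolist()  # every entry becomes 11

# Making a real copy first gives independent elements, so each is incremented once.
copied = [[v + 1 for v in [1] * 10] for _ in range(2)]  # every entry becomes 2
```

If this is what happens, cloning the expanded tensor before modifying it in place (for example with :clone()) should give the expected 2s.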

The order of imagenet classes used for training the XNOR-Net

I was running validation on a few images taken from ImageNet, and the predicted result for images of a single synset varied from image to image.
So I would like to know the correct order of the class labels/synsets used for training.

Mini XNOR-Net for MNIST

Hi, I really like this project, and wonder if you can give me any advice on how to make a smaller architecture to train on MNIST?

On the last page of your paper, it says that B-A-C-P is the basic unit. Should novices just use that on its own as an exercise to understand the paper?

Thanks a lot 👍

Will I get ~32x speedup on your XNOR implementation?

Sorry for a potentially stupid question, but I failed to find an explicit answer to this vital question.

How to make nn.Linear binary weights

If I add nn.Linear to the network, can it be binarized? I have passed it through BinActiveZ. Is this all I need to do? What should I change in updateBinaryGradWeight and binarizeConvParms?
binarizeConvParms only binarizes the conv nodes, right?

XNOR-Net in Caffe

Is there any implementation of XNOR-Net in Caffe?
I'm very interested in embedding a net in a phone.

Run pre-trained Xnor model failed

I followed the README to test the pre-trained models. I get 56.8% accuracy from alexnet_BWN, but only 10% from alexnet_XNOR. Did I do something wrong? Has anyone else had this problem?

is it possible for you to share resnet 18

Just wanted to know if it is possible to get resnet18 BWN, used in the paper ?
Thanks!

cudnnFindConvolutionForwardAlgorithm failed

Hey,

I'm getting the above error a couple of seconds after the first training epoch starts:

nClasses:   1000                                                                
nTest:  50000                                                                   
==> doing epoch on training data:                                               
==> online epoch # 1                                                            

cudnnFindConvolutionForwardAlgorithm failed:    2    convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 -padA2,2 -convStrideA4,4 CUDNN_DATA_FLOAT
/home/drodo/torch/install/bin/luajit: /home/drodo/torch/install/share/lua/5.1/threads/threads.lua:179: [thread 2 endcallback] /home/drodo/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:                                                   
/home/drodo/torch/install/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionForwardAlgorithm failed, sizes:  convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 -padA2,2 -convStrideA4,4 CUDN
stack traceback:                                                                
[C]: in function 'error'                                                    
/home/drodo/torch/install/share/lua/5.1/cudnn/find.lua:483: in function 'forwardAlgorithm'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:190: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
[C]: in function 'xpcall'                                                   
/home/drodo/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/drodo/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
/home/drodo/xnornet/XNOR-Net/train.lua:176: in function </home/drodo/xnornet/XNOR-Net/train.lua:157>
[C]: in function 'xpcall'                                                   
/home/drodo/torch/install/share/lua/5.1/threads/threads.lua:174: in function 'dojob'
/home/drodo/torch/install/share/lua/5.1/threads/threads.lua:223: in function 'addjob'
/home/drodo/xnornet/XNOR-Net/train.lua:108: in function 'train'             
main.lua:50: in main chunk                                                  
[C]: in function 'dofile'                                                   
...rodo/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

Data set is prepared exactly as indicated in the README.md and cuda config is also operational. Has anyone ever come across a similar error running this ConvNet?

Cheers & Thanks,

--
Dimitrios

XNOR-Net in Tensorflow

Is there any implementation of XNOR-Net in Tensorflow?

Why network files are big ?

The pretrained AlexNet models provided in the links are ~450 MB. Is this because Lua stores the network in an unoptimized manner? The paper mentions a network size of 7.5 MB in Fig. 4.
Thanks for clarifying.
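A small sketch of the size difference between storing binary weights as 32-bit floats and as packed bits. It assumes, without having inspected the released .t7 files, that the checkpoints keep each weight as a float; `pack_binary_weights` is a hypothetical helper written for this example.

```python
import struct


def pack_binary_weights(signs):
    """Pack +1/-1 weights into bytes, 8 weights per byte (bit set for +1)."""
    out = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s > 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


signs = [1, -1] * 512  # 1024 binary weights

# Same weights stored as 32-bit floats: 4 bytes each.
as_float32 = struct.pack(f"{len(signs)}f", *[float(s) for s in signs])

# Same weights stored one bit each.
as_bits = pack_binary_weights(signs)

ratio = len(as_float32) / len(as_bits)
```

The ratio is 32, which matches the memory saving quoted from the paper; a figure like 7.5 MB would then require a bit-packed storage format that the Torch checkpoint does not appear to use.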

xnor and bitcount operations

Thanks for your excellent work! I cannot find the xnor and bitcount operations from equation 11 in the code. Does models/alexnetxnor.lua only use the general SpatialConvolution in Torch?

BinActiveZ:updateGradInput() dependence on input

I'm not familiar with lua but I'm assuming updateGradInput() is simply zeroing gradOutput wherever input is out of [-1,1] range. Shouldn't it also multiply gradOutput by input values in that range?

I can see that BinActive doesn't bother computing the K tensor as per the paper, presumably because it didn't make a massive difference. But are you also using a different estimator for the derivative of the sign() function for binary activations than the one used for kernels (updateBinaryGradWeight)?
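For reference, this is the behavior described above written out as a straight-through estimator, based on the description in this issue and not on the Lua source. With this estimator, sign() is treated as the identity inside the clipping range, so its derivative there is 1 and gradOutput is passed through unchanged instead of being multiplied by the input. Whether the boundary values ±1 are included is a choice made here for illustration.

```python
def bin_active_forward(x):
    """Binarize activations to +1/-1 (0 is mapped to +1 in this sketch)."""
    return [1.0 if v >= 0 else -1.0 for v in x]


def bin_active_backward(x, grad_output):
    """Straight-through estimator: pass the gradient where |x| < 1, zero it elsewhere."""
    return [g if -1.0 < v < 1.0 else 0.0 for v, g in zip(x, grad_output)]


x = [-1.5, -0.3, 0.0, 0.8, 2.0]
grad_output = [0.1, 0.2, 0.3, 0.4, 0.5]

y = bin_active_forward(x)                         # signs of the inputs
grad_input = bin_active_backward(x, grad_output)  # gradients at -1.5 and 2.0 are zeroed
```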

Trained model file is very large

Hi, I trained alexnet_XNOR myself.
(I have a smaller GPU, so I used the flags "-nGPU 1 -batchSize 400".)
It seems to have converged and everything is fine (42.1% top-1 accuracy on the validation dataset).
But the problem is that the saved models take up a lot of disk space:
the model_epoch.t7 file needs 4.8 GB
the optimState_epoch.t7 file needs 0.75 GB
and we need 55 epochs!

I don't know why the pre-trained model you provide for download needs only 0.5 GB.
Could you help me save disk space?

Bug util.lua

Hi,
I noticed you modified the util.lua file, adding m:add(1/(n)):mul(1-1/s[2]):mul(n) in the updateBinaryGradWeight function. However, I don't understand where this modification comes from or where in your paper you discuss it. Moreover, I am training a binarized version of ResNet, and the training diverges with the mul(n).

In addition, could you also upload the model you used for training the binarized ResNet?
Thank you.
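The sketch below does not explain the origin of the extra factors; it only works out, as plain arithmetic, how the quoted chain differs from the per-weight multiplier in the paper's gradient formula (1/n + alpha * ∂sign/∂W). It assumes n is the number of elements in one filter and s[2] is the number of input planes; the layer sizes and alpha value are made up for the example.

```python
def paper_multiplier(alpha, w, n):
    """1/n + alpha * d(sign)/dw, taking d(sign)/dw as 1 inside (-1, 1) and 0 outside."""
    inside = 1.0 if -1.0 < w < 1.0 else 0.0
    return 1.0 / n + alpha * inside


def code_multiplier(alpha, w, n, n_in):
    """The quoted chain m:add(1/n):mul(1-1/s[2]):mul(n), with s[2] assumed to be n_in."""
    return paper_multiplier(alpha, w, n) * (1.0 - 1.0 / n_in) * n


n_in, k = 256, 3   # hypothetical layer: 256 input planes, 3x3 kernels
n = n_in * k * k   # 2304 elements per filter
alpha = 0.05       # hypothetical scaling factor

paper = paper_multiplier(alpha, 0.3, n)
code = code_multiplier(alpha, 0.3, n, n_in)
scale = code / paper  # constant factor (1 - 1/n_in) * n
```

For this layer the code's multiplier is larger than the paper's by a constant factor of 2295, which behaves like a per-layer learning-rate scale. That may be relevant to the divergence reported here, though this is speculation.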

BWN/XNOR SqueezeNet training

Hi,

I've been trying to train SqueezeNet in both configurations (BWN and XNOR), but I can't get past 31% (24% respectively) top-1 accuracy (I was expecting accuracies similar to AlexNet). I tried something similar to the GoogLeNet variant depicted in the paper (I replaced the expand layers with straightforward convolutions with 3x3 kernels, so there is no branching).

Have you tried to train this model? If so, could you please tell me how you did it?

Thank you,
Alex

ping @mrastegari

Use Google Colab to run pretrained models

I don't have my own GPU. Is it possible to run a pre-trained XNOR net on Google Colaboratory using the GPU?

BinConvolution doesn't seem to match paper

(*) means conv operation, o is element-wise product

  # https://github.com/allenai/XNOR-Net/blob/master/models/alexnetxnor.lua#L16
   local function BinConvolution(nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH)
         local C= nn.Sequential()
          C:add(nn.SpatialBatchNormalization(nInputPlane,1e-4,false))
          C:add(activation())
          C:add(cudnn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH))
       return C
   end

In this implementation, the step after the input activation() is a direct convolution, cudnn.SpatialConvolution(), with parameters. But the paper's algorithm for input binarization is:

I * W ~= (sign(I) (*) sign(W)) o K a = (sign(I) (*) sign(W)) o (A (*) k) a
where
A = torch.mean(input.abs(), 1, keepdim=True)
k = an averaging kernel with value 1/(w*h)

so
A (*) k averages each input element with its neighboring elements. This is missing in the current implementation, where only (sign(I) (*) sign(W)) o a is calculated.

C:add(activation())
C:add(cudnn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH))

To capture the convolution with A and k from the paper, I would expect pseudo code like this in function BinConvolution() in python

# A must be taken from the real-valued input, i.e. before binarization
A = x.abs().mean(dim=1, keepdim=True) #shape N, 1, H, W
x = BinActiveZ(x)
# <=== === === === === === === === === === START
sign_I = x #shape N, Cin, H, W
kH = self.conv.weight.shape[2] #kernel height
kW = self.conv.weight.shape[3] #kernel width
k = torch.ones(1, 1, kH, kW, device=x.device) / (kH * kW) #setup averaging kernel k
# use the same stride and padding as self.conv so that K matches its output size
K = torch.nn.functional.conv2d(A, k, stride=self.conv.stride, padding=self.conv.padding) #shape N, 1, H_out, W_out

#now calculate sign_I (*) sign_W o Ka
# since self.conv.weight is already binarized by binarizeConvParms() before the batch starts,
# the `a` in `Ka` is included in `self.conv(x)`. The only missing part is `mul(K)`
# Hence:
x = self.conv(x).mul(K)
# <=== === === === === === === === === === END

Can you check if my understanding of the discrepancy is correct?

Data pre-processing for the pre-trained models

Hi,

For the pretrained models (XNOR-Net and BWN), what are the corresponding data pre-processing procedure and the related parameters? For example, if the pre-processing is subtracting an average image from the input image, what is the average image?

Thanks.

Model size has no reduction

I found that the model parameters and memory usage did not decrease, and the network did not run faster. I'm confused!

Where is the XNOR operator implemented?

I was looking for the implementation of XNOR operator, but couldn't find any. I only saw regular Conv layers applied to binary valued inputs and weights. Does anyone know if the convolution layer is implemented at all, which uses XNOR and bitcount to replace regular matrix multiplication?

Thanks!

Where is the Scalar Multiplication and ReLU Activations in alexnetxnor.lua?

Hello!
I am trying to figure out the structure of your network by this code after reading your paper.

However, I couldn't find the scalar multiplications (average of weights / input data) in the model you've created in alexnetxnor.lua.
Also, there aren't any ReLU functions in the binarized convolution layers.
Is there anything wrong in the way that I am understanding the network?

As far as I read from the paper, scalar multiplication is the core idea of this paper which made possible the distinguishable results compared to BinaryConnect or other BNNs.

Thank you.

Binary Convolutional layer

Hi, thanks for sharing the code, where can I find the implementation of binary Convolutional layer? I only find "BinActiveZ.lua"

Error loading images

Hello,

I get an error when I try to use either of the trained models: it does not load the images from the ImageNet 2012 validation dataset, and I receive the error message "Error could not load image".
I've pre-processed the images like in the readme file.

Mihai

Having trouble reproducing the reported accuracy

I ran the code with the command th main.lua -data ./images -nGPU 2 -batchSize 512 -netType alexnet -binaryWeight -dropout 0.1 after changing the learning rate policy to

1, 4, 1e-1, 5e-4,
5, 8, 1e-3, 5e-4,
9, 12, 1e-5, 0,
13, 16, 1e-7, 0,

I tried to use this to get the same result for BWN (AlexNet) as reported in the paper. However, the resulting top-1 training accuracy after the first epoch is 7.82%, far from the reported value. The top-5 training accuracy is 19.72%. Is there anything I missed?
