allenai / xnor-net
ImageNet classification using binary Convolutional Neural Networks
Home Page: https://xnor.ai/
License: Other
When I used the pre-trained model to test an image, I wrote the following code:
predictions = model:forward(img:cuda())
print(predictions:exp())
and then I got
[torch.CudaTensor of size 4x1000]
which means the final prediction score tensor is 4×1000.
However, we know the last layer is
nn.View(1000)
My test result has three extra rows. Why?
In your code “m:add(1/n):mul(1-1/s[2])”, why do you multiply m by (1-1/s[2])? And after that, why do you multiply by n? I can't understand this. @mrastegari
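For context, the paper's weight-gradient formula, ∂C/∂W_i = ∂C/∂W'_i · (1/n + (∂sign/∂W_i) · α), can be sketched for a single filter as below. This is a minimal sketch under stated assumptions: ∂sign/∂W_i is approximated as 1 for |W_i| < 1 and 0 otherwise, the function name `binary_weight_grad` is hypothetical, and the extra (1-1/s[2]) and n factors from the repository are deliberately left out, since their origin is exactly what is being asked.

```python
def binary_weight_grad(weights, grad_wrt_binary):
    """Map the gradient w.r.t. the binarized filter back to the real weights.

    weights: real-valued weights of one filter, flattened.
    grad_wrt_binary: gradient of the loss w.r.t. the estimated binary filter.
    """
    n = len(weights)
    # Scaling factor alpha = ||W||_1 / n, as in the paper.
    alpha = sum(abs(w) for w in weights) / n
    grads = []
    for w, g in zip(weights, grad_wrt_binary):
        # Assumption: straight-through estimate of d sign / d W.
        dsign = 1.0 if abs(w) < 1.0 else 0.0
        grads.append(g * (1.0 / n + dsign * alpha))
    return grads


weights = [0.5, -0.25, 1.5, -0.75]
grads = binary_weight_grad(weights, [1.0, 1.0, 1.0, 1.0])
print(grads)
```

With these values alpha is 0.75 and n is 4, so weights inside (-1, 1) get a multiplier of 1.0 while the weight at 1.5 only keeps the 1/n term.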
Hey @mrastegari ,
For the forward pass through the convolution layers in XNOR-Net, I guess the shared code uses the default Torch/nn strategy. I was wondering whether your code for doing convolution with XNOR + bitcount operations is publicly available?
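To make the XNOR + bitcount idea concrete, here is a small sketch of how a dot product of two {-1, +1} vectors can be computed from packed sign bits. It is not the authors' implementation; the helper names `pack_signs` and `xnor_dot` are hypothetical, and zero is treated as +1 by assumption.

```python
def pack_signs(values):
    """Pack signs into an int: bit i is 1 when values[i] >= 0 (treated as +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1} vectors of length n stored as bit masks."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 where the signs match
    matches = bin(agree).count("1")     # bitcount
    return 2 * matches - n              # matches minus mismatches


def sign(v):
    return 1 if v >= 0 else -1


a = [0.3, -1.2, 0.7, -0.1, 2.0]
b = [-0.5, -0.4, 0.9, 0.2, 1.1]
reference = sum(sign(x) * sign(y) for x, y in zip(a, b))
result = xnor_dot(pack_signs(a), pack_signs(b), len(a))
print(result, reference)
```

The bitwise result agrees with the plain sign-by-sign multiplication, which is why a regular convolution over already binarized tensors gives the same numbers as a true XNOR kernel, only without the speed benefit.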
Hi,
I am trying to run a comparison for a paper and couldn't get it to work on resnet18. Could you please share your resnet18 code so I can run your model?
Does anyone know how to convert the .t7 data format to formats compatible with TensorFlow, Theano, etc.? I have successfully read the file in Lua, and also in Python (torchfile.py ==> https://github.com/bshillingford/python-torchfile).
But I would like to convert it to h5 and run it with Keras etc., so I can define my own operations for the network. Can anyone help me out, please?
I don't know Lua very well. 🦃
Hi @mrastegari
Thank you for the nice paper and code! The work is really impressive. I have a question about the paper. As mentioned in another issue (#11), Torch does not support bit operations. But in the paper, there is a statement: "This results in 58× faster convolutional operations (in terms of number of the high precision operations) and 32× memory savings". I would appreciate it if you could explain how these values (i.e., 58× and 32×) are computed. Thank you! Wish you all the best!
Best,
Yongcheng
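A rough sketch of how figures of this kind can be derived from operation counts. The assumptions here are mine and should be checked against the paper: a standard convolution costs c·N_W·N_I operations, the binary version costs about (1/64)·c·N_W·N_I binary operations (64 binary operations per CPU clock) plus N_I non-binary ones, and the example layer has c = 256 channels with a 3×3 kernel. The function name is hypothetical.

```python
def theoretical_speedup(channels, kernel_h, kernel_w, binary_ops_per_clock=64):
    """Ratio of full-precision to binary operation counts.

    S = c*N_W*N_I / (c*N_W*N_I / 64 + N_I) = 64*c*N_W / (c*N_W + 64),
    so the input size N_I cancels out.
    """
    x = channels * kernel_h * kernel_w
    return binary_ops_per_clock * x / (x + binary_ops_per_clock)


speedup = theoretical_speedup(256, 3, 3)
# 32-bit float weights versus 1-bit binary weights.
memory_saving = 32 // 1
print(round(speedup, 2), memory_saving)  # 62.27 32
```

Under these assumptions the idealized count comes out at roughly 62×, so the 58× quoted in the paper is somewhat below the ideal figure; the 32× memory saving follows directly from storing one bit per weight instead of a 32-bit float.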
Hi, thanks for your excellent work. I have been focused on it for a while, and I think the core is the gradient optimization. But I still haven't been able to reproduce your experiment. Could you give me some advice?
I built the network with the block mentioned in your paper (B->A->C->P), and the backward pass uses full-precision data (weights and gradients) for ||r|| < 1.
Has anyone successfully run XNOR-Net? I have run the code dozens of times, but it has never converged. The error is always "nan". Any ideas on how to make the training converge?
How do I train on the CIFAR-10 dataset?
Hi
According to this line, it seems that the first and last layers of the Binary-weights-network are not binary. Am I missing something?
Thanks for providing the code to reproduce your work.
ping @mrastegari
Hello,
I want to use the XNOR-Net model to train on my custom data. How should I proceed?
Hi,
I am having trouble reproducing the AlexNet BWN result. I used the configuration you suggested in a previously closed issue (1 GPU, default LR, batch size 128, epoch size 10000), but I still got 45% after 55 epochs.
Could you please help me out? Thanks a lot!
BTW, the pretrained model works fine (56.82%). I also tried training with the configuration from the paper (batch size 512, LR 0.1, decay 0.01 every 4 epochs), but that doesn't work either.
Hello, I want to use the model alexnet_BWN.t7 to run prediction on a single image. How can I do so?
Please help!
Is there any implementation code for xnor_layer.c and xnor_layer.h for Darknet framework?
#include "xnor_layer.h"
#include "binary_convolution.h"
#include "convolutional_layer.h"

layer make_xnor_layer(int batch, int h, int w, int c, int n, int size, int stride, int pad, ACTIVATION activation, int batch_normalize)
{
    int i;
    layer l = {0};
    l.type = XNOR;
    l.h = h;
    l.w = w;
    l.c = c;
    l.n = n;
    l.batch = batch;
    l.stride = stride;
    l.size = size;
    l.pad = pad;
    l.batch_normalize = batch_normalize;
    l.filters = calloc(c*n*size*size, sizeof(float));
    l.biases = calloc(n, sizeof(float));
    int out_h = convolutional_out_height(l);
    int out_w = convolutional_out_width(l);
    l.out_h = out_h;
    l.out_w = out_w;
    l.out_c = n;
    l.outputs = l.out_h * l.out_w * l.out_c;
    l.inputs = l.w * l.h * l.c;
    l.output = calloc(l.batch*out_h * out_w * n, sizeof(float));
    if(batch_normalize){
        l.scales = calloc(n, sizeof(float));
        for(i = 0; i < n; ++i){
            l.scales[i] = 1;
        }
        l.mean = calloc(n, sizeof(float));
        l.variance = calloc(n, sizeof(float));
        l.rolling_mean = calloc(n, sizeof(float));
        l.rolling_variance = calloc(n, sizeof(float));
    }
    l.activation = activation;
    fprintf(stderr, "XNOR Layer: %d x %d x %d image, %d filters -> %d x %d x %d image\n", h,w,c,n, out_h, out_w, n);
    return l;
}

void forward_xnor_layer(const layer l, network_state state)
{
    int b = l.n;
    int c = l.c;
    int ix = l.w;
    int iy = l.h;
    int wx = l.size;
    int wy = l.size;
    int s = l.stride;
    int pad = l.pad * (l.size/2);
    // MANDATORY: Make the binary layer
    ai2_bin_conv_layer al = ai2_make_bin_conv_layer(b, c, ix, iy, wx, wy, s, pad);
    // OPTIONAL: You need to set the real-valued input like:
    ai2_setFltInput_unpadded(&al, state.input);
    // The above function will automatically binarize the input for the layer (channel wise).
    // If commented: using the default 0-valued input.
    ai2_setFltWeights(&al, l.filters);
    // The above function will automatically binarize the weights for the layer.
    // If commented: using the default 0-valued weights.
    // MANDATORY: Call forward
    ai2_bin_forward(&al);
    // OPTIONAL: Inspect outputs
    float *output = ai2_getFltOutput(&al); // output is of size l.px * l.py where px and py are the padded outputs
    memcpy(l.output, output, l.outputs*sizeof(float));
    // MANDATORY: Free layer
    ai2_free_bin_conv_layer(&al);
}
Hello,
I am trying to implement BWN in an AlexNet-like network and was a little confused by the following code:
if opt.binaryWeight then
    meancenterConvParms(convNodes)
    clampConvParms(convNodes)
    realParams:copy(parameters)
    binarizeConvParms(convNodes)
end
I understand the binarizeConvParms operation, which performs the approximation explained in the paper. However, the paper doesn't mention mean centering or clamping. Could anyone explain the rationale behind them?
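The sequence in the snippet above can be illustrated on a single flattened filter. This is a simplified sketch, not the repository's code: the function names are hypothetical, mean centering here subtracts the mean over the whole filter (the repository may reduce over a different dimension), zero is mapped to +α by assumption, and binarization follows the paper's α·sign(W) with α = mean(|W|).

```python
def mean_center(w):
    """Subtract the filter mean so the weights are centered around zero."""
    mean = sum(w) / len(w)
    return [x - mean for x in w]


def clamp(w, lo=-1.0, hi=1.0):
    """Keep the real-valued weights inside [lo, hi]."""
    return [min(max(x, lo), hi) for x in w]


def binarize(w):
    """Approximate w by alpha * sign(w), with alpha = mean(|w|)."""
    alpha = sum(abs(x) for x in w) / len(w)
    return [alpha if x >= 0 else -alpha for x in w]


real = [2.0, 0.5, -0.5, 0.0]
centered = mean_center(real)
clamped = clamp(centered)
# The real-valued copy (clamped) is what the optimizer keeps updating;
# the binarized copy is only used for the forward/backward pass.
binary = binarize(clamped)
print(centered, clamped, binary)
```

In this toy run, clamping keeps the real weights inside the range where the sign gradient estimate is non-zero, which is one plausible reading of why it is applied before binarization.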
Hello, I've always been confused about how BWN and XNOR-Net can be trained on large networks such as VGG-16 or ResNet-50.
I find it quite difficult to convert all the layers to binarized layers at once, because gradients often explode or vanish during training. I think converting one layer at a time might solve the problem. But is there any approach that avoids having to train the binarized layers separately?
Hi, thanks for your excellent work. I am reading your code to better understand the paper, and I have two questions about the updateBinaryGradWeight function.
I guess this function implements the formula "∂C/∂W_i = ∂C/∂W'_i · (1/n + (∂sign/∂W_i) · α)" in the third paragraph on page 6. I can match the code with this formula except for the "1-1/s[2]". What does it mean?
To figure this out I wrote a test, and unfortunately now I have another problem : ). Here is my code and the result.
local x = torch.Tensor(2,1):fill(1);
local y = x:expand(2,10);
y:add(1);
print(y)
11 11 11 11 11 11 11 11 11 11
11 11 11 11 11 11 11 11 11 11
[torch.DoubleTensor of size 2x10]
It seems something may go wrong when I use add() or mul() after expand(). Is there anything wrong with the way I understand the code?
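One possible explanation, offered as an assumption rather than a confirmed answer: expand() in Torch returns a view that does not allocate new memory, so the ten columns of each row alias a single stored element, and an in-place add(1) touches that element ten times. The pure-Python sketch below mimics this with a hypothetical `ExpandedView` class.

```python
class ExpandedView:
    """Toy 2-D view over a column vector with stride 0 along the columns."""

    def __init__(self, storage, cols):
        self.storage = storage  # one stored value per row, shared with the source
        self.cols = cols

    def add_(self, value):
        # An element-wise in-place add visits every (row, col) position,
        # but all columns of a row point at the same stored value.
        for row in range(len(self.storage)):
            for _ in range(self.cols):
                self.storage[row] += value

    def tolist(self):
        return [[v] * self.cols for v in self.storage]


x = [1.0, 1.0]            # plays the role of torch.Tensor(2,1):fill(1)
y = ExpandedView(x, 10)   # plays the role of x:expand(2,10)
y.add_(1.0)
print(y.tolist()[0])
```

Each stored value ends up at 1 + 10 × 1 = 11, matching the printed tensor. Copying the expanded tensor before modifying it (for example with clone()) would be expected to avoid the aliasing.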
I was running validation on a small amount of data taken from ImageNet, and the predictions for images of a single synset varied from image to image.
So I would like to know the correct order of the class labels/synsets used for training.
Hi, I really like this project and wonder if you can give me any advice on how to build a smaller architecture to train on MNIST.
On the last page of your paper, it says that B-A-C-P is the basic unit. So should novices just use that on its own as an exercise to understand the paper?
Thanks a lot 👍
Sorry for a potentially stupid question, but I failed to find an explicit answer to this vital question.
Hi
If I add nn.Linear to the network, can it be binarized? I have passed it through BinActiveZ. Is this all I need to do? What should I change in updateBinaryGradWeight and binarizeConvParms? Because binarizeConvParms only binarizes the conv nodes, right?
Is there any implementation of XNOR-Net in Caffe?
I'm very interested to embed a net in a phone.
I followed the README to test the pre-trained models. I get 56.8% accuracy from alexnet_BWN, but only 10% from alexnet_XNOR. Did I do something wrong? Has anyone else had this problem?
Just wanted to know if it is possible to get resnet18 BWN, used in the paper ?
Thanks!
Hey,
I'm getting the error below a couple of seconds after the first training epoch starts:
nClasses: 1000
nTest: 50000
==> doing epoch on training data:
==> online epoch # 1
cudnnFindConvolutionForwardAlgorithm failed: 2 convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 -padA2,2 -convStrideA4,4 CUDNN_DATA_FLOAT
/home/drodo/torch/install/bin/luajit: /home/drodo/torch/install/share/lua/5.1/threads/threads.lua:179: [thread 2 endcallback] /home/drodo/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/drodo/torch/install/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionForwardAlgorithm failed, sizes: convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA800,3,224,224 -filtA96,3,11,11 800,96,55,55 -padA2,2 -convStrideA4,4 CUDN
stack traceback:
[C]: in function 'error'
/home/drodo/torch/install/share/lua/5.1/cudnn/find.lua:483: in function 'forwardAlgorithm'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:190: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
[C]: in function 'xpcall'
/home/drodo/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/drodo/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
/home/drodo/xnornet/XNOR-Net/train.lua:176: in function </home/drodo/xnornet/XNOR-Net/train.lua:157>
[C]: in function 'xpcall'
/home/drodo/torch/install/share/lua/5.1/threads/threads.lua:174: in function 'dojob'
/home/drodo/torch/install/share/lua/5.1/threads/threads.lua:223: in function 'addjob'
/home/drodo/xnornet/XNOR-Net/train.lua:108: in function 'train'
main.lua:50: in main chunk
[C]: in function 'dofile'
...rodo/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
Data set is prepared exactly as indicated in the README.md and cuda config is also operational. Has anyone ever come across a similar error running this ConvNet?
Cheers & Thanks,
--
Dimitrios
Is there any implementation of XNOR-Net in Tensorflow?
The pretrained AlexNet models provided in the links are ~450 MB. Is this because Lua stores the network in an unoptimized manner? The paper mentions a network size of 7.5 MB in Fig. 4.
Thanks for clarifying.
Thanks for your excellent work! I cannot find the XNOR and bitcount operations from equation 11. Is it because models/alexnetxnor.lua only uses the general SpatialConvolution in Torch?
I'm not familiar with lua but I'm assuming updateGradInput() is simply zeroing gradOutput wherever input is out of [-1,1] range. Shouldn't it also multiply gradOutput by input values in that range?
I can see that BinActive doesn't bother computing the K tensor as per the paper, presumably because it didn't make a massive difference. But are you also using a different estimator for the derivative of the sign() function for binary activations than the one used for kernels (updateBinaryGradWeight)?
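The behaviour assumed in the question (pass the gradient through only where the input lies in [-1, 1]) can be sketched as follows. This illustrates that description only; it is not the repository's BinActiveZ code, and the function names are hypothetical.

```python
def bin_active_forward(inputs):
    """Binarize activations to {-1, +1}; zero is mapped to +1 by assumption."""
    return [1.0 if x >= 0 else -1.0 for x in inputs]


def bin_active_backward(inputs, grad_output):
    """Straight-through estimator: keep the gradient where |x| <= 1, zero it elsewhere."""
    return [g if -1.0 <= x <= 1.0 else 0.0 for x, g in zip(inputs, grad_output)]


inputs = [-1.5, -0.3, 0.0, 0.8, 2.0]
grad_output = [0.1, 0.2, 0.3, 0.4, 0.5]
outputs = bin_active_forward(inputs)
grad_input = bin_active_backward(inputs, grad_output)
print(outputs, grad_input)
```

Note that in this form the surviving gradients are passed through unchanged, without any multiplication by the input values or by a scaling factor, which is the difference from the weight-gradient rule that the question points at.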
Hi, I trained alexnet_XNOR myself.
(I have a smaller GPU, so the flags I used were "-nGPU 1 -batchSize 400".)
It seems to have converged and everything is fine (42.1% top-1 accuracy on the validation dataset).
But the problem is that the saved models take up a lot of disk space:
model_epoch.t7 needs 4.8 GB
optimState_epoch.t7 needs 0.75 GB
and we need 55 epochs!
I don't know why your downloadable pre-trained model needs only 0.5 GB.
Could you help me save disk space?
Hi,
I noticed you modified the util.lua file, adding m:add(1/(n)):mul(1-1/s[2]):mul(n) to the updateBinaryGradWeight function. However, I don't understand the origin of this modification or where in your paper it is discussed. Moreover, I am training a binarized version of ResNet, and the training diverges with the mul(n).
In addition, could you also upload the model you used for training the binarized ResNet?
Thank you.
Hi,
I've been trying to train SqueezeNet in both configurations (BWN and XNOR), but I can't get past 31% (24%, respectively) top-1 accuracy (I was expecting accuracies similar to AlexNet). I tried something similar to the GoogLeNet variant described in the paper (I replaced the expand layers with straightforward 3x3 convolutions, so there is no branching).
Have you tried to train this model? If so, could you please tell me how you did it?
Thank you,
Alex
ping @mrastegari
I don't have my own GPU. Is it possible to run a pre-trained XNOR-Net on Google Colaboratory using the GPU?
(*) means the conv operation, o is the element-wise product
# https://github.com/allenai/XNOR-Net/blob/master/models/alexnetxnor.lua#L16
local function BinConvolution(nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH)
    local C= nn.Sequential()
    C:add(nn.SpatialBatchNormalization(nInputPlane,1e-4,false))
    C:add(activation())
    C:add(cudnn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH))
    return C
end
In this implementation, after the input activation(), the next step is a direct convolution cudnn.SpatialConvolution() with parameters. But the paper's algorithm for input binarization is:
I * W ~= (sign(I) (*) sign(W)) o Ka = (sign(I) (*) sign(W)) o (A (*) k) a
where
A = torch.mean(input.abs(), 1, keepdim=True)
k = an averaging kernel with value 1/(w*h)
so A (*) k averages each input element with its neighboring elements. This is missing in the current implementation, where only (sign(I) (*) sign(W)) o a is calculated.
C:add(activation())
C:add(cudnn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH))
To capture the convolution with A and k from the paper, I would expect pseudocode like this in function BinConvolution(), in Python:
x = BinActiveZ(x)
# <=== === === === === === === === === === START
A = mean #shape N, 1, W, H
sign_I = x #shape N, Cin, W, H
kH = self.conv.weight.shape[2] #kernel height
kW = self.conv.weight.shape[3] #kernel width
k = torch.ones(1, 1, kH, kW) * (1/(kH*kW)) #setup averaging kernel k
conv_Ak = torch.nn.Conv2d(1, 1, kH, kW, padding=(kH//2, kW//2))
conv_Ak.weight.data = k
K = conv_Ak(A) #shape N, 1, W, H
#now calculate sign_I (*)sign_W o Ka
# since self.conv.weight is already binarized by binarizeConvParams() before batch starts,
# the `a` in `Ka` is included in `self.conv(x)` . The only missing part is `mul(K)`
# Hence:
x = self.conv(x).mul(K)
# <=== === === === === === === === === === END
Can you check if my understanding of the discrepancy is correct?
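To make the K = A (*) k step from the post above concrete without any framework, here is a small pure-Python sketch. It assumes stride 1 and no padding, the function names are hypothetical, and it only illustrates the averaging described in the paper, not what the repository does.

```python
def channel_mean_abs(inp):
    """A: mean of |I| across channels. inp is [C][H][W]; returns [H][W]."""
    c, h, w = len(inp), len(inp[0]), len(inp[0][0])
    return [[sum(abs(inp[ch][i][j]) for ch in range(c)) / c for j in range(w)]
            for i in range(h)]


def box_filter(a, kh, kw):
    """K = A (*) k, where every entry of k is 1/(kh*kw) (valid positions, stride 1)."""
    h, w = len(a), len(a[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            total = sum(a[i + di][j + dj] for di in range(kh) for dj in range(kw))
            row.append(total / (kh * kw))
        out.append(row)
    return out


# Two channels, 3x3 spatial size.
inp = [
    [[1.0, -2.0, 3.0], [0.0, 4.0, -1.0], [2.0, 2.0, -2.0]],
    [[-1.0, 2.0, 1.0], [2.0, 0.0, 1.0], [0.0, -2.0, 2.0]],
]
A = channel_mean_abs(inp)
K = box_filter(A, 2, 2)
print(A)
print(K)
```

Each entry of K is the scale for one output location, so K only lines up element-wise with the convolution output when the averaging uses the same kernel size, stride and padding as the binary convolution itself.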
Hi,
For the pretrained models (XNOR-Net and BWN), what is the corresponding data pre-processing procedure, and what are the related parameters? For example, if the pre-processing subtracts an average image from the input image, what is the average image?
Thanks.
I found that the model parameters and memory usage did not decrease, and the network did not run faster. I'm confused!
I was looking for the implementation of XNOR operator, but couldn't find any. I only saw regular Conv layers applied to binary valued inputs and weights. Does anyone know if the convolution layer is implemented at all, which uses XNOR and bitcount to replace regular matrix multiplication?
Thanks!
Hello!
I am trying to figure out the structure of your network from this code after reading your paper.
However, I couldn't find the scalar multiplications (average of weights / input data) in the model you've created in alexnetxnor.lua.
Also, there aren't any ReLU functions in the binarized convolution layers.
Is there anything wrong with the way I am understanding the network?
As far as I understand from the paper, scalar multiplication is the core idea that made possible the distinguished results compared to BinaryConnect and other BNNs.
Thank you.
Hi, thanks for sharing the code. Where can I find the implementation of the binary convolutional layer? I can only find "BinActiveZ.lua".
Hello,
I get an error when I try to use either of the trained models: it does not load the images from the ImageNet 2012 validation dataset, and I get the error message "Error could not load image".
I've pre-processed the images as described in the README file.
Mihai
I ran the code with the command th main.lua -data ./images -nGPU 2 -batchSize 512 -netType alexnet -binaryWeight -dropout 0.1
after changing the learning rate policy to
1, 4, 1e-1, 5e-4,
5, 8, 1e-3, 5e-4,
9, 12, 1e-5, 0.
13, 16, 1e-7, 0
I tried this to get the same result for BWN (AlexNet) as reported in the paper. However, the top-1 training accuracy after the first epoch is 7.82%, far from the reported figure. The top-5 training accuracy is 19.72%. Is there anything I missed?
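For readers unfamiliar with this kind of table, the sketch below shows one way to read it: each row is assumed to mean (first epoch, last epoch, learning rate, weight decay). The column meaning and the function name `params_for_epoch` are assumptions for illustration, not taken from the repository.

```python
# (start_epoch, end_epoch, learning_rate, weight_decay), copied from the post above.
REGIMES = [
    (1, 4, 1e-1, 5e-4),
    (5, 8, 1e-3, 5e-4),
    (9, 12, 1e-5, 0.0),
    (13, 16, 1e-7, 0.0),
]


def params_for_epoch(epoch, regimes=REGIMES):
    """Return (learning_rate, weight_decay) for a 1-based epoch number."""
    for start, end, lr, wd in regimes:
        if start <= epoch <= end:
            return lr, wd
    raise ValueError("no regime defined for epoch %d" % epoch)


for epoch in (1, 6, 13):
    print(epoch, params_for_epoch(epoch))
```

Read this way, the schedule divides the learning rate by 100 every four epochs and covers only 16 epochs in total.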