mostafaelhoushi / DeepShift
Implementation of "DeepShift: Towards Multiplication-Less Neural Networks" https://arxiv.org/abs/1905.13298
Great job!
I have some questions. Does this code only implement the forward-propagation shift operation? I could not find the back-propagation shift operation code. Can you explain where the back-propagation code is?
Thank you!
@mostafaelhoushi
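(For anyone else wondering about this: a common pattern in multiplication-less networks is a straight-through estimator (STE), where the forward pass rounds a weight to a signed power of two and the backward pass copies the gradient through unchanged, so no separate backward shift kernel is needed. Below is only a minimal sketch of that idea, not the repo's actual code; `RoundPowerOf2STE` is a hypothetical name.)

```python
import torch

class RoundPowerOf2STE(torch.autograd.Function):
    """Hypothetical STE sketch: forward rounds to a signed power of two,
    backward passes the gradient straight through."""

    @staticmethod
    def forward(ctx, x):
        sign = torch.sign(x)
        # round |x| to the nearest power of two by rounding in log2 space
        shift = torch.round(torch.log2(torch.abs(x)))
        return sign * (2.0 ** shift)

    @staticmethod
    def backward(ctx, grad_output):
        # straight-through: gradient w.r.t. x is grad_output unchanged
        return grad_output

w = torch.tensor([0.3, -0.7], requires_grad=True)
y = RoundPowerOf2STE.apply(w)   # 0.3 -> 0.25 (2**-2), -0.7 -> -0.5 (-2**-1)
y.sum().backward()              # w.grad is all ones despite the rounding
```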
Hi @mostafaelhoushi,
Theoretically, int8 can be 3-4 times faster than FP32.
But the current paper has almost no comparison with int8. Is that because int8 has some disadvantage?
For example, the APoT paper reports a speed twice that of FP32, which should still be slower than int8.
https://github.com/yhhhli/APoT_Quantization
Looking forward to your reply, thank you very much!
Best wishes
@mostafaelhoushi
Is it possible to apply mixed assembly techniques to DeepShift to achieve a large (>=5x) improvement in inference speed?
I successfully set everything up and compiled shift_kernel, but when importing shift_kernel, this error message appeared:
ImportError: /home/grant/venv/lib/python3.6/site-packages/shift_kernel-0.0.0-py3.6-linux-x86_64.egg/shift_kernel.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail36_typeMetaDataInstance_preallocated_4E
For shift_cuda_kernel, the error message is:
Segmentation fault (core dumped)
I am working on Ubuntu 18.04; everything else is as required.
Thank you for providing your code :)
I tried to train with your code because there are no trained models provided.
However, the accuracy is not as high as you reported.
I really want to run inference with DeepShift.
Could you provide your trained model files?
Thank you
The weight value is clipped to [-1, 1] in the code:
`self.shift_range = (-1 * (2**(weight_bits - 1) - 2), 0)`
But the weights should be allowed a small number of values exceeding 1.
For example, 2 is easily obtained by shifting: 1 << 1.
Would it be better to rewrite the code as follows?
`self.shift_range = (-1 * (2**(weight_bits - 1) - 2), 2**(weight_bits))`
Thank you. @mostafaelhoushi
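To make the concern above concrete, here is a small, purely illustrative sketch (assuming `weight_bits = 5`) of which power-of-two values the current clipping range can represent:

```python
# Illustrative sketch (weight_bits = 5 is an assumed example value):
# the current code clips shifts to (-(2**(weight_bits-1) - 2), 0],
# so the largest representable magnitude is 2**0 == 1.0.
weight_bits = 5
shift_range = (-1 * (2 ** (weight_bits - 1) - 2), 0)   # -> (-14, 0)
representable = [2.0 ** s for s in range(shift_range[0], shift_range[1] + 1)]
# values like 2.0 (i.e. 1 << 1) fall outside this set, which is exactly
# what the question above is pointing out
```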
Copying the question by @mengjingyouling from this issue to create a new issue:
We also want to discuss a problem with you. In your paper, the shift network is applied to classification networks, not object detection. What do you think? Would there be a decline in accuracy?
Because a single shift leads to some accuracy loss, we want to shift twice to compensate. For example: 10 = 8 + 2 (shift 3 bits + shift 1 bit). Therefore, we modified the code as follows:
```
def get_shift_and_sign(x, rounding='deterministic'):
    sign = torch.sign(x)
    x_abs = torch.abs(x)
    shift1 = round(torch.log(x_abs) / np.log(2), rounding)
    wr1 = 2 ** shift1
    w1 = x_abs - wr1
    shift2 = round(torch.log(w1) / np.log(2), rounding)
    return shift1, shift2, sign

def round_power_of_2(x, rounding='deterministic'):
    shift1, shift2, sign = get_shift_and_sign(x, rounding)
    x_rounded = (2.0 ** shift1 + 2.0 ** shift2) * sign
    return x_rounded
```
However, the input in the forward function of `class Conv2dShiftQ(_ConvNdShiftQ)` becomes NaN, which should be caused by numerical overflow:
```
class Conv2dShiftQ(_ConvNdShiftQ):
    ...

    # @weak_script_method
    def forward(self, input):
        print("--------------------------------------forward---------------------------------------------------")
        print("input======", input)
```
Can you give some suggestions to solve it? Thank you very much.
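One guess at the cause of the NaN: when |x| is itself a power of two (or rounds up), the residual w1 = |x| - 2**shift1 is <= 0, and taking its log gives NaN or -inf. A hedged sketch of a possible fix is below; `round_fn` here is a simplified deterministic stand-in for the repo's rounding helper, and the epsilon value is an assumption:

```python
import torch

def round_fn(x, rounding='deterministic'):
    # simplified stand-in for the repo's round() helper: deterministic only
    return torch.round(x)

def get_shift_and_sign(x, rounding='deterministic'):
    """Two-shift decomposition with a guard against log of non-positive
    residuals: clamp w1 to a tiny positive epsilon before taking log2."""
    sign = torch.sign(x)
    x_abs = torch.abs(x)
    shift1 = round_fn(torch.log2(x_abs), rounding)
    wr1 = 2.0 ** shift1
    w1 = torch.clamp(x_abs - wr1, min=1e-12)  # avoid log2 of values <= 0
    shift2 = round_fn(torch.log2(w1), rounding)
    return shift1, shift2, sign

# 10 decomposes as 2**3 + 2**1; 4 has zero residual, but shift2 stays finite
s1, s2, sgn = get_shift_and_sign(torch.tensor([10.0]))
s1b, s2b, _ = get_shift_and_sign(torch.tensor([4.0]))
```

When the residual is clamped, the second shift becomes so negative that 2**shift2 is negligible, so the result degrades gracefully to a single shift instead of producing NaN.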
Passing a `weights.pth` file to the `--weights` option is not working properly; it probably doesn't load the weights. Passing a `checkpoint.pth.tar` file to `--weights` works properly.
We have done some experiments. As reported in your paper, the accuracy of shift in some experiments is higher than that of multiplication.
Have you thought about the reason? We think it is because the sparsity (discontinuity) introduced by DeepShift lets the model converge to the optimal solution instead of a suboptimal one.
Thanks
Hi @mostafaelhoushi,
The input x is converted to 32-bit fixed point in your paper, as follows:
```
def round_to_fixed(input, integer_bits=16, fraction_bits=16):
    assert integer_bits >= 1, integer_bits
    # TODO: Deal with unsigned tensors where there is no sign bit
    # which is the case with activations to convolution that
    # are usually the output of a Relu layer
    if integer_bits == 1:
        return torch.sign(input) - 1
    delta = math.pow(2.0, -(fraction_bits))
    bound = math.pow(2.0, integer_bits - 1)
    min_val = -bound
    max_val = bound - 1
    rounded = torch.floor(input / delta) * delta
    clipped_value = torch.clamp(rounded, min_val, max_val)
    return clipped_value
```
The comment in this function says it deals with unsigned tensors.
But we think it actually handles signed tensors. For example:
signed int8 = [-128, 127]
round_to_fixed(-128, integer_bits=8, fraction_bits=8) = -128
Are we right? Thank you.
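A quick numeric check of this reading: the snippet below reproduces the `round_to_fixed` code from the issue above as a stand-alone script (imports added) and shows that -128 passes through unchanged while out-of-range values get clipped to the signed maximum.

```python
import math
import torch

# Reproduction of the round_to_fixed snippet quoted above, to check the
# signed-range behaviour with integer_bits=8, fraction_bits=8.
def round_to_fixed(input, integer_bits=16, fraction_bits=16):
    if integer_bits == 1:
        return torch.sign(input) - 1
    delta = math.pow(2.0, -(fraction_bits))
    bound = math.pow(2.0, integer_bits - 1)     # 128 for integer_bits=8
    rounded = torch.floor(input / delta) * delta
    return torch.clamp(rounded, -bound, bound - 1)

x = torch.tensor([-128.0, 127.0, 200.0])
y = round_to_fixed(x, integer_bits=8, fraction_bits=8)
# -128 survives unchanged and 200 is clipped to 127, i.e. the function
# covers the signed range [-2**(integer_bits-1), 2**(integer_bits-1) - 1]
```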
Some alternatives to consider:
--shift-depth all
--shift-all
Hi, thanks for the awesome work!
I am very interested in this work. However, I am new to the area of quantization and have some questions about the round_to_fixed function in deepshift/utils.py.
This function aims to convert the input from FP32 to a fixed-point format (e.g., fix16), to mimic the shift operation and the precision of fixed-point input.
While the range of FP32 is very large, I don't see how this round_to_fixed function can convert the input to merely 16 bits. In my opinion, delta should be chosen together with the range of the input. If the input is in [-1, 1], this function works fine (although the bound should then also be 1), so is there an implicit assumption that the input is in [-1, 1]? Or how should I set the default parameters (fraction_bits and integer_bits) if I want to convert the input to fix16?
Could you give me some comments on the difference between these two implementations? Thanks!!
In your excellent work DeepShift, where the input x and activations are quantized, do wx+b or wx not need to be quantized? If they are not quantized, they would occupy 64 bits of memory, right? Does this affect model acceleration?
best wishes
thank you!
When I compile shift.cu, I get an error:
error: no instance of function template "DEEP_SHIFT_GEMM_GPU_KERNEL" matches the argument list
argument types are: (int *, int *, int *, int *, int, int, int, int, int, int)
The error occurs in DEEP_SHIFT_LINEAR_GPU and DEEP_SHIFT_CONV_GPU when bits == 7.
According to my understanding, the template of DEEP_SHIFT_GEMM_GPU_KERNEL is:
template <int num, int bits, char mask_shift, char mask_sign, bool zero_base>
Where the error is reported, the template is instantiated as:
DEEP_SHIFT_GEMM_GPU_KERNEL<NUM_4, BIT_6, 0x7f,0x80, NON_ZERO_BASE><<<gridDim, blockDim>>>
Since char can only represent numbers from -0x80 to 0x7f, my compiler raised an error.
So I suggest changing the template of DEEP_SHIFT_GEMM_GPU_KERNEL to:
template <int num, int bits, unsigned char mask_shift, unsigned char mask_sign, bool zero_base>
This change really solved my problem.
However, for my colleagues this issue did not cause an error, so I guess it may be related to the compiler version.
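The signed/unsigned `char` range mismatch described above can be illustrated from Python with the standard `struct` module (this is just an illustration of the ranges, not the CUDA code itself): format `'b'` is a signed char with range [-128, 127], while `'B'` is an unsigned char with range [0, 255].

```python
import struct

# 0x80 == 128 overflows a signed char ('b') but fits an unsigned char ('B'),
# which is why changing the template parameters to unsigned char compiles.
try:
    struct.pack('b', 0x80)      # out of the signed range [-128, 127]
    signed_char_fits = True
except struct.error:
    signed_char_fits = False

unsigned_bytes = struct.pack('B', 0x80)   # fits the unsigned range [0, 255]
```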
The mnist.py code implements the training process of the DeepShift method. It is a complete pipeline: the model is trained and then tested, with the test model switched to evaluation mode via model.eval(). After training, the model weight file (.pth) is saved. This training process achieves high accuracy (train_log.csv).
However, we found that loading the generated weight file (weights.pth) for inference (the test function) reduces the accuracy.
Do you have any suggestions?
Thanks!
Training strictly followed the README and converged on MNIST.
However, when we used the PS shift type to train ResNet-18, it did not converge: we got a top-1 accuracy of 5.13% on the validation set after 90 epochs of training. We made sure there is no problem with the dataset.
Could you provide some guidance on this?
A CPU kernel is implemented in the project. We want to know which CPUs can support it, and what the acceleration efficiency is.
Thank you very much