mostafaelhoushi / DeepShift
Implementation of "DeepShift: Towards Multiplication-Less Neural Networks" https://arxiv.org/abs/1905.13298
Great job!
I have some questions. Does this code only implement the forward-propagation shift operation? I could not find the back-propagation shift operation code. Can you explain where the back-propagation code is?
Thank you!
@mostafaelhoushi
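(For anyone else wondering about this: a common pattern in multiplication-less networks is a straight-through estimator (STE), where the forward pass rounds a weight to a signed power of two and the backward pass copies the gradient through unchanged, so no separate backward shift kernel is needed. Below is only a minimal sketch of that idea, not the repo's actual code; `RoundPowerOf2STE` is a hypothetical name.)

```python
import torch

class RoundPowerOf2STE(torch.autograd.Function):
    """Hypothetical STE sketch: forward rounds to a signed power of two,
    backward passes the gradient straight through."""

    @staticmethod
    def forward(ctx, x):
        sign = torch.sign(x)
        # round |x| to the nearest power of two by rounding in log2 space
        shift = torch.round(torch.log2(torch.abs(x)))
        return sign * (2.0 ** shift)

    @staticmethod
    def backward(ctx, grad_output):
        # straight-through: gradient w.r.t. x is grad_output unchanged
        return grad_output

w = torch.tensor([0.3, -0.7], requires_grad=True)
y = RoundPowerOf2STE.apply(w)   # 0.3 -> 0.25 (2**-2), -0.7 -> -0.5 (-2**-1)
y.sum().backward()              # w.grad is all ones despite the rounding
```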
Hi @mostafaelhoushi,
Theoretically, int8 can be 3-4 times faster than FP32.
But the current paper has almost no comparison with int8. Is that because int8 has some disadvantage?
For example, the APoT paper reports a speed twice that of FP32, which should still be slower than int8.
https://github.com/yhhhli/APoT_Quantization
Looking forward to your reply, thank you very much!
Best wishes
@mostafaelhoushi
Is it possible to apply mixed assembly techniques to DeepShift to achieve a large (>=5x) improvement in inference speed?
I successfully set everything up and compiled shift_kernel, but when importing shift_kernel, this error message appeared:
ImportError: /home/grant/venv/lib/python3.6/site-packages/shift_kernel-0.0.0-py3.6-linux-x86_64.egg/shift_kernel.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail36_typeMetaDataInstance_preallocated_4E
For shift_cuda_kernel, the error message is:
Segmentation fault (core dumped)
I am working on Ubuntu 18.04; everything else is as required.
Thank you for providing your code :)
I tried to train with your code because there are no trained models provided.
However, the accuracy is not as high as you reported.
I really want to run inference with DeepShift.
Could you provide your trained model files?
Thank you
The weight value is clipped to [-1, 1] in the code:
`self.shift_range = (-1 * (2**(weight_bits - 1) - 2), 0)`
But the weights should be allowed a small number of values exceeding 1.
For example, 2 is easily obtained by shifting: 1 << 1.
Would it be better to rewrite the code as follows?
`self.shift_range = (-1 * (2**(weight_bits - 1) - 2), 2**(weight_bits))`
Thank you. @mostafaelhoushi
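To make the concern above concrete, here is a small, purely illustrative sketch (assuming `weight_bits = 5`) of which power-of-two values the current clipping range can represent:

```python
# Illustrative sketch (weight_bits = 5 is an assumed example value):
# the current code clips shifts to (-(2**(weight_bits-1) - 2), 0],
# so the largest representable magnitude is 2**0 == 1.0.
weight_bits = 5
shift_range = (-1 * (2 ** (weight_bits - 1) - 2), 0)   # -> (-14, 0)
representable = [2.0 ** s for s in range(shift_range[0], shift_range[1] + 1)]
# values like 2.0 (i.e. 1 << 1) fall outside this set, which is exactly
# what the question above is pointing out
```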
Copying the question by @mengjingyouling from this issue to create a new issue:
We also want to discuss a problem with you. In your paper, the shift network is applied to classification networks, not object detection. What do you think? Would there be a decline in accuracy?
Because a single shift leads to some accuracy loss, we want to shift twice to compensate. For example: 10 = 8 + 2 (shift 3 bits + shift 1 bit). Therefore, we modified the code as follows:
```
def get_shift_and_sign(x, rounding='deterministic'):
    sign = torch.sign(x)
    x_abs = torch.abs(x)
    shift1 = round(torch.log(x_abs) / np.log(2), rounding)
    wr1 = 2 ** shift1
    w1 = x_abs - wr1
    shift2 = round(torch.log(w1) / np.log(2), rounding)
    return shift1, shift2, sign

def round_power_of_2(x, rounding='deterministic'):
    shift1, shift2, sign = get_shift_and_sign(x, rounding)
    x_rounded = (2.0 ** shift1 + 2.0 ** shift2) * sign
    return x_rounded
```
However, the input in the forward function of `class Conv2dShiftQ(_ConvNdShiftQ)` becomes NaN, which should be caused by numerical overflow:
```
class Conv2dShiftQ(_ConvNdShiftQ):
    ...

    # @weak_script_method
    def forward(self, input):
        print("--------------------------------------forward---------------------------------------------------")
        print("input======", input)
```
Can you give some suggestions to solve it? Thank you very much.
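One guess at the cause of the NaN: when |x| is itself a power of two (or rounds up), the residual w1 = |x| - 2**shift1 is <= 0, and taking its log gives NaN or -inf. A hedged sketch of a possible fix is below; `round_fn` here is a simplified deterministic stand-in for the repo's rounding helper, and the epsilon value is an assumption:

```python
import torch

def round_fn(x, rounding='deterministic'):
    # simplified stand-in for the repo's round() helper: deterministic only
    return torch.round(x)

def get_shift_and_sign(x, rounding='deterministic'):
    """Two-shift decomposition with a guard against log of non-positive
    residuals: clamp w1 to a tiny positive epsilon before taking log2."""
    sign = torch.sign(x)
    x_abs = torch.abs(x)
    shift1 = round_fn(torch.log2(x_abs), rounding)
    wr1 = 2.0 ** shift1
    w1 = torch.clamp(x_abs - wr1, min=1e-12)  # avoid log2 of values <= 0
    shift2 = round_fn(torch.log2(w1), rounding)
    return shift1, shift2, sign

# 10 decomposes as 2**3 + 2**1; 4 has zero residual, but shift2 stays finite
s1, s2, sgn = get_shift_and_sign(torch.tensor([10.0]))
s1b, s2b, _ = get_shift_and_sign(torch.tensor([4.0]))
```

When the residual is clamped, the second shift becomes so negative that 2**shift2 is negligible, so the result degrades gracefully to a single shift instead of producing NaN.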
Passing a `weights.pth` file to the `--weights` option is not working properly; it probably doesn't load the weights. Passing a `checkpoint.pth.tar` file to `--weights` works properly.
We have done some experiments. As reported in your paper, the accuracy of shift in some experiments is higher than that of multiplication.
Have you thought about the reason? We think it is because the sparsity (discontinuity) introduced by DeepShift lets the model converge to the optimal solution instead of a suboptimal one.
Thanks
Hi @mostafaelhoushi,
The input x is converted to 32-bit fixed point in your paper, as follows:
```
def round_to_fixed(input, integer_bits=16, fraction_bits=16):
    assert integer_bits >= 1, integer_bits
    # TODO: Deal with unsigned tensors where there is no sign bit
    # which is the case with activations to convolution that
    # are usually the output of a Relu layer
    if integer_bits == 1:
        return torch.sign(input) - 1
    delta = math.pow(2.0, -(fraction_bits))
    bound = math.pow(2.0, integer_bits - 1)
    min_val = -bound
    max_val = bound - 1
    rounded = torch.floor(input / delta) * delta
    clipped_value = torch.clamp(rounded, min_val, max_val)
    return clipped_value
```
The comment in this function says it deals with unsigned tensors.
But we think it actually handles signed tensors. For example:
signed int8 = [-128, 127]
round_to_fixed(-128, integer_bits=8, fraction_bits=8) = -128
Are we right? Thank you.
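A quick numeric check of this reading: the snippet below reproduces the `round_to_fixed` code from the issue above as a stand-alone script (imports added) and shows that -128 passes through unchanged while out-of-range values get clipped to the signed maximum.

```python
import math
import torch

# Reproduction of the round_to_fixed snippet quoted above, to check the
# signed-range behaviour with integer_bits=8, fraction_bits=8.
def round_to_fixed(input, integer_bits=16, fraction_bits=16):
    if integer_bits == 1:
        return torch.sign(input) - 1
    delta = math.pow(2.0, -(fraction_bits))
    bound = math.pow(2.0, integer_bits - 1)     # 128 for integer_bits=8
    rounded = torch.floor(input / delta) * delta
    return torch.clamp(rounded, -bound, bound - 1)

x = torch.tensor([-128.0, 127.0, 200.0])
y = round_to_fixed(x, integer_bits=8, fraction_bits=8)
# -128 survives unchanged and 200 is clipped to 127, i.e. the function
# covers the signed range [-2**(integer_bits-1), 2**(integer_bits-1) - 1]
```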
Some alternatives to consider:
--shift-depth all
--shift-all
Hi, thanks for the awesome work!
I am very interested in this work. However, I am new to the area of quantization and have some questions about the round_to_fixed function in deepshift/utils.py.
This function aims to convert the input from FP32 to a fixed-point format (e.g., fix16), to mimic the shift operation and the precision of fixed-point input.
While the range of FP32 is very large, I don't see how this round_to_fixed function can convert the input to merely 16 bits. In my opinion, delta should be chosen together with the range of the input. If the input is in [-1, 1], this function works fine (although the bound should then also be 1), so is there an implicit assumption that the input is in [-1, 1]? Or how should I set the default parameters (fraction_bits and integer_bits) if I want to convert the input to fix16?
Could you give me some comments on the difference between these two implementations? Thanks!!
In your excellent work DeepShift, where the input x and activations are quantized, do wx+b or wx not need to be quantized? If they are not quantized, they would occupy 64 bits of memory, right? Does this affect model acceleration?
best wishes
thank you!
When I compile shift.cu, I get an error:
error: no instance of function template "DEEP_SHIFT_GEMM_GPU_KERNEL" matches the argument list
argument types are: (int *, int *, int *, int *, int, int, int, int, int, int)
The error occurs in DEEP_SHIFT_LINEAR_GPU and DEEP_SHIFT_CONV_GPU when bits == 7.
According to my understanding, the template of DEEP_SHIFT_GEMM_GPU_KERNEL is:
template <int num, int bits, char mask_shift, char mask_sign, bool zero_base>
Where the error is reported, the template is instantiated as:
DEEP_SHIFT_GEMM_GPU_KERNEL<NUM_4, BIT_6, 0x7f,0x80, NON_ZERO_BASE><<<gridDim, blockDim>>>
Since char can only represent numbers from -0x80 to 0x7f, my compiler raised an error.
So I suggest changing the template of DEEP_SHIFT_GEMM_GPU_KERNEL to:
template <int num, int bits, unsigned char mask_shift, unsigned char mask_sign, bool zero_base>
This change really solved my problem.
However, for my colleagues this issue did not cause an error, so I guess it may be related to the compiler version.
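The signed/unsigned `char` range mismatch described above can be illustrated from Python with the standard `struct` module (this is just an illustration of the ranges, not the CUDA code itself): format `'b'` is a signed char with range [-128, 127], while `'B'` is an unsigned char with range [0, 255].

```python
import struct

# 0x80 == 128 overflows a signed char ('b') but fits an unsigned char ('B'),
# which is why changing the template parameters to unsigned char compiles.
try:
    struct.pack('b', 0x80)      # out of the signed range [-128, 127]
    signed_char_fits = True
except struct.error:
    signed_char_fits = False

unsigned_bytes = struct.pack('B', 0x80)   # fits the unsigned range [0, 255]
```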
The mnist.py code implements the training process of the DeepShift method. It is a complete pipeline: the model is trained and then tested, with the test model switched to evaluation mode via model.eval(). After training, the model weight file (.pth) is saved. This training process achieves high accuracy (train_log.csv).
However, we found that loading the generated weight file (weights.pth) for inference (the test function) reduces the accuracy.
Do you have any suggestions?
Thanks!
Training strictly followed the README and converged on MNIST.
However, when we used the PS shift type to train ResNet-18, it did not converge: we got a top-1 accuracy of 5.13% on the validation set after 90 epochs of training. We made sure there is no problem with the dataset.
Could you provide some guidance on this?
A CPU kernel is implemented in the project. We want to know which CPUs can support it, and what the acceleration efficiency is.
Thank you very much