
addernet's Introduction

AdderNet: Do We Really Need Multiplications in Deep Learning?

This code is a demo of the CVPR 2020 paper AdderNet: Do We Really Need Multiplications in Deep Learning?

We present adder networks (AdderNets) to trade the massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions, reducing computation costs. In AdderNets, we take the L1-norm distance between the filter and the input feature as the output response. As a result, the proposed AdderNets achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy with ResNet-50 on the ImageNet dataset, without any multiplications in the convolution layers.
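
To make this concrete, here is a minimal sketch of the response at a single output position (an illustration of the idea, not the repository's actual adder.py implementation):

```python
import torch

def adder_response(patch: torch.Tensor, filt: torch.Tensor) -> torch.Tensor:
    # Response of an adder layer at one output position: the negative
    # L1 distance between an input patch and a filter, both of shape
    # (in_channels, k, k). Only subtractions, absolute values, and
    # additions are involved -- no multiplications.
    return -(patch - filt).abs().sum()

patch = torch.randn(3, 3, 3)   # a 3-channel 3x3 input patch
filt = torch.randn(3, 3, 3)    # a matching adder filter
print(adder_response(patch, filt))  # scalar response, always <= 0
```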


Classification results on CIFAR-10 and CIFAR-100 datasets.

| Model | Method | CIFAR-10 | CIFAR-100 |
| --- | --- | --- | --- |
| VGG-small | ANN | 93.72% | 72.64% |
| | PKKD ANN | 95.03% | 76.94% |
| | SLAC ANN | 93.96% | 73.63% |
| ResNet-20 | ANN | 92.02% | 67.60% |
| | PKKD ANN | 92.96% | 69.93% |
| | SLAC ANN | 92.29% | 68.31% |
| | ShiftAddNet* | 89.32% (160 epochs) | - |
| ResNet-32 | ANN | 93.01% | 69.17% |
| | PKKD ANN | 93.62% | 72.41% |
| | SLAC ANN | 93.24% | 69.83% |

Classification results on ImageNet dataset.

| Model | Method | Top-1 Acc | Top-5 Acc |
| --- | --- | --- | --- |
| ResNet-18 | CNN | 69.8% | 89.1% |
| | ANN | 67.0% | 87.6% |
| | PKKD ANN | 68.8% | 88.6% |
| | SLAC ANN | 67.7% | 87.9% |
| ResNet-50 | CNN | 76.2% | 92.9% |
| | ANN | 74.9% | 91.7% |
| | PKKD ANN | 76.8% | 93.3% |
| | SLAC ANN | 75.3% | 92.6% |

*ShiftAddNet used a different training setting.

Super-Resolution results on several SR datasets.

| Scale | Model | Method | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) |
| --- | --- | --- | --- | --- | --- | --- |
| ×2 | VDSR | CNN | 37.53/0.9587 | 33.03/0.9124 | 31.90/0.8960 | 30.76/0.9140 |
| | | ANN | 37.37/0.9575 | 32.91/0.9112 | 31.82/0.8947 | 30.48/0.9099 |
| | EDSR | CNN | 38.11/0.9601 | 33.92/0.9195 | 32.32/0.9013 | 32.93/0.9351 |
| | | ANN | 37.92/0.9589 | 33.82/0.9183 | 32.23/0.9000 | 32.63/0.9309 |
| ×3 | VDSR | CNN | 33.66/0.9213 | 29.77/0.8314 | 28.82/0.7976 | 27.14/0.8279 |
| | | ANN | 33.47/0.9151 | 29.62/0.8276 | 28.72/0.7953 | 26.95/0.8189 |
| | EDSR | CNN | 34.65/0.9282 | 30.52/0.8462 | 29.25/0.8093 | 28.80/0.8653 |
| | | ANN | 34.35/0.9212 | 30.33/0.8420 | 29.13/0.8068 | 28.54/0.8555 |
| ×4 | VDSR | CNN | 31.35/0.8838 | 28.01/0.7674 | 27.29/0.7251 | 25.18/0.7524 |
| | | ANN | 31.27/0.8762 | 27.93/0.7630 | 27.25/0.7229 | 25.09/0.7445 |
| | EDSR | CNN | 32.46/0.8968 | 28.80/0.7876 | 27.71/0.7420 | 26.64/0.8033 |
| | | ANN | 32.13/0.8864 | 28.57/0.7800 | 27.58/0.7368 | 26.33/0.7874 |

Adversarial robustness on CIFAR-10 under white-box attacks without adversarial training.

| Model | Method | Clean | FGSM | BIM7 | PGD7 | MIM5 | RFGSM5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-20 | CNN | 92.68 | 16.33 | 0.00 | 0.00 | 0.01 | 0.00 |
| | ANN | 91.72 | 18.42 | 0.00 | 0.00 | 0.04 | 0.00 |
| | CNN-R | 90.62 | 17.23 | 3.46 | 3.67 | 4.23 | 0.06 |
| | ANN-R | 90.95 | 29.93 | 29.30 | 29.72 | 32.25 | 3.38 |
| | ANN-R-AWN | 90.55 | 45.93 | 42.62 | 43.39 | 46.52 | 18.36 |
| ResNet-32 | CNN | 92.78 | 23.55 | 0.00 | 0.01 | 0.10 | 0.00 |
| | ANN | 92.48 | 35.85 | 0.03 | 0.11 | 1.04 | 0.02 |
| | CNN-R | 91.32 | 20.41 | 5.15 | 5.27 | 6.09 | 0.07 |
| | ANN-R | 91.68 | 19.74 | 15.96 | 16.08 | 17.48 | 0.07 |
| | ANN-R-AWN | 91.25 | 61.30 | 59.41 | 59.74 | 61.54 | 39.79 |

Comparisons of mAP on PASCAL VOC.

| Model | Backbone | Neck | mAP |
| --- | --- | --- | --- |
| Faster R-CNN | Conv R50 | Conv | 79.5 |
| FCOS | Conv R50 | Conv | 79.1 |
| RetinaNet | Conv R50 | Conv | 77.3 |
| FoveaBox | Conv R50 | Conv | 76.6 |
| Adder-FCOS | Adder R50 | Adder | 76.5 |

Requirements

  • python 3
  • pytorch >= 1.1.0
  • torchvision
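
These can be installed with pip, for example: `pip install "torch>=1.1.0" torchvision`.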

Preparation

You can follow pytorch/examples to prepare the ImageNet data.

The pretrained models are available on Google Drive or Baidu Cloud (access code: 126b).

Usage

Run `python main.py` to train on CIFAR-10.

Run `python test.py --data_dir 'path/to/imagenet_root/'` to evaluate on the ImageNet validation set. You will achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy on the ImageNet dataset using ResNet-50.

Run `python test.py --dataset cifar10 --model_dir models/ResNet20-AdderNet.pth --data_dir 'path/to/cifar10_root/'` to evaluate on CIFAR-10. You will achieve 91.8% accuracy on the CIFAR-10 dataset using ResNet-20.
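
Alternatively, a checkpoint can be loaded directly in Python. A minimal sketch, assuming `resnet20.py` in this repository exposes a `resnet20()` constructor (check the file for the actual name) and the checkpoint stores a plain state dict:

```python
import torch
from resnet20 import resnet20  # assumption: resnet20.py defines resnet20()

model = resnet20()
state = torch.load('models/ResNet20-AdderNet.pth', map_location='cpu')
# Checkpoints saved through nn.DataParallel prefix keys with "module.";
# strip the prefix if load_state_dict reports unexpected keys.
state = {k.replace('module.', '', 1): v for k, v in state.items()}
model.load_state_dict(state)
model.eval()
```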

Inference and training with AdderNets are slow, since the adder filters are implemented without CUDA acceleration. You can write a CUDA kernel to achieve higher inference speed.
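
For intuition, here is one way such a layer can be vectorized in pure PyTorch (a rough sketch under our own assumptions, not necessarily how the repository's adder.py does it): unfold the input into patches and compute pairwise L1 distances between every filter and every patch.

```python
import torch
import torch.nn.functional as F

def adder2d_forward(x, weight, stride=1, padding=0):
    # x: (N, C_in, H, W); weight: (C_out, C_in, k, k).
    # Returns (N, C_out, H_out, W_out) of negative L1 distances.
    n, _, h, w = x.shape
    c_out, _, k, _ = weight.shape
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    cols = F.unfold(x, k, stride=stride, padding=padding)   # (N, C_in*k*k, L)
    filt = weight.view(c_out, -1)                           # (C_out, C_in*k*k)
    # Pairwise L1 distances between every patch and every filter.
    dist = torch.cdist(cols.transpose(1, 2), filt.unsqueeze(0), p=1)  # (N, L, C_out)
    return -dist.transpose(1, 2).reshape(n, c_out, h_out, w_out)

x = torch.randn(2, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
print(adder2d_forward(x, w, padding=1).shape)  # torch.Size([2, 4, 8, 8])
```

Note that unfolding materializes every patch explicitly, which is one reason pure-PyTorch adder layers tend to be slow and memory-hungry compared with cuDNN convolutions.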

Citation

```bibtex
@article{AdderNet,
    title={AdderNet: Do We Really Need Multiplications in Deep Learning?},
    author={Chen, Hanting and Wang, Yunhe and Xu, Chunjing and Shi, Boxin and Xu, Chao and Tian, Qi and Xu, Chang},
    journal={CVPR},
    year={2020}
}
```

Contributing

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR, because we might be taking the core in a different direction than you might be aware of.


addernet's Issues

How do you visualize the feature?

Hi, thanks for your brilliant work. However, I could not find details of how you visualize the high-dimensional features of the network. Do you visualize the entire last feature map of shape (N, C, H, W), the average-pooled feature map, or the last layer before classification? Which visualization method do you use?

Questions about the AdderSR paper

Hello, I have a few questions:
1. The original VDSR model used in the AdderSR paper does not contain BN layers. Must every adder layer be followed by a BN layer to prevent gradient explosion?
2. Can other kinds of normalization layers be used after an adder layer?
3. If I introduce adder layers into another task that already uses a different normalization layer A, would a direct replacement result in adder + BN + A? Can two normalization layers coexist?

Thank you very much for taking the time to answer!

BatchNorm performs poorly

Hello, I read your paper and think it is very good. I recently replaced Conv with Adder in a project, but I found that BatchNorm does not seem to work well in some scenarios. Do you have any suggestions for a replacement? Thanks.

Accelerated AdderNet

Hello, when will the accelerated version of AdderNet be released? This has been asked before, but it still does not seem to be available (or I have not found it).
I would like to apply AdderNet to tasks in other domains, but because training is too slow and GPU memory usage is too high, the current version of AdderNet is almost unusable in practice.

Thanks 🙏🙏

A question about ImageProcessingTransformer

Hello, sorry to bother you. I read your team's Pre-Trained Image Processing Transformer paper and benefited a lot. I am trying to reproduce it, but I ran into problems writing the Dataloader and Datasets: I do not know how to crop the images or how to preprocess the data. The paper mentions a 10-pixel overlap, but how are the edges handled?
I would be very grateful if you could share some insight.
Many thanks, and I look forward to your reply.

Learning rate in the released code vs. the paper

Hello! Is the learning rate set in the released code different from the one described in the paper?
One uses cosine decay, while the other uses an adaptive learning rate?
Could you clarify? Thanks!

Please update the unfair comparison with ShiftAddNet

Hi,

We recently noticed that you updated the comparison with ShiftAddNet in your README.

However, the chosen ShiftAddNet accuracy (i.e., 85.10%) was mistakenly taken from our adaptation experiments, in which we pre-train on only half of the CIFAR-10 dataset. Our accuracy (ResNet-20 on CIFAR-10) should be 89.32%, which was also obtained with a different training setting and a different implementation (e.g., only 160 epochs).

Could you help to update the comparison results?

Best regards,
Haoran You

Why is adder.adder2d slower than torch.nn.Conv2d? I'm confused.

I replaced torch.nn.Conv2d with adder.adder2d, and replaced torch.cdist with my_cdist, then trained my network; the new model is at least six times slower than the old one. I'm confused.

```python
import torch

@torch.jit.script
def my_cdist(x1, x2):
    # Pairwise Euclidean (p=2) distances via the expansion
    # ||a - b||^2 = ||a||^2 - 2*a.b + ||b||^2.
    x1_norm = x1.pow(2).sum(dim=-1, keepdim=True)
    x2_norm = x2.pow(2).sum(dim=-1, keepdim=True)
    # addmm computes x2_norm^T + (-2) * (x1 @ x2^T); then broadcast-add x1_norm.
    res = torch.addmm(x2_norm.transpose(-2, -1), x1, x2.transpose(-2, -1), alpha=-2).add_(x1_norm)
    # Clamp before sqrt to guard against small negatives from rounding.
    res = res.clamp_min_(1e-30).sqrt_()

    return res
```
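
For reference, since `my_cdist` computes Euclidean (p = 2) distances, it can be sanity-checked against `torch.cdist` directly (continuing from the snippet above):

```python
x1, x2 = torch.randn(5, 8), torch.randn(7, 8)
assert torch.allclose(my_cdist(x1, x2), torch.cdist(x1, x2, p=2), atol=1e-4)
```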

AdderNet for SR

Hi,

I see that you have recently been applying AdderNet to EDSR quantization, which is great work.
But some other SR models, like EDVR, use deformable convolution for frame alignment.

I wonder whether AdderNet would also be efficient for, or could even be applied to, this kind of network.
Since the DCNv2 used in EDVR is an irregular conv, how should one handle the int8 quantization loss introduced by AdderNet?
Do you have any thoughts on this kind of op?

Thx,
Lei

Questions about the formulas in the paper

1. In Section 3.3, Equation (9) computes the output variance of AdderNet. How is its first line derived into the line containing (1 - 2/π)...?
2. Also in Section 3.3, how is Equation (11), the partial derivative of the loss with respect to x_i, obtained?
3. In Equation (13) of Section 3.3, why does that form of the local learning rate α let every layer update with the same step size?
Thanks.

a small bug on code

Hi, nice work. I found a small typo in your resnet50.py:
it should be "import torch.nn", but it is written as "1mport".

The first conv layer is not add operation

Hi, I find that you use a multiplication-based kernel for the first-layer feature extraction in both resnet20.py and resnet50.py:

```python
self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
```

Is this a special trick, or can I directly replace it with an adder layer without an accuracy drop?

Choice of similarity metric

Hello. Convolutional networks use cross-correlation as the similarity measure, while adder networks use the Manhattan (L1) distance. Is there a theoretical argument for which metric is better, or do the two metrics each have advantages in different scenarios? And why is the accuracy of adder networks slightly lower than that of convolutional networks?

Using too much gpu memory while training on addernet with ImageNet?

I tried to train AdderNet with ResNet-18 on ImageNet from scratch, using four 1080 Ti cards, but it occupies so much memory that I can only set the batch size to 16, and it is also far too slow.

For comparison, I tried replacing the adder filters with normal conv filters, and the four GPU cards could then handle a batch size of 128. Did I set something up wrong, or is this currently normal for AdderNet?

Have you tried training on ImageNet?

Formula (2)

In the paper, Equation (2) reads

$$Y(m,n,t) = -\sum_{i=0}^{d}\sum_{j=0}^{d}\sum_{k=0}^{c_{in}} \lvert X(m+i,\,n+j,\,k) - F(i,\,j,\,k,\,t) \rvert \qquad (2)$$

What does the leading "−" mean?
How does it come about?

eta setting for different models/datasets

Hi Hanting Chen,

Could you share the hyper-parameter eta used for {resnet20, resnet50} on {cifar10, cifar100, imagenet}, respectively? Are there any suggestions for setting this value?

Train own data set

Hi,
How can I use transfer learning to fine-tune the pre-trained parameters on my own dataset?
Thanks

Question about the variance formulas

Do Equations (8) and (9) in the paper assume that every entry of X and of F is independently and identically distributed? (If not, how are Equations (8) and (9) derived?)

RuntimeError: function adderBackward returned an incorrect number of gradients (expected 2, got 1)

First of all, I really appreciate your work. I ran into a problem when trying to run `python main.py`:

```
  File "main.py", line 117, in <module>
    main()
  File "main.py", line 112, in main
    train_and_test(e)
  File "main.py", line 105, in train_and_test
    train(epoch)
  File "main.py", line 80, in train
    loss.backward()
  File "/home/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: function adderBackward returned an incorrect number of gradients (expected 2, got 1)
```

Transposed convolution with adder layers?

Hello, can an adder layer implement transposed convolution? I tried to use an adder layer as a transposed convolution for 2× upsampling, but the output is one pixel short of twice the size. I suspect the problem lies in output_padding. Could you advise how to handle this?

About the training accuracy.

Hi, Hanting Chen,

I tried the same training setting (poly LR schedule with power 0.9, 400 epochs, batch size 256), but I can only get 89.16% accuracy when training ResNet-20 on CIFAR-10, while the reported accuracy is 91.84%.

Could you provide more training details and help explain the gap? Thanks.

how does addernet train?

Hello, how is AdderNet trained? Can you provide the source code for training the model? When will your pre-trained weights be made public?

[Urgent] Cuda synchronize problem

Hi @HantingChen

I am trying to use the CUDA version to speed things up, but the time spent on CUDA thread synchronization is much longer than the computation itself.

Could you please tell me how you handled this problem?

Look Forward of Codes about AdderSR

Hi,
Thanks for this excellent work!
I cannot find the project page (detailed code) for AdderSR: Towards Energy Efficient Image Super-Resolution.

Equation (5) - partial derivative of the Euclidean norm

Hi,
I would like to know why you defined the L2 distance as in Equation (14) of the appendix.
Doesn't the L2 distance need a square root outside the summations?
I would also like to know how the corresponding partial derivative of the L2 distance in Equation (5) is obtained.
Thanks.

Could you give other pre-trained models?

Hello,

Thank you for sharing your code.

I want to run inference on CIFAR-100 and ImageNet.

But the pre-trained models you uploaded cover only CIFAR-10/ResNet-20 and ImageNet/ResNet-50.

Training from scratch takes too long.

Could you provide a trained model for CIFAR-100 using ResNet-20 and for ImageNet using ResNet-18?

Thank you :)

Why is AdderNet much slower than conv?

When I run the provided code on CIFAR and MNIST, the AdderNet version of ResNet-50 is much slower than torchvision's built-in ResNet-50 (with 5000 training samples and 1000 test samples, resnet50 takes 6 s per epoch, while adder_resnet50() takes two minutes per epoch). Why is that? If the answer is that GPUs are specially optimized for convolution, the gap is just as large when I run on CPU. Where is the bottleneck?

Modify YOLOv3 backbone from DarkNet to AdderNet

How to correctly modify https://github.com/eriklindernoren/PyTorch-YOLOv3 to use https://github.com/huawei-noah/AdderNet ?

The following Colab notebook is what I have so far, with the help of others:

https://colab.research.google.com/drive/1VCafwykgNKAO6144LssBFFy0TmruDNSE#scrollTo=W3e-WcVxnKfs

How can I solve the following error in models.py?

```
Namespace(batch_size=8, class_path='data/coco.names', conf_thres=0.001, data_config='config/coco.data', img_size=416, iou_thres=0.5, model_def='config/yolov3.cfg', n_cpu=8, nms_thres=0.5, weights_path='weights/yolov3.weights')
Traceback (most recent call last):
  File "test.py", line 84, in <module>
    model.load_darknet_weights(opt.weights_path)
  File "/content/PyTorch-YOLOv3/models.py", line 321, in load_darknet_weights
    num_w = conv_layer.weight.numel()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'adder2d' object has no attribute 'weight'
```

Why apply SGD on input feature X

In Section 3.2, the paper applies stochastic gradient descent to the input feature X. Can input features be optimized? I can't understand the purpose of this.
