
addernet's Introduction

AdderNet: Do We Really Need Multiplications in Deep Learning?

This code is a demo of the CVPR 2020 paper AdderNet: Do We Really Need Multiplications in Deep Learning?

We present adder networks (AdderNets) to trade the massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions, reducing computation costs. In AdderNets, we take the L1-norm distance between the filter and the input feature as the output response. As a result, the proposed AdderNets achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy with ResNet-50 on the ImageNet dataset, without any multiplications in the convolution layers.
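
To make this concrete, here is a minimal sketch of the response at a single output position (an illustration of the idea, not the repository's actual adder.py implementation):

```python
import torch

def adder_response(patch: torch.Tensor, filt: torch.Tensor) -> torch.Tensor:
    # Response of an adder layer at one output position: the negative
    # L1 distance between an input patch and a filter, both of shape
    # (in_channels, k, k). Only subtractions, absolute values, and
    # additions are involved -- no multiplications.
    return -(patch - filt).abs().sum()

patch = torch.randn(3, 3, 3)   # a 3-channel 3x3 input patch
filt = torch.randn(3, 3, 3)    # a matching adder filter
print(adder_response(patch, filt))  # scalar response, always <= 0
```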


Classification results on CIFAR-10 and CIFAR-100 datasets.

| Model | Method | CIFAR-10 | CIFAR-100 |
| --- | --- | --- | --- |
| VGG-small | ANN | 93.72% | 72.64% |
| | PKKD ANN | 95.03% | 76.94% |
| | SLAC ANN | 93.96% | 73.63% |
| ResNet-20 | ANN | 92.02% | 67.60% |
| | PKKD ANN | 92.96% | 69.93% |
| | SLAC ANN | 92.29% | 68.31% |
| | ShiftAddNet* | 89.32% (160 epochs) | - |
| ResNet-32 | ANN | 93.01% | 69.17% |
| | PKKD ANN | 93.62% | 72.41% |
| | SLAC ANN | 93.24% | 69.83% |

Classification results on ImageNet dataset.

| Model | Method | Top-1 Acc | Top-5 Acc |
| --- | --- | --- | --- |
| ResNet-18 | CNN | 69.8% | 89.1% |
| | ANN | 67.0% | 87.6% |
| | PKKD ANN | 68.8% | 88.6% |
| | SLAC ANN | 67.7% | 87.9% |
| ResNet-50 | CNN | 76.2% | 92.9% |
| | ANN | 74.9% | 91.7% |
| | PKKD ANN | 76.8% | 93.3% |
| | SLAC ANN | 75.3% | 92.6% |

*ShiftAddNet used a different training setting.

Super-Resolution results on several SR datasets.

| Scale | Model | Method | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | B100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) |
| --- | --- | --- | --- | --- | --- | --- |
| ×2 | VDSR | CNN | 37.53/0.9587 | 33.03/0.9124 | 31.90/0.8960 | 30.76/0.9140 |
| | | ANN | 37.37/0.9575 | 32.91/0.9112 | 31.82/0.8947 | 30.48/0.9099 |
| | EDSR | CNN | 38.11/0.9601 | 33.92/0.9195 | 32.32/0.9013 | 32.93/0.9351 |
| | | ANN | 37.92/0.9589 | 33.82/0.9183 | 32.23/0.9000 | 32.63/0.9309 |
| ×3 | VDSR | CNN | 33.66/0.9213 | 29.77/0.8314 | 28.82/0.7976 | 27.14/0.8279 |
| | | ANN | 33.47/0.9151 | 29.62/0.8276 | 28.72/0.7953 | 26.95/0.8189 |
| | EDSR | CNN | 34.65/0.9282 | 30.52/0.8462 | 29.25/0.8093 | 28.80/0.8653 |
| | | ANN | 34.35/0.9212 | 30.33/0.8420 | 29.13/0.8068 | 28.54/0.8555 |
| ×4 | VDSR | CNN | 31.35/0.8838 | 28.01/0.7674 | 27.29/0.7251 | 25.18/0.7524 |
| | | ANN | 31.27/0.8762 | 27.93/0.7630 | 27.25/0.7229 | 25.09/0.7445 |
| | EDSR | CNN | 32.46/0.8968 | 28.80/0.7876 | 27.71/0.7420 | 26.64/0.8033 |
| | | ANN | 32.13/0.8864 | 28.57/0.7800 | 27.58/0.7368 | 26.33/0.7874 |

Adversarial robustness on CIFAR-10 under white-box attacks without adversarial training.

| Model | Method | Clean | FGSM | BIM7 | PGD7 | MIM5 | RFGSM5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-20 | CNN | 92.68 | 16.33 | 0.00 | 0.00 | 0.01 | 0.00 |
| | ANN | 91.72 | 18.42 | 0.00 | 0.00 | 0.04 | 0.00 |
| | CNN-R | 90.62 | 17.23 | 3.46 | 3.67 | 4.23 | 0.06 |
| | ANN-R | 90.95 | 29.93 | 29.30 | 29.72 | 32.25 | 3.38 |
| | ANN-R-AWN | 90.55 | 45.93 | 42.62 | 43.39 | 46.52 | 18.36 |
| ResNet-32 | CNN | 92.78 | 23.55 | 0.00 | 0.01 | 0.10 | 0.00 |
| | ANN | 92.48 | 35.85 | 0.03 | 0.11 | 1.04 | 0.02 |
| | CNN-R | 91.32 | 20.41 | 5.15 | 5.27 | 6.09 | 0.07 |
| | ANN-R | 91.68 | 19.74 | 15.96 | 16.08 | 17.48 | 0.07 |
| | ANN-R-AWN | 91.25 | 61.30 | 59.41 | 59.74 | 61.54 | 39.79 |

Comparisons of mAP on PASCAL VOC.

| Model | Backbone | Neck | mAP |
| --- | --- | --- | --- |
| Faster R-CNN | Conv R50 | Conv | 79.5 |
| FCOS | Conv R50 | Conv | 79.1 |
| RetinaNet | Conv R50 | Conv | 77.3 |
| FoveaBox | Conv R50 | Conv | 76.6 |
| Adder-FCOS | Adder R50 | Adder | 76.5 |

Requirements

  • python 3
  • pytorch >= 1.1.0
  • torchvision
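
These can be installed with pip, for example: `pip install "torch>=1.1.0" torchvision`.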

Preparation

You can follow pytorch/examples to prepare the ImageNet data.

The pretrained models are available on Google Drive or Baidu Cloud (access code: 126b).

Usage

Run `python main.py` to train on CIFAR-10.

Run `python test.py --data_dir 'path/to/imagenet_root/'` to evaluate on the ImageNet validation set. You will achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy on the ImageNet dataset using ResNet-50.

Run `python test.py --dataset cifar10 --model_dir models/ResNet20-AdderNet.pth --data_dir 'path/to/cifar10_root/'` to evaluate on CIFAR-10. You will achieve 91.8% accuracy on the CIFAR-10 dataset using ResNet-20.
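
Alternatively, a checkpoint can be loaded directly in Python. A minimal sketch, assuming `resnet20.py` in this repository exposes a `resnet20()` constructor (check the file for the actual name) and the checkpoint stores a plain state dict:

```python
import torch
from resnet20 import resnet20  # assumption: resnet20.py defines resnet20()

model = resnet20()
state = torch.load('models/ResNet20-AdderNet.pth', map_location='cpu')
# Checkpoints saved through nn.DataParallel prefix keys with "module.";
# strip the prefix if load_state_dict reports unexpected keys.
state = {k.replace('module.', '', 1): v for k, v in state.items()}
model.load_state_dict(state)
model.eval()
```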

Inference and training with AdderNets are slow, since the adder filters are implemented without CUDA acceleration. You can write a CUDA kernel to achieve higher inference speed.
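
For intuition, here is one way such a layer can be vectorized in pure PyTorch (a rough sketch under our own assumptions, not necessarily how the repository's adder.py does it): unfold the input into patches and compute pairwise L1 distances between every filter and every patch.

```python
import torch
import torch.nn.functional as F

def adder2d_forward(x, weight, stride=1, padding=0):
    # x: (N, C_in, H, W); weight: (C_out, C_in, k, k).
    # Returns (N, C_out, H_out, W_out) of negative L1 distances.
    n, _, h, w = x.shape
    c_out, _, k, _ = weight.shape
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    cols = F.unfold(x, k, stride=stride, padding=padding)   # (N, C_in*k*k, L)
    filt = weight.view(c_out, -1)                           # (C_out, C_in*k*k)
    # Pairwise L1 distances between every patch and every filter.
    dist = torch.cdist(cols.transpose(1, 2), filt.unsqueeze(0), p=1)  # (N, L, C_out)
    return -dist.transpose(1, 2).reshape(n, c_out, h_out, w_out)

x = torch.randn(2, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
print(adder2d_forward(x, w, padding=1).shape)  # torch.Size([2, 4, 8, 8])
```

Note that unfolding materializes every patch explicitly, which is one reason pure-PyTorch adder layers tend to be slow and memory-hungry compared with cuDNN convolutions.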

Citation

```bibtex
@article{AdderNet,
    title={AdderNet: Do We Really Need Multiplications in Deep Learning?},
    author={Chen, Hanting and Wang, Yunhe and Xu, Chunjing and Shi, Boxin and Xu, Chao and Tian, Qi and Xu, Chang},
    journal={CVPR},
    year={2020}
}
```

Contributing

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR, because we might be taking the core in a different direction than you might be aware of.


addernet's Issues

How do you visualize the feature?

Hi, thanks for your brilliant work. However, I could not find details of how you visualize the high-dimensional features of the network. Do you visualize the entire last feature map of shape (N, C, H, W), the average-pooled feature map, or the last layer before classification? Which visualization method do you use?

Questions about the AdderSR paper

Hello, I have a few questions:
1. The original VDSR model used in the AdderSR paper does not contain BN layers. Must every adder layer be followed by a BN layer to prevent gradient explosion?
2. Can other kinds of normalization layers be used after an adder layer?
3. If I introduce adder layers into another task that already uses a different normalization layer A, would a direct replacement result in adder + BN + A? Can two normalization layers coexist?

Thank you very much for taking the time to answer!

BatchNorm performs poorly

Hello, I read your paper and think it is very good. I recently replaced Conv with Adder in a project, but I found that BatchNorm does not seem to work well in some scenarios. Do you have any suggestions for a replacement? Thanks.

Accelerated AdderNet

Hello, when will the accelerated version of AdderNet be released? This has been asked before, but it still does not seem to be available (or I have not found it).
I would like to apply AdderNet to tasks in other domains, but because training is too slow and GPU memory usage is too high, the current version of AdderNet is almost unusable in practice.

Thanks 🙏🙏

A question about ImageProcessingTransformer

Hello, sorry to bother you. I read your team's Pre-Trained Image Processing Transformer paper and benefited a lot. I am trying to reproduce it, but I ran into problems writing the Dataloader and Datasets: I do not know how to crop the images or how to preprocess the data. The paper mentions a 10-pixel overlap, but how are the edges handled?
I would be very grateful if you could share some insight.
Many thanks, and I look forward to your reply.

Learning rate in the released code vs. the paper

Hello! Is the learning rate set in the released code different from the one described in the paper?
One uses cosine decay, while the other uses an adaptive learning rate?
Could you clarify? Thanks!

Please update the unfair comparison with ShiftAddNet

Hi,

We recently noticed that you updated the comparison with ShiftAddNet in your README.

However, the chosen ShiftAddNet accuracy (i.e., 85.10%) was mistakenly taken from our adaptation experiments, in which we pre-train on only half of the CIFAR-10 dataset. Our accuracy (ResNet-20 on CIFAR-10) should be 89.32%, which was also obtained with a different training setting and a different implementation (e.g., only 160 epochs).

Could you help to update the comparison results?

Best regards,
Haoran You

Why is adder.adder2d slower than torch.nn.Conv2d? I'm confused.

I replaced torch.nn.Conv2d with adder.adder2d, and replaced torch.cdist with my_cdist, then trained my network; the new model is at least six times slower than the old one. I'm confused.

```python
import torch

@torch.jit.script
def my_cdist(x1, x2):
    # Pairwise Euclidean (p=2) distances via the expansion
    # ||a - b||^2 = ||a||^2 - 2*a.b + ||b||^2.
    x1_norm = x1.pow(2).sum(dim=-1, keepdim=True)
    x2_norm = x2.pow(2).sum(dim=-1, keepdim=True)
    # addmm computes x2_norm^T + (-2) * (x1 @ x2^T); then broadcast-add x1_norm.
    res = torch.addmm(x2_norm.transpose(-2, -1), x1, x2.transpose(-2, -1), alpha=-2).add_(x1_norm)
    # Clamp before sqrt to guard against small negatives from rounding.
    res = res.clamp_min_(1e-30).sqrt_()

    return res
```
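
For reference, since `my_cdist` computes Euclidean (p = 2) distances, it can be sanity-checked against `torch.cdist` directly (continuing from the snippet above):

```python
x1, x2 = torch.randn(5, 8), torch.randn(7, 8)
assert torch.allclose(my_cdist(x1, x2), torch.cdist(x1, x2, p=2), atol=1e-4)
```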

AdderNet for SR

Hi,

I see that you have recently been applying AdderNet to EDSR quantization, which is great work.
But some other SR models, like EDVR, use deformable convolution for frame alignment.

I wonder whether AdderNet would also be efficient for, or could even be applied to, this kind of network.
Since the DCNv2 used in EDVR is an irregular conv, how should one handle the int8 quantization loss introduced by AdderNet?
Do you have any thoughts on this kind of op?

Thx,
Lei

Questions about the formulas in the paper

1. In Section 3.3, Equation (9) computes the output variance of AdderNet. How is its first line derived into the line containing (1 - 2/π)...?
2. Also in Section 3.3, how is Equation (11), the partial derivative of the loss with respect to x_i, obtained?
3. In Equation (13) of Section 3.3, why does that form of the local learning rate α let every layer update with the same step size?
Thanks.

a small bug on code

Hi, nice work. I found a small typo in your resnet50.py:
it should be "import torch.nn", but it is written as "1mport".

The first conv layer is not add operation

Hi, I find that you use a multiplication-based kernel for the first-layer feature extraction in both resnet20.py and resnet50.py:

```python
self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
```

Is this a special trick, or can I directly replace it with an adder layer without an accuracy drop?

Choice of similarity metric

Hello. Convolutional networks use cross-correlation as the similarity measure, while adder networks use the Manhattan (L1) distance. Is there a theoretical argument for which metric is better, or do the two metrics each have advantages in different scenarios? And why is the accuracy of adder networks slightly lower than that of convolutional networks?

Using too much gpu memory while training on addernet with ImageNet?

I tried to train AdderNet with ResNet-18 on ImageNet from scratch, using four 1080 Ti cards, but it occupies so much memory that I can only set the batch size to 16, and it is also far too slow.

For comparison, I tried replacing the adder filters with normal conv filters, and the four GPU cards could then handle a batch size of 128. Did I set something up wrong, or is this currently normal for AdderNet?

Have you tried training on ImageNet?

Formula (2)

In the paper, Equation (2) reads

$$Y(m,n,t) = -\sum_{i=0}^{d}\sum_{j=0}^{d}\sum_{k=0}^{c_{in}} \lvert X(m+i,\,n+j,\,k) - F(i,\,j,\,k,\,t) \rvert \qquad (2)$$

What does the leading "−" mean?
How does it come about?

eta setting for different models/datasets

Hi Hanting Chen,

Could you share the hyper-parameter eta used for {resnet20, resnet50} on {cifar10, cifar100, imagenet}, respectively? Are there any suggestions for setting this value?

Train own data set

Hi,
How can I use transfer learning to fine-tune the pre-trained parameters on my own dataset?
Thanks

Question about the variance formulas

Do Equations (8) and (9) in the paper assume that every entry of X and of F is independently and identically distributed? (If not, how are Equations (8) and (9) derived?)

RuntimeError: function adderBackward returned an incorrect number of gradients (expected 2, got 1)

First of all, I really appreciate your work. I ran into a problem when trying to run `python main.py`:

```
  File "main.py", line 117, in <module>
    main()
  File "main.py", line 112, in main
    train_and_test(e)
  File "main.py", line 105, in train_and_test
    train(epoch)
  File "main.py", line 80, in train
    loss.backward()
  File "/home/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: function adderBackward returned an incorrect number of gradients (expected 2, got 1)
```

Transposed convolution with adder layers?

Hello, can an adder layer implement transposed convolution? I tried to use an adder layer as a transposed convolution for 2× upsampling, but the output is one pixel short of twice the size. I suspect the problem lies in output_padding. Could you advise how to handle this?

About the training accuracy.

Hi, Hanting Chen,

I tried the same training setting (poly LR schedule with power 0.9, 400 epochs, batch size 256), but I can only get 89.16% accuracy when training ResNet-20 on CIFAR-10, while the reported accuracy is 91.84%.

Could you provide more training details and help explain the gap? Thanks.

how does addernet train?

Hello, how is AdderNet trained? Can you provide the source code for training the model? When will your pre-trained weights be made public?

[Urgent] Cuda synchronize problem

Hi @HantingChen

I am trying to use the CUDA version to speed things up, but the time spent on CUDA thread synchronization is much longer than the computation itself.

Could you please tell me how you handled this problem?

Look Forward of Codes about AdderSR

Hi,
Thanks for this excellent work!
I cannot find the project page (detailed code) for AdderSR: Towards Energy Efficient Image Super-Resolution.

Equation (5) - partial derivative of the Euclidean norm

Hi,
I would like to know why you defined the L2 distance as in Equation (14) of the appendix.
Doesn't the L2 distance need a square root outside the summations?
I would also like to know how the corresponding partial derivative of the L2 distance in Equation (5) is obtained.
Thanks.

Could you give other pre-trained models?

Hello,

Thank you for sharing your code.

I want to run inference on CIFAR-100 and ImageNet.

But the pre-trained models you uploaded cover only CIFAR-10/ResNet-20 and ImageNet/ResNet-50.

Training from scratch takes too long.

Could you provide a trained model for CIFAR-100 using ResNet-20 and for ImageNet using ResNet-18?

Thank you :)

Why is AdderNet much slower than conv?

When I run the provided code on CIFAR and MNIST, the AdderNet version of ResNet-50 is much slower than torchvision's built-in ResNet-50 (with 5000 training samples and 1000 test samples, resnet50 takes 6 s per epoch, while adder_resnet50() takes two minutes per epoch). Why is that? If the answer is that GPUs are specially optimized for convolution, the gap is just as large when I run on CPU. Where is the bottleneck?

Modify YOLOv3 backbone from DarkNet to AdderNet

How to correctly modify https://github.com/eriklindernoren/PyTorch-YOLOv3 to use https://github.com/huawei-noah/AdderNet ?

The following Colab notebook is what I have so far, with the help of others:

https://colab.research.google.com/drive/1VCafwykgNKAO6144LssBFFy0TmruDNSE#scrollTo=W3e-WcVxnKfs

How can I solve the following error in models.py?

```
Namespace(batch_size=8, class_path='data/coco.names', conf_thres=0.001, data_config='config/coco.data', img_size=416, iou_thres=0.5, model_def='config/yolov3.cfg', n_cpu=8, nms_thres=0.5, weights_path='weights/yolov3.weights')
Traceback (most recent call last):
  File "test.py", line 84, in <module>
    model.load_darknet_weights(opt.weights_path)
  File "/content/PyTorch-YOLOv3/models.py", line 321, in load_darknet_weights
    num_w = conv_layer.weight.numel()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'adder2d' object has no attribute 'weight'
```

Why apply SGD on input feature X

In Section 3.2, the paper applies stochastic gradient descent to the input feature X. Can input features be optimized? I can't understand the purpose of this.
