he-y / filter-pruning-geometric-median
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019 Oral)
Home Page: https://arxiv.org/abs/1811.00250
Hi! I want to prune the VGGNet model, but I found that pruning_cifar_vgg.py doesn't use the do_grad_mask function.
Is there any difference between resnet and vgg?
Hope for your reply~
similar_matrix = distance.cdist(weight_vec_after_norm, weight_vec_after_norm, 'euclidean')
Which step does this statement implement, and why compute the Euclidean distance of the matrix with itself?
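For context, cdist of a matrix with itself yields the pairwise distances between all filters; summing each row gives each filter's total distance to the others, which is how the filter nearest the geometric median is picked. A toy sketch (shapes and values are made up, not the repo's):

```python
import numpy as np
from scipy.spatial import distance

# Four toy "filters", each flattened to a length-3 vector.
weight_vec_after_norm = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Pairwise Euclidean distances of the filter set with itself.
similar_matrix = distance.cdist(weight_vec_after_norm,
                                weight_vec_after_norm, 'euclidean')
assert similar_matrix.shape == (4, 4)           # one row/column per filter
assert np.allclose(np.diag(similar_matrix), 0)  # self-distance is zero

# Row sums = total distance from each filter to all others; the filter
# with the smallest sum is closest to the geometric median, i.e. the
# most "replaceable" one and hence the pruning candidate.
similar_sum = similar_matrix.sum(axis=0)
candidate = int(np.argmin(similar_sum))         # filter 1 in this toy case
```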
Your algorithm looks like it prunes the network at the end of every epoch. However, it does not.
If you prune P% of the weights, the norm criterion is disabled, because zero is the lowest possible norm.
The distance criterion is also disabled, because the distance between any two pruned (all-zero) weights is zero.
I checked the pruned indices at the end of every epoch, and they do not change.
Am I wrong?
Hi, I wonder how you handle the residual structure in the residual network, such as resnet20.
In your pruning code https://github.com/he-y/soft-filter-pruning/blob/master/utils/get_small_model.py, line 380, you write "# just conv1 of block, the input dimension should not change for shortcut" — does this mean you don't prune the last conv in the resblock?
Thanks for your great work. I wonder if I understand your algorithm well; my question is as follows:
As we all know, the number of output channels of some layers in ResNet must be the same because of the residual blocks.
Take ResNet for CIFAR-10 as an example:
Is what I described above right?
If so, the deployed model still has to contain all-zero weights; otherwise I will not know which filters are pruned, nor how to match the feature maps from conv0 and block0/layer1 when executing the element-wise add operation.
Hope for your reply!
Hi,
Thanks for your work. I have a question: for a convolution layer that has both weights and a bias, should I prune them together? I noticed that in your implementation you always set bias = False in conv2d. For example, a convolution layer has a weight tensor [3, 3, 64, 256] and a bias tensor of size 256; after pruning, should I get something like a weight tensor [3, 3, 64, 175] and a bias tensor of size 175?
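For what it's worth, if a layer did have a bias, one would slice it with the same filter indices used for the weight's output dimension; a NumPy sketch (the norm criterion and shapes here are chosen only for illustration):

```python
import numpy as np

# Hypothetical layer: 256 output filters of shape 64x3x3, one bias per filter.
weight = np.random.randn(256, 64, 3, 3)   # PyTorch layout: (out, in, kH, kW)
bias = np.random.randn(256)

# Keep the 175 filters with the largest L2 norm (criterion assumed for
# illustration); the bias MUST be sliced with the same indices, since each
# bias element belongs to exactly one output filter.
norms = np.linalg.norm(weight.reshape(256, -1), axis=1)
keep = np.sort(np.argsort(norms)[-175:])

small_weight = weight[keep]
small_bias = bias[keep]
assert small_weight.shape == (175, 64, 3, 3)
assert small_bias.shape == (175,)
```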
How can this problem be solved? I want to use your method to prune a network in a TensorFlow model. Where is the core part of the code? I look forward to your reply.
Thanks a lot for your great work!
When I run the code, I met some problems in getting the small model, and I have tried to find the reasons.
As described in the issue: is it reasonable to get more zero filters than the theoretical pruned number? E.g. the filter count in the first conv is 64 and the pruning rate is 0.3, so the theoretical pruned number is 45, but I got 50 zero filters. Is that reasonable?
This problem causes the failure in getting the small model, because the number of kept filters may not equal the theoretical value. To get the small model successfully, I have to pick some zero filters as kept filters, which leads to a precision decrease in the small model.
From reading your code, it seems the number of zero filters should equal the theoretical number; am I right?
No matter which of these I used:
1. CUDA_VISIBLE_DEVICES=# on the shell command line, or
2. os.environ["CUDA_VISIBLE_DEVICES"] = "#" in the .py file,
it returns the error below:
RuntimeError: CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 11.90 GiB total capacity; 11.34 GiB already allocated; 18.94 MiB free; 579.50 KiB cached)
(GPU 0 is in use)
Thanks for sharing the codes.
In the scripts, the file pruning_cifar10.sh may have some typos.
While the save name is ratenorm0.7_ratedist0.1, the input for the argument '--rate_norm' is 1.
Is it a mistake?
If not, I would need help to understand it.
When running the code, I found that similar_small_index always includes indices from 0 to n, such as [0,1,2,3,4,5,6,7,8], and the similar_sum matrix always contains some equal values. Is this because some settings constrain the matrix?
Hi, thanks for open-sourcing the code.
I got confused by some of the mathematical notation.
Why did you choose the geometric median? Is there theoretical evidence that "points near the GM can be represented by the others"? Why not just use the mean of all points, since the mean is exactly a linear combination of all points?
Eq. 10 of the paper: step 1 to step 2 is valid ONLY IF g(x) and ||x - F_{i,j*}|| share the same minimizer. Is there an assumption or proof to guarantee this?
This code only masks the gradient, not the momentum.
So masked weights revive after pruning, and they will affect the other weights.
Did you check it?
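A minimal numeric sketch of the effect described above (plain SGD-with-momentum arithmetic for a single weight, not the repo's code):

```python
# One pruned weight under SGD with momentum (lr=0.1, mu=0.9).
lr, mu = 0.1, 0.9

# Case 1: only the gradient is masked. The stale momentum buffer still
# pushes the weight off zero on the next optimizer step.
w, v, grad = 0.0, 0.5, 0.0      # weight masked to 0; gradient masked to 0
v = mu * v + grad               # buffer stays at 0.45
w = w - lr * v
assert w != 0.0                 # the pruned weight has "revived"

# Case 2: the momentum buffer is masked as well; the weight stays pruned.
w, v = 0.0, 0.0                 # zero the optimizer state too
v = mu * v + grad
w = w - lr * v
assert w == 0.0
```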
Hi, when I train a ResNet-50 on ImageNet and use get_small.sh to obtain big_model.pt and small_model.pt, their accuracy is equal. But when I train a ResNet-50 on a private dataset, the obtained small_model.pt's accuracy does not equal big_model.pt's. What is the reason for this? The pruning rate is 0.3.
Hello, thank you for your code.
Now I want to directly use the pruned ResNet model you released at https://github.com/he-y/soft-filter-pruning/releases/tag/ResNet50_pruned. I have checked get_small_model.py and found that, when creating a small model, you provide index and bn_value, both obtained from the big model. Does this mean that to use the small model, I must have the big model in order to get the index and bn_value?
Thank you very much for answering my previous question. I have one more question. I want to use the YOLOv3 algorithm with the Darknet-53 model for object detection, and compress the resulting model (trained on the VOC dataset) with your compression algorithm. What should be modified, for example the training function train() or the accuracy evaluation? I have just started in this area and hope you can give some advice. Thanks!
As you said here, even when you define a compression rate (e.g. a pruning rate), the total number of remaining filters is smaller than the rate implies.
That said,
Hi, I have trained ResNet-56 with pruning on CIFAR-10. The big model has an accuracy of 89%, while the small model has only 10%. I looked into the details, and there seems to be an error in the pruning training.
When training with pruning, only the first dim of each conv weight is masked to 0 when filtering. The second dim of the conv weight, which matches the input channel count, is not masked even if some channels of the previous conv are masked to 0.
However, when transferring to the small model, the second dim of the conv weight also has to be sliced so that the shapes of the input and the conv weight match. As a result, some non-zero weights are pruned, and the small model's accuracy drops a lot.
Have you noticed this issue? And how did you solve it?
Thank you for your patience.
Looking forward to your reply!
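The shape bookkeeping described above can be sketched as follows (toy shapes; which filters count as "non-zero" is assumed here):

```python
import numpy as np

# Layer i has 8 filters; layer i+1 consumes its 8 output channels.
w1 = np.random.randn(8, 3, 3, 3)     # (out, in, kH, kW)
w2 = np.random.randn(16, 8, 3, 3)

keep = [0, 2, 5, 7]                  # surviving (non-zero) filters of layer i

# Building the small model requires slicing BOTH the output dim of layer i
# and the MATCHING input dim of layer i+1.
small_w1 = w1[keep]
small_w2 = w2[:, keep]
assert small_w1.shape == (4, 3, 3, 3)
assert small_w2.shape == (16, 4, 3, 3)
```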
Hi, thanks for your great work. I have one question on distance calculation:
https://github.com/he-y/filter-pruning-geometric-median/blob/master/pruning_cifar10.py#L528
similar_matrix = 1 - distance.cdist(weight_vec_after_norm, weight_vec_after_norm, 'cosine')
similar_sum = np.sum(np.abs(similar_matrix), axis=0)
while the cosine distance in SciPy is defined as follows:
Therefore, an entry of similar_matrix might be negative, right?
Why apply np.abs over similar_matrix (the FPGM distance matrix) in the cosine-distance case?
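A quick check confirms the sign concern: SciPy's 'cosine' metric is 1 − cos(u, v) and lies in [0, 2], so 1 − cdist(...) recovers the cosine similarity, which can indeed be negative:

```python
import numpy as np
from scipy.spatial import distance

a = np.array([[1.0, 0.0],
              [-1.0, 0.0]])          # two opposite vectors

d = distance.cdist(a, a, 'cosine')   # cosine DISTANCE: 1 - cosine similarity
similar_matrix = 1 - d               # back to cosine similarity
assert np.isclose(d[0, 1], 2.0)      # maximal distance for opposite vectors
assert np.isclose(similar_matrix[0, 1], -1.0)  # similarity is negative
```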
Thanks
The formula of the BN layer is: (x - bn.mean) * bn.weight / sqrt(bn.variance + 1e-5) + bn.bias. After channel pruning in a convolution layer, some channels of x are all zeros, but according to the formula, the outputs of these pruned channels in the following BN layer are not zero. If you extract the pruned model without handling each BN layer, the performance drops significantly.
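The point is easy to verify numerically; with made-up BN statistics, a zeroed channel does not stay zero after BN:

```python
import math

# y = (x - mean) * weight / sqrt(var + eps) + bias, applied per channel.
x = 0.0                                      # pruned (all-zero) channel
mean, var, weight, bias, eps = 0.3, 1.2, 0.8, 0.5, 1e-5  # made-up statistics
y = (x - mean) * weight / math.sqrt(var + eps) + bias
assert abs(y) > 1e-3                         # BN shifts the zeros away from 0
```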
After you finish pruning one layer, the corresponding positions in the next layer should also not count toward the FLOPs, right?
Hi, thanks for your work. I have one question on how you prove the equivalence of g(x) and g'(x) in equations (5)-(9):
"Note that even if F_{i,j*} is not included in the calculation of the geometric median in Equation (4),
we could also achieve the same result."
Thanks in advance
Hello:
Thank you for your code! I've read it carefully and found that you call do_mask() and do_similar_mask() before the training process; the simplified code is listed below:
```
m.init_mask(args.rate_norm, args.rate_dist)
# m.if_zero()
m.do_mask()
m.do_similar_mask()
model = m.model
...

for epoch in range(1, max_epoches):
    ...
    train()
    m.init_mask(args.rate_norm, args.rate_dist)
    # m.if_zero()
    m.do_mask()
    m.do_similar_mask()
    model = m.model
```
I really cannot understand it. You initialize the model randomly and call do_mask() before the first epoch, which is just like selecting the filters at random. As I understand it, you zero out the gradients of the selected filters during training and also set the values of the selected filters to zero. Will the values of these filters be zero all the time, since they are set to zero before training and are not updated?
Then, if the filters selected before training are always zero, there is no point in calling do_mask after the training step.
Besides, could you tell me whether the selected filters change after each epoch, since you call init_mask after every training pass?
Could you please share the pretrained cifar models for reproducing the result in table 1?
Thanks for your nice work!
Your FLOPs calculator has an error.
Surely the number of filters is an integer, but in your code it is a float.
In your pruning algorithm, applying ceil to the number of filters makes it right.
After correcting this, in the case of ResNet-110 on CIFAR-10,
40% pruning only reduces FLOPs by 40.26% and
50% pruning only reduces FLOPs by 50.47%.
I have to check it again, but I'm sure yours is wrong.
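The rounding the comment refers to can be illustrated like this (assuming it is the kept-filter count that gets rounded up):

```python
import math

n_filters, prune_rate = 16, 0.4
kept = n_filters * (1 - prune_rate)   # 9.6 -- a float, not a filter count
kept_int = math.ceil(kept)            # 10 filters actually remain

# Using the float directly mis-counts FLOPs downstream; the integer count
# is what the hardware actually computes with.
assert kept_int == 10
```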
In the case of ResNet-20 on CIFAR-10, the scripts say the learning rate is 0.01; however, the logs say 0.1.
Also, in the paper you said that 40% means "30% pruned by distance and 10% pruned by norm".
However, the log name is "cifar10_resnet20_ratenorm0.7_ratedist0.1_varience2",
and it says "'rate_dist': 0.1, 'rate_norm': 1.0".
Which one is correct?
Thanks for the nice work first! I have some questions about complexity.
As stated in Section 3.4 and in the codebase, you calculate the pairwise distances of equation (6) with scipy.spatial.distance.cdist. Suppose we have n filters of dimension d; the complexity here is O(n²d).
The latter seems to be more efficient. So why do you choose equation (6) instead of equation (4)?
Thanks
In your paper, for the CIFAR-10 ResNet baseline, is the learning rate set to 0.01, 0.001, and 0.0001 at epochs 60, 120, and 160?
Hello! I have a question about Figure 2. In Figure 2(a), Small Norm Deviation, green is not good because the variance is small and the search space is small. In Figure 2(b), Large Minimum Norm, green is not good because v1 does not go to 0. I don't understand why the minimum value of the norm needs to be close to 0, and why the blue one is good.
Problem solved, sorry for bothering you
I have set the data arg to /data/groceries/imagenet1,
and the code is supposed to get the training data from that path + /train.
But this error occurred:
RuntimeError: Found 0 files in subfolders of: /data/groceries/imagenet1/train
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif
When I read the pruning_cifar10.sh file, I found the following shell function:
run20_2(){
(pruning_pretrain_resnet20 0 /data/yahe/cifar_GM2/pretrain_0.01/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience1 1 0.1)&
(pruning_pretrain_resnet20 0 /data/yahe/cifar_GM2/pretrain_0.01/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience2 1 0.1)&
(pruning_pretrain_resnet20 0 /data/yahe/cifar_GM2/pretrain_0.01/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience3 1 0.1)&
(pruning_scratch_resnet20 0 ./data/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience1 1 0.1)&
(pruning_scratch_resnet20 0 ./data/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience2 1 0.1)&
(pruning_scratch_resnet20 0 ./data/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience3 1 0.1)&
}
I have a question about this function: why do you run three subshells at the same time? I find the parameters of pruning_scratch_resnet20 in these three subshells are the same except for save_path. Can I just run one subshell, such as the first one?
I would appreciate your reply! Thank you!
pruning_scratch_resnet20 0 ./data/yahe/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience1 1 0.1
When I want to run on the CPU, there are errors:
File "pruning_cifar10.py", line 524, in get_filter_similar
indices = torch.LongTensor(filter_large_index).cuda()
RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /pytorch/aten/src/THC/THCGeneral.cpp:74
Thank you for your great work. I have one question about layer_end.
As the title says, what does layer_end mean and how is it calculated? And why is last_index equal to layer_end - 3 when running init_rate in main()?
self.mask_index = [x for x in range(0, last_index, 3)] — as I see it, the "3" is here because a conv2d contributes 1 parameter tensor and a BN layer contributes 2, but you only focus on the conv2d layers, so you skip the BN parameters, right? But what about the parameters of the projection layer? As I understand it, you do not consider the projection-layer parameters in ResNet, but it seems you do not skip them here.
Hope for your reply
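The stride-3 indexing the question asks about can be sketched on a hypothetical parameter list (which deliberately ignores projection/downsample layers, since that is exactly the open question):

```python
# Parameter tensors of a Conv-BN stack, in registration order (no conv bias):
params = ['conv1.weight', 'bn1.weight', 'bn1.bias',
          'conv2.weight', 'bn2.weight', 'bn2.bias',
          'conv3.weight', 'bn3.weight', 'bn3.bias']

# Each conv contributes 1 tensor and each BN 2, so a step of 3 lands on
# the conv weights only.
last_index = len(params)              # plays the role of layer_end here
mask_index = [x for x in range(0, last_index, 3)]
conv_params = [params[i] for i in mask_index]
assert conv_params == ['conv1.weight', 'conv2.weight', 'conv3.weight']
```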
Hi! I have a puzzle: if you prune some filters in a specific layer, you consequently also need to prune the corresponding channels in the next layer. But your code just zeroes out the filters of each layer instead of considering the ordered relationship between two layers, i.e. removing both the filters of the i-th layer and the channels of the (i+1)-th layer. In particular, when the pruning process ends and all targeted filters are actually removed across the whole network, an error will occur in the inference process.
Hi! After running 'python pruning_imagenet.py -a resnet18 --save_dir ./snapshots/resnet18-rate-0.7 --rate_norm 1 --rate_dist 0.3 --layer_begin 0 --layer_end 57 --layer_inter 3 /home/share/data/ilsvrc12_shrt_256_torch/' with 4 GPUs and batch size 256,
the result is Prec@1 66.662 Prec@5 87.440 Error@1 33.338, which is a little lower than your Top-1 67.78 / Top-5 88.01.
How come?
@he-y thanks for your work and sharing!
I have a question: In train() function:
m.do_grad_mask()
optimizer.step()
How does calling m.do_grad_mask() change the model's gradients?
/home/anaconda3/lib/python3.6/site-packages/torch/serialization.py:454: SourceChangeWarning: source code of class 'models.res_utils.DownsampleA' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True
and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
File "utils/get_small_model.py", line 346, in
main()
File "utils/get_small_model.py", line 83, in main
state_dict = remove_module_dict(state_dict)
File "utils/get_small_model.py", line 161, in remove_module_dict
for k, v in state_dict.items():
File "/home/vinsen/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 539, in getattr
type(self).name, name))
AttributeError: 'DataParallel' object has no attribute 'items'
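Judging from the traceback, the checkpoint seems to store the whole DataParallel module rather than its state dict. A defensive sketch of remove_module_dict (the hasattr unwrapping is the addition; a plain dict stands in for a real checkpoint here):

```python
def remove_module_dict(state_dict):
    # If a full module was saved instead of a dict, unwrap it first.
    if hasattr(state_dict, 'state_dict'):
        state_dict = state_dict.state_dict()
    # DataParallel prefixes every key with 'module.'; strip it once per key.
    return {k.replace('module.', '', 1): v for k, v in state_dict.items()}

fixed = remove_module_dict({'module.conv1.weight': 1, 'module.bn1.bias': 2})
assert list(fixed) == ['conv1.weight', 'bn1.bias']
```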
Thanks for the excellent work. But I got a problem here.
In "get_filter_similar" method, the mask is generated by: similar_index_for_filter = [filter_large_index[i] for i in similar_small_index].
Elements of similar_small_index and filter_large_index are indices of filters. Why use the index of a filter to slice another list (filter_large_index[i])? Errors may occur.
Think of it this way: there are 16 filters, and similar_small_index contains 15, one of the filters with the smallest similar_sum, which needs to be pruned. But filter_large_index contains only 10 elements, and 15 is not among them.
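For reference, the quoted line composes two index levels; if similar_small_index holds positions WITHIN filter_large_index (rather than raw filter ids), the slice stays in range, as this toy sketch assumes:

```python
filter_large_index = [2, 3, 5, 7, 11]   # filters kept by the norm criterion
similar_small_index = [0, 4]            # positions within that list, chosen
                                        # by the distance criterion
# Composing the two maps the distance ranking back to original filter ids.
similar_index_for_filter = [filter_large_index[i] for i in similar_small_index]
assert similar_index_for_filter == [2, 11]
```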
As in issue #8:
The zero filters set by the mask operation will always occupy the first several indices in weight_vec_after_norm.
So if we use distance.cdist to calculate similar_matrix, it looks like this:

0    0    0    0    0.x  0.x  0.x
0    0    0    0    0.x  0.x  0.x
0    0    0    0    0.x  0.x  0.x
0    0    0    0    0.x  0.x  0.x
0.x  0.x  0.x  0.x  0    0.x  0.x
...

The values in the top-left block are all zeros,
so these zero filters will always become the geometric median, because they are the filters nearest the geometric-median point in parameter space.
And if we train the net from scratch, which filters get masked out to zero depends on the random weight initialization. Is my understanding right?
I'm trying to run the Training VGGNet on Cifar-10 example, but I've got the following:
python: can't open file 'main_cifar_vgg_log.py': [Errno 2] No such file or directory
python: can't open file 'PFEC_vggprune.py': [Errno 2] No such file or directory
python: can't open file 'pruning_cifar_vgg.py': [Errno 2] No such file or directory
python: can't open file 'PFEC_finetune.py': [Errno 2] No such file or directory
(each message repeated once per launched run)
When I run this:
~/filter-pruning-geometric-median$ sh VGG_cifar/scripts/PFEC_train_prune.sh
Could you please help me on this? Am I doing something wrong?
The Python files for "training baseline, pruning from pretrain, pruning from scratch, finetune the pruned" used to reproduce the paper are not available.