he-y / filter-pruning-geometric-median
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019 Oral)
Home Page: https://arxiv.org/abs/1811.00250
Hi! I want to prune the VGGNet model, but I found that pruning_cifar_vgg.py doesn't use the do_grad_mask function.
Is there any difference between resnet and vgg?
Hope for your reply~
similar_matrix = distance.cdist(weight_vec_after_norm, weight_vec_after_norm, 'euclidean')
Which step does this statement implement, and why compute the Euclidean distance of the matrix with itself?
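For context, cdist of a matrix with itself yields the pairwise distances between all filters; summing each row gives each filter's total distance to the others, which is how the filter nearest the geometric median is picked. A toy sketch (shapes and values are made up, not the repo's):

```python
import numpy as np
from scipy.spatial import distance

# Four toy "filters", each flattened to a length-3 vector.
weight_vec_after_norm = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Pairwise Euclidean distances of the filter set with itself.
similar_matrix = distance.cdist(weight_vec_after_norm,
                                weight_vec_after_norm, 'euclidean')
assert similar_matrix.shape == (4, 4)           # one row/column per filter
assert np.allclose(np.diag(similar_matrix), 0)  # self-distance is zero

# Row sums = total distance from each filter to all others; the filter
# with the smallest sum is closest to the geometric median, i.e. the
# most "replaceable" one and hence the pruning candidate.
similar_sum = similar_matrix.sum(axis=0)
candidate = int(np.argmin(similar_sum))         # filter 1 in this toy case
```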
Your algorithm looks like it prunes the network at the end of every epoch. However, it does not.
If you prune P% of the weights, the norm criterion is disabled, because zero is the lowest possible norm.
The distance criterion is also disabled, because the distance between any two pruned (all-zero) weights is zero.
I checked the pruned indices at the end of every epoch, and they do not change.
Am I wrong?
Hi, I wonder how you handle the residual structure in the residual network, such as resnet20.
In your pruning code https://github.com/he-y/soft-filter-pruning/blob/master/utils/get_small_model.py, line 380, you write "# just conv1 of block, the input dimension should not change for shortcut" — does this mean you don't prune the last conv in the resblock?
Thanks for your great work. I wonder if I understand your algorithm well; my question is as follows:
As we all know, the number of output channels of some layers in ResNet must be the same because of the residual blocks.
Take ResNet for CIFAR-10 as an example:
Is what I described above right?
If so, the deployed model still has to contain all-zero weights; otherwise I will not know which filters are pruned, nor how to match the feature maps from conv0 and block0/layer1 when executing the element-wise add operation.
Hope for your reply!
Hi,
Thanks for your work. I have a question: for a convolution layer that has both weights and a bias, should I prune them together? I noticed that in your implementation you always set bias = False in conv2d. For example, a convolution layer has a weight tensor [3, 3, 64, 256] and a bias tensor of size 256; after pruning, should I get something like a weight tensor [3, 3, 64, 175] and a bias tensor of size 175?
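For what it's worth, if a layer did have a bias, one would slice it with the same filter indices used for the weight's output dimension; a NumPy sketch (the norm criterion and shapes here are chosen only for illustration):

```python
import numpy as np

# Hypothetical layer: 256 output filters of shape 64x3x3, one bias per filter.
weight = np.random.randn(256, 64, 3, 3)   # PyTorch layout: (out, in, kH, kW)
bias = np.random.randn(256)

# Keep the 175 filters with the largest L2 norm (criterion assumed for
# illustration); the bias MUST be sliced with the same indices, since each
# bias element belongs to exactly one output filter.
norms = np.linalg.norm(weight.reshape(256, -1), axis=1)
keep = np.sort(np.argsort(norms)[-175:])

small_weight = weight[keep]
small_bias = bias[keep]
assert small_weight.shape == (175, 64, 3, 3)
assert small_bias.shape == (175,)
```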
How can this problem be solved? I want to use your method to prune a network in a TensorFlow model. Where is the core part of the code? I look forward to your reply.
Thanks a lot for your great work!
When I run the code, I met some problems in getting the small model, and I have tried to find the reasons.
As described in the issue: is it reasonable to get more zero filters than the theoretical pruned number? E.g. the filter count in the first conv is 64 and the pruning rate is 0.3, so the theoretical pruned number is 45, but I got 50 zero filters. Is that reasonable?
This problem causes the failure in getting the small model, because the number of kept filters may not equal the theoretical value. To get the small model successfully, I have to pick some zero filters as kept filters, which leads to a precision decrease in the small model.
From reading your code, it seems the number of zero filters should equal the theoretical number; am I right?
No matter which of these I used:
1. CUDA_VISIBLE_DEVICES=# on the shell command line, or
2. os.environ["CUDA_VISIBLE_DEVICES"] = "#" in the .py file,
it returns the error below:
RuntimeError: CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 11.90 GiB total capacity; 11.34 GiB already allocated; 18.94 MiB free; 579.50 KiB cached)
(GPU 0 is in use)
Thanks for sharing the codes.
In the scripts, the file pruning_cifar10.sh may have some typos.
While the save name is ratenorm0.7_ratedist0.1, the input for the argument '--rate_norm' is 1.
Is it a mistake?
If not, I would need help to understand it.
When running the code, I found that similar_small_index always includes indices from 0 to n, such as [0,1,2,3,4,5,6,7,8], and the similar_sum matrix always contains some equal values. Is this because some settings constrain the matrix?
Hi, thanks for open-sourcing the code.
I got confused by some of the mathematical notation.
Why did you choose the geometric median? Is there theoretical evidence that "points near the GM can be represented by the others"? Why not just use the mean of all points, since the mean is exactly a linear combination of all points?
Eq. 10 of the paper: step 1 to step 2 is valid ONLY IF g(x) and ||x - F_{i,j*}|| share the same minimizer. Is there an assumption or proof to guarantee this?
This code only masks the gradient, not the momentum.
So masked weights revive after pruning, and they will affect the other weights.
Did you check it?
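A minimal numeric sketch of the effect described above (plain SGD-with-momentum arithmetic for a single weight, not the repo's code):

```python
# One pruned weight under SGD with momentum (lr=0.1, mu=0.9).
lr, mu = 0.1, 0.9

# Case 1: only the gradient is masked. The stale momentum buffer still
# pushes the weight off zero on the next optimizer step.
w, v, grad = 0.0, 0.5, 0.0      # weight masked to 0; gradient masked to 0
v = mu * v + grad               # buffer stays at 0.45
w = w - lr * v
assert w != 0.0                 # the pruned weight has "revived"

# Case 2: the momentum buffer is masked as well; the weight stays pruned.
w, v = 0.0, 0.0                 # zero the optimizer state too
v = mu * v + grad
w = w - lr * v
assert w == 0.0
```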
Hi, when I train a ResNet-50 on ImageNet and use get_small.sh to obtain big_model.pt and small_model.pt, their accuracy is equal. But when I train a ResNet-50 on a private dataset, the obtained small_model.pt's accuracy does not equal big_model.pt's. What is the reason for this? The pruning rate is 0.3.
Hello, thank you for your code.
Now I want to directly use the pruned ResNet model you released at https://github.com/he-y/soft-filter-pruning/releases/tag/ResNet50_pruned. I have checked get_small_model.py and found that, when creating a small model, you provide index and bn_value, both obtained from the big model. Does this mean that to use the small model, I must have the big model in order to get the index and bn_value?
Thank you very much for answering my previous question. I have one more question. I want to use the YOLOv3 algorithm with the Darknet-53 model for object detection, and compress the resulting model (trained on the VOC dataset) with your compression algorithm. What should be modified, for example the training function train() or the accuracy evaluation? I have just started in this area and hope you can give some advice. Thanks!
As you said here, even when you define a compression rate (e.g. a pruning rate), the total number of remaining filters is smaller than the rate implies.
That said,
Hi, I have trained ResNet-56 with pruning on CIFAR-10. The big model has an accuracy of 89%, while the small model has only 10%. I looked into the details, and there seems to be an error in the pruning training.
When training with pruning, only the first dim of each conv weight is masked to 0 when filtering. The second dim of the conv weight, which matches the input channel count, is not masked even if some channels of the previous conv are masked to 0.
However, when transferring to the small model, the second dim of the conv weight also has to be sliced so that the shapes of the input and the conv weight match. As a result, some non-zero weights are pruned, and the small model's accuracy drops a lot.
Have you noticed this issue? And how did you solve it?
Thank you for your patience.
Looking forward to your reply!
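The shape bookkeeping described above can be sketched as follows (toy shapes; which filters count as "non-zero" is assumed here):

```python
import numpy as np

# Layer i has 8 filters; layer i+1 consumes its 8 output channels.
w1 = np.random.randn(8, 3, 3, 3)     # (out, in, kH, kW)
w2 = np.random.randn(16, 8, 3, 3)

keep = [0, 2, 5, 7]                  # surviving (non-zero) filters of layer i

# Building the small model requires slicing BOTH the output dim of layer i
# and the MATCHING input dim of layer i+1.
small_w1 = w1[keep]
small_w2 = w2[:, keep]
assert small_w1.shape == (4, 3, 3, 3)
assert small_w2.shape == (16, 4, 3, 3)
```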
Hi, thanks for your great work. I have one question on distance calculation:
https://github.com/he-y/filter-pruning-geometric-median/blob/master/pruning_cifar10.py#L528
similar_matrix = 1 - distance.cdist(weight_vec_after_norm, weight_vec_after_norm, 'cosine')
similar_sum = np.sum(np.abs(similar_matrix), axis=0)
while the cosine distance in SciPy is defined as follows:
Therefore, an entry of similar_matrix might be negative, right?
Why apply np.abs over similar_matrix (the FPGM distance matrix) in the cosine-distance case?
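A quick check confirms the sign concern: SciPy's 'cosine' metric is 1 − cos(u, v) and lies in [0, 2], so 1 − cdist(...) recovers the cosine similarity, which can indeed be negative:

```python
import numpy as np
from scipy.spatial import distance

a = np.array([[1.0, 0.0],
              [-1.0, 0.0]])          # two opposite vectors

d = distance.cdist(a, a, 'cosine')   # cosine DISTANCE: 1 - cosine similarity
similar_matrix = 1 - d               # back to cosine similarity
assert np.isclose(d[0, 1], 2.0)      # maximal distance for opposite vectors
assert np.isclose(similar_matrix[0, 1], -1.0)  # similarity is negative
```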
Thanks
The formula of the BN layer is: (x - bn.mean) * bn.weight / sqrt(bn.variance + 1e-5) + bn.bias. After channel pruning in a convolution layer, some channels of x are all zeros, but according to the formula, the outputs of these pruned channels in the following BN layer are not zero. If you extract the pruned model without handling each BN layer, the performance drops significantly.
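The point is easy to verify numerically; with made-up BN statistics, a zeroed channel does not stay zero after BN:

```python
import math

# y = (x - mean) * weight / sqrt(var + eps) + bias, applied per channel.
x = 0.0                                      # pruned (all-zero) channel
mean, var, weight, bias, eps = 0.3, 1.2, 0.8, 0.5, 1e-5  # made-up statistics
y = (x - mean) * weight / math.sqrt(var + eps) + bias
assert abs(y) > 1e-3                         # BN shifts the zeros away from 0
```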
After you finish pruning one layer, the corresponding positions in the next layer should also not count toward the FLOPs, right?
Hi, thanks for your work. I have one question on how you prove the equivalence of g(x) and g'(x) in equations (5)-(9):
"Note that even if F_{i,j*} is not included in the calculation of the geometric median in Equation (4),
we could also achieve the same result."
Thanks in advance
Hello:
Thank you for your code! I've read it carefully and found that you call do_mask() and do_similar_mask() before the training process; the simplified code is listed below:
```
m.init_mask(args.rate_norm, args.rate_dist)
# m.if_zero()
m.do_mask()
m.do_similar_mask()
model = m.model
...

for epoch in range(1, max_epoches):
    ...
    train()
    m.init_mask(args.rate_norm, args.rate_dist)
    # m.if_zero()
    m.do_mask()
    m.do_similar_mask()
    model = m.model
```
I really cannot understand it. You initialize the model randomly and call do_mask() before the first epoch, which is just like selecting the filters at random. As I understand it, you zero out the gradients of the selected filters during training and also set the values of the selected filters to zero. Will the values of these filters be zero all the time, since they are set to zero before training and are not updated?
Then, if the filters selected before training are always zero, there is no point in calling do_mask after the training step.
Besides, could you tell me whether the selected filters change after each epoch, since you call init_mask after every training pass?
Could you please share the pretrained cifar models for reproducing the result in table 1?
Thanks for your nice work!
Your FLOPs calculator has an error.
Surely the number of filters is an integer, but in your code it is a float.
In your pruning algorithm, applying ceil to the number of filters makes it right.
After correcting this, in the case of ResNet-110 on CIFAR-10,
40% pruning only reduces FLOPs by 40.26% and
50% pruning only reduces FLOPs by 50.47%.
I have to check it again, but I'm sure yours is wrong.
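The rounding the comment refers to can be illustrated like this (assuming it is the kept-filter count that gets rounded up):

```python
import math

n_filters, prune_rate = 16, 0.4
kept = n_filters * (1 - prune_rate)   # 9.6 -- a float, not a filter count
kept_int = math.ceil(kept)            # 10 filters actually remain

# Using the float directly mis-counts FLOPs downstream; the integer count
# is what the hardware actually computes with.
assert kept_int == 10
```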
In the case of ResNet-20 on CIFAR-10, the scripts say the learning rate is 0.01; however, the logs say 0.1.
Also, in the paper you said that 40% means "30% pruned by distance and 10% pruned by norm".
However, the log name is "cifar10_resnet20_ratenorm0.7_ratedist0.1_varience2",
and it says "'rate_dist': 0.1, 'rate_norm': 1.0".
Which one is correct?
Thanks for the nice work first! I have some questions about complexity.
As stated in Section 3.4 and in the codebase, you calculate the pairwise distances of equation (6) with scipy.spatial.distance.cdist. Suppose we have n filters of dimension d; the complexity here is O(n²d).
The latter seems to be more efficient. So why do you choose equation (6) instead of equation (4)?
Thanks
In your paper, for the CIFAR-10 ResNet baseline, is the learning rate set to 0.01, 0.001, and 0.0001 at epochs 60, 120, and 160?
Hello! I have a question about Figure 2. In Figure 2(a), Small Norm Deviation, green is not good because the variance is small and the search space is small. In Figure 2(b), Large Minimum Norm, green is not good because v1 does not go to 0. I don't understand why the minimum value of the norm needs to be close to 0, and why the blue one is good.
Problem solved, sorry for bothering you
I have set the data arg to /data/groceries/imagenet1,
and the code is supposed to get the training data from that path + /train.
But this error occurred:
RuntimeError: Found 0 files in subfolders of: /data/groceries/imagenet1/train
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif
When I read the pruning_cifar10.sh file, I found the following shell function:
run20_2(){
(pruning_pretrain_resnet20 0 /data/yahe/cifar_GM2/pretrain_0.01/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience1 1 0.1)&
(pruning_pretrain_resnet20 0 /data/yahe/cifar_GM2/pretrain_0.01/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience2 1 0.1)&
(pruning_pretrain_resnet20 0 /data/yahe/cifar_GM2/pretrain_0.01/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience3 1 0.1)&
(pruning_scratch_resnet20 0 ./data/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience1 1 0.1)&
(pruning_scratch_resnet20 0 ./data/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience2 1 0.1)&
(pruning_scratch_resnet20 0 ./data/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience3 1 0.1)&
}
I have a question about this function: why do you run three subshells at the same time? I find the parameters of pruning_scratch_resnet20 in these three subshells are the same except for save_path. Can I just run one subshell, such as the first one?
I would appreciate your reply! Thank you!
pruning_scratch_resnet20 0 ./data/yahe/cifar_GM2/scratch/cifar10_resnet20_ratenorm0.7_ratedist0.1_varience1 1 0.1
When I want to run on the CPU, there are errors:
File "pruning_cifar10.py", line 524, in get_filter_similar
indices = torch.LongTensor(filter_large_index).cuda()
RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /pytorch/aten/src/THC/THCGeneral.cpp:74
Thank you for your great work. I have one question about layer_end.
As the title says, what does layer_end mean and how is it calculated? And why is last_index equal to layer_end - 3 when running init_rate in main()?
self.mask_index = [x for x in range(0, last_index, 3)] — as I see it, the "3" is here because a conv2d contributes 1 parameter tensor and a BN layer contributes 2, but you only focus on the conv2d layers, so you skip the BN parameters, right? But what about the parameters of the projection layer? As I understand it, you do not consider the projection-layer parameters in ResNet, but it seems you do not skip them here.
Hope for your reply
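The stride-3 indexing the question asks about can be sketched on a hypothetical parameter list (which deliberately ignores projection/downsample layers, since that is exactly the open question):

```python
# Parameter tensors of a Conv-BN stack, in registration order (no conv bias):
params = ['conv1.weight', 'bn1.weight', 'bn1.bias',
          'conv2.weight', 'bn2.weight', 'bn2.bias',
          'conv3.weight', 'bn3.weight', 'bn3.bias']

# Each conv contributes 1 tensor and each BN 2, so a step of 3 lands on
# the conv weights only.
last_index = len(params)              # plays the role of layer_end here
mask_index = [x for x in range(0, last_index, 3)]
conv_params = [params[i] for i in mask_index]
assert conv_params == ['conv1.weight', 'conv2.weight', 'conv3.weight']
```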
Hi! I have a puzzle: if you prune some filters in a specific layer, you consequently also need to prune the corresponding channels in the next layer. But your code just zeroes out the filters of each layer instead of considering the ordered relationship between two layers, i.e. removing both the filters of the i-th layer and the channels of the (i+1)-th layer. In particular, when the pruning process ends and all targeted filters are actually removed across the whole network, an error will occur in the inference process.
Hi! After running 'python pruning_imagenet.py -a resnet18 --save_dir ./snapshots/resnet18-rate-0.7 --rate_norm 1 --rate_dist 0.3 --layer_begin 0 --layer_end 57 --layer_inter 3 /home/share/data/ilsvrc12_shrt_256_torch/' with 4 GPUs and batch size 256,
the result is Prec@1 66.662 Prec@5 87.440 Error@1 33.338, which is a little lower than your Top-1 67.78 / Top-5 88.01.
How come?
@he-y thanks for your work and sharing!
I have a question: In train() function:
m.do_grad_mask()
optimizer.step()
How does calling m.do_grad_mask() change the model's gradients?
/home/anaconda3/lib/python3.6/site-packages/torch/serialization.py:454: SourceChangeWarning: source code of class 'models.res_utils.DownsampleA' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True
and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
File "utils/get_small_model.py", line 346, in
main()
File "utils/get_small_model.py", line 83, in main
state_dict = remove_module_dict(state_dict)
File "utils/get_small_model.py", line 161, in remove_module_dict
for k, v in state_dict.items():
File "/home/vinsen/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 539, in getattr
type(self).name, name))
AttributeError: 'DataParallel' object has no attribute 'items'
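Judging from the traceback, the checkpoint seems to store the whole DataParallel module rather than its state dict. A defensive sketch of remove_module_dict (the hasattr unwrapping is the addition; a plain dict stands in for a real checkpoint here):

```python
def remove_module_dict(state_dict):
    # If a full module was saved instead of a dict, unwrap it first.
    if hasattr(state_dict, 'state_dict'):
        state_dict = state_dict.state_dict()
    # DataParallel prefixes every key with 'module.'; strip it once per key.
    return {k.replace('module.', '', 1): v for k, v in state_dict.items()}

fixed = remove_module_dict({'module.conv1.weight': 1, 'module.bn1.bias': 2})
assert list(fixed) == ['conv1.weight', 'bn1.bias']
```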
Thanks for the excellent work. But I got a problem here.
In "get_filter_similar" method, the mask is generated by: similar_index_for_filter = [filter_large_index[i] for i in similar_small_index].
Elements of similar_small_index and filter_large_index are indices of filters. Why use the index of a filter to slice another list (filter_large_index[i])? Errors may occur.
Think of it this way: there are 16 filters, and similar_small_index contains 15, one of the filters with the smallest similar_sum, which needs to be pruned. But filter_large_index contains only 10 elements, and 15 is not among them.
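For reference, the quoted line composes two index levels; if similar_small_index holds positions WITHIN filter_large_index (rather than raw filter ids), the slice stays in range, as this toy sketch assumes:

```python
filter_large_index = [2, 3, 5, 7, 11]   # filters kept by the norm criterion
similar_small_index = [0, 4]            # positions within that list, chosen
                                        # by the distance criterion
# Composing the two maps the distance ranking back to original filter ids.
similar_index_for_filter = [filter_large_index[i] for i in similar_small_index]
assert similar_index_for_filter == [2, 11]
```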
As in issue #8:
The zero filters set by the mask operation will always occupy the first several indices in weight_vec_after_norm.
So if we use distance.cdist to calculate similar_matrix, it looks like this:

0    0    0    0    0.x  0.x  0.x
0    0    0    0    0.x  0.x  0.x
0    0    0    0    0.x  0.x  0.x
0    0    0    0    0.x  0.x  0.x
0.x  0.x  0.x  0.x  0    0.x  0.x
...

The values in the top-left block are all zeros,
so these zero filters will always become the geometric median, because they are the filters nearest the geometric-median point in parameter space.
And if we train the net from scratch, which filters get masked out to zero depends on the random weight initialization. Is my understanding right?
I'm trying to run the Training VGGNet on Cifar-10 example, but I've got the following:
python: can't open file 'main_cifar_vgg_log.py': [Errno 2] No such file or directory
python: can't open file 'PFEC_vggprune.py': [Errno 2] No such file or directory
python: can't open file 'pruning_cifar_vgg.py': [Errno 2] No such file or directory
python: can't open file 'PFEC_finetune.py': [Errno 2] No such file or directory
(each message repeated once per launched run)
When I run this:
~/filter-pruning-geometric-median$ sh VGG_cifar/scripts/PFEC_train_prune.sh
Could you please help me on this? Am I doing something wrong?
The Python files for "training baseline, pruning from pretrain, pruning from scratch, finetune the pruned" used to reproduce the paper are not available.