sdc17 / upop Goto Github PK

[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

Home Page: https://dachuanshi.com/UPop-Project/

License: BSD 3-Clause "New" or "Revised" License

Python 98.08% Shell 1.92%

efficient-deep-learning model-compression multimodal-learning vision-language-transformer image-captioning image-text-retrieval visual-question-answering visual-reasoning text-image-retrieval framework

upop's People

Contributors

Stargazers

Watchers

Forkers

kal729 isobe-y techthiyanes pachecorj swadesh06 angelalita

upop's Issues

显存被占满

您好，我在试运行您的代码的时候，显示我的显存被占满。报错信息如下：

RuntimeError: CUDA out of memory. Tried to allocate 290.00 MiB (GPU 0; 31.75 GiB total capacity; 29.78 GiB already allocated; 113.50 MiB free; 30.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

我检索到占满内存的代码是 loss = model(image, caption, alpha=alpha, idx=idx)

想问一下该怎么解决内存占满的问题呢？谢谢！

加載checkpoints

您好！
請問在compression之後，模型的結構也發生改變，是不是不能用原來的build model函數加載模型的checkpoints？

请问能否支持blip2模型？

Problem installing petrel-oss-sdk v2.2.1-2-g1505ef3-master from environment.yaml pip dependencies

Hi，

Thanks for your valuable work and sharing the code.
When trying to install the dependencies listed in the environment.yaml file for UPop, I encountered an error related to the petrel-oss-sdk package. Specifically, the version specified in the file (v2.2.1-2-g1505ef3-master) could not be installed using pip.

I would appreciate any suggestions or insights on how to resolve this issue and successfully install the petrel-oss-sdk package from the environment.yaml file.

best

No such file or directory: 'java': 'java' while evaluating or compressing models on the Image Caption task

Please refer to this solution.

Cannot find package petrel-oss-sdk while installing dependencies

Just skip it. petrel-oss-sdk is an internal package for us to accelerate data loading. And the code will also work without this package.

Availability of Pretrained Weights for Chinese Users

Hi,

Can you update a version of the pretrained weights that can be shared through popular cloud storage platforms such as Baidu Netdisk or OneDrive, which are widely used in China. some of users in mainland China have reported difficulties in accessing the pretrained weights due to the need for a proxy or VPN connection to bypass internet censorship.

Best

关于数据集的问题

是否可以提供VQA v2和visual genome数据集的网盘文件，文中所提供的链接现在无法打开或下载

Problem with CLIP implementation

There's a bug concerning CLIP implementation, since nn.MultiheadAttention does not have a parameter called "search", which is customized in your implementation of other modules to give the model a learnable pruning parameter. However, the command runs without error. I will appreciate it if you would help me with that.

Runtime error caused by clip/mock.py or deit/mock.py while evaluating or compressing

For CLIP models, the clip/mock.py is used for patching our modification to the nn.MultiheadAttention. It was modified from the source code of the nn.MultiheadAttention in version Pytorch==1.11.0, and also tested on Pytorch==1.12.1 and Pytorch==1.13.1. However, it may not be compatible with other Pytorch versions that we have not tested. If you encounter this error in other versions, you may switch to version 1.11.0 or create your own patch file by referring to our clip/mock.py.
For DeiT models, the deit/mock.py is used for patching our modification to the timm.models. It was modified from the source code of the timm.models.vision_transformer in version timm==0.4.12 and torchvision==0.12.0. It may not be compatible with other timm and torchvision versions that we have not tested. If you encounter this error in other versions, you may switch to the above versions we used, or create your own patch file by referring to our deit/mock.py.

Question about accumulated gradients metric

Dear author,

Hello, I have read your paper and code. UPop uses the cumulative gradient of mask as metric to evaluate the weight importance. However, I don't understand why UPop prunes the parts with large cumulative gradients. Does it mean that the parts with larger cumulative gradients are less important? Is there any related research supporting this, or is it based on intuition? Could you please provide some clarification?

Thank you.

你好，我想询问一些关于微调方面的问题

我希望对BLIP模型进行剪枝，并在自己的项目数据集上进行微调，我是不是应该在retrain阶段计算loss时选用自己的数据集进行微调。
以及后续还应该在哪些参数上进行调整，或者还有什么策略可以提升微调效果。
刚刚接触这方面的研究，希望能得到您的指点，非常感谢。

你好我在渐进式剪枝这一环节有些疑问希望您解答

你好，很出色的工作！
我在渐进式剪枝这一部分有一些疑问，在论文中，您提出用累计梯度的大小作为alpha参数重要性的衡量因子，不过在论文中好像并未说明更新alpha时变动的元素是alpha的grad值大还是值小的元素？
在代码中，我看到：

sorted_alpha_grad, indices = torch.sort(alpha_grad, descending=True)
    compression_weight = torch.ones_like(indices)
    compression_weight[indices < alpha_grad_attn.numel()] = 36 # 36 = 12 (number of heads) * [1 (weights of query) + 1 (weights of key) + 1 (weights of value)]
    threshold = sorted_alpha_grad[torch.argmin(torch.abs(torch.cumsum(compression_weight, 0) - torch.sum(compression_weight)*pi))]
    
    def update(module, grad):
        mask = ((grad <= threshold) | (grad <= torch.min(grad)))
        module.data.copy_(mask + (~mask)*(1 - pi/p))

这一部分似乎说明，在update过程中是grad比threshold大的部分参数得到了更新，将其趋于0，这里我有些疑问，为何grad比threshold大的部分参数重要性就小以至于能趋于0呢？希望您解答