ssf's People

Contributors

dongzelian

ssf's Issues

Inconsistency between reproduced results and the paper's results.

Hi,

I tried to reproduce the VTAB results given in the paper, but I got the following results:

[screenshot of reproduced VTAB results]

The results are quite different from the reported ones.

And here is my cifar_100 result:

[screenshot of cifar_100 result]

Here is my running environment:

Hardware: 3090*4

Python 3.10.0

CUDA & PyTorch:

  1. CUDA 11.6.1
  2. PyTorch 1.13.1
  3. torchvision 0.14.1
  4. timm 0.6.5

I use newer toolkit versions because of compatibility problems between the 3090 driver and CUDA 10 / cuDNN 7.

Head layer of the inference model

Suppose the pre-trained model was trained on 1000 classes, and we then fine-tune it on a downstream task with 100 classes.
During inference, does the paper mean that we should use the frozen pre-trained model with the original 1000-class head layer (without modifying the network architecture)?

Where is the re-parameterization code?

Thank you for your excellent work!
I can't find the re-parameterization code. Could you please tell me where it is or share it with us?
Thanks a lot.
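For readers with the same question: in SSF-style methods, re-parameterization typically means folding the learned scale γ and shift β into the preceding linear layer's weights and bias, so inference adds no extra computation. A minimal numpy sketch of that merge, assuming the usual formulation y = γ ⊙ (Wx + b) + β (function and variable names here are illustrative, not taken from this repo):

```python
import numpy as np

def merge_ssf(W, b, gamma, beta):
    """Fold an SSF-style scale/shift into the preceding linear layer.

    Training-time computation: y = gamma * (x @ W.T + b) + beta
    Merged computation:        y = x @ W_merged.T + b_merged
    """
    W_merged = gamma[:, None] * W   # scale each output row of W
    b_merged = gamma * b + beta     # scale the bias, then shift it
    return W_merged, b_merged

# Sanity check: the merged layer matches the training-time computation.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W = rng.normal(size=(16, 8))
b = rng.normal(size=16)
gamma = rng.normal(size=16)
beta = rng.normal(size=16)

y_train = gamma * (x @ W.T + b) + beta
W_m, b_m = merge_ssf(W, b, gamma, beta)
y_merged = x @ W_m.T + b_m
assert np.allclose(y_train, y_merged)
```

The assertion verifies that the merged layer reproduces the training-time output exactly, which is the point of the merge: the scale and shift disappear into the existing weights at inference time.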

Question about Fig. 4 in the latest arxiv version.

Hi,

Thank you for sharing this wonderful work!

I have a question about Fig. 4 and the corresponding discussion in the main text, where the distributions of weights and biases seem unchanged after full fine-tuning, while SSF modulates the distributions. However, in most cases, full fine-tuning should be an upper bound compared to other parameter-efficient fine-tuning methods (except in some few-shot settings), which may indicate that we don't need to change the distribution of parameters. This seems to challenge the rationale of "the scale and shift parameters adjust the original weights and biases".

In other words, I am unsure whether we really need to modulate the distribution of parameters to adapt the pre-trained model to downstream tasks for better performance. (Also, I note that SSF achieves better performance than full fine-tuning, 93.99% vs. 93.82%, but the gap is slight.)
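For context, the "modulation" under discussion is just a per-channel affine transform. A tiny numpy illustration (the γ and β values are chosen arbitrarily) of how a scale and shift move a distribution's mean and standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a pre-trained parameter (or feature) distribution.
w = rng.normal(loc=0.0, scale=1.0, size=100_000)

gamma, beta = 0.5, 0.2       # illustrative scale and shift
w_mod = gamma * w + beta     # SSF-style affine modulation

# The scale rescales the spread and the shift translates the mean:
# w has mean ~ 0.0 and std ~ 1.0; w_mod has mean ~ 0.2 and std ~ 0.5.
```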

Looking forward to your reply, thanks in advance!

About the train and test data in VTAB.

Hi, I have a question about the train and test data in VTAB. In the VPT paper, I note that they use train800.txt to train the model and val200.txt to select the best model, and then they use train800val200.txt to run three rounds and report the average accuracy on the test data. In your released code, how do you handle the train, val, and test data? It seems that you train the model on train800val200.txt and then report the test accuracy of the last trained model? Looking forward to your reply.
code:

    train_list_path = os.path.join(self.dataset_root, 'train800val200.txt')
    test_list_path = os.path.join(self.dataset_root, 'test.txt')

and, for the hyper-parameter search setting:

    train_list_path = os.path.join(self.dataset_root, 'train800.txt')
    test_list_path = os.path.join(self.dataset_root, 'val200.txt')
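The two configurations quoted above amount to a protocol switch: search hyper-parameters on train800/val200, then run the final training on train800val200 and evaluate on test. A hypothetical helper expressing that switch (the function and flag names are my own, not from the repo):

```python
import os

def vtab_split_paths(dataset_root, search_hparams=False):
    """Return (train_list, test_list) for the two VTAB-1k protocols.

    search_hparams=True  -> train on train800, validate on val200
    search_hparams=False -> train on train800val200, evaluate on test
    """
    if search_hparams:
        return (os.path.join(dataset_root, 'train800.txt'),
                os.path.join(dataset_root, 'val200.txt'))
    return (os.path.join(dataset_root, 'train800val200.txt'),
            os.path.join(dataset_root, 'test.txt'))
```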

Performance on stanford_dogs.

Hi dongze,

The scripts on CUB work well. After running the script for the stanfordcars dataset, I only got 88.82 top-1 acc (different from the reported 89.6). Is there anything wrong with the released training script for the stanfordcars dataset? Or did I make a mistake when running the script?

Thanks for your help!

Best.

Performance of VTAB-DTD

Thank you so much for sharing the code. I tried running train_scripts/vit/vtab/dtd/train_ssf.sh and found that the result does not match the number in the paper. At the end of the 100 epochs, the top-1 accuracy is 67.44 instead of the 75.1 reported in the paper. I can reproduce the performance of most of the VTAB datasets. I'm curious whether I missed anything for DTD or whether there are mistakes in the training script for DTD. Thank you so much again, and I look forward to hearing back from you.

Preprocessed vtab-1k zip file is broken

Hi authors,

Thank you for sharing the source code for your inspirational work! I wanted to mention that the vtab-1k dataset you shared on OneDrive seems broken. The error message I got when I tried to unzip the file is "Error 79 - Inappropriate file type or format". Could you kindly look into the issue?

Update:
I figured out the correct way to unzip the five split .tar.gz files. The following command works for me:

cat filename.tar.gz.* | tar -zxv

Training script of vtab-cifar100 under full fine-tuning

Hi!
I found it hard to reproduce the cifar100 full fine-tuning result in Table 4 (68.9).
My best reproduced result uses the supervised pre-trained ViT-B with lr=0.01, wd=1e-4, and the SGD optimizer, and I got 66.3 acc@1, which is a pretty large gap from 68.9.
Could you provide the training script of vtab-cifar100 under full fine-tuning setting?
Thanks!

The split for FGVC datasets (e.g. stanford_cars) is missing.

It is very easy to reproduce the experiment on CUB, but not for stanford_cars. I found that the corresponding split is not provided for the stanford_cars dataset.

[screenshot]

Is it possible to update the code and details for this part, or to provide the split files for those datasets?

Any clarification will be appreciated!!

Training script for baselines

This is a wonderful paper! And also thank you for your neat codebase.

I am wondering if you can provide code for reproducing full-finetuning and linear-probing baselines in the paper. That would be very helpful.

How to process VTAB dataset?

I have followed the VPT repo to process the VTAB dataset, but I only get some tfrecord files.
How do I process them further?

Training process

Hi dongze,

When I try to run the script for the stanfordcars dataset, the acc@1 for both Test and Test (EMA) has been 100% since epoch 2:
( Test: [ 505/505] Time: 0.049 (0.077) Loss: 0.1048 (0.1048) Acc@1: 100.0000 (100.0000) Acc@5: 100.0000 (100.0000)
Test (EMA): [ 505/505] Time: 0.049 (0.077) Loss: 0.1050 (0.1050) Acc@1: 100.0000 (100.0000) Acc@5: 100.0000 (100.0000)
*** Best metric: 100.0 (epoch 2))

The only difference is that I run it on a single GPU, and I deleted --pretrained and added --initial-checkpoint ./pretrained_models/imagenet21k_ViT-B_16.npz (where I downloaded and placed the pre-trained ViT model) in train_ssf.sh. Other parameters are kept the same. As for the stanfordcars dataset, I have used it to run VPT, where everything was reasonable.

Did I make a mistake when running the script? Thanks for your help!

Best,

About the performance.

In the paper, Table 1 shows that SSF gets 93.99% top-1 Acc using ViT-B/16 on CIFAR-100. However, in Table 4, SSF gets 69.0% top-1 Acc using ViT-B/16 on CIFAR-100. Are there any differences in settings between these two results? It seems the same model is used.
