metaformer's Introduction

MetaFormer

A repository for the code used to create and train the model defined in “MetaFormer: A Unified Meta Framework for Fine-Grained Recognition” (arXiv:2203.02751).

MetaFormer is similar to CoAtNet, so this repo can also be seen as a reference PyTorch implementation of “CoAtNet: Marrying Convolution and Attention for All Data Sizes” (arXiv:2106.04803).

Model zoo

Name          Resolution  1k model         21k model         iNat21 model
MetaFormer-0  224x224     metafg_0_1k_224  metafg_0_21k_224  -
MetaFormer-1  224x224     metafg_1_1k_224  metafg_1_21k_224  -
MetaFormer-2  224x224     metafg_2_1k_224  metafg_2_21k_224  -
MetaFormer-0  384x384     metafg_0_1k_384  metafg_0_21k_384  metafg_0_inat21_384
MetaFormer-1  384x384     metafg_1_1k_384  metafg_1_21k_384  metafg_1_inat21_384
MetaFormer-2  384x384     metafg_2_1k_384  metafg_2_21k_384  metafg_2_inat21_384

You can also download the models from https://pan.baidu.com/s/1ZGEDoWWU7Z0vx0VCjEbe6g (password: 3uiq).

Usage

python module

  • install PyTorch and torchvision
pip install torch==1.5.1 torchvision==0.6.1
  • install timm
pip install timm==0.4.5
  • install Apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
  • install other requirements
pip install opencv-python==4.5.1.48 yacs==0.1.8

data preparation

Download iNat21, iNat18, iNat17, CUB, NABirds, Stanford Cars, and Aircraft, put them in their respective folders (<root>/datasets/<dataset_name>), and unzip the files. The folder structure is as follows:

datasets
  |————inaturelist2021
  |       └——————train
  |       └——————val
  |       └——————train.json
  |       └——————val.json
  |————inaturelist2018
  |       └——————train_val_images
  |       └——————train2018.json
  |       └——————val2018.json
  |       └——————train2018_locations.json
  |       └——————val2018_locations.json
  |       └——————categories.json
  |————inaturelist2017
  |       └——————train_val_images
  |       └——————train2017.json
  |       └——————val2017.json
  |       └——————train2017_locations.json
  |       └——————val2017_locations.json
  |————cub-200
  |       └——————...
  |————nabirds
  |       └——————...
  |————stanfordcars
  |       └——————car_ims
  |       └——————cars_annos.mat
  |————aircraft
  |       └——————...
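
Before training, it can help to confirm that the folders are where the code expects them. The following is a minimal sketch, not part of the repository: `EXPECTED_LAYOUT` and `missing_entries` are hypothetical names, the folder names are assumed to match the `<dataset-name>` values accepted by `--dataset`, and folders shown with `...` in the tree are only checked for existence.

```python
from pathlib import Path

# Hypothetical layout table based on the tree above. Folder names are assumed
# to match the --dataset values; adjust them if your checkout differs.
EXPECTED_LAYOUT = {
    "inaturelist2021": ["train", "val", "train.json", "val.json"],
    "inaturelist2018": ["train_val_images", "train2018.json", "val2018.json"],
    "inaturelist2017": ["train_val_images", "train2017.json", "val2017.json"],
    "cub-200": [],       # contents not listed in the tree, existence check only
    "nabirds": [],       # contents not listed in the tree, existence check only
    "stanfordcars": ["car_ims", "cars_annos.mat"],
    "aircraft": [],      # contents not listed in the tree, existence check only
}


def missing_entries(root, layout=EXPECTED_LAYOUT):
    """Return expected paths (relative to <root>/datasets) that do not exist."""
    base = Path(root) / "datasets"
    missing = []
    for dataset, entries in layout.items():
        folder = base / dataset
        if not folder.is_dir():
            # The whole dataset folder is absent, so skip its entries.
            missing.append(dataset)
            continue
        for entry in entries:
            if not (folder / entry).exists():
                missing.append(f"{dataset}/{entry}")
    return missing


if __name__ == "__main__":
    for path in missing_entries("."):
        print("missing:", path)
```

An empty result only means the listed names exist; it does not validate the contents of the files.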

Training

You can download pre-trained models from the model zoo and put them under <root>/pretrained. To train MetaFG on a dataset, run:

python3 -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345  main.py --cfg <config-file> --dataset <dataset-name> --pretrain <pretrained-model-path> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]

<dataset-name>: inaturelist2021, inaturelist2018, inaturelist2017, cub-200, nabirds, stanfordcars, aircraft

For CUB-200-2011, run:

python3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12345  main.py --cfg ./configs/MetaFG_1_224.yaml --batch-size 32 --tag cub-200_v1 --lr 5e-5 --min-lr 5e-7 --warmup-lr 5e-8 --epochs 300 --warmup-epochs 20 --dataset cub-200 --pretrain ./pretrained_model/<xxxx>.pth --accumulation-steps 2 --opts DATA.IMG_SIZE 384  

Note that the final learning rate is the value passed with --lr scaled by total_bs/512, where total_bs is the total batch size.
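
The scaling in the note above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: `scaled_lr` is a hypothetical helper, the value passed with `--lr` is assumed to be multiplied by total_bs/512, and total_bs is assumed to count the per-GPU batch size, the number of GPUs, and the accumulation steps.

```python
def scaled_lr(base_lr, batch_size_per_gpu, num_gpus, accumulation_steps=1,
              reference_bs=512):
    """Scale base_lr linearly with the total batch size (assumed rule)."""
    # Assumption: accumulation steps count toward the total batch size.
    total_bs = batch_size_per_gpu * num_gpus * accumulation_steps
    return base_lr * total_bs / reference_bs


if __name__ == "__main__":
    # Values from the CUB-200-2011 command: 8 GPUs, batch size 32 per GPU,
    # 2 accumulation steps, --lr 5e-5. Here total_bs equals the reference 512.
    print(scaled_lr(5e-5, batch_size_per_gpu=32, num_gpus=8, accumulation_steps=2))
    # The same command on 2 GPUs would give a 4x smaller total batch size.
    print(scaled_lr(5e-5, batch_size_per_gpu=32, num_gpus=2, accumulation_steps=2))
```

Under these assumptions, running with fewer GPUs or a smaller batch size lowers the effective learning rate unless --lr is raised to compensate.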

Eval

To evaluate a model on a dataset, run:

python3 -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> --master_port 12345  main.py --eval --cfg <config-file> --dataset <dataset-name> --resume <checkpoint> [--batch-size <batch-size-per-gpu>]

Main Result

ImageNet-1k

Name          Resolution  #Param  #FLOPS  Throughput  Top-1 acc
MetaFormer-0  224x224     28M     4.6G    840.1       82.9
MetaFormer-1  224x224     45M     8.5G    444.8       83.9
MetaFormer-2  224x224     81M     16.9G   438.9       84.1
MetaFormer-0  384x384     28M     13.4G   349.4       84.2
MetaFormer-1  384x384     45M     24.7G   165.3       84.4
MetaFormer-2  384x384     81M     49.7G   132.7       84.6

Fine-grained Datasets

Results on fine-grained datasets with different pre-trained models.

Name          Pretrain          CUB   NABirds  iNat2017  iNat2018  Cars  Aircraft
MetaFormer-0  ImageNet-1k       89.6  89.1     75.7      79.5      95.0  91.2
MetaFormer-0  ImageNet-21k      89.7  89.5     75.8      79.9      94.6  91.2
MetaFormer-0  iNaturalist 2021  91.8  91.5     78.3      82.9      95.1  87.4
MetaFormer-1  ImageNet-1k       89.7  89.4     78.2      81.9      94.9  90.8
MetaFormer-1  ImageNet-21k      91.3  91.6     79.4      83.2      95.0  92.6
MetaFormer-1  iNaturalist 2021  92.3  92.7     82.0      87.5      95.0  92.5
MetaFormer-2  ImageNet-1k       89.7  89.7     79.0      82.6      95.0  92.4
MetaFormer-2  ImageNet-21k      91.8  92.2     80.4      84.3      95.1  92.9
MetaFormer-2  iNaturalist 2021  92.9  93.0     82.8      87.7      95.4  92.8

Results on iNaturalist 2017, iNaturalist 2018, and iNaturalist 2021 with meta-information.

Name          Pretrain      Meta added  iNat2017    iNat2018    iNat2021
MetaFormer-0  ImageNet-1k   N           75.7        79.5        88.4
MetaFormer-0  ImageNet-1k   Y           79.8(+4.1)  85.4(+5.9)  92.6(+4.2)
MetaFormer-1  ImageNet-1k   N           78.2        81.9        90.2
MetaFormer-1  ImageNet-1k   Y           81.3(+3.1)  86.5(+4.6)  93.4(+3.2)
MetaFormer-2  ImageNet-1k   N           79.0        82.6        89.8
MetaFormer-2  ImageNet-1k   Y           82.0(+3.0)  86.8(+4.2)  93.2(+3.4)
MetaFormer-2  ImageNet-21k  N           80.4        84.3        90.3
MetaFormer-2  ImageNet-21k  Y           83.4(+3.0)  88.7(+4.4)  93.6(+3.3)

Citation

@article{MetaFormer,
  title={MetaFormer: A Unified Meta Framework for Fine-Grained Recognition},
  author={Diao, Qishuai and Jiang, Yi and Wen, Bin and Sun, Jia and Yuan, Zehuan},
  journal={arXiv preprint arXiv:2203.02751},
  year={2022},
}

Acknowledgement

Many thanks to swin-transformer. Part of the code is borrowed from it.

metaformer's People

Contributors

bakingbrains, dqshuai, ifighting

metaformer's Issues

About running on one GPU

I have only one GPU. I set local_rank=-1, but the code failed to run.
What do I need to revise to successfully run on one GPU?

NAbirds dataset

Hello, can you provide the NABirds dataset? I can't download it from the official website. Thank you very much.

metadata-generation-failed

Hello, I ran into a problem while setting up the environment:
cwd: /home/gpu/PycharmProjects/MetaFormer-master/apex/
Preparing metadata (setup.py) ... error
error: metadata-generation-failed
× Encountered error while generating package metadata.

Could you please take a look at what went wrong? I followed every step in your README. Thank you.

meta data

Hi, may I ask where the meta data came from? Time, latitude, and longitude are not provided in the dataset.

bert_embedding_cub

Hello, could you provide the bert_embedding_cub file? I get an error when training with meta information. Thank you very much!

Checkpoint on iNaturalist 2018

Thanks for your wonderful work. If it is possible, could you please share your checkpoints on iNaturalist 2018? Thank you very much!

model zoo

Hi, could you please share the model zoo via Baidu cloud disk? Thanks! @dqshuai

bert_embedding_cub

Hello, could you please provide the bert_embedding_cub file? Thank you very much!

model weights for MetaFormer-2 fine tuned on iNat 2018

Are the weights for MetaFormer-2 fine-tuned on iNat 2018 available? I'm doing research with this model, and it would save a lot of training time and computation resources if I could get them without retraining. Thanks!

RuntimeError in

Hi, thanks for your patience.
I'm new here, and when I try to run training on CUB-200, I get the error below.
Could you please help? Thanks.

Here is my run command:
python3 -m torch.distributed.launch --nproc_per_node 6 --master_port 12345 main.py --cfg ./configs/MetaFG_meta_1_224.yaml --batch-size 12 --tag cub-200_v1 --lr 5e-5 --min-lr 5e-7 --warmup-lr 5e-8 --epochs 2500 --warmup-epochs 20 --dataset cub-200 --accumulation-steps 2 --opts DATA.IMG_SIZE 224

And the error

Traceback (most recent call last):
  File "main.py", line 403, in <module>
    main(config)
  File "main.py", line 163, in main
    train_one_epoch_local_data(config, model, criterion, data_loader_train, optimizer, epoch, mixup_fn, lr_scheduler)
  File "main.py", line 210, in train_one_epoch_local_data
    outputs = model(samples,meta)
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/MetaFormer/models/MetaFG_meta.py", line 231, in forward
    x = self.forward_features(x,meta)
  File "/home/MetaFormer/models/MetaFG_meta.py", line 171, in forward_features
    metas = torch.split(meta,self.meta_dims,dim=1)
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/functional.py", line 156, in split
    return tensor.split(split_size_or_sections, dim)
  File "/home/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/tensor.py", line 499, in split
    return super(Tensor, self).split_with_sizes(split_size, dim)
RuntimeError: split_with_sizes expects split_sizes to sum exactly to 32 (input tensor's size at dimension 1), but got split_sizes=[4, 3]
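
The message says the meta tensor has 32 columns along dimension 1 while the model is configured to split it into chunks of sizes [4, 3], which sum to 7. The traceback alone does not show whether the data or the configured sizes are wrong. The precondition can be sketched in plain Python as below; `check_split_sizes` is a hypothetical helper for illustration and is not part of the repository or of PyTorch.

```python
def check_split_sizes(width, split_sizes):
    """Return (start, end) column ranges, or raise if sizes do not sum to width."""
    total = sum(split_sizes)
    if total != width:
        raise ValueError(
            f"split sizes {list(split_sizes)} sum to {total}, "
            f"but the tensor has {width} columns"
        )
    ranges = []
    start = 0
    for size in split_sizes:
        # Each chunk covers columns [start, start + size).
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # A 7-column meta tensor matches sizes [4, 3].
    print(check_split_sizes(7, [4, 3]))
    # The case from the traceback: 32 columns against sizes [4, 3].
    try:
        check_split_sizes(32, [4, 3])
    except ValueError as err:
        print("mismatch:", err)
```

Running a check like this on the meta tensor's second dimension before the forward pass reports the mismatch earlier and with both numbers visible.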

about pre-trained checkpoint

Very nice job!

Could you please provide some pre-trained checkpoints? For example, the 92.3% CUB accuracy MetaFormer-1, and the 92.9% CUB accuracy MetaFormer-2?

I appreciate your generosity!

Some errata found on the code

Hi,
Thanks for sharing wonderful work :)

While running the code, I found some minor errata building the data augmentation module.

from timm.data.transforms import _pil_interp

I guess this line should be replaced with
from timm.data.transforms import str_to_pil_interp
since there is no _pil_interp in timm.data.transforms. https://github.com/rwightman/pytorch-image-models/blob/master/timm/data/transforms.py

If there is something I missed, please let me know :)
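
One way to tolerate both names is to look up whichever one the installed module exposes. The sketch below is an illustration only: `first_available` is a hypothetical helper, and the two timm names are taken from this issue rather than checked against a particular timm release.

```python
import importlib


def first_available(module_name, candidate_names):
    """Return the first attribute in candidate_names that the module exposes."""
    module = importlib.import_module(module_name)
    for name in candidate_names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {list(candidate_names)} found in {module_name}")


if __name__ == "__main__":
    # Demonstration with a standard-library module so it runs without timm.
    sqrt = first_available("math", ["no_such_name", "sqrt"])
    print(sqrt(9.0))
    # With timm installed, the lookup from this issue would be:
    # pil_interp = first_available(
    #     "timm.data.transforms", ["str_to_pil_interp", "_pil_interp"]
    # )
```

Pinning timm to the version listed in the installation steps avoids the need for such a fallback.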

About how to get meta data?

Hi, I have downloaded the NABirds data from the link you provided, but I can't find the meta data. Can you tell me how to get it? Thanks!

Is it fair to use larger pretrained model?

Hi,
First of all, congratulations on your great work!

I have always worried about the effect of pretraining for FGVC. There is a high risk of overlap between the pretraining dataset and the fine-tuning dataset. Take CUB for example: it has already been found that the CUB200-2011 test set contains images that overlap with the ImageNet-1k training set (see here). So it is quite possible that CUB overlaps even more with ImageNet-21k and iNaturalist. There seem to be two possible explanations for the obvious improvement when using a model pretrained on a larger dataset:

  1. the pretrained model may have learned some generally useful structures that improve performance on the CUB task; this is good
  2. the model pretrained on a larger dataset has simply seen more images from the CUB test set, so it performs well; this is bad

So what is your opinion about this risk?

I have a question about "linear embedding" and "non-linear embedding".

Thanks for all your great work!

I have two questions about the paper.

  1. Is figure 2 on page 4 of the paper and figure 1 on page 10 of the paper referring to the same architecture?

  2. The terms "non-linear embedding" and "linear embedding" are both used to describe embedding meta-information, but if the figures refer to the same architecture, what is the intention behind the different designations?
    Neural networks alternate between layers that perform linear transformations and activation functions that perform non-linear transformations. Is it correct to say that you used "non-linear embedding" because you use the ReLU activation function, which performs a non-linear transformation?

Danish Fungi 2020 - Performance Evaluation

Dear All,

This is more a feature request than a bug. Could you test your method on the DF20 dataset? We include much more metadata within the dataset, so a performance comparison with a regular ViT architecture might be interesting.

The link to the paper and web follows:

Best,
Lukas

Regarding Inferencing.

I have trained the model for 28 epochs on the CUB-200 dataset, and I wrote a small inference script that accepts a single image along with its meta info. However, its predictions are not good at all. Do I need to train for longer?

Any suggestions here?

Thank you.

CUDA Version?

Which CUDA version is needed to run this project?
Maybe conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=10.2 -c pytorch?

Regarding object detection.

Does it work well for detection and localization tasks? (From the paper, it should work well for classifying objects with meta info.)

Any suggestions here?

About how to get meta data?

Hi, I have downloaded the cub-200 data from the link you provided, but I can't find the meta data. Can you tell me how to get it? Thanks!

Regarding embedding files for cub.

Could you please tell me whether this is the right way to generate the embeddings?

from transformers import AutoTokenizer, AutoModel
import pickle

# Plain-text file holding the description to embed.
text_file = "file.txt"

with open(text_file, 'r') as rr:
    data = rr.read()

# Load the pre-trained BERT tokenizer and encoder.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize the whole file as one sequence and run it through BERT.
inputs = tokenizer(data, return_tensors="pt")
outputs = model(**inputs)
print(outputs[0])

# outputs[0] is the last hidden state, with one vector per token.
embeddings = {'embedding_words': outputs[0].detach().numpy()}

with open('filepickle', 'wb') as pkl:
    pickle.dump(embeddings, pkl)

Thank You
