AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

This is a PyTorch implementation of the paper AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition.

Shoufa Chen1*, Chongjian Ge1*, Zhan Tong2, Jiangliu Wang2,3, Yibing Song2, Jue Wang2, Ping Luo1
1The University of Hong Kong, 2Tencent AI Lab, 3The Chinese University of Hong Kong
*denotes equal contribution

Catalog

  • Video code
  • Image code

Usage

Install

  • Tesla V100 (32G): CUDA 10.1 + PyTorch 1.6.0 + torchvision 0.7.0
  • timm 0.4.8
  • einops
  • easydict
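A quick way to compare an existing environment against the list above is sketched here. It is only a convenience check and not part of the repository: the function name `check_environment` is hypothetical, the script assumes Python 3.8 or newer (for `importlib.metadata`), and it assumes the PyPI distribution names `torch`, `torchvision`, `timm`, `einops`, and `easydict`.

```python
from importlib import metadata

# Versions as listed in the Install section; einops and easydict are unpinned there.
PINNED = {"torch": "1.6.0", "torchvision": "0.7.0", "timm": "0.4.8"}
UNPINNED = ["einops", "easydict"]


def check_environment(pinned=PINNED, unpinned=UNPINNED):
    """Return {package: status}; status is 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Builds such as '1.6.0+cu101' carry a local suffix; compare the base only.
        base = found.split("+")[0]
        report[name] = "ok" if base == wanted else f"found {found}, listed {wanted}"
    for name in unpinned:
        try:
            metadata.version(name)
            report[name] = "ok"
        except metadata.PackageNotFoundError:
            report[name] = "missing"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

A mismatch does not necessarily mean the code will fail; the versions above are simply the ones this README lists.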

Data Preparation

See DATASET.md.

Training

Start

# video
OMP_NUM_THREADS=1 python3 -m torch.distributed.launch \
    --nproc_per_node=8 --nnodes=8 \
    --node_rank=$1 --master_addr=$2 --master_port=22234 \
    --use_env main_video.py \
    --finetune /path/to/pre_trained/checkpoints \
    --output_dir /path/to/output \
    --batch_size 16 --epochs 90 --blr 0.1 --weight_decay 0.0 --dist_eval \
    --data_path /path/to/SSV2 --data_set SSV2 \
    --ffn_adapt

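Run this command on each of the 8 nodes. Set --master_addr to the IP address of node 0, and set --node_rank to 0, 1, ..., 7 on the respective nodes. In the command above these two values are taken from the positional arguments $1 (node rank) and $2 (master address).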
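To make the process layout of the video command concrete, the following sketch reproduces the rank arithmetic that torch.distributed.launch applies for --nnodes=8 and --nproc_per_node=8. The helper names are hypothetical and exist only for illustration; the training scripts do not need them, since the launcher exports values such as RANK, LOCAL_RANK, and WORLD_SIZE as environment variables.

```python
def world_size(nnodes, nproc_per_node):
    """Total number of processes across all nodes."""
    return nnodes * nproc_per_node


def global_rank(node_rank, local_rank, nproc_per_node):
    """Global rank of the process with index local_rank on node node_rank."""
    return node_rank * nproc_per_node + local_rank


def ranks_on_node(node_rank, nproc_per_node):
    """All global ranks hosted by a single node."""
    return [global_rank(node_rank, i, nproc_per_node) for i in range(nproc_per_node)]


if __name__ == "__main__":
    # Values taken from the video command: 8 nodes with 8 processes each.
    print("world size:", world_size(8, 8))
    print("ranks on node 0:", ranks_on_node(0, 8))
    print("ranks on node 7:", ranks_on_node(7, 8))
```

With these settings the job runs 64 processes in total; node 0 hosts global ranks 0 to 7 and node 7 hosts ranks 56 to 63.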

# image
python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_image.py \
    --batch_size 128 --cls_token \
    --finetune /path/to/pre_trained/mae_pretrain_vit_b.pth \
    --dist_eval --data_path /path/to/data \
    --output_dir /path/to/output  \
    --drop_path 0.0  --blr 0.1 \
    --dataset cifar100 --ffn_adapt

To obtain the pre-trained checkpoint, see PRETRAIN.md.
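Both commands pass --blr 0.1 rather than an absolute learning rate. The sketch below shows how a base learning rate is usually converted under the MAE convention, on which this project is based. It rests on two assumptions that this README does not state: that --batch_size is the per-process batch size, and that the scripts apply lr = blr * effective_batch_size / 256 as MAE does. The function names and the `accum_iter` parameter are illustrative only; check main_image.py and main_video.py for the actual behavior.

```python
def effective_batch_size(batch_size, nproc_per_node, nnodes=1, accum_iter=1):
    """Global batch size, assuming batch_size is per process (MAE convention)."""
    return batch_size * accum_iter * nproc_per_node * nnodes


def absolute_lr(blr, eff_batch_size, base=256):
    """Linear scaling rule used by MAE: lr = blr * effective_batch_size / 256."""
    return blr * eff_batch_size / base


if __name__ == "__main__":
    # Image command: 1 node, 8 processes, --batch_size 128.
    image_bs = effective_batch_size(128, nproc_per_node=8)
    # Video command: 8 nodes, 8 processes each, --batch_size 16.
    video_bs = effective_batch_size(16, nproc_per_node=8, nnodes=8)
    print("image:", image_bs, absolute_lr(0.1, image_bs))
    print("video:", video_bs, absolute_lr(0.1, video_bs))
```

Under these assumptions both commands use a global batch size of 1024. If you train on fewer GPUs, the same rule suggests the absolute learning rate shrinks proportionally unless the per-process batch size is raised to compensate.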

Acknowledgement

This project is based on MAE, VideoMAE, timm, and MAM. Thanks to their authors for the excellent work.

Citation

@article{chen2022adaptformer,
      title={AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition},
      author={Chen, Shoufa and Ge, Chongjian and Tong, Zhan and Wang, Jiangliu and Song, Yibing and Wang, Jue and Luo, Ping},
      journal={arXiv preprint arXiv:2205.13535},
      year={2022}
}

License

This project is under the MIT license. See LICENSE for details.
