Code Monkey home page Code Monkey logo

patchvisiontransformer's Introduction

VisionTransformer

This repository contains PyTorch evaluation code, training code and pretrained models.

We will further upload results on other model architectures.

Getting Started

cd ./deit

Before using it, make sure you have the pytorch-image-models [timm] package by Ross Wightman installed. Note that our work relies of the augmentations proposed in this library.

SWIN

We provide models pretrained on ImageNet 2012 and finetune the checkpoint on segmentation datasets.

name acc@1 #params url
SWIN-Large 87.4 197M model

We finetune the checkpoint and get following results.

name mIoU mIoU (ms + flip) #params url
ADE20K 83.0 54.4 234M model, log
CityScapes 82.9 83.9 234M model, log

For more details, please refer to README.

DEIT

Model Zoo

We provide models pretrained on ImageNet 2012. More models will be uploaded.

name acc@1 acc@5 #params url
VIT-B12 82.9 96.3 86M model
VIT-B24 83.3 96.4 172M model
VIT-B12-384 84.2 97.0 86M model

We finetune the checkpoint proposed by VIT.

name acc@1 acc@5 #params url
VIT-L24 83.9 96.7 305M model
VIT-L24-384 85.4 96.7 305M model

Evaluate

For Deit-B12, run:

python -m torch.distributed.launch --nproc_per_node=XX --master_port=XX --use_env main.py --model deit_base_patch16_224 --aa rand-m9-mstd0.5-inc1  --input-size 224 --batch-size 16 --num_workers 2 --data-path path --output_dir output_dir --resume model.pth --eval

giving

Acc@1 82.928 Acc@5 96.342 loss 0.721

Train

The training code is not fully available and the results are currently not reproducable. Please wait for our updates.

For Deit-B12, run:

python -m torch.distributed.launch --nproc_per_node=XX --master_port=XX --use_env main.py --model deit_base_patch16_224 --aa rand-m9-mstd0.5-inc1  --input-size 224 --batch-size 72 --num_workers 4 --data-path path --output_dir output_dir -no-repeated-aug --epochs 300 --model-ema-decay 0.99996 --drop-path 0.5 --drop .0 --mixup .0 --mixup-switch-prob 0.0 

and further refine the model by

python -m torch.distributed.launch --nproc_per_node=XX --master_port=XX --use_env main.py --model deit_base_patch16_224 --aa rand-m9-mstd0.5-inc1  --input-size 224 --batch-size 72 --num_workers 4 --data-path path --output_dir output_dir -no-repeated-aug --start_epoch 300 --epochs 400 --resume model.pth --model-ema-decay 0.99996 --drop-path 0.75 --drop .0 --mixup .0 --mixup-switch-prob 0.0 

Finetune Models Trained on ImageNet-22k

python -m torch.distributed.launch --nproc_per_node=XX --master_port=XX --use_env main.py --model deit_large_patch16_224 --aa rand-n1-m1-mstd0.5-inc1 --input-size 224 --batch-size 16 --num_workers 1 --data-path path --output_dir output_dir -no-repeated-aug --smoothing 1e-6 --weight-decay 1e-8 --lr 5e-5 --start_epoch 0 --reprob 1e-6 --resume vit_checkpoint --epochs 40 --model-ema-decay 0.99996 --drop-path 0. --drop .0 --mixup .0 --mixup-switch-prob 0.0 --no-use-talk

evaluate

python -m torch.distributed.launch --nproc_per_node=XX --master_port=XX --use_env main.py --model deit_large_patch16_224 --aa rand-n1-m1-mstd0.5-inc1 --input-size 224 --batch-size 16 --num_workers 1 --data-path path --output_dir output_dir -no-repeated-aug --smoothing 1e-6 --weight-decay 1e-8 --lr 5e-5 --start_epoch 0 --reprob 1e-6 --resume vit_checkpoint --epochs 40 --model-ema-decay 0.99996 --drop-path 0. --drop .0 --mixup .0 --mixup-switch-prob 0.0 --no-use-talk --eval 

patchvisiontransformer's People

Contributors

chengyuegongr avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.