Code Monkey home page Code Monkey logo

efficientvit's Introduction

EfficientViT

About EfficientViT Models

EfficientViT is a new family of vision models for efficient high-resolution vision, especially segmentation. The core building block of EfficientViT is a new lightweight multi-scale attention module that achieves global receptive field and multi-scale learning with only hardware-efficient operations.

Here are comparisons with prior SOTA semantic segmentation models:

Here are the results of EfficientViT on image classification:

News

  • [2023/07/18] EfficientViT is accepted by ICCV 2023.

Getting Started

Installation

conda create -n efficientvit python=3.10
conda activate efficientvit
conda install -c conda-forge mpi4py openmpi
pip install -r requirements.txt

Dataset

Download Pretrained Models

Latency is measured on NVIDIA Jetson Nano and NVIDIA Jetson AGX Orin with TensorRT, fp16, batch size 1.

ImageNet

Model Resolution ImageNet Top1 Acc ImageNet Top5 Acc Params MACs Jetson Nano Jetson Orin Checkpoint
EfficientViT-B1 224 79.4 94.3 9.1M 0.52G 24.8ms 1.48ms link
EfficientViT-B1 256 79.9 94.7 9.1M 0.68G 28.5ms 1.57ms link
EfficientViT-B1 288 80.4 95.0 9.1M 0.86G 34.5ms 1.82ms link
EfficientViT-B2 224 82.1 95.8 24M 1.6G 50.6ms 2.63ms link
EfficientViT-B2 256 82.7 96.1 24M 2.1G 58.5ms 2.84ms link
EfficientViT-B2 288 83.1 96.3 24M 2.6G 69.9ms 3.30ms link
EfficientViT-B3 224 83.5 96.4 49M 4.0G 101ms 4.36ms link
EfficientViT-B3 256 83.8 96.5 49M 5.2G 120ms 4.74ms link
EfficientViT-B3 288 84.2 96.7 49M 6.5G 141ms 5.63ms link

Cityscapes

Model Resolution Cityscapes mIoU Params MACs Jetson Nano Jetson Orin Checkpoint
EfficientViT-B0 1024x2048 75.7 0.7M 4.4G 275ms 9.9ms link
EfficientViT-B1 1024x2048 80.5 4.8M 25G 819ms 24.3ms link
EfficientViT-B2 1024x2048 82.1 15M 74G 1676ms 46.5ms link
EfficientViT-B3 1024x2048 83.0 40M 179G 3192ms 81.8ms link

ADE20K

Model Resolution ADE20K mIoU Params MACs Jetson Nano Jetson Orin Checkpoint
EfficientViT-B1 512 42.8 4.8M 3.1G 110ms 4.0ms link
EfficientViT-B2 512 45.9 15M 9.1G 212ms 7.3ms link
EfficientViT-B3 512 49.0 39M 22G 411ms 12.5ms link

Usage

from efficientvit.cls_model_zoo import create_cls_model

model = create_cls_model(
  name="b3", 
  pretrained=True, 
  weight_url="assets/checkpoints/cls/b3-r288.pt"
)
from efficientvit.seg_model_zoo import create_seg_model

model = create_seg_model(
  name="b3", 
  dataset="cityscapes", 
  pretrained=True, 
  weight_url="assets/checkpoints/seg/cityscapes/b3.pt"
)
from efficientvit.seg_model_zoo import create_seg_model

model = create_seg_model(
  name="b3", 
  dataset="ade20k", 
  pretrained=True, 
  weight_url="assets/checkpoints/seg/ade20k/b3.pt"
)

Evaluation

Please run eval_cls_model.py or eval_seg_model.py to evaluate our models.

Examples: classification, segmentation

Visualization

Please run eval_seg_model.py to visualize the outputs of our semantic segmentation models.

Example:

python eval_seg_model.py --dataset cityscapes --crop_size 1024 --model b3 --save_path demo/cityscapes/b3/

Benchmarking with TFLite

To generate TFLite files, please refer to tflite_export.py. It requires the TinyNN package.

pip install git+https://github.com/alibaba/TinyNeuralNetwork.git

Example:

python tflite_export.py --export_path model.tflite --task seg --dataset ade20k --model b3 --resolution 512 512

Benchmarking with TensorRT

To generate onnx files, please refer to onnx_export.py.

Training

Please see TRAINING.md for detailed training instructions.

Contact

Han Cai: [email protected]

TODO

  • ImageNet Pretrained models
  • Segmentation Pretrained models
  • ImageNet training code
  • Super resolution models
  • Object detection models
  • Segmentation training code

Citation

If EfficientViT is useful or relevant to your research, please kindly recognize our contributions by citing our paper:

@article{cai2022efficientvit,
  title={Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition},
  author={Cai, Han and Gan, Chuang and Han, Song},
  journal={arXiv preprint arXiv:2205.14756},
  year={2022}
}

efficientvit's People

Contributors

han-cai avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.