
CTVIS: Consistent Training for Online Video Instance Segmentation

Kaining Ying¹,²*,   Qing Zhong⁴*,   Weian Mao⁴,   Zhenhua Wang³#,   Hao Chen¹#

Lin Yuanbo Wu⁵,   Yifan Liu⁴,   Chengxiang Fan¹,   Yunzhi Zhuge⁴,   Chunhua Shen¹

¹Zhejiang University,   ²Zhejiang University of Technology

³Northwest A&F University,   ⁴The University of Adelaide,   ⁵Swansea University

📰 News

  • [2023/06/18] CTVIS won 2nd place in the Pixel-level Video Understanding Challenge (VPS Track) at CVPR 2023.
  • [2023/07/14] Our work CTVIS has been accepted to ICCV 2023! Congrats! ✌️
  • [2023/07/24] We will release the code ASAP. Stay tuned!
  • [2023/07/31] We have released the code and the YTVIS19_R50 weights.
  • [2023/08/24] CTVIS won 2nd place in the 5th Large-scale Video Object Segmentation Challenge, Track 2: Video Instance Segmentation, at ICCV 2023.

🔨 Install

The following commands build the conda environment and install the dependencies.

conda create -n ctvis python=3.10 -y 
conda activate ctvis
pip install torch==2.0.0 torchvision  

# install Detectron2 (D2)
git clone https://gitee.com/yingkaining/detectron2.git
python -m pip install -e detectron2

# install mmcv
pip install openmim
mim install "mmcv==1.7.1"

pip install -r requirements.txt

# compile the multi-scale deformable attention CUDA ops used by the pixel decoder
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../../
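As an optional sanity check after installation, the sketch below compares installed package versions against the pins used in the commands above (Python 3.10, torch 2.0.0, mmcv 1.7.1). It uses only the standard library. The `check_versions` helper and the `EXPECTED` mapping are hypothetical. It also assumes that `mim install "mmcv==1.7.1"` registers the distribution under the name `mmcv`.

```python
# Hypothetical post-install check, not part of the CTVIS repository.
# EXPECTED mirrors the version pins in the install commands (an assumption).
import sys
from importlib import metadata

EXPECTED = {"torch": "2.0.0", "mmcv": "1.7.1"}


def check_versions(expected, lookup=metadata.version):
    """Return {package: problem} for every package that is missing or mismatched."""
    problems = {}
    for pkg, want in expected.items():
        try:
            have = lookup(pkg)
        except metadata.PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        # torch wheels may carry a local build suffix such as "+cu118"; ignore it.
        if have.split("+")[0] != want:
            problems[pkg] = f"found {have}, expected {want}"
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 10):
        print(f"warning: running Python {sys.version.split()[0]}, the env above uses 3.10")
    for pkg, msg in check_versions(EXPECTED).items():
        print(f"{pkg}: {msg}")
```

The `lookup` parameter is injectable, so the check can be exercised without any packages installed.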

🏀 Dataset Preparation

We recommend organizing the datasets in the following structure; refer to this for more details.

$DETECTRON2_DATASETS
+-- coco
|   |
|   +-- annotations
|   |   |
|   |   +-- instances_{train,val}2017.json
|   |   +-- coco2ytvis2019_train.json
|   |   +-- coco2ytvis2021_train.json
|   |   +-- coco2ovis_train.json
|   |
|   +-- {train,val}2017
|       |
|       +-- *.jpg
|
+-- ytvis_2019
|   ...
|
+-- ytvis_2021
|   ...
|
+-- ovis
    ...
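A small script can catch path mistakes before training starts. The sketch below, a hypothetical helper not shipped with CTVIS, lists which entries of the layout above are missing. It checks the COCO files individually but only the top-level folders for YTVIS and OVIS, because their contents are elided ("...") in the tree. It falls back to `./datasets` when `$DETECTRON2_DATASETS` is unset, which is Detectron2's default; that CTVIS keeps this default is an assumption.

```python
# Hypothetical layout check for $DETECTRON2_DATASETS (illustration only).
import os
from pathlib import Path

COCO_FILES = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/annotations/coco2ytvis2019_train.json",
    "coco/annotations/coco2ytvis2021_train.json",
    "coco/annotations/coco2ovis_train.json",
]
# Only top-level folders are checked where the tree above elides the contents.
TOP_DIRS = ["coco/train2017", "coco/val2017", "ytvis_2019", "ytvis_2021", "ovis"]


def missing_paths(root):
    """Return the expected files/folders that do not exist under root."""
    root = Path(root)
    missing = [p for p in COCO_FILES if not (root / p).is_file()]
    missing += [d for d in TOP_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    root = os.environ.get("DETECTRON2_DATASETS", "datasets")
    for path in missing_paths(root):
        print("missing:", path)
```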

Note that the annotation files coco2ytvis2019_train.json, coco2ytvis2021_train.json and coco2ovis_train.json are produced by the following post-processing command:

python tools/convert_coco2ytvis.py
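To show what "COCO in YouTube-VIS format" can mean, here is a minimal sketch that wraps each COCO image as a one-frame video in YouTube-VIS-style JSON. This is NOT the logic of tools/convert_coco2ytvis.py. The function name is hypothetical, and the real script presumably also remaps COCO categories to each target dataset's classes, which this sketch omits.

```python
# Illustrative only: COCO image -> single-frame "video" in YouTube-VIS-style JSON.
from collections import defaultdict


def coco_to_single_frame_ytvis(coco):
    """Convert a loaded COCO dict into a YTVIS-style dict with length-1 videos."""
    anns_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)

    videos, annotations = [], []
    for img in coco["images"]:
        videos.append({
            "id": img["id"],
            "file_names": [img["file_name"]],  # exactly one frame per video
            "height": img["height"],
            "width": img["width"],
            "length": 1,
        })
        for ann in anns_by_image[img["id"]]:
            annotations.append({
                "id": len(annotations) + 1,
                "video_id": img["id"],
                "category_id": ann["category_id"],  # no category remapping here
                # YTVIS stores one entry per frame, so each list has length 1
                "segmentations": [ann["segmentation"]],
                "bboxes": [ann["bbox"]],
                "areas": [ann["area"]],
                "iscrowd": ann.get("iscrowd", 0),
            })
    return {"videos": videos, "annotations": annotations,
            "categories": coco["categories"]}
```

Treating still images as length-1 clips lets image-level data reuse the video data path. Any cross-frame augmentation the real pipeline applies is outside the scope of this sketch.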

To visualize a dataset, use the following script (shown here for YTVIS19):

python browse_datasets.py ytvis_2019_train --save-dir /path/to/save/dir

⚾️ Training and Evaluation

Training

We initialize from Mask2Former weights pretrained on MS-COCO. Download them first and place them in checkpoints/.

Mask2Former-R50-COCO: Official Download Link

Mask2Former-SwinL-COCO: Official Download Link

Next, you can train CTVIS, for example on YTVIS19 with the R50 backbone:

python train_ctvis.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --num-gpus 8 OUTPUT_DIR work_dirs/CTVIS_YTVIS19_R50
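The trailing `OUTPUT_DIR ...` and `MODEL.WEIGHTS ...` arguments in these commands are `KEY VALUE` overrides applied on top of the YAML config. Detectron2 does this with yacs' `CfgNode.merge_from_list`. The toy re-implementation below uses plain dicts to show the idea. It is not the library code; for example, it skips yacs' type checking.

```python
# Toy illustration of KEY VALUE config overrides (yacs-style), stdlib only.
import ast


def merge_from_list(cfg, opts):
    """Apply ["A.B", "value", ...] overrides to a nested dict in place."""
    if len(opts) % 2:
        raise ValueError(f"expected KEY VALUE pairs, got {opts}")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # KeyError on an unknown section
        if leaf not in node:
            raise KeyError(f"non-existent config key: {key}")
        try:
            value = ast.literal_eval(raw)  # numbers, booleans, lists, ...
        except (ValueError, SyntaxError):
            value = raw  # anything else, e.g. paths, stays a string
        node[leaf] = value
    return cfg
```

Rejecting unknown keys (rather than silently adding them) surfaces typos such as a misspelled `MODEL.WEIGHTS`. yacs is also strict about unknown keys by default.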

Evaluation

During training, the model is evaluated on the validation set periodically. You can also evaluate a trained model separately:

python train_ctvis.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --eval-only --num-gpus 8 OUTPUT_DIR work_dirs/CTVIS_YTVIS19_R50 MODEL.WEIGHTS /path/to/model/weight/file

You can download the model weights from the Model Zoo. Finally, the prediction files must be submitted to CodaLab to obtain the AP. We recommend using the following script to push the submission to CodaLab, and we thank this project for providing such a useful feature.

python tools/codalab_upload.py --result-dir /path/to/your/submission/dir --id ytvis19 --account your_codalab_account_email --password your_codalab_account_password
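To upload through the CodaLab web page instead, the prediction file has to be zipped first. The sketch below assumes two things: that evaluation writes `results.json` into the result directory, and that the server expects a zip with `results.json` at the archive root. Verify both against the challenge page. `make_submission` is a hypothetical helper.

```python
# Hypothetical helper for manual CodaLab uploads (assumed file names).
import zipfile
from pathlib import Path


def make_submission(result_dir, zip_name="submission.zip"):
    """Zip <result_dir>/results.json into <result_dir>/<zip_name>."""
    result_dir = Path(result_dir)
    results = result_dir / "results.json"
    if not results.is_file():
        raise FileNotFoundError(results)
    out = result_dir / zip_name
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps results.json at the archive root, without parent folders
        zf.write(results, arcname="results.json")
    return out
```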

Demo and Visualization

We support inference on specified videos (demo/demo.py) as well as visualization of all videos in a given dataset (demo/visualize_all_videos.py).

# demo
python demo/demo.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --video-input /path/to/video --output /path/to/save/output --save-frames --opts MODEL.WEIGHTS /path/to/your/checkpoint

💽 Model Zoo

YouTube-VIS 2019

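| Model | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 55.2 | 79.5 | 60.2 | 51.3 | 63.7 | 1Drive |
| CTVIS | Swin-L (200 queries) | 65.6 | 87.7 | 72.2 | 56.5 | 70.4 | |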

YouTube-VIS 2021

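| Model | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 50.1 | 73.7 | 54.7 | 41.8 | 59.5 | |
| CTVIS | Swin-L (200 queries) | 61.2 | 84 | 68.8 | 48 | 65.8 | |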

YouTube-VIS 2022

Note: YouTube-VIS 2022 shares the same training set as YouTube-VIS 2021.

| Model | Backbone | AP | APS | APL | Link |
| --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 44.9 | 50.3 | 39.4 | |
| CTVIS | Swin-L (200 queries) | 53.8 | 61.2 | 46.4 | |
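YouTube-VIS 2022 results are commonly reported as AP on short videos (APS) and on long videos (APL). The check below assumes the overall AP is their mean. The reported values above are consistent with that assumption, matching to within rounding.

```python
# Consistency check on the YouTube-VIS 2022 table (assumes AP = mean(APS, APL)).
ROWS = {
    "ResNet-50": (50.3, 39.4, 44.9),             # (APS, APL, reported AP)
    "Swin-L (200 queries)": (61.2, 46.4, 53.8),
}


def overall_ap(ap_s, ap_l):
    """Average of short-video and long-video AP."""
    return (ap_s + ap_l) / 2


for backbone, (ap_s, ap_l, ap) in ROWS.items():
    print(f"{backbone}: mean = {overall_ap(ap_s, ap_l):.2f}, reported AP = {ap}")
```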

OVIS

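| Model | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 35.5 | 60.8 | 34.9 | 16.1 | 41.9 | |
| CTVIS | Swin-L (200 queries) | 46.9 | 71.5 | 47.5 | 19.1 | 52.1 | |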

🫡 Acknowledgements

We sincerely thank HIGH-FLYER for providing valuable computational resources. We would also like to express our gratitude to the following open-source projects for their inspiration:

🪪 License

The content of this project itself is licensed under LICENSE.

📇 Cite our Paper

If you find this project useful in your research, please consider citing our paper:

@misc{ying2023ctvis,
      title={{CTVIS}: {C}onsistent {T}raining for {O}nline {V}ideo {I}nstance {S}egmentation}, 
      author={Kaining Ying and Qing Zhong and Weian Mao and Zhenhua Wang and Hao Chen and Lin Yuanbo Wu and Yifan Liu and Chengxiang Fan and Yunzhi Zhuge and Chunhua Shen},
      year={2023},
      eprint={2307.12616},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
