SSM-VOS: Separable Structure Modeling for Semi-supervised Video Object Segmentation [paper]

Framework

A PyTorch implementation of our paper Separable Structure Modeling for Semi-supervised Video Object Segmentation by Wencheng Zhu, Jiahao Li, Jiwen Lu, and Jie Zhou. Published in IEEE Transactions on Circuits and Systems for Video Technology.

Getting Started

First, clone this project to your local environment.

git clone https://github.com/li-plus/SSM-VOS.git && cd SSM-VOS

We recommend creating a virtual environment with Python >= 3.6.

conda create --name ssm python=3.8
conda activate ssm

Install the Python dependencies.

pip install -r requirements.txt

Dataset Preparation

Downloading

Download the DAVIS 2016, DAVIS 2017 (train-val and test-dev), and YouTube-VOS 2018 datasets from their official websites. For DAVIS 2016 and DAVIS 2017, only the 480p version is needed.

mkdir -p datasets && cd datasets
# DAVIS 2016
wget https://graphics.ethz.ch/Downloads/Data/Davis/DAVIS-data.zip
unzip DAVIS-data.zip
mv DAVIS DAVIS2016
# DAVIS 2017 Train Val
wget https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip
unzip DAVIS-2017-trainval-480p.zip
mv DAVIS DAVIS2017
# DAVIS 2017 Test Dev
wget https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-test-dev-480p.zip
unzip DAVIS-2017-test-dev-480p.zip
mv DAVIS DAVIS2017_test
# YouTube-VOS 2018
# Sign up for the competition on CodaLab and download the dataset manually.

We recommend the directory structure below. If your datasets are stored in another directory, either create a symbolic link to it or adjust the dataset paths in make_index.py.

SSM-VOS
└── datasets
    ├── DAVIS2016
    │   ├── Annotations
    │   ├── ImageSets
    │   └── JPEGImages
    ├── DAVIS2017
    │   ├── Annotations
    │   ├── ImageSets
    │   └── JPEGImages
    ├── DAVIS2017_test
    │   ├── Annotations
    │   ├── ImageSets
    │   └── JPEGImages
    └── YouTubeVOS
        ├── train
        │   ├── Annotations
        │   ├── JPEGImages
        │   └── meta.json
        └── valid
            ├── Annotations
            ├── JPEGImages
            └── meta.json
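
Before indexing, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of the repository: the function name `find_missing` and the `EXPECTED_LAYOUT` table are hypothetical, and the entries are copied from the tree above.

```python
from pathlib import Path

# Expected entries per dataset folder, copied from the directory tree above.
EXPECTED_LAYOUT = {
    "DAVIS2016": ["Annotations", "ImageSets", "JPEGImages"],
    "DAVIS2017": ["Annotations", "ImageSets", "JPEGImages"],
    "DAVIS2017_test": ["Annotations", "ImageSets", "JPEGImages"],
    "YouTubeVOS/train": ["Annotations", "JPEGImages", "meta.json"],
    "YouTubeVOS/valid": ["Annotations", "JPEGImages", "meta.json"],
}


def find_missing(datasets_root):
    """Return the expected paths (relative to datasets_root) that do not exist."""
    root = Path(datasets_root)
    missing = []
    for dataset, entries in EXPECTED_LAYOUT.items():
        for entry in entries:
            path = root / dataset / entry
            # exists() follows symbolic links, so linked datasets pass the check.
            if not path.exists():
                missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the SSM-VOS project root.
    for path in find_missing("datasets"):
        print("missing:", path)
```

An empty result means every folder and meta.json file in the tree is present; it does not verify the contents of those folders.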

Indexing

To simplify the data-loading code, we first index the training, validation, and test sets of all datasets.

python make_index.py
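
The format of the index that make_index.py writes is not documented here. As a rough illustration of what indexing a video dataset can look like, the sketch below maps each video folder to its sorted frame file names. The function `index_frames` is hypothetical, and it assumes one folder per video containing .jpg frames; it is not the repository's actual index format.

```python
from pathlib import Path


def index_frames(images_root, pattern="*.jpg"):
    """Map each video folder under images_root to its sorted frame file names."""
    index = {}
    for video_dir in sorted(Path(images_root).iterdir()):
        if not video_dir.is_dir():
            continue
        frames = sorted(p.name for p in video_dir.glob(pattern))
        # Skip folders without frames so the index only lists usable videos.
        if frames:
            index[video_dir.name] = frames
    return index
```

Building such an index once means the data loader can read a single file instead of scanning the dataset directories on every run.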

Evaluation

Our pretrained models on DAVIS 2016, DAVIS 2017, and YouTube-VOS 2018 are available for download.

mkdir -p models/pretrained && cd models/pretrained
wget https://www.dropbox.com/s/7dctisjdrl2b47c/ssm_davis16.pt -O ssm_davis16.pt
wget https://www.dropbox.com/s/ew2d2gy3rldxob9/ssm_davis17.pt -O ssm_davis17.pt
wget https://www.dropbox.com/s/jm24vm2puprcldz/ssm_youtube.pt -O ssm_youtube.pt

To evaluate a model on a specific dataset, specify the path to the model and the corresponding split file. For example, to evaluate the pretrained model on DAVIS 2017, run

CUDA_VISIBLE_DEVICES=0 python evaluate.py --split ../splits/davis2017_val.json \
    --resume ../models/pretrained/ssm_davis17.pt --save-dir ../models/pretrained/results/davis17/

The script will generate separate mask results for each object and save them in the given --save-dir. We then merge the separate results into final masks.

python merge_masks.py -i ../models/pretrained/results/davis17/ \
    -o ../models/pretrained/results/davis17_merged/
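
How merge_masks.py combines the per-object results is not described here. One common approach, shown as a minimal sketch under that assumption, is to assign each pixel to the object with the highest foreground probability and to leave it as background (label 0) when no object exceeds a threshold. The function name, the threshold of 0.5, and the nested-list input format are all hypothetical; a real implementation would likely operate on image arrays.

```python
def merge_object_masks(object_probs, threshold=0.5):
    """Merge per-object probability maps into one label map.

    object_probs: dict mapping object id (int >= 1) to a 2D list of
    foreground probabilities, all with the same height and width.
    Returns a 2D list of labels, where 0 means background.
    """
    ids = sorted(object_probs)
    if not ids:
        raise ValueError("at least one object mask is required")
    height = len(object_probs[ids[0]])
    width = len(object_probs[ids[0]][0])
    merged = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_id, best_prob = 0, threshold
            for object_id in ids:
                prob = object_probs[object_id][y][x]
                # Keep the most confident object above the threshold.
                if prob > best_prob:
                    best_id, best_prob = object_id, prob
            merged[y][x] = best_id
    return merged


example = {
    1: [[0.9, 0.2], [0.6, 0.1]],
    2: [[0.3, 0.8], [0.7, 0.4]],
}
print(merge_object_masks(example))  # [[1, 2], [2, 0]]
```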

To measure performance on DAVIS 2017, we use the official DAVIS 2017 evaluation code. Please follow its instructions to evaluate the merged masks.

Similarly, to evaluate our pretrained model on DAVIS 2016, run

CUDA_VISIBLE_DEVICES=0 python evaluate.py --split ../splits/davis2016_val.json \
    --resume ../models/pretrained/ssm_davis16.pt --save-dir ../models/pretrained/results/davis16/

python merge_masks.py -i ../models/pretrained/results/davis16/ \
    -o ../models/pretrained/results/davis16_merged/

Please use the official DAVIS 2016 evaluation code to evaluate the merged masks.
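
The evaluate and merge steps follow the same pattern for both datasets, so they can be generated from a small helper. This is a convenience sketch, not part of the repository: `build_eval_commands` is a hypothetical name, and it only uses the flags and paths that appear in the commands above.

```python
import shlex


def build_eval_commands(tag, split, checkpoint,
                        results_root="../models/pretrained/results"):
    """Return the (evaluate, merge) argument lists for one dataset tag."""
    raw_dir = f"{results_root}/{tag}/"
    merged_dir = f"{results_root}/{tag}_merged/"
    evaluate = ["python", "evaluate.py", "--split", split,
                "--resume", checkpoint, "--save-dir", raw_dir]
    # The merge step reads the per-object masks written by evaluate.py.
    merge = ["python", "merge_masks.py", "-i", raw_dir, "-o", merged_dir]
    return evaluate, merge


if __name__ == "__main__":
    commands = build_eval_commands(
        "davis17",
        "../splits/davis2017_val.json",
        "../models/pretrained/ssm_davis17.pt",
    )
    for command in commands:
        # Print shell-safe command lines; run them with subprocess if desired.
        print(" ".join(shlex.quote(part) for part in command))
```

The argument lists can be passed directly to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set in the environment as in the commands above.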

Training

YouTube-VOS 2018

We pretrain our model on YouTube-VOS using four GeForce GTX 1080 Ti GPUs.

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py \
    --model-dir ../models/youtube --split ../splits/youtube_train.json

You may start tensorboard to keep track of the training process.

tensorboard --logdir ../models/youtube/board

DAVIS 2017

For better performance, we fine-tune the best YouTube-VOS checkpoint (for example, 80999.pt) on DAVIS 2017 only.

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py \
    --model-dir ../models/davis17 --split ../splits/davis2017_train.json \
    --resume ../models/youtube/checkpoints/80999.pt --max-epoch 60 \
    --base-lr 1e-6 --save-step 640 --lr-decay-step 1920

DAVIS 2016

Similarly, we fine-tune the same pretrained checkpoint on DAVIS 2016.

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py \
    --model-dir ../models/davis16 --split ../splits/davis2016_train.json \
    --resume ../models/youtube/checkpoints/80999.pt --max-epoch 100 \
    --base-lr 1e-6 --save-step 130 --lr-decay-step 650

Citation

If you find our paper or code helpful in your research, feel free to cite it.

@article{zhu2021separable,
  title={Separable Structure Modeling for Semi-supervised Video Object Segmentation},
  author={Zhu, Wencheng and Li, Jiahao and Lu, Jiwen and Zhou, Jie},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2021},
  publisher={IEEE}
}
