
HFAN: Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation (ECCV 2022)

Download HFAN weights from BaiduDrive and MiT weights from OneDrive.

Introduction

This work has been accepted to ECCV 2022, and we will update the camera-ready version soon.

Gensheng Pei, Yazhou Yao*, Guo-Sen Xie*, Fumin Shen, Zhenmin Tang, Jinhui Tang. "Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation", European Conference on Computer Vision (ECCV), 2022.

Performance vs. Speed

Figure 1: Performance of HFAN-Small and HFAN-Medium on DAVIS-16.

Overview

This repository is the official PyTorch implementation of the paper:

Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation
FAM aligns appearance and motion features by sharing primary objects across modalities, addressing the mismatch of primary-object positions between video frames and their corresponding optical flows.
FAT constructs feature adaptation weights to automatically enhance cross-modal features, tackling the modal mismatch between the aligned feature maps.
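For intuition only, the PyTorch sketch below caricatures these two ideas; the class, layer choices, and fusion scheme are illustrative assumptions, not the modules released in this repository.

import torch
import torch.nn as nn

class AlignFuseSketch(nn.Module):
    """Toy illustration of the FAM/FAT ideas (not the released modules)."""

    def __init__(self, channels):
        super().__init__()
        # FAM idea: project both modalities into a shared primary-object space.
        self.shared = nn.Conv2d(channels * 2, channels, kernel_size=1)
        # FAT idea: predict per-pixel adaptation weights for the two modalities.
        self.weights = nn.Sequential(
            nn.Conv2d(channels * 2, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )

    def forward(self, appearance, motion):
        shared = self.shared(torch.cat([appearance, motion], dim=1))
        app_aligned = appearance + shared  # align appearance to the shared object
        mot_aligned = motion + shared      # align motion to the shared object
        w = self.weights(torch.cat([app_aligned, mot_aligned], dim=1))
        return w[:, :1] * app_aligned + w[:, 1:] * mot_aligned

# Example: fuse 64-channel appearance and flow features.
fuse = AlignFuseSketch(64)
out = fuse(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64, 64])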

Figure 2: The framework of HFAN.

Dependencies

We use MMSegmentation to implement our model, and CUDA 10.1 to run our experiments. Please refer to the guidelines in MMSegmentation v0.11.0.

To simplify the reproduction steps, you only need to install:

pip install torch==1.7.1 torchvision==0.8.2
pip install mmcv-full==1.3.8 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.7.0/index.html
pip install opencv-python
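A quick sanity check (our addition, not part of the repo) that the pinned versions resolved correctly:

# Verify the pinned dependency versions before training.
import torch, torchvision, mmcv, cv2

print(torch.__version__)          # expected: 1.7.1
print(torchvision.__version__)    # expected: 0.8.2
print(mmcv.__version__)           # expected: 1.3.8
print(cv2.__version__)
print(torch.cuda.is_available())  # should be True on a CUDA 10.1 machine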

Dataset Preparation

  1. Download the DAVIS dataset from the DAVIS website.
  2. Download the YouTube-VOS dataset from the YouTube-VOS website.
  3. To quickly reproduce the proposed method, we have uploaded the processed data to Google Drive (DAVIS and YouTube-VOS).
  4. Please ensure the datasets are organized in the following format; a quick layout check is sketched after the directory trees.
|DAVIS2SEG
|--frame
|--flow
|--mask

|YouTube2SEG
|--frame
|--flow
|--mask
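As a convenience (our addition, not a script shipped with this repo), the expected layout can be verified before training; the folder names follow the trees above:

from pathlib import Path

def check_layout(root):
    # Assert the frame/flow/mask sub-folders exist and report entry counts.
    root = Path(root)
    for sub in ("frame", "flow", "mask"):
        if not (root / sub).is_dir():
            raise FileNotFoundError(f"missing sub-folder: {root / sub}")
    n = sum(1 for _ in (root / "frame").rglob("*"))
    print(f"{root.name}: {n} entries under frame/")

check_layout("DAVIS2SEG")
check_layout("YouTube2SEG")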

Training

Note that the paths must be modified at the following locations in the code (a programmatic alternative is sketched after this list):
infer.py, line 35.
local_configs/hfan/*.160k.py, lines 3, 4, and 69.
local_configs/hfan/*.refine.py, lines 3, 67, and 111.
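If preferred, the configs can also be inspected (or overridden) programmatically via mmcv's Config API instead of editing the files; which keys hold the dataset roots is config-specific, so the commented override below is an assumption to adapt:

from mmcv import Config

cfg = Config.fromfile("local_configs/hfan/hfan.small.512x512.160k.py")
print(cfg.pretty_text)  # inspect which keys hold dataset and checkpoint paths
# cfg.data.train.data_root = "/path/to/DAVIS2SEG"  # key names are config-specific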

Download the MiT weights pretrained on ImageNet-1K and put them in the checkpoint/ folder.

Train HFAN-Small

# two gpus training (V100 32G)
# Set OMP_NUM_THREADS (to 1 or the number of CPU cores) when training with multiple GPUs.
## First
CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=16 bash tools/dist_train.sh local_configs/hfan/hfan.small.512x512.160k.py 2 --seed 1208 --deterministic --work-dir hfan-small
## Second
CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=16 bash tools/dist_train.sh local_configs/hfan/hfan.small.512x512.refine.py 2 --seed 1208 --deterministic --work-dir hfan-small

Train HFAN-Medium

# two gpus training (V100 32G)
## First
CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=16 bash tools/dist_train.sh local_configs/hfan/hfan.medium.512x512.160k.py 2 --seed 1208 --deterministic --work-dir hfan-medium
## Second
CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=16 bash tools/dist_train.sh local_configs/hfan/hfan.medium.512x512.refine.py 2 --seed 1208 --deterministic --work-dir hfan-medium

Testing

Download HFAN-Small and put it in the checkpoint/ folder.

Evaluate HFAN-Small:

# single gpu (V100 32G)
python infer.py \
    --config local_configs/hfan/hfan.small.512x512.refine.py \
    --checkpoint checkpoint/HFAN-s-converted.pth \
    --output_dir ./output_path/hfan-small
    
# single gpu (V100 32G) with multi-scale
python infer.py \
    --config local_configs/hfan/hfan.small.512x512.refine.py \
    --checkpoint checkpoint/HFAN-s-converted.pth \
    --output_dir ./output_path/hfan-small-MS --aug-test

Download HFAN-Medium and put it in the checkpoint/ folder.

Evaluate HFAN-Medium:

# single gpu (V100 32G)
python infer.py \
    --config local_configs/hfan/hfan.medium.512x512.refine.py \
    --checkpoint checkpoint/HFAN-m-converted.pth \
    --output_dir ./output_path/hfan-medium
    
# single gpu (V100 32G) with multi-scale
python infer.py \
    --config local_configs/hfan/hfan.medium.512x512.refine.py \
    --checkpoint checkpoint/HFAN-m-converted.pth \
    --output_dir ./output_path/hfan-medium-MS --aug-test

Results

We report the results from the current codebase below; they match the performance reported in our original paper. For unsupervised video object segmentation, the MATLAB version of the evaluation code is available from DAVIS-Evaluation, and a multiprocessing Python version is available from PyDavis16EvalToolbox. The evaluation toolbox for the video salient object detection task is available from VSOD.
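For reference, the region similarity J reported below is the Jaccard index (intersection over union) between predicted and ground-truth masks, averaged over frames. A minimal NumPy sketch of the per-frame computation (the toolboxes above additionally compute the boundary measure F plus recall and decay statistics):

import numpy as np

def j_per_frame(pred, gt):
    # Region similarity J = |pred ∩ gt| / |pred ∪ gt| for binary masks.
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: conventionally a perfect score
    return np.logical_and(pred, gt).sum() / union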

DAVIS-16, Unsupervised Video Object Segmentation

| Model | J Mean ↑ | J Recall ↑ | J Decay ↓ | F Mean ↑ | F Recall ↑ | F Decay ↓ | J&F Mean ↑ | FPS ↑ |
|---|---|---|---|---|---|---|---|---|
| HFAN-Small (SS) | 86.2 | 96.7 | 4.6 | 87.1 | 95.5 | 2.3 | 86.7 | 20.8 |
| HFAN-Small (MS) | 87.1 | 96.8 | 4.8 | 87.7 | 95.3 | 2.5 | 87.4 | 2.5 |
| HFAN-Medium (SS) | 86.8 | 96.1 | 4.3 | 88.2 | 95.3 | 1.1 | 87.5 | 14.4 |
| HFAN-Medium (MS) | 88.0 | 96.2 | 4.5 | 89.3 | 95.4 | 2.0 | 88.7 | 1.4 |
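Here SS and MS denote single-scale and multi-scale testing (the --aug-test flag above), and J&F Mean is the average of J Mean and F Mean, e.g. (86.2 + 87.1) / 2 ≈ 86.7 for HFAN-Small (SS).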

Long-Videos, Unsupervised Video Object Segmentation

| Model | J Mean ↑ | J Recall ↑ | J Decay ↓ | F Mean ↑ | F Recall ↑ | F Decay ↓ | J&F Mean ↑ |
|---|---|---|---|---|---|---|---|
| HFAN-Small | 74.9 | 82.5 | 14.8 | 76.1 | 86.0 | 16.0 | 75.5 |
| HFAN-Medium | 80.2 | 91.2 | 9.4 | 83.2 | 96.5 | 7.1 | 81.7 |

DAVIS-16, Video Salient Object Detection

| Model | S ↑ | E ↑ | F ↑ | MAE ↓ |
|---|---|---|---|---|
| HFAN-Small | 0.934 | 0.983 | 0.929 | 0.009 |
| HFAN-Medium | 0.938 | 0.983 | 0.935 | 0.008 |

Ablation Studies

To facilitate the ablation study, we decouple the model's modules in the codebase. Download links are provided for each ablated version of the model.

Impact of Data Input

# single gpu (V100 32G)
## Image frame only
python infer.py \
    --config local_configs/hfan/hfan.small.512x512.refine.py \
    --checkpoint checkpoint/im-converted.pth \
    --options model.decode_head.select_method=im \
    --output_dir ./output_path/im
## Optical flow only    
python infer.py \
    --config local_configs/hfan/hfan.small.512x512.refine.py \
    --checkpoint checkpoint/fw-converted.pth \
    --options model.decode_head.select_method=fw \
    --output_dir ./output_path/fw
## Baseline    
python infer.py \
    --config local_configs/hfan/hfan.small.512x512.refine.py \
    --checkpoint checkpoint/baseline-converted.pth \
    --options model.decode_head.select_method=base \
    --output_dir ./output_path/baseline

| Input | J Mean ↑ | ΔJ | F Mean ↑ | ΔF | download |
|---|---|---|---|---|---|
| Image frame only | 79.1 | -3.9 | 79.8 | -3.5 | model / mask |
| Optical flow only | 77.9 | -5.1 | 76.5 | -6.8 | model / mask |
| Baseline | 83.0 | - | 83.3 | - | model / mask |
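The --options flag used above overrides config entries at run time; in mmcv this corresponds to Config.merge_from_dict, for example:

from mmcv import Config

cfg = Config.fromfile("local_configs/hfan/hfan.small.512x512.refine.py")
# Equivalent to passing --options model.decode_head.select_method=im to infer.py
cfg.merge_from_dict({"model.decode_head.select_method": "im"})
print(cfg.model.decode_head.select_method)  # -> im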

Efficacy of Crucial Modules

# single gpu (V100 32G)
## FAM 
python infer.py \
    --config local_configs/hfan/hfan.small.512x512.refine.py \
    --checkpoint checkpoint/FAM-converted.pth \
    --options model.decode_head.select_method=fam \
    --output_dir ./output_path/FAM
## FAT
python infer.py \
    --config local_configs/hfan/hfan.small.512x512.refine.py \
    --checkpoint checkpoint/FAT-converted.pth \
    --options model.decode_head.select_method=fat \
    --output_dir ./output_path/FAT
## HFAN  
python infer.py \
    --config local_configs/hfan/hfan.small.512x512.refine.py \
    --checkpoint checkpoint/HFAN-s-converted.pth \
    --options model.decode_head.select_method=hfan \
    --output_dir ./output_path/HFAN

| Variants | J Mean ↑ | ΔJ | F Mean ↑ | ΔF | download |
|---|---|---|---|---|---|
| Baseline | 83.0 | - | 83.3 | - | model / mask |
| Baseline + FAM | 85.2 | +2.2 | 85.6 | +2.3 | model / mask |
| Baseline + FAT | 85.0 | +2.0 | 86.1 | +2.8 | model / mask |
| Baseline + HFAN | 86.2 | +3.2 | 87.1 | +3.8 | model / mask |

Efficacy of Backbone

| Backbone | J Mean ↑ | F Mean ↑ | FPS ↑ | download |
|---|---|---|---|---|
| MiT-b0 (SS/MS) | 81.5/83.4 | 80.8/82.3 | 24.0/3.4 | model / mask |
| MiT-b1 (SS/MS) | 86.2/87.1 | 87.1/87.7 | 20.8/2.5 | model / mask |
| MiT-b2 (SS/MS) | 86.8/88.0 | 88.2/89.3 | 14.4/1.4 | model / mask |
| MiT-b3 (SS/MS) | 86.8/88.2 | 88.8/90.0 | 10.6/1.0 | model / mask |
| Swin-Tiny (SS/MS) | 86.0/87.2 | 87.3/87.9 | 12.8/1.1 | model / mask |
| ResNet-101 (SS/MS) | 86.7/87.5 | 87.5/88.1 | 12.4/1.3 | model / mask |

Visualize Ablated Versions

Figure 3: Feature-level and mask-level visualizations of the ablated versions.

Qualitative Results

Figure 4: Mask-level visualization of qualitative results.

Citation

If you find this useful in your research, please consider citing:

@inproceedings{pei2022hierarchical,
  title={Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation},
  author={Gensheng Pei and Yazhou Yao and Guo-Sen Xie and Fumin Shen and Zhenmin Tang and Jinhui Tang},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}
