EgoHOS

Project Page | Paper | [Bibtex]

Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications
European Conference on Computer Vision (ECCV), 2022
Lingzhi Zhang*, Shenghao Zhou*, Simon Stent, Jianbo Shi (* indicates equal contribution)

Our main goal is to provide a tool for better hand-object segmentation on in-the-wild egocentric videos.

Prerequisites

  • Linux
  • Python 3
  • NVIDIA GPU + CUDA CuDNN

Table of Contents:

  1. Setup - download pretrained models and resources
  2. Datasets - download our egocentric hand-object segmentation datasets
  3. Checkpoints - download the checkpoints for all our models
  4. Inference on Images - quick usage on images
  5. Inference on Videos - quick usage on videos
  6. Other Resources - other resources used in our paper

Setup

  • Clone this repo:
git clone https://github.com/owenzlz/EgoHOS
  • Install dependencies:
pip install -U openmim
mim install mmcv-full
cd mmsegmentation
pip install -v -e .

For more information, please refer to MMSegmentation: https://mmsegmentation.readthedocs.io/en/latest/

Datasets

  • Download our dataset with the following command:
bash download_datasets.sh

After downloading, the dataset is structured as follows:

- [egohos dataset root]
    |- train
        |- image
        |- label
        |- contact
    |- val 
        |- image
        |- label
        |- contact
    |- test_indomain
        |- image
        |- label
        |- contact
    |- test_outdomain
        |- image
        |- label
        |- contact
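
To confirm that the download produced the layout above, a small check like the following can help. This is a minimal sketch: the split and subfolder names come from the tree above, while the helper names `expected_dirs` and `missing_dirs` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Split and subfolder names are taken from the directory tree above.
SPLITS = ("train", "val", "test_indomain", "test_outdomain")
SUBFOLDERS = ("image", "label", "contact")


def expected_dirs(root):
    """Return every <root>/<split>/<subfolder> path the layout should contain."""
    root = Path(root)
    return [root / split / sub for split in SPLITS for sub in SUBFOLDERS]


def missing_dirs(root):
    """Return the expected directories that do not exist under root."""
    return [path for path in expected_dirs(root) if not path.is_dir()]
```

Calling `missing_dirs` on the dataset root returns an empty list when all twelve folders are present; otherwise it lists the ones that still need to be downloaded or extracted.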

In each label image, the category IDs are defined as follows. In the contact labels, pixels with value 1 indicate the dense contact region.

0 -> background
1 -> left hand
2 -> right hand
3 -> 1st order interacting object by left hand
4 -> 1st order interacting object by right hand
5 -> 1st order interacting object by both hands
6 -> 2nd order interacting object by left hand
7 -> 2nd order interacting object by right hand
8 -> 2nd order interacting object by both hands
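
The mapping above can be used to inspect a label map or to pull out a mask for selected categories. The sketch below works on a plain 2D grid of integers so that it stays dependency-free; loading a real label image into such a grid is left to whichever image library you use. The function names and the toy grid are illustrative assumptions, not code from the repository.

```python
from collections import Counter

# Category ids as listed above.
CATEGORY_NAMES = {
    0: "background",
    1: "left hand",
    2: "right hand",
    3: "1st order interacting object by left hand",
    4: "1st order interacting object by right hand",
    5: "1st order interacting object by both hands",
    6: "2nd order interacting object by left hand",
    7: "2nd order interacting object by right hand",
    8: "2nd order interacting object by both hands",
}


def count_categories(label_rows):
    """Count pixels per category name in a 2D grid of category ids."""
    counts = Counter()
    for row in label_rows:
        for category_id in row:
            counts[CATEGORY_NAMES[int(category_id)]] += 1
    return counts


def binary_mask(label_rows, category_ids):
    """Return a 0/1 grid marking pixels whose id is in category_ids."""
    wanted = set(category_ids)
    return [[1 if int(value) in wanted else 0 for value in row] for row in label_rows]


# Toy 2x4 grid standing in for a real label image (hypothetical values).
toy_label = [
    [0, 1, 1, 3],
    [0, 2, 4, 3],
]
hands = binary_mask(toy_label, (1, 2))  # mask covering both hands
```

Passing `(3, 4, 5)` instead of `(1, 2)` would give a mask of all 1st order interacting objects, regardless of which hand touches them.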

Checkpoints

  • Download checkpoints and config files:
bash download_checkpoints.sh

Inference on Images

  • First, download a few test images for running the demo:
bash download_testimages.sh

Depending on the application scenario, you may want to use one of these commands to generate the segmentation predictions. Please modify the image directory paths in the bash file if needed. The backend segmentation model is a Swin-L backbone with a UPerNet head.

By default, the bash commands run on the images in "./testimages/images" and save the results in the "./testimages" folder. If you wish to test on your own images, you may either put your images into the "./testimages/images" folder or change the directories in the bash files.
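
If you take the first route and place your own images in the default folder, a short helper can do the copying. This is only a sketch: `stage_images` is a hypothetical name, and the extension list is an assumption, since this README does not state which image formats the scripts accept.

```python
import shutil
from pathlib import Path

# Assumption: illustrative extensions only; supported formats are not documented here.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def stage_images(src_dir, dst_dir="./testimages/images"):
    """Copy image files from src_dir into dst_dir and return the copied file names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        # Skip subdirectories and files that do not look like images.
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied
```

Files already in the destination with the same name are overwritten, so use a fresh folder if you want to keep the downloaded demo images.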

  • Predict two hands, contact boundary, and interacting objects (1st order) sequentially.
cd mmsegmentation # if you are not in this directory
bash pred_all_obj1.sh
  • Predict two hands, contact boundary, and interacting objects (1st and 2nd orders) sequentially.
cd mmsegmentation # if you are not in this directory
bash pred_all_obj2.sh

If you want to predict only the hand or contact segmentation, or want to use each module separately, see the commands below.

  • Predict only the left and right hands.
cd mmsegmentation # if you are not in this directory
bash pred_twohands.sh
  • Predict the dense contact boundary.
cd mmsegmentation # if you are not in this directory
bash pred_cb.sh
  • Predict the (1st order) interacting objects.
cd mmsegmentation # if you are not in this directory
bash pred_obj1.sh
  • Predict the (both 1st and 2nd orders) interacting objects.
cd mmsegmentation # if you are not in this directory
bash pred_obj2.sh

Inference on Videos

  • First, download a few test videos for running the demo:
bash download_testvideos.sh
  • Predict hands and (1st order) interacting objects.
cd mmsegmentation # if you are not in this directory
bash pred_obj1_video.sh
  • Predict hands and (1st and 2nd orders) interacting objects.
cd mmsegmentation # if you are not in this directory
bash pred_obj2_video.sh
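
The image and video commands above all follow the same pattern: change into the mmsegmentation directory and run one bash script. If you want to drive them from Python, a thin wrapper along these lines may be convenient. The script names are the ones listed in this README; the task keys and the functions `prediction_command` and `run_prediction` are hypothetical and not part of the repository.

```python
import subprocess
from pathlib import Path

# Script names come from this README; the task keys are hypothetical labels.
PREDICTION_SCRIPTS = {
    "image_all_obj1": "pred_all_obj1.sh",
    "image_all_obj2": "pred_all_obj2.sh",
    "image_twohands": "pred_twohands.sh",
    "image_cb": "pred_cb.sh",
    "image_obj1": "pred_obj1.sh",
    "image_obj2": "pred_obj2.sh",
    "video_obj1": "pred_obj1_video.sh",
    "video_obj2": "pred_obj2_video.sh",
}


def prediction_command(task):
    """Return the argv list for a task, or raise ValueError for an unknown task."""
    if task not in PREDICTION_SCRIPTS:
        raise ValueError(f"unknown task: {task!r}")
    return ["bash", PREDICTION_SCRIPTS[task]]


def run_prediction(task, repo_root="."):
    """Run the script for a task from <repo_root>/mmsegmentation."""
    workdir = Path(repo_root) / "mmsegmentation"
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(prediction_command(task), cwd=workdir, check=True)
```

For example, `run_prediction("video_obj1")` is equivalent to running `bash pred_obj1_video.sh` from inside the mmsegmentation directory of a cloned repository.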

Other Resources

We used other resources for the applications section, e.g. mesh reconstruction. Please refer to the list below:

  1. Image Inpainting - LaMa: https://github.com/saic-mdal/lama
  2. Video Inpainting - Flow-edge Guided Video Completion: https://github.com/vt-vl-lab/FGVC
  3. Mesh Reconstruction of Hand-Object Interaction: https://github.com/hassony2/homan
  4. Video Recognition - SlowFast Network: https://github.com/epic-kitchens/epic-kitchens-slowfast

If you wish to generate higher-quality masks, you may consider using a mask refinement model such as CascadePSP: https://github.com/hkchengrex/CascadePSP

Citation

If you use this code for your research, please cite our paper:

@article{zhang2022EOS,
  title={Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications},
  doi = {10.48550/ARXIV.2208.03826},
  url = {https://arxiv.org/abs/2208.03826},
  author={Zhang, Lingzhi and Zhou, Shenghao and Stent, Simon and Shi, Jianbo},
  booktitle={arXiv preprint arXiv:2208.03826},
  year={2022}
}
