Pose-for-Everything (ECCV'2022 Oral)

Introduction

Official code repository for the paper:
Pose for Everything: Towards Category-Agnostic Pose Estimation
[Lumin Xu*, Sheng Jin*, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, and Xiaogang Wang]

Abstract

Existing works on 2D pose estimation mainly focus on a certain category, e.g. human, animal, and vehicle. However, there are many application scenarios that require detecting the poses/keypoints of unseen classes of objects. In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definitions. To achieve this goal, we formulate the pose estimation problem as a keypoint matching problem and design a novel CAPE framework, termed POse Matching Network (POMNet). A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images. We also introduce the Multi-category Pose (MP-100) dataset, a 2D pose dataset of 100 object categories containing over 20K instances, designed for developing CAPE algorithms. Experiments show that our method outperforms other baseline approaches by a large margin. Code and data are available at https://github.com/luminxu/Pose-for-Everything.

Usage

Install

  1. Install mmpose.
  2. Run python setup.py develop.

Training

Training follows the mmpose guidelines.

Train with multiple GPUs

./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
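As a minimal sketch of filling in the placeholders above: the config path is the one used in the Slurm example in this README, while the GPU count of 8 is an assumption chosen for illustration. The expanded command is printed first so it can be checked, and it is launched only when the script is found in the current directory tree.

```shell
# Config path taken from the Slurm example in this README.
CONFIG_FILE=configs/mp100/pomnet/pomnet_mp100_split1_256x256_1shot.py
# Assumption: 8 GPUs on a single machine; adjust to your hardware.
GPU_NUM=8

# Print the fully expanded command so it can be checked before launching.
TRAIN_CMD="./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}"
echo "${TRAIN_CMD}"

# Launch only from the repository root, where the script exists.
if [ -x ./tools/dist_train.sh ]; then
  ${TRAIN_CMD}
else
  echo "tools/dist_train.sh not found; run this from the repository root." >&2
fi
```

Any [optional arguments] accepted by the script can be appended after the GPU count.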

Train with multiple machines

If you run this code on a cluster managed with Slurm, you can use the script slurm_train.sh. (This script also supports single-machine training.)

./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}

Here is an example of using 16 GPUs to train POMNet on the Test partition of a Slurm cluster. (GPUS_PER_NODE=8 specifies 8 GPUs per node, so the 16 GPUs span two nodes; CPUS_PER_TASK=2 assigns 2 CPUs per task. Test is assumed to be a valid ${PARTITION} name.)

GPUS=16 GPUS_PER_NODE=8 CPUS_PER_TASK=2 ./tools/slurm_train.sh Test pomnet \
  configs/mp100/pomnet/pomnet_mp100_split1_256x256_1shot.py \
  work_dirs/pomnet_mp100_split1_256x256_1shot

MP-100 Dataset

Terms of Use

  1. The dataset is only for non-commercial research purposes.
  2. All images of the MP-100 dataset are from existing datasets (COCO, 300W, AFLW, OneHand10K, DeepFashion, AP-10K, MacaquePose, Vinegar Fly, Desert Locust, CUB-200, CarFusion, AnimalWeb, Keypoint-5), which are not our property. We are not responsible for the content or the meaning of these images.
  3. We provide the annotations for training and testing. However, for legal reasons, we do not host the images. Please follow the guidance to prepare the MP-100 dataset.

Citation

@inproceedings{xu2022pose,
  title={Pose for Everything: Towards Category-Agnostic Pose Estimation},
  author={Xu, Lumin and Jin, Sheng and Zeng, Wang and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping and Wang, Xiaogang},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022},
  month={October}
}

Acknowledgement

Thanks to:

License

This project is released under the Apache 2.0 license.
