Real-Time Monocular Skeleton-Based Hand Gesture Recognition Using 3D-Jointsformer

This repository hosts our PyTorch implementation of 3D-Jointsformer, a novel approach for real-time hand gesture recognition in video sequences. Traditional methods struggle to model temporal dependencies while maintaining real-time performance. To address this, we propose a hybrid approach that combines 3D-CNNs and Transformers. A 3D-CNN computes high-level semantic skeleton embeddings that capture local spatial and temporal characteristics, and a Transformer network with self-attention then efficiently captures long-range temporal dependencies. Evaluated on the Briareo and Multi-Modal Hand Gesture datasets, the method achieves accuracies of 95.49% and 97.25%, respectively. Importantly, our approach runs in real time on standard CPUs, distinguishing it from GPU-dependent methods. The hybrid 3D-CNN and Transformer approach outperforms existing methods in both accuracy and speed, effectively addressing the challenges of real-time recognition.
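The claim that self-attention lets every time step attend to every other time step can be illustrated with generic single-head scaled dot-product attention. This is a dependency-free sketch of the standard mechanism, not the 3D-Jointsformer code: the function names, toy dimensions, and the use of nested lists instead of PyTorch tensors are illustrative assumptions. In the architecture described above, the tokens fed to attention are the skeleton embeddings produced by the 3D-CNN.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Nested-list matrix product: rows of a times columns of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention over T temporal tokens.

    x:          T x D sequence of embeddings (one row per time step)
    wq, wk, wv: D x D_k projection matrices (hypothetical, learned in practice)
    Returns (output T x D_k, attention weights T x T).
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wq[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K to D_k x T
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Each row is a distribution over ALL time steps, so distant frames interact directly.
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    # Toy example: 4 time steps with 2-dim embeddings and identity projections.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    out, attn = self_attention(x, eye, eye, eye)
    for row in attn:
        print([round(w, 3) for w in row])
```

Because every row of the attention matrix spans the whole sequence, the first and last frames of a gesture are connected in a single layer, whereas a convolution only mixes frames within its temporal kernel.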

Installation

conda create -n 3DJointsformer python=3.9 -y
conda activate 3DJointsformer
conda install pytorch=1.11.0 torchvision=0.12.0 cudatoolkit=11.3 -c pytorch -y
pip install 'mmcv-full==1.5.0' -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html
pip install mmaction2  # tested mmaction2 v0.24.0
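After installation, one way to confirm that the environment matches the versions used above is to query the installed package metadata. The stdlib-only check below is a sketch and is not a script shipped with this repository; it assumes the distribution names `torch`, `torchvision`, `mmcv-full` and `mmaction2`, and compares only the version before any local suffix such as `+cu113`.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Versions pinned (or reported as tested) in the installation commands above.
EXPECTED = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "mmcv-full": "1.5.0",
    "mmaction2": "0.24.0",
}


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report(expected: Dict[str, str]) -> Dict[str, str]:
    """Compare installed versions with the expected ones, ignoring local tags like +cu113."""
    status: Dict[str, str] = {}
    for name, have in installed_versions(expected).items():
        want = expected[name]
        if have is None:
            status[name] = "missing"
        elif have.split("+")[0] == want:
            status[name] = "ok"
        else:
            status[name] = f"found {have}, expected {want}"
    return status


if __name__ == "__main__":
    for pkg, state in report(EXPECTED).items():
        print(f"{pkg}: {state}")
```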

Data Preparation

In this work, we tested the proposed model on two datasets: Briareo and the Multi-Modal Hand Gesture Dataset. The hand keypoints are obtained with MediaPipe; the code used to generate them is also included (see data_preprocessing).
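The array layout the data loaders actually expect is defined by the scripts in data_preprocessing. The sketch below only illustrates one common way to turn per-frame MediaPipe Hands output (21 landmarks per hand, each with x, y, z, where index 0 is the wrist) into a fixed-length clip. The clip length, uniform temporal sampling, wrist-centring, zero-filling of frames without a detection, and the helper names `uniform_indices` and `to_clip` are all assumptions for illustration, not necessarily what this repository does.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) of one landmark
NUM_JOINTS = 21  # MediaPipe Hands predicts 21 landmarks per hand
WRIST = 0        # landmark 0 is the wrist in MediaPipe's hand model


def uniform_indices(num_frames: int, clip_len: int) -> List[int]:
    """Pick clip_len frame indices spread evenly over num_frames (repeating if too short)."""
    if num_frames <= 0:
        raise ValueError("empty sequence")
    return [min(int(i * num_frames / clip_len), num_frames - 1) for i in range(clip_len)]


def to_clip(frames: Sequence[Optional[Sequence[Point]]], clip_len: int) -> List[List[Point]]:
    """Build a clip_len x 21 x 3 nested list of wrist-centred joints.

    Frames in which no hand was detected (None) become all-zero skeletons.
    """
    clip: List[List[Point]] = []
    for idx in uniform_indices(len(frames), clip_len):
        joints = frames[idx]
        if joints is None:
            clip.append([(0.0, 0.0, 0.0)] * NUM_JOINTS)
            continue
        if len(joints) != NUM_JOINTS:
            raise ValueError(f"expected {NUM_JOINTS} joints, got {len(joints)}")
        wx, wy, wz = joints[WRIST]
        # Subtracting the wrist removes the hand's absolute position in the image.
        clip.append([(x - wx, y - wy, z - wz) for x, y, z in joints])
    return clip
```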

Train

You can use the following command to train a model.

./tools/run.sh ${CONFIG_FILE} ${GPU_IDS} ${SEED}

Example: train the model on the joint data of the Briareo dataset using 2 GPUs with seed 0.

./tools/run.sh configs/transformer/jointsformer3d_briareo.py 0,1 0

Test

You can use the following command to test a model.

python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]

Example: evaluate a trained checkpoint on the joint data of the Briareo dataset.

python tools/test.py configs/transformer/jointsformer3d_briareo.py \
    work_dirs/jointsformer3d/best_top1_acc_epoch_475.pth \
    --eval top_k_accuracy --cfg-options "gpu_ids=[0]"
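The `top_k_accuracy` metric requested above is the fraction of test clips whose ground-truth gesture is among the model's k highest-scoring classes; top-1 is plain accuracy. The stdlib function below re-implements that definition for illustration only; it is not MMAction2's own implementation, and the function name and toy scores are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and aligned")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the k best.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


if __name__ == "__main__":
    # Three clips, three gesture classes (toy values).
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 correct
    print(top_k_accuracy(scores, labels, k=2))  # 2 of 3 correct
```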

BibTeX

If this project is useful for you, please consider citing our paper.

@Article{s23167066,
AUTHOR = {Zhong, Enmin and del-Blanco, Carlos R. and Berjón, Daniel and Jaureguizar, Fernando and García, Narciso},
TITLE = {Real-Time Monocular Skeleton-Based Hand Gesture Recognition Using 3D-Jointsformer},
JOURNAL = {Sensors},
VOLUME = {23},
YEAR = {2023},
NUMBER = {16},
ARTICLE-NUMBER = {7066},
URL = {https://www.mdpi.com/1424-8220/23/16/7066},
PubMedID = {37631602},
ISSN = {1424-8220},
DOI = {10.3390/s23167066}
}

Acknowledgements

Our code is based on SkelAct, MMAction2, and SlowFast. Sincere thanks to their authors for their wonderful work.

License

This project is released under the Apache 2.0 license.
