
IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition

PyTorch implementation, code, and pretrained models for the paper:

IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition
Gibran Benitez-Garcia, Jesus Olivares-Mercado, Gabriel Sanchez-Perez, and Keiji Yanai
Accepted at ICPR 2020

This paper proposes the IPN Hand dataset, a new benchmark video dataset with sufficient size, variation, and real-world elements to train and evaluate deep neural networks for continuous Hand Gesture Recognition (HGR). With our dataset, the performance of three 3D-CNN models is evaluated on the tasks of isolated and continuous real-time HGR. Since IPN Hand contains RGB videos only, we analyze the possibility of increasing recognition accuracy by adding modalities derived from the RGB frames, i.e., optical flow and semantic segmentation, while keeping real-time performance.

An introduction video is provided as supplementary material.

Dataset details

The subjects in the dataset were asked to record the gestures using their own PCs, keeping the defined resolution and frame rate. Thus, only RGB videos were captured, and the distance between the camera and each subject varies. All videos were recorded at a resolution of 640x480 and 30 fps.

Each subject continuously performed 21 gestures, with three random breaks, in a single video. We defined 13 gesture classes for controlling the pointer and for actions aimed at interaction with touchless screens.

The description and statistics of each gesture are shown in the table below. Duration is measured in frames (30 frames = 1 s); for example, the mean click gesture (G01) lasts 56 frames, about 1.9 s.

id  Label  Gesture                        Instances  Mean duration (std)
1   D0X    Non-gesture                         1431  147 (133)
2   B0A    Pointing with one finger            1010  219 (67)
3   B0B    Pointing with two fingers           1007  224 (69)
4   G01    Click with one finger                200   56 (29)
5   G02    Click with two fingers               200   60 (43)
6   G03    Throw up                             200   62 (25)
7   G04    Throw down                           201   65 (28)
8   G05    Throw left                           200   66 (27)
9   G06    Throw right                          200   64 (28)
10  G07    Open twice                           200   76 (31)
11  G08    Double click with one finger         200   68 (28)
12  G09    Double click with two fingers        200   70 (30)
13  G10    Zoom in                              200   65 (29)
14  G11    Zoom out                             200   64 (28)

All non-gestures:  1431  147 (133)
All gestures:      4218  140 (94)
Total:             5649  142 (105)

Baseline results

Baseline results for isolated and continuous hand gesture recognition on the IPN Hand dataset can be found here.

Requirements

Please install the following requirements.

  • Python 3.5+
  • PyTorch 1.0+
  • TorchVision
  • Pillow
  • OpenCV
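
A quick way to install these (the package names below are the standard PyPI ones; pin the torch/torchvision versions to match your CUDA build as needed):

$ pip install torch torchvision pillow opencv-python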

Pretrained models

Usage

Preparation

  • Download the dataset from here
  • Clone this repository
$ git clone https://github.com/GibranBenitez/IPN-hand
  • Store all pretrained models in ./report_ipn/
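
After these steps, the layout should look roughly like this (a sketch: only report_ipn/ and tests/ are confirmed by this README; annotation_ipnGesture/ appears in the issues below):

IPN-hand/
  report_ipn/              # pretrained models go here
  annotation_ipnGesture/   # annotation files referenced by main.py
  tests/
    run_offline_ipn_Clf.sh
    run_online_ipnTest.sh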

Isolated testing

  • Change the dataset path in ./tests/run_offline_ipn_Clf.sh and run
$ bash run_offline_ipn_Clf.sh

Continuous testing

  • Change the dataset path in ./tests/run_online_ipnTest.sh and run
$ bash run_online_ipnTest.sh
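
Both scripts wrap main.py; for reference, its flags appear verbatim in the reproduction issue quoted below. A trimmed sketch of a direct call (paths are placeholders, and the scripts may set additional evaluation-specific flags):

$ python main.py --root_path . \
    --video_path /path/to/IPN_dataset \
    --annotation_path annotation_ipnGesture/ipnall_but_None.json \
    --dataset ipn --modality RGB \
    --model resnext --model_depth 101 --resnet_shortcut B \
    --n_classes 13 --sample_duration 32 --test_subset test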

Citation

If you find the IPN Hand dataset useful for your research, please cite the paper:

@inproceedings{bega2020IPNhand,
  title={IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition},
  author={Benitez-Garcia, Gibran and Olivares-Mercado, Jesus and Sanchez-Perez, Gabriel and Yanai, Keiji},
  booktitle={25th International Conference on Pattern Recognition, {ICPR} 2020, Milan, Italy, Jan 10--15, 2021},
  pages={1--8},
  year={2021},
  organization={IEEE},
}

License

The benchmark code shared in this repository is licensed under the MIT license. However, the data and annotations of the IPN Hand dataset are licensed under a Creative Commons Attribution 4.0 License.

Acknowledgement

This project is inspired by many previous works.

ipn-hand's People

Contributors

gibranbenitez, luisestebanacevedobringas


ipn-hand's Issues

different train and val splits from the paper?

Hi @GibranBenitez

Firstly thank you for publishing the dataset and code.

I ran the isolated test successfully with your dataset and pretrained weights, but I got a different confusion matrix from the one in the paper. So I am wondering whether I missed something, or whether you published different splits?

Moreover, when I tried to load resnet50 with the pretrained weights, the shapes do not match. Please tell me how to fix this. Thanks.
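
Pending the author's reply, a generic workaround for shape mismatches is to load only the tensors whose shapes match the target model. This is a hedged sketch, not the repository's code; the checkpoint filename and the torchvision stand-in model are hypothetical:

import torch
import torchvision

model = torchvision.models.resnet50()  # stand-in; the repo builds its own model
ckpt = torch.load('resnet50_weights.pth', map_location='cpu')  # hypothetical file
state = ckpt['state_dict'] if 'state_dict' in ckpt else ckpt
# DataParallel checkpoints prefix keys with 'module.'; strip it if present
state = {k.replace('module.', '', 1): v for k, v in state.items()}
model_state = model.state_dict()
compatible = {k: v for k, v in state.items()
              if k in model_state and v.shape == model_state[k].shape}
model.load_state_dict(compatible, strict=False)
print('loaded {}/{} tensors'.format(len(compatible), len(model_state)))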

some .avi files have a different number of frames than reported in annotations

When I was building the dataset from the video files (the smaller download), I found that the number of frames in some files doesn't match the number reported in the annotations or in IPN_Hand/frames/<vid_name>_{:06d}.jpg.

For example: 1CM1_3_R_#226.avi

ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -print_format csv 1CM1_3_R_#226.avi

stream,4786

However, the final annotation entry ends at frame 4795:

# Annot_List.txt
video,label,id,t_start,t_end,frames
...
1CM1_3_R_#226,D0X,1,4769,4795,27
...

and the folder of image frames extracted from this video also contains 4795 frames.

What method did you use to split the .avi videos into individual frames?
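
This is exactly what the question above asks, so the authors' method is unknown; for cross-checking, here is a minimal OpenCV sketch that extracts frames with the same 1-indexed naming pattern and returns how many were decoded. Decoders handle corrupt trailing frames differently, which may explain the 4786 vs. 4795 gap:

import os
import cv2  # pip install opencv-python

def extract_frames(video_path, out_dir):
    """Dump every decodable frame to <vid_name>_{:06d}.jpg (1-indexed)."""
    vid_name = os.path.splitext(os.path.basename(video_path))[0]
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    n = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream or an undecodable frame
            break
        n += 1      # annotations are 1-indexed
        cv2.imwrite(os.path.join(out_dir, '{}_{:06d}.jpg'.format(vid_name, n)), frame)
    cap.release()
    return n

# Compare against the 'frames' column of Annot_List.txt:
print(extract_frames('1CM1_3_R_#226.avi', 'IPN_Hand/frames/1CM1_3_R_#226'))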

Test the model on a custom dataset?

Hello,
Many thanks for sharing the code.
Can you please let me know how I can test the model on a custom dataset?
Could you please also clarify the difference between online and offline testing? From what I noticed, we need annotations in both cases.
Thank you

Unable to reproduce results of ResNeXt101

Thanks for your great work. I am trying to reproduce the results of ResNeXt101 on IPN-hand; however, I am unable to do so and would appreciate some help.
I followed the training script run_clf_ipn_trainRex-js32b32.sh under the tests/ folder; here are my Python args:

python main.py --root_path . --video_path datasets/HandGestures/IPN_dataset \
  --annotation_path annotation_ipnGesture/ipnall_but_None.json \
  --result_path results_ipn \
  --pretrain_path report_ipn/ResNeXt101/shared_models_v1/models/jester_resnext_101_RGB_32.pth \
  --pretrain_dataset jester --dataset ipn --sample_duration 32 \
  --learning_rate 0.01 --model resnext --model_depth 101 --resnet_shortcut B \
  --batch_size 384 --n_classes 13 --n_finetune_classes 13 --n_threads 16 \
  --checkpoint 1 --modality RGB --train_crop random --n_val_samples 1 \
  --test_subset test --n_epochs 100 --store_name ipnClf_jes32r_b32

I trained for 100 epochs and my best validation accuracy is 65%, while the paper reports 83% (and I was able to reach around 83% validation accuracy using the pretrained ResNeXt101 weights). However, I am unsure whether I made mistakes in my settings or whether there are specific training strategies that could improve the accuracy further.
(attached: opts_ipnClf_jes32r_b32_resnext-101.txt)

How to draw inference from the model

Hi @GibranBenitez,
I am having trouble running inference with the model. Could you please share the steps for running inference? That would be very helpful!
If you can also point me to some references, I will try it myself; any kind of help is appreciated!

ValList.txt missing for online test

@GibranBenitez
While running the online test on the dataset I get an error (screenshot attached).

It is caused by the following lines in online_test.py:
elif opt.dataset == 'ipn':
    file_set = os.path.join(opt.video_path, 'ValList.txt')
    test_paths = []
    buf = 0
    with open(file_set, 'rb') as f:
        for line in f:
            vid_name = line.decode().split('\t')[0]
            test_paths.append(os.path.join(opt.video_path, 'frames', vid_name))

However, I could not find this file anywhere in your drive or in the repo. Could you please provide it? Is it the same as the vallistall.txt file in the annotation_ipn directory?
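
One unverified workaround, assuming vallistall.txt shares the expected layout (tab-separated with the video name in the first column, which is exactly what the question above asks), is to copy it to the location online_test.py opens:

import os
import shutil

video_path = '/path/to/IPN_dataset'  # the same value passed as --video_path
src = os.path.join('annotation_ipn', 'vallistall.txt')  # file shipped in the repo
dst = os.path.join(video_path, 'ValList.txt')           # path online_test.py opens
shutil.copy(src, dst)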
