gibranbenitez / ipn-hand Goto Github PK

Code and models of our arXiv paper "IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition"

Home Page: https://gibranbenitez.github.io/IPN_Hand/

License: MIT License

Python 96.31% Shell 3.69%

ipn-hand's Introduction

IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition

Project page and download link of the dataset

PyTorch implementation, codes and pretrained models of the paper:

IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition
Gibran Benitez-Garcia, Jesus Olivares-Mercado, Gabriel Sanchez-Perez, and Keiji Yanai
Accepted at ICPR 2020

This paper proposes the IPN Hand dataset, a new benchmark video dataset with sufficient size, variation, and real-world elements able to train and evaluate deep neural networks for continuous Hand Gesture Recognition (HGR). With our dataset, the performance of three 3D-CNN models is evaluated on the tasks of isolated and continuous real-time HGR. Since IPN hand contains RGB videos only, we analyze the possibility of increasing the recognition accuracy by adding multiple modalities derived from RGB frames, i.e., optical flow and semantic segmentation, while keeping the real-time performance.

Introduction video (supplementary material):

Dataset details

The subjects from the dataset were asked to record gestures using their own PC keeping the defined resolution and frame rate. Thus, only RGB videos were captured, and the distance between the camera and each subject varies. All videos were recorded in the resolution of 640x480 at 30 fps.

Each subject continuously performed 21 gestures with three random breaks in a single video. We defined 13 gestures to control the pointer and actions focused on the interaction with touchless screens.

Description and statics of each gesture are shown in the next table. Duration is measured in the number of frames (30 frames = 1 s).

id	Label	Gesture	Instances	Mean duration (std)
1	D0X	Non-gesture	1431	147 (133)
2	B0A	Pointing with one finger	1010	219 (67)
3	B0B	Pointing with two fingers	1007	224 (69)
4	G01	Click with one finger	200	56 (29)
5	G02	Click with two fingers	200	60 (43)
6	G03	Throw up	200	62 (25)
7	G04	Throw down	201	65 (28)
8	G05	Throw left	200	66 (27)
9	G06	Throw right	200	64 (28)
10	G07	Open twice	200	76 (31)
11	G08	Double click with one finger	200	68 (28)
12	G09	Double click with two fingers	200	70 (30)
13	G10	Zoom in	200	65 (29)
14	G11	Zoom out	200	64 (28)
		All non-gestures:	1431	147 (133)
		All gestures:	4218	140 (94)
		Total:	5649	142 (105)

Video examples of all classes (.GIF) here

Baseline results

Baseline results for isolated and continuous hand gesture recognition of the IPN Hand dataset can be found here.

Requirements

Please install the following requirements.

Python 3.5+
PyTorch 1.0+
TorchVision
Pillow
OpenCV

Pretrained models

Usage

Preparation

Download the dataset from here
Clone this repository

$ git clone https://github.com/GibranBenitez/IPN-hand

Store all pretrained models in ./report_ipn/

Isolated testing

Change the path of the dataset from ./tests/run_offline_ipn_Clf.sh and run

$ bash run_offline_ipn_Clf.sh

Continuous testing

Change the path of the dataset from ./tests/run_online_ipnTest.sh and run

$ bash run_online_ipnTest.sh

Citation

If you find useful the IPN Hand dataset for your research, please cite the paper:

@inproceedings{bega2020IPNhand,
  title={IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition},
  author={Benitez-Garcia, Gibran and Olivares-Mercado, Jesus and Sanchez-Perez, Gabriel and Yanai, Keiji},
  booktitle={25th International Conference on Pattern Recognition, {ICPR 2020}, Milan, Italy, Jan 10--15, 2021},
  pages={1--8},
  year={2021},
  organization={IEEE},
}

License

The benchmark code shared in this repository is licensed under the MIT license. However, the data and annotations of the IPN Hand dataset are licensed under a Creative Commons Attribution 4.0 License.

Acknowledgement

This project is inspired by many previous works, including:

Real-time hand gesture detection and classification using convolutional neural networks, Kopuklu et al, FG 2019 [code]
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?, Hara et al, CVPR 2018 [code]
Optical Flow Estimation Using A Spatial Pyramid Network, Ranjan and Black, CVPR 2017 [code by Niklaus]
HarDNet: A Low Memory Traffic Network, Chao et al, ICCV 2019 [code]
Learning to estimate 3d hand pose from single rgb images, Zimmermann and Brox, ICCV 2017 [dataset]

ipn-hand's People

Contributors

Stargazers

Watchers

ipn-hand's Issues

input image size in the code is different from the value reported in paper?

Hello,
Many thanks for sharing your great work.
I noticed that in the code the input image is resized to 112 but in the paper, it was reported 320*240.
Can you please elaborate this issue?
Thanks
Mohammad

different train and val splits from the paper?

Hi @GibranBenitez

Firstly thank you for publishing the dataset and code.

I ran the isolated test successfully with your dataset and pretrained weights, but I got a different confusion matrix from the paper. So I am wondering if I missed sth, or just you published different splits?

Moreover, when I tried to load resnet50 with pretrained weights, the shapes do not match. Please tell me how to fix it. Thx.

some .avi files have a different number of frames than reported in annotations

When I was building dataset from video files (smaller download), I found that the number of frames in some files doesn't match the number of frames reported in the annotations or IPN_Hand/frames/<vid_name>_{:06d}.jpg.

for example: 1CM1_3_R_#226.avi

ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -print_format csv 1CM1_3_R_#226.avi

stream,4786

However, final annotation ends with frame 4795

# Annot_List.txt
video,label,id,t_start,t_end,frames
...
1CM1_3_R_#226,D0X,1,4769,4795,27
...

and the folder of image frames from video also contains 4795 frames.

What method did you use to split the .avi videos into individual frames?

What's the accuracy in classifycation task without pretraining in jester ?

您好，您的数据集链接是不是失效了啊，访问不到。能帮我发一份数据集吗，万分感谢。[email protected]

Is the egogesture missing from datasets file folders?

in dataset.py "from datasets.egogesture import EgoGesture" No module named 'datasets.egogesture'

hello, i can not find egogesture.py in the folder datasets. should i delete the code "from datasets.egogesture import EgoGesture" ?

Test the model on a custom dataset?

Hello,
Many thanks for sharing the code.
Can you please let me know how I can test the model on a custom dataset?
Can you please also clarify the difference between online and offline testing? From what I noticed, for both cases, we need annotations.
Thank you

Unable to reproduce results of ResNeXt101

Thanks for your great work.I am trying to reproduce the results of ResNeXt101 on IPN-hand, However, I am unable to do so and I would appreciate some help.
i followed the training script run_clf_ipn_trainRex-js32b32.sh under the folder test/,here are my python args

python main.py
--root_path . --video_path datasets/HandGestures/IPN_dataset --annotation_path annotation_ipnGesture/ipnall_but_None.json --result_path results_ipn --pretrain_path report_ipn/ResNeXt101/shared_models_v1/models/jester_resnext_101_RGB_32.pth --pretrain_dataset jester --dataset ipn --sample_duration 32 --learning_rate 0.01 --model resnext --model_depth 101 --resnet_shortcut B --batch_size 384 --n_classes 13 --n_finetune_classes 13 --n_threads 16 --checkpoint 1 --modality RGB --train_crop random --n_val_samples 1 --test_subset test --n_epochs 100 --store_name ipnClf_jes32r_b32

i trained for 100 epoch,my best val acc is 65%,In your paper it is 83%, and I was able to achieve around 83% validation accuracy using the pre-trained weights of ResNeXt101. However, I am unsure if I made any mistakes in my settings or if there are any specific training strategies that could improve the accuracy further.
opts_ipnClf_jes32r_b32_resnext-101.txt

from datasets.egogesture import EgoGesture ModuleNotFoundError: No module named 'datasets.egogesture'

How to draw inference from the model

Hii @GibranBenitez ,
I am facing trouble to draw inference from the model . Can you please share the steps how I can do the inferencing, that would be very helpful !!
You can also share me some reference so that I can try , any kind of help is appreciated !!

ValList.txt missing for online test

@GibranBenitez
While doing online test of the dataset I get the following error

It is caused due to the following line in online_test.py
elif opt.dataset == 'ipn':
file_set = os.path.join(opt.video_path, 'ValList.txt')
test_paths = []
buf = 0
with open(file_set,'rb') as f:
for line in f:
vid_name = line.decode().split('\t')[0]
test_paths.append(os.path.join(opt.video_path, 'frames', vid_name))

However I could not find this file anywhere in your drive or the repo. Could you please provide this file. Is it the same as the vallistall.txt file in the annotation_ipn directory