
real-time-gesrec's Introduction

Real-time Hand Gesture Recognition with 3D CNNs

PyTorch implementation of the articles "Real-time Hand Gesture Detection and Classification Using Convolutional Neural Networks" and "Resource Efficient 3D Convolutional Neural Networks", including code and pretrained models.


Figure: A real-time simulation of the architecture with input video from the EgoGesture dataset (left) and real-time (online) classification scores of each gesture (right), where each class is annotated with a different color.


This code includes training, fine-tuning and testing on the EgoGesture and nvGesture datasets.
Note that the code currently includes only ResNet-10, ResNetL-10, ResNeXt-101 and C3D v1; other variants can be added easily.

Abstract

Real-time recognition of dynamic hand gestures from video streams is a challenging task since (i) there is no indication when a gesture starts and ends in the video, (ii) performed gestures should only be recognized once, and (iii) the entire architecture should be designed considering the memory and power budget. In this work, we address these challenges by proposing a hierarchical structure enabling offline-working convolutional neural network (CNN) architectures to operate online efficiently by using a sliding-window approach. The proposed architecture consists of two models: (1) a detector, a lightweight CNN architecture to detect gestures, and (2) a classifier, a deep CNN to classify the detected gestures. In order to evaluate the single-time activations of the detected gestures, we propose to use the Levenshtein distance as an evaluation metric since it can measure misclassifications, multiple detections, and missing detections at the same time. We evaluate our architecture on two publicly available datasets - EgoGesture and NVIDIA Dynamic Hand Gesture Datasets - which require temporal detection and classification of the performed hand gestures. The ResNeXt-101 model, which is used as the classifier, achieves state-of-the-art offline classification accuracy of 94.04% and 83.82% for the depth modality on the EgoGesture and NVIDIA benchmarks, respectively. In real-time detection and classification, we obtain considerable early detections while achieving performances close to offline operation. The codes and pretrained models used in this work are publicly available.

Requirements

  • Python 3
  • PyTorch, e.g. conda install pytorch torchvision cuda80 -c soumith

Pretrained models

Pretrained_models_v1 (1.08GB): the best-performing models in the paper

Pretrained_RGB_models_for_det_and_clf (371MB): available on Google Drive and on Baidu Netdisk (code: p1va)

Pretrained_models_v2 (15.2GB): all models in the paper, including the efficient 3D-CNN models

Preparation

EgoGesture

  • Download videos by following the official site.

  • We will use the extracted images, which are also provided by the dataset owners

  • Generate n_frames files using utils/ego_prepare.py

The n_frames file format is as follows: "path to the folder" "class index" "start frame" "end frame" (a small parsing sketch is given at the end of this subsection).

mkdir annotation_EgoGesture
python utils/ego_prepare.py training trainlistall.txt all
python utils/ego_prepare.py training trainlistall_but_None.txt all_but_None
python utils/ego_prepare.py training trainlistbinary.txt binary
python utils/ego_prepare.py validation vallistall.txt all
python utils/ego_prepare.py validation vallistall_but_None.txt all_but_None
python utils/ego_prepare.py validation vallistbinary.txt binary
python utils/ego_prepare.py testing testlistall.txt all
python utils/ego_prepare.py testing testlistall_but_None.txt all_but_None
python utils/ego_prepare.py testing testlistbinary.txt binary
  • Generate the annotation files in JSON format (similar to ActivityNet) using utils/egogesture_json.py
python utils/egogesture_json.py 'annotation_EgoGesture' all
python utils/egogesture_json.py 'annotation_EgoGesture' all_but_None
python utils/egogesture_json.py 'annotation_EgoGesture' binary
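
For reference, a minimal sketch of how one line of these generated lists could be parsed (illustrative only, not part of the repository; the example values are made up):

def parse_nframes_line(line):
    # "path to the folder" "class index" "start frame" "end frame"
    folder, class_index, start_frame, end_frame = line.strip().split()
    return folder, int(class_index), int(start_frame), int(end_frame)

# e.g. folder, label, start, end = parse_nframes_line('Subject01/Scene1/Color/rgb1 9 213 295')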

nvGesture

  • Download videos by following the official site.

  • Generate n_frames files using utils/nv_prepare.py

The n_frames file format is as follows: "path to the folder" "class index" "start frame" "end frame"

mkdir annotation_nvGesture
python utils/nv_prepare.py training trainlistall.txt all
python utils/nv_prepare.py training trainlistall_but_None.txt all_but_None
python utils/nv_prepare.py training trainlistbinary.txt binary
python utils/nv_prepare.py validation vallistall.txt all
python utils/nv_prepare.py validation vallistall_but_None.txt all_but_None
python utils/nv_prepare.py validation vallistbinary.txt binary
  • Generate the annotation files in JSON format (similar to ActivityNet) using utils/nv_json.py
python utils/nv_json.py 'annotation_nvGesture' all
python utils/nv_json.py 'annotation_nvGesture' all_but_None
python utils/nv_json.py 'annotation_nvGesture' binary

Jester

  • Download videos by following the official site.

  • The n_frames and class index files are already provided in annotation_Jester/{'classInd.txt', 'trainlist01.txt', 'vallist01.txt'}

The n_frames file format is as follows: "path to the folder" "class index" "start frame" "end frame"

  • Generate the annotation file in JSON format (similar to ActivityNet) using utils/jester_json.py
python utils/jester_json.py 'annotation_Jester'

Running the code

  • Offline testing (offline_test.py) and training (main.py)
bash run_offline.sh
  • Online testing
bash run_online.sh
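
The exact flags live in run_offline.sh and run_online.sh. As a rough sketch only (paths are placeholders and the values are taken from the examples further down this page, not from the actual scripts), an offline run looks roughly like:

python main.py \
    --root_path ~/ \
    --video_path <path-to-extracted-frames> \
    --annotation_path annotation_EgoGesture/egogestureall_but_None.json \
    --result_path results \
    --dataset egogesture \
    --model resnext \
    --model_depth 101 \
    --resnet_shortcut B \
    --modality Depth \
    --sample_duration 32 \
    --n_classes 83 \
    --n_finetune_classes 83 \
    --batch_size 16 \
    --n_threads 16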

Citation

Please cite the following articles if you use this code or pre-trained models:

@article{kopuklu_real-time_2019,
  title={Real-time Hand Gesture Detection and Classification Using Convolutional Neural Networks},
  author={K{\"o}p{\"u}kl{\"u}, Okan and Gunduz, Ahmet and Kose, Neslihan and Rigoll, Gerhard},
  url={http://arxiv.org/abs/1901.10323},
  year={2019}
}
@article{kopuklu2020online,
  title={Online Dynamic Hand Gesture Recognition Including Efficiency Analysis},
  author={K{\"o}p{\"u}kl{\"u}, Okan and Gunduz, Ahmet and Kose, Neslihan and Rigoll, Gerhard},
  journal={IEEE Transactions on Biometrics, Behavior, and Identity Science},
  volume={2},
  number={2},
  pages={85--97},
  year={2020},
  publisher={IEEE}
}

Acknowledgement

We thank Kensho Hara for releasing his codebase, on top of which we built our work.


real-time-gesrec's Issues

Torch size error with Jester Dataset

Hi.
I got this error message with the Jester dataset.

(screenshot of the error message attached)

I don't know the meaning of the second number in torch.Size([64, 3, 7, 7, 7]) or torch.Size([64, 1, 7, 7, 7]).

and this is my run_offline.sh file.

#!/bin/bash
python main.py \
    --root_path ~/ \
    --video_path /home/eden/20BN-jester/20bn-jester-v1/videos \
    --annotation_path ~/Real-time-GesRec/annotation_Jester/jester.json \
    --result_path ~/Real-time-GesRec/results \
    --resume_path ~/Real-time-GesRec/jester_resnext_101_RGB_32.pth \
    --dataset jester \
    --sample_duration 8 \
    --learning_rate 0.01 \
    --model resnext \
    --model_depth 101 \
    --resnet_shortcut A \
    --batch_size 16 \
    --n_classes 27 \
    --n_finetune_classes 27 \
    --n_threads 16 \
    --checkpoint 1 \
    --modality Depth \
    --train_crop random \
    --n_val_samples 3 \
    --test_subset test \
    --n_epochs 100

Prediction Accuracy on Nvidia Gesture dataset is very poor.

Hi Ahmet Gündüz,

I was using your pre-trained model (nv_resnext_101_Depth_32.pth) to test on the NVIDIA gesture dataset. My accuracy on this dataset is very poor (not even 20%). Can you explain whether this is the correct model to test and, if so, why the prediction accuracy is so poor?
I have followed the steps mentioned by you in your github post.

NvGesture dataset

There are 31 files named:

                      nvGesture_v1.7z.001
                     to nvGesture_v1.7z.031

I am looking to extract these files to video format. Since the files are split into a multi-part .7z archive, I tried using

                      cat nvGesture_v1.7z.0?? | 7za x

or

                      cat nvGesture_v1.7z.0?? | 7za e

but in both cases I get error:

                        Error:
                        Incorrect command line
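
In case it helps, a likely fix (assuming all .7z.0xx volumes sit in the same directory) is to point 7-Zip at the first volume only and let it pick up the rest, instead of piping the concatenated parts:

7za x nvGesture_v1.7z.001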

Training on the jester dataset - questions

Hi,

I have some question regarding the training process:

  1. Did you train the two models (detection & classification) separately?
  2. Can I use the Jester dataset to train both models? (Something that confuses me is that one of the classes in the Jester dataset is 'no gesture'.)
  3. As I understand it, you used the same code in main.py to train the two models separately?
  4. If so, what do I need to change to switch between training the two models, apart from the network parameters?

Thank you,
Olga

pre-trained model

Hi, would you like to share your pre-trained model that can be finetuned for both detection and classification?
Thanks.

Unable to extract Jester datasets

Hi,
I am getting an error while extracting the Jester dataset files: every archive except the 20bn-jester-v1-00 file fails. When I checked the types of the data files, I noticed that the file type of 20bn-jester-v1-00 is different from the others. I am attaching a screenshot of the error, which also shows the file types. Please help if you have resolved the same issue.
(screenshot of the error attached)
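
For what it's worth, the Jester archive is distributed as a single tgz split into numbered parts, so the parts are normally concatenated and extracted in one go rather than extracted individually (a sketch, assuming the parts sit in the current directory):

cat 20bn-jester-v1-?? | tar zx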

underfitting for nvGesture?

Hi, I tried to train the classifier for nvGesture from scratch (the hyperparameters come from run_online.sh: batch size 8, ResNeXt-101, 25 classes, lr 0.01, duration 32). But I found it is barely fitting: after tens of epochs, the accuracy on both the training and validation sets is about 5%.
When I increased the batch size to 16, the accuracy converged to about 9%, which is still very low. I also tried setting norm_value=255 to normalize the input data to a smaller range, and a smaller learning rate, but it didn't help.
Did I miss something?
By the way, the detector trained from scratch works well, with accuracy around 80%.

Start frame and end frame missing from trainlist01.txt and vallist01.txt.

In README.md it says that the n_frames format is as follows: "path to the folder" "class index" "start frame" "end frame".
However, that information seems to be missing from annotation_Jester/trainlist01.txt and annotation_Jester/vallist01.txt. Is it somewhere else? Am I looking at the right files?

Thanks.

accuracy

Hi Ahmet,

I trained the classifier using the EgoGesture dataset, but the validation accuracy is only around 50% and the training accuracy is around 60%.

I am using the ResNeXt-101 architecture.

Am I missing anything?

egogesture_online.py--IndexError

When I ran online_test.py, the error "IndexError: index -3 is out of bounds for axis 0 with size 0" occurred at line 152 of egogesture_online.py
(counts = np.bincount(label_list[np.array(list(range(_ - int(sample_duration/8), _ )))])).
I do not know how to resolve it.
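
One possible workaround (a sketch only; the variable names follow the line quoted above, and it assumes label_list is a NumPy array that is still empty at that point) is to skip the vote until enough labels have been collected:

window_len = int(sample_duration / 8)
if len(label_list) >= window_len and _ >= window_len:
    counts = np.bincount(label_list[np.array(list(range(_ - window_len, _)))])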

No correct results printed

Hi Ahmet,
I ran online_test.py on EgoGesture with CPU only by setting "opt.no_cuda=True" and "opt.n_threads=0", but it didn't produce the right results. I changed the code like this:

(screenshot of the modified code attached)

and one of the results printed to the console looks like this:

(screenshot of the console output attached)

It seems that the switch is never activated, so no result is ever appended:

(screenshot of the relevant code attached)

I'd appreciate it if you could help me with this

RGB-D result

Hi,

From the paper I only saw RGB or Depth results; how about RGB-D? Are you able to release pre-trained RGB-D models?

Thanks.

n_frames

Hi,

I am trying to use your AMAZING code with the jester dataset.
I have some questions:

  1. I see that you assume that for each video there is a directory named "n_frames".
    How can I create those directories?
    I downloaded the Jester dataset and extracted the data as described in the link, but there are no directories with the name "n_frames".
  2. Can you give the parameters for testing on the Jester dataset?

Thank you!

ResNet Detector Model

Hi Ahmet,

Could you quickly send the .pth file for the ResNet-10 detector model you talk about in your paper? This would help a lot with replicating what you did. Thanks!

Missed detection when the same gesture is performed twice in online mode?

Hi, ahmetgunduz:
I tested the model in online mode with my own video, and everything looks fine except when I show the same gesture twice: the model fails to predict the second gesture (no gesture is detected). I suspect the reason may be the rule-based filter, but I'm not sure. Could you please give me some advice?

cuda gpu device Error

Hi.

I have one GPU in my computer but I got this error.
I'm new to PyTorch, so I don't know what this error means.

Traceback (most recent call last):
  File "main.py", line 177, in <module>
    train_logger, train_batch_logger)
  File "/home/eden/Real-time-GesRec/train.py", line 34, in train_epoch
    outputs = model(inputs)
  File "/home/eden/anaconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eden/anaconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
    "them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu
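
A generic remedy for this DataParallel error (a sketch only, not necessarily how the repository intends it to be configured) is to move the model to the GPU before wrapping it and before the forward pass:

import torch.nn as nn

model = model.cuda()             # parameters and buffers must live on cuda:0 (device_ids[0])
model = nn.DataParallel(model)   # wrap only after moving the model to the GPU
outputs = model(inputs.cuda())   # keep the inputs on the GPU as well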

AssertionError

I'm new to PyTorch, and I ran into a lot of errors while debugging the program. Most of them have been resolved, but I'm stuck with an AssertionError.

How can the pre-trained Jester model be used to train on EgoGesture?

When I tried to use the pre-trained classification model from Jester to train on the EgoGesture dataset, it showed:

RuntimeError: Error(s) in loading state_dict for DataParallel:
size mismatch for module.fc.weight: copying a param with shape torch.Size([27, 2048]) from checkpoint, the shape in current model is torch.Size([83, 2048]).
size mismatch for module.fc.bias: copying a param with shape torch.Size([27]) from checkpoint, the shape in current model is torch.Size([83]).

It seems this is because Jester and EgoGesture have different numbers of gesture classes. So how should I change this parameter?

My code is shown like this:
#!/bin/bash
python main.py \
    --root_path ~/ \
    --video_path /home/wisccitl/Desktop/EgoGesture \
    --annotation_path Real-time-GesRec/annotation_EgoGesture/egogestureall_but_None.json \
    --result_path Real-time-GesRec/results \
    --resume_path Real-time-GesRec/models/jester_resnext_101_RGB_32.pth \
    --dataset egogesture \
    --sample_duration 32 \
    --learning_rate 0.01 \
    --model resnext \
    --model_depth 101 \
    --resnet_shortcut B \
    --batch_size 64 \
    --n_classes 83 \
    --n_finetune_classes 83 \
    --n_threads 16 \
    --checkpoint 1 \
    --modality RGB \
    --train_crop random \
    --n_val_samples 1 \
    --test_subset test \
    --n_epochs 100
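
Regarding the fc size mismatch above, one generic PyTorch workaround (a sketch only; it sidesteps whatever --pretrain_path and --n_finetune_classes are meant to do in this repository) is to drop the 27-class Jester head from the checkpoint and load the rest non-strictly:

import torch

checkpoint = torch.load('jester_resnext_101_RGB_32.pth')
# skip the 27-class fc head so it is not copied into the 83-class EgoGesture model
state_dict = {k: v for k, v in checkpoint['state_dict'].items()
              if not k.startswith('module.fc')}
model.load_state_dict(state_dict, strict=False)  # the new fc layer stays randomly initialized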

I modified run_offline.sh to fit the Jester dataset, but got precision 0.03 and recall 0.03

Hi,
Here's my modified run_offline.sh. If any of my parameters are wrong, please help me correct them. Thanks!

  • python offline_test.py
    --root_path ~/
    --video_path /home/ps/NewDisk1/Public_open/Jester/20bn-jester-v1
    --annotation_path ~/Codes/Real-time-GesRec/annotation_Jester/jester.json
    --result_path ~/Codes/Real-time-GesRec/results
    --pretrain_path Codes/Real-time-GesRec/pretrained_models/jester_resnext_101_RGB_32.pth
    --dataset jester
    --sample_duration 32
    --learning_rate 0.01
    --model resnext
    --model_depth 101
    --batch_size 1
    --n_classes 27
    --n_finetune_classes 27
    --modality RGB
    --n_threads 8
    --checkpoint 1
    --train_crop random
    --n_val_samples 1
    --test_subset val
    --n_epochs 100

[14787/14787] Time 0.04845 (0.09407) prec@1 0.03293 prec@5 0.18780 precision 0.00000 (0.03293) recall 0.00000 (0.03293)
-----Evaluation is finished------
Overall Prec@1 0.03293% Prec@5 0.18780%

RGB pre-trained models

Are the models in the drive also usable for RGB prediction/classification?

If not, could I kindly ask you to upload these models as well?

I am asking since the names suggest that every model (except the Jester one) is for Depth data.

Thank you very much

How to unzip the nvGesture dataset? I used 7za e datafilename

7-Zip (A) [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,72 CPUs)

Processing archive: /home/lxj/Gesture_recognition/data/nvGesture/nvGesture_v1.7z.001

Error: E_FAIL

Reshaping error "shape '[32, -1, 112, 112]' is invalid for input of size 865536"

I am trying to do both detection and classification for the Jester dataset. I have trained the detector part and saved its checkpoint, and for the classification part I am using the checkpoint that you provided. For inference I am running run_online.sh with both checkpoints in place. For that I made a Jester_online.py file, just like your egogesture_online.py, to provide the dataset for evaluation, but it shows a reshape error at the following line -
clip = torch.cat(clip, 0).view((self.sample_duration, -1) + im_dim).permute(1, 0, 2, 3)
the error is -
RuntimeError: shape '[32, -1, 112, 112]' is invalid for input of size 865536

For simple classification I didn't get this error, and I don't know where the number 865536 came from.
Can you help me out with this problem? I am attaching the screenshot here.
(screenshot of the error attached)
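
As a quick sanity check (plain arithmetic, assuming 3-channel 112x112 frames), 865536 elements correspond to fewer frames than the requested sample_duration of 32, which suggests the loader collected a short clip:

numel = 865536                              # size reported in the error
channels, height, width = 3, 112, 112
print(numel / (channels * height * width))  # 23.0 -> the clip holds 23 frames, not 32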

EgoGesture Dataset is not available

Hi,

Currently I am studying your code and trying to reproduce your results.

However, I could not download the EgoGesture dataset (the author's email address is wrong).

Could you please provide another link to download the dataset, or any other help?

Thank you!

caffe-repository

Hi, congratulations on your wonderful work. I wonder whether you have any plans to release a Caffe repository?
Thanks!

Testing on real-time RGB video using jester Pretrained model

I am trying to understand your code. I understand that for the other datasets there are two models, a detector and a classifier. I only have the Jester dataset available, and for that there is only one model.

Can you please tell me how to do real-time detection on an RGB camera video without the detector?
Can we recognize gestures with the Jester model, or can a model from one of the other datasets be used for this?

Not able to run offline_test.py for Jester dataset

Thank you so much for the great solution.

I am in the process of validating the solution and understanding more. I tried to test the pretrained model jester_resnext_101_RGB_32.pth with the Jester dataset.

I downloaded the dataset and performed frame creation with python utils/jester_json.py 'annotation_Jester'.

But the command python offline_test.py gives the error below:

dataset loading [14780/14787]
Traceback (most recent call last):
  File "offline_test.py", line 161, in <module>
    outputs = model(inputs)
  File "/home/albin/anaconda3/envs/l3c_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/albin/anaconda3/envs/l3c_env/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
    "them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

Could someone please help me resolve this issue?

Regards,
Albin

Test the model with RGB_D camera

Hi, thanks for sharing your amazing model.

I am now trying to test the model with my RGB-D camera. However, I am a beginner in PyTorch, so I need some help going through the code:

  1. I plan to feed the model with depth images, obtained from the camera with OpenNI and OpenCV. The shape of each frame is (112,112,3). If I want to detect and classify n frames in each iteration, what shape should the input be?

  2. What does "sample_duration" mean? What is the difference between "sample_duration_det" and the detector queue?

I am using egogesture depth model.

run online_test.py

Hi Mr Ahmet,
thanks for sharing your great project.
I was going to test your pretrained model in online mode, but I ran into an error when loading the model.
Please help me.
Namespace(annotation_path='/home/sattarian/Documents/projects/hand-guesture/annotation_EgoGesture/egogestureall.json', arch='resnetl-10', batch_size=1, begin_epoch=1, checkpoint=1, clf_queue_size=16, clf_strategy='median', clf_threshold_final=0.15, clf_threshold_pre=0.6, crop_position_in_test='c', dampening=0.9, dataset='egogesture', det_counter=2.0, det_queue_size=4, det_strategy='median', ft_begin_index=0, initial_scale=1.0, learning_rate=0.1, lr_patience=10, lr_steps=[10, 20, 30, 40, 100], manual_seed=1, mean=[114.7748, 107.7354, 99.475], mean_dataset='activitynet', modality='Depth', modality_clf='Depth', modality_det='Depth', model='resnetl', model_clf='resnext', model_depth=10, model_depth_clf=101, model_depth_det=10, model_det='resnetl', momentum=0.9, n_classes=2, n_classes_clf=83, n_classes_det=2, n_epochs=200, n_finetune_classes=2, n_finetune_classes_clf=83, n_finetune_classes_det=2, n_scales=5, n_threads=16, n_val_samples=1, nesterov=False, no_cuda=False, no_hflip=False, no_mean_norm=False, no_softmax_in_test=False, no_train=False, no_val=False, norm_value=1, optimizer='sgd', pretrain_path='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnetl_10_Depth_8.pth', pretrain_path_clf='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnext_101_Depth_32.pth', pretrain_path_det='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnetl_10_Depth_8.pth', resnet_shortcut='A', resnet_shortcut_clf='B', resnet_shortcut_det='A', resnext_cardinality=32, resnext_cardinality_clf=32, resnext_cardinality_det=32, result_path='/home/sattarian/Documents/projects/hand-guesture/results', resume_path='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnetl_10_Depth_8.pth', resume_path_clf='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnext_101_Depth_32.pth', resume_path_det='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnetl_10_Depth_8.pth', root_path='/home/sattarian/Documents/projects/hand-guesture/', sample_duration=8, sample_duration_clf=32, sample_duration_det=8, sample_size=112, scale_in_test=1.0, scale_step=0.84089641525, scales=[1.0, 0.84089641525, 0.7071067811803005, 0.5946035574934808, 0.4999999999911653], std=[38.7568578, 37.88248729, 40.02898126], std_norm=False, store_name='model', stride_len=1, test=True, test_subset='test', train_crop='random', video_path='/home/sattarian/Documents/projects/hand-guesture/video_kinetics_jpg', weight_decay=0.001, whole_path='video_kinetics_jpg', wide_resnet_k=2, wide_resnet_k_clf=2, wide_resnet_k_det=2)
loading pretrained model /home/sattarian/Documents/projects/hand-guesture/egogesture_resnetl_10_Depth_8.pth
Traceback (most recent call last):
  File "online_test.py", line 137, in <module>
    detector, classifier = load_models(opt)
  File "online_test.py", line 75, in load_models
    detector, parameters = generate_model(opt)
  File "/home/sattarian/Documents/projects/hand-guesture/model.py", line 68, in generate_model
    model.load_state_dict(pretrain['state_dict'])
  File "/home/sattarian/anaconda3/envs/deep-learning/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
size mismatch for module.conv1.weight: copying a param with shape torch.Size([16, 1, 7, 7, 7]) from checkpoint, the shape in current model is torch.Size([16, 3, 7, 7, 7]).
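
One way to see what the checkpoint expects (a sketch; the key name is taken straight from the error message) is to inspect its first convolution, which here has a single input channel, i.e. the checkpoint was trained on 1-channel Depth input while the model being built expects 3 channels:

import torch

ckpt = torch.load('egogesture_resnetl_10_Depth_8.pth', map_location='cpu')
print(ckpt['state_dict']['module.conv1.weight'].shape)  # torch.Size([16, 1, 7, 7, 7]) -> 1 input channel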

size mismatch for jester pre-trained model

Hi,

I am trying to apply offline_test.py to the Jester dataset with your pre-trained model, and I got:
"size mismatch for module.conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 3, 3, 7, 7])."

I think that maybe I have some problem with my parameters.
Can you please help me?

Thank you again!
I appreciate all your help!

Get stuck at running online_test.py with pretrained model on CPU

I was trying to run online_test.py with a pretrained model on CPU (without CUDA). I made some modifications in model.py and online_test.py, including:

  1. added opt.no_cuda = True right after opt = parse_opts_online()
  2. added map_location=torch.device('cpu') to torch.load(opt.pretrain_path)
  3. modified model.load_state_dict(pretrain['state_dict']) at line 120-ish to

            state_dict = pretrain['state_dict']
            from collections import OrderedDict
            new_state_dict = OrderedDict()
            for k, v in state_dict.items():
                name = k
                if 'module' in k:
                    name = k[7:]  # remove the 'module.' prefix added by DataParallel
                new_state_dict[name] = v
            model.load_state_dict(new_state_dict)

This solved the issue of:

Missing key(s) in state_dict: "conv1.weight", ...
Unexpected key(s) in state_dict: "module.conv1.weight", ...

(solved by looking into here)

However, now it gives me an error:

Traceback (most recent call last):
  File "online_test.py", line 138, in <module>
    detector,classifier = load_models(opt)
  File "online_test.py", line 76, in load_models
    detector, parameters = generate_model(opt)
  File "...../Real-time-GesRec-master/model.py", line 132, in generate_model
    model.load_state_dict(new_state_dict)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResNetL:
        size mismatch for conv1.weight: copying a param with shape torch.Size([16, 1, 7, 7, 7]) from checkpoint, the shape in current model is torch.Size([16, 3, 7, 7, 7]).

Thanks in advance!!!

Model accuracy with Jester dataset is poor

Hi, I have tried to validate the pretrained model with the Jester dataset.

Preconditions:

  1. Pretrained model used: jester_resnext_101_RGB_32.pth
  2. Dataset Jester
  3. Configurations opts.zip
  4. Source modifications diff.zip
  5. PyTorch version 1.1.0
  6. Python version 3.7.3

Test:

  1. python utils/jester_json.py 'annotation_Jester' to prepare the dataset
  2. python offline_test.py to start the execution

But the output precision is very poor

[11/3721] Time 1.07421 (1.13381) prec@1 0.03409 prec@5 0.20455 precision 0.00000 (0.03213) recall 0.00000 (0.01278)
[12/3721] Time 1.09013 (1.13017) prec@1 0.03646 prec@5 0.20312 precision 0.03030 (0.03198) recall 0.03030 (0.01424)
[13/3721] Time 1.07996 (1.12631) prec@1 0.03365 prec@5 0.20192 precision 0.00000 (0.02952) recall 0.00000 (0.01315)
[14/3721] Time 1.08615 (1.12344) prec@1 0.03125 prec@5 0.20089 precision 0.00000 (0.02741) recall 0.00000 (0.01221)

Could you please help me find what I am missing to get the proper output?
Regards,
Albin

How to feed input to classifier in online_test.py using tensor.float

Hi,
How can I feed input to the classifier in online_test.py using a float tensor?
I tried:

frame = np.reshape(frame, (1, 1, 1, 512, 512))
frame = cv2.normalize(frame, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
input_clf = torch.from_numpy(frame).float()
outputs_det = classifier(inputs_clf)

I get the following error:

RuntimeError: invalid argument 2: input image (T: 1 H: 32 W: 16) smaller than kernel size (kT: 2 kH: 3 kW: 3) at /pytorch/aten/src/THCUNN/generic/VolumetricAveragePooling.cu:57

Originally posted by @sathiez in #17 (comment)
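
For reference, a 3D CNN classifier of this kind expects a 5D input of shape (batch, channels, frames, height, width). A minimal sketch, assuming Depth modality (1 channel), sample_duration_clf=32 and sample_size=112 (placeholder data, not the repo's preprocessing):

import numpy as np
import torch

frames = [np.random.rand(112, 112).astype(np.float32) for _ in range(32)]  # 32 preprocessed frames
clip = torch.from_numpy(np.stack(frames))   # (32, 112, 112)
clip = clip.unsqueeze(0).unsqueeze(0)       # (1, 1, 32, 112, 112) = (batch, channels, frames, H, W)
# outputs_clf = classifier(clip)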

RuntimeError: input and weight type

I am trying to train a detector on the Jester dataset. However, when I run run_offline.sh I encounter the following error right after the dataset is loaded:

Traceback (most recent call last):
  File "main.py", line 177, in <module>
    train_logger, train_batch_logger)
  File "/home/khasmamad/Desktop/kimo/Real-time-GesRec/train.py", line 34, in train_epoch
    outputs = model(inputs)
  File "/home/khasmamad/miniconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/khasmamad/miniconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/khasmamad/miniconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/khasmamad/Desktop/kimo/Real-time-GesRec/models/resnetl.py", line 177, in forward
    x = self.conv1(x)
  File "/home/khasmamad/miniconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/khasmamad/miniconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 448, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

Googling showed me that this happens when the input and model are on separate devices (in this case the input is on the GPU while the model is on the CPU), but I still cannot figure out a solution. Please help.
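
A generic fix for this mismatch (a sketch, not repository-specific) is to put the model and the inputs on the same device before the forward pass:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
inputs = inputs.to(device)
outputs = model(inputs)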

Variables in online_test.py

Hi again,

Can you please give more details of some of the variables in online_test script?
I think that I understand them, but after looking at the code, I am not sure...

  1. passive_count => # consecutive number of 'no gesture' classifier
  2. active
  3. active_index
  4. pre_predict
  5. finished_prediction
  6. prev_active

Is there a situation where finished_prediction = False but we have reached the last window frame?
In that case results is empty (predicted = np.array(results)[:, 1]) and the code fails.
I am wondering how to deal with that, and how to calculate the Levenshtein distance in this case.

Thank you again!!!!

Explanation of Testing

Hey Ahmet,
thanks for this amazing work.

I was going to test your pretrained model, but there are a lot of ambiguities. Can you please explain how to run and test your model, or provide instructions?

Queries about training

Hello,

I am trying to implement your paper from scratch as part of my project and have some questions which I was hoping you could answer. I am only trying to train the detector half of the network for now, using the Jester dataset.

  1. How is the data fed in? Each folder has 'n' frames which belong to a category (i.e. gesture or no gesture). Taking the detector queue as 8 frames, do you then split the 'n' frames into n/8 chunks, each carrying the gesture or no-gesture label?
  2. How long did you pre-train on Jester for? Your paper mentions 25 epochs, but I am guessing that is for the classifier? Your code seems to indicate 100 epochs instead.

I am hoping you can help me with this. Thanks in advance!

Regards,
Nishant Bhattacharya

any readme?

Hi:
Thanks for sharing your code.
But could you please write a README?

jester pretrained model

Hello, you did a great job!
My question is about the Jester ResNeXt pretrained weights:

  1. Given, for example, a single folder with 35 RGB frames, what transformations do I need to apply to these frames to get the right input and then make a prediction?
  2. How can I load the ResNeXt model properly?
  3. Are the pretrained weights compatible with PyTorch 1.0.1 and CUDA 10?

nvGesture training

Hello Ahmet, I am trying to run your code on the NVIDIA dataset. On running main.py, the train.log looks like this:
epoch loss acc precision recall lr

1 0 0 0 0 0.1

2 0 0 0 0 0.1

3 0 0 0 0 0.1

which I don't think is right. Can you please tell me what I am doing wrong?
Other than setting the paths and reducing the epoch value from 100 to 50, I haven't changed anything.
GesRec.pdf
These are the parameters and a part of train.log after running the main.py.

opts

Hey Ahmet,

I am trying to replicate your work. I am having problems with the dataloader (probably). I haven't changed anything in your code except for the paths and a few minor changes for which I was getting errors.

My opt looks like:

annotation_path='/home/ndhingra/Real-time-GesRec/Real-time-GesRec/annotation_EgoGesture/egogestureall.json', arch='resnet-10', batch_size=128, begin_epoch=1, checkpoint=10, crop_position_in_test='c', dampening=0.9, dataset='egogesture', ft_begin_index=0, initial_scale=1.0, learning_rate=0.1, lr_patience=10, lr_steps=[10, 25, 50, 80, 100], manual_seed=1, mean=[114.7748, 107.7354, 99.475], mean_dataset='activitynet', modality='RGB', model='resnet', model_depth=10, momentum=0.9, n_classes=400, n_epochs=200, n_finetune_classes=400, n_scales=5, n_threads=4, n_val_samples=3, nesterov=False, no_cuda=False, no_hflip=False, no_mean_norm=False, no_softmax_in_test=False, no_train=False, no_val=False, norm_value=1, optimizer='sgd', pretrain_path='', resnet_shortcut='B', resnext_cardinality=32, result_path='/home/ndhingra/Real-time-GesRec/Real-time-GesRec/results', resume_path='', root_path='/home/ndhingra/Real-time-GesRec/Real-time-GesRec', root_video_path='/media/storage/ndhingra/EgoGesture', sample_duration=16, sample_size=112, scale_in_test=1.0, scale_step=0.84089641525, scales=[1.0, 0.84089641525, 0.7071067811803005, 0.5946035574934808, 0.4999999999911653], std=[38.7568578, 37.88248729, 40.02898126], std_norm=False, store_name='model', test=False, test_subset='val', train_crop='corner', train_validate=False, video_path='/home/ndhingra/Real-time-GesRec/Real-time-GesRec/images', weight_decay=0.001, weighted=False, wide_resnet_k=2)

I get error in main.py

train_loader = torch.utils.data.DataLoader(
    training_data,
    batch_size=opt.batch_size,
    shuffle=True,
    num_workers=opt.n_threads,
    pin_memory=True)

i.e.,

ValueError: num_samples should be a positive integeral value, but got num_samples=0

Can you suggest what changes I need to make? If possible, can you also upload the opts.py you used for the EgoGesture dataset? Since I haven't made any changes to your code, I expect it to work as it did for you.
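
That ValueError usually means the dataset object ended up empty. A quick check (just a sketch) before constructing the DataLoader is:

print(len(training_data))  # 0 here means video_path / annotation_path did not match any samples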

Could you please give me some advice on fine-tuning from Jester with more classes?

Hi, ahmetgunduz:
I tried to fine-tune a model (classifier) trained on Jester to my own gesture dataset, but the performance is awful. Could you please give me some advice? Could you also share some details of your EgoGesture fine-tuning experiments (learning rate, frozen layers, frozen batch norm)?
My own gesture dataset has 88 classes, almost the union of the Jester and EgoGesture classes, with only a few samples per class (train: 42 samples, val: 6 samples). The dataset is small, recorded with a distorted (wide-angle) camera from a second-person perspective, and is otherwise similar to Jester.

In my experiment, the architecture of the trained model is resnet34_0.5channels (4 block layers: [3, 4, 6, 3]). Here are the results:
a. train only the fc layer, all conv and bn frozen, lr 0.001, dropout 0.7; performance: train 0.573, val 0.463
b. train block layer4 and fc, conv1 and layer1~3 (including bn) frozen, lr 0.01, dropout 0.7; performance: train 0.743, val 0.447
c. train the entire model, large lr 0.01 due to a bug of mine, no dropout, but this got the best performance: train 0.909, val 0.582
d. train from scratch, lr 0.01, dropout 0.7; performance: train 0.712, val 0.329

I also found that the model pretrained on Jester may predict some swiping cases in my own dataset as sliding, due to the distortion.

It seems the model has not benefited much from fine-tuning, given that there are more classes than in Jester plus the distortion. Could you give me some advice?

Thanks.
