Hand Detection Tutorial

This is a tutorial on how to train a 'hand detector' with the TensorFlow object detection API. This README outlines how to set everything up and train the object detection model locally. You could refer to the accompanying blog post for a more detailed description of the steps.

Setup

Just for reference, the code in this repository has been tested on a desktop PC with:

  • NVIDIA GeForce GTX-1080Ti
  • Ubuntu 16.04.5 LTS (x86_64)
  • CUDA 9.2
  • cuDNN 7.1.4
  • TensorFlow 1.10.0

This tutorial uses python3 for training and testing the TensorFlow object detection models. Follow the steps below to set up the environment for training the models. Make sure tensorflow-gpu or tensorflow (the python3 package) is already installed on the system.

  1. Clone this repository.

    $ cd ~/project
    $ git clone https://github.com/jkjung-avt/hand-detection-tutorial.git
    $ cd hand-detection-tutorial
  2. Install required python3 packages.

    $ sudo pip3 install -r requirements.txt

    In case you are having trouble with sudo, you can do pip3 install --user -r requirements.txt instead.

  3. Run the installation script. Make sure the last step in the script, Running model_builder_test.py, finishes without error, before continuing on.

    $ ./install.sh
  4. Download pretrained models from TensorFlow Object Detection Model Zoo.

    $ ./download_pretrained_models.sh

Training

  1. Prepare the 'egohands' dataset.

    $ python3 prepare_egohands.py

    The prepare_egohands.py script downloads the 'egohands' dataset and converts its annotations to KITTI format. When finished, the following files should be present in the folder. Note that the 'egohands' dataset contains 4,800 jpg images in total.

    ./egohands_data.zip
    ./egohands
      ├── (egohands dataset unzipped)
      └── ......
    ./egohands_kitti_formatted
      ├── images
      │   ├── CARDS_COURTYARD_B_T_frame_0011.jpg
      │   ├── ......
      │   └── PUZZLE_OFFICE_T_S_frame_2697.jpg
      └── labels
          ├── CARDS_COURTYARD_B_T_frame_0011.txt
          ├── ......
          └── PUZZLE_OFFICE_T_S_frame_2697.txt
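
    Each file under labels/ uses the plain-text KITTI format, one object per line; for this dataset only the class name ('hand') and the four bounding-box coordinates carry information, and the remaining fields are zero. A minimal sketch for reading the boxes back (the 15-field layout is the standard KITTI convention, assumed here to match this repo's output):

      # Hedged sketch: parse a KITTI-format label file produced by
      # prepare_egohands.py.  Assumes the standard KITTI layout, in which
      # fields 4..7 hold the bounding box (xmin, ymin, xmax, ymax).
      def read_kitti_boxes(label_path):
          boxes = []
          with open(label_path) as f:
              for line in f:
                  fields = line.split()
                  if not fields:
                      continue
                  cls = fields[0]  # e.g. 'hand'
                  xmin, ymin, xmax, ymax = map(float, fields[4:8])
                  boxes.append((cls, xmin, ymin, xmax, ymax))
          return boxes

      print(read_kitti_boxes(
          'egohands_kitti_formatted/labels/CARDS_COURTYARD_B_T_frame_0011.txt'))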
    
  2. Create the TFRecord files (train/val) needed to train the object detection model. The create_tfrecords.py script splits the jpg images into 'train' (4,300) and 'val' (500) sets, then generates data/egohands_train.tfrecord and data/egohands_val.tfrecord. This process might take a few minutes. The resulting TFRecord files are roughly 1.1GB and 132MB in size.

    $ ./create_tfrecords.sh
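
    To sanity-check the generated files, you can count the records with the TensorFlow 1.x record iterator (a minimal sketch, assuming the paths above):

      # Hedged sketch: count examples in the generated TFRecord files
      # with the TF 1.x record iterator.
      import tensorflow as tf

      for path in ('data/egohands_train.tfrecord', 'data/egohands_val.tfrecord'):
          count = sum(1 for _ in tf.python_io.tf_record_iterator(path))
          print(path, count)  # expect roughly 4,300 and 500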
  3. (Optional) Review and modify the model config file if necessary. For example, open the file configs/ssd_mobilenet_v1_egohands.config with an editor and adjust it, as illustrated below.
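
    An illustrative excerpt of the kind of fields you would typically adjust (example values only; the actual contents of configs/ssd_mobilenet_v1_egohands.config may differ):

      # Illustrative pipeline-config excerpt (example values, not verbatim).
      model {
        ssd {
          num_classes: 1    # egohands has a single 'hand' class
        }
      }
      train_config {
        batch_size: 24
        num_steps: 20000    # the 20,000 iterations mentioned in step 4
      }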

  4. Start training the model by invoking ./train.sh <model_name>. For example, to train a detector based on ssd_mobilenet_v1, do this:

    $ ./train.sh ssd_mobilenet_v1_egohands

    The training is set to run for 20,000 iterations. It takes roughly 2 hours to finish on the desktop PC listed above.

    If you have multiple GPUs, you could specify which GPU to use for the training with the CUDA_VISIBLE_DEVICES environment variable. For example, the following command starts a training session for the faster_rcnn_inception_v2_egohands model on the 2nd GPU (GPU #1).

    $ CUDA_VISIBLE_DEVICES=1 ./train.sh faster_rcnn_inception_v2_egohands
  5. Monitor the progress of training with TensorBoard, by executing tensorboard in another terminal.

    $ cd ~/project/hand-detection-tutorial
    $ tensorboard --logdir=ssd_mobilenet_v1_egohands

    Then open http://localhost:6006 with a browser locally. (You could also replace localhost with the IP address of the training PC and monitor the training remotely.)

    TensorBoard showing learning rate and loss curve of ssd_mobilenet_v1_egohands

Evaluating the trained model

  • The trained model can be evaluated by simply executing the ./eval.sh script. For example,

    # similar to train.sh, use 'CUDA_VISIBLE_DEVICES' to specify GPU
    $ ./eval.sh ssd_mobilenet_v1_egohands

    Here's an example of the evaluation output. Among all the numbers, the author pays most attention to the 'AP @ IoU=0.50' value (0.967); a short sketch of the IoU computation follows the numbers below.

      Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.681
      Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.967
      Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.809
      Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.079
      Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.313
      Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.717
      Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.258
      Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.736
      Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.742
      Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.118
      Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.466
      Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.774
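
    ('IoU' is the intersection-over-union overlap between a detected box and a ground-truth box; 'AP @ IoU=0.50' counts a detection as correct when that overlap is at least 0.5. A minimal sketch of the computation:)

      # Minimal sketch: intersection-over-union of two (xmin, ymin, xmax, ymax)
      # boxes -- the overlap measure the AP/AR thresholds above refer to.
      def iou(a, b):
          ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
          iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
          inter = ix * iy
          area_a = (a[2] - a[0]) * (a[3] - a[1])
          area_b = (b[2] - b[0]) * (b[3] - b[1])
          return inter / (area_a + area_b - inter)

      print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # 0.333...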
    

    In addition, you could run tensorboard to inspect details of the evaluation. Note that --logdir points to the '_eval' directory in the command below.

    $ cd ~/project/hand-detection-tutorial
    $ tensorboard --logdir=ssd_mobilenet_v1_egohands_eval

    Again, open http://localhost:6006 or http://<IP.addr>:6006 with a browser. Click on the 'IMAGES' tab. You can then browse through all images in the validation set and check how well your trained model performs on those images.

    TensorBoard showing evaluation result of ssd_mobilenet_v1_egohands

Testing the trained model with an image

  • This repo also includes scripts to test the trained model with your own image file. For example, the following commands convert a trained ssdlite_mobilenet_v2_egohands model into a frozen graph (saved under model_exported/), and then use that graph to detect hands in data/jk-son-hands.jpg. The output image, with bounding boxes overlaid, is saved as detection_output.jpg.

    $ CUDA_VISIBLE_DEVICES=0 ./export.sh ssdlite_mobilenet_v2_egohands
    $ CUDA_VISIBLE_DEVICES=0 ./detect_image.sh data/jk-son-hands.jpg 

    You can then check out the output image by, say,

    $ display detection_output.jpg
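
    If you'd rather run the exported graph from your own Python code than via detect_image.sh, the usual TF 1.x pattern looks roughly like this (a hedged sketch: the tensor names are the object detection API's conventions, and the file name frozen_inference_graph.pb under model_exported/ is an assumption):

      # Hedged sketch: run the exported frozen graph on one image with TF 1.x.
      import cv2
      import tensorflow as tf

      graph_def = tf.GraphDef()
      with tf.gfile.GFile('model_exported/frozen_inference_graph.pb', 'rb') as f:
          graph_def.ParseFromString(f.read())
      with tf.Graph().as_default() as graph:
          tf.import_graph_def(graph_def, name='')

      img = cv2.cvtColor(cv2.imread('data/jk-son-hands.jpg'), cv2.COLOR_BGR2RGB)
      with tf.Session(graph=graph) as sess:
          boxes, scores, classes, num = sess.run(
              ['detection_boxes:0', 'detection_scores:0',
               'detection_classes:0', 'num_detections:0'],
              feed_dict={'image_tensor:0': img[None, ...]})
      print(scores[0][:5])  # top-5 detection confidences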

    Detection result with ssdlite_mobilenet_v2_egohands

Deploying the trained model onto Jetson TX2/Nano

Please refer to the following GitHub repos and blog posts.

hand-detection-tutorial's Issues

Trouble converting retrained SSD Mobilenet v2 model to TensorRT engine

Hi, this issue is pretty similar to #24.

I retrained ssd_mobilenet_v2_coco on a custom dataset and would like to use it on my Jetson Nano. I'm using two classes. I used tensorflow 1.14 to retrain the model from the checkpoint using the Object Detection API at this commit: https://github.com/tensorflow/models/tree/adc01cd76ae0d9d3b2e8dde3ec6bf4086f7da046. That worked successfully, and I have a frozen_inference_graph.pb.

I'm trying to use your script to build a TensorRT engine. I'm using the same settings for "min_size", "max_size", and "input_order" as you use for ssd_mobilenet_v2_coco. When I run it, I also get the error:

[TensorRT] ERROR: UffParser: Validator error: FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_4_3x3_s2_256/BatchNorm/FusedBatchNormV3: Unsupported operation _FusedBatchNormV3
num layers= 0
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
File "build_engine.py", line 233, in
main()
File "build_engine.py", line 227, in main
buf = engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'

I see that you recommended adding 1 to "num_classes", or changing "input_order", but neither of those worked for me. Also, I didn't retrain the model with a background class, only my two custom classes, so I'm not sure changing "num_classes" makes sense.

These are the versions running on the Nano:

tensorflow: 1.14
tensorrt: 6.0.1.10
uff: 0.6.5

I know you suggest running earlier versions of tensorflow and the Object Detection API, but as I was getting the same error when running "ssd_tensorrt.py" as part of the jetbot repo, I thought something else might be happening here. Also, I'd ideally not tie myself to older versions in this way.

Thanks for your help.

./train.sh result window

I0429 22:25:22.272610 140074595292928 basic_session_run_hooks.py:262] loss = 19.405396, step = 1
INFO:tensorflow:global_step/sec: 0.300812
I0429 22:30:54.705383 140074595292928 basic_session_run_hooks.py:692] global_step/sec: 0.300812
INFO:tensorflow:loss = 4.1986165, step = 101 (332.433 sec)
I0429 22:30:54.705876 140074595292928 basic_session_run_hooks.py:260] loss = 4.1986165, step = 101 (332.433 sec)
INFO:tensorflow:Saving checkpoints for 178 into ssd_mobilenet_v2_egohands/model.ckpt.
I0429 22:35:13.434082 140074595292928 basic_session_run_hooks.py:606] Saving checkpoints for 178 into ssd_mobilenet_v2_egohands/model.ckpt.

The training seemed to be progressing as above, when suddenly the following message was printed.

Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means tf.py_functions can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
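
(For reference, a minimal sketch of the two replacements that message describes, using the TF 1.14+/2.x APIs:)

# Minimal sketch of the two tf.py_func replacements mentioned above.
import numpy as np
import tensorflow as tf

def square_np(x):       # operates on numpy arrays
    return np.square(x)

def square_eager(x):    # operates on eager tensors
    return x * x

a = tf.constant([1.0, 2.0, 3.0])
y1 = tf.numpy_function(square_np, [a], tf.float32)  # numpy semantics
y2 = tf.py_function(square_eager, [a], tf.float32)  # eager-tensor semantics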

In addition, the following warnings were printed.

W0429 22:35:21.032854 140071483201280 coco_evaluation.py:133] Ignoring detection with image id 644676416 since it was previously added
WARNING:tensorflow:Ignoring ground truth with image id 1194511846 since it was previously added
W0429 22:35:21.066672 140071474808576 coco_evaluation.py:82] Ignoring ground truth with image id 1194511846 since it was previously added
WARNING:tensorflow:Ignoring detection with image id 1194511846 since it was previously added
W0429 22:35:21.066817 140071474808576 coco_evaluation.py:133] Ignoring detection with image id 1194511846 since it was previously added
WARNING:tensorflow:Ignoring ground truth with image id 195028185 since it was previously added
W0429 22:35:21.101681 140071474808576 coco_evaluation.py:82] Ignoring ground truth with image id 195028185 since it was previously added
WARNING:tensorflow:Ignoring detection with image id 195028185 since it was previously added
W0429 22:35:21.101828 140071474808576 coco_evaluation.py:133] Ignoring detection with image id 195028185 since it was previously added
INFO:tensorflow:Evaluation [50/500]

After [500/500], an evaluation summary was printed.

INFO:tensorflow:Evaluation [500/500]
I0429 22:35:38.560459 140074595292928 evaluation.py:167] Evaluation [500/500]
creating index...
index created!
INFO:tensorflow:Loading and preparing annotation results...
I0429 22:35:39.022960 140071474808576 coco_tools.py:109] Loading and preparing annotation results...
INFO:tensorflow:DONE (t=0.00s)
I0429 22:35:39.024285 140071474808576 coco_tools.py:131] DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=0.10s).
Accumulating evaluation results...
DONE (t=0.03s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.194
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.664
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.006
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.209
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.207
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.208
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.396
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.440
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.475
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.433

In addition, the following summary was printed:

INFO:tensorflow:Saving dict for global step 178: DetectionBoxes_Precision/mAP = 0.19424734, DetectionBoxes_Precision/mAP (large) = 0.20710744, DetectionBoxes_Precision/mAP (medium) = 0.2091659, DetectionBoxes_Precision/mAP (small) = -1.0, DetectionBoxes_Precision/mAP@.50IOU = 0.6642149, DetectionBoxes_Precision/mAP@.75IOU = 0.005652994, DetectionBoxes_Recall/AR@1 = 0.208, DetectionBoxes_Recall/AR@10 = 0.396, DetectionBoxes_Recall/AR@100 = 0.44, DetectionBoxes_Recall/AR@100 (large) = 0.43333334, DetectionBoxes_Recall/AR@100 (medium) = 0.475, DetectionBoxes_Recall/AR@100 (small) = -1.0, Loss/classification_loss = 5.17783, Loss/localization_loss = 2.4297307, Loss/regularization_loss = 0.23962235, Loss/total_loss = 7.8471913, global_step = 178, learning_rate = 0.004, loss = 7.8471913
I0429 22:35:39.247992 140074595292928 estimator.py:2033] Saving dict for global step 178: DetectionBoxes_Precision/mAP = 0.19424734, DetectionBoxes_Precision/mAP (large) = 0.20710744, DetectionBoxes_Precision/mAP (medium) = 0.2091659, DetectionBoxes_Precision/mAP (small) = -1.0, DetectionBoxes_Precision/mAP@.50IOU = 0.6642149, DetectionBoxes_Precision/mAP@.75IOU = 0.005652994, DetectionBoxes_Recall/AR@1 = 0.208, DetectionBoxes_Recall/AR@10 = 0.396, DetectionBoxes_Recall/AR@100 = 0.44, DetectionBoxes_Recall/AR@100 (large) = 0.43333334, DetectionBoxes_Recall/AR@100 (medium) = 0.475, DetectionBoxes_Recall/AR@100 (small) = -1.0, Loss/classification_loss = 5.17783, Loss/localization_loss = 2.4297307, Loss/regularization_loss = 0.23962235, Loss/total_loss = 7.8471913, global_step = 178, learning_rate = 0.004, loss = 7.8471913

And it started training again.

I0429 22:36:57.014948 140074595292928 basic_session_run_hooks.py:260] loss = 3.932395, step = 201 (362.309 sec)
INFO:tensorflow:global_step/sec: 0.297068
I0429 22:42:33.637230 140074595292928 basic_session_run_hooks.py:692] global_step/sec: 0.297068
INFO:tensorflow:loss = 3.5256753, step = 301 (336.623 sec)
I0429 22:42:33.637706 140074595292928 basic_session_run_hooks.py:260] loss = 3.5256753, step = 301 (336.623 sec)

After roughly 350 more training steps, the above process repeated in the same way.

Is it normal for the output to look like this?
I would also like to know what the output means.

Detecting hand raised

Hello there. If I am interested in detecting only a hand that has been raised for more than 2 seconds, is that possible with this algorithm? If yes, which lines should I adjust?

I did my research and found this pre-trained model:
https://download.01.org/opencv/2019/open_model_zoo/R1/20190404_140900_models_bin/person-detection-raisinghand-recognition-0001/FP16/

By my understanding, I can skip the training steps and use this model, but do you know by any chance how to utilize it? I have been struggling with this for a week now.
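
(As an aside, the 2-second condition is independent of which detector you use; a generic, hedged sketch of the idea layered on top of any per-frame hand detector:)

# Generic sketch (not part of this repo): report a hand as 'raised' only
# after per-frame detections have persisted for 2 seconds.
import time

class RaisedHandFilter:
    def __init__(self, hold_seconds=2.0):
        self.hold = hold_seconds
        self.since = None  # time the raised hand was first seen

    def update(self, hand_detected_this_frame):
        now = time.monotonic()
        if not hand_detected_this_frame:
            self.since = None
            return False
        if self.since is None:
            self.since = now
        return now - self.since >= self.hold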

Hoping for your reply soon.

pre-trained hand detection models

Hi sir,
Can you provide the pre-trained models for the hand detection task (the ones you trained on the egohands dataset), i.e. a model that I can run without retraining?

3.Run the installation script. Make sure the last step in the script

Setup

3.Run the installation script. Make sure the last step in the script, Running model_builder_test.py, finishes without error, before continuing on.

$ ./install.sh
I got this error:

python: can't open file '/home/andrei/hand-detection-tutorial-master/models/research/object_detection/builders/model_builder_test.py': [Errno 2] No such file or directory

And there really is no such file. Where can I find/get it?

About model

Thanks for your great project. Can you share your model "ssdlite_mobilenet_v2_egohands" with my google account "[email protected]"? Thank you.

Shuffle buffer filled.

Hi, @jkjung-avt
Thanks for your work! It helps me a lot.
I followed your tutorial to train ssd_mobilenet_v1 on the KITTI dataset.
When I ran ./train.sh ssd_mobilenet_v1_kitti, it did not succeed.
The log shows:
INFO:tensorflow:Saving checkpoints for 0 into ssd_mobilenet_v1_kitti/model.ckpt.
I0330 12:04:13.290106 547919339536 basic_session_run_hooks.py:606] Saving checkpoints for 0 into ssd_mobilenet_v1_kitti/model.ckpt.
2020-03-30 12:07:07.154592: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:145] Filling up shuffle buffer (this may take a while): 653 of 2048
2020-03-30 12:07:17.200949: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:145] Filling up shuffle buffer (this may take a while): 1245 of 2048
2020-03-30 12:07:27.171252: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:145] Filling up shuffle buffer (this may take a while): 1858 of 2048
2020-03-30 12:07:30.213396: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:195] Shuffle buffer filled.

./train.sh: 60: 11880 killed PYTHONPATH=`pwd`/models/research:`pwd`/models/research/slim python3 ./models/research/object_detection/model_main.py --pipeline_config_path=${PIPELINE_CONFIG_PATH} --model_dir=${MODEL_DIR} --num_train_steps=${NUM_TRAIN_STEPS} --sample_1_of_n_eval_samples=1 --alsologtostderr

I think it was caused by the shuffle buffer size, but I am not really sure. Help me!!
Please give me some hints on what to modify. Sincere thanks!

Loaded runtime CuDNN library: 7.5.1 but source was compiled with: 7.6.0.

Thanks for the tutorial. Everything worked up to train.sh. Any ideas how to fix this? TIA.

2020-07-17 20:50:48.353275: E tensorflow/stream_executor/cuda/cuda_dnn.cc:319] Loaded runtime CuDNN library: 7.5.1 but source was compiled with: 7.6.0. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.

Retraining custom object detector

Hey, can you share the particular commit or the version of the object detection API with which you trained the hand detection model? I have a custom dataset that I retrained with tensorflow 1.15 and the latest object detection API (a November 2019 commit). I was unable to build the tensorrt engine; I encountered the following error.

[TensorRT] ERROR: UffParser: Validator error: FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_4_3x3_s2_256/BatchNorm/FusedBatchNormV3: Unsupported operation _FusedBatchNormV3 [TensorRT] ERROR: Network must have at least one output

Tensorflow version for training and freezing - 1.15 (GPU Tesla K80)
Model used - http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz

  • Mobilnet SSD v2 coco

Jetpack on Jetson Nano (used for Uff conversion and building TRT engine as per your trt_ssd tutorial)
TensorRT version - 5.x
tensorflow 1.14

So I would like to retrain on my custom dataset with the same versions of Tensorflow and the object detection API that you used for this tutorial. Could you please share some details about this, or suggest a workaround for the error encountered?

How to improve created model performance

Hi jkjung !

It's me again ^^' I'm here because I know that NVIDIA has worked a lot on improving inference time; for example, SSD-inception-v2 runs at 150 FPS on my Xavier with the Jetson-inference project. With your tutorial I was able to create my own object detection model, and today it runs at 25 FPS on my Xavier after I improved it by using FP16 mode to convert the saved_model to a frozen graph. I'm still far from those 150 FPS, and since the trained model's config is based on ssd-inception-v2, I'm wondering where I can improve the training and graph conversion process to make my model faster. I've tested several of your tensorrt_demos and they also run at high FPS, so if you could help me use TensorRT for a custom object detection model, it would simply be amazing.

Sorry for the big post, have a good day !

Software to labelize your images

Hi, I was wondering which software you used to label your images, because the format of your txt files (hand 0 0 0 515 431 623 544 0 0 0 0 0 0 0 0, for example) does not look like the files I get from the LabelImg software.
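
(For reference, those label files are in the KITTI format that the README's prepare step produces; if your annotations come from LabelImg as Pascal VOC XML, a hedged sketch of the conversion, with voc_to_kitti being a hypothetical helper name:)

# Hedged sketch: convert a LabelImg Pascal VOC XML annotation into
# KITTI-style lines like 'hand 0 0 0 515 431 623 544 0 0 0 0 0 0 0 0'.
import xml.etree.ElementTree as ET

def voc_to_kitti(xml_path):
    lines = []
    for obj in ET.parse(xml_path).getroot().iter('object'):
        name = obj.findtext('name')
        box = obj.find('bndbox')
        xmin, ymin, xmax, ymax = (box.findtext(t) for t in
                                  ('xmin', 'ymin', 'xmax', 'ymax'))
        lines.append('%s 0 0 0 %s %s %s %s 0 0 0 0 0 0 0 0'
                     % (name, xmin, ymin, xmax, ymax))
    return '\n'.join(lines)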

Thanks for your repo again :)

The training consume all my memory

Hi.
Sorry to trouble you. I trained my hand model on a Jetson Nano machine using the egohands data. I set the batch size to 2, but training still consumed all of my 4GB of memory plus 4GB of swap, and after a while the training steps hung. Where can I set a limit on memory use when running train.py?

Retraining Custom MobilenetV2-fn and use on TensorRT

Hello, I trained a custom mobilenetv2_fn model with 6 classes on Tensorflow version 1.15.
I then converted it to a pb file.
But when building the TensorRT model, I get the following error.

Using output node NMS
Converting to UFF graph
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_conf as custom op: FlattenConcat_TRT
Warning: No conversion function registered for layer: Unpack yet.
Converting Preprocessor/unstack as custom op: Unpack
Warning: No conversion function registered for layer: GridAnchor_TRT yet.
Converting GridAnchor as custom op: GridAnchor_TRT
Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_loc as custom op: FlattenConcat_TRT
DEBUG [/usr/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
No. nodes: 612
UFF Output written to ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph_1.uff
[TensorRT] ERROR: UffParser: Validator error: Preprocessor/unstack: Unsupported operation _Unpack
Building TensorRT engine, this may take a few minutes...
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
File "main.py", line 371, in
buf = trt_engine.serialize()
AttributeError: 'NoneType' object has no attribute 'serialize'

Please help me figure out how to fix it.
I attach the pb file and the python file I used.
I look forward to good news.

https://drive.google.com/drive/folders/1DrCFP3T0mFSm1GNzRp8aude-Ona6SoMz?usp=sharing

How can I view the output log of the training

Hi, @jkjung-avt
Thanks for your work! It helps me a lot.
I trained the model on the KITTI dataset, but the accuracy is a little low, and I want to find out why. Could you please tell me where to find the training progress and intermediate results, such as the loss?

Console output (I think the AP is too low):
creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=26.29s).
Accumulating evaluation results...
DONE (t=3.02s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.227
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.455
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.197
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.085
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.186
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.439
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.207
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.341
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.369
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.182
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.331
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.574

tf_trt_models's 'camera_tf_trt.py'

I was trying out tf_trt_models. I don't know how to open an issue there, so I am asking my question here.
I always get an error when I run 'python3 camera_tf_trt.py':

2020-04-13 17:11:35.407792: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger (Unnamed Layer* 3) [Convolution]: at least three non-batch dimensions are required for input
2020-04-13 17:11:35.423809: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger (Unnamed Layer* 9) [Convolution]: at least three non-batch dimensions are required for input
Segmentation fault (core dumped)

I want to know how to get rid of this error.

speed on GPU

Hi, thanks for the awesome work. I tested an image with the ssd_mobilenet_v2_egohands model on a 1080 Ti GPU, but the speed is slow, about 700 ms per image on average.
Is there something wrong on my side? Could you please tell me the ideal speed, and how to improve it? Thanks in advance.
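
(One common cause of numbers like 700 ms per image is that one-off graph construction and CUDA initialization get counted in the average; a hedged sketch that separates warm-up from steady-state timing:)

# Hedged sketch: time inference after a warm-up run, so one-off graph
# and CUDA initialization costs are excluded from the average.
import time

def timed_runs(run_once, n=50):
    run_once()  # warm-up (graph build, CUDA init, memory allocation)
    t0 = time.perf_counter()
    for _ in range(n):
        run_once()
    return (time.perf_counter() - t0) / n

# usage: wrap your sess.run(...) call in a zero-argument function, e.g.
# avg_s = timed_runs(lambda: sess.run(outputs, feed_dict=feed))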

no supported kernel for CPU devices is available

Hi,
I have followed your steps and could generate TRT model for faster_rcnn_inception_V2.
But I couldn't run inference with them, and ended up with this issue.
Can you please help me with this?
2019-05-09 19:15:03.127377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-05-09 19:15:03.127559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 1.21GiB
2019-05-09 19:15:03.127616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-09 19:15:05.203343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-09 19:15:05.203448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-09 19:15:05.203479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-09 19:15:05.203649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 488 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Traceback (most recent call last):
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1305, in _run_fn
self._extend_graph()
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'my_trt_op1': Could not satisfy explicit device specification '/device:CPU:0' because no supported kernel for CPU devices is available.
Registered kernels:
device='GPU'

 [[Node: my_trt_op1 = TRTEngineOp[InT=[DT_FLOAT], OutT=[DT_FLOAT], input_nodes=["MaxPool2D/MaxPool-1-0-TransposeNCHWToNHWC-LayoutOptimizer"], output_nodes=["SecondStageBoxPredictor/AvgPool"], serialized_engine="\340\013\2...00\000\000", _device="/device:CPU:0"](MaxPool2D/MaxPool-1-0-TransposeNCHWToNHWC-LayoutOptimizer)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "MainV1.py", line 446, in
img,mindP,mindC,mindB=detection(frame,DistanceLogic)
File "MainV1.py", line 146, in detection
boxes,scores,classes,num=odapi.processFrame(cmb)
File "MainV1.py", line 110, in processFrame
feed_dict={self.image_tensor:image_np_expanded})
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'my_trt_op1': Could not satisfy explicit device specification '/device:CPU:0' because no supported kernel for CPU devices is available.
Registered kernels:
device='GPU'

 [[Node: my_trt_op1 = TRTEngineOp[InT=[DT_FLOAT], OutT=[DT_FLOAT], input_nodes=["MaxPool2D/MaxPool-1-0-TransposeNCHWToNHWC-LayoutOptimizer"], output_nodes=["SecondStageBoxPredictor/AvgPool"], serialized_engine="\340\013\2...00\000\000", _device="/device:CPU:0"](MaxPool2D/MaxPool-1-0-TransposeNCHWToNHWC-LayoutOptimizer)]]

Caused by op 'my_trt_op1', defined at:
File "MainV1.py", line 350, in
odapi = DetectorAPI(path_to_ckpt=model_path)
File "MainV1.py", line 96, in init
tf.import_graph_def(od_graph_def,name='')
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
_ProcessNewOps(graph)
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3563, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3563, in
for c_op in c_api_util.new_tf_operations(self)
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3450, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/home/nvidia/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1740, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'my_trt_op1': Could not satisfy explicit device specification '/device:CPU:0' because no supported kernel for CPU devices is available.
Registered kernels:
device='GPU'

 [[Node: my_trt_op1 = TRTEngineOp[InT=[DT_FLOAT], OutT=[DT_FLOAT], input_nodes=["MaxPool2D/MaxPool-1-0-TransposeNCHWToNHWC-LayoutOptimizer"], output_nodes=["SecondStageBoxPredictor/AvgPool"], serialized_engine="\340\013\2...00\000\000", _device="/device:CPU:0"](MaxPool2D/MaxPool-1-0-TransposeNCHWToNHWC-LayoutOptimizer)]]

eval config

Hi,
In the model config files, the eval section has some options like:

num_examples,
max_evals

1- My validation dataset has 90k images. In your opinion, is it better to set num_examples = 90000 or a smaller number?
2- When I set it to 90k, the evaluation pass becomes very time-consuming.
And if I set it to 10k, when evaluation runs on 10k images after each epoch, how are those 10k images chosen?

  • the same ones every time?
  • randomly chosen?
  • or, if 10k images are used per evaluation, is one full pass over the validation set finished after 9 epochs?

3- What does max_evals mean?
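
(For context, these options live in the eval_config block of the pipeline config; an illustrative excerpt with example values, not this repo's exact settings:)

# Illustrative eval_config excerpt (example values only).
eval_config {
  num_examples: 500  # how many validation images one evaluation pass reads
  max_evals: 10      # stop after this many evaluation rounds
}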

Training with background images

Hi @jkjung-avt ,

I have trained a mobilenet-ssd-v2 model in Tensorflow (earlier than version 2.0) for detecting a "mobile-phone in hand". I trained with images showing mobile phones held in a hand, with the threshold set to 7. So if I hold a mobile phone in my hand, it gets detected. That scenario works, but the issue is that when I hold some other object (like a cup or a pen) in my hand, it also gets detected; that is, I get wrong detections. How can I reduce these wrong detections? Would training with additional images, such as a hand holding a cup, a book, etc., without specifying a label in the xml, help avoid these wrong detections?

I have gone through some github issues about this, but I still don't have clarity on it, which is why I've come to you. I hope you can help me understand this. Kindly help me.

Issue on training step

Sir, excellent work. I have the following error when running the training step:

2019-04-07 00:03:56.985039: E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7401 (compatibility version 7400) but source was compiled with 7005 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2019-04-07 00:03:56.986207: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
./train.sh: line 60: 17698 Aborted (core dumped) PYTHONPATH=`pwd`/models/research:`pwd`/models/research/slim python3 ./models/research/object_detection/model_main.py --pipeline_config_path=${PIPELINE_CONFIG_PATH} --model_dir=${MODEL_DIR} --num_train_steps=${NUM_TRAIN_STEPS} --sample_1_of_n_eval_samples=1 --alsologtostderr

Any help would be appreciated. By the way, awesome work!

Segmentation fault (core dumped)

Hi, I am Asad. You gave me instructions on how to run the faster_rcnn_inception_v2_boat pre-trained model for boat detection. After making the changes per the instructions, we are facing this error.


train error

Thank you for your last reply. Your suggestions solved my problem. I'm very grateful to you!

I am using MobileNet-YOLOv3 to train on my own dataset (a caffe model). Training is interrupted when the network is being tested.

I1021 10:17:16.232820 26936 solver.cpp:563] Iteration 1000, Testing net (#0)
I1021 10:17:16.232851 26936 net.cpp:679] Ignoring source layer label_data_1_split
I1021 10:17:16.239033 26936 net.cpp:679] Ignoring source layer Yolov3Loss1
I1021 10:17:16.239046 26936 net.cpp:679] Ignoring source layer Yolov3Loss2
I1021 10:17:16.239059 26936 net.cpp:679] Ignoring source layer Yolov3Loss3
*** Aborted at 1571624236 (unix time) try "date -d @1571624236" if you are using GNU date ***
PC: @ 0x7fd3e047ea45 caffe::Blob<>::cpu_data()
*** SIGSEGV (@0x3a) received by PID 26936 (TID 0x7fd3e0db4b00) from PID 58; stack trace: ***
@ 0x7fd3deb804b0 (unknown)
@ 0x7fd3e047ea45 caffe::Blob<>::cpu_data()
@ 0x7fd3e04ae4db caffe::Solver<>::TestDetectionSeg()
@ 0x7fd3e04af5e9 caffe::Solver<>::TestAll()
@ 0x7fd3e04b50e0 caffe::Solver<>::Step()
@ 0x7fd3e04b531f caffe::Solver<>::Solve()
@ 0x40bf0c train()
@ 0x4087ce main
@ 0x7fd3deb6b830 __libc_start_main
@ 0x409029 _start
@ 0x0 (unknown)

I searched online, but I didn't find the answer.

Training on GTX 1080 Ti

Hi. Thank you for the awesome work.
I have invoked train.sh and it seems to run normally, but when I checked the GPU memory usage with nvidia-smi, only 141MB of memory was being used. I increased the batch_size in the ssd_mobilenet_v2_egohands.config file from 24 to 128, but the model still uses 141MB of memory during training.
What could I do to train the model faster?

Pretrained models for CPU

Hi jkjung,

Thanks for your work. I didn't see pretrained models on the repo. Would you be able to upload them? I want to just deploy them on CPU for some quick prototyping on videos.

Thanks again!

Trying to access flag --model_dir before flags were parsed error ./train.sh ssd_mobilenet_v1_egohands

Hi, thanks for your tutorial. I have read the post and followed the steps. At step 4 of the Training section, I ran the command ./train.sh ssd_mobilenet_v1_egohands and got the error in the title, shown in detail below:

File "./models/research/object_detection/model_main.py", line 101, in
tf.app.run()
File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "./models/research/object_detection/model_main.py", line 55, in main
config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir)
File "/home/anaconda3/lib/python3.6/site-packages/absl/flags/_flagvalues.py", line 491, in getattr
raise _exceptions.UnparsedFlagAccessError(error_message)
absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --model_dir before flags were parsed.

How can I solve the problem?

python prepare_egohands.py

I entered that command. Then, this error occurred.

INFO:root:Downloading egohands_data.zip...
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): vision.soic.indiana.edu:80
Traceback (most recent call last):
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/hyebin/anaconda3/lib/python3.6/http/client.py", line 1254, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/hyebin/anaconda3/lib/python3.6/http/client.py", line 1300, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/hyebin/anaconda3/lib/python3.6/http/client.py", line 1249, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/hyebin/anaconda3/lib/python3.6/http/client.py", line 1036, in _send_output
self.send(msg)
File "/home/hyebin/anaconda3/lib/python3.6/http/client.py", line 974, in send
self.connect()
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 181, in connect
conn = self._new_conn()
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f320d738358>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='vision.soic.indiana.edu', port=80): Max retries exceeded with url: /egohands_files/egohands_data.zip (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f320d738358>: Failed to establish a new connection: [Errno 111] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "prepare_egohands.py", line 231, in
main()
File "prepare_egohands.py", line 217, in main
download_file(EGOHANDS_DATASET_URL, egohands_zip_path)
File "prepare_egohands.py", line 69, in download_file
r = requests.get(url, stream=True)
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/home/hyebin/anaconda3/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='vision.soic.indiana.edu', port=80): Max retries exceeded with url: /egohands_files/egohands_data.zip (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f320d738358>: Failed to establish a new connection: [Errno 111] Connection refused',))

I don't know why these errors occur.
