nvidia-ai-iot / tf_trt_models

TensorFlow models accelerated with NVIDIA TensorRT

License: BSD 3-Clause "New" or "Revised" License

Shell 5.80% Python 94.20%

tensorflow tensorrt nvidia jetson tx1 tx2 models neural-network object-detection image-classification train inference realtime optimize

tf_trt_models's Introduction

TensorFlow/TensorRT Models on Jetson

This repository contains scripts and documentation to use TensorFlow image classification and object detection models on NVIDIA Jetson. The models are sourced from the TensorFlow models repository and optimized using TensorRT.

Setup
Image Classification
Object Detection

Setup

Flash your Jetson TX2 with JetPack 3.2 (including TensorRT).

Install miscellaneous dependencies on Jetson

```
sudo apt-get install python-pip python-matplotlib python-pil
```

Install TensorFlow 1.7+ (with TensorRT support). Download the pre-built pip wheel and install using pip.

```
pip install tensorflow-1.8.0-cp27-cp27mu-linux_aarch64.whl --user
```

Or, if you're using Python 3:

```
pip3 install tensorflow-1.8.0-cp35-cp35m-linux_aarch64.whl --user
```
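The `cp27-cp27mu` and `cp35-cp35m` parts of the wheel filenames are CPython interpreter and ABI tags, and pip only installs a wheel whose tags match the running interpreter. A small sketch for picking the matching wheel (the helper names are hypothetical, not part of this repository, and the suffix helper assumes Linux):

```python
import platform
import sys


def cpython_wheel_tag(version=None, ucs4=None):
    """Return the '<interpreter>-<abi>' tag pair used in CPython wheel filenames."""
    major, minor = version if version is not None else sys.version_info[:2]
    tag = "cp{}{}".format(major, minor)
    if major == 2:
        if ucs4 is None:
            # Wide (UCS-4) Python 2 builds get the extra 'u' ABI flag.
            ucs4 = sys.maxunicode > 0xFFFF
        abi = tag + ("mu" if ucs4 else "m")
    elif (major, minor) < (3, 8):
        abi = tag + "m"  # pymalloc flag; dropped from ABI tags starting with 3.8
    else:
        abi = tag
    return "{}-{}".format(tag, abi)


def expected_wheel_suffix():
    """Filename suffix a wheel for this interpreter should end with (Linux only)."""
    return "{}-linux_{}.whl".format(cpython_wheel_tag(), platform.machine())


if __name__ == "__main__":
    print(expected_wheel_suffix())
```

On a Jetson running Python 3.5 this should print `cp35-cp35m-linux_aarch64.whl`, matching the second wheel above.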

Clone this repository

```
git clone --recursive https://github.com/NVIDIA-Jetson/tf_trt_models.git
cd tf_trt_models
```

Run the installation script
```
./install.sh
```
or, if you want to specify the Python interpreter:
```
./install.sh python3
```

Image Classification

Models

| Model | Input Size | TF-TRT TX2 | TF TX2 |
|---|---|---|---|
| inception_v1 | 224x224 | 7.36ms | 22.9ms |
| inception_v2 | 224x224 | 9.08ms | 31.8ms |
| inception_v3 | 299x299 | 20.7ms | 74.3ms |
| inception_v4 | 299x299 | 38.5ms | 129ms |
| inception_resnet_v2 | 299x299 | | 158ms |
| resnet_v1_50 | 224x224 | 12.5ms | 55.1ms |
| resnet_v1_101 | 224x224 | 20.6ms | 91.0ms |
| resnet_v1_152 | 224x224 | 28.9ms | 124ms |
| resnet_v2_50 | 299x299 | 26.5ms | 73.4ms |
| resnet_v2_101 | 299x299 | 46.9ms | |
| resnet_v2_152 | 299x299 | 69.0ms | |
| mobilenet_v1_0p25_128 | 128x128 | 3.72ms | 7.99ms |
| mobilenet_v1_0p5_160 | 160x160 | 4.47ms | 8.69ms |
| mobilenet_v1_1p0_224 | 224x224 | 11.1ms | 17.3ms |

TF - Original TensorFlow graph (FP32)

TF-TRT - TensorRT optimized graph (FP16)

The above benchmark timings were gathered after placing the Jetson TX2 in MAX-N mode. To do this, run the following commands in a terminal:

```
sudo nvpmodel -m 0
sudo ~/jetson_clocks.sh
```
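To compare the two columns, the ratio TF / TF-TRT gives the speedup of the TensorRT-optimized graph. The sketch below copies only the rows that have both timings and, assuming each timing is the latency of a single batch-size-1 inference, also converts the TF-TRT latency into images per second:

```python
# (model, TF-TRT TX2 ms, TF TX2 ms) copied from the table above;
# rows with a missing value are omitted.
TX2_CLASSIFICATION_MS = [
    ("inception_v1", 7.36, 22.9),
    ("inception_v2", 9.08, 31.8),
    ("inception_v3", 20.7, 74.3),
    ("inception_v4", 38.5, 129.0),
    ("resnet_v1_50", 12.5, 55.1),
    ("resnet_v1_101", 20.6, 91.0),
    ("resnet_v1_152", 28.9, 124.0),
    ("resnet_v2_50", 26.5, 73.4),
    ("mobilenet_v1_0p25_128", 3.72, 7.99),
    ("mobilenet_v1_0p5_160", 4.47, 8.69),
    ("mobilenet_v1_1p0_224", 11.1, 17.3),
]


def speedup_report(rows):
    """Return (model, TF/TF-TRT speedup, TF-TRT images per second) tuples."""
    report = []
    for name, trt_ms, tf_ms in rows:
        speedup = round(tf_ms / trt_ms, 2)
        images_per_s = round(1000.0 / trt_ms, 1)  # assumes batch size 1
        report.append((name, speedup, images_per_s))
    return report


for name, speedup, fps in speedup_report(TX2_CLASSIFICATION_MS):
    print("{:<24} {:>5.2f}x  {:>6.1f} images/s".format(name, speedup, fps))
```

With these numbers the speedup ranges from about 1.6x (mobilenet_v1_1p0_224) to about 4.4x (resnet_v1_50 and resnet_v1_101).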

Download pretrained model

As a convenience, we provide a script to download pretrained models sourced from the TensorFlow models repository.

```
from tf_trt_models.classification import download_classification_checkpoint

checkpoint_path = download_classification_checkpoint('inception_v2')
```

To manually download the pretrained models, follow the links here.

Build TensorRT / Jetson compatible graph

```
from tf_trt_models.classification import build_classification_graph

frozen_graph, input_names, output_names = build_classification_graph(
    model='inception_v2',
    checkpoint=checkpoint_path,
    num_classes=1001
)
```
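With `num_classes=1001` the classifier produces 1001 scores. TF-slim ImageNet checkpoints of this kind conventionally reserve index 0 for a "background" class, so output index i corresponds to the (i - 1)-th of the 1000 ImageNet labels. A sketch of a top-k lookup that handles this offset, assuming `probabilities` is a flat sequence of scores for one image (for example `output[0]` from `tf_sess.run`) and `labels` is a list of the 1000 class names; the function is hypothetical, not part of tf_trt_models:

```python
def top_k_labels(probabilities, labels, k=5, label_offset=1):
    """Return [(probability, label), ...] for the k highest-scoring classes.

    label_offset=1 assumes output index 0 is a background class, so output
    index i maps to labels[i - 1]. Pass label_offset=0 for checkpoints whose
    outputs map one-to-one onto the label list.
    """
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    results = []
    for i in ranked:
        if len(results) >= k:
            break
        j = i - label_offset
        if 0 <= j < len(labels):  # skips the background slot
            results.append((probabilities[i], labels[j]))
    return results


if __name__ == "__main__":
    for prob, name in top_k_labels([0.05, 0.6, 0.3, 0.05], ["cat", "dog", "fish"], k=2):
        print("(%f) %s" % (prob, name))
```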

Optimize with TensorRT

```
import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)
```
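`minimum_segment_size=50` means that only segments of at least 50 TensorRT-compatible nodes are replaced by a TensorRT engine; smaller segments stay in TensorFlow. To see how much of the graph was actually converted, one can count node types before and after conversion. A minimal sketch that relies only on the GraphDef's `node` list and each node's `op` field; the op type name `TRTEngineOp` is what TF 1.x contrib TensorRT is assumed to use for converted segments, so verify it against your TensorFlow version:

```python
from collections import Counter


def summarize_graph(graph_def, trt_op="TRTEngineOp"):
    """Count the nodes of a GraphDef-like object and how many are TensorRT engines."""
    counts = Counter(node.op for node in graph_def.node)
    return {
        "total_nodes": sum(counts.values()),
        "trt_engine_nodes": counts[trt_op],
        "top_ops": counts.most_common(5),  # most frequent op types
    }

# Usage (assumes frozen_graph and trt_graph from the steps above):
#   print(summarize_graph(frozen_graph))
#   print(summarize_graph(trt_graph))
```

If `trt_engine_nodes` is 0 for `trt_graph`, no segment was large enough (or compatible enough) to be converted.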

Jupyter Notebook Sample

For a comprehensive example of performing the above steps and executing on a real image, see the Jupyter notebook sample.

Train for custom task

Follow the documentation from the TensorFlow models repository. Once you have obtained a checkpoint, proceed with building the graph and optimizing with TensorRT as shown above.

Object Detection

Models

| Model | Input Size | TF-TRT TX2 | TF TX2 |
|---|---|---|---|
| ssd_mobilenet_v1_coco | 300x300 | 50.5ms | 72.9ms |
| ssd_inception_v2_coco | 300x300 | 54.4ms | 132ms |

TF - Original TensorFlow graph (FP32)

TF-TRT - TensorRT optimized graph (FP16)

The above benchmark timings were gathered after placing the Jetson TX2 in MAX-N mode. To do this, run the following commands in a terminal:

```
sudo nvpmodel -m 0
sudo ~/jetson_clocks.sh
```
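The README does not say how many runs were averaged for these timings, so the following is only one common way to measure per-inference latency yourself: discard a few warm-up runs (the first session runs can include one-time setup cost) and report the median of the rest. The `benchmark` helper is hypothetical; the usage comment borrows the `tf_sess`, `tf_input`, and output tensor names from the detection snippets quoted in the issues, which are assumptions about your own script:

```python
import time

# perf_counter only exists on Python 3; fall back to time.time on Python 2.
_clock = getattr(time, "perf_counter", time.time)


def benchmark(fn, warmup=10, runs=50):
    """Call fn() `warmup` times untimed, then return (median_ms, min_ms) over `runs` calls."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = _clock()
        fn()
        samples.append((_clock() - start) * 1000.0)
    samples.sort()
    mid = len(samples) // 2
    if len(samples) % 2:
        median = samples[mid]
    else:
        median = 0.5 * (samples[mid - 1] + samples[mid])
    return median, samples[0]

# Usage (hypothetical session and tensor names):
#   median_ms, best_ms = benchmark(lambda: tf_sess.run(
#       [tf_scores, tf_boxes, tf_classes, tf_num_detections],
#       feed_dict={tf_input: image_resized[None, ...]}))
```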

Download pretrained model

As a convenience, we provide a script to download pretrained model weights and config files sourced from the TensorFlow models repository.

```
from tf_trt_models.detection import download_detection_model

config_path, checkpoint_path = download_detection_model('ssd_inception_v2_coco')
```

To manually download the pretrained models, follow the links here.

Important: Some of the object detection configuration files have a very low non-maximum suppression score threshold (i.e. 1e-8). This can cause an unnecessarily large CPU post-processing load. Depending on your application, it may be advisable to raise this value (for example, to 0.3) for better performance; we did this for the benchmark timings above. The value can be changed by editing the configuration file directly before calling build_detection_graph. The parameter can be found, for example, in this line.
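Editing the file can also be scripted. A minimal sketch, assuming the threshold appears as `score_threshold: <number>` lines in the text-format pipeline config (as in the `batch_non_max_suppression` block of the SSD configs). It only ever raises values, it leaves differently named fields such as `first_stage_nms_score_threshold` untouched, and the helper names are hypothetical:

```python
import re

# Matches e.g. "score_threshold: 1e-8". The leading \b stops it from matching
# inside longer field names such as "first_stage_nms_score_threshold".
_SCORE_RE = re.compile(r"\b(score_threshold\s*:\s*)([-+0-9.eE]+)")


def raise_nms_score_threshold(config_text, new_threshold=0.3):
    """Return config_text with every score_threshold below new_threshold raised to it."""
    def _sub(match):
        value = max(float(match.group(2)), new_threshold)
        return "{}{}".format(match.group(1), value)
    return _SCORE_RE.sub(_sub, config_text)


def patch_config_file(config_path, new_threshold=0.3):
    """Rewrite the pipeline config on disk in place."""
    with open(config_path) as f:
        text = f.read()
    with open(config_path, "w") as f:
        f.write(raise_nms_score_threshold(text, new_threshold))
```

Calling `patch_config_file(config_path)` after `download_detection_model` and before `build_detection_graph` applies the change.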

Build TensorRT / Jetson compatible graph

```
from tf_trt_models.detection import build_detection_graph

frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,
    checkpoint=checkpoint_path
)
```

Optimize with TensorRT

```
import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)
```

Jupyter Notebook Sample

For a comprehensive example of performing the above steps and executing on a real image, see the Jupyter notebook sample.
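The TensorFlow object detection API exports boxes normalized to [0, 1] in `[ymin, xmin, ymax, xmax]` order, alongside scores, classes, and `num_detections`. A sketch of turning one image's outputs into pixel-space detections, assuming those conventions and that the batch dimension has already been indexed away (e.g. `scores[0]`); the helper name is hypothetical, and `min_score=0.3` simply mirrors the threshold suggested above:

```python
def filter_detections(scores, boxes, classes, num_detections,
                      image_width, image_height, min_score=0.3):
    """Return [(class_id, score, (x0, y0, x1, y1)), ...] for one image.

    Assumes boxes are normalized [ymin, xmin, ymax, xmax] as produced by the
    TensorFlow object detection API; only the first num_detections entries
    are considered.
    """
    results = []
    for i in range(int(num_detections)):
        if scores[i] < min_score:
            continue
        ymin, xmin, ymax, xmax = boxes[i]
        # Scale to pixels; round() avoids truncating values like 383.9999.
        pixel_box = (int(round(xmin * image_width)), int(round(ymin * image_height)),
                     int(round(xmax * image_width)), int(round(ymax * image_height)))
        results.append((int(classes[i]), float(scores[i]), pixel_box))
    return results

# Usage (hypothetical names; image is an HxWxC array):
#   dets = filter_detections(scores[0], boxes[0], classes[0], num_detections[0],
#                            image.shape[1], image.shape[0])
```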

Train for custom task

Follow the documentation from the TensorFlow models repository. Once you have obtained a checkpoint, proceed with building the graph and optimizing with TensorRT as shown above. Please note that not all models have been tested, so during training you should use an object detection config file that resembles the ssd_mobilenet_v1_coco or ssd_inception_v2_coco config. Some config parameters, such as the number of classes, the image size, and the non-max suppression parameters, may be modified, but performance may vary.

tf_trt_models's Issues

Import error: cannot import 'pipeline_pb2'

Hi,
I am running on Jetson TX2.
I have followed the setup instructions and after finishing I have copied your jupyter notebook example code to a python script.
While trying to execute it, the following error occurs:

```
Traceback (most recent call last):
  File "run.py", line 14, in <module>
    from tf_trt_models.detection import download_detection_model, build_detection_graph
  File "/home/nvidia/.local/lib/python3.5/site-packages/tf_trt_models-0.0-py3.5.egg/tf_trt_models/detection.py", line 1, in <module>
ImportError: cannot import name 'pipeline_pb2'
```

Can anyone help me with this?
Thank you

Slower performance when writing to file?

I'm using the jupyter example to create these trt-optimized graphs for use in my projects. I'm taking the TensorRT converted graph, writing it to a file, and then loading that pb file in and performing inference. However, I've noticed the runtimes I get when doing this are about 3 times greater than the runtimes reported by the notebook. Either the notebook is reporting incorrect times or somehow reconstructing the graph from the file creates a different graph than the original that is somehow slower. Has anyone been able to reproduce this issue?

Optimized model size is too big

Guys,

I have tried to optimize my custom frozen model to run with TensorRT using create_inference_graph(); however, the output is larger than the original model (my model is around 200MB, but after conversion it's more than 2GB). Is it normal for the converted model to be bigger than the original one? Below are my settings:

```
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,  # list of output node names
    max_batch_size=64,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=10
)
```

Also, because the model was so large, I couldn't serialize it to a .pb file; I got this error:
[libprotobuf ERROR external/protobuf_archive/src/google/protobuf/message_lite.cc:289] Exceeded maximum protobuf size of 2GB: 2756916500

Has anyone been able to solve these issues?

Import Error: numpy.core.multiarray failed to import when running camera_tf_trt.py in real time and in my own dataset

Hi,
First of all, thank you for the tutorial. When I run camera_tf_trt.py on the huskies image, detection works properly, but I can't get it to work in real time. My training dataset is different (not hands), but I followed the same steps and again got the same error. Could you please help me?

                      --build

```
INFO:main:called with args: Namespace(conf_th=0.3, do_build=True, do_tensorboard=False, filename='data/image9.jpg', image_height=720, image_width=1280, labelmap_file='data/object-detection_1.pbtxt', model='ssd_mobilenet_v1_egohands', num_classes=2, rtsp_latency=200, rtsp_uri=None, use_file=False, use_image=True, use_rtsp=False, use_usb=False, video_dev=1)
INFO:main:reading label map
INFO:main:building TRT graph and saving to pb: ./data/ssd_mobilenet_v1_egohands_trt.pb
ImportError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "camera_tf_trt.py", line 244, in <module>
    main()
  File "camera_tf_trt.py", line 203, in main
    build_trt_pb(args.model, pb_path)
  File "/home/nvidia/project/tf_trt_models/utils/od_utils.py", line 43, in build_trt_pb
    from tf_trt_models.detection import download_detection_model
  File "/home/nvidia/project/tf_trt_models/tf_trt_models/detection.py", line 9, in <module>
    from object_detection import exporter
  File "/home/nvidia/.local/lib/python3.5/site-packages/object_detection-0.1-py3.5.egg/object_detection/exporter.py", line 28, in <module>
    from object_detection.builders import model_builder
  File "/home/nvidia/.local/lib/python3.5/site-packages/object_detection-0.1-py3.5.egg/object_detection/builders/model_builder.py", line 29, in <module>
    from object_detection.meta_architectures import ssd_meta_arch
  File "/home/nvidia/.local/lib/python3.5/site-packages/object_detection-0.1-py3.5.egg/object_detection/meta_architectures/ssd_meta_arch.py", line 32, in <module>
    from object_detection.utils import visualization_utils
  File "/home/nvidia/.local/lib/python3.5/site-packages/object_detection-0.1-py3.5.egg/object_detection/utils/visualization_utils.py", line 26, in <module>
    import matplotlib.pyplot as plt # pylint: disable=g-import-not-at-top
  File "/home/nvidia/.local/lib/python3.5/site-packages/matplotlib-3.0.3-py3.5-linux-aarch64.egg/matplotlib/pyplot.py", line 32, in <module>
    import matplotlib.colorbar
  File "/home/nvidia/.local/lib/python3.5/site-packages/matplotlib-3.0.3-py3.5-linux-aarch64.egg/matplotlib/colorbar.py", line 28, in <module>
    import matplotlib.artist as martist
  File "/home/nvidia/.local/lib/python3.5/site-packages/matplotlib-3.0.3-py3.5-linux-aarch64.egg/matplotlib/artist.py", line 11, in <module>
    from .path import Path
  File "/home/nvidia/.local/lib/python3.5/site-packages/matplotlib-3.0.3-py3.5-linux-aarch64.egg/matplotlib/path.py", line 17, in <module>
    from . import _path, rcParams
ImportError: numpy.core.multiarray failed to import
```

I tried updating numpy, but it still shows the same error. Could anyone please help me?
Thanks in advance.

Core Dump where create_inference_graph

I followed the code example to convert an ssd_mobilenet_v2 model to a TRT model on a Jetson Nano, as below:

from tf_trt_models.detection import build_detection_graph
import tensorflow.contrib.tensorrt as trt
import tensorflow as tf

config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))

frozen_graph, input_names, output_names = build_detection_graph(
    config='/mypath/ssd_mobilenet_v2_coco.config',
    checkpoint='/mypath/model.ckpt-33825'
)

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)

The model was trained on an amd64 platform.
While executing create_inference_graph, the following core dump is generated:

2019-09-27 14:58:20.320137: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger (Unnamed Layer* 3) [Convolution]: at least three non-batch dimensions are required for input
2019-09-27 14:58:20.320454: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger (Unnamed Layer* 9) [Convolution]: at least three non-batch dimensions are required for input
Segmentation fault (core dumped)

I'm not sure whether the problem is related to the following warning:

WARNING:tensorflow:TensorRT mismatch. Compiled against version 5.1.6, but loaded 5.0.6. Things may not work
WARNING:tensorflow:TensorRT mismatch. Compiled against version 5.1.6, but loaded 5.0.6. Things may not work

Could anyone give me a hand with this? Thank you.

Not able to export FasterRCNN model

Facing the following error

2018-12-13 09:49:37.182205: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: Network.cpp::addInput::281, condition: isIndexedCHW(dims) && volume(dims) < MAX_TENSOR_SIZE
2018-12-13 09:49:37.182317: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:857] Engine creation for segment 0, composed of 3 nodes failed: Invalid argument: Failed to create Input layer tensor InputPH_0 rank=-2. Skipping...
2018-12-13 09:49:37.182353: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:724] Can't determine the device, constructing an allocator at device 0
2018-12-13 09:49:37.184917: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: Network.cpp::addInput::281, condition: isIndexedCHW(dims) && volume(dims) < MAX_TENSOR_SIZE
2018-12-13 09:49:37.185008: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:857] Engine creation for segment 1, composed of 3 nodes failed: Invalid argument: Failed to create Input layer tensor InputPH_0 rank=-2. Skipping...
2018-12-13 09:49:37.185031: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:724] Can't determine the device, constructing an allocator at device 0
*** Error in `python3': munmap_chunk(): invalid pointer: 0x00007ffe6b0e1b10 ***

Any pointers would help.

Provide Category Index

Hi Nvidia team! Your work is amazing!

In the detection notebook, is it possible to provide the CATEGORY_INDEX, i.e. the mapping from class id to category name? It seems that TensorRT somehow changes these ids.

Thanks!

Subgraph conversion errors, and FPS drops when detecting the same image in a loop

I ran into two problems.
First, subgraph conversion errors like this:
subgraph conversion error for subgraph_index:0 due to: "Unimplemented: Operation: GatherV2 does not support tensor input as indices, at: Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/FilterGreaterThan_37/Gather/GatherV2" SKIPPING......( 91 nodes)
2019-02-28 09:14:17.579506: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-02-28 09:14:17.579643: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:1 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 180 nodes)
2019-02-28 09:14:17.583141: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-02-28 09:14:17.583251: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:2 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 91 nodes)
2019-02-28 09:14:17.855174: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3273] Max batch size= 1 max workspace size= 15239862
2019-02-28 09:14:17.855271: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3277] Using FP16 precision mode
2019-02-28 09:14:17.855298: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3279] starting build engine
2019-02-28 09:15:27.723945: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3284] Built network
2019-02-28 09:15:27.983354: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3289] Serialized engine
2019-02-28 09:15:27.989671: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3297] finished engine my_trt_op3 containing 399 nodes
2019-02-28 09:15:27.989829: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3304] Finished op preparation
2019-02-28 09:15:28.006885: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3313] OK finished op building for my_trt_op3 on device
2019-02-28 09:15:28.019061: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-02-28 09:15:28.019268: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:4 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 812 nodes)
2019-02-28 09:15:28.022909: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-02-28 09:15:28.023033: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:5 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 91 nodes)
2019-02-28 09:15:28.026248: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::364, condition: isValidDims(dims)
2019-02-28 09:15:28.026375: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:6 due to: "Invalid argument: Failed to create Input layer" SKIPPING......( 93 nodes)

Second, the FPS keeps dropping as the loop goes on when I run detection on the same image repeatedly:
INFO:main:starting to loop and detect.
fps: 20.40725928088357
fps: 18.45935037996668
fps: 16.751810248984217
fps: 15.24339697527394
fps: 13.87311701215418
fps: 12.647912411364446
fps: 11.554539115658597
fps: 10.55662194967356
fps: 9.698609523401107
fps: 8.908598441178839
fps: 8.166230613095797
fps: 7.5197477917014535
fps: 6.959576719278569
fps: 6.410740168645285
fps: 5.916197277081991
fps: 5.556488737254684
fps: 5.168263848609842
fps: 4.834741166442536
fps: 4.53864476499229
fps: 4.264259366189802
fps: 4.010705880051854
fps: 3.724590941129009
fps: 3.5302278777241947
fps: 3.380465156750385
fps: 3.208263288248047
fps: 3.0406342368439874
fps: 2.928060590611087
fps: 2.873218830414652
fps: 2.7783801820319765
fps: 2.651907419900453
fps: 2.480411458345657
fps: 2.416111572401144
fps: 2.3129406820534624
fps: 2.255962378473225
fps: 2.177369284807169
fps: 2.138038598816302
fps: 2.1001961426642466
fps: 2.051187453237581
fps: 2.020784055840426
fps: 2.0304545929767412
fps: 2.0150360488983363
fps: 1.9953965241200242
fps: 1.963710877321159
fps: 1.948278107208975
fps: 1.9227513807195553
fps: 1.8743993135525407
fps: 1.9034719956981463
fps: 1.9100180144137575
fps: 1.8847275863332658
fps: 1.910353920728798
fps: 1.9060315274170423
fps: 1.850527859182863
fps: 1.8262420378562454
fps: 1.8340284447204942
fps: 1.7066448639223182
fps: 1.576937409745135

Can anyone help me? Thank you!

TensorRT is not enabled! HELP ME!!!

When I use TensorRT to optimize my TensorFlow graph:

```
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)
```

I get errors like this:

```
INFO:tensorflow:Running against TensorRT version 0.0.0
Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/tensorrt/python/trt_convert.py", line 153, in create_inference_graph
    int(msg[0]))
tensorflow.python.framework.errors_impl.FailedPreconditionError: TensorRT is not enabled!
```

I have tried many suggested solutions, but none of them worked. Can you help me? I really need to solve this problem as soon as possible!

Can tf_trt work with Faster R-CNN Inception v2 for object detection on an NVIDIA GeForce GPU?

Hi,
I am working on an object detection model to detect car plates.
I used Faster R-CNN Inception v2 (COCO) for training and got impressive accuracy, but the speed isn't good enough for the real-time system I am trying to implement.
So does tf_trt work with this model, and if it does, can it work with an NVIDIA GeForce graphics card?

Please help me.

Can this run on JetPack 4.2.2?

When I run this tutorial with Python 3.6 and JetPack 4.2.2, it fails:

InternalError: Dst tensor is not initialized.
[[node save/RestoreV2 (defined at /usr/local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/exporter.py:323) ]]

However, I can run it on a TX2 with JetPack 3.2 and Python 3.5.

Installation error: command 'aarch64-linux-gnu-gcc' failed with exit status 1

On the TX2 running L4T 28.2, during the installation step:

pip3 install tensorflow-1.11.0-cp35-cp35m-linux_aarch64.whl --user

The following error was encountered:

In file included from /tmp/pip-install-s479x1s5/h5py/h5py/defs.c:654:0:
/tmp/pip-install-s479x1s5/h5py/h5py/api_compat.h:27:18: fatal error: hdf5.h: No such file or directory
compilation terminated.
error: command 'aarch64-linux-gnu-gcc' failed with exit status 1

All of the libraries mentioned here were installed and the error persisted. The full output for that step is attached to this post.

logTFerror.txt

I ran exactly the same code, but I get a segfault

I downloaded your code and used ssd_mobilenet_v1_coco for detection only, but I get a segfault after running:

scores, boxes, classes, num_detections = tf_sess.run([tf_scores, tf_boxes, tf_classes, tf_num_detections], feed_dict={
tf_input: image_resized[None, ...]
})

The error only happens when I use the TRT graph created by trt.create_inference_graph(...). If I skip creating trt_graph and just use the frozen_graph from build_detection_graph(), everything runs! I assume something is wrong with trt.create_inference_graph(). I didn't change any arguments when calling this function.

Can anyone help me? Have you had the same problem before?

tensorflow.python.framework.errors_impl.InvalidArgumentError

When I try faster_rcnn_resnet_101, I get this error while freezing the network in model_test.py, which performs all of the steps:

File "model_test.py", line 37, in
minimum_segment_size=50
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tensorrt/python/trt_convert.py", line 115, in create_inference_graph
int(msg[0]))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid graph: Frame ids for node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 does not match frame ids for it's fanout.

Thanks in advance.

Low inference speed

I have tried to recreate the benchmark results with the examples from the repository. The inference speed on my Jetson TX2 is much slower compared to the results in the table on the front page.

This is the log for classification.ipynb:

2018-07-01 22:18:34.878861: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-07-01 22:18:34.879005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.46GiB
2018-07-01 22:18:34.879066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-01 22:18:35.940353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-01 22:18:35.940441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-07-01 22:18:35.940466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-07-01 22:18:35.940661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4002 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Converted 230 variables to const ops.
2018-07-01 22:18:49.301345: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2018-07-01 22:18:50.402393: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2660] Max batch size= 1 max workspace size= 33554432
2018-07-01 22:18:50.402478: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2664] Using FP16 precision mode
2018-07-01 22:18:50.402500: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2666] starting build engine
2018-07-01 22:19:11.072290: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2671] Built network
2018-07-01 22:19:11.308241: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2676] Serialized engine
2018-07-01 22:19:11.318361: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2684] finished engine InceptionV1/my_trt_op0 containing 493 nodes
2018-07-01 22:19:11.318499: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2704] Finished op preparation
2018-07-01 22:19:11.339604: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2712] OK finished op building
2018-07-01 22:19:11.392810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-01 22:19:11.392929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-01 22:19:11.392958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-07-01 22:19:11.392980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-07-01 22:19:11.393077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4002 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
(0.374037) golden retriever

(0.048114) miniature poodle

(0.042460) toy poodle

(0.036036) cocker spaniel, English cocker spaniel, cocker

(0.017122) standard poodle

Inference finished in 2712 ms

My only modification to the example code is a time measurement around:

output = tf_sess.run(tf_output, feed_dict={
    tf_input: image[None, ...]
})

I ran my tests after a reboot with

sudo nvpmodel -m 0
sudo ~/jetson_clocks.sh

Without those commands the inference time is ~200 ms higher.

What am I missing here?

TensorRT Mismatch

Linux distro and version: Ubuntu 18.04
GPU type: NVIDIA GeForce GTX 1080 Ti Founder's Edition
nvidia driver version: 418.56
CUDA version: 10.0
CUDNN version: 7.4.1
Python version [if using python]: 3.6.7
Tensorflow version:1.13.1
TensorRT version: 5.1.2

Following the instructions in the TensorRT manual, I used the frozen model with TensorFlow-TensorRT (TF-TRT):

import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(input_graph_def=frozen_graph, outputs=[out.op.name for out in model.outputs], max_batch_size=1,max_workspace_size_bytes=2 << 20, precision_mode="fp16")
tf.train.write_graph(trt_graph, "model", "tfrt_model.pb", as_text=False)

However, I then get this warning:

WARNING:tensorflow:TensorRT mismatch. Compiled against version 5.0.2, but loaded 5.1.2. Things may not work.

What exactly is not lining up? Is my CUDA version too low? Is my cuDNN too low as well? Does it need to be upgraded to 10.1? Or should I downgrade my TensorRT? Am I missing something? Any help would be greatly appreciated.

SSD MobileNetV2 fails (on Jetson TX2)

Hi,

I tried ssd_mobilenet_v1_coco and it works well without any errors on the Jetson TX2. However, ssd_mobilenet_v2_coco generates the error below. I have already cloned and installed the latest repo version (date: May 2, 2019). I also read that other models in the TF model zoo are not supported on the Jetson TX2. Can you explain why? E.g., is there a layer operation that is not supported in TensorRT?

Thanks.

InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [1,1,576,273] rhs shape= [3,3,576,273]
	 [[{{node save/Assign_3}} = Assign[T=DT_FLOAT, _class=["loc:@BoxPredictor_0/ClassPredictor/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](BoxPredictor_0/ClassPredictor/weights, save/RestoreV2/_7)]]
	 [[{{node save/RestoreV2/_642}} = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_648_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](save/RestoreV2:321)]]

Caused by op 'save/Assign_3', defined at:
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/usr/local/lib/python3.5/dist-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelapp.py", line 505, in start
    self.io_loop.start()
  File "/usr/local/lib/python3.5/dist-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/usr/lib/python3.5/asyncio/base_events.py", line 345, in run_forever
    self._run_once()
  File "/usr/lib/python3.5/asyncio/base_events.py", line 1312, in _run_once
    handle._run()
  File "/usr/lib/python3.5/asyncio/events.py", line 125, in _run
    self._callback(*self._args)
  File "/usr/local/lib/python3.5/dist-packages/tornado/ioloop.py", line 758, in _run_callback
    ret = callback()
  File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1233, in inner
    self.run()
  File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py", line 357, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py", line 267, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py", line 534, in execute_request
    user_expressions, allow_stdin,
  File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/ipkernel.py", line 294, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2817, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2843, in _run_cell
    return runner(coro)
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/async_helpers.py", line 67, in _pseudo_sync_runner
    coro.send(None)
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 3018, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 3183, in run_ast_nodes
    if (yield from self.run_code(code, result)):
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 3265, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-9d6f92ac0f3e>", line 3, in <module>
    checkpoint=checkpoint_path,
  File "/root/.local/lib/python3.5/site-packages/tf_trt_models-0.0-py3.5.egg/tf_trt_models/detection.py", line 144, in build_detection_graph
    tf_saver = tf.train.Saver()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1094, in __init__
    self.build()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1106, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1143, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 787, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 119, in restore
    self.op.get_shape().is_fully_defined())
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/state_ops.py", line 221, in assign
    validate_shape=validate_shape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 61, in assign
    use_locking=use_locking, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [1,1,576,273] rhs shape= [3,3,576,273]
	 [[{{node save/Assign_3}} = Assign[T=DT_FLOAT, _class=["loc:@BoxPredictor_0/ClassPredictor/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](BoxPredictor_0/ClassPredictor/weights, save/RestoreV2/_7)]]
	 [[{{node save/RestoreV2/_642}} = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_648_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](save/RestoreV2:321)]]

Unable to change batch_size

I am running a detection example with ssd_inception_v2 and I have changed the max_batch_size to 24, but when I try to actually compute a batch of any size other than 1 I get this error:
ValueError: Cannot feed value of shape (12, 300, 300, 3) for Tensor 'input:0', which has shape '(1, ?, ?, 3)'
Is there anything else that needs to be changed?
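The message suggests that the graph's `input` placeholder was created with a fixed leading (batch) dimension of 1. Raising `max_batch_size` for the TensorRT conversion therefore does not, on its own, change what the placeholder accepts, and the graph presumably has to be rebuilt with the desired batch dimension before conversion. The sketch below only mimics the shape rule the error enforces, using plain tuples in which `None` stands for `?`. `is_feedable` and `split_into_batches` are hypothetical helpers. Feeding one image per call is a fallback, not a fix.

```python
from typing import List, Optional, Sequence, Tuple


def is_feedable(value_shape: Sequence[int], placeholder_shape: Sequence[Optional[int]]) -> bool:
    """Mimic the feed check: ranks must match, and every known (non-None)
    placeholder dimension must equal the corresponding fed dimension."""
    if len(value_shape) != len(placeholder_shape):
        return False
    return all(p is None or p == v for v, p in zip(value_shape, placeholder_shape))


def split_into_batches(n_images: int, batch_size: int) -> List[Tuple[int, int]]:
    """Hypothetical fallback: (start, stop) index ranges of at most batch_size images."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [(start, min(start + batch_size, n_images))
            for start in range(0, n_images, batch_size)]
```

With the shapes from the error, `is_feedable((12, 300, 300, 3), (1, None, None, 3))` is `False`, while a placeholder built as `(None, None, None, 3)` would accept the same batch.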

Very slow (1.8 s/image) on Faster RCNN ResNet101, and 1 image is wrongly perceived as 153216 images!!!

Ubuntu 16.04
Docker tensorflow/tensorflow 1.13.1 and tensorflow/serving:latest-gpu
NVIDIA TensorRT 5.0.2 (https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html)
TensorFlow object detection Faster RCNN ResNet101 successfully built with only 2 classes
Model converted to FP32 (also tried FP16, with the same issue described below)
with graph.as_default():
    with tf.Session() as sess:
        trt_graph = trt.create_inference_graph(
            input_graph_def=gdef,
            outputs=outputs,
            max_batch_size=1,
            max_workspace_size_bytes=4000000000,
            is_dynamic_op=True,
            # precision_mode='FP16')
            precision_mode='FP32')
            # precision_mode='INT8')
        output_node = tf.import_graph_def(trt_graph, return_elements=outputs)
        # sess.run(output_node)
        tf.saved_model.simple_save(
            sess,
            rt_output_file_name_32,
            inputs={'input_image': graph.get_tensor_by_name('{}:0'.format(node.name))
                    for node in graph.as_graph_def().node if node.op == 'Placeholder'},
            outputs={t: graph.get_tensor_by_name('import/' + t) for t in outputs}
        )

RUN:
docker kill food_non_food
docker run --runtime=nvidia -p 8501:8501 \
  --mount type=bind,source=/mnt/hatto/food_non_food,target=/models/food_non_food \
  -e MODEL_NAME=food_non_food -t tensorflow/serving:latest-gpu

CLIENT:
import time

import numpy as np
import PIL.Image
import requests

image = PIL.Image.open(IMAGE_PATH)
image_np = np.array(image)
payload = {"instances": [image_np.tolist()]}
SERVING_URL = 'http://localhost:8501/v1/models/food_non_food:predict'
start = time.time()
t = requests.post(SERVING_URL, json=payload)
end = time.time()
print('Took', end - start)

Consistently received this ERROR/WARNING:

2019-07-10 06:52:26.523782: W external/org_tensorflow/tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:264] Engine buffer is full. buffer limit=1, current entries=1, requested batch=153216
2019-07-10 06:52:26.523827: W external/org_tensorflow/tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:281] Failed to get engine batch, running native segment for import/ClipToWindow/Area/TRTEngineOp_0

This is it. It runs, but it always takes 1.8 seconds/image (size 1024x), which is terrible! The message above keeps popping up saying the batch size is 153216 while I submit only ONE SINGLE image!!!

I do not think that TensorRT is ready for PRODUCTION just yet!

TensorRT has no effect on the ssd_mobilenet_v1_fpn_coco model

When I use TensorRT to accelerate the ssd_mobilenet_v1_fpn_coco model, it gives no speedup:

RetinaNet mobile, without TensorRT:
Iteration: 0.430 sec
Iteration: 0.421 sec
Iteration: 0.420 sec
Iteration: 0.427 sec
Iteration: 0.439 sec
Iteration: 0.427 sec
Iteration: 0.411 sec
Iteration: 0.424 sec
Iteration: 0.432 sec
Iteration: 0.429 sec
Iteration: 0.413 sec
Iteration: 0.424 sec
Iteration: 0.424 sec
Iteration: 0.428 sec
Iteration: 0.427 sec
Iteration: 0.431 sec
Iteration: 0.417 sec
Iteration: 0.418 sec
With TensorRT:
0.505087852478
0.504916906357
0.501970052719
0.505352973938
0.494786024094
0.498456954956
0.504287004471
0.50328707695
0.507141113281
0.499255895615
0.487679004669
0.489063978195
0.492527008057
0.503779172897
0.514405965805

log:
retinanet v1
('config_path', './data/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03/pipeline.config')
('checkpoint_path', './data/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03/model.ckpt')
2018-09-03 09:24:44.137510: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-09-03 09:24:44.137784: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.45GiB
2018-09-03 09:24:44.137850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-03 09:24:47.792908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-03 09:24:47.793169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-09-03 09:24:47.793277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-09-03 09:24:47.793573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2913 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Converted 333 variables to const ops.
2018-09-03 09:26:07.919932: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2018-09-03 09:26:16.036617: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:383] MULTIPLE tensorrt candidate conversion: 4
2018-09-03 09:26:16.057791: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:0 due to: "Unimplemented: Require 4 dimensional input. Got 0 const6" SKIPPING......( 108 nodes)
2018-09-03 09:26:16.064689: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:1 due to: "Unimplemented: Require 4 dimensional input. Got 0 const6" SKIPPING......( 108 nodes)
2018-09-03 09:26:16.837392: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:2 due to: "Invalid argument: Output node 'const6' is weights not tensor" SKIPPING......( 612 nodes)
2018-09-03 09:26:16.842941: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:3 due to: "Unimplemented: Require 4 dimensional input. Got 1 Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/zeros_like_47" SKIPPING......( 181 nodes)
['boxes', 'classes', 'scores']
2018-09-03 09:27:47.320719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-03 09:27:47.320894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-03 09:27:47.320933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-09-03 09:27:47.320967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-09-03 09:27:47.321106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2913

thanks !
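In this log, `MULTIPLE tensorrt candidate conversion: 4` is followed by four `SKIPPING` warnings, one for each candidate segment, and no engine-building messages appear before the log is cut off. That suggests (though the truncated log cannot confirm it) that no TensorRT engine was built and the model ran as plain TensorFlow, which would match the missing speedup. A small scan like the sketch below makes this quick to check in other logs. Its regular expressions are assumptions based only on the message format visible here and may not match other TensorFlow versions.

```python
import re
from typing import Iterable, List, Tuple

# Patterns taken from the message format in the log above (TF 1.x contrib TF-TRT);
# other TensorFlow versions may word these lines differently.
_CANDIDATES_RE = re.compile(r"tensorrt candidate conversion: (\d+)")
_SKIP_RE = re.compile(
    r'subgraph conversion error for subgraph_index:(\d+) due to: "(.*)" '
    r"SKIPPING\.*\(\s*(\d+) nodes\)"
)


def skipped_segments(log_lines: Iterable[str]) -> Tuple[int, List[Tuple[int, str, int]]]:
    """Return (number of candidate segments, [(index, reason, node_count), ...])."""
    candidates = 0
    skipped = []
    for line in log_lines:
        m = _CANDIDATES_RE.search(line)
        if m:
            candidates = int(m.group(1))
        m = _SKIP_RE.search(line)
        if m:
            skipped.append((int(m.group(1)), m.group(2), int(m.group(3))))
    return candidates, skipped
```

If every candidate index shows up in the skipped list, nothing was handed to TensorRT for that graph.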

ImportError: No module named 'tf_trt_models.detection'

Hi,
I got an error when running this code line:
from tf_trt_models.detection import download_detection_model, build_detection_graph
Importing tensorrt, however, works fine:
import tensorrt as trt

Any solution?
I use a Jetson TX2 on Ubuntu 16.04 64-bit, flashed with JetPack 3.3.

install.sh using python

I ran install.sh and ran into a problem immediately, because install.sh uses python but I installed everything for python3. After changing line 6 to python3, the installation worked.

Can the function `build_classification_graph` use my own defined model?

Hello, I have a question: can build_classification_graph use my own model if I replace the model parameter with the path to my own network? The network is defined as follows:

class MyLeNet(Network):
    def setup(self):
        (self.feed('data')
             .conv(5, 5, 20, 1, 1, padding='VALID', relu=False, name='conv1')
             .max_pool(2, 2, 2, 2, name='pool1')
             .conv(5, 5, 50, 1, 1, padding='VALID', relu=False, name='conv2')
             .max_pool(2, 2, 2, 2, name='pool2')
             .fc(500, name='ip1')
             .fc(10, relu=False, name='ip2')
             .softmax(name='prob'))

Input size error when deploying SSD on TRTIS

solved! thank you

Installation error: tensorflow-1.10.1-cp35-cp35m-linux_aarch64.whl is not a supported wheel on this platform

Hardware: NVIDIA JETSON TX2

Following the instructions at https://github.com/NVIDIA-AI-IOT/tf_trt_models, at step 3 ("Install TensorFlow 1.7+ (with TensorRT support). Download the pre-built pip wheel and install using pip.") I downloaded the .whl file, and while installing it I got this error.

Thanks.
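pip gives "is not a supported wheel on this platform" when a wheel's tags do not match the running interpreter or machine. The file name carries those tags (PEP 427 naming): `cp35` means CPython 3.5 and `linux_aarch64` means 64-bit ARM Linux. So this wheel will presumably be rejected by a pip that belongs to a different Python version, such as 2.7 or 3.6, or on a non-aarch64 host. The sketch below reads the Python tag from the file name and compares it with the interpreter running the script. It only checks the Python tag; the ABI and platform tags also have to match.

```python
import sys
from typing import Tuple


def wheel_python_tag(filename: str) -> str:
    """Return the Python tag (e.g. 'cp35') of a wheel file name.

    Wheel names follow name-version[-build]-python-abi-platform.whl,
    so the Python tag is the third dash-separated field from the end.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-4].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel name: " + filename)
    return parts[-3]


def cpython_version(tag: str) -> Tuple[int, int]:
    """Map a CPython tag such as 'cp35' to (3, 5) or 'cp310' to (3, 10)."""
    if not tag.startswith("cp") or len(tag) < 4:
        raise ValueError("not a CPython tag: " + tag)
    digits = tag[2:]
    return int(digits[0]), int(digits[1:])


def wheel_matches_interpreter(filename: str) -> bool:
    """True if the wheel's Python tag matches the interpreter running this code."""
    return cpython_version(wheel_python_tag(filename)) == tuple(sys.version_info[:2])
```

Running `python3 -m pip --version` also shows which interpreter a given pip is tied to, and `python3 -m pip install ...` installs with that exact interpreter.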

Run times far from expected

I am running the notebook with Python 3 on a TX2 with JetPack 3.3. I followed the instructions and am measuring the inference time as follows:
from time import time
start = time()
output = tf_sess.run(tf_output, feed_dict={
    tf_input: image[None, ...]
})
end = time()
print("Inference time: {}s".format(end - start))
scores = output[0]

Using the same examples as the notebook (inception_v1, etc.), I got an inference time of 0.8 seconds, far from the 7ms described.
I also used
sudo nvpmodel -m 0
sudo ~/jetson_clocks.sh
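A single timed `tf_sess.run` call can include one-time costs, such as the first execution of the graph, so one sample may not be comparable with the per-image timings in the README table. The sketch below discards a few warm-up calls before averaging. `run_once` stands in for the notebook's `tf_sess.run(...)` call, and the warm-up and iteration counts are arbitrary choices, not values taken from the README.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 5, iters: int = 50) -> Dict[str, float]:
    """Time run_once() in seconds, ignoring the first `warmup` calls."""
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        run_once()  # early calls may include one-time setup costs
    samples = []
    for _ in range(iters):
        start = time.perf_counter()  # monotonic, high-resolution clock
        run_once()
        samples.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
    }
```

For example, `benchmark(lambda: tf_sess.run(tf_output, feed_dict={tf_input: image[None, ...]}))` times the same call as the snippet above.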

TensorRT-supported detection networks

Hi,

TensorRT appears to support ResNet for classification. Does it also support detection networks with a ResNet backbone?

Which modules does TensorRT not support?

Thanks in advance

subgraph conversion error for subgraph_index:0

Hello,
I converted detection.ipynb to detection.py and tested it.
However, a subgraph conversion error occurs and there are no detection results.

nvidia@tegra-ubuntu:~/tf_trt_models/examples/detection$ python3 detection.py
2018-08-03 12:56:54.283943: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-08-03 12:56:54.284075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 1.60GiB
2018-08-03 12:56:54.284128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-03 12:56:55.970822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-03 12:56:55.970892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-08-03 12:56:55.970919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-08-03 12:56:55.971081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 908 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Converted 199 variables to const ops.
['scores', 'classes', 'boxes']
2018-08-03 12:57:59.031208: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2018-08-03 12:58:04.831767: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:383] MULTIPLE tensorrt candidate conversion: 2
2018-08-03 12:58:04.844875: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:0 due to: "Unimplemented: Require 4 dimensional input. Got 1 Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/zeros_like_30" SKIPPING......( 181 nodes)
2018-08-03 12:58:05.072039: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2660] Max batch size= 1 max workspace size= 23679062
2018-08-03 12:58:05.072123: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2664] Using FP16 precision mode
2018-08-03 12:58:05.072153: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2666] starting build engine
2018-08-03 12:58:48.920384: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2671] Built network
2018-08-03 12:58:49.088932: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2676] Serialized engine
2018-08-03 12:58:49.100270: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2684] finished engine my_trt_op1 containing 434 nodes
2018-08-03 12:58:49.100414: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2704] Finished op preparation
2018-08-03 12:58:49.120219: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2712] OK finished op building
2018-08-03 12:58:55.775380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-03 12:58:55.775482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-03 12:58:55.775510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-08-03 12:58:55.775535: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-08-03 12:58:55.775620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 908 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

I use JetPack 3.2.1 and TensorFlow 1.8.0 (Python 3).
I don't know what is wrong.
Can you help me with this error?
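Unlike a conversion where every segment fails, this log reports one skipped segment (181 nodes under `Postprocessor/BatchMultiClassNonMaxSuppression`) and also `finished engine my_trt_op1 containing 434 nodes`. So the `SKIPPING` warning by itself does not mean TensorRT went unused, and the missing detections may have a different cause. The sketch below summarizes both kinds of lines from a saved log. The patterns are assumptions based on the TF 1.8 messages shown here.

```python
import re
from typing import Iterable, List, Tuple

# Message formats copied from the log above; other TF versions may differ.
_BUILT_RE = re.compile(r"finished engine (\S+) containing (\d+) nodes")
_SKIPPED_RE = re.compile(r"SKIPPING\.*\(\s*(\d+) nodes\)")


def conversion_summary(log_lines: Iterable[str]) -> Tuple[List[Tuple[str, int]], List[int]]:
    """Return ([(engine_name, node_count), ...], [skipped_segment_node_count, ...])."""
    built = []
    skipped = []
    for line in log_lines:
        m = _BUILT_RE.search(line)
        if m:
            built.append((m.group(1), int(m.group(2))))
        m = _SKIPPED_RE.search(line)
        if m:
            skipped.append(int(m.group(1)))
    return built, skipped
```

Passing an open log file works too, since iterating a file yields its lines.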

ssd to trt plan

Hello,
Can you please provide a script to convert an SSD model to a TensorRT plan file?

Cannot use Faster R-CNN

Does this project only support SSD_mobilenet and ssd_inception_v2 for detection? How can I use Faster R-CNN? Thank you!

Why freeze the graphs manually?

Is there a particular reason why the code in this repo prefers to generate a new frozen graph rather than just using the frozen graph that comes from the pre-trained models?

ImportError: No module named 'object_detection'

Hi,
I'm trying to use the repository to test TF-TRT on the Jetson TX2, but when I run this with python3:
from tf_trt_models.detection import download_detection_model

I get the error ImportError: No module named 'object_detection'

Oddly, when I try the same command with python (not python3), I get a different error:
ImportError: No module named google.protobuf
The installation went well. Any ideas?
Thanks
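Getting `No module named 'object_detection'` under python3 but `No module named google.protobuf` under python suggests, though it does not prove, that the dependencies and the packages were installed for different interpreters. The "install.sh using python" issue on this page describes the script defaulting to `python`. One way to check is to run the sketch below once with `python3` and once with `python`, and compare what each interpreter can find. It needs Python 3, because `importlib.util` does not exist in Python 2. The module list is just the names from these import errors.

```python
import importlib.util
import sys
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report which modules the interpreter running this script can locate."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # For dotted names such as "google.protobuf", find_spec imports the
            # parent package first and raises if that parent is missing.
            found[name] = False
    return found


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    for module, ok in check_modules(["object_detection", "google.protobuf", "tf_trt_models"]).items():
        print("{}: {}".format(module, "found" if ok else "MISSING"))
```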

regarding conversion of ssd_inception_v2_coco_trt.pb to tensorrt plan

Hi,
I want to convert an SSD model to a TensorRT plan file, but I am getting this error:

Using output node scores boxes classes
Converting to UFF graph
Traceback (most recent call last):
File "convert_plan.py", line 73, in
data_type
File "convert_plan.py", line 24, in frozenToPlan
text=False,
File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 113, in from_tensorflow_frozen_model
return from_tensorflow(tf_graphdef, output_nodes, **kwargs)
File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 77, in from_tensorflow
name="main")
File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter.py", line 74, in convert_tf2uff_graph
uff_graph, input_replacements)
File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter.py", line 45, in convert_tf2uff_node
tf_node = tf_nodes[name]
KeyError: 'scores boxes classes'

Can you provide a script to convert this SSD model to a TensorRT plan file?

Thanks
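The traceback ends at `tf_nodes[name]` with `KeyError: 'scores boxes classes'`. This suggests the three output node names reached the UFF converter as one space-separated string instead of a list of separate names. `convert_plan.py`'s command-line interface is not shown here, so this is an assumption. If it holds, splitting that argument before the conversion call would be one fix. `normalize_output_nodes` is a hypothetical helper, not part of the `uff` package.

```python
from typing import List, Sequence, Union


def normalize_output_nodes(nodes: Union[str, Sequence[str]]) -> List[str]:
    """Return output node names as a list.

    Accepts a list of names, or a single string such as
    "scores boxes classes" or "scores,boxes,classes".
    """
    if isinstance(nodes, str):
        nodes = nodes.replace(",", " ").split()
    return [name.strip() for name in nodes if name.strip()]
```

The resulting list would then be passed wherever `convert_plan.py` hands its output node names to `uff.from_tensorflow_frozen_model`.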

FPS of MobileNet SSD with TensorRT

What exact FPS do you get when running MobileNet SSD on the Jetson TX2?

KeyError: "The name 'detection_scores:0' refers to a Tensor which does not exist. The operation, 'detection_scores', does not exist in the graph."

I get this error in this part of the Jupyter notebook code:

I have tried with ssd_mobilenet_v1_coco

tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True

tf_sess = tf.Session(config=tf_config)

tf.import_graph_def(trt_graph, name='')

tf_input = tf_sess.graph.get_tensor_by_name(input_names[0] + ':0')
tf_scores = tf_sess.graph.get_tensor_by_name('detection_scores:0')
tf_boxes = tf_sess.graph.get_tensor_by_name('detection_boxes:0')
tf_classes = tf_sess.graph.get_tensor_by_name('detection_classes:0')

Error

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
in
7
8 tf_input = tf_sess.graph.get_tensor_by_name(input_names[0] + ':0')
----> 9 tf_scores = tf_sess.graph.get_tensor_by_name('detection_scores:0')
10 tf_boxes = tf_sess.graph.get_tensor_by_name('detection_boxes:0')
11 tf_classes = tf_sess.graph.get_tensor_by_name('detection_classes:0')

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py in get_tensor_by_name(self, name)
3662 raise TypeError("Tensor names are strings (or similar), not %s." %
3663 type(name).__name__)
-> 3664 return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
3665
3666 def _get_tensor_by_tf_output(self, tf_output):

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py in as_graph_element(self, obj, allow_tensor, allow_operation)
3486
3487 with self._lock:
-> 3488 return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
3489
3490 def _as_graph_element_locked(self, obj, allow_tensor, allow_operation):

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py in _as_graph_element_locked(self, obj, allow_tensor, allow_operation)
3528 raise KeyError("The name %s refers to a Tensor which does not "
3529 "exist. The operation, %s, does not exist in the "
-> 3530 "graph." % (repr(name), repr(op_name)))
3531 try:
3532 return op.outputs[out_n]

KeyError: "The name 'detection_scores:0' refers to a Tensor which does not exist. The operation, 'detection_scores', does not exist in the graph."
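The KeyError means the imported graph has no node named `detection_scores`. The graphs built in other logs on this page print their output names as `['scores', 'classes', 'boxes']`, so the installed `tf_trt_models` and the notebook may disagree on output naming. This is an assumption; printing the `output_names` returned alongside `input_names` by `build_detection_graph` would confirm it. The hypothetical helper below picks whichever candidate name actually exists in the graph, rather than hard-coding one.

```python
from typing import Iterable, Optional, Sequence


def find_tensor_name(node_names: Iterable[str], candidates: Sequence[str]) -> Optional[str]:
    """Return "<node>:0" for the first candidate node present in the graph, else None."""
    present = set(node_names)
    for name in candidates:
        if name in present:
            return name + ":0"
    return None
```

With a GraphDef, the node names are `[node.name for node in trt_graph.node]`. The scores tensor could then be looked up with `find_tensor_name(node_names, ["detection_scores", "scores"])`, and the result passed to `get_tensor_by_name` after checking it is not `None`.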

ModuleNotFoundError: No module named 'tensorflow.contrib.tensorrt'

hi,
I got an error when running this code line:
import tensorflow.contrib.tensorrt as trt
Can you tell me how to solve this?

Don't use `sudo` during installation

I just had to clean up my system directories after running the install.sh script without looking into what it actually does. I had everything set up in a virtualenv and assumed it would install there; I almost had a heart attack when I saw all the sudo calls in the script.

Please remove them from the installation script. It is bad practice, and if a user ran something like sudo apt-get install just before calling the script, sudo may not even prompt for a password, so the user isn't told what just happened. I think using the --user flag instead would be okay.

The sudo calls in install_protoc.sh are also dangerous. If there is no other way, I think ~/.local/bin would be a more appropriate location.

If this is not possible, please add a warning message above the installation instructions, ideally in capital red letters!!!

Very slow inference on video feed

Hi,

I saved the session using:
tf.saved_model.simple_save(tf_sess, "./save_dir/", inputs={"tf_input":tf_input}, outputs={"tf_scores":tf_scores, "tf_boxes":tf_boxes,"tf_classes":tf_classes})

And called the session again in a different script using:
tf.saved_model.loader.load(tf_sess, [tag_constants.SERVING], "./save_dir/")

When running,
scores, boxes, classes = tf_sess.run([tf_scores, tf_boxes, tf_classes], feed_dict={tf_input:image_resized[None, ...]})
I get around 4-5 fps on average.

Is there any advice on running inference on a video feed at the frame rate mentioned in the README?

Many thanks!

PS: Before running inference, I did run /jetson_clocks and nvpmodel -m 0.

Error when loading trt_graph

Hi,

I used examples/detection/detection.ipynb to load the official ssd_resnet_50_fpn_coco model and convert it to a TensorRT graph. I am not sure why, but I got this error:

ValueError                                Traceback (most recent call last)
<ipython-input-7-8056b93d1560> in <module>()
      3 tf_sess = tf.Session(config=tf_config)
      4 
----> 5 tf.import_graph_def(trt_graph, name='')
      6 
      7 tf_input = tf_sess.graph.get_tensor_by_name(input_names[0] + ':0')

/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.pyc in new_func(*args, **kwargs)
    452                 'in a future version' if date is None else ('after %s' % date),
    453                 instructions)
--> 454       return func(*args, **kwargs)
    455     return tf_decorator.make_decorator(func, new_func, 'deprecated',
    456                                        _add_deprecated_arg_notice_to_docstring(

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.pyc in import_graph_def(graph_def, input_map, return_elements, name, op_dict, producer_op_list)
    420       except errors.InvalidArgumentError as e:
    421         # Convert to ValueError for backwards compatibility.
--> 422         raise ValueError(str(e))
    423 
    424     # Create _DefinedFunctions for any imported functions.

ValueError: Node 'my_trt_op_0': Unknown input node 'my_trt_op_1:14'

Could you help me with this?

Thanks,
Xin

Can I use this with a converted yolov3-tiny model on a Jetson Nano?

MobileNetV2-SSD fails!

onverted to TensorRT: Fill, Merge, Switch, Range, ConcatV2, ZerosLike, Identity, NonMaxSuppressionV3, Minimum, StridedSlice, Shape, Split, Where, Exp, ExpandDims, Unpack, GatherV2, NoOp, TopKV2, Cast, Placeholder, Mul, Pack, Reshape, ResizeBilinear, Squeeze, Add, Greater, Const, Sub, Transpose, Slice, (For more information see https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops).
2019-03-29 13:17:01.262911: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:928] Number of TensorRT candidate segments: 1
[1] 27275 killed python3 camera_tf_trt.py --image --filename --model ssd_mobilenet_v2_coco

Cannot run with precision_mode="INT8"

The error is as follows:
Node Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/add should have an input named 'Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/add/y' but it is not available

TensorRT FRCNN

@Alexey-Kamenev @dusty-nv @tokk-nv @nsmoly Hi, can we use TensorRT to deploy Faster R-CNN models on the TX2 board? If so, can you share links on how to deploy them on the board?

TX2 slower than reported

Hello Nvidia!

Thank you for the clear explanation and benchmarking on this website, and for testing different models; it is really appreciated! According to your execution time table, I should get 54.4ms when running ssd_inception_v2_coco on the TX2. Over 200 runs, after the network has 'warmed up', I get 69.63ms. Looking at tegrastats, it seems that the GPU is not used very efficiently (its utilization varies over time, but it is rarely even close to 90%):

RAM 4167/7854MB (lfb 84x4MB) CPU [49%@2035,0%@2035,0%@2035,44%@2035,47%@2032,38%@2035] EMC_FREQ 7%@1866 GR3D_FREQ 18%@1300 APE 150 MTS fg 0% bg 0% BCPU@48C MCPU@48C [email protected] PLL@48C Tboard@41C [email protected] PMIC@100C [email protected] VDD_IN 7862/4839 VDD_CPU 1763/820 VDD_GPU 2531/947 VDD_SOC 997/929 VDD_WIFI 0/33 VDD_DDR 1626/1271

I followed all the steps in the GitHub README and the notebook, so does anyone have an idea what could be causing this? I use JetPack 3.3 and TensorFlow 1.10.

Edit: see the Nvidia forums for more on this issue

CUDA version problem prevents using TensorFlow

I think the latest Jetson TX2 setup uses CUDA 10.0, but the tutorial installs a TensorFlow build for CUDA 9.0, so when I import tensorflow I get the following error:

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

In [1]: import tensorflow as tf

ImportError Traceback (most recent call last)
in ()
----> 1 import tensorflow as tf

/home/nvidia/.local/lib/python2.7/site-packages/tensorflow/__init__.py in ()
20
21 # pylint: disable=g-bad-import-order
---> 22 from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
23
24 try:

/home/nvidia/.local/lib/python2.7/site-packages/tensorflow/python/__init__.py in ()
47 import numpy as np
48
---> 49 from tensorflow.python import pywrap_tensorflow
50
51 # Protocol buffers

/home/nvidia/.local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py in ()
72 for some common reasons and solutions. Include the entire stack trace
73 above this error message when asking for help.""" % traceback.format_exc()
---> 74 raise ImportError(msg)
75
76 # pylint: enable=wildcard-import,g-import-not-at-top,unused-import,line-too-long

ImportError: Traceback (most recent call last):
File "/home/nvidia/.local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/nvidia/.local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/nvidia/.local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

Why do class ids change when TensorRT is used?

For the same picture and the same model, the class ids output when running inference with TensorRT differ from those without TensorRT.
Comparing the two, with TensorRT the ids of several high-confidence classifications are one less.
Why does this happen?
In the sample code you gave, comparing against the mscoco_label_map.pbtxt file, the ids in the sample are also 1 less.
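mscoco_label_map.pbtxt numbers COCO classes starting at 1 (id 1 is "person"), so ids that are consistently one less look like 0-based class indices. One hedged workaround, assuming the TensorRT graph's `classes` output really is 0-based in this way (compare a few detections against the non-TensorRT graph first), is to add 1 before the label lookup. `to_label_names` and the two-entry `label_map` are hypothetical; in practice the map would be parsed from the .pbtxt file.

```python
from typing import Dict, Iterable, List


def to_label_names(class_ids: Iterable[float], label_map: Dict[int, str], offset: int = 1) -> List[str]:
    """Map raw class ids to label names, shifting each id by `offset` first."""
    names = []
    for raw in class_ids:
        key = int(raw) + offset  # detection class ids may come back as floats
        names.append(label_map.get(key, "unknown({})".format(key)))
    return names


# Hypothetical excerpt of a label map keyed by the 1-based .pbtxt ids.
label_map = {1: "person", 2: "bicycle"}
```

Passing `offset=0` gives the plain lookup for graphs whose ids already match the label map.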

Very slow inference on Jetson TX2

Hi everyone,
I converted ssdlite_mobilenetv2, ssd_mobilenetv2 and ssd_resnet50 to TensorRT with the TensorFlow API, which generated the .pb file.
I used TensorFlow 1.13 and JetPack 4.2, but the final inference time is not good.
I only get 2.5 FPS, which isn't real-time, and loading the model takes about 10 minutes. Why?

ImportError: cannot import name 'exporter'

Traceback (most recent call last):
File "jetson_stream.py", line 9, in
from object_detector_trt import ObjectDetectorTRT
File "/home/nano/sc2.2/scripts/mobile_detector/object_detector_trt.py", line 5, in
from tf_trt_models.detection import download_detection_model,
File "/home/nano/.local/lib/python3.6/site-packages/tf_trt_models-0.0-py3.6.egg/tf_trt_models/detection.py", line 3, in
ImportError: cannot import name 'exporter'

ImportError: cannot import name 'deprecated_endpoints'

import tensorflow.contrib.tensorrt as trt

Traceback (most recent call last):
File "", line 1, in
File "/root/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/__init__.py", line 25, in
from tensorflow.contrib.tensorrt.python import *
File "/root/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/python/__init__.py", line 22, in
from tensorflow.contrib.tensorrt.python.ops import trt_engine_op
File "/root/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/python/ops/trt_engine_op.py", line 25, in
from tensorflow.contrib.tensorrt.ops.gen_trt_engine_op import *
File "/root/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/ops/gen_trt_engine_op.py", line 24, in
from tensorflow.python.util.deprecation import deprecated_endpoints
ImportError: cannot import name 'deprecated_endpoints'

Can you guys do me a favor?
