dusty-nv / jetson-containers Goto Github PK

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

License: MIT License

Shell 7.72% Python 63.12% Dockerfile 9.93% C++ 0.07% JavaScript 3.94% CSS 0.29% HTML 4.44% Jupyter Notebook 9.22% CMake 0.04% Cuda 0.89% Ruby 0.33%

machine-learning dockerfiles jetson pytorch tensorflow pandas scikit-learn numpy ros-containers ros2-foxy

jetson-containers's Introduction

Machine Learning Containers for Jetson and JetPack

Modular container build system that provides various AI/ML packages for NVIDIA Jetson 🚀🤖


ML	`pytorch` `tensorflow` `onnxruntime` `deepstream` `jupyterlab` `stable-diffusion`
LLM	`NanoLLM` `transformers` `text-generation-webui` `ollama` `llama.cpp` `exllama` `llava` `awq` `AutoGPTQ` `MLC` `optimum` `nemo`
L4T	`l4t-pytorch` `l4t-tensorflow` `l4t-ml` `l4t-diffusion` `l4t-text-generation`
VIT	`NanoOWL` `NanoSAM` `Segment Anything (SAM)` `Track Anything (TAM)` `clip_trt`
CUDA	`cupy` `cuda-python` `pycuda` `numba` `cudf` `cuml`
Robotics	`ros` `ros2` `opencv:cuda` `realsense` `zed` `oled`
RAG	`llama-index` `langchain` `jetrag` `NanoDB` `FAISS` `RAFT`
Audio	`whisper` `whisper_trt` `piper` `riva` `audiocraft` `voicecraft`
Smart Home	`homeassistant-core` `homeassistant-base` `wyoming-whisper` `wyoming-openwakeword` `wyoming-piper` `wyoming-assist-microphone`

See the packages directory for the full list, including pre-built container images for JetPack/L4T.

Using the included tools, you can easily combine packages together for building your own containers. Want to run ROS2 with PyTorch and Transformers? No problem - just do the system setup, and build it on your Jetson:

$ jetson-containers build --name=my_container pytorch transformers ros:humble-desktop

There are shortcuts for running containers too - this will pull or build a l4t-pytorch image that's compatible:

$ jetson-containers run $(autotag l4t-pytorch)

^{jetson-containers run launches docker run with some added defaults (like --runtime nvidia, mounted /data cache and devices)}
^{autotag finds a container image that's compatible with your version of JetPack/L4T - either locally, pulled from a registry, or by building it.}

If you look at any package's readme (like l4t-pytorch), it will have detailed instructions for running it.

Changing CUDA Versions

You can rebuild the container stack for different versions of CUDA by setting the CUDA_VERSION variable:

CUDA_VERSION=12.4 jetson-containers build transformers

It will then go off and either pull or build all the dependencies needed, including PyTorch and other packages that would be time-consuming to compile. There is a Pip server that caches the wheels to accelerate builds. You can also request specific versions of cuDNN, TensorRT, Python, and PyTorch with similar environment variables like here.

Documentation

Check out the tutorials at the Jetson Generative AI Lab!

Getting Started

Refer to the System Setup page for tips about setting up your Docker daemon and memory/storage tuning.

# install the container tools
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh

# automatically pull & run any container
jetson-containers run $(autotag l4t-pytorch)

Or you can manually run a container image of your choice without using the helper scripts above:

sudo docker run --runtime nvidia -it --rm --network=host dustynv/l4t-pytorch:r36.2.0

Looking for the old jetson-containers? See the legacy branch.

Gallery

Multimodal Voice Chat with LLaVA-1.5 13B on NVIDIA Jetson AGX Orin (container: NanoLLM)

Interactive Voice Chat with Llama-2-70B on NVIDIA Jetson AGX Orin (container: NanoLLM)

Realtime Multimodal VectorDB on NVIDIA Jetson (container: nanodb)

NanoOWL - Open Vocabulary Object Detection ViT (container: nanoowl)

Live Llava on Jetson AGX Orin (container: NanoLLM)

Live Llava 2.0 - VILA + Multimodal NanoDB on Jetson Orin (container: NanoLLM)

Small Language Models (SLM) on Jetson Orin Nano (container: NanoLLM)

Realtime Video Vision/Language Model with VILA1.5-3b (container: NanoLLM)

jetson-containers's People

Contributors

Stargazers

Watchers

Forkers

tzeitim povilasv gimel-ai doutdex rszym88 rondagdag mrjvb pkanon bunderhi cd109 swancor feefro phongphuhanam land007 marcbelmont vignesh1230 traseehq yarikgoldvarg xyyeh brianoppenheim cccvvvvv pc0179 mickaelcormier eweill-nv dreamfortek petrox tags07 tiryoh dawsonf ron-devel marcostrullato waspinator amehrez xirdigh qwertimer jvschw tatsuya-2 rasmusandersen qqq-tech husainkapadia austinpickering rbonghi f-fl0 pablorr100 ldg810 dingooz mzahana langerma greenroom-robotics jereo sleeplessinva t2hk mengyaowunotavailable 0w8states tdnpp abstractguy abdo-diwan hyunsungkim taaaaaantrum bondi-labs namwoo geoc1234 ganlumomo shinjihagio ntnu-adaptive-sampling-group deep-learning-and-machine-learning avatar196kc zzxaijs ronaldpoint ljb2208 rayed-therap prof-schacht hebskjcc jordanresearch cloudanalytics logivations ptyork dynamics-regensburg aldrichg9 alok018 ntnu-aferry lakshadeep amirsamanahmadi azzarello snapbuy polyu-robocon muddasar-ali sheremet eniware-org aldajo92 wep21 seatre ros2deep koburzasty tyconcps sandeshghimire anscipione amfern wjzhou alessiomorale

jetson-containers's Issues

Install moveit within Dockerfile.ros.noetic

Hi!

I am trying to build Moveit! from source as explained here within Dockerfile.ros.noetic, but I am having some issues.

Here is what I am trying in the Dockerfile, right after line 54:

# Install Moveit dependencies
RUN apt-get update && \
    apt-get install --no-install-recommends --no-install-suggests --yes \
    clang-format-10 \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install Moveit
WORKDIR /
RUN chmod +x ${ROS_ROOT}/setup.bash
RUN /bin/bash -c  '${ROS_ROOT}/setup.bash'

RUN mkdir moveit_catkin_ws && \
    cd moveit_catkin_ws && \
    wstool init src && \
    wstool merge -t src https://raw.githubusercontent.com/ros-planning/moveit/master/moveit.rosinstall && \
    wstool update -t src && \
    # ISSUE HERE 
    rosdep install -y --from-paths src --ignore-src --rosdistro ${ROS_DISTRO} --os=ubuntu:bionic

RUN catkin config --extend /opt/ros/${ROS_DISTRO} --cmake-args -DCMAKE_BUILD_TYPE=Release

And this is what I am getting:

ERROR: the following packages/stacks could not have their rosdep keys resolved
to system dependencies:
moveit_ros_move_group: No definition of [rostest] for OS version [bionic]
moveit_chomp_optimizer_adapter: No definition of [pluginlib] for OS version [bionic]
moveit_resources_prbt_pg70_support: No definition of [xacro] for OS version [bionic]
moveit_planners: No definition of [catkin] for OS version [bionic]
moveit: No definition of [catkin] for OS version [bionic]
moveit_ros_occupancy_map_monitor: No definition of [rosunit] for OS version [bionic]
moveit_ros_planning: No definition of [tf2_ros] for OS version [bionic]
moveit_runtime: No definition of [catkin] for OS version [bionic]
moveit_ros_visualization: No definition of [rostest] for OS version [bionic]
moveit_resources_pr2_description: No definition of [catkin] for OS version [bionic]
moveit_ros_control_interface: No definition of [trajectory_msgs] for OS version [bionic]
moveit_ros_manipulation: No definition of [pluginlib] for OS version [bionic]
moveit_ros_perception: No definition of [rosunit] for OS version [bionic]
moveit_ros_warehouse: No definition of [tf2_ros] for OS version [bionic]
rviz_visual_tools: No definition of [rosunit] for OS version [bionic]
moveit_ros: No definition of [catkin] for OS version [bionic]
moveit_resources_prbt_moveit_config: No definition of [rviz] for OS version [bionic]
moveit_tutorials: No definition of [rosunit] for OS version [bionic]
pilz_industrial_motion_planner: No definition of [code_coverage] for OS version [bionic]
moveit_planners_chomp: No definition of [rostest] for OS version [bionic]
moveit_resources_panda_moveit_config: No definition of [topic_tools] for OS version [bionic]
moveit_fake_controller_manager: No definition of [roscpp] for OS version [bionic]
moveit_simple_controller_manager: No definition of [actionlib] for OS version [bionic]
moveit_msgs: No definition of [std_msgs] for OS version [bionic]
moveit_ros_robot_interaction: No definition of [rosunit] for OS version [bionic]
moveit_resources_prbt_support: No definition of [code_coverage] for OS version [bionic]
moveit_commander: No definition of [rostest] for OS version [bionic]
moveit_resources_prbt_ikfast_manipulator_plugin: No definition of [tf2_kdl] for OS version [bionic]
moveit_resources_fanuc_description: No definition of [catkin] for OS version [bionic]
pilz_industrial_motion_planner_testutils: No definition of [catkin] for OS version [bionic]
moveit_core: No definition of [rosunit] for OS version [bionic]
moveit_servo: No definition of [rostest] for OS version [bionic]
moveit_ros_planning_interface: No definition of [eigen_conversions] for OS version [bionic]
moveit_ros_benchmarks: No definition of [pluginlib] for OS version [bionic]
moveit_kinematics: No definition of [xmlrpcpp] for OS version [bionic]
moveit_plugins: No definition of [catkin] for OS version [bionic]
moveit_resources: No definition of [robot_state_publisher] for OS version [bionic]
moveit_visual_tools: No definition of [cmake_modules] for OS version [bionic]
panda_moveit_config: No definition of [topic_tools] for OS version [bionic]
moveit_resources_panda_description: No definition of [catkin] for OS version [bionic]
chomp_motion_planner: No definition of [catkin] for OS version [bionic]
geometric_shapes: No definition of [rosunit] for OS version [bionic]
moveit_setup_assistant: No definition of [rosunit] for OS version [bionic]
moveit_planners_ompl: No definition of [eigen_conversions] for OS version [bionic]
moveit_resources_fanuc_moveit_config: No definition of [xacro] for OS version [bionic]
The command '/bin/sh -c mkdir moveit_catkin_ws &&     cd moveit_catkin_ws &&     wstool init src &&     wstool merge -t src https://raw.githubusercontent.com/ros-planning/moveit/master/moveit.rosinstall &&     wstool update -t src &&     rosdep install -y --from-paths src --ignore-src --rosdistro ${ROS_DISTRO} --os=ubuntu:bionic' returned a non-zero code: 1

Any ideas how to solve it?

Thanks!

C++ Program crashed while running tensorflow 2.0.0 with cuda 10.0

I am using Jetpack 4.3 on Tegra Tx2. Below are other versions of 3rd party softwares.

Protobuf-3.8.0
Eigen- 3.3.90 
Tensorflow-2.0.0
Python-2.7.17
GCC 7.5.0
Bazel 0.26.1
cuDNN-7.6.3
CUDA-10.0

I have compiled tensorflow-2.0.0 inside docker container (tf-base-container has only c++ and is based on ubuntu18.04). Below is the command to run container.

docker container run --privileged -e DISPLAY=$DISPLAY -v /tmp/X11-unix:/tmp/X11-unix -e PATH=:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin -e LD_LIBRARY_PATH=:/usr/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu/tegra:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/lib -v /usr/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu -v /usr/local/cuda-10.0:/usr/local/cuda --net=host -v /root/disk-tx2/:/root/disk -v /dev:/dev -ti tf-base-container:latest /bin/bash

For compilation following command is used. this command successfully generated libtensorflow_cc.so.2.0.0

bazel build --config=opt --config=v2 --config=noaws --config=nohdfs --config=noignite --config=nokafka --config=monolithic --config=cuda --config=numa  --verbose_failures //tensorflow:libtensorflow_cc.so

But when I tried running a sample program (inside the same container on Tx2) which is using the generated library libtensorflow_cc.so.2.0.0. Below is the error I am facing.
$ ./object_detection obj_139.jpg frozen_inference_graph.pb label_map.pbtxt

Height 1200 Width 1920                                                                                       
labels path ../demo/asset-inference-graph/label_map.pbtxt 
graph path  ../demo/asset-inference-graph/frozen_inference_graph.pb                                                                                               
2020-07-07 04:02:50.339682: E /root/disk/tx2/object_detection_demo/main_buffer.cpp:393] graph_path:../demo/asset-inference-graph/frozen_inference_graph.pb                                                                       
2020-07-07 04:02:50.442481: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1                                                                                                                  
2020-07-07 04:02:50.451361: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:973] ARM64 does not support NUMA - returning NUMA node zero                                                                                                             
2020-07-07 04:02:50.451517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3 pciBusID: 0000:00:00.0                                                                                             
2020-07-07 04:02:50.451562: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.                                                                                           
2020-07-07 04:02:50.451656: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:973] ARM64 does not support NUMA - returning NUMA node zero
2020-07-07 04:02:50.451809: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:973] ARM64 does not support NUMA - returning NUMA node zero
2020-07-07 04:02:50.451905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-07-07 04:03:42.801867: E tensorflow/core/common_runtime/session.cc:78] Failed to create session: Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: unknown error
Segmentation fault (core dumped)

Appreciate if someone can help me in this. Let me know if more info is needed.

Request: Provide TF 2.3 in ml container / Provide Jupyter in TF / Torch containers

Hi, thanks for the excellent work here, but might I suggest you provide the ml-container with the more current tensorflow 2.3 (possibly as an option) rather that the (outdated) tensorflow 1.15?

Furthermore, I believe it would be fantastic if you could provide the additional ml packages, especially Jupyter, also in the tensorflow respectively pytorch pacakges.

Best regards,

Reinhold

Compatibility problems when trying to run ROS2 packages

Hello,

I am a beginner in ROS2. I installed the container for ROS2 Eloquent and tested the image with the scripts. Everything seems to be fine.

In my container I installed a lot of tools (python, pip, torch) etc, and then proceeded to install some of the packages mentioned here. I am using a JetBot.

However, I cannot run any of the packages, and I get the same error message for every one of them. It says:

Failed to load entry point 'launch': cannot import name 'InvalidLaunchFileError' Traceback (most recent call last): File "/opt/ros/eloquent/bin/ros2", line 11, in <module> load_entry_point('ros2cli==0.8.8', 'console_scripts', 'ros2')() File "/opt/ros/eloquent/lib/python3.6/site-packages/ros2cli/cli.py", line 45, in main required=False) File "/opt/ros/eloquent/lib/python3.6/site-packages/ros2cli/command/__init__.py", line 112, in add_subparsers command_parser, '{cli_name} {name}'.format_map(locals())) File "/opt/ros/eloquent/lib/python3.6/site-packages/ros2topic/command/topic.py", line 32, in add_arguments parser, cli_name, '_verb', verb_extensions, required=False) File "/opt/ros/eloquent/lib/python3.6/site-packages/ros2cli/command/__init__.py", line 112, in add_subparsers command_parser, '{cli_name} {name}'.format_map(locals())) File "/opt/ros/eloquent/lib/python3.6/site-packages/ros2topic/verb/echo.py", line 63, in add_arguments parser, is_publisher=False, default_preset='sensor_data') File "/opt/ros/eloquent/lib/python3.6/site-packages/ros2topic/api/__init__.py", line 198, in add_qos_arguments_to_argument_parser .format(verb, default_profile.reliability.short_key)) AttributeError: 'QoSReliabilityPolicy' object has no attribute 'short_key'

Any ideas on what this could mean?

Thank you.

Mounted directory into the container does not include any files it seems empty.

Enabling Opencv with jetcam

hi Dustin,

Thanks for the docker files and the instructions.

I have a couple of questions with regard to expanding the functionality of the images. My attempt is to get a bare minimum docker with TF 2 + jetcam.

Installing opencv: For the bare-metal device (Nano), there is a package called nvidia-opencv installed through apt, but I cant seem to get it installed in a docker image. How can I add that specific apt source to my docker image?
I cant seem to get the camera working though jetcam library in the docker image. Installation works, but upon importing the python library and starting the cam there is a random error. I have a feeling that something is missing. Is it possible to add some extra information about enabling features like the jetcam though a docker image?

The image nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.4.4 has the cam + pytorch installed, but it is closed source (or at least I cant find the Dockerfile).

Thanks and regards,
Thusitha

Best practice for using ROS + ML/Vision stack?

To setup both a ML/Vision stack (pytorch, librealsense, numpy, etc.) and a ROS stack (Noetic or ROS2) on the Jetson NX, what is the best practice? My understanding is that this repo presents separate docker containers for the ml stack and ROS.

Should I merge the dockerfiles of the ROS and ml stack container in this repo, or perhaps refer to https://github.com/dusty-nv/ros_deep_learning ?

If some modules such as h5py is needed how can i add it to the container

Issue when building pytorch from dockerfile

Hi there, I keep on having an issue when I try to build the pytorch container - specifically it fails when it tries to clone in torchvision and gives me the following error:

Step 15/23 : RUN git clone -b ${TORCHVISION_VERSION} https://github.com/pytorch/vision torchvision && cd torchvision && python3 setup.py install && cd ../ && rm -rf torchvision && pip3 install "${PILLOW_VERSION}"
---> Running in b6f708c93740
Cloning into 'torchvision'...
Note: checking out '78ed10cc51067f1a6bac9352831ef37a3f842784'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b

Traceback (most recent call last):
File "setup.py", line 13, in
import torch
File "/usr/local/lib/python3.6/dist-packages/torch/init.py", line 188, in
_load_global_deps()
File "/usr/local/lib/python3.6/dist-packages/torch/init.py", line 141, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/usr/lib/python3.6/ctypes/init.py", line 348, in init
self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
The command '/bin/sh -c git clone -b ${TORCHVISION_VERSION} https://github.com/pytorch/vision torchvision && cd torchvision && python3 setup.py install && cd ../ && rm -rf torchvision && pip3 install "${PILLOW_VERSION}"' returned a non-zero code: 1

I've tried remiving the -b argument but that also doesn't seem to help. I'm very new to docker so there's a chance i'm missing something obvious, but I'd really appreciate any suggestions

Unable to install ros package with noetic container

Within the noetic container, trying to do:
apt update
apt install ros-noetic-rgbd-launch

return the following error:
E: Unable to locate package ros-noetic-rgbd-launch

as I can find this package on my PC, I suspect an architecture issue?

GStreamer Support with OpenCV

OpenCV doesn't have gstreamer support when built?

Adding these to the opencv build?

-D WITH_GSTREAMER=ON \
-D WITH_GSTREAMER_0_10=OFF \
-D VIDEOIO_PLUGIN_LIST=gstreamer \

This could be seperate tags. One for ffmpeg support and one for gstreamer support. Any thoughts?

JetPack 4.5.1 and Ros2 Foxy (Dockerfile) Installation Error

I am trying to install the dockerfile on a freshly flashed image of JP 4.5.1. Here is the snippet of the log after installation:

--- stderr: cyclonedds
You have called ADD_LIBRARY for library ddsc without any source files. This typically indicates a problem with your CMakeLists.txt file
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/usr/share/maven/lib/guice.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
---
...
--- stderr: libyaml_vendor
CMake Error at CMakeLists.txt:108 (target_link_directories):
  Unknown CMake command "target_link_directories".


---
Failed   <<< libyaml_vendor [7.83s, exited with code 1]
...
Summary: 88 packages finished [20min 22s]
  1 package failed: libyaml_vendor
  3 packages aborted: fastrtps rmw rosidl_typesupport_introspection_cpp
  5 packages had stderr output: cyclonedds foonathan_memory_vendor libyaml_vendor mimick_vendor rcutils
  108 packages not processed

I can install eloquent just fine however I am having trouble with foxy.

exec error when using docker file

Hi, I tried to run the build script to build noetic (although it fails at the same place on any script).
I get this error:
standard_init_linux.go:211: exec user process caused "exec format error"
The command '/bin/sh -c apt-get update && apt-get install -y --no-install-recommends git cmake build-essential curl wget gnupg2 lsb-release && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 1

Can you explain why?
I tried running with sudo but no difference
One question, (and apologies for my docker newbie status) I am building on a x86 amd machine, could this be the issue?
Do I have to create the image on a jetson?
Thanks

help in the container Image for Jetpack 4.3

Hi, I would like to knw where to find similar container for Jetpack 4.3

The size of cuda-relative library files are ZERO byte

Hello
How are you?
I am building an own Jetson docker image from nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf1.15-py3 or nvcr.io/nvidia/l4t-pytorch:r32.4.3-pth1.6-py3 on x86_64 host machine rather than Jetson.
However, I faced the following issue during building opencv-4.2.0 with cuda support.

So I checked this cublas library file in the base container by using the following command on x86_64 host machine.

$ sudo docker run nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf1.15-py3 ls -la /usr/lib/aarch64-linux-gnu/

What is strange is that the sizes of most cuda-relative library so files are ZERO byte.

I am confused.
How should I understand this?
Thanks

for docker file for ros noetic change python3-vcstool to python3-vcstools

typo?
for docker file for ros noetic change python3-vcstool to python3-vcstools

An Issue when running an app using TensorRT model on docker container

Hello
How are you?
Thanks for contributing this project.
I made a jetson-docker image from the l4t image: nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf1.15-py3.
I am using Jetson-Xavier NX.
I installed jetson sd card image for JetPack 4.4 on my jetson.
CUDA: 10.2
cuDNN: 8.0
TensorRT: 7.1.3
I made a tensorrt engine for deep learning model and this model engine works well on host(Jetson).
I built another tensorrt engine and an app with this trt engine on docker container too.
I ran the docker container as the following on host(Jetson).
sudo docker run -it --runtime nvidia myimage
I met the following issue when started the app using the above tensorrt engine on this container.

[E] [TRT] coreReadArchive.cpp (38) - Serialization Error in verifyHeader: 0 (Version tag does not match)
[E] [TRT] INVALID_STATE: std::exception
[E] [TRT] INVALID_CONFIG: Deserialize the cuda engine failed.

I thought that this might be due to mismatch of tensorrt version and compared two versions but two versions are equal.
How can I fix this issue?
Thanks

System reboot when colcon build

Hi! my problem is Jetson will reboot when colcon build ros foxy (desktop).
Maybe you know why?

(trying in power mode 0, 3, and 7)

Using docker command without sudo

Greetings,

I would like to use the docker command without sudo on Jetson Nano, for example docker run ... & docker build... etc.

I followed the instructions mentioned here, but still I get the following error

ERRO[0000] failed to dial gRPC: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: dial unix /var/run/docker.sock: connect: permission denied 
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/build?buildargs=%7B%22BASE_IMAGE%22%3A%22nvcr.io%2Fnvidia%2Fl4t-base%3Ar32.4.4%22%7D&cachefrom=%5B%5D&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile.ros.melodic.px4&labels=%7B%7D&memory=0&memswap=0&networkmode=default&rm=1&session=x10bzcexctbwk772bu1welkbp&shmsize=0&t=ros%3Amelodic-ros-px4-l4t-r32.4.4&target=&ulimits=null&version=1: dial unix /var/run/docker.sock: connect: permission denied

How can I solve this problem?

Thanks

building an image of ros-foxy-desktop

I replace change the value of the argument ROS_PKG from "ros_base" to "desktop" (and change the tag in the build script correspondingly) and get the following error after cloning into demos:

ERROR: Rosdep experienced an error: Multiple packages found with the same name "demo_nodes_cpp":

demo_nodes_cpp
demos/demo_nodes_cpp
Multiple packages found with the same name "demo_nodes_py":
demo_nodes_py
demos/demo_nodes_py
Please go to the rosdep page [1] and file a bug report with the stack trace below.
[1] : http://www.ros.org/wiki/rosdep
rosdep version: 0.20.0

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/rosdep2/main.py", line 144, in rosdep_main
exit_code = _rosdep_main(args)
File "/usr/lib/python3/dist-packages/rosdep2/main.py", line 430, in _rosdep_main
return _package_args_handler(command, parser, options, args)
File "/usr/lib/python3/dist-packages/rosdep2/main.py", line 485, in _package_args_handler
pkgs = find_catkin_packages_in(path, options.verbose)
File "/usr/lib/python3/dist-packages/rosdep2/catkin_packages.py", line 33, in find_catkin_packages_in
packages = find_packages(path)
File "/usr/lib/python3/dist-packages/catkin_pkg/packages.py", line 96, in find_packages
raise RuntimeError('\n'.join(duplicates))
RuntimeError: Multiple packages found with the same name "demo_nodes_cpp":

demo_nodes_cpp
demos/demo_nodes_cpp
Multiple packages found with the same name "demo_nodes_py":
demo_nodes_py
demos/demo_nodes_py

Cameras not listed in /dev/video*

Hey @dusty-nv,

Thanks for putting this together! I have a Jetson Nano running L4T version 32.3.1, so, for testing purposes, I pulled the L4T base image called nvcr.io/nvidia/l4t-base:r32.3.1.

I start the image with the command sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.3.1.

I have one USB 2.0 camera connected to the device, and when I run the command ls /dev/video* in the image, the output is ls: cannot access '/dev/video*': No such file or directory. However, if I run this command on the host OS, the output is as follows: /dev/video0 /dev/video1

Why is the image not able to see the camera at this location? I assume that there may be a package missing from the base image that needs to be installed. Let me know if this isn't the right place for such a question and I'll move it to the appropriate location.

Docker x86

Hi!

I know this is only tangentially related to this repo, but you seem to know a lot about the topic and I can't seem to get an answer anywhere.

Basically I want to build a container for jetson with the trt_pose repo from nvidia installed. (https://github.com/NVIDIA-AI-IOT/trt_pose) But I want to do the building on a x86 system and run on Xavier NX.

Here is a minimal docker file:

# This includes L4T (with CUDA etc) and PyTorch 1.6
FROM nvcr.io/nvidia/l4t-pytorch:r32.4.4-pth1.6-py3

# Use bash from here
SHELL ["/bin/bash", "-c"]

# Torch2TRT
RUN cd ~ \
 && git clone https://github.com/NVIDIA-AI-IOT/torch2trt \
 && cd torch2trt && python3 setup.py install --plugins

# Trt_pose
RUN pip3 install tqdm cython pycocotools \
 && apt-get install -y python3-matplotlib \
 && cd ~ \
 && git clone https://github.com/NVIDIA-AI-IOT/trt_pose \
 && cd trt_pose && python3 setup.py install

# Start command line on start
CMD ["/bin/bash"]

I have nvidia-docker installed and as default runtime. Qemu support and cuda cross-compile installed.

But I get the following error:

Cloning into 'torch2trt'... Traceback (most recent call last): File "setup.py", line 2, in <module> import torch File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 188, in <module> _load_global_deps() File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 141, in _load_global_deps ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL) File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__ self._handle = _dlopen(self._name, mode) OSError: libcurand.so.10: cannot open shared object file: No such file or directory

What can I do to solve it?

can't run latest release of l4t-pytorch

max@jetson-tx2-0:~$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r32.4.2-pth1.5-py3
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --compat32 --graphics --utility --video --display --pid=15566 /var/lib/docker/overlay2/d7c28eeff71d3aebfac3cce5ded7bd61a894f1f2d4922bf1966d993f2da86fc0/merged]\\\\nnvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/d7c28eeff71d3aebfac3cce5ded7bd61a894f1f2d4922bf1966d993f2da86fc0/merged/usr/lib/libvisionworks.so: file exists\\\\n\\\"\"": unknown.

not sure why it says file exists but no such file exists on the host file system

same issue with the ML container

max@jetson-tx2-0:~$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-ml:r32.4.2-py3
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --compat32 --graphics --utility --video --display --pid=16209 /var/lib/docker/overlay2/6944baa980c3a73e586bbc18026cf7c4f6ff8d89d0d1f879a63d5ddb5770b182/merged]\\\\nnvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/6944baa980c3a73e586bbc18026cf7c4f6ff8d89d0d1f879a63d5ddb5770b182/merged/usr/lib/libvisionworks.so: file exists\\\\n\\\"\"": unknown.

edit:

apparently this is because on host i already have libvisionworks.so installed at /usr/lib/.
mv /usr/lib/libvisionworks.so /usr/lib/libvisionworks.so.bk

solves the problem.

not sure why this is necessary? doesn't the layered fs take care of this?

ROS foxy build fails

Hi. When I'm trying to build ROS Foxy Dockerfile it is keeps on failing on step:
RUN cd ${ROS_ROOT} && colcon build --symlink-install

The error code is:

Starting >>> rosidl_typesupport_introspection_cpp
--- stderr: libyaml_vendor
CMake Error at CMakeLists.txt:108 (target_link_directories):
Unknown CMake command "target_link_directories".

Failed <<< libyaml_vendor [6.84s, exited with code 1]

Aborted <<< rosidl_typesupport_introspection_cpp [5.98s]
Aborted <<< rmw [1min 37s]
Aborted <<< fastrtps [14min 48s]

Summary: 88 packages finished [17min 19s]
1 package failed: libyaml_vendor
3 packages aborted: fastrtps rmw rosidl_typesupport_introspection_cpp
5 packages had stderr output: cyclonedds foonathan_memory_vendor libyaml_vendor mimick_vendor rcutils
108 packages not processed
The command '/bin/sh -c cd ${ROS_ROOT} && colcon build --symlink-install' returned a non-zero code: 1

Can you please advise me with this?
P.S: I was trying to remove cmake version 3.10.2 which is installing by default and install 3.13.5 but it doesn't really helped me

A custom container built with CUDA dies on start

Hi,

I've tried to build a custom docker image based on nvcr.io/nvidia/l4t-ml:r32.5.0-py3 and nvcr.io/nvidia/l4t-base:r32.5.0 with CUDA support on Jetson Xavier NX. The build itself has successfully finished, but when I try to run my container either with docker or docker-compose, it fails to start and immediately exits with 132 code. Don't see anything specific in logs:

container die 2846517886ea47c56e93c36ddf61a57f7728ea4e93a88b2f349f140e0b280c44 (exitCode=132, image=sskorol/ws, name=bold_gagarin)

Here's the Dockerfile:

FROM nvcr.io/nvidia/l4t-base:r32.5.0

RUN apt-get update && apt-get install -y --no-install-recommends \
	make \
	cmake \
	g++ \
	git \
	swig \
	nano \
	python3 \
	python3-pip \
	python3-dev \
	gcc-8 \
	g++-8 \
	gfortran \
	gfortran-8 \
	automake \
	autoconf \
	unzip \
	libtool \
	subversion

RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 8 --slave /usr/bin/g++ g++ /usr/bin/g++-8 --slave /usr/bin/gcov gcov /usr/bin/gcov-8
RUN update-alternatives --install /usr/bin/gfortran gfortran /usr/bin/gfortran-8 8

RUN pip3 install setuptools wheel cython

RUN echo "Preparing Kaldi..." && git clone -b lookahead-1.8.0 --single-branch https://github.com/alphacep/kaldi /opt/kaldi
RUN cd /opt/kaldi/tools && \
	sed -i 's:status=0:exit 0:g' extras/check_dependencies.sh && \
	sed -i 's:CXXFLAGS = -g -O3 -msse -msse2:CXXFLAGS = -g -O3 -march=armv8.2-a -mcpu=cortex-a76:g' Makefile && \
	sed -i 's:--enable-ngram-fsts:--enable-ngram-fsts --disable-bin:g' Makefile && \
	echo "Making openfst..." && \
	make -j $(nproc) openfst cub
RUN cd /opt/kaldi/tools && echo "Making openblas..." && extras/install_openblas_clapack.sh

RUN cd /opt/kaldi/src && \
	./configure --mathlib=OPENBLAS_CLAPACK --shared && \
	sed -i 's: -O1 : -O3 -march=armv8.2-a :g' kaldi.mk && \
	echo "Making kaldi..." && \
	make -j $(nproc) online2 lm rnnlm

RUN echo "Making vosk..." && git clone -b gpu --single-branch https://github.com/sskorol/vosk-api /opt/vosk-api
RUN cd /opt/vosk-api/python && \
	KALDI_MKL=0 KALDI_ROOT=/opt/kaldi KALDI_CUDA=1 python3 ./setup.py install --single-version-externally-managed --root=/

RUN echo "Cleaning up..." && rm -rf /opt/vosk-api && \
	rm -rf /opt/kaldi && \
	rm -rf /root/.cache && \
	rm -rf /var/lib/apt/lists/*

When this image is built, I use it as a base for my python code. Nothing specific:

FROM sskorol/vosk-api

WORKDIR app
COPY . /app

RUN pip3 install -r requirements.txt

EXPOSE 2701

CMD ["python3", "main.py"]

I tried to run it the follogin way:

docker run --gpus all --net=host --rm --runtime nvidia -e VOSK_LANG=ru -e VOSK_SAMPLE_RATE=16000 -e VOSK_HOST=0.0.0.0 -e VOSK_PORT=2701 -e PYTHONUNBUFFERED=1 -v $PWD/model-ru:/app/model-ru sskorol/ws

As well as with docker-compose:

version: '3.7'

services:
  ws:
    image: "sskorol/ws:latest"
    ports:
      - "2701:2701"
    environment:
      - VOSK_LANG=ru
      - VOSK_SAMPLE_RATE=16000
      - VOSK_HOST=0.0.0.0
      - VOSK_PORT=2701
    volumes:
      - "${PWD}/model-ru:/app/model-ru"
    networks:
      - vosk
    deploy:
      resources:
        reservations:
          devices:
            - capabilities:
              - gpu
networks:
  vosk:
    driver: bridge

Note that I also enabled default nvidia runtime in daemon config:

{
    "data-root": "/home/[USERNAME]/nvme/docker",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

Docker version:

Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Fri Dec 18 12:25:49 2020
 OS/Arch:           linux/arm64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Thu Dec 10 13:23:49 2020
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.4
  GitCommit:
 nvidia:
  Version:          spec: 1.0.1-dev
  GitCommit:
 docker-init:
  Version:          0.18.0
  GitCommit:

Would be greatly appreciated any help.

How to run GUI from docker file

How can I open GUI for Rviz, gvim, Gazebo in the docker container?

System:

Board: Jetson AGX Xavier
Jetpack: 4.4.1

support for Tensorflow version 2

Does Jetson support TF2 such that it can be supported in a container?

The following Release Notes suggest this should be supported. However this mentions the complete TF container which is almost double the size.

gcc: internal compiler error: Segmentation fault (program cc1)

I tried to build darknet (yolo) on the image - but it failed with:
gcc -Iinclude/ -Isrc/ -DGPU -I/usr/local/cuda/include/ -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DGPU -c ./src/gemm.c -o obj/gemm.o
gcc: internal compiler error: Segmentation fault (program cc1)

Any idea ?
Thanks
J.

./scripts/docker_build_ml.sh all fails with “unauthorized: authentication required"

Hi Dusty,
I am trying to build the docker container on my nano (Jetpack 4.4.1), but the process halts with the message "unauthorized: authentication required"

I cloned the repository:
git clone https://github.com/dusty-nv/jetson-containers.git

cd ed into the repository:
cd jetson-containers/

./scripts/docker_build_ml.sh all

yields the following output:
reading L4T version from /etc/nv_tegra_release
L4T BSP Version: L4T R32.4.4
l4t-base image: nvcr.io/nvidian/nvidia-l4t-base:r32.4.4
building PyTorch torch-1.6.0-cp36-cp36m-linux_aarch64.whl, torchvision v0.7.0 (pillow), torchaudio v0.6.0
Building l4t-pytorch:r32.4.4-pth1.6-py3 container...
[sudo] password for XXXX:
Sending build context to Docker daemon 183.3kB
Step 1/22 : ARG BASE_IMAGE=nvcr.io/nvidia/l4t-base:r32.4.3
Step 2/22 : FROM ${BASE_IMAGE}
unauthorized: authentication required

Can you help me out of the ditch? I would eventually like to alter the Dockerfile in order to create my own version, but I already get stuck here.

Best,
Jens

Doesn't build

$ ./scripts/docker_build_ros.sh all > log.txt 2>&1

See attached log.txt. Thanks!

Jetson Xavxer NX
JetPack 4.5

log.txt

How do I run the container inside visual studio code ?

Run locally

I've built all the containers successfully.

But I've NO idea how to run them.

Any tip?

Has the PyTorch 1.7 image been pushed to a repository?

I'm building a project on the PyTorch image and I want to use PyTorch 1.7. It looks like I can build it from this repository now but I can't seem to find it as a built image in NGC or Docker Hub. Is there a built image?

cudnn + tensorrt container

Thanks for putting this together!

Do you have any plans to add a tensorrt container (with cudnn)?

I'm looking to use an l4t-container to build an application that depends on tensorrt (in this case 7.6.3) for the jetson nano.

OpenCV and nvidia-l4t-base

Hey @dusty-nv, I'm new to using docker and I'm trying to figure out the best way to include OpenCV in my docker image that I am building on top of nvidia-l4t-base. In order to reduce the overall size of the image, I think the best way to do this is to use the opencv installation that is installed when the Jetson is flashed. I think this is the same general approach that you take to keep nvidia-l4t-base relatively small - you use the host's CUDA installation rather than copying it into the docker image.

Is it possible to do the same with the host's OpenCV installation? If so, could you point me to any relevant docs that could show me how to do this?

Use the cpp implementation of protobuf for Tensorflow image

The default implementation of protobuf is extremely slow and should be replace with the optimized one.
More context and install script are available on
https://jkjung-avt.github.io/tf-trt-revisited/

catkin build

Hi, what do i change to run catkin build as opposed to catkin_make? Can you help with this?

Issue when building pytorch image

When attempting to build a pytorch image I run:
./scripts/docker_build_ml.sh pytorch
and get the following when it gets to torchvision:
From some responses I've seen on google this might be a CUDA version issue, but I'm a bit in over my head so not sure how to proceed.

`
Step 15/22 : RUN git clone -b ${TORCHVISION_VERSION} https://github.com/pytorch/vision torchvision && cd torchvision && python3 setup.py install && cd ../ && rm -rf torchvision && pip3 install "${PILLOW_VERSION}"
---> Running in 18d6e6d7c9d0
Cloning into 'torchvision'...
Note: checking out '78ed10cc51067f1a6bac9352831ef37a3f842784'.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b

I can't create a new file, the file system is reported as read-only

Dockerfile FROM nvcr.io/nvidia/l4t-ml:r32.4.3-py3
Step 16/19 : RUN cd /usr/local/cuda/targets/aarch64-linux/lib && mkdir 123
---> Running in d0f4ef0fa4c4
mkdir: cannot create directory '123': Read-only file system

pull access denied

Pull access denied for nvcr.io/nvidia-l4t-base, repository does not exist or may require 'docker login' denied: requested access to the resource is denied

Is it possible to pull the image?

build pytorch c++ api from this image?

Hi, I would like to build pytorch c++ api based upon your pytorch 1.5 image. However it seems that pytorch in the image is built from wheel file so that I cannot build c++ api from source. Moreover, it seems that this image doesn't have cudnn and its header files. Could you please give me some suggestions about how to build pytorch c++ api in your image? Thank you!

ROS docker version is still 32.4.3 even if I use Jetpack4.4.1

But version for ROS is still r32.4.3 in docker file and docker build script.
Is this no harm to build docker image for Jetpack4.4.1(=r32.4.4)

Dockerfile.ros.eloquent
ARG BASE_IMAGE=nvcr.io/nvidia/l4t-base:r32.4.3

docker_build_ros.sh
BASE_IMAGE="nvcr.io/nvidia/l4t-base:r32.4.3"

Thanks!

Ros foxy with ubuntu focal (20.04)

There is some provision to support ubuntu 20.04 with ros foxy?

Can not find cudnn runtime libraries inside the l4t-pytorch docker

I pull the l4t-pytorch docker using docker pull nvcr.io/nvidia/l4t-pytorch:r32.4.3-pth1.6-py3. After I run the docker, I can not find any libcudnn8 libraries using dpkg -l | grep cudnn. Anyone has any ideas about this?

can't import torch in latest release of l4t-pytorch

max@jetson-tx2-0:~$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r32.4.2-pth1.5-py3
root@jetson-tx2-0:/# python
Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
root@jetson-tx2-0:/# python3
Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 135, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 93, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvToolsExt.so.1: cannot open shared object file: No such file or directory

edit:

adding this to .bashrc

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

kicks the can down to

root@jetson-tx2-0:/# python3 -c "import torch"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 135, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 93, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory

which i do not have

root@jetson-tx2-0:/# find / -name libcudart.so
/usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart.so

How to run ROS2 foxy container and source ros

Hello dusty_nv. Thank you for creating these containers and making this tutorial. I used it to run ros2 FOXY, and I was able to download and test the container and it passed the test. However, I am new to docker and do not know how to run the container so I can source Foxy and run ROS2 commands. Also, do I have to worry about my workspace getting overriden/not saved when I stop the container, or if I run the container will I be able to work in my ws and when I stop the container the ws will still be saved?

Jetpack 4.5 support

Jetpack 4.5 is currently not supported.
Is the a patch in the development queue ?
THX

Question about building the containers

I did as instructed to build the ML containers, using the following command:

./scripts/docker_build_ml.sh all

It took several hours to run.
Afterwards, I ran the following command in order to test the containers:

./scripts/docker_test_ml.sh all

I got the output down below. How do I know if it's ok or not?
If not, what should I do next?
Thanks for the help!

nvidia@Jetson:~/jetson-containers$ ./scripts/docker_test_ml.sh all
reading L4T version from /etc/nv_tegra_release
L4T BSP Version: L4T R32.4.4
testing container l4t-pytorch:r32.4.4-pth1.6-py3 => PyTorch
localuser:root being added to access control list
testing PyTorch...
PyTorch version: 1.6.0
CUDA available: True
cuDNN version: 8000
Tensor a = tensor([0., 0.], device='cuda:0')
Tensor b = tensor([0.5379, 0.0701], device='cuda:0')
Tensor c = tensor([0.5379, 0.0701], device='cuda:0')
testing LAPACK (OpenBLAS)...
done testing LAPACK (OpenBLAS)
testing torch.nn (cuDNN)...
done testing torch.nn (cuDNN)
PyTorch OK
\ndownloading data for testing torchvision...
test/data/ILSVRC2012_img_val_subset_5k.tar.gz: No such file or directory

torchvision.ops.nms fails on GPU data inside the container nvcr.io/nvidia/l4t-pytorch:r32.4.2-pth1.3-py3, but works as expected on the host OS

Hey @dusty-nv

I've been having an issue running my object detection model within the container nvcr.io/nvidia/l4t-pytorch:r32.4.2-pth1.3-py3. Specifically, when running inference, I call torchvision.ops.nms in order to perform non-maximum suppression on the objects detected by the network. When doing inference in the container, this gives the following error:

File "/usr/local/lib/python3.6/dist-packages/torchvision-0.4.2-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py", line 33, in nms
RuntimeError: CUDA error: no kernel image is available for execution on the device (nms_cuda at /torchvision/torchvision/csrc/cuda/nms_cuda.cu:127)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x78 (0x7f5e7378d8 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: nms_cuda(at::Tensor const&, at::Tensor const&, float) + 0x710 (0x7f3e0eb51c in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #2: nms(at::Tensor const&, at::Tensor const&, float) + 0x114 (0x7f3e08ae7c in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #3: <unknown function> + 0x73b70 (0x7f3e0bab70 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #4: <unknown function> + 0x70248 (0x7f3e0b7248 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #5: <unknown function> + 0x69718 (0x7f3e0b0718 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #6: <unknown function> + 0x699e4 (0x7f3e0b09e4 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #7: <unknown function> + 0x534a4 (0x7f3e09a4a4 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
<omitting python frames>
frame #9: python3() [0x529958]
frame #11: python3() [0x527860]
frame #12: python3() [0x5297dc]
frame #14: python3() [0x528ff0]
frame #17: python3() [0x5f2bcc]
frame #20: python3() [0x528ff0]
frame #23: python3() [0x5f2bcc]
frame #25: python3() [0x595e5c]
frame #28: python3() [0x528ff0]
frame #31: python3() [0x5f2bcc]
frame #34: python3() [0x528ff0]
frame #37: python3() [0x5f2bcc]
frame #39: python3() [0x595e5c]
frame #41: python3() [0x529738]
frame #43: python3() [0x527860]
frame #44: python3() [0x5297dc]
frame #46: python3() [0x528ff0]
frame #51: __libc_start_main + 0xe0 (0x7f9d2256e0 in /lib/aarch64-linux-gnu/libc.so.6)
frame #52: python3() [0x420e94]

Segmentation fault (core dumped)

To simplify the debugging process, I've come up with a minimal program that gives the same error as above:

import torch
import torchvision
bboxes = [[0.0, 0.0, 2.0, 2.0], [0.75, 0.75, 1.0, 1.0]]
scores = torch.tensor([1., 0.5]).cuda()
boxes = torch.tensor(bboxes).cuda()
keep = torchvision.ops.nms(boxes, scores, 0.7)
print(keep)

When running this code from within the container, I get essentially the same error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.4.2-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py", line 33, in nms
RuntimeError: CUDA error: no kernel image is available for execution on the device (nms_cuda at /torchvision/torchvision/csrc/cuda/nms_cuda.cu:127)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x78 (0x7f8adb98d8 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: nms_cuda(at::Tensor const&, at::Tensor const&, float) + 0x710 (0x7f6541151c in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #2: nms(at::Tensor const&, at::Tensor const&, float) + 0x114 (0x7f653b0e7c in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #3: <unknown function> + 0x73b70 (0x7f653e0b70 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #4: <unknown function> + 0x70248 (0x7f653dd248 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #5: <unknown function> + 0x69718 (0x7f653d6718 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #6: <unknown function> + 0x699e4 (0x7f653d69e4 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
frame #7: <unknown function> + 0x534a4 (0x7f653c04a4 in /root/.cache/Python-Eggs/torchvision-0.4.2-py3.6-linux-aarch64.egg-tmp/torchvision/_C.so)
<omitting python frames>
frame #9: python3() [0x529958]
frame #11: python3() [0x527860]
frame #12: python3() [0x5297dc]
frame #14: python3() [0x528ff0]
frame #15: python3() [0x63075c]
frame #20: __libc_start_main + 0xe0 (0x7fb7aa66e0 in /lib/aarch64-linux-gnu/libc.so.6)
frame #21: python3() [0x420e94]

However, when I run this on the host OS, there are no errors. Here is the output of running jetson_release on that device (note that it has torch 1.3 and torchvision 0.4.2 installed as well):

 - NVIDIA Jetson Nano (Developer Kit Version)
   * Jetpack 4.4 DP [L4T 32.4.2]
   * NV Power Mode: MAXN - Type: 0
   * jetson_clocks service: inactive
 - Libraries:
   * CUDA: 10.2.89
   * cuDNN: 8.0.0.145
   * TensorRT: 7.1.0.16
   * Visionworks: 1.6.0.501
   * OpenCV: 4.1.1 compiled CUDA: NO
   * VPI: 0.2.0
   * Vulkan: 1.2.70

And the output of running the minimum program is tensor([0, 1], device='cuda:0'). Do you know why this program fails to run from within the container?

ros melodic issue with the environment

Dear all,
I'm hitting an issue with the script to build the ros melodic image.
Once completed with no errors, and I log into the container, I get the following error:

bash: /opt/ros/melodic/share/ros/setup.bash: No such file or directory

Basically the environment is messed up. Look:

root@roomba2:/# env

LD_LIBRARY_PATH=/opt/ros/melodic/lib:/usr/local/cuda-10.2/targets/aarch64-linux/lib:
DISPLAY=localhost:10.0
HOSTNAME=roomba2
ROS_ETC_DIR=/opt/ros/melodic/etc/ros
NVIDIA_VISIBLE_DEVICES=all
PWD=/
HOME=/root
CMAKE_PREFIX_PATH=/opt/ros/melodic
DEBIAN_FRONTEND=noninteractive
ROS_ROOT=/opt/ros/melodic/share/ros
ROS_MASTER_URI=http://localhost:11311
ROS_VERSION=1
TERM=xterm
ROS_PYTHON_VERSION=2
NVIDIA_DRIVER_CAPABILITIES=all
SHLVL=1
PYTHONPATH=/opt/ros/melodic/lib/python2.7/dist-packages
ROS_PACKAGE_PATH=/opt/ros/melodic/share
ROSLISP_PACKAGE_DIRECTORIES=
PATH=/opt/ros/melodic/bin:/usr/local/cuda-10.2/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PKG_CONFIG_PATH=/opt/ros/melodic/lib/pkgconfig
ROS_DISTRO=melodic
_=/usr/bin/env

I will look how to fix it, however if you have already a solution, any help is welcome.

Cheers