Comments (19)
We updated the version logic to use `git describe --tags` to extract the version information. The issue here is that the `v1.14.4` and `v1.14.5` tags are the same. To override the version you can set the `LIB_VERSION` and `LIB_TAG` make variables.
This is the logic that we use when building this as part of the NVIDIA Container Toolkit repo here: https://github.com/NVIDIA/nvidia-container-toolkit/blob/0409824106214a55df4c89a41f12c48f492cd51b/scripts/build-all-components.sh#L58-L64
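As an illustration, the tag-to-version split might look something like this in shell (a hypothetical sketch; the exact parsing in the linked build script may differ, and the tag value below is made up):

```shell
# Hypothetical sketch of splitting a `git describe --tags` result into the
# LIB_VERSION and LIB_TAG make variables mentioned above. The tag value and
# the parsing rules are assumptions for illustration, not the actual script.
tag="v1.14.5-rc.1"             # e.g. output of: git describe --tags
version="${tag#v}"             # strip the leading "v"  -> "1.14.5-rc.1"
LIB_VERSION="${version%%-*}"   # before the first dash  -> "1.14.5"
LIB_TAG="${version#*-}"        # after the first dash   -> "rc.1"
echo "make LIB_VERSION=${LIB_VERSION} LIB_TAG=${LIB_TAG}"
```

With values extracted this way, the build could then be invoked as `make LIB_VERSION=... LIB_TAG=...` to override what `git describe` reports.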
from libnvidia-container.
@elezar thanks for the information, I'll look into how I can implement that in my build toolchain.

Anyway, I have to investigate a bit further, since even with version 1.14.4 and driver version 550.54.14 I can't use my T400 in Docker containers.
@elezar did something else change too? I now get this error when trying to create a container with v1.14.5:

```
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: /usr/bin/nvidia-container-runtime did not terminate successfully: exit status 2: unknown.
```

EDIT: This was caused by compiling with Go 1.22.0. After switching back to Go 1.20.13 everything is working; however, I'm still not able to use driver version 550.54.14 with Docker.
I'll close this issue since this is resolved.
@ich777 please open an issue against the nvidia-container-toolkit repo with the error messages you're seeing with the 550 driver.
@elezar I already created an issue on the NVIDIA Developer Forums here, since I don't think that it's related to libnvidia-container or nvidia-container-toolkit.

This only happens with driver 550.54.14, not with 550.40.07 and earlier.
Does `nvidia-smi` work in the container, or is it applications that are failing?
/cc @klueska
> Does `nvidia-smi` work in the container, or is it applications that are failing?

`nvidia-smi` is working just fine in the container, yes.
There was a new feature included in the 550.54.14 driver that requires additional support in the NVIDIA Container Toolkit. We are working to release a version that includes this support but are waiting for some driver components to be published.

For now, could you confirm that adding the `--device /dev/nvidia-caps-imex-channels/channel0` device to your container allows it to function?
> For now, could you confirm that adding the `--device /dev/nvidia-caps-imex-channels/channel0` device to your container allows it to function?

That does not work; I just have these devices:
```
root@Test:~# ls -la /dev/nvidia*
crw-rw-rw- 1 root root 195, 254 Feb 26 15:07 /dev/nvidia-modeset
crw-rw-rw- 1 root root 240,   0 Feb 26 15:08 /dev/nvidia-uvm
crw-rw-rw- 1 root root 240,   1 Feb 26 15:08 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Feb 26 15:07 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Feb 26 15:07 /dev/nvidiactl

/dev/nvidia-caps:
total 0
drwxr-xr-x  2 root root   80 Feb 26 15:08 ./
drwxr-xr-x 17 root root 3380 Feb 26 15:08 ../
cr--------  1 root root 244, 1 Feb 26 15:08 nvidia-cap1
cr--r--r--  1 root root 244, 2 Feb 26 15:08 nvidia-cap2
```
This is the output from `nvidia-smi` in the container (of course without the device path that you've mentioned):
```
root@8a1b7fbf37e8:/# nvidia-smi
Mon Feb 26 15:10:41 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA T400                    Off |   00000000:01:00.0 Off |                  N/A |
| 36%   38C    P0             N/A /  31W  |       0MiB /  2048MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
root@8a1b7fbf37e8:/#
```
What was your original error? You don't seem to include it in the description (unless I missed it somehow).
> What was your original error? You don't seem to include it in the description (unless I missed it somehow).

Some users on the Unraid forums reported that transcoding (i.e. NVENC) was not working in Plex, and the same goes for Jellyfin. I was able to reproduce this on my test machine and made a short post on the NVIDIA Developer Forums here where I described what I've tested so far.

If you need any logs or anything else, just let me know.
Yeah, I'm trying to understand what "does not work" means.
> Yeah, I'm trying to understand what "does not work" means.

Sorry, I just realized that I didn't provide much information...

Transcoding is not working with driver version 550.54.14. It fails on Jellyfin with this error:

```
...
frame=    1 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x
[h264_nvenc @ 0x561522fa7240] Failed locking bitstream buffer: invalid param (8):
Error submitting video frame to the encoder
[libfdk_aac @ 0x561522fa4400] 2 frames left in the queue on closing
Conversion failed!
```

and on Plex with this error ("Fehlersuche" is German for "debugging"):

```
...
Feb 26, 2024 15:29:24.384 [23302936435512] Fehlersuche — Jobs: '/usr/lib/plexmediaserver/Plex Transcoder' exit code for process 1386 is -9 (signal: Killed)
...
```

Sorry, I will try to find more useful information in the Plex log, but it basically falls back to software transcoding.
Following up on @elezar's suggestion of adding `--device /dev/nvidia-caps-imex-channels/channel0`, can you try running the following on the host (which should create `/dev/nvidia-caps-imex-channels/channel0`) and then test again:

```
nvidia-modprobe -i 0:1
```

I'm hoping this isn't an issue, but I want to rule it out.
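To make that check reproducible, here is a minimal shell sketch; the `has_chardev` helper is a name introduced here for illustration and is not part of any NVIDIA tooling:

```shell
# Small helper: returns success if the given path is a character device.
has_chardev() {
  [ -c "$1" ]
}

# On the host, the suggested sequence would then be (requires root and the
# nvidia-modprobe utility shipped with the driver):
#   sudo nvidia-modprobe -i 0:1
#   has_chardev /dev/nvidia-caps-imex-channels/channel0 && echo "channel0 present"
```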
After running `nvidia-modprobe -i 0:1` I can confirm that the device `/dev/nvidia-caps-imex-channels/channel0` was created, but sadly it's still the same when passing through this device.

The Jellyfin log gives me this:

```
...
frame=    1 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x
[h264_nvenc @ 0x55634ff34c40] Failed locking bitstream buffer: invalid param (8):
Error submitting video frame to the encoder
[libfdk_aac @ 0x55634ff2fa40] 2 frames left in the queue on closing
Conversion failed!
```
And Plex falls back to software transcoding.

Here is the `docker run` command:
```
docker run \
  -d \
  --name='Jellyfin' \
  --net='bridge' \
  -e TZ="Europe/Berlin" \
  -e HOST_OS="Unraid" \
  -e HOST_HOSTNAME="Test" \
  -e HOST_CONTAINERNAME="Jellyfin" \
  -e 'NVIDIA_VISIBLE_DEVICES'='GPU-09e16239-57bc-2ca8-39ca-c72ed08bac48' \
  -e 'NVIDIA_DRIVER_CAPABILITIES'='all' \
  -e 'PUID'='99' \
  -e 'PGID'='100' \
  -l net.unraid.docker.managed=dockerman \
  -l net.unraid.docker.webui='http://[IP]:[PORT:8096]/' \
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/ich777/docker-templates/master/ich777/images/jellyfin.png' \
  -p '8096:8096/tcp' \
  -p '8920:8020/tcp' \
  -v '/mnt/user/Filme':'/mnt/movies':'ro' \
  -v '/mnt/user/Serien':'/mnt/tv':'ro' \
  -v '/mnt/cache/appdata/jellyfin/cache':'/cache':'rw' \
  -v '/mnt/cache/appdata/jellyfin':'/config':'rw' \
  --device='/dev/nvidia-caps-imex-channels/channel0' \
  --group-add=18 \
  --runtime=nvidia 'jellyfin/jellyfin'
```
Well from our perspective that is "good news", as it means that it doesn't appear to be an issue with the container toolkit, but rather something else.
Should I open an issue somewhere with the information from here, just to keep track of it and see if someone else has a similar issue, or should I wait until someone answers on the Developer Forums?

Do you think it is worth testing the open-source kernel module?
Just to let you know, @elezar and @klueska: the driver version released today, v550.67, solves this issue and NVENC is working again in combination with Docker.