Comments (19)

klueska commented on August 10, 2024

Well from our perspective that is "good news", as it means that it doesn't appear to be an issue with the container toolkit, but rather something else.

elezar commented on August 10, 2024

We updated the version logic to use git describe --tags to extract the version information. The issue here is that the v1.14.4 and v1.14.5 tags point at the same commit. To override the version, you can set the LIB_VERSION and LIB_TAG make variables.

This is the logic that we use when building this as part of the NVIDIA Container Toolkit repo here: https://github.com/NVIDIA/nvidia-container-toolkit/blob/0409824106214a55df4c89a41f12c48f492cd51b/scripts/build-all-components.sh#L58-L64
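
For illustration, a hedged sketch of such an override (the values are illustrative and assume the Makefile consumes these variables as in the linked script):

# Bypass the git describe --tags detection by setting the version explicitly
make LIB_VERSION=1.14.5 LIB_TAG=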

ich777 commented on August 10, 2024

@elezar thanks for the information, I will look into that and into how I can implement it in my build toolchain.

Anyway, I have to investigate a bit further, since even with version 1.14.4 and driver version 550.54.14 I can't utilize my T400 in Docker containers.

ich777 commented on August 10, 2024

@elezar did something else change too? I now get this error when trying to create a container with v1.14.5:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: /usr/bin/nvidia-container-runtime did not terminate successfully: exit status 2: unknown.

EDIT: This was caused by compiling with Go 1.22.0; after switching back to Go 1.20.13 everything works. However, I'm still not able to use driver version 550.54.14 with Docker.
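
For anyone hitting the same problem, a hedged sketch of pinning the Go toolchain via the official golang.org/dl wrappers (the final make invocation is an assumption about this build setup):

# Fetch a specific Go toolchain alongside the default one
go install golang.org/dl/go1.20.13@latest
go1.20.13 download
# Put that toolchain first on PATH so the build picks it up, then build
PATH="$(go1.20.13 env GOROOT)/bin:$PATH" make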

ich777 commented on August 10, 2024

I'll close this issue since this is resolved.

elezar commented on August 10, 2024

@ich777 please open an issue against the nvidia-container-toolkit repo with the error messages you're seeing with the 550 driver.

ich777 commented on August 10, 2024

@elezar I already created an issue on the NVIDIA Developer Forums here, since I don't think it's related to libnvidia-container or nvidia-container-toolkit.
This only happens with driver 550.54.14, not with 550.40.07 and earlier.

elezar commented on August 10, 2024

Does nvidia-smi work in the container, or is it applications that are failing?

/cc @klueska

ich777 commented on August 10, 2024

> Does nvidia-smi work in the container, or is it applications that are failing?

nvidia-smi is working just fine in the container, yes.

elezar commented on August 10, 2024

There is a new feature in the 550.54.14 driver that requires additional support in the NVIDIA Container Toolkit.

We are working to release a version that includes this support, but are waiting for some driver components to be published.

For now, could you confirm whether adding the --device /dev/nvidia-caps-imex-channels/channel0 device to your container allows it to function?
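
For illustration, a minimal sketch of such a test invocation (the image tag is just an example):

docker run --rm --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  --device /dev/nvidia-caps-imex-channels/channel0 \
  nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi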

ich777 commented on August 10, 2024

> For now, could you confirm whether adding the --device /dev/nvidia-caps-imex-channels/channel0 device to your container allows it to function?

That does not work; I only have these devices:

root@Test:~# ls -la /dev/nvidia*
crw-rw-rw- 1 root root 195, 254 Feb 26 15:07 /dev/nvidia-modeset
crw-rw-rw- 1 root root 240,   0 Feb 26 15:08 /dev/nvidia-uvm
crw-rw-rw- 1 root root 240,   1 Feb 26 15:08 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Feb 26 15:07 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Feb 26 15:07 /dev/nvidiactl

/dev/nvidia-caps:
total 0
drwxr-xr-x  2 root root     80 Feb 26 15:08 ./
drwxr-xr-x 17 root root   3380 Feb 26 15:08 ../
cr--------  1 root root 244, 1 Feb 26 15:08 nvidia-cap1
cr--r--r--  1 root root 244, 2 Feb 26 15:08 nvidia-cap2

This is the output of nvidia-smi from inside the container (of course without the device that you mentioned):

root@8a1b7fbf37e8:/# nvidia-smi
Mon Feb 26 15:10:41 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA T400                    Off |   00000000:01:00.0 Off |                  N/A |
| 36%   38C    P0             N/A /   31W |       0MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
root@8a1b7fbf37e8:/# 

klueska commented on August 10, 2024

What was your original error? You don't seem to include it in the description (unless I missed it somehow).

ich777 commented on August 10, 2024

> What was your original error? You don't seem to include it in the description (unless I missed it somehow).

Some users on the Unraid Forums reported that transcoding with Plex was not working, and the same goes for Jellyfin; in other words, NVENC is broken.

I was able to reproduce this on my test machine and made a short post on the NVIDIA Developer Forums here, where I described what I've tested so far.

If you need any logs or anything else, just let me know.
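
For reference, a minimal NVENC smoke test that can be run inside the affected container (assuming an ffmpeg build with NVENC support):

# Encode five seconds of generated test video with h264_nvenc and discard the output
ffmpeg -f lavfi -i testsrc=duration=5:size=1280x720:rate=30 \
       -c:v h264_nvenc -f null -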

klueska commented on August 10, 2024

Yeah, I'm trying to understand what "does not work" means.

ich777 commented on August 10, 2024

> Yeah, I'm trying to understand what "does not work" means.

Sorry, I just realized that I didn't provide much information...

Transcoding is not working with driver version 550.54.14; it fails on Jellyfin with this error:

...
frame=    1 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
[h264_nvenc @ 0x561522fa7240] Failed locking bitstream buffer: invalid param (8): 
Error submitting video frame to the encoder
[libfdk_aac @ 0x561522fa4400] 2 frames left in the queue on closing
Conversion failed!

and on Plex with this error:

...
Feb 26, 2024 15:29:24.384 [23302936435512] Debug — Jobs: '/usr/lib/plexmediaserver/Plex Transcoder' exit code for process 1386 is -9 (signal: Killed)
...

Sorry, I will try to find more useful information in the Plex log, but it basically falls back to software transcoding.

klueska commented on August 10, 2024

Following up on @elezar's suggestion of adding --device /dev/nvidia-caps-imex-channels/channel0, can you try running the following on the host (which should create /dev/nvidia-caps-imex-channels/channel0) and then test again:

nvidia-modprobe -i 0:1

I'm hoping this isn't an issue, but I want to rule it out.

ich777 commented on August 10, 2024

After running:

nvidia-modprobe -i 0:1

I can confirm that the device:

/dev/nvidia-caps-imex-channels/channel0

was created, but sadly it's still the same when passing this device through.
The Jellyfin log gives me this:

...
frame=    1 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
[h264_nvenc @ 0x55634ff34c40] Failed locking bitstream buffer: invalid param (8): 
Error submitting video frame to the encoder
[libfdk_aac @ 0x55634ff2fa40] 2 frames left in the queue on closing
Conversion failed!

And Plex falls back to software transcoding.

Here is the docker run command:

docker run \
  -d \
  --name='Jellyfin' \
  --net='bridge' \
  -e TZ="Europe/Berlin" \
  -e HOST_OS="Unraid" \
  -e HOST_HOSTNAME="Test" \
  -e HOST_CONTAINERNAME="Jellyfin" \
  -e 'NVIDIA_VISIBLE_DEVICES'='GPU-09e16239-57bc-2ca8-39ca-c72ed08bac48' \
  -e 'NVIDIA_DRIVER_CAPABILITIES'='all' \
  -e 'PUID'='99' \
  -e 'PGID'='100' \
  -l net.unraid.docker.managed=dockerman \
  -l net.unraid.docker.webui='http://[IP]:[PORT:8096]/' \
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/ich777/docker-templates/master/ich777/images/jellyfin.png' \
  -p '8096:8096/tcp' \
  -p '8920:8020/tcp' \
  -v '/mnt/user/Filme':'/mnt/movies':'ro' \
  -v '/mnt/user/Serien':'/mnt/tv':'ro' \
  -v '/mnt/cache/appdata/jellyfin/cache':'/cache':'rw' \
  -v '/mnt/cache/appdata/jellyfin':'/config':'rw' \
  --device='/dev/nvidia-caps-imex-channels/channel0' \
  --group-add=18 \
  --runtime=nvidia 'jellyfin/jellyfin'

ich777 commented on August 10, 2024

> Well from our perspective that is "good news", as it means that it doesn't appear to be an issue with the container toolkit, but rather something else.

Should I open an issue somewhere with the information from here, just to keep track of it and see if someone else has a similar issue, or should I wait until someone answers on the Developer Forums?
Do you think it is worth testing the open-source kernel module?

ich777 commented on August 10, 2024

Just to let you know, @elezar and @klueska: the driver version released today, v550.67, solves this issue, and NVENC is working again in combination with Docker.
