Comments (10)
Can you try setting user = "root:vglusers" in /etc/nvidia-container-runtime/config.toml?
from libnvidia-container.
Some of the presumably irrelevant hits pointed to wrong device permissions, which I tried to rule out.
crw-rw---- 1 root vglusers 195, 0 Okt 9 16:27 /dev/nvidia0
crw-rw---- 1 root vglusers 195, 255 Okt 9 16:27 /dev/nvidiactl
crw-rw---- 1 root vglusers 195, 254 Okt 9 16:27 /dev/nvidia-modeset
crw-rw-rw- 1 root root 236, 0 Okt 9 16:27 /dev/nvidia-uvm
crw-rw-rw- 1 root root 236, 1 Okt 9 16:31 /dev/nvidia-uvm-tools
I am in the vglusers group, and I also temporarily changed them all to mode 0666.
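The permission check described above can be sketched in a few lines of Python. This is only a rough diagnostic, assuming the classic owner/group/other bits are all that matter (it ignores ACLs, capabilities and root's override); the helper names describe_device and can_open_rw are made up for illustration and are not part of any NVIDIA tool.

```python
import glob
import os
import stat


def describe_device(path):
    """Return (mode string, uid, gid) for a file, similar to what `ls -l` shows."""
    st = os.stat(path)
    return stat.filemode(st.st_mode), st.st_uid, st.st_gid


def can_open_rw(path, uid, gids):
    """Check the owner/group/other bits for read+write access.

    Hypothetical helper: it only looks at the mode bits, so treat the
    result as a hint, not as proof that open() will succeed.
    """
    st = os.stat(path)
    if uid == st.st_uid:
        wanted = stat.S_IRUSR | stat.S_IWUSR
    elif st.st_gid in gids:
        wanted = stat.S_IRGRP | stat.S_IWGRP
    else:
        wanted = stat.S_IROTH | stat.S_IWOTH
    return st.st_mode & wanted == wanted


if __name__ == "__main__":
    uid = os.getuid()
    gids = set(os.getgroups()) | {os.getgid()}
    for dev in sorted(glob.glob("/dev/nvidia*")):
        mode, owner, group = describe_device(dev)
        verdict = "rw ok" if can_open_rw(dev, uid, gids) else "NO rw access"
        print(dev, mode, owner, group, verdict)
```

Run as the user who starts the containers, it prints one line per /dev/nvidia* node. Note that it checks the calling user only; the thread suggests that what matters for docker run is the user and group the container runtime itself switches to.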
I found the culprit: at some point in the past, I had VirtualGL installed. Uninstalling it left two configuration files behind, which I had to remove:
/etc/udev/rules.d/99-virtualgl-dri.rules contained
KERNEL=="card[0-9]", MODE="0660", OWNER="root", GROUP="vglusers"
and /etc/modprobe.d/virtualgl.conf contained
options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=1005 NVreg_DeviceFileMode=0660
I removed both files, ran update-initramfs -u so that the changed options take effect, and did delgroup vglusers (which was GID 1005, of course). Now I could revert the definition of user in /etc/nvidia-container-runtime/config.toml.
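To see what a leftover options line like the one above asks the driver to do, it can be decoded with a small Python sketch. The function names are invented for illustration, and the sketch assumes the mode value is written in octal with a leading zero, as in the line quoted here.

```python
import stat


def parse_modprobe_options(line):
    """Parse 'options <module> key=value ...' into (module, {key: value})."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "options":
        raise ValueError("not an 'options' line: %r" % line)
    module = parts[1]
    params = dict(p.split("=", 1) for p in parts[2:])
    return module, params


def device_file_settings(params):
    """Extract uid, gid and an ls-style mode from the NVreg_DeviceFile* values."""
    uid = int(params["NVreg_DeviceFileUID"])
    gid = int(params["NVreg_DeviceFileGID"])
    # Assumption: the mode is given in octal, e.g. 0660.
    mode = int(params["NVreg_DeviceFileMode"], 8)
    return uid, gid, stat.filemode(stat.S_IFCHR | mode)


if __name__ == "__main__":
    line = ("options nvidia NVreg_DeviceFileUID=0 "
            "NVreg_DeviceFileGID=1005 NVreg_DeviceFileMode=0660")
    module, params = parse_modprobe_options(line)
    print(module, device_file_settings(params))
```

For the line from virtualgl.conf this yields uid 0, gid 1005 and mode crw-rw----, which matches the ownership and permissions seen on /dev/nvidia0 and /dev/nvidiactl earlier in the thread.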
So nvidia-container-cli info works but docker run doesn't?
Can you always reproduce it? That's weird, both of them should either work or fail.
Yes, exactly. docker run with nvidia runtime always fails, but I find no other problem.
Jackpot! 👍 At last, I am able to run nvidia-docker again! :-)
I must admit that I am still very puzzled – after all, I even tried with rw permissions for "other". Furthermore, I know several people with Ubuntu 18.04 who did not seem to have to do this (but I will double-check again, now that I know more).
Ok, just confirmed: on a machine with 18.04 installed from scratch and NVRM version 390.77, the config entry is missing (actually, the default is a commented-out #user = "root:video"), and the device permissions are:
crw-rw-rw- 1 root root 195, 0 Sep 24 13:59 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 Sep 24 13:59 /dev/nvidia1
crw-rw-rw- 1 root root 195, 2 Sep 24 13:59 /dev/nvidia2
crw-rw-rw- 1 root root 195, 3 Sep 24 13:59 /dev/nvidia3
crw-rw-rw- 1 root root 195, 255 Sep 24 13:59 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Sep 24 17:43 /dev/nvidia-modeset
crw-rw-rw- 1 root root 238, 0 Sep 24 13:59 /dev/nvidia-uvm
crw-rw-rw- 1 root root 238, 1 Sep 24 17:43 /dev/nvidia-uvm-tools
Yeah, permissions of the device files are dictated by the kernel module parameters and those can vary depending on how you install the driver. Hence the setting ;)
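Since the right value for the setting depends on how the device files ended up being owned, one way to avoid guessing is to read the owner and group straight from the device node. The following Python sketch does that; suggested_user_value and name_or_id are hypothetical helpers, and whether the user entry also accepts numeric ids (the fallback used here when a name cannot be resolved) is not confirmed in this thread.

```python
import grp
import os
import pwd


def name_or_id(lookup, attr, numeric_id):
    """Resolve a numeric id to a name, falling back to the number as a string."""
    try:
        return getattr(lookup(numeric_id), attr)
    except KeyError:
        return str(numeric_id)


def suggested_user_value(device_path):
    """Build an 'owner:group' string from the ownership of a device node."""
    st = os.stat(device_path)
    owner = name_or_id(pwd.getpwuid, "pw_name", st.st_uid)
    group = name_or_id(grp.getgrgid, "gr_name", st.st_gid)
    return f"{owner}:{group}"


if __name__ == "__main__":
    device = "/dev/nvidiactl"
    if os.path.exists(device):
        print('user = "%s"' % suggested_user_value(device))
    else:
        print("no %s on this machine" % device)
```

On the machine from the original report this would presumably print user = "root:vglusers", and on the freshly installed 18.04 machine user = "root:root", where the device files are world-writable and the setting was not needed anyway.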
Thanks again. For completeness, I asked another colleague who has the same upgrade history AND the same driver version, and STILL the permissions are different, and she does not need that setting.
Many thanks @3XX0 for the support, #37 (comment) solved the problem for us. Is there anything else to consider if we want to use nvidia-docker and VirtualGL on the same machine?