Comments (18)
I can confirm the workaround suggested by @nikolayDemirev is working. After downgrading from driver version 555.85 to 552.44, TensorFlow successfully detects the GPU with CUDA.
Since this appears to be a regression in the NVIDIA driver, it would probably be best to let them know. Is there somebody at TensorFlow who can do that? I could contact customer support, but I'm not sure I would reach the right people.
from tensorflow.
@koryphaee I have solved the issue by installing an older NVIDIA driver (552.44). I have installed it with the "Perform a clean install" option from the Custom (Advanced) installation menu. You can find older versions here: https://www.nvidia.com/Download/Find.aspx#
The only difference is downgrading the driver version.
from tensorflow.
I had the same problem: with driver 555, nvidia-smi detects the GPU, but it is not passed through to TensorFlow. I have downgraded to 552 and now TensorFlow detects the GPU and uses it.
Thank you :)
from tensorflow.
NVIDIA is working on this. If you use Docker CE on the Linux side of WSL, then update your nvidia-container-toolkit to 1.14.4 or later. If you are using Docker Desktop, then for now the best approach is to use a driver 552.xx or earlier. I will post back here as soon as the remaining incompatibility is resolved.
Cliff Woolley
Sr. Director, Deep Learning Software
NVIDIA
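The guidance above can be turned into a quick local check. This is only a minimal sketch, not an NVIDIA tool: the function names are hypothetical, and the thresholds (driver branch 552 or earlier for Docker Desktop, nvidia-container-toolkit 1.14.4 or later for Docker CE) are taken directly from this comment and may no longer apply once the incompatibility is resolved.

```python
import re


def parse_version(text):
    """Turn a dotted version such as '555.85' or '1.14.4' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def toolkit_is_new_enough(toolkit_version, minimum="1.14.4"):
    """Docker CE in WSL: the comment above asks for toolkit 1.14.4 or later."""
    return parse_version(toolkit_version) >= parse_version(minimum)


def driver_is_safe_for_docker_desktop(driver_version, last_good_branch=552):
    """Docker Desktop: the comment above recommends a 552.xx driver or earlier."""
    return parse_version(driver_version)[0] <= last_good_branch


# The installed driver version can be read on the host with:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
print(driver_is_safe_for_docker_desktop("552.44"))  # True
print(driver_is_safe_for_docker_desktop("555.85"))  # False
print(toolkit_is_new_enough("1.15.0"))              # True
```

Comparing versions as integer tuples avoids the string-comparison trap where "1.9.0" would sort after "1.14.4".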
from tensorflow.
The solution works for me...
Num GPUs Available: 1
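The exact command behind this output is not shown. As an assumption, a check along the following lines would print a line in that form; `gpu_summary` is a hypothetical helper name, while `tf.config.list_physical_devices` is the same call used elsewhere in this thread.

```python
def gpu_summary():
    """Return a 'Num GPUs Available' line, or a note if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed"
    gpus = tf.config.list_physical_devices("GPU")
    return f"Num GPUs Available: {len(gpus)}"


print(gpu_summary())
```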
from tensorflow.
@nikolayDemirev, I was able to use tensorflow-gpu with Docker on TensorFlow 2.16.1. Could you please check whether the requirements are installed properly?
https://stackoverflow.com/questions/78418499/using-tensorflow-with-gpu-on-docker-on-ubuntu
I have only gotten it working by downgrading the Nvidia driver to version 552.44. Nothing else helped. It seems the latest GPU driver, 555.85, has some issues.
from tensorflow.
@nikolayDemirev , @koryphaee ,
It looks like the issue comes from the NVIDIA GPU driver: everything works with driver version 552.44 or below. We have no control over the NVIDIA driver or toolkit, so we cannot make changes there. Could you please raise the concern on the NVIDIA forum for a quick resolution? Thank you!
from tensorflow.
The relevant forum seems to be https://forums.developer.nvidia.com/c/developer-tools/cuda-developer-tools/285. Sadly I am not allowed to post there (I don't know why). I have submitted a support ticket instead and linked this issue. I will let you know if they respond.
from tensorflow.
We will track this in NVIDIA/nvidia-container-toolkit#520 going forward, thanks.
from tensorflow.
Docker Desktop 4.31 was released yesterday and includes NVIDIA Container Toolkit 1.15.0, which resolves this issue.
from tensorflow.
Based on the log output and the steps you’ve taken, it appears that TensorFlow within your Docker container is unable to initialize CUDA, leading to the error CUDA_ERROR_NOT_FOUND: named symbol not found. This usually indicates an issue with CUDA installation or GPU driver compatibility within the Docker environment.
Here's a standalone code snippet and Docker run command to reproduce the issue, as well as additional steps to troubleshoot and resolve it.
Standalone Code to Reproduce the Issue
docker run --rm --gpus all -it tensorflow/tensorflow:latest-gpu bash -c "python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices(\"GPU\"))'"
Relevant Log Output
2024-05-27 18:05:30.149964: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 18:05:31.089452: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NOT_FOUND: named symbol not found
[]
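To script this reproduction instead of reading the log by eye, the command and the log check could be wrapped as below. This is a sketch under stated assumptions: the helper names and the three result labels are hypothetical, the `-it` flags are dropped because captured output has no TTY, and the failure signature is simply the `cuInit` line from the log above.

```python
CUINIT_FAILURE = "failed call to cuInit: CUDA_ERROR_NOT_FOUND"


def build_check_command(image="tensorflow/tensorflow:latest-gpu"):
    """Build the docker argv as a list, which avoids nested shell quoting."""
    snippet = 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
    return ["docker", "run", "--rm", "--gpus", "all", image, "python3", "-c", snippet]


def classify_output(output):
    """Label combined stdout/stderr text from the container."""
    if CUINIT_FAILURE in output:
        return "cuInit failed"
    if "PhysicalDevice(" in output:
        return "gpu detected"
    return "no gpu reported"


sample_log = (
    "E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] "
    "failed call to cuInit: CUDA_ERROR_NOT_FOUND: named symbol not found\n[]"
)
print(classify_output(sample_log))  # cuInit failed
```

On a machine with Docker available, the command can be run with `subprocess.run(build_check_command(), capture_output=True, text=True)` and the result classified from `stdout + stderr`, since TensorFlow writes its log lines to stderr.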
Troubleshooting Steps
1. Verify GPU Driver and CUDA Toolkit in WSL2
Ensure that the GPU driver and CUDA toolkit are correctly installed in your WSL2 environment.
- Install NVIDIA Drivers: Follow the NVIDIA guide for WSL2 to install the latest GPU driver for WSL2.
- Install CUDA Toolkit in WSL2: Install the CUDA toolkit by following the official NVIDIA CUDA on WSL user guide.
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
sudo sh -c 'echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
sudo apt-get update
sudo apt-get -y install cuda
2. Check Docker Configuration
Ensure Docker is configured to use the NVIDIA runtime.
- Install NVIDIA Container Toolkit:
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
- Set Default Runtime: Add the following to /etc/docker/daemon.json:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Restart Docker:
sudo systemctl restart docker
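The daemon.json fragment above can be sanity-checked before restarting Docker. The sketch below uses hypothetical helper names and only inspects the JSON; it does not talk to Docker. Note that the fragment registers an `nvidia` runtime but does not set Docker's `default-runtime` key, so the second check returns `None` for it.

```python
import json


def has_nvidia_runtime(daemon_json_text):
    """True if the config registers a runtime named 'nvidia'."""
    config = json.loads(daemon_json_text)
    return "nvidia" in config.get("runtimes", {})


def default_runtime(daemon_json_text):
    """Return the configured default runtime, or None if the key is absent."""
    return json.loads(daemon_json_text).get("default-runtime")


sample = """
{ "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } } }
"""
print(has_nvidia_runtime(sample))  # True
print(default_runtime(sample))     # None
```

Using `json.loads` also catches syntax errors such as a trailing comma, which would otherwise stop the Docker daemon from starting.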
3. Verify CUDA in Docker Container
Ensure that the Docker container can access CUDA.
- Run CUDA Container:
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
- Test CUDA Program: Run a simple CUDA program inside the TensorFlow Docker container.
docker run --rm --gpus all -it tensorflow/tensorflow:latest-gpu bash -c "apt-get update && apt-get install -y cuda-samples-11-8 && cd /usr/local/cuda-11.8/samples/1_Utilities/deviceQuery && make && ./deviceQuery"
4. Set Environment Variables in Docker Container
Ensure the correct CUDA environment variables are set within the container.
- Run TensorFlow with CUDA Environment Variables:
docker run --rm --gpus all -it tensorflow/tensorflow:latest-gpu bash -c "export LD_LIBRARY_PATH=/usr/local/cuda/lib64 && export CUDA_HOME=/usr/local/cuda && python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices(\"GPU\"))'"
5. Reinstall GPU Drivers on Host
Reinstall the NVIDIA GPU drivers on your host machine to ensure compatibility with WSL2 and Docker.
- Clean Installation of NVIDIA Drivers:
  - Use the Display Driver Uninstaller (DDU) to uninstall existing drivers.
  - Download and install the latest drivers from the NVIDIA website.
6. Update Docker and WSL2
Ensure Docker Desktop and WSL2 are up to date.
- Update Docker Desktop: Download and install the latest version of Docker Desktop from the Docker website.
- Update WSL2 Kernel:
wsl --update
from tensorflow.
You do not need the Nvidia Container Toolkit in WSL. It's only necessary on a native Ubuntu host. I have tried with and without and it doesn't make a difference.
from tensorflow.
Their support redirected me back to a more specific forum and I made a post there: https://forums.developer.nvidia.com/t/driver-555-85-is-unable-to-detect-gpu/294495
from tensorflow.
I can confirm it works now using
- Docker Desktop 4.31
- GPU Driver 555.99
Running
docker run --rm --gpus all -it tensorflow/tensorflow:latest-gpu python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
yields
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Thank you for the quick fix and letting us know <3
from tensorflow.
@nikolayDemirev @koryphaee
Hopefully this has been fixed on the NVIDIA side via NVIDIA/nvidia-container-toolkit#520.
Please feel free to close this issue. Thank you!
from tensorflow.
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.
from tensorflow.