
Comments (18)

koryphaee commented on June 22, 2024 6

I can confirm the workaround suggested by @nikolayDemirev is working. After downgrading from driver version 555.85 to 552.44, TensorFlow successfully detects the GPU with CUDA.
Since this appears to be a regression in the NVIDIA driver, it would probably be best to let them know. Is there somebody at TensorFlow who can do that? I could contact customer support, but I'm not sure I would reach the right people.

from tensorflow.

nikolayDemirev commented on June 22, 2024 5

@koryphaee I have solved the issue by installing an older NVIDIA driver (552.44). I have installed it with the "Perform a clean install" option from the Custom (Advanced) installation menu. You can find older versions here: https://www.nvidia.com/Download/Find.aspx#


The only difference is downgrading the driver version.
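For anyone who wants to check which side of the regression they are on, here is a minimal Python sketch. It assumes `nvidia-smi` is on the PATH, and the two version constants come only from the reports in this thread, not from any official compatibility matrix. The function names are made up for illustration.

```python
import re
import subprocess

# Versions reported in this thread (not an official compatibility list).
KNOWN_GOOD = (552, 44)  # reported working after a clean install
KNOWN_BAD = (555, 85)   # reported failing to expose the GPU to TensorFlow in Docker


def parse_driver_version(text):
    """Extract (major, minor) from text such as '555.85'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no driver version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def classify(version):
    """Label a driver version using only the reports from this thread."""
    if version == KNOWN_BAD:
        return "reported broken"
    if version <= KNOWN_GOOD:
        return "reported working"
    return "unknown"


def query_driver_version():
    """Ask nvidia-smi for the installed driver version (needs nvidia-smi on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_driver_version(out)


# Pure example on a literal string; call query_driver_version() on a real machine.
print(classify(parse_driver_version("555.85")))  # prints "reported broken"
```

Anything newer than the two versions named here is labelled "unknown" on purpose, since the thread gives no data for it at this point.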


RikPi commented on June 22, 2024 4

> I can confirm the workaround suggested by @nikolayDemirev is working. After downgrading from driver version 555.85 to 552.44, TensorFlow successfully detects the GPU with CUDA. Since this appears to be a regression in the NVIDIA driver, it would probably be best to let them know. Is there somebody at TensorFlow who can do that? I could contact customer support, but I'm not sure I would reach the right people.

I had the same problem: with driver 555, nvidia-smi detects the GPU, but it is not passed through to TensorFlow. I downgraded to 552, and now TensorFlow detects the GPU and uses it.
Thank you :)


cliffwoolley commented on June 22, 2024 4

NVIDIA is working on this. If you use Docker CE on the Linux side of WSL, then update your nvidia-container-toolkit to 1.14.4 or later. If you are using Docker Desktop, then for now the best approach is to use a driver 552.xx or earlier. I will post back here as soon as the remaining incompatibility is resolved.

Cliff Woolley
Sr. Director, Deep Learning Software
NVIDIA
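The interim advice above can be written down as a small decision helper. This is only a sketch of that one comment: the `"ce"` and `"desktop"` labels, the function names, and the returned strings are hypothetical, and the version text would have to come from whatever your toolkit installation reports.

```python
import re

# Minimum nvidia-container-toolkit version named in the comment above.
MIN_TOOLKIT = (1, 14, 4)
# Newest driver branch recommended for Docker Desktop "for now".
MAX_DESKTOP_DRIVER_MAJOR = 552


def parse_version(text):
    """Extract the first x.y.z version number found in text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def advice(docker_flavor, toolkit_text=None, driver_major=None):
    """Return the interim recommendation for 'ce' or 'desktop' (hypothetical labels)."""
    if docker_flavor == "ce":
        if toolkit_text is not None and parse_version(toolkit_text) >= MIN_TOOLKIT:
            return "toolkit is new enough"
        return "update nvidia-container-toolkit to 1.14.4 or later"
    if docker_flavor == "desktop":
        if driver_major is not None and driver_major <= MAX_DESKTOP_DRIVER_MAJOR:
            return "driver is in the recommended range"
        return "use a 552.xx or earlier driver for now"
    raise ValueError(f"unknown docker flavor: {docker_flavor!r}")


print(advice("ce", toolkit_text="1.14.3"))
print(advice("desktop", driver_major=555))
```

Tuples compare element by element in Python, which is why `(1, 15, 0) >= (1, 14, 4)` gives the expected answer without any extra version library.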


TitanTomorrow commented on June 22, 2024 2

The solution works for me...

Num GPUs Available: 1
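A line like the one above is typically produced by counting the entries of `tf.config.list_physical_devices("GPU")`. The sketch below is an assumption about how it was printed, not the poster's actual script. The TensorFlow import is wrapped so the helper also loads on machines without TensorFlow.

```python
def gpu_summary(devices):
    """Format the device list the same way as the output quoted above."""
    return f"Num GPUs Available: {len(devices)}"


def list_gpus():
    """Return TensorFlow's list of GPU devices, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")


# On a machine with TensorFlow installed, one would run:
#     print(gpu_summary(list_gpus() or []))
print(gpu_summary([]))  # prints "Num GPUs Available: 0"
```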


nikolayDemirev commented on June 22, 2024 1

> @nikolayDemirev, I was successfully able to use tensorflow-gpu with Docker with TensorFlow 2.16.1. Could you please check whether the requirements are installed properly?
>
> https://stackoverflow.com/questions/78418499/using-tensorflow-with-gpu-on-docker-on-ubuntu

I have only gotten it working by downgrading the Nvidia driver to version 552.44. Nothing else helped. It seems the latest GPU driver, 555.85, has some issues.


tilakrayal commented on June 22, 2024 1

@nikolayDemirev, @koryphaee,
It looks like the issue comes from the NVIDIA GPU driver: things work with driver version 552.44 or below. We have no control over the NVIDIA driver or the toolkit, so we cannot make changes there. Could you please raise the concern on the NVIDIA forum for a quicker resolution? Thank you!


koryphaee commented on June 22, 2024 1

The relevant forum seems to be https://forums.developer.nvidia.com/c/developer-tools/cuda-developer-tools/285. Sadly, I am not allowed to post there (I don't know why). I have submitted a support ticket instead and linked this issue. I will let you know if they respond.


cliffwoolley commented on June 22, 2024 1

We will track this in NVIDIA/nvidia-container-toolkit#520 going forward, thanks.


cliffwoolley commented on June 22, 2024 1

Docker Desktop 4.31 was released yesterday and includes NVIDIA Container Toolkit 1.15.0, which resolves this issue.


Sr1ya commented on June 22, 2024

Based on the log output and the steps you’ve taken, it appears that TensorFlow within your Docker container is unable to initialize CUDA, leading to the error CUDA_ERROR_NOT_FOUND: named symbol not found. This usually indicates an issue with CUDA installation or GPU driver compatibility within the Docker environment.

Here's a standalone code snippet and Docker run command to reproduce the issue, as well as additional steps to troubleshoot and resolve it.

Standalone Code to Reproduce the Issue

docker run --rm --gpus all -it tensorflow/tensorflow:latest-gpu bash -c "python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices(\"GPU\"))'"

Relevant Log Output

2024-05-27 18:05:30.149964: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 18:05:31.089452: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NOT_FOUND: named symbol not found
[]
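When scanning longer container logs for this failure, a small parser can help. The sketch below assumes the log line keeps the `failed call to cuInit: <CODE>: <message>` shape shown above. The function name is made up for illustration.

```python
import re

# Matches the cuInit failure line as it appears in the log output above.
CUINIT_FAILURE = re.compile(
    r"failed call to cuInit: (?P<code>[A-Z_]+): (?P<message>.+)"
)


def find_cuinit_failure(log_text):
    """Return (error_code, message) for the first cuInit failure, or None if absent."""
    for line in log_text.splitlines():
        match = CUINIT_FAILURE.search(line)
        if match:
            return match.group("code"), match.group("message").strip()
    return None


sample = (
    "2024-05-27 18:05:31.089452: E external/local_xla/xla/stream_executor/cuda/"
    "cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NOT_FOUND: named symbol not found\n"
    "[]"
)
print(find_cuinit_failure(sample))
# prints ('CUDA_ERROR_NOT_FOUND', 'named symbol not found')
```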

Troubleshooting Steps

1. Verify GPU Driver and CUDA Toolkit in WSL2

Ensure that the GPU driver and CUDA toolkit are correctly installed in your WSL2 environment.

  1. Install NVIDIA Drivers:

    Follow the NVIDIA guide for WSL2 to install the latest GPU driver for WSL2.

  2. Install CUDA Toolkit in WSL2:

    Install the CUDA toolkit by following the official NVIDIA CUDA on WSL user guide.

    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
    sudo sh -c 'echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
    sudo apt-get update
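    # Install only the toolkit: NVIDIA's WSL guide warns that the "cuda" meta-package also pulls in a Linux driver, which must not be installed inside WSL2.
    sudo apt-get -y install cuda-toolkit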

2. Check Docker Configuration

Ensure Docker is configured to use the NVIDIA runtime.

  1. Install NVIDIA Container Toolkit:

    sudo apt-get install -y nvidia-container-toolkit
    sudo systemctl restart docker

  2. Set Default Runtime:

    Add the following to /etc/docker/daemon.json:

    {
      "default-runtime": "nvidia",
      "runtimes": {
        "nvidia": {
          "path": "nvidia-container-runtime",
          "runtimeArgs": []
        }
      }
    }

    Restart Docker:

    sudo systemctl restart docker
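Before restarting Docker, it can be worth confirming that the edited file is valid JSON and registers the runtime. The following is a minimal sketch, assuming the file lives at `/etc/docker/daemon.json` as stated above. The helper names are hypothetical.

```python
import json
from pathlib import Path

DAEMON_JSON = Path("/etc/docker/daemon.json")  # location named in the step above


def has_nvidia_runtime(config_text):
    """True if the config registers an 'nvidia' runtime with a 'path' entry."""
    config = json.loads(config_text)
    runtime = config.get("runtimes", {}).get("nvidia")
    return isinstance(runtime, dict) and "path" in runtime


def default_runtime(config_text):
    """Return the configured default runtime, or None if Docker's own default applies."""
    return json.loads(config_text).get("default-runtime")


def check_daemon_json(path=DAEMON_JSON):
    """Read the real file and report both properties (needs read access to the file)."""
    text = path.read_text()
    return has_nvidia_runtime(text), default_runtime(text)


example = '{"runtimes": {"nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}}}'
print(has_nvidia_runtime(example), default_runtime(example))  # prints "True None"
```

A `json.JSONDecodeError` from either helper means the file has a syntax error, which would also stop the Docker daemon from starting.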

3. Verify CUDA in Docker Container

Ensure that the Docker container can access CUDA.

  1. Run CUDA Container:

    docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

  2. Test CUDA Program:

    Run a simple CUDA program inside the TensorFlow Docker container.

    docker run --rm --gpus all -it tensorflow/tensorflow:latest-gpu bash -c "apt-get update && apt-get install -y cuda-samples-11-8 && cd /usr/local/cuda-11.8/samples/1_Utilities/deviceQuery && make && ./deviceQuery"

4. Set Environment Variables in Docker Container

Ensure the correct CUDA environment variables are set within the container.

  1. Run TensorFlow with CUDA Environment Variables:

    docker run --rm --gpus all -it tensorflow/tensorflow:latest-gpu bash -c "export LD_LIBRARY_PATH=/usr/local/cuda/lib64 && export CUDA_HOME=/usr/local/cuda && python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices(\"GPU\"))'"

5. Reinstall GPU Drivers on Host

Reinstall the NVIDIA GPU drivers on your host machine to ensure compatibility with WSL2 and Docker.

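  1. Clean Installation of NVIDIA Drivers:

    As described earlier in this thread, run the NVIDIA driver installer and select the "Perform a clean install" option from the Custom (Advanced) installation menu.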

6. Update Docker and WSL2

Ensure Docker Desktop and WSL2 are up to date.

  1. Update Docker Desktop:

    Download and install the latest version of Docker Desktop from the Docker website.

  2. Update WSL2 Kernel:

    Update the WSL2 kernel:

    wsl --update


koryphaee commented on June 22, 2024

You do not need the Nvidia Container Toolkit in WSL. It's only necessary on a native Ubuntu host. I have tried with and without and it doesn't make a difference.
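A script that wants to branch on this distinction first has to know whether it is running under WSL. One common heuristic, sketched below, is to look for "microsoft" in the kernel release string. This is a convention rather than a documented API, and the function name is made up. Note also that an earlier comment in this thread says Docker CE on the Linux side of WSL does rely on nvidia-container-toolkit, so the statement above presumably applies to Docker Desktop setups.

```python
import platform


def is_wsl(kernel_release=None):
    """Heuristic: WSL kernels usually carry 'microsoft' in their release string."""
    if kernel_release is None:
        kernel_release = platform.uname().release
    return "microsoft" in kernel_release.lower()


print(is_wsl("5.15.146.1-microsoft-standard-WSL2"))  # prints "True"
print(is_wsl("6.5.0-35-generic"))                    # prints "False"
```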


tilakrayal commented on June 22, 2024

@nikolayDemirev,
I was successfully able to use tensorflow-gpu with Docker with TensorFlow 2.16.1. Could you please check whether the requirements are installed properly?

https://stackoverflow.com/questions/78418499/using-tensorflow-with-gpu-on-docker-on-ubuntu


koryphaee commented on June 22, 2024

Their support redirected me back to a more specific forum and I made a post there: https://forums.developer.nvidia.com/t/driver-555-85-is-unable-to-detect-gpu/294495


koryphaee commented on June 22, 2024

I can confirm it works now using

  • Docker Desktop 4.31
  • GPU Driver 555.99

Running

docker run --rm --gpus all -it tensorflow/tensorflow:latest-gpu python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

yields

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Thank you for the quick fix and letting us know <3
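To repeat the check above from a script, the command can be built as an argument list, which avoids the nested shell quoting. This is a minimal sketch assuming Docker is installed and the `tensorflow/tensorflow:latest-gpu` image is available. The helper names are hypothetical, and `-it` is left out because a captured subprocess has no terminal.

```python
import shlex
import subprocess

# One-line probe taken from the comment above.
PROBE = "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"


def build_probe_command(image="tensorflow/tensorflow:latest-gpu"):
    """Build the docker invocation as an argument list, so no shell quoting is needed."""
    # "-it" is omitted on purpose: it requires a terminal, which a captured subprocess lacks.
    return ["docker", "run", "--rm", "--gpus", "all", image, "python3", "-c", PROBE]


def count_gpus(output):
    """Count PhysicalDevice entries in the list printed by the probe."""
    return output.count("PhysicalDevice(")


def run_probe(image="tensorflow/tensorflow:latest-gpu"):
    """Run the probe container and return how many GPUs it reports (needs Docker)."""
    result = subprocess.run(
        build_probe_command(image), capture_output=True, text=True, check=True
    )
    return count_gpus(result.stdout)


# Copy-pasteable form of the same command.
print(shlex.join(build_probe_command()))
```

`run_probe()` returning 0 would correspond to the empty list `[]` seen with the affected driver, and 1 to the output quoted above.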


tilakrayal commented on June 22, 2024

@nikolayDemirev @koryphaee
Hopefully this issue has been fixed on the NVIDIA side through the issue mentioned above, NVIDIA/nvidia-container-toolkit#520.

Please feel free to close this issue. Thank you!


github-actions commented on June 22, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.


