Code Monkey home page Code Monkey logo

Comments (8)

wjn0 avatar wjn0 commented on June 16, 2024 1

Yes, I agree. I am running into the same issue now. This is particularly frustrating because of the arcane versioning of CUDA-related toolsets (i.e. the Python packages vs. CUDA vs. the dependency matrix in the documentation). For example:

  • TensorFlow documentation lists the correct CUDA version as 11.8, so I installed that and updated my $PATH, $LD_LIBRARY_PATH, etc. accordingly (along with cudNN 8.6 as listed).
  • When I use a fresh Python 3.10 installation to install tensorflow[and-cuda] via Pip, it seems to be defaulting to CUDA runtime 12?
Collecting nvidia-cublas-cu12==12.3.4.1
  Using cached nvidia_cublas_cu12-12.3.4.1-py3-none-manylinux1_x86_64.whl (412.6 MB)
Collecting nvidia-cuda-nvrtc-cu12==12.3.107
  Using cached nvidia_cuda_nvrtc_cu12-12.3.107-py3-none-manylinux1_x86_64.whl (24.9 MB)
Collecting nvidia-curand-cu12==10.3.4.107
  Using cached nvidia_curand_cu12-10.3.4.107-py3-none-manylinux1_x86_64.whl (56.3 MB)
Collecting nvidia-cusparse-cu12==12.2.0.103
  Using cached nvidia_cusparse_cu12-12.2.0.103-py3-none-manylinux1_x86_64.whl (197.5 MB)
Collecting nvidia-nvjitlink-cu12==12.3.101
  Using cached nvidia_nvjitlink_cu12-12.3.101-py3-none-manylinux1_x86_64.whl (20.5 MB)
Collecting nvidia-nccl-cu12==2.19.3
  Using cached nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)
Collecting nvidia-cuda-nvcc-cu12==12.3.107
  Using cached nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64.whl (22.0 MB)
Collecting nvidia-cusolver-cu12==11.5.4.101
  Using cached nvidia_cusolver_cu12-11.5.4.101-py3-none-manylinux1_x86_64.whl (125.2 MB)
Collecting nvidia-cudnn-cu12==8.9.7.29
  Using cached nvidia_cudnn_cu12-8.9.7.29-py3-none-manylinux1_x86_64.whl (704.7 MB)
Collecting nvidia-cufft-cu12==11.0.12.1
  Using cached nvidia_cufft_cu12-11.0.12.1-py3-none-manylinux1_x86_64.whl (98.8 MB)
Collecting nvidia-cuda-cupti-cu12==12.3.101
  Using cached nvidia_cuda_cupti_cu12-12.3.101-py3-none-manylinux1_x86_64.whl (14.0 MB)
Collecting nvidia-cuda-runtime-cu12==12.3.101
  Using cached nvidia_cuda_runtime_cu12-12.3.101-py3-none-manylinux1_x86_64.whl (867 kB)

and relevant links in the docs only seem to link out to Docker-related stuff, like https://www.tensorflow.org/install/source so the vast majority of information on the internet is out of date.

Is there any clearer guidance for how to get TensorFlow working on GPUs assuming your CUDA install is non-standard, i.e., not installed out of the Ubuntu package repo (which is infeasible in many academic settings)?

Thanks very much in advance.

EDIT: I was able to resolve this by using the TF_CPP_MAX_VLOG_LEVEL=3 (something buried in the above linked issue) to debug. It turned out that our new module system was nuking my LD_LIBRARY_PATH after cudNN was imported, so CUDA could be found but cudNN could not. Adding a note about this option to the error message around GPUs could potentially save a lot of grief (even in scenarios like mine where the issue lies not with TensorFlow, but something upstream). Just a thought. May help you as well @stellarpower (seems to be what you were looking for when you opened the issue)

from tensorflow.

SuryanarayanaY avatar SuryanarayanaY commented on June 16, 2024

Hi @stellarpower ,

You need to install GPU driver manually.After that you need to set LD_LIBRARY_PATH to the path where nvidia libraries installed. You may refer this comment . Please refer #63362 for more details. Thanks

from tensorflow.

stellarpower avatar stellarpower commented on June 16, 2024

Thanks; I had done all this previously.

But I have opened as an issue irrespective of my own setup, because I believe it should be possible to get more information from the error message. Without knowing what libraries failed to be opened, just re-installing and following the instructions again isn't a particularly efficient way to debug what happened.

from tensorflow.

stellarpower avatar stellarpower commented on June 16, 2024

@wjno thanks - I resolved the underlying problem in the end, and from memory thought I had increased the log verbosity as high as it would go, but maybe I had not. If I encounter some library problems again I'll give it a go. Cheers!

from tensorflow.

SuryanarayanaY avatar SuryanarayanaY commented on June 16, 2024

EDIT: I was able to resolve this by using the TF_CPP_MAX_VLOG_LEVEL=3 (something buried in the above linked issue) to debug. It turned out that our new module system was nuking my LD_LIBRARY_PATH after cudNN was imported, so CUDA could be found but cudNN could not. Adding a note about this option to the error message around GPUs could potentially save a lot of grief (even in scenarios like mine where the issue lies not with TensorFlow, but something upstream). Just a thought. May help you as well @stellarpower (seems to be what you were looking for when you opened the issue)

Hi @wjn0 , AFAIK the setting TF_CPP_MAX_VLOG_LEVEL=3 will only disable the debugging logs from the console.I doubt and want to know whether after disabling these logs then only cudnn libraries are being detectable? Setting right path for LD_LIBRARY_PATH should resolve the issue irrespective of disabling the debugging logs.Correct me if i am wrong.

Thanks for the info.

from tensorflow.

github-actions avatar github-actions commented on June 16, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

from tensorflow.

github-actions avatar github-actions commented on June 16, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.

from tensorflow.

google-ml-butler avatar google-ml-butler commented on June 16, 2024

Are you satisfied with the resolution of your issue?
Yes
No

from tensorflow.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.