Comments (8)
Yes, I agree. I am running into the same issue now. This is particularly frustrating because of the arcane versioning of CUDA-related toolsets (i.e. the Python packages vs. CUDA vs. the dependency matrix in the documentation). For example:
- The TensorFlow documentation lists the correct CUDA version as 11.8, so I installed that and updated my $PATH, $LD_LIBRARY_PATH, etc. accordingly (along with cuDNN 8.6 as listed).
- When I use a fresh Python 3.10 installation to install tensorflow[and-cuda] via pip, it seems to be defaulting to CUDA runtime 12?
Collecting nvidia-cublas-cu12==12.3.4.1
Using cached nvidia_cublas_cu12-12.3.4.1-py3-none-manylinux1_x86_64.whl (412.6 MB)
Collecting nvidia-cuda-nvrtc-cu12==12.3.107
Using cached nvidia_cuda_nvrtc_cu12-12.3.107-py3-none-manylinux1_x86_64.whl (24.9 MB)
Collecting nvidia-curand-cu12==10.3.4.107
Using cached nvidia_curand_cu12-10.3.4.107-py3-none-manylinux1_x86_64.whl (56.3 MB)
Collecting nvidia-cusparse-cu12==12.2.0.103
Using cached nvidia_cusparse_cu12-12.2.0.103-py3-none-manylinux1_x86_64.whl (197.5 MB)
Collecting nvidia-nvjitlink-cu12==12.3.101
Using cached nvidia_nvjitlink_cu12-12.3.101-py3-none-manylinux1_x86_64.whl (20.5 MB)
Collecting nvidia-nccl-cu12==2.19.3
Using cached nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)
Collecting nvidia-cuda-nvcc-cu12==12.3.107
Using cached nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64.whl (22.0 MB)
Collecting nvidia-cusolver-cu12==11.5.4.101
Using cached nvidia_cusolver_cu12-11.5.4.101-py3-none-manylinux1_x86_64.whl (125.2 MB)
Collecting nvidia-cudnn-cu12==8.9.7.29
Using cached nvidia_cudnn_cu12-8.9.7.29-py3-none-manylinux1_x86_64.whl (704.7 MB)
Collecting nvidia-cufft-cu12==11.0.12.1
Using cached nvidia_cufft_cu12-11.0.12.1-py3-none-manylinux1_x86_64.whl (98.8 MB)
Collecting nvidia-cuda-cupti-cu12==12.3.101
Using cached nvidia_cuda_cupti_cu12-12.3.101-py3-none-manylinux1_x86_64.whl (14.0 MB)
Collecting nvidia-cuda-runtime-cu12==12.3.101
Using cached nvidia_cuda_runtime_cu12-12.3.101-py3-none-manylinux1_x86_64.whl (867 kB)
The relevant links in the docs only seem to point to Docker-related material, such as https://www.tensorflow.org/install/source, and the vast majority of information elsewhere on the internet is out of date.
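To see which CUDA-related wheels pip actually placed in an environment, and compare them against a system CUDA install, something like the following sketch may help. It only uses the standard library; the helper name `nvidia_wheels` is made up for illustration, and the `nvidia-` name prefix is taken from the pip log above.

```python
from importlib import metadata


def nvidia_wheels():
    """Return {distribution name: version} for installed nvidia-* pip packages."""
    found = {}
    for dist in metadata.distributions():
        # Some broken installs have no Name field; treat those as empty.
        name = dist.metadata["Name"] or ""
        if name.lower().startswith("nvidia-"):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    for name, version in sorted(nvidia_wheels().items()):
        print(f"{name}=={version}")
```

If this prints `-cu12` packages while $LD_LIBRARY_PATH points at a CUDA 11.8 install, the two sets of libraries are likely to disagree.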
Is there any clearer guidance for how to get TensorFlow working on GPUs assuming your CUDA install is non-standard, i.e., not installed out of the Ubuntu package repo (which is infeasible in many academic settings)?
Thanks very much in advance.
EDIT: I was able to resolve this by setting TF_CPP_MAX_VLOG_LEVEL=3 (something buried in the above linked issue) to debug. It turned out that our new module system was nuking my LD_LIBRARY_PATH after cuDNN was imported, so CUDA could be found but cuDNN could not. Adding a note about this option to the error message around GPUs could potentially save a lot of grief (even in scenarios like mine where the issue lies not with TensorFlow, but something upstream). Just a thought. May help you as well @stellarpower (seems to be what you were looking for when you opened the issue)
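A minimal sketch of that kind of check, for anyone hitting the same thing: it reports which LD_LIBRARY_PATH entries actually contain a CUDA or cuDNN library, and sets TF_CPP_MAX_VLOG_LEVEL before TensorFlow would be imported. The helper `dirs_with_library` and the file-name prefixes are assumptions for illustration, not part of TensorFlow.

```python
import os


def dirs_with_library(prefix, search_path=None):
    """Return the entries of an LD_LIBRARY_PATH-style string that contain
    a file whose name starts with `prefix` (for example "libcudnn")."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in search_path.split(os.pathsep):
        # Skip empty entries and directories that no longer exist.
        if not entry or not os.path.isdir(entry):
            continue
        if any(f.startswith(prefix) for f in os.listdir(entry)):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Set the variable before importing TensorFlow, to be safe.
    os.environ.setdefault("TF_CPP_MAX_VLOG_LEVEL", "3")
    for prefix in ("libcudart", "libcudnn"):
        print(prefix, "->", dirs_with_library(prefix) or "not on LD_LIBRARY_PATH")
    # import tensorflow as tf  # uncomment where TensorFlow is installed
```

Running it before and after loading each environment module should show the point at which an entry disappears.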
from tensorflow.
Hi @stellarpower,
You need to install the GPU driver manually. After that, you need to set LD_LIBRARY_PATH to the path where the NVIDIA libraries are installed. You may refer to this comment. Please refer to #63362 for more details. Thanks
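For the case where the NVIDIA libraries come from the pip wheels (the nvidia-*-cu12 packages), a rough sketch for collecting their library directories is shown here. It assumes the wheels unpack to `<site-packages>/nvidia/<component>/lib`, which should be verified on the machine in question; `nvidia_lib_dirs` is a hypothetical helper name.

```python
import os
import site
from pathlib import Path


def nvidia_lib_dirs(site_dirs=None):
    """Collect <site-packages>/nvidia/*/lib directories, if any exist."""
    if site_dirs is None:
        site_dirs = site.getsitepackages() + [site.getusersitepackages()]
    found = []
    for base in site_dirs:
        base_path = Path(base)
        if not base_path.is_dir():
            continue
        for lib_dir in sorted(base_path.glob("nvidia/*/lib")):
            if lib_dir.is_dir():
                found.append(str(lib_dir))
    return found


if __name__ == "__main__":
    # Prints a value that can be prepended to LD_LIBRARY_PATH in the shell.
    print(os.pathsep.join(nvidia_lib_dirs()))
```

The output is meant to be exported in the shell before Python starts, since the dynamic loader reads LD_LIBRARY_PATH at process start.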
Thanks; I had done all this previously.
But I opened this as an issue irrespective of my own setup, because I believe it should be possible to get more information from the error message. Without knowing which libraries failed to open, just re-installing and following the instructions again isn't a particularly efficient way to debug what happened.
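As a stopgap until the error message says more, a small sketch like this can show which shared libraries the dynamic loader fails to open and why. It uses only `ctypes` from the standard library; the library names in the example list are assumptions for a CUDA 12 / cuDNN 8 setup and will differ on other versions.

```python
import ctypes


def try_load(library_names):
    """Try to open each shared library by name.

    Returns {name: None} on success, or {name: error text} on failure.
    """
    results = {}
    for name in library_names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Assumed sonames; adjust to the CUDA and cuDNN versions in use.
    candidates = ["libcudart.so.12", "libcudnn.so.8", "libcublas.so.12"]
    for name, error in try_load(candidates).items():
        print(f"{name}: {'OK' if error is None else error}")
```

This exercises the same loader search path that TensorFlow relies on, though it is not how TensorFlow itself reports failures.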
@wjn0 thanks - I resolved the underlying problem in the end, and from memory thought I had increased the log verbosity as high as it would go, but maybe I had not. If I encounter some library problems again I'll give it a go. Cheers!
Hi @wjn0, AFAIK the setting TF_CPP_MAX_VLOG_LEVEL=3 will only disable the debugging logs from the console. I would like to know whether the cuDNN libraries became detectable only after disabling these logs. Setting the right path for LD_LIBRARY_PATH should resolve the issue irrespective of disabling the debugging logs. Correct me if I am wrong.
Thanks for the info.
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.