Comments (10)
You are not doing anything wrong. There is an issue with SYCL and the SIMD instructions (i.e. the -mavx and -msse flags). I am running tests right now to see if a simple fix is possible that would allow keeping the SIMD instructions for the CPU.
Otherwise the only solution would be to remove these flags.
I will keep you posted.
Also, I noticed that you are using clang. You should switch to at least gcc-4.8 (and g++-4.8).
from tensorflow-opencl.
Ok this seems to compile and pass the tests just fine.
Make sure you first switch to gcc-4.8 (the link you mentioned needs to be updated).
If this still does not work, please try to apply my fix that you can find here: lukeiwanski/tensorflow@fabe385
Ok so this is just some incomplete information.
If you have a look here: http://deep-beta.co.uk/setting-up-tensorflow-with-opencl-using-sycl/ you can find a very similar guide. The last command you need to launch from the _python_build directory is
python setup.py develop
This will actually create an egg-link in the dist-package folder that points to the _python_build folder and acts just like a package installed from pip.
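For what it's worth, the egg-link mechanism is simple enough to sketch. Here is a minimal simulation of what `python setup.py develop` writes (the paths are made up, and a temp directory stands in for the real site-packages folder):

```python
import os
import tempfile

# Simulate what `python setup.py develop` drops into site-packages:
# a tensorflow.egg-link file whose first line is the source checkout path.
site_packages = tempfile.mkdtemp()                   # stand-in for site-packages
source_dir = "/home/user/tensorflow/_python_build"   # hypothetical path

link_path = os.path.join(site_packages, "tensorflow.egg-link")
with open(link_path, "w") as f:
    f.write(source_dir + "\n.")

# At startup, setuptools' site hook reads the first line and appends it to
# sys.path, so imports resolve into _python_build as if pip-installed.
with open(link_path) as f:
    print(f.readline().strip())  # -> /home/user/tensorflow/_python_build
```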
Note that if you want to launch python and import tensorflow you will have to be outside the repository project.
Also, I wasn't able to create a .whl the way you tried either. Not sure why.
- When the GPU is enabled, whether with CUDA or OpenCL, all the heavy work is done there, so the CPU won't be used much. The optimization flags don't actually affect CPU usage, so it is expected to observe usage around 100%.
- I think your VRAM is definitely the bottleneck here, as you suspected. For comparison, I have 4 GB. It is true that a lot of copies happen between the CPU and GPU; some work still needs to be done to avoid that. Your question is actually related to SYCL, so you can probably get more help here: https://github.com/lukeiwanski/tensorflow.
Well, assuming you don't want to spend more money on a GPU, the only possibility I see is contributing to this repository ;) There are other optimizations to do to avoid copies.
(ok that's very nice of you ^^)
Thank you. That helped with compiling! Compiling now finishes successfully. However, I am not quite sure how to proceed from here to get a working build within a virtual environment. I tried to create a wheel like so:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Then, I tried:
pip install /tmp/tensorflow_pkg/NAME_OF_WHEEL.whl
Now when I try to just import TensorFlow in a Jupyter notebook, I get the following errors:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/tensorflow/python/__init__.py in <module>()
60 sys.setdlopenflags(_default_dlopen_flags | ctypes.RTLD_GLOBAL)
---> 61 from tensorflow.python import pywrap_tensorflow
62 sys.setdlopenflags(_default_dlopen_flags)
/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py in <module>()
27 return _mod
---> 28 _pywrap_tensorflow = swig_import_helper()
29 del swig_import_helper
/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py in swig_import_helper()
23 try:
---> 24 _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
25 finally:
/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/imp.py in load_module(name, file, filename, details)
241 else:
--> 242 return load_dynamic(name, filename, file)
243 elif type_ == PKG_DIRECTORY:
/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/imp.py in load_dynamic(name, path, file)
341 name=name, loader=loader, origin=path)
--> 342 return _load(spec)
343
ImportError: libComputeCpp.so: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
ImportError Traceback (most recent call last)
<ipython-input-1-c61832825467> in <module>()
28 from sklearn.cross_validation import KFold, StratifiedKFold
29 from sklearn.model_selection import train_test_split
---> 30 from keras.applications import ResNet50, InceptionV3
31 from keras.models import Sequential, Model
32 from keras.layers.core import Dense, Dropout, Flatten
/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/keras/__init__.py in <module>()
1 from __future__ import absolute_import
2
----> 3 from . import activations
4 from . import applications
5 from . import backend
/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/keras/activations.py in <module>()
1 from __future__ import absolute_import
2 import six
----> 3 from . import backend as K
4 from .utils.generic_utils import deserialize_keras_object
5
/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/keras/backend/__init__.py in <module>()
71 elif _BACKEND == 'tensorflow':
72 sys.stderr.write('Using TensorFlow backend.\n')
---> 73 from .tensorflow_backend import *
74 else:
75 raise ValueError('Unknown backend: ' + str(_BACKEND))
/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py in <module>()
----> 1 import tensorflow as tf
2 from tensorflow.python.training import moving_averages
3 from tensorflow.python.ops import tensor_array_ops
4 from tensorflow.python.ops import control_flow_ops
5 from tensorflow.python.ops import functional_ops
/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/tensorflow/__init__.py in <module>()
22
23 # pylint: disable=wildcard-import
---> 24 from tensorflow.python import *
25 # pylint: enable=wildcard-import
26
/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/tensorflow/python/__init__.py in <module>()
70 for some common reasons and solutions. Include the entire stack trace
71 above this error message when asking for help.""" % traceback.format_exc()
---> 72 raise ImportError(msg)
73
74 # Protocol buffers
ImportError: Traceback (most recent call last):
File "/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 61, in <module>
from tensorflow.python import pywrap_tensorflow
File "/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/home/anthonyle/anaconda3/envs/deep_learning_gpu3/lib/python3.5/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: libComputeCpp.so: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
Am I too ambitious trying to make a wheel and then do a pip install into my virtual environment? I see that in the tutorial your colleague is actually:
```shell
$ mkdir _python_build
$ cd _python_build
$ ln -s ../bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow/* .
$ ln -s ../tensorflow/tools/pip_package/* .
```
...creating symbolic links into the _python_build folder. I fail to understand how this will install a Python package into the site-packages folder... or make tensorflow available to the system. Could you help enlighten me on that?
Hi Zakor,
I thought it would be wise for me to also run the tests as recommended. I am noticing 2 things:
1.) There is a timeout option set in the tutorial like so:
bazel test --config=sycl -k --test_timeout 1600 -- //tensorflow/... -//tensorflow/contrib/... -//tensorflow/java/... -//tensorflow/compiler/...
It's rather high. Are these limits high because the test computations are rather expensive?
2.) A lot of these tests time out on my setup. Is this an indication that the integrated GPU is not quite powerful enough to perform these tests in time? As of this writing, the tests have been running for 2 days, and I expect them to finish within one more day. Is this an indication of deeper problems, e.g. that the GPU is not actually used?
Update:
After compiling the build without errors, I proceeded to compare some performance:
TensorFlow from pip: 1 epoch = ~1,700 sec (CPU utilization ~350%)
TensorFlow+SSE (4.1+4.2)+AVX+Keras on the CIFAR-10 dataset: 1 epoch = ~1,100 sec (CPU utilization ~350%)
TensorFlow+OpenCL: 1 epoch = ~11,000 sec (CPU utilization ~150%)
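As a sanity check on these numbers:

```python
# Reported seconds per epoch from the runs above.
pip_epoch = 1700        # stock pip TensorFlow (CPU)
optimized_epoch = 1100  # SSE4.1/4.2 + AVX build (CPU)
opencl_epoch = 11000    # OpenCL/SYCL build

# Slowdown of the OpenCL build relative to the optimized CPU build.
print(opencl_epoch / optimized_epoch)   # -> 10.0
```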
This is about 10x worse than what I would get with just an optimized TensorFlow compilation. I am not currently getting the hoped-for performance increases. How can I test whether the GPU is used at all?
Hi,
I am not sure why we need such a high timeout either.
Two days is definitely too much! I usually only test -- //tensorflow/... -//tensorflow/compiler/..., which takes less than an hour. Also try adding --local_test_jobs=8. Even when I ran all the tests, I don't remember any timeouts.
Maybe it is, as you say, because the integrated GPU is not powerful enough. To make sure, I think you should use an external tool such as aticonfig --odgc --odgt, since you have an AMD.
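As for the libComputeCpp.so import error from your earlier post: the dynamic loader has to be able to locate ComputeCpp's library directory before Python starts. A sketch, assuming ComputeCpp lives under /usr/local/computecpp (adjust to your actual install path):

```shell
# Assumed install prefix: change this to wherever ComputeCpp is unpacked.
COMPUTECPP_DIR=/usr/local/computecpp
export LD_LIBRARY_PATH="$COMPUTECPP_DIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# Then retry:  python -c "import tensorflow"
```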
Thank you for guiding me along:
I have reinstalled tensorflow-opencl because I had specified incorrect paths to the ComputeCpp, g++ and gcc compilers last time. I have also rerun my CIFAR-10 benchmark of tensorflow-opencl with the integrated GPU on my AMD A10-7850 setup. This time I also looked at GPU usage, and I was able to verify that the GPU is used at 100% capacity. The CPU usage is still at 150%. The time to process 1 epoch is still about 5,800 seconds.
So this is still about 5X worse than just using optimized compilation flags. These are my general observations and thoughts after having tried out tensorflow-opencl:
1.) The fact that on my 4-core setup the CPU usage is only 150% indicates that multithreading is somehow not fully efficient in the OpenCL setup, because without OpenCL all 4 cores get used.
2.) I am not sure what it is, but I thought any matrix computation should be faster on a GPU than on a CPU. I think in the case of OpenCL 1.2 we are still talking about copying data from CPU RAM to GPU RAM before any computations can be done, and if the GPU only has 1 GB of RAM, there is potentially a lot of copying back and forth. Could this be the cause of the slow computations? With OpenCL 2.0, I think one does not have to copy data back and forth but can just pass pointers. May I ask why you went with OpenCL 1.2 instead of OpenCL 2.0? As it stands, the full potential of APUs cannot currently be exploited.
3.) It appears that OpenCL 1.2 is not a full substitute for handcrafted/optimized assembly libraries at this time.
How can I further reduce computation times without NVIDIA GPUs? I still have a few months before I commit. One of the things I am waiting for is the release of the Vega GPUs. In the absence of any software framework on top to exploit the GPU's potential, it will be hard to go with the Vega cards, though. For one, the much-discussed ROCm and MIOpen initiatives have not been released yet.
In general what are your thoughts?
(Oh and by the way, if you need help with benchmarking on an APU system, now that things are working for me, I would be happy to help out!)
Well, I will spend more money on a GPU in about 2-5 months. The question then will be which GPU to get (NVIDIA vs. AMD). At this point, the answer is tilting toward NVIDIA, but I really want to give AMD a good chance first. Between MIOpen, ROCm, and this project, the most robust path on AMD GPUs seems less well defined.
This probably goes outside the scope of this discussion...in which case, could you point me to lists of optimizations that still need to be done?
Yes this is definitely getting out of scope. Please open an issue on https://github.com/lukeiwanski/tensorflow about the optimizations that can be done to avoid copies.