
Comments (7)

emgucv commented on June 9, 2024

The most reliable way to speed up the start-up time on an RTX A4000 (or any Ampere compute capability 8.6 card) is to pre-compile SASS targeting sm_86.

The tensorflow v2.6 release has the following options:
https://github.com/tensorflow/tensorflow/blob/v2.6.3/.bazelrc#L576
build:release_gpu_base --repo_env=TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_50,sm_60,sm_70,sm_75,compute_80"
Note that compute_80 means compiling to PTX for compute capability 8.0, so when running on an RTX A4000 the Nvidia driver has to JIT-compile that PTX 8.0 into SASS 8.6 at load time, and that conversion is where the delay comes from.
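To see why adding sm_86 would avoid the JIT step, here is a simplified model of the fatbin selection rules (a sketch, not Nvidia's actual loader: SASS is binary-compatible only within the same major version for equal-or-newer device minor revisions, while PTX can be JIT-compiled for any equal-or-newer architecture):

```python
def kernel_source(device_cc, embedded):
    """Sketch of CUDA fatbin selection (simplified model, not the real loader).

    device_cc: device compute capability as an int, e.g. 86 for sm_86.
    embedded:  entries like "sm_75" (precompiled SASS) or "compute_80" (PTX).
    Returns ("native", cc), ("jit", cc), or ("unsupported", None).
    """
    sass = [int(e[3:]) for e in embedded if e.startswith("sm_")]
    ptx = [int(e[8:]) for e in embedded if e.startswith("compute_")]
    # SASS runs directly only on the same major version, equal-or-newer minor.
    native = [c for c in sass if c // 10 == device_cc // 10 and c <= device_cc]
    if native:
        return ("native", max(native))
    # Otherwise the driver JIT-compiles the newest PTX not above the device.
    usable = [c for c in ptx if c <= device_cc]
    if usable:
        return ("jit", max(usable))
    return ("unsupported", None)

flags = ["sm_35", "sm_50", "sm_60", "sm_70", "sm_75", "compute_80"]
print(kernel_source(86, flags))              # RTX A4000: falls back to JIT
print(kernel_source(86, flags + ["sm_86"]))  # with sm_86 SASS: runs natively
```

With the v2.6 flag list, an sm_86 device finds no compatible SASS and must JIT the compute_80 PTX; adding sm_86 gives it a native binary to load directly.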

The tensorflow v2.8 release has the following options:
https://github.com/tensorflow/tensorflow/blob/v2.8.0/.bazelrc#L608
build:release_gpu_base --repo_env=TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_50,sm_60,sm_70,sm_75,compute_80"
It is the same unchanged configuration, so it won't speed up the start-up time. The same applies to the tensorflow v2.7 release.

If you add sm_86 to the compute capability list and recompile Emgu TF with the flag, e.g. change it to
build:release_gpu_base --repo_env=TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_50,sm_60,sm_70,sm_75,sm_86,compute_80"
it should make start-up on an RTX A4000 much faster. The resulting binary will be larger, though.
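For anyone rebuilding locally, the override can be sketched as a config fragment like the one below (hedged: the exact bazel target and config name depend on your platform and TensorFlow version; the repo_env flag simply picks up this variable):

```shell
# Hypothetical override sketch: export the extended capability list before
# configuring/building TensorFlow, so bazel picks it up via --repo_env.
export TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_50,sm_60,sm_70,sm_75,sm_86,compute_80"

# then rebuild, e.g. (exact target/config depends on your platform):
#   bazel build --config=release_gpu_base //tensorflow/tools/lib_package:libtensorflow
```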

from emgutf.

emgucv commented on June 9, 2024

Btw, our release will use the same default Tensorflow compilation flags to make sure Emgu TF behaves the same as the official Tensorflow release.

from emgutf.

SohlKim commented on June 9, 2024

It does look like we're running into PTX compilation on Ampere and it sounds like this may be resolved whenever the official Tensorflow release is updated to natively support sm_80 (or even sm_86).

However, on upgrading from EmguTF 2.2 to EmguTF 2.6, we're now seeing PTX compilation for older cards as well, including all Turing (sm_75) hardware. We've confirmed that EmguTF 2.6 is reporting: TensorFlow was not built with CUDA kernel binaries compatible with compute capability 7.5. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer., which seems unintended based on the flags listed for tensorflow 2.6 above.

Is there a way to tell what compute EmguTF 2.6 was precompiled for?

from emgutf.

emgucv commented on June 9, 2024

Found this command to list the compute capabilities included in the dll:
cuobjdump -ptx .\tfextern.dll > out.txt

I tested it against the Emgu TF 2.6 release; the header says:

Fatbin ptx code:
================
arch = sm_52
code version = [7,3]
producer = <unknown>
host = windows
compile_size = 64bit
compressed
...

That means only PTX for sm_52 is included in the Emgu TF 2.6 CUDA release for Windows.
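That output can be summarized with a small parser (a hypothetical helper; it only assumes cuobjdump's `Fatbin ptx code` / `Fatbin elf code` section headers and the `arch = sm_XX` lines shown above):

```python
import re

def embedded_archs(cuobjdump_output):
    """Group the `arch = sm_XX` entries by fatbin section.

    "elf" entries are precompiled SASS; "ptx" entries need a driver JIT
    pass on devices they were not compiled for.
    """
    archs = {"ptx": set(), "elf": set()}
    section = None
    for line in cuobjdump_output.splitlines():
        if "Fatbin ptx code" in line:
            section = "ptx"
        elif "Fatbin elf code" in line:
            section = "elf"
        match = re.search(r"arch = sm_(\d+)", line)
        if match and section:
            archs[section].add(int(match.group(1)))
    return archs

header = """Fatbin ptx code:
================
arch = sm_52
code version = [7,3]
"""
print(embedded_archs(header))  # {'ptx': {52}, 'elf': set()}
```

An empty "elf" set, as in the 2.6 dump, means no SASS is shipped at all, so every GPU pays the JIT cost on first load.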

I am tracing back the build script. For the 2.6 release, we used this build script:
https://github.com/emgucv/emgutf/blob/2.6.0/platforms/windows/bazel_build_tf.bat#L111
The line I am highlighting sets the compute capability to 5.2.

The reason is that we are using the tensorflow build script for windows here:
https://github.com/tensorflow/tensorflow/blob/v2.6.0/tensorflow/tools/ci_build/windows/libtensorflow_gpu.sh
which references the common_env.sh script here:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/windows/bazel/common_env.sh#L60
The highlighted line enables only compute capability 6.0 for the windows build. This overrides the default bazel configuration here:
https://github.com/tensorflow/tensorflow/blob/v2.6.0/.bazelrc#L576

I remember there was a request to make Emgu TF compatible with older 5.2 devices and that's why our build script changes that to 5.2 instead of 6.0.

I will work on enabling a full list of archs for our windows build. I will try to set:
TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_50,sm_60,sm_70,sm_75,sm_86,compute_80"
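As a quick sanity check on such a flag value, here is a small validator (a hypothetical helper, only assuming the comma-separated sm_XX/compute_XX syntax used in the lines above):

```python
import re

def parse_capabilities(spec):
    """Split a TF_CUDA_COMPUTE_CAPABILITIES value into SASS and PTX targets.

    Raises ValueError on entries that match neither sm_XX (precompiled
    SASS) nor compute_XX (PTX kept for driver JIT).
    """
    sass, ptx = [], []
    for entry in spec.split(","):
        match = re.fullmatch(r"(sm|compute)_(\d+)", entry.strip())
        if not match:
            raise ValueError(f"unrecognized capability: {entry!r}")
        (sass if match.group(1) == "sm" else ptx).append(int(match.group(2)))
    return sass, ptx

spec = "sm_35,sm_50,sm_60,sm_70,sm_75,sm_86,compute_80"
sass, ptx = parse_capabilities(spec)
print(sass, ptx)  # [35, 50, 60, 70, 75, 86] [80]
```

Checking that 86 appears in the SASS list confirms Ampere cards such as the RTX A4000 will get a precompiled binary rather than a JIT pass.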

I will keep you posted with updates.

from emgutf.

emgucv commented on June 9, 2024

The current build script has been updated:
https://github.com/emgucv/emgutf/blob/master/platforms/windows/bazel_build_tf.bat#L134
SET TF_CUDA_COMPUTE_CAPABILITIES=sm_35,sm_50,sm_60,sm_70,sm_75,sm_80,sm_86,compute_80
The next release will have the above compute capabilities enabled.

from emgutf.

emgucv commented on June 9, 2024

FYI, the new release v2.8.0 is out with the above compute capabilities enabled.

from emgutf.

emgucv commented on June 9, 2024

Closing ticket now.

from emgutf.
