
Comments (7)

emgucv commented on June 9, 2024

The most reliable way to speed up the start-up time on an RTX A4000 (or any Ampere compute capability 8.6 card) is to pre-compile SASS targeting sm_86.

The tensorflow v2.6 release has the following options:
https://github.com/tensorflow/tensorflow/blob/v2.6.3/.bazelrc#L576
build:release_gpu_base --repo_env=TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_50,sm_60,sm_70,sm_75,compute_80"
Note that compute_80 means compiling to PTX for compute capability 8.0, so when running on an RTX A4000 the Nvidia driver has to JIT-compile that PTX 8.0 into SASS 8.6 at load time, and that conversion is where the delay comes from.
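To see why adding sm_86 would avoid the JIT step, here is a simplified model of the fatbin selection rules (a sketch, not Nvidia's actual loader: SASS is binary-compatible only within the same major version for equal-or-newer device minor revisions, while PTX can be JIT-compiled for any equal-or-newer architecture):

```python
def kernel_source(device_cc, embedded):
    """Sketch of CUDA fatbin selection (simplified model, not the real loader).

    device_cc: device compute capability as an int, e.g. 86 for sm_86.
    embedded:  entries like "sm_75" (precompiled SASS) or "compute_80" (PTX).
    Returns ("native", cc), ("jit", cc), or ("unsupported", None).
    """
    sass = [int(e[3:]) for e in embedded if e.startswith("sm_")]
    ptx = [int(e[8:]) for e in embedded if e.startswith("compute_")]
    # SASS runs directly only on the same major version, equal-or-newer minor.
    native = [c for c in sass if c // 10 == device_cc // 10 and c <= device_cc]
    if native:
        return ("native", max(native))
    # Otherwise the driver JIT-compiles the newest PTX not above the device.
    usable = [c for c in ptx if c <= device_cc]
    if usable:
        return ("jit", max(usable))
    return ("unsupported", None)

flags = ["sm_35", "sm_50", "sm_60", "sm_70", "sm_75", "compute_80"]
print(kernel_source(86, flags))              # RTX A4000: falls back to JIT
print(kernel_source(86, flags + ["sm_86"]))  # with sm_86 SASS: runs natively
```

With the v2.6 flag list, an sm_86 device finds no compatible SASS and must JIT the compute_80 PTX; adding sm_86 gives it a native binary to load directly.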

The tensorflow v2.8 release has the following options:
https://github.com/tensorflow/tensorflow/blob/v2.8.0/.bazelrc#L608
build:release_gpu_base --repo_env=TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_50,sm_60,sm_70,sm_75,compute_80"
It is the same unchanged configuration, so it won't speed up the start-up time. The same applies to the tensorflow v2.7 release.

If you add sm_86 to the compute capability list and recompile Emgu TF with the flag, e.g. change it to
build:release_gpu_base --repo_env=TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_50,sm_60,sm_70,sm_75,sm_86,compute_80"
it should make start-up on an RTX A4000 much faster. The resulting binary will be larger, though.
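For anyone rebuilding locally, the override can be sketched as a config fragment like the one below (hedged: the exact bazel target and config name depend on your platform and TensorFlow version; the repo_env flag simply picks up this variable):

```shell
# Hypothetical override sketch: export the extended capability list before
# configuring/building TensorFlow, so bazel picks it up via --repo_env.
export TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_50,sm_60,sm_70,sm_75,sm_86,compute_80"

# then rebuild, e.g. (exact target/config depends on your platform):
#   bazel build --config=release_gpu_base //tensorflow/tools/lib_package:libtensorflow
```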

from emgutf.

emgucv commented on June 9, 2024

Btw, our release will use the same default Tensorflow compilation flags to make sure Emgu TF behaves the same as the official Tensorflow release.

from emgutf.

SohlKim commented on June 9, 2024

It does look like we're running into PTX compilation on Ampere and it sounds like this may be resolved whenever the official Tensorflow release is updated to natively support sm_80 (or even sm_86).

However, on upgrading from EmguTF 2.2 to EmguTF 2.6, we're now seeing PTX compilation for older cards as well, including all Turing (sm_75) hardware. We've confirmed that EmguTF 2.6 is reporting: TensorFlow was not built with CUDA kernel binaries compatible with compute capability 7.5. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer., which seems unintended based on the flags listed for tensorflow 2.6 above.

Is there a way to tell what compute EmguTF 2.6 was precompiled for?

from emgutf.

emgucv commented on June 9, 2024

Found this command to list the compute capabilities included in the dll:
cuobjdump -ptx .\tfextern.dll > out.txt

I tested it against the Emgu TF 2.6 release; the header says:

Fatbin ptx code:
================
arch = sm_52
code version = [7,3]
producer = <unknown>
host = windows
compile_size = 64bit
compressed
...

That means only PTX for sm_52 is included in the Emgu TF 2.6 CUDA release for Windows.
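That output can be summarized with a small parser (a hypothetical helper; it only assumes cuobjdump's `Fatbin ptx code` / `Fatbin elf code` section headers and the `arch = sm_XX` lines shown above):

```python
import re

def embedded_archs(cuobjdump_output):
    """Group the `arch = sm_XX` entries by fatbin section.

    "elf" entries are precompiled SASS; "ptx" entries need a driver JIT
    pass on devices they were not compiled for.
    """
    archs = {"ptx": set(), "elf": set()}
    section = None
    for line in cuobjdump_output.splitlines():
        if "Fatbin ptx code" in line:
            section = "ptx"
        elif "Fatbin elf code" in line:
            section = "elf"
        match = re.search(r"arch = sm_(\d+)", line)
        if match and section:
            archs[section].add(int(match.group(1)))
    return archs

header = """Fatbin ptx code:
================
arch = sm_52
code version = [7,3]
"""
print(embedded_archs(header))  # {'ptx': {52}, 'elf': set()}
```

An empty "elf" set, as in the 2.6 dump, means no SASS is shipped at all, so every GPU pays the JIT cost on first load.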

I am tracing back the build script. For the 2.6 release, we used this build script:
https://github.com/emgucv/emgutf/blob/2.6.0/platforms/windows/bazel_build_tf.bat#L111
The line I am highlighting sets the compute capability to 5.2.

The reason is that we are using the tensorflow build script for windows here:
https://github.com/tensorflow/tensorflow/blob/v2.6.0/tensorflow/tools/ci_build/windows/libtensorflow_gpu.sh
which references the common_env.sh script here:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/windows/bazel/common_env.sh#L60
The highlighted line enables only compute capability 6.0 for the windows build. This overrides the default bazel configuration here:
https://github.com/tensorflow/tensorflow/blob/v2.6.0/.bazelrc#L576

I remember there was a request to make Emgu TF compatible with older 5.2 devices and that's why our build script changes that to 5.2 instead of 6.0.

I will work on enabling a full list of archs for our windows build. I will try to set:
TF_CUDA_COMPUTE_CAPABILITIES="sm_35,sm_50,sm_60,sm_70,sm_75,sm_86,compute_80"
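As a quick sanity check on such a flag value, here is a small validator (a hypothetical helper, only assuming the comma-separated sm_XX/compute_XX syntax used in the lines above):

```python
import re

def parse_capabilities(spec):
    """Split a TF_CUDA_COMPUTE_CAPABILITIES value into SASS and PTX targets.

    Raises ValueError on entries that match neither sm_XX (precompiled
    SASS) nor compute_XX (PTX kept for driver JIT).
    """
    sass, ptx = [], []
    for entry in spec.split(","):
        match = re.fullmatch(r"(sm|compute)_(\d+)", entry.strip())
        if not match:
            raise ValueError(f"unrecognized capability: {entry!r}")
        (sass if match.group(1) == "sm" else ptx).append(int(match.group(2)))
    return sass, ptx

spec = "sm_35,sm_50,sm_60,sm_70,sm_75,sm_86,compute_80"
sass, ptx = parse_capabilities(spec)
print(sass, ptx)  # [35, 50, 60, 70, 75, 86] [80]
```

Checking that 86 appears in the SASS list confirms Ampere cards such as the RTX A4000 will get a precompiled binary rather than a JIT pass.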

I will keep you posted with updates.

from emgutf.

emgucv commented on June 9, 2024

The current build script has been updated:
https://github.com/emgucv/emgutf/blob/master/platforms/windows/bazel_build_tf.bat#L134
SET TF_CUDA_COMPUTE_CAPABILITIES=sm_35,sm_50,sm_60,sm_70,sm_75,sm_80,sm_86,compute_80
The next release will have the above compute capabilities enabled.

from emgutf.

emgucv commented on June 9, 2024

FYI, the new release v2.8.0 is out with the above compute capabilities enabled.

from emgutf.

emgucv commented on June 9, 2024

Closing ticket now.

from emgutf.
