Comments (3)
@zamazan4ik, Thank you for the detailed explanation. We will discuss this implementation internally and update this thread.
from serving.
We have documented a Performance Guide for TensorFlow Serving to help users get optimal model server performance.
Can you please explain in detail what needs to be done on our end to implement PGO with TensorFlow Serving? Based on that, I can take this feature implementation to the team. Thank you!
> Can you please explain in detail what needs to be done from our end to implement PGO with Tensorflow Serving? Based on that I can take this feature implementation to the team.
Sure! First, you need to integrate the PGO-specific compiler flags into your build pipeline (the relevant flags are described here for Clang and here for GCC; if you want to support other compilers, please consult the corresponding documentation for those compilers). I recommend starting with Instrumentation PGO, since it is generally easier to implement.
Below I have collected some examples of how PGO is integrated into the build scripts of other projects, so you can take a look at existing implementations:
- ISPC: CMake scripts
- Rustc: a CI script for the multi-stage build
- GCC:
- Clang: Docs
- Python:
- Go: Bash script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP - Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
- file.d: GitHub PR
- OceanBase: CMake flag
- NodeJS: Configure script
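As a rough illustration of what integrating the flags into the build pipeline could look like for TF Serving, which builds with Bazel: the standard Clang instrumentation flags can be passed via `--copt`/`--linkopt`. This is a sketch under the assumption that the usual model-server target is being built, not a tested recipe:

```shell
# Sketch: build TF Serving with Clang PGO instrumentation via Bazel.
# -fprofile-instr-generate is the standard Clang instrumentation flag;
# it must be passed at both compile and link time.
bazel build \
  --copt=-fprofile-instr-generate \
  --linkopt=-fprofile-instr-generate \
  //tensorflow_serving/model_servers:tensorflow_model_server
```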
After that, you need to perform the PGO training and optimization phases on your benchmarks, so you can estimate whether PGO has any positive effect on TF Serving performance (RPS, CPU usage).
This process is simple (for the Clang compiler):
- Compile TF Serving in Instrumentation mode (the `-fprofile-instr-generate` compiler option for Clang)
- Run the instrumented TF Serving on the benchmark workload
- After the run finishes, TF Serving should generate some `.profraw` files
- Merge them with `llvm-profdata`
- Recompile TF Serving once again with the profile information generated above
- Congratulations, you have a PGO-optimized TF Serving binary! Run the benchmarks once again to measure the performance improvements
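The steps above can be sketched as a shell session. The Bazel target is the usual TF Serving server target; the model path and benchmark driver are hypothetical placeholders:

```shell
# 1. Build an instrumented binary (Clang PGO instrumentation).
bazel build --copt=-fprofile-instr-generate --linkopt=-fprofile-instr-generate \
  //tensorflow_serving/model_servers:tensorflow_model_server

# 2. Run it on a representative benchmark workload. The raw profiles are
#    written where LLVM_PROFILE_FILE points (%p expands to the PID).
LLVM_PROFILE_FILE="prof/%p.profraw" \
  ./bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
  --model_base_path=/models/my_model &   # hypothetical model path
SERVER_PID=$!
./run_benchmark.sh                       # hypothetical benchmark driver
kill "$SERVER_PID"; wait "$SERVER_PID"   # profiles are flushed on exit

# 3. Merge the raw profiles into a single .profdata file.
llvm-profdata merge -output=tfs.profdata prof/*.profraw

# 4. Rebuild with the merged profile to get the optimized binary.
bazel build --copt=-fprofile-instr-use=tfs.profdata \
  //tensorflow_serving/model_servers:tensorflow_model_server
```

Note that the instrumented server must shut down cleanly (or have the profile flushed explicitly), otherwise the `.profraw` files may be empty or missing.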
Only after that should you think about optimizing the TF Serving prebuilt binaries with some predefined sample real-life workload. You need to choose the sample workload, integrate profile gathering into your CI/CD pipeline, etc. The links above also give some insight into this approach.
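One common shape for that (used, e.g., by Go and Chromium) is to check a pre-collected merged profile into the repository and consume it in release builds. The profile path below is an assumption, not a TF Serving convention:

```shell
# Hypothetical release-build step consuming a profile collected earlier
# from the chosen sample workload and committed to the repository.
PROFILE=profiles/tensorflow_model_server.profdata
bazel build --copt=-fprofile-instr-use="$PROFILE" \
  //tensorflow_serving/model_servers:tensorflow_model_server
```

A checked-in profile keeps release builds reproducible; the trade-off is that it must be periodically regenerated so it keeps matching the current code.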
> We have documented Performance Guide for Tensorflow Serving to help users get optimal model server performance.
Awesome that you have such a guide! If PGO has a positive effect on TF Serving performance, I think you can extend this guide with an additional chapter about rebuilding TF Serving with PGO, or even create a dedicated page about PGO in the TF Serving documentation. Here I have collected some examples of such documentation from various projects (maybe they can help you shape your PGO documentation for TF Serving):
- ClickHouse: https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
- Databend: https://databend.rs/doc/contributing/pgo
- Vector: https://vector.dev/docs/administration/tuning/pgo/
- Nebula: https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
- GCC: Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
- Clang:
Hope this information was helpful!