Comments (3)
@zamazan4ik, Thank you for the detailed explanation. We will discuss this implementation internally and update this thread.
from serving.
We have documented a Performance Guide for TensorFlow Serving to help users get optimal model server performance.
Can you please explain in detail what needs to be done on our end to implement PGO with TensorFlow Serving? Based on that, I can take this feature implementation to the team. Thank you!
> Can you please explain in detail what needs to be done from our end to implement PGO with Tensorflow Serving? Based on that I can take this feature implementation to the team.
Sure! First, you need to integrate the PGO-specific compiler flags into your build pipeline (the relevant flags are described here for Clang and here for GCC; if you want to support other compilers, please consult the corresponding documentation for those compilers). I recommend starting with Instrumentation PGO, since it is generally easier to implement.
Below I have collected some examples of how PGO is integrated into the build scripts of other projects, so you can take a look at existing implementations:
- ISPC: CMake scripts
- Rustc: a CI script for the multi-stage build
- GCC:
- Clang: Docs
- Python:
- Go: Bash script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP - Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
- file.d: GitHub PR
- OceanBase: CMake flag
- NodeJS: Configure script
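As a rough illustration of what integrating the flags into the build pipeline could look like for TF Serving, which builds with Bazel: the standard Clang instrumentation flags can be passed via `--copt`/`--linkopt`. This is a sketch under the assumption that the usual model-server target is being built, not a tested recipe:

```shell
# Sketch: build TF Serving with Clang PGO instrumentation via Bazel.
# -fprofile-instr-generate is the standard Clang instrumentation flag;
# it must be passed at both compile and link time.
bazel build \
  --copt=-fprofile-instr-generate \
  --linkopt=-fprofile-instr-generate \
  //tensorflow_serving/model_servers:tensorflow_model_server
```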
After that, you need to perform the PGO training and optimization phases on your benchmarks, so you can estimate whether PGO has any positive effect on TF Serving performance (RPS, CPU usage).
This process is simple (for the Clang compiler):
- Compile TF Serving in Instrumentation mode (the `-fprofile-instr-generate` compiler option for Clang)
- Run the instrumented TF Serving on the benchmark workload
- After the run finishes, TF Serving should generate some `.profraw` files
- Merge them with `llvm-profdata`
- Recompile TF Serving once again with the profile information generated above
- Congratulations, you have a PGO-optimized TF Serving binary! Run the benchmarks once again to measure the performance improvements
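The steps above can be sketched as a shell session. The Bazel target is the usual TF Serving server target; the model path and benchmark driver are hypothetical placeholders:

```shell
# 1. Build an instrumented binary (Clang PGO instrumentation).
bazel build --copt=-fprofile-instr-generate --linkopt=-fprofile-instr-generate \
  //tensorflow_serving/model_servers:tensorflow_model_server

# 2. Run it on a representative benchmark workload. The raw profiles are
#    written where LLVM_PROFILE_FILE points (%p expands to the PID).
LLVM_PROFILE_FILE="prof/%p.profraw" \
  ./bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
  --model_base_path=/models/my_model &   # hypothetical model path
SERVER_PID=$!
./run_benchmark.sh                       # hypothetical benchmark driver
kill "$SERVER_PID"; wait "$SERVER_PID"   # profiles are flushed on exit

# 3. Merge the raw profiles into a single .profdata file.
llvm-profdata merge -output=tfs.profdata prof/*.profraw

# 4. Rebuild with the merged profile to get the optimized binary.
bazel build --copt=-fprofile-instr-use=tfs.profdata \
  //tensorflow_serving/model_servers:tensorflow_model_server
```

Note that the instrumented server must shut down cleanly (or have the profile flushed explicitly), otherwise the `.profraw` files may be empty or missing.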
Only after that should you think about optimizing the TF Serving prebuilt binaries with some predefined sample real-life workload. You need to choose the sample workload, integrate profile gathering into your CI/CD pipeline, etc. The links above also give some insight into this approach.
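One common shape for that (used, e.g., by Go and Chromium) is to check a pre-collected merged profile into the repository and consume it in release builds. The profile path below is an assumption, not a TF Serving convention:

```shell
# Hypothetical release-build step consuming a profile collected earlier
# from the chosen sample workload and committed to the repository.
PROFILE=profiles/tensorflow_model_server.profdata
bazel build --copt=-fprofile-instr-use="$PROFILE" \
  //tensorflow_serving/model_servers:tensorflow_model_server
```

A checked-in profile keeps release builds reproducible; the trade-off is that it must be periodically regenerated so it keeps matching the current code.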
> We have documented Performance Guide for Tensorflow Serving to help users get optimal model server performance.
Awesome that you have such a guide! If PGO has a positive effect on TF Serving performance, I think you can extend this guide with an additional chapter about rebuilding TF Serving with PGO, or even create a dedicated page about PGO in the TF Serving documentation. Here I have collected some examples of such documentation from various projects (maybe they can help you shape your PGO documentation for TF Serving):
- ClickHouse: https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
- Databend: https://databend.rs/doc/contributing/pgo
- Vector: https://vector.dev/docs/administration/tuning/pgo/
- Nebula: https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
- GCC: Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
- Clang:
Hope this information was helpful!