Description: I am deploying a YOLOv8 model for object-detection us…


Sidecar Container CPU Throttling when Deploying using Triton with ONNX Backend on Kubernetes about server HOT 2 OPEN

langong347 commented on May 29, 2024


Comments (2)

fpetrini15 commented on May 29, 2024

Hi @langong347,

Thank you for submitting an issue.

I notice your config does not override the default value of intra_op_thread_count, so yes, I believe the number of threads corresponds directly to the number of CPU cores. Whether these threads are active or affect performance depends on the workload.

Have you tried setting intra_op_thread_count to a value lower than the number of CPU cores and checking its impact on performance and CPU throttling in your environment?
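One way to pick a value below the core count is to derive it from the container's CPU limit. The sketch below assumes the pod runs on cgroup v2, where the limit appears in `/sys/fs/cgroup/cpu.max` as `<quota> <period>` (or `max <period>` when unlimited); the helper names `threads_from_cpu_max`, `container_thread_count`, and `intra_op_parameter` are hypothetical. It prints a `parameters` entry for the model's `config.pbtxt`, following the `intra_op_thread_count` parameter described in the Triton ONNX Runtime backend README; check that syntax against the backend version you actually deploy.

```python
import math
import os
from pathlib import Path


def threads_from_cpu_max(cpu_max: str, fallback: int) -> int:
    """Derive a thread count from the contents of a cgroup v2 cpu.max file.

    cpu.max holds "<quota> <period>" in microseconds, or "max <period>"
    when the container has no CPU limit.
    """
    quota, period = cpu_max.split()
    if quota == "max":
        return fallback  # no limit set: use the caller's fallback
    # e.g. 250000 / 100000 = 2.5 CPUs; round down, but keep at least 1 thread
    return max(1, math.floor(int(quota) / int(period)))


def container_thread_count(path: str = "/sys/fs/cgroup/cpu.max") -> int:
    # Assumption: os.cpu_count() usually reports the node's cores, not the
    # container's limit, so it is only used when cpu.max is unavailable.
    fallback = os.cpu_count() or 1
    try:
        return threads_from_cpu_max(Path(path).read_text(), fallback)
    except (OSError, ValueError):
        # Not on cgroup v2, or the file is missing/unparseable.
        return fallback


def intra_op_parameter(threads: int) -> str:
    """Render the ONNX Runtime backend parameter in config.pbtxt syntax."""
    return (
        'parameters { key: "intra_op_thread_count" '
        f'value: {{ string_value: "{threads}" }} }}'
    )


if __name__ == "__main__":
    print(intra_op_parameter(container_thread_count()))
```

For example, a Kubernetes limit of `cpu: 2500m` with the default 100 ms period yields a `cpu.max` of `250000 100000`, so the script would emit a thread count of 2. Rounding down leaves some headroom under the quota, which may reduce throttling; whether it improves latency still depends on the workload.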

CC @whoisj for more thoughts.


Related Issues (20)

decouple model abnormal dynamic batching result HOT 4
Significant latency between COMPUTE_END and REQUEST_END HOT 1
Issue on page /user_guide/response_cache.html HOT 4
triton can provide request transmission in the form of a file stream? HOT 1
In Triton, multiple instances of the same model load multiple copies of the model file into memory, leading to CUDA out of memory. Why can't multiple instances share the same model file? HOT 2
Will tensorRT backend be compatible with tensorRT 9.1+ ? HOT 8
Missing :te header when using envoy proxy with grpc-web filter HOT 5
Dynamic batching does not work properly with python backend HOT 1
[400] 'MODEL' version 1 is not at ready state even if /v2/health/ready has succeeded HOT 6
Conda Package for Inference Server HOT 4
Incomplete LLM response HOT 3
After load a model, Triton server suddenly not work that it shows CUDA failed to initialize. Unknown error (error 999). HOT 2
After load a model, Triton server suddenly not work that it shows CUDA failed to initialize. Unknown error (error 999). HOT 7
[CMake error] Building Triton on arm64 machine using build.py HOT 4
Errors from tutorial : Deploying a vLLM model in Triton HOT 3
c++ developer tools API - segmentation fault with multithreaded calls of AsyncInfer HOT 10
only cpu have a error
How to generate rawInputContents with multiple dimensions and multiple input parameters in GRPC? HOT 8
Set cuda_memory_pool_byte_size to solve CNMEM_STATUS_OUT_OF_MEMORY HOT 3

