Comments (4)
Hi @anantzoid,
Thanks for raising an issue!
All of these forms work for me:
tritonserver --model-store models --cache-config local,size=1048576
tritonserver --model-store models --cache-config "local,size=1048576"
tritonserver --model-store models --cache-config=local,size=1048576
- Can you share the full command and corresponding full error/log you're getting for this format?
- Also, can you share which shell you're using (e.g., the output of echo ${SHELL})?
from server.
Thanks for the quick reply @rmccorm4!
Here's the error:
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 24.01 (build 80100513)
Triton Server Version 2.42.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 12.3 driver version 545.23.08 with kernel driver version 535.129.03.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
tritonserver: unrecognized option '--cache-config local,size=10485760'
I'm running it inside a Kubernetes pod, so it's not an interactive shell.
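For context on why only some forms work in a pod: a Kubernetes `args:` list passes each list item to the container as a single argv entry, with no shell word splitting. A minimal sketch of the difference, using a hypothetical printargs helper (not tritonserver itself) that just prints each argv entry it receives:

```shell
# printargs is a stand-in option parser: it prints each argv entry it
# receives, one bracketed entry per line, to show how arguments split.
printargs() { for a in "$@"; do printf '[%s]\n' "$a"; done; }

# A single quoted string (like one Kubernetes args: item containing
# "--cache-config local,size=10485760") arrives as ONE argv entry,
# which an option parser rejects as one unknown flag:
printargs '--cache-config local,size=10485760'

# Two separate entries, or the --flag=value form (still one entry,
# but well-formed on its own), parse correctly:
printargs --cache-config local,size=10485760
printargs --cache-config=local,size=10485760
```

In a pod spec that corresponds to either two separate `args:` entries ("--cache-config", "local,size=10485760") or one entry in the `--cache-config=local,size=10485760` form, rather than one entry containing the space.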
I see, thanks!
Would you like to contribute the quick doc change? (You'd need to sign and email the CLA outlined in CONTRIBUTING.md)
Otherwise I can make the quick change.
Thanks,
Ryan
Seems like it'll be quicker if you do it on your end. Thanks!