Comments (11)
Did you use the --gpus parameter when starting the Docker container?
from towhee.
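For reference, a minimal sketch of launching the Triton container with GPU access; the image tag, port mappings, and model-repository path below are illustrative assumptions, not taken from this thread:

```shell
# Hypothetical launch command; adjust the image tag and paths to your setup.
# --gpus all exposes the host GPUs to the container; ports 8000/8001/8002
# are Triton's default HTTP, gRPC, and metrics ports.
docker run -d --gpus all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:22.07-py3 \
  tritonserver --model-repository=/models
```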
This problem was solved after I restarted the container, but a new error occurred when executing the program.
Traceback (most recent call last):
  File "/home/eg/PycharmProjects/Towhee/triton_endcod.py", line 8, in <module>
    res = client(data)
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/towhee/serve/triton/pipeline_client.py", line 81, in __call__
    return self._loop.run_until_complete(self._call(inputs))[0]
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/towhee/serve/triton/pipeline_client.py", line 68, in _call
    response = await self._client.infer(self._model_name, inputs)
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/tritonclient/http/aio/__init__.py", line 757, in infer
    response = await self._post(
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/tritonclient/http/aio/__init__.py", line 209, in _post
    res = await self._stub.post(
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/aiohttp/client.py", line 586, in _request
    await resp.start(conn)
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 920, in start
    self._continue = None
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/aiohttp/helpers.py", line 725, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
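One way to narrow down a client-side asyncio.TimeoutError like the one above is to probe Triton's standard HTTP health endpoint directly before calling the pipeline client. A minimal sketch using only the standard library; the host and port are assumptions based on Triton's default HTTP port, so adjust them to your own port mapping:

```python
# Minimal readiness probe against Triton's /v2/health/ready endpoint.
# Host and port are assumptions; change them if you remapped ports.
import urllib.request

def triton_ready(host="127.0.0.1", port=8000, timeout=5.0):
    url = f"http://{host}:{port}/v2/health/ready"
    try:
        # Triton answers HTTP 200 on this endpoint when it can serve requests.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused or timed out: the server is unreachable.
        return False
```

If this returns False while the container is running, the problem is usually the port mapping or a firewall rather than the model itself.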
from towhee.
Yes, the problem was solved after I recreated the container, but a new problem appeared. Do you know how to solve this problem?
from towhee.
It seems that access to the Triton server timed out. Are there any logs on the server?
from towhee.
It seems that access to the Triton server timed out. Are there any logs on the server?
docker logs shows the following:
NVIDIA Release 22.07 (build 41737377)
Triton Server Version 2.24.0
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
I1109 06:53:09.532688 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f6a4e000000' with size 268435456
I1109 06:53:09.533016 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1109 06:53:09.536004 1 model_repository_manager.cc:1206] loading: pipeline:1
I1109 06:53:09.536049 1 model_repository_manager.cc:1206] loading: sentence-embedding.sbert-0:1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:11.225232 1 onnxruntime.cc:2458] TRITONBACKEND_Initialize: onnxruntime
I1109 06:53:11.225295 1 onnxruntime.cc:2468] Triton TRITONBACKEND API version: 1.10
I1109 06:53:11.225317 1 onnxruntime.cc:2474] 'onnxruntime' TRITONBACKEND API version: 1.10
I1109 06:53:11.225331 1 onnxruntime.cc:2504] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I1109 06:53:11.259270 1 onnxruntime.cc:2560] TRITONBACKEND_ModelInitialize: sentence-embedding.sbert-0 (version 1)
W1109 06:53:14.630221 1 onnxruntime.cc:787] autofilled max_batch_size to 4 for model 'sentence-embedding.sbert-0' since batching is supported but no max_batch_size is specified in model configuration. Must specify max_batch_size to utilize autofill with a larger max batch size
I1109 06:53:14.685000 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_0 (CPU device 0)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:17.996107 1 onnxruntime.cc:2603] TRITONBACKEND_ModelInstanceInitialize: sentence-embedding.sbert-0_0 (GPU device 0)
I1109 06:53:20.312004 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_1 (CPU device 0)
I1109 06:53:20.312255 1 model_repository_manager.cc:1352] successfully loaded 'sentence-embedding.sbert-0' version 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:23.568245 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_2 (CPU device 0)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:26.839855 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_3 (CPU device 0)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:30.081773 1 model_repository_manager.cc:1352] successfully loaded 'pipeline' version 1
I1109 06:53:30.082043 1 server.cc:559]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I1109 06:53:30.082215 1 server.cc:586]
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/b |
| | | ackends","default-max-batch-size":"4"}} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/b |
| | | ackends","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
I1109 06:53:30.082348 1 server.cc:629]
+----------------------------+---------+--------+
| Model | Version | Status |
+----------------------------+---------+--------+
| pipeline | 1 | READY |
| sentence-embedding.sbert-0 | 1 | READY |
+----------------------------+---------+--------+
I1109 06:53:30.135753 1 metrics.cc:650] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090
I1109 06:53:30.136027 1 tritonserver.cc:2176]
I1109 06:53:30.137643 1 grpc_server.cc:4608] Started GRPCInferenceService at 0.0.0.0:8001
I1109 06:53:30.137940 1 http_server.cc:3312] Started HTTPService at 0.0.0.0:8000
I1109 06:53:30.179419 1 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
from towhee.
Check that the server is available: curl http://0.0.0.0:8000/v2/models/stats
from towhee.
Check that the server is available: curl http://0.0.0.0:8000/v2/models/stats
I mapped the local port to 8010, so I get the following result. What might be the cause of the error in this case? Thank you for your help.
(base) eg@eg-HP-Z8-G4-Workstation:~$ curl http://0.0.0.0:8010/v2/models/stats
{"model_stats":[{"name":"pipeline","version":"1","last_inference":0,"inference_count":0,"execution_count":0,"inference_stats":{"success":{"count":0,"ns":0},"fail":{"count":0,"ns":0},"queue":{"count":0,"ns":0},"compute_input":{"count":0,"ns":0},"compute_infer":{"count":0,"ns":0},"compute_output":{"count":0,"ns":0},"cache_hit":{"count":0,"ns":0},"cache_miss":{"count":0,"ns":0}},"batch_stats":[]},{"name":"sentence-embedding.sbert-0","version":"1","last_inference":0,"inference_count":0,"execution_count":0,"inference_stats":{"success":{"count":0,"ns":0},"fail":{"count":0,"ns":0},"queue":{"count":0,"ns":0},"compute_input":{"count":0,"ns":0},"compute_infer":{"count":0,"ns":0},"compute_output":{"count":0,"ns":0},"cache_hit":{"count":0,"ns":0},"cache_miss":{"count":0,"ns":0}},"batch_stats":[]}]}
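Note that both models report an inference_count of 0, which suggests the request never reached model execution at all. A small sketch of checking this programmatically; the JSON literal below is abridged from the stats output above:

```python
# Parse an abridged /v2/models/stats response and report per-model
# inference counts; a count of 0 means no request completed on that model.
import json

stats_json = '{"model_stats": [' \
             '{"name": "pipeline", "version": "1", "inference_count": 0},' \
             '{"name": "sentence-embedding.sbert-0", "version": "1", "inference_count": 0}]}'

stats = json.loads(stats_json)
for model in stats["model_stats"]:
    print(model["name"], "inference_count:", model["inference_count"])
# prints:
# pipeline inference_count: 0
# sentence-embedding.sbert-0 inference_count: 0
```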
from towhee.
Try ops.sentence_embedding.transformers; sbert has some bugs. This pipeline works fine.
from towhee.
Try ops.sentence_embedding.transformers; sbert has some bugs. This pipeline works fine.
Thank you for your help; I think my problem has been resolved. My other question is: which parameter settings can further improve encoding speed by accelerating model inference through the Triton server?
from towhee.
It is possible to optimize performance by adjusting parameters such as the number of instances and batch size. For more information, please refer to the Triton documentation: https://github.com/triton-inference-server/server
from towhee.
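To make the instance-count and batch-size advice concrete, here is a sketch of the relevant config.pbtxt settings described in Triton's model-configuration documentation; the instance count, batch sizes, and queue delay below are illustrative assumptions, not tuned recommendations for this model:

```
name: "sentence-embedding.sbert-0"
max_batch_size: 32
# Run two copies of the model on GPU 0 so requests can be served in parallel.
instance_group [
  { count: 2, kind: KIND_GPU, gpus: [ 0 ] }
]
# Let Triton batch concurrent requests together before running inference.
dynamic_batching {
  preferred_batch_size: [ 8, 16, 32 ]
  max_queue_delay_microseconds: 100
}
```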
It is possible to optimize performance by adjusting parameters such as the number of instances and batch size. For more information, please refer to the Triton documentation: https://github.com/triton-inference-server/server
Thank you very much for your help. I think my problem has been resolved.
from towhee.
Related Issues (20)
- [DesignProposal]: we need offline mode HOT 1
- [Bug]: Memory leak when using pipeline('towhee/audio-embedding-vggish') multiple times HOT 11
- [Bug]: ValidationError: 101 validation errors for PointStruct vector -> 0 value is not a valid float (type=type_error.float) HOT 5
- RuntimeError: Loading operator with error:Load operator failed HOT 2
- [Bug]: After obtaining audio vectors with Towhee's vggish model, how can the similarity between two audio vectors be computed? DTW? HOT 6
- [Feature]: Can it load the model locally? HOT 18
- [Bug]: RuntimeError: error checking inheritance of Ellipsis (type: ellipsis) HOT 2
- [Bug]: How to actively release video memory HOT 2
- Embedding model fails to load HOT 12
- [Bug]: application cannot start. `from towhee import AutoPipes` crashes HOT 4
- [Bug]: AutoPipes.pipeline('sentence_embedding', config=config) failed at pydantic 2.5.x HOT 10
- [Bug]: ImportError: cannot import name 'AutoPipes' from partially initialized module 'towhee' (most likely due to a circular import) HOT 2
- [Bug]: RuntimeError: Loading operator with error:Load operator failed HOT 9
- [Bug]: Triton in Towhee example fails in latest versions of Tritonserver HOT 3
- [Feature]: How to Release GPU Memory Occupation HOT 2
- [Enhancement]: HTTP API should support posting video / image files to compute embeddings and other operations. HOT 2
- [Bug]: When building the triton backend, using custom operators may result in errors HOT 4
- [Documentation]: How to get embeddings with CVNet HOT 1
- cannot import name 'pipe' from partially initialized module 'towhee' (most likely due to a circular import) HOT 9