
Comments (11)

junjiejiangjjj commented on June 2, 2024

Did you use the --gpus param when starting docker?
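For reference, the GPU is usually exposed to the Triton container with Docker's --gpus flag; a minimal sketch (the image name is a placeholder for the image built by towhee, and the port mapping is illustrative):

docker run -td --gpus all \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    <your-towhee-triton-image>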


Mrzhiyao commented on June 2, 2024

This problem was solved after I restarted the container, but a new error occurred when running the program:

Traceback (most recent call last):
  File "/home/eg/PycharmProjects/Towhee/triton_endcod.py", line 8, in <module>
    res = client(data)
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/towhee/serve/triton/pipeline_client.py", line 81, in __call__
    return self._loop.run_until_complete(self._call(inputs))[0]
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/towhee/serve/triton/pipeline_client.py", line 68, in _call
    response = await self._client.infer(self._model_name, inputs)
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/tritonclient/http/aio/__init__.py", line 757, in infer
    response = await self._post(
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/tritonclient/http/aio/__init__.py", line 209, in _post
    res = await self._stub.post(
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/aiohttp/client.py", line 586, in _request
    await resp.start(conn)
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 920, in start
    self._continue = None
  File "/home/eg/anaconda3/envs/towhee38/lib/python3.8/site-packages/aiohttp/helpers.py", line 725, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
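For context, the client call in the traceback is typically issued like the sketch below, assuming Towhee's triton_client API; the URL and input value here are placeholders:

from towhee import triton_client

# The URL must point at the host port that the container's HTTP port (8000) is published on.
client = triton_client.Client(url='localhost:8000')

data = 'hello towhee'   # placeholder input
res = client(data)      # this is the call that raised the TimeoutError above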


Mrzhiyao commented on June 2, 2024

Did you use the --gpus param when starting docker?

Yes, that problem was solved after I recreated the container, but the new error above appeared. Do you know how to solve it?


junjiejiangjjj commented on June 2, 2024

It seems that access to the Triton server timed out. Are there any logs on the server?


Mrzhiyao commented on June 2, 2024

It seems that access to the Triton server timed out. Are there any logs on the server?

docker logs shows the following:

NVIDIA Release 22.07 (build 41737377)
Triton Server Version 2.24.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

I1109 06:53:09.532688 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f6a4e000000' with size 268435456
I1109 06:53:09.533016 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1109 06:53:09.536004 1 model_repository_manager.cc:1206] loading: pipeline:1
I1109 06:53:09.536049 1 model_repository_manager.cc:1206] loading: sentence-embedding.sbert-0:1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:11.225232 1 onnxruntime.cc:2458] TRITONBACKEND_Initialize: onnxruntime
I1109 06:53:11.225295 1 onnxruntime.cc:2468] Triton TRITONBACKEND API version: 1.10
I1109 06:53:11.225317 1 onnxruntime.cc:2474] 'onnxruntime' TRITONBACKEND API version: 1.10
I1109 06:53:11.225331 1 onnxruntime.cc:2504] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I1109 06:53:11.259270 1 onnxruntime.cc:2560] TRITONBACKEND_ModelInitialize: sentence-embedding.sbert-0 (version 1)
W1109 06:53:14.630221 1 onnxruntime.cc:787] autofilled max_batch_size to 4 for model 'sentence-embedding.sbert-0' since batching is supporrted but no max_batch_size is specified in model configuration. Must specify max_batch_size to utilize autofill with a larger max batch size
I1109 06:53:14.685000 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_0 (CPU device 0)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:17.996107 1 onnxruntime.cc:2603] TRITONBACKEND_ModelInstanceInitialize: sentence-embedding.sbert-0_0 (GPU device 0)
I1109 06:53:20.312004 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_1 (CPU device 0)
I1109 06:53:20.312255 1 model_repository_manager.cc:1352] successfully loaded 'sentence-embedding.sbert-0' version 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:23.568245 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_2 (CPU device 0)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:26.839855 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_3 (CPU device 0)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:30.081773 1 model_repository_manager.cc:1352] successfully loaded 'pipeline' version 1
I1109 06:53:30.082043 1 server.cc:559]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1109 06:53:30.082215 1 server.cc:586]

+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/b |
| | | ackends","default-max-batch-size":"4"}} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/b |
| | | ackends","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+

I1109 06:53:30.082348 1 server.cc:629]
+----------------------------+---------+--------+
| Model | Version | Status |
+----------------------------+---------+--------+
| pipeline | 1 | READY |
| sentence-embedding.sbert-0 | 1 | READY |
+----------------------------+---------+--------+

I1109 06:53:30.135753 1 metrics.cc:650] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090
I1109 06:53:30.136027 1 tritonserver.cc:2176]
I1109 06:53:30.137643 1 grpc_server.cc:4608] Started GRPCInferenceService at 0.0.0.0:8001
I1109 06:53:30.137940 1 http_server.cc:3312] Started HTTPService at 0.0.0.0:8000
I1109 06:53:30.179419 1 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002


junjiejiangjjj commented on June 2, 2024

Check whether the server is available: curl http://0.0.0.0:8000/v2/models/stats
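Triton's HTTP API also exposes dedicated health endpoints, which are useful for a quick liveness/readiness check (adjust the port if it is remapped on the host):

curl -v http://0.0.0.0:8000/v2/health/live     # server process is up
curl -v http://0.0.0.0:8000/v2/health/ready    # HTTP 200 once all models are loaded and ready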


Mrzhiyao commented on June 2, 2024

Check whether the server is available: curl http://0.0.0.0:8000/v2/models/stats

I set the local port to 8010, so I get the result below. What could be the cause of the error in this case? Thank you for your help.

(base) eg@eg-HP-Z8-G4-Workstation:~$ curl http://0.0.0.0:8010/v2/models/stats
{"model_stats":[{"name":"pipeline","version":"1","last_inference":0,"inference_count":0,"execution_count":0,"inference_stats":{"success":{"count":0,"ns":0},"fail":{"count":0,"ns":0},"queue":{"count":0,"ns":0},"compute_input":{"count":0,"ns":0},"compute_infer":{"count":0,"ns":0},"compute_output":{"count":0,"ns":0},"cache_hit":{"count":0,"ns":0},"cache_miss":{"count":0,"ns":0}},"batch_stats":[]},{"name":"sentence-embedding.sbert-0","version":"1","last_inference":0,"inference_count":0,"execution_count":0,"inference_stats":{"success":{"count":0,"ns":0},"fail":{"count":0,"ns":0},"queue":{"count":0,"ns":0},"compute_input":{"count":0,"ns":0},"compute_infer":{"count":0,"ns":0},"compute_output":{"count":0,"ns":0},"cache_hit":{"count":0,"ns":0},"cache_miss":{"count":0,"ns":0}},"batch_stats":[]}]}


junjiejiangjjj commented on June 2, 2024

Try ops.sentence_embedding.transformers; sbert has some bugs. A pipeline built with that operator works fine.
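A minimal sketch of such a pipeline, assuming the standard towhee pipe/ops API; the model name is only an illustrative choice:

from towhee import pipe, ops

# Text in, sentence embedding out, using the transformers operator instead of sbert.
sentence_embedding = (
    pipe.input('text')
        .map('text', 'vec',
             ops.sentence_embedding.transformers(model_name='sentence-transformers/all-MiniLM-L6-v2'))
        .output('vec')
)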


Mrzhiyao commented on June 2, 2024

Try ops.sentence_embedding.transformers; sbert has some bugs. A pipeline built with that operator works fine.

Thank you for your help; I think my problem has been resolved. My other question is: which parameters can be tuned to further improve encoding speed by accelerating model inference on the Triton server?


junjiejiangjjj commented on June 2, 2024

It is possible to optimize performance by adjusting parameters such as the number of instances and batch size. For more information, please refer to the Triton documentation: https://github.com/triton-inference-server/server
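For example, a model's config.pbtxt in the generated model repository can raise the maximum batch size, enable dynamic batching, and run several instances on the GPU; the values below are illustrative, not tuned:

max_batch_size: 32
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 100
}
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]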


Mrzhiyao commented on June 2, 2024

It is possible to optimize performance by adjusting parameters such as the number of instances and batch size. For more information, please refer to the Triton documentation: https://github.com/triton-inference-server/server

Thank you very much for your help. I think my problem has been resolved.

