Right now we have two models: one for embedding and the other for reranking. For

Calling index search inside Triton python backend

riyaj8888 commented on June 18, 2024

Calling index search inside Triton python backend


Comments (1)

oandreeva-nv commented on June 18, 2024

Let me outline the process as I understand it; feel free to correct me.

For this task, you can potentially either rebuild an index from the documents so that it can be reused, or deserialize it from an external service.

After that, one option is to write a Python model that uses the cuVS library. cuVS has APIs to build an index; please check their docs to see whether it fits your needs. The library also provides a variety of vector search algorithms to choose from and lets you specify k for top-k.
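To make the search step concrete, here is a minimal sketch of a top-k lookup. It deliberately uses brute-force cosine similarity in pure Python as a stand-in for a cuVS index, so it runs anywhere; the names `cosine_similarity` and `top_k_search` and the list-of-lists index layout are assumptions for illustration, not cuVS APIs. In a real Python backend model, the index would typically be built or loaded once in `initialize` and queried in `execute`.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_search(index, query, k):
    """Return the k (doc_id, score) pairs most similar to the query, best first.

    `index` is a plain list of document embeddings (hypothetical layout);
    the position of each embedding in the list is used as its doc id.
    """
    scored = (
        (doc_id, cosine_similarity(vec, query))
        for doc_id, vec in enumerate(index)
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy index with three 2-dimensional document embeddings.
index = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
hits = top_k_search(index, [1.0, 0.1], k=2)
```

Brute force scans every embedding per query, so it is only reasonable for small collections; an approximate index is what you would swap in for anything larger.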

Then, the last step for this model is to combine the initial request with the retrieved top-k embeddings and prepare a response, which will be passed to the next stage of your ensemble.
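The combining step could look roughly like the sketch below. It assumes the next stage is a reranker that consumes query/document pairs and that retrieval returns `(doc_id, score)` tuples; `build_rerank_inputs`, the dictionary keys, and the sample documents are all hypothetical. In an actual Triton model, the resulting pairs would be packed into output tensors of the inference response rather than returned as dictionaries.

```python
def build_rerank_inputs(query_text, hits, documents):
    """Pair the original query with each retrieved document, best hit first.

    `hits` is a list of (doc_id, score) tuples from the retrieval step and
    `documents` maps a doc id (list position) to its text. Both layouts are
    assumptions made for this sketch.
    """
    return [
        {
            "query": query_text,
            "doc_id": doc_id,
            "document": documents[doc_id],
            "score": score,
        }
        for doc_id, score in hits
    ]


# Toy data standing in for the document store and the retrieval output.
documents = ["Triton overview", "Ensemble models", "Python backend"]
hits = [(2, 0.91), (0, 0.40)]
rerank_inputs = build_rerank_inputs("how do I write a python model?", hits, documents)
```

Keeping the retrieval score alongside each pair lets the reranking stage fall back to the original ordering if it needs to.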

Let me know how this sounds to you; happy to discuss further.
