Comments (2)
@jackylu0124 Support for onnxruntime-genai is currently a work in progress - the Python bindings should work within the Python backend - but we haven't had a chance to test that ourselves yet.
That being said, we are actively investigating support - can you share more about your use case and the timeline you need for support?
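As a concrete picture of the Python-backend approach described above (all names hypothetical), the exported onnxruntime-genai model files would sit alongside a hand-written model.py in a standard Python-backend model directory:

```
model_repository/
└── my_genai_model/          # hypothetical model name
    ├── config.pbtxt         # contains: backend: "python"
    └── 1/
        ├── model.py         # custom logic that imports onnxruntime_genai
        └── ...              # exported ONNX / genai model files
```

Triton would load the model via the Python backend and invoke model.py, rather than managing the ONNX files through the onnxruntime backend directly.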
Hi @nnshah1, thank you very much for your fast reply! By "the python bindings should work within the python backend", do you mean that I can do things like `import onnxruntime_genai` and write custom inference logic in the Python backend, as opposed to having Triton Inference Server automatically manage all my .onnx model files (that use onnxruntime-genai) in the model repository for me (which is the feature currently in development)? Is my understanding correct?
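For illustration, a hand-written `model.py` along those lines might look roughly like the following. This is a sketch only: the `og.*` calls follow onnxruntime-genai's Python bindings, whose API has changed between releases (older releases set `params.input_ids` instead of calling `append_tokens`), and the tensor names `text_input`/`text_output` are hypothetical.

```python
# Hypothetical model.py for a Triton Python-backend model wrapping
# onnxruntime-genai. A sketch, not a tested implementation.

class TritonPythonModel:
    def initialize(self, args):
        # onnxruntime_genai is only available inside the serving
        # environment, so it is imported lazily here to keep this
        # sketch self-contained.
        import onnxruntime_genai as og
        # Triton passes the model directory via the args dict.
        model_dir = f'{args["model_repository"]}/{args["model_version"]}'
        self.og = og
        self.model = og.Model(model_dir)
        self.tokenizer = og.Tokenizer(self.model)

    def execute(self, requests):
        # triton_python_backend_utils is provided by the Triton
        # container at runtime; imported lazily for the same reason.
        import triton_python_backend_utils as pb_utils
        import numpy as np

        responses = []
        for request in requests:
            prompt = pb_utils.get_input_tensor_by_name(
                request, "text_input").as_numpy()[0].decode("utf-8")

            # Greedy generation loop; API details vary by og release.
            params = self.og.GeneratorParams(self.model)
            params.set_search_options(max_length=256)
            generator = self.og.Generator(self.model, params)
            generator.append_tokens(self.tokenizer.encode(prompt))
            while not generator.is_done():
                generator.generate_next_token()

            text = self.tokenizer.decode(generator.get_sequence(0))
            out = pb_utils.Tensor(
                "text_output",
                np.array([text.encode("utf-8")], dtype=np.object_))
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def finalize(self):
        self.model = None
```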
My use case is mainly serving LLMs, some of which are ONNX models that depend on onnxruntime_genai. I don't have a specific timeline; I am mainly interested in knowing whether this feature is on Triton Inference Server's development roadmap.
Also a follow-up question: for serving LLMs, what would be the best backend for achieving token streaming, outside of the TensorRT-LLM backend?
Thanks!
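On the streaming question generally: Triton exposes streaming through the decoupled transaction policy, which lets a model (including one on the Python backend) send any number of responses per request. A minimal, hypothetical `config.pbtxt` sketch (model and tensor names assumed):

```protobuf
# Hypothetical config.pbtxt for a decoupled (streaming) Python-backend model.
name: "my_genai_model"
backend: "python"
model_transaction_policy { decoupled: true }
input [
  { name: "text_input", data_type: TYPE_STRING, dims: [ 1 ] }
]
output [
  { name: "text_output", data_type: TYPE_STRING, dims: [ 1 ] }
]
```

With `decoupled: true`, the model's `execute` can push one response per generated token through a response sender instead of returning a single response list.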