Comments (4)
Yes, you can serve multiple different models, multiple instances of the same model, or multiple instances of multiple models on one or more CPUs and GPUs simultaneously.
The docs discuss it here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_configuration.html?highlight=batching#instance-groups
As does this blog post (in the Performance section): https://devblogs.nvidia.com/nvidia-serves-deep-learning-inference/
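As a rough sketch of what the docs describe, an `instance_group` stanza in a model's `config.pbtxt` controls how many execution instances are created and where they run (the model name, counts, and GPU indices below are illustrative, not from the thread):

```
# config.pbtxt -- illustrative example only
name: "my_model"
platform: "tensorrt_plan"
instance_group [
  {
    # Creates 2 execution instances on EACH of the listed GPUs
    # (4 instances total here), all served concurrently.
    count: 2
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
```

A `kind: KIND_CPU` group can be used instead (or in addition) to place instances on CPU, and separate models each carry their own `config.pbtxt`, which is how multiple models are served side by side.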
@deadeyegoodwin, thanks for your prompt reply
@deadeyegoodwin for a large model that doesn't fit on a single GPU, how does Triton split the model onto multiple GPUs then?
In my experience, Triton may not support this case; each model instance has to fit on a single device unless the backend itself implements multi-GPU execution.