

stable-diffusion-tritonserver

Please check out branch v2 for converting new models.

Please check out branch v3 for converting models to TensorRT for the fastest inference.

Download models

# clone this repo
git clone https://github.com/kamalkraj/stable-diffusion-tritonserver.git
cd stable-diffusion-tritonserver
# clone model repo from huggingface
git lfs install
git clone https://huggingface.co/kamalkraj/stable-diffusion-v1-4-onnx

Extract the model weights

cd stable-diffusion-v1-4-onnx
tar -xvzf models.tar.gz

Triton Inference Server

Build

docker build -t tritonserver .

Run

docker run -it --rm --gpus all -p8000:8000 -p8001:8001 -p8002:8002 --shm-size 16384m   \
-v $PWD/stable-diffusion-v1-4-onnx/models:/models tritonserver \
tritonserver --model-repository /models/
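
Once the container is up, you can confirm the server and models are ready from Python (a minimal sketch using the HTTP client installed in the Inference section below):

    import tritonclient.http as httpclient

    # Assumes the container above maps the HTTP endpoint to localhost:8000.
    client = httpclient.InferenceServerClient(url="localhost:8000")
    print(client.is_server_ready())                   # True once startup finishes
    print(client.is_model_ready("stable_diffusion"))  # the pipeline model in /models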

Inference

Install tritonclient and run the notebook for inference.

pip install "tritonclient[http]"
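
A minimal client sketch (the tensor names are assumptions: it presumes the composed pipeline is served as a model called stable_diffusion with a BYTES input named prompt and an image output named generated_image; check the config.pbtxt files and the notebook for the actual names):

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Prompts are sent as a BYTES tensor; input/output names here are assumptions.
    text = np.array(["a photograph of an astronaut riding a horse"], dtype=object)
    inp = httpclient.InferInput("prompt", [1], "BYTES")
    inp.set_data_from_numpy(text)

    out = httpclient.InferRequestedOutput("generated_image")
    result = client.infer(model_name="stable_diffusion", inputs=[inp], outputs=[out])
    image = result.as_numpy("generated_image")  # array you can hand to PIL / matplotlib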

Credits


stable-diffusion-tritonserver's Issues

How to modify the default height and width?

I need a picture with a width of 1024 and a height of 768, so I modified the unet configuration file:

  {
    name: "sample"
    data_type: TYPE_FP32
    dims: [ -1, 4, -1, -1 ]  # was [ -1, 4, 64, 64 ]
  },

but I got an error like this:

tritonclient.utils.InferenceServerException: Failed to process the request(s) for model instance 'stable_diffusion', message: TritonModelException: [request id: <id_unknown>] unexpected shape for input 'sample' for model 'unet'. Expected [-1,4,64,64], got [2,4,128,96]

So how to modify the default height and width?
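
A likely explanation (my reading of the error, not confirmed in this thread): editing config.pbtxt only changes the shape Triton advertises, while the exported ONNX UNet was traced with fixed 64x64 latents, and model.py additionally hard-codes height and width to 512. The UNet would also need to be re-exported with dynamic spatial axes; a rough sketch following the diffusers conversion script's calling convention (model id, shapes, and opset are assumptions):

    import torch
    from diffusers import UNet2DConditionModel

    unet = UNet2DConditionModel.from_pretrained(
        "CompVis/stable-diffusion-v1-4", subfolder="unet"
    )
    sample = torch.randn(2, 4, 96, 128)   # latents for a 768x1024 image (H/8 x W/8)
    timestep = torch.randn(2)
    text_emb = torch.randn(2, 77, 768)    # CLIP text embeddings

    torch.onnx.export(
        unet,
        (sample, timestep, text_emb, {"return_dict": False}),  # trailing dict = kwargs
        "unet.onnx",
        input_names=["sample", "timestep", "encoder_hidden_states"],
        output_names=["out_sample"],
        dynamic_axes={"sample": {0: "batch", 2: "height", 3: "width"}},
        opset_version=14,
    )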

stabilityai/stable-diffusion-2-1-base got worse response time than StableDiffusionPipeline

Hi!
I just tested your approach, but I got a worse response time.
I'm filing this issue because there may be something wrong in my code or logic, or I may be using these tools incorrectly.

environment

  • Ubuntu 18.04
  • T4
  • torch == 1.11.0+cu113
  • optimum == 1.4.0
  • onnx == 1.12.0
  • Python 3.8.10
  • Triton 22.01

I ported stabilityai/stable-diffusion-2-1-base with convert_stable_diffusion_checkpoint_to_onnx.py and used your model directory, fixing some pbtxt dimensions.

I also added the line noise_pred = noise_pred.to("cuda") at link.

The Triton server then started up successfully (startup-log screenshot omitted).

Then I ran inference with these prompts:

prompts = [
    "A man standing with a red umbrella",
    "A child standing with a green umbrella",
    "A woman standing with a yellow umbrella"
]

I get a response after 6.8 s (average of 3 inferences).

The strange thing is that when I feed the same prompts to StableDiffusionPipeline, it takes about 5 s. Of course, that was run in the same environment and was also served from Triton Inference Server (though I maximized StableDiffusionPipeline's performance with some tips from the diffusers docs, link).

Is serving the Stable Diffusion model as ONNX actually better than using StableDiffusionPipeline? I expected more performance, given how much harder it is to serve.
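
For context, the kind of tuned baseline the diffusers docs recommended at the time looks roughly like this (model id and options are my assumptions, not the reporter's exact code):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
    ).to("cuda")
    pipe.enable_attention_slicing()  # trades a little speed for less memory on a T4

    image = pipe("A man standing with a red umbrella").images[0]

Beating a baseline like this usually requires the ONNX or TensorRT path to run in fp16 as well; a plain fp32 ONNX export can easily be slower than a tuned fp16 PyTorch pipeline on a T4.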

Segmentation fault (core dumped) across CUDA/cuDNN/TensorRT environments

With TensorRT 8.6.1, cuDNN 8.9, CUDA 12.1, and driver 530.41, I always get Segmentation fault (core dumped). I have also tried TRT 8.5.3 with CUDA 11.8 and cuDNN 8.6, 8.7, and 8.9.3; all of them give Segmentation fault (core dumped). TRT is the newest version, 8.6.1 from git.

failed to load 'stable_diffusion' version 1:

Hello,

Following the instructions to deploy this project, I'm observing that Triton is unable to load the stable_diffusion model.

This is seen in the Triton Server logs printed to stdout:

1028 08:21:03.012132 581 pb_stub.cc:309] Failed to initialize Python stub: AttributeError: 'LMSDiscreteScheduler' object has no attribute 'set_format'

At:
  /models/stable_diffusion/1/model.py(58): initialize

I1028 08:21:03.465850 1 onnxruntime.cc:2606] TRITONBACKEND_ModelInstanceInitialize: encoder (GPU device 1)
E1028 08:21:03.470367 1 model_lifecycle.cc:596] failed to load 'stable_diffusion' version 1: Internal: AttributeError: 'LMSDiscreteScheduler' object has no attribute 'set_format'

At:
  /models/stable_diffusion/1/model.py(58): initialize

The specific function referenced in model.py is here (line 58, indicated below):

    def initialize(self, args: Dict[str, str]) -> None:
        """
        Initialize the tokenization process
        :param args: arguments from Triton config file
        """
        current_name: str = str(Path(args["model_repository"]).parent.absolute())
        self.device = "cpu" if args["model_instance_kind"] == "CPU" else "cuda"
        self.tokenizer = CLIPTokenizer.from_pretrained(current_name + "/stable_diffusion/1/")
        self.scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear")
        self.scheduler = self.scheduler.set_format("pt")  # <-- line 58
        self.height = 512
        self.width = 512
        self.num_inference_steps = 50
        self.guidance_scale = 7.5
        self.eta = 0.0

I tried commenting this line out so that self.scheduler is only defined on the previous line; Triton Server then starts, and all models (including stable_diffusion) load successfully and are reported by Triton as online and ready.

With that line commented out, subsequently working through the Jupyter notebook raises an error (somewhat expectedly):

InferenceServerException: Failed to process the request(s) for model instance 'stable_diffusion', message: Stub process is not healthy.

So I'm forced back to the original issue. Have you seen this before, or do you have any idea for a fix?
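
For anyone else hitting this: set_format was a method on early diffusers schedulers (it switched their internal arrays to torch tensors) and was removed in later releases, which is why newer diffusers versions raise this AttributeError. Either pin diffusers to the version the repo was built against, or drop the call; a sketch of the latter:

    from diffusers import LMSDiscreteScheduler

    # Recent diffusers schedulers already operate on torch tensors, so the old
    # set_format("pt") call can simply be removed from initialize() in model.py:
    scheduler = LMSDiscreteScheduler(
        beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear"
    )
    # scheduler = scheduler.set_format("pt")  # removed API; delete this line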

GPU platform

Hi, which GPU platform would you recommend as the cheapest deployment option? And what is the average generation time for 25-step image generation with the fastest version on the GPUs you have tested?

v3 TensorRT version errors at the conversion step

The TensorRT repo was updated with new code for TensorRT 8.6.0.
The current v3 branch pulls from master and runs the code for TensorRT 8.6.0, which no longer works.
I've tried changing the code to use the 8.5.2 commit from December, but that unfortunately also produces errors. Could you please specify the commit hash in the Dockerfile that allows the pipeline to run successfully, or update the code to run on the TensorRT 8.6.0 release?

Model exports for Tritonserver

Hi! First of all, great repo, and very useful. I've been using it with your model export (from Hugging Face) and everything works great.

But now I have to deploy a different version of Stable Diffusion with Triton server. I saw that you mentioned using this script for the model export: https://github.com/harishanand95/diffusers/blob/dml/examples/inference/save_onnx.py
That script works for fp32, but how did you export it for fp16? Did you use some kind of converter? Do you have an example of an fp16 export?

Thanks :)
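
One common route (an assumption on my part, not necessarily what this repo used) is to export fp32 first with the script above and then convert the saved graph with onnxconverter-common:

    import onnx
    from onnxconverter_common import float16

    # Convert an fp32 ONNX graph (paths are hypothetical) to fp16 weights,
    # keeping inputs/outputs in fp32 so client code is unchanged.
    model = onnx.load("unet/model.onnx")
    model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
    onnx.save(model_fp16, "unet_fp16/model.onnx")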
