

stable-diffusion-tritonserver

Please check out branch v2 for converting new models.

Please check out branch v3 for converting models to TensorRT for the fastest inference.

Download models

# clone this repo
git clone https://github.com/kamalkraj/stable-diffusion-tritonserver.git
cd stable-diffusion-tritonserver
# clone model repo from huggingface
git lfs install
git clone https://huggingface.co/kamalkraj/stable-diffusion-v1-4-onnx

Extract the model weights

cd stable-diffusion-v1-4-onnx
tar -xvzf models.tar.gz

Triton Inference Server

Build

docker build -t tritonserver .

Run

docker run -it --rm --gpus all -p8000:8000 -p8001:8001 -p8002:8002 --shm-size 16384m   \
-v $PWD/stable-diffusion-v1-4-onnx/models:/models tritonserver \
tritonserver --model-repository /models/
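
Once the container is up, you can confirm the server and models are ready from Python (a minimal sketch using the HTTP client installed in the Inference section below):

    import tritonclient.http as httpclient

    # Assumes the container above maps the HTTP endpoint to localhost:8000.
    client = httpclient.InferenceServerClient(url="localhost:8000")
    print(client.is_server_ready())                   # True once startup finishes
    print(client.is_model_ready("stable_diffusion"))  # the pipeline model in /models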

Inference

Install tritonclient and run the notebook for inference.

pip install "tritonclient[http]"
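
A minimal client sketch (the tensor names are assumptions: it presumes the composed pipeline is served as a model called stable_diffusion with a BYTES input named prompt and an image output named generated_image; check the config.pbtxt files and the notebook for the actual names):

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Prompts are sent as a BYTES tensor; input/output names here are assumptions.
    text = np.array(["a photograph of an astronaut riding a horse"], dtype=object)
    inp = httpclient.InferInput("prompt", [1], "BYTES")
    inp.set_data_from_numpy(text)

    out = httpclient.InferRequestedOutput("generated_image")
    result = client.infer(model_name="stable_diffusion", inputs=[inp], outputs=[out])
    image = result.as_numpy("generated_image")  # array you can hand to PIL / matplotlib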

Credits


stable-diffusion-tritonserver's Issues

How to modify the default height and width?

I need a picture with a width of 1024 and a height of 768, so I modified the unet configuration file:

  {
    name: "sample"
    data_type: TYPE_FP32
    dims: [ -1, 4, -1, -1 ]  # was [ -1, 4, 64, 64 ]
  },

but I got an error like this:

tritonclient.utils.InferenceServerException: Failed to process the request(s) for model instance 'stable_diffusion', message: TritonModelException: [request id: <id_unknown>] unexpected shape for input 'sample' for model 'unet'. Expected [-1,4,64,64], got [2,4,128,96]

So how to modify the default height and width?
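
A likely explanation (my reading of the error, not confirmed in this thread): editing config.pbtxt only changes the shape Triton advertises, while the exported ONNX UNet was traced with fixed 64x64 latents, and model.py additionally hard-codes height and width to 512. The UNet would also need to be re-exported with dynamic spatial axes; a rough sketch following the diffusers conversion script's calling convention (model id, shapes, and opset are assumptions):

    import torch
    from diffusers import UNet2DConditionModel

    unet = UNet2DConditionModel.from_pretrained(
        "CompVis/stable-diffusion-v1-4", subfolder="unet"
    )
    sample = torch.randn(2, 4, 96, 128)   # latents for a 768x1024 image (H/8 x W/8)
    timestep = torch.randn(2)
    text_emb = torch.randn(2, 77, 768)    # CLIP text embeddings

    torch.onnx.export(
        unet,
        (sample, timestep, text_emb, {"return_dict": False}),  # trailing dict = kwargs
        "unet.onnx",
        input_names=["sample", "timestep", "encoder_hidden_states"],
        output_names=["out_sample"],
        dynamic_axes={"sample": {0: "batch", 2: "height", 3: "width"}},
        opset_version=14,
    )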

stabilityai/stable-diffusion-2-1-base got worse response time than StableDiffusionPipeline

Hi!
I just tested your approach, but I got a worse response time.
I'm filing this issue because there may be something wrong in my code or logic, or I may be using these tools incorrectly.

environment

  • Ubuntu 18.04
  • T4
  • torch == 1.11.0+cu113
  • optimum == 1.4.0
  • onnx == 1.12.0
  • Python 3.8.10
  • Triton 22.01

I ported stabilityai/stable-diffusion-2-1-base with convert_stable_diffusion_checkpoint_to_onnx.py and used your model directory, fixing some pbtxt dimensions.

I also added the line noise_pred = noise_pred.to("cuda") at link.

The Triton server then started up successfully (startup-log screenshot omitted).

Then I ran inference with these prompts:

prompts = [
    "A man standing with a red umbrella",
    "A child standing with a green umbrella",
    "A woman standing with a yellow umbrella"
]

I get a response after 6.8 s (average of 3 inferences).

The strange thing is that when I feed the same prompts to StableDiffusionPipeline, it takes about 5 s. Of course, that was run in the same environment and was also served from Triton Inference Server (though I maximized StableDiffusionPipeline's performance with some tips from the diffusers docs, link).

Is serving the Stable Diffusion model as ONNX actually better than using StableDiffusionPipeline? I expected more performance, given how much harder it is to serve.
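
For context, the kind of tuned baseline the diffusers docs recommended at the time looks roughly like this (model id and options are my assumptions, not the reporter's exact code):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
    ).to("cuda")
    pipe.enable_attention_slicing()  # trades a little speed for less memory on a T4

    image = pipe("A man standing with a red umbrella").images[0]

Beating a baseline like this usually requires the ONNX or TensorRT path to run in fp16 as well; a plain fp32 ONNX export can easily be slower than a tuned fp16 PyTorch pipeline on a T4.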

Segmentation fault (core dumped) across CUDA/cuDNN/TensorRT environments

With TensorRT 8.6.1, cuDNN 8.9, CUDA 12.1, and driver 530.41, I always get Segmentation fault (core dumped). I have also tried TRT 8.5.3 with CUDA 11.8 and cuDNN 8.6, 8.7, and 8.9.3; all of them give Segmentation fault (core dumped). TRT is the newest version, 8.6.1 from git.

failed to load 'stable_diffusion' version 1:

Hello,

Following the instructions to deploy this project, I'm observing that Triton is unable to load the stable_diffusion model.

This is seen in the Triton Server logs printed to stdout:

1028 08:21:03.012132 581 pb_stub.cc:309] Failed to initialize Python stub: AttributeError: 'LMSDiscreteScheduler' object has no attribute 'set_format'

At:
  /models/stable_diffusion/1/model.py(58): initialize

I1028 08:21:03.465850 1 onnxruntime.cc:2606] TRITONBACKEND_ModelInstanceInitialize: encoder (GPU device 1)
E1028 08:21:03.470367 1 model_lifecycle.cc:596] failed to load 'stable_diffusion' version 1: Internal: AttributeError: 'LMSDiscreteScheduler' object has no attribute 'set_format'

At:
  /models/stable_diffusion/1/model.py(58): initialize

The specific function referenced in model.py is here (line 58, indicated below):

    def initialize(self, args: Dict[str, str]) -> None:
        """
        Initialize the tokenization process
        :param args: arguments from Triton config file
        """
        current_name: str = str(Path(args["model_repository"]).parent.absolute())
        self.device = "cpu" if args["model_instance_kind"] == "CPU" else "cuda"
        self.tokenizer = CLIPTokenizer.from_pretrained(current_name + "/stable_diffusion/1/")
        self.scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear")
        self.scheduler = self.scheduler.set_format("pt")  # <-- line 58
        self.height = 512
        self.width = 512
        self.num_inference_steps = 50
        self.guidance_scale = 7.5
        self.eta = 0.0

I tried commenting this line out so that self.scheduler is only defined on the previous line; Triton Server then starts, and all models (including stable_diffusion) load successfully and are reported by Triton as online and ready.

With that line commented out, subsequently working through the Jupyter notebook raises an error (somewhat expectedly):

InferenceServerException: Failed to process the request(s) for model instance 'stable_diffusion', message: Stub process is not healthy.

So I'm forced back to the original issue. Have you seen this before, or do you have any idea for a fix?
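
For anyone else hitting this: set_format was a method on early diffusers schedulers (it switched their internal arrays to torch tensors) and was removed in later releases, which is why newer diffusers versions raise this AttributeError. Either pin diffusers to the version the repo was built against, or drop the call; a sketch of the latter:

    from diffusers import LMSDiscreteScheduler

    # Recent diffusers schedulers already operate on torch tensors, so the old
    # set_format("pt") call can simply be removed from initialize() in model.py:
    scheduler = LMSDiscreteScheduler(
        beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear"
    )
    # scheduler = scheduler.set_format("pt")  # removed API; delete this line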

GPU platform

Hi, which GPU platform would you recommend as the cheapest deployment option? And what is the average generation time for 25-step image generation with the fastest version on the GPUs you have tested?

v3 TensorRT version errors at the conversion step

The TensorRT repo was updated with new code for TensorRT 8.6.0.
The current v3 branch pulls from master and runs the code for TensorRT 8.6.0, which no longer works.
I've tried changing the code to use the 8.5.2 commit from December, but that unfortunately also produces errors. Could you please specify the commit hash in the Dockerfile that allows the pipeline to run successfully, or update the code to run on the TensorRT 8.6.0 release?

Model exports for Tritonserver

Hi! First of all, great repo, and very useful. I've been using it with your model export (from Hugging Face) and everything works great.

But now I have to deploy a different version of Stable Diffusion with Triton server. I saw that you mentioned using this script for the model export: https://github.com/harishanand95/diffusers/blob/dml/examples/inference/save_onnx.py
That script works for fp32, but how did you export it for fp16? Did you use some kind of converter? Do you have an example of an fp16 export?

Thanks :)
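
One common route (an assumption on my part, not necessarily what this repo used) is to export fp32 first with the script above and then convert the saved graph with onnxconverter-common:

    import onnx
    from onnxconverter_common import float16

    # Convert an fp32 ONNX graph (paths are hypothetical) to fp16 weights,
    # keeping inputs/outputs in fp32 so client code is unchanged.
    model = onnx.load("unet/model.onnx")
    model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
    onnx.save(model_fp16, "unet_fp16/model.onnx")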
