
streamdiffusion's Introduction

StreamDiffusion

English | 日本語 | 한국어

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

Authors: Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Kurt Keutzer

StreamDiffusion is an innovative diffusion pipeline designed for real-time interactive generation. It introduces significant performance enhancements to current diffusion-based image generation techniques.

arXiv Hugging Face Papers

We sincerely thank Taku Fujimoto, Radamés Ajna, and the Hugging Face team for their invaluable feedback, courteous support, and insightful discussions.

Key Features

  1. Stream Batch

    • Streamlined data processing through efficient batch operations.
  2. Residual Classifier-Free Guidance - Learn More

    • Improved guidance mechanism that minimizes computational redundancy.
  3. Stochastic Similarity Filter - Learn More

    • Improves GPU utilization efficiency through advanced filtering techniques.
  4. IO Queues

    • Efficiently manages input and output operations for smoother execution.
  5. Pre-Computation for KV-Caches

    • Optimizes caching strategies for accelerated processing.
  6. Model Acceleration Tools

    • Utilizes various tools for model optimization and performance boost.

The following throughput was measured with the proposed StreamDiffusion pipeline in an environment with GPU: RTX 4090, CPU: Core i9-13900K, and OS: Ubuntu 22.04.3 LTS.

Model                 Denoising Steps   FPS (Txt2Img)   FPS (Img2Img)
SD-turbo              1                 106.16          93.897
LCM-LoRA + KohakuV2   4                 38.023          37.133
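
For reference, a rough way to measure throughput yourself is to time repeated pipeline calls after a warmup phase. The snippet below is only a minimal sketch, assuming a stream and init_image prepared as in the Usage Example further down; the exact benchmark methodology behind the numbers above may differ.

import time

import torch

def measure_fps(stream, init_image, warmup: int = 10, iters: int = 100) -> float:
    """Rough throughput: generated images per second for repeated img2img calls."""
    for _ in range(warmup):
        stream(init_image)  # warm up CUDA kernels and caches
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        stream(init_image)
    torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)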

Feel free to explore each feature by following the provided links to learn more about StreamDiffusion's capabilities. If you find it helpful, please consider citing our work:

@article{kodaira2023streamdiffusion,
      title={StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation},
      author={Akio Kodaira and Chenfeng Xu and Toshiki Hazama and Takanori Yoshimoto and Kohei Ohno and Shogo Mitsuhori and Soichi Sugano and Hanying Cho and Zhijian Liu and Kurt Keutzer},
      year={2023},
      eprint={2312.12491},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Installation

Step0: clone this repository

git clone https://github.com/cumulo-autumn/StreamDiffusion.git

Step1: Make Environment

You can install StreamDiffusion via pip, conda, or Docker (explained below).

conda create -n streamdiffusion python=3.10
conda activate streamdiffusion

OR

python -m venv .venv
# Windows
.\.venv\Scripts\activate
# Linux
source .venv/bin/activate

Step2: Install PyTorch

Select the appropriate version for your system.

CUDA 11.8

pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu118

CUDA 12.1

pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu121

details: https://pytorch.org/

Step3: Install StreamDiffusion

For User

Install StreamDiffusion

#for Latest Version (recommended)
pip install git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt]


#or


#for Stable Version
pip install streamdiffusion[tensorrt]

Install TensorRT extension

python -m streamdiffusion.tools.install-tensorrt

(Windows only) If you installed the Stable Version (pip install streamdiffusion[tensorrt]), you may additionally need to install pywin32.

pip install --force-reinstall pywin32

For Developer

python setup.py develop easy_install streamdiffusion[tensorrt]
python -m streamdiffusion.tools.install-tensorrt

Docker Installation (TensorRT Ready)

git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion
docker build -t stream-diffusion:latest -f Dockerfile .
docker run --gpus all -it -v $(pwd):/home/ubuntu/streamdiffusion stream-diffusion:latest

Quick Start

You can try StreamDiffusion in the examples directory.


Real-Time Txt2Img Demo

There is an interactive txt2img demo in the demo/realtime-txt2img directory!

Real-Time Img2Img Demo

There is a real-time img2img demo with a live webcam feed or screen capture, running in a web browser, in the demo/realtime-img2img directory!

Usage Example

We provide a simple example of how to use StreamDiffusion. For more detailed examples, please refer to the examples directory.

Image-to-Image

import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from diffusers.utils import load_image

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

# You can load any model using diffusers' StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline in StreamDiffusion
stream = StreamDiffusion(
    pipe,
    t_index_list=[32, 45],
    torch_dtype=torch.float16,
)

# If the loaded model is not LCM, merge LCM
stream.load_lcm_lora()
stream.fuse_lora()
# Use Tiny VAE for further acceleration
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable acceleration
pipe.enable_xformers_memory_efficient_attention()


prompt = "1girl with dog hair, thick frame glasses"
# Prepare the stream
stream.prepare(prompt)

# Prepare image
init_image = load_image("assets/img2img_example.png").resize((512, 512))

# Warmup >= len(t_index_list) x frame_buffer_size
for _ in range(2):
    stream(init_image)

# Run the stream infinitely
while True:
    x_output = stream(init_image)
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break

Text-to-Image

import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

# You can load any model using diffusers' StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline in StreamDiffusion
# Text-to-image generally requires a longer t_index_list (more denoising steps)
# We recommend using cfg_type="none" for text-to-image
stream = StreamDiffusion(
    pipe,
    t_index_list=[0, 16, 32, 45],
    torch_dtype=torch.float16,
    cfg_type="none",
)

# If the loaded model is not LCM, merge LCM
stream.load_lcm_lora()
stream.fuse_lora()
# Use Tiny VAE for further acceleration
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable acceleration
pipe.enable_xformers_memory_efficient_attention()


prompt = "1girl with dog hair, thick frame glasses"
# Prepare the stream
stream.prepare(prompt)

# Warmup >= len(t_index_list) x frame_buffer_size
for _ in range(4):
    stream()

# Run the stream infinitely
while True:
    x_output = stream.txt2img()
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break

You can make it faster by using SD-Turbo.
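
As a rough, unverified sketch of what that swap could look like (the model id "stabilityai/sd-turbo", the single-step t_index_list=[0], and skipping the LCM-LoRA merge are illustrative assumptions based on the 1-step row in the benchmark table; additional adjustments may be needed):

import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

# Assumed Hugging Face model id for SD-Turbo
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/sd-turbo").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Single denoising step; no LCM-LoRA merge for an already distilled turbo model
stream = StreamDiffusion(
    pipe,
    t_index_list=[0],
    torch_dtype=torch.float16,
    cfg_type="none",
)
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
pipe.enable_xformers_memory_efficient_attention()

stream.prepare("1girl with dog hair, thick frame glasses")

# Warmup >= len(t_index_list) x frame_buffer_size
stream.txt2img()

postprocess_image(stream.txt2img(), output_type="pil")[0].save("output.png")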

Faster generation

Replace the following code in the above example.

pipe.enable_xformers_memory_efficient_attention()

with

from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt

stream = accelerate_with_tensorrt(
    stream, "engines", max_batch_size=2,
)

It requires the TensorRT extension and takes time to build the engine, but it will be faster than the example above.

Optionals

Stochastic Similarity Filter

demo

Stochastic Similarity Filter reduces processing during video input by minimizing conversion operations when there is little change from the previous frame, thereby alleviating GPU processing load, as shown by the red frame in the above GIF. The usage is as follows:

stream = StreamDiffusion(
    pipe,
    [32, 45],
    torch_dtype=torch.float16,
)
stream.enable_similar_image_filter(
    similar_image_filter_threshold,
    similar_image_filter_max_skip_frame,
)

The following parameters can be set as arguments to the function:

similar_image_filter_threshold

  • The similarity threshold between the previous frame and the current frame at which processing is paused.

similar_image_filter_max_skip_frame

  • The maximum number of frames that can be skipped before the conversion resumes.
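
For instance, a minimal sketch with illustrative values (the threshold of 0.98 and the maximum of 10 skipped frames below are assumptions for demonstration, not tuned recommendations):

import torch
from diffusers import StableDiffusionPipeline

from streamdiffusion import StreamDiffusion

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

stream = StreamDiffusion(
    pipe,
    [32, 45],
    torch_dtype=torch.float16,
)
# Illustrative values: pause generation when consecutive frames are highly similar,
# but never skip more than 10 frames in a row.
stream.enable_similar_image_filter(0.98, 10)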

Residual CFG (RCFG)

rcfg

RCFG is a method for approximately realizing CFG at a computational cost competitive with not using CFG at all. It can be selected through the cfg_type argument of StreamDiffusion. There are two variants of RCFG: RCFG Self-Negative, which takes no negative prompt, and RCFG Onetime-Negative, in which a negative prompt can be specified. Denoting the computational complexity without CFG as N and the complexity with regular CFG as 2N, RCFG Self-Negative can be computed in N steps, while RCFG Onetime-Negative can be computed in N+1 steps.

The usage is as follows:

# w/o CFG
cfg_type = "none"
# CFG
cfg_type = "full"
# RCFG Self-Negative
cfg_type = "self"
# RCFG Onetime-Negative
cfg_type = "initialize"
stream = StreamDiffusion(
    pipe,
    [32, 45],
    torch_dtype=torch.float16,
    cfg_type=cfg_type,
)
stream.prepare(
    prompt="1girl, purple hair",
    guidance_scale=guidance_scale,
    delta=delta,
)

The delta parameter moderates the effectiveness of RCFG.
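
For example, a minimal RCFG Self-Negative setup (the guidance_scale and delta values below are illustrative, not tuned recommendations):

import torch
from diffusers import StableDiffusionPipeline

from streamdiffusion import StreamDiffusion

pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# RCFG Self-Negative: CFG-like guidance at roughly the cost of no CFG
stream = StreamDiffusion(
    pipe,
    [32, 45],
    torch_dtype=torch.float16,
    cfg_type="self",
)
stream.prepare(
    prompt="1girl, purple hair",
    guidance_scale=1.4,  # illustrative value
    delta=1.0,           # illustrative value; moderates the strength of RCFG
)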

Development Team

Aki, Ararat, Chenfeng Xu, ddPn08, kizamimi, ramune, teftef, Tonimono, Verb

(*alphabetical order)

Acknowledgements

The video and image demos in this GitHub repository were generated using LCM-LoRA + KohakuV2 and SD-Turbo.

Special thanks to the LCM-LoRA authors for providing LCM-LoRA, to Kohaku BlueLeaf (@KBlueleaf) for providing the KohakuV2 model, and to Stability AI for SD-Turbo.

KohakuV2 Models can be downloaded from Civitai and Hugging Face.

SD-Turbo is also available on Hugging Face Space.


streamdiffusion's Issues

When I execute "python vid2vid/main.py", something goes wrong

Traceback (most recent call last):
File "/root/autodl-tmp/project/StreamDiffusion/examples/vid2vid/main.py", line 106, in
fire.Fire(main)
File "/root/miniconda3/envs/streamdiffusion/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/miniconda3/envs/streamdiffusion/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/miniconda3/envs/streamdiffusion/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/root/autodl-tmp/project/StreamDiffusion/examples/vid2vid/main.py", line 99, in main
video_result[i] = output_image.permute(1, 2, 0)
RuntimeError: The expanded size of the tensor (364) must match the existing size (600) at non-singleton dimension 1. Target sizes: [604, 364, 3]. Tensor sizes: [360, 600, 3]

Support for ComfyUI?

So far the community does not have a well-supported ComfyUI app; I wonder if anyone could help push this forward.

Tensorrt Error

Thanks for the open source code, but I got the following error when using acceleration="tensorrt":

[E] 3: [executionContext.cpp::nvinfer1::rt::ExecutionContext::validateInputBindings::2046] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::nvinfer1::rt::ExecutionContext::validateInputBindings::2046, condition: profileMinDims.d[i] <= dimensions.d[i]. Supplied binding dimension [1,4,64,64] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 4, minimum dimension in profile is 4, but supplied dimension is 1.

Help needed

Using c:\users\administrator\appdata\local\programs\python\python310\lib\site-packages
Searching for charset-normalizer==3.2.0
Best match: charset-normalizer 3.2.0
Adding charset-normalizer 3.2.0 to easy-install.pth file
Installing normalizer-script.py script to C:\Users\Administrator\AppData\Local\Programs\Python\Python310\Scripts
Installing normalizer.exe script to C:\Users\Administrator\AppData\Local\Programs\Python\Python310\Scripts

Using c:\users\administrator\appdata\local\programs\python\python310\lib\site-packages
Searching for typing-extensions==4.8.0
Best match: typing-extensions 4.8.0
Adding typing-extensions 4.8.0 to easy-install.pth file

Using c:\users\administrator\appdata\local\programs\python\python310\lib\site-packages
Searching for fsspec==2023.9.1
Best match: fsspec 2023.9.1
Adding fsspec 2023.9.1 to easy-install.pth file

Using c:\users\administrator\appdata\local\programs\python\python310\lib\site-packages
Searching for zipp==3.17.0
Best match: zipp 3.17.0
Adding zipp 3.17.0 to easy-install.pth file

Using c:\users\administrator\appdata\local\programs\python\python310\lib\site-packages
Finished processing dependencies for streamdiffusion==0.1.0

J:\SDSDSDSD\StreamDiffusion>
After these steps, how do I get it to run correctly? When I run the demo, the command window closes automatically right after execution.

Support for other models?

I tried this one and it didn't work:

https://huggingface.co/SG161222/RealVisXL_V3.0_Turbo/tree/main

How can I load other models? Error is below:

You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
The config attributes {'skip_prk_steps': True} were passed to LCMScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.
Exporting model: engines/thibaud/sdxl_dpo_turbo--lcm_lora-False--tiny_vae-True--max_batch-2--min_batch-2--mode-img2img/unet.engine.onnx
/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:878: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
Traceback (most recent call last):
  File "/home/ian/projs/StreamDiffusion/demo/realtime-img2img/../../utils/wrapper.py", line 546, in _load_model
    compile_unet(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/streamdiffusion/acceleration/tensorrt/__init__.py", line 76, in compile_unet
    builder.build(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/streamdiffusion/acceleration/tensorrt/builder.py", line 54, in build
    export_onnx(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/streamdiffusion/acceleration/tensorrt/utilities.py", line 416, in export_onnx
    torch.onnx.export(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/onnx/utils.py", line 1596, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/onnx/utils.py", line 1135, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/onnx/utils.py", line 1011, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/onnx/utils.py", line 915, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/jit/_trace.py", line 1285, in _get_trace_graph
    outs = ONNXTracedModule(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/jit/_trace.py", line 133, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/jit/_trace.py", line 124, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 967, in forward
    if "text_embeds" not in added_cond_kwargs:
TypeError: argument of type 'NoneType' is not iterable
Acceleration has failed. Falling back to normal mode.
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  9.37it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
The config attributes {'skip_prk_steps': True} were passed to LCMScheduler, but are not expected and will be ignored. Please verify your scheduler_config.json configuration file.
Exporting model: engines/thibaud/sdxl_dpo_turbo--lcm_lora-False--tiny_vae-True--max_batch-2--min_batch-2--mode-img2img/unet.engine.onnx
Traceback (most recent call last):
  File "/home/ian/projs/StreamDiffusion/demo/realtime-img2img/../../utils/wrapper.py", line 546, in _load_model
    compile_unet(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/streamdiffusion/acceleration/tensorrt/__init__.py", line 76, in compile_unet
    builder.build(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/streamdiffusion/acceleration/tensorrt/builder.py", line 54, in build
    export_onnx(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/streamdiffusion/acceleration/tensorrt/utilities.py", line 416, in export_onnx
    torch.onnx.export(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/onnx/utils.py", line 1596, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/onnx/utils.py", line 1135, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/onnx/utils.py", line 1011, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/onnx/utils.py", line 915, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/jit/_trace.py", line 1285, in _get_trace_graph
    outs = ONNXTracedModule(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/jit/_trace.py", line 133, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/jit/_trace.py", line 124, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/media/ian/extras/condaenvs/streamdiffusion/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 967, in forward
    if "text_embeds" not in added_cond_kwargs:
TypeError: argument of type 'NoneType' is not iterable
Acceleration has failed. Falling back to normal mode.

TensorRT installation causes an error on Linux

In a Linux environment, the TensorRT installation causes an error:
libcublasLt.so.11 is not installed. After copying this file from another repository, the TensorRT installation succeeds.
What is the correct installation procedure on Linux?
Environment: Linux20.04-jp + RTX4090

SDXL support

Does this method support SDXL? I just replaced the model with SDXL base, and it raises an error.

Traceback (most recent call last):
File "/dfs/comicai/songtao.tian/StreamDiffusion/streamdiffusion_test.py", line 42, in
stream(init_image)
File "/root/miniconda3/envs/streamdiffusion/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/dfs/comicai/songtao.tian/StreamDiffusion/src/streamdiffusion/pipeline.py", line 461, in call
x_0_pred_out = self.predict_x0_batch(x_t_latent)
File "/dfs/comicai/songtao.tian/StreamDiffusion/src/streamdiffusion/pipeline.py", line 399, in predict_x0_batch
x_0_pred_batch, model_pred = self.unet_step(x_t_latent, t_list)
File "/dfs/comicai/songtao.tian/StreamDiffusion/src/streamdiffusion/pipeline.py", line 313, in unet_step
model_pred = self.unet(
File "/root/miniconda3/envs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/streamdiffusion/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 967, in forward
if "text_embeds" not in added_cond_kwargs:
TypeError: argument of type 'NoneType' is not iterable

Readme Example with tensorrt: expected input[2, 4, 64, 64] to have 3 channels, but got 4 channels instead

I am using the img2img example given in the readme, but it doesn't compile the TensorRT engine successfully. I get the following error:

RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[2, 4, 64, 64] to have 3 channels, but got 4 channels instead

I suspect the problem is in the VAE decoder compilation. The wrapper's code seems fine (it doesn't use accelerate_with_tensorrt), but I'm not too familiar with this part and I don't see where the problem is.

example code:

import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from diffusers.utils import load_image

from streamdiffusion import StreamDiffusion
from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt
from streamdiffusion.image_utils import postprocess_image

# You can load any models using diffuser's StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline in StreamDiffusion
stream = StreamDiffusion(
    pipe,
    t_index_list=[32, 45],
    torch_dtype=torch.float16,
)

# If the loaded model is not LCM, merge LCM
stream.load_lcm_lora()
stream.fuse_lora()
# Use Tiny VAE for further acceleration
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable acceleration
stream = accelerate_with_tensorrt(
    stream, "engines", max_batch_size=2,
)


prompt = "1girl with dog hair, thick frame glasses"
# Prepare the stream
stream.prepare(prompt)

# Prepare image
init_image = load_image("assets/img2img_example.png").resize((512, 512))

# Warmup >= len(t_index_list) x frame_buffer_size
for _ in range(2):
    stream(init_image)

# Run the stream infinitely
while True:
    x_output = stream(init_image)
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break

Enhancing StreamDiffusion's Efficiency for High-Resolution Image Generation

Dear StreamDiffusion Development Team,

I hope this message finds you well. I am reaching out to discuss a potential enhancement to the StreamDiffusion pipeline, particularly concerning the generation of high-resolution images. As an avid user and admirer of your innovative work, I have been utilising StreamDiffusion for various projects and have been thoroughly impressed with its performance and capabilities.

However, I have observed that when scaling up to generate images of higher resolution, there is a noticeable increase in computational demand, which in turn affects the real-time interactivity that is a hallmark of StreamDiffusion. While the current pipeline is exceptionally efficient, I believe there is an opportunity to optimise it further for high-resolution output.

To address this, I propose the following considerations:

  1. Adaptive Resolution Scaling: Implementing a mechanism that initially generates images at a lower resolution and progressively enhances them could maintain interactivity without compromising on detail.

  2. Distributed Processing: Exploring the possibility of distributing the computation across multiple GPUs could significantly reduce the time required for generating high-resolution images.

  3. Model Pruning: Investigating the effects of model pruning on the diffusion models to reduce the number of parameters, which could lead to faster computation times while maintaining image quality.

  4. Advanced Caching Strategies: Enhancing the pre-computation for KV-caches to handle higher resolution images more effectively, potentially by utilising a hierarchical caching system that prioritises the most impactful features.

I am curious to hear your thoughts on these suggestions and whether they align with the future roadmap for StreamDiffusion. I believe that by addressing the challenge of high-resolution image generation, StreamDiffusion can set a new standard for real-time interactive pipelines in the field.

Thank you for your time and consideration. I look forward to the possibility of contributing to the evolution of this remarkable tool.

Best regards,
yihong1120

multiple values for keyword argument 'opt_batch_size'

I always get this error when trying to call accelerate_with_tensorrt.

import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
from diffusers.utils import load_image

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image
from streamdiffusion.acceleration.tensorrt import accelerate_with_tensorrt

# You can load any models using diffuser's StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("KBlueLeaf/kohaku-v2.1").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline in StreamDiffusion
stream = StreamDiffusion(
    pipe,
    t_index_list=[18,26,35,45],
    torch_dtype=torch.float16,
)

# If the loaded model is not LCM, merge LCM
stream.load_lcm_lora()
stream.fuse_lora()
# Use Tiny VAE for further acceleration
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable acceleration
#pipe.enable_xformers_memory_efficient_attention()
stream = accelerate_with_tensorrt(
    stream, "engines", max_batch_size=2,
)


prompt = "space banana"
# Prepare the stream
stream.prepare(prompt)

# Prepare image
init_image = load_image("over1.jpeg").resize((512, 512))

# Warmup >= len(t_index_list) x frame_buffer_size
for _ in range(4):
    stream(init_image)

# Run the stream infinitely
while True:
    x_output = stream(init_image)
    postprocess_image(x_output, output_type="pil")[0].show()
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break
File "streamdiffusion\lib\site-packages\streamdiffusion\acceleration\tensorrt\__init__.py", line 137, in accelerate_with_tensorrt
    compile_unet(
TypeError: streamdiffusion.acceleration.tensorrt.compile_unet() got multiple values for keyword argument 'opt_batch_size'

only works on Python 3.10?

For some external reasons I can only use Python 3.9 or Python 3.11. Does this repository only work on Python 3.10?
I tried 3.11 and it complained that torchvision is only supported for Python versions <3.11 for the torch version with CUDA 12.1.
On Python 3.9, when I do pip install streamdiffusion[tensorrt], there are some errors that don't happen when installing on 3.11.
I'm on Windows.

Thanks,

Joan

Loading pipeline components takes a long time

Hello, I have written a request handler that generates new images, but the first time a request is made, the pipeline components are loaded and it takes several seconds. Is there a suitable way to preload them?

Silent error on wrong path

I'm struggling to set an absolute path in the demo to an already downloaded model. I get a 'Triton not found' error but then nothing; when I cancel the process, it shows an exception noting that I've supplied an invalid path (I'm on Windows).

What is the workflow for pointing to a downloaded model? The included models directory doesn't seem to be used anywhere...

what node.js and npm version?

When I run the demo with npm run build, it reports an error:

SyntaxError: Unexpected token ;
    at Module._compile (internal/modules/cjs/loader.js:723:23)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
    at Function.Module._load (internal/modules/cjs/loader.js:585:3)
    at Module.require (internal/modules/cjs/loader.js:692:17)
    at require (internal/modules/cjs/helpers.js:25:18)
    at Object.<anonymous> (/data/home-siyehua/StreamDiffusion/demo/realtime-txt2img/view/node_modules/eslint-webpack-plugin/dist/getESLint.js:9:5)
    at Module._compile (internal/modules/cjs/loader.js:778:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] build: `react-scripts build`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] build script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /data/home-siyehua/.npm/_logs/2023-12-28T12_50_45_666Z-debug.log

and the log show:

0 info it worked if it ends with ok
1 verbose cli [ '/usr/bin/node', '/usr/bin/npm', 'run', 'build' ]
2 info using [email protected]
3 info using [email protected]
4 verbose run-script [ 'prebuild', 'build', 'postbuild' ]
5 info lifecycle [email protected]~prebuild: [email protected]
6 info lifecycle [email protected]~build: [email protected]
7 verbose lifecycle [email protected]~build: unsafe-perm in lifecycle true
8 verbose lifecycle [email protected]~build: PATH: /usr/lib/node_modules/npm/node_modules/npm-lifecycle/node-gyp-bin:/data/home-siyehua/StreamDiffusion/demo/realtime-txt2img/view/node_modules/.bin:/data/home-siyehua/miniconda3/envs/streamdiffusion/bin:/data/home-siyehua/miniconda3/condabin:/data/home-siyehua/.local/bin:/data/home-siyehua/bin:/usr/local/cuda/bin:/usr/share/Modules/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/data/home-siyehua/.ft
9 verbose lifecycle [email protected]~build: CWD: /data/home-siyehua/StreamDiffusion/demo/realtime-txt2img/view
10 silly lifecycle [email protected]~build: Args: [ '-c', 'react-scripts build' ]
11 silly lifecycle [email protected]~build: Returned: code: 1  signal: null
12 info lifecycle [email protected]~build: Failed to exec build script
13 verbose stack Error: [email protected] build: `react-scripts build`
13 verbose stack Exit status 1
13 verbose stack     at EventEmitter.<anonymous> (/usr/lib/node_modules/npm/node_modules/npm-lifecycle/index.js:332:16)
13 verbose stack     at EventEmitter.emit (events.js:198:13)
13 verbose stack     at ChildProcess.<anonymous> (/usr/lib/node_modules/npm/node_modules/npm-lifecycle/lib/spawn.js:55:14)
13 verbose stack     at ChildProcess.emit (events.js:198:13)
13 verbose stack     at maybeClose (internal/child_process.js:982:16)
13 verbose stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:259:5)
14 verbose pkgid [email protected]
15 verbose cwd /data/home-siyehua/StreamDiffusion/demo/realtime-txt2img/view
16 verbose Linux 5.4.119-19.0009.28
17 verbose argv "/usr/bin/node" "/usr/bin/npm" "run" "build"
18 verbose node v10.24.0
19 verbose npm  v6.14.11
20 error code ELIFECYCLE
21 error errno 1
22 error [email protected] build: `react-scripts build`
22 error Exit status 1
23 error Failed at the [email protected] build script.
23 error This is probably not a problem with npm. There is likely additional logging output above.
24 verbose exit [ 1, true ]

My node is v10.24.0 and npm is v6.14.11, but react-scripts build fails.

vid2vid capabilities

A few questions about the vid2vid capabilites:

Has the vid2vid example been tested with any non-anime models? I tried feeding a realistic video into the suggested anime model and, unsurprisingly, that didn't work well.

How well does it work if the input video does not have a plain background?

How well does it work if the input video is not similar to the output character prompt? The example GIF uses a prompt that is very close to the character in the input video.

No MPS support right?

Just to be clear, this repo is for CUDA enabled devices only, correct? On initially testing, mps doesn't seem to work.

Use instance in docker as "rendering" backend?

Hi! Awesome work! It's super nice that you provided a docker image. I am wondering if we run StreamDiffusion in a docker container, it seems we can't run the screen example anymore but only headless examples. I haven't tried it but I took a look at your code of the screen example. Is that right?

I am thinking perhaps we can make a backend and stream images or videos to it and stream the results back.

output from vid2vid is wrong aspect

I input a 1280x720 video and the output was 720x1280. I fixed this with some adjustments to the code but the request is to output the same resolution as input.

Seed in realtime text2img demo

I updated App.tsx to request 1 image instead of 16 while disabling calculateEditDistance, and added a seed to the fetchImage request.
This required adding a seed param in PredictInputModel and in self.stream_diffusion = StreamDiffusionWrapper(...).

diff --git a/demo/realtime-txt2img/server/main.py b/demo/realtime-txt2img/server/main.py
index c950380..effbe21 100644
--- a/demo/realtime-txt2img/server/main.py
+++ b/demo/realtime-txt2img/server/main.py
@@ -29,6 +29,7 @@ class PredictInputModel(BaseModel):
     """

     prompt: str
+    seed: int


 class PredictResponseModel(BaseModel):
@@ -70,6 +71,7 @@ class Api:
             warmup=config.warmup,
             use_safety_checker=config.use_safety_checker,
             cfg_type="none",
+            seed=42,
         )
         self.app = FastAPI()
         self.app.add_api_route(
@@ -109,7 +111,7 @@ class Api:
         async with self._predict_lock:
             return PredictResponseModel(
                 base64_image=self._pil_to_base64(
-                    self.stream_diffusion(prompt=inp.prompt)
+                    self.stream_diffusion(prompt=inp.prompt, seed=inp.seed)
                 )
             )

diff --git a/utils/wrapper.py b/utils/wrapper.py
index 12f8e61..3af4b43 100644
--- a/utils/wrapper.py
+++ b/utils/wrapper.py
@@ -206,6 +206,7 @@ class StreamDiffusionWrapper:
         self,
         image: Optional[Union[str, Image.Image, torch.Tensor]] = None,
         prompt: Optional[str] = None,
+        seed: Optional[int] = 42,
     ) -> Union[Image.Image, List[Image.Image]]:
         """
         Performs img2img or txt2img based on the mode.
@@ -216,6 +217,8 @@ class StreamDiffusionWrapper:
             The image to generate from.
         prompt : Optional[str]
             The prompt to generate images from.
+        seed : Optional[int]
+            The seed to use with prompt, default -1.

         Returns
         -------
@@ -225,10 +228,11 @@ class StreamDiffusionWrapper:
         if self.mode == "img2img":
             return self.img2img(image)
         else:
-            return self.txt2img(prompt)
+            print(prompt)
+            return self.txt2img(prompt, seed)

     def txt2img(
-        self, prompt: Optional[str] = None
+        self, prompt: Optional[str] = None, seed: Optional[int] = 42
     ) -> Union[Image.Image, List[Image.Image], torch.Tensor, np.ndarray]:
         """
         Performs txt2img.
@@ -244,6 +248,7 @@ class StreamDiffusionWrapper:
             The generated image.
         """
         if prompt is not None:
+            self.stream.generator.manual_seed(seed)
             self.stream.update_prompt(prompt)

         if self.sd_turbo:

Seed is still not working. What am I missing?
EDIT: It works now. I don't understand why it takes a few images to show the effect of prompt though.

How to generate a batch of size 4 from one input image using img2img?

I want to generate batch_size > 1 images for one input image with img2img. In the txt2img method,

x_t_latent = torch.randn((self.frame_bff_size, 4, self.latent_height, self.latent_width)).to(
                device=self.device, dtype=self.dtype
            )

x_t_latent can be set with frame_bff_size, so I thought

 x_t_latent=x_t_latent.repeat((self.frame_bff_size, 1,1,1))  #self.frame_bff_size>1

might work for img2img, but I got an error when executing self.unet_step in the self.predict_x0_batch function:

x_0_pred_batch, model_pred = self.unet_step(x_t_latent, t_list)
model_pred = self.unet(
RuntimeError: The expanded size of the tensor (6) must match the existing size (5) at non-singleton dimension 0.  Target sizes: [6].  Tensor sizes: [5]

I need some help, please.


json.decoder.JSONDecodeError

Hi, I am getting the following error on Windows when launching examples

(streamdiffusion) PS E:\StreamDiffusion\examples> python txt2img/single.py --output output.png --prompt "A cat with a hat"
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
unet\diffusion_pytorch_model.safetensors not found
Loading pipeline components...: 29%|████████████████████████████████████████████████████████████▌ | 2/7 [00:04<00:10, 2.05s/it]
E:\Anaconda\envs\streamdiffusion\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:05<00:00, 1.22it/s]
E:\Anaconda\envs\streamdiffusion\lib\site-packages\diffusers\loaders\lora.py:952: FutureWarning: `fuse_text_encoder_lora` is deprecated and will be removed in version 0.25. You are using an old version of LoRA backend. This will be deprecated in the next releases in favor of PEFT make sure to install the latest PEFT and transformers packages in the future.
  deprecate("fuse_text_encoder_lora", "0.25", LORA_DEPRECATION_MESSAGE)
Traceback (most recent call last):
File "E:\Anaconda\envs\streamdiffusion\lib\site-packages\diffusers\configuration_utils.py", line 424, in load_config
config_dict = cls._dict_from_json_file(config_file)
File "E:\Anaconda\envs\streamdiffusion\lib\site-packages\diffusers\configuration_utils.py", line 547, in dict_from_json_file
return json.loads(text)
File "E:\Anaconda\envs\streamdiffusion\lib\json_init
.py", line 346, in loads
return _default_decoder.decode(s)
File "E:\Anaconda\envs\streamdiffusion\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "E:\Anaconda\envs\streamdiffusion\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\StreamDiffusion\examples\txt2img\single.py", line 82, in
fire.Fire(main)
File "E:\Anaconda\envs\streamdiffusion\lib\site-packages\fire\core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "E:\Anaconda\envs\streamdiffusion\lib\site-packages\fire\core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "E:\Anaconda\envs\streamdiffusion\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "E:\StreamDiffusion\examples\txt2img\single.py", line 54, in main
stream = StreamDiffusionWrapper(
File "E:\StreamDiffusion\examples\txt2img....\utils\wrapper.py", line 152, in init
self.stream: StreamDiffusion = self._load_model(
File "E:\StreamDiffusion\examples\txt2img....\utils\wrapper.py", line 463, in _load_model
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(
File "E:\Anaconda\envs\streamdiffusion\lib\site-packages\diffusers\models\modeling_utils.py", line 712, in from_pretrained
config, unused_kwargs, commit_hash = cls.load_config(
File "E:\Anaconda\envs\streamdiffusion\lib\site-packages\diffusers\configuration_utils.py", line 428, in load_config
raise EnvironmentError(f"It looks like the config file at '{config_file}' is not a valid JSON file.")
OSError: It looks like the config file at 'C:\Users\korze.cache\huggingface\hub\models--madebyollin--taesd\snapshots\b7456fca7c8110c3aaadf5edf34b05bfb2a1af55\config.json' is not a valid JSON file.
(streamdiffusion) PS E:\StreamDiffusion\examples>

examples/optimal-performance/multi.py script not working. RuntimeError: Cannot re-initialize CUDA in forked subprocess.

Environment

  • ubuntu 22.04
  • rtx 4090

Problem description

After typing in python optimal-performance/multi.py or python optimal-performance/single.py as per the instructions, this problem occurs:

$ python optimal-performance/multi.py 
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 16.01it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Traceback (most recent call last):
  File "/home/c4fun/code/github.com/cumulo-autumn/StreamDiffusion/examples/optimal-performance/../../utils/wrapper.py", line 411, in _load_model
    pipe: StableDiffusionPipeline = StableDiffusionPipeline.from_pretrained(
  File "/opt/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/diffusers-0.24.0-py3.10.egg/diffusers/pipelines/pipeline_utils.py", line 864, in to
    module.to(device, dtype)
  File "/opt/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/transformers-4.36.2-py3.10.egg/transformers/modeling_utils.py", line 2460, in to
    return super().to(*args, **kwargs)
  File "/opt/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/opt/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/opt/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/opt/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/torch/cuda/__init__.py", line 284, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Model load has failed. Doesn't exist.
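
The error message itself points at the generic workaround: use the 'spawn' start method when combining CUDA with multiprocessing. A minimal, standalone sketch of that workaround (not the repository's actual fix) looks like this:

import torch.multiprocessing as mp

def worker(prompt: str) -> None:
    # CUDA is only touched inside the child process, which is safe with "spawn".
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"worker received prompt {prompt!r} on {device}")

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # avoids the fork + CUDA re-init error
    p = mp.Process(target=worker, args=("a cat with a hat",))
    p.start()
    p.join()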

AttributeError: module 'polygraphy.backend.trt.util' has no attribute 'get_bindings_per_profile'

When I use TensorRT, like
stream = accelerate_with_tensorrt(
stream, "engines", max_batch_size=2,
)
or set acceleration: Literal["none", "xformers", "tensorrt"] = "tensorrt", in examples/optimal-performance/single.py

I fixed some bugs mentioned in #19; finally, I get the bug: AttributeError: module 'polygraphy.backend.trt.util' has no attribute 'get_bindings_per_profile'.
From https://docs.nvidia.com/deeplearning/tensorrt/polygraphy/docs/_modules/polygraphy/backend/trt/util.html, it seems polygraphy.backend.trt.util no longer has the attribute 'get_bindings_per_profile'.

Something missing in demo/realtime-txt2img

When using Docker, I encountered some issues when trying to run the demo's realtime-txt2img. Here are the problems I encountered:

  1. In addition to the requirements, pnpm and Node.js are also needed.
  2. When starting for the first time, if multiple download attempts fail (due to the network), everything is re-downloaded without utilizing a cache.

For 1: follow the steps below, from https://pnpm.io/installation and https://github.com/nodesource/distributions, and then it works.

# pnpm
curl -fsSL https://get.pnpm.io/install.sh | sh - 
source /root/.bashrc

# nodejs
apt-get update 
apt-get install -y ca-certificates curl gnupg 
mkdir -p /etc/apt/keyrings 
curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg 
NODE_MAJOR=20 
echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list 
apt-get update
apt-get install nodejs -y

Here are the changes I made when using Docker, in case someone else has problems in Docker:

  1. In setup.py, comment or remove "pywin32".
  2. In src\streamdiffusion\tools\install-tensorrt.py, comment or remove the part of if not is_installed("pywin32"):.
  3. In demo\realtime-txt2img\config.py, change host: str = "127.0.0.1" to host: str = "0.0.0.0". After starting, open 127.0.0.1:9090 in browser.

Missing dist/ directory in demo/realtime-txt2img/frontend

I got the error RuntimeError: Directory './frontend/dist' does not exist when running demo/realtime-txt2img/main.py. It looks like ./frontend/dist is specified as the static directory for app.mount() on line 90 of main.py, but that directory is not available in the frontend/ directory.

sd-turbo model cannot be used

When I used the sd-turbo model in examples/txt2img/single.py, I made the following changes:

  1. changed the model path to model_id_or_path: str = "/006data/han/snapfusion/models/sd-turbo/",
  2. changed the t_index_list to t_index_list=[45], because I found that when I use the sd_turbo model, it calls the txt2img_sd_turbo function in pipeline.py. The unet is used as:
    model_pred = self.unet(
    x_t_latent,
    self.sub_timesteps_tensor,
    encoder_hidden_states=self.prompt_embeds,
    return_dict=False,
    )[0]
    so the length of self.sub_timesteps_tensor must be 1. The default is 4, so the error is reported
    Nothing else has changed, but the resulting image is noise.

Tensorrt install error

python -m pip install --pre -i https://pypi.nvidia.com tensorrt==9.0.1.post11.dev4
Looking in indexes: https://pypi.nvidia.com
Collecting tensorrt==9.0.1.post11.dev4
Using cached tensorrt-9.0.1.post11.dev4-py2.py3-none-any.whl
Collecting tensorrt-libs==9.0.1.post11.dev4 (from tensorrt==9.0.1.post11.dev4)
Using cached https://pypi.nvidia.com/tensorrt-libs/tensorrt_libs-9.0.1.post11.dev4-py2.py3-none-manylinux_2_17_x86_64.whl (1060.6 MB)
Collecting tensorrt-bindings==9.0.1.post11.dev4 (from tensorrt==9.0.1.post11.dev4)
Using cached https://pypi.nvidia.com/tensorrt-bindings/tensorrt_bindings-9.0.1.post11.dev4-cp310-none-manylinux_2_17_x86_64.whl (956 kB)
INFO: pip is looking at multiple versions of tensorrt-libs to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement nvidia-cuda-runtime-cu11 (from tensorrt-libs) (from versions: none)
ERROR: No matching distribution found for nvidia-cuda-runtime-cu11

tensorrt error

TypeError: streamdiffusion.acceleration.tensorrt.compile_unet() got multiple values for keyword argument 'opt_batch_size'

?

Loopback?

If I put the generated image in the capture area in the screen example, there is a noticeable gap of 3-4 frames between capture and output. How do I go about making sure that capture is done after the image is put on screen, so that it creates a loopback and iteratively improves the image?

frame_buffer_size is 1 already.

something wrong

/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/diffusers/loaders/lora.py:952: FutureWarning: `fuse_text_encoder_lora` is deprecated and will be removed in version 0.25. You are using an old version of LoRA backend. This will be deprecated in the next releases in favor of PEFT make sure to install the latest PEFT and transformers packages in the future.
  deprecate("fuse_text_encoder_lora", "0.25", LORA_DEPRECATION_MESSAGE)
Traceback (most recent call last):
  File "/home/StreamDiffusion/examples/txt2img/single.py", line 83, in <module>
    fire.Fire(main)
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/StreamDiffusion/examples/txt2img/single.py", line 78, in main
    output_image = stream()
  File "/home/StreamDiffusion/examples/txt2img/../../utils/wrapper.py", line 230, in __call__
    return self.txt2img(prompt)
  File "/home/StreamDiffusion/examples/txt2img/../../utils/wrapper.py", line 254, in txt2img
    image_tensor = self.stream.txt2img(self.frame_buffer_size)
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/streamdiffusion/pipeline.py", line 473, in txt2img
    x_0_pred_out = self.predict_x0_batch(
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/streamdiffusion/pipeline.py", line 423, in predict_x0_batch
    x_0_pred, model_pred = self.unet_step(x_t_latent, t, idx)
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/streamdiffusion/pipeline.py", line 313, in unet_step
    model_pred = self.unet(
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 1035, in forward
    sample = self.conv_in(sample)
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/root/anaconda3/envs/streamsd/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: GET was unable to find an engine to execute this computation
(streamsd) [root@iZ8vbe58az41quqd311x5rZ txt2img]# 

env:
cuda:11.8
python 3.10
torch: 2.1.0
torchvision: 0.16.0
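This cuDNN error ("unable to find an engine") usually points at the environment rather than at StreamDiffusion itself, e.g. a torch / CUDA / cuDNN combination that cannot provide a kernel for the requested fp16 convolution. A quick, pipeline-independent sanity check is to run a minimal half-precision convolution on the GPU:

# Requires a CUDA GPU. If this minimal fp16 convolution fails with the same error,
# the problem is in the torch / CUDA / cuDNN install, not in the pipeline code.
import torch

print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())
x = torch.randn(1, 4, 64, 64, device="cuda", dtype=torch.float16)
conv = torch.nn.Conv2d(4, 320, kernel_size=3, padding=1).to("cuda", torch.float16)
print(conv(x).shape)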

The demo environment is incorrectly configured, and the webpage shows "Not Found".

Hello author,
I encountered some problems while using your model.
I did the following steps:

  1. I ran start.sh according to the instructions, and the npm command reported a lot of errors. I tried "npm audit fix --force", but it didn't help.
  2. I then ran "main.py" in "server"; the webpage opens but displays "Not Found".


RuntimeError: "clamp_scalar_cpu" not implemented for 'Half'

This error occurred when I tried to run it.
/home/ubuntu/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/diffusers/image_processor.py:339: FutureWarning: Passing `image` as torch tensor with value range in [-1,1] is deprecated. The expected value range for image tensor is [0,1] when passing as pytorch tensor or numpy Array. You passed `image` with value range [-0.050994873046875,1.0]
  warnings.warn(
Exception in thread Thread-1 (_receive_images):
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/streamdiffusion/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/anaconda3/envs/streamdiffusion/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/GITHUG/StreamDiffusion/examples/screen/../../utils/viewer.py", line 54, in _receive_images
    postprocess_image(queue.get(block=False), output_type="pil")[0],
  File "/home/ubuntu/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/streamdiffusion/image_utils.py", line 60, in postprocess_image
    [
  File "/home/ubuntu/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/streamdiffusion/image_utils.py", line 61, in <listcomp>
    denormalize(image[i]) if do_denormalize[i] else image[i]
  File "/home/ubuntu/anaconda3/envs/streamdiffusion/lib/python3.10/site-packages/streamdiffusion/image_utils.py", line 13, in denormalize
    return (images / 2 + 0.5).clamp(0, 1)
RuntimeError: "clamp_scalar_cpu" not implemented for 'Half'

Env: Ubuntu 18.04
GPU: RTX 3090 * 2
CUDA: 11.4
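The message means a float16 tensor ended up on the CPU, where torch has no half-precision clamp kernel. A hedged workaround is to cast to float32 (or keep the tensor on the GPU) before postprocessing, mirroring the denormalize() shown in the traceback; adapt it to the image_utils.py of your installed version:

import torch

def denormalize(images: torch.Tensor) -> torch.Tensor:
    # clamp() has no fp16 kernel on CPU, so cast before clamping.
    if images.device.type == "cpu" and images.dtype == torch.float16:
        images = images.float()
    return (images / 2 + 0.5).clamp(0, 1)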

Ablation on effect of similarity cache?

Maybe this is more suited as a comment on the paper, but I am interested in knowing the contribution of similarity caching to the claimed FPS, since the choice of video (one with many similar frames vs. a dynamic video) can have a large effect.

Hence, adding a column showing an ablation with

  1. no continuous batching
  2. no similarity cache

would greatly help in untangling the contribution of each of the optimization proposals.

Model load has failed (txt2img and screen produce the same error)

Windows 11
Error information:

Traceback (most recent call last):
  File "E:\Study\AI\StreamDiffusion-main\examples\txt2img\..\..\utils\wrapper.py", line 411, in _load_model
    pipe: StableDiffusionPipeline = StableDiffusionPipeline.from_pretrained(
  File "C:\Users\whpww\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1090, in from_pretrained
    cached_folder = cls.download(
  File "C:\Users\whpww\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1649, in download
    info = model_info(
  File "C:\Users\whpww\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\whpww\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\hf_api.py", line 2084, in model_info
    r = get_session().get(path, headers=headers, timeout=timeout, params=params)
  File "C:\Users\whpww\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "C:\Users\whpww\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\whpww\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\whpww\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\utils\_http.py", line 67, in send
    return super().send(request, *args, **kwargs)
  File "C:\Users\whpww\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\adapters.py", line 507, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/KBlueLeaf/kohaku-v2.1 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000016FB6333C10>, 'Connection to huggingface.co timed out. (connect timeout=None)'))"), '(Request ID: c900e75e-2897-4ed0-94d4-2db8fec92845)')
Model load has failed. Doesn't exist.
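The traceback shows the model metadata request to huggingface.co timing out, so this is a network issue rather than a StreamDiffusion bug. One hedged workaround is to download the model once (on a machine or network that can reach the Hub, or via a mirror selected with the HF_ENDPOINT environment variable) and point the example at the local copy, since diffusers' from_pretrained accepts a local directory:

# Sketch: load the pipeline from a local snapshot so no network access is needed
# at run time. The local path below is an example, not part of the repo.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "/path/to/local/kohaku-v2.1",   # directory containing model_index.json
    local_files_only=True,          # fail fast instead of retrying the Hub
)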

screen/main.py Cannot re-initialize CUDA in forked subprocess.

Ubuntu 22.04, CUDA 12.1, RTX 3090

When running

cd examples
python screen/main.py

and then pressing Enter to start the subprocess, I get a "Cannot re-initialize CUDA in forked subprocess" error.

(streamdiffusion) hangyu5@yhyu13fuwuqi:~/Documents/Git-repoMy/AIResearchVault/repo/AIGC/game changer/StreamDiffusion/examples$ python screen/main.py
vae/diffusion_pytorch_model.safetensors not found
Loading pipeline components...:  14%|███████████████████▋                                                                                                                      | 1/7 [00:01<00:09,  1.59s/it]/home/hangyu5/anaconda3/envs/streamdiffusion/lib/python3.11/site-packages/transformers-4.36.2-py3.11.egg/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00,  1.97it/s]
Traceback (most recent call last):
  File "/home/hangyu5/Documents/Git-repoMy/AIResearchVault/repo/AIGC/game changer/StreamDiffusion/examples/screen/../../utils/wrapper.py", line 413, in _load_model
    ).to(device=self.device, dtype=self.dtype)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/streamdiffusion/lib/python3.11/site-packages/diffusers-0.24.0-py3.11.egg/diffusers/pipelines/pipeline_utils.py", line 864, in to
    module.to(device, dtype)
  File "/home/hangyu5/anaconda3/envs/streamdiffusion/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/streamdiffusion/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/hangyu5/anaconda3/envs/streamdiffusion/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/streamdiffusion/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyu5/anaconda3/envs/streamdiffusion/lib/python3.11/site-packages/torch/cuda/__init__.py", line 284, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Model load has failed. Doesn't exist.
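The error message itself names the fix: processes that touch CUDA must be started with the 'spawn' start method instead of Linux's default fork. A hedged sketch (the exact placement depends on how examples/screen/main.py is structured in your checkout):

# Force the 'spawn' start method before any CUDA-using process is created.
import multiprocessing as mp

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    # ... create the capture / generation processes after this point ...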
