voltaml / voltaml-fast-stable-diffusion

Beautiful and Easy to use Stable Diffusion WebUI

Home Page: https://voltaml.github.io/voltaML-fast-stable-diffusion/

License: GNU General Public License v3.0

voltaml-fast-stable-diffusion's Introduction

VoltaML - Fast Stable Diffusion

Stable Diffusion WebUI and API accelerated by AITemplate

Made with ❤️ by Stax124, Gabe, and the community

About the Project
Contributing
License
Contact

About the Project

Screenshots

Tech Stack

Client

API

Discord Bot

Discord.py

DevOps

Main features

Easy install with Docker
Clean and simple Web UI
Supports PyTorch as well as AITemplate for inference
Support for Windows and Linux
Documented API

Speed comparison

Please refer to this table. The data comes from a small sample and was usually collected on a single machine, so your results may vary.

Installation

Please see the documentation for installation instructions.

Contributing

Contributions are always welcome!

License

Distributed under the GPL v3. See License for more information.

Contact

Feel free to contact us on Discord: https://discord.gg/pY5SVyHmWm

Project Link: https://github.com/VoltaML/voltaML-fast-stable-diffusion

voltaml-fast-stable-diffusion's Issues

TRT Inference Not Working [volta_trt_flash]

[E] 3: [executionContext.cpp::validateInputBindings::1831] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::validateInputBindings::1831, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [2,4,64,96] for bindings[0] exceed min ~ max range at index 3, maximum dimension in profile is 64, minimum dimension in profile is 64, but supplied dimension is 96.

Exception in thread Thread-87:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 544, in infer_trt
    images = demo.infer(prompt, negative_prompt, args.height, args.width, verbose=args.verbose, seed=args.seed)
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 404, in infer
    noise_pred = self.runEngine(self.unet_model_key, {"sample": sample_inp, "timestep": timestep_inp, "encoder_hidden_states": embeddings_inp})['latent']
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 271, in runEngine
    return engine.infer(feed_dict, self.stream)
  File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 108, in infer
    raise ValueError(f"ERROR: inference failed.")
ValueError: ERROR: inference failed.

GPU: RTX 4090
I used the original Dockerfile from the volta_trt_flash branch.
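
The error above says the engine's optimization profile only allows a size of 64 at index 3, while the supplied binding was [2,4,64,96]. A minimal sketch of a pre-flight check follows, assuming the usual Stable Diffusion v1 layout in which latents are 1/8 of the image size and the batch is doubled for classifier-free guidance. The names `DimRange`, `latent_shape`, and `check_against_profile` are hypothetical and are not part of voltaML or TensorRT.

```python
from dataclasses import dataclass

VAE_SCALE = 8  # assumption: SD v1.x latents are 1/8 of the image resolution


@dataclass(frozen=True)
class DimRange:
    """Inclusive min/max for one dynamic dimension of an engine profile."""
    minimum: int
    maximum: int

    def contains(self, value: int) -> bool:
        return self.minimum <= value <= self.maximum


def latent_shape(height: int, width: int, batch: int = 1) -> tuple:
    """Shape of the UNet `sample` input in NCHW order.

    The batch is doubled on the assumption that conditional and
    unconditional inputs are stacked for classifier-free guidance.
    """
    if height % VAE_SCALE or width % VAE_SCALE:
        raise ValueError("height and width must be multiples of 8")
    return (2 * batch, 4, height // VAE_SCALE, width // VAE_SCALE)


def check_against_profile(height, width, h_range, w_range):
    """Return a list of problems; an empty list means the request fits."""
    _, _, lat_h, lat_w = latent_shape(height, width)
    problems = []
    for name, value, allowed in (("height", lat_h, h_range),
                                 ("width", lat_w, w_range)):
        if not allowed.contains(value):
            problems.append(
                f"latent {name} {value} outside profile range "
                f"[{allowed.minimum}, {allowed.maximum}]"
            )
    return problems
```

Under these assumptions, a 512x768 request maps to the [2,4,64,96] shape in the log and would be rejected by an engine built only for 512x512. Either the engine has to be rebuilt with a wider profile or the request has to stay within the built size.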

Process killed when using "volta_accelerate.py"

Hello, I encountered a problem when trying to use volta_accelerate.py.

System info:
Windows 10
Nvidia 3060Ti 8GB
i5-11400F
Using Docker Desktop with WSL2 (Ubuntu) with the voltaML Docker Container image

Here is the command and the log I got:

root@f585f96dd9a2:/workspace/voltaML-fast-stable-diffusion# python3 volta_accelerate.py --model="runwayml/stable-diffusion-v1-5"
Downloading: 100%|██████████| 3.44G/3.44G [05:11<00:00, 11.0MB/s]
Downloading: 100%|██████████| 743/743 [00:00<00:00, 589kB/s]
/workspace/voltaML-fast-stable-diffusion/diffusers/models/unet_2d_condition_onnx.py:274: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if any(s % default_overall_up_factor != 0 for s in sample.shape[-2:]):
/workspace/voltaML-fast-stable-diffusion/diffusers/models/resnet.py:182: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/workspace/voltaML-fast-stable-diffusion/diffusers/models/resnet.py:187: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/workspace/voltaML-fast-stable-diffusion/diffusers/models/resnet.py:109: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/workspace/voltaML-fast-stable-diffusion/diffusers/models/resnet.py:122: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if hidden_states.shape[0] >= 64:
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:777: UserWarning: no signature found for <torch.ScriptMethod object at 0x7fa7a64aa040>, skipping _decide_input_format
  warnings.warn(f"{e}, skipping _decide_input_format")
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:1880: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input latent_model_input
  warnings.warn(
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:1880: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input t
  warnings.warn(
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:1880: UserWarning: No names were found for specified dynamic axes of provided input.Automatically generated names will be applied to each dynamic axes of input encoder_hidden_states
  warnings.warn(
/opt/conda/lib/python3.8/site-packages/torch/onnx/_patch_torch.py:67: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1874.)
  torch._C._jit_pass_onnx_node_shape_type_inference(
/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:648: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1874.)
  _C._jit_pass_onnx_graph_shape_type_inference(
Killed

The last log line is just "Killed", so I don't know where to investigate. Could it be that my graphics card doesn't have enough VRAM?
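
A bare "Killed" with no Python traceback usually means the process received SIGKILL, and on Linux that is commonly the kernel's out-of-memory killer reacting to host RAM exhaustion rather than to VRAM. Whether that is what happened here is an assumption; the log alone does not confirm it. The sketch below reads /proc/meminfo inside the container to see how much RAM and swap are left before starting the conversion. The helper names are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse the content of /proc/meminfo into {field: kibibytes}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(meminfo: dict) -> float:
    """Available RAM plus free swap, in GiB."""
    kib = meminfo.get("MemAvailable", 0) + meminfo.get("SwapFree", 0)
    return kib / (1024 ** 2)


def enough_memory(meminfo: dict, required_gib: float) -> bool:
    """True if at least `required_gib` GiB of RAM plus swap is available."""
    return available_gib(meminfo) >= required_gib
```

To use it, pass the text of `open("/proc/meminfo").read()` to `parse_meminfo`. The threshold given to `enough_memory` is a guess that has to be tuned per model. On Docker Desktop with WSL2, the memory limit of the WSL2 VM is also worth checking, since it can be lower than the physical RAM of the machine.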

WebSocket connection suddenly breaks when "batch count" is set to more than 1

logs:
02:10:14 | root | INFO » Adding job e2ffa91a-e182-45ca-a93b-6112251cbd6c to queue
100% 25/25 [00:01<00:00, 17.57it/s]
100% 25/25 [00:01<00:00, 17.52it/s]
100% 25/25 [00:01<00:00, 17.43it/s]
100% 25/25 [00:01<00:00, 17.46it/s]
100% 25/25 [00:01<00:00, 17.43it/s]
100% 25/25 [00:01<00:00, 17.23it/s]
100% 25/25 [00:01<00:00, 17.40it/s]
100% 25/25 [00:01<00:00, 17.42it/s]
100% 25/25 [00:01<00:00, 17.32it/s]
INFO: 172.18.22.48:62843 - "POST /api/generate/txt2img HTTP/1.1" 200 OK
02:10:33 | asyncio | ERROR » Task exception was never retrieved
future: <Task finished name='Task-4' coro=<WebSocketManager.perf_loop() done, defined at /app/api/websockets/manager.py:32> exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=1000, reason=''), True)>
Traceback (most recent call last):
  File "/app/api/websockets/manager.py", line 60, in perf_loop
    await self.broadcast(Data(data_type="cluster_stats", data=data))
  File "/app/api/websockets/manager.py", line 83, in broadcast
    await connection.send_json(data.to_json())
  File "/usr/local/lib/python3.8/dist-packages/starlette/websockets.py", line 173, in send_json
    await self.send({"type": "websocket.send", "text": text})
  File "/usr/local/lib/python3.8/dist-packages/starlette/websockets.py", line 85, in send
    await self._send(message)
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/exceptions.py", line 65, in sender
    await send(message)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/protocols/websockets/websockets_impl.py", line 327, in asgi_send
    await self.send(data) # type: ignore[arg-type]
  File "/usr/local/lib/python3.8/dist-packages/websockets/legacy/protocol.py", line 635, in send
    await self.ensure_open()
  File "/usr/local/lib/python3.8/dist-packages/websockets/legacy/protocol.py", line 953, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
ERROR: Exception in ASGI application
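
The traceback shows the periodic stats broadcast failing because one client had already closed its socket normally (code 1000), and the exception then ended the whole task. One possible mitigation, sketched below, is to treat a closed connection as a reason to drop that client instead of aborting the broadcast. This is not the project's actual manager code: `ConnectionClosed`, `FakeConnection`, and `Broadcaster` are hypothetical stand-ins so the example runs without a web framework.

```python
import asyncio


class ConnectionClosed(Exception):
    """Stand-in for the library exception raised on a closed socket (hypothetical)."""


class FakeConnection:
    """Test double; in a real server this would be a WebSocket object."""

    def __init__(self, closed: bool = False):
        self.closed = closed
        self.sent = []

    async def send_json(self, payload):
        if self.closed:
            raise ConnectionClosed()
        self.sent.append(payload)


class Broadcaster:
    def __init__(self):
        self.connections = []

    async def broadcast(self, payload) -> int:
        """Send to every client and forget the ones whose socket has closed.

        Returns the number of connections that were dropped.
        """
        dead = []
        for conn in list(self.connections):
            try:
                await conn.send_json(payload)
            except ConnectionClosed:
                dead.append(conn)  # remove after the loop, not during it
        for conn in dead:
            self.connections.remove(conn)
        return len(dead)
```

With this shape, a client that disconnects in the middle of a multi-image batch only removes itself from the list, and the remaining clients keep receiving updates.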

Error when using optimize.sh

Traceback (most recent call last):
  File "volta_accelerate.py", line 153, in <module>
    convert_to_onnx(args)
  File "volta_accelerate.py", line 79, in convert_to_onnx
    traced_model = torch.jit.trace(
  File "/home/work/python/lib/python3.8/site-packages/torch/jit/_trace.py", line 750, in trace
    return trace_module(
  File "/home/work/python/lib/python3.8/site-packages/torch/jit/_trace.py", line 967, in trace_module
    module._c._create_method_from_trace(
  File "/home/work/python/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/work/python/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1118, in _slow_forward
    result = self.forward(*input, **kwargs)
TypeError: forward() takes from 4 to 5 positional arguments but 6 were given
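
The message means the example inputs passed to the tracer contain one more positional argument than the model's forward() accepts. The count in the message includes self, so forward() takes 3 or 4 inputs and 5 were supplied. A likely cause, though only an assumption here, is that the installed model code has a different forward() signature than the one the script was written for. The sketch below shows how the mismatch can be detected before tracing using only the standard library; `ToyUNet` and `check_call_arity` are hypothetical.

```python
import inspect


def check_call_arity(func, args):
    """Return None if `args` can be bound to `func`, else the error text."""
    try:
        inspect.signature(func).bind(*args)
    except TypeError as exc:
        return str(exc)
    return None


class ToyUNet:
    """Hypothetical stand-in: forward takes 3 required inputs and 1 optional."""

    def forward(self, sample, timestep, encoder_hidden_states, return_dict=True):
        return sample


model = ToyUNet()
four_inputs = ("sample", "t", "emb", False)
five_inputs = ("sample", "t", "emb", False, "extra")

print(check_call_arity(model.forward, four_inputs))   # None: the call would bind
print(check_call_arity(model.forward, five_inputs))   # error text: too many arguments
```

Running the same check against the real module's forward and the tuple given to the tracer would show which argument is surplus.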

[Feature Request] Support for Real-ESRGAN and GFPGAN

Would it be possible to implement this, or does it not work with these models?

Documentation

First of all, thank you for making compilation much simpler.

However, I think proper documentation is needed for better adoption of this technology.

More specifically, the infer_trt function in voltaML-fast-stable-diffusion/volta_accelerate.py (line 446 in f3db583):

    def infer_trt(saving_path, model, prompt, neg_prompt, img_height, img_width, num_inference_steps, guidance_scale, num_images_per_prompt, seed=None):

I would also like to confirm that all the "converter" does is compile the model into TensorRT.

Thank you.

[Bug]: Web UI sometimes disconnects under heavy load and breaks the preview.

Describe the bug

As the title says, sometimes while generating on the experimental branch, the Web UI will disconnect. Upon reconnecting, the noise preview feature won't work, and there is a 1-2 second delay before the generated image pops up in the UI. Disconnecting and reconnecting again doesn't fix the issue; the server has to be restarted.

This issue goes back a while, a few weeks at least.

This is one of those annoying nondeterministic issues that I can't trigger on demand on the experimental branch, but the exact same error consistently happens when testing this torch.compile PR: #72

If I close the Web UI and reopen it once the model is done compiling or the image is done generating, it won't error out.

It seems to be related to some kind of networking "heartbeat" timeout. I tried making a few config changes, but had no luck:

PIPE_COMPILE_SET
  0%|                                                                             | 0/25 [00:00<?, ?it/s]
INFO     22:18:59 | uvicorn.access » 127.0.0.1:57130 - "POST /api/generate/txt2img        h11_impl.py:498
         HTTP/1.1" 500
ERROR    22:18:59 | uvicorn.error » Exception in ASGI application                         h11_impl.py:433

         ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/gpu.py:213 in generate     │
         │                                                                              │
         │   210 │   │   │   except Exception as err:  # pylint: disable=broad-except   │
         │   211 │   │   │   │   self.memory_cleanup()                                  │
         │   212 │   │   │   │   self.queue.mark_finished()                             │
         │ ❱ 213 │   │   │   │   raise err                                              │
         │   214 │   │   │                                                              │
         │   215 │   │   │   deltatime = time.time() - start_time                       │
         │   216                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/gpu.py:180 in generate     │
         │                                                                              │
         │   177 │   │   │   # Generate images                                          │
         │   178 │   │   │   try:                                                       │
         │   179 │   │   │   │   generated_images: Optional[List[Image.Image]]          │
         │ ❱ 180 │   │   │   │   generated_images = await run_in_thread_async(          │
         │   181 │   │   │   │   │   func=generate_thread_call, args=(job,)             │
         │   182 │   │   │   │   )                                                      │
         │   183                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/utils.py:104 in            │
         │ run_in_thread_async                                                          │
         │                                                                              │
         │   101 │   value, exc = thread.join()                                         │
         │   102 │                                                                      │
         │   103 │   if exc:                                                            │
         │ ❱ 104 │   │   raise exc                                                      │
         │   105 │                                                                      │
         │   106 │   return value                                                       │
         │   107                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/thread.py:45 in run        │
         │                                                                              │
         │   42 │   │   │   │   │   │   "Executing coroutine %s in %s", target.__name__ │
         │   43 │   │   │   │   │   )                                                   │
         │   44 │   │   │   │   │   try:                                                │
         │ ❱ 45 │   │   │   │   │   │   self._return = target(*self._args, **self._kwar │
         │      ignore                                                                  │
         │   46 │   │   │   │   │   except Exception as err:  # pylint: disable=broad-e │
         │   47 │   │   │   │   │   │   self._err = err                                 │
         │   48 │   │   │   │   else:                                                   │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/gpu.py:125 in              │
         │ generate_thread_call                                                         │
         │                                                                              │
         │   122 │   │   │                                                              │
         │   123 │   │   │   if isinstance(model, PyTorchStableDiffusion):              │
         │   124 │   │   │   │   logger.debug("Generating with PyTorch")                │
         │ ❱ 125 │   │   │   │   images: List[Image.Image] = model.generate(job)        │
         │   126 │   │   │   elif isinstance(model, AITemplateStableDiffusion):         │
         │   127 │   │   │   │   logger.debug("Generating with AITemplate")             │
         │   128 │   │   │   │   images: List[Image.Image] = model.generate(job)        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference/pytorch.py:621   │
         │ in generate                                                                  │
         │                                                                              │
         │   618 │   │   │   │   raise ValueError("Invalid job type for this pipeline") │
         │   619 │   │   except Exception as e:                                         │
         │   620 │   │   │   self.memory_cleanup()                                      │
         │ ❱ 621 │   │   │   raise e                                                    │
         │   622 │   │                                                                  │
         │   623 │   │   # Clean memory and return images                               │
         │   624 │   │   self.memory_cleanup()                                          │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference/pytorch.py:610   │
         │ in generate                                                                  │
         │                                                                              │
         │   607 │   │                                                                  │
         │   608 │   │   try:                                                           │
         │   609 │   │   │   if isinstance(job, Txt2ImgQueueEntry):                     │
         │ ❱ 610 │   │   │   │   images = self.txt2img(job)                             │
         │   611 │   │   │   elif isinstance(job, Img2ImgQueueEntry):                   │
         │   612 │   │   │   │   images = self.img2img(job)                             │
         │   613 │   │   │   elif isinstance(job, InpaintQueueEntry):                   │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference/pytorch.py:257   │
         │ in txt2img                                                                   │
         │                                                                              │
         │   254 │   │   │   if "highres_fix" in job.flags:                             │
         │   255 │   │   │   │   output_type = "latent"                                 │
         │   256 │   │   │                                                              │
         │ ❱ 257 │   │   │   data = pipe.text2img(                                      │
         │   258 │   │   │   │   prompt=job.data.prompt,                                │
         │   259 │   │   │   │   height=job.data.height,                                │
         │   260 │   │   │   │   width=job.data.width,                                  │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference/lwp_sd.py:685 in │
         │ text2img                                                                     │
         │                                                                              │
         │   682 │   │   │   list of `bool`s denoting whether the corresponding generat │
         │       represents "not-safe-for-work"                                         │
         │   683 │   │   │   (nsfw) content, according to the `safety_checker`.         │
         │   684 │   │   """                                                            │
         │ ❱ 685 │   │   return self.__call__(                                          │
         │   686 │   │   │   prompt=prompt,                                             │
         │   687 │   │   │   negative_prompt=negative_prompt,                           │
         │   688 │   │   │   height=height,                                             │
         │                                                                              │
         │ /home/alpha/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py:1 │
         │ 15 in decorate_context                                                       │
         │                                                                              │
         │   112 │   @functools.wraps(func)                                             │
         │   113 │   def decorate_context(*args, **kwargs):                             │
         │   114 │   │   with ctx_factory():                                            │
         │ ❱ 115 │   │   │   return func(*args, **kwargs)                               │
         │   116 │                                                                      │
         │   117 │   return decorate_context                                            │
         │   118                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference/lwp_sd.py:580 in │
         │ __call__                                                                     │
         │                                                                              │
         │   577 │   │   │   │   # call the callback, if provided                       │
         │   578 │   │   │   │   if i % callback_steps == 0:                            │
         │   579 │   │   │   │   │   if callback is not None:                           │
         │ ❱ 580 │   │   │   │   │   │   callback(i, t, latents)  # type: ignore        │
         │   581 │   │   │   │   │   if is_cancelled_callback is not None and is_cancel │
         │   582 │   │   │   │   │   │   return None                                    │
         │   583                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference_callbacks.py:47  │
         │ in txt2img_callback                                                          │
         │                                                                              │
         │    44 def txt2img_callback(step: int, _timestep: int, tensor: torch.Tensor): │
         │    45 │   "Callback for txt2img with progress and partial image"             │
         │    46 │                                                                      │
         │ ❱  47 │   images, send_image = pytorch_callback(step, _timestep, tensor)     │
         │    48 │                                                                      │
         │    49 │   websocket_manager.broadcast_sync(                                  │
         │    50 │   │   data=Data(                                                     │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/inference_callbacks.py:173 │
         │ in pytorch_callback                                                          │
         │                                                                              │
         │   170 │                                                                      │
         │   171 │   if shared.interrupt:                                               │
         │   172 │   │   shared.interrupt = False                                       │
         │ ❱ 173 │   │   raise InferenceInterruptedError                                │
         │   174 │                                                                      │
         │   175 │   shared.current_done_steps += 1                                     │
         │   176 │   send_image: bool = time.time() - last_image_time > config.api.imag │
         ╰──────────────────────────────────────────────────────────────────────────────╯
         InferenceInterruptedError

         During handling of the above exception, another exception occurred:

         ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/uvicorn/protocols/http/h11_impl.py:428 in run_asgi                        │
         │                                                                              │
         │   425 │   # ASGI exception wrapper                                           │
         │   426 │   async def run_asgi(self, app: "ASGI3Application") -> None:         │
         │   427 │   │   try:                                                           │
         │ ❱ 428 │   │   │   result = await app(  # type: ignore[func-returns-value]    │
         │   429 │   │   │   │   self.scope, self.receive, self.send                    │
         │   430 │   │   │   )                                                          │
         │   431 │   │   except BaseException as exc:                                   │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/uvicorn/middleware/proxy_headers.py:78 in __call__                        │
         │                                                                              │
         │   75 │   │   │   │   │   port = 0                                            │
         │   76 │   │   │   │   │   scope["client"] = (host, port)  # type: ignore[arg- │
         │   77 │   │                                                                   │
         │ ❱ 78 │   │   return await self.app(scope, receive, send)                     │
         │   79                                                                         │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/fastapi/applications.py:276 in __call__                                   │
         │                                                                              │
         │   273 │   async def __call__(self, scope: Scope, receive: Receive, send: Sen │
         │   274 │   │   if self.root_path:                                             │
         │   275 │   │   │   scope["root_path"] = self.root_path                        │
         │ ❱ 276 │   │   await super().__call__(scope, receive, send)                   │
         │   277 │                                                                      │
         │   278 │   def add_api_route(                                                 │
         │   279 │   │   self,                                                          │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/applications.py:122 in __call__                                 │
         │                                                                              │
         │   119 │   │   scope["app"] = self                                            │
         │   120 │   │   if self.middleware_stack is None:                              │
         │   121 │   │   │   self.middleware_stack = self.build_middleware_stack()      │
         │ ❱ 122 │   │   await self.middleware_stack(scope, receive, send)              │
         │   123 │                                                                      │
         │   124 │   def on_event(self, event_type: str) -> typing.Callable:  # pragma: │
         │   125 │   │   return self.router.on_event(event_type)                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/errors.py:184 in __call__                            │
         │                                                                              │
         │   181 │   │   │   # We always continue to raise the exception.               │
         │   182 │   │   │   # This allows servers to log the error, or allows test cli │
         │   183 │   │   │   # to optionally raise the error within the test case.      │
         │ ❱ 184 │   │   │   raise exc                                                  │
         │   185 │                                                                      │
         │   186 │   def format_line(                                                   │
         │   187 │   │   self, index: int, line: str, frame_lineno: int, frame_index: i │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/errors.py:162 in __call__                            │
         │                                                                              │
         │   159 │   │   │   await send(message)                                        │
         │   160 │   │                                                                  │
         │   161 │   │   try:                                                           │
         │ ❱ 162 │   │   │   await self.app(scope, receive, _send)                      │
         │   163 │   │   except Exception as exc:                                       │
         │   164 │   │   │   request = Request(scope)                                   │
         │   165 │   │   │   if self.debug:                                             │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/cors.py:92 in __call__                               │
         │                                                                              │
         │    89 │   │   │   await response(scope, receive, send)                       │
         │    90 │   │   │   return                                                     │
         │    91 │   │                                                                  │
         │ ❱  92 │   │   await self.simple_response(scope, receive, send, request_heade │
         │    93 │                                                                      │
         │    94 │   def is_allowed_origin(self, origin: str) -> bool:                  │
         │    95 │   │   if self.allow_all_origins:                                     │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/cors.py:147 in simple_response                       │
         │                                                                              │
         │   144 │   │   self, scope: Scope, receive: Receive, send: Send, request_head │
         │   145 │   ) -> None:                                                         │
         │   146 │   │   send = functools.partial(self.send, send=send, request_headers │
         │ ❱ 147 │   │   await self.app(scope, receive, send)                           │
         │   148 │                                                                      │
         │   149 │   async def send(                                                    │
         │   150 │   │   self, message: Message, send: Send, request_headers: Headers   │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/exceptions.py:79 in __call__                         │
         │                                                                              │
         │    76 │   │   │   │   handler = self._lookup_exception_handler(exc)          │
         │    77 │   │   │                                                              │
         │    78 │   │   │   if handler is None:                                        │
         │ ❱  79 │   │   │   │   raise exc                                              │
         │    80 │   │   │                                                              │
         │    81 │   │   │   if response_started:                                       │
         │    82 │   │   │   │   msg = "Caught handled exception, but response already  │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/exceptions.py:68 in __call__                         │
         │                                                                              │
         │    65 │   │   │   await send(message)                                        │
         │    66 │   │                                                                  │
         │    67 │   │   try:                                                           │
         │ ❱  68 │   │   │   await self.app(scope, receive, sender)                     │
         │    69 │   │   except Exception as exc:                                       │
         │    70 │   │   │   handler = None                                             │
         │    71                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/fastapi/middleware/asyncexitstack.py:21 in __call__                       │
         │                                                                              │
         │   18 │   │   │   │   │   await self.app(scope, receive, send)                │
         │   19 │   │   │   │   except Exception as e:                                  │
         │   20 │   │   │   │   │   dependency_exception = e                            │
         │ ❱ 21 │   │   │   │   │   raise e                                             │
         │   22 │   │   │   if dependency_exception:                                    │
         │   23 │   │   │   │   # This exception was possibly handled by the dependency │
         │   24 │   │   │   │   # still bubble up so that the ServerErrorMiddleware can │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/fastapi/middleware/asyncexitstack.py:18 in __call__                       │
         │                                                                              │
         │   15 │   │   │   async with AsyncExitStack() as stack:                       │
         │   16 │   │   │   │   scope[self.context_name] = stack                        │
         │   17 │   │   │   │   try:                                                    │
         │ ❱ 18 │   │   │   │   │   await self.app(scope, receive, send)                │
         │   19 │   │   │   │   except Exception as e:                                  │
         │   20 │   │   │   │   │   dependency_exception = e                            │
         │   21 │   │   │   │   │   raise e                                             │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/routing.py:718 in __call__                                      │
         │                                                                              │
         │   715 │   │   │   match, child_scope = route.matches(scope)                  │
         │   716 │   │   │   if match == Match.FULL:                                    │
         │   717 │   │   │   │   scope.update(child_scope)                              │
         │ ❱ 718 │   │   │   │   await route.handle(scope, receive, send)               │
         │   719 │   │   │   │   return                                                 │
         │   720 │   │   │   elif match == Match.PARTIAL and partial is None:           │
         │   721 │   │   │   │   partial = route                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/routing.py:276 in handle                                        │
         │                                                                              │
         │   273 │   │   │   │   )                                                      │
         │   274 │   │   │   await response(scope, receive, send)                       │
         │   275 │   │   else:                                                          │
         │ ❱ 276 │   │   │   await self.app(scope, receive, send)                       │
         │   277 │                                                                      │
         │   278 │   def __eq__(self, other: typing.Any) -> bool:                       │
         │   279 │   │   return (                                                       │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/routing.py:66 in app                                            │
         │                                                                              │
         │    63 │   async def app(scope: Scope, receive: Receive, send: Send) -> None: │
         │    64 │   │   request = Request(scope, receive=receive, send=send)           │
         │    65 │   │   if is_coroutine:                                               │
         │ ❱  66 │   │   │   response = await func(request)                             │
         │    67 │   │   else:                                                          │
         │    68 │   │   │   response = await run_in_threadpool(func, request)          │
         │    69 │   │   await response(scope, receive, send)                           │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/fastapi/routing.py:237 in app                                             │
         │                                                                              │
         │    234 │   │   if errors:                                                    │
         │    235 │   │   │   raise RequestValidationError(errors, body=body)           │
         │    236 │   │   else:                                                         │
         │ ❱  237 │   │   │   raw_response = await run_endpoint_function(               │
         │    238 │   │   │   │   dependant=dependant, values=values, is_coroutine=is_c │
         │    239 │   │   │   )                                                         │
         │    240                                                                       │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/fastapi/routing.py:163 in run_endpoint_function                           │
         │                                                                              │
         │    160 │   assert dependant.call is not None, "dependant.call must be a func │
         │    161 │                                                                     │
         │    162 │   if is_coroutine:                                                  │
         │ ❱  163 │   │   return await dependant.call(**values)                         │
         │    164 │   else:                                                             │
         │    165 │   │   return await run_in_threadpool(dependant.call, **values)      │
         │    166                                                                       │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/api/routes/generate.py:35 in    │
         │ txt2img_job                                                                  │
         │                                                                              │
         │    32 │   try:                                                               │
         │    33 │   │   images: Union[List[Image.Image], List[str]]                    │
         │    34 │   │   time: float                                                    │
         │ ❱  35 │   │   images, time = await gpu.generate(job)                         │
         │    36 │   except ModelNotLoadedError:                                        │
         │    37 │   │   raise HTTPException(  # pylint: disable=raise-missing-from     │
         │    38 │   │   │   status_code=400, detail="Model is not loaded"              │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/core/gpu.py:233 in generate     │
         │                                                                              │
         │   230 │   │   │                                                              │
         │   231 │   │   │   return (images, deltatime)                                 │
         │   232 │   │   except InferenceInterruptedError:                              │
         │ ❱ 233 │   │   │   await websocket_manager.broadcast(                         │
         │   234 │   │   │   │   Notification(                                          │
         │   235 │   │   │   │   │   "warning",                                         │
         │   236 │   │   │   │   │   "Inference interrupted",                           │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/api/websockets/manager.py:156   │
         │ in broadcast                                                                 │
         │                                                                              │
         │   153 │   │                                                                  │
         │   154 │   │   for connection in self.active_connections:                     │
         │   155 │   │   │   if connection.application_state.CONNECTED:                 │
         │ ❱ 156 │   │   │   │   await connection.send_json(data.to_json())             │
         │   157 │   │   │   else:                                                      │
         │   158 │   │   │   │   self.active_connections.remove(connection)             │
         │   159                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/websockets.py:173 in send_json                                  │
         │                                                                              │
         │   170 │   │   │   raise RuntimeError('The "mode" argument should be "text" o │
         │   171 │   │   text = json.dumps(data)                                        │
         │   172 │   │   if mode == "text":                                             │
         │ ❱ 173 │   │   │   await self.send({"type": "websocket.send", "text": text})  │
         │   174 │   │   else:                                                          │
         │   175 │   │   │   await self.send({"type": "websocket.send", "bytes": text.e │
         │   176                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/websockets.py:85 in send                                        │
         │                                                                              │
         │    82 │   │   │   │   )                                                      │
         │    83 │   │   │   if message_type == "websocket.close":                      │
         │    84 │   │   │   │   self.application_state = WebSocketState.DISCONNECTED   │
         │ ❱  85 │   │   │   await self._send(message)                                  │
         │    86 │   │   else:                                                          │
         │    87 │   │   │   raise RuntimeError('Cannot call "send" once a close messag │
         │    88                                                                        │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/starlette/middleware/exceptions.py:65 in sender                           │
         │                                                                              │
         │    62 │   │   │                                                              │
         │    63 │   │   │   if message["type"] == "http.response.start":               │
         │    64 │   │   │   │   response_started = True                                │
         │ ❱  65 │   │   │   await send(message)                                        │
         │    66 │   │                                                                  │
         │    67 │   │   try:                                                           │
         │    68 │   │   │   await self.app(scope, receive, sender)                     │
         │                                                                              │
         │ /home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packag │
         │ es/uvicorn/protocols/websockets/websockets_impl.py:345 in asgi_send          │
         │                                                                              │
         │   342 │   │                                                                  │
         │   343 │   │   else:                                                          │
         │   344 │   │   │   msg = "Unexpected ASGI message '%s', after sending 'websoc │
         │ ❱ 345 │   │   │   raise RuntimeError(msg % message_type)                     │
         │   346 │                                                                      │
         │   347 │   async def asgi_receive(                                            │
         │   348 │   │   self,                                                          │
         ╰──────────────────────────────────────────────────────────────────────────────╯
         RuntimeError: Unexpected ASGI message 'websocket.send', after sending
         'websocket.close'.
INFO     22:19:02 | root » Adding job d359deb8-5040-4165-8fee-efee38f60f96 to queue        pytorch.py:605

Installation Method

Local

Branch

Experimental

System Info

CachyOS Arch Linux, RTX 2060 GPU, 4900HS CPU, Python 3.11 (also tested in 3.10).

Logs

No response

Additional context

No response

Validations

Read the docs.
Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
I am writing the issue in English.
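
A note on the traceback above: it ends in `broadcast` in `api/websockets/manager.py`, where `send_json` is called on a socket that has already been closed. The check `connection.application_state.CONNECTED` appears to read the enum member itself rather than compare against it, so it is probably always truthy. Below is a minimal sketch of a more defensive broadcast loop. `State` and `FakeSocket` are hypothetical stand-ins so the snippet runs without Starlette; this is an illustration under those assumptions, not the project's actual fix.

```python
import asyncio
from enum import Enum


class State(Enum):
    # Stand-in for starlette.websockets.WebSocketState (assumption for this sketch).
    CONNECTED = 1
    DISCONNECTED = 2


class FakeSocket:
    """Hypothetical stand-in for a WebSocket connection."""

    def __init__(self, state: State) -> None:
        self.application_state = state
        self.sent: list = []

    async def send_json(self, data) -> None:
        # Mimic the server refusing to send on a closed socket.
        if self.application_state is not State.CONNECTED:
            raise RuntimeError("Unexpected ASGI message 'websocket.send'")
        self.sent.append(data)


async def broadcast(connections: list, data) -> None:
    # Iterate over a copy so that removing stale sockets is safe.
    for connection in list(connections):
        # Compare against the enum member instead of reading it as an attribute.
        if connection.application_state is State.CONNECTED:
            try:
                await connection.send_json(data)
                continue
            except RuntimeError:
                pass  # The socket closed between the check and the send.
        connections.remove(connection)


if __name__ == "__main__":
    live, dead = FakeSocket(State.CONNECTED), FakeSocket(State.DISCONNECTED)
    sockets = [live, dead]
    asyncio.run(broadcast(sockets, {"severity": "warning"}))
    print(len(sockets), live.sent)
```

Catching `RuntimeError` around the send also covers the race where the client disconnects after the state check but before the message goes out.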

Segmentation fault (core dumped) - TRT Inference

I'm not able to run TRT inference locally on my A100 machine.

[11/29/2022-16:37:24] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[11/29/2022-16:37:24] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[11/29/2022-16:37:24] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[11/29/2022-16:37:24] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[11/29/2022-16:37:25] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
51it [00:01, 40.63it/s]
| 0/1 [00:00<?, ?it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.94s/it]
Segmentation fault (core dumped)

Packages Details
accelerate==0.14.0
diffusers==0.9.0
ftfy==6.1.1
nvidia-cublas-cu11==11.11.3.6
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cudnn-cu11==8.6.0.163
onnx==1.12.0
onnxconverter-common==1.13.0
onnxruntime==1.13.1
onnxsim==0.4.10
pycuda==2022.2
spacy==3.4.3
tensorrt==8.5.1.7
thinc==8.1.5
tokenizers==0.13.2
torch==1.13.0+cu116
torchaudio==0.13.0+cu116
torchvision==0.14.0+cu116
transformers==4.24.0
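
For anyone reproducing this, a version list like the one above can be collected with the standard library. This is a minimal sketch: the package names are taken from the list above, and `collect_versions` is a hypothetical helper name, not something from the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the packages listed in this report.
PACKAGES = ["torch", "tensorrt", "onnx", "onnxruntime", "diffusers", "transformers", "pycuda"]


def collect_versions(names):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```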

Error during TRT model build

Running this from the Docker container:

python volta_accelerate.py --build-static-batch --prompt "Forest" --onnx-dir onnx --engine-dir engine --force-onnx-export --backend TRT

I get this error:

  File "volta_accelerate.py", line 741, in <module>
    infer_trt(saving_path=args.output_dir,
  File "volta_accelerate.py", line 670, in infer_trt
    load_trt(model, prompt, img_height, img_width, num_inference_steps)
  File "volta_accelerate.py", line 596, in load_trt
    trt_model.loadEngines(engine_dir, onnx_dir, args.onnx_opset, 
  File "volta_accelerate.py", line 301, in loadEngines
    engine.build(onnx_opt_path, fp16=True, \
  File "/workspace/utilities.py", line 72, in build
    engine = engine_from_network(network_from_onnx_path(onnx_path), config=CreateConfig(fp16=fp16,max_workspace_size=8100654080, profiles=[p],
  File "<string>", line 3, in func_impl
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py", line 42, in __call__
    return self.call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 526, in call_impl
    return engine_from_bytes(super().call_impl)
  File "<string>", line 3, in func_impl
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py", line 42, in __call__
    return self.call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 550, in call_impl
    buffer, owns_buffer = util.invoke_if_callable(self._serialized_engine)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/util/util.py", line 661, in invoke_if_callable
    ret = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 484, in call_impl
    G_LOGGER.critical("Invalid Engine. Please ensure the engine was built correctly")
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/logger/logger.py", line 597, in critical
    raise PolygraphyException(message) from None
polygraphy.exception.exception.PolygraphyException: Invalid Engine. Please ensure the engine was built correctly
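
For context, the traceback shows `max_workspace_size=8100654080` being passed in `/workspace/utilities.py`. The quick conversion below shows how much GPU memory that requests. Whether the workspace size is what makes the engine invalid here is not established by this report; the sketch only does the arithmetic.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return num_bytes / GIB


workspace_bytes = 8100654080  # value taken from the traceback above

print(f"{to_mib(workspace_bytes):.2f} MiB")  # 7725.39 MiB
print(f"{to_gib(workspace_bytes):.2f} GiB")  # 7.54 GiB
```

A workspace of roughly 7.5 GiB leaves little or no headroom on a card with 8 GB of VRAM, so a smaller value may be worth trying on such GPUs.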

Question: FP8 support for Ada and Hopper

Hi,
I'm asking how difficult it would be to add FP8 support to VoltaML once TensorRT supports the new FP8 format on Ada and Hopper.
What performance uplift can we expect? Maybe the 4090 going from 80 it/s to 120 it/s, or close to 2x, up to 160 it/s?
It would also be nice to include current (pre-FP8) VoltaML performance numbers on the H100, in case anyone can get access to one.

Thanks.

clip skip selection?

Is your feature request related to a problem? Please describe.

Other UIs offer this feature.

Describe the solution you'd like

Maybe a slider?

Describe alternatives you've considered

No response

Additional context

No response

Validations

Read the docs.
Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

Docker gets stuck

I get stuck at Volume "voltaml-fast-stable-diffusion_output" Creating

I use Windows 10 64-bit.

Fine-tuned model quality degrades after compiling to TRT

Hi, I found that your code works really well.
Compilation went smoothly, but the quality of my fine-tuned model goes down when I run inference with the compiled TRT engines.
Have you seen quality changes after compiling?
Or does it work fine with your models?

[Bug]: Dead link in Documentation

Describe the bug

The top DOCS link is broken and needs to be updated on this page: https://voltaml.github.io/voltaML-fast-stable-diffusion/

Reproduction

Clicked on DOCS on the main page.

Expected behavior

Should go to: https://voltaml.github.io/voltaML-fast-stable-diffusion/getting-started/introduction

Installation Method

Local

Branch

Experimental

System Info

Windows 11
Python 3.10
Latest build
The issue is not related to the actual AI software, but to the webpage

Logs

No response

Additional context

No response

Validations

Read the docs.
Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
I am writing the issue in English.

[Enhancement] Code Hygiene

The code is excellent from the scientific point of view, and I love how fast models work, but there is room for improvement in code hygiene.

Adding industry-standard linters and formatters to pre-commit and CI/CD would significantly improve code quality (easier to read, 10x faster to modify).

It will take only 20 minutes to add these checks to the code base, but the value they provide would be substantial.
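
As a rough illustration of the idea, here is a minimal sketch of a check script that could be called from a pre-commit hook or a CI job. The choice of `black` and `pylint` and the `api` and `core` directory names are assumptions made for this example, not the project's actual configuration; tools that are not installed are skipped.

```python
import shutil
import subprocess
import sys

# Hypothetical check list: formatter in check-only mode, then a linter.
CHECKS = [
    ["black", "--check", "."],
    ["pylint", "api", "core"],
]


def available(checks):
    """Keep only the commands whose executable is found on PATH."""
    return [cmd for cmd in checks if shutil.which(cmd[0]) is not None]


def run_checks(checks):
    """Run each available check and return how many exited with a non-zero code."""
    failed = 0
    for cmd in available(checks):
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            failed += 1
    return failed


if __name__ == "__main__":
    # A non-zero exit status makes the hook or CI step fail.
    sys.exit(1 if run_checks(CHECKS) else 0)
```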

Quietly fails and closes

I have been testing it to see how it loads memory. I have an RTX 3070 8 GB and wanted to see how much memory it used before failing. However, it has not used any VRAM beyond the 2.4 GB needed to load the model. RAM usage jumps by 6 GB, but then it quietly fails. It barely touches the 48 GB of RAM available.

By quietly fails, I mean that it just doesn't do anything, or say anything.

Example:

if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
A:\Anaconda\envs\voltaml\lib\site-packages\transformers\models\clip\modeling_clip.py:262: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
(voltaml) PS H:\Downloads\voltaML-fast-stable-diffusion-main>

Output directory is empty, no logs were found either.

Windows Native Errors

Describe the bug

AITemplate fails to compile models, complaining that the Makefile is missing. Manually editing the Makefile for Windows-specific directory structures does not work (and has not in the past). The aitemplate module can't be imported unless it is built manually, and even then, like TensorRT, it fails to work on Windows out of the box.

Reproduction

Install the repo as you would on any other system
Manually build AITemplate
Install the AITemplate wheel (because the repo cannot import AITemplate from PyPI)
Attempt to accelerate a model
Receive an error about a missing Makefile directory

Expected behavior

Honestly, I expected this; most speedups are Linux-only (WSL or otherwise).

Branch

Main

System Info

Python: 3.10
OS: Windows 11
Repo: 6c82d05
GPU: RTX 3090
RAM: 48 GB

Additional context

FileNotFoundError: [Errno 2] No such file or directory: "'data\aitemplate\Linaqruf--anything-v3.0__512x512x1\profiler'\Makefile"
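
Note that the path in this error has literal single quotes around the directory part. One plausible explanation (an assumption, not confirmed against the AITemplate source) is that the directory was quoted for a POSIX shell and then joined as a plain path. The sketch below reproduces that string shape using only the standard library; `ntpath` is used so the Windows join rules apply on any OS.

```python
import ntpath  # Windows path rules, importable on any OS
import shlex

# Directory taken from the error message above.
build_dir = "data\\aitemplate\\Linaqruf--anything-v3.0__512x512x1\\profiler"

# shlex.quote targets POSIX shells: a backslash counts as unsafe, so the
# whole string gets wrapped in single quotes.
quoted = shlex.quote(build_dir)

# Joining the quoted string as if it were a plain path keeps the quotes,
# which yields a file name that cannot exist on disk.
makefile_path = ntpath.join(quoted, "Makefile")
print(makefile_path)

# Joining the unquoted directory gives the path that was presumably intended.
print(ntpath.join(build_dir, "Makefile"))
```

If this is what happens, the quoting would be harmless on Linux paths that contain only shell-safe characters, which might be why the problem shows up only on Windows.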

Validations

Read the docs.
Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.

How do you get this to run in automatic1111?

Any guide?

WSL2 + docker + cuda toolkit

I have an 8 GB 2080 and 32 GB of RAM, and I got lots of out-of-memory errors.
In the end I think it succeeded somehow, but it still gave the error below.
I think Docker's CUDA toolkit support is a bit limited at the moment?
Cublas (Could not initialize cublas. Please check CUDA installation.)
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl

/usr/local/lib/python3.8/dist-packages/diffusers/models/resnet.py:39: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/usr/local/lib/python3.8/dist-packages/diffusers/models/resnet.py:52: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if hidden_states.shape[0] >= 64:
/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_condition.py:349: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Generating optimizing model: onnx/unet_fp16.opt.onnx
[I] Folding Constants | Pass 1
[I]     Total Nodes | Original:  8201, After Folding:  5947 |  2254 Nodes Folded
[I] Folding Constants | Pass 2
2022-12-27 21:32:57.943186900 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7900
2022-12-27 21:32:57.943320900 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7729
2022-12-27 21:32:57.943632400 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7969
2022-12-27 21:32:57.943700800 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7574
2022-12-27 21:32:57.943810200 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7427
2022-12-27 21:32:57.944080600 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_7256
...
[I]     Total Nodes | Original:  5947, After Folding:  4536 |  1411 Nodes Folded
[I] Folding Constants | Pass 3
[I]     Total Nodes | Original:  4536, After Folding:  4536 |     0 Nodes Folded
Building TensorRT engine for onnx/unet_fp16.opt.onnx: engine/unet_fp16.plan
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[I]     Configuring with profiles: [Profile().add('sample', min=(2, 4, 64, 64), opt=(2, 4, 64, 64), max=(32, 4, 64, 64)).add('encoder_hidden_states', min=(2, 77, 768), opt=(2, 77, 768), max=(32, 77, 768)).add('timestep', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7725.39 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
	
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Requested amount of GPU memory (8589934592 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[W] Skipping tactic 3 due to insufficient memory on requested size of 8589934592 detected for tactic 0x0000000000000004.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 9 due to insufficient memory on requested size of 8589934592 detected for tactic 0x000000000000003c.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 8 due to insufficient memory on requested size of 8589934592 detected for tactic 0x000000000000003c.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
...
[W] - 24 weights are affected by this issue: Detected subnormal FP16 values.
[I] Finished engine building in 498.222 seconds
[I] Saving engine to engine/unet_fp16.plan
Exception ignored in: <function Engine.__del__ at 0x7fcd429b1550>
Traceback (most recent call last):
  File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 50, in __del__
    [buf.free() for buf in self.buffers.values() if isinstance(buf, cuda.DeviceArray) ]
AttributeError: 'Engine' object has no attribute 'buffers'
Exporting model: onnx/vae.onnx
Downloading:  23%|████████████████████████▍      
/usr/local/lib/python3.8/dist-packages/diffusers/models/vae.py:583: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Generating optimizing model: onnx/vae.opt.onnx
[I] Folding Constants | Pass 1
[I]     Total Nodes | Original:   759, After Folding:   679 |    80 Nodes Folded
[I] Folding Constants | Pass 2
2022-12-27 21:43:47.511248100 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_149
2022-12-27 21:43:47.511309200 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_66
[I]     Total Nodes | Original:   679, After Folding:   675 |     4 Nodes Folded
[I] Folding Constants | Pass 3
[I]     Total Nodes | Original:   675, After Folding:   675 |     0 Nodes Folded
Building TensorRT engine for onnx/vae.opt.onnx: engine/vae.plan
[I]     Configuring with profiles: [Profile().add('latent', min=(1, 4, 64, 64), opt=(1, 4, 64, 64), max=(16, 4, 64, 64))]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7725.39 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
	
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 0 due to insufficient memory on requested size of 8589934592 detected for tactic 0x00000000000003e8.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 1 due to insufficient memory on requested size of 8589934592 detected for tactic 0x00000000000003ea.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[W] Skipping tactic 2 due to insufficient memory on requested size of 8589934592 detected for tactic 0x0000000000000000.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[W] Skipping tactic 12 due to insufficient memory on requested size of 17179869184 detected for tactic 0x994f5b723e2d80da.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[W] Skipping tactic 13 due to insufficient memory on requested size of 17179869184 detected for tactic 0x65d82d184f452332.
    Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[W] Skipping tactic 14 due to insufficient memory on requested size of 17179869184 detected for tactic 0x8d5c64a52fab02c9.
	Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 1: [virtualMemoryBuffer.cpp::resizePhysical::132] Error Code 1: Cuda Driver (invalid argument)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[I] Finished engine building in 221.174 seconds
[I] Saving engine to engine/vae.plan
Exception ignored in: <function Engine.__del__ at 0x7fcd429b1550>
Traceback (most recent call last):
  File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 50, in __del__
    [buf.free() for buf in self.buffers.values() if isinstance(buf, cuda.DeviceArray) ]
AttributeError: 'Engine' object has no attribute 'buffers'
Building TensorRT engine for onnx/clip.opt.onnx: engine/CompVis/stable-diffusion-v1-4/clip.plan
[I]     Configuring with profiles: [Profile().add('input_ids', min=(1, 77), opt=(1, 77), max=(16, 77))]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7725.39 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
[W] - 6 weights are affected by this issue: Detected subnormal FP16 values.
[I] Finished engine building in 49.716 seconds
[I] Saving engine to engine/CompVis/stable-diffusion-v1-4/clip.plan
Building TensorRT engine for onnx/unet_fp16.opt.onnx: engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1719718653
[I]     Configuring with profiles: [Profile().add('sample', min=(2, 4, 64, 64), opt=(2, 4, 64, 64), max=(32, 4, 64, 64)).add('encoder_hidden_states', min=(2, 77, 768), opt=(2, 77, 768), max=(32, 77, 768)).add('timestep', min=[1], opt=[1], max=[1])]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7725.39 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
...
[W] - 22 weights are affected by this issue: Detected subnormal FP16 values.
[I] Finished engine building in 426.725 seconds
[I] Saving engine to engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
Building TensorRT engine for onnx/vae.opt.onnx: engine/CompVis/stable-diffusion-v1-4/vae.plan
[I]     Configuring with profiles: [Profile().add('latent', min=(1, 4, 64, 64), opt=(1, 4, 64, 64), max=(16, 4, 64, 64))]
[I] Building engine with configuration:
    Flags                  | [FP16]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 7725.39 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
...
[I] Finished engine building in 164.962 seconds
[I] Saving engine to engine/CompVis/stable-diffusion-v1-4/vae.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/clip.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/clip.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/vae.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/vae.plan
[E] 1: [defaultAllocator.cpp::allocate::20] Error Code 1: Cuda Runtime (out of memory)
[W] Requested amount of GPU memory (5570036736 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[E] 2: [executionContext.cpp::ExecutionContext::409] Error Code 2: OutOfMemory (no further information)
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 961k/961k [00:00<00:00, 1.03MB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 738kB/s]
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 389/389 [00:00<00:00, 36.6kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 905/905 [00:00<00:00, 582kB/s]
[I] Warming up ..
[I] Running StableDiffusion pipeline
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/clip.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/clip.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/unet_fp16.plan
Loading TensorRT engine: engine/CompVis/stable-diffusion-v1-4/vae.plan
[I] Loading bytes from engine/CompVis/stable-diffusion-v1-4/vae.plan
[E] 1: [wrapper.cpp::CublasWrapper::85] Error Code 1: Cublas (Could not initialize cublas. Please check CUDA installation.)
[E] 1: [engine.cpp::deserialize::867] Error Code 1: Serialization (Serialization assertion postDeserializationCheck() failed.Post deserialization check failure)
[E] 4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)
[!] Could not deserialize engine. See log for details.

172.17.0.1 - - [27/Dec/2022 21:58:48] "POST /voltaml/job HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2548, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2528, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/workspace/voltaML-fast-stable-diffusion/app.py", line 88, in upload_file
    pipeline_time = infer_trt(saving_path=saving_path,
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 678, in infer_trt
    load_trt(saving_path, model, prompt, img_height, img_width, num_inference_steps)
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 599, in load_trt
    trt_model.loadEngines(engine_dir, onnx_dir, args.onnx_opset,
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 309, in loadEngines
    self.engine[model_name].activate()
  File "/workspace/voltaML-fast-stable-diffusion/utilities.py", line 78, in activate
    self.engine = engine_from_bytes(bytes_from_path(self.engine_path))
  File "<string>", line 3, in func_impl
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/base/loader.py", line 42, in __call__
    return self.call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/backend/trt/loader.py", line 564, in call_impl
    G_LOGGER.critical("Could not deserialize engine. See log for details.")
  File "/usr/local/lib/python3.8/dist-packages/polygraphy/logger/logger.py", line 597, in critical
    raise PolygraphyException(message) from None
polygraphy.exception.exception.PolygraphyException: Could not deserialize engine. See log for details.
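
The build log above repeatedly suggests decreasing the workspace size. A minimal sketch of one way to derive a smaller limit from the free GPU memory is shown here. The helper name `workspace_limit_bytes` and the 25% reserve are illustrative assumptions, not part of the repository, and a smaller workspace only targets the skipped-tactic warnings during the build; it may not resolve the later deserialization failure.

```python
def workspace_limit_bytes(free_bytes: int, reserve_fraction: float = 0.25) -> int:
    """Return a workspace budget that leaves part of the free GPU memory unused.

    Hypothetical helper: the 25% default reserve is an arbitrary starting point.
    """
    if free_bytes < 0:
        raise ValueError("free_bytes must be non-negative")
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return int(free_bytes * (1.0 - reserve_fraction))


# Example: 8 GiB of free memory yields a 6 GiB workspace budget.
limit = workspace_limit_bytes(8 * 1024**3)

# With TensorRT 8.4 or newer, the value would be applied to the builder config,
# for example (not executed here, it needs the tensorrt package and a GPU):
#   config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, limit)
```

The free-memory figure could come from `torch.cuda.mem_get_info()`, which returns a `(free, total)` tuple in bytes.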

PyTorch model should not default to SD 1.4

voltaML-fast-stable-diffusion/pytorch_model.py

Line 8 in e865e25

model_name_or_path="CompVis/stable-diffusion-v1-4"

Everything else in the repo references SD 1.5, so PyTorch should use SD 1.5 as well and not 1.4.

This may also explain why the generated images are so different using the 2 methods.

Not a directory when trying to look for TRT models

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2548, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2528, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/workspace/voltaML-fast-stable-diffusion/app.py", line 155, in scan_directory
    tmp2 = os.listdir(os.path.join(trt_model_path,i))
NotADirectoryError: [Errno 20] Not a directory: 'engine/unet_fp16.plan'
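
The traceback shows `os.listdir` being called on `engine/unet_fp16.plan`, a plain file sitting next to the per-model directories. One way to avoid this, sketched below, is to skip entries that are not directories. The function name `list_model_dirs` and the returned dictionary layout are hypothetical; only the `os.listdir`/`os.path.join` pattern comes from the traceback.

```python
import os


def list_model_dirs(trt_model_path):
    """Map each sub-directory of trt_model_path to its sorted contents.

    Plain files (for example a stray unet_fp16.plan) are skipped rather than
    passed to os.listdir, which would raise NotADirectoryError.
    """
    result = {}
    for name in sorted(os.listdir(trt_model_path)):
        full_path = os.path.join(trt_model_path, name)
        if not os.path.isdir(full_path):
            continue  # not a model directory, ignore it
        result[name] = sorted(os.listdir(full_path))
    return result
```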

MPS support

Is your feature request related to a problem? Please describe.

What do you think about adding Apple MPS support? Would it be useful, or even possible, with the voltaML architecture?

Describe the solution you'd like

PyTorch supports an MPS backend for low-level GPU acceleration on Apple Silicon.

Describe alternatives you've considered

No response

Additional context

Device: Macbook M1 Pro Max
OS: MacOS Ventura

Validations

Read the docs.
Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

ENV CUDA_MODULE_LOADING=LAZY

The Docker image is built with CUDA 11.6, but lazy loading is only supported in CUDA 11.7+.
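
A minimal sketch of gating the environment variable on the toolkit version follows. The helper names are hypothetical, and the version string is assumed to be known at image build or startup time; the other option is simply to rebuild the image on CUDA 11.7 or newer.

```python
def parse_cuda_version(version):
    """Turn a string such as '11.6' or '11.7.1' into a (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def lazy_loading_supported(version):
    """Lazy module loading is available from CUDA 11.7 onwards."""
    return parse_cuda_version(version) >= (11, 7)


def module_loading_env(version):
    """Return the environment variables worth setting for this CUDA version."""
    if lazy_loading_supported(version):
        return {"CUDA_MODULE_LOADING": "LAZY"}
    return {}  # setting the variable on older toolkits has no benefit
```

The result could be merged into `os.environ` before CUDA is initialised.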

After failing to load a model (or the correct one) and trying to load a textual inversion (after getting error), it doesn't reset the LOAD button

Discussed in #79

Originally posted by cleverestx May 14, 2023

The button is no longer usable even after re-loading the correct model when it tells you 'X model is not loaded', even when the window is closed and re-opened. Please fix this so the LOAD button is restored when loading the asked-for model (or closing and re-opening this window).

Even when loading it properly with the correct model, you can't unload/remove it? It does not appear in the prompt either...the latter would be nice, and some embeddings are for NEGATIVE prompts...how would that work?

Thank you.

What is the trick?

Is it true that this repo just converts the model to TRT, and that this is how it gets its speed boost?

Is it possible to adapt VoltaML to Stable Diffusion Inpainting model?

I want to try VoltaML on a Stable Diffusion inpainting model. What changes should I make to the conversion scripts for VoltaML to work on inpainting models?

How to delete this?

Hello there. After I installed it and tinkered with it with no results, I found out that my C drive is full because of it. How do I delete it? Why didn't it get installed in the folder I ran it from?

NameError: name 'loaded_model' is not defined, and FileNotFoundError: [Errno 2] No such file or directory: 'onnx/clip.onnx'

Hi,

I'm trying to run accelerated SD 1.5 models and am getting this issue.
Running on Windows 11 (WSL) with an RTX 3070 8GB.

CMD:

docker run --gpus=all -v C:\voltaml\engine/engine:/workspace/voltaML-fast-stable-diffusion/engine -v C:\voltaml\output/engine:/workspace/voltaML-fast-stable-diffusion/static/output -p 5003:5003 -it voltaml/volta_diffusion_webui:v0.2

172.17.0.1 - - [18/Dec/2022 13:15:21] "POST /voltaml/job HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 661, in infer_trt
    if loaded_model!=args.model_path:
NameError: name 'loaded_model' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2548, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2528, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/workspace/voltaML-fast-stable-diffusion/app.py", line 88, in upload_file
    pipeline_time = infer_trt(saving_path=saving_path,
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 664, in infer_trt
    load_trt(saving_path, model, prompt, img_height, img_width, num_inference_steps)
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 599, in load_trt
    trt_model.loadEngines(engine_dir, onnx_dir, args.onnx_opset,
  File "/workspace/voltaML-fast-stable-diffusion/volta_accelerate.py", line 279, in loadEngines
    torch.onnx.export(model,
  File "/usr/local/lib/python3.8/dist-packages/torch/onnx/__init__.py", line 350, in export
    return utils.export(
  File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 163, in export
    _export(
  File "/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py", line 1148, in _export
    with torch.serialization._open_file_like(f, "wb") as opened_file:
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'onnx/clip.onnx'
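
Both exceptions in the traceback have a familiar shape: a name that is read before it has been assigned, and a file opened inside a directory that may not exist yet. A minimal sketch of guarding against each is given below, assuming `loaded_model` is meant to be module-level state and `onnx` is a relative output directory. The function names are hypothetical and this is not the repository's actual fix.

```python
import os

# Module-level state: defined up front so the first comparison cannot raise
# NameError. None means "no model has been loaded yet".
loaded_model = None


def needs_reload(requested_model_path):
    """True when the requested model differs from the one currently loaded."""
    return loaded_model != requested_model_path


def mark_loaded(model_path):
    """Record which model is now loaded."""
    global loaded_model
    loaded_model = model_path


def onnx_export_path(onnx_dir, model_name):
    """Create onnx_dir if needed and return the path for <model_name>.onnx."""
    os.makedirs(onnx_dir, exist_ok=True)
    return os.path.join(onnx_dir, model_name + ".onnx")
```

Calling `onnx_export_path("onnx", "clip")` before the export would ensure the target directory exists when the file is opened for writing.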

[Bug]: Image to image always outputs images with 512X512 resolution

Describe the bug

Image to image always outputs images with 512X512 resolution

Reproduction

build latest branch

Expected behavior

The output resolution should follow the configured settings.

Installation Method

Docker

Branch

Experimental

System Info

building with dockerfile

Logs

No response

Additional context

No response

Validations

Read the docs.
Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
I am writing the issue in English.

mount directory of Local models already downloaded and allow civitai downloads

I want to compare this to a1111, and I have a ton of models downloaded, but I see no way to mount a model directory so I can use them.
I also often prefer Civitai to Hugging Face.

Also, if the TensorRT conversion is done locally, I'd like to save the results and be able to reuse them. Again, via an externally mounted directory?

It seems that AIT doesn't support dynamic shapes?

Is your feature request related to a problem? Please describe.

When I change the output size of txt2img, errors like the ones below occur (I built the AIT model with runwayml--stable-diffusion-v1-5__512x512x1):

// case 1 output size is 64 * 64
...
[18:16:17] model_interface.cu:210: Error: [SetValue] Dimension got value out of bounds; expected value to be in [32, 64], but got 8.

// case 2 output size is 512 * 256
...
  File "/home/quchenxi/test/fastsd/core/aitemplate/src/ait_txt2img.py", line 429, in __call__
    latents = self.scheduler.step(
  File "/home/quchenxi/test/TensorRT/dm_engine/lib/python3.10/site-packages/diffusers/schedulers/scheduling_ddim.py", line 326, in step
    pred_original_sample = (sample - beta_prod_t ** (0.5) * model_output) / alpha_prod_t ** (0.5)
RuntimeError: The size of tensor a (32) must match the size of tensor b (64) at non-singleton dimension 3

Describe the solution you'd like

Maybe AITStableDiffusionPipeline needs some work before the engines are built, as with TRT.

Describe alternatives you've considered

No response

Additional context

Do you plan to support dynamic shapes on AIT? Or can you share some ideas about how to implement it?
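
The out-of-bounds message in case 1 is consistent with the latent being one eighth of the pixel size: a 64-pixel side gives a latent side of 8, outside the reported range [32, 64]. A minimal sketch of checking a request against that range before running is shown below. The factor of 8 is the usual Stable Diffusion VAE downscale, the bounds are copied from the error message, and the function names are hypothetical. The check does not explain case 2, where both latent sides fall inside the range yet the scheduler step still sees mismatched shapes.

```python
VAE_DOWNSCALE = 8  # pixels per latent cell along each side


def latent_size(height, width):
    """Latent (height, width) for a requested pixel size."""
    return height // VAE_DOWNSCALE, width // VAE_DOWNSCALE


def fits_compiled_range(height, width, low=32, high=64):
    """True if both latent sides lie within the range the module accepts.

    The default bounds are taken from the error message, not from AITemplate.
    """
    return all(low <= side <= high for side in latent_size(height, width))
```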

Validations

Read the docs.
Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

Converted models produce same output

I've converted SD 1.5, SD 2.1, and my merge https://huggingface.co/Magistr/Magmix to TRT and ran this prompt against them:
(18 years young girl:1.4),detailed face and eyes,green eyes,female focus,silver hair,short messy hair,small breasts,flat chest,(blue sneakers:1.2),(black bike shorts:1.2),full body,fooling around,standing, wolf years, wolf tail, animal tail, grey croptop,red jacket, riverside, ruins, old bridge,(yellow socks:1.2),

I got the same images as output from all 3 TRT models.

Does this project support the acceleration of Lora and ControlNet?

If not, does anyone know how to implement it? Or can it not be done with TensorRT?

CLIP model in models.py cannot be changed

Hi,

In the models.py file, the CLIP class can only load a default CLIP model (I have copied the current code below). Shouldn't it be self.model_path instead of "openai/clip-vit-large-patch14"? VAE and UNET are set to the correct self.model_path.

class CLIP(BaseModel):
    def get_model(self):
        return CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(self.device)

Thanks !
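
A minimal sketch of the suggested change follows, assuming the checkpoint uses the diffusers layout, in which the text encoder lives in a `text_encoder` subfolder. The helper name `text_encoder_source` is hypothetical; it only decides what to pass to `CLIPTextModel.from_pretrained`, so it can be checked without downloading any weights.

```python
DEFAULT_CLIP = "openai/clip-vit-large-patch14"


def text_encoder_source(model_path=None):
    """Return (name_or_path, kwargs) for CLIPTextModel.from_pretrained.

    Assumption: model_path points at a diffusers-style checkpoint whose text
    encoder is stored under the 'text_encoder' subfolder.
    """
    if model_path:
        return model_path, {"subfolder": "text_encoder"}
    return DEFAULT_CLIP, {}


# Intended use inside get_model (not executed here, needs transformers):
#   source, kwargs = text_encoder_source(self.model_path)
#   return CLIPTextModel.from_pretrained(source, **kwargs).to(self.device)
```

Falling back to the default keeps the current behaviour when no model path is set.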

Image to image and ControlNet don't work when using an AIT model

INFO: 172.18.22.48:57122 - "POST /api/generate/img2img HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/app/core/cluster.py", line 143, in generate
    return await best_gpu.generate(job)
  File "/app/core/gpu.py", line 135, in generate
    raise err
  File "/app/core/gpu.py", line 127, in generate
    images = await run_in_thread_async(func=generate_thread_call, args=(job,))
  File "/app/core/utils.py", line 77, in run_in_thread_async
    raise exc
  File "/app/core/thread.py", line 45, in run
    self._return = target(*self._args, **self._kwargs) # type: ignore
  File "/app/core/gpu.py", line 93, in generate_thread_call
    images: List[Image.Image] = model.generate(job)
  File "/app/core/inference/aitemplate.py", line 102, in generate
    images = self.img2img(job)
  File "/app/core/inference/aitemplate.py", line 176, in img2img
    pipe = StableDiffusionImg2ImgAITPipeline(
  File "/app/core/aitemplate/src/ait_img2img.py", line 111, in __init__
    self.clip_ait_exe = self.init_ait_module(
  File "/app/core/aitemplate/src/ait_img2img.py", line 137, in init_ait_module
    mod = Model(os.path.join(workdir, model_name, "test.so"))
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 213, in __init__
    self.DLL = self._DLLWrapper(lib_path)
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 164, in __init__
    self.DLL = ctypes.cdll.LoadLibrary(lib_path)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: tmp/CLIPTextModel/test.so: cannot open shared object file: No such file or directory
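
The failure is raised only when `ctypes` tries to open `tmp/CLIPTextModel/test.so`, deep inside pipeline construction. A minimal sketch of checking for the compiled libraries up front, so the error can name every missing file at once, is shown below. The function name `missing_ait_modules` is hypothetical; only `CLIPTextModel` and `test.so` appear in the log, and any other module name passed in is the caller's assumption.

```python
import os


def missing_ait_modules(workdir, module_names, lib_name="test.so"):
    """Return the expected library paths that do not exist under workdir."""
    missing = []
    for name in module_names:
        lib_path = os.path.join(workdir, name, lib_name)
        if not os.path.isfile(lib_path):
            missing.append(lib_path)
    return missing
```

A caller could raise a clear error when the returned list is non-empty instead of letting the pipeline constructor fail.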

Benchmark is not implemented in 'PT' mode

It probably should follow the logic in TRT mode so people can make easy comparisons.

Why is compute capability >= 7.5 required and 7.0 not supported?

I have a Titan V, which is compute capability 7.0. Is there any reason why V100-class cards are not supported?
I'm curious, since this project is called "Volta".
Which specific compute capability 7.5 features are needed?
Thanks.

python3 volta_accelerate.py --onnx_trt=trt "$@"

Sorry for bothering you. I am not much of a coder. It seems I get closer using your Docker image, but I still get a bunch of errors when I try to tune a model with it. The last error I got was:

optimize.sh: line 1: 4939 Segmentation fault python3 volta_accelerate.py --onnx_trt=onnx "$@"
optimize.sh: line 2: 4959 Segmentation fault python3 volta_accelerate.py --onnx_trt=trt "$@"

Any idea what I can do to get it to work?

Add Multiple ControlNets support

Is your feature request related to a problem? Please describe.

It would be nice if voltaML supported Multi-ControlNet.

Describe the solution you'd like

https://github.com/huggingface/diffusers/releases

Describe alternatives you've considered

No response

Additional context

No response

Validations

Read the docs.
Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

Bug in Frontend View

Describe the bug

frontend\src\views\TextToImageView.vue

The width and height are swapped, so when generating an image, the frontend sends the width value as the height, and vice versa.

Reproduction

Generate any image with different width and height, and see how the frontend calls the api with the values reversed

Expected behavior

The frontend should send the correct values from the width and height sliders

Branch

Experimental

System Info

Experimental branch

Additional context

No response

Validations

Read the docs.
Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.

[12/10/2022-22:18:01] [TRT] [E] 10: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_9398 + (Unnamed Layer* 6862) [Shuffle].../3/0_2/Reshape_1 + /3/0_2/Transpose_1]}.) [12/10/2022-22:18:01] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

python3 volta_accelerate.py --onnx_trt=trt

GPU: GTX 2080Ti

Running from the docker image: voltaml/volta_diffusion:v0.2

[12/10/2022-22:18:01] [TRT] [E] 10: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_9398 + (Unnamed Layer* 6862) [Shuffle].../3/0_2/Reshape_1 + /3/0_2/Transpose_1]}.)
[12/10/2022-22:18:01] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

[Bug]: AIT Acceleration doesn't work

Describe the bug

AIT acceleration doesn't work: I have already waited 40 minutes and it is stuck on the UNet tab. If I run the AIT scripts directly with Python, everything works. Task Manager shows no activity after 8 minutes.

Reproduction

Set Acceleration to 512x512, 1 batch, 24 CPU threads. Model stable-diffusion-v1-5.

Expected behavior

Model acceleration

Installation Method

Local

Branch

Main

System Info

Latest version of main branch. Windows, 5900X, 128 GB RAM, 3090 Ti.

Logs

2023-05-01 03:51:20,361 INFO <aitemplate.backend.profiler_cache> Ignore repeat profile_record:
SELECT algo, workspace, split_k
FROM cuda_gemm_3
WHERE
dtype_a=14 AND
dtype_b=14 AND
dtype_c=14 AND
dtype_acc=14 AND
major_a=2 AND
major_b=1 AND
major_c=2 AND
op_type='gemm_rcr_permute' AND
device='80' AND
epilogue=1 AND
pshape='64_1_8' AND
exec_entry_sha1='d9106d7291c48fc10faca140108b9deb185eed00';
2023-05-01 03:51:20,361 INFO <aitemplate.compiler.ops.gemm_universal.gemm_common> Profiler (gemm_rcr_bias_fast_gelu_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3 M == 128 && N == 5120 && K == 1280) selected kernel: best_algo='cutlass_tensorop_h16816gemm_64x128_32x6_tn_align_8_8' workspace=0 split_k=1
2023-05-01 03:51:20,361 INFO <aitemplate.compiler.ops.gemm_universal.gemm_common> Profiler (gemm_rcr_bias_mul_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3 M == 128 && N == 5120 && K == 1280) selected kernel: best_algo='cutlass_tensorop_h16816gemm_64x128_32x6_tn_align_8_8' workspace=0 split_k=1
2023-05-01 03:51:20,361 INFO <aitemplate.compiler.ops.gemm_universal.gemm_common> Profiler (gemm_rcr_bias_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3 M == 128 && N == 1280 && K == 5120) selected kernel: best_algo='cutlass_tensorop_h16816gemm_64x64_64x5_tn_align_8_8' workspace=0 split_k=1
2023-05-01 03:51:20,361 INFO <aitemplate.compiler.transform.profile> ran 75 profilers elapsed time: 0:00:07.081708
2023-05-01 03:51:20,374 INFO <aitemplate.backend.codegen> generated 1 function srcs
2023-05-01 03:51:20,385 INFO <aitemplate.compiler.compiler> folded constants elapsed time: 0:00:00.021329
2023-05-01 03:51:21,261 INFO <aitemplate.backend.codegen> generated 199 function srcs
2023-05-01 03:51:23,263 INFO <aitemplate.backend.codegen> generated 7 library srcs
2023-05-01 03:51:23,264 INFO <aitemplate.backend.builder> Using 24 CPU for building

Additional context

No response

Validations

Read the docs.
Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
I am writing the issue in English.

[Feature]: Add ControlNet support for SD2.0 models and AIT acceleration capability

Is your feature request related to a problem? Please describe.

When I try to compile AIT for an SD2.0 model, there is an error in the second step. I think it is not supported yet.

Describe the solution you'd like

https://huggingface.co/thibaud/controlnet-sd21

Describe alternatives you've considered

No response

Additional context

No response

Validations

Read the docs.
Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

[Feature]: Tesla A40 support?

Is your feature request related to a problem? Please describe.

· Graphics card for AITemplate: RTX 40xx, RTX 30xx, H100, A100, A10, A30, V100, T4
It seems that the Tesla A40 is not supported?

Describe the solution you'd like

I have a Tesla A40, and it seems not to be on the supported list. Can I use it normally, or do I need to wait for support?

Describe alternatives you've considered

No response

Additional context

No response

Validations

Read the docs.
Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

Obvious Output Discrepancy between PyTorch and AITemplate inference

Describe the bug

Description

The output discrepancy between PyTorch and AITemplate inference is quite obvious.

Across our test cases, AITemplate produces lower-quality results on average, especially for human faces.

Reproduction

Model:
chilloutmix-ni-pruned-fp16-fix

Prompt:

brown hair, 1girl, solo, hand on the hip, dress, looking at viewer, smile, street

Negative Prompt:

(worst quality low quality:1.4)

Parameter	Value
Height	512
Width	512
Sampler	DPMSolverMultiStep
CFG	7
Batch Count	4
Batch Size	1
Seed	1191535362

PyTorch Results

AITemplate Results

Expected behavior

PyTorch and AITemplate should produce similar results and quality.
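To turn "similar results" into a number, the two backends could be run with the same seed and the outputs compared pixel by pixel. The sketch below computes PSNR over two flat sequences of 8-bit values (for example, the bytes returned by Pillow's `Image.tobytes()`). It is a generic metric, not something the project ships, and what PSNR counts as "close enough" is left open.

```python
import math
from typing import Sequence


def psnr(a: Sequence[int], b: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel buffers."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all channel values.
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical buffers
    return 10 * math.log10(max_value ** 2 / mse)
```

Higher values mean the images are closer. Small differences are expected between backends because of different kernels and half-precision arithmetic, so this only helps to rank how far apart two outputs are.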

Branch

Experimental

System Info

OS: Debian 11
GPU: Nvidia L4
CUDA: 12.1

Additional context

It might be related to facebookincubator/AITemplate#141

Validations

Read the docs.
Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.

Tutorial for benchmark

Could you please add a tutorial on how to run the benchmark?

Batch size of 8 acceleration with AI Template does not work

Running latest image

Running inference fails with this exception:

2023-03-18T18:54:45.876370659Z   File "/app/api/routes/generate.py", line 31, in txt2img_job
2023-03-18T18:54:45.876372038Z     images, time = await cluster.generate(job)
2023-03-18T18:54:45.876373180Z   File "/app/core/cluster.py", line 164, in generate
2023-03-18T18:54:45.876374433Z     raise e
2023-03-18T18:54:45.876375506Z   File "/app/core/cluster.py", line 143, in generate
2023-03-18T18:54:45.876376675Z     return await best_gpu.generate(job)
2023-03-18T18:54:45.876377823Z   File "/app/core/gpu.py", line 135, in generate
2023-03-18T18:54:45.876379035Z     raise err
2023-03-18T18:54:45.876380137Z   File "/app/core/gpu.py", line 127, in generate
2023-03-18T18:54:45.876381353Z     images = await run_in_thread_async(func=generate_thread_call, args=(job,))
2023-03-18T18:54:45.876382550Z   File "/app/core/utils.py", line 77, in run_in_thread_async
2023-03-18T18:54:45.876383781Z     raise exc
2023-03-18T18:54:45.876384916Z   File "/app/core/thread.py", line 45, in run
2023-03-18T18:54:45.876386300Z     self._return = target(*self._args, **self._kwargs)  # type: ignore
2023-03-18T18:54:45.876387628Z   File "/app/core/gpu.py", line 93, in generate_thread_call
2023-03-18T18:54:45.876388817Z     images: List[Image.Image] = model.generate(job)
2023-03-18T18:54:45.876390019Z   File "/app/core/inference/aitemplate.py", line 103, in generate
2023-03-18T18:54:45.876391269Z     images = self.txt2img(job)
2023-03-18T18:54:45.876392395Z   File "/app/core/inference/aitemplate.py", line 142, in txt2img
2023-03-18T18:54:45.876393651Z     data = pipe(
2023-03-18T18:54:45.876394809Z   File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2023-03-18T18:54:45.876396087Z     return func(*args, **kwargs)
2023-03-18T18:54:45.876397224Z   File "/app/core/aitemplate/src/ait_txt2img.py", line 285, in __call__
2023-03-18T18:54:45.876398387Z     text_embeddings = self.clip_inference(text_input.input_ids.to(self.device))
2023-03-18T18:54:45.876399602Z   File "/app/core/aitemplate/src/ait_txt2img.py", line 170, in clip_inference
2023-03-18T18:54:45.876400929Z     exe_module.run_with_tensors(inputs, ys, graph_mode=False)
2023-03-18T18:54:45.876402103Z   File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 535, in run_with_tensors
2023-03-18T18:54:45.876403418Z     outputs_ait = self.run(
2023-03-18T18:54:45.876404582Z   File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 438, in run
2023-03-18T18:54:45.876406137Z     return self._run_impl(
2023-03-18T18:54:45.876407297Z   File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 377, in _run_impl
2023-03-18T18:54:45.876408540Z     self.DLL.AITemplateModelContainerRun(
2023-03-18T18:54:45.876410860Z   File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 181, in _wrapped_func
2023-03-18T18:54:45.876412236Z     raise RuntimeError(f"Error in function: {method.__name__}")
2023-03-18T18:54:45.876413454Z RuntimeError: Error in function: AITemplateModelContainerRun

Error on optimization step

[12/09/2022-20:25:24] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[12/09/2022-20:25:24] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[12/09/2022-20:25:24] [TRT] [W] Requested amount of GPU memory (17179869184 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[12/09/2022-20:25:25] [TRT] [W] Skipping tactic 8 due to insufficient memory on requested size of 17179869184 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().

How to fix this? I'm testing on T4 GPU with 15.1GB VRAM.
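The log shows TensorRT asking for 17179869184 bytes (16 GiB) of workspace, which is more than the T4 has. Following the hint in the last log line, the workspace pool could be capped below the free VRAM. The sketch below assumes a TensorRT version whose Python API exposes `IBuilderConfig.set_memory_pool_limit` (8.4 or later); the helper names and the 50% fraction are arbitrary choices for illustration, and where the builder config is created inside volta_accelerate.py is not shown here.

```python
GIB = 1 << 30


def workspace_limit_bytes(free_bytes: int, fraction: float = 0.5) -> int:
    """Pick a workspace cap as a fraction of the free GPU memory."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(free_bytes * fraction)


def apply_workspace_limit(config, limit_bytes: int) -> None:
    """Cap the builder workspace; `config` is a TensorRT IBuilderConfig."""
    # Imported lazily so the helper above can be used without TensorRT installed.
    import tensorrt as trt

    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, limit_bytes)


requested = 17179869184  # value from the log above
assert requested == 16 * GIB
```

With roughly 15 GiB available, `workspace_limit_bytes(15 * GIB)` gives about 7.5 GiB, which leaves room for the model weights and activations during the build.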

Does this project support the acceleration of AITemplate pipeline?

Is your feature request related to a problem? Please describe.

Does this project support accelerating LoRA in the AITemplate pipeline?

Describe the solution you'd like

Maybe it needs to "inject" or "swap" the LoRA weights into the already compiled unet.so. In TensorRT, the Refit API can be used to patch or update the engine's weights at runtime.
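Whichever mechanism is used to push weights into the compiled module, the values to push are the base weights with the LoRA update folded in: W' = W + scale * (up x down). The sketch below shows only that arithmetic in plain Python on nested lists. The function names are hypothetical, and it does not touch AITemplate or TensorRT; how to load the merged matrices into unet.so is exactly the open question of this request.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (m x r) matrix by an (r x n) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, up: Matrix, down: Matrix, scale: float = 1.0) -> Matrix:
    """Return weight + scale * (up @ down) without modifying the inputs."""
    delta = matmul(up, down)  # low-rank update with the same shape as weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
```

Removing a LoRA is the same operation with the sign of `scale` flipped, which is why keeping a copy of the original weights (or of the applied delta) matters if models are swapped at runtime.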

Describe alternatives you've considered

No response

Additional context

Or can anyone share some ideas on how to implement it?

Validations

Read the docs.
Check that there isn't already an issue that asks for the same feature to avoid creating a duplicate.

Unable to run the voltaml/volta_diffusion:v0.1 docker image

-> % sudo docker run -it --gpus all voltaml/volta_diffusion:v0.1 bash
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/e049fdb3bc56fecdeefb3b950034cbc757eeb166b152330d00ef6e8a2972af06/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
ERRO[0000] error waiting for container: context canceled

This is probably because, when --gpus=all is specified, the Docker engine tries to mount the NVIDIA and CUDA libraries into the container. But some of the files in the image (e.g. /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1) are actually links rather than regular files, so the mount fails.
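One way to check this guess is to list which NVIDIA libraries in the image's filesystem are symlinks. The sketch below is a small helper for that; `find_symlinks` is a hypothetical name, and it would have to be run against the image contents (for instance in a container started without --gpus), which is an assumption about how the image can be inspected.

```python
import os
from typing import List


def find_symlinks(directory: str, prefix: str = "libnvidia") -> List[str]:
    """Return paths in `directory` that start with `prefix` and are symlinks."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink is true for the link itself, even if its target is missing.
        if name.startswith(prefix) and os.path.islink(path):
            found.append(path)
    return found
```

Calling `find_symlinks("/usr/lib/x86_64-linux-gnu")` inside the image would show whether libnvidia-ml.so.1 is among the links that could collide with the files the NVIDIA runtime tries to mount.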

Could you please open-source the Dockerfile as well?
