Tiled Diffusion & VAE for ComfyUI

Check out the SD-WebUI extension for more information.

This extension enables large image drawing & upscaling with limited VRAM via the following techniques:

  • Reproduced SOTA Tiled Diffusion methods
  • pkuliyi2015 & Kahsolt's Tiled VAE algorithm
  • pkuliyi2015 & Kahsolt's Tiled Noise Inversion method

Note

Sizes and dimensions are specified in pixels and are then converted to latent-space sizes.

Features

  • Supported models
    • SD1.x, SD2.x, SDXL, SD3
    • FLUX
  • ControlNet support
  • StableSR support
  • Tiled Noise Inversion
  • Tiled VAE
  • Regional Prompt Control
  • Img2img upscale
  • Ultra-Large image generation

Tiled Diffusion

Tip

  • Set tile_overlap to 0 and denoise to 1 to see the tile seams and then adjust the options to your needs.
  • Increase tile_batch_size to increase speed (if your machine can handle it).
  • Use the colorfix node if your colors look off.

Options

Name             Description
method           Tiling strategy.
tile_width       Tile width, in pixels.
tile_height      Tile height, in pixels.
tile_overlap     Overlap between adjacent tiles, in pixels.
tile_batch_size  Number of tiles to process in a batch.
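
To make the interaction of tile_width, tile_height, and tile_overlap more concrete, here is a minimal Python sketch of how a canvas could be split into overlapping tiles. It is not the extension's actual code: the function names, the even-spacing rule, and the downscale factor of 8 between pixels and latent cells are assumptions made for illustration.

```python
import math

LATENT_FACTOR = 8  # assumed pixel-to-latent downscale factor (hypothetical constant)


def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of tiles covering `length` cells with at least `overlap` shared cells."""
    if tile >= length:
        return [0]  # a single tile already covers the whole axis
    overlap = max(0, min(overlap, tile - 1))  # keep the stride at least 1
    count = math.ceil((length - overlap) / (tile - overlap))
    step = (length - tile) / (count - 1)  # spread tiles so the last one ends at `length`
    return [round(i * step) for i in range(count)]


def tile_grid(width_px, height_px, tile_width, tile_height, tile_overlap):
    """Return (x, y, w, h) boxes in latent cells for pixel-sized inputs."""
    w, h = width_px // LATENT_FACTOR, height_px // LATENT_FACTOR
    tw = min(tile_width // LATENT_FACTOR, w)
    th = min(tile_height // LATENT_FACTOR, h)
    ov = tile_overlap // LATENT_FACTOR
    return [(x, y, tw, th)
            for y in tile_starts(h, th, ov)
            for x in tile_starts(w, tw, ov)]
```

Under these assumptions, a 1024x1024 image with 512-pixel tiles and a 64-pixel overlap yields a 3x3 grid of 64x64 latent tiles, while the same image with tile_overlap set to 0 yields a 2x2 grid.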

How can I specify the tiles' arrangement?

If you have the Math Expression node (or something similar), you can feed it the same image or latent that goes into your KSampler and divide its width/height by the number of columns/rows you want. The results are the tile_width/tile_height to use.

C = number of columns you want
R = number of rows you want

pixel width of input image or latent // C = tile_width
pixel height of input image or latent // R = tile_height
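
The same arithmetic as a small Python sketch, for anyone computing the values outside of a node graph. The function name and its parameters are hypothetical and are not part of the extension.

```python
def tile_size_for_grid(width_px: int, height_px: int, columns: int, rows: int) -> tuple[int, int]:
    """Return (tile_width, tile_height) in pixels for a columns x rows arrangement."""
    if columns < 1 or rows < 1:
        raise ValueError("columns and rows must be at least 1")
    # Floor division, matching the `//` in the formulas above.
    return width_px // columns, height_px // rows


print(tile_size_for_grid(2048, 1536, columns=4, rows=3))  # (512, 512)
```

Floor division rounds down when the image size is not an exact multiple of C or R, so it is worth checking the result against the image size in that case.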

SpotDiffusion

Paper

A tiling algorithm that attempts to eliminate seams by randomly shifting the denoising window at each timestep. It is mainly useful for fast inference with tile_overlap set to 0; otherwise, it's better to stick with the other tiling strategies, as they produce better outputs.

This additional feature is experimental, in testing, and subject to change.
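
The shifting idea described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the extension's or the paper's implementation: the latent is a 2D list of numbers, the shift wraps around the edges, and `denoise_tile` is a hypothetical stand-in for one denoising call on a single tile.

```python
import random


def roll2d(grid, dy, dx):
    """Cyclically shift a 2D list by dy rows and dx columns (wrap-around)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def denoise_step_shifted(latent, tile, denoise_tile, rng):
    """Run one step over non-overlapping tiles placed at a random offset."""
    h, w = len(latent), len(latent[0])
    dy, dx = rng.randrange(tile), rng.randrange(tile)  # new offset for this step
    shifted = roll2d(latent, dy, dx)
    out = [row[:] for row in shifted]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            patch = [row[x0:x0 + tile] for row in shifted[y0:y0 + tile]]
            result = denoise_tile(patch)
            for i, row in enumerate(result):
                out[y0 + i][x0:x0 + len(row)] = row
    return roll2d(out, -dy, -dx)  # undo the shift so the canvas stays aligned


# Usage: a dummy "denoiser" that halves every value in its tile.
rng = random.Random(0)
latent = [[float(x + 10 * y) for x in range(6)] for y in range(4)]
stepped = denoise_step_shifted(latent, 2, lambda p: [[v / 2 for v in r] for r in p], rng)
```

Because the tile borders land in a different place at every step, no fixed seam position accumulates over the whole sampling run, which is why the method can work without any overlap.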

Tiled VAE

When the node is created, it suggests recommended tile sizes based on the available VRAM.

Note

Enabling fast for the decoder may produce images with slightly higher contrast and brightness.

Options

Name Description
tile_size
The image is split into tiles, which are then padded with 11 pixels in the decoder and 32 pixels in the encoder.
fast

When Fast Mode is disabled:

  1. The original VAE forward is decomposed into a task queue and a task worker, which starts to process each tile.
  2. When a GroupNorm layer is reached, the worker suspends, stores the current GroupNorm mean and variance, sends everything to RAM, and moves on to the next tile.
  3. After the GroupNorm means and variances of all tiles have been summarized, it applies group norm to the tiles and continues.
  4. A zigzag execution order is used to reduce unnecessary data transfer.
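
As a rough illustration of how per-tile GroupNorm statistics can be summarized into statistics for the whole image, here is a minimal Python sketch. It shows one statistically exact way to merge per-tile means and variances; whether the extension combines them in exactly this way is an assumption, and the function names are hypothetical.

```python
from statistics import fmean, pvariance


def tile_stats(values):
    """Return (count, mean, population variance) for one tile's values."""
    return len(values), fmean(values), pvariance(values)


def merge_stats(stats):
    """Merge per-tile (count, mean, variance) triples into a global mean and variance."""
    total = sum(n for n, _, _ in stats)
    mean = sum(n * m for n, m, _ in stats) / total
    # E[x^2] per tile is var + mean^2; average it, then subtract the global mean squared.
    second_moment = sum(n * (v + m * m) for n, m, v in stats) / total
    return mean, second_moment - mean * mean


tiles = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
mean, var = merge_stats([tile_stats(t) for t in tiles])
```

The merged values match what would be computed over all seven values at once, so each tile can be normalized as if the image had never been split.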

When Fast Mode is enabled:

  1. The original input is downsampled and passed to a separate task queue.
  2. Its group norm parameters are recorded and used by all tiles' task queues.
  3. Each tile is separately processed without any RAM-VRAM data transfer.
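
The fast-mode steps can be sketched along the same lines. This is a simplified illustration and not the extension's code: it uses strided subsampling as the downsampling step, treats the image as a single 2D channel, and all function names are hypothetical.

```python
from statistics import fmean, pvariance


def downsample(grid, factor):
    """Keep every `factor`-th row and column of a 2D list."""
    return [row[::factor] for row in grid[::factor]]


def estimate_stats(grid, factor):
    """Estimate mean and variance from a downsampled copy of the input."""
    flat = [v for row in downsample(grid, factor) for v in row]
    return fmean(flat), pvariance(flat)


def normalize_tile(tile, mean, var, eps=1e-6):
    """Normalize one tile with shared statistics, independently of other tiles."""
    scale = (var + eps) ** -0.5
    return [[(v - mean) * scale for v in row] for row in tile]


image = [[float(x + y) for x in range(8)] for y in range(8)]
mean, var = estimate_stats(image, 2)  # computed once, reused by every tile
top_left = normalize_tile([row[:4] for row in image[:4]], mean, var)
```

Because the statistics come from a reduced copy of the input, they only approximate the full-image values, which is consistent with the note above that fast mode may slightly change contrast and brightness.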

After all tiles are processed, tiles are written to a result buffer and returned.

color_fix
Only estimate GroupNorm before downsampling, i.e., run in a semi-fast mode.

Only for the encoder. Can restore colors if tiles are too small.

Workflows

The following images can be loaded in ComfyUI.

ComfyUI_07501_

Simple upscale.


ComfyUI_07503_

4x upscale. 3 passes.

License

Great thanks to all the contributors! 🎉🎉🎉
The MultiDiffusion, Mixture of Diffusers, and Tiled VAE code is currently licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, since it was borrowed from the wonderful SD-WebUI extension. Everything else is licensed under GPLv3.

Citation

@article{jimenez2023mixtureofdiffusers,
  title={Mixture of Diffusers for scene composition and high resolution image generation},
  author={Álvaro Barbero Jiménez},
  journal={arXiv preprint arXiv:2302.02412},
  year={2023}
}
@article{bar2023multidiffusion,
  title={MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation},
  author={Bar-Tal, Omer and Yariv, Lior and Lipman, Yaron and Dekel, Tali},
  journal={arXiv preprint arXiv:2302.08113},
  year={2023}
}