
controlnetinpaint's Introduction

โ™ป๏ธ ControlNetInpaint

Open In Colab

ControlNet has proven to be a great tool for guiding StableDiffusion models with image-based hints! But what about changing only a part of the image based on that hint?

🔮 The initial set of ControlNet models was not trained to work with the StableDiffusion inpainting backbone, but it turns out that the results can be pretty good!

In this repository, you will find a basic example notebook that shows how this can work. The key trick is to use the right value of the controlnet_conditioning_scale parameter: while a value of 1.0 often works well, it is sometimes beneficial to lower it a bit when the control image does not fit the selected text prompt very well.

Demos on 🤗 HuggingFace Using ControlNetInpaint

โœ๏ธ Mask and Sketch

Check out the HuggingFace Space, which lets you scribble and describe how you want to recreate a part of an image.

🎭 theaTRON

Check out the HuggingFace Space that reimagines scenes with human subjects using a text prompt: theaTRON tool examples

Code Usage

This code is currently compatible with diffusers==0.14.0. An upgrade to the latest version can be expected in the near future (currently, some breaking changes are present in 0.15.0 that should ideally be fixed on the side of the diffusers interface).
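
As a sanity check, you can verify the pinned version before importing the custom pipeline. This is a minimal illustrative sketch, not part of the repository:

# illustrative sketch: confirm the pinned diffusers version before using the custom pipeline
import diffusers

assert diffusers.__version__ == "0.14.0", (
    f"This repository targets diffusers==0.14.0, found {diffusers.__version__}"
)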

Here's an example of how this new pipeline (StableDiffusionControlNetInpaintPipeline) is used with the core backbone of "runwayml/stable-diffusion-inpainting":

# the pipeline class is defined in src/pipeline_stable_diffusion_controlnet_inpaint.py
# (the import below assumes you run from the repository root)
import torch
from diffusers import ControlNetModel, UniPCMultistepScheduler
from src.pipeline_stable_diffusion_controlnet_inpaint import StableDiffusionControlNetInpaintPipeline

# load ControlNet and the Stable Diffusion inpainting backbone
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", controlnet=controlnet, torch_dtype=torch.float16
)

# speed up diffusion process with faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# remove following line if xformers is not installed
pipe.enable_xformers_memory_efficient_attention()

pipe.to('cuda')

# generate image (image, canny_image, and mask_image are prepared earlier in the notebook)
text_prompt = "a red panda sitting on a bench"
generator = torch.manual_seed(0)
new_image = pipe(
    text_prompt,
    num_inference_steps=20,
    generator=generator,
    image=image,
    control_image=canny_image,
    mask_image=mask_image
).images[0]

(A full example of how to prepare the input images and run the pipeline is available in the notebook!)
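
The key parameter mentioned above, controlnet_conditioning_scale, can be passed directly to the same pipeline call. The snippet below is a sketch based on the example above; the value 0.5 is only illustrative.

# same call as above, but with an explicit conditioning scale
# (values below 1.0 loosen the influence of the control image when it does not match the prompt well)
new_image = pipe(
    text_prompt,
    num_inference_steps=20,
    generator=generator,
    image=image,
    control_image=canny_image,
    mask_image=mask_image,
    controlnet_conditioning_scale=0.5,
).images[0]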

Results

All results below have been generated using the ControlNet-with-Inpaint-Demo.ipynb notebook.

Let's start with turning a dog into a red panda!

Canny Edge

Prompt: "a red panda sitting on a bench"

Canny Result

HED

Prompt: "a red panda sitting on a bench"

HED Result

Scribble

Prompt: "a red panda sitting on a bench"

Scribble Result

Depth

Prompt: "a red panda sitting on a bench"

Depth Result

Normal

Prompt: "a red panda sitting on a bench"

Normal Result

For the remaining modalities, the panda example doesn't really make much sense, so we use different images and prompts to illustrate the capability!

M-LSD

Prompt: "an image of a room with a city skyline view"

MLSD Result

OpenPose

Prompt: "a man in a knight armor"

OpenPose Result

Segmentation Mask

Prompt: "a pink eerie scary house"

Segmentation Result

Challenging Example 🐕➡️🍔

Let's see how tuning the controlnet_conditioning_scale works out for a more challenging example of turning the dog into a cheeseburger!

In this case, we demand a large semantic leap and that requires a more subtle guide from the control image!

Cheeseburger Result
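
One way to explore this is to sweep the conditioning scale and compare the outputs. The sketch below is not taken from the notebook, and the prompt is only illustrative.

# illustrative sweep over controlnet_conditioning_scale for a difficult semantic edit
results = {}
for scale in [1.0, 0.7, 0.4]:
    results[scale] = pipe(
        "a delicious cheeseburger",       # illustrative prompt
        num_inference_steps=20,
        generator=torch.manual_seed(0),   # fixed seed so that only the scale changes
        image=image,
        control_image=canny_image,
        mask_image=mask_image,
        controlnet_conditioning_scale=scale,
    ).images[0]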

โฉ DiffusionFastForward: learn diffusion from ground up! ๐ŸŽป

If you want to learn more about the process of denoising diffusion for images, check out the open-source course DiffusionFastForward with colab notebooks where networks are trained from scratch on high-resolution data! ๐Ÿ”ฐ


Acknowledgement

There is a related excellent repository, ControlNet-for-Any-Basemodel, that, among many other things, also shows similar examples of using ControlNet for inpainting. However, its definition of the pipeline is quite different and, most importantly, it does not allow controlnet_conditioning_scale to be controlled as an input argument.

There are other differences, such as the fact that in this implementation, only one pipeline needs to be instantiated (as opposed to two in the other one), but the key motivation for publishing this repository is to provide a space solely focused on the application of ControlNet for inpainting.

controlnetinpaint's People

Contributors

mikonvergence, neelays, remorses


controlnetinpaint's Issues

How to get multiple images for multiple prompts

Hello @mikonvergence, your work is awesome, and I have a question that has been on my mind for days.

I have 10-15 different prompts that I want to run on a single image, but on a T4 GPU the memory becomes fragmented even for a single image and a single prompt.

Thanks and Regards,
Satwik Sunnam.
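
One possible approach, sketched below under the assumption that GPU memory rather than the API is the constraint, is to run the prompts sequentially through the single pipeline instance instead of batching them; the prompts shown are only illustrative.

# hypothetical sketch: run several prompts against one image, one at a time, to keep peak memory low
prompts = ["a red panda sitting on a bench", "a corgi sitting on a bench"]  # illustrative prompts
outputs = []
for p in prompts:
    out = pipe(
        p,
        num_inference_steps=20,
        generator=torch.manual_seed(0),
        image=image,
        control_image=canny_image,
        mask_image=mask_image,
    ).images[0]
    outputs.append(out)
    torch.cuda.empty_cache()  # optionally release cached memory between prompts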

promptless inpainting?

Is there a way to do promptless inpainting with ControlNet and the Stable Diffusion 1.5 inpainting model? I want to recreate this https://civitai.com/articles/1907 in Colab, but I don't know how, and I don't want Gradio UIs or a server because you can't run sd-webui on free Colab and my PC is weak.

About 'strength' Parameter in StableDiffusionControlNetInpaintPipeline Compared to StableDiffusionInpaintPipeline

The StableDiffusionInpaintPipeline introduces a strength parameter, as detailed in the documentation here. However, I couldn't locate this parameter in the StableDiffusionControlNetInpaintPipeline.

If I use the parameters num_inference_steps=40 and strength=0.93 in StableDiffusionInpaintPipeline, should I then use num_inference_steps=37 (calculated as 40 * 0.93) in StableDiffusionControlNetInpaintPipeline?
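
For reference, here is a tiny sketch of the arithmetic in the question, under the question's own assumption that strength simply scales the number of denoising steps that actually run; whether this matches the pipeline's exact behaviour is not confirmed here.

# illustrative arithmetic only, based on the assumption stated in the question
num_inference_steps = 40
strength = 0.93
effective_steps = int(num_inference_steps * strength)  # 37
print(effective_steps)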

About Training

Hi! Thanks for your great work!
I'd like to ask how the model is trained. Does training cover both the inpainting UNet and the ControlNet, or are the two trained separately?

Inpainting new "concepts"

Great work @mikonvergence!
I have a question that is somewhat related to #1. Say I have a poster image and want to inpaint the face in the poster with a given avatar image like:
(example screenshot)
How can I achieve this given the fact that these avatars are a new "concept" for the LDM? I did try your method mentioned in the issue but it did not work out for me.

Did you retrain the ControlNet for the SD-inpainting backbone?

Hi! Thank you for this repo.

I did not understand whether you retrained ControlNet using the SD-inpainting backbone, or whether you copied over the weights that were trained for the regular SD backbone by the ControlNet authors, and those weights somehow work on the SD-inpainting backbone as well?

Thank you very much,
Thibault

RuntimeError: GET was unable to find an engine to execute this computation

RuntimeError Traceback (most recent call last)
Cell In[9], line 4
1 from controlnet_aux import OpenposeDetector
3 openpose = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
----> 4 pose_image = openpose(image)
5 pose_image

File /home/pai/lib/python3.9/site-packages/controlnet_aux/open_pose/__init__.py:83, in OpenposeDetector.__call__(self, input_image, detect_resolution, image_resolution, hand_and_face, return_pil)
81 H, W, C = input_image.shape
82 with torch.no_grad():
---> 83 candidate, subset = self.body_estimation(input_image)
84 hands = []
85 faces = []

File /home/pai/lib/python3.9/site-packages/controlnet_aux/open_pose/body.py:44, in Body.__call__(self, oriImg)
42 # data = data.permute([2, 0, 1]).unsqueeze(0).float()
43 with torch.no_grad():
---> 44 Mconv7_stage6_L1, Mconv7_stage6_L2 = self.model(data)
45 Mconv7_stage6_L1 = Mconv7_stage6_L1.cpu().numpy()
46 Mconv7_stage6_L2 = Mconv7_stage6_L2.cpu().numpy()

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /home/pai/lib/python3.9/site-packages/controlnet_aux/open_pose/model.py:116, in bodypose_model.forward(self, x)
114 def forward(self, x):
--> 116 out1 = self.model0(x)
118 out1_1 = self.model1_1(out1)
119 out1_2 = self.model1_2(out1)

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/container.py:217, in Sequential.forward(self, input)
215 def forward(self, input):
216 for module in self:
--> 217 input = module(input)
218 return input

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/conv.py:463, in Conv2d.forward(self, input)
462 def forward(self, input: Tensor) -> Tensor:
--> 463 return self._conv_forward(input, self.weight, self.bias)

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/conv.py:459, in Conv2d._conv_forward(self, input, weight, bias)
455 if self.padding_mode != 'zeros':
456 return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
457 weight, bias, self.stride,
458 _pair(0), self.dilation, self.groups)
--> 459 return F.conv2d(input, weight, bias, self.stride,
460 self.padding, self.dilation, self.groups)

RuntimeError: GET was unable to find an engine to execute this computation

No removing effect

Thanks for the great repo. I was trying to remove an object from an image. I used the Canny method, set the prompt to be empty, and decreased the controlnet_conditioning_scale to 0. This works on the default image in the Colab but not with any other image; in fact, it produces something else in the masked area. Could you please explain what else should be done to achieve a removal effect?

MultiControlNet support?

In the original ControlNet pipeline, we can pass a list of controlnet models like this

self.ptxt = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    safety_checker=None,
    requires_safety_checker=False,
    controlnet=[
        ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16),
        ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
    ],
    torch_dtype=torch.float16).to("cuda")

Is this supported in this pipeline?

Cheers

TypeError: StableDiffusionControlNetPipeline.prepare_image() missing 1 required positional argument: 'do_classifier_free_guidance'

pipe.to('cuda')

# generate image
generator = torch.manual_seed(0)
new_image = pipe(
    text_prompt,
    num_inference_steps=20,
    generator=generator,
    image=image,
    control_image=canny_image,
    controlnet_conditioning_scale = 0.5,
    mask_image=mask_image
).images[0]

new_image.save('output/canny_result.png')

Thanks for your great work. When running the above code in the notebook, I get the following error:

Traceback (most recent call last):

in <module>:5
      3 # generate image
      4 generator = torch.manual_seed(0)
----> 5 new_image = pipe(
      6     text_prompt,
      7     num_inference_steps=20,
      8     generator=generator,

d:\App\miniconda\envs\aigc\lib\site-packages\torch\autograd\grad_mode.py:27 in decorate_context
     24     @functools.wraps(func)
     25     def decorate_context(*args, **kwargs):
     26         with self.clone():
---> 27             return func(*args, **kwargs)
     28     return cast(F, decorate_context)

c:\Users\Arthur\Downloads\ControlNetInpaint-main\ControlNetInpaint-main\src\pipeline_stable_diffusion_controlnet_inpaint.py:394 in __call__
    391         )
    392
    393         # 4. Prepare image
--> 394         control_image = self.prepare_image(
    395             control_image,
    396             width,
    397             height,

TypeError: StableDiffusionControlNetPipeline.prepare_image() missing 1 required positional argument: 'do_classifier_free_guidance'

Unexpected results when using the Colab example with other images

Hello,

I'm trying to use the provided Google Colab notebook to mask out a piece of clothing from an original image of a person and change the clothing with a textual prompt (for example, its color), but I'm encountering issues with the generated image. Specifically, the generated image appears to be of poor quality and has a mixed-up appearance.

Here are my inputs:
A person wearing a cloth.

image

A person wearing a grey cloth (representing no cloth).

image

Prompt

text_prompt="A woman wearing a green shirt"

It seems intuitive; however, the output image I'm receiving is not what I expected. I've followed the instructions provided in the repo, but I'm still unable to achieve satisfactory results.

OUTPUT
image

note:

  1. I tried converting the grey color of the mask image to black to see if it yields any better results, but it did not, unfortunately.
  2. I tried the Canny variant with the image and mask image to see if it made any difference, but the generated image still looked like this.

Could you please provide some guidance on how to improve the output image quality? If there are any known issues or limitations with the current implementation, please let me know as well.

Cheers
Seth

Can this work with SD 2 Inpainting

Thanks a ton for this repo. I have 2 questions:

  1. Is there a way to make it work with SD 2 inpainting and potentially upcoming inpainting models (XL etc.)?
  2. If I have a ckpt of a custom inpainting model, how can I convert it to diffusers format?
