
controlnetinpaint's Introduction

โ™ป๏ธ ControlNetInpaint

Open In Colab

ControlNet has proven to be a great tool for guiding StableDiffusion models with image-based hints! But what about changing only a part of the image based on that hint?

🔮 The initial set of ControlNet models was not trained to work with the StableDiffusion inpainting backbone, but it turns out that the results can be pretty good!

In this repository, you will find a basic example notebook that shows how this can work. The key trick is to use the right value of the controlnet_conditioning_scale parameter: while a value of 1.0 often works well, it is sometimes beneficial to lower it a bit when the control image does not fit the selected text prompt very well.

Demos on 🤗 HuggingFace Using ControlNetInpaint

โœ๏ธ Mask and Sketch

Check out the HuggingFace Space, which lets you scribble and describe how you want to recreate a part of an image.

🎭 theaTRON

Check out the HuggingFace Space that reimagines scenes with human subjects using a text prompt: theaTRON tool examples

Code Usage

This code is currently compatible with diffusers==0.14.0. An upgrade to the latest version can be expected in the near future (currently, some breaking changes are present in 0.15.0 that should ideally be fixed on the side of the diffusers interface).
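
As a sanity check, you can verify the pinned version before importing the custom pipeline. This is a minimal illustrative sketch, not part of the repository:

# illustrative sketch: confirm the pinned diffusers version before using the custom pipeline
import diffusers

assert diffusers.__version__ == "0.14.0", (
    f"This repository targets diffusers==0.14.0, found {diffusers.__version__}"
)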

Here's an example of how this new pipeline (StableDiffusionControlNetInpaintPipeline) is used with the core backbone of "runwayml/stable-diffusion-inpainting":

# the pipeline class is defined in src/pipeline_stable_diffusion_controlnet_inpaint.py
# (the import below assumes you run from the repository root)
import torch
from diffusers import ControlNetModel, UniPCMultistepScheduler
from src.pipeline_stable_diffusion_controlnet_inpaint import StableDiffusionControlNetInpaintPipeline

# load ControlNet and the Stable Diffusion inpainting backbone
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", controlnet=controlnet, torch_dtype=torch.float16
)

# speed up diffusion process with faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# remove following line if xformers is not installed
pipe.enable_xformers_memory_efficient_attention()

pipe.to('cuda')

# generate image (image, canny_image, and mask_image are prepared earlier in the notebook)
text_prompt = "a red panda sitting on a bench"
generator = torch.manual_seed(0)
new_image = pipe(
    text_prompt,
    num_inference_steps=20,
    generator=generator,
    image=image,
    control_image=canny_image,
    mask_image=mask_image
).images[0]

(A full example of how to prepare the input images and run the pipeline is available in the notebook!)
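
The key parameter mentioned above, controlnet_conditioning_scale, can be passed directly to the same pipeline call. The snippet below is a sketch based on the example above; the value 0.5 is only illustrative.

# same call as above, but with an explicit conditioning scale
# (values below 1.0 loosen the influence of the control image when it does not match the prompt well)
new_image = pipe(
    text_prompt,
    num_inference_steps=20,
    generator=generator,
    image=image,
    control_image=canny_image,
    mask_image=mask_image,
    controlnet_conditioning_scale=0.5,
).images[0]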

Results

All results below have been generated using the ControlNet-with-Inpaint-Demo.ipynb notebook.

Let's start with turning a dog into a red panda!

Canny Edge

Prompt: "a red panda sitting on a bench"

Canny Result

HED

Prompt: "a red panda sitting on a bench"

HED Result

Scribble

Prompt: "a red panda sitting on a bench"

Scribble Result

Depth

Prompt: "a red panda sitting on a bench"

Depth Result

Normal

Prompt: "a red panda sitting on a bench"

Normal Result

For the remaining modalities, the panda example doesn't really make much sense, so we use different images and prompts to illustrate the capability!

M-LSD

Prompt: "an image of a room with a city skyline view"

MLSD Result

OpenPose

Prompt: "a man in a knight armor"

OpenPose Result

Segmentation Mask

Prompt: "a pink eerie scary house"

Segmentation Result

Challenging Example 🐕➡️🍔

Let's see how tuning the controlnet_conditioning_scale works out for a more challenging example of turning the dog into a cheeseburger!

In this case, we demand a large semantic leap and that requires a more subtle guide from the control image!

Cheeseburger Result
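
One way to explore this is to sweep the conditioning scale and compare the outputs. The sketch below is not taken from the notebook, and the prompt is only illustrative.

# illustrative sweep over controlnet_conditioning_scale for a difficult semantic edit
results = {}
for scale in [1.0, 0.7, 0.4]:
    results[scale] = pipe(
        "a delicious cheeseburger",       # illustrative prompt
        num_inference_steps=20,
        generator=torch.manual_seed(0),   # fixed seed so that only the scale changes
        image=image,
        control_image=canny_image,
        mask_image=mask_image,
        controlnet_conditioning_scale=scale,
    ).images[0]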

โฉ DiffusionFastForward: learn diffusion from ground up! ๐ŸŽป

If you want to learn more about the process of denoising diffusion for images, check out the open-source course DiffusionFastForward with colab notebooks where networks are trained from scratch on high-resolution data! ๐Ÿ”ฐ


Acknowledgement

There is a related excellent repository, ControlNet-for-Any-Basemodel, that, among many other things, also shows similar examples of using ControlNet for inpainting. However, its definition of the pipeline is quite different and, most importantly, it does not allow controlnet_conditioning_scale to be controlled as an input argument.

There are other differences, such as the fact that in this implementation, only one pipeline needs to be instantiated (as opposed to two in the other one), but the key motivation for publishing this repository is to provide a space solely focused on the application of ControlNet for inpainting.

controlnetinpaint's People

Contributors

mikonvergence, neelays, remorses


controlnetinpaint's Issues

How to get multiple images for multiple prompts

Hello @mikonvergence, your work is awesome, and I have a question that has been on my mind for days.

I have 10-15 different prompts that I want to run on a single image, but on a T4 GPU the memory becomes fragmented even for a single image and a single prompt.

Thanks and Regards,
Satwik Sunnam.
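
One possible approach, sketched below under the assumption that GPU memory rather than the API is the constraint, is to run the prompts sequentially through the single pipeline instance instead of batching them; the prompts shown are only illustrative.

# hypothetical sketch: run several prompts against one image, one at a time, to keep peak memory low
prompts = ["a red panda sitting on a bench", "a corgi sitting on a bench"]  # illustrative prompts
outputs = []
for p in prompts:
    out = pipe(
        p,
        num_inference_steps=20,
        generator=torch.manual_seed(0),
        image=image,
        control_image=canny_image,
        mask_image=mask_image,
    ).images[0]
    outputs.append(out)
    torch.cuda.empty_cache()  # optionally release cached memory between prompts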

promptless inpainting?

Is there a way to do promptless inpainting with ControlNet and the Stable Diffusion 1.5 inpainting model? I want to recreate this https://civitai.com/articles/1907 in Colab, but I don't know how, and I don't want Gradio UIs or a server because you can't run sd-webui on free Colab and my PC is weak.

About 'strength' Parameter in StableDiffusionControlNetInpaintPipeline Compared to StableDiffusionInpaintPipeline

The StableDiffusionInpaintPipeline introduces a strength parameter, as detailed in the documentation here. However, I couldn't locate this parameter in the StableDiffusionControlNetInpaintPipeline.

If I use the parameters num_inference_steps=40 and strength=0.93 in StableDiffusionInpaintPipeline, should I then use num_inference_steps=37 (calculated as 40 * 0.93) in StableDiffusionControlNetInpaintPipeline?
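
For reference, here is a tiny sketch of the arithmetic in the question, under the question's own assumption that strength simply scales the number of denoising steps that actually run; whether this matches the pipeline's exact behaviour is not confirmed here.

# illustrative arithmetic only, based on the assumption stated in the question
num_inference_steps = 40
strength = 0.93
effective_steps = int(num_inference_steps * strength)  # 37
print(effective_steps)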

About Training

Hi! Thanks for your great work!
I'd like to ask how the model is trained. Does training cover both the inpainting UNet and the ControlNet, or are the two trained separately?

Inpainting new "concepts"

Great work @mikonvergence!
I have a question that is somewhat related to #1. Say I have a poster image and want to inpaint the face in the poster with a given avatar image like:
(example screenshot)
How can I achieve this given the fact that these avatars are a new "concept" for the LDM? I did try your method mentioned in the issue but it did not work out for me.

Did you retrain the ControlNet for the SD-inpainting backbone?

Hi! Thank you for this repo.

I did not understand whether you retrained ControlNet using the SD-inpainting backbone, or whether you copied over the weights that were trained for the regular SD backbone by the ControlNet authors, and those weights somehow work on the SD-inpainting backbone as well?

Thank you very much,
Thibault

RuntimeError: GET was unable to find an engine to execute this computation

RuntimeError Traceback (most recent call last)
Cell In[9], line 4
1 from controlnet_aux import OpenposeDetector
3 openpose = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
----> 4 pose_image = openpose(image)
5 pose_image

File /home/pai/lib/python3.9/site-packages/controlnet_aux/open_pose/__init__.py:83, in OpenposeDetector.__call__(self, input_image, detect_resolution, image_resolution, hand_and_face, return_pil)
81 H, W, C = input_image.shape
82 with torch.no_grad():
---> 83 candidate, subset = self.body_estimation(input_image)
84 hands = []
85 faces = []

File /home/pai/lib/python3.9/site-packages/controlnet_aux/open_pose/body.py:44, in Body.__call__(self, oriImg)
42 # data = data.permute([2, 0, 1]).unsqueeze(0).float()
43 with torch.no_grad():
---> 44 Mconv7_stage6_L1, Mconv7_stage6_L2 = self.model(data)
45 Mconv7_stage6_L1 = Mconv7_stage6_L1.cpu().numpy()
46 Mconv7_stage6_L2 = Mconv7_stage6_L2.cpu().numpy()

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /home/pai/lib/python3.9/site-packages/controlnet_aux/open_pose/model.py:116, in bodypose_model.forward(self, x)
114 def forward(self, x):
--> 116 out1 = self.model0(x)
118 out1_1 = self.model1_1(out1)
119 out1_2 = self.model1_2(out1)

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/container.py:217, in Sequential.forward(self, input)
215 def forward(self, input):
216 for module in self:
--> 217 input = module(input)
218 return input

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/conv.py:463, in Conv2d.forward(self, input)
462 def forward(self, input: Tensor) -> Tensor:
--> 463 return self._conv_forward(input, self.weight, self.bias)

File /home/pai/lib/python3.9/site-packages/torch/nn/modules/conv.py:459, in Conv2d._conv_forward(self, input, weight, bias)
455 if self.padding_mode != 'zeros':
456 return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
457 weight, bias, self.stride,
458 _pair(0), self.dilation, self.groups)
--> 459 return F.conv2d(input, weight, bias, self.stride,
460 self.padding, self.dilation, self.groups)

RuntimeError: GET was unable to find an engine to execute this computation

No removing effect

Thanks for the great repo. I was trying to remove an object from an image. I used the Canny method, set the prompt to be empty, and decreased the controlnet_conditioning_scale to 0. This works on the default image in the Colab but not with any other image; in fact, it produces something else in the masked area. Could you please explain what else should be done to achieve a removal effect?

MultiControlNet support?

In the original ControlNet pipeline, we can pass a list of controlnet models like this

self.ptxt = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    safety_checker=None,
    requires_safety_checker=False,
    controlnet=[
        ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16),
        ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
    ],
    torch_dtype=torch.float16).to("cuda")

Is this supported in this pipeline?

Cheers

TypeError: StableDiffusionControlNetPipeline.prepare_image() missing 1 required positional argument: 'do_classifier_free_guidance'

pipe.to('cuda')

# generate image
generator = torch.manual_seed(0)
new_image = pipe(
    text_prompt,
    num_inference_steps=20,
    generator=generator,
    image=image,
    control_image=canny_image,
    controlnet_conditioning_scale = 0.5,
    mask_image=mask_image
).images[0]

new_image.save('output/canny_result.png')

Thanks for your great work. When running the above code in the notebook, I get the following error:

Traceback (most recent call last):

in <module>:5
      3 # generate image
      4 generator = torch.manual_seed(0)
----> 5 new_image = pipe(
      6     text_prompt,
      7     num_inference_steps=20,
      8     generator=generator,

d:\App\miniconda\envs\aigc\lib\site-packages\torch\autograd\grad_mode.py:27 in decorate_context
     24     @functools.wraps(func)
     25     def decorate_context(*args, **kwargs):
     26         with self.clone():
---> 27             return func(*args, **kwargs)
     28     return cast(F, decorate_context)

c:\Users\Arthur\Downloads\ControlNetInpaint-main\ControlNetInpaint-main\src\pipeline_stable_diffusion_controlnet_inpaint.py:394 in __call__
    391         )
    392
    393         # 4. Prepare image
--> 394         control_image = self.prepare_image(
    395             control_image,
    396             width,
    397             height,

TypeError: StableDiffusionControlNetPipeline.prepare_image() missing 1 required positional argument: 'do_classifier_free_guidance'

Unexpected results when using the Colab example with other images

Hello,

I'm trying to use the provided Google Colab notebook to mask out a piece of clothing from an original image of a person and change the clothing with a textual prompt (for example, its color), but I'm encountering issues with the generated image. Specifically, the generated image appears to be of poor quality and has a mixed-up appearance.

Here are my inputs:
A person wearing a cloth.

image

A person wearing a grey cloth (representing no cloth).

image

Prompt

text_prompt="A woman wearing a green shirt"

It seems intuitive; however, the output image I'm receiving is not what I expected. I've followed the instructions provided in the repo, but I'm still unable to achieve satisfactory results.

OUTPUT
image

note:

  1. I tried converting the grey color of the mask image to black to see if it yields any better results, but it did not, unfortunately.
  2. I tried the Canny variant with the image and mask image to see if it made any difference, but the generated image still looked like this.

Could you please provide some guidance on how to improve the output image quality? If there are any known issues or limitations with the current implementation, please let me know as well.

Cheers
Seth

Can this work with SD 2 Inpainting

Thanks a ton for this repo. I have 2 questions:

  1. Is there a way to make it work with SD 2 inpainting and potentially upcoming inpainting models (XL etc.)?
  2. If I have a ckpt of a custom inpainting model, how can I convert it to diffusers format?
