Comments (3)
Hi there,
(1) If your code is open-source, I might be able to revise it directly. If so, please share the repo link.
(2) To revise the multi-ControlNet (mc) pipeline, you can refer to the pipeline implementation
class PaintWithWord_StableDiffusionPipeline(StableDiffusionPipeline):
in 'paint_with_words.py' at
https://github.com/lwchen6309/paint-with-words-sd/blob/ae75a8f6d1279c501c17a2482164571962761816/paint_with_words/paint_with_words.py#L513
especially the denoising step of the pipeline's __call__ function at
https://github.com/lwchen6309/paint-with-words-sd/blob/ae75a8f6d1279c501c17a2482164571962761816/paint_with_words/paint_with_words.py#L783
        # 7. Denoising loop
        num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
        with self.progress_bar(total=num_inference_steps) as progress_bar:
            for i, t in enumerate(timesteps):
                step_index = (self.scheduler.timesteps == t).nonzero().item()
                sigma = self.scheduler.sigmas[step_index]
                latent_model_input = self.scheduler.scale_model_input(latents, t)
                # _t = t if not is_mps else t.float()
                encoder_hidden_states.update({
                    "SIGMA": sigma,
                    "WEIGHT_FUNCTION": weight_function,
                })
                noise_pred_text = self.unet(
                    latent_model_input,
                    t,
                    encoder_hidden_states=encoder_hidden_states,
                ).sample
(a) As you can see here, the UNet receives the conditioning input encoder_hidden_states, which used to be a tensor but is now replaced by a dict that contains sigma, the weight function, and the original tensor.
(b) You also have to replace the forward function of the cross-attention modules, as is done at https://github.com/lwchen6309/paint-with-words-sd/blob/ae75a8f6d1279c501c17a2482164571962761816/paint_with_words/paint_with_words.py#L539
You can add (a) and (b) to the mc pipeline to combine the pww and mc pipelines.
(3) I'd also recommend trying latent couple (https://github.com/opparco/stable-diffusion-webui-two-shot), which simply modifies the UNet input by adding a text-weighted map. With this approach, you don't even need to inject a new forward function into the cross-attention modules; you only need to revise the denoising steps directly.
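To make (a) and (b) a bit more concrete, here is a minimal toy sketch in plain Python, not the repository's code. It assumes a tiny stand-in attention layer (ToyCrossAttention), hypothetical dict keys "CONTEXT_TENSOR" and "REGION_WEIGHTS", and an assumed weight function that grows with log(1 + sigma). Only the "SIGMA" and "WEIGHT_FUNCTION" keys come from the snippet above; everything else is illustrative.

```python
import math
import types


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(query, keys):
    # Scaled dot product between every query vector and every key vector.
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [[scale * sum(a * b for a, b in zip(q, k)) for k in keys] for q in query]


class ToyCrossAttention:
    """Hypothetical stand-in for a cross-attention layer (not the diffusers class)."""

    def forward(self, query, context):
        # Original behaviour: `context` is the text embedding itself.
        return [softmax(row) for row in attention_scores(query, context)]


def inj_forward(self, query, context):
    # (a) `context` is now a dict: the original embedding plus extra entries.
    keys = context["CONTEXT_TENSOR"]            # assumed key name
    region_weights = context["REGION_WEIGHTS"]  # assumed key name, one row per query
    sigma = context["SIGMA"]
    weight_function = context["WEIGHT_FUNCTION"]

    scores = attention_scores(query, keys)
    qk_max = max(max(row) for row in scores)
    probs = []
    for row, w_row in zip(scores, region_weights):
        # Bias each (location, token) score before the softmax.
        biased = [s + weight_function(w, sigma, qk_max) for s, w in zip(row, w_row)]
        probs.append(softmax(biased))
    return probs


def weight_function(w, sigma, qk_max):
    # Assumed form: stronger steering at high noise levels (large sigma).
    return 0.1 * w * math.log(1.0 + sigma) * qk_max


attn = ToyCrossAttention()
query = [[1.0, 0.0]]               # one image location
tokens = [[1.0, 0.0], [1.0, 0.0]]  # two identical text tokens
baseline = attn.forward(query, tokens)

# (b) Replace the layer's forward so it understands the dict.
attn.forward = types.MethodType(inj_forward, attn)
encoder_hidden_states = {
    "CONTEXT_TENSOR": tokens,
    "REGION_WEIGHTS": [[1.0, 0.0]],  # favour token 0 at this location
}
encoder_hidden_states.update({"SIGMA": 10.0, "WEIGHT_FUNCTION": weight_function})
steered = attn.forward(query, encoder_hidden_states)
```

In this toy setup the two tokens get equal attention before patching, and the first token gets more attention afterwards. The point is only the mechanism: the pipeline keeps calling the UNet the same way, while the dict smuggles the extra inputs down to the patched attention layers.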
from paint-with-words-sd.
I appreciate your willingness to help, but honestly there's nothing else that needs to be checked beyond the libraries:
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler, ModelMixin
Also, I want to keep my code private because there's sensitive info in it, sorry about that. Maybe we can pass the multicontrolnet pipeline in there or something? Also, I didn't understand (a) or (b), I'm sorry about that...
I'll be completely honest with you. I don't know how either multicontrolnet or paintwithwords works beyond some theory.
For multicontrolnet, all I know is that it uses multiple ControlNet models to generate something (I don't know how they're mixed together or how the weights are taken into account), and for paintwithwords I have no idea how you tell it to focus prompts on specific parts of an image (I know there's color mapping, but not how it works).
Are we just adding more and more weights and it works? But paintwithwords doesn't use weights. And won't things get corrupted eventually with all those models and weights?
In that case, I'm afraid it would be hard for you to revise the code without knowing the theory behind both methods, which is probably why (a) and (b) are difficult to understand.
To mix the mc pipeline and pww, it is essential to know how each of them works.
Unfortunately, I'm afraid I cannot help with this.
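As a rough starting point for the ControlNet half of that theory, the toy sketch below shows how several ControlNets are mixed, to my understanding of the multi-ControlNet path in diffusers: each model's residuals are multiplied by its conditioning scale, the results are summed, and the sum is added to the UNet's features. The function names and the 1-D "feature maps" are hypothetical and only illustrate the arithmetic.

```python
def combine_controlnet_residuals(residuals_per_model, conditioning_scales):
    """Sum per-model residuals, each scaled by its conditioning scale.

    residuals_per_model: one list of floats per ControlNet (toy 1-D feature maps).
    """
    if len(residuals_per_model) != len(conditioning_scales):
        raise ValueError("need exactly one conditioning scale per ControlNet")
    combined = [0.0] * len(residuals_per_model[0])
    for residual, scale in zip(residuals_per_model, conditioning_scales):
        for i, value in enumerate(residual):
            combined[i] += scale * value
    return combined


def add_to_unet_features(unet_features, combined):
    # The combined residual is simply added to the UNet's own features.
    return [f + r for f, r in zip(unet_features, combined)]


canny_residual = [1.0, 2.0, 3.0]  # toy output of a first ControlNet
pose_residual = [0.5, 0.5, 0.5]   # toy output of a second ControlNet
combined = combine_controlnet_residuals(
    [canny_residual, pose_residual],
    [1.0, 0.5],                   # per-model conditioning scales
)
features = add_to_unet_features([0.0, 1.0, 0.0], combined)
```

Seen this way, the ControlNet side changes the UNet's intermediate features, while pww changes the text conditioning and the cross-attention scores, which is why combining them means touching both the denoising loop and the attention forward.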