Travel between prompts in the latent space to make pseudo-animation, extension script for AUTOMATIC1111/stable-diffusion-webui.
Try interpolating on the hidden vectors of conditioning prompt to make seemingly-continuous image sequence, or let's say a pseudo-animation. 😀
⚠ 我们成立了插件反馈 QQ 群: 616795645 (赤狐屿),欢迎出建议、意见、报告bug等 (w
⚠ We have a QQ chat group (616795645) now, any suggestions, discussions and bug reports are highly wellllcome!!
ℹ 实话不说,我想有可能通过这个来做ppt童话绘本甚至本子……
ℹ 聪明的用法:先手工盲搜两张好看的图 (只有prompt差异),然后再尝试在其间 travel :lolipop:
⚪ Features
- 2022/12/11: work in a more 'successive' way, idea borrowed from deforum ('genesis' option)
- 2022/11/14: walk by substituting token embedding ('replace' mode)
- 2022/11/13: walk by optimizing condition ('grad' mode)
- 2022/11/10: interpolate linearly on condition/uncondition ('linear' mode)
⚪ Fixups
- 2022/11/27: keep up with webui's updates (error
ImportError: FrozenCLIPEmbedderWithCustomWords
) - 2022/11/20: keep up with webui's updates (error
AttributeError: p.all_negative_prompts[0]
)
- input multiple lines in the prompt/negative-prompt box, each line is called a stage
- generate images one by one, interpolating from one stage towards the next (batch configs are ignored)
- gradually change the digested inputs between prompts
- freeze all other settings (
steps
,sampler
,cfg factor
,seed
, etc.) - note that only the major
seed
will be forcely fixed through all processes, you can still setsubseed = -1
to allow more variances
- freeze all other settings (
- export a video!
⚪ Txt2Img
sampler \ genesis | fixed | successive |
---|---|---|
Eular a | ![]() |
![]() |
DDIM | ![]() |
![]() |
⚪ Img2Img
sampler \ genesis | fixed | successive |
---|---|---|
Eular a | ![]() |
![]() |
DDIM | ![]() |
![]() |
Reference image for img2img:
Example above run configure ('linear' mode):
Prompt:
(((masterpiece))), highres, ((boy)), child, cat ears, white hair, red eyes, yellow bell, red cloak, barefoot, angel, [flying], egyptian
((masterpiece)), highres, ((girl)), loli, cat ears, light blue hair, red eyes, magical wand, barefoot, [running]
Negative prompt:
(((nsfw))), ugly,duplicate,morbid,mutilated,tranny,trans,trannsexual,mutation,deformed,long neck,bad anatomy,bad proportions,extra arms,extra legs, disfigured,more than 2 nipples,malformed,mutated,hermaphrodite,out of frame,extra limbs,missing arms,missing legs,poorly drawn hands,poorty drawn face,mutation,poorly drawn,long body,multiple breasts,cloned face,gross proportions, mutated hands,bad hands,bad feet,long neck,missing limb,malformed limbs,malformed hands,fused fingers,too many fingers,extra fingers,missing fingers,extra digit,fewer digits,mutated hands and fingers,lowres,text,error,cropped,worst quality,low quality,normal quality,jpeg artifacts,signature,watermark,username,blurry,text font ufemale focus, poorly drawn, deformed, poorly drawn face, (extra leg:1.3), (extra fingers:1.2),out of frame
Steps: 15
CFG scale: 7
Clip skip: 1
Seed: 114514
Size: 512 x 512
Model hash: 925997e9
Hypernet: (this is my secret :)
- prompt: (list of strings)
- negative prompt: (list of strings)
- input multiple lines of prompt text
- we call each line of prompt a stage, usually you need at least 2 lines of text to starts travel (unless in 'grad' mode)
- if len(positive_prompts) != len(negative_prompts), the shorter one's last item will be repeated to match the longer one
- mode: (categorical)
linear
: interpolate linearly on condition/uncondition in latent spacereplace
: walk by gradually substituting word embeddingsgrad
: walk by optimizing certain loss- NOTE:
walk
methods might not reach target stages in specified steps some times, or reached earlier than expect, in that case, manually tunegrad_alpha
andsteps
might help a little...
- steps: (int, list of int)
- number of images to interpolate between two stages
- if int, constant number of travel steps
- if list of int, length should match
len(stages)-1
, separate by comma, e.g.:12, 24, 36
- genesis: (categorical), the a prior for each image frame
fixed
: starts from pure noise in txt2img pipeline, or from the same ref-image given in img2img pipelinesuccessive
: starts from the last generated image (this will force txt2img turn to actually be img2img from the 2nd frame on)
- denoise_strength: (float), denoise strength in img2img pipelines when
genesis == 'successive'
- replace_*
- replace_order: (categorical)
random
: substitute tokens randomlysimiliar
: substitute most similar tokens first (L1 distance of token embeddings)different
: substitute most different tokens firstgrad_min
: substitute tokens that causing smallest gradient first (gradient settings same as ingrad
mode)grad_max
: substitute tokens that causing largest gradient first
- replace_order: (categorical)
- grad_*
- grad_alpha: (float), step size of a walk pace
- grad_iter: (int), step count of walk paces
- you can try trading
grad_alpha=0.01 grad_iter=1
forgrad_alpha=0.001 grad_iter=10
- might be more cautious (perhaps!), but definitely takes more time
- you can try trading
- grad_meth: (categorical), step function of a walk pace
clip
: a triky balance betweensign
andtanh
sign
: walk at a constant speed (often stuck into oscillation at the end)tanh
: significantly speed down when approaching (it takes infinite time to exactly reach...)
- grad_w_latent: (float), weight factor of
loss_latent
- grad_w_cond: (float), weight factor of
loss_cond
- video_*
- fps: (float), FPS of video, set
0
to disable file saving - fmt: (categorical), export video file format
- pad: (int), repeat beginning/ending frames, giving a in/out time
- pick: (string), cherry pick frames by python slice syntax before padding (e.g.: set
::2
to avoid non-converging ping-pong phenomenon, set:-1
to drop non-reaching last frame)
- fps: (float), FPS of video, set
- debug: (bool)
- whether show verbose debug info at console
⚠ this script will NOT probably support the schedule syntax (i.e.: [prompt:prompt:number]
), because I don't know how to interpolate between different schedule plans :(
⚠ max length diff for each prompts should NOT exceed 75
in token count, otherwise will only work on the first segment, cos' I also don't know how to interpolate between different-lengthed tensors 🤔
Easiest way to install it is to:
- Go to the "Extensions" tab in the webui, switch to the "Install from URL" tab
- Paste https://github.com/Kahsolt/stable-diffusion-webui-prompt-travel.git into "URL for extension's git repository" and click install
- (Optional) You will need to restart the webui for dependencies to be installed or you won't be able to generate video files
Manual install:
- Copy this repo folder to the 'extensions' folder of https://github.com/AUTOMATIC1111/stable-diffusion-webui
- (Optional) Restart the webui
⚪ grad mode
The loss_latent
optimizes mse_loss(current_generated_latent, target_latent)
- if
grad_w_latent
is positive, minimizing - if
grad_w_latent
is negative, maximizing
The loss_cond
optimizes l1_loss(current_cond, next_stage_cond)
- if
grad_w_cond
is positive, walk towards the next stage (minimizing) - if
grad_w_cond
is negative, walk away from it (maximizing)
Grid search results: (steps=100, grad_alpha=0.01, grad_iter=1, grad_meth='clip'
)
w_cond\w_latent | -1 | 0 | 1 |
---|---|---|---|
-1 | 纹理丢失色块平滑、逆向胚胎发育,最后变成圆圈堆叠成的抽象小人 | 前几步变得精致,随后纹理丢失色块平滑,但保持作画结构,中途突然高斯模糊,旋即背景失去语义,最后变成斑点图,l_grad下降 | 走到三张别的图,画风基本一致,背景变朦胧,途中震荡,最后人物没了,变得几何重复 |
0 | 纹理丢失色块平滑、逆向胚胎发育,最后变成圆圈堆叠成的抽象小人,l_l1上升 | - | 走到两张别的图,画风基本一致,背景变朦胧,途中震荡,l_l1上升 |
1 | 纹理丢失色块平滑、逆向胚胎发育,最后变成圆形蒙版、光栅纹理 | 近似线性插值,叠加式过渡到目标,途中震荡,l_grad下降 | 走到两张别的图,画风基本一致,背景变朦胧,最后震荡 |
(*) 上表如无特殊说明,其各项 loss 变化都符合设置的优化方向
(**) 我们似乎应当总是令 w_latent > 0
,而 w_cond
的设置似乎很玄学,这里可能遭遇了对抗样本现象(神经网络的过度线性性)……
ℹ NOTE: When 'prompt' has only single line, it will wander just around the initial stage, dynamically balancing loss_latent
and loss_cond
; this allows you to discover neighbors of your given prompt 😀
⚪ replace mode
This mode working on token embed input level, hence your can view log.txt
to see how your input tokens are gradually changed.
⚠ Remember that comma is a normal valid token, so you might see many commas there. However, they are different when appearing at different positions within the token sequence.
The actual token replacing order might reveal some information of the token importance, might the listed '>> grad ascend' or '>> embed L1-distance ascend' give you some ideas to tune your input prompt (I wish so..)
- deforum (2D/3D animation): https://github.com/deforum-art/deforum-for-automatic1111-webui
- sonar (k_diffuison samplers): https://github.com/Kahsolt/stable-diffusion-webui-sonar
by Armit 2022/11/10