mulanai / mulan


MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (multilingual support for any diffusion model, with no additional training)

mulan's Introduction

🌻 MuLan

# pip install mulankit
from diffusers import StableDiffusionPipeline
import mulankit  # added for MuLan

pipe = StableDiffusionPipeline.from_pretrained('Lykon/dreamshaper-8')
# added for MuLan: wrap the pipeline with the language adapter
pipe = mulankit.transform(pipe, 'mulanai/mulan-lang-adapter::sd15_aesthetic.pth')
image = pipe('一只蓝色的🐶 in the 바다').images[0]
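
The adapter argument above packs two pieces of information into one string, separated by `::`. One plausible reading, which is an assumption and not confirmed by this page, is that the part before `::` is a Hugging Face repository ID and the part after it is a checkpoint file inside that repository. The hypothetical helper `split_adapter_spec` below only illustrates that reading; it is not part of mulankit.

```python
from typing import NamedTuple, Optional


class AdapterSpec(NamedTuple):
    repo_id: Optional[str]  # None when the spec is a plain local path
    filename: str


def split_adapter_spec(spec: str) -> AdapterSpec:
    """Split 'repo::file' into its parts (assumed convention, not mulankit's API)."""
    repo_id, sep, filename = spec.partition("::")
    if not sep:
        # No '::' separator: treat the whole string as a local checkpoint path.
        return AdapterSpec(None, spec)
    return AdapterSpec(repo_id, filename)


spec = split_adapter_spec("mulanai/mulan-lang-adapter::sd15_aesthetic.pth")
print(spec.repo_id)   # mulanai/mulan-lang-adapter
print(spec.filename)  # sd15_aesthetic.pth
```

A string without `::`, such as a local path to `sdxl_aesthetic.pth`, comes back with `repo_id` set to `None`.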

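Example prompts and the models used to render them:
一只蓝色的 🐶 in the 바다 ("a blue 🐶 in the sea"), Dreamshaper-8
レゴシュワルツェネッガー ("Lego Schwarzenegger"), SDXL-lightning
一只可爱的猫头鹰 ("a cute owl"), MVDream
海浪风景 ("ocean wave scenery"), AnimateDiff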

What is it?

We present MuLan, a versatile framework that equips any diffusion model with native multilingual generation in more than 110 languages. With a text encoder properly trained on noisy data, we demonstrate that MuLan can be trained on English-only data and still support other languages zero-shot. We also introduce the Language Adapter: an adapter with fewer than 20M parameters, trained against a frozen denoiser and a text encoder, that can be readily combined with any homologous community models and tools, such as LoRA, LCM, ControlNet, and IP-Adapter, without any fine-tuning.

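In short: without additional training, MuLan (木兰) gives any diffusion model native multilingual capability. MuLan can be trained on English data alone and generalize to more than 110 other languages. Through the language adapter, MuLan's multilingual capability plugs seamlessly into any community model or tool of the same family (such as LoRA, LCM, ControlNet, and IP-Adapter) with no fine-tuning.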

News

optimize memory usage.
release technical report.
2024-5-14: release code and models.

How to use

We host a Gradio demo here.

MuLan supports

Base models: Stable Diffusion 1.5, 2.1, XL, Pixart-Alpha/Sigma.
Downstream models: ControlNet, LCM, LoRA, fine-tuned models, etc.
Video models: AnimateDiff.
3D models: MVDream.

Please refer to USAGE.md and the examples for more details.

Model Release

Model	Description	Link
MuLan-Language-Adapter	Adapters for SDXL, SD1.5/2.1, Pixart	hf-model
MuLan-Pixart	Full finetuned model	hf-model

See more at our Huggingface 🌻 Homepage.

Citation

If you find this repo helpful, please consider citing us.

@article{lai2024mulan,
  title={MuLan: Adapting Multilingual Diffusion Models for 110 + Languages},
  year={2024}
}

Acknowledgement

Our work is made possible by these great open-source projects.

Stable Diffusion · Pixart-Alpha · InternVL

If you want to join our WeChat group, please scan the following QR code to add our assistant as a WeChat friend:

mulan's Issues

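Recommended GPU memory?

Hello, how much GPU memory do you recommend? I am currently getting an OOM error. Alternatively, is there anything that could be optimized? Thanks!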

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

  File "/usr/local/lib/python3.10/dist-packages/mulankit/api.py", line 58, in func
    def func(*args, **kwargs): return encode_prompt_sdxl(pipe, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mulankit/patch.py", line 468, in encode_prompt_sdxl
    prompt_embeds, pooled_prompt_embeds = self.unet.adapter(prompt_embeds, pooled_prompt_embeds)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mulankit/models/adapter.py", line 111, in forward
    encoder_hidden_states1 = self.adapter1(encoder_hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mulankit/models/adapter.py", line 82, in forward
    encoder_hidden_states = self.proj(encoder_hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

What can I do to solve this problem?
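
The traceback fails inside `self.unet.adapter`, which suggests that the prompt embeddings and the adapter weights ended up on different devices. One way to narrow this down is to list where each part of a module lives. The sketch below is a hypothetical diagnostic helper, not part of mulankit; it relies only on the `named_parameters()` interface that `torch.nn.Module` provides, so it needs no imports beyond the standard library.

```python
from collections import defaultdict
from typing import Dict, Set


def devices_by_submodule(module) -> Dict[str, Set[str]]:
    """Map each top-level submodule name to the devices its parameters live on.

    `module` is expected to behave like a torch.nn.Module, i.e. to provide
    named_parameters() yielding (dotted_name, tensor) pairs whose tensors
    have a .device attribute.
    """
    placement: Dict[str, Set[str]] = defaultdict(set)
    for name, param in module.named_parameters():
        # "adapter.proj.0.weight" -> "adapter"
        top_level = name.split(".", 1)[0]
        placement[top_level].add(str(param.device))
    return dict(placement)


def all_devices(module) -> Set[str]:
    """Return every device found; more than one means the module is split."""
    found: Set[str] = set()
    for devices in devices_by_submodule(module).values():
        found |= devices
    return found
```

Calling `devices_by_submodule(pipe.unet)`, and the same on the pipeline's text encoder if it is a module, would show whether the `adapter` entry sits on a different device from the rest. If it does, moving the whole pipeline to one device after the transform is the first thing to try; whether that is sufficient here is not confirmed in this thread.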

How to load a LoRA model after transforming the pipe

Here is the code snippet:

import torch
import mulankit
from diffusers import DPMSolverMultistepScheduler

# RegionalDiffusionXLPipeline and model_id come from the poster's setup (not shown).
pipe = RegionalDiffusionXLPipeline.from_pretrained(model_id, torch_dtype=torch.float16, use_safetensors=True, variant="fp16").to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)

mulan_dir = "/****/sdxl_aesthetic.pth"
text_encoder_path = "**/models/InternVL-14B-224px"
pipe = mulankit.transform(pipe, mulan_dir, text_encoder_path=text_encoder_path)

Once the pipe has been transformed into a MuLan pipe, loading the LoRA model no longer works and raises this error:

ValueError: do not know how to get attention modules for: InternVLTextModel

But when I load the LoRA model first and then transform the pipe, it works.
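
The working order reported above can be written down as a small helper. The error message suggests that LoRA loading looks for attention modules in the pipeline's text encoder and does not recognise `InternVLTextModel`; that reading is an interpretation of the message, not something confirmed by the maintainers. The function name `load_lora_then_transform` is hypothetical. It assumes `pipe` exposes diffusers' `load_lora_weights()` and that `transform` is `mulankit.transform`, passed in as an argument so the ordering stays explicit.

```python
from typing import Any, Callable


def load_lora_then_transform(
    pipe: Any,
    lora_path: str,
    adapter_path: str,
    transform: Callable[..., Any],
    **transform_kwargs: Any,
) -> Any:
    """Attach LoRA weights first, then apply the MuLan transform."""
    # Step 1: load LoRA while the pipeline still has its original text encoder.
    pipe.load_lora_weights(lora_path)
    # Step 2: only now swap in the multilingual text encoder and adapter.
    return transform(pipe, adapter_path, **transform_kwargs)
```

With the snippet from this issue, the call would look like `pipe = load_lora_then_transform(pipe, lora_path, mulan_dir, mulankit.transform, text_encoder_path=text_encoder_path)`, where `lora_path` is a placeholder for the poster's LoRA file.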

Support for smaller text encoders

Currently, MuLan internally uses OpenGVLab/InternVL-14B-224px as its default text encoder.
While it's possible to pass a path to any downloadable encoder, which ones did you test?

Note that InternVL-14B-224px is a massive model: it is 27GB in size and requires ~17GB of VRAM to run in FP16, which rules out using this library on any normal consumer GPU.
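
As a rough back-of-the-envelope check on those figures, weight memory is approximately the parameter count times the bytes per parameter. The sketch below is only that arithmetic: it ignores activations, framework overhead, and the rest of the pipeline, and the 1-billion-parameter encoder in the second example is hypothetical.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def params_from_memory(memory_gb: float, dtype: str = "fp16") -> float:
    """Inverse: how many parameters fit in a given weight budget."""
    return memory_gb * 1e9 / BYTES_PER_PARAM[dtype]


# ~17 GB in FP16 corresponds to roughly 8.5 billion parameters.
print(params_from_memory(17) / 1e9)  # 8.5
# A hypothetical 1-billion-parameter encoder would need about 2 GB in FP16.
print(weight_memory_gb(1e9))         # 2.0
```

By this estimate, an encoder would need to be several times smaller than the ~17GB figure quoted above to leave room for a diffusion model on a single consumer GPU.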

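Great! Thumbs up! This is a very creative and practical project. The only problem is that the InternVL-14B-224px model is really too large; I suggest switching to a smaller one.

SD12 and SD21 run fine, but their image quality no longer keeps up. As for the model used with SDXL:
adapter_path='../ckpts/adapter/sdxl/sdxl_internvl_unet_transformer_dual_xl_1024_ema_random_drop_9000.pth',

Where can I find it?

Also, the SDXL and PixArt models are good, but an ordinary 24GB GPU is not enough to run them...
I see that InternVL has looked into Stable Cascade. That model produces very good images and follows prompts well. Could it be integrated?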