mulan's Introduction

🌻 MuLan

# pip install mulankit
from diffusers import StableDiffusionPipeline
+ import mulankit

pipe = StableDiffusionPipeline.from_pretrained('Lykon/dreamshaper-8')
+ pipe = mulankit.transform(pipe, 'mulanai/mulan-lang-adapter::sd15_aesthetic.pth')
image = pipe('一只蓝色的🐶 in the 바다').images[0]
Example prompts and the models used to render them: 一只蓝色的 🐶 in the 바다 (Dreamshaper-8); レゴシュワルツェネッガー (SDXL-lightning); 一只可爱的猫头鹰 (MVDream); 海浪风景 (AnimateDiff).

What is it?

We present MuLan, a versatile framework that equips any diffusion model with native multilingual generation abilities in 110+ languages. With a text encoder properly trained on noisy data, we demonstrate that MuLan can be trained on English-only data and support other languages zero-shot. Additionally, we introduce the Language Adapter: an adapter with fewer than 20M parameters, trained against a frozen denoiser and a text encoder, that can be readily combined with any homologous community models/tools, such as LoRA, LCM, ControlNet, and IP-Adapter, without any finetuning.

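Without any additional training, MuLan (木兰) gives any diffusion model native multilingual capability. MuLan can be trained on English data alone and still generalize to more than 110 other languages. With the language adapter, MuLan's multilingual capability can be plugged seamlessly into any homologous community model or tool (such as LoRA, LCM, ControlNet, and IP-Adapter) without any finetuning.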

News

  • optimize memory usage.
  • release technical report.
  • 2024-5-14: release code and models.

How to use

We have hosted a Gradio demo here.

MuLan supports

  • Base models: Stable Diffusion 1.5, 2.1, XL, Pixart-Alpha/Sigma.
  • Downstream models: ControlNet, LCM, LoRA, finetuned models, etc.
  • Video models: AnimateDiff.
  • 3D models: MVDream.

Please refer to the USAGE.md and examples for more details.

Model Release

Model | Description | Link
MuLan-Language-Adapter | Adapters for SDXL, SD1.5/2.1, Pixart | hf-model
MuLan-Pixart | Full finetuned model | hf-model

See more at our Huggingface 🌻 Homepage.

Citation

If you find this repo helpful, please consider citing us.

@article{lai2024mulan,
  title={MuLan: Adapting Multilingual Diffusion Models for 110 + Languages},
  year={2024}
}

Acknowledgement

Our work is made possible by these great open-source works.

Stable Diffusion · Pixart-Alpha · InternVL

If you want to join our WeChat group, please scan the following QR code to add our assistant as a WeChat friend:

image

mulan's People

Contributors

zeqiang-lai, vladmandic

mulan's Issues

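Recommended GPU memory?

Hello, how much GPU memory is recommended? I am currently getting an OOM error. Alternatively, is there anything that can be optimized? Thanks!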

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

File "/usr/local/lib/python3.10/dist-packages/mulankit/api.py", line 58, in func
def func(*args, **kwargs): return encode_prompt_sdxl(pipe, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/mulankit/patch.py", line 468, in encode_prompt_sdxl
prompt_embeds, pooled_prompt_embeds = self.unet.adapter(prompt_embeds, pooled_prompt_embeds)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/mulankit/models/adapter.py", line 111, in forward
encoder_hidden_states1 = self.adapter1(encoder_hidden_states)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/mulankit/models/adapter.py", line 82, in forward
encoder_hidden_states = self.proj(encoder_hidden_states)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

What can I do to solve this problem?
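One way to narrow this down is to check which device each pipeline component lives on before calling the pipe. The sketch below is a diagnostic only, not a fix from the MuLan authors. It assumes the components are `torch.nn.Module` objects exposing `.parameters()`, that the pipeline uses the usual diffusers attribute names (`text_encoder`, `text_encoder_2`, `unet`, `vae`), and that the adapter is attached as `pipe.unet.adapter`, as the traceback suggests. The helper name `component_devices` is hypothetical.

```python
def _first_param_device(module):
    """Return the device of the module's first parameter as a string."""
    first = next(iter(module.parameters()), None)
    return str(first.device) if first is not None else "no parameters"


def component_devices(pipe, names=("text_encoder", "text_encoder_2", "unet", "vae")):
    """Map each present pipeline component to the device holding its weights.

    Assumption: components follow the torch.nn.Module interface and the
    MuLan adapter (if any) is reachable as `pipe.unet.adapter`.
    """
    devices = {}
    for name in names:
        module = getattr(pipe, name, None)
        if module is not None:  # skip components this pipeline does not have
            devices[name] = _first_param_device(module)

    # The traceback calls `self.unet.adapter(...)`, so report the adapter too.
    adapter = getattr(getattr(pipe, "unet", None), "adapter", None)
    if adapter is not None:
        devices["unet.adapter"] = _first_param_device(adapter)
    return devices


# Usage (with a transformed pipeline):
# print(component_devices(pipe))
```

If the report shows a mix, for example `cpu` for the text encoder and `cuda:0` for `unet.adapter`, moving the lagging component with `.to(...)` so that everything shares one device is the usual remedy for this error.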

How to load a LoRA model after transforming the pipe

Here is the code snippet:

pipe = RegionalDiffusionXLPipeline.from_pretrained(model_id,torch_dtype=torch.float16, use_safetensors=True, variant="fp16").to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config,use_karras_sigmas=True)

mulan_dir = "/****/sdxl_aesthetic.pth"
text_encoder_path = "**/models/InternVL-14B-224px"
pipe = mulankit.transform(pipe, mulan_dir,text_encoder_path=text_encoder_path)

I found that the pipe does not work once I have transformed it into a MuLan pipe; it raises the error:

ValueError: do not know how to get attention modules for: InternVLTextModel

However, when I load the LoRA model first and then transform the pipe, it works.
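The working order reported here (LoRA first, then the MuLan transform) can be sketched as a small helper. This is an illustration under assumptions: `load_lora_then_transform` and `lora_path` are hypothetical names, `pipe.load_lora_weights` is the diffusers LoRA loader that the pipeline is assumed to provide, and `transform` is expected to be `mulankit.transform`, passed in as an argument so the helper itself has no hard dependency.

```python
def load_lora_then_transform(pipe, lora_path, adapter_path, transform, **transform_kwargs):
    """Load LoRA weights into the original pipeline, then apply the MuLan transform."""
    # Step 1: load the LoRA while the pipeline still has its original text encoder.
    pipe.load_lora_weights(lora_path)
    # Step 2: only now swap in the multilingual text encoder and adapter.
    return transform(pipe, adapter_path, **transform_kwargs)


# Usage (assumes `pipe`, `mulan_dir` and `text_encoder_path` from the snippet above,
# and a hypothetical `lora_path` pointing at the LoRA weights):
# import mulankit
# pipe = load_lora_then_transform(pipe, lora_path, mulan_dir, mulankit.transform,
#                                 text_encoder_path=text_encoder_path)
```

Keeping the order inside one function makes it harder to call the transform before the LoRA is loaded, which is the sequence that raised the `ValueError` above.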

Support for smaller text encoders

Currently MuLan internally uses OpenGVLab/InternVL-14B-224px as its default text encoder.
While it's possible to pass a path to any downloadable encoder, which ones did you test?

Note that InternVL-14B-224px is a massive model at 27GB in size and requires ~17GB of VRAM to execute in an FP16 context, which prohibits usage of this library on any normal consumer GPU.
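As a rough cross-check of these figures, weight memory can be estimated as parameter count times bytes per parameter. The sketch below is back-of-the-envelope arithmetic only: it counts weights, ignores activations and framework overhead, and the helper names are hypothetical. The 20M figure is the upper bound the README gives for the language adapter; the 17 GB figure is the FP16 VRAM estimate quoted above.

```python
# Bytes needed to store one parameter in common dtypes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="fp16"):
    """Approximate memory (in GB, 1 GB = 1e9 bytes) needed to hold the weights."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def params_for_memory(memory_gb, dtype="fp16"):
    """Inverse estimate: how many parameters fit in `memory_gb` of weights."""
    return memory_gb * 1e9 / BYTES_PER_PARAM[dtype]


adapter_gb = weight_memory_gb(20_000_000, "fp16")   # adapter upper bound from the README
encoder_params = params_for_memory(17, "fp16")      # implied by ~17 GB in FP16

print(f"adapter weights: {adapter_gb:.2f} GB")
print(f"~17 GB in FP16 holds about {encoder_params / 1e9:.1f}B parameters")
```

Under these assumptions the adapter's weights come to about 0.04 GB in FP16, so the memory cost is dominated by the text encoder rather than the adapter.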

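Great! Thumbs up! A very creative and practically valuable project! The only problem is that the InternVL-14B-224px model is really too large; I suggest switching to a smaller one.

SD12 and SD21 run fine, but their image quality can no longer keep up. As for the model used for SDXL:
adapter_path='../ckpts/adapter/sdxl/sdxl_internvl_unet_transformer_dual_xl_1024_ema_random_drop_9000.pth',

Where can I find it?

Also, the SDXL and PixArt models are good, but an ordinary 24GB GPU is not enough to run them...
I see that InternVL has done research on Stable Cascade. That model produces high-quality images and follows prompts well; could it be integrated?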
