TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

[📄 Paper]   [🚩 Project Page]

Teaser figure

Model Architecture

Teaser figure

Introduction

We propose TheaterGen, a tuning-free method for consistent multi-turn image generation. The key idea is to use an LLM for character management, tracking each character's layout and ID, and to customize each character individually to avoid attention leakage. We further propose CMIGBench, a benchmark for evaluating consistency in multi-turn image generation.
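To make the idea of managing characters by ID and layout more concrete, here is a minimal illustrative sketch. It is not the repository's implementation: the `Character` and `CharacterBank` classes, the normalized `(x, y, width, height)` box format, and the rule that an empty description reuses the stored one are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical layout format: (x, y, width, height), normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class Character:
    char_id: int
    description: str
    box: Box


@dataclass
class CharacterBank:
    """Keeps one record per character ID across dialogue turns."""
    characters: Dict[int, Character] = field(default_factory=dict)

    def apply_turn(self, updates: List[Character]) -> None:
        """Add new characters or move existing ones; untouched IDs persist."""
        for char in updates:
            previous = self.characters.get(char.char_id)
            if previous is not None and not char.description:
                # Reuse the stored description so the identity stays fixed.
                char = Character(char.char_id, previous.description, char.box)
            self.characters[char.char_id] = char

    def layout(self) -> Dict[int, Box]:
        """Return the current box of every character, ordered by ID."""
        return {cid: c.box for cid, c in sorted(self.characters.items())}


bank = CharacterBank()
# Turn 1: introduce a character.
bank.apply_turn([Character(1, "a white cat", (0.1, 0.5, 0.3, 0.3))])
# Turn 2: add a second character and move the first one.
bank.apply_turn([
    Character(2, "a brown dog", (0.6, 0.5, 0.3, 0.3)),
    Character(1, "", (0.2, 0.5, 0.3, 0.3)),
])
print(bank.layout())
```

Keying records by ID rather than by prompt text means a later turn can move a character without restating its description.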

TODO

  • Deployment with GPT interface
  • Release Benchmark
  • Release code

🔥 News

  • [2024.04.26] We have released our code and benchmark

Setup

🔧 Requirements

To install requirements:

pip install -r requirements.txt

🚀 Generate

Generate images with CMIGBench, or replace the dataset with your own demo:

python generate.py --task story --sd_version '1.5' --dataset_path CMIGBench

🧪 Evaluate

Prepare the output in the following format:

├── output_dir
|   ├── dialogue 1
|      ├── turn1.png
|      ├── turn2.png
|      ├── turn3.png
|      └── turn4.png
|   ├── dialogue 2
|      ...
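Before running the evaluation scripts, it can help to check that the output directory matches this layout. The sketch below is not part of the repository. The helper names `collect_turns` and `find_gaps` are hypothetical, and it assumes that every subfolder of the output directory is one dialogue and that turns are numbered consecutively from 1.

```python
import re
from pathlib import Path

# Matches the turn file names shown in the layout above, e.g. "turn3.png".
TURN_PATTERN = re.compile(r"^turn(\d+)\.png$")


def collect_turns(output_dir):
    """Map each dialogue folder name to the sorted turn numbers found in it."""
    layout = {}
    for dialogue in sorted(Path(output_dir).iterdir()):
        if not dialogue.is_dir():
            continue
        turns = []
        for image in dialogue.iterdir():
            match = TURN_PATTERN.match(image.name)
            if match:
                turns.append(int(match.group(1)))
        layout[dialogue.name] = sorted(turns)
    return layout


def find_gaps(layout):
    """Return dialogues that are empty or whose turns are not exactly 1..N."""
    return {
        name: turns
        for name, turns in layout.items()
        if not turns or turns != list(range(1, len(turns) + 1))
    }
```

Calling `find_gaps(collect_turns("output_dir"))` returns an empty dictionary when every dialogue folder holds a gap-free sequence of turn images.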

Evaluate the generated results on CMIGBench:

python CMIGBench/eval/eval.py 
python CMIGBench/eval/eval_extra.py 

👀 Contact Us

If you have any questions, please feel free to email us at [email protected].

🌟🌟🌟 (I am an undergraduate student actively seeking opportunities for a Ph.D. program starting in Fall 2025.) 🌟🌟🌟

💡 Acknowledgement

Our work is based on Stable Diffusion, Grounded-SAM, T2I-Adapter, and IP-Adapter. We appreciate their outstanding contributions.

Citation

If you find this code helpful, please consider citing:

@article{cheng2024theatergen,
  title={TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation},
  author={Cheng, Junhao and Yin, Baiqiao and Cai, Kaixin and Huang, Minbin and Li, Hanhui and He, Yuxin and Lu, Xi and Li, Yue and Li, Yifei and Cheng, Yuhao and others},
  journal={arXiv preprint arXiv:2404.18919},
  year={2024}
}

theatergen's Issues

Missing Hugging Face Models and Docker Container Request

When I run run.py, there are three missing Hugging Face models:

1.	/data2/chengjunhao/THEATERGEN/pretrained_models/vae_ft_mse
2.	/data2/chengjunhao/THEATERGEN/pretrained_models/diffusion_1.5_comic
3.	/diffusion_1.5/unet

Is it possible for you to provide the correct Hugging Face links for these models or their weights? Additionally, it would be greatly appreciated if you could provide a Docker container for this setup.

Thank you!

't2i_ckpt' is not defined in generate.py

Hi, really great work!
However, when I try to run generate.py with SDXL, it seems that 't2i_ckpt' at L123 is not defined. Which model should be used for 't2i_ckpt'?

ViT SAM version

Excuse me. There are three versions of SAM (ViT-H, ViT-L, and ViT-B). Which one did you use in your work?

about negative prompt

Thanks for the release. It seems that the negative prompt is also concatenated with the positive prompt, is that right? I could not find any information about the negative prompt in the referenced paper. How should it be handled?
