ReVersion: Diffusion-Based Relation Inversion from Images

This repository contains the implementation of the following paper:

ReVersion: Diffusion-Based Relation Inversion from Images
Ziqi Huang∗, Tianxing Wu∗, Yuming Jiang, Kelvin C.K. Chan, Ziwei Liu

From MMLab@NTU affiliated with S-Lab, Nanyang Technological University

[Paper] | [Project Page] | [Video] | [Dataset]

Overview

[Figure: overall_structure]

We propose a new task, Relation Inversion: Given a few exemplar images, where a relation co-exists in every image, we aim to find a relation prompt <R> to capture this interaction, and apply the relation to new entities to synthesize new scenes. The above images are generated by our ReVersion framework.

Installation

  1. Clone Repo

    git clone https://github.com/ziqihuangg/ReVersion
    cd ReVersion
  2. Create Conda Environment and Install Dependencies

    conda create -n reversion
    conda activate reversion
    conda install python=3.8 pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch
    pip install "diffusers[torch]"
    pip install -r requirements.txt
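
After installation, a quick check that the key packages are importable can save a failed training run. The sketch below is not part of the repository: the helper name `check_packages` is hypothetical, and the package list simply mirrors the install commands above.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Packages named in the install commands above.
    status = check_packages(["torch", "torchvision", "diffusers"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated `reversion` environment; any package reported as MISSING points to an install step that did not complete.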

Usage

Relation Inversion

Given a set of exemplar images and their entities' coarse descriptions, you can optimize a relation prompt <R> to capture the co-existing relation in these images, namely Relation Inversion.

  1. Prepare the exemplar images (e.g., 0.jpg - 9.jpg) and coarse descriptions (text.json), and put them inside a folder. Feel free to use our ReVersion benchmark, or you can also prepare your own images. An example from our ReVersion benchmark is as follows:

    ./reversion_benchmark_v1
    ├── painted_on
    │   ├── 0.jpg
    │   ├── 1.jpg
    │   ├── 2.jpg
    │   ├── 3.jpg
    │   ├── 4.jpg
    │   ├── 5.jpg
    │   ├── 6.jpg
    │   ├── 7.jpg
    │   ├── 8.jpg
    │   ├── 9.jpg
    │   └── text.json
    
  2. Taking the relation painted_on as an example, you can start training with this script:

    accelerate launch \
        --config_file="./configs/single_gpu.yml" \
        train.py \
        --seed="2023" \
        --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
        --train_data_dir="./reversion_benchmark_v1/painted_on" \
        --placeholder_token="<R>" \
        --initializer_token="and" \
        --train_batch_size="2" \
        --gradient_accumulation_steps="4" \
        --max_train_steps="3000" \
        --learning_rate='2.5e-04' --scale_lr \
        --lr_scheduler="constant" \
        --lr_warmup_steps="0" \
        --output_dir="./experiments/painted_on" \
        --save_steps="1000" \
        --importance_sampling \
        --denoise_loss_weight="1.0" \
        --steer_loss_weight="0.01" \
        --num_positives="4" \
        --temperature="0.07"
    

    Here, train_data_dir is the path to the exemplar images and coarse descriptions, and output_dir is the path where the inverted relation and the experiment logs are saved. To generate relation-specific images, follow the next section, Generation.

Generation

We can use the learned relation prompt <R> to generate relation-specific images with new objects, backgrounds, and style.

  1. You can obtain a learned <R> from Relation Inversion using your customized data. You can also download the models from here, where we provide several pre-trained relation prompts for you to play with.

  2. Put the models (i.e., learned relation prompt <R>) under ./experiments/ as follows:

    ./experiments/
    ├── painted_on
    │   ├── checkpoint-500
    │   ...
    │   └── model_index.json
    ├── carved_by
    │   ├── checkpoint-500
    │   ...
    │   └── model_index.json
    ├── inside
    │   ├── checkpoint-500
    │   ...
    │   └── model_index.json
    ...
    
  3. Taking the relation painted_on as an example, you can either use the following script to generate images from a single prompt, e.g., "cat <R> stone":

    python inference.py \
    --model_id ./experiments/painted_on \
    --prompt "cat <R> stone" \
    --placeholder_string "<R>" \
    --num_samples 10 \
    --guidance_scale 7.5
    

    Or write a list of prompts in ./templates/templates.py under the key name $your_template_name and generate images for every prompt in that list:

    your_template_name='painted_on_examples'
    python inference.py \
    --model_id ./experiments/painted_on \
    --template_name $your_template_name \
    --placeholder_string "<R>" \
    --num_samples 10 \
    --guidance_scale 7.5
    

    Here, model_id is the model directory, num_samples is the number of images to generate for each prompt, and guidance_scale is the classifier-free guidance scale.

    We provide several example templates for each relation in ./templates/templates.py, such as painted_on_examples, carved_by_examples, etc.
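
A template list could look like the sketch below. The list name `my_relation_examples`, its prompts, the `TEMPLATES` registry, and the helper `prompts_for` are all hypothetical. The only assumption taken from the text above is that ./templates/templates.py holds named lists of prompt strings containing the placeholder <R>; the actual layout of that file may differ.

```python
# Hypothetical template list: each prompt contains the placeholder "<R>".
my_relation_examples = [
    "cat <R> stone",
    "dog <R> wall",
    "flower <R> paper",
]

# Hypothetical registry mapping a template name to its prompt list.
TEMPLATES = {"my_relation_examples": my_relation_examples}


def prompts_for(template_name, placeholder="<R>"):
    """Return the prompts for a template, checking each contains the placeholder."""
    prompts = TEMPLATES[template_name]
    missing = [p for p in prompts if placeholder not in p]
    if missing:
        raise ValueError(f"prompts without {placeholder}: {missing}")
    return list(prompts)
```

Checking for the placeholder up front catches prompts that would otherwise be generated without the learned relation.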

Gradio Demo

  • We also provide a Gradio Demo to test our method using a UI. This demo supports relation-specific text-to-image generation on the fly. Running the following command will launch the demo:

    python app_gradio.py
    
  • Alternatively, you can try the online demo here.

Diverse Generation

You can also specify diverse prompts with the relation prompt <R> to generate images of diverse backgrounds and style. For example, your prompt could be "michael jackson <R> wall, in the desert", "cat <R> stone, on the beach", etc.
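
One way to produce such prompts is to combine entity pairs with scene or style suffixes. The snippet below is an illustrative sketch; the function name `diverse_prompts` and the example lists are assumptions, not part of the repository.

```python
from itertools import product


def diverse_prompts(entity_pairs, contexts, placeholder="<R>"):
    """Build 'A <R> B, context' prompts for every pair/context combination."""
    return [
        f"{a} {placeholder} {b}, {context}"
        for (a, b), context in product(entity_pairs, contexts)
    ]


if __name__ == "__main__":
    pairs = [("michael jackson", "wall"), ("cat", "stone")]
    contexts = ["in the desert", "on the beach"]
    for prompt in diverse_prompts(pairs, contexts):
        print(prompt)
```

Each resulting string can be passed to inference.py through the --prompt argument shown in the Generation section.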

[Figure: diverse_results]

The ReVersion Benchmark

The ReVersion Benchmark consists of diverse relations and entities, along with a set of well-defined text descriptions.

  • Relations and Entities. We define ten representative object relations with different abstraction levels, ranging from basic spatial relations (e.g., “on top of”) and entity interactions (e.g., “shakes hands with”) to abstract concepts (e.g., “is carved by”). A wide range of entities, such as animals, humans, and household items, are involved to further increase the diversity of the benchmark.
  • Exemplar Images and Text Descriptions. For each relation, we collect four to ten exemplar images containing different entities. We further annotate several text templates for each exemplar image, describing it at different levels of detail. These training templates can be used for the optimization of the relation prompt.
  • Benchmark Scenarios. We design 100 inference templates composed of different object entities for each of the ten relations.

Citation

If you find our repo useful for your research, please consider citing our paper:

@article{huang2023reversion,
     title={{ReVersion}: Diffusion-Based Relation Inversion from Images},
     author={Huang, Ziqi and Wu, Tianxing and Jiang, Yuming and Chan, Kelvin C.K. and Liu, Ziwei},
     journal={arXiv preprint arXiv:2303.13495},
     year={2023}
}

Acknowledgement

The codebase is maintained by Ziqi Huang and Tianxing Wu.

This project is built using the following open source repositories:
