
DiffCollage: Parallel Generation of Large Content with Diffusion Models



TLDR: DiffCollage is a scalable probabilistic model that synthesizes large content in parallel, including long images, looped motions, and 360 images, using diffusion models trained only on pieces of the content.


DiffCollage

The diffusion models and notation in this code base follow EDM.

  • $\sigma = t$
  • Diffusion forward process follows $x_\sigma = x_0 + \sigma \epsilon$
  • Diffusion training objective $\|x_0 - x_\theta(x_0 + \sigma \epsilon, \sigma)\|^2$ (data prediction model) or $\|\epsilon - \epsilon_\theta(x_0 + \sigma \epsilon, \sigma)\|^2$ (noise prediction model)
  • Conversion between data prediction model and noise prediction model: $\epsilon_\theta(x_\sigma, \sigma) = \frac{x_\sigma - x_\theta(x_\sigma, \sigma)}{\sigma}$
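The notation above can be checked numerically. The NumPy sketch below is illustrative only: `forward_noise`, `x0_to_eps` and `eps_to_x0` are hypothetical helper names, not part of `diff_collage`, and the true $x_0$ stands in for a trained network so that the conversion recovers the noise exactly.

```python
import numpy as np


def forward_noise(x0, sigma, rng):
    """EDM forward process: x_sigma = x0 + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    return x0 + sigma * eps, eps


def x0_to_eps(x_sigma, x0_pred, sigma):
    """Noise prediction from a data prediction: eps = (x_sigma - x0) / sigma."""
    return (x_sigma - x0_pred) / sigma


def eps_to_x0(x_sigma, eps_pred, sigma):
    """Inverse conversion: x0 = x_sigma - sigma * eps."""
    return x_sigma - sigma * eps_pred


rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 8, 8))  # one toy "image" of shape (C, H, W)
sigma = 2.5
x_sigma, eps = forward_noise(x0, sigma, rng)

# With a perfect data prediction (the true x0), the conversion returns the true noise.
assert np.allclose(x0_to_eps(x_sigma, x0, sigma), eps)
assert np.allclose(eps_to_x0(x_sigma, eps, sigma), x0)
```

Because the two conversions are exact inverses, a data prediction model and a noise prediction model carry the same information at every $\sigma > 0$.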

Caveats

Please be aware of the following points when using this software:

  1. Model Conversion:

    • If your model was trained with a diffusion formulation other than EDM (e.g., VP/DDPM), you may need to convert it to the EDM parameterization with a change of variables.
  2. Sampling hyperparameters:

    • We find that stochastic sampling algorithms perform much better than deterministic ones when the number of sampling steps is large.
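Caveat 1 can be made concrete for the common case of a discrete-time VP (DDPM-style) noise prediction model. This is a sketch under assumptions, not part of `diff_collage`: `make_edm_eps_fn`, the `vp_eps_fn(x_t, t)` call signature and the toy oracle are hypothetical, and snapping $\sigma$ to the nearest trained timestep is one simple choice among several. It relies on the identity $x_t / \sqrt{\bar\alpha_t} = x_0 + \sqrt{(1 - \bar\alpha_t)/\bar\alpha_t}\,\epsilon$, so the noise $\epsilon$ is shared by both parameterizations and only the input needs rescaling.

```python
import numpy as np


def make_edm_eps_fn(vp_eps_fn, alphas_cumprod):
    """Wrap a discrete-time VP noise prediction model so it accepts EDM inputs.

    VP:  x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    EDM: x_sigma = x_t / sqrt(abar_t) = x0 + sigma_t * eps,
         with sigma_t = sqrt((1 - abar_t) / abar_t).
    """
    alphas_cumprod = np.asarray(alphas_cumprod, dtype=np.float64)
    vp_sigmas = np.sqrt((1.0 - alphas_cumprod) / alphas_cumprod)

    def edm_eps_fn(x_sigma, sigma):
        # Snap to the trained timestep whose sigma is closest to the requested one.
        t = int(np.argmin(np.abs(vp_sigmas - sigma)))
        # Rescale the input into VP space using that timestep: x_t = sqrt(abar_t) * x_sigma.
        x_t = x_sigma * np.sqrt(alphas_cumprod[t])
        # eps is the same in both parameterizations, so the output is used as-is.
        return vp_eps_fn(x_t, t)

    return edm_eps_fn


# Toy usage: a linear beta schedule and an oracle VP model that knows x0.
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 8, 8))


def toy_vp_eps_fn(x_t, t):
    # Exact VP noise for this x0: eps = (x_t - sqrt(abar) * x0) / sqrt(1 - abar).
    a = alphas_cumprod[t]
    return (x_t - np.sqrt(a) * x0) / np.sqrt(1.0 - a)


edm_eps = make_edm_eps_fn(toy_vp_eps_fn, alphas_cumprod)
sigma = float(np.sqrt((1.0 - alphas_cumprod[500]) / alphas_cumprod[500]))
eps = rng.standard_normal(x0.shape)
x_sigma = x0 + sigma * eps  # EDM forward process
assert np.allclose(edm_eps(x_sigma, sigma), eps)
```

For a $\sigma$ that falls between trained timesteps, the snapped timestep introduces a small mismatch; interpolating between neighbouring timesteps, or using a model conditioned on continuous time, avoids it.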

Usage

import diff_collage as dc

def test(eps_fn, s_churn=10.0):
    # eps_fn: noise prediction model trained on single square images
    # s_churn: amount of stochasticity injected during sampling (cf. EDM's S_churn)
    n_step = 40  # number of sampling steps
    overlap_size = 32  # overlap (in pixels) between neighboring square images
    num_img = 11  # number of square images stitched into the long image
    batch_size = 5
    ts_order = 5  # order of the sampling time-step schedule
    img_shape = (3, 64, 64)  # shape of one square image (C, H, W)

    # sampling with the conditional independence assumption
    worker = dc.condind_long.CondIndLong(img_shape, eps_fn, num_img, overlap_size=overlap_size)
    sample_condind = dc.sampling(
        x=worker.generate_xT(batch_size),
        noise_fn=worker.noise,
        rev_ts=worker.rev_ts(n_step, ts_order),
        x0_pred_fn=worker.x0_fn,
        s_churn=s_churn,
        is_traj=False,  # whether to return the sampling trajectory
    )

    # sampling with the average noise method
    worker = dc.AvgLong(img_shape, eps_fn, num_img, overlap_size=overlap_size)
    sample_avg = dc.sampling(
        x=worker.generate_xT(batch_size),
        noise_fn=worker.noise,
        rev_ts=worker.rev_ts(n_step, ts_order),
        x0_pred_fn=worker.x0_fn,
        s_churn=s_churn,
        is_traj=False,  # whether to return the sampling trajectory
    )

    # keep both results instead of overwriting the first one
    return sample_condind, sample_avg
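The two workers differ in how per-image predictions are merged into one long image. The NumPy sketch below is a hypothetical illustration, not the internals of `CondIndLong` or `AvgLong`: all function names are invented, and it assumes a separate model `eps_overlap_fn` for the overlap strips, which the library may obtain differently. The averaging rule is our reading of "average noise": denoise each window and average predictions where windows overlap. The conditional independence rule follows the chain factorization $p(x) = \prod_i p(\text{window}_i) / \prod_i p(\text{overlap}_i)$; because $\epsilon = -\sigma \nabla \log p_\sigma$, the scores and therefore the noise predictions combine with the same signs.

```python
import numpy as np


def window_starts(num_img, width, overlap):
    """Left edge of each square window along the long (last) axis."""
    stride = width - overlap
    return [i * stride for i in range(num_img)]


def long_width(num_img, width, overlap):
    """Width of the long image built from num_img overlapping windows."""
    return num_img * width - (num_img - 1) * overlap


def avg_eps(x, sigma, eps_fn, num_img, width, overlap):
    """Average-noise rule: predict per window, then average over overlaps."""
    eps = np.zeros_like(x)
    count = np.zeros(x.shape[-1])
    for s in window_starts(num_img, width, overlap):
        eps[..., s:s + width] += eps_fn(x[..., s:s + width], sigma)
        count[s:s + width] += 1
    return eps / count  # count broadcasts over the last axis


def condind_eps(x, sigma, eps_fn, eps_overlap_fn, num_img, width, overlap):
    """Chain factor-graph rule: sum of window predictions minus overlap predictions."""
    assert 2 * overlap <= width, "each column may be covered by at most two windows"
    eps = np.zeros_like(x)
    starts = window_starts(num_img, width, overlap)
    for s in starts:
        eps[..., s:s + width] += eps_fn(x[..., s:s + width], sigma)
    # The overlap of window i and window i + 1 begins where window i + 1 begins.
    for s in starts[1:]:
        eps[..., s:s + overlap] -= eps_overlap_fn(x[..., s:s + overlap], sigma)
    return eps


def pointwise_eps(z, sigma):
    # Toy model that acts on each pixel independently, so any consistent merge
    # rule must reproduce it exactly on the long image.
    return np.tanh(z) / sigma


num_img, width, overlap = 11, 64, 32  # values from the usage example above
L = long_width(num_img, width, overlap)
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 64, L))  # (batch, C, H, long width)
sigma = 1.5
ref = pointwise_eps(x, sigma)
assert np.allclose(avg_eps(x, sigma, pointwise_eps, num_img, width, overlap), ref)
assert np.allclose(
    condind_eps(x, sigma, pointwise_eps, pointwise_eps, num_img, width, overlap), ref
)
```

In each overlap strip the conditional independence rule adds two window predictions and subtracts one overlap prediction, so every column keeps a net weight of one. A data prediction for the long image then follows from $x_0 = x_\sigma - \sigma \epsilon$.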

Application: text-to-motion generation

This repo provides demo code for looped motion generation based on pretrained Human Motion Diffusion Model (MDM) checkpoints.

Setup environment

sudo apt update
sudo apt install ffmpeg

conda env create -f environment.yml
conda activate mdm
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git

bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh

Download checkpoint

cd save
gdown "https://drive.google.com/u/0/uc?id=1PE0PK8e5a5j-7-Xhs5YET5U5pGh0c821&export=download&confirm=t"
unzip humanml_trans_enc_512.zip
cd ..

# sanity check
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --text_prompt "the person walked forward and is picking up his toolbox."

Reference

@inproceedings{zhange2023diffcollage,
    title={DiffCollage: Parallel Generation of Large Content with Diffusion Models},
    author={Qinsheng Zhang and Jiaming Song and Xun Huang and Yongxin Chen and Ming-Yu Liu},
    booktitle={CVPR},
    year={2023}
}
