Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
Official PyTorch Implementation

arXiv | Project Page

This repo contains pre-trained weights and sampling code for our paper on image animation with motion diffusion models (Cinemo). You can find more visualizations on our project page.

In this project, we propose Cinemo, a novel method for motion-controllable image animation with strong consistency and smoothness. To improve motion smoothness, Cinemo learns the distribution of motion residuals rather than directly generating subsequent frames. To control motion intensity, we propose a method based on the structural similarity index (SSIM). To ensure temporal consistency, we propose a noise refinement technique based on the discrete cosine transform (DCT). Together, these three methods help Cinemo generate highly consistent, smooth, and motion-controlled image animation results. Compared to previous methods, Cinemo offers simpler and more precise user control and better generative performance.
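The three ideas above can be illustrated with a small, dependency-free sketch. This is not the code from this repo: frames are flattened grayscale lists, the function names (`motion_residuals`, `motion_intensity`, `refine_noise`) are hypothetical, mapping intensity to `1 - SSIM` is an illustrative assumption, and the DCT mixing operates on a 1-D signal rather than on the latents the model actually works with.

```python
import math
from statistics import fmean


def motion_residuals(frames):
    """Subtract the first (input) frame from every later frame, pixel by pixel."""
    first = frames[0]
    return [[p - q for p, q in zip(frame, first)] for frame in frames[1:]]


def global_ssim(x, y, data_range=1.0):
    """SSIM computed over one window covering the whole flattened frame."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean([(p - mx) ** 2 for p in x])
    vy = fmean([(q - my) ** 2 for q in y])
    cov = fmean([(p - mx) * (q - my) for p, q in zip(x, y)])
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


def motion_intensity(frames):
    """Hypothetical score: 1 - mean SSIM between the first frame and later frames."""
    return 1.0 - fmean([global_ssim(frames[0], f) for f in frames[1:]])


def dct(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def refine_noise(image_signal, noise, cutoff):
    """Keep the first `cutoff` DCT coefficients of image_signal; take the rest from noise."""
    low, high = dct(image_signal), dct(noise)
    return idct(low[:cutoff] + high[cutoff:])


frames = [[0.1, 0.5, 0.9, 0.3], [0.2, 0.5, 0.8, 0.3], [0.3, 0.5, 0.7, 0.3]]
print(motion_residuals(frames))
print(motion_intensity(frames))
print(refine_noise(frames[0], [0.5, -0.5, 0.25, -0.25], cutoff=2))
```

In this sketch a static clip yields an intensity of about 0, and raising `cutoff` makes the refined noise follow the input signal more closely at low frequencies while the high frequencies stay random.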

News

  • (🔥 New) Jul. 23, 2024. 💥 Our paper is released on arXiv.

  • (🔥 New) Jun. 2, 2024. 💥 The inference code is released. The checkpoint can be found here.

Setup

First, download and set up the repo:

git clone https://github.com/maxin-cn/Cinemo
cd Cinemo

We provide an environment.yml file that can be used to create a Conda environment. If you only want to run pre-trained models locally on CPU, you can remove the cudatoolkit and pytorch-cuda requirements from the file.

conda env create -f environment.yml
conda activate cinemo
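After activating the environment, a quick check like the one below can tell you whether PyTorch sees a GPU or whether you are on the CPU-only path described above. It is a minimal sketch: `pick_device` is a hypothetical helper, not part of this repo, and it falls back to `"cpu"` when PyTorch cannot be imported.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```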

Animation

You can sample from our pre-trained Cinemo models with animation.py. Weights for our pre-trained Cinemo model can be found here. The script has arguments for adjusting the number of sampling steps, changing the classifier-free guidance scale, and more:

bash pipelines/animation.sh

All related checkpoints are downloaded automatically, after which you will get results like the following. Each example pairs an input image with an output video:

  • "People Walking"
  • "Sea Swell"
  • "Girl Dancing under the Stars"
  • "Dragon Glowing Eyes"

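For readers unfamiliar with the classifier-free guidance scale mentioned above, the commonly used formulation combines an unconditional and a conditional noise prediction. The sketch below shows that formula on plain lists. `apply_cfg` is a hypothetical name, and whether animation.py applies guidance in exactly this form is an assumption.

```python
def apply_cfg(eps_uncond, eps_cond, guidance_scale):
    """Move from the unconditional prediction toward the conditional one.

    A scale of 0 returns the unconditional prediction, 1 returns the
    conditional prediction, and larger values extrapolate past it.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


print(apply_cfg([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0))
```

Higher scales usually follow the conditioning more closely at some cost in sample diversity.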
Other Applications

You can also use Cinemo for other applications, such as motion transfer and video editing:

bash pipelines/video_editing.sh

All related checkpoints are downloaded automatically. Each result shows the input video, its first frame, the edited first frame, and the output video.

Citation

If you find this work useful for your research, please consider citing it.

@article{ma2024cinemo,
  title={Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models},
  author={Ma, Xin and Wang, Yaohui and Jia, Gengyun and Chen, Xinyuan and Li, Yuan-Fang and Chen, Cunjian and Qiao, Yu},
  journal={arXiv preprint arXiv:2407.15642},
  year={2024}
}

Acknowledgments

Cinemo has been greatly inspired by the following amazing works and teams: LaVie and SEINE. We thank all the contributors for open-sourcing their work.

License

The code and model weights are licensed under the terms of the repository's LICENSE file.
