Official repo for MotionChain

MotionChain: Conversational Motion Controllers via Multimodal Prompts

Arxiv Paper • Demo • FAQ • Citation

Intro MotionChain

MotionChain is a unified vision-motion-language generative pre-trained model that uses language models to perform conversational generation tasks from multi-modal inputs.

Technical details

The advent of large language models, which enable flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly models that comprehensively handle 3D shapes together with other modalities, are still under-explored. By achieving instruction-based shape generation, versatile multimodal generative shape models can significantly benefit fields such as 3D virtual construction and network-aided design. In this work, we present ShapeGPT, a shape-included multi-modal framework that leverages strong pre-trained language models to address multiple shape-relevant tasks. Specifically, ShapeGPT employs a word-sentence-paragraph framework that discretizes continuous shapes into shape words, assembles these words into shape sentences, and integrates shapes with instructional text into multi-modal paragraphs. To learn this shape-language model, we use a three-stage training scheme, comprising shape representation, multimodal alignment, and instruction-based generation, to align shape-language codebooks and learn the intricate correlations among these modalities. Extensive experiments demonstrate that ShapeGPT achieves comparable performance across shape-relevant tasks, including text-to-shape, shape-to-text, shape completion, and shape editing.
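The word-sentence-paragraph idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`quantize`, `to_sentence`, `to_paragraph`), the `<shape_i>` token format, and the hand-written 2D codebook are all assumptions made for illustration. It only shows nearest-neighbor codebook lookup, one common way to turn continuous features into discrete tokens that can be mixed with text.

```python
import math

def quantize(features, codebook):
    """Map each continuous feature vector to the index of its nearest codebook entry."""
    tokens = []
    for vec in features:
        # Euclidean distance from this feature to every codebook entry.
        distances = [math.dist(vec, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens

def to_sentence(tokens, prefix="shape"):
    """Render token indices as a sequence of discrete 'words' (hypothetical format)."""
    return " ".join(f"<{prefix}_{t}>" for t in tokens)

def to_paragraph(instruction, sentence):
    """Combine instruction text and a token sentence into one multi-modal prompt."""
    return f"{instruction} {sentence}"

# Toy 2D codebook and features; a real model would learn its codebook from data.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
features = [(0.1, -0.1), (0.9, 0.2), (0.8, 0.9)]

tokens = quantize(features, codebook)
print(to_paragraph("Describe this shape:", to_sentence(tokens)))
```

A language model can then treat the resulting token sentence like ordinary text, which is what lets one model handle several tasks through instructions.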

pipeline

🚩 News

  • [2024/04/02] Upload paper and init project 🔥🔥🔥

⚡ Quick Start

▶️ Demo

👀 Visualization

⚠️ FAQ

Question-and-Answer

📖 Citation

If you find our code or paper helpful, please consider citing:

@misc{jiang2024motionchain,
      title={MotionChain: Conversational Motion Controllers via Multimodal Prompts},
      author={Biao Jiang and Xin Chen and Chi Zhang and Fukun Yin and Zhuoyuan Li and Gang YU and Jiayuan Fan},
      year={2024},
      eprint={2404.01700},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgments

Thanks to BEDLAM, TMR, vector-quantize-pytorch, Motion-GPT, Motion-latent-diffusion, T2m-gpt, TEMOS, ACTOR, HumanML3D and joints2smpl; our code partially borrows from them.

License

This code is distributed under an MIT LICENSE.

Note that our code depends on other libraries, including SMPL, SMPL-X, and PyTorch3D, and uses datasets that each have their own licenses, which must also be followed.
