
MVD-Fusion

MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation
Hanzhe Hu*, Zhizhuo Zhou*, Varun Jampani, Shubham Tulsiani
*Equal contribution
CVPR 2024 | GitHub | arXiv | Project page

Given an input RGB image, MVD-Fusion generates multi-view RGB-D images using a depth-guided attention mechanism to enforce multi-view consistency. We visualize the input RGB image (left) and three synthesized novel views (with generated depth in inset).

Shoutouts and Credits

This project is built on top of open-source code. We thank the open-source research community and credit our use of parts of Stable Diffusion, kiuikit, Zero-1-to-3, and SyncDreamer.

Code

Colab demo!


Our code release contains:

  1. Code for inference
  2. Code for training (Coming Soon!)
  3. Pretrained weights on Objaverse

For bugs and issues, please open an issue on GitHub and I will try to address it promptly.

Environment Setup

Please follow the environment setup guide in ENVIRONMENT.md.

Dataset

We provide two evaluation datasets: Google Scanned Objects (GSO) and the SyncDreamer in-the-wild dataset.

  1. (optional) Download GSO evaluation set here and extract it to demo_datasets/gso_eval.
  2. (optional) Download the in-the-wild evaluation set here and extract it to demo_datasets/wild_eval.

Pretrained Weights

MVD-Fusion requires Zero-1-to-3 weights, CLIP ViT weights, and finetuned MVD-Fusion weights.

  1. Find MVD-Fusion weights here and download them to weights/. A full set of weights consists of weights/clip_vit_14.ckpt, weights/mvdfusion_sep23.pt, and weights/zero123_105000_cc.ckpt.

Evaluation

Examples

To run evaluation on the GSO test set, assuming the dataset and model weights have been downloaded according to the instructions above, run demo.py:

$ python demo.py -c configs/mvd_gso.yaml

Flags

-g, --gpus              number of gpus to use (default: 1)
-p, --port              last digit of DDP port (default: 1)
-c, --config            yaml config file

Output

Output artifacts will be saved to demo/ by default.

Training

  • Zero123 weights are required for training (for initialization). Please download them and extract them to weights/zero123_105000.ckpt.

Sample training code is provided in train.py. Please follow the evaluation tutorial above to set up the environment and pretrained weights. We recommend directly modifying configs/mvd_train.yaml to specify the experiment directory and set the training hyperparameters. We show the training flags below and recommend a minimum of 4 GPUs for training.

$ python train.py -c configs/mvd_train.yaml -g 4

Flags

-g, --gpus              number of gpus to use (default: 1)
-p, --port              last digit of DDP port (default: 1)
-b, --backend           distributed data parallel backend (default: nccl)

Using Custom Datasets

To train on a custom dataset, one needs to write a custom dataloader. We describe the required outputs for the __getitem__ function, which should be a dictionary containing:

{
  'images': (B, 3, H, W) image tensor,
  'R': (B, 3, 3) PyTorch3D rotation,
  'T': (B, 3) PyTorch3D translation,
  'f': (B, 2) PyTorch3D focal_length in NDC space,
  'c': (B, 2) PyTorch3D principal_point in NDC space,
}
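As a starting point, the dictionary above can be produced by a dataset like the following minimal sketch. The class name, dummy tensors, and the interpretation of B as the number of views per example are illustrative assumptions; replace them with real images and PyTorch3D camera parameters for your data.

```python
import torch
from torch.utils.data import Dataset

class CustomMVDataset(Dataset):
    """Hypothetical dataset returning the dictionary MVD-Fusion expects.
    All tensors here are dummy placeholders with the required shapes."""

    def __init__(self, num_examples=10, num_views=8, image_size=256):
        self.num_examples = num_examples
        self.num_views = num_views
        self.image_size = image_size

    def __len__(self):
        return self.num_examples

    def __getitem__(self, idx):
        B, H = self.num_views, self.image_size
        return {
            'images': torch.rand(B, 3, H, H),           # RGB images in [0, 1]
            'R': torch.eye(3).expand(B, 3, 3).clone(),  # PyTorch3D rotations
            'T': torch.zeros(B, 3),                     # PyTorch3D translations
            'f': torch.ones(B, 2),                      # focal length (NDC)
            'c': torch.zeros(B, 2),                     # principal point (NDC)
        }

sample = CustomMVDataset()[0]
print(sample['images'].shape)  # torch.Size([8, 3, 256, 256])
```

The dataset can then be wrapped in a standard torch.utils.data.DataLoader; note the B dimension above is per-example, so an additional batch dimension is added by the loader.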

Citation

If you find this work useful, please consider citing:

@inproceedings{hu2024mvdfusion,
    title={MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation}, 
    author={Hanzhe Hu and Zhizhuo Zhou and Varun Jampani and Shubham Tulsiani},
    booktitle={CVPR},
    year={2024}
}

Acknowledgements

We thank Bharath Raj, Jason Y. Zhang, Yufei (Judy) Ye, Yanbo Xu, and Zifan Shi for helpful discussions and feedback. This work is supported in part by NSF GRFP Grant No. (DGE1745016, DGE2140739).


mvdfusion's Issues

How to render image from Objaverse

Hello, I read your paper and saw that you render 16 views at an elevation of 30 degrees, with azimuths linearly spaced across 360 degrees. I am interested in how you render images from the Objaverse .obj files. It would help me a lot if you could share your rendering code. Thank you!
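For reference, the camera layout described above (16 azimuths linearly spaced over 360 degrees at a fixed 30-degree elevation) can be sketched as follows. The radius and coordinate convention are assumptions for illustration, not the authors' exact rendering setup.

```python
import math

def camera_positions(n_views=16, elevation_deg=30.0, radius=1.5):
    """Camera centers on a ring around the object: fixed elevation,
    azimuths linearly spaced across 360 degrees."""
    elev = math.radians(elevation_deg)
    positions = []
    for i in range(n_views):
        azim = 2.0 * math.pi * i / n_views  # linearly spaced azimuth
        x = radius * math.cos(elev) * math.cos(azim)
        y = radius * math.cos(elev) * math.sin(azim)
        z = radius * math.sin(elev)  # constant height above the equator
        positions.append((x, y, z))
    return positions

cams = camera_positions()
print(len(cams))  # 16
```

Each position would then be paired with a look-at rotation toward the object center (e.g. pytorch3d.renderer.look_at_view_transform offers this directly) before rendering.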

torchtext dependency issue

Hi

Thanks for the great work!

I encountered below error

(base) root@39c632041e34:/workspace/code/mvdfusion# python demo.py -c configs/mvd_gso.yaml
Traceback (most recent call last):
  File "demo.py", line 17, in <module>
    from pytorch_lightning import seed_everything
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
    from pytorch_lightning.metrics.utils import deprecated_metrics, void
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/utils.py", line 29, in <module>
    from pytorch_lightning.utilities import rank_zero_deprecation
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
    from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 31, in <module>
    from torchtext.legacy.data import Batch
ModuleNotFoundError: No module named 'torchtext.legacy'

This can be handled by changing the torchtext version to 0.10.x (the current environment has 0.12.0), but when I try to reinstall torchtext, the torch version is automatically downgraded as well.
Is it okay to use an older version of torch?
It would also be helpful if you could share the package versions you used.

Thanks

Where is the training code

Hello, I'm interested in this work, but I can't find the training code in your train.py. When will you release the complete training code? I would be grateful if you could make it public. Thanks!

License

Hi! This is great work! Would love to know what license the repo has.

How can I get multi-view images

Hello, after reading your paper I am very interested in your work. May I ask whether the code provides a command that directly generates multi-view images? After running python demo.py -c configs/mvd_gso.yaml, the output is a GIF. Is there a way to produce the individual images instead?
