
MVD-Fusion

MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation
Hanzhe Hu*, Zhizhuo Zhou*, Varun Jampani, Shubham Tulsiani
*Equal contribution
CVPR 2024 | GitHub | arXiv | Project page

Given an input RGB image, MVD-Fusion generates multi-view RGB-D images using a depth-guided attention mechanism to enforce multi-view consistency. We visualize the input RGB image (left) and three synthesized novel views (with generated depth in inset).

Shoutouts and Credits

This project is built on top of open-source code. We thank the open-source research community and credit our use of parts of Stable Diffusion, kiuikit, Zero-1-to-3, and SyncDreamer.

Code

Colab demo!


Our code release contains:

  1. Code for inference
  2. Code for training (Coming Soon!)
  3. Pretrained weights on Objaverse

For bugs and issues, please open an issue on GitHub and I will try to address it promptly.

Environment Setup

Please follow the environment setup guide in ENVIRONMENT.md.

Dataset

We provide two evaluation datasets: Google Scanned Objects (GSO) and the SyncDreamer in-the-wild dataset.

  1. (optional) Download GSO evaluation set here and extract it to demo_datasets/gso_eval.
  2. (optional) Download the in-the-wild evaluation set here and extract it to demo_datasets/wild_eval.

Pretrained Weights

MVD-Fusion requires Zero-1-to-3 weights, CLIP ViT weights, and finetuned MVD-Fusion weights.

  1. Find MVD-Fusion weights here and download them to weights/. A full set of weights consists of weights/clip_vit_14.ckpt, weights/mvdfusion_sep23.pt, and weights/zero123_105000_cc.ckpt.

Evaluation

Examples

To run evaluation on the GSO test set, assuming the dataset and model weights have been downloaded according to the instructions above, run demo.py:

$ python demo.py -c configs/mvd_gso.yaml

Flags

-g, --gpus              number of gpus to use (default: 1)
-p, --port              last digit of DDP port (default: 1)
-c, --config            yaml config file

Output

Output artifacts will be saved to demo/ by default.

Training

  • Zero123 weights are required for training (for initialization). Please download them and extract them to weights/zero123_105000.ckpt.

Sample training code is provided in train.py. Please follow the evaluation tutorial above to set up the environment and pretrained weights. We recommend directly modifying configs/mvd_train.yaml to specify the experiment directory and set the training hyperparameters. We show the training flags below and recommend a minimum of 4 GPUs for training.

$ python train.py -c configs/mvd_train.yaml -g 4

Flags

-g, --gpus              number of gpus to use (default: 1)
-p, --port              last digit of DDP port (default: 1)
-b, --backend           distributed data parallel backend (default: nccl)

Using Custom Datasets

To train on a custom dataset, one needs to write a custom dataloader. We describe the required outputs for the __getitem__ function, which should be a dictionary containing:

{
  'images': (B, 3, H, W) image tensor,
  'R': (B, 3, 3) PyTorch3D rotation,
  'T': (B, 3) PyTorch3D translation,
  'f': (B, 2) PyTorch3D focal_length in NDC space,
  'c': (B, 2) PyTorch3D principal_point in NDC space,
}
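As a starting point, the dictionary above can be produced by a dataset like the following minimal sketch. The class name, dummy tensors, and the interpretation of B as the number of views per example are illustrative assumptions; replace them with real images and PyTorch3D camera parameters for your data.

```python
import torch
from torch.utils.data import Dataset

class CustomMVDataset(Dataset):
    """Hypothetical dataset returning the dictionary MVD-Fusion expects.
    All tensors here are dummy placeholders with the required shapes."""

    def __init__(self, num_examples=10, num_views=8, image_size=256):
        self.num_examples = num_examples
        self.num_views = num_views
        self.image_size = image_size

    def __len__(self):
        return self.num_examples

    def __getitem__(self, idx):
        B, H = self.num_views, self.image_size
        return {
            'images': torch.rand(B, 3, H, H),           # RGB images in [0, 1]
            'R': torch.eye(3).expand(B, 3, 3).clone(),  # PyTorch3D rotations
            'T': torch.zeros(B, 3),                     # PyTorch3D translations
            'f': torch.ones(B, 2),                      # focal length (NDC)
            'c': torch.zeros(B, 2),                     # principal point (NDC)
        }

sample = CustomMVDataset()[0]
print(sample['images'].shape)  # torch.Size([8, 3, 256, 256])
```

The dataset can then be wrapped in a standard torch.utils.data.DataLoader; note the B dimension above is per-example, so an additional batch dimension is added by the loader.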

Citation

If you find this work useful, please consider citing:

@inproceedings{hu2024mvdfusion,
    title={MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation}, 
    author={Hanzhe Hu and Zhizhuo Zhou and Varun Jampani and Shubham Tulsiani},
    booktitle={CVPR},
    year={2024}
}

Acknowledgements

We thank Bharath Raj, Jason Y. Zhang, Yufei (Judy) Ye, Yanbo Xu, and Zifan Shi for helpful discussions and feedback. This work is supported in part by NSF GRFP Grant No. (DGE1745016, DGE2140739).


mvdfusion's Issues

How to render image from Objaverse

Hello, I read your paper and saw that you render 16 views at an elevation of 30 degrees, with azimuths linearly spaced across 360 degrees. I am interested in how you render images from the Objaverse .obj files. It would help me a lot if you could share your rendering code. Thank you!
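For reference, the camera layout described above (16 azimuths linearly spaced over 360 degrees at a fixed 30-degree elevation) can be sketched as follows. The radius and coordinate convention are assumptions for illustration, not the authors' exact rendering setup.

```python
import math

def camera_positions(n_views=16, elevation_deg=30.0, radius=1.5):
    """Camera centers on a ring around the object: fixed elevation,
    azimuths linearly spaced across 360 degrees."""
    elev = math.radians(elevation_deg)
    positions = []
    for i in range(n_views):
        azim = 2.0 * math.pi * i / n_views  # linearly spaced azimuth
        x = radius * math.cos(elev) * math.cos(azim)
        y = radius * math.cos(elev) * math.sin(azim)
        z = radius * math.sin(elev)  # constant height above the equator
        positions.append((x, y, z))
    return positions

cams = camera_positions()
print(len(cams))  # 16
```

Each position would then be paired with a look-at rotation toward the object center (e.g. pytorch3d.renderer.look_at_view_transform offers this directly) before rendering.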

torchtext dependency issue

Hi

Thanks for the great work!

I encountered below error

(base) root@39c632041e34:/workspace/code/mvdfusion# python demo.py -c configs/mvd_gso.yaml
Traceback (most recent call last):
  File "demo.py", line 17, in <module>
    from pytorch_lightning import seed_everything
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
    from pytorch_lightning.metrics.utils import deprecated_metrics, void
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/utils.py", line 29, in <module>
    from pytorch_lightning.utilities import rank_zero_deprecation
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
    from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/utilities/apply_func.py", line 31, in <module>
    from torchtext.legacy.data import Batch
ModuleNotFoundError: No module named 'torchtext.legacy'

This can be handled by changing the torchtext version to 0.10.x (the current environment has 0.12.0), but when I try to reinstall torchtext, the torch version is automatically downgraded as well.
Is it okay to use an older version of torch?
It would also be helpful if you could share the package versions you used.

Thanks

Where is the training code

Hello, I'm interested in this work, but I can't find the training code in your train.py. When will you release the complete training code? I would be grateful if you could make it public. Thanks!

License

Hi! This is great work! Would love to know what license the repo has.

How can I get multi-view images

Hello, after reading your paper I am very interested in your work. May I ask whether the code provides a command that directly generates multi-view images? After running python demo.py -c configs/mvd_gso.yaml, the output is a GIF. Is there a way to produce the individual images instead?
