prolificdreamer's Introduction

ProlificDreamer

Official implementation of ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation, published in NeurIPS 2023 (Spotlight).

Installation

The codebase is built on stable-dreamfusion. For installation:

pip install -r requirements.txt
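
Since the codebase builds on stable-dreamfusion, its CUDA extensions (e.g., the grid encoder in gridencoder/grid.py, which comes up in the issues below) are compiled against your local PyTorch/CUDA setup. A quick sanity check before training; this is a generic snippet, not part of the repo:

import torch

# The CUDA extensions require a CUDA-enabled torch install that matches
# your driver; verify this before the first run.
print(torch.__version__, torch.version.cuda)
assert torch.cuda.is_available(), "CUDA is not available; the extensions will fail to build or run"
print(torch.cuda.get_device_name(0))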

Training

ProlificDreamer includes three stages for high-fidelity text-to-3D generation.

# --------- Stage 1 (NeRF, VSD guidance) --------- #
# This costs approximately 27GB of GPU memory at a rendering resolution of 512x512
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 25000 --lambda_entropy 10 --scale 7.5 --n_particles 1 --h 512  --w 512 --workspace exp-nerf-stage1/
# If you find the result is foggy, you can increase --lambda_entropy. For example:
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 25000 --lambda_entropy 100 --scale 7.5 --n_particles 1 --h 512  --w 512 --workspace exp-nerf-stage1/
# Generate with multiple particles. Notice that generating with multiple particles is only supported in Stage 1.
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 100000 --lambda_entropy 10 --scale 7.5 --n_particles 4 --h 512  --w 512 --t5_iters 20000 --workspace exp-nerf-stage1/

# --------- Stage 2 (Geometry Refinement) --------- #
# This costs <20GB GPU memory
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 15000 --scale 100 --dmtet --mesh_idx 0  --init_ckpt /path/to/stage1/ckpt --normal True --sds True --density_thresh 0.1 --lambda_normal 5000 --workspace exp-dmtet-stage2/
# If the results contain many floaters, you can increase --density_thresh. Note that the value of --density_thresh must be consistent between Stage 2 and Stage 3.
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 15000 --scale 100 --dmtet --mesh_idx 0  --init_ckpt /path/to/stage1/ckpt --normal True --sds True --density_thresh 0.4 --lambda_normal 5000 --workspace exp-dmtet-stage2/

# --------- Stage 3 (Texturing, VSD guidance) --------- #
# texturing with 512x512 rasterization
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 30000 --scale 7.5 --dmtet --mesh_idx 0  --init_ckpt /path/to/stage2/ckpt --density_thresh 0.1 --finetune True --workspace exp-dmtet-stage3/

We also provide a script that runs all three stages automatically.

bash run.sh gpu_id text_prompt

For example,

bash run.sh 0 "A pineapple."

Limitations: (1) Our work utilizes the original Stable Diffusion without any 3D data, so the multi-face Janus problem is prevalent in the results. Utilizing a text-to-image diffusion model that has been finetuned on multi-view images would alleviate this problem. (2) If the results are unsatisfactory, try different seeds. This is especially helpful when the results have good quality but suffer from the multi-face Janus problem.

TODO List

  • Release our code.
  • Combine MVDream with VSD to alleviate the multi-face problem.

Related Links

BibTeX

If you find our work useful for your project, please consider citing the following paper.

@inproceedings{wang2023prolificdreamer,
  title={ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation},
  author={Zhengyi Wang and Cheng Lu and Yikai Wang and Fan Bao and Chongxuan Li and Hang Su and Jun Zhu},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2023}
}

prolificdreamer's People

Contributors

luchengthu, thuwzy, yikaiw


prolificdreamer's Issues

GPU

What kind of GPU does the project need? And how long do the three stages take to run?

Export mesh + texture

Thanks for your great work!

My goal is to import highly detailed meshes and textures into 3D software such as Blender or Unreal Engine for rendering.

How do I export the mesh and textures (diffuse, specular, normal)?

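Not an official export path, but since requirements.txt lists xatlas, trimesh, and pymeshlab "for dmtet and mesh export", one plausible route is to dump the DMTet mesh data and write a textured OBJ with trimesh. Every array and file name below is a hypothetical placeholder:

import numpy as np
import trimesh
from PIL import Image

# Hypothetical dumps of the Stage 3 mesh; the actual attribute names in the
# repo's renderer may differ.
verts = np.load("verts.npy")        # (V, 3) float32 vertex positions
faces = np.load("faces.npy")        # (F, 3) int64 triangle indices
uvs = np.load("uvs.npy")            # (V, 2) float32 UV coordinates
albedo = Image.open("albedo.png")   # baked diffuse texture

mesh = trimesh.Trimesh(vertices=verts, faces=faces, process=False)
mesh.visual = trimesh.visual.TextureVisuals(uv=uvs, image=albedo)
mesh.export("mesh.obj")             # writes .obj + .mtl + texture for Blender/Unreal

Specular and normal maps would need to be baked separately; OBJ/MTL only carries them as additional texture slots.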

Reproducing results from 2D Experiments

First of all, thank you for making this wonderful work public!

I am currently trying to reproduce the 2D experiments on generating images with VSD, following the implementation details provided in Appendix G, but I have failed to reach the quality reported in the paper.

Would you be able to release the 2D experiment code?
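
In the meantime, here is a schematic of what a 2D VSD step looks like under the paper's formulation. This is an illustration with hypothetical eps_pretrained/eps_lora noise-prediction callables (text conditioning, CFG, and the w(t) weighting are omitted), not the authors' code:

import torch
import torch.nn.functional as F

# `particles` is an (N, C, H, W) leaf tensor with requires_grad=True,
# e.g. torch.randn(N, 4, 64, 64, requires_grad=True) in SD latent space,
# with opt_particles = torch.optim.Adam([particles], lr=...).
betas = torch.linspace(1e-4, 0.02, 1000)       # standard DDPM beta schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product of (1 - beta_t)

def vsd_step(particles, eps_pretrained, eps_lora, opt_particles, opt_lora):
    n = particles.shape[0]
    t = torch.randint(20, 980, (n,), device=particles.device)
    a = alpha_bar.to(particles.device)[t].view(n, 1, 1, 1)
    noise = torch.randn_like(particles)
    x_t = a.sqrt() * particles.detach() + (1.0 - a).sqrt() * noise

    # (1) Particle update: inject eps_pretrained - eps_lora as the gradient;
    #     neither diffusion model is backpropagated through.
    with torch.no_grad():
        grad = eps_pretrained(x_t, t) - eps_lora(x_t, t)
    opt_particles.zero_grad()
    particles.grad = grad
    opt_particles.step()

    # (2) LoRA update: ordinary denoising loss on the current particles, so
    #     eps_lora tracks the score of the particle distribution.
    opt_lora.zero_grad()
    F.mse_loss(eps_lora(x_t, t), noise).backward()
    opt_lora.step()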

[Question] Does it make sense to swap the VSD loss and the LoRA loss?

I'm curious whether it makes sense to swap the VSD loss and the LoRA loss in the paper (i.e., swap the following two red boxes). Intuitively it feels more natural: the LoRA extracts information from SD, and one could then use the LoRA, which is more consistent than the original SD, to guide the NeRF. I tried to implement this with the threestudio framework but failed. I wonder if it's my bug or if it really does not work. Just curious if anyone has tried?
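
For reference, a schematic contrast (reusing the hypothetical eps_* callables from the sketch above), not a statement about the repo's code:

# Standard VSD: the pretrained model pulls toward the text-conditioned data
# distribution; eps_lora subtracts the score of the current particle distribution.
grad_vsd = eps_pretrained(x_t, t) - eps_lora(x_t, t)

# One reading of the proposed swap: let the LoRA supply the guiding score instead.
grad_swapped = eps_lora(x_t, t) - eps_pretrained(x_t, t)

# In the paper's derivation eps_lora is an estimate of the particles' own score,
# so swapping the roles changes the objective itself, not just the implementation.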


There are no result files in stage 3

Hello! I am asking about an issue when executing your code: Stage 3 finishes as if nothing happened, and no result files are produced.

I made three main customizations:

  1. Modified requirements.txt due to a _gridencoder issue.
    There was a C++17 version issue; someone else's solution was to modify requirements.txt, which I did:
    requirements_custom.txt
tqdm
rich
ninja
numpy
pandas
scipy
scikit-learn
matplotlib
opencv-python
imageio
imageio-ffmpeg
torch==2.0.1
torchvision==0.15.2
torchaudio==2.0.2
torch-ema
einops
tensorboard
tensorboardX



# for gui
dearpygui

# for stable-diffusion
huggingface_hub
diffusers == 0.15.0
accelerate
transformers

# for dmtet and mesh export
xatlas
trimesh
PyMCubes
pymeshlab
git+https://github.com/NVlabs/nvdiffrast/

# for zero123
carvekit-colab
omegaconf
pytorch-lightning
taming-transformers-rom1504
kornia
git+https://github.com/openai/CLIP.git

# for omnidata
gdown

# for dpt
timm

# for remote debugging
debugpy-run

# for deepfloyd if
sentencepiece
  2. Adjusted numerical values such as maximum iterations 300, base iterations 100, etc.
    It takes about 10 minutes per epoch (A40), and at roughly 150 epochs to complete one stage that would take far too long, so I reduced the iteration counts.

  3. Fixed the hard-coded checkpoint paths accordingly.

#!/bin/bash

gpu=$1
prompt=$2

echo "CUDA:$gpu, Prompt: $prompt"

filename=$(echo "$prompt" | sed 's/ /-/g')
n_particles=1

CUDA_VISIBLE_DEVICES=$gpu python main.py --text "$prompt" --iters 300 --lambda_entropy 10 --scale 7.5 --n_particles $n_particles --h 512 --w 512 --t5_iters 5000 --per_iter 100 --workspace exp-nerf-stage1/

# Find the latest checkpoint file in exp-nerf-stage1
recent_ckpt_stage1=$(find exp-nerf-stage1 -type d -name "*$filename*" -exec bash -c 'ls -t "$0"/checkpoints/*.pth 2>/dev/null' {} \; | head -n 1)

CUDA_VISIBLE_DEVICES=$gpu python main.py --text "$prompt" --iters 200 --scale 100 --dmtet --mesh_idx 0 --init_ckpt "$recent_ckpt_stage1" --normal True --sds True --density_thresh 0.1 --lambda_normal 5000 --per_iter 100 --workspace exp-dmtet-stage2/

# Find the latest checkpoint file in exp-dmtet-stage2
recent_ckpt_stage2=$(find exp-dmtet-stage2 -type d -name "*$filename*" -exec bash -c 'ls -t "$0"/checkpoints/*.pth 2>/dev/null' {} \; | head -n 1)

CUDA_VISIBLE_DEVICES=$gpu python main.py --text "$prompt" --iters 400 --scale 7.5 --dmtet --mesh_idx 0 --init_ckpt "$recent_ckpt_stage2" --density_thresh 0.1 --finetune True --per_iter 100 --workspace exp-dmtet-stage3/

Here are my stage output files; you can download them without logging in:
https://www.dropbox.com/scl/fo/e0pqedb2us6l394me58jv/h?rlkey=1m28i18cl54kni4cmmvopbai5&dl=0

The result doesn't look good in stage 1

The commands I run are as follows:

cd ./prolificdreamer-main
CUDA_VISIBLE_DEVICES=0 python main.py --text "Albert Einstein is playing the guitar." --iters 25000 --lambda_entropy 100 --scale 7.5 --n_particles 1 --h 512  --w 512 --workspace exp-nerf-stage1/”

But the output video (df_ep0250_00_textureless_rgb.mp4) seems meaningless. What's the reason?

Some questions about the NeRF optimization

Hello, I'd like to ask: in sd.py, grad is passed to latents, but after self.scaler.scale(loss).backward() is executed, the value of latents still does not change. What exactly is VSD optimizing?
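
A hedged note on the mechanism: in stable-dreamfusion-style codebases (which this repo builds on), the guidance gradient is typically injected with a custom autograd Function, so latents is never updated in place; the gradient flows back through the renderer into the NeRF parameters, which are what the optimizer changes. A minimal sketch of this pattern (not necessarily the exact sd.py code):

import torch

class SpecifyGradient(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_tensor, gt_grad):
        ctx.save_for_backward(gt_grad)
        # Dummy scalar loss; its value is irrelevant.
        return torch.zeros(1, device=input_tensor.device, dtype=input_tensor.dtype)

    @staticmethod
    def backward(ctx, grad_scale):
        gt_grad, = ctx.saved_tensors
        # Hand the precomputed guidance gradient to the latents; autograd then
        # propagates it through the renderer into the NeRF weights.
        return gt_grad * grad_scale, None

# loss = SpecifyGradient.apply(latents, grad); loss.backward() therefore updates
# the NeRF parameters. `latents` is an intermediate activation, not a leaf being
# optimized, so its tensor value never changes in place.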

A question about this model

Hello, I'd like to ask: when training the particles, if n_particles=4, is that equivalent to training four NeRFs? And how are they trained together?
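
I have not verified this repo's exact implementation, but in the paper VSD with n particles maintains n independent NeRFs and, at each iteration, samples one particle to render and update, while a single shared LoRA network is fit on renders from all particles. A schematic with hypothetical render/loss methods:

import random

def train_iteration(nerfs, nerf_opts, lora, lora_opt, guidance, camera):
    # Pick one of the n_particles NeRFs per iteration (random or round-robin).
    i = random.randrange(len(nerfs))
    image = nerfs[i].render(camera)           # hypothetical render call
    loss = guidance.vsd_loss(image, lora)     # VSD gradient vs. the shared LoRA score
    nerf_opts[i].zero_grad(); loss.backward(); nerf_opts[i].step()

    # The shared LoRA is trained with the ordinary denoising loss on the render.
    lora_loss = guidance.denoising_loss(image.detach(), lora)
    lora_opt.zero_grad(); lora_loss.backward(); lora_opt.step()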

Ran ./run.sh 0 "A pineapple." but it failed

File "/data1/git-repo/prolificdreamer/gridencoder/grid.py", line 54, in forward
_backend.grid_encode_forward(inputs, embeddings, offsets, outputs, B, D, C, L, S, H, dy_dx, gridtype, align_corners, interpolation)
TypeError: grid_encode_forward(): incompatible function arguments. The following argument types are supported:
1. (arg0: at::Tensor, arg1: at::Tensor, arg2: at::Tensor, arg3: at::Tensor, arg4: int, arg5: int, arg6: int, arg7: int, arg8: int, arg9: float, arg10: int, arg11: Optional[at::Tensor], arg12: int, arg13: bool, arg14: int) -> None

RuntimeError: instance mode - pos must have shape [>0, >0, 4]

When I run the Stage 2 command, I get the following error. How can I solve it?

command:
CUDA_VISIBLE_DEVICES=1 python main.py --text "A pineapple." --iters 15000 --scale 100 --dmtet --mesh_idx 0 --init_ckpt exp-nerf-stage1/2023-12-07-A-pineapple.-scale-7.5-lr-0.001-albedo-le-10.0-render-512-cube-sd-2.1-5000-tet-256/checkpoints/df_ep0020.pth --normal True --sds True --density_thresh 0.1 --lambda_normal 5000 --workspace exp-dmtet-stage2/

output:
Traceback (most recent call last):
  File "main.py", line 282, in <module>
    trainer.train(train_loader, valid_loader, max_epoch)
  File "/data/caishuo/3D_generation/prolificdreamer/nerf/utils.py", line 899, in train
    self.train_one_epoch(train_loader)
  File "/data/caishuo/3D_generation/prolificdreamer/nerf/utils.py", line 1080, in train_one_epoch
    pred_rgbs, pred_depths, loss, pseudo_loss, latents, shading = self.train_step(data)
  File "/data/caishuo/3D_generation/prolificdreamer/nerf/utils.py", line 653, in train_step
    outputs = self.model.render(rays_o, rays_d, mvp, H, W, staged=False, light_d=light_d, perturb=True, bg_color=bg_color, ambient_ratio=ambient_ratio, shading=shading, binarize=binarize)
  File "/data/caishuo/3D_generation/prolificdreamer/nerf/renderer.py", line 977, in render
    results = self.run_dmtet(rays_d, mvp, h, w, **kwargs)
  File "/data/caishuo/3D_generation/prolificdreamer/nerf/renderer.py", line 857, in run_dmtet
    rast, rast_db = dr.rasterize(self.glctx, verts_clip, faces, (h, w))
  File "/data/caishuo/miniconda3/envs/prolificdreamer/lib/python3.8/site-packages/nvdiffrast/torch/ops.py", line 310, in rasterize
    return _rasterize_func.apply(glctx, pos, tri, resolution, ranges, grad_db, -1)
  File "/data/caishuo/miniconda3/envs/prolificdreamer/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/data/caishuo/miniconda3/envs/prolificdreamer/lib/python3.8/site-packages/nvdiffrast/torch/ops.py", line 248, in forward
    out, out_db = _get_plugin().rasterize_fwd_cuda(raster_ctx.cpp_wrapper, pos, tri, resolution, ranges, peeling_idx)
RuntimeError: instance mode - pos must have shape [>0, >0, 4]

Can I use sparse multi-view images to train NeRF or 3D GS?

Hi, your work is impressive! I only have sparse horizontal-view images of a car and want to train a complete car model. Can I use this method to improve the quality of the unseen views, especially the looking-down views? (Two example images were attached to the issue.)

Some questions about reproducing the code

At line 313 of normal.py there is an import util statement, but there is no util package, and I could not find a util file anywhere in the project, so I'm asking the author. Thanks!
