prolificdreamer's Introduction

ProlificDreamer

Official implementation of ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation, published in NeurIPS 2023 (Spotlight).

Installation

The codebase is built on stable-dreamfusion. For installation:

pip install -r requirements.txt
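
Since the codebase builds on stable-dreamfusion, its CUDA extensions (e.g., the grid encoder in gridencoder/grid.py, which comes up in the issues below) are compiled against your local PyTorch/CUDA setup. A quick sanity check before training; this is a generic snippet, not part of the repo:

import torch

# The CUDA extensions require a CUDA-enabled torch install that matches
# your driver; verify this before the first run.
print(torch.__version__, torch.version.cuda)
assert torch.cuda.is_available(), "CUDA is not available; the extensions will fail to build or run"
print(torch.cuda.get_device_name(0))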

Training

ProlificDreamer includes three stages for high-fidelity text-to-3D generation.

# --------- Stage 1 (NeRF, VSD guidance) --------- #
# This costs approximately 27GB of GPU memory at a rendering resolution of 512x512
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 25000 --lambda_entropy 10 --scale 7.5 --n_particles 1 --h 512  --w 512 --workspace exp-nerf-stage1/
# If you find the result is foggy, you can increase --lambda_entropy. For example:
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 25000 --lambda_entropy 100 --scale 7.5 --n_particles 1 --h 512  --w 512 --workspace exp-nerf-stage1/
# Generate with multiple particles. Notice that generating with multiple particles is only supported in Stage 1.
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 100000 --lambda_entropy 10 --scale 7.5 --n_particles 4 --h 512  --w 512 --t5_iters 20000 --workspace exp-nerf-stage1/

# --------- Stage 2 (Geometry Refinement) --------- #
# This costs <20GB GPU memory
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 15000 --scale 100 --dmtet --mesh_idx 0  --init_ckpt /path/to/stage1/ckpt --normal True --sds True --density_thresh 0.1 --lambda_normal 5000 --workspace exp-dmtet-stage2/
# If the results contain many floaters, you can increase --density_thresh. Note that the value of --density_thresh must be consistent between Stage 2 and Stage 3.
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 15000 --scale 100 --dmtet --mesh_idx 0  --init_ckpt /path/to/stage1/ckpt --normal True --sds True --density_thresh 0.4 --lambda_normal 5000 --workspace exp-dmtet-stage2/

# --------- Stage 3 (Texturing, VSD guidance) --------- #
# texturing with 512x512 rasterization
CUDA_VISIBLE_DEVICES=0 python main.py --text "A pineapple." --iters 30000 --scale 7.5 --dmtet --mesh_idx 0  --init_ckpt /path/to/stage2/ckpt --density_thresh 0.1 --finetune True --workspace exp-dmtet-stage3/

We also provide a script that runs all three stages automatically.

bash run.sh gpu_id text_prompt

For example,

bash run.sh 0 "A pineapple."

Limitations: (1) Our work utilizes the original Stable Diffusion without any 3D data, so the multi-face Janus problem is prevalent in the results. Utilizing a text-to-image diffusion model that has been finetuned on multi-view images would alleviate this problem. (2) If the results are unsatisfactory, try different seeds. This is especially helpful when the results have good quality but suffer from the multi-face Janus problem.

TODO List

  • Release our code.
  • Combine MVDream with VSD to alleviate the multi-face problem.

Related Links

BibTeX

If you find our work useful for your project, please consider citing the following paper.

@inproceedings{wang2023prolificdreamer,
  title={ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation},
  author={Zhengyi Wang and Cheng Lu and Yikai Wang and Fan Bao and Chongxuan Li and Hang Su and Jun Zhu},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2023}
}

prolificdreamer's People

Contributors

luchengthu, thuwzy, yikaiw


prolificdreamer's Issues

GPU

What kind of GPU does the project need? And how long do the three stages take to run?

Export mesh + texture

Thanks for your great work!

My goal is to import highly detailed meshes and textures into 3D software such as Blender or Unreal Engine for rendering.

How do I export the mesh and textures (diffuse, specular, normal)?

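Not an official export path, but since requirements.txt lists xatlas, trimesh, and pymeshlab "for dmtet and mesh export", one plausible route is to dump the DMTet mesh data and write a textured OBJ with trimesh. Every array and file name below is a hypothetical placeholder:

import numpy as np
import trimesh
from PIL import Image

# Hypothetical dumps of the Stage 3 mesh; the actual attribute names in the
# repo's renderer may differ.
verts = np.load("verts.npy")        # (V, 3) float32 vertex positions
faces = np.load("faces.npy")        # (F, 3) int64 triangle indices
uvs = np.load("uvs.npy")            # (V, 2) float32 UV coordinates
albedo = Image.open("albedo.png")   # baked diffuse texture

mesh = trimesh.Trimesh(vertices=verts, faces=faces, process=False)
mesh.visual = trimesh.visual.TextureVisuals(uv=uvs, image=albedo)
mesh.export("mesh.obj")             # writes .obj + .mtl + texture for Blender/Unreal

Specular and normal maps would need to be baked separately; OBJ/MTL only carries them as additional texture slots.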

Reproducing results from 2D Experiments

First of all, thank you for making this wonderful work public!

I am currently trying to reproduce the 2D experiments on generating images with VSD, following the implementation details provided in Appendix G, but I have failed to reach the quality reported in the paper.

Would you be able to release the 2D experiment code?
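
In the meantime, here is a schematic of what a 2D VSD step looks like under the paper's formulation. This is an illustration with hypothetical eps_pretrained/eps_lora noise-prediction callables (text conditioning, CFG, and the w(t) weighting are omitted), not the authors' code:

import torch
import torch.nn.functional as F

# `particles` is an (N, C, H, W) leaf tensor with requires_grad=True,
# e.g. torch.randn(N, 4, 64, 64, requires_grad=True) in SD latent space,
# with opt_particles = torch.optim.Adam([particles], lr=...).
betas = torch.linspace(1e-4, 0.02, 1000)       # standard DDPM beta schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product of (1 - beta_t)

def vsd_step(particles, eps_pretrained, eps_lora, opt_particles, opt_lora):
    n = particles.shape[0]
    t = torch.randint(20, 980, (n,), device=particles.device)
    a = alpha_bar.to(particles.device)[t].view(n, 1, 1, 1)
    noise = torch.randn_like(particles)
    x_t = a.sqrt() * particles.detach() + (1.0 - a).sqrt() * noise

    # (1) Particle update: inject eps_pretrained - eps_lora as the gradient;
    #     neither diffusion model is backpropagated through.
    with torch.no_grad():
        grad = eps_pretrained(x_t, t) - eps_lora(x_t, t)
    opt_particles.zero_grad()
    particles.grad = grad
    opt_particles.step()

    # (2) LoRA update: ordinary denoising loss on the current particles, so
    #     eps_lora tracks the score of the particle distribution.
    opt_lora.zero_grad()
    F.mse_loss(eps_lora(x_t, t), noise).backward()
    opt_lora.step()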

[Question] Does it make sense to swap the VSD loss and the LoRA loss?

I'm curious whether it makes sense to swap the VSD loss and the LoRA loss in the paper (i.e., swap the following two red boxes). Intuitively it feels more natural: the LoRA extracts information from SD, and one could then use the LoRA, which is more consistent than the original SD, to guide the NeRF. I tried to implement this with the threestudio framework but failed. I wonder if it's my bug or if it really does not work. Just curious if anyone has tried?
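
For reference, a schematic contrast (reusing the hypothetical eps_* callables from the sketch above), not a statement about the repo's code:

# Standard VSD: the pretrained model pulls toward the text-conditioned data
# distribution; eps_lora subtracts the score of the current particle distribution.
grad_vsd = eps_pretrained(x_t, t) - eps_lora(x_t, t)

# One reading of the proposed swap: let the LoRA supply the guiding score instead.
grad_swapped = eps_lora(x_t, t) - eps_pretrained(x_t, t)

# In the paper's derivation eps_lora is an estimate of the particles' own score,
# so swapping the roles changes the objective itself, not just the implementation.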


There are no result files in stage 3

Hello! I am asking about an issue when executing your code: Stage 3 finishes as if nothing happened, and no result files are produced.

I made three main customizations:

  1. Modified requirements.txt due to a _gridencoder issue.
    There was a C++17 version issue; someone else's solution was to modify requirements.txt, which I did:
    requirements_custom.txt
tqdm
rich
ninja
numpy
pandas
scipy
scikit-learn
matplotlib
opencv-python
imageio
imageio-ffmpeg
torch==2.0.1
torchvision==0.15.2
torchaudio==2.0.2
torch-ema
einops
tensorboard
tensorboardX



# for gui
dearpygui

# for stable-diffusion
huggingface_hub
diffusers == 0.15.0
accelerate
transformers

# for dmtet and mesh export
xatlas
trimesh
PyMCubes
pymeshlab
git+https://github.com/NVlabs/nvdiffrast/

# for zero123
carvekit-colab
omegaconf
pytorch-lightning
taming-transformers-rom1504
kornia
git+https://github.com/openai/CLIP.git

# for omnidata
gdown

# for dpt
timm

# for remote debugging
debugpy-run

# for deepfloyd if
sentencepiece
  2. Adjusted numerical values such as maximum iterations 300, base iterations 100, etc.
    It takes about 10 minutes per epoch (A40), and at roughly 150 epochs to complete one stage that would take far too long, so I reduced the iteration counts.

  3. Fixed the hard-coded checkpoint paths accordingly.

#!/bin/bash

gpu=$1
prompt=$2

echo "CUDA:$gpu, Prompt: $prompt"

filename=$(echo "$prompt" | sed 's/ /-/g')
n_particles=1

CUDA_VISIBLE_DEVICES=$gpu python main.py --text "$prompt" --iters 300 --lambda_entropy 10 --scale 7.5 --n_particles $n_particles --h 512 --w 512 --t5_iters 5000 --per_iter 100 --workspace exp-nerf-stage1/

# Find the latest checkpoint file in exp-nerf-stage1
recent_ckpt_stage1=$(find exp-nerf-stage1 -type d -name "*$filename*" -exec bash -c 'ls -t "$0"/checkpoints/*.pth 2>/dev/null' {} \; | head -n 1)

CUDA_VISIBLE_DEVICES=$gpu python main.py --text "$prompt" --iters 200 --scale 100 --dmtet --mesh_idx 0 --init_ckpt "$recent_ckpt_stage1" --normal True --sds True --density_thresh 0.1 --lambda_normal 5000 --per_iter 100 --workspace exp-dmtet-stage2/

# Find the latest checkpoint file in exp-dmtet-stage2
recent_ckpt_stage2=$(find exp-dmtet-stage2 -type d -name "*$filename*" -exec bash -c 'ls -t "$0"/checkpoints/*.pth 2>/dev/null' {} \; | head -n 1)

CUDA_VISIBLE_DEVICES=$gpu python main.py --text "$prompt" --iters 400 --scale 7.5 --dmtet --mesh_idx 0 --init_ckpt "$recent_ckpt_stage2" --density_thresh 0.1 --finetune True --per_iter 100 --workspace exp-dmtet-stage3/

Here are my stage output files; you can download them without logging in:
https://www.dropbox.com/scl/fo/e0pqedb2us6l394me58jv/h?rlkey=1m28i18cl54kni4cmmvopbai5&dl=0

The result doesn't look good in stage 1

The commands I run are as follows:

cd ./prolificdreamer-main
CUDA_VISIBLE_DEVICES=0 python main.py --text "Albert Einstein is playing the guitar." --iters 25000 --lambda_entropy 100 --scale 7.5 --n_particles 1 --h 512  --w 512 --workspace exp-nerf-stage1/”

But the output video (df_ep0250_00_textureless_rgb.mp4) seems meaningless. What's the reason?

Some questions about the NeRF optimization

Hello, I'd like to ask: in sd.py, grad is passed to latents, but after self.scaler.scale(loss).backward() is executed, the value of latents still does not change. What exactly is VSD optimizing?
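
A hedged note on the mechanism: in stable-dreamfusion-style codebases (which this repo builds on), the guidance gradient is typically injected with a custom autograd Function, so latents is never updated in place; the gradient flows back through the renderer into the NeRF parameters, which are what the optimizer changes. A minimal sketch of this pattern (not necessarily the exact sd.py code):

import torch

class SpecifyGradient(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_tensor, gt_grad):
        ctx.save_for_backward(gt_grad)
        # Dummy scalar loss; its value is irrelevant.
        return torch.zeros(1, device=input_tensor.device, dtype=input_tensor.dtype)

    @staticmethod
    def backward(ctx, grad_scale):
        gt_grad, = ctx.saved_tensors
        # Hand the precomputed guidance gradient to the latents; autograd then
        # propagates it through the renderer into the NeRF weights.
        return gt_grad * grad_scale, None

# loss = SpecifyGradient.apply(latents, grad); loss.backward() therefore updates
# the NeRF parameters. `latents` is an intermediate activation, not a leaf being
# optimized, so its tensor value never changes in place.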

A question about this model

Hello, I'd like to ask: when training the particles, if n_particles=4, is that equivalent to training four NeRFs? And how are they trained together?
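
I have not verified this repo's exact implementation, but in the paper VSD with n particles maintains n independent NeRFs and, at each iteration, samples one particle to render and update, while a single shared LoRA network is fit on renders from all particles. A schematic with hypothetical render/loss methods:

import random

def train_iteration(nerfs, nerf_opts, lora, lora_opt, guidance, camera):
    # Pick one of the n_particles NeRFs per iteration (random or round-robin).
    i = random.randrange(len(nerfs))
    image = nerfs[i].render(camera)           # hypothetical render call
    loss = guidance.vsd_loss(image, lora)     # VSD gradient vs. the shared LoRA score
    nerf_opts[i].zero_grad(); loss.backward(); nerf_opts[i].step()

    # The shared LoRA is trained with the ordinary denoising loss on the render.
    lora_loss = guidance.denoising_loss(image.detach(), lora)
    lora_opt.zero_grad(); lora_loss.backward(); lora_opt.step()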

Ran ./run.sh 0 "A pineapple." but it failed

File "/data1/git-repo/prolificdreamer/gridencoder/grid.py", line 54, in forward
_backend.grid_encode_forward(inputs, embeddings, offsets, outputs, B, D, C, L, S, H, dy_dx, gridtype, align_corners, interpolation)
TypeError: grid_encode_forward(): incompatible function arguments. The following argument types are supported:
1. (arg0: at::Tensor, arg1: at::Tensor, arg2: at::Tensor, arg3: at::Tensor, arg4: int, arg5: int, arg6: int, arg7: int, arg8: int, arg9: float, arg10: int, arg11: Optional[at::Tensor], arg12: int, arg13: bool, arg14: int) -> None

RuntimeError: instance mode - pos must have shape [>0, >0, 4]

When I run the Stage 2 command, I get the following error. How can I solve it?

command:
CUDA_VISIBLE_DEVICES=1 python main.py --text "A pineapple." --iters 15000 --scale 100 --dmtet --mesh_idx 0 --init_ckpt exp-nerf-stage1/2023-12-07-A-pineapple.-scale-7.5-lr-0.001-albedo-le-10.0-render-512-cube-sd-2.1-5000-tet-256/checkpoints/df_ep0020.pth --normal True --sds True --density_thresh 0.1 --lambda_normal 5000 --workspace exp-dmtet-stage2/

output:
Traceback (most recent call last):
  File "main.py", line 282, in <module>
    trainer.train(train_loader, valid_loader, max_epoch)
  File "/data/caishuo/3D_generation/prolificdreamer/nerf/utils.py", line 899, in train
    self.train_one_epoch(train_loader)
  File "/data/caishuo/3D_generation/prolificdreamer/nerf/utils.py", line 1080, in train_one_epoch
    pred_rgbs, pred_depths, loss, pseudo_loss, latents, shading = self.train_step(data)
  File "/data/caishuo/3D_generation/prolificdreamer/nerf/utils.py", line 653, in train_step
    outputs = self.model.render(rays_o, rays_d, mvp, H, W, staged=False, light_d=light_d, perturb=True, bg_color=bg_color, ambient_ratio=ambient_ratio, shading=shading, binarize=binarize)
  File "/data/caishuo/3D_generation/prolificdreamer/nerf/renderer.py", line 977, in render
    results = self.run_dmtet(rays_d, mvp, h, w, **kwargs)
  File "/data/caishuo/3D_generation/prolificdreamer/nerf/renderer.py", line 857, in run_dmtet
    rast, rast_db = dr.rasterize(self.glctx, verts_clip, faces, (h, w))
  File "/data/caishuo/miniconda3/envs/prolificdreamer/lib/python3.8/site-packages/nvdiffrast/torch/ops.py", line 310, in rasterize
    return _rasterize_func.apply(glctx, pos, tri, resolution, ranges, grad_db, -1)
  File "/data/caishuo/miniconda3/envs/prolificdreamer/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/data/caishuo/miniconda3/envs/prolificdreamer/lib/python3.8/site-packages/nvdiffrast/torch/ops.py", line 248, in forward
    out, out_db = _get_plugin().rasterize_fwd_cuda(raster_ctx.cpp_wrapper, pos, tri, resolution, ranges, peeling_idx)
RuntimeError: instance mode - pos must have shape [>0, >0, 4]

Can I use sparse multi-view images to train NeRF or 3D GS?

Hi, your work is impressive! I only have sparse horizontal-view images of a car and want to train a complete car model. Can I use this method to improve the quality of the unseen views, especially the looking-down views? (Two example images were attached to the issue.)

Some questions about reproducing the code

At line 313 of normal.py there is an import util statement, but there is no util package, and I could not find a util file anywhere in the project, so I'm asking the author. Thanks!
