heheyas / V3D
V3D: Video Diffusion Models are Effective 3D Generators
Home Page: https://heheyas.github.io/V3D/
Congrats on your great work!
When will you release the full model usage code and checkpoints?
Hello, why can't the multi-view images generated by following your method be reconstructed by COLMAP?
COLMAP point-cloud reconstruction is performed before Gaussian splatting is applied.
Do you directly feed the multi-view images generated by the video diffusion model into COLMAP for reconstruction? The images I generated following your method cannot be reconstructed with COLMAP.
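For reference, here is a minimal sketch of the SfM pipeline being discussed, written with the pycolmap bindings (an assumption on my part; the paths are placeholders and the original workflow presumably used the COLMAP CLI):

import pycolmap

database_path = "colmap/database.db"
image_path = "frames/"       # multi-view frames extracted from the generated video
output_path = "colmap/sparse/"

pycolmap.extract_features(database_path, image_path)
pycolmap.match_exhaustive(database_path)
maps = pycolmap.incremental_mapping(database_path, image_path, output_path)
# An empty `maps` result means COLMAP could not register the views, which can
# happen when synthesized frames are not perfectly multi-view consistent.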
As the title says: I couldn't find anything that handles the creation of that file. I assume it contains camera transformation data, which must be created during video generation.
Hi, nice work!
I wonder when you plan to release the training code?
Thanks!~
Hi there,
It's an interesting paper; I really enjoyed reading it. And thank you for open-sourcing it.
About the 290k-object training set the model was trained on: is there any chance you could share the object IDs for this subset? And could you briefly comment on what criteria were used to select them?
I'd really appreciate your reply.
File "scripts/pub/V3D_512.py", line 23, in
from sgm.inference.helpers import embed_watermark
File "/root/autodl-tmp/V3D-main/sgm/init.py", line 1, in
from .models import AutoencodingEngine, DiffusionEngine
File "/root/autodl-tmp/V3D-main/sgm/models/init.py", line 1, in
from .autoencoder import AutoencodingEngine
File "/root/autodl-tmp/V3D-main/sgm/models/autoencoder.py", line 14, in
from ..modules.autoencoding.regularizers import AbstractRegularizer
File "/root/autodl-tmp/V3D-main/sgm/modules/init.py", line 1, in
from .encoders.modules import GeneralConditioner, ExtraConditioner
File "/root/autodl-tmp/V3D-main/sgm/modules/encoders/modules.py", line 1134, in
class ExtraConditioner(GeneralConditioner):
File "/root/autodl-tmp/V3D-main/sgm/modules/encoders/modules.py", line 1135, in ExtraConditioner
def forward(self, batch: Dict, force_zero_embeddings: List | None = None) -> Dict:
TypeError: unsupported operand type(s) for |: '_GenericAlias' and 'NoneType'
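A likely cause (my reading of the traceback, not an official fix): the PEP 604 union syntax List | None is only supported at runtime on Python 3.10+; on older interpreters the class body evaluates the annotation and raises exactly this TypeError. Upgrading to Python >= 3.10, adding from __future__ import annotations at the top of modules.py, or rewriting the annotation with typing.Optional should all sidestep it:

from typing import Dict, List, Optional

# Before (requires Python >= 3.10 to evaluate at class-definition time):
#     def forward(self, batch: Dict, force_zero_embeddings: List | None = None) -> Dict: ...

# After (equivalent, but also valid on Python 3.8/3.9):
def forward(self, batch: Dict, force_zero_embeddings: Optional[List] = None) -> Dict:
    ...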
V3D$ PYTHONPATH="." python recon/train_from_vid.py -w --sh_degree 0 --iterations 4000 --lambda_dssim 1.0 --lambda_lpips 2.0 --save_iterations 4000 --num_pts 100_000 --video output/000002.mp4
Traceback (most recent call last):
File "/mnt/newhome/sora/NeRF/V3D/recon/train_from_vid.py", line 28, in <module>
from scripts.sampling.simple_mv_latent_sample import sample_one
ModuleNotFoundError: No module named 'scripts.sampling'
Hello, during the 3DGS reconstruction part I get this error: No module named 'scripts.sampling'. Can you tell me why that is?
Hi,
May I ask which package the following import comes from: from scripts.sampling.simple_mv_latent_sample import sample_one?
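In case it helps with debugging (a sketch assuming the import targets a file scripts/sampling/simple_mv_latent_sample.py inside the V3D checkout): the import only resolves when the interpreter is started from the repo root with PYTHONPATH="." and that file actually exists; if the file was not shipped in the release, no PYTHONPATH setting will fix it.

import importlib.util
import os

# Quick diagnostic, run from the V3D repo root (where PYTHONPATH="." points).
try:
    spec = importlib.util.find_spec("scripts.sampling.simple_mv_latent_sample")
    print("module resolvable:", spec is not None)
except ModuleNotFoundError as exc:
    print("parent package missing:", exc)
print("file on disk:", os.path.exists("scripts/sampling/simple_mv_latent_sample.py"))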
Hello, I think this is valuable work. Where can we find the supplementary materials with more training details? Thanks.
Will the training code be released?
What hardware resources are needed to train this model (GPU model and memory), and approximately how long does training take?
Thank you!
Dear authors:
First of all, thanks for your excellent work. However, I found an inconsistency when comparing V3D with SV3D. Both papers removed motion_bucket_id and fps_id as irrelevant conditioning; the SV3D_u configuration reflects this with adm_in_channels = 256.
But in the V3D_512 configuration, adm_in_channels is set to 768, and I found that fps_id and motion_bucket_id are set to 1 and 300 in the inference script.
So I wonder why the released model is not consistent with the paper. If I do not remove motion_bucket_id and fps_id, how should I set them during training?
If convenient, could you help me solve this issue?
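For context, a guess at the arithmetic (this is my assumption about the SVD-style conditioning scheme, not something confirmed by the authors): adm_in_channels counts the concatenated 256-dimensional embeddings of the vector conditioners, so 768 would correspond to keeping all three of SVD's conditioners (fps_id, motion_bucket_id, cond_aug), while SV3D_u's 256 corresponds to cond_aug alone.

# Hypothetical illustration of how adm_in_channels may relate to the number
# of 256-dim vector conditioners in an SVD-style model (assumption, not
# taken from the V3D config itself).
EMBED_DIM = 256
svd_conditioners = ["fps_id", "motion_bucket_id", "cond_aug"]   # 3 * 256 = 768
sv3d_u_conditioners = ["cond_aug"]                              # 1 * 256 = 256

assert EMBED_DIM * len(svd_conditioners) == 768
assert EMBED_DIM * len(sv3d_u_conditioners) == 256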
import numpy as np
# qvec2rotmat: COLMAP's quaternion-to-rotation-matrix helper
# (e.g. from COLMAP's read_write_model.py).

def qt2c2w(q, t):
    # Convert a COLMAP world-to-camera pose (quaternion q, translation t)
    # into an OpenGL-convention camera-to-world matrix.
    rot = qvec2rotmat(q)                 # world-to-camera rotation R
    c2w = np.eye(4)
    c2w[:3, :3] = np.transpose(rot)      # camera-to-world rotation: R^T
    c2w[:3, 3] = -np.transpose(rot) @ t  # camera center: -R^T t
    c2w[..., 1:3] *= -1                  # flip y/z columns: COLMAP/OpenCV -> OpenGL
    return c2w
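For illustration, a hypothetical sanity check with an identity rotation (assuming COLMAP's wxyz quaternion order):

q = np.array([1.0, 0.0, 0.0, 0.0])  # identity rotation in COLMAP's wxyz order
t = np.array([0.0, 0.0, 2.0])
c2w = qt2c2w(q, t)
# With R = I, the camera center -R^T t lands at (0, 0, -2); the y/z column
# flip only touches the rotation columns, so the translation is unchanged.
print(c2w[:3, 3])  # -> [ 0.  0. -2.]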
How can the PixelNeRF encoder be used to aggregate multiple views for scene generation? Is there any example?
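Not an official example, but here is a minimal sketch of PixelNeRF-style multi-view aggregation under my own assumptions (OpenCV-style pinhole projection, a shared intrinsic matrix K, per-view CNN feature maps, simple mean pooling; the name aggregate_pixelnerf_features is hypothetical):

import torch
import torch.nn.functional as F

def aggregate_pixelnerf_features(xyz, feats, c2ws, K):
    # xyz: (N, 3) world-space points; feats: (V, C, H, W) per-view features;
    # c2ws: (V, 4, 4) camera-to-world poses; K: (3, 3) shared intrinsics.
    sampled = []
    H, W = feats.shape[-2:]
    for v in range(feats.shape[0]):
        w2c = torch.linalg.inv(c2ws[v])
        cam = (w2c[:3, :3] @ xyz.T + w2c[:3, 3:]).T         # world -> camera
        uv = (K @ cam.T).T
        uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)          # perspective divide
        grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,     # normalize to [-1, 1]
                            2 * uv[:, 1] / (H - 1) - 1], dim=-1)
        f = F.grid_sample(feats[v:v + 1], grid[None, None], align_corners=True)
        sampled.append(f[0, :, 0].T)                        # (N, C) per view
    return torch.stack(sampled).mean(dim=0)                 # mean-pool over views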