heheyas / V3D
V3D: Video Diffusion Models are Effective 3D Generators
Home Page: https://heheyas.github.io/V3D/
Congrats on your great work!
When will you release the full model usage code and checkpoints?
Hello, why can't the multi-view images generated by following your method be reconstructed by COLMAP?
COLMAP point-cloud reconstruction is performed before Gaussian splatting is applied.
Do you directly feed the multi-view images generated by the video diffusion model into COLMAP for reconstruction? The images I generated following your method cannot be reconstructed with COLMAP.
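For reference, here is a minimal sketch of the SfM pipeline being discussed, written with the pycolmap bindings (an assumption on my part; the paths are placeholders and the original workflow presumably used the COLMAP CLI):

import pycolmap

database_path = "colmap/database.db"
image_path = "frames/"       # multi-view frames extracted from the generated video
output_path = "colmap/sparse/"

pycolmap.extract_features(database_path, image_path)
pycolmap.match_exhaustive(database_path)
maps = pycolmap.incremental_mapping(database_path, image_path, output_path)
# An empty `maps` result means COLMAP could not register the views, which can
# happen when synthesized frames are not perfectly multi-view consistent.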
As the title says: I couldn't find anything that handles the creation of that file. I assume it contains camera transformation data, which must be created during video generation.
Hi, nice work!
I wonder when you plan to release the training code?
Thanks!~
Hi there,
It's an interesting paper; I really enjoyed reading it. And thank you for open-sourcing it.
About the 290k-object training set the model was trained on: is there any chance you could share the object IDs for this subset? And could you briefly comment on what criteria were used to select them?
I'd really appreciate your reply.
File "scripts/pub/V3D_512.py", line 23, in
from sgm.inference.helpers import embed_watermark
File "/root/autodl-tmp/V3D-main/sgm/init.py", line 1, in
from .models import AutoencodingEngine, DiffusionEngine
File "/root/autodl-tmp/V3D-main/sgm/models/init.py", line 1, in
from .autoencoder import AutoencodingEngine
File "/root/autodl-tmp/V3D-main/sgm/models/autoencoder.py", line 14, in
from ..modules.autoencoding.regularizers import AbstractRegularizer
File "/root/autodl-tmp/V3D-main/sgm/modules/init.py", line 1, in
from .encoders.modules import GeneralConditioner, ExtraConditioner
File "/root/autodl-tmp/V3D-main/sgm/modules/encoders/modules.py", line 1134, in
class ExtraConditioner(GeneralConditioner):
File "/root/autodl-tmp/V3D-main/sgm/modules/encoders/modules.py", line 1135, in ExtraConditioner
def forward(self, batch: Dict, force_zero_embeddings: List | None = None) -> Dict:
TypeError: unsupported operand type(s) for |: '_GenericAlias' and 'NoneType'
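A likely cause (my reading of the traceback, not an official fix): the PEP 604 union syntax List | None is only supported at runtime on Python 3.10+; on older interpreters the class body evaluates the annotation and raises exactly this TypeError. Upgrading to Python >= 3.10, adding from __future__ import annotations at the top of modules.py, or rewriting the annotation with typing.Optional should all sidestep it:

from typing import Dict, List, Optional

# Before (requires Python >= 3.10 to evaluate at class-definition time):
#     def forward(self, batch: Dict, force_zero_embeddings: List | None = None) -> Dict: ...

# After (equivalent, but also valid on Python 3.8/3.9):
def forward(self, batch: Dict, force_zero_embeddings: Optional[List] = None) -> Dict:
    ...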
V3D$ PYTHONPATH="." python recon/train_from_vid.py -w --sh_degree 0 --iterations 4000 --lambda_dssim 1.0 --lambda_lpips 2.0 --save_iterations 4000 --num_pts 100_000 --video output/000002.mp4
Traceback (most recent call last):
File "/mnt/newhome/sora/NeRF/V3D/recon/train_from_vid.py", line 28, in <module>
from scripts.sampling.simple_mv_latent_sample import sample_one
ModuleNotFoundError: No module named 'scripts.sampling'
Hello, during the 3DGS reconstruction part I get this error: No module named 'scripts.sampling'. Can you tell me why that is?
Hi,
May I ask which package the following import comes from: from scripts.sampling.simple_mv_latent_sample import sample_one?
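In case it helps with debugging (a sketch assuming the import targets a file scripts/sampling/simple_mv_latent_sample.py inside the V3D checkout): the import only resolves when the interpreter is started from the repo root with PYTHONPATH="." and that file actually exists; if the file was not shipped in the release, no PYTHONPATH setting will fix it.

import importlib.util
import os

# Quick diagnostic, run from the V3D repo root (where PYTHONPATH="." points).
try:
    spec = importlib.util.find_spec("scripts.sampling.simple_mv_latent_sample")
    print("module resolvable:", spec is not None)
except ModuleNotFoundError as exc:
    print("parent package missing:", exc)
print("file on disk:", os.path.exists("scripts/sampling/simple_mv_latent_sample.py"))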
Hello, I think this is valuable work. Where can we find the supplementary materials with more training details? Thanks.
Will the training code be released?
What hardware resources are needed to train this model (GPU model and memory), and approximately how long does training take?
Thank you!
Dear authors:
First of all, thanks for your excellent work. However, I found an inconsistency when comparing V3D with SV3D. Both papers removed motion_bucket_id and fps_id as irrelevant conditioning; the SV3D_u configuration reflects this with adm_in_channels = 256.
But in the V3D_512 configuration, adm_in_channels is set to 768, and I found that fps_id and motion_bucket_id are set to 1 and 300 in the inference script.
So I wonder why the released model is not consistent with the paper. If I do not remove motion_bucket_id and fps_id, how should I set them during training?
If convenient, could you help me solve this issue?
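For context, a guess at the arithmetic (this is my assumption about the SVD-style conditioning scheme, not something confirmed by the authors): adm_in_channels counts the concatenated 256-dimensional embeddings of the vector conditioners, so 768 would correspond to keeping all three of SVD's conditioners (fps_id, motion_bucket_id, cond_aug), while SV3D_u's 256 corresponds to cond_aug alone.

# Hypothetical illustration of how adm_in_channels may relate to the number
# of 256-dim vector conditioners in an SVD-style model (assumption, not
# taken from the V3D config itself).
EMBED_DIM = 256
svd_conditioners = ["fps_id", "motion_bucket_id", "cond_aug"]   # 3 * 256 = 768
sv3d_u_conditioners = ["cond_aug"]                              # 1 * 256 = 256

assert EMBED_DIM * len(svd_conditioners) == 768
assert EMBED_DIM * len(sv3d_u_conditioners) == 256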
import numpy as np
# qvec2rotmat: COLMAP's quaternion-to-rotation-matrix helper
# (e.g. from COLMAP's read_write_model.py).

def qt2c2w(q, t):
    # Convert a COLMAP world-to-camera pose (quaternion q, translation t)
    # into an OpenGL-convention camera-to-world matrix.
    rot = qvec2rotmat(q)                 # world-to-camera rotation R
    c2w = np.eye(4)
    c2w[:3, :3] = np.transpose(rot)      # camera-to-world rotation: R^T
    c2w[:3, 3] = -np.transpose(rot) @ t  # camera center: -R^T t
    c2w[..., 1:3] *= -1                  # flip y/z columns: COLMAP/OpenCV -> OpenGL
    return c2w
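For illustration, a hypothetical sanity check with an identity rotation (assuming COLMAP's wxyz quaternion order):

q = np.array([1.0, 0.0, 0.0, 0.0])  # identity rotation in COLMAP's wxyz order
t = np.array([0.0, 0.0, 2.0])
c2w = qt2c2w(q, t)
# With R = I, the camera center -R^T t lands at (0, 0, -2); the y/z column
# flip only touches the rotation columns, so the translation is unchanged.
print(c2w[:3, 3])  # -> [ 0.  0. -2.]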
How can the PixelNeRF encoder be used to aggregate multiple views for scene generation? Is there any example?
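Not an official example, but here is a minimal sketch of PixelNeRF-style multi-view aggregation under my own assumptions (OpenCV-style pinhole projection, a shared intrinsic matrix K, per-view CNN feature maps, simple mean pooling; the name aggregate_pixelnerf_features is hypothetical):

import torch
import torch.nn.functional as F

def aggregate_pixelnerf_features(xyz, feats, c2ws, K):
    # xyz: (N, 3) world-space points; feats: (V, C, H, W) per-view features;
    # c2ws: (V, 4, 4) camera-to-world poses; K: (3, 3) shared intrinsics.
    sampled = []
    H, W = feats.shape[-2:]
    for v in range(feats.shape[0]):
        w2c = torch.linalg.inv(c2ws[v])
        cam = (w2c[:3, :3] @ xyz.T + w2c[:3, 3:]).T         # world -> camera
        uv = (K @ cam.T).T
        uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)          # perspective divide
        grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,     # normalize to [-1, 1]
                            2 * uv[:, 1] / (H - 1) - 1], dim=-1)
        f = F.grid_sample(feats[v:v + 1], grid[None, None], align_corners=True)
        sampled.append(f[0, :, 0].T)                        # (N, C) per view
    return torch.stack(sampled).mean(dim=0)                 # mean-pool over views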