
Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

This is the official implementation of Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention.

teaser.mp4

Teaser

Create your digital portrait from a single image

result_clr_scale4_Yann_LeCun.mp4
result_clr_scale4_musk.mp4

Installation

conda create -n Era3D python=3.9
conda activate Era3D

# torch
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

# install xformers, download from https://download.pytorch.org/whl/cu118
pip install xformers-0.0.23.post1-cp39-cp39-manylinux2014_x86_64.whl 

# for reconstruction
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install git+https://github.com/NVlabs/nvdiffrast

# other dependencies
pip install -r requirements.txt

Weights

You can download the model directly from Hugging Face, or download it with a Python script:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="pengHTYX/MacLab-Era3D-512-6view", local_dir="./pengHTYX/MacLab-Era3D-512-6view/")

Inference

  1. Generate multiview color and normal images by running test_mvdiffusion_unclip.py. For example,
python test_mvdiffusion_unclip.py --config configs/test_unclip-512-6view.yaml \
    pretrained_model_name_or_path='pengHTYX/MacLab-Era3D-512-6view' \
    validation_dataset.crop_size=420 \
    validation_dataset.root_dir=examples \
    seed=600 \
    save_dir='mv_res'  \
    save_mode='rgb'

You can adjust crop_size (400 or 420) and seed (42 or 600) to obtain the best results for some cases.

  2. Typically, we use rembg to predict the alpha channel. If it produces artifacts, try Clipdrop to remove the background instead. (A minimal rembg sketch is shown after this list.)

  3. Instant-NSR Mesh Extraction

cd instant-nsr-pl
bash run.sh $GPU $CASE $OUTPUT_DIR

For example,

bash run.sh 0 A_bulldog_with_a_black_pirate_hat_rgba  recon

The textured mesh will be saved in $OUTPUT_DIR.
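
For step 2 above, a minimal sketch of using rembg to produce the RGBA input (the file names are placeholders; the example cases use an _rgba suffix):

from rembg import remove
from PIL import Image

img = Image.open("examples/my_photo.png")       # placeholder input image
rgba = remove(img)                              # rembg predicts the alpha channel
rgba.save("examples/my_photo_rgba.png")         # save with an _rgba suffix like the example cases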

Gradio Demo for Multiview Generation

  1. Following previous work, we use the pretrained SAM to interactively remove the background.
mkdir sam_pt && cd sam_pt
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
cd ..
  2. Then, run the local Gradio demo.
python app.py
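
If you want to call the downloaded SAM checkpoint outside the Gradio app, a minimal sketch (the image path and click point are placeholders):

import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_pt/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("examples/my_photo.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)
masks, scores, _ = predictor.predict(point_coords=np.array([[256, 256]]),   # one positive click on the object
                                     point_labels=np.array([1]))
mask = masks[np.argmax(scores)]                                             # best-scoring mask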

Related projects

We borrow code from the following projects. Thanks to the open-source community for their contributions!
diffusers
Wonder3D
Syncdreamer
Instant-nsr-pl

License

This project is under AGPL-3.0, so any downstream solution or product that includes our code or the pretrained model should be open-sourced to comply with the AGPL conditions. If you have any questions about the usage of Era3D, please feel free to contact us.

Citation

If you find this codebase useful, please consider citing our work.

@article{li2024era3d,
  title={Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention},
  author={Li, Peng and Liu, Yuan and Long, Xiaoxiao and Zhang, Feihu and Lin, Cheng and Li, Mengfei and Qi, Xingqun and Zhang, Shanghang and Luo, Wenhan and Tan, Ping and others},
  journal={arXiv preprint arXiv:2405.11616},
  year={2024}
}


Era3D Issues

Error message when running "python app.py"

The terminal shows an error: "Input tensor shape: torch.Size([12, 3, 512, 512]). Additional information: {}." I found that it is related to line 186 of app.py: imgs_in = rearrange(imgs_in, "B Nv C H W -> (B Nv) C H W"). Can you help me solve this? Thank you!
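
For context, a small sketch (not the repo's code) of why that rearrange pattern rejects a 4-D tensor and what shape it expects:

import torch
from einops import rearrange

x = torch.randn(12, 3, 512, 512)                  # 4-D tensor, like the one in the error
# rearrange(x, "B Nv C H W -> (B Nv) C H W")      # fails: the pattern expects 5 dimensions
x = x.view(2, 6, 3, 512, 512)                     # reshape to (B, Nv, C, H, W) first
flat = rearrange(x, "B Nv C H W -> (B Nv) C H W")
print(flat.shape)                                  # torch.Size([12, 3, 512, 512])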

Running on CPU

Is there any way to use this without enabling xformers? I'm trying to run inference on a CPU, but xformers doesn't support fp32 (and the CPU can't run fp16). Is it possible to run without a GPU without major code rework? Thank you in advance!
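
One hedged workaround, assuming the test script honors the enable_xformers_memory_efficient_attention key shown in its config dump: switch that flag off in a copy of the config before running.

from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/test_unclip-512-6view.yaml")
cfg.enable_xformers_memory_efficient_attention = False    # fall back to plain PyTorch attention
OmegaConf.save(cfg, "configs/test_unclip-512-6view_cpu.yaml")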

Cannot find the mesh with textures in $OUTPUT_DIR.

This is an amazing project! It helps me a lot! I successfully got the obj file and imported it into UE, but I cannot find the textured mesh. Only some videos and an untextured mesh are saved in $OUTPUT_DIR.

Problems when exporting meshes with refined textures

Hello, thank you for your brilliant work in HR multi-view diffusion models and Wonder3D-style 3D generation!

I ran your code on my own case, and it works well during the diffusion inference and NeuS optimization stages. But when I visualize the refined mesh refine_ironman_rgba.obj in MeshLab, I see some color artifacts, as shown in Figure 1.

However, the video refine_ironman_rgba.mp4 does not show such flaws (Figure 2).

I wonder whether I missed some details.

Thanks a lot! :)

Questions about reconstruction and texture refinement in the supplementary material

Hello, thank you for your brilliant work in HR multi-view diffusion models and Wonder3D-style 3D generation!

I am curious about the reconstruction and texture refinement in the supplementary material. You mention rendering multiview images from predefined views and optimizing the vertex colors with the corresponding generated images. Do you mean using an L1/L2 loss between the rendered images and the corresponding generated images?
I also wonder whether there are any visual quality comparisons between textured meshes before and after texture refinement, because your refinement method is quite efficient!
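
As a toy illustration of that kind of refinement (my own sketch, not the authors' code): per-vertex colors optimized with an L1 loss against target pixel colors, with a fixed vertex-to-pixel assignment standing in for a differentiable rasterizer such as nvdiffrast.

import torch

num_verts, num_pixels = 1000, 4096
assign = torch.randint(0, num_verts, (num_pixels,))        # which vertex covers each pixel
target = torch.rand(num_pixels, 3)                          # pixel colors from the generated views

vert_colors = torch.full((num_verts, 3), 0.5, requires_grad=True)
opt = torch.optim.Adam([vert_colors], lr=1e-2)
for _ in range(200):
    rendered = vert_colors[assign]                          # "render" = gather vertex colors per pixel
    loss = (rendered - target).abs().mean()                 # L1 photometric loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())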

Thanks a lot! :)

training code

Thanks for your amazing work! Do you have a plan to release the training code for fine-tuning SD 2.1 on the Objaverse dataset?

RuntimeError on a 3090 GPU: Input type (float) and bias type (c10::Half) should be the same

SAM Time: 0.802s
ic| len(self.all_images): 1
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1550, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 661, in wrapper
response = f(*args, **kwargs)
File "/root/Era3D/app.py", line 215, in run_pipeline
out = pipeline(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/Era3D/mvdiffusion/pipelines/pipeline_mvdiffusion_unclip.py", line 537, in call
image_embeds, image_latents = self._encode_image(
File "/root/Era3D/mvdiffusion/pipelines/pipeline_mvdiffusion_unclip.py", line 244, in _encode_image
image_latents = self.vae.encode(image_pt).latent_dist.mode() * self.vae.config.scaling_factor
File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 261, in encode
h = self.encoder(x)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoders/vae.py", line 143, in forward
sample = self.conv_in(sample)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (c10::Half) should be the same

Can we use free viewpoint?

Hi, I would like to use this model as my novel-view generator. With a single input image, I want the model to predict a target view given a camera pose. Would that be possible? Also, in addition to normal and RGB images, I'd need depth output. A naive solution would be applying a monocular depth estimator to the RGB images, but that's not what I want; I'd like to output RGB, normal, and depth simultaneously. Thanks in advance for your kind reply!

position embeddings inconsistency

I have a question about this. I tried running the code in mvdiffusion/data/generate_fixed_text_embeds.py, but there were significant differences between the resulting .pt file and the provided file. For example, out of the 78848 values in the embedding for "front", only 8352 were consistent.
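
A small sketch for quantifying the mismatch (the paths are placeholders, and it assumes each .pt file stores a single tensor):

import torch

ref = torch.load("fixed_prompt_embeds_reference.pt")      # shipped embeddings (placeholder path)
new = torch.load("fixed_prompt_embeds_regenerated.pt")    # regenerated embeddings (placeholder path)
close = torch.isclose(ref, new, rtol=1e-3, atol=1e-5)
print(f"{close.sum().item()} / {close.numel()} values match within tolerance")
print("max abs diff:", (ref - new).abs().max().item())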

Instant-NSR Mesh Extraction fails on Windows

bin D:\anaconda3\Lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda3\Lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda3\Lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
Update finite_difference_eps to 0.027204705103003882
Traceback (most recent call last):
File "F:\Era3D\instant-nsr-pl\launch.py", line 134, in
main()
File "F:\Era3D\instant-nsr-pl\launch.py", line 114, in main
trainer.fit(system, datamodule=dm)
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 608, in fit
call._call_and_handle_interrupt(
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 650, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1103, in _run
results = self._run_stage()
^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1182, in _run_stage
self._run_train()
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1205, in _run_train
self.fit_loop.run()
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
self.advance(*args, **kwargs)
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
self.advance(*args, **kwargs)
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 213, in advance
batch_output = self.batch_loop.run(kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
self.advance(*args, **kwargs)
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\batch\training_batch_loop.py", line 88, in advance
outputs = self.optimizer_loop.run(optimizers, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
self.advance(*args, **kwargs)
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 202, in advance
result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 249, in _run_optimization
self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 370, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1347, in _call_lightning_module_hook
output = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\core\module.py", line 1744, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\core\optimizer.py", line 169, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\strategies\strategy.py", line 234, in optimizer_step
return self.precision_plugin.optimizer_step(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\plugins\precision\native_amp.py", line 75, in optimizer_step
closure_result = closure()
^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 149, in call
self._result = self.closure(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 135, in closure
step_output = self._step_fn()
^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 419, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1485, in _call_strategy_hook
output = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\strategies\dp.py", line 134, in training_step
return self.model(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\torch\nn\parallel\data_parallel.py", line 183, in forward
return self.module(*inputs[0], **module_kwargs[0])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\overrides\data_parallel.py", line 77, in forward
output = super().forward(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\Lib\site-packages\pytorch_lightning\overrides\base.py", line 98, in forward
output = self._forward_module.training_step(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\Era3D\instant-nsr-pl\systems\neus_ortho.py", line 166, in training_step
train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero
Epoch 0: : 0it [07:08, ?it/s]
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

I checked xxlong0/Wonder3D#47; Wonder3D has the same issue, and so does the Instant-NSR project. I'm not sure whether this project is aware that Instant-NSR has a branch with a data fix for Windows. I hope this can be resolved, thanks.
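
A defensive sketch of the failing update (my own guard, not an official fix): skip the dynamic ray-count update when the renderer returns zero samples instead of dividing by zero.

def update_train_num_rays(train_num_rays, train_num_samples, num_samples_full, max_rays=8192):
    """Recompute the dynamic ray budget, guarding against a zero sample count."""
    if num_samples_full <= 0:
        return train_num_rays                      # keep the previous budget instead of crashing
    target = int(train_num_rays * (train_num_samples / num_samples_full))
    return min(target, max_rays)

print(update_train_num_rays(512, 65536, 0))        # 512, no ZeroDivisionError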


IndexError: The shape of the mask [1, 64, 64] at index 1 does not match the shape of the indexed tensor [1, 360, 360, 3] at index 1

When I'm using Era3D_to_InstantMesh.json, it stops at the Era3D MVDiffusion Model node.

!!! Exception during processing!!! The shape of the mask [1, 64, 64] at index 1 does not match the shape of the indexed tensor [1, 360, 360, 3] at index 1
Traceback (most recent call last):
File "C:\c\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "C:\c\execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "C:\c\execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "C:\ProgramData\anaconda3\envs\310\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\c\custom_nodes\ComfyUI-3D-Pack\nodes.py", line 2379, in run_model
single_image = torch_imgs_to_pils(reference_image, reference_mask)[0]
File "C:\c\custom_nodes\ComfyUI-3D-Pack\shared_utils\image_utils.py", line 29, in torch_imgs_to_pils
images[inv_mask_index] = 0.
IndexError: The shape of the mask [1, 64, 64] at index 1 does not match the shape of the indexed tensor [1, 360, 360, 3] at index 1
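
A minimal sketch (not the ComfyUI-3D-Pack code) reproducing the mismatch and one way around it: resize the mask to the image resolution before using it as a boolean index.

import torch
import torch.nn.functional as F

images = torch.rand(1, 360, 360, 3)                          # (B, H, W, C) image tensor
mask = torch.rand(1, 64, 64) > 0.5                            # low-resolution mask

resized = F.interpolate(mask.float().unsqueeze(1), size=images.shape[1:3], mode="nearest")
inv_mask_index = resized.squeeze(1) < 0.5                     # boolean mask, now (1, 360, 360)
images[inv_mask_index] = 0.0                                  # indexing works once the shapes agree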

Quality of processed OBJ

Hi, firstly, very nice repo! I managed to run it on Windows + RTX 4090 using Docker. I can share my Dockerfile so other users can run it more easily, if you want.

However, I have one issue: the generated normals look very nice, but after running Step 2 (instant-nsr-pl), the resulting OBJ mesh is somewhat blurry and lacks the detail shown in the normals from the first step. Could you please help me improve the result?


Thanks

Can I run python app.py on Windows?

Can xformers be replaced with xformers-0.0.23.post1-cp39-cp39-win_amd64.whl?
I always get errors when running: NotImplementedError: No operator found for memory_efficient_attention_forward with inputs.
Please help me.


Question about cross-domain attention

As shown in the appendix of the paper, the attention is said to consist of self-cross-domain attention, row-wise attention, and cross-domain attention. However, when I read the code, it seems that only self-cross-domain attention and row-wise attention are used, and cross-domain attention is discarded. Does Era3D actually use only self-cross-domain attention and row-wise attention?

[Windows] Instant-NSR 3D reconstruction takes more than an hour on an RTX 4090 mobile GPU!

It gets stuck here for more than an hour; not even the progress bar moves:

python launch.py --config configs/neuralangelo-ortho-wmask.yaml --gpu 0 --train dataset.root_dir='../output/mv_res' dataset.scene='PirateCat_0' --exp_dir output/recon

Seed set to 42
Using finite difference to compute gradients with eps=progressive
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
You are using a CUDA device ('NVIDIA GeForce RTX 4090 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
../output/mv_res\PirateCat_0
(1024, 1024, 3)
the loaded normals are defined in the system of front view
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type             | Params
-------------------------------------------
0 | cos   | CosineSimilarity | 0
1 | model | NeuSModel        | 7.7 M
-------------------------------------------
7.7 M     Trainable params
0         Non-trainable params
7.7 M     Total params
30.742    Total estimated model params size (MB)
Epoch 0: |                                                                                                                             | 0/? [00:00<?, ?it/s]Update finite_difference_eps to 0.027204705103003882

I tested on Windows 11 with CUDA 12.1.
Is something wrong? Should it be this slow?

training set object list

Thanks for open-sourcing this, and I enjoyed reading your paper. A few questions about the training set:

  • Could you share your training-set object list? (I found several JSONs in this repo but am unsure which one the model was trained on.)
  • Can you comment on the criteria you used to select this subset?

The shape of the character in the image is not correctly captured.

Hello! First of all, thank you for creating this wonderful project. We appreciate it.

Attempting to load a picture of a robot in the Era3D demo on Hugging Face failed to capture the shape properly, and the output was unsuccessful. I think it would be easier to capture the shape of the object with a grey background instead of a white one. What do you think?


Can I create additional images from other angles?

As you know, Era3D creates 12 images (color and normal for six fixed views).

I can see the source code too.

VIEWS = ['front', 'front_right', 'right', 'back', 'left', 'front_left']

I wonder whether I can create additional images from other angles.

In fact, I am studying this: https://github.com/Profactor/continuous-remeshing
You mentioned it here: #8 (comment)

If I can create those images, I think I can improve the mesh quality.
What do you think about this?
Can I get any help with this?

License?

This is fantastic work!! Congrats! I noticed the repo is missing a license and was wondering if you intend to open-source this at some point?

xformers error

Hi, I ran pip install xformers-0.0.23.post1-cp39-cp39-manylinux2014_x86_64.whl, but I get an error: xFormers wasn't built with CUDA support.
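
A quick sketch to check whether the installed xformers build actually has CUDA kernels (requires a CUDA GPU; it fails with a similar error on builds without CUDA support):

import torch
import xformers.ops as xops

q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)   # (batch, seq_len, heads, head_dim)
out = xops.memory_efficient_attention(q, q, q)                        # raises if no CUDA operator is available
print(out.shape)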

The problem of the mesh being mirrored left and right

Thanks for your work! 👍

Sometimes, the extracted mesh is mirrored left and right.
It doesn't seem to happen with all meshes.
Is there a reason for this?
Additionally, is there a way to prevent the mirroring from happening?
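
If a mesh does come out mirrored, a hedged post-processing sketch with trimesh (the file names are placeholders) that flips it back along X and restores the face winding:

import numpy as np
import trimesh

mesh = trimesh.load("recon/my_case.obj", force="mesh")      # placeholder path
mesh.apply_transform(np.diag([-1.0, 1.0, 1.0, 1.0]))        # mirror along the X axis
mesh.invert()                                               # re-flip face winding so normals point outward
mesh.export("recon/my_case_unmirrored.obj")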

"TypeError: unsupported operand type(s) for /: 'str' and 'int'" when executing code at .\mvdiffusion\data\single_image_dataset.py, line 182

python test_mvdiffusion_unclip.py --config configs/test_unclip-512-6view.yaml pretrained_model_name_or_path='pengHTYX/MacLab-Era3D-512-6view' validation_dataset.crop_size='420' validation_dataset.root_dir=input seed=810 save_dir='mv_res' save_mode='rgb'
{'pretrained_model_name_or_path': 'pengHTYX/MacLab-Era3D-512-6view', 'revision': None, 'num_views': 6, 'validation_dataset': {'prompt_embeds_path': 'mvdiffusion/data/fixed_prompt_embeds_6view', 'root_dir': 'input', 'num_views': 6, 'bg_color': 'white', 'img_wh': [512, 512], 'num_validation_samples': 1000, 'crop_size': '420'}, 'pred_type': 'joint', 'save_dir': 'mv_res', 'save_mode': 'rgb', 'seed': 810, 'validation_batch_size': 1, 'dataloader_num_workers': 1, 'local_rank': -1, 'pipe_kwargs': {'num_views': 6}, 'validation_guidance_scales': [3.0], 'pipe_validation_kwargs': {'num_inference_steps': 40, 'eta': 1.0}, 'validation_grid_nrow': 6, 'regress_elevation': True, 'regress_focal_length': True, 'unet_from_pretrained_kwargs': {'unclip': True, 'sdxl': False, 'num_views': 6, 'sample_size': 64, 'zero_init_conv_in': False, 'regress_elevation': True, 'regress_focal_length': True, 'camera_embedding_type': 'e_de_da_sincos', 'projection_camera_embeddings_input_dim': 4, 'zero_init_camera_projection': False, 'num_regress_blocks': 3, 'cd_attention_last': False, 'cd_attention_mid': False, 'multiview_attention': True, 'sparse_mv_attention': True, 'selfattn_block': 'self_rowwise', 'mvcd_attention': True, 'use_dino': False}, 'enable_xformers_memory_efficient_attention': True}
The config attributes {'decay': 0.9999, 'inv_gamma': 1.0, 'min_decay': 0.0, 'optimization_step': 40000, 'power': 0.6666666666666666, 'update_after_step': 0, 'use_ema_warmup': False} were passed to UNetMV2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 9/9 [00:01<00:00, 7.35it/s]
input\test0.png
Traceback (most recent call last):
File "D:\Era3D\test_mvdiffusion_unclip.py", line 233, in
main(cfg)
File "D:\Era3D\test_mvdiffusion_unclip.py", line 199, in main
validation_dataset = SingleImageDataset(
^^^^^^^^^^^^^^^^^^^
File "D:\Era3D\mvdiffusion\data\single_image_dataset.py", line 131, in init
image, alpha = self.load_image(os.path.join(self.root_dir, file), bg_color, return_type='pt')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Era3D\mvdiffusion\data\single_image_dataset.py", line 182, in load_image
scale = self.crop_size / max(h, w)
~~~~~~~~~~~~~~~^~~~~~~~~~~
TypeError: unsupported operand type(s) for /: 'str' and 'int'

(I created an "input" folder for my own images.)
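
The quoted override validation_dataset.crop_size='420' passes a string, so crop_size / max(h, w) divides a str by an int; passing validation_dataset.crop_size=420 without quotes avoids it. A hedged sketch of the defensive cast one could add in load_image:

def compute_scale(crop_size, h, w):
    crop_size = int(crop_size)          # tolerate a string coming from a quoted CLI override
    return crop_size / max(h, w)

print(compute_scale("420", 512, 512))   # 0.8203125 instead of a TypeError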

WARNING: Requirement 'xformers-0.0.23.post1-cp39-cp39-manylinux2014_x86_64.whl' looks like a filename, but the file does not exist ERROR: xformers-0.0.23.post1-cp39-cp39-manylinux2014_x86_64.whl is not a supported wheel on this platform.

The error occurs when running

pip install xformers-0.0.23.post1-cp39-cp39-manylinux2014_x86_64.whl

I'm a 3D artist, so I'm not familiar with this. I checked https://download.pytorch.org/whl/cu118/xformers/ but it seems that wheel is not there.

xformers-0.0.23.post1+cu118-cp39-cp39-manylinux2014_x86_64.whl seems to be the closest match; please advise where to download the correct whl.
