argus-3d's Introduction

Argus-3D: Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

Paper | Project Page

Installation

You can create an anaconda environment called argus-3d using

conda env create -f environment.yaml
conda activate argus-3d

Next, compile the extension modules. You can do this via

python setup.py build_ext --inplace
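
If the build succeeds, the compiled extensions should import without errors. The quick smoke test below is only a sketch: the module path is an assumption inferred from the src/utils/libmcubes path mentioned in the issues further down, so adjust it to the actual package layout.

# Smoke test: verify that a compiled extension module imports cleanly.
# NOTE: the module path is an assumption, not a documented entry point of this repo.
import importlib

for name in ["src.utils.libmcubes.mcubes"]:
    try:
        importlib.import_module(name)
        print("OK:", name)
    except ImportError as exc:
        print("FAILED:", name, "-", exc)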

Generation

Download the stage1 checkpoint and place it in output/PR256_ED512_EN8192.

Download the stage2 checkpoint and place it in output/PR256_ED512_EN8192/class-guide/transformer3072_24_32.

Then you can try class-guided generation by running:

python generate_class-guide.py --batch_size 16 --cate chair

This script should create a folder output/PR256_ED512_EN8192/class-guide/transformer3072_24_32/class_cond where the output meshes are stored.

Note: Our model requires significant memory, and we recommend running it on a GPU with high VRAM capacity (40GB or above). Generating a single mesh takes approximately 50 seconds on average on an A100 (80GB), and about 6 minutes on a V100 (32GB).
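
Once generation finishes, you can inspect the exported meshes with a small script, for example using trimesh. The snippet below is a sketch, not part of the repo: the output file extensions (.obj/.off) are assumptions, so adjust the glob patterns to whatever generate_class-guide.py actually writes.

# Sketch: list and inspect generated meshes with trimesh (pip install trimesh).
# NOTE: the *.obj / *.off patterns are assumptions about the output format.
from pathlib import Path
import trimesh

out_dir = Path("output/PR256_ED512_EN8192/class-guide/transformer3072_24_32/class_cond")
for mesh_path in sorted(out_dir.glob("**/*.obj")) + sorted(out_dir.glob("**/*.off")):
    mesh = trimesh.load(mesh_path, force="mesh")
    print(mesh_path.name, "vertices:", len(mesh.vertices), "faces:", len(mesh.faces))
    # mesh.show()  # opens an interactive viewer if a backend such as pyglet is installed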

Dataset

The occupancies, point clouds, and supplementary rendered images based on the Objaverse dataset can be downloaded from https://huggingface.co/datasets/BAAI/Objaverse-MIX
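
To fetch the data programmatically, the generic huggingface_hub client can mirror the dataset repository. The snippet below is a sketch: the local directory is an arbitrary choice, and since the full dataset is large you may want to restrict the download with allow_patterns.

# Sketch: download the Objaverse-MIX dataset repository from the Hugging Face Hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BAAI/Objaverse-MIX",
    repo_type="dataset",
    local_dir="data/Objaverse-MIX",  # arbitrary local path
)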

Coming Soon

  • Image-guided generation
  • Text-guided generation
  • Training code

Shout-outs

Thanks to everyone who makes their code and models available.

Thanks for open-sourcing!

BibTeX

@inproceedings{luo2023learning,
      author = {Luo, Simian and Qian, Xuelin and Fu, Yanwei and Zhang, Yinda and Tai, Ying and Zhang, Zhenyu and Wang, Chengjie and Xue, Xiangyang},
      title = {Learning Versatile 3D Shape Generation with Improved Auto-regressive Models},
      booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
      year = {2023},
      month = {10},
      pages = {14093-14103},
      doi = {10.1109/ICCV51070.2023.01300}
}
@misc{qian2024pushing,
      title={Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability}, 
      author={Xuelin Qian and Yu Wang and Simian Luo and Yinda Zhang and Ying Tai and Zhenyu Zhang and Chengjie Wang and Xiangyang Xue and Bo Zhao and Tiejun Huang and Yunsheng Wu and Yanwei Fu},
      year={2024},
      eprint={2402.12225},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

argus-3d's Issues

Can this run on Windows?

Very impressive work. Can this run on Windows? When I execute the command on Windows, it reports:
ResolvePackageNotFound:

  • ld_impl_linux-64=2.38
  • libgomp=11.2.0
  • libgcc-ng=11.2.0
  • nspr=4.35
  • nss=3.89.1
  • libgfortran-ng=11.2.0
  • ncurses=6.4
  • readline=8.2
  • dbus=1.13.18
  • gst-plugins-base=1.14.1
  • gmp=6.2.1
  • libxkbcommon=1.0.1
  • nettle=3.6
  • libedit=3.1.20221030
  • libgfortran5=11.2.0
  • gnutls=3.6.13
  • libstdcxx-ng=11.2.0
  • gstreamer=1.14.1
  • libuuid=1.41.5
  • openh264=2.1.1
  • _openmp_mutex=5.1

Asking for Argus-3D configuration information

I would like to ask about the step of compiling the extension modules with python setup.py build_ext --inplace. The error is as follows:

File "/home/hjq/anaconda3/envs/argus-3d/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

Is my server environment configured incorrectly?

Can we generate textures or vertex colors

Your program is very powerful, with higher quality than Shap-E, which is amazing. I also have a few small questions: 1. Can we generate textures or vertex colors with this tool? 2. I see there are 55 categories for class-guided generation; can I generate other kinds of items? We are the iSoftStone Innovation Research Institute, and we also do research and application work on automatic modeling. We hope to cooperate if possible, thank you!

Problem with compiling the extension modules

Thank you for the great work!
When I run
python setup.py build_ext --inplace
it fails with
'ValueError: 'src/utils/libmcubes/mcubes.pyx' doesn't match any files'
I don't see a libmcubes directory in the GitHub project.

How to process my own datasets?

Great work! I'd like to use my own datasets for training. Do you have any scripts for converting my own datasets (meshes in .obj/.glb/.stl format) to the same format as Objaverse-MIX? Many thanks! Besides, I have some questions regarding this dataset:

  1. I've taken a deep look into this dataset and found some extremely low-quality data inside it. Is there any way to filter these out?
  2. How can I filter a sub-dataset by category? Is there any metadata I can use?
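
For reference, one generic way to turn a mesh into surface points plus occupancy samples is with trimesh. The sketch below is not the official Objaverse-MIX preprocessing pipeline; the sample counts, normalization, and output keys are all assumptions.

# Sketch: convert a mesh (.obj/.glb/.stl) into surface points and occupancy labels.
# NOTE: generic recipe, not this repo's official pipeline; requires a watertight mesh.
import numpy as np
import trimesh

mesh = trimesh.load("my_model.glb", force="mesh")  # hypothetical input file

# Normalize the mesh into a unit cube centered at the origin.
center = mesh.bounds.mean(axis=0)
scale = (mesh.bounds[1] - mesh.bounds[0]).max()
mesh.apply_translation(-center)
mesh.apply_scale(1.0 / scale)

# Sample a surface point cloud and uniformly distributed occupancy queries.
surface_points, _ = trimesh.sample.sample_surface(mesh, 100000)
query_points = np.random.uniform(-0.55, 0.55, size=(100000, 3))
occupancies = mesh.contains(query_points)

np.savez_compressed(
    "my_model_points.npz",  # hypothetical output file
    surface_points=surface_points.astype(np.float32),
    query_points=query_points.astype(np.float32),
    occupancies=np.packbits(occupancies),
)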

It's too slow and the result is a cube

I used a 4090 (24GB) GPU to generate the knife class with batch_size 16.

It shows that 126 batches are needed in total, and each batch takes about 20-30 minutes. Is something wrong?

# total 0/126
Category knife:   0%|                                   | 0/126 [00:00<?, ?it/s]

# each
 55%|█████████████████████▉                  | 562/1024 [09:06<15:16,  1.98s/it]

A question

If I only want to generate a single car, rather than generating all the cars, how should I go about it?

No offense intended to the authors; I'll skip the praise and talk about the gap, as I understand it, between this and an ideal model.

  1. 3D data can be used, but I maintain that supervising directly in 3D is not the final path; ultimately supervision should come from the two constraints of "2D images + extremely rich text descriptions".
  2. On the architecture side, suppose the final system can generate roughly 1 million classes/objects reasonably well (leaving scenes aside). I think at most the first 10,000 classes/objects might need end-to-end learning, while 99% of later new objects should not need end-to-end training; they would only run a small sub-network of the whole algorithm (a few stages) to extract the necessary features, which could be stored much like an ordinary database. At inference time, they could be retrieved by nearest-neighbor search (a neural network could provide a more suitable distance) and then go directly through the rest of the pipeline. Only in this way could it scale to 10 million classes/objects.
    The numbers here are rough guesses; the core point is that there must be a nearly zero-training, linear/sub-linear scaling mode. For diffusion-style approaches that rely on parameter memorization, I think that for 3D (shape + texture + motion + interaction + ...), reaching a level that ordinary people find impressive in the real world, or that UE5 can use directly (MetaHuman-style, with LODs), would make the model enormous. The codebook parts of the paper, i.e. the narrow neck of the VAE, will sooner or later become the capacity bottleneck once this is extended to the real world.
  3. I'll stop there to avoid getting flamed.

Finally, a serious joke. In terms of data: Objaverse-MIX looks large for 3D, but compared with the text and 2D-image data needed to reach a ChatGPT-level "wow" effect for ordinary people, it is short by at least two orders of magnitude; how do you make that up? In terms of compute: 8 A100s training for 4 weeks, apparently from scratch, is probably also two orders of magnitude short.

Loading Stage2 Fail !!!!!!

My GPU is an A6000 (48GB) and my RAM is 128GB, but I waited a very long time and the result is:

Using Transformer Model:  FAST_Transformer_joint Baseline Quant Single !!
Using Single Quantizer  !!!! Reduce 4 High Reso 256 Relu Quantize 256 Symm 1
FAST_transformer_builder_baseline No drop Quant single cond class
z emb shape torch.Size([8192, 3072])

Loading Stage2 Fail !!!!!!

Current best validation metric (iou): -inf
Total number of parameters: 3670362821
output path: output/PR256_ED512_EN8192/class-guide/transformer3072_24_32
100%|███████████████████████████████████████| 1024/1024 [32:10<00:00,  1.88s/it]
100%|███████████████████████████████████████| 1024/1024 [32:04<00:00,  1.88s/it]

I also get the cube.
I want to know why my stage 2 loading failed!
