openrobotlab / pointllm
[ECCV 2024 Oral] PointLLM: Empowering Large Language Models to Understand Point Clouds
Home Page: https://runsenxu.com/projects/PointLLM
Hello,
Thanks for the fantastic work.
I want to re-train Point-BERT at the scene level.
You mention that you re-trained Point-BERT with color information,
so could you please share the pretraining code with me?
Thanks a lot!
Hello,
Thanks for the fantastic work.
I would like to know how the dataset of uniformly sampled points was created.
I took a look at the mesh objects in Objaverse and noticed that the mesh vertices are not uniformly distributed, meaning that if we only sample the mesh vertices to build point clouds, the point clouds will also be non-uniform.
I would appreciate it if the authors could share some related instructions, code scripts, etc.
Thanks in advance
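One common way to get evenly distributed points from a mesh is to sample on the faces rather than the vertices: pick triangles with probability proportional to their area, then pick a uniform random point inside each chosen triangle. The sketch below illustrates that general technique with NumPy. It is not the authors' pipeline, which this thread does not describe, and the function name `sample_points_on_mesh` is hypothetical.

```python
import numpy as np

def sample_points_on_mesh(vertices, faces, n_points, rng=None):
    """Area-weighted uniform sampling on a triangle mesh (illustrative sketch).

    vertices: (V, 3) float array, faces: (F, 3) int array of vertex indices.
    Returns an (n_points, 3) array of points lying on the mesh surface.
    """
    rng = np.random.default_rng() if rng is None else rng
    vertices = np.asarray(vertices, dtype=np.float64)
    faces = np.asarray(faces, dtype=np.int64)

    # Corner positions of every triangle.
    a = vertices[faces[:, 0]]
    b = vertices[faces[:, 1]]
    c = vertices[faces[:, 2]]

    # Triangle areas decide how often each face is chosen.
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())

    # Uniform barycentric coordinates: reflect samples that fall outside
    # the triangle back into it.
    u = rng.random(n_points)
    v = rng.random(n_points)
    outside = u + v > 1.0
    u[outside] = 1.0 - u[outside]
    v[outside] = 1.0 - v[outside]
    w = 1.0 - u - v

    return (w[:, None] * a[face_idx]
            + u[:, None] * b[face_idx]
            + v[:, None] * c[face_idx])
```

Per-point colors would need a separate step (for example, interpolating vertex colors with the same barycentric weights), which is not shown here.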
Hello! Thanks for your work! I find it really interesting.
I have a question regarding the special tokens <p_start> and <p_end> you use in PointLLM.
I would like to know what they look like: whether they are the BOS and EOS tokens of the LLM, whether they are trained, or whether they have a different structure.
Thanks in advance,
Andrea
Thank you for your work, it's great!
I tried your demo. Uploading the point cloud and the 3D file works without problems, but the question phase always returns an ERROR. Could you fix it? Thank you!
Hi, I ran into a problem during stage-1 training. It seems to be caused by the CUDA devices, but I can't find the device configuration in train.py. Can you help me out?
Here are the details:
scripts/PointLLM_train_stage1.sh
W0710 18:28:47.395000 140217301160576 torch/distributed/run.py:757]
W0710 18:28:47.395000 140217301160576 torch/distributed/run.py:757] *****************************************
W0710 18:28:47.395000 140217301160576 torch/distributed/run.py:757] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0710 18:28:47.395000 140217301160576 torch/distributed/run.py:757] *****************************************
2024-07-10 18:28:52 - ERROR - stderr - /home/lyc/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
2024-07-10 18:28:52 - ERROR - stderr - warnings.warn(
2024-07-10 18:28:52 - ERROR - stderr - /home/lyc/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
2024-07-10 18:28:52 - ERROR - stderr - warnings.warn(
[rank7]: Traceback (most recent call last):
[rank7]: File "/data2/2023/yzy/PointLLM/pointllm/train/train_mem.py", line 13, in
[rank7]: train()
[rank7]: File "/data2/2023/yzy/PointLLM/pointllm/train/train.py", line 97, in train
[rank7]: model_args, data_args, training_args = parser.parse_args_into_dataclasses()
[rank7]: File "/data1/anaconda3/envs/yzy_pointllm/lib/python3.10/site-packages/transformers/hf_argparser.py", line 332, in parse_args_into_dataclasses
[rank7]: obj = dtype(**inputs)
[rank7]: File "", line 120, in init
[rank7]: File "/data1/anaconda3/envs/yzy_pointllm/lib/python3.10/site-packages/transformers/training_args.py", line 1227, in post_init
[rank7]: and (self.device.type != "cuda")
[rank7]: File "/data1/anaconda3/envs/yzy_pointllm/lib/python3.10/site-packages/transformers/training_args.py", line 1662, in device
[rank7]: return self._setup_devices
[rank7]: File "/data1/anaconda3/envs/yzy_pointllm/lib/python3.10/site-packages/transformers/utils/generic.py", line 54, in get
[rank7]: cached = self.fget(obj)
[rank7]: File "/data1/anaconda3/envs/yzy_pointllm/lib/python3.10/site-packages/transformers/training_args.py", line 1652, in _setup_devices
[rank7]: torch.cuda.set_device(device)
[rank7]: File "/home/lyc/.local/lib/python3.10/site-packages/torch/cuda/init.py", line 399, in set_device
[rank7]: torch._C._cuda_setDevice(device)
[rank7]: RuntimeError: CUDA error: invalid device ordinal
[rank7]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank7]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[rank7]: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
Root Cause (first observed failure):
[0]:
time : 2024-07-10_18:28:57
host : nuosen
rank : 7 (local_rank: 7)
exitcode : 1 (pid: 2161485)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
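The log shows local rank 7 failing in torch.cuda.set_device with "invalid device ordinal", which typically means more processes were launched than there are GPUs visible to them. The sketch below is a hypothetical pre-launch check written in plain Python. It assumes the launcher starts `nproc_per_node` processes per node and that CUDA_VISIBLE_DEVICES, if set, is a well-formed comma-separated list. The helper names are illustrative and are not part of PointLLM or PyTorch.

```python
import os

def visible_gpu_count(total_gpus, env=None):
    """GPUs a process can see, given the machine total and CUDA_VISIBLE_DEVICES."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return total_gpus  # no restriction: every GPU is visible
    # Assumption: the variable holds a well-formed comma-separated list.
    return len([item for item in value.split(",") if item.strip()])

def invalid_local_ranks(nproc_per_node, total_gpus, env=None):
    """Local ranks that would have no matching GPU ordinal."""
    visible = visible_gpu_count(total_gpus, env)
    return [rank for rank in range(nproc_per_node) if rank >= visible]

# Example: eight processes on a machine with four GPUs.
print(invalid_local_ranks(nproc_per_node=8, total_gpus=4, env={}))
```

If the returned list is non-empty, lowering the number of processes per node in the launch script to the number of visible GPUs is the usual remedy.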
Hi - have you released (or are you planning to release) the 70k point cloud assets with the corresponding complex text descriptions?
Thanks for your great work!
May I ask how to convert the .npy data to .obj or another format that is easy to render and load into a simulator?
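A point cloud has no faces, so a vertex-only format such as ASCII PLY is often the simplest target, and most viewers can open it. The sketch below writes an (N, 3) or (N, 6) array to PLY. It assumes the columns are x, y, z followed by r, g, b, and that colors are stored in the range [0, 1]. Both are assumptions about the .npy files, and the function name `save_ply` is hypothetical.

```python
import numpy as np

def save_ply(points, path):
    """Write an (N, 3) xyz or (N, 6) xyzrgb array as an ASCII PLY file."""
    points = np.asarray(points, dtype=np.float64)
    has_color = points.shape[1] >= 6

    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
    ]
    if has_color:
        # Assumption: colors are floats in [0, 1]; convert to 0..255 bytes.
        rgb = np.round(np.clip(points[:, 3:6], 0.0, 1.0) * 255).astype(np.uint8)
        header += [
            "property uchar red",
            "property uchar green",
            "property uchar blue",
        ]
    header.append("end_header")

    with open(path, "w") as handle:
        handle.write("\n".join(header) + "\n")
        for i in range(len(points)):
            line = "%f %f %f" % tuple(points[i, :3])
            if has_color:
                line += " %d %d %d" % tuple(rgb[i])
            handle.write(line + "\n")
```

Usage would look like `save_ply(np.load("cloud.npy"), "cloud.ply")`, where the file names are placeholders.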
Hello,
Thanks to the authors for the fantastic work shared here.
I wonder how to run inference with a colorless point cloud.
The model I am using is the pretrained PointLLM_13B_v1.2.
Here are some failed attempts I made:
Thanks in advance.
Hello team,
First of all, this is really amazing work. I appreciate your effort on the PointLLM project, and the online demo works well. However, I ran into an issue when running the code on my own feature-point data. When I pass in my data, which has shape (478, 3), I get an error saying that the input was expected to have 6 channels but has 3 channels instead.
To work around this, I duplicated the 3 dimensions of my data, changing its shape from (478, 3) to (478, 6), which resolved the error.
But this is not a good solution: a point cloud should only need 3 dimensions. I think the issue is caused by the conv size. Could you please investigate it? A solution, or guidance on how to handle 3-dimensional data without artificially changing its size, would be very helpful.
Thanks for your attention to this matter. Looking forward to your response.
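Since the released encoder was trained on xyzrgb input, a 3-channel cloud has to be widened to 6 channels somehow. An alternative to duplicating the coordinates is to append a constant placeholder color. The sketch below does that with NumPy. Whether the model responds well to a constant color, and which value works best, is not confirmed anywhere in this thread, so the default of 0.0 (black) is only an assumption. The function name is hypothetical.

```python
import numpy as np

def add_placeholder_color(xyz, value=0.0):
    """Turn an (N, 3) xyz array into (N, 6) xyzrgb with a constant color.

    value is an assumed placeholder color in [0, 1], not a documented default.
    """
    xyz = np.asarray(xyz, dtype=np.float32)
    if xyz.ndim != 2 or xyz.shape[1] != 3:
        raise ValueError(f"expected shape (N, 3), got {xyz.shape}")
    color = np.full_like(xyz, value)  # same shape as xyz, filled with value
    return np.concatenate([xyz, color], axis=1)

# Example with the shape from the report above.
cloud = add_placeholder_color(np.zeros((478, 3)))
print(cloud.shape)
```

Unlike duplicating the coordinates, this keeps the color channels within a plausible color range and independent of the geometry.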
Thanks for your excellent work! In the paper, you mentioned a scene-level PointLLM and showed its capability for scene understanding. I am wondering which checkpoints correspond to the scene model.
Thank you for sharing an awesome work!
Could you share the config you used to train PointLLM and Point-BERT (e.g. mm_use_point_start_end)?
It would also be great if you could send me the trainer code for PointLLM.
My email is as follows: [email protected]
Thanks,
Hello~ thanks for sharing your great work!
I'm wondering whether there are any benchmark results for scene understanding tasks on datasets like ScanQA, as was done by 3D-LLM.
Hi,
This is a great work, may I know when the training data will be released?
There is something wrong with the PointLLM online demo. I could visit it a few days ago, but it is not available now.
If possible, could you please fix it?
Hello! I have a question regarding the training of Point-BERT. In your paper, you state the following:
As the original implementation of ULIP-2 only supports point clouds with spatial coordinates (xyz), we re-train Point-BERT with color information (xyzrgb), following the same procedure outlined in the ULIP-2 paper. For training Point-BERT, we employ ViT-L/14 from OpenCLIP [20] and use point clouds from the Cap3D [29] dataset...
I am wondering if you trained Point-BERT from scratch, by following ULIP-2 pipeline (aligning 3D features with 2D and text features) or finetuned it starting from a Point-BERT model trained on the reconstruction task, as the original Point-BERT paper suggests.
Thanks in advance,
Andrea
Is this correct: all point clouds with more than 8192 points are downsampled to 8192, and if a cloud has fewer than 8192 points the system does nothing?
My second question: will PointLLM support an arbitrary number of points in the future?
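For preparing your own data, a fixed-size resampling step can be sketched as below. It uses plain random sampling without replacement when a cloud is too large. For smaller clouds it either leaves the data unchanged, mirroring the behavior guessed at in the question, or optionally pads by repeating random points. How PointLLM itself treats clouds with fewer than 8192 points is not stated in this thread, and the function name `resample_to_n` is hypothetical.

```python
import numpy as np

def resample_to_n(points, n=8192, pad=False, rng=None):
    """Return at most n points, or exactly n when pad=True and the cloud is non-empty."""
    points = np.asarray(points)
    rng = np.random.default_rng() if rng is None else rng
    count = len(points)

    if count > n:
        # Downsample: keep n distinct rows chosen uniformly at random.
        keep = rng.choice(count, size=n, replace=False)
        return points[keep]

    if 0 < count < n and pad:
        # Optional padding: repeat randomly chosen existing points.
        extra = rng.choice(count, size=n - count, replace=True)
        return np.concatenate([points, points[extra]], axis=0)

    return points  # already n points, or smaller and left untouched
```

Farthest point sampling is a common alternative to random selection when more even spatial coverage matters, at a higher computational cost.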
Hi, great job! Do you plan to release the weights for the PointLLM model, or to provide an open-source API?
Hello, I would like to know whether there are any hardware requirements for running inference with your model.
Are there any minimum requirements for inference or fine-tuning in terms of GPU, memory, etc.?
Thanks for sharing the project!
When I try to use the provided API to generate descriptions of 3D objects, the server returns an error. The .npy path is valid on my computer, and example.json contains {"raw":{"a":1,"b":2},"serialized":null}. I am not sure what the exact format of the JSON file should be, so I just used a very simple one.
from gradio_client import Client
client = Client("http://101.230.144.196/")
result = client.predict(
    "Object ID", # str in 'Input Method' Radio component
    "b4bbf2116b1a41a5a3b9d3622b07074c", # str in 'Object ID Input' Textbox component
    "data/modelnet40_c/data_background_1.npy", # str (filepath or URL to file) in 'Upload Point Cloud File (PLY, NPY, etc.)' File component
    "example.json", # str (filepath to JSON file) in 'parameter_15' Chatbot component
    fn_index=4
)
print(result)
The server returned the following error.
Loaded as API: http://101.230.144.196/ ✔
Traceback (most recent call last):
File "pointllm.py", line 15, in <module>
result = client.predict(
File "/home/jerry/miniconda3/envs/dassl/lib/python3.8/site-packages/gradio_client/client.py", line 392, in predict
return self.submit(*args, api_name=api_name, fn_index=fn_index).result()
File "/home/jerry/miniconda3/envs/dassl/lib/python3.8/site-packages/gradio_client/client.py", line 1573, in result
return super().result(timeout=timeout)
File "/home/jerry/miniconda3/envs/dassl/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/home/jerry/miniconda3/envs/dassl/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/home/jerry/miniconda3/envs/dassl/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/jerry/miniconda3/envs/dassl/lib/python3.8/site-packages/gradio_client/client.py", line 1245, in _inner
predictions = _predict(*data)
File "/home/jerry/miniconda3/envs/dassl/lib/python3.8/site-packages/gradio_client/client.py", line 1275, in _predict
raise ValueError(result["error"])
ValueError: None
I am not sure why the error occurred.
Thanks.
Hi, thanks for your great work!
I am currently looking into the training phase of the PointLLM. I am wondering if you can release the checkpoints of the weights after Stage1 (the pre-trained weights before stage2). I have checked Hugging Face repo, and only found the final V1.2 ckpt.
I would appreciate it if you could release more ckpts!
Hello, nice work! Do you have plans to release the pre-trained weights of the point cloud encoder Point-BERT? As you mentioned in the paper:
As the original implementation of ULIP-2 only supports point clouds with spatial coordinates (xyz), we re-train Point-BERT with color information (xyzrgb), following the same procedure outlined in the ULIP-2 paper.
I think the release would be of great help. Appreciate :)
Thanks for your great job!
I have a question while implementing PointLLM.
Do we also have to update lm_head while instruction tuning? I wonder if I should fix lm_head or not.
Thanks!
Hello,
Thanks for the great work.
Are there any instructions or guidance for generating text descriptions for my own dataset (part of ShapeNet)? Also, how much GPU memory will I need for this task?
Thanks,
Daniel Wu
Thanks for your fantastic work on combining the llm and point cloud together!
When I began stage-1 training, I ran into two issues:
First, I am training on four RTX 3090 GPUs with 24GB of memory each. Even with the batch size set to 1, I get a CUDA out-of-memory error. Is there any solution to this?
Second, a message on stderr says: "Some weights of PointLLMLlamaForCausalLM were not initialized from the model checkpoint at checkpoints/PointLLM_7B_v1.1_init and are newly initialized". The detailed picture is attached below. I prepared checkpoints/PointLLM_7B_v1.1_init according to the instructions, and I don't know how to solve the problem.
Again, thanks for your contribution to the community! Looking forward to your reply~
Best,
Sillybear
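According to your instructions for generating the text description data, you first render the .glb files with Blender to produce images from 8 viewpoints. However, the dataset you finally released consists of point clouds, and Blender cannot process point cloud data. So I would like to ask how you processed the point clouds to generate the multi-view images.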
Is the large language model in this algorithm frozen or trainable during training?
Great work! I want to know how to use my own data locally, especially in the aspect of point cloud and text alignment.
I currently have 4 RTX 4090 GPUs, each with 24GB of memory. However, I encounter an out-of-memory error when running the 7B model. I changed the model loading to PointLLMLlamaForCausalLM.from_pretrained(model_name, low_cpu_mem_usage=True, use_cache=True, torch_dtype=torch.float16).cuda()
so that it uses float16. The model now loads, but during inference I get this error:
[ERROR] Input type (float) and bias type (c10::Half)
I would appreciate your help. Thank you.
Hi, very interesting work!
For my thesis, I am using the Point-BERT encoder with the weights you provided in issue #1 and the clouds from the HuggingFace Objaverse_660K_8192_npy_split_a* files, and it works fine without further steps.
However, I can't figure out how the x, y, z components of these clouds are normalized (if they are), or how they are generated. This would be essential for testing the encoder with other clouds.
Thank you.
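A widely used convention for object-level point clouds is to center the coordinates on their centroid and scale them so the farthest point lies on the unit sphere. The sketch below applies that convention to the xyz columns only and leaves any extra columns, such as color, untouched. Whether the Objaverse_660K_8192 files were produced this way is an assumption that this thread does not confirm. One way to check is to inspect the mean and maximum norm of a released cloud.

```python
import numpy as np

def normalize_unit_sphere(points):
    """Center xyz on the centroid and scale so the largest norm is 1.

    points: (N, 3) or (N, 6) array; columns beyond the first three are kept as-is.
    """
    points = np.array(points, dtype=np.float32, copy=True)
    xyz = points[:, :3]
    xyz = xyz - xyz.mean(axis=0)               # move centroid to the origin
    scale = np.linalg.norm(xyz, axis=1).max()  # distance of the farthest point
    if scale > 0:
        xyz = xyz / scale
    points[:, :3] = xyz
    return points

def describe(points):
    """Quick diagnostic: centroid and largest norm of the xyz columns."""
    xyz = np.asarray(points)[:, :3]
    return xyz.mean(axis=0), np.linalg.norm(xyz, axis=1).max()
```

If `describe` on a released cloud returns a centroid near zero and a maximum norm near 1, the same normalization is a reasonable choice for other clouds.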