
3dtrans's Introduction


3DTrans: An Open-source Codebase for Continuous Learning towards Autonomous Driving Tasks

3DTrans includes Transfer Learning Techniques and Scalable Pre-training Techniques for tackling the continuous learning problem in autonomous driving, as follows.

  1. We implement the Transfer Learning Techniques consisting of four functions:
  • Unsupervised Domain Adaptation (UDA) for 3D Point Clouds
  • Active Domain Adaptation (ADA) for 3D Point Clouds
  • Semi-Supervised Domain Adaptation (SSDA) for 3D Point Clouds
  • Multi-dataset Domain Fusion (MDF) for 3D Point Clouds
  2. We implement the Scalable Pre-training techniques, which can continuously enhance model performance on downstream tasks as more pre-training data are fed into the pre-training network.

Overview

News 🔥

  • We have released all codes of AD-PT here, including: 1) pre-training and fine-tuning methods, 2) labeled and pseudo-labeled data, and 3) pre-trained checkpoints for fine-tuning. Please see AD-PT for more technical details (updated on Sep. 2023).
  • SPOT shows that occupancy prediction is a promising pre-training method for general and scalable 3D representation learning; see Figure 1 of the SPOT paper for the inspiring experimental results (updated on Sep. 2023).
  • We have released the Reconstruction-Simulation Dataset obtained using the ReSimAD method (updated on Sep. 2023).
  • We have released the AD-PT pre-trained checkpoints; see AD-PT pre-trained checkpoints for download (updated on Aug. 2023).
  • Based on 3DTrans, we achieved significant performance gains on a series of downstream perception benchmarks including Waymo, nuScenes, and KITTI, under different baseline models like PV-RCNN++, SECOND, CenterPoint, PV-RCNN (updated on Jun. 2023).
  • Our 3DTrans supported the Semi-Supervised Domain Adaptation (SSDA) for 3D Object Detection (updated on Nov. 2022).
  • Our 3DTrans supported the Active Domain Adaptation (ADA) of 3D Object Detection for achieving a good trade-off between high performance and annotation cost (updated on Oct. 2022).
  • Our 3DTrans supported several typical transfer learning techniques (such as TQS, CLUE, SN, ST3D, Pseudo-labeling, SESS, and Mean-Teacher) for autonomous driving-related model adaptation and transfer.
  • Our 3DTrans supported the Multi-domain Dataset Fusion (MDF) of 3D Object Detection for enabling the existing 3D models to effectively learn from multiple off-the-shelf 3D datasets (updated on Sep. 2022).
  • Our 3DTrans supported the Unsupervised Domain Adaptation (UDA) of 3D Object Detection for deploying a well-trained source model to an unlabeled target domain (updated on July 2022).
  • We calculate the object-size distribution for each public AD dataset in object-size statistics.

We expect this repository will inspire research on 3D model generalization, since it pushes the limits of perception performance. 🗼

Installation for 3DTrans

You may refer to INSTALL.md for the installation of 3DTrans.

Getting Started

Getting Started for ALL Settings
  • Please refer to Readme for Datasets to prepare the datasets and convert the data into the 3DTrans format. Besides, 3DTrans supports reading and writing data from Ceph Petrel-OSS; please refer to Readme for Datasets for more details.

  • Please refer to Readme for UDA for understanding the problem definition of UDA and performing the UDA adaptation process.

  • Please refer to Readme for ADA for understanding the problem definition of ADA and performing the ADA adaptation process.

  • Please refer to Readme for SSDA for understanding the problem definition of SSDA and performing the SSDA adaptation process.

  • Please refer to Readme for MDF for understanding the problem definition of MDF and performing the MDF joint-training process.

  • Please refer to Readme for ReSimAD for ReSimAD implementation.

  • Please refer to Readme for AD-PT Pre-training for starting the journey of 3D perception pre-training using AD-PT.

  • Please refer to Readme for PointContrast Pre-training for 3D perception pre-training using PointContrast.

Model Zoo

We cannot provide the Waymo-related pretrained models due to the Waymo Dataset License Agreement, but you can easily achieve similar performance by training with the corresponding configs.

Domain Transfer Results

UDA Results

Here, we report the cross-dataset (Waymo-to-KITTI) adaptation results using the BEV/3D AP performance as the evaluation metric. Please refer to Readme for UDA for experimental results of more cross-domain settings.

  • All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
  • For Waymo dataset training, we train the model using 20% of the data.
  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • Pre-SN means that we perform the SN (statistical normalization) operation during the source-only pre-training stage.
  • Post-SN means that we perform the SN (statistical normalization) operation during the adaptation stage (a rough sketch of the SN idea is given after the table below).
Model training time Adaptation Car@R40 download
PointPillar ~7.1 hours Source-only with SN 74.98 / 49.31 -
PointPillar ~0.6 hours Pre-SN 81.71 / 57.11 model-57M
PV-RCNN ~23 hours Source-only with SN 69.92 / 60.17 -
PV-RCNN ~23 hours Source-only 74.42 / 40.35 -
PV-RCNN ~3.5 hours Pre-SN 84.00 / 74.57 model-156M
PV-RCNN ~1 hours Post-SN 84.94 / 75.20 model-156M
Voxel R-CNN ~16 hours Source-only with SN 75.83 / 55.50 -
Voxel R-CNN ~16 hours Source-only 64.88 / 19.90 -
Voxel R-CNN ~2.5 hours Pre-SN 82.56 / 67.32 model-201M
Voxel R-CNN ~2.2 hours Post-SN 85.44 / 76.78 model-201M
PV-RCNN++ ~20 hours Source-only with SN 67.22 / 56.50 -
PV-RCNN++ ~20 hours Source-only 67.68 / 20.82 -
PV-RCNN++ ~2.2 hours Post-SN 86.86 / 79.86 model-193M
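
The SN (statistical normalization) step referenced in the notes above can be sketched roughly as follows: shift the source-domain ground-truth box sizes towards the mean object size of the target domain. This is only a minimal illustration of the idea, with hypothetical mean sizes, not the exact 3DTrans implementation.

import numpy as np

def statistical_normalization(gt_boxes, source_mean_size, target_mean_size):
    """Shift source GT box sizes (dx, dy, dz) by the target-minus-source mean size offset."""
    gt_boxes = gt_boxes.copy()
    size_offset = np.asarray(target_mean_size) - np.asarray(source_mean_size)
    gt_boxes[:, 3:6] += size_offset  # boxes are assumed to be [x, y, z, dx, dy, dz, heading]
    return gt_boxes

# Example with dummy numbers: target (KITTI-like) cars are smaller than source (Waymo-like) cars.
source_mean = [4.8, 2.1, 1.8]   # hypothetical mean (l, w, h) of source-domain cars
target_mean = [3.9, 1.6, 1.6]   # hypothetical mean (l, w, h) of target-domain cars
dummy_boxes = np.array([[0.0, 0.0, 0.0, 4.7, 2.0, 1.7, 0.0]])
print(statistical_normalization(dummy_boxes, source_mean, target_mean))
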
ADA Results

Here, we report the Waymo-to-KITTI adaptation results using the BEV/3D AP performance. Please refer to Readme for ADA for experimental results of more cross-domain settings.

  • All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
  • For Waymo dataset training, we train the model using 20% of the data.
  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
Model training time Adaptation Car@R40 download
PV-RCNN ~23h@4 A100 Source Only 67.95 / 27.65 -
PV-RCNN ~1.5h@2 A100 Bi3D (1% annotation budget) 87.12 / 78.03 Model-58M
PV-RCNN ~10h@2 A100 Bi3D (5% annotation budget) 89.53 / 81.32 Model-58M
PV-RCNN ~1.5h@2 A100 TQS 82.00 / 72.04 Model-58M
PV-RCNN ~1.5h@2 A100 CLUE 82.13 / 73.14 Model-50M
PV-RCNN ~10h@2 A100 Bi3D+ST3D 87.83 / 81.23 Model-58M
Voxel R-CNN ~16h@4 A100 Source Only 64.87 / 19.90 -
Voxel R-CNN ~1.5h@2 A100 Bi3D (1% annotation budget) 88.09 / 79.14 Model-72M
Voxel R-CNN ~6h@2 A100 Bi3D (5% annotation budget) 90.18 / 81.34 Model-72M
Voxel R-CNN ~1.5h@2 A100 TQS 78.26 / 67.11 Model-72M
Voxel R-CNN ~1.5h@2 A100 CLUE 81.93 / 70.89 Model-72M
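
As a generic illustration of the "annotation budget" notion used in the ADA setting above, the sketch below scores all target frames with an acquisition function and keeps only the top fraction for annotation. The scoring here is a random placeholder, not Bi3D's actual domainness/uncertainty criterion.

import numpy as np

def select_frames_for_annotation(frame_scores, budget_ratio=0.01):
    """Return indices of the highest-scoring frames within the annotation budget."""
    num_selected = max(1, int(len(frame_scores) * budget_ratio))
    return np.argsort(-frame_scores)[:num_selected]

scores = np.random.rand(1000)                        # placeholder acquisition scores for 1000 target frames
picked = select_frames_for_annotation(scores, 0.01)
print(len(picked))                                   # 10 frames under a 1% budget
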
SSDA Results

We report the target-domain results on Waymo-to-nuScenes adaptation using the BEV/3D AP performance as the evaluation metric, and on Waymo-to-ONCE adaptation using the ONCE evaluation metric. Please refer to Readme for SSDA for experimental results of more cross-domain settings.

  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • For Waymo dataset training, we train the model using 20% of the data.
  • second_5%_FT denotes that we use 5% of the nuScenes training data to fine-tune the SECOND model.
  • second_5%_SESS denotes that we utilize the SESS: Self-Ensembling Semi-Supervised method to adapt our baseline model.
  • second_5%_PS denotes that we fine-tune the source-only model on the nuScenes dataset using 5% labeled data, and perform pseudo-labeling on the remaining 95% unlabeled nuScenes data (the confidence-thresholding step of this pseudo-labeling is sketched after the table below).
Model training time Adaptation Car@R40 download
Second ~11 hours source-only(Waymo) 27.85 / 16.43 -
Second ~0.4 hours second_5%_FT 45.95 / 26.98 model-61M
Second ~1.8 hours second_5%_SESS 47.77 / 28.74 model-61M
Second ~1.7 hours second_5%_PS 47.72 / 29.37 model-61M
PV-RCNN ~24 hours source-only(Waymo) 40.31 / 23.32 -
PV-RCNN ~1.0 hours pvrcnn_5%_FT 49.58 / 34.86 model-150M
PV-RCNN ~5.5 hours pvrcnn_5%_SESS 49.92 / 35.28 model-150M
PV-RCNN ~5.4 hours pvrcnn_5%_PS 49.84 / 35.07 model-150M
PV-RCNN++ ~16 hours source-only(Waymo) 31.96 / 19.81 -
PV-RCNN++ ~1.2 hours pvplus_5%_FT 49.94 / 34.28 model-185M
PV-RCNN++ ~4.2 hours pvplus_5%_SESS 51.14 / 35.25 model-185M
PV-RCNN++ ~3.6 hours pvplus_5%_PS 50.84 / 35.39 model-185M
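
A minimal, self-contained sketch of the confidence-thresholding step used when pseudo-labeling the remaining unlabeled split (the threshold value is illustrative, not the one used in 3DTrans):

import numpy as np

def select_pseudo_labels(pred_boxes, pred_scores, score_thresh=0.6):
    """Keep only high-confidence predicted boxes as pseudo ground truth."""
    keep = pred_scores > score_thresh
    return pred_boxes[keep]

# Dummy predictions: three boxes, two of which are confident enough to keep.
boxes = np.random.rand(3, 7)                       # [x, y, z, dx, dy, dz, heading]
scores = np.array([0.9, 0.3, 0.75])
print(select_pseudo_labels(boxes, scores).shape)   # (2, 7)
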
  • For Waymo-to-ONCE adaptation, we employ 8 NVIDIA A100 GPUs for model training.
  • PS denotes that we pseudo-label the unlabeled ONCE and re-train the model on pseudo-labeled data.
  • SESS denotes that we utilize the SESS method to adapt the baseline.
  • For ONCE, the IoU thresholds for evaluation are 0.7, 0.3, and 0.5 for Vehicle, Pedestrian, and Cyclist, respectively.
Model ONCE Data Methods Vehicle@AP Pedestrian@AP Cyclist@AP download
Centerpoint Labeled (4K) Train from scratch 74.93 46.21 67.36 model-96M
Centerpoint_Pede Labeled (4K) PS - 49.14 - model-96M
PV-RCNN++ Labeled (4K) Train from scratch 79.78 35.91 63.18 model-188M
PV-RCNN++ Small Dataset (100K) SESS 80.02 46.24 66.41 model-188M
MDF Results

Here, we report the Waymo-and-nuScenes consolidation results. The models are jointly trained on the Waymo and nuScenes datasets, evaluated on Waymo using the mAP/mAPH LEVEL_2 metric and on nuScenes using the BEV/3D AP. Please refer to Readme for MDF for more results.

  • All LiDAR-based models are trained with 8 NVIDIA A100 GPUs and are available for download.
  • The multi-domain dataset fusion (MDF) training time is measured with 8 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • For Waymo dataset training, we train the model using 20% of the training data to save training time.
  • PV-RCNN-nuScenes means that we train the PV-RCNN model using only the nuScenes dataset, and PV-RCNN-DM indicates that we merge the Waymo and nuScenes datasets and train on the merged dataset (a toy illustration of this direct merging is given after the tables). Besides, PV-RCNN-DT denotes the domain attention-aware multi-dataset training.
Baseline MDF Methods Waymo@Vehicle Waymo@Pedestrian Waymo@Cyclist nuScenes@Car nuScenes@Pedestrian nuScenes@Cyclist
PV-RCNN-nuScenes only nuScenes 35.59 / 35.21 3.95 / 2.55 0.94 / 0.92 57.78 / 41.10 24.52 / 18.56 10.24 / 8.25
PV-RCNN-Waymo only Waymo 66.49 / 66.01 64.09 / 58.06 62.09 / 61.02 32.99 / 17.55 3.34 / 1.94 0.02 / 0.01
PV-RCNN-DM Direct Merging 57.82 / 57.40 48.24 / 42.81 54.63 / 53.64 48.67 / 30.43 12.66 / 8.12 1.67 / 1.04
PV-RCNN-Uni3D Uni3D 66.98 / 66.50 65.70 / 59.14 61.49 / 60.43 60.77 / 42.66 27.44 / 21.85 13.50 / 11.87
PV-RCNN-DT Domain Attention 67.27 / 66.77 65.86 / 59.38 61.38 / 60.34 60.83 / 43.03 27.46 / 22.06 13.82 / 11.52
Baseline MDF Methods Waymo@Vehicle Waymo@Pedestrian Waymo@Cyclist nuScenes@Car nuScenes@Pedestrian nuScenes@Cyclist
Voxel-RCNN-nuScenes only nuScenes 31.89 / 31.65 3.74 / 2.57 2.41 / 2.37 53.63 / 39.05 22.48 / 17.85 10.86 / 9.70
Voxel-RCNN-Waymo only Waymo 67.05 / 66.41 66.75 / 60.83 63.13 / 62.15 34.10 / 17.31 2.99 / 1.69 0.05 / 0.01
Voxel-RCNN-DM Direct Merging 58.26 / 57.87 52.72 / 47.11 50.26 / 49.50 51.40 / 31.68 15.04 / 9.99 5.40 / 3.87
Voxel-RCNN-Uni3D Uni3D 66.76 / 66.29 66.62 / 60.51 63.36 / 62.42 60.18 / 42.23 30.08 / 24.37 14.60 / 12.32
Voxel-RCNN-DT Domain Attention 66.96 / 66.50 68.23 / 62.00 62.57 / 61.64 60.42 / 42.81 30.49 / 24.92 15.91 / 13.35
Baseline MDF Methods Waymo@Vehicle Waymo@Pedestrian Waymo@Cyclist nuScenes@Car nuScenes@Pedestrian nuScenes@Cyclist
PV-RCNN++-DM Direct Merging 63.79 / 63.38 55.03 / 49.75 59.88 / 58.99 50.91 / 31.46 17.07 / 12.15 3.10 / 2.20
PV-RCNN++-Uni3D Uni3D 68.55 / 68.08 69.83 / 63.60 64.90 / 63.91 62.51 / 44.16 33.82 / 27.18 22.48 / 19.30
PV-RCNN++-DT Domain Attention 68.51 / 68.05 69.81 / 63.58 64.39 / 63.43 62.33 / 44.16 33.44 / 26.94 21.64 / 18.52
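
As a toy illustration of the "Direct Merging" (DM) baseline mentioned in the notes above, samples from both datasets are simply pooled into one training list, with a domain tag retained so that dataset-specific components (e.g. the domain-attention variant) could branch on it. This is only a sketch of the idea, not the actual 3DTrans data pipeline.

from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    frame_id: str
    domain: str  # "waymo" or "nuscenes"

def direct_merge(waymo_frames: List[Frame], nuscenes_frames: List[Frame]) -> List[Frame]:
    """Pool the two datasets into a single training list (the DM baseline)."""
    return waymo_frames + nuscenes_frames

merged = direct_merge([Frame("w_000", "waymo")], [Frame("n_000", "nuscenes")])
print(len(merged), merged[0].domain, merged[1].domain)   # 2 waymo nuscenes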

3D Pre-training Results

AD-PT Results on Waymo

AD-PT demonstrates strong generalization ability on 3D point clouds. We first pre-train the 3D and 2D backbones using AD-PT on the ONCE dataset (from 100K to 1M frames), and then fine-tune the model on different datasets. Here, we report the results of fine-tuning on Waymo.

Model Data amount Overall Vehicle Pedestrian Cyclist
SECOND (From scratch) 3% 52.00 / 37.70 58.11 / 57.44 51.34 / 27.38 46.57 / 28.28
SECOND (AD-PT) 3% 55.41 / 51.78 60.53 / 59.93 54.91 / 45.78 50.79 / 49.65
SECOND (From scratch) 20% 60.62 / 56.86 64.26 / 63.73 59.72 / 50.38 57.87 / 56.48
SECOND (AD-PT) 20% 61.26 / 57.69 64.54 / 64.00 60.25 / 51.21 59.00 / 57.86
CenterPoint (From scratch) 3% 59.00 / 56.29 57.12 / 56.57 58.66 / 52.44 61.24 / 59.89
CenterPoint (AD-PT) 3% 61.21 / 58.46 60.35 / 59.79 60.57 / 54.02 62.73 / 61.57
CenterPoint (From scratch) 20% 66.47 / 64.01 64.91 / 64.42 66.03 / 60.34 68.49 / 67.28
CenterPoint (AD-PT) 20% 67.17 / 64.65 65.33 / 64.83 67.16 / 61.20 69.39 / 68.25
PV-RCNN++ (From scratch) 3% 63.81 / 61.10 64.42 / 63.93 64.33 / 57.79 62.69 / 61.59
PV-RCNN++ (AD-PT) 3% 68.33 / 65.69 68.17 / 67.70 68.82 / 62.39 68.00 / 67.00
PV-RCNN++ (From scratch) 20% 69.97 / 67.58 69.18 / 68.75 70.88 / 65.21 69.84 / 68.77
PV-RCNN++ (AD-PT) 20% 71.55 / 69.23 70.62 / 70.19 72.36 / 66.82 71.69 / 70.70

ReSimAD

ReSimAD Implementation

Here, we provide the download link of our reconstruction-simulation dataset generated by ReSimAD, consisting of nuScenes-like, KITTI-like, ONCE-like, and Waymo-like datasets of target-domain-like simulated points.

Specifically, please refer to ReSimAD reconstruction for the point-based reconstruction meshes, and to PCSim for the technical details of simulating the target-domain-like points based on the reconstructed meshes. For the perception module, please refer to PV-RCNN and PV-RCNN++ for model training and evaluation.

We report the zero-shot cross-dataset (Waymo-to-nuScenes) adaptation results using the BEV/3D AP performance as the evaluation metric for a fair comparison. Please refer to ReSimAD for more details.

Methods training time Adaptation Car@R40 Ckpt
PV-RCNN ~23 hours Source-only 31.02 / 17.75 Not Available (Waymo License)
PV-RCNN ~8 hours ST3D 36.42 / 22.99 -
PV-RCNN ~8 hours ReSimAD 37.85 / 21.33 ReSimAD_ckpt
PV-RCNN++ ~20 hours Source-only 29.93 / 18.77 Not Available (Waymo License)
PV-RCNN++ ~2.2 hours ST3D 34.68 / 17.17 -
PV-RCNN++ ~8 hours ReSimAD 40.73 / 23.72 ReSimAD_ckpt

Visualization Tools for 3DTrans

  • Our 3DTrans supports the sequence-level visualization function Quick Sequence Demo to continuously display the prediction results and ground truth of a selected scene.
Visualization Demo

Acknowledgements

  • Our code is heavily based on OpenPCDet v0.5.2. Thanks to the OpenPCDet Development Team for their awesome codebase.
  • A team homepage for member information and profiles: Project Link

Technical Papers

@inproceedings{zhang2023uni3d,
  title={Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection},
  author={Zhang, Bo and Yuan, Jiakang and Shi, Botian and Chen, Tao and Li, Yikang and Qiao, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9253--9262},
  year={2023}
}
@inproceedings{yuan2023bi3d,
  title={Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection},
  author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15599--15608},
  year={2023}
}
@inproceedings{yuan2023AD-PT,
  title={AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset},
  author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}
@inproceedings{huang2023sug,
  title={SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification},
  author={Huang, Siyuan and Zhang, Bo and Shi, Botian and Gao, Peng and Li, Yikang and Li, Hongsheng},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}
@inproceedings{zhang2023resimad,
  title={ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation},
  author={Zhang, Bo and Cai, Xinyu and Yuan, Jiakang and Yang, Donglin and Guo, Jianfei and Xia, Renqiu and Shi, Botian and Dou, Min and Chen, Tao and Liu, Si and others},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
@article{yan2023spot,
  title={SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving},
  author={Yan, Xiangchao and Chen, Runjian and Zhang, Bo and Yuan, Jiakang and Cai, Xinyu and Shi, Botian and Shao, Wenqi and Yan, Junchi and Luo, Ping and Qiao, Yu},
  journal={arXiv preprint arXiv:2309.10527},
  year={2023}
}

3dtrans's People

Contributors

bobrown, jiakangyuan, runjian-chen


3dtrans's Issues

Question about AD-PT

Can you provide the code of Point-to-Beam Playback Re-sampling used in AD-PT?

About statistical alignment

Meaningful work focusing on multi-dataset training and pre-training!
I am quite interested in the statistical alignment module mentioned in your paper Uni3D, since this module can bring a big improvement in model performance. In which step can I find its code snippet?

How can 3DTrans be utilized to train and detect on a custom point cloud dataset?

Hi!
First of all, thank you for applying transfer learning and active learning to the detection task of point cloud data. This will be a very good approach and strategy. However, the projects you have showcased are only tested on a few publicly available benchmark datasets, yielding test results. Can you tell me how to use 3DTrans to train and test my own dataset? How can I utilize this project to further improve the detection capabilities of existing models such as CenterPoint and PV-RCNN++? Can you provide me with some practical methods, detailed steps, and suggestions that can be implemented?
Thank you!

About training Bi3D Adaptation stage 1: active source domain data.

Hello. I'm running Bi3D Adaptation stage 1. After training the first epoch, the algorithm begins the Active Evaluate step. However, after the evaluation finishes, I noticed that the GPU memory usage is abnormal: my hardware is an RTX 4090 (24 GB), and when the evaluation is done the program reports "CUDA error: out of memory". I wonder whether you have ever met this problem before and how you solved it.
What's more, the following line

source_list = active_learning_utils.get_dataset_list(source_file_path, oss=True, waymo=waymo_source, sample_interval=sample_interval)

sets oss to True, which seems not suitable for my setup, so I set it to False.

Welcome to join us

Our team aims to broaden the boundaries of Autonomous Driving (AD) perception models, trying to find unified representations that can be generalized across different AD domains and scenarios. If you are interested in unified representation learning for AD perception models, please do not hesitate to contact us.


No module named ‘torch_scatter’

When I run the code using the following script, I met an error: ModuleNotFoundError: No module named 'torch_scatter'

sh ./scripts/UDA/dist_train_uda.sh 4 --cfg_file ./cfgs/DA/waymo_kitti/voxel_rcnn_pre_SN_feat_3.yaml

I have installed the code following INSTALL.md. Could you please tell me how to fix this error?
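
One common cause is simply that torch-scatter is not installed in the environment. A hedged suggestion (this command is not part of INSTALL.md) is to install a wheel matching the installed PyTorch/CUDA build, e.g. for PyTorch 1.8.1 with CUDA 11.1:

pip install torch-scatter -f https://data.pyg.org/whl/torch-1.8.1+cu111.html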

Creating Waymo infos uses huge amounts of RAM

Hello @BOBrown !

I am trying to create the required Waymo PKL files for fine-tuning PointContrast on the Waymo dataset.
I have followed the instructions to download the Waymo open dataset, and I am trying to run the following command:

python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos --cfg_file tools/cfgs/dataset_configs/waymo/OD/waymo_dataset.yaml

However, when running this, the program prints

---------------The waymo sample interval is 1, total sequecnes is 14-----------------

Then TensorFlow runs, but it runs out of RAM.

I have tried running the script with the following configuration:

Two NVIDIA A100 GPUs with 80 GB of RAM each
Tensorflow 2.4.0
CUDA 11.1
cuDNN/8.0.4.3
Torch 1.8.1 cu111
Waymo-open-dataset-tf-2-4-0

The error message occurs in the waymo_utils file at the line
---------------Start to generate data infos---------------

0%| | 0/14 [00:00<?, ?it/s]
100%|██████████| 14/14 [00:00<00:00, 1197.61it/s]
2023-11-29 13:24:38,993 waymo_dataset.py include_waymo_data 105 INFO Total skipped info 14
2023-11-29 13:24:38,993 waymo_dataset.py include_waymo_data 106 INFO Total samples for Waymo dataset: 0
2023-11-29 13:24:39.125843: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
---------------The waymo sample interval is 1, total sequecnes is 14-----------------

After this, it starts the GPUs and I run out of RAM.

What is the reason for this?

Question on generalization study on Uni3D

In Table 8 of Uni3D, a generalization study is conducted by evaluating zero-shot detection accuracy on KITTI. Since two separate detection heads are trained and dual-BN is leveraged for nuScenes and Waymo during pre-training, what are the implementation details for conducting zero-shot detection on KITTI, and which detection head is used?

Using ckpt for pretraining

Hello!

I am currently working on reproducing the results from the AD-PT paper.
To save some time, I am trying to include a ckpt file when running the PointContrast pre-training script.

sh scripts/PRETRAIN/dist_train_pointcontrast.sh 2 \
--cfg_file ./cfgs/once_models/unsupervised_model/pointcontrast_pvrcnn_res_plus_backbone.yaml \
--batch_size 4 \
--epochs 4 \
--ckpt once_1M_ckpt.pth

However, the model fails when trying to load the model state, giving me the following error:

Traceback (most recent call last):
  File "train_pointcontrast.py", line 212, in <module>
    main()
  File "train_pointcontrast.py", line 140, in main
    it, start_epoch = model.load_params_with_optimizer(args.ckpt, to_cpu=dist_train, optimizer=optimizer, logger=logger)
  File "../pcdet/models/detectors/detector3d_template.py", line 403, in load_params_with_optimizer
    self._load_state_dict(checkpoint['model_state'], strict=True)
KeyError: 'model_state'

Do I need to save my model state before applying the checkpoint?
Do you have a solution for this problem, @BOBrown?
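
A hedged workaround (not necessarily the repo's intended fix): if the checkpoint was saved as a bare state dict rather than a {'model_state': ...} dictionary, it can be wrapped before loading, for example:

import torch

checkpoint = torch.load("once_1M_ckpt.pth", map_location="cpu")  # the checkpoint mentioned above
if "model_state" not in checkpoint:
    checkpoint = {"model_state": checkpoint}   # treat the whole file as the model state dict
# model.load_state_dict(checkpoint["model_state"], strict=False)  # then load as usual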

Where are the key codes for Coordinate-origin Alignment?

Hello, I want to debug the code to better understand the Uni3D paper. However, I cannot find the key code for Coordinate-origin Alignment mentioned in Uni3D. I found that the batch_info of the two datasets are directly concatenated together as batch1 and batch2, without Coordinate-origin Alignment (maybe I missed the code). Could you please tell me where the key code for Coordinate-origin Alignment is? Thanks a lot!

Questions on Training Uni3D

Hello!
If I want to reproduce the experimental results in Table 7 of Uni3D, how can I adjust the number of samples in the nuScenes dataset, and where is the script for training on one dataset only?

Waymo evaluation issue

Hello, thanks for the great work!
During Waymo dataset evaluation, I encountered 0 mAP and 0 GT objects.
I searched the OpenPCDet issues; the author says that Waymo 1.0.0 is incompatible with the evaluation and that they use the Waymo 1.2.0 dataset by default, referring to this: open-mmlab/OpenPCDet#1102

However, in your AD-PT paper, you mentioned that you used Waymo 1.0.0 as your experiment setting. As your codebase is based on OpenPCDet, may I ask if you also encountered this 0 AP error during Waymo evaluation? Is Waymo 1.0.0 okay for evaluation?

Running pre-training script with point contrast

Hello!

I am trying to run the pre-training script listed in the codebase documentation.
I am getting the following error message when trying to run the script:

Script:

sh scripts/PRETRAIN/dist_train_pointcontrast.sh 2 \
--cfg_file ./cfgs/once_models/unsupervised_model/pointcontrast_pvrcnn_res_plus_backbone.yaml \
--batch_size 2 \
--epochs 30

Error:

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 1 (pid: 506209) of binary: /cluster/home/martiiv/deeplearningproject/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group

I am using:

  • Two Nvidia A100 GPUS
  • CUDA 11.1
  • GCC 10.2
  • Python 3.8.6
  • Pytorch Version: 1.9.0+cu111

Has anyone else encountered this error?

Full message:

+ NGPUS=2
+ PY_ARGS='--cfg_file ./cfgs/once_models/unsupervised_model/pointcontrast_pvrcnn_res_plus_backbone.yaml --batch_size 2 --epochs 15'
+ true
+ PORT=38966
++ nc -z 127.0.0.1 38966
++ echo 1
+ status=1
+ '[' 1 '!=' 0 ']'
+ break
+ echo 38966
38966
+ python -m torch.distributed.launch --nproc_per_node=2 --master_port=38966 train_pointcontrast.py --launcher pytorch --cfg_file ./cfgs/once_models/unsupervised_model/pointcontrast_pvrcnn_res_plus_backbone.yaml --batch_size 2 --epochs 15
    /cluster/home/martiiv/deeplearningproject/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
    logger.warn(
    The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
    WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
    Please read local_rank from os.environ('LOCAL_RANK') instead.
    INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
    entrypoint : train_pointcontrast.py
    min_nodes : 1
    max_nodes : 1
    nproc_per_node : 2
    run_id : none
    rdzv_backend : static
    rdzv_endpoint : 127.0.0.1:38966
    rdzv_configs : {'rank': 0, 'timeout': 900}
    max_restarts : 3
    monitor_interval : 5
    log_dir : None
    metrics_cfg : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_eeetvw03/none_ji9a7o3n
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/cluster/home/martiiv/deeplearningproject/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=38966
group_rank=0
group_world_size=1
local_ranks=[0, 1]
role_ranks=[0, 1]
global_ranks=[0, 1]
role_world_sizes=[2, 2]
global_world_sizes=[2, 2]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_eeetvw03/none_ji9a7o3n/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_eeetvw03/none_ji9a7o3n/attempt_0/1/error.json
program started
program started
2023-11-13 12:58:15,729 train_pointcontrast.py main 91 INFO Start logging
2023-11-13 12:58:15,730 train_pointcontrast.py main 93 INFO CUDA_VISIBLE_DEVICES=0,1
2023-11-13 12:58:15,730 train_pointcontrast.py main 96 INFO total_batch_size: 2
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO cfg_file ./cfgs/once_models/unsupervised_model/pointcontrast_pvrcnn_res_plus_backbone.yaml
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO batch_size 1
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO epochs 15
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO workers 8
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO extra_tag default
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO ckpt None
2023-11-13 12:58:15,730 train_pointcontrast.py main 98 INFO pretrained_model None
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO launcher pytorch
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO tcp_port 18888
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO sync_bn False
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO fix_random_seed False
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO ckpt_save_interval 1
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO local_rank 0
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO max_ckpt_save_num 30
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO merge_all_iters_to_one_epoch False
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO set_cfgs None
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO max_waiting_mins 0
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO start_epoch 0
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO num_epochs_to_eval 0
2023-11-13 12:58:15,731 train_pointcontrast.py main 98 INFO save_to_file False
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.ROOT_DIR: /cluster/home/martiiv/DeepLearningProject/3DTrans
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.LOCAL_RANK: 0
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.CLASS_NAMES: ['Vehicle', 'Pedestrian', 'Cyclist']
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.USE_PRETRAIN_MODEL: False
2023-11-13 12:58:15,731 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG = edict()
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATASET: ONCEDataset
2023-11-13 12:58:15,731 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_PATH: ../data/once
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.LABELED_RATIO: 0
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.POINT_CLOUD_RANGE: [-75.2, -75.2, -5.0, 75.2, 75.2, 3.0]
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.VOXEL_SIZE: [0.1, 0.1, 0.2]
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.UNLABELED_DATA_FOR: ['teacher', 'student']
2023-11-13 12:58:15,732 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.INFO_PATH = edict()
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.train: ['once_infos_train_vehicle.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.val: ['once_infos_val_vehicle.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.test: ['once_infos_test.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.raw_small: ['once_infos_raw_small.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.raw_medium: ['once_infos_raw_medium.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.INFO_PATH.raw_large: ['once_infos_raw_large.pkl']
2023-11-13 12:58:15,732 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.DATA_SPLIT = edict()
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_SPLIT.train: train
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_SPLIT.test: val
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_SPLIT.raw: raw_small
2023-11-13 12:58:15,732 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.POINT_FEATURE_ENCODING = edict()
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.encoding_type: absolute_coordinates_encoding
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.used_feature_list: ['x', 'y', 'z', 'intensity']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.POINT_FEATURE_ENCODING.src_feature_list: ['x', 'y', 'z', 'intensity']
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_PROCESSOR: [{'NAME': 'mask_points_and_boxes_outside_range', 'REMOVE_OUTSIDE_BOXES': True}, {'NAME': 'shuffle_points', 'SHUFFLE_ENABLED': {'train': True, 'test': False}}, {'NAME': 'transform_points_to_voxels', 'VOXEL_SIZE': [0.1, 0.1, 0.2], 'MAX_POINTS_PER_VOXEL': 5, 'MAX_NUMBER_OF_VOXELS': {'train': 60000, 'test': 60000}}]
2023-11-13 12:58:15,732 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.DATA_AUGMENTOR = edict()
2023-11-13 12:58:15,732 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_AUGMENTOR.DISABLE_AUG_LIST: ['placeholder']
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.DATA_AUGMENTOR.AUG_CONFIG_LIST: [{'NAME': 'gt_sampling', 'USE_ROAD_PLANE': False, 'DB_INFO_PATH': ['once_dbinfos_train_vehicle.pkl'], 'PREPARE': {'filter_by_min_points': ['Car:5', 'Bus:5', 'Truck:5', 'Pedestrian:5', 'Cyclist:5']}, 'SAMPLE_GROUPS': ['Car:1', 'Bus:4', 'Truck:3', 'Pedestrian:2', 'Cyclist:2'], 'NUM_POINT_FEATURES': 4, 'REMOVE_EXTRA_WIDTH': [0.0, 0.0, 0.0], 'LIMIT_WHOLE_SCENE': True}, {'NAME': 'random_world_flip', 'ALONG_AXIS_LIST': ['x', 'y']}, {'NAME': 'random_world_rotation', 'WORLD_ROT_ANGLE': [-0.78539816, 0.78539816]}, {'NAME': 'random_world_scaling', 'WORLD_SCALE_RANGE': [0.95, 1.05]}]
2023-11-13 12:58:15,733 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.TEACHER_AUGMENTOR = edict()
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.TEACHER_AUGMENTOR.DISABLE_AUG_LIST: ['random_world_scaling']
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.TEACHER_AUGMENTOR.AUG_CONFIG_LIST: [{'NAME': 'random_world_scaling', 'WORLD_SCALE_RANGE': [0.95, 1.05]}]
2023-11-13 12:58:15,733 config.py log_config_to_file 10 INFO
cfg.DATA_CONFIG.STUDENT_AUGMENTOR = edict()
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.STUDENT_AUGMENTOR.DISABLE_AUG_LIST: ['placeholder']
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.STUDENT_AUGMENTOR.AUG_CONFIG_LIST: [{'NAME': 'random_world_flip', 'ALONG_AXIS_LIST': ['x', 'y']}, {'NAME': 'random_world_rotation', 'WORLD_ROT_ANGLE': [-0.78539816, 0.78539816]}, {'NAME': 'random_world_scaling', 'WORLD_SCALE_RANGE': [0.95, 1.05]}]
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.BASE_CONFIG: cfgs/dataset_configs/once/PRETRAIN/unsupervised_once_dataset.yaml
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.DATA_CONFIG.USE_PAIR_PROCESSOR: True
2023-11-13 12:58:15,733 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION = edict()
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.NUM_EPOCHS: 15
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.OPTIMIZER: adam_onecycle
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LR: 0.001
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.WEIGHT_DECAY: 0.01
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.MOMENTUM: 0.9
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.MOMS: [0.95, 0.85]
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.PCT_START: 0.4
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.DIV_FACTOR: 10
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.DECAY_STEP_LIST: [35, 45]
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LR_DECAY: 0.1
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LR_CLIP: 1e-07
2023-11-13 12:58:15,733 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LR_WARMUP: False
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.WARMUP_EPOCH: -1
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.GRAD_NORM_CLIP: 10
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION.LOSS_CFG = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.POS_THRESH: 0.1
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.NEG_THRESH: 1.4
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv3 = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv3.DOWNSAMPLE_FACTOR: 4
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv3.POOL_RADIUS: [1.2]
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv3.NSAMPLE: [16]
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv4 = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv4.DOWNSAMPLE_FACTOR: 8
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv4.POOL_RADIUS: [2.4]
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.SA_LAYER.x_conv4.NSAMPLE: [16]
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.FEATURES_SOURCE: ['bev']
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.POINT_SOURCE: raw_points
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.NUM_KEYPOINTS: 2048
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.LOSS_CFG.NUM_NEGATIVE_KEYPOINTS: 1024
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.OPTIMIZATION.TEST = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.OPTIMIZATION.TEST.BATCH_SIZE_PER_GPU: 4
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.MODEL = edict()
2023-11-13 12:58:15,734 config.py log_config_to_file 13 INFO cfg.MODEL.NAME: PVRCNN_PLUS_BACKBONE
2023-11-13 12:58:15,734 config.py log_config_to_file 10 INFO
cfg.MODEL.VFE = edict()
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.MODEL.VFE.NAME: MeanVFE
2023-11-13 12:58:15,735 config.py log_config_to_file 10 INFO
cfg.MODEL.BACKBONE_3D = edict()
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.MODEL.BACKBONE_3D.NAME: VoxelResBackBone8x
2023-11-13 12:58:15,735 config.py log_config_to_file 10 INFO
cfg.MODEL.MAP_TO_BEV = edict()
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.MODEL.MAP_TO_BEV.NAME: HeightCompression
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.MODEL.MAP_TO_BEV.NUM_BEV_FEATURES: 256
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.TAG: pointcontrast_pvrcnn_res_plus_backbone
2023-11-13 12:58:15,735 config.py log_config_to_file 13 INFO cfg.EXP_GROUP_PATH: cfgs/once_models/unsupervised_model
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 1 (pid: 506209) of binary: /cluster/home/martiiv/deeplearningproject/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=1
master_addr=127.0.0.1
master_port=38966
group_rank=0
group_world_size=1
local_ranks=[0, 1]
role_ranks=[0, 1]
global_ranks=[0, 1]
role_world_sizes=[2, 2]
global_world_sizes=[2, 2]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_eeetvw03/none_ji9a7o3n/attempt_1/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_eeetvw03/none_ji9a7o3n/attempt_1/1/error.json
program started
program started

About ReSimAD

We are very interested in your work on ReSimAD, but we have some questions we would like to consult you about. We downloaded the KITTI-like dataset, but found through KITTI's projection method that, according to the label file you provided, the 3D boxes do not enclose the target objects very well (see the attached picture).

We cannot figure out what goes wrong in this process. Hope to get your reply.

About evaluation of source-pretrained model.

Thanks for your great job. I found some issues with the evaluation of the source-pretrained model in GETTING_STARTED_ADA.md. I noticed that some of the yaml files don't include DATA_CONFIG_TAR. For example, I was testing with the following command:

bash scripts/dist_test.sh 4 --cfg_file cfgs/DA/nusc_kitti/source_only/voxel_rcnn_feat_3_vehi.yaml --ckpt ../output/DA/nusc_kitti/source_only/voxel_rcnn_feat_3_vehi/default/ckpt/checkpoint_epoch_30.pth

However, as there is no DATA_CONFIG_TAR in cfgs/DA/nusc_kitti/source_only/voxel_rcnn_feat_3_vehi.yaml, this command will test the performance on the source dataset instead of the target dataset, which is not my intention.

Using the merge_labels.py file

Hello!

I am trying to pretrain using AD-PT and I have encountered a problem.
I am aware that after you configure the ONCE dataset using this command:

python -m pcdet.datasets.once.once_dataset --func create_once_infos --cfg_file tools/cfgs/dataset_configs/once/OD/once_dataset.yaml

I realized later that you can't pre-train using the produced files, so I needed to merge the labels of the produced files using merge_labels.py located in the tools_utils folder.

When converting the files, I get a KeyError: 'Vehicle' when trying to run the pre-training script.

You need to merge the labels in the once_infos_train.pkl, once_dbinfos_train.pkl, and once_dbinfos_val.pkl files to produce the once_infos_train_vehicle.pkl files. However, when trying to merge the labels, I end up with an identical file.
I am using the following command

python -m tools.tools_utils.merge_labels --raw_data_pkl once_dbinfos_train.pkl --save_path once_dbinfos_vehicle.pkl

I think I need to provide something via the --vehicle_pkl argument, as it reads from a vehicle.pkl file, but I don't know what type of file I need to provide and/or where to get it from!

Has anyone encountered this problem before?

Pretraining on NuScenes

I'm currently working on pretraining the AD-PT model using the NuScenes dataset, but I've hit a few roadblocks and could really use some help. Here's where I'm at:

Following the guide, I have acquired the following files:

  • nuscenes_dbinfos_10sweeps_withvelo.pkl
  • nuscenes_infos_10sweeps_train.pkl
  • nuscenes_infos_10sweeps_val.pkl

(However, due to an error, these files had to be moved from 3DTrans/data/nuscenes/v1.0-trainval/ to 3DTrans/data/nuscenes/.)

When running the script:

sh scripts/PRETRAIN/dist_train_pointcontrast.sh 2 --cfg_file cfgs/nuscenes_models/cbgs_dyn_pp_centerpoint.yaml --batch_size 4 --epochs 30

I received this error:

  File "../pcdet/datasets/nuscenes/nuscenes_semi_dataset.py", line 114, in split_nuscenes_semi_data
    raw_split = data_splits['raw']
KeyError: 'raw'

As far as I can see, the data_splits are: {'train': 'train', 'test': 'test'}

After figuring this out, I modified the code (file: nuscenes_semi_dataset.py) so that it only runs if data_splits contains 'raw':


raw_split = data_splits.get('raw')
if raw_split:
    for info_path in info_paths[raw_split]:
        if oss_path is None:
            info_path = root_path / info_path
            with open(info_path, 'rb') as f:
                infos = pickle.load(f)
                nuscenes_unlabeled_infos.extend(copy.deepcopy(infos))
        else:
            info_path = os.path.join(oss_path, info_path)
            pkl_bytes = client.get(info_path, update_cache=True)
            infos = pickle.load(io.BytesIO(pkl_bytes))
            nuscenes_unlabeled_infos.extend(copy.deepcopy(infos))

Doing this removed the error. However, I then received this error:

Traceback (most recent call last):
  File "train_pointcontrast.py", line 206, in <module>
    main()
  File "train_pointcontrast.py", line 112, in main
    datasets, dataloaders, samplers = build_unsupervised_dataloader(
  File "../pcdet/datasets/__init__.py", line 301, in build_unsupervised_dataloader
    unlabeled_dataset = _semi_dataset_dict[dataset_cfg.DATASET]['UNLABELED_PAIR'](
KeyError: 'UNLABELED_PAIR'



Looking into this error, I saw that this key is not present in the NuScenes entry, as _semi_dataset_dict looked like this:


_semi_dataset_dict = {
    'ONCEDataset': {
        'PARTITION_FUNC': split_once_semi_data,
        'PRETRAIN': ONCEPretrainDataset,
        'LABELED': ONCELabeledDataset,
        'UNLABELED': ONCEUnlabeledDataset,
        'UNLABELED_PAIR': ONCEUnlabeledPairDataset,
        'TEST': ONCETestDataset
    },
    'NuScenesDataset': {
        'PARTITION_FUNC': split_nuscenes_semi_data,
        'PRETRAIN': NuScenesPretrainDataset,
        'LABELED': NuScenesLabeledDataset,
        'UNLABELED': NuScenesUnlabeledDataset,
        'TEST': NuScenesTestDataset
    },
    'KittiDataset': {
        'PARTITION_FUNC': split_kitti_semi_data,
        'PRETRAIN': KittiPretrainDataset,
        'LABELED': KittiLabeledDataset,
        'UNLABELED': KittiUnlabeledDataset,
        'TEST': KittiTestDataset
    }
}

I then added a condition so that the code in __init__.py only runs if 'UNLABELED_PAIR' is in the dataset entry (file: pcdet/datasets/__init__.py):

if 'UNLABELED_PAIR' in _semi_dataset_dict[dataset_cfg.DATASET]:
    unlabeled_dataset = _semi_dataset_dict[dataset_cfg.DATASET]['UNLABELED_PAIR'](
        dataset_cfg=dataset_cfg,
        class_names=class_names,
        infos=unlabeled_infos,
        root_path=root_path,
        logger=logger,
    )

Then this happened:

Traceback (most recent call last):
  File "train_pointcontrast.py", line 206, in <module>
2023-11-10 13:42:20,443 nuscenes_semi_dataset.py split_nuscenes_semi_data 130  INFO  Total samples for nuscenes testing dataset: 0
2023-11-10 13:42:20,443 nuscenes_semi_dataset.py split_nuscenes_semi_data 131  INFO  Total samples for nuscenes labeled dataset: 0
2023-11-10 13:42:20,443 nuscenes_semi_dataset.py split_nuscenes_semi_data 132  INFO  Total samples for nuscenes unlabeled dataset: 0
Traceback (most recent call last):
  File "train_pointcontrast.py", line 206, in <module>
    main()
      File "train_pointcontrast.py", line 112, in main
main()
  File "train_pointcontrast.py", line 112, in main
    datasets, dataloaders, samplers = build_unsupervised_dataloader(
  File "../pcdet/datasets/__init__.py", line 312, in build_unsupervised_dataloader
    datasets, dataloaders, samplers = build_unsupervised_dataloader(
  File "../pcdet/datasets/__init__.py", line 312, in build_unsupervised_dataloader
    unlabeled_sampler = torch.utils.data.distributed.DistributedSampler(unlabeled_dataset)
UnboundLocalError: local variable 'unlabeled_dataset' referenced before assignment
    unlabeled_sampler = torch.utils.data.distributed.DistributedSampler(unlabeled_dataset)
UnboundLocalError: local variable 'unlabeled_dataset' referenced before assignment

Any ideas on how to tackle these errors?

TypeError: __init__() missing 3 required positional arguments: 'num_class_s2', 'dataset_s2', and 'source_one_name'

When I run the following command,
python /tools/train_multi_db_merge_loss.py --cfg_file ./cfgs/MDF/nusc_kitti/nusc_kitti_voxel_rcnn_feat_3_uni3d.yaml
I encounter the following problem.
Can you help me solve it? Thank you very much.
Traceback (most recent call last):
  File "/home/hyh/westdigital_dataset/3DTrans/tools/train_multi_db_merge_loss.py", line 268, in <module>
    main()
  File "/home/hyh/westdigital_dataset/3DTrans/tools/train_multi_db_merge_loss.py", line 135, in main
    model = build_network(model_cfg=cfg.MODEL, num_class=len(cfg.CLASS_NAMES), dataset=source_set)
  File "/home/hyh/westdigital_dataset/3DTrans/tools/../pcdet/models/__init__.py", line 16, in build_network
    model = build_detector(
  File "/home/hyh/westdigital_dataset/3DTrans/tools/../pcdet/models/detectors/__init__.py", line 73, in build_detector
    model = __all__[model_cfg.NAME](
TypeError: __init__() missing 3 required positional arguments: 'num_class_s2', 'dataset_s2', and 'source_one_name'


Error encountered when trying to train Voxel-RCNN using Uni-3D

I tried to train Voxel R-CNN using Uni3D as instructed in the readme files, but encountered the following error:

Exception|implicit_gemm]feat=torch.Size([531550, 32]),w=torch.Size([32, 3, 3, 3, 32]),pair=torch.Size([27, 784744]),act=784744,issubm=True,istrain=True
SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
epochs:   0%|                                            | 0/30 [00:08<?, ?it/s]
Traceback (most recent call last):
  File "/home/lipw/3DTrans-master/tools/train_multi_db.py", line 261, in <module>
    main()
  File "/home/lipw/3DTrans-master/tools/train_multi_db.py", line 210, in main
    train_func(
  File "/home/lipw/3DTrans-master/tools/train_utils/train_multi_db_utils.py", line 174, in train_model
    accumulated_iter = train_one_epoch(
  File "/home/lipw/3DTrans-master/tools/train_utils/train_multi_db_utils.py", line 59, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/home/lipw/3DTrans-master/tools/../pcdet/models/__init__.py", line 63, in model_func
    ret_dict, tb_dict, disp_dict = model(batch_dict, **forward_args)
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lipw/3DTrans-master/tools/../pcdet/models/detectors/voxel_rcnn.py", line 61, in forward
    batch_dict = cur_module(batch_dict)
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lipw/3DTrans-master/tools/../pcdet/models/backbones_3d/spconv_backbone_unibn.py", line 211, in forward
    t_conv2_2 = self.conv2_2(t_conv2_1)
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/modules.py", line 138, in forward
    input = module(input)
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/modules.py", line 138, in forward
    input = module(input)
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 755, in forward
    return self._conv_forward(self.training,
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 456, in _conv_forward
    out_features = Fsp.implicit_gemm(
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 106, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/functional.py", line 224, in forward
    raise e
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/functional.py", line 210, in forward
    out, mask_out, mask_width = ops.implicit_gemm(
  File "/home/lipw/miniconda3/envs/3dtrans/lib/python3.10/site-packages/spconv/pytorch/ops.py", line 1513, in implicit_gemm
    mask_width, tune_res_cpp = ConvGemmOps.implicit_gemm(
RuntimeError: /io/build/temp.linux-x86_64-cpython-310/spconv/build/core_cc/src/cumm/conv/main/ConvMainUnitTest/ConvMainUnitTest_matmul_split_Simt_f32f32f32_0.cu:1047
cuda execution failed with error 700 an illegal memory access was encountered
Simt_f32f32f32f32f32tnt_m32n128k16m32n32k8A1_200_C301LLL_SK error with params [531550, 32] [32, 27, 32] [784744, 32] [27, 784744] [784744, 1] [784744] [] -1

Training PV-RCNN+ using Uni3D or training vanilla Voxel R-CNN works fine though.

A problem about GPUs .

Hello, I met a problem when I ran your program. I am using the KITTI and nuScenes-mini datasets, but when I run the source-only command sh scripts/dist_train.sh 8 --cfg_file ./cfgs/DA/nusc_kitti/source_only/pvrcnn_old_anchor_sn_kitti.yaml, my 8 NVIDIA RTX A5000 GPUs reach 100% utilization while less than half of the memory is used; there are no error messages and nothing happens. I adjusted BATCH_SIZE_PER_GPU=1, but it still does not work.

Groundtruth dimension mismatched

Dear author,

Thank you for your great work. I am trying to reproduce your result (MDF) with the Waymo and nuScenes datasets, but when merging two batches from Waymo and nuScenes, an error occurred:

    batch = common_utils.merge_two_batch_dict(batch_1, batch_2)
  File "../pcdet/utils/common_utils.py", line 670, in merge_two_batch_dict
    batch_merge_dict[key] = np.concatenate(tar_list_merge, axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 2, the array at index 0 has size 10 and the array at index 1 has size 8

It seems the GT boxes from nuScenes are mismatched with the GT boxes from Waymo.

Could you tell me how you avoid such an issue?

B.R
Thank you
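
As a rough illustration (not necessarily what merge_two_batch_dict actually does), the usual way to make such gt_boxes arrays concatenable is to zero-pad the narrower one along the last dimension before calling np.concatenate:

import numpy as np

def pad_last_dim(arr, target_dim):
    """Zero-pad the last dimension of a (B, N, C) gt_boxes array up to target_dim."""
    pad_width = [(0, 0)] * (arr.ndim - 1) + [(0, target_dim - arr.shape[-1])]
    return np.pad(arr, pad_width, mode="constant")

# Dummy example: Waymo-style boxes with 10 attributes vs. nuScenes-style boxes with 8.
waymo_gt = np.zeros((2, 5, 10))
nusc_gt = np.zeros((2, 5, 8))
target = max(waymo_gt.shape[-1], nusc_gt.shape[-1])
merged = np.concatenate([pad_last_dim(waymo_gt, target), pad_last_dim(nusc_gt, target)], axis=0)
print(merged.shape)  # (4, 5, 10)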

KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"

Excuse me, when I run the command line in [Bi3D Adaptation stage 1: active source domain data], something occurred.
The command line is as below:
bash scripts/ADA/dist_train_active_source.sh 2 --cfg_file ./cfgs/ADA/nuscenes-kitti/voxelrcnn/active_source.yaml --pretrained_model ***3DTrans/tools/cfgs/DA/nusc_kitti/source_only/voxel_rcnn_feat_3_vehi/default/ckpt/checkpoint_epoch_30.pth

When I do the training from the beginning, everything goes well. However, when I resume the training, an error occurred:
Traceback (most recent call last):
  File "train_active_source.py", line 272, in <module>
    main()
  File "train_active_source.py", line 193, in main
    lr_scheduler_discriminator, lr_warmup_scheduler_discriminator = build_scheduler(
  File "/home/hyh/Projects/3DTrans/tools/train_utils/optimization/__init__.py", line 55, in build_scheduler
    lr_scheduler = lr_sched.LambdaLR(optimizer, lr_lbmd, last_epoch=last_epoch)
  File "/home/hyh/anaconda3/envs/3Dtrans/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 203, in __init__
    super(LambdaLR, self).__init__(optimizer, last_epoch, verbose)
  File "/home/hyh/anaconda3/envs/3Dtrans/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 39, in __init__
    raise KeyError("param 'initial_lr' is not specified "
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"
Does anyone know how to solve it?
Thank you!

Questions about Uni3D.

  1. Using feature copy is not elegant for deployment. If I use 5 datasets, the features need to be copied 5 times, which costs a lot of memory and increases latency.
  2. The Semantic-level Feature Coupling-and-Recoupling module is quite fancy, yet it improves performance only a little compared with the domain-attention module. Does this mean the module is only useful in papers, not in industry?
  3. Can this new BN layer reach the same latency as the original BN layer?

cover_feat in PV-RCNN++

Hi, thanks for your work with this repo.

For PV-RCNN, when we only use xyz features, I'm aware that xyz_features becomes None. Some solutions I've seen remove 'raw_points' from FEATURES_SOURCE below so that it can work with xyz data only.

FEATURES_SOURCE: ['bev', 'x_conv3', 'x_conv4', 'raw_points']

You wrote that cover feat uses the z points as the 4th feature, essentially making the point cloud [x,y,z,z]. What's the idea behind this, and do you know if it works better than excluding 'raw_points'?

cover_feat_4: if cover the xyz_features using the values in z-dimension
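
As a tiny illustration of the cover_feat_4 behavior described above (purely illustrative, not the repo's code): when no intensity channel is available, the z coordinate is duplicated as the 4th point feature so the network still receives (x, y, z, feature) inputs.

import numpy as np

points_xyz = np.random.rand(5, 3)                                        # (N, 3) raw points
points_xyzz = np.concatenate([points_xyz, points_xyz[:, 2:3]], axis=1)   # (N, 4) -> [x, y, z, z]
print(points_xyzz.shape)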

Quick Sequence Demo for 3DTrans (KITTI / Waymo)

Hi, thank you for this repository. I really like the Quick Sequence Demo. I would like to use the same for KITTI / Waymo. Is it to be used the same way?

If not, could you give me some pointers on going about it? I would like to contribute by creating it then.

Thanks!

About running on our own Dataset

Q1: How can we perform model training and inference using our own dataset (for DA Waymo -> our dataset)?

Q2: How can we write a new dataloader to load our private dataset?

Problem about GPU memory

Hello, thanks for your ADA codebase.
I am trying to train PV-RCNN with Bi3D, using KITTI as the source domain and a custom dataset in KITTI format (smaller than KITTI) as the target domain.
A CUDA out-of-memory problem occurred during Stage 2. I use 6 RTX 2080 Ti GPUs (each with 10 GB of memory) and set BATCH_SIZE_PER_GPU to 1.
The discriminator training and active evaluation were both completed successfully, but the CUDA out-of-memory error occurred after these steps.
Are there any bugs in the memory management of this code, or do I need more memory to train?

Looking forward to your response!

How should I use Uni3D to train multiple datasets?

First of all, thank you for your outstanding contribution to 3D object detection.
As can be seen from the paper, you don't directly perform dataset-level merging. I'm currently working on cross-domain multi-dataset training in 2D and would like to learn from the ideas in Uni3D. I would like to know how I can run Uni3D to train on multiple datasets (what specific command to call in the terminal, or what specific .py file to run), and how you feed multiple datasets separately into the network during training.

Question About AD-PT

I would like to ask a question about the AD-PT paper. In the paper, when using only 20% of the KITTI dataset, AD-PT showed a significant performance improvement for the SECOND model. However, when I used the SECOND-IoU model with the provided ONCE 1M checkpoint as the pre-trained backbone, it showed a -1.7 mAP change when using AD-PT. Could you release the ImageSet split used for the 20% KITTI setting?

The installation of gcc-5.4.0

Excuse me, how should I install gcc-5.4.0? I didn't find an installation command line.
Should it be installed locally or in the conda environment?
Thank you!

Questions on detection heads of Uni3D

Hello, thanks for your great work!
Have you considered training only one detection head for the different datasets, since the detection task for the different datasets is basically based on three classes: car, pedestrian, and cyclist? Although there are some differences between objects of the same category across datasets, I am curious whether sharing the detection head between datasets would make the performance about the same or even better.
