
gaussianocc's Introduction

GaussianOcc

Project Page | arXiv | Data

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting
Wanshui Gan*, Fang Liu*, Hongbin Xu, Ningkai Mo, Naoto Yokoya
πŸ“– Chinese write-up (third-party): θ‡ͺεŠ¨ι©Ύι©ΆδΉ‹εΏƒ

Updates:

  • πŸ”” 2024/08/25 Release the code in stage 2 for both training and evaluation. Code may not be cleaned thoroughly, so feel free to open an issue if any question.
  • πŸ”” 2024/08/22 Paper release and the code will be released next week.

πŸ•Ή Demos

The demos are fairly large; please allow a moment for them to load.

3D Occupancy and Render Depth:

nuScenes:

DDAD:

πŸ“ Introduction

We introduce GaussianOcc, a systematic method that investigates two usages of Gaussian splatting for fully self-supervised and efficient 3D occupancy estimation in surround views. First, traditional methods for self-supervised 3D occupancy estimation still require ground-truth 6D poses from sensors during training. To address this limitation, we propose the Gaussian Splatting for Projection (GSP) module, which provides accurate scale information for fully self-supervised training from adjacent view projection. Second, existing methods rely on volume rendering to learn the final 3D voxel representation from 2D signals (depth maps, semantic maps), which is both time-consuming and less effective. We propose Gaussian Splatting from Voxel space (GSV) to leverage the fast rendering property of Gaussian splatting. As a result, GaussianOcc achieves fully self-supervised (no ground-truth pose) 3D occupancy estimation with competitive performance at low computational cost (2.7Γ— faster in training and 5Γ— faster in rendering).

πŸ’‘ Method

Method Overview:

πŸ”§ Installation

Clone this repo and install the dependencies:

git clone --recurse-submodules https://github.com/GANWANSHUI/GaussianOcc.git
cd GaussianOcc
conda create -n gsocc python=3.8
conda activate gsocc
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt

cd submodule/diff-gaussian-rasterization-confidence
pip install .

cd submodule/diff-gaussian-rasterization-confidence-semantic
pip install .

Our code is tested with Python 3.8, PyTorch 1.9.1 and CUDA 11.3 and can be adapted to other versions of PyTorch and CUDA with minor modifications.
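After installing the two rasterization submodules, a quick sanity check along these lines can confirm that PyTorch sees CUDA and that the extensions import. This is only a sketch: the module names are taken from how lib/gaussian_renderer references them in the issues below and may differ in your installation.

import importlib
import torch

# Verify the PyTorch / CUDA setup matches the tested configuration (1.9.1 / 11.3).
print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda,
      "| CUDA available:", torch.cuda.is_available())

# Module names are assumptions based on lib/gaussian_renderer; adjust if your install differs.
for name in ("diff_3d_gaussian_rasterization", "diff_3d_gaussian_rasterization_semantic"):
    try:
        importlib.import_module(name)
        print(f"OK: imported {name}")
    except ImportError as err:
        print(f"FAILED to import {name}: {err}")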

πŸ— Dataset Preparation

You can add a symlink if you already have the related datasets, for example:
ln -s  path_to_nuscenes GaussianOcc/data

ln -s  path_to_ddad GaussianOcc/data

nuScenes

  1. Download the nuScenes V1.0 full dataset from nuScenes and link the data folder to ./data/nuscenes/nuscenes/.

  2. Download the ground-truth occupancy labels from Occ3D and unzip gts.tar.gz to ./data/nuscenes/gts. Note that we only use the 3D occupancy labels for validation.

  3. Generate the ground-truth depth maps for validation (an illustrative sketch of the underlying idea follows this list):

    python tools/export_gt_depth_nusc.py
  4. Download the generated 2D semantic labels from semantic_labels and extract the data to ./data/nuscenes/. We recommend that you use pigz to speed up the process.

  5. Download the pretrained weights of our model from Google Drive or Baidu (password: 778c) and move them to ./ckpts/.

  6. (Optional) If you want to generate the 2D semantic labels by yourself, please refer to the README.md in GroundedSAM_OccNeRF. The dataset index pickle file nuscenes_infos_train.pkl is from SurroundOcc and should be placed under ./data/nuscenes/.
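Ground-truth depth maps for nuScenes are typically obtained by projecting LiDAR points into each camera. The minimal sketch below illustrates that idea with the nuscenes-devkit; the paths, camera choice, and output file are assumptions for illustration, not the actual behavior of tools/export_gt_depth_nusc.py.

# Illustrative sketch only: project LiDAR points into a camera image to get a sparse depth map.
import numpy as np
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-trainval', dataroot='./data/nuscenes/nuscenes', verbose=False)

sample = nusc.sample[0]
cam_token = sample['data']['CAM_FRONT']
lidar_token = sample['data']['LIDAR_TOP']

# map_pointcloud_to_image returns projected pixel coordinates, per-point depth, and the image.
points, depth, im = nusc.explorer.map_pointcloud_to_image(lidar_token, cam_token)

depth_map = np.zeros((im.size[1], im.size[0]), dtype=np.float32)  # (H, W)
u = np.clip(np.round(points[0]).astype(np.int32), 0, im.size[0] - 1)
v = np.clip(np.round(points[1]).astype(np.int32), 0, im.size[1] - 1)
depth_map[v, u] = depth  # sparse metric depth at the projected LiDAR points

np.save('cam_front_depth.npy', depth_map)  # hypothetical output location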

DDAD

  • Please download the official DDAD dataset and place them under data/ddad/raw_data. You may refer to official DDAD repository for more info and instructions.
  • Please download metadata of DDAD and place these pkl files in datasets/ddad.
  • We provide annotated self-occlusion masks for each sequences. Please download masks and place them in data/ddad/mask.
  • Export depth maps for evaluation
  • The ddad semantic map generation is similar to nuscenes above
cd tools
python export_gt_depth_ddad.py val

The final folder structure should look like this:

GaussianOcc/
β”œβ”€β”€ ckpts/
β”‚   β”œβ”€β”€ ddad-sem-gs/
β”‚   β”œβ”€β”€ nusc-sem-gs/
β”‚   β”œβ”€β”€ stage1_pose_nusc/
β”‚   β”œβ”€β”€ stage1_pose_ddad/
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ nuscenes/
β”‚   β”‚   β”œβ”€β”€ nuscenes/
β”‚   β”‚   β”‚   β”œβ”€β”€ maps/
β”‚   β”‚   β”‚   β”œβ”€β”€ samples/
β”‚   β”‚   β”‚   β”œβ”€β”€ sweeps/
β”‚   β”‚   β”‚   β”œβ”€β”€ v1.0-trainval/
β”‚   β”‚   β”œβ”€β”€ gts/
β”‚   β”‚   β”œβ”€β”€ nuscenes_depth/
β”‚   β”‚   β”œβ”€β”€ nuscenes_semantic/
β”‚   β”‚   β”œβ”€β”€ nuscenes_infos_train.pkl
β”‚   β”œβ”€β”€ ddad/
β”‚   β”‚   β”œβ”€β”€ raw_data/
β”‚   β”‚   β”‚   β”œβ”€β”€ 000000/
β”‚   β”‚   β”‚   β”œβ”€β”€ ...
β”‚   β”‚   β”œβ”€β”€ depth/
β”‚   β”‚   β”‚   β”œβ”€β”€ 000000/
β”‚   β”‚   β”‚   β”œβ”€β”€ ...
β”‚   β”‚   β”œβ”€β”€ mask/
β”‚   β”‚   β”‚   β”œβ”€β”€ 000000/
β”‚   β”‚   β”‚   β”œβ”€β”€ ...
β”‚   β”‚   β”œβ”€β”€ ddad_semantic/
β”‚   β”‚   β”‚   β”œβ”€β”€ 000000/
β”‚   β”‚   β”‚   β”œβ”€β”€ ...
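To catch path mistakes early, a small check like the following can verify that the expected folders and files exist before training. It is illustrative only; the path list simply mirrors the tree above.

# Illustrative layout check: confirm the expected dataset/checkpoint paths from the tree above.
from pathlib import Path

expected = [
    'ckpts/nusc-sem-gs',
    'ckpts/stage1_pose_nusc',
    'data/nuscenes/nuscenes/v1.0-trainval',
    'data/nuscenes/gts',
    'data/nuscenes/nuscenes_depth',
    'data/nuscenes/nuscenes_semantic',
    'data/nuscenes/nuscenes_infos_train.pkl',
    'data/ddad/raw_data',
    'data/ddad/mask',
]
for rel in expected:
    status = 'OK     ' if Path(rel).exists() else 'MISSING'
    print(f'{status} {rel}')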

πŸš€ Quick Start

Training and Evaluation

sh run_gs_occ.sh

Visualization

Visualize the semantic occupancy prediction:

python tools/export_vis_data.py  # You can modify this file to choose scenes you want to visualize. Otherwise, all validation scenes will be visualized.

sh run_vis.sh

python gen_scene_video.py scene_folder_generated_by_the_above_command --sem_only

πŸ™ Acknowledgement

Many thanks to these excellent projects:

Recent related works:

πŸ“ƒ Bibtex

If you find this repository/work helpful in your research, please consider citing our papers and giving the repo a ⭐.

@article{gan2024gaussianocc,
  title={GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting},
  author={Gan, Wanshui and Liu, Fang and Xu, Hongbin and Mo, Ningkai and Yokoya, Naoto},
  journal={arXiv preprint arXiv:2408.11447},
  year={2024}
}

@article{gan2024comprehensive,
  title={A Comprehensive Framework for 3D Occupancy Estimation in Autonomous Driving},
  author={Gan, Wanshui and Mo, Ningkai and Xu, Hongbin and Yokoya, Naoto},
  journal={IEEE Transactions on Intelligent Vehicles},
  year={2024},
  publisher={IEEE}
}

gaussianocc's People

Contributors

ganwanshui

gaussianocc's Issues

Render image color from rasterizer when changing use_semantic and semantic

Hi, thanks for your great work!
I want to ask the following:

  1. If I change use_semantic = True to False in nusc-sem-gs.txt:
    I then take rgb_marched from occupancy_decoder.py (rgb_marched is the rendered color image rasterized by diff-gaussian-rasterizer) and visualize it, but the tensor is all zeros (black images).
  2. If I keep use_semantic = True in nusc-sem-gs.txt and, in lib/gaussian_renderer/init.py, change
    raster_settings = diff_3d_gaussian_rasterization_semantic.GaussianRasterizationSettings(...) to raster_settings = diff_3d_gaussian_rasterization.GaussianRasterizationSettings(...)
    and
    rasterizer = diff_3d_gaussian_rasterization_semantic.GaussianRasterizer(raster_settings=raster_settings) to rasterizer = diff_3d_gaussian_rasterization.GaussianRasterizer(raster_settings=raster_settings),
    I get rgb_marched (the colors from the rasterizer) as in the image below (RGB, depth, and rasterized color, respectively).
    image
  3. If I don't change anything and run the script, rgb_marched has 18 channels (one per semantic class), and I don't know how to reduce the 18 channels to 3 channels to save a color image like the input image.

So I want to ask: how can I render RGB images from your rasterizer for visualization, instead of loading the input colors from val_loader in run_vis.py?

Thank you
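For the 18-channel case in point 3, a common way to visualize a per-class rendering is to take a per-pixel argmax over the class dimension and map class indices to a color palette. A minimal sketch, assuming an (18, H, W) tensor; the function name and the random palette are purely illustrative and not part of the repository:

# Generic illustration: collapse an 18-channel semantic rendering (C, H, W) to an RGB image.
import numpy as np
import torch

def semantic_to_rgb(sem: torch.Tensor, palette: np.ndarray) -> np.ndarray:
    """sem: (num_classes, H, W) logits or probabilities; palette: (num_classes, 3) uint8."""
    labels = sem.argmax(dim=0).cpu().numpy()   # (H, W) class index per pixel
    return palette[labels]                     # (H, W, 3) uint8 color image

# Example with a random palette for 18 classes (replace with the dataset's actual palette).
palette = np.random.randint(0, 255, size=(18, 3), dtype=np.uint8)
rgb = semantic_to_rgb(torch.randn(18, 224, 400), palette)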

missing export_gt_depth_ddad.py file

The project README has instructions on how to export DDAD depth maps for evaluation:

cd tools
python export_gt_depth_ddad.py val

but there seems to be no export_gt_depth_ddad.py file in the tools folder.

'point_cloud' in DDAD dataset not found

Hello, I am attempting to reproduce training and testing on the DDAD dataset and am facing multiple errors. I'd like to ask about the 'point_cloud' in the DDAD dataset. First, the point clouds in DDAD.tar are .npz files, not .npy files. Second, the camera timestamp index_temporal does not match the file names of the point clouds. Finally, the point cloud does not seem to be used later. Should I just ignore the point cloud files?

Cannot download pretrained weights without a Baidu account

Hi, thanks for your great work. I have an issue downloading your pretrained weights from Baidu: I cannot create a Baidu account because I don't have a Chinese phone number, so I cannot download them from Baidu. Could you please upload the pretrained model to Google Drive or another online storage service?

Thanks for your help

Training code for stage1

First, thanks for sharing your great work.

By the way, are you going to release the training code for stage 1?

The versions of mmcv and mmdet

Following the current installation, I always find these two libraries missing and encounter compatibility issues when installing them myself. Could you let me know which versions of mmcv and mmdet were used in this work? Thanks!

About Pose Free

Excellent work! May I ask, if the system is pose-free, how do you use camera poses during the LSS projection or when interpolating to obtain BEV features?

Can this model perform novel view synthesis?

Hi, thanks for your great work!
I want to ask whether your model can perform novel view synthesis on the nuScenes dataset. If so, how can I change the target pose to render a novel view instead of the 6 input views?
Thank you

Some questions about understanding the paper

Hi, thanks for your great work!

I want to ask the following:

  1. I didn't fully understand the design of your cross-view loss. As far as I understand, you extract features from the six surround-view images and decode them into Gaussian parameters, then unproject through the masked images, followed by splatting. You then compute the loss between the rendered images and the original images, right? My questions are: How is scale ensured in this process? When computing the loss, is it based on the original images or the masked images? Are images from adjacent time frames used?

  2. Are Stage 1 and Stage 2 related? Does Stage 1 only train the 6D pose network and the 2D encoder? If so, as I understand it, the entire process could also be completed through Stage 2 alone, since the task of Stage 1 is unrelated to occupancy estimation.

    Looking forward to your reply.

The run.sh file is missing

Hi! Thank you very much for such advanced and great work.
When I try to reproduce the code, I get the following error: [Errno 2] No such file or directory: 'run.sh'
I didn't see this file in the project repository either, and I was wondering if I was missing something?

Eval with official checkpoint gets better depth prediction than reported in paper.

Many thanks for your great work and the prompt open-sourcing. When I evaluate with the official checkpoint, the RayIoU and mIoU metrics match those reported in the paper, while the depth prediction metrics (AbsRel, SqRel, RMSE) are better than reported. I generate depth in the same way as your script. I'd like to ask whether my log is normal, i.e. whether your depth prediction is indeed better than other methods'. My log is attached:
log.txt

Scripts to generate DDAD semantics

Hi, I assume you already have a script to generate semantic images for DDAD using GroundedSAM_OccNeRF, and am wondering if you can share it. Thanks.

conda install fails

Conda gets stuck on my machine, so I tried using pip install instead. Do you think the slight difference in CUDA will affect model performance?

pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html

FileNotFoundError: [Errno 2] No such file or directory: '.\\data\\nuscenes\\nuscenes\\v1.0-trainval\\category.json'

After running python tools/export_gt_depth_nusc.py on Windows, I got the error above. I've only downloaded part 1 of the nuScenes v1.0 full dataset, and I couldn't find the category.json file in the compressed archive. Am I missing something, or do I need the entire dataset to run this command? Here is the full error:

(gsocc) C:\Users\User\Desktop\GaussianOcc>python tools/export_gt_depth_nusc.py
Traceback (most recent call last):
  File "C:\Users\User\Desktop\GaussianOcc\tools\export_gt_depth_nusc.py", line 152, in <module>
    model = DepthGenerator('val')
  File "C:\Users\User\Desktop\GaussianOcc\tools\export_gt_depth_nusc.py", line 17, in __init__
    self.nusc = NuScenes(version=version,
  File "C:\Users\User\anaconda3\envs\gsocc\lib\site-packages\nuscenes\nuscenes.py", line 69, in __init__
    self.category = self.__load_table__('category')
  File "C:\Users\User\anaconda3\envs\gsocc\lib\site-packages\nuscenes\nuscenes.py", line 136, in __load_table__
    with open(osp.join(self.table_root, '{}.json'.format(table_name))) as f:
FileNotFoundError: [Errno 2] No such file or directory: '.\\data\\nuscenes\\nuscenes\\v1.0-trainval\\category.json'

GPU Required for Training

Hi, how many GPUs and hours are needed to train the model? I have four RTX 4090s and am unsure if that is enough to reproduce the results.

Custom Single View Dataset

Hi, I want to ask whether it's possible to provide a custom single view (e.g. only a front-facing driving recorder) for inference, and to do front-view reconstruction?

It seems that multi-view information is required.

Thanks for the response, and great work.
