robertwyq / panoocc


[CVPR 2024] PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

License: GNU General Public License v3.0

Python 90.25% Shell 0.65% Dockerfile 0.02% Makefile 0.03% Batchfile 0.03% CSS 0.01% C++ 4.64% Cuda 4.39%


panoocc's Issues

Detection accuracy

The paper's ablation study compares detection accuracy, but the released code does not evaluate detection accuracy at test time. Is the detection evaluation code open source?

camera_mask

Some previous methods did not use camera_mask during training but did use it during evaluation, right?

In which file is the Refine Module defined?

Question about the align_prev_bev

PanoOcc/projects/mmdet3d_plugin/bevformer/modules/pano_transformer_occ.py

Line 205 in ce36b10

    
    prev_lidar_to_curr_lidar = np.linalg.inv(curr_ego_to_global) @ prev_ego_to_global

Is there a problem here? Shouldn't it be computed as follows?

    for i in range(bs):
        lidar_to_ego = kwargs[i]['lidar2ego_transformation']
        curr_ego_to_global = kwargs[i]['ego2global_transform_lst'][-1]
        curr_lidar_to_global = curr_ego_to_global @ lidar_to_ego
        curr_grid_in_prev_frame_lst = []
        for j in range(len_queue):
            prev_ego_to_global = kwargs[i]['ego2global_transform_lst'][j]
            prev_lidar_to_global = prev_ego_to_global @ lidar_to_ego

            prev_lidar_to_curr_lidar = np.linalg.inv(curr_lidar_to_global) @ prev_lidar_to_global
            curr_lidar_to_prev_lidar = np.linalg.inv(prev_lidar_to_curr_lidar)
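To make the difference between the two expressions concrete, here is a minimal numerical sketch. It is not code from the repository: the helper `rigid_transform` and all pose values are made up for illustration, and only the variable names mirror the snippets above. It checks that the lidar-frame version equals the ego-frame version conjugated by `lidar_to_ego`, so the two agree only when `lidar_to_ego` commutes with the relative motion (for example, when it is the identity).

```python
import numpy as np


def rigid_transform(yaw, translation):
    """Build a 4x4 homogeneous transform from a yaw angle (radians) and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


# Made-up poses, for illustration only (not taken from any dataset).
lidar_to_ego = rigid_transform(0.5, [1.0, 0.0, 1.8])
prev_ego_to_global = rigid_transform(0.1, [10.0, 5.0, 0.0])
curr_ego_to_global = rigid_transform(0.3, [12.0, 6.0, 0.0])

# Expression quoted from line 205: previous-ego to current-ego coordinates.
prev_ego_to_curr_ego = np.linalg.inv(curr_ego_to_global) @ prev_ego_to_global

# Expression proposed in the issue: previous-lidar to current-lidar coordinates.
prev_lidar_to_global = prev_ego_to_global @ lidar_to_ego
curr_lidar_to_global = curr_ego_to_global @ lidar_to_ego
prev_lidar_to_curr_lidar = np.linalg.inv(curr_lidar_to_global) @ prev_lidar_to_global

# The lidar version is the ego version conjugated by lidar_to_ego.
conjugated = np.linalg.inv(lidar_to_ego) @ prev_ego_to_curr_ego @ lidar_to_ego
same_when_conjugated = np.allclose(prev_lidar_to_curr_lidar, conjugated)
differs_from_ego_version = not np.allclose(prev_lidar_to_curr_lidar, prev_ego_to_curr_ego)
```

Whether the ego-frame expression is actually wrong in the repository depends on which frame the BEV grid is defined in, which this sketch does not settle.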

Visualization

How can I visualize the occupancy results saved in .npz format?

about checkpoints

Thanks for your excellent work. Will you release the checkpoints, and if so, when? Many thanks.

Inference Performance

Thank you for sharing your excellent work. Do you have any data you can share on inference performance time?

Testing with a single GPU

What changes do I need to make to the code to test with a single GPU?

Question about the implementation of voxel self-attention

Hi, thanks for your fantastic work! I have a question about the implementation of the voxel self-attention.

The paper states: "These sampling points share the same height $z_k$, but with different learnable offsets for $(x_i^m, y_j^m)$. This encourages the voxel queries to interact in the BEV plane." As I understand it, this is equivalent to splitting the voxel features into BEV slices along the height dimension and performing deformable attention separately on each slice. Is that right? And why not use 3D deformable attention directly?
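The reading described above can be sketched in a few lines. This is only an illustration of the sentence quoted from the paper, not the repository's implementation: the function names are hypothetical, and the fixed offsets stand in for values that a network would learn.

```python
def planar_sampling_points(query_xyz, xy_offsets):
    """Sampling points that share the query's height; only x and y are offset."""
    x, y, z = query_xyz
    return [(x + dx, y + dy, z) for dx, dy in xy_offsets]


def volumetric_sampling_points(query_xyz, xyz_offsets):
    """3D variant for contrast: every coordinate, including height, is offset."""
    x, y, z = query_xyz
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in xyz_offsets]


query = (10.0, 20.0, 3.0)  # (x_i, y_j, z_k) of one voxel query
planar = planar_sampling_points(query, [(0.5, -1.0), (-2.0, 0.25)])
volumetric = volumetric_sampling_points(query, [(0.5, -1.0, 1.0), (-2.0, 0.25, -2.0)])

heights_planar = {p[2] for p in planar}          # one height: the query's own slice
heights_volumetric = {p[2] for p in volumetric}  # several heights: crosses slices
```

With planar offsets every sampled point stays in the query's own height slice, which is why the operation looks like independent BEV attention per slice; the 3D variant can reach other slices.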

By the way, I think the implementation of voxel self-attention may also have a problem.

PanoOcc/projects/mmdet3d_plugin/bevformer/modules/occ_temporal_attention.py

Lines 244 to 254 in 898b2a4

    
    sampling_locations = sampling_locations.contiguous()
    if torch.cuda.is_available() and value.is_cuda:
        # using fp16 deformable attention is unstable because it performs many sum operations
        if value.dtype == torch.float16:
            MultiScaleDeformableAttnFunction = MultiScaleDeformableAttnFunction_fp32
        else:
            MultiScaleDeformableAttnFunction = MultiScaleDeformableAttnFunction_fp32
        output = MultiScaleDeformableAttnFunction.apply(
            value, spatial_shapes, level_start_index, sampling_locations,
            attention_weights, self.im2col_step)

The spatial_shapes here is [50, 50, 16], a 3-dimensional vector. However, judging from its implementation, MultiScaleDeformableAttnFunction_fp32 seems to accept only 2-dimensional spatial shapes (lines 276 and 277), which would mean it can attend to only the first 50x50=2500 queries.

PanoOcc/ops/src/cuda/ms_deform_im2col_cuda.cuh

Lines 272 to 294 in 898b2a4

    
    for (int l_col=0; l_col < num_levels; ++l_col)
    {
      const int level_start_id = data_level_start_index[l_col];
      const int spatial_h_ptr = l_col << 1;
      const int spatial_h = data_spatial_shapes[spatial_h_ptr];
      const int spatial_w = data_spatial_shapes[spatial_h_ptr + 1];
      const scalar_t *data_value_ptr = data_value + (data_value_ptr_init_offset + level_start_id * qid_stride);
      for (int p_col=0; p_col < num_point; ++p_col)
      {
        const scalar_t loc_w = data_sampling_loc[data_loc_w_ptr];
        const scalar_t loc_h = data_sampling_loc[data_loc_w_ptr + 1];
        const scalar_t weight = data_attn_weight[data_weight_ptr];
        const scalar_t h_im = loc_h * spatial_h - 0.5;
        const scalar_t w_im = loc_w * spatial_w - 0.5;
        if (h_im > -1 && w_im > -1 && h_im < spatial_h && w_im < spatial_w)
        {
          col += ms_deform_attn_im2col_bilinear(data_value_ptr, spatial_h, spatial_w, num_heads, channels, h_im, w_im, m_col, c_col) * weight;
        }
        data_weight_ptr += 1;
        data_loc_w_ptr += 2;
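The index arithmetic in the kernel excerpt above can be mimicked in plain Python to show which entries of the shape buffer it reads. This is a sketch under stated assumptions: `kernel_level_shapes` is a hypothetical helper, and it assumes a single level whose shape reaches the kernel as the flat, contiguous buffer [50, 50, 16]. It does not establish how the repository actually shapes its tensors before the call.

```python
def kernel_level_shapes(flat_spatial_shapes, num_levels):
    """Mimic how the kernel loop reads (H, W) for each level from a flat buffer."""
    shapes = []
    for l_col in range(num_levels):
        spatial_h_ptr = l_col << 1  # same index arithmetic as the CUDA loop
        spatial_h = flat_spatial_shapes[spatial_h_ptr]
        spatial_w = flat_spatial_shapes[spatial_h_ptr + 1]
        shapes.append((spatial_h, spatial_w))
    return shapes


flat = [50, 50, 16]  # assumed flat layout of a single 3-entry shape
read = kernel_level_shapes(flat, num_levels=1)

cells_addressed = read[0][0] * read[0][1]  # cells reachable with (H, W) only
cells_in_volume = 50 * 50 * 16             # cells in the full voxel volume
```

Under these assumptions the loop reads only the pair (50, 50) and never touches the third entry, so bilinear sampling would address 2500 cells out of 40000.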

Is there something wrong with this implementation? Or maybe there are some details that I didn’t notice.

RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8

Hi,
Inside a Docker container, after completing the environment setup, I get this error:

[W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
Traceback (most recent call last):
Traceback (most recent call last):
File "./tools/train.py", line 256, in <module>
File "./tools/train.py", line 256, in <module>
main()
File "./tools/train.py", line 216, in main
main()
File "./tools/train.py", line 216, in main
model.init_weights()
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/base_module.py", line 117, in init_weights
model.init_weights()
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/base_module.py", line 117, in init_weights
m.init_weights()
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/base_module.py", line 106, in init_weights
m.init_weights()
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/base_module.py", line 106, in init_weights
initialize(self, self.init_cfg)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/cnn/utils/weight_init.py", line 612, in initialize
initialize(self, self.init_cfg)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/cnn/utils/weight_init.py", line 612, in initialize
_initialize(module, cp_cfg)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/cnn/utils/weight_init.py", line 517, in _initialize
_initialize(module, cp_cfg)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/cnn/utils/weight_init.py", line 517, in _initialize
func(module)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/cnn/utils/weight_init.py", line 489, in __call__
func(module)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/cnn/utils/weight_init.py", line 489, in __call__
load_checkpoint(
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 531, in load_checkpoint
load_checkpoint(
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 531, in load_checkpoint
checkpoint = _load_checkpoint(filename, map_location, logger)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 470, in _load_checkpoint
checkpoint = _load_checkpoint(filename, map_location, logger)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 470, in _load_checkpoint
return CheckpointLoader.load_checkpoint(filename, map_location, logger)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 249, in load_checkpoint
return CheckpointLoader.load_checkpoint(filename, map_location, logger)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 249, in load_checkpoint
return checkpoint_loader(filename, map_location)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 390, in load_from_torchvision
return checkpoint_loader(filename, map_location)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 390, in load_from_torchvision
return load_from_http(model_urls[model_name], map_location=map_location)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 290, in load_from_http
return load_from_http(model_urls[model_name], map_location=map_location)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/mmcv/runner/checkpoint.py", line 290, in load_from_http
torch.distributed.barrier()
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2538, in barrier
torch.distributed.barrier()
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2538, in barrier
work = default_pg.barrier(opts=opts)
RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
work = default_pg.barrier(opts=opts)
RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 23199) of binary: /root/anaconda3/envs/occ/bin/python
Traceback (most recent call last):
File "/root/anaconda3/envs/occ/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/anaconda3/envs/occ/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run
elastic_launch(
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/anaconda3/envs/occ/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

I think this error is related to NCCL, but I can't find a solution.
I use a single RTX 3080 GPU and simply followed your install.md, so I think all my settings match your environment.

Please reply.
Thank you.

In which file is the sparse deconvolution defined? Looking forward to your answer.

