flashocc's Issues

FlashOCC environment installation and file path errors

Thanks for open-sourcing this excellent project!
However, I ran into a problem while reproducing the code. Since this work is built on BEVDet, I tried to run your project in my existing BEVDet environment, but I got the error shown in the screenshot below (BEVDet runs fine in that environment):
(screenshot)
A file path appears to be wrong, and my attempts to modify the relevant paths have all failed. Could you provide a solution or some suggestions? Thank you!

Error when testing FO_R50_M0

Hi author, the program throws an error when I test FO_R50_M0. I paste the output directly below:
python /home/fxp/Projects/FlashOCC/tools/test.py /home/fxp/Projects/FlashOCC/projects/configs/flashocc/flashocc-r50-M0.py /home/fxp/Projects/FlashOCC/ckpts/flashocc-r50-M0-256x704.pth --eval mAP

Error in sys.excepthook:
Traceback (most recent call last):
File "/home/fxp/anaconda3/envs/flashocc/lib/python3.8/linecache.py", line 47, in getlines
return updatecache(filename, module_globals)
File "/home/fxp/anaconda3/envs/flashocc/lib/python3.8/linecache.py", line 137, in updatecache
lines = fp.readlines()
File "/home/fxp/anaconda3/envs/flashocc/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 513: invalid start byte

Original exception was:
Traceback (most recent call last):
File "/home/fxp/Projects/FlashOCC/tools/test.py", line 290, in
main()
File "/home/fxp/Projects/FlashOCC/tools/test.py", line 260, in main
outputs = single_gpu_test(model, data_loader, args.show, args.show_dir)
File "/home/fxp/Projects/FlashOCC/mmdetection3d/mmdet3d/apis/test.py", line 40, in single_gpu_test
result = model(return_loss=False, rescale=True, **data)
File "/home/fxp/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/fxp/anaconda3/envs/flashocc/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 50, in forward
return super().forward(*inputs, **kwargs)
File "/home/fxp/.local/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/fxp/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/fxp/anaconda3/envs/flashocc/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
return old_func(*args, **kwargs)
File "/home/fxp/Projects/FlashOCC/mmdetection3d/mmdet3d/models/detectors/base.py", line 62, in forward
return self.forward_test(**kwargs)
File "/home/fxp/Projects/FlashOCC/projects/mmdet3d_plugin/models/detectors/bevdet.py", line 237, in forward_test
return self.simple_test(points[0], img_metas[0], img_inputs[0],
File "/home/fxp/Projects/FlashOCC/projects/mmdet3d_plugin/models/detectors/bevdet_occ.py", line 165, in simple_test
File "/home/fxp/Projects/FlashOCC/projects/mmdet3d_plugin/models/detectors/bevdet_occ.py", line 179, in simple_test_occ
File "/home/fxp/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BEVOCCHead2D' object has no attribute 'get_occ_gpu'

The code screenshot is below:
(screenshot: 2024-07-10_21-20)

In fact, get_occ_gpu is defined under class BEVOCCHead2D_V2(BaseModule), as shown below:
(screenshot: 2024-07-10_21-21)

Question about "max_sweeps" setting

Hi, in the file "tools/create_data_bevdet.py", max_sweeps is set to 0.
Is that correct? I noticed that you also use the "BEVStereo" model for training; doesn't it need the sweeps data?

npz file

Hi author, thanks for your contribution. May I ask how the npz files are generated? Is there code for this? I want to use them to train, validate, and test my own dataset. Thank you.

M3 version in FlashOCCV1

Thank you for your outstanding work! May I ask which config file corresponds to the M3 version of FlashOCCV1? I don't see the corresponding flashocc-stbase-4d-stereo-512x1408.py file in the link.

Inconsistency with the performance reported on GitHub

Hi, when I test the flashocc-r50-4d-stereo config, the result is 38.07:

===> per class IoU of 6019 samples:
===> others - IoU = 9.37
===> barrier - IoU = 46.72
===> bicycle - IoU = 19.43
===> bus - IoU = 41.69
===> car - IoU = 51.07
===> construction_vehicle - IoU = 24.99
===> motorcycle - IoU = 21.24
===> pedestrian - IoU = 23.6
===> traffic_cone - IoU = 24.22
===> trailer - IoU = 31.21
===> truck - IoU = 38.18
===> driveable_surface - IoU = 81.12
===> other_flat - IoU = 40.13
===> sidewalk - IoU = 51.93
===> terrain - IoU = 55.31
===> manmade - IoU = 47.01
===> vegetation - IoU = 39.89
===> mIoU of 6019 samples: 38.07
{'mIoU': array([0.094, 0.467, 0.194, 0.417, 0.511, 0.25 , 0.212, 0.236, 0.242,
       0.312, 0.382, 0.811, 0.401, 0.519, 0.553, 0.47 , 0.399, 0.903])}

which is a little higher than the result you provide:

 with_pretrain:
 ===> per class IoU of 6019 samples:
 ===> others - IoU = 9.08
 ===> barrier - IoU = 46.32
 ===> bicycle - IoU = 17.71
 ===> bus - IoU = 42.7
 ===> car - IoU = 50.64
 ===> construction_vehicle - IoU = 23.72
 ===> motorcycle - IoU = 20.13
 ===> pedestrian - IoU = 22.34
 ===> traffic_cone - IoU = 24.09
 ===> trailer - IoU = 30.26
 ===> truck - IoU = 37.39
 ===> driveable_surface - IoU = 81.68
 ===> other_flat - IoU = 40.13
 ===> sidewalk - IoU = 52.34
 ===> terrain - IoU = 56.46
 ===> manmade - IoU = 47.69
 ===> vegetation - IoU = 40.6
 ===> mIoU of 6019 samples: 37.84

BTW, I use a 4090 for testing. Could the gap be caused by this?

About the FPS

When I test the FPS of the flashoccV2-4DLongterm-Depth (8f) config, there is a large gap compared with the numbers you show on GitHub.

RuntimeError: Ninja is required to load C++ extensions

Hi, thanks for your great work!
I have followed install.md to set up the environment, but I got the following error when training the model with the config "projects/configs/flashocc/flashocc-stbase-4d-stereo-512x1408_4x4_1e-2.py".

Ninja is required to load C++ extensions
File "/FlashOCC/projects/mmdet3d_plugin/core/evaluation/ray_metrics.py", line 12, in
dvr = load("dvr", sources=["lib/dvr/dvr.cpp", "lib/dvr/dvr.cu"], verbose=True, extra_cuda_cflags=['-allow-unsupported-compiler'])
File "FlashOCC/projects/mmdet3d_plugin/datasets/nuscenes_dataset_occ.py", line 13, in
from ..core.evaluation.ray_metrics import main as calc_rayiou
File "/FlashOCC/projects/mmdet3d_plugin/datasets/init.py", line 2, in
from .nuscenes_dataset_occ import NuScenesDatasetOccpancy
File "FlashOCC/projects/mmdet3d_plugin/init.py", line 1, in
from .datasets import *
File "FlashOCC/train.py", line 139, in main
plg_lib = importlib.import_module(_module_path)
File "FlashOCC/train.py", line 286, in
main()
RuntimeError: Ninja is required to load C++ extensions
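
For what it's worth, torch.utils.cpp_extension.load() shells out to the ninja build tool, so a hedged first check (not a guaranteed fix) is whether ninja is visible to PyTorch in this environment; installing it with pip usually clears this particular error:

```python
# Hedged check, not a guaranteed fix: torch.utils.cpp_extension.load() needs the
# ninja build tool. If this prints False, installing ninja in the same environment
# (e.g. `pip install ninja`) is the usual remedy.
from torch.utils.cpp_extension import is_ninja_available

print(is_ninja_available())
```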

How do pretrained models work in FlashOcc?

Thank you for your excellent work.

I want to know how the pretrained model from the BEVDet detection task still helps when Conv3d is modified to Conv2d.

For example, in https://github.com/Yzichen/FlashOCC/blob/master/projects/configs/flashocc/flashocc-r50-4d-stereo.py#L235, how does "./ckpts/bevdet-r50-4d-stereo-cbgs.pth" bring benefits to FlashOcc?

I understand that it should only be effective for img_backbone and img_neck, and the training log of BEVDet-Occ confirms this; the BEVDet-Occ training log is below:

2023-04-25 12:03:47,218 - mmdet - INFO - load checkpoint from local path: /mnt/cfs/algorithm/junjie.huang/project/dev2.1/BEVDet/work_dirs/bevdet-r50-4d-stereo-cbgs/epoch_20.pth
2023-04-25 12:03:49,624 - mmdet - WARNING - The model and loaded state dict do not match exactly

size mismatch for img_view_transformer.depth_net.context_conv.weight: copying a param with shape torch.Size([80, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 256, 1, 1]).
size mismatch for img_view_transformer.depth_net.context_conv.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.0.weight: copying a param with shape torch.Size([59, 59, 3, 3]) from checkpoint, the shape in current model is torch.Size([88, 88, 3, 3]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.0.bias: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.1.weight: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.1.bias: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.1.running_mean: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.1.running_var: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.2.weight: copying a param with shape torch.Size([59, 59, 3, 3]) from checkpoint, the shape in current model is torch.Size([88, 88, 3, 3]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.2.bias: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.3.weight: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.3.bias: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.3.running_mean: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.3.running_var: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.depth_conv.0.conv1.weight: copying a param with shape torch.Size([256, 315, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 344, 3, 3]).
size mismatch for img_view_transformer.depth_net.depth_conv.0.downsample.weight: copying a param with shape torch.Size([256, 315, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 344, 1, 1]).
size mismatch for img_view_transformer.depth_net.depth_conv.4.weight: copying a param with shape torch.Size([59, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([88, 256, 1, 1]).
size mismatch for img_view_transformer.depth_net.depth_conv.4.bias: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
unexpected key in source state_dict: img_bev_encoder_backbone.layers.0.1.conv1.weight, img_bev_encoder_backbone.layers.0.1.bn1.weight, img_bev_encoder_backbone.layers.0.1.bn1.bias, img_bev_encoder_backbone.layers.0.1.bn1.running_mean, img_bev_encoder_backbone.layers.0.1.bn1.running_var, img_bev_encoder_backbone.layers.0.1.bn1.num_batches_tracked, img_bev_encoder_backbone.layers.0.1.conv2.weight, img_bev_encoder_backbone.layers.0.1.bn2.weight, img_bev_encoder_backbone.layers.0.1.bn2.bias, img_bev_encoder_backbone.layers.0.1.bn2.running_mean, img_bev_encoder_backbone.layers.0.1.bn2.running_var, img_bev_encoder_backbone.layers.0.1.bn2.num_batches_tracked, img_bev_encoder_backbone.layers.0.0.bn1.weight, img_bev_encoder_backbone.layers.0.0.bn1.bias, img_bev_encoder_backbone.layers.0.0.bn1.running_mean, img_bev_encoder_backbone.layers.0.0.bn1.running_var, img_bev_encoder_backbone.layers.0.0.bn1.num_batches_tracked, img_bev_encoder_backbone.layers.0.0.bn2.weight, img_bev_encoder_backbone.layers.0.0.bn2.bias, img_bev_encoder_backbone.layers.0.0.bn2.running_mean, img_bev_encoder_backbone.layers.0.0.bn2.running_var, img_bev_encoder_backbone.layers.0.0.bn2.num_batches_tracked, img_bev_encoder_backbone.layers.0.0.conv1.weight, img_bev_encoder_backbone.layers.0.0.conv2.weight, img_bev_encoder_backbone.layers.0.0.downsample.weight, img_bev_encoder_backbone.layers.0.0.downsample.bias, img_bev_encoder_backbone.layers.1.0.bn1.weight, img_bev_encoder_backbone.layers.1.0.bn1.bias, img_bev_encoder_backbone.layers.1.0.bn1.running_mean, img_bev_encoder_backbone.layers.1.0.bn1.running_var, img_bev_encoder_backbone.layers.1.0.bn1.num_batches_tracked, img_bev_encoder_backbone.layers.1.0.bn2.weight, img_bev_encoder_backbone.layers.1.0.bn2.bias, img_bev_encoder_backbone.layers.1.0.bn2.running_mean, img_bev_encoder_backbone.layers.1.0.bn2.running_var, img_bev_encoder_backbone.layers.1.0.bn2.num_batches_tracked, img_bev_encoder_backbone.layers.1.0.conv1.weight, img_bev_encoder_backbone.layers.1.0.conv2.weight, img_bev_encoder_backbone.layers.1.0.downsample.weight, img_bev_encoder_backbone.layers.1.0.downsample.bias, img_bev_encoder_backbone.layers.1.1.bn1.weight, img_bev_encoder_backbone.layers.1.1.bn1.bias, img_bev_encoder_backbone.layers.1.1.bn1.running_mean, img_bev_encoder_backbone.layers.1.1.bn1.running_var, img_bev_encoder_backbone.layers.1.1.bn1.num_batches_tracked, img_bev_encoder_backbone.layers.1.1.bn2.weight, img_bev_encoder_backbone.layers.1.1.bn2.bias, img_bev_encoder_backbone.layers.1.1.bn2.running_mean, img_bev_encoder_backbone.layers.1.1.bn2.running_var, img_bev_encoder_backbone.layers.1.1.bn2.num_batches_tracked, img_bev_encoder_backbone.layers.1.1.conv1.weight, img_bev_encoder_backbone.layers.1.1.conv2.weight, img_bev_encoder_backbone.layers.2.0.bn1.weight, img_bev_encoder_backbone.layers.2.0.bn1.bias, img_bev_encoder_backbone.layers.2.0.bn1.running_mean, img_bev_encoder_backbone.layers.2.0.bn1.running_var, img_bev_encoder_backbone.layers.2.0.bn1.num_batches_tracked, img_bev_encoder_backbone.layers.2.0.bn2.weight, img_bev_encoder_backbone.layers.2.0.bn2.bias, img_bev_encoder_backbone.layers.2.0.bn2.running_mean, img_bev_encoder_backbone.layers.2.0.bn2.running_var, img_bev_encoder_backbone.layers.2.0.bn2.num_batches_tracked, img_bev_encoder_backbone.layers.2.0.conv1.weight, img_bev_encoder_backbone.layers.2.0.conv2.weight, img_bev_encoder_backbone.layers.2.0.downsample.weight, img_bev_encoder_backbone.layers.2.0.downsample.bias, img_bev_encoder_backbone.layers.2.1.bn1.weight, 
img_bev_encoder_backbone.layers.2.1.bn1.bias, img_bev_encoder_backbone.layers.2.1.bn1.running_mean, img_bev_encoder_backbone.layers.2.1.bn1.running_var, img_bev_encoder_backbone.layers.2.1.bn1.num_batches_tracked, img_bev_encoder_backbone.layers.2.1.bn2.weight, img_bev_encoder_backbone.layers.2.1.bn2.bias, img_bev_encoder_backbone.layers.2.1.bn2.running_mean, img_bev_encoder_backbone.layers.2.1.bn2.running_var, img_bev_encoder_backbone.layers.2.1.bn2.num_batches_tracked, img_bev_encoder_backbone.layers.2.1.conv1.weight, img_bev_encoder_backbone.layers.2.1.conv2.weight, img_bev_encoder_neck.up2.1.weight, img_bev_encoder_neck.up2.2.weight, img_bev_encoder_neck.up2.2.bias, img_bev_encoder_neck.up2.2.running_mean, img_bev_encoder_neck.up2.2.running_var, img_bev_encoder_neck.up2.2.num_batches_tracked, img_bev_encoder_neck.up2.4.weight, img_bev_encoder_neck.up2.4.bias, img_bev_encoder_neck.conv.0.weight, img_bev_encoder_neck.conv.1.weight, img_bev_encoder_neck.conv.1.bias, img_bev_encoder_neck.conv.1.running_mean, img_bev_encoder_neck.conv.1.running_var, img_bev_encoder_neck.conv.1.num_batches_tracked, img_bev_encoder_neck.conv.3.weight, img_bev_encoder_neck.conv.4.weight, img_bev_encoder_neck.conv.4.bias, img_bev_encoder_neck.conv.4.running_mean, img_bev_encoder_neck.conv.4.running_var, img_bev_encoder_neck.conv.4.num_batches_tracked, pre_process_net.layers.0.1.conv1.weight, pre_process_net.layers.0.1.bn1.weight, pre_process_net.layers.0.1.bn1.bias, pre_process_net.layers.0.1.bn1.running_mean, pre_process_net.layers.0.1.bn1.running_var, pre_process_net.layers.0.1.bn1.num_batches_tracked, pre_process_net.layers.0.1.conv2.weight, pre_process_net.layers.0.1.bn2.weight, pre_process_net.layers.0.1.bn2.bias, pre_process_net.layers.0.1.bn2.running_mean, pre_process_net.layers.0.1.bn2.running_var, pre_process_net.layers.0.1.bn2.num_batches_tracked, pre_process_net.layers.0.0.bn1.weight, pre_process_net.layers.0.0.bn1.bias, pre_process_net.layers.0.0.bn1.running_mean, pre_process_net.layers.0.0.bn1.running_var, pre_process_net.layers.0.0.bn1.num_batches_tracked, pre_process_net.layers.0.0.bn2.weight, pre_process_net.layers.0.0.bn2.bias, pre_process_net.layers.0.0.bn2.running_mean, pre_process_net.layers.0.0.bn2.running_var, pre_process_net.layers.0.0.bn2.num_batches_tracked, pre_process_net.layers.0.0.conv1.weight, pre_process_net.layers.0.0.conv2.weight, pre_process_net.layers.0.0.downsample.weight, pre_process_net.layers.0.0.downsample.bias

missing keys in source state_dict: img_bev_encoder_backbone.layers.0.0.conv1.conv.weight, img_bev_encoder_backbone.layers.0.0.conv1.bn.weight, img_bev_encoder_backbone.layers.0.0.conv1.bn.bias, img_bev_encoder_backbone.layers.0.0.conv1.bn.running_mean, img_bev_encoder_backbone.layers.0.0.conv1.bn.running_var, img_bev_encoder_backbone.layers.0.0.conv2.conv.weight, img_bev_encoder_backbone.layers.0.0.conv2.bn.weight, img_bev_encoder_backbone.layers.0.0.conv2.bn.bias, img_bev_encoder_backbone.layers.0.0.conv2.bn.running_mean, img_bev_encoder_backbone.layers.0.0.conv2.bn.running_var, img_bev_encoder_backbone.layers.0.0.downsample.conv.weight, img_bev_encoder_backbone.layers.0.0.downsample.bn.weight, img_bev_encoder_backbone.layers.0.0.downsample.bn.bias, img_bev_encoder_backbone.layers.0.0.downsample.bn.running_mean, img_bev_encoder_backbone.layers.0.0.downsample.bn.running_var, img_bev_encoder_backbone.layers.1.0.conv1.conv.weight, img_bev_encoder_backbone.layers.1.0.conv1.bn.weight, img_bev_encoder_backbone.layers.1.0.conv1.bn.bias, img_bev_encoder_backbone.layers.1.0.conv1.bn.running_mean, img_bev_encoder_backbone.layers.1.0.conv1.bn.running_var, img_bev_encoder_backbone.layers.1.0.conv2.conv.weight, img_bev_encoder_backbone.layers.1.0.conv2.bn.weight, img_bev_encoder_backbone.layers.1.0.conv2.bn.bias, img_bev_encoder_backbone.layers.1.0.conv2.bn.running_mean, img_bev_encoder_backbone.layers.1.0.conv2.bn.running_var, img_bev_encoder_backbone.layers.1.0.downsample.conv.weight, img_bev_encoder_backbone.layers.1.0.downsample.bn.weight, img_bev_encoder_backbone.layers.1.0.downsample.bn.bias, img_bev_encoder_backbone.layers.1.0.downsample.bn.running_mean, img_bev_encoder_backbone.layers.1.0.downsample.bn.running_var, img_bev_encoder_backbone.layers.1.1.conv1.conv.weight, img_bev_encoder_backbone.layers.1.1.conv1.bn.weight, img_bev_encoder_backbone.layers.1.1.conv1.bn.bias, img_bev_encoder_backbone.layers.1.1.conv1.bn.running_mean, img_bev_encoder_backbone.layers.1.1.conv1.bn.running_var, img_bev_encoder_backbone.layers.1.1.conv2.conv.weight, img_bev_encoder_backbone.layers.1.1.conv2.bn.weight, img_bev_encoder_backbone.layers.1.1.conv2.bn.bias, img_bev_encoder_backbone.layers.1.1.conv2.bn.running_mean, img_bev_encoder_backbone.layers.1.1.conv2.bn.running_var, img_bev_encoder_backbone.layers.2.0.conv1.conv.weight, img_bev_encoder_backbone.layers.2.0.conv1.bn.weight, img_bev_encoder_backbone.layers.2.0.conv1.bn.bias, img_bev_encoder_backbone.layers.2.0.conv1.bn.running_mean, img_bev_encoder_backbone.layers.2.0.conv1.bn.running_var, img_bev_encoder_backbone.layers.2.0.conv2.conv.weight, img_bev_encoder_backbone.layers.2.0.conv2.bn.weight, img_bev_encoder_backbone.layers.2.0.conv2.bn.bias, img_bev_encoder_backbone.layers.2.0.conv2.bn.running_mean, img_bev_encoder_backbone.layers.2.0.conv2.bn.running_var, img_bev_encoder_backbone.layers.2.0.downsample.conv.weight, img_bev_encoder_backbone.layers.2.0.downsample.bn.weight, img_bev_encoder_backbone.layers.2.0.downsample.bn.bias, img_bev_encoder_backbone.layers.2.0.downsample.bn.running_mean, img_bev_encoder_backbone.layers.2.0.downsample.bn.running_var, img_bev_encoder_backbone.layers.2.1.conv1.conv.weight, img_bev_encoder_backbone.layers.2.1.conv1.bn.weight, img_bev_encoder_backbone.layers.2.1.conv1.bn.bias, img_bev_encoder_backbone.layers.2.1.conv1.bn.running_mean, img_bev_encoder_backbone.layers.2.1.conv1.bn.running_var, img_bev_encoder_backbone.layers.2.1.conv2.conv.weight, img_bev_encoder_backbone.layers.2.1.conv2.bn.weight, 
img_bev_encoder_backbone.layers.2.1.conv2.bn.bias, img_bev_encoder_backbone.layers.2.1.conv2.bn.running_mean, img_bev_encoder_backbone.layers.2.1.conv2.bn.running_var, img_bev_encoder_backbone.layers.2.2.conv1.conv.weight, img_bev_encoder_backbone.layers.2.2.conv1.bn.weight, img_bev_encoder_backbone.layers.2.2.conv1.bn.bias, img_bev_encoder_backbone.layers.2.2.conv1.bn.running_mean, img_bev_encoder_backbone.layers.2.2.conv1.bn.running_var, img_bev_encoder_backbone.layers.2.2.conv2.conv.weight, img_bev_encoder_backbone.layers.2.2.conv2.bn.weight, img_bev_encoder_backbone.layers.2.2.conv2.bn.bias, img_bev_encoder_backbone.layers.2.2.conv2.bn.running_mean, img_bev_encoder_backbone.layers.2.2.conv2.bn.running_var, img_bev_encoder_backbone.layers.2.3.conv1.conv.weight, img_bev_encoder_backbone.layers.2.3.conv1.bn.weight, img_bev_encoder_backbone.layers.2.3.conv1.bn.bias, img_bev_encoder_backbone.layers.2.3.conv1.bn.running_mean, img_bev_encoder_backbone.layers.2.3.conv1.bn.running_var, img_bev_encoder_backbone.layers.2.3.conv2.conv.weight, img_bev_encoder_backbone.layers.2.3.conv2.bn.weight, img_bev_encoder_backbone.layers.2.3.conv2.bn.bias, img_bev_encoder_backbone.layers.2.3.conv2.bn.running_mean, img_bev_encoder_backbone.layers.2.3.conv2.bn.running_var, img_bev_encoder_neck.conv.conv.weight, img_bev_encoder_neck.conv.bn.weight, img_bev_encoder_neck.conv.bn.bias, img_bev_encoder_neck.conv.bn.running_mean, img_bev_encoder_neck.conv.bn.running_var, pre_process_net.layers.0.0.conv1.conv.weight, pre_process_net.layers.0.0.conv1.bn.weight, pre_process_net.layers.0.0.conv1.bn.bias, pre_process_net.layers.0.0.conv1.bn.running_mean, pre_process_net.layers.0.0.conv1.bn.running_var, pre_process_net.layers.0.0.conv2.conv.weight, pre_process_net.layers.0.0.conv2.bn.weight, pre_process_net.layers.0.0.conv2.bn.bias, pre_process_net.layers.0.0.conv2.bn.running_mean, pre_process_net.layers.0.0.conv2.bn.running_var, pre_process_net.layers.0.0.downsample.conv.weight, pre_process_net.layers.0.0.downsample.bn.weight, pre_process_net.layers.0.0.downsample.bn.bias, pre_process_net.layers.0.0.downsample.bn.running_mean, pre_process_net.layers.0.0.downsample.bn.running_var, final_conv.conv.weight, final_conv.conv.bias, predicter.0.weight, predicter.0.bias, predicter.2.weight, predicter.2.bias**

But in FlashOcc this pretrained model seems to play a greater role: it can also benefit img_bev_encoder_backbone and img_bev_encoder_neck. The FlashOcc training log is below:

2023-11-27 12:32:05,569 - mmdet - INFO - load checkpoint from local path: ./ckpts/bevdet-r50-4d-stereo-cbgs.pth
2023-11-27 12:32:05,948 - mmdet - WARNING - The model and loaded state dict do not match exactly

size mismatch for img_view_transformer.depth_net.cost_volumn_net.0.weight: copying a param with shape torch.Size([59, 59, 3, 3]) from checkpoint, the shape in current model is torch.Size([88, 88, 3, 3]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.0.bias: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.1.weight: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.1.bias: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.1.running_mean: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.1.running_var: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.2.weight: copying a param with shape torch.Size([59, 59, 3, 3]) from checkpoint, the shape in current model is torch.Size([88, 88, 3, 3]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.2.bias: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.3.weight: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.3.bias: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.3.running_mean: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.cost_volumn_net.3.running_var: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
size mismatch for img_view_transformer.depth_net.depth_conv.0.conv1.weight: copying a param with shape torch.Size([256, 315, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 344, 3, 3]).
size mismatch for img_view_transformer.depth_net.depth_conv.0.downsample.weight: copying a param with shape torch.Size([256, 315, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 344, 1, 1]).
size mismatch for img_view_transformer.depth_net.depth_conv.4.weight: copying a param with shape torch.Size([59, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([88, 256, 1, 1]).
size mismatch for img_view_transformer.depth_net.depth_conv.4.bias: copying a param with shape torch.Size([59]) from checkpoint, the shape in current model is torch.Size([88]).
unexpected key in source state_dict: pts_bbox_head.shared_conv.conv.weight, pts_bbox_head.shared_conv.bn.weight, pts_bbox_head.shared_conv.bn.bias, pts_bbox_head.shared_conv.bn.running_mean, pts_bbox_head.shared_conv.bn.running_var, pts_bbox_head.shared_conv.bn.num_batches_tracked, pts_bbox_head.task_heads.0.reg.0.conv.weight, pts_bbox_head.task_heads.0.reg.0.bn.weight, pts_bbox_head.task_heads.0.reg.0.bn.bias, pts_bbox_head.task_heads.0.reg.0.bn.running_mean, pts_bbox_head.task_heads.0.reg.0.bn.running_var, pts_bbox_head.task_heads.0.reg.0.bn.num_batches_tracked, pts_bbox_head.task_heads.0.reg.1.weight, pts_bbox_head.task_heads.0.reg.1.bias, pts_bbox_head.task_heads.0.height.0.conv.weight, pts_bbox_head.task_heads.0.height.0.bn.weight, pts_bbox_head.task_heads.0.height.0.bn.bias, pts_bbox_head.task_heads.0.height.0.bn.running_mean, pts_bbox_head.task_heads.0.height.0.bn.running_var, pts_bbox_head.task_heads.0.height.0.bn.num_batches_tracked, pts_bbox_head.task_heads.0.height.1.weight, pts_bbox_head.task_heads.0.height.1.bias, pts_bbox_head.task_heads.0.dim.0.conv.weight, pts_bbox_head.task_heads.0.dim.0.bn.weight, pts_bbox_head.task_heads.0.dim.0.bn.bias, pts_bbox_head.task_heads.0.dim.0.bn.running_mean, pts_bbox_head.task_heads.0.dim.0.bn.running_var, pts_bbox_head.task_heads.0.dim.0.bn.num_batches_tracked, pts_bbox_head.task_heads.0.dim.1.weight, pts_bbox_head.task_heads.0.dim.1.bias, pts_bbox_head.task_heads.0.rot.0.conv.weight, pts_bbox_head.task_heads.0.rot.0.bn.weight, pts_bbox_head.task_heads.0.rot.0.bn.bias, pts_bbox_head.task_heads.0.rot.0.bn.running_mean, pts_bbox_head.task_heads.0.rot.0.bn.running_var, pts_bbox_head.task_heads.0.rot.0.bn.num_batches_tracked, pts_bbox_head.task_heads.0.rot.1.weight, pts_bbox_head.task_heads.0.rot.1.bias, pts_bbox_head.task_heads.0.vel.0.conv.weight, pts_bbox_head.task_heads.0.vel.0.bn.weight, pts_bbox_head.task_heads.0.vel.0.bn.bias, pts_bbox_head.task_heads.0.vel.0.bn.running_mean, pts_bbox_head.task_heads.0.vel.0.bn.running_var, pts_bbox_head.task_heads.0.vel.0.bn.num_batches_tracked, pts_bbox_head.task_heads.0.vel.1.weight, pts_bbox_head.task_heads.0.vel.1.bias, pts_bbox_head.task_heads.0.heatmap.0.conv.weight, pts_bbox_head.task_heads.0.heatmap.0.bn.weight, pts_bbox_head.task_heads.0.heatmap.0.bn.bias, pts_bbox_head.task_heads.0.heatmap.0.bn.running_mean, pts_bbox_head.task_heads.0.heatmap.0.bn.running_var, pts_bbox_head.task_heads.0.heatmap.0.bn.num_batches_tracked, pts_bbox_head.task_heads.0.heatmap.1.weight, pts_bbox_head.task_heads.0.heatmap.1.bias, pre_process_net.layers.0.1.conv1.weight, pre_process_net.layers.0.1.bn1.weight, pre_process_net.layers.0.1.bn1.bias, pre_process_net.layers.0.1.bn1.running_mean, pre_process_net.layers.0.1.bn1.running_var, pre_process_net.layers.0.1.bn1.num_batches_tracked, pre_process_net.layers.0.1.conv2.weight, pre_process_net.layers.0.1.bn2.weight, pre_process_net.layers.0.1.bn2.bias, pre_process_net.layers.0.1.bn2.running_mean, pre_process_net.layers.0.1.bn2.running_var, pre_process_net.layers.0.1.bn2.num_batches_tracked

missing keys in source state_dict: occ_head.final_conv.conv.weight, occ_head.final_conv.conv.bias, occ_head.predicter.0.weight, occ_head.predicter.0.bias, occ_head.predicter.2.weight, occ_head.predicter.2.bias

I would greatly appreciate it if the author could explain this phenomenon.
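
For context, both logs look like the standard non-strict checkpoint loading behaviour: parameters whose names and shapes match the current model are copied, and everything else is merely reported. A minimal sketch, assuming the config's `load_from` ends up calling mmcv's load_checkpoint with strict=False as in ordinary mmdet training:

```python
# Minimal sketch of non-strict checkpoint loading (assumption: `load_from` in the config
# goes through mmcv's load_checkpoint with strict=False, as in standard mmdet training).
import torch.nn as nn
from mmcv.runner import load_checkpoint

model = nn.Module()   # stand-in for the built FlashOcc / BEVDet-Occ detector
# Keys that match in both name and shape are copied; size mismatches, unexpected keys
# and missing keys are only listed in the "do not match exactly" warning shown above.
load_checkpoint(model, './ckpts/bevdet-r50-4d-stereo-cbgs.pth', map_location='cpu', strict=False)
```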

About EMA in training process

Thank you very much for the open-source code! Could you please clarify whether the reported results use EMA (Exponential Moving Average) of the weights during training?

About bevdet

Hi, thank you for your kind words. Are you asking if there are Flashocc configuration files or ONNX models based on the original BEVDet without using plugins like BEVPool or BEVPoolv2? Deploying a model containing those plugins to non-CUDA frameworks on edge devices can indeed be quite cumbersome.

FlashOcc on FB-OCC

May I ask when your FlashOcc on FB-OCC code will be open-sourced? Thank you very much.

GPU memory required for training

Hello author, thank you for open-sourcing this excellent project.
I would like to ask how much GPU memory FlashOCC training requires.
Thanks!

KeyError: 'scene_name'

Everything is OK when I train and test with flashocc-r50.py, but when I run the following command to try the visualization pipeline, I get an error:

bash tools/dist_test.sh projects/configs/flashocc/flashocc-r50.py ckpts/flashocc-r50-256x704.pth 4 --eval map --eval-options show_dir=work_dirs/flashocc_r50/results 
Here is the error:
![image](https://github.com/Yzichen/FlashOCC/assets/55696767/a276c4ae-3e06-4232-a048-3fa1357bcc77)

print(dataset.evaluate(outputs, **eval_kwargs))
File "/home/maojilei/loc_code/FlashOCC/projects/mmdet3d_plugin/datasets/nuscenes_dataset_occ.py", line 92, in evaluate
scene_name = info['scene_name']
KeyError: 'scene_name'

Visualization problem

Hi, author.
I am now using your visualization script to visualize the occupancy results, but the view of the result is wrong; it looks like this:
(screenshot)
It's totally different from what you posted in this repo.
Do you have any ideas about this problem?
By the way, my environment is:
open3d 0.17.0
PyYAML 5.3.1
torch 1.10.0+cu111
torchaudio 0.10.0+rocm4.1
torchpack 0.3.1
torchvision 0.11.0+cu111
mmcv-full 1.5.3
mmdeploy 0.9.0
mmdet 2.25.1
mmdet3d 1.0.0rc4
mmsegmentation 0.25.0

About Panoptic Occupancy Processing

The heatmap and occ_semantic are not obtained from the same branch, so the number of instance centers obtained by instance regression may differ from the number of instances actually present in occ_semantic, which can cause the following problems:

  1. There is no distance threshold when looking for the nearest instance center point. If a very distant center is matched, under-segmentation occurs and multiple instances end up sharing the same instance ID (a small sketch is given at the end of this issue).
  2. If the heatmap head doesn't perform well, there will be many peaks around the same location that cannot be filtered out by max pooling; over-segmentation then occurs, and multiple instance IDs appear for the same instance.

I don't know whether these issues will affect the final panoptic occupancy results...
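
To make point 1 concrete, here is a small hypothetical sketch (none of these names come from the repo) of nearest-center instance assignment with a distance cut-off, which would prevent a voxel from being claimed by a very distant center:

```python
# Hypothetical sketch of nearest-center instance assignment with a distance cut-off.
# `centers_xy` (N, 2) are predicted instance centers in BEV coordinates and
# `voxel_xy` (M, 2) are BEV coordinates of occupied voxels of the same thing class;
# neither name is taken from the FlashOCC code base.
import torch

def assign_instance_ids(voxel_xy: torch.Tensor, centers_xy: torch.Tensor,
                        max_dist: float = 5.0) -> torch.Tensor:
    dist = torch.cdist(voxel_xy, centers_xy)   # (M, N) pairwise BEV distances
    min_dist, inst_id = dist.min(dim=1)        # nearest center per voxel
    inst_id = inst_id + 1                      # reserve 0 for "no instance"
    inst_id[min_dist > max_dist] = 0           # reject centers that are too far away
    return inst_id

centers = torch.tensor([[10.0, 4.0], [30.0, -2.0]])
voxels = torch.rand(100, 2) * 50.0
print(assign_instance_ids(voxels, centers))
```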

Visualization of v1 and v2 model

Hi, thank you for your great work! I have some questions about the visualization of the v1 and v2 models.

Config (v1 model): M3:FlashOCC-4D-Stereo (2f):

  1. Why are there two vertical bars of blank space next to the ego car in many pictures?
    (screenshots 1 and 2)

Config (v2 model): FlashOCCV2-4DLongterm-Depth (16f):

  1. Why is the ground plane basically blank?
  2. This v2 model did not use the camera mask during training. I thought this would improve the prediction of invisible areas such as the far sky compared with the v1 model, but why does the v2 model predict both the sky and the ground poorly, and which part actually improves in the v2 model?
    (screenshot 3)

Hope to get your answer, thank you!

Why FlashOcc could achieve such outstanding performance with only conv2d ops?

Hello @Yzichen, brilliant idea and nice work!
With all due respect, I wonder why FlashOcc can achieve such amazing results, as shown in your paper, with only Conv2d ops. A couple of months ago I trained BevdetOcc2D, which also uses a 2D img_bev_encoder_backbone and img_bev_encoder_neck (z is collapsed); the only difference from FlashOcc is that before the features enter the head they are reshaped to 3D, so the head still uses Conv3d. When I read your code I expected to see some extraordinary operation or design in the head, but the difference is simply that Conv3d is replaced with Conv2d, plus some reshape ops to get the final predictions. I can't help wondering why this kind of architecture works so well, or maybe there is some novel design that I didn't notice. Please provide some hints. Thank you a lot.
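
For reference, a hedged sketch of the Conv2d-plus-reshape head described above, with illustrative shapes that are not taken from the repo's configs:

```python
# Hedged sketch of the channel-to-height trick: a plain Conv2d predicts
# num_classes * Dz logits per BEV cell, which are then unfolded along Z.
# Layer and variable names are illustrative, not the repo's.
import torch
import torch.nn as nn

B, C, Dy, Dx = 1, 256, 200, 200        # flattened BEV feature (Z already collapsed)
num_classes, Dz = 18, 16

head = nn.Conv2d(C, num_classes * Dz, kernel_size=1)

bev_feat = torch.randn(B, C, Dy, Dx)
logits = head(bev_feat)                           # (B, num_classes*Dz, Dy, Dx)
logits = logits.view(B, num_classes, Dz, Dy, Dx)  # per-voxel class logits
```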

label.npz file

Hello author, may I ask whether it is possible to train, test, and visualize our own dataset without the label.npz file? I have already converted my annotated dataset into the nuScenes format. Thank you, looking forward to your reply.

Test server result

Hi, @Yzichen. Thanks for your wonderful work. Has the result been uploaded to the official test server leaderboard for evaluation?
Looking forward to your reply.
Best wishes!

About feature map

Hi, thank you very much for open-sourcing your code. I have a question I'd like to ask: I have noticed that many object detection and OCC tasks use 16x downsampling, followed by depth distribution estimation on the feature maps. Have you experimented with 4x or 8x feature maps? What are the constraints, such as memory and GPU?

Unable to reproduce experimental results

We ran the flashocc-r50.py and flashocc-stbase-4d-stereo-512x1408.py configs but cannot reproduce the reported results:

  1. flashocc-r50.py
    reported:

===> per class IoU of 6019 samples:

===> others - IoU = 6.74

===> barrier - IoU = 37.65

===> bicycle - IoU = 10.26

===> bus - IoU = 39.55

===> car - IoU = 44.36

===> construction_vehicle - IoU = 14.88

===> motorcycle - IoU = 13.4

===> pedestrian - IoU = 15.79

===> traffic_cone - IoU = 15.38

===> trailer - IoU = 27.44

===> truck - IoU = 31.73

===> driveable_surface - IoU = 78.82

===> other_flat - IoU = 37.98

===> sidewalk - IoU = 48.7

===> terrain - IoU = 52.5

===> manmade - IoU = 37.89

===> vegetation - IoU = 32.24

===> mIoU of 6019 samples: 32.08

re-implemented:

===> per class IoU of 6019 samples:

===> others - IoU = 4.76

===> barrier - IoU = 32.72

===> bicycle - IoU = 10.02

===> bus - IoU = 32.77

===> car - IoU = 41.14

===> construction_vehicle - IoU = 14.91

===> motorcycle - IoU = 13.73

===> pedestrian - IoU = 15.38

===> traffic_cone - IoU = 15.02

===> trailer - IoU = 26.15

===> truck - IoU = 28.94

===> driveable_surface - IoU = 76.61

===> other_flat - IoU = 34.25

===> sidewalk - IoU = 43.99

===> terrain - IoU = 48.82

===> manmade - IoU = 33.38

===> vegetation - IoU = 30.17

===> mIoU of 6019 samples: 29.57

  2. flashocc-stbase-4d-stereo-512x1408.py
    reported:

===> per class IoU of 6019 samples:

===> others - IoU = 13.42

===> barrier - IoU = 51.07

===> bicycle - IoU = 27.68

===> bus - IoU = 51.57

===> car - IoU = 56.22

===> construction_vehicle - IoU = 27.27

===> motorcycle - IoU = 29.98

===> pedestrian - IoU = 29.93

===> traffic_cone - IoU = 29.8

===> trailer - IoU = 37.77

===> truck - IoU = 43.52

===> driveable_surface - IoU = 83.81

===> other_flat - IoU = 46.55

===> sidewalk - IoU = 56.15

===> terrain - IoU = 59.56

===> manmade - IoU = 50.84

===> vegetation - IoU = 44.67

===> mIoU of 6019 samples: 43.52

re-implemented:

===> per class IoU of 6019 samples:

===> others - IoU = 11.87

===> barrier - IoU = 48.89

===> bicycle - IoU = 28.64

===> bus - IoU = 50.12

===> car - IoU = 54.11

===> construction_vehicle - IoU = 24.95

===> motorcycle - IoU = 29.44

===> pedestrian - IoU = 28.22

===> traffic_cone - IoU = 27.04

===> trailer - IoU = 34.54

===> truck - IoU = 41.29

===> driveable_surface - IoU = 82.67

===> other_flat - IoU = 43.11

===> sidewalk - IoU = 54.65

===> terrain - IoU = 58.16

===> manmade - IoU = 49.85

===> vegetation - IoU = 43.37

===> mIoU of 6019 samples: 41.82

Is there an error in our parameter settings? We have not made any modifications.

May I ask about the center of the Occ3D dataset ground-truth coordinates

In step 4 of the environment setup for the occupancy prediction task, the guideline tells us to only download the gts from the Occ3D dataset. May I ask which coordinate frame the Occ3D dataset uses? Does the Occ3D dataset use the nuScenes ego coordinates as its ground-truth center?

get flops failed

Thank you for your great work!

The provided code works well for the training task. However, when attempting to run the code for computing FLOPs, I encountered the following issue. I would also like to ask whether the model parameter counts in your paper were obtained using the command below.

python tools/analysis_tools/get_flops.py projects/configs/flashocc/flashocc-r50.py  --shape 256 704

and the error is:

Traceback (most recent call last):
  File "tools/analysis_tools/get_flops.py", line 109, in <module>
    main()                                                       
  File "tools/analysis_tools/get_flops.py", line 83, in main     
    model = build_model(                                         
  File "C:\Users\64679\anaconda3\envs\bevdet\lib\site-packages\mmdet3d-1.0.0rc4-py3.8.egg\mmdet3d\models\builder.py", line 122, in build_model
    return build_detector(cfg, train_cfg=train_cfg, test_cfg=test_cfg)
  File "C:\Users\64679\anaconda3\envs\bevdet\lib\site-packages\mmdet3d-1.0.0rc4-py3.8.egg\mmdet3d\models\builder.py", line 95, in build_detector
    return MMDET_DETECTORS.build(
  File "C:\Users\64679\anaconda3\envs\bevdet\lib\site-packages\mmcv\utils\registry.py", line 237, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "C:\Users\64679\anaconda3\envs\bevdet\lib\site-packages\mmcv\cnn\builder.py", line 27, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "C:\Users\64679\anaconda3\envs\bevdet\lib\site-packages\mmcv\utils\registry.py", line 61, in build_from_cfg
    raise KeyError(
KeyError: 'BEVDetOCC is not in the models registry'

However, BEVDetOCC is already registered in "projects/mmdet3d_plugin/models/detectors/bevdet_occ.py",
so I was wondering how to fix this.
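
A hedged guess at the cause (not verified against the repo): get_flops.py builds the model without importing the plugin package, so the register_module() calls in projects/mmdet3d_plugin never run, whereas the repo's train.py (as quoted in the Ninja issue above) imports the plugin via importlib. Something like the following before build_model() might be all that's missing:

```python
# Hypothetical workaround sketch: import the plugin package before build_model() so that
# its register_module() decorators run and 'BEVDetOCC' lands in the models registry.
# The module path is an assumption; adapt it to the plugin_dir set in the config.
import importlib

importlib.import_module('projects.mmdet3d_plugin')
```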

Question about the class_wise parameter setting

Hello, I saw that class_wise is used in your paper, but in the code projects/mmdet3d_plugin/models/dense_heads/bev_occ_head.py I cannot find where class_wise is actually used. How should class_wise be used during training?

Some details about the Center Regression Head in Panoptic-FlashOcc

Given that the BEV feature is a 2D flattened pillar representation rather than a 3D voxel representation, the regression for x and y represents the transitional ratio from the actual center to the discrete index, while the regression for z accounts for the estimation of the absolute height ratio.

Can you explain the generation of the ground truth and the prediction result for (x, y, z) in detail, ideally with examples?
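
Purely as an illustration of how I read that sentence, and not the repo's actual target encoding, here is a sketch in which x/y regress the fractional offset of the true center from its discrete BEV cell and z regresses the height normalized over the pillar's vertical range (the grid extents and names below are made up):

```python
# Illustrative encoding only; grid extents, voxel size and variable names are assumptions,
# not taken from Panoptic-FlashOcc.
import numpy as np

x_min, y_min, z_min, z_max, voxel = -40.0, -40.0, -1.0, 5.4, 0.4

def encode_center(cx, cy, cz):
    # x/y: fractional offset of the true center from its discrete BEV cell index
    ix = int(np.floor((cx - x_min) / voxel))
    iy = int(np.floor((cy - y_min) / voxel))
    dx = (cx - x_min) / voxel - ix
    dy = (cy - y_min) / voxel - iy
    # z: absolute height expressed as a ratio of the pillar's vertical range
    dz = (cz - z_min) / (z_max - z_min)
    return (ix, iy), (dx, dy, dz)

print(encode_center(3.27, -1.9, 0.8))
```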

RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling `cusolverDnCreate(handle)`

Hi, I have set up the environment and run an inference test following the documentation (https://github.com/Yzichen/FlashOCC/blob/master/doc/install.md), but the following error message appears. Please help me check whether there is a problem with the model configuration.

WARNING:torch.distributed.run:

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

/home/users/yangchun.yan/venv/FlashOCC_venv/lib/python3.8/site-packages/mmdet/utils/setup_env.py:48: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
warnings.warn(
projects.mmdet3d_plugin
/home/users/yangchun.yan/venv/FlashOCC_venv/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py:401: UserWarning: DeprecationWarning: pretrained is deprecated, please use "init_cfg" instead
warnings.warn('DeprecationWarning: pretrained is deprecated, '
/home/users/yangchun.yan/venv/FlashOCC_venv/lib/python3.8/site-packages/mmdet/models/losses/cross_entropy_loss.py:239: UserWarning: Default avg_non_ignore is False, if you would like to ignore the certain label and average loss over non-ignore labels, which is the same with PyTorch official cross_entropy, set avg_non_ignore=True.
warnings.warn(
load checkpoint from local path: ckpts/flashocc-r50-256x704.pth
[ ] 0/6019, elapsed: 0s, ETA:/home/users/yangchun.yan/FlashOCC/projects/mmdet3d_plugin/datasets/pipelines/loading.py:361: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:201.)
gt_boxes, gt_labels = torch.Tensor(gt_boxes), torch.tensor(gt_labels)
Traceback (most recent call last):
File "tools/test.py", line 290, in
main()
File "tools/test.py", line 266, in main
outputs = multi_gpu_test(model, data_loader, args.tmpdir,
File "/home/users/yangchun.yan/venv/FlashOCC_venv/lib/python3.8/site-packages/mmdet/apis/test.py", line 109, in multi_gpu_test
result = model(return_loss=False, rescale=True, **data)
File "/home/users/yangchun.yan/venv/FlashOCC_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/users/yangchun.yan/venv/FlashOCC_venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/users/yangchun.yan/venv/FlashOCC_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/users/yangchun.yan/venv/FlashOCC_venv/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
return old_func(*args, **kwargs)
File "/home/users/yangchun.yan/FlashOCC/mmdetection3d/mmdet3d/models/detectors/base.py", line 62, in forward
return self.forward_test(**kwargs)
File "/home/users/yangchun.yan/FlashOCC/projects/mmdet3d_plugin/models/detectors/bevdet.py", line 201, in forward_test
return self.simple_test(points[0], img_metas[0], img_inputs[0],
File "/home/users/yangchun.yan/FlashOCC/projects/mmdet3d_plugin/models/detectors/bevdet_occ.py", line 111, in simple_test
img_feats, _, _ = self.extract_feat(
File "/home/users/yangchun.yan/FlashOCC/projects/mmdet3d_plugin/models/detectors/bevdet.py", line 116, in extract_feat
img_feats, depth = self.extract_img_feat(img_inputs, img_metas, **kwargs)
File "/home/users/yangchun.yan/FlashOCC/projects/mmdet3d_plugin/models/detectors/bevdet.py", line 94, in extract_img_feat
img_inputs = self.prepare_inputs(img_inputs)
File "/home/users/yangchun.yan/FlashOCC/projects/mmdet3d_plugin/models/detectors/bevdet.py", line 72, in prepare_inputs
global2keyego = torch.inverse(keyego2global.double()) # (B, 1, 4, 4)
RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling cusolverDnCreate(handle)

npz file

Hello author, thank you for your contribution. I would like to ask how the npz file is generated, and is there any code available? I want to use it to train, validate, and test my own dataset. Thanks!

About LSSViewTransformerBEVStereo and LSSViewTransformerBEVDepth

Hi, thanks again for your great work!
I have a question about LSSViewTransformerBEVStereo. The code is:

class LSSViewTransformerBEVStereo(LSSViewTransformerBEVDepth):
    def __init__(self,  **kwargs):
        super(LSSViewTransformerBEVStereo, self).__init__(**kwargs)
        # (D, fH_stereo, fW_stereo, 3)  3:(u, v, d)
        self.cv_frustum = self.create_frustum(kwargs['grid_config']['depth'],
                                              kwargs['input_size'],
                                              downsample=4)

What's the difference between LSSViewTransformerBEVDepth and LSSViewTransformerBEVStereo, given that self.cv_frustum does not seem to be used?

Onnx model

Hi, thank you for your outstanding work. Could you please advise on how to convert the model weights to ONNX format, or provide the ONNX file if possible?

Environment Setup

sudo apt-get install python3-dev
sudo apt-get install libevent-dev
sudo apt-get groupinstall 'development tools'

Hi, please tell me: when setting up the environment, do I have to install the above libraries?
These operations require sudo permissions, but on my development machine, getting sudo permission requires a formal application.

FlashOCC on object detection task

Thank you for your kind words.

I noticed the existence of the detection head file https://github.com/Yzichen/FlashOCC/blob/master/projects/mmdet3d_plugin/models/dense_heads/bev_centerpoint_head.py for object detection.

I am curious whether you have attempted to replace the occupancy head with the detection head.

Based on my understanding, the features entering the dense head are consistent between detection and occupancy tasks.

Therefore, the features outputted by FlashOCC should be capable of performing object detection tasks as well.

If you have conducted this experiment, could you provide the evaluation metrics for object detection?

is augmentation valid?

In the pipeline, augmentations such as flip and rotation change the input image,
but these augmentations are not currently applied to the occupancy GT.
Wouldn't it be logically correct to stop applying those augmentations?
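
Just to illustrate what applying such an augmentation to the GT would look like (this is not a claim about what the pipeline should do, and the tensor layout is an assumption):

```python
# Illustration only: if a horizontal BEV flip were applied to the BEV features,
# the occupancy GT would need the matching flip. The (B, Dx, Dy, Dz) layout and
# 18 classes are assumptions, not taken from the repo.
import torch

voxel_semantics = torch.randint(0, 18, (1, 200, 200, 16))   # dummy semantic voxel GT
flipped_gt = torch.flip(voxel_semantics, dims=[2])           # flip along the BEV y axis
```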

No supported GPU(s) detected to run this container

Thank you for your great work!

When I used the built image to create a container, the following error occurs:

WARNING: Detected NVIDIA NVIDIA GeForce RTX 4090 GPU, which is not yet supported in this version of the container
ERROR: No supported GPU(s) detected to run this container

Does this mean that NVIDIA TensorRT 22.07 (refer to "FROM nvcr.io/nvidia/tensorrt:22.07-py3" in the Dockerfile) is incompatible with the RTX 4090 GPU? Can FlashOCC run on GPUs other than the 3090?

(screenshot)

Many thanks!

[Bug] Incorrect Channel Order Flip during Image Loading

There appears to be a bug in projects/mmdet3d_plugin/datasets/pipelines/loading.py at line 21. The to_rgb = True setting assumes that the images are initially in BGR format and then converts them to RGB. However, since the images are loaded using Image.open(), they are already in RGB format from the start.
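
A small hedged reproduction of the concern, independent of the repo (the file name and normalization constants below are placeholders), using mmcv's imnormalize, which swaps channels when to_rgb=True:

```python
# Hedged illustration of the reported issue: Image.open() already yields RGB,
# so normalizing with to_rgb=True would swap the channels a second time (to BGR order).
import numpy as np
import mmcv
from PIL import Image

img = np.array(Image.open('sample.jpg'))     # PIL gives RGB, shape (H, W, 3)
mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
std = np.array([58.395, 57.12, 57.375], dtype=np.float32)

# to_rgb=True assumes a BGR input and swaps channels, so an RGB input ends up as BGR:
swapped = mmcv.imnormalize(img.astype(np.float32), mean, std, to_rgb=True)
# to_rgb=False would leave the PIL-loaded image in RGB order:
kept = mmcv.imnormalize(img.astype(np.float32), mean, std, to_rgb=False)
```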

KeyError: 'MEGVIIEMAHook is already registered in hook'

Thank you for your great work!
I encountered an issue when running both your test and training code. It appears to be related to repeated registration. Could you please give me some guidance on how to fix this issue?

(screenshot)

AssertionError: bev_pool_v2 is not in the plugin list of tensorrt

Hi, I have already installed mmdeploy from git clone git@github.com:drilistbox/mmdeploy.git, but there is an error when I use the command: python tools/convert_bevdet_to_TRT.py $config $checkpoint $work_dir --fuse-conv-bn --fp16
Could you please give me some instructions? Thank you.

(FlashOcc) 3@1:~/FlashOCC-master-V2$ python tools/convert_bevdet_to_TRT.py $config $checkpoint $work_dir --fuse-conv-bn --int8 --calib_num 256

2024-05-09 18:00:45,267 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /home/xinjishu-workstation3/FY/FlashOCC-master-V2/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so
Traceback (most recent call last):
File "tools/convert_bevdet_to_TRT.py", line 560, in
main()
File "tools/convert_bevdet_to_TRT.py", line 311, in main
assert 'bev_pool_v2' in get_plugin_names(),
AssertionError: bev_pool_v2 is not in the plugin list of tensorrt, please install mmdeploy from https://github.com/HuangJunJie2017/mmdeploy.git

class weight setting in BEVOCCHead2D and BEVOCCHead2D_V2

Hi, in your BEVOCCHead2D, the class-balance setting is:

    if self.class_balance:
        valid_voxels = voxel_semantics[mask_camera.bool()]
        num_total_samples = 0
        for i in range(self.num_classes):
            num_total_samples += (valid_voxels == i).sum() * self.cls_weights[i]
    else:
        num_total_samples = mask_camera.sum()

    loss_occ = self.loss_occ(
        preds,              # (B*Dx*Dy*Dz, n_cls)
        voxel_semantics,    # (B*Dx*Dy*Dz, )
        mask_camera,        # (B*Dx*Dy*Dz, )
        avg_factor=num_total_samples
    )
    loss['loss_occ'] = loss_occ

However, the defined self.cls_weights is not passed to the cross-entropy loss itself, unlike in BEVOCCHead2D_V2, and I am wondering how the class-balance setting actually works in BEVOCCHead2D.
Also, in BEVOCCHead2D_V2 the camera mask is not used, right?
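
For reference, the usual way per-class weights enter the loss value itself is via the weight argument of cross entropy; a minimal plain-PyTorch sketch with made-up weights (this is not the repo's loss_occ wrapper, where the quoted code only feeds cls_weights into avg_factor):

```python
# Minimal sketch of class-weighted cross entropy (plain PyTorch, illustrative values only).
import torch
import torch.nn.functional as F

num_classes = 18
cls_weights = torch.ones(num_classes)
cls_weights[17] = 0.5                        # e.g. down-weight the last class (made-up value)

preds = torch.randn(4096, num_classes)       # (B*Dx*Dy*Dz, n_cls)
target = torch.randint(0, num_classes, (4096,))

loss = F.cross_entropy(preds, target, weight=cls_weights)
```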

Data preparation

The nuscenes_det.md you mentioned is missing. Could you please provide a specific data preparation process?

Low miou when only counting the occupancy results within the field of view of the front camera

Hello author, thank you for your work!
Why does the mIoU decrease significantly (by approximately 5 points, 32 -> 27) when only counting the occupancy results within the field of view of the front camera? I set my own camera mask to ignore the rest of the space:
(screenshot)

At the same time, I attempted to train using only the images from the front camera, and used the camera mask to only calculate the occupancy loss within the field of view of the front camera. The results were similar.

align_after_view_transfromation

May I ask: in the configs of the temporal models, I see that the parameter align_after_view_transfromation is set to False. Why is that?
After obtaining the temporal BEV feats, don't they need to be aligned to the current frame?

open3d version

Hi, I tried to visualize the inference results but got an error: ModuleNotFoundError: No module named 'open3d'. So I need to install it; please tell me which version I should install.

I tried several versions for Python 3.8.12, including open3d-python, but encountered many errors.
