weiyithu / surroundocc

[ICCV 2023] SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving

License: Apache License 2.0

Python 98.29% Shell 0.18% Cuda 1.29% C++ 0.23%

3d-reconstruction 3d-semantic-segmentation occupancy occupancy-prediction

surroundocc's Issues

Error during python setup.py install: I have installed urllib3=1.9, but setup always picks up version 2.0.3 and does not recognize it

Installed /home/adas/anaconda3/envs/surroundocc/lib/python3.8/site-packages/mmdet3d-0.17.1-py3.8-linux-x86_64.egg
Processing dependencies for mmdet3d==0.17.1
error: urllib3 2.0.3 is installed but urllib3<2.0 is required by {'google-auth'}

Some confusion about this paper

Hello, what makes this method succeed? I think the generated dense lidar occupancy labels play an important role in this work. If I choose another dataset and cannot generate such dense labels using Poisson reconstruction or other methods, will the proposed method still work well?
Looking forward to your reply, thanks!

Does the man-made class mean buildings?

Thank you for your great work!
I'd like to ask: does man-made refer to buildings?

Which version of open3d-python is used?

Visualize the result using Open3D

Thanks for your work. Can Open3D be used for the visualization, and which option is preferable? Alternatively, could you provide the list of package versions? I have run into a problem with mayavi.

Training with a different resolution

Thanks for your wonderful work.
I want to train a network at a different resolution.

Do I only need to modify the config file?
Is the output resolution the same as the input resolution?
How can I save the inference result in the same format as the input occ.npy file?
Do "mesh vertices | train" and "mesh vertices | val" have no annotations?

point_cloud_range = [-50, -50, -5.0, 50, 50, 3.0]
occ_size = [200, 200, 16]

Thanks~
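The two config values quoted above are linked by the voxel edge length: each axis extent divided by the number of cells along that axis. A minimal sketch of that arithmetic follows; the helper name `voxel_size_from_config` is hypothetical and not part of the repository.

```python
def voxel_size_from_config(point_cloud_range, occ_size):
    """Return the per-axis voxel edge length implied by a range and a grid size.

    point_cloud_range: [x_min, y_min, z_min, x_max, y_max, z_max]
    occ_size:          [nx, ny, nz]
    (Hypothetical helper for illustration only.)
    """
    mins, maxs = point_cloud_range[:3], point_cloud_range[3:]
    return [(hi - lo) / n for lo, hi, n in zip(mins, maxs, occ_size)]


# Values quoted in the issue above.
point_cloud_range = [-50, -50, -5.0, 50, 50, 3.0]
occ_size = [200, 200, 16]

voxel_size = voxel_size_from_config(point_cloud_range, occ_size)
print(voxel_size)  # [0.5, 0.5, 0.5]
```

Changing only one of the two values changes the implied voxel size, so they are usually edited together.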

How to convert pth to onnx and tensorrt engine?

Thank you for the wonderful work.
I have a question.
How to convert pth to onnx and tensorrt engine?

How to obtain dense points for movable objects?

Thank you for this project. It's amazing. I have one question: how do you obtain dense point clouds of movable objects?

Movable objects move at varying speeds. If the points of a movable object from different frames are simply stacked together, the processed scene will contain trailing (smeared) points. Do you use the bbox index to identify the same bbox across frames? If so, how are the bboxes aligned across frames, given that the number of indices may vary from frame to frame? How is this achieved?

Looking forward to your reply. Thanks in advance!
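One common way to avoid the trailing points described above is to express each object's points in its own box frame before accumulating them, keyed by an identifier that persists across frames (nuScenes annotations carry an instance token). The sketch below illustrates that idea only; it is not confirmed to be the authors' implementation, and `to_box_frame`, `accumulate_by_instance`, and the input layout are hypothetical.

```python
import numpy as np


def to_box_frame(points, center, yaw):
    """Express (N, 3) points in a box's local frame (origin at center, x along heading)."""
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotation about z such that world = rot @ local + center (column vectors).
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # For row vectors, applying rot^T is a right-multiplication by rot.
    return (points - np.asarray(center)) @ rot


def accumulate_by_instance(frames):
    """frames: list of dicts mapping instance_id -> (points, center, yaw).

    Returns instance_id -> (M, 3) array of points in that object's box frame.
    Objects missing from a frame are simply skipped for that frame.
    """
    merged = {}
    for frame in frames:
        for instance_id, (points, center, yaw) in frame.items():
            merged.setdefault(instance_id, []).append(to_box_frame(points, center, yaw))
    return {k: np.concatenate(v, axis=0) for k, v in merged.items()}


# The same physical point on a car, seen in two frames while the car moves and turns.
frames = [
    {"car_a": (np.array([[10.0, 6.0, 0.0]]), [10.0, 5.0, 0.0], np.pi / 2)},
    {"car_a": (np.array([[21.0, 5.0, 0.0]]), [20.0, 5.0, 0.0], 0.0)},
]
dense = accumulate_by_instance(frames)
```

Because matching is done by identifier rather than by position in a per-frame list, it does not matter that the number of boxes differs between frames.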

An issue that may cause corruption in training

Hi,
I'm training your model on my own dataset and found that in one particular case training breaks.
In your loss function geo_scal_loss you wrote:

spec = ((1 - nonempty_target) * (empty_probs)).sum() / (1 - nonempty_target).sum()

However, if all targets are non-empty (this looks ridiculous, but my data contains many ignored voxels, so it really does happen for low-resolution targets), the denominator is 0, which causes a NaN loss.

I fixed this by manually setting spec to 0 in that case.
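The guard described in this issue can be sketched as follows. This is a NumPy illustration rather than the repository's PyTorch code; the function name `safe_spec` and the `fallback` parameter are hypothetical, and the default fallback of 0 simply mirrors the workaround the reporter describes.

```python
import numpy as np


def safe_spec(nonempty_target, empty_probs, fallback=0.0):
    """Specificity-style term with a guard for the 'no empty voxel' case.

    nonempty_target: array of 0/1 flags, 1 where the target voxel is occupied.
    empty_probs:     predicted probability that each voxel is empty.
    fallback:        value returned when there are no empty target voxels
                     (assumption: 0, as in the reporter's workaround).
    """
    empty_target = 1.0 - nonempty_target
    denom = empty_target.sum()
    if denom == 0:
        # Without this branch the division below would be 0 / 0 = NaN.
        return fallback
    return float((empty_target * empty_probs).sum() / denom)


all_occupied = safe_spec(np.array([1.0, 1.0]), np.array([0.2, 0.8]))
mixed = safe_spec(np.array([1.0, 0.0]), np.array([0.2, 0.8]))
```

Whether 0 is the right fallback depends on how the term is used afterwards; skipping the term entirely for that batch is another option.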

problem in 'process_your_own_data.py'

Hi, first of all, thanks for your great work. I am trying to process my own data based on your work. However, when I run 'process_your_own_data.py', I get the following error:

Traceback (most recent call last):
  File "tools/generate_occupancy_with_own_data/process_your_own_data.py", line 3, in <module>
    import chamfer
ImportError: libc10.so: cannot open shared object file: No such file or directory

I think 'import torch' may be needed before 'import chamfer'; with that change it works. Is this a small bug in the code?

About the dense labels you provided

I visualized the labels you provide in prepare_dataset:

Is this the dense label, like the rightmost picture shown below?

There seems to be a small difference, for example in the drivable surface.

Results of the model without semantics

Hi, I'm just wondering whether you have any visual results from the model trained without semantics. Thanks a lot.

Which version of open3d should I use?

From what I tried, versions <= 0.3.0 do not work properly on Python 3.7, which is the version mentioned in your README, while versions >= 0.5.0 do not have "open3d.geometry.TriangleMesh.create_from_point_cloud_poisson". Should I use an old version and build it from source?

How is the txt file for nusc_val_list in generate_occupancy_nuscenes.py generated?

In generate_occupancy_nuscenes.py:
parse.add_argument('--nusc_val_list', type=str, default='./nuscenes_val_list.txt')

Questions about inference time

Hi, thanks for sharing the work. I tried to run inference with ./tools/dist_test.sh ./projects/configs/surroundocc/surroundocc.py ./path/to/ckpts.pth 8 using the pickle file you provided. However, inference is very slow: it took almost 2 hours to go through the entire validation set with 8 V100 GPUs. Is this normal, or am I missing something?

Relationship with OpenOccupancy

I would like to know the relationship and differences between this dataset and the OpenOccupancy dataset: https://github.com/JeffWang987/OpenOccupancy
I noticed that some authors appear on both papers, and the datasets look similar.

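Which coordinate system is the ground truth in?

Hello! I have a question about the label coordinate system: fov_voxels, without semantic labels, is in the lidar coordinate system, while dense_voxels_with_semantic, with semantic labels, is in the voxel coordinate system. If I don't use the semantic labels, should I save np.stack(np.where(voxel==1), axis=1) as the label instead of using fov_voxels? Or should I obtain the voxel indexes as shown below? Thanks a lot!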

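        # Shift lidar coordinates to the grid origin and scale to voxel units
        fov_voxels[:, 0] = (fov_voxels[:, 0] - pc_range[0]) / voxel_size
        fov_voxels[:, 1] = (fov_voxels[:, 1] - pc_range[1]) / voxel_size
        fov_voxels[:, 2] = (fov_voxels[:, 2] - pc_range[2]) / voxel_size
        # np.int was removed in NumPy 1.24; the builtin int is equivalent
        voxel_indexes = np.floor(fov_voxels).astype(int)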
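To make the two coordinate systems in this question concrete, the sketch below converts voxel indices to lidar-frame coordinates and back. It assumes a uniform voxel size and that a voxel is represented by its center (index + 0.5); that convention, the function names, and the values of `pc_range` and `voxel_size` (taken from the config quoted elsewhere on this page) are assumptions, not confirmed repository behavior.

```python
import numpy as np

pc_range = np.array([-50.0, -50.0, -5.0, 50.0, 50.0, 3.0])  # assumed grid extent
voxel_size = 0.5                                             # assumed edge length (m)


def index_to_lidar(voxel_indexes):
    """Voxel indices (N, 3) -> lidar-frame coordinates of the voxel centers."""
    return (voxel_indexes + 0.5) * voxel_size + pc_range[:3]


def lidar_to_index(points):
    """Lidar-frame coordinates (N, 3) -> integer voxel indices."""
    return np.floor((points - pc_range[:3]) / voxel_size).astype(int)


indexes = np.array([[0, 199, 15], [100, 100, 8]])
centers = index_to_lidar(indexes)
round_trip = lidar_to_index(centers)
```

Points outside `pc_range` produce indices outside the grid, so they would need to be filtered before being used to index an array.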

Source code of process_your_own_data.py

Question about folder structure of 'nuscenes_occ'

What is the 'nuscenes_occ' folder, and how is it structured?

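About occupancy resolution and dynamic objects in the ground truth

Hello, first of all, thank you for open-sourcing such an impressive project. After running training, I have a few questions on which I would like your group's views:

Currently the output of SurroundOcc is a fixed-resolution voxel grid (XYZ). At Tesla AI Day it was mentioned that, besides the fixed-size voxel grid, they also produce a per-voxel feature map that can be passed to an MLP and queried with 3D spatial points to obtain a continuous occupancy probability at theoretically unlimited resolution. Does your group have any views or related experiments on this part of the model?
Currently the ground-truth occupancy voxel grid is built by merging multi-frame lidar point clouds, with dynamic objects cut out and aligned using the 3D bbox labels. In practice this approach has a problem: a major advantage of an occupancy network is that it can perceive irregularly shaped objects or complex objects that are not annotated with a bbox, but SurroundOcc's current ground-truth generation handles dynamic objects only when they have a known bbox. Does your group have any thoughts or experiments on this point?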

Question for inference

How do I prepare the inputs needed to generate occupancy for my own data?

I want to try this on my own data. I have the required information, but how do I convert it into the expected format, for example pc, bbox, calib, and pose?

The visualization effect does not match the display

Hello, may I ask why the occ voxels appear very sparse when I visualize the npy sample data using visual.py? It is different from the dense voxel effect shown in the cover examples. Could you please explain the reason for this?

Train SurroundOcc using sparse data

Have you tried training SurroundOcc on sparse occupancy data, that is, labels with more holes? How does it perform, especially compared with other methods? We tried this briefly but got bad results.

I noticed your results on sparse/dense data in Table 6. How did you process the sparse ground truth in the multi-scale supervision? Is it the same as for the dense one, as in:

SurroundOcc/projects/mmdet3d_plugin/surroundocc/loss/loss_utils.py

Lines 11 to 14 in d346e8c

    
    gt = torch.zeros([gt_shape[0], gt_shape[2], gt_shape[3], gt_shape[4]]).to(gt_occ.device).type(torch.float)
    for i in range(gt.shape[0]):
        coords = gt_occ[i][:, :3].type(torch.long) // ratio
        gt[i, coords[:, 0], coords[:, 1], coords[:, 2]] = gt_occ[i][:, 3]

If so, I applied it to dense (above) and sparse (below) occupancy data:

Is this what you actually did to obtain the results in the paper?

Can the nuScenes mini dataset be used?

About the training logs

I wonder if you could provide your training logs in SurroundOcc?

How do I generate 'occ_path' in nuscenes_infos.pkl ?

I can't find any code related to key word 'occ_path' in nuscenes_converter.py

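How many frames are fused in multi-frame fusion? Can future frames be used?

Hello, I have two questions:

Is there a rationale for how many frames to fuse? My feeling is that covering as much of the ground as possible is best, so the ROI, the frame rate, and the vehicle speed need to be considered. Is that right?
Can future frames be fused? If only past frames are fused, the point cloud in front of the vehicle at the current time is still sparse. I think future frames could be fused as well, because regions covered only by past or future frames are not visible at the current time in either case.

Thank you very much!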

Question about config for GT reconstruction

Hi there, loved your work. I want to ask about the config.yaml for ground-truth generation on nuScenes. When I change the config to the following:
'depth': 10
'min_density': 0.1
'n_threads': -1
'downsample': False
'voxel_size': 0.4
'max_nn': 20
'pc_range': [-40, -40, -1, 40, 40, 5.4]
'occ_size': [200, 200, 16]
'self_range': [3.0, 3.0, 3.0]
which changes pc_range and voxel_size, the output is no longer correct. Has your team encountered this problem before?
Original GT: "https://drive.google.com/file/d/1x_rXif3HO_-0_lVuIduNQ7QfLgKofw3Z/view?usp=sharing"
Altered GT: "https://drive.google.com/file/d/1sd_FTcMDw_WvE26ei1zsHCavXePiCoIv/view?usp=sharing"
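When pc_range and voxel_size are changed together, one quick sanity check is whether they still imply the configured occ_size. The sketch below performs that check on the values quoted in this issue; `expected_occ_size` is a hypothetical helper, and it assumes a single voxel_size shared by all three axes.

```python
def expected_occ_size(pc_range, voxel_size, tol=1e-6):
    """Grid size implied by pc_range and voxel_size (hypothetical helper).

    Raises ValueError if an axis extent is not a whole number of voxels.
    """
    sizes = []
    for lo, hi in zip(pc_range[:3], pc_range[3:]):
        cells = (hi - lo) / voxel_size
        # round() absorbs floating-point noise such as 15.999999999999998
        if abs(cells - round(cells)) > tol:
            raise ValueError(f"extent {hi - lo} is not a multiple of {voxel_size}")
        sizes.append(round(cells))
    return sizes


config = {
    "voxel_size": 0.4,
    "pc_range": [-40, -40, -1, 40, 40, 5.4],
    "occ_size": [200, 200, 16],
}
implied = expected_occ_size(config["pc_range"], config["voxel_size"])
consistent = implied == config["occ_size"]
```

For the quoted values the three settings agree with each other, so an inconsistency between them is probably not what breaks the output; the check says nothing about other places where the range or voxel size might be assumed.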

Could this model be used with nuScenes 3D point clouds instead of multi-view images?

Hi Yi Wei, great work. I was wondering whether the occupancy network can take 3D point clouds as input for perception tasks.

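About the multiscale_supervision function used in the loss computation

Hello, I don't quite understand your loss function multiscale_supervision. My understanding is that it generates multi-scale GT, which I would expect to use interpolation, so this part of the code confuses me (https://github.com/weiyithu/SurroundOcc/tree/main/projects/mmdet3d_plugin/surroundocc/loss/loss_utils.py). I am here to ask about it.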
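For readers comparing this with interpolation: the excerpt of multiscale_supervision quoted elsewhere on this page appears to scatter sparse labelled voxels into a coarser dense grid by integer-dividing their coordinates. A NumPy sketch of that pattern for a single sample follows; it is an illustration under that reading, not the repository's code, and `downsample_sparse_labels` is a hypothetical name.

```python
import numpy as np


def downsample_sparse_labels(gt_occ, grid_shape, ratio):
    """Scatter sparse labels into a grid that is `ratio` times coarser per axis.

    gt_occ:     (N, 4) array of [x, y, z, label] at full resolution.
    grid_shape: full-resolution grid size (X, Y, Z).
    """
    coarse_shape = tuple(s // ratio for s in grid_shape)
    gt = np.zeros(coarse_shape, dtype=np.float32)  # 0 = empty
    # Integer division maps every fine voxel to the coarse cell containing it.
    coords = gt_occ[:, :3].astype(np.int64) // ratio
    gt[coords[:, 0], coords[:, 1], coords[:, 2]] = gt_occ[:, 3]
    return gt


gt_occ = np.array([[0, 0, 0, 3], [5, 2, 1, 7]])
coarse = downsample_sparse_labels(gt_occ, grid_shape=(8, 8, 4), ratio=2)
```

Under this scheme a coarse cell is occupied whenever any fine voxel inside it is labelled, and when several labelled voxels share a cell only one of their labels is kept. Class labels are categorical, so averaging-style interpolation would blend class ids into meaningless values, which may be why a scatter is used instead.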

[Open3D WARNING] /root/Open3D/build/poisson/src/ext_poisson/PoissonRecon/Src/FEMTree.IsoSurface.specialized.inl (Line 1858)

Hello, when running the whole_scene_to_mesh code block, the following warning is reported. Does this warning need attention? Thanks!

if args.to_mesh and not args.whole_scene_to_mesh:
    ################## get mesh via Possion Surface Reconstruction ##############
    point_cloud_original = o3d.geometry.PointCloud()
    with_normal2 = o3d.geometry.PointCloud()
    point_cloud_original.points = o3d.utility.Vector3dVector(scene_points[:, :3])
    with_normal = preprocess(point_cloud_original, config)
    with_normal2.points = with_normal.points
    with_normal2.normals = with_normal.normals
    mesh, _ = create_mesh_from_map(None, config['depth'], config['n_threads'],
                                   config['min_density'], with_normal2)
    scene_points = np.asarray(mesh.vertices, dtype=float)

[WARNING] /root/Open3D/build/poisson/src/ext_poisson/PoissonRecon/Src/FEMTree.IsoSurface.specialized.inl (Line 1858)
          Extract
          bad average roots: 1
[WARNING] /root/Open3D/build/poisson/src/ext_poisson/PoissonRecon/Src/FEMTree.IsoSurface.specialized.inl (Line 1858)
          Extract
          bad average roots: 4
[WARNING] /root/Open3D/build/poisson/src/ext_poisson/PoissonRecon/Src/FEMTree.IsoSurface.specialized.inl (Line 1858)
          Extract
          bad average roots: 4
[WARNING] /root/Open3D/build/poisson/src/ext_poisson/PoissonRecon/Src/FEMTree.IsoSurface.specialized.inl (Line 1858)
          Extract
          bad average roots: 5
[WARNING] /root/Open3D/build/poisson/src/ext_poisson/PoissonRecon/Src/FEMTree.IsoSurface.specialized.inl (Line 1858)
          Extract
          bad average roots: 2
[WARNING] /root/Open3D/build/poisson/src/ext_poisson/PoissonRecon/Src/FEMTree.IsoSurface.specialized.inl (Line 1858)
          Extract
          bad average roots: 3
[WARNING] /root/Open3D/build/poisson/src/ext_poisson/PoissonRecon/Src/FEMTree.IsoSurface.specialized.inl (Line 1858)
          Extract
          bad average roots: 3
[WARNING] /root/Open3D/build/poisson/src/ext_poisson/PoissonRecon/Src/FEMTree.IsoSurface.specialized.inl (Line 1858)
          Extract
          bad average roots: 3
[WARNING] /root/Open3D/build/poisson/src/ext_poisson/PoissonRecon/Src/FEMTree.IsoSurface.specialized.inl (Line 1858)
          Extract
          bad average roots: 2

Does training support multiple batches?

As title

Error in inference

    [mmcv.imread(name, self.color_type) for name in filename], axis=-1)
  File "/opt/conda/envs/open-mmlab/lib/python3.8/site-packages/mmcv/image/io.py", line 176, in imread
    check_file_exist(img_or_path,
  File "/opt/conda/envs/open-mmlab/lib/python3.8/site-packages/mmcv/utils/path.py", line 23, in check_file_exist
    raise FileNotFoundError(msg_tmpl.format(filename))
FileNotFoundError: img file does not exist: ./in_the_wild/clip/0/000510.jpg

./tools/dist_inference.sh ./projects/configs/surroundocc/surroundocc_inference.py ./ckpts/surroundocc.pth 8

Can you share the in_the_wild folder? Thanks!

How are the *.pkl files created, or is that code closed source?

Question for training

Have you ever encountered this problem?

Variable typo in evaluation_semantic

The loop variable i is used twice:

SurroundOcc/projects/mmdet3d_plugin/datasets/evaluation_metrics.py

Lines 56 to 65 in dcbe924

    
    for i in range(pred_occ.shape[0]):
        gt_i, pred_i = gt_occ[i].cpu().numpy(), pred_occ[i].cpu().numpy()
        gt_i = gt_to_voxel(gt_i, img_metas)
        mask = (gt_i != 255)
        score = np.zeros((class_num, 3))
        for i in range(class_num):
            if i == 0: #class 0 for geometry IoU
                score[i][0] += ((gt_i[mask] != 0) * (pred_i[mask] != 0)).sum()
                score[i][1] += (gt_i[mask] != 0).sum()
                score[i][2] += (pred_i[mask] != 0).sum()
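The effect of reusing a loop variable like this can be shown with a small standalone example. The function names below are hypothetical and the loop bodies are placeholders; only the nesting pattern matches the excerpt.

```python
def outer_index_after_inner_loop_buggy(batch_size, class_num):
    """Inner loop reuses `i`, so the outer index is overwritten."""
    seen = []
    for i in range(batch_size):
        for i in range(class_num):  # shadows the sample index
            pass
        seen.append(i)  # holds class_num - 1, not the sample index
    return seen


def outer_index_after_inner_loop_fixed(batch_size, class_num):
    """Inner loop uses its own name, so the outer index is preserved."""
    seen = []
    for i in range(batch_size):
        for j in range(class_num):
            pass
        seen.append(i)
    return seen


print(outer_index_after_inner_loop_buggy(2, 3))  # [2, 2]
print(outer_index_after_inner_loop_fixed(2, 3))  # [0, 1]
```

In Python the outer loop still visits every sample, because the range iterator does not depend on the current value of `i`. In the quoted excerpt the sample index is read only before the inner loop starts, so the scores may be unaffected, but any later use of `i` as a sample index would silently read the class index instead.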

Can you provide your pip list?

Did you perform experiments on Lidar Segmentation task?

Hello, really impressive work! Your dense occupancy label generation pipeline is very enlightening. I notice that SurroundOcc achieves SOTA results on the 3D semantic occupancy prediction and 3D scene completion tasks. But did you perform experiments on the lidar segmentation task? I wonder whether SurroundOcc also outperforms TPVFormer on this task. If I have to do lidar segmentation with SurroundOcc myself, what other changes, besides the dataset, should I make to the model so that it works on this task instead of occupancy prediction?

Did you test the performance of this model on the cvpr challenge dataset?

Are there any results to show?
Waiting for your reply

Is it possible to create a panoptic occupancy map using your code?

I want to create a panoptic occupancy map using panoptic LiDAR labels.
I'm curious if it's possible to generate a panoptic occupancy map using your code.

Visualization

Could you provide further guidance on how to visualize the occupancy? I tried the following, but it does not work:

python tools/visual.py ./projects/configs/surroundocc/surroundocc.py ckpts/surroundocc.pth --work-dir out/surrocc

FileNotFoundError: [Errno 2] No such file or directory: 'temp/pred.npy'

Code for the SemanticKITTI dataset

Hi, I see that you reported performance on the SemanticKITTI dataset. Could you release the related code for SemanticKITTI?

batch_size

Hello, when I set samples_per_gpu to 2 in /projects/configs/surroundocc/surroundocc.py, the following error occurs:
RuntimeError: stack expects each tensor to be equal size, but got [62812, 4] at entry 0 and [43226, 4] at entry 1

This is my training shell script:

CONFIG=./projects/configs/surroundocc/surroundocc.py
GPUS=2
SAVE_PATH=./work_dirs/surroundocc
PORT=${PORT:-28108}
NCCL_DEBUG=INFO

PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
    $(dirname "$0")/train.py $CONFIG --work-dir ${SAVE_PATH} --launcher pytorch ${@:4} --deterministic

I used distributed training with 2 GPUs.

Looking forward to your reply, thanks!
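The error message indicates that the per-sample arrays have shape [N, 4] with a different N for each sample, so they cannot be stacked directly. One generic workaround, not confirmed to be supported by the repository, is to pad every sample to a common length and carry the true lengths alongside. The sketch below uses NumPy, and `pad_and_stack` is a hypothetical helper.

```python
import numpy as np


def pad_and_stack(arrays, pad_value=0):
    """Stack (N_k, C) arrays with differing N_k into one (B, N_max, C) batch.

    Returns the padded batch and an array of the original lengths, so that
    downstream code can ignore the padded rows.
    """
    lengths = np.array([a.shape[0] for a in arrays])
    channels = arrays[0].shape[1]
    batch = np.full((len(arrays), lengths.max(), channels), pad_value,
                    dtype=arrays[0].dtype)
    for k, a in enumerate(arrays):
        batch[k, : a.shape[0]] = a
    return batch, lengths


# Two samples with different numbers of occupied voxels, as in the error message.
sample_a = np.ones((3, 4), dtype=np.int64)
sample_b = np.ones((5, 4), dtype=np.int64)
batch, lengths = pad_and_stack([sample_a, sample_b])
```

Any code that consumes the batch would then need to slice each sample with its length, for example `batch[k, :lengths[k]]`, before scattering labels into a grid; otherwise the padded rows would be treated as real voxels.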

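Physical meaning and range of threshold in eval_3d

Hello! Is the range of threshold 0 to 1? I see that the default voxel_size for nuScenes is 0.5 m. Does threshold need to match voxel_size? Is its physical meaning to find vertices within a distance of one voxel_size? Thanks!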

def eval_3d(verts_pred, verts_trgt, threshold=.5):
    d1, d2, idx1, idx2 = chamfer.forward(verts_pred.unsqueeze(0).type(torch.float), verts_trgt.unsqueeze(0).type(torch.float))
    dist1 = torch.sqrt(d1).cpu().numpy()
    dist2 = torch.sqrt(d2).cpu().numpy()
    cd = dist1.mean() + dist2.mean()
    precision = np.mean((dist1<threshold).astype('float'))
    recal = np.mean((dist2<threshold).astype('float'))
    fscore = 2 * precision * recal / (precision + recal)
    metrics = np.array([np.mean(dist1),np.mean(dist2),cd, precision,recal,fscore])
    return metrics
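In the function above, threshold is compared against dist1 and dist2, so it is a distance in the same units as the vertex coordinates rather than a probability. The sketch below reproduces the precision, recall, and F-score part with a brute-force nearest-neighbour search in NumPy. It assumes that chamfer.forward returns squared nearest-neighbour distances (which is what the sqrt calls suggest), and the function names are hypothetical.

```python
import numpy as np


def nearest_distances(src, dst):
    """Euclidean distance from each point in src (N, 3) to its nearest point in dst (M, 3)."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def precision_recall_fscore(verts_pred, verts_trgt, threshold=0.5):
    dist1 = nearest_distances(verts_pred, verts_trgt)  # prediction -> target
    dist2 = nearest_distances(verts_trgt, verts_pred)  # target -> prediction
    precision = float((dist1 < threshold).mean())  # predicted vertices near the target
    recall = float((dist2 < threshold).mean())     # target vertices near the prediction
    total = precision + recall
    fscore = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, fscore


verts_pred = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
verts_trgt = np.array([[0.1, 0.0, 0.0]])
precision, recall, fscore = precision_recall_fscore(verts_pred, verts_trgt, threshold=0.5)
```

With coordinates in metres, a threshold of 0.5 counts a vertex as matched when its nearest neighbour in the other set is less than 0.5 m away. The guard on `total` also avoids the division by zero that occurs when nothing matches.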
