
cenet's Introduction

CENet: Toward Concise and Efficient LiDAR Semantic Segmentation for Autonomous Driving [arXiv]

Code for our paper:

CENet: Toward Concise and Efficient LiDAR Semantic Segmentation for Autonomous Driving
Huixian Cheng, Xianfeng Han, Guoqiang Xiao
Accepted by ICME2022

Abstract:

Accurate and fast scene understanding is one of the challenging tasks for autonomous driving, which requires taking full advantage of LiDAR point clouds for semantic segmentation. In this paper, we present a concise and efficient image-based semantic segmentation network, named CENet. In order to improve the descriptive power of learned features and reduce the computational as well as time complexity, our CENet integrates convolutions with larger kernel size instead of MLPs, carefully selected activation functions, and multiple auxiliary segmentation heads with corresponding loss functions into its architecture. Quantitative and qualitative experiments conducted on publicly available benchmarks, SemanticKITTI and SemanticPOSS, demonstrate that our pipeline achieves much better mIoU and inference performance compared with state-of-the-art models.

Updates:

2023-03-28 [NEW :sparkles:] CENet achieves competitive performance in the robustness evaluation on SemanticKITTI. See the Robo3D repo for more details.


2022-07-06 [:open_mouth::scream::thumbsup:] Dr. Hou reported an astounding 67.6% mIoU test performance for CENet; see this issue and the PVD repo for details.

2022-03-28 [:sunglasses:] As suggested by a reviewer, the network was renamed to CENet.

2022-03-07[:yum:] SENet was very lucky to be provisionally accepted by ICME 2022.

2021-12-29 [:sunglasses:] Released models and training logs, including ablation studies. (Please note that due to multiple code updates, some models and configs are inconsistent and may raise errors; please adjust them as needed for your setup.)

Prepare:

Download SemanticKITTI from the official website. Download SemanticPOSS from the official website.

Usage:

Train:

  • SemanticKITTI:

    python train.py -d /your_dataset -ac config/arch/senet-512.yml -n senet-512

Note that the following training strategy is used due to GPU and time constraints; see kitti.sh for details.

First train the model with 64x512 inputs, then load that pre-trained model to train with 64x1024 inputs, and finally load the resulting model to train with 64x2048 inputs (a sketch of this staged schedule follows the Train list below).

Also for this reason, if you want to resume training from a checkpoint, uncomment this section and change "/SENet_valid_best" to "/SENet".

  • SemanticPOSS:

    python train_poss.py -d /your_dataset -ac config/arch/poss.yml -n res
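For reference, here is a minimal sketch of the staged SemanticKITTI schedule described above. The 1024/2048 config file names and the -p flag used to load the previous stage's weights are assumptions that mirror the senet-512 example; check kitti.sh for the exact invocations.

    # Stage 1: train from scratch with 64x512 inputs
    python train.py -d /your_dataset -ac config/arch/senet-512.yml -n senet-512
    # Stage 2: train with 64x1024 inputs, initialized from the 512 run (config name and -p flag assumed)
    python train.py -d /your_dataset -ac config/arch/senet-1024.yml -n senet-1024 -p logs/senet-512
    # Stage 3: train with 64x2048 inputs, initialized from the 1024 run (config name and -p flag assumed)
    python train.py -d /your_dataset -ac config/arch/senet-2048.yml -n senet-2048 -p logs/senet-1024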

Infer and Eval:

  • SemanticKITTI:

    python infer.py -d /your_dataset -l /your_predictions_path -m trained_model -s valid/test

Eval for the valid sequences (a combined example follows this list):

    python evaluate_iou.py -d /your_dataset -p /your_predictions_path

For the test sequences, the predictions need to be uploaded to the CodaLab benchmark page.

  • SemanticPOSS:

    python infer_poss.py -d /your_dataset -l /your_predictions_path -m trained_model

    This will generate both predictions and mIoU results.
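Putting the SemanticKITTI validation workflow together (paths are placeholders, as above):

    # write predictions for the valid split
    python infer.py -d /your_dataset -l /your_predictions_path -m trained_model -s valid
    # score the predictions against the ground-truth labels
    python evaluate_iou.py -d /your_dataset -p /your_predictions_path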

Visualize Example:

  • Visualize GT:

    python visualize.py -w kitti/poss -d /your_dataset -s what_sequences

  • Visualize Predictions:

    python visualize.py -w kitti/poss -d /your_dataset -p /your_predictions -s what_sequences

Pretrained Models and Logs:

KITTI Result: Google Drive
POSS Result: Google Drive
Ablation Study: Google Drive
Backbone HarDNet: Google Drive

TODO List:

  • Release Pretrained Model and Logs.
  • Try TensorRT acceleration.
  • Build the NLA adaptation framework; see here.

Acknowledgments:

The code framework is derived from SalsaNext. The models are heavily based on FIDNet. Part of the code comes from SqueezeSegV3. Thanks to the authors for their open-source code, and also to Dr. Zhao for some helpful discussions.

Citation:

@inproceedings{cheng2022cenet,
  title={CENet: Toward Concise and Efficient LiDAR Semantic Segmentation for Autonomous Driving},
  author={Cheng, Hui-Xian and Han, Xian-Feng and Xiao, Guo-Qiang},
  booktitle={2022 IEEE International Conference on Multimedia and Expo (ICME)},
  pages={01--06},
  year={2022},
  organization={IEEE}
}


cenet's Issues

About Boundary Loss

Nice work! Thank you for releasing this repo!
I would like to ask a few questions about BoundaryLoss.

  1. Does BoundaryLoss consume a lot of time in your experiments? In mine, computing the BoundaryLoss consumes a lot of GPU memory and triples the training time.
  2. I don't use a range-image representation of the point cloud but a two-dimensional BEV representation; could BoundaryLoss still be used to train the network?

Thanks in advance for your reply!

Config parameters are strange.

These parameters are different from the ones used in SalsaNext. Can you explain them? Thanks.
img_means: #range,x,y,z,signal
- 11.71279
- -0.1023471
- 0.4952
- -1.0545
- 0.2877
img_stds: #range,x,y,z,signal
- 10.24
- 12.295865
- 9.4287
- 0.8643
- 0.1450
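For context, these are per-channel statistics of the 5-channel projected range image (range, x, y, z, signal). Below is a minimal sketch of how such means/stds are typically applied in range-image pipelines; the tensor and function names are illustrative, not the repo's actual code.

    import torch

    # Per-channel statistics of the projected scan: range, x, y, z, signal
    img_means = torch.tensor([11.71279, -0.1023471, 0.4952, -1.0545, 0.2877])
    img_stds = torch.tensor([10.24, 12.295865, 9.4287, 0.8643, 0.1450])

    def normalize(proj: torch.Tensor) -> torch.Tensor:
        """Standardize a [5, H, W] projected scan channel-wise."""
        return (proj - img_means[:, None, None]) / img_stds[:, None, None]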

fps

I use your model in my project, but the FPS differs from what your paper reports. With the input size being 512x64:

********************************************************************************
Cleaning point-clouds with kNN post-processing
kNN parameters:
knn: 7
search: 7
sigma: 1.0
cutoff: 2.0
nclasses: 20
********************************************************************************
Infering in device:  cuda
100%|███████████████████████████████████████| 4071/4071 [02:04<00:00, 32.75it/s]
Mean CNN inference time:0.01585113       std:0.01862994
Mean KNN inference time:0.00275165       std:0.00063380
Total Frames: 4071
Finished Infering

The FPS is 67, lower than the 84.9 reported in your paper.

I ran inference on the validation set on an RTX 3090.

Is the Paper Available on arXiv?

Congrats on your work!

Is your paper available on arXiv now? Could you share a pre-print with me? Thanks!

Some small questions about the do_range_projection method

Hi, thanks for your excellent work.
There is a small difference between your code and your article. In common/laserscan.py, line 249, your code computes
proj_y = 1.0 - (pitch + abs(fov_down)) / fov # in [0.0, 1.0]
but in the article, the (u, v) coordinates are computed with the projection formula shown in the paper [screenshot in the original issue].

In your code, proj_y is computed from pitch plus abs(fov_down), but in the article it is pitch plus fov_up. Can you tell me which is the correct way to calculate the v coordinate?

Model training is slow

I used the code for training and found that the training time is very long. Could you provide code for DDP training?

About visualization

When I use your code for visualization, I get the following output but no visualization window appears. Do you know the reason? Looking forward to your answer!

(polarnet) miao@dongtintech-rtx2080ti:~/Music/CENet-main$ python visualize.py -w kitti -d /home/miao/Music/PolarSeg-master/data/ -s 08
********************************************************************************
INTERFACE:
Dataset Type kitti
Dataset /home/miao/Music/PolarSeg-master/data/
Sequence 08
Predictions None
ignore_semantics False
ignore_safety False
offset 0
********************************************************************************
Opening config file of KITTI
Sequence folder exists! Using sequence from /home/miao/Music/PolarSeg-master/data/sequences/08/velodyne
Labels folder exists! Using labels from /home/miao/Music/PolarSeg-master/data/sequences/08/labels
WARNING: Although tkinter is already imported, the tkinter backend could not
be used ("Could not import Tkinter or pyopengltk, module(s) not found."). 
Note that running multiple GUI toolkits simultaneously can cause side effects.
Using semantics in visualizer
/home/miao/anaconda3/envs/polarnet/lib/python3.7/site-packages/vispy/gloo/texture.py:28: UserWarning: GPUs can't support floating point data with more than 32-bits, precision will be lost due to downcasting to 32-bit float.
  warnings.warn(F64_PRECISION_WARNING)
To navigate:
        b: back (previous scan)
        n: next (next scan)
        q: quit (exit program)

Question about showing different result for the validation set

Hi, thank you for the great work!

I have a question about the mIoU result.

When I trained the model, the best mIoU on the validation set was about 64. But when I run inference with the SENet_valid_best model and then evaluate the validation set using the semantic_kitti_api and evaluate_iou.py, it shows 61 (64x512 input).

Is there a mistake somewhere, or is this expected?

Thank you

BUG in infer.py

Hello, this is really nice work: simple but effective. However, there is a small problem in infer.py: when I run it, it overwrites existing folders and files.

About resnet block

Dear author,

Hello! Thanks for sharing the code!!

CENet is a very powerful and efficient network, and I would like to use it on my own dataset. However, my dataset uses a 128-channel LiDAR (unlike the 64-channel LiDAR used by SemanticKITTI).

I also noticed that both ResNet_34 and BasicBlock require groups=1 and base_width=64 ('BasicBlock only supports groups=1 and base_width=64').

So I would like to ask: if I want to use CENet on 128-channel LiDAR data, can I simply change base_width to 128 and keep groups at 1? Or do you have any other suggestions?

Thanks a lot!

Training setup + Validation results

Hi,
thanks for this great work.

I have a question about the SemanticKITTI setup for training the models with image resolutions $(64\times1024)$ and $(64\times2048)$ from scratch. Is it the same as for the $(64\times512)$ model?

Moreover, could you please provide the validation-set results of the models with the higher image resolutions?

Thanks

About pretrained model

Dear author,

Thanks for the sharing code.

When I run the dataset with the pretrained KITTI model you provided, the results do not reach those reported in your article. I am not sure whether I used the wrong model, because there are multiple models in the folder. Which model did you use to obtain the results in your article?

Thanks!

About pretrained model

Hi,
Thanks for your great work.
I'm not sure: are the released 2048 and 1024 models SENet? Why are they named SalsaNext?

question for environment setting

Thank you for your remarkable work.
I want to run this program; where can I find the environment requirements (such as a requirements.txt)?

downsample

self.layer1 = self._make_layer(block, 128, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 128, layers[2], stride=2)
self.layer4 = self._make_layer(block, 128, layers[3], stride=2)

I have a question about the network design: why is the number of output channels not expanded when downsampling? RangeNet, SqueezeSegV3, and SalsaNext all double the channel count when downsampling. In your network, doesn't keeping the channel count the same while downsampling lose features?

visualization problem

When I run visualize.py to view the segmentation results, the visualization window does not appear, and no error is shown, as in the following figure. [screenshot in the original issue]

train parameters

img_means: #range,x,y,z,signal
- 11.71279
- -0.1023471
- 0.4952
- -1.0545
- 0.2877
img_stds: #range,x,y,z,signal
- 10.24
- 12.295865
- 9.4287
- 0.8643
- 0.1450

I have a question about these parameters. They are different from the ones used in darknet. Does changing them improve the model accuracy?

Multi-GPU training

Hi,

Thank you for your nice work. When I train this model with 4 GPUs, loss backpropagation raises "RuntimeError: grad can be implicitly created only for scalar outputs", but the code works for single-GPU training. I notice that the loss should be a scalar tensor, but with 4 GPUs it has four elements. I think there is a bug in multi-GPU training; please check it out. Thank you!
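For what it's worth, a common workaround for this nn.DataParallel behavior (not necessarily the fix the authors intend) is to reduce the per-replica loss vector to a scalar before calling backward; a minimal sketch with illustrative names:

    import torch

    def backward_step(loss: torch.Tensor, optimizer: torch.optim.Optimizer) -> float:
        """Reduce a per-GPU loss (one element per replica under nn.DataParallel) and step."""
        if loss.dim() > 0:
            # avoids "grad can be implicitly created only for scalar outputs"
            loss = loss.mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()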

About do_range_projection

Hello,

In common/laserscan.py, you used fov_down to calculate proj_y:

    # get projections in image coords
    proj_x = 0.5 * (yaw / np.pi + 1.0)  # in [0.0, 1.0]
    proj_y = 1.0 - (pitch + abs(fov_down)) / fov  # in [0.0, 1.0]

but in the paper, the formula uses f_up:
[screenshot of the paper's projection formula in the original issue]

I am confused about why fov_down is used here rather than f_up. Thanks a lot.
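For context, here is a minimal, self-contained sketch of the spherical projection as the quoted code computes it; the field-of-view values and the helper name are illustrative assumptions, not taken from the repo. With fov = abs(fov_up) + abs(fov_down) and fov_up >= 0, the expression 1.0 - (pitch + abs(fov_down)) / fov equals (fov_up - pitch) / fov, so points at pitch = fov_up land on the top row and points at pitch = fov_down land on the bottom row:

    import numpy as np

    # Illustrative sensor field of view (radians) and range-image size
    fov_up = np.radians(3.0)
    fov_down = np.radians(-25.0)
    fov = abs(fov_up) + abs(fov_down)
    proj_H, proj_W = 64, 512

    def project(points: np.ndarray):
        """Map Nx3 points (x, y, z) to integer (u, v) range-image coordinates."""
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.linalg.norm(points, axis=1)
        yaw = -np.arctan2(y, x)
        pitch = np.arcsin(z / r)
        proj_x = 0.5 * (yaw / np.pi + 1.0)            # in [0.0, 1.0]
        proj_y = 1.0 - (pitch + abs(fov_down)) / fov  # in [0.0, 1.0], same as (fov_up - pitch) / fov
        u = np.clip(np.floor(proj_x * proj_W), 0, proj_W - 1).astype(np.int32)
        v = np.clip(np.floor(proj_y * proj_H), 0, proj_H - 1).astype(np.int32)
        return u, v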

Some code questions

Thanks for the sharing code.

  1. I only saw the RepVGG module defined in the code; it is mentioned in your paper but not actually used, right?
  2. It seems that the inference model only outputs [out] instead of [out, res_2, res_3, res_4] in the code. It looks like I could trim off the extra three output heads when converting it to TensorRT.
  3. Are the first three convolutional layers used to upsample XYZIR, similar to the upsampling MLP for XYZ in PointNet?

By the way, the CENet model with 64x512 input has been deployed on a Jetson AGX Xavier and works successfully:
fp32 ≈ 6 Hz
fp16 ≈ 10 Hz

AttributeError: type object 'Trainer' has no attribute 'save_to_log'

Traceback (most recent call last):
File "train.py", line 133, in
trainer.train()
File "/home/buaa/project_liaozhihao/CENet-main/modules/trainer.py", line 336, in train
Trainer.save_to_log(logdir=self
AttributeError: type object 'Trainer' has no attribute 'save_to_log'

I hit this error during training: "Trainer.save_to_log" is not defined in the Trainer class.

After infer.py and evaluate_iou.py, the IoUs are all 0.000, but the IoU results during train.py are normal

['train', 'valid', 'test'] set:
Acc avg 0.001
IoU avg 0.000
IoU class 1 [car] = 0.000
IoU class 2 [bicycle] = 0.000
IoU class 3 [motorcycle] = 0.001
IoU class 4 [truck] = 0.000
IoU class 5 [other-vehicle] = 0.000
IoU class 6 [person] = 0.000
IoU class 7 [bicyclist] = 0.000
IoU class 8 [motorcyclist] = 0.000
IoU class 9 [road] = 0.000
IoU class 10 [parking] = 0.000
IoU class 11 [sidewalk] = 0.000
IoU class 12 [other-ground] = 0.000
IoU class 13 [building] = 0.000
IoU class 14 [fence] = 0.000
IoU class 15 [vegetation] = 0.000
IoU class 16 [trunk] = 0.000
IoU class 17 [terrain] = 0.000
IoU class 18 [pole] = 0.000
IoU class 19 [traffic-sign] = 0.000


below can be copied straight for paper table
0.000,0.000,0.001,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.001

During training, the prediction results in the logs are all normal, but when I run infer.py and evaluate_iou.py on the validation set, the resulting predictions all score 0, and the same thing happens across multiple training runs. When I downloaded the pretrained models you provided (512-594 and 512-valid), the results after inference and evaluation were normal.

I noticed that in the prediction folder, sequences/08/predictions, the results inferred by my trained model are all binary files, while the results inferred by the pretrained model contain both binary files and pcx images, as shown in the screenshots below.

Do you know what the reason might be? Looking forward to your answer!
[two screenshots in the original issue]

About the released pre-trained models

Hi author,

Thanks for your outstanding work and open source.

I have some questions about the released pre-trained models.
For 512-594, what do 512 and 594 mean respectively?

The README file mentions: First, train the model with 64x512 inputs. Then load the pre-trained model to train the model with 64x1024 inputs, and finally load the pre-trained model to train the model with 64x2048 inputs.
Is the released pre-trained model 512-594 the model from the first step, and could you share the final model trained with 64x2048 inputs?

I am looking forward to your reply.
Best.
