FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Abstract

Transformer, as an alternative to CNN, has been proven effective in many modalities (e.g., texts and images). For 3D point cloud transformers, existing efforts focus primarily on pushing their accuracy to the state-of-the-art level. However, their latency lags behind sparse convolution-based models (3x slower), hindering their usage in resource-constrained, latency-sensitive applications (such as autonomous driving). This inefficiency comes from point clouds' sparse and irregular nature, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer to close this latency gap by trading spatial proximity for better computational regularity. We first flatten the point cloud with window-based sorting and partition points into groups of equal sizes rather than windows of equal shapes. This effectively avoids expensive structuring and padding overheads. We then apply self-attention within groups to extract local features, alternate sorting axis to gather features from different directions, and shift windows to exchange features across groups. FlatFormer delivers state-of-the-art accuracy on Waymo Open Dataset with 4.6x speedup over (transformer-based) SST and 1.4x speedup over (sparse convolutional) CenterPoint. This is the first point cloud transformer that achieves real-time performance on edge GPUs and is faster than sparse convolutional methods while achieving on-par or even superior accuracy on large-scale benchmarks.
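
The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the repository's implementation: it assumes 2D bird's-eye-view coordinates, and the function name `flatten_and_group` and its parameters (`window_size`, `group_size`, `primary_axis`, `shift`) are hypothetical names chosen for illustration. Points are sorted by window index first and by position inside the window second, and the sorted sequence is then cut into equal-size groups. Switching `primary_axis` alternates the sorting axis, and a nonzero `shift` moves the window boundaries.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in the bird's-eye-view plane (assumption)


def flatten_and_group(
    points: Sequence[Point],
    window_size: float,
    group_size: int,
    primary_axis: int = 0,
    shift: float = 0.0,
) -> List[List[int]]:
    """Return lists of point indices; every list has `group_size` entries,
    except possibly the last one (this sketch does not pad it)."""
    secondary_axis = 1 - primary_axis

    def sort_key(i: int):
        p = points[i]
        # Window coordinates: which (shifted) window the point falls into.
        w_primary = math.floor((p[primary_axis] + shift) / window_size)
        w_secondary = math.floor((p[secondary_axis] + shift) / window_size)
        # Sort by window first, then by position inside the window.
        return (w_primary, w_secondary, p[primary_axis], p[secondary_axis])

    # "Flatten": turn the 2D point set into one ordered sequence of indices.
    order = sorted(range(len(points)), key=sort_key)
    # Partition into groups of equal size rather than windows of equal shape.
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


pts = [(0.5, 0.5), (3.5, 0.2), (0.1, 0.9), (0.7, 2.5), (2.2, 2.9), (3.9, 3.1)]
print(flatten_and_group(pts, window_size=2.0, group_size=2, primary_axis=0))
# -> [[2, 0], [3, 1], [4, 5]]
print(flatten_and_group(pts, window_size=2.0, group_size=2, primary_axis=1))
# -> [[0, 2], [1, 3], [4, 5]]
```

Because groups are cut from the sorted sequence rather than from fixed-shape windows, all groups have the same number of points and no padding is needed. The cost is that a group may span neighboring windows, as the group `[3, 1]` does in the first call.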

Results

All the results are reproducible with this repo. Regrettably, we are unable to provide the pre-trained model weights due to the Waymo Dataset License Agreement. Discussions are welcome if you cannot obtain satisfactory performance with FlatFormer in your projects.

3D Object Detection (on Waymo validation)

Model       #Sweeps  mAP/H_L1   mAP/H_L2   Veh_L1     Veh_L2     Ped_L1     Ped_L2     Cyc_L1     Cyc_L2
FlatFormer  1        76.1/73.4  69.7/67.2  77.5/77.1  69.0/68.6  79.6/73.0  71.5/65.3  71.3/70.1  68.6/67.5
FlatFormer  2        78.9/77.3  72.7/71.2  79.1/78.6  70.8/70.3  81.6/78.2  73.8/70.5  76.1/75.1  73.6/72.6
FlatFormer  3        79.6/78.0  73.5/72.0  79.7/79.2  71.4/71.0  82.0/78.7  74.5/71.3  77.2/76.1  74.7/73.7
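
As a reading aid for the table, the sketch below checks how the mAP/H columns relate to the per-class columns. It assumes that each cell is an AP/APH pair and that the mAP/H columns are unweighted means over the three classes (vehicle, pedestrian, cyclist). The table header does not state this explicitly, so treat it as an interpretation. The numbers are copied from the table; the names `PER_CLASS`, `REPORTED`, `class_mean`, and `max_deviation` are illustrative.

```python
# (sweeps, level) -> [(AP, APH) for Veh, Ped, Cyc], copied from the table.
PER_CLASS = {
    (1, "L1"): [(77.5, 77.1), (79.6, 73.0), (71.3, 70.1)],
    (1, "L2"): [(69.0, 68.6), (71.5, 65.3), (68.6, 67.5)],
    (2, "L1"): [(79.1, 78.6), (81.6, 78.2), (76.1, 75.1)],
    (2, "L2"): [(70.8, 70.3), (73.8, 70.5), (73.6, 72.6)],
    (3, "L1"): [(79.7, 79.2), (82.0, 78.7), (77.2, 76.1)],
    (3, "L2"): [(71.4, 71.0), (74.5, 71.3), (74.7, 73.7)],
}

# (sweeps, level) -> reported (mAP, mAPH), copied from the table.
REPORTED = {
    (1, "L1"): (76.1, 73.4), (1, "L2"): (69.7, 67.2),
    (2, "L1"): (78.9, 77.3), (2, "L2"): (72.7, 71.2),
    (3, "L1"): (79.6, 78.0), (3, "L2"): (73.5, 72.0),
}


def class_mean(pairs):
    """Unweighted mean of the AP values and of the APH values."""
    n = len(pairs)
    return (sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n)


def max_deviation():
    """Largest gap between a reported mean and the recomputed one."""
    worst = 0.0
    for key, pairs in PER_CLASS.items():
        ap, aph = class_mean(pairs)
        worst = max(worst, abs(ap - REPORTED[key][0]), abs(aph - REPORTED[key][1]))
    return worst


for key in sorted(PER_CLASS):
    ap, aph = class_mean(PER_CLASS[key])
    print(key, f"recomputed {ap:.2f}/{aph:.2f}", "reported", REPORTED[key])
```

Under this assumption the recomputed means stay within 0.1 of the reported values; the small gaps are consistent with the per-class numbers having been rounded to one decimal place before averaging.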

Usage

Prerequisites

The code is built with the following libraries:

After installing these dependencies, please run this command to install the codebase:

pip install -v -e .

Dataset Preparation

Please follow the instructions from MMDetection3D to download and preprocess the Waymo Open Dataset. After data preparation, you should see the following directory structure (as indicated in MMDetection3D):

mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── waymo
│   │   ├── waymo_format
│   │   │   ├── training
│   │   │   ├── validation
│   │   │   ├── testing
│   │   │   ├── gt.bin
│   │   ├── kitti_format
│   │   │   ├── ImageSets
│   │   │   ├── training
│   │   │   ├── testing
│   │   │   ├── waymo_gt_database
│   │   │   ├── waymo_infos_trainval.pkl
│   │   │   ├── waymo_infos_train.pkl
│   │   │   ├── waymo_infos_val.pkl
│   │   │   ├── waymo_infos_test.pkl
│   │   │   ├── waymo_dbinfos_train.pkl
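
Before launching training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` and the `EXPECTED` list are illustrative, and the entries are taken directly from the directory tree shown above.

```python
from pathlib import Path
from typing import List

# Paths relative to data/waymo, as listed in the directory tree above.
EXPECTED = [
    "waymo_format/training",
    "waymo_format/validation",
    "waymo_format/testing",
    "waymo_format/gt.bin",
    "kitti_format/ImageSets",
    "kitti_format/training",
    "kitti_format/testing",
    "kitti_format/waymo_gt_database",
    "kitti_format/waymo_infos_trainval.pkl",
    "kitti_format/waymo_infos_train.pkl",
    "kitti_format/waymo_infos_val.pkl",
    "kitti_format/waymo_infos_test.pkl",
    "kitti_format/waymo_dbinfos_train.pkl",
]


def missing_entries(waymo_root: str) -> List[str]:
    """Return the expected files and directories that do not exist under waymo_root."""
    root = Path(waymo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the mmdetection3d root so that data/waymo resolves correctly.
    missing = missing_entries("data/waymo")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Dataset layout looks complete.")
```

The check only tests for existence; it does not validate the contents of the `.pkl` info files or the ground-truth database.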

Training

# multi-GPU training (the second argument, 8, is the number of GPUs)
bash tools/dist_train.sh configs/flatformer/$CONFIG.py 8 --work-dir $CONFIG/ --cfg-options evaluation.pklfile_prefix=./work_dirs/$CONFIG/results evaluation.metric=waymo

Evaluation

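# multi-GPU testing (the third argument, 8, is the number of GPUs);
# adjust the checkpoint path to the work directory used for training
bash tools/dist_test.sh configs/flatformer/$CONFIG.py ./work_dirs/$CONFIG/latest.pth 8 --eval waymo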

Citation

If FlatFormer is useful or relevant to your research, please kindly recognize our contributions by citing our paper:

@inproceedings{liu2023flatformer,
  title={FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer},
  author={Liu, Zhijian and Yang, Xinyu and Tang, Haotian and Yang, Shang and Han, Song},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

Acknowledgments

This project is based on the following codebases.

We would like to thank Tianwei Yin, Lue Fan and Ligeng Mao for providing detailed results of CenterPoint, SST/FSD and VoTr, and Yue Wang and Yukang Chen for their helpful discussions. This work was supported by the National Science Foundation, the MIT-IBM Watson AI Lab, NVIDIA, Hyundai and Ford. Zhijian Liu was partially supported by the Qualcomm Innovation Fellowship.
