DeDoDe-ONNX-TensorRT

Open Neural Network Exchange (ONNX) compatible implementation of DeDoDe 🎶 Detect, Don't Describe - Describe, Don't Detect, for Local Feature Matching. Supports TensorRT 🚀.

DeDoDe figure
The DeDoDe detector learns to detect 3D consistent repeatable keypoints, which the DeDoDe descriptor learns to match. The result is a powerful decoupled local feature matcher.

Latency figure
DeDoDe ONNX TensorRT provides a 2x speedup over PyTorch.

🔥 ONNX Export

Prior to exporting the ONNX models, please install the requirements.

To convert the DeDoDe models to ONNX, run export.py. We provide two types of ONNX exports: individual standalone models, and a combined end-to-end pipeline (recommended for convenience) with the --end2end flag.

Export Example
python export.py \
    --img_size 256 256 \
    --end2end \
    --dynamic_img_size --dynamic_batch \
    --fp16
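
To export several configurations in a row, the command above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in the example; `build_export_command` is a hypothetical helper, not part of this repository, and it assumes export.py is run from the repository root.

```python
import shlex
import sys


def build_export_command(img_size=(256, 256), end2end=True, dynamic=False, fp16=False):
    """Assemble the argv list for export.py (hypothetical helper, flags taken from the example)."""
    cmd = [sys.executable, "export.py", "--img_size", str(img_size[0]), str(img_size[1])]
    if end2end:
        cmd.append("--end2end")
    if dynamic:
        cmd += ["--dynamic_img_size", "--dynamic_batch"]
    if fp16:
        cmd.append("--fp16")
    return cmd


cmd = build_export_command((256, 256), end2end=True, dynamic=True, fp16=True)
# Print the arguments after the interpreter path for inspection.
print(shlex.join(cmd[1:]))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute the export; the sketch only prints the arguments so it can be inspected first.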

If you would like to try out inference right away, you can download ONNX models that have already been exported here or run ./weights/download.sh.

⚡ ONNX Inference

With ONNX models in hand, you can run inference in Python using ONNX Runtime (see requirements-onnx.txt).

The DeDoDe inference pipeline has been encapsulated into a runner class:

from onnx_runner import DeDoDeRunner

images = DeDoDeRunner.preprocess(image_array)
# images.shape == (2B, 3, H, W)

# Create ONNXRuntime runner
runner = DeDoDeRunner(
    end2end_path="weights/dedode_end2end_1024.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    # Put "TensorrtExecutionProvider" first in this list to run with TensorRT
)

# Run inference
matches_A, matches_B, batch_ids = runner.run(images)

matches_A = DeDoDeRunner.postprocess(matches_A, H_A, W_A)
matches_B = DeDoDeRunner.postprocess(matches_B, H_B, W_B)
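
The postprocess step takes the matches together with an image height and width, which suggests it rescales coordinates to pixel units. A minimal sketch of that kind of rescaling follows. It assumes (this is not stated above) that matches are (x, y) pairs normalized to [-1, 1]; `to_pixels` is a hypothetical stand-in, not the runner's actual implementation.

```python
def to_pixels(matches, height, width):
    """Map (x, y) pairs from an assumed [-1, 1] range to pixel coordinates."""
    return [((x + 1) / 2 * width, (y + 1) / 2 * height) for x, y in matches]


# Corners and centre of the assumed normalized range.
normalized = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]
pixels = to_pixels(normalized, height=480, width=640)
print(pixels)  # [(0.0, 0.0), (320.0, 240.0), (640.0, 480.0)]
```

Each image in a pair is rescaled with its own height and width, which is why the snippet above calls postprocess once per image.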

Alternatively, you can also run infer.py.

Inference Example
python infer.py \
    --img_paths assets/im_A.jpg assets/im_B.jpg \
    --img_size 256 256 \
    --end2end \
    --end2end_path weights/dedode_end2end_1024_fp16.onnx \
    --fp16 \
    --viz

🚀 TensorRT Support

TensorRT offers the best performance and greatest memory efficiency.

TensorRT inference is supported for the end-to-end model via the TensorRT Execution Provider in ONNX Runtime. Please follow the official documentation to install TensorRT. The exported ONNX models must undergo shape inference for compatibility with TensorRT.

TensorRT Example
python tools/symbolic_shape_infer.py \
  --input weights/dedode_end2end_1024.onnx \
  --output weights/dedode_end2end_1024_trt.onnx \
  --auto_merge
CUDA_MODULE_LOADING=LAZY python infer.py \
  --img_paths assets/DSC_0410.JPG assets/DSC_0411.JPG \
  --img_size 256 256 \
  --end2end \
  --end2end_path weights/dedode_end2end_1024_trt.onnx \
  --trt \
  --viz

The first run takes longer because TensorRT must build the .engine and .profile files; subsequent runs reuse the cached files. Only static input shapes are supported, and TensorRT rebuilds the cache whenever it encounters a different input shape.
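
For reference, engine caching in the ONNX Runtime TensorRT Execution Provider is controlled through provider options. The sketch below builds a provider list with caching enabled. The option names are ONNX Runtime's; the cache directory and the `tensorrt_providers` helper are hypothetical, and whether infer.py or DeDoDeRunner configures the provider this way is an assumption.

```python
def tensorrt_providers(cache_dir="weights/cache", fp16=False):
    """Provider list with TensorRT first, then CUDA and CPU as fallbacks."""
    trt_options = {
        "trt_engine_cache_enable": True,     # reuse built engines across runs
        "trt_engine_cache_path": cache_dir,  # directory for the cached engine/profile files
        "trt_fp16_enable": fp16,             # allow FP16 kernels when True
    }
    return [
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]


providers = tensorrt_providers(fp16=True)
print(providers[0])

# With onnxruntime installed, the list would be passed as:
# import onnxruntime as ort
# session = ort.InferenceSession("weights/dedode_end2end_1024_trt.onnx", providers=providers)
```

Keeping the CUDA and CPU providers after TensorRT lets ONNX Runtime fall back for any part of the graph that TensorRT cannot run.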

⏱️ Inference Time Comparison

The inference times of the end-to-end DeDoDe pipelines are shown below.

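Latency (ms) (RTX 4080 12GB)

| # Keypoints   | 1024   | 2048   | 3840  | 4096   | 8192   |
| ------------- | ------ | ------ | ----- | ------ | ------ |
| PyTorch       | 169.72 | 170.42 | N/A   | 176.18 | 189.53 |
| PyTorch-MP    | 79.42  | 80.09  | N/A   | 83.8   | 96.93  |
| ONNX          | 170.84 | 171.83 | N/A   | 180.18 | 203.37 |
| TensorRT      | 78.12  | 79.59  | 94.88 | N/A    | N/A    |
| TensorRT-FP16 | 33.9   | 35.45  | 42.35 | N/A    | N/A    |
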
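
The speedup implied by these figures can be checked with a short calculation. The sketch below uses the 1024-keypoint latencies from the table; pairing PyTorch-MP with TensorRT-FP16 assumes that "MP" denotes mixed precision, which the table does not spell out.

```python
# Latencies in ms at 1024 keypoints, copied from the table.
latency_ms = {
    "PyTorch": 169.72,
    "PyTorch-MP": 79.42,
    "ONNX": 170.84,
    "TensorRT": 78.12,
    "TensorRT-FP16": 33.9,
}

# Full precision: PyTorch vs TensorRT.
speedup_fp32 = latency_ms["PyTorch"] / latency_ms["TensorRT"]
# Reduced precision: PyTorch-MP vs TensorRT-FP16 (pairing is an assumption).
speedup_fp16 = latency_ms["PyTorch-MP"] / latency_ms["TensorRT-FP16"]

print(f"TensorRT vs PyTorch: {speedup_fp32:.2f}x")
print(f"TensorRT-FP16 vs PyTorch-MP: {speedup_fp16:.2f}x")
```

Both ratios come out slightly above 2, in line with the 2x speedup stated in the introduction.
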
Evaluation Details

Only the inference time, or latency, of the end-to-end DeDoDe pipeline is reported; the time taken for image preprocessing, postprocessing, copying data between the host and device, and finding inliers (e.g., CONSAC/MAGSAC) is not measured. The inference time is defined as the median over all samples in the MegaDepth test dataset. We use the data provided by LoFTR here, a total of 403 image pairs.


Each image is resized to 512x512 before being fed into the pipeline. The inference time of the DeDoDe pipeline is then measured for different values of the detector's num_keypoints parameter: 1024, 2048, 4096, and 8192. Note that TensorRT has a hard limit of 3840 keypoints, so the TensorRT rows report 3840 in place of 4096 and 8192.
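
The measurement described above (time only the pipeline call, then take the median over all samples) can be sketched as follows. This is an illustration, not the contents of eval.py; `median_latency_ms` and the stand-in workload are hypothetical.

```python
import statistics
import time


def median_latency_ms(run_pipeline, samples):
    """Time only run_pipeline(sample) for each sample and return the median in ms."""
    timings = []
    for sample in samples:
        start = time.perf_counter()
        run_pipeline(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def dummy_pipeline(n):
    """Stand-in workload so the sketch runs without a model or dataset."""
    return sum(i * i for i in range(n))


result = median_latency_ms(dummy_pipeline, [10_000] * 5)
print(f"median latency: {result:.3f} ms")
```

For GPU inference, the timed callable needs to block until the results are ready; otherwise the timer only captures the kernel launch. The median is less sensitive than the mean to occasional slow samples, such as the first run.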

For reproducibility, the evaluation script eval.py is provided.

Latency figure

Credits

If you use any ideas from the papers or code in this repo, please consider citing the authors of DeDoDe. Lastly, if the ONNX or TensorRT versions helped you in any way, please also consider starring this repository.

@article{edstedt2023dedode,
      title={DeDoDe: Detect, Don't Describe -- Describe, Don't Detect for Local Feature Matching}, 
      author={Johan Edstedt and Georg Bökman and Mårten Wadenbäck and Michael Felsberg},
      year={2023},
      eprint={2308.08479},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contributors

fabio-sim
