The deep learning space is exploding with new inference frameworks. This repo keeps an active list of them. The initial list was seeded from Model_Inference_Deployment, which is still worth checking if this list becomes outdated.
Please open a pull request to suggest updates.
- ONNX Runtime - By Microsoft, an optimized runtime for models exported to the standard ONNX format (see the usage sketch after this list).
- DeepSpeed-MII - By Microsoft, an inference library that makes 24,000 models, including very large ones, fast through DeepFusion, tensor slicing, ZeroQuant quantization, distributing models across GPUs and CPUs, and more.
- OpenAI Triton - A language and compiler for writing custom GPU kernels that make models run fast.
- AITemplate - By Meta AI, a framework to run models fast on both NVIDIA and AMD GPUs.
- OpenVINO - By Intel, a toolkit to run models fast on Intel CPUs and other Intel hardware.
- TensorRT - By NVIDIA, to run models fast on NVIDIA GPUs.
- TVM - A compiler stack that generates optimized tensor operators for a given hardware target.
- MediaPipe - A gallery of models and pipelines for popular tasks, optimized for iOS, Android, and more.
- TensorFlow Lite - Runtime for running TensorFlow models on mobile and embedded devices (see the interpreter sketch after this list).
- TensorFlow Serving - Serving infrastructure for TensorFlow models.
- LibTorch - The C++ distribution of PyTorch.
- NCNN - By Tencent, an inference framework for mobile devices, used in Tencent's own apps.
- TNN - By Tencent, a cross-platform inference framework that accelerates many tasks in Tencent apps.
- MNN - By Alibaba, a framework for model compression, training, and serving, used in 30 Alibaba apps.
- MACE - By Xiaomi, the Mobile AI Compute Engine, an inference framework for mobile platforms.
- Paddle Lite - For PaddlePaddle, inference on mobile and IoT devices.
- MegEngine Lite - Lightweight inference interface for MegEngine models.
- OpenPPL - A primitive library for neural networks that runs ONNX models on GPUs and CPUs.
- Bolt - By Huawei, runs ONNX and TFLite models; claims to be 15% faster than other frameworks, with support for Qualcomm GPUs, Mali GPUs, and CPUs.
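
As a taste of what using one of these runtimes looks like, here is a minimal ONNX Runtime sketch in Python. The `model.onnx` path and the `(1, 3, 224, 224)` input shape are placeholders for whatever model you have exported, not values from any specific framework above:

```python
import numpy as np
import onnxruntime as ort

# Load an exported ONNX model; "model.onnx" is a placeholder path.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# The input name is defined at export time, so read it from the session.
input_name = session.get_inputs()[0].name

# Assumed input shape (1, 3, 224, 224); replace with your model's real shape.
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)
```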
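
Similarly, a minimal sketch of running a converted TensorFlow Lite model with the Python interpreter; `model.tflite` is a placeholder path and the input is random data shaped to whatever the model declares:

```python
import numpy as np
import tensorflow as tf

# Load a converted model; "model.tflite" is a placeholder path.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed random data shaped to the model's declared input tensor.
dummy = np.random.rand(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

result = interpreter.get_tensor(output_details[0]["index"])
print(result.shape)
```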