The deep learning space is exploding with new inference frameworks. This repo keeps an active list of them. The initial list was seeded from Model_Inference_Deployment, which is still worth checking if this list becomes outdated.
Please open a pull request to suggest updates.
- ONNX Runtime - By Microsoft, an optimized runtime for models exported to the standard ONNX format (see the usage sketch after this list).
- DeepSpeed-MII - By Microsoft, an inference library that makes 24,000 models, including very large ones, fast through DeepFusion, tensor slicing, ZeroQuant quantization, distributing models across GPUs and CPUs, and more.
- OpenAI Triton - A language and compiler for writing custom GPU kernels that make models run fast.
- AITemplate - By Meta AI, a framework to run models fast on both NVIDIA and AMD GPUs.
- OpenVINO - By Intel, a toolkit to run models fast on Intel CPUs and other Intel hardware.
- TensorRT - By NVIDIA, to run models fast on NVIDIA GPUs.
- TVM - A compiler stack that generates optimized tensor operators for a given hardware target.
- MediaPipe - A gallery of models and pipelines for popular tasks, optimized for iOS, Android, and more.
- TensorFlow Lite - Runtime for running TensorFlow models on mobile and embedded devices (see the interpreter sketch after this list).
- TensorFlow Serving - Serving infrastructure for TensorFlow models.
- LibTorch - The C++ distribution of PyTorch.
- NCNN - By Tencent, an inference framework for mobile devices, used in Tencent's own apps.
- TNN - By Tencent, a cross-platform inference framework that accelerates many tasks in Tencent apps.
- MNN - By Alibaba, a framework for model compression, training, and serving, used in 30 Alibaba apps.
- MACE - By Xiaomi, the Mobile AI Compute Engine, an inference framework for mobile platforms.
- Paddle Lite - For PaddlePaddle, inference on mobile and IoT devices.
- MegEngine Lite - Lightweight inference interface for MegEngine models.
- OpenPPL - A primitive library for neural networks that runs ONNX models on GPUs and CPUs.
- Bolt - By Huawei, runs ONNX and TFLite models; claims to be 15% faster than other frameworks, with support for Qualcomm GPUs, Mali GPUs, and CPUs.
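
As a taste of what using one of these runtimes looks like, here is a minimal ONNX Runtime sketch in Python. The `model.onnx` path and the `(1, 3, 224, 224)` input shape are placeholders for whatever model you have exported, not values from any specific framework above:

```python
import numpy as np
import onnxruntime as ort

# Load an exported ONNX model; "model.onnx" is a placeholder path.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# The input name is defined at export time, so read it from the session.
input_name = session.get_inputs()[0].name

# Assumed input shape (1, 3, 224, 224); replace with your model's real shape.
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)
```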
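
Similarly, a minimal sketch of running a converted TensorFlow Lite model with the Python interpreter; `model.tflite` is a placeholder path and the input is random data shaped to whatever the model declares:

```python
import numpy as np
import tensorflow as tf

# Load a converted model; "model.tflite" is a placeholder path.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed random data shaped to the model's declared input tensor.
dummy = np.random.rand(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

result = interpreter.get_tensor(output_details[0]["index"])
print(result.shape)
```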