
This project is a fork of ibm/onnx-mlir-serving.



License: Apache License 2.0

C++ 81.33% Python 8.02% CMake 9.33% Dockerfile 1.32%

onnx-mlir-serving's Introduction

ONNX-MLIR Serving

This project implements a gRPC server, written in C++, that serves onnx-mlir compiled models. Thanks to the C++ implementation, ONNX Serving has very low latency overhead and high throughput.

ONNX Serving provides dynamic batch aggregation and a worker pool to fully utilize the AI accelerators on the machine.

ONNX-MLIR is a compiler technology that transforms a valid Open Neural Network Exchange (ONNX) graph into code implementing the graph with minimal runtime support. It implements the ONNX standard and is based on the underlying LLVM/MLIR compiler technology.
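For orientation, the sketch below shows roughly how a model compiled by onnx-mlir (the model.so this server loads) is typically driven through the OnnxMlirRuntime API. It assumes the standard run_main_graph entry point and an mnist-shaped 1x1x28x28 float input; it is only an illustration of the runtime, not code taken from this server.

// Minimal sketch, assuming the usual onnx-mlir entry point in model.so.
// Link against the compiled model.so and the onnx-mlir runtime (libcruntime.a).
#include <OnnxMlirRuntime.h>
#include <cstdio>
#include <vector>

// Entry point generated by onnx-mlir for the compiled model.
extern "C" OMTensorList *run_main_graph(OMTensorList *);

int main() {
  std::vector<float> image(1 * 1 * 28 * 28, 0.0f);   // dummy mnist-shaped input
  int64_t shape[] = {1, 1, 28, 28};

  OMTensor *input = omTensorCreate(image.data(), shape, 4, ONNX_TYPE_FLOAT);
  OMTensor *inputs[] = {input};
  OMTensorList *inList = omTensorListCreate(inputs, 1);

  OMTensorList *outList = run_main_graph(inList);     // run inference
  OMTensor *output = omTensorListGetOmtByIndex(outList, 0);
  float *scores = static_cast<float *>(omTensorGetDataPtr(output));
  std::printf("first score: %f\n", scores[0]);
  return 0;
}

The serving layer essentially wraps calls like this behind gRPC, batch aggregation, and the worker pool.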

Build

There are two ways to build this project.

Build ONNX-MLIR Serving on local environment

Prerequisites

1. gRPC is installed

Build gRPC from source

gRPC installation dir example: grpc/cmake/install

2. ONNX-MLIR is built

Copy the include files from the onnx-mlir source tree into the onnx-mlir build directory.

ls onnx-mlir-serving/onnx-mlir-build/*
onnx-mlir-serving/onnx-mlir-build/include:
benchmark  CMakeLists.txt  google  onnx  onnx-mlir  OnnxMlirCompiler.h  OnnxMlirRuntime.h  rapidcheck  rapidcheck.h

onnx-mlir-serving/onnx-mlir-build/lib:
libcruntime.a

Build ONNX-MLIR Serving

cmake -DCMAKE_BUILD_TYPE=Release -DGRPC_DIR:STRING={GRPC_SRC_DIR} -DONNX_COMPILER_DIR:STRING={ONNX_MLIR_BUILD_DIR} -DCMAKE_PREFIX_PATH={GRPC_INSTALL_DIR} ../..
make -j

Build ONNX-MLIR Serving on Docker environment

Build the AI gRPC server and client

docker build -t onnx/aigrpc-server .

Run ONNX-MLIR Server and Client

Server:

./grpc_server -h
usage: grpc_server [options]
    -w arg     wait time for batch size, default is 0
    -b arg     server side batch size, default is 1
    -n arg     thread number, default is 1

./grpc_server

Add more models

Build Models Directory

cd cmake/build
mkdir models

Example models directory:

models
└── mnist
    ├── config
    ├── model.so
    └── model.onnx

config

Describes the model configuration. It can be generated using utils/OnnxReader <model.onnx>. Example of the mnist config (elem_type: 1 corresponds to the ONNX FLOAT data type):

input {
  name: "Input3"
  type {
    tensor_type {
      elem_type: 1
      shape {
        dim {
          dim_value: 1
        }
        dim {
          dim_value: 1
        }
        dim {
          dim_value: 28
        }
        dim {
          dim_value: 28
        }
      }
    }
  }
}
output {
  name: "Plus214_Output_0"
  type {
    tensor_type {
      elem_type: 1
      shape {
        dim {
          dim_value: 1
        }
        dim {
          dim_value: 10
        }
      }
    }
  }
}
max_batch_size: 1

Inference request

See utils/inference.proto and utils/onnx.proto.
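As a rough illustration of how a client issues an inference request, the C++ sketch below shows the usual gRPC call structure. The real service, method, message, and field names are defined in utils/inference.proto and utils/onnx.proto; the InferenceService, Inference, and set_model_name names used here are hypothetical placeholders, as is the port 50051.

// Hypothetical sketch only: service/method/field names are placeholders;
// the authoritative definitions live in utils/inference.proto.
#include <grpcpp/grpcpp.h>
#include "inference.grpc.pb.h"   // generated from utils/inference.proto

int main() {
  auto channel = grpc::CreateChannel("localhost:50051",
                                     grpc::InsecureChannelCredentials());
  // Placeholder names; substitute the real service from inference.proto.
  auto stub = inference::InferenceService::NewStub(channel);

  inference::InferenceRequest request;    // hypothetical message type
  request.set_model_name("mnist");        // hypothetical field
  // ... fill the input tensor fields as defined in onnx.proto ...

  inference::InferenceResponse response;  // hypothetical message type
  grpc::ClientContext context;
  grpc::Status status = stub->Inference(&context, request, &response);
  if (!status.ok()) {
    return 1;
  }
  // ... read the output tensors from the response ...
  return 0;
}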

Use Batching

There are two places to specify the batch size:

  1. In the model config file, via 'max_batch_size'
  2. When starting the server: grpc_server -b [batch size]

Situation 1: grpc_server is started without -b; the default batch size is 1, which means no batching.
Situation 2: grpc_server -b <batch_size> with batch_size > 1, and model A's config has max_batch_size > 1; queries to model A use the minimum of the two batch sizes.
Situation 3: grpc_server -b <batch_size> with batch_size > 1, and model B's config has max_batch_size = 1 (the value generated by default); queries to model B do not use batching.
In other words, the effective batch size is the smaller of the server-side value and the model's max_batch_size, as sketched below.
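A minimal sketch of that rule, assuming the server simply clamps to the smaller of the two configured values (the function name is illustrative, not taken from the source):

#include <algorithm>
#include <cstdint>

// Effective batch size used when serving a model: the smaller of the
// server-wide "-b" value and the model config's max_batch_size.
// A result of 1 means requests are not aggregated into batches.
int64_t effectiveBatchSize(int64_t server_batch_size, int64_t model_max_batch_size) {
  return std::max<int64_t>(1, std::min(server_batch_size, model_max_batch_size));
}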

Example client:

example/cpp or example/python

Example

See grpc-test.cc

  • TEST_F is the simplest example of serving the mnist model.

onnx-mlir-serving's People

Contributors

lifeifei00, chenqiny, ibm-open-source-bot
