
C++ Torch Server

Serve Torch models as a REST API using Drogon; an example is included for a ResNet-18 model trained on ImageNet. Benchmarks show a ~6-10x improvement in throughput and latency for ResNet-18 at peak load.

Build & Run Instructions

# Create Optimized models for your machine.
$ python3 optimize_model_for_inference.py

# Build and Run Server
$ docker compose run --service-ports blaze

Development

  • Add Docker to the CLion toolchain; this will set up all the necessary dependencies.

Client Instructions

curl "localhost:8088/classify" -F "image=@images/cat.jpg"
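The same request can be issued from a script. The sketch below builds the multipart body with only the Python standard library; the endpoint, field name (`image`), and file path mirror the curl example above, and the helper name is illustrative rather than part of this repo.

```python
import uuid

def encode_multipart(field, filename, data, content_type="image/jpeg"):
    """Encode a single file as a multipart/form-data body (stdlib only)."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

# Equivalent of the curl call above (uncomment with the server running):
# import http.client
# body, ctype = encode_multipart("image", "cat.jpg",
#                                open("images/cat.jpg", "rb").read())
# conn = http.client.HTTPConnection("localhost", 8088)
# conn.request("POST", "/classify", body, {"Content-Type": ctype})
# print(conn.getresponse().read())
```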

Benchmarking Instructions

# Drogon + libtorch
for i in {0..8}; do curl "localhost:8088/classify" -F "image=@images/cat.jpg"; done # Run once to warm up.
wrk -t8 -c100 -d60 -s benchmark/upload.lua "http://localhost:8088/classify" --latency
# FastAPI + pytorch
cd benchmark/python_fastapi
python3 -m venv env
source env/bin/activate
python3 -m pip install -r requirements.txt # Run just once to install dependencies into the venv folder.
gunicorn main:app -w 2 -k uvicorn.workers.UvicornWorker --bind 127.0.0.1: # -w 2 gave the best performance on my machine; 3 and 4 workers were also tried.
deactivate # Run after benchmarking is done and gunicorn has been stopped.

cd ../.. # back to the root folder
for i in {0..8}; do curl "localhost:8088/classify" -F "image=@images/cat.jpg"; done # Run once to warm up.
wrk -t8 -c100 -d60 -s benchmark/fastapi_upload.lua "http://localhost:8088/classify" --latency

Benchmarking results

Drogon + libtorch

# OS: Ubuntu 21.10 x86_64
# Kernel: 5.15.14-xanmod1
# CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz
# GPU: NVIDIA GeForce RTX 3070
Running 1m test @ http://localhost:8088/classify
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    39.30ms   10.96ms  95.51ms   70.50%
    Req/Sec   306.58     28.78   390.00     70.92%
  Latency Distribution
     50%   37.40ms
     75%   45.69ms
     90%   54.57ms
     99%   69.34ms
  146612 requests in 1.00m, 30.34MB read
Requests/sec:   2441.60
Transfer/sec:    517.41KB

FastAPI + pytorch

# OS: Ubuntu 21.10 x86_64
# Kernel: 5.15.14-xanmod1
# CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz
# GPU: NVIDIA GeForce RTX 3070
Running 1m test @ http://localhost:8088/classify
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   449.50ms  239.30ms   1.64s    70.39%
    Req/Sec    33.97     26.41   121.00     83.46%
  Latency Distribution
     50%  454.64ms
     75%  570.73ms
     90%  743.54ms
     99%    1.16s
  12981 requests in 1.00m, 2.64MB read
Requests/sec:    216.13
Transfer/sec:     44.96KB
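The relative speedup follows directly from the two wrk runs above (numbers copied from the output; Python is used here only as a calculator):

```python
drogon_rps, fastapi_rps = 2441.60, 216.13  # Requests/sec from wrk
drogon_avg, fastapi_avg = 39.30, 449.50    # Average latency in ms

throughput_gain = drogon_rps / fastapi_rps  # ~11.3x more requests/sec
latency_gain = fastapi_avg / drogon_avg     # ~11.4x lower average latency
print(f"{throughput_gain:.1f}x throughput, {latency_gain:.1f}x latency")
```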

Architecture

  • API request handling and model pre-processing in the Drogon controller controllers/ImageClass.cc
  • Batched model inference logic & post-processing in lib/ModelBatchInference.cpp
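The batching pattern behind lib/ModelBatchInference.cpp can be sketched language-independently: requests accumulate in a queue, a worker drains up to a batch size (or until a short timeout), runs one forward pass for the whole batch, and fans each result back to its caller. A minimal stdlib Python sketch with a stand-in model; the batch size, timeout, and names are illustrative, not the actual implementation:

```python
import queue
import threading

BATCH_SIZE = 8     # illustrative; the real value is a tuning choice
TIMEOUT_S = 0.005  # how long to wait for a fuller batch

def batch_worker(inbox, model):
    """Collect up to BATCH_SIZE requests, run one batched inference, reply."""
    while True:
        item = inbox.get()
        if item is None:                   # shutdown sentinel
            return
        batch = [item]
        while len(batch) < BATCH_SIZE:
            try:
                nxt = inbox.get(timeout=TIMEOUT_S)
            except queue.Empty:
                break                      # timeout: run a partial batch
            if nxt is None:
                inbox.put(None)            # re-post sentinel, flush batch
                break
            batch.append(nxt)
        inputs = [inp for inp, _ in batch]
        outputs = model(inputs)            # one forward pass per batch
        for (_, reply), out in zip(batch, outputs):
            reply.put(out)

# Usage with a dummy "model" that doubles each input:
inbox = queue.Queue()
t = threading.Thread(target=batch_worker,
                     args=(inbox, lambda xs: [x * 2 for x in xs]))
t.start()
replies = [queue.Queue() for _ in range(3)]
for i, r in enumerate(replies):
    inbox.put((i, r))
results = [r.get() for r in replies]  # [0, 2, 4]
inbox.put(None)                       # stop the worker
t.join()
print(results)
```

Results are identical whether the worker happens to drain all three requests in one batch or splits them across timeouts; batching only changes how many forward passes it takes.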

TODOS

  • Multithreaded batched inference
  • FP16 inference
  • Use C++20 coroutines for wait-free event-loop tasks
  • Add compiler optimizations to CMake.
  • Benchmark optimizations like channels-last, ONNX and TensorRT, and report what's faster.
  • Pin the batched tensor used for inference to memory and reuse it at every inference. No improvement observed.
  • Use Torch-TensorRT for inference, fastest on CUDA devices. Cuts inference down from 5ms to 1-2ms.
  • Use torch nvJPEG for faster image decoding; currently ~2ms is spent in this call with libjpeg-turbo.
  • Int8 inference using FX Graph post-training quantization, ResNet Int8 quantization example1, example2
  • Benchmark the framework against mosec
  • Use lock-free queues
  • Separate pre-processing, inference and post-processing.
  • Add address & memory-leak sanitizers to CMake.
  • Dockerize for easy usage.

Notes

  • WIP: Just gets the job done for now; not production-ready, though tested regularly.

Contributors

  • viig99

drogon-torch-serve's Issues

Pytorch, Cuda and cuDNN version ?

Nice work here ...

Can I ask which PyTorch, CUDA and cuDNN versions you're using?

I have faced the following issue and was wondering whether it is due to a compatibility problem.

I am testing with PyTorch 1.10 running on CUDA 10.2 and cuDNN 7. No issue occurs when running with Python.

terminate called after throwing an instance of 'std::runtime_error'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/torchvision/models/resnet/___torch_mangle_25.py", line 6, in forward
  def forward(self: __torch__.torchvision.models.resnet.___torch_mangle_25.ResNet,
    x: Tensor) -> Tensor:
    _0 = torch.cudnn_convolution_relu(x, CONSTANTS.c0, CONSTANTS.c1, [2, 2], [3, 3], [1, 1], 1)
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    x0 = torch.max_pool2d(_0, [3, 3], [2, 2], [1, 1])
    _1 = torch.cudnn_convolution_relu(x0, CONSTANTS.c2, CONSTANTS.c3, [1, 1], [1, 1], [1, 1], 1)

Traceback of TorchScript, original code (most recent call last):

    graph(%input, %weight, %bias, %stride:int[], %padding:int[], %dilation:int[], %groups:int):
        %res = aten::cudnn_convolution_relu(%input, %weight, %bias, %stride, %padding, %dilation, %groups)
               ~~~~ <--- HERE
        return (%res)
RuntimeError: cuDNN filters (a.k.a. weights) must be contiguous in desired memory_format

Thanks.
