aws-solutions-library-samples / guidance-for-machine-learning-inference-on-aws

This Guidance demonstrates how to deploy a machine learning inference architecture on Amazon Elastic Kubernetes Service (Amazon EKS). It addresses the basic implementation requirements as well as ways you can pack thousands of unique PyTorch deep learning (DL) models into a scalable architecture and evaluate performance at scale.

License: MIT No Attribution

Shell 73.26% Python 25.76% Dockerfile 0.97%
eks-cluster graviton3 inferentia ml mlops-workflow

guidance-for-machine-learning-inference-on-aws's Introduction

Guidance for Low Latency, High Throughput Inference using Efficient Compute on Amazon EKS

The guidance-for-machine-learning-inference-on-aws repository contains an end-to-end automation framework example for running model inference locally on Docker or at scale on an Amazon EKS cluster. It supports EKS compute nodes based on CPU, GPU, AWS Graviton, and AWS Inferentia processor architectures and can pack multiple models into a single processor core for improved cost efficiency. While this example focuses on one processor architecture at a time, iterating over the steps below for the various CPU/GPU efficient compute and Inferentia architectures enables hybrid deployments where the best processor/accelerator is used to serve each model depending on its resource consumption profile. In this sample repository we use a bert-base NLP model from huggingface.co; however, the project structure and workflow are generic and can be adapted for use with other models.


Fig. 1 - Sample Amazon EKS cluster infrastructure and the deployment, running, and testing of ML inference workloads

The ML inference workloads in this project are deployed on CPU, GPU, or Inferentia based EKS compute nodes as shown in Fig. 1. The control scripts may run from any location that has full access to the cluster Kubernetes API. To eliminate latency concerns related to the EKS cluster ingress, load tests run in pods deployed within the same cluster and send requests to the models directly through the cluster pod network.

  1. The Amazon EKS cluster has several node groups, with one Amazon EC2 instance family per node group. Each node group can support different instance types, such as CPU (C5, C6i, C7gn), GPU (G4dn), or AWS Inferentia (Inf1, Inf2), and can pack multiple models onto each EKS node to maximize the number of served ML models running in a node group. Model bin packing is used to maximize compute and memory utilization of the Amazon EC2 instances in the cluster node groups.
  2. The natural language processing (NLP) open-source PyTorch model from Hugging Face, the serving application, and the ML framework dependencies are built by users as container images using an automation framework. These images are uploaded to Amazon Elastic Container Registry (Amazon ECR).
  3. Using the automation framework, the model container images are obtained from Amazon ECR and deployed to an Amazon EKS cluster using generated deployment and service manifests through the Kubernetes API (exposed through Elastic Load Balancing (ELB)). Model deployments are customized for each deployment target EKS compute node instance type through settings in the central configuration file.
  4. Following the best practice of separating model data from the containers that run it, the ML model microservice design scales out to a large number of models. In this sample project, model containers pull data from Amazon Simple Storage Service (Amazon S3) and other public model data sources each time they are initialized.
  5. Using the automation framework, the test container images are deployed to an Amazon EKS cluster using generated deployment and service manifests through the Kubernetes API. Test deployments are customized for each deployment target EKS compute node instance type through settings in the central configuration file. Load or scale testing is performed by sending simultaneous requests to the model service pool from test pods. Performance test results and metrics are obtained, recorded, and aggregated.



Fig. 2 - ML Inference video walkthrough

Please watch this end-to-end accelerated video walkthrough (7 min) or follow the instructions below to build and run your own inference solution.

Prerequisites

This sample can be run on a single machine using Docker, or at scale on an Amazon EKS cluster.

It is assumed that the following basic tools are present: docker, kubectl, envsubst, kubetail, bc.
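As a quick sanity check before you begin, you can verify that these tools are present on your PATH. This is a minimal sketch and not part of the project scripts:

# Check that the required tools are installed (sketch; not part of the repository)
for tool in docker kubectl envsubst kubetail bc; do
  command -v "$tool" >/dev/null 2>&1 && echo "found:   $tool" || echo "missing: $tool"
done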

Operation

The project is operated through a set of action scripts as described below. To complete a full cycle from beginning to end, first configure the project, then follow steps 1 through 5, executing the corresponding action scripts. Each action script has a help screen, which can be invoked by passing "help" as an argument: <script>.sh help
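For example, to display the available options for the deploy script:

./deploy.sh help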

Optional - Provision an EKS cluster with 3 node groups

To provision this "opinionated" EKS cluster infrastructure optimized for running this guidance, run the ./provision.sh script. Alternatively, you can use an existing EKS cluster, or provision a new one using one of the Terraform EKS Blueprints that contains node groups of the desired target instance types.

./provision.sh

This command executes a script that creates a CloudFormation stack, which deploys an EC2 "management" instance in your default AWS region. That instance contains a userData script that provisions an EKS cluster in the us-west-2 region, pre-defined per specification based on a template that is part of another Git repo project. After the EKS cluster is provisioned, it is fully accessible from that EC2 "management" instance, and this repository is copied there as well, ready to proceed to the next steps.
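If you want to follow the provisioning progress from the AWS CLI, you can watch the CloudFormation stack status. This is a sketch; the stack name below is a placeholder, so substitute the actual stack name created by provision.sh:

# <management-stack-name> is a placeholder for the stack created by provision.sh
aws cloudformation describe-stacks --stack-name <management-stack-name> \
    --query 'Stacks[0].StackStatus' --output text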

Configure

./config.sh

A centralized configuration file, config.properties, contains all settings that are customizable for the project. This file comes pre-configured with reasonable defaults that work out of the box. To set the processor target or any other setting, edit the config file or execute the config.sh script. Configuration changes take effect immediately upon execution of the next action script.
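The excerpt below only illustrates the kind of settings the file holds; the key names shown here are hypothetical, so refer to config.properties in the repository for the actual keys and values:

# Illustrative sketch only - key names are hypothetical; see config.properties for the real ones
processor=graviton        # target processor, e.g. cpu, gpu, graviton, or inferentia
runtime=kubernetes        # docker for local runs, kubernetes for EKS deployments
registry=<aws_account_id>.dkr.ecr.<region>.amazonaws.com/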

1. Build

./build.sh

This step builds a base container for the selected processor. A base container is required for any of the subsequent steps. This step can be executed on any instance type, regardless of processor target.

Optionally, if you'd like to push the base image to a container registry, execute ./build.sh push. Pushing the base image to a container registry is required if you are planning to run the test step against models deployed to Kubernetes. If you are using a private registry and need to log in before pushing, execute ./login.sh. This script logs in to Amazon ECR; other private registry implementations can be added to the script as needed.
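For example, a typical build sequence when you plan to deploy to Kubernetes is:

./build.sh          # build the base image for the configured processor target
./login.sh          # log in to the private registry (Amazon ECR) if needed
./build.sh push     # push the base image to the container registry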

2. Trace

./trace.sh

Compiles the model into a TorchScript serialized graph file (.pt). This step requires the model to run on the target processor; therefore, it is necessary to run this step on an instance that has the target processor available.

Upon successful compilation, the model will be saved in a local folder named trace-{model_name}.

Note

It is recommended to use the AWS Deep Learning AMI to launch the instance where your model will be traced.
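For example, on an instance that has the target processor available (such as one launched from the AWS Deep Learning AMI), the step amounts to:

./trace.sh          # compile the model into a TorchScript graph on the target processor
ls trace-*          # the serialized .pt model is saved under trace-{model_name}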

3. Pack

./pack.sh

Packs the model into a container with FastAPI, also allowing multiple models to be packed within the same container. FastAPI is used as an example here for simplicity and performance; however, it can be interchanged with any other model server. For the purpose of this project we pack several instances of the same model into the container, but a natural extension of the same concept is to pack different models into the same container.

To push the model container image to a registry, execute ./pack.sh push. The model container must be pushed to a registry if you are deploying your models to Kubernetes.
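A typical pack sequence when deploying to Kubernetes is:

./pack.sh           # pack model instance(s) with FastAPI into a model container image
./pack.sh push      # push the model image to the registry (required for Kubernetes)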

4. Deploy

./deploy.sh

This script runs your models on the configured runtime. The project has built-in support for both local Docker runtimes and Kubernetes. The deploy script also has several sub-commands that facilitate the management of the full lifecycle of your model server containers.

  • ./deploy.sh run - (default) runs the model server containers
  • ./deploy.sh status [number] - show container / pod / service status. Optionally show only the specified instance number
  • ./deploy.sh logs [number] - tail container logs. Optionally tail only the specified instance number
  • ./deploy.sh exec <number> - open a bash shell in the model server container with the specified instance number
  • ./deploy.sh stop - stop and remove deployed model containers from the runtime
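A typical lifecycle using these sub-commands looks like the following; the instance number is illustrative:

./deploy.sh run         # deploy the model server containers to the configured runtime
./deploy.sh status      # verify that containers / pods / services are up
./deploy.sh logs 0      # tail the logs of instance 0 (illustrative instance number)
./deploy.sh stop        # remove the deployed model containers when done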

5. Test

./test.sh

The test script helps run a number of tests against the model servers deployed in your runtime environment.

  • ./test.sh build - build test container image
  • ./test.sh push - push test image to container registry
  • ./test.sh pull - pull the current test image from the container registry if one exists
  • ./test.sh run - run a test client container instance for advanced testing and exploration
  • ./test.sh exec - open shell in test container
  • ./test.sh status - show status of test container
  • ./test.sh stop - stop test container
  • ./test.sh help - list the available test commands
  • ./test.sh run seq - run sequential test. One request at a time submitted to each model server and model in sequential order.
  • ./test.sh run rnd - run random test. One request at a time submitted to a randomly selected server and model at a preset frequency.
  • ./test.sh run bmk - run benchmark test client to measure throughput and latency under load with random requests
  • ./test.sh run bma - run benchmark analysis - aggregate and average stats from logs of all completed benchmark containers
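For example, an end-to-end benchmark run against models deployed to Kubernetes could look like this:

./test.sh build         # build the test container image
./test.sh push          # push the test image to the container registry
./test.sh run bmk       # run the benchmark test under load
./test.sh run bma       # aggregate and average the benchmark results
./test.sh stop          # clean up the test containers when done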

Clean up

You can uninstall the sample code for this Guidance using the AWS Command Line Interface. You must also delete the EKS cluster if it was deployed using references from this Guidance, since removal of the scale testing framework does not automatically delete the cluster and its resources.

To stop or uninstall scale inference test job(s), run the following command:

./test.sh stop

It should delete all scale test pods and jobs from the specified EKS K8s namespace.

To stop or uninstall the inference model services, run the following command:

./deploy.sh stop

It should delete all model deployments, pods, and services from the specified EKS K8s namespace.

If you provisioned an EKS cluster when setting up your prerequisites for the project, as described in "Optional - Provision an EKS cluster with 3 node groups" above, you can clean up the cluster and all resources associated with it by running this script:

./remove.sh

It should delete the EKS cluster compute node groups first, then the IAM service account used in the cluster, then the cluster itself, and finally the ManagementInstance EC2 instance, via the corresponding CloudFormation stacks. Sometimes you may need to run the command a few times, as individual stack deletion commands may time out; that should not cause any problems.
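If deletions keep timing out, a simple retry loop can help; this sketch assumes remove.sh returns a non-zero exit code when a stack deletion fails:

# Sketch only - assumes remove.sh exits non-zero on failure
until ./remove.sh; do
  echo "remove.sh did not complete; retrying in 60 seconds..."
  sleep 60
done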

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.



guidance-for-machine-learning-inference-on-aws's Issues

Deploy Script Fails for Example

I am following the README and using all default settings.

I get the following error after running bash deploy.sh run

Runtime: kubernetes
Processor: graviton
error: error validating "STDIN": error validating data: failed to download openapi: the server has asked for the client to provide credentials; if you choose to ignore these errors, turn validation off with --validate=false
Generating ./app-bert-base-multilingual-cased-graviton-c7g.4xlarge/bert-base-multilingual-cased-graviton-0.yaml ...
error: error validating "app-bert-base-multilingual-cased-graviton-c7g.4xlarge/bert-base-multilingual-cased-graviton-0.yaml": error validating data: failed to download openapi: the server has asked for the client to provide credentials; if you choose to ignore these errors, turn validation off with --validate=false

Error while executing build script

Getting the following error when I run the build script.

I have the registry configured

#8 38.24 No package aws-neuron-runtime-base available.
#8 38.48 No package aws-neuron-runtime available.
#8 38.70 No package aws-neuron-tools available.
#8 38.83 Error: Nothing to do
------
executor failed running [/bin/sh -c yum update -y &&     yum install -y python3 python3-devel gcc-c++ &&     yum install -y tar gzip ca-certificates procps net-tools which vim wget libgomp htop jq bind-utils bc &&     yum install -y aws-neuron-runtime-base aws-neuron-runtime aws-neuron-tools]: exit code: 1

neuron runtime error

I executed ./trace.sh according to the README, but a neuron runtime error occurred.

Part of the error log:

Question: What does the little engine say?
2022-Mar-29 07:07:29.0082    11:11    ERROR   NRT:nrt_init                                Unable to determine Neuron Driver version. Please check aws-neuron-dkms package is installed.
Traceback (most recent call last):
  File "model-tracer.py", line 101, in <module>
    answer_logits = model_traced(*example_inputs)
  File "/usr/local/lib64/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/usr/local/lib64/python3.7/site-packages/torch_neuron/decorators.py(373): forward
/usr/local/lib64/python3.7/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/usr/local/lib64/python3.7/site-packages/torch/nn/modules/module.py(1102): _call_impl
/usr/local/lib64/python3.7/site-packages/torch_neuron/graph.py(546): __call__
/usr/local/lib64/python3.7/site-packages/torch_neuron/graph.py(205): run_op
/usr/local/lib64/python3.7/site-packages/torch_neuron/graph.py(194): __call__
/usr/local/lib64/python3.7/site-packages/torch_neuron/convert.py(217): forward
/usr/local/lib64/python3.7/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/usr/local/lib64/python3.7/site-packages/torch/nn/modules/module.py(1102): _call_impl
/usr/local/lib64/python3.7/site-packages/torch/jit/_trace.py(965): trace_module
/usr/local/lib64/python3.7/site-packages/torch/jit/_trace.py(750): trace
/usr/local/lib64/python3.7/site-packages/torch_neuron/convert.py(183): trace
model-tracer.py(92): <module>
RuntimeError: The PyTorch Neuron Runtime could not be initialized. Neuron Driver issues are logged
to your system logs. See the Neuron Runtime's troubleshooting guide for help on this
topic: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/

I changed 1-build/Dockerfile-base-inf#L16

After the change, it worked normally in my environment.

RUN yum update -y && \
    yum install -y python3 python3-devel gcc-c++ && \
    yum install -y tar gzip ca-certificates procps net-tools which vim wget libgomp htop jq bind-utils bc && \
    yum install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r) aws-neuron-dkms aws-neuron-tools # I changed here.

I'm an AWS employee in Japan. Alias: akazawt
