kingfischer16 / mldeploy

Deploy ML code to cloud resources as a REST API for inference and training.
License: MIT License
Write a function that takes the CPU/GPU/memory requirements from the configuration file and translates them into the minimum viable EC2 instance type. The `boto3` package's `describe_instance_types()` method is probably a good start.
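The matching logic might look like the sketch below. The sample data is hard-coded in the shape of a `describe_instance_types()` response; the function name and its "smallest first" ranking are assumptions rather than existing mldeploy code:

```python
# Sample data shaped like boto3's ec2.describe_instance_types() response.
# In mldeploy this would be fetched live, e.g.:
#   ec2 = boto3.client("ec2")
#   page = ec2.describe_instance_types()
SAMPLE_INSTANCE_TYPES = [
    {"InstanceType": "t3.micro", "VCpuInfo": {"DefaultVCpus": 2},
     "MemoryInfo": {"SizeInMiB": 1024}, "GpuInfo": None},
    {"InstanceType": "t3.large", "VCpuInfo": {"DefaultVCpus": 2},
     "MemoryInfo": {"SizeInMiB": 8192}, "GpuInfo": None},
    {"InstanceType": "m5.2xlarge", "VCpuInfo": {"DefaultVCpus": 8},
     "MemoryInfo": {"SizeInMiB": 32768}, "GpuInfo": None},
]

def minimum_viable_instance(instance_types, cpus, memory_mib, gpus=0):
    """Return the smallest instance type meeting the requirements.

    'Smallest' here means lowest vCPU count, then lowest memory; a real
    implementation might rank by on-demand price instead.
    """
    def gpu_count(it):
        info = it.get("GpuInfo") or {}
        return sum(g.get("Count", 0) for g in info.get("Gpus", []))

    viable = [
        it for it in instance_types
        if it["VCpuInfo"]["DefaultVCpus"] >= cpus
        and it["MemoryInfo"]["SizeInMiB"] >= memory_mib
        and gpu_count(it) >= gpus
    ]
    if not viable:
        raise ValueError("No instance type satisfies the requirements.")
    viable.sort(key=lambda it: (it["VCpuInfo"]["DefaultVCpus"],
                                it["MemoryInfo"]["SizeInMiB"]))
    return viable[0]["InstanceType"]
```

Filtering first and ranking second keeps the "minimum viable" rule in one obvious place when the ranking inevitably changes.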
Set up the basic code for Sphinx auto-documentation of the main library (non-hidden functions). Will need to add specific CLI documentation by hand.
Add AWS SDK basics. This includes:
PyLint should be used here. Set up a basic config file and run linting.
Current thinking is to have a single, monolithic CloudFormation file that describes the entire deployment, including:
Probably more...
The `utils._get_field_if_exists()` function currently only returns `'(None)'` if the field does not exist. However, when we 'undeploy' something we end up setting the stack name and ID to an empty string (we do not remove the field). The function needs to return the same `'(None)'` string when empty strings are found as well.
Given that the end goal is to deploy code that will be interacted with via a REST API, it might be useful to have a set of user functions that can be called (with proper direction and API keys) to pass or retrieve data from the deployed REST API.
Is there something off-the-shelf that can do this? We can use HTTP/HTTPS, but is that efficient? Can we package it such that it becomes invisible to the user?
The end result should be part of the user's code, e.g. they simply pass a list of model configurations and the API will queue them, train them, and store the performance results.
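One possible shape for such user functions, keeping payload construction separate from the network call so it can be tested offline; the endpoint shape, header names, and payload schema are all assumptions, not the mldeploy API:

```python
import json
import urllib.request

def build_training_request(model_configs, api_key):
    """Package a list of model configurations into an HTTP request payload.

    Hypothetical helper: field names and the api-key header are illustrative.
    """
    body = json.dumps({"jobs": model_configs}).encode("utf-8")
    headers = {"Content-Type": "application/json", "x-api-key": api_key}
    return body, headers

def submit_training_jobs(endpoint_url, model_configs, api_key):
    """POST the configurations to the deployed REST endpoint."""
    body, headers = build_training_request(model_configs, api_key)
    req = urllib.request.Request(endpoint_url, data=body,
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:  # network call
        return json.loads(resp.read())
```

From the user's point of view this hides the transport entirely: they pass a list of model configurations and get job handles back.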
This is going to need at least two basic tutorials: one for using it for model training, the other for "production" deployment for prediction. The tutorials could use MNIST.
Training: Pass layers/nodes/activation functions as a request; training result metrics are recorded in an RDS, trained models are recorded in S3. Training data is uploaded separately to S3.
Prediction: Pass an image via REST, upload to S3, predict, return via REST.
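Since JSON cannot carry raw bytes, the image needs an encoding step on the way in and a matching decode on the server side; a minimal sketch with illustrative field names:

```python
import base64
import json

def encode_image_request(image_bytes: bytes) -> str:
    """Wrap raw image bytes in a JSON body suitable for a REST call.

    Base64 is used because JSON cannot carry raw bytes; the 'image'
    field name is illustrative, not a fixed mldeploy schema.
    """
    return json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")})

def decode_image_request(body: str) -> bytes:
    """Server-side inverse: recover the image bytes before prediction."""
    return base64.b64decode(json.loads(body)["image"])
```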
The Docker image that is locally built must be pushed to AWS for deployment. We will use an ECR repository to handle Docker images on AWS for now. It is possible to store Docker images directly in S3, but it is unclear how this would interact with ECS container deployment. Probably fine, but let's start with the easier/known way first.
This is a learning exercise. A basic CloudFormation template will be set up which will create an S3 bucket and an EC2 instance, and upload a Docker image (stretch).
Note that the actual deployment architecture should make use of AWS Fargate to take advantage of serverless architecture, containers, and hopefully lower cost.
Convert the CloudFormation template creation to YAML file format. While both JSON and YAML are equivalent, the CloudFormation template has the potential to be read and edited by human eyes, and YAML is the better format for this.
Thus we adhere to the following tenet: any configuration or data files that have the potential to be read by humans should be written as YAML. Data storage files that we do not expect to be read by people and only serve to hold registry information can be written in JSON.
Set up GitHub/GitLab to auto-build on push to master and submit as a new package version.
If used for training, we may want to capture results (metrics) and/or model architecture. Text data could be handled in an RDS, maybe with a backup or two (optional `redundancy` parameter in setup), storing the inputs and results, and links to S3 results. Actual trained model architectures could be stored in S3.
Setting up unit testing will be a pretty big deal now. Start with the `utils.py` functions. Get that all sorted and working, including `mock`.
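As a sketch of the `mock` approach, a boto3 client can be faked so `utils.py` helpers are testable offline; the helper below is a toy stand-in, not actual mldeploy code:

```python
import unittest
from unittest import mock

def get_stack_id(cf_client, stack_name):
    """Toy stand-in for a utils.py helper that wraps a boto3 call."""
    resp = cf_client.describe_stacks(StackName=stack_name)
    return resp["Stacks"][0]["StackId"]

class TestGetStackId(unittest.TestCase):
    def test_returns_stack_id(self):
        # No AWS credentials or network needed: the client is a Mock.
        fake_cf = mock.Mock()
        fake_cf.describe_stacks.return_value = {
            "Stacks": [{"StackId": "stack-id-123"}]
        }
        self.assertEqual(get_stack_id(fake_cf, "demo"), "stack-id-123")
        fake_cf.describe_stacks.assert_called_once_with(StackName="demo")
```

Passing the client in as an argument (rather than creating it inside the helper) is what makes the mock injection trivial.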
Set up a 'canned' CloudFormation file for an ECS cluster of EC2 machines. We will update relevant fields (resources, regions, project names, subnet names). This will be passed to CloudFormation for stack creation.
This template should contain:
Regardless of whatever model and code gets copied into the Docker image, there will need to be a file, e.g. `handler.py`, that:
Write a function in `utils.py` that checks the name of the project, since the project name will likely be used as the stack name and in resource naming. It should adhere to the following:
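A minimal sketch of such a check, assuming the name must satisfy CloudFormation's stack-name constraints (start with a letter, then letters/digits/hyphens, at most 128 characters); the function name is a placeholder:

```python
import re

# Assumed rules, taken from CloudFormation stack-name constraints:
# first character a letter, then letters/digits/hyphens, max 128 chars.
_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]{0,127}$")

def check_project_name(name: str) -> bool:
    """Return True if 'name' is safe to use as a stack/resource name."""
    return bool(_NAME_RE.match(name))
```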
Create a folder and store each of the Docker build files in a separate file. Will need to figure out how to parse the JSON-generator output and save it to file. TXT maybe?
Calling `mldeploy cost project-name` will give an estimate of the current deployment cost: base price (minimum instances) and per-increment price (each additional instance).
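The arithmetic might be sketched like this; `estimate_cost` and its rate parameter are illustrative, and real prices would come from an AWS price lookup rather than being hard-coded:

```python
def estimate_cost(base_instances, hourly_rate, extra_instances=0):
    """Estimate hourly deployment cost: base price plus per-increment price.

    Illustrative arithmetic only; hourly_rate would come from the AWS
    Price List API (or a lookup table), not a hard-coded value.
    """
    base = base_instances * hourly_rate
    increment = hourly_rate  # cost of each additional instance
    return {
        "base_per_hour": base,
        "per_extra_instance_per_hour": increment,
        "total_per_hour": base + extra_instances * increment,
    }
```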
Given that the desired result will be somewhat stock, an ECS task definition template needs to be created in JSON, with details (container, resources, ports) to be configured at project build time.
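A minimal sketch of such a template built as a Python dict, using field names from the ECS `register_task_definition` API; the defaults and the helper name are placeholders:

```python
def make_task_definition(project, image_uri, cpu=256, memory=512, port=80):
    """Build a minimal ECS task definition dict.

    Field names follow the ECS register_task_definition API; the values
    here are placeholders to be filled in at project build time.
    """
    return {
        "family": f"{project}-task",
        "containerDefinitions": [{
            "name": f"{project}-container",
            "image": image_uri,
            "cpu": cpu,
            "memory": memory,
            "portMappings": [{"containerPort": port, "protocol": "tcp"}],
            "essential": True,
        }],
    }
```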
Let's use Python because... Python. Every container will start up and immediately execute a `run_mldeploy.py` file (or some other suitable name). This file needs to:
Custom health status: https://docs.aws.amazon.com/autoscaling/ec2/userguide/healthcheck.html
Write a "salting" function that appends a randomly generated 8 (4?) character alphanumeric string (or X characters) to the end of every resource name, ensuring the names are unique within a region.
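A possible sketch, assuming lowercase alphanumerics (some resources, such as S3 buckets, disallow uppercase); note that a random salt makes collisions unlikely rather than impossible:

```python
import random
import string

def salt_name(resource_name: str, length: int = 8) -> str:
    """Append a random alphanumeric suffix so resource names are unique.

    Lowercase only, since some AWS resource names (e.g. S3 buckets)
    disallow uppercase. The salt should also be stored in the registry
    so the resource can be found again later.
    """
    salt = "".join(random.choices(string.ascii_lowercase + string.digits,
                                  k=length))
    return f"{resource_name}-{salt}"
```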
When calling `build` (or possibly `deploy`), internal functions will check the config file for the required EC2 instance sizes, cluster setup, project name, scaling, API setup, and so on. These will be added to the YAML file via text search. The template should contain easily searchable markers, e.g. XXPROJECTNAMEXX, XXINSTANCESIZEXX.
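The text-search substitution could be as simple as the sketch below; the template fragment and marker set are illustrative:

```python
# Illustrative fragment of the canned template with searchable markers.
TEMPLATE = """\
Resources:
  Cluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: XXPROJECTNAMEXX-cluster
  Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: XXINSTANCESIZEXX
"""

def fill_template(template: str, config: dict) -> str:
    """Replace XX...XX markers with values from the project config."""
    markers = {
        "XXPROJECTNAMEXX": config["name"],
        "XXINSTANCESIZEXX": config["instance_type"],
    }
    for marker, value in markers.items():
        template = template.replace(marker, value)
    return template
```

Plain string replacement keeps the template readable as YAML while avoiding a full templating dependency.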
It may be useful to be able to query the results RDS from the CLI. Something like `mldeploy query project-name "SQL SELECT STRING"` could be used. We have `boto3` and all the needed references and authentications. It would at least be a quick way to view the top results without logging into the AWS console or opening another DB program.
A Fargate cluster will automatically hibernate when there are no more jobs, but an EC2 cluster will not. A `hibernate` function should be callable on an active deployment, and should trigger:
Additionally, a `resume` (or `thaw`, or `wakeup`) function is required to restart execution. This functionality will likely come in handy so users don't have to `undeploy` and `redeploy` the same deployment every time they need to take a break.
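A sketch of how `hibernate`/`resume` might drive an EC2 Auto Scaling group through boto3's `update_auto_scaling_group` call; the client is injected so it can be faked in tests, and saving the previous capacity to the registry is omitted here:

```python
def hibernate(autoscaling_client, asg_name):
    """Scale an EC2 Auto Scaling group to zero without tearing it down.

    Sketch only: the previous capacity should be saved to the project
    registry before this call so resume() can restore it.
    """
    autoscaling_client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=0,
        DesiredCapacity=0)

def resume(autoscaling_client, asg_name, min_size, desired):
    """Restore the group's previous capacity (read from the registry)."""
    autoscaling_client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=min_size,
        DesiredCapacity=desired)
```

Scaling to zero keeps the stack (and its configuration) intact while stopping the per-instance billing, which is exactly the undeploy/redeploy round-trip this issue wants to avoid.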
Add a CLI-level function to display a more detailed status of a specific project. This should be the next level of detail after the `ls` method.
A `status` function should be callable on any deployment. In addition to detailed information about the setup, if a deployment is active it should also return the time at which the `status` function was called.

Tags should be automatically generated for the CloudFormation template to help the user find/differentiate the deployments on the AWS console. Tags could include:
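For illustration, such tags might be generated like this (the tag keys are suggestions, not an existing convention):

```python
def make_stack_tags(project, version="0.1.0"):
    """Generate CloudFormation tags (Key/Value pairs) for a deployment.

    CloudFormation propagates stack-level tags to supported resources,
    so tagging the stack covers most of the deployment in one place.
    """
    return [
        {"Key": "mldeploy:project", "Value": project},
        {"Key": "mldeploy:tool-version", "Value": version},
        {"Key": "mldeploy:managed", "Value": "true"},
    ]
```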