
mldeploy's Issues

choose ec2 from requirements

Write a function that takes the CPU/GPU/memory requirements from the configuration file and translates them into the minimum viable EC2 instance type. The boto3 EC2 client's describe_instance_types() method is probably a good start.
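
A minimal sketch of how this could look, assuming current-generation instances and a simple smallest-first selection (the function name and parameters are placeholders, and GPU filtering is left out):

```python
import boto3

def choose_instance_type(min_vcpus: int, min_memory_mib: int, region: str) -> str:
    """Pick the smallest current-generation instance type meeting the minimums."""
    ec2 = boto3.client("ec2", region_name=region)
    candidates = []
    paginator = ec2.get_paginator("describe_instance_types")
    pages = paginator.paginate(
        Filters=[{"Name": "current-generation", "Values": ["true"]}]
    )
    for page in pages:
        for itype in page["InstanceTypes"]:
            vcpus = itype["VCpuInfo"]["DefaultVCpus"]
            mem = itype["MemoryInfo"]["SizeInMiB"]
            if vcpus >= min_vcpus and mem >= min_memory_mib:
                candidates.append((vcpus, mem, itype["InstanceType"]))
    if not candidates:
        raise ValueError("No instance type satisfies the requirements.")
    # Smallest (vCPU, memory) pair wins.
    return min(candidates)[2]
```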

setup sphinx

Set up the basic code for Sphinx auto-documentation of the main library (non-hidden functions). Specific CLI documentation will need to be added by hand.
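
A minimal docs/conf.py sketch for the autodoc setup (the paths and option values are assumptions, not decisions):

```python
# docs/conf.py -- a minimal sketch; project layout is assumed.
import os
import sys

sys.path.insert(0, os.path.abspath(".."))  # make the mldeploy package importable

project = "mldeploy"
extensions = [
    "sphinx.ext.autodoc",   # pull API docs from docstrings
    "sphinx.ext.napoleon",  # accept Google/NumPy docstring styles
]
# Document public members only; underscore-prefixed ("hidden") functions are skipped.
autodoc_default_options = {"members": True, "private-members": False}
```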

AWS SDK basics

Add AWS SDK basics. This includes:

  • authentication using... config?
  • creating and starting an EC2 instance
  • creating and configuring an ASG
  • IAM roles
  • API Gateway and Lambda, probably
  • CloudFormation? Generate locally and upload?
  • Teardown
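
A minimal sketch of the authentication and EC2 pieces, assuming credentials come from a named profile in ~/.aws/credentials (the profile name, AMI ID, and instance type below are all placeholders):

```python
import boto3

# Authenticate via a named profile rather than hard-coded keys.
session = boto3.Session(profile_name="mldeploy")  # profile name is an assumption
ec2 = session.resource("ec2", region_name="us-east-1")

# Create and start a single instance.
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
print(instances[0].id)  # teardown later with instances[0].terminate()
```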

setup linting

PyLint should be used here. Set up a basic config file and run linting.
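
A minimal .pylintrc sketch as a starting point (these particular settings are suggestions, not decisions):

```ini
# .pylintrc -- a minimal starting configuration
[MASTER]
ignore=docs

[MESSAGES CONTROL]
disable=missing-module-docstring

[FORMAT]
max-line-length=100
```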

cloudformation master template

Current thinking is to have a single, monolithic CloudFormation file that describes the entire deployment, including:

  • IAM roles
  • Security groups
  • Storage (S3, RDS)
  • Cluster (EC2, Fargate)
  • API Gateway setup
  • Lambda pre-deployment setup
  • SQS
  • ECR repository
  • VPC, public and private subnets
  • IGW, NAT GW, ALB?

Probably more...
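
A skeletal YAML sketch of what the monolithic template could start from; the resource names and the subset of resources shown are placeholders, not the final design:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: mldeploy master deployment template
Resources:
  DeploymentVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
  TaskQueue:
    Type: AWS::SQS::Queue
  ModelBucket:
    Type: AWS::S3::Bucket
  ImageRepository:
    Type: AWS::ECR::Repository
  # ...IAM roles, subnets, IGW/NAT, ALB, cluster, API Gateway, Lambda, etc.
```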

get field returns none if empty string

The utils._get_field_if_exists() function currently only returns '(None)' if the field does not exist. However, when we 'undeploy' something we end up setting the stack name and ID to an empty string (we do not remove the field). The function needs to return the same '(None)' string when empty strings are found as well.
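
A sketch of the proposed fix (the real signature in utils.py may differ):

```python
def _get_field_if_exists(registry: dict, field: str) -> str:
    """Return the field's value, or '(None)' when it is missing OR empty."""
    value = registry.get(field)
    if not value:  # catches both a missing field and an empty string
        return "(None)"
    return value
```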

python api for handling deployed rest api

Given that the end goal is to deploy code that will be interacted with via a REST API, it might be useful to have a set of user functions that can be called (with proper direction and API keys) to pass or retrieve data from the deployed REST API.

Is there something off-the-shelf that can do this? We can use HTTP/HTTPS, but is that efficient? Can we package it such that it becomes invisible to the user?

The end result should be part of the user's code, e.g. they simply pass a list of model configurations and the API will queue them, train them, and store the performance results.
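
A hypothetical sketch of such a user-facing wrapper over plain HTTPS; the endpoint path, payload shape, and response keys are all assumptions (only the x-api-key header is standard API Gateway usage):

```python
import requests

def submit_jobs(api_url: str, api_key: str, model_configs: list[dict]) -> list[str]:
    """Queue a list of model configurations for training; return job IDs."""
    response = requests.post(
        f"{api_url}/jobs",                    # hypothetical route
        json={"configs": model_configs},      # hypothetical payload shape
        headers={"x-api-key": api_key},       # API Gateway's standard key header
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["job_ids"]         # hypothetical response key
```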

basic tutorials

This is going to need at least two basic tutorials: one for model training, the other for "production" prediction deployment. The tutorials could use MNIST.

Training: Pass layers/nodes/activation functions as a request; training result metrics are recorded in an RDS, and trained models are stored in S3. Training data is uploaded separately to S3.

Prediction: Pass an image via REST, upload to S3, predict, return via REST.

push docker image to cloud

The Docker image that is built locally must be pushed to AWS for deployment. We will use an ECR repository to handle Docker images on AWS for now. It is possible to store Docker images directly in S3, but it is unclear how that would interact with ECS container deployment. Probably fine, but let's start with the easier/known way first.
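
A sketch of the push flow using boto3 plus the Docker SDK for Python (the use of the docker package, and the repository/tag names, are assumptions):

```python
import base64

import boto3
import docker  # Docker SDK for Python; an assumed dependency

def push_image_to_ecr(repo_name: str, tag: str, region: str) -> None:
    """Authenticate against ECR and push a locally built image."""
    ecr = boto3.client("ecr", region_name=region)
    auth = ecr.get_authorization_token()["authorizationData"][0]
    # The token decodes to "AWS:<password>".
    user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")
    registry = auth["proxyEndpoint"].removeprefix("https://")

    client = docker.from_env()
    client.login(username=user, password=password, registry=registry)
    image = client.images.get(f"{repo_name}:{tag}")
    image.tag(f"{registry}/{repo_name}", tag=tag)
    for line in client.images.push(f"{registry}/{repo_name}", tag=tag,
                                   stream=True, decode=True):
        print(line)  # stream push progress
```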

create simple stack

This is a learning exercise. A basic CloudFormation template will be set up to create an S3 bucket and an EC2 instance, and to upload a Docker image (stretch goal).

Note that the actual deployment architecture should make use of AWS Fargate to take advantage of serverless architecture, containers, and hopefully lower cost.

convert cf template to yaml

Convert the CloudFormation template creation to the YAML file format. While JSON and YAML are functionally equivalent, the CloudFormation template may end up being read and edited by human eyes, and YAML is the better format for that.

Thus we adhere to the following tenet: any configuration or data files that have the potential to be read by humans should be written as YAML. Data storage files that we do not expect to be read by people and only serve to hold registry information can be written in JSON.
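
A sketch of a one-shot conversion helper, assuming PyYAML as the dependency (note it will not produce CloudFormation's short-form intrinsics like !Ref, only the plain Fn:: forms):

```python
import json

import yaml  # PyYAML; an assumed dependency

def convert_template(json_path: str, yaml_path: str) -> None:
    """Convert an existing JSON CloudFormation template to YAML."""
    with open(json_path) as f:
        template = json.load(f)
    with open(yaml_path, "w") as f:
        # Block style and original key order keep the file human-readable.
        yaml.safe_dump(template, f, default_flow_style=False, sort_keys=False)
```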

GitHub builder

Set up GitHub/GitLab to auto-build on push to master and publish the result as a new package version.
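
A sketch of a GitHub Actions workflow for this, assuming publication to PyPI with a repository secret named PYPI_API_TOKEN (both assumptions):

```yaml
# .github/workflows/publish.yml
name: publish
on:
  push:
    branches: [master]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install build twine
      - run: python -m build
      - run: twine upload dist/*
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
```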

RDS for collecting results

If used for training, we may want to capture results (metrics) and/or model architecture. Text data could be handled in an RDS, perhaps with a backup or two (an optional redundancy parameter in setup), storing the inputs, the results, and links to S3 artifacts. Actual trained model architectures could be stored in S3.
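
A sketch of what the results schema could look like, assuming a PostgreSQL RDS instance and psycopg2 as the driver (table and column names are hypothetical):

```python
import psycopg2

RESULTS_DDL = """
CREATE TABLE IF NOT EXISTS training_results (
    job_id       TEXT PRIMARY KEY,
    config       JSONB,        -- layers/nodes/activations as submitted
    metrics      JSONB,        -- e.g. {"accuracy": 0.97, "loss": 0.08}
    model_s3_uri TEXT,         -- link to the trained model artifact in S3
    created_at   TIMESTAMPTZ DEFAULT now()
);
"""

def create_results_table(dsn: str) -> None:
    """Create the results table; the DSN would come from the project registry."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(RESULTS_DDL)
```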

start unit tests

Setting up unit testing will be a pretty big deal now. Start with the utils.py functions. Get those all sorted and working, including mocks.
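
A sketch of a first test module, assuming pytest, the package layout mldeploy/utils.py, and the _get_field_if_exists behavior described above:

```python
# tests/test_utils.py
from unittest import mock

from mldeploy import utils  # assumed package layout

def test_get_field_returns_none_marker_for_missing_and_empty_fields():
    assert utils._get_field_if_exists({}, "stack_id") == "(None)"
    assert utils._get_field_if_exists({"stack_id": ""}, "stack_id") == "(None)"

def test_aws_calls_can_be_mocked():
    # Patch boto3 so no test ever touches real AWS resources.
    with mock.patch("boto3.client") as fake_client:
        fake_client.return_value.describe_stacks.return_value = {"Stacks": []}
        assert fake_client("cloudformation").describe_stacks() == {"Stacks": []}
```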

ecs template for cf

Set up a 'canned' CloudFormation file for an ECS cluster of EC2 machines. Relevant fields (resources, regions, project names, subnet names) will be updated at build time. The result will be passed to CloudFormation for stack creation.

This template should contain:

  • the core infrastructure: private VPC, subnets, API gateway
  • the cluster: ECS EC2
  • scaling and container information
  • setup info
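
A fragment sketch of the canned template; the XX...XX strings follow the build-time marker convention proposed in "update cf template on build" below, and XXECSAMIXX is a hypothetical marker for the ECS-optimized AMI:

```yaml
Resources:
  Cluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: XXPROJECTNAMEXX-cluster
  LaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      InstanceType: XXINSTANCESIZEXX
      ImageId: XXECSAMIXX
```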

request handler code

Regardless of what model and code get copied into the Docker image, there will need to be a file, e.g. handler.py, that does the following (sketched after the list):

  • Polls SQS for 1 new task
  • Gathers required resources (data, hyperparameters) before the execution
  • Executes the task (user definition, making use of user code via imports)
  • Sends the results (metrics, files) to appropriate storage (S3, RDS)
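
A sketch of that flow; the queue URL plumbing and the run_user_task hook into user code are assumptions:

```python
# handler.py
import boto3

sqs = boto3.client("sqs")

def run_user_task(body: str) -> None:
    """Placeholder for the user's task execution code (imported in practice)."""
    raise NotImplementedError

def handle_one_task(queue_url: str) -> bool:
    """Poll SQS for one task and process it; return True if one was found."""
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    messages = resp.get("Messages", [])
    if not messages:
        return False
    msg = messages[0]
    # 1. gather resources (data, hyperparameters) for this task
    # 2. execute via the user's imported code
    run_user_task(msg["Body"])
    # 3. results (metrics -> RDS, files -> S3) would be pushed here
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return True
```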

project/stack name checker

Write a function in utils.py that checks the project name, since the project name will likely be used as the stack name and in resource naming. It should adhere to the following (see the sketch after the list):

  • 128 characters maximum
  • alphanumeric plus hyphen characters
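
A sketch of the checker. Note that CloudFormation additionally requires stack names to start with a letter, so that constraint is included here as an assumption worth enforcing:

```python
import re

# Letter first, then up to 127 more alphanumeric/hyphen characters (128 total).
_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]{0,127}$")

def check_project_name(name: str) -> bool:
    """Return True if the name is a valid project/stack name."""
    return bool(_NAME_PATTERN.match(name))
```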

save docker logs

Create a folder and store the logs from each Docker build in a separate file. Will need to figure out how to parse the JSON generator output and save it to file. TXT, maybe?
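
A sketch using the Docker SDK's low-level build API, which yields JSON-decoded progress chunks when decode=True (the log directory layout is an assumption):

```python
import json
from pathlib import Path

import docker

def build_and_log(path: str, tag: str, log_dir: str = "logs") -> None:
    """Build an image and write each progress chunk as one JSON line."""
    client = docker.APIClient()
    Path(log_dir).mkdir(exist_ok=True)
    log_path = Path(log_dir) / f"build_{tag.replace(':', '_')}.txt"
    with open(log_path, "w") as log_file:
        for chunk in client.build(path=path, tag=tag, decode=True):
            log_file.write(json.dumps(chunk) + "\n")
```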

command line cost estimator

Calling mldeploy cost project-name will give an estimate of the current deployment cost: the base price (minimum instances) and the price per increment (each additional instance).
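
A sketch of the underlying arithmetic with an illustrative hard-coded price table; real prices would come from the AWS Pricing API or a bundled lookup:

```python
# Illustrative us-east-1 on-demand prices in USD/hour, not authoritative.
HOURLY_PRICE = {"t3.medium": 0.0416, "m5.large": 0.096}

def estimate_cost(instance_type: str, min_instances: int) -> tuple[float, float]:
    """Return (base monthly cost, monthly cost per additional instance)."""
    per_instance_month = HOURLY_PRICE[instance_type] * 24 * 30
    return min_instances * per_instance_month, per_instance_month
```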

create ecs task definition template

Given that the desired result will be somewhat stock, an ECS task definition template needs to be created in JSON, with details (container, resources, ports) to be configured at project build time.
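
A sketch of such a template, reusing the XX...XX marker convention (XXECRIMAGEURIXX is a hypothetical marker, and the CPU/memory/port values are placeholders):

```json
{
  "family": "XXPROJECTNAMEXX",
  "containerDefinitions": [
    {
      "name": "XXPROJECTNAMEXX-worker",
      "image": "XXECRIMAGEURIXX",
      "cpu": 256,
      "memory": 512,
      "essential": true,
      "portMappings": [{"containerPort": 80, "hostPort": 80}]
    }
  ]
}
```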

polling SQS via docker Python script

Let's use Python because... Python. Every container will start up and immediately execute a run_mldeploy.py file (or some other suitable name). This file needs to (see the sketch after the list):

  • Get the SQS queue name
  • Forever loop through polling the queue for a single job and then:
    • executing that job if found, then deleting the job from the queue upon completion
    • waiting some time (10s, 30s, 5 minutes?) then polling again
  • If something interrupts this flow and causes an exit, the instance needs to be reported as unhealthy so a new one can replace it and start working.

Custom health status: https://docs.aws.amazon.com/autoscaling/ec2/userguide/healthcheck.html
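
A sketch of the entry point; handle_one_task is the hypothetical handler from the "request handler code" issue above, and the queue name is a placeholder resolved at build time:

```python
# run_mldeploy.py
import time

import boto3
import requests

from handler import handle_one_task  # hypothetical module from the issue above

def main() -> None:
    # Read this instance's ID from the EC2 instance metadata service.
    instance_id = requests.get(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ).text
    queue_url = boto3.client("sqs").get_queue_url(
        QueueName="mldeploy-tasks"  # placeholder queue name
    )["QueueUrl"]
    try:
        while True:
            if not handle_one_task(queue_url):
                time.sleep(10)  # poll interval TBD (10s? 30s? 5 min?)
    except Exception:
        # Report this instance unhealthy so the ASG replaces it
        # (see the custom health status link above).
        boto3.client("autoscaling").set_instance_health(
            InstanceId=instance_id, HealthStatus="Unhealthy"
        )
        raise

if __name__ == "__main__":
    main()
```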

salt function for names

Write a "salting" function to add a randomly generated 8 (4?) character alphanumeric string (or X characters) to be added to the end of every resource name to ensure the names are unique within a region.

update cf template on build

When calling build (or possibly deploy), internal functions will check the config file for the required EC2 instance sizes, cluster setup, project name, scaling, API setup, and so on. These values will be inserted into the YAML file via text search-and-replace. The template should contain easily searchable markers, e.g. XXPROJECTNAMEXX, XXINSTANCESIZEXX.
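
A sketch of the substitution step, assuming the config keys shown (the key names are placeholders):

```python
def fill_template(template_text: str, config: dict) -> str:
    """Replace the XX...XX markers with values from the project config."""
    replacements = {
        "XXPROJECTNAMEXX": config["project_name"],   # assumed config key
        "XXINSTANCESIZEXX": config["instance_type"], # assumed config key
    }
    for marker, value in replacements.items():
        template_text = template_text.replace(marker, value)
    return template_text
```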

query results from cli

It may be useful to be able to query the results RDS from the CLI. Something like mldeploy query project-name "SQL SELECT STRING" could be used. We have boto3 and all the needed references and authentication. It would at least be a quick way to view the top results without logging into the AWS console or opening another DB program.
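
A sketch of the CLI hook, assuming Click as the CLI framework; connect_to_project_rds is a hypothetical helper that would resolve the project's RDS endpoint from the registry:

```python
import click

def connect_to_project_rds(project_name: str):
    """Hypothetical: look up the project registry and return a DB connection."""
    raise NotImplementedError

@click.command()
@click.argument("project_name")
@click.argument("sql")
def query(project_name: str, sql: str) -> None:
    """Run a read-only SQL query against the project's results RDS."""
    conn = connect_to_project_rds(project_name)
    with conn, conn.cursor() as cur:
        cur.execute(sql)
        for row in cur.fetchall():
            click.echo(row)
```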

add hibernate function

A Fargate cluster will automatically hibernate when there are no more jobs, but an EC2 cluster will not. A hibernate function should be callable on an active deployment, and should trigger:

  1. EC2: ASG desired instances set to zero.
  2. Fargate: Set the service's desired task count to zero, even if jobs are still running.
    This should take effect regardless of whether the SQS queue still holds jobs.

Additionally, a resume or thaw or wakeup function is required to restart execution. This functionality will likely come in handy so users don't have to undeploy and redeploy the same deployment every time they need to take a break.
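
A sketch of the two hibernation paths; the cluster/service/ASG naming scheme is an assumption:

```python
import boto3

def hibernate(project_name: str, fargate: bool = False) -> None:
    """Scale the deployment to zero without tearing it down."""
    if fargate:
        ecs = boto3.client("ecs")
        ecs.update_service(
            cluster=f"{project_name}-cluster",   # assumed naming scheme
            service=f"{project_name}-service",
            desiredCount=0,  # stop all tasks even if SQS still holds jobs
        )
    else:
        asg = boto3.client("autoscaling")
        asg.set_desired_capacity(
            AutoScalingGroupName=f"{project_name}-asg", DesiredCapacity=0
        )
```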

add project status function

Add a CLI-level function to display a more detailed status of a specific project. This should be the next level of detail after the ls method.

add status function

A status function should be callable on any deployment. In addition to detailed information about the setup, if a deployment is active it should also return:

  1. Remaining queue size
  2. Number of containers running
  3. Maximum allowed containers
  4. Number of jobs processed
  5. Average time per job
  6. Deployment uptime
  7. Time remaining until all jobs are processed
  8. Estimate of processing costs (EC2 or Fargate usage x uptime), and estimated total cost including remaining jobs
  9. Deployment environment
  10. Deployment status: active, hibernating, not deployed
  11. API handle, key
  12. Timestamp of when the status function was called
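
A sketch of gathering two of the items above; the SQS attribute and ECS field names are real API values, while everything project-specific is assumed:

```python
import boto3

def queue_size(queue_url: str) -> int:
    """Item 1: remaining queue size."""
    sqs = boto3.client("sqs")
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["ApproximateNumberOfMessages"]
    )
    return int(attrs["Attributes"]["ApproximateNumberOfMessages"])

def running_containers(cluster: str, service: str) -> int:
    """Item 2: number of containers currently running."""
    ecs = boto3.client("ecs")
    resp = ecs.describe_services(cluster=cluster, services=[service])
    return resp["services"][0]["runningCount"]
```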

add relevant tags to cloudformation

Tags should be automatically generated for the CloudFormation template to help the user find/differentiate the deployments on the AWS console. Tags could include:

  • environment: prod, test, dev
  • deployment tool: mldeploy
  • version: of template?
  • type of architecture: lambda, ASG, Kubernetes
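
A sketch of the default tag set in the Key/Value shape CloudFormation expects; the tag keys themselves are suggestions:

```python
def default_tags(environment: str, architecture: str, version: str) -> list[dict]:
    """Build the tag list merged into the template at build time."""
    return [
        {"Key": "environment", "Value": environment},     # prod, test, dev
        {"Key": "deployment-tool", "Value": "mldeploy"},
        {"Key": "mldeploy-version", "Value": version},
        {"Key": "architecture", "Value": architecture},   # lambda, ASG, ...
    ]
```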
