
mldeploy's Issues

choose ec2 from requirements

Write a function that takes the CPU/GPU/memory requirements from the configuration file and translates them into the minimum viable EC2 instance type. The boto3 EC2 client's describe_instance_types() method is probably a good start.
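
A minimal sketch of how this could look, assuming current-generation instances and a simple smallest-first selection (the function name and parameters are placeholders, and GPU filtering is left out):

```python
import boto3

def choose_instance_type(min_vcpus: int, min_memory_mib: int, region: str) -> str:
    """Pick the smallest current-generation instance type meeting the minimums."""
    ec2 = boto3.client("ec2", region_name=region)
    candidates = []
    paginator = ec2.get_paginator("describe_instance_types")
    pages = paginator.paginate(
        Filters=[{"Name": "current-generation", "Values": ["true"]}]
    )
    for page in pages:
        for itype in page["InstanceTypes"]:
            vcpus = itype["VCpuInfo"]["DefaultVCpus"]
            mem = itype["MemoryInfo"]["SizeInMiB"]
            if vcpus >= min_vcpus and mem >= min_memory_mib:
                candidates.append((vcpus, mem, itype["InstanceType"]))
    if not candidates:
        raise ValueError("No instance type satisfies the requirements.")
    # Smallest (vCPU, memory) pair wins.
    return min(candidates)[2]
```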

setup sphinx

Set up the basic code for Sphinx auto-documentation of the main library (non-hidden functions). Specific CLI documentation will need to be added by hand.
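
A minimal docs/conf.py sketch for the autodoc setup (the paths and option values are assumptions, not decisions):

```python
# docs/conf.py -- a minimal sketch; project layout is assumed.
import os
import sys

sys.path.insert(0, os.path.abspath(".."))  # make the mldeploy package importable

project = "mldeploy"
extensions = [
    "sphinx.ext.autodoc",   # pull API docs from docstrings
    "sphinx.ext.napoleon",  # accept Google/NumPy docstring styles
]
# Document public members only; underscore-prefixed ("hidden") functions are skipped.
autodoc_default_options = {"members": True, "private-members": False}
```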

AWS SDK basics

Add AWS SDK basics. This includes:

  • authentication using... config?
  • creating and starting an EC2 instance
  • creating and configuring an ASG
  • IAM roles
  • API Gateway and Lambda, probably
  • CloudFormation? Generate locally and upload?
  • Teardown
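
A minimal sketch of the authentication and EC2 pieces, assuming credentials come from a named profile in ~/.aws/credentials (the profile name, AMI ID, and instance type below are all placeholders):

```python
import boto3

# Authenticate via a named profile rather than hard-coded keys.
session = boto3.Session(profile_name="mldeploy")  # profile name is an assumption
ec2 = session.resource("ec2", region_name="us-east-1")

# Create and start a single instance.
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
print(instances[0].id)  # teardown later with instances[0].terminate()
```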

setup linting

PyLint should be used here. Set up a basic config file and run linting.
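
A minimal .pylintrc sketch as a starting point (these particular settings are suggestions, not decisions):

```ini
# .pylintrc -- a minimal starting configuration
[MASTER]
ignore=docs

[MESSAGES CONTROL]
disable=missing-module-docstring

[FORMAT]
max-line-length=100
```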

cloudformation master template

Current thinking is to have a single, monolithic CloudFormation file that describes the entire deployment, including:

  • IAM roles
  • Security groups
  • Storage (S3, RDS)
  • Cluster (EC2, Fargate)
  • API Gateway setup
  • Lambda pre-deployment setup
  • SQS
  • ECR repository
  • VPC, public and private subnets
  • IGW, NAT GW, ALB?

Probably more...
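
A skeletal YAML sketch of what the monolithic template could start from; the resource names and the subset of resources shown are placeholders, not the final design:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: mldeploy master deployment template
Resources:
  DeploymentVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
  TaskQueue:
    Type: AWS::SQS::Queue
  ModelBucket:
    Type: AWS::S3::Bucket
  ImageRepository:
    Type: AWS::ECR::Repository
  # ...IAM roles, subnets, IGW/NAT, ALB, cluster, API Gateway, Lambda, etc.
```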

get field returns none if empty string

The utils._get_field_if_exists() function currently only returns '(None)' if the field does not exist. However, when we 'undeploy' something we end up setting the stack name and ID to an empty string (we do not remove the field). The function needs to return the same '(None)' string when empty strings are found as well.
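
A sketch of the proposed fix (the real signature in utils.py may differ):

```python
def _get_field_if_exists(registry: dict, field: str) -> str:
    """Return the field's value, or '(None)' when it is missing OR empty."""
    value = registry.get(field)
    if not value:  # catches both a missing field and an empty string
        return "(None)"
    return value
```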

python api for handling deployed rest api

Given that the end goal is to deploy code that will be interacted with via a REST API, it might be useful to have a set of user functions that can be called (with proper direction and API keys) to pass or retrieve data from the deployed REST API.

Is there something off-the-shelf that can do this? We can use HTTP/HTTPS, but is that efficient? Can we package it such that it becomes invisible to the user?

The end result should be part of the user's code, e.g. they simply pass a list of model configurations and the API will queue them, train them, and store the performance results.
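
A hypothetical sketch of such a user-facing wrapper over plain HTTPS; the endpoint path, payload shape, and response keys are all assumptions (only the x-api-key header is standard API Gateway usage):

```python
import requests

def submit_jobs(api_url: str, api_key: str, model_configs: list[dict]) -> list[str]:
    """Queue a list of model configurations for training; return job IDs."""
    response = requests.post(
        f"{api_url}/jobs",                    # hypothetical route
        json={"configs": model_configs},      # hypothetical payload shape
        headers={"x-api-key": api_key},       # API Gateway's standard key header
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["job_ids"]         # hypothetical response key
```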

basic tutorials

This is going to need at least two basic tutorials: one for model training, the other for "production" prediction deployment. The tutorials could use MNIST.

Training: Pass layers/nodes/activation functions as a request; training result metrics are recorded in an RDS, and trained models are stored in S3. Training data is uploaded separately to S3.

Prediction: Pass an image via REST, upload to S3, predict, return via REST.

push docker image to cloud

The Docker image that is built locally must be pushed to AWS for deployment. We will use an ECR repository to handle Docker images on AWS for now. It is possible to store Docker images directly in S3, but it is unclear how that would interact with ECS container deployment. Probably fine, but let's start with the easier/known way first.
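
A sketch of the push flow using boto3 plus the Docker SDK for Python (the use of the docker package, and the repository/tag names, are assumptions):

```python
import base64

import boto3
import docker  # Docker SDK for Python; an assumed dependency

def push_image_to_ecr(repo_name: str, tag: str, region: str) -> None:
    """Authenticate against ECR and push a locally built image."""
    ecr = boto3.client("ecr", region_name=region)
    auth = ecr.get_authorization_token()["authorizationData"][0]
    # The token decodes to "AWS:<password>".
    user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")
    registry = auth["proxyEndpoint"].removeprefix("https://")

    client = docker.from_env()
    client.login(username=user, password=password, registry=registry)
    image = client.images.get(f"{repo_name}:{tag}")
    image.tag(f"{registry}/{repo_name}", tag=tag)
    for line in client.images.push(f"{registry}/{repo_name}", tag=tag,
                                   stream=True, decode=True):
        print(line)  # stream push progress
```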

create simple stack

This is a learning exercise. A basic CloudFormation template will be set up to create an S3 bucket and an EC2 instance, and to upload a Docker image (stretch goal).

Note that the actual deployment architecture should make use of AWS Fargate to take advantage of serverless architecture, containers, and hopefully lower cost.

convert cf template to yaml

Convert the CloudFormation template creation to the YAML file format. While JSON and YAML are functionally equivalent, the CloudFormation template may end up being read and edited by human eyes, and YAML is the better format for that.

Thus we adhere to the following tenet: any configuration or data files that have the potential to be read by humans should be written as YAML. Data storage files that we do not expect to be read by people and only serve to hold registry information can be written in JSON.
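
A sketch of a one-shot conversion helper, assuming PyYAML as the dependency (note it will not produce CloudFormation's short-form intrinsics like !Ref, only the plain Fn:: forms):

```python
import json

import yaml  # PyYAML; an assumed dependency

def convert_template(json_path: str, yaml_path: str) -> None:
    """Convert an existing JSON CloudFormation template to YAML."""
    with open(json_path) as f:
        template = json.load(f)
    with open(yaml_path, "w") as f:
        # Block style and original key order keep the file human-readable.
        yaml.safe_dump(template, f, default_flow_style=False, sort_keys=False)
```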

GitHub builder

Set up GitHub/GitLab to auto-build on push to master and publish the result as a new package version.
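
A sketch of a GitHub Actions workflow for this, assuming publication to PyPI with a repository secret named PYPI_API_TOKEN (both assumptions):

```yaml
# .github/workflows/publish.yml
name: publish
on:
  push:
    branches: [master]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install build twine
      - run: python -m build
      - run: twine upload dist/*
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
```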

RDS for collecting results

If used for training, we may want to capture results (metrics) and/or model architecture. Text data could be handled in an RDS, perhaps with a backup or two (an optional redundancy parameter in setup), storing the inputs, the results, and links to S3 artifacts. Actual trained model architectures could be stored in S3.
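
A sketch of what the results schema could look like, assuming a PostgreSQL RDS instance and psycopg2 as the driver (table and column names are hypothetical):

```python
import psycopg2

RESULTS_DDL = """
CREATE TABLE IF NOT EXISTS training_results (
    job_id       TEXT PRIMARY KEY,
    config       JSONB,        -- layers/nodes/activations as submitted
    metrics      JSONB,        -- e.g. {"accuracy": 0.97, "loss": 0.08}
    model_s3_uri TEXT,         -- link to the trained model artifact in S3
    created_at   TIMESTAMPTZ DEFAULT now()
);
"""

def create_results_table(dsn: str) -> None:
    """Create the results table; the DSN would come from the project registry."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(RESULTS_DDL)
```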

start unit tests

Setting up unit testing will be a pretty big deal now. Start with the utils.py functions. Get those all sorted and working, including mocks.
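
A sketch of a first test module, assuming pytest, the package layout mldeploy/utils.py, and the _get_field_if_exists behavior described above:

```python
# tests/test_utils.py
from unittest import mock

from mldeploy import utils  # assumed package layout

def test_get_field_returns_none_marker_for_missing_and_empty_fields():
    assert utils._get_field_if_exists({}, "stack_id") == "(None)"
    assert utils._get_field_if_exists({"stack_id": ""}, "stack_id") == "(None)"

def test_aws_calls_can_be_mocked():
    # Patch boto3 so no test ever touches real AWS resources.
    with mock.patch("boto3.client") as fake_client:
        fake_client.return_value.describe_stacks.return_value = {"Stacks": []}
        assert fake_client("cloudformation").describe_stacks() == {"Stacks": []}
```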

ecs template for cf

Set up a 'canned' CloudFormation file for an ECS cluster of EC2 machines. Relevant fields (resources, regions, project names, subnet names) will be updated at build time. The result will be passed to CloudFormation for stack creation.

This template should contain:

  • the core infrastructure: private VPC, subnets, API gateway
  • the cluster: ECS EC2
  • scaling and container information
  • setup info
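
A fragment sketch of the canned template; the XX...XX strings follow the build-time marker convention proposed in "update cf template on build" below, and XXECSAMIXX is a hypothetical marker for the ECS-optimized AMI:

```yaml
Resources:
  Cluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: XXPROJECTNAMEXX-cluster
  LaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      InstanceType: XXINSTANCESIZEXX
      ImageId: XXECSAMIXX
```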

request handler code

Regardless of what model and code get copied into the Docker image, there will need to be a file, e.g. handler.py, that does the following (sketched after the list):

  • Polls SQS for 1 new task
  • Gathers required resources (data, hyperparameters) before the execution
  • Executes the task (user definition, making use of user code via imports)
  • Sends the results (metrics, files) to appropriate storage (S3, RDS)
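
A sketch of that flow; the queue URL plumbing and the run_user_task hook into user code are assumptions:

```python
# handler.py
import boto3

sqs = boto3.client("sqs")

def run_user_task(body: str) -> None:
    """Placeholder for the user's task execution code (imported in practice)."""
    raise NotImplementedError

def handle_one_task(queue_url: str) -> bool:
    """Poll SQS for one task and process it; return True if one was found."""
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    messages = resp.get("Messages", [])
    if not messages:
        return False
    msg = messages[0]
    # 1. gather resources (data, hyperparameters) for this task
    # 2. execute via the user's imported code
    run_user_task(msg["Body"])
    # 3. results (metrics -> RDS, files -> S3) would be pushed here
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return True
```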

project/stack name checker

Write a function in utils.py that checks the project name, since the project name will likely be used as the stack name and in resource naming. It should adhere to the following (see the sketch after the list):

  • 128 characters maximum
  • alphanumeric plus hyphen characters
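
A sketch of the checker. Note that CloudFormation additionally requires stack names to start with a letter, so that constraint is included here as an assumption worth enforcing:

```python
import re

# Letter first, then up to 127 more alphanumeric/hyphen characters (128 total).
_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]{0,127}$")

def check_project_name(name: str) -> bool:
    """Return True if the name is a valid project/stack name."""
    return bool(_NAME_PATTERN.match(name))
```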

save docker logs

Create a folder and store the logs from each Docker build in a separate file. Will need to figure out how to parse the JSON generator output and save it to file. TXT, maybe?
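
A sketch using the Docker SDK's low-level build API, which yields JSON-decoded progress chunks when decode=True (the log directory layout is an assumption):

```python
import json
from pathlib import Path

import docker

def build_and_log(path: str, tag: str, log_dir: str = "logs") -> None:
    """Build an image and write each progress chunk as one JSON line."""
    client = docker.APIClient()
    Path(log_dir).mkdir(exist_ok=True)
    log_path = Path(log_dir) / f"build_{tag.replace(':', '_')}.txt"
    with open(log_path, "w") as log_file:
        for chunk in client.build(path=path, tag=tag, decode=True):
            log_file.write(json.dumps(chunk) + "\n")
```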

command line cost estimator

Calling mldeploy cost project-name will give an estimate of the current deployment cost: the base price (minimum instances) and the price per increment (each additional instance).
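
A sketch of the underlying arithmetic with an illustrative hard-coded price table; real prices would come from the AWS Pricing API or a bundled lookup:

```python
# Illustrative us-east-1 on-demand prices in USD/hour, not authoritative.
HOURLY_PRICE = {"t3.medium": 0.0416, "m5.large": 0.096}

def estimate_cost(instance_type: str, min_instances: int) -> tuple[float, float]:
    """Return (base monthly cost, monthly cost per additional instance)."""
    per_instance_month = HOURLY_PRICE[instance_type] * 24 * 30
    return min_instances * per_instance_month, per_instance_month
```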

create ecs task definition template

Given that the desired result will be somewhat stock, an ECS task definition template needs to be created in JSON, with details (container, resources, ports) to be configured at project build time.
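
A sketch of such a template, reusing the XX...XX marker convention (XXECRIMAGEURIXX is a hypothetical marker, and the CPU/memory/port values are placeholders):

```json
{
  "family": "XXPROJECTNAMEXX",
  "containerDefinitions": [
    {
      "name": "XXPROJECTNAMEXX-worker",
      "image": "XXECRIMAGEURIXX",
      "cpu": 256,
      "memory": 512,
      "essential": true,
      "portMappings": [{"containerPort": 80, "hostPort": 80}]
    }
  ]
}
```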

polling SQS via docker Python script

Let's use Python because... Python. Every container will start up and immediately execute a run_mldeploy.py file (or some other suitable name). This file needs to (see the sketch after the list):

  • Get the SQS queue name
  • Forever loop through polling the queue for a single job and then:
    • executing that job if found, then deleting the job from the queue upon completion
    • waiting some time (10s, 30s, 5 minutes?) then polling again
  • If something interrupts this flow and causes an exit, the instance needs to be reported as unhealthy so a new one can replace it and start working.

Custom health status: https://docs.aws.amazon.com/autoscaling/ec2/userguide/healthcheck.html
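
A sketch of the entry point; handle_one_task is the hypothetical handler from the "request handler code" issue above, and the queue name is a placeholder resolved at build time:

```python
# run_mldeploy.py
import time

import boto3
import requests

from handler import handle_one_task  # hypothetical module from the issue above

def main() -> None:
    # Read this instance's ID from the EC2 instance metadata service.
    instance_id = requests.get(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ).text
    queue_url = boto3.client("sqs").get_queue_url(
        QueueName="mldeploy-tasks"  # placeholder queue name
    )["QueueUrl"]
    try:
        while True:
            if not handle_one_task(queue_url):
                time.sleep(10)  # poll interval TBD (10s? 30s? 5 min?)
    except Exception:
        # Report this instance unhealthy so the ASG replaces it
        # (see the custom health status link above).
        boto3.client("autoscaling").set_instance_health(
            InstanceId=instance_id, HealthStatus="Unhealthy"
        )
        raise

if __name__ == "__main__":
    main()
```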

salt function for names

Write a "salting" function to add a randomly generated 8 (4?) character alphanumeric string (or X characters) to be added to the end of every resource name to ensure the names are unique within a region.

update cf template on build

When calling build (or possibly deploy), internal functions will check the config file for the required EC2 instance sizes, cluster setup, project name, scaling, API setup, and so on. These values will be inserted into the YAML file via text search-and-replace. The template should contain easily searchable markers, e.g. XXPROJECTNAMEXX, XXINSTANCESIZEXX.
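
A sketch of the substitution step, assuming the config keys shown (the key names are placeholders):

```python
def fill_template(template_text: str, config: dict) -> str:
    """Replace the XX...XX markers with values from the project config."""
    replacements = {
        "XXPROJECTNAMEXX": config["project_name"],   # assumed config key
        "XXINSTANCESIZEXX": config["instance_type"], # assumed config key
    }
    for marker, value in replacements.items():
        template_text = template_text.replace(marker, value)
    return template_text
```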

query results from cli

It may be useful to be able to query the results RDS from the CLI. Something like mldeploy query project-name "SQL SELECT STRING" could be used. We have boto3 and all the needed references and authentication. It would at least be a quick way to view the top results without logging into the AWS console or opening another DB program.
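
A sketch of the CLI hook, assuming Click as the CLI framework; connect_to_project_rds is a hypothetical helper that would resolve the project's RDS endpoint from the registry:

```python
import click

def connect_to_project_rds(project_name: str):
    """Hypothetical: look up the project registry and return a DB connection."""
    raise NotImplementedError

@click.command()
@click.argument("project_name")
@click.argument("sql")
def query(project_name: str, sql: str) -> None:
    """Run a read-only SQL query against the project's results RDS."""
    conn = connect_to_project_rds(project_name)
    with conn, conn.cursor() as cur:
        cur.execute(sql)
        for row in cur.fetchall():
            click.echo(row)
```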

add hibernate function

A Fargate cluster will automatically hibernate when there are no more jobs, but an EC2 cluster will not. A hibernate function should be callable on an active deployment, and should trigger:

  1. EC2: ASG desired instances set to zero.
  2. Fargate: Set the service's desired task count to zero, even if jobs are still running.
    This should take effect regardless of whether the SQS queue still holds jobs.

Additionally, a resume or thaw or wakeup function is required to restart execution. This functionality will likely come in handy so users don't have to undeploy and redeploy the same deployment every time they need to take a break.
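
A sketch of the two hibernation paths; the cluster/service/ASG naming scheme is an assumption:

```python
import boto3

def hibernate(project_name: str, fargate: bool = False) -> None:
    """Scale the deployment to zero without tearing it down."""
    if fargate:
        ecs = boto3.client("ecs")
        ecs.update_service(
            cluster=f"{project_name}-cluster",   # assumed naming scheme
            service=f"{project_name}-service",
            desiredCount=0,  # stop all tasks even if SQS still holds jobs
        )
    else:
        asg = boto3.client("autoscaling")
        asg.set_desired_capacity(
            AutoScalingGroupName=f"{project_name}-asg", DesiredCapacity=0
        )
```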

add project status function

Add a CLI-level function to display a more detailed status of a specific project. This should be the next level of detail after the ls method.

add status function

A status function should be callable on any deployment. In addition to detailed information about the setup, if a deployment is active it should also return:

  1. Remaining queue size
  2. Number of containers running
  3. Maximum allowed containers
  4. Number of jobs processed
  5. Average time per job
  6. Deployment uptime
  7. Time remaining until all jobs are processed
  8. Estimate of processing costs (EC2 or Fargate usage x uptime), and estimated total cost including remaining jobs
  9. Deployment environment
  10. Deployment status: active, hibernating, not deployed
  11. API handle, key
  12. Timestamp of when the status function was called
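
A sketch of gathering two of the items above; the SQS attribute and ECS field names are real API values, while everything project-specific is assumed:

```python
import boto3

def queue_size(queue_url: str) -> int:
    """Item 1: remaining queue size."""
    sqs = boto3.client("sqs")
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["ApproximateNumberOfMessages"]
    )
    return int(attrs["Attributes"]["ApproximateNumberOfMessages"])

def running_containers(cluster: str, service: str) -> int:
    """Item 2: number of containers currently running."""
    ecs = boto3.client("ecs")
    resp = ecs.describe_services(cluster=cluster, services=[service])
    return resp["services"][0]["runningCount"]
```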

add relevant tags to cloudformation

Tags should be automatically generated for the CloudFormation template to help the user find/differentiate the deployments on the AWS console. Tags could include:

  • environment: prod, test, dev
  • deployment tool: mldeploy
  • version: of template?
  • type of architecture: lambda, ASG, Kubernetes
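
A sketch of the default tag set in the Key/Value shape CloudFormation expects; the tag keys themselves are suggestions:

```python
def default_tags(environment: str, architecture: str, version: str) -> list[dict]:
    """Build the tag list merged into the template at build time."""
    return [
        {"Key": "environment", "Value": environment},     # prod, test, dev
        {"Key": "deployment-tool", "Value": "mldeploy"},
        {"Key": "mldeploy-version", "Value": version},
        {"Key": "architecture", "Value": architecture},   # lambda, ASG, ...
    ]
```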
