Status - Work in progress -- Non-Functional
This project is an experiment in designing custom data science workbenches on AWS SageMaker.
The goals of the project are as follows:
- Demonstrate how to loosely couple data engineering and modelling.
- Illustrate how to train a combination of SageMaker and bespoke models.
- Perform model selection using a flexible independent model comparison Notebook.
- Deploy a chosen model.
We achieve this with a combination of convention, configuration, and prebuilt applications built around the following requirements:
- Data is partitioned by an independent job, and that partitioning should be respected by all models.
- Models are then built independently according to the data scientist's ideas and requirements.
- Models are deployed to an endpoint and registered in order to permit comparison
- Comparison is performed using these endpoints on independent data.
- After selection and final deployment, all artefacts are cleaned to reduce costs.
- Overall data partitioning is done once to enforce rigorous comparison of methods.
- All models load their training data through the dataset utility functions (a minimal sketch follows this list).
- All experiments should be performed inside an independent directory below experiments
- Completed models need to be deployed and registered using the models utility functions
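As a minimal sketch of what the data-loading convention might look like inside an experiment (the bucket name, prefix, and helper function below are hypothetical illustrations, not the repository's actual dataset utilities):

```python
# Hypothetical sketch of the data-loading convention: every experiment reads the same
# pre-partitioned splits instead of re-splitting the raw data itself.
# Reading s3:// paths with pandas requires the s3fs package.
import pandas as pd

BUCKET = "my-workbench-bucket"   # assumption: replace with your project bucket
PREFIX = "data/partitions"       # assumption: prefix written by the partitioning job


def load_split(split: str) -> pd.DataFrame:
    """Load one of the shared splits ('train', 'validation' or 'test') from S3."""
    return pd.read_csv(f"s3://{BUCKET}/{PREFIX}/{split}.csv")


train_df = load_split("train")
validation_df = load_split("validation")
```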
Clone this repository into an instance of SageMaker Studio.
There are then two usage pathways you can follow: the GUI/Notebook Workflow and the Script Workflow. Both rely on the same underlying scripts and configuration.
Follow the Notebook data/prepare_data.ipynb to understand how we get the data and prepare it for modelling.
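As a rough sketch of the kind of one-off partitioning the notebook is responsible for (the file names, split proportions, and S3 prefix are assumptions for illustration, not necessarily what the notebook does):

```python
# Sketch of a one-off train/validation/test partition, performed once so that every
# model is trained and compared on exactly the same splits.
import pandas as pd
import sagemaker
from sklearn.model_selection import train_test_split

raw = pd.read_csv("raw_data.csv")  # assumption: raw extract produced earlier in the notebook
train, rest = train_test_split(raw, test_size=0.3, random_state=42)
validation, test = train_test_split(rest, test_size=0.5, random_state=42)

session = sagemaker.Session()
for name, df in [("train", train), ("validation", validation), ("test", test)]:
    df.to_csv(f"{name}.csv", index=False)
    session.upload_data(f"{name}.csv", key_prefix="data/partitions")  # uploads to the default bucket
```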
Examples of modelling approaches are shown in the experiments directory.
The proposed flow is as follows:
- Build a Simple Baseline - Using a scikit-learn script.
- Build an XGBoost Model - Using a pre-built training job container (see the sketch after this list).
- Run an Autopilot Job
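As an example of the second step, here is a hedged sketch of training XGBoost with the SageMaker Python SDK and the pre-built container (the S3 locations, instance types, and hyperparameters are assumptions, not the values used by the experiment scripts):

```python
# Sketch of training the XGBoost step with SageMaker's pre-built container.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()   # works inside SageMaker Studio
bucket = session.default_bucket()

xgb = Estimator(
    image_uri=image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/experiments/xgboost/output",  # assumption
    sagemaker_session=session,
)
xgb.set_hyperparameters(objective="binary:logistic", num_round=100)  # assumption: binary target

xgb.fit({
    "train": TrainingInput(f"s3://{bucket}/data/partitions/train.csv", content_type="text/csv"),
    "validation": TrainingInput(f"s3://{bucket}/data/partitions/validation.csv", content_type="text/csv"),
})
```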
With these models built, we can then explore their performance.
The Model Comparisons Notebook allows you to compare any model that has been built following the conventions shown in the experiments section.
This notebook makes extensive use of configuration and GUI widgets, so you can return and perform further comparisons after additional models have been run.
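To illustrate how such a comparison can work, the sketch below scores registered endpoints on the shared test split (the endpoint names, label column, and CSV payload format are assumptions):

```python
# Sketch of invoking each registered endpoint on the held-out test split for comparison.
import boto3
import pandas as pd

runtime = boto3.client("sagemaker-runtime")
test_df = pd.read_csv("test.csv")  # assumption: the shared test partition
payload = test_df.drop(columns=["target"]).to_csv(header=False, index=False)  # assumption: 'target' is the label column

for endpoint_name in ["baseline-sklearn", "xgboost-experiment"]:  # hypothetical registered endpoints
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    predictions = response["Body"].read().decode("utf-8")
    print(endpoint_name, predictions[:100])  # compare predictions against test_df["target"]
```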
The Deployment Notebook demonstrates how to select any of the models built and create an endpoint. In some instances, additional configuration is required to add pre-processing to the endpoint.
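A minimal sketch of what that deployment might look like with the SageMaker SDK, assuming an XGBoost artefact in S3 (the image version, artefact location, and endpoint name are placeholders, and any pre-processing steps are omitted):

```python
# Sketch of deploying a chosen model artefact to a real-time endpoint.
import sagemaker
from sagemaker import image_uris
from sagemaker.model import Model

session = sagemaker.Session()
role = sagemaker.get_execution_role()

model = Model(
    image_uri=image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1"),
    model_data="s3://my-workbench-bucket/experiments/xgboost/output/model.tar.gz",  # assumption
    role=role,
    sagemaker_session=session,
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="selected-model",  # assumption
)
```

Endpoints are billed while they are running, which is why the workflow cleans up artefacts after the final selection.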
The same steps as above can be executed using the RUN script in the root of the repository. This script is parameterised so that you can run individual steps separately, or the entire process in sequence.
The goal of this workflow is to demonstrate how you might automate certain elements of your data science workflow and develop a code base that is easier to deploy.