
Image Classification using AWS SageMaker

In this project, we use AWS SageMaker to fine-tune a pretrained model for image classification, applying SageMaker profiling, the debugger, hyperparameter tuning, and other good ML engineering practices.

Note: This repository is part of the AWS Machine Learning Engineer Nanodegree provided by Udacity.

Dataset

The provided dataset is the dog breed classification dataset, which can be found in the classroom. It contains images of 133 dog breeds split into training, testing, and validation sets. The dataset can be downloaded from here.

Note: The project is designed to be dataset independent so if there is a dataset that is more interesting or relevant to your work, you are welcome to use it to complete the project.

Project Set Up and Installation

  1. Open SageMaker Studio and create a folder for your project.
  2. Clone the project repo from SageMaker Studio.
  3. Download the dataset from here.
  4. Unzip the files (if needed).

We use train_and_deploy.ipynb, which helps us interface with SageMaker and submit training jobs to it.

Access

Upload the dataset files to an S3 bucket so that SageMaker can use them for training.
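A minimal sketch of the upload with the SageMaker SDK is shown below; the local folder name and key prefix are assumptions, so adjust them to your setup:

```python
# Sketch: upload the unzipped dataset to S3 so SageMaker can read it.
# "dogImages" and "dog-breed-data" are placeholder names.
import sagemaker

session = sagemaker.Session()
bucket = session.default_bucket()  # or your own bucket name
inputs = session.upload_data(
    path="dogImages",              # local folder containing the dataset
    bucket=bucket,
    key_prefix="dog-breed-data",
)
print(inputs)                      # S3 URI to pass to the training job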

Hyperparameter Tuning

I chose ResNet50 because it is easy to use, relatively lightweight, and computationally powerful.

For hyperparameter tuning, I tried different values for the following hyperparameters:

  • lr: ContinuousParameter(0.001, 0.1)
  • batch_size: CategoricalParameter([16, 64])
  • epochs: IntegerParameter(5, 10)
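These ranges can be declared with the SageMaker SDK roughly as follows. This is a sketch: the execution role, instance type, metric name, and log regex are assumptions and must match what hpo.py actually logs:

```python
# Sketch of the tuning setup around hpo.py (names are illustrative).
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import (
    HyperparameterTuner,
    ContinuousParameter,
    CategoricalParameter,
    IntegerParameter,
)

estimator = PyTorch(
    entry_point="hpo.py",
    role=role,                        # assumed: your SageMaker execution role
    framework_version="1.8",
    py_version="py36",
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
)

hyperparameter_ranges = {
    "lr": ContinuousParameter(0.001, 0.1),
    "batch_size": CategoricalParameter([16, 64]),
    "epochs": IntegerParameter(5, 10),
}

tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="Test Loss",      # assumed metric name
    objective_type="Minimize",
    metric_definitions=[
        {"Name": "Test Loss", "Regex": "Testing Loss: ([0-9\\.]+)"}
    ],
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=4,
    max_parallel_jobs=2,
)

tuner.fit({"training": inputs})             # inputs: S3 URI of the dataset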

Finally, the tuner reports the best configuration at the end of the tuning job.

After the hyperparameter tuning phase, the model is trained with the best hyperparameters in a final training job. Part of the job logs shows the training and testing phases of the model.

The hpo.py script sets up the hyperparameter tuning process, while train_model.py handles the training phase of our classification task.
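Once tuning finishes, the best hyperparameters can be pulled from the tuner and reused for the final training run. A sketch, assuming `tuner` is the HyperparameterTuner from the tuning step and `role` is your execution role:

```python
# Sketch: reuse the best tuning job's hyperparameters with train_model.py.
from sagemaker.pytorch import PyTorch

best_estimator = tuner.best_estimator()           # estimator of the best job
best_hyperparameters = best_estimator.hyperparameters()

estimator = PyTorch(
    entry_point="train_model.py",
    role=role,                                    # assumed execution role
    framework_version="1.8",
    py_version="py36",
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    hyperparameters=best_hyperparameters,
)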

Debugging and Profiling

Since we have a training job with the best hyperparameters, we debug and profile that job directly with the following configuration:

from sagemaker.debugger import (
    Rule,
    ProfilerRule,
    rule_configs,
    ProfilerConfig,
    FrameworkProfile,
    DebuggerHookConfig,
)

rules = [
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.overfit()),
    Rule.sagemaker(rule_configs.overtraining()),
    Rule.sagemaker(rule_configs.poor_weight_initialization()),
    ProfilerRule.sagemaker(rule_configs.ProfilerReport()),
]

profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500, framework_profile_params=FrameworkProfile(num_steps=10)
)

debugger_config = DebuggerHookConfig(
    hook_parameters={"train.save_interval": "100", "eval.save_interval": "10"}
)

and pass them to the estimator instance:
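A sketch of wiring the rules and configs into the estimator; the role, instance type, and hyperparameters are placeholders:

```python
# Sketch: attach the debugger rules and profiler config to the estimator.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train_model.py",
    role=role,                              # assumed: SageMaker execution role
    framework_version="1.8",
    py_version="py36",
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    hyperparameters=best_hyperparameters,   # assumed: from the tuning step
    rules=rules,
    debugger_hook_config=debugger_config,
    profiler_config=profiler_config,
)
estimator.fit({"training": inputs}, wait=True)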

Results

Debugger Line plot

As we can see, the training job is quite I/O-intensive: GPUMemoryUtilization oscillates due to repeated memory allocation and release. This observation is consistent with the profiler results that follow.

Operators

On both CPU and GPU, the three most expensive operators were:

  1. copy_
  2. contiguous
  3. to

which makes sense because these operations deal with memory transfers and allocations.

Rules

The LowGPUUtilization rule was the most frequently triggered one. It can fire due to bottlenecks, blocking synchronization calls, or a small batch size.

Since the batch size was 16 in our experiment, it is worth trying larger values for the batch_size hyperparameter, as the BatchSize rule was triggered six times during the experiment.

Model Deployment

Model deployment is implemented in a stand-alone script (inference.py in our project). At a minimum, this script must implement model_fn, which loads the model for inference.

In the notebook, we use the script as shown below:
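A sketch of the deployment step, assuming the trained estimator from the previous section; the framework version and instance type are placeholders:

```python
# Sketch: wrap the trained model with inference.py and deploy an endpoint.
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data=estimator.model_data,    # S3 URI of model.tar.gz from training
    role=role,                          # assumed execution role
    entry_point="inference.py",
    framework_version="1.8",
    py_version="py36",
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)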

Once we have a predictor instance, we can invoke the endpoint to make predictions:
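A sketch of one invocation; the image path is hypothetical, and how the payload is serialized depends on the serializer configured on the predictor:

```python
# Sketch: send a sample image to the endpoint and read the prediction.
import numpy as np

with open("sample_dog.jpg", "rb") as f:  # hypothetical sample image path
    payload = f.read()

response = predictor.predict(payload)
predicted_class = np.argmax(response, axis=1)
print(predicted_class)                   # index into the 133 breed classes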

As we can see, the model makes a successful prediction on this sample.

Contributors

mohsenmahmoodzadeh, douglasbergman

