This project forked from mlops-guide/dvc-gitactions

Example project with a complete MLOps cycle: versioning data, generating reports on pull requests, and deploying the model on releases with DVC and CML using GitHub Actions and IBM Watson. Part of the Engineering Final Project @ Insper.

Home Page: https://mlops-guide.github.io

Shell 1.18% Python 18.10% HCL 0.45% Jupyter Notebook 80.27%

🧬 DVC CI/CD MLOps Pipeline

MLOps pipeline with DVC and CML using GitHub Actions and IBM Cloud

Video Demo

Documentation and Implementation Guide

🔰 Milestones

  • Data Versioning: DVC
  • Machine Learning Pipeline: DVC Pipeline (preprocess, train, evaluate)
  • CI/CD: Unit testing with Pytest, pre-commit and GitHub Actions
  • CML: Continuous Machine Learning and GitHub Actions
  • Deploy on release: GitHub Actions and IBM Watson
  • Monitoring: OpenScale
  • Infrastructure as code: Terraform script

📋 Requirements

  • DVC
  • Python 3 and pip
  • Access to IBM Cloud Object Storage

๐Ÿƒ๐Ÿป Running Project

๐Ÿ”‘ Setup IBM Bucket Credentials

MacOS

Setup your credentials on ~/.aws/credentials and ~/.aws/config. DVC works perfectly with IBM Obejct Storage, although it uses S3 protocol, you can also see this in other portions of the repository.

~/.aws/credentials

[default]
aws_access_key_id = {{Key ID}}
aws_secret_access_key = {{Access Key}}

✅ Pre-commit Tests

To enable pre-commit checks, you need the pre-commit tool.

Install pre-commit with pip:

pip install pre-commit

Install the hooks in your local repository. Keep in mind that this creates a Git hook (.git/hooks/pre-commit).

pre-commit install

Now every time you commit, the checks defined in .pre-commit-config.yaml run before the commit is accepted.

Example

$ git commit -m "Example commit"

black....................................................................Passed
pytest-check.............................................................Passed
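The mechanism behind this is simple: Git runs the hook before each commit and aborts the commit if the hook exits with a non-zero status. A minimal sketch of that contract in Python, assuming hypothetical check commands; the project's real hooks are the ones declared in .pre-commit-config.yaml, and pre-commit itself does much more (isolated environments, staged-file filtering).

```python
import subprocess
import sys

# Hypothetical commands standing in for the hooks in .pre-commit-config.yaml.
CHECKS = {
    "black": ["black", "--check", "."],
    "pytest-check": [sys.executable, "-m", "pytest", "-q"],
}


def run_checks(checks: dict[str, list[str]]) -> int:
    """Run every check; return 1 if any failed, else 0 (the exit code Git inspects)."""
    failed = False
    for name, cmd in checks.items():
        try:
            ok = subprocess.run(cmd).returncode == 0
        except FileNotFoundError:  # the tool is not installed
            ok = False
        # Pad the name with dots, similar to the output shown above.
        print(f"{name:.<73}{'Passed' if ok else 'Failed'}")
        failed = failed or not ok
    return 1 if failed else 0
```

Saved as an executable .git/hooks/pre-commit ending with sys.exit(run_checks(CHECKS)), a failing check would block the commit in the same way.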

โš—๏ธ Using DVC

Download data from the DVC repository(analog to git pull)

dvc pull

Reproduces the pipeline using DVC

dvc repro
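dvc repro does not rerun everything: roughly, a stage is rerun when its command, a dependency, or an output no longer matches what dvc.lock records. The sketch below covers only the dependency part, comparing MD5 hashes of files against a hypothetical `recorded` mapping that stands in for dvc.lock. It is a simplification; DVC's own hashing has extra rules.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """MD5 of a file's contents, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_stale(deps: list[Path], recorded: dict[str, str]) -> bool:
    """True if any dependency is missing or its hash differs from the recorded one."""
    for dep in deps:
        if not dep.exists() or recorded.get(str(dep)) != md5_of(dep):
            return True
    return False
```

This is why editing data/weatherAUS.csv triggers the whole pipeline, while editing only src/evaluate.py should rerun just the evaluate stage.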

โš™๏ธ DVC Pipelines

โœ‚๏ธ Preprocessing pipeline

dvc run -n preprocess -d ./src/preprocess_data.py -d data/weatherAUS.csv \
-o ./data/weatherAUS_processed.csv -o ./data/features.csv \
python3 ./src/preprocess_data.py ./data/weatherAUS.csv
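dvc run records each stage in dvc.yaml under `stages:`: the -n value becomes the stage name, the trailing command goes under `cmd`, the -d paths under `deps` and the -o paths under `outs`. Newer DVC releases replace `dvc run` with `dvc stage add`, which takes the same flags, followed by `dvc repro`. A minimal sketch of that mapping, rendering YAML by hand to avoid a PyYAML dependency; stage_entry and to_dvc_yaml are hypothetical helpers, and the exact formatting DVC writes may differ.

```python
def stage_entry(name: str, cmd: str, deps: list[str], outs: list[str]) -> dict:
    """Collect the pieces of one `dvc run` call: -n name, -d deps, -o outs, command."""
    return {name: {"cmd": cmd, "deps": list(deps), "outs": list(outs)}}


def to_dvc_yaml(stages: dict) -> str:
    """Render stages as dvc.yaml-style text (plain scalars only, no quoting)."""
    lines = ["stages:"]
    for name, stage in stages.items():
        lines.append(f"  {name}:")
        lines.append(f"    cmd: {stage['cmd']}")
        for key in ("deps", "outs"):
            lines.append(f"    {key}:")
            lines.extend(f"    - {item}" for item in stage[key])
    return "\n".join(lines) + "\n"


# The preprocessing stage above, expressed as data.
preprocess = stage_entry(
    "preprocess",
    "python3 ./src/preprocess_data.py ./data/weatherAUS.csv",
    deps=["./src/preprocess_data.py", "data/weatherAUS.csv"],
    outs=["./data/weatherAUS_processed.csv", "./data/features.csv"],
)
print(to_dvc_yaml(preprocess))
```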

📘 Training pipeline

dvc run -n train -d ./src/train.py -d ./data/weatherAUS_processed.csv \
-d ./src/model.py \
-o ./models/model.joblib \
python3 ./src/train.py ./data/weatherAUS_processed.csv ./src/model.py 200

📊 Evaluate pipeline

dvc run -n evaluate -d ./src/evaluate.py -d ./data/weatherAUS_processed.csv \
-d ./src/model.py -d ./models/model.joblib -o ./results/metrics.json \
-o ./results/precision_recall_curve.png -o ./results/roc_curve.png \
python3 ./src/evaluate.py ./data/weatherAUS_processed.csv ./src/model.py ./models/model.joblib
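The three stages are never ordered explicitly: DVC links them because a path declared with -o in one stage appears as a -d in another, and it runs them in dependency order. A minimal sketch of that idea with Python's graphlib (3.9+), using the paths from the commands above; this illustrates the concept and is not DVC's own implementation.

```python
from graphlib import TopologicalSorter

# Dependencies (-d) and outputs (-o) copied from the dvc run commands above,
# with the leading "./" dropped so paths written with and without it compare equal.
STAGES = {
    "preprocess": {
        "deps": ["src/preprocess_data.py", "data/weatherAUS.csv"],
        "outs": ["data/weatherAUS_processed.csv", "data/features.csv"],
    },
    "train": {
        "deps": ["src/train.py", "data/weatherAUS_processed.csv", "src/model.py"],
        "outs": ["models/model.joblib"],
    },
    "evaluate": {
        "deps": ["src/evaluate.py", "data/weatherAUS_processed.csv",
                 "src/model.py", "models/model.joblib"],
        "outs": ["results/metrics.json", "results/precision_recall_curve.png",
                 "results/roc_curve.png"],
    },
}


def execution_order(stages: dict) -> list[str]:
    """Order stages so each one runs after the stages that produce its inputs."""
    producer = {out: name for name, stage in stages.items() for out in stage["outs"]}
    # Map each stage to its predecessors: the stages whose outputs it depends on.
    graph = {
        name: {producer[dep] for dep in stage["deps"] if dep in producer}
        for name, stage in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STAGES))  # ['preprocess', 'train', 'evaluate']
```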

๐Ÿ™ Git Actions

๐Ÿ” IBM Credentials

Fill the credentials_example.yaml file and rename it to credentials.yaml to be able to run the scripts that require IBM keys. โš ๏ธ Never upload this file to GitHub!

To use Git Actions to deploy your model, you'll need to encrypt it, to do that run the command bellow and choose a strong password.

gpg --symmetric --cipher-algo AES256 credentials.yaml 
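By default, gpg writes the encrypted copy to credentials.yaml.gpg, which is the file to commit. The deployment job then has to decrypt it non-interactively with the password stored in IBM_CREDENTIALS_PASS. A minimal sketch, assuming the password is exposed to the job as that environment variable; the helper names are hypothetical, and passing the passphrase on stdin with loopback pinentry is one common approach for GnuPG 2.1+, not necessarily what this repository's workflow does.

```python
import os
import subprocess


def decrypt_command(encrypted: str = "credentials.yaml.gpg",
                    output: str = "credentials.yaml") -> list[str]:
    """gpg arguments to decrypt a symmetrically encrypted file without prompting."""
    return [
        "gpg", "--quiet", "--batch", "--yes",
        "--pinentry-mode", "loopback",  # let GnuPG 2.1+ accept a non-interactive passphrase
        "--passphrase-fd", "0",         # read the passphrase from stdin
        "--output", output,
        "--decrypt", encrypted,
    ]


def decrypt_credentials() -> None:
    # Assumption: the workflow maps the IBM_CREDENTIALS_PASS secret to this variable.
    passphrase = os.environ["IBM_CREDENTIALS_PASS"]
    subprocess.run(decrypt_command(), input=passphrase + "\n", text=True, check=True)
```

Sending the passphrase on stdin keeps it out of the process list, unlike passing it with --passphrase on the command line.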

Next, on the repository's GitHub page, go to Settings->Secrets and store the keys in the following secrets:

AWS_ACCESS_KEY_ID (Bucket Credential)
AWS_SECRET_ACCESS_KEY (Bucket Credential)
IBM_CREDENTIALS_PASS (password for the encrypted file)
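Inside a workflow, secrets are not visible to scripts automatically; a step typically maps them to environment variables, for example `env: AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}`. A minimal fail-fast check in Python, assuming the three secrets above are exposed under the same names; missing_secrets is a hypothetical helper.

```python
import os
from collections.abc import Mapping

# Secret names from the list above, assumed to be exposed as environment variables.
REQUIRED_SECRETS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "IBM_CREDENTIALS_PASS")


def missing_secrets(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required secrets that are unset or empty in `env`."""
    return [name for name in REQUIRED_SECRETS if not env.get(name)]
```

Calling it at the start of a deployment script, and exiting with an error when the list is non-empty, surfaces a missing secret before any download or decryption step runs.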

Contributors

guipleite, arthurolga, gabriellm1, vinigl, biogeek
