Machine learning pipeline

This repo shows how to incorporate popular machine learning tools such as DVC, MLflow, and Hydra into a machine learning project, using my project on predicting aggressive tweets as the example.

Find the article on how to use MLflow and Hydra here

Find the article on how to use DVC here

DVC

DVC is a data version control tool. To install DVC, run

pip install dvc

Hydra

With Hydra, you can compose your configuration dynamically. To install Hydra, simply run

pip install hydra-core --upgrade
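
As a rough illustration of what "composing configuration dynamically" can look like, the sketch below uses Hydra's compose API with a structured config. The `TrainConfig` fields, the config name `train`, and the override values are hypothetical and are not taken from this repo; the `version_base` argument assumes Hydra 1.2 or newer.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Hypothetical defaults, for illustration only.
    model: str = "logistic_regression"
    lr: float = 0.01


def compose_config(overrides):
    """Build a config from TrainConfig defaults plus command-line-style overrides."""
    # Imported here so the dataclass above can be reused without Hydra installed.
    from hydra import compose, initialize
    from hydra.core.config_store import ConfigStore

    # Register the dataclass under a name Hydra can look up.
    ConfigStore.instance().store(name="train", node=TrainConfig)

    # config_path=None: no config directory is added; only the ConfigStore is used.
    with initialize(version_base=None, config_path=None):
        return compose(config_name="train", overrides=list(overrides))
```

Calling `compose_config(["lr=0.1"])` should return a config whose `lr` has been overridden while `model` keeps its default, which is the same mechanism Hydra applies to overrides passed on the command line.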

MLflow

MLflow is a platform to manage the ML lifecycle, including experimentation, reproducibility, and deployment. Install MLflow with

pip install mlflow

Project structure

  • src: directory for source code
  • mlruns: directory for MLflow runs
  • configs: directory for config files
  • outputs: results from Hydra runs. Each time you run a function wrapped in Hydra's decorator, its output is saved here. Hydra can change the working directory to this folder for each run, so to keep MLflow writing to the mlruns directory at the project root, use
import mlflow
import hydra
from hydra import utils

mlflow.set_tracking_uri('file://' + utils.get_original_cwd() + '/mlruns')
  • src/preprocessing.py: preprocessing script
  • src/train_pipeline.py: training pipeline
  • src/train.py: script for training and saving the model
  • src/predict.py: script for loading the model and making predictions
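
The tracking-URI snippet in the outputs item builds the URI by string concatenation. A slightly more portable variant, sketched here under the assumption that it is called from inside a Hydra-decorated function, lets `pathlib` build the `file://` URI. The helper names `mlruns_uri` and `point_mlflow_at_project` are hypothetical and do not appear in this repo.

```python
from pathlib import Path


def mlruns_uri(project_root):
    """Return a file:// URI for the mlruns directory under project_root."""
    # resolve() makes the path absolute, which as_uri() requires.
    return (Path(project_root).resolve() / "mlruns").as_uri()


def point_mlflow_at_project():
    """Call inside a Hydra-decorated function, before logging anything."""
    # Imported lazily so mlruns_uri() can be used on its own.
    import mlflow
    from hydra.utils import get_original_cwd

    # get_original_cwd() is only valid while a Hydra app is running.
    mlflow.set_tracking_uri(mlruns_uri(get_original_cwd()))
```

Keeping the URI construction in a separate pure function makes it easy to check without starting Hydra or MLflow.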

How to pull the data with DVC

Pull the data from Google Drive with

dvc pull

How to run the project

To run the experiments and see how they are displayed in the MLflow UI, clone this repo and run

python src/train.py

Once the run completes, start MLflow's server with

mlflow ui

Run this command from the same directory in which you ran the training script, then open http://localhost:5000/ in your browser. You should see your experiments listed there.
