mlflow-framework's Introduction

Modular ML Experiment Framework

Objective

Build a modular machine learning engineering framework on top of MLflow that can orchestrate training and hyperparameter tuning experiments

  • on various ML methods
  • using various cleaning and feature extraction pipelines.

The modularity results from the ease with which the data scientist can add, remove, or swap out the various models, hyperparameter sets, or feature extraction pipelines by simply reconfiguring a YAML file.

The framework then compares the best tuned model of each ML method and offers the overall winner up as a REST API.

Such a framework could be useful to data scientists because of the ease with which the data science pipeline could be set up. It could also be useful to ML engineers who can orchestrate a periodic training and deployment cycle.

Details
Through a YAML file, the user can supply the following:

  • URIs of the dataset and the MLflow tracking server
  • names of training/cross-validation functions for various models stored in a designated file
  • names of cleaning and feature extraction scripts for various models
  • training parameters/hyperparameters for various models
  • the metric on which to pick the best model, etc.
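As an illustration of what such a specification might contain, the sketch below shows a Python dictionary of the kind a YAML parser could produce from a spec file, plus a small completeness check. Every key name, function name, URI and hyperparameter value here is hypothetical; the project's actual Specs.yaml schema may differ.

```python
# Hypothetical structure of a parsed spec file. The key names are assumptions
# made for illustration, not the project's actual Specs.yaml schema.
specs = {
    "data_uri": "data/temperatures.csv",      # placeholder dataset location
    "tracking_uri": "http://localhost:5000",  # placeholder MLflow tracking server
    "selection_metric": "rmse",               # metric used to pick the best model
    "models": {
        "ridge_regression": {
            "feature_pipeline": "multivariate_fe",
            "train_function": "cv_ridge",     # hypothetical function name
            "hyperparameters": {"alpha": [0.1, 1.0, 10.0]},
        },
        "prophet": {
            "feature_pipeline": "univariate_fe",   # hypothetical script name
            "train_function": "cv_prophet",        # hypothetical function name
            "hyperparameters": {"changepoint_prior_scale": [0.01, 0.1]},
        },
    },
}

REQUIRED_TOP_LEVEL_KEYS = ("data_uri", "tracking_uri", "selection_metric", "models")
REQUIRED_MODEL_KEYS = {"feature_pipeline", "train_function", "hyperparameters"}


def validate_specs(specs):
    """Return a list of problems found; an empty list means the spec looks complete."""
    problems = []
    for key in REQUIRED_TOP_LEVEL_KEYS:
        if key not in specs:
            problems.append(f"missing top-level key: {key}")
    for name, model in specs.get("models", {}).items():
        for key in sorted(REQUIRED_MODEL_KEYS - model.keys()):
            problems.append(f"model {name!r} is missing {key}")
    return problems


print(validate_specs(specs))
```

Checking the spec before any run starts means a misspelled or missing entry surfaces immediately rather than partway through a long tuning session.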

The framework then runs MLflow experiments on the various models using the supplied specs by

  • applying the chosen data cleaning and feature engineering pipelines to each model and
  • training/cross-validating the various models using their respective functions and hyperparameters.

At the end of all the runs, it picks the best model, which can be served through a REST API endpoint at the user's discretion.
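The run-and-select loop described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's implementation: `expand_grid`, `run_experiments`, the `pipeline`/`train`/`grid` keys and the toy functions are all hypothetical names, and the assumption is that each training function returns a single score where lower is better.

```python
from itertools import product


def expand_grid(grid):
    """Yield one dict per combination of the hyperparameter values in `grid`."""
    keys = sorted(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def run_experiments(data, model_specs, log_run=None, lower_is_better=True):
    """Run every model over its hyperparameter grid and return the best run.

    `model_specs` maps a model name to a dict with three hypothetical keys:
    'pipeline' (callable), 'train' (callable returning a score) and 'grid'.
    """
    best = None
    for name, spec in model_specs.items():
        features = spec["pipeline"](data)  # cleaning and feature engineering
        for params in expand_grid(spec["grid"]):
            score = spec["train"](features, **params)
            if log_run is not None:
                log_run(name, params, score)  # hook for experiment tracking
            if best is None:
                better = True
            elif lower_is_better:
                better = score < best["score"]
            else:
                better = score > best["score"]
            if better:
                best = {"model": name, "params": params, "score": score}
    return best


# Toy stand-ins for real pipelines and cross-validation functions.
def identity(data):
    return list(data)


def scale(data):
    return [x / 10 for x in data]


def toy_error(features, offset):
    return abs(sum(features) / len(features) - offset)


model_specs = {
    "model_a": {"pipeline": identity, "train": toy_error, "grid": {"offset": [20, 24, 30]}},
    "model_b": {"pipeline": scale, "train": toy_error, "grid": {"offset": [2, 3]}},
}

history = []
best = run_experiments([10, 20, 30, 40], model_specs,
                       log_run=lambda *run: history.append(run))
print(best)
```

With MLflow installed, the `log_run` hook could open a run with `mlflow.start_run()` and record each trial through `mlflow.log_params` and `mlflow.log_metric`; it is kept as a plain callback here so the sketch runs without a tracking server.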

The framework is modular because, to add another method type, all the data scientist has to do is add

  • the relevant cleaning, feature extraction, and training functions to the relevant files,
  • their details to the Specs.yaml file, and
  • the necessary import statements.
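One way the function names listed in the spec file could be turned into callables is a name lookup on the imported module, sketched below. This is an assumption about the mechanism rather than a description of the project's code: `resolve`, the `training_functions` module and `cv_ridge` are hypothetical, and a stand-in module is built in memory so the example runs on its own.

```python
import types


def resolve(module, function_name):
    """Look up a callable by the name given in the spec, failing clearly if it is absent."""
    func = getattr(module, function_name, None)
    if not callable(func):
        raise LookupError(
            f"{function_name!r} is not defined in module {module.__name__!r}"
        )
    return func


# Stand-in for an imported project module that holds training functions.
training_functions = types.ModuleType("training_functions")


def cv_ridge(features, alpha):
    """Toy placeholder for a real cross-validation routine."""
    return alpha * len(features)


training_functions.cv_ridge = cv_ridge

# The string would normally come from the spec file.
train = resolve(training_functions, "cv_ridge")
print(train([1, 2, 3], alpha=0.5))
```

Under this approach, adding a method amounts to defining the function, importing its module, and naming it in the spec; the orchestration code itself does not change.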

Example use and current status

The project is under development and in its current state represents the framework customised for a specific task: to find the best temperature forecasting model for a given dataset. I've picked this task because it's one that can be approached using various methods:

  • regression using various conventional machine learning and neural network tools, and
  • time series forecasting using various libraries such as Prophet or Darts

Thus, this dataset is a good example of how the framework can orchestrate, in a modular fashion, MLflow experiments using the various machine learning methods.

To get a sense of how the eventual framework will work, please install the dependencies as listed in requirements.txt and run the train.py file. It will

  • read the specifications for the various models provided in Specs.yaml
  • engineer features using the multivariate_fe.py script created for this specific problem
  • run the respective cross-validation or training functions for the various models in get_best_model.py
  • pick the best model of the best method.

Key additional features to be implemented

  • a command line tool that can set up the basic file structure with instructions on how to populate the files
  • ability to upsert tracking data to any URI, including one on any cloud service
  • ability to run MLflow experiments for the various methods in parallel through multiprocessing
  • generation of performance and explainability charts for each model type
  • facility to package a project created on the framework as a Docker container so that it can be part of a CI/CD pipeline for periodic retraining.

Contributors

kaiomurz
