Awesome Open Source MLOps

An awesome, curated list of the best open source MLOps tools for data scientists.

Contribute

Contributions are most welcome; please adhere to the contribution guidelines.

Community

You can join our Gitter channel to discuss.

Training

IDEs and Workspaces

  • code server - Run VS Code on any machine anywhere and access it in the browser.
  • conda - OS-agnostic, system-level binary package manager and ecosystem.
  • Docker - Moby is an open-source project created by Docker to enable and accelerate software containerization.
  • Jupyter Notebooks - The Jupyter notebook is a web-based notebook environment for interactive computing.

Frameworks for Training

  • Caffe - A fast open framework for deep learning.
  • ColossalAI - An integrated large-scale model training system with efficient parallelization techniques.
  • DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
  • Jax - Autograd and XLA for high-performance machine learning research.
  • Kedro - Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code.
  • Keras - Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow.
  • LightGBM - A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
  • MegEngine - MegEngine is a fast, scalable and easy-to-use deep learning framework, with auto-differentiation.
  • MindSpore - MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
  • MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler.
  • Oneflow - OneFlow is a performance-centered and open-source deep learning framework.
  • PaddlePaddle - Machine Learning Framework from Industrial Practice.
  • PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration.
  • PyTorchLightning - The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
  • XGBoost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library.
  • TensorFlow - An Open Source Machine Learning Framework for Everyone.
  • VectorFlow - A minimalist neural network library optimized for sparse data and single machine environments.

Experiment Tracking

  • Aim - an easy-to-use and performant open-source experiment tracker.
  • Guild AI - Experiment tracking, ML developer tools.
  • MLRun - Machine Learning automation and tracking.
  • Kedro-Viz - Kedro-Viz is an interactive development tool for building data science pipelines with Kedro. Kedro-Viz also allows users to view and compare different runs in the Kedro project.
  • LabNotebook - LabNotebook is a tool that allows you to flexibly monitor, record, save, and query all your machine learning experiments.
  • Sacred - Sacred is a tool to help you configure, organize, log and reproduce experiments.

Visualization

  • Manifold - A model-agnostic visual debugging tool for machine learning.
  • netron - Visualizer for neural network, deep learning, and machine learning models.
  • TensorBoard - TensorFlow's Visualization Toolkit.
  • TensorSpace - Neural network 3D visualization framework for building interactive and intuitive models in browsers; supports pre-trained deep learning models from TensorFlow, Keras, and TensorFlow.js.
  • dtreeviz - A python library for decision tree visualization and model interpretation.
  • Zetane Viewer - ML models and internal tensors 3D visualizer.

Model

Model Management

  • dvc - Data Version Control | Git for Data & Models | ML Experiments Management
  • ModelDB - Open Source ML Model Versioning, Metadata, and Experiment Management
  • ormb - Docker for Your ML/DL Models Based on OCI Artifacts

Pretrained Model

  • HuggingFace - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
  • PaddleNLP - Easy-to-use and fast NLP library with an awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications.
  • PyTorch Image Models - PyTorch image models, scripts, pretrained weights.

Serving

Frameworks/Servers for Serving

  • BentoML - The Unified Model Serving Framework
  • ForestFlow - Policy-driven Machine Learning Model Server.
  • MOSEC - A machine learning model serving framework with dynamic batching and pipelined stages that provides an easy-to-use Python interface.
  • Multi Model Server - Multi Model Server is a tool for serving neural net models for inference.
  • Service Streamer - Boosting your Web Services of Deep Learning Applications.
  • TFServing - A flexible, high-performance serving system for machine learning models.
  • Triton Server (TRTIS) - The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Optimizations

  • FeatherCNN - FeatherCNN is a high performance inference engine for convolutional neural networks.
  • Forward - A library for high performance deep learning inference on NVIDIA GPUs.
  • NCNN - ncnn is a high-performance neural network inference framework optimized for the mobile platform.
  • PocketFlow - Uses AutoML to do model compression.
  • TNN - A uniform deep learning inference framework for mobile, desktop and server.

Observability

Large Scale Deployment

ML Platforms

  • ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management.
  • MLflow - Open source platform for the machine learning lifecycle.
  • Kubeflow - Machine Learning Toolkit for Kubernetes.
  • PAI - Resource scheduling and cluster management for AI.
  • Polyaxon - Machine Learning Management & Orchestration Platform.

Workflow

  • Argo - Workflow engine for Kubernetes.
  • Flyte - Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale.
  • Kubeflow Pipelines - Machine Learning Pipelines for Kubeflow.
  • Metaflow - Build and manage real-life data science projects with ease!
  • ZenML - MLOps framework to create reproducible pipelines.

Scheduling

  • Kueue - Kubernetes-native Job Queueing.
  • PAI - Resource scheduling and cluster management for AI (Open-sourced by Microsoft).
  • Slurm - A Highly Scalable Workload Manager.
  • Volcano - A Cloud Native Batch System (Project under CNCF).
  • Yunikorn - Light-weight, universal resource scheduler for container orchestrator systems.

AutoML

  • Adanet - TensorFlow package for AdaNet.
  • Advisor - open-source implementation of Google Vizier for hyperparameter tuning.
  • Archai - a platform for Neural Architecture Search (NAS) that allows you to generate efficient deep networks for your applications.
  • auptimizer - An automatic ML model optimization tool.
  • autoai - A framework to find the best performing AI/ML model for any AI problem.
  • AutoGL - An autoML framework & toolkit for machine learning on graphs
  • AutoGluon - AutoML for Image, Text, and Tabular Data.
  • automl-gs - Provide an input CSV and a target field to predict, generate a model + code to run it.
  • autokeras - AutoML library for deep learning.
  • Auto-PyTorch - Automatic architecture search and hyperparameter optimization for PyTorch.
  • auto-sklearn - an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
  • AutoWeka - hyperparameter search for Weka.
  • Chocolate - A fully decentralized hyperparameter optimization framework.
  • Dragonfly - An open source python library for scalable Bayesian optimisation.
  • Determined - scalable deep learning training platform with integrated hyperparameter tuning support; includes Hyperband, PBT, and other search methods.
  • DEvol (DeepEvolution) - a basic proof of concept for genetic architecture search in Keras.
  • EvalML - An open source python library for AutoML.
  • FEDOT - AutoML framework for the design of composite pipelines.
  • FLAML - Fast and lightweight AutoML (paper).
  • Goptuna - A hyperparameter optimization framework, inspired by Optuna.
  • HpBandSter - a framework for distributed hyperparameter optimization.
  • HPOlib2 - a library for hyperparameter optimization and black box optimization benchmarks.
  • Hyperband - open source code for tuning hyperparams with Hyperband.
  • Hypernets - A General Automated Machine Learning Framework.
  • Hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
  • hyperunity - A toolset for black-box hyperparameter optimisation.
  • Katib - Katib is a Kubernetes-native project for automated machine learning (AutoML).
  • Keras Tuner - Hyperparameter tuning for humans.
  • learn2learn - PyTorch Meta-learning Framework for Researchers.
  • Ludwig - a toolbox built on top of TensorFlow that allows you to train and test deep learning models without the need to write code.
  • MOE - a global, black box optimization engine for real world metric optimization by Yelp.
  • Model Search - a framework that implements AutoML algorithms for model architecture search at scale.
  • NASGym - a proof-of-concept OpenAI Gym environment for Neural Architecture Search (NAS).
  • NNI - An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
  • Optuna - A hyperparameter optimization framework.
  • Ray Tune - Scalable Hyperparameter Tuning.
  • REMBO - Bayesian optimization in high-dimensions via random embedding.
  • RoBO - a Robust Bayesian Optimization framework.
  • scikit-optimize(skopt) - Sequential model-based optimization with a scipy.optimize interface.
  • Spearmint - a software package to perform Bayesian optimization.
  • TPOT - one of the very first AutoML methods and open-source software packages.
  • Torchmeta - A Meta-Learning library for PyTorch.
  • Vega - an AutoML algorithm tool chain by Huawei Noah's Ark Lab.

Data

Data Management

  • Dolt - Git for Data.
  • DVC - Data Version Control | Git for Data & Models | ML Experiments Management.
  • Hub - Hub is a dataset format with a simple API for creating, storing, and collaborating on AI datasets of any size.
  • Quilt - A self-organizing data hub for S3.

Data Ingestion

Data Storage

  • LakeFS - Git-like capabilities for your object storage.

Data Transformation

Feature Engineering

  • FeatureTools - An open source python framework for automated feature engineering

Performance

ML Compiler

Profiling
