
mlcube_examples's Introduction

MLCube examples

The machine learning (ML) community has seen explosive growth and innovation in the last decade. New models emerge on a daily basis, but sharing those models remains an ad-hoc process. Often, when a researcher wants to use a model produced elsewhere, they must waste hours or days on a frustrating attempt to get the model to work. Similarly, an ML engineer may struggle to port and tune models between development and production environments, which can be significantly different from each other. This challenge is magnified when working with a set of models, such as reproducing related work, employing a performance benchmark suite like MLPerf, or developing model management infrastructure. Reproducibility, transparency and consistent performance measurement are cornerstones of good science and engineering.

The field needs to make sharing models simple for model creators, model users, developers and operators, for both experimental and production purposes, while following responsible practices. Prior work in the MLOps space has provided a variety of tools and processes that simplify the user journey of deploying and managing ML in various environments, including management of models, datasets and dependencies, tracking of metadata and experiments, deployment and management of ML lifecycles, automation of performance evaluation and analysis, etc.

We propose MLCube®, a contract for packaging ML tasks and models that enables easy sharing and consistent reproduction of models, experiments and benchmarks amidst these existing MLOps processes. MLCube differs from an operations tool by acting as a contract and specification rather than a product or implementation.

This repository contains a number of MLCube examples that can run in different environments using MLCube runners.

  1. MNIST MLCube downloads data and trains a simple neural network. This MLCube can run with Docker or Singularity locally and on remote hosts. The README file provides instructions on how to run it. MLCube documentation provides additional details.
  2. Hello World MLCube is a simple example described in this tutorial.
  3. EMDenoise MLCube downloads data and trains a deep convolutional neural network for the Electron Microscopy Benchmark. This MLCube can only run in a Docker container. The README file provides instructions on how to run it.
  4. Matmul MLCube performs a matrix multiplication.

mlcube_examples's People

Contributors

danieljanes, davidjurado, dfeddema, guschmue, morphine00, nathanw-mlc, nvpaulius, petermattson, pmattson, profvjreddi, sergey-serebryakov, thekanter, tjablin, vibhatha


mlcube_examples's Issues

mnist_openfl should output data sizes

We missed a requirement: the metrics files should contain the data sample counts for their respective operations. We can use any key names we like, so long as they're known at configuration time.

"Training samples", "Evaluation samples", "Validation samples", etc...

Mnist example missing source code

Cannot find the main function script and the Dockerfile that were in the /build directory in the old repo.
Not sure whether they were removed on purpose, but I think they should be included.

Config 2.0 unified storage description

This proposes a unified storage description for config 2.0.

Today, MLCube relies on a simple "file path" approach to describe the inputs and outputs of its tasks. However, on many platforms, such as Kubernetes, a single file path is not sufficient, because those platforms either have complex storage backends or use their own layer of storage abstraction that does not use "paths" to refer to locations in data storage. This proposal addresses the problem by providing a unified way of describing storage that covers both local file systems and more complex storage solutions.

A storage backend can be described in the platform section of the config, which is supplied by the user at run-time. The storage description consists of two main parts: a name that is used as a reference in the tasks' I/O paths, and a platform-specific spec that provides the details of the storage backend on the target platform, so that the runner can use it to locate the data.
We do not change the "path"-like descriptions of task inputs/outputs, to keep them simple; however, we do introduce a "variable"-like component as part of the path, so that the "variable" serves as a reference to the corresponding storage backend and the rest of the path is interpreted as a path relative to that storage.
The most straightforward example of such a "variable" is "$WORKSPACE", which currently refers to a specific directory in the local file system. Under the new proposal, "$WORKSPACE", or any "$CUSTOM_NAME" defined by the user, can refer to an arbitrary storage backend as specified in the platform section.

Since the detailed spec of the storage lives in the platform section, it can be decoupled from the shared MLCube config and appear only in the user's config. This also means that how the spec of a given storage backend is written is agreed between a user and a runner, and is not relevant to the MLCube publisher.
While we do not have to standardize those specs, we may provide guidelines/examples for popular platforms so that runner implementors can converge on a convention.

The following is an example of how storage backends can be defined; notice the specs in the platform section and how they are referenced in the tasks section. Notice also that if we give a storage backend the name "WORKSPACE", we can redirect the default workspace to it without changing the values in the task I/Os.

name: example-mlcube
platform:
  storage:
  - name: K8S_DATA
    spec:
      kubernetes:
        pvc_name: my-pvc
  - name: NFS_DATA
    spec:
      nfs:
        host: 127.0.0.1
        port: 2049
        path: some/nfs/path
container:
  image: mlcommons/mnist:0.0.1
  build_context: "mnist"
  build_file: "Dockerfile"
tasks:
  download:
    io:
    - {name: data_dir, type: directory, io: output, default: $NFS_DATA/data}
    - {name: log_dir, type: directory, io: output, default: $NFS_DATA/logs}
  train:
    io:
    - {name: data_dir, type: directory, io: input, default: $K8S_DATA/data}
    - {name: parameters_file, type: file, io: input, default: $K8S_DATA/parameters/default.parameters.yaml}
    - {name: log_dir, type: directory, io: output, default: $K8S_DATA/logs}
    - {name: model_dir, type: directory, io: output, default: $K8S_DATA/model}
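Runner-side resolution of such references could be sketched as follows. The `resolve` helper and the in-memory `storage` dict are hypothetical illustrations of the proposal, not MLCube runner code; only the `$NAME/relative/path` convention comes from the text above:

```python
import re

# Storage backends as they would appear in the platform section of the
# user's config (spec contents are runner-specific, shown here as dicts).
storage = {
    "K8S_DATA": {"kubernetes": {"pvc_name": "my-pvc"}},
    "NFS_DATA": {"nfs": {"host": "127.0.0.1", "port": 2049,
                         "path": "some/nfs/path"}},
}

def resolve(path, storage):
    """Split a task I/O path like '$K8S_DATA/data' into the referenced
    storage backend spec and the path relative to that backend.
    Returns (None, path) for a plain path with no storage reference."""
    m = re.match(r"\$([A-Z_][A-Z0-9_]*)/?(.*)", path)
    if not m:
        return None, path
    name, rel = m.groups()
    if name not in storage:
        raise KeyError(f"unknown storage backend: ${name}")
    return storage[name], rel

spec, rel = resolve("$K8S_DATA/data", storage)
```

With this split, a Kubernetes runner would mount the PVC named in `spec` and join `rel` onto the mount point, while a local runner would simply join `rel` onto a directory, which is exactly the decoupling the proposal is after.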

Link to MLCommons CLA is not working

When I click on the MLCommons CLA I get the following:

The link redirects to https://github.com/mlcommons/systems/blob/main/mlcommons_cla.txt

This makes it impossible for me to open pull requests. Also, I am already participating in MLCommons and have previously signed the CLA, but under my Google e-mail. Not sure if there is an issue with the process.

MLCommons CLA bot:
Thank you for your submission, we really appreciate it. We ask that you sign our MLCommons CLA and be a member before we can accept your contribution. If you are interested in membership, please contact [email protected] .
0 out of 1 committers have signed the MLCommons CLA.
@laszewsk
You can retrigger this bot by commenting recheck in this Pull Request


mnist_openfl -> mnist_openflower

It would be nice to rename the mnist_openfl example to mnist_openflower to acknowledge how actively the Flower folks have been working with us.

Not a big deal, just trying to be kind to our friends!

ARM containers

It would be helpful to have aarch64 containers for the examples in addition to x86_64; specifically, hello_world is currently available only for x86_64.
