demml / opsml

A common interface for registering, validating and auditing machine learning artifacts

Home Page: https://demml.github.io/opsml/

License: MIT License

Makefile 0.40% Python 77.81% JavaScript 17.81% Mako 0.05% PureBasic 3.93%
artifact-registry data-science deep-learning framework generative-ai machine-learning machine-learning-library ml mlops model-deployment production python type-checking machine-learning-operations

Universal Machine Learning Artifact Registration Platform

What is it?

OpsML provides tooling that enables data science and engineering teams to better govern and manage their machine learning projects and artifacts by providing a standardized and universal registration system and repeatable patterns for tracking, versioning and storing ML artifacts.

Features:

  • Simple Design: Standardized design that can easily be incorporated into existing projects.

  • Cards: Track, version and store a variety of ML artifacts via cards (data, models, runs, projects) and a SQL-based card registry system. Think trading cards for machine learning.

  • Type Checking: Strong typing and type checking for data and model artifacts.

  • Support: Robust support for a variety of ML and data libraries.

  • Automation: Automated processes including ONNX model conversion, metadata creation and production packaging.

Incorporate into Existing Workflows

Add quality control to your ML projects with little effort! With opsml, data and models are added to interfaces and cards, which are then registered via card registries.

Given its simple and modular design, opsml can be easily incorporated into existing workflows.


Installation:

Poetry

poetry add opsml

Pip

pip install opsml

Set up your local environment:

By default, opsml logs artifacts and experiments locally. To log to a remote server instead, set the following environment variable:

export OPSML_TRACKING_URI=${YOUR_TRACKING_URI}
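
When exporting a shell variable is inconvenient (for example in a notebook), the same setting can be applied from Python. This is a minimal sketch: the URL is a placeholder, and it assumes opsml reads OPSML_TRACKING_URI from the process environment, so the variable should be set before opsml is imported.

```python
import os

# Placeholder address (assumption): replace with your own opsml server URL.
TRACKING_URI = "http://localhost:8888"

# setdefault keeps any value that was already exported in the shell.
os.environ.setdefault("OPSML_TRACKING_URI", TRACKING_URI)

print(os.environ["OPSML_TRACKING_URI"])
```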

Quickstart

If running the example below locally without a server, make sure to install the server extra:

poetry add "opsml[server]"

# imports
from sklearn.linear_model import LinearRegression
from opsml import (
    CardInfo,
    CardRegistries,
    DataCard,
    DataSplit,
    ModelCard,
    PandasData,
    SklearnModel,
)
from opsml.helpers.data import create_fake_data


# placeholder email address; the original value was lost in extraction
info = CardInfo(name="linear-regression", repository="opsml", user_email="user@example.com")
registries = CardRegistries()


#--------- Create DataCard ---------#

# create fake data
X, y = create_fake_data(n_samples=1000, task_type="regression")
X["target"] = y

# Create data interface
data_interface = PandasData(
    data=X,
    data_splits=[
        DataSplit(label="train", column_name="col_1", column_value=0.5, inequality=">="),
        DataSplit(label="test", column_name="col_1", column_value=0.5, inequality="<"),
    ],
    dependent_vars=["target"],
)

# Create and register datacard
datacard = DataCard(interface=data_interface, info=info)
registries.data.register_card(card=datacard)

#--------- Create ModelCard ---------#

# split data
data = datacard.split_data()

# fit model
reg = LinearRegression()
reg.fit(data["train"].X.to_numpy(), data["train"].y.to_numpy())

# create model interface
interface = SklearnModel(
    model=reg,
    sample_data=data["train"].X.to_numpy(),
    task_type="regression",  # optional
)

# create modelcard
modelcard = ModelCard(
    interface=interface,
    info=info,
    to_onnx=True,  # convert the model to ONNX
    datacard_uid=datacard.uid,  # modelcards must be associated with a datacard
)
registries.model.register_card(card=modelcard)
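
The two DataSplit entries in the quickstart describe a row filter: rows where col_1 >= 0.5 go to "train" and the rest go to "test". The sketch below illustrates that idea in plain Python. It is not opsml's implementation; `OPS` and `split_rows` are hypothetical names used only for illustration.

```python
import operator

# Map inequality strings like those used in the quickstart to comparison functions.
OPS = {
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    "==": operator.eq,
}


def split_rows(rows, column_name, column_value, inequality):
    """Return the rows whose column satisfies the given inequality."""
    compare = OPS[inequality]
    return [row for row in rows if compare(row[column_name], column_value)]


rows = [
    {"col_1": 0.1, "target": 1.0},
    {"col_1": 0.5, "target": 2.0},
    {"col_1": 0.9, "target": 3.0},
]

train = split_rows(rows, "col_1", 0.5, ">=")
test = split_rows(rows, "col_1", 0.5, "<")

print(len(train), len(test))  # 2 1
```

Because the two conditions are complementary, every row lands in exactly one split.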

Usage

Now that opsml is installed, you're ready to start using it!

See the official documentation website (https://demml.github.io/opsml/) for more information on how to use opsml.

Advanced Installation Scenarios

Opsml is designed to work with a variety of 3rd-party integrations depending on your use-case.

Types of extras that can be installed:

  • Postgres: Installs the postgres psycopg2 dependency to be used with Opsml

    poetry add "opsml[postgres]"

  • Server: Installs the packages needed to set up a FastAPI-based Opsml server

    poetry add "opsml[server]"

  • GCP with mysql: Installs mysql and gcsfs to be used with Opsml

    poetry add "opsml[gcs,mysql]"

  • GCP with mysql (cloud-sql): Installs mysql and cloud-sql gcp dependencies to be used with Opsml

    poetry add "opsml[gcp_mysql]"

  • GCP with postgres: Installs postgres and gcsfs to be used with Opsml

    poetry add "opsml[gcs,postgres]"

  • GCP with postgres (cloud-sql): Installs postgres and cloud-sql gcp dependencies to be used with Opsml

    poetry add "opsml[gcp_postgres]"

  • AWS with postgres: Installs postgres and s3fs dependencies to be used with Opsml

    poetry add "opsml[s3,postgres]"

  • AWS with mysql: Installs mysql and s3fs dependencies to be used with Opsml

    poetry add "opsml[s3,mysql]"
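
To check which optional dependencies are present in the current environment, a small probe like the one below can help. It is a sketch only: the mapping from extras to importable module names is an assumption based on the packages named in the list above, and `installed_extras` is a hypothetical helper, not part of opsml.

```python
from importlib.util import find_spec

# Module names taken from the extras described above; the mapping itself is
# an illustrative assumption, not something opsml exposes.
EXTRA_MODULES = {
    "postgres": "psycopg2",
    "server": "fastapi",
    "gcs": "gcsfs",
    "s3": "s3fs",
}


def installed_extras():
    """Report, per extra, whether its main dependency can be imported."""
    return {
        extra: find_spec(module) is not None
        for extra, module in EXTRA_MODULES.items()
    }


for extra, available in installed_extras().items():
    print(f"{extra}: {'installed' if available else 'missing'}")
```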

Environment Variables

The following environment variables are used to configure opsml. When using opsml as a client (i.e., not running a server), the only variable that must be set is OPSML_TRACKING_URI.

Name | Description
--- | ---
APP_ENV | The environment to use. Supports development, staging, and production.
GOOGLE_ACCOUNT_JSON_BASE64 | The base64 string of the GCP service account to use.
OPSML_MAX_OVERFLOW | The SQL "max_overflow" size. Defaults to 5.
OPSML_POOL_SIZE | The SQL connection pool size. Defaults to 10.
OPSML_STORAGE_URI | The location of storage to use. Supports a local file system, AWS, and GCS. Example: gs://some-bucket
OPSML_TRACKING_URI | Used when logging artifacts to an opsml server (a.k.a., the server which "tracks" artifacts).
OPSML_USERNAME | An optional server username. If the server is set up with login enabled, all clients must use HTTP basic auth with this username.
OPSML_PASSWORD | An optional server password. If the server is set up with login enabled, all clients must use HTTP basic auth with this password.
OPSML_RUN_ID | If set, the run will be automatically loaded when creating new cards.
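
The sketch below shows one way these variables could be read with their documented defaults, how an HTTP basic auth header is formed from a username and password, and how a base64-encoded service account string decodes to JSON. It is illustrative only: `OpsmlEnv`, `load_env`, `basic_auth_header` and `decode_service_account` are hypothetical names, the URL is a placeholder, and opsml's own configuration handling may differ.

```python
import base64
import json
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpsmlEnv:
    """Hypothetical container for a subset of the variables listed above."""

    tracking_uri: Optional[str]
    storage_uri: Optional[str]
    pool_size: int
    max_overflow: int
    username: Optional[str]
    password: Optional[str]


def load_env(env=os.environ):
    """Read settings from a mapping, applying the documented defaults."""
    return OpsmlEnv(
        tracking_uri=env.get("OPSML_TRACKING_URI"),
        storage_uri=env.get("OPSML_STORAGE_URI"),
        pool_size=int(env.get("OPSML_POOL_SIZE", 10)),
        max_overflow=int(env.get("OPSML_MAX_OVERFLOW", 5)),
        username=env.get("OPSML_USERNAME"),
        password=env.get("OPSML_PASSWORD"),
    )


def basic_auth_header(username, password):
    """Build a standard HTTP basic auth header value."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_service_account(encoded):
    """Decode a base64 string holding service account JSON into a dict."""
    return json.loads(base64.b64decode(encoded))


# Placeholder values (assumptions) passed as a plain dict instead of os.environ.
settings = load_env({"OPSML_TRACKING_URI": "http://localhost:8888", "OPSML_POOL_SIZE": "20"})
print(settings.pool_size, settings.max_overflow)
print(basic_auth_header("user", "pass"))
```

Passing a plain dict to `load_env` keeps the example independent of whatever is set in the real environment.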

Supported Libraries

Opsml is designed to work with a variety of ML and data libraries. The following libraries are currently supported:

Data Libraries

Name | Opsml Implementation
--- | ---
Pandas | PandasData
Polars | PolarsData
Torch | TorchData
Arrow | ArrowData
Numpy | NumpyData
Sql | SqlData
Text | TextDataset
Image | ImageDataset

Model Libraries

Name | Opsml Implementation
--- | ---
Sklearn | SklearnModel
LightGBM | LightGBMModel
XGBoost | XGBoostModel
CatBoost | CatBoostModel
Torch | TorchModel
Torch Lightning | LightningModel
TensorFlow | TensorFlowModel
HuggingFace | HuggingFaceModel
Vowpal Wabbit | VowpalWabbitModel

Contributing

If you'd like to contribute, be sure to check out our contributing guide! If you'd like to work on any outstanding items, check out the roadmap section in the docs and get started 😃

Thanks go to these phenomenal projects and people for creating a great foundation to build from!

opsml's Issues

PromptRecord for LLMs

Description

Given that type validation and versioning of data and model artifacts are two core features of opsml, it seems logical to extend FileRecord to support PromptRecord and allow DataCards to accept a PromptRecord as an input. This would let users who are developing prompts for LLMs validate the types of their prompts and version them.

Moreover, we should consider making a host of prompt types in addition to a subclassable base.

Example Prompt Types:

  • TextClassification
  • NamedEntityRecognition
  • Translation
  • TextSummarization
  • QuestionAnswering
  • Reasoning

Potential class capabilities:

  • inject named arguments into text
  • support for validations (prevention of adversarial attacks?)
  • support formatted text to json
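
To make the proposal concrete, here is a rough sketch of what a subclassable prompt base with named-argument injection, basic validation and JSON output might look like. Everything in it is hypothetical: `PromptRecord` does not exist in opsml at the time of this issue, and the class and method names are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass
from string import Formatter


@dataclass
class PromptRecord:
    """Hypothetical prompt record; not part of opsml's API."""

    name: str
    template: str

    def fields(self):
        """Return the named placeholders found in the template."""
        parsed = Formatter().parse(self.template)
        return {field for _, field, _, _ in parsed if field}

    def render(self, **kwargs):
        """Inject named arguments, rejecting missing or unexpected ones."""
        expected = self.fields()
        missing = expected - set(kwargs)
        unexpected = set(kwargs) - expected
        if missing or unexpected:
            raise ValueError(
                f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
            )
        return self.template.format(**kwargs)

    def to_json(self, **kwargs):
        """Serialize the rendered prompt as a JSON string."""
        return json.dumps({"name": self.name, "prompt": self.render(**kwargs)})


@dataclass
class TranslationPrompt(PromptRecord):
    """Example subclass for one of the prompt types listed above."""

    template: str = "Translate the following text into {language}: {text}"


prompt = TranslationPrompt(name="translate-v1")
print(prompt.to_json(language="French", text="Good morning"))
```

The validation here only checks argument names. Guarding against adversarial input, as the issue suggests, would need additional checks that this sketch does not attempt.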

Authentication

Description

Add authentication to the opsml UI and SDK for those who need an additional layer of security.
