seldonio / mlserver

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more

Home Page: https://mlserver.readthedocs.io/en/latest/

License: Apache License 2.0

Makefile 0.26% Python 97.92% Shell 0.70% Dockerfile 0.46% Jinja 0.11% JavaScript 0.55%
machine-learning scikit-learn xgboost lightgbm mlflow seldon-core kfserving

mlserver's Introduction

MLServer

An open source inference server for your machine learning models.


Overview

MLServer aims to provide an easy way to start serving your machine learning models through a REST and gRPC interface, fully compliant with KFServing's V2 Dataplane spec. Watch a quick video introducing the project here.

  • Multi-model serving, letting users run multiple models within the same process.
  • Ability to run inference in parallel for vertical scaling across multiple models through a pool of inference workers.
  • Support for adaptive batching, to group inference requests together on the fly.
  • Scalability with deployment in Kubernetes native frameworks, including Seldon Core and KServe (formerly known as KFServing), where MLServer is the core Python inference server used to serve machine learning models.
  • Support for the standard V2 Inference Protocol on both the gRPC and REST flavours, which has been standardised and adopted by various model serving frameworks.

You can read more about the goals of this project on the initial design document.

Usage

You can install the mlserver package by running:

pip install mlserver

Note that to use any of the optional inference runtimes, you'll need to install the relevant package. For example, to serve a scikit-learn model, you would need to install the mlserver-sklearn package:

pip install mlserver-sklearn

For further information on how to use MLServer, you can check any of the available examples.
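
As a quick, illustrative sketch (the model name and artifact path below are placeholders), a scikit-learn model can then be exposed by dropping a small model-settings.json file next to the serialised model:

{
  "name": "iris-sklearn",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib"
  }
}

Running mlserver start . from that folder should then serve the model over both REST and gRPC; see the examples for complete, working walkthroughs.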

Inference Runtimes

Inference runtimes allow you to define how your model should be used within MLServer. You can think of them as the backend glue between MLServer and your machine learning framework of choice. You can read more about inference runtimes in their documentation page.

Out of the box, MLServer comes with a set of pre-packaged runtimes which let you interact with a subset of common frameworks. This allows you to start serving models saved in these frameworks straight away. However, it's also possible to write custom runtimes.
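
For instance, a custom runtime is roughly a subclass of MLModel that overrides load() and predict(). The snippet below is a minimal sketch (the output name, datatype and the trivial "model" are placeholders; check the inference runtimes documentation for the exact interface):

from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput


class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # Load the model artifact here (e.g. from the model settings' URI).
        self._model = lambda xs: [float(x) * 2 for x in xs]  # placeholder model
        return True

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Run inference on the first input tensor and wrap the result
        # in a V2-compliant response.
        data = payload.inputs[0].data
        result = self._model(data)
        return InferenceResponse(
            model_name=self.name,
            outputs=[
                ResponseOutput(
                    name="output-0",
                    shape=[len(result)],
                    datatype="FP32",
                    data=result,
                )
            ],
        )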

Out of the box, MLServer provides support for:

Framework       Documentation
Scikit-Learn    MLServer SKLearn
XGBoost         MLServer XGBoost
Spark MLlib     MLServer MLlib
LightGBM        MLServer LightGBM
CatBoost        MLServer CatBoost
Tempo           github.com/SeldonIO/tempo
MLflow          MLServer MLflow
Alibi-Detect    MLServer Alibi Detect
Alibi-Explain   MLServer Alibi Explain
HuggingFace     MLServer HuggingFace

MLServer is licensed under the Apache License, Version 2.0. However, please note that software used in conjunction with, or alongside, MLServer may be licensed under different terms. For example, Alibi Detect and Alibi Explain are both licensed under the Business Source License 1.1. For more information about the legal terms of products that are used in conjunction with or alongside MLServer, please refer to their respective documentation.

Supported Python Versions

🔴 Unsupported

🟠 Deprecated: To be removed in a future version

🟢 Supported

🔵 Untested

Python Version Status
3.7 🔴
3.8 🔴
3.9 🟢
3.10 🟢
3.11 🔵
3.12 🔵

Examples

To see MLServer in action, check out our full list of examples, which showcases how you can leverage MLServer to start serving your machine learning models.

Developer Guide

Versioning

Both the main mlserver package and the inference runtimes packages try to follow the same versioning schema. To bump the version across all of them, you can use the ./hack/update-version.sh script.

In between releases, we generally keep the version set to a development placeholder for the upcoming version.

For example:

./hack/update-version.sh 0.2.0.dev1

Testing

To run all of the tests for MLServer and the runtimes, use:

make test

To run the tests for a single file, use something like:

tox -e py3 -- tests/batch_processing/test_rest.py

mlserver's People

Contributors

adriangonz, agrski, ascillitoe, axsaucedo, dependabot[bot], dtpryce, edshee, fogdong, github-actions[bot], iamahern, jesse-c, joerunde, johnpaulett, krishanbhasin-gc, m4nouel, mert-kirpici, nanbo-liu, njhill, pablobgar, pauledwardbrennan, pepesi, rafalskolasinski, rio, saeid93, sakoush, salehbigdeli, seldondev, ukclivecox, vanducng, vtaskow


mlserver's Issues

[gRPC] Model outputs get ignored

Hi,

Thank you for this nice project. I am using the following Go code to create a gRPC request for accessing a sklearn model. In my understanding, the model should use the predict_proba() method and return its result as the output "predict_proba"; however, I only get the output "predict" back.

	return &pb.ModelInferRequest{
		ModelName: c.modelName,
		Outputs: []*pb.ModelInferRequest_InferRequestedOutputTensor{
			{
				Name: "predict_proba",
			},
		},
		Inputs: []*pb.ModelInferRequest_InferInputTensor{
			{
				Name:     c.modelName,
				Datatype: "FP32",
				Shape:    []int64{1, int64(len(profile))},
				Contents: &pb.InferTensorContents{
					Fp32Contents: profile,
				},
			},
		},
	}

If I add an output {Name: "test"}, the code in https://github.com/SeldonIO/MLServer/blob/0936e26354a6df77c7692faaa9c9467f5f674573/runtimes/sklearn/mlserver_sklearn/sklearn.py should raise an InferenceError exception at line 58, but this does not happen; I assume that payload.outputs is None. I have also tested adding another input, which leads to the exception at line 46.

Am I setting the Outputs accordingly? How can I access sklearn's predict_proba method?
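
For comparison, the equivalent request in the REST flavour of the V2 protocol would look roughly like the sketch below (the input name, shape and data values are placeholders):

{
  "inputs": [
    {
      "name": "input-0",
      "datatype": "FP32",
      "shape": [1, 4],
      "data": [0.1, 0.2, 0.3, 0.4]
    }
  ],
  "outputs": [
    { "name": "predict_proba" }
  ]
}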

Add support for tracing

Trace each inference step within MLServer. These traces can be pushed to Jaeger or similar OpenTracing backends.

Add support to provide feedback

Seldon Core currently supports providing a “reward signal” as feedback for a model’s predictions. This is received by the model as a request sent to a /feedback endpoint. Since this modifies the server protocol, it would be good to consider adding this as a server extension for MLServer.

Add lockfile to MLServer

To better support use cases where mlserver is used as a library, we shouldn't restrict the dependencies versions too much. Instead, we should look into adding some sort of lockfile (e.g. through Poetry or Conda) that locks the versions in the Docker image (so that the "app-level" environment is kept consistent).

Package up environment as part of MLServer CLI

MLServer now has built-in support to unpack and activate a conda-pack tarball. This feature could be leveraged to run the custom environment defined in the conda.yaml file usually present in MLflow model artifacts. Since MLServer expects a tarball, this issue should explore best practices for going from a conda.yaml file to a conda-pack tarball, reducing the potential friction for the user.

One potential solution is to include a utility on the mlserver-mlflow package which bridges this gap for the users.

V2 Inference Format deviation

Describe the bug

When I apply an XGB model using the Seldon Core XGBoost server, the V2 inference endpoint returns a non-standard response. The outputs.data tensor should be a flattened array, but Seldon currently responds with a nested NumPy tensor.

To reproduce

According to the spec, inference responses should return the shape of the data, while the data itself should be a flattened array.

My data object should be [0.005064773838967085, 0.007540544960647821, 0.9873946905136108, 0.9904587268829346, 0.005667048040777445, 0.00387418526224792]

With Seldon right now, the data comes back as a nested, multidimensional array.

{'model_name': 'classifier', 'model_version': 'v1', 'id': '0', 'parameters': None, 'outputs': [{'name': 'predict', 'shape': [2, 3], 'datatype': 'FP32', 'parameters': None, 'data': [[0.005064773838967085, 0.007540544960647821, 0.9873946905136108], [0.9904587268829346, 0.005667048040777445, 0.00387418526224792]]}]}
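
For reference, a spec-compliant response for the same result would keep the [2, 3] shape but flatten the data, roughly like this (sketch based on the values above):

{
  "model_name": "classifier",
  "model_version": "v1",
  "id": "0",
  "outputs": [
    {
      "name": "predict",
      "shape": [2, 3],
      "datatype": "FP32",
      "data": [0.005064773838967085, 0.007540544960647821, 0.9873946905136108, 0.9904587268829346, 0.005667048040777445, 0.00387418526224792]
    }
  ]
}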

Environment

Model Details

  • Images of your model: [Output of: kubectl get seldondeployment -n <yourmodelnamespace> <seldondepname> -o yaml | grep image: where <yourmodelnamespace>]
  • Logs of your model: [You can get the logs of your model by running kubectl logs -n <yourmodelnamespace> <seldonpodname> <container>]

YAML Spec through Seldon Core

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: xgboost
  labels: 
    mlctl_name: seldon-xgb-iris
    mlctl_type: model_deployment
spec:
  name: iris
  protocol: kfserving # Activate the V2 protocol
  predictors:
  - graph:
      children: []
      implementation: XGBOOST_SERVER
      modelUri: s3://iris
      envSecretRefName: seldon-init-container-secret
      name: classifier
    name: default
    replicas: 1

Publish in PyPi

The mlserver package is currently not published on PyPI. This leads to workarounds (e.g. pip install git+github.com/SeldonIO/mlserver) which are not ideal.

Explore content parsing / casting

Seldon Core currently allows parsing / casting the user’s payload into a friendlier format. For example, if the user sends a payload such as {"jsonData": {"foo": "bar"}}, their def predict(payload) method will receive a JSON object. Likewise, if they send a payload such as {"data": {"ndarray": [0, 1, 2, 3]}}, their predict method will receive a Numpy array.

Currently, the parsed types supported by SC are:

  • Strings, sent as strData.
  • JSON (also valid through gRPC), sent as jsonData.
  • Binary data, encoded as base64, sent as binData.
  • Numpy arrays, sent as data.ndarray or data.tensor.
  • TFServing arrays, sent as tftensor.

However, the V2 Data Plane only allows sending data as either BYTES, BOOL or numeric formats (e.g. FP32). It could be useful to extend this so that the user can provide more information about the payload, which would then allow MLServer to cast it to other types.

Proposal

The idea is that the server reads the raw content from the data field (as per
the V2 data plane), but then looks at an extra key under parameters.content_type which
dictates how that content should be parsed. This could mean reading it as a Numpy array, a JSON dictionary or any other type.

This extension could be implemented as a middleware (going in and out).

Example for a string

The datatype can be raw BYTES, with parameters.content_type set to string.

{
  "datatype": "BYTES",
  "data": "this is my query",
  "parameters": {
    "content_type": "string"
  }
}

A server with this extension could then read the data field and decode it as UTF8 to treat it as a string.

Example for a Numpy array

The datatype can be set to FP32, and the user can specify that the payload should be treated as a Numpy array by setting parameters.content_type to ndarray.

{
  "datatype": "FP32",
  "data": [0, 12, 2, 3, 4, 5],
  "parameters": {
    "content_type": "ndarray"
  }
}

A server with this extension could then read the data field and parse that as a Numpy array.

Example for JSON

The datatype can be raw BYTES, with parameters.content_type set to json.

{
  "datatype": "BYTES",
  "data": "{ 'foo': 'bar' }",
  "parameters": {
    "content_type": "json"
  }
}

A server with this extension could then read the data field and parse that as
JSON.

Example for an image

The datatype and data fields would respect the V2 API, but the user can
specify an image type through parameters.content_type.

{
  "datatype": "FP32",
  "data": [0, 1, 2, 3, 3, 4, 5, 6, 6, 67, 7, 7],
  "parameters": {
    "content_type": "image"
  }
}

The server could then read the data and transform it into a PIL.Image instance.
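
Putting the examples together, the middleware could boil down to a small dispatch on parameters.content_type. The helper below is a rough, hypothetical sketch (not MLServer's actual implementation; the image case assumes the tensor's shape describes H x W x C pixel values):

import json

import numpy as np
from PIL import Image


def decode_input(tensor: dict):
    """Decode a single V2 input tensor according to its content_type hint."""
    content_type = (tensor.get("parameters") or {}).get("content_type")
    data = tensor["data"]

    if content_type == "string":
        # Raw BYTES decoded as UTF-8 text.
        return data if isinstance(data, str) else bytes(data).decode("utf-8")
    if content_type == "json":
        # Raw BYTES parsed as a JSON document.
        return json.loads(data)
    if content_type == "ndarray":
        # Numeric data reshaped into a NumPy array.
        return np.array(data).reshape(tensor.get("shape", [len(data)]))
    if content_type == "image":
        # Numeric data reinterpreted as pixel values.
        pixels = np.array(data, dtype=np.uint8).reshape(tensor["shape"])
        return Image.fromarray(pixels)

    # No content_type hint: return the raw data untouched.
    return data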

Support Alibi Detect as Prepackaged Server

The Alibi Detect server could become a pre-packaged server in itself, as it has a similar workflow to normal models. This means it could reuse much of the workflow already implemented in Seldon Core's alibi-detect-server to build a base that can import outlier detectors, drift detectors and adversarial detectors from Alibi Detect.

Add support for poll mode

Watch for changes on the model repository to refresh models automatically whenever there are new versions of the model artifacts.

Support MLflow current protocol

As a follow-up to #167, it would be interesting to explore adding a custom endpoint to the mlserver-mlflow runtime which supports MLflow's existing API. This would help reduce friction on user adoption of MLServer, as well as serve as a temporary stopgap for users while they adopt the V2 protocol.

Explore buildpacks

It seems that buildpacks offer an easy way to go from code to image. This could be leveraged by MLServer to ease the process of building custom inference runtimes.

Expose MLflow's model signature types as "annotated" model metadata

As a follow-up to #163 and #164, it would be interesting to explore whether it's possible to expose the type information lost when converting from MLflow's model signature to the V2 Dataplane, through annotations in the V2 model metadata.

For example, in the case of MLflow's string type, one could annotate the V2 Dataplane metadata for that particular input saying that even though it should be encoded as BYTES at the low-level, it's meant to be compatible with MLflow's string type.

Multi-language Runtimes

Create multi-language wrappers that can run Java, C++ and R models. For this, we can build on the existing research in Seldon Core, which leverages tools like JNI and PyBind to bridge from Python to other runtimes.

Infer model's name from folder's name

Currently, particularly in the case of MMS, MLServer requires models to specify their name in a model-settings.json file. This forces all models to ship that file alongside their model artifacts.

It would be good to, instead, infer the model's name from the folder name (if not present in env). This would reduce friction on adopting MLServer, the V2 protocol and MMS.
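
A minimal sketch of that fallback could look like the hypothetical helper below (names are illustrative only):

import os
from typing import Optional


def resolve_model_name(model_folder: str, settings_name: Optional[str] = None) -> str:
    # Prefer the name from model-settings.json (or the environment);
    # otherwise, fall back to the folder's name.
    return settings_name or os.path.basename(os.path.normpath(model_folder))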

Add support for Alibi Explain

It would be good to add a custom extension that exposes an "explain" endpoint, similar to what's currently exposed in SC and KFServing. This would involve loading an explainer as a model.

Allow to load settings and model-settings from CLI flags

Currently, mlserver relies on having settings.json and model-settings.json files present, or falls back to environment variables. It would be good to also allow users to specify these settings directly through CLI flags.

For that, we should look for an integration between Pydantic (which we use to define the settings parameters) and some CLI library. We are currently using click for our CLI, but there doesn't seem to be an off-the-shelf integration between the two projects.
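
As a rough illustration of the idea (not MLServer's actual CLI; assumes Pydantic v1-style BaseSettings and placeholder setting names), CLI flags could simply be collected by click and passed as overrides when instantiating the settings:

import click
from pydantic import BaseSettings


class Settings(BaseSettings):
    http_port: int = 8080
    grpc_port: int = 8081


@click.command()
@click.option("--http-port", type=int, default=None)
@click.option("--grpc-port", type=int, default=None)
def start(http_port, grpc_port):
    # Only the flags that were actually provided override file / env settings.
    provided = {"http_port": http_port, "grpc_port": grpc_port}
    overrides = {key: value for key, value in provided.items() if value is not None}
    settings = Settings(**overrides)
    click.echo(f"Starting MLServer with {settings}")


if __name__ == "__main__":
    start()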

Add support for dataframe inputs to MLflow models

As a follow-up to #160, some MLflow models require their inputs to be in the form of a Pandas Dataframe. In order to support these models, it would be good if the mlserver-mlflow runtime could convert V2 payloads to Pandas Dataframes.

Note that this should be straightforward for V2 payloads with multiple "input heads", where each input head can be treated as a column. That is, a payload such as:

{
  "inputs": [
    {
        "name": "a",
        "data": [1, 4],
        "shape": [2],
        "datatype": "INT32"
    },
      {
        "name": "b",
        "data": [2, 5],
        "shape": [2],
        "datatype": "INT32"
    },
      {
        "name": "c",
        "data": [3, 6],
        "shape": [2],
        "datatype": "INT32"
    }
  ]
}

could be encoded as the following Pandas DataFrame:

{
    "columns": ["a", "b", "c"],
    "data": [[1, 2, 3], [4, 5, 6]]
}
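
As a rough sketch (illustrative only), that conversion could treat each V2 input head as a DataFrame column:

import pandas as pd


def inputs_to_dataframe(payload: dict) -> pd.DataFrame:
    # Each input head becomes a column; rows follow the order of the data.
    return pd.DataFrame({inp["name"]: inp["data"] for inp in payload["inputs"]})


# For the payload above, this yields a DataFrame with columns a, b, c
# and rows [1, 2, 3] and [4, 5, 6].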

Add support for custom gRPC endpoints

On top of #167, it would be great to extend the support for custom endpoints to gRPC calls as well. However, it's not clear at the moment whether this is easily achievable.

Explore "tags" extension to model metadata

The model metadata, as defined by the V2 Dataplane, currently has a hardcoded set of data types to specify for each input (e.g. BYTES, INT32, etc.). While these types are great for encoding lower-level representations, they can fall short in cases where we need to supply information about a "richer", higher-level data type.

For example, we can think of an image input. We can currently encode image objects as BYTES inputs. However, the model metadata doesn't provide any information about how this encoding should look. Should it be RGB, BGR or 8-bit greyscale? What image size does the model expect?

Since this information can be quite arbitrary (and probably shouldn't be explicitly defined in the protocol), it could be interesting to explore extending the model metadata schema to support a simple (1-level) string-to-string dictionary which lets the user encode information such as how an input should be encoded.

Content conversion / casting

To prove some of the value of this extension, this issue should explore how MLServer can leverage this extra information. During the early stages of the project, there was a discussion of how we could implement a "payload conversion" pipeline. This pipeline could use some of the extra metadata to convert "raw inputs" (e.g. BYTES) into fully decoded objects (e.g. a pillow.Image instance).

Expose MLflow's model signature as metadata

MLflow models can define a model signature. The information contained in this model signature has a few parallels with the V2 Dataplane's model metadata. Therefore, it would be good to explore how we could convert from the model signature (which should be present in an MLflow model artifact) to the model metadata on the fly, so that the same information can still be exposed.

Note that there is currently a mismatch between MLflow's native types and the V2 Dataplane accepted data types (e.g. string vs BYTES). Therefore, there may be some loss of information when converting from the former to the latter format. We can explore further down the line how this loss of information can be minimised.

Add MLflow runtime with basic payload support

Add a new mlserver-mlflow runtime which allows a user to point to an MLflow Model artifact (or folder) to load a model. As an initial step, the mlserver-mlflow runtime should take care of converting the V2 Dataplane payload to a "dict of tensors", which is one of the formats expected by MLflow models.

To translate this, we could just turn the V2 input into an "index", where the keys would be the inputs[].name fields. That is, an input such as:

{
  "inputs": [
    {
        "name": "a",
        "data": [1, 4],
        "shape": [2],
        "datatype": "INT32"
    },
      {
        "name": "b",
        "data": [2, 5],
        "shape": [2],
        "datatype": "INT32"
    },
      {
        "name": "c",
        "data": [3, 6],
        "shape": [2],
        "datatype": "INT32"
    }
  ]
}

could be turned into the following MLflow-compatible dictionary of tensors:

{
    "inputs": {
          "a": [1, 4],
          "b": [2, 5],
          "c": [3, 6]
      }
}
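
A rough sketch of that conversion (illustrative only, ignoring datatypes) could look like:

import numpy as np


def inputs_to_tensor_dict(payload: dict) -> dict:
    # Index each tensor by its input name, reshaping according to the V2 shape field.
    return {
        inp["name"]: np.array(inp["data"]).reshape(inp["shape"])
        for inp in payload["inputs"]
    }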

Tensors vs Dataframes

While some MLflow models require their inputs to be encoded as dataframes, others will still need a dictionary of tensors (see #160). To account for this, the scope of this issue includes looking at ways to infer which type of input a model requires, as well as providing a way for the user to choose which input type to use.

The latter could be done through the V2 Protocol's inputs[].parameters field, by setting a "magic key" (e.g. mlflow_encoding: dataframe). This key can then be read by the mlserver-mlflow runtime to choose one encoding or the other.

runtime packages have no source

I'm not a python expert but I think there is an issue with python packages for mlserver runtimes.
If I do the following:
pip install mlserver mlserver-xgboost mlserver-sklearn
It appears to install all 3 packages; however, only mlserver has a source directory.

First to show where package is installed:

(xgboost-env) williao@williao-G3-3579:~/dev/python/xgboost/sklearn-demo$ pip show mlserver-xgboost
Name: mlserver-xgboost
Version: 0.2.0
Summary: XGBoost runtime for MLServer
Home-page: https://github.com/SeldonIO/MLServer.git
Author: Seldon Technologies Ltd.
Author-email: [email protected]
License: Apache 2.0
Location: /home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages
Requires: mlserver, xgboost
Required-by: 

Then if I list the mlserver package dirs

ls /home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlserver*
/home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlserver:
cli  errors.py  grpc  handlers  __init__.py  model.py  __pycache__  registry.py  repository.py  rest  server.py  settings.py  types  utils.py  version.py

/home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlserver-0.2.0.dist-info:
entry_points.txt  INSTALLER  LICENSE  METADATA  RECORD  top_level.txt  WHEEL

/home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlserver_sklearn-0.2.0.dist-info:
INSTALLER  LICENSE  METADATA  RECORD  top_level.txt  WHEEL

/home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlserver_xgboost-0.2.0.dist-info:
INSTALLER  LICENSE  METADATA  RECORD  top_level.txt  WHEEL

You can see that the mlserver package has a source dir (mlserver) and a dist-info dir (mlserver-0.2.0.dist-info), but mlserver-xgboost and mlserver-sklearn only have dist-info dirs.

If I run mlserver start, I get an error that the package/module is missing:

mlserver start .
implementation
  ensure this value contains valid import path or valid callable: No module named 'mlserver_sklearn' (type=type_error.pyobject; error_message=No module named 'mlserver_sklearn')

The workaround for me is to run pip install -r requirements.txt on a file containing:

git+https://github.com/seldonio/mlserver#egg=mlserver
git+https://github.com/seldonio/mlserver#egg=mlserver-xgboost&subdirectory=runtimes/xgboost
git+https://github.com/seldonio/mlserver#egg=mlserver-sklearn&subdirectory=runtimes/sklearn

After this new install there is a source directory:

(xgboost-env) williao@williao-G3-3579:~/dev/python/xgboost$ ls /home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlse*
/home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlserver:
cli  errors.py  grpc  handlers  __init__.py  model.py  __pycache__  registry.py  repository.py  rest  server.py  settings.py  types  utils.py  version.py

/home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlserver-0.2.1.dev0-py3.7.egg-info:
dependency_links.txt  entry_points.txt  installed-files.txt  PKG-INFO  requires.txt  SOURCES.txt  top_level.txt

/home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlserver_sklearn:
__init__.py  __pycache__  sklearn.py  version.py

/home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlserver_sklearn-0.2.1.dev0-py3.7.egg-info:
dependency_links.txt  installed-files.txt  PKG-INFO  requires.txt  SOURCES.txt  top_level.txt

/home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlserver_xgboost:
__init__.py  __pycache__  version.py  xgboost.py

/home/williao/dev/python/xgboost/xgboost-env/lib/python3.7/site-packages/mlserver_xgboost-0.2.1.dev0-py3.7.egg-info:
dependency_links.txt  installed-files.txt  PKG-INFO  requires.txt  SOURCES.txt  top_level.txt

and mlserver start . works

Empty fields (except `name`) in the returned metadata when using mlserver-sklearn in KFServing

Hi experts,

I was trying to follow the steps listed in https://github.com/kubeflow/kfserving/tree/master/docs/samples/v1beta1/sklearn/v2 to set up an InferenceService supporting the V2 protocol. The infer interface works fine. However, when I tried to retrieve the model metadata via requests like:

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/sklearn-irisv2

All I got is

{"name":"sklearn-irisv2","versions":[],"platform":"","inputs":[],"outputs":[]}

Is this expected, or am I missing anything? I want to get the input/output tensor metadata.

Let me know if I should post it in KFServing repo instead. Thanks for the help!

Support custom endpoints and payloads

The V2 Dataplane allows for the concept of “extensions”: endpoints outside of the V2 Protocol specification, which take any arbitrary payload. However, this would usually require outlining the full gRPC spec of the new endpoint. An alternative could be to provide a new endpoint which only works over HTTP. While this may be sub-optimal, it would still offer a way for users to write their own custom protocols.

The capability to provide new endpoints would be introduced at the runtime level. That is, an inference runtime would be able to register a “custom endpoint”.

class CustomRuntime(MLModel):
    # ...

    @mlserver.custom_endpoint("/my-custom-endpoint")
    def invocations(self, payload: dict) -> dict:
        # Parse custom protocol payload and call model
        pass 

This would register an endpoint as /models/<model-name>/versions/<version>/<custom-path>. When a request is sent to this endpoint, MLServer would check whether the inference runtime used for model <model-name> supports the custom endpoint with path <custom-path>. If it doesn’t, it would then return a 404. Otherwise, it would route the request to a method registered as the handler for the custom endpoint in the inference runtime.

Add support for metrics

Add a custom extension which exposes a metrics API that can be scraped by Prometheus. We can base this on Triton's statistics extension to ensure API-wise parity.

We could also use this chance to explore OpenTelemetry and whether we could expose vendor-agnostic metrics.

Handle multiple models with custom endpoints

Following up from #167, there are a few things to take into account before adding support for custom endpoints across multiple models.

At the moment we just load the route as-is, without namespacing it with the model name. This means that loading multiple models could lead to clashes. For example, let's think of an inference runtime which registers a custom endpoint with path /foo. After loading 10 model instances using that runtime, it wouldn't be clear any more which one should be used to serve the custom endpoint with path /foo.

Some solutions that could be explored are:

  • Disable custom endpoints when MMS is enabled. This would require adding the option to disable MMS.
  • Namespace the custom path, e.g. registering /v2/models/<model-name>/versions/<model-version>/foo instead of just /foo. This could be easy to tackle, but it's not clear whether the custom endpoint would be as useful after adding the extra prefix (i.e. mainly as it would make it incompatible with legacy upstream services).

Parallel Inference

With the support for MMS, there is a question of how we can support parallel inference at the same time across multiple models. The main blocker for this is Python's GIL, which blocks true parallelism within the same process.

The scope of this issue is to explore different alternatives (e.g. using multiprocessing) to support parallel inference.
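
As a very rough illustration of the multiprocessing route (not MLServer's actual design), CPU-bound predictions could be dispatched to a pool of worker processes from the async server, sidestepping the GIL:

import asyncio
from concurrent.futures import ProcessPoolExecutor


def _predict(batch):
    # Stand-in for a CPU-bound model call; runs in a separate process.
    return sum(batch)


async def parallel_predict(batches):
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=4) as pool:
        tasks = [loop.run_in_executor(pool, _predict, batch) for batch in batches]
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(parallel_predict([[1, 2, 3], [4, 5, 6]])))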

Alibi Runtime

Create an "inference" runtime that lets you run Alibi Detectors and Explainers.

Getting started and usage documentation

There are currently no docs for MLServer. We should work on adding an initial set of documentation that allows users to get kickstarted on using MLServer. Initially, we could focus on adding just a "getting started" and a "usage" section to the README.md page.

Add request-level content type

Following #163, it would be good to extend the content_type concept to entire requests. That is, encoding / decoding a set of inputs that go together, like a DataFrame or a dictionary of tensors.

Convert input types based on MLflow's model signature

MLflow models provide a typing mechanism through their model signatures. These types are used to validate the input payloads, so it is important that the mlserver-mlflow runtime takes them into account when converting from the V2 Protocol to an MLflow-valid input (see #160 and #161).

There is currently a mismatch between the types supported by the V2 Dataplane and MLflow native types. For example,

  • MLflow allows you to set an input column as a string, whereas BYTES is the closest type in the V2 Dataplane.
  • MLflow allows you to set an input column as binary, which is expected to be a base64 string. This is different to the BYTES field in the V2 Dataplane (which doesn't explicitly specify how this field needs to be transmitted).

This conversion will also need to take into account how to convert between the V2 "lower-level" types and MLflow's native types.
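
An illustrative (and deliberately non-exhaustive) mapping between MLflow signature types and V2 Dataplane datatypes could look like the sketch below; the string and binary rows are the lossy cases discussed above:

MLFLOW_TO_V2_DATATYPE = {
    "boolean": "BOOL",
    "integer": "INT32",
    "long": "INT64",
    "float": "FP32",
    "double": "FP64",
    "string": "BYTES",  # lossy: V2 BYTES carries no "this is text" hint
    "binary": "BYTES",  # lossy: base64 vs raw bytes is not captured by V2
}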
