A common pattern for deploying Machine Learning (ML) models into production environments (e.g. an ML model trained using the SciKit-Learn package in Python and ready to provide predictions on new data) is to expose them as RESTful API microservices hosted from within Docker containers. These containers are in turn deployed to a cloud environment that handles everything required for maintaining continuous availability, such as fail-over, auto-scaling, load balancing and rolling service updates.
The configuration details for a continuously available cloud deployment are specific to the targeted cloud provider(s) - e.g. the deployment process and topology for Amazon Web Services is not the same as that for Microsoft Azure, which in-turn is not the same as that for Google Cloud Platform. This constitutes knowledge that needs to be acquired for every targeted cloud provider. Furthermore, it is difficult (some would say near impossible) to test entire deployment strategies locally, which makes issues such as networking hard to debug.
Kubernetes is a container orchestration platform that seeks to address these issues. Briefly, it provides a mechanism for defining entire microservice-based application deployment topologies and their service-level requirements for maintaining continuous availability. It is agnostic to the targeted cloud provider, can be run on-premises and even locally on your laptop - all that's required is a cluster of virtual machines running Kubernetes - i.e. a Kubernetes cluster.
This repository contains sample code, configuration files and Kubernetes instructions for demonstrating how a simple Python ML model can be turned into a production-grade RESTful model scoring (or prediction) API service, using Docker and Kubernetes - both locally and with Google Cloud Platform (GCP). It is not a comprehensive guide to Kubernetes, Docker or ML. Think of it more as an 'ML on Kubernetes 101' for demonstrating capability and allowing newcomers to Kubernetes (e.g. data scientists who are more focused on building models than on deploying them) to get up-and-running quickly and become familiar with the basic concepts.
We will demonstrate ML model deployment using two different approaches: a first-principles approach using Docker and Kubernetes; and then a deployment using the Seldon-Core framework for managing ML model pipelines on Kubernetes. The former will help us to appreciate the latter, which constitutes a powerful framework for deploying and performance-monitoring many complex ML model pipelines.
We start by demonstrating how to achieve this basic competence using the simple Python ML model scoring REST API contained in the py-flask-ml-score-api/api.py module, together with the Dockerfile within the py-flask-ml-score-api directory. If you're already feeling lost, these files are discussed in the points below; otherwise, feel free to skip to the next section.
- api.py is a Python module that uses the Flask framework for defining a web service (app) with a function (score) that executes in response to an HTTP request to a specific URL (or 'route'). For example, when running the web service locally by executing python api.py, we would reach our function (or 'endpoint') at http://localhost:5000/score. This function takes data sent to it as JSON (de-serialised into a Python dict and made available through Flask's request object) and returns a response (serialised as JSON). In our example function, we expect an array of features, X, that we pass to an ML model. Our example model returns those same features back to the caller, i.e. it is the identity function, which we have chosen for demonstrative purposes. We could just as easily have loaded a pickled SciKit-Learn model, passed the data to its predict method and returned its score for the feature-data as JSON - see here for an example of this in action.
- Dockerfile is a plain-text file of build instructions that allows us to define the contents and configure the operation of our intended Docker container when it is running. This static data, when not executed as a container, is referred to as the 'image'. In our example Dockerfile, we:
  - start from a pre-configured Docker image (python:3.6-slim) that has a version of Linux with Python already installed;
  - copy the contents of the py-flask-ml-score-api local directory to a directory on the image called /usr/src/app;
  - use pip to install the Pipenv package for Python dependency management;
  - use Pipenv to install the dependencies described in Pipfile.lock into a virtual environment on the image;
  - declare port 5000 as the port the running container listens on (it is published to the 'outside world' with the -p flag of docker run);
  - and finally, start our Flask RESTful web service, api.py.
  Building this custom image and asking the Docker daemon to run it (remember that a running image is a 'container') will expose our RESTful ML model scoring service on port 5000 as if it were running on a dedicated virtual machine. Refer to the official Docker documentation for a more comprehensive discussion of these core concepts.
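The following minimal sketch makes the request/response contract of api.py concrete. It is not the repository's api.py. It deliberately swaps Flask for Python's standard-library http.server so that it runs with no dependencies. The names IdentityModel, MODEL, score_payload, ScoreHandler and serve are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class IdentityModel:
    """Hypothetical stand-in for a trained model: predict returns its input."""

    def predict(self, X):
        return X


# In a real service this would be a model unpickled from disk at start-up.
MODEL = IdentityModel()


def score_payload(payload):
    """Map a request body such as {"X": [1, 2]} to {"score": [1, 2]}."""
    features = payload["X"]
    return {"score": MODEL.predict(features)}


class ScoreHandler(BaseHTTPRequestHandler):
    """Serves POST /score, mirroring the route described for api.py."""

    def do_POST(self):
        if self.path != "/score":
            self.send_error(404, "unknown route")
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            body = json.dumps(score_payload(payload)).encode("utf-8")
        except (KeyError, TypeError, ValueError):
            # Malformed JSON, a non-object body or a missing "X" field.
            self.send_error(400, "expected a JSON body with an X field")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=5000):
    """Block forever, listening on all interfaces as the container does."""
    HTTPServer(("0.0.0.0", port), ScoreHandler).serve_forever()
```

Calling serve() starts the stand-in on port 5000, so the curl test request used in this section should get {"score": [1, 2]} back. Swapping IdentityModel for an unpickled SciKit-Learn estimator would need two further changes. First, X must be shaped as a 2-D array of samples by features. Second, the returned NumPy array must be converted with tolist() before it is serialised.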
We assume that there is a Docker client and Docker daemon running locally, that the client is logged into an account on DockerHub and that there is a terminal open in this project's root directory. To build the image described in the Dockerfile, run,
docker build --tag alexioannides/test-ml-score-api py-flask-ml-score-api
Where 'alexioannides' refers to the name of the DockerHub account that we will push the image to, once we have tested it. To test that the image can be used to create a Docker container that functions as we expect it to, use,
docker run --name test-api -p 5000:5000 -d alexioannides/test-ml-score-api
Where we have mapped port 5000 from the Docker container (i.e. the port our ML model scoring service is listening on) to port 5000 on our host machine (localhost). Then check that the container is listed as running using,
docker ps
And then test the exposed API endpoint using,
curl http://localhost:5000/score \
--request POST \
--header "Content-Type: application/json" \
--data '{"X": [1, 2]}'
Where you should expect a response along the lines of,
{"score":[1,2]}
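For scripted checks, the same request can be made from Python. This sketch uses only the standard library's urllib.request. The helper names build_score_request and post_score are hypothetical, and the URL is whatever host and port the service is reachable on (localhost:5000 here).

```python
import json
import urllib.request


def build_score_request(url, features):
    """Build a POST request equivalent to the curl call above."""
    body = json.dumps({"X": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_score(url, features, timeout=5.0):
    """Send the request and decode the JSON response, e.g. {"score": [1, 2]}."""
    request = build_score_request(url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

While the test container is running, post_score("http://localhost:5000/score", [1, 2]) should return {'score': [1, 2]}. The same helper can be pointed at the Minikube and GCP addresses used later.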
All our test model does is return the input data, i.e. it is the identity function. Only a few lines of additional code are required to modify this service to load a SciKit-Learn model from disk and pass new data to its predict method for generating predictions - see here for an example. Now that the container has been confirmed as operational, we can stop and remove it,
docker stop test-api
docker rm test-api
In order for a remote Docker host or Kubernetes cluster to have access to the image we've created, we need to publish it to an image registry. All cloud computing providers that offer managed Docker-based services will provide private image registries, but we will use the public image registry at DockerHub, for convenience. To push our new image to DockerHub (where my account ID is 'alexioannides') use,
docker push alexioannides/test-ml-score-api
Where we can now see that our chosen naming convention for the image is intrinsically linked to our target image registry (you will need to insert your own account ID where necessary). Once the upload is finished, log onto DockerHub to confirm that the upload has been successful via the DockerHub UI.
Minikube allows a single-node Kubernetes cluster to run within a Virtual Machine (VM) on a local machine (i.e. on your laptop), for development purposes. On Mac OS X, the steps required to get up-and-running are as follows:
- make sure the Homebrew package manager for OS X is installed; then,
- install VirtualBox using,
brew cask install virtualbox
(you may need to approve installation via OS X System Preferences); and then,
- install Minikube using,
brew cask install minikube
To start the test cluster run,
minikube start --memory 4096
Where we have specified the minimum amount of memory required to deploy a single Seldon ML component. Be patient - Minikube may take a while to start. To test that the cluster is operational run,
kubectl cluster-info
Where kubectl is the standard Command Line Interface (CLI) client for interacting with the Kubernetes API (which was installed as part of Minikube, but is also available separately).
To launch our test model scoring service on Kubernetes, start by running the container within a Kubernetes pod that is managed by a replication controller, which is the device that ensures that at least one pod running our service is operational at any given time. This is achieved with,
kubectl run test-ml-score-api --image=alexioannides/test-ml-score-api:latest --port=5000 --generator=run/v1
Where the --generator=run/v1 flag triggers the construction of the replication controller to manage the pod. To check that it's running, use,
kubectl get pods
It is possible to use port forwarding to test an individual container without exposing it to the public internet. To use this, open a separate terminal and run (for example),
kubectl port-forward test-ml-score-api-szd4j 5000:5000
Where test-ml-score-api-szd4j is the precise name of the pod currently active on the cluster, as determined from the kubectl get pods command. Then, from your original terminal, repeat our test request against the same container running on Kubernetes by running,
curl http://localhost:5000/score \
--request POST \
--header "Content-Type: application/json" \
--data '{"X": [1, 2]}'
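Copying the pod name by hand gets tedious, because the random suffix changes whenever the replication controller replaces a pod. A small helper can pick the name out of the JSON printed by kubectl get pods -o json. This sketch assumes kubectl is on the PATH and pointed at the intended cluster. The function names are hypothetical.

```python
import json
import subprocess


def get_pods_json(namespace=None):
    """Return the JSON document printed by `kubectl get pods -o json`."""
    cmd = ["kubectl", "get", "pods", "-o", "json"]
    if namespace:
        cmd += ["--namespace", namespace]
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,
    )
    return result.stdout


def running_pod_name(pods_json, prefix):
    """Name of the first Running pod whose name starts with prefix, else None."""
    for pod in json.loads(pods_json).get("items", []):
        name = pod["metadata"]["name"]
        phase = pod.get("status", {}).get("phase")
        if name.startswith(prefix) and phase == "Running":
            return name
    return None
```

For example, running_pod_name(get_pods_json(), "test-ml-score-api") would return a name such as test-ml-score-api-szd4j, ready to pass to kubectl port-forward.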
To expose the container as a (load balanced) service to the outside world, we have to create a Kubernetes service that references it. This is achieved with the following command,
kubectl expose replicationcontroller test-ml-score-api --type=LoadBalancer --name test-ml-score-api-http
To check that this has worked and to find the service's external IP address, run,
minikube service list
And we can then test our new service - for example,
curl http://192.168.99.100:30888/score \
--request POST \
--header "Content-Type: application/json" \
--data '{"X": [1, 2]}'
Note that we need to use Minikube-specific commands, as Minikube does not set up a real-life load balancer (which is what would happen if we made this request on a cloud platform). To tear down the load balancer, replication controller, pod and Minikube cluster, run the following commands in sequence,
kubectl delete rc test-ml-score-api
kubectl delete service test-ml-score-api-http
minikube delete
The easiest way to test on a real-world Kubernetes cluster, with far greater resources than those available on a laptop, is to use a managed Kubernetes platform from a cloud provider. We will use Kubernetes Engine on Google Cloud Platform (GCP).
Before we can use Google Cloud Platform, sign up for an account and create a project specifically for this work. Next, make sure that the GCP SDK is installed on your local machine - e.g.,
brew cask install google-cloud-sdk
Or by downloading an installation image directly from GCP. Note that if you haven't installed Minikube and all of the tools that come packaged with it, then you will need to install kubectl, which can be done using the GCP SDK,
gcloud components install kubectl
We then need to initialise the SDK,
gcloud init
Which will open a browser and guide you through the necessary authentication steps. Make sure you pick the project you created, together with a default zone and region (if this has not been set via Compute Engine -> Settings).
Firstly, within the GCP UI, visit the Kubernetes Engine page to trigger the Kubernetes API to start up. From the command line, we then start a cluster using,
gcloud container clusters create k8s-test-cluster --num-nodes 3 --machine-type g1-small
And then go make a cup of coffee while you wait for the cluster to be created.
This is largely the same as we did for running the test service locally using Minikube - run the following commands in sequence,
kubectl run test-ml-score-api --image=alexioannides/test-ml-score-api:latest --port=5000 --generator=run/v1
kubectl expose replicationcontroller test-ml-score-api --type=LoadBalancer --name test-ml-score-api-http
But to find the external IP address for the GCP cluster, we will need to use,
kubectl get services
And then we can test our service on GCP - for example,
curl http://35.234.149.50:5000/score \
--request POST \
--header "Content-Type: application/json" \
--data '{"X": [1, 2]}'
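Instead of reading the EXTERNAL-IP column by eye, the address can be extracted from the output of kubectl get service test-ml-score-api-http -o json. This sketch reads status.loadBalancer.ingress. That field stays empty (shown as <pending> by kubectl get services) until GCP has provisioned the load balancer. service_score_url is a hypothetical helper name.

```python
import json


def service_score_url(service_json, path="/score"):
    """Public URL of a LoadBalancer service, or None while the IP is pending.

    service_json is the output of, for example:
        kubectl get service test-ml-score-api-http -o json
    """
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        return None  # kubectl get services shows <pending> at this stage
    # GCP reports an IP address; some other providers report a hostname.
    host = ingress[0].get("ip") or ingress[0].get("hostname")
    port = service["spec"]["ports"][0]["port"]
    return f"http://{host}:{port}{path}"
```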
Or, we could again use port forwarding to attach to a single pod - for example,
kubectl port-forward test-ml-score-api-nl4sc 5000:5000
And then in a separate terminal,
curl http://localhost:5000/score \
--request POST \
--header "Content-Type: application/json" \
--data '{"X": [1, 2]}'
Finally, we tear-down the replication controller and load balancer,
kubectl delete replicationcontroller test-ml-score-api
kubectl delete service test-ml-score-api-http
If you are running both with Minikube locally and with a cluster on GCP, then you can switch Kubectl context from one cluster to the other using, for example,
kubectl config use-context minikube
Where the list of available contexts can be found using,
kubectl config get-contexts
Up to this point we have been using kubectl commands to define and deploy a basic version of our ML model scoring service. This is fine for demonstrative purposes, but it quickly becomes limiting as well as unmanageable. In practice, the standard way of defining entire applications is with YAML files that are posted to the Kubernetes API. The py-flask-ml-score.yaml file in the py-flask-ml-score-api directory is an example of how our ML model scoring service can be defined in a single YAML file. This can now be deployed using a single command,
kubectl apply -f py-flask-ml-score-api/py-flask-ml-score.yaml
Note that we have defined three separate Kubernetes components in this single file: a replication controller, a load-balancer service and a namespace for all of these components (and their sub-components). A line containing --- delimits the definition of each separate component. To see all components deployed into this namespace, use,
kubectl get all --namespace test-ml-app
And likewise set the --namespace flag when using any kubectl get command to inspect the different components of our test app. Alternatively, we can set our new namespace as the default for the current context,
kubectl config set-context $(kubectl config current-context) --namespace=test-ml-app
And then run,
kubectl get all
Where we can switch back to the default namespace using,
kubectl config set-context $(kubectl config current-context) --namespace=default
To tear-down this application we can then use,
kubectl delete -f py-flask-ml-score-api/py-flask-ml-score.yaml
Which saves us from having to use multiple commands to delete each component individually. Refer to the official documentation for the Kubernetes API to understand the contents of this YAML file in greater depth.
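To illustrate how --- separates the components of a multi-document manifest, this sketch splits a manifest string on those delimiter lines and reports each document's top-level kind. It uses regular expressions rather than a YAML parser; PyYAML's yaml.safe_load_all is the robust choice for real work. SAMPLE_MANIFEST is a hypothetical stand-in for py-flask-ml-score.yaml, not a copy of it.

```python
import re

# Hypothetical three-component manifest in the style of py-flask-ml-score.yaml.
SAMPLE_MANIFEST = """\
apiVersion: v1
kind: Namespace
metadata:
  name: test-ml-app
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: test-ml-score-api-rc
  namespace: test-ml-app
---
apiVersion: v1
kind: Service
metadata:
  name: test-ml-score-api-lb
  namespace: test-ml-app
"""


def split_manifest(text):
    """Split a multi-document YAML string on lines consisting only of '---'."""
    documents = re.split(r"^---[ \t]*$", text, flags=re.MULTILINE)
    return [doc.strip() for doc in documents if doc.strip()]


def component_kinds(text):
    """Top-level 'kind' of each document (None if a document has no kind)."""
    kinds = []
    for doc in split_manifest(text):
        match = re.search(r"^kind:[ \t]*(\S+)", doc, flags=re.MULTILINE)
        kinds.append(match.group(1) if match else None)
    return kinds
```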
Writing YAML files for Kubernetes can get repetitive and hard to manage, especially if there is a lot of 'copy paste' involved when only a handful of parameters need to be changed from one deployment to the next and there is a 'wall of YAML' that needs to be modified. Enter Helm - a framework for creating, executing and managing Kubernetes deployment templates. What follows is a very high-level demonstration of how Helm can be used to deploy our ML model scoring service - for a comprehensive discussion of Helm's full capabilities (there are a lot of them), please refer to the official documentation. Seldon-Core can also be deployed using Helm and we will cover this in more detail later on.
As before, the easiest way to install Helm onto Mac OS X is to use the Homebrew package manager,
brew install kubernetes-helm
Helm relies on a dedicated deployment server, referred to as the 'Tiller', to be running within the same Kubernetes cluster we wish to deploy our applications to. Before we deploy Tiller we need to create a cluster-wide super-user role to assign to it (via a dedicated service account),
kubectl --namespace kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller \
--clusterrole cluster-admin \
--serviceaccount=kube-system:tiller
We can now deploy the Helm Tiller to your Kubernetes cluster using,
helm init --service-account tiller
To create a new chart (Helm's term for a package of templated Kubernetes deployment files), run,
helm create NAME-OF-YOUR-HELM-CHART
This creates a new directory (e.g. helm-ml-score-app, as included with this repository) with the following high-level directory structure,
helm-ml-score-app/
| -- charts/
| -- templates/
| Chart.yaml
| values.yaml
Briefly, the charts directory contains other charts that our new chart will depend on (we will not make use of this), the templates directory contains our Helm templates, Chart.yaml contains core information for our chart (e.g. name and version information) and values.yaml contains default values to render our templates with (in the case that no values are passed from the command line).
The next step is to delete all of the files in the templates directory (apart from NOTES.txt) and to replace them with our own. We start with namespace.yaml for declaring a namespace for our app,
apiVersion: v1
kind: Namespace
metadata:
  name: {{ .Values.app.namespace }}
Anyone familiar with HTML template frameworks (e.g. Jinja) will be familiar with the use of {{}} for defining values that will be injected into the rendered template. In this specific instance, .Values.app.namespace injects the app.namespace variable, whose default value is defined in values.yaml. Next, we define the replication controller that manages our pods in pod.yaml,
apiVersion: v1
kind: ReplicationController
metadata:
  name: {{ .Values.app.name }}-rc
  labels:
    app: {{ .Values.app.name }}
    env: {{ .Values.app.env }}
  namespace: {{ .Values.app.namespace }}
spec:
  replicas: {{ .Values.replicas }}
  template:
    metadata:
      labels:
        app: {{ .Values.app.name }}
        env: {{ .Values.app.env }}
      namespace: {{ .Values.app.namespace }}
    spec:
      containers:
      - image: {{ .Values.app.image }}
        name: {{ .Values.app.name }}-api
        ports:
        - containerPort: {{ .Values.containerPort }}
          protocol: TCP
And the details of the load balancer service in service.yaml,
apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.app.name }}-lb
  labels:
    app: {{ .Values.app.name }}
  namespace: {{ .Values.app.namespace }}
spec:
  type: LoadBalancer
  ports:
  - port: {{ .Values.containerPort }}
    targetPort: {{ .Values.targetPort }}
  selector:
    app: {{ .Values.app.name }}
What we have done, in essence, is to split out each component of the deployment details from py-flask-ml-score.yaml into its own file and then define template variables for each parameter of the configuration that is most likely to change from one deployment to the next. To test and examine the rendered template, without having to attempt a deployment, run,
helm install helm-ml-score-app --debug --dry-run
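Conceptually, the dry run substitutes values from values.yaml into each template. The toy renderer below mimics only that substitution step, for simple {{ .Values.a.b }} placeholders. Helm itself uses Go's text/template engine, which does much more. The VALUES dictionary is a hypothetical stand-in for the chart's values.yaml, and the helper names are also hypothetical.

```python
import re

# Hypothetical defaults standing in for the chart's values.yaml.
VALUES = {
    "app": {
        "name": "test-ml-score-api",
        "namespace": "test-ml-app",
        "env": "test",
        "image": "alexioannides/test-ml-score-api:latest",
    },
    "replicas": 2,
    "containerPort": 5000,
    "targetPort": 5000,
}

NAMESPACE_TEMPLATE = """\
apiVersion: v1
kind: Namespace
metadata:
  name: {{ .Values.app.namespace }}
"""

# Matches placeholders such as {{ .Values.app.name }} and captures "app.name".
PLACEHOLDER = re.compile(r"\{\{\s*\.Values\.([A-Za-z0-9_.]+)\s*\}\}")


def lookup(values, dotted_path):
    """Walk nested dicts, e.g. lookup(VALUES, "app.name")."""
    node = values
    for key in dotted_path.split("."):
        node = node[key]  # a typo in a template path raises KeyError
    return node


def render(template, values):
    """Replace every {{ .Values.x.y }} placeholder with its value as text."""
    return PLACEHOLDER.sub(lambda match: str(lookup(values, match.group(1))), template)
```

Raising KeyError on an unknown path is a deliberate choice for this toy. It makes typos in template paths fail loudly instead of rendering silently.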
If you are happy with the results of the 'dry run', then execute the deployment and generate a release from the chart using,
helm install helm-ml-score-app
This will automatically print the status of the release, together with the name that Helm has ascribed to it (e.g. 'willing-yak') and the contents of NOTES.txt rendered to the terminal. To list all available Helm releases and their names, use,
helm list
And to see the status of all their constituent components (e.g. pods, replication controllers, services, etc.), use, for example,
helm status willing-yak
The ML scoring service can now be tested in exactly the same way as we have done previously (above). Once you have convinced yourself that it's working as expected, the release can be deleted using,
helm delete willing-yak
Another framework for templating the configuration of Kubernetes application deployments is Ksonnet. Ksonnet allows you to compose Kubernetes application components using templated JSON-object configuration files, written in a data templating language called Jsonnet (a superset of JSON). This alternative to Helm is also supported as a means of deploying Seldon-Core (demonstrated below).
The easiest way to install Ksonnet (on Mac OS X) is to use Homebrew,
brew install ksonnet/tap/ks
Confirm that the installation has been successful by running,
ks version
The first step is to initialise a Ksonnet application and we will start by assuming that Minikube is running and is set to the current context,
ks init NAME-OF-YOUR-KSONNET-APP \
--context minikube \
--api-spec=version:v1.8.0
This creates a new directory (e.g. ksonnet-ml-score-app, as included with this repository) with the following high-level directory structure,
ksonnet-ml-score-app/
| -- components/
| -- environments/
| -- lib/
| -- vendor/
| app.yaml
Briefly, the components directory will contain the files that describe each individual component that is to be deployed as part of the application, while the environments directory will contain details of environment-specific deployment overrides. The app.yaml file contains the actual environment details (e.g. Kubernetes cluster IPs and namespaces) and will need to be modified if these core fields change. In order to work with this Ksonnet application, we will need to make it the current directory,
cd ksonnet-ml-score-app
Ksonnet defines 'components' based on prototypes, i.e. Jsonnet templates for pre-configured deployments, where the required fields for the template are provided via command line arguments. To replicate the YAML deployment used above, we can use the generic deployed-service prototype component. To add this component to our application, use,
ks generate deployed-service test-ml-app \
--image alexioannides/test-ml-score-api \
--containerPort 5000 \
--servicePort 8000 \
--replicas 2 \
--type ClusterIP
Where the configuration parameters we pass to this prototype component (or template) are self-explanatory. We can take a look at the implied deployment in YAML format using,
ks show default
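To show how the ks generate arguments map onto Kubernetes objects, this sketch builds a Deployment and Service pair in roughly the shape that ks show default prints for the deployed-service prototype. It is an approximation, not the prototype's actual Jsonnet. The deployed_service function is hypothetical. The Deployment apiVersion is an assumption that depends on the --api-spec version.

```python
def deployed_service(name, image, container_port, service_port, replicas, service_type):
    """Approximate Deployment + Service objects for the ks generate arguments.

    Hypothetical helper: it mirrors how the prototype's arguments map onto
    Kubernetes fields, not the exact output of the Ksonnet prototype.
    """
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1beta1",  # assumption: depends on --api-spec
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": container_port}],
                    }]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            # Traffic arriving on the service port is forwarded to the container port.
            "ports": [{"port": service_port, "targetPort": container_port}],
            "selector": labels,
        },
    }
    return [deployment, service]
```

With --type ClusterIP, the service is reachable only from inside the cluster. Testing it from a laptop therefore needs kubectl port-forward, as used earlier, rather than an external IP.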
Next, we want to define some specific environments: in our case, one for Minikube and one for our GCP cluster. Their context names were obtained by running kubectl config get-contexts. This is accomplished with,
ks env add test-local --context minikube
ks env add gcp --context gke_k8s-ml-ops_europe-west2-b_k8s-test-cluster
Deploying to each environment in-turn is as simple as running,
ks apply test-local
ks apply gcp
Which demonstrates the power of Ksonnet! Deploying new components is as simple as running the ks generate command with the appropriate prototype and re-applying (and similarly for modifying existing deployments).
Seldon's core mission is to simplify the deployment of complex ML prediction pipelines on top of Kubernetes. In this demonstration we are going to focus on the simplest possible example - i.e. the simple ML model scoring API we have already been using.
Seldon-Core depends heavily on Source-to-Image, a tool for automating the process of building code artifacts from source and injecting them into Docker images. For Seldon, the artifacts are the different pieces of an ML pipeline. We use Homebrew to install Source-to-Image on Mac OS X,
brew install source-to-image
To confirm that it has been installed correctly run,
s2i version
We're using Pipenv to manage the Python dependencies for this project. To install seldon-core into a virtual environment managed by Pipenv, for use only by this project, use,
pipenv install --python 3.6 seldon-core
Note that we are specifying Python 3.6 explicitly because, at the time of writing, Seldon-Core does not work with Python 3.7. If you don't wish to use pipenv, you can install seldon-core using pip into whatever environment is most convenient and then drop the use of pipenv run when testing with Seldon-Core (below).
To deploy an ML component using Seldon, we need to create Seldon-compatible Docker images. We start by following these guidelines for defining a Python class that wraps an ML model targeted for deployment with Seldon. This is contained within the seldon-ml-score-component directory. Firstly, ensure that the Docker daemon is running locally and then run,
s2i build seldon-ml-score-component seldonio/seldon-core-s2i-python3:0.4 alexioannides/seldon-ml-score-component
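For orientation, this is a sketch of the kind of wrapper class Seldon's Python guidelines described at the time of writing. It is a plain class whose predict method receives the request's features (as an array) together with their names, and returns the model's output. The class name MLScore is hypothetical and may differ from the file in seldon-ml-score-component. With the s2i Python builder image, the class to load is typically named by a MODEL_NAME entry in a .s2i/environment file. Check that detail against the Seldon documentation for the version in use.

```python
class MLScore:
    """Hypothetical Seldon-compatible wrapper around the identity 'model'.

    Assumption: Seldon's Python wrapper instantiates this class once and then
    calls predict(X, feature_names) for every prediction request.
    """

    def __init__(self):
        # A real component would load its model here, e.g. by unpickling a
        # SciKit-Learn estimator that was copied into the image.
        self.model = None

    def predict(self, X, feature_names):
        # Identity model: return the features unchanged, as api.py does.
        return X
```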
Launch the container using Docker locally,
docker run --name seldon-s2i-test -p 5000:5000 -d alexioannides/seldon-ml-score-component
And then test the resulting Seldon component using the dedicated testing application from the seldon-core Python package,
pipenv run seldon-core-tester seldon-ml-score-component/contract.json localhost 5000 -p
If it works as expected (i.e. without throwing any errors), push it to an image registry - for example,
docker push alexioannides/seldon-ml-score-component
Before we can proceed any further, we will need to grant a cluster-wide super-user role to our user, using Role-Based Access Control (RBAC). On GCP this is achieved with,
kubectl create clusterrolebinding kube-system-cluster-admin \
--clusterrole cluster-admin \
--serviceaccount kube-system:default \
--user $(gcloud info --format="value(config.account)")
And for Minikube with,
kubectl create clusterrolebinding kube-system-cluster-admin \
--clusterrole cluster-admin \
--serviceaccount kube-system:default
Next, we create a Kubernetes namespace for all Seldon components that we will deploy,
kubectl create namespace seldon
And we then set it as a default for the current kubectl context,
kubectl config set-context $(kubectl config current-context) --namespace=seldon
So that whenever we run a kubectl command, it will target the seldon namespace by default, without needing the --namespace flag.
We now move on to deploying our Seldon-compatible ML component and creating a service from it, starting with a deployment of Seldon-Core using Helm charts. We begin by deploying the Seldon Custom Resource Definitions (CRD) directly from the Seldon chart repository hosted at https://storage.googleapis.com/seldon-charts,
helm install seldon-core-crd \
--name seldon-core-crd \
--repo https://storage.googleapis.com/seldon-charts \
--set usage_metrics.enabled=true
We then do the same for Seldon-Core,
helm install seldon-core \
--name seldon-core \
--repo https://storage.googleapis.com/seldon-charts \
--set apife.enabled=false \
--set rbac.enabled=true \
--set ambassador.enabled=true \
--set single_namespace=true \
--set namespace=seldon
If we now run helm list --namespace seldon, we should see that Seldon-Core has been deployed and is waiting for Seldon ML components to be deployed alongside it. To deploy our Seldon-compatible ML model scoring service, we configure and deploy another Seldon chart as follows,
helm install seldon-single-model \
--name test-seldon-ml-score-api \
--repo https://storage.googleapis.com/seldon-charts \
--set model.image.name=alexioannides/seldon-ml-score-component
We will define our Seldon ML deployment using Seldon's Ksonnet prototypes, using the same workflow as we did for the Ksonnet deployment of our simple ML model scoring service (above). We start by initialising a new Ksonnet application,
ks init NAME-OF-YOUR-SELDON-KSONNET-APP --api-spec=version:v1.8.0
This will create a new directory (e.g. seldon-ksonnet-ml-score-app, as bundled with this repository) containing all of the necessary base configuration files for a Ksonnet-based deployment. We start by changing our current directory accordingly,
cd seldon-ksonnet-ml-score-app
To be able to add the base Seldon-Core components to the application we first need to link to the Seldon Ksonnet registry (located on GitHub),
ks registry add seldon-core github.com/SeldonIO/seldon-core/tree/master/seldon-core
And then install the Seldon-Core Ksonnet package,
ks pkg install seldon-core/seldon-core@master
Then we can generate the Seldon-Core components from the Seldon-Core prototype deployment,
ks generate seldon-core seldon-core \
--withApife=false \
--withAmbassador=true \
--withRbac=true \
--singleNamespace=true \
--namespace=seldon
We can now deploy Seldon-Core - without our ML component - to the default environment (extracted from the current kubectl context) using,
ks apply default
Finally, we deploy our model scoring API component on Seldon-Core. We do this by creating a new Ksonnet component that references our Seldon-compatible Docker image containing the model scoring API (alexioannides/seldon-ml-score-component), and then applying it, as follows,
ks generate seldon-serve-simple-v1alpha2 test-seldon-ml-score-api --image alexioannides/seldon-ml-score-component
ks apply default -c test-seldon-ml-score-api
Note the similarities in the steps used for both Ksonnet and Helm deployments.
Regardless of how we deployed Seldon-Core and our Seldon-compatible ML model scoring service, we will test it with the same approaches we have been using above.
We follow the same general approach as we did for our first-principles Kubernetes deployments above, but use shell command substitution to find the Ambassador API gateway pod we need to target for port-forwarding. Regardless of whether we are working with GCP or Minikube, use,
kubectl port-forward $(kubectl get pods -n seldon -l service=ambassador -o jsonpath='{.items[0].metadata.name}') -n seldon 8003:8080
We can then test the model scoring API deployed via Seldon-Core, using the API defined by Seldon-Core,
curl http://localhost:8003/seldon/test-seldon-ml-score-api/api/v0.1/predictions \
--request POST \
--header "Content-Type: application/json" \
--data '{"data":{"names":["a","b"],"tensor":{"shape":[2,2],"values":[0,0,1,1]}}}'
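Seldon's 'tensor' payload stores a flattened list of values plus a shape. The request above therefore encodes the 2x2 matrix [[0, 0], [1, 1]] in row-major order, with column names a and b. The helpers below (hypothetical names) build that payload from a list of rows and turn a tensor back into rows. They cover only the tensor form of the data field, not Seldon's ndarray form or other payload types.

```python
import json


def to_seldon_tensor(rows, names):
    """Encode a 2-D list of numbers as a Seldon 'tensor' payload dict."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same length")
    values = [value for row in rows for value in row]  # row-major flattening
    return {"data": {"names": list(names),
                     "tensor": {"shape": [n_rows, n_cols], "values": values}}}


def to_request_body(rows, names):
    """Compact JSON body, as passed to curl's --data above."""
    return json.dumps(to_seldon_tensor(rows, names), separators=(",", ":"))


def from_seldon_tensor(message):
    """Decode the 'tensor' part of a Seldon message back into a list of rows."""
    tensor = message["data"]["tensor"]
    n_rows, n_cols = tensor["shape"]
    values = tensor["values"]
    return [values[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
```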
Firstly, we need to expose the service to the public internet. If working on GCP, we can expose the service via the ambassador API gateway component deployed as part of Seldon-Core,
kubectl expose deployment seldon-core-ambassador --type=LoadBalancer --name=seldon-core-ambassador-external
And then to retrieve the external IP for GCP use,
kubectl get services
And for Minikube use,
minikube service list
And then to test the public endpoint, use, for example,
curl http://192.168.99.111:32074/seldon/test-seldon-ml-score-api/api/v0.1/predictions \
--request POST \
--header "Content-Type: application/json" \
--data '{"data":{"names":["a","b"],"tensor":{"shape":[2,2],"values":[0,0,1,1]}}}'
To delete a Ksonnet deployment from the Kubernetes cluster, make sure you are in the application directory and then use,
ks delete default
To delete a Helm deployment from the Kubernetes cluster, first retrieve a list of all the releases in the Seldon namespace,
helm list --namespace seldon
And then remove them using,
helm delete seldon-core --purge && \
helm delete seldon-core-crd --purge && \
helm delete test-seldon-ml-score-api --purge
If there is a GCP cluster that needs to be killed run,
gcloud container clusters delete k8s-test-cluster
And likewise if working with Minikube,
minikube stop
minikube delete