online-ml / chantilly
🍦 Deployment tool for online machine learning models
License: BSD 3-Clause "New" or "Revised" License
That would be consistent with other API standards. We are not adding anything with the predict operation. The /api/learn however does add new information and should remain POST.
In production we lock down services this way using NGINX and only allow certain services to accept only HTTP GET requests. For example certain Elasticsearch nodes can only process HTTP GET thus making them read-only. This is a much needed security model to harden a production environment. Current configuration of Chantilly would not allow this.
FYI, this is how you do that in NGINX (note that `limit_except` goes inside a location block):
location ... {
    ...
    limit_except GET { deny all; }
    ...
}
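For concreteness, here is a hedged sketch of such a lockdown. The port, the `/api/predict` and `/api/learn` paths, and the `proxy_pass` targets are assumptions about the deployment, not something chantilly prescribes:

```nginx
server {
    listen 80;

    # Hypothetical read-only endpoint: anything other than GET is rejected.
    # Note: allowing GET implicitly also allows HEAD.
    location /api/predict {
        limit_except GET {
            deny all;  # 403 for POST, PUT, DELETE, ...
        }
        proxy_pass http://127.0.0.1:5000;  # assumed chantilly address
    }

    # The learning endpoint keeps accepting POST.
    location /api/learn {
        proxy_pass http://127.0.0.1:5000;
    }
}
```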
We might want to support cloudpickle as well as, or instead of, dill.
The best way to make multiple predictions and model updates with a single connection is to use a websocket. This will provide a bidirectional route to send model updates and receive predictions. When done we can provide an example using Python's websocket-client library, as well as in JavaScript.
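Sketching what such a bidirectional route could look like: the message envelope below is purely hypothetical (chantilly defines no websocket protocol today); it only illustrates tagging each frame so learn and predict traffic can share one connection and responses can be matched back to requests.

```python
import json

# Hypothetical frame format for multiplexing learn/predict over one websocket.
# None of this is chantilly's actual API; it's an illustration of the idea.

def make_frame(kind, payload, request_id):
    """Serialize one message; `kind` tags it as 'learn' or 'predict'."""
    return json.dumps({'kind': kind, 'id': request_id, 'payload': payload})

def parse_frame(raw):
    """Decode a frame back into (kind, request_id, payload)."""
    msg = json.loads(raw)
    return msg['kind'], msg['id'], msg['payload']

# A client would push frames like these through ws.send(...) and use the
# id field to pair incoming predictions with the requests that caused them.
learn_frame = make_frame('learn', {'features': {'x': 1.0}, 'ground_truth': True}, 1)
predict_frame = make_frame('predict', {'features': {'x': 2.0}}, 2)

kind, rid, payload = parse_frame(predict_frame)
```

The same framing translates directly to JavaScript via `JSON.stringify`/`JSON.parse` on a browser `WebSocket`.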
Hello all,
I'm running Python 3.9 in an Anaconda environment on Windows.
After following the installation instructions, I get the following on the command line:
Python 3.9.6 (default, Jul 30 2021, 11:42:22) [MSC v.1916 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.
IPython 7.22.0 -- An enhanced Interactive Python.
In [1]: chantilly run
File "<ipython-input-1-c90d62525bcd>", line 1
chantilly run
^
SyntaxError: invalid syntax
Are there additional steps needed to get it running in Windows?
Thanks for your great work; I was attracted by its incremental learning property. You use the fit_one function to update the deployed model given new data, but I could not find the actual update logic: the fit_one function only has a single line, which returns self. Maybe I'm not looking in the right place. Can you please share your thoughts on this?
This will allow us to measure the impact of the changes we make.
heyo! So I am using the same logic as chantilly after I've hit the learn endpoint and want to update metrics. Here is the basic logic refactored into its own function:
def update_metrics(self, prediction, ground_truth, model_name):
    """
    Given a prediction, update metrics to reflect it.
    """
    # Update the metrics
    metrics = self.db[f"metrics/{model_name}"]
    # At this point prediction is a dict. It might be empty because no training data has been seen
    if not prediction:
        return metrics
    for metric in metrics:
        # If the metric requires labels but the prediction is a dict, then we need to retrieve the
        # predicted label with the highest probability
        if (
            isinstance(metric, ClassificationMetric)
            and metric.requires_labels
            and isinstance(prediction, dict)
        ):
            pred = max(prediction, key=prediction.get)
            metric.update(y_true=ground_truth, y_pred=pred)
        else:
            metric.update(y_true=ground_truth, y_pred=prediction)
    self.db[f"metrics/{model_name}"] = metrics
    return metrics
This works fine for the test.py case with regression, but when I was testing a binary model, I found that one of the metrics (I think Mean Squared Error, or MSE?) wouldn't enter the first if case because it isn't a classification metric; it would then hit the else branch and fail because, at that point, prediction was either an empty dict or a filled dict (which cannot be given to that function). So when I inspected, I'd see something like:
{}
# or
{1: 1.0}
So I think what was happening is that for the first binary model prediction, it returns an empty dict because it doesn't know anything yet. After that, I think the dict has keys that are ground truths and values that are the predictions? So first I added this:
# In some cases the prediction is a dict but not a ClassificationMetric
elif isinstance(prediction, dict):
    y_pred = prediction[ground_truth]
    metric.update(y_true=ground_truth, y_pred=y_pred)
else:
    metric.update(y_true=ground_truth, y_pred=prediction)
self.db[f"metrics/{model_name}"] = metrics
And that worked for my debugging session while I had a model already established, but when I restarted from scratch I got a KeyError: it turns out the ground truth wasn't a key in the prediction dict. So I changed it to:
# In some cases the prediction is a dict but not a ClassificationMetric
elif isinstance(prediction, dict):
    y_pred = prediction.get(ground_truth)
    # The ground truth value isn't always known
    if y_pred:
        metric.update(y_true=ground_truth, y_pred=y_pred)
else:
    metric.update(y_true=ground_truth, y_pred=prediction)
self.db[f"metrics/{model_name}"] = metrics
This felt a little funky to me, so I wanted to double-check the correct way to go about this. This is a binary model as defined here in chantilly. I suspect this will be tweaked further when I update it to allow learning without a true label (e.g., the unsupervised cases I asked about in your talk!)
And then a follow up question - are there any river datasets / examples for multiclass? Thank you!
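For what it's worth, the dict-vs-label dispatch being discussed can be sketched in isolation with stand-in metric classes. Everything below is a hypothetical illustration of the logic, not creme's or chantilly's actual code:

```python
# Stand-in metric classes that mimic the relevant creme-like interface
# (a `requires_labels` flag plus an `update` method). NOT real creme metrics.

class Accuracy:
    """Label-based metric: needs a single predicted label."""
    requires_labels = True

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, y_true, y_pred):
        self.correct += int(y_true == y_pred)
        self.total += 1

    def get(self):
        return self.correct / self.total if self.total else 0.0


class ProbaLoss:
    """Probability-based metric: consumes the full {label: proba} dict."""
    requires_labels = False

    def __init__(self):
        self.losses = []

    def update(self, y_true, y_pred):
        # y_pred is a dict of probabilities; unseen labels get probability 0
        self.losses.append(1.0 - y_pred.get(y_true, 0.0))

    def get(self):
        return sum(self.losses) / len(self.losses) if self.losses else 0.0


def update_metric(metric, prediction, ground_truth):
    """Dispatch: label metrics get the argmax label, others the raw dict."""
    if not prediction:  # empty dict: the model hasn't seen any data yet
        return
    if metric.requires_labels and isinstance(prediction, dict):
        metric.update(y_true=ground_truth, y_pred=max(prediction, key=prediction.get))
    else:
        metric.update(y_true=ground_truth, y_pred=prediction)


acc, loss = Accuracy(), ProbaLoss()
for metric in (acc, loss):
    update_metric(metric, {True: 0.8, False: 0.2}, True)
    update_metric(metric, {}, False)  # skipped: no prediction yet
```

The point of the sketch is that the empty-dict guard happens once, before any per-metric branching, so no metric ever sees `{}`.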
creme models have a memory_usage property. It would be nice to return this information in the /api/models route.
Is it possible to use chantilly for unsupervised learning models (anomaly detection) and time series forecasting models, i.e. without necessarily having labeled data? Which metrics would be meaningful in that case?
I ask this because I don't see these kinds of "flavors" in the chantilly list.
And also a side question:
Can the chantilly backend store more than one model? Is it possible to disable metric computation for some models so that the backend doesn't get overloaded? I have applications in mind with dozens of models running at the same time. Is chantilly suitable in that case?
Great stuff
The current /predict route only allows predict_proba_one if the model is a Classifier. We are currently using the predict_proba_one method on a non-Classifier model; however, we cannot access that method through chantilly.
At the moment we have a custom class for announcing events and metric updates that can be listened to through the streaming API. We might want to check out blinker and see if it solves the problem better.
I've been thinking a bit, and I think that the way forward is to allow "flavors". For instance, there could be the following flavors: classification, regression, recommendation, etc. This would allow us to handle fine-grained behavior, validate models, and remove a lot of ifs in the code.
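A minimal sketch of what such a flavor registry could look like. All names, checks, and metric lists below are hypothetical illustrations, not a description of chantilly's actual internals:

```python
# Hypothetical flavor registry: each flavor bundles a model check and its
# default metric names, so route handlers can dispatch on the flavor
# instead of stacking up if/elif branches.

FLAVORS = {}

def register_flavor(name, check_model, default_metrics):
    FLAVORS[name] = {'check_model': check_model, 'default_metrics': default_metrics}

register_flavor(
    'regression',
    check_model=lambda model: hasattr(model, 'predict_one'),
    default_metrics=['MAE', 'MSE'],
)
register_flavor(
    'classification',
    check_model=lambda model: hasattr(model, 'predict_proba_one'),
    default_metrics=['Accuracy', 'LogLoss'],
)

def validate(flavor_name, model):
    """Reject a model that doesn't satisfy the declared flavor."""
    flavor = FLAVORS.get(flavor_name)
    if flavor is None:
        raise ValueError(f"unknown flavor: {flavor_name!r}")
    if not flavor['check_model'](model):
        raise ValueError(f"model does not satisfy flavor {flavor_name!r}")
    return flavor['default_metrics']


class DummyClassifier:
    def predict_proba_one(self, x):
        return {True: 0.5, False: 0.5}

metrics = validate('classification', DummyClassifier())
```

Adding a new flavor (say, recommendation) then becomes one `register_flavor` call rather than edits scattered across the routes.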
I have trained a binary classifier on the Spam Dataset and am trying to deploy it using chantilly. First, I load my trained model and upload it with a POST request. But when making predictions, the response says there is no model named "creme_model".
import dill
import requests
import os

if __name__ == '__main__':
    model_path = './models/creme_model.pkl'
    with open(os.path.join(model_path), 'rb') as file:
        creme_model = dill.load(file)

    url = 'http://localhost:5000'
    config = {'flavor': 'binary'}
    requests.post(f'{url}/api/init', json=config)
    requests.post(f'{url}/api/model', data=dill.dumps(creme_model))  # uploads the model

    r = requests.post(f'{url}/api/predict', json={
        'id': 1,
        'model': 'creme_model',
        'features': {'title': 'dun say so early hor... U c already then say...'}
    })
    print(r.json())

    r_get_models = requests.get(f'{url}/api/models')
    print(r_get_models.json())

    r_get_model_metrics = requests.get(f'{url}/api/metrics')
    print(r_get_model_metrics.json())

    r_get_model_stats = requests.get(f'{url}/api/stats')
    print(r_get_model_stats.json())
Output
{'message': "No model named 'creme_model'."}
{'default': 'vacuous-strawberry', 'models': ['vacuous-strawberry', 'cumbersome-passion', 'tedious-watermelon', 'exuberant-strawberry', 'picayune-strawberry', 'spiffy-passion', 'wrathful-papaya', 'diligent-coconut', 'amuck-ratatouille', 'bewildered-watermelon', 'capricious-pear', 'illustrious-cherry', 'hapless-grape', 'abrasive-pineapple', 'direful-banana', 'feeble-quince', 'earthy-pizza', 'noxious-melon', 'momentous-grape']}
{'Accuracy': 0.0, 'F1': 0.0, 'LogLoss': 0.0, 'Precision': 0.0, 'Recall': 0.0}
{'learn': {'ewm_duration': 0, 'ewm_duration_human': '0ns', 'mean_duration': 0, 'mean_duration_human': '0ns', 'n_calls': 0}, 'predict': {'ewm_duration': 921484, 'ewm_duration_human': '921μs484ns', 'mean_duration': 921484, 'mean_duration_human': '921μs484ns', 'n_calls': 1}}
I went looking for errors and found them in /var/log/syslog.
Please write to /var/log/chantilly/error.log or something similar, to make them easier to find.
Thank you!
When posting the following model to chantilly, it generates a 500 error. I have tried this with all three "flavors", with the same result. I can successfully post other models, just not this one.
model = feature_selection.VarianceThreshold(threshold=0.01)
model |= preprocessing.StandardScaler()
model |= multioutput.ClassifierChain(
    model=linear_model.LogisticRegression(),
    order=list(range(5))
)
From syslog:
Apr 6 18:07:48 ip-172-26-7-45 chantilly[17628]: [2020-04-06 18:07:48,870] ERROR in app: Exception on /api/model [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/dist-packages/app/api.py", line 97, in model
    model = dill.loads(flask.request.get_data())
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
TypeError: __init__() missing 1 required positional argument: 'model'
127.0.0.1 - - [06/Apr/2020 18:07:48] "POST /api/model HTTP/1.1" 500 -
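The TypeError suggests that, during unpickling, some object in the pipeline is reconstructed by calling its `__init__` without the required `model` argument. A minimal stdlib-pickle reproduction of that general failure mode (illustrative only; this is not creme's or dill's actual code):

```python
import pickle

class Wrapper:
    """A class with a required constructor argument."""

    def __init__(self, model):
        self.model = model

    def __reduce__(self):
        # Buggy on purpose: tells pickle to rebuild via Wrapper() with no
        # arguments, so unpickling fails with the same kind of TypeError
        # as in the traceback above.
        return (Wrapper, ())

payload = pickle.dumps(Wrapper(model='anything'))
try:
    pickle.loads(payload)
except TypeError as e:
    error = str(e)
```

Serialization succeeds; it is only reconstruction that blows up, which matches the error surfacing inside `Unpickler.load` on the server rather than on the client doing `dill.dumps`.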
This is the error when posting a model:
Am I missing something required in the environment? Python 3.6?
127.0.0.1 - - [29/Mar/2020 21:29:41] "POST /api/init HTTP/1.1" 201 -
[2020-03-29 21:29:41,320] ERROR in app: Exception on /api/model [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/dist-packages/app/api.py", line 107, in model
    name = db.add_model(model, name=name)
  File "/usr/local/lib/python3.6/dist-packages/app/db.py", line 69, in add_model
    name = _random_name()
  File "/usr/local/lib/python3.6/dist-packages/app/db.py", line 85, in _random_name
    adj = rng.choice(list(open(os.path.join(here, 'adjectives.txt'))))
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/app/adjectives.txt'
`= e => {}` is not valid JavaScript: remove the first `=` token. The snippet in question:
es.addEventListener('learn', = e => {
    var data = JSON.parse(e.data);
    console.log(data.model, data.features, data.prediction, data.ground_truth)
};
Fixed (the addEventListener call was also missing its closing parenthesis):
es.addEventListener('learn', e => {
    var data = JSON.parse(e.data);
    console.log(data.model, data.features, data.prediction, data.ground_truth)
});
I think it should be pretty easy to add some AWS integration with the boto3 library.
To define my_model:
requests.post('http://localhost:5000/api/my_model', data=dill.dumps(model))
To make predictions with my_model:
requests.post('http://localhost:5000/api/my_model/predict', json={...
etc.