online-ml / chantilly
🍦 Deployment tool for online machine learning models
License: BSD 3-Clause "New" or "Revised" License
That would be consistent with other API standards. We are not adding anything with the predict operation. The /api/learn however does add new information and should remain POST.
In production we lock down services this way using NGINX and only allow certain services to accept only HTTP GET requests. For example certain Elasticsearch nodes can only process HTTP GET thus making them read-only. This is a much needed security model to harden a production environment. Current configuration of Chantilly would not allow this.
FYI, this is how you do that in NGINX (note that `limit_except` goes inside a location block):
location ... {
    ...
    limit_except GET { deny all; }
    ...
}
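For concreteness, here is a hedged sketch of such a lockdown. The port, the `/api/predict` and `/api/learn` paths, and the `proxy_pass` targets are assumptions about the deployment, not something chantilly prescribes:

```nginx
server {
    listen 80;

    # Hypothetical read-only endpoint: anything other than GET is rejected.
    # Note: allowing GET implicitly also allows HEAD.
    location /api/predict {
        limit_except GET {
            deny all;  # 403 for POST, PUT, DELETE, ...
        }
        proxy_pass http://127.0.0.1:5000;  # assumed chantilly address
    }

    # The learning endpoint keeps accepting POST.
    location /api/learn {
        proxy_pass http://127.0.0.1:5000;
    }
}
```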
We might want to support cloudpickle as well as, or instead of, dill.
The best way to make multiple predictions and model updates with a single connection is to use a websocket. This will provide a bidirectional route to send model updates and receive predictions. When done we can provide an example using Python's websocket-client library, as well as in JavaScript.
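Sketching what such a bidirectional route could look like: the message envelope below is purely hypothetical (chantilly defines no websocket protocol today); it only illustrates tagging each frame so learn and predict traffic can share one connection and responses can be matched back to requests.

```python
import json

# Hypothetical frame format for multiplexing learn/predict over one websocket.
# None of this is chantilly's actual API; it's an illustration of the idea.

def make_frame(kind, payload, request_id):
    """Serialize one message; `kind` tags it as 'learn' or 'predict'."""
    return json.dumps({'kind': kind, 'id': request_id, 'payload': payload})

def parse_frame(raw):
    """Decode a frame back into (kind, request_id, payload)."""
    msg = json.loads(raw)
    return msg['kind'], msg['id'], msg['payload']

# A client would push frames like these through ws.send(...) and use the
# id field to pair incoming predictions with the requests that caused them.
learn_frame = make_frame('learn', {'features': {'x': 1.0}, 'ground_truth': True}, 1)
predict_frame = make_frame('predict', {'features': {'x': 2.0}}, 2)

kind, rid, payload = parse_frame(predict_frame)
```

The same framing translates directly to JavaScript via `JSON.stringify`/`JSON.parse` on a browser `WebSocket`.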
Hello all,
I'm running Python 3.9 in an Anaconda environment on Windows.
After following the installation instructions, I get the following on the command line:
Python 3.9.6 (default, Jul 30 2021, 11:42:22) [MSC v.1916 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.
IPython 7.22.0 -- An enhanced Interactive Python.
In [1]: chantilly run
File "<ipython-input-1-c90d62525bcd>", line 1
chantilly run
^
SyntaxError: invalid syntax
Are there additional steps needed to get it running in Windows?
Thanks for your great work; I was attracted by its incremental learning property. You use the fit_one function to update the deployed model given new data, but I could not find the actual update logic: the fit_one function only has a single line, which returns self. Maybe I'm not looking in the right place. Can you please share your thoughts on this?
This will allow us to measure the impact of the changes we make.
heyo! So I am using the same logic as chantilly after I've hit the learn endpoint and want to update metrics. Here is the basic logic refactored into its own function:
def update_metrics(self, prediction, ground_truth, model_name):
    """
    Given a prediction, update metrics to reflect it.
    """
    # Update the metrics
    metrics = self.db[f"metrics/{model_name}"]
    # At this point prediction is a dict. It might be empty because no training data has been seen
    if not prediction:
        return metrics
    for metric in metrics:
        # If the metric requires labels but the prediction is a dict, then we need to retrieve the
        # predicted label with the highest probability
        if (
            isinstance(metric, ClassificationMetric)
            and metric.requires_labels
            and isinstance(prediction, dict)
        ):
            pred = max(prediction, key=prediction.get)
            metric.update(y_true=ground_truth, y_pred=pred)
        else:
            metric.update(y_true=ground_truth, y_pred=prediction)
    self.db[f"metrics/{model_name}"] = metrics
    return metrics
This works fine for the test.py case with regression, but when I was testing a binary model, I found that one of the metrics (I think Mean Squared Error, or MSE?) wouldn't enter the first if case because it isn't a classification metric; it would then hit the else branch and fail because, at that point, prediction was either an empty dict or a filled dict (which cannot be given to that function). So when I inspected, I'd see something like:
{}
# or
{1: 1.0}
So I think what was happening is that for the first binary model prediction, it returns an empty dict because it doesn't know anything yet. After that, I think the dict has keys that are ground truths and values that are the predictions? So first I added this:
# In some cases the prediction is a dict but not a ClassificationMetric
elif isinstance(prediction, dict):
    y_pred = prediction[ground_truth]
    metric.update(y_true=ground_truth, y_pred=y_pred)
else:
    metric.update(y_true=ground_truth, y_pred=prediction)
self.db[f"metrics/{model_name}"] = metrics
And that worked for my debugging session while I had a model already established, but when I restarted from scratch I got a KeyError: it turns out the ground truth wasn't a key in the prediction dict. So I changed it to:
# In some cases the prediction is a dict but not a ClassificationMetric
elif isinstance(prediction, dict):
    y_pred = prediction.get(ground_truth)
    # The ground truth value isn't always known
    if y_pred:
        metric.update(y_true=ground_truth, y_pred=y_pred)
else:
    metric.update(y_true=ground_truth, y_pred=prediction)
self.db[f"metrics/{model_name}"] = metrics
This felt a little funky to me, so I wanted to double-check the correct way to go about this. This is a binary model as defined here in chantilly. I suspect this will be tweaked further when I update it to allow learning without a true label (e.g., the unsupervised cases I asked about in your talk!)
And then a follow up question - are there any river datasets / examples for multiclass? Thank you!
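For what it's worth, the dict-vs-label dispatch being discussed can be sketched in isolation with stand-in metric classes. Everything below is a hypothetical illustration of the logic, not creme's or chantilly's actual code:

```python
# Stand-in metric classes that mimic the relevant creme-like interface
# (a `requires_labels` flag plus an `update` method). NOT real creme metrics.

class Accuracy:
    """Label-based metric: needs a single predicted label."""
    requires_labels = True

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, y_true, y_pred):
        self.correct += int(y_true == y_pred)
        self.total += 1

    def get(self):
        return self.correct / self.total if self.total else 0.0


class ProbaLoss:
    """Probability-based metric: consumes the full {label: proba} dict."""
    requires_labels = False

    def __init__(self):
        self.losses = []

    def update(self, y_true, y_pred):
        # y_pred is a dict of probabilities; unseen labels get probability 0
        self.losses.append(1.0 - y_pred.get(y_true, 0.0))

    def get(self):
        return sum(self.losses) / len(self.losses) if self.losses else 0.0


def update_metric(metric, prediction, ground_truth):
    """Dispatch: label metrics get the argmax label, others the raw dict."""
    if not prediction:  # empty dict: the model hasn't seen any data yet
        return
    if metric.requires_labels and isinstance(prediction, dict):
        metric.update(y_true=ground_truth, y_pred=max(prediction, key=prediction.get))
    else:
        metric.update(y_true=ground_truth, y_pred=prediction)


acc, loss = Accuracy(), ProbaLoss()
for metric in (acc, loss):
    update_metric(metric, {True: 0.8, False: 0.2}, True)
    update_metric(metric, {}, False)  # skipped: no prediction yet
```

The point of the sketch is that the empty-dict guard happens once, before any per-metric branching, so no metric ever sees `{}`.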
creme models have a memory_usage property. It would be nice to return this information in the /api/models route.
Is it possible to use chantilly for unsupervised learning models (anomaly detection) and time series forecasting models, i.e. without necessarily having labeled data? Which metrics would be meaningful in that case?
I ask this because I don't see these kinds of "flavors" in the chantilly list.
And also a side question:
Can the chantilly backend store more than one model? Is it possible to disable metric computation for some models so that the backend doesn't get overloaded? I have applications in mind with dozens of models running at the same time. Is chantilly suitable in that case?
Great stuff
The current /predict route only allows predict_proba_one if the model is a Classifier. We are currently using the predict_proba_one method on a non-Classifier model; however, we cannot access that method through chantilly.
At the moment we have a custom class for announcing events and metric updates that can be listened to through the streaming API. We might want to check out blinker and see if it solves the problem better.
I've been thinking a bit, and I think that the way forward is to allow "flavors". For instance, there could be the following flavors: classification, regression, recommendation, etc. This would allow us to handle fine-grained behavior, validate models, and remove a lot of ifs in the code.
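A minimal sketch of what such a flavor registry could look like. All names, checks, and metric lists below are hypothetical illustrations, not a description of chantilly's actual internals:

```python
# Hypothetical flavor registry: each flavor bundles a model check and its
# default metric names, so route handlers can dispatch on the flavor
# instead of stacking up if/elif branches.

FLAVORS = {}

def register_flavor(name, check_model, default_metrics):
    FLAVORS[name] = {'check_model': check_model, 'default_metrics': default_metrics}

register_flavor(
    'regression',
    check_model=lambda model: hasattr(model, 'predict_one'),
    default_metrics=['MAE', 'MSE'],
)
register_flavor(
    'classification',
    check_model=lambda model: hasattr(model, 'predict_proba_one'),
    default_metrics=['Accuracy', 'LogLoss'],
)

def validate(flavor_name, model):
    """Reject a model that doesn't satisfy the declared flavor."""
    flavor = FLAVORS.get(flavor_name)
    if flavor is None:
        raise ValueError(f"unknown flavor: {flavor_name!r}")
    if not flavor['check_model'](model):
        raise ValueError(f"model does not satisfy flavor {flavor_name!r}")
    return flavor['default_metrics']


class DummyClassifier:
    def predict_proba_one(self, x):
        return {True: 0.5, False: 0.5}

metrics = validate('classification', DummyClassifier())
```

Adding a new flavor (say, recommendation) then becomes one `register_flavor` call rather than edits scattered across the routes.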
I have trained a binary classifier on the Spam Dataset and am trying to deploy it using chantilly. First, I load my trained model and upload it with a POST request. But when making predictions, the response says there is no model named "creme_model".
import dill
import requests
import os

if __name__ == '__main__':
    model_path = './models/creme_model.pkl'
    with open(os.path.join(model_path), 'rb') as file:
        creme_model = dill.load(file)

    url = 'http://localhost:5000'
    config = {'flavor': 'binary'}
    requests.post(f'{url}/api/init', json=config)
    requests.post(f'{url}/api/model', data=dill.dumps(creme_model))  # uploads the model

    r = requests.post(f'{url}/api/predict', json={
        'id': 1,
        'model': 'creme_model',
        'features': {'title': 'dun say so early hor... U c already then say...'}
    })
    print(r.json())

    r_get_models = requests.get(f'{url}/api/models')
    print(r_get_models.json())

    r_get_model_metrics = requests.get(f'{url}/api/metrics')
    print(r_get_model_metrics.json())

    r_get_model_stats = requests.get(f'{url}/api/stats')
    print(r_get_model_stats.json())
Output
{'message': "No model named 'creme_model'."}
{'default': 'vacuous-strawberry', 'models': ['vacuous-strawberry', 'cumbersome-passion', 'tedious-watermelon', 'exuberant-strawberry', 'picayune-strawberry', 'spiffy-passion', 'wrathful-papaya', 'diligent-coconut', 'amuck-ratatouille', 'bewildered-watermelon', 'capricious-pear', 'illustrious-cherry', 'hapless-grape', 'abrasive-pineapple', 'direful-banana', 'feeble-quince', 'earthy-pizza', 'noxious-melon', 'momentous-grape']}
{'Accuracy': 0.0, 'F1': 0.0, 'LogLoss': 0.0, 'Precision': 0.0, 'Recall': 0.0}
{'learn': {'ewm_duration': 0, 'ewm_duration_human': '0ns', 'mean_duration': 0, 'mean_duration_human': '0ns', 'n_calls': 0}, 'predict': {'ewm_duration': 921484, 'ewm_duration_human': '921μs484ns', 'mean_duration': 921484, 'mean_duration_human': '921μs484ns', 'n_calls': 1}}
I went looking for errors and found them in /var/log/syslog.
Please write to /var/log/chantilly/error.log or something similar, to make them easier to find.
Thank you!
When posting the following model to chantilly, it generates a 500 error. I have tried this with all three "flavors", with the same result. I can successfully post other models, just not this one.
model = feature_selection.VarianceThreshold(threshold=0.01)
model |= preprocessing.StandardScaler()
model |= multioutput.ClassifierChain(
    model=linear_model.LogisticRegression(),
    order=list(range(5))
)
From syslog:
Apr 6 18:07:48 ip-172-26-7-45 chantilly[17628]: [2020-04-06 18:07:48,870] ERROR in app: Exception on /api/model [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/dist-packages/app/api.py", line 97, in model
    model = dill.loads(flask.request.get_data())
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
TypeError: __init__() missing 1 required positional argument: 'model'
127.0.0.1 - - [06/Apr/2020 18:07:48] "POST /api/model HTTP/1.1" 500 -
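The TypeError suggests that, during unpickling, some object in the pipeline is reconstructed by calling its `__init__` without the required `model` argument. A minimal stdlib-pickle reproduction of that general failure mode (illustrative only; this is not creme's or dill's actual code):

```python
import pickle

class Wrapper:
    """A class with a required constructor argument."""

    def __init__(self, model):
        self.model = model

    def __reduce__(self):
        # Buggy on purpose: tells pickle to rebuild via Wrapper() with no
        # arguments, so unpickling fails with the same kind of TypeError
        # as in the traceback above.
        return (Wrapper, ())

payload = pickle.dumps(Wrapper(model='anything'))
try:
    pickle.loads(payload)
except TypeError as e:
    error = str(e)
```

Serialization succeeds; it is only reconstruction that blows up, which matches the error surfacing inside `Unpickler.load` on the server rather than on the client doing `dill.dumps`.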
This is the error when posting a model:
Am I missing something required in the environment? Python 3.6?
127.0.0.1 - - [29/Mar/2020 21:29:41] "POST /api/init HTTP/1.1" 201 -
[2020-03-29 21:29:41,320] ERROR in app: Exception on /api/model [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/dist-packages/app/api.py", line 107, in model
    name = db.add_model(model, name=name)
  File "/usr/local/lib/python3.6/dist-packages/app/db.py", line 69, in add_model
    name = _random_name()
  File "/usr/local/lib/python3.6/dist-packages/app/db.py", line 85, in _random_name
    adj = rng.choice(list(open(os.path.join(here, 'adjectives.txt'))))
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/app/adjectives.txt'
`= e => {}` is not valid JavaScript: remove the first `=` token. The snippet in question:
es.addEventListener('learn', = e => {
    var data = JSON.parse(e.data);
    console.log(data.model, data.features, data.prediction, data.ground_truth)
};
Fixed (the addEventListener call was also missing its closing parenthesis):
es.addEventListener('learn', e => {
    var data = JSON.parse(e.data);
    console.log(data.model, data.features, data.prediction, data.ground_truth)
});
I think it should be pretty easy to add some AWS integration with the boto3 library.
To define my_model:
requests.post('http://localhost:5000/api/my_model', data=dill.dumps(model))
To make predictions with my_model:
requests.post('http://localhost:5000/api/my_model/predict', json={...
etc.