
chantilly's People

Contributors

hholst80, maxhalford, michhar


chantilly's Issues

Consider making /api/predict a GET operation instead of POST

That would be consistent with other API standards: the predict operation doesn't add any new information. /api/learn, however, does add new information and should remain POST.

In production we lock down services this way using NGINX, allowing certain services to accept only HTTP GET requests. For example, certain Elasticsearch nodes can only process HTTP GET, which makes them read-only. This is a much-needed security model for hardening a production environment. The current configuration of Chantilly would not allow this.

FYI, this is how you do that in a server block on NGINX:

server {
    ...
    limit_except GET {
        deny all;
    }
    ...
}

Websocket implementation

The best way to make multiple predictions and model updates over a single connection is to use a websocket. This would provide a bidirectional route for sending model updates and receiving predictions. Once done, we can provide an example using Python's websocket-client library, as well as one in JavaScript.

Unable to run in Windows

Hello all,

I'm running Python 3.9 in an Anaconda environment on Windows.

After following the installation instructions, I get the following on the command line:

Python 3.9.6 (default, Jul 30 2021, 11:42:22) [MSC v.1916 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 7.22.0 -- An enhanced Interactive Python.

In [1]: chantilly run
  File "<ipython-input-1-c90d62525bcd>", line 1
    chantilly run
              ^
SyntaxError: invalid syntax

Are there additional steps needed to get it running in Windows?

Question on fit_one function

Thanks for your great work; I was attracted by the incremental learning property of your project. You use the fit_one function to update the deployed model given new data, but I could not find the actual method used to update the model. The fit_one function only has one line, return self, so maybe I'm looking in the wrong place. Can you please share your thoughts on this?
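For context, returning self is the expected contract of fit_one in creme-style estimators: the method mutates the model's internal state in place and then returns the same estimator so calls can be chained. A fit_one whose body is only `return self` is typically a no-op base-class default or a stateless wrapper, with the real update living elsewhere in the pipeline. A toy sketch of the contract (not creme's actual code; the `OnlineMean` class is invented for illustration):

```python
class OnlineMean:
    """Running mean that learns one observation at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def fit_one(self, y):
        # Incremental (Welford-style) update of the running mean, in place.
        self.n += 1
        self.mean += (y - self.mean) / self.n
        return self  # returning self enables model.fit_one(a).fit_one(b)

    def predict_one(self):
        return self.mean


model = OnlineMean()
model.fit_one(2).fit_one(4).fit_one(6)
print(model.predict_one())  # 4.0
```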

Update metrics - possibly missing a case?

Heyo! I'm reusing chantilly's logic to update metrics after hitting the learn endpoint. Here is the basic logic, refactored into its own function:

def update_metrics(self, prediction, ground_truth, model_name):
    """
    Given a prediction, update metrics to reflect it.
    """
    # Update the metrics
    metrics = self.db[f"metrics/{model_name}"]

    # At this point prediction is a dict. It might be empty because no training data has been seen
    if not prediction:
        return metrics

    for metric in metrics:

        # If the metric requires labels but the prediction is a dict, then we need to retrieve the
        # predicted label with the highest probability
        if (
            isinstance(metric, ClassificationMetric)
            and metric.requires_labels
            and isinstance(prediction, dict)
        ):
            pred = max(prediction, key=prediction.get)
            metric.update(y_true=ground_truth, y_pred=pred)
        else:
            metric.update(y_true=ground_truth, y_pred=prediction)
    self.db[f"metrics/{model_name}"] = metrics
    return metrics

This works fine for the test.py regression case, but when I was testing a binary model, I found that one of the metrics (I think Mean Squared Error, MSE?) wouldn't enter the first if case because it isn't a classification metric. It would then hit the else and fail because, at that point, prediction was either an empty dict or a filled dict (which cannot be passed to that function). When I inspected it I'd see something like:

{}
# or
{1: 1.0}

So I think what was happening is that for the first binary model prediction, it returns an empty dict because it doesn't know anything. And then after that I think the dict has keys that are ground truths, and values that are the predictions? So first I added this:

# In some cases the prediction is a dict but not a ClassificationMetric
elif isinstance(prediction, dict):
    y_pred = prediction[ground_truth]
    metric.update(y_true=ground_truth, y_pred=y_pred)
else:
    metric.update(y_true=ground_truth, y_pred=prediction)
self.db[f"metrics/{model_name}"] = metrics

And that worked for my debugging session while I had a model already established, but when I restarted from scratch I got a KeyError. As it turns out, the ground truth wasn't a key in the prediction dict. So I changed it to:

# In some cases the prediction is a dict but not a ClassificationMetric
elif isinstance(prediction, dict):
    y_pred = prediction.get(ground_truth)
    # The ground truth value isn't always known
    if y_pred:
        metric.update(y_true=ground_truth, y_pred=y_pred)
else:
    metric.update(y_true=ground_truth, y_pred=prediction)
self.db[f"metrics/{model_name}"] = metrics

This felt a little funky to me so I wanted to double check about the correct way to go about this. This is a binary model as defined here in chantilly. I suspect this will be tweaked further when I update to allow learning without having a true label (e.g., unsupervised cases I asked about in your talk!)
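One pattern that sidesteps the KeyError entirely is to reduce the probability dict before branching on the metric type: take the most probable label for label-based metrics, and the probability assigned to the ground truth for score-based ones. A sketch under the assumption that the prediction dict maps labels to probabilities (the `reduce_prediction` helper is hypothetical, not part of chantilly):

```python
def reduce_prediction(prediction, ground_truth, wants_label):
    """Collapse a {label: probability} dict into the value a metric expects.

    wants_label=True  -> the most probable label (for accuracy-style metrics)
    wants_label=False -> the probability assigned to the ground truth
                         (0.0 if that label has never been seen)
    """
    if not prediction:  # the model hasn't seen any data yet
        return None
    if wants_label:
        return max(prediction, key=prediction.get)
    return prediction.get(ground_truth, 0.0)


proba = {True: 0.7, False: 0.3}
print(reduce_prediction(proba, True, wants_label=True))   # True
print(reduce_prediction(proba, True, wants_label=False))  # 0.7
print(reduce_prediction({}, True, wants_label=False))     # None
```

Skipping the metric update when the reduced value is None (nothing has been learned yet) then falls out naturally, instead of being buried in an elif.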

And then a follow up question - are there any river datasets / examples for multiclass? Thank you!

Display model memory usage

creme models have a memory_usage property. It would be nice to return this information in the /api/models route.
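A sketch of what the extra field could look like, assuming the route handler has access to a name-to-model mapping and taking the issue's word that models expose a memory_usage attribute (the function name, payload shape, and stand-in value are all hypothetical):

```python
def describe_models(models):
    """Build a /api/models-style payload, attaching memory usage when available.

    getattr with a default keeps the route working for models that
    don't expose the attribute.
    """
    return {
        name: {"memory_usage": getattr(model, "memory_usage", None)}
        for name, model in models.items()
    }


class FakeModel:
    memory_usage = "32.1 KB"  # stand-in value for illustration

print(describe_models({"spiffy-passion": FakeModel()}))
# {'spiffy-passion': {'memory_usage': '32.1 KB'}}
```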

Can this be used for un-labeled models as well? And how many?

Is it possible to use chantilly for unsupervised learning models (anomaly detection) and time series forecasting models, i.e. without necessarily having labeled data? Which metrics would be meaningful in that case?

I ask because I don't see these kinds of "flavors" in the chantilly list.

And also a side question:

Can the chantilly backend store more than one model? Is it possible to disable metrics computation for models so that the backend does not overload? I have applications in mind with dozens of models running at the same time. Is chantilly suitable in that case?

Great stuff

add a /predict_proba method

The current /predict route only allows predict_proba_one if the model is a Classifier. We are currently using the predict_proba_one method on a non-Classifier model, but we cannot access that method through chantilly.
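A duck-typed check would unblock this: instead of gating on an isinstance(model, Classifier) test, the route could check for the method itself. A minimal sketch of that dispatch, with a stand-in model class (none of these names come from chantilly's actual code):

```python
def predict(model, features):
    """Prefer probabilistic output whenever the model can produce it,
    regardless of which base class it inherits from."""
    if hasattr(model, "predict_proba_one"):
        return model.predict_proba_one(features)
    return model.predict_one(features)


class DensityModel:
    """Not a Classifier, but it still exposes predict_proba_one."""

    def predict_proba_one(self, x):
        return {True: 0.9, False: 0.1}


print(predict(DensityModel(), {"f": 1}))  # {True: 0.9, False: 0.1}
```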

Consider Blinker

At the moment we have a custom class for announcing events and metric updates that can be listened to through the streaming API. We might want to check out blinker and see if it solves the problem better.

Allow flavors

I've been thinking a bit, and I think that the way forward is to allow "flavors". For instance, there could be the following flavors: classification, regression, recommendation, etc. This would make it possible to handle fine-grained behavior, validate models, and remove a lot of ifs in the code.
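One way to picture the idea: a flavor registry that declares, per task, what a model must support, so validation happens in one place instead of scattered isinstance checks. A sketch with entirely hypothetical names (this is not chantilly's implementation):

```python
from dataclasses import dataclass


@dataclass
class Flavor:
    name: str
    required_methods: tuple  # what an uploaded model must implement


# Each flavor declares its contract once, centrally.
FLAVORS = {
    "regression": Flavor("regression", ("fit_one", "predict_one")),
    "classification": Flavor("classification", ("fit_one", "predict_proba_one")),
}


def check_model(model, flavor_name):
    """Validate a model against its declared flavor."""
    flavor = FLAVORS[flavor_name]
    missing = [m for m in flavor.required_methods if not hasattr(model, m)]
    if missing:
        raise ValueError(f"model lacks {missing} required by flavor {flavor.name!r}")
    return True


class Reg:
    def fit_one(self, x, y):
        return self

    def predict_one(self, x):
        return 0.0


print(check_model(Reg(), "regression"))  # True
```

The same registry could later carry default metrics per flavor, which is exactly the per-task behavior the issue wants to stop expressing as ifs.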

Unable to upload creme model

I have trained a binary classifier on the Spam Dataset and am trying to deploy it with Chantilly. First, I load my trained model and upload it using a POST request. But when making predictions, the response says there is no model named "creme_model".

import os

import dill
import requests


if __name__ == '__main__':

    model_path = './models/creme_model.pkl'

    with open(os.path.join(model_path), 'rb') as file:
        creme_model = dill.load(file)

    url = 'http://localhost:5000'
    config = {'flavor': 'binary'}

    requests.post(f'{url}/api/init', json=config)
    requests.post(f'{url}/api/model', data=dill.dumps(creme_model))  # uploads the model

    r = requests.post(f'{url}/api/predict', json={
        'id': 1,
        'model': 'creme_model',
        'features': {'title': 'dun say so early hor... U c already then say...'}
    })
    print(r.json())

    r_get_models = requests.get(f'{url}/api/models')
    print(r_get_models.json())

    r_get_model_metrics = requests.get(f'{url}/api/metrics')
    print(r_get_model_metrics.json())

    r_get_model_stats = requests.get(f'{url}/api/stats')
    print(r_get_model_stats.json())

Output

{'message': "No model named 'creme_model'."}
{'default': 'vacuous-strawberry', 'models': ['vacuous-strawberry', 'cumbersome-passion', 'tedious-watermelon', 'exuberant-strawberry', 'picayune-strawberry', 'spiffy-passion', 'wrathful-papaya', 'diligent-coconut', 'amuck-ratatouille', 'bewildered-watermelon', 'capricious-pear', 'illustrious-cherry', 'hapless-grape', 'abrasive-pineapple', 'direful-banana', 'feeble-quince', 'earthy-pizza', 'noxious-melon', 'momentous-grape']}
{'Accuracy': 0.0, 'F1': 0.0, 'LogLoss': 0.0, 'Precision': 0.0, 'Recall': 0.0}
{'learn': {'ewm_duration': 0, 'ewm_duration_human': '0ns', 'mean_duration': 0, 'mean_duration_human': '0ns', 'n_calls': 0}, 'predict': {'ewm_duration': 921484, 'ewm_duration_human': '921μs484ns', 'mean_duration': 921484, 'mean_duration_human': '921μs484ns', 'n_calls': 1}}

move default error log location

I went looking for errors and found them in /var/log/syslog.

Please write errors to /var/log/chantilly/error.log, or somewhere similarly dedicated, so they are easy to find.

Thank you!

500 error posting ClassifierChain model

When posting the following model to chantilly it generates a 500 error. I have tried this with all three "flavors" with the same result. I can successfully post other models just not this one.

model = feature_selection.VarianceThreshold(threshold=0.01)
model |= preprocessing.StandardScaler()
model |= multioutput.ClassifierChain(
    model=linear_model.LogisticRegression(),
    order=list(range(5))
)

from syslog:

[2020-04-06 18:07:48,870] ERROR in app: Exception on /api/model [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/dist-packages/app/api.py", line 97, in model
    model = dill.loads(flask.request.get_data())
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
TypeError: __init__() missing 1 required positional argument: 'model'
127.0.0.1 - - [06/Apr/2020 18:07:48] "POST /api/model HTTP/1.1" 500 -

Error when POSTing a new model

This is the error when posting a model:

Am I missing something required in the environment? Python 3.6?

127.0.0.1 - - [29/Mar/2020 21:29:41] "POST /api/init HTTP/1.1" 201 -
[2020-03-29 21:29:41,320] ERROR in app: Exception on /api/model [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/dist-packages/app/api.py", line 107, in model
    name = db.add_model(model, name=name)
  File "/usr/local/lib/python3.6/dist-packages/app/db.py", line 69, in add_model
    name = _random_name()
  File "/usr/local/lib/python3.6/dist-packages/app/db.py", line 85, in _random_name
    adj = rng.choice(list(open(os.path.join(here, 'adjectives.txt'))))
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/app/adjectives.txt'

invalid javascript syntax in example

= e => {} is not valid JavaScript: the first = token should be removed (and note the addEventListener call below is also missing its closing parenthesis).

es.addEventListener('learn', = e => {
    var data = JSON.parse(e.data);
    console.log(data.model, data.features, data.prediction, data.ground_truth)
};
