Comments (20)
Yep, it was indeed a stupid bug - I forgot the "f" prefix on a format string when setting the flavor, so it was literally being set to "flavor/{name}". So this:
def init_metrics(name: str):
    db = get_db()
    try:
-       flavor = db["flavor/{name}"]
+       flavor = db[f"flavor/{name}"]
    except KeyError:
        raise exceptions.FlavorNotSet
    db[f"metrics/{name}"] = flavor.default_metrics()
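For anyone skimming, the difference the missing "f" prefix makes is easy to demonstrate in isolation:

```python
name = "iris"

# Without the "f" prefix the braces are kept literally:
plain = "flavor/{name}"

# With the "f" prefix the variable is interpolated:
interpolated = f"flavor/{name}"

print(plain)         # flavor/{name}
print(interpolated)  # flavor/iris
```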
from chantilly.
It's cool that you're using the shelve library too, I like it a lot.
it's more fun and addictive than "real" work haha
Story of my life haha.
Looks cool! Keep it up :)
(or unsupervised)
Yes I believe there is a lot of interest in supporting anomaly detection.
Hey
So one thing is that you shouldn't be using regression metrics in conjunction with classification models. They're incompatible by design. Each metric has a works_with method to help you check if a model is compatible with a metric.
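As an illustration of that kind of compatibility guard (with stand-in classes, not River's real ones - in River, works_with lives on the metric base classes):

```python
class Classifier:
    """Stand-in for a classification model."""

class Regressor:
    """Stand-in for a regression model."""

class ClassificationMetric:
    """Stand-in metric: only compatible with classifiers."""
    def works_with(self, model) -> bool:
        return isinstance(model, Classifier)

# Check compatibility before wiring a metric to a model:
metric = ClassificationMetric()
print(metric.works_with(Classifier()))  # True
print(metric.works_with(Regressor()))   # False
```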
And then a follow up question - are there any river datasets / examples for multiclass? Thank you!
Yes, I suggest checking out the multiclass module :)
huh, that must be a bug then, because I do create the model as a binary type and not regression. I can look into this in an evening this week / this weekend. I refactored the design a bit so the metrics are namespaced with the model name, so as long as the original model's flavor is captured when it's added, they should follow suit - but I must have a bug.
Yes, I suggest checking out the multiclass module :)
Gah now I just feel silly, I was searching the datasets looking for hints of multiclass... sorry nothing to see here!
I'll close this issue after I investigate the potential bug - I'm still developing a lot pretty quickly and things will be more stable after I add the testing suite.
Gah now I just feel silly
Don't! Even I get lost in all the River modules we have
I'll close this issue after I investigate the potential bug - I'm still developing a lot pretty quickly and things will be more stable after I add the testing suite.
Sounds good! The work you're doing is really cool.
Thanks again for the help! I'm definitely making good progress - it's more fun and addictive than "real" work haha. :)
Just thinking out loud: one of the things I would like this kind of platform to support is non-HTTP traffic. For instance, being able to handle WebSockets and/or SSE would be cool. Indeed, I think that persistent connections play nicely with online learning and sensors etc. I had my eyes on using FastAPI for this reason. Anyway, just some food for thought! I'm sure the same can be done with Django.
Django can handle WebSockets, but FastAPI is definitely faster than Django! And I agree that would be neat. The way I've designed Django River ML is to have:
- a pretty generic internal client that I could easily plug into another application backend (e.g., FastAPI) with only tweaks to setup, middleware, and how the API endpoints are created
- a specification for the endpoints (I haven't written it out fully - it's in a markdown draft not yet in version control, but it will be soon), and
- a common terminal client to make interacting with it easy (see https://github.com/vsoch/riverapi - sorry, no pretty docs there yet, but coming soon!)
And these things will make it easy to plug into whatever other Python frameworks we're interested in! And I'd be happy to make one in FastAPI too, although I'd like to finish Django River ML first because I'm pretty excited to test it out for my use case :)
And that reminds me, I was starting to brainstorm what front-end views we might provide for basic functionality. It's a plugin, so we can't take over the entire design of an app, but I think some basic demo or views (even to include elsewhere) would be neat. I'm going to sleep now, but it's something to think about - let me know what ideas you have!
a pretty generic internal client that I could easily plug into another application backend (e.g., FastAPI) with only tweaks to setup, middleware, and how the API endpoints are created
Exactly! In fact it should be able to do this online learning dance without any connection to the internet. For instance, when running a model on a closed-off device where the loop is self-contained. What matters here is to create the right abstractions via interfaces.
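One way to pin down such an abstraction is a small interface the generic client codes against (hypothetical names - the real projects may slice this differently):

```python
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class ModelBackend(Protocol):
    """Minimal surface a backend must expose so the same online-learning
    loop can run behind Django, FastAPI, or fully offline on a device."""
    def learn(self, x: dict, y: Any) -> None: ...
    def predict(self, x: dict) -> Any: ...

class OfflineBackend:
    """Self-contained backend with no network dependency at all."""
    def __init__(self) -> None:
        self.seen = 0
    def learn(self, x: dict, y: Any) -> None:
        self.seen += 1
    def predict(self, x: dict) -> Any:
        return None

# Structural typing: no inheritance needed for the interface to hold.
print(isinstance(OfflineBackend(), ModelBackend))  # True
```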
a common terminal client to make interacting with it easy (see https://github.com/vsoch/riverapi - sorry, no pretty docs there yet, but coming soon!)
Agreed! You need a client to interact with the "platform".
And I'd be happy to make one in FastAPI too, although I'd like to finish Django River ML first because I'm pretty excited to test it out for my use case :)
Enjoy :)
And that reminds me, I was starting to brainstorm what front-end views we might provide for basic functionality. It's a plugin, so we can't take over the entire design of an app, but I think some basic demo or views (even to include elsewhere) would be neat.
Yes, I agree with that. My experience here is that the interface should be read-only: you can't use the interface to add/remove models or whatnot. It's just a view.
A follow-up question! I'm testing the multi-class example; for example, here is my model:
In [18]: model
Out[18]:
Pipeline (
  StandardScaler (
    with_std=True
  ),
  OneVsOneClassifier (
    classifier=LogisticRegression (
      optimizer=SGD (
        lr=Constant (
          learning_rate=0.01
        )
      )
      loss=Log (
        weight_pos=1.
        weight_neg=1.
      )
      l2=0.
      intercept_init=0.
      intercept_lr=Constant (
        learning_rate=0.01
      )
      clip_gradient=1e+12
      initializer=Zeros ()
    )
  )
)
And it has the function, but I suspect it just raises NotImplementedError:
model.predict_
predict_many()        predict_proba_many()
predict_one()         predict_proba_one()
So the question for the server - the standard case is always predicting one, but this doesn't seem to work here for this multiclass flavor:
~/anaconda3/envs/river/lib/python3.10/site-packages/river/base/classifier.py in predict_proba_one(self, x)
49 # method that each classifier has to implement. Instead, we raise an exception to indicate
50 # that a classifier does not support predict_proba_one.
---> 51 raise NotImplementedError
52
53 def predict_one(self, x: dict) -> base.typing.ClfTarget:
NotImplementedError:
I think this happened because we are checking for the functions as attributes on the model, but since the abstract class implements it, the method is technically there:
def check_model(self, model):
    for method in ("learn_one", "predict_proba_one"):
        if not hasattr(model, method):
            return False, f"The model does not implement {method}."
    return True, None
but it's not actually implemented there (looks like we only have predict_one and learn_one?) https://github.com/online-ml/river/blob/ec1cf318310add301afe12160cebc66eaebcec2c/river/multiclass/ovo.py#L74-L97
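A self-contained sketch of why the hasattr check always passes (the class names here are stand-ins for River's base classes, not the real ones):

```python
import abc

class Classifier(abc.ABC):
    def predict_proba_one(self, x: dict):
        # The base class defines the method just to raise, so hasattr()
        # is True on every subclass, implemented or not.
        raise NotImplementedError

class OvOStandIn(Classifier):
    """Like OneVsOneClassifier: only yields hard labels."""
    def predict_one(self, x: dict) -> str:
        return "label"

model = OvOStandIn()
print(hasattr(model, "predict_proba_one"))  # True, misleadingly

# One possible sharper check: did the subclass actually override it?
overridden = type(model).predict_proba_one is not Classifier.predict_proba_one
print(overridden)  # False
```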
Anyhoo - so my questions are:
- How should we handle this check method?
- What prediction/learn functions should the multiclass flavor use?
I'm also noticing that these datasets are under multiclass but labeled as binary - though "a multiclass should work too!"
That alone might be the issue - maybe we aren't expected to use multiclass here? But I was hoping to find a dataset / example that can use it (and I would like the server to support it!)
I think this happened because we are checking for the functions as attributes on the model, but since the abstract class implements it, it is technically there
This is specific to that multi-class model; it doesn't support outputting probabilities. It implements predict_one, but not predict_proba_one. This is very much an edge-case, because you should expect every classifier (regardless of binary or multi-class) to implement predict_proba_one.
How should we handle this check method?
I'm not sure how to check a method is implemented with the current way River's base classes are set up.
What prediction/learn functions should the multiclass flavor use?
It should always be predict_proba_one and learn_one. But I would try/except the predict_proba_one and use predict_one if the former raises an implementation error.
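That fallback might look something like this (the model class is a stand-in; the real server code will differ):

```python
def predict_anything(model, x: dict):
    """Prefer probability estimates; fall back to a hard label when the
    model (e.g. OneVsOneClassifier) doesn't implement predict_proba_one."""
    try:
        return model.predict_proba_one(x)
    except NotImplementedError:
        return model.predict_one(x)

class LabelOnlyModel:
    """Stand-in that, like OneVsOneClassifier, only yields hard labels."""
    def predict_proba_one(self, x: dict):
        raise NotImplementedError
    def predict_one(self, x: dict) -> str:
        return "sky"

print(predict_anything(LabelOnlyModel(), {}))  # sky
```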
I'll try that! And eventually there should be some clarity for the user about which models will work, and which won't.
I tried changing to predict_one to avoid the implementation error (that worked!), but the ground truth and prediction I get back are both strings, and when that hits the update_metrics function it falls into the last if/else:
In [5]: prediction
Out[5]: 'path'
In [6]: ground_truth
Out[6]: 'sky'
and results in this error in river:
In [4]: metric.update(y_true=ground_truth, y_pred=prediction)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/Desktop/Code/django-river-ml/django_river_ml/client.py in <module>
----> 1 metric.update(y_true=ground_truth, y_pred=prediction)
~/anaconda3/envs/river/lib/python3.10/site-packages/river/metrics/base.py in update(self, y_true, y_pred, sample_weight)
424
425 def update(self, y_true, y_pred, sample_weight=1.0):
--> 426 self._mean.update(x=self._eval(y_true, y_pred), w=sample_weight)
427 return self
428
~/anaconda3/envs/river/lib/python3.10/site-packages/river/metrics/cross_entropy.py in _eval(self, y_true, y_pred)
46
47 def _eval(self, y_true, y_pred):
---> 48 return optim.losses.CrossEntropy()(y_true, y_pred)
~/anaconda3/envs/river/lib/python3.10/site-packages/river/optim/losses.py in __call__(self, y_true, y_pred)
246 total = 0
247
--> 248 for label, proba in y_pred.items():
249 if y_true == label:
250 total += self.class_weight.get(label, 1.0) * math.log(
AttributeError: 'str' object has no attribute 'items'
Might it be the case that predict_one is returning the wrong format? Or are we in another case of a model being incorrectly matched with the metrics? For metrics I have:
[Accuracy: 0.00%,
CrossEntropy: 0.,
MacroPrecision: 0.,
MacroRecall: 0.,
MacroF1: 0.,
MicroPrecision: 0.,
MicroRecall: 0.,
MicroF1: 0.]
model flavor:
Pipeline (
  StandardScaler (
    with_std=True
  ),
  OneVsOneClassifier (
    classifier=LogisticRegression (
      optimizer=SGD (
        lr=Constant (
          learning_rate=0.01
        )
      )
      loss=Log (
        weight_pos=1.
        weight_neg=1.
      )
      l2=0.
      intercept_init=0.
      intercept_lr=Constant (
        learning_rate=0.01
      )
      clip_gradient=1e+12
      initializer=Zeros ()
    )
  )
)
I've never used logistic regression in a multiclass case - I'm used to getting a value between 0 and 1 and applying some threshold for two classes. For reference I was looking here: https://riverml.xyz/latest/api/multiclass/OneVsOneClassifier/ - that page groups it under multiclass (hence why I'm probably incorrectly using it here!)
And I'm thinking about the design of model "flavors" - if it's the case that it's hard to generalize models into these flavors, it perhaps would make sense to have a direct lookup of "for model X use this base class, metrics, etc." but maybe we can still get it working for the more general flavors already here.
Thanks for your help! I should be able to make some more time this weekend to work on river - had a busy end of the week!
Mmm I'm not sure I fully understand what you're doing.
Classification metrics expect labels or dictionaries with a probability for each label. Each classification metric has a requires_labels property indicating this. Does that help?
Sorry for not being clear, let me try to better walk through it.
- So I start with the wrapper LogisticRegression example that I linked above. It's under "multiclass" and I chose it to test that endpoint.
- This results in the MultiClass model, and the following metrics:
[Accuracy: 0.00%,
CrossEntropy: 0.,
MacroPrecision: 0.,
MacroRecall: 0.,
MacroF1: 0.,
MicroPrecision: 0.,
MicroRecall: 0.,
MicroF1: 0.]
and this is correct, at least reflected also in Chantilly here
- When we are updating metrics (this part of Chantilly) for metric CrossEntropy: 0.,
- requires_labels: False
- isinstance(ClassificationMetric) is True
- However the prediction is not a dictionary, so it fails the first if statement and we hit here
So now we have:
- ground_truth
Out[26]: 'sky'
In [27]: prediction
Out[27]: 'path'
And calling this method:
metric.update(y_true=ground_truth, y_pred=prediction)
results in the error above - the prediction is a string and not a dict.
If we trace back up to here, where the original error was, and recall our previous discussion: the flavor is multiclass (as it comes from the multiclass examples)
flavor
Out[30]: <django_river_ml.flavors.MultiClassFlavor at 0x7fa1b8e11900>
so the default pred_func is predict_proba_one, which fails with the NotImplementedError. So I fall back to predict_one, and that is the reason we return a string.
So I think what I'm hearing is that I'm not allowed to provide a str to this metric. So here are some options for moving forward:
- This model is in fact not multiclass and I shouldn't be using it in that context, period - in which case my suggestion for the implementation is that we do a better mapping of model -> model type and remove the need for the user to specify it (and possibly get it wrong)
- This case is possible but the label I'm returned back needs to be reformatted into a different structure to allow it to work
- This case is possible but should not be allowed to update a metric given that a str is returned
In any case, I do think this feels a little buggy and we should get to the bottom of it - let me know what questions you have!
Ok I think I understand. Thanks for the details.
Your problem is that OneVsOneClassifier doesn't implement predict_proba_one. So you can't produce probabilities. This means that the metrics which require probability estimates, namely CrossEntropy, should not be updated, period.
How I would handle this:
- Try to call predict_proba_one
- Catch NotImplementedError
- Fall back to predict_one
- Loop over the metrics
- Skip cases where metric.requires_labels == False
How does that sound?
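The steps above could be sketched like so - note that requires_labels (plural) is River's spelling of the attribute, and the model/metric classes here are stand-ins, not the real ones:

```python
def predict_and_update(model, metrics, x: dict, y_true):
    # Steps 1-3: prefer probabilities, fall back to a hard label.
    try:
        y_pred = model.predict_proba_one(x)
    except NotImplementedError:
        y_pred = model.predict_one(x)
    # Steps 4-5: update metrics, skipping the ones that need probability
    # dicts (requires_labels == False) when all we have is a label.
    for metric in metrics:
        if isinstance(y_pred, str) and not metric.requires_labels:
            continue
        metric.update(y_true=y_true, y_pred=y_pred)
    return y_pred

class LabelOnlyModel:
    """Stand-in that, like OneVsOneClassifier, only yields hard labels."""
    def predict_proba_one(self, x: dict):
        raise NotImplementedError
    def predict_one(self, x: dict) -> str:
        return "sky"

class LabelMetric:
    requires_labels = True  # happy with hard labels, like Accuracy
    def __init__(self): self.updates = 0
    def update(self, y_true, y_pred): self.updates += 1

class ProbaMetric:
    requires_labels = False  # needs probability dicts, like CrossEntropy
    def __init__(self): self.updates = 0
    def update(self, y_true, y_pred): self.updates += 1

label_m, proba_m = LabelMetric(), ProbaMetric()
y = predict_and_update(LabelOnlyModel(), [label_m, proba_m], {}, "sky")
print(y, label_m.updates, proba_m.updates)  # sky 1 0
```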
Okay, that worked! For a regression model with mean absolute error (MAE) there simply wasn't a requires_labels attribute, and it hit the last statement because it's not a ClassificationMetric (this is a regression model). So instead of checking for that attribute, I'm going to do the dumb Pythonic thing and just try/except: if the metric can be updated, great; if not, it's probably not intended for the case, or (if it's a bug) we will find it eventually :)
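That "dumb Pythonic" guard could be as small as this (a sketch, not the actual django-river-ml code):

```python
def try_update(metric, y_true, y_pred) -> bool:
    """Attempt the update; report whether the metric accepted this pair."""
    try:
        metric.update(y_true=y_true, y_pred=y_pred)
        return True
    except (AttributeError, NotImplementedError, TypeError):
        # The metric likely isn't meant for this model's output type.
        return False

class CrossEntropyLike:
    """Stand-in that, like CrossEntropy, expects probability dicts."""
    def update(self, y_true, y_pred):
        for label, proba in y_pred.items():  # blows up on a plain str
            pass

print(try_update(CrossEntropyLike(), "sky", "path"))              # False
print(try_update(CrossEntropyLike(), "sky", {"sky": .9, "path": .1}))  # True
```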
I'm done adding the multiclass, going to do the new label endpoint and write some tests and then do a PR - will post here when it's in!
okay tiny progress! vsoch/django-river-ml#7 This is great - we have label and a multiclass example, and I think next I'm going to give a shot at a KMeans model (or unsupervised). Thanks again for your help - I suspect it's late in your time zone so I will not ping further today!