
Comments (20)

vsoch commented on August 16, 2024

Yep, it was indeed a stupid bug - I forgot the format string "f" prefix when setting the flavor, so it was literally setting the key to "flavor/{name}". So this:

def init_metrics(name: str):
    db = get_db()
    try:
-        flavor = db["flavor/{name}"]
+        flavor = db[f"flavor/{name}"]
    except KeyError:
        raise exceptions.FlavorNotSet
    db[f"metrics/{name}"] = flavor.default_metrics()
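The missing prefix is easy to reproduce in isolation; without the `f`, the braces are stored verbatim rather than interpolated:

```python
# A minimal illustration of the bug above: without the "f" prefix,
# the braces are kept verbatim instead of being interpolated.
name = "demo"
key_wrong = "flavor/{name}"   # literal string, no interpolation
key_right = f"flavor/{name}"  # f-string interpolates the variable

print(key_wrong)   # flavor/{name}
print(key_right)   # flavor/demo
```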

from chantilly.

MaxHalford commented on August 16, 2024

It's cool that you're using the shelve library too, I like it a lot.
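For anyone unfamiliar, shelve gives you a persistent dict-like store with almost no ceremony. A minimal sketch, using a temporary file (the key name here is just illustrative):

```python
import os
import shelve
import tempfile

# Minimal shelve usage: a dict-like, persistent key-value store.
path = os.path.join(tempfile.mkdtemp(), "db")

with shelve.open(path) as db:
    db["flavor/demo"] = "binary"   # values are pickled transparently

with shelve.open(path) as db:
    print(db["flavor/demo"])       # binary
```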

it's more fun and addictive than "real" work haha

Story of my life haha.


MaxHalford commented on August 16, 2024

Looks cool! Keep it up :)

(or unsupervised)

Yes I believe there is a lot of interest in supporting anomaly detection.


MaxHalford commented on August 16, 2024

Hey 👋

So one thing is that you shouldn't be using regression metrics in conjunction with classification models. They're incompatible by design. Each metric has a works_with method to help you check whether a model is compatible with it.
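The idea behind such a compatibility check can be sketched with toy classes (this is just the pattern, not river's actual implementation; all class names here are illustrative):

```python
# Hypothetical sketch of the works_with pattern: a metric declares
# which kind of model it is compatible with.
class Classifier:
    pass

class Regressor:
    pass

class ClassificationMetric:
    def works_with(self, model) -> bool:
        return isinstance(model, Classifier)

class RegressionMetric:
    def works_with(self, model) -> bool:
        return isinstance(model, Regressor)

model = Classifier()
print(ClassificationMetric().works_with(model))  # True
print(RegressionMetric().works_with(model))      # False
```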

And then a follow up question - are there any river datasets / examples for multiclass? Thank you!

Yes, I suggest checking out the multiclass module :)


vsoch commented on August 16, 2024

huh, that must be a bug then, because I do create the model as a binary type and not regression. I can look into this some evening this week or this weekend. I refactored the design a bit so the metrics are namespaced with the model name, so as long as the original model's flavor is captured when it's added, they should follow suit, but I must have a bug.

Yes, I suggest checking out the multiclass module :)

Gah now I just feel silly, I was searching the datasets looking for hints of multiclass... sorry nothing to see here!

I'll close this issue after I investigate the potential bug - I'm still developing a lot pretty quickly and things will be more stable after I add the testing suite.


MaxHalford commented on August 16, 2024

Gah now I just feel silly

Don't! Even I get lost in all the River modules we have 😅

I'll close this issue after I investigate the potential bug - I'm still developing a lot pretty quickly and things will be more stable after I add the testing suite.

Sounds good! The work you're doing is really cool.


vsoch commented on August 16, 2024

Thanks again for the help! I'm definitely making good progress - it's more fun and addictive than "real" work haha. :)


MaxHalford commented on August 16, 2024

Just thinking out loud: one of the things I would like this kind of platform to support is non-HTTP traffic. For instance, being able to handle WebSockets and/or SSE would be cool. Indeed, I think that persistent connections play nicely with online learning and sensors etc. I had my eyes on using FastAPI for this reason. Anyway, just some food for thought! I'm sure the same can be done with Django.


vsoch commented on August 16, 2024

Django can handle WebSockets, but FastAPI is definitely faster than Django! And I agree that would be neat. The way I've designed Django River ML is to have

  1. a pretty generic internal client that I could easily plug into another application backend (e.g., FastAPI) with only tweaks to setup, middleware, and how the API endpoints are created
  2. a specification for the endpoints (I haven't written it out fully - it's in a markdown draft not yet in version control, but it will be soon), and
  3. a common terminal client to interact with them, to make it easy (see https://github.com/vsoch/riverapi - sorry, no pretty docs there yet but coming soon!)

And these things will make it easy to plug into whatever other Python frameworks we would be interested in! And I'd be happy to make one in FastAPI too, although I'd like to finish Django River ML first because I'm pretty excited to test it out for my use case :)


vsoch commented on August 16, 2024

And that reminds me, I was starting to brainstorm what front-end views we might provide for basic functionality. It's a plugin, so we can't take over the entire design of an app, but I think some basic demo or views (even to include elsewhere) would be neat. I'm going to sleep now, but something to think about - let me know what ideas you have!


MaxHalford commented on August 16, 2024

a pretty generic internal client that I could easily plug into another application backend (e.g., FastAPI) with only tweaks to setup, middleware, and how the API endpoints are created

Exactly! In fact it should be able to do this online learning dance without any connection to the internet. For instance, when running a model on a closed-off device where the loop is self-contained. What matters here is to create the right abstractions via interfaces.

a common terminal client to interact with them, to make it easy (see https://github.com/vsoch/riverapi - sorry, no pretty docs there yet but coming soon!)

Agreed! You need a client to interact with the "platform".

And I'd be happy to make one in FastAPI too, although I'd like to finish Django River ML first because I'm pretty excited to test it out for my use case :)

Enjoy :)

And that reminds me, I was starting to brainstorm what front-end views we might provide for basic functionality. It's a plugin, so we can't take over the entire design of an app, but I think some basic demo or views (even to include elsewhere) would be neat.

Yes, I agree with that. My experience here is that the interface should be read-only: you can't use the interface to add/remove models or whatnot. It's just a view.


vsoch commented on August 16, 2024

A follow up question! I'm testing the multi-class example, and for example here is my model:

In [18]: model
Out[18]: 
Pipeline (
  StandardScaler (
    with_std=True
  ),
  OneVsOneClassifier (
    classifier=LogisticRegression (
      optimizer=SGD (
        lr=Constant (
          learning_rate=0.01
        )
      )
      loss=Log (
        weight_pos=1.
        weight_neg=1.
      )
      l2=0.
      intercept_init=0.
      intercept_lr=Constant (
        learning_rate=0.01
      )
      clip_gradient=1e+12
      initializer=Zeros ()
    )
  )
)

And it has the function, but I suspect it just raises NotImplementedError:

model.predict_
                        predict_many()       predict_proba_many()
                        predict_one()        predict_proba_one() 

So the question for the server - the standard case is always predicting one, but this doesn't seem to work for this multiclass flavor:

~/anaconda3/envs/river/lib/python3.10/site-packages/river/base/classifier.py in predict_proba_one(self, x)
     49         # method that each classifier has to implement. Instead, we raise an exception to indicate
     50         # that a classifier does not support predict_proba_one.
---> 51         raise NotImplementedError
     52 
     53     def predict_one(self, x: dict) -> base.typing.ClfTarget:

NotImplementedError:

I think this happened because we are checking for the functions as attributes on the model, but since the abstract class implements it, it is technically there

def check_model(self, model):
    for method in ("learn_one", "predict_proba_one"):
        if not hasattr(model, method):
            return False, f"The model does not implement {method}."
    return True, None

but it is not actually implemented there (it looks like we only have predict_one and learn_one?) https://github.com/online-ml/river/blob/ec1cf318310add301afe12160cebc66eaebcec2c/river/multiclass/ovo.py#L74-L97
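This failure mode can be reproduced with a couple of toy classes: hasattr sees the inherited stub, so a stricter check is to compare the method against the base class's. A sketch with hypothetical classes, not river's code:

```python
# Sketch: why hasattr() is fooled when an abstract base class provides a
# stub that raises NotImplementedError (toy classes, not river's code).
class Base:
    def predict_proba_one(self, x):
        raise NotImplementedError

class OvO(Base):
    # Implements predict_one but inherits the predict_proba_one stub.
    def predict_one(self, x):
        return "sky"

model = OvO()
print(hasattr(model, "predict_proba_one"))  # True -- the stub counts

# Stricter: is the method actually overridden relative to the base class?
overridden = type(model).predict_proba_one is not Base.predict_proba_one
print(overridden)  # False -- not really implemented
```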

Anyhoo - so my questions are:

  1. How should we handle this check method?
  2. What prediction/learn functions should the multiclass flavor use?

I'm also noticing that these datasets are under multiclass but labeled as binary, with a note that "a multiclass should work too!"

[screenshot: the datasets docs, listed under multiclass but labeled as binary]

That alone might be the issue - that we aren't expected to use multiclass? But I was hoping to find a dataset / example that can use it (and I would like the server to support it!)


MaxHalford commented on August 16, 2024

I think this happened because we are checking for the functions as attributes on the model, but since the abstract class implements it, it is technically there

This is specific to that multi-class model; it doesn't support outputting probabilities. It implements predict_one, but not predict_proba_one. This is very much an edge-case, because you should expect every classifier (regardless of binary or multi-class) to implement predict_proba_one.

How should we handle this check method?

I'm not sure how to check a method is implemented with the current way River's base classes are set up.

What prediction/learn functions should the multiclass flavor use?

It should always be predict_proba_one and learn_one. But I would try/except the predict_proba_one call and fall back to predict_one if the former raises NotImplementedError.


vsoch commented on August 16, 2024

I'll try that! And eventually there should be some clarity for the user about which models will work, and which won't.


vsoch commented on August 16, 2024

I tried changing to predict_one to avoid the NotImplementedError (that worked!), but the ground truth and predictions I get back are both strings, and when that hits the update_metrics function it falls into the last if/else:

In [5]: prediction
Out[5]: 'path'

In [6]: ground_truth
Out[6]: 'sky'

and results in this error in river:

In [4]:                 metric.update(y_true=ground_truth, y_pred=prediction)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/Desktop/Code/django-river-ml/django_river_ml/client.py in <module>
----> 1 metric.update(y_true=ground_truth, y_pred=prediction)

~/anaconda3/envs/river/lib/python3.10/site-packages/river/metrics/base.py in update(self, y_true, y_pred, sample_weight)
    424 
    425     def update(self, y_true, y_pred, sample_weight=1.0):
--> 426         self._mean.update(x=self._eval(y_true, y_pred), w=sample_weight)
    427         return self
    428 

~/anaconda3/envs/river/lib/python3.10/site-packages/river/metrics/cross_entropy.py in _eval(self, y_true, y_pred)
     46 
     47     def _eval(self, y_true, y_pred):
---> 48         return optim.losses.CrossEntropy()(y_true, y_pred)

~/anaconda3/envs/river/lib/python3.10/site-packages/river/optim/losses.py in __call__(self, y_true, y_pred)
    246         total = 0
    247 
--> 248         for label, proba in y_pred.items():
    249             if y_true == label:
    250                 total += self.class_weight.get(label, 1.0) * math.log(

AttributeError: 'str' object has no attribute 'items'

Might it be the case that predict_one is returning the wrong format? Or are we in another case of a model being incorrectly matched with the metrics? For metrics I have:

[Accuracy: 0.00%,
 CrossEntropy: 0.,
 MacroPrecision: 0.,
 MacroRecall: 0.,
 MacroF1: 0.,
 MicroPrecision: 0.,
 MicroRecall: 0.,
 MicroF1: 0.]

model flavor:

Pipeline (
  StandardScaler (
    with_std=True
  ),
  OneVsOneClassifier (
    classifier=LogisticRegression (
      optimizer=SGD (
        lr=Constant (
          learning_rate=0.01
        )
      )
      loss=Log (
        weight_pos=1.
        weight_neg=1.
      )
      l2=0.
      intercept_init=0.
      intercept_lr=Constant (
        learning_rate=0.01
      )
      clip_gradient=1e+12
      initializer=Zeros ()
    )
  )
)

I've never used logistic regression in a multiclass case - I'm used to getting a value between 0 and 1 and applying some threshold for two classes. For reference, I was looking here: https://riverml.xyz/latest/api/multiclass/OneVsOneClassifier/ which groups it under multiclass (hence why I'm probably incorrectly using it here!)

And I'm thinking about the design of model "flavors" - if it turns out to be hard to generalize models into these flavors, it might make sense to have a direct lookup of "for model X, use this base class, metrics, etc.", but maybe we can still get it working with the more general flavors already here.

Thanks for your help! I should be able to make some more time this weekend to work on river - had a busy end of the week!


MaxHalford commented on August 16, 2024

Mmm I'm not sure I fully understand what you're doing.

Classification metrics expect labels or dictionaries with a probability for each label. Each classification metric has a requires_labels property indicating this. Does that help?
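Concretely, a probability-based metric like CrossEntropy wants a dict mapping each label to a probability, while label-based metrics accept the bare label. Toy values, just to show the two shapes:

```python
import math

# Toy values showing the two prediction shapes classification metrics expect.
y_true = "sky"
y_pred_probas = {"sky": 0.7, "path": 0.3}  # fine for probability metrics
y_pred_label = "path"                      # fine only for label metrics

# Per-sample cross-entropy needs the probability assigned to the true label:
loss = -math.log(y_pred_probas[y_true])
print(round(loss, 4))  # 0.3567
```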


vsoch commented on August 16, 2024

Sorry for not being clear, let me try to better walk through it.

  1. So I start with the LogisticRegression wrapper example that I linked above. It's under "multiclass" and I chose it to test that endpoint.
  2. This results in the MultiClass model, and the following metrics:
[Accuracy: 0.00%,
 CrossEntropy: 0.,
 MacroPrecision: 0.,
 MacroRecall: 0.,
 MacroF1: 0.,
 MicroPrecision: 0.,
 MicroRecall: 0.,
 MicroF1: 0.]

and this is correct, at least reflected also in Chantilly here

  3. When we are updating metrics (this part of Chantilly) for the metric CrossEntropy: 0.,
  • requires_labels: False
  • isinstance(ClassificationMetric) is True
  • However the prediction is not a dictionary, so it fails the first if statement and we hit here

So now we have:

In [26]: ground_truth
Out[26]: 'sky'

In [27]: prediction
Out[27]: 'path'

And calling this method:

 metric.update(y_true=ground_truth, y_pred=prediction)

results in the error above - the prediction is a string and not a dict.

If we trace back up to where the original error was, to recall our previous discussion, the flavor is multiclass (since it comes from the multiclass examples)

flavor
Out[30]: <django_river_ml.flavors.MultiClassFlavor at 0x7fa1b8e11900>

so the default pred_func is predict_proba_one, which fails with the NotImplementedError. So I fall back to predict_one, and that is the reason we get back a string.

So I think what I'm hearing is that I'm not allowed to provide a str to this metric. So here are some options for moving forward:

  1. This model is in fact not multiclass and I shouldn't be using it in that context, period, in which case my suggestion for the implementation is that we do a better mapping of model -> model type and remove the need for the user to specify it (and possibly get it wrong)
  2. This case is possible but the label I'm returned back needs to be reformatted into a different structure to allow it to work
  3. This case is possible but should not be allowed to update a metric given that a str is returned

In any case, I do think this feels a little buggy and we should get to the bottom of it - let me know what questions you have!


MaxHalford commented on August 16, 2024

Ok I think I understand. Thanks for the details.

Your problem is that OneVsOneClassifier doesn't implement predict_proba_one, so you can't produce probabilities. This means that the metrics which require probability estimates, namely CrossEntropy, should not be updated, period.

How I would handle this:

  1. Try to call predict_proba_one
  2. Catch NotImplementedError
  3. Fall back to predict_one
  4. Loop over the metrics
  5. Skip cases where metric.requires_labels == False

How does that sound?
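The five steps above could be sketched like this, with toy stand-ins for the model and metrics (hypothetical names, not river's classes):

```python
# Sketch of the five-step recipe, using toy classes as stand-ins.
class OvOModel:
    """Predicts labels only, like OneVsOneClassifier."""
    def predict_proba_one(self, x):
        raise NotImplementedError
    def predict_one(self, x):
        return "sky"

class Accuracy:
    requires_labels = True          # works with hard labels
    def __init__(self):
        self.n, self.correct = 0, 0
    def update(self, y_true, y_pred):
        self.n += 1
        self.correct += (y_true == y_pred)

class CrossEntropy:
    requires_labels = False         # needs probability dicts
    def update(self, y_true, y_pred):
        y_pred.items()              # would crash on a plain label

def update_metrics(model, metrics, x, y_true):
    try:
        y_pred = model.predict_proba_one(x)   # steps 1-2: try probabilities
        have_probas = True
    except NotImplementedError:
        y_pred = model.predict_one(x)         # step 3: fall back to a label
        have_probas = False
    for metric in metrics:                    # step 4: loop over the metrics
        if not have_probas and not metric.requires_labels:
            continue                          # step 5: skip probability metrics
        metric.update(y_true=y_true, y_pred=y_pred)

acc, ce = Accuracy(), CrossEntropy()
update_metrics(OvOModel(), [acc, ce], x={}, y_true="sky")
print(acc.correct, acc.n)  # 1 1
```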


vsoch commented on August 16, 2024

Okay, that worked! For a regression model with mean absolute error (MAE) there simply wasn't a requires_labels attribute, and it hit the last statement because it's not a ClassificationMetric (this is for a regression model). So instead of checking for that attribute, I'm going to do the dumb Pythonic thing and just try/except: if the metric can be updated, great; if not, it's probably not intended for this case, or (if it's a bug) we will find it eventually :)
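That "just try it" approach can be sketched as follows, with a hypothetical regression metric standing in for the MAE case described above:

```python
# Sketch of the try/except approach: attempt the update and skip metrics
# that reject the prediction type (toy MAE class, not river's).
class MAE:
    def __init__(self):
        self.total, self.n = 0.0, 0
    def update(self, y_true, y_pred):
        self.total += abs(y_true - y_pred)   # crashes on string labels
        self.n += 1

metrics = [MAE()]
ground_truth, prediction = "sky", "path"     # classification outputs
for metric in metrics:
    try:
        metric.update(y_true=ground_truth, y_pred=prediction)
    except (TypeError, AttributeError, NotImplementedError):
        pass  # metric isn't meant for this model flavor; skip it
print(metrics[0].n)  # 0 -- the incompatible metric was skipped
```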

I'm done adding multiclass; next I'm going to do the new label endpoint, write some tests, and then do a PR - I'll post here when it's in!


vsoch commented on August 16, 2024

Okay, tiny progress! vsoch/django-river-ml#7 This is great - we have label and a multiclass example, and I think next I'm going to give a shot at a KMeans model (or unsupervised). Thanks again for your help - I suspect it's late in your time zone, so I won't ping you further today!

