precision@k recall@k AUC


Add more metrics about spotlight HOT 3 OPEN

maciejkula commented on May 20, 2024

Add more metrics

from spotlight.

maciejkula commented on May 20, 2024

As per #55


mokarakaya commented on May 20, 2024

First, thank you very much for implementing Spotlight. I am planning to use Spotlight for my further study.

I'd like to contribute by implementing AUC. Here is my implementation plan, along with some questions:

1. We can build either a precision-recall curve (axes are precision and recall) or a ROC curve (axes are true positive rate and false positive rate) (see Ref1). I think the precision-recall curve is fine. What is your opinion?
2. We will use different values of k (the number of recommended items) to produce the different points of the curve.
3. The new evaluation metric will return a single result, since AUC reduces a curve to one number. I see that results in Spotlight (e.g. precision and recall) are generally arrays rather than single values. Do you think the AUC metric should return an array or a single result?
4. We can calculate the area under the curve using either the trapezoidal rule or Simpson's rule. By default, the metric will use the trapezoidal rule; Simpson's rule will be optional.

Do you think the plan is ok for implementation? Please let me know your comments.

Ref1 - Recommender Systems Handbook 2nd edition - 8.3.2.2 Measuring Usage Prediction

P.S. The curve needs to reach the x=1 and y=1 values, since this metric is generally used to compare multiple algorithms.
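The plan above could be sketched roughly as follows. This is only an illustration in plain Python, not Spotlight code: the function names (`precision_recall_at_k`, `pr_auc`, `pr_auc_per_user`) are hypothetical, each user is assumed to have at least one relevant item, and anchoring the curve at (recall=0, precision=1) is an assumed convention rather than something decided in this thread. It uses the trapezoidal rule only and returns one value per user, which can then be averaged if a single result is wanted.

```python
def precision_recall_at_k(ranked_items, relevant_items, k):
    """Precision and recall of the top-k ranked items for a single user."""
    relevant = set(relevant_items)  # assumption: never empty
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / k, hits / len(relevant)


def pr_auc(ranked_items, relevant_items, ks):
    """Trapezoidal area under the precision-recall curve for one user.

    Each cut-off k in `ks` contributes one (recall, precision) point.
    """
    # Assumption: start the curve at recall=0, precision=1.
    recall, precision = [0.0], [1.0]
    for k in sorted(ks):  # recall is non-decreasing as k grows
        p, r = precision_recall_at_k(ranked_items, relevant_items, k)
        precision.append(p)
        recall.append(r)

    # Trapezoidal rule: segment width times mean height, summed.
    area = 0.0
    for i in range(1, len(recall)):
        width = recall[i] - recall[i - 1]
        height = (precision[i] + precision[i - 1]) / 2.0
        area += width * height
    return area


def pr_auc_per_user(rankings, relevant_sets, ks):
    """One AUC value per user, mirroring the per-user results discussed here."""
    return [pr_auc(ranked, relevant, ks)
            for ranked, relevant in zip(rankings, relevant_sets)]


if __name__ == "__main__":
    rankings = [[3, 1, 4, 0, 2], [0, 1, 2, 3, 4]]  # toy data, best item first
    relevant_sets = [{3, 4}, {0}]
    scores = pr_auc_per_user(rankings, relevant_sets, ks=range(1, 6))
    print(scores)                     # one AUC per user
    print(sum(scores) / len(scores))  # single aggregated value
```

Letting k run up to the number of candidate items guarantees that recall reaches 1, which keeps curves from different algorithms comparable over the same range.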


nikisix commented on May 20, 2024

@mkarakaya,
You should contribute your idea! Here are my thoughts having helped out in the past on spotlight evaluation metrics:

1. Yes, precision-recall is fine.
2. Of course.
3. I asked the same question, and @maciejkula's response was as you guessed: an array. Personally, I would not be opposed to a single result, though, in case integrating a large number of AUCs turns out to be slow and could be sped up by aggregating first.
4. Sure.

Also, have you considered a confusion matrix or at least F1-score?

Good luck!

