Comments (7)
I have invited you to join the organization. Once you accept, I will give you write access to the fsrs4anki repository. You can then edit the following wiki page:
https://github.com/open-spaced-repetition/fsrs4anki/wiki/The-Metric
from srs-benchmark.
It would be more convenient if I had editing rights too.
I'm writing a draft. You can give some advice, or just write some paragraphs.
Introduction
Weighted Root Mean Square Error in Bins (RMSE (bins)) is a metric engineered to evaluate the accuracy of memory prediction by FSRS and other spaced repetition algorithms in our SRS Benchmark.
In our recent algorithm comparison experiment, we found that the RMSE (bins) metric can be gamed in certain cases. To prevent cheating algorithms from obtaining artificially good scores (that is, artificially low errors), we modified the definition of the metric.
This article contains three parts:
- the old definition of RMSE (bins)
- the cheating case
- the new definition
Old definition
https://www.reddit.com/r/Anki/comments/15mab6e/fsrs_explained_part_2_accuracy/
The cheating case
The cheating method is very simple: always output the average probability. Taking weather forecasting as an example, a forecast that always predicts tomorrow's probability of rain as the historical average will have a very low RMSE (bins), because its single prediction matches the overall observed rate. However, such a forecast is completely useless.
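The effect can be reproduced with a small simulation. The sketch below is an illustration only, not the benchmark's code: it assumes the old metric groups reviews by predicted probability, and the 20 equal-width bins, the function name `rmse_by_predicted_probability`, and the synthetic data are all hypothetical choices.

```python
import numpy as np

def rmse_by_predicted_probability(p, y, n_bins=20):
    """Weighted RMSE over bins formed from the predicted probability.

    Equal-width bins and n_bins=20 are assumptions made for illustration.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    # Map each prediction to a bin index in [0, n_bins - 1].
    bin_ids = np.minimum((p * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in np.unique(bin_ids):
        mask = bin_ids == b
        # Squared gap between mean prediction and mean outcome, weighted by bin size.
        total += mask.sum() * (p[mask].mean() - y[mask].mean()) ** 2
    return float(np.sqrt(total / len(p)))

rng = np.random.default_rng(0)
true_p = rng.uniform(0.5, 1.0, size=10_000)      # hypothetical true recall probabilities
y = (rng.random(10_000) < true_p).astype(float)  # simulated review outcomes

# The "cheating" predictor: the same average probability for every review.
constant_p = np.full_like(true_p, y.mean())
cheat_score = rmse_by_predicted_probability(constant_p, y)
print(cheat_score)
```

Because every prediction falls into the same bin, the mean prediction equals the mean outcome in that bin, and the error is essentially zero even though the predictor carries no information about individual reviews.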
New definition
The main difference is the binning method. Instead of grouping the predictions and review outcomes by the predicted probability, the new method groups them based on three features: the interval length, the number of reviews, and the number of lapses.
Within each bin, the squared difference between the average predicted probability of recall and the average recall rate is calculated. These values are then weighted according to the sample size in each bin, and the square root of their weighted mean gives the final RMSE (bins).
Taking weather forecasting as an example again, we can group the predicted probability of rain by season, temperature, air pressure, and other features. The historical average method will clearly perform very poorly on this metric.
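The calculation described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the benchmark's implementation: the function name `rmse_bins` and the tuple layout are hypothetical, and the raw feature values are used directly as the bin key, whereas a real implementation would likely coarsen them in some way not specified here.

```python
import math
from collections import defaultdict

def rmse_bins(reviews):
    """reviews: iterable of (interval, n_reviews, n_lapses, predicted, recalled).

    Bins are keyed by the raw (interval, n_reviews, n_lapses) triple,
    which is an assumption made for illustration.
    """
    # Per bin: [sum of predictions, sum of outcomes, sample count].
    bins = defaultdict(lambda: [0.0, 0.0, 0])
    for interval, n_reviews, n_lapses, predicted, recalled in reviews:
        b = bins[(interval, n_reviews, n_lapses)]
        b[0] += predicted
        b[1] += recalled
        b[2] += 1
    total = sum(n for _, _, n in bins.values())
    # Squared gap between mean prediction and mean recall, weighted by bin size.
    weighted = sum(n * (sp / n - sy / n) ** 2 for sp, sy, n in bins.values())
    return math.sqrt(weighted / total)

# Toy data: short intervals are always recalled, long intervals half the time.
outcomes = [(1, 1, 0, 1)] * 4 + [(30, 3, 1, 1)] * 2 + [(30, 3, 1, 0)] * 2

# Cheating predictor: the overall average recall rate (6 / 8 = 0.75) everywhere.
cheat = [(i, r, l, 0.75, y) for i, r, l, y in outcomes]
# Informative predictor: matches the recall rate of each group.
honest = [(i, r, l, 1.0 if i == 1 else 0.5, y) for i, r, l, y in outcomes]

cheat_score = rmse_bins(cheat)
honest_score = rmse_bins(honest)
print(cheat_score, honest_score)
```

On this toy data the constant predictor is off by 0.25 in both bins, so it scores 0.25, while the predictor that tracks each group scores 0. Grouping by features that the cheating predictor ignores is what exposes it.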
It would be more convenient if you made a wiki page and gave me editing rights.
Thank you!
Alright, I think it looks good. @user1823 here: https://github.com/open-spaced-repetition/fsrs4anki/wiki/The-Metric. Any feedback is welcome.
I have made some edits. It LGTM.