Comments (50)
@L-M-Sherlock @user1823 I made a reddit post about the new benchmark: https://www.reddit.com/r/Anki/comments/18csuer/fsrs_is_now_the_most_accurate_spaced_repetition/
I suggest removing the links to my previous posts from wiki, and adding a link to this post instead.
from srs-benchmark.
OK. I will update the default parameters tomorrow. Maybe I should also draw a distribution graph for the first four parameters.
My computer crashed twice. It took 30 hours to run the benchmark for FSRSv4:
Model: FSRSv4
Total number of users: 19934
Total number of reviews: 738181395
metric: LogLoss
FSRSv4 mean: 0.3658
metric: RMSE
FSRSv4 mean: 0.3141
metric: RMSE(bins)
FSRSv4 mean: 0.0818
weights: [0.6774, 1.6889, 4.4659, 15.0454, 4.9454, 1.0627, 0.8762, 0.0464, 1.5597, 0.1358, 0.994, 2.1661, 0.0681, 0.3403, 1.2845, 0.2573, 2.7212]
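For context on how these parameters translate into intervals: as I understand FSRS v4, the first four weights are the initial stability after the first rating (Again/Hard/Good/Easy), and with the default requested retention of 0.9 the first interval comes out numerically equal to the initial stability. A minimal sketch (the curve and parameter meanings are my reading of FSRS v4, not code from this repo):

```python
w = [0.6774, 1.6889, 4.4659, 15.0454]  # initial stability after Again/Hard/Good/Easy (my reading)

def retrievability(t, s):
    # FSRS v4 power forgetting curve: R(t, S) = (1 + t / (9 S)) ** -1
    return (1 + t / (9 * s)) ** -1

def next_interval(s, request_retention=0.9):
    # Invert R(t, S) = r for t: t = 9 S (1/r - 1); at r = 0.9 this reduces to S
    return round(9 * s * (1 / request_retention - 1))

print(next_interval(w[2]))  # → 4
```

With these median weights, the first interval after Good rounds to 4 days, which is the 4-day first interval debated later in the thread.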
Model: FSRSv4
Total number of users: 19934
Total number of reviews: 738181395
metric: LogLoss
FSRSv4 mean: 0.3360
metric: RMSE
FSRSv4 mean: 0.3002
metric: RMSE(bins)
FSRSv4 mean: 0.0590
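Since RMSE(bins) appears throughout these results without a definition: my understanding is that it is a calibration metric where predictions are grouped into bins, and each bin's mean predicted retention is compared with its observed recall rate, weighted by bin size. A sketch under that assumption (equal-width bins are my choice; the benchmark's exact binning may differ):

```python
def rmse_bins(predictions, labels, n_bins=20):
    # Group reviews into equal-width bins of predicted retention, then compare
    # each bin's mean prediction to its observed recall rate, weighted by size.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predictions, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(predictions)
    se = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        mean_y = sum(y for _, y in b) / len(b)
        se += len(b) / total * (mean_p - mean_y) ** 2
    return se ** 0.5
```

A well-calibrated model can score low here even if individual predictions are noisy, which is why it complements plain RMSE.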
However, the new median weights will schedule a longer interval (4 days) for a new card when Good is pressed for the first time. I'm afraid that it will cause more confusion.
But they are based on a very large number of collections and should be more accurate. I think accuracy is more important. We can tell people that short early intervals are a shortcoming of SM-2, which FSRS has corrected.
Also see Q10 in https://github.com/open-spaced-repetition/fsrs4anki/wiki/FAQ
Also, I would like you to benchmark FSRS v4.5 against SM-17 in the respective repo.
The new dataset has at least 1.5 billion revlogs. It will take a lot of time to benchmark.
Wow. I thought it would be 100-400 million.
Out of curiosity, what are your PC specs?
Also, if you make the dataset downloadable, I can do some benchmarking on my own and report the results, to speed things up a little bit.
You can use a subset if you wish. I provided that much so you can do longer tests when required, or divide the sample set up to see how different samples differ.
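For subsetting, a hypothetical helper (the function name and interface are mine, not part of the benchmark) that draws a reproducible sample of users:

```python
import random

def sample_users(user_ids, k=1000, seed=42):
    # Draw a reproducible random subset of user ids for a quicker benchmark run;
    # sorting first makes the result independent of input ordering.
    rng = random.Random(seed)
    return rng.sample(sorted(user_ids), k)
```

Fixing the seed means different algorithms can be compared on exactly the same subset.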
Out of curiosity, what are your PC specs?
Mac M2 MAX 96GB unified memory
I'm trying to find an efficient method to deal with the dataset.
The converted dataset file:
https://drive.google.com/file/d/1Tg8WmLHxK_VRxnAtATXS_HYdywsaxTyf/view?usp=drive_link
The new benchmark code:
https://github.com/open-spaced-repetition/fsrs-benchmark/tree/Feat/new-dataset
I'm benchmarking FSRSv4.
Once you benchmark it, can you send me a file with RMSE values, as well as % of button usage for each user? I want to re-do my analysis of how button usage affects RMSE.
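A hypothetical helper (names are mine) for the button-usage side of that analysis, computing the fraction of each rating in one user's review log:

```python
from collections import Counter

def button_usage(ratings):
    # Fraction of Again/Hard/Good/Easy presses in one user's review log
    counts = Counter(ratings)
    total = len(ratings)
    return {b: counts.get(b, 0) / total for b in ("again", "hard", "good", "easy")}
```

Pairing these fractions with each user's RMSE would give the data needed to redo the correlation analysis.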
I tried `git clone https://github.com/open-spaced-repetition/fsrs-benchmark/tree/Feat/new-dataset.git` and got an error saying that the repository was not found.
I got an error saying that the repository was not found.
You need to clone the fsrs-benchmark repo and checkout the Feat/new-dataset branch.
https://www.atlassian.com/git/tutorials/using-branches/git-checkout
I cloned it some time ago, but I can't figure out the checkout stuff.
I recommend installing GitHub Desktop: https://desktop.github.com/, where you can checkout the branch conveniently:
Ok, what next? How do I run the benchmark?
For example:
MODEL=SM2 & python other.py
Sorry, but I'm afraid you will have to be very patient with me. Explain the setup process to me as if I were a 45-year-old housewife.
- Where do I unzip the dataset after downloading it? I don't see a folder for it.
- I don't see the `new-dataset` branch on my PC. Do I need to clone it somehow? If so, how do I do it using GitHub Desktop?
- Where do I input the code you gave above? I tried this:
But that doesn't seem to work.
1. Where do I unzip the dataset after downloading it? I don't see a folder for it.
You need to unzip dataset.7z and move the `dataset` directory to `fsrs-benchmark/`.
2. I don't see the `new-dataset` branch on my PC.
Could you expand this menu?
3. Where do I input the code you gave above? I tried this:
You need to open a terminal, cd into `fsrs-benchmark`, and then you can input the command.
As far as I can tell, switching to a different branch doesn't affect any files on my PC.
You need to open a terminal, cd into `fsrs-benchmark`, and then you can input the command.
I get a "'MODEL' is not recognized as an internal or external command, operable program or batch file." error. But it still runs though.
And I'm getting the same issue I had before, with graphs popping up.
Unfortunately, I don't remember what code in which file was responsible for this.
EDIT: it's utils.py, I fixed it.
I get a "'MODEL' is not recognized as an internal or external command
Sorry, I forgot that you use Windows. It should be `set MODEL=SM2`. You can replace SM2 with LSTM, FSRSv3, or HLR.
switching to a different branch doesn't affect any files on my PC.
It does. It can change the contents of script.py and other.py.
switching to a different branch doesn't affect any files on my PC.
It does. It can change the contents of script.py and other.py.
My bad. I tried switching back and forth and checking whether "Date modified" of script.py and other.py changes, and yes, it does.
It should be `set MODEL=SM2`
Thank you, that works. But I see stuff like this:
Even though there are no files with that name, or any .tsv files for that matter.
EDIT: it's utils.py, I fixed it.
Strange. Last time I fixed that issue by editing utils.py, but not this time.
Did you remove the old dataset?
The fsrs-benchmark folder only has the new dataset.
Ah, I see the problem. I have two fsrs-benchmark folders, the previous clone and the new one. Though that still doesn't explain the problem with graphs popping up, since I commented out graphing in both utils.py files.
Ok, I'll leave the benchmarking to you. This is giving me a headache.
And please don't forget #14 (comment)
Thank you! Will you update the results with values weighted by reviews and ln(reviews)?
Also, it says there are around 700 million reviews, not 1.5 billion. Why?
Also, it says there are around 700 million reviews, not 1.5 billion. Why?
The 1.5 billion include short-term reviews.
Thank you! Will you update the results with values weighted by reviews and ln(reviews)?
I use median by user.
Thank you! Will you update the results with values weighted by reviews and ln(reviews)?
I use median by user.
I meant this:
The above metrics are weighted by ln(review).
Then for the sake of consistency, include weighting by n_reviews too, to make everything the same as in the old benchmark.
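The two weighting schemes under discussion amount to the following (a sketch with my own naming, not the benchmark's code): each user's metric is averaged across users with a weight of either the raw review count or its natural log.

```python
import math

def weighted_mean(values, n_reviews, weight="ln"):
    # Weight each user's metric by ln(review count) ("ln") or by the raw count ("n")
    ws = [math.log(n) if weight == "ln" else n for n in n_reviews]
    return sum(v * w for v, w in zip(values, ws)) / sum(ws)
```

Raw-count weighting lets a few huge collections dominate; ln weighting compresses that influence, which is why the two tables below differ so much.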
The same goes for all other algorithms too. Speaking of which, I would love to see a dry run with the new default parameters.
I've been trying to run the benchmark myself and this is the new error I get
Commands:
cd fsrs-benchmark
set DEV_MODE=1 && set MODEL=SM2 && python other.py
Error:
Traceback (most recent call last):
File "C:\Users\Andrew\fsrs-benchmark\other.py", line 674, in <module>
if file.stem in map(lambda x: x.stem, Path(f"result/{model}").iterdir()):
File "C:\Users\Andrew\AppData\Local\Programs\Python\Python310\lib\pathlib.py", line 1017, in iterdir
for name in self._accessor.listdir(self):
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'result\\SM2 '
C:\Users\Andrew\fsrs-benchmark\result\SM2 is a valid path, so idk what's the problem.
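For what it's worth, the traceback shows `'result\\SM2 '` with a trailing space: under cmd.exe, `set MODEL=SM2 && ...` stores the space before `&&` as part of the variable's value. A defensive sketch (hypothetical; not the repo's actual code):

```python
import os

def model_from_env(env=os.environ):
    # Strip the trailing space that `set MODEL=SM2 && ...` leaves in the
    # value under cmd.exe, so Path(f"result/{model}") resolves correctly
    return env.get("MODEL", "FSRSv4").strip()

print(model_from_env({"MODEL": "SM2 "}))  # → SM2
```

Writing `set MODEL=SM2&& python other.py`, with no space before `&&`, would also avoid the problem.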
I did manage to make script.py work, but not other.py.
Well, at least it means I can benchmark my ideas on the new dataset in the future.
Please add FSRS v4 (dry run) to the benchmark with the new dataset too.
Please add FSRS v4 (dry run) to the benchmark with the new dataset too.
Which initial parameters should the dry run use?
These: #14 (comment)
Also, share them with Dae so that he will add them to Anki 23.12.
Actually, I'm not sure which parameters built-in FSRS should use
The dry run's results are out.
Total number of users: 19995
Total number of reviews for evaluation: 738,221,150
Weighted by number of reviews
| Algorithm | Log Loss | RMSE | RMSE(bins) |
| --- | --- | --- | --- |
| FSRS v4 | 0.3360 | 0.3002 | 0.0590 |
| FSRS rs | 0.3404 | 0.3018 | 0.0628 |
| LSTM | 0.4000 | 0.3123 | 0.0859 |
| FSRS v4 (unoptimized) | 0.3677 | 0.3128 | 0.0886 |
| FSRS v3 | 0.4286 | 0.3226 | 0.1074 |
| SM2 | 0.6339 | 0.3592 | 0.1799 |
| HLR | 0.8175 | 0.3790 | 0.2094 |
Weighted by ln(number of reviews)
| Algorithm | Log Loss | RMSE | RMSE(bins) |
| --- | --- | --- | --- |
| FSRS v4 | 0.3658 | 0.3141 | 0.0820 |
| FSRS rs | 0.3705 | 0.3161 | 0.0860 |
| FSRS v4 (unoptimized) | 0.4023 | 0.3313 | 0.1155 |
| FSRS v3 | 0.5201 | 0.3471 | 0.1425 |
| LSTM | 0.5645 | 0.3562 | 0.1536 |
| SM2 | 0.8702 | 0.3954 | 0.2240 |
| HLR | 2.3669 | 0.5393 | 0.4096 |
However, the new median weights will schedule a longer interval (4 days) for a new card when Good is pressed for the first time. I'm afraid that it will cause more confusion.
I agree with user1823. Built-in FSRS should use the new, more accurate default parameters.
Btw, I don't think there are any use cases for the old (small) dataset now, so I suggest making the new-dataset branch the main branch.