Comments (50)
@L-M-Sherlock @user1823 I made a reddit post about the new benchmark: https://www.reddit.com/r/Anki/comments/18csuer/fsrs_is_now_the_most_accurate_spaced_repetition/
I suggest removing the links to my previous posts from wiki, and adding a link to this post instead.
from srs-benchmark.
OK. I will update the default parameters tomorrow. Maybe I should also draw a distribution graph for the first four parameters.
My computer crashed twice. It took 30 hours to run the benchmark for FSRSv4:
Model: FSRSv4
Total number of users: 19934
Total number of reviews: 738181395
metric: LogLoss
FSRSv4 mean: 0.3658
metric: RMSE
FSRSv4 mean: 0.3141
metric: RMSE(bins)
FSRSv4 mean: 0.0818
weights: [0.6774, 1.6889, 4.4659, 15.0454, 4.9454, 1.0627, 0.8762, 0.0464, 1.5597, 0.1358, 0.994, 2.1661, 0.0681, 0.3403, 1.2845, 0.2573, 2.7212]
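For context on how these parameters translate into intervals: as I understand FSRS v4, the first four weights are the initial stability after the first rating (Again/Hard/Good/Easy), and with the default requested retention of 0.9 the first interval comes out numerically equal to the initial stability. A minimal sketch (the curve and parameter meanings are my reading of FSRS v4, not code from this repo):

```python
w = [0.6774, 1.6889, 4.4659, 15.0454]  # initial stability after Again/Hard/Good/Easy (my reading)

def retrievability(t, s):
    # FSRS v4 power forgetting curve: R(t, S) = (1 + t / (9 S)) ** -1
    return (1 + t / (9 * s)) ** -1

def next_interval(s, request_retention=0.9):
    # Invert R(t, S) = r for t: t = 9 S (1/r - 1); at r = 0.9 this reduces to S
    return round(9 * s * (1 / request_retention - 1))

print(next_interval(w[2]))  # → 4
```

With these median weights, the first interval after Good rounds to 4 days, which is the 4-day first interval debated later in the thread.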
Model: FSRSv4
Total number of users: 19934
Total number of reviews: 738181395
metric: LogLoss
FSRSv4 mean: 0.3360
metric: RMSE
FSRSv4 mean: 0.3002
metric: RMSE(bins)
FSRSv4 mean: 0.0590
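Since RMSE(bins) appears throughout these results without a definition: my understanding is that it is a calibration metric where predictions are grouped into bins, and each bin's mean predicted retention is compared with its observed recall rate, weighted by bin size. A sketch under that assumption (equal-width bins are my choice; the benchmark's exact binning may differ):

```python
def rmse_bins(predictions, labels, n_bins=20):
    # Group reviews into equal-width bins of predicted retention, then compare
    # each bin's mean prediction to its observed recall rate, weighted by size.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predictions, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(predictions)
    se = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        mean_y = sum(y for _, y in b) / len(b)
        se += len(b) / total * (mean_p - mean_y) ** 2
    return se ** 0.5
```

A well-calibrated model can score low here even if individual predictions are noisy, which is why it complements plain RMSE.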
However, the new median weights will schedule a longer interval (4 days) for a new card when Good is pressed for the first time. I'm afraid that it will cause more confusion.
But they are based on a very large number of collections and should be more accurate. I think accuracy is more important. We can tell people that short early intervals are a shortcoming of SM-2, which FSRS has corrected.
Also see Q10 in https://github.com/open-spaced-repetition/fsrs4anki/wiki/FAQ
Also, I would like you to benchmark FSRS v4.5 against SM-17 in the respective repo.
The new dataset has at least 1.5 billion revlogs. It will take a lot of time to benchmark.
Wow. I thought it would be 100-400 million.
Out of curiosity, what are your PC specs?
Also, if you make the dataset downloadable, I can do some benchmarking on my own and report the results, to speed things up a little bit.
You can use a subset if you wish. I provided that much so you can do longer tests when required, or divide the sample set up to see how different samples differ.
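For subsetting, a hypothetical helper (the function name and interface are mine, not part of the benchmark) that draws a reproducible sample of users:

```python
import random

def sample_users(user_ids, k=1000, seed=42):
    # Draw a reproducible random subset of user ids for a quicker benchmark run;
    # sorting first makes the result independent of input ordering.
    rng = random.Random(seed)
    return rng.sample(sorted(user_ids), k)
```

Fixing the seed means different algorithms can be compared on exactly the same subset.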
Out of curiosity, what are your PC specs?
Mac M2 MAX 96GB unified memory
I'm trying to find an efficient method to deal with the dataset.
The converted dataset file:
https://drive.google.com/file/d/1Tg8WmLHxK_VRxnAtATXS_HYdywsaxTyf/view?usp=drive_link
The new benchmark code:
https://github.com/open-spaced-repetition/fsrs-benchmark/tree/Feat/new-dataset
I'm benchmarking FSRSv4.
Once you benchmark it, can you send me a file with RMSE values, as well as % of button usage for each user? I want to re-do my analysis of how button usage affects RMSE.
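A hypothetical helper (names are mine) for the button-usage side of that analysis, computing the fraction of each rating in one user's review log:

```python
from collections import Counter

def button_usage(ratings):
    # Fraction of Again/Hard/Good/Easy presses in one user's review log
    counts = Counter(ratings)
    total = len(ratings)
    return {b: counts.get(b, 0) / total for b in ("again", "hard", "good", "easy")}
```

Pairing these fractions with each user's RMSE would give the data needed to redo the correlation analysis.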
I tried `git clone https://github.com/open-spaced-repetition/fsrs-benchmark/tree/Feat/new-dataset.git` and got an error saying that the repository was not found.
I got an error saying that the repository was not found.
You need to clone the fsrs-benchmark repo and checkout the Feat/new-dataset branch.
https://www.atlassian.com/git/tutorials/using-branches/git-checkout
I cloned it some time ago, but I can't figure out the checkout stuff.
I recommend installing GitHub Desktop: https://desktop.github.com/, where you can checkout the branch conveniently:
Ok, what next? How do I run the benchmark?
For example:
MODEL=SM2 & python other.py
Sorry, but I'm afraid you will have to be very patient with me. Explain the setup process to me as if I were a 45-year-old housewife.
- Where do I unzip the dataset after downloading it? I don't see a folder for it.
- I don't see the `new-dataset` branch on my PC. Do I need to clone it somehow? If so, how do I do it using GitHub Desktop?
- Where do I input the code you gave above? I tried this:
But that doesn't seem to work.
1. Where do I unzip the dataset after downloading it? I don't see a folder for it.
You need to unzip dataset.7z and move the `dataset` directory to `fsrs-benchmark/`.
2. I don't see the `new-dataset` branch on my PC.
Could you expand this menu?
3. Where do I input the code you gave above? I tried this:
You need to open a terminal, cd into `fsrs-benchmark`, and then you can input the command.
As far as I can tell, switching to a different branch doesn't affect any files on my PC.
You need to open a terminal, cd into `fsrs-benchmark`, and then you can input the command.
I get a "'MODEL' is not recognized as an internal or external command, operable program or batch file." error. But it still runs though.
And I'm getting the same issue I had before, with graphs popping up.
Unfortunately, I don't remember what code in which file was responsible for this.
EDIT: it's utils.py, I fixed it.
I get a "'MODEL' is not recognized as an internal or external command
Sorry, I forgot that you use Windows. It should be `set MODEL=SM2`. You can replace SM2 with LSTM, FSRSv3, or HLR.
switching to a different branch doesn't affect any files on my PC.
It does. It can change the contents of script.py and other.py.
switching to a different branch doesn't affect any files on my PC.
It does. It can change the contents of script.py and other.py.
My bad. I tried switching back and forth and checking whether "Date modified" of script.py and other.py changes, and yes, it does.
It should be `set MODEL=SM2`
Thank you, that works. But I see stuff like this:
Even though there are no files with that name, or any .tsv files for that matter.
EDIT: it's utils.py, I fixed it.
Strange. Last time I fixed that issue by editing utils.py, but not this time.
Did you remove the old dataset?
The fsrs-benchmark folder only has the new dataset.
Ah, I see the problem. I have two fsrs-benchmark folders, the previous clone and the new one. Though that still doesn't explain the problem with graphs popping up, since I commented out graphing in both utils.py files.
Ok, I'll leave the benchmarking to you. This is giving me a headache.
And please don't forget #14 (comment)
Thank you! Will you update the results with values weighted by reviews and ln(reviews)?
Also, it says there are around 700 million reviews, not 1.5 billion. Why?
Also, it says there are around 700 million reviews, not 1.5 billion. Why?
The 1.5 billion include short-term reviews.
Thank you! Will you update the results with values weighted by reviews and ln(reviews)?
I use median by user.
Thank you! Will you update the results with values weighted by reviews and ln(reviews)?
I use median by user.
I meant this:
The above metrics are weighted by ln(review).
Then for the sake of consistency, include weighting by n_reviews too, to make everything the same as in the old benchmark.
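The two weighting schemes under discussion amount to the following (a sketch with my own naming, not the benchmark's code): each user's metric is averaged across users with a weight of either the raw review count or its natural log.

```python
import math

def weighted_mean(values, n_reviews, weight="ln"):
    # Weight each user's metric by ln(review count) ("ln") or by the raw count ("n")
    ws = [math.log(n) if weight == "ln" else n for n in n_reviews]
    return sum(v * w for v, w in zip(values, ws)) / sum(ws)
```

Raw-count weighting lets a few huge collections dominate; ln weighting compresses that influence, which is why the two tables below differ so much.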
The same goes for all other algorithms too. Speaking of which, I would love to see a dry run with the new default parameters.
I've been trying to run the benchmark myself and this is the new error I get
Commands:
cd fsrs-benchmark
set DEV_MODE=1 && set MODEL=SM2 && python other.py
Error:
Traceback (most recent call last):
File "C:\Users\Andrew\fsrs-benchmark\other.py", line 674, in <module>
if file.stem in map(lambda x: x.stem, Path(f"result/{model}").iterdir()):
File "C:\Users\Andrew\AppData\Local\Programs\Python\Python310\lib\pathlib.py", line 1017, in iterdir
for name in self._accessor.listdir(self):
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'result\\SM2 '
C:\Users\Andrew\fsrs-benchmark\result\SM2 is a valid path, so idk what's the problem.
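For what it's worth, the traceback shows `'result\\SM2 '` with a trailing space: under cmd.exe, `set MODEL=SM2 && ...` stores the space before `&&` as part of the variable's value. A defensive sketch (hypothetical; not the repo's actual code):

```python
import os

def model_from_env(env=os.environ):
    # Strip the trailing space that `set MODEL=SM2 && ...` leaves in the
    # value under cmd.exe, so Path(f"result/{model}") resolves correctly
    return env.get("MODEL", "FSRSv4").strip()

print(model_from_env({"MODEL": "SM2 "}))  # → SM2
```

Writing `set MODEL=SM2&& python other.py`, with no space before `&&`, would also avoid the problem.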
I did manage to make script.py work, but not other.py.
Well, at least it means I can benchmark my ideas on the new dataset in the future.
Please add FSRS v4 (dry run) to the benchmark with the new dataset too.
Please add FSRS v4 (dry run) to the benchmark with the new dataset too.
Which initial parameters should the dry run use?
These: #14 (comment)
Also, share them with Dae so that he will add them to Anki 23.12.
Actually, I'm not sure which parameters built-in FSRS should use
The dry run's results are out.
Total number of users: 19995
Total number of reviews for evaluation: 738,221,150
Weighted by number of reviews
| Algorithm | Log Loss | RMSE | RMSE(bins) |
| --- | --- | --- | --- |
| FSRS v4 | 0.3360 | 0.3002 | 0.0590 |
| FSRS rs | 0.3404 | 0.3018 | 0.0628 |
| LSTM | 0.4000 | 0.3123 | 0.0859 |
| FSRS v4 (unoptimized) | 0.3677 | 0.3128 | 0.0886 |
| FSRS v3 | 0.4286 | 0.3226 | 0.1074 |
| SM2 | 0.6339 | 0.3592 | 0.1799 |
| HLR | 0.8175 | 0.3790 | 0.2094 |
Weighted by ln(number of reviews)
| Algorithm | Log Loss | RMSE | RMSE(bins) |
| --- | --- | --- | --- |
| FSRS v4 | 0.3658 | 0.3141 | 0.0820 |
| FSRS rs | 0.3705 | 0.3161 | 0.0860 |
| FSRS v4 (unoptimized) | 0.4023 | 0.3313 | 0.1155 |
| FSRS v3 | 0.5201 | 0.3471 | 0.1425 |
| LSTM | 0.5645 | 0.3562 | 0.1536 |
| SM2 | 0.8702 | 0.3954 | 0.2240 |
| HLR | 2.3669 | 0.5393 | 0.4096 |
However, the new median weights will schedule a longer interval (4 days) for a new card when Good is pressed for the first time. I'm afraid that it will cause more confusion.
I agree with user1823. Built-in FSRS should use the new, more accurate default parameters.
Btw, I don't think there are any use cases for the old (small) dataset now, so I suggest making the new-dataset branch the main branch.