Expertium commented on September 24, 2024

@L-M-Sherlock @user1823 I made a reddit post about the new benchmark: https://www.reddit.com/r/Anki/comments/18csuer/fsrs_is_now_the_most_accurate_spaced_repetition/
I suggest removing the links to my previous posts from the wiki and adding a link to this post instead.

from srs-benchmark.

L-M-Sherlock commented on September 24, 2024

OK. I will update the default parameters tomorrow. Maybe I should also draw a distribution graph for the first four parameters.

L-M-Sherlock commented on September 24, 2024

My computer crashed twice. It took 30 hours to run the benchmark for FSRSv4:

Model: FSRSv4
Total number of users: 19934
Total number of reviews: 738181395
metric: LogLoss
FSRSv4 mean: 0.3658
metric: RMSE
FSRSv4 mean: 0.3141
metric: RMSE(bins)
FSRSv4 mean: 0.0818
weights: [0.6774, 1.6889, 4.4659, 15.0454, 4.9454, 1.0627, 0.8762, 0.0464, 1.5597, 0.1358, 0.994, 2.1661, 0.0681, 0.3403, 1.2845, 0.2573, 2.7212]

usage.csv
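
For reference, the three metrics reported above can be computed from each review's predicted recall probability and its outcome (1 = recalled, 0 = forgotten). The sketch below uses the standard definitions of log loss and RMSE; its rmse_bins function is a simplified stand-in (an assumption, not the benchmark's code) that groups reviews into equal-width bins of predicted probability and weights each bin by its size. The benchmark's own way of forming bins may differ.

```python
import math
from collections import defaultdict


def log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy between outcomes (0/1) and predicted recall."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never happens
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between outcomes and predicted recall."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def rmse_bins(y_true, y_pred, n_bins=20):
    """Simplified binned RMSE: bins by predicted probability (assumed scheme)."""
    bins = defaultdict(list)
    for y, p in zip(y_true, y_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((y, p))
    sq_err = 0.0
    for items in bins.values():
        mean_y = sum(y for y, _ in items) / len(items)  # observed recall rate
        mean_p = sum(p for _, p in items) / len(items)  # mean predicted recall
        sq_err += len(items) * (mean_y - mean_p) ** 2  # weight bin by its size
    return math.sqrt(sq_err / len(y_true))


outcomes = [1, 1, 0, 1, 0, 1]  # made-up example data
predictions = [0.9, 0.8, 0.3, 0.7, 0.4, 0.95]
print(round(log_loss(outcomes, predictions), 4))
print(round(rmse(outcomes, predictions), 4))
print(round(rmse_bins(outcomes, predictions), 4))
```

Log loss punishes confident wrong predictions heavily, RMSE measures per-review error, and the binned variant mainly measures calibration, which is why its values are much smaller.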

L-M-Sherlock commented on September 24, 2024
Model: FSRSv4
Total number of users: 19934
Total number of reviews: 738181395
metric: LogLoss
FSRSv4 mean: 0.3360
metric: RMSE
FSRSv4 mean: 0.3002
metric: RMSE(bins)
FSRSv4 mean: 0.0590

user1823 commented on September 24, 2024

However, the new median weights will schedule a longer interval (4 days) for a new card rated Good on its first review. I'm afraid that this will cause more confusion.

But they are based on a very large number of collections and should be more accurate. I think accuracy is more important. We can tell people that short early intervals are a shortcoming of SM-2, which FSRS has corrected.

Also see Q10 in https://github.com/open-spaced-repetition/fsrs4anki/wiki/FAQ

Expertium commented on September 24, 2024

Also, I would like you to benchmark FSRS v4.5 against SM-17 in the respective repo.

L-M-Sherlock commented on September 24, 2024

The new dataset has at least 1.5 billion revlogs. It will take a lot of time to benchmark.

Expertium commented on September 24, 2024

Wow. I thought it would be 100-400 million.

Expertium commented on September 24, 2024

Out of curiosity, what are your PC specs?
Also, if you make the dataset downloadable, I can do some benchmarking on my own and report the results, to speed things up a little bit.

dae commented on September 24, 2024

You can use a subset if you wish. I provided that much so you can do longer tests when required, or divide the sample set up to see how different samples differ.
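
If you only need a subset, or want to split the data into several samples and compare them, one way to keep the selection reproducible is to shuffle the list of per-user files with a fixed seed. This sketch assumes the dataset is a directory with one file per user; the file names below are made up, and none of this is part of the benchmark code.

```python
import random


def pick_subset(user_files, k, seed=42):
    """Return a reproducible random sample of k user files (sorted for stable output)."""
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    return sorted(rng.sample(sorted(user_files), k))


def split_into_groups(user_files, n_groups, seed=42):
    """Shuffle once, then deal files round-robin into n_groups disjoint samples."""
    files = sorted(user_files)
    random.Random(seed).shuffle(files)
    return [files[i::n_groups] for i in range(n_groups)]


# Hypothetical file names, e.g. from Path("dataset").glob("*.csv")
files = [f"{i}.csv" for i in range(1, 11)]
print(pick_subset(files, 3))
print(split_into_groups(files, 2))
```

Sorting before sampling makes the result independent of the order in which the operating system lists the files.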

L-M-Sherlock commented on September 24, 2024

Out of curiosity, what are your PC specs?

Mac with M2 Max, 96 GB unified memory

L-M-Sherlock commented on September 24, 2024

I'm trying to find an efficient method to deal with the dataset.
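
One common way to keep memory bounded on a dataset of this size is to stream the review log and process one user at a time. The sketch below assumes a CSV already sorted by a hypothetical user_id column (the column names are illustrative, not the dataset's actual schema); itertools.groupby then yields each user's rows without loading the whole file.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def iter_users(lines):
    """Yield (user_id, rows) for each user from a CSV sorted by user_id."""
    reader = csv.DictReader(lines)
    for user_id, rows in groupby(reader, key=itemgetter("user_id")):
        yield user_id, list(rows)  # only this user's rows are held in memory


sample = io.StringIO(
    "user_id,card_id,rating\n"
    "1,10,3\n"
    "1,11,1\n"
    "2,20,4\n"
)
for user_id, rows in iter_users(sample):
    print(user_id, len(rows))
```

With a real file, pass an open handle instead, e.g. `with open(path, newline="") as f: for user_id, rows in iter_users(f): ...`. The approach only works if rows for the same user are contiguous.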

L-M-Sherlock commented on September 24, 2024

The converted dataset file:
https://drive.google.com/file/d/1Tg8WmLHxK_VRxnAtATXS_HYdywsaxTyf/view?usp=drive_link

The new benchmark code:
https://github.com/open-spaced-repetition/fsrs-benchmark/tree/Feat/new-dataset

I'm benchmarking FSRSv4.

Expertium commented on September 24, 2024

Once you benchmark it, can you send me a file with RMSE values, as well as % of button usage for each user? I want to re-do my analysis of how button usage affects RMSE.
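
Per-user button usage is simply the share of each rating among that user's reviews. A minimal sketch, assuming ratings are encoded 1 = Again, 2 = Hard, 3 = Good, 4 = Easy (Anki's convention) and that each user's ratings are already available as a list:

```python
from collections import Counter

BUTTONS = {1: "Again", 2: "Hard", 3: "Good", 4: "Easy"}


def button_usage(ratings):
    """Return the percentage of reviews for each answer button."""
    counts = Counter(ratings)  # missing buttons count as 0
    total = len(ratings)
    return {name: 100 * counts[r] / total for r, name in BUTTONS.items()}


print(button_usage([3, 3, 3, 1, 4, 3, 2, 3]))
```

These percentages can then be paired with each user's RMSE to look for correlations.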

Expertium commented on September 24, 2024

I tried git clone https://github.com/open-spaced-repetition/fsrs-benchmark/tree/Feat/new-dataset.git and got an error saying that the repository was not found.

L-M-Sherlock commented on September 24, 2024

I got an error saying that the repository was not found.

You need to clone the fsrs-benchmark repo and check out the Feat/new-dataset branch.

https://www.atlassian.com/git/tutorials/using-branches/git-checkout

Expertium commented on September 24, 2024

I cloned it some time ago, but I can't figure out the checkout stuff.

L-M-Sherlock commented on September 24, 2024

I recommend installing GitHub Desktop (https://desktop.github.com/), where you can check out the branch conveniently:

[image]

Expertium commented on September 24, 2024

[image]
Ok, what next? How do I run the benchmark?

L-M-Sherlock commented on September 24, 2024

For example:

MODEL=SM2 python other.py

Expertium commented on September 24, 2024

Sorry, but I'm afraid you will have to be very patient with me. Explain the setup process to me as if I were a 45-year-old housewife.

  1. Where do I unzip the dataset after downloading it? I don't see a folder for it.
  2. I don't see the new-dataset branch on my PC. Do I need to clone it somehow? If so, how do I do that using GitHub Desktop?
  3. Where do I input the code you gave above? I tried this:
    [image]
    But that doesn't seem to work.

L-M-Sherlock commented on September 24, 2024
  1. Where do I unzip the dataset after downloading it? I don't see a folder for it.

You need to unzip dataset.7z and move the resulting dataset directory into fsrs-benchmark/.

2. I don't see the new-dataset branch on my PC.

Could you expand this menu?

[image]

3. Where do I input the code you gave above? I tried this:

You need to open a terminal, cd into fsrs-benchmark, and then enter the command.

Expertium commented on September 24, 2024

[image]
As far as I can tell, switching to a different branch doesn't affect any files on my PC.

Expertium commented on September 24, 2024

You need to open a terminal, and cd fsrs-benchmark, and then you can input the command.

I get a "'MODEL' is not recognized as an internal or external command, operable program or batch file." error. It still runs, though.

Expertium commented on September 24, 2024

And I'm getting the same issue I had before, with graphs popping up.
[image]
Unfortunately, I don't remember what code in which file was responsible for this.
EDIT: it's utils.py, I fixed it.
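
If the pop-ups come from plt.show() calls (an assumption; the relevant lines of utils.py are not shown in this thread), an alternative to commenting them out is to select matplotlib's non-interactive Agg backend, which never opens a window. matplotlib reads the MPLBACKEND environment variable when it is first imported, so it has to be set before that import; on Windows, running set MPLBACKEND=Agg in the same terminal beforehand has the same effect.

```python
import os

# matplotlib reads MPLBACKEND when it is first imported, so this must run
# before utils.py (or anything else) imports matplotlib.pyplot.
os.environ["MPLBACKEND"] = "Agg"  # Agg renders to memory/files only, never to a window

try:
    import matplotlib

    print("active backend:", matplotlib.get_backend())
except ImportError:
    print("matplotlib is not installed; nothing to configure")
```

With Agg active, plt.show() opens no window (recent matplotlib versions print a warning instead), while fig.savefig() keeps working.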

L-M-Sherlock commented on September 24, 2024

I get a "'MODEL' is not recognized as an internal or external command

Sorry, I forgot that you use Windows. It should be set MODEL=SM2. You can replace SM2 with LSTM, FSRSv3, or HLR.
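
The syntax for passing an environment variable differs between bash/zsh (MODEL=SM2 python other.py) and Windows cmd (set MODEL=SM2). One way to sidestep the difference is a small Python launcher that hands MODEL to other.py through subprocess. This is a hypothetical helper, not part of fsrs-benchmark; it only assumes, as the commands in this thread imply, that other.py reads the model name from the MODEL environment variable.

```python
import os
import subprocess
import sys

# Model names other.py accepts, per the comment above.
MODELS = ("SM2", "LSTM", "FSRSv3", "HLR")


def model_env(model):
    """Copy the current environment and set MODEL for the child process."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    env = dict(os.environ)
    env["MODEL"] = model
    return env


def run_other(model):
    """Run other.py with MODEL set, on any OS; raises CalledProcessError on failure."""
    subprocess.run([sys.executable, "other.py"], env=model_env(model), check=True)
```

Called from a script placed in the fsrs-benchmark folder, run_other("LSTM") behaves the same on Windows, macOS and Linux, and sys.executable ensures the same Python interpreter is used.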

L-M-Sherlock commented on September 24, 2024

switching to a different branch doesn't affect any files on my PC.

It does. It can change the contents of script.py and other.py.

Expertium commented on September 24, 2024

switching to a different branch doesn't affect any files on my PC.

It does. It can change the contents of script.py and other.py.

My bad. I tried switching back and forth and checking whether "Date modified" of script.py and other.py changes, and yes, it does.

It should be set MODEL=SM2

Thank you, that works. But I see stuff like this:
[image]

Even though there are no files with that name, or any .tsv files for that matter.

Expertium commented on September 24, 2024

EDIT: it's utils.py, I fixed it.

Strange. Last time I fixed that issue by editing utils.py, but not this time.

L-M-Sherlock commented on September 24, 2024

Did you remove the old dataset?

Expertium commented on September 24, 2024

The fsrs-benchmark folder only has the new dataset.

Expertium commented on September 24, 2024

Ah, I see the problem. I have two fsrs-benchmark folders, the previous clone and the new one. Though that still doesn't explain the problem with graphs popping up, since I commented out graphing in both utils.py files.
Ok, I'll leave the benchmarking to you. This is giving me a headache.

Expertium commented on September 24, 2024

And please don't forget #14 (comment)

Expertium commented on September 24, 2024

Thank you! Will you update the results with values weighted by reviews and ln(reviews)?

Expertium commented on September 24, 2024

Also, it says there are around 700 million reviews, not 1.5 billion. Why?

L-M-Sherlock commented on September 24, 2024

Also, it says there are around 700 million reviews, not 1.5 billion. Why?

The 1.5 billion figure includes short-term reviews.
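
The gap between the two totals comes from filtering out short-term reviews. A minimal sketch of the idea, assuming "short-term" means a review on the same calendar day as the card's previous review; Anki uses its own day boundary (the rollover hour), so this is a simplification, and the benchmark's actual rule may differ.

```python
from datetime import datetime


def long_term_reviews(review_times):
    """Drop reviews that fall on the same calendar day as the card's previous review."""
    kept = []
    prev = None
    for t in sorted(review_times):
        # keep the first review, and any review at least one calendar day after the previous one
        if prev is None or (t.date() - prev.date()).days >= 1:
            kept.append(t)
        prev = t
    return kept


card_reviews = [  # hypothetical review timestamps for one card
    datetime(2023, 11, 1, 9, 0),
    datetime(2023, 11, 1, 9, 10),  # same-day re-review: short-term
    datetime(2023, 11, 2, 8, 30),
    datetime(2023, 11, 6, 20, 0),
]
print(len(card_reviews), "->", len(long_term_reviews(card_reviews)))
```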

Thank you! Will you update the results with values weighted by reviews and ln(reviews)?

I use median by user.

Expertium commented on September 24, 2024

Thank you! Will you update the results with values weighted by reviews and ln(reviews)?

I use median by user.

I meant this:
[image]

L-M-Sherlock commented on September 24, 2024

I meant this:

The above metrics are weighted by ln(number of reviews).

Expertium commented on September 24, 2024

Then for the sake of consistency, include weighting by n_reviews too, to make everything the same as in the old benchmark.
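
For reference, here is a minimal sketch of the two weighting schemes discussed above: a per-user metric averaged with weights equal to each user's number of reviews, or to its natural logarithm. Log weighting keeps a few very large collections from dominating the average. The function name and numbers are made up for illustration.

```python
import math


def weighted_mean(values, n_reviews, scheme="n"):
    """Average per-user metric values, weighted by n_reviews or ln(n_reviews)."""
    if scheme == "n":
        weights = [float(n) for n in n_reviews]
    elif scheme == "ln":
        weights = [math.log(n) for n in n_reviews]  # a user with 1 review gets weight 0
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


rmse_per_user = [0.30, 0.35, 0.40]  # hypothetical per-user RMSE values
reviews_per_user = [100_000, 5_000, 1_000]
print(round(weighted_mean(rmse_per_user, reviews_per_user, "n"), 4))
print(round(weighted_mean(rmse_per_user, reviews_per_user, "ln"), 4))
```

Because large collections usually get lower error, weighting by n_reviews tends to produce lower averages than weighting by ln(n_reviews), which matches the pattern in the two tables of results.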

Expertium commented on September 24, 2024

The same goes for all other algorithms too. Speaking of which, I would love to see a dry run with the new default parameters.

Expertium commented on September 24, 2024

I've been trying to run the benchmark myself, and this is the new error I get.
Commands:

cd fsrs-benchmark
set DEV_MODE=1 && set MODEL=SM2 && python other.py

Error:

Traceback (most recent call last):
  File "C:\Users\Andrew\fsrs-benchmark\other.py", line 674, in <module>
    if file.stem in map(lambda x: x.stem, Path(f"result/{model}").iterdir()):
  File "C:\Users\Andrew\AppData\Local\Programs\Python\Python310\lib\pathlib.py", line 1017, in iterdir
    for name in self._accessor.listdir(self):
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'result\\SM2 '

C:\Users\Andrew\fsrs-benchmark\result\SM2 is a valid path, so I don't know what the problem is.
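
The traceback hints at the cause: the path it reports is 'result\\SM2 ', with a trailing space. In Windows cmd, set MODEL=SM2 && ... stores everything up to the &&, including the space before it, so MODEL becomes "SM2 ". Writing set "MODEL=SM2" && ... avoids this on the shell side. On the Python side, a defensive sketch (hypothetical; it assumes the script builds result/<model> from the MODEL variable, as the traceback suggests) is to strip the value before using it:

```python
import os
from pathlib import Path


def model_name():
    """Read MODEL, dropping stray whitespace such as the trailing space
    that `set MODEL=SM2 && ...` leaves in cmd."""
    name = os.environ.get("MODEL", "").strip()
    if not name:
        raise RuntimeError("MODEL environment variable is not set")
    return name


# Reproduce what cmd stored for `set MODEL=SM2 && python other.py`
os.environ["MODEL"] = "SM2 "
print(repr(os.environ["MODEL"]), "->", Path("result") / model_name())
```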

Expertium commented on September 24, 2024

I did manage to make script.py work, but not other.py.
Well, at least it means I can benchmark my ideas on the new dataset in the future.

Expertium commented on September 24, 2024

Please add FSRS v4 (dry run) to the benchmark with the new dataset too.

L-M-Sherlock commented on September 24, 2024

Please add FSRS v4 (dry run) to the benchmark with the new dataset too.

Which initial parameters should the dry run use?

Expertium commented on September 24, 2024

These: #14 (comment)

Also, share them with Dae so that he will add them to Anki 23.12.

Expertium commented on September 24, 2024

Actually, I'm not sure which parameters built-in FSRS should use
[image]

L-M-Sherlock commented on September 24, 2024

The dry run results are out.

Total number of users: 19995

Total number of reviews for evaluation: 738,221,150

Weighted by number of reviews

Algorithm               Log Loss   RMSE     RMSE(bins)
FSRS v4                 0.3360     0.3002   0.0590
FSRS rs                 0.3404     0.3018   0.0628
LSTM                    0.4000     0.3123   0.0859
FSRS v4 (unoptimized)   0.3677     0.3128   0.0886
FSRS v3                 0.4286     0.3226   0.1074
SM2                     0.6339     0.3592   0.1799
HLR                     0.8175     0.3790   0.2094

Weighted by ln(number of reviews)

Algorithm               Log Loss   RMSE     RMSE(bins)
FSRS v4                 0.3658     0.3141   0.0820
FSRS rs                 0.3705     0.3161   0.0860
FSRS v4 (unoptimized)   0.4023     0.3313   0.1155
FSRS v3                 0.5201     0.3471   0.1425
LSTM                    0.5645     0.3562   0.1536
SM2                     0.8702     0.3954   0.2240
HLR                     2.3669     0.5393   0.4096

However, the new median weights will schedule a longer interval (4 days) for a new card rated Good on its first review. I'm afraid that this will cause more confusion.
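
The 4-day figure follows from the FSRS v4 formulas. A new card's initial stability after its first rating is w[rating - 1], so Good (rating 3) gives w[2]. The interval for desired retention r is I = 9 * S * (1/r - 1), obtained by solving R(t, S) = (1 + t / (9S))^-1 = r for t, and it equals S when r = 0.9. With the weights listed earlier in this thread (w[2] = 4.4659), the first Good interval rounds to 4 days. A minimal sketch, ignoring interval fuzz and the maximum-interval limit:

```python
def initial_stability(w, rating):
    """FSRS v4: initial stability is w[rating - 1] (1=Again, 2=Hard, 3=Good, 4=Easy)."""
    return w[rating - 1]


def next_interval(stability, desired_retention=0.9):
    """FSRS v4 interval: solve (1 + t / (9S))^-1 = r for t, rounded, at least 1 day."""
    interval = 9 * stability * (1 / desired_retention - 1)
    return max(1, round(interval))


w = [0.6774, 1.6889, 4.4659, 15.0454, 4.9454, 1.0627, 0.8762, 0.0464, 1.5597,
     0.1358, 0.994, 2.1661, 0.0681, 0.3403, 1.2845, 0.2573, 2.7212]
print("Good ->", next_interval(initial_stability(w, 3)), "days")
```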

Expertium commented on September 24, 2024

I agree with user1823. Built-in FSRS should use the new, more accurate default parameters.

Expertium commented on September 24, 2024

Btw, I don't think there are any use cases for the old (small) dataset now, so I suggest making the new-dataset branch the main branch.

L-M-Sherlock commented on September 24, 2024

[images: distribution graphs of weight0, weight1, weight2, weight3]
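
The median weights mentioned earlier can be obtained by taking, presumably across users, the median of each optimized parameter separately. A minimal sketch with made-up per-user parameter vectors, showing only the first four parameters (the initial stabilities):

```python
from statistics import median


def median_parameters(per_user_params):
    """Element-wise median of equally long per-user parameter lists."""
    return [median(values) for values in zip(*per_user_params)]


users = [  # hypothetical w[0..3] for three users
    [0.5, 1.2, 3.9, 12.0],
    [0.7, 1.8, 4.5, 16.0],
    [0.9, 2.1, 5.0, 20.0],
]
print(median_parameters(users))
```

The median is robust to the long tails that such parameter distributions can have, which a mean would not be.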
