Revlogs parsing, about open-spaced-repetition/srs-benchmark

L-M-Sherlock commented on July 28, 2024

Thanks for this report. I will fix it soon.

from srs-benchmark.

imrryr commented on July 28, 2024

@L-M-Sherlock

L-M-Sherlock commented on July 28, 2024

I get the correct result from your code. It seems to be an environment problem.

L-M-Sherlock commented on July 28, 2024

I find that only review_th is inconsistent with my result. It doesn't matter. The order is correct.

imrryr commented on July 28, 2024

Unfortunately, it does matter for my analysis. I need review_th to be right in order to sort the entire file, and the delta_t column was different as well. I really need to figure out what is going on in my environment.

L-M-Sherlock commented on July 28, 2024

The review_th is calculated here:

https://github.com/open-spaced-repetition/fsrs-benchmark/blob/ea493cf91900d9c8fd3bd05c42518373875c799f/revlogs2dataset.py#L54

I recommend checking the pandas documentation for this function.

Documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html
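As a rough illustration of what a rank-based review_th looks like, here is a minimal sketch on made-up data. It assumes a dense rank over review_time; the exact arguments used on the linked line of revlogs2dataset.py are not reproduced here, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical review log: two cards, four reviews, times in arbitrary units.
df = pd.DataFrame(
    {
        "card_id": [0, 0, 1, 1],
        "review_time": [300, 100, 200, 200],
    }
)

# Assumption: review_th is the dense rank of review_time across the whole log.
# Ties share a rank and no ranks are skipped.
df["review_th"] = df["review_time"].rank(method="dense").astype("int64")

print(df)
```

Because the rank depends only on the ordering of review_time, any corruption of that column changes review_th as well.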

I can't be of much help here because I can't reproduce the bug.

For debugging, it helps to store the intermediate results during the conversion. You can save the df to a CSV file after each step and then compare the files to locate the bug.
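One way to follow this suggestion is a small helper that writes the DataFrame to a CSV file after each step. This is only a sketch: the helper name checkpoint, the step names, and the sample data are all hypothetical, and the real conversion steps in revlogs2dataset.py are not reproduced.

```python
import tempfile
from pathlib import Path

import pandas as pd


def checkpoint(df: pd.DataFrame, step: str, out_dir: Path) -> Path:
    """Write df to <out_dir>/<step>.csv and print its dtypes (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{step}.csv"
    df.to_csv(path, index=False)
    # CSV files do not record dtypes, so print them separately.
    print(step, {col: str(dtype) for col, dtype in df.dtypes.items()})
    return path


out_dir = Path(tempfile.mkdtemp())

# Hypothetical stand-in for the parsed revlog.
df = pd.DataFrame({"card_id": [0, 0, 1], "review_time": [300, 100, 200]})
p1 = checkpoint(df, "01_loaded", out_dir)

# Hypothetical second step: add a dense rank over review_time.
df["review_th"] = df["review_time"].rank(method="dense").astype("int64")
p2 = checkpoint(df, "02_ranked", out_dir)
```

Printing the dtypes alongside each file is a deliberate choice: a silent dtype change between two steps will not show up in the CSV itself.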

imrryr commented on July 28, 2024

OK, that makes sense. It appears the densification is working but the original times coming from the revlog are different. Did you test with the revlog I provided? Not your original?

imrryr commented on July 28, 2024

Is there any chance the stats_pb2 file is the wrong version? I got it from @dae's package, and it was needed to run your file. @L-M-Sherlock

L-M-Sherlock commented on July 28, 2024

I also got that from dae. Could you show some cases where the review time differs?

imrryr commented on July 28, 2024

Here is an output before dropping the rows:

      review_time  card_id  rating  review_state  is_learn_start  sequence_group  last_learn_start  mask  relative_day  delta_t   i  review_th
0        97218963        0       3             0            True               1                 1  True        -19683       -1   1       4863
1        97224667        0       3             0           False               1                 1  True        -19683        0   2       4864
2       440742459        0       3             1           False               1                 1  True        -19679        4   3       4997
3       933416194        0       4             1           False               1                 1  True        -19674        5   4       5846
4      1046892324        0       2             3           False               1                 1  True        -19672        2   5       6105
...           ...      ...     ...           ...             ...             ...               ...   ...           ...      ...  ..        ...
7070  -1726999624      645       3             0           False             620               620  True        -19705        0   2       1367
7071  -1726339624      645       3             3           False             620               620  True        -19705        0   3       1380
7072  -1697912624      645       3             1           False             620               620  True        -19704        1   4       1639
7073  -1659497624      645       3             3           False             620               620  True        -19704        0   5       1959
7074  -1637230624      645       3             3           False             620               620  True        -19704        0   6       2077

[6966 rows x 12 columns]

      card_id  review_th  delta_t  rating
0           0       4863       -1       3
1           0       4864        0       3
2           0       4997        4       3
3           0       5846        5       4
4           0       6105        2       2
...       ...        ...      ...     ...
7070      645       1367        0       3
7071      645       1380        0       3
7072      645       1639        1       3
7073      645       1959        0       3
7074      645       2077        0       3

L-M-Sherlock commented on July 28, 2024

It's weird that the review_time is negative.

https://github.com/open-spaced-repetition/fsrs-benchmark/blob/ea493cf91900d9c8fd3bd05c42518373875c799f/revlogs2dataset.py#L31

Could you check whether the values are still correct after this line?

imrryr commented on July 28, 2024

Ha! My environment demoted the int64 to int32 here, which corrupted the values. Problem solved. @L-M-Sherlock

The original line was:
df["review_time"] = df["review_time"].astype(int)
I fixed it with:
df["review_time"] = df["review_time"].astype("int64")
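The effect can be reproduced with a short sketch. It assumes review_time holds millisecond timestamps that do not fit in 32 bits; the sample values are hypothetical. In some environments (as far as I know, NumPy 1.x on Windows is one of them) astype(int) resolves to a 32-bit integer, so the cast below forces int32 explicitly to show the same wrap-around on any platform.

```python
import numpy as np
import pandas as pd

# Hypothetical millisecond timestamps, far larger than the int32 maximum.
df = pd.DataFrame({"review_time": [1_700_000_000_000, 1_700_000_060_000]})

int32_max = np.iinfo(np.int32).max
print("fits in int32:", bool((df["review_time"] <= int32_max).all()))

# Forcing a 32-bit cast silently wraps the values instead of raising an error.
wrapped = df["review_time"].to_numpy().astype(np.int32)
print("after int32 cast:", wrapped)

# Naming the width explicitly keeps the values intact on every platform.
df["review_time"] = df["review_time"].astype("int64")
print("after int64 cast:", df["review_time"].tolist())
```

Spelling out "int64" instead of int removes the dependence on the platform's default integer width.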
