Hi, Thanks so much for maintaining this very easy-to-use library! I wanted to

[BUG] current overwriting of transitions in the buffer causes problems in to_mdp_dataset() about d3rlpy HOT 6 OPEN

gunshi commented on June 2, 2024

[BUG] current overwriting of transitions in the buffer causes problems in to_mdp_dataset()

from d3rlpy.

Comments (6)

takuseno commented on June 2, 2024

@gunshi Hi, thanks for reporting this. Actually, that is by design, to support frame stacking of those transitions: we need to keep those pointers to get previous frames. It's not functionally wrong, but I understand that it might change the number of transitions when calling to_mdp_dataset().

gunshi commented on June 2, 2024

Hi @takuseno, I see.
I'm not sure about the functional correctness, though: when self._cursor loops around after the buffer overflows and transitions are overwritten, the episode boundaries should reflect that.
In this case, however, to_mdp_dataset will generate the wrong episodes: at least one episode will be composed of transitions from two different, unrelated episodes, simply because the oldest remaining (non-overwritten) transition of one episode still maintains a link to the previous transition, which is in fact the newest transition of a new episode.
If one uses RL algorithms that only need 1-step transitions to compute their objectives, then there is no issue here. However, if one samples n-step quantities or even trajectories from this buffer, then they won't make sense for this episode, right?

Thanks for your engagement!
Gunshi

gunshi commented on June 2, 2024

PS: As an example, suppose I use a buffer of size 10000 and wait for the buffer to get full. Then, after every 2000 steps of online RL, I check the number and lengths of the episodes in the dataset derived from the buffer with the following code:
dataset = buffer.to_mdp_dataset()
print([len(episode) for episode in dataset])
The sum of the lengths of these episodes should be equal to 10000, the size of the buffer.
In this case, however, I get an ever-increasing list that is not consistent with the size of the buffer. A partially overwritten episode, whose length should ideally be less than its original length (when it was first written), will still include the transitions from the "overwriting" episode, and thus include their lengths in its own total length.
I'm not sure if my explanation was clear, and I can try to get a minimal reproducible example for this if that helps.
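In the meantime, here is a small toy model of the effect, not d3rlpy's actual code. It assumes that episodes are rebuilt by following previous-transition links back from each stored tail, and that overwriting a slot does not cut those links. The names `Transition`, `ToyRingBuffer`, `cut_links`, and `run` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


# eq=False / repr=False avoid recursion through the prev/next links.
@dataclass(eq=False, repr=False)
class Transition:
    episode_id: int
    step: int
    prev: Optional["Transition"] = None
    next: Optional["Transition"] = None


class ToyRingBuffer:
    """Hypothetical fixed-size buffer that overwrites its oldest slot."""

    def __init__(self, capacity: int, cut_links: bool) -> None:
        self.capacity = capacity
        self.cut_links = cut_links
        self.slots: List[Optional[Transition]] = [None] * capacity
        self.cursor = 0

    def append(self, transition: Transition) -> None:
        evicted = self.slots[self.cursor]
        if self.cut_links and evicted is not None and evicted.next is not None:
            # Assumed fix: the survivor becomes the start of its episode.
            evicted.next.prev = None
        self.slots[self.cursor] = transition
        self.cursor = (self.cursor + 1) % self.capacity

    def episode_lengths(self) -> List[int]:
        lengths = []
        for tail in self.slots:
            # A tail is a stored transition without a successor.
            if tail is None or tail.next is not None:
                continue
            count, node = 0, tail
            while node is not None:  # walk back through prev links
                count += 1
                node = node.prev
            lengths.append(count)
        return lengths


def run(cut_links: bool) -> List[int]:
    """Write 5 episodes of 4 steps each into a buffer of capacity 10."""
    buffer = ToyRingBuffer(capacity=10, cut_links=cut_links)
    for episode_id in range(5):
        prev = None
        for step in range(4):
            transition = Transition(episode_id, step, prev=prev)
            if prev is not None:
                prev.next = transition
            buffer.append(transition)
            prev = transition
    return buffer.episode_lengths()


print(sum(run(cut_links=False)))  # 12, more than the capacity of 10
print(sum(run(cut_links=True)))   # 10, equal to the capacity
```

With the links left in place, the partially overwritten episode is still reported with its original length of 4, so the total exceeds the buffer size. Cutting the link at the point of overwrite shortens that episode to the 2 transitions that are actually still stored.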

gunshi commented on June 2, 2024

hey @takuseno,
I added a fix here: #269
This is not meant to be merged, but rather just to get your review on whether the fix makes sense or is still wrong in some sense.
Could you let me know if this looks ok?
Thanks!

takuseno commented on June 2, 2024

@gunshi Thanks for the PR! As long as you're not doing frame stacking, it looks good to me. I'll keep it open until we figure out how to incorporate it with frame stacking logic.

gunshi commented on June 2, 2024

Hey! I'm not sure why it would be wrong in the context of frame stacking. At the point of discontinuity, the first/oldest remaining transition of the episode being overwritten will just have its observation padded with zeros, right? (Just like any start-of-the-episode transition in a dataset of episodes of transitions.)
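To make the question concrete, here is a rough sketch of what I have in mind. It is not d3rlpy's stacking code; it assumes stacking simply walks the previous-transition links and fills missing frames with zeros. `Frame` and `stack_frames` are made-up names for illustration.

```python
from typing import List, Optional, Sequence


class Frame:
    """Hypothetical transition holding one observation and a prev link."""

    def __init__(self, observation: Sequence[float],
                 prev: Optional["Frame"] = None) -> None:
        self.observation = list(observation)
        self.prev = prev


def stack_frames(last: Frame, n_frames: int) -> List[List[float]]:
    """Collect the last n_frames observations, oldest first.

    Frames that do not exist (prev is None) are replaced by zeros.
    """
    zeros = [0.0] * len(last.observation)
    stacked: List[List[float]] = []
    node: Optional[Frame] = last
    for _ in range(n_frames):
        if node is None:
            stacked.append(list(zeros))
        else:
            stacked.append(list(node.observation))
            node = node.prev
    stacked.reverse()
    return stacked


f0 = Frame([1.0])
f1 = Frame([2.0], prev=f0)
f2 = Frame([3.0], prev=f1)

print(stack_frames(f2, 3))  # [[1.0], [2.0], [3.0]]

# Simulate f0 being overwritten and the link to it being cut.
f1.prev = None
print(stack_frames(f2, 3))  # [[0.0], [2.0], [3.0]]
```

Under that assumption, cutting the link makes the oldest surviving transition behave exactly like the first transition of an episode.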

I want to make sure it's correct as I'm applying this fix to my local repo as well and hope I haven't introduced a bug with this fix.
Best,
Gunshi
