
Comments (6)

takuseno commented on June 2, 2024

@gunshi Hi, thanks for reporting this. Actually, this is by design, to support frame stacking of those transitions: we need to keep those pointers to retrieve previous frames. It's not functionally wrong, but I understand that it might change the number of transitions returned by to_mdp_dataset.

from d3rlpy.

gunshi commented on June 2, 2024

Hi @takuseno, I see.
I'm not sure about the functional correctness though. When self._cursor wraps around after the buffer overflows and transitions are overwritten, the episode boundaries should reflect that.
In this case, however, to_mdp_dataset will generate wrong episodes: at least one episode will be composed of transitions from two different, unrelated episodes, simply because the oldest remaining (non-overwritten) transition of one episode still maintains a link to its previous transition, which is in fact now the newest transition of a new episode.
If one uses RL algorithms that only need 1-step transitions to compute their objectives, then there is no issue here. However, if one samples n-step quantities or even whole trajectories from this buffer, they won't make sense for this episode, right?
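
A toy sketch of the scenario described above, not d3rlpy's actual code: it assumes links are stored as slot indices in a fixed-size ring buffer, and the names Slot, fill and walk_back are hypothetical. It shows how a link that is not repaired on overwrite lets a backward walk cross from one episode into another.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    episode_id: int       # which episode wrote this slot
    prev: Optional[int]   # slot index of the previous transition, None at episode start


def fill(capacity: int, episode_lengths: List[int]) -> List[Optional[Slot]]:
    """Write episodes into a ring buffer WITHOUT repairing links on overwrite."""
    slots: List[Optional[Slot]] = [None] * capacity
    cursor = 0
    for episode_id, length in enumerate(episode_lengths):
        prev = None
        for _ in range(length):
            slots[cursor] = Slot(episode_id, prev)
            prev = cursor
            cursor = (cursor + 1) % capacity
    return slots


def walk_back(slots: List[Optional[Slot]], tail: int) -> List[int]:
    """Follow prev links from a tail slot and return the episode ids, oldest first."""
    ids: List[int] = []
    index: Optional[int] = tail
    # the length guard stops the walk if stale links happen to form a cycle
    while index is not None and len(ids) < len(slots):
        ids.append(slots[index].episode_id)
        index = slots[index].prev
    return ids[::-1]


# Episode 0 fills slots 0..3; episode 1 fills slots 4, 5 and then overwrites 0 and 1.
slots = fill(capacity=6, episode_lengths=[4, 4])

# Slot 3 is the tail of episode 0, yet the walk runs into episode 1.
print(walk_back(slots, 3))   # prints [1, 1, 1, 1, 0, 0]
```

In this toy model a 1-step sample from any single slot is still valid, while anything that follows the links across slot 2 mixes the two episodes.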

Thanks for your engagement!
Gunshi


gunshi commented on June 2, 2024

PS: As an example, suppose I use a buffer of size 10000 and wait for the buffer to get full. Then, after every 2000 steps of online RL, I check the number of episodes and their lengths in the dataset derived from the buffer with the following code:
    dataset = buffer.to_mdp_dataset()
    print([len(episode) for episode in dataset])
The sum of the lengths of these episodes should equal 10000, the size of the buffer.
In this case, however, I will get an ever-increasing list that is not consistent with the size of the buffer. A partially overwritten episode, whose length should ideally be less than its original length (when it was first written), will still include the transitions from the "overwriting" episode, and thus count them in its own total length.
I'm not sure if my explanation was clear, and I can try to get a minimal reproducible example for this if that helps.
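
In the meantime, here is a toy check of the length invariant rather than a reproduction against d3rlpy itself. It again assumes index-based links in a ring buffer; the function episode_lengths and its unlink flag are hypothetical, and the repair shown is only one possible approach, not necessarily what any real patch does.

```python
from typing import List, Optional


def episode_lengths(capacity: int, lengths: List[int], unlink: bool) -> List[int]:
    """Fill a ring buffer, then recover episode lengths by walking prev links from each tail."""
    prev: List[Optional[int]] = [None] * capacity   # slot -> previous slot of the same episode
    nxt: List[Optional[int]] = [None] * capacity    # slot -> next slot of the same episode
    used = [False] * capacity
    cursor = 0
    for length in lengths:
        last: Optional[int] = None
        for _ in range(length):
            if unlink and nxt[cursor] is not None:
                # hypothetical repair: the successor of the evicted slot becomes an episode start
                prev[nxt[cursor]] = None
            prev[cursor], nxt[cursor], used[cursor] = last, None, True
            if last is not None:
                nxt[last] = cursor
            last = cursor
            cursor = (cursor + 1) % capacity

    result: List[int] = []
    for tail in range(capacity):
        if not used[tail] or nxt[tail] is not None:
            continue   # only start walking from the last transition of an episode
        count, index = 0, tail
        while index is not None and count < capacity:
            count += 1
            index = prev[index]
        result.append(count)
    return result


broken = episode_lengths(capacity=6, lengths=[4, 4], unlink=False)
repaired = episode_lengths(capacity=6, lengths=[4, 4], unlink=True)
print(broken, sum(broken))       # prints [4, 6] 10
print(repaired, sum(repaired))   # prints [4, 2] 6
```

With stale links the recovered lengths sum to more than the capacity of 6; once the link into the evicted slot is cut, they sum to exactly the capacity.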


gunshi commented on June 2, 2024

Hey @takuseno,
I added a fix here: #269
This is not meant to be merged, but rather just to get your review on whether the fix makes sense or is still wrong in some sense.
Could you let me know if this looks OK?
Thanks!


takuseno commented on June 2, 2024

@gunshi Thanks for the PR! As long as you're not doing frame stacking, it looks good to me. I'll keep it open until we figure out how to incorporate it with frame stacking logic.


gunshi commented on June 2, 2024

Hey! I'm not sure why it would be wrong in the context of frame stacking. At the point of discontinuity, the first (oldest remaining) transition of the episode being overwritten will just have its observation padded with zeros, right? (Just like any start-of-episode transition in a dataset of episodes of transitions.)
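
To make the padding argument concrete, here is a minimal sketch under my own assumptions, not d3rlpy's frame stacking code: frames are plain lists of floats, prev holds index-based links, and stack_frames is a hypothetical helper. Once the link into the overwritten region is cut, the oldest surviving transition is treated like any episode start.

```python
from typing import List, Optional


def stack_frames(
    frames: List[List[float]],
    prev: List[Optional[int]],
    index: int,
    n_frames: int,
) -> List[List[float]]:
    """Stack the frame at `index` with up to n_frames - 1 predecessors, zero-padding past the episode start."""
    zero = [0.0] * len(frames[index])
    stacked = [list(zero) for _ in range(n_frames)]
    position = n_frames - 1
    cursor: Optional[int] = index
    while cursor is not None and position >= 0:
        stacked[position] = frames[cursor]
        cursor = prev[cursor]   # None once the (possibly cut) episode start is reached
        position -= 1
    return stacked


frames = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
# Slot 2 is the oldest surviving transition of a partially overwritten episode;
# its link to slot 1 has been cut, so it now looks like an episode start.
prev = [None, 0, None, 2]

print(stack_frames(frames, prev, 2, 3))   # prints [[0.0, 0.0], [0.0, 0.0], [3.0, 3.0]]
print(stack_frames(frames, prev, 3, 3))   # prints [[0.0, 0.0], [3.0, 3.0], [4.0, 4.0]]
```

If the link from slot 2 to slot 1 were left in place, the stack for slot 2 would instead pull in frames that belong to a different episode.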

I want to make sure it's correct as I'm applying this fix to my local repo as well and hope I haven't introduced a bug with this fix.
Best,
Gunshi

