Comments (6)
@gunshi Hi, thanks for reporting this. Actually, this is by design, to support frame stacking of those transitions. We need to keep those pointers to get previous frames. It's not functionally wrong, but I understand that it might change the number of transitions returned by to_mdp_dataset.
from d3rlpy.
Hi @takuseno, I see.
I'm not sure about the functional correctness though, since when self._cursor loops around after the buffer overflows and the transitions are overwritten, the episode boundaries should reflect that.
In this case, however, to_mdp_dataset will generate the wrong episodes: at least one episode will be composed of transitions from two different, unrelated episodes, simply because the oldest remaining (non-overwritten) transition of one episode still maintains a link to the previous transition, which is in fact the newest transition of a new episode.
If one uses RL algorithms that only need 1-step transitions to compute their objectives, there is no issue here. However, if one samples n-step quantities or even whole trajectories from this buffer, they won't make sense for this episode, right?
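A minimal sketch of the situation I mean, assuming a toy circular buffer in which each slot stores its episode id and the slot index of the preceding transition. The names `fill` and `trace_back` are hypothetical and this is not d3rlpy's actual implementation:

```python
from typing import List, Optional, Tuple

# Toy model (an assumption, not d3rlpy's classes): each slot holds
# (episode_id, prev_slot), where prev_slot is the index of the
# preceding transition, or None at the start of an episode.
Slot = Tuple[int, Optional[int]]


def fill(capacity: int, episode_lengths: List[int]) -> List[Optional[Slot]]:
    """Write episodes into a circular buffer, overwriting the oldest slots."""
    slots: List[Optional[Slot]] = [None] * capacity
    cursor = 0
    for episode_id, length in enumerate(episode_lengths):
        prev: Optional[int] = None
        for _ in range(length):
            slots[cursor] = (episode_id, prev)  # links of survivors are untouched
            prev = cursor
            cursor = (cursor + 1) % capacity
    return slots


def trace_back(slots: List[Optional[Slot]], start: int) -> List[int]:
    """Follow prev links from `start`; return the episode ids visited."""
    visited: List[int] = []
    index: Optional[int] = start
    # The length guard avoids looping forever if no episode start survives.
    while index is not None and len(visited) < len(slots):
        episode_id, prev = slots[index]
        visited.append(episode_id)
        index = prev
    return visited


# Two episodes of 4 steps in a buffer of 6: episode 1 overwrites slots 0 and 1.
slots = fill(capacity=6, episode_lengths=[4, 4])
# Slot 3 is the last transition of episode 0.
print(trace_back(slots, start=3))  # [0, 0, 1, 1, 1, 1]
```

Tracing back from the tail of episode 0 walks straight into episode 1, because slot 2 still points at slot 1, which now holds a newer transition.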
Thanks for your engagement!
Gunshi
from d3rlpy.
PS: As an example, suppose I use a buffer of size 10000 and wait for it to fill up. Then, after every 2000 steps of online RL, I check the number and lengths of the episodes in the dataset derived from the buffer with the following code:
dataset = buffer.to_mdp_dataset()
print([len(episode) for episode in dataset])  # one length per episode
The sum of the lengths of these episodes should be equal to 10000, the size of the buffer.
In this case, however, I get an ever-growing list whose total is not consistent with the size of the buffer. A partially overwritten episode, whose length should ideally be less than its original length (when it was first written), will still include the transitions from the "overwriting" episode, and thus count their lengths in its own total length.
I'm not sure if my explanation was clear, and I can try to get a minimal reproducible example for this if that helps.
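In the meantime, here is a rough, self-contained sketch of the check, using a toy buffer rather than d3rlpy itself. `ToyBuffer`, `cut_links`, and `run` are hypothetical names, and cutting the link on overwrite is only one possible fix that I am assuming for illustration:

```python
from typing import List, Optional


class ToyBuffer:
    """Hypothetical FIFO buffer; prev/next links are slot indices."""

    def __init__(self, capacity: int, cut_links: bool) -> None:
        self.capacity = capacity
        self.cut_links = cut_links
        self.used = [False] * capacity
        self.prev: List[Optional[int]] = [None] * capacity
        self.next: List[Optional[int]] = [None] * capacity
        self.cursor = 0
        self.last: Optional[int] = None  # previous slot of the running episode

    def append(self, terminal: bool) -> None:
        slot = self.cursor
        successor = self.next[slot]
        if self.cut_links and self.used[slot] and successor is not None:
            # The survivor that followed the evicted transition
            # becomes the start of a shortened episode.
            self.prev[successor] = None
        self.used[slot] = True
        self.prev[slot] = self.last
        self.next[slot] = None
        if self.last is not None:
            self.next[self.last] = slot
        self.last = None if terminal else slot
        self.cursor = (slot + 1) % self.capacity

    def episode_lengths(self) -> List[int]:
        """Trace back from every tail (no next link) and count the steps."""
        lengths = []
        for tail in range(self.capacity):
            if not self.used[tail] or self.next[tail] is not None:
                continue
            length, index = 0, tail
            while index is not None and length < self.capacity:
                length += 1
                index = self.prev[index]
            lengths.append(length)
        return lengths


def run(cut_links: bool) -> List[int]:
    buffer = ToyBuffer(capacity=6, cut_links=cut_links)
    for _ in range(2):  # two episodes of four steps each
        for step in range(4):
            buffer.append(terminal=(step == 3))
    return buffer.episode_lengths()


print(run(cut_links=False))  # [4, 6]: sums to 10, more than the 6 slots
print(run(cut_links=True))   # [4, 2]: sums to 6, the buffer size
```

With stale links the lengths add up to more than the capacity. Once the link is cut on overwrite, they add up to exactly the buffer size.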
from d3rlpy.
hey @takuseno,
I added a fix here: #269
This is not meant to be merged, but rather just to get your review on whether the fix makes sense or is still wrong in some sense.
Could you let me know if this looks OK?
Thanks!
from d3rlpy.
@gunshi Thanks for the PR! As long as you're not doing frame stacking, it looks good to me. I'll keep it open until we figure out how to incorporate it with frame stacking logic.
from d3rlpy.
Hey! I'm not sure why it would be wrong in the context of frame stacking. At the point of discontinuity, the first (oldest remaining) transition of the episode being overwritten will just have its observation padded with zeros, right? (Just like any start-of-episode transition in a dataset of episodes of transitions.)
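To make the padding behaviour I have in mind concrete, here is a small sketch. `stack_frames` and the list-based `prev` links are hypothetical and only illustrate the idea, not how d3rlpy actually stacks frames:

```python
from typing import List, Optional


def stack_frames(
    frames: List[List[float]],
    prev: List[Optional[int]],
    index: int,
    n_frames: int,
) -> List[List[float]]:
    """Stack the n_frames most recent observations ending at `index`.

    Frames are ordered oldest first; history that is missing because the
    prev link is None is left as zeros.
    """
    zeros = [0.0] * len(frames[index])
    stacked = [list(zeros) for _ in range(n_frames)]
    cursor: Optional[int] = index
    for position in range(n_frames - 1, -1, -1):
        if cursor is None:
            break  # reached an episode start (or a cut link): keep zeros
        stacked[position] = list(frames[cursor])
        cursor = prev[cursor]
    return stacked


frames = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
# The link from slot 2 back to slot 1 has been cut (slot 1 was overwritten).
prev = [None, 0, None, 2]
print(stack_frames(frames, prev, index=3, n_frames=3))
# [[0.0, 0.0], [3.0, 3.0], [4.0, 4.0]]
```

Under this assumption, a transition right after the cut is treated exactly like the first transition of any episode.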
I want to make sure it's correct as I'm applying this fix to my local repo as well and hope I haven't introduced a bug with this fix.
Best,
Gunshi
from d3rlpy.