Some questions on the N-step ReplayBuffer about rainbow-is-all-you-need (CLOSED)

curt-park commented on July 3, 2024

Some questions on the N-step ReplayBuffer

from rainbow-is-all-you-need.

Comments (4)

MrSyee commented on July 3, 2024 1

Hi, @ty2000!

Thank you for your feedback. Here are my answers to your questions.

When the episode ends in fewer than n steps, for example after just 2 steps, _get_n_step_info() returns done as True. The reward stored in the n-step buffer is then R_1 + gamma * R_2, and the bootstrap term gamma * next_q_value * mask is 0 in _compute_dqn_loss() because mask is (1 - done). The target is therefore just the reward, that is $R_1 + \gamma R_2$. For this reason, it doesn't matter that the discount is gamma ** self.n_step.
You are right about 2.1. For the same index, the current obs must be the same in the 1-step buffer and the n-step buffer. @Curt-Park We'll consider how to modify this part. However, it is enough for the 1-step buffer and the n-step buffer to be synchronized only at the current obs, because what we want to update is the Q value of the current obs.
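
The masking argument above can be sketched in plain Python. This is only an illustration under stated assumptions, not the repository's exact code: transitions are assumed to be (obs, action, reward, next_obs, done) tuples, the window is assumed not to be cleared when an episode ends, and the names get_n_step_info and n_step_target are hypothetical stand-ins for _get_n_step_info() and the target computation in _compute_dqn_loss().

```python
from collections import deque


def get_n_step_info(n_step_buffer, gamma):
    """Collapse a window of transitions into (reward, next_obs, done)."""
    # Start from the most recent transition in the window.
    reward, next_obs, done = n_step_buffer[-1][-3:]
    # Walk backward through the earlier transitions.
    for transition in reversed(list(n_step_buffer)[:-1]):
        r, n_o, d = transition[-3:]
        # The (1 - d) factor drops every reward that comes after a terminal step.
        reward = r + gamma * reward * (1 - d)
        if d:
            # An earlier terminal step overrides next_obs and done.
            next_obs, done = n_o, d
    return reward, next_obs, done


def n_step_target(reward, next_q_value, done, gamma, n_step):
    """n-step target; the bootstrap term vanishes when done is True."""
    mask = 1 - done
    return reward + (gamma ** n_step) * next_q_value * mask


gamma, n_step = 0.9, 3
window = deque(maxlen=n_step)
window.append(("s0", 0, 1.0, "s1", False))
window.append(("s1", 0, 2.0, "s2", True))   # episode ends after 2 steps
window.append(("r0", 0, 5.0, "r1", False))  # first step of the next episode (assumed)

reward, next_obs, done = get_n_step_info(window, gamma)
target = n_step_target(reward, next_q_value=10.0, done=done, gamma=gamma, n_step=n_step)
print(reward, next_obs, done, target)
```

Because done is True, the target equals the accumulated reward regardless of the exponent applied to gamma in the bootstrap term.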

Thanks.


MrSyee commented on July 3, 2024 1

@Curt-Park I've checked this PR and merged it. Thank you for your prompt action!

@ty2000 You're right. We fixed this problem. Can you check this code again? Thanks.


ty2000 commented on July 3, 2024

Thanks for the clarification. You are right on questions 1 and 2.1; I missed the mask part.

As for 2.1, I also think that ideally the obs in the 1-step and n-step buffers should be the same. However, the current implementation actually has the same next_obs instead of the same obs. Maybe when sampling from the n-step buffer using the index, the index needs to be shifted backward to retrieve the correct obs. Alternatively, the n-step buffer's store method could return self.n_step_buffer[0], to be saved into the 1-step buffer, so that the two buffers stay synchronized.
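
A rough sketch of the second suggestion, where store returns self.n_step_buffer[0] so that the 1-step buffer saves the transition matching the n-step entry at the same index. Everything here is a simplified assumption for illustration: the NStepBuffer class, its list-backed storage attribute, and the fill_buffers helper are hypothetical and do not reproduce the repository's ReplayBuffer.

```python
from collections import deque


class NStepBuffer:
    """Toy n-step buffer (hypothetical, list-backed for brevity)."""

    def __init__(self, n_step, gamma):
        self.n_step = n_step
        self.gamma = gamma
        self.n_step_buffer = deque(maxlen=n_step)
        self.storage = []

    def _get_n_step_info(self):
        # Fold the window backward into an n-step reward, next_obs and done.
        reward, next_obs, done = self.n_step_buffer[-1][-3:]
        for transition in reversed(list(self.n_step_buffer)[:-1]):
            r, n_o, d = transition[-3:]
            reward = r + self.gamma * reward * (1 - d)
            if d:
                next_obs, done = n_o, d
        return reward, next_obs, done

    def store(self, obs, act, rew, next_obs, done):
        self.n_step_buffer.append((obs, act, rew, next_obs, done))
        if len(self.n_step_buffer) < self.n_step:
            return None  # window not full yet, nothing is stored
        rew_n, next_obs_n, done_n = self._get_n_step_info()
        first_obs, first_act = self.n_step_buffer[0][:2]
        self.storage.append((first_obs, first_act, rew_n, next_obs_n, done_n))
        # Hand back the oldest 1-step transition so the caller can store it
        # in the 1-step buffer at the same index.
        return self.n_step_buffer[0]


def fill_buffers(transitions, n_step, gamma):
    """Feed transitions through the n-step buffer and mirror them 1-step."""
    n_buf = NStepBuffer(n_step, gamma)
    one_step = []
    for transition in transitions:
        first = n_buf.store(*transition)
        if first is not None:
            one_step.append(first)
    return one_step, n_buf.storage


transitions = [(i, 0, 1.0, i + 1, False) for i in range(5)]
one_step, n_step_data = fill_buffers(transitions, n_step=3, gamma=0.9)
for idx, (a, b) in enumerate(zip(one_step, n_step_data)):
    print(idx, "obs:", a[0], b[0], "next_obs:", a[3], b[3])
```

With this arrangement the two buffers share obs at every index, while next_obs differs by n - 1 steps, which is the intended relationship between a 1-step and an n-step transition.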
