Comments (4)
Hi, @ty2000!
Thank you for your feedback. Here are the answers to your questions.
- When the episode ends in fewer than n steps, for example after just 2 steps, `_get_n_step_info()` returns done as True. The reward stored in the n-step buffer is then $R_1 + \gamma R_2$, and the term gamma * next_q_value * mask in `_compute_dqn_loss()` is 0 because mask is (1 - done). The target is therefore just the reward, that is $R_1 + \gamma R_2$. For this reason, it doesn't matter that the discount is gamma ** self.n_step.
- What you said in 2.1 is right: for the same index, the current obs must be the same in the 1-step buffer and the n-step buffer. @Curt-Park We'll consider how to modify this part. However, it is enough for the 1-step buffer and the n-step buffer to be synchronized only at the current obs, because what we want to update is the Q value of the current obs.
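To make the masking argument in the first point concrete, here is a minimal sketch. It is not the repository's code: `fold_n_step` and `td_target` are hypothetical stand-ins for `_get_n_step_info()` and the target computation in `_compute_dqn_loss()`, and the transition layout `(reward, next_obs, done)` is an assumption made for illustration.

```python
def fold_n_step(transitions, gamma):
    """Fold n (reward, next_obs, done) tuples into one n-step transition.

    Walks backward from the last transition. When a terminal step is
    found, everything after it is discarded and its next_obs/done win.
    """
    reward, next_obs, done = transitions[-1]
    for r, n_o, d in reversed(transitions[:-1]):
        reward = r + gamma * reward * (1 - d)
        if d:
            next_obs, done = n_o, d
    return reward, next_obs, done


def td_target(reward, next_q_value, done, gamma, n_step):
    """n-step target; the bootstrap term vanishes when done is True."""
    mask = 1 - done
    return reward + (gamma ** n_step) * next_q_value * mask


# n = 3, but the episode ends at the second step.
gamma = 0.9
window = [(1.0, "s1", False), (2.0, "s2", True), (5.0, "s3", False)]
reward, next_obs, done = fold_n_step(window, gamma)
target = td_target(reward, next_q_value=100.0, done=done, gamma=gamma, n_step=3)
```

Here `reward` comes out as $R_1 + \gamma R_2 = 1 + 0.9 \cdot 2$, and because `done` is True the bootstrap term is multiplied by 0, so the exponent applied to gamma has no effect on the target.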
Thanks.
from rainbow-is-all-you-need.
@Curt-Park I've checked this PR and merged it. Thank you for your prompt action!
@ty2000 You're right. We fixed this problem. Can you check this code again? Thanks.
Thanks for the clarification. You are right on questions 1 and 2.1; I missed the mask part.
As for 2.1, I also think that ideally the obs in the 1-step and n-step buffers should be the same. However, the current implementation actually has the same next_obs instead of the same obs. Maybe the index needs to be shifted backward when sampling from the n-step buffer so that it retrieves the correct obs, or maybe the n-step buffer's store method could return self.n_step_buffer[0], to be saved into the 1-step buffer, so that the two buffers are synchronized.
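The second idea could look roughly like the sketch below. It is only an illustration under assumptions, not the repository's implementation: the class `NStepBuffer`, its `storage` list, and the plain list used as the 1-step buffer are all hypothetical simplifications.

```python
from collections import deque


class NStepBuffer:
    """Toy n-step buffer whose store() returns the oldest raw transition."""

    def __init__(self, n_step, gamma):
        self.n_step = n_step
        self.gamma = gamma
        self.n_step_buffer = deque(maxlen=n_step)
        self.storage = []  # folded n-step transitions

    def store(self, obs, act, rew, next_obs, done):
        self.n_step_buffer.append((obs, act, rew, next_obs, done))
        if len(self.n_step_buffer) < self.n_step:
            return ()  # window not full yet, nothing to store

        # Fold the window into one n-step transition.
        rew_n, next_obs_n, done_n = self.n_step_buffer[-1][-3:]
        for _, _, r, n_o, d in reversed(list(self.n_step_buffer)[:-1]):
            rew_n = r + self.gamma * rew_n * (1 - d)
            if d:
                next_obs_n, done_n = n_o, d

        first_obs, first_act = self.n_step_buffer[0][:2]
        self.storage.append((first_obs, first_act, rew_n, next_obs_n, done_n))
        # Return the oldest 1-step transition so the caller can save it
        # at the same index in the 1-step buffer.
        return self.n_step_buffer[0]


n_buf = NStepBuffer(n_step=3, gamma=0.5)
one_step = []  # stand-in for the 1-step buffer
for t in range(5):
    first = n_buf.store(obs=t, act=0, rew=1.0, next_obs=t + 1, done=False)
    if first:
        one_step.append(first)

# Same index now refers to the same current obs in both buffers.
synced = all(a[0] == b[0] for a, b in zip(one_step, n_buf.storage))
```

With this arrangement the two buffers share the current obs at each index, while the next_obs differs (one step ahead versus n steps ahead), which is the behaviour discussed in 2.1.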
@MrSyee Please review the pull-request.