Comments (2)
Hi,
To be clear, the code has two nested ifs: the outer one checks whether the states are the same, and the inner one checks whether the actions are the same. So when both the state and the action match, the trace is set to 1 (replacing traces). When the states match but the actions differ, the trace is reset to 0. Although there are different choices for how to apply replacing traces in control, resetting the traces of the non-selected actions to zero is the approach advocated in the book:
https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node80.html
There are several possible ways to generalize replacing eligibility traces for use in control methods. Obviously, when a state is revisited and a new action is selected, the trace for that action should be reset to 1. But what of the traces for the other actions for that state? The approach recommended by Singh and Sutton (1996) is to set the traces of all the other actions from the revisited state to 0.
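A minimal sketch of the rule described above, written as standalone Java rather than BURLAP's actual code. The class name, the method name, and the nested-map trace table keyed by strings are all hypothetical and chosen only for illustration; the nested ifs mirror the state check and the action check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; this is NOT BURLAP's API or its trace data structure.
final class ReplacingTraceSketch {

    /**
     * Applies one step of replacing traces with reset to every stored trace.
     * traces.get(state).get(action) holds e(s, a).
     */
    static void updateTraces(Map<String, Map<String, Double>> traces,
                             String curState, String curAction,
                             double gamma, double lambda) {
        // Make sure the visited pair has an entry before iterating.
        traces.computeIfAbsent(curState, k -> new HashMap<>())
              .putIfAbsent(curAction, 0.0);

        for (Map.Entry<String, Map<String, Double>> byState : traces.entrySet()) {
            boolean sameState = byState.getKey().equals(curState);
            for (Map.Entry<String, Double> byAction : byState.getValue().entrySet()) {
                if (sameState) {
                    if (byAction.getKey().equals(curAction)) {
                        // Same state, same action: replace the trace with 1.
                        byAction.setValue(1.0);
                    } else {
                        // Same state, different action: reset the trace to 0.
                        byAction.setValue(0.0);
                    }
                } else {
                    // Any other state: ordinary decay by gamma * lambda.
                    byAction.setValue(gamma * lambda * byAction.getValue());
                }
            }
        }
    }
}
```

Calling `updateTraces` once per time step with the current state-action pair keeps every trace in the range [0, 1], which is the point of replacing traces as opposed to accumulating ones.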
Interestingly, the math they provide in the book is actually wrong: right after that text, the equation combines the accumulating-trace update for the same state-action pair with the reset to zero for the same state and a different action :p However, the paper they're citing is clear and much more detailed on the reasoning. That paper is here:
http://www-all.cs.umass.edu/pubs/1995_96/singh_s_ML96.pdf
See the pseudocode on page 142
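For reference, the discrepancy described above can be written out as follows. This is a sketch in the usual notation (trace $e_t(s,a)$, discount $\gamma$, trace decay $\lambda$, current pair $(s_t, a_t)$), reconstructed from the description in this comment rather than copied from either source, so check it against the paper before relying on it.

```latex
% Replacing traces with reset, as the quoted text describes them:
e_t(s,a) =
\begin{cases}
1 & \text{if } s = s_t \text{ and } a = a_t \\
0 & \text{if } s = s_t \text{ and } a \neq a_t \\
\gamma \lambda \, e_{t-1}(s,a) & \text{if } s \neq s_t
\end{cases}

% The inconsistent form described above: the first case accumulates instead of replacing.
e_t(s,a) =
\begin{cases}
1 + \gamma \lambda \, e_{t-1}(s,a) & \text{if } s = s_t \text{ and } a = a_t \\
0 & \text{if } s = s_t \text{ and } a \neq a_t \\
\gamma \lambda \, e_{t-1}(s,a) & \text{if } s \neq s_t
\end{cases}
```

The two forms differ only in the first case: with replacement the trace of the selected pair is capped at 1, while the accumulating form lets it grow past 1 on repeated visits.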
Unless you think there is something else I'm missing, I'll close this.
from burlap.
Huh, interesting. It looks like the second version of Sutton's book is inconsistent with the first version (maybe because it is still incomplete):
http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html (page 161 definition, page 162 pseudocode)
Anyway, thanks very much for the confirmation and the reference. I'm fine with closing this.