Hello, I have been recently learning RL with Sutton's book along with implementation

Is this a bug in SarsaLam implementation? (burlap issue, closed)

mysl commented on July 18, 2024

Is this a bug in SarsaLam implementation?


Comments (2)

jmacglashan commented on July 18, 2024

Hi,

To be clear, the code has two nested ifs: the outer one checks whether the states are the same, the inner one whether the actions are the same. So when both the state and the action match, the trace is set to 1 (replacing traces). When the states match but the actions differ, the trace is reset to 0. Although there are different choices for how to implement replacing traces in control, this resetting of the non-selected actions' traces to zero is the approach advocated in the book:

https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node80.html

There are several possible ways to generalize replacing eligibility traces for use in control methods. Obviously, when a state is revisited and a new action is selected, the trace for that action should be reset to 1. But what of the traces for the other actions for that state? The approach recommended by Singh and Sutton (1996) is to set the traces of all the other actions from the revisited state to 0.
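As a minimal sketch of the nested-if logic described above (this is not BURLAP's actual code; the class name `ReplacingTraceSketch`, the method `updateTraces`, and the tabular `double[][]` layout are hypothetical and chosen only for illustration):

```java
import java.util.Arrays;

// Hypothetical illustration only; not taken from BURLAP's SarsaLam class.
final class ReplacingTraceSketch {

    /**
     * Applies one step of replacing traces for control on a tabular
     * trace array indexed as traces[state][action].
     */
    public static void updateTraces(double[][] traces, int visitedState,
                                    int selectedAction, double gamma, double lambda) {
        for (int s = 0; s < traces.length; s++) {
            for (int a = 0; a < traces[s].length; a++) {
                if (s == visitedState) {          // outer check: same state
                    if (a == selectedAction) {    // inner check: same action
                        traces[s][a] = 1.0;       // replacing trace
                    } else {
                        traces[s][a] = 0.0;       // other actions of this state are reset
                    }
                } else {
                    traces[s][a] *= gamma * lambda; // all other states simply decay
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] traces = new double[2][2];
        updateTraces(traces, 0, 0, 0.9, 0.5); // visit state 0, take action 0
        updateTraces(traces, 1, 1, 0.9, 0.5); // visit state 1, take action 1
        updateTraces(traces, 0, 1, 0.9, 0.5); // revisit state 0, take action 1
        System.out.println(Arrays.deepToString(traces));
    }
}
```

After the revisit of state 0 with a different action, the trace for the earlier action in that state is zero rather than a decayed positive value, which is the behaviour the original question was asking about.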

Interestingly, the math they provide in the book is actually wrong: after that text they give the accumulating-trace update for the same state-action pair and the reset to zero for the same state with a different action :p However, the paper they're citing is clear and much more detailed on the reasoning. That paper is here:

http://www-all.cs.umass.edu/pubs/1995_96/singh_s_ML96.pdf

See the pseudocode on page 142
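Written out, the rule that the quoted passage describes (with replacement, not accumulation, for the selected pair) would look roughly like the following. The notation is an assumption that follows the book's usual symbols: $e_t(s,a)$ is the trace, $(s_t, a_t)$ the pair visited at time $t$, $\gamma$ the discount factor, and $\lambda$ the trace-decay parameter.

```latex
e_t(s,a) =
\begin{cases}
  1 & \text{if } s = s_t \text{ and } a = a_t, \\
  0 & \text{if } s = s_t \text{ and } a \neq a_t, \\
  \gamma \lambda \, e_{t-1}(s,a) & \text{if } s \neq s_t.
\end{cases}
```

The accumulating variant mentioned above would instead have $1 + \gamma \lambda \, e_{t-1}(s,a)$ in the first case.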

Unless you think there is something else I'm missing, I'll close.


mysl commented on July 18, 2024

Huh, interesting. It looks like the second version of Sutton's book is inconsistent with the first version (maybe because it is still incomplete):
http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html (page 161 definition, page 162 pseudocode)

Anyway, thanks very much for the confirmation and the reference. I am fine with closing it.

