Ch2, line62: Issue#1: If, line 48, you set self.actionCount to a

Ch2, lines 48, 62, & 77: don't seem to match book calc

shangtongzhang commented on May 18, 2024

Ch2, lines 48, 62, & 77: don't seem to match book calc

from reinforcement-learning-an-introduction.

Comments (7)

ShangtongZhang commented on May 18, 2024

Issue#1: Note that np.asarray(self.actionCount) + 1 doesn't actually change self.actionCount. I add 1 to all action counts to avoid division by 0, but self.actionCount itself stays unchanged.
Issue#2: As I see it, we first get the original estimate for every action, then repair each estimate per formula 2.8 in the book, and only then choose the action with the maximum repaired estimate.
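
The selection step described in this comment can be sketched roughly as below. This is only an illustration, not the repository's code: the function name `ucb_select`, the exploration constant `c`, and the exact form of the bonus (with `+ 1` on both the time step and the counts) are assumptions made for the example.

```python
import numpy as np

def ucb_select(q_est, action_count, time_step, c=2.0):
    """Return the action with the highest bonus-adjusted estimate.

    q_est and action_count are never modified; the bonus exists only
    in the temporary arrays built inside this function.
    """
    counts = np.asarray(action_count) + 1        # new array; avoids division by 0
    bonus = c * np.sqrt(np.log(time_step + 1) / counts)
    adjusted = np.asarray(q_est) + bonus         # temporary "repaired" estimates
    return int(np.argmax(adjusted))

q_est = [0.5, 0.4, 0.1]
action_count = [10, 1, 0]
action = ucb_select(q_est, action_count, time_step=11)
```

With these made-up numbers the never-tried action gets the largest bonus and is selected, while `q_est` and `action_count` are the same lists as before the call.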


atki4564 commented on May 18, 2024

On Issue#1: https://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html says 'No copy is performed if the input is already an ndarray'. I tested this on https://www.pythonanywhere.com/try-ipython/, and it actually does change the array; therefore line 77 is a duplicate calc. I'll run a print just before line 77 to see who's right.


ShangtongZhang commented on May 18, 2024

But self.actionCount is a list (line 40), not an ndarray.
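
The list-versus-ndarray distinction can be checked directly. A small sketch, with variable names invented for the example:

```python
import numpy as np

counts_list = [0, 0, 0]
from_list = np.asarray(counts_list)      # a list is converted into a new ndarray
from_list[0] = 99                        # writes to the new array only
# counts_list is still [0, 0, 0]

counts_array = np.zeros(3)
same_object = np.asarray(counts_array)   # already an ndarray: no copy is made
is_same = same_object is counts_array    # True
```

So the 'No copy is performed' sentence in the NumPy docs applies only when the input is already an ndarray; a plain list is always converted into a separate array.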


atki4564 commented on May 18, 2024

My bad, Python newbie mistake (sorry).


ShangtongZhang commented on May 18, 2024

No problem. And even if it were an ndarray, it wouldn't be changed, because np.asarray(self.actionCount) + 1 simply returns a new ndarray without changing the original one.
Try

import numpy as np
a = np.zeros(4)
b = np.asarray(a) + 1
print(a, b)

and see the values of a and b.
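
For contrast, here is a sketch of the one case where the original array would change: an in-place `+=` on the object returned by `np.asarray`. The names `snapshot` and `alias` are made up for the example.

```python
import numpy as np

a = np.zeros(4)
b = np.asarray(a) + 1    # binary + allocates a new array for the result
snapshot = a.copy()      # a is still all zeros at this point

alias = np.asarray(a)    # same object as a, since a is already an ndarray
alias += 1               # in-place add writes into a's own buffer
# now a is all ones as well
```

The expression discussed in this thread uses the binary `+`, so it falls into the first case and leaves its input alone.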


atki4564 commented on May 18, 2024

On Issue#2: then line 90,

    else:
        # update estimation with constant step size
        self.qEst[action] += 0.1 * (reward - self.qEst[action])

is a duplicate, because you already 'repaired every action estimation per formula' in line 62?


ShangtongZhang commented on May 18, 2024

No. This is the normal update for the action estimate. The estimate needs to be repaired whenever we want to choose an action, but that repair isn't and shouldn't be lasting: it should be forgotten once the action has been chosen.
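
The lasting update, as opposed to the temporary repair, can be sketched on its own. The helper name `update_estimate` and the sample rewards are assumptions for the example; the step size 0.1 comes from the line quoted earlier in the thread.

```python
def update_estimate(q_est, action, reward, step_size=0.1):
    """Move the stored estimate for `action` a fraction of the way toward `reward`."""
    q_est[action] += step_size * (reward - q_est[action])
    return q_est

q_est = [0.0, 0.0]
update_estimate(q_est, action=0, reward=1.0)   # q_est[0] becomes 0.1
update_estimate(q_est, action=0, reward=1.0)   # q_est[0] becomes about 0.19
```

Only this step writes to the stored estimates, and only for the action that was actually taken; the bonus added at selection time never reaches `q_est`.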
