rl-book's People

Contributors

zhiqingxiao

rl-book's Issues

Chapter 6, MountainCar-v0: a question about the tile-coding computation

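In `__call__()` of class TileCoder:

    def __call__(self, floats=(), ints=()):
        dim = len(floats)
        # Scale each float so that one tile spans `self.layers` units.
        scaled_floats = tuple(f * self.layers * self.layers for f in floats)
        features = []
        for layer in range(self.layers):
            # Each layer is shifted by (1 + dim * i) * layer along dimension i.
            codeword = (layer,) + tuple(int((f + (1 + dim * i) * layer) /
                    self.layers) for i, f in enumerate(scaled_floats)) + ints
            feature = self.get_feature(codeword)
            features.append(feature)
        return features

The code uses (f + (1 + dim * i) * layer) to compute the position within each layer.
In practice, however, when i=1 this maps the coordinates for layers 1-7 onto 1-9 or 2-10.
My tests show that using (f + layer) works better, and it strictly maps the data for layers 1-7 onto 0-8.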

SARSA algorithm:
Average episode reward = -12649.0 / 100 = -126.49
SARSA(λ) algorithm:
Average episode reward = -10181.0 / 100 = -101.81
So why do we use (f + (1 + dim * i) * layer) for this computation?
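
The index ranges discussed in this issue can be checked with a small sketch. It assumes 8 layers, 2 float dimensions, and inputs normalized to [0, 1), none of which the issue states explicitly; `index_range` and `ranges_for` are hypothetical helpers, not part of the book's code.

```python
LAYERS = 8  # assumed number of tiling layers
DIM = 2     # assumed number of float dimensions (position, velocity)


def index_range(offset, layers=LAYERS, samples=1000):
    """Return (min, max) tile index along one dimension for inputs in [0, 1)."""
    indices = set()
    for k in range(samples):
        scaled = (k / samples) * layers * layers  # same scaling as TileCoder
        indices.add(int((scaled + offset) / layers))
    return min(indices), max(indices)


def ranges_for(offset_fn):
    """Index range per layer for a given layer -> offset rule."""
    return [index_range(offset_fn(layer)) for layer in range(LAYERS)]


# Offset used in the listing, evaluated for dimension i = 1.
original_i1 = ranges_for(lambda layer: (1 + DIM * 1) * layer)
# Offset proposed in the issue.
simple = ranges_for(lambda layer: layer)

for layer in range(LAYERS):
    print(layer, original_i1[layer], simple[layer])
```

Under these assumptions, layer 0 spans 8 indices per dimension and every shifted layer spans 9 under either rule; the listing's offset only moves the range upward in the later layers. The sketch says nothing about why the reported rewards differ.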

ValueError: too many values to unpack (expected 2)

I'm getting the following error when running the Advantage Actor-Critic notebook to play Acrobot-v1:

ValueError Traceback (most recent call last)
in <cell line: 21>()
20 episode_rewards = []
21 for episode in itertools.count():
---> 22 episode_reward, elapsed_steps = play_episode(env, agent, seed=episode,
23 mode='train')
24 episode_rewards.append(episode_reward)

in play_episode(env, agent, seed, mode, render)
1 def play_episode(env, agent, seed=None, mode=None, render=False):
----> 2 observation, _ = env.reset(seed=seed)
3 reward, terminated, truncated = 0., False, False
4 agent.reset(mode=mode)
5 episode_reward, elapsed_steps = 0., 0

ValueError: too many values to unpack (expected 2)
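
One possible cause, stated as an assumption because the issue does not give the installed version: Gym releases before 0.26 return only the observation from `env.reset()`, whereas Gym 0.26+ and Gymnasium return an `(observation, info)` pair. Unpacking a bare 6-element Acrobot observation into two names would raise exactly this error. The sketch below uses a hypothetical helper, `unpack_reset`, that accepts either return shape; upgrading the library is the other option.

```python
def unpack_reset(result):
    """Normalize the return value of env.reset() to (observation, info).

    Hypothetical helper: treats a 2-tuple whose second item is a dict as the
    newer (observation, info) form, and anything else as a bare observation.
    """
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result
    return result, {}


# Stand-ins for the two API styles (no gym import needed for the demo).
old_style = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]   # bare 6-element observation
new_style = (old_style, {})                   # (observation, info)

obs_old, info_old = unpack_reset(old_style)
obs_new, info_new = unpack_reset(new_style)
```

In `play_episode`, the failing line would then read `observation, _ = unpack_reset(env.reset(seed=seed))`. Note that `env.step()` also changed shape between these versions, so this helper alone may not be enough on an old install.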

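Chapter 6: Mountain Car

Questions

  1. In Listing 6-4, the agent's get_q method returns the action value. Shouldn't that be the weights multiplied by the feature vector? Why is it self.w[features] here?

  2. Listing 6-3, tile coding:
    Why does the first layer have 64 tiles while each of the remaining 7 layers has 81? Where does the 8+1 come from? My understanding is that with 8 layers, each large cell (tile) in a layer corresponds to 8*8 small cells. But how is the final number of cells (tiles) covered by one layer determined?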
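
On question 1, a minimal sketch of why indexing can stand in for a product, assuming `features` holds the indices of the active binary features (as tile coding produces, one per layer). With a 0/1 feature vector, the dot product with the weights reduces to the sum of the weights at the active indices. `q_dense` and `q_sparse` are hypothetical names, and how the book's `get_q` aggregates `self.w[features]` is not shown in the issue.

```python
def q_dense(w, features):
    """Dot product of weights with an explicit 0/1 feature vector."""
    x = [0.0] * len(w)
    for idx in features:
        x[idx] = 1.0  # mark each active feature
    return sum(wi * xi for wi, xi in zip(w, x))


def q_sparse(w, features):
    """Same value, computed by looking up only the active weights."""
    return sum(w[idx] for idx in features)


w = [0.5, -1.0, 2.0, 0.25, 3.0]   # illustrative weights
features = [0, 2, 4]              # illustrative active feature indices

dense_value = q_dense(w, features)
sparse_value = q_sparse(w, features)
```

The sparse form avoids building a mostly-zero vector, which matters when the weight table is much larger than the number of active features.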
