
rl-book's Issues

ValueError: too many values to unpack (expected 2)

I'm getting the following error when running the Advantage Actor-Critic example that plays Acrobot-v1:

ValueError                                Traceback (most recent call last)
in <cell line: 21>()
     20 episode_rewards = []
     21 for episode in itertools.count():
---> 22     episode_reward, elapsed_steps = play_episode(env, agent, seed=episode,
     23             mode='train')
     24     episode_rewards.append(episode_reward)

in play_episode(env, agent, seed, mode, render)
      1 def play_episode(env, agent, seed=None, mode=None, render=False):
----> 2     observation, _ = env.reset(seed=seed)
      3     reward, terminated, truncated = 0., False, False
      4     agent.reset(mode=mode)
      5     episode_reward, elapsed_steps = 0., 0

ValueError: too many values to unpack (expected 2)
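
A possible cause, offered as a guess rather than a diagnosis: as far as I know, Gym releases before 0.26 return only the observation from `env.reset()`, while newer Gym and Gymnasium releases return an `(observation, info)` pair. An Acrobot-v1 observation has 6 components, so unpacking it into two names would raise exactly this error. The sketch below uses a hypothetical helper, `reset_compat`, that accepts either convention; the two stand-in classes are illustrative only and are not part of Gym or of the book's code.

```python
def reset_compat(env, seed=None):
    """Hypothetical helper: return (observation, info) under either reset convention."""
    result = env.reset(seed=seed)
    # Newer convention: a 2-tuple whose second element is an info dict.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result
    # Older convention: the observation alone, so supply an empty info dict.
    return result, {}


class OldStyleEnv:
    """Stand-in for an environment whose reset returns only the observation."""

    def reset(self, seed=None):
        return [0.0] * 6  # same length as an Acrobot-v1 observation


class NewStyleEnv:
    """Stand-in for an environment whose reset returns (observation, info)."""

    def reset(self, seed=None):
        return [0.0] * 6, {}


old_observation, old_info = reset_compat(OldStyleEnv(), seed=0)
new_observation, new_info = reset_compat(NewStyleEnv(), seed=0)
```

If this is the cause, installing a Gym version that matches the one the notebook was written for would be the other obvious option; which version that is has not been stated in this issue.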

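Chapter 6: Mountain Car

Questions

  1. In Code Listing 6-4, the agent's get_q method returns self.w[features]. Shouldn't the action value be the weights multiplied by the feature vector? Why is it self.w[features] here?

  2. Code Listing 6-3, tile coding
    Why does the first layer have 64 tiles while the remaining 7 layers have 81 tiles each? Where does the 8+1 come from? My understanding is that with 8 layers, each layer is a grid of large tiles, each equivalent to 8*8 small cells. But how is the number of tiles that one layer ends up covering determined?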
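
On question 1, a minimal sketch of one way to read the indexing, assuming `features` is a list of active tile indices and that the listing sums the selected weights (the issue does not show the full return statement). With binary features, multiplying the weights by the feature vector gives the same number as summing the weights at the active indices. The weight values and the indices below are made up for illustration.

```python
import numpy as np

feature_count = 10
w = np.arange(feature_count, dtype=float)  # hypothetical weights 0., 1., ..., 9.
features = [1, 4, 7]                       # hypothetical indices of the active tiles

# Dense view: a binary feature vector with ones at the active tiles.
x = np.zeros(feature_count)
x[features] = 1.

dense_q = w @ x                # weights multiplied by the feature vector
sparse_q = w[features].sum()   # sum of the weights at the active indices

print(dense_q, sparse_q)
```

The indexed form never builds the dense vector, which matters when the number of tiles is large and only one tile per layer is active.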

Chapter 6, MountainCar-v0: a question about the tile coding computation

In `__call__()` of class TileCoder:
    def __call__(self, floats=(), ints=()):
        dim = len(floats)
        # Scale each float so that one tile spans `self.layers` units.
        scaled_floats = tuple(f * self.layers * self.layers for f in floats)
        features = []
        for layer in range(self.layers):
            # Each layer shifts dimension i by (1 + dim * i) * layer before
            # the integer division that picks the tile index.
            codeword = (layer,) + tuple(
                int((f + (1 + dim * i) * layer) / self.layers)
                for i, f in enumerate(scaled_floats)) + ints
            feature = self.get_feature(codeword)
            features.append(feature)
        return features
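This uses (f + (1 + dim * i) * layer) to compute the position within each layer.
In practice, however, when i=1 the coordinates for layers 1-7 are mapped to 1-9 or 2-10.
My tests show that using (f + layer) works better, and it strictly maps the data for layers 1-7 to 0-8.

SARSA algorithm <<<<<<<<<<<<<<<<<<<<<
Average episode reward = -12649.0 / 100 = -126.49
SARSA(λ) algorithm <<<<<<<<<<<<<<<<<<<<<
Average episode reward = -10181.0 / 100 = -101.81
Why do we use (f + (1 + dim * i) * layer) for this computation?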
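
To make the index ranges concrete, here is a small sketch that evaluates both offset formulas for 8 layers and 2 dimensions. It assumes each input float is normalized to [0, 1), which the snippet above does not state; `index_range` is a hypothetical helper written for this comment, not part of the book's code.

```python
def index_range(layers, offset, samples=1000):
    """Return the (min, max) tile index along one dimension for inputs in [0, 1)."""
    top = layers * layers  # floats are scaled by layers * layers in the listing
    indices = [int((top * k / samples + offset) / layers) for k in range(samples)]
    return min(indices), max(indices)


layers, dim = 8, 2
for layer in range(layers):
    # Offsets from the listing: (1 + dim * i) * layer for dimension i.
    asymmetric = [index_range(layers, (1 + dim * i) * layer) for i in range(dim)]
    # Offset proposed in this issue: layer, for every dimension.
    uniform = index_range(layers, layer)
    print(layer, asymmetric, uniform)
```

Under these assumptions, layer 0 spans 8 indices per dimension (8*8 = 64 tiles) and every other layer spans 9 (9*9 = 81 tiles) with either formula. The (1 + dim * i) offset shifts the labels for i=1 to 1-9 or 2-10 but does not change how many distinct tiles a layer uses. Whether the shifted labels matter depends on how get_feature treats codewords, which is not shown here. One plausible reason for the per-dimension factor is to displace the layers asymmetrically across dimensions rather than along the diagonal, but that is a guess and not something confirmed in this issue.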
