
Comments (5)

starry-sky6688 avatar starry-sky6688 commented on September 16, 2024

Thanks for pointing that out. It should be:

n_return[:, -1] = r[:, -1] * mask[:, -1]

for transition_idx in range(max_episode_len - 2, -1, -1):
    n_return[:, transition_idx] = (r[:, transition_idx] + self.args.gamma * n_return[:, transition_idx + 1] * terminated[:, transition_idx]) * mask[:, transition_idx]

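I will fix this error in the next update.

As for terminated and padding: as you said, they really do differ by only one step, because the last step needs special handling. You could implement this with terminated alone. I store padding directly so that it does not have to be recomputed from terminated on every training step; since episode lengths differ, it cannot be obtained with a single matrix operation.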
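To illustrate the trade-off described above, here is a minimal sketch of deriving the padding flags from terminated for a single episode. It assumes that terminated is 1 at the terminal step (and possibly at every step after it) and that every step after the first terminal step is padding. The function name `padded_from_terminated` is hypothetical and not part of the repository.

```python
def padded_from_terminated(terminated):
    """Derive per-step padding flags for ONE episode from its terminated flags.

    Assumption: every step after the first terminal step is padding.
    """
    padded = []
    seen_terminal = False
    for flag in terminated:
        # A step is padding only if a terminal step occurred before it.
        padded.append(1 if seen_terminal else 0)
        if flag == 1:
            seen_terminal = True
    return padded


# Episode that ends at step 3 of 5: the last two steps are padding.
short_episode = padded_from_terminated([0, 0, 1, 1, 1])
# Episode that ends exactly at the last step: nothing is padded.
full_episode = padded_from_terminated([0, 0, 0, 0, 1])
```

Because the scan depends on where each episode ends, it has to run once per episode, which is the per-training-step cost that storing padding avoids.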

from marl-algorithms.

yywe avatar yywe commented on September 16, 2024

Thanks for the reply.
On second thought, though, I think the loop could also do without `* terminated[:, transition_idx]`, couldn't it?

n_return[:, -1] = r[:, -1] * mask[:, -1]
for transition_idx in range(max_episode_len - 2, -1, -1):
    n_return[:, transition_idx] = (r[:, transition_idx] + self.args.gamma * n_return[:, transition_idx + 1]) * mask[:, transition_idx]

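That should be enough: because of mask, n_return[:, transition_idx + 1] beyond the boundary (terminated) is already 0. So the value at the true last step (the last non-padded step) comes only from r[:, transition_idx].

Of course, keeping `* terminated[:, transition_idx]` is not wrong either.
Is my understanding correct?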


starry-sky6688 avatar starry-sky6688 commented on September 16, 2024

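You can drop mask, but terminated is needed. When transition_idx is the last step of the episode, terminated=0 and mask=1; at that point the next step's return has to be cancelled, and only terminated can do that. mask is just a second safeguard; terminated alone is actually enough.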
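A minimal sketch of the terminated-only variant described above, written for one episode with plain Python lists rather than the batched tensors. It assumes `not_terminated` is the inverted flag (0 from the terminal step onward). The function name and the sample numbers are hypothetical.

```python
def n_return_terminated_only(r, not_terminated, gamma):
    """Backward return recursion for ONE episode without any mask factor."""
    n_return = [0.0] * len(r)
    n_return[-1] = r[-1]
    for t in range(len(r) - 2, -1, -1):
        # not_terminated[t] == 0 at the last real step cuts off the bootstrap.
        n_return[t] = r[t] + gamma * n_return[t + 1] * not_terminated[t]
    return n_return


not_terminated = [1, 1, 0, 0, 0]      # episode ends at step 3 of 5
clean_r = [1.0, 2.0, 3.0, 0.0, 0.0]   # padded steps carry zero reward
dirty_r = [1.0, 2.0, 3.0, 9.0, 9.0]   # hypothetical junk in the padded steps

clean = n_return_terminated_only(clean_r, not_terminated, gamma=0.5)
dirty = n_return_terminated_only(dirty_r, not_terminated, gamma=0.5)
```

In this sketch the first three entries of `clean` and `dirty` agree, since the inverted terminated flag stops anything in the padded steps from leaking into the real ones. Without mask, however, the padded entries themselves are only zero when the padded rewards are zero.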


yywe avatar yywe commented on September 16, 2024

"When transition_idx is the last step of the episode, terminated=0 and mask=1."
That is indeed the case.

But it gets a bit tricky here.

For example, say max length = 5 and the episode ends at step 3. Then:
terminated: 00111
padded: 00011
mask = 1 - padded: 11100
inverted terminated: 11000

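So at the last step, (inverted) terminated=0 and mask=1. My understanding is that although mask is 1 here, the next step's return n_return[:, transition_idx + 1] belongs to padding: the padded r is 0 and the next step's mask is 0, so the loop has already set the return of the step after the last one to 0.
In other words, the return of the step after the last one is already 0, so it makes no difference whether we explicitly cancel it or not?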
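The question can be checked numerically on the 5-step example. The sketch below runs the recursion for one episode with plain Python lists, once with and once without the terminated factor. The reward values, `gamma`, and the function name are hypothetical; only the flag patterns come from the example.

```python
def n_step_return(r, mask, not_terminated, gamma, use_terminated):
    """Backward return recursion for ONE episode, with mask always applied."""
    n_return = [0.0] * len(r)
    n_return[-1] = r[-1] * mask[-1]
    for t in range(len(r) - 2, -1, -1):
        bootstrap = gamma * n_return[t + 1]
        if use_terminated:
            bootstrap *= not_terminated[t]
        n_return[t] = (r[t] + bootstrap) * mask[t]
    return n_return


terminated = [0, 0, 1, 1, 1]
padded = [0, 0, 0, 1, 1]
mask = [1 - p for p in padded]                # 11100
not_terminated = [1 - t for t in terminated]  # 11000
r = [1.0, 2.0, 3.0, 0.0, 0.0]                 # hypothetical rewards
gamma = 0.5

with_term = n_step_return(r, mask, not_terminated, gamma, use_terminated=True)
without_term = n_step_return(r, mask, not_terminated, gamma, use_terminated=False)
```

Both calls give the same list here: mask has already zeroed the return at the first padded step before the loop reaches the last real step, so the extra terminated factor multiplies a value that is already 0.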


starry-sky6688 avatar starry-sky6688 commented on September 16, 2024

Yes, you can look at it that way.
