
Comments (3)

sailxjx commented on August 22, 2024

@HuihuiTong Could you share the complete code? In our internal experience, making evaluate asynchronous the way you describe can roughly double the speed.
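
To illustrate in general terms why overlapping evaluation with training can help, here is a minimal sketch using only the Python standard library. It is not DI-engine's API or the code under discussion: `train_step`, `evaluate`, `run_sync` and `run_async` are hypothetical names, and the sleeps stand in for real work. When the two phases take similar time, running evaluation in the background can approach a 2x speedup.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def train_step(i):
    # Hypothetical stand-in for one learner update.
    time.sleep(0.005)
    return i


def evaluate(snapshot):
    # Hypothetical stand-in for one evaluation run on a model snapshot.
    time.sleep(0.005)
    return {"iteration": snapshot, "reward": 0.0}


def run_sync(n):
    # Training blocks while each evaluation runs.
    results = []
    for i in range(n):
        train_step(i)
        results.append(evaluate(i))
    return results


def run_async(n):
    # Evaluation is submitted to a background worker, so the next
    # training step starts without waiting for the result.
    futures = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        for i in range(n):
            train_step(i)
            futures.append(pool.submit(evaluate, i))
    return [f.result() for f in futures]


if __name__ == "__main__":
    for fn in (run_sync, run_async):
        start = time.perf_counter()
        fn(20)
        print(fn.__name__, round(time.perf_counter() - start, 3), "s")
```

Threads only overlap work that releases the GIL (sleeping, I/O, most tensor operations); CPU-bound Python evaluation would need a separate process instead.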

from di-engine.

HuihuiTong commented on August 22, 2024

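@HuihuiTong Could you share the complete code? In our internal experience, making evaluate asynchronous the way you describe can roughly double the speed.

The main change follows the DI-engine documentation: https://di-engine-docs.readthedocs.io/zh_CN/latest/distributed/index_zh.html#id4
The baseline is: https://github.com/opendilab/GoBigger-Challenge-2021/tree/main/di_baseline
The modification is the same as in the DI-engine documentation; nothing else was changed.
I will tidy things up and post the two versions of the code for comparison.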

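While I am here, I would also like to ask about a few other things that confuse me in the DI-engine framework:

  1. Training scale and distributed training. I want to increase the number of environments and the overall training scale, for example from the 8 environments in the baseline to 100, distributed across multiple nodes (say, 10 nodes running envs and one node running the learner). What needs to change in the current DI-engine to do this? Is there a demo I can refer to?
  2. The training parameter max_iterations. I set it to 2000, but training has been running for more than ten hours and still has not finished. I do not quite understand what this parameter controls.
  3. Is there a WeChat group or Q&A group for discussing how to use the DI-engine framework?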


sailxjx commented on August 22, 2024

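@HuihuiTong The documentation has not kept up; see the latest changes: https://github.com/opendilab/DI-engine-docs/blob/ae8604be47a248c420e33a2bdb1bbe188d335669/source/distributed/index_zh.rst

Let me address these questions:

  1. You can switch to SubprocessEnvManager and change collect_env_num in the config to 100, but I doubt this will help much: more processes mean more CPU and memory usage, plus more IO overhead when writing collected data to the buffer. Use the times printed by the step timer to compare how long each learn step and each collect step takes. If collect is faster than learn, there is no need to add environments; you should improve the efficiency of learn instead.
  2. max_iterations determines the total number of training loops. As with the first question, you need to look at how long each loop takes. If one loop takes several seconds, then 2000 loops can indeed take several hours.
  3. For now, GitHub issues are the only way to get a reasonably quick answer. Running a community and chat groups takes too much time, and we do not have enough resources for that yet.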
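
The timing advice in points 1 and 2 can be sketched as follows. This is a minimal illustration using only the standard library, not DI-engine's own step timer: `PhaseTimer`, `slower_phase`, `estimate_hours`, `fake_collect` and `fake_learn` are hypothetical names, and the sleeps stand in for real collect and learn steps.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    @contextmanager
    def measure(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.total[name] += time.perf_counter() - start
            self.count[name] += 1

    def mean(self, name):
        # Average seconds per call; 0.0 if the phase was never measured.
        return self.total[name] / self.count[name] if self.count[name] else 0.0


def slower_phase(timer, a="collect", b="learn"):
    # The phase with the larger mean time is the one worth optimizing.
    return a if timer.mean(a) > timer.mean(b) else b


def estimate_hours(max_iterations, seconds_per_iteration):
    # Total wall-clock time if every training loop takes the same time.
    return max_iterations * seconds_per_iteration / 3600.0


def fake_collect():
    time.sleep(0.002)  # stand-in for one collect step


def fake_learn():
    time.sleep(0.006)  # stand-in for one learn step


if __name__ == "__main__":
    timer = PhaseTimer()
    for _ in range(10):
        with timer.measure("collect"):
            fake_collect()
        with timer.measure("learn"):
            fake_learn()
    print("collect:", timer.mean("collect"), "s/step")
    print("learn:  ", timer.mean("learn"), "s/step")
    print("bottleneck:", slower_phase(timer))
    print("2000 iterations at 5 s each:", estimate_hours(2000, 5), "hours")
```

At 5 s per loop, 2000 loops take about 2.8 hours; a run of ten hours or more corresponds to roughly 18 s or more per loop.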

