
Comments (2)

Sengxian avatar Sengxian commented on July 20, 2024

Hello @XiaoqingNLP

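  1. It is the name of the development repository we use for our own training, modified from Megatron-DeepSpeed.
  2. Under pipeline parallelism, a batch is split into several micro-batches to reduce pipeline bubbles, so gradient accumulation is required. Specifically, we use the 1F1B pipeline schedule.
  3. This was only us checking whether the failure to converge was caused by the reshuffle. That part is also marked (deprecated) in the log, and the final training curve does not include this segment.
  4. Yes. We found that the data distribution has to be changed slowly; otherwise training fails to converge.
  5. What we usually observe is that the loss first spikes by several orders of magnitude and then becomes NaN.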
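
For readers unfamiliar with the 1F1B schedule mentioned in point 2, here is a minimal sketch of the order in which one pipeline stage runs forward (F) and backward (B) passes over micro-batches. It assumes the common non-interleaved 1F1B layout (warmup forwards, alternating steady state, cooldown backwards); the function name `one_f_one_b_schedule` and its parameters are hypothetical and are not taken from our training code. Each backward pass would add into the gradient buffers, and the optimizer would step once after the last one, which is why gradient accumulation is needed.

```python
from typing import List, Tuple

def one_f_one_b_schedule(stage: int, num_stages: int,
                         num_microbatches: int) -> List[Tuple[str, int]]:
    """Return the (op, micro_batch_id) order for one pipeline stage.

    Illustrative sketch of a non-interleaved 1F1B schedule; names are
    hypothetical, not an actual framework API.
    """
    # Earlier stages run more forwards before their first backward arrives.
    num_warmup = min(num_stages - stage - 1, num_microbatches)
    num_steady = num_microbatches - num_warmup

    ops: List[Tuple[str, int]] = []
    fwd = bwd = 0
    # Warmup: forward passes only.
    for _ in range(num_warmup):
        ops.append(("F", fwd))
        fwd += 1
    # Steady state: one forward, then one backward (1F1B).
    for _ in range(num_steady):
        ops.append(("F", fwd))
        fwd += 1
        ops.append(("B", bwd))  # gradients accumulate here
        bwd += 1
    # Cooldown: drain the remaining backward passes.
    for _ in range(num_warmup):
        ops.append(("B", bwd))
        bwd += 1
    # The optimizer step would happen once, after all backward passes.
    return ops

if __name__ == "__main__":
    for s in range(4):
        order = one_f_one_b_schedule(s, num_stages=4, num_microbatches=8)
        print(s, " ".join(f"{op}{i}" for op, i in order))
```

In this sketch the last stage alternates F and B from the start, while the first stage holds the most micro-batches in flight during warmup.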

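Sorry that our logs are not very detailed; at the time they were only meant as internal records. If you have more questions, feel free to ask :)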

from glm-130b.

eyuansu62 avatar eyuansu62 commented on July 20, 2024

Is there an open-source version of LargeScale?

from glm-130b.
