Comments (4)
Thanks for your attention! We are still in active development for quantization and other explorations. The training code will be released after we are done, maybe later this month.
from glm-130b.
Unlike OPT-175B and BLOOM-176B, we use 40 GB A100 GPUs instead of the 80 GB version, so GPU memory can be very tight during training. We tried numerous combinations of parallelism, hidden size, and layer count, and ended up using 4-way tensor parallelism and 8-way pipeline parallelism for the best training throughput, approximately 135 TFLOPS per GPU. The Adam optimizer with ZeRO stage 1 is used in actual training. At least 40 nodes are required to start training. Using optimizers like Adafactor or 8-bit Adam could further reduce the minimum requirement to 24 nodes; however, we did not use them, in order to get the best convergence.
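For readers who want to sanity-check these numbers, here is a rough back-of-envelope sketch, not the actual training configuration. It assumes 8 GPUs per node (not stated in this thread) and the common mixed-precision Adam accounting of 2 bytes for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes of fp32 optimizer state per parameter. ZeRO stage 1 shards only the optimizer state across data-parallel ranks. Activations, buffers, and fragmentation are ignored, so real usage is higher. The names `estimate` and `Layout` are hypothetical.

```python
from dataclasses import dataclass

PARAMS = 130e9         # parameter count implied by the model name
TENSOR_PARALLEL = 4    # from the comment above
PIPELINE_PARALLEL = 8  # from the comment above
GPUS_PER_NODE = 8      # ASSUMPTION: not stated in the thread

# ASSUMPTION: standard mixed-precision Adam accounting (bytes per parameter).
BYTES_WEIGHTS_AND_GRADS = 2 + 2  # fp16 weights + fp16 gradients
BYTES_OPTIMIZER = 12             # fp32 master weights + momentum + variance


@dataclass
class Layout:
    total_gpus: int
    data_parallel: int
    model_state_gb: float  # weights + grads + optimizer shard per GPU, in GB


def estimate(nodes: int) -> Layout:
    """Estimate per-GPU model-state memory under ZeRO stage 1."""
    total_gpus = nodes * GPUS_PER_NODE
    # One model replica spans tensor-parallel x pipeline-parallel GPUs.
    replica_gpus = TENSOR_PARALLEL * PIPELINE_PARALLEL
    if total_gpus % replica_gpus != 0:
        raise ValueError("GPU count must be a multiple of the replica size")
    data_parallel = total_gpus // replica_gpus

    params_per_gpu = PARAMS / replica_gpus
    # ZeRO stage 1 partitions only the optimizer state across data-parallel ranks.
    bytes_per_gpu = params_per_gpu * (
        BYTES_WEIGHTS_AND_GRADS + BYTES_OPTIMIZER / data_parallel
    )
    return Layout(total_gpus, data_parallel, bytes_per_gpu / 1e9)


if __name__ == "__main__":
    for nodes in (40, 24):
        print(nodes, estimate(nodes))
```

Under these assumptions, 40 nodes give a data-parallel degree of 10 and roughly 21 GB of model state per GPU, while 24 nodes give a degree of 6 and roughly 24 GB. With fewer nodes, each GPU holds a larger share of the Adam state. That is consistent with memory-lighter optimizers lowering the node requirement, but this estimate alone does not derive the 40-node minimum.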
@Sengxian Could you share the basic training settings? For example, the model-parallel degree (mp) and ZeRO stage used in training, and the minimum number of nodes required for training?
@Sengxian Hi, has this plan been delayed? You wrote that the training code would be released "maybe later this month."
Related Issues (20)
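- Question about the FT inference benchmark data
- Per-token latency varies in spikes
- The GLM-130B documentation says the model weights need 260G of GPU memory, but the test demo actually uses about 240G in total. What is the reason?
- How to set up a model-parallel cluster
- Can GLM output the sources of cited content along with its generated output?
- The model application page cannot submit the application
- Are there plans to open-source a chat version based on 130B?
- The model download links received in the application email have all expired
- Can FasterTransformer support Glm6B?
- Will glm2-130B be made?
- Where is the course link?
- RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 (at `answers, answers_with_style, blanks = fill_blanks(raw_text, model, tokenizer, strategy)`)
- FasterTransformer inference on 8 GPUs fails with RuntimeError: [FT][ERROR] Assertion fail: /home/young.ruan/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:539
- Error when running `bash scripts/generate.sh --input-source interactive`. Please help!
- Clarification Request on GLM-130B Model Architecture and Licensing for Commercial Use
- Has anyone gotten the model running in a TensorRT-LLM Docker environment? Help needed...
- Could someone please help? I want to deploy this model locally; how do I deploy it on Windows?
- The download stopped halfway and will not resume
- Error about the GLM-130B model checkpoint
- Could you offer a download link with a Chinese mainland mirror?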