Comments (9)
Hello @hamza-alethea. Sorry, the current conversion script is for checkpoints produced by our training framework; we will fix it very soon. By the way, FasterTransformer does not support inference on V100 machines. A quantized version of GLM-130B that allows efficient INT8 inference on the V100 will be released in the next few days, so please be patient and keep an eye on our GitHub repo.
from glm-130b.
Thank you!
Do you know of any other method that would help me reduce the response time of GLM-130B?
We have just released the quantized version of GLM-130B. V100 servers can efficiently run GLM-130B in INT8 precision; see Quantization of GLM-130B for details.
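For readers wondering what "INT8 precision" involves here: weight-only INT8 quantization stores each weight matrix as 8-bit integers plus a per-channel scale, and dequantizes on the fly at inference time. The following is a minimal NumPy sketch of the general per-channel absmax scheme, not the GLM-130B repo's actual implementation:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Per-row (output-channel) absmax quantization of a weight matrix.

    Returns int8 weights plus a float scale per row, so that
    w is approximately q * scale[:, None].
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.squeeze(1)

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float32 weight matrix."""
    return q.astype(np.float32) * scale[:, None]
```

This halves weight memory relative to FP16 (which is why a V100 server can hold the model), at the cost of a small per-channel rounding error.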
Hello, can the quantization method referred to in the link also be applied to the GLM-10B model?
We haven't tried it, but I think a smaller model might be easier to quantize.
I just converted GLM-10B to 4-way model parallelism, but when I run generate_block.sh it fails to load the model.
I changed MPSIZE to 4 in the script.
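For context on what a "4-way" conversion entails: tensor-model parallelism splits each parallel weight into one shard per rank, and loading fails whenever MPSIZE disagrees with the number of shard files or their shapes. Here is a rough sketch of such a split; the parameter names and split axes are illustrative conventions (Megatron-style), not GLM-10B's actual checkpoint layout:

```python
import numpy as np

def split_checkpoint(state: dict, mp_size: int) -> list:
    """Split a single-rank state dict into mp_size model-parallel shards.

    Column-parallel weights are split along the output dimension,
    row-parallel weights along the input dimension, and everything
    else (e.g. layernorms) is replicated on every rank.
    """
    shards = [dict() for _ in range(mp_size)]
    for name, w in state.items():
        if name.endswith("dense_h_to_4h.weight"):    # column-parallel
            parts = np.split(w, mp_size, axis=0)
        elif name.endswith("dense_4h_to_h.weight"):  # row-parallel
            parts = np.split(w, mp_size, axis=1)
        else:                                        # replicated
            parts = [w] * mp_size
        for rank, part in enumerate(parts):
            shards[rank][name] = part
    return shards
```

Dimensions that are not divisible by mp_size cannot be split this way, which is one common cause of load failures after changing the parallelism degree.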
When accelerating inference with FasterTransformer, should I specify a single .pt file, or the whole folder of .pt files?
Hello @jiangliqin! This repo is only for the GLM-130B model; we have not yet done quantization for GLM-10B.
Related Issues (20)
- 6 cards inference HOT 1
- [Question] Does the GLM-130B model have a vocab file? HOT 1
- Question about the GLM-130B model architecture hyperparameters
- Question about the figure in docs/quantization.md
- Training objective
- Question about the FT inference benchmark numbers
- Per-token latency fluctuates in pulses
- The GLM-130B docs say the model weights need 260 GB of GPU memory, but the demo actually uses about 240 GB in total; what is the reason?
- How to set up a model-parallel cluster
- Can GLM cite the sources of referenced content while generating output?
- Cannot submit an application on the model application page HOT 1
- Is there a plan to open-source a chat version based on 130B?
- The model download links received in the application email have all expired HOT 5
- Can FasterTransformer support GLM-6B?
- Will a glm2-130B be made? HOT 1
- Where is the course link? HOT 1
- RuntimeError: probability tensor contains either `inf`, `nan` or element < 0, raised at `answers, answers_with_style, blanks = fill_blanks(raw_text, model, tokenizer, strategy)`
- 8-card FasterTransformer inference error: RuntimeError: [FT][ERROR] Assertion fail: /home/young.ruan/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:539
- Error when running bash scripts/generate.sh --input-source interactive. Please help! HOT 1
- Clarification Request on GLM-130B Model Architecture and Licensing for Commercial Use