Comments (6)
@lockmatrix Hi,
Interesting point. I think it's worth a try, but the problem may lie in the intolerable speed: since main memory cannot hold the full 260 GB checkpoint, frequent disk swapping would be needed, which can be even slower than swapping between main memory and GPU memory.
What's your view on the speed of GLM-130B's inference? When resources are limited, do you prefer to sacrifice more generation performance or speed?
from glm-130b.
Er, 260 GB is too much; it needs a smaller model to run on a PC.
What's your view on the speed of GLM-130B's inference? When resources are limited, do you prefer to sacrifice more generation performance or speed?
I'd prefer to sacrifice generation performance. It would be much more useful if a cheaper PC could run inference.
An M1 Ultra (Mac Studio) costs 30k RMB and has 128 GB of unified memory, which is cheaper than 8× RTX 3090.
For people who only want to run inference rather than train, buying 8× RTX 3090 is not a good idea.
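The hardware comparison above can be sketched as a back-of-envelope fit check. The figures are illustrative assumptions (24 GB per RTX 3090, 128 GB of M1 Ultra unified memory, weights-only footprint); in practice activations and KV cache add overhead, so a fit on weights alone is optimistic.

```python
# Assumed aggregate memory per setup, in GB.
SETUPS_GB = {
    "8x RTX 3090": 8 * 24,         # 192 GB aggregate GPU memory
    "M1 Ultra (Mac Studio)": 128,  # unified memory
}

def weights_gb(n_params: float, bits: int) -> float:
    """Weights-only footprint in GiB at the given precision."""
    return n_params * bits / 8 / 1024**3

for name, mem in SETUPS_GB.items():
    for bits in (16, 8, 4):
        need = weights_gb(130e9, bits)  # assumed ~130B parameters
        verdict = "fits" if need < mem else "does not fit"
        print(f"{name}: {bits}-bit needs {need:.0f} GB -> {verdict}")
```

On these assumptions, neither setup holds the FP16 weights, but 8-bit or 4-bit quantization brings the footprint under both memory budgets, which is why quantized inference is the realistic path on cheaper hardware.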
Have you ever tried GLM's 10-billion-parameter version? We provide English and Chinese monolingual GLM-10B versions, and according to my estimation it only requires about 40 GB of memory when using FP16.
Have you ever tried GLM's 10-billion-parameter version? We provide English and Chinese monolingual GLM-10B versions, and according to my estimation it only requires about 40 GB of memory when using FP16.
No.
I will try it when I get an M1 Max.
Great. Thanks for your advice.
Related Issues (20)
- Question about GLM-130B model architecture hyperparameters
- Question about the image in docs/quantization.md
- Training objective
- Question about the FT inference benchmark numbers
- Per-token latency fluctuates in pulses
- The GLM-130B docs say the model weights need 260 GB of GPU memory, but the demo actually uses about 240 GB in testing; what is the reason?
- How to set up a model-parallel cluster
- Can GLM output the sources of quoted content alongside its output?
- The model application page cannot submit an application HOT 1
- Are there plans to open-source a chat version based on 130B?
- The model download links received in the application email have all expired HOT 5
- Can FasterTransformer support Glm6B?
- Will glm2-130B be made? HOT 1
- Where is the course link? HOT 1
- RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 at `answers, answers_with_style, blanks = fill_blanks(raw_text, model, tokenizer, strategy)`
- 8-GPU FasterTransformer inference error: RuntimeError: [FT][ERROR] Assertion fail: /home/young.ruan/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:539
- Error when running bash scripts/generate.sh --input-source interactive. Please help! HOT 1
- Clarification Request on GLM-130B Model Architecture and Licensing for Commercial Use
- Has anyone run the model in a TensorRT-LLM Docker environment? Asking for help...
- How can I deploy this model locally on Windows? Any help appreciated.