Comments (6)
@lockmatrix Hi,
Interesting point. I think it's worth a try, but the problem may lie in the intolerable speed: since main memory cannot hold the full 260 GB checkpoint, frequent disk swapping would be needed, which can be even slower than swapping between main memory and GPU memory.
What's your view on the speed of GLM-130B's inference? When resources are limited, do you prefer to sacrifice more generation performance or speed?
from glm-130b.
Er, 260 GB is too much; it needs a smaller model to run on a PC.
What's your view on the speed of GLM-130B's inference? When resources are limited, do you prefer to sacrifice more generation performance or speed?
I'd prefer to sacrifice generation performance. It would be much more useful if a cheaper PC could run inference.
An M1 Ultra (Mac Studio) costs 30k RMB and has 128 GB of unified memory, which is cheaper than 8× RTX 3090.
For people who only want to run inference rather than train, buying 8× RTX 3090 is not a good idea.
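The hardware comparison above can be sketched as a back-of-envelope fit check. The figures are illustrative assumptions (24 GB per RTX 3090, 128 GB of M1 Ultra unified memory, weights-only footprint); in practice activations and KV cache add overhead, so a fit on weights alone is optimistic.

```python
# Assumed aggregate memory per setup, in GB.
SETUPS_GB = {
    "8x RTX 3090": 8 * 24,         # 192 GB aggregate GPU memory
    "M1 Ultra (Mac Studio)": 128,  # unified memory
}

def weights_gb(n_params: float, bits: int) -> float:
    """Weights-only footprint in GiB at the given precision."""
    return n_params * bits / 8 / 1024**3

for name, mem in SETUPS_GB.items():
    for bits in (16, 8, 4):
        need = weights_gb(130e9, bits)  # assumed ~130B parameters
        verdict = "fits" if need < mem else "does not fit"
        print(f"{name}: {bits}-bit needs {need:.0f} GB -> {verdict}")
```

On these assumptions, neither setup holds the FP16 weights, but 8-bit or 4-bit quantization brings the footprint under both memory budgets, which is why quantized inference is the realistic path on cheaper hardware.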
Have you ever tried GLM's 10-billion-parameter version? We provide English and Chinese monolingual GLM-10B versions, and according to my estimation it only requires about 40 GB of memory when using FP16.
Have you ever tried GLM's 10-billion-parameter version? We provide English and Chinese monolingual GLM-10B versions, and according to my estimation it only requires about 40 GB of memory when using FP16.
No.
I will try it when I get an M1 Max.
Great. Thanks for your advice.
Related Issues (20)
- Question about GLM-130B model architecture hyperparameters
- Question about the image in docs/quantization.md
- Training objective
- Question about the FT inference benchmark numbers
- Per-token latency fluctuates in pulses
- The GLM-130B docs say the model weights need 260 GB of GPU memory, but the demo actually uses about 240 GB in testing; what is the reason?
- How to set up a model-parallel cluster
- Can GLM output the sources of quoted content alongside its output?
- The model application page cannot submit an application HOT 1
- Are there plans to open-source a chat version based on 130B?
- The model download links received in the application email have all expired HOT 5
- Can FasterTransformer support Glm6B?
- Will glm2-130B be made? HOT 1
- Where is the course link? HOT 1
- RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 at `answers, answers_with_style, blanks = fill_blanks(raw_text, model, tokenizer, strategy)`
- 8-GPU FasterTransformer inference error: RuntimeError: [FT][ERROR] Assertion fail: /home/young.ruan/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:539
- Error when running bash scripts/generate.sh --input-source interactive. Please help! HOT 1
- Clarification Request on GLM-130B Model Architecture and Licensing for Commercial Use
- Has anyone run the model in a TensorRT-LLM Docker environment? Asking for help...
- How can I deploy this model locally on Windows? Any help appreciated.