Comments (14)
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --bf16 --stream_chat
Inference works fine with the command above, but
python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 --quant 4
fails with: Floating point exception (core dumped)
from cogvlm.
Do not pass --fp16.
from cogvlm.
Without --fp16:
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --quant 4
A different problem appears:
Model: error message: Input type (c10::Half) and bias type (float) should be the same
This is not caused by the torch / xformers / apex / cuDNN versions, is it?
from cogvlm.
The README already points this out: # In SAT version,--quant should be used with --fp16
from cogvlm.
Huggingface demo
git0322-CogVLM/basic_demo# python cli_demo_hf.py --from_pretrained hugcog1125/cogvlm-chat-hf --fp16
Floating point exception (core dumped)
The same problem occurs with --fp16 added. Thanks.
from cogvlm.
git0322-CogVLM/basic_demo# python cli_demo_hf.py --from_pretrained hugcog1125/cogvlm-chat-hf --quant 4
Floating point exception (core dumped)
from cogvlm.
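With the original CUDA 11.8 image, I tried many combinations of torch and xformers versions and kept running into new problems.
After switching to a new CUDA 12.1 image, the Floating point exception (core dumped) disappeared.
It is probably related to CUDA / cuDNN.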
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 --quant 4 OK
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 --quant 8 OK
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 OK
from cogvlm.
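Test results:
4-bit quantization: inference uses 29 GB of GPU memory
8-bit quantization: inference uses 38 GB of GPU memory
No quantization: inference uses 42 GB of GPU memory
Is this GPU memory usage normal? Is there a way to bring 4-bit quantization below 24 GB so that it can run on a 4090?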
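As a rough sanity check on these numbers, the weight footprint alone can be estimated from the parameter count and the bits per weight. The sketch below is only back-of-the-envelope arithmetic: the 17-billion parameter count is an assumption (the thread does not state it), `estimate_weight_gib` is a hypothetical helper, and the result ignores activations, the KV cache, and any layers that stay unquantized, so measured usage will be higher.

```python
def estimate_weight_gib(num_params: float, bits_per_weight: int) -> float:
    """Return the approximate size of the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed parameter count (not stated in this thread).
ASSUMED_PARAMS = 17e9

for bits in (16, 8, 4):
    size = estimate_weight_gib(ASSUMED_PARAMS, bits)
    print(f"{bits:>2}-bit weights: about {size:.1f} GiB")
```

If the measured usage is far above the estimate for a given bit width, the difference is likely coming from something other than the weights, for example activations or layers left in higher precision.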
from cogvlm.
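Web demo with 4-bit quantization: 12 GB of GPU memory.
Eval code with 4-bit quantization: 29 GB of GPU memory.
How can I keep the eval code from using so much GPU memory? Which part of the code should be modified?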
from cogvlm.
(Quoting the comment above) With the original CUDA 11.8 image, I tried many combinations of torch and xformers versions and kept running into new problems. After switching to a new CUDA 12.1 image, the Floating point exception (core dumped) disappeared. It is probably related to CUDA / cuDNN.
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 --quant 4 OK
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 --quant 8 OK
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 OK
Which versions of torch and xformers did you end up using?
from cogvlm.
torch 2.2.1+cu121
xformers 0.0.25
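To compare an environment against the versions listed above, something like the following sketch can be used. It reads installed package metadata through the standard library only, so it runs even when torch itself fails to import; `report_versions` is a hypothetical helper name, and the package list is just the two packages mentioned in this thread.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def report_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions(["torch", "xformers"]).items():
        print(f"{name}: {version or 'not installed'}")
```

When torch imports successfully, `torch.version.cuda` additionally reports the CUDA version the wheel was built against, which is the part that changed between the two images discussed here.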
from cogvlm.
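Hello, I have one more follow-up question about quantization.
After 4-bit quantization the GPU memory usage is 12 GB, but host RAM usage is still high and I get an OOM error.
How can I reduce the RAM usage?
Is it possible to save the quantized model and then load the quantized model for inference? Would that reduce RAM usage, and is it feasible? Thanks!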
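One way to see where the host RAM goes is to record the process's peak resident memory after each loading step. This is only a measurement sketch for Linux or macOS using the standard `resource` module, not a fix; `peak_rss_gib` is a hypothetical helper, and the place to call it (for example right after the model is loaded and again after quantization) is an assumption about how the demo script is organized.

```python
import resource
import sys


def peak_rss_gib() -> float:
    """Return this process's peak resident set size in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return peak * bytes_per_unit / 2**30


if __name__ == "__main__":
    print(f"peak host RAM so far: {peak_rss_gib():.2f} GiB")
```

Because the value is a high-water mark, a large reading right after loading suggests that the full-precision weights are held in RAM before quantization takes place.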
from cogvlm.
Could you answer this when you have time? Thanks.
from cogvlm.