
Comments (14)

elesun2018 commented on June 9, 2024

git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --bf16 --stream_chat
This runs inference normally, but

python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 --quant 4

fails with: Floating point exception (core dumped)

zRzRzRzRzRzRzR commented on June 9, 2024

Do not pass --fp16.

elesun2018 commented on June 9, 2024

Without --fp16:
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --quant 4
Now a different problem appears.
Model error message: Input type (c10::Half) and bias type (float) should be the same

This isn't a torch / xformers / apex / cuDNN version problem, is it?

elesun2018 commented on June 9, 2024

But the README has this note: # In SAT version, --quant should be used with --fp16

elesun2018 commented on June 9, 2024

Hugging Face demo:
git0322-CogVLM/basic_demo# python cli_demo_hf.py --from_pretrained hugcog1125/cogvlm-chat-hf --fp16
Floating point exception (core dumped)
The same problem occurs here with --fp16 as well. Thanks.

elesun2018 commented on June 9, 2024

git0322-CogVLM/basic_demo# python cli_demo_hf.py --from_pretrained hugcog1125/cogvlm-chat-hf --quant 4
Floating point exception (core dumped)

elesun2018 commented on June 9, 2024

With the original CUDA 11.8 image, I tried many combinations of torch and xformers versions, and new problems kept appearing.
With a new CUDA 12.1 image, the Floating point exception (core dumped) disappeared.
I suspect the cause is related to CUDA / cuDNN.
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 --quant 4 (OK)
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 --quant 8 (OK)
git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 (OK)

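elesun2018 commented on June 9, 2024

Test results:
4-bit quantization: inference uses 29 GB of GPU memory
8-bit quantization: inference uses 38 GB of GPU memory
No quantization: inference uses 42 GB of GPU memory
Is this GPU memory usage normal? Is there a way to bring 4-bit quantization below 24 GB so that it can run on a 4090?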
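To put the numbers above in context, here is a rough weights-only estimate. It is a sketch, not a measurement: the parameter count of about 17 billion is an assumption that is not stated in this thread, and the estimate ignores activations, the KV cache, CUDA context overhead, and any layers that a given quantization path leaves unquantized.

```python
# Back-of-the-envelope weight memory at different precisions.
# ASSUMPTION: about 17e9 parameters (not stated in this thread).
ASSUMED_PARAMS = 17e9


def weight_gib(n_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>5}: {weight_gib(ASSUMED_PARAMS, bits):5.1f} GiB")
```

Under that assumption, 4-bit weights alone would fit well within 24 GB, so a 29 GB reading would have to come from something other than fully quantized weights. The thread does not establish what that is.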

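elesun2018 commented on June 9, 2024

The web demo with 4-bit quantization uses 12 GB of GPU memory.
The eval (evaluation) code with 4-bit quantization uses 29 GB of GPU memory.
How can I reduce the GPU memory used by the eval code? Which part of the code should I change?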
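One way to compare the two code paths on equal footing is to sample GPU memory from outside the process while each one runs. The sketch below shells out to nvidia-smi using its standard --query-gpu CSV output; the helper names are hypothetical, and it returns None when nvidia-smi is missing or reports a value that is not a number.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse lines like '12034, 24564' into (used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Return one (used_mib, total_mib) tuple per GPU, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        return parse_memory_csv(result.stdout)
    except (subprocess.CalledProcessError, ValueError):
        return None


if __name__ == "__main__":
    print(gpu_memory())
```

Calling gpu_memory() at the same point in the web demo and in the eval script (for example right after the model is loaded, and again after the first batch) would show whether the extra memory is already present after loading or only appears during evaluation.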

liangdebugger commented on June 9, 2024

> With the original CUDA 11.8 image, I tried many combinations of torch and xformers versions, and new problems kept appearing.
> With a new CUDA 12.1 image, the Floating point exception (core dumped) disappeared.
> I suspect the cause is related to CUDA / cuDNN.
> git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 --quant 4 (OK)
> git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 --quant 8 (OK)
> git0322-CogVLM/basic_demo# python web_demo.py --from_pretrained cogvlm-base-490 --version base --stream_chat --fp16 (OK)

May I ask which versions of torch and xformers you ended up using?

elesun2018 commented on June 9, 2024

torch 2.2.1+cu121
xformers 0.0.25
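For anyone comparing their own environment against the versions above, a small standard-library script can report what is installed without importing the packages themselves. This is a minimal sketch; the function name is hypothetical, and the package list can be extended as needed.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["torch", "xformers"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Because it reads package metadata only, it still works in an environment where importing torch itself crashes.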

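elesun2018 commented on June 9, 2024

Hello, I have a follow-up question about quantization.
After 4-bit quantization, GPU memory usage is 12 GB, but host memory (RAM) usage is still high and I hit an OOM error.
How can I reduce RAM usage?
Is it possible to save the quantized model and then load the quantized model for inference? Would that lower RAM usage, and is it feasible? Thanks!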
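Before changing the loading code, it may help to confirm how much host RAM the process actually peaks at. The sketch below uses the standard-library resource module (Unix only) to read the peak resident set size; the bytearray allocation is only a stand-in for the real model-loading step, and the helper name is hypothetical.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 2**20 if sys.platform == "darwin" else 2**10
    return peak / divisor


before = peak_rss_mib()
buffer = bytearray(64 * 2**20)  # stand-in for the model-loading step
after = peak_rss_mib()

print(f"peak RSS before: {before:.0f} MiB, after: {after:.0f} MiB")
```

Logging this value right after the checkpoint is read and again after quantization would show which step drives the peak. That is the information needed to judge whether loading a pre-quantized checkpoint would help.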

elesun2018 commented on June 9, 2024

Could you answer this when you have time? Thanks.
