Comments (7)
Ran into the same problem:
- Without flash-attn, batch_size=1 gives normal results, but with batch_size>1 the samples that contain padding come out of the model as NaN.
- Installing flash-attn fixed it.
from qwen.
Hi, I'm not sure I understand your use case. May I know what results you were expecting? You have literally prevented the initial position from attending to itself, so it should be expected that the model does not know what the next token would be.
from qwen.
Yeah, sorry that I didn't make it clear.
I was trying to use the model.forward function directly, rather than calling model.generate, in order to observe the model's behavior in the forward pass.
My inputs are of different lengths, so I have to pad them to the same length. I used left padding, prepending <|endoftext|> pad tokens. In my opinion, those pad tokens should not be attended to, and attention_mask is used for exactly this: setting those positions to 0 so that the model won't attend to the pad tokens in the forward pass.
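For concreteness, the left-padding scheme described above can be sketched in plain Python. The `left_pad` helper and `pad_id=0` are illustrative only, not any Qwen tokenizer API; in practice the real pad token id (the `<|endoftext|>` id here) would be used.

```python
# Illustrative sketch of left padding: pads go on the LEFT of each sequence,
# and attention_mask is 0 for pad positions, 1 for real tokens.
# left_pad is a made-up helper; pad_id stands in for the real pad token id.
def left_pad(batch, pad_id):
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + seq)              # prepend pads
        attention_mask.append([0] * n_pad + [1] * len(seq))   # 0 = don't attend
    return input_ids, attention_mask

ids, mask = left_pad([[5, 6, 7], [8]], pad_id=0)
# ids  -> [[5, 6, 7], [0, 0, 8]]
# mask -> [[1, 1, 1], [0, 0, 1]]
```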
However, I got all NaN logits, which confuses me. I tried not to pass the attention_mask parameter, and there are no NaN values in the logits, which is I expected. So I infer that this may be the problem of the attention_mask. To further locate the problem, I tried different attention_masks, finally found out that If we set the first position to 0 (in which case the model won't attend to the first token which is a pad token), the return values of model.forward function , i.e., the logits, will all be NaN values.
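One plausible mechanism for this, sketched as a toy in plain PyTorch (this is not Qwen's actual attention kernel, just an illustration): when the causal mask is combined with a padding mask that zeroes out position 0, that query position is left with no valid key at all, so every score in its row becomes -inf and softmax over the row produces NaN, which a fused kernel can then propagate.

```python
import torch

# Toy demonstration: a query position with every key masked out
# sees a row of -inf scores, and softmax over that row returns NaN.
seq_len = 4
scores = torch.zeros(seq_len, seq_len)                    # dummy attention scores
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
pad_mask = torch.tensor([0, 1, 1, 1], dtype=torch.bool)   # position 0 masked out

allowed = causal & pad_mask.unsqueeze(0)                  # causal AND not-padding
probs = scores.masked_fill(~allowed, float("-inf")).softmax(dim=-1)

# Row 0 (the masked pad position) can attend to nothing -> all NaN.
# Rows 1..3 still have at least one valid key, so they stay finite here.
```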
I also tried the Qwen1.5-7B-Chat model, and it does not have this problem: even if I set the attention_mask of the first position to 0, the output is still free of NaN values. So I suspect this may be a problem specific to Qwen-7B-Chat.
But I may also have made a mistake; please let me know if I did.
from qwen.
And if the masked tokens in the left positions should not know what the next token is, because they are prevented from attending to themselves, why are the logits of the other, un-masked positions (the right positions) also NaN? Did I get it wrong?
from qwen.
Hi, after reading through your comments, and if I understood correctly, Qwen1.5 was working as you would expect. I would suggest just using Qwen1.5.
P.S.: Investigating the original issue is more complicated than it appeared. Was flash attention enabled? Were you following the instructions in the README to do batch inference?
from qwen.
Hi, Qwen1.0 models and code will not be updated anymore. Please try Qwen2.0 instead.
from qwen.
Running into the same problem: using a Qwen2 model for a classification (cls) task works fine with flash-attn, but NaN appears when it is not used. Also, ONNX export does not yet seem to support the flash-attn operators, so the export fails with an error.
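If the flash-attn custom ops are what blocks the ONNX export, one possible workaround (a sketch only; the checkpoint name is a placeholder, and `attn_implementation` is the standard transformers `from_pretrained` flag for selecting the attention backend) is to load the model with plain eager attention before exporting, so no flash-attn ops end up in the graph:

```python
# Sketch: select plain PyTorch ("eager") attention so the exported ONNX graph
# contains no flash-attn custom ops. Checkpoint name is a placeholder.
load_kwargs = {"attn_implementation": "eager"}
# from transformers import AutoModelForSequenceClassification
# model = AutoModelForSequenceClassification.from_pretrained(
#     "Qwen/Qwen2-7B-Instruct", **load_kwargs)
```

Whether this resolves the NaN issue without flash-attn is a separate question; it only sidesteps the unsupported operators during export.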
from qwen.
Related Issues (20)
- [BUG] The max_position_embeddings parameter in config.json for Qwen2-57B-A14B has been mistakenly set to 131072.
- [BUG] Adding regular tokens is not supported
- How to modify the model's architecture
- [BUG] vLLM inference produces garbled output
- Can Qwen's open-source models output logprobs?
- [BUG] docker_openai_api.sh reports "can't open file 'openai_api.py'"
- Why is GPU memory usage so low during inference?
- [BUG] Loss becomes 0 when fine-tuning Qwen2-7B-Instruct with SFT; how can this be fixed?
- What advantages does LLM function calling have over traditional NLP approaches?
- [BUG] The function call example in the Bailian documentation is incorrect
- Why is the tokenizer's padding_side different in Qwen/finetune.py and Qwen/eval/evaluate_ceval.py?
- [BUG] Qwen 1.8B errors out during multi-threaded inference
- [BUG] model_max_length 32768 does not work
- [BUG] Does the QWenModel module inside QWenLMHeadModel process the text information?
- The pad_token in the official inference script and in the model files are inconsistent
- The difference between Qwen-Chat-RLHF and Qwen-Chat
- [BUG] Garbled output after increasing the context length
- [BUG] Running a QLoRA-finetuned model on an Nvidia Jetson Orin NX board fails with: QuantLinear() not supported
- After AWQ quantization the output does not stop properly; unquantized inference is normal
- Can fine-tuning with a local knowledge base be supported?