
qwen's People

Contributors

artificialzeng, chywang, cyente, dlutsniper, eltociear, fyabc, ganjinzero, hanpenggit, haonan-li, huybery, hzhwcmhf, jiacheo, jianxinma, jin-hao80, jklj077, joindn, justinlin610, jxst539246, liudayiheng, logicwong, lucid1ty, lukeming-tsinghua, seanxuu, simonjjj, songt96, tuhahaha, wysaid, yangapku, yongchn, zsc19

qwen's Issues

Request: an int4 quantized model

The unquantized model fills up RAM and crashes in low-memory environments (e.g., Google Colaboratory and Kaggle notebooks). Please consider uploading an int4-quantized model to Huggingface, as was done for ChatGLM-6B.

RuntimeError: value cannot be converted to type at::Half without overflow

Running the example in the official README and demo.py from the repo both fail with this error.

File "/root/.cache/huggingface/modules/transformers_modules/Qwen/Qwen-7B-Chat/44e46a0f02169a2c4790fbcccec82cd20f4df717/qwen_generation_utils.py", line 349, in call
scores[i, self.eos_token_id] = float(2**30)
RuntimeError: value cannot be converted to type at::Half without overflow
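
The overflow is consistent with float16's range: its largest finite value is 65504, so float(2**30) cannot be stored in a Half tensor. A minimal sketch of the likely cause and a local workaround, pending an upstream fix (the vocab size and token id below are stand-ins):

import torch

scores = torch.zeros(1, 151936, dtype=torch.float16)  # stand-in for the fp16 logits
eos_token_id = 151643

# scores[0, eos_token_id] = float(2**30)  # overflows: fp16 max is 65504

# Workaround sketch: use the largest value the dtype can represent instead.
scores[0, eos_token_id] = torch.finfo(scores.dtype).max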

Flash attention gives little speedup, only about 5% faster inference

Hi,
I installed flash attention following the steps you provided, and the runtime log confirms:

use flash_attn rotary
use flash_attn rms_norm

Testing on an A100, however, installing flash attention speeds up inference (per-token generation time) by less than 5% compared with not installing it. May I ask roughly how much speedup flash attention gives in your internal tests?
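
For anyone reproducing this, a hedged micro-benchmark sketch for measuring per-token latency with and without flash-attn (the prompt and max_new_tokens are arbitrary choices; loading follows the README pattern):

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

inputs = tokenizer("请介绍一下通义千问", return_tensors="pt").to(model.device)
torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{elapsed / new_tokens * 1000:.1f} ms/token")  # run once with flash-attn, once without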

MPS does not support cumsum op with int64 input

Hello, I'm trying to run the model on an M1 Mac. Because of memory limits I added offload_folder and torch_dtype; the code is as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch

tokenizer = AutoTokenizer.from_pretrained("/Users/sniper/model/Qwen-7b-chat", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained("/Users/sniper/model/Qwen-7b-chat", device_map="auto",
                                             offload_folder="offload", torch_dtype=torch.float16,
                                             trust_remote_code=True, fp16=True).eval()


model.generation_config = GenerationConfig.from_pretrained("/Users/sniper/model/Qwen-7b-chat",
                                                           trust_remote_code=True)  
# 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)
print(response)

But the chat call (second-to-last line) raises an error:

 position_ids = attention_mask.long().cumsum(-1) - 1
RuntimeError: MPS does not support cumsum op with int64 input

What is causing this?
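
The MPS backend on PyTorch releases of that era did not implement int64 cumsum. Two common workarounds, neither Qwen-specific: set PYTORCH_ENABLE_MPS_FALLBACK=1 so the op falls back to CPU, or patch the offending line in the cached modeling code to cast before the cumsum. A minimal sketch of the cast:

import torch

# Sketch of the one-line patch: cast the mask to int32 before cumsum,
# since MPS lacks the int64 kernel on older PyTorch releases.
attention_mask = torch.ones(1, 8, dtype=torch.int64, device="mps")
position_ids = attention_mask.to(torch.int32).cumsum(-1) - 1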

About text-generation-webui integration (to the folks asking earlier: there is a web version)

A comment on huggingface might go unseen; it's livelier here.
When loading Qwen/Qwen-7B-Chat with text-generation-webui (parameters as in the first screenshot; this machine has a weak GPU but a decent CPU), only one CPU thread is used by default (second screenshot), most of the CPU sits idle, and inference is extremely slow. I checked your README but found nothing about adjusting launch parameters. Where can I adjust the launch parameters to use more CPU threads for inference? Thanks.
PS: When cloning from huggingface with Git, one file, qwen.tiktoken, is missing by default; I don't know whether that's specific to my setup.
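
One knob worth checking, offered as a hedged aside: when the model runs on CPU through PyTorch, the intra-op thread pool bounds how many cores are used. Whether text-generation-webui exposes this as a launch flag is unclear; the underlying torch setting is:

import os
import torch

# Sketch: let PyTorch's intra-op parallelism use all available cores for CPU inference.
torch.set_num_threads(os.cpu_count())
print(torch.get_num_threads())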

Evaluation methodology for tool use

Very valuable work!
However, there isn't much material on how tool use was evaluated. Could the team share the scale of the evaluation, whether training targeted specific APIs, and how well tool selection works on unseen APIs? A reply from the developers would be appreciated, thanks!

Error installing flash-attn

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

Bug tokenizing "<|endoftext|>"

在对"<|endoftext|>"进行tokenize的时候,会将其切分成多个token,而不是151643这一个token。

Script to reproduce:

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen-7B', trust_remote_code=True)
print('encode <|endoftext|>: {}'.format(tokenizer.encode('<|endoftext|>')))

Tokenization result:

encode <|endoftext|>: [27, 91, 8691, 723, 427, 91, 29]

Hoping the qwen team can fix this.
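
For context, Qwen's tokenizer wraps tiktoken, which treats special-token text in the input as plain text unless it is explicitly whitelisted. A hedged sketch of a workaround that calls the underlying tiktoken Encoding directly (whether the HF wrapper forwards allowed_special itself is an assumption):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen-7B', trust_remote_code=True)

# Whitelist the special token so tiktoken emits its single id rather than
# splitting the literal text into plain BPE tokens.
ids = tokenizer.tokenizer.encode('<|endoftext|>', allowed_special={'<|endoftext|>'})
print(ids)  # expected: [151643]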

flash-attention fails to install with an error

Installing flash-attention fails with both Python 3.8 and 3.10:

Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

Has anyone else run into this?

What are the hardware requirements for Qwen-7B?

Have the hardware requirements for training and evaluating the model been published? I need to assess hardware resource needs, but the markdown files don't seem to state them explicitly. Has anyone seen this information?

Implemented QLoRA multi-turn dialogue fine-tuning on Qwen-7B, with API and web demo support

First, thanks for open-sourcing the Qwen-7B model. I implemented QLoRA multi-turn dialogue fine-tuning on top of it; project: https://github.com/hiyouga/LLaMA-Efficient-Tuning

QLoRA instruction fine-tuning:

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path Qwen/Qwen-7B-Chat \
    --do_train \
    --dataset sharegpt_zh \
    --template chatml \
    --finetuning_type lora \
    --lora_target c_attn \
    --output_dir qwen_lora \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate 3e-5 \
    --num_train_epochs 1.0 \
    --quantization_bit 4 \
    --fp16

Web Demo:

python src/web_demo.py \
    --model_name_or_path Qwen/Qwen-7B-Chat \
    --template chatml

API deployment (OpenAI-style format):

python src/api_demo.py \
    --model_name_or_path Qwen/Qwen-7B-Chat \
    --template chatml

Also, I hope the developers can fix the tokenizer's decode method so that the skip_special_tokens argument actually takes effect; it is currently a no-op, which gets in the way of downstream development. (Fixed in the latest version.)

Corresponding source: huggingface.co/Qwen/Qwen-7B-Chat/blob/5e7f6a3f41724e7cb8ea3e3be7a1faf2bd5d6a38/tokenization_qwen.py#L228

def _decode(
    self,
    token_ids: Union[int, List[int]],
    skip_special_tokens: bool = False,
    clean_up_tokenization_spaces: bool = None,
    **kwargs,
) -> str:
    if isinstance(token_ids, int):
        token_ids = [token_ids]
    return self.tokenizer.decode(token_ids)
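
A hedged sketch of what the fix presumably looks like (the latest version reportedly already contains one): filter special ids before the underlying decode. `all_special_ids` being populated by the HF base class for this tokenizer is an assumption:

from typing import List, Union

def _decode(
    self,
    token_ids: Union[int, List[int]],
    skip_special_tokens: bool = False,
    clean_up_tokenization_spaces: bool = None,
    **kwargs,
) -> str:
    if isinstance(token_ids, int):
        token_ids = [token_ids]
    if skip_special_tokens:
        # drop special tokens before handing off to tiktoken
        token_ids = [i for i in token_ids if i not in self.all_special_ids]
    return self.tokenizer.decode(token_ids)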

No output when the input text is long

First, thanks for open-sourcing the Qwen-7B model!
When using the chat version, I get no output for long inputs. My instruction is 4722 characters long and encodes to 3172 input_ids. I changed the input-length setting in generation_config.json:

  "max_context_size": 4096

But the model's response is an empty string. Single-step debugging confirms it did not terminate early because of over-long input; it enters the normal autoregressive decoding loop, yet the first two tokens it outputs happen to be two of the tokens in stop_words_ids. The README says 8K context is supported:

Support of 8K Context Length. Both Qwen-7B and Qwen-7B-Chat support the context length of 8K, which allows inputs with long contexts.

If I truncate the instruction to 3265 characters, output returns to normal. What could be the reason? Is it simply that quality degrades on long inputs, or am I using the model incorrectly?

Problems this project needs to solve: 1. ...

1. Following the README from start to finish, the project fails to launch.
2. After downloading flash-attention, pip install csrc/layer_norm and pip install csrc/rotary both fail.
3. No streaming chat.
4. No web UI.
5. No instructions on loading a local model. Where should the local model path go? A code sample would help.
6. After setting up the environment as described and running python demo.py from a CMD prompt inside the project, it errors out at device_map="auto".
Summary: the documentation should be clearer and more complete (at minimum, following the README from start to finish should yield a runnable setup).
If this isn't improved, it will hurt adoption: however many people praise the model, there is no genuine hands-on review and no video walkthrough, because nobody can get it running from the readme.

What is the padding token?

Thanks for your amazing work. By the way, may I ask what the padding token is in your tokenizer? Without it, I don't think I can fine-tune this model.
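
A common workaround, offered as a hedged sketch rather than official guidance: reuse the vocab's <|endoftext|> token (id 151643) as the pad token, and mask padded positions out of attention and labels during fine-tuning:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)

# Reuse <|endoftext|> for padding; whether the custom tokenizer accepts this
# assignment directly is an assumption worth verifying.
tokenizer.pad_token_id = 151643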

logn attention size does not match

modeling_qwen.py, line 373

seq_end = key.size(0)
logn_tensor = self.logn_tensor[:, seq_start:seq_end, :, :]

should be

seq_start = key.size(1) - query.size(1)
seq_end = key.size(1)
logn_tensor = self.logn_tensor[:, seq_start:seq_end, :, :]

Out of memory on a 24 GB GPU

Running your demo, it won't start: the GPU runs out of memory. Does that mean only the quantized version is usable?
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.32 GiB (GPU 0; 23.65 GiB total capacity; 20.85 GiB already allocated; 1.26 GiB free; 20.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
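
As a hedged sketch (not an official recipe): a 7B model needs roughly 14-16 GB in half precision but about 28 GB in fp32, so pinning the dtype explicitly at load time often makes the difference on a 24 GB card:

import torch
from transformers import AutoModelForCausalLM

# Load the weights in fp16 (~14-16 GB for 7B parameters) instead of fp32 (~28 GB).
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).eval()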

Cannot train with deepspeed zero3

│ /root/.cache/huggingface/modules/transformers_modules/Qwen-7B/modeling_qwen.py:206 in __init__   │
│                                                                                                  │
│    203 │   │   self.use_logn_attn = config.use_logn_attn                                         │
│    204 │   │                                                                                     │
│    205 │   │   logn_list = [math.log(i, self.seq_length) if i > self.seq_length else 1 for i in  │
│ ❱  206 │   │   self.logn_tensor = torch.Tensor(logn_list)[None, :, None, None]                   │
│    207 │   │   self._ntk_cached = 1.0                                                            │
│    208 │   │                                                                                     │
│    209 │   │   self.attn_dropout = nn.Dropout(config.attn_pdrop)                                 │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py:209 in    │
│ new_tensor                                                                                       │
│                                                                                                  │
│    206 def get_new_tensor_fn_for_dtype(dtype: torch.dtype) -> Callable:                          │
│    207 │   def new_tensor(cls, *args) -> Tensor:                                                 │
│    208 │   │   device = torch.device(get_accelerator().device_name(os.environ["LOCAL_RANK"]))    │
│ ❱  209 │   │   tensor = _orig_torch_empty(0, device=device).new_empty(*args)                     │
│    210 │   │   if tensor.is_floating_point():                                                    │
│    211 │   │   │   tensor = tensor.to(dtype)                                                     │
│    212                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: new_empty(): argument 'size' must be tuple of ints, but found element of type float at pos 2049

Please look into this.
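
The traceback suggests a root cause, offered here as a hedged reading: zero3 patches tensor construction so the legacy torch.Tensor(list) constructor is routed through new_empty(*args), which interprets the list's float elements as dimension sizes. Building from data with torch.tensor avoids that path; a minimal sketch with assumed values:

import math
import torch

seq_length = 2048  # assumed config.seq_length
logn_list = [math.log(i, seq_length) if i > seq_length else 1 for i in range(1, 32768)]

# torch.Tensor(logn_list) trips deepspeed's patched constructor;
# torch.tensor builds from data and sidesteps new_empty(*args).
logn_tensor = torch.tensor(logn_list, dtype=torch.float32)[None, :, None, None]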

fail to save tokenizer

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen-7B', trust_remote_code=True)
tokenizer.save_pretrained('checkpoint')

Saving the tokenizer fails:

    vocab_files = self.save_vocabulary(save_directory, filename_prefix=filename_prefix)
TypeError: save_vocabulary() got an unexpected keyword argument 'filename_prefix'
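
The base class calls save_vocabulary(save_directory, filename_prefix=...), so the custom override needs to accept that keyword. A hedged sketch of a compatible signature (self.vocab_file holding the path to qwen.tiktoken is an assumption):

import os
import shutil
from typing import Optional, Tuple

def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
    # Accept filename_prefix to match the signature the HF base class expects.
    prefix = filename_prefix + "-" if filename_prefix else ""
    out_path = os.path.join(save_directory, prefix + "qwen.tiktoken")
    shutil.copyfile(self.vocab_file, out_path)
    return (out_path,)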

Will a stream chat interface be provided next? I tried adding one, but sometimes the output is garbled 😂

I added it in modeling_qwen.py, but the output sometimes contains garbled characters.

Here is the diff; guidance would be appreciated.

diff --git a/modeling_qwen.py b/modeling_qwen.py
index cc58746..a0361d9 100644
--- a/modeling_qwen.py
+++ b/modeling_qwen.py
@@ -883,6 +883,7 @@ class QWenLMHeadModel(QWenPreTrainedModel):
         history: Optional[HistoryType],
         system: str = "You are a helpful assistant.",
         append_history: bool = True,
+        stream: Optional[bool] = False,
     ) -> Tuple[str, HistoryType]:
 
         if history is None:
@@ -902,25 +903,39 @@ class QWenLMHeadModel(QWenPreTrainedModel):
         )
         input_ids = torch.tensor([context_tokens]).to(self.device)
 
-        outputs = self.generate(
-            input_ids,
-            stop_words_ids=stop_words_ids,
-            return_dict_in_generate=False,
-        )
+        if stream:
+            from transformers_stream_generator.main import NewGenerationMixin, StreamGenerationConfig
+            self.__class__.generate = NewGenerationMixin.generate
+            self.__class__.sample_stream = NewGenerationMixin.sample_stream
+            stream_config = StreamGenerationConfig(**self.generation_config.to_dict(), do_stream=True)
 
-        response = decode_tokens(
-            outputs[0],
-            tokenizer,
-            raw_text_len=len(raw_text),
-            context_length=len(context_tokens),
-            chat_format=self.generation_config.chat_format,
-            verbose=False,
-        )
+            def stream_generator():
+                outputs = []
+                for token in self.generate(input_ids, stop_words_ids=stop_words_ids, return_dict_in_generate=False, generation_config=stream_config):
+                    outputs.append(token.item())
+                    yield tokenizer.decode(outputs, skip_special_tokens=True)
+
+            return stream_generator()
+        else:
+            outputs = self.generate(
+                input_ids,
+                stop_words_ids=stop_words_ids,
+                return_dict_in_generate=False,
+            )
+
+            response = decode_tokens(
+                outputs[0],
+                tokenizer,
+                raw_text_len=len(raw_text),
+                context_length=len(context_tokens),
+                chat_format=self.generation_config.chat_format,
+                verbose=False,
+            )
 
-        if append_history:
-            history.append((query, response))
+            if append_history:
+                history.append((query, response))
 
-        return response, history
+            return response, history
 
     def generate(
         self,
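
On the garbled output: a BPE token can end partway through a multi-byte UTF-8 character, and decoding such a prefix yields U+FFFD replacement characters. A common fix, sketched against the stream_generator in the diff above, is to hold the text back until the trailing replacement character clears:

def stream_generator():
    outputs = []
    for token in self.generate(
        input_ids,
        stop_words_ids=stop_words_ids,
        return_dict_in_generate=False,
        generation_config=stream_config,
    ):
        outputs.append(token.item())
        text = tokenizer.decode(outputs, skip_special_tokens=True)
        if not text.endswith("\ufffd"):  # wait until the character is complete
            yield text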

Questions about label masking for SFT training

  1. Should the <|im_end|> after the system and user turns be label-masked (label_id set to -100)?
  2. Should the \n after the assistant's <|im_end|> be label-masked?

Test input:

<|im_start|>system
system test<|im_end|>
<|im_start|>user
round 1 query<|im_end|>
<|im_start|>assistant
round 1 answer<|im_end|>
<|im_start|>user
round 2 query<|im_end|>
<|im_start|>assistant
round 2 answer<|im_end|>

The tokenizer output was attached as a screenshot.
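
For reference, a hedged sketch of one common ChatML masking convention (not necessarily Qwen's official recipe): only assistant replies, including their <|im_end|> and usually the trailing \n, contribute to the loss; everything else is set to -100:

IGNORE_INDEX = -100

def build_labels(segments):
    # segments: list of (token_ids, is_assistant_reply) pairs covering the dialogue
    input_ids, labels = [], []
    for ids, is_target in segments:
        input_ids += ids
        labels += ids if is_target else [IGNORE_INDEX] * len(ids)
    return input_ids, labels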
