
baichuan-13b's Introduction

Baichuan-13B

🤗 Baichuan-13B-Base • 🤗 Baichuan-13B-Chat • 🤖 ModelScope • 💬 WeChat


中文 | English

News

  • [2023.09.06] We released Baichuan 2, our new generation of open-source models, available in 7B and 13B sizes 🔥🔥🔥
  • [2023.08.01] Updated the weights of the aligned model Baichuan-13B-Chat, improving performance in several scenarios


Introduction

Baichuan-13B is an open-source, commercially usable large language model with 13 billion parameters, developed by Baichuan Intelligent Technology as the successor to Baichuan-7B. It achieves the best results among models of its size on authoritative Chinese and English benchmarks. This release includes two versions: pre-trained (Baichuan-13B-Base) and aligned (Baichuan-13B-Chat). Baichuan-13B has the following features:

  1. Larger size, more data: Baichuan-13B scales the parameter count up to 13 billion on top of Baichuan-7B and was trained on 1.4 trillion tokens of high-quality corpus, 40% more than LLaMA-13B, making it the open-source 13B model trained on the most data to date. It supports both Chinese and English, uses ALiBi position encoding, and has a context window of 4,096 tokens.
  2. Pre-trained and aligned models released together: the pre-trained model is a "base" aimed at developers, while most end users want an aligned model with chat capability. This release therefore also includes the aligned model (Baichuan-13B-Chat), which has strong conversational ability, works out of the box, and can be deployed with just a few lines of code.
  3. More efficient inference: to reach a wider range of users, we also release int8 and int4 quantized versions, which greatly lower the hardware requirements for deployment with almost no loss in quality, making it possible to run the model on consumer GPUs such as an NVIDIA 3090.
  4. Open source, free for commercial use: Baichuan-13B is fully open for academic research; developers only need to apply by email and obtain an official commercial license to use it commercially, free of charge.

Benchmark Results

We ran 5-shot evaluations on authoritative Chinese and English benchmarks for large language models. The results are as follows:

C-Eval (5-shot)

Model STEM Social Sciences Humanities Others Average
Baichuan-7B 38.2 52.0 46.2 39.3 42.8
Chinese-Alpaca-Plus-13B 35.2 45.6 40.0 38.2 38.8
Vicuna-13B 30.5 38.2 32.5 32.5 32.8
Chinese-LLaMA-Plus-13B 30.3 38.0 32.9 29.1 32.1
Ziya-LLaMA-13B-Pretrain 27.6 34.4 32.0 28.6 30.0
LLaMA-13B 27.0 33.6 27.7 27.6 28.5
moss-moon-003-base (16B) 27.0 29.1 27.2 26.9 27.4
Baichuan-13B-Base 45.9 63.5 57.2 49.3 52.4
Baichuan-13B-Chat 43.7 64.6 56.2 49.2 51.5

MMLU (5-shot)

Model STEM Social Sciences Humanities Others Average
Vicuna-13B 40.4 60.5 49.5 58.4 52.0
LLaMA-13B 36.1 53.0 44.0 52.8 46.3
Chinese-Alpaca-Plus-13B 36.9 48.9 40.5 50.5 43.9
Ziya-LLaMA-13B-Pretrain 35.6 47.6 40.1 49.4 42.9
Baichuan-7B 35.6 48.9 38.4 48.1 42.3
Chinese-LLaMA-Plus-13B 33.1 42.8 37.0 44.6 39.2
moss-moon-003-base (16B) 22.4 22.8 24.2 24.4 23.6
Baichuan-13B-Base 41.6 60.9 47.4 58.5 51.6
Baichuan-13B-Chat 40.9 60.9 48.8 59.0 52.1

Note: we used the official MMLU evaluation protocol.

CMMLU (5-shot)

Model STEM Humanities Social Sciences Others China Specific Average
Baichuan-7B 34.4 47.5 47.6 46.6 44.3 44.0
Vicuna-13B 31.8 36.2 37.6 39.5 34.3 36.3
Chinese-Alpaca-Plus-13B 29.8 33.4 33.2 37.9 32.1 33.4
Chinese-LLaMA-Plus-13B 28.1 33.1 35.4 35.1 33.5 33.0
Ziya-LLaMA-13B-Pretrain 29.0 30.7 33.8 34.4 31.9 32.1
LLaMA-13B 29.2 30.8 31.6 33.0 30.5 31.2
moss-moon-003-base (16B) 27.2 30.4 28.8 32.6 28.7 29.6
Baichuan-13B-Base 41.7 61.1 59.8 59.0 56.4 55.3
Baichuan-13B-Chat 42.8 62.6 59.7 59.0 56.1 55.8

Note: CMMLU is a comprehensive Chinese evaluation benchmark designed to assess a model's knowledge and reasoning ability in Chinese contexts. We used its official evaluation protocol.
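
All of the numbers above are 5-shot: each test question is preceded by five solved examples before the model answers. As a rough illustration of that setup only (not the official C-Eval/MMLU/CMMLU evaluation code), a multiple-choice few-shot prompt can be assembled along these lines; the helper name and data layout below are our own:

def five_shot_prompt(dev_examples, question, choices):
    # dev_examples: list of (question, [four options], answer_letter) tuples from the dev split
    blocks = []
    for q, opts, ans in dev_examples[:5]:
        lettered = "\n".join(f"{letter}. {opt}" for letter, opt in zip("ABCD", opts))
        blocks.append(f"{q}\n{lettered}\nAnswer: {ans}")
    lettered = "\n".join(f"{letter}. {opt}" for letter, opt in zip("ABCD", choices))
    blocks.append(f"{question}\n{lettered}\nAnswer:")
    return "\n\n".join(blocks)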

Model Details

Model Hidden size Layers Attention heads Vocab size Total params Training tokens Position encoding Max length
Baichuan-7B 4,096 32 32 64,000 7,000,559,616 1.2 trillion RoPE 4,096
Baichuan-13B 5,120 40 40 64,000 13,264,901,120 1.4 trillion ALiBi 4,096
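
The total parameter count in the table can be reproduced from the architecture figures above. A minimal sketch, assuming the FFN intermediate size of 13,696 from the released model config (it is not listed in the table):

hidden, layers, vocab, ffn = 5120, 40, 64000, 13696   # ffn size taken from the published config (assumption)
embed   = vocab * hidden                              # input token embeddings
lm_head = hidden * vocab                              # output projection (untied)
attn    = hidden * 3 * hidden + hidden * hidden       # W_pack (fused QKV) + o_proj, no biases
mlp     = 3 * hidden * ffn                            # gate_proj + up_proj + down_proj
norms   = 2 * hidden                                  # two RMSNorm weight vectors per layer
total = embed + lm_head + layers * (attn + mlp + norms) + hidden  # plus the final RMSNorm
print(total)  # 13264901120, matching the table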

Inference and Deployment

The model weights, source code, and configuration needed for inference have been published on Hugging Face: Baichuan-13B-Base and Baichuan-13B-Chat. The examples below use Baichuan-13B-Chat to demonstrate several ways to run inference; the required files are downloaded from Hugging Face automatically.

Install the dependencies before running inference:

pip install -r requirements.txt

Python code

>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers.generation.utils import GenerationConfig
>>> tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
>>> model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")
>>> messages = []
>>> messages.append({"role": "user", "content": "世界上第二高的山峰是哪座"})
>>> response = model.chat(tokenizer, messages)
>>> print(response)
乔戈里峰世界第二高峰———乔戈里峰西方登山者称其为k2峰海拔高度是8611米位于喀喇昆仑山脉的中巴边境上

In the code above, the model is loaded with device_map='auto', which uses all available GPUs. To restrict which GPUs are used, set an environment variable such as export CUDA_VISIBLE_DEVICES=0,1 (which exposes only GPUs 0 and 1).
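
The same restriction can also be applied from inside a Python script, as long as the variable is set before CUDA is initialized; a minimal sketch:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # must be set before torch initializes CUDA

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat",
    device_map="auto",              # now only sees GPUs 0 and 1
    torch_dtype=torch.float16,
    trust_remote_code=True,
)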

Command-line demo

python cli_demo.py

An example of the final output:

Web demo

Run the following command with streamlit. It starts a web service locally; open the address printed to the console in your browser.

streamlit run web_demo.py

The result looks like this:

Example Outputs from Baichuan-13B-Chat

Content creation
User:
You are a content creator. Write a short-video script about Elon Musk and Bill Gates' visit to China

Baichuan-13B-Chat:
Exploring Musk and Bill Gates' Journey to China

Shot 1: (background music) The picture opens on China's bustling cityscapes.

Voice-over: Welcome to our program. Today we take you through a special visit to China by two tech giants, Elon Musk and Bill Gates.

Shot 2: (background music) Cut to Musk and Bill Gates arriving at an airport in China.

Voice-over: This visit is the first time the two have set foot in China together; here they will hold a dialogue on technological innovation, environmental protection, and education.

Shot 3: (background music) Musk and Bill Gates tour an electric-vehicle factory.

Voice-over: During the visit they first toured an electric-vehicle factory to learn about China's latest progress in new-energy vehicles.

Shot 4: (background music) Musk and Bill Gates attend a forum on artificial intelligence.

Voice-over: They then took part in a forum on artificial intelligence, discussing the field's development trends and its impact on the future.

Shot 5: (background music) Musk and Bill Gates visit a school.

Voice-over: They also visited a school to learn about China's innovative practices in education.

Shot 6: (background music) Musk and Bill Gates give media interviews.

Voice-over: Finally, in interviews with the media, they said they hoped the visit would offer some inspiration for cooperation between China and the United States in technological innovation, environmental protection, and education.

Shot 7: (background music) Cut back to the bustling cityscape.

Voice-over: That was Musk and Bill Gates' trip to China. Their visit not only showed us China's enormous potential in technological innovation, it also gave us more to think about regarding future development. Thanks for watching, and see you next time!
Ad copy
User:
As a salesperson, pitch a laundry detergent, highlighting that it cleans thoroughly, leaves no residue, has no odor, and kills germs

Baichuan-13B-Chat:
Dear customer, hello! I am a laundry detergent salesperson, and I am delighted to introduce our product to you. Our detergent has the following notable advantages:

Clean: our detergent uses advanced washing technology that penetrates deep into the fibers and effectively removes all kinds of stains, leaving your clothes looking like new. At the same time, it does not damage the color or texture of the fabric, preserving the original quality of your garments.

No residue: compared with traditional washing powder, our detergent produces less foam during washing, so clothes come out softer without a sticky feel. Its strong cleaning power washes stains out completely, avoiding the residue that washing powder tends to leave on fabric.

No odor: our detergent is made from natural plant extracts and contains no harmful chemicals, so it gives off no pungent smell during use. Even people with sensitive skin can use it with confidence, and it leaves your clothes with a light, fresh scent.

Germ removal: to meet the needs of modern healthy living, our detergent contains highly effective antibacterial ingredients that kill germs on clothing and protect your family's health. It is safe to use on anything from babies' clothes to underwear.

In short, thanks to its thorough cleaning, lack of residue, lack of odor, and germ-killing power, our detergent has become the first choice of many consumers. If you have not tried it yet, now is the best time. We are confident it will give you a brand-new laundry experience!
User:
Prepare a report on pet food, covering potential consumers, market size, market opportunities, and profit models


Baichuan-13B-Chat:

I. Introduction

As living standards rise and people care more and more about their pets, the pet food industry has grown rapidly. This report analyzes the industry's potential consumers, market size, market opportunities, and profit models, to provide useful reference information for companies in the sector.

II. Potential consumers

1. Pet owners: as the number of pet owners grows, demand for pet food keeps increasing. According to statistics, the number of urban dog and cat owners in China reached 73.55 million in 2019, up 4.6% year-on-year, and is expected to reach 87.42 million by 2023.

2. Younger consumers: younger consumers place more emphasis on health, the environment, and personalization, which makes them more willing to buy high-quality food for their pets. They are also more inclined to learn about and purchase pet food through online platforms.

III. Market size

1. According to the National Bureau of Statistics, China's pet food market reached about 102 billion RMB in 2019, up roughly 10% year-on-year, and is expected to reach 150 billion RMB by 2023.

2. By product type, dry pet food has the largest share at about 70% of the market, followed by wet pet food at about 20% and pet treats at about 10%.

IV. Market opportunities

1. Functional pet food: as consumers pay more attention to health, demand for pet food with specific functions, such as prescription diets and weight-control diets, is gradually increasing.

2. Natural and organic pet food: more and more consumers prefer natural, additive-free pet food, which creates new business opportunities.

3. Customized pet food: offering customized food for pets of different breeds, ages, and sizes to meet consumers' individual needs.

V. Profit models

1. Traditional channels: selling pet food through supermarkets, pet shops, e-commerce platforms, and other traditional channels.

2. Direct sales: selling pet food directly to consumers through a company website or third-party platforms.

3. Membership: establishing a membership program that gives members exclusive discounts and services to increase customer loyalty.

4. Value-added services: offering additional services such as pet grooming, boarding, and training to diversify revenue.

VI. Conclusion

The pet food industry has huge market potential and room for growth. Companies should seize market opportunities, track changes in consumer demand, and innovate their products
Factual Q&A
User:
What is the second highest mountain in the world?

Baichuan-13B-Chat:
Mount Qogir (K2), at an elevation of 8,611 meters, lies on the China-Pakistan border in the Karakoram range. It is among the greatest tests for mountaineers and one of the hardest peaks in the world to conquer; its height is second only to Mount Everest, making it the world's second highest peak.
User:
What does 意兴阑珊 mean?

Baichuan-13B-Chat:
"意兴阑珊" is a Chinese idiom meaning that someone's enthusiasm is low and their interest has faded. It is usually used to describe a person who has lost their original passion and interest in an activity.
User:
In reinforcement learning, what does Reward Shaping mean?

Baichuan-13B-Chat:
Reward shaping is a concept in reinforcement learning. It refers to adjusting the environment or designing a specific reward function to guide the agent toward behavior that better matches human expectations. In traditional reinforcement learning, the agent learns an optimal policy by interacting with the environment, but this can lead it to learn suboptimal or even undesirable behavior.

Reward shaping aims to solve this problem: by manually adjusting the reward function, it becomes easier for the agent to learn the behavior we want it to learn. This can be done in several ways:

1. Increase positive rewards: give the agent more reward when it exhibits the desired behavior, to encourage it.
2. Reduce negative rewards: when the agent exhibits undesired behavior, reduce the punishment, lowering its preference for that behavior.
3. Introduce new rewards: provide the agent with additional rewards to better guide it toward specific behavior.
4. Change the reward distribution: adjust the reward function so that it has a different reward distribution, influencing the agent's learning process.

In short, reward shaping is a way of guiding the agent toward better behavior by adjusting the environment and the reward function, which helps improve reinforcement learning performance.
Language understanding
User:
Xiao Zhang bought a gift for his boss and said "意思一下" ("just a small token"). The boss asked "什么意思" ("what do you mean?"), and Xiao Zhang answered "没什么意思,就是随便意思意思" ("nothing much, just a little gesture"). Explain in turn what each of Xiao Zhang's uses of "意思" means.

Baichuan-13B-Chat:
1. The first "意思": refers to the gift Xiao Zhang is giving his boss, indicating that he has bought it and intends to give it to him.
2. The second "意思": refers to Xiao Zhang's politeness and respect toward his boss, showing that he has not forgotten his boss's status and took it into account when buying the gift.
3. The third "意思": refers to Xiao Zhang's attitude toward the act of giving the gift itself, i.e. he does not want it to attract too much attention or controversy; he simply wants to express a small token of goodwill.

Inference Performance

Baichuan-13B uses ALiBi linear biases, which require less computation than rotary embeddings and noticeably improve inference performance. Compared with the standard LLaMA-13B, the measured average inference speed (tokens/s) is 31.6% higher:

Model tokens/s
LLaMA-13B 19.4
Baichuan-13B 25.4

Test environment and parameters: GPU A100-SXM4-80G, PyTorch 2.0.0+cu117, transformers 4.29.1, batch size = 1, generation length = 2048, precision fp16, based on Baichuan-13B-Base
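
For readers unfamiliar with ALiBi: instead of rotating query/key vectors as RoPE does, it adds a fixed, head-specific linear penalty on attention distance to the attention scores before the softmax, so no per-token trigonometric work is needed. A minimal sketch of the bias computation (an illustration of the technique, not code taken from Baichuan's modeling file):

import math
import torch

def alibi_slopes(n_heads):
    # one geometric sequence of slopes per head; non-power-of-two head counts
    # (Baichuan-13B has 40) interleave two such sequences
    def pow2_slopes(n):
        start = 2 ** (-(2 ** -(math.log2(n) - 3)))
        return [start * start ** i for i in range(n)]
    if math.log2(n_heads).is_integer():
        return pow2_slopes(n_heads)
    closest = 2 ** math.floor(math.log2(n_heads))
    return pow2_slopes(closest) + pow2_slopes(2 * closest)[0::2][: n_heads - closest]

def alibi_bias(n_heads, seq_len):
    # bias[h, i, j] = slope[h] * (j - i): zero on the diagonal, increasingly negative
    # for tokens further in the past; it is added to q·k before the softmax
    slopes = torch.tensor(alibi_slopes(n_heads))
    distance = torch.arange(seq_len)[None, :] - torch.arange(seq_len)[:, None]
    return slopes[:, None, None] * distance[None, :, :]

print(alibi_bias(40, 8).shape)  # torch.Size([40, 8, 8])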

Quantized Deployment

Baichuan-13B supports int8 and int4 quantization; only two lines of the inference code need to change.

Important note for users of quantization!

Read the example code below carefully, especially the first line that loads the model: it differs from the inference example above.

Developers may adapt the loading code to their own needs, but keep in mind: if the goal of quantization is to save GPU memory, load the model at its original precision onto the CPU first and quantize from there; do not pass device_map='auto' (or any other argument that loads the full-precision model directly onto the GPU) to from_pretrained.

To use int8 quantization:

model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda() 

Likewise, to use int4 quantization:

model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()

Alternatively, if you prefer not to quantize on the fly with quantize, we provide a pre-quantized int8 Chat model: Baichuan-13B-Chat-int8

model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat-int8", torch_dtype=torch.float16, trust_remote_code=True).cuda()
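
Putting the snippets above together, a minimal end-to-end int8 chat script might look like the following (load in fp16 on the CPU, quantize, move to the GPU, then call chat as in the Python example earlier); treat it as a sketch rather than an official recipe:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()   # quantize on the CPU, then move to the GPU
model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")

messages = [{"role": "user", "content": "世界上第二高的山峰是哪座"}]
print(model.chat(tokenizer, messages))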

GPU memory usage before and after quantization:

Precision GPU Mem (GB)
bf16 / fp16 26.0
int8 15.8
int4 9.7

Benchmark results of the quantized models compared with the original:

Model 5-shot C-Eval MMLU CMMLU
Baichuan-13B-Base 52.4 51.6 55.3
Baichuan-13B-Base-int8 51.2 49.9 54.5
Baichuan-13B-Base-int4 47.6 46.0 51.0

CPU Deployment

Baichuan-13B supports CPU inference, but be aware that inference on a CPU is relatively slow. Change the model loading code as follows:

model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", torch_dtype=torch.float32, trust_remote_code=True)

Running inference on a CPU requires roughly 60 GB of RAM.
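
That figure is roughly what float32 weights alone require, plus working memory; a back-of-the-envelope check (our own arithmetic, not an official breakdown):

params = 13_264_901_120            # total parameters from the model details table
weights_gib = params * 4 / 2**30   # 4 bytes per parameter in float32
print(f"{weights_gib:.1f} GiB")    # ≈ 49.4 GiB for the weights alone; activations and
                                   # loading overhead account for the rest of the ~60 GB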

Fine-tuning the Model

Developers can fine-tune Baichuan-13B-Base or Baichuan-13B-Chat for their own use. We have tested LLaMA Efficient Tuning, a fine-tuning tool compatible with Baichuan-13B, and provide examples of both full-parameter fine-tuning and LoRA fine-tuning below.

Before starting, download the LLaMA Efficient Tuning project and install its dependencies as described in its documentation.

The input data consists of json files placed in the project's data directory and selected with the --dataset option (see the examples below); separate multiple input files with commas. The json format and its fields are illustrated below:

[
    {
        "instruction": "What are the three primary colors?",
        "input": "",
        "output": "The three primary colors are red, blue, and yellow."
    },
    ....
]

The json file stores a list, and each element of the list is one sample. instruction is the user input and input is optional; if both instruction and input are provided, they are joined with \n to form the user input. output is the expected model output.
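
As an illustration of that field layout (the file name data/my_dataset.json and the helper below are hypothetical, not part of LLaMA Efficient Tuning):

import json

def build_example(sample):
    # join instruction and the optional input with "\n" to form the user prompt;
    # output is the training target
    prompt = sample["instruction"]
    if sample.get("input"):
        prompt = prompt + "\n" + sample["input"]
    return prompt, sample["output"]

with open("data/my_dataset.json", encoding="utf-8") as f:   # hypothetical file
    data = json.load(f)
print(build_example(data[0]))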

Below are example scripts that we have verified to run for the two fine-tuning scenarios.

Full-parameter Fine-tuning

We tested full-parameter fine-tuning on 8 × NVIDIA A100 80 GB GPUs with deepspeed.

Example training launch script:

deepspeed --num_gpus=8 src/train_bash.py \
    --stage sft \
    --model_name_or_path baichuan-inc/Baichuan-13B-Base \
    --do_train \
    --dataset alpaca_gpt4_en,alpaca_gpt4_zh \
    --finetuning_type full \
    --output_dir path_to_your_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \ 
    --per_device_eval_batch_size 4 \ 
    --gradient_accumulation_steps 8 \ 
    --preprocessing_num_workers 16 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 2.0 \
    --dev_ratio 0.01 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16 \
    --deepspeed deepspeed.json

Example deepspeed.json configuration (the file passed via --deepspeed above):

{
  "train_micro_batch_size_per_gpu": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16, 
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },  
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients" : true
  }
}

LoRA Fine-tuning

We tested LoRA fine-tuning on a single NVIDIA A100 80G GPU.

Example training launch script:

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path baichuan-inc/Baichuan-13B-Base \
    --do_train \
    --dataset alpaca_gpt4_en,alpaca_gpt4_zh \
    --finetuning_type lora \
    --lora_rank 8 \ 
    --lora_target W_pack \
    --output_dir path_to_your_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \ 
    --per_device_eval_batch_size 4 \ 
    --gradient_accumulation_steps 8 \ 
    --preprocessing_num_workers 16 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --num_train_epochs 2.0 \
    --dev_ratio 0.01 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --plot_loss \
    --fp16

For more detailed usage of LLaMA Efficient Tuning, please refer to the documentation on its project page.
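
Once LoRA training finishes, the adapter saved in --output_dir can be loaded on top of the base model for inference. A minimal sketch using the peft library that LLaMA Efficient Tuning builds on (the checkpoint path is the placeholder from the script above):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Base", torch_dtype=torch.float16,
    device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    "baichuan-inc/Baichuan-13B-Base", use_fast=False, trust_remote_code=True)

model = PeftModel.from_pretrained(base, "path_to_your_sft_checkpoint")
model = model.merge_and_unload()   # optionally fold the LoRA weights back into the base model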

Disclaimer

We hereby declare that our development team has not developed any application based on the Baichuan-13B model, whether on iOS, Android, the web, or any other platform. We strongly urge all users not to use the Baichuan-13B model for any activity that endangers national or social security or is unlawful. We also ask users not to use the Baichuan-13B model in internet services that have not passed appropriate security review and filing. We hope all users will abide by these principles so that technology can develop in a regulated and lawful environment.

We have done everything we can to ensure the compliance of the data used to train the model. However, despite these efforts, unforeseen problems may still arise given the complexity of the model and the data. Therefore, we accept no liability for any issues arising from the use of the Baichuan-13B open-source model, including but not limited to data security issues, public opinion risks, or any risk or problem caused by the model being misled, misused, disseminated, or otherwise improperly exploited.

License

Use of the source code in this repository is governed by the Apache 2.0 open-source license. Community use of the Baichuan-13B model is governed by the Baichuan-13B Model Community License Agreement. Baichuan-13B supports commercial use: if you use the Baichuan-13B model or its derivatives for commercial purposes, please contact the licensor at the email address below to register and apply for written authorization: [email protected]

baichuan-13b's People

Contributors

adonishong, baichuan-assistant, bc-gpd, benywon, gradientguru


baichuan-13b's Issues

When asked whether it is baichuan-13B, the model answers that it is baichuan-7B — why?

1. Download the model

from huggingface_hub import snapshot_download
snapshot_download(repo_id="baichuan-inc/Baichuan-13B-Chat", cache_dir=".")

2. Run the model

from fastapi import FastAPI, Request
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
from transformers.generation.utils import GenerationConfig
import uvicorn, json, datetime
import torch
import os
from typing import Dict, Tuple, Union, Optional
from torch.nn import Module

# build the device_map automatically
def auto_configure_device_map(num_gpus: int):
    num_trans_layers = 40
    per_gpu_layers = num_trans_layers / num_gpus
    device_map = {'model.embed_tokens': 0,
    'model.norm': num_gpus-1, 'lm_head': num_gpus-1}
    for i in range(num_trans_layers):
        device_map[f'model.layers.{i}'] = int(i//per_gpu_layers)
    return device_map

# number of GPUs
NUM_GPUS = torch.cuda.device_count() if torch.cuda.is_available() else None
# device_map
device_map = auto_configure_device_map(NUM_GPUS) if NUM_GPUS>0 else None
device = torch.device("cuda") if NUM_GPUS>0 else torch.device("cpu")
device_dtype = torch.half if NUM_GPUS>0 else torch.float

# reclaim GPU memory
def torch_gc():
    if torch.cuda.is_available():
        with torch.cuda.device(device):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()

app = FastAPI()

@app.post("/")
async def create_item(request: Request):
    global model, tokenizer
    json_post_raw = await request.json()
    json_post = json.dumps(json_post_raw)
    json_post_list = json.loads(json_post)
    prompt = json_post_list.get('prompt')
    history = json_post_list.get('history')
    max_length = json_post_list.get('max_length')
    top_p = json_post_list.get('top_p')
    temperature = json_post_list.get('temperature')
    messages = []
    messages.append({"role": "user", "content": prompt})
    response = model.chat(tokenizer, messages)
    
    now = datetime.datetime.now()
    time = now.strftime("%Y-%m-%d %H:%M:%S")
    #response = tokenizer.decode(pred.cpu()[0], skip_special_tokens=True)
    answer = {
        "response": response,
        "status": 200,
        "time": time
    }
    log = "[" + time + "] " + '", prompt:"' + prompt + '", response:"' + repr(response) + '"'
    print(log)
    torch_gc()
    return answer


if __name__ == '__main__':
    model_dir = "./baichuan-inc--Baichuan-13B-Chat/snapshots/d0a98e13222c6e82d24062f60ff491519e249744"
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, device_map=device_map, torch_dtype=torch.float16)
    model.generation_config = GenerationConfig.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False, trust_remote_code=True)
    print(model)
    model.eval()
    uvicorn.run(app, host='0.0.0.0', port=8080, workers=1)

3. Query the model

curl --location 'http://localhost:8080' \
--header 'Content-Type: application/json' \
--data '{"prompt": "你是baichuan-13b吗?", "history": []}'
{"response":"不是,我是Baichuan-7B,一个人工智能程序,可以在多个任务中提供帮助,包括但不限于回答问题、提供建议、生成代码和解释算法。","status":200,"time":"2023-07-11 17:40:08"

4. Server log

[2023-07-11 17:40:08] ", prompt:"你是baichuan-13b吗?", response:"'不是,我是Baichuan-7B,一个人工智能程序,可以在多个任务中提供帮助,包括但不限于回答问题、提供建议、生成代码和解释算法。'"
INFO:     xx.xx.xx.xx:32134 - "POST / HTTP/1.1" 200 OK

bug in cli_demo

The hint text 【steam 开关流式生成,exit 结束。】
should be
【stream 开关流式生成,exit 结束。】 — 'steam' is a typo for 'stream' (the hint means: 'stream' toggles streaming generation, 'exit' quits).

FP16 fine-tuning overflow

Has anyone tried fine-tuning with the DeepSpeed Chat code using ZeRO-2 + FP16? I am seeing underflow. After switching to BF16, some parameters of the embedding layer likewise become 0, which makes later gradients NaN so the parameters cannot be updated. How can this be solved?

Multi-GPU demo: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
Traceback:
File "/home/kemove/anaconda3/envs/instructor/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
exec(code, module.dict)
File "/home/kemove/shimin/instructor-embedding/baichuan.py", line 83, in
main()
File "/home/kemove/shimin/instructor-embedding/baichuan.py", line 74, in main
for response in model.chat(tokenizer, messages, stream=True):
File "/home/kemove/.cache/huggingface/modules/transformers_modules/baichuan-inc/Baichuan-13B-Chat/d8e1124426fb781d50266f22be116243b093774d/modeling_baichuan.py", line 527, in stream_generator
for token in self.generate(input_ids, generation_config=stream_config):
File "/home/kemove/anaconda3/envs/instructor/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/home/kemove/anaconda3/envs/instructor/lib/python3.10/site-packages/transformers_stream_generator/main.py", line 931, in sample_stream
outputs = self(
File "/home/kemove/anaconda3/envs/instructor/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/kemove/anaconda3/envs/instructor/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/kemove/.cache/huggingface/modules/transformers_modules/baichuan-inc/Baichuan-13B-Chat/d8e1124426fb781d50266f22be116243b093774d/modeling_baichuan.py", line 382, in forward
outputs = self.model(
File "/home/kemove/anaconda3/envs/instructor/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/kemove/.cache/huggingface/modules/transformers_modules/baichuan-inc/Baichuan-13B-Chat/d8e1124426fb781d50266f22be116243b093774d/modeling_baichuan.py", line 325, in forward
layer_outputs = decoder_layer(
File "/home/kemove/anaconda3/envs/instructor/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/kemove/anaconda3/envs/instructor/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/kemove/.cache/huggingface/modules/transformers_modules/baichuan-inc/Baichuan-13B-Chat/d8e1124426fb781d50266f22be116243b093774d/modeling_baichuan.py", line 175, in forward
hidden_states = self.input_layernorm(hidden_states)
File "/home/kemove/anaconda3/envs/instructor/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/kemove/anaconda3/envs/instructor/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/kemove/.cache/huggingface/modules/transformers_modules/baichuan-inc/Baichuan-13B-Chat/d8e1124426fb781d50266f22be116243b093774d/modeling_baichuan.py", line 62, in forward
return self.weight * hidden_states

How can this be fixed?

int4 quantization: "RuntimeError: CUDA Error: no kernel image is available for execution on the device"

With int4 quantization:
NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0
Running "streamlit run web_demo.py" starts normally, but the error below appears as soon as a question is asked.

[user] 你是谁?
2023-07-13 12:53:14.567 Uncaught app exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
exec(code, module.dict)
File "/root/Baichuan-13B/web_demo.py", line 72, in
main()
File "/root/Baichuan-13B/web_demo.py", line 61, in main
for response in model.chat(tokenizer, messages, stream=True):
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 527, in stream_generator
for token in self.generate(input_ids, generation_config=stream_config):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/usr/local/lib/python3.8/dist-packages/transformers_stream_generator/main.py", line 931, in sample_stream
outputs = self(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 382, in forward
outputs = self.model(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 325, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 178, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 113, in forward
proj = self.W_pack(hidden_states)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/quantizer.py", line 116, in forward
rweight = dequant4(self.weight, self.scale, input).T
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/quantizer.py", line 82, in dequant4
kernels.int4_to_fp16(
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/base.py", line 48, in call
func = self._prepare_func()
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/base.py", line 40, in _prepare_func
self._module.get_module(), self._func_name
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/base.py", line 24, in get_module
self._module[curr_device] = cuda.cuModuleLoadData(self._code)
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/library/base.py", line 94, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/library/cuda.py", line 233, in cuModuleLoadData
checkCUStatus(cuda.cuModuleLoadData(ctypes.byref(module), data))
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/library/cuda.py", line 216, in checkCUStatus
raise RuntimeError("CUDA Error: %s" % cuGetErrorString(error))
RuntimeError: CUDA Error: no kernel image is available for execution on the device

Input length exceeds the maximum length

During inference, an error is raised when the input is longer than the maximum length the model supports. How can this be handled?

AMD

Is an AMD GPU supported?

Running the quantization step model = model.quantize(4).cuda() raises an error

NotImplementedError: Could not run 'aten::lshift.Scalar' with arguments from the 'Meta' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::lshift.Scalar' is only available for these backends: [CPU, CUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher]. CPU: registered at aten/src/ATen/RegisterCPU.cpp:31034 [kernel] CUDA: registered at aten/src/ATen/RegisterCUDA.cpp:43986 [kernel] BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback] Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback] FuncTorchDynamicLayerBackMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:491 [backend fallback] Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:280 [backend fallback] Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback] Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback] Negative: registered at ../aten/src/ATen/native/NegateFallback.cpp:19 [backend fallback] ZeroTensor: registered at ../aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback] ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:63 [backend fallback] AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradHIP: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradMPS: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradIPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradVE: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradLazy: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradMeta: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradMTIA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradPrivateUse2: registered at 
../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel] Tracer: registered at ../torch/csrc/autograd/generated/TraceType_0.cpp:16728 [kernel] AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:487 [backend fallback] AutocastCUDA: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:354 [backend fallback] FuncTorchBatched: registered at ../aten/src/ATen/functorch/BatchRulesBinaryOps.cpp:324 [kernel] FuncTorchVmapMode: fallthrough registered at ../aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback] Batched: registered at ../aten/src/ATen/LegacyBatchingRegistrations.cpp:1073 [backend fallback] VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback] FuncTorchGradWrapper: registered at ../aten/src/ATen/functorch/TensorWrapper.cpp:210 [backend fallback] PythonTLSSnapshot: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:152 [backend fallback] FuncTorchDynamicLayerFrontMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:487 [backend fallback] PythonDispatcher: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]
Traceback:
File "/mnt/data/anaconda3/envs/pytorch-cuda/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
exec(code, module.dict)
File "/mnt/data/chatglm/Baichuan-13B-main/Baichuan-13B-main/web_demo.py", line 73, in
main()
File "/mnt/data/chatglm/Baichuan-13B-main/Baichuan-13B-main/web_demo.py", line 52, in main
model, tokenizer = init_model()
File "/mnt/data/anaconda3/envs/pytorch-cuda/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 211, in wrapper
return cached_func(*args, **kwargs)
File "/mnt/data/anaconda3/envs/pytorch-cuda/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 240, in call
return self._get_or_create_cached_value(args, kwargs)
File "/mnt/data/anaconda3/envs/pytorch-cuda/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 266, in _get_or_create_cached_value
return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
File "/mnt/data/anaconda3/envs/pytorch-cuda/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 320, in _handle_cache_miss
computed_value = self._info.func(func_args, **func_kwargs)
File "/mnt/data/chatglm/Baichuan-13B-main/Baichuan-13B-main/web_demo.py", line 20, in init_model
model = model.quantize(4).cuda()
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 457, in quantize
layer.self_attn.W_pack = QLinear(
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/quantizer.py", line 99, in init
self.weight = quant4(weight, self.scale)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/quantizer.py", line 51, in quant4
qweight[:, j] = ((intweight[:, j*8+7] & 0x0f) << 28) \

Error when testing after installation

response = model.chat(tokenizer, messages)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in :1 │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils_contextl │
│ ib.py:115 in decorate_context │
│ │
│ 112 │ @functools.wraps(func) │
│ 113 │ def decorate_context(*args, **kwargs): │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │ │
│ 117 │ return decorate_context │
│ 118 │
│ │
│ C:\Users\winston/.cache\huggingface\modules\transformers_modules\baichuan-inc\Baichuan-13B-Chat\ │
│ f5f47be2adbbdceb784f334d6fa1ca2c73e65097\modeling_baichuan.py:552 in chat │
│ │
│ 549 │ │ │ return stream_generator() │
│ 550 │ │ else: │
│ 551 │ │ │ self.class.generate = PreTrainedModel.generate # disable stream │
│ ❱ 552 │ │ │ outputs = self.generate(input_ids, generation_config=generation_config) │
│ 553 │ │ │ response = tokenizer.decode(outputs[0][len(input_ids[0]):], skip_special_tok │
│ 554 │ │ │ return response │
│ 555 │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils_contextl │
│ ib.py:115 in decorate_context │
│ │
│ 112 │ @functools.wraps(func) │
│ 113 │ def decorate_context(*args, **kwargs): │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │ │
│ 117 │ return decorate_context │
│ 118 │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generati │
│ on\utils.py:1572 in generate │
│ │
│ 1569 │ │ │ ) │
│ 1570 │ │ │ │
│ 1571 │ │ │ # 13. run sample │
│ ❱ 1572 │ │ │ return self.sample( │
│ 1573 │ │ │ │ input_ids, │
│ 1574 │ │ │ │ logits_processor=logits_processor, │
│ 1575 │ │ │ │ logits_warper=logits_warper, │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generati │
│ on\utils.py:2619 in sample │
│ │
│ 2616 │ │ │ model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs) │
│ 2617 │ │ │ │
│ 2618 │ │ │ # forward pass to get next token │
│ ❱ 2619 │ │ │ outputs = self( │
│ 2620 │ │ │ │ **model_inputs, │
│ 2621 │ │ │ │ return_dict=True, │
│ 2622 │ │ │ │ output_attentions=output_attentions, │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\modu │
│ le.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py:1 │
│ 65 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ C:\Users\winston/.cache\huggingface\modules\transformers_modules\baichuan-inc\Baichuan-13B-Chat\ │
│ f5f47be2adbbdceb784f334d6fa1ca2c73e65097\modeling_baichuan.py:400 in forward │
│ │
│ 397 │ │ │
│ 398 │ │ │
│ 399 │ │ # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn) │
│ ❱ 400 │ │ outputs = self.model( │
│ 401 │ │ │ input_ids=input_ids, │
│ 402 │ │ │ past_key_values=past_key_values, │
│ 403 │ │ │ inputs_embeds=inputs_embeds, │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\modu │
│ le.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py:1 │
│ 65 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ C:\Users\winston/.cache\huggingface\modules\transformers_modules\baichuan-inc\Baichuan-13B-Chat\ │
│ f5f47be2adbbdceb784f334d6fa1ca2c73e65097\modeling_baichuan.py:325 in forward │
│ │
│ 322 │ │ │ │ │ None, │
│ 323 │ │ │ │ ) │
│ 324 │ │ │ else: │
│ ❱ 325 │ │ │ │ layer_outputs = decoder_layer( │
│ 326 │ │ │ │ │ hidden_states, │
│ 327 │ │ │ │ │ attention_mask=attention_mask, │
│ 328 │ │ │ │ │ past_key_value=past_key_value, │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\modu │
│ le.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py:1 │
│ 65 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ C:\Users\winston/.cache\huggingface\modules\transformers_modules\baichuan-inc\Baichuan-13B-Chat\ │
│ f5f47be2adbbdceb784f334d6fa1ca2c73e65097\modeling_baichuan.py:178 in forward │
│ │
│ 175 │ │ hidden_states = self.input_layernorm(hidden_states) │
│ 176 │ │ │
│ 177 │ │ # Self Attention │
│ ❱ 178 │ │ hidden_states, self_attn_weights, present_key_value = self.self_attn( │
│ 179 │ │ │ hidden_states=hidden_states, │
│ 180 │ │ │ attention_mask=attention_mask, │
│ 181 │ │ │ past_key_value=past_key_value, │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\modu │
│ le.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py:1 │
│ 65 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ C:\Users\winston/.cache\huggingface\modules\transformers_modules\baichuan-inc\Baichuan-13B-Chat\ │
│ f5f47be2adbbdceb784f334d6fa1ca2c73e65097\modeling_baichuan.py:113 in forward │
│ │
│ 110 │ │ │
│ 111 │ │ bsz, q_len, _ = hidden_states.size() │
│ 112 │ │ │
│ ❱ 113 │ │ proj = self.W_pack(hidden_states) │
│ 114 │ │ proj = proj.unflatten(-1, (3, self.hidden_size)).unsqueeze(0).transpose(0, -2).s │
│ 115 │ │ query_states = proj[0].view(bsz, q_len, self.num_heads, self.head_dim).transpose │
│ 116 │ │ key_states = proj[1].view(bsz, q_len, self.num_heads, self.head_dim).transpose(1 │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\modu │
│ le.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py:1 │
│ 65 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module.hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ C:\Users\winston\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\line │
│ ar.py:114 in forward │
│ │
│ 111 │ │ │ init.uniform_(self.bias, -bound, bound) │
│ 112 │ │
│ 113 │ def forward(self, input: Tensor) -> Tensor: │
│ ❱ 114 │ │ return F.linear(input, self.weight, self.bias) │
│ 115 │ │
│ 116 │ def extra_repr(self) -> str: │
│ 117 │ │ return 'in_features={}, out_features={}, bias={}'.format( │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: "addmm_impl_cpu
" not implemented for 'Half'
print(response)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in :1 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
NameError: name 'response' is not defined

Multi-turn conversation format

What is the format for multi-turn conversations? So far I have only seen the single-turn {'role': 'user', 'content': '你好'}.

int8 inference on a P40 is too slow


The sample code runs at roughly 1 it/s.
An A100 has about 300 TFLOPS of fp16 compute, and the officially reported speed is 25.4 tokens/s.
A P40 has about 47 TOPS of int8 compute, so the speed should be roughly 4 it/s.
Right now it feels about the same as fp16 on a P40 — is this a quantization issue, or is some library not installed properly?

Package Version


accelerate 0.20.3
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 5.0.1
anyio 3.7.1
async-timeout 4.0.2
attrs 23.1.0
certifi 2023.5.7
charset-normalizer 3.2.0
click 8.1.4
cmake 3.26.4
contourpy 1.1.0
cpm-kernels 1.0.11
cycler 0.11.0
exceptiongroup 1.1.2
fastapi 0.99.1
ffmpy 0.3.0
filelock 3.12.2
fonttools 4.40.0
frozenlist 1.3.3
fsspec 2023.6.0
gradio 3.36.1
gradio_client 0.2.8
h11 0.14.0
httpcore 0.17.3
httpx 0.24.1
huggingface-hub 0.16.4
idna 3.4
importlib-metadata 6.8.0
importlib-resources 6.0.0
install 1.3.5
Jinja2 3.1.2
jsonschema 4.18.0
jsonschema-specifications 2023.6.1
kiwisolver 1.4.4
latex2mathml 3.76.0
linkify-it-py 2.0.2
lit 16.0.6
Markdown 3.4.3
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.7.2
mdit-py-plugins 0.3.3
mdtex2html 1.2.0
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.4
networkx 3.1
numpy 1.24.4
orjson 3.9.2
packaging 23.1
pandas 2.0.3
Pillow 10.0.0
pip 23.1.2
pkgutil_resolve_name 1.3.10
protobuf 4.23.4
psutil 5.9.5
pydantic 1.10.7
pydub 0.25.1
Pygments 2.15.1
pyparsing 3.0.9
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2023.3
PyYAML 6.0
referencing 0.29.1
regex 2023.6.3
requests 2.31.0
rpds-py 0.8.10
safetensors 0.3.1
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 41.6.0
six 1.16.0
sniffio 1.3.0
sse-starlette 1.6.1
starlette 0.28.0
sympy 1.12
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1+cu118
torchvision 0.15.2+cu118
tqdm 4.65.0
transformers 4.30.2
transformers-stream-generator 0.0.4
triton 2.0.0
typing_extensions 4.5.0
tzdata 2023.3
uc-micro-py 1.0.2
urllib3 2.0.3
uvicorn 0.22.0
websockets 11.0.3
yarl 1.9.2
zipp 3.16.0

cannot import name 'AutoModelForCausalLM' from 'transformers'

Virtual environment created with conda; PyTorch installed via pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Python 3.11
requirements.txt installed
Running the web demo raises:

2023-07-11 20:22:09.863 Uncaught app exception
Traceback (most recent call last):
File "E:\tools\anaconda202304\envs\baichuan\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in run_script
exec(code, module.dict)
File "J:\baichuan\web_demo.py", line 4, in
from transformers import AutoModelForCausalLM, AutoTokenizer
ImportError: cannot import name 'AutoModelForCausalLM' from 'transformers' (E:\tools\anaconda202304\envs\baichuan\Lib\site-packages\transformers\__init__.py)

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I used peft to fine-tune the Baichuan LLM with LoRA.
I ran the same fine-tuning code as for 7B on 13B, but something went wrong:

/opt/conda/envs/trl/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
/opt/conda/envs/trl/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[21], line 1
----> 1 trainer.train()
      2 model.save_pretrained("baichuan13b/baichuan13b/")

File /opt/conda/envs/trl/lib/python3.10/site-packages/transformers/trainer.py:1537, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1532     self.model_wrapped = self.model
   1534 inner_training_loop = find_executable_batch_size(
   1535     self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
   1536 )
-> 1537 return inner_training_loop(
   1538     args=args,
   1539     resume_from_checkpoint=resume_from_checkpoint,
   1540     trial=trial,
   1541     ignore_keys_for_eval=ignore_keys_for_eval,
   1542 )

File /opt/conda/envs/trl/lib/python3.10/site-packages/transformers/trainer.py:1802, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1799     self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
   1801 with self.accelerator.accumulate(model):
-> 1802     tr_loss_step = self.training_step(model, inputs)
   1804 if (
   1805     args.logging_nan_inf_filter
   1806     and not is_torch_tpu_available()
   1807     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   1808 ):
   1809     # if loss is nan or inf simply add the average of previous logged losses
   1810     tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File /opt/conda/envs/trl/lib/python3.10/site-packages/transformers/trainer.py:2658, in Trainer.training_step(self, model, inputs)
   2656         scaled_loss.backward()
   2657 else:
-> 2658     self.accelerator.backward(loss)
   2660 return loss.detach() / self.args.gradient_accumulation_steps

File /opt/conda/envs/trl/lib/python3.10/site-packages/accelerate/accelerator.py:1842, in Accelerator.backward(self, loss, **kwargs)
   1840     return
   1841 elif self.scaler is not None:
-> 1842     self.scaler.scale(loss).backward(**kwargs)
   1843 else:
   1844     loss.backward(**kwargs)

File /opt/conda/envs/trl/lib/python3.10/site-packages/torch/_tensor.py:487, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    477 if has_torch_function_unary(self):
    478     return handle_torch_function(
    479         Tensor.backward,
    480         (self,),
   (...)
    485         inputs=inputs,
    486     )
--> 487 torch.autograd.backward(
    488     self, gradient, retain_graph, create_graph, inputs=inputs
    489 )

File /opt/conda/envs/trl/lib/python3.10/site-packages/torch/autograd/__init__.py:200, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195     retain_graph = create_graph
    197 # The reason we repeat same the comment below is that
    198 # some Python versions print out the first line of a multi-line function
    199 # calls in the traceback and some print out the last line
--> 200 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    202     allow_unreachable=True, accumulate_grad=True)

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Could you please help me fix this? Thanks.

Loading the model fails with get_input_embeddings NotImplementedError

/root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/lmflow-0.0.1-py3.9.egg/lmflow/models/ │
│ hf_decoder_model.py:228 in init
│ │
│ 225 │ │ │ # We resize the embeddings only when necessary to avoid index errors. │
│ 226 │ │ │ # If you are creating a model from scratch on a small vocab and want a │
│ 227 │ │ │ # smaller embedding size, remove this test. │
│ ❱ 228 │ │ │ embedding_size = model.get_input_embeddings().weight.shape[0] │
│ 229 │ │ │ if len(tokenizer) > embedding_size: │
│ 230 │ │ │ │ model.resize_token_embeddings(len(tokenizer)) │
│ 231 │
│ │
│ /root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/transformers/modeling_utils.py:1192 │
│ in get_input_embeddings │
│ │
│ 1189 │ │ base_model = getattr(self, self.base_model_prefix, self) │
│ 1190 │ │ print("debug", base_model, self.base_model_prefix, self) │
│ 1191 │ │ if base_model is not self: │
│ ❱ 1192 │ │ │ return base_model.get_input_embeddings() │
│ 1193 │ │ else: │
│ 1194 │ │ │ raise NotImplementedError │
│ 1195 │
│ │
│ /root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/transformers/modeling_utils.py:1194 │
│ in get_input_embeddings │
│ │
│ 1191 │ │ if base_model is not self: │
│ 1192 │ │ │ return base_model.get_input_embeddings() │
│ 1193 │ │ else: │
│ ❱ 1194 │ │ │ raise NotImplementedError │
│ 1195 │ │
│ 1196 │ def set_input_embeddings(self, value: nn.Module): │
│ 1197 │ │ """ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
NotImplementedError

Commercial use inquiry

The introduction says the base model can be used commercially after applying, but it does not say so explicitly for the chat model. Can the chat model be used commercially?

ValueError: Unrecognized configuration class <class 'transformers_modules.baichuan-13B.configuration_baichuan.BaichuanConfig'> to build an AutoTokenizer.

Baichuan-13B> python .\cli_demo.py
init model ...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████| 3/3 [02:14<00:00, 44.83s/it]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\eshanhaiGPU\AI\Baichuan-13B\cli_demo.py:78 in │
│ │
│ 75 │
│ 76 │
│ 77 if name == "main": │
│ ❱ 78 │ main() │
│ 79 │
│ │
│ C:\Users\eshanhaiGPU\AI\Baichuan-13B\cli_demo.py:40 in main │
│ │
│ 37 │
│ 38 │
│ 39 def main(stream=True): │
│ ❱ 40 │ model, tokenizer = init_model() │
│ 41 │ │
│ 42 │ messages = clear_screen() │
│ 43 │ while True: │
│ │
│ C:\Users\eshanhaiGPU\AI\Baichuan-13B\cli_demo.py:22 in init_model │
│ │
│ 19 │ model.generation_config = GenerationConfig.from_pretrained( │
│ 20 │ │ "C:\Users\eshanhaiGPU\Desktop\smb\models\baichuan-13B" │
│ 21 │ ) │
│ ❱ 22 │ tokenizer = AutoTokenizer.from_pretrained( │
│ 23 │ │ "C:\Users\eshanhaiGPU\Desktop\smb\models\baichuan-13B", │
│ 24 │ │ use_fast=False, │
│ 25 │ │ trust_remote_code=True │
│ │
│ C:\Users\eshanhaiGPU\anaconda3\envs\llm\Lib\site-packages\transformers\models\auto\tokenization_ │
│ auto.py:719 in from_pretrained │
│ │
│ 716 │ │ │ │ │ │ "in order to use this tokenizer." │
│ 717 │ │ │ │ │ ) │
│ 718 │ │ │
│ ❱ 719 │ │ raise ValueError( │
│ 720 │ │ │ f"Unrecognized configuration class {config.class} to build an AutoTokeni │
│ 721 │ │ │ f"Model type should be one of {', '.join(c.name for c in TOKENIZER_MAPPI │
│ 722 │ │ ) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Unrecognized configuration class <class 'transformers_modules.baichuan-13B.configuration_baichuan.BaichuanConfig'> to build an
AutoTokenizer.

LoRA training fails with ValueError: Please specify `target_modules` in `peft_config`

/root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/lmflow-0.0.1-py3.9.egg/lmflow/models/ │
│ auto_model.py:16 in get_model │
│ │
│ 13 │ def get_model(self, model_args, *args, **kwargs): │
│ 14 │ │ arch_type = model_args.arch_type │
│ 15 │ │ if arch_type == "decoder_only": │
│ ❱ 16 │ │ │ return HFDecoderModel(model_args, *args, **kwargs) │
│ 17 │ │ elif arch_type == "text_regression": │
│ 18 │ │ │ return TextRegressionModel(model_args, *args, **kwargs) │
│ 19 │ │ elif arch_type == "encoder_decoder": │
│ │
│ /root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/lmflow-0.0.1-py3.9.egg/lmflow/models/ │
│ hf_decoder_model.py:222 in init
│ │
│ 219 │ │ │ │ │ lora_dropout=model_args.lora_dropout, │
│ 220 │ │ │ │ │ target_modules=lora_target_modules, │
│ 221 │ │ │ │ ) │
│ ❱ 222 │ │ │ │ model = get_peft_model(model, peft_config) │
│ 223 │ │ │ │ model.print_trainable_parameters() │
│ 224 │ │ │ │
│ 225 │ │ │ # We resize the embeddings only when necessary to avoid index errors. │
│ │
│ /root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/peft/mapping.py:145 in get_peft_model │
│ │
│ 142 │ │ peft_config = _prepare_lora_config(peft_config, model_config) │
│ 143 │ │ return PeftModel(model, peft_config) │
│ 144 │ if not isinstance(peft_config, PromptLearningConfig): │
│ ❱ 145 │ │ peft_config = _prepare_lora_config(peft_config, model_config) │
│ 146 │ else: │
│ 147 │ │ peft_config = _prepare_prompt_learning_config(peft_config, model_config) │
│ 148 │ return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config) │
│ │
│ /root/anaconda3/envs/lmflow_v3/lib/python3.9/site-packages/peft/mapping.py:120 in │
prepare_lora_config │
│ │
│ 117 │ if peft_config.target_modules is None: │
│ 118 │ │ print(TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING) │
│ 119 │ │ if model_config["model_type"] not in TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES

│ ❱ 120 │ │ │ raise ValueError("Please specify target_modules in peft_config") │
│ 121 │ │ peft_config.target_modules = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[ │
│ 122 │ if len(peft_config.target_modules) == 1: │
│ 123 │ │ peft_config.fan_in_fan_out = True │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Please specify target_modules in peft_config

About the position encoding

Why was the position encoding changed from RoPE to ALiBi for the 13B model? From the perspective of length extrapolation, recent papers show that RoPE can also extrapolate well via interpolation. Was the switch to ALiBi based on other considerations? Thanks!

Please provide an int8 model for download

Otherwise every run spends a long time quantizing on the CPU.
Note that even when running with int8 quantization, this model still needs 64 GB of RAM; after quantization it only occupies about 25 GB.
In my tests it is much stronger than ChatGLM, but far behind GPT-4, and in many respects it also falls short of GPT-3.5.
Worth trying if you want to fine-tune it; for everyday use, not really.

Vicuna-13B performs unexpectedly poor on all your evaluations. Did you use delta weights directly without merging?

In your MMLU evaluation, the accuracy of Vicuna is only 24.9%, which is the same as a random guess. This is obviously wrong.
Did you directly use our delta weights (https://huggingface.co/lmsys/vicuna-13b-delta-v1.1) without merging them with the base weights?

If you correctly use our latest weights (https://github.com/lm-sys/FastChat#vicuna-weights), you should get an MMLU accuracy about 52.1 (https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard).

Loading from a local path and quantizing: the program does not respond

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("/root/autodl-tmp/model/Baichuan-13B-Base", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("/root/autodl-tmp/model/Baichuan-13B-Base", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()

Loading from the local path and then quantizing: the program does not respond and memory usage does not grow. I am not sure what is wrong with this code.
