yuanzhoulvpi2017 / zero_nlp Goto Github PK

View Code? Open in Web Editor NEW

2.5K 30.0 317.0 34.59 MB

中文nlp解决方案(大模型、数据、模型、训练、推理)

License: MIT License

Jupyter Notebook 47.98% Python 51.77% Shell 0.25%

bert nlp transformers gpt2 chatglm-6b clip gpt gpt-neox pytorch text-generation

zero_nlp's Introduction

zero to nlp

特点

🎯目标：基于pytorch、transformers做中文领域的nlp开箱即用的训练框架，提供全套的训练、微调模型（包括大模型、文本转向量、文本生成、多模态等模型）的解决方案；
💽数据：
- 从开源社区，整理了海量的训练数据，帮助用户可以快速上手；
- 同时也开放训练数据模版，可以快速处理垂直领域数据；
- 结合多线程、内存映射等更高效的数据处理方式，即使需要处理百GB规模的数据，也是轻而易举；
💻流程：每一个项目有完整的模型训练步骤，如：数据清洗、数据处理、模型构建、模型训练、模型部署、模型图解；
🔥模型：当前已经支持gpt2、clip、gpt-neox、dolly、llama、chatglm-6b、VisionEncoderDecoderModel等多模态大模型；
🚀多卡串联 ：当前，多数的大模型的尺寸已经远远大于单个消费级显卡的显存，需要将多个显卡串联，才能训练大模型、才能部署大模型。因此对部分模型结构进行修改，实现了训练时、推理时 的多卡串联功能。
⚙️模型工具：添加了大模型的词表裁切和词表扩充 教程model_modify

中文名称	文件夹名称	数据	数据清洗	大模型	模型部署	图解
中文文本分类	chinese_classifier	✅	✅	✅	❌	✅
中文`gpt2`	chinese_gpt2	✅	✅	✅	✅	❌
中文`clip`	chinese_clip	✅	✅	✅	❌	✅
图像生成中文文本	VisionEncoderDecoderModel	✅	✅	✅	❌	✅
vit核心源码介绍	vit model	❌	❌	❌	❌	✅
`Thu-ChatGlm-6b`(`v1`版本作废)	simple_thu_chatglm6b	✅	✅	✅	✅	❌
🌟chatglm-`v2`-6b🎉	chatglm_v2_6b_lora	✅	✅	✅	❌	❌
中文`dolly_v2_3b`	dolly_v2_3b	✅	✅	✅	❌	❌
中文`llama`(作废)	chinese_llama	✅	✅	✅	❌	❌
中文`bloom`	chinese_bloom	✅	✅	✅	❌	❌
中文`falcon`(注意：falcon模型和bloom结构类似)	chinese_bloom	✅	✅	✅	❌	❌
中文预训练代码	model_clm	✅	✅	✅	❌	❌
百川大模型	model_baichuan	✅	✅	✅	✅	❌
模型修剪✂️	model_modify	✅	✅	✅
llama2 流水线并行	pipeline	✅	✅	✅	❌	❌
百川2-7b-chat的`dpo`	DPO baichuan2-7b-chat	✅	✅	✅	❌	❌
训练时候，数据占比发生变化	train_data_sample	✅	✅	✅	❌	❌
internlm-base sft	internlm-sft	✅	✅	✅	❌	❌
train qwen2	train_qwen2	✅	✅	✅	✅	❌

数据流程图解

我一直觉得，数据流程通过图解的形式表达出来，其实是最清楚的，因此我都会尽可能的把每一个任务的都图解出来。

文本分类数据图解

中文gpt2

中文clip

图像生成中文文本

vit 源码

分享transformers源码解读

一直在做transformers的源码解读，可以去B站查看视频👉良睦路程序员

分享数据

一直在整理开源数据，如果有需要，可以关注公众号统计学人，回复nlp数据即可。目前还在整理数据中

zero_nlp's People

Contributors

Stargazers

Watchers

Forkers

huangzhenyang qiu-bot shallmore shangzhensen charleylu666 songqian27 o0mahan0o buaachuanwang dingsl-giser ruzwdy sevenmpp lukaschen1986 linqinhong shengjiaxiang chengturbo pandaupc dohkoxiaozu dddhu zhenqiang-sun jhx-zc pangpang97 fengyueqq itzk-sgh zxf864823150 opprash haojiepan1 moziofmoon renfengyi wanddy ariafyy ma-dan xinzaifeixiang1992 quantjia hanzhenlei767 jeremiah0425 weiguang3100 guhaifudeng goggeryang oblonka hfyydd zhangfaen jinzhetan aigc404 zhijiachen asihacker techthiyanes abbhay huoshanfei xx-zhang x22x22 whisperchi dogvane sam6513 focueai vpegasus lhrlab myxiaoyu albertbj huggingaha feynmanmeng pangweiwei mingyangbj d2gin done520 colionx wdkwdkwdk owenmasculinity han508 xinqiyang zhangzai666 yufengsoft joshuayan felixchina2000 amprosdev haidiyoushen 447806664 techventurebuilder hhy5277 erlangs zhaodice feiyangw luieswww fangcao1314 wccccp peter65374 jaredshuai ml-xx lililibaohang jangocheng whitefu jiutian12 airsyuan fjfd wlhgtc nightplano qimingnan shieldgold halleygao echoplay dominicqi

zero_nlp's Issues

我使用自定义样本出错了，是样本里面的size太大了么 chatglm-6b lora

如何给ChatGLM-6B增加特定领域的知识，然后根据这些知识来问答

我想给ChatGLM-6B增加特定领域的知识，这些知识可能是很多段的大量文本内容，然后根据这些知识来问答，应该怎么做呢

使用训练后的模型报错

代码：
`from transformers import AutoTokenizer
from thuglm.modeling_chatglm import ChatGLMForConditionalGeneration
import torch

model = ChatGLMForConditionalGeneration.from_pretrained(".//test005//checkpoint-300").cuda()
tokenizer = AutoTokenizer.from_pretrained("thuglm", trust_remote_code=True)

with torch.autocast("cuda"):
res, history = model.chat(tokenizer=tokenizer, query="你是谁? ")
# res = model.forward(input_ids=all_input.get('input_ids').cuda())
print(res)`

报错：
Traceback (most recent call last):
File "simple_api.py", line 5, in
model = ChatGLMForConditionalGeneration.from_pretrained(".//test005//checkpoint-300").cuda()
File "/usr/local/python3/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2274, in from_pretrained
raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory .//test005//checkpoint-300.

哪位跑通的大哥把conda环境配置发一下作为参考？

下面各个库都用的哪个版本？cuda用的哪个版本？
torch
transformers
peft

请问数据集是做的什么任务

是生成任务吗？为什么我跑起来之后生成的东西很乱，而且任务不明确

关于训练集数据的疑惑

看了训练数据集是那种无格式的一维文本数据。
但是ChatGLM看起来似乎是用这种数据格式训练的

 prompt += "[Round {}]\n问：{}\n答：".format(len(history), query)

例如，ChatGLM团队给AI训练的语料应该是类似:

[Round 1]问：睡不着。
答：数到114514个绵羊即可睡着。

那么用户输入"睡不着。"的时候，输入给模型的内容实质上是:

[Round 1]问：睡不着。
答：

模型根据之前的训练，自动补充了接下来的内容，从而作为聊天回复结果(也就是推测出答后面应该是什么)

但你的训练集，并没有遵循上述格式，甚至和对话一点关系都没有，比较好奇会出什么样的结果，能达到什么效果?丰富问答语料还是补充模型知识?…纯新手，只是有点疑惑

训练后没有效果，我换了data2里面的内容后，又报如下错误。。

/MyTrainer.py", line 819, in _get_train_sampler
return RandomSampler(self.train_dataset, generator=generator)
File "/home/thudm/.local/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 108, in init
"value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

训练LOSS下降太快

{'loss': 6.4776, 'learning_rate': 1.9914000000000003e-05, 'epoch': 4.17}
{'loss': 2.0588, 'learning_rate': 1.9814000000000004e-05, 'epoch': 8.33}
{'loss': 0.6284, 'learning_rate': 1.9714e-05, 'epoch': 12.5}
{'loss': 0.1956, 'learning_rate': 1.9614000000000002e-05, 'epoch': 16.67}
这个是不是下降太快了？

README 里有一处笔误

但是，你在从github上下载我这个仓库后，是看不到这几个文件的：

pytorch_model-00001-of-00008.bin、
pytorch_model-00002-of-00008.bin、
pytorch_model-00002-of-00008.bin、
pytorch_model-00003-of-00008.bin、
pytorch_model-00004-of-00008.bin、
pytorch_model-00005-of-00008.bin、
pytorch_model-00006-of-00008.bin、
pytorch_model-00007-of-00008.bin、
pytorch_model-00008-of-00008.bin、
ice_text.model

这段第二个多重复了一次哈哈

多卡微调报错呢

return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper__index_select)

infer的时候，如果问训练集问题，模型响应异常

使用最新代码微调之后，infer的时候遇到了以下bug，不知道是不是lora调整的层加载在cpu？还是别的原因

RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

code02里的代码运行训练的时候报如下错误：
本人cuda版本：11.3，torch
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

TypeError: 'NoneType' object is not subscriptable

微调训练刚开始时遇到 TypeError: 'NoneType' object is not subscriptable
问题出在modeling_chatglm的173行，计算自注意层rotary_emb时，cos_cached和sin_cached被初始化为None了，然后大概是这两个被直接用到。
不清楚是数据问题还是什么问题？设备P100

Lora并没有降低多少显存，这是为什么呢？

而实际测试中，
长度1024，不加lora， batch_size=1，显存39825MB
长度1024，加lora， batch_size=1，显存38675MB

这并没有很大的差别，这是为啥？

交友贴

老哥我也是良睦路程序员，认识下大佬。

我这里有一份Chinese clip的微调方案

首先非常赞赏你对clip方面研究的热情，关于微调方案，我这边image部分使用的原open ai的model，text部分使用的taiyi text_encoder 并冻结了一部分层：

from torch.utils.data import Dataset, DataLoader
import torch
from transformers import CLIPModel, CLIPProcessor, BertForSequenceClassification
from transformers import BertForSequenceClassification, BertConfig, BertTokenizer
import clip
from torch import nn, optim
import pandas as pd
from PIL import Image
import os

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# 加载模型
 
img_encoder, preprocess = clip.load('ViT-B/32', device=device, jit=False) 
# 
text_tokenizer = BertTokenizer.from_pretrained("IDEA-CCNL/Taiyi-CLIP-Roberta-102M-Chinese")
text_encoder = BertForSequenceClassification.from_pretrained("IDEA-CCNL/Taiyi-CLIP-Roberta-102M-Chinese").to(device)


# clip.model.convert_weights(img_encoder)
class image_caption_dataset(Dataset):
    def __init__(self, img_ls,tit_ls):
        self.img_ls = img_ls
        self.tit_ls = tit_ls

    def __len__(self):
        return len(self.img_ls)

    def __getitem__(self, idx):
        image = preprocess(Image.open( “"), stream=True).raw))
        title = self.tit_ls[idx]
        return image, title

def convert_models_to_fp32(model):
    for p in model.parameters():
        p.data = p.data.float()
        p.grad.data = p.grad.data.float()

# list_image_path = ['./imgs/0.jpeg','./imgs/1.jpeg','./imgs/2.jpeg','./imgs/3.jpeg' ] 
# list_txt = ['a good cat toy is colorful' , 'a cat toy on the desk', "there is a cat toy on the sofa", "a photo of cat toy" ]
#加载数据集
dataset = image_caption_dataset(img_ls,tit_ls)
train_dataloader = DataLoader(dataset, batch_size=32)
#设置参数
loss_img = nn.CrossEntropyLoss().to(device)
loss_txt = nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam([{'params': img_encoder.parameters()}, {'params': text_encoder.parameters()}], lr=5e-5, betas=(0.9, 0.98), eps=1e-6, weight_decay=0.2)
 
for name, param in list(text_encoder.named_parameters())[:-20]:
#     print(name)
    param.requires_grad = False
for i in range(500):
    k = 1
    for batch in train_dataloader:
        list_image, list_txt = batch  # list_images is list of image in numpy array(np.uint8), or list of PIL images

        #list_image = list_image.to(device)
        texts = text_tokenizer(list_txt, padding=True, return_tensors='pt')['input_ids'].to(device)
        images = list_image.to(device)
#         logits_per_image, logits_per_text = model(images, texts)
        logits_per_image = img_encoder.encode_image(images)
        logits_per_text = text_encoder(texts).logits
        if device == "cpu":
            ground_truth = torch.arange(len(list_image)).long().to(device)
        else:
            #ground_truth = torch.arange(batch_size).half().to(device)
            ground_truth = torch.arange(len(list_image), dtype=torch.long, device=device)


        #反向传播
        total_loss = (loss_img(logits_per_image, ground_truth) + loss_txt(logits_per_text, ground_truth)) / 2
        k += 1
        if k%10==0:
            print(k,":",total_loss)
        optimizer.zero_grad()
        total_loss.backward()
        if device == "cpu":
            optimizer.step()
        else:
#             convert_models_to_fp32(model)
            optimizer.step()
#             clip.model.convert_weights(img_encoder)

    print('[%d] loss: %.3f' %(i + 1, total_loss))
# torch.save(model, './model/model1.pkl')

 希望上面代码对大家clip方面的工作或研究有帮组，有其它更好的方案也期望能进行改善

感谢对国产大模型的支持，期待单机多卡的版本！

大佬修改名称有什么经验呢

模型微调

请问一下如何对GLM模型进行微调

参考最新的微调alpaca数据集报错

RuntimeError: Caught RuntimeError in replica 1 on device 1.
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper__index_select)

训练好的ckeckpoint模型没有变化

遇到了非常神奇的现象，用老哥的数据原封不动训练chatglm，训练完之后用checkpoint200推理，明明loss只有0.002左右了，但是问他readme里面的问题，输出结果和原来的模型依然没有任何变化。
不知道大家有没有遇到过这种情况？

closed

我用了BELLE的0.5M语料训练

训练后的输出不太对劲，每个输出后面都有自己提问。。。如下所示

还有

这些问题都是回答的时候自己生成的，是不是因为微调预处理输入问题。。。或者数据没处理好？

使用测试的data2数据集可以进行正常训练，但是使用我自己构建的数据集则无法进行正常训练

我将之前的text文本生成了csv文件
内容如下所示
"### Instruction:
怎样拥有健康的身体

Response:

多运动多睡觉
END
"
但是加载csv的时候，我观察到，datasets并没有解析csv。而且直接生成了cache，于是训练很快就结束了。
我尝试调整数据集但没有什么效果

train_chatglm6b.py 可以使用alpaca数据集格式吗？

train_chatglm6b.py data2 目录下的csv文件，都是一个字段content,这个字段内容就放纯文本就行吗？

另外，这个可以使用alpaca数据集格式吗？

peft 是啥版本啊？不兼容啊

/site-packages/peft/tuners/lora.py", line 464, in
class Linear8bitLt(bnb.nn.Linear8bitLt, LoraLayer):
AttributeError: module 'bitsandbytes' has no attribute 'nn'

微调后怎么启动一个api server供外部调用？

请问，微调后怎么启动一个api server供外部调用？

训练的epoch数

方便分享一下训练了多少个epoch后模型有效果吗？

讨论一下实时微调的技术实现可能性?

有没有可能，实现可以通过和AI对话(不管是webui还是命令行)，得到一系列聊天记录。
对ai某些回答不满意，可以强制修改聊天记录，最终使得对话符合预期。
再一键将该对话喂给AI，并入微调层
微调层足够多后，再并入大模型中

微调后的checkpoint 能保存为原来的bin格式的文件吗？

simple_thu_chatglm6b
微调后的checkpoint 能保存为原来的bin格式的文件吗？

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...

对训练数据的准备逻辑有点疑问

glm项目本身就是一个中文预训练语言模型

GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks.

chatglm 应该是基于预训练好的模型上微调的，根据博客https://chatglm.cn/blog，这个微调过程应该不是简单的对语言模型的微调，或许可以参考alpaca 的思路，做instruction 的微调？
我也不是很确定，希望交流一下。

RuntimeError when training: GET was unable to find an engine to execute this computation

Error when running:
pythonsimple_thu_chatglm6b/train_chatglm6b.py

Traceback (most recent call last):
File "train_chatglm6b.py", line 111, in
trainer.train()
File "/home/project/zero_nlp/simple_thu_chatglm6b/MyTrainer.py", line 1634, in train
return inner_training_loop(
File "/home/project/zero_nlp/simple_thu_chatglm6b/MyTrainer.py", line 1901, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/project/zero_nlp/simple_thu_chatglm6b/MyTrainer.py", line 2655, in training_step
self.scaler.scale(loss).backward()
File "/home/miniconda3/envs/chatglm/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/miniconda3/envs/chatglm/lib/python3.8/site-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: GET was unable to find an engine to execute this computation

my envs:

python=3.8.16
torch=2.0.0
nvidia 3090 with cuda version 11.5

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...报这个信息，保存不了模型文件

vocab_file 词表文件在哪里，用的哪个，放哪里呢？

Some weights of the model checkpoint at ./save_model/ were not used

执行infer的时候，日志提示了这些，是因为这些weights没能加载到模型中，所以预测结果猜测没训练之前结果一模一样吧，
但我不知道怎么改，可以让这些参数正常加载。@yuanzhoulvpi2017

sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
load model: ./save_model/
Some weights of the model checkpoint at ./save_model/ were not used when initializing ChatGLMForConditionalGeneration: ['transformer.layers.5.attention.query_key_value.lora_B.weight', 'transformer.layers.26.attention.query_key_value.lora_B.weight', 'transformer.layers.13.attention.query_key_value.lora_A.weight', 'transformer.layers.7.attention.query_key_value.lora_A.weight', 'transformer.layers.4.attention.query_key_value.lora_A.weight', 'transformer.layers.6.attention.query_key_value.lora_A.weight', 'transformer.layers.10.attention.query_key_value.lora_B.weight', 'transformer.layers.13.attention.query_key_value.lora_B.weight', 'transformer.layers.11.attention.query_key_value.lora_B.weight', 'transformer.layers.27.attention.query_key_value.lora_A.weight', 'transformer.layers.2.attention.query_key_value.lora_A.weight', 'transformer.layers.1.attention.query_key_value.lora_B.weight', 'transformer.layers.19.attention.query_key_value.lora_B.weight', 'transformer.layers.18.attention.query_key_value.lora_A.weight', 'transformer.layers.20.attention.query_key_value.lora_A.weight', 'transformer.layers.9.attention.query_key_value.lora_B.weight', 'transformer.layers.20.attention.query_key_value.lora_B.weight', 'transformer.layers.22.attention.query_key_value.lora_A.weight', 'transformer.layers.21.attention.query_key_value.lora_B.weight', 'transformer.layers.22.attention.query_key_value.lora_B.weight', 'transformer.layers.6.attention.query_key_value.lora_B.weight', 'transformer.layers.24.attention.query_key_value.lora_B.weight', 'transformer.layers.0.attention.query_key_value.lora_B.weight', 'transformer.layers.17.attention.query_key_value.lora_A.weight', 'transformer.layers.21.attention.query_key_value.lora_A.weight', 'transformer.layers.25.attention.query_key_value.lora_B.weight', 'transformer.layers.1.attention.query_key_value.lora_A.weight', 'transformer.layers.18.attention.query_key_value.lora_B.weight', 'transformer.layers.14.attention.query_key_value.lora_B.weight', 'transformer.layers.16.attention.query_key_value.lora_B.weight', 'transformer.layers.15.attention.query_key_value.lora_A.weight', 'transformer.layers.17.attention.query_key_value.lora_B.weight', 'transformer.layers.23.attention.query_key_value.lora_B.weight', 'transformer.layers.14.attention.query_key_value.lora_A.weight', 'transformer.layers.9.attention.query_key_value.lora_A.weight', 'transformer.layers.8.attention.query_key_value.lora_A.weight', 'transformer.layers.25.attention.query_key_value.lora_A.weight', 'transformer.layers.10.attention.query_key_value.lora_A.weight', 'transformer.layers.23.attention.query_key_value.lora_A.weight', 'transformer.layers.15.attention.query_key_value.lora_B.weight', 'transformer.layers.5.attention.query_key_value.lora_A.weight', 'transformer.layers.26.attention.query_key_value.lora_A.weight', 'transformer.layers.4.attention.query_key_value.lora_B.weight', 'transformer.layers.12.attention.query_key_value.lora_A.weight', 'transformer.layers.8.attention.query_key_value.lora_B.weight', 'transformer.layers.11.attention.query_key_value.lora_A.weight', 'transformer.layers.2.attention.query_key_value.lora_B.weight', 'transformer.layers.19.attention.query_key_value.lora_A.weight', 'transformer.layers.3.attention.query_key_value.lora_B.weight', 'transformer.layers.7.attention.query_key_value.lora_B.weight', 'transformer.layers.24.attention.query_key_value.lora_A.weight', 'transformer.layers.27.attention.query_key_value.lora_B.weight', 'transformer.layers.12.attention.query_key_value.lora_B.weight', 'transformer.layers.0.attention.query_key_value.lora_A.weight', 'transformer.layers.16.attention.query_key_value.lora_A.weight', 'transformer.layers.3.attention.query_key_value.lora_A.weight']

This IS expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

请教以下，单机多卡怎么实现啊？

默认配置24g显存还是会爆

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 23.87 GiB total capacity; 23.08 GiB already allocated; 9.38 MiB free; 23.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
改batch size好像没什么用，看监视是12g瞬间到24g然后就无了

能给出来安装包以及依赖环境得配置嘛？各种报错

'ChatGLMForConditionalGeneration' object has no attribute 'model_parallel' 大佬这个是因为没开多卡吗

trainer = MyTrainer(

File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 355, in init

if hasattr(model, "is_parallelizable") and model.is_parallelizable and model.model_parallel:

File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 289, in getattr

return getattr(self.base_model, name)

File "/usr/local/lib/python3.8/dist-packages/peft/tuners/lora.py", line 206, in getattr

return getattr(self.model, name)

File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1269, in getattr

raise AttributeError("'{}' object has no attribute '{}'".format(

AttributeError: 'ChatGLMForConditionalGeneration' object has no attribute 'model_parallel'

双卡3090微调aplace，有没有人遇到这个问题

训练过程中突然就断了，或者一开始的时候就断了。

主要报错信息如下：

下面还有一部分：

环境：

python 3.8
cuda 11.6
pytorch 2.0

模型对预训练数据集拟合效果很差

不知道作者是否真的跑通了[code02_训练模型全部流程.ipynb] 这个demo，训练出来的模型和原版没啥区别，数据集里的问题可以说一个都没记住

构建dataset报错

执行train_model_02.ipynb 中，构建dataset 报错，测试集和训练集小数据正常，完整数据报错，每次报错的位置步相同。错误信息如下，请帮忙解答下，谢谢！

TypeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_21060\2193834453.py in
14 dataset = Dataset.from_pandas(df=pd.read_csv("bigdata/clean_train_test/train.csv"))
15 dataset = dataset.train_test_split(test_size=0.0002)
---> 16 dataset = dataset.map(
17 function=tokenizer_text,
18 batched=True

d:\software\anaconda\lib\site-packages\datasets\dataset_dict.py in map(self, function, with_indices, with_rank, input_columns, batched, batch_size, drop_last_batch, remove_columns, keep_in_memory, load_from_cache_file, cache_file_names, writer_batch_size, features, disable_nullable, fn_kwargs, num_proc, desc)
814 cache_file_names = {k: None for k in self}
815 return DatasetDict(
--> 816 {
817 k: dataset.map(
818 function=function,

d:\software\anaconda\lib\site-packages\datasets\dataset_dict.py in (.0)
815 return DatasetDict(
816 {
--> 817 k: dataset.map(
818 function=function,
819 with_indices=with_indices,

d:\software\anaconda\lib\site-packages\datasets\arrow_dataset.py in map(self, function, with_indices, with_rank, input_columns, batched, batch_size, drop_last_batch, remove_columns, keep_in_memory, load_from_cache_file, cache_file_name, writer_batch_size, features, disable_nullable, fn_kwargs, num_proc, suffix_template, new_fingerprint, desc)
2813
...
--> 429 encodings = self._tokenizer.encode_batch(
430 batch_text_or_text_pairs,
431 add_special_tokens=add_special_tokens,

TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

只需要这下目录下的代码，就可以微调了吗

只需要这下目录下的代码/simple_thu_chatglm6b/，就可以微调了吗
我已经跑通了原来的chatGLM-6B模型。

跑 code02_训练模型全部流程.ipynb 时报错 ModuleNotFoundError: No module named 'datasets'

之前一切正常，但是跑到这里

from thuglm.modeling_chatglm import ChatGLMForConditionalGeneration
from transformers import Trainer, TrainingArguments
import random
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModel
from peft import get_peft_model, LoraConfig, TaskType
from typing import Optional
import torch

报告这个错误

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[8], line 4
      2 from transformers import Trainer, TrainingArguments
      3 import random
----> 4 from datasets import load_dataset
      5 from transformers import AutoTokenizer, AutoModel
      6 from peft import get_peft_model, LoraConfig, TaskType

ModuleNotFoundError: No module named 'datasets'

可以增量训练么？

大家有没有增量训练的思路或者资料分享一下，不胜感激:)

作者您好，希望您解答一下我的问题，谢谢

您好，非常高兴看到这么优秀简练的项目。我微调您更新的alpaca数据集代码报了一下错误，希望得到您的解答

对比MyTrainer和transformers的trainer修改

请教thu_chatglm6b下的transformers版本多少，想对比下MyTrainer具体做了哪些修改以及修改原有，谢谢~

多卡并行训练报错

我这里用了8张P40，指定了训练程序用，1，2，3，4 号卡
RuntimeError: Caught RuntimeError in replica 0 on device 0.

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmStridedBatchedExFix( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)

第二次再运行，卡在加载模型后，就不动了。错也不报。进程也杀不了了，还影响了0号卡上的生成文本应用，这时候只能重启

推理问题

我把推理代码拓到.py文件运行，出现执行异常