
zero_nlp's Introduction

zero to nlp

Features

  1. 🎯 Goal: an out-of-the-box training framework for Chinese NLP, built on pytorch and transformers, that provides complete recipes for training and fine-tuning models (large language models, text-to-vector, text generation, multimodal models, and more);
  2. 💽 Data
    • Large amounts of training data curated from the open-source community, so users can get started quickly;
    • Open data templates so that data for vertical domains can be processed quickly;
    • Efficient data handling (multithreading, memory mapping, and similar techniques), so even hundreds of GB of data can be processed with ease;
  3. 💻 Workflow: every project comes with the complete set of model-training steps, e.g. data cleaning, data processing, model construction, model training, model deployment, and model diagrams;
  4. 🔥 Models: currently supports gpt2, clip, gpt-neox, dolly, llama, chatglm-6b, VisionEncoderDecoderModel, and other (multimodal) large models;
  5. 🚀 Multi-GPU chaining: most large models are now far bigger than the memory of a single consumer GPU, so several GPUs must be chained together to train and deploy them. Parts of some model architectures have therefore been modified to support multi-GPU chaining at both training time and inference time.
  6. ⚙️ Model tools: tutorials on vocabulary trimming and vocabulary extension for large models have been added (model_modify); a minimal vocabulary-extension sketch follows this list.
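
As a rough illustration of the vocabulary-extension workflow mentioned in item 6 (not the repository's own script; the base checkpoint and the new tokens below are made up for the example), the standard transformers calls look like this:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical base checkpoint; substitute the model whose vocabulary you are extending.
base_model = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Example-only new tokens; in practice these would come from a domain corpus.
new_tokens = ["良睦路", "向量检索"]
num_added = tokenizer.add_tokens(new_tokens)

# Resize the embedding matrix so the new token ids have embedding rows to look up.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
```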

Contents

Model training

| Project | Folder |
| --- | --- |
| Chinese text classification | chinese_classifier |
| Chinese gpt2 | chinese_gpt2 |
| Chinese clip | chinese_clip |
| Image-to-Chinese-text generation | VisionEncoderDecoderModel |
| ViT core source-code walkthrough | vit model |
| Thu-ChatGlm-6b (v1, deprecated) | simple_thu_chatglm6b |
| 🌟 chatglm-v2-6b 🎉 | chatglm_v2_6b_lora |
| Chinese dolly_v2_3b | dolly_v2_3b |
| Chinese llama (deprecated) | chinese_llama |
| Chinese bloom | chinese_bloom |
| Chinese falcon (note: the falcon architecture is similar to bloom) | chinese_bloom |
| Chinese pretraining (causal LM) code | model_clm |
| Baichuan large model | model_baichuan |
| Model pruning ✂️ | model_modify |
| llama2 pipeline parallelism | pipeline |
| DPO for baichuan2-7b-chat | DPO baichuan2-7b-chat |
| Changing the data mix during training | train_data_sample |
| internlm-base sft | internlm-sft |
| train qwen2 | train_qwen2 |
Data-pipeline diagrams

I have always felt that a data pipeline is clearest when it is expressed as a diagram, so I try to draw one for every task.

Text-classification data diagram

Chinese gpt2

Chinese clip

(model diagram)

Image-to-Chinese-text generation

(model diagram)

ViT source code

Sharing transformers source-code walkthroughs

I have been doing source-code walkthroughs of transformers; you can watch the videos on Bilibili 👉 良睦路程序员

Sharing data

I have been curating open-source datasets. If you need them, follow the WeChat official account 统计学人 and reply "nlp数据". The data is still being organized.


zero_nlp's People

Contributors

xxw1995, yuanzhoulvpi2017


zero_nlp's Issues

Error when using the trained model

Code:

```python
from transformers import AutoTokenizer
from thuglm.modeling_chatglm import ChatGLMForConditionalGeneration
import torch

model = ChatGLMForConditionalGeneration.from_pretrained(".//test005//checkpoint-300").cuda()
tokenizer = AutoTokenizer.from_pretrained("thuglm", trust_remote_code=True)

with torch.autocast("cuda"):
    res, history = model.chat(tokenizer=tokenizer, query="你是谁? ")
    # res = model.forward(input_ids=all_input.get('input_ids').cuda())
    print(res)
```

Error:

```text
Traceback (most recent call last):
  File "simple_api.py", line 5, in <module>
    model = ChatGLMForConditionalGeneration.from_pretrained(".//test005//checkpoint-300").cuda()
  File "/usr/local/python3/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2274, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory .//test005//checkpoint-300.
```
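
A likely cause, assuming the checkpoint came from the LoRA-based training in this repository, is that checkpoint-300 only holds adapter weights and trainer state rather than a full pytorch_model.bin. A hedged sketch of loading it with peft (the "thuglm" directory with the base weights and the checkpoint path are taken from the snippet above; whether the checkpoint is really in peft's adapter format is an assumption):

```python
import torch
from transformers import AutoTokenizer
from thuglm.modeling_chatglm import ChatGLMForConditionalGeneration
from peft import PeftModel

# Load the full base model first, then apply the (assumed) LoRA adapter on top.
base = ChatGLMForConditionalGeneration.from_pretrained("thuglm").half().cuda()
model = PeftModel.from_pretrained(base, ".//test005//checkpoint-300")
tokenizer = AutoTokenizer.from_pretrained("thuglm", trust_remote_code=True)

with torch.autocast("cuda"):
    res, history = model.chat(tokenizer=tokenizer, query="你是谁?")
    print(res)
```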

Questions about the training data

Looking at the training dataset, it is unformatted, one-dimensional plain text.
But ChatGLM appears to have been trained with data in this format:

```python
prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
```

For example, the corpus the ChatGLM team trained on is presumably something like:

```text
[Round 1]问:睡不着。
答:数到114514个绵羊即可睡着。
```

(问: = question, 答: = answer; the example reads "Can't sleep." / "Count to 114514 sheep and you will fall asleep.")

So when the user types "睡不着。", what is actually fed to the model is:

```text
[Round 1]问:睡不着。
答:
```

Based on its earlier training, the model completes what comes next, and that completion becomes the chat reply (i.e. it predicts what should follow 答:).

But your training set does not follow this format at all, and isn't even dialogue. I'm curious what kind of results this produces and what it achieves: does it enrich the Q&A corpus or add knowledge to the model? ... I'm a complete beginner, just a bit confused.
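
For reference, here is a minimal sketch of how the prompt in the snippet quoted above is typically assembled from a chat history (the history content is just the made-up example from this issue):

```python
def build_chatglm_prompt(query: str, history: list) -> str:
    """Assemble the '[Round i] 问 / 答' prompt quoted above from a chat history."""
    prompt = ""
    for i, (old_query, old_answer) in enumerate(history):
        prompt += "[Round {}]\n问:{}\n答:{}\n".format(i, old_query, old_answer)
    # The final round leaves 答: empty so the model generates the answer.
    prompt += "[Round {}]\n问:{}\n答:".format(len(history), query)
    return prompt


# Made-up example history: one earlier (question, answer) round.
history = [("睡不着。", "数到114514个绵羊即可睡着。")]
print(build_chatglm_prompt("还是睡不着怎么办?", history))
```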

Training had no effect; after I swapped in the contents of data2, I got the following error:

/MyTrainer.py", line 819, in _get_train_sampler
return RandomSampler(self.train_dataset, generator=generator)
File "/home/thudm/.local/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 108, in init
"value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

Training loss drops too fast

```text
{'loss': 6.4776, 'learning_rate': 1.9914000000000003e-05, 'epoch': 4.17}
{'loss': 2.0588, 'learning_rate': 1.9814000000000004e-05, 'epoch': 8.33}
{'loss': 0.6284, 'learning_rate': 1.9714e-05, 'epoch': 12.5}
{'loss': 0.1956, 'learning_rate': 1.9614000000000002e-05, 'epoch': 16.67}
```

Isn't this dropping too fast?

A typo in the README

"However, after you download this repository from github, you will not see these files:"

pytorch_model-00001-of-00008.bin、
pytorch_model-00002-of-00008.bin、
pytorch_model-00002-of-00008.bin、
pytorch_model-00003-of-00008.bin、
pytorch_model-00004-of-00008.bin、
pytorch_model-00005-of-00008.bin、
pytorch_model-00006-of-00008.bin、
pytorch_model-00007-of-00008.bin、
pytorch_model-00008-of-00008.bin、
ice_text.model


In this passage, the second file is repeated once, haha.

Error when fine-tuning on multiple GPUs

```text
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper__index_select)
```
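
This error usually means the inputs and the embedding weights ended up on different GPUs. One hedged workaround, assuming the goal is to shard the model across GPUs with accelerate rather than replicate it, is to let device_map place the layers and to send inputs to the device of the first shard; the checkpoint name below is only a placeholder:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint; substitute the actual chatglm-6b weights directory.
checkpoint = "THUDM/chatglm-6b"

tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
# device_map="auto" shards layers across the visible GPUs instead of replicating
# the model, so sub-modules never have to be moved between cuda:0 and cuda:1 by hand.
model = AutoModel.from_pretrained(
    checkpoint, trust_remote_code=True, torch_dtype=torch.half, device_map="auto"
).eval()

# Inputs go to the device that holds the first shard (usually the embedding layer).
inputs = tokenizer("你好", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0]))
```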

RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

When running training with the code in code02, I get the following error.
My environment: CUDA 11.3, torch

```text
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

TypeError: 'NoneType' object is not subscriptable

I hit TypeError: 'NoneType' object is not subscriptable right at the start of fine-tuning.
The problem is at line 173 of modeling_chatglm: when the self-attention layer computes rotary_emb, cos_cached and sin_cached are initialized to None and then, apparently, used directly.
I'm not sure whether this is a data problem or something else. Device: P100.

Making friends

Hey, I'm also a 良睦路 (Liangmu Road) programmer; great to meet you.

I have a fine-tuning recipe for Chinese CLIP

First of all, I really appreciate your enthusiasm for CLIP research. As for the fine-tuning recipe: for the image side I use the original OpenAI model, and for the text side I use the Taiyi text_encoder with part of its layers frozen:

```python
from torch.utils.data import Dataset, DataLoader
import torch
from transformers import BertForSequenceClassification, BertTokenizer
import clip
from torch import nn, optim
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# 加载模型 (image encoder from OpenAI CLIP, text encoder from Taiyi)
img_encoder, preprocess = clip.load('ViT-B/32', device=device, jit=False)
text_tokenizer = BertTokenizer.from_pretrained("IDEA-CCNL/Taiyi-CLIP-Roberta-102M-Chinese")
text_encoder = BertForSequenceClassification.from_pretrained("IDEA-CCNL/Taiyi-CLIP-Roberta-102M-Chinese").to(device)
# clip.model.convert_weights(img_encoder)


class image_caption_dataset(Dataset):
    def __init__(self, img_ls, tit_ls):
        self.img_ls = img_ls  # local image paths
        self.tit_ls = tit_ls  # the matching captions

    def __len__(self):
        return len(self.img_ls)

    def __getitem__(self, idx):
        image = preprocess(Image.open(self.img_ls[idx]))
        title = self.tit_ls[idx]
        return image, title


def convert_models_to_fp32(model):
    for p in model.parameters():
        p.data = p.data.float()
        p.grad.data = p.grad.data.float()


# 加载数据集 (example data; replace with your own image paths and captions)
img_ls = ['./imgs/0.jpeg', './imgs/1.jpeg', './imgs/2.jpeg', './imgs/3.jpeg']
tit_ls = ['a good cat toy is colorful', 'a cat toy on the desk', 'there is a cat toy on the sofa', 'a photo of cat toy']
dataset = image_caption_dataset(img_ls, tit_ls)
train_dataloader = DataLoader(dataset, batch_size=32)

# 设置参数 (losses and optimizer)
loss_img = nn.CrossEntropyLoss().to(device)
loss_txt = nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam([{'params': img_encoder.parameters()}, {'params': text_encoder.parameters()}],
                       lr=5e-5, betas=(0.9, 0.98), eps=1e-6, weight_decay=0.2)

# freeze everything except the last 20 parameter tensors of the text encoder
for name, param in list(text_encoder.named_parameters())[:-20]:
    param.requires_grad = False

for i in range(500):
    k = 1
    for batch in train_dataloader:
        list_image, list_txt = batch  # list_image: batch of preprocessed images; list_txt: batch of caption strings

        texts = text_tokenizer(list(list_txt), padding=True, return_tensors='pt')['input_ids'].to(device)
        images = list_image.to(device)

        # encode both modalities, then build the CLIP-style similarity matrix
        image_features = img_encoder.encode_image(images).float()
        text_features = text_encoder(texts).logits.float()
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
        logit_scale = img_encoder.logit_scale.exp().float()
        logits_per_image = logit_scale * image_features @ text_features.t()
        logits_per_text = logits_per_image.t()

        # the i-th image in the batch should match the i-th caption
        ground_truth = torch.arange(len(list_image), dtype=torch.long, device=device)

        # 反向传播 (backward pass)
        total_loss = (loss_img(logits_per_image, ground_truth) + loss_txt(logits_per_text, ground_truth)) / 2
        k += 1
        if k % 10 == 0:
            print(k, ":", total_loss)
        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()
        # clip.model.convert_weights(img_encoder)

    print('[%d] loss: %.3f' % (i + 1, total_loss))
# torch.save(model, './model/model1.pkl')
```

I hope the code above helps with everyone's CLIP work or research; better approaches and improvements are welcome.

Error when following the latest alpaca-dataset fine-tuning example

```text
RuntimeError: Caught RuntimeError in replica 1 on device 1.
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper__index_select)
```

The trained checkpoint shows no change

I ran into something very strange: I trained chatglm on your data, completely unchanged, and then ran inference with checkpoint-200. The loss is only about 0.002, yet when I ask it the questions from the README the output is exactly the same as the original model.
Has anyone else run into this?

I trained on BELLE's 0.5M corpus

The output after training is off: every answer is followed by the model asking itself another question, as shown in the screenshots (omitted here).
These follow-up questions are generated by the model itself while answering. Is this caused by how the fine-tuning inputs are preprocessed, or by the data not being prepared properly?
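
One common cause of this symptom (a guess, not a confirmed diagnosis of this repository's preprocessing) is that the training targets never end with the tokenizer's end-of-sequence token, so the model never learns to stop after the answer. A hedged preprocessing sketch, with made-up field names `prompt` and `answer`:

```python
def build_example(prompt: str, answer: str, tokenizer, max_length: int = 512):
    """Tokenize one (prompt, answer) pair, appending EOS and masking the prompt in the labels."""
    prompt_ids = tokenizer.encode(prompt, add_special_tokens=False)
    answer_ids = tokenizer.encode(answer, add_special_tokens=False)

    # End every target with EOS so the model learns to stop instead of asking itself new questions.
    input_ids = prompt_ids + answer_ids + [tokenizer.eos_token_id]
    # -100 masks the prompt tokens out of the loss; only the answer (and EOS) are supervised.
    labels = [-100] * len(prompt_ids) + answer_ids + [tokenizer.eos_token_id]

    return {
        "input_ids": input_ids[:max_length],
        "labels": labels[:max_length],
    }
```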

Which version of peft is this? It's incompatible.

```text
  /site-packages/peft/tuners/lora.py", line 464, in <module>
    class Linear8bitLt(bnb.nn.Linear8bitLt, LoraLayer):
AttributeError: module 'bitsandbytes' has no attribute 'nn'
```

Number of training epochs

Could you share how many epochs of training it took before the model showed any effect?

Discussing the feasibility of real-time fine-tuning

Would it be possible to chat with the AI (via a webui or the command line) and collect the chat logs,
forcibly edit the answers you are unhappy with until the conversation matches expectations,
then feed that conversation back to the AI with one click and merge it into a fine-tuning layer,
and, once there are enough fine-tuning layers, merge them back into the large model?

A question about the logic behind preparing the training data

The glm project itself is already a pretrained Chinese language model:

GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks.

chatglm should be fine-tuned on top of that pretrained model. According to the blog https://chatglm.cn/blog, this fine-tuning is probably not a plain language-model fine-tune; perhaps one could follow alpaca's approach and do instruction fine-tuning instead?
I'm not sure either and would like to hear other opinions.
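
For concreteness, an alpaca-style instruction-tuning record and the usual way it is flattened into a prompt look roughly like this (the record and template below are invented for illustration and are not taken from this repository's data):

```python
# One invented alpaca-style record: instruction + optional input + target output.
record = {
    "instruction": "把下面的句子翻译成英文。",
    "input": "今天天气很好。",
    "output": "The weather is nice today.",
}

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)

prompt = PROMPT_TEMPLATE.format(**record)
target = record["output"]
# During training the model is supervised on `target` given `prompt`;
# during inference everything after "### Response:" is generated.
print(prompt + target)
```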

RuntimeError when training: GET was unable to find an engine to execute this computation

Error when running `python simple_thu_chatglm6b/train_chatglm6b.py`:

```text
Traceback (most recent call last):
  File "train_chatglm6b.py", line 111, in <module>
    trainer.train()
  File "/home/project/zero_nlp/simple_thu_chatglm6b/MyTrainer.py", line 1634, in train
    return inner_training_loop(
  File "/home/project/zero_nlp/simple_thu_chatglm6b/MyTrainer.py", line 1901, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/project/zero_nlp/simple_thu_chatglm6b/MyTrainer.py", line 2655, in training_step
    self.scaler.scale(loss).backward()
  File "/home/miniconda3/envs/chatglm/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/miniconda3/envs/chatglm/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: GET was unable to find an engine to execute this computation
```

my envs:

  • python=3.8.16
  • torch=2.0.0
  • nvidia 3090 with cuda version 11.5

Some weights of the model checkpoint at ./save_model/ were not used

When I run inference, the log prints the messages below. Since these weights were not loaded into the model, I suspect the predictions are exactly the same as before training,
but I don't know what to change to make these parameters load properly. @yuanzhoulvpi2017

sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
load model: ./save_model/
Some weights of the model checkpoint at ./save_model/ were not used when initializing ChatGLMForConditionalGeneration: ['transformer.layers.5.attention.query_key_value.lora_B.weight', 'transformer.layers.26.attention.query_key_value.lora_B.weight', 'transformer.layers.13.attention.query_key_value.lora_A.weight', 'transformer.layers.7.attention.query_key_value.lora_A.weight', 'transformer.layers.4.attention.query_key_value.lora_A.weight', 'transformer.layers.6.attention.query_key_value.lora_A.weight', 'transformer.layers.10.attention.query_key_value.lora_B.weight', 'transformer.layers.13.attention.query_key_value.lora_B.weight', 'transformer.layers.11.attention.query_key_value.lora_B.weight', 'transformer.layers.27.attention.query_key_value.lora_A.weight', 'transformer.layers.2.attention.query_key_value.lora_A.weight', 'transformer.layers.1.attention.query_key_value.lora_B.weight', 'transformer.layers.19.attention.query_key_value.lora_B.weight', 'transformer.layers.18.attention.query_key_value.lora_A.weight', 'transformer.layers.20.attention.query_key_value.lora_A.weight', 'transformer.layers.9.attention.query_key_value.lora_B.weight', 'transformer.layers.20.attention.query_key_value.lora_B.weight', 'transformer.layers.22.attention.query_key_value.lora_A.weight', 'transformer.layers.21.attention.query_key_value.lora_B.weight', 'transformer.layers.22.attention.query_key_value.lora_B.weight', 'transformer.layers.6.attention.query_key_value.lora_B.weight', 'transformer.layers.24.attention.query_key_value.lora_B.weight', 'transformer.layers.0.attention.query_key_value.lora_B.weight', 'transformer.layers.17.attention.query_key_value.lora_A.weight', 'transformer.layers.21.attention.query_key_value.lora_A.weight', 'transformer.layers.25.attention.query_key_value.lora_B.weight', 'transformer.layers.1.attention.query_key_value.lora_A.weight', 'transformer.layers.18.attention.query_key_value.lora_B.weight', 'transformer.layers.14.attention.query_key_value.lora_B.weight', 'transformer.layers.16.attention.query_key_value.lora_B.weight', 'transformer.layers.15.attention.query_key_value.lora_A.weight', 'transformer.layers.17.attention.query_key_value.lora_B.weight', 'transformer.layers.23.attention.query_key_value.lora_B.weight', 'transformer.layers.14.attention.query_key_value.lora_A.weight', 'transformer.layers.9.attention.query_key_value.lora_A.weight', 'transformer.layers.8.attention.query_key_value.lora_A.weight', 'transformer.layers.25.attention.query_key_value.lora_A.weight', 'transformer.layers.10.attention.query_key_value.lora_A.weight', 'transformer.layers.23.attention.query_key_value.lora_A.weight', 'transformer.layers.15.attention.query_key_value.lora_B.weight', 'transformer.layers.5.attention.query_key_value.lora_A.weight', 'transformer.layers.26.attention.query_key_value.lora_A.weight', 'transformer.layers.4.attention.query_key_value.lora_B.weight', 'transformer.layers.12.attention.query_key_value.lora_A.weight', 'transformer.layers.8.attention.query_key_value.lora_B.weight', 'transformer.layers.11.attention.query_key_value.lora_A.weight', 'transformer.layers.2.attention.query_key_value.lora_B.weight', 'transformer.layers.19.attention.query_key_value.lora_A.weight', 'transformer.layers.3.attention.query_key_value.lora_B.weight', 'transformer.layers.7.attention.query_key_value.lora_B.weight', 'transformer.layers.24.attention.query_key_value.lora_A.weight', 'transformer.layers.27.attention.query_key_value.lora_B.weight', 'transformer.layers.12.attention.query_key_value.lora_B.weight', 
'transformer.layers.0.attention.query_key_value.lora_A.weight', 'transformer.layers.16.attention.query_key_value.lora_A.weight', 'transformer.layers.3.attention.query_key_value.lora_A.weight']

  • This IS expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
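
Those unused weights are LoRA matrices (lora_A / lora_B), which a plain ChatGLMForConditionalGeneration.from_pretrained call does not know how to apply. As in the earlier sketch, the usual remedy is to load the base weights and attach the adapter with peft; the extra step shown here, merge_and_unload, folds the LoRA deltas into the base weights so a standalone, warning-free checkpoint can be saved. Paths are placeholders, and it is an assumption that ./save_model/ is a peft-format adapter:

```python
from transformers import AutoTokenizer
from thuglm.modeling_chatglm import ChatGLMForConditionalGeneration
from peft import PeftModel

base_model_dir = "thuglm"       # placeholder: directory with the original chatglm-6b weights
adapter_dir = "./save_model/"   # placeholder: directory where the LoRA adapter was saved

base = ChatGLMForConditionalGeneration.from_pretrained(base_model_dir).half().cuda()
model = PeftModel.from_pretrained(base, adapter_dir)

# Fold the LoRA deltas into the base weights and save a standalone checkpoint that
# from_pretrained can load directly, without warnings about unused lora_A/lora_B weights.
merged = model.merge_and_unload()
merged.save_pretrained("./merged_model/")

tokenizer = AutoTokenizer.from_pretrained(base_model_dir, trust_remote_code=True)
tokenizer.save_pretrained("./merged_model/")
```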

Memory still blows up on 24 GB with the default config

```text
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 23.87 GiB total capacity; 23.08 GiB already allocated; 9.38 MiB free; 23.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

Changing the batch size doesn't seem to help; the monitor shows usage jumping from 12 GB to 24 GB in an instant, and then it dies.
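
Some generic memory-saving levers worth trying (a sketch of common transformers options, not a claim that this repository's MyTrainer accepts exactly these arguments unchanged): a per-device batch size of 1 with gradient accumulation, fp16, and gradient checkpointing, which trades extra compute for a much smaller activation footprint.

```python
from transformers import TrainingArguments

# Hypothetical values; tune them to the GPU. The effective batch size stays at
# per_device_train_batch_size * gradient_accumulation_steps.
training_args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    fp16=True,
    gradient_checkpointing=True,   # recompute activations in backward to save memory
    logging_steps=10,
)
```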

'ChatGLMForConditionalGeneration' object has no attribute 'model_parallel'. Is this because multi-GPU isn't enabled?

```text
trainer = MyTrainer(
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 355, in __init__
    if hasattr(model, "is_parallelizable") and model.is_parallelizable and model.model_parallel:
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 289, in __getattr__
    return getattr(self.base_model, name)
  File "/usr/local/lib/python3.8/dist-packages/peft/tuners/lora.py", line 206, in __getattr__
    return getattr(self.model, name)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'ChatGLMForConditionalGeneration' object has no attribute 'model_parallel'
```

The model fits the pretraining dataset very poorly

I wonder whether the author actually ran the [code02_训练模型全部流程.ipynb] demo end to end. The trained model is basically indistinguishable from the original one and has memorized essentially none of the questions in the dataset.

Error when building the dataset

While running train_model_02.ipynb, building the dataset throws an error. Small subsets of the train and test data work fine, but the full data fails, and the position of the error is different each run. The error message is below; please help, thank you!

```text
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_21060\2193834453.py in <module>
     14 dataset = Dataset.from_pandas(df=pd.read_csv("bigdata/clean_train_test/train.csv"))
     15 dataset = dataset.train_test_split(test_size=0.0002)
---> 16 dataset = dataset.map(
     17     function=tokenizer_text,
     18     batched=True

d:\software\anaconda\lib\site-packages\datasets\dataset_dict.py in map(self, function, with_indices, with_rank, input_columns, batched, batch_size, drop_last_batch, remove_columns, keep_in_memory, load_from_cache_file, cache_file_names, writer_batch_size, features, disable_nullable, fn_kwargs, num_proc, desc)
    814     cache_file_names = {k: None for k in self}
    815     return DatasetDict(
--> 816         {
    817             k: dataset.map(
    818                 function=function,

d:\software\anaconda\lib\site-packages\datasets\dataset_dict.py in <dictcomp>(.0)
    815     return DatasetDict(
    816         {
--> 817             k: dataset.map(
    818                 function=function,
    819                 with_indices=with_indices,

d:\software\anaconda\lib\site-packages\datasets\arrow_dataset.py in map(self, function, with_indices, with_rank, input_columns, batched, batch_size, drop_last_batch, remove_columns, keep_in_memory, load_from_cache_file, cache_file_name, writer_batch_size, features, disable_nullable, fn_kwargs, num_proc, suffix_template, new_fingerprint, desc)
   2813
...
--> 429         encodings = self._tokenizer.encode_batch(
    430             batch_text_or_text_pairs,
    431             add_special_tokens=add_special_tokens,

TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
```
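
TextEncodeInput errors from the fast tokenizer usually mean some rows in the text column are None or NaN, which tends to show up only on the full data and not on small samples. A hedged clean-up sketch to run before map, assuming the text column is named "content" (adjust to the real column name):

```python
import pandas as pd
from datasets import Dataset

df = pd.read_csv("bigdata/clean_train_test/train.csv")

# Drop rows whose text is missing and force everything to str before tokenizing.
# "content" is an assumed column name.
df = df[df["content"].notna()].copy()
df["content"] = df["content"].astype(str)

dataset = Dataset.from_pandas(df, preserve_index=False)
dataset = dataset.train_test_split(test_size=0.0002)
```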

Running code02_训练模型全部流程.ipynb fails with ModuleNotFoundError: No module named 'datasets'

Everything worked fine before, but when I reached this cell:

```python
from thuglm.modeling_chatglm import ChatGLMForConditionalGeneration
from transformers import Trainer, TrainingArguments
import random
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModel
from peft import get_peft_model, LoraConfig, TaskType
from typing import Optional
import torch
```

it reported this error:

```text
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[8], line 4
      2 from transformers import Trainer, TrainingArguments
      3 import random
----> 4 from datasets import load_dataset
      5 from transformers import AutoTokenizer, AutoModel
      6 from peft import get_peft_model, LoraConfig, TaskType

ModuleNotFoundError: No module named 'datasets'
```

Error with multi-GPU parallel training

I'm using 8 P40 cards here and told the training program to use cards 1, 2, 3, 4.

```text
RuntimeError: Caught RuntimeError in replica 0 on device 0.

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmStridedBatchedExFix( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
```

On the second run it hangs right after loading the model and never moves, without reporting any error. The process can't be killed either, and it even affects the text-generation app running on card 0; at that point the only option is to reboot.

Inference problem

I copied the inference code into a .py file and ran it, and got an execution exception (screenshot omitted).
