
chatglm-finetune-lora's People

Contributors

lich99, paplorinc


chatglm-finetune-lora's Issues

Error

return {'input_ids': torch.tensor(input_ids).long(),
        'attention_mask': attention_mask,
        'labels': torch.stack(labels),
        'position_ids': torch.stack(position_ids)}
Why are both torch.tensor() and torch.stack() used here?
Without torch.stack() I sometimes get ValueError: expected sequence of length 50 at dim 1 (got 49). Could someone explain why?
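A minimal sketch of why both calls appear (toy tensors, not the repo's data): torch.stack needs equal-length tensors, and torch.tensor on a nested Python list raises exactly the ValueError quoted above when the rows differ in length, so variable-length fields have to be padded to a common length first.

import torch

seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]           # lengths 3 and 2

# torch.stack(seqs)                 -> RuntimeError: stack expects each tensor to be equal size
# torch.tensor([[1, 2, 3], [4, 5]]) -> ValueError: expected sequence of length 3 at dim 1 (got 2)

pad_id = 0                                                        # hypothetical pad value
max_len = max(len(s) for s in seqs)
padded = torch.stack([torch.cat([s, torch.full((max_len - len(s),), pad_id)]) for s in seqs])
print(padded)                                                     # tensor([[1, 2, 3], [4, 5, 0]])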

[deepspeed] OVERFLOW!

During training I frequently see what looks like a gradient-overflow warning. Could someone explain the cause and how to fix it?
Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
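A hedged note rather than a confirmed fix: this message comes from fp16 dynamic loss scaling in DeepSpeed, which skips steps while it searches for a workable loss scale. A few skipped steps at the start are normal; if it keeps happening, switching mixed precision to bf16 (on GPUs that support it) removes loss scaling entirely. A sketch using accelerate's own arguments, with example values rather than the repo's exact ones:

from accelerate import Accelerator, DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=8)
accelerator = Accelerator(
    mixed_precision='bf16',            # 'fp16' uses dynamic loss scaling; 'bf16' does not
    gradient_accumulation_steps=8,
    deepspeed_plugin=deepspeed_plugin,
)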

VRAM question

Is it possible to train on a device with 22 GB or even 20 GB of VRAM?

Positional encoding

Does this project use 2D positional encoding?
But doesn't ChatGLM use rotary positional encoding?
@lich99

position_ids.append(torch.stack([
    torch.arange(0, _max_length, device=device),
    torch.concat([torch.zeros(context_length - 1, device=device),
                  torch.arange(0, _max_length - context_length + 1, device=device)])
]).long())
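A small illustration with toy numbers (not from the repo) of what that snippet builds: ChatGLM-6B applies rotary embeddings, but it feeds them a 2D position tensor, where row 0 is the absolute position and row 1 is the block position that only starts counting inside the generated span, so the two statements above are not in conflict.

import torch

_max_length, context_length = 8, 5                                   # hypothetical sizes
position_ids = torch.stack([
    torch.arange(0, _max_length),                                    # row 0: absolute positions 0..7
    torch.cat([torch.zeros(context_length - 1, dtype=torch.long),    # row 1: zeros over the prompt,
               torch.arange(0, _max_length - context_length + 1)]),  #        then 0,1,2,... afterwards
])
print(position_ids)
# tensor([[0, 1, 2, 3, 4, 5, 6, 7],
#         [0, 0, 0, 0, 0, 1, 2, 3]])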

Can the quantized chatGLM-6b-int4 model be used for finetuning?

As a student I wanted to try your code on Colab,
but the GPU only has 15 GB and cannot fit the full model, so I switched the model path to the quantized chatGLM-6b-int4 model.
However, running train.py fails at line 215, outputs = model(**batch), with: self and mat2 must have same dtype.
Is this caused by a mismatch between the quantized model's parameters and the inputs? Can finetuning be done on the int4-quantized model at all?
If convenient, could you write a demo for finetuning the quantized model? Many thanks! (A rough sketch of one workaround follows.)
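A rough sketch of one workaround, not an official recipe: the dtype error usually means the freshly inserted LoRA weights are still fp32 while the quantized backbone computes in fp16, so casting the trainable LoRA parameters to half often clears it (whether int4 finetuning then converges well is a separate question).

import torch
from transformers import AutoTokenizer, AutoModel
import lora_utils.insert_lora

checkpoint = "THUDM/chatglm-6b-int4"                       # assumed path to the int4 checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True).half().cuda()
model = lora_utils.insert_lora.get_lora_model(model, {'r': 32, 'lora_alpha': 32,
                                                      'lora_dropout': 0.1,
                                                      'enable_lora': [True, True, True]})
for name, p in model.named_parameters():
    if 'lora_' in name:                                    # only the LoRA adapters train
        p.data = p.data.half()                             # match the fp16 activations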

Question about ZeRO

The README says: Try ZeRO 2 and no offload first, unless you encounter OOM. ZeRO 2 (no offload) > ZeRO 2 (offload) > ZeRO 3 (no offload) > ZeRO 3 (offload).
Why is ZeRO 2 the recommendation? Is it because ZeRO 3 and ZeRO 3 with offload, while using GPU memory more efficiently, sacrifice computation precision and therefore hurt the trained model's quality?
Thanks for any reply!
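One hedged reading of that ranking (the argument names below are accelerate's DeepSpeedPlugin, not the repo's code): the ordering is about speed rather than precision. ZeRO 3 and CPU offload add extra parameter partitioning and host-device traffic, so they run slower, but they compute numerically equivalent updates.

from accelerate import DeepSpeedPlugin

plugin_fastest     = DeepSpeedPlugin(zero_stage=2)                                  # try first
plugin_if_oom      = DeepSpeedPlugin(zero_stage=2, offload_optimizer_device='cpu')  # then this
plugin_last_resort = DeepSpeedPlugin(zero_stage=3, offload_optimizer_device='cpu',  # only if still OOM
                                     offload_param_device='cpu')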

CUDA error: device-side assert triggered

The model loads normally; the error appears after sending a prompt.
Loading checkpoint shards: 100%|█████████████████████████| 7/7 [00:05<00:00, 1.18it/s]
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

binary_path: F:\Anaconda\envs\chatglm\lib\site-packages\bitsandbytes\cuda_setup\libbitsandbytes_cuda116.dll
CUDA SETUP: Loading binary F:\Anaconda\envs\chatglm\lib\site-packages\bitsandbytes\cuda_setup\libbitsandbytes_cuda116.dll...
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.

The error message is as follows:
┌─────────────────────────────── Traceback (most recent call last) ────────────── ──────────────────┐
│ F:\ChatGLM\chatglm3_6b_finetune\inference_hf.py:51 in main │
│ │
│ 48 │ │ prompt: Annotated[str, typer.Option(help='')], │
│ 49 ): │
│ 50 │ model, tokenizer = load_model_and_tokenizer(model_dir) │
│ > 51 │ response, _ = model.chat(tokenizer, prompt) │
│ 52 │ print(response) │
│ 53 │
│ 54 │
│ │
│ F:\Anaconda\envs\chatglm\lib\site-packages\torch\autograd\grad_mode.py:27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ > 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ C:\Users\Administrator.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chat │
│ glm.py:1042 in chat │
│ │
│ 1039 │ │ inputs = inputs.to(self.device) │
│ 1040 │ │ eos_token_id = [tokenizer.eos_token_id, tokenizer.get_command("<|user|>"), │
│ 1041 │ │ │ │ │ │ tokenizer.get_command("<|observation|>")] │
│ > 1042 │ │ outputs = self.generate(**inputs, **gen_kwargs, eos_token_id=eos_token_id) │
│ 1043 │ │ outputs = outputs.tolist()[0][len(inputs["input_ids"][0]):-1] │
│ 1044 │ │ response = tokenizer.decode(outputs) │
│ 1045 │ │ history.append({"role": role, "content": query}) │
│ │
│ F:\Anaconda\envs\chatglm\lib\site-packages\torch\autograd\grad_mode.py:27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ > 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ F:\Anaconda\envs\chatglm\lib\site-packages\transformers\generation\utils.py:1575 in generate │
│ │
│ 1572 │ │ │ ) │
│ 1573 │ │ │ │
│ 1574 │ │ │ # 13. run sample │
│ > 1575 │ │ │ result = self._sample( │
│ 1576 │ │ │ │ input_ids, │
│ 1577 │ │ │ │ logits_processor=prepared_logits_processor, │
│ 1578 │ │ │ │ logits_warper=logits_warper, │
│ │
│ F:\Anaconda\envs\chatglm\lib\site-packages\transformers\generation\utils.py:2697 in _sample │
│ │
│ 2694 │ │ │ model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs) │
│ 2695 │ │ │ │
│ 2696 │ │ │ # forward pass to get next token │
│ > 2697 │ │ │ outputs = self( │
│ 2698 │ │ │ │ **model_inputs, │
│ 2699 │ │ │ │ return_dict=True, │
│ 2700 │ │ │ │ output_attentions=output_attentions, │
│ │
│ F:\Anaconda\envs\chatglm\lib\site-packages\torch\nn\modules\module.py:1130 in _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ > 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ C:\Users\Administrator.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chat │
│ glm.py:941 in forward │
│ │
│ 938 │ │ use_cache = use_cache if use_cache is not None else self.config.use_cache │
│ 939 │ │ return_dict = return_dict if return_dict is not None else self.config.use_return │
│ 940 │ │ │
│ > 941 │ │ transformer_outputs = self.transformer( │
│ 942 │ │ │ input_ids=input_ids, │
│ 943 │ │ │ position_ids=position_ids, │
│ 944 │ │ │ attention_mask=attention_mask, │
│ │
│ F:\Anaconda\envs\chatglm\lib\site-packages\torch\nn\modules\module.py:1130 in _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ > 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ C:\Users\Administrator.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chat │
│ glm.py:822 in forward │
│ │
│ 819 │ │ │ │ │ │ │ │ │ │ │ attention_mask], dim=-1) │
│ 820 │ │ │
│ 821 │ │ if full_attention_mask is None: │
│ > 822 │ │ │ if (attention_mask is not None and not attention_mask.all()) or (past_key_va │
│ 823 │ │ │ │ full_attention_mask = self.get_masks(input_ids, past_key_values, padding │
│ 824 │ │ │
│ 825 │ │ # Rotary positional embeddings │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

The environment is as follows:
absl-py==2.1.0
accelerate==0.27.2
aiofiles==23.2.1
aiohttp==3.9.3
aiosignal==1.3.1
altair==5.2.0
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.3.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arxiv==2.1.0
async-timeout==4.0.3
attrs==23.2.0
azure-core==1.30.1
azure-storage-blob==12.19.1
backoff==2.2.1
beautifulsoup4==4.12.3
bitsandbytes==0.37.1
bitsandbytes-windows==0.37.5
blinker==1.7.0
blis==0.7.11
Brotli==1.1.0
cachetools==5.3.3
catalogue==2.0.10
certifi==2024.2.2
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
cloudpathlib==0.16.0
colorama==0.4.6
coloredlogs==15.0.1
confection==0.1.4
contourpy==1.2.0
cpm-kernels==1.0.11
cryptography==42.0.5
curl_cffi==0.6.2
cycler==0.12.1
cymem==2.0.8
dashscope==1.13.6
dataclasses-json==0.6.4
datasets==2.18.0
deepdiff==6.7.1
Deprecated==1.2.14
deprecation==2.1.0
dill==0.3.8
distro==1.9.0
duckduckgo_search==5.1.0
effdet==0.4.1
einops==0.7.0
emoji==2.10.1
environs==9.5.0
et-xmlfile==1.1.0
exceptiongroup==1.2.0
faiss-cpu==1.7.4
fake-useragent==1.5.1
fastapi==0.109.0
feedparser==6.0.10
ffmpy==0.3.2
filelock==3.13.1
filetype==1.2.0
flatbuffers==24.3.7
fonttools==4.49.0
frozenlist==1.4.1
fschat==0.2.35
fsspec==2024.2.0
gitdb==4.0.11
GitPython==3.1.42
google-auth==2.29.0
google-auth-oauthlib==0.4.6
gradio==3.50.0
gradio_client==0.6.1
greenlet==3.0.3
grpcio==1.60.0
h11==0.14.0
h2==4.1.0
hpack==4.0.0
httpcore==1.0.4
httpx==0.27.0
httpx-sse==0.4.0
huggingface-hub==0.21.4
humanfriendly==10.0
hyperframe==6.0.1
idna==3.4
imageio==2.34.0
importlib_metadata==7.0.2
importlib_resources==6.3.1
iniconfig==2.0.0
iopath==0.1.10
isodate==0.6.1
jieba==0.42.1
Jinja2==3.1.3
joblib==1.3.2
jsonpatch==1.33
jsonpath-python==1.0.6
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
langchain==0.0.354
langchain-community==0.0.20
langchain-core==0.1.23
langchain-experimental==0.0.47
langcodes==3.3.0
langdetect==1.0.9
langsmith==0.0.87
latex2mathml==3.77.0
layoutparser==0.3.4
lazy_loader==0.3
llama-index==0.9.35
loguru==0.7.2
lxml==5.1.0
Markdown==3.5.2
markdown-it-py==3.0.0
markdown2==2.4.13
markdownify==0.11.6
MarkupSafe==2.1.5
marshmallow==3.21.1
matplotlib==3.8.3
mdtex2html==1.3.0
mdurl==0.1.2
metaphor-python==0.1.23
minio==7.2.5
mkl-fft==1.3.8
mkl-random==1.2.4
mkl-service==2.4.0
mpmath==1.3.0
msg-parser==1.2.0
multidict==6.0.5
multiprocess==0.70.16
murmurhash==1.0.10
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.2.1
nh3==0.2.15
nltk==3.8.1
numexpr==2.8.6
numpy==1.24.4
oauthlib==3.2.2
olefile==0.47
omegaconf==2.3.0
onnx==1.15.0
onnxruntime==1.15.1
openai==1.9.0
opencv-python==4.9.0.80
openpyxl==3.1.2
ordered-set==4.1.0
orjson==3.9.15
packaging==23.2
pandas==2.0.3
pathlib==1.0.1
pdf2image==1.17.0
pdfminer.six==20231228
pdfplumber==0.11.0
peft==0.9.0
pikepdf==8.4.1
Pillow==9.5.0
pillow_heif==0.15.0
pip==23.3.1
pluggy==1.4.0
portalocker==2.8.2
preshed==3.0.9
prompt-toolkit==3.0.43
protobuf==3.20.3
psutil==5.9.8
pyarrow==15.0.1
pyarrow-hotfix==0.6
pyasn1==0.6.0
pyasn1_modules==0.4.0
pyclipper==1.3.0.post5
pycocotools==2.0.7
pycparser==2.21
pycryptodome==3.20.0
pydantic==1.10.13
pydantic_core==2.16.3
pydash==7.0.7
pydeck==0.8.1b0
pydub==0.25.1
PyExecJS==1.5.1
Pygments==2.17.2
PyJWT==2.8.0
pymilvus==2.4.0
PyMuPDF==1.23.16
PyMuPDFb==1.23.9
pypandoc==1.13
pyparsing==3.1.2
pypdf==4.1.0
pypdfium2==4.28.0
pyreadline3==3.4.1
pytesseract==0.3.10
pytest==7.4.3
python-dateutil==2.9.0.post0
python-decouple==3.8
python-docx==1.1.0
python-dotenv==1.0.1
python-iso639==2024.2.7
python-magic==0.4.27
python-magic-bin==0.4.14
python-multipart==0.0.9
python-pptx==0.6.23
pytz==2024.1
pywencai==0.12.2
pywin32==306
PyYAML==6.0.1
rapidfuzz==3.6.2
rapidocr-onnxruntime==1.3.8
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
requests-oauthlib==2.0.0
rich==13.7.1
rouge-chinese==1.0.3
rpds-py==0.18.0
rsa==4.9
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
ruff==0.3.3
safetensors==0.4.2
scikit-image==0.22.0
scikit-learn==1.4.1.post1
scipy==1.12.0
semantic-version==2.10.0
sentence-transformers==2.2.2
sentencepiece==0.2.0
setuptools==68.2.2
sgmllib3k==1.0.0
shapely==2.0.3
shellingham==1.5.4
shortuuid==1.0.13
simplejson==3.19.2
six==1.16.0
smart-open==6.4.0
smmap==5.0.1
sniffio==1.3.1
socksio==1.0.0
soupsieve==2.5
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
SQLAlchemy==2.0.19
srsly==2.4.8
sse-starlette==1.8.2
starlette==0.35.0
streamlit==1.30.0
streamlit-aggrid==0.3.4.post3
streamlit-antd-components==0.3.1
streamlit-chatbox==1.1.11
streamlit-feedback==0.1.3
streamlit-modal==0.1.0
streamlit-option-menu==0.3.12
strsimpy==0.2.1
svgwrite==1.4.3
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
tensorboard==2.10.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
text2vec==1.2.9
thinc==8.2.3
threadpoolctl==3.3.0
tifffile==2024.2.12
tiktoken==0.5.2
timm==0.9.16
tokenizers==0.15.2
toml==0.10.2
tomli==2.0.1
tomlkit==0.12.0
toolz==0.12.1
torch==1.12.0+cu113
torchaudio==0.12.0+cu113
torchvision==0.13.0+cu113
tornado==6.4
tqdm==4.66.1
transformers==4.39.3
transformers-stream-generator==0.0.4
typer==0.9.0
typing_extensions==4.10.0
typing-inspect==0.9.0
tzdata==2024.1
tzlocal==5.2
ujson==5.9.0
unstructured==0.11.0
unstructured-client==0.22.0
unstructured-inference==0.7.15
unstructured.pytesseract==0.3.12
urllib3==2.1.0
uvicorn==0.28.0
validators==0.22.0
visdom==0.2.4
wasabi==1.1.2
watchdog==3.0.0
wavedrom==2.0.3.post3
wcwidth==0.2.13
weasel==0.3.4
websocket-client==1.7.0
websockets==12.0
Werkzeug==3.0.2
wheel==0.41.2
win32-setctime==1.1.0
wrapt==1.16.0
xformers==0.0.23.post1
xlrd==2.0.1
XlsxWriter==3.2.0
xxhash==3.4.1
yarl==1.9.4
youtube-search==2.1.2
zipp==3.18.0

About multi-GPU

Is there an example command and YAML config for launching train.py with accelerate?
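A hedged sketch, assuming the repo's config/default_config.yaml holds the DeepSpeed settings; launching happens from the shell rather than from Python, and the YAML field values below are examples, not the repo's exact configuration:

# Launch:
#   accelerate launch --config_file config/default_config.yaml train.py
#
# `accelerate config` can generate such a YAML interactively; for multi-GPU the fields
# to double-check are roughly:
#   distributed_type: DEEPSPEED     # or MULTI_GPU if you skip DeepSpeed
#   num_processes: 4                # must not exceed the visible GPU count
#   mixed_precision: bf16
#
# train.py itself then only needs the usual accelerate pattern:
from accelerate import Accelerator

accelerator = Accelerator()
# model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)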

Loss is NaN when running the training test in example.ipynb

The preceding code is unchanged;

model(**batch).loss

was changed to

for i in range(10):
    output = model(**batch)
    loss = output.loss
    loss.backward()
    optimizer.step()
    lr_scheduler.step()
    optimizer.zero_grad()
    print(loss.detach().float())

Output

tensor(3.2207, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')
tensor(nan, device='cuda:0')

Only the first step computes a normal loss. Why is that?
The lr is set to 1e-8.
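A hedged guess at the cause rather than a confirmed one: if the notebook put the model in fp16 with .half(), a plain AdamW step overflows easily and the weights turn to NaN right after the first update, which matches the output above. One sketch of a safer loop (model, batch, optimizer and lr_scheduler as already defined in the notebook) keeps the weights in fp32 and uses autocast with dynamic loss scaling:

import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()                       # dynamic loss scaling for fp16 forward passes
for i in range(10):
    with autocast(dtype=torch.float16):     # weights stay fp32, activations run in fp16
        loss = model(**batch).loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)                  # the step is skipped if gradients overflowed
    scaler.update()
    lr_scheduler.step()
    optimizer.zero_grad()
    print(loss.detach().float())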

NameError: name 'train_dataloader' is not defined

lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=int(len(train_dataloader) / accumulate_step),
    num_training_steps=(int(len(train_dataloader) / accumulate_step) * NUM_EPOCHS),
)
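A hedged sketch of the fix: the scheduler is being built before train_dataloader exists, so the dataloader has to be constructed first. The module and function names below are the repo's own (from train.py / example.ipynb); the data path, and that tokenizer, batch_size and accumulate_step are already defined, are assumptions.

from torch.utils.data import DataLoader
import dataset.GLM as GLM_Data
import dataset.Alpaca as Alpaca_Data

pairs = Alpaca_Data.load('./data/alpaca_data.json')             # assumed data path
pairs_encoded = GLM_Data.encode_pairs(pairs, tokenizer)
train_dataset = GLM_Data.GLMDataset(pairs_encoded)
train_dataloader = DataLoader(dataset=train_dataset, collate_fn=GLM_Data.collate_fn,
                              shuffle=True, batch_size=batch_size)

lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=int(len(train_dataloader) / accumulate_step),
    num_training_steps=int(len(train_dataloader) / accumulate_step) * NUM_EPOCHS,
)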

RuntimeError: CUDA error: invalid device ordinal

Running accelerate launch --config_file config/default_config.yaml train_new.py gives the following error:

Traceback (most recent call last):
  File "/home/searchgpt/yq/ChatGLM-finetune-LoRA/train_new.py", line 38, in <module>
    accelerator = Accelerator(mixed_precision=mixed_precision, gradient_accumulation_steps=accumulate_step, deepspeed_plugin=deepspeed_plugin)
  File "/home/searchgpt/anaconda3/envs/stanford_alpaca/lib/python3.10/site-packages/accelerate/accelerator.py", line 340, in __init__
    self.state = AcceleratorState(
  File "/home/searchgpt/anaconda3/envs/stanford_alpaca/lib/python3.10/site-packages/accelerate/state.py", line 539, in __init__
    PartialState(cpu, **kwargs)
  File "/home/searchgpt/anaconda3/envs/stanford_alpaca/lib/python3.10/site-packages/accelerate/state.py", line 123, in __init__
    torch.cuda.set_device(self.device)
  File "/home/searchgpt/anaconda3/envs/stanford_alpaca/lib/python3.10/site-packages/torch/cuda/__init__.py", line 350, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Environment: a single machine with 2 A100 GPUs.
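A hedged diagnosis: "invalid device ordinal" usually means the launcher asked for a GPU index that is not visible, for example num_processes in default_config.yaml is larger than the actual GPU count, or CUDA_VISIBLE_DEVICES hides some of the cards.

import torch
print(torch.cuda.device_count())    # should print 2 on this machine

# Then make sure the accelerate config matches, e.g. in config/default_config.yaml:
#   num_processes: 2
# or restrict visibility explicitly when launching:
#   CUDA_VISIBLE_DEVICES=0,1 accelerate launch --config_file config/default_config.yaml train_new.py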

Training loss becomes NaN

After backpropagation, the loss becomes NaN.
accumulate_step is set to 16.
The system CUDA is 10.2, while the CUDA bundled with PyTorch is 11.6.
The code is as follows (single GPU; see the note after the code):

import os
import tqdm
import json
import torch
import loralib as lora
import lora_utils.insert_lora
import dataset.GLM as GLM_Data
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModel
# from accelerate import Accelerator, DeepSpeedPlugin
from transformers import get_linear_schedule_with_warmup

device = "cuda"
checkpoint = "THUDM/chatglm-6b"
# mixed_precision = 'bf16'
lora_config = {
    'r': 32,
    'lora_alpha':32,
    'lora_dropout':0.1,
    'enable_lora':[True, True, True],
}
max_length = 256
LR = 2e-5
NUM_EPOCHS = 2
batch_size = 4
accumulate_step = 8
warm_up_ratio = 0.1


tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True, revision = 'main')
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True, revision = 'main')
model = lora_utils.insert_lora.get_lora_model(model, lora_config)


import dataset.Alpaca as Alpaca_Data

pairs = Alpaca_Data.load('./data/alpaca_data.json')
GLM_Data.device = device
pairs_encoded = GLM_Data.encode_pairs(pairs, tokenizer)
pairs_encoded = list(filter(lambda pair: len(pair['prompt'])+len(pair['completion']) <= max_length, pairs_encoded))
train_dataset = GLM_Data.GLMDataset(pairs_encoded)
train_dataloader = DataLoader(dataset=train_dataset, collate_fn = GLM_Data.collate_fn, shuffle=True, batch_size=batch_size)


optimizer = torch.optim.AdamW(model.parameters(), lr=LR)
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=int(len(train_dataloader) / accumulate_step * warm_up_ratio),
    num_training_steps=(int(len(train_dataloader) / accumulate_step) * NUM_EPOCHS),
)

# model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)
# model.to(device).train()
from torch.cuda.amp import autocast

LR = 2e-5
NUM_EPOCHS = 2
accumulate_step = 16
version = 'test'

optimizer = torch.optim.AdamW(model.parameters(), lr=LR)

lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=int(len(train_dataloader) / accumulate_step),
    num_training_steps=(int(len(train_dataloader) / accumulate_step) * NUM_EPOCHS),
)


model.half().to(device).train()

for epoch in range(NUM_EPOCHS):
    epoch_loss_local = 0
    for step, batch in enumerate(t:=tqdm.tqdm(train_dataloader)):
        batch = {k: v.to('cuda') for k, v in batch.items()}
        outputs = model(**batch)
        loss_d = outputs.loss.detach()
        epoch_loss_local += loss_d
        t.set_description(f"loss: {epoch_loss_local.cpu().float() / step}")
        loss = outputs.loss / accumulate_step
        loss.backward()
        if (step+1) % accumulate_step == 0:
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
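A hedged observation on the script above rather than a confirmed fix: model.half() puts every weight in fp16 and then runs plain AdamW with no loss scaling, which very often yields NaN right after the first backward pass. On cards that support it, bfloat16 has the same memory footprint but a much wider exponent range, and is effectively what the repo's accelerate/DeepSpeed path uses with mixed_precision='bf16':

model.to(torch.bfloat16).to(device).train()      # instead of model.half(); requires a bf16-capable GPU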

LORAConfig报错:ValueError: Target modules ['q', 'k', 'v'] not found in the base model. Please check the target modules and try again.

I finished training with train.py and obtained ./saved/finetune_0.pt. Since there is no inference script, I used the final inference cell in LoRA_finetune_with_stanford_alpaca.ipynb and got this error:

Traceback (most recent call last):
  File "inference.py", line 49, in <module>
    module.query_key_value = peft.tuners.lora.LoraModel(config, module.query_key_value)
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python38\lib\site-packages\peft\tuners\lora.py", line 118, in __init__
    self._find_and_replace()
  File "C:\Users\xxxx\AppData\Local\Programs\Python\Python38\lib\site-packages\peft\tuners\lora.py", line 181, in _find_and_replace
    raise ValueError(
ValueError: Target modules ['q', 'k', 'v'] not found in the base model. Please check the target modules and try again.

The LoraConfig used is (the target modules are identical in the ipynb and train.py):

config = LoraConfig(
    peft_type="LORA",
    r=32,
    lora_alpha=32,
    target_modules=["q", "k", "v"],
    lora_dropout=0.1,
)

Why does this error occur? I hit it before but could not find anything by searching. Has anyone managed to run inference successfully?
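One hedged way around this (the module name is verified for the original chatglm-6b checkpoint; newer variants can differ, and model is assumed to be the loaded base model): peft matches target_modules against the names returned by model.named_modules(). ChatGLM-6B has a single fused attention projection called query_key_value, so ['q', 'k', 'v'] matches nothing when the config is applied to the whole model.

from peft import LoraConfig, get_peft_model

# Inspect the real module names first:
# print([n for n, _ in model.named_modules() if 'query_key_value' in n][:3])

config = LoraConfig(
    peft_type="LORA",
    r=32,
    lora_alpha=32,
    target_modules=["query_key_value"],     # target the fused attention projection by name
    lora_dropout=0.1,
)
model = get_peft_model(model, config)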

How to build a question-answer dataset

I am building my own dataset of question-answer pairs (screenshot omitted). Is there any problem with organizing it in that format?

Any advice would be appreciated, thanks! (A rough sketch of the expected format follows.)
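A rough sketch of the format the repo's Alpaca loader appears to expect (field names follow data/alpaca_data.json; the output path and the Chinese-free example text are assumptions):

import json

pairs = [
    {
        "instruction": "Who are you?",
        "input": "",
        "output": "I am an assistant fine-tuned with LoRA on ChatGLM-6B."
    },
]
with open("./data/my_qa_data.json", "w", encoding="utf-8") as f:
    json.dump(pairs, f, ensure_ascii=False, indent=2)

# dataset.Alpaca.load('./data/my_qa_data.json') then turns these into prompt/completion pairs.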

Training runs out of GPU memory

What is the minimum GPU memory needed to train this model? I have a V100 32 GB.

Here is my situation: I have more than 4 GPUs to run train.py, but it still runs out of memory. Checking usage, one of the GPUs overflows and triggers the error (full log below, with a few mitigation ideas sketched after it). How can I solve this?

WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "", line 1, in
FileNotFoundError: [Errno 2] No such file or directory: '/home/cike/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/551a50efec3acc5a9b94de8ec46d33d0f81919f7/modeling_chatglm.py'
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:15<00:00, 1.92s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:15<00:00, 1.95s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:15<00:00, 1.96s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:15<00:00, 1.97s/it]
trainable_params:22020096 (0.35%), non_trainable_params:6255206400
trainable_params:22020096 (0.35%), non_trainable_params:6255206400
trainable_params:22020096 (0.35%), non_trainable_params:6255206400
trainable_params:22020096 (0.35%), non_trainable_params:6255206400
[2023-04-06 08:33:02,712] [INFO] [logging.py:93:log_dist] [Rank -1] DeepSpeed info: version=0.8.3, git-hash=unknown, git-branch=unknown
[2023-04-06 08:33:02,747] [INFO] [logging.py:93:log_dist] [Rank -1] DeepSpeed info: version=0.8.3, git-hash=unknown, git-branch=unknown
[2023-04-06 08:33:02,756] [INFO] [logging.py:93:log_dist] [Rank -1] DeepSpeed info: version=0.8.3, git-hash=unknown, git-branch=unknown
[2023-04-06 08:33:02,874] [INFO] [logging.py:93:log_dist] [Rank -1] DeepSpeed info: version=0.8.3, git-hash=unknown, git-branch=unknown
[2023-04-06 08:33:24,442] [INFO] [logging.py:93:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-04-06 08:33:24,445] [INFO] [logging.py:93:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-04-06 08:33:24,445] [INFO] [logging.py:93:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-04-06 08:33:24,509] [INFO] [logging.py:93:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2023-04-06 08:33:24,509] [INFO] [utils.py:55:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2023-04-06 08:33:24,509] [INFO] [logging.py:93:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2023-04-06 08:33:24,510] [INFO] [stage_1_and_2.py:144:init] Reduce bucket size 500,000,000
[2023-04-06 08:33:24,510] [INFO] [stage_1_and_2.py:145:init] Allgather bucket size 500,000,000
[2023-04-06 08:33:24,510] [INFO] [stage_1_and_2.py:146:init] CPU Offload: False
[2023-04-06 08:33:24,510] [INFO] [stage_1_and_2.py:147:init] Round robin gradient partitioning: False
Using /home/cike/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/cike/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/cike/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/cike/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Emitting ninja build file /home/cike/.cache/torch_extensions/py39_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.35059165954589844 seconds
Loading extension module utils...
Time to load utils op: 0.40517687797546387 seconds
Loading extension module utils...
Time to load utils op: 0.40523695945739746 seconds
Loading extension module utils...
Time to load utils op: 0.40430521965026855 seconds
Rank: 3 partition count [4] and sizes[(5505024, False)]
Rank: 2 partition count [4] and sizes[(5505024, False)]
Rank: 0 partition count [4] and sizes[(5505024, False)]
Rank: 1 partition count [4] and sizes[(5505024, False)]
Using /home/cike/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00042438507080078125 seconds
Using /home/cike/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Using /home/cike/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Time to load utils op: 0.00040459632873535156 seconds
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00035071372985839844 seconds
0%| | 0/12241 [00:00<?, ?it/s][2023-04-06 08:33:25,838] [INFO] [utils.py:829:see_memory_usage] Before initializing optimizer states
[2023-04-06 08:33:25,839] [INFO] [utils.py:830:see_memory_usage] MA 11.71 GB Max_MA 11.72 GB CA 11.75 GB Max_CA 12 GB
[2023-04-06 08:33:25,839] [INFO] [utils.py:838:see_memory_usage] CPU Virtual Memory: used = 12.0 GB, percent = 4.8%
[2023-04-06 08:33:26,025] [INFO] [utils.py:829:see_memory_usage] After initializing optimizer states
[2023-04-06 08:33:26,025] [INFO] [utils.py:830:see_memory_usage] MA 11.76 GB Max_MA 11.82 GB CA 11.85 GB Max_CA 12 GB
[2023-04-06 08:33:26,026] [INFO] [utils.py:838:see_memory_usage] CPU Virtual Memory: used = 12.0 GB, percent = 4.8%
[2023-04-06 08:33:26,026] [INFO] [stage_1_and_2.py:520:init] optimizer state initialized
[2023-04-06 08:33:26,092] [INFO] [utils.py:829:see_memory_usage] After initializing ZeRO optimizer
[2023-04-06 08:33:26,093] [INFO] [utils.py:830:see_memory_usage] MA 11.76 GB Max_MA 11.76 GB CA 11.85 GB Max_CA 12 GB
[2023-04-06 08:33:26,093] [INFO] [utils.py:838:see_memory_usage] CPU Virtual Memory: used = 12.0 GB, percent = 4.8%
[2023-04-06 08:33:26,094] [INFO] [logging.py:93:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2023-04-06 08:33:26,095] [INFO] [logging.py:93:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-04-06 08:33:26,095] [INFO] [logging.py:93:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2023-04-06 08:33:26,095] [INFO] [logging.py:93:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[(0.9, 0.999)]
[2023-04-06 08:33:26,096] [INFO] [config.py:1018:print] DeepSpeedEngine configuration:
[2023-04-06 08:33:26,096] [INFO] [config.py:1022:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2023-04-06 08:33:26,096] [INFO] [config.py:1022:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-04-06 08:33:26,096] [INFO] [config.py:1022:print] amp_enabled .................. False
[2023-04-06 08:33:26,096] [INFO] [config.py:1022:print] amp_params ................... False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] bfloat16_enabled ............. True
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] checkpoint_parallel_write_pipeline False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] checkpoint_tag_validation_enabled True
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] checkpoint_tag_validation_fail False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f15140266a0>
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] communication_data_type ...... None
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] curriculum_enabled_legacy .... False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] curriculum_params_legacy ..... False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] data_efficiency_enabled ...... False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] dataloader_drop_last ......... False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] disable_allgather ............ False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] dump_state ................... False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] dynamic_loss_scale_args ...... None
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] eigenvalue_enabled ........... False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] eigenvalue_gas_boundary_resolution 1
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] eigenvalue_layer_num ......... 0
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] eigenvalue_max_iter .......... 100
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] eigenvalue_stability ......... 1e-06
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] eigenvalue_tol ............... 0.01
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] eigenvalue_verbose ........... False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] elasticity_enabled ........... False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] flops_profiler_config ........ {
"enabled": false,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] fp16_auto_cast ............... None
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] fp16_enabled ................. False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] fp16_master_weights_and_gradients False
[2023-04-06 08:33:26,097] [INFO] [config.py:1022:print] global_rank .................. 0
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] grad_accum_dtype ............. None
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] gradient_accumulation_steps .. 8
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] gradient_clipping ............ 0.0
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] gradient_predivide_factor .... 1.0
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] initial_dynamic_scale ........ 1
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] load_universal_checkpoint .... False
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] loss_scale ................... 1.0
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] memory_breakdown ............. False
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] optimizer_legacy_fusion ...... False
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] optimizer_name ............... None
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] optimizer_params ............. None
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] pld_enabled .................. False
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] pld_params ................... False
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] prescale_gradients ........... False
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] scheduler_name ............... None
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] scheduler_params ............. None
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] sparse_attention ............. None
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] sparse_gradients_enabled ..... False
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] steps_per_print .............. inf
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] train_batch_size ............. 32
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] train_micro_batch_size_per_gpu 1
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] use_node_local_storage ....... False
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] wall_clock_breakdown ......... False
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] world_size ................... 4
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] zero_allow_untested_optimizer True
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] zero_enabled ................. True
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] zero_force_ds_cpu_optimizer .. True
[2023-04-06 08:33:26,098] [INFO] [config.py:1022:print] zero_optimization_stage ...... 2
[2023-04-06 08:33:26,099] [INFO] [config.py:1007:print_user_config] json = {
"train_batch_size": 32,
"train_micro_batch_size_per_gpu": 1,
"gradient_accumulation_steps": 8,
"zero_optimization": {
"stage": 2,
"offload_optimizer": {
"device": "none"
},
"offload_param": {
"device": "none"
},
"stage3_gather_16bit_weights_on_model_save": false
},
"steps_per_print": inf,
"bf16": {
"enabled": true
},
"fp16": {
"enabled": false
},
"zero_allow_untested_optimizer": true
}
Using /home/cike/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00032019615173339844 seconds
loss: 2.640625: 0%| | 1/12241 [00:04<14:19:20, 4.21s/it]
Traceback (most recent call last):
File "/home/cike/zzp/LoRA/ChatGLM-finetune-LoRA/train.py", line 220, in
accelerator.backward(loss)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/accelerate/accelerator.py", line 1677, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2008, in backward
self.allreduce_gradients()
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1918, in allreduce_gradients
self.optimizer.overlapping_partition_gradients_reduce_epilogue()
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 834, in overlapping_partition_gradients_reduce_epilogue
self.independent_gradient_partition_epilogue()
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 720, in independent_gradient_partition_epilogue
self.reduce_ipg_grads()
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1287, in reduce_ipg_grads
self.average_tensor(self.ipg_buffer[self.ipg_index])
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1018, in average_tensor
tensor_to_reduce = tensor.to(self.communication_data_type)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 954.00 MiB (GPU 1; 15.90 GiB total capacity; 12.84 GiB already allocated; 927.75 MiB free; 14.17 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
loss: 2.75: 0%| | 1/12241 [00:04<15:35:54, 4.59s/it]
Traceback (most recent call last):
File "/home/cike/zzp/LoRA/ChatGLM-finetune-LoRA/train.py", line 220, in
accelerator.backward(loss)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/accelerate/accelerator.py", line 1677, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2008, in backward
self.allreduce_gradients()
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1918, in allreduce_gradients
self.optimizer.overlapping_partition_gradients_reduce_epilogue()
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 834, in overlapping_partition_gradients_reduce_epilogue
self.independent_gradient_partition_epilogue()
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 720, in independent_gradient_partition_epilogue
self.reduce_ipg_grads()
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1287, in reduce_ipg_grads
self.average_tensor(self.ipg_buffer[self.ipg_index])
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1018, in average_tensor
tensor_to_reduce = tensor.to(self.communication_data_type)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 954.00 MiB (GPU 3; 15.90 GiB total capacity; 12.88 GiB already allocated; 903.75 MiB free; 14.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 107844 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 107846 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 107845) of binary: /home/cike/anaconda/envs/lora/bin/python
Traceback (most recent call last):
File "/home/cike/anaconda/envs/lora/bin/accelerate", line 8, in
sys.exit(main())
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/accelerate/commands/launch.py", line 908, in launch_command
deepspeed_launcher(args)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/accelerate/commands/launch.py", line 647, in deepspeed_launcher
distrib_run.run(args)
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/cike/anaconda/envs/lora/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

Failures:
[1]:
time : 2023-04-06_08:33:33
host : 4d9275d5570f
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 107847)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2023-04-06_08:33:33
host : 4d9275d5570f
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 107845)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
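A hedged set of mitigations rather than a confirmed fix: the failure happens inside ZeRO-2's gradient all-reduce, where a roughly 954 MiB communication buffer is requested on a 16 GB card that is already nearly full.

# 1. Follow the error message and reduce allocator fragmentation when launching:
#      PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 accelerate launch ... train.py
# 2. Shrink the ZeRO reduce/allgather buckets (the log shows 500,000,000 elements), e.g. in a
#    DeepSpeed config passed through accelerate's hf_ds_config (values here are illustrative):
ds_config_fragment = {
    "zero_optimization": {
        "stage": 2,
        "reduce_bucket_size": 5e7,        # smaller buckets mean smaller transient buffers
        "allgather_bucket_size": 5e7,
    }
}
# 3. Lower max_length or the per-GPU micro batch size; activation memory dominates at 16 GB.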

About distributed GPU training

How does this training support distributed multi-GPU training? Is there a parameter that lets a single model be trained in parallel? As far as I can tell, changing num_processes only trains a separate copy of the model on each GPU. Can one model be trained across multiple GPUs? My cards only have 16 GB each.
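A hedged pointer rather than a verified recipe: plain data-parallel launches replicate the whole model on every GPU, but DeepSpeed ZeRO stage 3 partitions the parameters themselves across ranks, so each 16 GB card only holds a shard of the model.

from accelerate import Accelerator, DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(zero_stage=3,
                                   offload_optimizer_device='cpu')   # offload if memory is still tight
accelerator = Accelerator(mixed_precision='bf16', deepspeed_plugin=deepspeed_plugin)
# then launch with: accelerate launch --num_processes 2 train.py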

Finetuning has no effect

I finetuned on "Who are you?" data, but the output is still the original model's answer. Is the newly finetuned .pt model being loaded correctly? As follows:
tokenizer = AutoTokenizer.from_pretrained("ChatGLM-6B/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("ChatGLM-6B/chatglm-6b", trust_remote_code=True).half().cuda()

Loading the finetuned model

peft_path = "ChatGLM-finetune-LoRA/saved/finetune_test/finetune_test_epoch_2.pt"
model.load_state_dict(torch.load(peft_path), strict=False)
model.eval()
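A hedged check rather than a confirmed diagnosis: with strict=False a key mismatch is silently ignored, so the LoRA weights may never actually land in the model. Two things to verify on the model loaded above: the LoRA modules must be inserted into the base model before loading, and the load result should report no leftover LoRA keys.

import torch
import lora_utils.insert_lora

lora_config = {'r': 32, 'lora_alpha': 32, 'lora_dropout': 0.1,
               'enable_lora': [True, True, True]}
model = lora_utils.insert_lora.get_lora_model(model, lora_config)   # add the LoRA layers first

peft_path = "ChatGLM-finetune-LoRA/saved/finetune_test/finetune_test_epoch_2.pt"
result = model.load_state_dict(torch.load(peft_path), strict=False)
print([k for k in result.unexpected_keys if 'lora_' in k])   # should be empty
model.half().cuda().eval()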

Has anyone tried training with 8 GPUs?

I want to finetune the model on multiple GPUs with very long text (over 1024 tokens). Since 1024 is already barely feasible on 4 cards, I want to use 8, but running with more than 4 cards raises an error. Has anyone tried finetuning on 8 cards? What is the longest text length that can be supported?

Error

'SelfAttention' object has no attribute 'query_key_valu'. Does anyone know how to solve this?
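A hedged way to diagnose this (the naming difference between ChatGLM generations is an assumption to verify against your checkpoint, and model is assumed to be the loaded base model): the LoRA-insertion code walks the model and expects an attribute named query_key_value on each attention block, so if the checkpoint names its attention modules differently the lookup fails. Listing the real names shows what to target instead.

attn_modules = {name for name, _ in model.named_modules()
                if 'attention' in name.lower() or 'query_key_value' in name}
print(sorted(attn_modules)[:10])   # adjust the insertion code to the names printed here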
