

SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning)



ModelScope Community Website


📝 Introduction

SWIFT supports training, inference, evaluation and deployment of nearly 200 LLMs and MLLMs (multimodal large models). Developers can directly apply our framework to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by PEFT, we also provide a complete Adapters library to support the latest training techniques such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used directly in your own custom workflow without our training scripts.
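
The tuners in this library can be attached to a regular transformers model without SWIFT's training scripts. A minimal sketch, assuming a LoRA adapter (the model id and LoRA hyperparameters below are illustrative, not prescribed):

# Attach a SWIFT tuner to a plain transformers model, train with your own loop,
# then save only the adapter weights.
from modelscope import AutoModelForCausalLM
from swift import LoRAConfig, Swift

model = AutoModelForCausalLM.from_pretrained('qwen/Qwen-7B-Chat', trust_remote_code=True)
lora_config = LoRAConfig(r=8, lora_alpha=32, target_modules=['c_attn'])  # illustrative values
model = Swift.prepare_model(model, lora_config)
# ... your own training loop or HF Trainer goes here ...
model.save_pretrained('output/lora-adapter')  # stores only the adapter weights

The same Swift.prepare_model call accepts the other tuner configs (e.g. NEFTuneConfig, as noted in the news below), so adapters can be mixed into an existing training pipeline.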

To facilitate use by users unfamiliar with deep learning, we provide a Gradio web-ui for controlling training and inference, as well as accompanying deep learning courses and best practices for beginners.

Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.

🎉 News

  • 2024.04.20: Support for inference, fine-tuning, and deployment of Atom series models. This includes: Atom-7B and Atom-7B-Chat. Use this script to train.
  • 2024.04.19: Support for single-card, DDP, ZeRO2, and ZeRO3 training and inference with NPU, please refer to NPU Inference and Fine-tuning Best Practices.
  • 2024.04.19: Support for inference, fine-tuning, and deployment of Llama3 series models. This includes: Llama-3-8B, Llama-3-8B-Instruct, Llama-3-70B, and Llama-3-70B-Instruct. Use this script to train.
  • 2024.04.18: Supported models: wizardlm2-7b-awq, wizardlm2-8x22b, yi-6b-chat-awq, yi-6b-chat-int8, yi-34b-chat-awq, yi-34b-chat-int8. Supported --deepspeed zero3-offload and provided default zero3-offload configuration file for zero3+cpu offload usage.
  • 2024.04.18: Supported compatibility with HuggingFace ecosystem using the environment variable USE_HF, switching to use models and datasets from HF. Please refer to the HuggingFace ecosystem compatibility documentation.
  • 2024.04.17: Support the evaluation for OpenAI standard interfaces. Check the parameter documentation for details.
  • 🔥2024.04.17: Support CodeQwen1.5-7B series: CodeQwen1.5-7B, CodeQwen1.5-7B-Chat, CodeQwen1.5-7B-Chat-AWQ. Use this script to train.
  • 2024.04.16: Supports inference and fine-tuning of llava-v1.6-34b model. For best practice, you can refer to here.
  • 2024.04.13: Support the fine-tuning and inference of Mixtral-8x22B-v0.1 model, use this script to start training!
  • 2024.04.13: Support the newly launched MiniCPM series: MiniCPM-V-2.0, MiniCPM-2B-128k, MiniCPM-MoE-8x2B, and MiniCPM-1B. Use this script to start training!
  • 🔥2024.04.11: Support model evaluation with MMLU/ARC/CEval datasets (and user-defined custom eval datasets) with one command! Check this documentation for details. Meanwhile, we support a convenient way to run multiple ablation experiments; check this documentation for usage.
  • 🔥2024.04.11: Support c4ai-command-r series: c4ai-command-r-plus, c4ai-command-r-v01, use this script to train.
  • 2024.04.10: Use SWIFT to fine-tune the qwen-7b-chat model to enhance its function-call capabilities, combined with Modelscope-Agent; best practices can be found here.
  • 🔥2024.04.09: Support ruozhiba dataset. Search ruozhiba in this documentation to begin training!
  • 2024.04.08: Support the fine-tuning and inference of XVERSE-MoE-A4.2B model, use this script to start training!
  • 2024.04.04: Support QLoRA+FSDP to train a 70B model with two 24G memory GPUs, use this script to train.
  • 🔥2024.04.03: Support Qwen1.5-32B series: Qwen1.5-32B, Qwen1.5-32B-Chat, Qwen1.5-32B-Chat-GPTQ-Int4. Use this script to start training!
  • 🔥2024.04.02: Support the fine-tuning and inference of Mengzi3-13B-Base model, use this script to start training!
  • 🔥2024.04.01: Support dbrx series: dbrx-base and dbrx-instruct, use this script to start training!
  • 🔥2024.03.29: Support Qwen1.5-MoE series: Qwen1.5-MoE-A2.7B, Qwen1.5-MoE-A2.7B-Chat, Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4.
  • 🔥2024.03.29: Support the fine-tuning and inference of Grok-1 300B MoE, please view details here.
  • 🔥2024.03.25: Supports inference and fine-tuning of TeleChat-7b and TeleChat-12b models, use this script to start training!
  • 🔥2024.03.20: Supports inference and fine-tuning for the llava series. For best practice, you can refer to here.
More
  • 🔥2024.03.12: Support inference and fine-tuning for deepseek-vl series. Best practices can be found here.

  • 🔥2024.03.11: Support GaLore for effectively reducing memory usage to 1/2 of the original in full-parameter training.

  • 🔥2024.03.10: End-to-end best practices from fine-tuning to deployment for Qwen1.5-7B-Chat and Qwen1.5-72B-Chat.

  • 🔥2024.03.09: Support training and inference of MAMBA model, use this script to start training!

  • 2024.03.09: Support training and inference of AQLM quantized model, use this script to start training!

  • 2024.03.06: Support training and inference of AWQ quantized model, use this Qwen1.5-AWQ model script to start training, and support training and inference of yi-9b.

  • 🔥2024.02.29: Support LLaMA PRO, simply use this script to start training.

  • 🔥2024.02.29: Support LoRA+, simply use this script to start training.

  • 2024.02.25: Support swift export to quantize models using AWQ/GPTQ and push to ModelScope Hub. See documentation: LLM Quantization.

  • 2024.02.22: Support gemma series: gemma-2b, gemma-2b-instruct, gemma-7b, gemma-7b-instruct.

  • 2024.02.16: Support deepseek-math series: deepseek-math-7b, deepseek-math-7b-instruct, deepseek-math-7b-chat.

  • 🔥2024.02.05: Support Qwen1.5 series models, see model list for all supported Qwen1.5 models. Provide fine-tuning scripts for qwen1half-7b-chat, qwen1half-7b-chat-int8.

  • 2024.02.05: Support training of diffusion models such as SDXL, SD, ControlNet, as well as DreamBooth training. See corresponding training scripts for details.

  • 2024.02.01: Support minicpm series: minicpm-2b-sft-chat, minicpm-2b-chat.

  • 🔥2024.02.01: Support dataset mixing to reduce catastrophic forgetting. Use --train_dataset_mix_ratio 2.0 to enable training! We also open sourced the general knowledge dataset ms-bench.

  • 🔥2024.02.01: Support Agent training! Agent training algorithm is derived from this paper. We also added ms-agent, a high-quality agent dataset. Use this script to start Agent training!

  • 🔥2024.02.01: Support adding SFT loss in DPO training to reduce repetitive generation caused by KL divergence loss.

  • 2024.02.01: Support using AdaLoRA and IA3 adapters in training.

  • 2024.02.01: Support --merge_lora parameter in AnimateDiff training.

  • 2024.01.30: Support internlm-xcomposer2-7b-chat.

  • 🔥2024.01.30: Support ZeRO-3, simply specify --deepspeed default-zero3.

  • 2024.01.29: Support internlm2-math series: internlm2-math-7b, internlm2-math-7b-chat, internlm2-math-20b, internlm2-math-20b-chat.

  • 🔥2024.01.26: Support yi-vl-6b-chat, yi-vl-34b-chat.

  • 2024.01.24: Support codefuse-codegeex2-6b-chat, codefuse-qwen-14b-chat.

  • 2024.01.23: Support orion series: orion-14b, orion-14b-chat.

  • 2024.01.20: Support xverse-13b-256k, xverse-65b-v2, xverse-65b-chat.

  • 🔥2024.01.17: Support internlm2 series: internlm2-7b-base, internlm2-7b, internlm2-7b-sft-chat, internlm2-7b-chat, internlm2-20b-base, internlm2-20b, internlm2-20b-sft-chat, internlm2-20b-chat.

  • 2024.01.15: Support yuan series: yuan2-2b-instruct, yuan2-2b-janus-instruct, yuan2-51b-instruct, yuan2-102b-instruct.

  • 🔥2024.01.12: Support deepseek-moe series: deepseek-moe-16b, deepseek-moe-16b-chat.

  • 🔥2024.01.04: Support VLLM deployment, compatible with OpenAI API style, see VLLM Inference Acceleration and Deployment for details.

  • 2024.01.04: Update Benchmark for convenient viewing of training speed and memory usage of different models.

  • 🔥2023.12.29: Support web-ui for sft training and inference; after installing ms-swift, run swift web-ui to start.

  • 🔥2023.12.29: Support DPO RLHF (Reinforcement Learning from Human Feedback) and three datasets for this task: AI-ModelScope/stack-exchange-paired, AI-ModelScope/hh-rlhf and AI-ModelScope/hh_rlhf_cn. See documentation to start training!

  • 🔥2023.12.28: Support SCEdit! This tuner can significantly reduce memory usage in U-Net and support low-memory controllable image generation (replacing ControlNet), read the section below to learn more.

  • 2023.12.23: Support codegeex2-6b.

  • 2023.12.19: Support phi2-3b.

  • 2023.12.18: Support VLLM for inference acceleration.

  • 2023.12.15: Support deepseek, deepseek-coder series: deepseek-7b, deepseek-7b-chat, deepseek-67b, deepseek-67b-chat, openbuddy-deepseek-67b-chat, deepseek-coder-1_3b, deepseek-coder-1_3b-instruct, deepseek-coder-6_7b, deepseek-coder-6_7b-instruct, deepseek-coder-33b, deepseek-coder-33b-instruct.

  • 2023.12.13: Support mistral-7b-instruct-v2, mixtral-moe-7b, mixtral-moe-7b-instruct.

  • 2023.12.09: Support the freeze_parameters parameter as a compromise between lora and full-parameter training. Corresponding sh scripts can be found in full_freeze_ddp. Support disable_tqdm, lazy_tokenize, preprocess_num_proc parameters, see command line arguments for details.

  • 2023.12.08: Support sus-34b-chat, support yi-6b-200k, yi-34b-200k.

  • 2023.12.07: Support Multi-Node DDP training.

  • 2023.12.05: Support models: zephyr-7b-beta-chat, openbuddy-zephyr-7b-chat. Support datasets: hc3-zh, hc3-en.

  • 🔥2023.12.02: Best practices for self-cognition fine-tuning: fine-tune a large model for self-cognition in 10 minutes and create your own unique large model.

  • 🔥2023.11.30: Support training and inference of qwen-1_8b, qwen-72b, qwen-audio series models. Corresponding sh scripts can be found in qwen_1_8b_chat, qwen_72b_chat, qwen_audio_chat.

  • 🔥2023.11.29: Support training and inference of AnimateDiff.

  • 🔥2023.11.24: Support yi-34b-chat, codefuse-codellama-34b-chat models. Corresponding sh scripts can be found in yi_34b_chat, codefuse_codellama_34b_chat.

  • 🔥2023.11.18: Support tongyi-finance-14b series models: tongyi-finance-14b, tongyi-finance-14b-chat, tongyi-finance-14b-chat-int4. Corresponding sh scripts can be found in tongyi_finance_14b_chat_int4.

  • 2023.11.16: Support flash attn for more models: qwen series, qwen-vl series, llama series, openbuddy series, mistral series, yi series, ziya series. Please use the use_flash_attn parameter.

  • 🔥2023.11.11: Support NEFTune, simply use Swift.prepare_model(model, NEFTuneConfig()) to enable.

  • 🔥2023.11.11: Support training and inference by command line and inference by Web-UI, see Usage with Swift CLI section below for details.

  • 🔥2023.11.10: Support bluelm series models: bluelm-7b, bluelm-7b-chat, bluelm-7b-32k, bluelm-7b-chat-32k. Corresponding sh scripts can be found in bluelm_7b_chat.

  • 🔥2023.11.08: Support training and inference of xverse-65b model, script at xverse_65b.

  • 🔥2023.11.07: Support training and inference of yi-6b, yi-34b models, scripts at yi_6b, yi_34b.

  • 🔥2023.10.30: Support two new tuners: QA-LoRA and LongLoRA.

  • 🔥2023.10.30: Support editing models using ROME (Rank One Model Editing) to infuse new knowledge into models without training!

  • 2023.10.30: Support skywork-13b series models: skywork-13b, skywork-13b-chat. Corresponding sh scripts can be found in skywork_13b.

  • 🔥2023.10.27: Support chatglm3 series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. Corresponding sh scripts can be found in chatglm3_6b.

  • 🔥2023.10.17: Support SFT of int4, int8 models: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-vl-chat-int4, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8.

  • 2023.10.15: Support ziya2-13b series models: ziya2-13b, ziya2-13b-chat.

  • 2023.10.12: Support mistral-7b series models: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-instruct.

  • 🔥2023.10.07: Support DeepSpeed ZeRO-2, enabling lora (not just qlora) to run DDP on dual A10 cards.

  • 2023.10.04: Support more math, law, SQL, code domain datasets: blossom-math-zh, school-math-zh, text2sql-en, sql-create-context-en, lawyer-llama-zh, tigerbot-law-zh, leetcode-python-en.

  • 🔥2023.09.25: Support qwen-14b series: qwen-14b, qwen-14b-chat.

  • 2023.09.18: Support internlm-20b series: internlm-20b, internlm-20b-chat.

  • 2023.09.12: Support MP+DDP to accelerate full-parameter training.

  • 2023.09.05: Support openbuddy-llama2-70b-chat.

  • 2023.09.03: Support baichuan2 series: baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat.

🛠️ Installation

SWIFT runs in the Python environment. Please ensure your Python version is 3.8 or higher.

  • Method 1: Install SWIFT using the pip command:
# Full capabilities
pip install 'ms-swift[all]' -U
# LLM only
pip install 'ms-swift[llm]' -U
# AIGC only
pip install 'ms-swift[aigc]' -U
# Adapters only
pip install ms-swift -U
  • Method 2: Install SWIFT from source (convenient for running training and inference scripts) by running the following commands:
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

SWIFT depends on torch>=1.13; torch>=2.0.0 is recommended.

  • Method 3: Use SWIFT in our Docker image
# China-Hangzhou image
docker pull registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.13.1
# US-west image
docker pull registry.us-west-1.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.13.1

🚀 Getting Started

This section introduces basic usage; see the Documentation section for more detailed usage.

Web-UI

swift web-ui

Training

Training Scripts

You can refer to the following scripts to customize your own training script.
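
Training can also be launched programmatically instead of through shell scripts. A minimal sketch mirroring the CLI examples below (the argument values are the same ones used there):

# Programmatic equivalent of:
#   swift sft --model_type qwen1half-7b-chat --dataset blossom-math-zh --sft_type lora
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import SftArguments, sft_main

sft_args = SftArguments(
    model_type='qwen1half-7b-chat',
    dataset=['blossom-math-zh'],
    sft_type='lora',
    output_dir='output')
output = sft_main(sft_args)
print(output['best_model_checkpoint'])  # checkpoint path for later inference/export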

Supported Training Processes

| Training Process | Training Method |
| --- | --- |
| Pretraining | Text Generation |
| Fine-tuning | Single-turn/Multi-turn, Agent Training/Self-cognition, Multi-modal Vision/Multi-modal Speech |
| Human Alignment | DPO |
| Text-to-Image | DreamBooth, etc. |
| Text-to-Video | - |

Single GPU Training

Start single GPU fine-tuning with the following command:

LoRA:

# Experimental Environment: A100
# GPU Memory Requirement: 20GB
# Runtime: 3.1 hours
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --eval_steps 200

Full-parameter:

# Experimental Environment: A100
# GPU Memory Requirement: 80GB
# Runtime: 2.5 hours
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type full \
    --output_dir output \
    --eval_steps 500

Model Parallel Training

# Experimental Environment: 2 * A100
# GPU Memory Requirement: 10GB + 13GB
# Runtime: 3.4 hours
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output

Data Parallel Training

# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 30GB
# Runtime: 0.8 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output

Combining Model Parallelism and Data Parallelism:

# Experimental Environment: 4 * A100
# GPU Memory Requirement: 2*14GB + 2*18GB
# Runtime: 1.7 hours
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output

Deepspeed Training

ZeRO2:

# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 21GB
# Runtime: 0.9 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero2

ZeRO3:

# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 19GB
# Runtime: 3.2 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero3

ZeRO3-Offload:

# Experimental Environment: 4 * A100
# GPU Memory Requirement: 4 * 12GB
# Runtime: 60 hours
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_id_or_path AI-ModelScope/WizardLM-2-8x22B \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed zero3-offload

Inference

Original model:

CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
# use VLLM
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat \
    --infer_backend vllm --max_model_len 8192

LoRA fine-tuned:

CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true
# use VLLM
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true \
    --merge_lora true --infer_backend vllm --max_model_len 8192
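
Inference can also be driven from Python. A minimal sketch (for a fine-tuned checkpoint, pass ckpt_dir instead of model_type, as in the CLI calls above):

# Programmatic equivalent of: swift infer --model_type qwen1half-7b-chat
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import InferArguments, infer_main

infer_args = InferArguments(model_type='qwen1half-7b-chat')
infer_main(infer_args)  # starts an interactive inference loop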

Evaluation

CUDA_VISIBLE_DEVICES=0 swift eval --model_type qwen1half-7b-chat --eval_dataset mmlu ceval

Export

Original model:

CUDA_VISIBLE_DEVICES=0 swift export --model_type qwen1half-7b-chat \
    --quant_bits 4 --quant_method awq

LoRA fine-tuned:

CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true \
    --quant_method awq --quant_bits 4 \
    --merge_lora true

Deployment

Original model:

CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen1half-7b-chat
# use VLLM for acceleration
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen1half-7b-chat \
    --infer_backend vllm --max_model_len 8192

LoRA fine-tuned:

CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir xxx/checkpoint-xxx
# use VLLM for acceleration
CUDA_VISIBLE_DEVICES=0 swift deploy \
    --ckpt_dir xxx/checkpoint-xxx --merge_lora true \
    --infer_backend vllm --max_model_len 8192
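
The deployed service exposes an OpenAI-compatible API (see the VLLM Inference Acceleration and Deployment documentation), so it can be queried with the official openai client. A minimal sketch, assuming the server is reachable at the default local address (adjust base_url if you changed the host or port):

# Query the OpenAI-compatible endpoint started by `swift deploy`.
from openai import OpenAI

client = OpenAI(api_key='EMPTY', base_url='http://localhost:8000/v1')
model_id = client.models.list().data[0].id  # discover the served model name
resp = client.chat.completions.create(
    model=model_id,
    messages=[{'role': 'user', 'content': 'What is 1 + 1?'}],
    temperature=0)
print(resp.choices[0].message.content)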

Supported Models

LLMs

| Model Series | Model Introduction | Language | Model Size | Model Type |
| --- | --- | --- | --- | --- |
| Qwen, Qwen1.5 | Tongyi Qwen 1.0 and 1.5 series models | Chinese, English | 0.5B-72B, including quantized versions | base model, chat model, MoE model, code model |
| ChatGLM2, ChatGLM3, Codegeex2 | Zhipu ChatGLM series models | Chinese, English | 6B | base model, chat model, code model |
| Baichuan, Baichuan2 | Baichuan 1 and Baichuan 2 | Chinese, English | 7B-13B, including quantized versions | base model, chat model |
| Yuan2 | Langchao Yuan series models | Chinese, English | 2B-102B | instruct model |
| XVerse | XVerse series models | Chinese, English | 7B-65B | base model, chat model, long text model, MoE model |
| LLaMA2 | LLaMA2 series models | English | 7B-70B, including quantized versions | base model, chat model |
| LLaMA3 | LLaMA3 series models | English | 8B-70B | base model, chat model |
| Mistral, Mixtral | Mistral series models | English | 7B-22B | base model, instruct model, MoE model |
| YI | 01AI's YI series models | Chinese, English | 6B-34B, including quantized versions | base model, chat model, long text model |
| InternLM, InternLM2, InternLM2-Math | Pujiang AI Lab InternLM series models | Chinese, English | 1.8B-20B | base model, chat model, math model |
| DeepSeek, DeepSeek-MoE, DeepSeek-Coder, DeepSeek-Math | DeepSeek series models | Chinese, English | 1.3B-67B | base model, chat model, MoE model, code model, math model |
| MAMBA | MAMBA state-space sequence model | English | 130M-2.8B | base model |
| Gemma | Google Gemma series models | English | 2B-7B | base model, instruct model |
| MiniCPM | OpenBmB MiniCPM series models | Chinese, English | 2B-3B | chat model, MoE model |
| OpenBuddy | OpenBuddy series models | Chinese, English | 7B-67B | base model, chat model |
| Orion | OrionStar AI series models | Chinese, English | 14B | base model, chat model |
| BlueLM | VIVO BlueLM large model | Chinese, English | 7B | base model, chat model |
| Ziya2 | Fengshenbang series models | Chinese, English | 13B | base model, chat model |
| Skywork | Skywork series models | Chinese, English | 13B | base model, chat model |
| Zephyr | Zephyr series models based on Mistral | English | 7B | chat model |
| PolyLM | Tongyi Lab self-developed PolyLM series models | Multilingual | 13B | base model |
| SeqGPT | Tongyi Lab self-developed text understanding model for information extraction and text classification | Chinese | 560M | semantic understanding model |
| SUS | Southern University of Science and Technology model fine-tuned on YI | Chinese, English | 34B | chat model |
| Tongyi-Finance | Tongyi finance series models | Chinese, English | 14B | base model, chat model, financial model |
| CodeFuse-CodeLLaMA, CodeFuse-Codegeex2, CodeFuse-Qwen | Ant CodeFuse series models | Chinese, English | 6B-34B | chat model, code model |
| phi2 | Microsoft's PHI2 model | English | 3B | base model, code model |
| Grok | X-ai Grok series models | English | 300B | base model |
| TeleChat | Tele-AI TeleChat series models | Chinese, English | 7B-12B | chat model |
| dbrx | Databricks DBRX series models | English | 132B | base model, chat model |
| mengzi3 | Langboat Mengzi series models | Chinese, English | 13B | base model |
| c4ai-command-r | C4AI Command-R series models | Multilingual | 35B-104B | chat model |
| WizardLM2 | WizardLM2 series models | English | 7B-8x22B, including quantized versions | chat model, MoE model |
| Atom | Atom series models | Chinese | 7B | base model, chat model |

MLLMs

| Model Series | Model Introduction | Language | Model Size | Model Type |
| --- | --- | --- | --- | --- |
| Qwen-VL | Tongyi Qwen vision model | Chinese, English | 7B, including quantized versions | base model, chat model |
| Qwen-Audio | Tongyi Qwen speech model | Chinese, English | 7B | base model, chat model |
| YI-VL | 01AI's YI series vision models | Chinese, English | 6B-34B | chat model |
| XComposer2 | Pujiang AI Lab InternLM vision model | Chinese, English | 7B | chat model |
| DeepSeek-VL | DeepSeek series vision models | Chinese, English | 1.3B-7B | chat model |
| MiniCPM-V | OpenBmB MiniCPM vision model | Chinese, English | 3B | chat model |
| CogVLM, CogAgent | Zhipu ChatGLM visual QA and Agent model | English | 17B-18B | chat model |
| Llava | Llava series models | English | 7B-34B | chat model |
| mPLUG-Owl | mPLUG-Owl series models | English | 11B | chat model |

Diffusion Models

| Model Series | Model Introduction | Language | Model Type |
| --- | --- | --- | --- |
| AnimateDiff | AnimateDiff animation model | English | text-to-video |
| SD1.5/SD2.0/SDXL | StabilityAI series diffusion models | English | text-to-image |

Supported Open Source Datasets

| Dataset Type | Training Task | Datasets |
| --- | --- | --- |
| General | Fine-tuning | 🔥ruozhiba, 🔥ms-bench, 🔥ms-bench-mini, 🔥alpaca-en(gpt4), 🔥alpaca-zh(gpt4), multi-alpaca-all, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, instruct-en, gpt4all-en, sharegpt-en, sharegpt-zh, tulu-v2-sft-mixture, wikipedia-zh, open-orca, open-orca-gpt4, sharegpt-gpt4, 🔥sharegpt-gpt4-mini |
| Agent | Fine-tuning | 🔥ms-agent, ms-agent-for-agentfabric-default, ms-agent-for-agentfabric-addition, damo-mini-agent-zh, damo-agent-zh, agent-instruct-all-en |
| General | Human Alignment | 🔥hh-rlhf-cn, stack-exchange-paired, hh-rlhf-harmless-base, hh-rlhf-helpful-base, hh-rlhf-helpful-online, hh-rlhf-helpful-rejection-sampled, hh-rlhf-red-team-attempts, hh-rlhf-cn-harmless-base-cn, hh-rlhf-cn-helpful-base-cn, hh-rlhf-cn-harmless-base-en, hh-rlhf-cn-helpful-base-en |
| Code | Fine-tuning | code-alpaca-en, 🔥leetcode-python-en, 🔥codefuse-python-en, 🔥codefuse-evol-instruction-zh |
| Medical | Fine-tuning | medical-en, medical-zh, medical-mini-zh, 🔥disc-med-sft-zh |
| Legal | Fine-tuning | lawyer-llama-zh, tigerbot-law-zh, 🔥disc-law-sft-zh |
| Math | Fine-tuning | 🔥blossom-math-zh, school-math-zh, open-platypus-en |
| SQL | Fine-tuning | text2sql-en, 🔥sql-create-context-en |
| Text Generation | Fine-tuning | 🔥advertise-gen-zh, 🔥dureader-robust-zh |
| Classification | Fine-tuning | cmnli-zh, 🔥cmnli-mini-zh, 🔥jd-sentiment-zh, 🔥hc3-zh, 🔥hc3-en |
| Quantization Assist | Quantization | pileval |
| Other | Fine-tuning | finance-en, poetry-zh, webnovel-zh, generated-chat-zh, cls-fudan-news-zh, ner-jave-zh |
| Vision | Fine-tuning | coco-en, 🔥coco-mini-en, coco-mini-en-2, capcha-images |
| Audio | Fine-tuning | aishell1-zh, 🔥aishell1-mini-zh |

Supported Technologies

Technology Name
🔥LoRA: LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS
🔥LoRA+: LoRA+: Efficient Low Rank Adaptation of Large Models
🔥LLaMA PRO: LLAMA PRO: Progressive LLaMA with Block Expansion
🔥SCEdit: SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing < arXiv | Project Page >
🔥NEFTune: Noisy Embeddings Improve Instruction Finetuning
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
ROME: Rank-One Editing of Encoder-Decoder Models
Adapter: Parameter-Efficient Transfer Learning for NLP
Prompt Tuning: Visual Prompt Tuning
Side: Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks
Res-Tuning: Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone < arXiv | Project Page | Usage >
Tuners provided by PEFT, such as IA3, AdaLoRA, etc.

Supported Hardware

| Hardware Environment | Notes |
| --- | --- |
| CPU | |
| RTX 20/30/40 series, etc. | BF16 and FlashAttn usable from the 30 series onward |
| Computing cards T4/V100, etc. | BF16 and FlashAttn not supported |
| Computing cards A10/A100, etc. | BF16 and FlashAttn supported |
| Huawei Ascend NPU | |

📃 Documentation

Documentation Compiling

make docs
# Check docs/build/html/index.html in web-browser

User Guide

Document Name
Using Web-UI
Using Tuners
LLM Inference
LLM Fine-tuning
LLM Evaluation
LLM Quantization
LLM Deployment
DPO Human Alignment Training
AnimateDiff Training

Reference Documentation

Document Name
Command Line Arguments
Customizing New Models and Datasets
Supported Models and Datasets List
Runtime Speed and Memory Benchmark

Best Practices

Best Practices Name
Agent Fine-Tuning Best Practice
Self-Cognition Fine-Tuning Best Practice
Qwen1.5 Best Practice
Multi-Modal Model Training Best Practice
NPU Best Practice

Deep Learning Tutorials

Tutorial Name
Introduction to Deep Learning
Large Model Basics
Prompt Engineering
Transformer Architecture Introduction
Training Technique Selection
Data Preprocessing
Quantization
Training
Inference
Deployment
Evaluation

🏛 License

This framework is licensed under the Apache License (Version 2.0). For models and datasets, please refer to the original resource page and follow the corresponding License.

📎 Citation

@Misc{swift,
  title = {SWIFT: Scalable lightWeight Infrastructure for Fine-Tuning},
  author = {The ModelScope Team},
  howpublished = {\url{https://github.com/modelscope/swift}},
  year = {2024}
}

☎ Contact Us

You can contact and communicate with us by joining our WeChat group:



swift's Issues

Error when running the simple usage example for LLM fine-tuning

Running the simple usage example from examples/pytorch/llm/README_CN.md:
Environment: Win10, Anaconda, Python 3.9
GPU: 3090

Error message:
[INFO:swift] Model file modeling_qwen.py is different from the latest version v1.1.6,This is because you are using an older version or the file is updated manually.
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\Anaconda3\envs\Swift_py39\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "D:\Anaconda3\envs\Swift_py39\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "D:\Anaconda3\envs\Swift_py39\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "D:\Anaconda3\envs\Swift_py39\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "D:\Anaconda3\envs\Swift_py39\lib\runpy.py", line 288, in run_path
return _run_module_code(code, init_globals, run_name,
File "D:\Anaconda3\envs\Swift_py39\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "D:\Anaconda3\envs\Swift_py39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Workshop\swift\examples\pytorch\llm\test.py", line 19, in
best_ckpt_dir = sft_main(sft_args)
File "D:\Anaconda3\envs\Swift_py39\lib\site-packages\swift\llm\utils\utils.py", line 193, in x_main
return llm_x(args)
File "D:\Anaconda3\envs\Swift_py39\lib\site-packages\swift\llm\sft.py", line 232, in llm_sft
trainer.train(training_args.resume_from_checkpoint)
File "D:\Anaconda3\envs\Swift_py39\lib\site-packages\transformers\trainer.py", line 1591, in train
return inner_training_loop(
File "D:\Anaconda3\envs\Swift_py39\lib\site-packages\transformers\trainer.py", line 1870, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "D:\Anaconda3\envs\Swift_py39\lib\site-packages\accelerate\data_loader.py", line 448, in iter
dataloader_iter = super().iter()
File "D:\Anaconda3\envs\Swift_py39\lib\site-packages\torch\utils\data\dataloader.py", line 442, in iter
return self._get_iterator()
File "D:\Anaconda3\envs\Swift_py39\lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "D:\Anaconda3\envs\Swift_py39\lib\site-packages\torch\utils\data\dataloader.py", line 1043, in init
w.start()
File "D:\Anaconda3\envs\Swift_py39\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "D:\Anaconda3\envs\Swift_py39\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\Anaconda3\envs\Swift_py39\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "D:\Anaconda3\envs\Swift_py39\lib\multiprocessing\popen_spawn_win32.py", line 45, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "D:\Anaconda3\envs\Swift_py39\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "D:\Anaconda3\envs\Swift_py39\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

0%| | 0/1 [00:00<?, ?it/s]

RuntimeError: Device index must not be negative

Image

pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel

Commands executed

git clone https://github.com/modelscope/swift.git
cd swift/examples/pytorch/llm
bash scripts/qwen_7b/qlora/sft.sh

Result

Loading checkpoint shards:   0%|                                                                                                                                                                              | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/swift/examples/pytorch/llm/src/llm_sft.py", line 291, in <module>
    llm_sft(args)
  File "/root/swift/examples/pytorch/llm/src/llm_sft.py", line 167, in llm_sft
    model, tokenizer = get_model_tokenizer(
  File "/root/swift/examples/pytorch/llm/src/utils/models.py", line 259, in get_model_tokenizer
    model, tokenizer = get_function(model_dir, torch_dtype, load_model,
  File "/root/swift/examples/pytorch/llm/src/utils/models.py", line 151, in get_model_tokenizer_qwen
    return get_model_tokenizer_from_repo(model_dir, torch_dtype, load_model,
  File "/root/swift/examples/pytorch/llm/src/utils/models.py", line 44, in get_model_tokenizer_from_repo
    model = AutoModelForCausalLM.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 98, in from_pretrained
    model = module_class.from_pretrained(model_dir, *model_args,
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 64, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3260, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 725, in _load_state_dict_into_meta_model
    set_module_quantized_tensor_to_device(
  File "/opt/conda/lib/python3.10/site-packages/transformers/utils/bitsandbytes.py", line 109, in set_module_quantized_tensor_to_device
    new_value = value.to(device)
RuntimeError: Device index must not be negative
root@dlcl4o079d5hls8w-master-0:~/swift/examples/pytorch/llm# python 
Python 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> 
>>> print(torch.__version__)
2.0.1
>>> 
>>> 
>>> import torch
>>> flag = torch.cuda.is_available()
>>> if flag:
...     print("CUDA is available")
... else:
...     print("CUDA is not available")
... 
CUDA is available
>>> 
>>> ngpu= 1
>>> # Decide which device we want to run on
>>> device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")
>>> 
>>> device
device(type='cuda', index=0)
>>> print("驱动为:",device)
驱动为: cuda:0
>>> print("GPU型号: ",torch.cuda.get_device_name(0))
GPU型号:  NVIDIA A10
>>> 

OOM when full fine-tuning the qwen-vl model

Environment: 8 * A100-80G

Fine-tuning approaches tried:
1. Direct full mode: OOM.
2. deepspeed + freeze visual + fine-tune LLM: runs fine.
Question: fine-tune visual + freeze LLM reports OOM. I went through the visual code and didn't find anything wrong. Hoping someone can explain.

After fine-tuning seqgpt, using the fine-tuned ckpt as the resume_from_ckpt argument raises an error

Traceback (most recent call last):
File "src/llm_sft.py", line 242, in
llm_sft(args)
File "src/llm_sft.py", line 218, in llm_sft
trainer.train(training_args.resume_from_checkpoint)
File "/root/anaconda3/envs/swift/lib/python3.8/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/root/anaconda3/envs/swift/lib/python3.8/site-packages/transformers/trainer.py", line 1705, in _inner_training_loop
self._load_optimizer_and_scheduler(resume_from_checkpoint)
File "/root/anaconda3/envs/swift/lib/python3.8/site-packages/transformers/trainer.py", line 2496, in _load_optimizer_and_scheduler
self.optimizer.load_state_dict(
File "/root/anaconda3/envs/swift/lib/python3.8/site-packages/accelerate/optimizer.py", line 107, in load_state_dict
self.optimizer.load_state_dict(state_dict)
File "/root/anaconda3/envs/swift/lib/python3.8/site-packages/torch/optim/optimizer.py", line 390, in load_state_dict
raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

Using qlora with Qwen-7b-chat and a local dataset, the following error occurs. The lora approach worked fine for me before.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/train_file//opensource/swift-main/examples/pytorch/llm/llm_sft.py", line 2, in
import custom
File "/train_file/xx/opensource/swift-main/examples/pytorch/llm/custom.py", line 8, in
from swift.llm import (ConversationsPreprocessor,QueryPreprocessor, LoRATM, Template,TemplateType,
File "/train_file/xxx/opensource/swift-main/swift/llm/init.py", line 2, in
from .infer import llm_infer
File "/train_file/xx/opensource/swift-main/swift/llm/infer.py", line 6, in
from modelscope import BitsAndBytesConfig, GenerationConfig
File "", line 1075, in _handle_fromlist
File "/opt/conda/lib/python3.10/site-packages/modelscope/utils/import_utils.py", line 422, in getattr
module = self._get_module(self._class_to_module[name])
File "/opt/conda/lib/python3.10/site-packages/modelscope/utils/import_utils.py", line 441, in _get_module
raise RuntimeError(
RuntimeError: Failed to import modelscope.utils.hf_util because of the following error (look up to see its traceback):
Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
Failed to import transformers.generation.utils because of the following error (look up to see its traceback):

    CUDA Setup failed despite GPU being available. Please run the following command to get more information:

    python -m bitsandbytes

    Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
    to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
    and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

Incremental training

Will incremental (continued) training of the Qwen models be supported later?

raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)

When training qwen_7b/lora_ddp/sft.sh, the following error appears after training for a while. It also looks like more than 1 epoch had already been trained, i.e. the dataset had been passed through once. What could be the cause?

12%|█▏        | 2899/25120 [2:12:20<16:58:37,  2.75s/it]
 12%|█▏        | 2900/25120 [2:12:23<16:39:43,  2.70s/it]
                                                         
{'loss': 2.011, 'learning_rate': 9.81e-05, 'epoch': 1.85, 'global_step': 2900}

 12%|█▏        | 2900/25120 [2:12:23<16:39:43,  2.70s/it]
 12%|█▏        | 2900/25120 [2:12:23<16:39:43,  2.70s/it]
 12%|█▏        | 2901/25120 [2:12:26<16:59:40,  2.75s/it]
 12%|█▏        | 2902/25120 [2:12:28<16:18:49,  2.64s/it]
 12%|█▏        | 2903/25120 [2:12:31<16:37:35,  2.69s/it]
 12%|█▏        | 2904/25120 [2:12:34<16:56:03,  2.74s/it]
 12%|█▏        | 2905/25120 [2:12:37<17:37:09,  2.86s/it]WARNING:torch.distributed.elastic.agent.server.api:Received 1 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2919043 closing signal SIGHUP
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2919044 closing signal SIGHUP
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2919045 closing signal SIGHUP
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2919046 closing signal SIGHUP
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2919047 closing signal SIGHUP
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2919048 closing signal SIGHUP
Traceback (most recent call last):
  File "miniconda3/envs/pygpt/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 241, in launch_agent
    result = agent.run()
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py", line 723, in run
    result = self._invoke_run(role)
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py", line 864, in _invoke_run
    time.sleep(monitor_interval)
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 62, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 2918977 got signal: 1

LoRA fine-tuning of 13B models on multiple 4090 cards reports insufficient GPU memory

Experimenting with lora_ddp_ds/sft.sh on baichuan2-13b-chat, qwen-14b-chat, and other models all reports OOM (insufficient GPU memory).
nproc_per_node is already set to the number of GPUs, and the corresponding indices were added to CUDA_VISIBLE_DEVICES.
A single card's memory is not enough, but the total memory across the machine's cards is sufficient. How should the script or code be modified so that LoRA fine-tuning runs normally?

The problem occurs at the "loading checkpoint shards" step: loading fails once a single card's memory can't hold the model.

show_freeze_layers

bash scripts/qwen_7b_chat/qlora/sft.sh 
2023-08-24 13:38:40,885 - modelscope - INFO - PyTorch version 2.0.1 Found.
2023-08-24 13:38:40,886 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-08-24 13:38:40,908 - modelscope - INFO - Loading done! Current index file version is 1.8.4, with md5 1cad0e50ca2a3cb304bd047cd1c5803f and a total number of 902 components indexed
Traceback (most recent call last):
  File "/root/swift/examples/pytorch/llm/src/llm_sft.py", line 10, in <module>
    from utils import (DATASET_MAPPING, MODEL_MAPPING, TEMPLATE_MAPPING,
  File "/root/swift/examples/pytorch/llm/src/utils/__init__.py", line 1, in <module>
    from .dataset import DATASET_MAPPING, get_dataset, process_dataset
  File "/root/swift/examples/pytorch/llm/src/utils/dataset.py", line 9, in <module>
    from swift.utils import get_seed
  File "/opt/conda/lib/python3.10/site-packages/swift/__init__.py", line 4, in <module>
    from .utils.import_utils import _LazyModule
  File "/opt/conda/lib/python3.10/site-packages/swift/utils/__init__.py", line 4, in <module>
    from .torch_utils import (add_version_to_work_dir, get_seed, is_master,
ImportError: cannot import name 'show_freeze_layers' from 'swift.utils.torch_utils' (/opt/conda/lib/python3.10/site-packages/swift/utils/torch_utils.py)

Looks like some code was missed in a commit.

Fine-tuning error on Windows; please help me figure out how to resolve it

Traceback (most recent call last):
File "H:\ai\qwen\swift\examples\pytorch\llm\src\llm_sft.py", line 321, in
llm_sft(args)
File "H:\ai\qwen\swift\examples\pytorch\llm\src\llm_sft.py", line 299, in llm_sft
trainer.train(trainer_args.resume_from_checkpoint)
File "C:\Python310\lib\site-packages\transformers\trainer.py", line 1539, in train
return inner_training_loop(
File "C:\Python310\lib\site-packages\transformers\trainer.py", line 1787, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "C:\Python310\lib\site-packages\accelerate\data_loader.py", line 381, in iter
Traceback (most recent call last):
dataloader_iter = super().iter()
File "", line 1, in
File "C:\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 438, in iter
return self._get_iterator()
File "C:\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 386, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
File "C:\Python310\lib\site-packages\torch\utils\data\dataloader.py", line 1058, in init
exitcode = _main(fd, parent_sentinel)
File "C:\Python310\lib\multiprocessing\spawn.py", line 126, in _main
w.start()
File "C:\Python310\lib\multiprocessing\process.py", line 121, in start
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
self._popen = self._Popen(self)
File "C:\Python310\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Python310\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in init
reduction.dump(process_obj, to_child)
File "C:\Python310\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'builtins.CoreBPE' object
0%| | 0/3 [00:06<?, ?it/s]

Full-parameter fine-tuning of Qwen reports OOM

Full-parameter fine-tuning of Qwen with torchrun reports OOM.
Machine configuration: dual A800, batch_size of 1

On a single machine with multiple GPUs, how can training be accelerated?

AttributeError: 'MsDataset' object has no attribute 'info'

Traceback (most recent call last):
File "/data/home/tgyhyt/project/swift-main/examples/pytorch/llm/src/llm_sft.py", line 338, in
llm_sft(args)
File "/data/home/tgyhyt/project/swift-main/examples/pytorch/llm/src/llm_sft.py", line 230, in llm_sft
dataset = get_dataset(args.dataset.split(','))
File "/data/home/tgyhyt/project/swift-main/examples/pytorch/llm/src/utils/dataset.py", line 339, in get_dataset
dataset_list.append(get_function())
File "/data/home/tgyhyt/project/swift-main/examples/pytorch/llm/src/utils/dataset.py", line 181, in get_coco_en_dataset
return _process_mutimodal_dataset(dataset, 'please describe the image', 'image', 'caption')
File "/data/home/tgyhyt/project/swift-main/examples/pytorch/llm/src/utils/dataset.py", line 152, in _process_mutimodal_dataset
dataset.features._column_requires_decoding['image'] = False
AttributeError: 'MsDataset' object has no attribute 'info'

RuntimeError: self and mat2 must have the same dtype

(gpt) root@autodl-container-9e2911833c-bcf1743a:~/autodl-tmp/swift-main/examples/pytorch/llm# CUDA_VISIBLE_DEVICES=0 python src/llm_sft.py --model_type qwen-7b --sft_type lora --dtype bf16 --output_dir runs --dataset alpaca-en,alpaca-zh --dataset_sample -1 --num_train_epochs 1 --max_length 1024 --quantization_bit 4 --lora_rank 64 --lora_alpha 32 --lora_dropout_p 0.05 --lora_target_modules ALL --batch_size 1 --weight_decay 0. --learning_rate 1e-4 --gradient_accumulation_steps 16 --max_grad_norm 0.5 --warmup_ratio 0.03 --eval_steps 50 --save_steps 50 --save_total_limit 2 --logging_steps 10 --use_flash_attn false --push_to_hub false --hub_model_id qwen-7b-qlora --hub_private_repo true --hub_token 'your-sdk-token'
2023-08-24 15:54:28,792 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2023-08-24 15:54:28,793 - modelscope - INFO - Loading ast index from /root/autodl-tmp/.cache/modelscope/hub/ast_indexer
2023-08-24 15:54:28,829 - modelscope - INFO - Loading done! Current index file version is 1.8.1, with md5 1f897f6541cc699224f7379a0c996b2e and a total number of 893 components indexed

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /root/miniconda3/envs/gpt/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
/root/miniconda3/envs/gpt/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/root/miniconda3/envs/gpt/lib/libcudart.so'), PosixPath('/root/miniconda3/envs/gpt/lib/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /root/miniconda3/envs/gpt/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /root/miniconda3/envs/gpt/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
2023-08-24 15:54:31,955 - swift - INFO - Setting template_type: chatml
2023-08-24 15:54:31,955 - swift - INFO - args: SftArguments(model_type='qwen-7b', sft_type='lora', template_type='chatml', output_dir='runs/qwen-7b', ddp_backend=None, seed=42, resume_from_ckpt=None, dtype='bf16', ignore_args_error=False, dataset='alpaca-en,alpaca-zh', dataset_seed=42, dataset_sample=-1, dataset_test_size=0.01, system='you are a helpful assistant!', max_length=1024, quantization_bit=4, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, lora_target_modules=['ALL'], lora_rank=64, lora_alpha=32, lora_dropout_p=0.05, gradient_checkpoint=True, batch_size=1, num_train_epochs=1, optim='adamw_torch', learning_rate=0.0001, weight_decay=0.0, gradient_accumulation_steps=16, max_grad_norm=0.5, lr_scheduler_type='cosine', warmup_ratio=0.03, eval_steps=50, save_steps=50, save_total_limit=2, logging_steps=10, push_to_hub=False, hub_model_id='qwen-7b-qlora', hub_private_repo=True, hub_strategy='every_save', hub_token='your-sdk-token', use_flash_attn=False)
device_count: 1
rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
2023-08-24 15:54:31,955 - swift - INFO - Global seed set to 42
2023-08-24 15:54:31,956 - swift - INFO - quantization_config: {'load_in_8bit': False, 'load_in_4bit': True, 'llm_int8_threshold': 6.0, 'llm_int8_skip_modules': None, 'llm_int8_enable_fp32_cpu_offload': False, 'llm_int8_has_fp16_weight': False, 'bnb_4bit_quant_type': 'nf4', 'bnb_4bit_use_double_quant': True, 'bnb_4bit_compute_dtype': torch.bfloat16}
2023-08-24 15:54:32,165 - modelscope - INFO - Use user-specified model revision: v.1.0.4
2023-08-24 15:54:32,460 - swift - INFO - model_config: QWenConfig {
"_name_or_path": "/root/autodl-tmp/.cache/modelscope/hub/qwen/Qwen-7B",
"activation": "swiglu",
"apply_residual_connection_post_layernorm": false,
"architectures": [
"QWenLMHeadModel"
],
"attn_pdrop": 0.0,
"auto_map": {
"AutoConfig": "configuration_qwen.QWenConfig",
"AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
},
"bf16": true,
"bias_dropout_fusion": true,
"bos_token_id": 151643,
"embd_pdrop": 0.0,
"eos_token_id": 151643,
"ffn_hidden_size": 22016,
"fp16": false,
"fp32": false,
"initializer_range": 0.02,
"kv_channels": 128,
"layer_norm_epsilon": 1e-06,
"model_dir": "/root/autodl-tmp/.cache/modelscope/hub/qwen/Qwen-7B",
"model_type": "qwen",
"n_embd": 4096,
"n_head": 32,
"n_inner": null,
"n_layer": 32,
"n_positions": 6144,
"no_bias": true,
"onnx_safe": null,
"padded_vocab_size": 151936,
"params_dtype": "torch.bfloat16",
"pos_emb": "rotary",
"resid_pdrop": 0.1,
"rotary_emb_base": 10000,
"rotary_pct": 1.0,
"scale_attn_weights": true,
"seq_length": 2048,
"tie_word_embeddings": false,
"tokenizer_type": "QWenTokenizer",
"torch_dtype": "bfloat16",
"transformers_version": "4.30.2",
"use_cache": true,
"use_dynamic_ntk": true,
"use_flash_attn": false,
"use_logn_attn": true,
"vocab_size": 151936
}

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:15<00:00, 1.97s/it]
Using pad_token, but it is not set yet.
2023-08-24 15:54:53,136 - swift - INFO - Setting lora_target_modules: ['c_attn', 'w1', 'c_proj', 'w2']
2023-08-24 15:54:53,136 - swift - INFO - lora_config: get_wrapped_class.<locals>.PeftWrapper(peft_type=<PeftType.LORA: 'LORA'>, base_model_name_or_path=None, task_type='CAUSAL_LM', inference_mode=False, r=64, target_modules=['c_attn', 'w1', 'c_proj', 'w2'], lora_alpha=32, lora_dropout=0.05, fan_in_fan_out=False, bias='none', modules_to_save=None, init_lora_weights=True)
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.wte.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.ln_1.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.attn.c_attn.weight]: requires_grad=False, dtype=torch.uint8, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.attn.c_attn.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.attn.c_attn.lora_A.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.attn.c_attn.lora_B.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.attn.c_proj.weight]: requires_grad=False, dtype=torch.uint8, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.attn.c_proj.lora_A.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.attn.c_proj.lora_B.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.ln_2.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.mlp.w1.weight]: requires_grad=False, dtype=torch.uint8, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.mlp.w1.lora_A.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.mlp.w1.lora_B.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.mlp.w2.weight]: requires_grad=False, dtype=torch.uint8, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.mlp.w2.lora_A.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.mlp.w2.lora_B.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.mlp.c_proj.weight]: requires_grad=False, dtype=torch.uint8, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.mlp.c_proj.lora_A.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.0.mlp.c_proj.lora_B.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
2023-08-24 15:56:29,482 - swift - INFO - [base_model.model.transformer.h.1.ln_1.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
2023-08-24 15:56:29,483 - swift - INFO - ...
2023-08-24 15:56:29,492 - swift - INFO - PeftModelForCausalLM: 4626.4525M Params (143.1306M Trainable), 1207.9596M Buffers.
2023-08-24 15:56:29,493 - modelscope - INFO - No subset_name specified, defaulting to the default
2023-08-24 15:56:30,026 - modelscope - WARNING - Reusing dataset alpaca-gpt4-data-en (/root/.cache/modelscope/hub/datasets/AI-ModelScope/alpaca-gpt4-data-en/master/data_files)
2023-08-24 15:56:30,026 - modelscope - INFO - Generating dataset alpaca-gpt4-data-en (/root/.cache/modelscope/hub/datasets/AI-ModelScope/alpaca-gpt4-data-en/master/data_files)
2023-08-24 15:56:30,026 - modelscope - INFO - Reusing cached meta-data file: /root/.cache/modelscope/hub/datasets/AI-ModelScope/alpaca-gpt4-data-en/master/data_files/66247e987561e76d71cc064cb302eb31
Downloading data files: 0it [00:00, ?it/s]
Extracting data files: 0it [00:00, ?it/s]
2023-08-24 15:56:31,834 - modelscope - INFO - No subset_name specified, defaulting to the default
2023-08-24 15:56:32,312 - modelscope - WARNING - Reusing dataset alpaca-gpt4-data-zh (/root/.cache/modelscope/hub/datasets/AI-ModelScope/alpaca-gpt4-data-zh/master/data_files)
2023-08-24 15:56:32,312 - modelscope - INFO - Generating dataset alpaca-gpt4-data-zh (/root/.cache/modelscope/hub/datasets/AI-ModelScope/alpaca-gpt4-data-zh/master/data_files)
2023-08-24 15:56:32,312 - modelscope - INFO - Reusing cached meta-data file: /root/.cache/modelscope/hub/datasets/AI-ModelScope/alpaca-gpt4-data-zh/master/data_files/d17e7f3c34d5d65c37d14ef32c78bfc3
Downloading data files: 0it [00:00, ?it/s]
Extracting data files: 0it [00:00, ?it/s]
2023-08-24 15:58:03,046 - swift - INFO - Dataset Token Length: 170.389767±111.748190, min=27.000000, max=857.000000, size=99811
2023-08-24 15:58:03,226 - swift - INFO - Dataset Token Length: 174.365709±110.360343, min=31.000000, max=557.000000, size=1009
2023-08-24 15:58:03,227 - swift - INFO - [INPUT_IDS] [151644, 8948, 198, 9330, 525, 264, 10950, 17847, 0, 151645, 198, 151644, 872, 198, 58465, 1247, 279, 2701, 11652, 311, 1281, 432, 16245, 1447, 785, 4143, 525, 12035, 911, 862, 14487, 16319, 624, 151645, 198, 151644, 77091, 198, 785, 4143, 525, 1411, 40033, 448, 27262, 323, 49819, 369, 862, 14487, 16319, 13, 151645, 151643]
2023-08-24 15:58:03,227 - swift - INFO - [INPUT] <|im_start|>system
you are a helpful assistant!<|im_end|>
<|im_start|>user
Rewrite the following sentence to make it stronger:

The students are excited about their upcoming assignment.
<|im_end|>
<|im_start|>assistant
The students are brimming with excitement and anticipation for their upcoming assignment.<|im_end|><|endoftext|>
2023-08-24 15:58:03,227 - swift - INFO - [LABLES_IDS] [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 785, 4143, 525, 1411, 40033, 448, 27262, 323, 49819, 369, 862, 14487, 16319, 13, 151645, 151643]
2023-08-24 15:58:03,227 - swift - INFO - [LABLES] [-100 * 38]The students are brimming with excitement and anticipation for their upcoming assignment.<|im_end|><|endoftext|>
2023-08-24 15:58:03,228 - swift - INFO - work_dir: /root/autodl-tmp/swift-main/examples/pytorch/llm/runs/qwen-7b/v0-20230824-155803
2023-08-24 15:58:03,231 - swift - INFO - trainer_args: Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=1,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=50,
evaluation_strategy=steps,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=16,
gradient_checkpointing=True,
greater_is_better=False,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=qwen-7b-qlora,
hub_private_repo=True,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=True,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/root/autodl-tmp/swift-main/examples/pytorch/llm/runs/qwen-7b/v0-20230824-155803/runs/Aug24_15-58-03_autodl-container-9e2911833c-bcf1743a,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=cosine,
max_grad_norm=0.5,
max_steps=-1,
metric_for_best_model=loss,
mp_parameters=,
no_cuda=False,
num_train_epochs=1,
optim=adamw_torch,
optim_args=None,
output_dir=/root/autodl-tmp/swift-main/examples/pytorch/llm/runs/qwen-7b/v0-20230824-155803,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=1,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard', 'wandb'],
resume_from_checkpoint=None,
run_name=/root/autodl-tmp/swift-main/examples/pytorch/llm/runs/qwen-7b/v0-20230824-155803,
save_on_each_node=False,
save_safetensors=False,
save_steps=50,
save_strategy=steps,
save_total_limit=2,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=True,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.03,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
2023-08-24 15:58:03,755 - swift - INFO - Model file config.json is different from the latest version v1.0.5,This is because you are using an older version or the file is updated manually.
0%| | 0/6238 [00:00<?, ?it/s]use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
Traceback (most recent call last):
  File "/root/autodl-tmp/swift-main/examples/pytorch/llm/src/llm_sft.py", line 323, in <module>
    llm_sft(args)
  File "/root/autodl-tmp/swift-main/examples/pytorch/llm/src/llm_sft.py", line 301, in llm_sft
    trainer.train(trainer_args.resume_from_checkpoint)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/transformers/trainer.py", line 2759, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/transformers/trainer.py", line 2784, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
    return model_forward(*args, **kwargs)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/autodl-tmp/.cache/huggingface/hub/modules/transformers_modules/Qwen-7B/modeling_qwen.py", line 925, in forward
    transformer_outputs = self.transformer(
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/autodl-tmp/.cache/huggingface/hub/modules/transformers_modules/Qwen-7B/modeling_qwen.py", line 756, in forward
    outputs = torch.utils.checkpoint.checkpoint(
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/root/autodl-tmp/.cache/huggingface/hub/modules/transformers_modules/Qwen-7B/modeling_qwen.py", line 752, in custom_forward
    return module(*inputs, use_cache, output_attentions)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/autodl-tmp/.cache/huggingface/hub/modules/transformers_modules/Qwen-7B/modeling_qwen.py", line 523, in forward
    attn_outputs = self.attn(
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/autodl-tmp/.cache/huggingface/hub/modules/transformers_modules/Qwen-7B/modeling_qwen.py", line 367, in forward
    mixed_x_layer = self.c_attn(hidden_states)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/gpt/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: self and mat2 must have the same dtype
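
The RuntimeError above is a plain precision mismatch inside F.linear: the activations reaching the LoRA-wrapped c_attn layer carry one dtype while the layer's weight carries another, which typically happens when the requested --dtype does not match the loaded checkpoint or when gradient checkpointing re-runs the forward pass outside the autocast region. A minimal repro of the failure mode (an illustration, not swift's code):

# illustration only: mixed input/weight precision makes the underlying
# matmul raise a same-dtype RuntimeError (wording varies across torch versions)
import torch
import torch.nn.functional as F

x = torch.randn(2, 8)                          # float32 activations
w = torch.randn(4, 8, dtype=torch.bfloat16)    # bfloat16 weight
try:
    F.linear(x, w)
except RuntimeError as err:
    print(err)                                 # "... must have the same dtype"
out = F.linear(x.to(w.dtype), w)               # casting the input resolves it

In practice this usually means aligning --dtype with the checkpoint (bf16 for Qwen-7B here) or making sure the k-bit preparation step casts embeddings and norms consistently.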

Which parameter in sft.sh loads a local offline model? Changing model_type doesn't work. (See the sketch after the script below.)

nproc_per_node=2

#PYTHONPATH=/home/user/miniconda/envs/swift/bin/python
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    /root/swift/examples/pytorch/llm/src/llm_sft.py \
    --model_type qwen-14b-chat-int4 \
    --sft_type lora \
    --template_type chatml \
    --dtype fp16 \
    --output_dir /root/swift_output \
    --ddp_backend nccl \
    --dataset csc-zh \
    --train_dataset_sample 20000 \
    --num_train_epochs 1 \
    --max_length 4096 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0. \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0. \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn false \
    --push_to_hub false \
    --deepspeed_config_path 'ds_config/zero2.json' \
    --only_save_model true
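
On the local-model question: --model_type selects the model family (template, config, defaults), not where the weights come from. A hedged sketch, assuming the --model_cache_dir argument present in llm_sft.py of this period (verify the exact flag against your version's argument list):

# keep --model_type for template/config selection and point the weights at a
# local directory; --model_cache_dir is assumed here -- check llm_sft.py
python /root/swift/examples/pytorch/llm/src/llm_sft.py \
    --model_type qwen-14b-chat-int4 \
    --model_cache_dir /path/to/local/qwen-14b-chat-int4 \
    --sft_type lora \
    --dataset csc-zh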

Can you support Qwen-VL's official fine-tuning format?

  {
    "id": "identity_1",
    "conversations": [
      {
        "from": "user",
        "value": "Picture 1: <img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>\n图中的狗是什么品种?"
      },
      {
        "from": "assistant",
        "value": "图中是一只拉布拉多犬。"
      },
      {
        "from": "user",
        "value": "框出图中的格子衬衫"
      },
      {
        "from": "assistant",
        "value": "<ref>格子衬衫</ref><box>(588,499),(725,789)</box>"
      }
    ]
  }

Can fine-tuning data like this be supported? (A conversion sketch follows the link below.)
Reference:
https://github.com/QwenLM/Qwen-VL/blob/master/README_CN.md
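
swift's own logs further down show its processed datasets carry ['system', 'query', 'response', 'history'] columns, so the Qwen-VL conversations format can usually be flattened into that shape with a small preprocessing step. A hedged sketch (the pairing logic and file names are assumptions, not swift's API):

# flatten Qwen-VL "conversations" records into query/response/history rows;
# file names are placeholders
import json

def convert_record(record: dict) -> dict:
    turns = record['conversations']
    # pair consecutive user/assistant messages into (query, response)
    pairs = [(turns[i]['value'], turns[i + 1]['value'])
             for i in range(0, len(turns) - 1, 2)]
    query, response = pairs[-1]
    return {'query': query, 'response': response, 'history': pairs[:-1]}

with open('qwen_vl_data.json') as f:
    data = json.load(f)
with open('swift_custom.jsonl', 'w') as f:
    for record in data:
        f.write(json.dumps(convert_record(record), ensure_ascii=False) + '\n')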

How do I disable hadoop?

When fine-tuning qwen-7b-chat with QLoRA, hadoop-2.7.4 keeps getting invoked. I only have one GPU, and these calls cause permission problems. How can I prevent them in the code? Thanks!
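
swift itself does not call hadoop; the invocation almost certainly comes from a dependency probing HDFS (pyarrow's legacy HDFS client, for one, shells out to the hadoop binary when it can find it). Hiding hadoop from the training process is usually enough; a hedged sketch (the launch command is illustrative):

# hide hadoop from the process so filesystem probing cannot shell out to it;
# run in the shell that launches training
unset HADOOP_HOME HADOOP_CONF_DIR ARROW_LIBHDFS_DIR
export PATH=$(echo "$PATH" | tr ':' '\n' | grep -v hadoop | paste -sd ':' -)
bash scripts/qwen_7b_chat/qlora/sft.sh   # illustrative script path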

IndexError: index 0 is out of bounds for dimension 0 with size 0

Training Baichuan2-7B raises this error.
The dataset is my own; it trains fine with the qwen-7b-lora-ddp script, but training Baichuan fails (using the default downloaded config).

(interleaved copies of the same IndexError traceback from the other ranks omitted; one clean copy follows)
Traceback (most recent call last):
  File "examples/pytorch/llm/src/llm_sft.py", line 345, in <module>
    llm_sft(args)
  File "examples/pytorch/llm/src/llm_sft.py", line 255, in llm_sft
    output_dir = broadcast_string(output_dir)
  File "examples/pytorch/llm/src/utils/utils.py", line 181, in broadcast_string
    first_zero = (tensor == 0).nonzero()[0].item()
IndexError: index 0 is out of bounds for dimension 0 with size 0
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2720322 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2720323 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2720318) of binary: /mnt/data1/yingzhi/dlframework/miniconda3/envs/pygpt/bin/python
Traceback (most recent call last):
  File "/mnt/data1/yingzhi/dlframework/miniconda3/envs/pygpt/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "miniconda3/envs/pygpt/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
src/llm_sft.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-09-09_21:53:07
  host      : ubuntu-4U-GPU-Server
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 2720319)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2023-09-09_21:53:07
  host      : ubuntu-4U-GPU-Server
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 2720320)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2023-09-09_21:53:07
  host      : ubuntu-4U-GPU-Server
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 2720321)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-09-09_21:53:07
  host      : ubuntu-4U-GPU-Server
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2720318)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
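
Every rank dies in broadcast_string: (tensor == 0).nonzero()[0] indexes an empty result, which means the broadcast buffer carried no zero terminator -- for example when the string fills the buffer exactly, or when the collective runs before all ranks have joined the process group. A defensive version of such a helper might look like this; a hedged sketch, not swift's actual utils.py:

# defensive broadcast_string sketch: always reserve a zero terminator and
# tolerate a buffer without one on the receiving side (assumes
# torch.distributed is already initialized)
import torch
import torch.distributed as dist

def broadcast_string(s: str, buffer_size: int = 1024) -> str:
    data = s.encode('utf-8')[:buffer_size - 1]   # keep room for '\0'
    buf = torch.zeros(buffer_size, dtype=torch.uint8, device='cuda')
    buf[:len(data)] = torch.tensor(list(data), dtype=torch.uint8)
    dist.broadcast(buf, src=0)
    zeros = (buf == 0).nonzero()
    end = zeros[0].item() if zeros.numel() > 0 else buffer_size
    return bytes(buf[:end].tolist()).decode('utf-8', errors='replace')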

Running scripts/qwen_7b_chat/lora/sft.sh with the damo-agent-mini-zh dataset raises an error

[INFO:modelscope] No subset_name specified, defaulting to the default
[WARNING:modelscope] Reusing dataset ms_agent-bench (/u01/liuys/.cache/modelscope/hub/datasets/damo/MSAgent-Bench/master/data_files)
[INFO:modelscope] Generating dataset ms_agent-bench (/u01/liuys/.cache/modelscope/hub/datasets/damo/MSAgent-Bench/master/data_files)
[INFO:modelscope] Reusing cached meta-data file: /u01/liuys/.cache/modelscope/hub/datasets/damo/MSAgent-Bench/master/data_files/d7bde86fe25cc1973db0962e0dfe0b07
[INFO:modelscope] Reusing cached meta-data file: /u01/liuys/.cache/modelscope/hub/datasets/damo/MSAgent-Bench/master/data_files/466e047df4e2dc8c730e0ba6e1d60e41
Downloading data files: 0it [00:00, ?it/s]
Extracting data files: 0it [00:00, ?it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 153600/153600 [00:07<00:00, 19888.54it/s]
0it [00:00, ?it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 360/360 [00:00<00:00, 7210.39it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 152/152 [00:00<00:00, 577476.64it/s]
[INFO:swift] train_dataset: Dataset({
features: ['system', 'query', 'response', 'history'],
num_rows: 0
})
[INFO:swift] val_dataset: Dataset({
features: ['system', 'query', 'response', 'history'],
num_rows: 152
})
0it [00:00, ?it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 152/152 [00:00<00:00, 500.85it/s]
/u01/liuys/swift/package/swift/utils/llm_utils.py:25: RuntimeWarning: Mean of empty slice.
mean = _token_len.mean().item()
/u01/liuys/anaconda3/envs/ms-sft/lib/python3.10/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
ret = ret.dtype.type(ret / rcount)
/u01/liuys/anaconda3/envs/ms-sft/lib/python3.10/site-packages/numpy/core/_methods.py:206: RuntimeWarning: Degrees of freedom <= 0 for slice
ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/u01/liuys/anaconda3/envs/ms-sft/lib/python3.10/site-packages/numpy/core/_methods.py:163: RuntimeWarning: invalid value encountered in divide
arrmean = um.true_divide(arrmean, div, out=arrmean,
/u01/liuys/anaconda3/envs/ms-sft/lib/python3.10/site-packages/numpy/core/_methods.py:198: RuntimeWarning: invalid value encountered in scalar divide
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/u01/liuys/swift/examples/pytorch/llm/src/llm_sft.py", line 241, in <module>
    llm_sft(args)
  File "/u01/liuys/swift/examples/pytorch/llm/src/llm_sft.py", line 110, in llm_sft
    stat_dataset(train_dataset)
  File "/u01/liuys/swift/package/swift/utils/llm_utils.py", line 27, in stat_dataset
    min_ = _token_len.min().item()
  File "/u01/liuys/anaconda3/envs/ms-sft/lib/python3.10/site-packages/numpy/core/_methods.py", line 45, in _amin
    return umr_minimum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation minimum which has no identity
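
The crash is downstream of the real problem, stated plainly earlier in the log: train_dataset has num_rows: 0, so the token-length statistics reduce over a zero-size array -- the damo-agent-mini-zh selection contributed no training rows here. Guarding the stats makes the misconfiguration fail fast; a hedged sketch in the spirit of the stat_dataset in the traceback (the input_ids column is an assumption about the tokenized dataset):

# fail fast on an empty train split instead of letting numpy reductions
# crash with an opaque ValueError
import numpy as np

def stat_dataset(dataset) -> None:
    token_len = np.array([len(ids) for ids in dataset['input_ids']])
    if token_len.size == 0:
        raise ValueError('train_dataset has 0 rows; check --dataset and sampling')
    print(f'Dataset Token Length: {token_len.mean():.6f}±{token_len.std():.6f}, '
          f'min={token_len.min():.6f}, max={token_len.max():.6f}, size={token_len.size}')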

How do I package the model?

I fine-tuned Qwen-7B-Chat with LoRA, but how do I package the trained checkpoint-xxx directory for deployment? I'd like to serve it on my own server or in Function Compute. Hoping someone can take a moment to answer. (A merge-and-save sketch follows.)
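
For packaging: merging the LoRA deltas into the base weights produces a single standalone model directory that any serving stack can load. A hedged sketch using peft's merge_and_unload (paths are placeholders; checkpoint-xxx is the directory swift wrote):

# fold the LoRA checkpoint into the base model and save a self-contained
# directory for deployment; all paths are placeholders
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen-7B-Chat', trust_remote_code=True)
model = PeftModel.from_pretrained(base, '/path/to/checkpoint-xxx')
merged = model.merge_and_unload()      # applies the LoRA deltas to the base weights
merged.save_pretrained('/path/to/qwen-7b-chat-merged')
tokenizer = AutoTokenizer.from_pretrained(
    'Qwen/Qwen-7B-Chat', trust_remote_code=True)
tokenizer.save_pretrained('/path/to/qwen-7b-chat-merged')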
