Comments (7)
The key logs from the fine-tuning run are as follows (a sketch of the matching launch command for the second node is given after the log):
(llamafactory) [root@instance-67wbmebl LLaMA-Factory-0.8.2]# CUDA_VISIBLE_DEVICES=0 FORCE_TORCHRUN=1 NNODES=2 RANK=0 MASTER_ADDR=192.168.32.8 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
[2024-06-29 15:19:05,408] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
06/29/2024 15:19:07 - INFO - llamafactory.cli - Initializing distributed tasks at: 192.168.32.8:29500
[2024-06-29 15:19:32,522] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.1), only 1.0.0 is known to be compatible
[2024-06-29 15:19:34,436] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-29 15:19:34,436] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
06/29/2024 15:19:34 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
06/29/2024 15:19:34 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2159] 2024-06-29 15:19:34,507 >> loading file qwen.tiktoken
[INFO|tokenization_utils_base.py:2159] 2024-06-29 15:19:34,507 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2159] 2024-06-29 15:19:34,508 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2159] 2024-06-29 15:19:34,508 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2159] 2024-06-29 15:19:34,508 >> loading file tokenizer.json
06/29/2024 15:19:34 - INFO - llamafactory.data.template - Add eos token: <|im_end|>
06/29/2024 15:19:34 - INFO - llamafactory.data.template - Add pad token: <|im_end|>
06/29/2024 15:19:34 - INFO - llamafactory.data.loader - Loading dataset time_change5_llama.json...
Converting format of dataset (num_proc=16): 100%|█████████████████████████████████████████████████████████████████████████████| 4000/4000 [00:00<00:00, 29515.42 examples
Running tokenizer on dataset (num_proc=16): 100%|███████████████████████████████████████████████████████████████████████████████| 4000/4000 [00:15<00:00, 258.17 examples
input_ids:
[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, ...]
inputs:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
...
label_ids:
[-100, -100, -100, ..., 515, 1, 3328, 788, 330, 7319, 7689, 10700, 2129, 756, ...]
[INFO|configuration_utils.py:731] 2024-06-29 15:20:34,982 >> loading configuration file /home/models/Qwen-14B-Chat/config.json
[INFO|configuration_utils.py:731] 2024-06-29 15:20:34,983 >> loading configuration file /home/models/Qwen-14B-Chat/config.json
[INFO|configuration_utils.py:800] 2024-06-29 15:20:34,984 >> Model config QWenConfig {
"_name_or_path": "/home/models/Qwen-14B-Chat/",
"architectures": [
"QWenLMHeadModel"
],
"attn_dropout_prob": 0.0,
"auto_map": {
"AutoConfig": "configuration_qwen.QWenConfig",
"AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
},
"bf16": false,
"emb_dropout_prob": 0.0,
"fp16": false,
"fp32": false,
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 27392,
"kv_channels": 128,
"layer_norm_epsilon": 1e-06,
"max_position_embeddings": 8192,
"model_type": "qwen",
"no_bias": true,
"num_attention_heads": 40,
"num_hidden_layers": 40,
"onnx_safe": null,
"rotary_emb_base": 10000,
"rotary_pct": 1.0,
"scale_attn_weights": true,
"seq_length": 8192,
"softmax_in_fp32": false,
"tie_word_embeddings": false,
"tokenizer_class": "QWenTokenizer",
"transformers_version": "4.42.2",
"use_cache": true,
"use_cache_kernel": false,
"use_cache_quantization": false,
"use_dynamic_ntk": true,
"use_flash_attn": "auto",
"use_logn_attn": true,
"vocab_size": 152064
}
[INFO|modeling_utils.py:3553] 2024-06-29 15:20:35,012 >> loading weights file /home/models/Qwen-14B-Chat/model.safetensors.index.json
[INFO|modeling_utils.py:3698] 2024-06-29 15:20:35,012 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[INFO|configuration_utils.py:1000] 2024-06-29 15:20:35,017 >> Generate config GenerationConfig {}
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码,尤其如果你在9月25日已经开始使用Qwen-7B,千万注意不要使用错误代码和模型。
[2024-06-29 15:21:13,136] [INFO] [partition_parameters.py:345:__exit__] finished initializing model - num_params = 323, num_elems = 14.17B
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:58<00:00, 3.91s/
[INFO|modeling_utils.py:4364] 2024-06-29 15:22:11,839 >> All model checkpoint weights were used when initializing QWenLMHeadModel.
[INFO|modeling_utils.py:4372] 2024-06-29 15:22:11,839 >> All the weights of QWenLMHeadModel were initialized from the model checkpoint at /home/models/Qwen-14B-Chat/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use QWenLMHeadModel for predictions without further training.
[INFO|configuration_utils.py:953] 2024-06-29 15:22:11,842 >> loading configuration file /home/models/Qwen-14B-Chat/generation_config.json
[INFO|configuration_utils.py:1000] 2024-06-29 15:22:11,842 >> Generate config GenerationConfig {
"chat_format": "chatml",
"do_sample": true,
"eos_token_id": 151643,
"max_new_tokens": 512,
"max_window_size": 6144,
"pad_token_id": 151643,
"repetition_penalty": 1.1,
"top_k": 0,
"top_p": 0.8
}
06/29/2024 15:22:11 - WARNING - llamafactory.model.model_utils.checkpointing - You are using the old GC format, some features (e.g. BAdam) will be invalid.
06/29/2024 15:22:11 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/29/2024 15:22:11 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
06/29/2024 15:22:11 - INFO - llamafactory.model.adapter - ZeRO3/FSDP/PureBF16/BAdam detected, remaining trainable params as their original precision.
06/29/2024 15:22:11 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/29/2024 15:22:11 - INFO - llamafactory.model.model_utils.misc - Found linear modules: c_attn,c_proj,w1,w2
06/29/2024 15:22:12 - INFO - llamafactory.model.loader - trainable params: 27893760 || all params: 14195184640 || trainable%: 0.1965
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:642] 2024-06-29 15:22:12,247 >> Using auto half precision backend
06/29/2024 15:22:12 - WARNING - llamafactory.extras.callbacks - Previous trainer log in this folder will be deleted.
[INFO|deepspeed.py:329] 2024-06-29 15:22:12,398 >> Detected ZeRO Offload and non-DeepSpeed optimizers: This combination should work as long as the custom optimizer has both CPU and GPU implementation (except LAMB)
Installed CUDA version 12.4 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Emitting ninja build file /root/.cache/torch_extensions/py310_cu121/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 0.2251906394958496 seconds
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000100, betas=(0.900000, 0.999000), weight_decay=0.010000, adam_w=1
[2024-06-29 15:22:12,738] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.4, git-hash=unknown, git-branch=unknown
[2024-06-29 15:22:12,768] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-06-29 15:22:12,771] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-06-29 15:22:12,771] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-06-29 15:22:12,799] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
[2024-06-29 15:22:12,799] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[2024-06-29 15:22:12,799] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False
[2024-06-29 15:22:12,800] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 3 optimizer
[2024-06-29 15:22:12,947] [INFO] [utils.py:781:see_memory_usage] Stage 3 initialize beginning
[2024-06-29 15:22:12,948] [INFO] [utils.py:782:see_memory_usage] MA 0.05 GB Max_MA 4.35 GB CA 0.06 GB Max_CA 4 GB
[2024-06-29 15:22:12,948] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 28.09 GB, percent = 22.3%
[2024-06-29 15:22:12,955] [INFO] [stage3.py:130:__init__] Reduce bucket size 26214400
[2024-06-29 15:22:12,955] [INFO] [stage3.py:131:__init__] Prefetch bucket size 23592960
[2024-06-29 15:22:13,094] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2024-06-29 15:22:13,094] [INFO] [utils.py:782:see_memory_usage] MA 0.05 GB Max_MA 0.05 GB CA 0.06 GB Max_CA 0 GB
[2024-06-29 15:22:13,094] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 28.09 GB, percent = 22.3%
Parameter Offload: Total persistent parameters: 10859520 in 361 params
[2024-06-29 15:22:15,623] [INFO] [utils.py:781:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2024-06-29 15:22:15,624] [INFO] [utils.py:782:see_memory_usage] MA 0.0 GB Max_MA 0.05 GB CA 0.06 GB Max_CA 0 GB
[2024-06-29 15:22:15,624] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 28.14 GB, percent = 22.3%
[2024-06-29 15:22:15,774] [INFO] [utils.py:781:see_memory_usage] Before creating fp16 partitions
[2024-06-29 15:22:15,775] [INFO] [utils.py:782:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.06 GB Max_CA 0 GB
[2024-06-29 15:22:15,775] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 28.14 GB, percent = 22.3%
[2024-06-29 15:22:41,741] [INFO] [utils.py:781:see_memory_usage] After creating fp16 partitions: 1
[2024-06-29 15:22:41,742] [INFO] [utils.py:782:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.06 GB Max_CA 0 GB
[2024-06-29 15:22:41,742] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 28.16 GB, percent = 22.4%
[2024-06-29 15:22:41,894] [INFO] [utils.py:781:see_memory_usage] Before creating fp32 partitions
[2024-06-29 15:22:41,895] [INFO] [utils.py:782:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.06 GB Max_CA 0 GB
[2024-06-29 15:22:41,895] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 28.16 GB, percent = 22.4%
[2024-06-29 15:22:42,060] [INFO] [utils.py:781:see_memory_usage] After creating fp32 partitions
[2024-06-29 15:22:42,061] [INFO] [utils.py:782:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.06 GB Max_CA 0 GB
[2024-06-29 15:22:42,061] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 28.23 GB, percent = 22.4%
[2024-06-29 15:22:42,211] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2024-06-29 15:22:42,212] [INFO] [utils.py:782:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.06 GB Max_CA 0 GB
[2024-06-29 15:22:42,212] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 28.23 GB, percent = 22.4%
[2024-06-29 15:22:42,391] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2024-06-29 15:22:42,392] [INFO] [utils.py:782:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.06 GB Max_CA 0 GB
[2024-06-29 15:22:42,392] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 28.3 GB, percent = 22.5%
[2024-06-29 15:22:42,392] [INFO] [stage3.py:486:_setup_for_real_optimizer] optimizer state initialized
[2024-06-29 15:22:42,702] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2024-06-29 15:22:42,703] [INFO] [utils.py:782:see_memory_usage] MA 0.05 GB Max_MA 0.05 GB CA 0.11 GB Max_CA 0 GB
[2024-06-29 15:22:42,703] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory: used = 28.37 GB, percent = 22.5%
[2024-06-29 15:22:42,703] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer_Stage3
[2024-06-29 15:22:42,703] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-06-29 15:22:42,703] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-06-29 15:22:42,703] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[(0.9, 0.999)]
[2024-06-29 15:22:42,707] [INFO] [config.py:997:print] DeepSpeedEngine configuration:
[2024-06-29 15:22:42,707] [INFO] [config.py:1001:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2024-06-29 15:22:42,707] [INFO] [config.py:1001:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-06-29 15:22:42,707] [INFO] [config.py:1001:print] amp_enabled .................. False
[2024-06-29 15:22:42,707] [INFO] [config.py:1001:print] amp_params ................... False
[2024-06-29 15:22:42,707] [INFO] [config.py:1001:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2024-06-29 15:22:42,707] [INFO] [config.py:1001:print] bfloat16_enabled ............. False
[2024-06-29 15:22:42,707] [INFO] [config.py:1001:print] bfloat16_immediate_grad_update False
[2024-06-29 15:22:42,707] [INFO] [config.py:1001:print] checkpoint_parallel_write_pipeline False
[2024-06-29 15:22:42,707] [INFO] [config.py:1001:print] checkpoint_tag_validation_enabled True
[2024-06-29 15:22:42,707] [INFO] [config.py:1001:print] checkpoint_tag_validation_fail False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7fdec1622980>
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] communication_data_type ...... None
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] curriculum_enabled_legacy .... False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] curriculum_params_legacy ..... False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] data_efficiency_enabled ...... False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] dataloader_drop_last ......... False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] disable_allgather ............ False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] dump_state ................... False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] eigenvalue_enabled ........... False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] eigenvalue_gas_boundary_resolution 1
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] eigenvalue_layer_num ......... 0
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] eigenvalue_max_iter .......... 100
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] eigenvalue_stability ......... 1e-06
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] eigenvalue_tol ............... 0.01
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] eigenvalue_verbose ........... False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] elasticity_enabled ........... False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] fp16_auto_cast ............... False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] fp16_enabled ................. True
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] fp16_master_weights_and_gradients False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] global_rank .................. 0
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] grad_accum_dtype ............. None
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] gradient_accumulation_steps .. 2
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] gradient_clipping ............ 1.0
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] gradient_predivide_factor .... 1.0
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] graph_harvesting ............. False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] initial_dynamic_scale ........ 65536
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] load_universal_checkpoint .... False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] loss_scale ................... 0
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] memory_breakdown ............. False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] mics_hierarchial_params_gather False
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] mics_shard_size .............. -1
[2024-06-29 15:22:42,708] [INFO] [config.py:1001:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') comet=CometConfig(enabled=False, samples_log_interval=100, project=None, workspace=None, api_key=None, experiment_name=None, experiment_key=None, online=None, mode=None) wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] optimizer_legacy_fusion ...... False
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] optimizer_name ............... None
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] optimizer_params ............. None
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] pld_enabled .................. False
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] pld_params ................... False
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] prescale_gradients ........... False
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] scheduler_name ............... None
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] scheduler_params ............. None
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] seq_parallel_communication_data_type torch.float32
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] sparse_attention ............. None
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] sparse_gradients_enabled ..... False
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] steps_per_print .............. inf
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] timers_config ................ enabled=True synchronized=True
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] train_batch_size ............. 4
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] train_micro_batch_size_per_gpu 1
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] use_data_before_expert_parallel_ False
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] use_node_local_storage ....... False
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] wall_clock_breakdown ......... False
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] weight_quantization_config ... None
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] world_size ................... 2
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] zero_allow_untested_optimizer True
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=26214400 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='cpu', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=True) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=True, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=23592960 param_persistence_threshold=51200 model_persistence_threshold=sys.maxsize max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=True use_all_reduce_for_fetch_params=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] zero_enabled ................. True
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] zero_force_ds_cpu_optimizer .. True
[2024-06-29 15:22:42,709] [INFO] [config.py:1001:print] zero_optimization_stage ...... 3
[2024-06-29 15:22:42,709] [INFO] [config.py:987:print_user_config] json = {
"train_batch_size": 4,
"train_micro_batch_size_per_gpu": 1,
"gradient_accumulation_steps": 2,
"gradient_clipping": 1.0,
"zero_allow_untested_optimizer": true,
"fp16": {
"enabled": true,
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"bf16": {
"enabled": false
},
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"offload_param": {
"device": "cpu",
"pin_memory": true
},
"overlap_comm": true,
"contiguous_gradients": true,
"sub_group_size": 1.000000e+09,
"reduce_bucket_size": 2.621440e+07,
"stage3_prefetch_bucket_size": 2.359296e+07,
"stage3_param_persistence_threshold": 5.120000e+04,
"stage3_max_live_parameters": 1.000000e+09,
"stage3_max_reuse_distance": 1.000000e+09,
"stage3_gather_16bit_weights_on_model_save": true
},
"steps_per_print": inf
}
[INFO|trainer.py:2128] 2024-06-29 15:22:42,709 >> ***** Running training *****
[INFO|trainer.py:2129] 2024-06-29 15:22:42,709 >> Num examples = 3,600
[INFO|trainer.py:2130] 2024-06-29 15:22:42,709 >> Num Epochs = 5
[INFO|trainer.py:2131] 2024-06-29 15:22:42,709 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2134] 2024-06-29 15:22:42,709 >> Total train batch size (w. parallel, distributed & accumulation) = 4
[INFO|trainer.py:2135] 2024-06-29 15:22:42,709 >> Gradient Accumulation steps = 2
[INFO|trainer.py:2136] 2024-06-29 15:22:42,709 >> Total optimization steps = 4,500
[INFO|trainer.py:2137] 2024-06-29 15:22:42,714 >> Number of trainable parameters = 27,893,760
0%| | 0/4500 [00:00<?, ?it/s]
/root/miniconda3/envs/llamafactory/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
0%| | 2/4500 [02:44<100:19:17, 80.29s/
0%|▏ | 7/4500 [08:46<91:20:26, 73.19s/
{'loss': 1.2934, 'grad_norm': 1.2723101440923572, 'learning_rate': 2.2222222222222225e-06, 'epoch': 0.01}
{'loss': 1.19, 'grad_norm': 1.3058574250868817, 'learning_rate': 4.444444444444445e-06, 'epoch': 0.02}
1%|▋ | 24/4500 [29:18<90:05:55, 72.47s/
{'loss': 1.1011, 'grad_norm': 1.4012755254348368, 'learning_rate': 6.666666666666667e-06, 'epoch': 0.03}
1%|▉ | 31/4500 [37:45<89:56:10, 72.45s/
1%|█ | 35/4500 [42:35<89:53:07, 72.47s/
1%|█▏ | 39/4500 [47:25<89:48:27, 72.47s/
{'loss': 0.898, 'grad_norm': 1.3420188309333259, 'learning_rate': 8.88888888888889e-06, 'epoch': 0.04}
1%|█▎ | 45/4500 [54:40<89:39:22, 72.45s/
1%|▍ | 46/4500 [55:52<89:37:32, 72.44s/it] {'loss': 0.684, 'grad_norm': 1.3847420371646724, 'learning_rate': 1.1111111111111112e-05, 'epoch': 0.06}
1%|█▍ [INFO|trainer.py:3478] 2024-06-29 16:24:24,295 >> Saving model checkpoint to saves/qwen/l/sft/checkpoint-50
from llama-factory.
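For reference, FORCE_TORCHRUN=1 with NNODES=2 means the same training command also has to be launched on the second machine, with only the node rank changed. A minimal sketch, assuming the model, dataset, and YAML config sit at the same paths on the second node:

CUDA_VISIBLE_DEVICES=0 FORCE_TORCHRUN=1 NNODES=2 RANK=1 MASTER_ADDR=192.168.32.8 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml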
DeepSpeed ZeRO-3 needs NVLink to be fast.
from llama-factory.
> DeepSpeed ZeRO-3 needs NVLink to be fast.

In other words, on an ordinary network, a result like this is to be expected?
from llama-factory.
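One quick way to check whether the GPUs inside each node are actually linked by NVLink is the topology matrix reported by the driver (standard nvidia-smi, not specific to LLaMA-Factory):

nvidia-smi topo -m
# NV1/NV2/... entries indicate NVLink connections between GPU pairs;
# PIX/PXB/PHB/SYS entries indicate PCIe or cross-socket paths instead.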
Multi-node multi-GPU: it's probably the inter-GPU communication.
from llama-factory.
> Multi-node multi-GPU: it's probably the inter-GPU communication.

The GPUs are communicating; monitoring the other GPUs shows the corresponding processes on them.
from llama-factory.
Is the network between the two machines a socket (Ethernet) network or an IB (InfiniBand) network? If it is not IB, communication between the machines will be very slow, which drags down training speed.
from llama-factory.
> Is the network between the two machines a socket (Ethernet) network or an IB (InfiniBand) network? If it is not IB, communication between the machines will be very slow, which drags down training speed.

That is indeed the case.
from llama-factory.
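As a follow-up to the socket-vs-InfiniBand question, NCCL can report which transport it actually selects if the job is relaunched with its debug logging enabled. A minimal sketch; the interface and HCA names below are placeholders, not values taken from this cluster:

export NCCL_DEBUG=INFO             # NCCL prints the transport it picks (NET/IB vs NET/Socket) at init
export NCCL_DEBUG_SUBSYS=INIT,NET
# If InfiniBand hardware is present, make sure it is enabled and visible to NCCL:
# export NCCL_IB_DISABLE=0
# export NCCL_IB_HCA=mlx5_0         # placeholder HCA name
# If only Ethernet is available, pin the TCP interface explicitly:
# export NCCL_SOCKET_IFNAME=eth0    # placeholder interface name

If the init log shows NET/Socket rather than NET/IB, the ZeRO-3 all-gather and reduce-scatter traffic between the two nodes is going over TCP, which would be consistent with the 70-80 s/it step times seen above.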