Comments (4)
Try using the HF-compatible version of Jamba: TechxGenus/Jamba-v0.1-hf.
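A minimal sketch of loading that checkpoint directly with transformers, just to sanity-check it outside the trainer (this snippet is my own illustration, assuming a transformers version with native Jamba support; adjust dtype and device placement to your setup):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TechxGenus/Jamba-v0.1-hf"  # HF-compatible version of Jamba-v0.1
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the torch_dtype in the config shown below
    device_map="auto",
)

In LLaMA-Factory this amounts to pointing model_name_or_path at the same repo id.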
Nope, I get the exact same error:
[rank0]: expected the next 1 parameters in the parameter fetch queue to be ({'id': 'name=model.layers.31.mamba.out_proj.weight id=1192', 'status': 'AVAILABLE', 'numel': 33554432, 'ds_numel': 33554432, 'shape': (4096, 8192), 'ds_shape': (4096, 8192), 'requires_grad': True, 'grad_shape': None, 'persist': False, 'active_sub_modules': {489}, 'ds_tensor.shape': torch.Size([4194304])},)
[rank0]: but got
[rank0]: ({'id': 'name=model.layers.31.mamba.dt_proj.bias id=1191', 'status': 'AVAILABLE', 'numel': 8192, 'ds_numel': 8192, 'shape': (8192,), 'ds_shape': (8192,), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': {483}, 'ds_tensor.shape': torch.Size([1024])},).
To confirm, this is the new config:
{
  "architectures": [
    "JambaForCausalLM"
  ],
  "attention_dropout": 0.0,
  "attn_layer_offset": 4,
  "attn_layer_period": 8,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "expert_layer_offset": 1,
  "expert_layer_period": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "mamba_conv_bias": true,
  "mamba_d_conv": 4,
  "mamba_d_state": 16,
  "mamba_dt_rank": 256,
  "mamba_expand": 2,
  "mamba_proj_bias": false,
  "max_position_embeddings": 262144,
  "model_type": "jamba",
  "num_attention_heads": 32,
  "num_experts": 16,
  "num_experts_per_tok": 2,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "num_logits_to_keep": 1,
  "output_router_logits": false,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "router_aux_loss_coef": 0.001,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0",
  "use_cache": true,
  "use_mamba_kernels": true,
  "vocab_size": 65536
}
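As a side note on reading this config: the attn_layer_period/attn_layer_offset and expert_layer_period/expert_layer_offset fields determine which of the 32 layers are attention vs. Mamba blocks, and which of them carry the 16 experts. A small sketch of that mapping (my own illustration based on the documented JambaConfig semantics in transformers; verify against modeling_jamba.py for your version):

num_hidden_layers = 32
attn_layer_period, attn_layer_offset = 8, 4
expert_layer_period, expert_layer_offset = 2, 1
num_experts = 16

for i in range(num_hidden_layers):
    block = "attention" if i % attn_layer_period == attn_layer_offset else "mamba"
    n_experts = num_experts if i % expert_layer_period == expert_layer_offset else 1
    print(f"layer {i:2d}: {block:9s} experts={n_experts}")

Under this scheme layer 31, the one named in the error (model.layers.31.mamba.out_proj.weight), is a Mamba layer with 16 experts.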
Please share the full traceback; it is needed for debugging.
Here is the full traceback:
[rank7]: Traceback (most recent call last):
[rank7]: File "/juicefs-algorithm/workspace/nlp/li_wang/llama/src/llamafactory/launcher.py", line 9, in <module>
[rank7]: launch()
[rank7]: File "/juicefs-algorithm/workspace/nlp/li_wang/llama/src/llamafactory/launcher.py", line 5, in launch
[rank7]: run_exp()
[rank7]: File "/juicefs-algorithm/workspace/nlp/li_wang/llama/src/llamafactory/train/tuner.py", line 40, in run_exp
[rank7]: run_sft(
[rank7]: File "/juicefs-algorithm/workspace/nlp/li_wang/llama/src/llamafactory/train/sft/workflow.py", line 98, in run_sft
[rank7]: train_result = trainer.train(
[rank7]: ^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 1885, in train
[rank7]: return inner_training_loop(
[rank7]: ^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
[rank7]: tr_loss_step = self.training_step(model, inputs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 3250, in training_step
[rank7]: self.accelerator.backward(loss)
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/accelerate/accelerator.py", line 2126, in backward
[rank7]: self.deepspeed_engine_wrapped.backward(loss, **kwargs)
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/accelerate/utils/deepspeed.py", line 166, in backward
[rank7]: self.engine.backward(loss, **kwargs)
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]: ret_val = func(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1976, in backward
[rank7]: self.optimizer.backward(loss, retain_graph=retain_graph)
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]: ret_val = func(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/zero/stage3.py", line 2213, in backward
[rank7]: self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
[rank7]: scaled_loss.backward(retain_graph=retain_graph)
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/_tensor.py", line 525, in backward
[rank7]: torch.autograd.backward(
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/autograd/__init__.py", line 267, in backward
[rank7]: _engine_run_backward(
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
[rank7]: return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/autograd/function.py", line 301, in apply
[rank7]: return user_fn(self, *args)
[rank7]: ^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 303, in backward
[rank7]: outputs = ctx.run_function(*detached_inputs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank7]: result = forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/models/jamba/modeling_jamba.py", line 1202, in forward
[rank7]: hidden_states = self.mamba(
[rank7]: ^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank7]: result = forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/models/jamba/modeling_jamba.py", line 991, in forward
[rank7]: return self.cuda_kernels_forward(hidden_states, cache_params)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/models/jamba/modeling_jamba.py", line 901, in cuda_kernels_forward
[rank7]: contextualized_states = self.out_proj(scan_outputs.transpose(1, 2))
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1571, in _call_impl
[rank7]: args_result = hook(self, args)
[rank7]: ^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]: ret_val = func(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 278, in _pre_forward_module_hook
[rank7]: self.pre_sub_module_forward_function(module)
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank7]: return func(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 452, in pre_sub_module_forward_function
[rank7]: param_coordinator.fetch_sub_module(sub_module, forward=True)
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
[rank7]: return fn(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]: ret_val = func(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank7]: return func(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 338, in fetch_sub_module
[rank7]: raise RuntimeError(
[rank7]: RuntimeError: tracing error at step 470:
[rank7]: module id: 489, training: True
[rank7]: expected the next 1 parameters in the parameter fetch queue to be ({'id': 'name=model.layers.31.mamba.out_proj.weight id=1192', 'status': 'AVAILABLE', 'numel': 33554432, 'ds_numel': 33554432, 'shape': (4096, 8192), 'ds_shape': (4096, 8192), 'requires_grad': True, 'grad_shape': None, 'persist': False, 'active_sub_modules': {489}, 'ds_tensor.shape': torch.Size([4194304])},)
[rank7]: but got
[rank7]: ({'id': 'name=model.layers.31.mamba.dt_proj.bias id=1191', 'status': 'AVAILABLE', 'numel': 8192, 'ds_numel': 8192, 'shape': (8192,), 'ds_shape': (8192,), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': {483}, 'ds_tensor.shape': torch.Size([1024])},).
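For what it's worth, the traceback shows the failure inside DeepSpeed's ZeRO-3 parameter coordinator (partitioned_param_coordinator.fetch_sub_module) during the recomputation that gradient checkpointing triggers in backward: the modules run in a different order than the one recorded in earlier steps, so the parameter fetch queue no longer matches. A workaround sometimes suggested for this class of tracing errors is to stop ZeRO-3 from relying on the recorded trace; this is an assumption on my part, not something verified in this thread. A sketch of the relevant keys, written as the dict that would go into the ds_config JSON:

ds_config_zero3 = {
    "zero_optimization": {
        "stage": 3,
        # Disable trace-based prefetching so parameters are fetched on demand.
        "stage3_prefetch_bucket_size": 0,
        # Release parameters immediately instead of caching them for reuse.
        "stage3_max_reuse_distance": 0,
        # ... keep the rest of the existing ZeRO-3 config unchanged ...
    },
}

This trades some throughput for not depending on a stable module execution order.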