
Comments (4)

hiyouga commented on June 29, 2024

Try using the HF-compatible version of Jamba: TechxGenus/Jamba-v0.1-hf.

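A quick way to check that this checkpoint resolves outside of llama-factory is to load it directly with transformers. A minimal sketch, assuming transformers >= 4.40.0 (the version that added native Jamba support) and enough memory for the full model:

# Minimal sketch: load the HF-compatible Jamba checkpoint directly
# to verify it resolves before wiring it into llama-factory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TechxGenus/Jamba-v0.1-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches "torch_dtype": "bfloat16" in its config
)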

lwang2070 commented on June 29, 2024

Nope, I get the exact same error:

[rank0]: expected the next 1 parameters in the parameter fetch queue to be ({'id': 'name=model.layers.31.mamba.out_proj.weight id=1192', 'status': 'AVAILABLE', 'numel': 33554432, 'ds_numel': 33554432, 'shape': (4096, 8192), 'ds_shape': (4096, 8192), 'requires_grad': True, 'grad_shape': None, 'persist': False, 'active_sub_modules': {489}, 'ds_tensor.shape': torch.Size([4194304])},) 
[rank0]: but got 
[rank0]:  ({'id': 'name=model.layers.31.mamba.dt_proj.bias id=1191', 'status': 'AVAILABLE', 'numel': 8192, 'ds_numel': 8192, 'shape': (8192,), 'ds_shape': (8192,), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': {483}, 'ds_tensor.shape': torch.Size([1024])},).

To confirm, this is the new config:

{
  "architectures": [
    "JambaForCausalLM"
  ],
  "attention_dropout": 0.0,
  "attn_layer_offset": 4,
  "attn_layer_period": 8,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "expert_layer_offset": 1,
  "expert_layer_period": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "mamba_conv_bias": true,
  "mamba_d_conv": 4,
  "mamba_d_state": 16,
  "mamba_dt_rank": 256,
  "mamba_expand": 2,
  "mamba_proj_bias": false,
  "max_position_embeddings": 262144,
  "model_type": "jamba",
  "num_attention_heads": 32,
  "num_experts": 16,
  "num_experts_per_tok": 2,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "num_logits_to_keep": 1,
  "output_router_logits": false,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "router_aux_loss_coef": 0.001,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0",
  "use_cache": true,
  "use_mamba_kernels": true,
  "vocab_size": 65536
}
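For context, the period/offset fields in this config are what decide which layers are attention vs. Mamba and which layers carry experts. A minimal sketch of that bookkeeping, based on my reading of the transformers JambaConfig semantics (not code from this thread):

# How the period/offset fields above map layers to block types:
# "attention" where i % attn_layer_period == attn_layer_offset, else "mamba";
# MoE where i % expert_layer_period == expert_layer_offset.
num_hidden_layers = 32
attn_layer_period, attn_layer_offset = 8, 4
expert_layer_period, expert_layer_offset = 2, 1

for i in range(num_hidden_layers):
    block = "attention" if i % attn_layer_period == attn_layer_offset else "mamba"
    n_experts = 16 if i % expert_layer_period == expert_layer_offset else 1
    print(f"layer {i:2d}: {block:9s} num_experts={n_experts}")

# Layers 4, 12, 20, 28 come out as attention; everything else, including
# layer 31 from the error above, is a Mamba layer, and every odd layer is MoE.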


hiyouga commented on June 29, 2024

The full traceback is needed for debugging.


lwang2070 commented on June 29, 2024

Here is the full traceback:

[rank7]: Traceback (most recent call last):
[rank7]:   File "/juicefs-algorithm/workspace/nlp/li_wang/llama/src/llamafactory/launcher.py", line 9, in <module>
[rank7]:     launch()
[rank7]:   File "/juicefs-algorithm/workspace/nlp/li_wang/llama/src/llamafactory/launcher.py", line 5, in launch
[rank7]:     run_exp()
[rank7]:   File "/juicefs-algorithm/workspace/nlp/li_wang/llama/src/llamafactory/train/tuner.py", line 40, in run_exp
[rank7]:     run_sft(
[rank7]:   File "/juicefs-algorithm/workspace/nlp/li_wang/llama/src/llamafactory/train/sft/workflow.py", line 98, in run_sft
[rank7]:     train_result = trainer.train(
[rank7]:                    ^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 1885, in train
[rank7]:     return inner_training_loop(
[rank7]:            ^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
[rank7]:     tr_loss_step = self.training_step(model, inputs)
[rank7]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/trainer.py", line 3250, in training_step
[rank7]:     self.accelerator.backward(loss)
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/accelerate/accelerator.py", line 2126, in backward
[rank7]:     self.deepspeed_engine_wrapped.backward(loss, **kwargs)
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/accelerate/utils/deepspeed.py", line 166, in backward
[rank7]:     self.engine.backward(loss, **kwargs)
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]:     ret_val = func(*args, **kwargs)
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1976, in backward
[rank7]:     self.optimizer.backward(loss, retain_graph=retain_graph)
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]:     ret_val = func(*args, **kwargs)
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/zero/stage3.py", line 2213, in backward
[rank7]:     self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
[rank7]:     scaled_loss.backward(retain_graph=retain_graph)
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/_tensor.py", line 525, in backward
[rank7]:     torch.autograd.backward(
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/autograd/__init__.py", line 267, in backward
[rank7]:     _engine_run_backward(
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
[rank7]:     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/autograd/function.py", line 301, in apply
[rank7]:     return user_fn(self, *args)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 303, in backward
[rank7]:     outputs = ctx.run_function(*detached_inputs)
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank7]:     result = forward_call(*args, **kwargs)
[rank7]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/models/jamba/modeling_jamba.py", line 1202, in forward
[rank7]:     hidden_states = self.mamba(
[rank7]:                     ^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
[rank7]:     result = forward_call(*args, **kwargs)
[rank7]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/models/jamba/modeling_jamba.py", line 991, in forward
[rank7]:     return self.cuda_kernels_forward(hidden_states, cache_params)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/transformers/models/jamba/modeling_jamba.py", line 901, in cuda_kernels_forward
[rank7]:     contextualized_states = self.out_proj(scan_outputs.transpose(1, 2))
[rank7]:                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1571, in _call_impl
[rank7]:     args_result = hook(self, args)
[rank7]:                   ^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]:     ret_val = func(*args, **kwargs)
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 278, in _pre_forward_module_hook
[rank7]:     self.pre_sub_module_forward_function(module)
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank7]:     return func(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 452, in pre_sub_module_forward_function
[rank7]:     param_coordinator.fetch_sub_module(sub_module, forward=True)
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
[rank7]:     return fn(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank7]:     ret_val = func(*args, **kwargs)
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank7]:     return func(*args, **kwargs)
[rank7]:            ^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/li_wang/local/conda/envs/llama/lib/python3.11/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 338, in fetch_sub_module
[rank7]:     raise RuntimeError(
[rank7]: RuntimeError: tracing error at step 470: 
[rank7]: module id: 489, training: True
[rank7]: expected the next 1 parameters in the parameter fetch queue to be ({'id': 'name=model.layers.31.mamba.out_proj.weight id=1192', 'status': 'AVAILABLE', 'numel': 33554432, 'ds_numel': 33554432, 'shape': (4096, 8192), 'ds_shape': (4096, 8192), 'requires_grad': True, 'grad_shape': None, 'persist': False, 'active_sub_modules': {489}, 'ds_tensor.shape': torch.Size([4194304])},) 
[rank7]: but got 
[rank7]:  ({'id': 'name=model.layers.31.mamba.dt_proj.bias id=1191', 'status': 'AVAILABLE', 'numel': 8192, 'ds_numel': 8192, 'shape': (8192,), 'ds_shape': (8192,), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': {483}, 'ds_tensor.shape': torch.Size([1024])},).

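For anyone who lands here with the same error: ZeRO-3 records the order in which submodules fetch their partitioned parameters and prefetches based on that trace, and the message above means the recorded order (out_proj.weight next) no longer matched the actual order (dt_proj.bias arrived instead). The traceback shows this happening inside torch.utils.checkpoint's backward, i.e. while gradient checkpointing re-runs the forward pass. A workaround that is sometimes suggested for this class of tracing error, and an assumption on my part rather than a fix confirmed in this thread, is to disable prefetching in the DeepSpeed ZeRO-3 config, sketched here as the Python dict equivalent of the ds_config JSON:

# Hedged workaround sketch, not a confirmed fix: disable ZeRO-3 parameter
# prefetching so the coordinator never relies on a recorded fetch order.
# Both keys are standard DeepSpeed zero_optimization options; setting them
# to 0 trades throughput for tolerance of non-deterministic module order.
ds_config_overrides = {
    "zero_optimization": {
        "stage": 3,
        "stage3_prefetch_bucket_size": 0,  # no prefetching ahead of use
        "stage3_max_reuse_distance": 0,    # release parameters right after use
    }
}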
