Mamba2 assertion error

wyc1997 commented on July 17, 2024

Hi, when running example inference on Mamba2:

Comments (2)

AlwaysFHao commented on July 17, 2024

Simply put, something probably went wrong when your causal_conv1d package was installed. As a result, your code is actually falling back to nn.Conv1d, which implements the causal convolution through a padding scheme, so its output is longer than the input and has to be truncated to seqlen.
You can change lines 214 to 216 in the Mamba2 source code to:
xBC = self.act(self.conv1d(xBC.transpose(1, 2))[:, :, :seqlen].transpose(1, 2))
For details, please check #437.
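
To see why the truncation is needed, here is a minimal standalone sketch (not the Mamba2 source). It assumes a depthwise nn.Conv1d padded by d_conv - 1 on both sides; the sizes and variable names are made up for illustration. The padded convolution returns seqlen + d_conv - 1 time steps, and keeping only the first seqlen of them gives a causal result:

```python
import torch
import torch.nn as nn

# Illustrative sizes only; these are not taken from any Mamba2 config.
d_conv, channels, seqlen = 4, 8, 10

# Depthwise convolution padded by d_conv - 1 on BOTH sides (fallback-style setup).
conv1d = nn.Conv1d(channels, channels, kernel_size=d_conv,
                   groups=channels, padding=d_conv - 1)

x = torch.randn(2, seqlen, channels)        # (batch, seqlen, channels)
y_full = conv1d(x.transpose(1, 2))          # (batch, channels, seqlen + d_conv - 1)
y = y_full[:, :, :seqlen].transpose(1, 2)   # truncated to (batch, seqlen, channels)

# Causality check: perturbing the last time step must leave earlier outputs unchanged.
x2 = x.clone()
x2[:, -1] += 1.0
y2 = conv1d(x2.transpose(1, 2))[:, :, :seqlen].transpose(1, 2)
is_causal = torch.allclose(y[:, :-1], y2[:, :-1])

print(y_full.shape, y.shape, is_causal)
```

Without the [:, :, :seqlen] slice, the extra d_conv - 1 trailing steps change the sequence length, which is the kind of shape mismatch a downstream assertion would catch.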


zixianwang2022 commented on July 17, 2024

Update: after I reinstalled the causal-conv1d library, the code from the latest commit worked. Thanks!

Hi @AlwaysFHao

I am using the official GitHub version, based on commit 03a38fb.

I applied the change you described, but ran into an error.

Here is my code:

from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Assumption: the original post does not show how device was defined.
device = "cuda"
model = MambaLMHeadModel.from_pretrained(pretrained_model_name="state-spaces/mamba2-130m")
tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-2.8b-hf")
text = "The text of the declaration of independence is:"
input_ids = tokenizer(text, return_tensors="pt")["input_ids"].to(device)
model.to(device)
out = model.generate(input_ids, max_length=100, temperature=0)

Here is the last part of the error log:

File ~/mamba_official/mamba_ssm/models/mixer_seq_simple.py:194, in MixerModel.forward(self, input_ids, inference_params, **mixer_kwargs)
    192 residual = None
    193 for layer in self.layers:
--> 194     hidden_states, residual = layer(
    195         hidden_states, residual, inference_params=inference_params, **mixer_kwargs
    196     )
    197 if not self.fused_add_norm:
    198     residual = (hidden_states + residual) if residual is not None else hidden_states
...
---> 81 self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)

File <string>:65, in _chunk_scan_fwd_kernel(cb_ptr, x_ptr, z_ptr, out_ptr, out_x_ptr, dt_ptr, dA_cumsum_ptr, seq_idx_ptr, C_ptr, prev_states_ptr, D_ptr, chunk_size, hdim, dstate, batch, seqlen, nheads_ngroups_ratio, stride_cb_batch, stride_cb_chunk, stride_cb_head, stride_cb_csize_m, stride_cb_csize_k, stride_x_batch, stride_x_seqlen, stride_x_head, stride_x_hdim, stride_z_batch, stride_z_seqlen, stride_z_head, stride_z_hdim, stride_out_batch, stride_out_seqlen, stride_out_head, stride_out_hdim, stride_dt_batch, stride_dt_chunk, stride_dt_head, stride_dt_csize, stride_dA_cs_batch, stride_dA_cs_chunk, stride_dA_cs_head, stride_dA_cs_csize, stride_seq_idx_batch, stride_seq_idx_seqlen, stride_C_batch, stride_C_seqlen, stride_C_head, stride_C_dstate, stride_states_batch, stride_states_chunk, stride_states_head, stride_states_hdim, stride_states_dstate, stride_D_head, IS_CAUSAL, HAS_D, D_HAS_HDIM, HAS_Z, HAS_SEQ_IDX, BLOCK_SIZE_M, BLOCK_SIZE_N, BLOCK_SIZE_K, BLOCK_SIZE_DSTATE, IS_TRITON_22, grid, num_warps, num_stages, extern_libs, stream, warmup, device, device_type)

ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
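
The "(cpu tensor?)" hint in this message suggests that at least one tensor reaching the Triton kernel was not on the GPU. The thread does not confirm the cause, so the following is only a diagnostic sketch. The helper name tensors_not_on_cuda is hypothetical and not part of mamba_ssm; it lists the parameters, buffers, and inputs that are not on a CUDA device:

```python
import torch
import torch.nn as nn

def tensors_not_on_cuda(module, **inputs):
    """Hypothetical helper: names of parameters, buffers, and inputs not on a CUDA device."""
    offenders = [name for name, p in module.named_parameters() if not p.is_cuda]
    offenders += [name for name, b in module.named_buffers() if not b.is_cuda]
    offenders += [name for name, t in inputs.items()
                  if torch.is_tensor(t) and not t.is_cuda]
    return offenders

# Toy stand-in for the model: nothing here has been moved to the GPU.
layer = nn.Linear(4, 4)
ids = torch.zeros(1, 3, dtype=torch.long)
print(tensors_not_on_cuda(layer, input_ids=ids))  # ['weight', 'bias', 'input_ids']
```

Calling it as tensors_not_on_cuda(model, input_ids=input_ids) right before model.generate should return an empty list if everything was moved to the GPU.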
