Comments (2)
Simply put, there may have been an error during the installation of your causal_conv1d package. As a result, your code is currently falling back to "nn.Conv1d", which implements the causal_conv1d logic through a padding scheme. The padded output is longer than the input sequence, so it needs to be truncated to seqlen.
You can change lines 214 to 216 in the Mamba2 source code to:
xBC = self.act(self.conv1d(xBC.transpose(1, 2))[:, :, :seqlen].transpose(1, 2))
For details, please check #437
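To illustrate why the truncation is needed, here is a minimal, dependency-free sketch (plain Python lists instead of tensors, with made-up values for the input and kernel). It assumes the fallback convolution pads d_conv - 1 zeros on both sides, which is what a padding-based "nn.Conv1d" does; the helper names are hypothetical and not part of mamba_ssm.

```python
def conv1d_padded_both_sides(x, w):
    """Cross-correlate x with kernel w after zero-padding k-1 on BOTH sides.

    This mimics a padding-based 1D convolution with padding = k - 1.
    The output has len(x) + k - 1 elements, i.e. k - 1 more than the input.
    """
    k = len(w)
    padded = [0.0] * (k - 1) + list(x) + [0.0] * (k - 1)
    return [
        sum(w[j] * padded[i + j] for j in range(k))
        for i in range(len(padded) - k + 1)
    ]


def causal_conv1d_reference(x, w):
    """Reference causal convolution: output t only sees x[t-k+1 .. t]."""
    k = len(w)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = t - (k - 1) + j
            if idx >= 0:  # positions before the sequence start count as zero
                acc += w[j] * x[idx]
        out.append(acc)
    return out


# Illustrative values only (assumed, not taken from the model).
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # seqlen = 5
w = [0.5, -1.0, 2.0, 0.25]      # d_conv = 4

full = conv1d_padded_both_sides(x, w)
seqlen = len(x)
truncated = full[:seqlen]        # same role as [:, :, :seqlen] in the fix
reference = causal_conv1d_reference(x, w)

print(len(full), len(truncated))
print(all(abs(a - b) < 1e-12 for a, b in zip(truncated, reference)))
```

The trailing d_conv - 1 outputs come from the right-hand padding and look past the end of the sequence, which is why only the first seqlen positions are kept.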
from mamba.
Update: after I reinstalled the causal-conv1d library, the code from the latest commit worked. Thanks!
Hi @AlwaysFHao
I am using the official GitHub version, based on commit 03a38fb.
I applied the change you described, but I ran into an error.
Here is my code:
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# NOTE: `device` is not defined in this snippet; it is assumed to be set beforehand.
model = MambaLMHeadModel.from_pretrained(pretrained_model_name="state-spaces/mamba2-130m")
tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-2.8b-hf")
text = "The text of the declaration of independence is:"
inputs = {'input_ids': tokenizer(text, return_tensors="pt")['input_ids'].to(device)}
input_ids = inputs['input_ids']
model.to(device)
out = model.generate(input_ids, max_length=100, temperature=0)
Here is the last part of the error log:
File ~/mamba_official/mamba_ssm/models/mixer_seq_simple.py:194, in MixerModel.forward(self, input_ids, inference_params, **mixer_kwargs)
192 residual = None
193 for layer in self.layers:
--> 194 hidden_states, residual = layer(
195 hidden_states, residual, inference_params=inference_params, **mixer_kwargs
196 )
197 if not self.fused_add_norm:
198 residual = (hidden_states + residual) if residual is not None else hidden_states
...
---> 81 self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File <string>:65, in _chunk_scan_fwd_kernel(cb_ptr, x_ptr, z_ptr, out_ptr, out_x_ptr, dt_ptr, dA_cumsum_ptr, seq_idx_ptr, C_ptr, prev_states_ptr, D_ptr, chunk_size, hdim, dstate, batch, seqlen, nheads_ngroups_ratio, stride_cb_batch, stride_cb_chunk, stride_cb_head, stride_cb_csize_m, stride_cb_csize_k, stride_x_batch, stride_x_seqlen, stride_x_head, stride_x_hdim, stride_z_batch, stride_z_seqlen, stride_z_head, stride_z_hdim, stride_out_batch, stride_out_seqlen, stride_out_head, stride_out_hdim, stride_dt_batch, stride_dt_chunk, stride_dt_head, stride_dt_csize, stride_dA_cs_batch, stride_dA_cs_chunk, stride_dA_cs_head, stride_dA_cs_csize, stride_seq_idx_batch, stride_seq_idx_seqlen, stride_C_batch, stride_C_seqlen, stride_C_head, stride_C_dstate, stride_states_batch, stride_states_chunk, stride_states_head, stride_states_hdim, stride_states_dstate, stride_D_head, IS_CAUSAL, HAS_D, D_HAS_HDIM, HAS_Z, HAS_SEQ_IDX, BLOCK_SIZE_M, BLOCK_SIZE_N, BLOCK_SIZE_K, BLOCK_SIZE_DSTATE, IS_TRITON_22, grid, num_warps, num_stages, extern_libs, stream, warmup, device, device_type)
ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)