Comments (7)
After we merge #8526 we should try to add full support for this model. cc @compilade
I love the shout-out in the linked blog post!
You can deploy Codestral Mamba using the mistral-inference SDK, which relies on the reference implementations from Mamba’s GitHub repository. The model can also be deployed through TensorRT-LLM. For local inference, keep an eye out for support in llama.cpp. You may download the raw weights from HuggingFace.
That's a really nice nod -- love to see it!
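For the "download the raw weights" step mentioned in the quote above, here is a minimal sketch using huggingface_hub. The repo id and target directory are assumptions; check the model card on Hugging Face for the actual repository name.

```python
# Minimal sketch of the "download the raw weights" step from the quote above.
# The repo id below is an assumption -- verify it on the Hugging Face model card.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="mistralai/Mamba-Codestral-7B-v0.1",  # assumed repo id
    local_dir="codestral-mamba",                  # where to place the files
)
print(f"Weights downloaded to {local_path}")
```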
Hey guys, any progress or an ETA on this?
#7727 should cover this model, but with untied embeddings, unlike the other Mamba2 models.
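For context, "untied embeddings" means the output head is a separate tensor rather than a reuse of the token-embedding matrix, so a converter has to export it explicitly. Below is a rough sketch of how that difference typically shows up when inspecting a checkpoint; the file name and tensor names are illustrative assumptions, not the exact ones in this model.

```python
# Hedged sketch: a tied model reuses the token-embedding matrix as its output
# projection, while an untied model stores a separate output (lm_head) tensor
# that the conversion script must export as well.
# File and tensor names are illustrative only.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    names = set(f.keys())

untied = any("lm_head" in n or n.endswith("output.weight") for n in names)
print("untied: separate output head found" if untied
      else "tied: output head reuses the embedding matrix")
```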
FYI, there is an "ngroups" param that changes how the layer norm is done: https://github.com/state-spaces/mamba/blob/c0a00bd1808881831ddf43206c69362d4df90cf7/mamba_ssm/modules/mamba2.py#L47
We use ngroups=8. If you forget it, or try with ngroups = 1, you'll have a bad time.
Good luck!
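To make the ngroups point above concrete, here is a rough numpy sketch of grouped RMSNorm (leaving out Mamba-2's extra SiLU gating): each contiguous slice of the hidden dimension is normalized independently, so evaluating with ngroups = 1 instead of 8 gives different values for the same weights.

```python
import numpy as np

def grouped_rmsnorm(x, weight, ngroups, eps=1e-5):
    """RMSNorm computed independently over each of `ngroups` contiguous
    channel groups, instead of one RMS over the whole hidden dim.
    x: (..., d), weight: (d,). Simplified sketch, no gating."""
    d = x.shape[-1]
    assert d % ngroups == 0
    g = x.reshape(*x.shape[:-1], ngroups, d // ngroups)
    rms = np.sqrt(np.mean(g ** 2, axis=-1, keepdims=True) + eps)
    return (g / rms).reshape(x.shape) * weight

x = np.random.randn(2, 16).astype(np.float32)
w = np.ones(16, dtype=np.float32)
# ngroups=1 and ngroups=8 normalize over different slices, so the
# outputs generally differ for the same input:
print(np.allclose(grouped_rmsnorm(x, w, 1), grouped_rmsnorm(x, w, 8)))  # False (in general)
```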
I'd love this.
thanks!
Related Issues (20)
- Bug: Phi-3 mini 128k performance degradation with kv size > 8k (server) HOT 3
- cannot import name 'BaseVocab' from 'gguf' HOT 2
- Llama-Quantize : Layers quantized in the wrong order, thus damaging the variable bits tensor quants scheme consistency.
- Bug: Quantizing a bog standard llama is failing - Error: Error quantizing: b"main: invalid nthread 'Q8_0' (stoi)\n" HOT 1
- Feature Request: Support Falcon Mamba 7B HOT 2
- Bug: M2 Mac Studio not using context shifting
- Bug: Slow response times with llama.cpp llama-server HOT 7
- Bug: llama-server scales default context incorrectly for multiple slots
- Feature Request: UPX the growing binaries in packaging. HOT 3
- Bug: -DCMAKE_CUDA_ARCHITECTURES=52 on GTX 1660 Ti or RTX 3060 results in incorrect output HOT 2
- Bug: Couldn't load GGUF file into Transformers
- Feature Request: Add split model support in gguf-py
- Bug: Flan t5 xl conversion error (medium severity, has workaround)
- Bug: NikolayKozloff/madlad400-10b-mt-Q8_0-GGUF works with llama-cli but doesn't work with llama-server HOT 5
- Feature Request: introduce Tool Call API in server mode HOT 2
- Feature Request: Support for fixie-ai/ultravox-v0_3
- Bug: OpenBLAS compile doesn't work in Ubuntu 22.04 HOT 4
- Bug: GGML_SCHED_MAX_SPLITS must be increased to run BigLlama-3.1-681B-Instruct using GPU acceleration HOT 2
- Feature Request: please add falcon 7b mamba support HOT 1
- Bug: Unable to load phi3:3B(2.2GB) model on Apple M1 Pro