Comments (7)
After we merge #8526 we should try to add full support for this model. cc @compilade
I love the shout-out in the linked blog post!
You can deploy Codestral Mamba using the mistral-inference SDK, which relies on the reference implementations from Mamba’s GitHub repository. The model can also be deployed through TensorRT-LLM. For local inference, keep an eye out for support in llama.cpp. You may download the raw weights from HuggingFace.
That's a really nice nod -- love to see it!
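For the "download the raw weights" step mentioned in the quote above, here is a minimal sketch using huggingface_hub. The repo id and target directory are assumptions; check the model card on Hugging Face for the actual repository name.

```python
# Minimal sketch of the "download the raw weights" step from the quote above.
# The repo id below is an assumption -- verify it on the Hugging Face model card.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="mistralai/Mamba-Codestral-7B-v0.1",  # assumed repo id
    local_dir="codestral-mamba",                  # where to place the files
)
print(f"Weights downloaded to {local_path}")
```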
Hey guys, any progress or an ETA on this?
#7727 should cover this model, but with untied embeddings, unlike the other Mamba2 models.
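For context, "untied embeddings" means the output head is a separate tensor rather than a reuse of the token-embedding matrix, so a converter has to export it explicitly. Below is a rough sketch of how that difference typically shows up when inspecting a checkpoint; the file name and tensor names are illustrative assumptions, not the exact ones in this model.

```python
# Hedged sketch: a tied model reuses the token-embedding matrix as its output
# projection, while an untied model stores a separate output (lm_head) tensor
# that the conversion script must export as well.
# File and tensor names are illustrative only.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    names = set(f.keys())

untied = any("lm_head" in n or n.endswith("output.weight") for n in names)
print("untied: separate output head found" if untied
      else "tied: output head reuses the embedding matrix")
```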
FYI, there is an "ngroups" param that changes how the layer norm is done: https://github.com/state-spaces/mamba/blob/c0a00bd1808881831ddf43206c69362d4df90cf7/mamba_ssm/modules/mamba2.py#L47
We use ngroups=8. If you forget it, or try with ngroups = 1, you'll have a bad time.
Good luck!
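To make the ngroups point above concrete, here is a rough numpy sketch of grouped RMSNorm (leaving out Mamba-2's extra SiLU gating): each contiguous slice of the hidden dimension is normalized independently, so evaluating with ngroups = 1 instead of 8 gives different values for the same weights.

```python
import numpy as np

def grouped_rmsnorm(x, weight, ngroups, eps=1e-5):
    """RMSNorm computed independently over each of `ngroups` contiguous
    channel groups, instead of one RMS over the whole hidden dim.
    x: (..., d), weight: (d,). Simplified sketch, no gating."""
    d = x.shape[-1]
    assert d % ngroups == 0
    g = x.reshape(*x.shape[:-1], ngroups, d // ngroups)
    rms = np.sqrt(np.mean(g ** 2, axis=-1, keepdims=True) + eps)
    return (g / rms).reshape(x.shape) * weight

x = np.random.randn(2, 16).astype(np.float32)
w = np.ones(16, dtype=np.float32)
# ngroups=1 and ngroups=8 normalize over different slices, so the
# outputs generally differ for the same input:
print(np.allclose(grouped_rmsnorm(x, w, 1), grouped_rmsnorm(x, w, 8)))  # False (in general)
```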
I'd love this.
thanks!
Related Issues (20)
- Bug: Phi-3 mini 128k performance degradation with kv size > 8k (server) HOT 3
- cannot import name 'BaseVocab' from 'gguf' HOT 2
- Llama-Quantize : Layers quantized in the wrong order, thus damaging the variable bits tensor quants scheme consistency.
- Bug: Quantizing a bog standard llama is failing - Error: Error quantizing: b"main: invalid nthread 'Q8_0' (stoi)\n" HOT 1
- Feature Request: Support Falcon Mamba 7B HOT 2
- Bug: M2 Mac Studio not using context shifting
- Bug: Slow response times with llama.cpp llama-server HOT 7
- Bug: llama-server scales default context incorrectly for multiple slots
- Feature Request: UPX the growing binaries in packaging. HOT 3
- Bug: -DCMAKE_CUDA_ARCHITECTURES=52 on GTX 1660 Ti or RTX 3060 results in incorrect output HOT 2
- Bug: Couldn't load GGUF file into Transformers
- Feature Request: Add split model support in gguf-py
- Bug: Flan t5 xl conversion error (medium severity, has workaround)
- Bug: NikolayKozloff/madlad400-10b-mt-Q8_0-GGUF works with llama-cli but doesn't work with llama-server HOT 5
- Feature Request: introduce Tool Call API in server mode HOT 2
- Feature Request: Support for fixie-ai/ultravox-v0_3
- Bug: OpenBLAS compile doesn't work in Ubuntu 22.04 HOT 4
- Bug: GGML_SCHED_MAX_SPLITS must be increased to run BigLlama-3.1-681B-Instruct using GPU acceleration HOT 2
- Feature Request: please add falcon 7b mamba support HOT 1
- Bug: Unable to load phi3:3B(2.2GB) model on Apple M1 Pro