Comments (2)
It looks like NVEmbed is basically Mistral but with non-causal attention and "latent attention" pooling. I hadn't seen latent attention pooling before, but judging from the modeling code on HF, it's just another attention layer on top of the last hidden states.
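To make "another attention layer on top of the last hidden states" concrete, here is a rough sketch of how such a pooling step could look. It is not taken from the NVEmbed modeling code: it assumes the token hidden states act as queries against a small learned latent array that serves as keys and values, and that the attended outputs are then mean-pooled. The function names and the omission of projection matrices and any MLP are simplifications for illustration.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def latent_attention_pool(hidden, latents):
    """Pool token hidden states into one vector (illustrative only).

    hidden:  list of n_tokens vectors of size d (last-layer hidden states)
    latents: list of n_latents vectors of size d (hypothetical learned array)
    """
    d = len(hidden[0])
    scale = 1.0 / math.sqrt(d)
    attended = []
    for h in hidden:
        # Each token queries the latent array (scaled dot-product attention).
        scores = [scale * sum(a * b for a, b in zip(h, lat)) for lat in latents]
        weights = softmax(scores)
        # Output is a weighted sum of the latent vectors.
        attended.append(
            [sum(w * lat[k] for w, lat in zip(weights, latents)) for k in range(d)]
        )
    # Mean-pool the attended token outputs into a single embedding.
    n = len(attended)
    return [sum(row[k] for row in attended) / n for k in range(d)]
```

The point of the sketch is only that the pooling step is an extra attention pass plus an ordinary mean, which is why it should be a small addition on top of existing pooling support.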
Right now in llama.cpp, we can tell causal-by-default models like Mistral to use non-causal attention. If we get #7477 merged, that will allow general pooling on these models. The only catch is we don't have latent pooling implemented, but it should be quite straightforward.
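For anyone unfamiliar with the two pieces mentioned here, the following is a minimal sketch of the difference between a causal and a non-causal attention mask, together with a few common pooling strategies. It is plain Python for illustration and does not reflect llama.cpp's actual API; the function names and the set of pooling modes are assumptions.

```python
def attention_mask(n_tokens, causal):
    # mask[i][j] is True when token i is allowed to attend to token j.
    # Causal: only positions j <= i. Non-causal: every position.
    return [
        [(j <= i) or not causal for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


def pool(hidden, mode):
    # hidden: list of per-token vectors from the last layer.
    if mode == "mean":
        n = len(hidden)
        return [sum(col) / n for col in zip(*hidden)]
    if mode == "cls":
        return list(hidden[0])   # first token's vector
    if mode == "last":
        return list(hidden[-1])  # last token's vector
    raise ValueError(f"unknown pooling mode: {mode}")
```

With a non-causal mask every token sees the whole sequence, which is what embedding models of this kind expect before the pooling step is applied.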
from llama.cpp.
> If we get #7477 merged, that will allow general pooling on these models. The only catch is we don't have latent pooling implemented, but it should be quite straightforward.
Thanks, will wait for that to be merged.