Comments (3)
You might get better performance with all 8 cores with #6915.
That's so cool! Thanks for letting me know.
Here are my benchmarks with that branch (kunnis:MMThreadingPerfChange):
-t 4
prompt: 11.22 tokens per second
eval: 7.58 tokens per second
(faster than before, which was 7 t/s, or 7.2 t/s at best)
-t 8
prompt: 12.65 tokens per second
eval: 8.64 tokens per second
The resulting text is still exactly the same.
Should I close the issue now, or wait until it gets merged?
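For anyone comparing the two runs, here is a minimal sketch of the speedup arithmetic. It uses only the four throughput figures reported above; the `results` dictionary and the `speedup` helper are hypothetical names made up for this illustration and are not part of llama.cpp.

```python
# Throughput figures (tokens per second) copied from the benchmark above.
# The dictionary layout and helper name are illustrative assumptions.
results = {
    4: {"prompt": 11.22, "eval": 7.58},
    8: {"prompt": 12.65, "eval": 8.64},
}


def speedup(phase, base_threads=4, new_threads=8):
    """Return throughput at new_threads relative to base_threads for one phase."""
    return results[new_threads][phase] / results[base_threads][phase]


for phase in ("prompt", "eval"):
    ratio = speedup(phase)
    print(f"{phase}: {ratio:.2f}x ({(ratio - 1) * 100:.0f}% faster with -t 8)")
```

On these numbers, going from -t 4 to -t 8 gives roughly 13% faster prompt processing and 14% faster eval, so doubling the thread count is well short of doubling throughput on this machine.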
from llama.cpp.
#6915 merged