Help, I'm on an Orange Pi 5 using Phi-3 and now it's slow :'((((( T_T 😢 😿 😭

selects too many cores by default on orange pi 5 (2x slower) about llama.cpp (CLOSED)

calculatortamer commented on September 24, 2024

selects too many cores by default on orange pi 5 (2x slower)

from llama.cpp.
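The Orange Pi 5 uses the RK3588S SoC, which pairs four Cortex-A76 ("big") cores with four Cortex-A55 ("little") cores, so using every core is not automatically the fastest choice. As a minimal sketch, and explicitly not llama.cpp's own thread-selection logic, the hypothetical helpers below read each core's `cpuinfo_max_freq` from Linux sysfs and count the cores that are faster than the slowest cluster, as a starting value for `-t`. The function names and the "faster than the slowest cluster" rule are assumptions. Note that with the #6915 branch benchmarked later in this thread, `-t 8` beat `-t 4`, so this heuristic mainly matters for builds without that change.

```python
import os
from pathlib import Path


def read_max_freqs(root: str = "/sys/devices/system/cpu") -> dict[int, int]:
    """Map CPU index -> cpuinfo_max_freq (kHz), read from Linux sysfs.

    CPUs whose cpufreq file is missing or unreadable are skipped.
    """
    freqs: dict[int, int] = {}
    for path in Path(root).glob("cpu[0-9]*/cpufreq/cpuinfo_max_freq"):
        try:
            cpu = int(path.parent.parent.name[3:])  # "cpu4" -> 4
            freqs[cpu] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue
    return freqs


def count_fast_cores(max_freqs: dict[int, int]) -> int:
    """Hypothetical heuristic: count cores faster than the slowest cluster.

    On a big.LITTLE SoC the LITTLE cores report the lowest max frequency,
    so every core above that minimum is treated as a "big" core. If all
    cores report the same frequency, every core is counted; if nothing
    could be read, fall back to os.cpu_count().
    """
    if not max_freqs:
        return os.cpu_count() or 1
    slowest = min(max_freqs.values())
    fast = [cpu for cpu, freq in max_freqs.items() if freq > slowest]
    return len(fast) if fast else len(max_freqs)


if __name__ == "__main__":
    n_threads = count_fast_cores(read_max_freqs())
    print(f"suggested starting point: -t {n_threads}")
```

Comparing against the slowest cluster rather than the single fastest frequency is deliberate: the two A76 clusters on RK3588-class chips may report slightly different maximum clocks, and counting only the top frequency could then return 2 instead of 4. Either way the result is only a starting point for `-t`, to be confirmed by benchmarking as done in this thread.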

Comments (3)

calculatortamer commented on September 24, 2024

> You might get better performance with all the 8 cores with #6915.

That's so cool! Thanks for letting me know.

Here are my benchmarks with that branch (kunnis:MMThreadingPerfChange):
-t 4
prompt: 11.22 tokens per second
eval: 7.58 tokens per second
(faster than before, which was 7 t/s, 7.2 t/s at best)

-t 8
prompt: 12.65 tokens per second
eval: 8.64 tokens per second

The generated text is still exactly the same.

Should I close the issue now, or wait until it gets merged?

slaren commented on September 24, 2024

You might get better performance with all the 8 cores with #6915.

calculatortamer commented on September 24, 2024

#6915 merged

Related Issues (20)

Feature Request: Loading PeFT - LoRA adapters during runtime without prior merging
Bug: JSON Schema-to-GBNF additionalProperties bugs (and other minor quirks)
Bug: error loading model architecture: unknown model architecture: 'clip'
Bug: --threads-http argument has disappeared from server executable
Bug: token generation seems to slow down for higher slots
Bug: Vulkan backend unable to allocate memory when run across multiple GPUs for larger models
Bug: QWEN2 quantization GGML_ASSERT
SIGSEGV on moderately complex grammar
I am running two socket servers, and the CPU usage is at 50%
Qwen2-57B-A14B-Instruct not supported
Bug: QWEN2 MoE imatrix contains nan's after generating it
Bug: Running a large model through the server using vulkan backend always generates gibberish after first call.
Bug: CUDA enabled docker container fails to launch
Converting finetune LLaVA model to gguf but while debugging getting result = self.mapping.get(key[:-len(suffix)]) as None
No successful releases from CI in the last 2 days.
iGPU offloading Bug: Memory access fault by GPU node-1 (appeared once only)
Research: Im writing a paper on our medical finetuned llava-v1.6,
Bug: Server ends up in infinite loop if number of requests in the batch is greater than parallel slots with system prompt
Refactor: Formalise Keys.General GGUF KV Store
Bug: embeddings endpoint broken