Comments (15)
@nmandic78 server and main have been renamed as of #7809 (see line 14 in 26a39bb); you may inadvertently be using stale compilation artifacts. Try llama-server and llama-cli instead.
from llama.cpp.
Can you show your build log? I can confirm it works for me (though not specifically b3262; will update and verify), so I'm wondering if it cached some build artifacts and you didn't actually get the latest.
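One way to rule out cached build artifacts is a full clean rebuild. A sketch assuming a cmake-based llama.cpp checkout (the build flags are illustrative, not the only valid ones):

```shell
# Guarded so this is a no-op outside a cmake project directory:
if [ -f CMakeLists.txt ]; then
  # wipe any stale build tree, then configure and build from scratch
  rm -rf build
  cmake -B build -DCMAKE_BUILD_TYPE=Release
  cmake --build build -j
else
  echo "run this inside your llama.cpp checkout"
fi
```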
The name of the server binary has been changed to llama-server; you are probably using an old build.
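For reference, the old-to-new binary names mentioned in this thread can be summarized as:

```shell
# Old -> new binary names after PR #7809 (mapping as described in this thread):
for pair in "main:llama-cli" "server:llama-server" "quantize:llama-quantize"; do
  echo "${pair%%:*} -> ${pair##*:}"
done
```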
that works, thank you so much @bartowski1182
gemma-2-9b-it works fine for me - Q6_K quant, converted and launched using llama.cpp b3259.
Oh, I feel so stupid now :D Should have read that.
Indeed, when using the renamed binaries (the ones actually built with the new release), there is no problem.
Thank you!
Gemma 2 9B does not work for me on llama.cpp b3259:
pip install git+https://github.com/ggerganov/llama.cpp.git@b3259
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma2'
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
File "C:\Users\m\Desktop\ollama\1.py", line 4, in <module>
llm = Llama(
^^^^^^
File "C:\Users\m\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\llama.py", line 358, in __init__
self._model = self._stack.enter_context(contextlib.closing(_LlamaModel(
^^^^^^^^^^^^
File "C:\Users\m\AppData\Local\Programs\Python\Python311\Lib\site-packages\llama_cpp\_internals.py", line 54, in __init__
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: ./gemma-2-9b-it.Q4_K.gguf
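For context on where this error comes from: a GGUF file stores the model's architecture as the general.architecture metadata key in its header, and a build that predates Gemma 2 support simply has no entry for the string "gemma2". As a rough illustration (a synthetic header built by hand, not llama.cpp's actual reader), this sketch writes and reads back that key:

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # string value type id in the GGUF spec

def _string(s: bytes) -> bytes:
    # GGUF strings are a u64 length followed by the raw bytes
    return struct.pack("<Q", len(s)) + s

def make_header(arch: bytes) -> bytes:
    # magic, version=3, n_tensors=0, n_kv=1, then one string key/value pair
    return (GGUF_MAGIC + struct.pack("<IQQ", 3, 0, 1)
            + _string(b"general.architecture")
            + struct.pack("<I", GGUF_TYPE_STRING)
            + _string(arch))

def read_architecture(buf: bytes) -> str:
    assert buf[:4] == GGUF_MAGIC, "not a GGUF file"
    off = 4 + struct.calcsize("<IQQ")
    klen = struct.unpack_from("<Q", buf, off)[0]; off += 8
    key = buf[off:off + klen].decode(); off += klen
    vtype = struct.unpack_from("<I", buf, off)[0]; off += 4
    assert key == "general.architecture" and vtype == GGUF_TYPE_STRING
    vlen = struct.unpack_from("<Q", buf, off)[0]; off += 8
    return buf[off:off + vlen].decode()

print(read_architecture(make_header(b"gemma2")))  # -> gemma2
```

A loader that was built before "gemma2" was added to its architecture table reads this string fine but has no matching implementation, hence "unknown model architecture: 'gemma2'".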
I still see the following when running the gemma2 27b q4_k_m from https://huggingface.co/bartowski/gemma-2-27b-it-GGUF after updating to the latest commits:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma2'
I just tested build from current master (d0a7145) using freshly downloaded Q3_K_S quant from this repo, launched like this:
llama-server.exe -v -ngl 99 -m gemma-2-27b-it-Q3_K_S.gguf -c 4096
and it seems to be working (at least in basic scope; stuff like interleaved SWA/full attention might still be missing).
@wesleysanjose make sure you're using llama-cli and have all the latest binaries
@bartowski1182 I use server to launch an OpenAI-compatible server; what's the difference? I always use that:
./server -m $1 --n-gpu-layers $2 -c $3 --host 192.168.0.184 --port 5000 -b 4096 -to 120 -ts 20,6
The binary names were updated a few weeks ago, so you're using the old ones that have been sitting around.
It should be ./llama-server
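With only the binary name changed, the earlier launch script would become something like the following (echoed here so it can be pasted into a script; host, port, and flags are copied verbatim from the previous comment):

```shell
# Same invocation as before; only "server" -> "llama-server" changes:
echo './llama-server -m "$1" --n-gpu-layers "$2" -c "$3" --host 192.168.0.184 --port 5000 -b 4096 -to 120 -ts 20,6'
```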
I'm getting this error when attempting to quantize a bf16 GGUF after building on Windows. I'm tacking this on here, as a fix may be related.
$ ./llama-cli --version
version: 3504 (e09a800)
built with MSVC 19.40.33812.0 for x64
And yet:
llama.cpp\build\bin\release\quantize temp.gguf ./text-generation-webui/models/%1.Q8_0.gguf q8_0
Eventually ends with:
llama_model_loader: - type f32: 105 tensors
llama_model_loader: - type bf16: 183 tensors
llama_model_quantize: failed to quantize: unknown model architecture: 'gemma2'
main: failed to quantize model from 'temp.gguf'
@jim-plus the quantize binary was also renamed alongside main and server, with the same llama- prefix: give llama-quantize a shot.
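For example (echoed as a paste-able sketch; the input file name is from the earlier comment, the output name is illustrative, and the positional arguments are unchanged from the old quantize binary):

```shell
# <in.gguf> <out.gguf> <type>, same as before; only the binary name changes:
echo './llama-quantize temp.gguf model.Q8_0.gguf q8_0'
```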
Ah, that did it (along with clearing out all the old binaries for a clean rebuild). Thanks!