Comments (6)
Just to confirm, do you have backslashes in that command line so that all are indeed passed in?
Does the binary print num_threads : 2?
Even two threads should help, but it depends on the platform. Maybe one core is already enough to saturate memory bandwidth?
from gemma.cpp.
Yes, pretty sure gemma.cpp accepted the param. See the screenshot (was using 6 this time).
You are probably right; it may have hit the memory bottleneck even w/ 1 thread. Not sure how to check though.
Btw, this runs on an Android phone.
Throughput won't always increase monotonically with the number of threads; it can be quite system dependent, so it takes a bit of experimentation. You might want to try 2b-it-sfp, which should be faster in general and may be less memory-bandwidth bound.
Neat to hear it's running on an Android phone! What model?
Running on Xiaomi 14
Good, so it's getting the argument value correctly. You can run the STREAM benchmark to measure memory bandwidth; it also supports threading.
+1 to the SFP suggestion.
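For a quick check without building STREAM itself, a STREAM-style "triad" loop can be sketched in a few lines of C++. This is only a rough illustration, not the official STREAM code and not part of gemma.cpp: the names TriadSlice and TriadBandwidthGBps are made up for this example, the timing includes thread start-up cost, and the array size has to be picked (as an assumption) so that the three arrays are much larger than the last-level cache.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// STREAM-style "triad" kernel: a[i] = b[i] + scalar * c[i].
// Each thread processes its own contiguous slice [begin, end).
// (Hypothetical helper, named for this sketch only.)
static void TriadSlice(double* a, const double* b, const double* c,
                       double scalar, std::size_t begin, std::size_t end) {
  for (std::size_t i = begin; i < end; ++i) a[i] = b[i] + scalar * c[i];
}

// Returns the best observed triad bandwidth in GB/s over `reps` runs.
// `n` is the element count per array. The measured time includes
// creating and joining the threads, so small `n` underestimates bandwidth.
double TriadBandwidthGBps(std::size_t n, int num_threads, int reps) {
  assert(n > 0 && num_threads > 0 && reps > 0);
  std::vector<double> a(n, 0.0), b(n, 1.0), c(n, 2.0);
  const double scalar = 3.0;
  double best_seconds = 1e30;

  for (int r = 0; r < reps; ++r) {
    const auto t0 = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
      // Split [0, n) into num_threads nearly equal slices.
      const std::size_t begin = n * t / num_threads;
      const std::size_t end = n * (t + 1) / num_threads;
      workers.emplace_back(TriadSlice, a.data(), b.data(), c.data(),
                           scalar, begin, end);
    }
    for (auto& w : workers) w.join();
    const auto t1 = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    if (seconds < best_seconds) best_seconds = seconds;
  }

  // Read one result so the stores above are observably used.
  volatile double sink = a[n / 2];
  (void)sink;

  // Triad touches three arrays per pass: two reads and one write.
  const double bytes = 3.0 * sizeof(double) * static_cast<double>(n);
  return bytes / best_seconds / 1e9;
}
```

Calling something like TriadBandwidthGBps(1 << 24, t, 5) for t = 1, 2, 4, 6 and comparing the results gives a rough picture: if the single-thread number is already close to the multi-thread ones, memory bandwidth is a plausible bottleneck, which would match the observation that adding threads does not help.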
Closing for now, if there's anything that's not addressed above, feel free to chime in. Also added a small note to the README "What are some easy ways to make the model run faster?" here https://github.com/google/gemma.cpp?tab=readme-ov-file#troubleshooting-and-faqs