Comments (1)
The design philosophy of ggml/llama.cpp is to avoid external dependencies wherever possible. I was recently told by an NVIDIA engineer that the way to go for tensor cores is to write PTX code directly (the NVIDIA equivalent of assembly), so I may take a look at the project with that in mind; a sketch of what that looks like follows below.
Also, I know that you're an AMD user, so I would advise you not to count your chickens before they hatch. If the project does what I think it does, it would take significant effort to write the equivalent of PTX code for AMD (at least if the performance is supposed to be actually good), so I'm skeptical about AMD support coming "soon" (but I'll gladly let myself be proven wrong).
from llama.cpp.
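For context, "writing PTX directly" usually means embedding tensor-core instructions such as `mma.sync` in inline assembly instead of going through cuBLAS or the WMMA intrinsics. Below is a minimal sketch, not llama.cpp's actual code: one warp computes a single 16x8 output tile on an assumed sm_80 (Ampere) GPU, with the per-thread fragment layout taken from the public PTX ISA documentation; the kernel and file names are made up for illustration.

```cuda
// Minimal sketch (assumes sm_80; hypothetical file name):
//   nvcc -arch=sm_80 mma_sketch.cu -o mma_sketch
#include <cuda_fp16.h>
#include <cmath>
#include <cstdio>

// One warp computes D(16x8, f32) = A(16x16, f16, row-major) * B(16x8, f16, col-major).
__global__ void mma_16x8x16(const half *A, const half *B, float *D) {
    const int lane  = threadIdx.x & 31; // lane id within the warp
    const int group = lane >> 2;        // "groupID" in the PTX ISA docs
    const int tig   = lane & 3;         // "threadID_in_group"

    // Load two consecutive f16 values as one packed 32-bit register.
    auto ld2 = [](const half *p) { return *reinterpret_cast<const unsigned *>(p); };

    // Per-thread fragments, distributed across the warp exactly as the PTX ISA
    // documents for mma.sync.aligned.m16n8k16.row.col.f32.f16.f16.f32.
    unsigned a[4], b[2];
    float d[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    a[0] = ld2(A + (group    ) * 16 + tig * 2    );  // a0,a1
    a[1] = ld2(A + (group + 8) * 16 + tig * 2    );  // a2,a3
    a[2] = ld2(A + (group    ) * 16 + tig * 2 + 8);  // a4,a5
    a[3] = ld2(A + (group + 8) * 16 + tig * 2 + 8);  // a6,a7
    b[0] = ld2(B + group * 16 + tig * 2    );        // b0,b1 (B stored col-major)
    b[1] = ld2(B + group * 16 + tig * 2 + 8);        // b2,b3

    // The actual tensor-core instruction: D = A*B + D for one 16x8x16 tile.
    asm("mma.sync.aligned.m16n8k16.row.col.f32.f16.f16.f32 "
        "{%0,%1,%2,%3}, {%4,%5,%6,%7}, {%8,%9}, {%0,%1,%2,%3};"
        : "+f"(d[0]), "+f"(d[1]), "+f"(d[2]), "+f"(d[3])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]), "r"(b[0]), "r"(b[1]));

    // Scatter the accumulator fragment back to a row-major 16x8 result.
    D[(group    ) * 8 + tig * 2    ] = d[0];
    D[(group    ) * 8 + tig * 2 + 1] = d[1];
    D[(group + 8) * 8 + tig * 2    ] = d[2];
    D[(group + 8) * 8 + tig * 2 + 1] = d[3];
}

int main() {
    half hA[16 * 16], hB[16 * 8];          // A row-major, B column-major
    float hD[16 * 8], ref[16 * 8];
    for (int i = 0; i < 16 * 16; ++i) hA[i] = __float2half((i % 7) * 0.25f);
    for (int i = 0; i < 16 * 8;  ++i) hB[i] = __float2half((i % 5) * 0.5f);
    for (int r = 0; r < 16; ++r)           // CPU reference for the same product
        for (int c = 0; c < 8; ++c) {
            float acc = 0.0f;
            for (int k = 0; k < 16; ++k)
                acc += __half2float(hA[r * 16 + k]) * __half2float(hB[c * 16 + k]);
            ref[r * 8 + c] = acc;
        }
    half *dA, *dB; float *dD;
    cudaMalloc(&dA, sizeof(hA)); cudaMalloc(&dB, sizeof(hB)); cudaMalloc(&dD, sizeof(hD));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);
    mma_16x8x16<<<1, 32>>>(dA, dB, dD);
    cudaMemcpy(hD, dD, sizeof(hD), cudaMemcpyDeviceToHost);
    float max_err = 0.0f;
    for (int i = 0; i < 16 * 8; ++i)
        max_err = fmaxf(max_err, fabsf(hD[i] - ref[i]));
    printf("max abs error vs. CPU reference: %f\n", max_err);
    return 0;
}
```

A real kernel would stage tiles through shared memory (e.g. with `ldmatrix`) and pipeline the loads; the point here is only that the programmer, not a library, controls the exact instruction and register layout, which is also what would have to be redone from scratch for AMD hardware.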
Related Issues (20)
- Pretokenizer not supported by conversion script
- convert.py still fails on llama3 8B-Instruct downloaded directly from Meta (Huggingface works)
- Flash attention implementations do not handle case where value vectors have different dimension from query vectors
- AMD ROCm: 8x22B Model Causes 100% GPU Utilization Stall
- [Android/Termux] Significantly higher RAM usage with Vulkan compared to CPU only
- Description of "-t N" option for server is inaccurate
- Need help on building shared libraries on Windows machine for Android x86_64 (emulator)
- [SYCL] include shared libs in sycl release
- Can I handle multiple images in the same context?
- bf16 problem
- llama_model_load: error loading model: unable to allocate backend buffer
- Funny response with LLaMa 3 8B
- Why does the server-cuda container consume CPU time?
- convert-hf-to-gguf.py fails PR #7234
- Custom `seed` values ignored by `llama.cpp HTTP server`
- Different result between use llama_tokenize and python original transformers tokenizer
- Possible bug in the 'deepseek-coder' chat template's system message
- llama_get_logits_ith: invalid logits id 14, reason: no logits
- Possible (very serious) bug in chat templates that use '<s>' token having a space added after it
- Llama.cpp server doesn't return grammar error messages when in streaming mode