Comments (4)
I found that VRAM usage does not increase significantly during inference, so something else may have caused the issue.
from llama.cpp.
Bisected and found the commit that caused the issue, so keeping this open.
@0cc4m this is probably my bad, I made some changes to the way views are initialized in ggml-backend that may have created this issue. Views are now initialized in the buffer of their parent tensor, instead of on the compute buffer. The reason I made this change is because I came to the conclusion that allocating views on the compute buffer cannot work reliably because the compute buffer is not always of the same type as the buffer used to allocate the tensor originally, and backends should be able to use the same extra as their parent anyway. I thought it was safe to make this change because the CUDA backend no longer needs extras for normal buffers, but I didn't realize that the vulkan backend still does.
Looking at the ggml_tensor_extra_gpu of the vulkan backend, I think it should be possible to do this. The only change is that you would have to calculate the offset as t->extra->offset + t->view_offs, essentially adding the offset of the view to the offset of the extra. Does that sound right?
Related Issues (20)
- server: Bring back multimodal support
- Feature Request: Support for Florence-2 Vision Models
- Feature Request: Hardware support check
- Bug: Or Feature? BPE Tokenization mutates whitespaces into double-whitespace tokens when add_prefix_space is true (default)
- Bug: Qwen2-72B-Instruct (and finetunes) Q4_K_M generates random output
- Bug: Inference is messed up in llama-server+default ui and llama-cli but works in llama-server+openweb ui
- Bug: `-fPIC` compiler flag missing in cmake build?
- Bug: Embedding endpoint takes exponential time to process a long unknown token
- When converting a relatively large model, I get the error "Unable to allocate 1.96 GiB for an array with shape (128256, 8192) and data type float16". How do I resolve this?
- Bug: moondream2 inference not correct (severe quality degradation compared to reference)
- Tag b3187 Windows ARM binary release without "main.exe"
- Bug: ABI problem in binary file "llama-b3187-bin-win-msvc-arm64.zip"
- Bug: --chat-template seems to be broken now, no way to truly chat from the llama-cli
- Bug: LoRA Finetuning fails for GPU offloading
- Bug: brew install on a Mac
- Bug: Persistent hallucination even after re-running llama.cpp
- win7 failed
- Bug: JSON Schema - enum behind a $ref generates an object with unrestricted properties
- Bug: llama-server crashes when started with --embeddings
- Bug: similar sizes suggest some heavy shared component in all 38 `llama-*` binaries (which now weigh 14 GB in total)