Comments (3)
> If I add lm_head.linear1 and lm_head.linear2

Even if this works, it will likely be interpreted as just two linear .weight-type projections in series, whereas to use a .bias it needs to do an affine projection. I don't know enough about llama.cpp to help more, but IIRC the Qwen models have some affine projections in them and use .bias as well as .weight, so this might be worth a look.
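To make the linear-vs-affine distinction concrete: two linear projections in series collapse into a single linear map (W2·(W1·x) = (W2·W1)·x), so stacking .weight tensors alone expresses nothing new; a .bias makes the map affine, y = W·x + b. A minimal sketch in ggml terms, where ggml_mul_mat and ggml_add are real ggml graph ops but the helper name and tensor shapes are illustrative, not llama.cpp's actual code:

```cpp
#include "ggml.h"
#include <stddef.h>

// Hypothetical helper (not actual llama.cpp code): builds an lm_head
// projection in a ggml graph, adding the bias only when one exists.
static struct ggml_tensor * build_lm_head(
        struct ggml_context * ctx,
        struct ggml_tensor  * inp,  // hidden states  [n_embd, n_tokens]
        struct ggml_tensor  * w,    // weight         [n_embd, n_vocab]
        struct ggml_tensor  * b) {  // bias [n_vocab], or NULL if absent
    // linear part: logits = W * inp  -> [n_vocab, n_tokens]
    struct ggml_tensor * cur = ggml_mul_mat(ctx, w, inp);
    // affine part: add the bias, broadcast across tokens
    if (b != NULL) {
        cur = ggml_add(ctx, cur, b);
    }
    return cur;
}
```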
> Can you provide some tips on what I need to modify to make this work?

If it's a variation of an existing architecture, you might be able to simply declare the new tensors as optional on model load, then detect their presence when building the compute graph and use them only when they exist. This is roughly how StableLM2 1.6B support was added in #5052; see the sketch below.
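A sketch of that optional-tensor pattern, loosely modeled on the #5052-era code. The identifiers (bq, wq) and the create_tensor signature here are assumptions for illustration; the loader API has changed across llama.cpp versions, so check the current source:

```cpp
// 1) at load time: mark the bias tensor as optional (required = false),
//    so models that do not ship it still load cleanly
layer.bq = ml.create_tensor(ctx, tn(LLM_TENSOR_ATTN_Q, "bias", i), {n_embd}, false);

// 2) in the compute graph: apply the bias only if it was present
struct ggml_tensor * Qcur = ggml_mul_mat(ctx0, layer.wq, cur);
if (layer.bq) {
    Qcur = ggml_add(ctx0, Qcur, layer.bq); // affine: add bias when loaded
}
```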
> Also, if there is any documentation on porting new model architectures, I would appreciate it if you could point me to it.

https://github.com/ggerganov/llama.cpp/blob/master/docs/HOWTO-add-model.md