Comments (4)
Thanks! The bigger problem now is that I'm out of disk space, haha!
Anyway, I'll try to figure something out later.
from llama.cpp.
Set up a tip jar to get @ggerganov a bigger SSD and/or a MacBook :D
It's kinda pointless now, but I was able to merge the 30B and 65B models with this core bit of hackery added to the convert script.
```diff
+    fname_model = sys.argv[1] + "/consolidated." + str(i).zfill(2) + ".pth"
+    model_i = torch.load(fname_model, map_location="cpu")
+
+    # Since the models are split, we need to append the tensors, changing the shape/size
+    for k, v in model_i.items():
+        if k in model:
+            if model[k].dtype != v.dtype:
+                print("ERROR: Tensor types do not match: ", model[k].dtype, " vs ", v.dtype)
+                sys.exit(1)
+            elif len(model[k].shape) == 1:
+                print("Skipping tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                continue
+            elif k == "output.weight":
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=0)
+                print("New shape: ", model[k].shape)
+                continue
+            elif "tok_embeddings" in k:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=1)
+                print("New shape: ", model[k].shape)
+                continue
+            elif "attention.wo" in k:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=1)
+                print("New shape: ", model[k].shape)
+                continue
+            elif "feed_forward.w2" in k:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=1)
+                print("New shape: ", model[k].shape)
+            else:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype, " with shape: ", model[k].shape)
+                model[k] = torch.cat((model[k], v), dim=0)
+                print("New shape: ", model[k].shape)
+        else:
+            print("Adding tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+            model[k] = v
+    del model_i
```
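The different `dim` arguments reflect how the original checkpoints shard each weight across model-parallel ranks: some matrices are split along the output dimension (dim 0, e.g. `output.weight`), others along the input/embedding dimension (dim 1, e.g. `tok_embeddings`, `attention.wo`, `feed_forward.w2`), and 1-D norm weights are replicated in every part, hence skipped. A minimal sketch of the merge logic, using NumPy in place of PyTorch and illustrative shapes (not the actual checkpoint layout):

```python
import numpy as np

# A 4x6 "full" weight matrix, which we pretend was sharded across two ranks.
full = np.arange(24).reshape(4, 6)

# Tensors split along dim 0 (rows) are restored by concatenating on axis 0:
row_shards = np.split(full, 2, axis=0)            # two (2, 6) pieces
merged_rows = np.concatenate(row_shards, axis=0)  # back to (4, 6)

# Tensors split along dim 1 (columns) are restored by concatenating on axis 1:
col_shards = np.split(full, 2, axis=1)            # two (4, 3) pieces
merged_cols = np.concatenate(col_shards, axis=1)  # back to (4, 6)

assert (merged_rows == full).all()
assert (merged_cols == full).all()
```

Concatenating along the wrong axis would still produce a tensor, just with a nonsense layout, which is why the script special-cases each tensor name.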
Fixed with 007a8f6. On startup, we go through all the parts and merge them dynamically into the ggml buffers.
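The idea behind the fix (sketched here in Python with NumPy, not the actual llama.cpp C code) is that no merged checkpoint file is ever written: each part's shard is copied directly into the right slice of a preallocated buffer at load time. The `load_merged` helper below is a hypothetical illustration, not an llama.cpp API:

```python
import numpy as np

def load_merged(parts, split_dim):
    """Merge one tensor sharded across `parts` along `split_dim` by copying
    each shard into its slice of a preallocated buffer (a stand-in for the
    ggml buffer; illustrative only)."""
    shape = list(parts[0].shape)
    shape[split_dim] = sum(p.shape[split_dim] for p in parts)
    buf = np.empty(shape, dtype=parts[0].dtype)  # preallocated destination
    offset = 0
    for p in parts:
        n = p.shape[split_dim]
        idx = [slice(None)] * buf.ndim
        idx[split_dim] = slice(offset, offset + n)
        buf[tuple(idx)] = p                      # copy shard into place
        offset += n
    return buf

# Two shards of an embedding-like matrix, split along dim 1:
a = np.zeros((4, 3))
b = np.ones((4, 3))
merged = load_merged([a, b], split_dim=1)
assert merged.shape == (4, 6)
```

This avoids ever holding both a full set of parts and a fully concatenated copy in memory at once, which matters when disk and RAM are tight.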