Comments (6)
A total weight anywhere in the 0-1.2 range will almost certainly be fine. You can probably get a lot weirder than that, but I'm still experimenting myself.
The paper this method comes from (https://arxiv.org/abs/2311.03099) shows great results with a drop rate as high as 0.9, which would be a density
value of 0.1. I haven't tried going that low yet, though; densities of 0.3-0.5 have worked for me so far.
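For intuition on how density relates to the paper's drop rate, here's a toy sketch of the core DARE operation (my own illustration, not mergekit's actual implementation): drop each delta entry with probability 1 - density, then rescale the survivors so the expected delta is unchanged.

```python
import torch

def dare(delta: torch.Tensor, density: float) -> torch.Tensor:
    # Keep each delta entry with probability `density` (drop rate = 1 - density),
    # then rescale survivors by 1/density so the expected delta is unchanged.
    mask = (torch.rand_like(delta) < density).to(delta.dtype)
    return delta * mask / density

torch.manual_seed(0)
base = torch.zeros(1000)
finetuned = base + 0.01  # pretend the finetune shifted every weight by 0.01
merged = base + dare(finetuned - base, density=0.3)
# ~30% of entries survive, scaled up ~3.3x; the mean delta stays ~0.01.
```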
I'd be interested to hear if you get any fun results or run into any trouble with the code.
from mergekit.
Well, for starters, the install for the new branch doesn't quite work; I had to manually copy the scripts and merge_methods folders into pip's install directory.
Mergekit doesn't like the Yi tokenizer, but that's fine, I can just use the llama one or copy it over.
Also my first test merge seems to be corrupt, and makes transformers error out with a bunch of strange CUDA asserts. A ties merge from the main branch 5 days ago worked fine. The config was:
```yaml
models:
  - model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
    # no parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-M-v1.2
    parameters:
      weight: 0.62
      density: 0.55
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.56
      density: 0.55
merge_method: dare_ties
base_model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
parameters:
  int8_mask: true
dtype: bfloat16
```
And the error was:
```
../aten/src/ATen/native/cuda/Indexing.cu:1237: indexSelectSmallIndex: block: [28,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/home/alpha/AI/text-generation-webui/modules/callbacks.py", line 57, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/text-generation-webui/modules/text_generation.py", line 355, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 1719, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 2801, in sample
    outputs = self(
              ^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 879, in forward
    inputs_embeds = self.embed_tokens(input_ids)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/functional.py", line 2237, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
```
*shrug* My testing time is limited, but I will poke at it some more soon, lol.
That looks like a tokenizer mismatch issue to me. Did you maybe copy in the tokenizer for Tess-M-v1.2? The added tokens not present in the base model could cause that particular error.
(You can probably also work with the Yi tokenizer class directly if you pass --trust-remote-code, if that's your jam.)
I'll see if I can replicate the setup issue too, that sounds annoying.
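For anyone else hitting this: that particular assert fires when a token id indexes past the end of the embedding matrix, which you can sanity-check on the CPU by comparing `max(tokenizer.get_vocab().values())` against `model.get_input_embeddings().weight.shape[0]`. A minimal sketch of the check, with illustrative numbers (not taken from any real model):

```python
def embedding_mismatch(max_token_id: int, embedding_rows: int) -> bool:
    # The CUDA assert `srcIndex < srcSelectDimSize` means a token id
    # indexed past the end of the embedding weight matrix.
    return max_token_id >= embedding_rows

# Illustrative numbers: a tokenizer whose added tokens push ids up to 64063,
# against a base-model embedding with only 64000 rows.
print(embedding_mismatch(64063, 64000))  # mismatch -> expect the CUDA assert
print(embedding_mismatch(63999, 64000))  # ids fit -> no assert
```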
That is precisely what I did, to the dot. You probably don't have to replicate the model config, lol.
Yeah, it works with the base model tokenizer, thanks. In fact, a few responses from the merged model seem pretty smart.
Any positive results from parameter tweaking yet?
Also, is there a particular reason not to go higher density? Shouldn't values above 0.5 "preserve" more of the finetuning from the models?