Comments (6)
A total weight anywhere in the 0-1.2 range will almost certainly be fine. You can probably get a lot weirder than that, but I'm still experimenting myself.
The paper this method comes from (https://arxiv.org/abs/2311.03099) shows great results with a drop rate as high as 0.9, which would be a density
value of 0.1. I haven't tried going that low yet, though; densities of 0.3-0.5 have worked for me so far.
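For intuition on how density relates to the paper's drop rate, here's a toy sketch of the core DARE operation (my own illustration, not mergekit's actual implementation): drop each delta entry with probability 1 - density, then rescale the survivors so the expected delta is unchanged.

```python
import torch

def dare(delta: torch.Tensor, density: float) -> torch.Tensor:
    # Keep each delta entry with probability `density` (drop rate = 1 - density),
    # then rescale survivors by 1/density so the expected delta is unchanged.
    mask = (torch.rand_like(delta) < density).to(delta.dtype)
    return delta * mask / density

torch.manual_seed(0)
base = torch.zeros(1000)
finetuned = base + 0.01  # pretend the finetune shifted every weight by 0.01
merged = base + dare(finetuned - base, density=0.3)
# ~30% of entries survive, scaled up ~3.3x; the mean delta stays ~0.01.
```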
I'd be interested to hear if you get any fun results or run into any trouble with the code.
from mergekit.
Well, for starters, the install for the new branch doesn't quite work; I had to manually copy the scripts and merge_methods folders into pip's install directory.
Mergekit doesn't like the Yi tokenizer, but that's fine, I can just use the llama one or copy it over.
Also my first test merge seems to be corrupt, and makes transformers error out with a bunch of strange CUDA asserts. A ties merge from the main branch 5 days ago worked fine. The config was:
```yaml
models:
  - model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
    # no parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-M-v1.2
    parameters:
      weight: 0.62
      density: 0.55
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.56
      density: 0.55
merge_method: dare_ties
base_model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
parameters:
  int8_mask: true
dtype: bfloat16
```
And the error was:
```
../aten/src/ATen/native/cuda/Indexing.cu:1237: indexSelectSmallIndex: block: [28,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/home/alpha/AI/text-generation-webui/modules/callbacks.py", line 57, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/text-generation-webui/modules/text_generation.py", line 355, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 1719, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 2801, in sample
    outputs = self(
              ^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 879, in forward
    inputs_embeds = self.embed_tokens(input_ids)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/functional.py", line 2237, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
```
*shrug* My testing time is limited, but I will poke at it some more soon, lol.
That looks like a tokenizer mismatch issue to me. Did you maybe copy in the tokenizer for Tess-M-v1.2? The added tokens not present in the base model could cause that particular error.
(You can probably also work with the Yi tokenizer class directly if you pass --trust-remote-code, if that's your jam.)
I'll see if I can replicate the setup issue too, that sounds annoying.
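For anyone else hitting this: that particular assert fires when a token id indexes past the end of the embedding matrix, which you can sanity-check on the CPU by comparing `max(tokenizer.get_vocab().values())` against `model.get_input_embeddings().weight.shape[0]`. A minimal sketch of the check, with illustrative numbers (not taken from any real model):

```python
def embedding_mismatch(max_token_id: int, embedding_rows: int) -> bool:
    # The CUDA assert `srcIndex < srcSelectDimSize` means a token id
    # indexed past the end of the embedding weight matrix.
    return max_token_id >= embedding_rows

# Illustrative numbers: a tokenizer whose added tokens push ids up to 64063,
# against a base-model embedding with only 64000 rows.
print(embedding_mismatch(64063, 64000))  # mismatch -> expect the CUDA assert
print(embedding_mismatch(63999, 64000))  # ids fit -> no assert
```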
That is precisely what I did, to the dot. You probably don't have to replicate the model config, lol.
Yeah, it works with the base model tokenizer, thanks. In fact, a few responses from the merged model seem pretty smart.
Any positive results from parameter tweaking yet?
Also, is there a particular reason not to go higher density? Shouldn't values above 0.5 "preserve" more of the finetuning from the models?