
Comments (6)

cg123 commented on May 28, 2024

A total anywhere in the 0-1.2 range will almost certainly be fine. You can probably get a lot weirder than that, but I'm still experimenting myself.

The paper this method comes from (https://arxiv.org/abs/2311.03099) shows great results with a drop rate as high as 0.9, which would be a density value of 0.1. I haven't tried that low yet though. 0.3-0.5 have worked for me so far.

I'd be interested to hear if you get any fun results or run into any trouble with the code.
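The drop-rate/density relationship above can be sketched in a few lines of NumPy. This is a toy illustration of the DARE idea from the paper, not mergekit's actual implementation: each element of the task vector (fine-tuned weights minus base weights) is dropped with probability `drop_rate`, and the survivors are rescaled by `1 / (1 - drop_rate)` so the expected value of the delta is unchanged.

```python
import numpy as np

def dare(delta, drop_rate, rng):
    """DARE sketch: randomly drop delta elements, rescale the survivors.

    delta: task vector (fine-tuned weights minus base weights).
    drop_rate: probability of zeroing each element; density = 1 - drop_rate.
    """
    mask = rng.random(delta.shape) >= drop_rate   # keep with prob (1 - drop_rate)
    return delta * mask / (1.0 - drop_rate)       # rescale to preserve expectation

rng = np.random.default_rng(0)
delta = rng.standard_normal(100_000)              # stand-in for a real task vector
sparse = dare(delta, drop_rate=0.9, rng=rng)      # i.e. density = 0.1
# Roughly 10% of entries survive; each survivor is scaled up 10x,
# so the expected contribution of every parameter is preserved.
```

A drop rate of 0.9 here corresponds to the density value of 0.1 mentioned above.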

from mergekit.

brucethemoose commented on May 28, 2024

Well, for starters, the install for the new branch doesn't quite work: I had to manually copy the scripts and merge_methods folders into pip's install directory.

Mergekit doesn't like the Yi tokenizer, but that's fine, I can just use the llama one or copy it over.

Also, my first test merge seems to be corrupt; it makes transformers error out with a bunch of strange CUDA asserts. A ties merge from the main branch five days ago worked fine. The config was:

models:
  - model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
    # no parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-M-v1.2
    parameters:
      weight: 0.62
      density: 0.55
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.56
      density: 0.55
merge_method: dare_ties
base_model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
parameters:
  int8_mask: true
dtype: bfloat16
...
../aten/src/ATen/native/cuda/Indexing.cu:1237: indexSelectSmallIndex: block: [28,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/home/alpha/AI/text-generation-webui/modules/callbacks.py", line 57, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/text-generation-webui/modules/text_generation.py", line 355, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 1719, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/generation/utils.py", line 2801, in sample
    outputs = self(
              ^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 879, in forward
    inputs_embeds = self.embed_tokens(input_ids)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "/home/alpha/AI/voltaML-fast-stable-diffusion/venv/lib/python3.11/site-packages/torch/nn/functional.py", line 2237, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered

*shrug* My testing time is limited, but I'll poke at it some more soon, lol.


cg123 commented on May 28, 2024

That looks like a tokenizer mismatch issue to me. Did you maybe copy in the tokenizer for Tess-M-v1.2? The added tokens not present in the base model could cause that particular error.

(You can probably also work with the Yi tokenizer class directly if you pass --trust-remote-code, if that's your jam.)

I'll see if I can replicate the setup issue too, that sounds annoying.
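The failure mode described above can be demonstrated with a toy NumPy lookup (hypothetical sizes, not mergekit's code): the merged checkpoint keeps the base model's embedding table, but a tokenizer with added tokens emits ids beyond its last row. On CUDA this surfaces as the opaque `srcIndex < srcSelectDimSize` device-side assert; on CPU the equivalent lookup is a plain index error. In practice, one quick sanity check is comparing the tokenizer's vocabulary size against the row count of the model's input embeddings.

```python
import numpy as np

vocab_size, hidden = 64_000, 8           # hypothetical sizes for illustration
embed = np.zeros((vocab_size, hidden))   # stand-in for the embed_tokens weight

ok_ids = np.array([0, 1, 63_999])
_ = embed[ok_ids]                        # fine: every id < vocab_size

bad_ids = np.array([64_001])             # an "added token" id past the table
try:
    _ = embed[bad_ids]                   # same out-of-range lookup the CUDA
except IndexError as e:                  # kernel asserts on, but readable
    print("lookup failed:", e)
```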


brucethemoose commented on May 28, 2024

> That looks like a tokenizer mismatch issue to me. Did you maybe copy in the tokenizer for Tess-M-v1.2? The added tokens not present in the base model could cause that particular error.
>
> (You can probably also work with the Yi tokenizer class directly if you pass --trust-remote-code, if that's your jam.)
>
> I'll see if I can replicate the setup issue too, that sounds annoying.

That is precisely what I did, to the dot. You probably don't have to replicate the model config, lol.


brucethemoose commented on May 28, 2024

Yeah it works with the base model tokenizer, thanks. In fact, a few responses from the merge model seem pretty smart.


brucethemoose commented on May 28, 2024

Any positive results from parameter tweaking yet?

Also, is there a particular reason not to go with a higher density? Shouldn't values above 0.5 "preserve" more of the finetuning from the models?
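There is a real tradeoff behind this question, which a quick NumPy experiment can make concrete (a toy sketch with a random task vector, not a merge of real weights). Higher density does reconstruct each fine-tune's delta more faithfully: the per-element relative error of the DARE operation is sqrt((1 - density) / density), so it shrinks as density grows. The flip side, per the DARE paper, is that sparser deltas overlap less and so interfere less when several models are summed, which is why surprisingly low densities can still work well.

```python
import numpy as np

def dare_sparsify(delta, density, rng):
    """Keep each element with probability `density`, rescale by 1/density."""
    mask = rng.random(delta.shape) < density
    return delta * mask / density

rng = np.random.default_rng(0)
delta = rng.standard_normal(1_000_000)    # stand-in for a real task vector

for density in (0.1, 0.3, 0.5, 0.9):
    approx = dare_sparsify(delta, density, rng)
    rel_err = np.linalg.norm(approx - delta) / np.linalg.norm(delta)
    # Error decreases with density, matching sqrt((1 - density) / density).
    print(f"density={density:.1f}  relative error={rel_err:.3f}")
```

So a higher density is closer to each individual fine-tune, while a lower one trades per-model fidelity for less interference between the merged models.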

