Tools for merging pretrained large language models.

License: GNU Lesser General Public License v3.0

mergekit

mergekit is a toolkit for merging pre-trained language models. mergekit uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. Many merging algorithms are supported, with more coming as they catch my attention.

Features:

Supports Llama, Mistral, GPT-NeoX, StableLM, and more
Many merge methods
GPU or CPU execution
Lazy loading of tensors for low memory use
Interpolated gradients for parameter values (inspired by Gryphe's BlockMerge_Gradient script)
Piecewise assembly of language models from layers ("Frankenmerging")
Mixture of Experts merging

🔊 Call to Evolve - to solve evolutionary merge methods as a community - please see #207.

🌐 GUI Launch Alert 🤗 - We are excited to announce the launch of a graphical user interface for mergekit in Hugging Face Spaces! This GUI simplifies the merging process, making it more accessible to a broader audience. Check it out and contribute at Hugging Face Spaces - mergekit-community.

Installation

git clone https://github.com/arcee-ai/mergekit.git
cd mergekit

pip install -e .  # install the package and make scripts available

If the above fails with the error:

ERROR: File "setup.py" or "setup.cfg" not found. Directory cannot be installed in editable mode:
(A "pyproject.toml" file was found, but editable mode currently requires a setuptools-based build.)

you may need to upgrade pip to a version newer than 21.3 with the command python3 -m pip install --upgrade pip

Usage

The script mergekit-yaml is the main entry point for mergekit. It takes a YAML configuration file and an output path, like so:

mergekit-yaml path/to/your/config.yml ./output-model-directory [--cuda] [--lazy-unpickle] [--allow-crimes] [... other options]

This will run the merge and write your merged model to ./output-model-directory.

For more information on the arguments accepted by mergekit-yaml, run mergekit-yaml --help.

Uploading to Hugging Face

When you have a merged model you're happy with, you may want to share it on the Hugging Face Hub. mergekit generates a README.md for your merge with some basic information for a model card. You can edit it to include more details about your merge, like giving it a good name or explaining what it's good at; rewrite it entirely; or use the generated README.md as-is. It is also possible to edit your README.md online once it has been uploaded to the Hub.

Once you're happy with your model card and merged model, you can upload it to the Hugging Face Hub using the huggingface_hub Python library.

# log in to huggingface with an access token (must have write permission)
huggingface-cli login
# upload your model
huggingface-cli upload your_hf_username/my-cool-model ./output-model-directory .

The documentation for huggingface_hub goes into more detail about other options for uploading.
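
The same upload can also be done from Python rather than the CLI. The following is a minimal sketch, not the only way to do it: the function name upload_merged_model and the repository ID are placeholders, and it assumes you have already logged in with a token that has write permission.

```python
def upload_merged_model(repo_id, folder="./output-model-directory"):
    """Upload a merged model directory to the Hugging Face Hub.

    upload_merged_model is a hypothetical helper name, not part of mergekit.
    """
    # Imported lazily so this file can be loaded without the dependency.
    from huggingface_hub import HfApi  # pip install huggingface_hub

    api = HfApi()
    # Create the model repository if it does not exist yet.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    # Upload every file in the output directory, including README.md.
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="model")


# Example call (placeholder repository name, requires network access):
# upload_merged_model("your_hf_username/my-cool-model")
```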

Merge Configuration

Merge configurations are YAML documents specifying the operations to perform in order to produce your merged model. Below are the primary elements of a configuration file:

merge_method: Specifies the method to use for merging models. See Merge Methods for a list.
slices: Defines slices of layers from different models to be used. This field is mutually exclusive with models.
models: Defines entire models to be used for merging. This field is mutually exclusive with slices.
base_model: Specifies the base model used in some merging methods.
parameters: Holds various parameters such as weights and densities, which can also be specified at different levels of the configuration.
dtype: Specifies the data type used for the merging operation.
tokenizer_source: Determines how to construct a tokenizer for the merged model.
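
To make the shape of a configuration concrete, here is a sketch of one as a Python dict mirroring the YAML structure, together with a small sanity check. The model names are made up, and check_merge_config is a hypothetical helper written for illustration; it is not mergekit's own validation.

```python
def check_merge_config(config):
    """Check the top-level shape of a merge configuration dict (illustrative only)."""
    if "merge_method" not in config:
        raise ValueError("merge_method is required")
    has_slices = bool(config.get("slices"))
    has_models = bool(config.get("models"))
    # slices and models are mutually exclusive; this sketch assumes one must be given.
    if has_slices == has_models:
        raise ValueError("specify exactly one of 'slices' or 'models'")
    return config


# A linear merge of two hypothetical models, weighted 0.7 / 0.3.
config = {
    "merge_method": "linear",
    "models": [
        {"model": "example-org/model-a", "parameters": {"weight": 0.7}},
        {"model": "example-org/model-b", "parameters": {"weight": 0.3}},
    ],
    "parameters": {"normalize": True},
    "dtype": "float16",
}
check_merge_config(config)
```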

Parameter Specification

Parameters are flexible and can be set with varying precedence. They can be specified conditionally using tensor name filters, which allows finer control such as differentiating between attention heads and fully connected layers.

Parameters can be specified as:

Scalars: Single floating-point values.
Gradients: List of floating-point values, specifying an interpolated gradient.
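
As a rough illustration of what an interpolated gradient means, the sketch below spreads a list of anchor values evenly over the layers and interpolates linearly between them. The even spacing and the function name interpolate_gradient are assumptions made for this example; the exact sampling mergekit uses may differ.

```python
def interpolate_gradient(anchors, num_layers):
    """Linearly interpolate a list of anchor values across num_layers layers."""
    if len(anchors) == 1 or num_layers == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Map the layer index onto the anchor index space [0, len(anchors) - 1].
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append(anchors[lo] * (1 - frac) + anchors[lo + 1] * frac)
    return values


# A value that ramps up to 1.0 in the middle layers and back down to 0.0.
print(interpolate_gradient([0.0, 1.0, 0.0], 5))  # [0.0, 0.5, 1.0, 0.5, 0.0]
```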

The parameters can be set at different levels, with decreasing precedence as follows:

slices.*.sources.parameters - applying to a specific input slice
slices.*.parameters - applying to a specific output slice
models.*.parameters or input_model_parameters - applying to any tensors coming from specific input models
parameters - catchall
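
The precedence order above amounts to a first-match lookup from the most specific scope to the least specific. A minimal sketch, assuming each scope is a plain dict (resolve_parameter is a hypothetical name, and tensor name filters are left out):

```python
def resolve_parameter(name, input_slice=None, output_slice=None,
                      input_model=None, global_params=None, default=None):
    """Return the first value found for name, most specific scope first."""
    for scope in (input_slice, output_slice, input_model, global_params):
        if scope and name in scope:
            return scope[name]
    return default


# The slice-level weight wins over the model-level and global values.
weight = resolve_parameter(
    "weight",
    input_slice={"weight": 0.9},
    input_model={"weight": 0.5},
    global_params={"weight": 0.1, "normalize": True},
)
print(weight)  # 0.9
```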

Tokenizer Source

The tokenizer_source field of a configuration file determines what tokenizer is used by the merged model. This also affects how embeddings and language model heads are merged.

This functionality is still experimental and may break. Please file an issue if you run into problems with it.

Valid values:

base: use the tokenizer from the base model
union: construct a tokenizer with all tokens from all models
model:<model_path>: use the tokenizer from a specific model

If set, mergekit will find a mapping between each model's vocabulary and the output tokenizer. This allows models with different vocabularies or added tokens to be meaningfully merged.

tokenizer_source is compatible with all merge methods, but when it is used, lm_head/embed_tokens will be merged linearly. For two-model merges, the embed_slerp parameter can be set to true to use SLERP instead.

If the tokenizer_source field is not set, mergekit will fall back to its legacy default behavior. The tokenizer for the base model (or first model in the merge, if no base model is specified) will be copied to the output directory. The parameter matrices for lm_head/embed_tokens will be truncated to the smallest size present in the merge. In most cases this corresponds to using the tokenizer for the base model.

Examples

Several examples of merge configurations are available in examples/.

Merge Methods

A quick overview of the currently supported merge methods:

Method	`merge_method` value	Multi-Model	Uses base model
Linear (Model Soups)	`linear`	✅	❌
SLERP	`slerp`	❌	✅
Task Arithmetic	`task_arithmetic`	✅	✅
TIES	`ties`	✅	✅
DARE TIES	`dare_ties`	✅	✅
DARE Task Arithmetic	`dare_linear`	✅	✅
Passthrough	`passthrough`	❌	❌
Model Breadcrumbs	`breadcrumbs`	✅	✅
Model Breadcrumbs + TIES	`breadcrumbs_ties`	✅	✅
Model Stock	`model_stock`	✅	✅

Linear

The classic merge method - a simple weighted average.

Parameters:

weight - relative (or absolute if normalize=False) weighting of a given tensor
normalize - if true, the weights of all models contributing to a tensor will be normalized. This is the default behavior.
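
A sketch of the weighted average on flat lists of floats, standing in for real tensors. The function name linear_merge is made up for this example.

```python
def linear_merge(tensors, weights, normalize=True):
    """Weighted average of same-shaped tensors, given as flat lists of floats."""
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = [0.0] * len(tensors[0])
    for tensor, w in zip(tensors, weights):
        for i, value in enumerate(tensor):
            merged[i] += w * value
    return merged


a = [1.0, 2.0]
b = [3.0, 4.0]
print(linear_merge([a, b], [1.0, 3.0]))                   # [2.5, 3.5]
print(linear_merge([a, b], [1.0, 3.0], normalize=False))  # [10.0, 14.0]
```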

SLERP

Spherically interpolate the parameters of two models. One must be set as base_model.

Parameters:

t - interpolation factor. t=0 returns base_model; t=1 returns the other model.
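
For intuition, here is a sketch of spherical interpolation between two flattened parameter vectors. It is a textbook SLERP with a linear fallback for nearly parallel or zero vectors, under the assumption that each tensor is treated as one long vector; details such as the fallback threshold are choices made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between vectors v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 == 0.0 or norm1 == 0.0:
        # No direction to interpolate along; fall back to plain lerp.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > 1 - eps:
        # Nearly colinear vectors: sin(theta) is ~0, so use lerp instead.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


base = [1.0, 0.0]
other = [0.0, 1.0]
halfway = slerp(0.5, base, other)  # both components are about 0.7071
```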

Task Arithmetic

Computes "task vectors" for each model by subtracting a base model. Merges the task vectors linearly and adds back the base. Works great for models that were fine tuned from a common ancestor. Also a super useful mental framework for several of the more involved merge methods.

Parameters: same as Linear
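
In code, the idea looks roughly like this. The sketch uses flat lists of floats and absolute weights (no normalization), and the name task_arithmetic is just for this example.

```python
def task_arithmetic(base, finetuned, weights):
    """Add weighted task vectors (model - base) back onto the base model."""
    merged = list(base)
    for model, w in zip(finetuned, weights):
        for i, (m, b) in enumerate(zip(model, base)):
            merged[i] += w * (m - b)  # w * task vector component
    return merged


base = [1.0, 1.0]
model_a = [2.0, 1.0]  # task vector [1.0, 0.0]
model_b = [1.0, 3.0]  # task vector [0.0, 2.0]
print(task_arithmetic(base, [model_a, model_b], [1.0, 0.5]))  # [2.0, 2.0]
```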

TIES

Builds on the task arithmetic framework. Resolves interference between models by sparsifying the task vectors and applying a sign consensus algorithm. Allows you to merge a larger number of models and retain more of their strengths.

Parameters: same as Linear, plus:

density - fraction of weights in differences from the base model to retain
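
A simplified sketch of the two steps, trimming by magnitude and then electing a sign per parameter. It works on flat lists and ignores per-model weights, so treat it as an illustration of the idea rather than mergekit's implementation; both function names are made up here.

```python
def trim_by_magnitude(delta, density):
    """Keep the largest-magnitude `density` fraction of a task vector, zero the rest."""
    keep = int(round(density * len(delta)))
    if keep >= len(delta):
        return list(delta)
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    kept = set(ranked[:keep])
    return [d if i in kept else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, finetuned, density):
    """Trim task vectors, elect a sign per parameter, average the agreeing values."""
    deltas = [
        trim_by_magnitude([m - b for m, b in zip(model, base)], density)
        for model in finetuned
    ]
    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        sign = 1.0 if sum(column) >= 0 else -1.0  # sign consensus by total mass
        agreeing = [v for v in column if v * sign > 0]
        merged.append(b + (sum(agreeing) / len(agreeing) if agreeing else 0.0))
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -0.5, 0.1, 2.0]
model_b = [-0.2, -1.0, 0.3, 0.05]
print(ties_merge(base, [model_a, model_b], density=0.5))  # [1.0, -1.0, 0.3, 2.0]
```

With density=1.0 nothing is trimmed, and the first parameter still comes out as 1.0: model_b's conflicting -0.2 loses the sign vote and is left out of the average.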

DARE

In the same vein as TIES, sparsifies task vectors to reduce interference. Differs in that DARE uses random pruning with a novel rescaling to better match performance of the original models. DARE can be used either with the sign consensus algorithm of TIES (dare_ties) or without (dare_linear).

Parameters: same as TIES for dare_ties, or Linear for dare_linear
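
The random pruning and rescaling step can be sketched as follows. Here density is taken to be the fraction of task-vector entries that are kept, and survivors are divided by density so the expected value of each entry is unchanged. The name dare_prune and the pure-Python random source are choices made for this example.

```python
import random


def dare_prune(delta, density, rescale=True, rng=random):
    """Randomly keep each entry with probability `density`, rescaling the survivors."""
    if density >= 1:
        return list(delta)
    pruned = []
    for d in delta:
        keep = rng.random() < density
        value = d if keep else 0.0
        if rescale and keep:
            value /= density  # keeps the expected value equal to d
        pruned.append(value)
    return pruned


rng = random.Random(0)
sample = dare_prune([1.0] * 10000, density=0.25, rng=rng)
kept = sum(1 for v in sample if v != 0.0)
print(kept / len(sample))  # close to 0.25
```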

Passthrough

passthrough is a no-op that simply passes input tensors through unmodified. It is meant to be used for layer-stacking type merges where you have only one input model. Useful for frankenmerging.
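
To see what a layer-stacking merge produces, the sketch below expands a list of slices into the resulting layer sequence. It assumes layer_range is half-open, so [0, 24] means layers 0 through 23; the model name and the helper stack_slices are placeholders.

```python
def stack_slices(slices):
    """Expand (model, (start, end)) slices into an ordered list of (model, layer)."""
    stacked = []
    for model, (start, end) in slices:
        for layer in range(start, end):  # end is exclusive (assumed)
            stacked.append((model, layer))
    return stacked


layers = stack_slices([
    ("example-org/model-a", (0, 24)),
    ("example-org/model-a", (8, 32)),
])
print(len(layers))  # 48
print(layers[23], layers[24])  # last layer of slice one, first layer of slice two
```

The slices are concatenated in order rather than interleaved, so layers 8 through 23 of the source model appear twice in the output.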

Model Breadcrumbs

An extension of task arithmetic that discards both small and extremely large differences from the base model. As with DARE, the Model Breadcrumbs algorithm can be used with (breadcrumbs_ties) or without (breadcrumbs) the sign consensus algorithm of TIES.

Parameters: same as Linear, plus:

density - fraction of weights in differences from the base model to retain
gamma - fraction of largest magnitude differences to remove

Note that gamma corresponds to the parameter β described in the paper, while density is the final density of the sparsified tensors (related to γ and β by density = 1 - γ - β). For good default values, try density: 0.9 and gamma: 0.01.
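
A sketch of the masking step on a flat task vector: drop the top gamma fraction by magnitude, then keep the next density fraction, which leaves the smallest entries zeroed as well. The name breadcrumbs_mask and the rounding of the counts are assumptions for this example.

```python
def breadcrumbs_mask(delta, density, gamma):
    """Zero the largest `gamma` fraction and all but the next `density` fraction."""
    n = len(delta)
    n_top = int(round(gamma * n))     # largest-magnitude outliers to remove
    n_keep = int(round(density * n))  # entries that survive
    order = sorted(range(n), key=lambda i: abs(delta[i]))  # ascending magnitude
    hi = n - n_top
    lo = max(hi - n_keep, 0)
    kept = set(order[lo:hi])
    return [d if i in kept else 0.0 for i, d in enumerate(delta)]


delta = [0.01, -0.2, 0.5, -3.0, 0.05, 0.3, -0.4, 0.1, 0.02, 9.0]
print(breadcrumbs_mask(delta, density=0.5, gamma=0.1))
# [0.0, -0.2, 0.5, -3.0, 0.0, 0.3, -0.4, 0.0, 0.0, 0.0]
```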

Model Stock

Uses some neat geometric properties of fine tuned models to compute good weights for linear interpolation. Requires at least three models, including a base model.

Parameters:

filter_wise: if true, weight calculation will be per-row rather than per-tensor. Not recommended.
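
As a rough sketch of the per-tensor case, and going by my reading of the Model Stock paper rather than mergekit's code: the interpolation ratio t is derived from the angle θ between the fine-tuned models' task vectors, t = k·cos θ / (1 + (k - 1)·cos θ) for k fine-tuned models, and the result is t times their average plus (1 - t) times the base. The averaging of pairwise cosines and the name model_stock are assumptions made for this example.

```python
import math


def model_stock(base, finetuned):
    """Interpolate between the base and the average of the fine-tuned models."""
    k = len(finetuned)
    if k < 2:
        raise ValueError("need at least two fine-tuned models plus a base")
    deltas = [[m - b for m, b in zip(model, base)] for model in finetuned]
    # Average cosine between every pair of task vectors (assumes none is all-zero).
    cosines = []
    for i in range(k):
        for j in range(i + 1, k):
            dot = sum(a * b for a, b in zip(deltas[i], deltas[j]))
            norm_i = math.sqrt(sum(a * a for a in deltas[i]))
            norm_j = math.sqrt(sum(a * a for a in deltas[j]))
            cosines.append(dot / (norm_i * norm_j))
    cos_theta = sum(cosines) / len(cosines)
    t = k * cos_theta / (1 + (k - 1) * cos_theta)
    average = [sum(column) / k for column in zip(*finetuned)]
    return [t * a + (1 - t) * b for a, b in zip(average, base)]


base = [0.0, 0.0]
merged = model_stock(base, [[1.0, 0.0], [1.0, 1.0]])
```

When the task vectors are orthogonal, t is 0 and the result is just the base model; when they point the same way, t approaches 1 and the result is the plain average.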

LoRA extraction

Mergekit allows extracting PEFT-compatible low-rank approximations of finetuned models.

Usage

mergekit-extract-lora finetuned_model_id_or_path base_model_id_or_path output_path [--no-lazy-unpickle] --rank=desired_rank

Mixture of Experts merging

The mergekit-moe script supports merging multiple dense models into a mixture of experts, either for direct use or for further training. For more details see the mergekit-moe documentation.

Citation

We now have a paper you can cite for the MergeKit library:

@article{goddard2024arcee,
  title={Arcee's MergeKit: A Toolkit for Merging Large Language Models},
  author={Goddard, Charles and Siriwardhana, Shamane and Ehghaghi, Malikeh and Meyers, Luke and Karpukhin, Vlad and Benedict, Brian and McQuade, Mark and Solawetz, Jacob},
  journal={arXiv preprint arXiv:2403.13257},
  year={2024}
}

mergekit's Issues

how to merge the custom_code model?

I get an error like this: ValueError: The repository for D:/oobabooga_windows/text-generation-webui/models/THUDM_chatglm3-6b-32k contains custom code which must be executed to correctly load the model. You can inspect the repository content at THUDM/chatglm3-6b-32k.
Please pass the argument trust_remote_code=True to allow custom code to be run.

Deci 7B Support

Hi,
Is Deci 7B support planned?
Thank you!

How to Load Models Downloaded from Hugging Face in a Script?

Thank you very much for your open-source project, it has been immensely helpful to me. I've encountered an issue during use and am hoping to get your assistance.

I have downloaded several models from Hugging Face to my personal directory at ~/.cache/huggingface/hub. I'm wondering how to read these models directly in my script. Could you please guide me on how to achieve this?

Looking forward to your response. Thank you!

A Possible bug about `sparsify.py`

def bernoulli(
    tensor: torch.Tensor, density: float, rescale: bool = True
) -> torch.Tensor:
    if density >= 1:
        return tensor

    mask = 1 - torch.bernoulli(torch.full_like(input=tensor, fill_value=density))
    res = tensor * mask
    if rescale:
        res *= 1 / (1 - density)
    return res

In this function, is density used as the probability of masking a weight? If I have density=0.95, it will wipe out each parameter with probability 0.95.

Could you please explain how passthrough slicing works?

If I have:

slices:
  - sources:
    - model: mistralai/Mistral-7B-Instruct-v0.2
      layer_range: [0, 24]
  - sources:
    - model: mistralai/Mistral-7B-Instruct-v0.2
      layer_range: [8, 32]
merge_method: passthrough
dtype: float16

Is it taking layers 0-24 and then putting layers 8-32 of the same model on top?

Or does it intermingle them somehow?

Beginner Question: Merge 2 7B models to make 13B model

Hi,
This is a beginner question, as I'm completely new to model merging. Is it possible to merge 2 7B models to create a larger 13B model?
Thank you

Merge two models of different architectures

Hi,
Is it possible to merge a Mistral and a Llama model? Also, would it be possible to convert a Mistral model to a model in the Llama format?
Thank you!

MPS support

Hi,
Any plans to support MPS/Apple Silicon as well as CUDA?

does it support TinyLlama?

I tried to merge TinyLlama-1.1B-intermediate-step-1195k-token-2.5T and got this error with lazy_unpickle:
UnpicklingError: Unsupported type torch._tensor._rebuild_from_type_v2

and without lazy_unpickle:
UnpicklingError: Weights only load failed. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution.Do it only if you get the file from a trusted source. WeightsUnpickler error: Unsupported operand 149

phi-2 error

PS D:\mergekit-phi-2> mergekit-yaml 11b.yml ./output-model-directory --cuda
C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\dynamic_mo │
│ dule_utils.py:595 in resolve_trust_remote_code                                                   │
│                                                                                                  │
│   592 │   │   │   trust_remote_code = False                                                      │
│   593 │   │   elif has_remote_code and TIME_OUT_REMOTE_CODE > 0:                                 │
│   594 │   │   │   try:                                                                           │
│ ❱ 595 │   │   │   │   signal.signal(signal.SIGALRM, _raise_timeout_error)                        │
│   596 │   │   │   │   signal.alarm(TIME_OUT_REMOTE_CODE)                                         │
│   597 │   │   │   │   while trust_remote_code is None:                                           │
│   598 │   │   │   │   │   answer = input(                                                        │
│                                                                                                  │
│ ╭─────────────────────────── locals ────────────────────────────╮                                │
│ │    has_local_code = False                                     │                                │
│ │   has_remote_code = True                                      │                                │
│ │        model_name = 'cognitivecomputations/dolphin-2_6-phi-2' │                                │
│ │ trust_remote_code = None                                      │                                │
│ ╰───────────────────────────────────────────────────────────────╯                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: module 'signal' has no attribute 'SIGALRM'

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ D:\mergekit-phi-2\mergekit\scripts\run_yaml.py:85 in main                                        │
│                                                                                                  │
│    82 │   │   data = yaml.load(file, yaml.SafeLoader)                                            │
│    83 │                                                                                          │
│    84 │   merge_config: MergeConfiguration = MergeConfiguration.model_validate(data)             │
│ ❱  85 │   run_merge(                                                                             │
│    86 │   │   merge_config,                                                                      │
│    87 │   │   out_path,                                                                          │
│    88 │   │   options=MergeOptions(                                                              │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │       allow_crimes = False                                                                   │ │
│ │      clone_tensors = False                                                                   │ │
│ │        config_file = '11b.yml'                                                               │ │
│ │     copy_tokenizer = True                                                                    │ │
│ │               cuda = True                                                                    │ │
│ │               data = {                                                                       │ │
│ │                      │   'slices': [                                                         │ │
│ │                      │   │   {                                                               │ │
│ │                      │   │   │   'sources': [                                                │ │
│ │                      │   │   │   │   {                                                       │ │
│ │                      │   │   │   │   │   'model': 'cognitivecomputations/dolphin-2_6-phi-2', │ │
│ │                      │   │   │   │   │   'layer_range': [0, 18]                              │ │
│ │                      │   │   │   │   }                                                       │ │
│ │                      │   │   │   ]                                                           │ │
│ │                      │   │   },                                                              │ │
│ │                      │   │   {                                                               │ │
│ │                      │   │   │   'sources': [                                                │ │
│ │                      │   │   │   │   {                                                       │ │
│ │                      │   │   │   │   │   'model': 'cognitivecomputations/dolphin-2_6-phi-2', │ │
│ │                      │   │   │   │   │   'layer_range': [0, 32]                              │ │
│ │                      │   │   │   │   }                                                       │ │
│ │                      │   │   │   ]                                                           │ │
│ │                      │   │   },                                                              │ │
│ │                      │   │   {                                                               │ │
│ │                      │   │   │   'sources': [                                                │ │
│ │                      │   │   │   │   {                                                       │ │
│ │                      │   │   │   │   │   'model': 'cognitivecomputations/dolphin-2_6-phi-2', │ │
│ │                      │   │   │   │   │   'layer_range': [14, 32]                             │ │
│ │                      │   │   │   │   }                                                       │ │
│ │                      │   │   │   ]                                                           │ │
│ │                      │   │   }                                                               │ │
│ │                      │   ],                                                                  │ │
│ │                      │   'merge_method': 'passthrough',                                      │ │
│ │                      │   'dtype': 'bfloat16'                                                 │ │
│ │                      }                                                                       │ │
│ │               file = <_io.TextIOWrapper name='11b.yml' mode='r' encoding='utf-8'>            │ │
│ │      lazy_unpickle = False                                                                   │ │
│ │   lora_merge_cache = None                                                                    │ │
│ │     low_cpu_memory = False                                                                   │ │
│ │       merge_config = MergeConfiguration(                                                     │ │
│ │                      │   merge_method='passthrough',                                         │ │
│ │                      │   slices=[                                                            │ │
│ │                      │   │   OutputSliceDefinition(                                          │ │
│ │                      │   │   │   sources=[                                                   │ │
│ │                      │   │   │   │   InputSliceDefinition(                                   │ │
│ │                      │   │   │   │   │   model='cognitivecomputations/dolphin-2_6-phi-2',    │ │
│ │                      │   │   │   │   │   layer_range=(0, 18),                                │ │
│ │                      │   │   │   │   │   parameters=None                                     │ │
│ │                      │   │   │   │   )                                                       │ │
│ │                      │   │   │   ],                                                          │ │
│ │                      │   │   │   base_model=None,                                            │ │
│ │                      │   │   │   residual_weight=None,                                       │ │
│ │                      │   │   │   parameters=None                                             │ │
│ │                      │   │   ),                                                              │ │
│ │                      │   │   OutputSliceDefinition(                                          │ │
│ │                      │   │   │   sources=[                                                   │ │
│ │                      │   │   │   │   InputSliceDefinition(                                   │ │
│ │                      │   │   │   │   │   model='cognitivecomputations/dolphin-2_6-phi-2',    │ │
│ │                      │   │   │   │   │   layer_range=(0, 32),                                │ │
│ │                      │   │   │   │   │   parameters=None                                     │ │
│ │                      │   │   │   │   )                                                       │ │
│ │                      │   │   │   ],                                                          │ │
│ │                      │   │   │   base_model=None,                                            │ │
│ │                      │   │   │   residual_weight=None,                                       │ │
│ │                      │   │   │   parameters=None                                             │ │
│ │                      │   │   ),                                                              │ │
│ │                      │   │   OutputSliceDefinition(                                          │ │
│ │                      │   │   │   sources=[                                                   │ │
│ │                      │   │   │   │   InputSliceDefinition(                                   │ │
│ │                      │   │   │   │   │   model='cognitivecomputations/dolphin-2_6-phi-2',    │ │
│ │                      │   │   │   │   │   layer_range=(14, 32),                               │ │
│ │                      │   │   │   │   │   parameters=None                                     │ │
│ │                      │   │   │   │   )                                                       │ │
│ │                      │   │   │   ],                                                          │ │
│ │                      │   │   │   base_model=None,                                            │ │
│ │                      │   │   │   residual_weight=None,                                       │ │
│ │                      │   │   │   parameters=None                                             │ │
│ │                      │   │   )                                                               │ │
│ │                      │   ],                                                                  │ │
│ │                      │   models=None,                                                        │ │
│ │                      │   input_model_parameters=None,                                        │ │
│ │                      │   parameters=None,                                                    │ │
│ │                      │   base_model=None,                                                    │ │
│ │                      │   dtype='bfloat16',                                                   │ │
│ │                      │   tokenizer_source=None                                               │ │
│ │                      )                                                                       │ │
│ │           out_path = './output-model-directory'                                              │ │
│ │     out_shard_size = 5000000000                                                              │ │
│ │ transformers_cache = None                                                                    │ │
│ │  trust_remote_code = False                                                                   │ │
│ │            verbose = False                                                                   │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ D:\mergekit-phi-2\mergekit\merge.py:62 in run_merge                                              │
│                                                                                                  │
│    59 │   │   raise RuntimeError("No output requested")                                          │
│    60 │                                                                                          │
│    61 │   method = merge_methods.get(merge_config.merge_method)                                  │
│ ❱  62 │   model_arch_info = [                                                                    │
│    63 │   │   get_architecture_info(m.config()) for m in merge_config.referenced_models()        │
│    64 │   ]                                                                                      │
│    65 │   if not options.allow_crimes:                                                           │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │        dtype = torch.bfloat16                                                                │ │
│ │ merge_config = MergeConfiguration(                                                           │ │
│ │                │   merge_method='passthrough',                                               │ │
│ │                │   slices=[                                                                  │ │
│ │                │   │   OutputSliceDefinition(                                                │ │
│ │                │   │   │   sources=[                                                         │ │
│ │                │   │   │   │   InputSliceDefinition(                                         │ │
│ │                │   │   │   │   │   model='cognitivecomputations/dolphin-2_6-phi-2',          │ │
│ │                │   │   │   │   │   layer_range=(0, 18),                                      │ │
│ │                │   │   │   │   │   parameters=None                                           │ │
│ │                │   │   │   │   )                                                             │ │
│ │                │   │   │   ],                                                                │ │
│ │                │   │   │   base_model=None,                                                  │ │
│ │                │   │   │   residual_weight=None,                                             │ │
│ │                │   │   │   parameters=None                                                   │ │
│ │                │   │   ),                                                                    │ │
│ │                │   │   OutputSliceDefinition(                                                │ │
│ │                │   │   │   sources=[                                                         │ │
│ │                │   │   │   │   InputSliceDefinition(                                         │ │
│ │                │   │   │   │   │   model='cognitivecomputations/dolphin-2_6-phi-2',          │ │
│ │                │   │   │   │   │   layer_range=(0, 32),                                      │ │
│ │                │   │   │   │   │   parameters=None                                           │ │
│ │                │   │   │   │   )                                                             │ │
│ │                │   │   │   ],                                                                │ │
│ │                │   │   │   base_model=None,                                                  │ │
│ │                │   │   │   residual_weight=None,                                             │ │
│ │                │   │   │   parameters=None                                                   │ │
│ │                │   │   ),                                                                    │ │
│ │                │   │   OutputSliceDefinition(                                                │ │
│ │                │   │   │   sources=[                                                         │ │
│ │                │   │   │   │   InputSliceDefinition(                                         │ │
│ │                │   │   │   │   │   model='cognitivecomputations/dolphin-2_6-phi-2',          │ │
│ │                │   │   │   │   │   layer_range=(14, 32),                                     │ │
│ │                │   │   │   │   │   parameters=None                                           │ │
│ │                │   │   │   │   )                                                             │ │
│ │                │   │   │   ],                                                                │ │
│ │                │   │   │   base_model=None,                                                  │ │
│ │                │   │   │   residual_weight=None,                                             │ │
│ │                │   │   │   parameters=None                                                   │ │
│ │                │   │   )                                                                     │ │
│ │                │   ],                                                                        │ │
│ │                │   models=None,                                                              │ │
│ │                │   input_model_parameters=None,                                              │ │
│ │                │   parameters=None,                                                          │ │
│ │                │   base_model=None,                                                          │ │
│ │                │   dtype='bfloat16',                                                         │ │
│ │                │   tokenizer_source=None                                                     │ │
│ │                )                                                                             │ │
│ │       method = <mergekit.merge_methods.passthrough.PassthroughMerge object at                │ │
│ │                0x00000164EDC8A4D0>                                                           │ │
│ │      options = MergeOptions(                                                                 │ │
│ │                │   allow_crimes=False,                                                       │ │
│ │                │   transformers_cache=None,                                                  │ │
│ │                │   lora_merge_cache=None,                                                    │ │
│ │                │   cuda=True,                                                                │ │
│ │                │   low_cpu_memory=False,                                                     │ │
│ │                │   out_shard_size=5000000000,                                                │ │
│ │                │   copy_tokenizer=True,                                                      │ │
│ │                │   clone_tensors=False,                                                      │ │
│ │                │   trust_remote_code=False,                                                  │ │
│ │                │   random_seed=None,                                                         │ │
│ │                │   lazy_unpickle=False                                                       │ │
│ │                )                                                                             │ │
│ │     out_path = './output-model-directory'                                                    │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ D:\mergekit-phi-2\mergekit\merge.py:63 in <listcomp>                                             │
│                                                                                                  │
│    60 │                                                                                          │
│    61 │   method = merge_methods.get(merge_config.merge_method)                                  │
│    62 │   model_arch_info = [                                                                    │
│ ❱  63 │   │   get_architecture_info(m.config()) for m in merge_config.referenced_models()        │
│    64 │   ]                                                                                      │
│    65 │   if not options.allow_crimes:                                                           │
│    66 │   │   if not all(a == model_arch_info[0] for a in model_arch_info[1:]):                  │
│                                                                                                  │
│ ╭────────────────────────────────────── locals ───────────────────────────────────────╮          │
│ │ .0 = <list_iterator object at 0x00000164F02B87F0>                                   │          │
│ │  m = ModelReference(path='cognitivecomputations/dolphin-2_6-phi-2', lora_path=None) │          │
│ ╰─────────────────────────────────────────────────────────────────────────────────────╯          │
│                                                                                                  │
│ D:\mergekit-phi-2\mergekit\common.py:75 in config                                                │
│                                                                                                  │
│    72 │   │   return ModelReference(path=out_path)                                               │
│    73 │                                                                                          │
│    74 │   def config(self) -> PretrainedConfig:                                                  │
│ ❱  75 │   │   return AutoConfig.from_pretrained(self.path)                                       │
│    76 │                                                                                          │
│    77 │   def tensor_index(self, cache_dir: Optional[str] = None) -> ShardedTensorIndex:         │
│    78 │   │   assert self.lora_path is None                                                      │
│                                                                                                  │
│ ╭─────────────────────────────────────── locals ────────────────────────────────────────╮        │
│ │ self = ModelReference(path='cognitivecomputations/dolphin-2_6-phi-2', lora_path=None) │        │
│ ╰───────────────────────────────────────────────────────────────────────────────────────╯        │
│                                                                                                  │
│ C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\aut │
│ o\configuration_auto.py:1085 in from_pretrained                                                  │
│                                                                                                  │
│   1082 │   │   config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_n  │
│   1083 │   │   has_remote_code = "auto_map" in config_dict and "AutoConfig" in config_dict["aut  │
│   1084 │   │   has_local_code = "model_type" in config_dict and config_dict["model_type"] in CO  │
│ ❱ 1085 │   │   trust_remote_code = resolve_trust_remote_code(                                    │
│   1086 │   │   │   trust_remote_code, pretrained_model_name_or_path, has_local_code, has_remote  │
│   1087 │   │   )                                                                                 │
│   1088                                                                                           │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │                           cls = <class                                                       │ │
│ │                                 'transformers.models.auto.configuration_auto.AutoConfig'>    │ │
│ │                 code_revision = None                                                         │ │
│ │                   config_dict = {                                                            │ │
│ │                                 │   '_name_or_path': 'microsoft/phi-2',                      │ │
│ │                                 │   'activation_function': 'gelu_new',                       │ │
│ │                                 │   'architectures': ['PhiForCausalLM'],                     │ │
│ │                                 │   'attn_pdrop': 0.0,                                       │ │
│ │                                 │   'auto_map': {                                            │ │
│ │                                 │   │   'AutoConfig':                                        │ │
│ │                                 'cognitivecomputations/dolphin-2_6-phi-2--configuration_phi… │ │
│ │                                 │   │   'AutoModelForCausalLM':                              │ │
│ │                                 'cognitivecomputations/dolphin-2_6-phi-2--modeling_phi.PhiF… │ │
│ │                                 │   },                                                       │ │
│ │                                 │   'embd_pdrop': 0.0,                                       │ │
│ │                                 │   'flash_attn': False,                                     │ │
│ │                                 │   'flash_rotary': False,                                   │ │
│ │                                 │   'fused_dense': False,                                    │ │
│ │                                 │   'img_processor': None,                                   │ │
│ │                                 │   ... +17                                                  │ │
│ │                                 }                                                            │ │
│ │                has_local_code = False                                                        │ │
│ │               has_remote_code = True                                                         │ │
│ │                        kwargs = {                                                            │ │
│ │                                 │   '_from_auto': True,                                      │ │
│ │                                 │   'name_or_path':                                          │ │
│ │                                 'cognitivecomputations/dolphin-2_6-phi-2'                    │ │
│ │                                 }                                                            │ │
│ │ pretrained_model_name_or_path = 'cognitivecomputations/dolphin-2_6-phi-2'                    │ │
│ │             trust_remote_code = None                                                         │ │
│ │                 unused_kwargs = {'name_or_path': 'cognitivecomputations/dolphin-2_6-phi-2'}  │ │
│ │                use_auth_token = None                                                         │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\dynamic_mo │
│ dule_utils.py:611 in resolve_trust_remote_code                                                   │
│                                                                                                  │
│   608 │   │   │   │   signal.alarm(0)                                                            │
│   609 │   │   │   except Exception:                                                              │
│   610 │   │   │   │   # OS which does not support signal.SIGALRM                                 │
│ ❱ 611 │   │   │   │   raise ValueError(                                                          │
│   612 │   │   │   │   │   f"The repository for {model_name} contains custom code which must be   │
│   613 │   │   │   │   │   f"load the model. You can inspect the repository content at https://   │
│   614 │   │   │   │   │   f"Please pass the argument `trust_remote_code=True` to allow custom    │
│                                                                                                  │
│ ╭─────────────────────────── locals ────────────────────────────╮                                │
│ │    has_local_code = False                                     │                                │
│ │   has_remote_code = True                                      │                                │
│ │        model_name = 'cognitivecomputations/dolphin-2_6-phi-2' │                                │
│ │ trust_remote_code = None                                      │                                │
│ ╰───────────────────────────────────────────────────────────────╯                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: The repository for cognitivecomputations/dolphin-2_6-phi-2 contains custom code which must be executed to
correctly load the model. You can inspect the repository content at
https://hf.co/cognitivecomputations/dolphin-2_6-phi-2.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
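
The frames above suggest why this surfaces as a hard error on Windows: with `trust_remote_code = None`, transformers tries to ask interactively and guards the prompt with a `signal.SIGALRM` timeout, which Windows does not provide, so it falls through to the `ValueError`. Below is a minimal standard-library sketch of that decision, not the transformers implementation. The names `can_prompt_with_timeout` and `resolve_trust` are hypothetical and exist only for illustration.

```python
import signal
from typing import Optional


def can_prompt_with_timeout() -> bool:
    """True when the platform exposes SIGALRM (POSIX); False on Windows."""
    return hasattr(signal, "SIGALRM")


def resolve_trust(
    trust_remote_code: Optional[bool],
    has_remote_code: bool,
    has_local_code: bool,
    prompt_available: bool,
) -> bool:
    """Simplified illustration of the decision visible in the traceback.

    Hypothetical helper: it mirrors the shape of the logic, not the
    exact library code.
    """
    if trust_remote_code is None:
        if has_local_code:
            # A built-in implementation exists, so remote code is not needed.
            trust_remote_code = False
        elif has_remote_code and not prompt_available:
            # No SIGALRM means no timed prompt; the caller must decide explicitly.
            raise ValueError(
                "Repository contains custom code; pass trust_remote_code=True "
                "explicitly to allow it."
            )
        elif has_remote_code:
            # The real library would ask the user here; omitted in this sketch.
            raise NotImplementedError("interactive prompt omitted in this sketch")
    if has_remote_code and not has_local_code and not trust_remote_code:
        # Custom code is required but was not approved.
        raise ValueError("Repository contains custom code that was not trusted.")
    return bool(trust_remote_code)


if __name__ == "__main__":
    print("timed prompt possible:", can_prompt_with_timeout())
    # Values taken from the locals in the traceback, with an explicit opt-in.
    print(resolve_trust(True, has_remote_code=True, has_local_code=False,
                        prompt_available=can_prompt_with_timeout()))
```

When calling transformers directly, the documented opt-in is `AutoConfig.from_pretrained(path, trust_remote_code=True)`. Note that the `common.py:75` frame shown above calls `AutoConfig.from_pretrained(self.path)` with no such argument, so whether a mergekit option reaches this call may depend on the mergekit version in use.
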
PS D:\mergekit-phi-2> mergekit-yaml 11b.yml ./output-model-directory --cuda
C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\dynamic_mo │
│ dule_utils.py:595 in resolve_trust_remote_code                                                   │
│                                                                                                  │
│   592 │   │   │   trust_remote_code = False                                                      │
│   593 │   │   elif has_remote_code and TIME_OUT_REMOTE_CODE > 0:                                 │
│   594 │   │   │   try:                                                                           │
│ ❱ 595 │   │   │   │   signal.signal(signal.SIGALRM, _raise_timeout_error)                        │
│   596 │   │   │   │   signal.alarm(TIME_OUT_REMOTE_CODE)                                         │
│   597 │   │   │   │   while trust_remote_code is None:                                           │
│   598 │   │   │   │   │   answer = input(                                                        │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │    has_local_code = False                                                                    │ │
│ │   has_remote_code = True                                                                     │ │
│ │        model_name = 'D:\\oobabooga_windows\\text-generation-webui\\models\\cognitivecomputa… │ │
│ │ trust_remote_code = None                                                                     │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: module 'signal' has no attribute 'SIGALRM'

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ D:\mergekit-phi-2\mergekit\scripts\run_yaml.py:85 in main                                        │
│                                                                                                  │
│    82 │   │   data = yaml.load(file, yaml.SafeLoader)                                            │
│    83 │                                                                                          │
│    84 │   merge_config: MergeConfiguration = MergeConfiguration.model_validate(data)             │
│ ❱  85 │   run_merge(                                                                             │
│    86 │   │   merge_config,                                                                      │
│    87 │   │   out_path,                                                                          │
│    88 │   │   options=MergeOptions(                                                              │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │       allow_crimes = False                                                                   │ │
│ │      clone_tensors = False                                                                   │ │
│ │        config_file = '11b.yml'                                                               │ │
│ │     copy_tokenizer = True                                                                    │ │
│ │               cuda = True                                                                    │ │
│ │               data = {                                                                       │ │
│ │                      │   'slices': [                                                         │ │
│ │                      │   │   {                                                               │ │
│ │                      │   │   │   'sources': [                                                │ │
│ │                      │   │   │   │   {                                                       │ │
│ │                      │   │   │   │   │   'model':                                            │ │
│ │                      'D:\\oobabooga_windows\\text-generation-webui\\models\\cognitivecomput… │ │
│ │                      │   │   │   │   │   'layer_range': [0, 18]                              │ │
│ │                      │   │   │   │   }                                                       │ │
│ │                      │   │   │   ]                                                           │ │
│ │                      │   │   },                                                              │ │
│ │                      │   │   {                                                               │ │
│ │                      │   │   │   'sources': [                                                │ │
│ │                      │   │   │   │   {                                                       │ │
│ │                      │   │   │   │   │   'model':                                            │ │
│ │                      'D:\\oobabooga_windows\\text-generation-webui\\models\\cognitivecomput… │ │
│ │                      │   │   │   │   │   'layer_range': [0, 32]                              │ │
│ │                      │   │   │   │   }                                                       │ │
│ │                      │   │   │   ]                                                           │ │
│ │                      │   │   },                                                              │ │
│ │                      │   │   {                                                               │ │
│ │                      │   │   │   'sources': [                                                │ │
│ │                      │   │   │   │   {                                                       │ │
│ │                      │   │   │   │   │   'model':                                            │ │
│ │                      'D:\\oobabooga_windows\\text-generation-webui\\models\\cognitivecomput… │ │
│ │                      │   │   │   │   │   'layer_range': [14, 32]                             │ │
│ │                      │   │   │   │   }                                                       │ │
│ │                      │   │   │   ]                                                           │ │
│ │                      │   │   }                                                               │ │
│ │                      │   ],                                                                  │ │
│ │                      │   'merge_method': 'passthrough',                                      │ │
│ │                      │   'dtype': 'bfloat16'                                                 │ │
│ │                      }                                                                       │ │
│ │               file = <_io.TextIOWrapper name='11b.yml' mode='r' encoding='utf-8'>            │ │
│ │      lazy_unpickle = False                                                                   │ │
│ │   lora_merge_cache = None                                                                    │ │
│ │     low_cpu_memory = False                                                                   │ │
│ │       merge_config = MergeConfiguration(                                                     │ │
│ │                      │   merge_method='passthrough',                                         │ │
│ │                      │   slices=[                                                            │ │
│ │                      │   │   OutputSliceDefinition(                                          │ │
│ │                      │   │   │   sources=[                                                   │ │
│ │                      │   │   │   │   InputSliceDefinition(                                   │ │
│ │                      │   │   │   │   │                                                       │ │
│ │                      model='D:\\oobabooga_windows\\text-generation-webui\\models\\cognitive… │ │
│ │                      │   │   │   │   │   layer_range=(0, 18),                                │ │
│ │                      │   │   │   │   │   parameters=None                                     │ │
│ │                      │   │   │   │   )                                                       │ │
│ │                      │   │   │   ],                                                          │ │
│ │                      │   │   │   base_model=None,                                            │ │
│ │                      │   │   │   residual_weight=None,                                       │ │
│ │                      │   │   │   parameters=None                                             │ │
│ │                      │   │   ),                                                              │ │
│ │                      │   │   OutputSliceDefinition(                                          │ │
│ │                      │   │   │   sources=[                                                   │ │
│ │                      │   │   │   │   InputSliceDefinition(                                   │ │
│ │                      │   │   │   │   │                                                       │ │
│ │                      model='D:\\oobabooga_windows\\text-generation-webui\\models\\cognitive… │ │
│ │                      │   │   │   │   │   layer_range=(0, 32),                                │ │
│ │                      │   │   │   │   │   parameters=None                                     │ │
│ │                      │   │   │   │   )                                                       │ │
│ │                      │   │   │   ],                                                          │ │
│ │                      │   │   │   base_model=None,                                            │ │
│ │                      │   │   │   residual_weight=None,                                       │ │
│ │                      │   │   │   parameters=None                                             │ │
│ │                      │   │   ),                                                              │ │
│ │                      │   │   OutputSliceDefinition(                                          │ │
│ │                      │   │   │   sources=[                                                   │ │
│ │                      │   │   │   │   InputSliceDefinition(                                   │ │
│ │                      │   │   │   │   │                                                       │ │
│ │                      model='D:\\oobabooga_windows\\text-generation-webui\\models\\cognitive… │ │
│ │                      │   │   │   │   │   layer_range=(14, 32),                               │ │
│ │                      │   │   │   │   │   parameters=None                                     │ │
│ │                      │   │   │   │   )                                                       │ │
│ │                      │   │   │   ],                                                          │ │
│ │                      │   │   │   base_model=None,                                            │ │
│ │                      │   │   │   residual_weight=None,                                       │ │
│ │                      │   │   │   parameters=None                                             │ │
│ │                      │   │   )                                                               │ │
│ │                      │   ],                                                                  │ │
│ │                      │   models=None,                                                        │ │
│ │                      │   input_model_parameters=None,                                        │ │
│ │                      │   parameters=None,                                                    │ │
│ │                      │   base_model=None,                                                    │ │
│ │                      │   dtype='bfloat16',                                                   │ │
│ │                      │   tokenizer_source=None                                               │ │
│ │                      )                                                                       │ │
│ │           out_path = './output-model-directory'                                              │ │
│ │     out_shard_size = 5000000000                                                              │ │
│ │ transformers_cache = None                                                                    │ │
│ │  trust_remote_code = False                                                                   │ │
│ │            verbose = False                                                                   │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ D:\mergekit-phi-2\mergekit\merge.py:62 in run_merge                                              │
│                                                                                                  │
│    59 │   │   raise RuntimeError("No output requested")                                          │
│    60 │                                                                                          │
│    61 │   method = merge_methods.get(merge_config.merge_method)                                  │
│ ❱  62 │   model_arch_info = [                                                                    │
│    63 │   │   get_architecture_info(m.config()) for m in merge_config.referenced_models()        │
│    64 │   ]                                                                                      │
│    65 │   if not options.allow_crimes:                                                           │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │        dtype = torch.bfloat16                                                                │ │
│ │ merge_config = MergeConfiguration(                                                           │ │
│ │                │   merge_method='passthrough',                                               │ │
│ │                │   slices=[                                                                  │ │
│ │                │   │   OutputSliceDefinition(                                                │ │
│ │                │   │   │   sources=[                                                         │ │
│ │                │   │   │   │   InputSliceDefinition(                                         │ │
│ │                │   │   │   │   │                                                             │ │
│ │                model='D:\\oobabooga_windows\\text-generation-webui\\models\\cognitivecomput… │ │
│ │                │   │   │   │   │   layer_range=(0, 18),                                      │ │
│ │                │   │   │   │   │   parameters=None                                           │ │
│ │                │   │   │   │   )                                                             │ │
│ │                │   │   │   ],                                                                │ │
│ │                │   │   │   base_model=None,                                                  │ │
│ │                │   │   │   residual_weight=None,                                             │ │
│ │                │   │   │   parameters=None                                                   │ │
│ │                │   │   ),                                                                    │ │
│ │                │   │   OutputSliceDefinition(                                                │ │
│ │                │   │   │   sources=[                                                         │ │
│ │                │   │   │   │   InputSliceDefinition(                                         │ │
│ │                │   │   │   │   │                                                             │ │
│ │                model='D:\\oobabooga_windows\\text-generation-webui\\models\\cognitivecomput… │ │
│ │                │   │   │   │   │   layer_range=(0, 32),                                      │ │
│ │                │   │   │   │   │   parameters=None                                           │ │
│ │                │   │   │   │   )                                                             │ │
│ │                │   │   │   ],                                                                │ │
│ │                │   │   │   base_model=None,                                                  │ │
│ │                │   │   │   residual_weight=None,                                             │ │
│ │                │   │   │   parameters=None                                                   │ │
│ │                │   │   ),                                                                    │ │
│ │                │   │   OutputSliceDefinition(                                                │ │
│ │                │   │   │   sources=[                                                         │ │
│ │                │   │   │   │   InputSliceDefinition(                                         │ │
│ │                │   │   │   │   │                                                             │ │
│ │                model='D:\\oobabooga_windows\\text-generation-webui\\models\\cognitivecomput… │ │
│ │                │   │   │   │   │   layer_range=(14, 32),                                     │ │
│ │                │   │   │   │   │   parameters=None                                           │ │
│ │                │   │   │   │   )                                                             │ │
│ │                │   │   │   ],                                                                │ │
│ │                │   │   │   base_model=None,                                                  │ │
│ │                │   │   │   residual_weight=None,                                             │ │
│ │                │   │   │   parameters=None                                                   │ │
│ │                │   │   )                                                                     │ │
│ │                │   ],                                                                        │ │
│ │                │   models=None,                                                              │ │
│ │                │   input_model_parameters=None,                                              │ │
│ │                │   parameters=None,                                                          │ │
│ │                │   base_model=None,                                                          │ │
│ │                │   dtype='bfloat16',                                                         │ │
│ │                │   tokenizer_source=None                                                     │ │
│ │                )                                                                             │ │
│ │       method = <mergekit.merge_methods.passthrough.PassthroughMerge object at                │ │
│ │                0x000002C3AD353950>                                                           │ │
│ │      options = MergeOptions(                                                                 │ │
│ │                │   allow_crimes=False,                                                       │ │
│ │                │   transformers_cache=None,                                                  │ │
│ │                │   lora_merge_cache=None,                                                    │ │
│ │                │   cuda=True,                                                                │ │
│ │                │   low_cpu_memory=False,                                                     │ │
│ │                │   out_shard_size=5000000000,                                                │ │
│ │                │   copy_tokenizer=True,                                                      │ │
│ │                │   clone_tensors=False,                                                      │ │
│ │                │   trust_remote_code=False,                                                  │ │
│ │                │   random_seed=None,                                                         │ │
│ │                │   lazy_unpickle=False                                                       │ │
│ │                )                                                                             │ │
│ │     out_path = './output-model-directory'                                                    │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ D:\mergekit-phi-2\mergekit\merge.py:63 in <listcomp>                                             │
│                                                                                                  │
│    60 │                                                                                          │
│    61 │   method = merge_methods.get(merge_config.merge_method)                                  │
│    62 │   model_arch_info = [                                                                    │
│ ❱  63 │   │   get_architecture_info(m.config()) for m in merge_config.referenced_models()        │
│    64 │   ]                                                                                      │
│    65 │   if not options.allow_crimes:                                                           │
│    66 │   │   if not all(a == model_arch_info[0] for a in model_arch_info[1:]):                  │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ .0 = <list_iterator object at 0x000002C3AE9F87F0>                                            │ │
│ │  m = ModelReference(                                                                         │ │
│ │      │                                                                                       │ │
│ │      path='D:\\oobabooga_windows\\text-generation-webui\\models\\cognitivecomputations_dolp… │ │
│ │      │   lora_path=None                                                                      │ │
│ │      )                                                                                       │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ D:\mergekit-phi-2\mergekit\common.py:75 in config                                                │
│                                                                                                  │
│    72 │   │   return ModelReference(path=out_path)                                               │
│    73 │                                                                                          │
│    74 │   def config(self) -> PretrainedConfig:                                                  │
│ ❱  75 │   │   return AutoConfig.from_pretrained(self.path)                                       │
│    76 │                                                                                          │
│    77 │   def tensor_index(self, cache_dir: Optional[str] = None) -> ShardedTensorIndex:         │
│    78 │   │   assert self.lora_path is None                                                      │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ self = ModelReference(                                                                       │ │
│ │        │                                                                                     │ │
│ │        path='D:\\oobabooga_windows\\text-generation-webui\\models\\cognitivecomputations_do… │ │
│ │        │   lora_path=None                                                                    │ │
│ │        )                                                                                     │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\aut │
│ o\configuration_auto.py:1085 in from_pretrained                                                  │
│                                                                                                  │
│   1082 │   │   config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_n  │
│   1083 │   │   has_remote_code = "auto_map" in config_dict and "AutoConfig" in config_dict["aut  │
│   1084 │   │   has_local_code = "model_type" in config_dict and config_dict["model_type"] in CO  │
│ ❱ 1085 │   │   trust_remote_code = resolve_trust_remote_code(                                    │
│   1086 │   │   │   trust_remote_code, pretrained_model_name_or_path, has_local_code, has_remote  │
│   1087 │   │   )                                                                                 │
│   1088                                                                                           │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │                           cls = <class                                                       │ │
│ │                                 'transformers.models.auto.configuration_auto.AutoConfig'>    │ │
│ │                 code_revision = None                                                         │ │
│ │                   config_dict = {                                                            │ │
│ │                                 │   '_name_or_path': 'microsoft/phi-2',                      │ │
│ │                                 │   'activation_function': 'gelu_new',                       │ │
│ │                                 │   'architectures': ['PhiForCausalLM'],                     │ │
│ │                                 │   'attn_pdrop': 0.0,                                       │ │
│ │                                 │   'auto_map': {                                            │ │
│ │                                 │   │   'AutoConfig': 'configuration_phi.PhiConfig',         │ │
│ │                                 │   │   'AutoModelForCausalLM':                              │ │
│ │                                 'modeling_phi.PhiForCausalLM'                                │ │
│ │                                 │   },                                                       │ │
│ │                                 │   'embd_pdrop': 0.0,                                       │ │
│ │                                 │   'flash_attn': False,                                     │ │
│ │                                 │   'flash_rotary': False,                                   │ │
│ │                                 │   'fused_dense': False,                                    │ │
│ │                                 │   'img_processor': None,                                   │ │
│ │                                 │   ... +17                                                  │ │
│ │                                 }                                                            │ │
│ │                has_local_code = False                                                        │ │
│ │               has_remote_code = True                                                         │ │
│ │                        kwargs = {                                                            │ │
│ │                                 │   '_from_auto': True,                                      │ │
│ │                                 │   'name_or_path':                                          │ │
│ │                                 'D:\\oobabooga_windows\\text-generation-webui\\models\\cogn… │ │
│ │                                 }                                                            │ │
│ │ pretrained_model_name_or_path = 'D:\\oobabooga_windows\\text-generation-webui\\models\\cogn… │ │
│ │             trust_remote_code = None                                                         │ │
│ │                 unused_kwargs = {                                                            │ │
│ │                                 │   'name_or_path':                                          │ │
│ │                                 'D:\\oobabooga_windows\\text-generation-webui\\models\\cogn… │ │
│ │                                 }                                                            │ │
│ │                use_auth_token = None                                                         │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\dynamic_mo │
│ dule_utils.py:611 in resolve_trust_remote_code                                                   │
│                                                                                                  │
│   608 │   │   │   │   signal.alarm(0)                                                            │
│   609 │   │   │   except Exception:                                                              │
│   610 │   │   │   │   # OS which does not support signal.SIGALRM                                 │
│ ❱ 611 │   │   │   │   raise ValueError(                                                          │
│   612 │   │   │   │   │   f"The repository for {model_name} contains custom code which must be   │
│   613 │   │   │   │   │   f"load the model. You can inspect the repository content at https://   │
│   614 │   │   │   │   │   f"Please pass the argument `trust_remote_code=True` to allow custom    │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │    has_local_code = False                                                                    │ │
│ │   has_remote_code = True                                                                     │ │
│ │        model_name = 'D:\\oobabooga_windows\\text-generation-webui\\models\\cognitivecomputa… │ │
│ │ trust_remote_code = None                                                                     │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: The repository for D:\oobabooga_windows\text-generation-webui\models\cognitivecomputations_dolphin-2_6-phi-2
contains custom code which must be executed to correctly load the model. You can inspect the repository content at
https://hf.co/D:\oobabooga_windows\text-generation-webui\models\cognitivecomputations_dolphin-2_6-phi-2.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
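
The `locals` frames in the traceback show why transformers raises here: `has_remote_code` is True because `config.json` has an `auto_map` entry for `AutoConfig`, `has_local_code` is False, and `trust_remote_code` is None. A minimal standard-library sketch of that decision follows. The function name `needs_trust_remote_code` and the `known_model_types` argument are hypothetical stand-ins; the real check uses the library's internal model registry.

```python
def needs_trust_remote_code(config_dict, known_model_types=frozenset()):
    """Return True when loading this config would require custom code.

    Mirrors the two flags visible in the traceback. `known_model_types`
    is a hypothetical stand-in for the library's built-in model registry.
    """
    auto_map = config_dict.get("auto_map", {})
    has_remote_code = "AutoConfig" in auto_map
    has_local_code = config_dict.get("model_type") in known_model_types
    return has_remote_code and not has_local_code


# Values copied from the config_dict shown in the traceback.
phi2_config = {
    "_name_or_path": "microsoft/phi-2",
    "architectures": ["PhiForCausalLM"],
    "auto_map": {
        "AutoConfig": "configuration_phi.PhiConfig",
        "AutoModelForCausalLM": "modeling_phi.PhiForCausalLM",
    },
}

print(needs_trust_remote_code(phi2_config))
```

The options dump in the same traceback shows `trust_remote_code=False`, so the `--trust-remote-code` flag that appears in a later report on this page is the option to try first.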

No rule to produce...

Hello,

I'm using the notebook.ipynb for starters with some slight modifications.

I've tried the two YAML configs below and get two different, though similar, errors. allow_crimes=True is set, and the models seem to download just fine.

First: passthrough

slices:
  - sources:
    - model: stabilityai/stablelm-zephyr-3b
      layer_range: [0, 16]
  - sources:
    - model: stabilityai/stablecode-instruct-alpha-3b
      layer_range: [16, 32]
merge_method: passthrough
dtype: float16

When called like this

mergekit.scripts.run_yaml.main(
    CONFIG_YML,
    OUTPUT_PATH,
    LORA_MERGE_CACHE,
    cuda=torch.cuda.is_available(),
    copy_tokenizer=COPY_TOKENIZER,
    lazy_unpickle=LAZY_UNPICKLE,
    low_cpu_memory=LOW_CPU_MEMORY,
    trust_remote_code=True,
    allow_crimes=True
)

This yields an error
No rule to produce stabilityai/stablecode-instruct-alpha-3b:model.layers.16.input_layernorm.weight

or when run like this
!mergekit-yaml /content/passthrough.yml /content/ --cuda --allow-crimes --trust-remote-code
I receive this error
RuntimeError: No rule to produce stabilityai/stablelm-zephyr-3b:gpt_neox.embed_in.weight

Second: linear

While this setup

models:
  - model: stabilityai/stablelm-zephyr-3b
    parameters:
      weight: 1.0
      normalize: true
  - model: stabilityai/stablecode-instruct-alpha-3b
    parameters:
      weight: 0.3
      normalize: true
merge_method: linear
dtype: float16

Called via the same python code above as well as the CLI method, it yields this error
RuntimeError: No rule to produce stabilityai/stablecode-instruct-alpha-3b:model.embed_tokens.weight

Additional Notes

There was another error, but I forgot the YAML file used to create it.
No rule to produce stabilityai/stablelm-zephyr-3b:gpt_neox.embed_in.weight

The trust_remote_code=True argument and the --trust-remote-code flag don't appear to do anything; I'm still prompted y/N just the same as without them.

Thanks for any help!

Why do two different options generate models of different sizes?

Option 1

slices:
  - sources:
    - model: AIDC-ai-business/Marcoroni-7B-v3
      layer_range: [0, 24]
  - sources:
    - model: Toten5/Marcoroni-neural-chat-7B-v2
      layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16

Option 2

slices:
  - sources:
      - model: AIDC-ai-business/Marcoroni-7B-v3
        layer_range: [0, 24]
      - model: Toten5/Marcoroni-neural-chat-7B-v2
        layer_range: [8, 32]
merge_method: slerp
base_model: AIDC-ai-business/Marcoroni-7B-v3
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
      
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 
   
dtype: float16

I generated two models using different config options.

The first one produced a 10.7B model, while the second one produced a 5.5B model.

I used the same base models and layer ranges, but the results differ.

Could anyone explain why these results differ?

Also, for a slerp merge, are there other options for the parameters, especially for filter (mlp, self_attn, or others)?

Thanks!
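
The reported sizes are consistent with the number of layers each config produces. In Option 1 each slice has a single source, so the two 24-layer slices appear to be stacked into 48 layers. In Option 2 both sources sit in the same slice, so they are merged into one 24-layer stack. The rough count below assumes both models have the Mistral-7B shape (hidden size 4096, MLP size 14336, key/value projection width 1024, 32000-token vocabulary). Those dimensions are an assumption and are not stated in the post.

```python
def mistral_like_params(num_layers, hidden=4096, mlp=14336,
                        kv_dim=1024, vocab=32000):
    """Approximate parameter count for a Mistral-7B-shaped decoder."""
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o, k, v
    feed_forward = 3 * hidden * mlp                        # gate, up, down
    norms = 2 * hidden                                     # two norm weights
    per_layer = attention + feed_forward + norms
    embeddings = 2 * vocab * hidden + hidden  # embed_tokens, lm_head, final norm
    return num_layers * per_layer + embeddings


for label, layers in [("original", 32), ("option 1", 48), ("option 2", 24)]:
    print(f"{label}: {layers} layers, {mistral_like_params(layers) / 1e9:.2f}B")
```

Under these assumptions, 48 layers come to about 10.7B parameters and 24 layers to about 5.5B, which matches the two reported sizes.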

UnboundLocalError: local variable 'base_cfg_mistral' referenced before assignment

Hi,
I'm getting UnboundLocalError: local variable 'base_cfg_mistral' referenced before assignment when using Mixtral MoE.

I resolved this issue by replacing:

def build(
    config: MistralMOEConfig,
    out_path: str,
    load_in_4bit: bool = False,
    load_in_8bit: bool = False,
    lazy_unpickle: bool = False,
):
    base_model = ModelReference.parse(config.base_model)
    base_cfg = base_model.config()
    if not isinstance(base_cfg, MistralConfig):
        base_cfg_mistral.sliding_window = base_cfg.max_position_embeddings
        base_cfg_mistral.max_position_embeddings = base_cfg.max_position_embeddings
        base_cfg = base_cfg_mistral
        base_cfg_mistral = MistralConfig(**base_cfg.to_dict())

with

def build(
    config: MistralMOEConfig,
    out_path: str,
    load_in_4bit: bool = False,
    load_in_8bit: bool = False,
    lazy_unpickle: bool = False,
):
    base_model = ModelReference.parse(config.base_model)
    base_cfg = base_model.config()
    if not isinstance(base_cfg, MistralConfig):
        base_cfg_mistral = MistralConfig(**base_cfg.to_dict())
        base_cfg_mistral.sliding_window = base_cfg.max_position_embeddings
        base_cfg_mistral.max_position_embeddings = base_cfg.max_position_embeddings
        base_cfg = base_cfg_mistral

I'm not sure if I'm doing this correctly, will this still provide the correct results? (I'm merging 2 OpenLlama 3B models)
Thank you!
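
The error itself is ordinary Python scoping: assigning to a name anywhere in a function makes it local to that function, so reading it before the assignment fails. Creating the object first, as in the reordered version above, is the usual fix. A small standalone sketch of the same pattern follows. The names `build_broken` and `build_fixed` are hypothetical, and this says nothing about whether the resulting merge is correct for OpenLlama.

```python
def build_broken(base):
    if not isinstance(base, dict):
        converted["source"] = base      # used before it exists
        converted = {"kind": "dict"}    # this assignment makes the name local
        base = converted
    return base


def build_fixed(base):
    if not isinstance(base, dict):
        converted = {"kind": "dict"}    # create the object first
        converted["source"] = base      # then modify it
        base = converted
    return base


try:
    build_broken("raw-config")
except UnboundLocalError as exc:
    print("broken:", type(exc).__name__)

print("fixed:", build_fixed("raw-config"))
```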

Merging Two Models of Different Architectures

Hi,
Might it be possible to merge Mistral and Llama?
Thank you!

Mistral

Hi,
Is Mistral supported? When I'm trying to merge two Mistral models, I get KeyError: 'mistral'

Can Models with Different vocab_size be Merged?

Great job on this toolkit.

I'm attempting to merge two models with differing vocab_size: augmxnt/shisa-7b-v1 (base) and teknium/OpenHermes-2.5-Mistral-7B. The augmxnt/shisa-7b-v1 model has an expanded vocab_size. However, after merging them using dare_ties, the output of the entire model becomes garbled. Could this be related to my setting of tokenizer_source to union? If I don't set it, I encounter an error:

RuntimeError: The size of tensor a (32000) must match the size of tensor b (120128) at non-singleton dimension 0

Is there a way to successfully merge these models despite their different vocab_size?

Thanks for your help!

Data point for Dare Ties

I uploaded 3 different merges, the same in every way except for density, to HF, and interestingly the higher-density merges perform significantly better:

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-HighDensity
https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties-ExtremeDensity
https://huggingface.co/brucethemoose/CaPlatTessDolXaBoros-Yi-34B-200K-DARE-Ties

Even the "extreme" density config scores far higher than a modest density config. Its perplexity is much lower as well:

Very high density:

models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    # no parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-34B-v1.4
    parameters:
      weight: 0.19
      density: 0.83
  - model: /home/alpha//Storage/Models/Raw/bhenrym14_airoboros-3_1-yi-34b-200k
    parameters:
      weight: 0.14
      density: 0.6
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.19
      density: 0.83
  - model: /home/alpha/Storage/Models/Raw/kyujinpy_PlatYi-34B-200K-Q
    parameters:
      weight: 0.14
      density: 0.6
  - model: /home/alpha/FastModels/ehartford_dolphin-2.2-yi-34b-200k
    parameters:
      weight: 0.19
      density: 0.83
  - model: /home/alpha/FastModels/fblgit_una-xaberius-34b-v1beta
    parameters:
      weight: 0.15
      density: 0.08
merge_method: dare_ties
base_model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
parameters:

  int8_mask: true
dtype: bfloat16

"Normal" density:

models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    # no parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-34B-v1.4
    parameters:
      weight: 0.19
      density: 0.44
  - model: /home/alpha//Storage/Models/Raw/bhenrym14_airoboros-3_1-yi-34b-200k
    parameters:
      weight: 0.14
      density: 0.34
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.19
      density: 0.44
  - model: /home/alpha/Storage/Models/Raw/kyujinpy_PlatYi-34B-200K-Q
    parameters:
      weight: 0.14
      density: 0.34
  - model: /home/alpha/FastModels/ehartford_dolphin-2.2-yi-34b-200k
    parameters:
      weight: 0.19
      density: 0.44
  - model: /home/alpha/FastModels/fblgit_una-xaberius-34b-v1beta
    parameters:
      weight: 0.15
      density: 0.08
merge_method: dare_ties
base_model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
parameters:

  int8_mask: true
dtype: bfloat16

Is it supported to merge multiple models into a Mixtral model?

Is it possible to merge two llama2-70B chat models into one 140B model and achieve better performance?

Can you conduct TIES merging only on the embedding weights of two models?

Add a push_to_hub method

Very nice project! I'm the Machine Learning Librarian at Hugging Face. We're seeing quite a few merged models produced via MergeKit being uploaded to the Hugging Face Hub. I wanted to ask two questions in response to this:

Have you considered adding a push_to_hub command to MergeKit? Unless I missed something, this is not directly included at the moment. We could also add a mergekit tag when uploading the model to the Hub, which would allow you to easily find models created via MergeKit.
Related to this, it could be nice to use the YAML config to directly insert some metadata into the model card on the Hub. Some libraries have started adding some very nice auto-generated model cards (see SpanMarker as an example). This could be fairly simple but could be a nice way to make it easier for people to discover and understand merged models shared to the Hub.

Happy to help with these if there's interest from your side!

cc @Wauplin @osanseviero

Memory question

Maybe it's a code comment somewhere and I missed it, but is there a general guide for how much memory is required for merging models? Is it just the total size of all the pytorch.bin files from all the models that we want to merge, or is there some other quick 'napkin math' that we could use to estimate approximately how much would be required?

Also, which has more of an impact on merging: regular RAM or VRAM?

Support merging stablelm-3b-4e1t models

StableLMEpochForCausalLM

Question: what format do the models need to be in for merging?

I must have missed something, but for the life of me I cannot find any information about the format of the models that mergekit expects.

Are they pytorch, safetensors? Can it use GGUF models because I see syntax that resembles huggingface-cli referencing TheBloke?

How to merge models of two different sizes and architectures?

Hello!

I'm trying to merge a 7B mistral with a 13B Llama-2 model. In my mind I'd like to essentially keep the 7B model the same and just expand it to 13B size by grafting on the 13B if that makes sense. Sorry, I'm new to this.

I know the results won't be ideal probably, it's just for personal experimentation. Is this possible? If so, could you point me in the right direction?

I've tried your frankenllama_22.py script and I got an error about expanding tensors too much or something along those lines. Again, sorry, I'm very new to this. Any help is appreciated.

Eval

Has anyone merged and evaluated?

Runtime Error, please help

Please help me, I'm a beginner trying to merge two 7b models using gradient slerp, and I encountered this error.
RuntimeError: No rule to produce (model name):model.layers.32.input_layernorm.weight

PLEASE add the --cache-dir back!

why did you even remove that?

Sane defaults for dare_ties merging?

Are weights that add up to ~1.2 a sane target? And what's a sane value for the Bernoulli density thing?
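
For context on the density parameter: in DARE as described by its authors, each delta (fine-tuned weight minus base weight) is kept with probability `density`, and the surviving deltas are rescaled by `1 / density` so that the expected delta is unchanged. A minimal pure-Python sketch of that step follows. The name `dare_prune` is hypothetical and this is not mergekit's implementation.

```python
import random


def dare_prune(deltas, density, seed=None):
    """Randomly drop deltas, then rescale the survivors by 1 / density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    rng = random.Random(seed)
    # Each delta is kept independently with probability `density`.
    return [d / density if rng.random() < density else 0.0 for d in deltas]


deltas = [0.02, -0.01, 0.03, 0.00, -0.04]
print(dare_prune(deltas, density=0.5, seed=0))  # some deltas zeroed, the rest doubled
print(dare_prune(deltas, density=1.0))          # nothing dropped
```

A lower density drops more of each model's changes. The sketch does not say which value is sane; that remains an empirical question.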

What are `filter` and `gradient` doing?

models:
  - model: TheBloke/Llama-2-13B-fp16
    # no parameters necessary for base model
  - model: psmathur/orca_mini_v3_13b
    parameters:
      density: [1, 0.7, 0.1] # density gradient
      weight: 1.0
  - model: garage-bAInd/Platypus2-13B
    parameters:
      density: 0.5
      weight: [0, 0.3, 0.7, 1] # weight gradient
  - model: WizardLM/WizardMath-13B-V1.0
    parameters:
      density: 0.33
      weight:
        - filter: mlp
          value: 0.5
        - value: 0
merge_method: ties
base_model: TheBloke/Llama-2-13B-fp16
parameters:
  normalize: true
  int8_mask: true
dtype: float16

Question 1
What is the gradient doing?
For example, density should be a float, but here it is a list, and the list is labeled a gradient.
What is a gradient? Does it generate a result for each value?

Question 2
What is filter doing?
Does it merge only the matching tensors, or does it exclude them?
And what is value? One value is preceded by '-' but the other is not. What is the difference between them?

please help~~~!!!
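
Going by the README's description of interpolated gradients, a list of numbers does not produce one result per value. A single merge is produced, and the parameter value varies smoothly with layer depth. A list of `filter`/`value` entries selects a value by tensor name, and an entry without `filter` acts as the fallback. The `-` is only YAML list syntax. The sketch below illustrates that reading. The function names, the piecewise-linear interpolation, and the substring match are assumptions and not mergekit's actual code.

```python
def gradient_value(anchors, t):
    """Piecewise-linear interpolation of `anchors` at relative depth t in [0, 1]."""
    if len(anchors) == 1:
        return float(anchors[0])
    scaled = t * (len(anchors) - 1)
    lower = int(scaled)
    upper = min(lower + 1, len(anchors) - 1)
    frac = scaled - lower
    return (1 - frac) * anchors[lower] + frac * anchors[upper]


def resolve(setting, tensor_name, t):
    """Resolve a scalar, a gradient list, or a list of filter/value entries."""
    if isinstance(setting, (int, float)):
        return float(setting)
    if setting and isinstance(setting[0], dict):
        for entry in setting:
            name_filter = entry.get("filter")
            # An entry without a filter matches every tensor (the fallback).
            if name_filter is None or name_filter in tensor_name:
                return resolve(entry["value"], tensor_name, t)
        raise KeyError(f"no entry matches {tensor_name}")
    return gradient_value(setting, t)


# The WizardMath weight from the config above: 0.5 for mlp tensors, 0 otherwise.
weight = [{"filter": "mlp", "value": 0.5}, {"value": 0}]
print(resolve(weight, "model.layers.3.mlp.up_proj.weight", 0.1))
print(resolve(weight, "model.layers.3.self_attn.q_proj.weight", 0.1))

# The orca_mini density gradient, evaluated halfway through the model.
print(resolve([1, 0.7, 0.1], "model.layers.20.mlp.up_proj.weight", 0.5))
```

Here `t` stands for a layer's relative depth, 0 for the first layer and 1 for the last.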

Can you please explain the weight and density parameters related to the TIES merging?

In the following YAML file, some parameters are given as lists of values. What do the list entries correspond to?
https://github.com/cg123/mergekit/blob/main/examples/ties.yml#L11

Here is a fix for mergekit-moe on Apple Silicon

Hi,

Sorry for not submitting / requesting this fix the proper way, I'm new to github.

However, I ran into and subsequently fixed issues that prevented me from creating an MoE model on Apple Silicon (Metal/MPS).

The fix boils down to the fact that bfloat16 operations are not supported in PyTorch for MPS (source). To fix this for myself, I made the following simple changes, which use the CPU rather than the GPU:

in file mixtral_moe.py near line 124:

device_map="auto",

must be changed to

device_map="cpu",

in file lazy_tensor_loader.py near line 101:

shard = torch.load(model_path)

must be changed to

shard = torch.load(
    model_path,
    map_location=torch.device('cpu')
)

Before applying these fixes, the Warm up loaders progress bar never moved past 20% for me. After the fixes I am able to create MoE models as expected.

I hope this can help!

Can I perform a task_arithmetic or TIES merge on two embedding matrices?

@cg123 Thanks a lot for the amazing work :)

I want to update the embedding matrix of the base model with a delta embedding matrix from another task-specific model.

I intend to focus solely on the embedding layer. Is there a specific method for doing this?

Are there any plans to support Encoder-Decoder models like T5?

Hi, thanks for sharing this awesome work! Are there any plans to support Encoder-Decoder models like T5? Are there any methods for merging such models?

Support for Microsoft Phi-2

Could you please extend mergekit to work with Phi-2? This would be great for the small language community.

Low memory mode?

Hi, is there any way to merge two 7B models on a free Colab (13GB memory)?

mergekit-moe seems to fail

It gets to 4/9 on the progress bar and then just exits. No tokenizer or anything is copied over; only the config.json and safetensors are in the output folder. What's going on?

Relevant literature for these methods

Hi, I was wondering if there are some relevant papers on the linear and SLERP merging methods. Thanks!

License

Hi,
Great project! Just curious why you switched from MIT to LGPL?
Thank you!

Is it possible to merge multiple models into a MOE model?

Is it possible to merge multiple models into a MOE model?
I think it is possible to reduce the VRAM consumption during merging by dynamically loading models in batches?

After merging, do I need to fine-tune the merged model?

Nice job!
I have a question: after merging, do I need to fine-tune the merged model?

DARE: Separate base models for delta measurement and merge?

I have a request. If I understand correctly, the DARE merge measures the deltas from the base model given in the YML to the other models in the merge, then uses those deltas to do the whole "Mario merge" thing.

However, I want to merge a model finetuned on vanilla Llama 2 into a model trained over sequelbox/DynamicFactor (a model intended as an improved base for further finetuning.)

Is it possible for you to add a feature that allows us to use one base model during the delta measurements (in my case vanilla L2) and then another model during the merge itself (in my case DynamicFactor)?

Question

Hi @cg123
Great job for this toolkit making miracles...
I got a question:
I naively tried to merge all FFN experts of mixtral by simply averaging weights of these 8 experts.
Output is not good.
Based on what you know with merging weights do you have any clue/advice about if this is possible in some ways?
Cheers.

Can we use algorithms to automatically optimize the merging of the weights and layers of the model along the most efficient path?

Missing required parameter weight for model.embed_tokens.weight

Hi,
Sorry to open so many issues :). I'm trying to merge a Zephyr model and I'm getting this error:

RuntimeError: Missing required parameter weight for HuggingFaceH4/zephyr-7b-alpha.model.embed_tokens.weight

Zephyr is based on Mistral, so it should be supported, right?

Error when merging two Yi models

Hi, awesome project.
I'm encountering major difficulties when trying to fuse two Yi models (1, 2).
This is the config file:

slices:
  - sources:
      - model: /workspace/Tess-M-v1.3
        layer_range: [0, 14]
  - sources:
      - model: /workspace/Nous-Capybara-34B
        layer_range: [7, 21]  
  - sources:
      - model: /workspace/Tess-M-v1.3
        layer_range: [15, 29] 
  - sources:
      - model: /workspace/Nous-Capybara-34B
        layer_range: [22, 36] 
  - sources:
      - model: /workspace/Tess-M-v1.3
        layer_range: [30, 44] 
  - sources:
      - model: /workspace/Nous-Capybara-34B
        layer_range: [37, 51]  
  - sources:
      - model: /workspace/Tess-M-v1.3
        layer_range: [45, 59] 
merge_method: passthrough
base_model: /workspace/Tess-M-v1.3
tokenizer_source: /workspace/Tess-M-v1.3
dtype: float16

It doesn't seem to change anything if I switch the models.
The error it gives is:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):

  File "/usr/local/bin/mergekit-yaml", line 8, in <module>
    sys.exit(_main())

  File "/workspace/mergekit/mergekit/scripts/run_yaml.py", line 101, in _main
    typer.run(main)

  File "/workspace/mergekit/mergekit/scripts/run_yaml.py", line 82, in main
    run_merge(

  File "/workspace/mergekit/mergekit/merge.py", line 68, in run_merge
    tokenizer, embed_permutations = build_tokenizer(

  File "/workspace/mergekit/mergekit/tokenizer.py", line 54, in build_tokenizer
    vocabularies[model] = model_tok.vocab

AttributeError: 'YiTokenizer' object has no attribute 'vocab'
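
The traceback ends at `model_tok.vocab`, an attribute this tokenizer class does not define. Hugging Face tokenizers provide `get_vocab()` as the common accessor, so one possible workaround is to prefer that method and fall back to `.vocab`. This is an assumption and not a confirmed fix. The helper `vocabulary_of` and the stand-in tokenizer class below are hypothetical.

```python
def vocabulary_of(tokenizer):
    """Return a token -> id mapping, preferring get_vocab() over .vocab."""
    get_vocab = getattr(tokenizer, "get_vocab", None)
    if callable(get_vocab):
        return dict(get_vocab())
    return dict(tokenizer.vocab)


class SlowStyleTokenizer:
    """Hypothetical stand-in for a tokenizer that has no `.vocab` attribute."""

    def get_vocab(self):
        return {"<s>": 0, "</s>": 1, "hello": 2}


print(vocabulary_of(SlowStyleTokenizer()))
```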

MergeKit models do not behave the same as the original model

Hi @cg123, I am the author of AutoAWQ. After being in contact with TheBloke, it seems there are some issues with models from MergeKit.

Weights are not the same datatype as original, e.g. Mixtral is BF16 but a mergekit model may have both BF16 and FP16
Some weights are null or zeros. This can cause errors during quantization.

Are you aware of any of these issues and do you have any fix for them?

The following models are currently failing:

causallm/8x7b-moe-test-not-mixtral
perlthoughts/chupacabra-8x7b-moe
perlthoughts/falkor-8x7b-moe
perlthoughts/starling-lm-alpha-8x7b-moe
undi95/bigplap-8x20b

But these models work fine:

chargoddard/mixtralnt-4x7b-test
chargoddard/mixtralrpchat-zloss
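
The two symptoms reported in this issue (mixed datatypes and all-zero weights) can be checked on any checkpoint with a short audit. This is a sketch only: it assumes each tensor exposes `.dtype` and `.any()` the way `torch.Tensor` does, and the function name `audit_tensors` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple


def audit_tensors(state: Mapping[str, object]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group parameter names by dtype and list parameters that are entirely zero.

    `state` maps parameter names to tensor-like objects that provide
    a `.dtype` attribute and an `.any()` method (assumed, as in torch).
    """
    by_dtype: Dict[str, List[str]] = defaultdict(list)
    all_zero: List[str] = []
    for name, tensor in state.items():
        by_dtype[str(tensor.dtype)].append(name)
        # .any() is truthy if at least one element is nonzero
        if not bool(tensor.any()):
            all_zero.append(name)
    return dict(by_dtype), all_zero
```

For a loaded PyTorch model, one plausible call is `audit_tensors(model.state_dict())`. More than one key in the first result indicates mixed dtypes, and a non-empty second result lists the zeroed weights.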

slices:
  - sources:
    - model: HuggingFaceH4/zephyr-7b-beta
      layer_range: [0, 12]
  - sources:
    - model: argilla/notus-7b-v1
      layer_range: [28, 32]
merge_method: passthrough
dtype: bfloat16

And when I run this code:

from transformers import pipeline
import torch
pipe = pipeline('text-generation', model='minihermes', device='mps')
pipe('Python is a programming language')

I get:

[{'generated_text': 'Python is a programming language, <a\n luego without.\n\n\n\n\n\n\n\n'}]

I feel like I'm doing something wrong here... Is this an issue w/ the tokenizer?
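
One thing worth checking with a passthrough config like the one above is how many layers the merged model ends up with. The sketch below tallies them from the config, written as a plain dict to avoid a YAML dependency. It assumes `layer_range` is half-open (`[start, end)`) and that each slice has a single source. The helper name `count_layers` is hypothetical, and the count by itself does not establish the cause of the garbled output.

```python
def count_layers(config: dict) -> int:
    """Sum the layer counts of all slices in a passthrough-style config.

    Assumes half-open ranges [start, end) and one source per slice.
    """
    total = 0
    for slice_ in config["slices"]:
        start, end = slice_["sources"][0]["layer_range"]
        total += end - start
    return total


# The config from the report above, transcribed as a dict
config = {
    "slices": [
        {"sources": [{"model": "HuggingFaceH4/zephyr-7b-beta", "layer_range": [0, 12]}]},
        {"sources": [{"model": "argilla/notus-7b-v1", "layer_range": [28, 32]}]},
    ],
    "merge_method": "passthrough",
    "dtype": "bfloat16",
}

print(count_layers(config))  # 16 under the half-open assumption
```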
