Comments (7)
Can mergekit operate correctly on a mix of fp16 and bf16 models by automatically converting to the target dtype?
It does convert everything to the target dtype you specify if you have `dtype` set in the configuration. The fact that GGUF quantization works makes me think this is probably not an issue on the mergekit side.
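For example, a minimal merge configuration with an explicit target dtype might look like the sketch below; the models, densities, and weights are just illustrative placeholders:

```yaml
# Minimal sketch of a mergekit config with an explicit target dtype.
# Models, densities, and weights here are illustrative placeholders.
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: SanjiWatsuki/Kunoichi-7B
    parameters:
      density: 0.5
      weight: 0.5
dtype: float16  # inputs are converted to this dtype for the merge
```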
I do know that DARE TIES with lower density values can result in an unusual distribution of parameter magnitudes. The rescaling step tends to introduce large outliers that are critical to the behavior of the model. I saw this happen more with densities around 0.1, but it could be happening here too. Maybe this is throwing exl2 for a loop? I'm not sure.
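As a rough illustration (not mergekit's actual implementation), DARE's drop-and-rescale step looks something like this, which is where the outliers come from:

```python
import torch

def dare_drop_and_rescale(delta: torch.Tensor, density: float) -> torch.Tensor:
    """Rough sketch of DARE's drop-and-rescale step, not mergekit's code.

    `delta` is a task vector (fine-tuned weights minus base weights).
    A random `density` fraction of its entries is kept, and survivors
    are rescaled by 1/density so the expected delta is preserved.
    """
    mask = (torch.rand_like(delta) < density).to(delta.dtype)
    return delta * mask / density

# At density=0.1 every surviving entry is multiplied by 10, so a handful
# of already-large deltas turn into extreme outliers.
```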
@turboderp any thoughts?
ExLlama converts everything to FP16 when loading or quantizing, and the difference in dynamic range compared to BF16 could conceivably be an issue, but I've never seen it in practice. I tried downloading that model and converting it here, and it seems to work fine? (Both in 4bpw h6 and 8bpw h8.)
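As a quick illustration of that dynamic-range difference:

```python
import torch

# BF16 shares FP32's 8-bit exponent (max finite value ~3.4e38), while
# FP16 has more mantissa precision but overflows above 65504.
x = torch.tensor(70000.0, dtype=torch.bfloat16)
print(x)                    # representable in bf16, just coarsely rounded
print(x.to(torch.float16))  # inf: outside fp16's dynamic range
```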
I updated to the current version and retried. The 8.0bpw h8 exl2 quant completed, but the truncation length that got read in was only 2048 when I loaded it in ooba. The Q8_0 GGUF's truncation length remained at 8192. I forgot to mention that I ran this under Windows 11, on a Ryzen 3 2200G, if that matters.
I wonder if this is a bug in TGW. From v0.0.15, EXL2 adds a "quantization_config" key to the config.json, which is the only place a length of 2048 would be mentioned in the model. It only appears under that key, though, as the calibration length. The model itself still lists a "max_position_embeddings" of 8192.
There's also a "max_new_tokens" key in generation_config.json that TGW might be reading? Not sure why it would use that key, but it might explain it? (I was looking at a different model, scratch that.)
I looked at the config.json in the generated result for the exl2 quant. Not sure why the length under calibration ended up at 2048. Manually adjusting it to 8192 resulted in ooba reporting the truncation length as 8192 rather than 2048. 2048 does not show up at all in the config.json of the merged result output by mergekit, so I should probably take this issue up with exllamav2 at this point. Here is the config.json, followed by a sketch of that manual edit:
```json
{
  "_name_or_path": "SanjiWatsuki/Kunoichi-7B",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 32000,
  "quantization_config": {
    "quant_method": "exl2",
    "version": "0.0.16",
    "bits": 8.0,
    "head_bits": 6,
    "calibration": {
      "rows": 100,
      "length": 2048,
      "dataset": "(default)"
    }
  }
}
```
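For the record, the manual edit described above amounts to something like this throwaway sketch; it only papers over whatever key the frontend is misreading:

```python
import json

# Copy the model's real context length over the exl2 calibration length
# so a frontend that misreads the latter reports 8192 instead of 2048.
with open("config.json") as f:
    cfg = json.load(f)

cfg["quantization_config"]["calibration"]["length"] = cfg["max_position_embeddings"]

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```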
The length listed under quantization config has nothing to do with inference. It's just metadata for troubleshooting purposes. You'd have to ask ooba what config option they're (not) reading to arrive at 2048 as the default.