Comments (7)
Can mergekit operate correctly on a mix of fp16 and bf16 models by automatically converting to the target dtype?
It does convert everything to the target dtype you specify if you have `dtype` set in the configuration. The fact that GGUF quantization works makes me think this is probably not an issue on the mergekit side.
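For example, a minimal merge configuration with an explicit target dtype might look like the sketch below; the models, densities, and weights are just illustrative placeholders:

```yaml
# Minimal sketch of a mergekit config with an explicit target dtype.
# Models, densities, and weights here are illustrative placeholders.
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: SanjiWatsuki/Kunoichi-7B
    parameters:
      density: 0.5
      weight: 0.5
dtype: float16  # inputs are converted to this dtype for the merge
```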
I do know that DARE TIES with lower density values can result in an unusual distribution of parameter magnitudes. The rescaling step tends to introduce large outliers that are critical to the behavior of the model. I saw this happen more with densities around 0.1, but it could be happening here too. Maybe this is throwing exl2 for a loop? I'm not sure.
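As a rough illustration (not mergekit's actual implementation), DARE's drop-and-rescale step looks something like this, which is where the outliers come from:

```python
import torch

def dare_drop_and_rescale(delta: torch.Tensor, density: float) -> torch.Tensor:
    """Rough sketch of DARE's drop-and-rescale step, not mergekit's code.

    `delta` is a task vector (fine-tuned weights minus base weights).
    A random `density` fraction of its entries is kept, and survivors
    are rescaled by 1/density so the expected delta is preserved.
    """
    mask = (torch.rand_like(delta) < density).to(delta.dtype)
    return delta * mask / density

# At density=0.1 every surviving entry is multiplied by 10, so a handful
# of already-large deltas turn into extreme outliers.
```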
@turboderp any thoughts?
ExLlama converts everything to FP16 when loading or quantizing, and the difference in dynamic range compared to BF16 could conceivably be an issue, but I've never seen it in practice. I tried downloading that model and converting it here, and it seems to work fine? (Both in 4bpw h6 and 8bpw h8.)
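As a quick illustration of that dynamic-range difference:

```python
import torch

# BF16 shares FP32's 8-bit exponent (max finite value ~3.4e38), while
# FP16 has more mantissa precision but overflows above 65504.
x = torch.tensor(70000.0, dtype=torch.bfloat16)
print(x)                    # representable in bf16, just coarsely rounded
print(x.to(torch.float16))  # inf: outside fp16's dynamic range
```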
I updated to the current version and retried. The 8.0bpw h8 exl2 quant completed, but the truncation length that got read in was only 2048 when I loaded it in ooba. The Q8_0 GGUF's truncation length remained at 8192. I forgot to mention that I ran this under Windows 11, on a Ryzen 3 2200G, if that matters.
I wonder if this is a bug in TGW. From v0.0.15, EXL2 adds a "quantization_config" key to the config.json, which is the only place a length of 2048 would be mentioned in the model. It only appears under that key, though, as the calibration length. The model itself still lists a "max_position_embeddings" of 8192.
There's also a "max_new_tokens" key in generation_config.json that TGW might be reading? Not sure why it would use that key, but it might explain it? (I was looking at a different model, scratch that.)
I looked at the config.json in the generated result for the exl2 quant. Not sure why the length under calibration ended up at 2048. Manually adjusting it to 8192 resulted in ooba reporting the truncation length as 8192 rather than 2048. 2048 does not show up at all in the config.json of the merged result output by mergekit, so I should probably take this issue up with exllamav2 at this point. Here is the config.json, followed by a sketch of that manual edit:
```json
{
  "_name_or_path": "SanjiWatsuki/Kunoichi-7B",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 32000,
  "quantization_config": {
    "quant_method": "exl2",
    "version": "0.0.16",
    "bits": 8.0,
    "head_bits": 6,
    "calibration": {
      "rows": 100,
      "length": 2048,
      "dataset": "(default)"
    }
  }
}
```
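For the record, the manual edit described above amounts to something like this throwaway sketch; it only papers over whatever key the frontend is misreading:

```python
import json

# Copy the model's real context length over the exl2 calibration length
# so a frontend that misreads the latter reports 8192 instead of 2048.
with open("config.json") as f:
    cfg = json.load(f)

cfg["quantization_config"]["calibration"]["length"] = cfg["max_position_embeddings"]

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```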
The length listed under quantization config has nothing to do with inference. It's just metadata for troubleshooting purposes. You'd have to ask ooba what config option they're (not) reading to arrive at 2048 as the default.