
starcoder2's Introduction

StarCoder 2

[🤗 Models & Datasets] | [Paper]

StarCoder2 is a family of code generation models (3B, 7B, and 15B) trained on 600+ programming languages from The Stack v2, plus natural language text such as Wikipedia, arXiv, and GitHub issues. The models use Grouped Query Attention, a context window of 16,384 tokens, and sliding window attention of 4,096 tokens. The 3B and 7B models were trained on 3+ trillion tokens, while the 15B model was trained on 4+ trillion tokens. For more details, check out the paper.
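These architecture details can be verified from the model configuration; a minimal sketch (field names follow the Starcoder2 config in transformers):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigcode/starcoder2-15b")
print(config.max_position_embeddings)  # context window: 16384
print(config.sliding_window)           # sliding window attention: 4096
# Grouped Query Attention: fewer key/value heads than query heads
print(config.num_attention_heads, config.num_key_value_heads)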

Table of Contents

  1. Quickstart
  2. Fine-tuning
  3. Evaluation

Quickstart

StarCoder2 models are intended for code completion; they are not instruction-tuned models, and commands like "Write a function that computes the square root." do not work well.
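In practice, prompt the model with the beginning of the code you want rather than a natural-language command; a minimal sketch of a completion-style prompt (the function name and docstring are illustrative):

# code-completion prompt: give the model a signature and docstring to continue
prompt = (
    "def square_root(x):\n"
    '    """Compute the square root of x using Newton\'s method."""\n'
)
# pass `prompt` to the tokenizer and model.generate as in the examples below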

Installation

First, install all the libraries listed in requirements.txt:

pip install -r requirements.txt
# export your HF token, found here: https://huggingface.co/settings/account
export HF_TOKEN=xxx

Model usage and memory footprint

Here are some examples of loading the model and generating code, along with the memory footprint of the largest model, StarCoder2-15B. Make sure you've installed transformers from source (this should be the case if you used requirements.txt):

pip install git+https://github.com/huggingface/transformers.git

Running the model on CPU/GPU/multi GPU

  • Using full precision
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-15b"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# to use multiple GPUs, do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
  • Using torch.bfloat16
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigcode/starcoder2-15b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# for fp16 use `torch_dtype=torch.float16` instead
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 32251.33 MB

Quantized Versions through bitsandbytes

  • Using 8-bit precision (int8)
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# to use 4-bit, use `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

checkpoint = "bigcode/starcoder2-15b_16k"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-15b_16k", quantization_config=quantization_config)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
# load_in_8bit
Memory footprint: 16900.18 MB
# load_in_4bit
>>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
Memory footprint: 9224.60 MB

You can also use pipeline for the generation:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
checkpoint = "bigcode/starcoder2-15b"

model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print(pipe("def hello():"))

Text-generation-inference:

docker run -p 8080:80 -v $PWD/data:/data -e HUGGING_FACE_HUB_TOKEN=<YOUR BIGCODE ENABLED TOKEN> -d  ghcr.io/huggingface/text-generation-inference:latest --model-id bigcode/starcoder2-15b --max-total-tokens 8192
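Once the container is running, you can query it over HTTP; a minimal sketch using Python (the /generate route and payload shape follow the text-generation-inference API):

import requests

# assumes the container started above is listening on localhost:8080
response = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": "def print_hello_world():", "parameters": {"max_new_tokens": 64}},
)
print(response.json()["generated_text"])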

For more details, see the text-generation-inference repository.

Fine-tuning

Here, we showcase how you can fine-tune StarCoder2 models. For more fine-tuning resources you can check StarCoder's GitHub repository and SantaCoder-Finetuning.

Setup

Install PyTorch (see the documentation); for example, the following command works with CUDA 12.1:

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

Install the requirements (this installs transformers from source to support the StarCoder2 architecture):

pip install -r requirements.txt

Before you run any of the scripts, make sure you are logged in to wandb and the Hugging Face Hub so you can push checkpoints:

wandb login
huggingface-cli login

Now that everything is set up, you can clone the repository and change into the corresponding directory.

Training

To fine-tune efficiently at low cost, we use the PEFT library for Low-Rank Adaptation (LoRA) training and bitsandbytes for 4-bit quantization. We also use the SFTTrainer from TRL.

For this example, we will fine-tune StarCoder2-3b on the Rust subset of the-stack-smol. This is just for illustration purposes; for a larger and cleaner dataset of Rust code, you can use The Stack dedup.
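For reference, here is a condensed sketch of the core setup inside such a fine-tuning script; it approximates what finetune.py does (exact arguments may differ), combining 4-bit quantization, LoRA adapters, and the SFTTrainer:

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

# load the base model in 4-bit NF4 via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-3b", quantization_config=bnb_config, device_map="auto"
)

# LoRA adapters on the attention projections (rank/alpha values are illustrative)
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

dataset = load_dataset("bigcode/the-stack-smol", data_dir="data/rust", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="content",
    max_seq_length=1024,
    args=TrainingArguments(
        output_dir="finetune_starcoder2",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        warmup_steps=20,
        max_steps=10000,
        bf16=True,
    ),
)
trainer.train()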

To launch the training:

accelerate launch finetune.py \
        --model_id "bigcode/starcoder2-3b" \
        --dataset_name "bigcode/the-stack-smol" \
        --subset "data/rust" \
        --dataset_text_field "content" \
        --split "train" \
        --max_seq_length 1024 \
        --max_steps 10000 \
        --micro_batch_size 1 \
        --gradient_accumulation_steps 8 \
        --learning_rate 2e-5 \
        --warmup_steps 20 \
        --num_proc "$(nproc)"

If you want to fine-tune on other text datasets, change the dataset_text_field argument to the name of the column containing the code/text you want to train on.
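After training, the checkpoint contains LoRA adapters rather than full model weights. A sketch of loading them for inference (assuming the adapters were saved under finetune_starcoder2/final_checkpoint and peft is installed):

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# resolves the base model from the adapter config and applies the LoRA weights
model = AutoPeftModelForCausalLM.from_pretrained(
    "finetune_starcoder2/final_checkpoint", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")

inputs = tokenizer.encode("fn main() {", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=64)[0]))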

Evaluation

To evaluate StarCoder2 and its derivatives, you can use the BigCode-Evaluation-Harness for evaluating Code LLMs. You can also check the BigCode Leaderboard.
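For instance, an illustrative harness invocation for HumanEval (flags follow the bigcode-evaluation-harness README; adjust the model and batch sizes to your hardware):

accelerate launch main.py \
    --model bigcode/starcoder2-15b \
    --tasks humaneval \
    --max_length_generation 512 \
    --temperature 0.2 \
    --n_samples 50 \
    --batch_size 10 \
    --allow_code_execution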

starcoder2's People

Contributors

loubnabnl, lvwerra, muhtasham


starcoder2's Issues

format for inference in code completion

StarCoder's format for inference in code completion is PSM: <fim_prefix> + prefix + <fim_suffix> + suffix + <fim_middle>.

What is it for StarCoder2?

From the paper, we could only find this:
[image from the paper]
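For reference, a sketch of building a PSM-style FIM prompt with these tokens (assuming StarCoder2 keeps the same special tokens; check the tokenizer vocabulary to confirm):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-15b")

prefix = "def fib(n):\n    "
suffix = "\n    return fib(n - 1) + fib(n - 2)"

# PSM ordering: <fim_prefix> + prefix + <fim_suffix> + suffix + <fim_middle>
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
inputs = tokenizer.encode(prompt, return_tensors="pt")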

what is the sft template?

When I try to use this model, I don't know what the SFT template is. Please help me, thank you.

Official Support for GGUF Quantization in BigCode Starcoder2 to Enhance Accessibility and Efficiency

Dear BigCode team, what a wonderful project!

I am writing this feature request for official implementation of GGUF quantization for Starcoder2 to enhance its adoption with coding platforms and APIs such as Ollama and LMStudio.

Despite the model's advanced capabilities across its size variants, its integration and usability in the OpenAI-API-style coding ecosystem, including extensions like "Continue" for VSCode, could be significantly improved. The current lack of support for GGUF quantization limits its potential reach and utility.

An official implementation by your team would ensure optimal performance and compatibility, eliminating the need for community-driven workarounds. I urge you to consider this proposal as a step towards making BigCode Starcoder2 a more versatile and inclusive tool for the developer community. Official GGUF quantization could significantly impact its adoption and effectiveness across diverse development environments.

Thank you for your time and consideration of this important enhancement. I look forward to your positive response and the future success of BigCode Starcoder2.

Clash in requirements for finetuning Starcoder2

Facing the following error while trying to fine-tune Starcoder2 with the given script.

Description:

For transformers.AutoModelForCausalLM to recognize Starcoder2, transformers>=4.39.0 is required.

But trl is still pinned to transformers==4.38.2. Even if I compile from source and use trl==0.7.12.dev0, I still get an issue.

Here is the error with using transformers==4.38.2

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1127             try:
-> 1128                 config_class = CONFIG_MAPPING[config_dict["model_type"]]
   1129             except KeyError:

KeyError: 'starcoder2'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1128                 config_class = CONFIG_MAPPING[config_dict["model_type"]]
   1129             except KeyError:
-> 1130                 raise ValueError(
   1131                     f"The checkpoint you are trying to load has model type `{config_dict['model_type']}` "
   1132                     "but Transformers does not recognize this architecture. This could be because of an "

ValueError: The checkpoint you are trying to load has model type `starcoder2` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

Here is the error when using transformers==4.39.0

ImportError                               Traceback (most recent call last)
<ipython-input-2-3ef713ffd06d> in <cell line: 1>()
----> 1 from trl import SFTTrainer
      2 print("trl version:", trl.__version__)

/usr/local/lib/python3.10/dist-packages/trl/__init__.py in <module>
      3 __version__ = "0.7.12.dev0"
      4 
----> 5 from .core import set_seed
      6 from .environment import TextEnvironment, TextHistory
      7 from .extras import BestOfNSampler

/usr/local/lib/python3.10/dist-packages/trl/core.py in <module>
     23 import torch.nn.functional as F
     24 from torch.nn.utils.rnn import pad_sequence
---> 25 from transformers import top_k_top_p_filtering
     26 
     27 from .import_utils import is_npu_available, is_xpu_available

ImportError: cannot import name 'top_k_top_p_filtering' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)


Unlawful use of my code

The README of this repo states the following:

StarCoder2 is a family of code generation models (3B, 7B, and 15B), trained on 600+ programming languages from The Stack v2 [...]

The dataset linked contains my code, without following its license (or lack thereof).

Consent is not opt-out. You trained an LLM on code you are not allowed to use.

Megatron model weights for StarCoder2-15B

A year ago, the raw Megatron weights for StarCoder were released.

Would it be possible to release the Megatron weights for StarCoder2, especially the 15B variant?

Also, publishing a script to convert StarCoder2 from Megatron format to HuggingFace format would be helpful. Thanks!

support SPM mode for FIM prompts

From the FIM paper (https://arxiv.org/pdf/2207.14255.pdf), section 3.1: SPM mode can be used to reuse the KV cache across completion requests.

SPM mode can enable further latency optimization (which is very important for code completion tools). Is there any reason the StarCoder models use the normal PSM mode?

Some weights of the model checkpoint at finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM

I get the following warning after fine-tuning this model on the R dataset, following the example in the README.

Some weights of the model checkpoint at finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM: ['model.layers.0.self_attn.k_proj.base_layer.bias', 'model.layers.0.self_attn.k_proj.base_layer.weight', 'model.layers.0.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.k_proj.lora_A.default.weight', 'model.layers.0.self_attn.k_proj.lora_B.default.weight', ...]

(The list continues with the same seven entries, base_layer.bias, base_layer.weight, base_layer.weight.absmax, base_layer.weight.quant_map, base_layer.weight.quant_state.bitsandbytes__nf4, lora_A.default.weight, and lora_B.default.weight, for each of the k_proj, o_proj, q_proj, and v_proj attention projections in every layer of the model.)
- This IS expected if you are initializing Starcoder2ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Starcoder2ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Starcoder2ForCausalLM were not initialized from the model checkpoint at finetune_starcoder2/final_checkpoint and are newly initialized: ['model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.k_proj.weight', … (the q/k/v/o projection weights and biases of every layer, 0 through 29) …, 'model.layers.9.self_attn.v_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization (matching the nf4 quant_state keys in the log above)
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

# tokenizer comes from the base model, weights from the fine-tuned checkpoint
checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained("finetune_starcoder2/final_checkpoint", quantization_config=quantization_config)

inputs = tokenizer.encode("hello_world_function <- function() {", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
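
Incidentally, the lora_A/lora_B keys in the log suggest final_checkpoint is a PEFT adapter, so loading it with plain AutoModelForCausalLM discards the adapter weights. A minimal sketch of loading it through peft instead, assuming the checkpoint was saved with PEFT's save_pretrained:

# pip install peft
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-3b",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
# attach the LoRA adapter weights on top of the base model instead of dropping them
model = PeftModel.from_pretrained(base, "finetune_starcoder2/final_checkpoint")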

Also, I don't think doing 4-bit quantization as a default for finetuning is a good idea. It should be opt-in with a flag.
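
A minimal sketch of what an opt-in flag could look like (the flag name and the argparse wiring here are hypothetical, not the script's actual interface):

import argparse
import torch
from transformers import BitsAndBytesConfig

parser = argparse.ArgumentParser()
# hypothetical flag: quantization stays off unless explicitly requested
parser.add_argument("--load_in_4bit", action="store_true", default=False)
args = parser.parse_args()

quantization_config = None
if args.load_in_4bit:
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # matches the nf4 quant_state keys in the log
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
# pass quantization_config (possibly None) on to AutoModelForCausalLM.from_pretrained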

I am also wondering: why do we use The Stack v1 for finetuning and not The Stack v2?

What does "unique tokens" mean (in the paper) ?

For example, on page 16 it said "This leads to a dataset of 622B+ unique tokens. For the 7B, we include OpenWebMath, Wikipedia, and Arxiv, leading to a slightly larger dataset of 658B+ unique tokens. For the 15B, we include the-stack-v2-train-full dataset and all extra data sources listed in §2, resulting in a dataset with 913B+ unique tokens. The size of this dataset is 4× the size of the training dataset for StarCoderBase."
The question is: does "unique tokens" mean the dataset contains that many tokens in total after deduplication, or that tokenizing the whole dataset with the StarCoder2 tokenizer yields a vocabulary of that size?

Can't load StarCoder2-3B

Hi,

I am using Python 3.10.8 and I've updated all packages, but I still get this error message when I try to use the model:

ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`

Any ideas what the problem might be?
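
In case it helps, these are the two installs the error itself asks for (run them in the same environment that executes the script):

pip install accelerate
pip install -i https://pypi.org/simple/ bitsandbytes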

Better inference based on the starcoder2-3b model

I am new to StarCoder.

When I run the following demo:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "./starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

inputs = tokenizer.encode("def is_prime(n):", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

it returns:

def is_prime():
    """
    This function checks if a number is prime or not.
    """

It doesn't finish, so I set max_length=120, and then it returns:

def is_prime():
    """
    This function checks if a number is prime or not.
    """
    num = int(input("Enter a number: "))
    if num > 1:
        for i in range(2, num):
            if (num % i) == 0:
                print(num, "is not a prime number")
                break
        else:
            print(num, "is a prime number")
    else:
        print(num, "is not a prime number")


is_prime()
<file_sep>/README.md
# Python-

The part

is_prime()
<file_sep>/README.md
# Python-

is redundant. For now, my solution is:

generated_code = tokenizer.decode(outputs[0])
# keep only the first file; drop everything after the repository boundary marker
if "<file_sep>" in generated_code:
    generated_code = generated_code.split("<file_sep>")[0]
print(generated_code)

But I don't think this is a good idea. I want the model to return the result in one go, without generating redundant parts. How can I do that? Could you give me some advice?
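
One possible approach, assuming `<file_sep>` maps to a single special token id in the StarCoder2 tokenizer (worth verifying), is to pass that id as an extra end-of-sequence token so generation stops at the file boundary:

# sketch: stop generation at the <file_sep> boundary instead of post-processing
file_sep_id = tokenizer.convert_tokens_to_ids("<file_sep>")
outputs = model.generate(
    inputs,
    max_new_tokens=120,
    eos_token_id=[tokenizer.eos_token_id, file_sep_id],  # stop at either token
    pad_token_id=tokenizer.eos_token_id,                 # silences the pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With skip_special_tokens=True, the boundary marker itself is also removed from the decoded text.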

Inquiry about Fine-Tuning Using Custom Code

Hi there,

I hope this message finds you well. I am currently exploring the process of fine-tuning models using my own codebase, and I was hoping to seek some guidance on this matter.

Could you please provide me with information on how I can effectively fine-tune models using my own codebase? Additionally, would it be possible for you to share any scripts or resources related to data preprocessing for this purpose?

I truly appreciate any assistance or insights you can provide on this matter. Thank you very much for your time and support.

Best regards
@loubnabnl

What prevents you from thoroughly open-sourcing?

I noticed that even though bigcode/starcoder(2) is much more open than Code Llama and DeepSeek-Coder (e.g., open-sourced datasets, clearly described data processing and training, and so on), it is still not thoroughly open: the code used for pretraining and data processing has never been open-sourced.
So, just out of curiosity, what prevents you from doing that?

CrossCodeEval Results for StarCoder 2

Hi, I'm currently researching the impact of different retrieval-augmented generation (RAG) techniques on LLM effectiveness. We are attempting to replicate the CrossCodeEval results from the "StarCoder 2 and The Stack v2: The Next Generation" paper as a baseline.

However, we have encountered issues in replicating the results stated in section 7.6.2 of the paper using the provided GitHub repository data and code for CrossCodeEval, along with the hyperparameters specified in the section. The paper reports a Code ES of 74.52 and an ID F1 of 68.81 for StarCoder2-7B’s Python code generation, whereas our replicated results showed a Code ES of 67.92 and an ID F1 of 58.08.

We noticed the option to use the BigCode-Evaluation-Harness for testing as mentioned in your repository, but we could not find the CrossCodeEval experiment within the bigcode-project/bigcode-evaluation-harness project. Therefore, we proceeded with the direct use of the open-source GitHub code and dataset for CrossCodeEval, employing the hyperparameters given in section 7.6.2.

My experiment environment is:

  • A100 40G × 8 DGX node
  • Ubuntu 20.04
  • CUDA 12.1
  • torch 2.1.2

Could you please provide any insights or additional guidelines that might help us better replicate the benchmark results? Any assistance or further details you could offer would be greatly appreciated.

Thank you for your time and support.
