
axolotl's People

Contributors

akj2018, ali-mosavian, angainordev, brianfitzgerald, casper-hansen, cg123, chiragjn, dreamgenx, fearnworks, hamelsmu, jinwonkim93, johanwork, jphme, kallewoof, maximegmd, mhenrichsen, monk1337, nanocode012, napuh, pocketdoclabs, ricardodominguez, seungduk-yanolja, theobjectivedad, thytu, tmm1, tokestermw, utensil, viktoriussuwandi, winglian, xzuyn


axolotl's Issues

[Question] Should inference instruction have stripped last new line?

instruction = get_multi_line_input()
if not instruction:
    return
prompt: str = next(prompter_module().build_prompt(instruction=instruction))

The inference script requires pressing Enter to submit the input even if it's only one line. The result is that a \n is appended to that line.

>>> get_multi_line_input()
Give me an instruction (Ctrl + D to finish): 
test
'test\n'

Should this be changed to instruction=instruction.strip('\n')?

I am not sure about other prompting styles, but for completion, we want the text to be continued as test is a word, instead of having a \n after the input: test\n is a word.

An alternative solution would be to add the strip inside the CompletionPrompter.
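
A minimal sketch of the proposed change (reusing the existing get_multi_line_input and prompter_module from the snippet above):

instruction = get_multi_line_input()
if not instruction:
    return
# drop the trailing newline added by pressing Enter so completion-style
# prompts continue directly from the user's text
instruction = instruction.rstrip("\n")
prompt: str = next(prompter_module().build_prompt(instruction=instruction))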

No module named axolotl.utils.validation

After pip installing axolotl, and trying to run the provided finetuning command I get:

Traceback (most recent call last):
  File "/home/someone/axolotl/scripts/finetune.py", line 17, in <module>
    from axolotl.utils.validation import validate_config
ModuleNotFoundError: No module named 'axolotl.utils.validation'

I can't find validation.py anywhere in the commit history either.

[Feature] Allow passing prompter config

Proposal: The code has been written to accept any Prompter. We should allow this to be configurable via a cfg option or kwarg.

do_inference(cfg, model, tokenizer)

which is used here

def do_inference(cfg, model, tokenizer, prompter="AlpacaPrompter"):
    tokenizer.add_special_tokens({"unk_token": "<unk>"})
    tokenizer.add_special_tokens({"bos_token": "<s>"})
    tokenizer.add_special_tokens({"eos_token": "</s>"})
    prompter_module = getattr(importlib.import_module("axolotl.prompters"), prompter)
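
A minimal sketch of how this could be made configurable (a hypothetical cfg.prompter option falling back to the current default):

import importlib

def do_inference(cfg, model, tokenizer, prompter=None):
    # prefer an explicit kwarg, then the config, then the current default
    prompter = prompter or cfg.prompter or "AlpacaPrompter"
    tokenizer.add_special_tokens({"unk_token": "<unk>"})
    tokenizer.add_special_tokens({"bos_token": "<s>"})
    tokenizer.add_special_tokens({"eos_token": "</s>"})
    prompter_module = getattr(importlib.import_module("axolotl.prompters"), prompter)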

[Bug] Seed does not always load from `cfg.seed`

I think this is an easy Issue to tackle for anyone interested.

The seed should be set to a default value if it is not defined, ideally somewhere at startup (maybe when loading the config); a minimal sketch follows the list below.

  • Update below

ds = ds.shuffle(seed=42)["train"].shard(num_shards=cfg.shards, index=0)

  • Update below

dataset = Dataset.from_list(samples).shuffle(seed=42)

  • Pass seed to Trainer
  • Pass seed to any function that has it available
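
A minimal sketch of the idea (hypothetical helper names; it mirrors the snippets above, assuming the seed is resolved once when the config is loaded and then passed everywhere):

import transformers
from datasets import Dataset

DEFAULT_SEED = 42

def resolve_seed(cfg):
    # fall back to a fixed default when cfg.seed is not set
    return cfg.seed if cfg.seed is not None else DEFAULT_SEED

def shuffle_with_cfg_seed(samples, cfg):
    # dataset shuffling uses the configured seed instead of a hard-coded 42
    return Dataset.from_list(samples).shuffle(seed=resolve_seed(cfg))

def training_args_with_seed(cfg):
    # the Trainer picks up the same seed via TrainingArguments
    return transformers.TrainingArguments(output_dir=cfg.output_dir, seed=resolve_seed(cfg))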

Unusable early_stopping_patience param

Whenever a user sets early_stopping_patience, it results in the following error: AssertionError: EarlyStoppingCallback requires load_best_model_at_end = True
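
A minimal sketch of one possible fix (hypothetical helper, assuming the TrainingArguments are built from cfg inside setup_trainer): enable load_best_model_at_end and register the callback only when early_stopping_patience is configured.

import transformers

def early_stopping_setup(cfg):
    # EarlyStoppingCallback asserts load_best_model_at_end=True, so turn it on
    # automatically whenever early_stopping_patience is set
    training_args = transformers.TrainingArguments(
        output_dir=cfg.output_dir,
        load_best_model_at_end=bool(cfg.early_stopping_patience),
        evaluation_strategy="steps" if cfg.early_stopping_patience else "no",
        save_strategy="steps",
        eval_steps=cfg.eval_steps,
        save_steps=cfg.save_steps,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
    )
    callbacks = []
    if cfg.early_stopping_patience:
        callbacks.append(
            transformers.EarlyStoppingCallback(
                early_stopping_patience=cfg.early_stopping_patience
            )
        )
    return training_args, callbacks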

RuntimeError: Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. Please call module.cuda() before module.load_state_dict()

I get the error below at the end of training. I suspect it's due to loading in 8 bit together with https://github.com/winglian/axolotl/blob/47ad3890bc35985b9046f403312887035e19f96f/src/axolotl/utils/trainer.py#L99

Stack trace

File "/workspace/scripts/finetune.py", line 246, in <module> 
    fire.Fire(train) 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 141, in Fire 
    component_trace = _Fire(component, args, parsed_flag_args, context, name) 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 475, in _Fire 
    component, remaining_args = _CallAndUpdateTrace( 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace 
    component = fn(*varargs, **kwargs) 
  File "/workspace/scripts/finetune.py", line 235, in train 
    trainer.train(resume_from_checkpoint=resume_from_checkpoint) 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 1664, in train 
    return inner_training_loop( 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 2054, in _inner_training_loop 
    self._load_best_model() 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 2230, in _load_best_model 
    load_result = model.load_state_dict(state_dict, False) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2027, in load_state_dict 
    load(self, state_dict) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  [Previous line repeated 4 more times] 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2009, in load 
    module._load_from_state_dict( 
  File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/nn/modules.py", line 298, in _load_from_state_dict 
    raise RuntimeError("Loading a quantized checkpoint into non-quantized Linear8bitLt is " 
RuntimeError: Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. Please call module.cuda() before module.load_state_dict()

Info

Commit: Before dev merge winglian/axolotl@cb9a887

disable checkpoint for wandb_log_model:

update all the configs / examples and change wandb_log_model: checkpoint => wandb_log_model:

This will prevent uploading obscenely large artifacts to wandb by default and eating into quota.

Save `adapter_bin` using callbacks if `lora`

Proposal

It would be good to also save the LoRA adapter at each checkpoint.

Solution

We can save the LoRA adapter using callbacks. I saw existing code for a callback; we can slightly modify it so that it does not delete pytorch_model.bin, allowing us to resume training.

We can check if adapter: lora and then add the callback (a minimal sketch follows below).

Happy to PR this.

Edit: Discussion at huggingface/peft#353 (comment)
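
A minimal sketch of such a callback (the class name and folder layout are illustrative; see the linked peft discussion for the original pattern), registered only when adapter: lora:

import os
from transformers import TrainerCallback

class SavePeftModelCallback(TrainerCallback):
    def on_save(self, args, state, control, **kwargs):
        # write the adapter files next to the regular checkpoint;
        # pytorch_model.bin is intentionally left untouched so that
        # resume_from_checkpoint keeps working
        checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        peft_dir = os.path.join(checkpoint_dir, "adapter_model")
        kwargs["model"].save_pretrained(peft_dir)
        return control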

issues to fix reported from discord

Bambi#1600
I can report my observations from my attempts at int8 LoRA training via the trainer built into Oobabooga’s textgen webUI if it helps:

  1. Myself and others are able to train LoRAs with int8 precision for the original unquantized HF llama-7b and llama-13b models
  2. The LoRA from this train produced expected results at inference when applied to the unquantized llama models
  3. VRAM usage during the train was observed to be evenly split between cards
  4. GPU utilization however was observed to alternate between the cards (one card was pulling 150 watts, the other pulling 300 watts then they’d swap) indicating a serialized but threaded workload vs true parallelization
  5. Encountered an OOM on both cards upon saving the first checkpoint. Following numerous forum threads, we reverted our bitsandbytes version from 0.38.1 to 0.37.2, which resolved the issue.

[Refactor] Fix duplicate `config` and `examples` folder and update previous configs

In the past, the configs were all in configs. However, as things changed, some parts have been moved to the examples folder.

Furthermore, there are some old/invalid configs within the configs folder due to recent changes.

It would be good to move all configs into the examples folder, organized per architecture, for better maintenance, and to update the old configs so they work.

I'm curious whether anyone has better ideas.

Issue load Llama tokenizer

Hello, I'm getting a weird issue loading the tokenizer. I've checked that the line of code hasn't changed even on my latest pull. The only difference could be that the transformers source changed something.

https://github.com/winglian/axolotl/blob/7576d85c735e307fa1dbbcb8e0cba8b53bb1fa48/src/axolotl/utils/models.py#L138-L139

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.88it/s]
Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear8bitLt(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear8bitLt(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear8bitLt(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)'.
Traceback (most recent call last):
  File "/workspace/src/axolotl/utils/models.py", line 140, in load_model
    tokenizer = LlamaTokenizer.from_pretrained(model)
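
For what it's worth, the error text contains the repr of the model object, which suggests the model instance itself is being passed to LlamaTokenizer.from_pretrained instead of a path or repo id. A minimal sketch of the kind of call that avoids this (assuming cfg.base_model_config holds the tokenizer path):

from transformers import LlamaTokenizer

# load the tokenizer from the config/repo path rather than from the model object
tokenizer = LlamaTokenizer.from_pretrained(cfg.base_model_config)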

What does `shard` do?

In the latest update, there is a shard argument. What is it trying to do? Is it trying to load the model and then output the lora adapter?

https://github.com/winglian/axolotl/blob/cb9a88704707b0fc3362988a7f57b606e4448ac7/scripts/finetune.py#L169-L171

Is this due to how you're saving the full model now?

https://github.com/winglian/axolotl/blob/cb9a88704707b0fc3362988a7f57b606e4448ac7/scripts/finetune.py#L222

As I understand it, if you want to extract a lora from a checkpoint, you need to load from the checkpoint first, then set the base model with those weights. If shard means something different, could I PR this feature of lora extraction from a checkpoint?

AttributeError: 'AlpacaPrompter' object has no attribute 'prompt_no_input'

Not sure if this is intended, but if the prompt dict contains the key "input" and the value for input is an empty string, the line input in prompt will resolve to False:

class AlpacaPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
    def parse_instruction_fields(self, prompt) -> (str, str, str):
        print(f"Is input in prompt?:  {input in prompt}")
        return (
            prompt["instruction"],
            prompt["input"] if "input" in prompt else "",
            prompt["output"],
        )

If the prompt input is an empty string, build_prompt will try to build a prompt with prompt_no_input

    def build_prompt(
        self,
        instruction: str,
        input: Union[None, str] = None,
        output: Union[None, str] = None,
    ) -> Generator[str, None, None]:
        # returns the full prompt from instruction and optional input
        # if a label (=response, =output) is provided, it's also appended.
        if input:
            res = self.prompt_input.format(instruction=instruction, input=input)
        else:
            res = self.prompt_no_input.format(instruction=instruction)
        if output:
            res = f"{res}{output}"
        yield res

but if the prompt style is 'alpaca', there is no prompt_no_input:

  def match_prompt_style(self):
      if self.prompt_style == PromptStyle.instruct.value:
          self.prompt_input = (
              self.system_prompt
              + "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
          )
          self.prompt_no_input = (
              self.system_no_input_prompt
              + "### Instruction:\n{instruction}\n\n### Response:\n"
          )
          self.response_split = "### Response:"
      if self.prompt_style == PromptStyle.chat.value:
          self.prompt_input = (
              self.system_prompt + "USER: {instruction}\n{input}\nASSISTANT:"
          )
          self.prompt_no_input = (
              self.system_no_input_prompt + "USER: {instruction}\nASSISTANT:"
          )
          self.response_split = "ASSISTANT:"

Not sure what the best solution is: add a prompt_no_input for alpaca-style prompts, or rephrase the ifs so that the result includes an empty "### Input:" section?

I'm willing to do a PR, just tell me what solution you want to see.
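
A minimal sketch of the first option (the branch condition and template wording are illustrative; it simply mirrors the instruct-style templates so build_prompt always finds prompt_no_input):

# hypothetical additional branch in match_prompt_style
if self.prompt_style == "alpaca":
    self.prompt_input = (
        self.system_prompt
        + "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
    )
    self.prompt_no_input = (
        self.system_no_input_prompt
        + "### Instruction:\n{instruction}\n\n### Response:\n"
    )
    self.response_split = "### Response:"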

[Feature] Allow passing file to inference on

Problem

It may be necessary to repeat the same questions across many experiments. It is time-consuming to copy-paste them line by line.

Feature

Allow passing the path to a jsonl file (or similar) that can be read and run through the model, with the generations written to a results file (a minimal sketch follows below).
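
A minimal sketch of what this could look like (the file layout, prompter_module, and generate_fn are illustrative), assuming a jsonl file with one {"instruction": ...} object per line:

import json

def run_inference_from_file(prompter_module, generate_fn, model, tokenizer, input_path, output_path):
    # read prompts from a jsonl file, run them through the model, and write results
    with open(input_path, encoding="utf-8") as fin, open(output_path, "w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue
            record = json.loads(line)
            prompt = next(prompter_module().build_prompt(instruction=record["instruction"]))
            completion = generate_fn(model, tokenizer, prompt)
            fout.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")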

[Refactor] Remove use of local variables `save_steps` and `eval_steps` as they are not modified

save_steps = cfg.save_steps
eval_steps = cfg.eval_steps

When I looked into the code and saw this, I thought it was saved to a local variable to be modified / compared. However, it does not seem so.

It might be better to remove these and use the original cfg counterparts to remove the preconception that the local variable is different. Of course, it's also ok to leave them as is.

[Question] Duplicate shard config names?

I noticed two different pieces of shard code using different configs in load_tokenized_prepared_datasets and load_prepare_datasets:

if d.shards:
    ds = ds.shuffle(seed=42)["train"].shard(num_shards=cfg.shards, index=0)

if cfg.dataset_shard_num and cfg.dataset_shard_idx is not None:
    logging.info(
        f"Using index #{cfg.dataset_shard_idx} of {cfg.dataset_shard_num} shards"
    )
    dataset = dataset.shard(
        num_shards=cfg.dataset_shard_num, index=cfg.dataset_shard_idx
    )

Not sure if these two parts should be combined and called elsewhere, but I think the config should be unified to use the same name.
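
A minimal sketch of a unified helper (purely illustrative), keyed off a single pair of config names that both load_tokenized_prepared_datasets and load_prepare_datasets could share:

import logging

def maybe_shard(dataset, cfg):
    # one shard code path, driven by cfg.dataset_shard_num / cfg.dataset_shard_idx
    if cfg.dataset_shard_num and cfg.dataset_shard_idx is not None:
        logging.info(
            f"Using index #{cfg.dataset_shard_idx} of {cfg.dataset_shard_num} shards"
        )
        dataset = dataset.shard(
            num_shards=cfg.dataset_shard_num, index=cfg.dataset_shard_idx
        )
    return dataset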

save steps enhancement

If save_steps is a fraction, calculate the steps as floor(save_steps * total_steps_per_epoch).

This way, if someone were to pass 0.5, they would get a checkpoint at half an epoch and at the end of each epoch without having to figure out the step count manually.
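
A minimal sketch of the calculation (hypothetical helper, assuming total_steps_per_epoch is known when the trainer is set up):

import math

def resolve_save_steps(save_steps, total_steps_per_epoch):
    # a fractional value such as 0.5 means "every half epoch"
    if isinstance(save_steps, float) and 0 < save_steps < 1:
        return max(1, math.floor(save_steps * total_steps_per_epoch))
    return save_steps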

[Feature] Replace `cfg.load_4bit` with `cfg.gptq`

Proposal: Change the naming to reduce confusion with load_in_4bit, which is used for qlora.

Breaking change: Yes

  • Replace all instances
  • Add an assert to the validation config: assert not cfg.load_4bit, "cfg.load_4bit has been deprecated. Please change to cfg.gptq" (a minimal sketch follows this list)
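
A minimal sketch of that check (assuming it lives in validate_config):

def validate_config(cfg):
    # cfg.load_4bit is being renamed; fail loudly so old configs get updated
    assert not cfg.load_4bit, (
        "cfg.load_4bit has been deprecated. Please change to cfg.gptq"
    )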

ImportError: cannot import name 'Mapping' from 'collections'

accelerate launch scripts/finetune.py configs/llama_30B_4bit.yml
Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 11, in <module>
    from attrdict import AttrDefault
  File "/home/eric/miniconda3/envs/axolotl/lib/python3.10/site-packages/attrdict/__init__.py", line 5, in <module>
    from attrdict.mapping import AttrMap
  File "/home/eric/miniconda3/envs/axolotl/lib/python3.10/site-packages/attrdict/mapping.py", line 4, in <module>
    from collections import Mapping
ImportError: cannot import name 'Mapping' from 'collections' (/home/eric/miniconda3/envs/axolotl/lib/python3.10/collections/__init__.py)
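
For context, this is the attrdict package importing Mapping from the collections module; the ABC aliases there were removed in Python 3.10, so the import only works via collections.abc:

# works on Python 3.3+ and is the only option on Python 3.10+
from collections.abc import Mapping

# what attrdict does, which fails on Python 3.10+
# from collections import Mapping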

Support python 3.10 and higher

Currently axolotl requires Python 3.9.
Python 3.10 and 3.11 will fail due to a dependency issue.
Can you please update the dependencies so that axolotl will work on 3.10 and 3.11?

[Feature] Add tests

We need to think about adding some tests to ensure more stability. I do not have much experience in this area; however, I think at the very least we should test the following:

Functional

  • Test validate_config for conflicting configs (a minimal sketch follows this list)

End-to-end tests for each architecture in the README, for one or two global steps:

  • fp16/fp32
  • 4bit
  • 8bit
  • gptq
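
A minimal sketch of the first bullet (pytest; the conflicting option pair is hypothetical and would follow whatever validate_config actually enforces):

import pytest

from axolotl.utils.validation import validate_config

class _Cfg(dict):
    # minimal attribute-style config stand-in for the test
    def __getattr__(self, name):
        return self.get(name)

def test_validate_config_rejects_conflicting_quantization():
    # hypothetical conflict: gptq and 8-bit loading enabled at the same time
    cfg = _Cfg(gptq=True, load_in_8bit=True)
    with pytest.raises((AssertionError, ValueError)):
        validate_config(cfg)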

confusing error message

I get a confusing error message. Can you please help?

My command line is:
accelerate launch scripts/finetune.py configs/llama_30B_4bit.yml

My config is:

base_model: ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors
base_model_config: ../alpaca_lora_4bit/llama-30b-4bit/
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
datasets:
  - path: ../alpaca_lora_4bit/leet10k-alpaca-merged.json
    type: alpaca
dataset_prepared_path: data/last_run_prepared
val_set_size: 0.04
adapter: lora
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len: 1024
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_fan_in_fan_out: false
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
output_dir: ./lora-test
batch_size: 128
micro_batch_size: 8
num_epochs: 4
warmup_steps: 100
learning_rate: 0.00003
train_on_inputs: false
group_by_length: false
bf16: true
tf32: true
gradient_checkpointing: false
early_stopping_patience: 3
resume_from_checkpoint:
auto_resume_from_checkpoints: true
local_rank:
load_4bit: true
xformers_attention: true
flash_attention:

My error message is:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
bin /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/0/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/3/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/2/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
bin /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/1/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 946, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 131, in set_module_tensor_to_device
    raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 946, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 131, in set_module_tensor_to_device
    raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 246, in <module>
    fire.Fire(train)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/eric/git/axolotl/scripts/finetune.py", line 178, in train
    model, tokenizer, lora_config = load_model(
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 136, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 445, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 922, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at '../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors' is not a valid JSON file.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 246, in <module>
    fire.Fire(train)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/eric/git/axolotl/scripts/finetune.py", line 178, in train
    model, tokenizer, lora_config = load_model(
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 136, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 445, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 922, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at '../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors' is not a valid JSON file.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2878 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2879 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2881 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 2880) of binary: /home/eric/miniconda3/envs/axolotl2/bin/python
Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/launch.py", line 914, in launch_command
    multi_gpu_launcher(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/launch.py", line 603, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/finetune.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-04-24_23:18:00
  host      : mlc-win.
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 2880)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

qlora save peft on final callback

{'eval_loss': 1.2171393632888794, 'eval_runtime': 7.1067, 'eval_samples_per_second': 4.362, 'eval_steps_per_second': 0.141, 'epoch': 4.38}
{'loss': 1.0812, 'learning_rate': 3.581603349196372e-06, 'epoch': 4.5}
{'loss': 1.0813, 'learning_rate': 2.0253513192751373e-06, 'epoch': 4.62}
{'loss': 1.0691, 'learning_rate': 9.035651368646648e-07, 'epoch': 4.75}
{'loss': 1.0922, 'learning_rate': 2.2640387134577058e-07, 'epoch': 4.88}
{'loss': 1.117, 'learning_rate': 0.0, 'epoch': 5.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [2:10:50<00:00, 192.75s/it]
The intermediate checkpoints of PEFT may not be saved correctly, using TrainerCallback to save adapter_model.bin in corresponding folders, here are some examples huggingface/peft#96 (the same warning is printed once per rank)
Traceback (most recent call last):
  File "/workspace/axolotl/scripts/finetune.py", line 256, in <module>
    fire.Fire(train)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/scripts/finetune.py", line 244, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1696, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2094, in _inner_training_loop
    self._load_best_model()
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2291, in _load_best_model
    self._issue_warnings_after_load(load_result)
UnboundLocalError: local variable 'load_result' referenced before assignment

GPTQ vs QLoRA

GPTQ and QLoRA are mutually exclusive when it comes to the PEFT dependency; see https://github.com/winglian/alpaca_lora_4bit/blob/main/requirements.txt#L9 versus QLoRA basically needing peft main. It's probably worth removing the [int4] part of the install from the docker container and simply doing a basic install. We'll also need to update the docs to tell people who want to use GPTQ that they will need to pip uninstall peft and pip install .[int4]. The caveat for them is that they need to uninstall peft again if they want to switch back to qlora.

[BUG] Fix attention masking when concatenating sequences

Someone should review, but I think we're doing it incorrectly. We need to set the attention mask so that the first token in each concatenated sequence has an attention mask of zero. In most cases, this means setting the mask of the bos token to zero (a minimal sketch follows below).
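
A minimal sketch of the idea (illustrative helper, assuming packed examples each begin with a bos token):

def mask_first_token_of_each_sequence(attention_mask, input_ids, bos_token_id):
    # zero the attention mask on the first (bos) token of every concatenated
    # sequence so packed examples are separated at sequence boundaries
    return [
        0 if token_id == bos_token_id else mask
        for token_id, mask in zip(input_ids, attention_mask)
    ]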

add bitsandbytes build with cuda library in base docker image

from my qlora notes:

cd bitsandbytes
CUDA_VERSION=118 make cuda11x
pip uninstall bitsandbytes
python setup.py install
pip install scipy
pip uninstall transformers
pip install "transformers @ git+https://github.com/huggingface/transformers.git
pip install bert-score==0.3.13 evaluate==0.4.0 rouge-score==0.1.2 scikit-learn==1.2.2 sentencepiece==0.1.99 wandb==0.15.2

should update requirements.txt too.

[Bug] Add `cfg.hf_use_auth_token` to set whether to attach auth token

As discussed on Discord, if a user is not authenticated with Hugging Face, the code errors because it expects a token.

We would like to switch to a config option, cfg.hf_use_auth_token, that controls whether to attach the token.

  • Change all use_auth_token=True call sites to read from the config instead, use_auth_token=cfg.hf_use_auth_token (a minimal sketch follows this list)

    f"{cfg.push_dataset_to_hub}/{ds_hash}", use_auth_token=True

  • Add it to the docs under all yaml options
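
A minimal sketch of the change at one call site (assuming the snippet above is a datasets.load_dataset call and that cfg.hf_use_auth_token is falsy when unset):

from datasets import load_dataset

dataset = load_dataset(
    f"{cfg.push_dataset_to_hub}/{ds_hash}",
    # only attach a token when the user opted in via the config
    use_auth_token=cfg.hf_use_auth_token,
)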

Trainer() got multiple values for keyword argument 'callbacks'

When running on 8x A100 80GB, I run into this error:

File "/root/axolotl/scripts/finetune.py", line 239, in <module> trainer = setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer) File "/root/axolotl/src/axolotl/utils/trainer.py", line 196, in setup_trainer Traceback (most recent call last): File "/root/axolotl/scripts/finetune.py", line 239, in <module> trainer = transformers.Trainer( fire.Fire(train)TypeError : File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fir e/core.py", line 141, in Fire transformers.trainer.Trainer() got multiple values for keyword argument 'callbacks' fire.Fire(train) File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 475, in _Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 691, in _CallAndUpdateTrace component, remaining_args = _CallAndUpdateTrace( File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File "/root/axolotl/scripts/finetune.py", line 198, in train component = fn(*varargs, **kwargs) File "/root/axolotl/scripts/finetune.py", line 198, in train trainer = setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer) File "/root/axolotl/src/axolotl/utils/trainer.py", line 196, in setup_trainer trainer = setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer) File "/root/axolotl/src/axolotl/utils/trainer.py", line 196, in setup_trainer trainer = transformers.Trainer( TypeError: transformers.trainer.Trainer() got multiple values for keyword argument 'c allbacks'trainer = transformers.Trainer(
