
axolotl's People

Contributors

akj2018, ali-mosavian, angainordev, brianfitzgerald, casper-hansen, cg123, chiragjn, dreamgenx, fearnworks, hamelsmu, jinwonkim93, johanwork, jphme, kallewoof, maximegmd, mhenrichsen, monk1337, nanocode012, napuh, pocketdoclabs, ricardodominguez, seungduk-yanolja, theobjectivedad, thytu, tmm1, tokestermw, utensil, viktoriussuwandi, winglian, xzuyn


axolotl's Issues

[Question] Should inference instruction have stripped last new line?

instruction = get_multi_line_input()
if not instruction:
    return
prompt: str = next(prompter_module().build_prompt(instruction=instruction))

The inference script requires pressing Enter to submit the input even if it's only one line. The result is that a \n is appended to that line.

>>> get_multi_line_input()
Give me an instruction (Ctrl + D to finish): 
test
'test\n'

Should this be changed to instruction=instruction.strip('\n')?

I am not sure about other prompting styles, but for completion, we want the text to be continued as test is a word, instead of having a \n after the input: test\n is a word.

An alternative solution would be to add the strip inside the CompletionPrompter.
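
A minimal sketch of the proposed change (reusing the existing get_multi_line_input and prompter_module from the snippet above):

instruction = get_multi_line_input()
if not instruction:
    return
# drop the trailing newline added by pressing Enter so completion-style
# prompts continue directly from the user's text
instruction = instruction.rstrip("\n")
prompt: str = next(prompter_module().build_prompt(instruction=instruction))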

No module named axolotl.utils.validation

After pip installing axolotl, and trying to run the provided finetuning command I get:

Traceback (most recent call last):
  File "/home/someone/axolotl/scripts/finetune.py", line 17, in <module>
    from axolotl.utils.validation import validate_config
ModuleNotFoundError: No module named 'axolotl.utils.validation'

I can't find validation.py anywhere in the commit history either.

[Feature] Allow passing prompter config

Proposal: The code has been written to accept any Prompter. We should allow this to be configurable via a cfg option or kwarg.

do_inference(cfg, model, tokenizer)

which is used here

def do_inference(cfg, model, tokenizer, prompter="AlpacaPrompter"):
    tokenizer.add_special_tokens({"unk_token": "<unk>"})
    tokenizer.add_special_tokens({"bos_token": "<s>"})
    tokenizer.add_special_tokens({"eos_token": "</s>"})
    prompter_module = getattr(importlib.import_module("axolotl.prompters"), prompter)
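
A minimal sketch of how this could be made configurable (a hypothetical cfg.prompter option falling back to the current default):

import importlib

def do_inference(cfg, model, tokenizer, prompter=None):
    # prefer an explicit kwarg, then the config, then the current default
    prompter = prompter or cfg.prompter or "AlpacaPrompter"
    tokenizer.add_special_tokens({"unk_token": "<unk>"})
    tokenizer.add_special_tokens({"bos_token": "<s>"})
    tokenizer.add_special_tokens({"eos_token": "</s>"})
    prompter_module = getattr(importlib.import_module("axolotl.prompters"), prompter)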

[Bug] Seed does not always load from `cfg.seed`

I think this is an easy Issue to tackle for anyone interested.

The seed should be set to a default value if it is not defined, ideally somewhere at startup (maybe when loading the config); a minimal sketch follows the list below.

  • Update below

ds = ds.shuffle(seed=42)["train"].shard(num_shards=cfg.shards, index=0)

  • Update below

dataset = Dataset.from_list(samples).shuffle(seed=42)

  • Pass seed to Trainer
  • Pass seed to any function that has it available
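
A minimal sketch of the idea (hypothetical helper names; it mirrors the snippets above, assuming the seed is resolved once when the config is loaded and then passed everywhere):

import transformers
from datasets import Dataset

DEFAULT_SEED = 42

def resolve_seed(cfg):
    # fall back to a fixed default when cfg.seed is not set
    return cfg.seed if cfg.seed is not None else DEFAULT_SEED

def shuffle_with_cfg_seed(samples, cfg):
    # dataset shuffling uses the configured seed instead of a hard-coded 42
    return Dataset.from_list(samples).shuffle(seed=resolve_seed(cfg))

def training_args_with_seed(cfg):
    # the Trainer picks up the same seed via TrainingArguments
    return transformers.TrainingArguments(output_dir=cfg.output_dir, seed=resolve_seed(cfg))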

Unusable early_stopping_patience param

Whenever a user sets early_stopping_patience, it results in the following error: AssertionError: EarlyStoppingCallback requires load_best_model_at_end = True
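
A minimal sketch of one possible fix (hypothetical helper, assuming the TrainingArguments are built from cfg inside setup_trainer): enable load_best_model_at_end and register the callback only when early_stopping_patience is configured.

import transformers

def early_stopping_setup(cfg):
    # EarlyStoppingCallback asserts load_best_model_at_end=True, so turn it on
    # automatically whenever early_stopping_patience is set
    training_args = transformers.TrainingArguments(
        output_dir=cfg.output_dir,
        load_best_model_at_end=bool(cfg.early_stopping_patience),
        evaluation_strategy="steps" if cfg.early_stopping_patience else "no",
        save_strategy="steps",
        eval_steps=cfg.eval_steps,
        save_steps=cfg.save_steps,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
    )
    callbacks = []
    if cfg.early_stopping_patience:
        callbacks.append(
            transformers.EarlyStoppingCallback(
                early_stopping_patience=cfg.early_stopping_patience
            )
        )
    return training_args, callbacks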

RuntimeError: Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. Please call module.cuda() before module.load_state_dict()

I get the error below at the end of training. I suspect it's due to loading in 8 bit together with https://github.com/winglian/axolotl/blob/47ad3890bc35985b9046f403312887035e19f96f/src/axolotl/utils/trainer.py#L99

Stack trace

File "/workspace/scripts/finetune.py", line 246, in <module> 
    fire.Fire(train) 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 141, in Fire 
    component_trace = _Fire(component, args, parsed_flag_args, context, name) 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 475, in _Fire 
    component, remaining_args = _CallAndUpdateTrace( 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace 
    component = fn(*varargs, **kwargs) 
  File "/workspace/scripts/finetune.py", line 235, in train 
    trainer.train(resume_from_checkpoint=resume_from_checkpoint) 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 1664, in train 
    return inner_training_loop( 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 2054, in _inner_training_loop 
    self._load_best_model() 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 2230, in _load_best_model 
    load_result = model.load_state_dict(state_dict, False) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2027, in load_state_dict 
    load(self, state_dict) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  [Previous line repeated 4 more times] 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2009, in load 
    module._load_from_state_dict( 
  File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/nn/modules.py", line 298, in _load_from_state_dict 
    raise RuntimeError("Loading a quantized checkpoint into non-quantized Linear8bitLt is " 
RuntimeError: Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. Please call module.cuda() before module.load_state_dict()

Info

Commit: Before dev merge winglian/axolotl@cb9a887

disable checkpoint for wandb_log_model:

update all the configs / examples and change wandb_log_model: checkpoint => wandb_log_model:

This will prevent uploading obscenely large artifacts to wandb by default and eating into quota.

Save `adapter_bin` using callbacks if `lora`

Proposal

It would be good to also save the LoRA adapter at each checkpoint.

Solution

We can save the LoRA adapter using callbacks. I saw existing code for a callback; we can slightly modify it so that it does not delete pytorch_model.bin, allowing us to resume training.

We can check if adapter: lora and then add the callback (a minimal sketch follows below).

Happy to PR this.

Edit: Discussion at huggingface/peft#353 (comment)
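
A minimal sketch of such a callback (the class name and folder layout are illustrative; see the linked peft discussion for the original pattern), registered only when adapter: lora:

import os
from transformers import TrainerCallback

class SavePeftModelCallback(TrainerCallback):
    def on_save(self, args, state, control, **kwargs):
        # write the adapter files next to the regular checkpoint;
        # pytorch_model.bin is intentionally left untouched so that
        # resume_from_checkpoint keeps working
        checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        peft_dir = os.path.join(checkpoint_dir, "adapter_model")
        kwargs["model"].save_pretrained(peft_dir)
        return control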

issues to fix reported from discord

Bambi#1600
I can report my observations from my attempts at int8 LoRA training via the trainer built into Oobabooga’s textgen webUI if it helps:

  1. Myself and others are able to train LoRAs with int8 precision for the original unquantized HF llama-7b and llama-13b models
  2. The LoRA from this train produced expected results at inference when applied to the unquantized llama models
  3. VRAM usage during the train was observed to be evenly split between cards
  4. GPU utilization however was observed to alternate between the cards (one card was pulling 150 watts, the other pulling 300 watts then they’d swap) indicating a serialized but threaded workload vs true parallelization
  5. Encountered an OOM on both cards upon saving the first checkpoint. Following numerous forum threads, we reverted our bitsandbytes version from 0.38.1 to 0.37.2, which resolved the issue.

[Refactor] Fix duplicate `config` and `examples` folder and update previous configs

In the past, the configs were all in configs. However, as things changed, some parts have been moved to the examples folder.

Furthermore, there are some old/invalid configs within the configs folder due to recent changes.

It would be good to move all configs into the examples folder, organized per architecture, for better maintenance, and to update the old configs so they work.

I'm curious whether anyone has better ideas.

Issue load Llama tokenizer

Hello, I'm getting a weird issue loading the tokenizer. I've checked that the line of code hasn't changed even on my latest pull. The only difference could be that the transformers source changed something.

https://github.com/winglian/axolotl/blob/7576d85c735e307fa1dbbcb8e0cba8b53bb1fa48/src/axolotl/utils/models.py#L138-L139

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.88it/s]
Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear8bitLt(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear8bitLt(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear8bitLt(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)'.
Traceback (most recent call last):
  File "/workspace/src/axolotl/utils/models.py", line 140, in load_model
    tokenizer = LlamaTokenizer.from_pretrained(model)
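
For what it's worth, the error text contains the repr of the model object, which suggests the model instance itself is being passed to LlamaTokenizer.from_pretrained instead of a path or repo id. A minimal sketch of the kind of call that avoids this (assuming cfg.base_model_config holds the tokenizer path):

from transformers import LlamaTokenizer

# load the tokenizer from the config/repo path rather than from the model object
tokenizer = LlamaTokenizer.from_pretrained(cfg.base_model_config)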

What does `shard` do?

In the latest update, there is a shard argument. What is it trying to do? Is it trying to load the model and then output the lora adapter?

https://github.com/winglian/axolotl/blob/cb9a88704707b0fc3362988a7f57b606e4448ac7/scripts/finetune.py#L169-L171

Is this due to how you're saving the full model now?

https://github.com/winglian/axolotl/blob/cb9a88704707b0fc3362988a7f57b606e4448ac7/scripts/finetune.py#L222

As I understand it, if you want to extract a lora from a checkpoint, you need to load from the checkpoint first, then set the base model with those weights. If shard means something different, could I PR this feature of lora extraction from a checkpoint?

AttributeError: 'AlpacaPrompter' object has no attribute 'prompt_no_input'

Not sure if this is intended, but if the prompt dict contains the key "input" and the value for input is an empty string, the line input in prompt will resolve to False:

class AlpacaPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
    def parse_instruction_fields(self, prompt) -> (str, str, str):
        print(f"Is input in prompt?:  {input in prompt}")
        return (
            prompt["instruction"],
            prompt["input"] if "input" in prompt else "",
            prompt["output"],
        )

If the prompt input is an empty string, build_prompt will try to build a prompt with prompt_no_input

    def build_prompt(
        self,
        instruction: str,
        input: Union[None, str] = None,
        output: Union[None, str] = None,
    ) -> Generator[str, None, None]:
        # returns the full prompt from instruction and optional input
        # if a label (=response, =output) is provided, it's also appended.
        if input:
            res = self.prompt_input.format(instruction=instruction, input=input)
        else:
            res = self.prompt_no_input.format(instruction=instruction)
        if output:
            res = f"{res}{output}"
        yield res

but if the prompt style is 'alpaca', there is no prompt_no_input:

  def match_prompt_style(self):
      if self.prompt_style == PromptStyle.instruct.value:
          self.prompt_input = (
              self.system_prompt
              + "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
          )
          self.prompt_no_input = (
              self.system_no_input_prompt
              + "### Instruction:\n{instruction}\n\n### Response:\n"
          )
          self.response_split = "### Response:"
      if self.prompt_style == PromptStyle.chat.value:
          self.prompt_input = (
              self.system_prompt + "USER: {instruction}\n{input}\nASSISTANT:"
          )
          self.prompt_no_input = (
              self.system_no_input_prompt + "USER: {instruction}\nASSISTANT:"
          )
          self.response_split = "ASSISTANT:"

Not sure what the best solution is: add a prompt_no_input for alpaca-style prompts, or rephrase the ifs so that the result includes an empty "### Input:" section?

I'm willing to do a PR, just tell me what solution you want to see.
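
A minimal sketch of the first option (the branch condition and template wording are illustrative; it simply mirrors the instruct-style templates so build_prompt always finds prompt_no_input):

# hypothetical additional branch in match_prompt_style
if self.prompt_style == "alpaca":
    self.prompt_input = (
        self.system_prompt
        + "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
    )
    self.prompt_no_input = (
        self.system_no_input_prompt
        + "### Instruction:\n{instruction}\n\n### Response:\n"
    )
    self.response_split = "### Response:"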

[Feature] Allow passing file to inference on

Problem

It may be necessary to repeat the same questions across many experiments. It is time-consuming to copy-paste them line by line.

Feature

Allow passing the path to a jsonl file (or similar) that can be read and run through the model, with the generations written to a results file (a minimal sketch follows below).
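
A minimal sketch of what this could look like (the file layout, prompter_module, and generate_fn are illustrative), assuming a jsonl file with one {"instruction": ...} object per line:

import json

def run_inference_from_file(prompter_module, generate_fn, model, tokenizer, input_path, output_path):
    # read prompts from a jsonl file, run them through the model, and write results
    with open(input_path, encoding="utf-8") as fin, open(output_path, "w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue
            record = json.loads(line)
            prompt = next(prompter_module().build_prompt(instruction=record["instruction"]))
            completion = generate_fn(model, tokenizer, prompt)
            fout.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")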

[Refactor] Remove use of local variables `save_steps` and `eval_steps` as they are not modified

save_steps = cfg.save_steps
eval_steps = cfg.eval_steps

When I looked into the code and saw this, I thought it was saved to a local variable to be modified / compared. However, it does not seem so.

It might be better to remove these and use the original cfg counterparts to remove the preconception that the local variable is different. Of course, it's also ok to leave them as is.

[Question] Duplicate shard config names?

I noticed two different pieces of shard code using different configs in load_tokenized_prepared_datasets and load_prepare_datasets:

if d.shards:
    ds = ds.shuffle(seed=42)["train"].shard(num_shards=cfg.shards, index=0)

if cfg.dataset_shard_num and cfg.dataset_shard_idx is not None:
    logging.info(
        f"Using index #{cfg.dataset_shard_idx} of {cfg.dataset_shard_num} shards"
    )
    dataset = dataset.shard(
        num_shards=cfg.dataset_shard_num, index=cfg.dataset_shard_idx
    )

Not sure if these two parts should be combined and called elsewhere, but I think the config should be unified to use the same name.
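
A minimal sketch of a unified helper (purely illustrative), keyed off a single pair of config names that both load_tokenized_prepared_datasets and load_prepare_datasets could share:

import logging

def maybe_shard(dataset, cfg):
    # one shard code path, driven by cfg.dataset_shard_num / cfg.dataset_shard_idx
    if cfg.dataset_shard_num and cfg.dataset_shard_idx is not None:
        logging.info(
            f"Using index #{cfg.dataset_shard_idx} of {cfg.dataset_shard_num} shards"
        )
        dataset = dataset.shard(
            num_shards=cfg.dataset_shard_num, index=cfg.dataset_shard_idx
        )
    return dataset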

save steps enhancement

If save_steps is a fraction, calculate the steps as floor(save_steps * total_steps_per_epoch).

This way, if someone were to pass 0.5, they would get a checkpoint at half an epoch and at the end of each epoch without having to figure out the step count manually.
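
A minimal sketch of the calculation (hypothetical helper, assuming total_steps_per_epoch is known when the trainer is set up):

import math

def resolve_save_steps(save_steps, total_steps_per_epoch):
    # a fractional value such as 0.5 means "every half epoch"
    if isinstance(save_steps, float) and 0 < save_steps < 1:
        return max(1, math.floor(save_steps * total_steps_per_epoch))
    return save_steps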

[Feature] Replace `cfg.load_4bit` with `cfg.gptq`

Proposal: Change the naming to reduce confusion with load_in_4bit, which is used for qlora.

Breaking change: Yes

  • Replace all instances
  • Add an assert to the validation config: assert not cfg.load_4bit, "cfg.load_4bit has been deprecated. Please change to cfg.gptq" (a minimal sketch follows this list)
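
A minimal sketch of that check (assuming it lives in validate_config):

def validate_config(cfg):
    # cfg.load_4bit is being renamed; fail loudly so old configs get updated
    assert not cfg.load_4bit, (
        "cfg.load_4bit has been deprecated. Please change to cfg.gptq"
    )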

ImportError: cannot import name 'Mapping' from 'collections'

accelerate launch scripts/finetune.py configs/llama_30B_4bit.yml
Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 11, in <module>
    from attrdict import AttrDefault
  File "/home/eric/miniconda3/envs/axolotl/lib/python3.10/site-packages/attrdict/__init__.py", line 5, in <module>
    from attrdict.mapping import AttrMap
  File "/home/eric/miniconda3/envs/axolotl/lib/python3.10/site-packages/attrdict/mapping.py", line 4, in <module>
    from collections import Mapping
ImportError: cannot import name 'Mapping' from 'collections' (/home/eric/miniconda3/envs/axolotl/lib/python3.10/collections/__init__.py)
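
For context, this is the attrdict package importing Mapping from the collections module; the ABC aliases there were removed in Python 3.10, so the import only works via collections.abc:

# works on Python 3.3+ and is the only option on Python 3.10+
from collections.abc import Mapping

# what attrdict does, which fails on Python 3.10+
# from collections import Mapping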

Support python 3.10 and higher

Currently axolotl requires Python 3.9.
Python 3.10 and 3.11 will fail due to a dependency issue.
Can you please update the dependencies so that axolotl will work on 3.10 and 3.11?

[Feature] Add tests

We need to think about adding some tests to ensure more stability. I do not have much experience in this area; however, I think at the very least we should test the following:

Functional

  • Test validate_config for conflicting configs (a minimal sketch follows this list)

End-to-end tests for each architecture in the README, for one or two global steps:

  • fp16/fp32
  • 4bit
  • 8bit
  • gptq
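
A minimal sketch of the first bullet (pytest; the conflicting option pair is hypothetical and would follow whatever validate_config actually enforces):

import pytest

from axolotl.utils.validation import validate_config

class _Cfg(dict):
    # minimal attribute-style config stand-in for the test
    def __getattr__(self, name):
        return self.get(name)

def test_validate_config_rejects_conflicting_quantization():
    # hypothetical conflict: gptq and 8-bit loading enabled at the same time
    cfg = _Cfg(gptq=True, load_in_8bit=True)
    with pytest.raises((AssertionError, ValueError)):
        validate_config(cfg)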

confusing error message

I get a confusing error message. Can you please help?

My command line is:
accelerate launch scripts/finetune.py configs/llama_30B_4bit.yml

My config is:

base_model: ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors
base_model_config: ../alpaca_lora_4bit/llama-30b-4bit/
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
datasets:
  - path: ../alpaca_lora_4bit/leet10k-alpaca-merged.json
    type: alpaca
dataset_prepared_path: data/last_run_prepared
val_set_size: 0.04
adapter: lora
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len: 1024
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_fan_in_fan_out: false
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
output_dir: ./lora-test
batch_size: 128
micro_batch_size: 8
num_epochs: 4
warmup_steps: 100
learning_rate: 0.00003
train_on_inputs: false
group_by_length: false
bf16: true
tf32: true
gradient_checkpointing: false
early_stopping_patience: 3
resume_from_checkpoint:
auto_resume_from_checkpoints: true
local_rank:
load_4bit: true
xformers_attention: true
flash_attention:

My error message is:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
bin /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/0/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/3/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/2/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
bin /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/1/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 946, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 131, in set_module_tensor_to_device
    raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 946, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 131, in set_module_tensor_to_device
    raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 246, in <module>
    fire.Fire(train)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/eric/git/axolotl/scripts/finetune.py", line 178, in train
    model, tokenizer, lora_config = load_model(
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 136, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 445, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 922, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at '../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors' is not a valid JSON file.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 246, in <module>
    fire.Fire(train)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/eric/git/axolotl/scripts/finetune.py", line 178, in train
    model, tokenizer, lora_config = load_model(
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 136, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 445, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 922, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at '../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors' is not a valid JSON file.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2878 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2879 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2881 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 2880) of binary: /home/eric/miniconda3/envs/axolotl2/bin/python
Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/launch.py", line 914, in launch_command
    multi_gpu_launcher(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/launch.py", line 603, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/finetune.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-04-24_23:18:00
  host      : mlc-win.
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 2880)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

qlora save peft on final callback

{'eval_loss': 1.2171393632888794, 'eval_runtime': 7.1067, 'eval_samples_per_second': 4.362, 'eval_steps_per_second': 0.141, 'epoch': 4.38}
{'loss': 1.0812, 'learning_rate': 3.581603349196372e-06, 'epoch': 4.5}
{'loss': 1.0813, 'learning_rate': 2.0253513192751373e-06, 'epoch': 4.62}
{'loss': 1.0691, 'learning_rate': 9.035651368646648e-07, 'epoch': 4.75}
{'loss': 1.0922, 'learning_rate': 2.2640387134577058e-07, 'epoch': 4.88}
{'loss': 1.117, 'learning_rate': 0.0, 'epoch': 5.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [2:10:50<00:00, 192.75s/it]
The intermediate checkpoints of PEFT may not be saved correctly, using TrainerCallback to save adapter_model.bin in corresponding folders, here are some examples huggingface/peft#96 (the same warning is printed once per rank)
Traceback (most recent call last):
  File "/workspace/axolotl/scripts/finetune.py", line 256, in <module>
    fire.Fire(train)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/scripts/finetune.py", line 244, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1696, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2094, in _inner_training_loop
    self._load_best_model()
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2291, in _load_best_model
    self._issue_warnings_after_load(load_result)
UnboundLocalError: local variable 'load_result' referenced before assignment

GPTQ vs QLoRA

GPTQ and QLoRA are mutually exclusive when it comes to the PEFT dependency; see https://github.com/winglian/alpaca_lora_4bit/blob/main/requirements.txt#L9 versus QLoRA basically needing peft main. It's probably worth removing the [int4] part of the install from the docker container and simply doing a basic install. We'll also need to update the docs to tell people who want to use GPTQ that they will need to pip uninstall peft and pip install .[int4]. The caveat for them is that they need to uninstall peft again if they want to switch back to qlora.

[BUG] Fix attention masking when concatenating sequences

Someone should review, but I think we're doing it incorrectly. We need to set the attention mask so that the first token in each concatenated sequence has an attention mask of zero. In most cases, this means setting the mask of the bos token to zero (a minimal sketch follows below).
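
A minimal sketch of the idea (illustrative helper, assuming packed examples each begin with a bos token):

def mask_first_token_of_each_sequence(attention_mask, input_ids, bos_token_id):
    # zero the attention mask on the first (bos) token of every concatenated
    # sequence so packed examples are separated at sequence boundaries
    return [
        0 if token_id == bos_token_id else mask
        for token_id, mask in zip(input_ids, attention_mask)
    ]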

add bitsandbytes build with cuda library in base docker image

from my qlora notes:

cd bitsandbytes
CUDA_VERSION=118 make cuda11x
pip uninstall bitsandbytes
python setup.py install
pip install scipy
pip uninstall transformers
pip install "transformers @ git+https://github.com/huggingface/transformers.git
pip install bert-score==0.3.13 evaluate==0.4.0 rouge-score==0.1.2 scikit-learn==1.2.2 sentencepiece==0.1.99 wandb==0.15.2

should update requirements.txt too.

[Bug] Add `cfg.hf_use_auth_token` to set whether to attach auth token

As discussed on Discord, if a user is not authenticated with Hugging Face, the code errors because it expects a token.

We would like to switch to a config option, cfg.hf_use_auth_token, that controls whether to attach the token.

  • Change all use_auth_token=True call sites to read from the config instead, use_auth_token=cfg.hf_use_auth_token (a minimal sketch follows this list)

    f"{cfg.push_dataset_to_hub}/{ds_hash}", use_auth_token=True

  • Add it to the docs under all yaml options
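
A minimal sketch of the change at one call site (assuming the snippet above is a datasets.load_dataset call and that cfg.hf_use_auth_token is falsy when unset):

from datasets import load_dataset

dataset = load_dataset(
    f"{cfg.push_dataset_to_hub}/{ds_hash}",
    # only attach a token when the user opted in via the config
    use_auth_token=cfg.hf_use_auth_token,
)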

Trainer() got multiple values for keyword argument 'callbacks'

When running on 8x A100 80GB, I run into this error:

File "/root/axolotl/scripts/finetune.py", line 239, in <module> trainer = setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer) File "/root/axolotl/src/axolotl/utils/trainer.py", line 196, in setup_trainer Traceback (most recent call last): File "/root/axolotl/scripts/finetune.py", line 239, in <module> trainer = transformers.Trainer( fire.Fire(train)TypeError : File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fir e/core.py", line 141, in Fire transformers.trainer.Trainer() got multiple values for keyword argument 'callbacks' fire.Fire(train) File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 475, in _Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 691, in _CallAndUpdateTrace component, remaining_args = _CallAndUpdateTrace( File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File "/root/axolotl/scripts/finetune.py", line 198, in train component = fn(*varargs, **kwargs) File "/root/axolotl/scripts/finetune.py", line 198, in train trainer = setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer) File "/root/axolotl/src/axolotl/utils/trainer.py", line 196, in setup_trainer trainer = setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer) File "/root/axolotl/src/axolotl/utils/trainer.py", line 196, in setup_trainer trainer = transformers.Trainer( TypeError: transformers.trainer.Trainer() got multiple values for keyword argument 'c allbacks'trainer = transformers.Trainer(
