tloen / alpaca-lora
Instruct-tune LLaMA on consumer hardware
License: Apache License 2.0
Hey! I apologize if this is a rather generic question.
I'm not able to find good examples in the peft repository of how to continue training from a stored peft checkpoint,
and since the fine-tuning code here only shows how to fine-tune from scratch, I'd be grateful for an example of how to fine-tune alpaca from a stored peft checkpoint instead of from scratch.
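To make the question concrete, here is a minimal sketch of what I have in mind (the checkpoint path is a placeholder, and is_trainable may not exist in older peft versions, in which case the LoRA parameters would need requires_grad set manually):

from transformers import LlamaForCausalLM
from peft import PeftModel, prepare_model_for_int8_training

base_model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)
base_model = prepare_model_for_int8_training(base_model)
# Reload the saved adapter with its weights trainable, instead of calling
# get_peft_model() to create a fresh one; then hand `model` to the same
# Trainer setup finetune.py already uses.
model = PeftModel.from_pretrained(
    base_model, "path/to/saved-lora-checkpoint", is_trainable=True
)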
Thanks, and I really appreciate all the work put into this project!
I wonder whether the order of the data (in https://github.com/tloen/alpaca-lora/blob/main/alpaca_data.json) matters for training.
For example, if I wanted to add my own data to the original dataset in the cleaned JSON file, would it be fine to append it to the end of the JSON, or should I append and then shuffle the order? A sketch of what I mean follows below.
Excuse my ignorance, as I am entirely new to this technology.
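Concretely, assuming standard json/datasets usage (my_new_records is a placeholder for my own entries):

import json
from datasets import load_dataset

with open("alpaca_data.json") as f:
    records = json.load(f)
records.extend(my_new_records)  # my own {"instruction", "input", "output"} dicts, appended at the end
with open("alpaca_data_extended.json", "w") as f:
    json.dump(records, f)

data = load_dataset("json", data_files="alpaca_data_extended.json")
data["train"] = data["train"].shuffle(seed=42)  # shuffling would remove any order effect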
Regards
When I set the pad token to 0 and padding=True,
the generated text for the padded prompt always shows
(alpaca-lora) root@DESKTOP-FRT:/mnt/f/nlp/alpaca-lora# python generate.py
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Traceback (most recent call last):
File "/mnt/f/nlp/alpaca-lora/generate.py", line 13, in
model = LlamaForCausalLM.from_pretrained(
File "/root/anaconda3/envs/alpaca-lora2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2546, in from_pretrained
importlib_metadata.version("bitsandbytes")
File "/root/anaconda3/envs/alpaca-lora2/lib/python3.10/importlib/metadata/init.py", line 996, in version
return distribution(distribution_name).version
File "/root/anaconda3/envs/alpaca-lora2/lib/python3.10/importlib/metadata/init.py", line 969, in distribution
return Distribution.from_name(distribution_name)
File "/root/anaconda3/envs/alpaca-lora2/lib/python3.10/importlib/metadata/init.py", line 548, in from_name
raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes
I've just played with 4-bit quantization and it works really well: much faster loading and inference, and the ability to load a much bigger model on the GPU without quality degradation. It's just like magic.
But in order to make a quantized lora-trained model, I need to somehow combine the base HF model + trained LoRA and get a new model in HF format. This new model can then be quantized with the GPTQ-for-LLaMa script.
Here is someone's guide for 4-bit LLaMA if you want to try.
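To make the missing step concrete, a hedged sketch of the merge, assuming a peft version that provides merge_and_unload() (paths and the adapter id are placeholders):

import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")
merged = merged.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("./alpaca-7b-merged")  # plain HF checkpoint, ready for GPTQ-for-LLaMa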
Can anybody help?
Not exactly an issue, but I have just been trying to run one epoch of finetuning with llama-13b. On a 4090 it looks like it will take roughly 4 hours with the setting `MICRO_BATCH_SIZE = 2`.
However, it looks like the loss already converged to ~1 within 0.12 epochs (roughly 30 minutes into training), so it doesn't really make sense to use epochs=3 and potentially a larger micro batch size.
I could be wrong here. Happy to hear some feedback on how to better tune the parameters.
I noticed you have MAX_LENGTH set to 256, while Stanford used 512.
Is there a reason you set it to the smaller value? I'm curious whether you're getting better results and what your reasoning was for using 256.
Not sure when I am going to shut this down, but I will leave it up for a few hours at least (running on an RTX 5000). Maybe I will put things up in GKE later for testing purposes.
NOTE: didn't do anything about maintaining the context yet.
https://notebooksa.jarvislabs.ai/P1lDk5ziArYf6hVUkcne1vVlbwica44Ux7zNWyAeq-c69p-j0D1_ktPMmBKniGk8/
The code performed fine in Colab. I want to run it in a new test environment, not in the Gradio app on my development server.
However, the code below gives an error:
generation_output = model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=2048,
)
test.py is equivalent to generate.py.
/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/generation/utils.py:1374: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your m
(conda_alpaca) jovyan@ranking-0:~/alpaca-lora$ python3 test.py
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /opt/conda/envs/conda_alpaca/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 118
/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:14<00:00, 2.22it/s]
Instruction: Tell me about alpacas.
Traceback (most recent call last):
File "test.py", line 90, in <module>
print("Response:", evaluate(instruction))
File "test.py", line 47, in evaluate
generation_output = model.generate(
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/peft/peft_model.py", line 581, in generate
outputs = self.base_model.generate(**kwargs)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/generation/utils.py", line 1490, in generate
return self.beam_search(
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/generation/utils.py", line 2749, in beam_search
outputs = self(
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 772, in forward
outputs = self.model(
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 621, in forward
layer_outputs = decoder_layer(
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 316, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 216, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/peft/tuners/lora.py", line 522, in forward
result = super().forward(x)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 360, in forward
outliers = state.CB[:, state.idx.long()].clone()
TypeError: 'NoneType' object is not subscriptable
Awesome project. Thanks so much for creating a FOSS alpaca codebase.
Training the 7B model takes about 18GB of RAM.
I tried training the 13B model and ran out of VRAM on my 24GB card. I suspect it will need at least 32GB of VRAM.
Has anyone else been successful with fine-tuning a 13B model?
I tried to fine-tune the 13B model with a 3090 (24GB VRAM). The training started and a progress bar was shown; however, I got an error saying 'maximum recursion depth exceeded' after 100 steps of training. Has anyone had a similar error? Thanks!
These instructions will allow you to finetune on Windows.
Warning
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
This is caused by tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf", cache_dir="./cache/").
A potential solution is to modify it to tokenizer = LLaMATokenizer.from_pretrained("decapoda-research/llama-7b-hf", cache_dir="./cache/"),
or to modify path_to_cache_hfmodel/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/tokenizer_config.json
and change
{"bos_token": "", "eos_token": "", "model_max_length": 1000000000000000019884624838656, "tokenizer_class": "LlamaTokenizer", "unk_token": ""}
to {"bos_token": "", "eos_token": "", "model_max_length": 1000000000000000019884624838656, "tokenizer_class": "LLaMATokenizer", "unk_token": ""}
Should we create a discord or slack channel to discuss the issues?
Getting the following error with the latest commit (even after uninstalling and re-installing transformers from git):
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
I also tried force_download=True and still get the error.
Traceback (most recent call last):
File "/workspace/alpaca-lora/finetune.py", line 95, in <module>
trainer.train(resume_from_checkpoint=False)
File "/workspace/miniconda3/lib/python3.10/site-packages/transformers/trainer.py", line 1628, in train
return inner_training_loop(
File "/workspace/miniconda3/lib/python3.10/site-packages/transformers/trainer.py", line 1895, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/workspace/miniconda3/lib/python3.10/site-packages/transformers/trainer.py", line 2637, in training_step
loss = self.compute_loss(model, inputs)
File "/workspace/miniconda3/lib/python3.10/site-packages/transformers/trainer.py", line 2669, in compute_loss
outputs = model(**inputs)
File "/workspace/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/miniconda3/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 157, in forward
raise RuntimeError("module must have its parameters and buffers "
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
0%| | 0/1083 [00:00<?, ?it/s]
I'm trying to run this on a 4xA100 instance and getting this error.
nvidia-smi shows that something is getting loaded onto both gpu0 and gpu1.
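One general workaround for this class of error, hedged since it may not be what the linked comment suggests: hide all but one GPU before anything touches CUDA, so the Trainer does not wrap the device-mapped model in DataParallel.

import os

# Must be set before torch / transformers initialize CUDA in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"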
Hi,
Thanks for the finetuning code. I found that in the original LLaMA code the pad_token_id is -1, while in your implementation you change it to 0:
tokenizer.pad_token_id = 0
Would you mind explaining the reason? If the pad id were -1, we could not look it up in the embedding table.
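For context, a minimal sketch of why a non-negative id is needed (the -100 masking mirrors common HF practice, not necessarily this repo's exact code):

from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
tokenizer.pad_token_id = 0  # -1 would not be a valid row index into the embedding table
prompt = "Tell me about alpacas."
enc = tokenizer(prompt, truncation=True, max_length=256, padding="max_length")
# Pad positions are excluded from the loss with -100, which
# torch.nn.CrossEntropyLoss ignores, so the pad id itself never affects training.
labels = [
    (-100 if tok == tokenizer.pad_token_id else tok) for tok in enc["input_ids"]
]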
Thanks.
Originally I was playing around with https://github.com/zphang/minimal-llama/ to generate an alpaca-like adaptation. Both use peft, but in slightly different fashions, e.g. different parameter saving and a custom trainer. I was wondering which one is the idiomatic approach?
It might be a good idea to share the hardware specs and parameters with which you got the fine-tuning to work, to give a sense of the hardware requirements.
First of all, thanks for building this up so quickly!
I really appreciate your hard work.
I was wondering whether this code works without 8-bit quantization in general, since it uses HuggingFace's common interface.
/usr/local/lib/python3.9/dist-packages/peft/tuners/lora.py in _find_and_replace(self)
146 parent, target, target_name = self._get_submodules(key)
147 bias = target.bias is not None
--> 148 if loaded_in_8bit and isinstance(target, bnb.nn.Linear8bitLt):
149 kwargs.update(
150 {
NameError: name 'bnb' is not defined
Sorry for my probably newbie questions.
As I understand it, with the current finetuning we teach the model how to answer our questions based on knowledge already present from its base training.
But what if I want to add new knowledge to the model and be able to ask different questions about it?
For example: have the model learn a whole novel / a specific town's latest news / a specific scientific paper, etc., and then ask it to summarize or analyze something within that new knowledge. In other words, I want to ask the model questions based on a huge input text (100-1000 times bigger than the maximum input tokens).
How can I achieve this? Or where can I learn how to do it?
I think it would be very interesting if this model could be run with JavaScript. One of the main issues with ChatGPT and others is server overload during peak hours when many people connect. If the model could be run directly in the browser, many people could use it quickly and without any server costs.
When I try:
Instruction: Tell me about the president of Mexico in 2019.
I get
Response: The president of Mexico in 2019 was Andrés Manuel López Obrador, who took office on December 1st, 2018. He is a member of the National Regeneration Movement (MORENA) political party and is the first left-wing president of Mexico since 1946. He is known for his anti-corruption and anti-neolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioli
Instead of
The president of Mexico in 2019 was Andrés Manuel López Obrador, who took office on December 1, 2018. He is a member of the National Regeneration Movement (MORENA) political party and is the first left-wing president of Mexico since 1946. He is known for his anti-corruption and anti-neoliberal policies, as well as his commitment to improving the living conditions of the Mexican people.
I just added device_map={'': 0}, following #14 (comment).
Why is the result the same every time?
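On that second question, a hedged guess with an illustrative sketch: beam or greedy search without sampling is deterministic, so the same prompt always yields the same output; enabling sampling varies it from run to run.

from transformers import GenerationConfig

generation_config = GenerationConfig(
    do_sample=True,  # without this, decoding is deterministic
    temperature=0.7,
    top_p=0.75,
    num_beams=1,
)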
Hi. I got some free TPU quota from Google and I tried to train a model on a Google Cloud TPU VM (v2-8). It downloads the model but then fails with the following error. Below are the full logs:
$python3 finetune.py
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
/home/aicheung/.local/lib/python3.8/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Traceback (most recent call last):
File "finetune.py", line 46, in <module>
model = LlamaForCausalLM.from_pretrained(
File "/home/aicheung/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2591, in from_pretrained
raise ValueError(
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
`device_map` to `from_pretrained`. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
Is TPU not supported? I have no experience with HuggingFace's libraries (I've only used TensorFlow before), so I am not sure how it works. Thanks.
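For reference, a hedged sketch of what the error message asks for (the kwarg name is taken verbatim from that message; some transformers versions want it on a quantization config instead). Note that bitsandbytes int8 still requires a CUDA GPU, so this path cannot help on a TPU VM:

from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    load_in_8bit_fp32_cpu_offload=True,  # keep offloaded modules in fp32 on the CPU
    device_map="auto",  # or a custom dict that sends overflow modules to "cpu"
)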
Here's some code needed for these adjustments:
https://github.com/johnsmith0031/alpaca_lora_4bit
I don't know why the training is so slow.
Update: for anyone experiencing this issue, see the workaround I posted in #14 (comment)
I tried out the finetune script locally and it looks like there was no problem with that. However, when trying to run inference, I'm getting AttributeError: 'NoneType' object has no attribute 'device'
from bitsandbytes. I've checked, and it looks like it's an issue related to the model being split across CPU and GPU, but I am not sure which part of this repo is causing that. Any idea?
Relevant issue in bitsandbytes: bitsandbytes-foundation/bitsandbytes#40
Is it possible to stream each token of the output as soon as it is generated by the model? I guess it depends on the Hugging Face transformers classes and methods. Any solution to this?
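In the absence of a built-in streamer in this transformers version, a minimal hand-rolled sketch (model and tokenizer are assumed to be the ones loaded in generate.py; greedy decoding for simplicity):

import torch

prompt = "Tell me about alpacas."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
past = None
with torch.no_grad():
    for _ in range(256):
        out = model(input_ids=input_ids, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy step
        print(tokenizer.decode(next_id[0]), end="", flush=True)  # emit the token immediately
        if next_id.item() == tokenizer.eos_token_id:
            break
        input_ids = next_id  # with the cache, only the newest token is fed back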
Hi all,
I am trying to run generate.py, but I don't have a CUDA card. (Actually I could insert an old Quadro K2000, if that helps.)
After all the steps in the setup section, I get the following:
/home/georgi/Documents/GitHub/alpaca-lora/venv/bin/python /home/georgi/Documents/GitHub/alpaca-lora/generate.py
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/georgi/Documents/GitHub/alpaca-lora/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Traceback (most recent call last):
File "/home/georgi/Documents/GitHub/alpaca-lora/generate.py", line 7, in <module>
model = LlamaForCausalLM.from_pretrained(
File "/home/georgi/Documents/GitHub/alpaca-lora/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2591, in from_pretrained
raise ValueError(
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
`device_map` to `from_pretrained`. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
Process finished with exit code 1
Is there a way around this? I tried inserting load_in_8bit_fp32_cpu_offload=True in a few places, but it doesn't fix it.
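For what it's worth, a hedged sketch of a CPU-only load I'd try instead: dropping load_in_8bit entirely, since bitsandbytes int8 requires a CUDA GPU, at the cost of roughly 27 GB of RAM for fp32 7B and very slow generation.

import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    device_map={"": "cpu"},
    torch_dtype=torch.float32,  # CPU matmul generally wants fp32
    low_cpu_mem_usage=True,
)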
Thanks!
First of all, thank you for the code and model. Even at 7B it works awesomely!
But I stumbled upon a strange problem. No matter how I set the generation parameters max_new_tokens or max_length (to any big number like 2048), it always limits generation to 256 new tokens and stops without finishing the sentence, while limiting to fewer than 256 works as expected.
Is this hard-coded somewhere that I can edit for my setup?
I got the error below and hope someone can solve it.
I have changed the device_map (trying "balanced", "balanced_low_0", and "sequential") in
model = LLaMAForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)
but it's not working.
evaluate(input("Instruction: ")) # how to learn english
Instruction: how to learn english
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[16], line 1
----> 1 evaluate(input("Instruction: "))
Cell In[15], line 11, in evaluate(instruction, input)
9 inputs = tokenizer(prompt, return_tensors="pt")
10 input_ids = inputs["input_ids"].cuda()
---> 11 generation_output = model.generate(
12 input_ids=input_ids,
13 generation_config=generation_config,
14 return_dict_in_generate=True,
15 output_scores=True,
16 max_new_tokens=256
17 )
18 for s in generation_output.sequences:
19 output = tokenizer.decode(s)
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/peft/peft_model.py:581, in PeftModelForCausalLM.generate(self, **kwargs)
579 try:
580 if not isinstance(self.peft_config, PromptLearningConfig):
--> 581 outputs = self.base_model.generate(**kwargs)
582 else:
583 if "input_ids" not in kwargs:
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/generation/utils.py:1490, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, **kwargs)
1483 input_ids, model_kwargs = self._expand_inputs_for_generation(
1484 input_ids=input_ids,
1485 expand_size=generation_config.num_beams,
1486 is_encoder_decoder=self.config.is_encoder_decoder,
1487 **model_kwargs,
1488 )
1489 # 13. run beam search
-> 1490 return self.beam_search(
1491 input_ids,
1492 beam_scorer,
1493 logits_processor=logits_processor,
1494 stopping_criteria=stopping_criteria,
1495 pad_token_id=generation_config.pad_token_id,
1496 eos_token_id=generation_config.eos_token_id,
1497 output_scores=generation_config.output_scores,
1498 return_dict_in_generate=generation_config.return_dict_in_generate,
1499 synced_gpus=synced_gpus,
1500 **model_kwargs,
1501 )
1503 elif is_beam_sample_gen_mode:
1504 # 11. prepare logits warper
1505 logits_warper = self._get_logits_warper(generation_config)
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/generation/utils.py:2749, in GenerationMixin.beam_search(self, input_ids, beam_scorer, logits_processor, stopping_criteria, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
2745 break
2747 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-> 2749 outputs = self(
2750 **model_inputs,
2751 return_dict=True,
2752 output_attentions=output_attentions,
2753 output_hidden_states=output_hidden_states,
2754 )
2756 if synced_gpus and this_peer_finished:
2757 cur_len = cur_len + 1
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:770, in LLaMAForCausalLM.forward(self, input_ids, attention_mask, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
767 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
769 # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
--> 770 outputs = self.model(
771 input_ids=input_ids,
772 attention_mask=attention_mask,
773 past_key_values=past_key_values,
774 inputs_embeds=inputs_embeds,
775 use_cache=use_cache,
776 output_attentions=output_attentions,
777 output_hidden_states=output_hidden_states,
778 return_dict=return_dict,
779 )
781 hidden_states = outputs[0]
782 logits = self.lm_head(hidden_states)
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:619, in LLaMAModel.forward(self, input_ids, attention_mask, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
612 layer_outputs = torch.utils.checkpoint.checkpoint(
613 create_custom_forward(decoder_layer),
614 hidden_states,
615 attention_mask,
616 None,
617 )
618 else:
--> 619 layer_outputs = decoder_layer(
620 hidden_states,
621 attention_mask=attention_mask,
622 past_key_value=past_key_value,
623 output_attentions=output_attentions,
624 use_cache=use_cache,
625 )
627 hidden_states = layer_outputs[0]
629 if use_cache:
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:316, in LLaMADecoderLayer.forward(self, hidden_states, attention_mask, output_attentions, use_cache, past_key_value)
313 hidden_states = self.input_layernorm(hidden_states)
315 # Self Attention
--> 316 hidden_states, self_attn_weights, present_key_value = self.self_attn(
317 hidden_states=hidden_states,
318 past_key_value=past_key_value,
319 attention_mask=attention_mask,
320 output_attentions=output_attentions,
321 )
322 hidden_states = residual + hidden_states
324 # Fully Connected
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:216, in LLaMAAttention.forward(self, hidden_states, past_key_value, attention_mask, output_attentions)
212 """Input shape: Batch x Time x Channel"""
214 bsz, q_len, _ = hidden_states.size()
--> 216 query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
217 key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
218 value_states = self.v_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/peft/tuners/lora.py:522, in Linear8bitLt.forward(self, x)
521 def forward(self, x: torch.Tensor):
--> 522 result = super().forward(x)
524 if self.disable_adapters:
525 return result
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/bitsandbytes/nn/modules.py:242, in Linear8bitLt.forward(self, x)
239 if self.bias is not None and self.bias.dtype != x.dtype:
240 self.bias.data = self.bias.data.to(x.dtype)
--> 242 out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
243 if not self.state.has_fp16_weights:
244 if self.state.CB is not None and self.state.CxB is not None:
245 # we converted 8-bit row major to turing/ampere format in the first inference pass
246 # we no longer need the row-major weight
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:488, in matmul(A, B, out, state, threshold, bias)
486 if threshold > 0.0:
487 state.threshold = threshold
--> 488 return MatMul8bitLt.apply(A, B, out, bias, state)
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
503 if not torch._C._are_functorch_transforms_active():
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
508 if cls.setup_context == _SingleLevelFunction.setup_context:
509 raise RuntimeError(
510 'In order to use an autograd.Function with functorch transforms '
511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
512 'staticmethod. For more details, please see '
513 'https://pytorch.org/docs/master/notes/extending.func.html')
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:317, in MatMul8bitLt.forward(ctx, A, B, out, bias, state)
313 else:
314 if state.CxB is None and using_igemmlt:
315 # B in in 8-bit row-major, we can transform it back to 16-bit to extract outlier dimensions
316 # we also need to convert it to the turing/ampere format
--> 317 state.CxB, state.SB = F.transform(state.CB, to_order=formatB)
318 else:
319 if not state.has_fp16_weights and state.CxB is None and using_igemmlt:
File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/bitsandbytes/functional.py:1698, in transform(A, to_order, from_order, out, transpose, state, ld)
1697 def transform(A, to_order, from_order='row', out=None, transpose=False, state=None, ld=None):
-> 1698 prev_device = pre_call(A.device)
1699 if state is None: state = (A.shape, from_order)
1700 else: from_order = state[1]
AttributeError: 'NoneType' object has no attribute 'device'
Thu Mar 16 16:37:03 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01 Driver Version: 510.39.01 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:1A:00.0 Off | N/A |
| 31% 32C P2 50W / 250W | 8012MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:3D:00.0 Off | N/A |
| 29% 31C P2 50W / 250W | 4350MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3903 C ...nvs/LiuJieTest/bin/python 8009MiB |
| 1 N/A N/A 3903 C ...nvs/LiuJieTest/bin/python 4347MiB |
+-----------------------------------------------------------------------------+
# packages in environment at /home/zhuji/miniconda3/envs/LiuJieTest:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
accelerate 0.17.1 pypi_0 pypi
aiohttp 3.8.4 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
anyio 3.6.2 pypi_0 pypi
argon2-cffi 21.3.0 pypi_0 pypi
argon2-cffi-bindings 21.2.0 pypi_0 pypi
arrow 1.2.3 pypi_0 pypi
asgiref 3.6.0 pypi_0 pypi
asttokens 2.2.1 pypi_0 pypi
async-timeout 4.0.2 pypi_0 pypi
attrs 22.2.0 pypi_0 pypi
backcall 0.2.0 pypi_0 pypi
beautifulsoup4 4.11.2 pypi_0 pypi
bitsandbytes 0.37.1 pypi_0 pypi
bleach 6.0.0 pypi_0 pypi
ca-certificates 2023.01.10 h06a4308_0
certifi 2022.12.7 py39h06a4308_0
cffi 1.15.1 pypi_0 pypi
charset-normalizer 3.1.0 pypi_0 pypi
cmake 3.26.0 pypi_0 pypi
comm 0.1.2 pypi_0 pypi
datasets 2.10.1 pypi_0 pypi
debugpy 1.6.6 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
defusedxml 0.7.1 pypi_0 pypi
dill 0.3.6 pypi_0 pypi
django 4.1.7 pypi_0 pypi
executing 1.2.0 pypi_0 pypi
fastjsonschema 2.16.3 pypi_0 pypi
filelock 3.10.0 pypi_0 pypi
fqdn 1.5.1 pypi_0 pypi
frozenlist 1.3.3 pypi_0 pypi
fsspec 2023.3.0 pypi_0 pypi
huggingface-hub 0.13.2 pypi_0 pypi
idna 3.4 pypi_0 pypi
importlib-metadata 6.0.0 pypi_0 pypi
ipykernel 6.21.3 pypi_0 pypi
ipython 8.11.0 pypi_0 pypi
ipython-genutils 0.2.0 pypi_0 pypi
ipywidgets 8.0.4 pypi_0 pypi
isoduration 20.11.0 pypi_0 pypi
jedi 0.18.2 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
jsonpointer 2.3 pypi_0 pypi
jsonschema 4.17.3 pypi_0 pypi
jupyter 1.0.0 pypi_0 pypi
jupyter-client 8.0.3 pypi_0 pypi
jupyter-console 6.6.3 pypi_0 pypi
jupyter-core 5.2.0 pypi_0 pypi
jupyter-events 0.6.3 pypi_0 pypi
jupyter-server 2.4.0 pypi_0 pypi
jupyter-server-terminals 0.4.4 pypi_0 pypi
jupyterlab-pygments 0.2.2 pypi_0 pypi
jupyterlab-widgets 3.0.5 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.2 h6a678d5_6
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
lit 15.0.7 pypi_0 pypi
loralib 0.1.1 pypi_0 pypi
markupsafe 2.1.2 pypi_0 pypi
matplotlib-inline 0.1.6 pypi_0 pypi
mistune 2.0.5 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
multidict 6.0.4 pypi_0 pypi
multiprocess 0.70.14 pypi_0 pypi
nbclassic 0.5.3 pypi_0 pypi
nbclient 0.7.2 pypi_0 pypi
nbconvert 7.2.10 pypi_0 pypi
nbformat 5.7.3 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nest-asyncio 1.5.6 pypi_0 pypi
networkx 3.0 pypi_0 pypi
notebook 6.5.3 pypi_0 pypi
notebook-shim 0.2.2 pypi_0 pypi
numpy 1.24.2 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-cupti-cu11 11.7.101 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
nvidia-cufft-cu11 10.9.0.58 pypi_0 pypi
nvidia-curand-cu11 10.2.10.91 pypi_0 pypi
nvidia-cusolver-cu11 11.4.0.1 pypi_0 pypi
nvidia-cusparse-cu11 11.7.4.91 pypi_0 pypi
nvidia-nccl-cu11 2.14.3 pypi_0 pypi
nvidia-nvtx-cu11 11.7.91 pypi_0 pypi
openssl 1.1.1t h7f8727e_0
packaging 23.0 pypi_0 pypi
pandas 1.5.3 pypi_0 pypi
pandocfilters 1.5.0 pypi_0 pypi
parso 0.8.3 pypi_0 pypi
peft 0.3.0.dev0 pypi_0 pypi
pexpect 4.8.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pip 23.0.1 py39h06a4308_0
platformdirs 3.1.1 pypi_0 pypi
prometheus-client 0.16.0 pypi_0 pypi
prompt-toolkit 3.0.38 pypi_0 pypi
psutil 5.9.4 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
pure-eval 0.2.2 pypi_0 pypi
pyarrow 11.0.0 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pygments 2.14.0 pypi_0 pypi
pyrsistent 0.19.3 pypi_0 pypi
python 3.9.16 h7a1cb2a_2
python-dateutil 2.8.2 pypi_0 pypi
python-json-logger 2.0.7 pypi_0 pypi
pytz 2022.7.1 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
pyzmq 25.0.1 pypi_0 pypi
qtconsole 5.4.1 pypi_0 pypi
qtpy 2.3.0 pypi_0 pypi
readline 8.2 h5eee18b_0
regex 2022.10.31 pypi_0 pypi
requests 2.28.2 pypi_0 pypi
responses 0.18.0 pypi_0 pypi
rfc3339-validator 0.1.4 pypi_0 pypi
rfc3986-validator 0.1.1 pypi_0 pypi
send2trash 1.8.0 pypi_0 pypi
sentencepiece 0.1.97 pypi_0 pypi
setuptools 65.6.3 py39h06a4308_0
six 1.16.0 pypi_0 pypi
sniffio 1.3.0 pypi_0 pypi
soupsieve 2.4 pypi_0 pypi
sqlite 3.41.1 h5eee18b_0
sqlparse 0.4.3 pypi_0 pypi
stack-data 0.6.2 pypi_0 pypi
sympy 1.11.1 pypi_0 pypi
terminado 0.17.1 pypi_0 pypi
tinycss2 1.2.1 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.13.2 pypi_0 pypi
torch 2.0.0 pypi_0 pypi
tornado 6.2 pypi_0 pypi
tqdm 4.65.0 pypi_0 pypi
traitlets 5.9.0 pypi_0 pypi
transformers 4.27.0.dev0 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
typing-extensions 4.5.0 pypi_0 pypi
tzdata 2022g h04d1e81_0
uri-template 1.2.0 pypi_0 pypi
urllib3 1.26.15 pypi_0 pypi
wcwidth 0.2.6 pypi_0 pypi
webcolors 1.12 pypi_0 pypi
webencodings 0.5.1 pypi_0 pypi
websocket-client 1.5.1 pypi_0 pypi
wheel 0.38.4 py39h06a4308_0
widgetsnbextension 4.0.5 pypi_0 pypi
xxhash 3.2.0 pypi_0 pypi
xz 5.2.10 h5eee18b_1
yarl 1.8.2 pypi_0 pypi
zipp 3.15.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
3. pip list
Package Version
------------------------ -----------
accelerate 0.17.1
aiohttp 3.8.4
aiosignal 1.3.1
anyio 3.6.2
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
asgiref 3.6.0
asttokens 2.2.1
async-timeout 4.0.2
attrs 22.2.0
backcall 0.2.0
beautifulsoup4 4.11.2
bitsandbytes 0.37.1
bleach 6.0.0
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 3.1.0
cmake 3.26.0
comm 0.1.2
datasets 2.10.1
debugpy 1.6.6
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.6
Django 4.1.7
executing 1.2.0
fastjsonschema 2.16.3
filelock 3.10.0
fqdn 1.5.1
frozenlist 1.3.3
fsspec 2023.3.0
huggingface-hub 0.13.2
idna 3.4
importlib-metadata 6.0.0
ipykernel 6.21.3
ipython 8.11.0
ipython-genutils 0.2.0
ipywidgets 8.0.4
isoduration 20.11.0
jedi 0.18.2
Jinja2 3.1.2
jsonpointer 2.3
jsonschema 4.17.3
jupyter 1.0.0
jupyter_client 8.0.3
jupyter-console 6.6.3
jupyter_core 5.2.0
jupyter-events 0.6.3
jupyter_server 2.4.0
jupyter_server_terminals 0.4.4
jupyterlab-pygments 0.2.2
jupyterlab-widgets 3.0.5
lit 15.0.7
loralib 0.1.1
MarkupSafe 2.1.2
matplotlib-inline 0.1.6
mistune 2.0.5
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.14
nbclassic 0.5.3
nbclient 0.7.2
nbconvert 7.2.10
nbformat 5.7.3
nest-asyncio 1.5.6
networkx 3.0
notebook 6.5.3
notebook_shim 0.2.2
numpy 1.24.2
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
packaging 23.0
pandas 1.5.3
pandocfilters 1.5.0
parso 0.8.3
peft 0.3.0.dev0
pexpect 4.8.0
pickleshare 0.7.5
pip 23.0.1
platformdirs 3.1.1
prometheus-client 0.16.0
prompt-toolkit 3.0.38
psutil 5.9.4
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 11.0.0
pycparser 2.21
Pygments 2.14.0
pyrsistent 0.19.3
python-dateutil 2.8.2
python-json-logger 2.0.7
pytz 2022.7.1
PyYAML 6.0
pyzmq 25.0.1
qtconsole 5.4.1
QtPy 2.3.0
regex 2022.10.31
requests 2.28.2
responses 0.18.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
Send2Trash 1.8.0
sentencepiece 0.1.97
setuptools 65.6.3
six 1.16.0
sniffio 1.3.0
soupsieve 2.4
sqlparse 0.4.3
stack-data 0.6.2
sympy 1.11.1
terminado 0.17.1
tinycss2 1.2.1
tokenizers 0.13.2
torch 2.0.0
tornado 6.2
tqdm 4.65.0
traitlets 5.9.0
transformers 4.27.0.dev0
triton 2.0.0
typing_extensions 4.5.0
uri-template 1.2.0
urllib3 1.26.15
wcwidth 0.2.6
webcolors 1.12
webencodings 0.5.1
websocket-client 1.5.1
wheel 0.38.4
widgetsnbextension 4.0.5
xxhash 3.2.0
yarl 1.8.2
zipp 3.15.0
Hello,
I'd like to execute generate.py the following way:
I don't want it to create a public link.
I don't want it to bind to 127.0.0.1; instead, I want it to bind to 0.0.0.0, so it's accessible on the local network. See the sketch below for what I mean.
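A hedged sketch of the launch() call I have in mind, assuming generate.py builds its UI with Gradio (the interface here is a placeholder):

import gradio as gr

def respond(instruction):
    return instruction  # placeholder; generate.py calls the model here

demo = gr.Interface(fn=respond, inputs="text", outputs="text")
# share=False: don't create a public gradio.live link.
# server_name="0.0.0.0": bind to all interfaces so the UI is reachable
# from other machines on the local network.
demo.launch(share=False, server_name="0.0.0.0", server_port=7860)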
Thank you.
Code:
python generate.py
Error:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/t-enshengshi/anaconda3/envs/alpaca-lora did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 112
/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so...
Downloading tokenizer.model: 100% 500k/500k [00:00<00:00, 8.89MB/s]
Downloading (…)cial_tokens_map.json: 100% 2.00/2.00 [00:00<00:00, 173B/s]
Downloading (…)okenizer_config.json: 100% 141/141 [00:00<00:00, 48.2kB/s]
Downloading (…)lve/main/config.json: 100% 427/427 [00:00<00:00, 46.2kB/s]
Downloading (…)model.bin.index.json: 100% 25.5k/25.5k [00:00<00:00, 333kB/s]
Downloading (…)l-00001-of-00033.bin: 100% 405M/405M [00:02<00:00, 169MB/s]
... (shards 00002 through 00032 download identically) ...
Downloading (…)l-00033-of-00033.bin: 100% 524M/524M [00:05<00:00, 94.0MB/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:08<00:00, 3.78it/s]
Some weights of the model checkpoint at decapoda-research/llama-7b-hf were not used when initializing LLaMAForCausalLM: ['model.embed_tokens.weight', 'model.norm.weight', 'model.layers.0.input_layernorm.weight', 'model.layers.0.self_attn.q_proj.weight', ... (full list elided: every model.layers.*.self_attn, mlp, layernorm, and rotary_emb.inv_freq tensor in the checkpoint)]
- This IS expected if you are initializing LLaMAForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LLaMAForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LLaMAForCausalLM were not initialized from the model checkpoint at decapoda-research/llama-7b-hf and are newly initialized: ['model.decoder.embed_tokens.weight', 'model.decoder.norm.weight', 'model.decoder.layers.0.self_attn.q_proj.weight', 'model.decoder.layers.0.attention_norm.weight', ... (full list elided: every model.decoder.layers.*.self_attn, feed_forward, attention_norm, and ffn_norm tensor expected by the model)]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Downloading (…)neration_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 124/124 [00:00<00:00, 24.8kB/s]
Downloading (…)/adapter_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 370/370 [00:00<00:00, 163kB/s]
Downloading adapter_model.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16.8M/16.8M [00:00<00:00, 89.7MB/s]
Instruction: Tell me about alpacas.
Traceback (most recent call last):
File "/home/t-enshengshi/workspace/alpaca-lora/generate.py", line 77, in <module>
print("Response:", evaluate(instruction))
File "/home/t-enshengshi/workspace/alpaca-lora/generate.py", line 51, in evaluate
generation_output = model.generate(
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/peft/peft_model.py", line 581, in generate
outputs = self.base_model.generate(**kwargs)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/generation/utils.py", line 1490, in generate
return self.beam_search(
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/generation/utils.py", line 2749, in beam_search
outputs = self(
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 852, in forward
outputs = self.model.decoder(
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 624, in forward
layer_outputs = decoder_layer(
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 165, in forward
query_states = self.q_proj(hidden_states).view(bsz, tgt_len, self.num_heads, self.head_dim)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/peft/tuners/lora.py", line 522, in forward
result = super().forward(x)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 390, in forward
output = torch.nn.functional.linear(A_wo_outliers, state.CB.to(A.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
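Two things in this log point at a version mismatch between the installed transformers build and the checkpoint: the checkpoint's model.layers.* tensors all go unused while the model's model.decoder.layers.* parameters are freshly (randomly) initialized, and the crash itself (state.CB being None inside bitsandbytes' MatMul8bitLt) is commonly reported when int8 layers end up without a valid quantization state, for example after being offloaded by device_map="auto". A minimal check, as a sketch (assuming the checkpoint named in the log and enough GPU memory to hold the whole model):

# Sketch: verify the checkpoint actually populates the model, and pin every
# layer to one GPU so no int8 module is left offloaded with an empty
# quantization state. Non-empty key lists reproduce the warnings above and
# mean the model was silently reinitialized.
from transformers import LlamaForCausalLM

model, info = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map={"": 0},        # everything on cuda:0, no CPU/disk offload
    output_loading_info=True,
)
print("unused checkpoint keys:", len(info["unexpected_keys"]))
print("reinitialized model keys:", len(info["missing_keys"]))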
Feature request from @Ayushk4.
Hello! I'm trying to run the generate.py example, but I get the following error ("decapoda-research/llama-7b-hf" was previously downloaded):
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /root/anaconda3/envs/alpaca-lora did not contain libcudart.so as expected! Searching further paths...
warn(msg)
/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
warn(msg)
/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: :/usr/local/cuda/lib64/ did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
ERROR: python: undefined symbol: cudaRuntimeGetVersion
CUDA SETUP: libcudart.so path is None
CUDA SETUP: Is seems that your cuda installation is not in your path. See https://github.com/TimDettmers/bitsandbytes/issues/85 for more information.
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 00
CUDA SETUP: Loading binary /root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Loading checkpoint shards: 0%| | 0/33 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/projects/alpaca-lora/generate.py", line 13, in <module>
model = LlamaForCausalLM.from_pretrained(
File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2646, in from_pretrained
) = cls._load_pretrained_model(
File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2969, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/modeling_utils.py", line 676, in _load_state_dict_into_meta_model
set_module_8bit_tensor_to_device(model, param_name, param_device, value=param)
File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/utils/bitsandbytes.py", line 70, in set_module_8bit_tensor_to_device
new_value = bnb.nn.Int8Params(new_value, requires_grad=False, has_fp16_weights=has_fp16_weights).to(device)
File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 196, in to
return self.cuda(device)
File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 160, in cuda
CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1616, in double_quant
row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1505, in get_colrow_absmax
lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/ctypes/__init__.py", line 395, in __getattr__
func = self.__getitem__(name)
File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/ctypes/__init__.py", line 400, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
I'm running in a conda environment based on Python 3.9, and I installed the requirements with pip. I also compiled the bitsandbytes package from source by running:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes/
CUDA_VERSION=112 make cuda11x
python setup.py install
Here are the nvidia-smi and nvcc --version outputs:
Fri Mar 17 13:53:10 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro RTX 4000 On | 00000000:00:10.0 Off | N/A |
| 30% 34C P8 6W / 125W | 1MiB / 7982MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
which nvcc:
/usr/bin/nvcc
which nvidia-smi:
/usr/bin/nvidia-smi
Any idea what the problem might be? Thank you in advance.
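The log above shows bitsandbytes falling back to libbitsandbytes_cpu.so because libcudart.so was never found ("Detected CUDA version 00"), so the later undefined symbol: cget_col_row_stats is just the CPU stub missing the GPU kernels. As a first step (a sketch; the conda path below is an assumption taken from your log), check what CUDA runtime the process can actually see:

import os
import torch

print(torch.version.cuda)          # CUDA version PyTorch was built against
print(torch.cuda.is_available())   # False here would also explain a CPU fallback
print(os.environ.get("LD_LIBRARY_PATH"))

# bitsandbytes searches LD_LIBRARY_PATH for libcudart.so, so it must point at
# a directory that actually contains it before Python starts, e.g.:
#   export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/anaconda3/envs/alpaca-lora/lib
import bitsandbytes  # re-prints the CUDA SETUP report seen above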
When using the CPU as the device, gradio returns the following error when running generate.py, after trying various prompts:
Something went wrong
Unexpected token '<', "<html> <h"... is not valid JSON
Interestingly, I don't see any errors in the console log; it seems to be a formatting issue with the gradio UI.
The prompts I tried:
Write a Python program that prints the first 10 Fibonacci numbers.
write python code that renames all files starting with "mleml" in the current folder to "kek".
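The "Unexpected token '<'" message means the browser received an HTML error page where the gradio frontend expected JSON; on CPU, where a single generation can take minutes, the most likely culprit is the HTTP request timing out (an assumption, since the console shows no error). Enabling gradio's queue keeps long-running requests alive; a sketch, with evaluate standing in for the function generate.py wires into the interface:

import gradio as gr

gr.Interface(
    fn=evaluate,
    inputs=[gr.Textbox(label="Instruction"), gr.Textbox(label="Input")],
    outputs=gr.Textbox(label="Response"),
).queue().launch(share=True)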
Something like this:
Below is an instruction that describes a task, possibly paired with an input that provides further context. Write a response that appropriately completes the request.
<instruction>Add two numbers
<input>2, 3
<response>
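For comparison, the repo trains and generates with the Alpaca-style scaffold rather than <instruction>-style tags, so inference prompts should follow the same layout. A sketch of the builder along the lines of generate_prompt in generate.py (the exact wording is worth checking against your checkout):

def generate_prompt(instruction: str, input: str = None) -> str:
    # Prompts must match the template the LoRA was trained on.
    if input:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

print(generate_prompt("Add two numbers", "2, 3"))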
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/trainer.py", line 1628, in train
return inner_training_loop(
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/trainer.py", line 1895, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/trainer.py", line 2637, in training_step
loss = self.compute_loss(model, inputs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/trainer.py", line 2669, in compute_loss
outputs = model(**inputs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
output.reraise()
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/_utils.py", line 543, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/peft_model.py", line 529, in forward
return self.base_model(
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 852, in forward
outputs = self.model.decoder(
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 616, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 612, in custom_forward
return module(*inputs, output_attentions, None)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 167, in forward
value_states = self.v_proj(hidden_states).view(bsz, tgt_len, self.num_heads, self.head_dim)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/tuners/lora.py", line 522, in forward
result = super().forward(x)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 397, in forward
output += torch.matmul(subA, state.subB)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x7 and 8x4096)
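The mismatched shapes (1024x7 vs 8x4096) appear once the Trainer wraps the 8-bit model in torch.nn.DataParallel on a multi-GPU machine, which splits activations in ways the int8 LoRA layers cannot handle. The simplest workaround is to restrict the run to one GPU (CUDA_VISIBLE_DEVICES=0); alternatively, later versions of finetune.py flag the model as model-parallel so the Trainer leaves it alone (a sketch, worth verifying against your checkout):

import torch

# Keeps transformers.Trainer from wrapping the model in DataParallel when
# more than one GPU is visible.
if torch.cuda.device_count() > 1:
    model.is_parallelizable = True
    model.model_parallel = True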
Would it be possible to document the steps needed to correctly merge LoRA weights? This would allow the merged model to run on llama.cpp on lower-end hardware.
I've tried figuring out how to do this myself, but the magic is a little too deep and my understanding a little too shallow :)
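Until that documentation lands, one route is peft's own merge API (available in recent peft releases; older checkouts do the merge by hand, which is what export_hf_checkpoint.py automates). A sketch, assuming the tloen/alpaca-lora-7b adapter from this repo:

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")
merged = model.merge_and_unload()          # folds the LoRA deltas into the base weights
merged.save_pretrained("./alpaca-merged")  # plain HF checkpoint, ready for conversion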
I am not sure if the issue is here in the export code or in GPTQ-for-LLaMa.
What I did:
Ran export_hf_checkpoint.py
Executed the quantization from GPTQ-for-LLaMa (https://github.com/qwopqwop200/GPTQ-for-LLaMa)
At the beginning of the quantization I got the following warning:
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:11<00:00, 3.38it/s]
Some weights of the model checkpoint at ../alpaca-lora/hf_ckpt/ were not used when initializing LlamaForCausalLM: ['base_model.model.lm_head.weight']
Is that normal?
When I try to use the 4-bit quantized model, I only get random output:
(.venv) [danielw@pc GPTQ-for-LLaMa]$ CUDA_VISIBLE_DEVICES=0 python llama_inference.py ../alpaca-lora/hf_ckpt/ --wbits 4 --load alpace-7b-4bit-non-cleaned.pt --text "Hello"
Loading model ...
Done.
Hello Kub Akademutionsiy?')}{\rightarrow größ office \\ größ Roberts lé?'ulseulseame collect authorizationSide色սöd São affected throwingcur let authorizationמ bug affected dw collectmineною Chairulse instant Unionistrictscherरabsरclusowclar
Not sure what's going on :( It seems that the export_hf_checkpoint script exports a model that is not compatible with GPTQ-for-LLaMa.
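The leftover key base_model.model.lm_head.weight in the warning above suggests the export kept the peft wrapper prefix on at least one tensor, so GPTQ-for-LLaMa drops it and lm_head stays randomly initialized, which would produce exactly this kind of noise. A hedged check-and-repair sketch (the single-file path is an assumption; a sharded checkpoint needs the same renaming applied per shard):

import torch

sd = torch.load("../alpaca-lora/hf_ckpt/pytorch_model.bin", map_location="cpu")
prefix = "base_model.model."
for k in list(sd):
    if k.startswith(prefix):
        print("renaming stray key:", k)
        sd[k[len(prefix):]] = sd.pop(k)
torch.save(sd, "../alpaca-lora/hf_ckpt/pytorch_model.bin")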
Hi, I would love to test the LoRA model. Has anyone tried to do it on Windows?