
stanford_alpaca's Introduction


Stanford Alpaca: An Instruction-following LLaMA Model


This is the repo for the Stanford Alpaca project, which aims to build and share an instruction-following LLaMA model. The repo contains:

  • the 52K data used for fine-tuning the model;
  • the code for generating the data;
  • the code for fine-tuning the model;
  • the code for recovering Alpaca-7B weights from the released weight diff.

Note: We thank the community for its feedback on Stanford Alpaca and for supporting our research. Our live demo is suspended until further notice.

Usage and License Notices: Alpaca is intended and licensed for research use only. The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes. The weight diff is also CC BY NC 4.0 (allowing only non-commercial use).

Overview

The current Alpaca model is fine-tuned from a 7B LLaMA model [1] on 52K instruction-following data generated by the techniques in the Self-Instruct [2] paper, with some modifications that we discuss in the next section. In a preliminary human evaluation, we found that the Alpaca 7B model behaves similarly to the text-davinci-003 model on the Self-Instruct instruction-following evaluation suite [2].

Alpaca is still under development, and there are many limitations that have to be addressed. Importantly, we have not yet fine-tuned the Alpaca model to be safe and harmless. We thus encourage users to be cautious when interacting with Alpaca, and to report any concerning behavior to help improve the safety and ethical considerations of the model.

Our initial release contains the data generation procedure, dataset, and training recipe. We intend to release the model weights if we are given permission to do so by the creators of LLaMA. For now, we have chosen to host a live demo to help readers better understand the capabilities and limits of Alpaca, as well as to help us better evaluate Alpaca's performance with a broader audience.

Please read our release blog post for more details about the model, our discussion of the potential harm and limitations of Alpaca models, and our thought process for releasing a reproducible model.

[1]: LLaMA: Open and Efficient Foundation Language Models. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. https://arxiv.org/abs/2302.13971v1

[2]: Self-Instruct: Aligning Language Model with Self Generated Instructions. Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. https://arxiv.org/abs/2212.10560

Data Release

alpaca_data.json contains the 52K instruction-following examples we used for fine-tuning the Alpaca model. This JSON file is a list of dictionaries; each dictionary contains the following fields:

  • instruction: str, describes the task the model should perform. Each of the 52K instructions is unique.
  • input: str, optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article. Around 40% of the examples have an input.
  • output: str, the answer to the instruction as generated by text-davinci-003.
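
For reference, here is a minimal sketch (ours, not part of the repo) for loading and inspecting the dataset with Python's standard json module:

import json

# Load the 52K instruction-following examples released with this repo.
with open("alpaca_data.json") as f:
    examples = json.load(f)

print(len(examples))                    # ~52K examples
print(sorted(examples[0].keys()))       # ['input', 'instruction', 'output']
# Fraction of examples with a non-empty input field (around 40%, per the note above).
print(sum(1 for ex in examples if ex["input"]) / len(examples))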

We used the following prompts for fine-tuning the Alpaca model:

  • for examples with a non-empty input field:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
  • for examples with an empty input field:
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:

During inference (e.g., for the web demo), we use the user instruction with an empty input field (second option).
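
The two templates above can be expressed as a small helper; the sketch below is illustrative (the function name build_prompt is ours, not part of this repo):

def build_prompt(instruction: str, input: str = "") -> str:
    # Examples with a non-empty input field use the longer template.
    if input:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input}\n\n"
            "### Response:"
        )
    # Examples with an empty input field (and inference-time prompts) use the shorter template.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:"
    )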

Data Generation Process

Running the code
  1. Set the environment variable OPENAI_API_KEY to your OpenAI API key.
  2. Install the dependencies with pip install -r requirements.txt.
  3. Run python -m generate_instruction generate_instruction_following_data to generate the data.
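
Put together, the three steps above look like this in a shell (substitute your own API key):

export OPENAI_API_KEY=<your_openai_api_key>
pip install -r requirements.txt
python -m generate_instruction generate_instruction_following_data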

We built on the data generation pipeline from self-instruct and made the following modifications:

  • We used text-davinci-003 to generate the instruction data instead of davinci.
  • We wrote a new prompt (prompt.txt) that explicitly gives the requirements of instruction generation to text-davinci-003. Note: there is a slight error in the prompt we used, and future users should incorporate the edit in #24.
  • We adopted much more aggressive batch decoding, i.e., generating 20 instructions at once, which significantly reduced the cost of data generation.
  • We simplified the data generation pipeline by discarding the difference between classification and non-classification instructions.
  • We only generated a single instance for each instruction, instead of 2 to 3 instances as in [1].

This produced an instruction-following dataset with 52K examples at a much lower cost (less than $500). In a preliminary study, we also found our 52K generated data to be much more diverse than the data released by self-instruct. We plot the figure below (in the style of Figure 2 in the self-instruct paper) to demonstrate the diversity of our data. The inner circle of the plot represents the root verb of the instructions, and the outer circle represents the direct objects.
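
As a rough illustration (not the repo's parse analysis code), one can extract the root verb and its direct object from each instruction with spaCy as a stand-in parser:

import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def verb_object(instruction: str):
    # Return (root verb lemma, direct object lemma) for a single instruction.
    doc = nlp(instruction)
    root = next((tok for tok in doc if tok.dep_ == "ROOT"), None)
    if root is None:
        return None, None
    dobj = next((tok for tok in root.children if tok.dep_ == "dobj"), None)
    return root.lemma_, dobj.lemma_ if dobj is not None else None

print(verb_object("Summarize the following article"))  # e.g. ('summarize', 'article')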

Fine-tuning

We fine-tune our models using standard Hugging Face training code. We fine-tune LLaMA-7B and LLaMA-13B with the following hyperparameters:

Hyperparameter LLaMA-7B LLaMA-13B
Batch size 128 128
Learning rate 2e-5 1e-5
Epochs 3 5
Max length 512 512
Weight decay 0 0

To reproduce our fine-tuning runs for LLaMA, first install the requirements

pip install -r requirements.txt

Below is a command that fine-tunes LLaMA-7B with our dataset on a machine with 4 A100 80G GPUs in FSDP full_shard mode. We were able to reproduce a model of similar quality as the one we hosted in our demo with the following command using Python 3.10. Replace <your_random_port> with a port of your own, <your_path_to_hf_converted_llama_ckpt_and_tokenizer> with the path to your converted checkpoint and tokenizer (following instructions in the PR), and <your_output_dir> with where you want to store your outputs.

torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True

The same script also works for OPT fine-tuning. Here's an example for fine-tuning OPT-6.7B

torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path "facebook/opt-6.7b" \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'OPTDecoderLayer' \
    --tf32 True

Note that the given training script is meant to be simple and easy to use, and is not particularly optimized. To run on more GPUs, you may prefer to turn down gradient_accumulation_steps to keep a global batch size of 128 (see the calculation below). Global batch size has not been tested for optimality.
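
For reference, the effective global batch size of the command above works out as:

global batch size = nproc_per_node x per_device_train_batch_size x gradient_accumulation_steps
  4 GPUs: 4 x 4 x 8 = 128   (the command above)
  8 GPUs: 8 x 4 x 4 = 128   (e.g., pass --gradient_accumulation_steps 4)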

Addressing OOM

Naively, fine-tuning a 7B model requires about 7 (billion parameters) x 4 (bytes per fp32 value) x 4 (weights, gradients, and the two Adam optimizer states) = 112 GB of VRAM. The commands given above enable parameter sharding, so no redundant model copy is stored on any GPU. If you'd like to further reduce the memory footprint, here are some options:

  • Turn on CPU offload for FSDP with --fsdp "full_shard auto_wrap offload". This saves VRAM at the cost of longer runtime.
  • In our experience, DeepSpeed stage-3 (with offload) can at times be more memory efficient than FSDP with offload. Here's an example to use DeepSpeed stage-3 with 4 GPUs with both parameter and optimizer offload:
    pip install deepspeed
    torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
        --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
        --data_path ./alpaca_data.json \
        --bf16 True \
        --output_dir <your_output_dir> \
        --num_train_epochs 3 \
        --per_device_train_batch_size 4 \
        --per_device_eval_batch_size 4 \
        --gradient_accumulation_steps 8 \
        --evaluation_strategy "no" \
        --save_strategy "steps" \
        --save_steps 2000 \
        --save_total_limit 1 \
        --learning_rate 2e-5 \
        --weight_decay 0. \
        --warmup_ratio 0.03 \
        --deepspeed "./configs/default_offload_opt_param.json" \
        --tf32 True
    • The DeepSpeed library also provides some helpful functions to estimate memory usage.
  • LoRA fine-tunes low-rank slices of the query, key, and value embedding heads. This can reduce the total memory footprint from 112 GB to about 7 x 4 = 28 GB. We may release our re-implementation of this in the future, but for now the peft codebase can be a useful resource; a minimal sketch follows this list.
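
As a starting point, here is a minimal LoRA sketch built on the peft codebase; it is not our re-implementation, and the model path and LoRA hyperparameters below are illustrative assumptions:

import transformers
from peft import LoraConfig, TaskType, get_peft_model

# Load the converted LLaMA checkpoint (placeholder path).
model = transformers.AutoModelForCausalLM.from_pretrained(
    "<your_path_to_hf_converted_llama_ckpt_and_tokenizer>"
)

# Attach low-rank adapters to the query/key/value projections; r and alpha are assumptions.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require gradients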

Recovering Alpaca Weights

The weight diff between Alpaca-7B and LLaMA-7B is located here. To recover the original Alpaca-7B weights, follow these steps:

1. Convert Meta's released weights into Hugging Face format by following this guide:
    https://huggingface.co/docs/transformers/main/model_doc/llama
2. Make sure you cloned the released weight diff into your local machine. The weight diff is located at:
    https://huggingface.co/tatsu-lab/alpaca-7b/tree/main
3. Run the recovery script with the correct paths, e.g.,
    python weight_diff.py recover --path_raw <path_to_step_1_dir> --path_diff <path_to_step_2_dir> --path_tuned <path_to_store_recovered_weights>

Once step 3 completes, you should have a directory with the recovered weights, from which you can load the model as follows:

import transformers
alpaca_model = transformers.AutoModelForCausalLM.from_pretrained("<path_to_store_recovered_weights>")
alpaca_tokenizer = transformers.AutoTokenizer.from_pretrained("<path_to_store_recovered_weights>")
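
For a quick sanity check, here is a minimal inference sketch using the recovered weights and the empty-input prompt template from the Data Release section; the example instruction and generation settings are illustrative, not the settings used in our demo:

# Build a prompt with the empty-input template and generate a response.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nGive three tips for staying healthy.\n\n### Response:"
)
inputs = alpaca_tokenizer(prompt, return_tensors="pt")
output_ids = alpaca_model.generate(**inputs, max_new_tokens=256)
print(alpaca_tokenizer.decode(output_ids[0], skip_special_tokens=True))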

Authors

All grad students below contributed equally, and the order is determined by random draw: Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, and Xuechen Li.

All advised by Tatsunori B. Hashimoto. Yann is also advised by Percy Liang and Xuechen is also advised by Carlos Guestrin.

Citation

Please cite this repo if you use its data or code.

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}

Naturally, you should also cite the original LLaMA paper [1] and the Self-Instruct paper [2].

Acknowledgements

We thank Yizhong Wang for his help in explaining the data generation pipeline in Self-Instruct and providing the code for the parse analysis plot. We thank Yifan Mai for helpful support, and members of the Stanford NLP Group as well as the Center for Research on Foundation Models (CRFM) for their helpful feedback.

stanford_alpaca's People

Contributors

eltociear, lxuechen, rtaori, tiiiger, yanndubs


stanford_alpaca's Issues

Do you shift the output label?

From your training code, the output labels and input labels are the same. Where do you shift the output labels? Will this happen automatically inside the Trainer?

Public release of model weights

Congratulations on the fine-tune! We have observed some fantastic performance through the provided web interface.

AFAIK the original LLaMA model was released under the GNU GPL, so you should be able to distribute derivative work respecting this original license, correct? (Even if the original model weights have not officially been distributed to the public yet.)

Will you provide some sort of wait-list to notify us when the model weights are made available?

Interested in as much information as you may share on this. Again, congratulations, and thank you for your impressive work!

https://github.com/facebookresearch/llama/blob/main/LICENSE

Reduce reproduction cost 96%, from $600 to $24, by releasing the instruct dataset only

The blog post says $500 was spent producing the dataset.
The blog post also says $100 was spent on 3xA100 80GB for 3 hours.
The market rate for 4xA100 is around $8 per hour. (See vast.ai for example)

If the dataset is provided for fine-tuning, then Alpaca could be reproduced for just about $24, and we would not have to wait for Facebook's response regarding sharing of the pre-trained model.

How to inference after finetuning ?

Thanks for sharing the training code. I've finished a 3-epoch fine-tuning.
However, I can't find the inference code.
Would you please give some advice on it, or share the inference code?
Thanks again!

When can we support airgap installation?

Hi guys,
This one is awesome. When do you plan to support airgap installation? In other words, can the end user run it on their laptop or on any VM in a public cloud?

Fine-Tuning very slow (6h->24h??)

Hello, first of all thank you for releasing the training code for Alpaca, we really appreciate it.

I am running the fine-tuning script on 4x A100-SXM4-80GB and currently getting a 24-hour ETA, which doesn't really line up with the reported "3 hours on 8 80GB A100s" mentioned at https://crfm.stanford.edu/2023/03/13/alpaca.html. Shouldn't it be around 6 hours, or even 12 hours considering that the script "is not particularly optimized"?

Is anyone else encountering this issue? And if this is expected, then what were the methods you used to optimize the fine-tuning process?

Running on CUDA 12.1, Torch 1.13, and the transformers fork of llama at the commit you mentioned.

Thanks.

Not quite understand the importance of this repo.

Hi, devs at Stanford. Today I tried out your project and ran the command to generate the data. After a while, it output a JSON file, regen.json. So I'm a little confused; forgive my ignorance, but I really don't know how to make something cool with this regen.json file. You know what I mean: I got a file, but what can I do with it? I'm guessing people might be able to create something similar to ChatGPT but weaker. Please enlighten me, thanks.

Generation problem after / before instruction fine-tuning

Environment: 6xA6000 48GB with Ubuntu 22.04, Pytorch 1.13.0

I ran into a generation problem after following your instructions to convert the LLaMA-7B weights using your attached script.

I simply used the following script to directly test generation after loading the converted LLaMA-7B model:

tokenizer.batch_decode(model.generate(**tokenizer('I want to ', return_tensors="pt")))

The output of above code is:

'I want to acoérницschutzirectorioieckťDEX threshold släktetolasĭüttpiel'

The problem happens both before and after following your README for instruction fine-tuning. (Note that the loss is decreasing over time during the fine-tuning stage, which seems OK.)

I have no problem running generation using the original code from LLaMA. May I know your generation script so that I can test what caused the problem? Thanks.

Training recipe??

The blog says the training recipe is also released in the code, but I cannot find it. Can you update the repo with the code used for training the model, along with the required dependencies/guide, etc., to help us do the same, maybe with bigger models?
Thanks for this awesome repo.

infer cost

Hi,

Can a consumer-level GPU run inference with the Alpaca-7B model?

why 52K?

Hello, thank you for open-sourcing your training details! I just tried your demo and found the responses surprisingly fluent.

I'm wondering if your decision to train on a 52K instruction dataset was influenced by some criteria. Is there a floor below which you found responses to be qualitatively inferior, or did going beyond 52K not yield better results?

No evaluation dataset was given for the trainer

Hi there, I just finished the fine-tuning process as introduced in train.py. However, I encountered one problem with trainer.evaluate().

{'loss': 0.3974, 'learning_rate': 3.5380966993958655e-11, 'epoch': 3.0}
{'loss': 0.4492, 'learning_rate': 0.0, 'epoch': 3.0}
{'train_runtime': 17758.138, 'train_samples_per_second': 8.785, 'train_steps_per_second': 0.069, 'train_loss': 0.7304400721402787, 'epoch': 3.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1218/1218 [4:55:48<00:00, 14.57s/it]
Traceback (most recent call last):
  File "/home/codes/finetune_llama/alpaca/train.py", line 233, in <module>
    train()
  File "/home/codes/finetune_llama/alpaca/train.py", line 227, in train
    trainer.evaluate()
  File "/home/anaconda3/envs/hawq/lib/python3.9/site-packages/transformers/trainer.py", line 2920, in evaluate
    eval_dataloader = self.get_eval_dataloader(eval_dataset)
  File "/home/anaconda3/envs/hawq/lib/python3.9/site-packages/transformers/trainer.py", line 934, in get_eval_dataloader
    raise ValueError("Trainer: evaluation requires an eval_dataset.")
ValueError: Trainer: evaluation requires an eval_dataset.

Should I give an eval_dataset here?

Support for gpt-3.5-turbo

gpt-3.5-turbo is cheaper and faster than davinci. I'm not 100% sure whether it will actually work better for Alpaca but figure it may be worth a trial. Any interest in taking a PR?

Question about training precision

In the provided training command:

torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LLaMADecoderLayer' \
    --tf32 True

Why is --bf16 used, if the model checkpoints were originally fp16? Is it simply overridden by the --tf32 flag later?

Bigger LLaMA models

Dear Stanford researchers, professors, and students (all geniuses), thank you for your amazing work!
Would the tuning code you released in this repo (and the dataset) be fit for fine-tuning larger LLaMA models like 13B/30B/65B?

How would the computational effort scale with such models?

No checkpoint and no eval_dataset

It seems there is no eval_dataset, and thus no checkpoint is stored?

(For privacy, I hid the absolute file path and replaced it with <path>.)

Traceback (most recent call last):
  File "<path>/stanford_alpaca/train.py", line 232, in <module>
    train()
  File "<path>/stanford_alpaca/train.py", line 226, in train
    trainer.evaluate()
  File "<path>/stanford_alpaca/transformers-68d640f7c368bcaaaecfc678f11908ebbd3d6176/src/transformers/trainer.py", line 2920, in evaluate
    eval_dataloader = self.get_eval_dataloader(eval_dataset)
  File "<path>/stanford_alpaca/transformers-68d640f7c368bcaaaecfc678f11908ebbd3d6176/src/transformers/trainer.py", line 934, in get_eval_dataloader
    raise ValueError("Trainer: evaluation requires an eval_dataset.")
ValueError: Trainer: evaluation requires an eval_dataset.

Example of Instruction-Tuning Training

Hello, thank you for open-sourcing this work. We are now interested in generating our own instructions to fine-tune the LLaMA model based on your documentation and approach. Could you please advise on any resources or references we can use? Also, is this code available on Hugging Face?

'type' object is not subscriptable

The exception can be fixed by replacing 'dict' with 'Dict':

from typing import Optional, Sequence, Union
...
def openai_completion(
    prompts: Union[str, Sequence[str], Sequence[dict[str, str]], dict[str, str]],

-->

from typing import Optional, Sequence, Union, Dict
...
def openai_completion(
    prompts: Union[str, Sequence[str], Sequence[Dict[str, str]], Dict[str, str]],

Reduce the length of your prompt.

prompt_batches: 0%| | 0/1 [00:00<?, ?it/s]WARNING:root:OpenAIError: This model's maximum context length is 4097 tokens, however you requested 4162 tokens (1090 in your prompt; 3072 for the completion). Please reduce your prompt; or completion length..

Inquiry: Inference Parameters used for Gradio Demo

As an independent researcher I'm interested in knowing what generation parameters are used in the Gradio web demo, such as temperature and repetition penalty. If you have used even more advanced samplers like Typical Sampling or Tail Free Sampling, I'd be interested to know that as well. From my brief testing it appears that some parameter or setting is hampering creativity; perhaps that is intentional for the demo?
Thanks in advance!

Questions on fine-tuning process

I have three questions regarding the fine-tuning process.

  1. How does the max length hyperparameter work? Does each training sample concatenate multiple examples until it reaches the max length, or does each training sample include only a single example padded to the max length?
  2. Is the cross-entropy loss applied to all tokens including the input tokens (instruction + input), just the output tokens (response), or a weighted sum?
  3. How is a user prompt processed at test time? Is it treated as an example with an empty input field?

Thank you in advance.

OOM issue

Can this fine-tuning script fit on an A10, which only has 24 GB of GPU memory? I am trying to fine-tune the model on 4 A10 GPUs using a batch size of 1, but I still get an OOM error.

Training code detail.

Thanks for sharing this project. I have been trying to train the larger model for an offline-first, free education assistant for poor students preparing for competitive exams. Sharing the training code, even if only in a PR, would be really helpful for me to fine-tune an education assistant.

Resuming from checkpoint

My first run of the trainer could not save the model because the evaluate() call fails. I have removed that method call and now would like to resume from the last checkpoint. However, I cannot seem to get that working. Is there some disparity between the model architecture and the checkpoint architecture? The change I made to accommodate checkpoint resumption and the error I get are shown below.

Change for checkpoint resumption

data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
transformers.logging.set_verbosity_info()
trainer.train()
# trainer.train("output/checkpoint-18000")
# trainer.evaluate()
trainer.save_state()
safe_save_model_for_hf_trainer(trainer=trainer, output_dir=training_args.output_dir)

Error stacktrace

Loading model from output/checkpoint-18000/.
Traceback (most recent call last):
  File "/home/ubuntu/alpaca/stanford_alpaca/train.py", line 246, in <module>
    train()
  File "/home/ubuntu/alpaca/stanford_alpaca/train.py", line 239, in train
    trainer.train("output/checkpoint-18000/")
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1617, in train
    self._load_from_checkpoint(resume_from_checkpoint)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2120, in _load_from_checkpoint
    load_result = load_sharded_checkpoint(model, resume_from_checkpoint, strict=is_sagemaker_mp_enabled())
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 385, in load_sharded_checkpoint
    state_dict = torch.load(os.path.join(folder, shard_file), map_location="cpu")
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_utils.py", line 169, in _rebuild_tensor_v2
    tensor = _rebuild_tensor(storage, storage_offset, size, stride)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_utils.py", line 148, in _rebuild_tensor
    return t.set_(storage._untyped_storage, storage_offset, size, stride)
RuntimeError: Trying to resize storage that is not resizable
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 122406 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 122407 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 122409 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 122408) of binary: /usr/local/bin/python3.10
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

CUDA out of memory

Hi

Great work! In the README, you mention that 4 A100 80G GPUs can train this model, but when I try 8 40G A100s, it hits a CUDA OOM error.

Finetuning using standard hugging face training code

In the README I saw that the model is fine-tuned using the standard Hugging Face setup. I tried it but am getting this error. Could someone help with loading the LLaMA weights using HF?

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Bitsy/llama-7b-hfcompatible-clean")

Error :

KeyError                                  Traceback (most recent call last)
in <module>
      1 from transformers import AutoModelForCausalLM
      2
----> 3 model = AutoModelForCausalLM.from_pretrained("Bitsy/llama-7b-hfcompatible-clean")

2 frames
/usr/local/lib/python3.9/dist-packages/transformers/models/auto/configuration_auto.py in __getitem__(self, key)
    577             return self._extra_content[key]
    578         if key not in self._mapping:
--> 579             raise KeyError(key)
    580         value = self._mapping[key]
    581         module_name = model_type_to_module_name(key)

KeyError: 'llama'

inference kwargs

Thanks for the great work. I reproduced the training, but at inference time it tends to generate shorter text. I am using:

generated = model.generate(batch["input_ids"], max_length=512)

Does the interface on the demo web page adjust other kwargs?
Thanks

Confusion about input ids

Hi, thanks for sharing such great work.
I've read your fine-tuning code and I'm a little confused about the inputs to the model.
From the code, the input to the model should be, for example: ### Instruction:{instruction}### Input:{input}### Response:{response}. So input_ids = tokenizer(example), label_ids = tokenizer(example), and label_ids[:source_len] = IGNORE_INDEX.
I would like to ask: why do the input ids contain the response token ids? Won't the target leak into the input?

I am looking forward to your reply. Thank you very much.
