open-instruct

Training Open Instruction-Following Language Models

This repo serves as an open effort on instruction-tuning popular pretrained language models on publicly available datasets. We release this repo and will keep updating it with:

  1. Code for finetuning language models with the latest techniques and instruction datasets in a unified format.
  2. Code for running standard evaluation on a range of benchmarks targeting different capabilities of these language models.
  3. Checkpoints or other useful artifacts that we build in our exploration.

Please see our first paper How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources for more thoughts behind this project and our initial findings.

Tülu (a hybrid camel) represents a suite of LLaMa models that we built by fully-finetuning them on a strong mix of datasets.

News

  • [2023-08-18] Added support for ToxiGen/TruthfulQA evaluation. See scripts/eval/ for examples of running them.
  • [2023-08-08] Supported several new instruction datasets, including LIMA, WizardLM, and Open-Orca. See the preparation script for details. Performance has not been evaluated yet.
  • [2023-08-06] Supported LLaMa 2 finetuning and FlashAttention-2 by bumping the version of transformers and many other dependencies.
  • [2023-06-29] Added licensing info for our released models.
  • [2023-06-09] Released Tülu (a suite of LLaMa models fully-finetuned on a strong mix of datasets) and many other checkpoints on HuggingFace [Links].
  • [2023-06-09] Initial release of the codebase containing the training and evaluation code for our arXiv paper.

Setup

To run training, evaluation, or inference for our finetuned models, you need to install the required packages by running the following command (after installing PyTorch):

pip install -r requirements.txt

If you just want the dependencies for the weight diff script, use:

pip install -r weight-diff-requirements.txt

Training

Dataset preparation

We include a collection of representative instruction datasets in our exploration and are adding new ones to our list. We unify them into the same chat format. To download and prepare these datasets, simply run the following command:

./scripts/prepare_train_data.sh

Please check these datasets for licenses and restrictions around their use!
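As a rough sketch of what the unified chat format looks like, each example is a list of role-tagged messages. The field names below are an assumption for illustration; check the preparation script for the exact schema used by the repo:

```python
import json

# Hypothetical record in the unified chat format; exact field names
# may differ from the preparation script's output.
example = {
    "dataset": "dolly",
    "id": "dolly_0",
    "messages": [
        {"role": "user", "content": "What is instruction tuning?"},
        {"role": "assistant",
         "content": "Finetuning a pretrained model on instruction-response pairs."},
    ],
}

# Such datasets are commonly stored one JSON object per line (JSONL).
line = json.dumps(example)
record = json.loads(line)
```
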

Model preparation

Generally, most huggingface-compatible causal language models should work fine with our codebase, potentially with some adjustments for different tokenizers, etc. Some models may require additional requests to download. E.g., for LLaMa 1 and 2, please consult the Hugging Face documentation for requesting access and converting them to a huggingface-compatible format.

Finetuning

You can use the following command to run instruction tuning (finetuning a pretrained model to follow instructions):

./scripts/finetune_with_accelerate.sh

Make sure to adjust model_name_or_path, tokenizer_name, train_file, and output_dir to match your model, data, and setting. By default, this uses DeepSpeed with accelerate.
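Under the hood, instruction tuning is ordinary causal-LM training; in many setups (an assumption here, not confirmed by this README) the loss is computed only on response tokens, with prompt positions masked via a sentinel label such as -100, following the Hugging Face convention. A pure-Python sketch of that masked cross-entropy:

```python
import math

def masked_cross_entropy(token_logprobs, labels, ignore_index=-100):
    """Average negative log-likelihood over non-masked positions.

    token_logprobs: log-probability the model assigned to each target token.
    labels: target token ids, with prompt positions set to ignore_index so
            they contribute no loss (assumed convention, as in Hugging Face).
    """
    losses = [-lp for lp, y in zip(token_logprobs, labels) if y != ignore_index]
    return sum(losses) / len(losses)

# Toy sequence: two prompt tokens (masked out) and two response tokens.
logprobs = [math.log(0.9), math.log(0.8), math.log(0.5), math.log(0.25)]
labels = [-100, -100, 17, 42]
loss = masked_cross_entropy(logprobs, labels)
# Only the last two positions count: (-ln 0.5 - ln 0.25) / 2
```
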

Released Checkpoints

We provide a number of model checkpoints that we trained. You can find them on Hugging Face here. Here are some quick links to the checkpoints that are finetuned from LLaMa 1:

| Datasets ↓ / Model Sizes → | 7B   | 13B  | 30B  | 65B  |
|----------------------------|------|------|------|------|
| SuperNI                    | link | link |      |      |
| CoT                        | link | link |      |      |
| Flan V2                    | link | link |      |      |
| Dolly                      | link | link |      |      |
| Open Assistant 1           | link | link |      |      |
| ShareGPT                   | link | link | link | link |
| Self-instruct (original)   | link | link |      |      |
| Unnatural Instructions     | link | link |      |      |
| Alpaca                     | link | link |      |      |
| Code-Alpaca                | link | link |      |      |
| GPT4-Alpaca                | link | link |      |      |
| Baize                      | link | link |      |      |
| Human-Mix                  | link | link | link | link |
| Tulu                       | link | link | link | link |

We also trained Pythia and OPT models on the Tulu mixture (aka the Human+GPT mixture); they are available here.

Weight diff script

Some of the checkpoints are released as weight diffs against the base model (mostly for LLaMa 1). We use a slightly modified form of the Alpaca weight diff script, which works the same way.

To merge a model:

  1. Download the relevant LLaMa model and convert it to Hugging Face format (see above).
  2. Download our repository and install the right dependencies (see above).
  3. Download the model diff you want.
  4. Run the command below:
python scripts/weight_diff.py recover --path_raw ${hf_llama_path} --path_tuned ${output_path} --path_diff ${diff_location}
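Conceptually, the recover step just adds the released diff back onto the base weights, parameter by parameter. A simplified sketch using plain lists in place of model tensors (the real script operates on checkpoint tensors; names here are illustrative):

```python
def recover(raw_weights, diff_weights):
    """Reconstruct tuned weights as raw + diff for each named parameter."""
    assert raw_weights.keys() == diff_weights.keys(), "parameter names must match"
    return {
        name: [r + d for r, d in zip(raw_weights[name], diff_weights[name])]
        for name in raw_weights
    }

# Toy "checkpoints": one parameter with two scalar weights.
base = {"layer0.weight": [0.1, -0.2]}
diff = {"layer0.weight": [0.05, 0.3]}
tuned = recover(base, diff)
```
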

Evaluation

Benchmark-based eval

We provide scripts for running evaluation of Huggingface/OpenAI models on a list of standard benchmarks targeting the core capabilities of large language models. These benchmarks include MMLU, ToxiGen, and TruthfulQA, among others; see scripts/eval/ for the full set.

We are working on including more promising benchmarks into this list. Please stay tuned!

You can use the following script to download all the evaluation data:

./scripts/prepare_eval_data.sh

Evaluation scripts for different datasets are under ./scripts/eval. For example, you can use the following command to run the MMLU evaluation script:

./scripts/eval/mmlu.sh
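Most benchmark scripts ultimately reduce to comparing model predictions against references; for a multiple-choice benchmark like MMLU, the headline metric is plain accuracy. A toy sketch (not the repo's actual implementation):

```python
def accuracy(predictions, references):
    """Fraction of examples where the predicted choice matches the reference."""
    assert len(predictions) == len(references), "one prediction per example"
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Hypothetical multiple-choice outputs for four questions.
preds = ["A", "C", "B", "D"]
refs = ["A", "B", "B", "D"]
acc = accuracy(preds, refs)  # 3 of 4 correct
```
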

Model-based eval

We support using GPT-4 to evaluate the quality of a model's responses, following the GPT-4 evaluation protocol proposed in AlpacaFarm. To run this AlpacaFarm eval, please make sure you install our fork of AlpacaFarm (https://github.com/hamishivi/alpaca_farm) and use the following script:

python eval/alpaca_farm_eval.py --model <model> --batch_size 8

Please check the script for more details.
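The AlpacaFarm-style protocol asks the judge model to pick the better of two responses per prompt; the headline number is the model's win rate over a reference model. A toy sketch, counting ties as half a win (one common convention, and an assumption here rather than the repo's documented choice):

```python
def win_rate(judgments):
    """judgments: list of 'win', 'loss', or 'tie' from the pairwise judge.

    Returns the fraction of comparisons won, with ties counted as 0.5.
    """
    score = sum(1.0 if j == "win" else 0.5 if j == "tie" else 0.0
                for j in judgments)
    return score / len(judgments)

# Hypothetical judge verdicts over four prompts.
rate = win_rate(["win", "loss", "tie", "win"])  # (1 + 0 + 0.5 + 1) / 4
```
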

Human evaluation

We will release our human evaluation interface and data soon!

Licensing

This codebase is licensed under Apache 2.0 as given in LICENSE.

The license we use for the models released (along with the base model licenses) can be found in model_licenses/tulu_license.txt - just replace <MODELNAME> with the actual model name (i.e., the name on HuggingFace).

Citation

If you used this repository or our models, please cite our work:

@misc{wang2023far,
   title={How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources}, 
   author={Yizhong Wang and Hamish Ivison and Pradeep Dasigi and Jack Hessel and Tushar Khot and Khyathi Raghavi Chandu and David Wadden and Kelsey MacMillan and Noah A. Smith and Iz Beltagy and Hannaneh Hajishirzi},
   year={2023},
   eprint={2306.04751},
   archivePrefix={arXiv},
   primaryClass={cs.CL}
}

Contributors

  • yizhongw
  • hamishivi
  • dsttsd
  • eltociear
