
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models

(under construction)

Authors: Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem

Paper: [PDF]



Getting Started

Download the repo and install dependencies.

git clone https://github.com/iboing/CorDA.git
cd CorDA
pip install -r requirements.txt

The datasets in json/jsonl format used to collect covariance matrices (MetaMath for math, CodeFeedback for code, WizardLM_evol_instruct and alpaca for instruction following) can be downloaded from our huggingface repo. The other datasets are downloaded automatically when running the code.

Step 1: Context-oriented Decomposition

You can skip Step 1 by directly downloading the decomposed model from our huggingface repo.

CorDA initializes a learnable adapter in one of two modes: knowledge-preserved adaptation (tools/build_KPA.sh) and instruction-previewed adaptation (tools/build_IPA.sh). Knowledge-preserved adaptation samples questions from QA datasets, such as TriviaQA and nq_open, to obtain covariance matrices for decomposition, and initializes the adapter with the smallest $r$ singular values and vectors. Instruction-previewed adaptation samples queries and responses from the finetuning dataset to obtain covariance matrices, and initializes the adapter with the largest $r$ singular values and vectors.
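As a rough illustration of what the decomposition produces (a toy NumPy sketch under our own simplifications, not the repo's implementation; the shapes, the random data, and the regularization term are made up for the example), the weight is factorized with an SVD oriented by the input covariance, and the adapter takes either the largest or the smallest $r$ components:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, n = 16, 12, 64

W = rng.standard_normal((d_out, d_in))        # frozen pre-trained weight
X = rng.standard_normal((n, d_in))            # calibration activations
C = X.T @ X / n + 1e-3 * np.eye(d_in)         # covariance, regularized so it is invertible

# context-oriented SVD: decompose W @ C instead of W itself
U, S, Vt = np.linalg.svd(W @ C, full_matrices=False)

r = 4
C_inv = np.linalg.inv(C)
# IPA-style init: the largest r components become the learnable adapter
B = U[:, :r] * S[:r]                          # (d_out, r)
A = Vt[:r] @ C_inv                            # (r, d_in)
# the remaining components stay frozen
W_frozen = (U[:, r:] * S[r:]) @ Vt[r:] @ C_inv

# the split is exact up to numerical error: B @ A + W_frozen == W
print(np.allclose(B @ A + W_frozen, W, atol=1e-6))  # prints True
```

Slicing the last $r$ components instead (`U[:, -r:]`, `S[-r:]`, `Vt[-r:]`) would mirror the knowledge-preserved mode, where the smallest components form the adapter and the principal ones stay frozen.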

📖 Knowledge-preserved adaptation

CUDA_VISIBLE_DEVICES=0 python build_corda.py \
    --model_id "meta-llama/Llama-2-7b-hf" \
    --cov_aware \
    --r {rank} \
    --use_cache \
    --calib_dataset "nqopen" \
    --calib_loader_size 256 \
    --save_model \
    --save_path {path_to_decomposed_model}

Arguments:

  • --model_id is the pre-trained model for decomposition.
  • --cov_aware adopts our context-oriented decomposition and collects covariance matrices.
  • --r is the LoRA rank, e.g., 128.
  • --use_cache reuses the dataloader and covariance matrices saved in CorDA/cache, so the covariance matrices need not be calculated again.
  • --calib_dataset specifies the dataset from which calibration samples are drawn to obtain covariance matrices. KPA mode uses QA datasets, with choices of "nqopen" and "traivia_qa".
  • --calib_loader_size is the number of calibration samples.
  • --save_model saves the initialized model in --save_path.
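For intuition, the covariance collection behind --cov_aware can be sketched with a PyTorch forward hook that accumulates x·xᵀ over calibration batches (a toy illustration with made-up shapes and random data, not the actual build_corda.py logic):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(12, 16, bias=False)   # stand-in for one linear layer of the LLM
cov = torch.zeros(12, 12)
n_seen = 0

def hook(module, inputs, output):
    global cov, n_seen
    # flatten batch and sequence dims, keep the feature dim
    x = inputs[0].reshape(-1, inputs[0].shape[-1])
    cov += x.T @ x
    n_seen += x.shape[0]

handle = layer.register_forward_hook(hook)
for _ in range(4):                      # stand-in for calibration batches
    layer(torch.randn(2, 8, 12))        # (batch, seq_len, features)
handle.remove()

cov /= n_seen                           # running input covariance for this layer
print(cov.shape)                        # prints torch.Size([12, 12])
```

The accumulated matrix is what the decomposition step consumes; caching it (--use_cache) avoids repeating the forward passes.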

🔭 Instruction-previewed adaptation

CUDA_VISIBLE_DEVICES=0 python build_corda.py \
    --model_id "meta-llama/Llama-2-7b-hf" \
    --cov_aware \
    --r {rank} \
    --use_cache \
    --first_eigen \
    --calib_dataset "MetaMATH" \
    --calib_loader_size 256 \
    --save_model \
    --save_path {path_to_decomposed_model}

Arguments:

  • --first_eigen uses the largest $r$ singular values and vectors to initialize the learnable adapter for the instruction-previewed adaptation mode.
  • --calib_dataset specifies the dataset from which calibration samples are drawn to obtain covariance matrices. IPA mode uses finetuning datasets, with choices of "MetaMATH", "codefeedback", "WizLMinstruct", and "alpaca".

Step 2: Adapter Training

LoRA:

sh tools/train_LoRA.sh {path_to_trained_model}

CorDA:

sh tools/train_CorDA.sh {path_to_decomposed_model} {path_to_trained_model}

Step 3: Inference

After training, the LoRA adapter can be merged into the base model by running:

python merge_adapter_to_base_model.py --base_model "meta-llama/Llama-2-7b-hf" --adapter {path_to_trained_model}/ft --output_path {path_to_merged_model}
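What merging does conceptually (a toy sketch with made-up shapes and random data, not merge_adapter_to_base_model.py itself): the low-rank update is folded into the dense weight, so inference needs no separate adapter.

```python
import numpy as np

rng = np.random.default_rng(2)
d_out, d_in, r = 8, 6, 2
W = rng.standard_normal((d_out, d_in))   # frozen base weight
B = rng.standard_normal((d_out, r))      # trained low-rank factors
A = rng.standard_normal((r, d_in))
scaling = 1.0                            # lora_alpha / r in PEFT terms

W_merged = W + scaling * B @ A           # one dense matrix, no adapter at inference

# merged and unmerged paths agree on any input
x = rng.standard_normal(d_in)
print(np.allclose(W_merged @ x, W @ x + scaling * B @ (A @ x)))  # prints True
```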

The current CorDA code is based on a customized model rather than huggingface/PEFT, so a CorDA model can be used for inference directly (set trust_remote_code to true when loading the model) without merging. If you still want to merge the trained CorDA adapter into the frozen weights to restore the original LLaMA model architecture, execute:

(optional)
python merge_adapter_for_corda.py --model_id {path_to_trained_model}/ft --save_path {path_to_merged_model}

This may lead to slightly lower performance than inferring directly without merging, due to numerical variation.

Inference on world knowledge:

Inference on world knowledge benchmarks is based on EleutherAI/lm-evaluation-harness. For example, we evaluate on nq_open by:

accelerate launch -m lm_eval \
    --model hf \
    --model_args pretrained={path_to_trained_model}/ft,trust_remote_code=True,dtype=float16 \
    --output_path {result_path}/nq_open.json \
    --tasks nq_open \
    --batch_size auto \
    --max_batch_size 8 \
    --device cuda

Inference on Math:

Evaluation on Gsm8k and Math can be performed by:

(for CorDA:)
sh tools/inference_Math.sh {path_to_trained_model}/ft
(for LoRA:)
sh tools/inference_Math.sh {path_to_merged_model}

Inference on Code and Instruction Following:

Evaluation on HumanEval and MBPP is based on bigcode-evaluation-harness. Evaluation on MTBench is based on FastChat. We use their default settings for evaluation.

Results

Method                    | TriviaQA | NQ open | GSM8k | Math
LoRA                      | 44.17    | 1.91    | 42.68 | 5.92
CorDA (KPA with nqopen)   | 45.23    | 10.44   | 45.64 | 6.94
CorDA (IPA with MetaMath) | -        | -       | 54.59 | 8.54

Compared with LoRA, CorDA in knowledge-preserved adaptation (KPA) not only performs better on the finetuning task but also mitigates the forgetting of world knowledge. CorDA in instruction-previewed adaptation (IPA) further enhances finetuning performance.

The models can be downloaded from our huggingface repo.

Wikitext/PTB Results

To reproduce Figure 2 and Table 6 of our paper, which compare our context-oriented decomposition with ASVD and plain SVD, you can execute tools/full_decompose.sh. Concretely,

CUDA_VISIBLE_DEVICES=0 python -u build_corda.py \
    --model_id="meta-llama/Llama-2-7b-hf" \
    --r {the smallest rank to discard} \
    --mode full_decompose \
    --use_cache \
    --calib_dataset "wikitext2" \
    --calib_loader_size 256 \
    [--cov_aware for context-oriented decomposition, --act_aware for ASVD, or remove this argument for plain SVD]

Arguments:

  • --r is the number of the smallest singular values and vectors to discard, i.e., the x-axis of Figure 2.
  • --cov_aware adopts our context-oriented SVD. Use --act_aware for ASVD or remove this argument for the Plain SVD.
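The gap that Figure 2 measures can be previewed with a toy NumPy experiment (our own simplification with made-up shapes and random data, not the repo's evaluation): when inputs are anisotropic, truncating the smallest components of a covariance-oriented SVD loses less of the covariance-weighted response W·C than truncating a plain SVD of W.

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, n, k = 32, 24, 256, 8    # k = number of smallest components discarded

W = rng.standard_normal((d_out, d_in))
X = rng.standard_normal((n, d_in)) * np.linspace(0.1, 3.0, d_in)  # anisotropic inputs
C = X.T @ X / n                        # input covariance

def truncate_plain(W, k):
    # drop the k smallest components of a plain SVD of W
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    r = len(S) - k
    return (U[:, :r] * S[:r]) @ Vt[:r]

def truncate_cov(W, C, k):
    # drop the k smallest components of the covariance-oriented SVD of W @ C
    U, S, Vt = np.linalg.svd(W @ C, full_matrices=False)
    r = len(S) - k
    return (U[:, :r] * S[:r]) @ Vt[:r] @ np.linalg.inv(C)

# error in the covariance-weighted response after truncation
err_plain = np.linalg.norm(truncate_plain(W, k) @ C - W @ C)
err_cov = np.linalg.norm(truncate_cov(W, C, k) @ C - W @ C)
print(err_cov < err_plain)  # prints True
```

By the Eckart–Young theorem, the covariance-oriented truncation is optimal for W·C in Frobenius norm, so the plain truncation can never beat it on this metric.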

Citation

If you find our work/code useful for your research, please consider citing:

@article{yang2024corda,
  title={CorDA: Context-Oriented Decomposition Adaptation of Large Language Models},
  author={Yang, Yibo and Li, Xiaojie and Zhou, Zhongzhu and Song, Shuaiwen Leon and Wu, Jianlong and Nie, Liqiang and Ghanem, Bernard},
  journal={arXiv preprint arXiv:2406.05223},
  year={2024}
}
