lq_lora_v0's Introduction

LQ-LoRA: Low-rank plus Quantized Matrix Decomposition for Efficient Language Model Finetuning [Paper]

Changelog

  • 20231215: Uploaded artifacts.

Artifacts

  • Model checkpoint (and training logs) for LLaMA-2 7B with LQ-LoRA (2.75-bits, 64-rank, Fisher) [link]
  • Model checkpoint (and training logs) for LLaMA-2 70B with LQ-LoRA (2.75-bits, 64-rank, Fisher) [link]
  • Pre-computed ILP data for LLaMA-2 7B [link]
  • Pre-computed ILP data for LLaMA-2 70B [link]
  • Fisher Information for LLaMA-2 7B [link]
  • Fisher Information for LLaMA-2 70B: the file exceeds the size limit, so please contact us for access.

Installation

  1. Clone the repo
git clone https://github.com/HanGuo97/lq-lora.git
cd lq-lora
  2. Create Docker image (optional)
# Using BuildKit
DOCKER_BUILDKIT=1 docker build \
    -t lqlora \
    -f Dockerfile \
    .

docker run -ti --rm \
    --gpus all \
    -p 28888:8888 \
    --shm-size=2g \
    lqlora \
    bash -c "cd main/ && jupyter-lab --ip=0.0.0.0 --allow-root"
  3. Install dependencies
bash scripts/setup.sh

Note: Some of the codebase relies on PyTorch>=2.1.
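
To check the PyTorch requirement before running anything, a small helper along these lines may be useful. This is a minimal sketch and not part of the repo: `meets_minimum` is a hypothetical name, and it only compares the major and minor version numbers.

```python
import re


def meets_minimum(version: str, minimum: tuple = (2, 1)) -> bool:
    """Return True if a version string such as '2.1.0+cu121' is >= `minimum`.

    Hypothetical helper for illustration; only major.minor are compared.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles cases like 2.10 > 2.1 correctly.
    return (major, minor) >= minimum
```

In an environment with PyTorch installed, calling `meets_minimum(torch.__version__)` should return `True` for any release from 2.1 onward.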

Usage

Downloading Data for Quantization

After downloading the files, please update FILE_NAMES_DICT in models/allocation_utils accordingly.

Applying Quantization

from transformers import AutoTokenizer, AutoModelForCausalLM
from models import lora_utils

data = "c4"         # applying data-aware quantization
budget = "2.75"     # target bits
model_size = "70b"  # 7b or 70b

# Loads the base model (to CPU)
model = AutoModelForCausalLM.from_pretrained(
    f"meta-llama/Llama-2-{model_size}-hf")

# Adds LoRA components, etc
model = lora_utils.prepare_model_for_lora(
    model=model,
    num_ranks=64,
    lora_alpha=16,
    lora_dropout=0.0,
    use_gradient_checkpointing=True)

# Applies LQ-LoRA to the model.
lora_utils.transform_lora_layers(
    lpq=True,
    model=model,
    model_name=f"llama-2-{model_size}/lpq-64/{data},budget={budget}",
    device="cuda")

Saving Quantized Models

Note that HuggingFace's PEFT library only saves the adapter parameters. Since LQ-LoRA additionally changes the base model parameters, we need to save the entire weights of the model.

import os
import torch

# `output_dir` is the directory where the checkpoint should be written.
state_dict = model.state_dict()
file_name = os.path.join(output_dir, "full_model.pth")
torch.save(state_dict, file_name)

Loading Quantized Models

# `model` is assumed to be the base model, loaded with
# `AutoModelForCausalLM.from_pretrained` as in "Applying Quantization".
# No need to apply `transform_lora_layers` because
# these will be loaded from the checkpoint.
model = lora_utils.prepare_model_for_lora(
    model=model,
    num_ranks=64,
    lora_alpha=16,
    lora_dropout=0.0,
    use_gradient_checkpointing=True,
    checkpoint_dir=checkpoint_dir)  # -> enter the path to the checkpoint directory
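
Before loading, it can help to confirm that the checkpoint directory actually contains the saved weights. The sketch below assumes the file is named `full_model.pth`, as in "Saving Quantized Models"; whether `prepare_model_for_lora` looks for exactly this file name is an assumption, and `find_full_checkpoint` is a hypothetical helper, not part of the repo.

```python
import os


def find_full_checkpoint(checkpoint_dir: str,
                         file_name: str = "full_model.pth") -> str:
    """Return the path to the saved weights, or raise if they are missing.

    Hypothetical helper; the default file name follows the saving
    snippet in this README and is otherwise an assumption.
    """
    path = os.path.join(checkpoint_dir, file_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"expected checkpoint file not found: {path}")
    return path
```

Calling `find_full_checkpoint(checkpoint_dir)` first gives a clear error for a wrong path, instead of a failure partway through model preparation.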

Todos

  • Upload the artifacts
  • We use a legacy version of the (de)quantization implementation. We will update the code to use the latest version.

Acknowledgement

This code reuses components from several libraries including QLoRA and OmniQuant.
