QUIK

This repository contains the code for QUIK, a post-training method that quantizes the majority of the weights and activations to 4 bits.

QUIK is described in the following paper: https://arxiv.org/abs/2310.09259

Install

Dependencies

  • cmake
  • C++ compiler (GCC/clang/...)
  • nvcc

Instructions

git clone https://github.com/IST-DASLab/QUIK.git
cd QUIK
pip install -e .  # or pip install .

Example

LLama example

cd experiments
pip install -r requirements.txt
python llama.py --fp_features_num 256 --model meta-llama/Llama-2-7b-hf --hf_token <your_hf_token> --dataset c4 \
--w_bits 4 --w_clip --a_bits 4 --save_qmodel_path save_gptq_model_path --int8_down_proj --sim_eval --benchmark

The benchmark runs on all available GPUs.

Linear layer benchmarks

Run the linear layer benchmarks with python layer_benchmark.py. The input size can be varied through command-line parameters.

Adapting a model to QUIK

First, quantize the model weights with the GPTQ algorithm; in llama.py this is done by the llama_sequential function. The result is a set of quantized weights that are still stored in torch.float16. Next, create QUIK linear layers with qlinear.MixedQLinear.from_float and use them to replace the original Linear layers; see llama_replace_with_kernels in llama.py. The quantized model is then ready for use.
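
To illustrate what "quantized weights that are still stored in torch.float16" means, here is a minimal pure-Python sketch. It is not QUIK's code. It uses plain round-to-nearest as a stand-in for GPTQ, and the helper names (quantize_row, dequantize_row, pack_int4) and the nibble layout are assumptions made for illustration only. The point is that after the first step the weights already lie on a 4-bit grid but are kept as floats, while a kernel-backed layer would typically store compact integer codes plus a scale.

```python
def quantize_row(weights, bits=4):
    """Symmetric round-to-nearest quantization of one weight row.

    Stand-in for GPTQ (hypothetical helper, not part of QUIK).
    Returns (codes, scale) such that weight ~= code * scale.
    """
    qmax = 2 ** (bits - 1) - 1   # 7 for 4 bits
    qmin = -(2 ** (bits - 1))    # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_row(codes, scale):
    """Map integer codes back to floats that lie on the quantization grid."""
    return [c * scale for c in codes]


def pack_int4(codes):
    """Pack pairs of signed 4-bit codes into bytes, low nibble first.

    The layout is an assumption for illustration, not QUIK's storage format.
    """
    if len(codes) % 2 != 0:
        raise ValueError("need an even number of codes")
    packed = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


row = [1.75, -0.5, 0.6, 0.0]

# Step 1: quantize, but keep the result as floats ("fake-quantized" weights).
codes, scale = quantize_row(row)
fake_quantized = dequantize_row(codes, scale)

# Step 2: what a kernel-backed layer could store instead: codes plus a scale.
packed = pack_int4(codes)

print(codes, scale)       # integer codes and the shared scale
print(fake_quantized)     # floats restricted to the 4-bit grid
print(packed.hex())       # two 4-bit codes per byte
```

In this sketch, 0.6 is the only value that moves (to 0.5), because the other entries already sit on the grid defined by the scale.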

Fake Quantization examples

To run the fake quantization example, see the fake_quant directory.
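
As background, "fake quantization" usually means simulating low-precision arithmetic in floating point: values are quantized and immediately dequantized, then the computation runs in ordinary float. The sketch below is a generic pure-Python illustration under that assumption. It does not reproduce the contents of the fake_quant directory, and the quantize helper is hypothetical. It shows that, for a single dot product, the simulated result matches what integer arithmetic followed by rescaling would give.

```python
def quantize(values, bits):
    """Symmetric round-to-nearest quantization (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return codes, scale


weights = [1.75, -0.5, 0.6, 0.0]
activations = [3.5, 1.0, -2.2, 0.4]

w_codes, w_scale = quantize(weights, bits=4)
a_codes, a_scale = quantize(activations, bits=4)

# Fake quantization: dequantize first, then multiply in floating point.
w_fake = [c * w_scale for c in w_codes]
a_fake = [c * a_scale for c in a_codes]
fake_result = sum(w * a for w, a in zip(w_fake, a_fake))

# Integer path: accumulate integer products, then apply both scales once.
int_acc = sum(wc * ac for wc, ac in zip(w_codes, a_codes))
int_result = int_acc * w_scale * a_scale

# Unquantized reference, to see the error introduced by 4-bit rounding.
reference = sum(w * a for w, a in zip(weights, activations))

print(fake_result, int_result, reference)
```

Because both paths agree here, the float simulation is a convenient way to study accuracy without dedicated kernels; it says nothing about speed, which depends on real low-precision kernels.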

Citation

The full paper is available on arXiv. The full citation is:

@article{QUIK,
  title={QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models},
  author={Ashkboos, Saleh and Markov, Ilia and Frantar, Elias and Zhong, Tingxuan and Wang, Xincheng and Ren, Jie and Hoefler, Torsten and Alistarh, Dan},
  journal={arXiv preprint arXiv:2310.09259},
  year={2023}
}
