This repo implements DAQ (Divide-and-Quantize), a compression method that divides large weight matrices into flexible chunks and quantizes each chunk separately.
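As a rough illustration of the divide-and-quantize idea, the sketch below splits a flat weight vector into fixed-size chunks and applies uniform quantization to each chunk with its own scale. The chunking strategy, bit width, and all function names here are illustrative assumptions, not the repo's actual implementation (which is based on product quantization).

```python
from typing import List, Tuple

def divide(weights: List[float], chunk_size: int) -> List[List[float]]:
    """Split a flat weight vector into fixed-size chunks (last may be shorter)."""
    return [weights[i:i + chunk_size] for i in range(0, len(weights), chunk_size)]

def quantize_chunk(chunk: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Toy uniform per-chunk quantization: int codes plus one float scale."""
    levels = 2 ** (bits - 1) - 1            # symmetric range, e.g. -7..7 for 4 bits
    scale = max(abs(w) for w in chunk) / levels or 1.0
    codes = [round(w / scale) for w in chunk]
    return codes, scale

def dequantize_chunk(codes: List[int], scale: float) -> List[float]:
    return [c * scale for c in codes]

weights = [0.31, -0.12, 0.05, 0.88, -0.44, 0.02, -0.91, 0.10]
recon = []
for chunk in divide(weights, chunk_size=4):
    codes, scale = quantize_chunk(chunk)
    recon.extend(dequantize_chunk(codes, scale))
```

Because each chunk gets its own scale, a single outlier weight only degrades the resolution of its own chunk rather than the whole matrix.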
To install from source and develop locally:
git clone https://github.com/Luccadoremi/Model-Compression-DAQ.git
cd Model-Compression-DAQ
pip install --editable .
1. Train a SuperTransformer (HAT [paper])
The SuperTransformer is a supernet that contains many SubTransformers with weight-sharing.
By default, we train WMT tasks on 8 GPUs. Please adjust --update-freq according to the number of GPUs (128/x for x GPUs). Note that for IWSLT, we only train on one GPU with --update-freq=1.
python train-our.py --configs=configs/[task_name]/supertransformer/[search_space].yml
# for example
python train-our.py --configs=configs/wmt14.en-de/supertransformer/space0.yml
# another example
CUDA_VISIBLE_DEVICES=0,1,2,3 python train-our.py --configs=configs/wmt14.en-fr/supertransformer/space0.yml --update-freq=32
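The 128/x rule can be written as a small helper (hypothetical, not part of the repo) that keeps the effective batch size constant across GPU counts:

```python
def update_freq(num_gpus: int, target_batch: int = 128) -> int:
    """Gradient accumulation steps so that num_gpus * update_freq stays
    at `target_batch` (the 128/x rule from the README)."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return max(1, target_batch // num_gpus)
```

For example, 8 GPUs give --update-freq=16, and the 4-GPU WMT'14 En-Fr command above uses 128/4 = 32.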
The --configs file specifies the SuperTransformer model architecture, the SubTransformer search space, and the training settings.
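For illustration, such a yml file might look like the fragment below. The keys mirror fairseq-style command-line options, but these particular names and values are assumptions; see the files under configs/ for the real settings.

```yaml
# hypothetical sketch -- not an actual file from this repo
arch: transformersuper_wmt_en_de   # assumed architecture name
optimizer: adam
lr: 0.0007
max-tokens: 4096
update-freq: 16
```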
3. Train a Searched SubTransformer (Training with Quantization Noise for Extreme Model Compression [paper])
For details please check the script.
# run with default arguments
./train.sh
# for example this will run a subtransformer training with quantization noise
./train.sh our quant_noise
# this will quantize all weights; for details, check the corresponding yml files
./train.sh our post_quant-quant_noise-n5
# to provide model.yml for a dataset, train.sh can be run like following
# ./train.sh <ARCH> <COMMON.YML-TYPE> <GPUs> <DATASET> <MODEL.YML>
./train.sh our post_quant-quant_noise-n5 0,1 iwslt14.de-en [email protected][email protected]
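Conceptually, quantization-noise training (the Fan et al. paper referenced above) quantizes only a random subset of weights in each forward pass, so the network learns to be robust to quantization before it is fully quantized. Below is a toy scalar sketch of that idea; the repo applies it with product quantization over blocks of weights, so this is a simplification, and all names are illustrative.

```python
import random

def quantize(w: float, scale: float = 0.1) -> float:
    """Toy scalar quantizer: round to the nearest multiple of `scale`."""
    return round(w / scale) * scale

def quant_noise_forward(weights, p=0.5, scale=0.1, rng=random):
    """Quantize a random fraction p of the weights for this forward pass,
    leaving the rest in full precision (simplified Quant-Noise)."""
    return [quantize(w, scale) if rng.random() < p else w for w in weights]
```

With p=0 this is ordinary full-precision training, and with p=1 it degenerates to fully quantized training; intermediate p trades off between the two.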
4. Test a Trained SubTransformer
For details please check the script.
# run with default arguments
./test.sh
# Calculate BLEU score for non-quantized model
./test.sh our quant_noise
# Calculate BLEU score for a quantized model (you need to provide quantization config path)
./test.sh our post_quant-quant_noise-n5 configs/iwslt14.de-en/subtransformer/pq-quantization-n5.yml
Dependencies:
- Python >= 3.6
- PyTorch >= 1.0.0
- configargparse >= 0.14
- New model training requires NVIDIA GPUs and NCCL
- sklearn
Roadmap:
- Use 4 bits to encode assignments with more buckets (currently 8 bits are used)
- Share centroids across layers
- 1D weight resampling (https://github.com/adefossez/julius/)
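The 4-bit assignment encoding mentioned above amounts to packing two centroid indices per byte instead of one. A self-contained sketch of such packing (illustrative only, not the repo's storage format):

```python
def pack_nibbles(codes):
    """Pack 4-bit centroid assignments two-per-byte (codes must be in 0..15)."""
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("4-bit codes must lie in 0..15")
    padded = codes + [0] * (len(codes) % 2)  # pad odd-length input with a zero
    return bytes((padded[i] << 4) | padded[i + 1]
                 for i in range(0, len(padded), 2))

def unpack_nibbles(packed, n):
    """Inverse of pack_nibbles; `n` is the original number of codes."""
    out = []
    for b in packed:
        out.append(b >> 4)      # high nibble
        out.append(b & 0x0F)    # low nibble
    return out[:n]
```

This halves assignment storage relative to one byte per code, at the cost of capping the codebook at 16 centroids per (shared) codebook.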
This repository is released under the MIT license. See LICENSE for more information.