The official PyTorch implementation of LAE introduced in the following paper:
Qiankun Gao, Chen Zhao, Yifan Sun, Teng Xi, Gang Zhang, Bernard Ghanem, Jian Zhang;
A Unified Continual Learning Framework with General Parameter-Efficient Tuning;
International Conference on Computer Vision (ICCV), 2023.
The "pre-training → downstream adaptation" presents both new opportunities and challenges for Continual Learning (CL). Although the recent state-of-the-art in CL is achieved through Parameter-Efficient-Tuning (PET) adaptation paradigm, only prompt has been explored, limiting its application to Transformers only. In this paper, we position prompting as one instantiation of PET, and propose a unified CL framework with general PET, dubbed as Learning-Accumulation-Ensemble (LAE). PET, e.g., using Adapter, LoRA, or Prefix, can adapt a pre-trained model to downstream tasks with fewer parameters and resources. Given a PET method, our LAE framework incorporates it for CL with three novel designs. 1) Learning: the pre-trained model adapts to the new task by tuning an online PET module, along with our adaptation speed calibration to align different PET modules, 2) Accumulation: the task-specific knowledge learned by the online PET module is accumulated into an offline PET module through momentum update, 3) Ensemble: During inference, we respectively construct two experts with online/offline PET modules (which are favored by the novel/historical tasks) for prediction ensemble. We show that LAE is compatible with a battery of PET methods and gains strong CL capability. For example, LAE with Adaptor PET surpasses the prior state-of-the-art by 1.3% and 3.6% in last-incremental accuracy on CIFAR100 and ImageNet-R datasets, respectively.
- Install dependencies

  ```shell
  pip install -r requirements.txt
  ```
- Prepare datasets (see the data-loading sketch after this list)

  - Create a dataset root directory, e.g., `data`
  - The CIFAR100 and ImageNet-R datasets will be downloaded automatically
  - Overview of the dataset root directory:

    ```
    ├── cifar100
    │   └── cifar-100-python
    ├── domainnet
    │   ├── clipart
    │   ├── infograph
    │   ├── painting
    │   ├── quickdraw
    │   ├── real
    │   └── sketch
    └── imagenet-r
        ├── imagenet-r
        ├── train_list.txt
        └── val_list.txt
    ```
- Generate a config file (replace `<root>` with your dataset root path):

  ```shell
  python main.py data.root=<root> data.dataset=cifar100 --print_config > cifar100.yaml
  ```
- Run the experiment:

  ```shell
  python main.py --config=cifar100.yaml
  ```
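As referenced in the dataset-preparation step above, here is a minimal, hypothetical sketch of how a class-incremental CIFAR100 scenario can be built with the continuum library that this repo relies on for datasets and scenarios. The data path and increment value are illustrative; the repo's actual loading is driven by the generated config file.

```python
# Hypothetical sketch of the class-incremental data loading this repo builds on
# (via the continuum library); the path and increment below are illustrative.
from continuum import ClassIncremental
from continuum.datasets import CIFAR100

dataset = CIFAR100("data", train=True, download=True)  # auto-download, as in the README
scenario = ClassIncremental(dataset, increment=10)     # 10 classes per task -> 10 tasks

for task_id, taskset in enumerate(scenario):
    print(f"task {task_id}: {len(taskset)} training samples")
```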
We provide configs and a Makefile to quickly reproduce the ten-task experimental results reported in the paper. If `make` is installed, run one of the following commands:

```shell
make vit_adapter
make vit_lora
make vit_prefix
make swin_adapter
make convnext_adapter
```
Run the `make` command with the `BASE` argument (default is `base/cifar100_order1.yaml`) to reproduce other experiments, e.g.:

```shell
make BASE="base/imagenet-r_order1.yaml" vit_adapter
```
Modify `data.num_increment_classes` (5 for CIFAR100, 10 for ImageNet-R) in the base config files to reproduce the 20-task experiments (CIFAR100 has 100 classes and ImageNet-R has 200, so both settings yield 20 tasks).
The recent paper CODA-P uses a different ImageNet-R train-validation split and pre-training checkpoint than ours. To enable a fair comparison, we conducted additional experiments on ImageNet-R with the same training settings as CODA-P and report the results in the table below.
The pre-training checkpoints can be found at IN21K and IN21K+1K. The key difference between them is the data augmentation; IN21K+1K is further fine-tuned on the ImageNet-1K dataset after pre-training on the ImageNet-21K dataset.
The train and validation list files of the CODA-P split can be found at train_list_coda-p.txt and val_list_coda-p.txt, which were converted from the CODA-P code.
| Approach | Split | Training Strategy | Checkpoint | ViT-B_16 Last-Acc (%) | ViT-B_16 Avg-Acc (%) |
|---|---|---|---|---|---|
| CODA-P | L2P | DualPrompt | IN21K | 61.38 | 73.10 |
| LAE (Prefix10) | L2P | DualPrompt | IN21K | 71.19 | 81.92 |
| CODA-P | L2P | CODA-P | IN21K | 70.34 | 78.07 |
| LAE (Prefix10) | L2P | CODA-P | IN21K | 72.27 | 79.97 |
| CODA-P | CODA-P | CODA-P | IN21K+1K | 74.31 | 83.75 |
| LAE (Prefix10) | CODA-P | CODA-P | IN21K+1K | 74.43 | 83.91 |
Our LAE achieves comparable performance to CODA-P using the CODA-P split and IN21K+1K checkpoint, while outperforming it using the L2P split and IN21K checkpoint (which is the setting used in L2P and DualPrompt). All experiments were conducted using the same 4 x Nvidia 3090 GPUs (modify the `strategy` and `devices` fields in the config file).
We also compare our LAE with CODA-Prompt on the 5-task DomainNet benchmark; the results are shown below:
| Approach | Training Strategy | Checkpoint | ViT-B_16 Acc (%) | ViT-B_16 Forgetting (%) |
|---|---|---|---|---|
| CODA-P | CODA-P | IN21K+1K | 67.11 | 13.79 |
| LAE (Prefix10) | CODA-P | IN21K+1K | 68.37 | 8.33 |
- We would like to thank Jaeho Lee for providing the PyTorch implementation of L2P and DualPrompt, which served as a valuable reference for our code.
- We use datasets and scenarios provided by continuum, an awesome data loading library for Continual Learning.
- The official JAX code of L2P and DualPrompt can be found here.
- The official PyTorch code of CODA-P can be found here.
- The official PyTorch code of ESN can be found here.
```bibtex
@inproceedings{gao2023lae,
  title     = {A Unified Continual Learning Framework with General Parameter-Efficient Tuning},
  author    = {Qiankun Gao and Chen Zhao and Yifan Sun and Teng Xi and Gang Zhang and Bernard Ghanem and Jian Zhang},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year      = {2023}
}
```