
CTP


This repository provides:

  1. A PyTorch library of continual learning baseline algorithms for the vision-language continual pretraining benchmark, built on the P9D dataset.

  2. The PyTorch implementation of the ICCV 2023 paper "CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation".

🎨 Introduction

Vision-Language Pretraining (VLP) has shown impressive results on diverse downstream tasks via offline training on large-scale datasets. Given the ever-growing nature of real-world data, such an offline training paradigm on ever-expanding data is unsustainable, because models lack the ability to continually accumulate knowledge. However, most continual learning studies are limited to uni-modal classification, and existing multi-modal datasets cannot simulate continual non-stationary data streams.

To support the study of Vision-Language Continual Pretraining (VLCP), we first contribute a comprehensive and unified benchmark: the P9D dataset, which contains over one million product image-text pairs from 9 industries. The data from each industry forms an independent task to support continual learning, and the dataset follows the real-world long-tail distribution to simulate pretraining on web data.
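To make the task setup concrete, here is a minimal sketch of how per-industry data can be turned into a sequential task stream. The record fields (`industry`, `image_path`, `caption`) and the function name are illustrative assumptions, not the repository's actual API.

```python
# Hypothetical sketch: group image-text pairs by industry, then yield
# one industry at a time as an independent continual learning task.
def continual_task_stream(records, task_order):
    by_task = {task: [] for task in task_order}
    for rec in records:
        # Each record is assumed to carry its industry label and an image-text pair.
        by_task[rec["industry"]].append((rec["image_path"], rec["caption"]))
    for task in task_order:
        # Tasks arrive strictly in sequence; earlier tasks are not revisited.
        yield task, by_task[task]
```

Reversing `task_order` gives the reversed-order protocol used when reporting results in both task orders.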

We comprehensively study the characteristics and challenges of VLCP and propose a new algorithm: Compatible momentum contrast with Topology Preservation, dubbed CTP. The compatible momentum model absorbs the knowledge of the current-task and previous-task models to flexibly update the modal features. Moreover, topology preservation transfers the knowledge of embeddings across tasks while preserving the flexibility of feature adjustment.
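The compatible momentum idea can be sketched as an EMA-style parameter update whose target blends the current-task and previous-task models. This is only an illustrative sketch: the function name, the mixing coefficient `beta`, and the momentum `m` are assumptions, not the exact update rule from the paper.

```python
import torch
from torch import nn

@torch.no_grad()
def compatible_momentum_update(momentum_model: nn.Module,
                               current_model: nn.Module,
                               prev_task_model: nn.Module,
                               m: float = 0.995,
                               beta: float = 0.5) -> None:
    # Hypothetical sketch: the momentum model moves toward a blend of the
    # current-task model and the frozen previous-task model, so it absorbs
    # knowledge from both instead of tracking only the current model.
    for p_m, p_c, p_p in zip(momentum_model.parameters(),
                             current_model.parameters(),
                             prev_task_model.parameters()):
        target = beta * p_c.data + (1.0 - beta) * p_p.data
        p_m.data.mul_(m).add_(target, alpha=1.0 - m)
```

With `beta = 1.0` this reduces to the standard momentum encoder update used in momentum contrast.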


⚙️ Setup and Environments

  • Python=3.7.15
  • PyTorch=1.11.0
  • Nvidia Driver Version=470.82.01
  • CUDA Version=11.4

The detailed dependencies are listed in requirements.txt and can be installed with:

pip install -r requirements.txt

📋 Reimplemented Methods

For a detailed introduction to each baseline method, refer to the appendix of our paper or the corresponding original paper.

The train folder provides the reimplemented code for the vision-language continual pretraining task. We also provide the training logs of all methods as supplementary material.

  • SeqF: Sequential fine-tuning, which learns each task incrementally without using any data or knowledge from previous tasks.

  • JointT: Joint training, which has access to all data from previous tasks and serves as an upper-bound baseline.

Memory-Free methods: Baseline methods without exemplar replay.

Memory-Buffer methods: Baseline methods with exemplar replay.

  • ER: Continual learning with tiny episodic memories. arXiv | code
  • MoF (Mean-of-Feature): The exemplar sampling method used in iCaRL.
  • Kmeans: Uses online k-means to estimate k centroids in feature space and update the exemplar buffer. It is documented in arXiv as a comparative sampling method.
  • ICARL: Incremental Classifier and Representation Learning. arXiv | CVPR 2017 | code
  • LUCIR: Learning a Unified Classifier Incrementally via Rebalancing. CVPR 2019 | code
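As an illustration of exemplar sampling, here is a minimal sketch of the mean-of-feature (herding-style) selection underlying MoF/iCaRL: exemplars are picked greedily so that the running mean of the selected features stays close to the overall mean. Function and variable names are assumptions for illustration, not this repository's code.

```python
import numpy as np

def mean_of_feature_exemplars(features: np.ndarray, k: int) -> list:
    # Greedy herding-style selection (as in iCaRL's exemplar management):
    # at each step, pick the sample whose inclusion keeps the running mean
    # of selected exemplars closest to the mean feature of the whole set.
    mu = features.mean(axis=0)
    selected = []
    chosen = np.zeros(len(features), dtype=bool)
    running_sum = np.zeros_like(mu)
    for i in range(k):
        # Candidate running means if each remaining sample were added next.
        candidates = (running_sum + features) / (i + 1)
        dists = np.linalg.norm(candidates - mu, axis=1)
        dists[chosen] = np.inf  # never re-select an exemplar
        j = int(np.argmin(dists))
        chosen[j] = True
        selected.append(j)
        running_sum += features[j]
    return selected
```

The first exemplar selected is always the sample nearest the class mean; replay-buffer baselines then store these exemplars for rehearsal on later tasks.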

📁 Dataset

The details of the P9D dataset can be found in this repository.

📥 Pretrained Model Weights

The model weights of each method are too large to host in full: each baseline method produces model weights for 8 tasks (2.3 GB × 8 = 18.4 GB). As an alternative, we provide the training logs of all baseline methods in both the default and reversed task orders.

Meanwhile, we provide download links for the CTP and CTP_ER model weights trained in the default task order. The Google Drive link only contains the model weights of the final task, while the Baidu Netdisk link contains the model weights of each task.

|               | training log | model weights |
|---------------|--------------|---------------|
| Google Drive  | Here         | Here          |
| Baidu Netdisk | Here         | Here          |

🔍 Train and Evaluation

Training from Scratch:

  1. Modify the dataset file paths in configs/base_seqF.yaml to your own paths.
  2. Modify the LOG_NAME and OUT_DIR in shell/seq_xxx.sh to your storage path, where xxx is the name of the method.
  3. Change to the shell folder and run the corresponding script seq_xxx.sh:
    cd /shell/
    sh seq_xxx.sh
    
  4. The corresponding training log will be written to the logger folder.

Evaluation:

  1. Modify the LOG_NAME and OUT_DIR in eval.sh to the storage path of the trained model.
  2. Run the evaluation script eval.sh.
    sh eval.sh
    

📝 Citation

If this codebase is useful to you, please cite our work:

@inproceedings{zhu2023ctp,
  title={CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation},
  author={Hongguang Zhu and Yunchao Wei and Xiaodan Liang and Chunjie Zhang and Yao Zhao},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2023},
}

๐Ÿผ Contacts

If you have any questions, please feel free to contact me: [email protected] or [email protected].

📚 Reference

  1. Li, Junnan, et al. "Align before Fuse: Vision and Language Representation Learning with Momentum Distillation." NeurIPS. 2021.
  2. Masana, Marc, et al. "Class-Incremental Learning: Survey and Performance Evaluation on Image Classification." TPAMI. 2023.
  3. Zhou, Dawei, et al. "PyCIL: A Python Toolbox for Class-Incremental Learning." SCIENCE CHINA Information Sciences. 2023.
  4. Hong, Xiaopeng, et al. "An Incremental Learning, Continual Learning, and Life-Long Learning Repository." GitHub repository.
  5. Wang, Liyuan, et al. "A Comprehensive Survey of Continual Learning: Theory, Method and Application." arXiv. 2023.
