Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024)

Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao

Official implementation of the paper "Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models".


📢 News

  • (Dec 12, 2023)
    • Training and evaluation code for HPT is released 🔓
  • (Dec 09, 2023)
    • Paper accepted at AAAI 2024 🎉

✨ Highlights

Abstract: Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. As large language models (LLMs) have emerged, recent studies have explored the use of category-related descriptions as input to enhance prompt effectiveness. Nevertheless, conventional descriptions fall short of structured information that effectively represents the interconnections among entities or attributes linked to a particular category. To address this limitation and prioritize harnessing structured knowledge, this paper advocates for leveraging LLMs to build a graph for each description to model the entities and attributes describing the category, as well as their correlations. Preexisting prompt tuning methods exhibit inadequacies in managing this structured knowledge. Consequently, we propose a novel approach called Hierarchical Prompt Tuning (HPT), which enables simultaneous modeling of both structured and conventional linguistic knowledge. Specifically, we introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning. In addition, by incorporating high-level and global-level prompts modeling overall semantics, the proposed hierarchical structure forges cross-level interlinks and empowers the model to handle more complex and long-term relationships. Extensive experiments demonstrate that our HPT shows strong effectiveness and generalizes much better than existing SOTA methods.

🚀 Contributions

  • We argue that structured knowledge from descriptions is crucial for learning prompts. We therefore leverage large language models to generate category-related descriptions along with the corresponding structured relationships;
  • We propose Hierarchical Prompt Tuning (HPT) for simultaneously modeling both structured and conventional linguistic knowledge. By incorporating both forms of knowledge, we can enhance prompt effectiveness with more category-related information;
  • We conduct extensive experiments on three commonly used evaluation settings (base-to-new generalization, cross-dataset evaluation, and domain generalization). HPT achieves the best average result among the compared methods in each setting.
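
As a rough illustration of what "structured knowledge" can look like, the sketch below stores one description as a small graph of entities, attributes, and their relationships, then derives a pairwise mask marking which nodes are related. The class `DescriptionGraph`, its fields, the helper `relation_mask`, and the example description are all hypothetical and chosen only for illustration. They are not taken from the HPT code base, and the actual relationship-guided attention module is the one defined in the paper and repository.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DescriptionGraph:
    """Hypothetical container for one LLM-generated description of a category."""
    category: str
    entities: List[str] = field(default_factory=list)
    attributes: List[str] = field(default_factory=list)
    # (head, relation, tail) triples linking entities and attributes
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def nodes(self) -> List[str]:
        # Entities first, then attributes, in a fixed order
        return self.entities + self.attributes


def relation_mask(graph: DescriptionGraph) -> List[List[int]]:
    """Return a symmetric 0/1 matrix; 1 marks a related pair of nodes."""
    nodes = graph.nodes()
    index = {name: i for i, name in enumerate(nodes)}
    mask = [[0] * len(nodes) for _ in nodes]
    for head, _relation, tail in graph.relations:
        i, j = index[head], index[tail]
        mask[i][j] = mask[j][i] = 1
    return mask


# Made-up example description, not taken from the paper's data
graph = DescriptionGraph(
    category="water lily",
    entities=["leaf", "petal"],
    attributes=["round", "white"],
    relations=[("leaf", "has shape", "round"), ("petal", "has color", "white")],
)
mask = relation_mask(graph)
```

A pairwise structure of this kind is one plausible input for an attention module that focuses on related entity-attribute pairs; how HPT actually uses the relationships is described in the paper.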

📊 Results

Base-to-New Generalization

The results below show accuracy on base and new classes, averaged across 11 recognition datasets and 3 seeds. The gains in parentheses are relative to MaPLe. Please refer to our paper for more numerical results.

| Name | Base Accuracy | New Accuracy | Harmonic Mean |
|---|---|---|---|
| CLIP | 69.34 | 74.22 | 71.70 |
| CoOp | 82.69 | 63.22 | 71.66 |
| CoCoOp | 80.47 | 71.69 | 75.83 |
| MaPLe | 82.28 | 75.14 | 78.55 |
| HPT | 84.32 (+2.04) | 76.86 (+1.72) | 80.23 (+1.68) |
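
For readers unfamiliar with the harmonic mean column, the sketch below recomputes it for the baseline rows, assuming the usual definition H = 2 * base * new / (base + new) applied to the averaged accuracies. The function name `harmonic_mean` and the dictionary layout are illustrative choices. Only the baseline rows are checked, since this README does not state how the HPT harmonic mean was aggregated.

```python
def harmonic_mean(base: float, new: float) -> float:
    """Harmonic mean of base-class and new-class accuracy."""
    return 2 * base * new / (base + new)


# (base accuracy, new accuracy, reported harmonic mean) from the table above
baselines = {
    "CLIP": (69.34, 74.22, 71.70),
    "CoOp": (82.69, 63.22, 71.66),
    "CoCoOp": (80.47, 71.69, 75.83),
    "MaPLe": (82.28, 75.14, 78.55),
}

# Recompute each harmonic mean and round to two decimals, as in the table
recomputed = {
    name: round(harmonic_mean(base, new), 2)
    for name, (base, new, _reported) in baselines.items()
}

for name, (_base, _new, reported) in baselines.items():
    print(f"{name}: recomputed {recomputed[name]:.2f}, reported {reported:.2f}")
```

The harmonic mean penalizes a large gap between base and new accuracy, which is why CoOp scores lower than its base accuracy alone would suggest.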

Cross-Dataset Evaluation

The results below show accuracy on the source dataset ImageNet and on the other 10 target datasets, averaged over 3 seeds.

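| Method | ImageNet | Caltech | Pets | Cars | Flowers | Food | Aircraft | SUN397 | DTD | EuroSAT | UCF | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CLIP | 71.51 | 93.70 | 89.14 | 64.51 | 68.71 | 85.30 | 18.47 | 64.15 | 41.92 | 46.39 | 66.55 | 63.88 |
| CoCoOp | 71.02 | 94.43 | 90.14 | 65.32 | 71.88 | 86.06 | 22.94 | 67.36 | 45.73 | 45.37 | 68.21 | 65.74 |
| MaPLe | 70.72 | 93.53 | 90.49 | 65.57 | 72.23 | 86.20 | 24.74 | 67.01 | 46.49 | 48.06 | 68.69 | 66.30 |
| HPT | 71.72 | 94.20 | 92.63 | 66.33 | 74.84 | 86.21 | 25.68 | 68.75 | 50.87 | 47.36 | 70.50 | 67.74 |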
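
The Average column appears to be the mean over the 10 target datasets, with the ImageNet source column excluded. The short check below reproduces the HPT value under that assumption; the variable names are illustrative.

```python
from statistics import mean

# HPT row from the table above: source accuracy, then the 10 target datasets
hpt_source = 71.72  # ImageNet, not included in the average
hpt_targets = [94.20, 92.63, 66.33, 74.84, 86.21, 25.68, 68.75, 50.87, 47.36, 70.50]

# Mean over the target datasets only, rounded to two decimals as in the table
target_average = round(mean(hpt_targets), 2)
print(f"HPT target average: {target_average:.2f}")
```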

Domain Generalization

The results below show accuracy on the source dataset ImageNet and on 4 ImageNet-variant datasets, averaged over 3 seeds.

| Method | ImageNet | ImageNetV2 | ImageNet-S | ImageNet-A | ImageNet-R | Average |
|---|---|---|---|---|---|---|
| CLIP | 66.73 | 60.83 | 46.15 | 47.77 | 73.96 | 57.17 |
| CoOp | 71.51 | 64.20 | 47.99 | 49.71 | 75.21 | 59.28 |
| CoCoOp | 71.02 | 64.07 | 48.75 | 50.63 | 76.18 | 59.90 |
| MaPLe | 70.72 | 64.07 | 49.15 | 50.90 | 76.98 | 60.26 |
| HPT | 71.72 | 65.25 | 49.36 | 50.85 | 77.38 | 60.71 |

🛠️ Installation

For installation and other package requirements, please follow the instructions detailed in INSTALL.md.

🗂️ Data preparation

Please follow the instructions in DATASETS.md to prepare all datasets.

🧪 Training and Evaluation

Please refer to RUN.md for detailed instructions on training and evaluation.

Citation

If you use our work, please consider citing:

@misc{wang2023learning,
      title={Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models}, 
      author={Yubin Wang and Xinyang Jiang and De Cheng and Dongsheng Li and Cairong Zhao},
      year={2023},
      eprint={2312.06323},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

📧 Contact

If you have any questions, please create an issue on this repository or contact us at [email protected] or [email protected].

😃 Acknowledgments

Our code is based on the CoCoOp and CoOp repositories. We thank the authors for releasing their code. If you use our model and code, please consider citing these works as well.
