DADA++: Dual Alignment Domain Adaptation for Unsupervised Video-Text Retrieval

Xiaoshuai Hao, Haimei Zhao, Hui Zhang, Weiming Li, Rong Yin, Wanqian Zhang, and Jing Zhang

News | Abstract | Method | Results | Preparation | Code | Acknowledgments | Citing

News

  • (2023/9/27) DADA++ is coming.
  • (2023/2/28) Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval (DADA) was accepted to CVPR 2023.

Abstract

Video-text retrieval aims at returning the most semantically relevant videos given a textual query, and is a thriving topic in both the computer vision and natural language processing communities. This paper focuses on a more challenging task, i.e., Unsupervised Domain Adaptation Video-text Retrieval (UDAVR), wherein training and testing data come from different distributions. Previous approaches are mostly derived from classification-based domain adaptation methods, which are neither multi-modal nor suitable for retrieval tasks. They merely alleviate the domain shift while overlooking the pairwise misalignment issue in the target domain, i.e., there exist no semantic relationships between target videos and texts. To tackle this, we propose a novel method named Dual Alignment Domain Adaptation (DADA++). Specifically, we first introduce cross-modal semantic embedding to generate discriminative source features in a joint embedding space. In addition, we use cross-modal domain adaptation to balance the minimization of domain shift in a smooth manner. Furthermore, we empirically identify the pairwise misalignment in the target domain, and thus propose the integrated Dual Alignment Consistency (iDAC). The proposed iDAC adaptively aligns the video-text pairs that are more likely to be relevant in the target domain by verifying their cross-modal semantic proximity reciprocally in both hard and soft manners. As a result, the number of positive pairs grows progressively, and noisy pairs can potentially be aligned over the course of training. We also provide insights into the function of DADA++ through the lens of domain adaptation, explaining its superiority theoretically. Compared with state-of-the-art methods, DADA++ achieves 9.4% and 8.5% relative improvements in R@1 in the TGIF→MSR-VTT and TGIF→MSVD settings, respectively, demonstrating its superior performance.

Method

The schematic diagram of DADA++

the framework figure

Illustration of integrated Dual Alignment Consistency (iDAC)

the iDAC figure
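
Since the official code is not yet released, the following is only a minimal sketch of what a "hard" reciprocal check between target videos and texts could look like. It keeps a video-text pair only when each is the other's top-1 match in a similarity matrix. The function name `mutual_nearest_pairs`, the matrix layout (`sim[i][j]` is the similarity between target video `i` and target text `j`), and the top-1 rule itself are assumptions made for illustration. This is not the authors' iDAC implementation, and the soft variant is not covered.

```python
from typing import List, Tuple


def mutual_nearest_pairs(sim: List[List[float]]) -> List[Tuple[int, int]]:
    """Return (video_idx, text_idx) pairs that are each other's top-1 match.

    sim[i][j] is a hypothetical similarity between target video i and
    target text j (for example, cosine similarity in a joint space).
    """
    if not sim or not sim[0]:
        return []
    n_videos, n_texts = len(sim), len(sim[0])
    # Best text for every video (row-wise argmax).
    best_text = [max(range(n_texts), key=lambda j: sim[i][j])
                 for i in range(n_videos)]
    # Best video for every text (column-wise argmax).
    best_video = [max(range(n_videos), key=lambda i: sim[i][j])
                  for j in range(n_texts)]
    # Keep a pair only when both retrieval directions agree.
    return [(i, j) for i, j in enumerate(best_text) if best_video[j] == i]


# Toy example: video 1 prefers text 0, but text 0 prefers video 0,
# so only the pairs (0, 0) and (2, 2) pass the reciprocal check.
toy_sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
pairs = mutual_nearest_pairs(toy_sim)
```

In a training loop, pairs selected this way could serve as pseudo-positive pairs in the target domain; how they are actually weighted or combined with soft alignment is not specified in this README.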

Results

the result figure

Quantitative results

quantitative figure
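
The results are reported in R@1 (Recall at rank 1). As a reference for readers who want to evaluate their own similarity matrices, here is a small sketch of text-to-video Recall@K. It assumes that `sim[i][j]` is the similarity between text query `i` and video `j`, and that the ground-truth video for query `i` has index `i`. The function name `recall_at_k` and this data layout are illustrative assumptions, not the evaluation script used for the paper.

```python
from typing import List


def recall_at_k(sim: List[List[float]], k: int = 1) -> float:
    """Text-to-video R@K in percent.

    Assumes query i's ground-truth video is at index i of row i.
    """
    if not sim:
        return 0.0
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank of the ground truth: how many videos score
        # strictly higher than the ground-truth video for this query.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy example with three queries: queries 0 and 2 rank their
# ground-truth video first, query 1 ranks it second.
toy_scores = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
r1 = recall_at_k(toy_scores, k=1)
r2 = recall_at_k(toy_scores, k=2)
```

Ties are resolved in favor of the ground-truth video here; a stricter evaluation might break ties differently.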

Preparation

Prerequisites

The code is built with the following libraries:

  • Python 3.7
  • PyTorch 1.4.0
  • Transformers 3.1.0
  • NumPy 1.18.1
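
To check an environment against the versions listed above, a small helper like the following may be useful. It is a sketch only: the import names (`torch`, `transformers`, `numpy`) are the usual ones for these libraries, while the helper names `version_mismatches` and `installed_versions` are hypothetical and not part of this repository.

```python
from typing import Dict, Iterable, List

# Versions listed under Prerequisites, keyed by import name.
PINNED = {"torch": "1.4.0", "transformers": "3.1.0", "numpy": "1.18.1"}


def version_mismatches(installed: Dict[str, str],
                       pinned: Dict[str, str] = PINNED) -> List[str]:
    """Return human-readable messages for missing or mismatched packages."""
    problems = []
    for name, wanted in pinned.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed (expected {wanted})")
        # Local build suffixes such as "+cu100" are ignored when comparing.
        elif found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


def installed_versions(names: Iterable[str]) -> Dict[str, str]:
    """Collect __version__ for every package in names that can be imported."""
    import importlib

    found = {}
    for name in names:
        try:
            found[name] = str(importlib.import_module(name).__version__)
        except ImportError:
            pass  # Reported as "not installed" by version_mismatches.
    return found
```

Calling `version_mismatches(installed_versions(PINNED))` returns an empty list when the environment matches the versions above.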

Data Preparation

Code

Code coming soon...

Acknowledgments

Some components of this code implementation are adopted from HGR, GPO, CE, MMT, CLIP4Clip, and CLIP2Video. We sincerely appreciate their contributions.

Citing

If you find DADA useful in your research, please kindly consider citing the following paper.

@inproceedings{hao2023dual,
  title={Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval},
  author={Hao, Xiaoshuai and Zhang, Wanqian and Wu, Dayan and Zhu, Fei and Li, Bo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18962--18972},
  year={2023}
}

License

This project is licensed under the Apache-2.0 License.
