ใ€NeurIPS'2022 ๐Ÿ”ฅใ€‘Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

This repository contains the implementation of the NeurIPS 2022 paper "Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations".

💡 I also have other video-language projects that may interest you ✨.

Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Accepted by CVPR 2023 (Highlight) | [HBI Code]
Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Accepted by ICCV 2023 | [DiffusionRet Code]
Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Xiangyang Ji, Chang Liu, Li Yuan, Jie Chen

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Accepted by IJCAI 2023 | [DiCoSA Code]
Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen

📣 Updates

  • [2023/04/12]: We provide download links for the processed datasets, including MSRVTT, MSVD, ActivityNet Captions, and DiDeMo. (See EMCL-Net)
  • [2023/04/10]: Add MSVD, LSMDC, ActivityNet Captions, and DiDeMo datasets (See EMCL-Net).
  • [2023/01/12]: Our approach performs better when trained with more GPUs (46.8 -> 48.2 on the MSR-VTT dataset when going from 2 to 8 GPUs), so we recommend using more GPUs.

  • [2022/12/14]: Add the code of EMCL-Net.
  • [2022/11/21]: Release code for reimplementing the experiments in the paper.

🚀 Quick Start

Datasets

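| Datasets    | Google Cloud | Baidu Yun | Peking University Yun |
|-------------|--------------|-----------|-----------------------|
| MSR-VTT     | Download     | Download  | Download              |
| MSVD        | Download     | Download  | Download              |
| ActivityNet | TODO         | Download  | Download              |
| DiDeMo      | TODO         | Download  | Download              |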

Model Zoo

| Checkpoint  | Google Cloud | Baidu Yun | Peking University Yun |
|-------------|--------------|-----------|-----------------------|
| MSR-VTT     | Download     | TODO      | Download              |
| ActivityNet | Download     | Download  | Download              |

Text-video Retrieval

Video-question Answering

📕 Overview

Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project video and text features into a common latent space according to the semantic similarities of text-video pairs. However, the shared latent spaces learned this way are often not optimal, and the modality gap between visual and textual representations cannot be fully eliminated. In this paper, we propose Expectation-Maximization Contrastive Learning (EMCL) to learn compact video-and-language representations.
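To make the contrastive baseline mentioned above concrete, here is a minimal sketch of a generic CLIP-style symmetric contrastive loss over a batch of matched text-video pairs. It is not the EMCL method and not code from this repository: the function names, the pure-Python list representation, and the temperature of 0.07 are all illustrative assumptions, and the features are assumed to be non-zero vectors.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(text_feats, video_feats):
    """Cosine similarity between every text (rows) and every video (columns)."""
    texts = [l2_normalize(t) for t in text_feats]
    videos = [l2_normalize(v) for v in video_feats]
    return [[sum(a * b for a, b in zip(t, v)) for v in videos] for t in texts]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_feats, video_feats, temperature=0.07):
    """Average of text-to-video and video-to-text losses (hypothetical helper).

    Pair i of text_feats and video_feats is assumed to be the matched pair;
    every other combination in the batch is treated as a negative.
    """
    sim = similarity_matrix(text_feats, video_feats)
    logits_t2v = [[s / temperature for s in row] for row in sim]
    logits_v2t = [list(col) for col in zip(*logits_t2v)]  # transpose
    return 0.5 * (cross_entropy_diagonal(logits_t2v)
                  + cross_entropy_diagonal(logits_v2t))


if __name__ == "__main__":
    texts = [[1.0, 0.0], [0.0, 1.0]]
    videos = [[1.0, 0.1], [0.1, 1.0]]
    print("aligned pairs:", symmetric_contrastive_loss(texts, videos))
    print("swapped pairs:", symmetric_contrastive_loss(texts, videos[::-1]))
```

The loss is small when each text is most similar to its own video and grows when the pairing is broken, which is the sense in which such objectives align the two modalities in a shared space.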

[Figure: motivation]

📚 Method

[Figure: EMCL]

📌 Citation

If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:

@inproceedings{
jin2022expectationmaximization,
title={Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations},
author={Peng Jin and JinFa Huang and Fenglin Liu and Xian Wu and Shen Ge and Guoli Song and David A. Clifton and Jie Chen},
booktitle={Advances in Neural Information Processing Systems},
volume={35},
pages={30291--30306},
editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
year={2022}
}

๐ŸŽ—๏ธ Acknowledgments

Our code is based on MMT, CLIP, CLIP4Clip, DRL, and CLIP2Video. We sincerely appreciate their contributions.
