Task:
- Temporal Moment Localization via Language: given a query, find the corresponding moment in a given video. (major focus of this repo)
Markdown format:
- [Paper Name](link) - Author 1 et al, `Conference Year`. [[code]](link)
- 2020/07/27: started the repo.
- Papers before 2020 are mainly collected by muketong.
- to be updated ...
- grounding, retrieval, localization
- None.

Before 2017:
- Grounded Language Learning from Video Described with Sentences - H. Yu et al, `ACL 2013`.
- Visual Semantic Search: Retrieving Videos via Complex Textual Queries - Dahua Lin et al, `CVPR 2014`.
- Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework - R. Xu et al, `AAAI 2015`.
- Unsupervised Alignment of Actions in Video with Text Descriptions - Y. C. Song et al, `IJCAI 2016`.

2017:
- Localizing Moments in Video with Natural Language - Lisa Anne Hendricks et al, `ICCV 2017`. [code]
- TALL: Temporal Activity Localization via Language Query - Jiyang Gao et al, `ICCV 2017`. [code]
- !(Still on arXiv as of 20200609) Where to Play: Retrieval of Video Segments using Natural-Language Queries - S. Lee et al, `arXiv 2017`.

2018:
- Attentive Moment Retrieval in Videos - M. Liu et al, `SIGIR 2018`.
- Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos - B. Liu et al, `ECCV 2018`.
- (Video Retrieval + Grounding) Find and Focus: Retrieve and Localize Video Events with Natural Language Queries - Dian Shao et al, `ECCV 2018`.
- Temporally Grounding Natural Sentence in Video - J. Chen et al, `EMNLP 2018`.
- Localizing Moments in Video with Temporal Language - Lisa Anne Hendricks et al, `EMNLP 2018`.

2019:

Supervised:
- MAC: Mining Activity Concepts for Language-based Temporal Localization - Runzhou Ge et al, `WACV 2019`. [code]
- Multilevel Language and Vision Integration for Text-to-Clip Retrieval - H. Xu et al, `AAAI 2019`. [code]
- Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos - He, Dongliang et al, `AAAI 2019`.
- To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression - Y. Yuan et al, `AAAI 2019`. [code]
- Semantic Proposal for Activity Localization in Videos via Sentence Query - S. Chen et al, `AAAI 2019`.
- Localizing natural language in videos - J. Chen et al, `AAAI 2019`.
- ExCL: Extractive Clip Localization Using Natural Language Descriptions - S. Ghosh et al, `NAACL 2019`.
- Cross-Modal Video Moment Retrieval with Spatial and Language-Temporal Attention - B. Jiang et al, `ICMR 2019`. [code]
- Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model - W. Wang et al, `CVPR 2019`.
- MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment - Da Zhang et al, `CVPR 2019`.
- Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos - Zhu Zhang et al, `SIGIR 2019`. [code]
- Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos - Yitian Yuan et al, `NIPS 2019`. [code]
- DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization - Chujie Lu et al, `EMNLP 2019`.
- !(Still on arXiv as of 20200609) Temporal Localization of Moments in Video Collections with Natural Language - V. Escorcia et al, `arXiv 2019`.

Weakly Supervised:
- Weakly Supervised Video Moment Retrieval From Text Queries - N. C. Mithun et al, `CVPR 2019`.
- Weakly-supervised spatio-temporally grounding natural sentence in video - Zhenfang Chen et al, `ACL 2019`. [code]
- WSLLN: Weakly Supervised Natural Language Localization Networks - M. Gao et al, `EMNLP 2019`.

2020:

Supervised:
- Moment Retrieval via Cross-Modal Interaction Networks With Query Reconstruction - Zhijie Lin et al, `TIP 2020`.
- Rethinking the Bottom-Up Framework for Query-based Video Localization - Long Chen et al, `AAAI 2020`.
- Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction - Jingwen Wang et al, `AAAI 2020`. [code]
- Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language - Songyang Zhang et al, `AAAI 2020`. [code]
- Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video - Jie Wu et al, `AAAI 2020`. [code]
- Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention - C. R. Opazo et al, `WACV 2020`. [code]
- Local-Global Video-Text Interactions for Temporal Grounding - Mun Jonghwan et al, `CVPR 2020`. [code]
- Dense Regression Network for Video Grounding - Zeng Runhao et al, `CVPR 2020`. [code]
- Tripping through time: Efficient Localization of Activities in Videos - Meera Hahn et al, `BMVC 2020`.
- Span-based Localizing Network for Natural Language Video Localization - Hao Zhang et al, `ACL 2020`. [code]
- Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language - Shaoxiang Chen et al, `ECCV 2020`. [code]
- Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos - Shaoxiang Chen et al, `ECCV 2020`.
- Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization - Daizong Liu et al, `MM 2020`. [code]
- Fine-grained Iterative Attention Network for Temporal Language Localization in Videos - Xiaoye Qu et al, `MM 2020`.
- Language Guided Networks for Cross-modal Moment Retrieval - Kun Liu et al, `arXiv`.

Weakly Supervised:
- Weakly-Supervised Video Moment Retrieval via Semantic Completion Network - Zhijie Lin et al, `AAAI 2020`.
- Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos - Zhu Zhang et al, `MM 2020`.
- VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval - Minuk Ma et al, `ECCV 2020`.

Conferences to be updated:
- MM 2020 (some papers are added; waiting for the proceedings)
- EMNLP 2020 (waiting for the camera-ready versions)
- ICCV 2020