T-PAMI, 2024
Linhui Xiao
·
Xiaoshan Yang
·
Yaowei Wang
·
Changsheng Xu
This repo records, tracks, and benchmarks recent visual grounding methods to supplement our survey.
If you find any missing work or have suggestions (papers, implementations, or other resources), feel free to open a pull request.
We will add the missing papers to this repo as soon as possible.
- You are welcome to open an issue or PR (pull request) for your visual grounding work!
- Note: due to the huge number of papers on arXiv, we are unable to cover all of them in our survey. You can submit a PR directly to this repo, and we will record the work in the next version update of our survey.
- This repo records papers available up to 2024/7/30.
- A comprehensive survey of Visual Grounding, including Referring Expression Comprehension and Phrase Grounding.
- It includes the most recent grounding multi-modal large language models and vision-language pre-trained model grounding-transfer works.
- We list detailed results for the most representative works and give a fairer, clearer comparison of different approaches.
- We provide a list of future research insights.