
This repository stores the code and work from the temporal reasoning project in NLP, carried out from October 2022 to October 2023. Sincere gratitude for all the kindness and help offered by my professor Juntao Li and my senior Zhaochen Su, who achieved great success at EMNLP 2023. My own work has been submitted to COLING 2024.



Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change

This repository contains part of the code and pre-trained models for our paper "Awareness of Time: Video-Language Models Embedding with Temporal Reasoning", which has been submitted to LREC-COLING 2024. The complete code will be released after the conference announces the acceptance results.

Contents

  • Abstract
  • Overview
  • Datasets
  • Baseline
  • Train
  • Results

Abstract

Video-language pre-training has significantly improved the performance of diverse downstream tasks related to video and language. However, existing approaches often directly adapt image-language pre-training paradigms to video-language tasks, neglecting the unique temporal characteristics of videos. In this paper, we present a novel temporal-aware video-language pre-training framework. It introduces two innovative pre-training tasks to enhance temporal awareness in multi-modal representations, incorporating fine-grained temporal moment information and temporal contextual relations between video-text pairs. First, we propose a cross-modal moment exploration task, leveraging paired texts to uncover detailed video moment representations. Then, using the acquired moment representations, we capture inherent temporal contextual relations by aligning video-text pairs across different time resolutions in a multi-modal temporal relation exploration task. Additionally, we introduce a shuffling test to assess the temporal reliance of datasets and the efficacy of video-language pre-training. This framework aims to fully exploit the temporal dimension in video data for more effective pre-training and improved downstream task performance.

Overview

  • We show, through controlled experiments on synthetic data and several evaluations on real datasets, that existing video-language models have difficulty associating temporal order in video and language.

  • We propose a temporal reasoning video-language pre-training framework with both video-language understanding and generation capabilities.

  • We introduce temporal reasoning pre-training tasks that produce temporally aware multi-modal representations by modeling fine-grained temporal moment information and capturing the temporal contextual relations between moments and events.
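The shuffling test mentioned above can be pictured as follows: score a video-text pair with the frames in their original order, then with the frames randomly permuted, and compare. A minimal sketch, where `score_fn` and `toy_score` are hypothetical stand-ins (not this repository's actual model or API):

```python
import random

def shuffle_frames(frames, seed=0):
    """Return a randomly permuted copy of the frame sequence."""
    rng = random.Random(seed)
    shuffled = list(frames)
    rng.shuffle(shuffled)
    return shuffled

def temporal_reliance(score_fn, frames, text, n_trials=10):
    """Shuffling test: compare the video-text score on the original
    frame order against scores on shuffled orders. A large positive
    gap suggests the score genuinely depends on temporal order."""
    original = score_fn(frames, text)
    shuffled_scores = [
        score_fn(shuffle_frames(frames, seed=i), text) for i in range(n_trials)
    ]
    mean_shuffled = sum(shuffled_scores) / n_trials
    return original - mean_shuffled

# Toy order-sensitive score: fraction of adjacent frame pairs in ascending order.
def toy_score(frames, text):
    return sum(1.0 for a, b in zip(frames, frames[1:]) if a < b) / (len(frames) - 1)

gap = temporal_reliance(toy_score, frames=list(range(8)), text="a ball rolls left to right")
```

A model (or dataset) whose scores barely change under shuffling has low temporal reliance; the toy scorer above, being order-sensitive, yields a clearly positive gap.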

Datasets

We pre-train our model on WebVid-2M, a webly-sourced video dataset with 2.5M video-text pairs, and Google Conceptual Captions (CC3M), an image-text dataset with 3M image-text pairs. Unlike previous methods, we do not pre-train on large-scale video-text datasets such as HowTo100M (136M video-text pairs) or YT-Temporal-180M, due to their heavy computational cost.


We evaluate our pre-trained model on several video-language benchmarks covering video-text retrieval, video question answering, and video captioning tasks. Specifically, video question answering (VideoQA) can be categorized into Multiple-Choice (MC) and Open-Ended (OE) settings. The evaluation datasets are briefly summarized below.

• Video-Text Retrieval: MSRVTT, ActivityNet Captions and SSv2-Template;

• VideoQA (MC): TGIF-Action, TGIF-Transition, MSRVTT-MC and NExT-QA;

• VideoQA (OE): MSRVTT-QA, MSVD-QA and ActivityNet-QA;

• Video Captioning: MSRVTT.
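Video-text retrieval benchmarks like those above are conventionally reported with Recall@K. A generic, self-contained sketch of that metric (not this repository's actual evaluation code):

```python
def recall_at_k(similarity, k):
    """similarity[i][j] = score of query i against candidate j, where
    the ground-truth match for query i is candidate i. Returns the
    fraction of queries whose true match ranks within the top k."""
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)

# Small example: queries 0 and 1 rank their match first; query 2 does not.
sim = [[0.9, 0.1, 0.2],
       [0.3, 0.8, 0.1],
       [0.2, 0.4, 0.1]]
```

Here `recall_at_k(sim, 1)` is 2/3, since query 2's true candidate is outranked at k=1.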

Baseline

| Post-pretraining Dataset | $\alpha_{\text{same}}$ | $\alpha_{\text{cross}}$ | $\beta$ | Download link |
| --- | --- | --- | --- | --- |
| TEMPO-TL | 1.0 | 1.0 | 1.0 | Link |
| ActivityNet | 1.0 | 1.0 | 0.0 | Link |
| Charades | 1.0 | 1.0 | 0.0 | Link |
| Charades-Ego | 1.0 | 1.0 | 1.0 | Link |
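The three hyperparameters in the table suggest a weighted combination of training objectives. A minimal sketch of such a weighting, with purely hypothetical loss-term names (the repository's actual objective is not released yet):

```python
def combined_loss(loss_same, loss_cross, loss_extra,
                  alpha_same=1.0, alpha_cross=1.0, beta=1.0):
    """Weighted sum of hypothetical loss terms, mirroring the table's
    hyperparameters: setting beta=0.0 (as for ActivityNet and Charades)
    simply disables the third term."""
    return alpha_same * loss_same + alpha_cross * loss_cross + beta * loss_extra
```

For example, `combined_loss(1.0, 2.0, 3.0, beta=0.0)` drops the third term and returns 3.0.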

Train

Results

These two parts will be released after the conference announces the acceptance results.
