a2summ's Introduction

Align and Attend: Multimodal Summarization with Dual Contrastive Losses (CVPR 2023)

The official repository of our paper "Align and Attend: Multimodal Summarization with Dual Contrastive Losses".

[Figure: teaser]

Model Overview

[Figure: model overview]

Requirements

You can install the conda environment by running:

conda create -n a2summ python=3.8.13
conda activate a2summ
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install tensorboard
pip install rouge-score==0.1.2
pip install scipy ortools h5py pyyaml
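After installation, a quick sanity check can confirm that every package is visible in the active environment. The snippet below is a minimal sketch and not part of the repository: the function name `find_missing` and the `REQUIRED` mapping are illustrative, and the import names (for example `rouge_score` for `rouge-score` and `yaml` for `pyyaml`) are the usual ones for these packages.

```python
import importlib.util

# pip package name -> import name (illustrative mapping, not from the repo)
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "tensorboard": "tensorboard",
    "rouge-score": "rouge_score",
    "scipy": "scipy",
    "ortools": "ortools",
    "h5py": "h5py",
    "pyyaml": "yaml",
}


def find_missing(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED.values())
    if missing:
        print("Missing packages (import names):", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it inside the activated `a2summ` environment; any name it reports still needs to be installed.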

Dataset

We evaluate A2Summ on two multimodal summarization with multimodal output (MSMO) datasets (CNN, Daily_Mail) and two standard video summarization datasets (SumMe, TVSum). We also collected BLiSS, a large-scale multimodal summarization dataset that consists of livestream videos and transcripts with annotated summaries. Before running the code, please download the pre-processed datasets from the Google Drive link. Unzip the archive under the data/ folder and make sure the directory structure matches the layout below.

 data
 ├── BLiSS
 │   ├── annotation
 │   └── feature
 ├── CNN
 │   ├── annotation
 │   └── feature
 ├── Daily_Mail
 │   ├── annotation
 │   └── feature
 ├── SumMe
 │   ├── caption
 │   ├── feature
 │   └── splits.yml
 └── TVSum
     ├── caption
     ├── feature
     └── splits.yml
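The layout above can be checked before training with a short script. This is a sketch under stated assumptions, not a tool shipped with the repository: the helper name `missing_entries` and the `EXPECTED` table are hypothetical, and the check only tests that each listed entry exists, not that its contents are valid.

```python
from pathlib import Path

# Expected entries per dataset folder, copied from the layout above.
EXPECTED = {
    "BLiSS": ["annotation", "feature"],
    "CNN": ["annotation", "feature"],
    "Daily_Mail": ["annotation", "feature"],
    "SumMe": ["caption", "feature", "splits.yml"],
    "TVSum": ["caption", "feature", "splits.yml"],
}


def missing_entries(root="data"):
    """Return 'dataset/entry' strings for every expected path absent under root."""
    root = Path(root)
    return [
        f"{dataset}/{entry}"
        for dataset, entries in EXPECTED.items()
        for entry in entries
        if not (root / dataset / entry).exists()
    ]


if __name__ == "__main__":
    missing = missing_entries("data")
    if missing:
        print("Missing under data/:")
        for item in missing:
            print("  " + item)
    else:
        print("data/ matches the expected layout.")
```

An empty result means every folder and file named in the layout is present.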

BLiSS Dataset

For the BLiSS dataset, due to copyright restrictions, we only provide the extracted video/thumbnail features instead of the original videos/thumbnails. If you need access to the original videos, please email me ([email protected]) for the public URLs of each video.

Running

Training

We train the model on a single GTX 1080 Ti GPU. To train the model on a given dataset, run the following command, replacing ${dataset} with the dataset name.

python train.py --dataset ${dataset}

Testing

First, download the checkpoints into the "saved_model" directory, then pass the checkpoint path for the chosen dataset via the --checkpoint flag.

python train.py --dataset ${dataset} \
    --test --checkpoint saved_model/${dataset}
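To run the training or testing command over several datasets, the two invocations above can be assembled programmatically. The sketch below is illustrative only: `build_command` and `DATASETS` are hypothetical names, and it assumes that the values accepted by `--dataset` match the folder names under data/, which this README does not state explicitly. It prints the commands rather than executing them.

```python
# ASSUMPTION: --dataset accepts the folder names used under data/.
DATASETS = ["BLiSS", "CNN", "Daily_Mail", "SumMe", "TVSum"]


def build_command(dataset, test=False, checkpoint_dir="saved_model"):
    """Build the train.py argument list shown above for one dataset."""
    cmd = ["python", "train.py", "--dataset", dataset]
    if test:
        # Testing reuses train.py with --test and a per-dataset checkpoint path.
        cmd += ["--test", "--checkpoint", f"{checkpoint_dir}/{dataset}"]
    return cmd


if __name__ == "__main__":
    # Dry run: print each command. To execute one, pass the list to
    # subprocess.run(cmd, check=True) from the repository root.
    for name in DATASETS:
        print(" ".join(build_command(name)))
        print(" ".join(build_command(name, test=True)))
```

Keeping the command as a list rather than a single string avoids shell quoting issues if it is later handed to `subprocess.run`.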

Citation

If you find our code or our paper useful for your research, please star this repo and cite the following paper:

@inproceedings{he2023a2summ,
  title = {Align and Attend: Multimodal Summarization with Dual Contrastive Losses},
  author={He, Bo and Wang, Jun and Qiu, Jielin and Bui, Trung and Shrivastava, Abhinav and Wang, Zhaowen},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2023}
}

Acknowledgement

We referenced the repos below for the code
