Reinforced Video Captioning with Entailment Rewards (EMNLP 2017)

This repository contains a re-implementation, with improved results, of the paper: Reinforced Video Captioning with Entailment Rewards (EMNLP 2017)

This code has been tested with Python 3.6 and PyTorch 0.3.

Setup:

Install all the required packages listed in the requirements.txt file.

Datasets:

Download the ResNet-152 frame-level features and the ResNeXt-101 motion features for the MSR-VTT videos.

Download the captions and vocabulary data for MSR-VTT, and place the downloaded data in the 'data' folder.

Evaluation:

Clone the code from here to set up the evaluation metrics, and place it in the parent directory of this repository. Note that this is also required during training, since the validation scores are logged to TensorBoard.

Run Code:

To train the Baseline-XE model:

python main.py --model_name "model_name"

To train the CIDEr-RL model:

python main.py --model_name "model_name" --load_path "path_to_baseline_model_folder" --lr 0.00001 --reward_type CIDEr --max_epoch 40 --loss_function xe+rl
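The `--loss_function xe+rl` flag indicates that the RL stage combines the cross-entropy (XE) objective with the RL objective. Below is a minimal sketch of such a mixed objective. It assumes a simple weighted sum controlled by a hypothetical weight `gamma`. How `main.py` actually combines the two terms, and with what weight, is not documented here.

```python
from typing import Sequence


def xe_loss(gt_token_logprobs: Sequence[float]) -> float:
    """Teacher-forced cross-entropy: mean negative log-probability of the
    ground-truth caption tokens."""
    if not gt_token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(gt_token_logprobs) / len(gt_token_logprobs)


def mixed_loss(xe: float, rl: float, gamma: float) -> float:
    """Weighted mix of XE and RL losses.

    gamma is a hypothetical mixing weight (an assumption, not a flag of main.py):
    gamma = 0 gives pure XE training, gamma = 1 gives pure RL training.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return gamma * rl + (1.0 - gamma) * xe


# Example: two ground-truth tokens with log-probabilities -1.0 and -3.0.
xe = xe_loss([-1.0, -3.0])  # 2.0
loss = mixed_loss(xe, rl=0.5, gamma=0.9)
```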

To train the CIDEnt-RL model:

python main.py --model_name "model_name" --load_path "path_to_baseline_model_folder" --load_entailment_path "path_to_entailment_model_ending_with_*.pth" --lr 0.00001 --reward_type CIDEnt --max_epoch 40 --loss_function xe+rl
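CIDEnt-RL replaces the CIDEr reward with CIDEnt, which relies on the entailment model passed via `--load_entailment_path`. The EMNLP 2017 paper formulates CIDEnt as CIDEr minus a fixed penalty whenever the entailment score of the sampled caption falls below a threshold. The sketch below encodes that rule. `threshold` and `penalty` are placeholders you must set, and the values in the example are illustrative only, not this repository's settings. How the entailment scores over multiple reference captions are aggregated is also not shown.

```python
def cident_reward(cider: float, entail_prob: float,
                  threshold: float, penalty: float) -> float:
    """CIDEnt-style reward: CIDEr, penalized when the sampled caption is not
    entailed by the ground truth.

    entail_prob: entailment probability from the entailment classifier
                 (aggregation over reference captions is left to the caller).
    threshold, penalty: placeholder hyperparameters, not values from this repo.
    """
    if entail_prob < threshold:
        return cider - penalty
    return cider


# Illustrative values only.
print(cident_reward(cider=0.50, entail_prob=0.9, threshold=0.33, penalty=1.0))  # 0.5
print(cident_reward(cider=0.50, entail_prob=0.1, threshold=0.33, penalty=1.0))  # -0.5
```

The penalty is intended to make a caption that matches many reference phrases but contradicts the reference (for example, by swapping a single key word) score below a caption that is logically entailed by it.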

For testing:

python main.py --mode test --load_path "path_to_model_folder" --beam_size 5 
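The `--beam_size 5` option decodes captions with beam search. Below is a minimal, framework-agnostic sketch of beam search over a toy next-token distribution. The `step` function and the `TOY` table are hypothetical stand-ins for the captioning decoder, and this sketch makes no claim to mirror `main.py`. For example, it applies no length normalization.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]
# A step function maps a prefix to log-probabilities of candidate next tokens.
StepFn = Callable[[Seq], Dict[str, float]]


def beam_search(step: StepFn, beam_size: int, max_len: int,
                eos: str = "<eos>") -> List[Tuple[Seq, float]]:
    """Return up to beam_size (sequence, total log-prob) pairs, best first."""
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every candidate next token.
        candidates = [(prefix + (tok,), score + lp)
                      for prefix, score in beams
                      for tok, lp in step(prefix).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            if seq[-1] == eos:
                finished.append((seq, score))  # completed caption
            else:
                beams.append((seq, score))     # keep extending
            if len(beams) == beam_size:
                break
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when max_len is reached
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_size]


# Hypothetical toy "decoder": a fixed table of next-token probabilities.
TOY = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"<eos>": 0.5, "c": 0.5},
    ("b",): {"<eos>": 0.9, "c": 0.1},
}


def toy_step(prefix: Seq) -> Dict[str, float]:
    probs = TOY.get(prefix, {"<eos>": 1.0})  # unknown prefixes just end
    return {tok: math.log(p) for tok, p in probs.items()}


greedy = beam_search(toy_step, beam_size=1, max_len=3)[0]  # ("a", "<eos>"), p = 0.6 * 0.5 = 0.30
best = beam_search(toy_step, beam_size=5, max_len=3)[0]    # ("b", "<eos>"), p = 0.4 * 0.9 = 0.36
```

With `beam_size=1` the search reduces to greedy decoding and commits to `a` early. A wider beam keeps `b` alive and finds the higher-probability caption.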

Pretrained Models

Download the pretrained models for Baseline, CIDEr-RL, and CIDEnt-RL from here.

To run the pretrained models:

python main.py --mode test --load_path "path_to_model_ending_with_*.pth" --beam_size 5 

MSR-VTT Results

Running the pretrained models above should give the following results:

Model         CIDEr   BLEU-4   METEOR   ROUGE
Baseline-XE   48.2    40.8     28.1     60.7
CIDEr-RL      52.5    41.8     28.0     62.2
CIDEnt-RL     53.0    42.2     28.2     62.3

Note that our CIDEr-RL model achieves statistically significant improvements over Baseline-XE (on CIDEr, BLEU, and ROUGE), and our CIDEnt-RL model in turn achieves statistically significant improvements over CIDEr-RL (on CIDEr, BLEU, and METEOR).
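This README does not state which significance test was used. For readers who want to check such claims on their own outputs, below is a minimal sketch of one common option, a paired bootstrap test over per-video scores. It is not necessarily the test behind the numbers above. Corpus-level metrics such as CIDEr, whose IDF weights depend on the whole reference set, are only approximated by per-video scores. The score lists in the example are hypothetical.

```python
import random
from typing import Sequence


def paired_bootstrap_pvalue(scores_a: Sequence[float], scores_b: Sequence[float],
                            num_samples: int = 1000, seed: int = 0) -> float:
    """Fraction of bootstrap resamples in which system B does NOT beat system A.

    scores_a / scores_b are per-video scores for the same videos, in the same
    order. A small value suggests B's gain is unlikely to be an artifact of
    which videos happen to be in the test set.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    not_better = 0
    for _ in range(num_samples):
        # Resample videos with replacement, keeping A and B paired.
        idx = [rng.randrange(n) for _ in range(n)]
        delta = sum(scores_b[i] - scores_a[i] for i in idx)
        if delta <= 0:
            not_better += 1
    return not_better / num_samples


# Hypothetical per-video scores for two systems on the same five videos.
baseline = [0.40, 0.55, 0.30, 0.62, 0.48]
improved = [0.45, 0.57, 0.35, 0.60, 0.52]
p = paired_bootstrap_pvalue(baseline, improved, num_samples=2000)
```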

Note that the released baseline model includes several improvements (and achieves significantly better results) with respect to the results in our EMNLP 2017 paper, due to the following enhancements:

  • Better visual features: (1) ResNet-152 frame-level features (2) ResNeXt-101 motion features
  • Used the SCST approach instead of MIXER for reinforcement learning
  • Used a better entailment classifier
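SCST (self-critical sequence training) uses the reward of the model's own greedy-decoded caption as the baseline for the REINFORCE gradient, so no separate learned baseline is needed. Below is a framework-free sketch of the per-batch loss. It assumes the per-token log-probabilities of each sampled caption and the rewards (for example CIDEr or CIDEnt) are already computed. In PyTorch, the rewards would be tensors without gradient, so gradients flow only through the log-probabilities.

```python
from typing import Sequence


def scst_loss(sample_logprobs: Sequence[Sequence[float]],
              sample_rewards: Sequence[float],
              greedy_rewards: Sequence[float]) -> float:
    """Mean self-critical policy-gradient loss over a batch.

    sample_logprobs[i]: log p(w_t | w_<t, video) for each token of sampled caption i.
    sample_rewards[i]:  reward (e.g. CIDEr or CIDEnt) of sampled caption i.
    greedy_rewards[i]:  reward of the greedy-decoded caption for the same video,
                        used as the baseline.
    """
    n = len(sample_logprobs)
    if n == 0 or n != len(sample_rewards) or n != len(greedy_rewards):
        raise ValueError("need equal-length, non-empty batch inputs")
    total = 0.0
    for logps, r_sample, r_greedy in zip(sample_logprobs, sample_rewards, greedy_rewards):
        advantage = r_sample - r_greedy  # > 0 when the sample beats greedy decoding
        # REINFORCE with a baseline: -(r(sample) - r(greedy)) * log p(sample)
        total += -advantage * sum(logps)
    return total / n
```

Minimizing this loss raises the probability of sampled captions that score better than the model's own greedy output and lowers it for those that score worse.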

References

If you find this code helpful, please consider citing the following papers:

@inproceedings{pasunuru2017reinforced,
    title={Reinforced Video Captioning with Entailment Rewards},
    author={Pasunuru, Ramakanth and Bansal, Mohit},
    booktitle={EMNLP},
    year={2017}
}

@inproceedings{pasunuru2017multi,
    title={Multi-Task Video Captioning with Video and Entailment Generation},
    author={Pasunuru, Ramakanth and Bansal, Mohit},
    booktitle={ACL},
    year={2017}
}
