Multilevel Language and Vision Integration for Text-to-Clip Retrieval

Code released by Huijuan Xu (Boston University).

Introduction

We address the problem of text-based activity retrieval in video. Given a sentence describing an activity, our task is to retrieve matching clips from an untrimmed video. Our model learns a fine-grained similarity metric for retrieval and uses visual features to modulate the processing of query sentences at the word level in a recurrent neural network. A multi-task loss is also employed by adding query re-generation as an auxiliary task.

License

Our code is released under the MIT License (refer to the LICENSE file for details).

Citing

If you find our paper useful in your research, please consider citing:

@inproceedings{xu2019multilevel,
title={Multilevel Language and Vision Integration for Text-to-Clip Retrieval.},
author={Xu, Huijuan and He, Kun and Plummer, Bryan A. and Sigal, Leonid and Sclaroff, Stan and Saenko, Kate},
booktitle={AAAI},
year={2019}
}

Contents

  1. Installation
  2. Preparation
  3. Train Proposal Network
  4. Extract Proposal Features
  5. Training
  6. Testing

Installation:

  1. Clone the Text-to-Clip_Retrieval repository.

    git clone --recursive git@github.com:VisionLearningGroup/Text-to-Clip_Retrieval.git
  2. Build Caffe3d with pycaffe (see: Caffe installation instructions).

    Note: Caffe must be built with Python support!

    cd ./caffe3d

    # If you have all of the requirements installed and your Makefile.config
    # in place, then simply do:
    make -j8 && make pycaffe
  3. Build lib folder.

    cd ./lib
    make
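
After the build, it can help to confirm that pycaffe is importable before moving on. The following is a minimal sketch and not part of the repository: it assumes the Python bindings end up in ./caffe3d/python (the usual layout for Caffe builds), and the helper name `pycaffe_available` is hypothetical.

```python
import importlib.util
import os
import sys


def pycaffe_available(caffe_root="./caffe3d"):
    """Return True if a `caffe` module is found once caffe_root/python is on sys.path.

    Assumption: the bindings are built into <caffe_root>/python, as in
    standard Caffe; adjust the path if your checkout differs.
    """
    python_dir = os.path.abspath(os.path.join(caffe_root, "python"))
    if not os.path.isdir(python_dir):
        return False
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    # find_spec only locates the module; it does not import or run it.
    return importlib.util.find_spec("caffe") is not None


if __name__ == "__main__":
    print("pycaffe found" if pycaffe_available() else "pycaffe not found")
```

A positive result only shows that the module can be located; a full `import caffe` is still the real test that the compiled extension loads.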

Preparation:

  1. We convert the original data annotation files into JSON format.

    # train data json file
    caption_gt_train.json 
    # test data json file
    caption_gt_test.json
  2. Download the videos in the Charades dataset and extract frames at 25 fps.
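
The README does not say which tool was used for frame extraction. As one possible approach, the sketch below (Python 3) builds an ffmpeg command that resamples a video to 25 fps and writes numbered JPEG frames. The directory layout, the file-name pattern, and the helper names are assumptions, so check them against what the preprocessing scripts expect.

```python
import os
import subprocess


def build_ffmpeg_command(video_path, out_dir, fps=25):
    """Build an ffmpeg command that dumps frames of video_path at `fps`."""
    # Hypothetical naming scheme: 000001.jpg, 000002.jpg, ...
    pattern = os.path.join(out_dir, "%06d.jpg")
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", "fps={}".format(fps),  # resample to a fixed frame rate
        "-q:v", "2",                  # high JPEG quality
        pattern,
    ]


def extract_frames(video_path, out_root, fps=25):
    """Extract frames into out_root/<video id>/ (requires ffmpeg on PATH)."""
    video_id = os.path.splitext(os.path.basename(video_path))[0]
    out_dir = os.path.join(out_root, video_id)
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, out_dir, fps), check=True)
    return out_dir
```

For example, `extract_frames("videos/AO8RW.mp4", "frames")` would write frames to `frames/AO8RW/`, where both paths are placeholders.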

Train Proposal Network:

  1. Generate the pickle data for training the proposal network model.

    cd ./preprocess
    # generate training data
    python generate_roidb_modified_freq1.py
  2. Download the C3D classification pretrained model to ./pretrain/ .

  3. In the root folder, run the proposal network training:

    bash ./experiments/train_rpn/script_train.sh
  4. We provide one set of trained proposal network model weights.

Extract Proposal Features:

  1. In the root folder, extract proposal features for the training data and save them as HDF5 data.

    bash ./experiments/extract_HDF_for_LSTM/script_test.sh

Training:

  1. In the root folder, run:

    bash ./experiments/Text_to_Clip/script_train.sh

Testing:

  1. Generate the pickle data for testing the Text_to_Clip model.

    cd ./preprocess
    # generate test data
    python generate_roidb_modified_freq1_full_retrieval_test.py
  2. Download one sample model to ./experiments/Text_to_Clip/snapshot/ .

    One Text_to_Clip model trained on the Charades-STA dataset is provided in: caffemodel .

    The provided model achieves a Recall@1 (tIoU=0.7) score of ~15.6% on the test set.

  3. In the root folder, generate the similarity scores on the test set and save them as a pickle file.

    bash ./experiments/Text_to_Clip/test_fast/script_test.sh 
  4. Get the evaluation results.

    cd ./experiments/Text_to_Clip/test_fast/evaluation/
    bash bash.sh
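
For reference, the Recall@1 (tIoU=0.7) metric quoted for the sample model counts a query as correct when its top-ranked clip overlaps the ground-truth segment with a temporal IoU of at least 0.7. The sketch below (Python 3) illustrates that standard definition. It is not the repository's evaluation script, and the function names and the (start, end) tuple format are assumptions.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked_preds, gts, k=1, tiou_threshold=0.7):
    """Fraction of queries with at least one hit among the top-k clips.

    ranked_preds: one list of (start, end) clips per query, best first.
    gts: one ground-truth (start, end) segment per query.
    """
    hits = 0
    for preds, gt in zip(ranked_preds, gts):
        if any(temporal_iou(p, gt) >= tiou_threshold for p in preds[:k]):
            hits += 1
    return hits / len(gts) if gts else 0.0


if __name__ == "__main__":
    # Toy data: the first query is a hit, the second is a miss (tIoU = 0.4).
    ranked = [[(0.0, 7.0)], [(2.0, 10.0)]]
    truth = [(0.0, 6.9), (0.0, 6.0)]
    print(recall_at_k(ranked, truth, k=1, tiou_threshold=0.7))
```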
