
arctic-capgen-vid's Introduction

This package contains the accompanying code for the following paper:

PDF

BibTeX

Video

Poster with follow-up works that include

With the default setup in config.py, you can train a model on YouTube2Text that reproduces (and in fact improves on) the results in the 3rd row of Table 1, where a global temporal attention model is applied to features extracted by GoogLeNet.

Note: because video captioning research has gradually converged on coco-caption as the standard toolbox for evaluation, we integrate it into this package. In the paper, however, a different tokenization method was used, so the results from this package are not strictly comparable with those reported in the paper.

#####Please follow the instructions below to run this package

  1. Dependencies
     1. Theano can be easily installed by following its installation instructions. Theano has its own dependencies as well. The simplest way to install Theano is to install Anaconda. Instead of using the Theano that comes with Anaconda, we suggest running git clone git://github.com/Theano/Theano.git to get the most recent version of Theano.
     2. coco-caption. Install it by simply adding it to your $PYTHONPATH.
     3. Jobman. After it has been git cloned, please add it to $PYTHONPATH as well.
  2. Download the preprocessed version of YouTube2Text. It is a zip file that contains everything needed to train the model. Unzip it somewhere. By default, unzip will create a folder youtube2text_iccv15 that contains 8 pkl files.
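
Before going further, it can help to confirm that the three dependencies are visible to the Python interpreter you will train with. The following is only a minimal sketch: the module names theano and jobman are assumptions about how those checkouts are laid out, pycocoevalcap is taken from the coco-caption URL in the troubleshooting note, and find_missing is a hypothetical helper that is not part of this package.

```python
import importlib

# Assumed top-level module name for each dependency; adjust these if your
# checkouts expose different package names.
ASSUMED_MODULES = {
    "Theano": "theano",
    "coco-caption": "pycocoevalcap",
    "Jobman": "jobman",
}


def find_missing(modules):
    """Return the dependency names whose module cannot be imported."""
    missing = []
    for name, module in modules.items():
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(ASSUMED_MODULES)
    if missing:
        print("Not importable, check $PYTHONPATH: " + ", ".join(missing))
    else:
        print("All dependencies can be imported.")
```

If a dependency is reported as missing, the usual cause is that its checkout directory has not been added to $PYTHONPATH in the current shell.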

preprocessed YouTube2Text download link

  1. Go to common.py and change the following two lines, RAB_DATASET_BASE_PATH = '/data/lisatmp3/yaoli/datasets/' and RAB_EXP_PATH = '/data/lisatmp3/yaoli/exp/', according to your specific setup. The first path is the parent directory that contains the youtube2text_iccv15 dataset folder. The second path specifies where you would like to save all the experimental results.
  2. Before training the model, we suggest testing data_engine.py by running python data_engine.py and checking that it finishes without any error.
  3. It is also useful to verify that the coco-caption evaluation pipeline works properly by running python metrics.py and checking that it finishes without any error.
  4. Now you are ready to launch the training:
     1. to run on CPU: THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python train_model.py
     2. to run on GPU: THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python train_model.py
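
A wrong RAB_DATASET_BASE_PATH is an easy mistake to make, so a quick check of the dataset folder may save a failed run. The sketch below only counts the pkl files under youtube2text_iccv15 and compares the count with the 8 files mentioned above. check_dataset is a hypothetical helper, not part of this package, and it does not inspect the contents of the files.

```python
import glob
import os


def check_dataset(base_path, folder="youtube2text_iccv15", expected=8):
    """Return (ok, pkl_files) for the dataset folder under base_path."""
    dataset_dir = os.path.join(base_path, folder)
    pkl_files = sorted(glob.glob(os.path.join(dataset_dir, "*.pkl")))
    return len(pkl_files) == expected, pkl_files


if __name__ == "__main__":
    # Use the same value you set for RAB_DATASET_BASE_PATH in common.py.
    base = "/data/lisatmp3/yaoli/datasets/"
    ok, files = check_dataset(base)
    status = "OK" if ok else "check the path"
    print("found %d pkl files, expected 8: %s" % (len(files), status))
```

A missing folder simply yields zero files, so the check reports a mismatch instead of raising an exception.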

#####Notes on running experiments

Running train_model.py for the first time takes much longer, because Theano needs to compile many functions and cache them on disk for future runs. You will probably see some warning messages on stdout. It is safe to ignore all of them. Both model parameters and configurations are saved (the save path is printed on stdout and is easy to find). The most important thing to monitor is train_valid_test.txt in the experiment output folder. It is a large table that records all metrics at each validation. Please refer to model_attention.py, lines 1207-1215, for the actual meaning of the columns.
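
To monitor train_valid_test.txt from a script, something along the following lines may be enough. It is a sketch under one assumption that this README does not confirm: that the file is a plain whitespace-delimited table of numbers with one row per validation. load_table and best_row are hypothetical helpers, and the column index to watch has to be looked up in model_attention.py as described above.

```python
def load_table(path):
    """Parse a whitespace-delimited numeric table into a list of float rows."""
    rows = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                rows.append([float(x) for x in fields])
    return rows


def best_row(rows, column, higher_is_better=True):
    """Return (row_index, row) for the best value in the given column."""
    pick = max if higher_is_better else min
    index = pick(range(len(rows)), key=lambda i: rows[i][column])
    return index, rows[index]
```

For a loss-like column, pass higher_is_better=False so that the row with the smallest value is returned.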

#####Bonus

In the paper, we never mentioned the use of uni-directional/bi-directional LSTMs to encode video representations, but this is an obvious extension. In fact, several recent papers following ours include related work. So we provide code for more sophisticated encoders as well.

#####Troubleshooting

A known problem in the COCO evaluation script (their code) is that METEOR is computed by creating another subprocess, which does not get killed automatically. As METEOR is called more and more, it gradually eats up memory. To fix the problem, add the line below after this line: https://github.com/tylin/coco-caption/blob/master/pycocoevalcap/meteor/meteor.py#L44

self.meteor_p.kill()
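
The pattern behind this fix can be sketched in isolation. ChildScorer below is a hypothetical stand-in, not coco-caption code: it starts a helper process that stays alive until it is killed explicitly, which is the behaviour described above. The extra wait() call is an addition of this sketch that reaps the killed child; the fix suggested above only calls kill().

```python
import subprocess
import sys


class ChildScorer(object):
    """Hypothetical scorer that keeps a long-lived helper process open."""

    def __init__(self):
        # The child just blocks on stdin, like a helper waiting for input.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", "import sys; sys.stdin.read()"],
            stdin=subprocess.PIPE,
        )

    def close(self):
        # Without this, every new instance leaves one more process running.
        self.proc.kill()
        self.proc.wait()  # reap the child so it does not linger as a zombie
        self.proc.stdin.close()


scorer = ChildScorer()
still_running = scorer.proc.poll() is None  # True while the child is alive
scorer.close()
has_exited = scorer.proc.poll() is not None  # True once the child is gone
```

Creating one such object per evaluation without ever calling close() reproduces the slow growth in memory use, since each helper process stays alive.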

If you have any questions, drop us an email at [email protected].

