yaoli / arctic-capgen-vid

automatic video description generation with GPU training

License: Other

Python 99.61% TeX 0.39%

arctic-capgen-vid's Introduction

This package contains the accompanying code for the following paper:

[1] Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, and Aaron Courville. Describing Videos by Exploiting Temporal Structure. ICCV 2015.

PDF, BibTeX, Video, Poster

Follow-up works include:

[2] Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, and Yoshua Bengio. Oracle Performance for Visual Captioning. British Machine Vision Conference (BMVC) 2016 (oral).
[3] Nicolas Ballas, Li Yao, Chris Pal, and Aaron Courville. Delving Deeper into Convolutional Networks for Learning Video Representations. International Conference on Learning Representations (ICLR) 2016 (conference track).

With the default setup in config.py, you can train a model on YouTube2Text that reproduces (in fact, improves on) the results in the 3rd row of Table 1, where a global temporal attention model is applied to features extracted by GoogLeNet.

Note: video captioning research has gradually converged on coco-caption as the standard evaluation toolbox, so we integrate it into this package. The paper, however, used a different tokenization method, so the results from this package are not strictly comparable with those reported in the paper.

Please follow the instructions below to run this package.

Dependencies
Theano. It can be installed by following the instructions on its website. Theano has its own dependencies as well; the simplest way to get them is to install Anaconda. Instead of using the Theano that ships with Anaconda, we suggest running git clone git://github.com/Theano/Theano.git to get the most recent version of Theano.
coco-caption. Install it by simply adding it to your $PYTHONPATH.
Jobman. After cloning it with git, add it to your $PYTHONPATH as well.
Download the preprocessed version of YouTube2Text. It is a zip file that contains everything needed to train the model. Unzip it somewhere. By default, unzip will create a folder youtube2text_iccv15 that contains 8 pkl files.

preprocessed YouTube2Text download link

Go to common.py and change the following two lines, RAB_DATASET_BASE_PATH = '/data/lisatmp3/yaoli/datasets/' and RAB_EXP_PATH = '/data/lisatmp3/yaoli/exp/', according to your specific setup. The first path is the parent directory that contains the youtube2text_iccv15 dataset folder. The second path specifies where you would like to save all experimental results.
Before training the model, we suggest testing data_engine.py by running python data_engine.py and checking that it finishes without errors.
It is also useful to verify that the coco-caption evaluation pipeline works properly by running python metrics.py and checking that it finishes without errors.
You are now ready to launch training:
to run on cpu: THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python train_model.py
to run on gpu: THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python train_model.py
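
Before launching, it can help to confirm that the dataset folder is where common.py expects it. The following is a minimal sketch, not part of this package: the helper name check_dataset is hypothetical, and it relies only on what is stated above, namely that the base path contains a youtube2text_iccv15 folder with 8 pkl files.

```python
import os


def check_dataset(base_path, folder="youtube2text_iccv15", expected=8):
    """Return the sorted .pkl file names found in base_path/folder.

    Raises ValueError if the folder is missing or the file count is unexpected.
    `check_dataset` is a hypothetical helper, not part of this package.
    """
    dataset_dir = os.path.join(base_path, folder)
    if not os.path.isdir(dataset_dir):
        raise ValueError("dataset folder not found: %s" % dataset_dir)
    pkl_files = sorted(f for f in os.listdir(dataset_dir) if f.endswith(".pkl"))
    if len(pkl_files) != expected:
        raise ValueError(
            "expected %d pkl files, found %d" % (expected, len(pkl_files)))
    return pkl_files
```

Calling check_dataset with the value you set for RAB_DATASET_BASE_PATH should return the 8 file names if the archive was unzipped in the right place.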

Notes on running experiments

Running train_model.py for the first time takes much longer, because Theano has to compile many functions and cache them on disk for future runs. You will probably see some warning messages on stdout; it is safe to ignore them. Both the model parameters and the configuration are saved, and the save path is printed on stdout. The most important file to monitor is train_valid_test.txt in the experiment output folder: it is a large table that records all metrics at each validation. Please refer to model_attention.py, lines 1207-1215, for the meaning of each column.
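
To inspect that table outside of training, something like the following sketch may be enough. It assumes train_valid_test.txt is a whitespace-delimited numeric table with one row per validation; that layout is an assumption, and the column meanings are deliberately not hard-coded here because they are defined in model_attention.py. The function names load_history and best_row are hypothetical.

```python
def load_history(path):
    """Read a whitespace-delimited numeric table into a list of float rows."""
    rows = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                rows.append([float(x) for x in fields])
    return rows


def best_row(rows, column, higher_is_better=True):
    """Return the index of the row with the best value in the given column."""
    values = [row[column] for row in rows]
    target = max(values) if higher_is_better else min(values)
    return values.index(target)
```

For a cost-like column pass higher_is_better=False; for a score-like column keep the default.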

Bonus

In the paper, we never mentioned using uni-directional or bi-directional LSTMs to encode video representations, but this is an obvious extension. In fact, several recent papers following ours have explored it. We therefore provide code for these more sophisticated encoders as well.

Troubleshooting

Gradually increasing memory usage is a known problem in the COCO evaluation script (their code): METEOR is computed by spawning a separate subprocess, which is not killed automatically. As METEOR is called more and more often, memory usage grows steadily. To fix the problem, add the line self.meteor_p.kill() after this line: https://github.com/tylin/coco-caption/blob/master/pycocoevalcap/meteor/meteor.py#L44
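
The pattern behind this fix can be sketched as follows. This is not the coco-caption code: the class ExternalScorer and its close method are hypothetical, and a sleeping Python child stands in for the METEOR process. It only illustrates that a long-lived child started with subprocess.Popen keeps its resources until it is explicitly killed and waited on.

```python
import subprocess
import sys


class ExternalScorer(object):
    """Hypothetical wrapper around a long-lived helper process."""

    def __init__(self):
        # Stand-in for the external scorer: a child process that just sleeps.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )

    def close(self):
        """Kill the child and wait for it so its resources are released."""
        if self.proc.poll() is None:
            self.proc.kill()
        self.proc.wait()
        self.proc.stdin.close()
        self.proc.stdout.close()


scorer = ExternalScorer()
scorer.close()
```

Calling close once per scorer instance, after scoring has finished, keeps the number of live child processes from growing across validations.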

Support for Python3.5

Please refer to this repo.

If you have any questions, drop us an email at [email protected].


arctic-capgen-vid's Issues

raise child_exception OSError: [Errno 2] No such file or directory

Hi everybody,

When I try to run metrics.py I get the error below, and I don't know where it comes from. Could you help me, please?

python metrics.py
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
loading youtube2text googlenet features
uneven minibath chunking, overall 20, last one 11
uneven minibath chunking, overall 20, last one 8
init COCO-EVAL scorer
tokenization...
Traceback (most recent call last):
  File "metrics.py", line 202, in <module>
    test_cocoeval()
  File "metrics.py", line 198, in test_cocoeval
    valid_score, test_score = score_with_cocoeval(samples_valid, samples_test, engine)
  File "metrics.py", line 91, in score_with_cocoeval
    valid_score = scorer.score(gts_valid, samples_valid, engine.valid_ids)
  File "/home/ruben/arctic-capgen-vid-master/cocoeval.py", line 22, in score
    gts  = tokenizer.tokenize(gts)
  File "/home/ruben/coco-caption-master/pycocoevalcap/tokenizer/ptbtokenizer.py", line 52, in tokenize
    stdout=subprocess.PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
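
An [Errno 2] raised from inside subprocess generally means the executable being launched could not be found. coco-caption's PTB tokenizer runs Java in a subprocess, so one thing worth checking is whether java is on the PATH. The snippet below is a sketch under the assumption that a missing java executable is the cause here; the helper name find_program is hypothetical.

```python
import shutil  # Python 3; on Python 2.7, distutils.spawn.find_executable is similar


def find_program(name):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)


java_path = find_program("java")
if java_path is None:
    print("java not found on PATH; the tokenizer subprocess cannot start")
else:
    print("java found at %s" % java_path)
```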

About validation and testing

The Microsoft corpus file contains both 'unverified' and 'clean' captions, roughly 40 English captions per video. When I use cocoeval to compute metrics such as BLEU@4, METEOR, ROUGE_L and CIDEr, should I treat all of these captions as ground-truth captions, or just use the samples you provide at https://github.com/yaoli/arctic-capgen-vid/tree/master/test? I want to make a fair comparison with the results in your paper, Describing Videos by Exploiting Temporal Structure.

cocoeval.py gives error

File "/u/username/coco-caption/pycocoevalcap/tokenizer/ptbtokenizer.py", line 37, in tokenize
sentences = '\n'.join([v.replace('\n', ' ') for k, v in captions_for_image.items()])
AttributeError: 'list' object has no attribute 'replace'

Can you let me know of a solution?

How to change the number of LSTM layers

How can I change the number of LSTM layers in the code?

Questions about the results and training time?

Hi,

Thanks for your work. It helps me a lot. I saw that the test "BLEU"/"METEOR"/"CIDEr" results saved in "train_valid_test.txt" are much better than the results reported in the paper. Why are they better?

It also took me about 10 hours on a GPU to run only 15 epochs (12000 updates). Is that normal, or did I do something wrong? Do I need to train for 500 epochs?

My last question is how to set the LSTM dimensions and "dim_word". Why do you set "dim_word" to 468 and "dim" to 3518?

Thanks for your help.

OSError: [Errno 12] Cannot allocate memory

Thanks @yaoli for the great work.
I successfully compiled and ran the package.
However, the run stopped after about half a day.
The errors are as follows.
Has anyone seen the same error?
'''
init COCO-EVAL scorer
tokenization...
Traceback (most recent call last):
File "train_model.py", line 103, in
sys.exit(main(state))
File "train_model.py", line 89, in main
train_from_scratch(config, state, channel)
File "train_model.py", line 82, in train_from_scratch
model_attention.train_from_scratch(state, channel)
File "/mnt/pan_sdd1/VideoToLanguage/Solutions/arctic-capgen-vid-master/model_attention.py", line 1306, in train_from_scratch
model.train(**state.attention)
File "/mnt/pan_sdd1/VideoToLanguage/Solutions/arctic-capgen-vid-master/model_attention.py", line 1177, in train
f_init=f_init, f_next=f_next, model=self
File "/mnt/pan_sdd1/VideoToLanguage/Solutions/arctic-capgen-vid-master/metrics.py", line 174, in compute_score
valid_score, test_score = score_with_cocoeval(samples_valid, samples_test, engine)
File "/mnt/pan_sdd1/VideoToLanguage/Solutions/arctic-capgen-vid-master/metrics.py", line 91, in score_with_cocoeval
valid_score = scorer.score(gts_valid, samples_valid, engine.valid_ids)
File "/mnt/pan_sdd1/VideoToLanguage/Solutions/arctic-capgen-vid-master/cocoeval.py", line 22, in score
gts = tokenizer.tokenize(gts)
File "/home/mmc/Downloads/coco-caption-master/pycocoevalcap/tokenizer/ptbtokenizer.py", line 52, in tokenize
stdout=subprocess.PIPE)
File "/usr/lib/python2.7/subprocess.py", line 679, in init
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1143, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
'''

Randomly shuffle the input data

I cannot find the code that shuffles the input video-text pairs after each epoch.
Does this affect performance?
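
For reference, per-epoch shuffling of minibatches is commonly done along these lines. This is a generic sketch, not code from this repository; the function name shuffled_minibatches and the sizes used are made up for illustration.

```python
import random


def shuffled_minibatches(n_examples, batch_size, rng):
    """Yield lists of example indices covering the data once, in random order."""
    indices = list(range(n_examples))
    rng.shuffle(indices)
    for start in range(0, n_examples, batch_size):
        yield indices[start:start + batch_size]


rng = random.Random(1234)
for epoch in range(2):
    for batch in shuffled_minibatches(10, 4, rng):
        pass  # a training step on the video-caption pairs in `batch` would go here
```

Because the generator reshuffles on every call, each epoch visits every example exactly once but in a different order.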

Program has been running for 2 days (48 hours)

How long does it generally take to run this code?
I have been running the code for the last 2 days, and it has reached epoch 16.
How many epochs are there in train_model?

Functions to generate captions on new dataset?

Which function should I use to generate captions for new video data?

Saving the best BLEU model

When the best model according to BLEU is saved (lines 1218-1223), the stored parameters are those that achieved the best error (best_p), not the best BLEU (i.e. the current ones). I think that best_p = unzip(tparams) is missing at line 1220.
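
The behaviour this issue asks for can be illustrated with a small standalone sketch. It does not reproduce model_attention.py: update_best and the dictionary keys are hypothetical, and plain dicts stand in for the model parameters. The point is only that each criterion takes its own snapshot of the current parameters.

```python
import copy


def update_best(best, current_params, valid_err, valid_bleu):
    """Track the best-by-error and best-by-BLEU snapshots separately."""
    if valid_err < best["err"]:
        best["err"] = valid_err
        best["params_err"] = copy.deepcopy(current_params)
    if valid_bleu > best["bleu"]:
        best["bleu"] = valid_bleu
        # Snapshot the *current* parameters, not the best-error ones.
        best["params_bleu"] = copy.deepcopy(current_params)
    return best


best = {"err": float("inf"), "bleu": float("-inf"),
        "params_err": None, "params_bleu": None}
update_best(best, {"W": [1.0]}, valid_err=0.9, valid_bleu=0.30)
update_best(best, {"W": [2.0]}, valid_err=1.1, valid_bleu=0.35)
```

After the second call the two snapshots differ: the best-error parameters come from the first validation and the best-BLEU parameters from the second.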

Input data preparation

Thanks for the great code!
We want to use the library to get some results on our own dataset. When we analyze the pkl files (from the given zip file), the number of GoogLeNet features (1024-dim.) for each video is smaller than the actual total number of frames in the video. It seems the frames are sampled in some way, or the features could come from the HoG, HoF, MBH feature cube described in the paper, but this is unclear. Once the features are obtained, they are split into 26 equally spaced clips, and the first frame of each clip is taken as input.
Are any scripts for preparing the input pkl data also released?
Thanks.

tensorflow branch: model_attention_tf.py

Dear @yaoli, I am trying to get the TensorFlow version working. I get an error after running the command 'python train_model.py'. Since I want to run it with TensorFlow, I modified the training command given in the README. Does the file need any cleaning up? It is missing the class Attention.

Also, the first lines printed when running the code indicate that Theano is still being used. How come? I did edit config.py to select TensorFlow ('run_with': 'tensorflow').

Thanks!

import tables

Hi Dr. Yao,
When I test data_engine.py, I get the error "ImportError: No module named tables".
Could you tell me where this module comes from?
Thank you so much!
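
The tables module is provided by the PyTables package (installable with pip install tables). A quick way to see which importable modules are missing before running data_engine.py is sketched below. The helper missing_modules is hypothetical and uses the Python 3 importlib API; on Python 2.7 a plain try/except around each import would be the equivalent.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


print(missing_modules(["tables", "theano"]))
```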

fix grammar and improve clarity

Confused by lstm_cond_layer._step() function's variable names

I was confused by the lstm_cond_layer._step() function in model_attention.py, line 296:

   def _step(m_, x_, # sequences
             h_, c_, a_, ct_, # outputs_info
             pctx_, ctx_, Wd_att, U_att, c_att, W_sel, b_sel, U, Wc, # non_sequences

From the lambda function _step0 you defined and the code in _step, I understand that ct_ is the averaged context produced by the attention mechanism and ctx_ is the context. However, the function returns ctx_ at line 337:

        rval = [h, c, alpha, ctx_, pstate_, pctx_, i, f, o, preact, alpha_pre]+pctx_list
        return rval

I suggest making this clearer. Thanks!
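
The distinction between the two names can be made concrete with a tiny standalone sketch of soft attention. It is not the code in model_attention.py: the function attend is hypothetical, and the scores are passed in directly rather than computed from the hidden state as the model does. Here ctx plays the role of ctx_ (all per-frame features) and ct the role of ct_ (their attention-weighted sum).

```python
import math


def attend(ctx, scores):
    """Return (alpha, ct): softmax weights over frames and the weighted context.

    ctx    -- list of per-frame feature vectors (the role of ctx_ in the issue)
    scores -- one unnormalized attention score per frame
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    dim = len(ctx[0])
    ct = [sum(a * frame[d] for a, frame in zip(alpha, ctx)) for d in range(dim)]
    return alpha, ct


ctx = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha, ct = attend(ctx, [0.0, 0.0, 0.0])
```

With equal scores the weights are uniform, so ct is simply the mean of the frame features, while ctx itself is unchanged.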

Preprocessed Youtube2text link unavailable

Is there an updated / alternative Youtube2text download link? Thanks!

What are the contents of the preprocessed data?

OSError: [Errno 12] Cannot allocate memory

Hi,

I still get this memory error even after adding self.meteor_p.kill().
I am running on an Ubuntu 16.04 virtual machine with 4 GB of RAM.

THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python train_model.py
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
current save dir /home/ruben/exp/arctic-capgen-vid/test_non/
creating directory /home/ruben/exp/arctic-capgen-vid/test_non/
erasing everything in /home/ruben/exp/arctic-capgen-vid/test_non/
rm: cannot remove '/home/ruben/exp/arctic-capgen-vid/test_non//*': No such file or directory
saving model config into /home/ruben/exp/arctic-capgen-vid/test_non/model_config.pkl
Model Type: attention
Host: ubuntu
Command: train_model.py
training an attention model
/home/ruben/arctic-capgen-vid-master/model_attention.py:37: UserWarning: Feeding context to output directly seems to hurt.
warnings.warn('Feeding context to output directly seems to hurt.')
Loading data
loading youtube2text googlenet features
uneven minibath chunking, overall 64, last one 12
uneven minibath chunking, overall 200, last one 91
uneven minibath chunking, overall 200, last one 168
init params
no lstm on ctx
Traceback (most recent call last):
File "train_model.py", line 103, in
sys.exit(main(state))
File "train_model.py", line 89, in main
train_from_scratch(config, state, channel)
File "train_model.py", line 82, in train_from_scratch
model_attention.train_from_scratch(state, channel)
File "/home/ruben/arctic-capgen-vid-master/model_attention.py", line 1306, in train_from_scratch
model.train(**state.attention)
File "/home/ruben/arctic-capgen-vid-master/model_attention.py", line 928, in train
self.build_model(tparams, model_options)
File "/home/ruben/arctic-capgen-vid-master/model_attention.py", line 516, in build_model
use_noise=use_noise)
File "/home/ruben/arctic-capgen-vid-master/model_attention.py", line 355, in lstm_cond_layer
p=0.5, n=1, dtype=state_below.dtype),
File "/home/ruben/.local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1364, in binomial
x = self.uniform(size=size, nstreams=nstreams)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1324, in uniform
rstates = self.get_substream_rstates(nstreams, dtype)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/configparser.py", line 115, in res
return f(*args, **kwargs)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1226, in get_substream_rstates
multMatVect(rval[0], A1p72, M1, A2p72, M2)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 65, in multMatVect
[A_sym, s_sym, m_sym, A2_sym, s2_sym, m2_sym], o, profile=False)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/compile/function.py", line 326, in function
output_keys=output_keys)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 486, in pfunc
output_keys=output_keys)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1784, in orig_function
defaults)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1651, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/link.py", line 699, in make_thunk
storage_map=storage_map)[:3]
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/vm.py", line 1059, in make_all
impl=impl))
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/op.py", line 924, in make_thunk
no_recycling)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/op.py", line 828, in make_c_thunk
output_storage=node_output_storage)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1190, in make_thunk
keep_lock=keep_lock)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1131, in compile
keep_lock=keep_lock)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1578, in cthunk_factory
key = self.cmodule_key()
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1268, in cmodule_key
compile_args=self.compile_args(),
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 951, in compile_args
ret += c_compiler.compile_args()
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1949, in compile_args
native_lines = get_lines("%s -march=native -E -v -" % theano.config.cxx)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1918, in get_lines
shell=True)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/misc/windows.py", line 43, in subprocess_Popen
proc = subprocess.Popen(command, startupinfo=startupinfo, **params)
File "/usr/lib/python2.7/subprocess.py", line 711, in init
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1235, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Possible mistake in the gen_sample function of the beam search engine

I used your code to set up my own framework and found the mistake shown in the attached file. When I change 'ti' to 'idx', the result is better than the one you reported.

Load Trained Model

Hi,

How do I use my model after training?

alpha_ratio.txt
model_best.npz
model_best_blue_or_meteor.npz
model_best_so_far.npz
model_config.pkl
model_current.npz
model_options.pkl
train_valid_test.txt
valid_samples.txt

Can you show me example code for loading the model?
Thanks!
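
As a starting point, the saved files can at least be opened with standard tools. The sketch below assumes, from the file extensions alone, that the .npz files are NumPy archives of named arrays and that the .pkl files are ordinary pickles; load_params and load_options are hypothetical helpers. Rebuilding the model and generating captions from these parameters still requires the code in this repository.

```python
import pickle

import numpy as np


def load_params(npz_path):
    """Load every array stored in an .npz archive into a plain dict."""
    with np.load(npz_path) as archive:
        return {name: archive[name] for name in archive.files}


def load_options(pkl_path):
    """Unpickle the saved options; only do this for files you trust."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)
```

For example, load_params("model_best.npz") would return a dict mapping parameter names to arrays. Pickles written under Python 2 may need encoding="latin1" passed to pickle.load when read under Python 3.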

Questions concerning preprocessing

Your paper says that you select 26 equally-spaced frames out of the first 240 from each video. What does picking 26 out of 240 frames mean? How do you pick the 26 frames: randomly, or by some criterion? Also, your FEAT_key_vidID_value_features.pkl file maps each video to a 2D array of size 200_300 x 1024. What do these array sizes mean, given that you specified picking 240 frames from each video?
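
One plausible reading of "26 equally-spaced frames out of the first 240" is deterministic, evenly spaced index selection, sketched below. This is an illustration only: the function name equally_spaced_indices is hypothetical, and whether the authors include both endpoints or round the same way is not stated here.

```python
def equally_spaced_indices(n_frames, n_select=26, max_frames=240):
    """Pick n_select evenly spaced frame indices from the first max_frames frames."""
    usable = min(n_frames, max_frames)
    if usable <= n_select:
        # Too few frames to subsample: keep them all.
        return list(range(usable))
    step = (usable - 1) / float(n_select - 1)
    return [int(round(i * step)) for i in range(n_select)]
```

Under this reading, a video with 300 frames yields 26 indices running from frame 0 to frame 239, and a video with fewer than 26 frames keeps every frame.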

How to generate the pkl files for new dataset

Hello @yaoli. Recently I have tried to write some scripts to generate the pkl files (cap.pkl, worddict.pkl and so on) for a new dataset, MSR-VTT. But when I use them to train the model, it doesn't learn anything. I think this may be caused by errors in how I generated the files. Could you please provide the preprocessing scripts? I need them VERY MUCH for my research. Thanks a lot!

train, valid and test splits

Hi @yaoli, in the preprocessed data file, the videos have been renamed as vid***. Can you share the YouTube video-id mapping (original YouTube video name -> vid***) with me?
I want to use your split setting in my experiments and re-extract features from the videos in the YouTube2Text dataset. Thank you!

How to get the results without training the model

Hi @yaoli, thank you for your code. When I run python metrics.py to verify that the coco-caption evaluation pipeline works properly, I get results. However, I want to know whether these results correspond to the 3rd row of Table 1 in your paper.

pytorch version

I want to know whether someone has implemented the code of this paper in PyTorch. If so, could you kindly share it with me? Thank you!