
arctic-capgen-vid's Introduction

This package contains the accompanying code for the following paper:

PDF

BibTeX

Video

Poster with follow-up works that include

With the default setup in config.py, you will be able to train a model on YouTube2Text, reproducing (in fact, improving on) the results in the 3rd row of Table 1, where a global temporal attention model is applied to features extracted by GoogLeNet.

Note: video captioning research has gradually converged on coco-caption as the standard toolbox for evaluation, so we integrate it into this package. In the paper, however, a different tokenization method was used, so the results from this package are not strictly comparable with those reported in the paper.

Please follow the instructions below to run this package.
  1. Dependencies
     1. Theano can be easily installed by following the instructions there. Theano has its own dependencies as well. The simplest way to install Theano is to install Anaconda. Instead of using the Theano that comes with Anaconda, we suggest running git clone git://github.com/Theano/Theano.git to get the most recent version of Theano.
     2. coco-caption. Install it by simply adding it to your $PYTHONPATH.
     3. Jobman. After it has been git cloned, please add it to $PYTHONPATH as well.
  2. Download the preprocessed version of YouTube2Text. It is a zip file that contains everything needed to train the model. Unzip it somewhere. By default, unzip will create a folder youtube2text_iccv15 that contains 8 pkl files.

     preprocessed YouTube2Text download link

  3. Go to common.py and change the following two lines, RAB_DATASET_BASE_PATH = '/data/lisatmp3/yaoli/datasets/' and RAB_EXP_PATH = '/data/lisatmp3/yaoli/exp/', according to your specific setup. The first path is the parent directory containing the youtube2text_iccv15 dataset folder. The second path specifies where you would like to save all the experimental results.
  4. Before training the model, we suggest testing data_engine.py by running python data_engine.py and checking that it finishes without any error.
  5. It is also useful to verify that the coco-caption evaluation pipeline works properly by running python metrics.py and checking that it finishes without any error.
  6. Now you are ready to launch the training:
     1. to run on cpu: THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python train_model.py
     2. to run on gpu: THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python train_model.py

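
Before launching anything, it can help to confirm that the dataset path is set up the way the steps above describe. The following is a minimal sketch, not part of the package: the helper name list_pkl_files is hypothetical, and the path value is simply the default quoted from common.py, so replace it with your own.

```python
import os

# Assumption: same value as RAB_DATASET_BASE_PATH in your edited common.py.
RAB_DATASET_BASE_PATH = '/data/lisatmp3/yaoli/datasets/'
DATASET_FOLDER = 'youtube2text_iccv15'


def list_pkl_files(base_path, folder=DATASET_FOLDER):
    """Return the sorted .pkl file names found in base_path/folder."""
    dataset_dir = os.path.join(base_path, folder)
    if not os.path.isdir(dataset_dir):
        return []
    return sorted(f for f in os.listdir(dataset_dir) if f.endswith('.pkl'))


if __name__ == '__main__':
    found = list_pkl_files(RAB_DATASET_BASE_PATH)
    print('found %d pkl files: %s' % (len(found), found))
    if len(found) != 8:
        # The README states that the unzipped folder contains 8 pkl files.
        print('expected 8 pkl files; check RAB_DATASET_BASE_PATH in common.py')
```

If the count is not 8, the most likely culprit is that the base path points at the dataset folder itself rather than at its parent directory.
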
Notes on running experiments

Running train_model.py for the first time takes much longer, since Theano needs to compile many things and cache them on disk for future runs. You will probably see some warning messages on stdout; it is safe to ignore all of them. Both model parameters and configurations are saved (the saving path is printed on stdout, so it is easy to find). The most important thing to monitor is train_valid_test.txt in the exp output folder. It is a big table that records all metrics at each validation. Please refer to model_attention.py, lines 1207-1215, for the actual meaning of the columns.
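
One way to monitor train_valid_test.txt is to load it as a numeric table and look up the best row for a chosen column. The sketch below assumes the file is a whitespace-separated numeric table that numpy.loadtxt can read; the helper names are hypothetical, and the column index must be taken from model_attention.py, since the column layout is not stated here.

```python
import numpy as np


def load_history(path):
    """Load the metrics table; assumes whitespace-separated numeric rows."""
    return np.atleast_2d(np.loadtxt(path))


def best_row(history, column, higher_is_better=True):
    """Return (row_index, row) holding the best value in the given column."""
    values = history[:, column]
    idx = int(np.argmax(values) if higher_is_better else np.argmin(values))
    return idx, history[idx]


# Example usage (the path and column index are placeholders):
# history = load_history('/path/to/exp/train_valid_test.txt')
# idx, row = best_row(history, column=0, higher_is_better=False)
```
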

Bonus

In the paper, we never mentioned the use of uni-directional/bi-directional LSTMs to encode video representations, but this is an obvious extension. In fact, several recent papers following ours have done related work. So we provide code for more sophisticated encoders as well.

Troubleshooting

There is a known problem in the COCO evaluation script (their code): METEOR is computed by creating a subprocess, which does not get killed automatically. As METEOR is called more and more, it gradually eats up memory. To fix the problem, add the line self.meteor_p.kill() after this line: https://github.com/tylin/coco-caption/blob/master/pycocoevalcap/meteor/meteor.py#L44
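
The general pattern behind this fix is to kill and reap a child process explicitly once it is no longer needed. The sketch below illustrates that pattern only; it is not the coco-caption code. The class ScorerProcess is hypothetical, and a small Python child that blocks on stdin stands in for the METEOR process.

```python
import subprocess
import sys


class ScorerProcess(object):
    """Hypothetical stand-in for a scorer that owns a long-lived child process."""

    def __init__(self):
        # A child that blocks on stdin, standing in for the METEOR process.
        self.proc = subprocess.Popen(
            [sys.executable, '-c', 'import sys; sys.stdin.read()'],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def close(self):
        """Kill the child if it is still running, then reap it."""
        if self.proc.poll() is None:
            self.proc.kill()
        self.proc.wait()
        self.proc.stdin.close()
        self.proc.stdout.close()


scorer = ScorerProcess()
try:
    pass  # ... compute scores here ...
finally:
    # Without this, one child would be left behind per scorer instance.
    scorer.close()
```

Calling wait() after kill() matters: it lets the operating system release the process entry instead of leaving a zombie behind.
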

Support for Python3.5

Please refer to this repo.

If you have any questions, drop us an email at [email protected].

arctic-capgen-vid's People

Contributors

yaoli

arctic-capgen-vid's Issues

raise child_exception OSError: [Errno 2] No such file or directory

Hi everybody,

When I try to run metrics.py, I get the error below. I don't know where it comes from. Could you help me, please?

python metrics.py
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
loading youtube2text googlenet features
uneven minibath chunking, overall 20, last one 11
uneven minibath chunking, overall 20, last one 8
init COCO-EVAL scorer
tokenization...
Traceback (most recent call last):
  File "metrics.py", line 202, in <module>
    test_cocoeval()
  File "metrics.py", line 198, in test_cocoeval
    valid_score, test_score = score_with_cocoeval(samples_valid, samples_test, engine)
  File "metrics.py", line 91, in score_with_cocoeval
    valid_score = scorer.score(gts_valid, samples_valid, engine.valid_ids)
  File "/home/ruben/arctic-capgen-vid-master/cocoeval.py", line 22, in score
    gts  = tokenizer.tokenize(gts)
  File "/home/ruben/coco-caption-master/pycocoevalcap/tokenizer/ptbtokenizer.py", line 52, in tokenize
    stdout=subprocess.PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
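
An Errno 2 raised from subprocess usually means that the program being launched could not be found. As a hedged diagnostic, and assuming the tokenizer shells out to java (an assumption about coco-caption worth verifying in ptbtokenizer.py), a quick check is whether that executable is on PATH. The helper find_missing is hypothetical, and the sketch targets Python 3, where shutil.which is available.

```python
import shutil


def find_missing(executables):
    """Return the names from executables that cannot be found on PATH."""
    return [name for name in executables if shutil.which(name) is None]


# Assumption: the tokenizer launches `java`; adjust the list as needed.
missing = find_missing(['java'])
if missing:
    print('not found on PATH: %s' % ', '.join(missing))
else:
    print('all required executables found')
```
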

About validation and testing

There are 'unverified' and 'clean' captions in the Microsoft corpus file, roughly 40 English captions per video. When I use cocoeval to compute metrics like BLEU@4, METEOR, ROUGE_L and CIDEr, should I treat all these captions as ground-truth captions? Or should I just use the samples you provide here: https://github.com/yaoli/arctic-capgen-vid/tree/master/test? I want to make a fair comparison with the results presented in your paper, Describing Videos by Exploiting Temporal Structure.

cocoeval.py gives error

File "/u/username/coco-caption/pycocoevalcap/tokenizer/ptbtokenizer.py", line 37, in tokenize
sentences = '\n'.join([v.replace('\n', ' ') for k, v in captions_for_image.items()])
AttributeError: 'list' object has no attribute 'replace'

Can you let me know of a solution?

Questions about the results and training time?

Hi,

Thanks for your work. It helps me a lot. I saw that the test BLEU/METEOR/CIDEr results saved in train_valid_test.txt are much better than the results reported in the paper. Why are they better?

It also took me about 10 hours to run the program on a GPU for only 15 epochs and 12000 updates. Is that normal, or did I do something wrong? Do I need to train for 500 epochs?

The last question is how to set the dimensions of the LSTM and "dim_word". Why do you set "dim_word" to 468 and "dim" to 3518?

Thanks for your help.

OSError: [Errno 12] Cannot allocate memory

Thanks @yaoli for the great work.
I successfully compiled and ran the package.
However, the run stopped after about half a day.
The errors are as follows.
Does anyone have the same error?
'''
init COCO-EVAL scorer
tokenization...
Traceback (most recent call last):
File "train_model.py", line 103, in <module>
sys.exit(main(state))
File "train_model.py", line 89, in main
train_from_scratch(config, state, channel)
File "train_model.py", line 82, in train_from_scratch
model_attention.train_from_scratch(state, channel)
File "/mnt/pan_sdd1/VideoToLanguage/Solutions/arctic-capgen-vid-master/model_attention.py", line 1306, in train_from_scratch
model.train(**state.attention)
File "/mnt/pan_sdd1/VideoToLanguage/Solutions/arctic-capgen-vid-master/model_attention.py", line 1177, in train
f_init=f_init, f_next=f_next, model=self
File "/mnt/pan_sdd1/VideoToLanguage/Solutions/arctic-capgen-vid-master/metrics.py", line 174, in compute_score
valid_score, test_score = score_with_cocoeval(samples_valid, samples_test, engine)
File "/mnt/pan_sdd1/VideoToLanguage/Solutions/arctic-capgen-vid-master/metrics.py", line 91, in score_with_cocoeval
valid_score = scorer.score(gts_valid, samples_valid, engine.valid_ids)
File "/mnt/pan_sdd1/VideoToLanguage/Solutions/arctic-capgen-vid-master/cocoeval.py", line 22, in score
gts = tokenizer.tokenize(gts)
File "/home/mmc/Downloads/coco-caption-master/pycocoevalcap/tokenizer/ptbtokenizer.py", line 52, in tokenize
stdout=subprocess.PIPE)
File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1143, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
'''

Program has been running for 2 days (48 hours)

How much time does it generally take to run this code?
I have been running the code for the last 2 days, and it has reached Epoch 16.
How many epochs are there in train_model?

Saving the best BLEU model

When the best model according to BLEU is saved (lines 1218-1223), the stored parameters are those that achieved the best error (best_p), not those with the best BLEU (i.e. the current ones). I think that best_p = unzip(tparams) is missing at line 1220.
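
The concern raised here can be illustrated with a small, self-contained sketch that keeps a separate snapshot of the current parameters for each metric. This is not the code in model_attention.py: the class BestTracker and the metric names are hypothetical, and plain Python dicts stand in for the Theano parameters.

```python
import copy


class BestTracker(object):
    """Keep a separate parameter snapshot for each tracked metric."""

    def __init__(self):
        self.best = {}  # metric name -> (value, params snapshot)

    def update(self, name, value, params, higher_is_better):
        """Snapshot params if value improves on the best seen for name."""
        previous = self.best.get(name)
        improved = (previous is None or
                    (value > previous[0] if higher_is_better
                     else value < previous[0]))
        if improved:
            # Copy the *current* parameters, not a snapshot taken for another metric.
            self.best[name] = (value, copy.deepcopy(params))
        return improved
```

With this structure, saving the best-BLEU model always writes the parameters that were current when the best BLEU was observed, independently of the best-error snapshot.
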

Input data preparation

Thanks for the great code!
We want to use the library to get some results on our dataset. When we analyze the pkl files (from the given zip file), the number of GoogLeNet features (1024-dim.) for each video is less than the actual total number of frames in the video. It seems there is some kind of frame sampling, or the features could come from the HoG, HoF, MBH feature cube in the paper, but this is unclear. Once the features are obtained, they are split into 26 equally spaced clips, from each of which the first frame is taken as input.
Have any scripts also been released for preparing the input pkl data?
Thanks.

tensorflow branch: model_attention_tf.py

Dear @yaoli, I am trying to implement the TensorFlow version. While doing so, I get the following error after typing the command 'python train_model.py'. Since I want to run it using TensorFlow, I modified the train command you mentioned in the README file. Does the file need any cleaning up? It is missing the class Attention.

Also, the initial lines when running the code indicate that Theano is still being used. How come? I did edit the config.py file to select tensorflow ('run_with': 'tensorflow').

Thanks!

image

import tables

Hi Dr. Yao,
When I test the data_engine.py, it gives me an error that "ImportError: No module named tables".
Could you tell me where this module is?
Thank you so much!

Confused by lstm_cond_layer._step() function's variable names

I was confused by lstm_cond_layer._step() function in file model_attention.py line 296:

   def _step(m_, x_, # sequences
             h_, c_, a_, ct_, # outputs_info
             pctx_, ctx_, Wd_att, U_att, c_att, W_sel, b_sel, U, Wc, # non_sequences

From the lambda function _step0 you defined and the code in the _step function, I see that ct_ means the averaged context produced by the attention mechanism and ctx_ means the context. However, this function returns ctx_ at line 337, as follows:

        rval = [h, c, alpha, ctx_, pstate_, pctx_, i, f, o, preact, alpha_pre]+pctx_list
        return rval

I suggest making this clearer. Thanks!

OSError: [Errno 12] Cannot allocate memory

Hi,

I still get this memory error even after adding self.meteor_p.kill().
I run it on an Ubuntu 16.04 virtual machine with 4 GB of RAM.

THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python train_model.py
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
current save dir /home/ruben/exp/arctic-capgen-vid/test_non/
creating directory /home/ruben/exp/arctic-capgen-vid/test_non/
erasing everything in /home/ruben/exp/arctic-capgen-vid/test_non/
rm: cannot remove '/home/ruben/exp/arctic-capgen-vid/test_non//*': No such file or directory
saving model config into /home/ruben/exp/arctic-capgen-vid/test_non/model_config.pkl
Model Type: attention
Host: ubuntu
Command: train_model.py
training an attention model
/home/ruben/arctic-capgen-vid-master/model_attention.py:37: UserWarning: Feeding context to output directly seems to hurt.
warnings.warn('Feeding context to output directly seems to hurt.')
Loading data
loading youtube2text googlenet features
uneven minibath chunking, overall 64, last one 12
uneven minibath chunking, overall 200, last one 91
uneven minibath chunking, overall 200, last one 168
init params
no lstm on ctx
Traceback (most recent call last):
File "train_model.py", line 103, in <module>
sys.exit(main(state))
File "train_model.py", line 89, in main
train_from_scratch(config, state, channel)
File "train_model.py", line 82, in train_from_scratch
model_attention.train_from_scratch(state, channel)
File "/home/ruben/arctic-capgen-vid-master/model_attention.py", line 1306, in train_from_scratch
model.train(**state.attention)
File "/home/ruben/arctic-capgen-vid-master/model_attention.py", line 928, in train
self.build_model(tparams, model_options)
File "/home/ruben/arctic-capgen-vid-master/model_attention.py", line 516, in build_model
use_noise=use_noise)
File "/home/ruben/arctic-capgen-vid-master/model_attention.py", line 355, in lstm_cond_layer
p=0.5, n=1, dtype=state_below.dtype),
File "/home/ruben/.local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1364, in binomial
x = self.uniform(size=size, nstreams=nstreams)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1324, in uniform
rstates = self.get_substream_rstates(nstreams, dtype)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/configparser.py", line 115, in res
return f(*args, **kwargs)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 1226, in get_substream_rstates
multMatVect(rval[0], A1p72, M1, A2p72, M2)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py", line 65, in multMatVect
[A_sym, s_sym, m_sym, A2_sym, s2_sym, m2_sym], o, profile=False)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/compile/function.py", line 326, in function
output_keys=output_keys)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 486, in pfunc
output_keys=output_keys)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1784, in orig_function
defaults)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1651, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/link.py", line 699, in make_thunk
storage_map=storage_map)[:3]
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/vm.py", line 1059, in make_all
impl=impl))
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/op.py", line 924, in make_thunk
no_recycling)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/op.py", line 828, in make_c_thunk
output_storage=node_output_storage)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1190, in make_thunk
keep_lock=keep_lock)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1131, in compile
keep_lock=keep_lock)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1578, in cthunk_factory
key = self.cmodule_key()
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 1268, in cmodule_key
compile_args=self.compile_args(),
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cc.py", line 951, in compile_args
ret += c_compiler.compile_args()
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1949, in compile_args
native_lines = get_lines("%s -march=native -E -v -" % theano.config.cxx)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1918, in get_lines
shell=True)
File "/home/ruben/.local/lib/python2.7/site-packages/theano/misc/windows.py", line 43, in subprocess_Popen
proc = subprocess.Popen(command, startupinfo=startupinfo, **params)
File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1235, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Load Trained Model

Hi,

How do I use my model after training?

alpha_ratio.txt
model_best.npz
model_best_blue_or_meteor.npz
model_best_so_far.npz
model_config.pkl
model_current.npz
model_options.pkl
train_valid_test.txt
valid_samples.txt

Can you show me an example of code to load the model?
Thanks!
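
As a starting point, the saved files listed above can at least be opened with standard tools. The sketch below is an assumption-laden illustration, not the package's own loading code: it assumes the .npz files are ordinary NumPy archives of plain arrays and that model_options.pkl is a regular pickle. The helper names are hypothetical, and the code targets Python 3 (on Python 2, drop the encoding argument).

```python
import pickle

import numpy as np


def load_params(npz_path):
    """Return a dict mapping parameter names to arrays stored in an .npz file."""
    with np.load(npz_path) as archive:
        return {name: archive[name] for name in archive.files}


def load_options(pkl_path):
    """Load a pickled options object; latin1 helps with Python 2 pickles."""
    with open(pkl_path, 'rb') as f:
        return pickle.load(f, encoding='latin1')


# Example usage (paths are placeholders for your exp output folder):
# params = load_params('/path/to/exp/model_best.npz')
# options = load_options('/path/to/exp/model_options.pkl')
# for name, value in sorted(params.items()):
#     print(name, value.shape)
```

This only recovers the stored arrays and options. Generating captions would still require rebuilding the model graph with the package's own code and assigning these arrays to its parameters.
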

Questions concerning pre-proccessing

In your paper, it says that you select 26 equally-spaced frames out of the first 240 from each video. What does it mean to pick 26 out of 240 frames? How do you pick the 26 frames: randomly, or by some criterion? Also, your FEAT_key_vidID_value_features.pkl file maps videos to 2D arrays. Each video has a corresponding 2D array of size 200_300 x 1024. What is the meaning of the size of these arrays, given that you specified that you pick 240 frames from each video?
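
One plausible reading of "26 equally-spaced frames out of the first 240" is to take evenly spaced indices over the first 240 frames. The sketch below shows that interpretation only; it is an assumption and may not match the authors' actual preprocessing, and the function name equally_spaced_indices is hypothetical.

```python
import numpy as np


def equally_spaced_indices(n_available, n_select=26, max_frames=240):
    """Pick n_select roughly equally spaced indices from the first max_frames."""
    n = min(n_available, max_frames)
    if n <= 0:
        return np.array([], dtype=int)
    # If n < n_select, some indices repeat, so short videos still yield n_select entries.
    return np.round(np.linspace(0, n - 1, num=n_select)).astype(int)


# Example: a feature array of shape (n_frames, 1024) reduced to (26, 1024).
features = np.zeros((300, 1024), dtype='float32')
selected = features[equally_spaced_indices(features.shape[0])]
```
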

How to generate the pkl files for new dataset

Hello @yaoli. Recently I tried to write some scripts to generate the pkl files for the new dataset MSR-VTT, such as cap.pkl, worddict.pkl and so on. But when I use them to train the model, it can't learn anything. I think this may be caused by errors in my generated files. Could you please provide the preprocessing scripts? I need them VERY MUCH for my research. Thanks a lot!

train, valid and test splits

Hi @yaoli, in the preprocessed data file, the videos have been renamed as vid***. Can you share the YouTube video-id mapping (original YouTube video name -> vid***) with me?
I want to use your split setting in my experiments and re-extract features from the videos in the Youtube2Text dataset. Thank you!

How to get the results without training the model

Hi @yaoli, thank you for your code. When I run python metrics.py to verify that the coco-caption evaluation pipeline works properly, I get results. However, I want to know whether these results are the ones corresponding to the 3rd row in Table 1 of your paper.

pytorch version

I want to know whether someone has implemented the code of this paper in PyTorch. If so, could you kindly provide it to me? Thank you!
