arctic-captions's Issues

train error!

When I train the model, I get the following error:

       14     for cc in caps:
       15         seqs.append([worddict[w] if worddict[w] < n_words else 1 for w in cc[0].split()])
  ---> 16         feat_list.append(features[cc[1]])
       17 
       18     lengths = [len(s) for s in seqs]

       TypeError: 'coo_matrix' object does not support indexing
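If anyone else hits this, a workaround that should help (my assumption about the pipeline; the matrix below is just a placeholder for the real features) is to convert the loaded feature matrix from COO to CSR once, since scipy's COO format supports no indexing at all, after which features[cc[1]] works:

import numpy
import scipy.sparse

# stand-in for the sparse feature matrix loaded from the dataset .pkl
features = scipy.sparse.coo_matrix(numpy.eye(4, dtype='float32'))

features = features.tocsr()   # COO rows cannot be indexed; CSR rows can
row = features[2]             # a 1 x 4 sparse row; indexing now works
dense = row.toarray()         # densify before feeding it to the network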

why not fine-tune the cnn?

Hey, I have noticed that CNN-LSTM models can benefit a lot from fine-tuning the CNN, but this code doesn't fine-tune it. Why not?

cannot figure out the code

I understand the image caption attention model conceptually, but I still cannot figure out how it is set up in Theano. The code is too hard for me. Does anyone else have the same problem?

about platform

Can this code be built on Windows with Theano installed? I have a problem when building in VS2013: cannot open file "cublas.lib". However, I have already installed Theano, Anaconda, and CUDA.

no re-ranking?

Actually, I know that re-ranking gives a higher score (for m-RNN, about a 3-point gain in BLEU-4 and about 9 points in CIDEr).
I'm wondering why this work does not use re-ranking as other works do. Is there no post-processing at all, either?

RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility

/Users/Calvin/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *

Is this due to my version of numpy? I googled it but couldn't find a good answer, though I found someone saying it's fine to ignore.

bias term has been added twice in LSTM layer

Both the code in the function _step(m_, x_, h_, c_) (here) and the code computing state_below (here) add the bias term, which seems wrong. Could you please take a look? Or, if it is correct, could you please explain it a bit? Thanks.
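To make the concern concrete, here is a minimal numpy sketch (names are mine, not the repo's) showing that adding the bias in both places just shifts the pre-activation by an extra b:

import numpy as np

rng = np.random.RandomState(0)
x, h = rng.randn(5), rng.randn(3)
W, U = rng.randn(5, 4), rng.randn(3, 4)
b = rng.randn(4)

state_below = x.dot(W) + b            # bias added once here...
preact = h.dot(U) + state_below + b   # ...and again inside the step
expected = h.dot(U) + x.dot(W) + b
print np.allclose(preact - expected, b)   # True: an extra shift of b

If that is right, the doubled bias is harmless in the sense that 2b is just another learnable bias, but it does make the code confusing.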

Questions or bugs in the adam optimizer

From lines 84-85 and 97-98 of optimizer.py, we can see that the b1 and b2 here correspond to '1-b1' and '1-b2', respectively, of the original Adam paper, i.e., 'Adam: A Method for Stochastic Optimization', Kingma et al. (ICLR 2015). However, I am confused by lines 90-91.
I think the code should instead be:

fix1 = 1. - (1. - b1)**i_t
fix2 = 1. - (1. - b2)**i_t

because b1 and b2 should also be switched to '1-b1' and '1-b2' consistently throughout the implementation.

I wonder how the authors use the adam optimizer when conducting experiments on MSCOCO.
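For reference, here is a minimal numpy sketch of one Adam step in the original paper's notation (beta1, beta2 here play the role that '1-b1', '1-b2' do in optimizer.py; this is my restatement for comparison, not the repo's code):

import numpy as np

def adam_step(p, g, m, v, t, lr=0.0002, beta1=0.9, beta2=0.999, eps=1e-8):
    # one Adam update as in Kingma & Ba (ICLR 2015)
    m = beta1 * m + (1. - beta1) * g        # biased first-moment estimate
    v = beta2 * v + (1. - beta2) * g ** 2   # biased second-moment estimate
    fix1 = 1. - beta1 ** t                  # bias corrections; both
    fix2 = 1. - beta2 ** t                  # approach 1 as t grows
    p = p - lr * (m / fix1) / (np.sqrt(v / fix2) + eps)
    return p, m, v

p, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for t in range(1, 4):
    p, m, v = adam_step(p, np.ones(3), m, v, t)

In this notation the corrections are 1 - beta1**t and 1 - beta2**t, which is exactly why I expected 1. - (1. - b1)**i_t in the repo's flipped convention.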

'sample = empty string' when trying to train flickr_30k

Thanks for sharing the implementation. Does anybody know why, during training, I get an empty string for every sample? Here is the output of the debugger:

Epoch 1
... Truth 0 : UNK UNK with UNK white shirt aims UNK dart at an UNK target as several other people holding darts look on beside her UNK
Sample ( 0 ) 0 :
... Truth 1 : UNK man UNK along with three others UNK is standing in snowy and cold weather with only green shoes and black shorts on UNK
Sample ( 0 ) 1 :
... Truth 2 : bald man in what looks like to be UNK lab with UNK lady watching as he is mixing some liquids in clear cup containers
Sample ( 0 ) 2 :
... Truth 3 : UNK UNK in UNK big purple hat and UNK long blue coat and UNK man with UNK satchel spending time in the city UNK
Sample ( 0 ) 3 :
... Truth 4 : UNK man rides UNK skateboard down the railing of UNK staircase in front of UNK closed storefront UNK with three people watching him UNK
Sample ( 0 ) 4 :
... Truth 5 : UNK group of dancers dressed in red and wearing short white tutus conversing UNK while UNK single dancer is practicing in the background UNK
Sample ( 0 ) 5 :
... Truth 6 : UNK on UNK baseball field UNK one baseball player is sliding into UNK base while UNK player from the opposing team is jumping UNK
Sample ( 0 ) 6 :
... Truth 7 : UNK female acrobat with long UNK blond curly hair UNK dangling upside down while suspending herself from long UNK red ribbons of fabric UNK
Sample ( 0 ) 7 :
... Truth 8 : UNK man with UNK backpack is walking down UNK street while UNK man in an orange shirt pushes UNK cart the opposite direction UNK
Sample ( 0 ) 8 :
... Truth 9 : UNK football game is going on UNK with player on the field UNK squatting on the sidelines and standing UNK watching the game UNK
Sample ( 0 ) 9 :

I extracted the CNN features of the images as illustrated here: #1

Post your evaluation score

Hello, everyone,

I got the following scores after running on coco.

{'CIDEr': 0.50350648251818364, 'Bleu_4': 0.20037826460154334, 'Bleu_3': 0.2920434703847389, 'Bleu_2': 0.42775646056296673, 'Bleu_1': 0.6105274018537202, 'ROUGE_L': 0.43556281782994649, 'METEOR': 0.23890246684760072}

So METEOR is almost the same, but my BLEU scores are 7-8% lower than the paper's. I wonder whether this is acceptable or whether something is wrong in my process.

Would you please share your results in this post?

Thanks.

Question about sampling in stochastic attention

Hello, while analyzing the source code, I found the process of getting alpha_sample in stochastic hard attention quite unclear, mainly because of the variable 'h_sampling_mask'.

The sampling part of the code is (in capgen.py, line 409):

alpha_sample = h_sampling_mask * trng.multinomial(pvals=alpha, dtype=theano.config.floatX) \
             + (1. - h_sampling_mask) * alpha

When h_sampling_mask is 1, alpha_sample would be the sampling result of the multinomial distribution.
When h_sampling_mask is 0, however, alpha_sample would be simply alpha.

I thought, according to the paper, alpha_sample should simply be

alpha_sample = trng.multinomial(pvals=alpha, dtype=theano.config.floatX)

which is equivalent to setting h_sampling_mask to 1.

Why is "h_sampling_mask" needed?

Feature extraction

Hello, thank you for sharing this great project.

I would like to run the code to generate a caption for my own image. To that end, I tried to extract the image feature vector with the VGG model (based on the paper). I used the code below to extract the feature vector (the values of conv5_3):


input_img = numpy.array(caffe.io.load_image(path_img))    # load image, HxWx3 (RGB)
caffe_input = numpy.array(preprocess_image(input_img))    # preprocess the image

caffe_net.blobs['data'].reshape(1, 3, 224, 224)
caffe_net.blobs['data'].data[...] = caffe_input

out = caffe_net.forward()

# read the blob's .data (the blob object itself cannot be reshaped);
# conv5_3 already has shape (1, 512, 14, 14)
feature = numpy.array(caffe_net.blobs['conv5_3'].data)

How can I get the context vector for the function capgen.gen_sample? I am not good at Python and caffe, and I want to know how to get the 'context' from the VGG model.
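In case a sketch helps anyone answer: this is how I currently plan to flatten the conv5_3 blob into the per-image context, i.e. 14*14 = 196 annotation vectors of 512 dimensions; the (locations, dims) layout is my assumption from the paper, not verified against the repo:

import numpy

# placeholder for the (1, 512, 14, 14) conv5_3 blob extracted above
feat = numpy.zeros((1, 512, 14, 14), dtype='float32')

# collapse the 14x14 spatial grid into 196 annotation vectors of 512 dims
context = feat.reshape(512, 14 * 14).T
print context.shape   # (196, 512)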

Thank you.

question about "doubly stochastic attention"

As I'm reading the paper, I don't understand why, for the soft attention version, we encourage \sum_{t} a_{ti} \approx 1; I feel C/L would be more appropriate, since \sum_{t,i} a_{ti} = C.
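To spell out the bookkeeping behind my question: the softmax gives \sum_i a_{ti} = 1 for every t, so over a caption of length C with L attention locations,

\[
\sum_{t=1}^{C} \sum_{i=1}^{L} a_{ti} = C
\quad\Longrightarrow\quad
\text{average mass per location} = \frac{C}{L},
\]

while the penalty pushes each location's total \sum_t a_{ti} toward 1 instead.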

Instruction to visualize hard attention

Hi, thanks for sharing the code. I expected alpha_visualization.ipynb to visualize hard attention just like the paper presents. But I trained a stochastic model and visualized it, and the visualization looks like the soft one. Has anyone successfully visualized hard attention? Could you give me some instructions on how to do it? Thanks. :)

Bug in doubly stochastic attention?

It seems like when computing the doubly stochastic attention penalty, the code is doing:

alpha_reg = alpha_c * ((1.-alphas.sum(0))**2).sum(0).mean()

As per my understanding, alphas has dimensions [sequence_length, batch_size, feature_map_spatial_extent], where the spatial extent for vgg conv5 is 14 x 14 = 196.

This means that we are averaging over the 196 spatial locations as opposed to averaging over the minibatch. Is this the expected behavior?

Any clarification on this would be great!
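Tracing the axes in numpy makes the question concrete (shapes as stated above; this is only my bookkeeping, not a claim about the intent):

import numpy as np

T_len, batch, L = 10, 64, 196
alphas = np.full((T_len, batch, L), 1. / L)   # (time, batch, locations)

per_loc = (1. - alphas.sum(0)) ** 2           # (batch, locations)
reg = per_loc.sum(0).mean()                   # sum over batch, mean over 196

If I read this right, switching the mean to the batch axis would only rescale the penalty by a constant (196/64 here), which could equally be absorbed into alpha_c; still, I would like to know which was intended.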

Typo in the arXiv & ICML paper

From the code and Pascanu et al., I think that Eqn 7 in the arXiv version (or Eqn 2 in the ICML version) should have a tanh after L_0.

Also, is a y_t missing from the same equation? Compare Bahdanau et al., Appendix, page 14: p(y_i | s_i, y_{i-1}, c_i).

question about grads of alphas in hard Attention

Hello, I feel really confused about the gradients of the alphas in hard attention. The source code is at line 1199:

known_grads = {alphas: opt_outs['masked_cost'][:, :, None] / 10. *
                       (alphas_sample / alphas) +
                       alpha_entropy_c * (tensor.log(alphas) + 1)}

Can anyone explain this to me, please?
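For what it's worth, my reading of the line (an interpretation, not an authoritative one) is the REINFORCE score-function estimator: for a one-hot sample s drawn from a multinoulli with parameters alpha,

\[
\frac{\partial \log p(s \mid \alpha)}{\partial \alpha_i}
= \frac{\partial}{\partial \alpha_i} \sum_j s_j \log \alpha_j
= \frac{s_i}{\alpha_i},
\qquad
\frac{\partial}{\partial \alpha_i} \sum_j \alpha_j \log \alpha_j
= \log \alpha_i + 1 ,
\]

so masked_cost[:, :, None] acts as the reward, alphas_sample / alphas is the score function, and the alpha_entropy_c term is the gradient of the negative-entropy regularizer. I cannot tell from the paper what the / 10. scaling corresponds to.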

Dataset Format and Flow of Code

I want to run this on my own dataset. Is there any specific format it needs? How do I prepare the dataset? And what is the flow of this code?

Does anybody know of a TensorFlow implementation of this code, ideally one with beam search as well?

where is model_name.npz?

Is there any tutorial for the code? I can hardly understand all these files. I ran the ipynb cells and can't find model_name.npz. It seems dev_list and image_path also need to be specified:

datasets = {'flickr8k': (flickr8k.load_data, flickr8k.prepare_data),
            'flickr30k': (flickr30k.load_data, flickr30k.prepare_data),
            'coco': (coco.load_data, coco.prepare_data)}

# location of the model file; the pkl file should be named "model_name.npz.pkl"
model = 'model_name.npz'

# location of the dev set split file, like the ones in /splits
dev_list = './splits/coco_val.txt'
image_path = './path_to_coco_dev_image/'

# load model options
with open('%s.pkl' % model, 'rb') as f:
    options = pkl.load(f)

print 'Loading: ' + options['dataset']

flist = []
with open(dev_list, 'r') as f:
    for l in f:
        flist.append(l.strip())

MemoryError in sampling process (gen_model in generate_caps.py)

My system configuration is an i5-4440k with a 4GB GTX 960. When I run generate_caps.py for the flickr30k version, I get an error like this:

loading data ...
Error when trying to find the memory information on the GPU: initialization error
Error allocating 4000000 bytes of device memory (initialization error). Driver report 0 bytes free and 0 bytes total 
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "generate_caps.py", line 30, in gen_model
    tparams = init_tparams(params)
  File "/home/f0z/sat/arctic-captions/capgen.py", line 94, in init_tparams
    k2 = theano.shared(params[params.keys()[0]], name=params.keys()[0], borrow=True)
  File "/home/f0z/pyenvs/DL/local/lib/python2.7/site-packages/theano/compile/sharedvalue.py", line 208, in shared
    allow_downcast=allow_downcast, **kwargs)
  File "/home/f0z/pyenvs/DL/local/lib/python2.7/site-packages/theano/sandbox/cuda/var.py", line 203, in float32_shared_constructor
    deviceval = type_support_filter(value, type.broadcastable, False, None)
MemoryError: ('Error allocating 4000000 bytes of device memory (initialization error).', "you might consider using 'theano.shared(..., borrow=True)'")

I've run nvidia-smi and there is plenty of free memory (~3.5GB, and I'm only loading the dev dataset), and there are no permission issues. Since the error occurs while declaring shared variables on the GPU from within a process, I'm guessing gen_model cannot allocate GPU memory from a subprocess. Can anyone suggest a workaround?
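One workaround I would try (my own guess about fork semantics, not advice from the maintainers): CUDA contexts do not survive a fork, and an inherited context produces exactly this kind of "initialization error", so make sure Theano is imported, and thus the GPU initialized, only inside the child process:

import multiprocessing as mp

def worker():
    # import Theano here, after the fork, so the CUDA context is created
    # in the child instead of being inherited from the parent
    import theano
    x = theano.shared(1.0, name='x')
    print x.get_value()

if __name__ == '__main__':
    p = mp.Process(target=worker)
    p.start()
    p.join()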

Reproduce result

Hi Kelvin,

After training the model for 2000 epochs, I tried to use generate_caps.py to get the captions. However, the results were meaningless. Is there anything I need to take care of regarding generate_caps.py?

Thanks

"NaN detected" when using 'attn_type': 'stochastic'

When I set attn_type to deterministic, training works fine; when I set it to stochastic, I get a 'NaN detected' error and training halts.
Here is an example output:

$ python evaluate_coco.py 
Using gpu device 0: GeForce GTX TITAN Black
Using the following parameters:
{'lrate': 0.01, 'decay_c': 0.0, 'patience': 10, 'save_per_epoch': False, 'n_layers_init': 2, 'RL_sumCost': True, 'max_epochs': 5000, 'dispFreq': 1, 'attn_type': 'stochastic', 'alpha_c': 1.0, 'temperature': 1.0, 'n_layers_att': 2, 'saveto': 'my_caption_model.npz', 'ctx_dim': 512, 'valid_batch_size': 64, 'lstm_encoder': False, 'n_layers_lstm': 1, 'optimizer': 'adam', 'validFreq': 2000, 'dictionary': None, 'batch_size': 64, 'selector': True, 'n_words': 10000, 'dataset': 'coco', 'use_dropout_lstm': False, 'prev2out': True, 'dim': 1800, 'use_dropout': True, 'dim_word': 512, 'sampleFreq': 250, 'semi_sampling_p': 0.5, 'n_layers_out': 1, 'saveFreq': 1000, 'maxlen': 100, 'alpha_entropy_c': 0.002, 'ctx2out': True, 'reload_': False}
Loading data
... loading data
Building model
Buliding sampler
Building f_init... Done
Optimization
Epoch  0
Epoch  0 Update  1 Cost  527.620910645 PD  0.000997066497803 UD  0.72459602356
NaN detected
Traceback (most recent call last):
  File "evaluate_coco.py", line 81, in <module>
    main(defaults)
  File "evaluate_coco.py", line 49, in main
    print "Final cost: {:.2f}".format(validerr.mean())
AttributeError: 'float' object has no attribute 'mean'

I am using theano version '0.6.0.dev-8e85dbabd78c3932997aaf840832a1bb5c5835b3'

How soon should the sanity check start to look fine?

I'm running capgen.py, but the results of the sanity check don't look meaningful: the five generated captions for each image are all empty or "a". How soon can I tell whether it's working? I just want it to show random words rather than nothing or "a". Or do I have to be patient and wait a few hours until it generates meaningful captions?

MissingInputError from theano.function

Hello, thank you for sharing.

I tried to generate a caption for my own image. To that end, I followed your code: I made a model file with the capgen.train function (using Flickr8k data and default parameters).

Based on that model, I tried to generate a caption. However, I get the following error:


MissingInputError                         Traceback (most recent call last)
in <module>()
      1 # get the alphas and selector value [called \beta in the paper]
----> 2 f_alpha = theano.function(inps, alphas, name='f_alpha')
      3 if options['selector']:
      4     f_sels = theano.function(inps, opt_outs['selector'], name='f_sels')

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function.pyc in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
264 allow_input_downcast=allow_input_downcast,
265 on_unused_input=on_unused_input,
--> 266 profile=profile)
267 # We need to add the flag check_aliased inputs if we have any mutable or
268 # borrowed used defined inputs

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/pfunc.pyc in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
509 return orig_function(inputs, cloned_outputs, mode,
510 accept_inplace=accept_inplace, name=name, profile=profile,
--> 511 on_unused_input=on_unused_input)
512
513

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.pyc in orig_function(inputs, outputs, mode, accept_inplace, name, profile, on_unused_input)
1463 accept_inplace=accept_inplace,
1464 profile=profile,
-> 1465 on_unused_input=on_unused_input).create(
1466 defaults)
1467

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.pyc in __init__(self, inputs, outputs, mode, accept_inplace, function_builder, profile, on_unused_input, fgraph)
1131 need_opt = True
1132 # make the fgraph (copies the graph, creates NEW INPUT AND OUTPUT VARIABLES)
-> 1133 fgraph, additional_outputs = std_fgraph(inputs, outputs, accept_inplace)
1134 fgraph.profile = profile
1135 else:

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.pyc in std_fgraph(input_specs, output_specs, accept_inplace)
139 orig_outputs = [spec.variable for spec in output_specs] + updates
140
--> 141 fgraph = gof.fg.FunctionGraph(orig_inputs, orig_outputs)
142
143 for node in fgraph.apply_nodes:

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/gof/fg.pyc in __init__(self, inputs, outputs, features, clone)
133 self.variables.add(input)
134
--> 135 self.__import_r__(outputs, reason="__init__")
136 for i, output in enumerate(outputs):
137 output.clients.append(('output', i))

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/gof/fg.pyc in __import_r__(self, variables, reason)
255 for apply_node in [r.owner for r in variables if r.owner is not None]:
256 if apply_node not in self.apply_nodes:
--> 257 self.__import__(apply_node, reason=reason)
258 for r in variables:
259 if r.owner is None and not isinstance(r, graph.Constant) and r not in self.inputs:

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/gof/fg.pyc in __import__(self, apply_node, check, reason)
360 "for more information on this error."
361 % str(node)),
--> 362 r)
363
364 for node in new_nodes:

MissingInputError: ("An input of the graph, used to compute dot(<TensorType(float32, matrix)>, decoder_Wd_att), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", <TensorType(float32, matrix)>)


I start ipython with the Theano flags: "THEANO_FLAGS=device=gpu,floatX=float32 ipython notebook".

Did I make a mistake somewhere?

Thank you.

Step by step readme file

Hi Kelvin, I am studying image and video captioning and found your paper, which is very well explained. I want to try your model, but I am quite confused, as the README file you provided isn't really helping. Could you please explain how to get started with your model, e.g. which file handles feature extraction, caption generation, training, evaluation, etc.?

Kind Regards

Datasets in .pkl format?

Hello, thank you for sharing this great project.

I would like to run the code, but it seems the project does not contain the datasets it uses. I can get the flickr or coco datasets, but I do not know how the data is preprocessed into those .pkl files.

Could I possibly get the data as it is used in the project?
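From the snippets quoted in other issues here (prepare_data reads caps as (caption, image-index) tuples and indexes features[cc[1]]), my best guess at the layout is a list of caption tuples plus a sparse feature matrix pickled into the same file; everything below (file name, shapes, the two-object pickle layout) is an unverified assumption:

import cPickle as pkl
import numpy
import scipy.sparse

caps = [('a man rides a skateboard', 0),     # (caption, row in feats)
        ('a group of dancers in red', 1)]
feats = scipy.sparse.csr_matrix(
    numpy.zeros((2, 14 * 14 * 512), dtype='float32'))

with open('my_align.train.pkl', 'wb') as f:  # hypothetical file name
    pkl.dump(caps, f, protocol=pkl.HIGHEST_PROTOCOL)
    pkl.dump(feats, f, protocol=pkl.HIGHEST_PROTOCOL)

If anyone has the actual preprocessing script, please correct me.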

Thank you.

metrics.py reference format?

The metrics.py script requires a path for the hypothesis files and one for the reference files.
I assume the hypothesis file is the one generated by generate_caps.py, but could you please explain the format of the reference files, given that there are about 5 sentences per image?
Thank you!

KeyError: 'A'

When I run prepare_data() in flickr30k.py, it always reports KeyError: 'A'.
Does this mean my dictionary.pkl is incomplete?
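My own guess (unverified) is that dictionary.pkl is built from lowercased tokens, so capitalized words like 'A' are simply absent rather than the pickle being broken. Lowercasing the caption, and mapping anything still unknown to the UNK index 1, would avoid the crash; a sketch with a toy dictionary:

worddict = {'a': 2, 'man': 3}   # toy dictionary with lowercase keys
n_words = 10000                 # vocabulary cutoff from the options

caption = 'A man'
seq = [worddict[w] if worddict.get(w, n_words) < n_words else 1
       for w in caption.lower().split()]
print seq   # [2, 3] instead of KeyError: 'A'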

How to get started and how the convolution works

Hi, Kelvin

I want to run your code, but I don't know how to start.

I downloaded some corpus and got the file "dataset.json"; is that what I need?

Also, I want to know how the convolution in the "encoder" works; can you give me an example?

For example, given a 3-D input of size 100 x 200 x 3, how should I set my filter (extractor), 3-D or 2-D or something else? And what would the result be?

Best wishes

Trained Models

I am not sure whether this is the right place to ask, but have you published any pretrained models for any dataset?

Generate captions for custom images

Hello!
I'd like to test your great framework by generating captions for my own set of images.
How can I do that without training (using any pretrained model)?

MemoryError

I tried to run it on the GPU without revising any hyperparameters (only the paths). However, I got

MemoryError: Error allocating 25690112 bytes of device memory (out of memory).

during epoch 2. I'm not sure whether I used the wrong command:

THEANO_FLAGS='mode=FAST_RUN,floatX=float32,device=gpu0' python evaluate_flickr30k.py

It shows CNMeM is disabled. Then I tried:

THEANO_FLAGS='mode=FAST_RUN,floatX=float32,device=gpu0,lib.cnmem=1' python evaluate_flickr30k.py

However, when it runs gen_sample, the sample sentence vanishes, which is really strange.

Or is something wrong with my preprocessing? I made the four pkl files following #1.
