kelvinxu / arctic-captions Goto Github PK

Python 88.75% Jupyter Notebook 11.25%

arctic-captions's Introduction

arctic-captions

Source code for Show, Attend and Tell: Neural Image Caption Generation with Visual Attention runnable on GPU and CPU.

Joint collaboration between the Université de Montréal & University of Toronto.

Dependencies

This code is written in Python. To use it you will need:

Python 2.7
A relatively recent version of NumPy
scikit-learn
scikit-image (skimage)
argparse

In addition, this code is built using the powerful Theano library. If you encounter problems specific to Theano, please use a commit from around February 2015 and notify the authors.

To use the evaluation script (metrics.py): see coco-caption for the requirements.

Reference

If you use this code as part of any published research, please acknowledge the following paper (it encourages researchers who publish their code!):

"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention."
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio. To appear ICML (2015)

@article{Xu2015show,
    title={Show, Attend and Tell: Neural Image Caption Generation with Visual Attention},
    author={Xu, Kelvin and Ba, Jimmy and Kiros, Ryan and Cho, Kyunghyun and Courville, Aaron and Salakhutdinov, Ruslan and Zemel, Richard and Bengio, Yoshua},
    journal={arXiv preprint arXiv:1502.03044},
    year={2015}
}

License

The code is released under a revised (3-clause) BSD License.

arctic-captions's Issues

'sample = empty string' when trying to train flickr_30k

Thanks for sharing the implementation. Does anybody know why, during training, every 'Sample' comes out as an empty string? Here is the output of the debugger:

Epoch 1
... Truth 0 : UNK UNK with UNK white shirt aims UNK dart at an UNK target as several other people holding darts look on beside her UNK
Sample ( 0 ) 0 :
... Truth 1 : UNK man UNK along with three others UNK is standing in snowy and cold weather with only green shoes and black shorts on UNK
Sample ( 0 ) 1 :
... Truth 2 : bald man in what looks like to be UNK lab with UNK lady watching as he is mixing some liquids in clear cup containers
Sample ( 0 ) 2 :
... Truth 3 : UNK UNK in UNK big purple hat and UNK long blue coat and UNK man with UNK satchel spending time in the city UNK
Sample ( 0 ) 3 :
... Truth 4 : UNK man rides UNK skateboard down the railing of UNK staircase in front of UNK closed storefront UNK with three people watching him UNK
Sample ( 0 ) 4 :
... Truth 5 : UNK group of dancers dressed in red and wearing short white tutus conversing UNK while UNK single dancer is practicing in the background UNK
Sample ( 0 ) 5 :
... Truth 6 : UNK on UNK baseball field UNK one baseball player is sliding into UNK base while UNK player from the opposing team is jumping UNK
Sample ( 0 ) 6 :
... Truth 7 : UNK female acrobat with long UNK blond curly hair UNK dangling upside down while suspending herself from long UNK red ribbons of fabric UNK
Sample ( 0 ) 7 :
... Truth 8 : UNK man with UNK backpack is walking down UNK street while UNK man in an orange shirt pushes UNK cart the opposite direction UNK
Sample ( 0 ) 8 :
... Truth 9 : UNK football game is going on UNK with player on the field UNK squatting on the sidelines and standing UNK watching the game UNK
Sample ( 0 ) 9 :

I extracted the CNN features of the images as illustrated here: #1

Generate captions for custom images

Hello!
I'd like to test your great framework by generating captions for my own set of images.
How can I do that without training (using any pretrained model)?

Typo in the arXiv & ICML paper

From the code and Pascanu et al., I think that Eqn 7 in the arXiv version (or Eqn 2 in the ICML version) should add a tanh after L_0.

Is a y_t also missing from the same equation? See Bahdanau et al., Appendix, page 14: p(y_i | s_i, y_{i-1}, c_i)

TensorFlow implementation of this model?

Is there any TensorFlow code base available for this model? I found one, but it lacks some core properties.

question about grads of alphas in hard Attention

Hello, I feel really confused about the gradients of alphas in hard attention. The source code is at line 1199:

known_grads={alphas:opt_outs['masked_cost'][:,:,None]/10.*
(alphas_sample/alphas) + alpha_entropy_c*(tensor.log(alphas) + 1)})

Can anyone explain this to me, please?
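One way to read the two terms, offered as a sketch rather than the authors' explanation: alphas_sample/alphas is the derivative of the log-probability of the sampled location, sum_i s_i log(alpha_i), with respect to alpha_i, and log(alphas) + 1 is the derivative of sum_i alpha_i log(alpha_i), the negative entropy. The NumPy check below confirms both derivatives by finite differences. All function and variable names are illustrative, and the cost scaling (including the division by 10) is left out because the snippet does not explain it.

```python
import numpy as np

def log_prob(alpha, sample):
    # log-probability of a one-hot sample under a categorical distribution
    return float(np.sum(sample * np.log(alpha)))

def neg_entropy(alpha):
    # sum_i alpha_i * log(alpha_i), i.e. minus the entropy
    return float(np.sum(alpha * np.log(alpha)))

def numeric_grad(f, alpha, eps=1e-6):
    # central finite differences, treating each alpha_i as a free variable
    grad = np.zeros_like(alpha)
    for i in range(alpha.size):
        up, down = alpha.copy(), alpha.copy()
        up[i] += eps
        down[i] -= eps
        grad[i] = (f(up) - f(down)) / (2 * eps)
    return grad

alpha = np.array([0.1, 0.2, 0.3, 0.4])
sample = np.array([0.0, 0.0, 1.0, 0.0])   # one-hot draw: location 2 was attended

score_grad = sample / alpha                # same form as alphas_sample / alphas
entropy_grad = np.log(alpha) + 1.0         # same form as tensor.log(alphas) + 1

score_numeric = numeric_grad(lambda a: log_prob(a, sample), alpha)
entropy_numeric = numeric_grad(neg_entropy, alpha)
```

Under this reading, the first term is a REINFORCE-style score function weighted by the cost, and the second is an entropy regularizer weighted by alpha_entropy_c.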

Bug in doubly stochastic attention?

It seems like when computing the doubly stochastic attention, the code is doing:

alpha_reg = alpha_c * ((1.-alphas.sum(0))**2).sum(0).mean()

As I understand it, alphas has dimensions [sequence_length, batch_size, feature_map_spatial_extent], where the spatial extent for VGG conv5 is 14 x 14 = 196.

This means that we are averaging over the 196 spatial locations as opposed to averaging over the minibatch. Is this the expected behavior?

Any clarification on this would be great!
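The two reductions can be compared directly in NumPy. This is a minimal sketch, assuming the [sequence_length, batch_size, locations] layout described in the issue; the array sizes and the name reg_batch_mean are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, B, L = 5, 3, 196          # sequence length, batch size, spatial locations
alpha_c = 1.0

# random attention weights that sum to 1 over the L locations at every step
logits = rng.normal(size=(T, B, L))
alphas = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)

per_location = (1.0 - alphas.sum(0)) ** 2   # shape (B, L)

# reduction as quoted in the issue: sum over the batch, then mean over locations
reg_as_quoted = alpha_c * per_location.sum(0).mean()

# alternative: sum over locations, then mean over the batch
reg_batch_mean = alpha_c * per_location.sum(1).mean()
```

Both expressions divide the same grand total, one by L and the other by B, so for a fixed batch size they differ only by the constant factor B / L, which acts like a rescaling of alpha_c.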

MemoryError

I ran the code on a GPU without changing any hyperparameters (only the paths). However, I got
MemoryError: Error allocating 25690112 bytes of device memory (out of memory).
during epoch 2.
I'm not sure whether I used the wrong command:
THEANO_FLAGS='mode=FAST_RUN,floatX=float32,device=gpu0' python evaluate_flickr30k.py
The output shows that CNMeM is disabled.
Then I tried:
THEANO_FLAGS='mode=FAST_RUN,floatX=float32,device=gpu0,lib.cnmem=1' python evaluate_flickr30k.py
However, when it runs gen_sample, the sampled sentences come out empty, which is really strange.

Or is something wrong with the preprocessing?
I made the four pkl files by following
#1

Question about sampling in stochastic attention

Hello, while analyzing the source code, I found the way alpha_sample is obtained under stochastic hard attention quite unclear, mainly because of the variable 'h_sampling_mask'.

The sampling part of the code is (in capgen.py, line 409),
alpha_sample = h_sampling_mask * trng.multinomial(pvals=alpha,dtype=theano.config.floatX)
+ (1.-h_sampling_mask) * alpha

When h_sampling_mask is 1, alpha_sample would be the sampling result of the multinomial distribution.
When h_sampling_mask is 0, however, alpha_sample would be simply alpha.

I thought that, according to the paper, alpha_sample should simply be
alpha_sample = trng.multinomial(pvals=alpha,dtype=theano.config.floatX)
which is equivalent to setting h_sampling_mask 1.

Why is "h_sampling_mask" needed?
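The mixing expression can be reproduced in NumPy to see both branches. This is a sketch with made-up names and a scalar mask. One plausible reading, which is an assumption and not confirmed by the source, is that the mask lets some updates use the soft expectation instead of a hard sample; the 'semi_sampling_p' option that appears in a parameter dump in another issue would fit that reading.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_sample(alpha, h_sampling_mask, rng):
    # draw one location per row from the categorical distribution alpha
    one_hot = np.stack([rng.multinomial(1, p) for p in alpha]).astype(alpha.dtype)
    # mask = 1 keeps the hard one-hot sample, mask = 0 keeps the soft weights
    return h_sampling_mask * one_hot + (1.0 - h_sampling_mask) * alpha

alpha = np.array([[0.1, 0.2, 0.7],
                  [0.5, 0.25, 0.25]])

hard = mix_sample(alpha, 1.0, rng)   # behaves like pure multinomial sampling
soft = mix_sample(alpha, 0.0, rng)   # behaves like deterministic attention
```

With the mask fixed at 1 the result matches the plain multinomial form the question proposes, so the mask only matters when it is sometimes 0.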

Post your evaluation score

Hello, everyone,

I got the following scores after running on COCO.

{'CIDEr': 0.50350648251818364, 'Bleu_4': 0.20037826460154334, 'Bleu_3': 0.2920434703847389, 'Bleu_2': 0.42775646056296673, 'Bleu_1': 0.6105274018537202, 'ROUGE_L': 0.43556281782994649, 'METEOR': 0.23890246684760072}

So METEOR is almost the same. However, my BLEU scores are 7~8% lower than in the paper. I wonder whether this is acceptable or whether something is wrong in my process.

Would you please share your results in this post?

Thanks.

Instruction to visualize hard attention

Hi, thanks for sharing the code. I expected alpha_visualization.ipynb to visualize hard attention the way the paper presents it. But when I trained a stochastic model and visualized it, the result looked like the soft one. Has anyone successfully visualized hard attention, and could you give me some instructions on how to do it? Thanks. :)

train error!

When I train the model, I get this error:

       14     for cc in caps:
       15         seqs.append([worddict[w] if worddict[w] < n_words else 1 for w in cc[0].split()])
  ---> 16         feat_list.append(features[cc[1]])
       17 
       18     lengths = [len(s) for s in seqs]

       TypeError: 'coo_matrix' object does not support indexing
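SciPy's COO format is meant for building sparse matrices and, as the traceback shows, rejects indexing. A common workaround is to convert to CSR before selecting rows. The sketch below uses a small made-up matrix in place of the real features; whether the repository's loader should do this conversion itself is not something the issue settles.

```python
import numpy as np
from scipy import sparse

# hypothetical stand-in for the loaded feature matrix: 4 images x 6 feature dims
features = sparse.coo_matrix(np.arange(24, dtype=np.float32).reshape(4, 6))

# the traceback shows COO rejecting indexing; CSR supports row indexing
features_csr = features.tocsr()

row = features_csr[2]                 # 1 x 6 sparse row for image index 2
dense_row = row.toarray().ravel()     # dense 1-D vector for that image
```

Converting once right after loading keeps the per-caption lookup features[cc[1]] unchanged in the rest of the code.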

Instructions to re-train the model: where to start?

Are there step-by-step instructions for training the model on the COCO dataset? I cloned the repo and tried running capgen.py, but that didn't work.

A bug in setting Adam optimizer learning rate?

Hi,
It looks like the Adam optimizer's learning rate is not affected by the function's input and is fixed to
lr0=0.0002
https://github.com/kelvinxu/arctic-captions/blob/master/optimizers.py#L83

Is this intended behavior?
Thank you

Why not fine-tune the CNN?

Hey, I have noticed that CNN-LSTM models can benefit a lot from fine-tuning the CNN. Why doesn't the code fine-tune the CNN?

MemoryError in sampling process (gen_model in generate_caps.py)

My system configuration is i5-4440k with a 4GB GTX 960. When I run the generate_caps.py for the flickr30k version, I get an error like this:

loading data ...
Error when trying to find the memory information on the GPU: initialization error
Error allocating 4000000 bytes of device memory (initialization error). Driver report 0 bytes free and 0 bytes total 
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "generate_caps.py", line 30, in gen_model
    tparams = init_tparams(params)
  File "/home/f0z/sat/arctic-captions/capgen.py", line 94, in init_tparams
    k2 = theano.shared(params[params.keys()[0]], name=params.keys()[0], borrow=True)
  File "/home/f0z/pyenvs/DL/local/lib/python2.7/site-packages/theano/compile/sharedvalue.py", line 208, in shared
    allow_downcast=allow_downcast, **kwargs)
  File "/home/f0z/pyenvs/DL/local/lib/python2.7/site-packages/theano/sandbox/cuda/var.py", line 203, in float32_shared_constructor
    deviceval = type_support_filter(value, type.broadcastable, False, None)
MemoryError: ('Error allocating 4000000 bytes of device memory (initialization error).', "you might consider using 'theano.shared(..., borrow=True)'")

I've run nvidia-smi and there is plenty of free memory (~3.5GB, and I'm only loading the dev dataset). There are no permission issues either. Since the error occurs while declaring shared variables on the GPU from inside a child process, I'm guessing that gen_model is unable to allocate GPU memory from that process. Can anyone suggest a workaround?

Can I get pkl files?

I would like to have 'coco_align.train.pkl' and other pkl files.
I tried
https://github.com/intuinno/arctic-captions
https://github.com/rowanz/arctic-captions
https://github.com/Lorne0/arctic-captions
but failed. Please help me..

bias term has been added twice in LSTM layer

Both the code in the function _step(m_, x_, h_, c_) and the code that computes state_below add the bias term, which seems to be wrong. Could you please have a look? Or, if it is correct, could you please explain it a bit? Thx.

question about "doubly stochastic attention"

While reading the paper, I don't understand why, for the soft attention version, we encourage \sum_{t} a_{ti} \approx 1. I feel C/L would be more appropriate, since \sum_{t,i} a_{ti} = C.
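The arithmetic behind the question can be written out, taking C as the number of decoding steps and L as the number of locations, as the question does. This only restates the counting argument and is not a statement about what the authors intended.

```latex
\sum_{i=1}^{L} a_{ti} = 1 \;\; \text{for every } t
\quad\Longrightarrow\quad
\sum_{t=1}^{C} \sum_{i=1}^{L} a_{ti} = C ,
\qquad
\text{so a uniform budget per location is } \sum_{t=1}^{C} a_{ti} = \frac{C}{L} .
```

Requiring \sum_{t} a_{ti} = 1 at every location would force the total to be L, so it can hold exactly only when C = L. For other caption lengths the penalty acts as a soft pull rather than a constraint that can be satisfied.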

Questions or bugs in the adam optimizer

From lines 84, 85 and 97, 98 of optimizers.py, we can see that b1 and b2 here correspond to '1-b1' and '1-b2' respectively in the original Adam paper, i.e., "Adam: A Method for Stochastic Optimization", Kingma et al. (ICLR 2015). However, I am confused by lines 90 and 91.
I think the code should instead be:
fix1 = 1. - (1-b1)**(i_t)
fix2 = 1. - (1-b2)**(i_t)
because b1 and b2 should be replaced by '1-b1' and '1-b2' consistently throughout the implementation.

I wonder how the authors use the adam optimizer when conducting experiments on MSCOCO.
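For comparison, here is a scalar sketch of one Adam step in the form commonly given for the algorithm, with beta1 and beta2 as the decay rates. It is not the repository's code. The values b1 = 0.1 and b2 = 0.001 are illustrative assumptions chosen to match the issue's convention b1 = 1 - beta1, b2 = 1 - beta2.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update in the paper's notation (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)       # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)       # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v, m_hat, v_hat

# the issue's convention: b1 = 1 - beta1, b2 = 1 - beta2 (illustrative values)
b1, b2 = 0.1, 0.001
param, m, v, m_hat, v_hat = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1,
                                      beta1=1.0 - b1, beta2=1.0 - b2)
```

In this form the correction factor is 1 - beta1**t, which in the issue's convention is 1 - (1-b1)**t. With it, the first step recovers the raw gradient as m_hat.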

where is model_name.npz?

Is there any tutorial for the code? I can hardly understand all these files.
I ran the ipynb cells and can't find model_name.npz. It seems dev_list and image_path also need to be specified.

datasets = {'flickr8k': (flickr8k.load_data, flickr8k.prepare_data),
            'flickr30k': (flickr30k.load_data, flickr30k.prepare_data),
            'coco': (coco.load_data, coco.prepare_data)}

# location of the model file, the pkl file should be named "model_name.npz.pkl"
model = 'model_name.npz'

# location of the devset split file like the ones in /splits
dev_list = './splits/coco_val.txt'
image_path = './path_to_coco_dev_image/'

# load model model_options
with open('%s.pkl' % model, 'rb') as f:
    options = pkl.load(f)

print 'Loading: ' + options['dataset']

flist = []
with open(dev_list, 'r') as f:
    for l in f:
        flist.append(l.strip())

Training parameters for Flickr30k, coco dataset?

Hello, thank you for sharing.

The shared source contains training parameters for the Flickr8k dataset.
Can I use the same parameters for the Flickr30k and COCO datasets?

Thank you ^^;

Sample script to train the model

Hi, do you mind sharing some sample scripts or command lines to train the model?

Reproduce result

Hi Kelvin,

After training the model for 2000 epochs, I tried to use generate_caps.py to get the captions. However, the results were meaningless. Is there anything I need to take care of when using generate_caps.py?

Thanks

how to run and get_start?

no re-ranking?

I know that re-ranking gives higher scores (for m-RNN, a gain of about 3 points in BLEU-4 and about 9 points in CIDEr).
I'm wondering whether this work skips the re-ranking that other works use. Is there also no post-processing?

about platform

Can this code be built on Windows with Theano installed? I have a problem when building in VS2013: cannot open file "cublas.lib". However, I have already installed Theano, Anaconda and CUDA.

MissingInputError from theano.function

Hello, thank you for sharing.

I tried to generate a caption for my own image, following your code. I made a model file with the capgen.train function (using Flickr8k data and default parameters).

Based on that model, I tried to generate a caption. However, I get this error:

MissingInputError Traceback (most recent call last)
in ()
1 # get the alphas and selector value [called \beta in the paper]
----> 2 f_alpha = theano.function(inps, alphas, name='f_alpha')
3 if options['selector']:
4 f_sels = theano.function(inps, opt_outs['selector'], name='f_sels')

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function.pyc in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
264 allow_input_downcast=allow_input_downcast,
265 on_unused_input=on_unused_input,
--> 266 profile=profile)
267 # We need to add the flag check_aliased inputs if we have any mutable or
268 # borrowed used defined inputs

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/pfunc.pyc in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
509 return orig_function(inputs, cloned_outputs, mode,
510 accept_inplace=accept_inplace, name=name, profile=profile,
--> 511 on_unused_input=on_unused_input)
512
513

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.pyc in orig_function(inputs, outputs, mode, accept_inplace, name, profile, on_unused_input)
1463 accept_inplace=accept_inplace,
1464 profile=profile,
-> 1465 on_unused_input=on_unused_input).create(
1466 defaults)
1467

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.pyc in __init__(self, inputs, outputs, mode, accept_inplace, function_builder, profile, on_unused_input, fgraph)
1131 need_opt = True
1132 # make the fgraph (copies the graph, creates NEW INPUT AND OUTPUT VARIABLES)
-> 1133 fgraph, additional_outputs = std_fgraph(inputs, outputs, accept_inplace)
1134 fgraph.profile = profile
1135 else:

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.pyc in std_fgraph(input_specs, output_specs, accept_inplace)
139 orig_outputs = [spec.variable for spec in output_specs] + updates
140
--> 141 fgraph = gof.fg.FunctionGraph(orig_inputs, orig_outputs)
142
143 for node in fgraph.apply_nodes:

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/gof/fg.pyc in __init__(self, inputs, outputs, features, clone)
133 self.variables.add(input)
134
--> 135 self.__import_r__(outputs, reason="init")
136 for i, output in enumerate(outputs):
137 output.clients.append(('output', i))

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/gof/fg.pyc in __import_r__(self, variables, reason)
255 for apply_node in [r.owner for r in variables if r.owner is not None]:
256 if apply_node not in self.apply_nodes:
--> 257 self.__import__(apply_node, reason=reason)
258 for r in variables:
259 if r.owner is None and not isinstance(r, graph.Constant) and r not in self.inputs:

/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/gof/fg.pyc in __import__(self, apply_node, check, reason)
360 "for more information on this error."
361 % str(node)),
--> 362 r)
363
364 for node in new_nodes:

MissingInputError: ("An input of the graph, used to compute dot(<TensorType(float32, matrix)>, decoder_Wd_att), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", <TensorType(float32, matrix)>)

I started IPython with the Theano flags "THEANO_FLAGS=device=gpu,floatX=float32 ipython notebook".

Did I make a mistake?

Thank you.

Feature extraction

Hello, thank you for sharing this great project.

I would like to run the code to generate captions for my own images. To that end, I tried to extract image feature vectors with the VGG model (as in the paper). I used the code below to extract the feature vector (the values of conv5_3).

input_img = numpy.array( caffe.io.load_image(path_img) ) #loading images HxWx3 (RGB)
caffe_input = numpy.array( preprocess_image(input_img) ) #preprocess the images

caffe_net.blobs['data'].reshape(1,3,224,224)
caffe_net.blobs['data'].data[...] = caffe_input

out = caffe_net.forward()

feature = numpy.array(caffe_net.blobs['conv5_3'].data.reshape(1,512,14,14)) #read the activations through .data

How can I get the context vector for the function 'capgen.gen_sample'?
I am not very experienced with Python and Caffe. I want to know how to get 'context' from the VGG model.

Thank you.
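The paper describes the encoder output as 196 annotation vectors of dimension 512, taken from the 14 x 14 x 512 conv map. A minimal NumPy sketch of that reshaping follows. It uses a made-up array in place of real conv5_3 activations, and the exact layout and row ordering that capgen.gen_sample expects is an assumption here.

```python
import numpy as np

def conv_map_to_context(feature):
    """Turn a (1, 512, 14, 14) conv map into (196, 512) annotation vectors."""
    channels_first = feature[0]                          # (512, 14, 14)
    channels_last = channels_first.transpose(1, 2, 0)    # (14, 14, 512)
    # one row per spatial location, one column per channel
    return channels_last.reshape(-1, channels_last.shape[-1])

# hypothetical stand-in for the conv5_3 activations of one image
feature = np.arange(512 * 14 * 14, dtype=np.float32).reshape(1, 512, 14, 14)
context = conv_map_to_context(feature)
```

The transpose matters: reshaping (512, 14, 14) straight to (196, 512) would mix channels and locations instead of giving one 512-dimensional vector per location.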

cannot replicate the result in the paper?

I ran the default configuration, and there seems to be a moderate gap from the results reported in the paper.

Dataset Format and Flow of Code

I want to run this on my own dataset. Does it need a specific format? How should I prepare the dataset? What is the flow of this code?

Does anybody know of a TensorFlow implementation of this code? It should include beam search as well.

How to get started and how the convolution works

Hi, Kelvin

I want to run your code, but I don't know how to start.

I downloaded a corpus and got the file "dataset.json". Is that what I need?

Also, I want to know how the convolution in the "encoder" works. Can you give me an example?

For example, given a 3-D matrix (100 x 200 x 3), how should I set my "filter (extract)": 3-D, 2-D or something else? And what would the result be?

Best wishes

"NaN detected" when using 'attn_type': 'stochastic'

When I set attn_type to deterministic, training works fine. When I set it to stochastic, I get a 'NaN detected' error and training halts.
Here is an example output:

$ python evaluate_coco.py 
Using gpu device 0: GeForce GTX TITAN Black
Using the following parameters:
{'lrate': 0.01, 'decay_c': 0.0, 'patience': 10, 'save_per_epoch': False, 'n_layers_init': 2, 'RL_sumCost': True, 'max_epochs': 5000, 'dispFreq': 1, 'attn_type': 'stochastic', 'alpha_c': 1.0, 'temperature': 1.0, 'n_layers_att': 2, 'saveto': 'my_caption_model.npz', 'ctx_dim': 512, 'valid_batch_size': 64, 'lstm_encoder': False, 'n_layers_lstm': 1, 'optimizer': 'adam', 'validFreq': 2000, 'dictionary': None, 'batch_size': 64, 'selector': True, 'n_words': 10000, 'dataset': 'coco', 'use_dropout_lstm': False, 'prev2out': True, 'dim': 1800, 'use_dropout': True, 'dim_word': 512, 'sampleFreq': 250, 'semi_sampling_p': 0.5, 'n_layers_out': 1, 'saveFreq': 1000, 'maxlen': 100, 'alpha_entropy_c': 0.002, 'ctx2out': True, 'reload_': False}
Loading data
... loading data
Building model
Buliding sampler
Building f_init... Done
Optimization
Epoch  0
Epoch  0 Update  1 Cost  527.620910645 PD  0.000997066497803 UD  0.72459602356
NaN detected
Traceback (most recent call last):
  File "evaluate_coco.py", line 81, in <module>
    main(defaults)
  File "evaluate_coco.py", line 49, in main
    print "Final cost: {:.2f}".format(validerr.mean())
AttributeError: 'float' object has no attribute 'mean'

I am using theano version '0.6.0.dev-8e85dbabd78c3932997aaf840832a1bb5c5835b3'

KeyError: 'A'

When I run prepare_data() in flickr30k.py, it always reports a KeyError: 'A'.
Does this mean that my dictionary.pkl is incomplete?
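A KeyError: 'A' from a dictionary lookup means the token 'A' is not a key. One possible cause, an assumption since the issue does not show the preprocessing, is that the dictionary was built from lowercased captions while the captions being encoded are not. The sketch below uses a toy dictionary and a hypothetical helper. It falls back to index 1 for unknown words, the same fallback value the lookup code quoted in another issue uses for out-of-range ids.

```python
UNK = 1  # assumption: index 1 is the unknown-word id, as in the quoted lookup code

def encode_caption(caption, worddict, n_words):
    """Map tokens to ids, sending unseen or out-of-range words to UNK."""
    ids = []
    for w in caption.split():
        idx = worddict.get(w, UNK)          # no KeyError for missing tokens
        ids.append(idx if idx < n_words else UNK)
    return ids

worddict = {'a': 2, 'man': 3, 'rides': 4, 'skateboard': 5}   # toy dictionary

raw_ids = encode_caption('A man rides a skateboard', worddict, n_words=10000)
lower_ids = encode_caption('A man rides a skateboard'.lower(), worddict, n_words=10000)
```

Comparing raw_ids with lower_ids shows whether case alone explains the missing key. If many words still map to UNK after lowercasing, the dictionary itself is likely incomplete.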

Split problem

I found that the test/val split is different from the one given in karpathy/neuraltalk2.
According to their script
https://github.com/karpathy/neuraltalk2/blob/master/coco/coco_preprocess.ipynb

they take the first 5000 images as val and images 5000-10000 as test from this dataset http://msvocds.blob.core.windows.net/annotations-1-0-3/captions_train-val2014.zip. But when I print the filenames, they are totally different. Did I miss something?
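The positional split described above can be sketched in a few lines. The helper and file names are hypothetical. The point, offered as one possible explanation rather than a diagnosis, is that slicing by position depends entirely on the order of the list, so a differently ordered annotation file gives a different split.

```python
def split_val_test(images, n_val=5000, n_test=5000):
    """Slice an ordered list into (val, test, rest) by position."""
    val = images[:n_val]
    test = images[n_val:n_val + n_test]
    rest = images[n_val + n_test:]
    return val, test, rest

# hypothetical file names; a real run would use the order of the annotation file
images = ['img_%05d.jpg' % i for i in range(12000)]
val, test, rest = split_val_test(images)

# the same images in a different order give a different split
val_reversed, _, _ = split_val_test(images[::-1])
```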

How soon should the sanity check start producing sensible output?

I'm running the capgen.py, but the results of the sanity check don't look meaningful. Basically, the five generated captions for each image are all null or "a". How soon can I see if it works? I just want it to show any random words, not null or 'a'. Or do I have to be patient and wait for a few hours until it generates meaningful captions?

f_init = theano.function([ctx], [ctx]+init_state+init_memory, name='f_init', profile=False)

Excuse me, I don't quite understand what the output [ctx]+init_state+init_memory means in this theano.function f_init. Also, in rval = f_init(ctx0), ctx0 = rval[0], next_state.append(rval[1+lidx]), what exactly does f_init return? As far as I can tell, init_state and init_memory are all we want. Why do we bother to add them up? Thank you.
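In Python, + between lists concatenates them, so [ctx]+init_state+init_memory is one flat list of outputs rather than a numeric sum. Here is a plain-Python sketch with placeholder strings standing in for the Theano variables. The index arithmetic for the memory entries is inferred from the list order, not copied from the repository.

```python
n_layers = 2

ctx = 'ctx'                                             # placeholder for the context
init_state = ['state_%d' % l for l in range(n_layers)]
init_memory = ['memory_%d' % l for l in range(n_layers)]

# '+' on Python lists concatenates; nothing is summed numerically
outputs = [ctx] + init_state + init_memory

# unpacking in the same order the outputs were listed
ctx0 = outputs[0]
next_state = [outputs[1 + lidx] for lidx in range(n_layers)]
next_memory = [outputs[1 + n_layers + lidx] for lidx in range(n_layers)]
```

Read this way, rval[0] is the context and rval[1+lidx] is the initial state of layer lidx, which matches the unpacking quoted in the question.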

Trained Models

I am not sure this is the right place to ask, but have you published any pretrained models for any dataset?

How to set decay_c when training the net?

The initial value of decay_c is 0. How should this value be set when using L2 regularization?
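As background for choosing a value, L2 regularization usually adds decay_c times the sum of squared weights to the cost. The sketch below assumes that standard form. Whether the repository applies decay_c in exactly this way is an assumption, and the parameter names and values are made up.

```python
import numpy as np

def l2_penalty(params, decay_c):
    """decay_c * sum of squared entries over all parameter arrays."""
    return decay_c * sum(float((p ** 2).sum()) for p in params.values())

# toy parameters standing in for the model's weights
params = {'W': np.array([[1.0, 2.0], [3.0, 4.0]]), 'b': np.array([0.5])}

data_cost = 2.0                                    # hypothetical unregularized cost
total_cost = data_cost + l2_penalty(params, decay_c=1e-4)
```

With decay_c = 0 the penalty vanishes, which corresponds to the default. A small positive value is typically tuned on the validation set.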

Datasets in .pkl format?

Hello, thank you for sharing this great project.

I would like to run the code, but it seems the project does not contain the datasets used. Even though I can get the Flickr or COCO datasets, I do not know how the data is preprocessed into those .pkl files.

Can I possibly get the data as it is used in the project?

Thank you.

Step by step readme file

Hi Kelvin, I am studying image and video captioning and found your paper, which is very well explained. I want to try your model, but I am confused because the README file doesn't give much guidance. Can you please explain how to get started with your model, e.g. which file is for feature extraction, caption generation, training, evaluation, etc.?

Kind Regards

cannot figure out the code

I understand the attention model for image captioning, but I still cannot figure out how it is set up in Theano. The code is too hard for me. Does anyone else have similar problems?

RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility

/Users/Calvin/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *

Is this due to my version of NumPy? I googled it but couldn't find a good answer, though someone said it's harmless.

Where can I get "model_name.npz" to run the Jupyter Notebook example?

Can you please provide a pretrained model on any of the three datasets (coco, flickr8k, flickr30k)? Or, at least, an indication on where to find it?
I was trying to replicate the experiments in the Jupyter Notebook

argument 'model' in the 'main function' (generate_caps.py)

What input should be given to model?
Is there a model file for this argument, or do I have to create one?

metrics.py reference format?

The metrics.py script requires a path for the hypothesis files and one for the reference files.
I assume the hypothesis file is the one generated by generate_caps.py, but can you please explain the format of the reference files, since there are around 5 sentences per image?
Thank you!
