arctic-captions's Issues
train error!
When I train the model, I get this error:
14 for cc in caps:
15 seqs.append([worddict[w] if worddict[w] < n_words else 1 for w in cc[0].split()])
---> 16 feat_list.append(features[cc[1]])
17
18 lengths = [len(s) for s in seqs]
TypeError: 'coo_matrix' object does not support indexing
Why not fine-tune the CNN?
Hey, I have noticed that CNN-LSTM models can benefit a lot from fine-tuning the CNN. Why doesn't the code fine-tune the CNN?
Cannot figure out the code
I understand the attention model for image captioning, but I still cannot figure out how it is set up in Theano. The code is too hard for me. Does anyone else have similar problems?
About platform
Can this code be built on Windows with Theano installed? I have a problem when building in VS2013: cannot open file "cublas.lib". However, I have already installed Theano, Anaconda and CUDA.
How to run the code and get started?
No re-ranking?
I know that re-ranking gives a higher score (for m-RNN, a gain of about 3 points in BLEU-4 and about 9 points in CIDEr).
I'm wondering why this work does not use re-ranking as other works do. Is there no post-processing either?
How to set decay_c when training the net?
The initial value of decay_c is 0. How should this value be set when using L2 regularization?
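A minimal sketch of what an L2 penalty scaled by decay_c usually looks like, assuming the penalty is the sum of squared parameters added to the cost. The function name l2_penalty and the toy parameter values are hypothetical and not taken from the repository; plain Python lists stand in for Theano shared variables.

```python
def l2_penalty(params, decay_c):
    """Return decay_c * sum of squared parameter values.

    params: dict mapping a name to a flat list of floats (stand-in for weights).
    decay_c: non-negative regularization strength; 0 disables the penalty.
    """
    if decay_c <= 0.0:
        return 0.0
    total = 0.0
    for values in params.values():
        total += sum(v * v for v in values)
    return decay_c * total


# Toy parameters, for illustration only.
toy_params = {"W": [1.0, -2.0], "b": [0.5]}
base_cost = 3.0
cost_no_decay = base_cost + l2_penalty(toy_params, 0.0)
cost_with_decay = base_cost + l2_penalty(toy_params, 1e-2)
```

Under this assumption, decay_c = 0 leaves the cost unchanged, and any positive value has to be tuned on validation data; small values are the usual starting point.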
RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
/Users/Calvin/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility from scan_perform.scan_perform import *
Is this due to my version of numpy? I googled it but couldn't find a good answer, though someone said it is harmless.
bias term has been added twice in LSTM layer
Questions or bugs in the adam optimizer
From lines 84, 85 and 97, 98 of optimizers.py, we can see that b1 and b2 here correspond to '1-b1' and '1-b2' respectively in the original Adam paper, i.e., 'Adam: A Method for Stochastic Optimization', Kingma et al. (ICLR 2015). However, I am confused by lines 90, 91.
I think the code should instead be:
fix1 = 1. - (1-b1)**(i_t)
fix2 = 1. - (1-b2)**(i_t)
because b1 and b2 should be replaced by '1-b1' and '1-b2' consistently throughout the implementation.
I wonder how the authors used the Adam optimizer in the MSCOCO experiments.
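To make the two conventions concrete, here is a minimal scalar sketch of Adam in the paper's convention, where beta1 and beta2 are the decay rates and the bias corrections are 1 - beta1^t and 1 - beta2^t. It is not the repository's optimizers.py; the function adam_step and its default values are illustrative. If the code's b1 and b2 really play the role of 1 - beta1 and 1 - beta2, as described above, then the corrections written in the code's variables would be 1 - (1 - b1)**t and 1 - (1 - b2)**t.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.0002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (paper convention).

    t is the 1-based step count. Returns (new_param, new_m, new_v).
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                # bias-corrected second moment
    new_param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return new_param, m, v


# Code-style convention discussed in the issue: b1 = 1 - beta1, b2 = 1 - beta2.
# The numeric values are illustrative, chosen to match the defaults above.
b1, b2 = 0.1, 0.001
t = 1
fix1 = 1.0 - (1.0 - b1) ** t   # equals 1 - beta1**t
fix2 = 1.0 - (1.0 - b2) ** t   # equals 1 - beta2**t

p1, m1, v1 = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

At the first step the bias-corrected moment equals the gradient itself, so the parameter moves by roughly lr regardless of the gradient's scale.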
A bug in setting Adam optimizer learning rate?
Hi,
Looks like the Adam optimizer learning rate is not affected by the function input and is fixed to
lr0=0.0002
https://github.com/kelvinxu/arctic-captions/blob/master/optimizers.py#L83
Is this an intended behavior?
Thank you
'sample = empty string' when trying to train flickr_30k
Thanks for sharing the implementation. Does anybody know why, when training, every sample is an empty string? Here is the debugger output:
Epoch 1
... Truth 0 : UNK UNK with UNK white shirt aims UNK dart at an UNK target as several other people holding darts look on beside her UNK
Sample ( 0 ) 0 :
... Truth 1 : UNK man UNK along with three others UNK is standing in snowy and cold weather with only green shoes and black shorts on UNK
Sample ( 0 ) 1 :
... Truth 2 : bald man in what looks like to be UNK lab with UNK lady watching as he is mixing some liquids in clear cup containers
Sample ( 0 ) 2 :
... Truth 3 : UNK UNK in UNK big purple hat and UNK long blue coat and UNK man with UNK satchel spending time in the city UNK
Sample ( 0 ) 3 :
... Truth 4 : UNK man rides UNK skateboard down the railing of UNK staircase in front of UNK closed storefront UNK with three people watching him UNK
Sample ( 0 ) 4 :
... Truth 5 : UNK group of dancers dressed in red and wearing short white tutus conversing UNK while UNK single dancer is practicing in the background UNK
Sample ( 0 ) 5 :
... Truth 6 : UNK on UNK baseball field UNK one baseball player is sliding into UNK base while UNK player from the opposing team is jumping UNK
Sample ( 0 ) 6 :
... Truth 7 : UNK female acrobat with long UNK blond curly hair UNK dangling upside down while suspending herself from long UNK red ribbons of fabric UNK
Sample ( 0 ) 7 :
... Truth 8 : UNK man with UNK backpack is walking down UNK street while UNK man in an orange shirt pushes UNK cart the opposite direction UNK
Sample ( 0 ) 8 :
... Truth 9 : UNK football game is going on UNK with player on the field UNK squatting on the sidelines and standing UNK watching the game UNK
Sample ( 0 ) 9 :
I extracted the CNN features of the images as illustrated here: #1
Post your evaluation score
Hello everyone,
I got the following scores after running on COCO.
{'CIDEr': 0.50350648251818364, 'Bleu_4': 0.20037826460154334, 'Bleu_3': 0.2920434703847389, 'Bleu_2': 0.42775646056296673, 'Bleu_1': 0.6105274018537202, 'ROUGE_L': 0.43556281782994649, 'METEOR': 0.23890246684760072}
So METEOR is almost the same. However, my BLEU scores are 7~8% lower than in the paper. I wonder whether this is acceptable or whether something is wrong in my process.
Would you please share your results in this post?
Thanks.
Question about sampling in stochastic attention
Hello, while analyzing the source code, I found the process of obtaining alpha_sample in stochastic hard attention quite unclear, mainly because of the variable 'h_sampling_mask'.
The sampling part of the code is (in capgen.py, line 409):
alpha_sample = h_sampling_mask * trng.multinomial(pvals=alpha,dtype=theano.config.floatX)
+ (1.-h_sampling_mask) * alpha
When h_sampling_mask is 1, alpha_sample would be the sampling result of the multinomial distribution.
When h_sampling_mask is 0, however, alpha_sample would be simply alpha.
I thought that, according to the paper, alpha_sample should simply be
alpha_sample = trng.multinomial(pvals=alpha,dtype=theano.config.floatX)
which is equivalent to setting h_sampling_mask to 1.
Why is "h_sampling_mask" needed?
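One possible reading, offered as an assumption rather than the authors' answer: the mask implements the variance-reduction trick mentioned in the paper, where the sampled location is replaced by its expected value alpha with some probability (the semi_sampling_p option in the training parameters suggests 0.5). A minimal pure-Python sketch of that mixing, with the hypothetical helper names sample_one_hot and mix_attention:

```python
import random


def sample_one_hot(alpha, rng):
    """Draw one location from the categorical distribution alpha, as a one-hot list."""
    r = rng.random()
    cumulative = 0.0
    chosen = len(alpha) - 1          # fallback guards against rounding error
    for i, p in enumerate(alpha):
        cumulative += p
        if r < cumulative:
            chosen = i
            break
    return [1.0 if i == chosen else 0.0 for i in range(len(alpha))]


def mix_attention(alpha, h_sampling_mask, rng):
    """mask == 1: hard one-hot sample; mask == 0: the soft weights alpha."""
    sample = sample_one_hot(alpha, rng)
    return [h_sampling_mask * s + (1.0 - h_sampling_mask) * a
            for s, a in zip(sample, alpha)]


rng = random.Random(0)
alpha = [0.1, 0.6, 0.3]
semi_sampling_p = 0.5                      # assumed probability of sampling
mask = 1.0 if rng.random() < semi_sampling_p else 0.0
alpha_sample = mix_attention(alpha, mask, rng)
hard = mix_attention(alpha, 1.0, rng)      # always a one-hot vector
soft = mix_attention(alpha, 0.0, rng)      # always equal to alpha
```

Under this reading, setting the mask to 1 everywhere recovers pure hard attention, while the mixed version trades some bias for lower gradient variance.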
Feature extraction
Hello, thank you for sharing this great project.
I would like to run the code to generate captions for my own images. To that end, I tried to extract the image feature vector with the VGG model (based on the paper). I used the code below to extract the feature vector (values of conv5_3).
input_img = numpy.array( caffe.io.load_image(path_img) ) #loading images HxWx3 (RGB)
caffe_input = numpy.array( preprocess_image(input_img) ) #preprocess the images
caffe_net.blobs['data'].reshape(1,3,224,224)
caffe_net.blobs['data'].data[...] = caffe_input
out = caffe_net.forward()
feature = numpy.array(caffe_net.blobs['conv5_3'].reshape(1,512,14,14))
How can I get the context vector for the function 'capgen.gen_sample'?
I am not good at Python and Caffe. I want to know how to get 'context' from the VGG model.
Thank you.
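A minimal NumPy sketch of turning a conv5_3 activation into a context matrix, under two assumptions that are not confirmed by the repository: the activations are read from the blob's data array (shape 1 x 512 x 14 x 14) rather than from the blob object itself, and the captioning code expects one 512-dimensional vector per spatial location, i.e. a 196 x 512 array. Random numbers stand in for real activations, and the name conv_to_context is made up for illustration.

```python
import numpy as np


def conv_to_context(feature_map):
    """Reshape (1, C, H, W) conv activations into (H*W, C) annotation vectors."""
    _, channels, height, width = feature_map.shape
    # (1, C, H, W) -> (C, H*W) -> (H*W, C): one row per spatial location.
    flat = feature_map.reshape(channels, height * width)
    return flat.T.astype(np.float32)


# Stand-in for caffe_net.blobs['conv5_3'].data after a forward pass.
fake_conv5_3 = np.random.RandomState(0).rand(1, 512, 14, 14)
context = conv_to_context(fake_conv5_3)
```

Row y * 14 + x of the result holds the 512 channel values at spatial position (y, x).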
question about "doubly stochastic attention"
As I'm reading the paper, I don't understand why, for the soft attention version, we encourage $\sum_{t} a_{ti} \approx 1$. I feel $C/L$ would be more appropriate, since $\sum_{t,i} a_{ti} = C$.
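The counting argument behind this question can be written out as follows, assuming C denotes the number of decoding steps and L the number of annotation locations (L = 196 for a 14 x 14 feature map). This is a sketch of the reasoning, not the authors' answer.

```latex
% Each step's attention weights are a softmax over the L locations:
\sum_{i=1}^{L} a_{ti} = 1 \quad \text{for every } t = 1, \dots, C
% Summing over all steps gives the total attention mass:
\sum_{t=1}^{C} \sum_{i=1}^{L} a_{ti} = C
% If that mass were spread evenly, each location would receive:
\sum_{t=1}^{C} a_{ti} = \frac{C}{L}
% so the constraint \sum_{t} a_{ti} \approx 1 matches this only when C \approx L.
```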
Instruction to visualize hard attention
Hi, thanks for sharing the code. I expected alpha_visualization.ipynb to visualize hard attention just as the paper shows. But I trained a stochastic model and visualized it, and the result looks like the soft one. Has anyone successfully visualized hard attention, and could you give me some instructions on how to do it? Thanks. :)
Bug in doubly stochastic attention?
It seems that when computing the doubly stochastic attention, the code does:
alpha_reg = alpha_c * ((1.-alphas.sum(0))**2).sum(0).mean()
As I understand it, alphas has dimensions [sequence_length, batch_size, feature_map_spatial_extent], where the last dimension for VGG conv5 would be 14 x 14 = 196.
This means that we are averaging over the 196 spatial locations as opposed to averaging over the minibatch. Is this the expected behavior?
Any clarification on this would be great!
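A small NumPy check of which axes the quoted expression reduces over, assuming alphas really has shape [sequence_length, batch_size, locations] as stated above. The array sizes and values are arbitrary, and alpha_c is set to 1 for illustration.

```python
import numpy as np

T, B, L = 5, 4, 196                       # steps, batch size, 14 * 14 locations
rng = np.random.RandomState(0)
alphas = rng.rand(T, B, L)
alphas /= alphas.sum(axis=2, keepdims=True)   # each step's weights sum to 1

alpha_c = 1.0
per_location = (1.0 - alphas.sum(0)) ** 2     # shape (B, L): time axis removed
summed_over_batch = per_location.sum(0)       # shape (L,): batch axis removed
alpha_reg = alpha_c * summed_over_batch.mean()   # scalar: mean over locations

# Alternative reading: sum over locations, then average over the minibatch.
alpha_reg_batch_mean = alpha_c * per_location.sum(1).mean()
```

In this sketch the two variants differ only by the constant factor L / B, so the choice rescales the regularizer (and therefore the effective alpha_c) rather than changing what is penalized.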
Typo in the arXiv & ICML paper
From the code and Pascanu et al., I think that Eqn 7 in the arXiv version (or Eqn 2 in the ICML version) should add a tanh after L_0.
And is a y_t missing in the same equation as well? See Bahdanau et al., Appendix, page 14: p(y_i | s_i, y_{i-1}, c_i)
Can I get pkl files?
I would like to have 'coco_align.train.pkl' and other pkl files.
I tried
https://github.com/intuinno/arctic-captions
https://github.com/rowanz/arctic-captions
https://github.com/Lorne0/arctic-captions
but none of them worked. Please help me.
question about grads of alphas in hard Attention
Hello, I feel really confused about the gradients of alphas in hard attention. The source code is at line 1199:
known_grads={alphas:opt_outs['masked_cost'][:,:,None]/10.*
(alphas_sample/alphas) + alpha_entropy_c*(tensor.log(alphas) + 1)})
Can anyone explain this to me, please?
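One way to read the two terms, offered as an interpretation and not as the authors' explanation: the first looks like a REINFORCE-style score-function term and the second like the derivative of a negative-entropy penalty. Writing s for the one-hot alphas_sample, alpha for alphas, and lambda_H for alpha_entropy_c:

```latex
% Log-probability of the sampled location under the categorical distribution:
\log p(s \mid \alpha) = \sum_{i} s_i \log \alpha_i
\quad\Rightarrow\quad
\frac{\partial \log p(s \mid \alpha)}{\partial \alpha_i} = \frac{s_i}{\alpha_i}
% Negative entropy and its derivative:
-H(\alpha) = \sum_{i} \alpha_i \log \alpha_i
\quad\Rightarrow\quad
\frac{\partial \left(-H(\alpha)\right)}{\partial \alpha_i} = \log \alpha_i + 1
% Combined, with the cost scaled by 1/10 as in the quoted line:
g_i = \frac{\text{cost}}{10} \cdot \frac{s_i}{\alpha_i} + \lambda_H \left( \log \alpha_i + 1 \right)
```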
Dataset Format and Flow of Code
I want to run this on my own dataset. Does it need a specific format? How do I prepare the dataset? What is the flow of this code?
Does anybody know of a TensorFlow implementation of this code? It must have beam search as well.
Where is model_name.npz?
Is there any tutorial for the code? I can hardly understand all these files.
I ran the ipynb cells and can't find model_name.npz. It seems dev_list and image_path also need to be specified.
datasets = {'flickr8k': (flickr8k.load_data, flickr8k.prepare_data),
            'flickr30k': (flickr30k.load_data, flickr30k.prepare_data),
            'coco': (coco.load_data, coco.prepare_data)}
# location of the model file, the pkl file should be named "model_name.npz.pkl"
model= 'model_name.npz'
# location of the devset split file like the ones in /splits
dev_list = './splits/coco_val.txt'
image_path = './path_to_coco_dev_image/'
# load model model_options
with open('%s.pkl'%model, 'rb') as f:
    options = pkl.load(f)
print 'Loading: ' + options['dataset']
flist = []
with open(dev_list, 'r') as f:
    for l in f:
        flist.append(l.strip())
MemoryError in sampling process (gen_model in generate_caps.py)
My system configuration is an i5-4440k with a 4GB GTX 960. When I run generate_caps.py for the flickr30k version, I get an error like this:
loading data ...
Error when trying to find the memory information on the GPU: initialization error
Error allocating 4000000 bytes of device memory (initialization error). Driver report 0 bytes free and 0 bytes total
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "generate_caps.py", line 30, in gen_model
tparams = init_tparams(params)
File "/home/f0z/sat/arctic-captions/capgen.py", line 94, in init_tparams
k2 = theano.shared(params[params.keys()[0]], name=params.keys()[0], borrow=True)
File "/home/f0z/pyenvs/DL/local/lib/python2.7/site-packages/theano/compile/sharedvalue.py", line 208, in shared
allow_downcast=allow_downcast, **kwargs)
File "/home/f0z/pyenvs/DL/local/lib/python2.7/site-packages/theano/sandbox/cuda/var.py", line 203, in float32_shared_constructor
deviceval = type_support_filter(value, type.broadcastable, False, None)
MemoryError: ('Error allocating 4000000 bytes of device memory (initialization error).', "you might consider using 'theano.shared(..., borrow=True)'")
I've run nvidia-smi and there is plenty of memory (~3.5GB, and I'm only loading the dev dataset). There are no permission issues either. Since the error occurs when declaring shared variables on the GPU from within a process, I'm guessing that gen_model is unable to allocate GPU memory from a child process. Can anyone please suggest a workaround?
Reproduce result
Hi Kelvin,
After training the model for 2000 epochs, I tried to use generate_caps.py to get the captions. However, the results were meaningless. Is there anything I need to take care of when using generate_caps.py?
Thanks
Sample script to train the model
Hi, do you mind sharing some sample scripts or command lines to train the model?
"NaN detected" when using 'attn_type': 'stochastic'
When I set attn_type to deterministic, training works fine; when I set it to stochastic, I get a 'NaN detected' error and training halts.
Here is an example output:
$ python evaluate_coco.py
Using gpu device 0: GeForce GTX TITAN Black
Using the following parameters:
{'lrate': 0.01, 'decay_c': 0.0, 'patience': 10, 'save_per_epoch': False, 'n_layers_init': 2, 'RL_sumCost': True, 'max_epochs': 5000, 'dispFreq': 1, 'attn_type': 'stochastic', 'alpha_c': 1.0, 'temperature': 1.0, 'n_layers_att': 2, 'saveto': 'my_caption_model.npz', 'ctx_dim': 512, 'valid_batch_size': 64, 'lstm_encoder': False, 'n_layers_lstm': 1, 'optimizer': 'adam', 'validFreq': 2000, 'dictionary': None, 'batch_size': 64, 'selector': True, 'n_words': 10000, 'dataset': 'coco', 'use_dropout_lstm': False, 'prev2out': True, 'dim': 1800, 'use_dropout': True, 'dim_word': 512, 'sampleFreq': 250, 'semi_sampling_p': 0.5, 'n_layers_out': 1, 'saveFreq': 1000, 'maxlen': 100, 'alpha_entropy_c': 0.002, 'ctx2out': True, 'reload_': False}
Loading data
... loading data
Building model
Buliding sampler
Building f_init... Done
Optimization
Epoch 0
Epoch 0 Update 1 Cost 527.620910645 PD 0.000997066497803 UD 0.72459602356
NaN detected
Traceback (most recent call last):
File "evaluate_coco.py", line 81, in <module>
main(defaults)
File "evaluate_coco.py", line 49, in main
print "Final cost: {:.2f}".format(validerr.mean())
AttributeError: 'float' object has no attribute 'mean'
I am using theano version '0.6.0.dev-8e85dbabd78c3932997aaf840832a1bb5c5835b3'
Training parameters for Flickr30k, coco dataset?
Hello, thank you for sharing.
The shared source contains training parameters for the Flickr8k dataset.
Can I use the same parameters for the Flickr30k and COCO datasets?
Thank you ^^;
How soon should the sanity check start to look reasonable?
I'm running capgen.py, but the results of the sanity check don't look meaningful. The five generated captions for each image are all empty or "a". How soon can I tell whether it is working? I would expect at least some random words, not empty strings or 'a'. Or do I have to be patient and wait a few hours until it generates meaningful captions?
MissingInputError from theano.function
Hello, thank you for sharing.
I tried to generate a caption for my own image. To that end, I followed your code and created a model file with the capgen.train function (using Flickr8k data and default parameters).
Based on that model, I tried to generate a caption. However, I got the following error:
MissingInputError Traceback (most recent call last)
in ()
1 # get the alphas and selector value [called \beta in the paper]
----> 2 f_alpha = theano.function(inps, alphas, name='f_alpha')
3 if options['selector']:
4 f_sels = theano.function(inps, opt_outs['selector'], name='f_sels')
/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function.pyc in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
264 allow_input_downcast=allow_input_downcast,
265 on_unused_input=on_unused_input,
--> 266 profile=profile)
267 # We need to add the flag check_aliased inputs if we have any mutable or
268 # borrowed used defined inputs
/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/pfunc.pyc in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
509 return orig_function(inputs, cloned_outputs, mode,
510 accept_inplace=accept_inplace, name=name, profile=profile,
--> 511 on_unused_input=on_unused_input)
512
513
/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.pyc in orig_function(inputs, outputs, mode, accept_inplace, name, profile, on_unused_input)
1463 accept_inplace=accept_inplace,
1464 profile=profile,
-> 1465 on_unused_input=on_unused_input).create(
1466 defaults)
1467
/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.pyc in __init__(self, inputs, outputs, mode, accept_inplace, function_builder, profile, on_unused_input, fgraph)
1131 need_opt = True
1132 # make the fgraph (copies the graph, creates NEW INPUT AND OUTPUT VARIABLES)
-> 1133 fgraph, additional_outputs = std_fgraph(inputs, outputs, accept_inplace)
1134 fgraph.profile = profile
1135 else:
/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/compile/function_module.pyc in std_fgraph(input_specs, output_specs, accept_inplace)
139 orig_outputs = [spec.variable for spec in output_specs] + updates
140
--> 141 fgraph = gof.fg.FunctionGraph(orig_inputs, orig_outputs)
142
143 for node in fgraph.apply_nodes:
/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/gof/fg.pyc in __init__(self, inputs, outputs, features, clone)
133 self.variables.add(input)
134
--> 135 self.__import_r__(outputs, reason="init")
136 for i, output in enumerate(outputs):
137 output.clients.append(('output', i))
/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/gof/fg.pyc in __import_r__(self, variables, reason)
255 for apply_node in [r.owner for r in variables if r.owner is not None]:
256 if apply_node not in self.apply_nodes:
--> 257 self.__import__(apply_node, reason=reason)
258 for r in variables:
259 if r.owner is None and not isinstance(r, graph.Constant) and r not in self.inputs:
/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg/theano/gof/fg.pyc in __import__(self, apply_node, check, reason)
360 "for more information on this error."
361 % str(node)),
--> 362 r)
363
364 for node in new_nodes:
MissingInputError: ("An input of the graph, used to compute dot(<TensorType(float32, matrix)>, decoder_Wd_att), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", <TensorType(float32, matrix)>)
I start ipython with the Theano flags "THEANO_FLAGS=device=gpu,floatX=float32 ipython notebook".
Did I make a mistake?
Thank you.
Step by step readme file
Hi Kelvin, I am studying image and video captioning and found your paper, which is very well explained. I want to try your model, but I am confused because the README file you provided doesn't really help. Can you please explain how to get started with your model, e.g. which file is for feature extraction, caption generation, training, evaluation, etc.?
Kind Regards
Datasets in .pkl format?
Hello, thank you for sharing this great project.
I would like to run the code, but it seems the project does not contain the datasets used. I can get the Flickr or COCO datasets, but I do not know how the data was preprocessed into those .pkl files.
Can I possibly get the data as it is used in the project?
Thank you.
metrics.py reference format?
The metrics.py script requires a path for the hypothesis files and one for the reference files.
I assume the hypothesis file is the one generated by generate_caps.py, but can you please explain the format of the reference files, since there are 5 sentences per image?
Thank you!
KeyError: 'A'
When I run prepare_data() in flickr30k.py, it always reports KeyError: 'A'.
Does it mean that my dictionary.pkl is incomplete?
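A KeyError of this kind is raised whenever a caption token is not a key of the dictionary. One possible cause, stated as an assumption since the dictionary itself is not shown, is that the dictionary was built from lower-cased captions while the captions being prepared still contain capitals. A minimal sketch of a tolerant lookup, where index 1 stands for UNK as in the list comprehension quoted in the "train error!" issue above; the helper caption_to_ids and the toy dictionary are hypothetical:

```python
def caption_to_ids(caption, worddict, n_words, unk_id=1):
    """Map a caption string to word ids, sending unknown or rare words to unk_id."""
    ids = []
    for token in caption.lower().split():
        idx = worddict.get(token, unk_id)     # no KeyError for unseen tokens
        ids.append(idx if idx < n_words else unk_id)
    return ids


# Toy dictionary for illustration; real ids come from dictionary.pkl.
toy_worddict = {"a": 2, "man": 3, "rides": 4, "skateboard": 5000}
ids = caption_to_ids("A man rides a skateboard quickly", toy_worddict, n_words=1000)
```

Here "skateboard" falls outside the n_words cutoff and "quickly" is missing from the dictionary, so both map to the UNK index instead of raising.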
Cannot replicate the result in the paper?
I ran the default configuration, and there seems to be a moderate gap from the result reported in the paper.
Instructions to re-train the model: where to start?
Are there step-by-step instructions on how to train the model on the COCO dataset? I cloned the repo and tried running capgen.py, but that didn't work.
How to get started and how the convolution works
Hi, Kelvin
I want to run your code, but I don't know how to start.
I downloaded a corpus and got the file "dataset.json". Is that what I need?
Also, I want to know how the convolution in the "encoder" works. Can you give me an example?
For example, given a 3-D matrix (100*200*3), how should I set my "filter (extract)": 3-D, 2-D, or something else? And what would the result be?
Best wishes
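As a general illustration of 2-D convolution over a multi-channel input (not a description of the encoder used in the paper), each filter spans all input channels, so a filter applied to, for example, a 100 x 200 x 3 image has shape k x k x 3 and produces one 2-D map. The helper below, whose name and defaults are made up for this sketch, computes the output size with the standard formula floor((n + 2*pad - k) / stride) + 1.

```python
def conv_output_shape(height, width, in_channels, kernel, n_filters, stride=1, pad=0):
    """Return (filter_shape, output_shape) for a square-kernel 2-D convolution."""
    out_h = (height + 2 * pad - kernel) // stride + 1
    out_w = (width + 2 * pad - kernel) // stride + 1
    filter_shape = (kernel, kernel, in_channels)   # one filter covers every channel
    output_shape = (out_h, out_w, n_filters)       # one 2-D map per filter
    return filter_shape, output_shape


# 100 x 200 RGB input, 64 filters of size 3 x 3, padding 1 keeps the spatial size.
filter_shape, output_shape = conv_output_shape(100, 200, 3, kernel=3, n_filters=64, pad=1)
```

So the filter is 3-D, each filter yields a 2-D map, and stacking the maps from all filters gives a 3-D output again.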
Trained Models
I am not sure this is the right place to ask, but have you published any pretrained models on any dataset?
TensorFlow implementation of this model?
Is there any TensorFlow code available for this model? I found one, but it lacks some core properties.
Argument 'model' in the main function (generate_caps.py)
What input should be given to model?
Is there a model file for this argument, or do I have to create one?
Generate captions for custom images
Hello!
I'd like to test your great framework by generating captions for my own set of images.
How can I do that without training (using any pretrained model)?
Where can I get "model_name.npz" to run the Jupyter Notebook example?
Can you please provide a pretrained model on any of the three datasets (coco, flickr8k, flickr30k)? Or, at least, an indication on where to find it?
I was trying to replicate the experiments in the Jupyter Notebook
Split problem
I found that the test/val split is different from the one given in karpathy/neuraltalk2.
According to their script
https://github.com/karpathy/neuraltalk2/blob/master/coco/coco_preprocess.ipynb
they take the first 5000 images as val and 5000-10000 as test from this dataset: http://msvocds.blob.core.windows.net/annotations-1-0-3/captions_train-val2014.zip. But when I output the filenames, I found they are totally different. Did I miss something?
f_init = theano.function([ctx], [ctx]+init_state+init_memory, name='f_init', profile=False)
Excuse me, I don't quite understand what the output [ctx]+init_state+init_memory means in this theano.function f_init. Also, in rval = f_init(ctx0), ctx0 = rval[0], next_state.append(rval[1+lidx]), what exactly is f_init? As far as I can tell, init_state and init_memory are just what we want. Why do we bother to add them up? Thank you.
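On the notation: in Python, the + between lists is concatenation, not numeric addition, so [ctx]+init_state+init_memory is one flat list of outputs. A minimal sketch with strings standing in for Theano variables; n_layers_lstm and the placeholder names are illustrative, and the index used for the memories simply follows from the concatenation order.

```python
n_layers_lstm = 2                      # illustrative value

ctx = "ctx"
init_state = ["state_%d" % i for i in range(n_layers_lstm)]
init_memory = ["memory_%d" % i for i in range(n_layers_lstm)]

# List concatenation: one context, then all states, then all memories.
outputs = [ctx] + init_state + init_memory

# A compiled function returns values in the same order, so they can be unpacked by position.
rval = list(outputs)                   # stand-in for rval = f_init(ctx0)
ctx0 = rval[0]
next_state = [rval[1 + lidx] for lidx in range(n_layers_lstm)]
next_memory = [rval[1 + n_layers_lstm + lidx] for lidx in range(n_layers_lstm)]
```

Read this way, f_init is a single compiled function that returns the context together with every layer's initial state and memory, and the caller slices the result by position.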
MemoryError
I tried to run it on GPU without changing any hyperparameters (only the paths). However, I got
MemoryError: Error allocating 25690112 bytes of device memory (out of memory).
at epoch 2.
I'm not sure whether I used the wrong command:
THEANO_FLAGS='mode=FAST_RUN,floatX=float32,device=gpu0' python evaluate_flickr30k.py
It shows CNMeM is disabled.
Then I tried:
THEANO_FLAGS='mode=FAST_RUN,floatX=float32,device=gpu0,lib.cnmem=1' python evaluate_flickr30k.py
However, when it runs gen_sample, the sample sentence is empty, which is really strange.
Or is something wrong with the preprocessing?
I made the four pkl files by following #1