
image_captioning's Introduction

Introduction

This neural system for image captioning is roughly based on the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" by Xu et al. (ICML 2015). The input is an image, and the output is a sentence describing its content. The system uses a convolutional neural network to extract visual features from the image and an LSTM recurrent neural network to decode these features into a sentence. A soft attention mechanism is incorporated to improve the quality of the captions. This project is implemented with the TensorFlow library and allows end-to-end training of both the CNN and RNN parts.
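
At each decoding step, the model attends over the spatial locations of the CNN feature map and feeds the attention-weighted visual context into the LSTM together with the previous word. The following is a minimal NumPy sketch of one soft-attention step; the weight matrices W_f, W_h and the vector w are stand-ins for the model's learned parameters, so this only illustrates the idea rather than the exact code in model.py.

    import numpy as np

    def soft_attention_step(features, hidden, W_f, W_h, w):
        """One soft-attention step.
        features: (L, D) conv feature map locations; hidden: (H,) LSTM state.
        W_f (D, K), W_h (H, K) and w (K,) are learned parameters (hypothetical here)."""
        scores = np.tanh(features @ W_f + hidden @ W_h) @ w   # (L,) attention logits
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                                  # softmax over the L locations
        context = alpha @ features                            # (D,) weighted visual context
        return context, alpha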

Prerequisites

Usage

  • Preparation: Download the COCO train2014 and val2014 data here. Put the COCO train2014 images in the folder train/images, and put the file captions_train2014.json in the folder train. Similarly, put the COCO val2014 images in the folder val/images and the file captions_val2014.json in the folder val. Furthermore, download the pretrained VGG16 net here or ResNet50 net here if you want to use one of them to initialize the CNN part. The resulting directory layout is sketched just below.
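
With the data and pretrained weights in place, the working directory is expected to look roughly like this (a sketch assembled from the paths used in the commands below; only the files explicitly mentioned in this README are shown):

    train/
        images/                     COCO train2014 images
        captions_train2014.json
    val/
        images/                     COCO val2014 images
        captions_val2014.json
    test/
        images/                     your own JPEG images for inference
    models/                         checkpoints are written here during training
    summary/                        TensorBoard event files
    vgg16_no_fc.npy                 pretrained VGG16 weights (or the ResNet50 file)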

  • Training: To train a model using the COCO train2014 data, first set up various parameters in the file config.py and then run a command like this:

python main.py --phase=train \
    --load_cnn \
    --cnn_model_file='./vgg16_no_fc.npy' \
    [--train_cnn]

Turn on --train_cnn if you want to jointly train the CNN and RNN parts. Otherwise, only the RNN part is trained. The checkpoints will be saved in the folder models. If you want to resume training from a checkpoint, run a command like this:

python main.py --phase=train \
    --load \
    --model_file='./models/xxxxxx.npy' \
    [--train_cnn]

To monitor the progress of training, run the following command:

tensorboard --logdir='./summary/'

  • Evaluation: To evaluate a trained model using the COCO val2014 data, run a command like this:

python main.py --phase=eval \
    --model_file='./models/xxxxxx.npy' \
    --beam_size=3

The result will be shown in stdout. Furthermore, the generated captions will be saved in the file val/results.json.
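
The results file follows the standard COCO caption results format (a list of image_id/caption pairs), so it can be inspected with a few lines of Python; a small sketch assuming that format:

    import json

    # val/results.json is assumed to be a list of {"image_id": int, "caption": str}
    # entries, as required by the COCO caption evaluation tools.
    with open('val/results.json') as f:
        results = json.load(f)

    for entry in results[:5]:
        print(entry['image_id'], '->', entry['caption'])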

  • Inference: You can use the trained model to generate captions for any JPEG images! Put these images in the folder test/images and run a command like this:

python main.py --phase=test \
    --model_file='./models/xxxxxx.npy' \
    --beam_size=3

The generated captions will be saved in the folder test/results.

Results

A pretrained model with default configuration can be downloaded here. This model was trained solely on the COCO train2014 data. It achieves the following BLEU scores on the COCO val2014 data (with beam size=3):

  • BLEU-1 = 70.3%
  • BLEU-2 = 53.6%
  • BLEU-3 = 39.8%
  • BLEU-4 = 29.5%

Here are some captions generated by this model (see the examples image in the repository).

References

  • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio. ICML 2015.

image_captioning's People

Contributors

deeprnn


image_captioning's Issues

BUG in dataset.py

The COCO class has no methods like filter_by_words() (called at line 75) or all_captions() (called at line 69) in dataset.py.
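
These helpers are not part of the stock pycocotools COCO class, so they presumably come from the customized COCO wrapper vendored in utils/coco. If they are missing from your copy, logic along the following lines could be attached to that wrapper as methods; the function names and the filtering rule are assumptions based on how dataset.py calls them, using only the standard pycocotools index fields (anns, dataset, createIndex):

    def all_captions(coco):
        """Return every caption string in the annotation index."""
        return [ann['caption'] for ann in coco.anns.values()]

    def filter_by_words(coco, vocab_words):
        """Keep only annotations whose captions use words from vocab_words,
        then rebuild the pycocotools index."""
        kept = [ann for ann in coco.anns.values()
                if all(w in vocab_words
                       for w in ann['caption'].lower().replace('.', ' ').split())]
        coco.dataset['annotations'] = kept
        coco.createIndex()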

Questions about training details

Hi, I am trying to reproduce your work.

May I ask what the total_loss and accuracy are after training?
I trained the model for 60 epochs on 1/10 of the training data and got a total_loss of about 1.6 and an accuracy of about 65%, but when generating captions for the test images, the model just repeats the same word, which is quite strange.

Any ideas? Thanks very much.

Training on multi-gpu

I am unable to train the model on multiple GPUs. Am I missing something? Where do I need to configure the script for multi-GPU training? Thanks.

Cannot run the code

IOError: [Errno 2] No such file or directory: './train/captions_train2014.json'
What are the pre-requisites needed to run this code?

subprocess.py", line 1024, in _execute_child raise child_exception OSError: [Errno 2] No such file or directory

Loading the model from ./models/289999.npy...
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 47/47 [00:01<00:00, 23.69it/s]
47 tensors loaded.
Evaluating the model ...
batch: 100%|█████████████████████████████████████████████████████████████████████████████████| 1266/1266 [2:02:42<00:00, 5.83s/it]
Loading and preparing results...
DONE (t=0.13s)
creating index...
index created!
tokenization...
Traceback (most recent call last):
  File "main.py", line 69, in <module>
    tf.app.run()
  File "/home/viktor/anaconda2/envs/captureimage4/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 58, in main
    model.eval(sess, coco, data, vocabulary)
  File "/home/viktor/anaconda2/envs/captureimage4/scr/image_captioning-master/base_model.py", line 108, in eval
    scorer.evaluate()
  File "/home/viktor/anaconda2/envs/captureimage4/scr/image_captioning-master/utils/coco/pycocoevalcap/eval.py", line 31, in evaluate
    gts = tokenizer.tokenize(gts)
  File "/home/viktor/anaconda2/envs/captureimage4/scr/image_captioning-master/utils/coco/pycocoevalcap/tokenizer/ptbtokenizer.py", line 52, in tokenize
    stdout=subprocess.PIPE)
  File "/home/viktor/anaconda2/envs/captureimage4/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/home/viktor/anaconda2/envs/captureimage4/lib/python2.7/subprocess.py", line 1024, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
(captureimage4) viktor@viktor-System-Product-Name:

AttributeError: 'module' object has no attribute 'rnn_cell' on model.py

Hello

I'm trying to run it with the provided weights; however, I didn't find your environment's settings (Python version? dependency versions?), so I'm running it on Python 2.7.

I got the following error:

(captioning) rola93@rola93-Latitude-E5520:~/no_version/image_captioning$ python main.py --phase=test     --model_file='./models/289999/289999.npy' --beam_size=3
2018-08-08 14:50:20.015861: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-08-08 14:50:20.015898: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-08-08 14:50:20.015922: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Building the vocabulary...
Vocabulary built.
Number of words = 5000
Building the dataset...
Dataset built.
Building the CNN...
CNN built.
Building the RNN...
Traceback (most recent call last):
  File "main.py", line 69, in <module>
    tf.app.run()
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 63, in main
    model = CaptionGenerator(config)
  File "/home/rola93/no_version/image_captioning/base_model.py", line 27, in __init__
    self.build()
  File "/home/rola93/no_version/image_captioning/model.py", line 10, in build
    self.build_rnn()
  File "/home/rola93/no_version/image_captioning/model.py", line 228, in build_rnn
    lstm = tf.nn.rnn_cell.LSTMCell(
AttributeError: 'module' object has no attribute 'rnn_cell'

According to this, tf.nn.rnn_cell.LSTMCell was moved to contrib, so instead of lstm = tf.nn.rnn_cell.LSTMCell it should be lstm = tf.contrib.rnn.LSTMCell.

I tried it, and it worked (it actually breaks anyway later on, but I'm sure that's another problem).

Does it make sense?

thank you!

how to test an image ?

I have run this:
python main.py --phase=test --model_file='./models/xxxxxx.npy' --beam_size=3
Now I want to test another image, but I don't want to reload the model. How can I do that?

How to visualize the model?

In Keras we can use:

from keras.utils.vis_utils import plot_model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

How can I do this in TF?
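
TensorFlow 1.x has no direct plot_model equivalent, but the graph can be dumped and browsed in TensorBoard. A minimal sketch (this repo already writes its graph to ./summary/ during training, so this is only needed for ad-hoc inspection):

    import tensorflow as tf

    # Build or load the graph first, then write it out for TensorBoard's "Graphs" tab.
    with tf.Session() as sess:
        writer = tf.summary.FileWriter('./summary/', sess.graph)
        writer.close()
    # Then run: tensorboard --logdir='./summary/'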

python3.5 ImportError: No module named '_sqlite3' nltk

Hi guys! When I tried to run train.py on Ubuntu 16.04 with Python 3.5, I got this error. Could anyone tell me how to fix this problem? I don't have root rights and I don't want to reinstall Python from source. Thank you!

'NoneType' object has no attribute 'swapaxes'

How do I get around this error:
Traceback (most recent call last):
  File "main.py", line 69, in <module>
    tf.app.run()
  File "C:\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "main.py", line 50, in main
    model.train(sess, data)
  File "C:\Users\pranj\Desktop\img_cap\base_model.py", line 50, in train
    images = self.image_loader.load_images(image_files)
  File "C:\Users\pranj\Desktop\img_cap\utils\misc.py", line 35, in load_images
    images.append(self.load_image(image_file))
  File "C:\Users\pranj\Desktop\img_cap\utils\misc.py", line 19, in load_image
    temp = image.swapaxes(0, 2)
AttributeError: 'NoneType' object has no attribute 'swapaxes'
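
This error usually means the image file could not be read, so the loader received None instead of an array. Assuming load_image in utils/misc.py reads the file with cv2.imread, a defensive check like the one below makes the failing file easy to spot; this is a sketch, not the repo's actual code:

    import cv2

    def load_image_checked(image_file):
        """Read an image and fail loudly if the path is wrong or the file is unreadable."""
        image = cv2.imread(image_file)
        if image is None:
            raise IOError('Could not read image file: %s' % image_file)
        return image.swapaxes(0, 2)  # mirrors the axis swap done in utils/misc.py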

hello

Do you have a QQ account? I ran your code, and all the test pictures produce the same output. I checked the code and there was no error. Why is the generated description the same for every image?

AttributeError: 'DataSet' object has no attribute

def train(self, sess, train_data):
    """ Train the model using the COCO train2014 data. """
    print("Training the model...")
    config = self.config
    if not os.path.exists(config.summary_dir):
        os.mkdir(config.summary_dir)
    train_writer = tf.summary.FileWriter(config.summary_dir,
                                         sess.graph)
    for _ in tqdm(list(range(config.num_epochs)), desc='epoch'):
        for _ in tqdm(list(range(train_data.num_batches)), desc='batch'):
            batch = train_data.__next__batch()
            image_files, sentences, masks = batch
            images = self.image_loader.load_images('./train/images')
            feed_dict = {self.images: images,
                         self.sentences: sentences,
                         self.masks: masks}
            _, summary, global_step = sess.run([self.opt_op,
                                                self.summary,
                                                self.global_step],
                                               feed_dict=feed_dict)
            if (global_step + 1) % config.save_period == 0:
                self.save()
            train_writer.add_summary(summary, global_step)
        train_data.reset()
    self.save()
    train_writer.close()
    print("Training complete.")

File "H:\First Neural Network\image_captioning-master\base_model.py", line 44, in train
batch = train_data.__next__batch()

AttributeError: 'DataSet' object has no attribute '_BaseModel__next__batch'
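
The mangled name in the error comes from Python's name mangling: any identifier with two leading underscores that appears inside a class body gets the class name prepended, so train_data.__next__batch() written inside BaseModel is looked up as _BaseModel__next__batch on the DataSet object. A tiny self-contained demonstration (class names here are purely illustrative):

    class DataSet:
        def next_batch(self):
            return 'a batch'

    class BaseModel:
        def train(self, train_data):
            # '__next__batch' is mangled to '_BaseModel__next__batch' inside this class,
            # which DataSet does not define, hence the AttributeError.
            return train_data.__next__batch()

    try:
        BaseModel().train(DataSet())
    except AttributeError as e:
        print(e)   # 'DataSet' object has no attribute '_BaseModel__next__batch'

Calling whatever batching method DataSet actually exposes (a public name without the leading double underscore) avoids the mangling.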

PyFPE_jbuf Error

Hello, thank you for writing this code and putting it up on GitHub. I really appreciate it.

I have an issue with it: when I execute it inside an anaconda environment, it gives me the following error:

/image_captioning-master/utils/coco/_mask.so: undefined symbol: PyFPE_jbuf

I searched for it online and also tried removing the numpy library and then executing it again, but I still face this error. Can you please help me out?

Thank You

how can I use part of training data?

If I just put some of the training data into the /images folder, I always get the error shown below.

Training the model...
epoch: 0%| | 0/100 [00:00<?, ?it/s]
batch: 0%| | 0/11290 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 72, in <module>
    tf.app.run()
  File "/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "main.py", line 53, in main
    model.train(sess, data)
  File "/Users/brian/Downloads/show-attend-and-tell-master/base_model.py", line 50, in train
    images = self.image_loader.load_images(image_files)
  File "/Users/brian/Downloads/show-attend-and-tell-master/utils/misc.py", line 34, in load_images
    images.append(self.load_image(image_file))
  File "/Users/brian/Downloads/show-attend-and-tell-master/utils/misc.py", line 18, in load_image
    temp = image.swapaxes(0, 2)
AttributeError: 'NoneType' object has no attribute 'swapaxes'
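
The loader crashes because the annotation file still lists images that were not copied into train/images, so the image reader returns None for the missing files. One way to train on a subset is to filter captions_train2014.json down to the images that are actually present; a sketch assuming the standard COCO annotation format (top-level "images" and "annotations" lists):

    import json
    import os

    ann_path = './train/captions_train2014.json'   # back up the original file first
    image_dir = './train/images'

    with open(ann_path) as f:
        data = json.load(f)

    available = set(os.listdir(image_dir))
    data['images'] = [img for img in data['images'] if img['file_name'] in available]
    kept_ids = {img['id'] for img in data['images']}
    data['annotations'] = [a for a in data['annotations'] if a['image_id'] in kept_ids]

    with open(ann_path, 'w') as f:
        json.dump(data, f)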

Well trained model on COCO train 2014?

This link cannot be opened now.

Could somebody share the trained model the author provided?
Or has somebody already trained it on their own machine and saved the model?

Thanks very much!

Visualize attention points

Hi @DeepRNN, is there a way I can access the attention points for the image? I know I can do that using alpha, but I want to use those attention points for further data processing. Can you please explain how I can get alpha from build_rnn() into the main file and access the positions of each annotation while testing on an image? Something like this.

how to remove attention mechanism

As far as I know, image captions can be generated without an attention mechanism.
How can I effectively remove the attention mechanism for training?
Can you give me some advice, please?
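
One low-effort way to try this within the existing code is to keep the interface of attend() in model.py but make it return uniform weights, so every spatial location contributes equally and the attended context reduces to a mean-pooled CNN feature. A sketch against the attend(self, contexts, output) signature shown in the attention issue further below, not a tested patch:

    def attend(self, contexts, output):
        """ Attention disabled: uniform weights over the num_ctx spatial locations. """
        # contexts is assumed to have shape [batch_size, num_ctx, dim_ctx].
        alpha = tf.ones_like(contexts[:, :, 0]) / float(self.num_ctx)
        return alpha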

Strange visual attention results

Hello @DeepRNN!
I took a look at the attention maps that the model generates in test mode.
I did the following: in base_model.py, line 200, I changed the code as follows:

                memory, output, scores, attentions = sess.run(
                    [self.memory, self.output, self.probs, self.attentions],
                    feed_dict = {self.contexts: contexts,
                                 self.last_word: last_word,
                                 self.last_memory: last_memory,
                                 self.last_output: last_output})

So after that, every attentions array has the shape (batch_size, 196, beam_size); for simplicity I set beam_size=1 when testing. Next, I simply stack all attentions into one numpy array and visualize its content.
I found two concerns:

  1. The attention maps for different tokens vary negligibly (for 1.jpg the maximum difference between the 1st and 2nd tokens is ~1e-9).
  2. The maps themselves look rather strange; see the attention map for the test image with a bus (screenshot omitted).

However, the caption of the image is both grammatically and semantically correct.
I would like to discuss these results.

Command-line argument settings

What command-line arguments should be used to get the best performance (BLEU, etc.)? I assume --load_cnn_model should be set to True so that a pretrained CNN model can be used. What other settings should be used? For example, should --num_lstm be increased from 1 to 2? Should --init_lstm_with_fc_feats be set to True?

Which layer of Google's NASNet should I use for extracting features for attention?

I am implementing a network similar to this one, but I want to use the pretrained CNN with the best accuracy on the 2012 ILSVRC dataset, i.e., NASNet-large. Usually, people extract the features of the last convolutional layer, but NASNet's architecture is relatively complex and I couldn't find a direct conv layer. Below is the TensorBoard visualization of the "final_layer" cell of NASNet:

(TensorBoard screenshot of the "final_layer" cell)

And below is the second-to-last cell:

(TensorBoard screenshot of the second-to-last cell)

To me, the relu node I've selected in the first image (shape [1, 11, 11, 4032]) seems closest to what's needed for attention, but I am not sure. Any help will be highly appreciated.

Attention formula mismatch with the implementation

I found that in the paper, the formula for MLP attention is usually described as below:

e_t,i = wa^T tanh(Wva vi + Wha ht),    alpha_t = softmax(e_t)

where vi is the i-th feature map and ht is the output of the LSTM.

But in the code, the implementation goes like this:

    def attend(self, contexts, output):
        """ Attention Mechanism. """
        config = self.config
        reshaped_contexts = tf.reshape(contexts, [-1, self.dim_ctx])
        reshaped_contexts = self.nn.dropout(reshaped_contexts)
        output = self.nn.dropout(output)
        if config.num_attend_layers == 1:
            # use 1 fc layer to attend
            logits1 = self.nn.dense(reshaped_contexts,
                                    units = 1,
                                    activation = None,
                                    use_bias = False,
                                    name = 'fc_a')
            logits1 = tf.reshape(logits1, [-1, self.num_ctx])
            logits2 = self.nn.dense(output,
                                    units = self.num_ctx,
                                    activation = None,
                                    use_bias = False,
                                    name = 'fc_b')
            logits = logits1 + logits2
        else:
            # use 2 fc layers to attend
            temp1 = self.nn.dense(reshaped_contexts,
                                  units = config.dim_attend_layer,
                                  activation = tf.tanh,
                                  name = 'fc_1a')
            temp2 = self.nn.dense(output,
                                  units = config.dim_attend_layer,
                                  activation = tf.tanh,
                                  name = 'fc_1b')
            temp2 = tf.tile(tf.expand_dims(temp2, 1), [1, self.num_ctx, 1])
            temp2 = tf.reshape(temp2, [-1, config.dim_attend_layer])
            temp = temp1 + temp2
            temp = self.nn.dropout(temp)
            logits = self.nn.dense(temp,
                                   units = 1,
                                   activation = None,
                                   use_bias = False,
                                   name = 'fc_2')
            logits = tf.reshape(logits, [-1, self.num_ctx])
        alpha = tf.nn.softmax(logits)
        return alpha

Here I only consider the 2-fc branch.
I think the formula in the code is wa^T (tanh(Wva vi) + tanh(Wha ht)), which is slightly different from the paper, since tanh(A) + tanh(B) != tanh(A + B).

So I wonder whether this difference could cause any problems. Can anyone help?

Have a question in main.py

Hello, thank you for your work first.
There is a problem when I run main.py: I am confused about why there is a SyntaxError in bleu_scorer.py. The error message is as follows:

Traceback (most recent call last):
  File "main.py", line 5, in <module>
    from model import CaptionGenerator
  File "D:\AI_Prj\image_captioning-master\model.py", line 4, in <module>
    from base_model import BaseModel
  File "D:\AI_Prj\image_captioning-master\base_model.py", line 13, in <module>
    from utils.coco.pycocoevalcap.eval import COCOEvalCap
  File "D:\AI_Prj\image_captioning-master\utils\coco\pycocoevalcap\eval.py", line 3, in <module>
    from utils.coco.pycocoevalcap.bleu.bleu import Bleu
  File "D:\AI_Prj\image_captioning-master\utils\coco\pycocoevalcap\bleu\bleu.py", line 11, in <module>
    from utils.coco.pycocoevalcap.bleu.bleu_scorer import BleuScorer
  File "D:\AI_Prj\image_captioning-master\utils\coco\pycocoevalcap\bleu\bleu_scorer.py", line 60
    def cook_test(test, (reflen, refmaxcounts), eff=None, n=4):
                        ^
SyntaxError: invalid syntax
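
This is Python-2-only syntax: tuple unpacking in a function signature was removed in Python 3, so bleu_scorer.py only imports under Python 2. If you want to stay on Python 3, the signature can be rewritten to unpack inside the body; a sketch of the change (see also the Python 3 port mentioned in a later issue):

    # Python 2 original:
    #   def cook_test(test, (reflen, refmaxcounts), eff=None, n=4):
    #
    # Python 3 compatible rewrite:
    def cook_test(test, reflen_and_refmaxcounts, eff=None, n=4):
        reflen, refmaxcounts = reflen_and_refmaxcounts
        # ... rest of the original function body unchanged ...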

More details for training the model?

Can the author provide more details for training the model?

It would be useful to describe where to get the dataset, how we should prepare the input files, and how we should start the training process.

Thanks!

Calculation of attention

In the code, you use a fully-connected layer to calculate the attention. Why not use the formula from the paper?

You may need to pass the encoding= option to numpy.load

I'm using Mac OS X El Capitan with Python 3.6.5.
The last line of the error given to me is 'You may need to pass the encoding= option to numpy.load'.
Log of the process:
loading annotations into memory...
Done (t=4.42s)
creating index...
index created!
Filtering the captions by length...
creating index...
index created!
Building the vocabulary...
Vocabulary built.
Number of words = 5000
Filtering the captions by words...
creating index...
index created!
Processing the captions...
Captions processed.
Number of captions = 515671
Building the dataset...
Dataset built.
Building the CNN...
CNN built.
Building the RNN...
RNN built.
Loading the CNN from vgg16_no_fc.npy...
loading annotations into memory...
Done (t=3.82s)
creating index...
index created!
Filtering the captions by length...
creating index...
index created!
Building the vocabulary...
Vocabulary built.
Number of words = 5000
Filtering the captions by words...
creating index...
index created!
Processing the captions...
Captions processed.
Number of captions = 515671
Building the dataset...
Dataset built.
Building the CNN...
CNN built.
Building the RNN...
RNN built.
Loading the CNN from vgg16_no_fc.npy...
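
The pretrained .npy files were saved with Python 2, so loading them under Python 3 usually requires passing an encoding (and, on newer NumPy versions, allow_pickle) explicitly. A sketch of the kind of call that typically works, assuming the file stores a pickled dict of parameter arrays as in this repo:

    import numpy as np

    # encoding='latin1' lets NumPy unpickle Python-2 strings under Python 3;
    # allow_pickle=True is required since NumPy 1.16.3 for pickled object arrays.
    weights = np.load('vgg16_no_fc.npy', encoding='latin1', allow_pickle=True).item()
    print(len(weights), 'tensors in the file')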

How to get the logit of attention?

In lines 404 to 415 of the model.py file, why do you add the logits of the image and the logits of the hidden state to get the final logits? Why not directly multiply the image features and the hidden state to get the final logits?

Why don't you convert the weighted image features into the cell state c and concatenate them with the word embedding as input?

Graph is finalized and cannot be modified.

with tf.Session() as sess:
    if FLAGS.phase == 'train':
        # training phase
        data = prepare_train_data(config)
        tf.get_default_graph().finalize()
        model = CaptionGenerator(config)
        sess.run(tf.global_variables_initializer())
        if FLAGS.load:
            model.load(sess, FLAGS.model_file)
        if FLAGS.load_cnn:
            model.load_cnn(sess, FLAGS.cnn_model_file)
        tf.get_default_graph().finalize()
        model.train(sess, data)

Vocabulary built.
Number of words = 5000
Filtering the captions by words...
100%|██████████| 409884/409884 [00:48<00:00, 8534.57it/s]
creating index...
index created!
Processing the captions...
Captions processed.
Number of captions = 361254
Building the dataset...
Dataset built.
Traceback (most recent call last):
  File "", line 83, in <module>
    tf.app.run()
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "", line 54, in main
    model = CaptionGenerator(config)
  File "H:\First Neural Network\image_captioning-master\base_model.py", line 26, in __init__
    trainable = False)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 259, in __init__
    constraint=constraint)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 380, in _init_from_args
    initial_value, name="initial_value", dtype=dtype)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1011, in convert_to_tensor
    as_ref=False)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1107, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\constant_op.py", line 217, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\constant_op.py", line 202, in constant
    name=name).outputs[0]
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3386, in create_op
    self._check_not_finalized()
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3024, in _check_not_finalized
    raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.
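
For what it's worth, the snippet above calls tf.get_default_graph().finalize() right after prepare_train_data(config) and before CaptionGenerator(config) is constructed, so the model cannot add any ops to the already-finalized graph; the second finalize() call, after the weights are loaded, looks like the intended one. A sketch of that ordering, based only on the code shown above:

    with tf.Session() as sess:
        if FLAGS.phase == 'train':
            data = prepare_train_data(config)
            model = CaptionGenerator(config)           # build all graph ops first
            sess.run(tf.global_variables_initializer())
            if FLAGS.load:
                model.load(sess, FLAGS.model_file)
            if FLAGS.load_cnn:
                model.load_cnn(sess, FLAGS.cnn_model_file)
            tf.get_default_graph().finalize()          # finalize only once the graph is complete
            model.train(sess, data)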

FailedPreconditionError: Attempting to use uninitialized value lstm/lstm_cell/biases

Hello

I'm trying to run it with the provided weights on some images to get their captions.

I'm running it on Python 2.7 and TensorFlow 1.1.0; however, when I run it I get this:

$ python main.py --phase=test     --model_file='./models/289999/289999.npy' --beam_size=3
2018-08-08 15:12:08.881870: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-08-08 15:12:08.881907: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-08-08 15:12:08.881922: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Building the vocabulary...
Vocabulary built.
Number of words = 5000
Building the dataset...
Dataset built.
Building the CNN...
CNN built.
Building the RNN...
RNN built.
Loading the model from ./models/289999/289999.npy...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 47/47 [00:01<00:00, 45.43it/s]
45 tensors loaded.
Testing the model ...
path:   0%|                                                                                                              | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
  File "main.py", line 69, in <module>
    tf.app.run()
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 66, in main
    model.test(sess, data, vocabulary)
  File "/home/rola93/no_version/image_captioning/base_model.py", line 124, in test
    caption_data = self.beam_search(sess, batch, vocabulary)
  File "/home/rola93/no_version/image_captioning/base_model.py", line 202, in beam_search
    self.last_output: last_output})
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value lstm/lstm_cell/biases
   [[Node: lstm/lstm_cell/biases/read = Identity[T=DT_FLOAT, _class=["loc:@lstm/lstm_cell/biases"], _device="/job:localhost/replica:0/task:0/cpu:0"](lstm/lstm_cell/biases)]]

Caused by op u'lstm/lstm_cell/biases/read', defined at:
  File "main.py", line 69, in <module>
    tf.app.run()
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 63, in main
    model = CaptionGenerator(config)
  File "/home/rola93/no_version/image_captioning/base_model.py", line 27, in __init__
    self.build()
  File "/home/rola93/no_version/image_captioning/model.py", line 10, in build
    self.build_rnn()
  File "/home/rola93/no_version/image_captioning/model.py", line 279, in build_rnn
    output, state = lstm(current_input, last_state)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 404, in __call__
    lstm_matrix = _linear([inputs, m_prev], 4 * self._num_units, bias=True)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 1056, in _linear
    initializer=init_ops.constant_initializer(bias_start, dtype=dtype))
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 714, in _get_single_variable
    validate_shape=validate_shape)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 197, in __init__
    expected_shape=expected_shape)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 316, in _init_from_args
    self._snapshot = array_ops.identity(self._variable, name="read")
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1338, in identity
    result = _op_def_lib.apply_op("Identity", input=input, name=name)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/rola93/.pyenv/versions/captioning_2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value lstm/lstm_cell/biases
   [[Node: lstm/lstm_cell/biases/read = Identity[T=DT_FLOAT, _class=["loc:@lstm/lstm_cell/biases"], _device="/job:localhost/replica:0/task:0/cpu:0"](lstm/lstm_cell/biases)]]


Is there a way to use ResNet?

I was trying to run the model with ResNet as the CNN. I supplied the model weights file in the parameters and also changed it in the configuration, but in TensorBoard it still looks like the network uses VGG.

Is there a way to actually use ResNet? If so, how?

Docker Serving

Does anyone have experience serving the TF model with Docker, in particular with producing the signature_def_map?
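
For reference, a signature_def_map for TensorFlow Serving is usually produced with the TF 1.x SavedModel builder. The sketch below uses hypothetical tensors (model.images and model.predictions stand in for whatever the caption model actually exposes), so it only shows the shape of the export code, not a working exporter for this repo:

    import tensorflow as tf

    def export_for_serving(sess, model, export_dir):
        """Write a SavedModel with a predict signature for TF Serving (e.g. in Docker)."""
        builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
        signature = tf.saved_model.signature_def_utils.predict_signature_def(
            inputs={'images': model.images},           # hypothetical input tensor
            outputs={'captions': model.predictions})   # hypothetical output tensor
        builder.add_meta_graph_and_variables(
            sess,
            [tf.saved_model.tag_constants.SERVING],
            signature_def_map={
                tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                    signature})
        builder.save()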

Have a question in model.py

File "/home/cao/semantic-IQA/image-caption-tensorflow/image_captioning-master/model.py", line 449, in build_rnn
opt_op = solver.apply_gradients(zip(gs, tvars), global_step=self.global_step)
ValueError: Variable emb_w/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

Python 3 Version

Hi all,

I have adapted this repo into a Python 3 compatible version. Please refer to here for the code. I did not open a pull request as the new version does not seem to be compatible with Python 2 (mainly due to different data encoding). I hope it helps!
