caption_generator's Introduction

caption_generator: An image captioning project

license

Note: This project is no longer under active development. However, queries and pull requests will be responded to. Thanks!

caption_generator generates a caption for any image in natural language (English). The model architecture is inspired by [1], Vinyals et al. The module is built using Keras, the deep learning library.

This repository serves two purposes:

  • present and discuss my model and the results I obtained
  • provide a simple architecture for image captioning to the community

Model

The image captioning model has been implemented using the Sequential API of Keras. It consists of three components:

  1. An encoder CNN model: A pre-trained CNN is used to encode an image into its feature vector. In this implementation, the VGG16 model[d] is used as the encoder, with its pretrained weights loaded. The last softmax layer of VGG16 is removed, and the vector of dimension (4096,) is obtained from the second-to-last layer.

    To speed up training, I pre-encoded each image into its feature vector. This is done in prepare_dataset.py, which produces the pickle file encoded_images.p. In the current version, the image model takes the (4096,)-dimensional encoded image vector as input. This can be overridden by uncommenting the VGG model lines in caption_generator.py. There is no fine-tuning in the current version, but it can be added.

  2. A word embedding model: Since the number of unique words can be large, a one-hot encoding of the words is not a good idea. An embedding model is trained that takes a word and outputs an embedding vector of dimension (1, 128).

    Pre-trained word embeddings can also be used.

  3. A decoder RNN model: An LSTM network is employed for the task of generating captions. It takes the image vector and the partial caption at the current timestep as input and generates the next most probable word as output.

The overall architecture of the model, including the input and output dimensions of each layer, is shown in the repository's architecture diagram (not reproduced here). A sketch of how the three components fit together follows.
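As a companion to the diagram, here is a minimal sketch of the three components wired together, written with the Keras 2 functional API rather than the repo's Sequential code; the layer sizes shown (128-dimensional embeddings, a 256-unit caption LSTM, a 1000-unit decoder LSTM) and the vocab_size/max_caption_len values are assumptions for illustration:

    from keras.layers import Input, Dense, RepeatVector, Embedding, LSTM, TimeDistributed, concatenate
    from keras.models import Model

    max_caption_len = 40   # assumed; the repo derives this from the training captions
    vocab_size = 8256      # assumed; the repo derives this from the training captions

    # 1. Encoder: project the pre-extracted (4096,) VGG16 feature vector and
    #    repeat it once per caption timestep.
    img_input = Input(shape=(4096,))
    img_emb = Dense(128, activation='relu')(img_input)
    img_seq = RepeatVector(max_caption_len)(img_emb)

    # 2. Word embedding: map each word index of the partial caption to a
    #    128-dimensional vector and run an LSTM over the sequence.
    cap_input = Input(shape=(max_caption_len,))
    cap_emb = Embedding(vocab_size, 128, input_length=max_caption_len)(cap_input)
    cap_seq = LSTM(256, return_sequences=True)(cap_emb)
    cap_seq = TimeDistributed(Dense(128))(cap_seq)

    # 3. Decoder: concatenate both streams and predict the next word.
    merged = concatenate([img_seq, cap_seq])
    hidden = LSTM(1000, return_sequences=False)(merged)
    next_word = Dense(vocab_size, activation='softmax')(hidden)

    model = Model(inputs=[img_input, cap_input], outputs=next_word)
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])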



Dataset

The model has been trained and tested on the Flickr8k dataset[2]. Many other datasets can be used as well, such as:

  • Flickr30k
  • MS COCO
  • SBU
  • Pascal

Experiments and results

The model has been trained for 50 epochs, which brings the loss down to 2.6465. With a larger dataset, it might be necessary to train the model for at least 50 more epochs.

With the current training on the Flickr8k dataset, running the test on the 1000 test images yields BLEU ≈ 0.57.

Some captions generated by the model are shown in the repository (sample images not reproduced here).




Requirements

  • tensorflow
  • keras
  • numpy
  • h5py
  • pandas
  • Pillow

These requirements can be easily installed by: pip install -r requirements.txt

Scripts

  • caption_generator.py: The base script that contains functions for model creation, the batch data generator, etc.
  • prepare_dataset.py: Prepares the dataset for training. This script must be modified if a new dataset is to be used.
  • train_model.py: Module for training the caption generator.
  • test_model.py: Contains the module for testing the performance of the caption generator; currently it implements the [BLEU](https://en.wikipedia.org/wiki/BLEU) metric (see the sketch below). New metrics can be added.
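For reference, a minimal sketch of a corpus-level BLEU computation with NLTK; NLTK and the variable names here are assumptions, and the exact metric code in test_model.py may differ:

    from nltk.translate.bleu_score import corpus_bleu

    # references: per test image, a list of tokenized ground-truth captions
    # hypotheses: one tokenized generated caption per test image
    references = [[['a', 'dog', 'runs', 'in', 'the', 'grass']]]
    hypotheses = [['a', 'dog', 'is', 'running', 'in', 'the', 'grass']]
    print(corpus_bleu(references, hypotheses))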

Usage

After the requirements have been installed, the process from training to testing is fairly easy. The commands to run:

  1. python prepare_dataset.py
  2. python train_model.py
  3. python test_model.py

References

[1] Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. Show and Tell: A Neural Image Caption Generator

[2] Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. Collecting Image Annotations Using Amazon's Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.


Acknowledgements

[a] I am thankful to my project guide Prof. NK Bansode, and a big shoutout to my project teammates. We have also developed an implementation of [1] in TensorFlow, available at image-caption-generator, which has been trained and tested on the MS COCO dataset.

[b] Special thanks to Ashwanth Kumar for helping me with the resources and effort to train my models.

[c] Keras: Deep Learning library for Theano and TensorFlow: Thanks to François Chollet for developing and maintaining such a wonderful library.

[d] deep-learning-models: Thanks to François Chollet for providing pretrained VGG16 model and weights.


caption_generator's Issues

Running caption_generator on Google Compute Engine

I am attempting to get this up and running on a Google Compute Engine (GCE) VM (Debian 4.9.51-1, x86_64). I ran sudo pip install -r requirements.txt, and everything installed correctly.

I then attempt to run python caption_generator/prepare_dataset.py. This outputs the following error:

Using TensorFlow backend.
RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa
RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa
Traceback (most recent call last):
  File "caption_generator/prepare_dataset.py", line 2, in <module>
    from keras.preprocessing import image
  File "/usr/local/lib/python2.7/dist-packages/keras/__init__.py", line 3, in <module>
    from . import activations
  File "/usr/local/lib/python2.7/dist-packages/keras/activations.py", line 4, in <module>
    from .utils.generic_utils import deserialize_keras_object
  File "/usr/local/lib/python2.7/dist-packages/keras/utils/__init__.py", line 6, in <module>
    from . import io_utils
  File "/usr/local/lib/python2.7/dist-packages/keras/utils/io_utils.py", line 10, in <module>
    import h5py
  File "/usr/local/lib/python2.7/dist-packages/h5py/__init__.py", line 31, in <module>
    from .highlevel import *
  File "/usr/local/lib/python2.7/dist-packages/h5py/highlevel.py", line 13, in <module>
    from ._hl.base import is_hdf5, HLObject
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/base.py", line 78, in <module>
    dlapl = default_lapl()
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/base.py", line 65, in default_lapl
    lapl = h5p.create(h5p.LINK_ACCESS)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5p.pyx", line 131, in h5py.h5p.create
  File "h5py/h5p.pyx", line 72, in h5py.h5p.propwrap
ValueError: Not a property list class (Not a property list class)

Any help would be much appreciated.

ValueError:

You are trying to load a weight file containing 19 layers into a model with 16 layers. (prepare_dataset.py, TensorFlow backend)

Can you please share a copy of your 'weights-improvement-XX.hdf5'?

I find that training takes a lot of time. Can you share a copy of your model weights?
We hope to check whether we have deployed your code successfully.
So far, we have only run 1 epoch, and the test shows that not a single caption can be produced.
Is this normal? Will more epochs bring reasonable output instead of empty output?
Thank you for your cool work.

Network does not converge, bad captions

Hello,

I've followed your instructions and started training the network. The loss reaches its minimum value after about 5 epochs and then it starts to diverge again.

After 50 epochs, the generated captions of the best epoch (5th or 6th) look like this:

Predicting for image: 992
2351479551_e8820a1ff3.jpg : exercise lamb Fourth headphones facing pasta soft her soft her soft her soft her soft her dads college soft her dads college soft her her her her her soft her her her her her soft her her her her
Predicting for image: 993
3514179514_cbc3371b92.jpg : fist graffitti soft her soft her Hollywood Fourth Crowd soft her her soft her her her her her soft her her her her her her soft her her her her soft her her her her soft her her her
Predicting for image: 994
1119015538_e8e796281e.jpg : closeout security soft her soft her security fall soft her her her her her fall soft her her her her her her soft her her her her her soft her her her her soft her her her her her
Predicting for image: 995
3727752439_907795603b.jpg : roots college Fourth tree-filled o swing-set places soft her soft her her soft her her soft her her college soft her her her her her her her soft her her her her soft her her her her her her

Any idea what's wrong?

AttributeError: 'module' object has no attribute 'CaptionGenerator'

Hi, after running prepare_dataset.py I tried to run train_model.py, but it gives this error message:

  File "train_model.py", line 25, in <module>
    train_model(epochs=50)
  File "train_model.py", line 6, in train_model
    cg = caption_generator.CaptionGenerator()
AttributeError: 'module' object has no attribute 'CaptionGenerator'

raise RuntimeError('You must compile your model before using it.')

Hey, when I try to train the model I get the following error:

Using TensorFlow backend.
413439
WARNING:tensorflow:From C:\Users\UserName\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Model created!
Traceback (most recent call last):
  File "D:\Image-caption\Image-Captioning-master\train.py", line 14, in <module>
    train(int(sys.argv[1]))
  File "D:\Image-caption\Image-Captioning-master\train.py", line 9, in train
    model.fit_generator(sd.data_process(batch_size=batch_size), steps_per_epoch=sd.no_samples/batch_size, epochs=epoch, verbose=2, callbacks=None)
  File "C:\Users\UserName\Anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\UserName\Anaconda3\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\UserName\Anaconda3\lib\site-packages\keras\engine\training_generator.py", line 40, in fit_generator
    model._make_train_function()
  File "C:\Users\UserName\Anaconda3\lib\site-packages\keras\engine\training.py", line 496, in _make_train_function
    raise RuntimeError('You must compile your model before using it.')
RuntimeError: You must compile your model before using it.

I have been trying to solve it with no luck; I have been stuck on this problem for two weeks. Please help me out here.

Pre-trained models

Hi,

The model works well, and I would like to know if you can share a pre-trained model.

Thanks

Bad captions

I have trained the model; the loss is 1.16 and the accuracy is 0.7574. But when I use the model to generate captions, the captions are rambling and make no sense.
(Screenshot omitted.)
What should I do?

Module Error

Which version of Keras is used? I'm getting an error saying "'module' object is not callable".

GPU not working

Hi!
When I ran train_model.py, Keras did not use the GPU automatically. However, when I ran MNIST example code, Keras used the GPU automatically. I can't tell why. Can somebody enlighten me, please?

Thanks

A small doubt about implementation.

So I have read the paper and have a small doubt: the authors just initialise the CNN with pre-trained (ImageNet) weights and don't change the weights further? Can you tell me your approach for the CNN: you just initialised it with ImageNet weights and never altered them, right?

Thank You.
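For reference, freezing a pretrained CNN in Keras is a one-liner per layer; a sketch (this repo sidesteps the question entirely by pre-extracting features in prepare_dataset.py, so the CNN weights are never updated):

    from keras.applications.vgg16 import VGG16

    vgg = VGG16(weights='imagenet')
    for layer in vgg.layers:
        layer.trainable = False  # keep the ImageNet weights fixed during training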

caption_generator.py: How do you break out of "while 1"?

In caption_generator.py, in the function "data_generator" at line 81, there is a

while 1:

During training I only see thousands of executions of the statement

print "yielding count: "+str(gen_count)

which is within this loop.

Since I don't see any exit conditions or break within this loop, I am wondering when and how do we break out of this loop.

Thanks
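For context, Keras's fit_generator is designed around generators that never return: it draws exactly steps_per_epoch batches per epoch and stops after the requested number of epochs, so the loop needs no exit condition. A sketch of the pattern, where make_batches and the surrounding names are hypothetical:

    def data_generator(batch_size):
        while 1:  # intentionally infinite
            for inputs, targets in make_batches(batch_size):  # make_batches is hypothetical
                yield inputs, targets

    model.fit_generator(data_generator(128),
                        steps_per_epoch=total_samples // 128,  # defines the epoch boundary
                        epochs=50)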

Final Model

Hey would it be possible for you to upload the final model?

Thanks,
Rohin

ValueError

ValueError: decode_predictions expects a batch of predictions (i.e. a 2D array of shape (samples, 1000)). Found array with shape: (1, 14, 14, 512)

When I ran:

# (imports added for completeness; the original post omitted them)
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image

model = VGG16(include_top=False, weights='imagenet')
img_path = 'Elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
print('Input image shape:', x.shape)
preds = model.predict(x)
print('Predicted:', decode_predictions(preds))

I got this error; can you help me fix it, please?
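For reference, decode_predictions translates a (samples, 1000) array of ImageNet class probabilities into labels, and that array only exists when the classification head is kept; with include_top=False, VGG16 outputs convolutional feature maps instead, hence the shape error. A hedged fix sketch:

    model = VGG16(include_top=True, weights='imagenet')  # keep the 1000-way softmax head
    preds = model.predict(x)                             # shape (samples, 1000)
    print('Predicted:', decode_predictions(preds))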

Running on Mac OS X (10.12.6)

I would like to get this running on my local machine before pushing it up to Google Compute Engine.

I set up a new virtual environment and cloned the repo. I then ran pip install -r requirements.txt and everything installed correctly.

I then tried running python caption_generator/prepare_dataset.py and got the following error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "prepare_dataset.py", line 5, in <module>
    from imagenet_utils import preprocess_input
ImportError: No module named imagenet_utils

I was under the impression that imagenet_utils was included in Keras?
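For reference, the standalone imagenet_utils module comes from the deep-learning-models repository[d] rather than from Keras itself; in Keras 2 an equivalent helper is bundled under keras.applications, so one hedged workaround is to change the import:

    from keras.applications.imagenet_utils import preprocess_input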

Missing InceptionV3

prepare_dataset.py has finished. I am trying to run train_model.py; however, it gives me the following error:

ImportError: No module named inception_v3

I thought that the inception_v3 module was included in Tensorflow?
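Likewise, inception_v3 ships with Keras (under keras.applications) rather than with TensorFlow; a hedged workaround, assuming Keras 2:

    from keras.applications.inception_v3 import InceptionV3, preprocess_input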

When to use pretrained word embeddings?

Hello, I was wondering when you think it is worth using a pretrained word embedding model. I am facing a one-to-many problem as well, where my "many" are text paragraphs (~80 words). I have 100K training instances. What do you think?

Also, if I were to use a pretrained word embedding model, where should I insert it in your code?

Thanks in advance.
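For reference, the usual Keras pattern is to pass a pretrained weight matrix to the Embedding layer; embedding_matrix below is an assumed (vocab_size, 128) array built from, e.g., GloVe vectors, and all names are illustrative:

    from keras.layers import Embedding

    embedding = Embedding(vocab_size, 128,
                          weights=[embedding_matrix],   # pretrained vectors (assumption)
                          input_length=max_caption_len,
                          trainable=False)              # set True to fine-tune them instead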

Installing TensorFlow

It looks like this project is implemented in Python 2.7. I'm facing issues installing TensorFlow. Am I right about the Python version?

I couldn't get the weights-improvement-48.hdf5

After training, I noticed that only the following weight files were generated:
weights-improvement-01.hdf5
weights-improvement-02.hdf5
weights-improvement-03.hdf5
weights-improvement-04.hdf5
Apart from those, no other hdf5 files were generated; can anyone tell me what the problem is?
I have checked that the path to my training dataset is right and that the epoch count is 50.

Why don't you validate during training?

Hello, I was wondering why you don't add a validation generator during training. How do you check that the model doesn't overfit too much and is able to generalize?
Thanks in advance
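For reference, fit_generator accepts a held-out generator directly; a sketch with assumed names:

    model.fit_generator(train_generator,
                        steps_per_epoch=train_samples // batch_size,
                        epochs=epochs,
                        validation_data=val_generator,              # held-out data
                        validation_steps=val_samples // batch_size,
                        callbacks=callbacks_list)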

Installation

Hi, I was looking for a solution that would generate captions for pictures and came across your project. But I have not been able to run this application on my laptop. Are there more detailed instructions that describe, step by step, exactly what needs to be done to run the program?

Low accuracy on MSCOCO

Hi, I'm following your code and trying to train the network on MSCOCO.
Here is my code:

# (Imports reconstructed for completeness; the post omitted them.
#  Merge requires Keras 1.x. `path`, `Embedding_dim`, `step_size`, and
#  `v_step_size` are defined elsewhere in the poster's code.)
import os
import pickle
import random

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, RepeatVector, Embedding, LSTM, TimeDistributed, Activation, Merge
from keras.optimizers import SGD, RMSprop
from keras.callbacks import ModelCheckpoint
from keras.preprocessing import sequence


class Caption_Model:
    def __init__(self, char_to_int, int_to_char, vocab_size=26688, max_caption_len=20,
                 folder_path=path, epochs=10, batch_size=64):
        self.img_model = Sequential()
        self.text_model = Sequential()
        self.model = Sequential()
        self.vocab_size = vocab_size
        self.max_caption_len = max_caption_len
        self.folder_path = folder_path
        self.data = {}
        self.char_to_int = char_to_int
        self.int_to_char = int_to_char
        self.batch_size = batch_size
        self.epochs = epochs

    def get_image_model(self):
        self.img_model.add(Dense(Embedding_dim, input_dim=4096, activation='relu'))
        self.img_model.add(RepeatVector(self.max_caption_len + 1))
        # self.img_model.summary()
        return self.img_model

    def get_text_model(self):
        self.text_model.add(Embedding(self.vocab_size, 256, input_length=self.max_caption_len + 1))
        self.text_model.add(LSTM(512, return_sequences=True))
        # self.text_model.add(Dropout(0.2))
        self.text_model.add(TimeDistributed(Dense(Embedding_dim, activation='relu')))
        # self.text_model.summary()
        return self.text_model

    def get_caption_model(self, predict=False):
        self.get_image_model()
        self.get_text_model()
        self.model.add(Merge([self.img_model, self.text_model], mode='concat'))
        self.model.add(LSTM(1000, return_sequences=False))
        self.model.add(Dense(self.vocab_size))
        self.model.add(Activation('softmax'))
        print "Now model.model"
        sgd = SGD(lr=1e-3, decay=1e-6, momentum=0.99, nesterov=True)
        rms = RMSprop(lr=0.005)
        if predict:
            return
        else:
            # weight = '/home/paperspace/Document/DeepLearning/ImageCaption/code/Models/checkpoint/weights-improvement-02-5.2473.hdf5'
            # self.model.load_weights(weight)
            self.model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

    def load_data(self, set_type='train'):
        data = {}
        with open(self.folder_path + set_type + '.processed_img.2.pkl') as f:
            data['imgs'] = pickle.load(f)
        with open(os.path.join(self.folder_path, 'all%spartial_sentences_0.pkl' % set_type)) as f:
            data['partial_sentences'] = pickle.load(f)
        return data

    def data_generator(self, set_type='train'):
        data = self.load_data(set_type)
        j = 0
        temp = data['partial_sentences'].keys()
        partial_sentences, images = [], []
        next_words = np.zeros((self.batch_size, self.vocab_size)).astype(float)
        count = 0
        round_count = 0
        while True:
            round_count += 1
            random.shuffle(temp)
            print "the %d round!" % round_count
            for key in temp:
                image = data['imgs'][key]
                for sen in data['partial_sentences'][key]:
                    for k in range(len(sen)):
                        count += 1
                        partial = sen[:k + 1]
                        partial_sentences.append(partial)
                        images.append(image)
                        # print "index is: ", count - 1
                        if k == len(sen) - 1:
                            next_words[count - 1][self.char_to_int['<end>']] = 1
                        else:
                            next_words[count - 1][sen[k + 1]] = 1
                        if count >= self.batch_size:
                            partial_sentences = sequence.pad_sequences(partial_sentences, maxlen=self.max_caption_len + 1, padding='post')
                            partial_sentences = np.asarray(partial_sentences)
                            images = np.asarray(images)
                            # partial_sentences = partial_sentences / float(self.vocab_size)
                            # print partial_sentences
                            count = 0
                            yield [images, partial_sentences], next_words
                            partial_sentences, images = [], []
                            next_words = np.zeros((self.batch_size, self.vocab_size)).astype(float)
            j += 1

    def train(self):
        self.get_caption_model()
        filepath = "Models/checkpoint/weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
        checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
        callbacks_list = [checkpoint]

        self.model.fit_generator(self.data_generator('train'), steps_per_epoch=step_size / self.batch_size,
                                 epochs=self.epochs, validation_data=self.data_generator('val'),
                                 validation_steps=v_step_size / self.batch_size, callbacks=callbacks_list)
        # self.model.fit_generator(self.data_generator('train'), steps_per_epoch=step_size/self.batch_size, epochs=self.epochs, callbacks=callbacks_list)

        try:
            self.model.save('Models/WholeModel.h5', overwrite=True)
            self.model.save_weights('Models/Weights.h5', overwrite=True)
        except:
            print "Error in saving model."
        print "After training model...\n"

Accuracy plateaus at about 35% by the end, and the training loss stays around 3.xxx.
I just cannot figure out what's wrong with the code.
Could you please offer some help?
Thank you so much!

Awful Captions Generated when Testing

After finally getting it to work (with basically no changes to the code) and letting it run for 25 epochs, I tried running the test_model.py script. It produces really bad results. I changed the weights file in the script to use the most recent weights file generated, which for some reason was weights-improvement-03.hdf5. During training, the accuracy was not increasing and the loss was increasing.

Here are some of the captions generated when I use that weights file:

Predicting for image: 0
3385593926_d3e9c21170.jpg : A black and white dog jumps in the grass .
Predicting for image: 1
2677656448_6b7e7702af.jpg : A black and white dog is running in the grass .
Predicting for image: 2
311146855_0b65fdb169.jpg : A man wearing a black shirt and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and blue hat and blue shirt and black shirt is people
Predicting for image: 3
1258913059_07c613f7ff.jpg : A man in a black shirt is sitting in a mountain .
Predicting for image: 4
241347760_d44c8d3a01.jpg : A man in a red shirt is playing in the field .
Predicting for image: 5
2654514044_a70a6e2c21.jpg : A black and white dog is running on a field .
Predicting for image: 6
2339106348_2df90aa6a9.jpg : A man wearing a black shirt and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and white shirt
Predicting for image: 7
256085101_2c2617c5d0.jpg : A black and white dog jumps in the grass .
Predicting for image: 8
280706862_14c30d734a.jpg : A black and white dog running in the snow .
Predicting for image: 9
3072172967_630e9c69d0.jpg : A man in a red shirt is sitting in the background .
Predicting for image: 10
3482062809_3b694322c4.jpg : A man wearing a black shirt and blue hair and blue shorts and blue shorts and blue shorts and blue shirt and blue hair and blue shirt and blue shirt and white shirt and black shirt is people

RuntimeError: You must compile your model before using it.

Running prepare_dataset.py is okay.
When I run train_model.py, I get the following error.
How should I change the code? Please guide me!

C:\Anaconda3\envs\caption_generator-master\python.exe D:/code/Python/caption_generator-master/caption_generator/train_model.py
Using TensorFlow backend.
Total samples : 383454
Vocabulary size: 8256
Maximum caption length: 40
Variables initialization done!
Model created!
Traceback (most recent call last):
  File "D:/Python/caption_generator-master/caption_generator/train_model.py", line 26, in <module>
    train_model(epochs=50)
  File "D:/code/Python/caption_generator-master/caption_generator/train_model.py", line 17, in train_model
    model.fit_generator(cg.data_generator(batch_size=batch_size), steps_per_epoch=cg.total_samples/batch_size, epochs=epochs, verbose=2, callbacks=callbacks_list)
  File "C:\Anaconda3\envs\caption_generator-master\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\envs\caption_generator-master\lib\site-packages\keras\engine\training.py", line 1420, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Anaconda3\envs\caption_generator-master\lib\site-packages\keras\engine\training_generator.py", line 40, in fit_generator
    model._make_train_function()
  File "C:\Anaconda3\envs\caption_generator-master\lib\site-packages\keras\engine\training.py", line 496, in _make_train_function
    raise RuntimeError('You must compile your model before using it.')
RuntimeError: You must compile your model before using it.

Process finished with exit code 1

Generating Captions on non-Flickr8k images

How do you go about generating captions on images of your choice?

The testing code seems to rely on the pre-generated encoded_images.p file, which only contains encodings for Flickr8k images in this case. The prepare_dataset.py file does not seem easy to adapt to encode images of your choice into encoded_images.p.

Any help would be appreciated.
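For reference, a hedged sketch of encoding an arbitrary image the same way the Flickr8k images are encoded (VGG16 fc2 features), assuming Keras 2; encode_image is an illustrative name, not a function from the repo:

    import numpy as np
    from keras.applications.vgg16 import VGG16, preprocess_input
    from keras.models import Model
    from keras.preprocessing import image

    base = VGG16(weights='imagenet')
    encoder = Model(inputs=base.input, outputs=base.get_layer('fc2').output)

    def encode_image(img_path):
        img = image.load_img(img_path, target_size=(224, 224))
        x = np.expand_dims(image.img_to_array(img), axis=0)
        x = preprocess_input(x)
        return encoder.predict(x).reshape((4096,))  # same shape as the encoded_images.p entries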
