caption_generator's People

Contributors

anuragmishracse, crowoy, y12uc231

caption_generator's Issues

When to use pretrained word embeddings?

Hello, I was wondering when you think it is worth using a pretrained word-embedding model. I am facing a one-to-many problem as well, where my "many" are text paragraphs (~80 words). I have 100K training instances. What do you think?

Also, if I were to use a pretrained word-embedding model, where should I insert it in your code?

Thanks in advance.
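
One likely insertion point is the Embedding layer that encodes the partial captions in the model-building code. Below is a minimal sketch of loading GloVe vectors into a frozen Keras Embedding layer; the GloVe file name is the standard Stanford download, word_index is a hypothetical stand-in for the repo's word-to-id mapping, and the vocabulary size and caption length are taken from the training logs quoted later on this page.

import numpy as np
from keras.layers import Embedding

EMBEDDING_DIM = 300   # must match the dimensionality of the pretrained vectors
vocab_size = 8256     # from the Flickr8k run logged below
max_caption_len = 40  # from the Flickr8k run logged below

word_index = {'dog': 1, 'runs': 2}  # hypothetical: the repo's word-to-id mapping

# Load GloVe vectors into a dict: word -> 300-d float vector.
embeddings = {}
with open('glove.6B.300d.txt') as f:
    for line in f:
        values = line.split()
        embeddings[values[0]] = np.asarray(values[1:], dtype='float32')

# Rows for words missing from GloVe stay zero.
embedding_matrix = np.zeros((vocab_size, EMBEDDING_DIM))
for word, i in word_index.items():
    vec = embeddings.get(word)
    if vec is not None:
        embedding_matrix[i] = vec

# Swap this in for the randomly initialised Embedding layer; trainable=False
# keeps the pretrained vectors fixed during training.
embedding_layer = Embedding(vocab_size, EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=max_caption_len,
                            trainable=False)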

Missing InceptionV3

prepare_dataset.py has finished. I am trying to run train_model.py; however, it is giving me the following error:

ImportError: No module named inception_v3

I thought the inception_v3 module was included in TensorFlow?
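
InceptionV3 ships with Keras under keras.applications rather than with TensorFlow itself; assuming the script does a bare `from inception_v3 import ...` (which only works when a standalone inception_v3.py sits next to it), a likely fix is:

from keras.applications.inception_v3 import InceptionV3, preprocess_input

# Downloads the ImageNet weights on first use.
model = InceptionV3(weights='imagenet')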

Generating Captions on non-Flickr8k images

How do you go about generating captions on images of your choice?

The testing code seems to rely on the pre-generated encoded_images.py file, which only contains encodings for Flickr8k images in this case. The prepare_dataset.py file does not seem easy to adapt to encode images of your choice into encoded_images.py.

Any help would be appreciated.
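
One workaround is to encode new images yourself with the same kind of CNN encoder and pickle the name-to-encoding dict. The sketch below is hedged: the 4096-d VGG16 'fc2' features and the output file name are assumptions, so check prepare_dataset.py for the exact layer and file name the repo actually uses.

import pickle
import numpy as np
from keras.applications.vgg16 import VGG16
from keras.applications.imagenet_utils import preprocess_input
from keras.models import Model
from keras.preprocessing import image

# Use the 4096-d fc2 activations as the image encoding (assumed).
vgg = VGG16(weights='imagenet')
encoder = Model(inputs=vgg.input, outputs=vgg.get_layer('fc2').output)

def encode_image(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return encoder.predict(x).flatten()

encodings = {'my_photo.jpg': encode_image('my_photo.jpg')}  # your own image
with open('encoded_images.p', 'wb') as f:                   # file name assumed
    pickle.dump(encodings, f)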

AttributeError: 'module' object has no attribute 'CaptionGenerator'

Hi, after running prepare_dataset.py I tried to run train_model.py, but it gives this error message:

  File "train_model.py", line 25, in <module>
    train_model(epochs=50)
  File "train_model.py", line 6, in train_model
    cg = caption_generator.CaptionGenerator()
AttributeError: 'module' object has no attribute 'CaptionGenerator'
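
This error means the imported caption_generator module has no CaptionGenerator attribute, which often points at a stale .pyc file or at a different caption_generator module shadowing the repo's on the path. A quick diagnostic sketch:

import caption_generator

# If this path is not the repo's caption_generator/caption_generator.py, the
# import is being shadowed; if CaptionGenerator is missing from the listing,
# delete stale .pyc files and re-run.
print(caption_generator.__file__)
print(dir(caption_generator))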

bad caption

I have trained the model; the loss is 1.16 and the accuracy is 0.7574. But when I use the model to generate captions, the captions are rambling and make no sense.
[image attachment]
What should I do?

Pre-trained models

Hi,

The model works well, and I would like to know if you can share a pre-trained model.

Thanks

ValueError:

You are trying to load a weight file containing 19 layers into a model with 16 layers. (prepare_dataset.py, TensorFlow backend)
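
That layer-count mismatch usually means the weight file was saved from a different architecture, for example VGG19 weights loaded into a VGG16 model. Two hedged options, assuming the VGG16 setup that appears elsewhere in these issues:

from keras.applications.vgg16 import VGG16

# Option 1: let Keras fetch weights that match the architecture exactly.
model = VGG16(weights='imagenet')

# Option 2: when loading a local file, match layers by name and skip the rest.
# model.load_weights('vgg16_weights.h5', by_name=True)  # file name assumed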

Installation

Hi, I was looking for a solution that would generate captions for pictures and came across your project. But I am unable to run the application on my laptop. Are there more detailed instructions describing, step by step, exactly what needs to be done to run the program?

A small doubt about implementation.

I have read the paper and have a small doubt: the authors just initialise the CNN with pre-trained (ImageNet) weights and don't change those weights further, right? Can you tell me your approach for the CNN? Did you initialise it with ImageNet weights and never alter them?

Thank You.
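
For reference, keeping a CNN fixed at its ImageNet initialisation in Keras means freezing its layers before compiling; a minimal sketch:

from keras.applications.vgg16 import VGG16

cnn = VGG16(weights='imagenet', include_top=False)
for layer in cnn.layers:
    layer.trainable = False  # ImageNet weights are never updated during training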

Installing tensorflow

It looks like this project is implemented in Python 2.7. I'm facing issues installing TensorFlow. Am I right about the Python version?

RuntimeError: You must compile your model before using it.

Running prepare_dataset.py is okay.
When I run train_model.py, I get the following error.
How should I change the code? Please guide me!

C:\Anaconda3\envs\caption_generator-master\python.exe D:/code/Python/caption_generator-master/caption_generator/train_model.py
Using TensorFlow backend.
Total samples : 383454
Vocabulary size: 8256
Maximum caption length: 40
Variables initialization done!
Model created!
Traceback (most recent call last):
  File "D:/Python/caption_generator-master/caption_generator/train_model.py", line 26, in <module>
    train_model(epochs=50)
  File "D:/code/Python/caption_generator-master/caption_generator/train_model.py", line 17, in train_model
    model.fit_generator(cg.data_generator(batch_size=batch_size), steps_per_epoch=cg.total_samples/batch_size, epochs=epochs, verbose=2, callbacks=callbacks_list)
  File "C:\Anaconda3\envs\caption_generator-master\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\envs\caption_generator-master\lib\site-packages\keras\engine\training.py", line 1420, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Anaconda3\envs\caption_generator-master\lib\site-packages\keras\engine\training_generator.py", line 40, in fit_generator
    model._make_train_function()
  File "C:\Anaconda3\envs\caption_generator-master\lib\site-packages\keras\engine\training.py", line 496, in _make_train_function
    raise RuntimeError('You must compile your model before using it.')
RuntimeError: You must compile your model before using it.

Process finished with exit code 1
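
Keras raises this whenever fit_generator is called on a model that was never compiled, so something in the model-building path has likely skipped the compile step. A minimal sketch of the call that has to run first, with arguments mirroring the training setup quoted elsewhere on this page:

# Must run before model.fit_generator(...).
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])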

Low accuracy on MSCOCO

Hi, I'm following your code and trying to train the network on MSCOCO.
Here is my code:

# Imports implied by the snippet (Keras 1-style Merge layer):
import os
import pickle
import random
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Embedding, RepeatVector, LSTM, TimeDistributed, Activation, Merge
from keras.optimizers import SGD, RMSprop
from keras.callbacks import ModelCheckpoint
from keras.preprocessing import sequence

# path, Embedding_dim, step_size and v_step_size are defined elsewhere
# in the original script.

class Caption_Model:
    def __init__(self, char_to_int, int_to_char, vocab_size=26688, max_caption_len=20, folder_path=path, epochs=10, batch_size=64):
        self.img_model = Sequential()
        self.text_model = Sequential()
        self.model = Sequential()
        self.vocab_size = vocab_size
        self.max_caption_len = max_caption_len
        self.folder_path = folder_path
        self.data = {}
        self.char_to_int = char_to_int
        self.int_to_char = int_to_char
        self.batch_size = batch_size
        self.epochs = epochs

    def get_image_model(self):
        self.img_model.add(Dense(Embedding_dim, input_dim=4096, activation='relu'))
        self.img_model.add(RepeatVector(self.max_caption_len + 1))
        # self.img_model.summary()
        return self.img_model

    def get_text_model(self):
        self.text_model.add(Embedding(self.vocab_size, 256, input_length=self.max_caption_len + 1))
        self.text_model.add(LSTM(512, return_sequences=True))
        # self.text_model.add(Dropout(0.2))
        self.text_model.add(TimeDistributed(Dense(Embedding_dim, activation='relu')))
        # self.text_model.summary()
        return self.text_model

    def get_caption_model(self, predict=False):
        self.get_image_model()
        self.get_text_model()
        self.model.add(Merge([self.img_model, self.text_model], mode='concat'))
        self.model.add(LSTM(1000, return_sequences=False))
        self.model.add(Dense(self.vocab_size))
        self.model.add(Activation('softmax'))
        print "Now model.model"
        sgd = SGD(lr=1e-3, decay=1e-6, momentum=0.99, nesterov=True)
        rms = RMSprop(lr=0.005)
        if predict:
            return
        else:
            # weight='/home/paperspace/Document/DeepLearning/ImageCaption/code/Models/checkpoint/weights-improvement-02-5.2473.hdf5'
            # self.model.load_weights(weight)
            self.model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

    def load_data(self, set_type='train'):
        data = {}
        with open(self.folder_path + set_type + '.processed_img.2.pkl') as f:
            data['imgs'] = pickle.load(f)
        with open(os.path.join(self.folder_path, 'all%spartial_sentences_0.pkl' % set_type)) as f:
            data['partial_sentences'] = pickle.load(f)
        return data

    def data_generator(self, set_type='train'):
        data = self.load_data(set_type)
        j = 0
        temp = data['partial_sentences'].keys()
        partial_sentences, images = [], []
        next_words = np.zeros((self.batch_size, self.vocab_size)).astype(float)
        count = 0
        round_count = 0
        while True:
            round_count += 1
            random.shuffle(temp)
            print "the %d round!" % round_count
            for key in temp:
                image = data['imgs'][key]
                for sen in data['partial_sentences'][key]:
                    for k in range(len(sen)):
                        count += 1
                        partial = sen[:k + 1]
                        partial_sentences.append(partial)
                        images.append(image)
                        # print "index is: ", count - 1
                        if k == len(sen) - 1:
                            next_words[count - 1][self.char_to_int['<end>']] = 1
                        else:
                            next_words[count - 1][sen[k + 1]] = 1
                        if count >= self.batch_size:
                            partial_sentences = sequence.pad_sequences(partial_sentences, maxlen=self.max_caption_len + 1, padding='post')
                            partial_sentences = np.asarray(partial_sentences)
                            images = np.asarray(images)
                            # partial_sentences = partial_sentences / float(self.vocab_size)
                            # print partial_sentences
                            count = 0
                            yield [images, partial_sentences], next_words
                            partial_sentences, images = [], []
                            next_words = np.zeros((self.batch_size, self.vocab_size)).astype(float)
            j += 1

    def train(self):
        self.get_caption_model()
        filepath = "Models/checkpoint/weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
        checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
        callbacks_list = [checkpoint]

        self.model.fit_generator(self.data_generator('train'),
                                 steps_per_epoch=step_size / self.batch_size,
                                 epochs=self.epochs,
                                 validation_data=self.data_generator('val'),
                                 validation_steps=v_step_size / self.batch_size,
                                 callbacks=callbacks_list)
        # self.model.fit_generator(self.data_generator('train'), steps_per_epoch=step_size / self.batch_size, epochs=self.epochs, callbacks=callbacks_list)

        try:
            self.model.save('Models/WholeModel.h5', overwrite=True)
            self.model.save_weights('Models/Weights.h5', overwrite=True)
        except:
            print "Error in saving model."
        print "After training model...\n"

Accuracy stays at about 35% in the end, and the training loss is about 3.xxx.
I just cannot figure out what's wrong with the code.
Could you please offer some help?
Thank you so much!

Running caption_generator on Google Cloud Engine

I am attempting to get this up and running on a Google Compute Engine (GCE) VM, Debian 4.9.51-1 x86_64. I ran sudo pip install -r requirements.txt, and everything installed correctly.

I then attempt to run python caption_generator/prepare_dataset.py. This outputs the following error:

Using TensorFlow backend.
RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa
RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa
Traceback (most recent call last):
  File "caption_generator/prepare_dataset.py", line 2, in <module>
    from keras.preprocessing import image
  File "/usr/local/lib/python2.7/dist-packages/keras/__init__.py", line 3, in <module>
    from . import activations
  File "/usr/local/lib/python2.7/dist-packages/keras/activations.py", line 4, in <module>
    from .utils.generic_utils import deserialize_keras_object
  File "/usr/local/lib/python2.7/dist-packages/keras/utils/__init__.py", line 6, in <module>
    from . import io_utils
  File "/usr/local/lib/python2.7/dist-packages/keras/utils/io_utils.py", line 10, in <module>
    import h5py
  File "/usr/local/lib/python2.7/dist-packages/h5py/__init__.py", line 31, in <module>
    from .highlevel import *
  File "/usr/local/lib/python2.7/dist-packages/h5py/highlevel.py", line 13, in <module>
    from ._hl.base import is_hdf5, HLObject
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/base.py", line 78, in <module>
    dlapl = default_lapl()
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/base.py", line 65, in default_lapl
    lapl = h5p.create(h5p.LINK_ACCESS)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5p.pyx", line 131, in h5py.h5p.create
  File "h5py/h5p.pyx", line 72, in h5py.h5p.propwrap
ValueError: Not a property list class (Not a property list class)

Any help would be much appreciated.
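
For what it's worth, the two "module compiled against API version 0xb but this version of numpy is 0xa" lines mean h5py was built against a newer numpy C API than the numpy that is installed, so a hedged first step is simply upgrading numpy (e.g. sudo pip install --upgrade numpy) and re-running; the final h5py ValueError is typically a downstream symptom of the same mismatch.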

Network does not converge, bad captions

Hello,

I've followed your instructions and started training the network. The loss reaches its minimum value after about 5 epochs and then it starts to diverge again.

After 50 epochs, the generated captions of the best epoch (5th or 6th) look like this:

Predicting for image: 992
2351479551_e8820a1ff3.jpg : exercise lamb Fourth headphones facing pasta soft her soft her soft her soft her soft her dads college soft her dads college soft her her her her her soft her her her her her soft her her her her
Predicting for image: 993
3514179514_cbc3371b92.jpg : fist graffitti soft her soft her Hollywood Fourth Crowd soft her her soft her her her her her soft her her her her her her soft her her her her soft her her her her soft her her her
Predicting for image: 994
1119015538_e8e796281e.jpg : closeout security soft her soft her security fall soft her her her her her fall soft her her her her her her soft her her her her her soft her her her her soft her her her her her
Predicting for image: 995
3727752439_907795603b.jpg : roots college Fourth tree-filled o swing-set places soft her soft her her soft her her soft her her college soft her her her her her her her soft her her her her soft her her her her her her

Any idea what's wrong?

I couldn't get the weights-improvement-48.hdf5

After training, I noticed that only the following weight files were generated:
weights-improvement-01.hdf5
weights-improvement-02.hdf5
weights-improvement-03.hdf5
weights-improvement-04.hdf5
Apart from those listed above, no other .hdf5 files were generated. Can anyone tell me what the problem is?
I have checked that the path to my training dataset is right and that the number of epochs is 50.
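
If the ModelCheckpoint callback is created with save_best_only=True, as in the training code quoted elsewhere on this page, this is expected behaviour: a file is written only for epochs where the monitored loss improves, so a 50-epoch run can stop producing new files after epoch 04. For reference:

from keras.callbacks import ModelCheckpoint

# With save_best_only=True, a new .hdf5 appears only when `monitor` improves,
# not once per epoch.
checkpoint = ModelCheckpoint('weights-improvement-{epoch:02d}.hdf5',
                             monitor='loss', verbose=1,
                             save_best_only=True, mode='min')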

Running on Mac OS X (10.12.6)

I would like to get this running on my local machine before pushing it up to Google Compute Engine.

I set up a new virtual environment and cloned the repo. I then ran pip install -r requirements.txt, and everything installed correctly.

I then tried running python caption_generator/prepare_dataset.py and got the following error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "prepare_dataset.py", line 5, in <module>
    from imagenet_utils import preprocess_input
ImportError: No module named imagenet_utils

I was under the impression that imagenet_utils was included in Keras?
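
It is, but under keras.applications in Keras 2; a bare `from imagenet_utils import ...` only worked when a standalone imagenet_utils.py sat next to the script. A likely fix:

from keras.applications.imagenet_utils import preprocess_input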

Module Error

Which version of Keras is used? I'm getting an error saying "'module' object is not callable".

Awful Captions Generated when Testing

After finally getting it to work (with basically no changes to the code) and letting it run for 25 epochs, I tried running the test_model.py script. This produces really bad results. I changed the weights file in the script to use the most recent weights file generated, which for some reason was weights-improvement-03.hdf5. During training, the accuracy was not increasing and the loss was increasing.

Here are some of the captions generated when I use that weights file:

Predicting for image: 0
3385593926_d3e9c21170.jpg : A black and white dog jumps in the grass .
Predicting for image: 1
2677656448_6b7e7702af.jpg : A black and white dog is running in the grass .
Predicting for image: 2
311146855_0b65fdb169.jpg : A man wearing a black shirt and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and blue hat and blue shirt and black shirt is people
Predicting for image: 3
1258913059_07c613f7ff.jpg : A man in a black shirt is sitting in a mountain .
Predicting for image: 4
241347760_d44c8d3a01.jpg : A man in a red shirt is playing in the field .
Predicting for image: 5
2654514044_a70a6e2c21.jpg : A black and white dog is running on a field .
Predicting for image: 6
2339106348_2df90aa6a9.jpg : A man wearing a black shirt and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and blue hair and white shirt
Predicting for image: 7
256085101_2c2617c5d0.jpg : A black and white dog jumps in the grass .
Predicting for image: 8
280706862_14c30d734a.jpg : A black and white dog running in the snow .
Predicting for image: 9
3072172967_630e9c69d0.jpg : A man in a red shirt is sitting in the background .
Predicting for image: 10
3482062809_3b694322c4.jpg : A man wearing a black shirt and blue hair and blue shorts and blue shorts and blue shorts and blue shirt and blue hair and blue shirt and blue shirt and white shirt and black shirt is people

Can you please share a copy of your 'weights-improvement-XX.hdf5'?

I find that training takes a lot of time. Can you share a copy of your model weights?
We would like to check whether we have deployed your code successfully.
So far we have run only 1 epoch, and the test shows that not a single caption is produced.
Is this normal? Will more epochs bring reasonable output instead of empty output?
Thank you for your cool work.

Value Error

ValueError: decode_predictions expects a batch of predictions (i.e. a 2D array of shape (samples, 1000)). Found array with shape: (1, 14, 14, 512)

When I ran:
model = VGG16(include_top=False, weights='imagenet')
img_path = 'Elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
print('Input image shape:', x.shape)
preds = model.predict(x)
print('Predicted:', decode_predictions(preds))

I got this error. Can you help me fix it, please?
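
decode_predictions maps a (samples, 1000) array of ImageNet class probabilities to labels, so it needs the classification head that include_top=False removes; without the head, VGG16 returns a convolutional feature map, which is exactly the shape in the error. A minimal fix, keeping the rest of the script unchanged:

from keras.applications.vgg16 import VGG16, decode_predictions

# include_top=True keeps the 1000-way softmax that decode_predictions expects.
model = VGG16(weights='imagenet', include_top=True)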

Why don't you validate during training?

Hello, I was wondering why you don't add a validation generator during training. How do you check that the model isn't overfitting too much and is able to generalize?
Thanks in advance.
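
For reference, fit_generator accepts a validation generator directly; a hedged sketch of adding one to the training call quoted elsewhere on this page, where val_generator and val_samples are hypothetical names:

model.fit_generator(cg.data_generator(batch_size=batch_size),
                    steps_per_epoch=cg.total_samples / batch_size,
                    epochs=epochs,
                    validation_data=val_generator,            # hypothetical
                    validation_steps=val_samples / batch_size,
                    callbacks=callbacks_list)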

Final Model

Hey would it be possible for you to upload the final model?

Thanks,
Rohin

caption_generator.py : How do you break out of "while 1"

In the file caption_generator.py, in the function "data_generator" at line 81, there is a

while 1:

During training I only see thousands of executions of the statement

print "yielding count: "+str(gen_count)

which is within this loop.

Since I don't see any exit condition or break within this loop, I am wondering when and how we break out of it.

Thanks
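
You don't: fit_generator expects a generator that yields batches forever and bounds each epoch by steps_per_epoch rather than by the generator ending, so the loop is intentionally infinite and is simply abandoned when training finishes. A minimal sketch of that contract:

def data_generator(samples, batch_size):
    while 1:  # intentionally never exits
        for start in range(0, len(samples), batch_size):
            # fit_generator pulls exactly steps_per_epoch batches per epoch
            # and then stops asking; no break is ever needed.
            yield samples[start:start + batch_size]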

GPU not working

Hi!
When I ran train_model.py, Keras did not use the GPU automatically. However, when I ran MNIST code, Keras used the GPU automatically. I can't tell why. Can somebody enlighten me, please?

Thanks
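
One way to narrow this down is to ask TensorFlow which devices it can see; if no GPU entry appears, a common culprit on TF 1.x is having the CPU-only tensorflow package installed instead of tensorflow-gpu:

from tensorflow.python.client import device_lib

# Lists something like '/device:GPU:0' when TensorFlow can use the GPU.
print([d.name for d in device_lib.list_local_devices()])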

raise RuntimeError('You must compile your model before using it.')

Hey, when I try to train the model I am getting the following error:

Using TensorFlow backend.
413439
WARNING:tensorflow:From C:\Users\UserName\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Model created!
Traceback (most recent call last):
  File "D:\Image-caption\Image-Captioning-master\train.py", line 14, in <module>
    train(int(sys.argv[1]))
  File "D:\Image-caption\Image-Captioning-master\train.py", line 9, in train
    model.fit_generator(sd.data_process(batch_size=batch_size), steps_per_epoch=sd.no_samples/batch_size, epochs=epoch, verbose=2, callbacks=None)
  File "C:\Users\UserName\Anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\UserName\Anaconda3\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\UserName\Anaconda3\lib\site-packages\keras\engine\training_generator.py", line 40, in fit_generator
    model._make_train_function()
  File "C:\Users\UserName\Anaconda3\lib\site-packages\keras\engine\training.py", line 496, in _make_train_function
    raise RuntimeError('You must compile your model before using it.')
RuntimeError: You must compile your model before using it.

I am trying to solve it but having no luck. I have been stuck on this problem for two weeks. Please help me out here.
