

keras-language-modeling

Some code for doing language modeling with Keras, in particular for question-answering tasks. I wrote a very long blog post that explains how a lot of this works, which can be found here.

Stuff that might be of interest

  • attention_lstm.py: An attentional LSTM, based on one of the papers referenced in the blog post (among others); one application used it for image captioning. It is initialized with an attention vector, which provides the attention component for the network.
  • insurance_qa_eval.py: Evaluation framework for the InsuranceQA dataset. To get this working, clone the data repository and set the INSURANCE_QA environment variable to the cloned repository. Changing config will adjust how the model is trained.
  • keras-language-model.py: The LanguageModel class uses the config settings to generate a training model and a testing model. The model can be trained by passing a question vector, a ground-truth answer vector, and a bad answer vector to fit; then predict calculates the similarity between a question and an answer. Override the build method with whatever language model you want to get a trainable model. Examples are provided at the bottom, including EmbeddingModel, ConvolutionModel, and RecurrentModel. A usage sketch follows after this list.
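
A hedged usage sketch of that flow (the config keys, shapes, and exact signatures below are illustrative assumptions; the real API lives in keras-language-model.py):

import numpy as np

# toy integer-encoded text: (batch, maxlen) index arrays
maxlen, vocab, batch = 100, 5000, 32
question = np.random.randint(0, vocab, (batch, maxlen))
good_answer = np.random.randint(0, vocab, (batch, maxlen))
bad_answer = np.random.randint(0, vocab, (batch, maxlen))

config = {'n_words': vocab, 'question_len': maxlen, 'answer_len': maxlen}  # hypothetical keys
model = EmbeddingModel(config)                # one of the provided subclasses
model.compile(optimizer='adam')
model.fit(question, good_answer, bad_answer)  # learn to rank good above bad
sims = model.predict(question, good_answer)   # similarity score per pair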

Getting Started

# Install Keras (may also need dependencies)
git clone https://github.com/fchollet/keras
cd keras
sudo python setup.py install

# Clone InsuranceQA dataset
git clone https://github.com/codekansas/insurance_qa_python
export INSURANCE_QA=$(pwd)/insurance_qa_python

# Run insurance_qa_eval.py
git clone https://github.com/codekansas/keras-language-modeling
cd keras-language-modeling/
python insurance_qa_eval.py

Alternatively, I wrote a script to get started on a Google Cloud Platform instance (Ubuntu 16.04), which can be run via:

cd ~
git clone https://github.com/codekansas/keras-language-modeling
cd keras-language-modeling
source install.py

I've been working on making these models available out of the box. You need to install the development (Git) version of Keras (and maybe make some modifications) in order to run some of these models; the Keras project can be found here.

The runnable program is insurance_qa_eval.py. This will create a models/ directory which stores a history of the model's weights as training proceeds. You also need to set the INSURANCE_QA environment variable to tell it where the InsuranceQA dataset is.

Finally, my setup (which I think is pretty common) is to have an SSD for my operating system and an HDD for larger data files. If you have a similar setup, I'd recommend symlinking models/ from the project directory to somewhere on your HDD.

Serving to a port

I added a command-line argument that uses Flask to serve on a port. Once you've installed Flask, you can run:

python insurance_qa_eval.py serve

This is useful in combination with ngrok for monitoring training progress away from your desktop.
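
The repository's actual route isn't documented here, so as a minimal sketch of the idea (endpoint name, port, and payload are assumptions), serving live training stats with Flask looks roughly like this:

from flask import Flask, jsonify

app = Flask(__name__)
loss_history = []  # e.g. appended to from the training loop

@app.route('/status')
def status():
    # report progress so it can be polled remotely
    return jsonify(epochs=len(loss_history), losses=loss_history)

if __name__ == '__main__':
    app.run(port=5000)

Running ngrok http 5000 alongside then exposes the status page beyond your desktop.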

Additionally

  • The official implementation can be found here

Data

keras-language-modeling's People

Contributors

codekansas · eshijia · nigeljyng


keras-language-modeling's Issues

Evaluation Result Correct?

To save time, I set nb_epoch to 2, and the program only displays 1 epoch. I chose that epoch and evaluated it against the test sets. The top-1 precision figures seem to be 1/10 of what the paper claims? Or do I misunderstand something?

Epoch 1/1
14832/14832 [==============================] - 236s - loss: 0.0297 - val_loss: 0.0154
Best: Loss = 0.0154112447405, Epoch = 1
2016-06-14 08:22:54 :: ----- test1 -----
[====================]Top-1 Precision: 0.049444
MRR: 0.131885
2016-06-14 08:46:11 :: ----- test2 -----
[====================]Top-1 Precision: 0.040000
MRR: 0.124294
2016-06-14 09:09:09 :: ----- dev -----
[====================]Top-1 Precision: 0.053000
MRR: 0.128266

Lambda does not support masking

Hello,

Thanks for sharing your experience on this subject.

I've got an issue running insurance_qa_eval.py. Here's the full stack trace:
Traceback (most recent call last):
  File "insurance_qa_eval.py", line 274, in <module>
    model.compile(optimizer=optimizer)
  File "/home/myuser/tests/insurance_qna/keras-language-modeling/keras_models.py", line 114, in compile
    qa_model = self.get_qa_model()
  File "/home/myuser/tests/insurance_qna/keras-language-modeling/keras_models.py", line 101, in get_qa_model
    self._models = self.build()
  File "/home/myuser/tests/insurance_qna/keras-language-modeling/keras_models.py", line 279, in build
    question_pool = merge([maxpool(question_f_dropout), maxpool(question_b_dropout)], mode='concat', concat_axis=-1)
  File "/usr/local/lib/python3.4/dist-packages/keras/engine/topology.py", line 485, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python3.4/dist-packages/keras/engine/topology.py", line 543, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python3.4/dist-packages/keras/engine/topology.py", line 149, in create_node
    output_masks = to_list(outbound_layer.compute_mask(input_tensors[0], input_masks[0]))
  File "/usr/local/lib/python3.4/dist-packages/keras/engine/topology.py", line 578, in compute_mask
    'but was passed an input_mask: ' + str(input_mask))
Exception: Layer lambda_1 does not support masking, but was passed an input_mask: Elemwise{neq,no_inplace}.0

I'm using

  • Keras 1.0.2
  • Theano 0.8.2

Thanks

internal values (question/answer representations) are always float64

I augmented the model to output representations of questions/answers. However, when I call predict to get the question vectors, I found that the model's internal values (e.g., question_out, answer_out) always have dtype='float64'.
I changed floatX to 'float32' in both theanorc.txt and ~/.keras/keras.json, but it doesn't work.
Is there any way to set the output representations to 'float32'?
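
A hedged checklist sketch (these are standard Keras/Theano settings, nothing repo-specific): force the backend float type in code, and make sure every numpy array fed to predict is itself float32, since a float64 input can upcast the whole graph.

import numpy as np
from keras import backend as K

K.set_floatx('float32')  # same effect as the "floatx" key in ~/.keras/keras.json

question = np.zeros((1, 100))          # numpy defaults to float64
question = question.astype('float32')  # cast before model.predict(...)
print(K.floatx(), question.dtype)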

Example script for AttentionLSTM

I am having a bit of trouble understanding how to incorporate the AttentionLSTM layer into my code. In your blog you said that "The attentional component can be tacked onto the LSTM code that already exists." But unlike a standard LSTM, this custom layer requires a second parameter, the attention vector. As such, I tried the following code to build my model:

seq_len, input_dims, output_dims = 200, 4096, 512
input_seq = Input(shape=(seq_len, input_dims,), dtype='float32')
attn = AttentionLSTM(output_dims, input_seq)(input_seq)  
model = Model(input=input_seq, output=attn)

However, I get the following error: ValueError: Dimensions 4096 and 200 are not compatible.

My main trouble is understanding what attention vector should be passed, according to your class specification. Conceptually, from the Show, Attend and Tell paper, I know that the attention vector should be each of the 1x4096 vectors. But I can't figure out how to pass that into the AttentionLSTM layer.

It would be very helpful if you could provide a gist or example script demonstrating how to use the AttentionLSTM layer, just like you did with the different RNNs in your blog post!
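
A hedged guess at the intended wiring (untested; it assumes AttentionLSTM expects a 2D (batch, attn_dims) tensor as its attention argument rather than the 3D sequence that was passed above):

from keras.layers import Input
from keras.models import Model
from attention_lstm import AttentionLSTM

seq_len, input_dims, attn_dims, output_dims = 200, 4096, 4096, 512

input_seq = Input(shape=(seq_len, input_dims), dtype='float32')
attn_vec = Input(shape=(attn_dims,), dtype='float32')   # one attention vector per sample

attn = AttentionLSTM(output_dims, attn_vec)(input_seq)  # attention passed separately
model = Model(input=[input_seq, attn_vec], output=attn)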

Keras 2 compatibility

I struggled a lot trying to make the code work under Keras 2. Could you provide an updated version?

Thanks much!

Sigmoid in AttentionLSTM

I noticed that you run the attention through a sigmoid because you were having numerical problems:

https://github.com/codekansas/keras-language-modeling/blob/master/attention_lstm.py#L54

This may work, but I think it should actually be a softmax. In the paper you cite, it only says that the activation should be proportional to

exp(dot(m, U_s))

In another paper [1], they explicitly say it should be

softmax(exp(dot(m, U_s)))

[1] https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf
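
A runnable sketch of the two choices in Keras backend notation (m and U_s here are random stand-ins for the corresponding weights in attention_lstm.py):

import numpy as np
from keras import backend as K

m = K.variable(np.random.randn(2, 8))        # (batch, units)
U_s = K.variable(np.random.randn(8, 8))

sigmoid_attn = K.sigmoid(K.dot(m, U_s))      # current code: independent gates in (0, 1)
softmax_attn = K.softmax(K.dot(m, U_s))      # proposed: weights that sum to 1 per row
print(K.eval(K.sum(softmax_attn, axis=-1)))  # ~[1. 1.]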

TypeError: 'NoneType' object is not iterable

Train on 16686 samples, validate on 1854 samples
Epoch 1/1
16686/16686 [==============================] - 1s - loss: 0.0060 - val_loss: 0.0340
Fitting epoch 2000

2016-10-28 07:51:05 -- Epoch 1999 Loss = 0.0060, Validation Loss = 0.0340 (Best: Loss = 0.0102, Epoch = 152)
Train on 16686 samples, validate on 1854 samples
Epoch 1/1
16686/16686 [==============================] - 1s - loss: 0.0061 - val_loss: 0.0337
2016-10-28 07:51:07 -- Epoch 2000 Loss = 0.0061, Validation Loss = 0.0337 (Best: Loss = 0.0102, Epoch = 152)
----- test1 -----
Top-1 Precision: 0.117778
MRR: 0.207952
----- test2 -----
Top-1 Precision: 0.121111
MRR: 0.212116
----- dev -----
Top-1 Precision: 0.129000
MRR: 0.216403
Traceback (most recent call last):
  File "insurance_qa_eval.py", line 262, in <module>
    top1, mrr = evaluator.get_score(verbose=False)
TypeError: 'NoneType' object is not iterable
rzai@rzai00:~/prj/keras-language-modeling$

Some questions about the attention model!

Hi @codekansas, thanks for sharing the project. I have read the main file applying the attention model to question answering, and the paper you suggested. However, I have some questions, taking insuranceqa.py as an example:

1. Is the input of your data the word-frequency index? Would you give me an actual example of your input data?

q_data, ag_data, ab_data, targets = get_data(data_sets[0])

2. What is the "mrr" in your code? An evaluation measure?

def get_mrr(model, questions, all_answers, n_good, n_eval=-1):

3. Why are the targets all zeros?

def get_data(f_name):
    ....
    targets += [0] * len(bad_answers)

4. I see you load the weights of the test model, as follows:

test_model.load_weights(os.path.join(models_path, 'iqa_model_for_training_iter_900.h5'))

Where are the weights of the test model? Can the weights of the trained model be transferred to the test model?

5. Similar to my task (figuring out how similar two tweets are), I only need the similarity of each tweet pair. But I have to set a threshold value to determine whether two tweets are from the same person (if the similarity is greater than the threshold, the pair is from the same person; otherwise it is not). What do you think the threshold should be set to? Or is there another way to use the similarity to decide whether two tweets are from the same person?
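
One common recipe, offered as an assumption rather than anything from this repo: tune the threshold on a labeled development set by picking the value that maximizes accuracy.

import numpy as np

def best_threshold(sims, labels):
    # sims: similarity score per pair; labels: 1 = same person, 0 = not
    candidates = np.unique(sims)
    accuracies = [np.mean((sims >= t).astype(int) == labels) for t in candidates]
    return candidates[int(np.argmax(accuracies))]

sims = np.array([0.1, 0.4, 0.35, 0.8])
labels = np.array([0, 0, 1, 1])
print(best_threshold(sims, labels))  # 0.35 for this toy data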

data_path is hardcoded in several places

Having this be set in just one location would simplify configuration for new users.

I suggest an environment variable, DATA_PATH, and will submit a PR for that change if you're interested.
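
A minimal sketch of the proposed change (DATA_PATH is the suggested name, not a variable the scripts currently read):

import os

# read once, fall back to a sensible default, use everywhere
DATA_PATH = os.environ.get('DATA_PATH', os.path.expanduser('~/data'))
answers_file = os.path.join(DATA_PATH, 'insurance_qa_python', 'answers')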

Output shape of the similarity merge layer is still incorrect

I am using the latest version and checked the similarity part of the model, i.e.,
qa_model = merge([question_output, answer_output], mode=similarity, output_shape=lambda x: x[:-1],name='similarity')
When I invoke qa_model.summary(), here is the screen output:
[screenshot: qa_model.summary() output for the insuranceqa similarity model]

The last layer (similarity) should have an output shape of (None, 1), but it is (None, 200). When I use Keras's default 'cos' mode, it is (None, 1). Also, the program doesn't work: the loss is NaN.
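
A hedged sketch of one possible fix: when x is the list of input shape tuples, lambda x: x[:-1] drops the last tuple from the list rather than the last axis of a shape, so the declared output shape never collapses to a single unit. Declaring (batch, 1) explicitly matches what the built-in 'cos' mode reports:

qa_model = merge(
    [question_output, answer_output],
    mode=similarity,
    output_shape=lambda x: (x[0][0], 1),  # (None, 1): one score per pair
    name='similarity')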

Bootstrapping issue: No clear path to reproduce results

Thanks for sharing this experiment.

I'm trying to get it working to reproduce your results, but it seems like there's a bootstrapping problem.

  • Running either script produces an error that some required resource doesn't exist in models/.
  • Assuming insurance_qa_eval.py is the top-level script, I've made some modifications to uncomment the "save embeddings" portion of the script, but I'm still waiting for it to finish running.
  • It also looks like it will next need to invoke the __main__ block in insurance_qa_embeddings.py to produce models/word2vec_100_dim.h5 in order to finish bootstrapping.

Is that the right approach for getting this running? If so, I'll open a PR when I've got it all working.

The incorporation of attention in attention_lstm.py

In the blog post, and in the related literature on attentional LSTMs, attention is incorporated like

attention_state = tanh(dot(attention_vec, W_attn) + dot(new_hidden_state, U_attn))

However, in attention_lstm.py it is incorporated like:

attention_state = tanh(dot(attention_vec, W_attn) * dot(new_hidden_state, U_attn))

Is this a typo, or did you find it a better way of incorporating attention?
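
A runnable side-by-side of the two forms in Keras backend notation (all tensors here are random stand-ins for the weights in attention_lstm.py):

import numpy as np
from keras import backend as K

attention_vec = K.variable(np.random.randn(2, 8))
new_hidden_state = K.variable(np.random.randn(2, 8))
W_attn = K.variable(np.random.randn(8, 8))
U_attn = K.variable(np.random.randn(8, 8))

additive = K.tanh(K.dot(attention_vec, W_attn) + K.dot(new_hidden_state, U_attn))        # literature
multiplicative = K.tanh(K.dot(attention_vec, W_attn) * K.dot(new_hidden_state, U_attn))  # as in the file
print(K.eval(additive).shape, K.eval(multiplicative).shape)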

Complaint about no shape when using models saved from Gensim

Hi, I am trying your work on different datasets, such as WikiQA. So I think I need to retrain the word2vec weights and change the line 'initial_embed_weights': np.load('word2vec_100_dim.embeddings'). But when I save the model I get from Gensim and put its name inside the np.load() brackets, I get an error message saying that my model file has no shape. Am I saving and using the wrong file? I did a model.save in Gensim.
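
A hedged sketch of a likely cause and fix: Gensim's model.save pickles the whole model object, which np.load cannot read as an array, hence the missing shape. Exporting a plain (vocab, dim) numpy array first should work (file names and the index map are illustrative; Gensim attribute names vary slightly between versions):

import numpy as np
from gensim.models import Word2Vec

sentences = [['insurance', 'covers', 'loss'], ['claims', 'need', 'evidence']]
w2v = Word2Vec(sentences, size=100, min_count=1)  # or Word2Vec.load(...) for a saved model

word2idx = {w: i for i, w in enumerate(w2v.index2word)}  # must match your corpus's token indices
weights = np.zeros((len(word2idx), 100), dtype='float32')
for word, idx in word2idx.items():
    weights[idx] = w2v[word]

np.save('word2vec_100_dim.embeddings', weights)  # note: np.save appends .npy to the name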

Sample code in your blog not working, giving TypeError: Cannot convert Type TensorType

I came across your tutorial (http://benjaminbolte.com/blog/2016/keras-language-modeling.html) on the web, and it's pretty simple and easy to understand. As a beginner, I tried out your first sample code with Keras (version 1.2.0) and the Theano backend:

import itertools
import numpy as np

sentences = '''
sam is red
hannah not red
hannah is green
bob is green
bob not red
sam not green
sarah is red
sarah not green'''.strip().split('\n')
is_green = np.asarray([[0, 1, 1, 1, 1, 0, 0, 0]], dtype='int32').T

lemma = lambda x: x.strip().lower().split(' ')
sentences_lemmatized = [lemma(sentence) for sentence in sentences]
words = set(itertools.chain(*sentences_lemmatized))
# set(['boy', 'fed', 'ate', 'cat', 'kicked', 'hat'])

# dictionaries for converting words to integers and vice versa
word2idx = dict((v, i) for i, v in enumerate(words))
idx2word = list(words)

# convert the sentences to a numpy array
to_idx = lambda x: [word2idx[word] for word in x]
sentences_idx = [to_idx(sentence) for sentence in sentences_lemmatized]
sentences_array = np.asarray(sentences_idx, dtype='int32')

# parameters for the model
sentence_maxlen = 3
n_words = len(words)
n_embed_dims = 3

# put together a model to predict 
from keras.layers import Input, Embedding, merge, Flatten, SimpleRNN
from keras.models import Model

input_sentence = Input(shape=(sentence_maxlen,), dtype='int32')
input_embedding = Embedding(n_words, n_embed_dims)(input_sentence)
color_prediction = SimpleRNN(1)(input_embedding)

predict_green = Model(input=[input_sentence], output=[color_prediction])
predict_green.compile(optimizer='sgd', loss='binary_crossentropy')

# fit the model to predict what color each person is
predict_green.fit([sentences_array], [is_green], nb_epoch=5000, verbose=1)
embeddings = predict_green.layers[1].W.get_value()

# print out the embedding vector associated with each word
for i in range(n_words):
	print('{}: {}'.format(idx2word[i], embeddings[i]))

But at this line

# fit the model to predict what color each person is
predict_green.fit([sentences_array], [is_green], nb_epoch=5000, verbose=1)
embeddings = predict_green.layers[1].W.get_value()

I am getting this error

TypeError: Cannot convert Type TensorType(float32, 3D) (of Variable Subtensor{:int64:}.0) into Type TensorType(float32, (False, False, True)). You can try to manually convert Subtensor{:int64:}.0 into a TensorType(float32, (False, False, True)).

Making evaluation after training work

Hi, I have no problem getting evaluation during training to work by setting the evaluation mode to "all" in config. But my supervisor suggests that:

  1. I should not see the evaluation result during training
  2. I should not use any of the test sets during development.

To achieve this:

  1. I have to put back the commented-out lines for evaluation:

evaluator.load_epoch(model, 54)
evaluator.get_mrr(model, evaluate_all=True)

  2. I also need to change the evaluation set used during training to dev:

#self._eval_sets = dict([(s, self.load(s)) for s in ['dev', 'test1', 'test2']])
self._eval_sets = dict([('dev', self.load('dev'))])

  3. I also need to set the evaluation set to test1 and/or test2 when doing the evaluation in step 1. I tried:

evaluator.eval_sets = dict([('dev', evaluator.load('test1'))])
evaluator.load_epoch(model, 54)
evaluator.get_mrr(model, evaluate_all=True)

But it gives me the error:

TypeError: 'dict' object is not callable

What should I do? Thanks in advance.
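
A hedged guess at the cause: if the evaluator looks its sets up via a method call like self.eval_sets(), then assigning a dict onto evaluator.eval_sets shadows that method, and the next internal call tries to call a dict. Setting the underlying attribute instead may avoid the error:

evaluator._eval_sets = dict([('test1', evaluator.load('test1'))])
evaluator.load_epoch(model, 54)
evaluator.get_mrr(model, evaluate_all=True)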

Can you be a bit more specific about using insurance_qa_embeddings.py?

Hi, thanks for your great work and support. I have got insurance_qa_evaluation.py running with Keras's own embedding layer. How do I bridge insurance_qa_embeddings.py to insurance_qa_evaluation.py?
I want to substitute parts of the input sentences with their synonyms or antonyms and see how training and prediction go. Any suggestions on how this can be done?

An error when saving model and weights!

json_string = tweet_model.to_json()
open(r'models\tweet_model_architecture.json', 'w', encoding = 'utf-8').write(json_string)
tweet_model.save_weights(r'models\tweet_model_weights.h5',overwrite = True)


  File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "C:\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "E:/EMNLP/attention/keras-language-modeling-master/keras-language-modeling-master/tweet_similarity.py", line 305, in <module>
    json_string = tweet_model.to_json()

  File "C:\Anaconda2\lib\site-packages\keras\engine\topology.py", line 2343, in to_json
    return json.dumps(model_config, default=get_json_type, **kwargs)

  File "C:\Anaconda2\lib\json\__init__.py", line 251, in dumps
    sort_keys=sort_keys, **kw).encode(obj)

  File "C:\Anaconda2\lib\json\encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)

  File "C:\Anaconda2\lib\json\encoder.py", line 270, in iterencode
    return _iterencode(o, 0)

UnicodeDecodeError: 'utf8' codec can't decode byte 0x88 in position 22: invalid start byte

Have you encountered this error?

incorrect predicted output shape

I checked out the code from the master branch and printed the shape of sims.

In insurance_qa_eval.py, after line 179 (sims = self.model.predict([question, answers])), I printed the shape:

print(sims.shape)

nb_epoch is set to 5. It is strange that the shape is (500, 500); I expected it to be (500, 1).

_pickle.UnpicklingError: the STRING opcode argument must be quoted

Thank you for your great work and support.
Continuing to test insurance_qa_eval.py, I encountered the error below:

Using Theano backend.
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.

C:\Users\Desktop\Deep_QA_System\data\insurance_qa_python\answers
(None, 1)
Traceback (most recent call last):
C:\Users\Desktop\Deep_QA_System\data\insurance_qa_python\train
File "C:/Users/PycharmProjects/insurance_qa_eval/insurance_qa_eval.py", line 272, in
best_loss = evaluator.train(model)
File "C:/Users/PycharmProjects/insurance_qa_eval/insurance_qa_eval.py", line 98, in train
training_set = self.load('train')
File "C:/Users/PycharmProjects/insurance_qa_eval/insurance_qa_eval.py", line 40, in load
return pickle.load(open(os.path.join(self.path, name), 'rb'))
_pickle.UnpicklingError: the STRING opcode argument must be quoted
Process finished with exit code 1

I suspect the error happens in the load function (line 36):

def load(self, name):
    return pickle.load(open(os.path.join(self.path, name), 'rb'))

So I added print calls to find the cause. It looks like 'answers' works and its data can be printed out, but 'train' fails.

As for the "(None, 1)" output, I do not know where it came from.

P.S. On line 245 of your code, 'initial_embed_weights': np.load('models/word2vec_100_dim.h5'): by word2vec_100_dim.h5, do you mean the word2vec_100_dim.embeddings file that is on GitHub?

Any ideas for getting the performance to 70%?

Hi, I mean without doing anything that is not in the dos Santos 2016 paper.

I am mentioning 70% because that is what the authors of that paper reported using LSTM + attention on the InsuranceQA data. I get 40-something, like codekansas. Can I be confident in blaming dos Santos for faking the result?

Not able to replicate results with Embedding + maxpooling

I could not replicate the results with the embedding + max-pooling layer. I ran 3000 epochs. Could you please suggest your parameters for producing the following results?

Embedding + Max Pooling:

  • Top 1 Precision:
    • 0.492 on test 1
    • 0.483 on test 2
    • 0.495 on dev
  • MRR:
    • 0.624 on test 1
    • 0.611 on test 2
    • 0.624 on dev

models

When running insurance_qa_eval.py, I get:

  File "/Users/heri/anaconda/lib/python2.7/site-packages/numpy/lib/npyio.py", line 362, in load
    fid = open(file, "rb")
IOError: [Errno 2] No such file or directory: 'models/word2vec_100_dim.h5'

Exception: Layer lambda_1 does not support masking, but was passed an input_mask: Elemwise{neq,no_inplace}.0

Today I tried insurance_qa_eval.py and got the exception below:

File "C:\Users\Deep_QA_System\data\insurance_qa_eval.py", line 261, in
model.compile(optimizer=optimizer)
File "C:\Python35\lib\site-packages\keras\keras_models.py", line 111, in compile
qa_model = self.get_qa_model()
File "C:\Python35\lib\site-packages\keras\keras_models.py", line 96, in get_qa_model
self._models = self.build()
File "C:\Python35\lib\site-packages\keras\keras_models.py", line 247, in build
question_pool = merge([maxpool(question_f_rnn), maxpool(question_b_rnn)], mode='concat', concat_axis=-1)
File "C:\Python35\lib\site-packages\keras\engine\topology.py", line 485, in call
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "C:\Python35\lib\site-packages\keras\engine\topology.py", line 543, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "C:\Python35\lib\site-packages\keras\engine\topology.py", line 149, in create_node
output_masks = to_list(outbound_layer.compute_mask(input_tensors[0], input_masks[0]))
File "C:\Python35\lib\site-packages\keras\engine\topology.py", line 578, in compute_mask
'but was passed an input_mask: ' + str(input_mask))
Exception: Layer lambda_1 does not support masking, but was passed an input_mask: Elemwise{neq,no_inplace}.0

Using Python 3.5, Keras 1.0.6.

Blog post not opening

The blog post that you mention in your README does not exist anymore; the link gives a 404 error.

Question on top1-precision evaluation formula

Hi, I am wondering why you calculate the top-1 precision by checking whether the answer assigned the maximum score by the model is also the good answer assigned the maximum score by the model (if my interpretation is right).

I tried replacing the computation of c_1 with:

c_1 += 1 if max_r in d['good'] else 0

which I think is more appropriate. But it seems to go wrong, as it always ends up being zero. Can anyone give me any insight on this? Many thanks.

indices = d['good'] + d['bad']
answers = self.pada([self.answers[i] for i in indices])
question = self.padq([d['question']] * len(indices))
n_good = len(d['good'])
sims = model.predict([question, answers], batch_size=500).flatten()
r = rankdata(sims, method='max')
max_r = np.argmax(r)           # position of the top-ranked answer overall
max_n = np.argmax(r[:n_good])  # position of the top-ranked good answer
c_1 += 1 if max_r == max_n else 0
c_2 += 1 / float(r[max_r] - r[max_n] + 1)
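
A hedged observation on why the replacement always comes out zero: max_r is a position within the indices list, not an answer id, so it can never be a member of d['good']. Since the good answers occupy the first n_good positions of indices, the membership test would instead be a position test:

c_1 += 1 if max_r < n_good else 0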

Training loss and validation loss are too low

Thanks for the implementation and the tutorial.

I have a general question regarding the training and validation loss.
I created a model, inspired by your work here, to do prediction on knowledge-base triples.
My model uses a hinge loss over the similarities of a positive and a negative training instance:

loss = max(0, margin - sim(pos_triple) + sim(neg_triple))
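
In Keras-backend form, that loss looks roughly like the sketch below (margin is a hyperparameter; the similarity arguments stand in for whatever scoring the model produces). One relevant property: the hinge saturates at exactly zero once every positive instance outscores its negative by at least the margin, which is one way the loss can legitimately collapse early.

from keras import backend as K

margin = 0.2

def ranking_loss(pos_sim, neg_sim):
    # 0 once pos_sim - neg_sim >= margin for every sample
    return K.maximum(0., margin - pos_sim + neg_sim)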

When I train the model, I get very small values for the training and validation losses from the first epoch, and after a few epochs the validation loss becomes zero.

Did you experience similar behavior in your experiments? Could anybody provide an explanation?

Best,

Error saving the model

I am using the same code as the AttentionModel in keras_models.py, with additional Dense layers added at the end. When I try to save the model, it throws the following error. The related Keras issue is here: keras-team/keras#2659

File "trainer.py", line 48, in create_training_features
json_string = explicit_model.to_json()
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2368, in to_json
config = self.get_config()
File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2163, in get_config
new_node_index = node_conversion_map[node_key]
KeyError: 'input1_ib-0'

What changes are needed to run the CNN model?

Hi, I tried changing the attention model to the CNN model without success. I get complaints about the shape of the input layers. Can you give me some ideas about what to fix in order to run the included CNN model?

Adding one channel to the input data

Hi, thanks for your great work on sharing the one and only working Keras-based implementation of deep learning models for NLP that can be found on the web.

Suppose I want to move on to something original by adding one or more channels to the input data, which might be a color channel in an image or another language. Can you suggest which parts of your code I should look into, and share any of your ideas on the actual changes needed?

I have had a lot of pain before getting the dimensions of layers to match and doing away with the unused-output warning. I would be extremely grateful if you can offer any help.

How to adapt to Keras 1.03

I would like to figure out what is going on with the warning about unused output and modify the code. What do I need to do to get the code here to run on Keras 1.03? Many thanks in advance.
