My Keras implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.microsoft.com/pubs/226585/cikm2014_cdssm_final.pdf.
Hi! Thank you for making this code! I'm studying CDSSM and trying to import a dataset. I'm new to this and a little confused, so sorry if this question is too basic. If I load data from a dataset, should it be an array of one-hot vectors?
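For what it's worth, the CLSM paper uses letter-trigram "word hashing" count vectors rather than word-level one-hot vectors, with one such vector per word-n-gram window for the convolutional layer to slide over. A minimal sketch of that featurization (the helper names and toy vocabulary below are mine, not from this repo; a real trigram index covers roughly 30k trigrams):

```python
from collections import Counter
import numpy as np

def letter_trigrams(word):
    """Break a '#'-padded word into letter trigrams, e.g. 'cat' -> '#ca', 'cat', 'at#'."""
    padded = '#' + word + '#'
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def hash_text(words, trigram_index):
    """Map a list of words to a bag-of-letter-trigrams count vector."""
    vec = np.zeros(len(trigram_index), dtype=np.float32)
    counts = Counter(t for w in words for t in letter_trigrams(w))
    for trigram, count in counts.items():
        if trigram in trigram_index:
            vec[trigram_index[trigram]] = count
    return vec

# Tiny trigram vocabulary built from a toy corpus.
corpus = ['deep', 'semantic', 'model']
vocab = sorted({t for w in corpus for t in letter_trigrams(w)})
index = {t: i for i, t in enumerate(vocab)}

q = hash_text(['deep', 'model'], index)
print(q.shape, int(q.sum()))  # (17,) 9
```

Unseen trigrams are simply dropped here; how the repo handles out-of-vocabulary trigrams is up to its own preprocessing.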
For the variable-length input, if I understand you correctly, the model can only be fit one query at a time instead of on all the data at once? So I would like to know whether there are any tricks for fitting all of the variable-length inputs in one batch.
Hi, I stumbled upon your implementation of DSSM and was wondering: should the weight matrices be shared between the query and document networks?
neg_l_Ds = [[] for j in range(J)]
for i in range(sample_size):
    possibilities = list(range(sample_size))
    possibilities.remove(i)
    negatives = np.random.choice(possibilities, J, replace=False)
    for j in range(J):
        negative = negatives[j]
        neg_l_Ds[j].append(pos_l_Ds[negative])
I think a negative sample should not be a corresponding search result. So why are the neg_l_Ds derived from pos_l_Ds?
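In case it helps: the paper samples negatives from unclicked documents, and the snippet above approximates that by treating *other* queries' clicked documents as random negatives for the current query; `possibilities.remove(i)` guarantees a query never gets its own positive document as a negative. A small self-contained demo of that sampling logic (strings stand in for the doc tensors):

```python
import numpy as np

sample_size, J = 5, 2
pos_l_Ds = ['doc%d' % i for i in range(sample_size)]  # stand-ins for the positive doc inputs

rng = np.random.default_rng(0)
neg_l_Ds = [[] for _ in range(J)]
for i in range(sample_size):
    # Candidate negatives: every other query's clicked doc, never query i's own.
    possibilities = [k for k in range(sample_size) if k != i]
    negatives = rng.choice(possibilities, J, replace=False)
    for j in range(J):
        neg_l_Ds[j].append(pos_l_Ds[negatives[j]])

# No query ever receives its own positive document as a negative.
assert all(neg_l_Ds[j][i] != pos_l_Ds[i]
           for i in range(sample_size) for j in range(J))
```

Note this can still pick a document that happens to be relevant to query i; random negatives are a cheap approximation, not a faithful reproduction of the clickthrough-based sampling in the paper.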
I have been working on implementing DSSM and CDSSM for a while now. Your code is excellent, easy to understand and the comments help to follow it in perfect sync with the paper.
The entire code runs. However, I wanted to see the final value of R_Q_D_p or R_Q_D_ns and could not do that. The backend's eval function gives an error, and I am unable to view the actual score inside the tensor.
I would be grateful if you could tell how to view the value/scores.
Once again, I really appreciate your efforts and well-written code!
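One way to read an intermediate score like R_Q_D_p without backend eval (which fails on symbolic graph tensors) is to wrap the tensor in its own `Model` and call `predict`. A sketch with a toy two-tower stand-in for the CLSM; the layer names and shapes here are illustrative, not the repo's:

```python
import numpy as np
from tensorflow.keras import layers, Model

# Toy stand-in: two inputs, a shared Dense "semantic" layer,
# and a cosine dot product playing the role of R_Q_D_p.
q_in = layers.Input(shape=(4,), name='query')
d_in = layers.Input(shape=(4,), name='doc')
sem = layers.Dense(3, name='sem')
score = layers.Dot(axes=1, normalize=True, name='R_Q_D_p')([sem(q_in), sem(d_in)])
model = Model([q_in, d_in], score)

# Probe model: same inputs, but its output is the tensor we want to inspect.
probe = Model(model.inputs, model.get_layer('R_Q_D_p').output)
scores = probe.predict([np.ones((2, 4)), np.ones((2, 4))], verbose=0)
print(scores.shape)  # (2, 1)
```

In the actual repo you would point `get_layer` at whichever layer produces R_Q_D_p (or build the probe directly from the tensor variable, `Model(model.inputs, R_Q_D_p)`).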
I know question-question matching is a text-similarity problem. What about question-answer or question-document matching, as used in information retrieval?
I have trouble understanding this fitting process:
for i in range(sample_size):
    history = model.fit([l_Qs[i], pos_l_Ds[i]] + [neg_l_Ds[j][i] for j in range(J)], y, epochs=1, verbose=0)
where the training samples go through the network one at a time. I don't think that is practical.
On the other hand, I don't think padded input works either; it just doesn't match the original method. Could someone give me advice on how to handle the variable-length inputs described in the paper?
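One practical middle ground (a common trick, not something from the repo): bucket the samples by length so every batch is shape-homogeneous, then fit one bucket at a time. No padding is introduced, yet you get batches larger than one. A sketch with toy data (`length_buckets` is a hypothetical helper name):

```python
from collections import defaultdict
import numpy as np

def length_buckets(sequences):
    """Group sample indices by sequence length so each bucket can form a
    fixed-shape batch without any padding."""
    buckets = defaultdict(list)
    for i, seq in enumerate(sequences):
        buckets[len(seq)].append(i)
    return buckets

# Toy "queries" of varying length (per-position feature vectors of width 5).
queries = [np.ones((n, 5)) for n in (3, 7, 3, 7, 7)]
buckets = length_buckets(queries)

for length, idxs in sorted(buckets.items()):
    batch = np.stack([queries[i] for i in idxs])  # (bucket_size, length, 5)
    print(length, batch.shape)
    # model.train_on_batch([batch, ...], y)  # one update per bucket
```

The documents (positive and sampled negatives) would need the same grouping, so in practice you bucket on the full (query, docs) tuple's shapes.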