johnson0722 / ctr_prediction
CTR prediction using FM FFM and DeepFM
CTR_Prediction/Deep_FM/utilities.py, lines 26 and 58 at ec4de3d
I am getting this error in DeepFM.py. Can anyone please help?
Hello. In the FFM algorithm, I don't quite understand why a field has to be added?
Hello, is there matching sample data for the code?
Many thanks!
Where does the train_sparse_data_frac_0.01.pkl file used in FM/FM.py come from?
import logging
import pickle

import numpy as np
import tensorflow as tf

def train_model(sess, model, epochs=10, print_every=50):
    """Training model. Assumes `feature_length` and `saver` are defined at module level."""
    # Merge all the summaries and write them out to train_logs
    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter('train_logs', sess.graph)
    # get sparse training data
    with open('../avazu_CTR/train_sparse_data_frac_0.01.pkl', 'rb') as f:
        sparse_data_fraction = pickle.load(f)
    # get number of batches
    num_batches = len(sparse_data_fraction)
    for e in range(epochs):
        num_samples = 0
        losses = []
        for ibatch in range(num_batches):
            # batch_size data
            batch_y = np.array(sparse_data_fraction[ibatch]['labels'])
            actual_batch_size = len(batch_y)
            batch_indexes = np.array(sparse_data_fraction[ibatch]['indexes'], dtype=np.int64)
            batch_shape = np.array([actual_batch_size, feature_length], dtype=np.int64)
            batch_values = np.ones(len(batch_indexes), dtype=np.float32)
            # create a feed dictionary for this batch
            feed_dict = {model.X: (batch_indexes, batch_values, batch_shape),
                         model.y: batch_y,
                         model.keep_prob: 1.0}
            loss, accuracy, summary, global_step, _ = sess.run(
                [model.loss, model.accuracy, merged, model.global_step, model.train_op],
                feed_dict=feed_dict)
            # aggregate performance stats
            losses.append(loss * actual_batch_size)
            num_samples += actual_batch_size
            # record summaries and training-set accuracy
            train_writer.add_summary(summary, global_step=global_step)
            # print training loss and accuracy
            if global_step % print_every == 0:
                logging.info("Iteration {0}: with minibatch training loss = {1} and accuracy of {2}"
                             .format(global_step, loss, accuracy))
                saver.save(sess, "checkpoints/model", global_step=global_step)
        # print loss of one epoch
        total_loss = np.sum(losses) / num_samples
        print("Epoch {1}, Overall loss = {0:.3g}".format(total_loss, e + 1))
Hello @Johnson0722, I hit this error when I run FM.py. I use the same dataset as you. Can you give me some help?
`Caused by op 'interaction_layer/SparseTensorDenseMatMul/SparseTensorDenseMatMul', defined at:
File "FM.py", line 223, in <module>
model.build_graph()
File "FM.py", line 94, in build_graph
self.inference()
File "FM.py", line 61, in inference
tf.pow(tf.sparse_tensor_dense_matmul(self.X, v), 2),
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\sparse_ops.py", line 1822, in sparse_tensor_dense_matmul
adjoint_b=adjoint_b)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_sparse_ops.py", line 3213, in sparse_tensor_dense_mat_mul
name=name)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3155, in create_op
op_def=op_def)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): k (303) from index[19,1] out of bounds (>=303)
[[Node: interaction_layer/SparseTensorDenseMatMul/SparseTensorDenseMatMul = SparseTensorDenseMatMul[T=DT_FLOAT, Tindices=DT_INT64, adjoint_a=false, adjoint_b=false, _device="/job:localhost/replica:0/task:0/devi
ce:CPU:0"](_arg_Placeholder_2_0_2, _arg_Placeholder_1_0_1, _arg_Placeholder_0_0, interaction_layer/v/read)]]
`
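The `k (303) from index[19,1] out of bounds (>=303)` message usually means one of the sparse feature indexes being fed in is at least as large as `feature_length`, the row count of the latent matrix `v`. A minimal sanity check one could run on each batch before feeding it (the helper name is hypothetical):

```python
import numpy as np

def check_sparse_batch(batch_indexes, feature_length):
    """Verify every feature index in a sparse batch is within range.

    batch_indexes: array of shape [nnz, 2] with (row_in_batch, feature_index)
    feature_length: number of rows in the FM latent matrix v
    """
    max_idx = batch_indexes[:, 1].max()
    if max_idx >= feature_length:
        raise ValueError(
            "feature index %d >= feature_length %d; "
            "rebuild the feature dictionaries or enlarge feature_length"
            % (max_idx, feature_length))
    return True

# Example: a batch reproducing the error above (index 303 with feature_length=303)
bad = np.array([[0, 5], [19, 303]], dtype=np.int64)
try:
    check_sparse_batch(bad, feature_length=303)
except ValueError as e:
    print(e)
```

If this check fires, the usual cause is that `feature_length` was computed from a different (smaller) dictionary than the one used to build the pickled sparse data.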
tf.reduce_sum(tf.multiply(v[i,self.feature2field[i]], v[j,self.feature2field[j]])),
This should instead be:
tf.reduce_sum(tf.multiply(v[i,self.feature2field[j]], v[j,self.feature2field[i]])),
shouldn't it?
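For reference, the FFM pairwise term does pair feature i's vector for j's field with feature j's vector for i's field, as the corrected line suggests. A small NumPy sketch of that indexing (the shapes and the `feature2field` mapping are toy values, not taken from the repo):

```python
import numpy as np

n_features, n_fields, k = 4, 2, 3
rng = np.random.default_rng(0)
# v[i, f] is the length-k latent vector feature i uses when interacting with field f
v = rng.normal(size=(n_features, n_fields, k))
feature2field = {0: 0, 1: 0, 2: 1, 3: 1}

def ffm_pair(i, j):
    # feature i's vector for j's field, times feature j's vector for i's field
    return np.sum(v[i, feature2field[j]] * v[j, feature2field[i]])

# pairwise interaction term over all feature pairs (all x_i = 1 here for simplicity)
interaction = sum(ffm_pair(i, j)
                  for i in range(n_features)
                  for j in range(i + 1, n_features))
```

Note the term is symmetric: `ffm_pair(i, j)` equals `ffm_pair(j, i)`, since the two factors simply swap.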
For example, if there are 10 million users and 10 million items, would that require 10 million x 10 million = 10^14 entries of feature data?
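No explicit cross features are materialized: FFM stores n x f x k latent parameters, which stays linear in the number of one-hot features rather than quadratic in the ID spaces. A back-of-the-envelope comparison (the field count and latent size below are assumed, not from the repo):

```python
# Parameter-count comparison, using the numbers from the question above
n_users, n_items = 10_000_000, 10_000_000    # 10 million each
n = n_users + n_items                         # total one-hot feature count
f, k = 2, 8                                   # fields and latent size (assumed)

full_pairwise = n_users * n_items             # explicit user x item cross features
ffm_params = n * f * k                        # FFM latent parameters

print(full_pairwise)   # 100_000_000_000_000  (10^14)
print(ffm_params)      # 320_000_000          (3.2 * 10^8)
```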
Caused by op u'Ftrl/update_Variable/SparseApplyFtrl', defined at:
File "DeepFM.py", line 325, in <module>
model.build_graph()
File "DeepFM.py", line 132, in build_graph
self.train()
File "DeepFM.py", line 124, in train
self.train_op = optimizer.minimize(self.loss, global_step=self.global_step)
File "/home/u2019101432/.conda/envs/tf1.12/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 410, in minimize
name=name)
File "/home/u2019101432/.conda/envs/tf1.12/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 610, in apply_gradients
update_ops.append(processor.update_op(self, grad))
File "/home/u2019101432/.conda/envs/tf1.12/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 128, in update_op
return optimizer._apply_sparse_duplicate_indices(g, self._v)
File "/home/u2019101432/.conda/envs/tf1.12/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 1019, in _apply_sparse_duplicate_indices
return self._apply_sparse(gradient_no_duplicate_indices, var)
File "/home/u2019101432/.conda/envs/tf1.12/lib/python2.7/site-packages/tensorflow/python/training/ftrl.py", line 224, in _apply_sparse
use_locking=self._use_locking)
File "/home/u2019101432/.conda/envs/tf1.12/lib/python2.7/site-packages/tensorflow/python/training/gen_training_ops.py", line 3299, in sparse_apply_ftrl
use_locking=use_locking, name=name)
File "/home/u2019101432/.conda/envs/tf1.12/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/u2019101432/.conda/envs/tf1.12/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/u2019101432/.conda/envs/tf1.12/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/u2019101432/.conda/envs/tf1.12/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Index 131989 at offset 131989 in indices is out of range
[[node Ftrl/update_Variable/SparseApplyFtrl (defined at DeepFM.py:124) = SparseApplyFtrl[T=DT_FLOAT, Tindices=DT_INT64, use_locking=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Variable, Variable/Ftrl, Variable/Ftrl_1, Ftrl/update_Variable/UnsortedSegmentSum, Ftrl/update_Variable/Unique, Ftrl/learning_rate, Ftrl/l1_regularization_strength, Ftrl/update_DNN/b1/Cast, Ftrl/learning_rate_power)]]
Parameter Server architecture or All-Reduce architecture?
CPU or GPU?
Is there open-source reference code?
Does the TensorFlow source need to be modified?
What is the most cost-effective option?
I want to find the preprocessed dataset; the raw dataset is also fine. Thank you.
In DeepFM, why is the input to the deep side the latent vector v?
Code from DeepFM.py:
"""
# shape of [None, 2]
self.linear_terms = tf.add(tf.matmul(self.X, w1), b)
# shape of [None, 1]
self.interaction_terms = tf.multiply(0.5,
                                     tf.reduce_mean(
                                         tf.subtract(
                                             tf.pow(tf.matmul(self.X, v), 2),
                                             tf.matmul(tf.pow(self.X, 2), tf.pow(v, 2))),
                                         1, keep_dims=True))
"""
Question: in the DeepFM paper, each categorical feature (before one-hot encoding) is represented by an embedding of length latent_size, which essentially acts like a locally connected layer; note this applies to every individual categorical feature. After each feature is embedded, the embeddings are multiplied pairwise to form the interactions.
But in the code, multiplying via "tf.matmul(self.X, v)" collapses all features into a single embedding of size self.k, rather than giving every feature its own size-self.k embedding.
Is there a problem here? It seems it should be tf.multiply(self.X, v). Am I misunderstanding something?
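For what it's worth, the `tf.matmul(self.X, v)` form comes from the standard FM reformulation: the pairwise term sum_{i&lt;j} &lt;v_i, v_j&gt; x_i x_j equals 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ], so collapsing the features into one k-dimensional vector per sample still recovers all pairwise interactions. A NumPy check of that identity on toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3                                     # features and latent size (toy values)
x = rng.integers(0, 2, size=n).astype(float)    # a single one-hot-like sample
v = rng.normal(size=(n, k))

# Naive pairwise form: sum over i<j of <v_i, v_j> * x_i * x_j
naive = sum(np.dot(v[i], v[j]) * x[i] * x[j]
            for i in range(n) for j in range(i + 1, n))

# FM reformulation: 0.5 * sum_f [ (x @ v)_f^2 - ((x^2) @ (v^2))_f ]
fast = 0.5 * np.sum((x @ v) ** 2 - (x ** 2) @ (v ** 2))

assert np.isclose(naive, fast)
```

The deep-side question (per-feature embeddings via something like `tf.multiply(self.X, v)` and field-wise gathering) is a separate issue from this interaction term.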
In the implementation, the second-order term is broadcast and added to the first-order term. I would like to know why they are added and what this means.
In my opinion, an alternative would be to sum the first-order term into a scalar and then add it to the second-order part.
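Given the shapes quoted in the code comments above ([None, 2] linear terms, [None, 1] interaction), broadcasting adds the single interaction value to both class logits of each sample; and if these logits then feed a two-class softmax, adding the same scalar to both logits leaves the probabilities unchanged, which may be part of the concern here. A toy demonstration of the broadcast (the batch size and values are arbitrary):

```python
import numpy as np

batch = 4
linear_terms = np.arange(batch * 2, dtype=float).reshape(batch, 2)  # shape [batch, 2]
interaction = np.full((batch, 1), 0.5)                              # shape [batch, 1]

# Broadcasting adds the one interaction value to both logits of each sample
logits = linear_terms + interaction

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Shifting both logits by the same amount does not change the softmax output
assert np.allclose(softmax(logits), softmax(linear_terms))
```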
fields_train_dict = {}
for field in fields_train:
    with open('dicts/' + field + '.pkl', 'rb') as f:
        fields_train_dict[field] = pickle.load(f)

fields_test_dict = {}
for field in fields_test:
    with open('dicts/' + field + '.pkl', 'rb') as f:
        fields_test_dict[field] = pickle.load(f)
In this code, what is stored in the per-field files under the "dicts/" path?
What is the relationship between feature_length and field_cnt?
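A plausible reading of the code above: `field_cnt` is the number of raw columns (fields), each `dicts/<field>.pkl` maps that field's distinct values to one-hot offsets, and `feature_length` is the total one-hot width, i.e. the sum of the per-field vocabulary sizes. A toy sketch of that relationship (the dictionary contents are invented for illustration):

```python
# Hypothetical per-field value dictionaries like those pickled under dicts/
fields_dict = {
    'site_id': {'a': 0, 'b': 1, 'c': 2},           # 3 distinct values
    'app_id':  {'x': 0, 'y': 1},                   # 2 distinct values
    'device':  {'p': 0, 'q': 1, 'r': 2, 's': 3},   # 4 distinct values
}

field_cnt = len(fields_dict)                                 # number of fields: 3
feature_length = sum(len(d) for d in fields_dict.values())   # one-hot width: 9

print(field_cnt, feature_length)   # 3 9
```

So `feature_length >= field_cnt` always, with equality only if every field had a single value.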