eges's Issues
A question about negative sampling
How do I deal with memory blowing up during "Preprocess transition probs"?
Could you add an example for cold-start items?
EGES_model_dataset.py issues
On line 53 of this file, self.inputs[-1] should be changed to self.batch_labels.
OOV side info problem
Since all item sequences are generated by random walks, an item's side info (e.g., brand) can become OOV due to low transition probability.
How should this be handled?
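A common workaround (my suggestion, not from the repo; build_vocab and encode are hypothetical helpers) is to reserve a dedicated index for unknown values when encoding side info, so lookups still succeed for values that never appeared in the walks:

```python
# Hedged sketch: keep index 0 as an "unknown" bucket for each side-info field,
# so a brand that never made it into the random walks still maps to something.
def build_vocab(values, reserve_unknown=True):
    """Assign contiguous ids; id 0 is kept for out-of-vocabulary values."""
    start = 1 if reserve_unknown else 0
    return {v: i for i, v in enumerate(sorted(set(values)), start=start)}

def encode(value, vocab):
    return vocab.get(value, 0)  # fall back to the OOV bucket

brand_vocab = build_vocab(["nike", "adidas", "puma"])
print(encode("nike", brand_vocab))          # some id >= 1
print(encode("unseen_brand", brand_vocab))  # 0
```

The OOV row of the embedding table then acts as a learned default for rare or unseen side-info values.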
Is Equation (5) in the paper wrong?
It seems to be missing a minus sign, or "minimize" should be "maximize"? Compare Equation (1) in the GraphSAGE paper.
A question about the loss function
The data link is dead
The data link no longer works; could you share a copy of the data?
Can you provide results for cold items?
I cannot figure out the trained model's performance on cold items. Could you provide the code for evaluating cold items?
groupby('user_id').agg(list) raises an error
ValueError: no results
Dataset download
The original link is dead.
What is the difference between the files with and without the _dataset suffix?
Newbie question: what is the difference between the two files with the _dataset suffix and the ones without it?
The alias table is built incorrectly
Line 130 in f600b2e
This code normalizes every weight, so all weights end up less than or equal to 1. But in alias.py, small holds the entries less than 1 and large holds the entries greater than 1; after this normalization everything lands in small, which defeats the purpose of alias sampling.
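For reference, a minimal sketch of standard alias-table construction (Vose's method) — my own illustration, not the repo's code. The key point the issue raises: probabilities must be scaled by N before the small/large split, so entries fall on both sides of 1.

```python
import numpy as np

# Hedged sketch of alias-table setup: q = p * N has mean exactly 1, so the
# small/large partition is meaningful. Normalizing weights to <= 1 without
# this scaling puts everything in `small`, as the issue describes.
def build_alias_table(weights):
    weights = np.asarray(weights, dtype=float)
    n = len(weights)
    q = weights / weights.sum() * n   # scaled probabilities, mean exactly 1
    alias = np.zeros(n, dtype=int)
    small = [i for i in range(n) if q[i] < 1.0]
    large = [i for i in range(n) if q[i] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l
        q[l] -= (1.0 - q[s])          # move the deficit onto the large bucket
        (small if q[l] < 1.0 else large).append(l)
    return q, alias

def alias_sample(q, alias, rng=np.random):
    i = rng.randint(len(q))
    return i if rng.rand() < q[i] else alias[i]
```

Sampling is then O(1) per draw: pick a column uniformly, then a biased coin decides between the column's own index and its alias.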
agg(list) raises an error; please take a look.
In def get_session(action_data, use_type=None), the line
group_action_data = action_data.groupby('user_id').agg(list)
fails. Why agg(list)? Is list the wrong argument to pass to agg? The code raises on this line.
Error message:
File "E:/0001CTR/召回算法汇总/EGES/EGES-master/data_process.py", line 32, in get_session
group_action_data = action_data.groupby('user_id').agg(list)
File "D:\Anaconda2\envs\py3\lib\site-packages\pandas\core\groupby.py", line 3597, in aggregate
return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
File "D:\Anaconda2\envs\py3\lib\site-packages\pandas\core\groupby.py", line 3114, in aggregate
result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
File "D:\Anaconda2\envs\py3\lib\site-packages\pandas\core\base.py", line 564, in _aggregate
return self._aggregate_multiple_funcs(arg, _level=_level), None
File "D:\Anaconda2\envs\py3\lib\site-packages\pandas\core\base.py", line 616, in _aggregate_multiple_funcs
return concat(results, keys=keys, axis=1)
File "D:\Anaconda2\envs\py3\lib\site-packages\pandas\tools\merge.py", line 845, in concat
copy=copy)
File "D:\Anaconda2\envs\py3\lib\site-packages\pandas\tools\merge.py", line 878, in __init__
raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate
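For what it's worth, the pandas/tools/merge.py paths in the traceback suggest a very old pandas (pre-0.20), where agg(list) appears to be dispatched down the multiple-aggregation-functions path and ends up concatenating an empty result. A version-portable workaround (my sketch, with made-up column names) is to use apply(list) instead:

```python
import pandas as pd

# Hedged workaround: group the session column and collect each group into a
# list with .apply(list), which avoids the code path that old pandas takes
# for .agg(list).
df = pd.DataFrame({"user_id": [1, 1, 2], "sku_id": [10, 11, 12]})
sessions = df.groupby("user_id")["sku_id"].apply(list)
print(sessions.loc[1])  # [10, 11]
```

Upgrading pandas should also make agg(list) behave as intended (apply the builtin list to each group).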
How do I get the weight matrix?
Looking at the model checkpoint, I see:
tensor_name: softmax_w/Adam_1
tensor_name: softmax_w
tensor_name: softmax_b/Adam
tensor_name: softmax_b
tensor_name: embedding3/Adam_1
tensor_name: embedding3
tensor_name: embedding2/Adam
tensor_name: embedding2
tensor_name: embedding1/Adam
tensor_name: Variable/Adam_1
tensor_name: embedding0/Adam_1
tensor_name: embedding0/Adam
tensor_name: beta2_power
tensor_name: embedding2/Adam_1
tensor_name: embedding0
tensor_name: softmax_w/Adam
tensor_name: beta1_power
tensor_name: embedding1/Adam_1
tensor_name: embedding1
tensor_name: embedding3/Adam
tensor_name: Variable/Adam
tensor_name: softmax_b/Adam_1
tensor_name: Variable
Which of these holds the weight matrix? I assume embedding0, embedding1, embedding2, and embedding3 are the sku, brand, cate, and shop embeddings respectively?
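If it helps: the /Adam and /Adam_1 tensors are Adam's slot variables (first and second moments) and beta1_power/beta2_power are its decay accumulators, so none of them are model weights. Filtering the listing above leaves the actual parameters (their values could then be read with tf.train.NewCheckpointReader(path).get_tensor(name), if I recall the TF1 API correctly):

```python
# Hedged sketch: strip optimizer bookkeeping from the checkpoint listing to
# see the real model parameters.
names = [
    "softmax_w/Adam_1", "softmax_w", "softmax_b/Adam", "softmax_b",
    "embedding3/Adam_1", "embedding3", "embedding2/Adam", "embedding2",
    "embedding1/Adam", "Variable/Adam_1", "embedding0/Adam_1",
    "embedding0/Adam", "beta2_power", "embedding2/Adam_1", "embedding0",
    "softmax_w/Adam", "beta1_power", "embedding1/Adam_1", "embedding1",
    "embedding3/Adam", "Variable/Adam", "softmax_b/Adam_1", "Variable",
]
weights = sorted(n for n in names
                 if "/Adam" not in n and not n.endswith("_power"))
print(weights)
# ['Variable', 'embedding0', 'embedding1', 'embedding2', 'embedding3',
#  'softmax_b', 'softmax_w']
```

That leaves the four embedding tables, the sampled-softmax output weights/bias, and an unnamed Variable (plausibly the per-item attention weights, though that is a guess).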
Why is a 0 appended each time when generating the contexts?
# brand_list, shop_list, cate_list = side_infos
session_items = []
for lst in session_list:
    lst = list(map(int, lst)) + [0]
    session_items.extend(lst)
I don't see how cold-start items get trained.
About session truncation in your code
Hi, thanks a lot for sharing!
The README says sessions are truncated when the last action is an order, or when there is no action within 30 minutes.
Why use the order action as a truncation point? If I order items A and B within half an hour, doesn't that also suggest a latent recommendation relationship between A and B?
A question about negative sampling
When generating the graph from item sequences, is the resulting graph directed?
# session2graph
node_pair = dict()
for session in session_list_all:
    for i in range(1, len(session)):
        pair = (session[i - 1], session[i])
        node_pair[pair] = node_pair.get(pair, 0) + 1
in_node_list = [pair[0] for pair in node_pair]
out_node_list = [pair[1] for pair in node_pair]
weight_list = list(node_pair.values())
graph_df = pd.DataFrame({'in_node': in_node_list, 'out_node': out_node_list, 'weight': weight_list})
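Since the snippet counts ordered pairs, the resulting graph is indeed directed. If an undirected graph were wanted, both directions could be merged into a canonical key first — a hedged sketch (to_undirected is my own helper, not in the repo):

```python
from collections import defaultdict

# Hedged sketch: fold (a, b) and (b, a) counts into one undirected edge by
# keying on the sorted node pair.
def to_undirected(node_pair):
    undirected = defaultdict(int)
    for (a, b), w in node_pair.items():
        undirected[(min(a, b), max(a, b))] += w
    return dict(undirected)

pairs = {(1, 2): 3, (2, 1): 1, (2, 3): 2}
print(to_undirected(pairs))  # {(1, 2): 4, (2, 3): 2}
```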
How should incremental updates or newly added data be handled?
Hello, I read your code, very nice! One thing I don't understand: in word2vec the target should be a single id, right? Why is y a list here?
The "Preprocess transition probs..." step takes very long
This step takes an extremely long time, and memory grows steadily from a few GB until even 256 GB is exhausted.
Isn't plain uniform sampling in sampled_softmax suboptimal?
It might be better to compute each id's sampling probability with the negative-sampling formula from the original word2vec paper and then draw negatives from that distribution (e.g., via alias sampling)?
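For illustration, the word2vec distribution the comment refers to raises counts to the 0.75 power before normalizing — a numpy sketch (my own, with toy counts). In TF1 the same idea could presumably be wired into tf.nn.sampled_softmax_loss via its sampled_values argument using tf.nn.fixed_unigram_candidate_sampler(..., distortion=0.75, unigrams=counts):

```python
import numpy as np

# Hedged sketch of the word2vec negative-sampling distribution:
# P(w) proportional to count(w)^0.75, which flattens the head and
# boosts the tail relative to raw frequency.
def negative_sampling_probs(counts, power=0.75):
    counts = np.asarray(counts, dtype=float)
    p = counts ** power
    return p / p.sum()

counts = [100, 10, 1]          # toy frequencies
probs = negative_sampling_probs(counts)
rng = np.random.RandomState(0)
negatives = rng.choice(len(counts), size=5, replace=True, p=probs)
```

For large vocabularies, drawing from probs with an alias table (as the comment suggests) keeps sampling O(1) per negative.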
Has the code author misunderstood the loss in the paper?
I am looking at EGES_model.py.
The way def make_skipgram_loss() is written, my understanding is that it maximizes the co-occurrence probability of node v's embedding H_v with the id of v's context node u?
But the loss in Equation (8) of the paper is L(v, u, y) = -[y log(sigma(H_v^T Z_u)) + (1 - y) log(1 - sigma(H_v^T Z_u))].
Am I misunderstanding this, or is the code author?
Also, what is the _dataset version of the py file for?
Is each side-information embedding shared by all nodes?
In the code, each side-information field gets one embedding table:
def embedding_init(self):
    cat_embedding_vars = []
    for i in range(self.num_feat):
        embedding_var = tf.Variable(
            tf.random_uniform((self.feature_lens[i], self.embedding_dim), -1, 1),
            name='embedding' + str(i), trainable=True)
        cat_embedding_vars.append(embedding_var)
    return cat_embedding_vars
So the combined side-information lookup tables have shape:
(number of side-information fields, number of values per field (variable), embedding dimension)
But the third paragraph of Section 2.4 of the paper says each node has its own embedding vector for each side-information field:
"Specifically, W_v_0 denotes the embedding of item v, and W_v_s denotes the embedding of the s-th type of side information attached to item v."
Does that mean the lookup tables should instead have shape
(number of nodes, number of side-information fields, number of values per field (variable), embedding dimension)? Then two nodes sharing the same value of a side-information field would still get different embeddings for it, whereas under the code's logic their embeddings are identical.
OOM problem
With about 5.8 million nodes and embedding_size=128, the line
embedding_var = tf.Variable(tf.random_uniform((self.feature_lens[i], self.embedding_dim), -1, 1), name='embedding' + str(i), trainable=True)
hits OOM. Is it simply because there are too many nodes?
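A back-of-envelope check (my numbers, not from the repo) suggests the answer is plausibly yes: one float32 table of 5.8M x 128 is already about 2.8 GB, and Adam keeps two extra slots of the same shape, so roughly 8-9 GB for that one table before counting the other embeddings and the softmax weights.

```python
# Back-of-envelope memory estimate for a single embedding table plus Adam's
# two moment slots (m and v), all float32.
rows, dim, bytes_per_float = 5_800_000, 128, 4
table_gb = rows * dim * bytes_per_float / 1024**3
total_gb = table_gb * 3  # the variable itself + Adam's m and v slots
print(round(table_gb, 2), round(total_gb, 2))
```

On a single GPU this alone can exceed device memory; placing the embedding variables on CPU or sharding them are common mitigations.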
Runtime exception
Environment:
python 3.7.3
tensorflow == 1.14.0
joblib
networkx
Steps to reproduce:
python data_process.py
python EGES.py
Error message:
Errors may have originated from an input operation.
Input Source operations connected to node embedding_lookup_2:
inputs_shop (defined at F:/PycharmProjects/EGES-master/EGES.py:70)
Variable_2/read (defined at F:/PycharmProjects/EGES-master/EGES.py:79)
Original stack trace for 'embedding_lookup_2':
File "F:/PycharmProjects/EGES-master/EGES.py", line 80, in <module>
shop_embed = tf.nn.embedding_lookup(shop_embedding, inputs_shop)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\embedding_ops.py", line 315, in embedding_lookup
transform_fn=None)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\embedding_ops.py", line 133, in _embedding_lookup_and_transform
array_ops.gather(params[0], ids, name=name), ids, max_norm)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\util\dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 3475, in gather
return gen_array_ops.gather_v2(params, indices, axis, name=name)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 4097, in gather_v2
batch_dims=batch_dims, name=name)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
op_def=op_def)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
How do I obtain the output weight matrix?
Hi, how do I get the side-info weight matrix, i.e., the attention weights over the embeddings? The program does not seem to contain this logic.
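In case it helps, here is a hedged numpy sketch (my own, not the repo's code) of the weighted aggregation the paper describes: one attention logit per channel (item id plus each side-info field), softmax-normalized and used to average the channel embeddings. Judging from the checkpoint listing in another issue, the per-item logits may live in the unnamed Variable tensor, but that is a guess.

```python
import numpy as np

# Hedged sketch of per-item attention over embedding channels: softmax the
# logits, then take the weighted average of the channel embeddings.
def aggregate_embedding(channel_embeds, attn_logits):
    """channel_embeds: (num_channels, dim); attn_logits: (num_channels,)."""
    w = np.exp(attn_logits)
    w = w / w.sum()           # softmax over channels
    return w @ channel_embeds  # weighted average, shape (dim,)

embeds = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # e.g. sku/brand/shop
logits = np.array([2.0, 0.0, 0.0])
h = aggregate_embedding(embeds, logits)
```

If the logits are stored in the checkpoint, recovering the per-item weights is just a softmax over each row of that variable.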
Dear author, do you have any advice on computing the top-n most similar neighbors for each item?
The authors used the link prediction task in the offline experiments, but I didn't find the train/test split process in this code. How can the accuracy of the model be evaluated?
Why are multiple sessions concatenated into one big session?
session_items = []
for lst in session_list:
    lst = list(map(int, lst)) + [0]
    session_items.extend(lst)
When targets are generated later, won't this create co-occurrences between items that never actually co-occurred?
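To make the concern concrete: if the skip-gram window does not stop at the separator, pairs do cross session boundaries. A sketch (skipgram_pairs is my own illustration, not the repo's code) that skips any window position holding the 0 marker:

```python
# Hedged sketch: generate (target, context) pairs from a concatenated item
# stream, treating 0 as a session separator that no pair may touch.
def skipgram_pairs(items, window=1, sep=0):
    pairs = []
    for i, target in enumerate(items):
        if target == sep:
            continue
        for j in range(max(0, i - window), min(len(items), i + window + 1)):
            if j != i and items[j] != sep:
                pairs.append((target, items[j]))
    return pairs

stream = [1, 2, 0, 3, 4]  # two sessions joined by the 0 marker
print(skipgram_pairs(stream))  # [(1, 2), (2, 1), (3, 4), (4, 3)]
```

With the separator respected, items 2 and 3 never form a pair, so the spurious cross-session co-occurrences the question worries about do not arise.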
Preprocess transition probs...