Comments (6)
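The fields here are used to generate the offsets. After label encoding, each column of the input data is encoded independently. However, the embedding step sends all columns through a single embedding layer in one pass, so the codes must be accumulated across columns. For example, if feature A has two values (0, 1) and feature B has three values (0, 1, 2), then B's 0, 1, 2 must become 2, 3, 4. That way, A's 0 and B's 0 are not conflated in the embedding.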
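The following is a minimal NumPy sketch of that accumulation. It assumes each column's vocabulary size (`field_dims`) is already known. `compute_offsets` is a hypothetical helper name, not a function from the repository.

```python
import numpy as np

def compute_offsets(field_dims):
    """Start row of each field in a shared embedding table (exclusive prefix sum of field_dims)."""
    field_dims = np.asarray(field_dims, dtype=np.int64)
    return np.concatenate(([0], np.cumsum(field_dims)[:-1]))

# Feature A has 2 values {0, 1}; feature B has 3 values {0, 1, 2}.
offsets = compute_offsets([2, 3])
b_codes = np.array([0, 1, 2])
shifted_b = b_codes + offsets[1]  # B's codes move past A's two rows
print(offsets.tolist(), shifted_b.tolist())  # [0, 2] [2, 3, 4]
```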
from ctr_algorithm.
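I got your code running and pulled out one sample:
Network input: [1, 2, 0, 110, 823, 1, 712, 28, 0, 5703, 12062, 575, 1, 0, 128, 3, 2, 36, 0, 1, 0, 18]
offset: [0 1 6 11 997 1868 1885 2653 2714 2732 11275 58583 61188 61191 61194 61641 61645 61650 61790 61793 61830 61973]
input + offset: [1, 3, 6, 121, 1820, 1869, 2597, 2681, 2714, 8435, 23337, 59158, 61189, 61191, 61322, 61644, 61647, 61686, 61790, 61794, 61830, 61991]
For example, in input + offset the index for the second column is 3, but shouldn't it be 4? The second column's input is 2, out of the values 0, 1, 2. The first column's values are 0 and 1, so the second column should start at 2, with 0, 1, 2 mapping to 2, 3, 4. I therefore think it should be 4, but it is 3. Is my understanding wrong?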
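The posted numbers can be checked directly. This NumPy sketch recomputes input + offset from the sample above. It also reads each field's implied vocabulary size off the gap between consecutive offsets. The variable names are illustrative.

```python
import numpy as np

inputs = np.array([1, 2, 0, 110, 823, 1, 712, 28, 0, 5703, 12062, 575,
                   1, 0, 128, 3, 2, 36, 0, 1, 0, 18])
offsets = np.array([0, 1, 6, 11, 997, 1868, 1885, 2653, 2714, 2732, 11275, 58583,
                    61188, 61191, 61194, 61641, 61645, 61650, 61790, 61793, 61830, 61973])
shifted = inputs + offsets

# Implied size of every field except the last: the gap to the next field's offset.
implied_dims = np.diff(offsets)
print(implied_dims[0])  # 1: field 0 got a single row, yet it takes the value 1

# Field 0's value 1 lands on row 1, which is also the first row of field 1.
print(shifted[0], offsets[1])  # 1 1
```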
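Yes, indeed. The counts need to be accumulated, and each index then shifts back by one. This looks like a bug. It should probably be changed to fields = data_x.max().values + 1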
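Below is a toy reproduction of the bug and the fix, with made-up data. It uses a NumPy array in place of the pandas DataFrame. For a 2-D array, `data_x.max(axis=0)` gives the same column-wise maxima as `data_x.max().values`. `offsets_from` is a hypothetical helper.

```python
import numpy as np

# Toy data: column 0 takes {0, 1}, column 1 takes {0, 1, 2} (label-encoded per column).
data_x = np.array([[0, 0],
                   [1, 1],
                   [1, 2]])

def offsets_from(field_dims):
    return np.concatenate(([0], np.cumsum(field_dims)[:-1]))

buggy_dims = data_x.max(axis=0)       # [1, 2]: largest code, one less than the vocabulary size
fixed_dims = data_x.max(axis=0) + 1   # [2, 3]: number of distinct codes per column

buggy = data_x + offsets_from(buggy_dims)
fixed = data_x + offsets_from(fixed_dims)
print(buggy[2].tolist(), fixed[2].tolist())  # [1, 3] [1, 4]

# With the buggy sizes, column 0's value 1 and column 1's value 0 share row 1.
print(buggy[1, 0] == buggy[0, 1])  # True
```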
Bug
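feature_fields exists because all features share a single Embedding table, so each column's indices must be accumulated.
e.g. field_dims = [2, 4, 2], offsets = [0, 2, 6]
So, in the actual lookup table:
rows 0-1 correspond to feature X0, i.e. field_dims[0]
rows 2-5 correspond to feature X1, i.e. field_dims[1]
rows 6-7 correspond to feature X2, i.e. field_dims[2]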
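The row layout above can also be read in reverse: given a row of the shared table, which field owns it? The following is a small NumPy sketch using `np.searchsorted` on the offsets. It illustrates the layout only and is not code from the repository.

```python
import numpy as np

field_dims = np.array([2, 4, 2])
offsets = np.concatenate(([0], np.cumsum(field_dims)[:-1]))  # [0, 2, 6]
rows = np.arange(field_dims.sum())                           # rows 0..7 of the shared table

# Owning field of a row = index of the last offset that is <= the row.
owner = np.searchsorted(offsets, rows, side="right") - 1
print(owner.tolist())  # [0, 0, 1, 1, 1, 1, 2, 2]
```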
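However, the actual feature values x passed to forward(self, x) only range over each feature's own vocabulary.
For example, if X1 takes the value 1, its row in the Embedding is offsets[1] + 1 = 2 + 1 = 3.
Therefore, in the code,
fields = data_x.max().values should be changed to fields = data_x.max().values + 1, because after label encoding the largest code is one less than the number of distinct values.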
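The following is a NumPy sketch of the shared-table lookup described above. In PyTorch the table would typically be `torch.nn.Embedding(sum(field_dims), embed_dim)`, with the offsets added to x inside forward. Here a plain array stands in for it so the indexing is easy to inspect. The class name `SharedFieldEmbedding` and the random initialization are assumptions for illustration.

```python
import numpy as np

class SharedFieldEmbedding:
    """Hypothetical NumPy stand-in for one embedding table shared by all fields."""

    def __init__(self, field_dims, embed_dim, seed=0):
        field_dims = np.asarray(field_dims, dtype=np.int64)
        rng = np.random.default_rng(seed)
        # One row per (field, value) pair: sum(field_dims) rows in total.
        self.table = rng.normal(size=(int(field_dims.sum()), embed_dim))
        self.offsets = np.concatenate(([0], np.cumsum(field_dims)[:-1]))

    def rows(self, x):
        # x: (batch, num_fields) codes, each within its own field's vocabulary.
        return x + self.offsets

    def __call__(self, x):
        return self.table[self.rows(x)]  # (batch, num_fields, embed_dim)

emb = SharedFieldEmbedding([2, 4, 2], embed_dim=4)
x = np.array([[0, 1, 1]])     # X0 = 0, X1 = 1, X2 = 1
print(emb.offsets.tolist())   # [0, 2, 6]
print(emb.rows(x).tolist())   # [[0, 3, 7]]
print(emb(x).shape)           # (1, 3, 4)
```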
Thanks for clearing that up. One more question: when creating the Embedding with torch.nn.Embedding(sum(feature_fields)+1,), is the +1 no longer needed?
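Yes, it can be dropped: with the corrected fields, the largest row index looked up is sum(feature_fields) - 1.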
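As a quick check of that answer, here is a plain-Python sketch. It assumes field sizes already include the +1 fix and computes how many rows the shared table actually needs. `table_rows_needed` is a hypothetical helper.

```python
from itertools import accumulate

def table_rows_needed(field_dims):
    """Rows a shared embedding table needs, given correct (max + 1) field sizes."""
    offsets = [0, *accumulate(field_dims)][:-1]
    # The largest row ever looked up is the last code of the last field.
    largest_row = offsets[-1] + field_dims[-1] - 1
    return largest_row + 1

field_dims = [2, 4, 2]
print(table_rows_needed(field_dims), sum(field_dims))  # 8 8
```

Because the count always equals sum(field_dims), an extra +1 row would simply never be indexed.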