Comments (6)

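Prayforhanluo commented on June 22, 2024

The fields here are used to generate the offsets. After label encoding, each column of the input is encoded independently. To pass all columns through a single embedding layer in one lookup, the codes have to be made cumulative. For example, if feature A takes the two values 0, 1 and feature B takes the three values 0, 1, 2, then B's 0, 1, 2 must become 2, 3, 4. That way A's 0 and B's 0 are not confused in the embedding.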
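The A/B example in this comment can be sketched as follows. This is a minimal illustration with NumPy, not the repository's code; the names field_dims, offsets and shared_idx are chosen here for illustration.

```python
import numpy as np

# Example from the comment: feature A has 2 categories, feature B has 3.
field_dims = np.array([2, 3])

# Offset of each field = total number of categories in all preceding fields.
offsets = np.concatenate(([0], np.cumsum(field_dims)[:-1]))

# Two rows of label-encoded inputs; the columns are (A, B).
x = np.array([[0, 0],
              [1, 2]])

# Broadcasting adds each column's offset, so B's 0, 1, 2 become 2, 3, 4.
shared_idx = x + offsets
print(offsets)      # [0 2]
print(shared_idx)   # [[0 2]
                    #  [1 4]]
```

After the shift, A occupies rows 0 to 1 and B occupies rows 2 to 4 of the shared table, so A's 0 and B's 0 no longer point at the same row.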

from ctr_algorithm.

yanduoduan commented on June 22, 2024

I got your code running and pulled out one sample, shown below:
Network input: [ 1, 2, 0, 110, 823, 1, 712, 28, 0, 5703,
12062, 575, 1, 0, 128, 3, 2, 36, 0, 1, 0, 18]
offset: [ 0 1 6 11 997 1868 1885 2653 2714 2732 11275 58583
61188 61191 61194 61641 61645 61650 61790 61793 61830 61973]
input + offset: [ 1, 3, 6, 121, 1820, 1869, 2597, 2681, 2714, 8435,
23337, 59158, 61189, 61191, 61322, 61644, 61647, 61686, 61790, 61794,
61830, 61991]
For example, the index of the second feature column in input + offset is 3, but shouldn't it be 4? The second column of the network input is 2, out of the values 0, 1, 2. The first column's indices are 0, 1, so the second column should start at 2 and map to 2, 3, 4. So I think it should be 4, yet the result is 3. Is my understanding wrong?

Prayforhanluo commented on June 22, 2024

Yes, indeed. The codes need to be accumulated and then the index shifted by +1, so this looks like a bug. It should be changed to fields = data_x.max().values + 1

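Prayforhanluo commented on June 22, 2024

Bug
feature_fields is needed because all features share a single Embedding table, so each column's indices must be offset cumulatively.
e.g. field_dims = [2, 4, 2], offsets = [0, 2, 6]

So, in the actual lookup table:
rows 0 - 1 correspond to feature X0, i.e. field_dims[0]
rows 2 - 5 correspond to feature X1, i.e. field_dims[1]
rows 6 - 7 correspond to feature X2, i.e. field_dims[2]
However, the x passed to forward(self, x) only takes values within each feature's own vocabulary.
For example: if X1 takes the value 1, its row in the Embedding is offsets[1] + 1 = 2 + 1 = 3

Therefore, in the code,
fields = data_x.max().values should be changed to fields = data_x.max().values + 1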
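The effect of the missing +1 can be checked on a small case. The sketch below uses a NumPy array as a stand-in for the DataFrame data_x (so data_x.max(axis=0) plays the role of data_x.max().values); the sample rows and the helper make_offsets are hypothetical and only chosen to match field_dims = [2, 4, 2].

```python
import numpy as np

# Hypothetical label-encoded data: three columns with 2, 4 and 2 categories.
data_x = np.array([[0, 3, 1],
                   [1, 0, 0],
                   [1, 2, 1]])

def make_offsets(fields):
    """Start row of each field = sum of the sizes of all earlier fields."""
    return np.concatenate(([0], np.cumsum(fields)[:-1]))

buggy_fields = data_x.max(axis=0)        # [1, 3, 1]: largest code, not the count
fixed_fields = data_x.max(axis=0) + 1    # [2, 4, 2]: number of categories

buggy_offsets = make_offsets(buggy_fields)   # [0, 1, 4]
fixed_offsets = make_offsets(fixed_fields)   # [0, 2, 6]

row = data_x[1]                     # [1, 0, 0]
buggy_idx = row + buggy_offsets     # [1, 1, 4]: X0=1 and X1=0 collide on row 1
fixed_idx = row + fixed_offsets     # [1, 2, 6]: every field gets its own rows
```

With the maximum code used as the field size, the last category of one field and the first category of the next share a row; adding 1 turns the maximum code into a category count and removes the overlap.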

yanduoduan commented on June 22, 2024

Thanks for clearing that up. One more question: when building the Embedding, torch.nn.Embedding(sum(feature_fields)+1,), does this line no longer need the +1?

Prayforhanluo commented on June 22, 2024

Yes, the +1 can be dropped.
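A quick way to see why sum(feature_fields) rows are enough once fields holds category counts: the largest shared index is then sum(feature_fields) - 1. The sketch below assumes the corrected field sizes and uses a plain NumPy array as a stand-in for the embedding weight rather than torch.nn.Embedding itself; embed_dim and the table contents are arbitrary.

```python
import numpy as np

field_dims = np.array([2, 4, 2])    # example sizes from the thread
offsets = np.concatenate(([0], np.cumsum(field_dims)[:-1]))

# The largest code in each field is field_dims - 1, so the largest shared
# index is offsets[-1] + field_dims[-1] - 1 = sum(field_dims) - 1.
max_index = int((offsets + field_dims - 1).max())

# Stand-in for the embedding weight: sum(field_dims) rows, embed_dim columns.
embed_dim = 4
num_rows = int(field_dims.sum())
table = np.arange(num_rows * embed_dim).reshape(num_rows, embed_dim)

x = np.array([1, 3, 1])             # every field at its largest code
vectors = table[x + offsets]        # rows 1, 5, 7: all within the table
print(max_index, table.shape)       # 7 (8, 4)
```

The extra row was only needed while fields held maximum codes, since that undercounted the total number of categories.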
