

Question about LSTM padding and masked loss computation (information-extraction-chinese, 18 comments, closed)

wenfeixiang1991 commented on August 22, 2024

Question about LSTM padding and masked loss computation

from information-extraction-chinese.

Comments (18)

wenfeixiang1991 commented on August 22, 2024

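After working out how a masked loss and attention actually work, I think the reason this project doesn't mask the loss is probably that the dataset doesn't need it. Even when unknown words and blank (padding) words are included, they are treated as a kind of information, and because word-level attention is present, that more or less makes up for the missing loss mask. Besides, masking the loss first and then adding attention is more trouble than adding attention to fixed-length sequences, so my guess is that the author simply added attention directly instead of doing mask + attention.
No one has answered, so I'm closing this.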
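For readers unfamiliar with the term, here is a minimal sketch of what masking the loss means, assuming a model that emits one prediction per time step. The function name `masked_mean_nll` and its argument layout are made up for illustration and are not taken from the project.

```python
import math

def masked_mean_nll(probs, targets, lengths):
    """Mean negative log-likelihood over real (non-padded) time steps only.

    probs:   [batch][time][num_classes] predicted probabilities
    targets: [batch][time] gold class ids (values at padded steps are ignored)
    lengths: [batch] true length of each sequence before padding
    """
    total, count = 0.0, 0
    for seq_probs, seq_targets, length in zip(probs, targets, lengths):
        for t, (step_probs, target) in enumerate(zip(seq_probs, seq_targets)):
            if t >= length:
                continue  # padded position: contributes nothing to the loss
            total += -math.log(step_probs[target])
            count += 1
    return total / max(count, 1)

# One sequence of true length 2, padded to 3 steps (toy numbers).
probs = [[[0.5, 0.5], [0.25, 0.75], [0.0, 1.0]]]
targets = [[0, 1, 0]]
loss = masked_mean_nll(probs, targets, lengths=[2])
```

Without the mask, the third (padded) step would be averaged into the loss as if it were a real token.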

Mariobai commented on August 22, 2024

I'd like to ask: how does this project handle sentences whose length is not fixed?

wenfeixiang1991 commented on August 22, 2024

@JinJiao8 In initial.py you can see that he fixes the input sentence length:

# length of sentence is 70
    fixlen = 70

Short sentences are padded with "BLANK",
unknown words are replaced with "UNK",
and "BLANK" and "UNK" are each assigned a word id,
which means each of them has its own word embedding.
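As a rough sketch of the padding scheme described above (not the project's actual code): `encode_sentence` is a hypothetical helper, the toy `word2id` vocabulary is invented, and truncating over-long sentences is an assumption.

```python
def encode_sentence(tokens, word2id, fixlen=70):
    """Map tokens to ids, replace unknown tokens with UNK, pad with BLANK."""
    unk_id = word2id["UNK"]
    blank_id = word2id["BLANK"]
    # Assumption: sentences longer than fixlen are simply truncated.
    ids = [word2id.get(tok, unk_id) for tok in tokens[:fixlen]]
    ids += [blank_id] * (fixlen - len(ids))
    return ids

# Toy vocabulary, for illustration only.
word2id = {"BLANK": 0, "UNK": 1, "投": 2, "入": 3}
print(encode_sentence(list("投入资"), word2id, fixlen=5))  # [2, 3, 1, 0, 0]
```

Every sentence comes out with exactly `fixlen` ids, so a batch can be stored as one dense matrix.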

Mariobai commented on August 22, 2024

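I'd like to ask what this part of the code means:

# max length of position embedding is 60 (-60 ~ +60)
maxlen = 60

This line doesn't seem to do anything. It looks like it relates to the position-information vectors.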

wenfeixiang1991 commented on August 22, 2024

Yes. Look at initial.py; it is spelled out clearly there.

Mariobai commented on August 22, 2024

The key question is how the relative position of the entities in the sentence is computed.

wenfeixiang1991 commented on August 22, 2024

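I think it would be better to read through the code first and then ask about whatever is still unclear.
1. First find the index of each entity in the sentence.
2. Iterate over every word in the sentence and record the relative distance from the current word to each of the two entity indices.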

#For Chinese
        en1pos = sentence.find(en1)
        if en1pos == -1:
            en1pos = 0
        en2pos = sentence.find(en2)
        if en2pos == -1:
            en2pos = 0
        
        output = []

        #Embedding the position
        for i in range(fixlen):
            word = word2id['BLANK']
            rel_e1 = pos_embed(i - en1pos)
            rel_e2 = pos_embed(i - en2pos)
            output.append([word, rel_e1, rel_e2])

# embedding the position
def pos_embed(x):
    if x < -60:
        return 0
    if -60 <= x <= 60:
        return x + 61
    if x > 60:
        return 122
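To make the two steps concrete, here is a small self-contained sketch that reuses the thread's `pos_embed` logic on an invented sentence. `position_features` is a hypothetical helper, and the example sentence and entities are made up.

```python
def pos_embed(x, maxlen=60):
    """Clip a relative distance to [-maxlen, maxlen] and shift it to an index >= 0."""
    if x < -maxlen:
        return 0
    if x > maxlen:
        return 2 * maxlen + 2
    return x + maxlen + 1

def position_features(sentence, en1, en2, fixlen=70):
    """For each position i, return (index relative to en1, index relative to en2)."""
    en1pos = max(sentence.find(en1), 0)  # fall back to 0 if the entity is not found
    en2pos = max(sentence.find(en2), 0)
    return [(pos_embed(i - en1pos), pos_embed(i - en2pos)) for i in range(fixlen)]

feats = position_features("小明出生于北京", "小明", "北京")
print(feats[0])  # (61, 56): distance 0 to entity 1, distance -5 to entity 2
print(feats[5])  # (66, 61): distance +5 to entity 1, distance 0 to entity 2
```

A character that sits on an entity has distance 0, which maps to index 61; the sentence does not need to be 61 characters long for that index to appear.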

Mariobai commented on August 22, 2024

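OK, thanks. One more question: the project says it trains character vectors on individual characters. Doesn't that lose the semantic information carried by words? I used the Chinese Wikipedia corpus, which is still about 900 MB after removing English and some other characters, but the character vectors I trained came out at only about 15 MB. Is it really fine to use character vectors?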

wenfeixiang1991 commented on August 22, 2024

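I don't have much hands-on feel for how character vectors and word vectors differ in practice. I always learn the word embeddings during training on our own business corpus rather than using pretrained word vectors, and I always work at the word level; I haven't tried the character level. Intuitively, the word level still seems more reliable.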

Mariobai commented on August 22, 2024

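"I always learn the word embeddings during training on our own business corpus rather than using pretrained word vectors":
What does this sentence mean? Are you using one-hot vectors? What did you train your word vectors with?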

wenfeixiang1991 commented on August 22, 2024

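They sit in the first layer of the neural network as parameters. Like w2v vectors they are dense; the only difference is that they are not updated under the CBOW or skip-gram language-model objective but under my specific classification task.
In essence it is just one layer of parameters, and those parameters are the word vectors.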
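The idea of an embedding table as ordinary trainable parameters can be sketched in plain Python. Everything here is a toy assumption for illustration: the sizes, the one-word logistic-regression "task", and the `train_step` helper are not from the project.

```python
import math
import random

random.seed(0)
vocab_size, dim = 5, 3
# The embedding table is just a parameter matrix: one dense row per word id.
embedding = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
weights = [random.uniform(-0.1, 0.1) for _ in range(dim)]  # toy classifier head

def train_step(word_id, label, lr=0.1):
    """One SGD step of logistic regression on a single word's embedding."""
    vec = embedding[word_id]  # lookup = selecting a row of the table
    score = sum(w * v for w, v in zip(weights, vec))
    pred = 1.0 / (1.0 + math.exp(-score))
    grad = pred - label  # d(log loss) / d(score)
    for k in range(dim):
        g_w = grad * vec[k]
        g_v = grad * weights[k]
        weights[k] -= lr * g_w
        vec[k] -= lr * g_v  # the task gradient updates the word vector itself
    return pred

before = list(embedding[2])
untouched = list(embedding[3])
p1 = train_step(2, 1.0)
p2 = train_step(2, 1.0)
```

Only the row that was looked up changes, and it moves in whatever direction helps the classification loss, with no CBOW or skip-gram objective involved.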

Mariobai commented on August 22, 2024

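Maybe I just don't know enough yet; I still don't understand your approach. I'm using character vectors. Which domain is your entity relation extraction for?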

Mariobai commented on August 22, 2024

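I'd like to ask how to add a validation set to my network model. Right now there is only a training set and a test set, with no validation set. Do you know how to add one?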

Mariobai commented on August 22, 2024

# embedding the position
def pos_embed(x):
    if x < -60:
        return 0
    if -60 <= x <= 60:
        return x + 61
    if x > 60:
        return 122

Why does this return 0 when x is below -60, x + 61 when x is between -60 and 60, and 122 when x is above 60? What is this definition based on?

wenfeixiang1991 commented on August 22, 2024

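It is just an index encoding of the relative distance. Suppose the sentence length is fixed and the relative distance from each word to an entity is clipped to the range -60 to 60. That distance then has to be turned into an index, just like a word index. In the first layer of the network you build a weight matrix W of shape 123 × position_dim (one row for each index from 0 to 122). W is updated as training proceeds, and at prediction time each position index looks up its corresponding vector.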
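A quick check of how many rows such a table needs, using the thread's `pos_embed` unchanged. The value of `position_dim` and the random initialisation are assumptions for illustration.

```python
import random

def pos_embed(x):
    if x < -60:
        return 0
    if -60 <= x <= 60:
        return x + 61
    if x > 60:
        return 122

# Collect every index pos_embed can produce over a wide range of distances.
indices = sorted({pos_embed(d) for d in range(-200, 201)})
num_positions = len(indices)  # indices run from 0 to 122 inclusive

position_dim = 5  # hypothetical embedding size
random.seed(0)
W = [[random.uniform(-0.1, 0.1) for _ in range(position_dim)]
     for _ in range(num_positions)]

vec = W[pos_embed(-3)]  # the vector for "3 positions before the entity" (row 58)
print(num_positions, indices[0], indices[-1])  # 123 0 122
```

Indices 0 and 122 act as shared buckets for "further than 60 to the left" and "further than 60 to the right".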

Mariobai commented on August 22, 2024

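Sorry to bother you again. I still haven't understood the code you explained last time; could you put it in simpler terms?

# embedding the position
def pos_embed(x):
    if x < -60:
        return 0
    if -60 <= x <= 60:
        return x + 61
    if x > 60:
        return 122

For example, say my sentence is: 减灾 1.3亿元投入资金减灾经济效益1.3亿元, with en1pos = 0 and en2pos = 6. When i = 0, why does pos_embed(i - en1pos) have to equal 61? How is the distance between my first word and my first entity defined? My sentence is nowhere near 61 long. Please help me out; I have been thinking about this for a long time.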


wenfeixiang1991 commented on August 22, 2024

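Your sentence isn't that long, but TensorFlow needs fixed dimensions, i.e. some TF operations require dense tensors, so during preprocessing you have to pad your data to a uniform maximum sequence length, at least for an LSTM. That is why the author pads up to the maximum token count as well.
Second, you certainly don't have to use 60. I haven't read the code closely, but a position vector is just an encoding of the current word's position relative to an entity, the same way you give each word a unique index.
For example: max sequence length = 10
Take one sample: word1 word2 word3 word4 word5 word6 word7 word8 word9 word10
where word3 is entity1 and word7 is entity2,
so: word1 word2 entity1 word4 word5 word6 entity2 word8 word9 word10
Then:
a. assign word ids: 1 2 3 4 5 6 7 8 9 10
b. assign position indices relative to entity1: -2 -1 0 1 2 3 4 5 6 7
c. assign position indices relative to entity2: -6 -5 -4 -3 -2 -1 0 1 2 3
So b and c are the position encodings, one value per word. You'll notice some of those indices are negative; just find a way to shift them so they are non-negative.
Then a, b and c together are fed into the LSTM model as inputs to build the embedding layers.
If you keep asking questions like this, I think you'll have to start paying tuition :>)
You really could read up on natural language processing and deep learning first and then look at the code.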
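The ten-word example above can be reproduced in a few lines. Shifting by `max_len - 1` is one possible way to make the indices non-negative and is an assumption here; the project's own code uses a fixed offset of 61 instead.

```python
max_len = 10
words = [f"word{i}" for i in range(1, max_len + 1)]
e1_idx, e2_idx = 2, 6  # zero-based positions of word3 (entity1) and word7 (entity2)

# Steps b and c: signed distance from each word to each entity.
rel_e1 = [i - e1_idx for i in range(max_len)]
rel_e2 = [i - e2_idx for i in range(max_len)]

# Shift so every index is >= 0; the largest possible |distance| is max_len - 1.
offset = max_len - 1
pos_e1 = [d + offset for d in rel_e1]
pos_e2 = [d + offset for d in rel_e2]
table_rows = 2 * max_len - 1  # number of distinct shifted indices

print(rel_e1)  # [-2, -1, 0, 1, 2, 3, 4, 5, 6, 7]
print(rel_e2)  # [-6, -5, -4, -3, -2, -1, 0, 1, 2, 3]
print(pos_e1)  # [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
```

The entity itself always has distance 0, so after the shift it lands on index `offset`, regardless of how long the sentence is.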

Mariobai commented on August 22, 2024

Thank you very much for your guidance.
