onehaitao / distant-supervised-relation-extraction


Implementation of Neural Relation Extraction with Selective Attention over Instances.

License: MIT License

Python 99.09% Shell 0.91%

relation-extraction distant-supervision nyt-corpus


Environment Requirements

python 3.6
pytorch 1.3.0
gensim 3.8.0
matplotlib 3.1.2
sklearn 0.21.3

Data

NYT10

Usage

Download the NYT10 dataset and decompress it in the current directory.
Preprocess the original data; the processed data is stored in the processed folder.

python preprocess.py

Use one of the following commands to start the program.

python run.py --encoder='cnn' --selector='one'
python run.py --encoder='cnn' --selector='att'
python run.py --encoder='cnn' --selector='avg'
python run.py --encoder='pcnn' --selector='one'
python run.py --encoder='pcnn' --selector='att'
python run.py --encoder='pcnn' --selector='avg'

For more options, run python run.py -h.
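The six commands above are every pairing of two encoders with three selectors. As a small convenience sketch (the names `ENCODERS`, `SELECTORS`, and `build_commands` are illustrative and not part of the repository), the same command lines can be generated in a loop rather than typed by hand:

```python
from itertools import product

# Option values copied from the commands listed in the README.
ENCODERS = ["cnn", "pcnn"]
SELECTORS = ["one", "att", "avg"]


def build_commands():
    """Return one run.py command line per encoder/selector pair."""
    return [
        f"python run.py --encoder='{encoder}' --selector='{selector}'"
        for encoder, selector in product(ENCODERS, SELECTORS)
    ]


commands = build_commands()
for command in commands:
    print(command)
```

Each printed line can be pasted into a shell, or the list can be passed to a job runner of your choice.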

Run draw.py to visualize the results.

python draw.py

Result

The results of my version are presented as follows:

The training log can be seen in train.log.

Note:

Some settings may differ from those described in the paper.
No validation set is used during training.
Some errors may exist in my code, but on the whole it is correct.
If you have any suggestions, please open an issue.

Reference Link


distant-supervised-relation-extraction's Issues

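Questions about the forward function of the ATT class

In the forward function of the ATT class:
(1) Does att_sen_reps represent an importance score computed for each sentence?
(2) Does att_score represent the score of each sentence for each relation?
(3) bag_reps = torch.mm(att_weight, sen_reps): why does this step multiply the per-relation sentence weights by the sentence representations? How should this be understood?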

    for i in range(batch_size):
        sen_reps = reps[scope[i]:scope[i+1], :].\
            view(-1, self.filter_num*self.piece)
        att_sen_reps = torch.mul(sen_reps, self.attention)  # [n,C]*[1,C] = [n,C]
        rel_embedding = self.dense.weight.t()  # [230,53], i.e. shape [C,N], where N is the number of relation types
        att_score = torch.mm(att_sen_reps, rel_embedding)  # [n,N] = [n,C]*[C,N], e.g. [1,53]
        att_weight = F.softmax(att_score, dim=0).t()
        bag_reps = torch.mm(att_weight, sen_reps)
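One way to read the snippet above, offered as an interpretation rather than the author's explanation: att_score[i][r] measures how well sentence i matches relation r, the softmax over dim=0 normalizes those scores across the sentences of one bag, and after the transpose each row of att_weight is a distribution over sentences for one relation. Multiplying by sen_reps then gives one weighted-average bag representation per relation. The plain-Python sketch below reproduces that arithmetic on toy sizes; the helper names and the toy values are assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def selective_attention(sen_reps, attention, rel_embedding):
    """sen_reps: n x C, attention: length C, rel_embedding: C x N.

    Returns att_weight (N x n) and bag_reps (N x C).
    """
    n, c_dim = len(sen_reps), len(attention)
    n_rel = len(rel_embedding[0])
    # att_score[i][r] = sum_c sen_reps[i][c] * attention[c] * rel_embedding[c][r]
    att_score = [
        [
            sum(sen_reps[i][c] * attention[c] * rel_embedding[c][r]
                for c in range(c_dim))
            for r in range(n_rel)
        ]
        for i in range(n)
    ]
    # Softmax across sentences (dim=0) for each relation, stored transposed.
    att_weight = [softmax([att_score[i][r] for i in range(n)])
                  for r in range(n_rel)]
    # Each relation gets its own weighted average of the sentence vectors.
    bag_reps = [
        [sum(att_weight[r][i] * sen_reps[i][c] for i in range(n))
         for c in range(c_dim)]
        for r in range(n_rel)
    ]
    return att_weight, bag_reps


# Toy bag: 2 sentences, C = 2 features, N = 2 relations (made-up numbers).
toy_sen_reps = [[1.0, 0.0], [0.0, 1.0]]
toy_attention = [1.0, 1.0]
toy_rel_embedding = [[2.0, 0.0], [0.0, 2.0]]
att_weight, bag_reps = selective_attention(
    toy_sen_reps, toy_attention, toy_rel_embedding)
print(att_weight)
print(bag_reps)
```

In this toy bag, relation 0 puts most of its weight on the first sentence, so its bag representation is pulled toward that sentence's vector.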

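Questions about the __scorer function of the Eval class

In the __scorer function of the Eval class:
(1) Why do the predictions need to be sorted by probability in descending order? The code is:
sorted_pred_result = sorted(pred_result, key=lambda x: x['prob'], reverse=True)
(2) Why are the two nested for loops below needed to collect the probability of every sentence for every relation? The code is: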

        for i in range(all_probs.shape[0]):
            for j in range(1, self.class_num):
                pred_result.append(
                    {
                        'prob': all_probs[i][j],
                        'flag': int(j == labels[i])
                    }
                )

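(3) At test time, when computing precision, why is the denominator (i+1)? How should this be understood? There are 5037084 probabilities in total across all sentences and all relations, while only 1950 samples are correct. The code is: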

        for i, item in enumerate(sorted_pred_result):  
            correct += item['flag']  
            precision.append(float(correct) / float(i + 1)) # [5037084]
            recall.append(float(correct) / float(facts_num)) # [5037084]
        auc = sklearn.metrics.auc(x=recall, y=precision)
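A possible reading of the loop above, stated as an interpretation: once the (sentence, relation) pairs are sorted by probability, position i acts as a threshold, and the top i+1 pairs are the ones predicted positive at that threshold. Precision is therefore correct / (i + 1), while recall divides by the fixed number of true facts. The sketch below runs the same arithmetic on four made-up predictions; `pr_curve` and `trapezoid_auc` are illustrative names, and `trapezoid_auc` is a hand-written trapezoidal rule standing in for sklearn.metrics.auc.

```python
def pr_curve(pred_result, facts_num):
    """Precision and recall after each rank of the sorted predictions."""
    ranked = sorted(pred_result, key=lambda x: x['prob'], reverse=True)
    precision, recall = [], []
    correct = 0
    for i, item in enumerate(ranked):
        correct += item['flag']
        # The top (i + 1) items are the predicted positives at this cutoff.
        precision.append(correct / (i + 1))
        recall.append(correct / facts_num)
    return precision, recall


def trapezoid_auc(x, y):
    """Area under the curve y(x) using the trapezoidal rule."""
    return sum((x[k] - x[k - 1]) * (y[k] + y[k - 1]) / 2
               for k in range(1, len(x)))


# Four made-up predictions, two of which are true facts.
toy_predictions = [
    {'prob': 0.9, 'flag': 1},
    {'prob': 0.8, 'flag': 0},
    {'prob': 0.6, 'flag': 1},
    {'prob': 0.1, 'flag': 0},
]
precision, recall = pr_curve(toy_predictions, facts_num=2)
auc = trapezoid_auc(recall, precision)
print(precision, recall, auc)
```

Because most of the candidate pairs are negatives, precision falls as the cutoff moves down the ranking, which is why the curve is built from the sorted list rather than from a single threshold.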

About the training log

Sorry to bother you again. I deleted the train.log file before training. Is this log file generated automatically while the code runs?

start to train the model ...
Train: 0%| | 0/1754 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "D:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
AttributeError: 'NYTDataLoader' object has no attribute '__collate_fn'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "D:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
AttributeError: 'NYTDataLoader' object has no attribute '__collate_fn'
Traceback (most recent call last):
File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 990, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "D:\Anaconda\envs\pytorch\lib\multiprocessing\queues.py", line 108, in get
raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "Z:/distant-supervised-relation-extraction-main/run.py", line 114, in <module>
runner.train()
File "Z:/distant-supervised-relation-extraction-main/run.py", line 46, in train
for step, (data, label, scope) in enumerate(data_iterator):
File "D:\Anaconda\envs\pytorch\lib\site-packages\tqdm\_tqdm.py", line 1022, in __iter__
for obj in iterable:
File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
data = self._next_data()
File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1186, in _next_data
idx, data = self._get_data()
File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1152, in _get_data
success, data = self._try_get_data()
File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1003, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 10800, 15280) exited unexpectedly

The above is the error output from the run. Why does it report AttributeError: 'NYTDataLoader' object has no attribute '__collate_fn'?
I can see that this attribute is defined in the NYTDataloader class.
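A likely explanation, offered as an assumption since the class source is not shown here: if the method is defined inside the class as __collate_fn, Python's name mangling stores it under _ClassName__collate_fn. The traceback shows Windows worker processes started through multiprocessing spawn, which unpickle the loader in the child process. A bound method carries only the unmangled name in its __name__, so a lookup by that name on the instance can fail even though the method is visibly defined. The sketch below shows the mangling with two hypothetical stand-in classes; it does not start any workers.

```python
class NYTDataLoaderSketch:
    """Hypothetical stand-in: method name starts with two underscores."""

    def __collate_fn(self, batch):  # stored as _NYTDataLoaderSketch__collate_fn
        return batch

    def get_collate(self):
        # Inside the class body this reference is mangled automatically.
        return self.__collate_fn


class RenamedLoaderSketch:
    """Same idea with a single underscore, so no mangling takes place."""

    def _collate_fn(self, batch):
        return batch

    def get_collate(self):
        return self._collate_fn


loader = NYTDataLoaderSketch()
bound = loader.get_collate()
# The bound method reports the unmangled name.
print(bound.__name__)
# Looking that name up from outside the class fails.
print(hasattr(loader, "__collate_fn"))
# The attribute actually lives under the mangled name.
print(hasattr(loader, "_NYTDataLoaderSketch__collate_fn"))

renamed = RenamedLoaderSketch()
# With a single underscore the reported name and the stored name agree.
print(hasattr(renamed, renamed.get_collate().__name__))
```

If this is indeed the cause, renaming the method to use a single leading underscore, or running without worker processes, are two workarounds worth trying.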

data load

Hello, I ran into a problem when running your code. In run.py, the main function fails: when loading the NYT data it reports 'ran out of input', and I don't know why. I tried setting num_worker=0, but that did not solve the problem.