distant-supervised-relation-extraction's Introduction

distant-supervised-relation-extraction

Implementation of Neural Relation Extraction with Selective Attention over Instances.

Environment Requirements

  • python 3.6
  • pytorch 1.3.0
  • gensim 3.8.0
  • matplotlib 3.1.2
  • sklearn 0.21.3

Data

Usage

  1. Download the NYT10 dataset and decompress it in the current directory.
  2. Preprocess the original data. The processed data is stored in the processed folder.
python preprocess.py
  3. Use the following commands to start the program.
python run.py --encoder='cnn' --selector='one'
python run.py --encoder='cnn' --selector='att'
python run.py --encoder='cnn' --selector='avg'
python run.py --encoder='pcnn' --selector='one'
python run.py --encoder='pcnn' --selector='att'
python run.py --encoder='pcnn' --selector='avg'

Run python run.py -h for more details.

  4. Run draw.py to visualize the results.
python draw.py
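
The six run.py invocations listed in the usage steps can also be generated from the two option lists. This is a minimal sketch, assuming run.py sits in the current directory and accepts only the --encoder and --selector values shown in this README; build_commands and run_all are hypothetical helper names, not part of the repository.

```python
import itertools
import subprocess
import sys

# Option values taken from the commands listed in this README.
ENCODERS = ("cnn", "pcnn")
SELECTORS = ("one", "att", "avg")


def build_commands(python=sys.executable):
    """Return one argv list per encoder/selector combination."""
    return [
        [python, "run.py", f"--encoder={encoder}", f"--selector={selector}"]
        for encoder, selector in itertools.product(ENCODERS, SELECTORS)
    ]


def run_all():
    """Run every combination in turn and stop at the first failure."""
    for command in build_commands():
        subprocess.run(command, check=True)


# Show the commands without executing them.
for command in build_commands(python="python"):
    print(" ".join(command))
```

Calling run_all() after preprocessing would train the six configurations one after another.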

Result

The results of my version are shown in the figures pr_cnn and pr_pcnn.

The training log can be seen in train.log.

Note:

  • Some settings may be different from those mentioned in the paper.
  • No validation set used during training.
  • Some errors exist in my code, but on the whole it is correct.
  • If you have any suggestions, please open an issue.

Reference Link

distant-supervised-relation-extraction's People

Contributors

onehaitao

Forkers

cl-zzx abbhay

distant-supervised-relation-extraction's Issues

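Questions about the forward function of the ATT class

In the forward function of the ATT class:
(1) Does att_sen_reps represent an importance score computed for each sentence?
(2) Does att_score represent each sentence's score for each relation?
(3) bag_reps = torch.mm(att_weight, sen_reps): why does this step multiply the sentence representations by the per-relation sentence weights? How should this be understood?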

        for i in range(batch_size):
            sen_reps = reps[scope[i]:scope[i+1], :].\
                view(-1, self.filter_num*self.piece)
            att_sen_reps = torch.mul(sen_reps, self.attention)  # [n, C] * [1, C] = [n, C]
            rel_embedding = self.dense.weight.t()  # shape [C, N], e.g. [230, 53], where N is the number of relation types
            att_score = torch.mm(att_sen_reps, rel_embedding)  # [n, N] = [n, C] * [C, N], e.g. [1, 53]
            att_weight = F.softmax(att_score, dim=0).t()  # [N, n], softmax taken over the n sentences
            bag_reps = torch.mm(att_weight, sen_reps)  # [N, C] = [N, n] * [n, C]
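
One way to read the quoted loop body, offered as a sketch rather than the author's explanation: att_sen_reps is the sentence matrix scaled feature by feature, att_score[s][r] measures how well sentence s matches relation r, and the softmax over dim=0 turns each relation's column into a distribution over the sentences of the bag. Multiplying those weights by sen_reps then gives, for every relation, a weighted average of the sentence vectors, which is the bag representation under that relation. The version below uses plain Python lists instead of tensors so that every shape is explicit; bag_representations and the toy inputs are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def bag_representations(sen_reps, attention, rel_embedding):
    """Hypothetical helper mirroring the quoted loop body for one bag.

    sen_reps:      n x C  (n sentences in the bag, C features each)
    attention:     C      (stands in for self.attention)
    rel_embedding: C x N  (stands in for self.dense.weight.t())
    Returns (att_weight of shape N x n, bag_reps of shape N x C).
    """
    n, C, N = len(sen_reps), len(attention), len(rel_embedding[0])
    # torch.mul(sen_reps, self.attention): scale every feature, still n x C.
    att_sen_reps = [[sen_reps[s][c] * attention[c] for c in range(C)]
                    for s in range(n)]
    # torch.mm(att_sen_reps, rel_embedding): one score per (sentence, relation).
    att_score = [[sum(att_sen_reps[s][c] * rel_embedding[c][r] for c in range(C))
                  for r in range(N)] for s in range(n)]
    # F.softmax(att_score, dim=0).t(): for each relation, normalise over the
    # n sentences, so every row of att_weight (N x n) sums to 1.
    att_weight = [softmax([att_score[s][r] for s in range(n)]) for r in range(N)]
    # torch.mm(att_weight, sen_reps): per relation, a weighted average of the
    # sentence vectors (N x C).
    bag_reps = [[sum(att_weight[r][s] * sen_reps[s][c] for s in range(n))
                 for c in range(C)] for r in range(N)]
    return att_weight, bag_reps


# Toy bag: two sentences, two features, two relations.
weights, bags = bag_representations(
    sen_reps=[[1.0, 0.0], [0.0, 1.0]],
    attention=[1.0, 1.0],
    rel_embedding=[[1.0, 0.0], [0.0, 1.0]],
)
print(weights)  # relation 0 favours sentence 0, relation 1 favours sentence 1
print(bags)
```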

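Questions about the __scorer function of the Eval class

In the __scorer function of the Eval class:
(1) Why do the predictions need to be sorted by probability in descending order? The code is:
sorted_pred_result = sorted(pred_result, key=lambda x: x['prob'], reverse=True)
(2) Why are the two nested for loops below needed to collect the probability of every sentence for every relation? The code is: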

        for i in range(all_probs.shape[0]):
            for j in range(1, self.class_num):
                pred_result.append(
                    {
                        'prob': all_probs[i][j], 
                        'flag': int(j == labels[i])
                    }
                )

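(3) At test time, when computing precision, why is the denominator (i+1)? How should this be understood? There are 5037084 probabilities in total over all sentences and all relations, while only 1950 samples are correct. The code is: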

        for i, item in enumerate(sorted_pred_result):  
            correct += item['flag']  
            precision.append(float(correct) / float(i + 1)) # [5037084]
            recall.append(float(correct) / float(facts_num)) # [5037084]
        auc = sklearn.metrics.auc(x=recall, y=precision)
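
The quoted code reads like a ranking-based precision-recall evaluation, and the sketch below follows that reading rather than a statement from the author. Every (item, relation) pair other than class 0 is treated as one candidate fact, which is why both loops are needed. Sorting by probability lets each position i act as a confidence threshold: the model keeps its top i + 1 candidates, so precision divides by i + 1, while recall divides by the total number of gold facts. The helper name pr_curve, the na_id parameter, and the assumption that facts_num equals the number of positive flags are all hypothetical.

```python
def pr_curve(all_probs, labels, na_id=0):
    """Hypothetical helper: ranking-based precision/recall points.

    all_probs[i][j] is the predicted probability of relation j for item i,
    labels[i] is the gold relation id, na_id is the "no relation" class.
    """
    candidates = []
    for probs, label in zip(all_probs, labels):
        for rel, prob in enumerate(probs):
            if rel == na_id:
                continue  # the NA class is not counted as an extracted fact
            candidates.append((prob, int(rel == label)))
    # Most confident candidates first; the stable sort keeps ties in input order.
    candidates.sort(key=lambda c: c[0], reverse=True)
    facts_num = sum(flag for _, flag in candidates)
    if facts_num == 0:
        raise ValueError("no positive facts to evaluate")
    precision, recall, correct = [], [], 0
    for i, (_, flag) in enumerate(candidates):
        correct += flag
        precision.append(correct / (i + 1))  # correct among the top i + 1
        recall.append(correct / facts_num)   # correct among all gold facts
    return precision, recall


# Toy data: three items, three classes (class 0 is NA), two gold facts.
toy_probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.2, 0.3, 0.5]]
toy_labels = [1, 0, 1]
toy_precision, toy_recall = pr_curve(toy_probs, toy_labels)
print(toy_precision)
print(toy_recall)
```

With 5037084 candidates and 1950 gold facts, precision near the end of the ranking is necessarily tiny; the informative part of the curve is the high-confidence head of the list.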

About the training log

Sorry to bother you again. I deleted the train.log file before training. Is this log file generated automatically while the code runs?

A question about an error when running run.py

start to train the model ...
Train: 0%| | 0/1754 [00:00<?, ?it/s]Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "D:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
AttributeError: 'NYTDataLoader' object has no attribute '__collate_fn'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "D:\Anaconda\envs\pytorch\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
AttributeError: 'NYTDataLoader' object has no attribute '__collate_fn'
Traceback (most recent call last):
File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 990, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "D:\Anaconda\envs\pytorch\lib\multiprocessing\queues.py", line 108, in get
raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "Z:/distant-supervised-relation-extraction-main/run.py", line 114, in <module>
runner.train()
File "Z:/distant-supervised-relation-extraction-main/run.py", line 46, in train
for step, (data, label, scope) in enumerate(data_iterator):
File "D:\Anaconda\envs\pytorch\lib\site-packages\tqdm\_tqdm.py", line 1022, in __iter__
for obj in iterable:
File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
data = self._next_data()
File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1186, in _next_data
idx, data = self._get_data()
File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1152, in _get_data
success, data = self._try_get_data()
File "D:\Anaconda\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1003, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 10800, 15280) exited unexpectedly

The above is the error reported at run time. Why does it say AttributeError: 'NYTDataLoader' object has no attribute '__collate_fn'?
As far as I can see, the definition of the NYTDataLoader class does contain it.
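
A likely cause, offered as a sketch rather than a confirmed diagnosis: Python rewrites a double-underscore method name inside a class body (name mangling), so __collate_fn is stored on the class as _NYTDataLoader__collate_fn. On Windows, DataLoader workers are started with spawn and receive their arguments through pickle, which is what the spawn.py frames in the traceback show. As far as I know, a bound method is pickled as a lookup of its __name__ on the owning object, and __name__ is still the unmangled '__collate_fn', which the worker then cannot find. Newer Python releases may handle this case differently. The classes and helpers below are hypothetical stand-ins, assuming the loader passes self.__collate_fn as collate_fn.

```python
class MangledLoader:
    """Stand-in for a loader whose collate method has a double-underscore name."""

    def __init__(self):
        # Inside the class body this resolves to self._MangledLoader__collate_fn.
        self.collate_fn = self.__collate_fn

    def __collate_fn(self, batch):
        return batch


class RenamedLoader:
    """Same loader with a single-underscore name, which is not mangled."""

    def __init__(self):
        self.collate_fn = self._collate_fn

    def _collate_fn(self, batch):
        return batch


def lookup_by_name(bound_method):
    """Re-resolve a bound method from its owner using only its __name__.

    This imitates, in simplified form, how a bound method is assumed to be
    rebuilt on the receiving side of a pickle round trip.
    """
    return getattr(bound_method.__self__, bound_method.__name__)


def can_resolve(bound_method):
    """Return True if the method can be found again by name."""
    try:
        lookup_by_name(bound_method)
        return True
    except AttributeError:
        return False


print(MangledLoader().collate_fn.__name__)       # '__collate_fn', the unmangled name
print(can_resolve(MangledLoader().collate_fn))   # False
print(can_resolve(RenamedLoader().collate_fn))   # True
```

Under that assumption, renaming the method to _collate_fn or moving it to a module-level function should let the workers find it. Loading data in the main process, without worker processes, avoids the pickling step altogether.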

data load

Hello, I ran into a problem when running your code. The main function in run.py fails: when it loads the NYT data, it reports 'Ran out of input', and I don't know why. I tried changing num_worker=0, but that did not solve the problem.
