Additional Material for the publication:

An Evaluation of State-of-the-Art Approaches to Relation Extraction for Usage on Domain-Specific Corpora

Christoph Brandl, Jens Albrecht and Renato Budinich

This publication was created as part of the research group Future Engineering.

Manually Labelled Future Engineering Data & Adapted FewRel Data

The folder 'fe-training-data' contains all available examples from our manually labelled Future Engineering data. They are split into training, test and evaluation data files. The data set is based on articles extracted from electrive.com, a news provider targeting decision-makers, manufacturers and service providers in the e-mobility sector.

In addition, the folder 'fewrel-training-data' contains the training and evaluation data used from the FewRel data set, as described in the conference papers.

Implementations of Different Relation Extraction Approaches

This repository contains different approaches to the task of relation extraction from text. At the moment, it contains working implementations of the following approaches:

Entity-aware BLSTM based on this GitHub repository
ERNIE based on this GitHub repository
R-BERT based on this GitHub repository
Matching the Blanks BERT based on this GitHub repository
BERT Pair based on this GitHub repository

In addition, the repository contains a converter that parses TSV files exported from the INCEpTION annotation tool and transforms them into a data format similar to that of the FewRel data.
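To illustrate what a FewRel-style target format can look like, here is a minimal sketch. It assumes the public FewRel layout (a dictionary keyed by relation label, where each instance holds `tokens` plus a head `h` and tail `t` given as name, entity ID and token positions). The input structure, the function name `to_fewrel_like`, the example sentence and the relation label are all invented for illustration; the converter in this repository reads INCEpTION TSV files and may differ in detail.

```python
import json
from collections import defaultdict


def to_fewrel_like(examples):
    """Group simplified annotated sentences by relation label, FewRel style."""
    grouped = defaultdict(list)
    for ex in examples:
        tokens = ex["tokens"]
        h_start, h_end = ex["head_span"]  # token offsets, end exclusive
        t_start, t_end = ex["tail_span"]
        grouped[ex["relation"]].append({
            "tokens": tokens,
            # FewRel stores [entity name, entity ID, list of token positions].
            # The ID is None here because this sketch assumes no linked IDs.
            "h": [" ".join(tokens[h_start:h_end]).lower(), None,
                  [list(range(h_start, h_end))]],
            "t": [" ".join(tokens[t_start:t_end]).lower(), None,
                  [list(range(t_start, t_end))]],
        })
    return dict(grouped)


# Invented example sentence with a hypothetical relation label.
examples = [
    {
        "tokens": ["ACME", "builds", "the", "Falcon", "in", "Springfield", "."],
        "head_span": (0, 1),
        "tail_span": (3, 4),
        "relation": "manufactures",
    },
]

data = to_fewrel_like(examples)
print(json.dumps(data, indent=2))
```

Grouping by relation label keeps the data directly usable for few-shot sampling, where a fixed number of instances is drawn per relation.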

Requirements

python == 3.6
torch >= 1.5.0
transformers == 3.0.0
nltk >= 3.2.5
rdflib >= 5.0.0
tagme >= 0.1.3
flair >= 0.6.0
wptools >= 0.4.17
pydotplus >= 2.0.2
graphviz >= 0.10.1
lime >= 0.2.0.1

A requirements.txt file is included in the repository for installing all needed libraries in the correct versions.
However, note that some of the libraries cannot be installed via a requirements file and have to be installed separately, in particular PyTorch, Flair and PyCurl.

Installation

In order to use the approaches in this repository, some additional files, such as pre-training checkpoints or additional data sources of the approaches, have to be downloaded.

The Matching the Blanks GitHub repository provides a data file for the pre-training process of the BERT model:

Pre-training data for MTB training

The authors of the ERNIE approach provide additional data:

The data used for fine-tuning the approaches to the specific tasks is also provided:

The Entity-aware BLSTM approach uses pre-trained GloVe vectors for word representation (the extracted file should be located in a resource folder inside the approach's folder):

GloVe pre-trained word vectors

The downloaded data can be extracted and moved into the corresponding folder of the approach in the repository.

Usage

Each of the above approaches has its own Jupyter notebook, in which the approach can be trained (fine-tuned) on one of the datasets. At the end of each notebook, all needed information, including the trained model weights and additional resources, is stored in checkpoint files. This training step is a prerequisite for using the models later for inference on new sentences in the Text2RelationGraph notebook.

The notebook Text2RelationGraph contains a complete processing pipeline from unannotated text to RDF triples that build a knowledge graph. The approach to use can be chosen dynamically within the notebook. The notebook uses the information previously trained and stored by the approaches' individual notebooks.
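As a rough illustration of the final step, the sketch below turns extracted (head, relation, tail) tuples into RDF triples in N-Triples syntax. It uses only the standard library to stay self-contained; rdflib is listed in the requirements, so the notebook presumably builds the graph with it instead. The namespace `http://example.org/`, the helper names and the example tuples are assumptions made for this example.

```python
from urllib.parse import quote

# Hypothetical base namespace; the notebook may use a different one.
BASE = "http://example.org/"


def to_iri(name):
    """Turn an entity or relation label into an IRI in angle brackets."""
    local = quote(name.strip().replace(" ", "_"), safe="")
    return "<{}{}>".format(BASE, local)


def to_ntriples(relations):
    """Serialize (head, relation, tail) tuples as N-Triples lines."""
    lines = []
    for head, relation, tail in relations:
        lines.append("{} {} {} .".format(
            to_iri(head), to_iri(relation), to_iri(tail)))
    return "\n".join(lines)


# Invented model output for illustration.
extracted = [
    ("ACME", "manufactures", "Falcon"),
    ("ACME", "located in", "Springfield"),
]

ntriples = to_ntriples(extracted)
print(ntriples)
```

Each output line is one statement of the knowledge graph, so entities that occur in several sentences are merged automatically because they map to the same IRI.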
Additionally, all approaches can be evaluated on different datasets. Metrics such as accuracy, precision, recall and F1 score are calculated, and a confusion matrix is plotted.
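The metrics named above can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code: it assumes macro-averaging over relation labels (the averaging mode actually used is not stated here), and the function names and the label names `rel_a` and `rel_b` are hypothetical.

```python
def confusion_matrix(gold, pred, labels):
    """Rows are gold labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for g, p in zip(gold, pred):
        matrix[index[g]][index[p]] += 1
    return matrix


def evaluate(gold, pred):
    """Compute accuracy and macro-averaged precision, recall and F1."""
    labels = sorted(set(gold) | set(pred))
    matrix = confusion_matrix(gold, pred, labels)
    n = len(labels)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "labels": labels,
        "accuracy": correct / len(gold) if gold else 0.0,
        "precision": sum(precisions) / n if n else 0.0,
        "recall": sum(recalls) / n if n else 0.0,
        "f1": sum(f1s) / n if n else 0.0,
        "confusion_matrix": matrix,
    }


# Invented labels for illustration.
gold = ["rel_a", "rel_a", "rel_b", "rel_b"]
pred = ["rel_a", "rel_b", "rel_b", "rel_b"]

result = evaluate(gold, pred)
print(result)
```

Macro-averaging weights every relation equally, which makes weak performance on rare relations visible; a micro-average would instead be dominated by the frequent ones.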
