
fe-relation-extraction-natl21

Additional Material for the publication:

An Evaluation of State-of-the-Art Approaches to Relation Extraction for Usage on Domain-Specific Corpora

Christoph Brandl, Jens Albrecht and Renato Budinich

This publication was created as part of the research group Future Engineering.

Manually Labelled Future Engineering Data & Adapted FewRel Data

The folder 'fe-training-data' contains all available examples from our manually labelled Future Engineering data. They are split into training, test and evaluation data files. The data set is based on articles extracted from electrive.com, a news provider targeting decision-makers, manufacturers and service providers in the e-mobility sector.

In addition, the folder 'fewrel-training-data' contains the training and evaluation data taken from the FewRel data set, as described in the conference papers.

Implementations of Different Relation Extraction Approaches

This repository contains several approaches to the task of relation extraction from text. At the moment, it contains working implementations of the following approaches:

In addition, the repository contains a converter that parses TSV files exported from the INCEpTION annotation tool and transforms them into a data format similar to that of the FewRel data.
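To illustrate the target of such a conversion, here is a minimal sketch of a FewRel-style record. It assumes the publicly documented FewRel layout (a dictionary keyed by relation, whose instances hold `tokens` plus `h` and `t` entries of the form `[name, id, [[token indices]]]`). The function names, the empty entity id and the example sentence are hypothetical, and the parsing of the INCEpTION TSV itself is left out; the converter in the repository may work differently.

```python
from collections import defaultdict
import json


def to_fewrel_instance(tokens, head_span, tail_span):
    """Build one FewRel-style instance from tokens and two (start, end) spans.

    Spans are token offsets with an exclusive end. The entity id slot is
    left empty because no knowledge-base id is assumed here.
    """
    def entity(span):
        start, end = span
        name = " ".join(tokens[start:end])
        return [name, "", [list(range(start, end))]]

    return {"tokens": tokens, "h": entity(head_span), "t": entity(tail_span)}


def group_by_relation(annotated):
    """Group (relation, tokens, head_span, tail_span) tuples by relation label."""
    grouped = defaultdict(list)
    for relation, tokens, head_span, tail_span in annotated:
        grouped[relation].append(to_fewrel_instance(tokens, head_span, tail_span))
    return dict(grouped)


# Hypothetical, hand-written example; not taken from the repository's data.
example = [
    ("produces", ["Acme", "Motors", "builds", "the", "Volt", "X", "."], (0, 2), (4, 6)),
]
fewrel_like = group_by_relation(example)
print(json.dumps(fewrel_like, indent=2))
```

Keeping the token indices next to the entity names lets a model mark the head and tail mentions directly in the token sequence.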

 

Requirements


  • python == 3.6
  • torch >= 1.5.0
  • transformers == 3.0.0
  • nltk >= 3.2.5
  • rdflib >= 5.0.0
  • tagme >= 0.1.3
  • flair >= 0.6.0
  • wptools >= 0.4.17
  • pydotplus >= 2.0.2
  • graphviz >= 0.10.1
  • lime >= 0.2.0.1

A requirements.txt file is included in the repository for installing all needed libraries in the correct versions.
However, note that some of the libraries cannot be installed via a requirements file and have to be installed separately, in particular PyTorch, Flair and PyCurl.
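Since some libraries are installed by hand, it can help to compare the installed versions against the constraints listed above. The following is a minimal, stdlib-only sketch of such a check, not part of the repository. The helper names are hypothetical, only the `==` and `>=` operators from the list are handled, and treating `== 3.6` as "any 3.6.x" is a simplifying assumption rather than pip's exact semantics.

```python
import re


def parse_version(text):
    """Turn '3.0.0' into (3, 0, 0), keeping only the numeric parts."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def satisfies(installed, spec):
    """Check an installed version string against a spec such as '>= 1.5.0'.

    As a simplification, '==' compares only the components that the spec
    names, so '== 3.6' accepts 3.6.9.
    """
    match = re.fullmatch(r"\s*(==|>=)\s*([0-9.]+)\s*", spec)
    if match is None:
        raise ValueError("unsupported spec: {!r}".format(spec))
    op, wanted = match.groups()
    have, want = parse_version(installed), parse_version(wanted)
    if op == "==":
        return have[:len(want)] == want
    # Pad both tuples with zeros so that 1.5 and 1.5.0 compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want


print(satisfies("3.6.9", "== 3.6"))
print(satisfies("1.4.0", ">= 1.5.0"))
```

The installed version strings would come from each library's `__version__` attribute or from `pip list`.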

 

Installation


In order to use the approaches in this repository, some additional files, such as pre-training checkpoints or additional data sources for the approaches, have to be downloaded.

The Matching the Blanks GitHub repository provides a data file for the pre-training process of the BERT model:

The authors of the ERNIE approach provide additional data:

The data used for fine-tuning the approaches on the specific tasks is also provided:

The Entity-aware BLSTM approach uses pre-trained GloVe vectors for word representation (the extracted file should be located in a resource folder inside the approaches folder):

 

The downloaded data can be extracted and moved into the corresponding folder of the approach in the repository.

 

Usage


Each of the above approaches has its own Jupyter notebook, in which the approach can be trained (fine-tuned) on one of the datasets. At the end of each notebook, all needed information, including the trained model weights and additional resources, is stored in checkpoint files. This training step is a prerequisite for later using the models for inference on new sentences in the Text2RelationGraph notebook.

The notebook Text2RelationGraph contains a complete pipeline from unannotated text to RDF triples that build a knowledge graph. One of the approaches can be chosen dynamically within the notebook. The notebook uses the information previously trained and stored by the approaches' individual notebooks.
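As an illustration of the last step of such a pipeline, the sketch below serialises extracted (head, relation, tail) tuples as N-Triples lines using only the standard library. The `http://example.org/` namespace, the helper names and the example triple are assumptions made for illustration. The requirements list rdflib, which would normally take care of graph construction and serialisation, so the notebook itself very likely looks different.

```python
from urllib.parse import quote

# Hypothetical namespace; the repository may use a different one.
BASE = "http://example.org/"


def to_iri(prefix, label):
    """Build an IRI from a free-text label by percent-encoding it."""
    return "<{}{}{}>".format(BASE, prefix, quote(label.replace(" ", "_"), safe=""))


def to_ntriples(relations):
    """Serialise (head, relation, tail) tuples as N-Triples lines."""
    lines = []
    for head, relation, tail in relations:
        lines.append("{} {} {} .".format(
            to_iri("entity/", head),
            to_iri("relation/", relation),
            to_iri("entity/", tail),
        ))
    return "\n".join(lines)


# Hypothetical output of a relation extraction model for one sentence.
extracted = [("Acme Motors", "produces", "Volt X")]
print(to_ntriples(extracted))
```

Each printed line is one subject, predicate, object statement; a set of such statements forms the knowledge graph.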
Additionally, all approaches can be evaluated on different datasets. Metrics such as accuracy, precision, recall and F1 score are calculated, and a confusion matrix is plotted.
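The metrics named above can be computed from gold and predicted relation labels as in the following minimal sketch. It is stdlib-only and uses macro averaging over the relation classes, which is an assumption; the source does not state which averaging the notebook applies. The function names and the example labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(gold, predicted, labels):
    """Rows are gold labels, columns are predicted labels."""
    counts = Counter(zip(gold, predicted))
    return [[counts[(g, p)] for p in labels] for g in labels]


def evaluate(gold, predicted):
    """Return accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(gold) | set(predicted))
    matrix = confusion_matrix(gold, predicted, labels)
    n = len(labels)
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted_i = sum(row[i] for row in matrix)  # column sum
        gold_i = sum(matrix[i])                      # row sum
        p = tp / predicted_i if predicted_i else 0.0
        r = tp / gold_i if gold_i else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "labels": labels,
        "confusion_matrix": matrix,
        "accuracy": sum(matrix[i][i] for i in range(n)) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for four test sentences.
gold = ["produces", "produces", "acquires", "acquires"]
predicted = ["produces", "acquires", "acquires", "acquires"]
result = evaluate(gold, predicted)
print(result)
```

Macro averaging weights every relation class equally, which matters when a domain-specific corpus has few examples for some relations.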

