
kbp-el's Introduction


Knowledge Base Population (KBP)

In this document we will build an application for the slot filling task of the TAC KBP competition. Slot filling involves extracting information about entities in text: the goal is to use a seed knowledge base to create an augmented knowledge base of (entity, relation, entity) tuples, where the second entity is referred to as the "slot". The set of relations is defined by the competition guidelines.
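To make the goal concrete, here is a minimal Python sketch of the tuple shapes involved. The relation names follow the KBP "per:" convention, but the specific entities and facts shown are illustrative, not records from the actual seed knowledge base.

    # Shape of the data only; these tuples are illustrative examples.
    seed_kb = [
        ("Barack Obama", "per:spouse", "Michelle Obama"),  # known fact
    ]

    # The output has the same shape: new (entity, relation, entity) tuples
    # extracted from text, where the second entity fills the "slot".
    extracted = [
        ("Barack Obama", "per:employee_of", "United States Senate"),
    ]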

This example uses a sample of the data from the 2010 task. Note that the data provided in this example application is only 0.2% of the original corpus, so recall (and thus the F1 score) will be low. However, using 100% of the 2010 corpus, this example system achieves an F1 score of XX on the KBP task, which beats the top result of 29 from the 2010 competition.

Note that in order to run the system on the full data set, you need to replace two tables with their full versions (keeping exactly the same schema): sentence and freebase.
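If you do swap in the full tables, a script along the following lines could reload them from dumps. This is a sketch only: it assumes PostgreSQL with the psycopg2 driver and full dumps stored as TSV files matching the existing schemas, and the database name and file paths are placeholders.

    # A minimal sketch: replace the sample tables with full-corpus versions.
    import psycopg2

    conn = psycopg2.connect(dbname="deepdive_kbp")  # placeholder database name
    with conn, conn.cursor() as cur:
        for table, path in [("sentence", "/path/to/sentence_full.tsv"),
                            ("freebase", "/path/to/freebase_full.tsv")]:
            cur.execute("TRUNCATE %s" % table)  # schema stays exactly the same
            with open(path) as f:
                cur.copy_expert("COPY %s FROM STDIN" % table, f)
    # leaving the `with conn:` block commits the transaction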

Application overview

The application is an extension of the mention-level extraction system, so please make sure you have gone through that part of the tutorial and have an understanding of basic relation extraction using DeepDive. The main difference here is that we are now concerned with entity-level relationships, not mention-level. In other words,

given the following input:

  • a set of sentences with NLP features
  • a set of Freebase entities
  • an entity-level training set of the form (entity1, relation, entity2),

instead of producing a set of (mention1, relation, mention2) tuples as the final output, we want to produce tuples of the form (entity1, relation, entity2).

Note that in order to obtain the entity-level result we need to perform entity linking, which will associate mentions in text with Freebase entities (the mentions "Barack Hussein Obama" and "President Barack Obama" both refer to the entity Barack Obama).
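To give a flavor of what entity linking does, below is a minimal Python sketch that links mentions by exact (case-insensitive) string match against Freebase entity names. A real system layers several rules on top of this (aliases, abbreviations, and so on); the dict contents and the Freebase id below are illustrative assumptions.

    def link_mentions(mentions, freebase_names):
        """Map each mention string to a Freebase entity id via exact match."""
        links = {}
        for mention in mentions:
            entity_id = freebase_names.get(mention.lower())
            if entity_id is not None:
                links[mention] = entity_id
        return links

    # Toy example; the id is shown only to illustrate the output shape.
    freebase_names = {"barack obama": "/m/02mjmr"}
    print(link_mentions(["Barack Obama", "the president"], freebase_names))
    # -> {'Barack Obama': '/m/02mjmr'}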

This tutorial will walk you through building a full DeepDive application that extracts relationships between entities in raw text. We use news articles and blogs as our input data and want to extract all pairs of entities that participate in the KBP relations (e.g. Barack Obama and Michelle Obama for the spouse relation).

The application performs the following high-level steps:

  1. Load data from the provided database dump
  2. Extract features. This includes steps to:
  • Extract entity mentions from sentences (see the sketch after this list)
  • Extract lexical and syntactic features from mention-level relation candidates (entity mention pairs in the same sentence)
  • Link Freebase entities to mentions in text (entity linking)
  • Generate positive and negative training examples for relation candidates
  • Extract the mention-level relation candidates that are not training examples
  • Extract the entity-level relation candidates by combining the mention-level candidates with entity linking
  3. Generate a factor graph using inference rules
  4. Perform inference and learning
  5. Generate results
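To illustrate the first sub-step of feature extraction, here is a hedged Python sketch of a mention extractor in the DeepDive extractor style (tab-separated rows in on stdin, rows out on stdout). The three-column input format, the tag set, and the merging of adjacent same-tag tokens into one mention are simplifying assumptions, not the application's exact contract.

    import sys

    # NER tags we treat as entity mentions (an assumed tag set).
    TARGET_TAGS = {"PERSON", "ORGANIZATION", "LOCATION"}

    for line in sys.stdin:
        # Assumed input columns: sentence id, space-separated words,
        # space-separated NER tags (one per word).
        sentence_id, words_col, ners_col = line.rstrip("\n").split("\t")
        words = words_col.split(" ")
        ners = ners_col.split(" ")
        start = None
        for i, tag in enumerate(ners + ["O"]):  # "O" sentinel flushes the last run
            if start is not None and tag != ners[start]:
                # Emit the finished mention: sentence id, token span, text.
                print("\t".join([sentence_id, str(start), str(i),
                                 " ".join(words[start:i])]))
                start = None
            if start is None and tag in TARGET_TAGS:
                start = i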

Let us now go through the steps to get the example KBP system up and running.


Installing DeepDive

This tutorial assumes a working installation of DeepDive. Please go through the example application walkthrough before proceeding.

After following the walkthrough, your deepdive directory should contain a folder called app, which should contain a folder called spouse.

Let's now proceed to setting up the KBP application.


kbp-el's Issues

Preprocess with NLP extractor on new dataset

Hi,

I successfully set up the project and can run KBC on your data. Thanks for your great work!
Now I would like to run KBC on a new dataset. Unfortunately, the link (http://deepdive.stanford.edu/walkthrough-extras#nlp_extractor) mentioned in your guide no longer works. Do you still have a backup of the source code/scripts that generate "data/db_dump/sql_files" from raw text somewhere?

BTW, is it possible to use your pre-trained model on my new dataset?

Best,
Dat
