DBpedia Lookup

DBpedia Lookup is a web service that can be used to look up DBpedia URIs by related keywords. Related means that either the label of a resource matches, or an anchor text that was frequently used in Wikipedia to refer to a specific resource matches (for example the resource http://dbpedia.org/resource/United_States can be looked up by the string "USA"). The results are ranked by the number of inlinks pointing from other Wikipedia pages at a result page.

Web APIs

Two APIs are offered: Keyword Search and Prefix Search. A hosted version of the Lookup service is available on the DBpedia server infrastructure.

Keyword Search

The Keyword Search API can be used to find related DBpedia resources for a given string. The string may consist of a single or multiple words.

Example: Places that have the related keyword "berlin"

http://lookup.dbpedia.org/api/search/KeywordSearch?QueryClass=place&QueryString=berlin

Prefix Search (i.e. Autocomplete)

The Prefix Search API can be used to implement autocomplete input boxes. For a given partial keyword like berl the API returns URIs of related DBpedia resources like http://dbpedia.org/resource/Berlin.

Example: Top five resources for which a keyword starts with "berl"

http://lookup.dbpedia.org/api/search/PrefixSearch?QueryClass=&MaxHits=5&QueryString=berl

Parameters

The query parameters accepted by both endpoints are:

  • QueryString: a string for which a DBpedia URI should be found.
  • QueryClass: a DBpedia class from the Ontology that the results should have (for owl#Thing and untyped resources, leave this parameter empty).
  • MaxHits: the maximum number of returned results (default: 5).
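
As an illustration of how the three parameters combine into a request URL, here is a minimal Python sketch. The helper name build_lookup_url is hypothetical and not part of the Lookup code base; the base URL and parameter names are taken from the examples above.

```python
from urllib.parse import urlencode

# Hosted endpoint, as used in the example URLs above.
BASE_URL = "http://lookup.dbpedia.org/api/search"


def build_lookup_url(api, query_string, query_class="", max_hits=None,
                     base_url=BASE_URL):
    """Build a request URL for KeywordSearch or PrefixSearch (hypothetical helper)."""
    if api not in ("KeywordSearch", "PrefixSearch"):
        raise ValueError("api must be 'KeywordSearch' or 'PrefixSearch'")
    # An empty QueryClass means owl#Thing and untyped resources.
    params = {"QueryClass": query_class}
    if max_hits is not None:
        # When MaxHits is omitted, the service default applies.
        params["MaxHits"] = max_hits
    # urlencode escapes spaces and special characters in multi-word queries.
    params["QueryString"] = query_string
    return f"{base_url}/{api}?{urlencode(params)}"


keyword_url = build_lookup_url("KeywordSearch", "berlin", query_class="place")
prefix_url = build_lookup_url("PrefixSearch", "berl", max_hits=5)
```

The two calls at the end rebuild the Keyword Search and Prefix Search example URLs shown earlier.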

JSON support

By default, all data is returned as XML. The service also returns JSON for any request that includes the Accept: application/json header.
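
A minimal Python sketch of requesting JSON follows. The function names are hypothetical, and the structure of the returned JSON is not described here, so the sketch only parses the response and returns it unchanged.

```python
import json
from urllib.request import Request, urlopen


def make_json_request(url):
    """Prepare a request that asks the service for JSON instead of XML."""
    return Request(url, headers={"Accept": "application/json"})


def fetch_json(url, timeout=10):
    """Send the request and parse the body as JSON (needs network access)."""
    with urlopen(make_json_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not contact the server; fetch_json does.
request = make_json_request(
    "http://lookup.dbpedia.org/api/search/KeywordSearch?QueryClass=place&QueryString=berlin"
)
```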

Running a local mirror of the webservice

Clone and build the DBpedia extraction framework

DBpedia Lookup depends on the core of the DBpedia extraction framework, which is not available in a public Maven repo at the moment. Java 7 is required to compile it.

git clone git://github.com/dbpedia/extraction-framework.git
cd extraction-framework
git checkout DBpedia_3.8
mvn clean install

Clone and build DBpedia Lookup

git clone git://github.com/dbpedia/lookup.git
cd lookup
mvn clean install

Download and configure the index

wget http://spotlight.dbpedia.org/download/dbpedia-lookup-index-3.8.tgz
tar xzvf dbpedia-lookup-index-3.8.tgz

Run the server

./run Server dbpedia-lookup-index-3.8

The server should now be running at http://localhost:1111
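
To try queries against the local mirror, a hosted request URL can be redirected to the local server. This sketch assumes the local server exposes the same /api/search paths as the hosted service, which is not stated explicitly above; to_local_url is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

# Address at which the local server is expected to run (see above).
LOCAL_SERVER = "localhost:1111"


def to_local_url(hosted_url, local_netloc=LOCAL_SERVER):
    """Swap the host of a Lookup URL for the local mirror, keeping path and query."""
    parts = urlsplit(hosted_url)
    return urlunsplit(("http", local_netloc, parts.path, parts.query, parts.fragment))


local_url = to_local_url(
    "http://lookup.dbpedia.org/api/search/KeywordSearch?QueryClass=place&QueryString=berlin"
)
```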

Rebuilding the index

Rebuilding the index is usually not required. If you only intend to run a local mirror of the service, you can download a prebuilt index as outlined above.

To rebuild the index, work through the following steps.

Get the following DBpedia datasets

Download these files from http://downloads.dbpedia.org/current/en/:

  • redirects_en.nt
  • short_abstracts_en.nt
  • instance_types_en.nt
  • article_categories_en.nt

Concatenate all data and sort by URI

This is necessary because indexing in sorted order is significantly faster.

  cat instance_types_en.nt  \
      short_abstracts_en.nt \
      article_categories_en.nt | sort >all_dbpedia_data.nt
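
Before starting a long indexing run, it can be worth checking that the concatenated file really is sorted. The sketch below is one way to do that in Python; first_unsorted_line is a hypothetical helper. It compares lines byte by byte, so it assumes the file was sorted with byte-wise collation (for example with LC_ALL=C set when calling sort). Under other locales, sort may order lines differently and this check could flag a correctly sorted file.

```python
def first_unsorted_line(lines):
    """Return the 1-based number of the first line that sorts before its
    predecessor, or None if the lines are in non-decreasing order."""
    previous = None
    for number, line in enumerate(lines, start=1):
        if previous is not None and line < previous:
            return number
        previous = line
    return None


# Example use on the real file, read in binary mode for byte-wise comparison:
#
#     with open("all_dbpedia_data.nt", "rb") as data:
#         print(first_unsorted_line(data))
```

The function reads the file as a stream and keeps only one line in memory at a time, so it works on large dumps too.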

Get the dataset redirects_en.nt

Redirects are not indexed themselves; the file is only used to exclude redirect pages from the lookup targets.

Run Indexer

The indexer has to be run twice:

  1. with the DBpedia data

     ./run Indexer lookup_index_dir redirects_en.nt all_dbpedia_data.nt
    
  2. with the pignlproc data

     ./run Indexer lookup_index_dir redirects_en.nt nerd_stats_output.tsv
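
The two indexer passes can also be scripted. The following Python sketch wraps the commands shown above; the function names are hypothetical, and it assumes the script is started from the lookup checkout, where ./run lives.

```python
import subprocess


def indexer_commands(index_dir, redirects, inputs):
    """Build one './run Indexer' command line per input file, in order."""
    return [["./run", "Indexer", index_dir, redirects, data] for data in inputs]


def run_indexer(index_dir, redirects, inputs):
    """Run the passes one after another, stopping at the first failure."""
    for command in indexer_commands(index_dir, redirects, inputs):
        subprocess.run(command, check=True)


commands = indexer_commands(
    "lookup_index_dir",
    "redirects_en.nt",
    ["all_dbpedia_data.nt", "nerd_stats_output.tsv"],
)
```

Calling run_indexer with the same arguments would execute both passes in sequence.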
    

Support and feedback

The best way to get support or give feedback on the Lookup project is via the DBpedia discussion mailing list. More technical queries about the code base should be directed to the DBpedia developers mailing list.

The DBpedia wiki also has useful information on the project.

Maintainers
