
Add Wikipedia and Wikidata to Nominatim

OSM contributors frequently tag items with links to Wikipedia and Wikidata. Nominatim can use the page ranking of Wikipedia pages to help indicate the relative importance of OSM features. This is done by calculating an importance score between 0 and 1 based on the number of inlinks to an article about a location. If two places have the same name and one is more important than the other, the Wikipedia score often points to the correct place.

These scripts extract and prepare both Wikipedia page rank and Wikidata links for use in Nominatim.
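As a rough sketch of how inlink counts become a score between 0 and 1 (the exact formula lives in the processing SQL; the table and column names below are placeholders, not necessarily the scripts' real schema), the idea is to scale the logarithm of each article's inlink count by the logarithm of the largest count, so the most-linked article scores 1:

# Illustrative only -- placeholder table/column names, not the scripts' actual schema
psql -d wikiprocessingdb -c "
  SELECT title,
         log(1 + linkcount)
           / (SELECT log(1 + max(linkcount)) FROM page_inlinks) AS importance
    FROM page_inlinks;"

With a scaling like this, two places that share a name can be ranked simply by comparing their importance values.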

Create a new PostgreSQL database for processing

Due to the size of the initial and intermediate tables, processing can be done in an external database:

CREATE DATABASE wikiprocessingdb;
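The same database can also be created from the shell (a sketch assuming a local PostgreSQL server and a role with CREATEDB rights):

# create the processing database and confirm the connection works
createdb wikiprocessingdb
psql -d wikiprocessingdb -c "SELECT current_database();"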

Wikipedia

Processing these data requires a large amount of disk space (~1TB) and considerable time (>24 hours).

Import & Process Wikipedia tables

This step downloads Wikipedia page data SQL dumps, converts them to PostgreSQL files, and imports and processes them together with pagelink information from the Wikipedia language sites to calculate importance scores.

  • The script processes data for whatever set of Wikipedia languages is specified in the languages array at the top of the script (see the sketch after this list)

  • Note that processing the top 40 Wikipedia languages can take over a day and will add nearly 1TB to the processing database. The final output tables will be approximately 11GB and 2GB in size
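A hypothetical, trimmed-down edit of that array, for example to process only a handful of languages during a test run (the real array in import_wikipedia.sh lists many more entries, and the exact variable name may differ):

# hypothetical excerpt near the top of import_wikipedia.sh -- adjust to taste
language=( "en" "de" "fr" "es" "it" )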

To download, convert, and import the data, then process summary statistics and compute importance scores, run:

./import_wikipedia.sh
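Because the run can exceed 24 hours, it is worth detaching it from the terminal and keeping a log; one way to do that (assuming nohup is available):

# run in the background, survive a closed terminal, and capture all output
nohup ./import_wikipedia.sh > wikipedia_import.log 2>&1 &
tail -f wikipedia_import.log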

Wikidata

This script downloads and processes Wikidata to enrich the previously created Wikipedia tables for use in Nominatim.

Import & Process Wikidata

This step downloads Wikidata page data SQL dumps, converts them to PostgreSQL files, and processes them for import into the Nominatim database. It also uses the Wikidata Query Service API to discover and include place types.

  • The script assumes that the Wikipedia tables have already been processed as described above

  • The script requires wikidata_place_types.txt and wikidata_place_type_levels.csv

  • The script requires the jq JSON parser (see the pre-flight sketch after this list)

  • The script processes data for whatever set of Wikipedia languages is specified in the initial languages array

  • The script queries the Wikidata Query Service API and imports all instances of the place types listed in wikidata_place_types.txt

  • The script updates the wikipedia_articles table with the extracted Wikidata
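A quick pre-flight check covering those requirements might look like this (the package manager command assumes a Debian/Ubuntu system; adjust for your distribution):

# install jq, confirm it runs, and confirm the two data files are in the working directory
sudo apt-get install jq
jq --version
ls wikidata_place_types.txt wikidata_place_type_levels.csv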

By including Wikidata in the wikipedia_articles table, Nominatim can make connections on the fly from its placex table to wikipedia_article importance scores.
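For illustration, such a lookup could resemble the following join on an OSM object's wikidata tag (a sketch only: the column names wd_page_title and extratags->'wikidata', and the example osm_id, are assumptions for illustration, not guaranteed by these scripts):

# hypothetical lookup of the importance score for one OSM object, run against a Nominatim database
psql -d nominatim -c "
  SELECT p.osm_type, p.osm_id, wa.importance
    FROM placex p
    JOIN wikipedia_article wa ON wa.wd_page_title = p.extratags->'wikidata'
   WHERE p.osm_type = 'N' AND p.osm_id = 123456;"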

To download, convert, and import the data, then process required items, run:

./import_wikidata.sh
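After the script finishes, a simple sanity check is to count how many articles picked up a Wikidata link (table and column names assumed here; adapt them to the actual layout produced by the scripts):

# rough sanity check: how many rows now carry a Wikidata page title
psql -d wikiprocessingdb -c "
  SELECT count(*) FROM wikipedia_article WHERE wd_page_title IS NOT NULL;"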

License

The source code is available under a GPLv2 license.
