Pelias

Pelias is a set of tools for importing OpenStreetMap data into Elasticsearch, and a simple server to handle queries and autocomplete suggestions.

Requirements

  • PostgreSQL: You'll need a PostGIS-enabled database with OpenStreetMap data, imported with osm2pgsql. NOTE: The import process expects certain fields, so you'll need to use the style file at config/osm2pgsql.style
  • Elasticsearch: Used for search. Download the latest version of Elasticsearch
  • Redis: Used for Geonames lookups when cross-referencing quattroshapes
  • Sidekiq: Used for background processing (also uses Redis)
  • Ruby >= 2

Usage

To get set up, run the following.

Get the code and bundle to download dependencies

$ git clone git@github.com:mapzen/pelias.git && cd pelias
$ bundle

Prepare synonyms

$ bundle exec rake synonyms:build

Set up the index & mappings:

$ bundle exec rake index:create

Prepare geonames

Geonames provide nice alternative names and populations for locations. We cross-reference this data with quattroshapes in the next step in order to provide a better search experience.

$ bundle exec rake geonames:prepare

Insert quattroshapes

These are shapes for various administrative areas. They are provided by the http://quattroshapes.com/ project.

NOTE: These tasks are enqueued via Sidekiq and must be run in isolated steps. You can run them inline by using the environment variable ES_INLINE=1.

$ bundle exec rake quattroshapes:prepare_all
$ bundle exec rake quattroshapes:populate_admin0 ES_INLINE=1
$ bundle exec rake quattroshapes:populate_admin1 ES_INLINE=1
$ bundle exec rake quattroshapes:populate_admin2 ES_INLINE=1
$ bundle exec rake quattroshapes:populate_local_admin ES_INLINE=1
$ bundle exec rake quattroshapes:populate_locality ES_INLINE=1
$ bundle exec rake quattroshapes:populate_neighborhood ES_INLINE=1
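
If you would rather script the steps above than type them one by one, a minimal sketch follows. The task names and the ES_INLINE variable come from the commands above; the helper names `quattroshapes_commands` and `run_quattroshapes` are hypothetical and not part of Pelias.

```ruby
# Hypothetical helper, not part of Pelias: builds the quattroshapes rake
# invocations in the order listed above.
QUATTROSHAPES_LEVELS = %w[admin0 admin1 admin2 local_admin locality neighborhood].freeze

def quattroshapes_commands(inline: true)
  # ES_INLINE=1 makes each populate task run inline instead of via Sidekiq.
  suffix = inline ? " ES_INLINE=1" : ""
  populate = QUATTROSHAPES_LEVELS.map do |level|
    "bundle exec rake quattroshapes:populate_#{level}#{suffix}"
  end
  ["bundle exec rake quattroshapes:prepare_all"] + populate
end

# Runs each command in turn and stops at the first failure, so no step
# starts before the previous one has finished.
def run_quattroshapes(inline: true)
  quattroshapes_commands(inline: inline).each do |cmd|
    system(cmd) or raise "step failed: #{cmd}"
  end
end
```

Calling `run_quattroshapes` from the pelias checkout would then execute all seven commands sequentially.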

Add OSM data

Assuming you've set up a PostGIS-enabled database with OSM data, the following will add all streets, addresses, and POIs to the index, reverse geocoding them into the above shapes.

$ bundle exec rake osm:populate_street
$ bundle exec rake osm:populate_address
$ bundle exec rake osm:populate_poi

Start the server

$ unicorn

You should now be able to access the server at http://localhost:8080/suggest?query=party

Setup Performance Information

The following is a brief synopsis of setting up this environment including: approximate times to complete each step, amount of data, number of documents, etc.

Architecture/Tuning for a FULL PLANET index in the shortest time period

  • PostgreSQL/PostGIS: 1 c3.8xlarge
    • this is only needed to achieve the fastest initial load into Pelias
  • Elasticsearch: 20 m3.2xlarge
    • optimization work remains to be done to reduce heap storage requirements
    • assumes 80 shards, 1 replica per shard, half of physical memory allocated to ES for heap
  • Sidekiq: 8 c1.medium
    • only required for initial import to complete in a timely manner
    • can be removed once complete or scaled back as required for updates on an ongoing basis
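
To make the shard figures above concrete, here is a small arithmetic sketch. The node, shard, and replica counts are taken from the list; the even spread across nodes is an assumption, and the method names are illustrative only.

```ruby
# Total shard copies = primary shards plus their replicas.
def total_shard_copies(primaries, replicas_per_shard)
  primaries * (1 + replicas_per_shard)
end

# Copies hosted per node, ASSUMING Elasticsearch balances them evenly.
def copies_per_node(primaries, replicas_per_shard, nodes)
  total_shard_copies(primaries, replicas_per_shard).fdiv(nodes)
end

puts total_shard_copies(80, 1)    # 160
puts copies_per_node(80, 1, 20)   # 8.0
```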

Using this hardware allocation, we also recommend the following during the initial data load:

  • disable replication in Elasticsearch
  • set the index refresh interval to more than an hour (or disable it altogether for the duration of the indexing process)
  • in PostgreSQL, add the following index (this will take some time if you're working with a full planet installation):
    • CREATE INDEX limit_street_line ON planet_osm_line (name, highway);
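
The two Elasticsearch recommendations above can be applied through the index settings API. The sketch below is one way to do it, not the project's own tooling: the index name, the Elasticsearch URL, and the values restored after the load are assumptions to adjust for your setup.

```ruby
require "json"
require "net/http"
require "uri"

# Settings for the duration of the bulk load: no replicas, refresh disabled.
def bulk_load_settings
  { "index" => { "number_of_replicas" => 0, "refresh_interval" => "-1" } }
end

# Settings to restore afterwards. ASSUMED defaults: one replica (as in the
# architecture above) and a one-second refresh interval.
def post_load_settings(replicas: 1, refresh_interval: "1s")
  { "index" => { "number_of_replicas" => replicas, "refresh_interval" => refresh_interval } }
end

# Sends PUT <es_url>/<index>/_settings with the given settings as JSON.
def apply_settings(es_url, index, settings)
  uri = URI.join(es_url, "/#{index}/_settings")
  request = Net::HTTP::Put.new(uri, "Content-Type" => "application/json")
  request.body = JSON.generate(settings)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end
```

For example, `apply_settings("http://localhost:9200", "pelias", bulk_load_settings)` before the import and `apply_settings("http://localhost:9200", "pelias", post_load_settings)` once it completes, where both the URL and the index name `pelias` are placeholders.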

Load Times

Using the above architecture, we've observed the following load times:

  • geonames + quattroshapes: roughly an hour
  • osm: ~3 days

Data Footprint

Documents in Elasticsearch upon completion of load:

  • ~66 million

Unique data size on disk:

  • ~300GB
    • ~600GB with one replica
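
As a rough sanity check on these numbers, the sketch below derives the average document size and the per-node disk usage. It treats GB as decimal gigabytes and assumes the data is spread evenly over the 20 Elasticsearch nodes described above; both are assumptions, and the inputs are the approximate figures quoted here.

```ruby
# Divides a byte total across a number of units (documents or nodes).
def per_unit(total_bytes, units)
  total_bytes.fdiv(units)
end

unique_bytes = 300 * 10**9        # ~300GB of unique data
with_replica_bytes = 600 * 10**9  # ~600GB with one replica

kb_per_doc = (per_unit(unique_bytes, 66_000_000) / 1_000).round(1)
gb_per_node = (per_unit(with_replica_bytes, 20) / 10**9).round(1)

puts kb_per_doc   # 4.5
puts gb_per_node  # 30.0
```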

API

search

This is our search endpoint. It searches the index for addresses, POIs, etc.

/search?query=brooklyn
/search?query=brooklyn&center=-74.08,40.77
/search?query=brooklyn&viewbox=-74.08,40.77,-73.9,40.67
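
A client might assemble these request paths as in the sketch below. The parameter names come from the examples above; the helper `search_path` is hypothetical, and reading `center` as lng,lat and `viewbox` as four comma-separated coordinates is inferred from the sample values rather than documented.

```ruby
require "uri"

# Hypothetical helper: builds a /search path from a query plus optional
# center ([lng, lat]) and viewbox (four coordinates) arrays.
def search_path(query, center: nil, viewbox: nil)
  params = { "query" => query }
  params["center"] = center.join(",") if center
  params["viewbox"] = viewbox.join(",") if viewbox
  "/search?" + URI.encode_www_form(params)
end
```

Note that `URI.encode_www_form` percent-encodes the commas (as `%2C`), which a server decodes to the same values as the literal commas shown above.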

suggest

This is an autocomplete suggestion endpoint. It provides search suggestions given text to look up.

/suggest?query=bro
/suggest?query=bro&size=5

reverse

This is the reverse geocoding endpoint. It takes lng and lat params and returns GeoJSON corresponding to the given location.

/reverse?lng=1&lat=2
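
Calling the endpoint from Ruby could look like the following sketch. The `lng` and `lat` parameters are from the description above; the base URL, the helper names, and the error handling are assumptions, and the shape of the returned GeoJSON is not specified here, so the result is returned as a plain parsed Hash.

```ruby
require "json"
require "net/http"
require "uri"

# Builds the request URI for the reverse endpoint.
def reverse_uri(base_url, lng:, lat:)
  uri = URI.join(base_url, "/reverse")
  uri.query = URI.encode_www_form("lng" => lng, "lat" => lat)
  uri
end

# Fetches and parses the GeoJSON response; raises on a non-2xx status.
def reverse_geocode(base_url, lng:, lat:)
  response = Net::HTTP.get_response(reverse_uri(base_url, lng: lng, lat: lat))
  raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end
```

With a local server running, `reverse_geocode("http://localhost:8080", lng: 1, lat: 2)` would issue the request shown above.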

Demo

Check out our demo here: http://mapzen.com/pelias

LICENSE

MIT License. See the included LICENSE file.

Contributors

heffergm, missinglink, mjcunningham, randymeech, seejohnrun