OSMesa

Join the chat at https://gitter.im/osmesa/Lobby

This project is a collection of tools for working with OpenStreetMap (OSM). It is built to enable large-scale batch analytic jobs to run on the latest OSM data, as well as streaming jobs which operate on updates delivered via minutely replication files.

Getting Started

This library is a toolkit meant to make the munging and manipulation of OSM data a simpler affair than it would otherwise be. Nevertheless, a significant degree of domain-specific knowledge is necessary to profitably work with OSM data. Prospective users would do well to study the OSM data model and to develop an intuitive sense for how the various pieces of the project hang together to enable an open-source, globe-scale map of the world.

If you're already fairly comfortable with OSM's model, running one of the diagnostic (console printing/debugging) Spark Streaming applications provided in the analytics subproject is probably the quickest way to explore Spark SQL and its usage within this library. To run the change stream processor application from the beginning of (OSM) time until cluster failure or user termination, try this:

# head into the 'src' directory
cd src

# build the jar we'll be submitting to spark
sbt "project analytics" assembly

# submit the streaming application to spark for process management
spark-submit \
  --class osmesa.analytics.oneoffs.ChangeStreamProcessor \
  ./analytics/target/scala-2.11/osmesa-analytics.jar \
  --start-sequence 1

Deployment

Utilities are provided in the deployment directory to bring up a cluster and push the OSMesa jar to it. The spawned EMR cluster comes with Apache Zeppelin enabled, which allows jars to be registered/loaded for a console-like experience similar to Jupyter or IPython notebooks, but with Spark jobs executing across the entire Spark cluster. Actually wiring up Zeppelin to use OSMesa sources is beyond the scope of this document, but it is a relatively simple configuration.

Statistics

OSMesa supports the following summary statistics, aggregated at the user and hashtag level:

  • Number of added buildings
  • Number of modified buildings
  • Number of added roads
  • Number of modified roads
  • Km of added roads
  • Km of modified roads
  • Number of added waterways
  • Number of modified waterways
  • Km of added waterways
  • Km of modified waterways
  • Number of added points of interest
  • Number of modified points of interest
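
The "Km of ..." statistics imply measuring the length of a way from its node coordinates. As a minimal sketch of one way to do that, the code below sums great-circle (haversine) distances between consecutive (longitude, latitude) pairs. This is an illustrative approximation under a spherical-Earth assumption; OSMesa's actual length computation may differ, and both function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def way_length_km(coords):
    """Sum segment lengths over a way given as [(lon, lat), ...]."""
    return sum(haversine_km(*a, *b) for a, b in zip(coords, coords[1:]))


if __name__ == "__main__":
    # One degree of longitude along the equator.
    print(round(way_length_km([(0.0, 0.0), (1.0, 0.0)]), 2))
```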

SQL Tables

Statistics calculation, whether batch or streaming, updates a few tables that can jointly be used to discover user or hashtag stats. These are the schemas of the tables being updated.

These tables are fairly normalized and thus not the most efficient for directly serving statistics. If that's your goal, it might be useful to create materialized views for any further aggregation. A couple of example queries that can serve as views are provided: hashtag_statistics and user_statistics.

Batch

  • ChangesetStats will produce an ORC file with statistics aggregated by changeset
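
To make "aggregated by changeset" concrete, here is a small sketch that counts edits per changeset and per statistic. The input record shape `(changeset_id, feature_kind, action)` and the function name are assumptions made for illustration; they are not the ChangesetStats schema or API.

```python
from collections import Counter, defaultdict


def stats_by_changeset(edits):
    """Count edits per changeset, keyed by "<feature_kind>_<action>".

    `edits` is an iterable of (changeset_id, feature_kind, action) tuples.
    This record shape is hypothetical and only serves the example.
    """
    stats = defaultdict(Counter)
    for changeset_id, feature_kind, action in edits:
        stats[changeset_id][f"{feature_kind}_{action}"] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {cs: dict(counts) for cs, counts in stats.items()}


if __name__ == "__main__":
    sample = [
        (1, "buildings", "added"),
        (1, "buildings", "added"),
        (1, "roads", "modified"),
        (2, "waterways", "added"),
    ]
    print(stats_by_changeset(sample))
```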

Stream

Vector Tiles

Vector tiles, too, are generated both in batch and via streaming, so that a fresh set can be quickly generated and then kept up to date. Summary vector tiles are produced for two cases: to illustrate the scope of a user's contribution and to illustrate the scope of a hashtag/campaign within OSM.

Batch

  • FootprintByCampaign produces a z/x/y stack of vector tiles corresponding to all changes marked with a given hashtag
  • FootprintByUser produces a z/x/y stack of vector tiles which correspond to a user's modifications to OSM

Stream

  • HashtagFootprintUpdater updates a z/x/y stack of vector tiles corresponding to all changes marked with a given hashtag
  • UserFootprintUpdater updates a z/x/y stack of vector tiles which correspond to a user's modifications to OSM
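
For readers unfamiliar with z/x/y addressing, the sketch below maps a longitude/latitude pair to the tile that contains it at a given zoom level, using the standard Web Mercator ("slippy map") formula. It assumes the footprint tiles follow that common scheme, which the text above does not state explicitly; `lonlat_to_tile` is a hypothetical helper, not an OSMesa function.

```python
import math

MAX_MERCATOR_LAT = 85.05112878  # latitude limit of the Web Mercator square


def lonlat_to_tile(lon, lat, zoom):
    """Return (z, x, y) of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    # Clamp latitude so the projection stays finite near the poles.
    lat = max(min(lat, MAX_MERCATOR_LAT), -MAX_MERCATOR_LAT)
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp indices so points on the far edges land in the last tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return zoom, x, y


if __name__ == "__main__":
    print(lonlat_to_tile(0.0, 0.0, 1))
```

A footprint job would presumably bucket each edited feature into tiles like these at every zoom level of the stack, though the details of how OSMesa does so are not covered here.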