mapquest / mapquest-osm-server

A scalable OpenStreetMap API server built over a distributed key/value store.

License: Other

Python 100.00%

mapquest-osm-server's Issues

Implement cache invalidation on the front-end

Once support for adding incremental changes to the data store is added (see #14, osmChange support to the dbmgr tool), we would need a way to invalidate 'stale' entries from the caches maintained inside the front-end servers.

A (too simple) solution could be to have the front-ends periodically reset their caches, for example by reading a 'generation count' of some kind from the data store at regular intervals and resetting their caches if this count has changed. Such an approach would however be wasteful because updates to the data store are likely to be frequent (once a minute with the Minutely Mapnik process), but with each update only touching a very small number of elements.
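The generation-count approach described above could be sketched as follows. This is a minimal illustration, not the project's implementation: the `GenerationCache` class and the `read_generation` callable are hypothetical names, and how the count is actually stored in the data store is left open.

```python
class GenerationCache:
    """In-memory cache that is dropped wholesale when the generation changes."""

    def __init__(self, read_generation):
        # read_generation is a hypothetical callable that returns the data
        # store's current generation count.
        self._read_generation = read_generation
        self._generation = read_generation()
        self._entries = {}

    def refresh(self):
        """Call at regular intervals. Returns True if the cache was reset."""
        current = self._read_generation()
        if current != self._generation:
            # Every entry is discarded, even those the update did not touch.
            self._entries.clear()
            self._generation = current
            return True
        return False

    def get(self, key):
        return self._entries.get(key)

    def put(self, key, value):
        self._entries[key] = value
```

The sketch makes the cost visible: a single changed element bumps the count and empties the whole cache, which is why a finer-grained invalidation scheme would be preferable.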

Need a tool that can extract a geographical subset of a 'full' planet.

This tool would be useful for creating smaller sized extracts for testing the handling of a 'full' planet dump (issue #4).

Investigate alternate geo-grouping and storage schemes for OSM data

Investigate more cache efficient ways to group OSM elements.

"Real-life" usage (e.g., map queries from JOSM) appears to have a significant degree of geographical locality.

It may help to group OSM elements in a different manner, so that repeated queries in the same geographical area are served with fewer compute resources (and faster).

This ticket tracks the following tasks:

Collect data on actual data access patterns.
Implement a cache efficient grouping scheme that works well with these access patterns.

Reduce data transfer in the implementation of the /map API

In the implementation of the /map API, we can reduce the data transferred between the data store and the front end by modifying the implementation in the following way:

Store node coordinates along with node IDs in the geo-indexes created by the dbmgr tool.
After retrieving geo-index information covering the desired bounding box to the /map API, use these node coordinates to immediately filter out nodes falling outside the desired bounding box.
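The filtering step could look roughly like the sketch below. The `IndexEntry` and `BBox` types are hypothetical stand-ins for whatever the geo-index actually stores; the field order of `BBox` follows the left, bottom, right, top convention of the /map API's `bbox` parameter.

```python
from typing import Iterable, List, NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical geo-index entry: a node ID stored with its coordinates."""
    node_id: int
    lat: float
    lon: float


class BBox(NamedTuple):
    left: float    # minimum longitude
    bottom: float  # minimum latitude
    right: float   # maximum longitude
    top: float     # maximum latitude


def nodes_in_bbox(entries: Iterable[IndexEntry], bbox: BBox) -> List[int]:
    """Return the IDs of nodes inside the box, using only index data."""
    return [
        entry.node_id
        for entry in entries
        if bbox.left <= entry.lon <= bbox.right
        and bbox.bottom <= entry.lat <= bbox.top
    ]
```

Only the node IDs that survive this filter would then need to be fetched from the data store.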

Use Cyclone & Python-Twisted instead of Tornado

Use Cyclone instead of TornadoWeb, for the front end in the Python prototype. Cyclone is built over Python-Twisted.

The benefits would be:

The convenience of the twistd driver program.
Improved asynchronicity.

Create a tool for verifying the contents of the data store

Create a test tool that verifies that the contents of the data store match those of a given planet dump.
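One possible shape for the core of such a tool is sketched below. The `fetch` callable and the `(kind, element_id, element)` tuples are assumptions made for illustration; parsing the planet dump and talking to the data store are out of scope here.

```python
def find_mismatches(dump_elements, fetch):
    """Compare elements parsed from a dump against the data store.

    dump_elements: iterable of (kind, element_id, element) tuples, where
        kind is "node", "way" or "relation" (hypothetical representation).
    fetch: hypothetical callable (kind, element_id) -> stored element,
        or None if the element is missing from the data store.
    Returns the (kind, element_id) pairs that are missing or differ.
    """
    mismatches = []
    for kind, element_id, expected in dump_elements:
        if fetch(kind, element_id) != expected:
            mismatches.append((kind, element_id))
    return mismatches
```

Note that this checks only that everything in the dump is present in the data store; detecting extra elements in the data store would need a separate pass.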

Extend the /capabilities API to indicate an upstream server

(Related to issue #7)

Return an additional element in the response to a /capabilities query indicating the upstream server to use for POST and PUT operations.
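As a rough sketch of what the extended response might contain, the snippet below builds a capabilities document with one extra element. The `<upstream>` element name, its `url` attribute, and the `generator` value are assumptions; the issue does not specify them.

```python
import xml.etree.ElementTree as ET


def capabilities_document(upstream_url):
    """Build a minimal capabilities response naming an upstream server."""
    osm = ET.Element("osm", version="0.6", generator="osm-api-server")
    api = ET.SubElement(osm, "api")
    ET.SubElement(api, "version", minimum="0.6", maximum="0.6")
    # Hypothetical extension element: where clients should send
    # POST and PUT requests.
    ET.SubElement(api, "upstream", url=upstream_url)
    return ET.tostring(osm, encoding="unicode")
```

Clients that do not know about the extra element would need to ignore it, so it should be added in a way that keeps the rest of the response unchanged.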

Investigate better ways to access a Membase backend

The code currently accesses Membase in memcache compatibility mode. This is sub-optimal in several ways:

Traffic to Membase is routed through a proxy (moxi), which implies an additional hop for data to traverse.
The Python client used by the current server does not deal with all the error responses that a Membase server could issue.
The memcache protocol itself has limits that Membase does not have.

Choose and use a suitable interface library so that a Membase backend can be accessed efficiently.

Create an installable package using distutils

Package the server's source code using distutils, so that it can be easily distributed.

Move project documentation to the main source tree.

Move project documentation to the main source tree from the wiki.

This would make it easier to keep the project's documentation and code in sync.

Cloning the project would clone documentation automatically, without an extra step to clone the wiki's Git repository.
Collaboration on documentation can use GitHub's facilities.
Only one variant of Markdown syntax to worry about.

Add support for ingesting PBF planet dumps

The PBF distribution format is space-efficient compared to XML, and parses faster.

Add support for parsing PBF input to the dbmgr tool.

Create an installation guide

Create user-centric documentation covering the care and feeding of the API Server.

A rough outline of this article would be:

Introduction
- A brief overview of the OSM project.
- A description of what this API service does.
- Limitations of this API service.
Pre-requisites for installation.
- Supported operating systems.
- Pre-requisite packages.
The installation process.
- Describe this for each supported OS.
Monitoring the operation of the server.
- Facilities for logging.
- Monitoring performance metrics.
- Looking out for errors.
Troubleshooting help.
Appendices:
- How to report bugs.
- Building the server from source.
- Contributing source code patches.

The guide would need to be written using documentation tools that are in common use in the open-source ecosystem, say one of LaTeX, ConTeXt or DocBook.

The dbmgr tool needs to be faster/more frugal with memory

A complete planet.osm dump currently holds on the order of a billion nodes, ninety million ways and just under a million relations, per the current statistics for the OSM database.

To deal with a data set of this size in a reasonable amount of time, the dbmgr ingestion tool needs to be sped up considerably, and also made frugal in its memory consumption.

Retrieve slab configuration information from the data store

Slab configuration information is used by the dbmgr tool to group OSM elements into slabs.

The dbmgr tool and the front-end currently retrieve slab-related information from the system configuration file (config/osm-api-server.cfg). It would be more robust for the front-end to retrieve slab information from the data store directly.

Configuration information in the data store should also be versioned, so that incompatible configuration schemas can be detected.
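A version check of this kind might be sketched as below. The JSON encoding, the `schema-version` and `slabs` keys, and the exception name are all assumptions for illustration; the issue does not say how the configuration would be serialized.

```python
import json

# Hypothetical: the schema version this build of the server understands.
SUPPORTED_SCHEMA_VERSION = 1


class IncompatibleConfigError(Exception):
    """Raised when the stored configuration uses an unknown schema."""


def load_slab_config(raw):
    """Decode slab configuration read from the data store.

    raw: the stored configuration as a JSON string (an assumed encoding).
    """
    doc = json.loads(raw)
    version = doc.get("schema-version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise IncompatibleConfigError(
            "configuration schema %r is not supported (expected %r)"
            % (version, SUPPORTED_SCHEMA_VERSION)
        )
    return doc["slabs"]
```

Failing loudly at start-up in this way keeps a front-end from silently misreading slabs that were written with a different grouping scheme.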

Provide APIs to query meta-data about the contents of the data store

It would be useful if meta-data about the contents of the data-store could be queried externally. For example:

A human readable comment describing the contents of the data store.
Upload history.
- The time of each (full or incremental) update to the data store.
- Information about the data used (file size, last modified time, source URL, if any).
Information about the OSM elements available in the data store, perhaps in the form of a min--max range for each type of element.

Store map data in the data store in a compact, encoded form

Store data in the data store encoded using protobufs. If done right, this change has the potential to reduce the storage requirements for the back end.

Add support for ingesting osmChange documents

Incremental updates to the map are distributed by the OpenStreetMap project in the form of OsmChange files. These need to be supported by the ingestor tool.

Open Issue: OsmChange files do not contain <changeset> elements; for now, the ingestor would need to fetch these elements from an upstream server.

See also: issue #22 which tracks related changes to the front-end.

Investigate support for queries using XAPI-style predicates

XAPI offers the ability to query the map using complex predicates.

For example:

/api/0.6/node[amenity=hospital]
Queries the map for all nodes containing <tag k="amenity" v="hospital" />
/api/0.6/*[key1=value1][key2=value2]...
Queries the map for all elements matching the predicates (key1 == value1 && key2 == value2).
/api/0.6/relation[not(way)]
Retrieve relations that do not have way members.
/api/0.6/node[@user=name]
Retrieve nodes last changed by a specific user name.

Investigate how best these queries could be implemented in the current architecture.
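As a starting point for such an investigation, the sketch below parses only the simplest predicate form shown above, `[key=value]` on tags, and rejects everything else. The function names are hypothetical, and the parser deliberately does not handle forms such as `[not(way)]` or `[@user=name]`.

```python
import re

_REQUEST = re.compile(
    r"^/api/0\.6/(?P<kind>node|way|relation|\*)(?P<preds>(?:\[[^\[\]]*\])*)$"
)
_PREDICATE = re.compile(r"\[([^\[\]=]+)=([^\[\]]*)\]")


def parse_xapi_path(path):
    """Split a request path into (element kind, [(key, value), ...])."""
    match = _REQUEST.match(path)
    if match is None:
        raise ValueError("not an XAPI-style request: %r" % path)
    preds = match.group("preds")
    pairs = _PREDICATE.findall(preds)
    # Reject any predicate that is not a plain [key=value] tag test.
    rebuilt = "".join("[%s=%s]" % pair for pair in pairs)
    if rebuilt != preds or any(key.startswith("@") for key, _ in pairs):
        raise ValueError("unsupported predicate in %r" % path)
    return match.group("kind"), pairs


def matches(tags, predicates):
    """True if the element's tags satisfy every (key, value) predicate."""
    return all(tags.get(key) == value for key, value in predicates)
```

Evaluating `matches` against every element is a full scan; serving such queries efficiently would presumably need tag indexes in the data store, which is the open question this ticket raises.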

Support "full" planet dumps

"Full" planet dumps contain the entire history of the map. In order to support full dumps, we would need to:

Represent edits to the map in a space efficient manner.
Implement support for ingesting change history.
Implement support for serving change history via the front-end server.

Redirect write requests to an upstream server

It would be useful for this API server to redirect requests that it cannot handle (using an appropriate HTTP response code) to an upstream server.

Examples of such requests include:

Requests for data that is not present in the datastore, e.g., GPS traces, or user data.
Requests that would modify the data, e.g., requests to the /create URIs or POST requests to URIs.

This feature would allow this API server to act as a fast and scalable "front end" for the current OSM API server.
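The redirect decision could be sketched as follows. The upstream host, the list of path prefixes, and the choice of status code 307 (which asks the client to repeat the request with the same method and body) are assumptions; the issue leaves the response code open.

```python
# Methods this server can answer from its own data store.
READ_ONLY_METHODS = {"GET", "HEAD"}

# Illustrative prefixes for data not held in the data store
# (GPS traces and user data).
UNSERVED_PREFIXES = ("/api/0.6/gpx", "/api/0.6/user")


def redirect_target(method, path, upstream="https://upstream.example"):
    """Return (status, location) for a redirect, or None to serve locally.

    upstream is a placeholder host, not a real server.
    """
    needs_upstream = (
        method.upper() not in READ_ONLY_METHODS
        or path.startswith(UNSERVED_PREFIXES)
    )
    if not needs_upstream:
        return None
    return 307, upstream + path
```

For example, `redirect_target("PUT", "/api/0.6/node/create")` yields a redirect, while a `GET` for `/api/0.6/node/1` returns `None` and is served from the local data store.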

txriak offers an interface to Riak from Twisted/Python.
