Code Monkey home page Code Monkey logo

Comments (12)

bdon avatar bdon commented on May 24, 2024

+1, should be doable by:

  1. bring in the s2 bindings
  2. use the s2 region coverer on your input geometry
  3. traverse the cell index based on prefix of the coverings, putting node IDs into a set
  4. do a pass over the set of nodes, finding the set of referenced ways, relations... etc

from osmexpress.

CloudNiner avatar CloudNiner commented on May 24, 2024

I attempted a first pass at this, but got a bit stuck. From your comment, are you suggesting that this can be done entirely in Python using the existing osmx bindings, perhaps pending addressing #23?

I'm using the s2sphere library as it appears to provide an s2geometry installation in an easily pip installable package so that should address getting cell ids.

In attempting to unpack

traverse the cell index based on prefix of the coverings

Can I pass the cell id I get back from RegionCoverer directly to the osmx bindings Index class? Is the name argument to pass to Index constructor the same as the lmdb table names, e.g. cell_node?

from osmexpress.

bdon avatar bdon commented on May 24, 2024

Can I pass the cell id I get back from RegionCoverer directly to the osmx bindings Index class?

Yes, if your coverer has both minimum and maximum cell levels set to 16, you can then query cell_node directly with this cell ID. This can be implemented as a first pass. It's not optimal though, because a huge region might need thousands or even millions of level 16 cells. A more efficient way is to let cells be anywhere from level 0 to 16 for a sparse representation of the covering; because every S2 parent cell shares a prefix with all its children, you can use LMDB cursor ranges to iterate over child cells efficiently like is done in the C++ code https://github.com/protomaps/OSMExpress/blob/master/src/storage.cpp#L171

The s2sphere package looks neat, I haven't tried it. I believe the S2 source tree itself has "official" python bindings but unsure what the workflow is to get that packaged up neatly. If the s2sphere library is easier to install and has everything we need, we should use that.

from osmexpress.

bdon avatar bdon commented on May 24, 2024

Is the name argument to pass to Index constructor the same as the lmdb table names, e.g. cell_node?

I think cell_node should work the same way as the other subclasses of Index, they're all just tables from int64 -> [int64]. Something like

class CellNode(Index):
    def __init__(self,txn):
        super().__init__(txn,cell_node')

from osmexpress.

CloudNiner avatar CloudNiner commented on May 24, 2024

if your coverer has both minimum and maximum cell levels set to 16

Looks like I wasn't setting minimum as well as maximum. Will play around with this a bit more. The s2sphere region coverer returns a list of CellId which does have the child_begin and child_end methods on it so that more efficient implementation seems feasible at first glance.

I believe the S2 source tree itself has "official" python bindings

It does here but it's not obvious to me either how to include them in a python/pip project easily.

other subclasses of Index

🤦 sometimes reading source on a Monday am is hard?

Thanks for the hints, I'll update with progress, no ETA though. If you find another use case for this or begin working on it let me know.

from osmexpress.

CloudNiner avatar CloudNiner commented on May 24, 2024

I managed to spend some time on this today, creating a Python CLI that generates a GeoJson file of nodes and ways filtered by the requested tags and bounding box.

Lots of possible future work left here, including but not limited to:

  • Relation support
  • Implement the more efficient s2 cell walk described above
  • Correctly generate LineString or Polygon for ways (currently always a LineString)
  • Support Polygon as input in addition to BBox
  • Add option to allow user to specify AND or OR when providing multiple tag filters
  • Refactor the core operation of returning a list of OSM objects within a polygon to a method that can live in OSMExpress

https://gist.github.com/CloudNiner/8b020f434d0509419032c6b2fe0577c5

from osmexpress.

bdon avatar bdon commented on May 24, 2024

Looks good so far! some random thoughts, some directly related and some more speculative:

Correctly generate LineString or Polygon for ways (currently always a LineString)
Relation support

This should be done outside of osmx, but I think for ways you need to 1) ensure the orientation is correct and 2) ensure it is not self-intersecting, which OSM ways can be, but is not allowed per the OGC Simple Features spec
If you intend to also support MultiPolygons, I think it's a pretty tough nut to crack, and a good implementation of multi polygon builders exists already within Osmium. Something I want to investigate is if a osmium handler can be "fed" directly from an osmx database instead of an .osm.pbf file

Support Polygon as input in addition to BBox
Refactor the core operation of returning a list of OSM objects within a polygon to a method that can live in OSMExpress

the osmx API should only have one way to specify the query region, taking a covering - a list of cell IDs - as the argument. We should have some helper methods to convert from common geometry specs to S2 regions and coverings. In the C++ code I called this "region", so we should create another python module probably called "region" to handle this

Add option to allow user to specify AND or OR when providing multiple tag filters

unrelated to osmx but I implemented something similar in another project which you may want to look at: https://github.com/hotosm/osm-export-tool-python/blob/master/osm_export_tool/sql.py

Node = namedtuple("Node", ["id", "location", "tags", "metadata"])
Way = namedtuple("Way", ["id", "nodes", "tags", "metadata"])

I would avoid having to mirror the layout of OSM objects in your client code because you need to malloc for every object. Ideally the osmx bindings make it easy to take advantage of zero-copy views of OSM objects - this may mean the API provides functions that take sets of object IDs and uses yield and generator expressions to iterate over them

node_ids.extend(cell_node_index.get(cell_id.id()))

node_ids should generally be a python set because there will be lots of duplicates. And ideally iterating over that set should guarantee ascending order

from osmexpress.

CloudNiner avatar CloudNiner commented on May 24, 2024

I updated my prior example to take an s2.S2Region instead of a bbox and wrote the Python code to convert a geojson.Polygon to s2.S2Region. There are other improvements, like the using generators to yield nodes and ways and the Set() fix described above. Here's the updated complete example: https://gist.github.com/CloudNiner/e5552aaae6c235193b8f196558b92e2c

Unfortunately, the parts of this example that might be useful in the OSMExpress python bindings (polygon -> s2region and node_ids_for_region) now utilize the s2 SWIG bindings because S2Polygon isn't implemented in s2sphere. This prevented a PR because the bindings are not directly pip installable. s2-py is a potential option but installs an out of date (from 2018) version of the library. It's also troublesome (no guidance from s2 docs and a few open issues relating to the SWIG bindings) to install the SWIG bindings to anywhere other than the global default python installation. There is an open PR that adds support for building a pip wheel but its been dormant for awhile now.

Regardless, I hope the example is instructive and provides a useful start to those looking to perform similar tasks or jump start a future effort to add some of these operations to the OSMX Python bindings when there's a clear path forward for inclusion of the s2 bindings as a dependency.

from osmexpress.

bdon avatar bdon commented on May 24, 2024

I agree that the bindings not being pip installable is an issue, thanks for figuring out the workaround.

Were you able to accomplish the goal of your script in an efficient manner using Python? It's useful to know if investing more in Python packaging is worth it for use cases similar to yours - If you encountered performance issues, it may be better to direct developers towards just using the C++ API

from osmexpress.

CloudNiner avatar CloudNiner commented on May 24, 2024

Were you able to accomplish the goal of your script in an efficient manner using Python?

I'd say yes in this case. There's a few different performance bottlenecks a user might run into, but once I had the efficient implementation for the function s2region -> node id set, building a prototype with the python bindings was really quick. The s2sphere version (is pip installable!) of that function wasn't significantly slower than the s2 swig bindings so perhaps an implementation in osmx bindings for LatLngRect and Cap might be warranted and would serve any query use case well.

The search_by_ways code begins to slow as you increase query area or the number of results returned but as far as I can tell, the issue is just volume of operations being performed. I was still able to write 100s of mb of geojson results for areas up to 1000 sq kms returning thousands of ways, it just took 1-2mins. Whether that's satisfactory or not depends on individual use case, for us it was fine since our use case is focused on smaller areas with few results at a time.

from osmexpress.

bdon avatar bdon commented on May 24, 2024

https://gist.github.com/CloudNiner/e5552aaae6c235193b8f196558b92e2c#file-osmx_utils-py-L23

This can be made even faster :) as is, on line 23, if you are traversing a cell at a low level, I think you are potentially issuing millions of queries to Index.get - consider the case where there is only 1 record in the database for that low-level cell, but the cell is made of 1 million level16 nodes.

Instead, use the child_begin cell to set the lmdb cursor state and then move the cursor forward. You won't be able to use Index.get, you will need to write another method that manipulates the cursor; the logic and LMDB functions used should be the same as in the C++ code. This will limit the number of LMDB operations to the number of records in the actual cell, 1 record in the example above.

Curious to see if this improves on the slowness of big query areas you mentioned.

from osmexpress.

bdon avatar bdon commented on May 24, 2024

I'm closing this issue because of lack of activity.

The main use case for OSMExpress remains getting PBFs out of a minutely-updated planet .osmx. Analytical use cases in Python may be better served by https://docs.geodesk.com/python which looks like an excellent new project.

from osmexpress.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.