protomaps / osmexpress
Fast database file format for OpenStreetMap
License: BSD 2-Clause "Simplified" License
The storage for coordinates east of the prime meridian and south of the equator is not masked correctly.
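To illustrate the kind of bug described here, the following is a minimal sketch, assuming each coordinate is a signed 32-bit fixed-point integer and that longitude and latitude are packed into one 64-bit value. That layout is an assumption for illustration only; the issue does not describe the actual storage format. Without a mask, a negative latitude (south of the equator) sign-extends into the bits holding a positive longitude (east of the prime meridian).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_fixed(deg: float) -> int:
    """Convert degrees to a fixed-point integer with 1e-7 degree precision."""
    return round(deg * 10_000_000)


def pack_unmasked(lon: int, lat: int) -> int:
    # Buggy: a negative lat carries sign bits that overwrite the lon half.
    return ((lon << 32) | lat) & MASK64


def pack_masked(lon: int, lat: int) -> int:
    # Each half is truncated to 32 bits before the halves are combined.
    return ((lon & MASK32) << 32) | (lat & MASK32)


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return value - (1 << 32) if value & 0x80000000 else value


def unpack(packed: int) -> tuple:
    lon = to_signed32((packed >> 32) & MASK32)
    lat = to_signed32(packed & MASK32)
    return lon, lat
```

With a point east of the prime meridian and south of the equator, `unpack(pack_masked(lon, lat))` returns the original pair, while the unmasked variant returns a longitude of -1.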
This is the most important step to making this library widely useful. I envision a few ways this can happen:
osmx program as a subprocess, which returns JSON. This is OK for very basic use cases, but hard to extend to queries other than a single OSM entity. Also, the main reason to use this library is speed, but having to fork/exec on every query would negate that.
In my applications I am discovering I need to buffer the input region for extracts by a small amount, to avoid cases where data is missing near the edge of the extract.
One solution is to buffer the geometry using a library like Shapely or GEOS, though this is another extra dependency.
S2CellUnion has an Expand operation which will add a buffer of cells at a given level around the union. Maybe the extract command can take an --expand N where N is the cell level.
A potential hazard is that the meaning of expand and buffer is specific to the coordinate system you are working in. If you buffer in GEOS you can choose to do this in WGS84 coordinates, web mercator or your chosen projection, but if done in S2 this may have a confusing result depending on precisely how much you want to buffer by: https://s2geometry.io/resources/s2cell_statistics.html
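The coordinate-system hazard can be made concrete with a small sketch that needs neither GEOS nor S2. It buffers a WGS84 bounding box by a distance in meters using a spherical approximation; the function name and the choice of a bounding box (rather than a polygon) are assumptions made for illustration, and it does not handle boxes that cross the antimeridian.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius, spherical approximation


def buffer_bbox_meters(min_lon, min_lat, max_lon, max_lat, meters):
    """Grow a WGS84 bbox by roughly `meters` on every side."""
    # One meter spans the same number of degrees of latitude everywhere.
    dlat = math.degrees(meters / EARTH_RADIUS_M)
    # A degree of longitude shrinks with cos(latitude), so use the edge
    # farthest from the equator to make sure the buffer is wide enough.
    widest_lat = min(max(abs(min_lat), abs(max_lat)), 89.9)
    dlon = dlat / math.cos(math.radians(widest_lat))
    return (
        max(min_lon - dlon, -180.0),
        max(min_lat - dlat, -90.0),
        min(max_lon + dlon, 180.0),
        min(max_lat + dlat, 90.0),
    )
```

At the equator the buffer is the same number of degrees in both directions; at 60 degrees latitude the longitude buffer is about twice the latitude buffer, which is why a buffer expressed in degrees (or in cells of a fixed level) does not map to a single distance.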
Is it possible to somehow use this to serve vector map tiles?
I'd be interested in that script if you want to publish it. Thanks
If I run
> osmx query path/to/db.osmx
Segmentation fault
I get a segfault attempting to read argv[3]: https://github.com/protomaps/OSMExpress/blob/master/src/cmd.cpp#L48
The docs indicate that calling the query in this form should be possible: https://github.com/protomaps/OSMExpress#command-line
As a temporary workaround, providing some random third arg so that the code falls back into the else case works:
> osmx query path/to/db.osmx print
locations: 20377226
nodes: 418210
ways: 2168392
relations: 31391
cell_node: 20377226
node_way: 22504715
node_relation: 27683
way_relation: 393230
relation_relation: 1459
Timestamp: 2020-04-02T20:59:01Z
Sequence #: 2572
(This is a summary of discussion in the OSMUS #dev slack channel)
Replication diffs as OsmChange (.OSC) files are the standard way of consuming OSM updates. The OSC format is not reference-complete. Clients that want to see the before/after for a changed object's tags, geometry or metadata need to source this information from elsewhere.
The most popular "enhanced" diff format is the Augmented Diff described on the OSM wiki: https://wiki.openstreetmap.org/wiki/Overpass_API/Augmented_Diffs This is implemented by Overpass API. I'm not aware of other implementations.
In theory, one can generate augmented diffs with two inputs: 1. an OsmChange file and 2. an osmx database that's the complete state of OSM immediately before that OsmChange is applied. The Augmented Diff can then be hosted as a static file or put on S3. The benefit of this strategy is that it has very few moving parts.
@CloudNiner at Azavea is developing on this idea here: https://github.com/azavea/onramp which is a C++ implementation. This is likely the way to go for a production-ready system. It may be worth writing a Python one as well if only to validate the correctness of outputs across different implementations.
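As a rough illustration of the two-input idea, here is a minimal Python sketch that pairs each element in an OsmChange document with its state before the change. The `lookup_old` callable is hypothetical and stands in for a read against the osmx database; the output is a list of plain dicts, not the Augmented Diff XML format from the wiki.

```python
import xml.etree.ElementTree as ET


def augment(osc_xml: str, lookup_old):
    """Pair every changed element with its prior state.

    lookup_old(element_type, osm_id) is a hypothetical callable that
    returns the element as it was before the change, or None.
    """
    actions = []
    for block in ET.fromstring(osc_xml):      # <create>, <modify>, <delete>
        for elem in block:                    # <node>, <way>, <relation>
            osm_id = int(elem.get("id"))
            # Created elements have no prior state to look up.
            old = None if block.tag == "create" else lookup_old(elem.tag, osm_id)
            actions.append({
                "type": block.tag,
                "element": elem.tag,
                "id": osm_id,
                "old": old,
                "new": {
                    "attrib": dict(elem.attrib),
                    "tags": {t.get("k"): t.get("v") for t in elem.findall("tag")},
                },
            })
    return actions
```

Because the lookup has to see the database as it was immediately before the change, the OsmChange would need to be augmented before it is applied to the osmx file.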
<new> element for deleted objects. It might depend on how the OsmChange was generated. Again, if clients don't depend on this information it might not matter.
After running cmake and make, make install installs all files except for osmx.
Hi there,
During fuzz testing of the OSMX parsing, a couple of crashes were discovered. Although these files only crash the apps, they could potentially be crafted further into security issues where a malformed OSMX file would be able to compromise the process's memory through memory corruption, so hardening the code to prevent these kinds of bugs would be great.
You can download the crashing files in a zip from Ufile to debug and understand where the code is crashing.
Here's a snip of one of the crash logs.
Program received signal SIGSEGV, Segmentation fault.
mdb_xcursor_init1 (mc=0x7fffffffdd00, node=0x7efff767ffda) at mdb.c:7507
7507 mx->mx_db.md_pad = 0;
#0 mdb_xcursor_init1 (mc=0x7fffffffdd00, node=0x7efff767ffda) at mdb.c:7507
#1 mdb_cursor_set (mc=0x7fffffffdd00, key=0x7fffffffe0b0,
data=0x7fffffffe0a0, op=MDB_SET, exactp=0x7fffffffdcfc) at mdb.c:6142
#2 0x00000000005aceb5 in mdb_get (txn=<optimized out>, dbi=<optimized out>,
key=0x7fffffffe0b0, data=0x7fffffffe0a0) at mdb.c:5762
#3 0x000000000040fb30 in osmx::db::Metadata::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#4 0x000000000040da0b in main ()
rax 0x0 0
rbx 0x7fffffffdd00 140737488346368
rcx 0x7efff767ffff 139637832548351
rdx 0x1d 29
rsi 0x1000000000000 281474976710656
rdi 0x7efff767ffda 139637832548314
rbp 0x7fffffffdcfc 0x7fffffffdcfc
rsp 0x7fffffffdc90 0x7fffffffdc90
r8 0x7fffffffdcfc 140737488346364
r9 0x6c9a73 7117427
r10 0x6c9828 7116840
r11 0x7ffff7872be0 140737346218976
r12 0x7fffffffe0b0 140737488347312
r13 0x7efff767f000 139637832544256
r14 0xf 15
r15 0x7fffffffe0a0 140737488347296
rip 0x5ad1ba 0x5ad1ba <mdb_cursor_set+762>
eflags 0x10206 [ PF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
=> 0x5ad1ba <mdb_cursor_set+762>: mov %rsi,0x188(%rax)
0x5ad1c1 <mdb_cursor_set+769>:
movaps 0x9aa78(%rip),%xmm0 # 0x647c40
0x5ad1c8 <mdb_cursor_set+776>: movups %xmm0,0x190(%rax)
0x5ad1cf <mdb_cursor_set+783>: movq $0x0,0x1a0(%rax)
'exploitable' version 1.32
Linux ubuntu 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64
Signal si_signo: 11 Signal si_addr: 392
Nearby code:
0x00000000005ad19e <+734>: jne 0x5ad284 <mdb_cursor_set+964>
0x00000000005ad1a4 <+740>: movzx edx,WORD PTR [rdi+0x6]
0x00000000005ad1a8 <+744>: lea rcx,[rdi+rdx*1]
0x00000000005ad1ac <+748>: add rcx,0x8
0x00000000005ad1b0 <+752>: movabs rsi,0x1000000000000
=> 0x00000000005ad1ba <+762>: mov QWORD PTR [rax+0x188],rsi
0x00000000005ad1c1 <+769>: movaps xmm0,XMMWORD PTR [rip+0x9aa78] # 0x647c40
0x00000000005ad1c8 <+776>: movups XMMWORD PTR [rax+0x190],xmm0
0x00000000005ad1cf <+783>: mov QWORD PTR [rax+0x1a0],0x0
0x00000000005ad1da <+794>: movzx esi,WORD PTR [rdi+rdx*1+0x14]
Stack trace:
# 0 mdb_xcursor_init1 at 0x5ad1ba in OSMExpress/osmx
# 1 mdb_cursor_set at 0x5ad1ba in OSMExpress/osmx
# 2 mdb_get at 0x5aceb5 in OSMExpress/osmx
# 3 osmx::db::Metadata::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at 0x40fb30 in OSMExpress/osmx
# 4 main at 0x40da0b in OSMExpress/osmx
Faulting frame: # 0 mdb_xcursor_init1 at 0x5ad1ba in OSMExpress/osmx
Description: Access violation near NULL on destination operand
Short description: DestAvNearNull (15/22)
Hash: 474dceedaab021c3f42b9dc0bd4ed041.474dceedaab021c3f42b9dc0bd4ed041
Exploitability Classification: PROBABLY_EXPLOITABLE
Explanation: The target crashed on an access violation at an address matching the destination operand of the instruction. This likely indicates a write access violation, which means the attacker may control write address and/or value. However, it there is a chance it could be a NULL dereference.
Other tags: AccessViolation (21/22)
Thanks!
We're exploring an application of OSMExpress that is interested in retrieving objects filtered by specific tags from a polygon bounding area.
In its simplest form, I'm imagining a python method that is something like:
# Polygon could be any number of things, such as a Shapely polygon,
# a python dict representation of GeoJson, a list of coordinate tuples or a bbox
def extract(region: "Polygon") -> "List[CellId]":
    ...
that reproduces the functionality of https://github.com/protomaps/OSMExpress/blob/master/src/extract.cpp#L130-L134
As discussed in #12 not all use cases require all metadata for locations and it has a high storage cost. There are some use cases though, like generating complete augmented diffs, that require all metadata. Keeping all metadata within OSMExpress as opposed to some external store is one option that reduces operational complexity at the cost of significant extra storage. It would be useful to hear what other use cases for all metadata might exist.
Summary of a brief discussion I had with @bdon regarding potential implementations:
Are we sure we need username in addition to uid? Username, as a variable length string field, would be the most painful to add.
This could be implemented as a "complete metadata" option, although in that case all other commands would need to know if the database was created in complete or slim (without location metadata) mode. In complete mode the locations table is not used and the nodes table would contain location, tags and metadata. In Slim mode the locations table is used and the nodes table only stores tags. We'd need to ensure that complete mode code does not impact performance of slim mode.
This task doesn't depend on #1, but may be simpler or use less storage space if it is addressed first.
Need to update constant.
We have been running osmx with nightly updates successfully for about the last ten months, powering KDE's "raw data" tile server which is e.g. used by Marble, and it has performed really well for that :)
We have been noticing a disproportional growth of the file on disk though, compared to a full re-import of the latest OSM data. A fresh import results in 786GB right now, while the incremental updates since November resulted in 877GB and growing. Is there some sort of database vacuum command (ideally in-place) to mitigate that growth, or do you have any other advice for continuous use that might help with this?
Thank you!
I'm considering packaging OSM Express for NixOS / nixpkgs. As with many Linux distributions, the preferred way is to let applications use common copies of the libraries provided by the distribution, rather than each bringing their own copies of those libraries.
It seems like, except for s2geometry, the libraries required by OSM Express are already in nixpkgs:
(When packaging OSM Express, I would also — separately — package s2geometry.)
However, I'm unsure how to tell the OSM Express build system to use the distribution-provided libraries rather than the ones included in the source tree, and in the instructions to build from source I didn't find anything about that, either. Ideally, this can be done by passing arguments to cmake instead of having to modify any files in the source tree.
I followed the install instructions for MacOS on Catalina. I have openssl installed via brew and also ran xcode-select --install.
When I run:
cmake -v -DCMAKE_BUILD_TYPE=Release -DOPENSSL_INCLUDE_DIR=/usr/local/Cellar/openssl/1.0.2s/include/ .
I get the error:
[ 62%] Built target s2
[ 62%] Building CXX object CMakeFiles/osmxTest.dir/test/test_region.cpp.o
[ 62%] Building CXX object CMakeFiles/osmxTest.dir/src/region.cpp.o
[ 62%] Linking CXX executable osmxTest
ld: cannot link directly with dylib/framework, your binary is not an allowed client of /usr/lib/libcrypto.dylib for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [osmxTest] Error 1
make[1]: *** [CMakeFiles/osmxTest.dir/all] Error 2
make: *** [all] Error 2
The up to date version of openssl on Catalina is openssl@1.1 1.1.1d, so I also tried using the include path for that version above, e.g.:
cmake -v -DCMAKE_BUILD_TYPE=Release -DOPENSSL_INCLUDE_DIR=/usr/local/Cellar/openssl@1.1/1.1.1d/include/ .
and got the same error. I tried omitting the openssl include dir option entirely, and also got the same error.
This is totally polish, but it would be nice to have a friendly help message when you run osmx without arguments. Right now it throws an exception:
OSMExpress (master *)$ ./osmx
No such file or directory, file /Users/matth/projects/misc/OSMExpress/src/storage.cpp, line 26.
Abort trap: 6
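The behavior being asked for can be sketched as follows. This is written in Python for brevity even though the osmx CLI itself is C++, and the usage wording, function names and argument layout are all hypothetical. The point is only that the argument count is checked before any positional argument is read, and that optional arguments are read through a bounds-checked helper.

```python
import sys

# Hypothetical wording; the real tool may describe its commands differently.
USAGE = "usage: osmx <command> [arguments...]"


def parse_args(argv):
    """Return (command, args), or exit with a usage message."""
    if len(argv) < 2:
        # No command given: show help instead of failing deeper in the code.
        raise SystemExit(USAGE)
    return argv[1], argv[2:]


def optional_arg(args, index):
    """Read a positional argument that may be absent."""
    return args[index] if index < len(args) else None


if __name__ == "__main__":
    command, args = parse_args(sys.argv)
    print(command, args)
```

Raising `SystemExit` with a string prints the message to stderr and exits with a non-zero status, which is friendlier than an abort from a failed assertion.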
Curious, have you considered using a string pool for storing frequent tags? Currently, OSMExpress stores all tags as :List(Text), but looking at taginfo I wonder if it might be worth representing the 64K most frequent tags as 16-bit integers. The numeric tag IDs might get assigned when an OSMExpress database is initially getting built from a planet, and never change during the database lifetime. If anyone happens to give this a try, I’d be curious about how much space this would save in practice. Of course it would make the codebase more complicated, also for clients who just want to decode an OSMExpress database. So, as always, there’d be a tradeoff.
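A minimal sketch of the string pool idea, assuming a tag is a (key, value) pair and that the pool is built once from a tag frequency count. The function names and the split into "pooled IDs plus literal leftovers" are illustrative assumptions, not how OSMExpress encodes anything today.

```python
import struct
from collections import Counter


def build_pool(tag_pairs, size=65536):
    """Assign IDs 0..size-1 to the most frequent (key, value) pairs."""
    counts = Counter(tag_pairs)
    return {tag: i for i, (tag, _) in enumerate(counts.most_common(size))}


def encode(tags, pool):
    """Split tags into packed 16-bit IDs plus tags that stay as text."""
    ids = [pool[t] for t in tags if t in pool]
    rest = [t for t in tags if t not in pool]
    return struct.pack(f"<{len(ids)}H", *ids), rest


def decode(packed, rest, pool):
    """Inverse of encode; pooled tags come first, then the literal ones."""
    reverse = {i: t for t, i in pool.items()}
    ids = struct.unpack(f"<{len(packed) // 2}H", packed)
    return [reverse[i] for i in ids] + rest
```

Each pooled tag costs two bytes instead of the length of its key and value strings. Decoding needs the same pool, which is the extra burden on clients mentioned above.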
If a polygon is passed in where all points are coincident, it should handle this as belonging to only one cell, instead of returning the entire planet as the covering.
Using the pre-compiled 0.2.0 Linux binary I get the following assert when trying to load a full planet pbf file:
$ osmx expand planet-latest.osm.pbf planet.osmx
Start convert
generator planet-dump-ng 1.1.8
osmosis_replication_timestamp 2020-07-13T00:00:00Z
pbf_dense_nodes true
pbf_optional_feature_0 Has_Metadata
pbf_optional_feature_1 Sort.Type_then_ID
timestamp 2020-07-13T00:00:00Z
Box: (-180,-90,180,90)
Timestamp: 2020-07-13T00:00:00Z
Sequence#:
Start insert
[======================================================================] 100%
Start External sort cell_node
[======================================================================] 100%
Finished External sort cell_node in 3596.03 seconds.
Start External sort node_way
MDB_KEYEXIST: Key/data pair already exists, file /home/ubuntu/OSMExpress/src/storage.cpp, line 157.
Aborted (core dumped)
Using a smaller extract worked fine.
I'm trying to get the OSMExpress command line tool to work with Windows. As there is no release for Windows yet, I followed the manual build instructions to build the project in the Windows Subsystem for Linux (Ubuntu). The commands do run successfully, but the resulting osmx executable isn't found by the system for some reason. I tried setting it as a path variable and it seems to work, but when trying to execute osmx expand new_york_county.osm.pbf new_york_county.osmx I get the following database error:
The use case here is doing extracts for a given administrative boundary.
Because creating multipolygons is out of scope of this library, we might assume at worst complete connectivity between node locations in the relation, so all covering approximations of relations must be convex.
For most use cases like a metropolitan area, this assumption is fine as they are generally convex.
For unusually defined areas, such as an admin boundary surrounding another (example: South Africa : Lesotho), this strategy isn't great.
For admin boundaries that have exclaves (United States with Alaska included https://www.openstreetmap.org/relation/148838, France with all overseas departments https://www.openstreetmap.org/relation/2202162), this strategy is really bad.
A refinement is to do some minor interpretation of relation roles, creating a convex hull for each "outer" member.
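The refinement could look roughly like this: one convex hull per "outer" member instead of one hull for the whole relation. This is a sketch under assumptions; the `(role, coordinates)` member shape is hypothetical, and the hull is computed on planar lon/lat values with Andrew's monotone chain, which ignores the antimeridian and spherical geometry.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hulls_for_outer_members(members):
    """members: iterable of (role, [(lon, lat), ...]) pairs (hypothetical shape)."""
    return [convex_hull(coords) for role, coords in members if role == "outer"]
```

For a relation with exclaves this yields one hull per outer way rather than a single hull spanning everything in between, though outer rings that are split across several member ways would still need to be joined first.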
this would be cool
It would be nice if I could query for multiple ids at once per osm element and then have the result piped to stdout in a common data format such as CSV or GeoJSON. Something like
osmx query ./database.osmx way 1001 1002 1234
// or
osmx query ./database.osmx way 1001,1002,1234
I'm not that familiar with C++ or these APIs but it looks like if I wanted to do that myself using the osmexpress headers in code I'd have to make one getReader(id) call for each node, way or relation element that I wanted to pull from the database. Is that correct or is there another API call I can use to return a List of results?
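If one lookup per id does turn out to be necessary, a thin wrapper could still give the command-line shape proposed above. The sketch below is in Python and makes no claim about the osmx API: `get` is any callable that maps an id to an object or None, and the CSV columns are placeholders.

```python
import csv
import io


def parse_ids(args):
    """Accept ids as separate arguments or as comma-separated lists."""
    return [int(tok) for arg in args for tok in arg.split(",") if tok]


def query_many(get, ids):
    """Run one lookup per id; `get` is any callable id -> object or None."""
    return [(osm_id, get(osm_id)) for osm_id in ids]


def to_csv(rows):
    """Write a minimal CSV reporting which ids were found."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "found"])
    for osm_id, obj in rows:
        writer.writerow([osm_id, obj is not None])
    return out.getvalue()
```

Both `1001 1002 1234` and `1001,1002,1234` parse to the same list, so either of the two proposed syntaxes could be supported.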
Cool project! I'm considering packaging OSM Express for NixOS / nixpkgs.
Package names in nixpkgs:
(See section Package naming in the Nixpkgs manual.)
Given these constraints, what should the package name be?
osmexpress?
osm-express?
osmx, like your PyPI package for the python lib?
FlatBuffers might be simpler to compile and install because it has fewer features than capnp. If the impact on speed and file size is small (or better), we should use that instead. We can also try to implement a hardcoded string encoding for common keys/values for space savings.
The osmx command line utility is useful to non-C++ programmers who just want to download, update and query OpenStreetMap data. There should be either a giant static binary build or an installer bundle with libraries included for macOS, Ubuntu.
Hi. New to the project, trying to pull some tag information for a node, and hitting a wall. I failed to find any examples to help me along, so I'm going to describe the problem, and hope that there is a simple fix.
This problem arises from the augmented diff construction process, and the discovery that in node deletion, an overpass-generated diff shows tag information for a node that is being deleted, but diffs generated by https://github.com/azavea/onramp have an empty tag field. I noticed that this block doesn't attempt to read in the tags, as is the case for ways and relations. (This is relevant to OSMX, I swear.)
I went to add node tags to this module, but ran into problems finding these tags. Given a known node id, the following
node_id = ######
env = osmx.Environment(osmx_filename)
with osmx.Transaction(env) as txn:
    nodes = osmx.Nodes(txn)
    nd = nodes.get(node_id)
    print(nd.tags)
fails because nd is None, though the location does exist.
I must be misunderstanding in a basic way, and could use a pointer or two.
Thanks!
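One possible reading, offered as an assumption rather than a confirmed answer: if the nodes table only holds records for nodes that carry tags, then a None from nodes.get would simply mean "no tags stored", not "node missing". Under that assumption a defensive lookup could check the location first and treat a missing node record as an empty tag list. The helper below is hypothetical and duck-typed: `nodes` and `locations` are any objects with a `.get(id)` method that returns a record or None.

```python
def describe_node(node_id, nodes, locations):
    """Return location and tags for a node, or None if it has no location.

    Assumes (unconfirmed) that a node without tags has no entry in `nodes`.
    """
    location = locations.get(node_id)
    if location is None:
        # No location stored: treat the node as absent from the database.
        return None
    record = nodes.get(node_id)
    # A missing record is treated as "no tags" rather than an error.
    tags = list(record.tags) if record is not None else []
    return {"id": node_id, "location": location, "tags": tags}
```

With this shape, the snippet above would print an empty tag list for an untagged node instead of raising an AttributeError on None.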
Possibly add one or all of:
We would ignore metadata for nodes that have no tags.
The latest version available at https://pypi.org/project/osmx/ is 0.0.3
I'm on a fresh Ubuntu 22.04 install, and I'm attempting to compile OSMExpress.
I get through CMake, and when I make, I get the following error:
[ 32%] Built target capnp
[ 32%] Building CXX object vendor/s2geometry/CMakeFiles/s2.dir/src/s2/base/stringprintf.cc.o
/home/n/OSMExpress/vendor/s2geometry/src/s2/base/stringprintf.cc:17:10: fatal error: 's2/base/stringprintf.h' file not found
#include "s2/base/stringprintf.h"
^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [vendor/s2geometry/CMakeFiles/s2.dir/build.make:76: vendor/s2geometry/CMakeFiles/s2.dir/src/s2/base/stringprintf.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1791: vendor/s2geometry/CMakeFiles/s2.dir/all] Error 2
make: *** [Makefile:166: all] Error 2
I also attempted to run the 0.2.0 compiled binary, and I get this error.
./osmx: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory