using geofence polygon to judge one event is too complicated and slow in Hive by direc

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Seeking geofence st_contains query alternative about h3 HOT 4 CLOSED

uber commented on July 18, 2024

Seeking geofence st_contains query alternative

from h3.

Comments (4)

nrabinowitz commented on July 18, 2024 1

(As this is a question, not a feature request or bug report, please direct to StackOverflow in the future.)

I think there's some confusion here - you can use the polyfill function to return the hexagons within a polygon, but they are all the same resolution (supplied by the caller). You would have to call compact on the resulting hexagons in order to get hexagons in multiple resolutions.

I am not a Hive expert, but there are a few options for using H3 to spatially index data points:

The simplest is to use polyfill to fill a given polygon at a resolution that fits your desired precision, and then create a table using the hexagons as a reverse index with rows like h3index, polygon_id. Data points with lat/lon can then be mapped to a H3 index using geoToH3 (ideally at index time), and you can join this field with the polygon table to find the polygon id for a given data point. This is generally very fast, but the reverse index can get very large depending on the size and precision of the polygons you need to index.
A slower but more space-efficient option uses compact to index the polygon at multiple resolutions. Assigning a data point to a polygon would then require performing n joins or queries, where n is the number of different resolutions in the compacted set. This might be a better solution if you needed to cover a significant geographic area with high precision (making a standard reverse index very large) but generally only had to handle a single data point at a time (e.g. in a geocoding API).

Obviously if your polygons are stable, you'd be better off doing this at index time, storing the polygon id with the data point, rather than at query time.

from h3.

nrabinowitz commented on July 18, 2024 1

As noted, I don't actually know much about Hive :). This tradeoff was something we were considering with Cassandra queries at one point.

from h3.

dfellis commented on July 18, 2024

@nrabinowitz I don't think the second one would be that much slower. If you take your given lat, lng coordinates and compute the H3 index at all of the possible resolutions (say 6, 7, 8, 9 for your compacted set) and simply query for any of those 4 matching, it would just be four integer compares until it either succeeds or fails, and that would probably actually be faster than a single comparison across the entire uncompacted set.

If it was a normal non-Hive database, you could index the H3 integers with a hash index and it would literally be just 4 hash lookups and you'd get the answer in O(1) time instead of O(n), but that's the trade-off you make with the Hadoop ecosystem versus classic DBs (more space, but slower queries).

from h3.

harryprince commented on July 18, 2024

@dfellis join multiple times seems too stupid on the hive SQL grammar side, however, so far, it seems the best solution.

from h3.

Seeking geofence st_contains query alternative about h3 HOT 4 CLOSED

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent