Code Monkey home page Code Monkey logo

Comments (9)

kylebarron avatar kylebarron commented on September 25, 2024 1

It's correct that we don't currently support mixed geometry types as input.

It would be good to have a helper to do this. It looks like shapely.get_type_ids is quite fast (10ms for 3M points), so we'll add something like this to the internals of viz. You should be able to use this until the next release with just viz(split_gdf(gdf)).

from typing import List
from shapely import GeometryType


def split_gdf(gdf: gpd.GeoDataFrame) -> List[gpd.GeoDataFrame]:
    type_ids = np.array(shapely.get_type_id(gdf.geometry))
    unique_type_ids = set(np.unique(type_ids))

    if GeometryType.GEOMETRYCOLLECTION in unique_type_ids:
        raise ValueError("GeometryCollections not currently supported")

    if GeometryType.LINEARRING in unique_type_ids:
        raise ValueError("LinearRings not currently supported")

    if len(unique_type_ids) == 1:
        return [gdf]

    if len(unique_type_ids) == 2:
        if unique_type_ids == {GeometryType.POINT, GeometryType.MULTIPOINT}:
            return [gdf]

        if unique_type_ids == {GeometryType.LINESTRING, GeometryType.MULTILINESTRING}:
            return [gdf]

        if unique_type_ids == {GeometryType.POLYGON, GeometryType.MULTIPOLYGON}:
            return [gdf]

    gdfs = []
    point_indices = np.where(
        (type_ids == GeometryType.POINT) | (type_ids == GeometryType.MULTIPOINT)
    )[0]
    if len(point_indices) > 0:
        gdfs.append(gdf.iloc[point_indices])

    linestring_indices = np.where(
        (type_ids == GeometryType.LINESTRING)
        | (type_ids == GeometryType.MULTILINESTRING)
    )[0]
    if len(linestring_indices) > 0:
        gdfs.append(gdf.iloc[linestring_indices])

    polygon_indices = np.where(
        (type_ids == GeometryType.POLYGON) | (type_ids == GeometryType.MULTIPOLYGON)
    )[0]
    if len(polygon_indices) > 0:
        gdfs.append(gdf.iloc[polygon_indices])

    return gdfs

In the long run, we'll support mixed geometry types on the JS side as well, but I think unblocking Python users is good enough for now.

Using this code results in another error:

Can you make a separate issue for this? This could be a bug in geoarrow-rs or geoarrow-pyarrow.

from lonboard.

kylebarron avatar kylebarron commented on September 25, 2024 1

Is lonboard currently optimized to display big polygon files? Points using the scatterplot layer are blazingly fast, but the polygon layer takes a toll on machine resources.

Polygons are necessarily much more consuming to draw than points. As is, Lonboard draws every polygon without any simplification. Your options to improve performance are:

  • Drawing with only an outline filled=False or with only a fill stroked=False.;
  • Simplifying polygons before rendering
  • Reducing the amount of data rendered.

from lonboard.

RaczeQ avatar RaczeQ commented on September 25, 2024

After looking into the stacktrace, it looks like shapely issue with to_ragged_array function.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 2
      1 # table = pq.read_table(osm_data_path)
----> 2 viz(table)
      3 # table

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:150, in viz(data, scatterplot_kwargs, path_kwargs, polygon_kwargs, map_kwargs)
    138     layers = [
    139         create_layer_from_data_input(
    140             item,
   (...)
    146         for i, item in enumerate(data)
    147     ]
    148 else:
    149     layers = [
--> 150         create_layer_from_data_input(
    151             data,
    152             _viz_color=color_ordering[0],
    153             scatterplot_kwargs=scatterplot_kwargs,
    154             path_kwargs=path_kwargs,
    155             polygon_kwargs=polygon_kwargs,
    156         )
    157     ]
    159 map_kwargs = {} if not map_kwargs else map_kwargs
    161 if "basemap_style" not in map_kwargs.keys():

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:186, in create_layer_from_data_input(data, **kwargs)
    184 # pyarrow table
    185 if isinstance(data, pa.Table):
--> 186     return _viz_geoarrow_table(data, **kwargs)
    188 # Shapely array
    189 if isinstance(data, np.ndarray) and np.issubdtype(data.dtype, np.object_):

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:365, in _viz_geoarrow_table(table, _viz_color, scatterplot_kwargs, path_kwargs, polygon_kwargs)
    356 def _viz_geoarrow_table(
    357     table: pa.Table,
    358     *,
   (...)
    362     polygon_kwargs: Optional[PolygonLayerKwargs] = None,
    363 ) -> Union[ScatterplotLayer, PathLayer, PolygonLayer]:
    364     table = remove_extension_classes(table)
--> 365     table = parse_wkb_table(table)
    367     geometry_column_index = get_geometry_column_index(table.schema)
    368     geometry_field = table.schema.field(geometry_column_index)

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_geoarrow/parse_wkb.py:33, in parse_wkb_table(table)
     31     extension_type_name = field.metadata.get(b"ARROW:extension:name")
     32     if extension_type_name in wkb_names:
---> 33         new_field, new_column = parse_wkb_column(field, column)
     34         table = table.set_column(field_idx, new_field, new_column)
     36 return table

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_geoarrow/parse_wkb.py:84, in parse_wkb_column(field, column)
     81 # We call shapely.from_wkb on the _entire column_ so that we don't get mixed type
     82 # arrays in each column.
     83 shapely_arr = shapely.from_wkb(column)
---> 84 new_field, geom_arr = construct_geometry_array(
     85     shapely_arr,
     86     crs_str=crs_str,
     87 )
     89 # Slice full array to maintain chunking
     90 chunk_offsets = [0]

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_geoarrow/extension_types.py:292, in construct_geometry_array(shapely_arr, include_z, field_name, crs_str)
    282 def construct_geometry_array(
    283     shapely_arr: NDArray[np.object_],
    284     include_z: Optional[bool] = None,
   (...)
    290     # extension metadata on the field without instantiating extension types into the
    291     # global pyarrow registry
--> 292     geom_type, coords, offsets = shapely.to_ragged_array(
    293         shapely_arr, include_z=include_z
    294     )
    296     if coords.shape[-1] == 2:
    297         dims = CoordinateDimension.XY

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/shapely/_ragged_array.py:291, in to_ragged_array(geometries, include_z)
    286         raise ValueError(
    287             "Geometry type combination is not supported "
    288             f"({[GeometryType(t).name for t in geom_types]})"
    289         )
    290 else:
--> 291     raise ValueError(
    292         "Geometry type combination is not supported "
    293         f"({[GeometryType(t).name for t in geom_types]})"
    294     )
    296 return typ, coords, offsets

ValueError: Geometry type combination is not supported (['POINT', 'LINESTRING', 'POLYGON', 'MULTIPOLYGON'])

from lonboard.

RaczeQ avatar RaczeQ commented on September 25, 2024

Using this code results in another error:

from geoarrow.rust.core import read_parquet

table = read_parquet(osm_data_path)
viz(table)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[52], line 1
----> 1 viz(table)

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:150, in viz(data, scatterplot_kwargs, path_kwargs, polygon_kwargs, map_kwargs)
    138     layers = [
    139         create_layer_from_data_input(
    140             item,
   (...)
    146         for i, item in enumerate(data)
    147     ]
    148 else:
    149     layers = [
--> 150         create_layer_from_data_input(
    151             data,
    152             _viz_color=color_ordering[0],
    153             scatterplot_kwargs=scatterplot_kwargs,
    154             path_kwargs=path_kwargs,
    155             polygon_kwargs=polygon_kwargs,
    156         )
    157     ]
    159 map_kwargs = {} if not map_kwargs else map_kwargs
    161 if "basemap_style" not in map_kwargs.keys():

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:204, in create_layer_from_data_input(data, **kwargs)
    202 if hasattr(data, "__arrow_c_stream__"):
    203     data = cast("ArrowStreamExportable", data)
--> 204     return _viz_geoarrow_table(pa.table(data), **kwargs)
    206 # Anything with __geo_interface__
    207 if hasattr(data, "__geo_interface__"):

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/table.pxi:5221, in pyarrow.lib.table()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/ipc.pxi:880, in pyarrow.lib.RecordBatchReader._import_from_c_capsule()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/error.pxi:88, in pyarrow.lib.check_status()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/geoarrow/pyarrow/_type.py:58, in GeometryExtensionType.__arrow_ext_deserialize__(cls, storage_type, serialized)
     55 schema = lib.SchemaHolder()
     56 storage_type._export_to_c(schema._addr())
---> 58 c_vector_type = lib.CVectorType.FromStorage(
     59     schema, cls._extension_name.encode("UTF-8"), serialized
     60 )
     62 return cls(c_vector_type)

File src/geoarrow/c/_lib.pyx:496, in geoarrow.c._lib.CVectorType.FromStorage()

File src/geoarrow/c/_lib.pyx:375, in geoarrow.c._lib.CVectorType._move_from_ctype()

ValueError: Failed to initialize GeoArrowSchemaView: Expected valid list type for coord parent 1 for extension 'geoarrow.multipoint'

from lonboard.

RaczeQ avatar RaczeQ commented on September 25, 2024

File for tests:
monaco_nofilter_noclip_compact.zip

from lonboard.

RaczeQ avatar RaczeQ commented on September 25, 2024

Thanks @kylebarron for some guidance. For now, I've used duckdb to get the geometry type (geoarrow-c has some internal functions to get that when calculating unique geometry types, but it's not exposed to the python api) and pyarrow compute filter to separate a single table into three with this code:

import duckdb
import pyarrow as pa
import pyarrow.compute as pc
from geoarrow.pyarrow import io
from lonboard import viz

duckdb.install_extension("spatial")
duckdb.load_extension("spatial")

try:
    from osmnx import geocode_to_gdf

    import quackosm as qosm

    region = geocode_to_gdf("Chicago") # will donwload 279 mb file, works on a 16 gb machine
    # region = geocode_to_gdf("Greater London") # doesn't work on a 16 gb machine, can display points and linestrings as separate maps, but not polygons
    data_path = qosm.convert_geometry_to_gpq(region.unary_union)

    geoparquet_path = data_path
except Exception:
    geoparquet_path = "monaco_nofilter_noclip_compact.geoparquet" # use file from the zip attached, small example

geometry_types = (
    duckdb.read_parquet(str(geoparquet_path))
    .project("ST_GeometryType(ST_GeomFromWKB(geometry)) t")
    .to_arrow_table()["t"]
)

polygon_values = pa.array(["POLYGON", "MULTIPOLYGON"])
linestring_values = pa.array(["LINESTRING", "MULTILINESTRING"])
points_values = pa.array(["POINT", "MULTIPOINT"])
array_filters = (points_values, linestring_values, polygon_values)

geoarrow_tbl = io.read_geoparquet_table(geoparquet_path)

arrays = [
    geoarrow_tbl.filter(pc.is_in(geometry_types, value_set=search_values))
    for search_values in array_filters
]

viz(arrays)

Is lonboard currently optimized to display big polygon files? Points using the scatterplot layer are blazingly fast, but the polygon layer takes a toll on machine resources.

image
image

from lonboard.

kylebarron avatar kylebarron commented on September 25, 2024

Longer term we may additionally implement an optional tiled approach, where e.g. I port https://github.com/mapbox/geojson-vt to Rust + Arrow and then we dynamically fetch tiles from Python to JS. That would allow a general use case of having more data in Python than you want to send to the browser. See #414

from lonboard.

kylebarron avatar kylebarron commented on September 25, 2024

File for tests: monaco_nofilter_noclip_compact.zip

By the way I think we've been thinking that GeoParquet files should still be saved with a .parquet file extension, not .geoparquet, but I opened an issue on the spec repo to discuss. opengeospatial/geoparquet#211

from lonboard.

RaczeQ avatar RaczeQ commented on September 25, 2024

Thank you Kyle for the help and incorporating this use-case into the lonboard 🙂
I have to admit, I wasn't aware that *.geoparquet extension might be not compliant with official GeoParquet specification, so I'll probably update my libraries for that as well 😅

from lonboard.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.