Comments (9)
It's correct that we don't currently support mixed geometry types as input.
It would be good to have a helper to do this. It looks like shapely.get_type_id is quite fast (10ms for 3M points), so we'll add something like this to the internals of viz. You should be able to use this until the next release with just viz(split_gdf(gdf)).
from typing import List

import geopandas as gpd
import numpy as np
import shapely
from shapely import GeometryType


def split_gdf(gdf: gpd.GeoDataFrame) -> List[gpd.GeoDataFrame]:
    # Integer geometry type id for every row
    type_ids = np.array(shapely.get_type_id(gdf.geometry))
    unique_type_ids = set(np.unique(type_ids))

    if GeometryType.GEOMETRYCOLLECTION in unique_type_ids:
        raise ValueError("GeometryCollections not currently supported")

    if GeometryType.LINEARRING in unique_type_ids:
        raise ValueError("LinearRings not currently supported")

    # A single geometry type needs no splitting
    if len(unique_type_ids) == 1:
        return [gdf]

    # A single/multi pair of the same family needs no splitting either
    if len(unique_type_ids) == 2:
        if unique_type_ids == {GeometryType.POINT, GeometryType.MULTIPOINT}:
            return [gdf]

        if unique_type_ids == {GeometryType.LINESTRING, GeometryType.MULTILINESTRING}:
            return [gdf]

        if unique_type_ids == {GeometryType.POLYGON, GeometryType.MULTIPOLYGON}:
            return [gdf]

    # Otherwise build one GeoDataFrame per family present in the input
    gdfs = []
    point_indices = np.where(
        (type_ids == GeometryType.POINT) | (type_ids == GeometryType.MULTIPOINT)
    )[0]
    if len(point_indices) > 0:
        gdfs.append(gdf.iloc[point_indices])

    linestring_indices = np.where(
        (type_ids == GeometryType.LINESTRING)
        | (type_ids == GeometryType.MULTILINESTRING)
    )[0]
    if len(linestring_indices) > 0:
        gdfs.append(gdf.iloc[linestring_indices])

    polygon_indices = np.where(
        (type_ids == GeometryType.POLYGON) | (type_ids == GeometryType.MULTIPOLYGON)
    )[0]
    if len(polygon_indices) > 0:
        gdfs.append(gdf.iloc[polygon_indices])

    return gdfs
In the long run, we'll support mixed geometry types on the JS side as well, but I think unblocking Python users is good enough for now.
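For anyone who wants to see the grouping rule on its own, here is a dependency-free sketch that buckets row positions by geometry family from plain type-name strings. The names FAMILY_OF and group_by_family are hypothetical, made up for illustration; they are not part of Lonboard or Shapely, and real code would work on Shapely type ids as in split_gdf.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical mapping for illustration: single and multi variants share a
# family, mirroring the three layer kinds (points, lines, polygons).
FAMILY_OF = {
    "POINT": "point",
    "MULTIPOINT": "point",
    "LINESTRING": "line",
    "MULTILINESTRING": "line",
    "POLYGON": "polygon",
    "MULTIPOLYGON": "polygon",
}


def group_by_family(type_names: List[str]) -> Dict[str, List[int]]:
    """Return row positions per geometry family, rejecting unsupported types."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for position, name in enumerate(type_names):
        if name not in FAMILY_OF:
            raise ValueError(f"{name} not currently supported")
        groups[FAMILY_OF[name]].append(position)
    return dict(groups)
```

Each list of positions could then be used to slice the original table, one slice per layer.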
Using this code results in another error:
Can you make a separate issue for this? This could be a bug in geoarrow-rs or geoarrow-pyarrow.
from lonboard.
Is lonboard currently optimized to display big polygon files? Points using the scatterplot layer are blazingly fast, but the polygon layer takes a toll on machine resources.
Polygons are necessarily much more expensive to draw than points. As is, Lonboard draws every polygon without any simplification. Your options to improve performance are:
- Drawing with only an outline (filled=False) or with only a fill (stroked=False).
- Simplifying polygons before rendering.
- Reducing the amount of data rendered.
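To illustrate what simplifying before rendering means, below is a minimal pure-Python sketch of the Douglas-Peucker algorithm applied to a list of (x, y) vertices. It is only an illustration, not something Lonboard does for you; the function names are made up for this example. In practice you would more likely call GeoSeries.simplify(tolerance) from GeoPandas on the geometry column before passing the data to viz.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment (e.g. a closed ring whose ends coincide)
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: List[Point], tolerance: float) -> List[Point]:
    """Douglas-Peucker: drop vertices within tolerance of the simplified line."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord joining the endpoints
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], points[0], points[-1])
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest vertex and recurse on both halves
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```

The tolerance is in the units of the coordinates, so for data in EPSG:4326 it is expressed in degrees. Fewer vertices per polygon means less data to send to the browser and less to draw.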
from lonboard.
After looking into the stack trace, it looks like an issue with shapely's to_ragged_array function.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[5], line 2
1 # table = pq.read_table(osm_data_path)
----> 2 viz(table)
3 # table
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:150, in viz(data, scatterplot_kwargs, path_kwargs, polygon_kwargs, map_kwargs)
138 layers = [
139 create_layer_from_data_input(
140 item,
(...)
146 for i, item in enumerate(data)
147 ]
148 else:
149 layers = [
--> 150 create_layer_from_data_input(
151 data,
152 _viz_color=color_ordering[0],
153 scatterplot_kwargs=scatterplot_kwargs,
154 path_kwargs=path_kwargs,
155 polygon_kwargs=polygon_kwargs,
156 )
157 ]
159 map_kwargs = {} if not map_kwargs else map_kwargs
161 if "basemap_style" not in map_kwargs.keys():
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:186, in create_layer_from_data_input(data, **kwargs)
184 # pyarrow table
185 if isinstance(data, pa.Table):
--> 186 return _viz_geoarrow_table(data, **kwargs)
188 # Shapely array
189 if isinstance(data, np.ndarray) and np.issubdtype(data.dtype, np.object_):
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:365, in _viz_geoarrow_table(table, _viz_color, scatterplot_kwargs, path_kwargs, polygon_kwargs)
356 def _viz_geoarrow_table(
357 table: pa.Table,
358 *,
(...)
362 polygon_kwargs: Optional[PolygonLayerKwargs] = None,
363 ) -> Union[ScatterplotLayer, PathLayer, PolygonLayer]:
364 table = remove_extension_classes(table)
--> 365 table = parse_wkb_table(table)
367 geometry_column_index = get_geometry_column_index(table.schema)
368 geometry_field = table.schema.field(geometry_column_index)
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_geoarrow/parse_wkb.py:33, in parse_wkb_table(table)
31 extension_type_name = field.metadata.get(b"ARROW:extension:name")
32 if extension_type_name in wkb_names:
---> 33 new_field, new_column = parse_wkb_column(field, column)
34 table = table.set_column(field_idx, new_field, new_column)
36 return table
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_geoarrow/parse_wkb.py:84, in parse_wkb_column(field, column)
81 # We call shapely.from_wkb on the _entire column_ so that we don't get mixed type
82 # arrays in each column.
83 shapely_arr = shapely.from_wkb(column)
---> 84 new_field, geom_arr = construct_geometry_array(
85 shapely_arr,
86 crs_str=crs_str,
87 )
89 # Slice full array to maintain chunking
90 chunk_offsets = [0]
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_geoarrow/extension_types.py:292, in construct_geometry_array(shapely_arr, include_z, field_name, crs_str)
282 def construct_geometry_array(
283 shapely_arr: NDArray[np.object_],
284 include_z: Optional[bool] = None,
(...)
290 # extension metadata on the field without instantiating extension types into the
291 # global pyarrow registry
--> 292 geom_type, coords, offsets = shapely.to_ragged_array(
293 shapely_arr, include_z=include_z
294 )
296 if coords.shape[-1] == 2:
297 dims = CoordinateDimension.XY
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/shapely/_ragged_array.py:291, in to_ragged_array(geometries, include_z)
286 raise ValueError(
287 "Geometry type combination is not supported "
288 f"({[GeometryType(t).name for t in geom_types]})"
289 )
290 else:
--> 291 raise ValueError(
292 "Geometry type combination is not supported "
293 f"({[GeometryType(t).name for t in geom_types]})"
294 )
296 return typ, coords, offsets
ValueError: Geometry type combination is not supported (['POINT', 'LINESTRING', 'POLYGON', 'MULTIPOLYGON'])
from lonboard.
Using this code results in another error:
from geoarrow.rust.core import read_parquet
table = read_parquet(osm_data_path)
viz(table)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[52], line 1
----> 1 viz(table)
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:150, in viz(data, scatterplot_kwargs, path_kwargs, polygon_kwargs, map_kwargs)
138 layers = [
139 create_layer_from_data_input(
140 item,
(...)
146 for i, item in enumerate(data)
147 ]
148 else:
149 layers = [
--> 150 create_layer_from_data_input(
151 data,
152 _viz_color=color_ordering[0],
153 scatterplot_kwargs=scatterplot_kwargs,
154 path_kwargs=path_kwargs,
155 polygon_kwargs=polygon_kwargs,
156 )
157 ]
159 map_kwargs = {} if not map_kwargs else map_kwargs
161 if "basemap_style" not in map_kwargs.keys():
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:204, in create_layer_from_data_input(data, **kwargs)
202 if hasattr(data, "__arrow_c_stream__"):
203 data = cast("ArrowStreamExportable", data)
--> 204 return _viz_geoarrow_table(pa.table(data), **kwargs)
206 # Anything with __geo_interface__
207 if hasattr(data, "__geo_interface__"):
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/table.pxi:5221, in pyarrow.lib.table()
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/ipc.pxi:880, in pyarrow.lib.RecordBatchReader._import_from_c_capsule()
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/error.pxi:88, in pyarrow.lib.check_status()
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/geoarrow/pyarrow/_type.py:58, in GeometryExtensionType.__arrow_ext_deserialize__(cls, storage_type, serialized)
55 schema = lib.SchemaHolder()
56 storage_type._export_to_c(schema._addr())
---> 58 c_vector_type = lib.CVectorType.FromStorage(
59 schema, cls._extension_name.encode("UTF-8"), serialized
60 )
62 return cls(c_vector_type)
File src/geoarrow/c/_lib.pyx:496, in geoarrow.c._lib.CVectorType.FromStorage()
File src/geoarrow/c/_lib.pyx:375, in geoarrow.c._lib.CVectorType._move_from_ctype()
ValueError: Failed to initialize GeoArrowSchemaView: Expected valid list type for coord parent 1 for extension 'geoarrow.multipoint'
from lonboard.
File for tests:
monaco_nofilter_noclip_compact.zip
from lonboard.
Thanks @kylebarron for some guidance. For now, I've used duckdb to get the geometry type (geoarrow-c has some internal functions to get that when calculating unique geometry types, but it's not exposed to the Python API) and a pyarrow compute filter to separate a single table into three with this code:
import duckdb
import pyarrow as pa
import pyarrow.compute as pc
from geoarrow.pyarrow import io
from lonboard import viz

duckdb.install_extension("spatial")
duckdb.load_extension("spatial")

try:
    from osmnx import geocode_to_gdf
    import quackosm as qosm

    region = geocode_to_gdf("Chicago")  # will download a 279 MB file, works on a 16 GB machine
    # region = geocode_to_gdf("Greater London")  # doesn't work on a 16 GB machine, can display points and linestrings as separate maps, but not polygons
    data_path = qosm.convert_geometry_to_gpq(region.unary_union)
    geoparquet_path = data_path
except Exception:
    geoparquet_path = "monaco_nofilter_noclip_compact.geoparquet"  # use file from the zip attached, small example

# One geometry type name per row, computed by DuckDB from the WKB column
geometry_types = (
    duckdb.read_parquet(str(geoparquet_path))
    .project("ST_GeometryType(ST_GeomFromWKB(geometry)) t")
    .to_arrow_table()["t"]
)

polygon_values = pa.array(["POLYGON", "MULTIPOLYGON"])
linestring_values = pa.array(["LINESTRING", "MULTILINESTRING"])
points_values = pa.array(["POINT", "MULTIPOINT"])
array_filters = (points_values, linestring_values, polygon_values)

geoarrow_tbl = io.read_geoparquet_table(geoparquet_path)

# One table per geometry family: points, linestrings, polygons
arrays = [
    geoarrow_tbl.filter(pc.is_in(geometry_types, value_set=search_values))
    for search_values in array_filters
]

viz(arrays)
Is lonboard currently optimized to display big polygon files? Points using the scatterplot layer are blazingly fast, but the polygon layer takes a toll on machine resources.
from lonboard.
Longer term we may additionally implement an optional tiled approach, where e.g. I port https://github.com/mapbox/geojson-vt to Rust + Arrow and then we dynamically fetch tiles from Python to JS. That would allow a general use case of having more data in Python than you want to send to the browser. See #414
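To give a rough idea of what tiling involves, here is a small sketch of the standard Web Mercator tile indexing that tools like geojson-vt build on: it maps a longitude/latitude pair to the (x, y) tile that contains it at a given zoom level. This is not Lonboard code and does not reflect how a future implementation would be structured; lonlat_to_tile is a hypothetical name used only for this example.

```python
import math
from typing import Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) Web Mercator tile containing a WGS84 point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon=180 or extreme latitudes stay within the grid
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

With an index like this, the Python side could bucket features by tile and send only the tiles that intersect the current viewport, instead of the whole dataset.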
from lonboard.
File for tests: monaco_nofilter_noclip_compact.zip
By the way I think we've been thinking that GeoParquet files should still be saved with a .parquet file extension, not .geoparquet, but I opened an issue on the spec repo to discuss. opengeospatial/geoparquet#211
from lonboard.
Thank you Kyle for the help and for incorporating this use case into lonboard 🙂
I have to admit, I wasn't aware that the *.geoparquet extension might not be compliant with the official GeoParquet specification, so I'll probably update my libraries for that as well 😅
from lonboard.