georust / geozero Goto Github PK

View Code? Open in Web Editor NEW

305.0 305.0 29.0 1.84 MB

Zero-Copy reading and writing of geospatial data.

License: Apache License 2.0

Rust 99.70% Makefile 0.25% Shell 0.06%

geozero's People

Contributors

Stargazers

Watchers

geozero's Issues

Cannot parse a geometry-only shapefile without a DBF

A user in discord reported not being able to process a shapefile because they didn't have a .dbf file.

I think we should be able to parse such a datasource, provided there's a .shp file, and just treat it as having empty properties while iterating over the geometries.

thread 'open_shp_file' panicked at 'called Result::unwrap() on an Err value: MissingDbf', src/main.rs:256:19
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:142:14
   2: core::result::unwrap_failed
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/result.rs:1814:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/result.rs:1107:23
   4: geogcn::open_shp_file
             at ./src/main.rs:256:15
   5: geogcn::open_shp_file::{{closure}}
             at ./src/main.rs:249:1
   6: core::ops::function::FnOnce::call_once
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ops/function.rs:248:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.

Add "Geospatial" category to `Cargo.toml`

rust-lang/crates.io@9b65b4c

https://doc.rust-lang.org/cargo/reference/manifest.html#the-categories-field

GeometryCollection.to_geo() returns only the last geometry

We hit upon this by passing geometries from postgres using geozero: When a GeometryCollection is stored as a postgis Geometry field in the database, and fetched through geozero using the Decode impl, only the last geometry was returned. I'd expect the full collection to be returned instead.

Reproducing repository:
https://github.com/audunska/geozero-bug

Format support

This is a tracking issue for the current format support. Wishes for priorities or other formats can be added as comment.

	Reader (XY)	Reader (ZM, t/tm)	Writer (XY)	Writer (ZM, t/tm)	FromWkb
geo-types	✓	-	✓	-	✓
GeoJSON	✓		✓		✓
GDAL	✓	✓ (Z)	✓	(broken)	✓
GEOS	✓	✓ (Z)	✓		✓
SVG		-	✓	-	✓
WKB	✓	✓ (ZM)	✓	✓ (ZM)	-
WKT			✓	✓ (ZM)	✓
Flatgeobuf	✓	✓
Shapefile	✓

Property and Dataset Support:

	Read properties	Read dataset	Write properties	Write dataset
GeoJSON		✓		✓
GDAL
SVG	-		-	✓
Flatgeobuf	✓	✓
Shapefile	✓

CSV reader/writer

Do you think a CSV reader would be worthwhile?

ogr2ogr supports it: https://gdal.org/drivers/vector/csv.html

For example I have a dataset like this:

Address,Type,Datetime,Latitude,Longitude,Report Location,Incident Number
904 7th Av,Car Fire,05/22/2019 12:55:00 PM,47.6069,-122.329051,POINT (-122.329051 47.6069),F190051945
9610 53rd Av S,Aid Response,05/22/2019 12:55:00 PM,47.515984,-122.266529,POINT (-122.266529 47.515984),F190051946

There are multiple ways that the csv might contain geometry data in the wild, some some thought about configuration is due. Two popular ones I've seen are:

If it's a table of points, lat and lon columns are specified.
wkt column

My real world example above actually has both!

How to convert from geo_types::Geometry to geozero::GeozeroGeometry

Hi there, I'm trying to find out the most appropriate way to write some geo::MultiLineString typed objects into a flatgeobuf format file. Now I'm facing the issue when calling FgbWriter::add_feature_geom because I don't know how to create the required GeozeroGeometry type parameter. Could you please give me some hints or are there any examples for such use case?

I have seen from the FgbWriter's tests that it's using the raw flatbuffers to write the points. Is that the recommended way?

And the impl GeozeroGeometry for geo_types::Geometry<f64> looks already pretty close to the answer. But I still could not (yet) figure out how to use it properly. Sorry for my very limited rust knowledge. And thank you in advance!

RFC: API improvements

The geometry processing API has worked well for different implementations, so there are no short term changes planned. Mid term improvements:

Pass SRID to processor
More ergonomic processor configuration (currently via dimensions())

There is a initial high-level Reader API and Writer API, covering basic file reading/writing.
The current core Reader/Writer implementations have the following Structs for writing and functions for reading:

GeoJsonWriter: GeoJSON writer
read_geojson: Read and process GeoJSON feature
read_geojson_geom: Read and process GeoJSON geometry
Geos: Generator for GEOS geometry type
process_geos: Process GEOS geometry
RustGeo: Generator for georust/geo geometry type
SvgWriter: SVG writer
Wkt: WKT writer

flatgeobuf:

FgbReader: FlatGeobuf dataset reader
HttpFgbReader: FlatGeobuf dataset HTTP reader

Code examples:

let mut geos = Geos::new();
read_geojson(geojson.as_bytes(), &mut geos);

let mut fgb = FgbReader::open(&mut filein)?;
fgb.select_bbox(8.8, 47.2, 9.5, 55.3)?;
let mut json = GeoJsonWriter::new(&mut fout);
fgb.process_features(&mut json)

These struct/functions should be named more uniformly. Proposal:

FooWriter: Struct for writing format Foo. (Alternatives: Foo, FooSink, FooProducer, foo::Writer)
FooReader : Struct for reading format Foo. (Alternatives: FooSource, FooConsumer, foo::Reader)
FooReader.process_features : Read and process all selected features
foo::process: Function for processing (reading) feature of format Foo
foo:process_geom: Function for processing (reading) geometry of format Foo

Code examples:

let mut out = GeosWriter::new();
geojson::process(geojson.as_bytes(), &mut out);

Feedback welcome! @georust ?

Introduce global dimension constants

IMO, the CoordDimensions has a somewhat weird usage pattern: it uses constant functions instead of true consts. Perhaps we should introduce global DIM_XY, DIM_XYZ, etc, same as we currently have in the unit tests, but exposed globally, assuming everyone is OK with the naming. Should we do this?

Consider using pure rust protobuf codegen

protoc is one of the common ways to parse .proto files, but it comes with a cost -- each user of the crate must have it installed in their environment, which sometimes gets tricky for embedded and other setups. On the other hand, protobuf-codegen can do the same thing using Rust-only code as part of the build.rs step, and not rely on any native libs.

I have used that lib extensively (e.g. in the osm pbf parsing, and I think it might make much more sense for us to use it for MVT parse/gen.

Are there any concerns with switching to the protobuf instead of prost? I have not heard (but have not verified) any performance concerns. cc: @pka

Documentation for parsing GeoJSON from a file

I am trying to read GeoJSON from a file and iterate through the features in the FeatureCollection with their geom and feature properties. I would have thought this would be a very common use case, but I don't see any documentation giving example code for how to do this. All the examples seem to assume you've got the string of a single geometry in memory already, or that you're using FlatGeobuf (*.fgb) files.

I've found two examples on GitHub, but:

geoq: Doesn't use either GeoJsonReader or FeatureProcessor, which seems like it should be the right way…
geoarrow: Uses this GeoTableBuilder thing which I think is part of Arrow, and thus not something someone who isn't using Arrow would use.

Have I missed the sample code for this? I'd expect a simple example in the overview on the GeoJsonReader page, and probably in the main README as well.

`point_begin`/`point_end` should be called around `xy` for each Point in a MultiPoint

Extracted from #183

When processing MultiPoints, many of the formats that implement GeometryProcessor do not call point_begin/point_end for each point in a multipoint, and instead call only xy. To be consistent with MultiLineString and MultiPolygon, we should surround each child geometry with it's corresponding begin/ends.

In truth, I don't know of a format yet that requires these events, so I expect most implementations will be a no-op, but I think making the usage consistent across Multi{Point,LineString,Polygon} makes things easier to understand.

processor.begin_multipoint(...)
for point in multipoint {
+  processor.begin_point(...)
   processor.xy(...)
+  processor.end_point(...)
}
processor.end_multipoint(...)

Better support for sparse properties by declaring schema when available

I want to convert an FGB to a CSV. This already works for a typical FGB, but I'd like to take advantage of the FGB format to save some space by skipping a features' empty properties.

I think solving this problem might have some more general purpose use in geozero.

Because an FGB's properties are prefixed with their column index, when a particular feature has no value for a column, you could choose to omit the column altogether, rather than spending 6 bytes just to say "no value for this column". I've made this change in a demo FGB feature branch here: https://github.com/michaelkirk/flatgeobuf/tree/mkirk/empty-fields.

In theory there's no problem writing this back out to another FGB or to a flexible format like geojson, but some other output formats need to know the schema up front, like csv (but maybe also gpx and shapefile, arrow?).

I think it can be broken down to a few cases:

It's irrelevant for geometry-only formats such as wkt and geo-types, so we don't need to worry about them.
Formats that support writing sparse properties could be serialized more succinctly, such as fgb, geojson by omitting empty values. Probably this should be a configurable option on the writer.
Formats that support constant time access to their schema, such as csv, fgb, (arrow? gpkg?) can be deserialized in one pass. Other formats do not support constant time access to their schema, like geojson. That means it's not currently possible to convert sparse geojson to something rigid like csv, because "new" columns might appear after already writing some CSV rows. An additional pass before writing to ascertain the schema could address this, but that has some drawbacks, and in any case, doesn't currently exist. (There's no guarantees about any geojson in the wild having regular columns anyway, so we're already facing that problem to a degree).

As for a potential step forward:

/// Feature processing trait
#[allow(unused_variables)]
pub trait FeatureProcessor: GeomProcessor + PropertyProcessor {
    /// Begin of dataset processing
-    fn dataset_begin(&mut self, name: Option<&str>) -> Result<()> {
+    fn dataset_begin(&mut self, name: Option<&str>, schema: Option<Vec<ColumnArgs >>) -> Result<()> {
        Ok(())
    }

Reading from an fgb would call: dataset_begin(Some(name_from_header), Some(feature_schema_from_header)) whereas reading from geojson would call dataset_begin(None, None)

Note that this would mean introducing something like FGB's ColumnArgs and ColumnType to geozero.

Formats that require a rigid schema, like csv, could utilize that data in order to correctly "fill in the blanks" when reading features with sparse properties.

This definitely introduces some complexity into the library. Overall, I'm not sure if it's worth it. What do people think?

feature: convert GeozeroDatasource to geo package

I see there is some support for encoding/decoding geometry with geo-package: https://github.com/georust/geozero/blob/master/geozero/src/gpkg/geopackage.rs

How about a one-shot conversion of a GeozeroDatasource to convert geometry and feature data to geo package format?

let mut filein = BufReader::new(File::open("countries.fgb")?);
let mut fgb = FgbReader::open(&mut filein)?.select_all()?;
fgb.write_gpkg("contries.gpkg").unwrap();

Add czml support

https://github.com/AnalyticalGraphicsInc/czml-writer/wiki/CZML-Guide

geojson: should omit `name` field when dataset name is None

Is there any value in including the blank name property?

I think we should do:

{ 
  "type": "FeatureCollection",
- "name": "",
  "features": [ ...]
}

Support for writing WKT with precision

It doesn't appear that there's currently a way to write WKT values with a given precision?

I'm implementing "display methods" in geoarrow so that Python users can have a better display in a terminal than

<geoarrow.rust.core._rust.MultiPolygonArray at 0x10a12e9a0>

and in this use case, having too many decimals of precision is distracting. E.g. shapely gives a "repr" of

It looks like this format! would have to be changed?

geozero/geozero/src/wkt/wkt_writer.rs

Line 104 in 52a4d2d

self.out.write_all(format!("{x} {y}").as_bytes())?;

Ref https://stackoverflow.com/a/49778870

Proof of concept– combining begin and end processor methods

Just sharing in case it's something we want to consider

diff --git a/geozero/src/feature_processor.rs b/geozero/src/feature_processor.rs
index 2b8554a..6b4c8d2 100644
--- a/geozero/src/feature_processor.rs
+++ b/geozero/src/feature_processor.rs
@@ -4,7 +4,7 @@ use crate::property_processor::PropertyProcessor;
 
 /// Feature processing trait
 #[allow(unused_variables)]
-pub trait FeatureProcessor: GeomProcessor + PropertyProcessor {
+pub trait FeatureProcessor: GeomProcessor + PropertyProcessor + Sized {
     /// Begin of dataset processing
     fn dataset_begin(&mut self, name: Option<&str>) -> Result<()> {
         Ok(())
@@ -22,10 +22,16 @@ pub trait FeatureProcessor: GeomProcessor + PropertyProcessor {
         Ok(())
     }
     /// Begin of feature property processing
+    fn write_properties(&mut self, inner: impl FnOnce(&mut Self) -> Result<()>) -> Result<()> {
+        Ok(())
+    }
+    /// Begin of feature property processing
+    #[deprecated]
     fn properties_begin(&mut self) -> Result<()> {
         Ok(())
     }
     /// End of feature property processing
+    #[deprecated]
     fn properties_end(&mut self) -> Result<()> {
         Ok(())
     }
diff --git a/geozero/src/geojson/geojson_reader.rs b/geozero/src/geojson/geojson_reader.rs
index 9d107a4..29fc714 100644
--- a/geozero/src/geojson/geojson_reader.rs
+++ b/geozero/src/geojson/geojson_reader.rs
@@ -121,9 +121,9 @@ fn process_geojson_feature<P: FeatureProcessor>(
     if feature.geometry.is_some() || feature.properties.is_some() {
         processor.feature_begin(idx as u64)?;
         if let Some(ref properties) = feature.properties {
-            processor.properties_begin()?;
-            process_properties(properties, processor)?;
-            processor.properties_end()?;
+            processor.write_properties(|processor| {
+                process_properties(properties, processor)
+            })?;
         }
         if let Some(ref geometry) = feature.geometry {
             processor.geometry_begin()?;
diff --git a/geozero/src/geojson/geojson_writer.rs b/geozero/src/geojson/geojson_writer.rs
index ac7f371..97214b4 100644
--- a/geozero/src/geojson/geojson_writer.rs
+++ b/geozero/src/geojson/geojson_writer.rs
@@ -54,6 +54,12 @@ impl<W: Write> FeatureProcessor for GeoJsonWriter<'_, W> {
         self.out.write_all(b"}")?;
         Ok(())
     }
+    fn write_properties(&mut self, inner: impl FnOnce(&mut Self) -> Result<()>) -> Result<()> {
+        self.out.write_all(br#", "properties": {"#)?;
+        inner(self)?;
+        self.out.write_all(b"}")?;
+        Ok(())
+    }
     fn properties_begin(&mut self) -> Result<()> {
         self.out.write_all(br#", "properties": {"#)?;
         Ok(())

Make readme examples part of `cargo test`

Rust markdown examples are usually included as part of the cargo test, but if they are in README, they require some additional snippet of code. Currently, the examples in the readme are non-compilable (they need a few extra lines hidden with the # symbols).

end to end PostGIS example

I see there is a WKB reader/writer and the README mentions support for PostGIS, but it's not obvious to me how to actually wire everything up.

Is there a working postgis example somewhere?

Implement an sqlx (sqlite) driver

I know there is already an implementation for other databases, so it would be nice if this could be expanded for the sqlite db.

I am happy to try to help out, but my rust is not very good, so i don't know how much I could do myself.

Make srid & dim fields of all writers private, and add ctor

As was done in a recently merged here as part of #155 for WktWriter, we need to remove pub on srid and dim fields, making it an internal implementation detail. Instead, there should be an extra writer constructor, e.g. WktWriter::with_opts(&mut out, dialect, dims, srid) that takes all the needed options.

This makes access cleaner, and removes a potential bug of dynamically changing srid or dimensions during processing. Unless of course the writer should support their dynamic modification, but even in that case it should be a function, not external variable access.

cc: @pka @Oreilles

Using this crate to do bulk inserts with sqlx

Hi,
Would it be possible to use this crate to do bulk inserts as described by the FAQ section of SQLX?
https://github.com/launchbadge/sqlx/blob/master/FAQ.md#how-can-i-bind-an-array-to-a-values-clause-how-can-i-do-bulk-inserts

I'm having trouble with type casting the geometry arrays.

Create a sample MVT file with all types of geometries/properties for testing

We need to create a new MVT test file that

has a free-license - ideally CC0, but any MVT could work with an appropriate license description
has every type of geometry, the more complicated the better
has every type of property (tag) value as allowed by the spec, e.g. int, str, bool, etc

Ideally, we could even use more than one tile to cover non-standard extent (usually 4096) and non-standard SRID

Place the new tile into geozero/tests/data, and adjust mvt test to use the new file in addition to (or instead of?) the current one.

GeoPackage reader (GeozeroDatasource impl)

It would be nice to have a GeoPackage reader that handles full attribute conversion into geozero as well for the table. Or is this out of scope because it would require depending on a specific SQL client instead of leaving that choice for the user? I.e. in geoarrow-rs, I'd probably want a sync API, and would probably want to use rusqlite, but others might prefer sqlx.

`impl_scalar_property_reader!` for usize?

re: flatgeobuf/flatgeobuf#109 (comment)

let population: Option<usize> = feature.property("population");

It looks like there is no impl_scalar_property_reader! for usize.

In my specific use case, I realized it was inappropriate use of usize anyway, and switched to a u64, so everything is working for me.

Is usize something you want to support?

It seems somewhat fraught as a serializable type since it can have different size on different machines.

How to convert projection

As I do encounter mentions of SRID in the issues and docs I am a bit puzzled if this library can also convert from one projection to another? Or would I need to use it in combination with https://github.com/georust/proj and if so, is there an example maybe how to do that?

Multiple problems with Shapefile test data

Thanks for this project. I'm not a Rust user, but I'm writing a Shapefile parser for Go and am testing my code with your testdata. I think some of your example Shapefiles are incorrect, which may in turn be concealing bugs in your code.

I'm using GDAL as a reference implementation. GDAL includes an ogrinfo command that can be used to query Shapefiles:

$ ogrinfo -al geozero-shp/tests/data/point.shp
INFO: Open of `geozero-shp/tests/data/point.shp'
      using driver `ESRI Shapefile' successful.

Layer name: point
Geometry: Point
Feature Count: 1
Extent: (122.000000, 37.000000) - (122.000000, 37.000000)
Layer SRS WKT:
(unknown)
OGRFeature(point):0
  POINT (122 37)

Note that GDAL requires a .shx index file. For many of your test Shapefiles, these do not exist. I've been using this tip to create a .shx file where they are missing.

I believe that the following errors in your test Shapefiles are present:

geozero-shp/tests/data/pointz.shp: bounds in header do not match bounds of data; first record should have record number 1, but has record number 0.

geozero-shp/tests/data/polygon.shp: the polygon is invalid because its rings are not closed.

geozero-shp/tests/data/poly.dbf (note that this in in the .dbf file): the dbf file is missing the \x1a terminator.

implement WKT Reader

My first thought would be to build something on top of https://github.com/georust/wkt as you've done with the GeoJson reader.

Have you put any thought into this already? I'm willing to work on this if you're interested @pka.

Move geozero-shp into geozero?

geozero-shp is out of date; it depends on an old version of geozero and therefore it seems can't be used with geozero 0.11. Is there a reason why it exists as a separate crate? Can it be moved into the main geozero crate with a feature flag? Otherwise it seems necessary to publish a new version of geozero-shp whenever geozero releases a new version

[request] make WkbInfo and header readers public

Pretty self-explanatory; my use-case benefits from parsing the header of WKBs and EWKBs for the geometry's metadata before parsing the rest, so a minor refactor and making this type public would be really useful.

If it's too inconvenient or minor of an issue, I can submit a PR.

Docs: Add example how to use with Serde

I have a struct that uses:

#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct Item {
    pub id: i32,
    pub coordinates: Option<Decode<geo_types::Geometry<f64>>>,

The #[derive(Seralize)] produces the following error:

the trait Serialize is not implemented for geozero::wkb::Decode<geo_types::Geometry>

Would it be possible to add an example to the docs showing hot to serialize a struct fetched with sqlx using Serde?

Add a KML reader

https://docs.rs/kml/latest/kml/

`geos::Geometry` to `ewkt` panic

    #[test]
    fn geos_to_ewkt() {
        use geozero::GeozeroGeometry;
        use geozero::ToWkt;

        let wkt = "POINT(1 1)";
        let mut ggeom = geos::Geometry::new_from_wkt(wkt).unwrap();

        println!("ewkt: {}", ggeom.to_ewkt(ggeom.srid()).unwrap());
    }

this test will panic with

thread 'function::as_ewkt::tests::geos_to_ewkt' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/geos-8.3.1/src/coord_seq.rs:498:9:
assertion failed: self.nb_dimensions >= 3

Documentation for basic usage of geozero-shp

Hello, and thank you for this project.

I am trying to do a simple newbie program that, given an ESRI Shapefile as input, it outputs the geometries as GeoJSON.

I have been trying to use the geozero-shp crate, simply following the short example that can be found on its README, without any luck. It seems the example is incorrect, or it might have been outdated by outer changes on its dependencies.

I wrote this program:

use geozero::geojson::GeoJsonWriter;


fn main() {
    let path = "/home/jose/Downloads/ne_10m_admin_0_sovereignty/ne_10m_admin_0_sovereignty.shp";
    let reader = geozero_shp::Reader::from_path(path).unwrap();
    let mut json: Vec<u8> = Vec::new();
    let data = reader.iter_features(GeoJsonWriter::new(&mut json)).unwrap();
}

and when I run it, it doesn't seem that GeoJsonWriter is a valid thing to pass to the processor argument:

jose@uranium ~/C/e/shp_ingestor (master) [101]> cargo run -- /home/jose/Downloads/ne_10m_admin_0_sovereignty/ne_10m_admin_0_sovereignty.shp
   Compiling shp_ingestor v0.1.0 (/home/jose/Code/experimental/shp_ingestor)
error[E0277]: the trait bound `GeoJsonWriter<'_, Vec<u8>>: geozero::feature_processor::FeatureProcessor` is not satisfied
   --> src/main.rs:8:37
    |
8   |     let data = reader.iter_features(GeoJsonWriter::new(&mut json)).unwrap();
    |                       ------------- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `geozero::feature_processor::FeatureProcessor` is not implemented for `GeoJsonWriter<'_, Vec<u8>>`
    |                       |
    |                       required by a bound introduced by this call
    |
    = help: the following other types implement trait `geozero::feature_processor::FeatureProcessor`:
              geozero::ProcessorSink
              geozero::multiplex::Multiplexer<P1, P2>
note: required by a bound in `Reader::<T>::iter_features`
   --> /home/jose/.cargo/registry/src/github.com-1ecc6299db9ec823/geozero-shp-0.3.1/src/reader.rs:161:29
    |
161 |     pub fn iter_features<P: FeatureProcessor>(
    |                             ^^^^^^^^^^^^^^^^ required by this bound in `Reader::<T>::iter_features`

error[E0277]: the trait bound `GeoJsonWriter<'_, Vec<u8>>: geozero::feature_processor::FeatureProcessor` is not satisfied
  --> src/main.rs:8:16
   |
8  |     let data = reader.iter_features(GeoJsonWriter::new(&mut json)).unwrap();
   |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `geozero::feature_processor::FeatureProcessor` is not implemented for `GeoJsonWriter<'_, Vec<u8>>`
   |
   = help: the following other types implement trait `geozero::feature_processor::FeatureProcessor`:
             geozero::ProcessorSink
             geozero::multiplex::Multiplexer<P1, P2>
note: required by a bound in `ShapeRecordIterator`
  --> /home/jose/.cargo/registry/src/github.com-1ecc6299db9ec823/geozero-shp-0.3.1/src/reader.rs:39:35
   |
39 | pub struct ShapeRecordIterator<P: FeatureProcessor, T: Read + Seek> {
   |                                   ^^^^^^^^^^^^^^^^ required by this bound in `ShapeRecordIterator`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `shp_ingestor` due to 2 previous errors

Is there any documentation or code examples where I can see how this crate is used?

Thank you.

Formatted geojson writer

I sometimes find myself wanting formatted geojson, like what is output by serde_json::to_string_pretty.

Currently, I reach for an external tool to solve this, which is fine, but I think it'd be nice if geozero could handle this.

I'm imagining something like:

geozero --csv-geometry-column=geometry my_input.csv --geojson-pretty-output my_output.geojson

Fallible properties getter

Signature to get a property is:

fn property<T: PropertyReadType>(&self, name: &str) -> Option<T>

But that doesn't allow you to distinguish between "property missing" and "invalid property" (and an error with details about why it's invalid).

I guess I was expecting some kind of Result rather than Option. Is that something you'd consider?

e.g. trying to parse a string field into an integer.

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "foo": "Dinagat Islands"
  }
}

Maybe something like this:

let foo: u64;
match feature.properties::<u64>("foo") {
  Ok(Some(v)) => foo = v,
  Ok(None) => println!("missing property"),
  Err(e) => println!("invalid property: {}", e),
}

Geoparquet format reader

It is possible to incorporate in your library the new format launched by the OGC, GeoParquet..This is the link https://github.com/opengeospatial/geoparquet

`BufferedGeozeroDatasource` for spatial filtering on reading

It's often desired to prevent materialization of data features outside of some spatial region, usually a bounding box. Some formats, like FlatGeobuf, GeoPackage, and Shapefile, have a native implementation of spatial filtering, but others, like CSV, GeoJSON, GeoJSON Lines, don't have any built-in method to the format.

A simple way to handle this spatial filtering in general would be to implement a BufferedGeozeroDatasource. It would wrap a general GeozeroDatasource and its constructor would include a bbox parameter. It would buffer up to one feature, determine whether the geometry intersects the input bbox, and if so, propagate that feature on to the consumer.

I'm interested in implementing this kind of thing for the geoarrow crate but it might be general enough to live in the geozero crate. Thoughts?

Implement a streaming GeoJSON reader

The current GeoJSON reader is based on https://github.com/georust/geojson which reads the whole file into memory.

A few unsuccessful attempts are in https://github.com/pka/geozero/tree/parse-geojson-with-serde:

Serde Deserialize with custom deserialize methods for geometries and properties: I was not able to pass the parsed data out of the deserializon methods to the caller
Serde Transcoder (https://serde.rs/transcode.html): too much semantic information is lost in the serialize methods. Enough for pretty printing, but not well suited for reading complex data structures
Reading a GeoJSON stream and deserialize the current feature only: I didn't find a reliable way to detect the begin and end of a feature

I also had a deeper look at https://github.com/Geal/nom, which has a streaming interface. But I didn't find a good way to continue after a "Buffer empty" error.

Clarification of `GeomProcessor` methods

The documentation for the methods in the GeomProcessor trait is kinda sparse and that makes it hard to know what order of traits is guaranteed from data producers.

I'm trying to flesh out support in geoarrow for conversion from geozero to geoarrow arrays. Up until now, I've made the user choose a geometry trait when importing the trait, e.g. ToPolygonArray, and to some extent assume the inputs will indeed be polygons.

Now I'm trying to implement support from arbitrary-typed or mixed type items (see geoarrow/geoarrow-rs#304), which means storing a piece of internal state for each geometry input so that when xy() is called, I know which geometry type that's associated with, and thus which internal geometry array to append to.

But one thing that isn't documented is when each method gets called, when it's a part of a larger geometry. So for example in the GEOS reader, point_begin gets called for a point type

geozero/geozero/src/geos/geos_reader.rs

Line 49 in 8501fed

processor.point_begin(idx)?;

but not for a MultiPoint type or any other type. But linestring_begin gets called for all of line string, multi line string, polygon, and multi polygon types!

I guess this is just a feature request to flesh out these docstrings? It's hard to ensure conformance against the traits only by reading the code.

Error refactoring

GeoZero has one Error enum, and many of the values are String objects that contain the actual error information, sometimes formatted.

I think it would better to refactor it so that each specific error can be easily analyzed without any extra performance cost.

Proposed structure

enum GeozeroError {
    arrow: ArrowError, csv: CsvError, gdal: GdalError, geojson: GeojsonError, geos: GeosError, gpkg: GpkgError, gpx: GpxError, mvt: MvtError, postgis: PostgisError, svg: SvgError, tessellator: TessellatorError, wkb: WkbError, wkt: WktError,
}

Each one of the sub-errors are their own enums with the actual values. So instead of this in src/mvt/mvt_reader.rs:

return Err(GeozeroError::Feature(format!(
    "invalid feature.tags length: {:?}", feature.tags.len()
)));

we will have this:

return Err(MvtError::InvalidFeatureTagsLength(feature.tags.len());

Note that thiserror has #[from] to simplify MvtError -> GeozeroError conversion.

Existing errors

Here are all the error usages at the moment (some of these are used multiple times)

GeozeroError::ColumnNotFound
GeozeroError::ColumnType(stringify!($e), format!("{v:?}")
GeozeroError::Dataset(error
GeozeroError::Feature("invalid feature.tags length: {feature_tags_count:?}"
GeozeroError::Feature("invalid key index {key_idx}"
GeozeroError::Feature("invalid value index {value_idx}"
GeozeroError::Geometry("CoordSeq missing"
GeozeroError::Geometry("Invalid UTF-8 encoding"
GeozeroError::Geometry("Missing Geometry"
GeozeroError::Geometry("Missing LineStrings for Polygon"
GeozeroError::Geometry("Missing container for LineString"
GeozeroError::Geometry("Missing container for Polygon"
GeozeroError::Geometry("Missing polygons for MultiPolygon"
GeozeroError::Geometry("No LineStrings for MultiLineString"
GeozeroError::Geometry("No coords for LineString"
GeozeroError::Geometry("No coords for MultiPoint"
GeozeroError::Geometry("No coords for Point"
GeozeroError::Geometry("Not ready for coords"
GeozeroError::Geometry("The input was an empty Point, but the output doesn't support empty Points"
GeozeroError::Geometry("Too few coordinates in line or ring"
GeozeroError::Geometry("Unexpected geometry type"
GeozeroError::Geometry("test"
GeozeroError::Geometry(error
GeozeroError::Geometry(format!("Unsupported geometry type {geometry_type}"
GeozeroError::GeometryFormat
GeozeroError::IoError(io_err
GeozeroError::Property(format!("unsupported value type for key {key}"

Add a GPX reader

https://github.com/georust/gpx

Impl `GeozeroGeometry` for concrate type should impls srid() and dims()

Currently only geos::Geometry<'_> impl all funcs in GeozeroGeometry, other types like Ewkb will use the default impl in trait definition.

The default impls are as follows so we cannot get dims and srid for types like Ewkb:

    fn dims(&self) -> CoordDimensions {
        CoordDimensions::xy()
    }
    /// SRID of geometry
    fn srid(&self) -> Option<i32> {
        None
    }

georust / geozero Goto Github PK

geozero's People

Contributors

Stargazers

Watchers

Forkers

geozero's Issues

Proposed structure

Existing errors

Recommend Projects

Recommend Topics

Recommend Org