
Comments (8)

airbreather commented on August 15, 2024

From @DGuidi on February 28, 2017 9:44

Any way to compress the result? Germany now gives me a 70 GB shapefile (!).

My two cents: a shapefile is a binary format with a well-defined standard, so the file size is directly related to the size of the data. Maybe you can (I think):

  1. simplify your geometries
  2. create your own "ShapefileZipWriter" that directly generates a zipped archive of a shapefile (if possible)
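To make the second suggestion concrete, here is a minimal sketch of streaming data straight into a zipped archive. It is written in Python only to illustrate the idea and is not NetTopologySuite code; `write_zipped`, the member names, and the dummy payloads are all assumptions made up for this example.

```python
import io
import zipfile


def write_zipped(target, members):
    """Stream each (name, chunks) pair into its own compressed archive entry."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, chunks in members:
            # ZipFile.open(name, "w") returns a writable file-like object, so the
            # uncompressed member never has to exist in full on disk.
            with archive.open(name, "w") as entry:
                for chunk in chunks:
                    entry.write(chunk)


# Dummy payloads standing in for real .shp / .dbf bytes (hypothetical data).
buffer = io.BytesIO()
write_zipped(buffer, [
    ("roads.shp", (b"\x00" * 1024 for _ in range(64))),
    ("roads.dbf", (b" " * 1024 for _ in range(64))),
])
```

One caveat with this particular approach: Python's `zipfile` allows only one entry to be open for writing at a time, so a writer that produces the .shp and .dbf data in a single pass over the features would have to buffer one of them or write the members sequentially.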

from nettopologysuite.io.shapefile.

airbreather commented on August 15, 2024

Oh... does BigEndianBinaryWriter need the same performance improvement I did in c6d2ccd? I see some calls to that class's naive WriteIntBE method. Maybe the corresponding reader too...
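As a rough illustration of the kind of change being discussed (not the actual commit), the sketch below writes big-endian 32-bit integers through one reusable scratch buffer instead of building a fresh byte array on every call. It is in Python purely for illustration; `BigEndianWriter` and `write_int32` are hypothetical names, and the allocation savings matter far more in .NET than they do here.

```python
import io
import struct


class BigEndianWriter:
    """Hypothetical stand-in for a big-endian binary writer."""

    def __init__(self, stream):
        self._stream = stream
        self._scratch = bytearray(4)            # allocated once, reused per call
        self._view = memoryview(self._scratch)  # avoids copying on write

    def write_int32(self, value):
        # pack_into fills the existing buffer instead of creating new bytes.
        struct.pack_into(">i", self._scratch, 0, value)
        self._stream.write(self._view)


out = io.BytesIO()
writer = BigEndianWriter(out)
writer.write_int32(9994)  # 9994 is the file code at the start of a .shp header
```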


airbreather commented on August 15, 2024

If that doesn't help, could you please provide sample code and maybe a sample serialized RouterDb file so we can look at the same thing? I've got an old serialized routerdb file, but it dates back to the times when Itinero was part of OsmSharp, and I'm guessing you've changed stuff since then, and I really don't want to dedicate CPU time to rerun odp assuming it's still as slow as it was back then.

If I can just get that little bit of help, I'd love to spend time agonizing over this one.


airbreather commented on August 15, 2024

From @xivk on February 28, 2017 14:59

I'll try and build a sample application and do some profiling too, stay tuned. :-)


airbreather commented on August 15, 2024

I've done some exploration in my branch perf-exploration:

  • e2aba74 stops us from allocating extra on every big-endian write
  • 578a508 stops us from flushing the output writer after every feature when we don't have to (i.e., when we're not writing a .shx file that needs to know)
  • aa6953d moves a heap allocation out of a loop
  • 294ec76 handles writing the .shp, .dbf, and .shx (if any) files all in a single scan of the input features (also gets rid of this method's version of the heap allocation I moved out of that other loop)
  • 9e8e1c1 lets us actually skip writing the .shx file if we don't want it (lots of the code already seems to assume that we might not write one).
  • baeb13f combines what used to be separate .Min() and .Max() loops into one, for both Z and M ordinates.
  • 40a6b0d gets rid of another place where we would flush the stream before and after writing each feature; this time, we would just do it in order to throw an exception if there's a bug in our own code, which seems unnecessary (maybe we can bring it back with a property on the writer or something).
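The idea behind combining the separate `.Min()` and `.Max()` loops can be sketched as follows. This is an illustrative Python version, not the code from baeb13f; `min_max` is a hypothetical helper, and the sample Z values are invented.

```python
def min_max(values):
    """Return (lowest, highest) after a single pass over `values`."""
    iterator = iter(values)
    try:
        lowest = highest = next(iterator)
    except StopIteration:
        raise ValueError("min_max() needs at least one value") from None
    for value in iterator:
        if value < lowest:
            lowest = value
        elif value > highest:
            highest = value
    return lowest, highest


# Works on a one-shot generator, which two separate min()/max() calls would not.
z_range = min_max(z for z in [12.5, -3.0, 40.25, 7.0])
```

Besides halving the number of passes, a single loop also works when the input can only be enumerated once.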


xivk commented on August 15, 2024

Related to this issue: if we can get rid of the required count parameter in the header, we can use IEnumerables as input:

https://github.com/NetTopologySuite/NetTopologySuite.IO.ShapeFile/blob/master/NetTopologySuite.IO.GeoTools/ShapefileDataWriter.cs#L25

Maybe it's possible to write this count after all features have been enumerated?

I did notice a huge improvement already because we now seem to enumerate the collection only once instead of twice! 👍 💯


airbreather commented on August 15, 2024

"write this count after all features have been enumerated"

A problem with this is that we would have to seek, which means that we'd start only supporting seekable streams (at least for callers that don't have a count to pass in, though branching like that adds maintenance cost), which may turn out to be really awkward.
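To show why seeking comes into it, here is a minimal sketch of the "write the count afterwards" approach: reserve space for the count, stream the records, then go back and patch the header. It is in Python for illustration only, and the 4-byte little-endian count header is a made-up layout for this example, not the real .shp or .dbf header. `write_with_late_count` is a hypothetical name.

```python
import io
import struct


def write_with_late_count(stream, records):
    """Write a placeholder count, then the records, then patch the count."""
    if not stream.seekable():
        raise ValueError("patching the header needs a seekable stream")
    header_pos = stream.tell()
    stream.write(struct.pack("<I", 0))      # placeholder count
    count = 0
    for record in records:                  # the input is enumerated exactly once
        stream.write(record)
        count += 1
    end_pos = stream.tell()
    stream.seek(header_pos)
    stream.write(struct.pack("<I", count))  # overwrite the placeholder
    stream.seek(end_pos)                    # leave the position at the end
    return count


out = io.BytesIO()
written = write_with_late_count(out, (bytes([i]) * 8 for i in range(3)))
```

The `seekable()` check is exactly the restriction described above: a network stream or a pipe would be rejected, so callers in that situation would still need to supply the count up front.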


xivk commented on August 15, 2024

Yes, I also considered that, but not being able to 'stream' the data into the writer is a pretty big showstopper for some use cases.

Maybe we should just stop using shapefiles ;-)

Anyway, for now, I count the features in another way before writing, so I'm not blocked on this.

