Comments (5)

bdon commented on May 24, 2024

Hi @vkrause,

  1. Do you only run one update a day, with a daily diff from planet.openstreetmap.org?
  2. Are there read transactions that may be happening at the same time as the update?

My understanding of LMDB is that the database size will never shrink. This is a consequence of its MVCC design, which avoids the performance impact of a compaction phase. If reads are happening at the same time as writes, the writes are guaranteed to grow the DB instead of reusing empty pages.

Maybe it would be useful to implement osmx compact file.osmx, which simply creates a new database, iterates a cursor over every sub-database of the old one, and inserts into the new one with MDB_APPEND: effectively offline compaction. This means your peak storage usage, at the point where the new file is fully written and the old one has not yet been deleted and replaced, is about 2 terabytes. Would that be a preferred solution over a full re-import? I think it should be much faster than a re-import from PBF, but it's hard to know without testing.

from osmexpress.

vkrause commented on May 24, 2024

Thanks for the quick response @bdon!

> 1. Do you only run one update a day, with a daily diff from planet.openstreetmap.org?
>
> 2. Are there read transactions that may be happening at the same time as the update?

Right, we only do one update a day using the daily diffs, and there is no safeguard against simultaneous reads during that time.

> My understanding of LMDB is that the database size will never shrink. This is a consequence of its MVCC design, which avoids the performance impact of a compaction phase. If reads are happening at the same time as writes, the writes are guaranteed to grow the DB instead of reusing empty pages.

Ah, that's an interesting theory! So if we run the updates more frequently that might increase the likelihood of read/write collisions, but the "damage" would be much smaller when they happen, and thus this could overall reduce the growth speed?

> Maybe it would be useful to implement osmx compact file.osmx, which simply creates a new database, iterates a cursor over every sub-database of the old one, and inserts into the new one with MDB_APPEND: effectively offline compaction. This means your peak storage usage, at the point where the new file is fully written and the old one has not yet been deleted and replaced, is about 2 terabytes. Would that be a preferred solution over a full re-import? I think it should be much faster than a re-import from PBF, but it's hard to know without testing.

The system we are running this on only has 1TB of fast SSD storage unfortunately (but otherwise has plenty of resources), so we did the full reimport on slow disks and replaced the osmx file afterwards. A more efficient offline compaction thus wouldn't really reduce the downtime (which is the copying of the final file, not the reimport itself).

We'll experiment with more frequent updates. If that slows down the growth a bit, we should already get to just one reimport per year (and thus 1-2 hours of scheduled downtime), which is manageable.

Thank you!

bdon commented on May 24, 2024

What if you implement reader/writer mutual exclusion, by having the reader acquire the same lock like this:

https://github.com/protomaps/OSMExpress/blob/master/utils/osmx-update#L17

If your application can accept reads being blocked for as long as a write happens - which for minutely updates should be a few seconds at most - it's worth experimenting to see if that solves the DB growth issue. Measuring the effect of mutual exclusion or frequent updates would be useful as a contribution to the docs :)
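As a rough illustration of the reader side, here is a minimal sketch. It assumes the updater holds an advisory `flock` on a lock file for the duration of each write; the lock path and the helper names `exclusive_lock` and `read_with_lock` are hypothetical and not taken from osmx-update, so the actual locking call should be matched to whatever the linked line uses.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path, blocking=True):
    """Hold an exclusive advisory lock on lock_path for the duration of the block.

    Assumption: the updater takes the same kind of lock (flock) on the same file.
    """
    flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
    # Append mode creates the lock file if it is missing and never truncates it.
    with open(lock_path, "a") as lock_file:
        # Blocks until the lock is free; in non-blocking mode this raises
        # BlockingIOError if another open file description holds the lock.
        fcntl.flock(lock_file, flags)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def read_with_lock(lock_path, read_fn):
    """Run read_fn (a hypothetical callable that opens its own read transaction)
    only while no writer holds the lock."""
    with exclusive_lock(lock_path):
        return read_fn()
```

A reader would wrap each query in `read_with_lock`, so it waits while an update is running and the update in turn waits for in-flight reads. Note that this serializes readers against each other as well; a shared lock (`fcntl.LOCK_SH`) on the reader side would avoid that, if the writer's lock type allows it.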

vkrause commented on May 24, 2024

> What if you implement reader/writer mutual exclusion, by having the reader acquire the same lock like this:
>
> https://github.com/protomaps/OSMExpress/blob/master/utils/osmx-update#L17

That could be a viable option indeed, and looks straightforward to implement.

> If your application can accept reads being blocked for as long as a write happens - which for minutely updates should be a few seconds at most - it's worth experimenting to see if that solves the DB growth issue. Measuring the effect of mutual exclusion or frequent updates would be useful as a contribution to the docs :)

I'll do some measurements now that I know what to try. Definitely happy to report/contribute back what we find, all our work is free/open/public anyway :)

Thanks again for your help!

bdon commented on May 24, 2024

Marking this as closed for now since behavior is as expected.
