
Comments (6)

jbms commented on May 14, 2024

When overwriting a shard, tensorstore first assumes there is no existing shard, which saves one read request in the common case of no existing shard. It performs a write conditioned on the key not existing (which still requires a full upload of the data), gets back an error, reads the existing shard, merges the changes, and then rewrites it. Since that involves 2 uploads and 1 download of the entire shard, I would expect it to take about 3 times as long.
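To make the transfer counts concrete, here is a minimal sketch of that write path against a toy in-memory store. `CountingStore` and `optimistic_shard_write` are hypothetical names for illustration, not tensorstore APIs, and the final rewrite is shown as a plain write for brevity.

```python
class CountingStore:
    """Hypothetical in-memory key-value store that counts full-shard transfers."""

    def __init__(self):
        self.data = {}
        self.uploads = 0
        self.downloads = 0

    def write_if_absent(self, key, value):
        # The payload is uploaded even when the precondition fails.
        self.uploads += 1
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def read(self, key):
        self.downloads += 1
        return self.data[key]

    def write(self, key, value):
        self.uploads += 1
        self.data[key] = value


def optimistic_shard_write(store, key, new_chunks):
    """Write chunks to a shard, assuming first that the shard does not exist."""
    if store.write_if_absent(key, dict(new_chunks)):
        return  # common case: one upload, no download
    # The shard already existed: read it, merge the changes, and rewrite it.
    merged = dict(store.read(key))
    merged.update(new_chunks)
    store.write(key, merged)
```

With this toy model, writing a new shard costs one upload, while overwriting an existing shard costs two uploads and one download.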

Your suggestion of detecting that the entire shard is being rewritten, and then performing the write unconditionally, sounds like the best solution. I will look into implementing that, hopefully in the next couple of days.

The first of the two uploads could be eliminated by instead checking if there is existing data first, but that would introduce one extra read operation in the common case of no existing data and would still be 2x the normal cost.
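As a rough tally of the trade-off described in this comment, the sketch below counts whole-shard transfers per strategy. The strategy names and the `transfer_cost` helper are made up for illustration; the counts follow the description in this thread rather than any measurement.

```python
def transfer_cost(strategy, shard_exists):
    """Return (uploads, reads) for one shard write under a given strategy.

    Hypothetical accounting only: a read that finds no shard still counts
    as one read request.
    """
    if strategy == "optimistic":
        # Conditional write first; on failure, read, merge, and rewrite.
        return (2, 1) if shard_exists else (1, 0)
    if strategy == "check_first":
        # Always read first, then upload once.
        return (1, 1)
    if strategy == "unconditional":
        # Only valid when every chunk in the shard is being written.
        return (1, 0)
    raise ValueError(f"unknown strategy: {strategy}")
```

Under these assumptions, checking first trades one upload for an extra read on every write, and only the unconditional write stays at a single transfer when the shard already exists.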

There isn't an API at the moment for getting the shard path for a given chunk. In the future I do plan to add APIs for retrieving the preferred grid for reading and writing a volume, which would make it easier to perform shard-aligned writes.

from tensorstore.

jbms commented on May 14, 2024

As an update, I am still working on fixing this issue fully; it turned out to be trickier than expected. I implemented the approach outlined in the prior comment: doing an unconditional write (without any preconditions) when all chunks are being written. An additional fix was needed to handle the case where the volume is not an exact multiple of the chunk size; previously, the resulting partial "edge" chunks were not eligible for unconditional writeback.
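The edge-chunk case can be illustrated with a small coverage check. This is only a sketch of the idea, assuming half-open write regions and a regular chunk grid anchored at the origin; `chunk_fully_covered` is a hypothetical helper, not tensorstore's implementation.

```python
def chunk_fully_covered(chunk_index, chunk_shape, volume_shape,
                        write_start, write_stop):
    """Return True if the write region [write_start, write_stop) covers
    the whole chunk, with the chunk clipped to the volume bounds."""
    for i, c, v, lo, hi in zip(chunk_index, chunk_shape, volume_shape,
                               write_start, write_stop):
        chunk_lo = i * c
        # Clip partial "edge" chunks to the volume extent; without the
        # clip, an edge chunk could never count as fully written.
        chunk_hi = min((i + 1) * c, v)
        if lo > chunk_lo or hi < chunk_hi:
            return False
    return True
```

For a volume of length 100 with chunks of length 64, a write of [0, 100) covers the second chunk only because its extent is clipped from [64, 128) to [64, 100).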

These fixes essentially resolved this specific issue, but while testing them I found a race condition: writeback may start too early, before all of the pending writes have been flushed from one cache to another, leading to a similar inefficiency, and there isn't really any way to reliably avoid that race. To address this problem in a clean way, I'm working on implementing a transaction system that would allow deferring writeback and then atomically committing the writes to a shard.
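The general idea of deferring writeback can be sketched as a buffer that collects chunk writes and touches each shard once at commit time. `DeferredShardWrites` is a hypothetical toy, not the transaction system described here, and it does not model real atomicity or concurrency.

```python
class DeferredShardWrites:
    """Hypothetical buffer: collects chunk writes, commits each shard once."""

    def __init__(self, store):
        # `store` is any dict-like mapping of shard key -> {chunk_id: data}.
        self.store = store
        self.pending = {}

    def write(self, shard_key, chunk_id, data):
        # Nothing reaches the store until commit() is called.
        self.pending.setdefault(shard_key, {})[chunk_id] = data

    def commit(self):
        # One merge and one write per shard, however many chunks changed.
        for shard_key, chunks in self.pending.items():
            merged = dict(self.store.get(shard_key, {}))
            merged.update(chunks)
            self.store[shard_key] = merged
        self.pending.clear()
```

Because writeback only starts at `commit()`, it cannot begin before all pending chunk writes for a shard have been collected.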


stephenplaza commented on May 14, 2024

Thanks for spending time on this ... it is definitely going to be immensely helpful for the workflow that I am building now.

Not sure if this will help, but one concept that has worked well for us in DVID ingestion is to have an 'unsafe' mode: a flag that disables a couple of critical data-consistency features in favor of speed, on the assumption that only an expert would enable it.


stephenplaza commented on May 14, 2024

Jeremy: does the current build have this half fix? (I was planning to do another ingestion run with my new EM ingestion service using your recent fix for the sharded/unsharded spec from another issue and wanted to know if I should expect side effects from this issue)


jbms commented on May 14, 2024

The current build does not have the half fix, though if you think it would be useful I can try to get that pushed out later today. I am making good progress on the full fix, though it is a larger change.


stephenplaza commented on May 14, 2024

I don't think a week will matter too much if you will have it by then. Thanks again!!
