For my ingestion service, I am doing shard-aligned writes. However, sometimes, due to

performance overwriting a shard (tensorstore issue, closed, 6 comments)

google commented on May 14, 2024

performance overwriting a shard

from tensorstore.

Comments (6)

jbms commented on May 14, 2024

What is happening when overwriting a shard is that tensorstore assumes there is no existing shard (this saves one read request in the common case of no existing shard), performs a write conditioned on the key not existing (which still requires a full upload of the data), gets back an error, reads the existing shard and merges the changes, then rewrites it. Since that involves 2 uploads and 1 download of the entire shard, I would expect it may take 3 times as long.

What you suggest as far as detecting that the entire shard is being rewritten, and then performing the write unconditionally sounds like the best solution. I will look into implementing that, hopefully in the next couple days.

The first of the two uploads could be eliminated by instead checking if there is existing data first, but that would introduce one extra read operation in the common case of no existing data and would still be 2x the normal cost.
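
To make the cost comparison above concrete, here is a minimal sketch that counts bytes transferred when overwriting an existing shard under the three strategies discussed: the optimistic conditional write, checking for existing data first, and an unconditional write. `CountingStore` and the three strategy functions are hypothetical names for illustration only; this models the transfer accounting described in the comment and is not tensorstore's implementation.

```python
class CountingStore:
    """In-memory key-value store that tallies bytes uploaded and downloaded."""

    def __init__(self, initial=None):
        self.data = dict(initial or {})
        self.uploaded = 0
        self.downloaded = 0

    def read(self, key):
        value = self.data.get(key)
        if value is not None:
            self.downloaded += len(value)
        return value

    def write(self, key, value, only_if_absent=False):
        # The full payload is sent even when the precondition fails.
        self.uploaded += len(value)
        if only_if_absent and key in self.data:
            return False
        self.data[key] = value
        return True


def optimistic_write(store, key, new_shard, merge):
    """Assume no existing shard; fall back to read, merge, rewrite on failure."""
    if not store.write(key, new_shard, only_if_absent=True):
        existing = store.read(key)
        store.write(key, merge(existing, new_shard))


def check_first_write(store, key, new_shard, merge):
    """Read first, then write once (costs an extra read when nothing exists)."""
    existing = store.read(key)
    if existing is not None:
        new_shard = merge(existing, new_shard)
    store.write(key, new_shard)


def unconditional_write(store, key, new_shard, merge=None):
    """Valid only when the whole shard is being replaced, so no merge is needed."""
    store.write(key, new_shard)


def overwrite_cost(strategy, shard_size=1000):
    """Return (bytes uploaded, bytes downloaded) for overwriting an existing shard."""
    store = CountingStore({"shard0": bytes(shard_size)})
    new_shard = b"\x01" * shard_size
    strategy(store, "shard0", new_shard, lambda old, new: new)
    return store.uploaded, store.downloaded
```

With a 1000-byte shard, the optimistic path moves 3000 bytes in total (two uploads and one download), checking first moves 2000, and the unconditional write moves 1000.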

There isn't an API at the moment for getting the shard path for a given chunk. In the future I do plan to add APIs for retrieving the preferred grid for reading and writing a volume, which would make it easier to perform shard-aligned writes.
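
Until such an API exists, a caller has to work out shard-aligned regions by hand. The sketch below assumes each shard covers an axis-aligned box of a known shape in voxels; that assumption depends on the sharding parameters and is not confirmed in this thread. `shard_aligned_boxes` is a hypothetical helper, not a tensorstore function.

```python
from itertools import product


def shard_aligned_boxes(volume_shape, shard_shape):
    """Yield (start, stop) boxes aligned to the shard grid, clipped to the volume.

    Assumes shards are axis-aligned boxes of `shard_shape` voxels (an assumption).
    """
    ranges = [range(0, size, step) for size, step in zip(volume_shape, shard_shape)]
    for origin in product(*ranges):
        stop = tuple(
            min(o + s, v) for o, s, v in zip(origin, shard_shape, volume_shape)
        )
        yield origin, stop
```

Issuing one write per yielded box means each write touches exactly one shard, which is the precondition for the unconditional-write optimization discussed above.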

jbms commented on May 14, 2024

As an update, I am still working on fixing this issue fully --- it turned out to be trickier than expected. I implemented the approach outlined in the prior comment of doing an unconditional write in the case that all chunks are being written (without any preconditions). An additional fix was needed to also handle the case where the volume was not an exact multiple of the chunk size --- previously, the resultant partial "edge" chunks were not eligible for unconditional writeback.
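
The edge-chunk case can be illustrated with a small sketch. It assumes a regular chunk grid and checks whether a write region fully covers every chunk of a shard once each chunk is clipped to the volume bounds. The function names and the list of chunk indices per shard are hypothetical; this shows the idea only and is not the actual fix.

```python
def clipped_chunk_box(chunk_index, chunk_shape, volume_shape):
    """Return the (start, stop) box of a chunk, clipped to the volume bounds."""
    start = tuple(i * c for i, c in zip(chunk_index, chunk_shape))
    stop = tuple(min(s + c, v) for s, c, v in zip(start, chunk_shape, volume_shape))
    return start, stop


def can_write_unconditionally(write_start, write_stop, shard_chunk_indices,
                              chunk_shape, volume_shape):
    """True if the write covers every chunk in the shard, edge chunks included."""
    for index in shard_chunk_indices:
        start, stop = clipped_chunk_box(index, chunk_shape, volume_shape)
        covered = all(
            ws <= s and e <= we
            for ws, we, s, e in zip(write_start, write_stop, start, stop)
        )
        if not covered:
            return False
    return True
```

For a volume of length 10 with chunks of length 4, the last chunk spans [8, 10) after clipping. Without clipping, the check would demand coverage up to 12, which no write inside the volume can satisfy, so the shard would never qualify for an unconditional write.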

These fixes essentially resolved this specific issue, but while testing them I found a race condition: writeback may start too early, before all of the pending writes have been flushed from one cache to another, leading to a similar inefficiency, and there isn't really any way to reliably avoid that race. To address this problem in a clean way, I'm working on implementing a transaction system that would allow deferring writeback and then atomically committing the writes to a shard.
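
The idea of deferring writeback can be sketched as follows. This is a toy model under stated assumptions, not the transaction system being described: `ShardTransaction`, the dict-backed store, and the `shard_of` mapping are all hypothetical, and the commit here is not truly atomic. It only shows that buffering chunk writes until commit lets each shard be written once, regardless of the order in which chunk writes arrive.

```python
from collections import defaultdict


class ShardTransaction:
    """Buffer chunk writes in memory and write each touched shard once on commit."""

    def __init__(self, store, shard_of):
        self.store = store          # dict: shard key -> {chunk id: bytes}
        self.shard_of = shard_of    # hypothetical mapping: chunk id -> shard key
        self.pending = defaultdict(dict)
        self.shard_writes = 0

    def write_chunk(self, chunk_id, data):
        # Nothing reaches the store yet, so writeback cannot start too early.
        self.pending[self.shard_of(chunk_id)][chunk_id] = data

    def commit(self):
        for shard_key, chunks in self.pending.items():
            merged = dict(self.store.get(shard_key, {}))
            merged.update(chunks)
            self.store[shard_key] = merged
            self.shard_writes += 1
        self.pending.clear()
```

Writing eight chunks that map to two shards results in exactly two shard writes at commit time, and the store is untouched before that.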

stephenplaza commented on May 14, 2024

Thanks for spending time on this ... it is definitely going to be immensely helpful for the workflow that I am building now.

Not sure if this will help, but one concept that has worked well for us in DVID ingestion is to have an 'unsafe' mode. Basically a flag that will disable a couple critical features for ensuring data consistency in favor of speed and with the assumption that only an expert would enable this flag.

stephenplaza commented on May 14, 2024

Jeremy: does the current build have this half fix? (I was planning to do another ingestion run with my new EM ingestion service using your recent fix for the sharded/unsharded spec from another issue and wanted to know if I should expect side effects from this issue)

jbms commented on May 14, 2024

The current build does not have the half fix, though if you think it would be useful I can try to get that pushed out later today. I am making good progress on the full fix, though it is a larger change.

stephenplaza commented on May 14, 2024

I don't think a week will matter too much if you will have it by then. Thanks again!!
