

gjoseph92 commented on July 28, 2024

Yeah, a timeout on the read would be reasonable. Then we could have that timeout trigger a retry via whatever logic we implement for #18.
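To make "a read timeout that triggers a retry" concrete, here is a minimal standard-library sketch. It is not stackstac code. `read_with_timeout_and_retry` and its parameters are hypothetical names, chosen only to mirror GDAL_HTTP_TIMEOUT, GDAL_HTTP_MAX_RETRY and GDAL_HTTP_RETRY_DELAY. It assumes each read can be run in a worker thread. A read that times out is abandoned rather than killed, because Python threads cannot be interrupted.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared pool (hypothetical). A timed-out read keeps running in its worker
# thread; the caller simply stops waiting for it and submits a fresh attempt.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def read_with_timeout_and_retry(
    read: Callable[[], T],
    timeout: float = 45.0,
    max_retry: int = 5,
    retry_delay: float = 0.5,
) -> T:
    """Call `read`, abandoning any attempt that takes longer than `timeout` seconds.

    A timed-out attempt is retried up to `max_retry` more times, sleeping
    `retry_delay` seconds between attempts. The last TimeoutError is re-raised.
    """
    attempt = 0
    while True:
        future = _POOL.submit(read)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            future.cancel()  # only has an effect if the read has not started yet
            if attempt >= max_retry:
                raise
            attempt += 1
            time.sleep(retry_delay)
```

Because abandoned reads still occupy pool threads until they return, a real implementation would need to size the pool (or use a per-read deadline inside the I/O library itself) so that a burst of hung reads cannot starve new attempts.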

@mukhery I'm curious if you have a reproducer for this, or have noticed cases/datasets/patterns that tend to cause it more often?

For now, you could try setting GDAL_HTTP_TIMEOUT, GDAL_HTTP_MAX_RETRY and GDAL_HTTP_RETRY_DELAY through a LayeredEnv. See https://gdal.org/user/virtual_file_systems.html#vsicurl-http-https-ftp-files-random-access and https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_HTTP_TIMEOUT.

Maybe something like:

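import stackstac

# Start from stackstac's default GDAL environment and layer the HTTP
# timeout/retry options on top of it.
retry_env = stackstac.DEFAULT_GDAL_ENV.updated(dict(
    GDAL_HTTP_TIMEOUT=45,       # seconds before an HTTP request is abandoned
    GDAL_HTTP_MAX_RETRY=5,      # max retries for transient HTTP errors
    GDAL_HTTP_RETRY_DELAY=0.5,  # seconds to wait between retries
))
stackstac.stack(..., gdal_env=retry_env)  # "..." = your items and other arguments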


mukhery commented on July 28, 2024

I tried to come up with something to reproduce this but haven't been able to. We've also been seeing several other network/comms-related issues, so it's possible that our specific workload, and how we've implemented the processing, is causing some of them. To meet our immediate need, I ended up just adding timeouts to the task futures and then cancelling and/or restarting the cluster when needed. Feel free to close this issue if you'd like, and I can reopen it later if I'm able to reproduce it reliably.


gjoseph92 commented on July 28, 2024

I'll keep it open, since I think it's a reasonable thing to implement.

> I ended up just adding timeouts to the task futures

Curious how you implemented this?


mukhery commented on July 28, 2024

Sounds good, thanks!

I did something like this:

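import dask.distributed

try:
    # task_involving_stackstac_data is a placeholder for your own lazy computation
    fut = cluster.client.compute(task_involving_stackstac_data)
    # Block for at most 10 minutes; raises TimeoutError if the task isn't done by then
    dask.distributed.wait(fut, timeout=600)
except dask.distributed.TimeoutError as curr_exception:
    error_text = f'{curr_exception}'[:100]  # sometimes the error messages are crazy long
    print(f'task failed with exception: {error_text}')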
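The snippet above handles the timeout itself; the follow-up step mentioned earlier in the thread (cancelling the unfinished work) is not shown. As a rough, runnable analogue, here is the same wait-with-timeout-then-cancel pattern using Python's standard `concurrent.futures` in place of dask. `run_with_deadline` and `sleepy` are hypothetical names. In dask.distributed the corresponding calls would be along the lines of `client.cancel(...)` and, as a last resort, `client.restart()`, but this sketch does not exercise them.

```python
import concurrent.futures
import time
from typing import Any, Callable, Iterable, List, Set, Tuple


def run_with_deadline(
    executor: concurrent.futures.Executor,
    fn: Callable[..., Any],
    args_list: Iterable[tuple],
    timeout: float,
) -> Tuple[List[Any], Set[concurrent.futures.Future]]:
    """Submit fn(*args) for every args tuple and wait at most `timeout` seconds.

    Returns the results of the tasks that finished (in submission order) and the
    set of unfinished futures, after asking each of those to cancel.
    """
    futures = [executor.submit(fn, *args) for args in args_list]
    done, not_done = concurrent.futures.wait(futures, timeout=timeout)
    for fut in not_done:
        # Future.cancel() only succeeds for work that has not started yet;
        # a thread that is already running cannot be interrupted.
        fut.cancel()
    results = [fut.result() for fut in futures if fut in done]  # re-raises task errors
    return results, not_done


def sleepy(seconds: float) -> float:
    """Hypothetical stand-in for a slow task."""
    time.sleep(seconds)
    return seconds
```

Unlike `dask.distributed.wait`, which raises on timeout, `concurrent.futures.wait` returns the `(done, not_done)` split, so the caller decides what to do with the stragglers.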


gjoseph92 commented on July 28, 2024

Nice! That makes sense.

