Code Monkey home page Code Monkey logo

Comments (10)

plamut avatar plamut commented on August 10, 2024 1

@mcsimps2 Thanks for the quick feedback, it seems that the fix could be pretty straightforward. It might even make sense to rename timeout to total_timeout in order to better distinguish it from the timeout passed to the requests transport...

The issue can probably also be reproduced by artificially throttling one's internet connection, but if there is a need to access a big file, I will let you know. Or for verifying a patch when ready as a part of a PR review, that would also be cool!

from python-storage.

mcsimps2 avatar mcsimps2 commented on August 10, 2024

I also realize that it would be wise to split such large files into a multi-part upload, but the issue still remains, especially for slow internet connections.

from python-storage.

IlyaFaer avatar IlyaFaer commented on August 10, 2024

@plamut, PTAL

from python-storage.

plamut avatar plamut commented on August 10, 2024

(disclaimer: speaking from the top of my head, would need to check the code to confirm)

Not a definite answer, but the motivation for adding a TimeoutGuard was to support customizable total timeouts for the underlying operations, when these operations can potentially involve multiple requests.

For example, sometimes there is a Future object sitting on top that wants to timeout after X seconds, even though each individual underlying request would take less than X seconds. This was needed to support some use case scenarios in BigQuery.

With that said, it is possible that this addition broke the expected semantics in other use cases, e.g. where timeout is expected to represent a per-request timeout.

If there is no explicit timeout given (timeout=None), then the TimeoutGuard should act as a no-op. I think the following paragraph is the key:

File uploads for a couple minutes, exceeding the default timeout of 60/61 seconds (...) that the TimeGuard uses.

IIRC, this is what happens - the library user provides no explicit timeout, but unlike some other libraries, this library also defines a default timeout (60, 61), which is passed to the TimeoutGuard. However, that timeout is probably meant as a timeout for each requests request, and not the total timeout. As the requests docs explain:

timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.

@mcsimps2 Does this sound plausible? From the issue description it seems to me that your application does not use explicit timeouts, is that correct?

(in which case the reported timeout error is indeed a bug - thanks for reporting it!)

Update: Classified as P1, as this can break large file uploads, yet no obvious and reasonable workaround exists (monkeypatching does not count).

from python-storage.

mcsimps2 avatar mcsimps2 commented on August 10, 2024

Ah, that does make sense - I see how that's useful if you have multiple requests going on and need to cancel an operation mid-way through the sequence of requests if the "clock" timeout was exceeded.

You're correct, I do not set any explicit timeouts when calling the code, so any non-null timeout populated to the TimeoutGuard must be coming from a library default.

However, that timeout is probably meant as a timeout for each requests request, and not the total timeout.

As a quick check (and for debugging purposes only), I can verify that setting the timeout passed to TimeoutGuard to None, while still passing the read and connect timeouts to requests per the timeout parameter, does indeed stop this issue from happening in the AuthorizedSession.request function and allows me to successfully complete a large/slow upload operation.

So it seems that, perhaps, specifying a separate timeout to pass to TimeoutGuard in addition to the one we want to pass to the requests library would remediate this problem, although that might be a naive fix since I have only looked at a very limited portion of the codebase, and it would be one more method argument to worry about.

I would be happy to run any tests that would be helpful since I have access to big files and a slow network.

from python-storage.

plamut avatar plamut commented on August 10, 2024

@mcsimps2 FYI, I opened a pull request in the google-auth library.

The timeout parameter no longer affects the TimeoutGuard, and instead a separate max_allowed_time parameter is added to control it. Its value is None by default, making the TimeoutGuard a no-op by default as well.

If you wish, you can already try it out by using the google-auth version from the PR branch, but otherwise there will probably be a new release soon anyway, and the Storage lib will just need a version bump of the google-auth dependency.

from python-storage.

mcsimps2 avatar mcsimps2 commented on August 10, 2024

Awesome, thank you for pushing this forward so quickly - much appreciated! This will help a lot, I'm looking forward to the next release

from python-storage.

plamut avatar plamut commented on August 10, 2024

@mcsimps2 The most recent google-auth release google-auth==1.11.0 contains the change that separates transport timeouts from the total time wall clock timeout. The latter is None by default, meaning that TimoutGuard should not be raising unnecessary Timeout errors anymore.

Until a new storage lib release is made, you could try manually pinning google-auth to the abovementioned version, resolving this issue.

Also, thanks for the very good and detailed report, it made diagnosing the issue much faster. 👍

from python-storage.

crwilcox avatar crwilcox commented on August 10, 2024

@plamut is all of the work done for this (minus a release)?

from python-storage.

plamut avatar plamut commented on August 10, 2024

@crwilcox Yes, the bug no longer occurs if google-auth>=1.11.0 is used (the bugfix was done in the latter awhile ago).

from python-storage.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.