Comments (10)
@mcsimps2 Thanks for the quick feedback, it seems that the fix could be pretty straightforward. It might even make sense to rename `timeout` to `total_timeout` in order to better distinguish it from the `timeout` passed to the `requests` transport...
The issue can probably also be reproduced by artificially throttling one's internet connection, but if there is a need to access a big file, I will let you know. Verifying the patch as part of a PR review, once it is ready, would also be cool!
from python-storage.
I also realize that it would be wise to split such large files into a multi-part upload, but the issue still remains, especially on slow internet connections.
@plamut, PTAL
(disclaimer: speaking off the top of my head, I would need to check the code to confirm)
Not a definite answer, but the motivation for adding a `TimeoutGuard` was to support customizable total timeouts for the underlying operations, when these operations can potentially involve multiple requests.
For example, sometimes there is a Future object sitting on top that wants to timeout after X seconds, even though each individual underlying request would take less than X seconds. This was needed to support some use case scenarios in BigQuery.
With that said, it is possible that this addition broke the expected semantics in other use cases, e.g. where `timeout` is expected to represent a per-request timeout.
If there is no explicit timeout given (`timeout=None`), then the `TimeoutGuard` should act as a no-op. I think the following paragraph is the key:
File uploads for a couple minutes, exceeding the default timeout of 60/61 seconds (...) that the TimeGuard uses.
IIRC, this is what happens: the library user provides no explicit timeout, but unlike some other libraries, this library also defines a default timeout of `(60, 61)`, which is passed to the `TimeoutGuard`. However, that timeout is probably meant as a timeout for each `requests` request, and not the total timeout. As the `requests` docs explain:
timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
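To make the distinction concrete, here is a minimal sketch of a wall-clock guard. This is an illustration of the concept only, not the actual google-auth `TimeoutGuard` implementation; the class name is made up. It shows the key property discussed above: a budget of `None` makes the guard a no-op, while a finite budget bounds the *total* elapsed time of a block rather than any single request.

```python
import time

class WallClockGuard:
    """Illustrative sketch only -- not the actual google-auth TimeoutGuard.

    Enforces a *total* wall-clock budget over a whole block of code,
    as opposed to requests' per-request connect/read timeouts.
    """

    def __init__(self, max_allowed_time=None):
        self.max_allowed_time = max_allowed_time

    def __enter__(self):
        self._start = time.monotonic()
        return self

    def __exit__(self, exc_type, exc, tb):
        # max_allowed_time=None makes the guard a no-op, which is the
        # default behavior argued for above.
        if self.max_allowed_time is not None:
            if time.monotonic() - self._start > self.max_allowed_time:
                raise TimeoutError("total wall-clock budget exceeded")
        return False

# No budget: even a slow block passes.
with WallClockGuard(max_allowed_time=None):
    time.sleep(0.05)

# With a budget, the slow block trips the guard.
try:
    with WallClockGuard(max_allowed_time=0.01):
        time.sleep(0.05)
except TimeoutError as exc:
    print("guard fired:", exc)
```

Note that such a guard fires even when each individual request would have stayed under the `requests` socket timeout, which is exactly the failure mode reported for large uploads.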
@mcsimps2 Does this sound plausible? From the issue description it seems to me that your application does not use explicit timeouts, is that correct?
(in which case the reported timeout error is indeed a bug - thanks for reporting it!)
Update: Classified as P1, as this can break large file uploads, yet no obvious and reasonable workaround exists (monkeypatching does not count).
Ah, that does make sense - I see how that's useful if you have multiple requests going on and need to cancel an operation mid-way through the sequence of requests if the "clock" timeout was exceeded.
You're correct, I do not set any explicit timeouts when calling the code, so any non-null timeout propagated to the `TimeoutGuard` must be coming from a library default.
However, that timeout is probably meant as a timeout for each requests request, and not the total timeout.
As a quick check (and for debugging purposes only), I can verify that setting the timeout passed to `TimeoutGuard` to `None`, while still passing the read and connect timeouts to `requests` via the `timeout` parameter, does indeed stop this issue from happening in the `AuthorizedSession.request` function and allows me to successfully complete a large/slow upload operation.
So it seems that, perhaps, specifying a separate timeout to pass to `TimeoutGuard`, in addition to the one we want to pass to the `requests` library, would remediate this problem, although that might be a naive fix since I have only looked at a very limited portion of the codebase, and it would be one more method argument to worry about.
I would be happy to run any tests that would be helpful since I have access to big files and a slow network.
@mcsimps2 FYI, I opened a pull request in the `google-auth` library.
The `timeout` parameter no longer affects the `TimeoutGuard`; instead, a separate `max_allowed_time` parameter is added to control it. Its value is `None` by default, making the `TimeoutGuard` a no-op by default as well.
If you wish, you can already try it out by using the `google-auth` version from the PR branch, but otherwise there will probably be a new release soon anyway, and the Storage lib will just need a version bump of its `google-auth` dependency.
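The separated semantics described above can be sketched as follows. The parameter names mirror the PR's `timeout`/`max_allowed_time` split, but this is a self-contained illustration with a stand-in chunked-upload loop, not the real google-auth code: `timeout` bounds each individual transport request, while `max_allowed_time` (defaulting to `None`, i.e. unbounded) caps the whole multi-request operation.

```python
import time

def upload_with_budget(send_chunk, timeout=(60, 61), max_allowed_time=None):
    """Illustration of the separated timeout semantics.

    `timeout` would be handed to each transport request (connect, read),
    while `max_allowed_time` bounds the whole multi-request operation.
    """
    deadline = None
    if max_allowed_time is not None:
        deadline = time.monotonic() + max_allowed_time
    while True:
        # One chunk == one HTTP request; `timeout` applies per request.
        done = send_chunk(timeout=timeout)
        if done:
            return True
        # The wall-clock budget is checked between requests, so the
        # default of None never interrupts a long upload.
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError("max_allowed_time exceeded")

# A three-chunk upload completes fine with no total budget...
chunks = iter([False, False, True])
assert upload_with_budget(lambda timeout: next(chunks))

# ...but an explicit total budget interrupts a slow multi-chunk upload.
def slow_chunk(timeout):
    time.sleep(0.02)
    return False

try:
    upload_with_budget(slow_chunk, max_allowed_time=0.01)
except TimeoutError as exc:
    print("aborted:", exc)
```

With this shape, a large upload on a slow connection only fails if the caller explicitly opts into a total budget, which matches the behavior the PR restores.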
Awesome, thank you for pushing this forward so quickly - much appreciated! This will help a lot; I'm looking forward to the next release.
@mcsimps2 The most recent google-auth release, `google-auth==1.11.0`, contains the change that separates transport timeouts from the total wall-clock timeout. The latter is `None` by default, meaning that `TimeoutGuard` should not be raising unnecessary Timeout errors anymore.
Until a new `storage` lib release is made, you could manually pin google-auth to the above-mentioned version to resolve this issue.
Also, thanks for the very good and detailed report, it made diagnosing the issue much faster. 👍
@plamut is all of the work done for this (minus a release)?
@crwilcox Yes, the bug no longer occurs if `google-auth>=1.11.0` is used (the bugfix was made there a while ago).