Comments (10)
I believe we are experiencing a similar problems every since 1.25.0 was released last week. We are accessing blobs directly and getting random 504 errors.
The code can be as simple as:
storage_client = storage.Client()
bucket = storage_client.get_bucket("some-bucket")
Logs show errors like this:
google.api_core.exceptions.GatewayTimeout: 504 GET https://storage.googleapis.com/storage/v1/b/some-bucket/?projection=noAcl:
from python-storage.
@mcsimps2 Did you begin to see this behavior recently (with a particular release) or has it been ongoing?
Edit: Assigning priority and type to keep the bots happy, please change to something more appropriate once you are able to look at this in more detail.
from python-storage.
I have had hanging threads on both google-cloud-storage == 1.24.1 and 1.25.0. However, this problem has only manifested recently (the first time I ran into it was this earlier this week). It first occurred on 1.24.1, so I upgraded to 1.25.0. The Google servers have usually been responsive enough that I've never run into a timeout issue at all. Thus, I believe these types of loopholes have existed in the code over several versions, yet perhaps something recent occurred that decreased the timeliness/responsiveness of the servers, thereby causing these timeout=None
problems to manifest themselves; however, this is only speculation.
In the meantime, if there are other users experiencing a similar problem (@akuttig-hs), especially with threads hanging indefinitely from the timeout=None
issue, I've explored three solutions that each seem to work as a temporary workaround:
- calling the function in a new thread and killing that thread after a certain amount of time passes
- monkeypatching
AuthorizedSession
to always have a timeout - monkeypatching the
requests
library thatgoogle-cloud-python
uses to always have a timeout:
original_session_request = requests.Session.request
def monkeypatched_session_request(self, *args, **kwargs):
if not kwargs.get("timeout"):
kwargs["timeout"] = (DEFAULT_CONNECT_TIMEOUT, DEFAULT_READ_TIMEOUT)
return original_session_request(self, *args, **kwargs)
requests.Session.request = monkeypatched_session_request
The monkeypatch was the first thing I tried, and it definitely helped alleviate the problem since the proper Timeout exception is being thrown by the requests library, whereas before, it would never throw an exception and just wait forever for a response. That said, in the end, it's not a real solution to the problem.
from python-storage.
Does the issue have to do with the resumable-media
dependency by any chance? The following code snippet could be relevant:
The snippet applies the default timeout
argument, if the later is missing, but not if it is present and set to None
. A recent change to storage
added optional timeout
arguments (to match the method signature changes in core
), which might have broken the resumable-media
's reliance on the non-existence of the argument (I'm not 100% sure how these timeouts are propagated around, though).
If the above is indeed the cause, it would affect versions 1.24.0+
, i.e. those released after Dec 26th 2019, which would be consistent with the reports on indefinite hangings that started occurring recently.
Since we more or less always want to have some sort of a timeout, the above logic in resumable media could be modified to apply the default timeout on None
values as well.
Edit:
The code path from the issue description:
...
This then callsJSONConnection.api_request
(defaults timeout to None) ->JSONConnection._make_request
(defaults timeout to None) ->JSONConnection._do_request
(defaults timeout to None) ->AuthorizedSession.request
(defaults timeout to None), which makes the ultimate call to therequests.Session
object with a None timeout.
does not sound like going through resumable-media
, meaning that patching the latter would not work in that particular case, and an explicit timeout would have to be brought in somehow.
FWIW, BigQuery had similar problems (example), because there used to be no way of specifying a timeout to use in the transport layer.
Edit 2:
As a quick workaround, we could add a non-None default timeout to AuthorizedSession.request(), as this is much faster than adding a configurable timeout to every code path.
Edit 3:
Created a pull request in the google-auth dependency that adds a default timeout of 120 second (the same default timeout that is already used when refreshing credentials).
from python-storage.
@mcsimps2 @akuttig-hs Would you mind trying an upgrade to google-auth==1.11.0
to see if that resolves the issue?
from python-storage.
Thanks for the quick response!
@plamut I don't believe all requests still have a non-null Timeout with the recent PR you submitted that patched AuthorizedSession.request
. The flow of the code is JSONConnection.api_request (defaults timeout to None) -> JSONConnection._make_request (defaults timeout to None) -> JSONConnection._do_request (defaults timeout to None) -> AuthorizedSession.request (defaults timeout now to _DEFAULT_TIMEOUT). Correct me if I'm wrong, but I believe the default kwarg is only used if the specified kwarg is missing in the call to AuthorizedSession.request
. However, JSONConnection._do_request
is passing timeout=None
, which overrides the default kwarg value of _DEFAULT_TIMEOUT in AuthorizedSession.request
. The _do_request
method looks like
def _do_request(
self, method, url, headers, data, target_object, timeout=None
):
return self.http.request(
url=url, method=method, headers=headers, data=data, timeout=timeout
)
So in the end, timeout=None
is being passed to AuthorizedSession.request
, which will override the default kwarg value of _DEFAULT_TIMEOUT
, the end result being a null timeout again.
A quick code just for my sanity to make sure I understand kwargs correctly
def a(timeout=None): #JSONConnection._do_request
b(timeout=timeout)
def b(timeout=_DEFAULT_TIMEOUT): # AuthorizedSession.request
print(f"The timeout is {timeout}")
a() # prints "The timeout is None"
So perhaps, the code should be like this:
def a(timeout=None): #JSONConnection._do_request
b(timeout=timeout)
def b(timeout=None): # AuthorizedSession.request
timeout = timeout or _DEFAULT_TIMEOUT
print(f"The timeout is {timeout}")
a() # prints "The timeout is 120"
from python-storage.
I don't believe all requests still have a non-null Timeout with the recent PR you submitted that patched AuthorizedSession.request.
That is correct. There was actually a PR commit in google-auth that would override any None
value, but it was reverted during the review due to the concern that this could break some other use case in another dependent library.
The ruling was that if a library passes a None
timeout, it probably has a reason for it. I'd say the next step is thus to change the default timeout values in storage lib to non-None
- at least until all code paths can be given an explicit timeout by the library users.
from python-storage.
Ah, you’re perfectly right. I should’ve looked at the PR more closely, my apologies!
from python-storage.
Hey,
We are experiencing the same problems as @akuttig-hs described - simple requests for bucket.get_blob(<blob_name>)
return 504 Gateway
quite frequently but randomly.
I came across this document - Cloud Storage Retry Params
If we determine timeouts (and retries, for possible failures) through this class, will that be helpful for our case?
I'm about to try and test this and will update with news, but wanted your opinion
@plamut
from python-storage.
@maykmayx I can speak only generally, because I am not too familiar with this particular library (storage).
The following code drives the retries from inside the Retry
objects:
https://github.com/googleapis/google-cloud-python/blob/master/api_core/google/api_core/retry.py#L182-L190
If the target callable blocks indefinitely, a retry will not have a chance to act, which is the issue that is being dealt with here (#10199 will add customizable timeouts to storage
).
If the target callable does return, but with an error, predicate()
is run to determine whether to retry the function or not. Different libraries define different predicated, but those can be customized, if needed:
from google.api_core.retry import Retry
retry = Retry()
retry = retry.with_predicate(lambda exc: "foobar" in str(exc))
From what I can see from the linked sample, RetryParams
configure how to retry, but not what to retry. For the latter, a predicate needs to be configured, if you could obtain a reference to the Retry
instance? And make it repeat any 504 errors, if this does not already?
from python-storage.
Related Issues (20)
- Add 503 to list of retriable HTTP response codes HOT 4
- Warning: a recent release failed HOT 1
- Discussion: Contribution Idea - Python Code Sample for Handling Large JSON Files on Google Cloud HOT 3
- Sign blob URL using workload identity instead of common service account credentials HOT 9
- blob.upload_from_string get error Caused by SSLError(SSLEOFError HOT 2
- Support Storage Control Quickstart HOT 2
- `Blob.content_type` is `None` when created `from_string()` HOT 1
- match_glob keyword argument on google.cloud.storage.Client().list_blobs() has disappeared HOT 2
- Blob Writer's close function causes latency > 15s under load. HOT 6
- FR: Support HNS enablement in bucket metadata
- Can't set Cache-Control on GCS object HOT 2
- media_link & self_link in blob do not update when client option "api_endpoint" is set HOT 2
- Bypass 8MB limit to allow file to be uploaded in single request HOT 1
- Retry batch delete blob on 503
- Some unit tests require real credentials files
- Support HOT 1
- Add HNS folders samples HOT 1
- Datastore emulator causes cloud storage client to hang HOT 2
- Samples for Managed Folders HOT 1
- Docs request: how to use `Client._http`
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from python-storage.