
Comments (15)

ElectricSwan commented on August 10, 2024

@aborzin, I found that there is a setter for chunk size on the blob object, so I've replaced the module-level
storage.blob._DEFAULT_CHUNKSIZE = 5 * 1024 * 1024  # 5 MB
with
blob.chunk_size = 5 * 1024 * 1024  # Set 5 MB chunk size
when I create the blob, which means one less access to a protected member.

This also means that anyone with an upload speed of at least 1.1 Mbps [1] needs no change to the library; only the public setter needs to be used.

For anyone whose upload speed is less than 1.1 Mbps, the module-level
storage.blob._MAX_MULTIPART_SIZE = 5 * 1024 * 1024  # 5 MB
is still required (in addition to setting blob.chunk_size).
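
Putting both pieces together, a minimal sketch for a slow link (bucket, object, and file names below are placeholders):

from google.cloud import storage

# Module-level override: forces resumable (chunked) uploads for anything above 5 MB.
# Only needed when the upload speed is below ~1.1 Mbps.
storage.blob._MAX_MULTIPART_SIZE = 5 * 1024 * 1024  # 5 MB

client = storage.Client()
bucket = client.bucket("your-bucket")
blob = bucket.blob("destination-object")
blob.chunk_size = 5 * 1024 * 1024  # 5 MB per chunk, via the public setter

blob.upload_from_filename("path/to/local/file")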

[1] 1.1 Mbps is the minimum required to upload 8 MB within the 60-second timeout.


ElectricSwan commented on August 10, 2024

I'm also getting the same error on both Ubuntu 18.04 with Python 3.6.9 and Windows 10 with Python 3.8.0, both using google-cloud-storage 1.26.0.
The timeout happens after 60 seconds.
I'm limited to around 800 kbps upload speed, so for me that gives a timeout for any files larger than 6 MB.
Any uploads that complete within 60 seconds are successful.


HemangChothani commented on August 10, 2024

PR #185 added an explicit timeout argument to the blob methods. Users can now pass a longer timeout to resolve this issue. Feel free to reopen if this issue appears again.
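
For example, a minimal sketch (assuming a release that includes PR #185, where the upload methods accept a timeout in seconds; bucket and file names are placeholders):

from google.cloud import storage

client = storage.Client()
blob = client.bucket("your-bucket").blob("big-file.bin")

# Allow up to 10 minutes per request instead of the 60-second default.
blob.upload_from_filename("big-file.bin", timeout=600)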


aborzin commented on August 10, 2024

@vfa-minhtv, I have been experiencing similar timeout issues on my macOS and Windows platforms with google-cloud-storage==1.26.0. However, the timeouts are inconsistent and apparently dependent on network speed. As already mentioned in this thread, they typically occur at very slow upload speeds.

I checked the code and found that any data stream of 8 MB or larger goes through _do_resumable_upload(..), which sends the data stream in chunks (which absolutely makes sense for slow network connectivity):

        if size is not None and size <= _MAX_MULTIPART_SIZE:
            response = self._do_multipart_upload(
                client, stream, content_type, size, num_retries, predefined_acl
            )
        else:
            response = self._do_resumable_upload(
                client, stream, content_type, size, num_retries, predefined_acl
            )

However, the chunk size is not set in the initialization call and therefore will be set to some predefined default value:

        if chunk_size is None:
            chunk_size = self.chunk_size
            if chunk_size is None:
                chunk_size = _DEFAULT_CHUNKSIZE

This default value is set to 100 MB:

_DEFAULT_CHUNKSIZE = 104857600  # 1024 * 1024 B * 100 = 100 MB

So you must have roughly 1.7 MB/s (about 13 Mbps) of upload speed to push a 100 MB chunk within 1 minute, which is apparently the default timeout (see http://www.meridianoutpost.com/resources/etools/calculators/calculator-file-download-time.php for quick upload-time calculations).

I ran a test with the _DEFAULT_CHUNKSIZE value reduced to 10 MB, which solved my issue.
I hope this helps, and that we will eventually be able to control the chunk size based on our environment parameters.
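
For sizing the chunk to a particular link, a rough back-of-the-envelope sketch (the speed below is an example figure; 256 KB is the granularity that the chunk_size setter expects):

# Choose a chunk that fits comfortably inside the 60-second timeout.
UPLOAD_SPEED_BPS = 800_000  # measured upload speed in bits/s (e.g. 800 kbps)
TIMEOUT_S = 60
SAFETY = 0.5                # headroom for protocol overhead and jitter

max_bytes = int(UPLOAD_SPEED_BPS / 8 * TIMEOUT_S * SAFETY)         # 3,000,000 B here
chunk = max(256 * 1024, (max_bytes // (256 * 1024)) * 256 * 1024)  # round down to a 256 KB multiple
print(chunk)  # 2883584 B, i.e. 2.75 MB for this example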


vfa-minhtv commented on August 10, 2024

@ElectricSwan

The timeout happens after 60 seconds.
I'm limited to around 800 kbps upload speed, so for me that gives a timeout for any files larger than 6 MB.
Any uploads that complete within 60 seconds are successful.

The exact same thing happens for me. My internet is also limited, so the 60-second timeout is insufficient to finish uploading.


aborzin commented on August 10, 2024

@ElectricSwan, the problem with the workaround I proposed earlier is that it is not portable (it changes the code in the local copy of the google-cloud-storage lib). So I decided to override the chunk size of the blob object after it is created:

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.blob(destination_blob_name)

# WARNING: this is a workaround for a google-cloud-storage issue as reported on:
# https://github.com/googleapis/python-storage/issues/74
blob._chunk_size = 8388608  # 1024 * 1024 B * 8 = 8 MB

blob.upload_from_filename(source_file_name)

Even though it is bad practice to access a "private" variable of a class, it seems to be a reasonable solution for now.
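
For what it's worth, the blob also exposes a public chunk_size property setter, which avoids touching the private attribute and validates the value (it must be a multiple of 256 KB); a minimal equivalent:

blob.chunk_size = 8 * 1024 * 1024  # 8 MB, via the public setter
blob.upload_from_filename(source_file_name)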


ElectricSwan commented on August 10, 2024

Thank you @aborzin for sharing your excellent investigation.

I've edited lines 108 and 109 of
env\Lib\site-packages\google\cloud\storage\blob.py
and set both values to 5 MB:

_DEFAULT_CHUNKSIZE = 5 * 1024 * 1024  # 5 MB
_MAX_MULTIPART_SIZE = 5 * 1024 * 1024  # 5 MB

With my 800 kbps upload speed, the maximum file size that would upload was 6 MB, so I chose 5 MB to provide some margin.
I can now successfully upload large files on my 800 kbps upload link.


aborzin commented on August 10, 2024

@ElectricSwan, I agree that my second solution only works if you set the chunk size to 8 MB or larger, because of the _MAX_MULTIPART_SIZE threshold. However, I think you can override it from your code as well:

# WARNING: this is a workaround for a google-cloud-storage issue as reported in:
# https://github.com/googleapis/python-storage/issues/74
storage.blob._MAX_MULTIPART_SIZE = 5 * 1024 * 1024
blob._chunk_size = 5 * 1024 * 1024

I debugged this option, and the threshold was set correctly to 5 MB. Of course, you can do it once, after the google.cloud.storage package is loaded (rather than for each and every upload call).


crwilcox commented on August 10, 2024

You state you ran blob.upload_from_filename("path/on/storage", "path/of/big/file/on/local"), which doesn't match the signature:
upload_from_filename(filename, content_type=None, client=None, predefined_acl=None)

I created a repro, but unfortunately this is working well for me.

import datetime

# Using version 1.26.0 of google-cloud-storage
from google.cloud import storage

BUCKET_NAME = "your-bucket"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

#filename = "4Gb.txt"

# 300Mb.txt generated with 
# `❯ dd if=/dev/urandom of=300Mb.txt bs=1048576 count=300`
filename = "300Mb.txt"

# Write file if necessary.
blob = bucket.blob(filename)
if not blob.exists():
    print(f"Writing {filename}")
    start = datetime.datetime.now()
    blob.upload_from_filename(filename)
    end = datetime.datetime.now()
    print(f"Wrote {filename} {end-start}")

# Read file
print(f"Reading {filename}")
start = datetime.datetime.now()
blob.download_to_filename(f"downloaded-{filename}")
end = datetime.datetime.now()
print(f"Read {filename} {end-start}")

For good measure, I tried the same thing with a 4 GB file as well:

❯ python write_and_read.py
Writing 300Mb.txt
Wrote 300Mb.txt 0:00:46.983565
Reading 300Mb.txt
Read 300Mb.txt 0:00:39.064956
❯ python write_and_read.py
Writing 4Gb.txt
Wrote 4Gb.txt 0:09:43.198637
Reading 4Gb.txt
Read 4Gb.txt 0:14:05.202351

This was done from a new virtual environment with Python 3.8:

❯ pip freeze
cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
google-api-core==1.16.0
google-auth==1.11.2
google-cloud-core==1.3.0
google-cloud-storage==1.26.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
idna==2.9
protobuf==3.11.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
pytz==2019.3
requests==2.23.0
rsa==4.0
six==1.14.0
urllib3==1.25.8


crwilcox commented on August 10, 2024

@ElectricSwan, what is the code you are running? The sample code I provided runs well beyond 60 seconds. Simplified, it is:

from google.cloud import storage

BUCKET_NAME = "your-bucket"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# `❯ dd if=/dev/urandom of=4Gb.txt bs=1048576 count=4096`
filename = "4Gb.txt"

# Write file if necessary.
blob = bucket.blob(filename)
blob.upload_from_filename(filename)


ElectricSwan commented on August 10, 2024

@crwilcox, I just ran your simplified code in a virtual env with Python 3.6.9 on Ubuntu 18.04, and I get the timeout at 60 seconds.

I get the exact same timeout error with your simplified code on my Windows 10 PC running Python 3.8.0.

Here is the pip freeze from the Ubuntu PC, which appears identical to yours:

cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
google-api-core==1.16.0
google-auth==1.11.2
google-cloud-core==1.3.0
google-cloud-storage==1.26.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
idna==2.9
pkg-resources==0.0.0
protobuf==3.11.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
pytz==2019.3
requests==2.23.0
rsa==4.0
six==1.14.0
urllib3==1.25.8

Here's my pip freeze from the Windows PC:

astroid==2.3.3
cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
colorama==0.4.3
google-api-core==1.16.0
google-auth==1.11.2
google-cloud-core==1.3.0
google-cloud-storage==1.26.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
idna==2.8
isort==4.3.21
lazy-object-proxy==1.4.3
mccabe==0.6.1
protobuf==3.11.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
pylint==2.4.4
pytz==2019.3
requests==2.22.0
rsa==4.0
six==1.14.0
urllib3==1.25.8
wrapt==1.11.2

The only potentially relevant differences that I can see between my Windows and Ubuntu boxes are that my Windows box has slightly older versions of:

idna==2.8
requests==2.22.0

but the result is the same on both platforms: timeout after 60 seconds.

Here is my stacktrace on the Ubuntu PC, which is almost identical to the stacktrace submitted by @vfa-minhtv:

Traceback (most recent call last):
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1075, in _send_output
    self.send(chunk)
  File "/usr/lib/python3.6/http/client.py", line 996, in send
    self.sock.sendall(data)
  File "/usr/lib/python3.6/ssl.py", line 975, in sendall
    v = self.send(byte_view[count:])
  File "/usr/lib/python3.6/ssl.py", line 944, in send
    return self._sslobj.write(data)
  File "/usr/lib/python3.6/ssl.py", line 642, in write
    return self._sslobj.write(data)
socket.timeout: The write operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1075, in _send_output
    self.send(chunk)
  File "/usr/lib/python3.6/http/client.py", line 996, in send
    self.sock.sendall(data)
  File "/usr/lib/python3.6/ssl.py", line 975, in sendall
    v = self.send(byte_view[count:])
  File "/usr/lib/python3.6/ssl.py", line 944, in send
    return self._sslobj.write(data)
  File "/usr/lib/python3.6/ssl.py", line 642, in write
    return self._sslobj.write(data)
urllib3.exceptions.ProtocolError: ('Connection aborted.', timeout('The write operation timed out',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "simple_test.py", line 18, in <module>
    blob.upload_from_filename(filename)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1342, in upload_from_filename
    predefined_acl=predefined_acl,
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1287, in upload_from_file
    client, file_obj, content_type, size, num_retries, predefined_acl
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1193, in _do_upload
    client, stream, content_type, size, num_retries, predefined_acl
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 987, in _do_multipart_upload
    response = upload.transmit(transport, data, object_metadata, content_type)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/resumable_media/requests/upload.py", line 106, in transmit
    retry_strategy=self._retry_strategy,
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/resumable_media/requests/_helpers.py", line 136, in http_request
    return _helpers.wait_and_retry(func, RequestsMixin._get_status_code, retry_strategy)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/resumable_media/_helpers.py", line 150, in wait_and_retry
    response = func()
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/google/auth/transport/requests.py", line 317, in request
    **kwargs
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/julie/Programming/Python3/bu_to_gcs/env/lib/python3.6/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out',))

I also just tried installing google-cloud-storage without a virtual env under Ubuntu (just in case there was something peculiar under a virtual env), but I still get the same timeout after 60 seconds.


vfa-minhtv commented on August 10, 2024

You state you ran blob.upload_from_filename("path/on/storage", "path/of/big/file/on/local"), which doesn't match the signature:
upload_from_filename(filename, content_type=None, client=None, predefined_acl=None)

Hi @crwilcox!

So sorry. I gave a wrong example. My actual code is:

[code posted as an image; not reproduced here]


ElectricSwan commented on August 10, 2024

@aborzin, I agree wholeheartedly that it is not a good idea to change code in a library, and I do prefer your 2nd solution, but unfortunately it doesn't work for me, because of the test at line 1191:
if size is not None and size <= _MAX_MULTIPART_SIZE:

In my case (800 kbps upload), I am unable to upload 8 MB within 60 seconds. That is why I have to change the value of _MAX_MULTIPART_SIZE as well. Without that change, files between 6 MB and 8 MB still fail.

Because _MAX_MULTIPART_SIZE is a module-level variable, I can't see any way of changing that value from within my code, so for now I'm stuck with modifying the lib. Please correct me if I'm wrong.


ElectricSwan commented on August 10, 2024

@aborzin, I was working on the same solution, and was just about to post it when I got your notification.

Here is what I've done:

from google.cloud import storage
# WARNING: WORKAROUND to prevent timeout for files > 6 MB on an 800 kbps upload link.
storage.blob._DEFAULT_CHUNKSIZE = 5 * 1024 * 1024  # 5 MB
storage.blob._MAX_MULTIPART_SIZE = 5 * 1024 * 1024  # 5 MB


sjtarik commented on August 10, 2024

I am having the exact same issue, consistently, while uploading a 15 MB file on a 2.4 Mbps upload link.
When I set the timeout, I still get a 503 response. The strange thing is that the upload is actually successful: I am able to browse the bucket and verify the integrity of the uploaded file.

DEBUG:pydub.converter:subprocess output: b'video:0kB audio:19466kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.001139%'
DEBUG:google.auth._default:Checking /Users/xxx/Desktop/videotranscripts/google_voicebookmarks_service.json for explicit credentials as part of auth process...
DEBUG:google.auth._default:Checking /Users/xxx/Desktop/videotranscripts/google_voicebookmarks_service.json for explicit credentials as part of auth process...
DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): storage.googleapis.com:443
DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "GET /storage/v1/b/yyy?projection=noAcl&prettyPrint=false HTTP/1.1" 200 587
DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "POST /upload/storage/v1/b/yyy/o?uploadType=resumable HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "PUT /upload/storage/v1/b/yyy/o?uploadType=resumable&upload_id=ABg5-UxLKi2XLTyw1shMgubVCY3aYVHfPGjLe5gfEHEMyMlVI-HNQabmCu437hCljHU_n3QVQtc8dpCHZbrXMZq7pGw HTTP/1.1" 200 755
DEBUG:google.auth._default:Checking /Users/xxx/Desktop/videotranscripts/XXX.json for explicit credentials as part of auth process...
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
DEBUG:google.api_core.retry:Retrying due to , sleeping 0.8s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 2.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 2.7s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 7.8s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 1.9s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 28.7s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 21.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 60.0s ...


