Comments (5)
It might be interesting to log the HTTP traffic and inspect what is going on.
Python
HTTP request logging in the Python GCS library can be enabled with the logging module together with the debug level of http.client. The following example turns on logging.DEBUG:
import http.client
import logging

from google.cloud import storage

# Python 3 required: print HTTP request and response headers to stdout
http.client.HTTPConnection.debuglevel = 5
# Necessary to turn on logging
logging.basicConfig(level=logging.DEBUG)

storage_client = storage.Client()
blobs = storage_client.list_blobs("anima-frank")
for blob in blobs:
    print(blob.name)
Ref: https://docs.python.org/3/library/logging.html
Ref: https://docs.python.org/2/library/logging.html
GSUTIL
gsutil has the flag --debug to enable HTTP request logs.
For example:
gsutil --debug ls gs://bucket-name
Ref: https://cloud.google.com/storage/docs/gsutil/addlhelp/TopLevelCommandLineOptions
from python-storage.
httplog_4workers_20files.txt
httplog.txt
"Per Chris's request I ran the test program with HTTP logging turned on. Here is the output for a run with 1 worker retrieving one file."
from python-storage.
First off, I made some test data by running this locally and uploading to a directory in storage:
for n in {1..1000}; do
    dd if=/dev/urandom of=file$( printf %03d "$n" ).data bs=1 count=1024
done
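For anyone who would rather generate the data from Python, roughly the same files can be produced as below. This is a minimal sketch, not the script used above: the function name make_test_files and its parameters are my own, and os.urandom stands in for /dev/urandom.

```python
import os
from pathlib import Path


def make_test_files(directory, count=1000, size=1024):
    """Write `count` files of `size` random bytes each, named like the dd loop's output."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in range(1, count + 1):
        # Same naming as the shell loop: file001.data ... file1000.data
        path = target / f"file{n:03d}.data"
        path.write_bytes(os.urandom(size))
        paths.append(path)
    return paths
```

Calling make_test_files("some-local-dir") with the defaults writes 1000 files of 1024 bytes each; the directory name is a placeholder, and the files still need to be uploaded to the bucket afterwards.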
I also made some small modifications to the code to make it a bit more flexible.
- I set defaults in the file to make running in a debugger easier. If you set your own values, they will still be used.
- The code didn't support objects without metadata. It now checks before assuming there is metadata to access.
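The metadata check can be sketched roughly as follows. This is an illustration rather than the actual change in the attached code: the helper name get_metadata_value is hypothetical, and it relies on Blob.metadata being None (not an empty dict) when an object has no custom metadata.

```python
def get_metadata_value(blob, key, default=None):
    """Return a custom metadata value, tolerating objects that have no metadata."""
    # Blob.metadata is None when the object carries no custom metadata,
    # so fall back to an empty dict before looking up the key.
    metadata = blob.metadata or {}
    return metadata.get(key, default)
```

Anything with a metadata attribute works here, so the helper can be exercised without touching a real bucket.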
from python-storage.
After a bit of investigation: I tested from my network (Seattle, WA), running 8 workers on a machine with a quad-core Intel Core i7 (8 vCores). The bucket is multi-region US.
I tracked three operations: metadata retrieval, downloading the 1 KB file, and setting metadata. Each takes right around 0.15 to 0.25 seconds. If an operation takes longer than 0.25 seconds, I print a warning. The attached log has a single warning, from a metadata update that took 0.27 seconds.
The code has changed slightly from above, as I added additional logging.
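The per-operation timing can be sketched along these lines. This is not the code from the attachment; the helper name timed, the logger name, and the constant name are assumptions made for illustration. Only the 0.25 second threshold comes from the description above.

```python
import logging
import time

SLOW_THRESHOLD_SECONDS = 0.25  # warning threshold described above

logger = logging.getLogger("timing")  # hypothetical logger name


def timed(label, func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds); warn when it runs slow."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    if elapsed > SLOW_THRESHOLD_SECONDS:
        logger.warning("%s took %.2f seconds", label, elapsed)
    return result, elapsed
```

With a blob in hand, something like timed("download", blob.download_as_bytes) or timed("set metadata", blob.patch) would wrap the individual calls.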
code.txt
log_1000f_8w.txt
Time to download using code.py is around 40 seconds (the timing captures the final sleep on the threads, so the actual time is less).
Using gsutil -m cp -r gs://bucket/demo-data/ I see it taking 35.76s.
from python-storage.
Closing this out as the customer has been helped. I will open bugs to dig into specific things we can do to help folks avoid this in the future. It seems the threaded version of this code has some contention; moving to multiprocessing is much faster.
Using threads: ~30 seconds
After moving to multiprocessing:
16 workers: 19.4 seconds
32 workers: 13.2 seconds
64 workers: 10.3 seconds
128 workers: 9.1 seconds
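For reference, switching between threads and processes can be sketched with concurrent.futures. This is a rough outline under stated assumptions, not the customer's code: process_object is a hypothetical placeholder for the per-object work, and run_all and its parameters are names made up for this example.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def process_object(name):
    """Placeholder for the per-object work (fetch metadata, download, update metadata)."""
    # Hypothetical stand-in: a real worker would likely create its own
    # storage.Client() inside the worker process and act on the named object.
    return len(name)


def run_all(names, workers=16, use_processes=True):
    """Run process_object over names with a process pool or a thread pool."""
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=workers) as executor:
        # chunksize batches items per worker round trip; thread pools ignore it.
        return list(executor.map(process_object, names, chunksize=8))
```

When use_processes is True, the call to run_all should sit under an if __name__ == "__main__": guard so that worker processes can import the module safely on platforms that spawn rather than fork.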
from python-storage.