bucket-pull

A small CLI command that downloads a bucket directory to a local path.
It aims to mimic gsutil cp -r when copying from a bucket to a local path.

# download whole bucket to a local dir
bucket-pull gs://mybucketname ./mybucketname

# download a directory and all its contents to a local dir
bucket-pull gs://mybucketname/mydir ./
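
To illustrate how a gs:// URL and a destination could map to local file paths, here is a minimal sketch. The function names split_gs_url and local_path_for are hypothetical and are not taken from bucket-pull's source. The mapping for a directory prefix follows the paths printed in the timing runs further down (gs://.../mydir copied to /tmp/ gives /tmp/mydir/...); placing objects directly under the destination when the whole bucket is requested is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def split_gs_url(url):
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {url!r}")
    return parsed.netloc, parsed.path.strip("/")


def local_path_for(object_name, prefix, dest):
    """Map an object name to a local path, keeping the last prefix component."""
    # Drop everything above the requested directory: for prefix 'x/mydir'
    # the parent is 'x', for prefix 'mydir' (or no prefix) it is ''.
    parent = prefix.rpartition("/")[0]
    relative = object_name[len(parent):].lstrip("/")
    return Path(dest) / relative


bucket, prefix = split_gs_url("gs://mybucketname/mydir")
print(bucket, prefix)
print(local_path_for("mydir/a/1.txt", prefix, "/tmp/"))
```

Keeping the last component of the prefix mirrors what gsutil cp -r does when the destination directory already exists.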

Get it

pip install bucket-pull

https://pypi.org/project/bucket-pull/

Run it

# as a CLI
$ bucket-pull gs://mybucketname ./mybucketname

# as a module
$ python -m bucket_pull gs://mybucketname ./mybucketname 

Auth & Permissions

The utility makes use of the Google SDK and uses Client-Provided Authentication.

To run with an exported service account (SA) key, set the GOOGLE_APPLICATION_CREDENTIALS environment variable:

GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-credentials.json  bucket-pull gs://bucket/mydir /tmp/some/path

The account you are connecting with needs at least the storage.buckets.get permission on the bucket, which can be granted with the roles/storage.legacyBucketReader role:

gsutil iam ch serviceAccount:[email protected]:legacyBucketReader  gs://bucket
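
When using an exported key, it can help to confirm which account the key belongs to before granting the role. The sketch below reads the key file named by GOOGLE_APPLICATION_CREDENTIALS and returns its client_email field. The helper name service_account_email is hypothetical and not part of bucket-pull; the sketch assumes the standard JSON key format, in which "type" is "service_account".

```python
import json
import os


def service_account_email(environ=os.environ):
    """Return client_email from the key file named by GOOGLE_APPLICATION_CREDENTIALS."""
    key_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    with open(key_path, encoding="utf-8") as handle:
        key = json.load(handle)
    # Exported service account keys identify themselves with this type field.
    if key.get("type") != "service_account":
        raise RuntimeError(f"{key_path} is not a service account key")
    return key["client_email"]
```

The returned address is the one to put after serviceAccount: in the gsutil iam ch command.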

Some notable differences

gsutil has somewhat weird behaviour when the destination path doesn't exist:

$ gsutil cp -r  gs://smoss-tech-test-bucket/mydir/ ./doesnotexist/actually/
$ echo $?
0
$ ls ./doesnotexist
ls: cannot access './doesnotexist': No such file or directory

It does not create the destination path (not that weird),
but it doesn't complain either and exits with 0.

Here bucket-pull diverges and throws an error instead.
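
A fail-fast check of this kind can be sketched in a few lines. This is only an illustration: the function name require_existing_dir and the choice of NotADirectoryError are assumptions, and bucket-pull's actual error may differ.

```python
from pathlib import Path


def require_existing_dir(dest):
    """Raise instead of silently doing nothing when the destination is missing."""
    path = Path(dest)
    if not path.is_dir():
        raise NotADirectoryError(
            f"destination does not exist or is not a directory: {path}"
        )
    return path
```

Running the check before any download starts means a mistyped destination fails immediately with a non-zero exit status, rather than after the transfer appears to succeed.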

Notes on multi-processing

The -m flag enables multi-processing.

Here bucket-pull has opted for threading rather than separate processes.
So it is not true parallelism, and we only ever make use of one CPU.
However, since we are mostly IO bound (disk and network), there is
still some gain to be had from multiple threads waiting on IO.
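
The effect of threads on IO-bound work can be seen in a small self-contained sketch. It uses concurrent.futures.ThreadPoolExecutor, which may differ from how bucket-pull itself uses threading, and fake_download is a hypothetical stand-in that sleeps instead of touching the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name, seconds=0.2):
    """Stand-in for a blocking download; sleeping releases the GIL like real IO."""
    time.sleep(seconds)
    return name


def download_all(names, workers=1):
    """Run the downloads on a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_download, names))


names = ["32mb.file", "64mb.file", "128mb.file", "a/1.txt"]
for workers in (1, 4):
    start = time.perf_counter()
    download_all(names, workers=workers)
    print(f"{workers} worker(s): {time.perf_counter() - start:.2f}s")
```

With one worker the waits add up; with four they overlap, so the total is close to the longest single wait even though only one CPU is in use.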

A "very" scientific comparison:

# with multithreading
time ./bucket-pull.py gs://smoss-tech-test-bucket/mydir /tmp/ -m
...
Downloading to /tmp/mydir/32mb.file
Downloading to /tmp/mydir/128mb.file
Downloading to /tmp/mydir/64mb.file
Downloading to /tmp/mydir/a/1.txt
Downloading to /tmp/mydir/a/b/2.txt
./bucket-pull.py gs://smoss-tech-test-bucket/mydir /tmp/ -m  5.86s user 5.24s system 22% cpu 50.149 total

# single thread
time ./bucket-pull.py gs://smoss-tech-test-bucket/mydir /tmp/ 
...
Downloading to /tmp/mydir/128mb.file
Downloading to /tmp/mydir/32mb.file
Downloading to /tmp/mydir/64mb.file
Downloading to /tmp/mydir/a/1.txt
Downloading to /tmp/mydir/a/b/2.txt
./bucket-pull.py gs://smoss-tech-test-bucket/mydir /tmp/  4.80s user 4.38s system 12% cpu 1:13.83 total
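
To put a number on the difference, the wall-clock totals from the two runs above can be compared directly. This is plain arithmetic on the reported times; each configuration was run only once, so the result is indicative rather than a benchmark.

```python
multi_threaded_s = 50.149           # "50.149 total" from the run with -m
single_threaded_s = 1 * 60 + 13.83  # "1:13.83 total" from the run without -m

speedup = single_threaded_s / multi_threaded_s
saved_s = single_threaded_s - multi_threaded_s

print(f"speedup: {speedup:.2f}x, saved: {saved_s:.1f}s")
# prints "speedup: 1.47x, saved: 23.7s"
```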

Contributors

semos7