Comments (5)
https://github.com/oittaa/gcp-storage-emulator#docker
The directory used for the emulated storage is located under /storage in the container. In the following example the host's directory $(pwd)/cloudstorage will be bound to the emulated storage.
from gcp-storage-emulator.
This directory is controlled by the service and is owned by root with root read/write permissions by default, since the Docker service also runs as root. Additionally, this approach does not work with memory-backed storage. What I would like is a user-owned directory, with user permissions, mounted into the container so that at launch the service imports all the data in it. Ideally the top-level directories of the import directory would be used as bucket names. For example, the following directory:
import-dir
|_bucket_a
  |_directory_a
  |_directory_b
  |_file_a
  |_file_b
|_bucket_b
  |_directory_c
  |_directory_d
  |_file_e
  |_file_f
should be loaded on startup, and the server should create or use the buckets bucket_a and bucket_b (in memory or on disk) and upload the corresponding files into the proper bucket.
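The proposed mapping from directory layout to buckets and objects can be sketched in isolation. This is a minimal sketch of the requested behavior, not part of the emulator; `map_import_dir` is a hypothetical helper:

```python
from pathlib import Path


def map_import_dir(root):
    """Map each top-level directory of `root` to a bucket name, and every
    file below it to an object path relative to its bucket."""
    mapping = {}
    for bucket_dir in sorted(Path(root).iterdir()):
        if not bucket_dir.is_dir():
            continue  # only top-level directories become buckets
        mapping[bucket_dir.name] = sorted(
            str(f.relative_to(bucket_dir))
            for f in bucket_dir.rglob('*') if f.is_file()
        )
    return mapping
```

For the tree above this would yield a dict with keys `bucket_a` and `bucket_b`, each listing its files by their bucket-relative paths (e.g. `directory_a/...`).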
Yeah, that sounds like a good idea. I don't have much time at the moment, but pull requests are welcome.
@MiltiadisKoutsokeras just FYI, https://github.com/fsouza/fake-gcs-server has the behavior you're after. For our use case we actually don't want that behavior and are trying to move to gcp-storage-emulator instead. But I figured I would drop a note in case you're still in need of it.
I have come up with a solution to the problem. Here it goes.
First I use Docker Compose to launch the container with these directives:
google_storage:
  image: oittaa/gcp-storage-emulator
  restart: unless-stopped
  ports:
    # Exposed on port 9023 of localhost
    - "127.0.0.1:9023:9023/tcp"
  environment:
    ####################################################################
    # Application environment variables
    PROJECT_ID: ${PROJECT_ID:-localtesting}
  entrypoint: /entrypoint.sh
  command: ["gcp-storage-emulator", "start",
            "--host=google_storage", "--port=9023", "--in-memory",
            "--default-bucket=${BUCKET_NAME:-localtesting_bucket}"]
  volumes:
    - ./tests/storage/entrypoint.sh:/entrypoint.sh:ro
    - ./tests/storage/docker_entrypoint_init.py:/docker_entrypoint_init.py:ro
    - ./tests/storage/buckets:/docker-entrypoint-init-storage:ro
As you can see, I pass the desired project name and bucket name via the environment variables PROJECT_ID and BUCKET_NAME. I override the entrypoint of the container with my own Bash/Python script combination, entrypoint.sh and docker_entrypoint_init.py. Here are their contents:
entrypoint.sh
#!/usr/bin/env bash
# Exit on any error
set -e
[ "${PROJECT_ID}" = "" ] && { echo "PROJECT_ID Environment Variable is not Set!"; exit 1; }
# Install Python requirements
pip install google-cloud-storage==1.31.2
# Execute the command line arguments in the background and save the process ID
"${@}" & PROCESSID=$!
# Wait for the process to start
while ! kill -0 "${PROCESSID}" >/dev/null 2>&1
do
    echo "Waiting for process to start..."
    sleep 1
done
echo "Process started, ID = ${PROCESSID}"
# Give the server a moment to start listening
sleep 2
# Point the Google Cloud client library at the emulator
export STORAGE_EMULATOR_HOST=http://google_storage:9023
# Import data into the buckets
echo "Importing data..."
python3 /docker_entrypoint_init.py
echo "DONE"
# Wait for the process to exit
wait "${PROCESSID}"
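The `kill -0` loop only verifies the process is alive, and the fixed `sleep 2` is a race with the server's startup. One alternative (a sketch, not part of the scripts above) is to poll the emulator's TCP port from Python before starting the import:

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0):
    """Return True once a TCP connection to host:port succeeds,
    or False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Succeeds only when something is listening on the port
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)
    return False
```

For example, calling `wait_for_port('google_storage', 9023)` at the top of docker_entrypoint_init.py would remove the need for the fixed sleep.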
docker_entrypoint_init.py
"""Initialize Google Storage data
"""
import logging
from os import scandir, environ
import sys
from google.auth.credentials import AnonymousCredentials
from google.cloud import storage
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
def upload_contents(client, directory, bucket_name=None):
"""Upload recursively contents of specified directory.
Args:
client (google.cloud.storage.Client): Google Storage Client.
directory (str): upload directory path.
bucket_name (str, optional): Bucket name to use for upload. Defaults to
None.
"""
for entry in scandir(directory):
print(entry.path)
if entry.is_dir():
if bucket_name is not None:
# This is a normal directory inside a bucket
upload_contents(client, directory + '/' +
entry.name, bucket_name)
else:
# This is a bucket directory
upload_contents(client, directory + '/' +
entry.name, entry.name)
elif entry.is_file():
if bucket_name is not None:
tokens = entry.path.split(bucket_name + '/')
bucket_obj = client.bucket(bucket_name)
if len(tokens) > 1:
gs_path = tokens[1]
blob_obj = bucket_obj.blob(gs_path)
blob_obj.upload_from_filename(entry.path)
PROJECT_ID = environ.get('PROJECT_ID')
if PROJECT_ID is None:
logger.error('Missing required Environment Variables! Please set \
PROJECT_ID')
sys.exit(1)
storage_client = storage.Client(credentials=AnonymousCredentials(),
project=PROJECT_ID)
# Scan import data directory
upload_contents(storage_client, '/docker-entrypoint-init-storage')
logger.info('Successfully imported bucket data!')
logger.info('List:')
for bucket in storage_client.list_buckets():
print(f'Bucket: {bucket}')
for blob in bucket.list_blobs():
print(f'|_Blob: {blob}')
# All OK
sys.exit(0)
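The object-path derivation inside upload_contents (splitting the file path on `bucket_name + '/'`) can be shown in isolation. `blob_path` below is a hypothetical helper added only to illustrate that token split:

```python
def blob_path(file_path, bucket_name):
    """Derive the object name from a file path by taking everything
    after the first occurrence of '<bucket_name>/'."""
    tokens = file_path.split(bucket_name + '/')
    # tokens[0] is the path prefix up to the bucket directory;
    # tokens[1] is the bucket-relative object path
    return tokens[1] if len(tokens) > 1 else None
```

For example, `blob_path('/docker-entrypoint-init-storage/bucket_a/directory_a/file_a', 'bucket_a')` returns `'directory_a/file_a'`. Note that a path containing the bucket name more than once would be cut at the first occurrence, which is a limitation of this split-based approach.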
I hope this is helpful.