
Comments (5)

oittaa commented on May 23, 2024

https://github.com/oittaa/gcp-storage-emulator#docker

The directory used for the emulated storage is located under /storage in the container. In the following example the host's directory $(pwd)/cloudstorage will be bound to the emulated storage.

from gcp-storage-emulator.
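The bind-mount invocation from that README looks roughly like this (flags taken from the linked docs; verify against the current README before relying on them):

```shell
# Persist emulated storage on the host: the container keeps its state under
# /storage, so binding a host directory there survives container restarts.
docker run -d \
    -e PORT=9023 \
    -p 9023:9023 \
    -v "$(pwd)/cloudstorage":/storage \
    oittaa/gcp-storage-emulator
```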

MiltiadisKoutsokeras commented on May 23, 2024

This is a directory controlled by the service and is root-owned read/write by default, since the Docker service also runs as root. Additionally, this cannot apply to memory-backed storage. What I would like is a user directory, with user permissions, mounted into the container so that at launch it imports all the data in it. Ideally, the top-level directories of the import directory should be used as bucket names. For example, the following directory:

import-dir
|_bucket_a
  |_directory_a
  |_directory_b
    |_file_a
    |_file_b
|_bucket_b
  |_directory_c
  |_directory_d
    |_file_e
    |_file_f

should be loaded on startup: the server should create (or reuse) the buckets bucket_a and bucket_b, in memory or on disk, and upload the corresponding files into the proper bucket.

from gcp-storage-emulator.
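The bucket-naming convention described above (top-level directories become buckets, everything below becomes object names relative to the bucket directory) can be sketched like this; the miniature tree built here is hypothetical, mirroring the layout in the comment:

```python
import pathlib
import tempfile

# Build a miniature import tree like the one above (hypothetical layout).
root = pathlib.Path(tempfile.mkdtemp())
(root / "bucket_a" / "directory_b").mkdir(parents=True)
(root / "bucket_a" / "directory_b" / "file_a").write_text("a")
(root / "bucket_b" / "directory_d").mkdir(parents=True)
(root / "bucket_b" / "directory_d" / "file_e").write_text("e")

# Each top-level directory becomes a bucket; each file below it becomes an
# object whose name is the file's path relative to the bucket directory.
plan = {
    bucket_dir.name: sorted(
        str(f.relative_to(bucket_dir))
        for f in bucket_dir.rglob("*") if f.is_file()
    )
    for bucket_dir in sorted(root.iterdir())
    if bucket_dir.is_dir()
}
print(plan)
# {'bucket_a': ['directory_b/file_a'], 'bucket_b': ['directory_d/file_e']}
```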

oittaa commented on May 23, 2024

Yeah, that sounds like a good idea. I don't have much time at the moment, but pull requests are welcome.


mike-marcacci commented on May 23, 2024

@MiltiadisKoutsokeras just FYI https://github.com/fsouza/fake-gcs-server has the behavior you're after.

For our use case we actually don't want that behavior and are trying to move to gcp-storage-emulator instead. But I figured I would drop a note in case you're still in need of it.

from gcp-storage-emulator.
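For reference, fake-gcs-server's preload behavior mentioned above looks roughly like this (port and flags per its README; verify against the current docs):

```shell
# fake-gcs-server seeds its state from /data: each top-level directory
# under /data is treated as a bucket, and the files below it are preloaded
# as objects -- the behavior requested in this issue.
docker run -d \
    -p 4443:4443 \
    -v "$(pwd)/import-dir:/data" \
    fsouza/fake-gcs-server -scheme http
```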

MiltiadisKoutsokeras commented on May 23, 2024

I have come up with a solution to the problem. Here it is.

First I use Docker Compose to launch the container with these directives:

google_storage:
        image: oittaa/gcp-storage-emulator
        restart: unless-stopped
        ports:
            # Exposed on port 9023 of localhost
            - "127.0.0.1:9023:9023/tcp"
        environment:
            ####################################################################
            # Application environment variables
            PROJECT_ID: ${PROJECT_ID:-localtesting}
        entrypoint: /entrypoint.sh
        command: ["gcp-storage-emulator", "start",
            "--host=google_storage", "--port=9023", "--in-memory",
            "--default-bucket=${BUCKET_NAME:-localtesting_bucket}" ]
        volumes:
            - ./tests/storage/entrypoint.sh:/entrypoint.sh:ro
            - ./tests/storage/docker_entrypoint_init.py:/docker_entrypoint_init.py:ro
            - ./tests/storage/buckets:/docker-entrypoint-init-storage:ro

As you can see, I pass the desired project name and bucket name via the environment variables PROJECT_ID and BUCKET_NAME.
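The `${PROJECT_ID:-localtesting}` syntax in the Compose file is standard parameter expansion, which Docker Compose interpolates the same way the shell does: the default applies when the variable is unset or empty.

```shell
#!/usr/bin/env bash

# ${VAR:-default} substitutes the default when VAR is unset OR empty;
# ${VAR-default} (no colon) substitutes only when VAR is unset.
unset PROJECT_ID
echo "${PROJECT_ID:-localtesting}"    # -> localtesting

PROJECT_ID=""
echo "${PROJECT_ID:-localtesting}"    # -> localtesting

PROJECT_ID="myproject"
echo "${PROJECT_ID:-localtesting}"    # -> myproject
```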
I override the container's entrypoint with my own Bash/Python script combination, entrypoint.sh and docker_entrypoint_init.py. Here are their contents:

entrypoint.sh

#!/usr/bin/env bash

# Exit on any error
set -e

[ "${PROJECT_ID}" = "" ] && { echo "PROJECT_ID Environment Variable is not Set!"; exit 1; }

# Install Python requirements
pip install google-cloud-storage==1.31.2

# Execute command line arguments in background and save process ID
"${@}" & PROCESSID=$!

# Wait for the process to start
while ! kill -0 "${PROCESSID}" >/dev/null 2>&1
do
    echo "Waiting for process to start..."
    sleep 1
done
echo "Process started, ID = ${PROCESSID}"
sleep 2

# Cloud Emulators
export STORAGE_EMULATOR_HOST=http://google_storage:9023

# Import data to bucket
echo "Importing data..."
python3 /docker_entrypoint_init.py
echo "DONE"

# Wait for the process to exit
wait "${PROCESSID}"
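One caveat with the script above: `kill -0` only proves the PID exists, which is true immediately after the `&`, so the fixed `sleep 2` is doing the real waiting. A sketch of a readiness check that polls the TCP socket instead (bash-only, using `/dev/tcp`; host and port as in the Compose file):

```shell
#!/usr/bin/env bash

# Poll until a TCP endpoint accepts connections. Unlike kill -0, this
# checks that the server is actually listening, not just that the
# process exists. /dev/tcp is a bash feature, not POSIX sh.
wait_for_port() {
    local host="$1" port="$2" retries="${3:-30}"
    while ! (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
        retries=$((retries - 1))
        if [ "${retries}" -le 0 ]; then
            return 1
        fi
        echo "Waiting for ${host}:${port}..."
        sleep 1
    done
}

# In the entrypoint this would replace the kill -0 loop and the sleep:
# wait_for_port google_storage 9023 || { echo "Emulator never came up"; exit 1; }
```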

docker_entrypoint_init.py

"""Initialize Google Storage data
"""

import logging
from os import scandir, environ
import sys
from google.auth.credentials import AnonymousCredentials
from google.cloud import storage

# Without a configured handler, logger.info/debug output below is swallowed
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

def upload_contents(client, directory, bucket_name=None):
    """Recursively upload the contents of the specified directory.

    Args:
        client (google.cloud.storage.Client): Google Storage Client.
        directory (str): upload directory path.
        bucket_name (str, optional): Bucket name to use for upload. Defaults to
        None.
    """
    for entry in scandir(directory):
        print(entry.path)
        if entry.is_dir():
            if bucket_name is not None:
                # This is a normal directory inside a bucket
                upload_contents(client, directory + '/' +
                                entry.name, bucket_name)
            else:
                # This is a bucket directory: create the bucket if it does
                # not exist yet (only the default bucket is pre-created)
                if client.lookup_bucket(entry.name) is None:
                    client.create_bucket(entry.name)
                upload_contents(client, directory + '/' +
                                entry.name, entry.name)
        elif entry.is_file():
            if bucket_name is not None:
                tokens = entry.path.split(bucket_name + '/')
                bucket_obj = client.bucket(bucket_name)
                if len(tokens) > 1:
                    gs_path = tokens[1]
                    blob_obj = bucket_obj.blob(gs_path)
                    blob_obj.upload_from_filename(entry.path)

PROJECT_ID = environ.get('PROJECT_ID')
if PROJECT_ID is None:
    logger.error('Missing required Environment Variables! Please set \
PROJECT_ID')
    sys.exit(1)

storage_client = storage.Client(credentials=AnonymousCredentials(),
                                project=PROJECT_ID)

# Scan import data directory
upload_contents(storage_client, '/docker-entrypoint-init-storage')

logger.info('Successfully imported bucket data!')
logger.info('List:')
for bucket in storage_client.list_buckets():
    print(f'Bucket: {bucket}')
    for blob in bucket.list_blobs():
        print(f'|_Blob: {blob}')

# All OK
sys.exit(0)

I hope this is helpful.

