Metadata is a service that updates our metadata storage. It relies on a couple of hosted services, but each of the following components should be scalable, fault tolerant, and locally testable.
The Git Version Control Service is any Git hosting service (such as GitHub, Bitbucket, GitLab, ...) that exposes a webhooks API. So far we only support GitHub webhooks.
The Webhook can be a Cloud Function or, more generally, any HTTP service that can handle Metadata Events. Its main responsibility is to receive each event and publish it to the PubSub Service.
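The receive-and-publish step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual handler: `X-Hub-Signature-256`, `X-GitHub-Event`, and `X-GitHub-Delivery` are the headers GitHub sends with webhook deliveries, while `handle_webhook`, the `publish` callable, the envelope layout, and the returned status codes are hypothetical names and choices made for this example.

```python
import hashlib
import hmac
import json
from typing import Callable, Mapping


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header (HMAC-SHA256 of the raw body)."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Compare as bytes in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))


def handle_webhook(
    headers: Mapping[str, str],
    body: bytes,
    secret: bytes,
    publish: Callable[[bytes], None],  # hypothetical: wraps the PubSub client
) -> int:
    """Validate one delivery and publish it; returns an HTTP status code.

    Assumes `headers` is keyed exactly as shown (no case normalization).
    """
    if not verify_signature(secret, body, headers.get("X-Hub-Signature-256", "")):
        return 401
    event_type = headers.get("X-GitHub-Event")
    if not event_type:
        return 400
    # Hypothetical envelope: the raw payload is forwarded untouched so the
    # Subscriber can do all decoding and metadata extraction.
    envelope = {
        "type": event_type,
        "delivery": headers.get("X-GitHub-Delivery"),
        "payload": body.decode("utf-8"),
    }
    publish(json.dumps(envelope).encode("utf-8"))
    return 202
```

Passing `publish` in as a plain callable keeps the handler locally testable: a test can hand it `list.append`, while a deployment would wrap whichever PubSub client is in use.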
The PubSub Service is a real-time messaging service. It supports both pull and push mechanisms for delivering messages.
The Subscriber can be a Cloud Function or an HTTP service (like the Webhook). It is triggered by the PubSub Service when a new event arrives (FIFO order is not guaranteed). Its main responsibility is to decode and deserialize the event, extract useful metadata, and optionally query the Git Version Control Service for more detailed metadata. Depending on its complexity, this optional step can be carried out either by an internal process or by another service. Finally, the Subscriber updates the Metadata Database. It may additionally back up events in the Raw Events Storage.
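Because FIFO order is not guaranteed, the update step has to tolerate events that arrive late. The sketch below shows one way to decode an event, extract metadata, and skip stale updates. It assumes a hypothetical message envelope with `type`, `received_at`, and `payload` fields, and uses a plain dictionary in place of the Metadata Database; `extract_metadata` and `apply_update` are illustrative names. Only `repository.full_name` and `repository.default_branch` come from GitHub's event payloads.

```python
import json
from typing import Any, Dict, Optional


def extract_metadata(message: bytes) -> Optional[Dict[str, Any]]:
    """Decode one message and pull out the repository metadata we care about."""
    envelope = json.loads(message.decode("utf-8"))
    payload = json.loads(envelope["payload"])  # raw webhook body as a JSON string
    repo = payload.get("repository")
    if repo is None:
        return None  # event carries no repository information
    return {
        "repository": repo["full_name"],
        "default_branch": repo.get("default_branch"),
        "event_type": envelope["type"],
        "received_at": envelope["received_at"],  # hypothetical ordering key
    }


def apply_update(store: Dict[str, Dict[str, Any]], metadata: Dict[str, Any]) -> bool:
    """Write metadata unless a newer (or equal) record is already stored.

    Returns True when the store was updated, False when the event was stale.
    """
    current = store.get(metadata["repository"])
    if current is not None and current["received_at"] >= metadata["received_at"]:
        return False
    store[metadata["repository"]] = metadata
    return True
```

Comparing an ordering key before writing makes the update idempotent as well: redelivering the same event leaves the stored record unchanged.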
The Metadata Database is a schema-based database where the metadata of all repositories is stored. It is the main source of data for the Metrics API.
The Raw Events Storage can be a distributed file system or any storage service where we can back up raw events (in case we want to re-publish them).
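As a rough illustration of backup and re-publishing, the sketch below appends each raw event to a local file and later replays the file through a publisher. The local file stands in for the storage service, and the one-JSON-record-per-line format with base64-encoded data, along with the names `backup_event`, `read_events`, and `republish`, are assumptions made for this example.

```python
import base64
import json
from pathlib import Path
from typing import Callable, Iterator


def backup_event(path: Path, message: bytes) -> None:
    """Append one raw event to the backup file as a single JSON line."""
    record = {"data": base64.b64encode(message).decode("ascii")}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_events(path: Path) -> Iterator[bytes]:
    """Yield raw events in the order they were backed up."""
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield base64.b64decode(json.loads(line)["data"])


def republish(path: Path, publish: Callable[[bytes], None]) -> int:
    """Send every backed-up event to `publish`; returns the number sent."""
    count = 0
    for message in read_events(path):
        publish(message)
        count += 1
    return count
```

Storing the bytes exactly as received means a replay goes through the same decoding path as a live event.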
The current implementation requires a running PostgreSQL database (see the docker-compose.yml file) with a pre-created schema (see the schema.sql file):
$ POSTGRES_DB=test POSTGRES_USER=user POSTGRES_PASSWORD=password docker-compose up --no-deps postgres
$ psql -h localhost -U user -W -d test < schema.sql
$ make test-all