zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes

Home Page: https://postgres-operator.readthedocs.io/

License: MIT License

kubernetes postgresql operator golang postgres cluster managed-services data-infrastructure postgres-operator database-as-a-service

postgres-operator's Introduction

Postgres Operator

The Postgres Operator delivers easy-to-run, highly available PostgreSQL clusters on Kubernetes (K8s), powered by Patroni. It is configured only through Postgres manifests (CRDs), which eases integration into automated CI/CD pipelines with no direct access to the Kubernetes API, promoting infrastructure as code over manual operations.

Operator features

Rolling updates on Postgres cluster changes, incl. quick minor version updates
Live volume resize without pod restarts (AWS EBS, PVC)
Database connection pooling with PGBouncer
Supports fast in-place major version upgrades, including a global upgrade of all clusters
Restore and cloning Postgres clusters on AWS, GCS and Azure
Additionally logical backups to S3 or GCS bucket can be configured
Standby cluster from S3 or GCS WAL archive
Configurable for non-cloud environments
Basic credential and user management on K8s, eases application deployments
Support for custom TLS certificates
UI to create and edit Postgres cluster manifests
Compatible with OpenShift

PostgreSQL features

Supports PostgreSQL 11 through 16
Streaming replication cluster via Patroni
Point-In-Time-Recovery with pg_basebackup / WAL-E via Spilo
Preload libraries: bg_mon, pg_stat_statements, pgextwlist, pg_auth_mon
Incl. popular Postgres extensions such as decoderbufs, hypopg, pg_cron, pg_partman, pg_stat_kcache, pgq, pgvector, plpgsql_check, postgis, set_user and timescaledb

The Postgres Operator has been developed at Zalando and has been used in production for over five years.

Supported Postgres & K8s versions

Release	Postgres versions	K8s versions	Golang
v1.12.2	11 → 16	1.27+	1.22.3
v1.11.0	11 → 16	1.27+	1.21.7
v1.10.1	10 → 15	1.21+	1.19.8
v1.9.0	10 → 15	1.21+	1.18.9
v1.8.2	9.5 → 14	1.20 → 1.24	1.17.4

Getting started

For a quick first impression follow the instructions of this tutorial.

Supported setups of Postgres and Applications

Documentation

There is a browser-friendly version of this documentation at postgres-operator.readthedocs.io

postgres-operator's Issues

Improve bootstrap: don't wait for all replicas until users are created

The first pod, the master, becomes available quickly, but the operator currently continues to wait until the remaining pods in the statefulset are also healthy and have the replica label attached. This adds wait time for the user and creates a window in which the database is available but no users for employees or applications exist yet. The operator should therefore start role and database creation as soon as the first pod with the master label is available.

Add validation of the allowedSourceRanges list

Currently, if we have one or more invalid IP ranges in the allowedSourceRanges list, we get an error while creating/updating services:
spec.LoadBalancerSourceRanges: Invalid value: "[10.0.0.1 10.0.0.2/32 10.0.0.3/32]": must be a list of IP ranges. For example, 10.240.0.0/24,10.250.0.0/24
In the example above, 10.0.0.1 has an invalid format because it lacks the prefix length (e.g. /32).
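
A minimal sketch of such a validation, assuming the list arrives as a slice of strings. The function name invalidSourceRanges is hypothetical and not part of the operator; the check itself relies only on net.ParseCIDR from the Go standard library.

```go
package main

import (
	"fmt"
	"net"
)

// invalidSourceRanges returns every entry that is not a valid CIDR block.
// Hypothetical helper for illustration; not taken from the operator code.
func invalidSourceRanges(ranges []string) []string {
	var invalid []string
	for _, r := range ranges {
		// ParseCIDR rejects bare addresses such as "10.0.0.1".
		if _, _, err := net.ParseCIDR(r); err != nil {
			invalid = append(invalid, r)
		}
	}
	return invalid
}

func main() {
	ranges := []string{"10.0.0.1", "10.0.0.2/32", "10.0.0.3/32"}
	if bad := invalidSourceRanges(ranges); len(bad) > 0 {
		fmt.Printf("invalid allowedSourceRanges entries: %v\n", bad)
	}
}
```

Running the check before the service is created or updated would let the operator report the offending entries by name instead of surfacing the generic Kubernetes error.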

How to use the PostgreSQL cluster in Minikube

The README instructions to install the postgres-operator in Minikube now work (after #11 was merged), but how do I use the running cluster (login)?

$ kubectl get pods --show-labels
NAME                                 READY     STATUS    RESTARTS   AGE       LABELS
acid-testcluster-0                   1/1       Running   0          8m        application=spilo,spilo-role=master,version=acid-testcluster
acid-testcluster-1                   1/1       Running   0          4m        application=spilo,spilo-role=replica,version=acid-testcluster
etcd0                                1/1       Running   0          10m       app=etcd,etcd_node=etcd0
etcd1                                1/1       Running   0          10m       app=etcd,etcd_node=etcd1
etcd2                                1/1       Running   0          10m       app=etcd,etcd_node=etcd2
fake-teams-api-3898472216-g4pfb      1/1       Running   0          10m       name=fake-teams-api,pod-template-hash=3898472216
postgres-operator-2288360344-z9cdf   1/1       Running   0          9m        name=postgres-operator,pod-template-hash=2288360344

kubectl port-forward acid-testcluster-0 5432

I would expect something to work like:

psql -h localhost -U admin
...

Maybe we can just provide OAuth login with another fake service (would be cool to demonstrate the OAuth pam module)?

Offer more options for database in the manifest

More than current #databases: name->owner, which is also confusing if you look at it alone:
https://github.com/zalando-incubator/postgres-operator/blob/master/manifests/minimal-postgres-manifest.yaml#L19

An object with properties [owner, encoding, template, tablespace, ...] should be supported. Potentially, the full list from the CREATE DATABASE statement: https://www.postgresql.org/docs/current/static/sql-createdatabase.html

EBS resize failed when IAM role not present and ends up in questionable state.

If the operator runs without the IAM role, or without sufficient privileges to resize the volume through the AWS API, the statefulset is still updated to reflect the desired size while the volume stays at 10GB.

This should lead either to continuous attempts to resize the existing EBS volume (which may be less nice)

or to an unsuccessful sync of manifest and statefulset, with repeated error messages/logs stating that credentials/privileges are missing.

Expose log messages per cluster via API

The API should allow access to a limited set of log lines per cluster, so that a UI for the operator can be built on top of it.

This could be some fixed size list per cluster to keep last ~ 100 lines.

Lines should have time, level, message to highlight errors later.
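
One way to keep such a fixed-size list is a ring buffer per cluster. The sketch below is only an illustration under that assumption; the type and method names (LogEntry, RingLog, Add, Entries) are hypothetical and do not come from the operator. The size passed to NewRingLog must be positive.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// LogEntry carries the three fields the issue asks for.
type LogEntry struct {
	Time    time.Time
	Level   string
	Message string
}

// RingLog keeps only the most recent len(entries) log lines.
type RingLog struct {
	mu      sync.Mutex
	entries []LogEntry
	next    int  // index of the slot that will be overwritten next
	full    bool // true once the buffer has wrapped around
}

// NewRingLog creates a buffer for size entries; size must be > 0.
func NewRingLog(size int) *RingLog {
	return &RingLog{entries: make([]LogEntry, size)}
}

// Add stores an entry, overwriting the oldest one when the buffer is full.
func (r *RingLog) Add(e LogEntry) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.entries[r.next] = e
	r.next = (r.next + 1) % len(r.entries)
	if r.next == 0 {
		r.full = true
	}
}

// Entries returns a copy of the stored lines, oldest first.
func (r *RingLog) Entries() []LogEntry {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.full {
		return append([]LogEntry(nil), r.entries[:r.next]...)
	}
	out := make([]LogEntry, 0, len(r.entries))
	out = append(out, r.entries[r.next:]...)
	return append(out, r.entries[:r.next]...)
}

func main() {
	logs := NewRingLog(100)
	logs.Add(LogEntry{Time: time.Now(), Level: "error", Message: "could not sync cluster"})
	for _, e := range logs.Entries() {
		fmt.Printf("%s [%s] %s\n", e.Time.Format(time.RFC3339), e.Level, e.Message)
	}
}
```

An API endpoint could then serialize the result of Entries for the requested cluster, and a UI could use the Level field to highlight errors.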

Add an example clone option to the manifest

Link Josh's KubeCon talk "Kube-native Postgres"

How to run PostgreSQL on Kubernetes with Zalando's Patroni / PostgreSQL Operator: presentation + demo.

https://www.youtube.com/watch?v=Zn1vd7sQ_bc (36 minutes)

We should link this in the README.

No support for volumes less than 1GB

The operator uses gigabytes internally to evaluate the disk space when resizing volumes, so the cluster is always created with at least 1Gi volume, however, cluster updates compare the existing volume size with the actual volume size in the manifest and if the latter is less than 1Gi they fail.

This is clearly visible since our test manifest uses 512Mi volumes.

Consider adding comment to operator-created database objects

Such as roles, dbs(?) and whatever else it might be creating. For roles a comment like "Created by postgres-operator: DO NOT ALTER, CHANGES WILL BE REVERTED AUTOMATICALLY". Maybe add the timestamp of last sync in the comment as well.

Log that operator is waiting for master to become healthy/available

Due to a misconfiguration, the wrong label indicated that the master pod was there, but from the logs it was hard to figure out that the operator was waiting for another label to be present.

Log every 30 or 60 seconds that the worker is still waiting for a pod with the "spilo_role_label" label.

Make "superuser" for team members configurable

Right now we grant superuser to team members.

Whether superuser is actually granted on newly created clusters should be configurable in the config map, as it depends on the environment the databases/clusters are considered to be in.

Create the replica service

For those use cases when one needs to access a replica. One needs to define it in the PostgreSQL manifest as well.

Travis CI + Coverage and badges

Configure Travis CI, coverage and add badges to README.

Sync/Recreate Infrastructure Roles on operator startup

It would be very helpful, for the most basic rotation or in case of mistakes in the infrastructure role secret, if we created all roles again on startup. This should include altering existing infrastructure roles and changing their passwords.

On startup, change the password of existing infrastructure roles to what is currently in the secret.

Rethink how we distribute work across workers.

Right now we use hashing, which leads to unused workers and an uneven distribution.

A quick fix would be to distribute clusters equally across all workers by assigning them in a round-robin fashion and storing which cluster goes to which worker when it is first discovered.
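
The round-robin idea could be sketched as follows. This is an illustration only: the Assigner type and its methods are hypothetical, the worker count must be positive, and the sketch is not safe for concurrent use without additional locking.

```go
package main

import "fmt"

// Assigner hands out workers in round-robin order and remembers
// the worker chosen for each cluster when it was first seen.
type Assigner struct {
	workers  int
	next     int
	assigned map[string]int
}

// NewAssigner creates an assigner for the given number of workers (> 0).
func NewAssigner(workers int) *Assigner {
	return &Assigner{workers: workers, assigned: make(map[string]int)}
}

// WorkerFor returns the stored worker for a known cluster, or assigns
// the next worker in round-robin order to a newly discovered one.
func (a *Assigner) WorkerFor(cluster string) int {
	if w, ok := a.assigned[cluster]; ok {
		return w
	}
	w := a.next
	a.assigned[cluster] = w
	a.next = (a.next + 1) % a.workers
	return w
}

func main() {
	a := NewAssigner(2)
	for _, c := range []string{"acid-a", "acid-b", "acid-c", "acid-a"} {
		fmt.Printf("%s -> worker %d\n", c, a.WorkerFor(c))
	}
}
```

Compared with hashing the cluster name, this keeps the load even for any set of names, at the cost of holding the assignment map in memory.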

Had this discussion once before with @alexeyklyukin

Do not attempt to compare current state with the new one on a cluster with UpdateFailed status

If one provides an invalid pg manifest to the operator during an update, it will result in 2 failures: the first one when trying to generate Kubernetes resources from that manifest, and another one on the next update (which usually tries to fix the problem), since the operator will generate the old resources first in order to compare them with the new ones. Since we know that the old resource generation has already failed (due to the UpdateFailed status for the cluster), we can skip this step and just sync the newly generated spec with what we get directly from Kubernetes.

Manual failover: improve health check of replica

We observed the operator terminating a master pod once a replica pod reached the running state, but as the replica pod had a significant WAL backlog, it was not promotable by Patroni. This led to downtime for as long as it took to reschedule the other pod and promote that pod to master again.

The health check basically also needs to verify that the WAL delay on the replica is within the max replication lag.
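
A rough sketch of such a check, assuming the operator can obtain each replica's replication lag in bytes (how it would do so is left open here). The Replica type, its fields and the promotionCandidate function are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Replica is a hypothetical view of a replica pod's state.
type Replica struct {
	Name     string
	Running  bool
	LagBytes uint64 // WAL lag behind the master, in bytes
}

// promotionCandidate returns the running replica with the smallest lag
// that does not exceed maxLagBytes, or false if none qualifies.
func promotionCandidate(replicas []Replica, maxLagBytes uint64) (Replica, bool) {
	var best Replica
	found := false
	for _, r := range replicas {
		if !r.Running || r.LagBytes > maxLagBytes {
			continue // not healthy enough to take over
		}
		if !found || r.LagBytes < best.LagBytes {
			best, found = r, true
		}
	}
	return best, found
}

func main() {
	replicas := []Replica{
		{Name: "acid-testcluster-1", Running: true, LagBytes: 512 << 20},
	}
	if _, ok := promotionCandidate(replicas, 1<<20); !ok {
		fmt.Println("no promotable replica yet; keep the master pod running")
	}
}
```

With a check like this, the operator would terminate the master pod only after a candidate has been found, rather than as soon as any replica pod is running.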

Operator connection leak

Apparently after the 415a7fd when creating users too many idle connections are established per each cluster, each of them corresponding to a single user and doing:

BEGIN ALTER ROLE "vhengst" RESET ALL;ALTER ROLE "vhengst" SET log_statement TO 'all'; END;

It seems that the changes actually triggered the old bug of establishing too many connections (it could be that the underlying library is not limiting itself to only one), but the issue was dormant until another bug appeared, caused by the aforementioned request, which changes log_statement every time the sync happens.

Operator should warn if secrets are empty

The operator should log a warning if the secrets it reads appear to be empty. Rationale: a user may inadvertently write a .yaml file with broken syntax so that the file looks meaningful to the user but empty to the operator.

Merge the create and the sync code for the cluster

Right now, Create is special because it fails when objects it tries to create are discovered as already existing during the call of the create{Object} (i.e. StatefulSet, Service, etc.). Subsequently, the next sync will find the cluster in a half-created state and sync all definitions from the cluster manifest with the actual state in Kubernetes. This actually results in Create being equivalent to a Sync at the end of the day, but introduces a delay of one ResyncPeriod before such a cluster can be used.

Given that, I don't see any strong reasons for not replacing Create with Sync altogether for the cluster. In fact, we already run sync every time for the secrets and user roles in the DB, not failing when those already exist.

Local fake teams api

In order to make it easy to test and develop the Postgres operator, it would be great to have a teams API that runs locally in Minikube (and is obviously fake).

Add a "databases" endpoint to the operator REST API

The endpoint should return a {clusterName : listOfDatabases} map like in {"acid-demo-01": ["database01", "database02"], ...}

Employee role creation with toggleable settings: log_statement and superuser

The operator should have two flags in the config map that specify how employee roles are created:

toggle if users are created as a superuser
toggle if users are created with log_statement = all

We can for now assume that all users are created within the "admin" role.

With superuser disabled for users and statement logging enabled, this gives us proper statement logging.

Operator should not re-write password already set by the infrastructure roles

A role with the same name may be defined simultaneously in infrastructure-roles.yaml and postgres-manifest.yaml. In this case the operator currently overwrites the password defined in infrastructure-roles.yaml with a random password.

Google Cloud Storage Credentials

After experimenting with the Patroni helm chart, I've decided to go with the postgres-operator instead.

The Patroni helm chart allows Gcloud credentials to be stored as a secret, which is then mounted as a volume referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable. It looks like the operator makes no such provision. What would be the suggested way to inject the Google Cloud credentials for WAL-E backups into the Spilo pods using the operator?

OAuth token should go to the watched namespace

Currently the OAuth token seems to be in the wrong namespace.

Support usage of another Docker image for "new" clusters.

This could be useful to support easier testing and to roll out new features with less impact.

The operator would pick the new Docker image for newly created clusters but would not touch existing statefulsets if their image is either the "new" or the "current" Docker image.

Docs Request: document updating a manifest

The current documentation doesn't explain how one updates a manifest for an existing cluster.

I'd write that myself, except that I've been unable to figure out how to do it with trial & error.

Support for multiple namespaces

My expectation was that the Postgres operator will watch and create Postgres instances in multiple namespaces. So far I only got it working for the namespace where the operator is running. Is this by design or did I miss some settings?

Thanks for your help.

Release first version as Docker image on registry.opensource.zalan.do

I think the best approach would be to configure the Continuous Delivery Platform (delivery.yaml) to build and push Docker images to registry.opensource.zalan.do/acid/postgres-operator:GIT_TAG. Releases should be tagged in git and release notes should be available on https://github.com/zalando-incubator/postgres-operator/releases .

Service updates do not change the metadata

Right now we have a bunch of clusters that transition from the old zalando.org/dnsname annotation to the new external-dns.alpha.kubernetes.io/hostname one. The sync code correctly detects that and calls patch on those services, but the patch does nothing, because it only changes the spec, and the annotation is in the metadata part of the service.

Discussion notes on: 1 Statefulset per node

Yesterday evening we found out that we arrived at a similar need for different reasons:

1) Controlled scale down to not terminate the master node during scale down
2) Allow different configuration per replica/node, e.g. slaves with fewer resources

As a remark: the statefulset should scale down in a deterministic fashion, so the pod with the highest number will be terminated (removed).

For 1) I also don't feel this warrants a rewrite to multiple statefulsets, as failing over should be a no-brainer, combined with how rarely people probably scale up and down. Maybe Patroni can even prefer to fail over to the lowest-numbered pod.

Empty loadBalancerSourceRanges allows LB access from everywhere

If the user omits the allowedSourceRanges field in the PostgreSQL manifest, the service generated from that spec lacks the loadBalancerSourceRanges restrictions. As a result, no restriction is applied to network access and the service is accessible from everywhere.

The most straightforward solution is to set the loadBalancerSourceRanges to 127.0.0.1/32 if allowedSourceRanges are empty, however, that would make the target load balancer unusable.
I'd propose to consider allowedSourceRanges a mandatory field in the manifest. One can always emulate "free for all" with something like 0.0.0.0/0.

consider switching from glide to dep package manager

Make container name static

Resize EBS volumes

This involves 3 steps:

resize the EBS volume itself
resize the filesystem on a pod (implement exec interface)
replace the statefulset in order to apply the changes for new pods.

Script and document how to start the custom operator locally

The information is already in the README, but it is spread across several sections and contains some shell magic.

Check if deleted kubernetes object is actually deleted

We need to be sure that deleted kubernetes objects no longer exist in the cluster.

Delete kubernetes resources of the Clusters in the failed states

We need to clean up kubernetes resources after failed (and maybe deleted) clusters.

The idea is to get the list of Kubernetes resources (Pods, Secrets, etc.) using cluster_labels as a selector and match each resource to the corresponding cluster (using cluster_name_label). If the cluster is in a failed status (ClusterStatusUpdateFailed, ClusterStatusAddFailed), we can safely delete the resources associated with that cluster.

Add short CLI help on Postgres operator/customResource

For example, in the format similar to kubectl explain

Replace TPR with CRD

Kubernetes 1.7 has deprecated TPRs in favor of CRDs. See blog post and release notes.

https://coreos.com/blog/custom-resource-kubernetes-v17

Fix the `useLoadBalancer` in the complete Postgres Manifest

Do not delete "postgres" credentials for cluster on delete

Make it configurable that the postgres credentials secret is not deleted when cluster is deleted.

This is purely a workaround until the flow of delete and restore is working nicely. Unfortunately until then deleting the postgres secret breaks the usability of a restored cluster as the postgres password is restored from backup.

An empty master pod errors out the whole sync

If there is no master pod in the cluster, the operator fails during the sync with:

time="2017-07-12T07:35:37Z" level=info msg="Recreating master pod '/'" cluster-name=team-example pkg=cluster worker=2
time="2017-07-12T07:35:37Z" level=error msg="could not sync cluster 'default/team-example': could not sync statefulsets: could not recreate pods: could not recreate master pod '/': could not delete pod: resource name may not be empty" pkg=controller worker=2

and doesn't proceed further to re-creating roles and persistent volumes. Factors like a wrong Docker image, misconfigured Etcd and so on may result in an empty master pod, leading to issues with this cluster even after the initial problem has been fixed.

Make the rule that checks if the name equals "{team}-.*" configurable
Make it configurable how to generate the name of the statefulset
Make it configurable how to generate the name of the service

make docker won't work (CGO_ENABLED should be 0)

The Docker image generated with make docker won't work as it is based on alpine:

make docker
echo '{\n "url": "git:git@github.com:zalando-incubator/postgres-operator.git",\n "revision": "f635fac",\n "author": "hjacobs",\n "status": ""\n}' > scm-source.json
mkdir -p docker/build/
cp build/linux/postgres-operator scm-source.json docker/build/
cd docker && docker build --rm -t "pierone.example.com/acid/postgres-operator:f635fac" .
Sending build context to Docker daemon  49.23MB
Step 1 : FROM alpine
 ---> 02674b9cb179
Step 2 : MAINTAINER Team ACID @ Zalando <[email protected]>
 ---> Using cache
 ---> 7b9b7535d5a5
Step 3 : RUN apk --no-cache add ca-certificates
 ---> Using cache
 ---> 9b87d980f206
Step 4 : COPY build/* /
 ---> de51ed941534
Removing intermediate container 8b9ef9592485
Step 5 : ENTRYPOINT /postgres-operator
 ---> Running in 2b9976b1d602
 ---> 1dcb57fa2d2f
Removing intermediate container 2b9976b1d602
Successfully built 1dcb57fa2d2f

Docker container cannot start:

docker run -it pierone.example.com/acid/postgres-operator:f635fac
docker: Error response from daemon: Container command '/postgres-operator' not found or does not exist.

The required libraries do not exist:

/ # ldd /postgres-operator 
	/lib64/ld-linux-x86-64.so.2 (0x55a7b360a000)
	libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x55a7b360a000)
	libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x55a7b360a000)
/ # ls -la /lib64
ls: /lib64: No such file or directory