crunchydata / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.

Home Page: https://access.crunchydata.com/documentation/postgres-operator/v5/

License: Apache License 2.0

Languages: Go 97.73%, Shell 1.41%, Makefile 0.83%, Dockerfile 0.03%
Topics: postgresql, kubernetes, operator, postgres, postgres-operator, high-availability, database, database-management, postgresql-clusters, disaster-recovery

postgres-operator's Issues

Considerations: multiple PVCs for non-shared external storage

@jmccormick2001
I got the postgres operator to work with Dell EMC's ScaleIO. Here are some considerations to help with scaling the operator for non-shared external storage like ScaleIO:

  • Create a PVC for main database data, mounted at /pgdata
  • Create a PVC for backup database data, mounted at /pgbackup
  • Create a PVC for replica pod data, mounted at /pgdata

This will help avoid conflicts when scheduling on multi-node Kubernetes clusters, where the main database pod and replica pods may get scheduled on the same instance.
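As a rough sketch, separate claims along these lines could back each mount (the names, sizes, and access modes below are illustrative, not taken from the repo):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mycluster-pgdata            # primary data, mounted at /pgdata
spec:
  accessModes: ["ReadWriteOnce"]    # non-shared storage such as ScaleIO
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mycluster-pgbackup          # backup data, mounted at /pgbackup
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mycluster-replica-pgdata    # replica data, mounted at /pgdata
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi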

Failover and recovery

Hi Guys!

I'm looking for information about failover and recovery of nodes in a Postgres cluster managed by your solution; sorry if it already exists and I missed it. Could you please advise?

Godep restore has errors when building from source?

Using Go 1.9.

[sarah@localhost postgres-operator]$ godep restore
godep: Dep (k8s.io/client-go/discovery) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/apps/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/authentication/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/authentication/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/authorization/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/authorization/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/autoscaling/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/autoscaling/v2alpha1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/batch/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/batch/v2alpha1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/certificates/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/core/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/extensions/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/policy/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/rbac/v1alpha1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/rbac/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/settings/v1alpha1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/storage/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/storage/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/plugin/pkg/client/auth/gcp) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/rest) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/tools/auth) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/tools/clientcmd) restored, but was unable to load it with error:
Package (context) not found
godep: Error checking some deps.

Missing "thirdpartyresource.extentions pg-clone.crunchydata.com not found" error when creating cluster

I pulled the latest source code, built locally, and deployed necessary resources successfully:

kubectl get deployment,po,configMap,thirdpartyresource
NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/postgres-operator   1         1         1            1           2m

NAME                                    READY     STATUS    RESTARTS   AGE
po/postgres-operator-2889167354-mjdtk   1/1       Running   0          2m

NAME               DATA      AGE
cm/operator-conf   6         1h

NAME                                             DESCRIPTION                             VERSION(S)
thirdpartyresources/pg-backup.crunchydata.com    A postgres backup ThirdPartyResource    v1
thirdpartyresources/pg-cluster.crunchydata.com   A postgres cluster ThirdPartyResource   v1
thirdpartyresources/pg-upgrade.crunchydata.com   A postgres upgrade ThirdPartyResource   v1

However, when I attempt to create a cluster, I get the following (this is true with other pgo commands):

pgo create cluster mycluster
DEBU[0000] kubeconfig path is /home/vladimir/admin.conf
DEBU[0000] namespace is default
DEBU[0000] ConnectToKube called
ERRO[0000] thirdpartyresources.extensions "pg-clone.crunchydata.com" not found
ERRO[0000] required pg-clone.crunchydata.com TPR was not found on your kube cluster

It is worth noting that this works OK when I use the pre-built version 1.2 binaries for postgres-operator and pgo.

Set VolumeMount to ReadOnly=true explicitly for DB replica pod for /pgdata

The documentation mentions that the replica pod for the database is read-only. However, inspecting the pod shows that the readOnly flag is set to false.

Volumes:
  pgdata:
    Type:	PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:	pgcluster-pvc
    ReadOnly:	false

Would it make sense to also set the volume bind-mount to readOnly for the pod container, or would that break normal DB operation?
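For reference, a pod-spec fragment with the flag set explicitly might look like the sketch below (the image tag and names are illustrative). Note that a streaming replica still writes WAL and temporary files under its data directory, so a fully read-only /pgdata would likely break normal operation, which is why the question matters:

containers:
- name: postgres
  image: crunchydata/crunchy-postgres:centos7-9.6-1.4.1   # illustrative tag
  volumeMounts:
  - name: pgdata
    mountPath: /pgdata
    readOnly: true          # the flag in question, set explicitly
volumes:
- name: pgdata
  persistentVolumeClaim:
    claimName: pgcluster-pvc
    readOnly: true          # PVC-level counterpart of the same flag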

Why does pgo decide on node placement?

I'm curious about the motivation for fixed placement of Postgres instances. It feels like this would decrease robustness by preventing a pod and its PVC from moving to a different node in the event of node failure.

refactor user command

Instead of various flags, use a more consistent subcommand syntax:

pgo create user ....
pgo delete user ...
etc.

repeat backups error

Make sure that repeating a backup does not produce an error; in some situations it currently does.

Clarify openshift requirements

The requirements section says that OpenShift Origin 1.5.1+ and OpenShift Container Platform 3.5 are required, but does not explain why; the term 'openshift' does not appear in the rest of the README.md.

Is it still required?

version filter

Applying the version filter to show cluster does not work.

pgo show cluster all --version=9.6.3

I am running K8s 1.6.1 with pgo version 1.3.2 in a custom namespace.

load - 2nd time throws error

When running load twice, an error is produced and the load job does not run; the load needs to clean up existing jobs before being re-run.

The run.sh script, in the build setup, creates 60 PVs

The examples/operator/run.sh script calls create-pv.sh, which in turn creates 60 PVs with:

for i in {1..60}
do
    echo "creating PV crunchy-pv$i"
    export COUNTER=$i
    kubectl --namespace=$NAMESPACE delete pv crunchy-pv$i
    envsubst < $DIR/crunchy-pv.json | kubectl --namespace=$NAMESPACE create -f -
done

Most of them remain unbound (as shown below). What is the reason for that setup?

crunchy-pv54   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv55   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv56   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv57   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv58   1Gi        RWX           Retain          Bound       default/crunchy-pvc                            6m
crunchy-pv59   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv6    1Gi        RWX           Retain          Available                                                  6m
crunchy-pv60   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv7    1Gi        RWX           Retain          Available                                                  6m

thirdpartyresources.extensions "pg-clone.crunchydata.com" not found

Result of the following command (or any other pgo command) with a Kubernetes cluster on CoreOS:

pgo show cluster all

Is:
ERRO[0000] thirdpartyresources.extensions "pg-clone.crunchydata.com" not found
ERRO[0000] required pg-clone.crunchydata.com TPR was not found on your kube cluster

And indeed, no pg-clone.crunchydata.com TPR is created by the scripts in the repo, as shown below.

kubectl get thirdpartyresources
NAME                         DESCRIPTION                             VERSION(S)
pg-backup.crunchydata.com    A postgres backup ThirdPartyResource    v1
pg-cluster.crunchydata.com   A postgres cluster ThirdPartyResource   v1
pg-upgrade.crunchydata.com   A postgres upgrade ThirdPartyResource   v1

Can't find any info on where to get the pg-clone TPR or how to create it.
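For anyone else hitting this: a definition modeled on the three TPRs that do get created would presumably look like the sketch below (the description text is an assumption), though it is unclear whether the operator expects anything more:

apiVersion: extensions/v1beta1
kind: ThirdPartyResource
metadata:
  name: pg-clone.crunchydata.com   # same naming pattern as the existing TPRs
description: A postgres clone ThirdPartyResource
versions:
- name: v1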

re-implement clone operation

The clone operation was removed in v2.0.0; it needs to be re-implemented due to various issues with the previous implementation.

policy apply - allow database choice

Specify which database the policy gets applied to. It appears to be applied to the postgres database by default; we should allow the PG_DATABASE setting to be chosen for the policy as well.
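Since .pgo.yaml already carries a PG_DATABASE setting under CLUSTER (see the config reproduced later on this page), one option would be to honor that setting, or a per-policy override, when applying a policy. A hypothetical fragment:

CLUSTER:
  PG_DATABASE: mydb   # hypothetical: policy SQL would run against this
                      # database instead of defaulting to "postgres"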

Clarify that configuration of pgo cli is required, not optional

Now that I have things installed, here's my history trying to create a cluster:

$ which pgo
/usr/local/bin/pgo
$ pgo create cluster my-service
ERRO[0000] --kubeconfig flag is not set and required
$ pgo
The pgo command line interface lets you
create and manage PostgreSQL clusters.

Usage:
  pgo [command]

Available Commands:
  apply       apply a Policy
  backup      perform a Backup
  clone       perform a clone
  create      Create a Cluster or Policy
  delete      Delete a policy, database, cluster, backup, or upgrade
  scale       Scale a Cluster
  show        show a description of a cluster
  test        test a Cluster
  upgrade     perform an upgrade

Flags:
      --config string       config file (default is $HOME/.pgo.yaml)
      --debug               enable debug with true
      --kubeconfig string   kube config file
      --namespace string    kube namespace to work in (default is default)
      --selector string     label selector string
  -t, --toggle              Help message for toggle

Use "pgo [command] --help" for more information about a command.
$ ls ~/.kube/config 
/Users/jar349/.kube/config
$ pgo --kubeconfig ~/.kube/config create cluster my-service
ERRO[0000] --namespace flag is not set and required
$ pgo --kubeconfig ~/.kube/config --namespace default create cluster my-service
ERRO[0000] invalid MASTER_STORAGE.PVC_ACCESS_MODE specified

My thought was that I'd just be able to type out the commands the way you do in your documentation. The documentation says "You can configure both the client and the operator", but the truth is (unless I've made another mistake, which is entirely possible) that you MUST configure the CLI before you can follow along with your documentation.

I recommend clarifying that.

Also, it's common for tools to place their configuration files in $HOME/.tool/config, but that's not in the list of places pgo looks for its config. I'd prefer not to put pgo.yaml directly in $HOME. Would it be possible to add $HOME/.pgo/ to the list of places pgo looks for config files?

Use K8s ConfigMap instead of PVC for /data

Similarly to using Secrets (#6), it seems that using ConfigMaps instead of a hostPath PVC would make more sense for storing the operator configuration and templates. Or is there a reason why ConfigMaps wouldn't work here?
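Given that the kubectl output earlier on this page already shows a cm/operator-conf ConfigMap, this seems plausible. A minimal sketch, with assumed key and file names, of shipping the configuration as a ConfigMap and mounting it into the operator pod:

apiVersion: v1
kind: ConfigMap
metadata:
  name: operator-conf
data:
  pgo.yaml: |
    # operator configuration and templates would go here
---
# fragment of the operator pod spec that consumes it:
volumes:
- name: operator-conf
  configMap:
    name: operator-conf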

support data deletion

When deleting a cluster, add a flag to support data deletion; currently, data is NOT deleted when the cluster is deleted.

Use Kubernetes secrets to store credentials

Currently all credentials seem to be stored in the TPRs. Is it planned to switch this to using Kubernetes secrets? This way the operator could generate credentials while deploying a new database or cluster and store them in secrets.
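A sketch of what a generated per-cluster credentials Secret might look like; the name, keys, and values below are assumptions, and the operator would fill in generated values at deploy time:

apiVersion: v1
kind: Secret
metadata:
  name: mycluster-pg-credentials   # assumed naming convention
type: Opaque
stringData:                        # convenience field; the API server
  PG_USER: admin                   # stores these base64-encoded in data
  PG_PASSWORD: change-me           # would be generated, not hard-coded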

seg fault - pgo show backup all scenario

When a backup pod is no longer there, the pgo show backup all command will segfault. Recreate by creating a backup, then removing the pod with kubectl, then running pgo show backup all.

Databases and clusters should be created in the same namespace as the TPR

I have two clusters defined as TPRs. One in the default namespace and one in the pgtest namespace.

$ kubectl get pgcluster --all-namespaces
NAMESPACE   NAME               KIND
default     pg-orange-jaguar   PgCluster.v1.crunchydata.com
pgtest      pg-pink-whale      PgCluster.v1.crunchydata.com

However, all deployments are created in the default namespace.

$ kubectl get deploy --all-namespaces
NAMESPACE         NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default           pg-orange-jaguar              1         1         1            0           35m
default           pg-orange-jaguar-replica      2         2         2            0           35m
default           pg-pink-whale                 1         1         1            0           15m
default           pg-pink-whale-replica         2         2         2            0           15m

It's not clear to me what the chosen model is. Do I need one operator per pgcluster, one operator per k8s namespace, or one operator per k8s cluster? Ideally, I think the operator should only need to be deployed once per k8s cluster, and then I can create multiple pgclusters in multiple namespaces.

Does crunchydata/postgres-operator:centos7-1.5.1 create the correct thirdpartyresources?

The postgres-operator logs say that it has created:

  • pg-cluster.crunchydata.com
  • pg-backup
  • pg-upgrade
  • pg-policy
  • pg-clone
  • pg-policy-log

but then it goes on to say that it can't find the requested resources (they don't have dashes in their names):

E0920 18:20:55.418172       1 reflector.go:201] github.com/crunchydata/postgres-operator/operator/backup/backup.go:100: Failed to list *tpr.PgBackup: the server could not find the requested resource (get pgbackups.crunchydata.com)
E0920 18:20:55.418308       1 reflector.go:201] github.com/crunchydata/postgres-operator/operator/cluster/cluster.go:120: Failed to list *tpr.PgCluster: the server could not find the requested resource (get pgclusters.crunchydata.com)
E0920 18:20:55.425411       1 reflector.go:201] github.com/crunchydata/postgres-operator/operator/cluster/clone.go:73: Failed to list *tpr.PgClone: the server could not find the requested resource (get pgclones.crunchydata.com)
E0920 18:20:55.443308       1 reflector.go:201] github.com/crunchydata/postgres-operator/operator/cluster/policies.go:163: Failed to list *tpr.PgPolicylog: the server could not find the requested resource (get pgpolicylogs.crunchydata.com)
E0920 18:20:55.443419       1 reflector.go:201] github.com/crunchydata/postgres-operator/operator/upgrade/upgrade.go:73: Failed to list *tpr.PgUpgrade: the server could not find the requested resource (get pgupgrades.crunchydata.com)

The examples/tpr folder has a pg-database TPR that isn't created by default by the postgres-operator container. Should it be? Or is it meant for creating a single database instance for testing purposes?

30 minutes after deploying the postgres-operator, I see the following in the logs:

Sep 20 14:57:02 postgres-operator-3770555364-5n3sl postgres-operator error time="2017-09-20T18:57:02Z" level=error msg="error in major upgrade watch closed before Until timeout" 
Sep 20 14:57:59 postgres-operator-3770555364-5n3sl postgres-operator error time="2017-09-20T18:57:59Z" level=error msg="error in ProcessJobs watch closed before Until timeout" 
Sep 20 15:10:23 postgres-operator-3770555364-5n3sl postgres-operator error time="2017-09-20T19:10:23Z" level=error msg="error in ProcessPolicies watch closed before Until timeout" 
Sep 20 15:19:03 postgres-operator-3770555364-5n3sl postgres-operator error time="2017-09-20T19:19:03Z" level=error msg="erro in clone complete watch closed before Until timeout" 

Replace TPR with CRD

Kubernetes 1.7 has deprecated TPRs in favor of CRDs.
Any plans to move to CRD soon?
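For reference, the CRD equivalent of the pg-cluster TPR under the apiextensions.k8s.io/v1beta1 API introduced in Kubernetes 1.7 would look roughly like this (group and kind names inferred from the TPR names above):

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: pgclusters.crunchydata.com   # must be <plural>.<group>
spec:
  group: crunchydata.com
  version: v1
  scope: Namespaced
  names:
    plural: pgclusters
    singular: pgcluster
    kind: PgCluster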

support image prefix other than default

Right now, crunchydata is assumed as the image prefix, and using a remote registry does not work with this approach. Allow CCP_IMAGE_PREFIX to be specified in the config.
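A hypothetical .pgo.yaml fragment; where exactly the key would live is an assumption:

CLUSTER:
  CCP_IMAGE_PREFIX: registry.example.com/crunchydata   # instead of the
  CCP_IMAGE_TAG: centos7-9.6-1.4.1                     # hard-coded default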

Getting started script will not work if /data already exists

When the /data directory already exists and the run.sh script is executed, it skips creating the crunchy-pvc resource. This in turn causes pgo create cluster testcluster to fail when attempting to attach /pgdata:

kubectl get pods
NAME                                   READY     STATUS              RESTARTS   AGE
postgres-operator-3650826749-59nxb     1/1       Running             0          2h
testcluster-4275003772-ddfwn           0/1       ContainerCreating   0          2h
testcluster-replica-4058347393-88mhd   0/1       ContainerCreating   0          2h
testcluster-replica-4058347393-q19v2   0/1       ContainerCreating   0          2h
FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  2h		32s		23	kubelet, minikube			Warning		FailedMount	Unable to mount volumes for pod "testcluster-4275003772-ddfwn_default(d2a2f3a7-4d68-11e7-a09f-080027d45d33)": timeout expired waiting for volumes to attach/mount for pod "default"/"testcluster-4275003772-ddfwn". list of unattached/unmounted volumes=[pgdata]

Based on kubelet logs, the crunchy-pvc is not found:

Jun 10 02:02:05 minikube localkube[3372]: E0610 02:02:05.042059    3372 desired_state_of_world_populator.go:259] Error processing volume "pgdata" for pod "testcluster-replica-4058347393-88mhd_default(d2a6ecc5-4d68-11e7-a09f-080027d45d33)": error processing PVC "default"/"crunchy-pvc": failed to fetch PVC default/crunchy-pvc from API server. err=persistentvolumeclaims "crunchy-pvc" not found
Jun 10 02:02:08 minikube localkube[3372]: E0610 02:02:08.062501    3372 desired_state_of_world_populator.go:259] Error processing volume "pgdata" for pod "testcluster-4275003772-ddfwn_default(d2a2f3a7-4d68-11e7-a09f-080027d45d33)": error processing PVC "default"/"crunchy-pvc": failed to fetch PVC default/crunchy-pvc from API server. err=persistentvolumeclaims "crunchy-pvc" not found

Clarify how to deploy the operator to Kubernetes

Only at the very bottom of the README.md is there mention of the postgres-operator container that executes in Kubernetes via a Deployment (its Docker image on Docker Hub is linked).

I was expecting a yaml file that contained a kubernetes Deployment referencing the postgres-operator container, but I cannot find it.

Would you please consider creating a yaml file so that I can deploy the postgres-operator via kubectl create -f postgres-operator.yaml?
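Something along these lines is what I had in mind; this is only a sketch, and the image tag and config volume are assumptions that would need to match the repo:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: postgres-operator
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: postgres-operator
    spec:
      containers:
      - name: postgres-operator
        image: crunchydata/postgres-operator:centos7-1.5.2   # assumed tag
        volumeMounts:
        - name: operator-conf
          mountPath: /operator-conf
      volumes:
      - name: operator-conf
        configMap:
          name: operator-conf

Then kubectl create -f postgres-operator.yaml would be all that's needed.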

Allow postgres extensions

Users should be able to install and enable extensions such as PostGIS within the Postgres instances that the operator creates.

Clarify usage on GCE

According to Readme:

Openshift Origin 1.5.1+ or Openshift Container Platform 3.5

...it is required to have Kubernetes installed with OpenShift. Does that mean that postgres-operator will not work on Google Cloud Platform (specifically Google Container Engine)? Shouldn't the operator be cloud-provider agnostic?

created clusters do not have roles/passwords specified in .pgo.yaml

Here's my .pgo.yaml (located in ~/.pgo/yaml):

KUBECONFIG:  /Users/jar349/.kube/config
CLUSTER:
  CCP_IMAGE_TAG:  centos7-9.6-1.4.1
  PORT:  5432
  PG_MASTER_USER: admin
  PG_MASTER_PASSWORD:  password
  PG_USER:  admin
  PG_PASSWORD:  password
  PG_DATABASE:  test1
  PG_ROOT_PASSWORD:  password
  STRATEGY:  1
  REPLICAS:  1
  PASSWORD_AGE_DAYS:  3650
MASTER_STORAGE:
  STORAGE_CLASS:  gp2
  PVC_ACCESS_MODE:  ReadWriteOnce
  PVC_SIZE:  1Gi
  STORAGE_TYPE:  dynamic
  FSGROUP:  26
REPLICA_STORAGE:
  STORAGE_CLASS:  gp2
  PVC_ACCESS_MODE:  ReadWriteOnce
  PVC_SIZE:  1Gi
  STORAGE_TYPE:  dynamic
  FSGROUP:  26
BACKUP_STORAGE:
  STORAGE_CLASS:  gp2
  PVC_ACCESS_MODE:  ReadWriteOnce
  PVC_SIZE:  5Gi
  STORAGE_TYPE:  dynamic
  FSGROUP:  26
PGO:
  LSPVC_TEMPLATE:  /Users/jar349/.pgo.lspvc-template.json
  CO_IMAGE_TAG:  centos7-1.5.2
  DEBUG:  true

I create a test1 cluster:

$ pgo create cluster --namespace default test1
DEBU[0000] kubeconfig path is /Users/jar349/.kube/config 
DEBU[0000] namespace is                                 
DEBU[0000] ConnectToKube called                         
DEBU[0000] connected to kube. at /Users/jar349/.kube/config 
DEBU[0000] create cluster called                        
DEBU[0000] no policies are specified                    
DEBU[0000] create cluster called for test1              
DEBU[0000] pgcluster test1 not found so we will create it 
created PgCluster test1

Then I port-forward the pod...

$ kl port-forward test1-1519893712-t1421 5432
Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432

I load up pgAdmin and can't connect. On a hunch, I try a password-less login as postgres, and I'm in, and I see:

[screenshot from 2017-09-22 omitted]

Have I done something wrong? Why isn't there an admin role with the password "password"?

Deployment docs

The operator and its client deployment docs are a bit unclear.

major upgrade to pg10 not working

Until the pg10 container supports pgaudit, major upgrades require users to manually edit postgresql.conf and remove the pg_audit.so reference before the upgrade will work. When pgaudit is available for pg10, this problem will be resolved. This affects postgres-operator v2.0.0.
