crunchydata / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.

Home Page: https://access.crunchydata.com/documentation/postgres-operator/v5/

License: Apache License 2.0

Languages: Go 97.73%, Shell 1.41%, Makefile 0.83%, Dockerfile 0.03%
Topics: postgresql, kubernetes, operator, postgres, postgres-operator, high-availability, database, database-management, postgresql-clusters, disaster-recovery

postgres-operator's Issues

Considerations: multiple PVCs for non-shared external storage

@jmccormick2001
I got the postgres operator to work with Dell EMC's ScaleIO. Here are some considerations to help with scaling the operator for non-shared external storage like ScaleIO:

  • Create a PVC for main database data, mounted at /pgdata
  • Create a PVC for backup database data, mounted at /pgbackup
  • Create a PVC for replica pod data, mounted at /pgdata

This will help avoid conflicts when scheduling on multi-node Kubernetes clusters, where the main database pod and replica pods may get scheduled on the same instance.
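As a rough sketch, separate claims along these lines could back each mount (the names, sizes, and access modes below are illustrative, not taken from the repo):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mycluster-pgdata            # primary data, mounted at /pgdata
spec:
  accessModes: ["ReadWriteOnce"]    # non-shared storage such as ScaleIO
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mycluster-pgbackup          # backup data, mounted at /pgbackup
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mycluster-replica-pgdata    # replica data, mounted at /pgdata
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi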

Failover and recovery

Hi Guys!

I'm looking for information about failover and recovery of nodes in a Postgres cluster managed by your solution; sorry if it already exists and I missed it. Could you please advise?

Godep restore has errors when building from source?

Using Go 1.9.

[sarah@localhost postgres-operator]$ godep restore
godep: Dep (k8s.io/client-go/discovery) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/apps/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/authentication/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/authentication/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/authorization/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/authorization/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/autoscaling/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/autoscaling/v2alpha1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/batch/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/batch/v2alpha1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/certificates/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/core/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/extensions/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/policy/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/rbac/v1alpha1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/rbac/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/settings/v1alpha1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/storage/v1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/kubernetes/typed/storage/v1beta1) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/plugin/pkg/client/auth/gcp) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/rest) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/tools/auth) restored, but was unable to load it with error:
Package (context) not found
godep: Dep (k8s.io/client-go/tools/clientcmd) restored, but was unable to load it with error:
Package (context) not found
godep: Error checking some deps.

Missing "thirdpartyresource.extentions pg-clone.crunchydata.com not found" error when creating cluster

I pulled the latest source code, built locally, and deployed necessary resources successfully:

kubectl get deployment,po,configMap,thirdpartyresource
NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/postgres-operator   1         1         1            1           2m

NAME                                    READY     STATUS    RESTARTS   AGE
po/postgres-operator-2889167354-mjdtk   1/1       Running   0          2m

NAME               DATA      AGE
cm/operator-conf   6         1h

NAME                                             DESCRIPTION                             VERSION(S)
thirdpartyresources/pg-backup.crunchydata.com    A postgres backup ThirdPartyResource    v1
thirdpartyresources/pg-cluster.crunchydata.com   A postgres cluster ThirdPartyResource   v1
thirdpartyresources/pg-upgrade.crunchydata.com   A postgres upgrade ThirdPartyResource   v1

However, when I attempt to create a cluster, I get the following (this is true with other pgo commands):

pgo create cluster mycluster
DEBU[0000] kubeconfig path is /home/vladimir/admin.conf
DEBU[0000] namespace is default
DEBU[0000] ConnectToKube called
ERRO[0000] thirdpartyresources.extensions "pg-clone.crunchydata.com" not found
ERRO[0000] required pg-clone.crunchydata.com TPR was not found on your kube cluster

It is worth noting that this works OK when I use the pre-built version 1.2 binaries for postgres-operator and pgo.

Set VolumeMount to ReadOnly=true explicitly for DB replica pod for /pgdata

The documentation mentions that the replica pod for the database is read-only. However, inspecting the pod shows that the readOnly flag is set to false.

Volumes:
  pgdata:
    Type:	PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:	pgcluster-pvc
    ReadOnly:	false

Would it make sense to also set the volume bind-mount to readOnly for the pod container, or would that break normal DB operation?
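For reference, a pod-spec fragment with the flag set explicitly might look like the sketch below (the image tag and names are illustrative). Note that a streaming replica still writes WAL and temporary files under its data directory, so a fully read-only /pgdata would likely break normal operation, which is why the question matters:

containers:
- name: postgres
  image: crunchydata/crunchy-postgres:centos7-9.6-1.4.1   # illustrative tag
  volumeMounts:
  - name: pgdata
    mountPath: /pgdata
    readOnly: true          # the flag in question, set explicitly
volumes:
- name: pgdata
  persistentVolumeClaim:
    claimName: pgcluster-pvc
    readOnly: true          # PVC-level counterpart of the same flag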

Why does pgo decide on node placement?

I'm curious about the motivation for fixed placement of Postgres instances. It feels like this would decrease robustness by preventing a pod and its PVC from moving to a different node in the event of node failure.

refactor user command

Instead of various flags, use a more consistent subcommand syntax:

pgo create user ....
pgo delete user ...
etc.

repeat backups error

Make sure that repeating a backup does not produce an error; in some situations it currently does.

Clarify openshift requirements

The requirements section says that OpenShift Origin 1.5.1+ and OpenShift Container Platform 3.5 are required, but does not explain why; the term 'openshift' does not appear in the rest of the README.md.

Is it still required?

version filter

Applying the version filter to show cluster does not work.

pgo show cluster all --version=9.6.3

I am running K8s 1.6.1 with pgo version 1.3.2 in a custom namespace.

load - 2nd time throws error

When running load twice, an error is produced and the load job does not run; the load needs to clean up existing jobs before being re-run.

The run.sh script, in the build setup, creates 60 PVs

The examples/operator/run.sh script calls create-pv.sh, which in turn creates 60 PVs with:

for i in {1..60}
do
    echo "creating PV crunchy-pv$i"
    export COUNTER=$i
    kubectl --namespace=$NAMESPACE delete pv crunchy-pv$i
    envsubst < $DIR/crunchy-pv.json | kubectl --namespace=$NAMESPACE create -f -
done

Most of them remain unbound (as shown below). What is the reason for that setup?

crunchy-pv54   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv55   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv56   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv57   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv58   1Gi        RWX           Retain          Bound       default/crunchy-pvc                            6m
crunchy-pv59   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv6    1Gi        RWX           Retain          Available                                                  6m
crunchy-pv60   1Gi        RWX           Retain          Available                                                  6m
crunchy-pv7    1Gi        RWX           Retain          Available                                                  6m

thirdpartyresources.extensions "pg-clone.crunchydata.com" not found

Result of the following command (or any other pgo command) with a Kubernetes cluster on CoreOS:

pgo show cluster all

Is:
ERRO[0000] thirdpartyresources.extensions "pg-clone.crunchydata.com" not found
ERRO[0000] required pg-clone.crunchydata.com TPR was not found on your kube cluster

And indeed, no pg-clone.crunchydata.com TPR is created by the scripts in the repo, as shown below.

kubectl get thirdpartyresources
NAME                         DESCRIPTION                             VERSION(S)
pg-backup.crunchydata.com    A postgres backup ThirdPartyResource    v1
pg-cluster.crunchydata.com   A postgres cluster ThirdPartyResource   v1
pg-upgrade.crunchydata.com   A postgres upgrade ThirdPartyResource   v1

Can't find any info on where to get the pg-clone TPR or how to create it.
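For anyone else hitting this: a definition modeled on the three TPRs that do get created would presumably look like the sketch below (the description text is an assumption), though it is unclear whether the operator expects anything more:

apiVersion: extensions/v1beta1
kind: ThirdPartyResource
metadata:
  name: pg-clone.crunchydata.com   # same naming pattern as the existing TPRs
description: A postgres clone ThirdPartyResource
versions:
- name: v1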

re-implement clone operation

The clone operation was removed in v2.0.0; it needs to be re-implemented due to various issues with the previous implementation.

policy apply - allow database choice

Specify which database the policy gets applied to. It appears to be applied to the postgres database by default; we should allow the PG_DATABASE setting to be chosen for the policy as well.
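Since .pgo.yaml already carries a PG_DATABASE setting under CLUSTER (see the config reproduced later on this page), one option would be to honor that setting, or a per-policy override, when applying a policy. A hypothetical fragment:

CLUSTER:
  PG_DATABASE: mydb   # hypothetical: policy SQL would run against this
                      # database instead of defaulting to "postgres"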

Clarify that configuration of pgo cli is required, not optional

Now that I have things installed, here's my history trying to create a cluster:

$ which pgo
/usr/local/bin/pgo
$ pgo create cluster my-service
ERRO[0000] --kubeconfig flag is not set and required
$ pgo
The pgo command line interface lets you
create and manage PostgreSQL clusters.

Usage:
  pgo [command]

Available Commands:
  apply       apply a Policy
  backup      perform a Backup
  clone       perform a clone
  create      Create a Cluster or Policy
  delete      Delete a policy, database, cluster, backup, or upgrade
  scale       Scale a Cluster
  show        show a description of a cluster
  test        test a Cluster
  upgrade     perform an upgrade

Flags:
      --config string       config file (default is $HOME/.pgo.yaml)
      --debug               enable debug with true
      --kubeconfig string   kube config file
      --namespace string    kube namespace to work in (default is default)
      --selector string     label selector string
  -t, --toggle              Help message for toggle

Use "pgo [command] --help" for more information about a command.
$ ls ~/.kube/config 
/Users/jar349/.kube/config
$ pgo --kubeconfig ~/.kube/config create cluster my-service
ERRO[0000] --namespace flag is not set and required
$ pgo --kubeconfig ~/.kube/config --namespace default create cluster my-service
ERRO[0000] invalid MASTER_STORAGE.PVC_ACCESS_MODE specified

My thought was that I'd just be able to type out the commands the way you do in your documentation. The documentation says "You can configure both the client and the operator", but the truth is (unless I've made another mistake, which is entirely possible) that you MUST configure the CLI before you can follow along with your documentation.

I recommend clarifying that.

Also, it's common for tools to place their configuration files in $HOME/.tool/config, but that's not in the list of places pgo looks for its config. I'd prefer not to put pgo.yaml directly in $HOME. Would it be possible to add $HOME/.pgo/ to the list of places pgo looks for config files?

Use K8s ConfigMap instead of PVC for /data

Similarly to using Secrets (#6), it seems that using ConfigMaps instead of a hostPath PVC would make more sense for storing the operator configuration and templates. Or is there a reason why ConfigMaps wouldn't work here?
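Given that the kubectl output earlier on this page already shows a cm/operator-conf ConfigMap, this seems plausible. A minimal sketch, with assumed key and file names, of shipping the configuration as a ConfigMap and mounting it into the operator pod:

apiVersion: v1
kind: ConfigMap
metadata:
  name: operator-conf
data:
  pgo.yaml: |
    # operator configuration and templates would go here
---
# fragment of the operator pod spec that consumes it:
volumes:
- name: operator-conf
  configMap:
    name: operator-conf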

support data deletion

When deleting a cluster, add a flag to support data deletion; currently, data is NOT deleted when the cluster is deleted.

Use Kubernetes secrets to store credentials

Currently all credentials seem to be stored in the TPRs. Is it planned to switch this to using Kubernetes secrets? This way the operator could generate credentials while deploying a new database or cluster and store them in secrets.
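A sketch of what a generated per-cluster credentials Secret might look like; the name, keys, and values below are assumptions, and the operator would fill in generated values at deploy time:

apiVersion: v1
kind: Secret
metadata:
  name: mycluster-pg-credentials   # assumed naming convention
type: Opaque
stringData:                        # convenience field; the API server
  PG_USER: admin                   # stores these base64-encoded in data
  PG_PASSWORD: change-me           # would be generated, not hard-coded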

seg fault - pgo show backup all scenario

When a backup pod is no longer there, the pgo show backup all command will segfault. Recreate by creating a backup, then removing the pod with kubectl, then running pgo show backup all.

Databases and clusters should be created in the same namespace as the TPR

I have two clusters defined as TPRs. One in the default namespace and one in the pgtest namespace.

$ kubectl get pgcluster --all-namespaces
NAMESPACE   NAME               KIND
default     pg-orange-jaguar   PgCluster.v1.crunchydata.com
pgtest      pg-pink-whale      PgCluster.v1.crunchydata.com

However, all deployments are created in the default namespace.

$ kubectl get deploy --all-namespaces
NAMESPACE         NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default           pg-orange-jaguar              1         1         1            0           35m
default           pg-orange-jaguar-replica      2         2         2            0           35m
default           pg-pink-whale                 1         1         1            0           15m
default           pg-pink-whale-replica         2         2         2            0           15m

It's not clear to me what the chosen model is. Do I need one operator per pgcluster, one operator per k8s namespace, or one operator per k8s cluster? Ideally, I think the operator should only need to be deployed once per k8s cluster, and then I can create multiple pgclusters in multiple namespaces.

Does crunchydata/postgres-operator:centos7-1.5.1 create the correct thirdpartyresources?

The postgres-operator logs say that it has created:

  • pg-cluster.crunchydata.com
  • pg-backup
  • pg-upgrade
  • pg-policy
  • pg-clone
  • pg-policy-log

but then it goes on to say that it can't find the requested resources (they don't have dashes in their names):

E0920 18:20:55.418172       1 reflector.go:201] github.com/crunchydata/postgres-operator/operator/backup/backup.go:100: Failed to list *tpr.PgBackup: the server could not find the requested resource (get pgbackups.crunchydata.com)
E0920 18:20:55.418308       1 reflector.go:201] github.com/crunchydata/postgres-operator/operator/cluster/cluster.go:120: Failed to list *tpr.PgCluster: the server could not find the requested resource (get pgclusters.crunchydata.com)
E0920 18:20:55.425411       1 reflector.go:201] github.com/crunchydata/postgres-operator/operator/cluster/clone.go:73: Failed to list *tpr.PgClone: the server could not find the requested resource (get pgclones.crunchydata.com)
E0920 18:20:55.443308       1 reflector.go:201] github.com/crunchydata/postgres-operator/operator/cluster/policies.go:163: Failed to list *tpr.PgPolicylog: the server could not find the requested resource (get pgpolicylogs.crunchydata.com)
E0920 18:20:55.443419       1 reflector.go:201] github.com/crunchydata/postgres-operator/operator/upgrade/upgrade.go:73: Failed to list *tpr.PgUpgrade: the server could not find the requested resource (get pgupgrades.crunchydata.com)

The examples/tpr folder has a pg-database TPR that isn't created by default by the postgres-operator container. Should it be? Or is it meant for creating a single database instance for testing purposes?

30 minutes after deploying the postgres-operator, I see the following in the logs:

Sep 20 14:57:02 postgres-operator-3770555364-5n3sl postgres-operator error time="2017-09-20T18:57:02Z" level=error msg="error in major upgrade watch closed before Until timeout" 
Sep 20 14:57:59 postgres-operator-3770555364-5n3sl postgres-operator error time="2017-09-20T18:57:59Z" level=error msg="error in ProcessJobs watch closed before Until timeout" 
Sep 20 15:10:23 postgres-operator-3770555364-5n3sl postgres-operator error time="2017-09-20T19:10:23Z" level=error msg="error in ProcessPolicies watch closed before Until timeout" 
Sep 20 15:19:03 postgres-operator-3770555364-5n3sl postgres-operator error time="2017-09-20T19:19:03Z" level=error msg="erro in clone complete watch closed before Until timeout" 

Replace TPR with CRD

Kubernetes 1.7 has deprecated TPRs in favor of CRDs.
Any plans to move to CRD soon?
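For reference, the CRD equivalent of the pg-cluster TPR under the apiextensions.k8s.io/v1beta1 API introduced in Kubernetes 1.7 would look roughly like this (group and kind names inferred from the TPR names above):

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: pgclusters.crunchydata.com   # must be <plural>.<group>
spec:
  group: crunchydata.com
  version: v1
  scope: Namespaced
  names:
    plural: pgclusters
    singular: pgcluster
    kind: PgCluster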

support image prefix other than default

Right now, crunchydata is assumed as the image prefix, and using a remote registry does not work with this approach. Allow CCP_IMAGE_PREFIX to be specified in the config.
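A hypothetical .pgo.yaml fragment; where exactly the key would live is an assumption:

CLUSTER:
  CCP_IMAGE_PREFIX: registry.example.com/crunchydata   # instead of the
  CCP_IMAGE_TAG: centos7-9.6-1.4.1                     # hard-coded default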

Getting started script will not work if /data already exists

When the /data directory already exists and the run.sh script is executed, it skips creating the crunchy-pvc resource. This in turn causes pgo create cluster testcluster to fail when attempting to attach /pgdata:

kubectl get pods
NAME                                   READY     STATUS              RESTARTS   AGE
postgres-operator-3650826749-59nxb     1/1       Running             0          2h
testcluster-4275003772-ddfwn           0/1       ContainerCreating   0          2h
testcluster-replica-4058347393-88mhd   0/1       ContainerCreating   0          2h
testcluster-replica-4058347393-q19v2   0/1       ContainerCreating   0          2h
FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  2h		32s		23	kubelet, minikube			Warning		FailedMount	Unable to mount volumes for pod "testcluster-4275003772-ddfwn_default(d2a2f3a7-4d68-11e7-a09f-080027d45d33)": timeout expired waiting for volumes to attach/mount for pod "default"/"testcluster-4275003772-ddfwn". list of unattached/unmounted volumes=[pgdata]

Based on kubelet logs, the crunchy-pvc is not found:

Jun 10 02:02:05 minikube localkube[3372]: E0610 02:02:05.042059    3372 desired_state_of_world_populator.go:259] Error processing volume "pgdata" for pod "testcluster-replica-4058347393-88mhd_default(d2a6ecc5-4d68-11e7-a09f-080027d45d33)": error processing PVC "default"/"crunchy-pvc": failed to fetch PVC default/crunchy-pvc from API server. err=persistentvolumeclaims "crunchy-pvc" not found
Jun 10 02:02:08 minikube localkube[3372]: E0610 02:02:08.062501    3372 desired_state_of_world_populator.go:259] Error processing volume "pgdata" for pod "testcluster-4275003772-ddfwn_default(d2a2f3a7-4d68-11e7-a09f-080027d45d33)": error processing PVC "default"/"crunchy-pvc": failed to fetch PVC default/crunchy-pvc from API server. err=persistentvolumeclaims "crunchy-pvc" not found

Clarify how to deploy the operator to Kubernetes

Only at the very bottom of the README.md is there mention of the postgres-operator container that executes in Kubernetes via a Deployment (its Docker image on Docker Hub is linked).

I was expecting a yaml file that contained a kubernetes Deployment referencing the postgres-operator container, but I cannot find it.

Would you please consider creating a yaml file so that I can deploy the postgres-operator via kubectl create -f postgres-operator.yaml?
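Something along these lines is what I had in mind; this is only a sketch, and the image tag and config volume are assumptions that would need to match the repo:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: postgres-operator
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: postgres-operator
    spec:
      containers:
      - name: postgres-operator
        image: crunchydata/postgres-operator:centos7-1.5.2   # assumed tag
        volumeMounts:
        - name: operator-conf
          mountPath: /operator-conf
      volumes:
      - name: operator-conf
        configMap:
          name: operator-conf

Then kubectl create -f postgres-operator.yaml would be all that's needed.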

Allow postgres extensions

Users should be able to install and enable extensions such as PostGIS within the Postgres instances that the operator creates.

Clarify usage on GCE

According to Readme:

Openshift Origin 1.5.1+ or Openshift Container Platform 3.5

...it is required to have Kubernetes installed with OpenShift. Does that mean that postgres-operator will not work on Google Cloud Platform (specifically Google Container Engine)? Shouldn't the operator be cloud-provider agnostic?

created clusters do not have roles/passwords specified in .pgo.yaml

Here's my .pgo.yaml (located in ~/.pgo/yaml):

KUBECONFIG:  /Users/jar349/.kube/config
CLUSTER:
  CCP_IMAGE_TAG:  centos7-9.6-1.4.1
  PORT:  5432
  PG_MASTER_USER: admin
  PG_MASTER_PASSWORD:  password
  PG_USER:  admin
  PG_PASSWORD:  password
  PG_DATABASE:  test1
  PG_ROOT_PASSWORD:  password
  STRATEGY:  1
  REPLICAS:  1
  PASSWORD_AGE_DAYS:  3650
MASTER_STORAGE:
  STORAGE_CLASS:  gp2
  PVC_ACCESS_MODE:  ReadWriteOnce
  PVC_SIZE:  1Gi
  STORAGE_TYPE:  dynamic
  FSGROUP:  26
REPLICA_STORAGE:
  STORAGE_CLASS:  gp2
  PVC_ACCESS_MODE:  ReadWriteOnce
  PVC_SIZE:  1Gi
  STORAGE_TYPE:  dynamic
  FSGROUP:  26
BACKUP_STORAGE:
  STORAGE_CLASS:  gp2
  PVC_ACCESS_MODE:  ReadWriteOnce
  PVC_SIZE:  5Gi
  STORAGE_TYPE:  dynamic
  FSGROUP:  26
PGO:
  LSPVC_TEMPLATE:  /Users/jar349/.pgo.lspvc-template.json
  CO_IMAGE_TAG:  centos7-1.5.2
  DEBUG:  true

I create a test1 cluster:

$ pgo create cluster --namespace default test1
DEBU[0000] kubeconfig path is /Users/jar349/.kube/config 
DEBU[0000] namespace is                                 
DEBU[0000] ConnectToKube called                         
DEBU[0000] connected to kube. at /Users/jar349/.kube/config 
DEBU[0000] create cluster called                        
DEBU[0000] no policies are specified                    
DEBU[0000] create cluster called for test1              
DEBU[0000] pgcluster test1 not found so we will create it 
created PgCluster test1

Then I port-forward the pod...

$ kl port-forward test1-1519893712-t1421 5432
Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432

I load up pgAdmin and can't connect. On a hunch, I try a password-less login as postgres, and I'm in, and I see:

[screenshot from 2017-09-22 omitted]

Have I done something wrong? Why isn't there an admin role with the password "password"?

Deployment docs

The operator and its client deployment docs are a bit unclear.

major upgrade to pg10 not working

Until the pg10 container supports pgaudit, major upgrades require users to manually edit postgresql.conf and remove the pg_audit.so reference before the upgrade will work. When pgaudit is available for pg10, this problem will be resolved. This affects postgres-operator v2.0.0.
