Comments (11)
Just ran into the same issue on GKE.
from cloudnative-pg.
From what I can see, it looks like an issue with the underlying file system (data corruption of that file). You need to recreate the PVC of that instance.
from cloudnative-pg.
It has happened two days in a row. We are using EC2 spot instances, and we managed to solve it by deleting the problematic PVC (gp3), but manually recreating the PVC every day is probably not the desired behavior.
from cloudnative-pg.
This might be related to #3698. Give us some time to investigate.
from cloudnative-pg.
I am facing the same issue in multiple CNPG clusters within the same k8s cluster. I think it occurs when a network error happens.
The only way to resolve it is to delete the PVC of the failing pod and then delete the pod. The operator then recreates the failing replica.
Logs:
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"setup","msg":"Starting CloudNativePG Instance Manager","logging_pod":"harbor-postgres-3","version":"1.22.1","build":{"Version":"1.22.1","Commit":"c7be872e","Date":"2024-02-02"}}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"setup","msg":"starting tablespace manager","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"setup","msg":"starting external server manager","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"setup","msg":"starting controller-runtime manager","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting EventSource","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","source":"kind source: *v1.Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting Controller","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"roles_reconciler","msg":"starting up the runnable","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"roles_reconciler","msg":"skipping the RoleSynchronizer in replicas","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"roles_reconciler","msg":"setting up RoleSynchronizer loop","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting webserver","logging_pod":"harbor-postgres-3","address":":9187"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting webserver","logging_pod":"harbor-postgres-3","address":"localhost:8010"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting EventSource","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","source":"kind source: *v1.Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting Controller","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting webserver","logging_pod":"harbor-postgres-3","address":":8000"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting EventSource","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","source":"kind source: *v1.Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting Controller","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting workers","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","worker count":1}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting workers","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","worker count":1}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting workers","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","worker count":1}
{"level":"info","ts":"2024-02-29T07:12:29Z","msg":"Installed configuration file","logging_pod":"harbor-postgres-3","pgdata":"/var/lib/postgresql/data/pgdata","filename":"pg_ident.conf"}
{"level":"info","ts":"2024-02-29T07:12:29Z","msg":"Updated replication settings","logging_pod":"harbor-postgres-3","filename":"override.conf"}
{"level":"info","ts":"2024-02-29T07:12:29Z","msg":"Found previous run flag","logging_pod":"harbor-postgres-3","filename":"/var/lib/postgresql/data/pgdata/cnpg_initialized-harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","msg":"Extracting pg_controldata information","logging_pod":"harbor-postgres-3","reason":"postmaster start up"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"pg_controldata","msg":"pg_control version number: 1300\nCatalog version number: 202307071\nDatabase system identifier: 7321087858539200534\nDatabase cluster state: shut down in recovery\npg_control last modified: Thu 29 Feb 2024 06:43:08 AM UTC\nLatest checkpoint location: E/10000028\nLatest checkpoint's REDO location: E/10000028\nLatest checkpoint's REDO WAL file: 0000000C0000000E00000010\nLatest checkpoint's TimeLineID: 12\nLatest checkpoint's PrevTimeLineID: 12\nLatest checkpoint's full_page_writes: on\nLatest checkpoint's NextXID: 0:93004\nLatest checkpoint's NextOID: 25537\nLatest checkpoint's NextMultiXactId: 2\nLatest checkpoint's NextMultiOffset: 3\nLatest checkpoint's oldestXID: 722\nLatest checkpoint's oldestXID's DB: 1\nLatest checkpoint's oldestActiveXID: 0\nLatest checkpoint's oldestMultiXid: 1\nLatest checkpoint's oldestMulti's DB: 1\nLatest checkpoint's oldestCommitTsXid:0\nLatest checkpoint's newestCommitTsXid:0\nTime of latest checkpoint: Wed 28 Feb 2024 09:53:23 PM UTC\nFake LSN counter for unlogged rels: 0/3E8\nMinimum recovery ending location: E/100000A0\nMin recovery ending loc's timeline: 12\nBackup start location: 0/0\nBackup end location: 0/0\nEnd-of-backup record required: no\nwal_level setting: logical\nwal_log_hints setting: on\nmax_connections setting: 100\nmax_worker_processes setting: 32\nmax_wal_senders setting: 10\nmax_prepared_xacts setting: 0\nmax_locks_per_xact setting: 64\ntrack_commit_timestamp setting: off\nMaximum data alignment: 8\nDatabase block size: 8192\nBlocks per segment of large relation: 131072\nWAL block size: 8192\nBytes per WAL segment: 16777216\nMaximum length of identifiers: 64\nMaximum columns in an index: 32\nMaximum size of a TOAST chunk: 1996\nSize of a large-object chunk: 2048\nDate/time type storage: 64-bit integers\nFloat8 argument passing: by value\nData page checksum version: 0\nMock authentication nonce: 
431926e09764e6c576b934232250a417e85da4f29af6306f5db1e88e1660bc4f\n","pipe":"stdout","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","msg":"Instance is still down, will retry in 1 second","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"harbor-postgres","namespace":"harbor"},"namespace":"harbor","name":"harbor-postgres","reconcileID":"0846c8b9-adae-4fa0-9085-82724a409769","uuid":"e5f15169-d6d1-11ee-83aa-4ade4de24c82","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"2024-02-29 07:12:29.099 UTC [24] LOG: pgaudit extension initialized","pipe":"stderr","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"2024-02-29 07:12:29.152 UTC [24] LOG: redirecting log output to logging collector process","pipe":"stderr","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"2024-02-29 07:12:29.152 UTC [24] HINT: Future log output will appear in directory \"/controller/log\".","pipe":"stderr","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.152 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"1","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"ending log output to stderr","hint":"Future log output will go to log destination \"csvlog\".","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.152 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"2","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"starting PostgreSQL 16.1 (Debian 16.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.152 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"3","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"listening on IPv4 address \"0.0.0.0\", port 5432","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.152 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"4","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"listening on IPv6 address \"::\", port 5432","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"2024-02-29 07:12:29.152 UTC [24] LOG: ending log output to stderr","source":"/controller/log/postgres","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"2024-02-29 07:12:29.152 UTC [24] HINT: Future log output will go to log destination \"csvlog\".","source":"/controller/log/postgres","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.167 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"5","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"listening on Unix socket \"/controller/run/.s.PGSQL.5432\"","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.198 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"1","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"database system was shut down in recovery at 2024-02-29 06:43:08 UTC","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"wal-restore","msg":"Restored WAL file","logging_pod":"harbor-postgres-3","walName":"0000000D.history","startTime":"2024-02-29T07:12:29Z","endTime":"2024-02-29T07:12:29Z","elapsedWalTime":0.633726065}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"wal-restore","msg":"WAL restore command completed (parallel)","logging_pod":"harbor-postgres-3","walName":"0000000D.history","maxParallel":1,"successfulWalRestore":1,"failedWalRestore":0,"endOfWALStream":false,"startTime":"2024-02-29T07:12:29Z","downloadStartTime":"2024-02-29T07:12:29Z","downloadTotalTime":0.634063937,"totalTime":0.734779978}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.964 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"2","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"restored log file \"0000000D.history\" from archive","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.066 UTC","user_name":"postgres","database_name":"postgres","process_id":"52","connection_from":"[local]","session_id":"65e02e5e.34","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","msg":"Updated replication settings","logging_pod":"harbor-postgres-3","filename":"override.conf"}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.222 UTC","user_name":"postgres","database_name":"postgres","process_id":"55","connection_from":"[local]","session_id":"65e02e5e.37","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.245 UTC","user_name":"postgres","database_name":"postgres","process_id":"56","connection_from":"[local]","session_id":"65e02e5e.38","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.246 UTC","user_name":"postgres","database_name":"postgres","process_id":"57","connection_from":"[local]","session_id":"65e02e5e.39","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","msg":"DB not available, will retry","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"harbor-postgres","namespace":"harbor"},"namespace":"harbor","name":"harbor-postgres","reconcileID":"fbc281dc-1514-4b0a-9c39-0689e3a17f0a","uuid":"e6a01d01-d6d1-11ee-83aa-4ade4de24c82","logging_pod":"harbor-postgres-3","err":"failed to connect to `host=/controller/run user=postgres database=postgres`: server error (FATAL: the database system is starting up (SQLSTATE 57P03))"}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.250 UTC","user_name":"postgres","database_name":"postgres","process_id":"58","connection_from":"[local]","session_id":"65e02e5e.3a","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.305 UTC","user_name":"postgres","database_name":"postgres","process_id":"59","connection_from":"[local]","session_id":"65e02e5e.3b","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.571 UTC","user_name":"postgres","database_name":"postgres","process_id":"60","connection_from":"[local]","session_id":"65e02e5e.3c","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"wal-restore","msg":"WAL file not found in the recovery object store","logging_pod":"harbor-postgres-3","walName":"0000000E.history","options":["--endpoint-url","https://s3.eu-central-1.amazonaws.com","--cloud-provider","aws-s3","s3://TRUNCATED/","harbor-postgres"],"startTime":"2024-02-29T07:12:30Z","endTime":"2024-02-29T07:12:30Z","elapsedWalTime":0.555448566}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.751 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"3","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"entering standby mode","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:31Z","msg":"Updated replication settings","logging_pod":"harbor-postgres-3","filename":"override.conf"}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:31.397 UTC","user_name":"postgres","database_name":"postgres","process_id":"74","connection_from":"[local]","session_id":"65e02e5f.4a","session_line_num":"1","session_start_time":"2024-02-29 07:12:31 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:31.399 UTC","user_name":"postgres","database_name":"postgres","process_id":"75","connection_from":"[local]","session_id":"65e02e5f.4b","session_line_num":"1","session_start_time":"2024-02-29 07:12:31 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:31Z","msg":"DB not available, will retry","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"harbor-postgres","namespace":"harbor"},"namespace":"harbor","name":"harbor-postgres","reconcileID":"70ee1fc8-5c19-43f4-a7d1-3bc8df2a218d","uuid":"e75275e9-d6d1-11ee-83aa-4ade4de24c82","logging_pod":"harbor-postgres-3","err":"failed to connect to `host=/controller/run user=postgres database=postgres`: server error (FATAL: the database system is starting up (SQLSTATE 57P03))"}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"wal-restore","msg":"Restored WAL file","logging_pod":"harbor-postgres-3","walName":"0000000D.history","startTime":"2024-02-29T07:12:30Z","endTime":"2024-02-29T07:12:31Z","elapsedWalTime":0.524848024}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"wal-restore","msg":"WAL restore command completed (parallel)","logging_pod":"harbor-postgres-3","walName":"0000000D.history","maxParallel":1,"successfulWalRestore":1,"failedWalRestore":0,"endOfWALStream":false,"startTime":"2024-02-29T07:12:30Z","downloadStartTime":"2024-02-29T07:12:30Z","downloadTotalTime":0.525037538,"totalTime":0.633526301}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:31.418 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"4","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"restored log file \"0000000D.history\" from archive","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:31.911 UTC","user_name":"postgres","database_name":"postgres","process_id":"88","connection_from":"[local]","session_id":"65e02e5f.58","session_line_num":"1","session_start_time":"2024-02-29 07:12:31 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.059 UTC","user_name":"postgres","database_name":"postgres","process_id":"89","connection_from":"[local]","session_id":"65e02e60.59","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.072 UTC","user_name":"postgres","database_name":"postgres","process_id":"90","connection_from":"[local]","session_id":"65e02e60.5a","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.129 UTC","user_name":"postgres","database_name":"postgres","process_id":"91","connection_from":"[local]","session_id":"65e02e60.5b","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"wal-restore","msg":"Restored WAL file","logging_pod":"harbor-postgres-3","walName":"0000000D0000000E00000010","startTime":"2024-02-29T07:12:31Z","endTime":"2024-02-29T07:12:32Z","elapsedWalTime":0.614393645}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"wal-restore","msg":"WAL restore command completed (parallel)","logging_pod":"harbor-postgres-3","walName":"0000000D0000000E00000010","maxParallel":1,"successfulWalRestore":1,"failedWalRestore":0,"endOfWALStream":false,"startTime":"2024-02-29T07:12:31Z","downloadStartTime":"2024-02-29T07:12:31Z","downloadTotalTime":0.61485673,"totalTime":0.708725379}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.200 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"5","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"restored log file \"0000000D0000000E00000010\" from archive","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.392 UTC","user_name":"postgres","database_name":"postgres","process_id":"93","connection_from":"[local]","session_id":"65e02e60.5d","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.416 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"6","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"invalid resource manager ID in checkpoint record","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.416 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"7","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"PANIC","sql_state_code":"XX000","message":"could not locate a valid checkpoint record","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.434 UTC","user_name":"postgres","database_name":"postgres","process_id":"94","connection_from":"[local]","session_id":"65e02e60.5e","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.494 UTC","user_name":"postgres","database_name":"postgres","process_id":"96","connection_from":"[local]","session_id":"65e02e60.60","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","msg":"DB not available, will retry","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"harbor-postgres","namespace":"harbor"},"namespace":"harbor","name":"harbor-postgres","reconcileID":"d5a0ece4-2194-454f-80c1-1c370fda5f19","uuid":"e801d87a-d6d1-11ee-83aa-4ade4de24c82","logging_pod":"harbor-postgres-3","err":"failed to connect to `host=/controller/run user=postgres database=postgres`: server error (FATAL: the database system is starting up (SQLSTATE 57P03))"}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.496 UTC","user_name":"postgres","database_name":"postgres","process_id":"97","connection_from":"[local]","session_id":"65e02e60.61","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.634 UTC","user_name":"postgres","database_name":"postgres","process_id":"98","connection_from":"[local]","session_id":"65e02e60.62","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:33Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:33.283 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"6","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"startup process (PID 28) was terminated by signal 6: Aborted","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:33Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:33.283 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"7","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"aborting startup due to startup process failure","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:33Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:33.285 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"8","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"database system is shut down","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Extracting pg_controldata information","logging_pod":"harbor-postgres-3","reason":"postmaster has exited"}
{"level":"error","ts":"2024-02-29T07:12:33Z","msg":"PostgreSQL process exited with errors","logging_pod":"harbor-postgres-3","error":"exit status 1","stacktrace":"github.com/cloudnative-pg/cloudnative-pg/pkg/management/log.(*logger).Error\n\tpkg/management/log/log.go:128\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/run/lifecycle.(*PostgresLifecycle).Start\n\tinternal/cmd/manager/instance/run/lifecycle/lifecycle.go:98\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:223"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"error","ts":"2024-02-29T07:12:33Z","msg":"error received after stop sequence was engaged","error":"exit status 1","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:490"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Webserver exited","logging_pod":"harbor-postgres-3","address":":9187"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Webserver exited","logging_pod":"harbor-postgres-3","address":":8000"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Exited log pipe","fileName":"/controller/log/postgres.json","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Exited log pipe","fileName":"/controller/log/postgres","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"All workers finished","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Webserver exited","logging_pod":"harbor-postgres-3","address":"localhost:8010"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"All workers finished","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"All workers finished","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Exited log pipe","fileName":"/controller/log/postgres.csv","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2024-02-29T07:12:33Z","logger":"roles_reconciler","msg":"Terminated RoleSynchronizer loop","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Stopping and waiting for HTTP servers"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Wait completed, proceeding to shutdown the manager"}
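Most of the lines above are routine shutdown messages, so it can help to filter the instance manager output down to the `error` entries. A minimal sketch, assuming the logs are available as one JSON object per line; the helper name `error_entries` is hypothetical, and the second sample line is taken from the log above with its stacktrace field left out:

```python
import json


def error_entries(lines):
    """Yield parsed log records whose "level" field is "error"."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if isinstance(record, dict) and record.get("level") == "error":
            yield record


# Two lines from the log above (stacktrace omitted from the second one).
sample = [
    '{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Webserver exited","logging_pod":"harbor-postgres-3","address":":9187"}',
    '{"level":"error","ts":"2024-02-29T07:12:33Z","msg":"error received after stop sequence was engaged","error":"exit status 1"}',
]

for rec in error_entries(sample):
    print(rec["ts"], rec["msg"], rec.get("error", ""))
```

In practice the input could be the saved output of `kubectl logs` for the failing pod, read line by line.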
from cloudnative-pg.
We are facing the same problem on Google GKE clusters. After each node upgrade, random instances have problems recovering. The only workaround we found is removing the PVC of one of the failing instances.
from cloudnative-pg.
I keep running into this every other day
from cloudnative-pg.
Also ran into this just now. Is the CNPG team aware of this? Who should we ping about this issue?
The only working solution was to delete the failing Pod and PVC so that a new node is created to join the cluster.
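The workaround described in this thread (delete the PVC of the failing replica, then delete its Pod so the operator recreates the instance) could be scripted roughly as follows. This is only a sketch under stated assumptions: the PVC is assumed to have the same name as the Pod, the namespace `harbor` is hypothetical, and the helper `recreate_replica_commands` is not part of CloudNativePG. It only prints the commands, so they can be reviewed before running them. Deleting the PVC discards that instance's data, so it only makes sense for a replica while a healthy primary exists.

```python
import shlex


def recreate_replica_commands(namespace, instance):
    """Return kubectl argv lists that delete the instance's PVC and Pod.

    Assumption: the PVC has the same name as the Pod.
    """
    return [
        # --wait=false: the PVC stays in Terminating until the Pod using it
        # is gone, so do not block on this command.
        ["kubectl", "delete", "pvc", instance, "-n", namespace, "--wait=false"],
        ["kubectl", "delete", "pod", instance, "-n", namespace],
    ]


# "harbor" is a hypothetical namespace; the pod name comes from the logs above.
for argv in recreate_replica_commands("harbor", "harbor-postgres-3"):
    print(shlex.join(argv))
```

Printing instead of executing is deliberate here: it leaves room to confirm which instance is failing before any volume is removed.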
from cloudnative-pg.
I am facing a similar issue. Is there a fix available for the issue below? I tried deleting the pod, but it did not help.
{"level":"error","ts":"2024-02-29T07:12:33Z","msg":"PostgreSQL process exited with errors","logging_pod":"harbor-postgres-3","error":"exit status 1","stacktrace":"github.com/cloudnative-pg/cloudnative-pg/pkg/management/log.(*logger).Error\n\tpkg/management/log/log.go:128\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/run/lifecycle.(*PostgresLifecycle).Start\n\tinternal/cmd/manager/instance/run/lifecycle/lifecycle.go:98\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:223"}
from cloudnative-pg.
@gbartolini is it possible to modify the health check behavior of the pods managed by CNPG?
from cloudnative-pg.
This might be related to #3698. Give us some time to investigate.
@gbartolini Any updates on this would be greatly appreciated 🙏🏿
from cloudnative-pg.