Comments (11)

movergan commented on May 24, 2024

Just encountered the same issue in GKE.

from cloudnative-pg.

gbartolini commented on May 24, 2024

From what I can see, it looks like an issue with the underlying file system (data corruption of that file). You need to recreate the PVC of that instance.

orenzah commented on May 24, 2024

It happened two days in a row. We are using EC2 spot instances, and we managed to solve it by deleting the problematic PVC (gp3), but manually recreating the PVC every day is probably not the desired behavior.

gbartolini commented on May 24, 2024

This might be related to #3698. Give us some time to investigate.

janosmiko commented on May 24, 2024

I'm facing the same issue with multiple clusters in the same k8s cluster. I think it occurs when a network error happens.

The only way to resolve it is to delete the PVC of the failing pod and then delete the pod itself. The operator then recreates the failing replica.
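The workaround above amounts to deleting the failing replica's PVC and its pod together. A minimal Python sketch that only builds the `kubectl delete` command for review (it runs nothing), assuming the instance's data PVC shares the pod's name, which is CloudNativePG's default naming (e.g. `harbor-postgres-3`), plus an optional `<pod>-wal` PVC when separate WAL storage is configured. `build_delete_command` is a hypothetical helper, not part of any CloudNativePG tooling.

```python
import shlex
from typing import List


def build_delete_command(namespace: str, pod: str, has_wal_pvc: bool = False) -> List[str]:
    """Return the kubectl argv that deletes a replica's PVC(s) and its pod in one call.

    Assumption: the data PVC is named after the pod, and the optional WAL PVC
    uses a "-wal" suffix.
    """
    resources = [f"pvc/{pod}"]
    if has_wal_pvc:
        resources.append(f"pvc/{pod}-wal")
    resources.append(f"pod/{pod}")
    return ["kubectl", "delete", "-n", namespace, *resources]


if __name__ == "__main__":
    cmd = build_delete_command("harbor", "harbor-postgres-3")
    # Print for review; execute with subprocess.run(cmd, check=True) only after
    # double-checking that the target is the failing replica.
    print(shlex.join(cmd))
```

Passing the PVC and the pod to the same command mirrors the workaround described above: because of Kubernetes' PVC protection, deleting only the PVC would leave it in `Terminating` for as long as the pod still mounts it.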

Logs:

{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"setup","msg":"Starting CloudNativePG Instance Manager","logging_pod":"harbor-postgres-3","version":"1.22.1","build":{"Version":"1.22.1","Commit":"c7be872e","Date":"2024-02-02"}}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"setup","msg":"starting tablespace manager","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"setup","msg":"starting external server manager","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"setup","msg":"starting controller-runtime manager","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting EventSource","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","source":"kind source: *v1.Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting Controller","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"roles_reconciler","msg":"starting up the runnable","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"roles_reconciler","msg":"skipping the RoleSynchronizer in replicas","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","logger":"roles_reconciler","msg":"setting up RoleSynchronizer loop","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting webserver","logging_pod":"harbor-postgres-3","address":":9187"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting webserver","logging_pod":"harbor-postgres-3","address":"localhost:8010"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting EventSource","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","source":"kind source: *v1.Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting Controller","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting webserver","logging_pod":"harbor-postgres-3","address":":8000"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting EventSource","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","source":"kind source: *v1.Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting Controller","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting workers","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","worker count":1}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting workers","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","worker count":1}
{"level":"info","ts":"2024-02-29T07:12:28Z","msg":"Starting workers","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","worker count":1}
{"level":"info","ts":"2024-02-29T07:12:29Z","msg":"Installed configuration file","logging_pod":"harbor-postgres-3","pgdata":"/var/lib/postgresql/data/pgdata","filename":"pg_ident.conf"}
{"level":"info","ts":"2024-02-29T07:12:29Z","msg":"Updated replication settings","logging_pod":"harbor-postgres-3","filename":"override.conf"}
{"level":"info","ts":"2024-02-29T07:12:29Z","msg":"Found previous run flag","logging_pod":"harbor-postgres-3","filename":"/var/lib/postgresql/data/pgdata/cnpg_initialized-harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","msg":"Extracting pg_controldata information","logging_pod":"harbor-postgres-3","reason":"postmaster start up"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"pg_controldata","msg":"pg_control version number:            1300\nCatalog version number:               202307071\nDatabase system identifier:           7321087858539200534\nDatabase cluster state:               shut down in recovery\npg_control last modified:             Thu 29 Feb 2024 06:43:08 AM UTC\nLatest checkpoint location:           E/10000028\nLatest checkpoint's REDO location:    E/10000028\nLatest checkpoint's REDO WAL file:    0000000C0000000E00000010\nLatest checkpoint's TimeLineID:       12\nLatest checkpoint's PrevTimeLineID:   12\nLatest checkpoint's full_page_writes: on\nLatest checkpoint's NextXID:          0:93004\nLatest checkpoint's NextOID:          25537\nLatest checkpoint's NextMultiXactId:  2\nLatest checkpoint's NextMultiOffset:  3\nLatest checkpoint's oldestXID:        722\nLatest checkpoint's oldestXID's DB:   1\nLatest checkpoint's oldestActiveXID:  0\nLatest checkpoint's oldestMultiXid:   1\nLatest checkpoint's oldestMulti's DB: 1\nLatest checkpoint's oldestCommitTsXid:0\nLatest checkpoint's newestCommitTsXid:0\nTime of latest checkpoint:            Wed 28 Feb 2024 09:53:23 PM UTC\nFake LSN counter for unlogged rels:   0/3E8\nMinimum recovery ending location:     E/100000A0\nMin recovery ending loc's timeline:   12\nBackup start location:                0/0\nBackup end location:                  0/0\nEnd-of-backup record required:        no\nwal_level setting:                    logical\nwal_log_hints setting:                on\nmax_connections setting:              100\nmax_worker_processes setting:         32\nmax_wal_senders setting:              10\nmax_prepared_xacts setting:           0\nmax_locks_per_xact setting:           64\ntrack_commit_timestamp setting:       off\nMaximum data alignment:               8\nDatabase block size:                  8192\nBlocks per segment of large relation: 131072\nWAL block size:                       8192\nBytes per WAL segment:                16777216\nMaximum length of identifiers:        64\nMaximum columns in an index:          32\nMaximum size of a TOAST chunk:        1996\nSize of a large-object chunk:         2048\nDate/time type storage:               64-bit integers\nFloat8 argument passing:              by value\nData page checksum version:           0\nMock authentication nonce:            431926e09764e6c576b934232250a417e85da4f29af6306f5db1e88e1660bc4f\n","pipe":"stdout","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","msg":"Instance is still down, will retry in 1 second","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"harbor-postgres","namespace":"harbor"},"namespace":"harbor","name":"harbor-postgres","reconcileID":"0846c8b9-adae-4fa0-9085-82724a409769","uuid":"e5f15169-d6d1-11ee-83aa-4ade4de24c82","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"2024-02-29 07:12:29.099 UTC [24] LOG:  pgaudit extension initialized","pipe":"stderr","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"2024-02-29 07:12:29.152 UTC [24] LOG:  redirecting log output to logging collector process","pipe":"stderr","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"2024-02-29 07:12:29.152 UTC [24] HINT:  Future log output will appear in directory \"/controller/log\".","pipe":"stderr","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.152 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"1","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"ending log output to stderr","hint":"Future log output will go to log destination \"csvlog\".","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.152 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"2","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"starting PostgreSQL 16.1 (Debian 16.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.152 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"3","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"listening on IPv4 address \"0.0.0.0\", port 5432","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.152 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"4","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"listening on IPv6 address \"::\", port 5432","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"2024-02-29 07:12:29.152 UTC [24] LOG:  ending log output to stderr","source":"/controller/log/postgres","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"2024-02-29 07:12:29.152 UTC [24] HINT:  Future log output will go to log destination \"csvlog\".","source":"/controller/log/postgres","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.167 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"5","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"listening on Unix socket \"/controller/run/.s.PGSQL.5432\"","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.198 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"1","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"database system was shut down in recovery at 2024-02-29 06:43:08 UTC","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"wal-restore","msg":"Restored WAL file","logging_pod":"harbor-postgres-3","walName":"0000000D.history","startTime":"2024-02-29T07:12:29Z","endTime":"2024-02-29T07:12:29Z","elapsedWalTime":0.633726065}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"wal-restore","msg":"WAL restore command completed (parallel)","logging_pod":"harbor-postgres-3","walName":"0000000D.history","maxParallel":1,"successfulWalRestore":1,"failedWalRestore":0,"endOfWALStream":false,"startTime":"2024-02-29T07:12:29Z","downloadStartTime":"2024-02-29T07:12:29Z","downloadTotalTime":0.634063937,"totalTime":0.734779978}
{"level":"info","ts":"2024-02-29T07:12:29Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:29.964 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"2","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"restored log file \"0000000D.history\" from archive","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.066 UTC","user_name":"postgres","database_name":"postgres","process_id":"52","connection_from":"[local]","session_id":"65e02e5e.34","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","msg":"Updated replication settings","logging_pod":"harbor-postgres-3","filename":"override.conf"}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.222 UTC","user_name":"postgres","database_name":"postgres","process_id":"55","connection_from":"[local]","session_id":"65e02e5e.37","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.245 UTC","user_name":"postgres","database_name":"postgres","process_id":"56","connection_from":"[local]","session_id":"65e02e5e.38","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.246 UTC","user_name":"postgres","database_name":"postgres","process_id":"57","connection_from":"[local]","session_id":"65e02e5e.39","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","msg":"DB not available, will retry","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"harbor-postgres","namespace":"harbor"},"namespace":"harbor","name":"harbor-postgres","reconcileID":"fbc281dc-1514-4b0a-9c39-0689e3a17f0a","uuid":"e6a01d01-d6d1-11ee-83aa-4ade4de24c82","logging_pod":"harbor-postgres-3","err":"failed to connect to `host=/controller/run user=postgres database=postgres`: server error (FATAL: the database system is starting up (SQLSTATE 57P03))"}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.250 UTC","user_name":"postgres","database_name":"postgres","process_id":"58","connection_from":"[local]","session_id":"65e02e5e.3a","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.305 UTC","user_name":"postgres","database_name":"postgres","process_id":"59","connection_from":"[local]","session_id":"65e02e5e.3b","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.571 UTC","user_name":"postgres","database_name":"postgres","process_id":"60","connection_from":"[local]","session_id":"65e02e5e.3c","session_line_num":"1","session_start_time":"2024-02-29 07:12:30 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"wal-restore","msg":"WAL file not found in the recovery object store","logging_pod":"harbor-postgres-3","walName":"0000000E.history","options":["--endpoint-url","https://s3.eu-central-1.amazonaws.com","--cloud-provider","aws-s3","s3://TRUNCATED/","harbor-postgres"],"startTime":"2024-02-29T07:12:30Z","endTime":"2024-02-29T07:12:30Z","elapsedWalTime":0.555448566}
{"level":"info","ts":"2024-02-29T07:12:30Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:30.751 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"3","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"entering standby mode","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:31Z","msg":"Updated replication settings","logging_pod":"harbor-postgres-3","filename":"override.conf"}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:31.397 UTC","user_name":"postgres","database_name":"postgres","process_id":"74","connection_from":"[local]","session_id":"65e02e5f.4a","session_line_num":"1","session_start_time":"2024-02-29 07:12:31 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:31.399 UTC","user_name":"postgres","database_name":"postgres","process_id":"75","connection_from":"[local]","session_id":"65e02e5f.4b","session_line_num":"1","session_start_time":"2024-02-29 07:12:31 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:31Z","msg":"DB not available, will retry","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"harbor-postgres","namespace":"harbor"},"namespace":"harbor","name":"harbor-postgres","reconcileID":"70ee1fc8-5c19-43f4-a7d1-3bc8df2a218d","uuid":"e75275e9-d6d1-11ee-83aa-4ade4de24c82","logging_pod":"harbor-postgres-3","err":"failed to connect to `host=/controller/run user=postgres database=postgres`: server error (FATAL: the database system is starting up (SQLSTATE 57P03))"}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"wal-restore","msg":"Restored WAL file","logging_pod":"harbor-postgres-3","walName":"0000000D.history","startTime":"2024-02-29T07:12:30Z","endTime":"2024-02-29T07:12:31Z","elapsedWalTime":0.524848024}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"wal-restore","msg":"WAL restore command completed (parallel)","logging_pod":"harbor-postgres-3","walName":"0000000D.history","maxParallel":1,"successfulWalRestore":1,"failedWalRestore":0,"endOfWALStream":false,"startTime":"2024-02-29T07:12:30Z","downloadStartTime":"2024-02-29T07:12:30Z","downloadTotalTime":0.525037538,"totalTime":0.633526301}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:31.418 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"4","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"restored log file \"0000000D.history\" from archive","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:31Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:31.911 UTC","user_name":"postgres","database_name":"postgres","process_id":"88","connection_from":"[local]","session_id":"65e02e5f.58","session_line_num":"1","session_start_time":"2024-02-29 07:12:31 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.059 UTC","user_name":"postgres","database_name":"postgres","process_id":"89","connection_from":"[local]","session_id":"65e02e60.59","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.072 UTC","user_name":"postgres","database_name":"postgres","process_id":"90","connection_from":"[local]","session_id":"65e02e60.5a","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.129 UTC","user_name":"postgres","database_name":"postgres","process_id":"91","connection_from":"[local]","session_id":"65e02e60.5b","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"wal-restore","msg":"Restored WAL file","logging_pod":"harbor-postgres-3","walName":"0000000D0000000E00000010","startTime":"2024-02-29T07:12:31Z","endTime":"2024-02-29T07:12:32Z","elapsedWalTime":0.614393645}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"wal-restore","msg":"WAL restore command completed (parallel)","logging_pod":"harbor-postgres-3","walName":"0000000D0000000E00000010","maxParallel":1,"successfulWalRestore":1,"failedWalRestore":0,"endOfWALStream":false,"startTime":"2024-02-29T07:12:31Z","downloadStartTime":"2024-02-29T07:12:31Z","downloadTotalTime":0.61485673,"totalTime":0.708725379}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.200 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"5","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"restored log file \"0000000D0000000E00000010\" from archive","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.392 UTC","user_name":"postgres","database_name":"postgres","process_id":"93","connection_from":"[local]","session_id":"65e02e60.5d","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.416 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"6","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"invalid resource manager ID in checkpoint record","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.416 UTC","process_id":"28","session_id":"65e02e5d.1c","session_line_num":"7","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"PANIC","sql_state_code":"XX000","message":"could not locate a valid checkpoint record","backend_type":"startup","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.434 UTC","user_name":"postgres","database_name":"postgres","process_id":"94","connection_from":"[local]","session_id":"65e02e60.5e","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.494 UTC","user_name":"postgres","database_name":"postgres","process_id":"96","connection_from":"[local]","session_id":"65e02e60.60","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","msg":"DB not available, will retry","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster","Cluster":{"name":"harbor-postgres","namespace":"harbor"},"namespace":"harbor","name":"harbor-postgres","reconcileID":"d5a0ece4-2194-454f-80c1-1c370fda5f19","uuid":"e801d87a-d6d1-11ee-83aa-4ade4de24c82","logging_pod":"harbor-postgres-3","err":"failed to connect to `host=/controller/run user=postgres database=postgres`: server error (FATAL: the database system is starting up (SQLSTATE 57P03))"}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.496 UTC","user_name":"postgres","database_name":"postgres","process_id":"97","connection_from":"[local]","session_id":"65e02e60.61","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:32Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:32.634 UTC","user_name":"postgres","database_name":"postgres","process_id":"98","connection_from":"[local]","session_id":"65e02e60.62","session_line_num":"1","session_start_time":"2024-02-29 07:12:32 UTC","transaction_id":"0","error_severity":"FATAL","sql_state_code":"57P03","message":"the database system is starting up","backend_type":"client backend","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:33Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:33.283 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"6","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"startup process (PID 28) was terminated by signal 6: Aborted","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:33Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:33.283 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"7","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"aborting startup due to startup process failure","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:33Z","logger":"postgres","msg":"record","logging_pod":"harbor-postgres-3","record":{"log_time":"2024-02-29 07:12:33.285 UTC","process_id":"24","session_id":"65e02e5d.18","session_line_num":"8","session_start_time":"2024-02-29 07:12:29 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"00000","message":"database system is shut down","backend_type":"postmaster","query_id":"0"}}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Extracting pg_controldata information","logging_pod":"harbor-postgres-3","reason":"postmaster has exited"}
{"level":"error","ts":"2024-02-29T07:12:33Z","msg":"PostgreSQL process exited with errors","logging_pod":"harbor-postgres-3","error":"exit status 1","stacktrace":"github.com/cloudnative-pg/cloudnative-pg/pkg/management/log.(*logger).Error\n\tpkg/management/log/log.go:128\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/run/lifecycle.(*PostgresLifecycle).Start\n\tinternal/cmd/manager/instance/run/lifecycle/lifecycle.go:98\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:223"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"error","ts":"2024-02-29T07:12:33Z","msg":"error received after stop sequence was engaged","error":"exit status 1","stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:490"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Webserver exited","logging_pod":"harbor-postgres-3","address":":9187"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Webserver exited","logging_pod":"harbor-postgres-3","address":":8000"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Exited log pipe","fileName":"/controller/log/postgres.json","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Exited log pipe","fileName":"/controller/log/postgres","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"All workers finished","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Webserver exited","logging_pod":"harbor-postgres-3","address":"localhost:8010"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"All workers finished","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"All workers finished","controller":"cluster","controllerGroup":"postgresql.cnpg.io","controllerKind":"Cluster"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Exited log pipe","fileName":"/controller/log/postgres.csv","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2024-02-29T07:12:33Z","logger":"roles_reconciler","msg":"Terminated RoleSynchronizer loop","logging_pod":"harbor-postgres-3"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Stopping and waiting for HTTP servers"}
{"level":"info","ts":"2024-02-29T07:12:33Z","msg":"Wait completed, proceeding to shutdown the manager"}
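Each line of the instance manager output above is a self-contained JSON object with fields such as `level`, `ts`, `msg`, `logging_pod`, `error` and `stacktrace`. When a pod is crash-looping, it can help to pull out just the `error` entries and expand the escaped `\n\t` sequences in the stacktrace. The following is a minimal Python sketch that relies only on the field names visible in these logs. The function names are hypothetical. The `exit status 1` entry does not explain why PostgreSQL stopped, so the surrounding lines of the same log are still worth reading.

```python
import json
from typing import Iterable, Iterator


def error_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log records whose "level" field is "error".

    Blank lines and lines that are not valid JSON objects are skipped,
    so mixed plain-text output does not break the scan.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and record.get("level") == "error":
            yield record


def format_entry(record: dict) -> str:
    """Render one record as 'ts [pod] msg: error', followed by the stacktrace.

    json.loads has already turned the escaped \\n and \\t in the stacktrace
    into real newlines and tabs, so it prints as a readable Go trace.
    """
    header = "{} [{}] {}: {}".format(
        record.get("ts", "?"),
        record.get("logging_pod", "-"),  # some records (e.g. manager ones) lack it
        record.get("msg", ""),
        record.get("error", ""),
    )
    trace = record.get("stacktrace")
    return header + ("\n" + trace if trace else "")


def print_errors(lines: Iterable[str]) -> int:
    """Print every error record and return how many were found."""
    count = 0
    for record in error_entries(lines):
        print(format_entry(record))
        print()
        count += 1
    return count
```

If the snippet is saved as a module (for example, the hypothetical `cnpg_errors.py`), it could be fed the previous container's output with something like `kubectl logs harbor-postgres-3 -n <namespace> --previous | python3 -c "import sys, cnpg_errors; cnpg_errors.print_errors(sys.stdin)"`.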

ensdhvi avatar ensdhvi commented on May 24, 2024

We are facing the same problem on Google GKE clusters. After each node upgrade, random instances have problems recovering. The only workaround we found is removing the PVC of the failing instance.

suleimi avatar suleimi commented on May 24, 2024

I keep running into this every other day.

maaft avatar maaft commented on May 24, 2024

I also just ran into this. Is the CNPG team aware of it? Whom should we ping about this issue?

The only working solution was to delete the failing Pod and its PVC so that a new node is created and joins the cluster.
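The workaround described in this thread is to delete the failing instance's PVC and then its Pod, so that the operator rebuilds the replica from scratch. The following is a minimal Python sketch that only builds the `kubectl` commands and does not run them. It assumes CloudNativePG's PVC naming, in which the data PVC reuses the Pod name (for example, `harbor-postgres-3`) and a separate WAL volume, if configured, gets a `-wal` suffix. The function name is hypothetical. Check the actual names with `kubectl get pvc` before running anything. The commands discard that instance's local data, so they only make sense for a replica the operator can recreate.

```python
from typing import List


def recreate_instance_commands(
    pod: str, namespace: str, has_wal_volume: bool = False
) -> List[List[str]]:
    """Build (but do not execute) kubectl commands that drop a failing
    instance's PVC(s) and then its Pod.

    ASSUMPTION: PVC names follow the CloudNativePG convention of reusing the
    Pod name, plus "<pod>-wal" when a separate WAL volume is configured.
    """
    pvcs = [pod]
    if has_wal_volume:
        pvcs.append(f"{pod}-wal")

    # The PVC stays in Terminating (pvc-protection finalizer) while the Pod
    # still mounts it, so don't block on the deletion here.
    delete_pvcs = ["kubectl", "delete", "pvc", "-n", namespace, "--wait=false", *pvcs]
    # Deleting the Pod releases the volume, which lets the PVC deletion finish.
    delete_pod = ["kubectl", "delete", "pod", "-n", namespace, pod]
    return [delete_pvcs, delete_pod]
```

For example, `recreate_instance_commands("harbor-postgres-3", "<namespace>")` yields a `kubectl delete pvc` command followed by a `kubectl delete pod` command, and `shlex.join` can turn each list into a copy-pasteable line. Deleting the PVC first is a design choice. It prevents the recreated Pod from reattaching the old volume. Recent releases of the `kubectl cnpg` plugin also include a `destroy` subcommand for removing an instance together with its PVCs. Whether it is available depends on the plugin version.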

oberai07 avatar oberai07 commented on May 24, 2024

I am facing a similar issue. Is there any fix available for the issue below? I tried deleting the pod, but it did not help.

{"level":"error","ts":"2024-02-29T07:12:33Z","msg":"PostgreSQL process exited with errors","logging_pod":"harbor-postgres-3","error":"exit status 1","stacktrace":"github.com/cloudnative-pg/cloudnative-pg/pkg/management/log.(*logger).Error\n\tpkg/management/log/log.go:128\ngithub.com/cloudnative-pg/cloudnative-pg/internal/cmd/manager/instance/run/lifecycle.(*PostgresLifecycle).Start\n\tinternal/cmd/manager/instance/run/lifecycle/lifecycle.go:98\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\tpkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:223"}

janosmiko avatar janosmiko commented on May 24, 2024

@gbartolini is it possible to modify the health check behavior of the pods managed by CNPG?

suleimi avatar suleimi commented on May 24, 2024

This might be related to #3698. Give us some time to investigate.

@gbartolini Any updates on this would be greatly appreciated 🙏🏿
