dsessler7 avatar dsessler7 commented on May 27, 2024

@danfinn and I have been talking about this in the Crunchy Data Discord Server, but I would like to provide a response here for anyone running into the same issue.

So first off, we currently require “disaster recovery” via pgBackRest to be enabled. A number of users have requested the ability to turn disaster recovery off entirely; this is in our development backlog and we hope to implement it in the future. One of the primary reasons we currently require pgBackRest is that we use backups/archives to create replicas when scaling up postgresclusters, as this is much faster and more efficient than creating the replica database by streaming replication from the primary or another replica. This is why we do an initial full backup when a postgrescluster is first created.

Once that initial backup is completed, WAL “archives” will be created as changes are made to the database, which allow us to do a “Point-In-Time-Recovery” if needed. This backup and the subsequent WAL will remain until the backup is expired. Over time, this can lead to disk space being taken up by large amounts of WAL if the backup(s) never expire, which is what we are seeing here.

When a given backup expires is determined by the pgBackRest retention settings. The first thing to understand about the retention settings is that the minimum number of full backups that can be retained is 1. This makes sense, as both incremental and differential backups build on full backups, so a pgBackRest repo really needs at least one full backup to be of any use. When the retention type is set to “count”, the --repo-retention-full value represents the number of full backups you want to retain (again, the minimum value is 1). If the retention type is set to “time” rather than “count”, then full backups older than the --repo-retention-full value (in days) will be expired only if at least one remaining backup is at least --repo-retention-full days old. The pgBackRest documentation has the following example to illustrate this behavior:

“If repo-retention-full is 30 (days) and there are 2 full backups: one 25 days old and one 35 days old, no full backups will be expired because expiring the 35 day old backup would leave only the 25 day old backup, which would violate the 30 day retention policy of having at least one backup 30 days old before an older one can be expired.”
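To make the time-based rule concrete, here is a minimal Python sketch of it as described above. It is a simplified model for reasoning about which full backups would expire, not pgBackRest's actual implementation; the function name `expired_full_backups` and the representation of backups as a list of ages in days are assumptions made for illustration.

```python
def expired_full_backups(ages_in_days, retention_days):
    """Return the ages of full backups that would expire under time-based retention.

    Simplified model: a backup may only expire if a newer backup that is
    at least `retention_days` old would still remain afterwards.
    """
    # Backups that are old enough to satisfy the retention window on their own.
    old_enough = [age for age in ages_in_days if age >= retention_days]
    if not old_enough:
        # Nothing satisfies the window yet, so nothing can be expired.
        return []
    # The youngest backup that still satisfies the window must be kept.
    anchor = min(old_enough)
    # Only backups strictly older than that anchor can be expired.
    return sorted(age for age in ages_in_days if age > anchor)


# The documentation example: 25 and 35 days old, retention of 30 days.
print(expired_full_backups([25, 35], 30))
# A third, older backup can expire because the 35-day-old one remains.
print(expired_full_backups([25, 35, 50], 30))
```

In this model the 35-day-old backup is kept in both calls, which matches the documentation example: it is the only backup old enough to cover the 30-day window.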

So, to recap, whether your retention type is set to "count" or "time", you will always have at least one full backup. WAL will accumulate until the associated backup (the backup that came before the WAL) is expired. Therefore, it is good practice to have both regularly scheduled backups (although I will note that you can also do manual "one-off" backups with PGO) and appropriate retention settings. PGO only does one initial full backup, so you will need to add a backup schedule to your postgrescluster manifest yourself. If you really don't care about disaster recovery and don't want to worry about running out of disk space, set your --repo-retention-full to 1 and schedule regular full backups to keep the WAL from accumulating.
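As a rough illustration of that last suggestion, the Python sketch below builds the relevant fragment of a postgrescluster spec and prints it as JSON. The field layout (`spec.backups.pgbackrest.global`, `repos[].schedules.full`, and the `repo1-` prefix on the retention options) is my understanding of the PGO v5 manifest and should be checked against the PGO documentation for your version. The repo name `repo1`, the weekly cron expression, and the helper name `backup_settings_fragment` are assumptions for illustration only.

```python
import json


def backup_settings_fragment(full_schedule="0 1 * * 0", retention_full=1):
    """Build the backup-related part of a postgrescluster spec.

    Assumed layout; verify the field names against the PGO docs before use.
    """
    return {
        "spec": {
            "backups": {
                "pgbackrest": {
                    # pgBackRest options are passed as strings.
                    "global": {
                        "repo1-retention-full": str(retention_full),
                        "repo1-retention-full-type": "count",
                    },
                    "repos": [
                        {
                            "name": "repo1",
                            # Cron expression for scheduled full backups.
                            "schedules": {"full": full_schedule},
                        }
                    ],
                }
            }
        }
    }


print(json.dumps(backup_settings_fragment(), indent=2))
```

The output is only a fragment: merge these keys by hand into the repo entry your manifest already has, since a real repo definition also carries its storage configuration.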

from postgres-operator.
