Comments (12)
Hi,
I suspect (correct me if I am wrong) that's because in postgresql.conf I have only set up
archive_command
to use the first repository (I thought it would be enough; I don't know how to combine those 2 stanzas into one archive command, if that's even needed). As you can see above, I have 2 storage accounts, and I wanted to have 2 stanzas for convenience and clarity.
Indeed. You configured PG to push the WAL segments to only one of the repositories. IMO, you're not using the stanza section correctly. Usually, repositories are defined globally and you should use repo option indexing (the multi-repo feature), not multiple stanzas.
As stated in the docs:
A stanza is the configuration for a PostgreSQL database cluster that defines where it is located,
how it will be backed up, archiving options, etc. Most db servers will only have one PostgreSQL
database cluster and therefore one stanza, whereas backup servers will have a stanza for every
database cluster that needs to be backed up.
You only have 1 PG cluster, so you should only have 1 stanza configuration (containing the options to connect to that cluster).
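With multiple repositories defined globally, a single archive_command in postgresql.conf is enough; a sketch (the stanza name is whatever you pick for your one stanza):

```ini
# postgresql.conf (sketch) -- one archive_command covers all repositories:
# with repo1-*/repo2-* options configured, pgBackRest pushes each WAL
# segment to every repository.
archive_mode = on
archive_command = 'pgbackrest --stanza=my_stanza_name archive-push %p'
```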
Is the only option really to combine those 2 stanzas into one stanza with 2 repos? Can't they somehow share the WAL archive? Or can I have 2 independent WAL archives?
The WAL archives need to reside in the same location as the backups. The expire command will, by the way, only remove WAL archives based on the oldest backup in the repository, so you'll need to take backups separately for each repo (using e.g. the backup --repo=x option).
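For example (a sketch, assuming the stanza is named my_stanza_name as in the config below):

```shell
# Take a full backup into each repository individually; the backup command
# operates on a single repository at a time.
pgbackrest --stanza=my_stanza_name --repo=1 --type=full backup
pgbackrest --stanza=my_stanza_name --repo=2 --type=full backup
```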
Your configuration should probably look like this:
[global]
log-level-console=info
log-level-file=debug
start-fast=y
compress-type=zst
process-max=4
### async settings:
archive-async=y
archive-push-queue-max=360GB
spool-path=/var/spool/pgbackrest
archive-get-queue-max=1GB
## disaster recovery, 28 days retention period
...
repo1-gcs-bucket=gcp-disasterrecovery
repo1-retention-full=28
repo1-retention-full-type=time
## long-term archive, retention of 1000 days
...
repo2-gcs-bucket=gcp-archive
repo2-retention-full=1000
repo2-retention-full-type=time
[my_stanza_name]
pg1-path=/var/lib/postgresql/14/main/
With this, the archive-push command will push each WAL segment to both repositories at the same time.
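A quick way to verify that both repositories are receiving archives (a sketch; these are standard pgBackRest commands, output details may vary by version):

```shell
# Validate archiving and backup configuration against both repositories,
# then show per-repo backup and archive status.
pgbackrest --stanza=my_stanza_name check
pgbackrest --stanza=my_stanza_name info
```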
Regards,
from pgbackrest.
Thanks pgstef for the very quick reply.
But the question remains about the expiration (retention).
I created 2 stanzas because I would like to avoid keeping 1000 days' worth of WAL files...
I only need 28 days of PITR for the disaster_recovery stanza.
I suspect that with the config you proposed above, the WAL would be kept for 1000 days, right? Because the higher value would keep it from expiring?
No, you would have 1000 days of archives in the repository where the oldest backup is 1000 days old and 28 days of archives in the repository where the oldest backup is 28 days old.
As mentioned, WAL archives are expired based on oldest backup in the repo.
There's also repo-retention-archive, which lets you reduce the number of WAL files in a repo, but it's way more complicated to use when you're using full backup time expiration. Modifying the WAL archiving retention method is, IMO, an advanced feature that should be used very carefully.
Ah yes, right, there would be 2 separate WAL archive subfolders, one in each storage bucket.
So there is no option to shrink the size of stored WALs for the long-term archive repo. 1000 days it is.
I think it would be a good use case for a new pgbackrest feature to allow a snapshot to be taken. Let's say just a full backup with a few WAL files bundled together (15 minutes' worth? because PostgreSQL requires that, I think) as a single point in time. One could then take such a snapshot every 24 hours for the sake of the long-term archive.
Not sure if it's possible from a technical point of view without stopping the db.
So there is no option to shrink the size of stored WALs for the long-term archive repo. 1000 days it is.
Yes, there's a way. I wrote it in my last comment -> repo-retention-archive. If you don't need PITR capability between the backups in the repository, you can change the archive retention setting. But as also mentioned, mixing this with full backup time retention makes it very tricky.
I think it would be a good use case for a new pgbackrest feature to allow a snapshot to be taken. Let's say just a full backup with a few WAL files bundled together (15 minutes' worth? because PostgreSQL requires that, I think) as a single point in time. One could then take such a snapshot every 24 hours for the sake of the long-term archive.
Those snapshots, as you call them, are basically incr/diff backups. WAL files are at the root of Postgres's write mechanism. You need them for consistency, crash recovery, physical backups, replication,... You can't just throw them away or mess with them. What you can do is take incr or diff backups so you don't need to replay the WAL entries between the full backup and those incr/diff backups. From my perspective, what you're asking for is already there.
Not sure if it's possible from a technical point of view without stopping the db.
It is, that's the magic of WAL files. You need them for your backup/snapshot consistency, at least those generated while the db files were being copied. The others (generated after the backup) are there for PITR.
But for sure, if you want multiple repositories, you'll need to have backups in both (and the backup command only works per repository), and achieving some tricky retention configuration using the time retention type will definitely give you headaches. It's way easier to combine count-based retention with how often/when the backups are scheduled.
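A count-based schedule could be sketched as a crontab fragment like this (hypothetical times; the stanza name comes from the config above):

```shell
# /etc/cron.d/pgbackrest (sketch): weekly fulls to both repos, staggered,
# plus daily diffs to the disaster-recovery repo only.
30 1 * * 0   postgres  pgbackrest --stanza=my_stanza_name --repo=1 --type=full backup
30 3 * * 0   postgres  pgbackrest --stanza=my_stanza_name --repo=2 --type=full backup
30 1 * * 1-6 postgres  pgbackrest --stanza=my_stanza_name --repo=1 --type=diff backup
```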
Yes, there's a way. I wrote it in my last comment -> repo-retention-archive. If you don't need PITR capability between the backups in the repository, you can change the archive retention setting. But as also mentioned, mixing this with full backup time retention makes it very tricky.
Ok, let's say I am going to set up retention for the longterm_archive
backups of 1000 days and repo-retention-archive
of 1 backup. Backups are taken weekly. Will I be able to restore from a backup older than a week using just the longterm-archive
repository? Let's assume the first repo disaster_recovery
is unavailable (data loss or something).
Will I be able to restore from a backup older than a week using just the longterm-archive repository?
The WAL archives needed for backup consistency will never get expired. So yes, you will be able to restore your older backups, but you'll only be able to perform PITR within the last week. You won't be able to recover between 2 backups.
Using pgbackrest restore --repo=2 --type=immediate --set=X
will let you do that. And what's even better is that the archive-get
command will work on both repos in priority order: it will first look for the WAL archive in repo1 and, if missing, in repo2. So if storage is faster on repo1, you'll get faster recovery too.
To understand what it means to restore a backup with only a few WAL archives (the ones needed for consistency), you can do some research on Postgres recovery_target='immediate'.
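Put together, restoring an older backup from the long-term repo only could look like this (a sketch; the backup set label is hypothetical, pick a real one from pgbackrest info):

```shell
# Stop the cluster first, then restore the chosen backup set from repo2 and
# recover only to backup consistency (no PITR between backups).
pgbackrest --stanza=my_stanza_name --repo=2 --type=immediate \
  --set=20240101-010101F restore
```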
If disk space is at a premium, then this setting, in conjunction with repo-retention-archive-type, can be used to aggressively expire WAL segments. However, doing so negates the ability to perform PITR from the backups with expired WAL and is therefore not recommended.
Sounds like my scenario. I want this setting only for my second repo, the long-term one. No need for PITR; this repo just holds a backup taken once a week, and whether it's restorable to any point within that week I don't really care (law regulation). Because of the requirement to store it for years, the WAL files would grow too much.
WAL segments required to make a backup consistent are always retained until the backup is expired regardless of how this option is configured.
So pgbackrest would automagically keep the minimum number of WAL files. Perfect!
Would such a config be valid?
..
## disaster recovery, 28 days retention period
...
repo1-gcs-bucket=gcp-disasterrecovery
repo1-retention-full=28
repo1-retention-full-type=time
## long-term archive, retention of 1000 days
...
repo2-gcs-bucket=gcp-archive
repo2-retention-full=1000
repo2-retention-full-type=time
repo2-retention-archive=1
[my_stanza_name]
pg1-path=/var/lib/postgresql/14/main/
I am aiming to have WAL cleared after 1 day for repo2 (it would actually be 1 week, I suppose, because that's how often the repo2 backup is taken?) and keep the full backups for 1000 days.
I am asking because the documentation says:
https://pgbackrest.org/configuration.html#section-repository/option-repo-retention-archive
If repo-retention-full-type is time, then this value will default to removing archives that are earlier than the oldest full backup retained after satisfying the repo-retention-full setting.
So would it really expire WAL files for repo 2 older than 1 week or rather older than 1000 days?
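One low-risk way to check what would actually be removed is the expire command's dry-run mode (a sketch; --dry-run is available in recent pgBackRest versions):

```shell
# Show what expire would remove from repo2 without deleting anything.
pgbackrest --stanza=my_stanza_name --repo=2 --dry-run expire
```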
I am aiming to have WAL cleared after 1 day for repo2 (it would actually be 1 week, I suppose, because that's how often the repo2 backup is taken?) and keep the full backups for 1000 days.
The expire is based on backups, so unless you take more frequent backups, it will probably be 1 week, yes.
What you're actually describing is a use case where you'd have 1 repo only to take backups, without archives (just those needed for backup consistency). That means "disabling archiving on a specific repository". We would like to have that kind of feature, but we're not there yet.
I am asking because the documentation says: https://pgbackrest.org/configuration.html#section-repository/option-repo-retention-archive
If repo-retention-full-type is time, then this value will default to removing archives that are earlier than the oldest full backup retained after satisfying the repo-retention-full setting.
That's what I wrote at least twice: reducing the number of WAL files in a repo is way more complicated when you're using full backup time expiration.
So would it really expire WAL files for repo 2 older than 1 week or rather older than 1000 days?
Honestly, looking at the code, I believe it would work (only keeping the WAL archives after the latest backup). But that needs to be tested to really confirm it (and testing time expiration is not that easy, tbh).
Regards
Thanks pgstef, for the detailed explanation.
I don't really care whether it expires by time or by count (as I know the frequency, it's the same for me; I can recalculate it).
I understand these are 2 different pieces of code.
I can test it by setting up the above and then seeing whether WAL files were indeed expired (I think such an expiration would be logged). Will post results here.
I can confirm that after a week I get in the expire.log:
2024-05-05 03:56:39.997 P00 INFO: repo2: 14-1 remove archive, start = 0000000100001ABF, stop = 0000000100001AC000000039
Looks like it works, as the archive in the ltsarchive GCS bucket is much smaller than the disasterrecovery one.
What could be better is the messaging:
- when the criteria are not met you get: INFO: repo1: time-based archive retention not met - archive logs will not be expired
- when the criteria are met you don't get such a message. Not very convenient for grepping logs. I hoped for a "will be expired" message.
I am very happy it works; the saved space for a week is 2 TB just for one DB. Thanks a lot for your help!
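For grepping, the removal line that is logged can be matched with a pattern like this (a sketch; the message format is taken from the log line above, and the log path will depend on your log-path setting):

```shell
# Match pgBackRest's "remove archive" expire messages; shown here against a
# sample line so the pattern itself can be checked.
sample='2024-05-05 03:56:39.997 P00 INFO: repo2: 14-1 remove archive, start = 0000000100001ABF, stop = 0000000100001AC000000039'
echo "$sample" | grep -E 'repo[0-9]+: [0-9.]+-[0-9]+ remove archive'
# Against a real install, something like:
#   grep -E 'remove archive' /var/log/pgbackrest/*-expire.log
```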
when the criteria are met you don't get such a message. Not very convenient for grepping logs. I hoped for a "will be expired" message
Hmm, but I think that's what we do, e.g.
repo1: 9.4-1 remove archive, start = 000000020000000000000006, stop = 000000020000000000000006