Comments (3)
Hello @egyptianbman. When I originally read your analysis, it sounded correct. Thanks to @PhanLe1010 (who I discussed it with) for clueing me in to the following!
When Longhorn purges snapshots after a snapshot deletion, the following steps are triggered:
- processRemoveSnapshot is called on any eligible snapshots marked as removed: https://github.com/longhorn/longhorn-engine/blob/182b37c15825eafabb52f0b1cd462eaa297b9252/pkg/sync/rpc/server.go#L1124-L1126.
- processRemoveSnapshot calls DiskPrepareRemove to determine what operations to perform on the eligible snapshot: https://github.com/longhorn/longhorn-engine/blob/182b37c15825eafabb52f0b1cd462eaa297b9252/pkg/sync/rpc/server.go#L1318-L1320.
- In the case we are discussing, the oldest snapshot has one child which is not the head. (In Longhorn, the volume head has a parent, which is the most recent snapshot. This snapshot has a parent that is the next most recent snapshot, etc.) Two ops are queued up: https://github.com/longhorn/longhorn-engine/blob/182b37c15825eafabb52f0b1cd462eaa297b9252/pkg/replica/replica.go#L514-L531.
  - Coalesce the disk (oldest snapshot) and its child.
  - Replace the disk (oldest snapshot) with its child.
- processRemoveSnapshot calls FoldFile to complete the coalesce operation: https://github.com/longhorn/longhorn-engine/blob/182b37c15825eafabb52f0b1cd462eaa297b9252/pkg/sync/rpc/server.go#L1327-L1332.
  - FoldFile calls coalesce, which COPIES DATA FROM THE CHILD TO THE PARENT. So the parent (oldest snapshot) receives data from its child: https://github.com/longhorn/longhorn-engine/blob/182b37c15825eafabb52f0b1cd462eaa297b9252/vendor/github.com/longhorn/sparse-tools/sparse/sfold.go#L100-L110.
- processRemoveSnapshot calls replaceDisk to complete the replace operation: https://github.com/longhorn/longhorn-engine/blob/182b37c15825eafabb52f0b1cd462eaa297b9252/pkg/sync/rpc/server.go#L1339-L1344.
  - ReplaceDisk calls hardlinkDisk to remove the child (whose data has been copied to the parent) and create a hard link from the child's name to the parent (oldest snapshot): https://github.com/longhorn/longhorn-engine/blob/182b37c15825eafabb52f0b1cd462eaa297b9252/pkg/replica/replica.go#L358-L375.
  - ReplaceDisk calls removeDiskNode, which reconfigures the snapshot chain: https://github.com/longhorn/longhorn-engine/blob/182b37c15825eafabb52f0b1cd462eaa297b9252/pkg/replica/replica.go#L358-L375.
  - ReplaceDisk calls rmDisk to delete the parent file. Since there is now a hard link from the child's name to the parent, the data isn't actually gone, just the original link/reference: https://github.com/longhorn/longhorn-engine/blob/182b37c15825eafabb52f0b1cd462eaa297b9252/pkg/replica/replica.go#L824-L847
So, to make a long story short, even though the oldest snapshot file appears to be removed in favor of the next oldest snapshot file, data is actually copied in reverse (from the newer snapshot file to the older one). Then, some file system metadata operations take place to make things look "right".
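The copy-then-relink sequence can be tried out on ordinary files. This is only a rough sketch and not the longhorn-engine code: the function names coalesce and replace_disk, the 4-byte block size, and the "an all-zero block means no data" rule are assumptions made for illustration (the real implementation works on sparse file extents).

```python
import os
import tempfile

BLOCK = 4  # toy block size in bytes (assumption; real blocks are far larger)


def coalesce(parent_path, child_path):
    """Copy every non-empty block of the child into the parent, in place."""
    copied = 0
    with open(child_path, "rb") as child, open(parent_path, "r+b") as parent:
        offset = 0
        while True:
            block = child.read(BLOCK)
            if not block:
                break
            # Toy stand-in for "this extent holds data": any non-zero byte.
            if block.strip(b"\x00"):
                parent.seek(offset)
                parent.write(block)
                copied += len(block)
            offset += len(block)
    return copied


def replace_disk(parent_path, child_path):
    """Point the child's name at the merged parent data, then drop the parent's name."""
    os.remove(child_path)              # the child's data already lives in the parent
    os.link(parent_path, child_path)   # the child's name now refers to the parent's data
    os.remove(parent_path)             # only the old name goes away; the data stays


def demo():
    with tempfile.TemporaryDirectory() as d:
        snap1 = os.path.join(d, "snap1.img")  # oldest snapshot (parent)
        snap2 = os.path.join(d, "snap2.img")  # its child
        with open(snap1, "wb") as f:
            f.write(b"AAAA" + b"BBBB" + b"CCCC")
        with open(snap2, "wb") as f:
            f.write(b"\x00" * 4 + b"bbbb" + b"\x00" * 4)  # only block 1 was rewritten
        copied = coalesce(snap1, snap2)
        replace_disk(snap1, snap2)
        with open(snap2, "rb") as f:
            merged = f.read()
        return copied, os.path.exists(snap1), merged
```

Calling demo() copies only the 4 bytes that the child changed, and afterwards the name snap1.img is gone while snap2.img holds the merged contents.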
I would not expect the feature you are requesting to speed up this process. If you have the following snapshots:
- snap1 (contains 1 TiB of writes)
- snap2 (contains 5 GiB of writes)
- snap3 (contains 5 GiB of writes)
- head (contains 5 GiB of writes)
Both of these operations are virtually identical from a Longhorn perspective:
- Deleting snap1 actually involves copying 5 GiB of data from snap2 to snap1, then "renaming" the resulting file to snap2.
- Deleting snap2 actually involves copying 5 GiB of data from snap3 to snap2, then "renaming" the resulting file to snap3.
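As a back-of-the-envelope check of that comparison, here is a small sketch. The helper bytes_copied_on_delete and the list-of-tuples chain layout are hypothetical and only model the case described above, where the deleted snapshot's child is another snapshot rather than the volume head.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def bytes_copied_on_delete(chain, name):
    """Return how much data the coalesce step copies when `name` is deleted.

    chain: list of (name, bytes_written) tuples, oldest first, volume head last.
    Only the case where the child is another snapshot is modeled here.
    """
    names = [n for n, _ in chain]
    i = names.index(name)
    if i + 1 >= len(chain) - 1:
        raise ValueError("child is the volume head (or name is the head); not modeled")
    _child_name, child_size = chain[i + 1]
    # The child's data is folded into the snapshot being deleted,
    # so the cost follows the child's size, not the deleted snapshot's size.
    return child_size


chain = [
    ("snap1", 1 * TIB),
    ("snap2", 5 * GIB),
    ("snap3", 5 * GIB),
    ("head", 5 * GIB),
]
cost_snap1 = bytes_copied_on_delete(chain, "snap1")
cost_snap2 = bytes_copied_on_delete(chain, "snap2")
```

Both cost_snap1 and cost_snap2 come out as 5 GiB, even though snap1 itself holds 1 TiB.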
Please chime in to keep me honest if I've said something incorrect @PhanLe1010 and @longhorn/dev-data-plane!
from longhorn.
Thank you so much for such a thorough response! Are the snapshots always growing since longhorn is essentially merging oldest-1 into oldest? Is oldest then actually always the delta from the first snapshot to the oldest-1?
> Are the snapshots always growing since longhorn is essentially merging oldest-1 into oldest?
Each snapshot has a maximum size that is the nominal size of the volume. If oldest-1 contains writes to blocks that were not in oldest, then the actual space consumed by oldest increases accordingly during the coalescing operation. However, if oldest is already consuming the space of the nominal size of the volume, the coalescing operation simply overwrites blocks.
In the "worst" case, coalescing does not reduce the actual size, because oldest contained some blocks and oldest-1 contained other blocks. Coalescing deletes a file, but all of the data from the snapshots must be retained. In most cases, oldest-1 mostly contains changed blocks that are also in oldest. So coalescing allows us to get rid of the outdated copy of the blocks in oldest, reducing actual space consumption.
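To make the space accounting concrete, here is a toy model in which each snapshot is just the set of block indices it has written. The function coalesce_sizes and the 8-block volume are assumptions for illustration, not anything Longhorn exposes.

```python
def coalesce_sizes(oldest, child):
    """Return (blocks used before, blocks used after) coalescing child into oldest.

    oldest, child: sets of block indices that each snapshot has written.
    """
    before = len(oldest) + len(child)  # two files, each storing its own blocks
    after = len(oldest | child)        # one file; overlapping blocks are stored once
    return before, after


NOMINAL = set(range(8))  # toy volume with 8 blocks

# Typical case: the child rewrote blocks the oldest snapshot already had.
typical = coalesce_sizes({0, 1, 2, 3}, {2, 3})

# "Worst" case: no overlap, so nothing can be discarded.
worst = coalesce_sizes({0, 1}, {2, 3})

# Oldest already spans the whole volume: blocks are only overwritten.
full = coalesce_sizes(NOMINAL, {1, 2})
```

In the typical case 6 blocks shrink to 4, in the worst case 4 stay 4, and in the full case the result can never exceed the 8 blocks of the nominal size.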
> Is oldest then actually always the delta from the first snapshot to the oldest-1?
I'm not sure I follow.
- oldest is a self-contained snapshot of the volume as it existed at some point in the past. Maybe there were previous snapshots, but all history of them has been lost during previous coalescing operations.
- oldest-1 has the changes to the volume since oldest. If you had oldest-1 but not oldest, you would have only a corrupted mess of blocks that almost certainly would not contain a valid file system.
- oldest-2 has the changes to the volume since oldest-1. If you had oldest-2 but not oldest AND oldest-1, you would have only a corrupted mess of blocks that almost certainly would not contain a valid file system.
- volume-head has the changes to the volume since the most recent snapshot. If you had volume-head but not all of the previous snapshots, you would have only a corrupted mess of blocks that almost certainly would not contain a valid file system.
In some sense, oldest is the most important snapshot in the chain. If you have it, you have SOMETHING useful, even if it is out-of-date. All of the others are just differences from the previous one. It would be very unlikely (though technically possible) for your data to be intact if you lost the oldest snapshot (outside of intentional coalescing operations performed by Longhorn, of course).
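One way to picture why the later layers are useless on their own is to sketch how a read resolves through the chain. This is a simplified model, not Longhorn's read path: read_block, read_volume, and the dict-per-layer layout are hypothetical.

```python
def read_block(chain, index):
    """Return the newest copy of a block, or None if no layer has it.

    chain: list of dicts mapping block index -> data, oldest first, head last.
    """
    for layer in reversed(chain):  # newest layer wins
        if index in layer:
            return layer[index]
    return None


def read_volume(chain, nblocks):
    """Read every block of the volume through the chain."""
    return [read_block(chain, i) for i in range(nblocks)]


oldest = {0: "a0", 1: "b0", 2: "c0"}  # self-contained: holds every block
oldest_1 = {1: "b1"}                  # only the blocks changed since oldest
head = {2: "c2"}                      # only the blocks changed since oldest_1

complete = read_volume([oldest, oldest_1, head], 3)
without_oldest = read_volume([oldest_1, head], 3)
```

With the full chain every block resolves (complete is ["a0", "b1", "c2"]), while dropping oldest leaves a hole at block 0 (without_oldest is [None, "b1", "c2"]).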