The README for bees indicates that filesystems should be mounted with the flushoncommi

WARN when filesystem is mounted with flushoncommit — best practices? about bees HOT 1 OPEN

zygo commented on June 7, 2024

WARN when filesystem is mounted with flushoncommit — best practices?

from bees.

Comments (1)

Zygo commented on June 7, 2024 2

The README lists flushoncommit as a tested configuration, not a requirement. Neither flushoncommit setting conflicts with bees, so you can use either mount option with bees.

I don't test bees on filesystems with noflushoncommit because the data loss that normally occurs on crashes with noflushoncommit creates noise results in testing that make it difficult to determine whether any data loss that is found was expected btrfs behavior, or some bug in either btrfs or bees.

The default (noflushoncommit) makes btrfs performance competitive in filesystem benchmarks, but it's very unforgiving when there is a crash. In addition to the usual empty files that get left behind on a crash on other filesystems like ext4, btrfs noflushoncommit also injects hole extents into the middle of files if delayed allocation causes a reordering of extent flushes around a transaction commit, and a crash occurs between the transaction commit and the delayed flushes. This gets worse on larger/busier systems because the flushes take more time to execute, so the percentage of time when data is at risk rapidly reaches 100%, and the locations of the holes injected into files become less predictable. In other words, data corruption (as opposed to loss) on a crash is certain with noflushoncommit, the only variables are how much data and which parts of which files are damaged.

Because of the problems with noflushoncommit, I don't recommend using btrfs without flushoncommit except in specific circumstances:

Dedicated application servers can use flushoncommit safely if all writes are known to be explicitly persisted with fsync, and lost file writes can be detected and tolerated by all applications on the server. Database engines and sqlite can meet this requirement, but tools like 'cp' and 'rsync' do not.
Another solution is to delete all files that were written just before a crash. A build server can do this by erasing the workspace of an interrupted build and restarting builds from a fresh checkout after a reboot. A build in progress would likely be corrupted in ways that are difficult to detect, especially if GNU ld is involved.
An end-user running a collection of desktop productivity applications without rigorous automated supervision is just rolling the dice on data loss. Such users should always mount with flushoncommit.

bees does call fsync() after creating temporary extents for dedup src, and all of its temporary files are created with O_TMPFILE so they are automatically removed on reboots; however, there is still a race condition when noflushoncommit is used that has no solution:

two extents are identified as duplicate by bees
some non-bees process writes identical data to the src extent, replacing the src extent with delalloc blocks in memory
bees dedups the extent (with dedup_file_range ioctl), replacing a reference to a flushed dst extent on disk with a reference to a non-flushed src extent in memory. In the previous step we required identical data so the dedup is still accepted by the kernel
a btrfs commit occurs, deleting the old dst ref, but not flushing the new src ref because noflushoncommit
system crashes before the flush occurs

With noflushoncommit, in theory the above sequence could replace the duplicate extent with a hole (note: I haven't tested noflushoncommit with bees, so I have not reproduced this in practice, and there may be some other mechanism in btrfs that prevents this case). flushoncommit does not allow this to happen as the flush and commit occur within the same transaction.

Probably the best approach would be to fix the bug introduced in kernel commit ce8ea7cc6eb3 ("btrfs: don't call btrfs_start_delalloc_roots in flushoncommit"). See also https://marc.info/?l=linux-btrfs&m=151315564008773. This bug does not affect LTS kernels (i.e. 4.14.x) so it can also be avoided by using one of those.

To date I haven't seen evidence of this warning causing any problems in practice; however, I wouldn't expect it to be a problem unless a umount is attempted with a filesystem under heavy write load, and I would expect the effect in that case to be limited to a kernel crash or maybe a deadlock.

from bees.

WARN when filesystem is mounted with flushoncommit — best practices? about bees HOT 1 OPEN

Comments (1)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent