Code Monkey home page Code Monkey logo

Comments (1)

Zygo avatar Zygo commented on June 7, 2024 2

The README lists flushoncommit as a tested configuration, not a requirement. Neither flushoncommit setting conflicts with bees, so you can use either mount option with bees.

I don't test bees on filesystems with noflushoncommit because the data loss that normally occurs on crashes with noflushoncommit creates noise results in testing that make it difficult to determine whether any data loss that is found was expected btrfs behavior, or some bug in either btrfs or bees.

The default (noflushoncommit) makes btrfs performance competitive in filesystem benchmarks, but it's very unforgiving when there is a crash. In addition to the usual empty files that get left behind on a crash on other filesystems like ext4, btrfs noflushoncommit also injects hole extents into the middle of files if delayed allocation causes a reordering of extent flushes around a transaction commit, and a crash occurs between the transaction commit and the delayed flushes. This gets worse on larger/busier systems because the flushes take more time to execute, so the percentage of time when data is at risk rapidly reaches 100%, and the locations of the holes injected into files become less predictable. In other words, data corruption (as opposed to loss) on a crash is certain with noflushoncommit, the only variables are how much data and which parts of which files are damaged.

Because of the problems with noflushoncommit, I don't recommend using btrfs without flushoncommit except in specific circumstances:

  • Dedicated application servers can use flushoncommit safely if all writes are known to be explicitly persisted with fsync, and lost file writes can be detected and tolerated by all applications on the server. Database engines and sqlite can meet this requirement, but tools like 'cp' and 'rsync' do not.
  • Another solution is to delete all files that were written just before a crash. A build server can do this by erasing the workspace of an interrupted build and restarting builds from a fresh checkout after a reboot. A build in progress would likely be corrupted in ways that are difficult to detect, especially if GNU ld is involved.
  • An end-user running a collection of desktop productivity applications without rigorous automated supervision is just rolling the dice on data loss. Such users should always mount with flushoncommit.

bees does call fsync() after creating temporary extents for dedup src, and all of its temporary files are created with O_TMPFILE so they are automatically removed on reboots; however, there is still a race condition when noflushoncommit is used that has no solution:

  • two extents are identified as duplicate by bees
  • some non-bees process writes identical data to the src extent, replacing the src extent with delalloc blocks in memory
  • bees dedups the extent (with dedup_file_range ioctl), replacing a reference to a flushed dst extent on disk with a reference to a non-flushed src extent in memory. In the previous step we required identical data so the dedup is still accepted by the kernel
  • a btrfs commit occurs, deleting the old dst ref, but not flushing the new src ref because noflushoncommit
  • system crashes before the flush occurs

With noflushoncommit, in theory the above sequence could replace the duplicate extent with a hole (note: I haven't tested noflushoncommit with bees, so I have not reproduced this in practice, and there may be some other mechanism in btrfs that prevents this case). flushoncommit does not allow this to happen as the flush and commit occur within the same transaction.

Probably the best approach would be to fix the bug introduced in kernel commit ce8ea7cc6eb3 ("btrfs: don't call btrfs_start_delalloc_roots in flushoncommit"). See also https://marc.info/?l=linux-btrfs&m=151315564008773. This bug does not affect LTS kernels (i.e. 4.14.x) so it can also be avoided by using one of those.

To date I haven't seen evidence of this warning causing any problems in practice; however, I wouldn't expect it to be a problem unless a umount is attempted with a filesystem under heavy write load, and I would expect the effect in that case to be limited to a kernel crash or maybe a deadlock.

from bees.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.