refs=0, does it make sense? (bees issue, open, 14 comments)

zygo commented on June 1, 2024

refs=0, does it make sense?

from bees.

Comments (14)

Zygo commented on June 1, 2024

refs=0 occurs when the hash table contains an entry for an extent that has been deleted after it was added to the hash table. The extent lookup returns 0 refs. bees removes the entry from the hash table and proceeds to the next match or next new block.
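Here is a minimal bash sketch of the refs=0 path described above. It assumes a toy hash table kept in a bash associative array, plus a hypothetical `count_refs` stub that stands in for the LOGICAL_INO lookup. `HASH_TABLE`, `count_refs` and `try_match` are illustrative names, not bees internals; bees itself is written in C++.

```bash
#!/usr/bin/env bash
# Toy model of refs=0 handling. HASH_TABLE, DELETED_ADDRS, count_refs and
# try_match are hypothetical names used only for illustration (bash 4+).
declare -A HASH_TABLE   # block hash -> extent address

# Stub for the LOGICAL_INO lookup: prints the reference count for an
# address. Addresses listed in DELETED_ADDRS (space-separated) have 0 refs.
DELETED_ADDRS=" "
count_refs() {
    case "$DELETED_ADDRS" in
        *" $1 "*) echo 0 ;;
        *)        echo 1 ;;
    esac
}

# try_match KEY: prints the stored address if the extent is still
# referenced. If the extent has 0 refs, the stale entry is removed and
# the function fails, so the caller moves on to the next match.
try_match() {
    local key=$1 addr refs
    addr=${HASH_TABLE[$key]-}
    [ -n "$addr" ] || return 1            # hash not in the table
    refs=$(count_refs "$addr")
    if [ "$refs" -eq 0 ]; then
        unset -v 'HASH_TABLE[$key]'       # extent was deleted: evict entry
        return 1
    fi
    echo "$addr"                          # still referenced: usable match
}
```

Because a stale entry is evicted the first time it is hit, the same dead extent does not cost a second lookup later.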

We should probably have a command-line option to set the time to generate the PERFORMANCE warning. Not all filesystems can resolve in less than 2.5 seconds, especially busy filesystems on slow spinning media.
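To illustrate a threshold that is an argument rather than a built-in constant, the bash sketch below times a command and prints a PERFORMANCE-style line only when the command exceeds a limit the caller supplies. The function name, its arguments and the message format are hypothetical. They are not bees options. The sketch relies on GNU `date` for sub-second timestamps.

```bash
#!/usr/bin/env bash
# warn_if_slow LIMIT_SECONDS LABEL COMMAND [ARGS...]
# Runs COMMAND. If it takes longer than LIMIT_SECONDS (may be fractional),
# prints a warning naming LABEL on stderr. Returns COMMAND's exit status.
warn_if_slow() {
    local limit=$1 label=$2 start end elapsed rc
    shift 2
    start=$(date +%s.%N)          # GNU date: seconds with nanoseconds
    "$@"
    rc=$?
    end=$(date +%s.%N)
    elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
    # awk does the floating-point comparison; exit 0 means "too slow"
    if awk -v t="$elapsed" -v l="$limit" 'BEGIN { exit !(t > l) }'; then
        echo "PERFORMANCE: $elapsed sec: $label" >&2
    fi
    return "$rc"
}
```

A slow spinning array might use a limit of 5 while an SSD keeps 2.5. The point is that the limit lives with the caller, not in the code.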

kakra commented on June 1, 2024

Well, this is a btrfs on 3x 7200rpm drives with 400G bcache in front of it. And it doesn't look like I see more or less of these performance warnings regardless of what I'm currently doing with the system. It's more like bees makes itself busy, especially while working on big container VMs.

Zygo commented on June 1, 2024

Large files are also a bad case for LOGICAL_INO. A lot of things seem to be bad cases for LOGICAL_INO.

If you look up the blocks near the above, e.g.

for x in $(seq -32 32); do btrfs ins log $((x * 4096 + 0x3204ada000)) /run/bees/mnt/26a832d7-73a5-4de5-b272-69f73e1daf5e; done

do you find any references to a large file?
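The loop above probes 65 addresses: the suspect address plus 32 blocks of 4 KiB on either side. The bash sketch below prints that address list in hex so the searched range is explicit. `probe_addrs` is a hypothetical helper. Each value it prints could then be passed to `btrfs inspect-internal logical-resolve` (the long form of `btrfs ins log`), as in the original command.

```bash
#!/usr/bin/env bash
# probe_addrs BASE [RADIUS] [BLOCK]
# Prints BASE + x*BLOCK in hex for x = -RADIUS..RADIUS
# (defaults: 32 blocks of 4096 bytes on each side).
probe_addrs() {
    local base=$(( $1 )) radius=${2:-32} block=${3:-4096} x
    for (( x = -radius; x <= radius; x++ )); do
        printf '0x%x\n' $(( base + x * block ))
    done
}
```

For 0x3204ada000 the range runs from 0x3204aba000 to 0x3204afa000, which is 0x20000 bytes (128 KiB) on each side.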

kakra commented on June 1, 2024

Actually, nothing:

# time for x in $(seq -32 32); do btrfs ins log $((x * 4096 + 0x3204ada000)) /run/bees/mnt/26a832d7-73a5-4de5-b272-69f73e1daf5e; done

real    2m38,952s
user    0m24,895s
sys     2m22,522s
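Dividing the timing above by the 65 lookups gives a rough per-call figure. The output uses a decimal comma from the locale, so 2m38,952s is 158.952 s. The helper below only does that arithmetic. It assumes each iteration's cost is dominated by the `btrfs` call, which the large `sys` figure suggests.

```bash
#!/usr/bin/env bash
# avg_per_call TOTAL_SECONDS CALLS -> average seconds per call, 2 decimals
avg_per_call() {
    awk -v t="$1" -v n="$2" 'BEGIN { printf "%.2f\n", t / n }'
}
```

`avg_per_call 158.952 65` gives 2.45 s of wall time per lookup, and `avg_per_call 142.522 65` gives 2.19 s of system time per lookup. Both are right around the 2.5 s PERFORMANCE threshold.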

I guess that performance warning should probably also log the subvolid, because btrfs ins log only operates on the specified subvolume, doesn't it? At least btrfs ins ino works that way...

Zygo commented on June 1, 2024

Some statistics I compiled on PERFORMANCE warnings:

$ for x in stats/beestest-*/log.txt; do echo "--- $x ---"; sed -nr 's/.*PERFORMANCE.*sec:.([^ ]+).*/\1/p' < "$x" | sort | uniq -dc | sort -nr; done
 
--- 4x SSD single/dup high traffic ---
      6 Searching
      4 pread(fd 
      4 Fetching
      2 syncing 
 
--- WD Black single/dup 2TB high traffic ---
  40097 Reading
  13352 Resolving
   4662 grow  
   3994 syncing
   3438 open_root_ino(root
   2792 Searching
   2791 dedup
    802 BeesAddress(fd
    209 pread(fd
    209 Fetching
     56 Resizing
     38 creating
     28 pwrite(fd
     14 Saving
 
--- WD Green RAID1 969GB low traffic ---
     63 dedup  
     50 pread(fd
     50 Fetching
     16 grow
      9 Resolving
      5 syncing  
 
--- WD Green RAID1 16GB low traffic ---
      9 pread(fd
      9 Fetching
      3 pwrite(fd

Are yours significantly different?
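The pipeline above has one caveat. `uniq -d` prints only lines that occur more than once, so a category that produced exactly one PERFORMANCE warning is silently dropped from the counts. A variant that uses plain `uniq -c` keeps those categories. The function name is hypothetical, and the `sed` expression is unchanged from the original.

```bash
#!/usr/bin/env bash
# perf_counts: reads log text on stdin and prints "COUNT WORD" for the
# first word after "sec:" on every PERFORMANCE line, including words that
# occur only once (uniq -c rather than uniq -dc).
perf_counts() {
    sed -nr 's/.*PERFORMANCE.*sec:.([^ ]+).*/\1/p' | sort | uniq -c | sort -nr
}
```

It can replace the tail of either the `log.txt` loop or the `journalctl` pipeline, for example `journalctl -u beesd@<UUID> | perf_counts`.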

Zygo commented on June 1, 2024

btrfs ins log operates on the extent tree, so there is no subvol in the input. The output of LOGICAL_INO is a list of (subvol, ino, offset) tuples.
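To see which subvolumes an extent is referenced from, `btrfs inspect-internal logical-resolve -P` skips path resolution and prints the raw tuples. To the best of my knowledge, current btrfs-progs prints one `inode N offset N root N` line per reference, but check this against your version. The hypothetical helper below tallies those references per root (subvol id).

```bash
#!/usr/bin/env bash
# refs_per_root: reads "inode I offset O root R" lines on stdin and prints
# "COUNT ROOT" for each subvolume id, most-referenced first.
refs_per_root() {
    awk '$1 == "inode" && $3 == "offset" && $5 == "root" { n[$6]++ }
         END { for (r in n) print n[r], r }' | sort -k1,1nr -k2,2n
}
```

Usage would follow the earlier loop, for example `btrfs ins log -P $((0x3204ada000)) /run/bees/mnt/26a832d7-73a5-4de5-b272-69f73e1daf5e | refs_per_root`.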

Maybe we could get some useful information by running BEESLOGTRACE if a resolve takes too long. That would at least tell us which file we were attempting to match.

from bees.

kakra commented on June 1, 2024

I think I can pull those statistics from the systemd journal:

$ journalctl -u beesd@26a832d7-73a5-4de5-b272-69f73e1daf5e | sed -nr 's/.*PERFORMANCE.*sec:.([^ ]+).*/\1/p' | sort | uniq -dc | sort -nr
  59982 Resolving
  20457 grow
    164 dedup
     65 pwrite(fd
     35 Saving
     13 open_root_ino(root
     10 Searching
      8 pread(fd
      7 find_cell
      6 syncing
      6 fetch_missing_extent
      5 Reading
      5 push_random_hash_addr
      3 Resizing
      2 Fetching
      2 erase
      2 BeesAddress(fd

It's 400G bcache in front of 3x Samsung F1 1TB dRAID0/dRAID1.

I have another system running which shows these numbers:

# journalctl -u beesd@5dbd3587-a3ac-441c-a14c-b1aa073c5123 | sed -nr 's/.*PERFORMANCE.*sec:.([^ ]+).*/\1/p' | sort | uniq -dc | sort -nr
  16745 Resolving
   7524 grow
   1280 dedup
     14 syncing
      7 creating
      6 pwrite(fd

That is single/dup on VMware ESXi, on top of RAID5 or RAID50 (currently not sure) with hardware SSD cache RAID1.

BTW: Both systems are mostly idle, except some Gentoo updates were run. The first machine has some light desktop workload, the second machine sits idle currently. It's a rescued server system inside a container, with bees running outside of the container. Both have borgbackup running from time to time but that is very fast and finishes within minutes with very low system impact.

kakra commented on June 1, 2024

So it seems that on spinning media, "Resolving" and "grow" stand out with particularly bad performance. Not sure about "grow", but "Resolving" probably needs to be optimized in the kernel.
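The two `journalctl` tallies above can be turned into percentages. The sketch below sums the counts and reports the share taken by the first N categories. It only does arithmetic over the `COUNT WORD` lines those pipelines print, and `top_share` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# top_share N: reads "COUNT WORD" lines (largest first) on stdin and prints
# the percentage of the total contributed by the first N lines.
top_share() {
    awk -v n="$1" '{ total += $1; if (NR <= n) top += $1 }
        END { if (total > 0) printf "%.1f\n", 100 * top / total }'
}
```

Fed the tallies above, `top_share 2` gives 99.6% ("Resolving" plus "grow") for the bcache system and 94.9% for the ESXi system.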

Zygo commented on June 1, 2024

Resolving has long been one of the bottlenecks in bees. It used to be four orders of magnitude slower. Now at least in some storage layer configurations it's not the slowest thing any more, but there's still considerable room for improvement. The reason why bees has the notion of 'toxic' extents at all is mostly because of past poor performance of LOGICAL_INO.

Unfortunately, LOGICAL_INO is a core component of bees that can't be easily replaced or worked around. We could avoid using LOGICAL_INO in the hash table (if we doubled the cell size and became less resilient to things like snapshot deletes), but if we don't hit all the references to an extent at close to the same time then we don't free any disk space.

'grow' is slow for a number of reasons: it does IO in 4K blocks, it allocates and frees memory (with the associated page-table updates) for each block, it reads blocks in backward order, and it does way too many TREE_SEARCH_V2 calls (it's one of the reasons why a substantial amount of bees kernel CPU time is in TREE_SEARCH_V2). Some of those can be fixed in bees.
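The toy bash sketch below makes the I/O pattern concrete by copying the same range of a file in two ways. The first makes one `dd` call per 4 KiB block, starting at the last block and walking backward, which roughly matches the pattern described for 'grow'. The second does a single sequential read of the whole range. Both produce identical bytes. What differs is the number of read calls and the access order. This illustrates the access pattern with GNU `dd`; it is not bees code.

```bash
#!/usr/bin/env bash
BLOCK=4096

# Backward, one block at a time: NBLOCKS separate reads, last block first.
# seek= places each block at its own offset; conv=notrunc keeps earlier writes.
copy_backward_4k() {
    local src=$1 dst=$2 nblocks=$3 i
    for (( i = nblocks - 1; i >= 0; i-- )); do
        dd if="$src" of="$dst" bs="$BLOCK" skip="$i" seek="$i" count=1 \
           conv=notrunc status=none
    done
}

# Forward, a single large read covering the same range.
copy_forward_once() {
    local src=$1 dst=$2 nblocks=$3
    dd if="$src" of="$dst" bs=$(( BLOCK * nblocks )) count=1 \
       iflag=fullblock status=none
}
```

On a cold cache and spinning media, the first pattern turns one seek-friendly read into many small reads in reverse order, which is the access pattern described above.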

Even if we fixed those, 'grow' would still appear slow because it performs all the reads when a duplicate is found, so it's not equivalent to most of the other operations that only read a single 4K block or 16K metadata page. These reads are typically not in cache, so 'grow' is blamed for half the IO of duplicates (more than half, since the destination is guaranteed to be a single extent while the source could be many extents). The 'dedup' call seems faster than it really is because it's rereading stuff that was already pulled into RAM by the 'grow' function, so 'dedup' only gets a PERFORMANCE warning if it gets blocked waiting for a transaction commit.

LOGICAL_INO is slow for assorted reasons I'm only beginning to understand. It does do at least some metadata reads with no caching (which is why bees has an internal LOGICAL_INO cache). It also interacts with transaction commits in (to me) unexpected ways--it will block during a transaction commit the same way dedup and sync calls do. That's surprising because LOGICAL_INO is conceptually a read operation, so I would expect its blocking behavior to be similar to read and TREE_SEARCH_V2 ioctls (which don't block during transaction commit). I think this may be because (begin handwaving guess) LOGICAL_INO accumulates reference locks on everything it touches and doesn't release the locks until it is done, but can't finish until everything it touched gets flushed to disk (end handwaving guess). But this is only theory as I've only been really looking at this for the last week or so.
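Some of the metadata reads behind LOGICAL_INO are not cached. A user-space cache in front of it therefore avoids repeating the same expensive lookup. The bash sketch below memoizes a resolver by address in an associative array. `resolve_uncached` is a hypothetical stand-in for the real ioctl that only counts calls. The sketch also ignores invalidation, which a real cache must handle when extents change.

```bash
#!/usr/bin/env bash
declare -A RESOLVE_CACHE     # address -> cached result (bash 4+)
RESOLVE_CALLS=0              # how many times the slow path ran

# Hypothetical slow resolver standing in for LOGICAL_INO; sets RESULT.
resolve_uncached() {
    RESOLVE_CALLS=$(( RESOLVE_CALLS + 1 ))
    RESULT="refs-for-$1"
}

# resolve_cached ADDR: sets RESULT, calling the slow path only on a miss.
# The result is returned in a global rather than through $(...), so the
# cache and the call counter stay in the current shell, not a subshell.
resolve_cached() {
    local addr=$1
    if [ -n "${RESOLVE_CACHE[$addr]+set}" ]; then
        RESULT=${RESOLVE_CACHE[$addr]}
        return 0
    fi
    resolve_uncached "$addr"
    RESOLVE_CACHE[$addr]=$RESULT
}
```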

kakra commented on June 1, 2024

How did you arrive at the value for BEES_TOO_LONG? Looking at my numbers, raising it to 2.8 should be enough to suppress most of the harmless PERFORMANCE warnings I am seeing.

Zygo commented on June 1, 2024

I started at 10s, then cut it in half twice. It is a pretty arbitrary number that is highly dependent on the performance characteristics of the filesystem's lower storage layers; hence, my earlier suggestion to make it configurable.

kakra commented on June 1, 2024

I wonder if we could detect /sys/block/*/queue/rotational to set a sane default... Maybe 2.5 * (rotational + 1)...
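kakra's formula gives 2.5 s when rotational = 0 and 5 s when rotational = 1. Below is a bash sketch of it. It assumes the flag is read from a whole-disk device's `queue/rotational` file in sysfs. The helper names and the optional sysfs-root argument are hypothetical; the argument exists only so the function can be pointed at a test tree.

```bash
#!/usr/bin/env bash
# too_long_default ROTATIONAL -> 2.5 * (ROTATIONAL + 1) seconds
too_long_default() {
    awk -v r="$1" 'BEGIN { printf "%.1f\n", 2.5 * (r + 1) }'
}

# rotational_flag DEV [SYSFS_ROOT]: prints the 0/1 flag from
# SYSFS_ROOT/block/DEV/queue/rotational (whole-disk devices only).
rotational_flag() {
    local dev=$1 root=${2:-/sys}
    cat "$root/block/$dev/queue/rotational"
}
```

For example, `too_long_default "$(rotational_flag sda)"` prints 5.0 for a spinning disk and 2.5 for an SSD.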

Zygo commented on June 1, 2024

That would require...

- getting the btrfs device tree
- mapping that to Linux devices
- mapping those, lsblk-style, back to /sys/block/*

which seems like a lot of extra code for bees, and it wouldn't handle exotic cases like bcache-on-spinning-rust or big NAS devices over iSCSI.
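The sketch below shows what even the simple case involves. It lists a filesystem's member devices from `/sys/fs/btrfs/<UUID>/devices/`, which holds one entry per member named after the kernel block device. It then reports whether any member has `queue/rotational` set. This is a hypothetical helper that assumes every member is a whole disk with its own `/sys/block/<dev>` entry. Partitions, device-mapper, bcache and iSCSI layers are not followed, which is exactly the gap described above.

```bash
#!/usr/bin/env bash
# fs_any_rotational UUID [SYSFS_ROOT]
# Prints 1 if any member device of btrfs filesystem UUID reports
# rotational=1, else 0. Members whose flag cannot be read are skipped.
fs_any_rotational() {
    local uuid=$1 root=${2:-/sys} entry dev flag any=0
    for entry in "$root/fs/btrfs/$uuid/devices/"*; do
        [ -e "$entry" ] || continue               # glob matched nothing
        dev=${entry##*/}                          # e.g. "sda"
        flag=$(cat "$root/block/$dev/queue/rotational" 2>/dev/null) || continue
        if [ "$flag" = 1 ]; then any=1; fi
    done
    echo "$any"
}
```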

It would be much easier to make it a config option and let sysadmins pick the value that is best for their array. Or just demote it to DEBUG log level (now that we have those), since PERFORMANCE is only interesting to people who want to make bees faster.

kakra commented on June 1, 2024

Demoting seems like the best idea... I'm working on configuration files now, so we can get a tuneable later.
