
Comments (7)

Byron avatar Byron commented on September 26, 2024

Thanks for reporting!

Can you try to use hyperfine and see the impact of the thread-count on performance? Note that I threw in pdu as well as it usually is the fastest way to iterate.

root=<path-to-measure>
hyperfine -N -w1 -M2 "gdu $root" "dua -t1 $root" "dua -t2 $root" "dua -t4 $root" "dua -t8 $root" "pdu $root"

The theory is that dua uses too many threads, which can actually hurt performance on macOS. I have noticed that 3 to 4 threads usually give the best performance, so maybe there is a thread count that brings it closer to gdu. Lastly, pdu is typically faster than dua and I'd expect it to be as fast as gdu or faster. Please note that it has flags for thread counts as well, in case you want to dive deeper if the results are interesting.
Also note that this uses the non-interactive version of dua, which uses the same traversal engine under the hood.
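
To try thread counts other than the four listed above (for example the 3 mentioned in the comment), the benchmark matrix can be generated instead of typed by hand. The following is only a sketch: `hyperfine_args` is a hypothetical helper, the thread counts are arbitrary, and it uses only the hyperfine flags already shown in the command above. It prints the command rather than running it.

```rust
/// Build the argument list for a hyperfine run that compares gdu, pdu and
/// dua at several thread counts. Hypothetical helper, not part of dua-cli.
fn hyperfine_args(root: &str, thread_counts: &[usize]) -> Vec<String> {
    // Same flags as in the command from the comment: -N -w1 -M2.
    let mut args = vec!["-N".to_string(), "-w1".to_string(), "-M2".to_string()];
    args.push(format!("gdu {root}"));
    for t in thread_counts {
        args.push(format!("dua -t{t} {root}"));
    }
    args.push(format!("pdu {root}"));
    args
}

fn main() {
    // Directory to measure; defaults to the current directory.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let args = hyperfine_args(&root, &[1, 2, 3, 4, 6, 8]);

    // Quote every benchmarked command so the shell passes it as one argument.
    // Assumption: `root` contains no spaces or quotes.
    let quoted: Vec<String> = args
        .iter()
        .map(|a| if a.contains(' ') { format!("\"{a}\"") } else { a.clone() })
        .collect();
    println!("hyperfine {}", quoted.join(" "));
}
```

The printed line can be pasted into a shell as-is, provided the path needs no extra quoting.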

from dua-cli.

glowinthedark avatar glowinthedark commented on September 26, 2024

@Byron

hyperfine results

linux arm64 ext4 (772.98 GiB total, HDD)

System details: RAM: 4 GB
$ uname -a
Linux iq 6.1.0-rpi7-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux
$ lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A76
    Model:               1
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r4p1
    CPU(s) scaling MHz:  100%
    CPU max MHz:         2400.0000
    CPU min MHz:         1000.0000
    BogoMIPS:            108.00
    Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp

Summary
  'gdu /media/t12/Music' ran
    1.07 ± 0.01 times faster than 'dua -t2 /media/t12/Music'
    1.13 ± 0.00 times faster than 'dua -t4 /media/t12/Music'
    1.31 ± 0.02 times faster than 'dua -t8 /media/t12/Music'
    1.49 ± 0.01 times faster than 'dua -t1 /media/t12/Music'

macos APFS (78.48 GiB total, built-in SSD)

System details:
$ uname -a
Darwin NCM38333.local 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul  5 22:21:53 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T6020 arm64
  Chip:	Apple M2 Pro
  Total Number of Cores:	12 (8 performance and 4 efficiency)
  Memory:	32 GB
Summary
  dua -t8 ~/projects ran
    1.08 ± 0.00 times faster than pdu ~/projects
    1.30 ± 0.00 times faster than dua -t4 ~/projects
    1.50 ± 0.01 times faster than gdu ~/projects
    2.16 ± 0.00 times faster than dua -t2 ~/projects
    3.94 ± 0.02 times faster than dua -t1 ~/projects

The non-interactive dua mode is performing great, i.e. dua -t8 ~/projects is very fast on APFS.

The slowness is observed with interactive mode with e.g. dua -t8 i ~/projects which takes almost forever. Not sure what would be the hyperfine command for testing interactive mode as I suppose it probably cannot handle tty mode (?)


Byron avatar Byron commented on September 26, 2024

Thanks for the measurements, very interesting results!

It's very interesting that gdu manages to be this much faster on Linux, and thread-scaling doesn't seem to do dua much good with -t2 being the best value on a 4-core machine.

On macOS it scales much better, but the question remains why it's slow in interactive mode.
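
The scaling of dua relative to its own single-threaded run can be read off the two hyperfine summaries with a bit of arithmetic: every ratio in a summary is relative to the fastest command, so dividing the `-t1` ratio by the `-tN` ratio gives the speedup of `-tN` over `-t1`. A small sketch, using only the numbers posted above (they are rounded means, so the results are approximate; `speedup_over_single_thread` is a name made up for this illustration):

```rust
/// Speedup of `dua -tN` over `dua -t1`, derived from hyperfine's
/// "times faster than" ratios, which are all relative to the fastest command.
fn speedup_over_single_thread(ratio_t1: f64, ratio_tn: f64) -> f64 {
    ratio_t1 / ratio_tn
}

fn main() {
    // Linux HDD run: ratios are relative to gdu, the fastest command there.
    let linux = [(1, 1.49), (2, 1.07), (4, 1.13), (8, 1.31)];
    // macOS SSD run: ratios are relative to dua -t8, which is 1.0 by definition.
    let macos = [(1, 3.94), (2, 2.16), (4, 1.30), (8, 1.00)];

    for (name, rows) in [("linux", linux), ("macos", macos)] {
        let t1 = rows[0].1; // ratio of the single-threaded run
        for (threads, ratio) in rows {
            println!(
                "{name} -t{threads}: {:.2}x over -t1",
                speedup_over_single_thread(t1, ratio)
            );
        }
    }
}
```

On the Linux HDD this gives about 1.39x at best (`-t2`) and lower values for `-t4` and `-t8`, while on macOS `-t8` reaches 3.94x over `-t1`.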

I have a hunch and implemented a fix in #225, which you are invited to try out. If you'd say that the ~/projects folder has a lot of top-level entries, then my hunch might be true.

Something you could also check is how many threads gdu uses by default - it's entirely unclear to me why it's so much faster on Linux except that maybe it's related to internal inefficiencies during traversal which weigh dua down (see #224). Edit: Maybe it's also related to the HDD being less receptive to the order of traversal or something related to it due to generally higher latencies. Whatever it is that makes it faster on SSD might be what makes it slower on HDD.

PS: I have made a new release with the fix, and would hope it will improve the situation as this is the only guess I had: https://github.com/Byron/dua-cli/releases/tag/v2.27.2 . Should it still not release the handbrakes, you'd probably need to instrument a run, but we'll get there when we get there.


glowinthedark avatar glowinthedark commented on September 26, 2024

compiling for apple silicon on macos m2 throws an error while running cargo install dua-cli

error[E0446]: crate-private type `FilesystemScan` in public interface
  --> ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dua-cli-2.27.2/src/interactive/app/state.rs:42:5
   |
27 | pub(crate) struct FilesystemScan {
   | -------------------------------- `FilesystemScan` declared as crate-private
...
42 |     pub scan: Option<FilesystemScan>,
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't leak crate-private type

For more information about this error, try `rustc --explain E0446`.
error: could not compile `dua-cli` (bin "dua") due to previous error
error: failed to compile `dua-cli v2.27.2`, intermediate artifacts can be found at `/var/folders/py/73sb2fsj37xbmtkgw111l07w0000gp/T/cargo-installoMoXeN`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.

same error when explicitly checking out the tag (both on macos m2 and linux arm64):

git clone https://github.com/Byron/dua-cli.git && cd dua-cli
git checkout tags/v2.27.2
cargo build --release

Tried the Intel X86 binary from the releases — completes Ok:

/tmp/dua-v2.27.2-x86_64-apple-darwin/dua i ~/projects
Sort mode: size descending  Total disk usage: 149.07 GB  
Processed 1743246 entries in 9.81s 

the original m2-binary (v2.20.1 arm64) still shows scanning apparently even after scanning finished (although the number of entries is not identical) 🤔

Entries: 1 in 0s (472/s)  -> scanning <- 149.07 GB  
Entries: 1743248 in 8.99s


Byron avatar Byron commented on September 26, 2024

> compiling for apple silicon on macos m2 throws an error while running cargo install dua-cli

This is fixed now in main, see #226 .
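
For readers who hit the same E0446 error: the thread does not show what #226 actually changed, so the following is only a sketch of one common way to resolve this class of error, namely making the field's visibility match the type's. `AppState` and `entries_seen` are hypothetical names. Only `FilesystemScan` and the `scan` field appear in the error message above.

```rust
// Hypothetical stand-in for the crate-private type named in the error message.
pub(crate) struct FilesystemScan {
    pub(crate) entries_seen: u64,
}

// Hypothetical name for the struct that holds the field from state.rs:42.
pub struct AppState {
    // The error was triggered by `pub scan: Option<FilesystemScan>`.
    // Restricting the field to `pub(crate)` stops the crate-private type
    // from appearing in a public interface.
    pub(crate) scan: Option<FilesystemScan>,
}

fn main() {
    let state = AppState {
        scan: Some(FilesystemScan { entries_seen: 0 }),
    };
    if let Some(scan) = &state.scan {
        println!("scan in progress, {} entries seen", scan.entries_seen);
    }
}
```

The alternative, making `FilesystemScan` itself `pub`, also satisfies the compiler but widens the crate's public surface.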

> the original m2-binary (v2.20.1 arm64) still shows scanning apparently even after scanning finished (although the number of entries is not identical) 🤔

This typically means that it is indeed still scanning, but presumably all threads are stalled. I recommend building the latest version and trying again. Let's see.


glowinthedark avatar glowinthedark commented on September 26, 2024

pulling, building and running latest main now makes dua -t8 i .. finish scanning in about the same time as gdu, with just ~2..3 seconds difference on macos m2 (1744024 entries in 22.25s).

on linux rpi 5 arm64 with 8GB RAM, scanning a 765GB file system tree on an NVMe M.2 drive takes roughly the same time as gdu (723.05 GiB, processed 640603 entries in 5.25s); it is hard to tell the difference.

thank you so much for taking the time to look into this — much appreciated! 🙏


Byron avatar Byron commented on September 26, 2024

Thanks so much for letting me know, it's much appreciated, too :).

It's great to hear that the fix did indeed work, and that gdu isn't unconditionally faster anymore :).

Closing, as it sounds like this issue is no more.

