uutils / findutils
Rust implementation of findutils
License: MIT License
10 occurrences found with
cargo +nightly clippy -- -W clippy::pedantic
warning: only a `panic!` in `if`-then statement
--> src/find/matchers/lname.rs:76:13
|
76 | / if e.kind() != ErrorKind::AlreadyExists {
77 | | panic!("Failed to create sym link: {:?}", e);
78 | | }
| |_____________^ help: try instead: `assert!(!(e.kind() != ErrorKind::AlreadyExists), "Failed to create sym link: {:?}", e);`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_assert
= note: `-W clippy::manual-assert` implied by `-W clippy::pedantic`
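The `assert!(!(a != b), ...)` form that clippy proposes can be simplified by folding the double negation into `==`. A minimal sketch, assuming a hypothetical helper `check_symlink_result` that wraps the check (the name and signature are not from the code base):

```rust
use std::io::{Error, ErrorKind};

/// Hypothetical helper: tolerate "already exists", panic on any other error.
fn check_symlink_result(result: Result<(), Error>) {
    if let Err(e) = result {
        // Same condition as the original `if ... != ... { panic!(..) }`,
        // with the double negation folded into `==`.
        assert!(
            e.kind() == ErrorKind::AlreadyExists,
            "Failed to create sym link: {e:?}"
        );
    }
}
```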
Currently:
# TOTAL: 16
# PASS: 3
# SKIP: 1
# XFAIL: 0
# FAIL: 11
# XPASS: 0
# ERROR: 1
I assume this isn't an intentional extension:
$ ./target/debug/find \( \)
.
./.git
...
Even if this is deprecated, for compat, we should implement it:
-follow
Deprecated; use the -L option instead. Dereference symbolic links. Implies
-noleaf. The -follow option affects only those tests which appear after it on the
command line. Unless the -H or -L option has been specified, the position of the
-follow option changes the behaviour of the -newer predicate; any files listed as
the argument of -newer will be dereferenced if they are symbolic links. The same
consideration applies to -newerXY, -anewer and -cnewer. Similarly, the -type predicate
will always match against the type of the file that a symbolic link points to
rather than the link itself. Using -follow causes the -lname and -ilname predicates
always to return false.
We should add implementations of both.
Repeated slashes for the root directory should match a single slash:
$ ./target/debug/find /// -maxdepth 0 -name /
$ find /// -maxdepth 0 -name /
find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name ‘/’' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'. Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ ‘/’'.
///
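One way to get this behaviour is to normalise the path before taking the component that `-name` compares against. The sketch below is only an illustration: `name_for_matching` is a hypothetical helper, not a function from the code base, and it assumes `/` is the only separator.

```rust
/// Returns the component a `-name` test would compare against.
/// Trailing slashes are ignored; a path made only of slashes yields "/".
fn name_for_matching(path: &str) -> &str {
    let trimmed = path.trim_end_matches('/');
    if trimmed.is_empty() {
        // Either the empty string, or one or more slashes (the root).
        return if path.is_empty() { "" } else { "/" };
    }
    match trimmed.rfind('/') {
        Some(idx) => &trimmed[idx + 1..],
        None => trimmed,
    }
}
```

With this, `///` and `/` both produce `/`, so `-name /` would match either spelling of the root.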
---- find::matchers::printf::tests::test_printf_special_types stdout ----
thread 'find::matchers::printf::tests::test_printf_special_types' panicked at 'Couldn't find fifo in /tmp/exampleZwqMxS', src/find/matchers/mod.rs:569:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
example: https://github.com/uutils/findutils/runs/7684599223?check_suite_focus=true
@refi64, does this ring a bell?
Rust is amazing at parallelism, we could probably leverage that to run the analysis in parallel:
Line 156 in ac576f5
warning: empty String is being created manually
--> src/find/matchers/printf.rs:470:36
|
470 | .unwrap_or_else(|| "".to_owned())
| ^^^^^^^^^^^^^ help: consider using: `String::new()`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_string_new
= note: `-W clippy::manual-string-new` implied by `-W clippy::pedantic`
This involves handling -H, -L and finishing find::matchers::type_matcher.
Take a look at https://github.com/BurntSushi/walkdir/blob/master/examples/walkdir.rs for hints on how to handle -L with walkdir, and at https://github.com/BurntSushi/walkdir/blob/master/src/tests.rs for how to test interaction with symlinks on both Unix and Windows machines.
-I R same as --replace=R
-i, --replace[=R] replace R in INITIAL-ARGS with names read
from standard input, split at newlines;
if R is unspecified, assume {}
$ ./target/debug/find -maxdepth 0 -execdir echo '{}' \;
Failed to run echo: No such file or directory (os error 2)
POSIX says
The find utility shall be able to descend to arbitrary depths in a file hierarchy and shall not fail due to path length limitations (unless a path operand specified by the application exceeds {PATH_MAX} requirements).
but
$ name="0123456789ABCDEF"
$ name="${name}${name}${name}${name}"
$ name="${name}${name}${name}${name}"
$ name="${name:0:255}"
$ (mkdir deep && cd deep && for i in {1..17}; do mkdir $name && cd $name; done)
$ ./target/debug/find deep -mindepth 17
Error: deep: other os error
$ find deep -mindepth 17
deep/0123456789ABCDEF0123456789ABCDEF...
Also, despite printing an error message, uutils' find exits with status 0.
cargo +nightly clippy --allow-dirty --fix -- -W clippy::pedantic
finds 34 occurrences
For example:
warning: variables can be used directly in the `format!` string
--> src/find/matchers/delete.rs:49:17
|
49 | writeln!(&mut stderr(), "Failed to delete {}: {}", path_str, e).unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
= note: `-W clippy::uninlined-format-args` implied by `-W clippy::pedantic`
help: change this to
|
49 - writeln!(&mut stderr(), "Failed to delete {}: {}", path_str, e).unwrap();
49 + writeln!(&mut stderr(), "Failed to delete {path_str}: {e}").unwrap();
At the moment, when something goes wrong in a non-fatal way, we just write directly to stderr. We should extend find::Dependencies to abstract out access to stderr, much like we did with stdout. This will allow
We currently just return errors created from arbitrary strings. At some point we should switch to a proper find::Error enum.
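As a rough idea of what such an enum could look like, here is a sketch. The type name `FindError` and its variants are assumptions for illustration, loosely modelled on error messages that appear elsewhere on this page; the real set of variants would have to come from the code.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type; the variants are illustrative only.
#[derive(Debug)]
pub enum FindError {
    UnrecognizedFlag(String),
    InvalidMode(String),
    Io { path: String, source: io::Error },
}

impl fmt::Display for FindError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FindError::UnrecognizedFlag(flag) => write!(f, "Unrecognized flag: '{flag}'"),
            FindError::InvalidMode(mode) => write!(f, "invalid mode '{mode}'"),
            FindError::Io { path, source } => write!(f, "{path}: {source}"),
        }
    }
}

impl std::error::Error for FindError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            // Only the I/O variant wraps an underlying error.
            FindError::Io { source, .. } => Some(source),
            _ => None,
        }
    }
}
```

Callers could then match on the variant (for example to choose an exit status) instead of inspecting message strings.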
$ find | head
.
./.git
./.git/branches
./.git/hooks
./.git/hooks/applypatch-msg.sample
./.git/hooks/commit-msg.sample
./.git/hooks/post-update.sample
./.git/hooks/pre-applypatch.sample
./.git/hooks/pre-commit.sample
./.git/hooks/prepare-commit-msg.sample
$ RUST_BACKTRACE=1 ./target/debug/find | head
.
./.git
./.git/branches
./.git/hooks
./.git/hooks/applypatch-msg.sample
./.git/hooks/commit-msg.sample
./.git/hooks/post-update.sample
./.git/hooks/pre-applypatch.sample
./.git/hooks/pre-commit.sample
./.git/hooks/prepare-commit-msg.sample
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { repr: Os { code: 32, message: "Broken pipe" } }', src/libcore/result.rs:859
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
1: std::sys_common::backtrace::_print
2: std::panicking::default_hook::{{closure}}
3: std::panicking::default_hook
4: std::panicking::rust_panic_with_hook
5: std::panicking::begin_panic
6: std::panicking::begin_panic_fmt
7: rust_begin_unwind
8: core::panicking::panic_fmt
9: core::result::unwrap_failed
at /build/rust/src/rustc-1.17.0-src/src/libcore/macros.rs:29
10: <core::result::Result<T, E>>::unwrap
at /build/rust/src/rustc-1.17.0-src/src/libcore/result.rs:737
11: <findutils::find::matchers::printer::Printer as findutils::find::matchers::Matcher>::matches
at ./src/find/matchers/printer.rs:26
12: <findutils::find::matchers::logical_matchers::AndMatcher as findutils::find::matchers::Matcher>::matches::{{closure}}
at ./src/find/matchers/logical_matchers.rs:39
13: <core::slice::Iter<'a, T> as core::iter::iterator::Iterator>::all::{{closure}}
at /build/rust/src/rustc-1.17.0-src/src/libcore/slice.rs:1027
14: <core::slice::Iter<'a, T>>::search_while
at /build/rust/src/rustc-1.17.0-src/src/libcore/slice.rs:1118
15: <core::slice::Iter<'a, T> as core::iter::iterator::Iterator>::all
at /build/rust/src/rustc-1.17.0-src/src/libcore/slice.rs:1026
16: <findutils::find::matchers::logical_matchers::AndMatcher as findutils::find::matchers::Matcher>::matches
at ./src/find/matchers/logical_matchers.rs:39
17: findutils::find::process_dir
at ./src/find/mod.rs:122
18: findutils::find::do_find
at ./src/find/mod.rs:139
19: findutils::find::find_main
at ./src/find/mod.rs:198
20: find::main
at ./src/find/main.rs:14
21: std::panicking::try::do_call
22: __rust_maybe_catch_panic
23: std::rt::lang_start
24: main
25: __libc_start_main
26: _start
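The backtrace points at an `unwrap()` on a write in the printer. One possible shape for a fix is to treat a broken pipe as a signal to stop rather than as a panic. This is a sketch only: `print_path` is a hypothetical function, not the actual `Printer` implementation.

```rust
use std::io::{self, ErrorKind, Write};

/// Writes one path followed by a newline.
/// Returns Ok(false) when the reader has gone away (broken pipe), so the
/// caller can stop walking quietly instead of panicking on `unwrap()`.
fn print_path<W: Write>(out: &mut W, path: &str) -> io::Result<bool> {
    match writeln!(out, "{path}") {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == ErrorKind::BrokenPipe => Ok(false),
        // Any other I/O error is passed on to the caller.
        Err(e) => Err(e),
    }
}
```

The caller would end the walk when it sees `Ok(false)`; which exit status to use in that case is a separate decision.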
From a code review...
I'm wondering whether the testing-commandline should be made conditional, as it doesn't ship in production. The easiest way to do that would be to use a feature.
$ find -name '**.rs'
./src/find/matchers/perm.rs
...
$ ./target/debug/find -name '**.rs'
Error: Pattern syntax error near position 2: recursive wildcards must form a single path component
But ** is not a recursive glob; it is just two wildcards next to each other, like the regex .*.*\.rs
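If the glob library in use insists on treating `**` specially, one workaround is to collapse runs of `*` before handing the pattern over, since consecutive wildcards match the same strings as a single one. A minimal sketch, assuming backslash is the escape character; `collapse_stars` is a hypothetical helper:

```rust
/// Collapses every run of unescaped `*` into a single `*`.
/// A backslash and the character after it are copied through unchanged.
fn collapse_stars(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars();
    let mut prev_star = false;
    while let Some(c) = chars.next() {
        match c {
            '\\' => {
                out.push(c);
                if let Some(escaped) = chars.next() {
                    out.push(escaped);
                }
                prev_star = false;
            }
            // Skip a `*` that directly follows another wildcard `*`.
            '*' if prev_star => {}
            _ => {
                prev_star = c == '*';
                out.push(c);
            }
        }
    }
    out
}
```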
This should be a minor extension to the existing -exec[dir] functionality
The upstream manpage states:
-newerXY reference
Compares the timestamp of the current file with reference. The
reference argument is normally the name of a file (and one of
its timestamps is used for the comparison) but it may also be a
string describing an absolute time. X and Y are placeholders
for other letters, and these letters select which time belonging
to how reference is used for the comparison.
a The access time of the file reference
B The birth time of the file reference
c The inode status change time of reference
m The modification time of the file reference
t reference is interpreted directly as a time
$ find . ! -newermt 'jan 01, 2000' -exec touch -d@1706517278 {} +
Error: Unrecognized flag: '-newermt'
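Recognising the flag could start with decoding the two letters after `-newer`. The sketch below follows the letter table quoted above; the type and function names are made up for illustration, and it assumes that `t` is only meaningful in the Y position.

```rust
/// Which timestamp of a file is meant (letters a, B, c, m).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum FileTime {
    Accessed,
    Birth,
    Changed,
    Modified,
}

/// How the reference argument is interpreted (the Y letter).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Reference {
    File(FileTime),
    /// `t`: the reference is a literal time string.
    Literal,
}

fn file_time(c: char) -> Option<FileTime> {
    match c {
        'a' => Some(FileTime::Accessed),
        'B' => Some(FileTime::Birth),
        'c' => Some(FileTime::Changed),
        'm' => Some(FileTime::Modified),
        _ => None,
    }
}

/// Parses a flag such as `-newermt` into (X, Y); returns None otherwise.
fn parse_newer_xy(flag: &str) -> Option<(FileTime, Reference)> {
    let xy = flag.strip_prefix("-newer")?;
    let mut chars = xy.chars();
    let (x, y) = (chars.next()?, chars.next()?);
    if chars.next().is_some() {
        return None; // more than two letters
    }
    let x = file_time(x)?; // rejects `t` in the X position
    let y = if y == 't' {
        Reference::Literal
    } else {
        Reference::File(file_time(y)?)
    };
    Some((x, y))
}
```

Parsing the literal time string for the `t` case is a separate problem and is not covered here.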
As mentioned in uutils/coreutils#947 , a Rust implementation of xargs would be dope for faster, more secure systems.
$ ./target/debug/find -perm +rwx
Error: invalid mode '+rwx'
$ ./target/debug/find -perm u+rwX
Error: invalid mode 'u+rwX'
$ ./target/debug/find -perm u=g
Error: invalid mode 'u=g'
$ find -exec echo '{}' foo + \;
. foo +
...
$ ./target/debug/find -exec echo '{}' foo + \;
Error: -exec [args...] + isn't supported yet. Only -exec [args...] ;
In fact, POSIX says
The end of the primary expression shall be punctuated by a <semicolon> or by a <plus-sign>. Only a <plus-sign> that immediately follows an argument containing only the two characters "{}" shall punctuate the end of the primary expression. Other uses of the <plus-sign> shall not be treated as special.
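The POSIX rule quoted above can be expressed as a small scan over the arguments that follow `-exec`. This is a sketch under that reading of the rule, with hypothetical names (`ExecEnd`, `find_exec_end`); it is not the parser used in the code base.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ExecEnd {
    Semicolon,
    Plus,
}

/// Scans the arguments after `-exec`/`-execdir` and returns the index and
/// kind of the terminator, or None if the primary is unterminated.
fn find_exec_end(args: &[&str]) -> Option<(usize, ExecEnd)> {
    for (i, arg) in args.iter().enumerate() {
        match *arg {
            ";" => return Some((i, ExecEnd::Semicolon)),
            // A `+` only terminates when it directly follows a bare `{}`.
            "+" if i > 0 && args[i - 1] == "{}" => return Some((i, ExecEnd::Plus)),
            _ => {}
        }
    }
    None
}
```

For `echo {} foo + ;` the `+` is an ordinary argument and the `;` at index 4 ends the primary. Because every argument before the terminator belongs to the command, something like `-help` inside an `-exec` clause is passed through rather than parsed as a find option.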
👋 While upgrading Rust to 1.71.0, we found that the uutils-findutils 0.4.1 checksum had changed. We are raising this issue to confirm whether there was a re-reg for the 0.4.1 release. Thanks!
relates to Homebrew/homebrew-core#137183
E.g.
$ mkdir foo
$ chmod -rwx foo
$ ./target/debug/find foo
foo
Error: foo: permission denied
$ echo $?
0
$ find foo
foo
find: ‘foo’: Permission denied
$ echo $?
1
$ ./target/debug/find / -maxdepth 0 -execdir pwd \;
/home/tavianator/code/uutils/findutils
$ find / -maxdepth 0 -execdir pwd \;
/
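A possible way to choose the working directory for `-execdir`, assuming the problem lies in how the parent directory is derived: `Path::parent` returns `None` for a root and an empty path for a single relative component, so both cases need a fallback. `execdir_cwd` is a hypothetical helper, and the asserted values assume Unix-style paths.

```rust
use std::path::{Path, PathBuf};

/// Picks the directory in which an `-execdir` command would run for `path`.
fn execdir_cwd(path: &Path) -> PathBuf {
    match path.parent() {
        // Normal case: "foo/bar" runs in "foo".
        Some(parent) if !parent.as_os_str().is_empty() => parent.to_path_buf(),
        // "foo" or "." has an empty parent: stay in the current directory.
        Some(_) => PathBuf::from("."),
        // An empty path falls back to the current directory.
        None if path.as_os_str().is_empty() => PathBuf::from("."),
        // A root such as "/" has no parent: run in the root itself.
        None => path.to_path_buf(),
    }
}
```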
Of all GNU find's extensions, this is one of the most useful.
$ find \! \! -name find
./src/find
./target/debug/find
$ ./target/debug/find \! \! -name find
.
./.git
./.git/branches
./.git/hooks
./.git/hooks/applypatch-msg.sample
...
Here, when detecting a -not/!, it does invert_next_matcher = true, but maybe it should be something more like invert_next_matcher = !invert_next_matcher?
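To illustrate the difference, here is a small sketch that toggles the flag for each negation token, so that `! !` cancels out. `strip_negations` is a hypothetical stand-in for the relevant part of the parser, not the existing code.

```rust
/// Consumes leading `!` / `-not` tokens and reports whether the matcher
/// that follows should be inverted, together with the remaining tokens.
fn strip_negations<'a, 'b>(tokens: &'b [&'a str]) -> (bool, &'b [&'a str]) {
    let mut invert_next_matcher = false;
    let mut rest = tokens;
    while let Some((first, tail)) = rest.split_first() {
        if *first == "!" || *first == "-not" {
            // Toggle rather than set, so an even number of negations cancels.
            invert_next_matcher = !invert_next_matcher;
            rest = tail;
        } else {
            break;
        }
    }
    (invert_next_matcher, rest)
}
```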
There are a number of pure Rust glob and regex crates; it might be better not to have to deal with a C dependency if we can avoid it.
From what I'm seeing, regex may lack a means to select a specific flavour of regex. I'm not sure whether somebody has already found a way to restrict the engine so that it does not support extensions beyond posix/emacs.
This should be fairly trivial with the code at time of writing: we just need to remove the hard-coded expectations about slashes being the path separator (mainly in the tests).
Some of the currently unimplemented features (user, group and symlinks) might have to involve some if cfg!(windows) magic.
On the symlink front, I'd advise taking inspiration from https://github.com/BurntSushi/walkdir/blob/master/src/tests.rs on how to test interaction with symlinks on both Unix and Windows machines.
findutils is doing way too much argument management by hand.
It should be delegated to clap.
For example:
Lines 89 to 92 in ac576f5
Lines 167 to 175 in ac576f5
Lines 205 to 207 in ac576f5
findutils/src/find/matchers/mod.rs
Lines 205 to 212 in ac576f5
We (will soon) have support for -exec and -execdir clauses that end with a ';' (i.e. run this command for every file/directory). We need to also add support for clauses that end with a '+' (i.e. batch up the files/dirs and then run the command for as many as possible at once).
Unfortunately this isn't easy in an os-independent way because the standard library doesn't expose any way of telling when a command-line is going to be too long. I raised rust-lang/rust#40384 but it's not getting much traction.
So we need to go for a lowest common denominator approach. Choose a hard-coded limit (I'd suggest a bit less than 8kB, to allow for any inaccuracies in the next bit), come up with an efficient way of estimating the command-line length (doing this accurately is going to involve reimplementing too much of std::process::Command) and trigger the command when the estimated total goes over the limit.
To do this we need to
It would be great if uutils-findutils released binaries, like uutils-coreutils does! This would make it much easier for people to write scoop manifests etc. to install findutils.
$ touch ./-
$ find -
-
$ ./target/debug/find -
Error: Unrecognized flag: '-'
Some discussion here: https://savannah.gnu.org/bugs/?15235
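A sketch of how the split between starting paths and the expression might be decided so that a lone `-` is kept as a path. The rule used here (a leading `-` followed by at least one more character, or exactly `!` or `(`) is an assumption based on the behaviour shown above, and `starts_expression` is a hypothetical name.

```rust
/// Guesses whether `arg` begins the expression rather than naming a path.
fn starts_expression(arg: &str) -> bool {
    // A bare "-" is treated as a file name; "-x" and longer are not.
    matches!(arg, "!" | "(") || (arg.len() > 1 && arg.starts_with('-'))
}
```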
In coreutils, we have a great way to show progress / regression compared to the GNU testsuite
For example:
https://github.com/uutils/coreutils/runs/4833285375?check_suite_focus=true
Implementation:
https://github.com/uutils/coreutils/blob/main/.github/workflows/GnuTests.yml
It would be great to have this here too.
warning: unnecessary `!=` operation
--> src/testing/commandline/main.rs:81:26
|
81 | destination_dir: if args[1] != "-" {
| __________________________^
82 | | Some(args[1].clone())
83 | | } else {
84 | | None
85 | | },
| |_________^
|
= help: change to `==` and swap the blocks of the `if`/`else`
$ find ~ -print0 | ./target/debug/xargs -0 echo >/dev/null
Error: Command could not be run: Argument list too long (os error 7)
From a quick look, there's at least an issue here:
Lines 141 to 146 in 36e3229
This needs to count not only the length of the string, but also the size of the pointer that ends up in argv/envp.
It might be worth using the https://crates.io/crates/argmax crate to do this accounting for us. It could also help for #6.
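As a rough illustration of the accounting described above (this is not the argmax crate's API), each argument can be charged for its bytes, its terminating NUL and its pointer slot. The names `arg_cost` and `batch_args` are hypothetical, the environment is ignored for brevity, and the limit is left to the caller.

```rust
use std::mem::size_of;

/// Estimated cost of one argv entry: bytes, NUL terminator, pointer slot.
fn arg_cost(arg: &str) -> usize {
    arg.len() + 1 + size_of::<*const u8>()
}

/// Splits `items` into batches so that the fixed command words plus each
/// batch stay within `limit` (estimated bytes). An item that is too large
/// on its own still gets a batch to itself.
fn batch_args<'a>(fixed: &[&str], items: &[&'a str], limit: usize) -> Vec<Vec<&'a str>> {
    // Fixed words plus the NULL pointer that terminates argv.
    let base = fixed.iter().map(|a| arg_cost(a)).sum::<usize>() + size_of::<*const u8>();
    let mut batches = Vec::new();
    let mut current = Vec::new();
    let mut used = base;
    for &item in items {
        let cost = arg_cost(item);
        if !current.is_empty() && used + cost > limit {
            // Flush the current batch and start a new one.
            batches.push(std::mem::take(&mut current));
            used = base;
        }
        current.push(item);
        used += cost;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

A real implementation would also have to account for the size of the environment, which shares the same limit on typical Unix systems.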
$ ./target/debug/find -exec echo '{}' -help \;
Usage: find [path...] [expression]
...
But the -help should be an argument to -exec.