Code Monkey home page Code Monkey logo

Comments (8)

BurntSushi avatar BurntSushi commented on June 13, 2024

This is the logic that detects whether stdin is readable or not:

/// Returns true if and only if stdin is believed to be readable.
///
/// When stdin is readable, command line programs may choose to behave
/// differently than when stdin is not readable. For example, `command foo`
/// might search the current directory for occurrences of `foo` where as
/// `command foo < some-file` or `cat some-file | command foo` might instead
/// only search stdin for occurrences of `foo`.
///
/// Note that this isn't perfect and essentially corresponds to a heuristic.
/// When things are unclear (such as if an error occurs during introspection to
/// determine whether stdin is readable), this prefers to return `false`. That
/// means it's possible for an end user to pipe something into your program and
/// have this return `false` and thus potentially lead to ignoring the user's
/// stdin data. While not ideal, this is perhaps better than falsely assuming
/// stdin is readable, which would result in blocking forever on reading stdin.
/// Regardless, commands should always provide explicit fallbacks to override
/// behavior. For example, `rg foo -` will explicitly search stdin and `rg foo
/// ./` will explicitly search the current working directory.
pub fn is_readable_stdin() -> bool {
use std::io::IsTerminal;
#[cfg(unix)]
fn imp() -> bool {
use std::{
fs::File,
os::{fd::AsFd, unix::fs::FileTypeExt},
};
let stdin = std::io::stdin();
let Ok(fd) = stdin.as_fd().try_clone_to_owned() else { return false };
let file = File::from(fd);
let Ok(md) = file.metadata() else { return false };
let ft = md.file_type();
ft.is_file() || ft.is_fifo() || ft.is_socket()
}
#[cfg(windows)]
fn imp() -> bool {
winapi_util::file::typ(winapi_util::HandleRef::stdin())
.map(|t| t.is_disk() || t.is_pipe())
.unwrap_or(false)
}
#[cfg(not(any(unix, windows)))]
fn imp() -> bool {
false
}
!std::io::stdin().is_terminal() && imp()
}

This was the most recent change to the stdin detection as far as I know: 020c5453a5880257c87949030550d24c596f5c8d

Dealing with stdin is fundamentally a heuristic. And many programs do process control in a way that leaves stdin looking like there's something to read on it. This happened with neovim in #1892 and culminated in a change to neovim itself in neovim/neovim#14812.

Ideally it should work automatically.

It just can't. This is the downside of making rg foo work. You are not the first to report an issue like this.

from ripgrep.

ltrzesniewski avatar ltrzesniewski commented on June 13, 2024

I don't understand where that 1:foojitzu line comes from (I mean, there's nothing that goes through stdin). I don't get it when trying to reproduce, the only output I get is bar:foojitzu when running rg foojitzu | cat -.

from ripgrep.

aktau avatar aktau commented on June 13, 2024

I don't understand where that 1:foojitzu line comes from (I mean, there's nothing that goes through stdin). I don't get it when trying to reproduce, the only output I get is bar:foojitzu when running rg foojitzu | cat -.

Yes, that must've been a copy/paste error on my part. I had originally invoked without | cat -, but decided to do so as it would make the output more similar (if it had appeared).

And many programs do process control in a way that leaves stdin looking like there's something to read on it.

But in this case, I suspect that parallel doesn't connect a stdin at all. In this case, why not stop reading from stdin?

This is the logic that detects whether stdin is readable or not:

At the very least, I don't believe it should exit with -1 without stating what's wrong to the user. It took me too long to find that the workaround is to pass ./ as an extra argument. It appears the exit comes quickly after the following two lines (extract from strace):

1279757 read(0, "", 8192)               = 0
1279757 read(0, "", 65536)              = 0

I'm not sure why it tries to do this (twice?) and then exit(1) afterwards. I can't get strace -k to work in order to find the source of the exit(1) syscall (I tried installing the debian dbgsym package, to no avail).

Would it not be possible to just fall back to "don't read from stdin" if one cannot read from stdin?

from ripgrep.

BurntSushi avatar BurntSushi commented on June 13, 2024

With #2807, I added some extra details to --debug logging:

$ rg --debug foojitzu | cat -
rg: DEBUG|rg::flags::config|crates/core/flags/config.rs:41: /home/andrew/.ripgreprc: arguments loaded from config file: ["--max-columns-preview", "--colors=match:bg:0xff,0x7f,0x00", "--colors=match:fg:0xff,0xff,0xff", "--colors=line:none", "--colors=line:fg:magenta", "--colors=path:fg:green", "--type-add=got:*_test.go"]
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1083: number of paths given to search: 0
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1108: using heuristics to determine whether to read from stdin or search ./ (is_readable_stdin=false, stdin_consumed=false, mode=Search(Standard))
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1118: heuristic chose to search ./
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1269: found hostname for hyperlink configuration: duff
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1279: hyperlink format: ""
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:174: using 12 thread(s)
rg: DEBUG|grep_regex::config|crates/regex/src/config.rs:175: assembling HIR from 1 fixed string literals
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
bar:foojitzu

$ parallel -v 'rg --debug foojitzu ; echo done {}' ::: bar
rg --debug foojitzu ; echo done bar
done bar
rg: DEBUG|rg::flags::config|crates/core/flags/config.rs:41: /home/andrew/.ripgreprc: arguments loaded from config file: ["--max-columns-preview", "--colors=match:bg:0xff,0x7f,0x00", "--colors=match:fg:0xff,0xff,0xff", "--colors=line:none", "--colors=line:fg:magenta", "--colors=path:fg:green", "--type-add=got:*_test.go"]
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1083: number of paths given to search: 0
rg: DEBUG|grep_cli|crates/cli/src/lib.rs:209: for heuristic stdin detection on Unix, found that is_file=false, is_fifo=true and is_socket=false, and thus concluded that is_stdin_readable=true
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1108: using heuristics to determine whether to read from stdin or search ./ (is_readable_stdin=true, stdin_consumed=false, mode=Search(Standard))
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1121: heuristic chose to search stdin
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1269: found hostname for hyperlink configuration: duff
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1279: hyperlink format: ""
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:174: using 1 thread(s)
rg: DEBUG|grep_regex::config|crates/regex/src/config.rs:175: assembling HIR from 1 fixed string literals
rg: DEBUG|globset|crates/globset/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

The relevant lines for the parallel ... command are:

rg: DEBUG|grep_cli|crates/cli/src/lib.rs:209: for heuristic stdin detection on Unix, found that is_file=false, is_fifo=true and is_socket=false, and thus concluded that is_stdin_readable=true
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1108: using heuristics to determine whether to read from stdin or search ./ (is_readable_stdin=true, stdin_consumed=false, mode=Search(Standard))
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1121: heuristic chose to search stdin

Specifically, notice that is_fifo=true. parallel is specifically advertising stdin as a readable file descriptor.

I suspect that parallel doesn't connect a stdin at all.

The above logging demonstrates otherwise unfortunately.

At the very least, I don't believe it should exit with -1 without stating what's wrong to the user.

ripgrep cannot tell the difference between "ripgrep searched something the user did not intend and thus should produce an error" and "ripgrep searched something the user intended and could not find a match."

I'm not sure why it tries to do this (twice?) and then exit(1) afterwards.

Because parallel advertises stdin as a fifo. So ripgrep tries to search it. Apparently there is nothing there to read, so it is indistinguishable from piping an empty file on to stdin. ripgrep doesn't find what you asked for, and thus returns with no match (and a corresponding error exit code as a result).

There is no "crashing" here. I'm not sure why you reported it that way...

Let's please flip the script here. Instead of saying "the current behavior is not ideal" (which I would absolutely agree with), please instead suggest concrete changes. And please do so after reading the linked neovim issue.

from ripgrep.

aktau avatar aktau commented on June 13, 2024

First of, thanks for engaging on this. I appreciate it. I apologize if I came off as curt, it was not my intention.

Replying:

On Neovim. I read the Neovim issue, and see that they added a stdin closing option on the job API for that. I think I experienced a similar issue on another CLI that had to do with Neovim. The issue in that case was that the application did not expect to be passed a socket pair (libuv which Neovim uses creates subprocess connecting pipes with socketpair(2), not pipe(2)). AFAIK using socket pairs for stdio is the default on some OSes (the BSDs? I can't recall). Back then I investigated whether we couldn't just switch to pipe(2) which would have made the application I was dealing with work. I forgot why, but eventually I decided this had a low chance of success, libuv switched to socketpair(2) for a reason, and if I recall correctly the commit message isn't very clear about it, so chesterons fence applies.

Anyway, back to the issue: perhaps we should ask: what is the use of passing an empty file? Can ripgrep ever do anything useful if it gets passed an empty file? If the answer is no, perhaps it could be considered to just act as if stdin is not connected if reading from it succeeds but produces 0 bytes.

I also changed the issue title to s/crash/exit/, as it's clear now that this behaviour is intentional.

from ripgrep.

BurntSushi avatar BurntSushi commented on June 13, 2024

Anyway, back to the issue: perhaps we should ask: what is the use of passing an empty file? Can ripgrep ever do anything useful if it gets passed an empty file? If the answer is no, perhaps it could be considered to just act as if stdin is not connected if reading from it succeeds but produces 0 bytes.

There is no way that will ever happen. Passing an empty file isn't necessarily something someone does intentionally. An empty file is still a valid haystack to search. Doing rg foo < empty-file and having ripgrep ignore < empty-file to search the current working directory is completely unexpected behavior. If I were to allow that change, I'd be the first curse it. It would also have unexpected behavior in shell pipelines when there is no input being piped in. For example, maybe rg foo log.2024-05-14 | rg bar works because foo is in log.2024-05-14, but maybe foo isn't in log.2024-05-15. So instead of rg foo log.2024-05-15 | rg bar producing no output like you'd expect, rg bar would just search the current working directory instead. That's bonkers.

I appreciate that you're trying to make a concrete suggestion here, but I'm pretty sure there is nothing to be done here because the user's intent cannot be discerned. It is a failing of the Unix CLI conventions (of which, stdin detection is part of it). You have two choices. First is to make the calling program stop advertising stdin as a readable file descriptor. Second is to disable ripgrep's heuristic stdin detection by passing ./.

The only real way to avoid something like this is to build a program that doesn't change its behavior based on what stdin is. But then common usage scenarios get more verbose. ripgrep does the intuitive thing correctly 99% of the time in exchange for doing the unintuitive thing with a bad failure mode in certain cases. And usually that's because of the parent process doing something they probably shouldn't be doing... Like advertising a readable stdin.

from ripgrep.

aktau avatar aktau commented on June 13, 2024

You're right, I hadn't thought of that use case, mainly because I tend to use normal grep for this. I reach for ripgrep when I need to sensibly search "everything".

It doesn't make sense to break existing users of this, even if it were somehow decided that ignoring empty files were reasonable.

And usually that's because of the parent process doing something they probably shouldn't be doing... Like advertising a readable stdin.

Hopefully I can get around to writing this up for the GNU parallel author. But something tells me that changing its behaviour to supplying a closed stdin would trigger other types of unpalatable failures in other programs. Can't please everyone.

Thanks for your thoughts.

from ripgrep.

BurntSushi avatar BurntSushi commented on June 13, 2024

You're right, I hadn't thought of that use case, mainly because I tend to use normal grep for this. I reach for ripgrep when I need to sensibly search "everything".

Yeah, I very specifically designed ripgrep to be able to drop-in for grep in most cases. Obviously it has some different behaviors and flags, but in most cases, you should be able to replace grep with rg---including in a shell pipeline---without issue. This was a very intentional act on my part that stood in contrast to a tool like ag which does have weird quirks and failings when you try to use it like it grep.

But something tells me that changing its behaviour to supplying a closed stdin would trigger other types of unpalatable failures in other programs.

I think this is likely, yes, although I don't know what it is. If you do go through with this and get a concrete answer to this, I'd be most appreciative if you could share it here. Because it would complete the "nothing can be done" picture here. neovim, for example, I believe added an option to not pass a readable stdin, but I don't believe it changed its default behavior. So there is likely something here. Whether it's just another form of legacy or something more "legitimate" though, I dunno.

from ripgrep.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.