
lexopt's Issues

Missing documentation on panics

Hey there,

First off, fantastic work with this crate! I really enjoy the interface and how lightweight it is!

While playing around with lexopt, I noticed that the following tiny example produced a panic that stumped me quite a bit:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut parser = lexopt::Parser::from_env();
    while let Some(arg) = parser.next()? {
        dbg!(arg);
    }
    Ok(())
}

When running this example as cargo run -- --m 10, everything works fine.
However, if I run this as cargo run -- --m=10, the code panics.
When consuming the value in the body of the while loop (via parser.optional_value()), the example works again.

Is this behavior expected / documented somewhere? It did surprise me quite a bit.
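For context on where the panic comes from: `--m=10` carries an inline value that the caller is expected to consume (via `parser.value()` or `parser.optional_value()`). The following is a rough, hypothetical illustration of the lexing step, not lexopt's actual implementation:

```rust
// Hypothetical sketch of how a lexopt-style lexer splits a long option.
// The "10" in "--m=10" becomes a pending value that the caller must
// consume; leaving it pending is what triggers the panic described above.
fn split_long(arg: &str) -> Option<(&str, Option<&str>)> {
    let rest = arg.strip_prefix("--")?;
    match rest.split_once('=') {
        // "--m=10" -> option "m" with a pending inline value "10"
        Some((name, value)) => Some((name, Some(value))),
        // "--m" -> option "m"; any value would come from the next argument
        None => Some((rest, None)),
    }
}

fn main() {
    assert_eq!(split_long("--m=10"), Some(("m", Some("10"))));
    assert_eq!(split_long("--m"), Some(("m", None)));
    assert_eq!(split_long("10"), None); // not an option at all
}
```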

Can't detect `--`

I'd like to pass everything after -- to another command, but can't seem to detect the --. For instance, this test fails:

#[test]
fn test_lexopt() {
    let mut p = lexopt::Parser::from_args(&["foo", "bar", "--", "baz"]);
    assert_eq!(
        p.next().unwrap(),
        Some(lexopt::Arg::Value(std::ffi::OsString::from("foo")))
    );
    assert_eq!(
        p.next().unwrap(),
        Some(lexopt::Arg::Value(std::ffi::OsString::from("bar")))
    );
    assert_eq!(
        p.next().unwrap(),
        Some(lexopt::Arg::Value(std::ffi::OsString::from("--")))
    );
}

thread 'test_lexopt' panicked:
assertion `left == right` failed
  left: Some(Value("baz"))
 right: Some(Value("--"))

How would I get all the arguments after --? Or break out of the parsing loop once -- is encountered, without parsing baz?
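One workaround, until/unless the parser exposes `--` directly: split the raw argument list at the first `--` yourself, hand only the head to lexopt, and forward the tail untouched. A minimal sketch using only the standard library:

```rust
// Split an argument list at the first "--": everything before it goes to
// the option parser, everything after it is passed through verbatim.
fn split_at_double_dash(args: &[String]) -> (&[String], &[String]) {
    match args.iter().position(|a| a == "--") {
        Some(i) => (&args[..i], &args[i + 1..]),
        None => (args, &args[args.len()..]), // empty tail
    }
}

fn main() {
    let args: Vec<String> = ["foo", "bar", "--", "baz"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let (head, tail) = split_at_double_dash(&args);
    assert_eq!(head, ["foo", "bar"]);
    assert_eq!(tail, ["baz"]);
}
```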

Do not use `Error::source` in `fmt::Display` implementation

What I mean by this title is that if I want to manually print the entire error stack using the Error::source method, I get an error message twice. Example:

cannot parse argument "213": The length of hex string must be 6
The length of hex string must be 6

Code I use to unwind an error:

let args = args::parse_cli_args().unwrap_or_else(|err| {
    let mut err: Box<&(dyn std::error::Error)> = Box::new(&err);
    loop {
        eprintln!("{}", err);
        if let Some(source) = err.source() {
            err = Box::new(source);
        } else {
            break;
        }
    }
    std::process::exit(1);
});

I understand that the idea behind lexopt is to be as simple as possible with no fuss, so I suggest not changing the current behavior much. We could move the current behavior to the "{:#}" formatter and use "{}" to print a single error message (so that the full error can be printed manually). That is how anyhow does it.¹ But it would be a breaking change (or would it?). What do you think? If you want to see this too, I can create a PR.

rust-lang/project-error-handling#23

For now, the recommendation is to always do one or the other. If you're going to print your sources, you should not return them via the source function and vice versa.
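For reference, the convention the issue argues for can be sketched as follows (a hypothetical example type, not lexopt's current behavior): `Display` prints only this error's own message, and callers that want the chain walk `source()` themselves, so nothing is printed twice.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical error type following the "Display does not include source"
// convention, so walking source() prints each message exactly once.
#[derive(Debug)]
struct ParseError {
    message: String,
    source: Option<Box<dyn Error + 'static>>,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only this error's own message; the source is NOT repeated here.
        write!(f, "{}", self.message)
    }
}

impl Error for ParseError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

// Collect the full chain, one message per error.
fn render_chain(err: &dyn Error) -> Vec<String> {
    let mut out = vec![err.to_string()];
    let mut cur = err.source();
    while let Some(e) = cur {
        out.push(e.to_string());
        cur = e.source();
    }
    out
}

fn main() {
    let inner = ParseError {
        message: "The length of hex string must be 6".into(),
        source: None,
    };
    let outer = ParseError {
        message: "cannot parse argument \"213\"".into(),
        source: Some(Box::new(inner)),
    };
    // Each message appears once, with no duplication.
    assert_eq!(
        render_chain(&outer),
        ["cannot parse argument \"213\"", "The length of hex string must be 6"]
    );
}
```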

Cheers.

Footnotes

  1. https://docs.rs/anyhow/latest/anyhow/struct.Error.html#display-representations

Parser's value() method and dashes

The docs for Parser's value() method state:

"A value is collected even if it looks like an option (i.e., starts with -)."

This causes friction when parsing an option that takes a value. e.g. say I have a

usage: prog [-v] [-o file]

handling the -o case:

Short('o') => filename = parser.value()?,

If the user runs "prog -o -v", then '-v' becomes the filename, without any error. I would expect calling .value() to return an Err if the value starts with '-'.

I think returning an error would make it more consistent with Parser's values() method, which returns an error if the first value starts with a dash. And also with Parser's next() method, which cannot return a Value(value) where value starts with a dash.

Could we error if the return value to parser.value() starts with a '-' ? Or let me know if I'm missing something here. Thanks
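Until/unless value() grows such a check, one can guard in user code. Below is a small sketch of the check itself (plain std; in a lexopt program you would apply it to what parser.value() returns). The exception for a bare "-" is an assumption, since that conventionally means stdin/stdout rather than an option:

```rust
use std::ffi::OsStr;

// Heuristic: does this value look like an option? Mirrors the rule the
// issue proposes (leading '-'), with an exception for "-" alone.
// Simplification: non-unicode values are treated as plain values.
fn looks_like_option(value: &OsStr) -> bool {
    match value.to_str() {
        Some(s) => s.len() > 1 && s.starts_with('-'),
        None => false,
    }
}

fn main() {
    assert!(looks_like_option(OsStr::new("-v")));
    assert!(looks_like_option(OsStr::new("--verbose")));
    assert!(!looks_like_option(OsStr::new("-")));
    assert!(!looks_like_option(OsStr::new("file.txt")));
}
```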

.into_string() does not work well with other error types

Some programs parse inside a function that returns an anyhow-style error type or a custom error type. This doesn't play well with OsString::into_string(): its error type (OsString) can convert to anyhow::Error but not to most other error types.

  • rres uses .into_string().unwrap().
  • minoru-fediverse-crawler uses .into_string().map_err(|ostr| anyhow!("{}", ostr.to_string_lossy()))?.
  • Another workaround would be .into_string().map_err(lexopt::Error::from)?.
  • Yet another is .parse()?. This performs a copy.

A ValueExt::string(self) -> Result<String, lexopt::Error> method would solve this. (There may be a better name.)

Error's From<OsString> impl should then be kept, except perhaps in a release that already breaks most code for other reasons. But it should be removed from the README.
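The proposed method could look roughly like this. This is a sketch only: `ValueExt::string` is the hypothetical name from the issue, and a plain `String` error stands in for `lexopt::Error` to keep the example self-contained:

```rust
use std::ffi::OsString;

// Hypothetical extension trait for the proposed `string()` method. The
// real version would return Result<String, lexopt::Error>.
trait ValueExt {
    fn string(self) -> Result<String, String>;
}

impl ValueExt for OsString {
    fn string(self) -> Result<String, String> {
        self.into_string()
            .map_err(|os| format!("argument is invalid unicode: {:?}", os))
    }
}

fn main() {
    // Unlike bare OsString::into_string(), the error converts cleanly
    // into ?-friendly error types.
    let ok = OsString::from("output.txt").string();
    assert_eq!(ok, Ok("output.txt".to_string()));
}
```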

Short options should be able to take value with format -o=v

It would be nice to be able to write -o=some_value, to adhere more closely to standards and to other libraries and crates. For those who want fast compilation and much less bloat, this would make the switch away from "heavier" libraries to this one much easier and more pleasant.

Getting `RawArgs` without advancing the iterator

Hi! I'm writing a library on top of lexopt and I'd like to peek at the next argument without advancing the parser, because I need to support some weird arguments that don't follow the usual rules.

Essentially I'd like a method like this, which gives me the RawArgs if there are no more shorts or long values to take:

pub fn try_raw_args(&mut self) -> Option<RawArgs<'_>> {
    if self.long_value.is_some() || self.shorts.is_some() {
         return None;
    }

    Some(RawArgs(&mut self.source))
}

With this method, I can make a loop like this, where the check for the custom argument does not interfere with the normal parsing:

loop {
    // Check the weird custom argument
    if let Some(raw) = parser.try_raw_args() {
        ...
    }

    // Parse normal argument
    let Some(arg) = parser.next()? else { break; };
    ...
}

Retrieve argument count from the Parser

There is currently no way to tell the total number of arguments from the lexopt::Parser.

Consider this scenario:

pub struct Token;

// Clone trait bound is just to prevent conflict with the blanket
// implementation of impl<T> From<T> for T
impl<T: Clone> From<T> for Token {
    fn from(_: T) -> Self {
        Self {}
    }
}

fn lex_tokens() -> Result<Vec<Token>, Error> {
    let mut tokens = Vec::new();
    let mut p = parse("-a -bc value --long");
    while let Some(arg) = p.next()? {
        match arg {
            Short(val) => tokens.push(val.try_into()?),
            Long(val) => tokens.push(val.try_into()?),
            Value(val) => tokens.push(val.try_into()?),
        }
    }
    Ok(tokens)
}

It would be convenient if the parser were able to retrieve the argument count before the loop, so that we can initialize the Vec with a starting capacity that we know will hold all of the tokens:

fn lex_tokens() -> Result<Vec<Token>, Error> {
    let mut p = parse("-a -bc value --long");
    let mut tokens = Vec::with_capacity(p.arg_count());
    while let Some(arg) = p.next()? {
        match arg {
            Short(val) => tokens.push(val.try_into()?),
            Long(val) => tokens.push(val.try_into()?),
            Value(val) => tokens.push(val.try_into()?),
        }
    }
    Ok(tokens)
}

I don't think there's a workaround for from_args(), but with the from_env() case, users can just call:

Vec::with_capacity(std::env::args_os().len())

which honestly is not that bad, but it is unfortunate to have to use std::env::args_os() outside of the Parser.
Of course, there is still no guarantee that all the tokens will fit, even with setting the capacity, since combined short options may produce multiple tokens from the same argument. But it's still nice to get an estimate.

arg_count() could be as simple as

pub fn arg_count(&self) -> usize {
   self.source.len()
}

But there would be some tradeoffs to consider:

  1. source would have to become a dyn ExactSizeIterator
source: Box<dyn ExactSizeIterator<Item = OsString> + 'static>,
  2. from_args() would have to ensure that its IntoIter type is exact size
pub fn from_args<I>(args: I) -> Parser
where
    I: IntoIterator + 'static,
    I::Item: Into<OsString>,
    <I as IntoIterator>::IntoIter: ExactSizeIterator,
{
  3. This would be a breaking change; your parse() helper function for your tests would have to collect into something with an exact size first, because a SplitWhitespace is lazily evaluated.
fn parse(args: &'static str) -> Parser {
    Parser::from_args(args.split_whitespace().map(bad_string).collect::<Vec<_>>())
}

One benefit of this is that all iterators would be capped out at usize's max value, so overflow would no longer be an issue for the position functionality.

I will leave it up to you to determine if such a change would be in line with your vision for this project. I'm just trying to think from a perspective of reducing friction for users. Is it more convenient to be able to get the arg count natively from the API, or is it more convenient to parse from iterators of unknown size? I don't know, but I felt like asking the question.

Anyhow, feel free to close this issue if this is not in line with your vision; at least the decision will be documented here. If you keep the current behavior, it may be worth documenting how to get the size separately from std::env::args_os().

Keeping track of argument position

I asked on reddit about the possibility of enumerating the arguments that are parsed by lexopt.

As @blyxxyz pointed out, this is possible to do with the current implementation (0.1.0), but the approach is not recommended.

@blyxxyz also suggested that a potential option moving forward could be to add a method to the parser that the user could call at any time to retrieve the position of most-recently parsed argument.

Parser::position(&self) -> u64

I think this is a great idea as a minimally-invasive, non-breaking change to the existing implementation that also keeps the API very minimalist and correct.

I believe that the position returned for the first-parsed argument would have to be 1, since argument 0 is technically the binary name. If someone were to call position() before calling next(), then 0 is the only logical choice for this scenario.

I would expect the functionality to be something like this:

let mut p = parse("-a -b -c value");
assert_eq!(p.position(), 0);

assert_eq!(p.next()?.unwrap(), Short('a'));
assert_eq!(p.position(), 1);
    
assert_eq!(p.next()?.unwrap(), Short('b'));
assert_eq!(p.position(), 2);
    
assert_eq!(p.next()?.unwrap(), Short('c'));
assert_eq!(p.position(), 3);

assert_eq!(p.value()?, "value");
assert_eq!(p.position(), 4);

let mut p = parse("-ab -cvalue");
assert_eq!(p.position(), 0);

assert_eq!(p.next()?.unwrap(), Short('a'));
assert_eq!(p.position(), 1);
    
assert_eq!(p.next()?.unwrap(), Short('b'));
assert_eq!(p.position(), 1);
    
assert_eq!(p.next()?.unwrap(), Short('c'));
assert_eq!(p.position(), 2);

assert_eq!(p.value()?, "value");
assert_eq!(p.position(), 2);

How to get remaining arguments

To implement external sub-commands, e.g. git foo --bar --baz, I need to invoke a binary (git-foo in this case) with the arguments --bar --baz.

I can't figure out though, how to get the remaining arguments as plain strings, and not as Long/Short/Value. Any ideas how I might achieve that?
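One way to do this without parser support: pre-scan the raw argument list for the subcommand (the first argument not starting with '-'), hand only the head to lexopt, and forward the tail verbatim. A minimal sketch, plain std only:

```rust
// Find the subcommand (first argument that doesn't start with '-') and
// split there: the head is parsed normally, the tail is forwarded to the
// external binary (e.g. git-foo) untouched.
fn split_subcommand(args: &[String]) -> Option<(&str, &[String])> {
    let i = args.iter().position(|a| !a.starts_with('-'))?;
    Some((args[i].as_str(), &args[i + 1..]))
}

fn main() {
    let args: Vec<String> = ["--verbose", "foo", "--bar", "--baz"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let (cmd, rest) = split_subcommand(&args).unwrap();
    assert_eq!(cmd, "foo");
    assert_eq!(rest, ["--bar", "--baz"]);
}
```

Caveat: this naive scan misidentifies the subcommand if an earlier option takes a value that doesn't start with '-' (e.g. `-o file foo`), so it only works when the options before the subcommand are known to be flags.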

The position() function can be tricky to use in while-let loops

Consider this code:

fn find_first_long_value_with_position() -> Result<Option<(usize, String)>, Error> {
    let mut p = parse("-s --long");
    while let Some(arg) = p.next()? {
        match arg {
            Long(val) => return Ok(Some((p.position(), val.to_owned()))),
            _ => (),
        }
    }
    Ok(None)
}

Unfortunately, this code does not compile due to a double borrow:

|
|   while let Some(arg) = p.next()? {
|                         - mutable borrow occurs here
|       match arg {
|           Long(val) => return Ok(Some((p.position(), val.to_owned()))),
|                                        ^             --- mutable borrow later used here
|                                        |
|                                        immutable borrow occurs here

There are several ways around this; one of them is to simply swap the order of the tuple:

fn find_first_long_value_with_position() -> Result<Option<(String, usize)>, Error> {
    let mut p = parse("-s --long");
    while let Some(arg) = p.next()? {
        match arg {
            Long(val) => return Ok(Some((val.to_owned(), p.position()))),
            _ => (),
        }
    }
    Ok(None)
}

More generally, we just need to ensure that val's non-lexical lifetime has ended before the call to p.position(), so even with the original tuple ordering, we could do something like:

match arg {
    Long(val) => {
        let val = val.to_owned();
        return Ok(Some((p.position(), val)));
    }
    _ => (),
}

I don't think there is a clean way around this while still exposing position() as a public function that takes &self as an argument. After all, even bin_name() has the same difficulties, but I don't think anyone will be generating tuples of arguments with the bin name.

In my opinion, there are several paths forward:

  1. Revert the position() functionality that was implemented in #2. This would make me a bit sad, because I'd really like to see this functionality available in this library, but I would understand.

  2. Leave things the way they are, and thoroughly document how to properly navigate the borrowing nuances. The current functionality is correct and fully usable. It just has some pain points with the borrow checker if you're not careful.

  3. Make the position() function private (or remove it), and change the definition of Arg to always include the position:

pub enum Arg<'a> {
    /// A short option, e.g. `-q`.
    Short(usize, char),
    /// A long option, e.g. `--verbose`. (The dashes are not included.)
    Long(usize, &'a str),
    /// A positional argument, e.g. `/dev/null`.
    Value(usize, OsString),
}

Users who don't care about the position could just ignore it in the match:

match arg {
    Short(_, val) => unimplemented!(),
    Long(_, val) => unimplemented!(),
    Value(_, val) => unimplemented!(),
}
  4. Make the position() function private (or remove it), and make the position data member an Rc<Cell<Option<usize>>>. Then handle the complexity internally, providing two ways to get the next Arg: one that also provides the position, and one that does not.

We could add a function like the following:

pub fn next_enumerated(&mut self) -> Result<Option<(usize, Arg<'_>)>, Error> {
    let pos = self.position.clone(); // Just cloning the Rc
    self.next().map(|option| option.map(|arg| (pos.get().expect("argument count overflow"), arg)))
}

This is more in line with my original suggestion of creating an EnumeratedParser, except we're only adding one new function.

I like option 4) the most because it supports the position functionality without altering the existing functionality, and without user-facing borrow-checker complexity.

Importantly, this is still a breaking change, and so is adding the position() function in the first place. Initially, I thought it wasn't, but I realized that it is after giving it more thought. (See example playground.) I understand if this might change your opinion on the matter, possibly even toward option 1.

For more context, my personal use case for this function is something more like this:

pub struct Token;

// Clone trait bound is just to prevent conflict with the blanket
// implementation of impl<T> From<T> for T
impl<T: Clone> From<T> for Token {
    fn from(_: T) -> Self {
        Self {}
    }
}

#[derive(Debug)]
pub struct EnumeratedToken {
    position: usize,
    token: Token,
}

impl EnumeratedToken {
    fn new(position: usize, token: Token) -> Self {
        Self { position, token }
    }
}

fn lex_enumerated_tokens() -> Result<Vec<EnumeratedToken>, Error> {
    let mut p = lexopt::Parser::from_env();
    let mut tokens = Vec::with_capacity(std::env::args_os().len());
    while let Some((position, arg)) = p.next_enumerated()? {
        match arg {
            Short(val) => tokens.push(EnumeratedToken::new(position, val.try_into()?)),
            Long(val) => tokens.push(EnumeratedToken::new(position, val.try_into()?)),
            Value(val) => tokens.push(EnumeratedToken::new(position, val.try_into()?)),
        }
    }
    Ok(tokens)
}

I also want to note that whatever decision we come to (if it requires implementing new functionality), I'm happy to submit a pull request or do some code review, if you're open to such contributions.

Thanks again for your hard work on this!

Include the binary's name when calling `from_args()`

from_env() takes the executable's name from the argv, but from_args() expects an argv with the name removed. This discrepancy causes a bit of trouble:

  • this is slightly unexpected (although documented)
  • if one uses from_args(), then bin_name() becomes useless (always returns None)

I think it'd be simpler for everyone involved if from_args() took the same input as from_env() does, i.e. an argv including the executable's name.

Add setting to forbid numeric option names

I'm using lexopt to write a command that supports negative numbers and dates (i.e., things starting with - and a digit) as arguments. If the user doesn't pass -- before such an argument, then lexopt treats, say, -1234 as the short option -1, and I have to use Parser::optional_value() to get the rest of the argument, append it to the -1, and parse the result. This works, but it's a little inconvenient, and it doesn't forbid arguments like -v1234.

I therefore request that lexopt add some setting (likely through a ParserBuilder) for not treating a hyphen followed by a digit as a short option. If you think this isn't sufficiently minimalist for lexopt, that's OK with me; I just thought I'd throw the idea out there.
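The workaround described above (re-joining the digit with the rest of the argument obtained from Parser::optional_value()) can be sketched like this; the reassembly itself is plain string handling, shown here outside of lexopt:

```rust
// Reassemble a negative number that the parser split into Short('1') plus
// the remainder "234" (for the input "-1234"), then parse it. In a lexopt
// loop the two pieces would come from Arg::Short and Parser::optional_value().
fn rejoin_negative_number(digit: char, rest: Option<&str>) -> Option<i64> {
    if !digit.is_ascii_digit() {
        return None; // e.g. "-v1234" stays an ordinary short option
    }
    let mut s = String::from("-");
    s.push(digit);
    s.push_str(rest.unwrap_or(""));
    s.parse().ok()
}

fn main() {
    assert_eq!(rejoin_negative_number('1', Some("234")), Some(-1234));
    assert_eq!(rejoin_negative_number('7', None), Some(-7));
    assert_eq!(rejoin_negative_number('v', Some("1234")), None);
}
```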

Arg borrowing from Parser is inconvenient

When creating new abstractions on top of lexopt (like subcommands) it's annoying that Arg borrows from Parser. It means you can't call .value() or pass the parser to another function until you've unpacked the Arg.

The borrowing is necessary to be able to match on Arg::Long in an ergonomic way. You can match a &str against a string literal but not a String. So the Parser owns the string and the Arg just borrows it.

If it ever becomes possible to match a String against a string literal then it might be worth releasing a new backward-incompatible version of lexopt to take advantage of that.

In the meantime there's an ugly trick you can use to decouple the Arg from the Parser:

fn unleash_arg<'a, 'b>(
    storage: &'a mut Option<String>,
    arg: lexopt::Arg<'b>,
) -> lexopt::Arg<'a> {
    use lexopt::Arg::*;
    match arg {
        Long(opt) => {
            *storage = Some(opt.to_owned());
            Long(storage.as_deref().unwrap())
        }
        Short(opt) => Short(opt),
        Value(val) => Value(val),
    }
}

let mut storage = None;
let arg = unleash_arg(&mut storage, arg);

unleash_arg could be made into a method on Arg, but there might be a better solution.

(See also: #3)

Incorrect "missing argument at end of command" error

This code:

fn main() -> Result<(), lexopt::Error> {
    let mut parser = lexopt::Parser::from_env();
    parser.values()?;
    Ok(())
}

Has this behavior:

$ cargo run -- -a b
Error: missing argument at end of command

This is Error::MissingValue's message if Parser doesn't remember an option. In 0.1.0 it was only returned by Parser::value(), at the end of the command line, but Parser::values() may return it if it encounters an option.

An easy fix would be to remove "at end of command" from the message, but maybe there's a better way out.

Please provide way to parse `-o=value` as option taking value `=value`

Hi. I saw this crate mentioned in a blog post and I like the idea. But there is a difficulty:

Argument parsers should be as transparent as possible. Currently, because lexopt supports -o=value to mean "set to value", to unparse an unknown string (like a user-provided filename) it is necessary to always pass the =.

(The situation with short options is different from long options: supporting --long=value is completely unambiguous and simply a good idea.)

IMO the = is unnatural here. I'm not aware of many other programs which treat -o=value as setting the option to value. Almost all (including, for example, all the POSIX utilities) treat it as setting the option to =value. See e.g. the Utility Conventions in the Single Unix Specification, or the getopt(3) manpage.

And as I point out, handling = specially for short option values is not a cost-free feature (unlike with long options): it changes the meaning of existing programs. A shell script which takes some arguments and runs a lexopt-using Rust program, and passes filenames to it, must write the = or risk malfunctioning on filenames which start with =. Because the = is unconventional with a short option, the author of such a script probably won't have written it, so the script will probably have this bug.

And, within an existing Rust project, switching from another option parsing library to lexopt is hazardous because it will change the meaning of command lines which are accepted by both.

Could you please provide an option to allow lexopt to be used without this = on short option feature? I'm not sure if that would involve a ParserBuilder or whether you could just make it a configuration method on Parser.

Personally I think the default ought to be to not handle = specially in short options but that would be a breaking change.
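To make the disagreement concrete, here is a small illustration of how -o=foo is read under each convention. The helper names are hypothetical, and single-byte ASCII option characters are assumed so the byte indexing is safe:

```rust
// How "-o=foo" is read under the POSIX/getopt convention versus the
// '='-separator behavior described in this issue.
fn getopt_style(arg: &str) -> (char, &str) {
    // getopt(3): everything after the option character is the value,
    // including a leading '='.
    let opt = arg[1..].chars().next().unwrap();
    (opt, &arg[2..])
}

fn separator_style(arg: &str) -> (char, &str) {
    // '='-separator convention: a '=' directly after the option
    // character is stripped before the value.
    let (opt, rest) = getopt_style(arg);
    (opt, rest.strip_prefix('=').unwrap_or(rest))
}

fn main() {
    // The two conventions disagree on values that start with '=':
    assert_eq!(getopt_style("-o=foo"), ('o', "=foo"));
    assert_eq!(separator_style("-o=foo"), ('o', "foo"));
    // For other values, they agree:
    assert_eq!(getopt_style("-ofoo"), ('o', "foo"));
    assert_eq!(separator_style("-ofoo"), ('o', "foo"));
}
```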

Thanks for your attention.

The README walkthrough mentions a "convenience" method that I don't think is very clear

We start parsing with Parser::from_env().
We call parser.next() in a loop to get all the arguments until they run out.
We match on arguments. Short and Long indicate an option.
To get the value that belongs to an option (like 10 in -n 10) we call parser.value().

This returns a standard [OsString](https://doc.rust-lang.org/std/ffi/struct.OsString.html).
For convenience, use lexopt::prelude::* adds a .parse() method, analogous to [str::parse](https://doc.rust-lang.org/std/primitive.str.html#method.parse).
Calling parser.value() is how we tell Parser that -n takes a value at all.

In this section of the readme, everything is perfectly clear to a newbie, until it isn't. After the fifth line, which describes the return type of parser.value(), we suddenly get a mention of another method with a confusingly similar name. I find it obfuscates more than it clarifies.

The line "For convenience, use lexopt::prelude::* adds a .parse() method, analogous to str::parse." is my problem. First, what is .parse() working on? How the hell do I keep them apart when they are named almost the same thing? Why is it a convenience?

Unless it's expanded a lot, I suggest that line be removed, as it clarifies little but adds confusion.

...or maybe it's just me being dense.

BTW the markdown looks messy, but I wanted to make the quote stand out from my text.

Feature request: Allow manually setting the context ("last_option") in error messages

Consider the following example:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    use lexopt::prelude::*;

    let mut parser = lexopt::Parser::from_env();
    while let Some(arg) = parser.next()? {
        match arg {
            Long("debug") => {
                println!("Debug mode enabled");
            },
            Value(command) => {
                let arg1 = parser.value()?;
                println!("Command: {}, arg1: {}", command.string()?, arg1.string()?)
            }
            _ => {
                eprintln!("Unknown argument: {:?}", arg);
            }
        }
    }
    Ok(())
}

Running it with a missing command argument:

$ lexopt-test --debug test
Debug mode enabled
Error: missing argument for option '--debug'

The error message is misleading. It should be Error: missing argument for command 'test'.

Perhaps it could be improved in the following ways:

  1. Reset last_option on Parser::next().

If the last option requires an argument, the user would use parser.value(), not parser.next().
If the last option has an optional argument, then the user might use parser.next(), but missing that argument would not be an error about the last option, anyway.
So this change only affects rare cases, such as an option that takes 0 or 2 arguments: after seeing the option, the user would first call parser.next(), and if it yields a value, call parser.value() again, and the error message should still be about the option.

  2. Provide Parser::set_error_message(message_template: &str) and Parser::reset_error_message() to override (and restore) the default error message.

ability to treat all remaining arguments as free arguments

Hi, really nice library.

I would like to stop parsing options when I encounter a "free" argument (an Arg::Value) and treat all subsequent arguments as free arguments, regardless of whether they contain option flags. Ideally, I would be able to call something like parser.finish() (fn finish(self) -> Vec<OsString>) to get the remaining arguments. I didn't see an obvious way to do it with the current API.
