BurntSushi commented on August 15, 2024

Yeah, right now we have iter() (which just gives the &str that matched that capture) and iter_pos() (which gives the start/end byte offsets of each capture match). If we get iter_named, should we also have iter_named_pos?

This would be a good addition though, because I don't think there's any real way to do this with the existing public API.

from regex.

havarnov commented on August 15, 2024

I can add that feature as well. Should I add it to the same pull request?

I see that iter_pos() returns Option<Option<(usize, usize)>>. Is this really necessary? Isn't it iterating over known groups (self.idx < self.caps.len())?

Maybe both iter_pos() and iter_named_pos() should just unwrap the position?

BurntSushi commented on August 15, 2024

Allow me to be upfront: I haven't given much thought to the existing iterators defined on Captures, nor to which iterators we should have. Retooling what's there is a totally reasonable thing to do.

Unfortunately, the Option<Option<(usize, usize)>> is probably correct. It is meant to delineate the difference between a capture group that matches something and a capture group that doesn't match. Note that it is possible for a capture group to not match while the overall regex still matches. I guess there are two reasonable ways to handle that in an iterator: do what we do now (represent it with a None value) or just skip it entirely. Skipping it doesn't sound like good default behavior since the client can do it trivially with filter_map and the identity function.

This is why the other methods on Captures (pos, at and name) return an Option. Namely, None means "this capture was not involved in the match of the regex."

Below is some sample code demonstrating these different cases (note how the regex differs between the first two functions):

extern crate regex;

// Both groups participate: `foo` matches the empty string,
// so its position is Some, not None.
fn empty_capture() {
    let re = regex::Regex::new(r"(?P<foo>\d*)(?P<bar>[a-z]*)").unwrap();
    let caps = re.captures("abc").unwrap();
    // skip(1) skips group 0, which is the overall match.
    for pos in caps.iter_pos().skip(1) {
        println!("{:?}", pos);
    }
}

// The regex matches through the `bar` branch of the alternation,
// so `foo` does not participate and its position is None.
fn noexist_capture() {
    let re = regex::Regex::new(r"(?P<foo>\d+)|(?P<bar>[a-z]+)").unwrap();
    let caps = re.captures("abc").unwrap();
    for pos in caps.iter_pos().skip(1) {
        println!("{:?}", pos);
    }
}

// Same as above, but filter_map with the identity closure
// drops the None entries for non-participating groups.
fn noexist_capture_skip() {
    let re = regex::Regex::new(r"(?P<foo>\d+)|(?P<bar>[a-z]+)").unwrap();
    let caps = re.captures("abc").unwrap();
    for pos in caps.iter_pos().skip(1).filter_map(|x| x) {
        println!("{:?}", pos);
    }
}

fn main() {
    println!("empty_capture");
    empty_capture();
    println!("noexist_capture");
    noexist_capture();
    println!("noexist_capture_skip");
    noexist_capture_skip();
}
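For readers without that old regex version at hand, here is a std-only sketch of the same three cases. It models a match's capture groups as a hypothetical slice of `Option<(usize, usize)>` spans (index 0 is the whole match), so an iterator over it has `next()` returning `Option<Option<(usize, usize)>>`. `Span`, `group_spans` and `participating_spans` are illustrative names, and the span values are hand-written to mirror what the two regexes above should produce on "abc"; nothing here runs a regex engine.

```rust
// Hypothetical model: Some((start, end)) is the byte span of a group,
// None means the group did not participate in the match.
type Span = Option<(usize, usize)>;

/// One item per group after group 0 (the overall match).
/// Each item is itself an Option, so `next()` yields Option<Option<(usize, usize)>>.
fn group_spans(slots: &[Span]) -> impl Iterator<Item = Span> + '_ {
    slots.iter().copied().skip(1)
}

/// Keeps only participating groups, using filter_map with the identity closure.
fn participating_spans(slots: &[Span]) -> Vec<(usize, usize)> {
    group_spans(slots).filter_map(|x| x).collect()
}

fn main() {
    // Mirrors `(?P<foo>\d*)(?P<bar>[a-z]*)` on "abc":
    // `foo` matches the empty string at offset 0, `bar` matches "abc".
    let empty: Vec<Span> = vec![Some((0, 3)), Some((0, 0)), Some((0, 3))];
    // Mirrors `(?P<foo>\d+)|(?P<bar>[a-z]+)` on "abc":
    // `foo` does not participate at all.
    let noexist: Vec<Span> = vec![Some((0, 3)), None, Some((0, 3))];

    // prints [Some((0, 0)), Some((0, 3))]
    println!("{:?}", group_spans(&empty).collect::<Vec<_>>());
    // prints [None, Some((0, 3))]
    println!("{:?}", group_spans(&noexist).collect::<Vec<_>>());
    // prints [(0, 3)]
    println!("{:?}", participating_spans(&noexist));
}
```

The empty-match case and the non-participating case differ only in the inner value (`Some((0, 0))` versus `None`), which is exactly the information that unwrapping the position would throw away.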

BurntSushi commented on August 15, 2024

> PS: I'm new to contributing to the Rust project and to open source in general, so I'm sorry if I'm not following standard procedures.

I just wanted to say: thank you for your contribution! You're doing everything perfectly. :-) I apologize for the hang-ups here, but you've stumbled on some really subtle issues with regex capture groups!

BurntSushi commented on August 15, 2024

> I can add that feature as well. Should I add it to the same pull request?

I think I'm going to say hold off for now. If you fix up my last comment (changing to Option<Option<(&str, &str)>>), then I'll merge it, because it solves a use case. But I think the design of Captures needs to be reconsidered. (There are other cases not handled, like getting the positions given a name. But we're heading toward a combinatorial explosion here. We should probably switch to a trait or an enum. Then we can provide fewer iterators too.)
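One possible reading of the "trait or an enum" idea, as a hypothetical std-only sketch: `Key`, `Caps`, `pos` and `at` below are illustrative names, not the regex crate's API, and `Caps` is a simplified stand-in whose spans are hand-written to mirror `(?P<foo>\d+)|(?P<bar>[a-z]+)` on "abc". A single key enum lets one lookup method cover both index and name access, so "positions given a name" does not need its own method.

```rust
/// Hypothetical key: look up a capture group by index or by name.
#[derive(Clone, Copy, Debug)]
enum Key<'a> {
    Index(usize),
    Name(&'a str),
}

/// Hypothetical, simplified stand-in for Captures (not the regex crate's type).
struct Caps<'t> {
    text: &'t str,
    names: Vec<Option<&'static str>>,   // names[i] is group i's name, if any
    slots: Vec<Option<(usize, usize)>>, // None = group did not participate
}

impl<'t> Caps<'t> {
    /// Resolves either kind of key to a group index.
    fn index_of(&self, key: Key) -> Option<usize> {
        match key {
            Key::Index(i) if i < self.slots.len() => Some(i),
            Key::Index(_) => None,
            Key::Name(n) => self.names.iter().position(|&name| name == Some(n)),
        }
    }

    /// Byte span of the group; None if it doesn't exist or didn't participate.
    fn pos(&self, key: Key) -> Option<(usize, usize)> {
        self.index_of(key).and_then(|i| self.slots[i])
    }

    /// Matched text of the group, derived from `pos`.
    fn at(&self, key: Key) -> Option<&'t str> {
        let text = self.text;
        self.pos(key).map(move |(s, e)| &text[s..e])
    }
}

fn main() {
    let caps = Caps {
        text: "abc",
        names: vec![None, Some("foo"), Some("bar")],
        slots: vec![Some((0, 3)), None, Some((0, 3))],
    };
    // prints Some((0, 3))
    println!("{:?}", caps.pos(Key::Name("bar")));
    // prints Some("abc")
    println!("{:?}", caps.at(Key::Index(2)));
    // prints None
    println!("{:?}", caps.at(Key::Name("foo")));
}
```

Under this sketch, each new way of naming a group is one more enum variant rather than one more method per accessor.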

havarnov commented on August 15, 2024

OK, but in that case shouldn't iter() also return Option<Option<&str>>?

BurntSushi commented on August 15, 2024

Yes.
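In other words, iter() keeps the same per-group None that iter_pos() has instead of dropping groups that did not participate. A hypothetical std-only sketch (`group_texts` is an illustrative name, not a regex API) that derives each group's text from its span, with spans hand-written to mirror `(?P<foo>\d+)|(?P<bar>[a-z]+)` on "abc":

```rust
/// Maps each group's span to the text it matched, preserving None
/// for groups that did not participate (mirrors the iter_pos() shape).
fn group_texts<'t>(text: &'t str, slots: &[Option<(usize, usize)>]) -> Vec<Option<&'t str>> {
    slots
        .iter()
        .map(move |slot| slot.map(move |(s, e)| &text[s..e]))
        .collect()
}

fn main() {
    // Group 0 is the overall match; `foo` (group 1) did not participate.
    let slots = [Some((0, 3)), None, Some((0, 3))];
    // prints [Some("abc"), None, Some("abc")]
    println!("{:?}", group_texts("abc", &slots));
}
```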

havarnov commented on August 15, 2024

One more thing, do you mean that iter_named() should return Option<(&str, Option<&str>)>?

extern crate regex;

fn main() {
    let re = regex::Regex::new(r"(?P<foo>\d+)|(?P<bar>[a-z]+)").unwrap();
    let caps = re.captures("123").unwrap();
    for (k, v) in caps.iter_named() {
        println!("group name: {:?}, value: {:?}", k, v);
    }
    // prints:
    // group name: foo, value: Some("123")
    // group name: bar, value: None
}

That makes more sense IMO, since you then know you're iterating over all the named groups in the regex, and you can check whether each value is Some or None.

Or am I missing something?

BurntSushi commented on August 15, 2024

Yup, that's exactly right. Nice catch!
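The agreed shape, one `(name, Option<value>)` item for every named group, can be sketched without a regex engine. `Caps` and its fields below are hypothetical stand-ins, not the regex crate's types, and the spans are hand-written to mirror `(?P<foo>\d+)|(?P<bar>[a-z]+)` on "123". Each item is `(&str, Option<&str>)`, so `next()` returns `Option<(&str, Option<&str>)>`.

```rust
/// Hypothetical std-only model of a match: group names and byte spans.
/// Index 0 is the overall match and is never named.
struct Caps<'t> {
    text: &'t str,
    names: Vec<Option<&'static str>>,
    slots: Vec<Option<(usize, usize)>>,
}

impl<'t> Caps<'t> {
    /// Yields (name, value) for every named group, in group order.
    /// `value` is None when the group did not participate in the match.
    fn iter_named(&self) -> impl Iterator<Item = (&'static str, Option<&'t str>)> + '_ {
        let text = self.text;
        self.names
            .iter()
            .copied()
            .zip(self.slots.iter().copied())
            .filter_map(move |(name, slot)| {
                let name = name?; // skip unnamed groups such as group 0
                Some((name, slot.map(|(s, e)| &text[s..e])))
            })
    }
}

fn main() {
    let caps = Caps {
        text: "123",
        names: vec![None, Some("foo"), Some("bar")],
        slots: vec![Some((0, 3)), Some((0, 3)), None],
    };
    // prints:
    // group name: foo, value: Some("123")
    // group name: bar, value: None
    for (k, v) in caps.iter_named() {
        println!("group name: {}, value: {:?}", k, v);
    }
}
```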
