Comments (9)
Yeah, right now we have iter() (which just gives the &str that matched that capture) and iter_pos() (which gives the start/end byte offsets of each capture match). If we get iter_named, should we also have iter_named_pos?
This would be a good addition though, because I don't think there's any real way to do this with the existing public API.
from regex.
I can add that feature as well. Should I add it to the same pull request?
I see that iter_pos() returns Option<Option<(usize, usize)>>. Is this really necessary? Isn't it iterating over known groups (self.idx < self.caps.len())?
Maybe both iter_pos() and iter_named_pos() should just unwrap the position?
from regex.
Allow me to be upfront: I haven't given much thought to the existing iterators defined on Captures, nor to which iterators we should have. Retooling what's there is a totally reasonable thing to do.
Unfortunately, the Option<Option<(usize, usize)>> is probably correct. It is meant to distinguish a capture group that matches something from a capture group that doesn't match. Note that it is possible for a capture group to not match while the overall regex still matches. I guess there are two reasonable ways to handle that in an iterator: do what we do now (represent it with a None value) or just skip it entirely. Skipping it doesn't sound like good default behavior, since the client can do it trivially with filter_map and the identity function.
This is why the other methods on Captures (pos, at and name) return an Option. Namely, None means "this capture was not involved in the match of the regex."
Below is some sample code demonstrating these different cases (note the differences in the regex in the first two functions):
extern crate regex;

// Both groups take part in the match: foo matches the empty string, bar matches "abc".
fn empty_capture() {
    let re = regex::Regex::new(r"(?P<foo>\d*)(?P<bar>[a-z]*)").unwrap();
    let caps = re.captures("abc").unwrap();
    // skip(1) drops group 0, the overall match.
    for pos in caps.iter_pos().skip(1) {
        println!("{:?}", pos);
    }
}

// Alternation: only bar takes part in the match, so foo is reported as None.
fn noexist_capture() {
    let re = regex::Regex::new(r"(?P<foo>\d+)|(?P<bar>[a-z]+)").unwrap();
    let caps = re.captures("abc").unwrap();
    for pos in caps.iter_pos().skip(1) {
        println!("{:?}", pos);
    }
}

// Same regex, but filter_map with the identity function skips the None entries.
fn noexist_capture_skip() {
    let re = regex::Regex::new(r"(?P<foo>\d+)|(?P<bar>[a-z]+)").unwrap();
    let caps = re.captures("abc").unwrap();
    for pos in caps.iter_pos().skip(1).filter_map(|x| x) {
        println!("{:?}", pos);
    }
}

fn main() {
    println!("empty_capture");
    empty_capture();
    println!("noexist_capture");
    noexist_capture();
    println!("noexist_capture_skip");
    noexist_capture_skip();
}
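For anyone who wants to see the two layers of Option without the regex crate, here is a minimal std-only sketch. ToyCaptures and ToyIterPos are hypothetical stand-ins I made up for illustration, not the crate's types; the slots are filled in by hand to model the alternation regex above matched against "abc". The outer Option is just the iterator saying "no more groups"; the inner one says whether that group took part in the match.

```rust
/// Hypothetical stand-in for the capture slots of one match
/// (NOT the regex crate's Captures type).
struct ToyCaptures {
    /// One slot per group; `None` means the group did not take part in the match.
    slots: Vec<Option<(usize, usize)>>,
}

/// Hypothetical position iterator over every group, matched or not.
struct ToyIterPos<'c> {
    caps: &'c ToyCaptures,
    idx: usize,
}

impl ToyCaptures {
    fn iter_pos(&self) -> ToyIterPos<'_> {
        ToyIterPos { caps: self, idx: 0 }
    }
}

impl<'c> Iterator for ToyIterPos<'c> {
    // The item itself is optional, so `next` returns Option<Option<(usize, usize)>>.
    type Item = Option<(usize, usize)>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.idx < self.caps.slots.len() {
            let slot = self.caps.slots[self.idx];
            self.idx += 1;
            Some(slot) // a known group, which may or may not have matched
        } else {
            None // no more groups
        }
    }
}

/// Hand-built slots modelling `(?P<foo>\d+)|(?P<bar>[a-z]+)` on "abc":
/// group 0 (whole match) and bar matched, foo did not.
fn noexist_slots() -> ToyCaptures {
    ToyCaptures {
        slots: vec![Some((0, 3)), None, Some((0, 3))],
    }
}
```

Calling noexist_slots().iter_pos().skip(1) yields every group including the unmatched one, and adding .filter_map(|x| x) drops the unmatched ones, which is the skip behaviour the client can opt into.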
from regex.
PS: I'm new to contributing to the Rust project and open source in general, so I'm sorry if I'm not following standard procedures.
I just wanted to say: thank you for your contribution! You're doing everything perfectly. :-) I apologize for the hang-ups here, but you've stumbled on some really subtle issues of regex capture groups!
from regex.
I can add that feature as well. Should I add it to the same pull request?
I think I'm going to say to hold off for now. If you fix up my last comment (changing to Option<Option<(&str, &str)>>) then I'll merge it because it solves a use case. But I think the design of Captures needs to be reconsidered. (There are other cases not handled, like getting the positions given a name. But we're heading into combinatorial explosion here. We should probably switch to a trait or an enum. Then we can provide fewer iterators too.)
from regex.
Ok, but in that case shouldn't iter() also return Option<Option<&str>>?
from regex.
Yes.
from regex.
One more thing, do you mean that iter_named() should return Option<(&str, Option<&str>)>?
extern crate regex;

fn main() {
    let re = regex::Regex::new(r"(?P<foo>\d+)|(?P<bar>[a-z]+)").unwrap();
    let caps = re.captures("123").unwrap();
    for (k, v) in caps.iter_named() {
        println!("group name: {:?}, value: {:?}", k, v);
    }
    // prints:
    // group name: foo, value: Some("123")
    // group name: bar, value: None
}
That makes more sense IMO, since then you know that you iterate over all the named groups in the regex, and you can check whether each value is Some or None.
Or am I missing something?
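Roughly what I have in mind for the iterator itself, as a std-only sketch. NamedCaptures and IterNamed are hypothetical names for illustration and the group data is filled in by hand (modelling the regex above matched against "123"); the real implementation inside the crate would look different.

```rust
/// Hypothetical stand-in for Captures holding only the named groups.
struct NamedCaptures<'t> {
    text: &'t str,
    /// (group name, byte offsets); `None` if the group did not take part.
    groups: Vec<(&'static str, Option<(usize, usize)>)>,
}

/// Hypothetical iterator over all named groups, matched or not.
struct IterNamed<'c, 't> {
    caps: &'c NamedCaptures<'t>,
    idx: usize,
}

impl<'t> NamedCaptures<'t> {
    fn iter_named(&self) -> IterNamed<'_, 't> {
        IterNamed { caps: self, idx: 0 }
    }
}

impl<'c, 't> Iterator for IterNamed<'c, 't> {
    // Every named group is yielded; only the value is optional.
    type Item = (&'static str, Option<&'t str>);

    fn next(&mut self) -> Option<Self::Item> {
        let &(name, span) = self.caps.groups.get(self.idx)?;
        self.idx += 1;
        let text: &'t str = self.caps.text;
        Some((name, span.map(|(start, end)| &text[start..end])))
    }
}
```

The outer Option returned by next() only signals the end of iteration, so a caller always sees every group name and decides per group what to do with a None value.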
from regex.
Yup, that's exactly right. Nice catch!
from regex.