Comments (8)
Can you please describe the problem you're trying to solve?
from regex.
Using
let re = Regex::new(r"([a-zA-Z_][a-zA-Z0-9]*)|([0-9]+)|(\.)|(=)").unwrap();
When matching "asdf.aeg = 34", I get:
group index 1, match "asdf"
group index 3, match "."
group index 1, match "aeg"
group index 4, match "="
group index 2, match "34"
I just want to get each match's group index in O(1) time, without having to iterate over the groups to find it.
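For comparison, the token stream listed above can be produced without a regex at all. Below is a std-only sketch, not the regex crate: `lex` is a hypothetical name, and its group numbers are chosen to mirror the four groups in the pattern. Each branch of the hand-written lexer knows its own group index when it recognizes a token, so no scan over groups happens per match.

```rust
/// Hypothetical hand-written lexer mirroring the groups of
/// ([a-zA-Z_][a-zA-Z0-9]*)|([0-9]+)|(\.)|(=)
/// Each token is returned with the group number its branch corresponds to.
fn lex(input: &str) -> Vec<(usize, &str)> {
    let bytes = input.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let c = bytes[i];
        let group = if c.is_ascii_alphabetic() || c == b'_' {
            // Group 1: identifier. The tail allows only letters and digits,
            // exactly like [a-zA-Z0-9]* in the pattern above.
            i += 1;
            while i < bytes.len() && bytes[i].is_ascii_alphanumeric() {
                i += 1;
            }
            1
        } else if c.is_ascii_digit() {
            // Group 2: one or more digits.
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            2
        } else if c == b'.' {
            // Group 3: a literal dot.
            i += 1;
            3
        } else if c == b'=' {
            // Group 4: a literal equals sign.
            i += 1;
            4
        } else {
            // Not part of any group (e.g. whitespace): skip it, the way a
            // regex match iterator skips text between matches.
            i += 1;
            continue;
        };
        // The group number is already known here; nothing is scanned.
        tokens.push((group, &input[start..i]));
    }
    tokens
}

fn main() {
    for (group, text) in lex("asdf.aeg = 34") {
        println!("group index {}, match {:?}", group, text);
    }
}
```

Running it prints the same five `group index N, match "..."` lines listed above. The trade-off is that the lexer has to be kept in sync with the pattern by hand.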
It sounds like what you want is the index of the first capture group that matches, in O(1) time. In the example you've given, the O(n) solution (where n is the number of capture groups) seems perfectly fine to me because n is tiny. I suspect that this is true for most real-world use cases too.
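The O(n) approach amounts to a linear scan over one match's capture slots. With the regex crate this is something like `caps.iter().skip(1).position(|m| m.is_some()).map(|i| i + 1)`. The std-only sketch below models the slots as a hypothetical `&[Option<(usize, usize)>]` (slot 0 being the whole match) so the scan itself is visible: one check per group, which is what n counts here.

```rust
/// Return the number of the first capture group (1-based, skipping group 0,
/// the overall match) that participated in a match. This is the O(n) scan,
/// where n is the number of groups.
///
/// `slots` is a hypothetical stand-in for a match's capture data: slot k holds
/// Some((start, end)) if group k matched and None otherwise.
fn first_matching_group(slots: &[Option<(usize, usize)>]) -> Option<usize> {
    slots
        .iter()
        .enumerate()
        .skip(1) // slot 0 is the whole match, not an alternative
        .find(|(_, slot)| slot.is_some())
        .map(|(group, _)| group)
}

fn main() {
    // Capture slots for the "." token of "asdf.aeg = 34" under
    // ([a-zA-Z_][a-zA-Z0-9]*)|([0-9]+)|(\.)|(=): only group 3 took part.
    let slots: [Option<(usize, usize)>; 5] = [Some((4, 5)), None, None, Some((4, 5)), None];
    println!("{:?}", first_matching_group(&slots)); // prints Some(3)
}
```

With four alternatives this is at most four `is_some` checks per match.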
In order to add this feature (and comply strictly with the O(1) requirement), the regex VM would have to actually keep track of the first capture group that matches. This seems like a lot of work for probably zero gain (other than a nicer theoretical property).
I'm not inclined to get on board with this, but I'm happy to leave the issue open for now in case there is a real world use case that I've missed that would materially benefit from a change like this.
I'm going to close this because I'm not convinced this is actually worth adding. If there's a more compelling use case that I'm missing, I'm happy to try and work through it with you. (A benchmark would be handy too, since O(n) where n is typically tiny probably isn't significant.)
Horrible decision. "N" might be small for you, but decisions like this lead to a generally poor set of libraries. Try telling that to somebody writing an NLP tokenizer over a large corpus, to a compiler writer, or to somebody parsing news feeds for stock trading. I really think you should reconsider. Rust is meant to be performance-oriented, and it seems strange to make such concessions in critical library code.
Hey, @jnordwick, do you happen to have a benchmark you can feed @BurntSushi? He's very responsive to feedback, and the decision seems totally reasonable given the context, since this feature will cause complications internally. 😄
(I'm sure everyone will be super-receptive as long as we don't get stuck in an unnecessarily aggressive discussion.)
@huonw I've found a few acceptable workarounds, especially since this would cause a lot of grief in the code. Part of it was my misunderstanding of str slices, and part of it is switching to a few simpler string matchers. I have to write KMP and Rabin-Karp rolling hash anyways.
@jnordwick Yes, please try to be friendly and constructive.
Could you help me make a better decision by providing some data? A benchmark would be great!
I'm also not aware of other regex libraries that provide this information. Have you used one? If so, I'd love to take a peek.
> I have to write KMP and Rabin-Karp rolling hash anyways.
If you're in this domain of problems, then you may find aho-corasick useful. I built it for use inside this crate, so its API isn't as useful as it could be. But I'd be happy to iterate on it with you.
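The aho-corasick crate answers the analogous question for literal patterns: each match records which pattern produced it. To illustrate why that is cheap to report (the automaton state reached at a match already identifies the pattern or patterns ending there), here is a small std-only sketch of the Aho-Corasick automaton. `MiniAhoCorasick` and its methods are hypothetical and are not the crate's API; the sketch assumes non-empty byte patterns and reports every (pattern index, start, end), overlapping matches included.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, minimal Aho-Corasick automaton over bytes (not the
/// aho-corasick crate's API). Assumes every pattern is non-empty.
struct MiniAhoCorasick {
    goto: Vec<HashMap<u8, usize>>, // trie transitions for each state
    fail: Vec<usize>,              // failure link for each state
    out: Vec<Vec<usize>>,          // indices of patterns that end at each state
    lens: Vec<usize>,              // pattern lengths, used to compute match starts
}

impl MiniAhoCorasick {
    fn new(patterns: &[&str]) -> Self {
        // 1. Build a trie of all patterns; state 0 is the root.
        let mut goto = vec![HashMap::new()];
        let mut out = vec![Vec::new()];
        for (pid, pat) in patterns.iter().enumerate() {
            let mut s = 0;
            for &b in pat.as_bytes() {
                s = match goto[s].get(&b).copied() {
                    Some(next) => next,
                    None => {
                        goto.push(HashMap::new());
                        out.push(Vec::new());
                        let next = goto.len() - 1;
                        goto[s].insert(b, next);
                        next
                    }
                };
            }
            out[s].push(pid);
        }

        // 2. Compute failure links breadth-first; depth-1 states fail to the root.
        let mut fail = vec![0; goto.len()];
        let mut queue: VecDeque<usize> = goto[0].values().copied().collect();
        while let Some(s) = queue.pop_front() {
            for (&b, &t) in &goto[s] {
                // Longest proper suffix of t's string that is also a trie path.
                let mut f = fail[s];
                while f != 0 && !goto[f].contains_key(&b) {
                    f = fail[f];
                }
                fail[t] = goto[f].get(&b).copied().unwrap_or(0);
                // t also reports every pattern its failure state reports.
                let inherited = out[fail[t]].clone();
                out[t].extend(inherited);
                queue.push_back(t);
            }
        }

        let lens = patterns.iter().map(|p| p.len()).collect();
        MiniAhoCorasick { goto, fail, out, lens }
    }

    /// Every (pattern index, start, end) occurrence, overlapping ones included.
    fn find_all(&self, haystack: &str) -> Vec<(usize, usize, usize)> {
        let mut matches = Vec::new();
        let mut s = 0;
        for (i, &b) in haystack.as_bytes().iter().enumerate() {
            while s != 0 && !self.goto[s].contains_key(&b) {
                s = self.fail[s];
            }
            s = self.goto[s].get(&b).copied().unwrap_or(0);
            // The state itself says which pattern(s) just ended here.
            for &pid in &self.out[s] {
                matches.push((pid, i + 1 - self.lens[pid], i + 1));
            }
        }
        matches
    }
}

fn main() {
    let ac = MiniAhoCorasick::new(&["he", "she", "his", "hers"]);
    for (pid, start, end) in ac.find_all("ushers") {
        println!("pattern {} matched {}..{}", pid, start, end);
    }
}
```

With the classic patterns `he`, `she`, `his`, `hers` on `ushers`, this reports pattern 1 at 1..4, pattern 0 at 2..4, and pattern 3 at 2..6. The real crate offers other match semantics and much faster automata, so treat this only as an illustration of where the pattern index comes from.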
Related Issues (20)
- Inconsistent behavior with zero-width matches on empty strings
- Valid prefix search (with ^) goes into dead state
- The regex parse error while the expre is correct !
- Onepass DFA always has empty captures (user error)
- dfa/onepass.rs: index out of bounds
- Errors when running quickstart from docs
- Add a flag for unescaped literal groups
- Inconsistency with is_match and Python's search in Matching Specific Regex Patterns
- regex-lite with a &[u8] haystack
- Underscore will not match propblaly
- Invalid regex with multiple repetition flags is accepted
- Valgrind reports "possibly lost" when using static `Regex`
- adding regex-automata to cargo.toml file reduces performance
- Add method to get the full match of a Captures without an unwrap
- valgrind reports "Conditional jump or move depends on uninitialised value(s)"
- no_std support and cargo.toml doc comments
- calling `caputres_iter` in an `async` function may cause memory leak
- Compilation error when compiling with nightly with `unstable` feature enabled
- switch `once_cell` recommendation to `std::sync::LazyLock`
- Version 1.5.0 should be yanked