Comments (2)
Oh sorry, I just stepped back for a second and realized that this is a slightly unfair comparison. `regex_match` checks if there is a match directly at the start of the slice, while `regex::bytes::Regex::is_match` checks for a match anywhere in the slice. The slices I'm passing as input are exactly the same size as the pattern and either have a matching pattern or not, so in this particular case there shouldn't really be a difference.
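To make the anchored/unanchored distinction concrete, here is a minimal std-only sketch. It uses a literal needle as a stand-in for a compiled pattern, and the function names `matches_at_start` and `matches_anywhere` are hypothetical; they are not part of the regex crate or of the matcher described in this thread.

```rust
/// Anchored check: does `needle` occur exactly at the start of `haystack`?
fn matches_at_start(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.starts_with(needle)
}

/// Unanchored check: does `needle` occur at any offset in `haystack`?
fn matches_anywhere(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows(0)` panics, so treat the empty needle as always matching.
    needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle)
}

fn main() {
    let needle = b"abc";

    // Haystack is the same length as the needle: both checks agree.
    assert!(matches_at_start(b"abc", needle));
    assert!(matches_anywhere(b"abc", needle));
    assert!(!matches_at_start(b"abd", needle));
    assert!(!matches_anywhere(b"abd", needle));

    // Longer haystack: only the unanchored check finds the later match.
    assert!(!matches_at_start(b"xxabc", needle));
    assert!(matches_anywhere(b"xxabc", needle));
}
```

When the haystack is exactly as long as the pattern, the two checks return the same answer, but the unanchored one may still pay for setting up a scan over candidate start offsets.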
Tomorrow I'll try to write an equivalent of `regex::bytes::Regex::find`, as I'm not super happy with the performance of that with these patterns either. I'll use that to generate another comparison.
from regex.
> I'm happy to write a test program if that's required for this issue to get attention
Yes, please provide a reproduction program. While you did provide a good amount of details, it's no substitute for the real thing. Otherwise, I have to spend time deciphering exactly what you meant in your prose and possibly make guesses on things. But if you give me the code, the inputs and the specific commands you're running, then it takes all of the guesswork out of the process. I can then be reasonably sure that I'm looking at the same thing you are.
With that said...
> My expectation is that this kind of pattern would be at-worst slightly slower than the naïve alternative that follows:
I think your expectation is way off personally. I wouldn't call your code naive in any way. It's a bespoke targeted solution to a specific problem. There are a variety of reasons why it might be faster here:
1. My guess is that your benchmark is latency oriented. Your bespoke code is small and likely has little to almost no overhead at all. Compare that with a regex engine, which has to execute a pile of code for every search before it even gets to executing the search itself.
2. Regex engines support a very general pattern language, and they just don't take advantage of every possible optimization opportunity. Sometimes it's because it's difficult to do. Sometimes it's because it might not be worth it. And in other less interesting cases, it's just because nobody has attempted to implement it. In this case, from what I can see, your bespoke matcher is doing something very critical to performance that the regex engine is likely not doing: it's recognizing the bounded repeat as a simple skip instruction and literally jumping over an entire portion of the haystack. The regex engine, by contrast, is still going to match `.` 1023 times, one byte at a time. That's going to lose---and lose big---to the simple pointer offset you've implemented.
With respect to (2), it's possible the regex engine could implement this optimization, but it would be limited to specific cases. And I haven't thought through all of it.
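The difference between skipping and stepping can be sketched as follows. This is an illustration only: the exact pattern from the issue is not shown in this thread, so the shape used here (one literal byte, exactly `gap` arbitrary bytes, one literal byte, anchored at the start) is an assumption, and both function names are hypothetical. Neither function reflects how the regex crate is actually implemented.

```rust
/// Bespoke matcher: treats the bounded repeat as a single skip.
/// Assumed pattern shape: `prefix`, then exactly `gap` arbitrary bytes, then `suffix`.
fn match_with_skip(haystack: &[u8], prefix: u8, gap: usize, suffix: u8) -> bool {
    // Two bounds-checked reads; the gap is jumped over by index arithmetic.
    haystack.first() == Some(&prefix) && haystack.get(gap + 1) == Some(&suffix)
}

/// Stepwise matcher: consumes the gap one byte at a time, roughly the way a
/// general-purpose engine would when it matches `.` repeatedly.
fn match_stepwise(haystack: &[u8], prefix: u8, gap: usize, suffix: u8) -> bool {
    let mut bytes = haystack.iter();
    if bytes.next() != Some(&prefix) {
        return false;
    }
    for _ in 0..gap {
        // Each iteration stands in for one "match any byte" step.
        if bytes.next().is_none() {
            return false;
        }
    }
    bytes.next() == Some(&suffix)
}

fn main() {
    let gap = 1023;
    let mut haystack = vec![0u8; gap + 2];
    haystack[0] = b'A';
    haystack[gap + 1] = b'Z';

    // Both matchers agree on the result; they differ only in the work done.
    assert!(match_with_skip(&haystack, b'A', gap, b'Z'));
    assert!(match_stepwise(&haystack, b'A', gap, b'Z'));

    haystack[gap + 1] = b'Y';
    assert!(!match_with_skip(&haystack, b'A', gap, b'Z'));
    assert!(!match_stepwise(&haystack, b'A', gap, b'Z'));
}
```

The skipping version does a constant amount of work regardless of `gap`, while the stepwise version does work proportional to it. The skip is only valid here because every byte in the gap is allowed to be anything, which hints at why a general engine could apply it only in specific cases.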