
Comments (8)

BurntSushi commented on September 18, 2024

Can you please describe the problem you're trying to solve?

from regex.

qdwang commented on September 18, 2024

Using

let re = Regex::new(r"([a-zA-Z_][a-zA-Z0-9]*)|([0-9]+)|(\.)|(=)").unwrap();

When matching "asdf.aeg = 34", I get:

group index 1, match "asdf"
group index 3, match "."
group index 1, match "aeg"
group index 4, match "="
group index 2, match "34"

I just want to get the index of the matching group for each match in O(1) time, without having to iterate over the groups to find it.


BurntSushi commented on September 18, 2024

It sounds like what you want is the index of the first capture group that matches in O(1) time. In the example you've given, the O(n) solution seems perfectly fine to me because n is tiny. I suspect that this is true for most real world use cases too.
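The O(n) approach mentioned here can be sketched as a linear scan over the capture slots of a single match. The snippet below is a minimal, standard-library-only illustration: it models the capture groups as a slice of `Option<&str>`, which is an assumption made to keep the example self-contained, and `first_matching_group` is a hypothetical helper, not part of the regex crate. Slot 0 stands for the overall match, so the scan starts at slot 1.

```rust
/// Returns the index of the first capture group that participated in the
/// match, skipping slot 0 (the overall match). Returns None if no group matched.
/// Hypothetical helper: `groups` stands in for one match's capture slots.
fn first_matching_group(groups: &[Option<&str>]) -> Option<usize> {
    groups
        .iter()
        .enumerate()
        .skip(1) // slot 0 is the whole match, not a numbered group
        .find(|(_, g)| g.is_some())
        .map(|(i, _)| i)
}

fn main() {
    // Capture slots for the token "." under the four-alternative pattern
    // from this thread: slot 0 = whole match, slots 1..=4 = the alternatives.
    let dot = [Some("."), None, None, Some("."), None];
    println!("{:?}", first_matching_group(&dot)); // prints Some(3)
}
```

With the regex crate itself, the same scan can be written over `Captures::iter()`, for example `caps.iter().skip(1).position(|m| m.is_some())`, which gives a zero-based offset that needs 1 added to become the group index. Either way the cost is proportional to the number of groups, which is four in the pattern above.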

In order to add this feature (and comply strictly with the O(1) requirement), the regex VM would have to actually keep track of the first capture group that matches. This seems like a lot of work for probably zero gain (other than a nicer theoretical property).
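To make "keeping track of the group that matches" concrete, here is a hand-written matcher for the one pattern in this thread. It is only an illustration under stated assumptions, not how the regex crate or its VM is implemented: `tokenize` is a hypothetical function that records the alternative's index at the moment it matches, so no scan over groups is needed afterwards. Like an unanchored search, it skips input that no alternative matches.

```rust
/// Scans `input` and returns (branch index, matched text) pairs. The branch
/// index mirrors the group numbers of the pattern in this thread:
/// 1 = identifier, 2 = digits, 3 = ".", 4 = "=".
/// Hypothetical, hand-rolled stand-in for the regex; ASCII tokens only.
fn tokenize(input: &str) -> Vec<(usize, &str)> {
    let bytes = input.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let b = bytes[i];
        // The branch taken is known as soon as the first byte is classified.
        let branch = if b.is_ascii_alphabetic() || b == b'_' {
            i += 1;
            // Continuation is [a-zA-Z0-9]*, as in the original pattern.
            while i < bytes.len() && bytes[i].is_ascii_alphanumeric() {
                i += 1;
            }
            1
        } else if b.is_ascii_digit() {
            i += 1;
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            2
        } else if b == b'.' {
            i += 1;
            3
        } else if b == b'=' {
            i += 1;
            4
        } else {
            i += 1; // no alternative matches here: skip this byte
            continue;
        };
        out.push((branch, &input[start..i]));
    }
    out
}

fn main() {
    for (group, text) in tokenize("asdf.aeg = 34") {
        println!("group index {}, match {:?}", group, text);
    }
}
```

Run on "asdf.aeg = 34", this reports groups 1, 3, 1, 4, 2 in that order, the same sequence as in the original report. A general-purpose regex engine cannot specialize on one pattern like this, which is part of why tracking the information inside the engine is more work.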

I'm not inclined to get on board with this, but I'm happy to leave the issue open for now in case there is a real world use case that I've missed that would materially benefit from a change like this.


BurntSushi commented on September 18, 2024

I'm going to close this because I'm not convinced this is actually worth adding. If there's a more compelling use case that I'm missing, I'm happy to try and work through it with you. (A benchmark would be handy too, since O(n) where n is typically tiny probably isn't significant.)


jnordwick commented on September 18, 2024

Horrible decision. "N" might be small for you, but these decisions lead to a generally poor set of libraries. Try telling this to somebody writing an NLP tokenizer over a large corpus, a compiler writer, or somebody parsing news feeds for stock trading. I really think you should reconsider. Rust is made to be performance based, and it seems strange to make such concessions in critical library code.


huonw commented on September 18, 2024

Hey, @jnordwick, do you happen to have a benchmark you can feed @BurntSushi? He's very responsive to feedback, and the decision seems totally reasonable given the context, since this feature will cause complications internally. 😄

(I'm sure everyone will be super-receptive as long as we don't get stuck in an unnecessarily aggressive discussion.)


jnordwick commented on September 18, 2024

@huonw I've found a few acceptable ways around it, especially if it would cause a lot of grief in the code. Part of it was my misunderstanding of the str slice, and part of it is going with a few simpler string matchers. I have to write KMP and Rabin-Karp rolling hash anyways.


BurntSushi commented on September 18, 2024

@jnordwick Yes, please try to be friendly and constructive.

Could you help me make a better decision by providing some data? A benchmark would be great!

I'm also not aware of other regex libraries that provide this information. Have you used one? If so, I'd love to take a peek.

> I have to write KMP and Rabin-Karp rolling hash anyways.

If you're in this domain of problems, then you may find aho-corasick useful. I built it for use inside this crate, so its API isn't as useful as it could be. But I'd be happy to iterate on it with you.

