dk.brics.automaton's People

Contributors

amoeller, dan2097, lewismc, turf00, valfirst

dk.brics.automaton's Issues

Extract States Sequence

Hi, is there any chance that I can extract the corresponding sequence of states after processing a string on the automaton? Thank you.

Got a different result from the Java Pattern and Automaton

String str = "num1=123&num2=456";
String regex = "num1=.*&num2=.*";
Pattern p = Pattern.compile(regex);
System.out.println(p.matcher(str).matches());
Automaton automaton = new RegExp(regex).toAutomaton();
System.out.println(automaton.run(str));
// result: true, false

If I add '\\' before '&', the result is "true, true".

Is '&' a special character?

I found that '#' has the same problem.

The intersection of two identical regexes that contain one of these special characters also comes out empty.

Are there other special characters?
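
For context: the RegExp syntax of dk.brics.automaton gives a meaning to several characters that java.util.regex treats as literals, including '&' (intersection), '~' (complement), '#' (empty language), '@' (any string), '<' and '>' (intervals and named automata) and the double quote (quoted strings). To my understanding most of these are optional syntax that can be switched off through the syntax flags argument of the two-argument RegExp constructor (for example RegExp.NONE). Another workaround is to escape them before parsing. The helper below is a hypothetical sketch, not part of the library; it uses only the standard library, assumes the character list above is complete, and does not treat bracketed character classes specially.

```java
class BricsEscape {
    // Characters assumed to be special in brics RegExp but literal in java.util.regex.
    private static final String EXTRA_SPECIALS = "&~#@<>\"";

    // Hypothetical helper: prefixes each extra special character with a backslash.
    // Existing escape sequences (backslash + character) are copied unchanged.
    static String escapeExtraSpecials(String regex) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                out.append(c).append(regex.charAt(++i)); // keep "\x" as is
            } else if (EXTRA_SPECIALS.indexOf(c) >= 0) {
                out.append('\\').append(c);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeExtraSpecials("a&b#c"));
    }
}
```

The escaped string could then be passed to new RegExp(...), which matches the observation above that adding a backslash before '&' gives "true, true".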

Unexpected poor performance with sub string matching in AutomatonMatcher

My expectation with brics concerns the following scenario:

  • Matching a sub-string inside an input String with the AutomatonMatcher via a RunAutomaton
  • The regex starts with any-character repetition, i.e. .*
  • The input string does not match at all

In that case I would expect the runtime of matching to be proportional to the number of characters in the input string, i.e. each char would be examined once as part of the matching.

E.g.

  public boolean find(final String input) {
    // automaton is of type RunAutomaton built with tableize=true
    final AutomatonMatcher matcher = automaton.newMatcher(input);
    return matcher.find();
  }

However, this does not appear to be the case at all. The implementation does some form of "backtracking" in the AutomatonMatcher#find() method. Is this a known problem, or is it simply the expected cost of being able to match and return the position of the match?

https://github.com/cs-au-dk/dk.brics.automaton/blob/master/src/dk/brics/automaton/AutomatonMatcher.java#L94

  public boolean find() {
    ...
    int l = getChars().length();
    while (begin < l) {
      int p = automaton.getInitialState();
      // inner loop
      // will "backtrack per char" if the matching eventually fails
      for (int i = begin; i < l; i++) {
        final int new_state = automaton.step(p, getChars().charAt(i));
        if (new_state == -1) {
          break;
        } else if (automaton.isAccept(new_state)) {
          // found a match from begin to (i+1)
          match_start = begin;
          match_end = (i + 1);
        }
        p = new_state;
      }
      if (match_start != -1) {
        setMatch(match_start, match_end);
        return true;
      }
      begin += 1;
    }
    if (match_start != -1) {
      setMatch(match_start, match_end);
      return true;
    } else {
      setMatch(-2, -2);
      return false;
    }
  }

You can see that when an attempt starting at position i never reaches an accept state, the inner loop keeps going until the automaton rejects or the input ends, and the search then restarts from position i+1.

Using my own copy of this class with extra counting and logging, I can see the following. We initially noticed this because of poor performance when testing some inputs and patterns.

In essence, we simply count how often automaton.step is called.

For a test with:

  • Input string length = 100,000 chars
  • Input will not match
  • Pattern = .*(rhino[£$%])+

Then we see 5000050000 calls to automaton.step,

which is far more than the roughly 100k calls I expected.
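
The reported count is consistent with the nested loops quoted above: if step never returns -1 and no accept state is ever reached, an attempt starting at begin makes l - begin calls, so an input of length n costs n + (n-1) + ... + 1 = n(n+1)/2 calls. For n = 100,000 that is exactly 5000050000. The standalone sketch below replays only the loop structure and counts iterations; the class and method names are made up for illustration and it does not use the library.

```java
class FindStepCount {
    // Mimics the nested loops of find() for the worst case described above:
    // the automaton never rejects and no state is ever accepting.
    static long countSteps(int length) {
        long steps = 0;
        for (int begin = 0; begin < length; begin++) {
            for (int i = begin; i < length; i++) {
                steps++; // stands in for one call to automaton.step
            }
        }
        return steps;
    }

    // Closed form of the same sum: n + (n-1) + ... + 1.
    static long closedForm(long length) {
        return length * (length + 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(countSteps(1000));      // 500500
        System.out.println(closedForm(100_000L));  // 5000050000
    }
}
```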

I was able to approach the performance I would expect by hacking a version of the RunAutomaton.run method that returns early if it reaches an accept state and giving any regex a wildcard match at the start. Obviously that doesn't handle position checking.

Please don't get me wrong, though: this library is a great tool and we appreciate all the work that went into it.

You might argue that the example is a little dumb, as it has .* at the start and we are doing a match, but I believe the same problem would also occur with other regexes that use alternations with repetition somewhere other than the start, depending on the form of the input string.

I'm sure I am missing something and will welcome any assistance.

Ignore-case (?i) Flag Support in RegExp

The ignore-case flag is one JDK/Perl regex feature that is badly missing from dk.brics.automaton. Since the question mark is a reserved character, it should hopefully not break existing regular expressions? If it would, it could be made optional, hidden behind a RegExp constructor flag.

If that flag-style is acceptable, I would be willing to take a shot at implementing it.
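
As an illustration of what such a flag would have to expand to, a case-insensitive literal can already be approximated by rewriting each letter as a two-character class. The helper below is a hypothetical workaround sketch (the names are mine, not the library's), limited to plain literals and to characters whose upper and lower case forms are single chars.

```java
class CaseFold {
    // Hypothetical helper: turns a literal such as "ab1" into "[aA][bB]1".
    static String ignoreCaseLiteral(String literal) {
        StringBuilder out = new StringBuilder();
        for (char c : literal.toCharArray()) {
            char lo = Character.toLowerCase(c);
            char up = Character.toUpperCase(c);
            if (lo != up) {
                out.append('[').append(lo).append(up).append(']');
            } else if (Character.isLetterOrDigit(c)) {
                out.append(c);
            } else {
                out.append('\\').append(c); // escape anything that might be an operator
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(ignoreCaseLiteral("ab1"));
    }
}
```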

Is this project still alive?

Hi @amoeller, team !

The last commit on the master was more than 2 years ago and it seems the project isn't actively maintained.

Is this project still alive?

Store RunAutomaton without tableizing, and tableize after load

I have a probably rather niche use case: I'd like to create RunAutomatons, serialize them with store, and tableize them only after they have been deserialized with load. The reasoning is that for the automata in question (https://github.com/dan2097/opsin/tree/master/opsin-core/src/main/resources/uk/ac/cam/ch/wwmm/opsin/resources/serialisedAutomata) tableizing significantly increases their size, and given how fast setAlphabet() is, deserializing this array could even be slower than creating it programmatically (Java's built-in serialization code isn't the fastest).

As setAlphabet() isn't public, I can currently achieve this only via reflection or by having a class in the dk.brics.automaton package.

I can't really think of an idiomatic solution to this. A simple solution would be to make setAlphabet() public, but there normally isn't a reason to call it after the RunAutomaton is constructed. A load(InputStream stream, boolean tableize) could be confusing in how it interacts with serialized tableized and non-tableized automata: e.g. if you set it to false, would it set classmap to null on the deserialized RunAutomaton?

Some regexes are not properly visualized by `toDot()`

After converting some regular expressions to automata and attempting to visualize them, I found that some regexes are not output correctly. Some examples are shown below.

Output when visualizing $\{upper:([a-z0-9]|:|\. ||/):
fig1
In the ([a-z0-9]|:|\. ||/) part, the output does not contain 0-9. Also, :. /, which is expected to be . -: is output.

Output when visualizing $\{([a-z0-9]|:)+:\-([a-z0-9])+:
fig2
In the [a-z0-9] part, the part marked in red is the expected output, while the part marked in blue seems to be redundant.

Some of these regular expressions are improperly visualized, but I get the expected results for matching in both regexes. Are these behaviors due to a bug or the specification of the library?

toString for REGEXP_STRING should escape " (double quote)

When calling toString on a RegExp that is a string, I would expect a double quote to be escaped (if it is not already). This would allow re-parsing the output to get the same RegExp. Here is a failing test:

  @Test
  public void problemRegExpToString() throws IOException {
    String reg = "a\"";
    System.out.println(reg); // a"
    RegExp regexR = new RegExp(reg);
    System.out.println(regexR.toString()); // "a""
    new RegExp(regexR.toString()); // Exception
  }

The second constructor call throws an exception, because the string is now "a"".
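
One possible shape for the requested behaviour is sketched below. It assumes, as I read the RegExp grammar, that a quoted string cannot itself contain a double quote, so a literal containing one has to be printed with backslash escapes instead of surrounding quotes. The class and method are hypothetical and independent of the library.

```java
class RegExpStringRender {
    // Hypothetical rendering of a literal string in a form that can be re-parsed.
    static String render(String s) {
        if (s.indexOf('"') < 0) {
            return "\"" + s + "\""; // quoted form is safe when no quote is inside
        }
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                out.append('\\'); // escape every character that could be an operator
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("a\""));
    }
}
```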

Add support for JPMS modules

Is there any interest in supporting JPMS modules introduced in Java 9? Currently, I am in the process of modularizing some projects and get a warning when using BRICS since the compiler is only able to infer a filename-based module name. I see two possibilities to address this issue.

  1. The least invasive approach would be to include an Automatic-Module-Name entry in the MANIFEST.MF file of the built JAR file. This would allow you to claim a module name (something like dk.brics.automaton) and get rid of the warnings without having to change too much of the build process. For example (in case of Maven), it would only require an additional config property on the jar-plugin.

  2. However, since BRICS barely has any dependencies, it would also be easy to add a full module-info.java descriptor which would allow people to conveniently use BRICS in more advanced setups that include jlink or jpackage. The drawback of this approach is that you would have to build BRICS on a JDK greater than 8. To not break compatibility with the current version, I would suggest to realize this with two compilation phases. One that compiles everything for Java 9+ (including the module-info) and one that re-compiles everything (except the module-info) for Java 8. Since module-info isn't a valid Java identifier, you can't reference this file in regular Java 8 applications. So unless you are doing some reflective JAR scanning, people shouldn't run into any problems with UnsupportedClassVersionErrors.
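
For the second option, the descriptor itself would be small. The following is only a sketch: it assumes the module name dk.brics.automaton suggested above and assumes that dk.brics.automaton is the only package that needs to be exported; neither has been checked against the source tree.

```java
// module-info.java (sketch; module and package names are assumptions)
module dk.brics.automaton {
    // Make the public API package visible to consumers of the module.
    exports dk.brics.automaton;
}
```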

If interested, I could provide a PR for the initial support of this feature. However, since I am not really well-versed in Ant, I would probably need some help to adjust the build.xml as well.

Cover codebase with tests

It would be great to have tests for different algorithms. It allows regression testing and it helps with familiarization.

Regex doesn't handle surrogate pairs properly

Hi, thank you for providing a great regular expression library!

I have noticed that brics handles the input regex string as a sequence of java.lang.Character values (UTF-16 code units), which can cause somewhat unintuitive behavior.

For example, 𠀋 < 𠮟 < 𡵅 as Unicode scalar values (0x2000b, 0x20b9f and 0x21d45 respectively, all of which are encoded as surrogate pairs), but the automaton created from [𠀋-𡵅] doesn't accept 𠮟.

private static void checkOneCodePoint(final String s) {
    if (s.codePointCount(0, s.length()) != 1) throw new IllegalArgumentException();
}

public static boolean testBrics(final String a, final String b, final String c) {
    checkOneCodePoint(a); checkOneCodePoint(b); checkOneCodePoint(c);
    final RegExp regex = new RegExp("[" + a + "-" + c + "]");
    return regex.toAutomaton().run(b);
}

public static boolean testJava(final String a, final String b, final String c) {
    checkOneCodePoint(a); checkOneCodePoint(b); checkOneCodePoint(c);
    final Pattern pattern = Pattern.compile("[" + a + "-" + c + "]");
    return pattern.matcher(b).matches();
}

public static void main(final String[] args) throws IOException {
    final String a = new String(new int[]{0x2000b}, 0, 1); // 𠀋
    final String b = new String(new int[]{0x20b9f}, 0, 1); // 𠮟
    final String c = new String(new int[]{0x21d45}, 0, 1); // 𡵅
    System.out.println(testBrics(a, b, c)); // false
    System.out.println(testJava(a, b, c)); // true
}
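
The surrogate-pair encoding behind this result can be inspected with the standard library alone. Read one char at a time, the class body 𠀋-𡵅 is five UTF-16 units, so the '-' presumably ends up between a low surrogate and a high surrogate rather than between two code points. The class and method names below are illustrative only.

```java
class SurrogateUnits {
    // Returns the UTF-16 code units of s as upper-case hex, separated by spaces.
    static String utf16Units(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) out.append(' ');
            out.append(String.format("%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    // Range test on whole code points, which is what the report expects.
    static boolean inCodePointRange(String lo, String x, String hi) {
        int c = x.codePointAt(0);
        return lo.codePointAt(0) <= c && c <= hi.codePointAt(0);
    }

    public static void main(String[] args) {
        String a = new String(Character.toChars(0x2000b));
        String b = new String(Character.toChars(0x20b9f));
        String c = new String(Character.toChars(0x21d45));
        System.out.println(utf16Units(a)); // D840 DC0B
        System.out.println(utf16Units(b)); // D842 DF9F
        System.out.println(utf16Units(c)); // D847 DD45
        System.out.println(inCodePointRange(a, b, c)); // true
    }
}
```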

Fixing this would require us to:

  • Read the regex string as a stream of code points rather than java.lang.Character by java.lang.Character. This also includes fixes for operator precedence, as in 𠀋+.
  • Convert the code points to java.lang.Characters and, if they involve surrogate pairs, do something similar to what is done for the numerical interval <n-m>.

Although a won't-fix would totally make sense, it would be great if this fact were mentioned in the documentation.

Thanks,

Missing tag

The git repository is missing the tag for the 1.12 release (should be put on d646e15)

Complement pattern match problem

With ordinary regular expressions, the pattern "(?!201[0-8])\d{4}" matches "2022" but not "2012". With the automaton, however, the complement pattern "~(201[0-8])\d{4}" matches both "2022" and "2012", and I don't know why. How can I write a pattern that behaves the same as the regular expression "(?!201[0-8])\d{4}"?
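
A likely explanation is that the lookahead (?!...) is zero-width, whereas ~(...) denotes a language that consumes characters when concatenated. The complement of 201[0-8] contains every string except those nine, including the empty string, so in ~(201[0-8])\d{4} the complement part can match nothing and the digit part can match "2012". What the lookahead pattern expresses under a full match is a set difference: four digits and not 201[0-8]. The sketch below checks that equivalence with java.util.regex only; in brics syntax the same idea could presumably be written with intersection, something like [0-9]{4}&~(201[0-8]), which is an untested assumption on my part.

```java
import java.util.regex.Pattern;

class LookaheadAsDifference {
    static final Pattern LOOKAHEAD = Pattern.compile("(?!201[0-8])\\d{4}");
    static final Pattern FOUR_DIGITS = Pattern.compile("\\d{4}");
    static final Pattern EXCLUDED = Pattern.compile("201[0-8]");

    // Set-difference formulation: four digits AND NOT 201[0-8].
    static boolean difference(String s) {
        return FOUR_DIGITS.matcher(s).matches() && !EXCLUDED.matcher(s).matches();
    }

    // Counts the four-digit strings on which the two formulations disagree.
    static int disagreements() {
        int n = 0;
        for (int i = 0; i < 10000; i++) {
            String s = String.format("%04d", i);
            if (LOOKAHEAD.matcher(s).matches() != difference(s)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(disagreements());
    }
}
```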

Word boundary not working

Hi!
I ran into a word boundary issue. I have a long list of tokens, some of which I need to match with word boundaries. The basic Pattern class works, but Automaton does not.
Is there any way this can be fixed?

Automaton p0_AA = new RegExp(".*(something|\b(blah|foo|goo)\b)").toAutomaton();
RunAutomaton p0_RA = new RunAutomaton(p0_AA);
System.out.println(p0_RA.run("ba foo nery"));

-->false

Basic regex works with above.

Pattern p0 = Pattern.compile(".*(something|\b(blah|foo|goo)\b)");
String s = "ba foo nery";
Matcher m = p0.matcher(s);
if (m.find()) {
System.out.println("pattern found");
} else {
System.out.println("not found");
}

-->found
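
One thing worth ruling out first: as shown here, both snippets use "\b" with a single backslash, which in Java source is the backspace character, not a regex word boundary (the page rendering may simply have dropped a backslash). The standard-library sketch below shows the difference; the class name is made up. Separately, to my understanding brics RegExp treats a backslash followed by a character as that literal character and has no zero-width assertions, so a word boundary would have to be emulated with explicit character classes.

```java
import java.util.regex.Pattern;

class BackspaceVsBoundary {
    // "\b" in Java source is a single backspace character (U+0008).
    static final String BACKSPACE = "\b";
    // "\\b" is a backslash followed by 'b', which java.util.regex reads as a word boundary.
    static final String BOUNDARY = "\\b";

    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "ba foo nery";
        System.out.println(matches(".*" + BOUNDARY + "foo" + BOUNDARY + ".*", input));
        System.out.println(matches(".*" + BACKSPACE + "foo" + BACKSPACE + ".*", input));
    }
}
```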

getFiniteStrings has a bug when the initial state is an accept state

The following assertion fails (in Scala).

assert(
    BasicOperations.union(BasicAutomata.makeString(""), BasicAutomata.makeEmpty())
        .getFiniteStrings().asScala == Set("")
)

It appears there is a bug in the getFiniteStrings function. When:

  • the initial state is an accept state
  • the automaton has no transitions

nothing is added to strings.
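
The expected behaviour can be sketched independently of the library. The Node type below is a hypothetical stand-in, not the library's State class; the point is only that the accept check has to run for the initial state too, before any transition is followed.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FiniteStringsSketch {
    // Hypothetical minimal state type for an acyclic automaton.
    static final class Node {
        final boolean accept;
        final Map<Character, Node> next = new TreeMap<>();
        Node(boolean accept) { this.accept = accept; }
    }

    // Collects every accepted string reachable from the initial state.
    static Set<String> finiteStrings(Node initial) {
        Set<String> out = new TreeSet<>();
        collect(initial, new StringBuilder(), out);
        return out;
    }

    private static void collect(Node s, StringBuilder path, Set<String> out) {
        if (s.accept) {
            out.add(path.toString()); // also covers an accepting initial state
        }
        for (Map.Entry<Character, Node> e : s.next.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        // Single accepting state without transitions: the language is {""}.
        System.out.println(finiteStrings(new Node(true)).size());
    }
}
```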

Fix Javadoc warnings after the JDK 11 upgrade

The following Javadoc warnings should be addressed:

100 warnings
[WARNING] Javadoc Warnings
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:505: warning: no @return
[WARNING] public int getNumberOfStates() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:515: warning: no @return
[WARNING] public int getNumberOfTransitions() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:595: warning: no @return
[WARNING] public String toDot() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:682: warning: no @return
[WARNING] public static Automaton load(URL url) throws IOException, ClassCastException, ClassNotFoundException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:693: warning: no @return
[WARNING] public static Automaton load(InputStream stream) throws IOException, ClassCastException, ClassNotFoundException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:712: warning: no @return
[WARNING] public static Automaton makeEmpty()	{
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:719: warning: no @return
[WARNING] public static Automaton makeEmptyString() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:726: warning: no @return
[WARNING] public static Automaton makeAnyString()	{
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:733: warning: no @return
[WARNING] public static Automaton makeAnyChar() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:740: warning: no @param for c
[WARNING] public static Automaton makeChar(char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:740: warning: no @return
[WARNING] public static Automaton makeChar(char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:747: warning: no @param for min
[WARNING] public static Automaton makeCharRange(char min, char max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:747: warning: no @param for max
[WARNING] public static Automaton makeCharRange(char min, char max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:747: warning: no @return
[WARNING] public static Automaton makeCharRange(char min, char max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:754: warning: no @param for set
[WARNING] public static Automaton makeCharSet(String set) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:754: warning: no @return
[WARNING] public static Automaton makeCharSet(String set) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:761: warning: no @param for min
[WARNING] public static Automaton makeInterval(int min, int max, int digits) throws IllegalArgumentException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:761: warning: no @param for max
[WARNING] public static Automaton makeInterval(int min, int max, int digits) throws IllegalArgumentException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:761: warning: no @param for digits
[WARNING] public static Automaton makeInterval(int min, int max, int digits) throws IllegalArgumentException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:761: warning: no @return
[WARNING] public static Automaton makeInterval(int min, int max, int digits) throws IllegalArgumentException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:768: warning: no @param for s
[WARNING] public static Automaton makeString(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:768: warning: no @return
[WARNING] public static Automaton makeString(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:775: warning: no @param for strings
[WARNING] public static Automaton makeStringUnion(CharSequence... strings) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:775: warning: no @return
[WARNING] public static Automaton makeStringUnion(CharSequence... strings) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:782: warning: no @param for n
[WARNING] public static Automaton makeMaxInteger(String n) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:782: warning: no @return
[WARNING] public static Automaton makeMaxInteger(String n) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:789: warning: no @param for n
[WARNING] public static Automaton makeMinInteger(String n) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:789: warning: no @return
[WARNING] public static Automaton makeMinInteger(String n) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:796: warning: no @param for i
[WARNING] public static Automaton makeTotalDigits(int i) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:796: warning: no @return
[WARNING] public static Automaton makeTotalDigits(int i) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:803: warning: no @param for i
[WARNING] public static Automaton makeFractionDigits(int i) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:803: warning: no @return
[WARNING] public static Automaton makeFractionDigits(int i) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:810: warning: no @param for value
[WARNING] public static Automaton makeIntegerValue(String value) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:810: warning: no @return
[WARNING] public static Automaton makeIntegerValue(String value) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:817: warning: no @param for value
[WARNING] public static Automaton makeDecimalValue(String value) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:817: warning: no @return
[WARNING] public static Automaton makeDecimalValue(String value) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:824: warning: no @param for s
[WARNING] public static Automaton makeStringMatcher(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:824: warning: no @return
[WARNING] public static Automaton makeStringMatcher(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:831: warning: no @param for a
[WARNING] public Automaton concatenate(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:831: warning: no @return
[WARNING] public Automaton concatenate(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:838: warning: no @param for l
[WARNING] static public Automaton concatenate(List<Automaton> l) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:838: warning: no @return
[WARNING] static public Automaton concatenate(List<Automaton> l) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:845: warning: no @return
[WARNING] public Automaton optional() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:852: warning: no @return
[WARNING] public Automaton repeat() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:859: warning: no @param for min
[WARNING] public Automaton repeat(int min) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:859: warning: no @return
[WARNING] public Automaton repeat(int min) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:866: warning: no @param for min
[WARNING] public Automaton repeat(int min, int max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:866: warning: no @param for max
[WARNING] public Automaton repeat(int min, int max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:866: warning: no @return
[WARNING] public Automaton repeat(int min, int max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:873: warning: no @return
[WARNING] public Automaton complement() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:880: warning: no @param for a
[WARNING] public Automaton minus(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:880: warning: no @return
[WARNING] public Automaton minus(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:887: warning: no @param for a
[WARNING] public Automaton intersection(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:887: warning: no @return
[WARNING] public Automaton intersection(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:894: warning: no @param for a
[WARNING] public boolean subsetOf(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:894: warning: no @return
[WARNING] public boolean subsetOf(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:901: warning: no @param for a
[WARNING] public Automaton union(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:901: warning: no @return
[WARNING] public Automaton union(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:908: warning: no @param for l
[WARNING] static public Automaton union(Collection<Automaton> l) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:908: warning: no @return
[WARNING] static public Automaton union(Collection<Automaton> l) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:922: warning: no @param for pairs
[WARNING] public void addEpsilons(Collection<StatePair> pairs) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:929: warning: no @return
[WARNING] public boolean isEmptyString() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:936: warning: no @return
[WARNING] public boolean isEmpty() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:943: warning: no @return
[WARNING] public boolean isTotal() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:950: warning: no @param for accepted
[WARNING] public String getShortestExample(boolean accepted) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:950: warning: no @return
[WARNING] public String getShortestExample(boolean accepted) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:957: warning: no @param for s
[WARNING] public boolean run(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:957: warning: no @return
[WARNING] public boolean run(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:972: warning: no @param for a
[WARNING] public static Automaton minimize(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:972: warning: no @return
[WARNING] public static Automaton minimize(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:980: warning: no @param for a
[WARNING] public Automaton overlap(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:980: warning: no @return
[WARNING] public Automaton overlap(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:987: warning: no @return
[WARNING] public Automaton singleChars() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:994: warning: no @param for set
[WARNING] public Automaton trim(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:994: warning: no @param for c
[WARNING] public Automaton trim(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:994: warning: no @return
[WARNING] public Automaton trim(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1001: warning: no @param for set
[WARNING] public Automaton compress(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1001: warning: no @param for c
[WARNING] public Automaton compress(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1001: warning: no @return
[WARNING] public Automaton compress(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1008: warning: no @param for map
[WARNING] public Automaton subst(Map<Character,Set<Character>> map) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1008: warning: no @return
[WARNING] public Automaton subst(Map<Character,Set<Character>> map) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1015: warning: no @param for c
[WARNING] public Automaton subst(char c, String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1015: warning: no @param for s
[WARNING] public Automaton subst(char c, String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1015: warning: no @return
[WARNING] public Automaton subst(char c, String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1022: warning: no @param for source
[WARNING] public Automaton homomorph(char[] source, char[] dest) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1022: warning: no @param for dest
[WARNING] public Automaton homomorph(char[] source, char[] dest) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1022: warning: no @return
[WARNING] public Automaton homomorph(char[] source, char[] dest) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1029: warning: no @param for chars
[WARNING] public Automaton projectChars(Set<Character> chars) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1029: warning: no @return
[WARNING] public Automaton projectChars(Set<Character> chars) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1036: warning: no @return
[WARNING] public boolean isFinite() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1043: warning: no @param for length
[WARNING] public Set<String> getStrings(int length) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1043: warning: no @return
[WARNING] public Set<String> getStrings(int length) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1050: warning: no @return
[WARNING] public Set<String> getFiniteStrings() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1057: warning: no @param for limit
[WARNING] public Set<String> getFiniteStrings(int limit) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1057: warning: no @return
[WARNING] public Set<String> getFiniteStrings(int limit) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1064: warning: no @return
[WARNING] public String getCommonPrefix() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1078: warning: no @param for a
[WARNING] public static Automaton hexCases(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1078: warning: no @return
[WARNING] public static Automaton hexCases(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1085: warning: no @param for a
[WARNING] public static Automaton replaceWhitespace(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1085: warning: no @return
[WARNING] public static Automaton replaceWhitespace(Automaton a) {
[WARNING] ^
[INFO] Building jar: /Users/lmcgibbn/Downloads/dk.brics.automaton/target/automaton-1.12-2-javadoc.jar

special chars \d \s \w in regex

It would be very helpful if the brics regex subsystem natively supported the special character classes \d, \s and \w. I know that we can substitute the corresponding patterns (such as [0-9]), but it would be better if brics could build the automaton directly. Note that, among the operators brics does not support, these are the most frequently used, according to:
Carl Chapman and Kathryn T. Stolee. 2016. Exploring regular expression usage and context in Python. In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA 2016). ACM, New York, NY, USA, 282-293.
We use brics for generating strings in mutation testing (http://cs.unibg.it/mutrex/) and it works great!
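
Until such support exists, the substitution mentioned above can be automated with a small preprocessing step. The following is a minimal sketch and not part of dk.brics.automaton: the class name ShorthandExpander and its method are hypothetical, the expansions assume ASCII-only semantics, and shorthands that appear inside a bracketed character class are not handled.

```java
class ShorthandExpander {

    // Rewrites \d, \s and \w into explicit character classes.
    // Any other escape sequence (for example \. or \\) is copied unchanged.
    static String expand(String regex) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char next = regex.charAt(++i);
                switch (next) {
                    case 'd':
                        out.append("[0-9]");
                        break;
                    case 's':
                        // Literal whitespace characters, ASCII only (assumption).
                        out.append("[ \t\n\r\f]");
                        break;
                    case 'w':
                        out.append("[a-zA-Z0-9_]");
                        break;
                    default:
                        out.append('\\').append(next);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("\\d+-\\w+"));
    }
}
```

The expanded string would then be handed to new RegExp(...) in place of the original pattern.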

JDK11 Support

Hi @amoeller, I have worked on upgrading dk.brics.automaton to run on JDK 11.
JDK 1.6 was deprecated some time ago...
Pull request coming up...

string generation loops for patterns like "(.){0,2}"

Hi,

we love working with Brics, but we came across this issue:

Given the pattern (.){0,2}, Brics correctly produces the automaton a:

initial state: 2
state 0 [accept]:
  \u0000-\uffff -> 1
state 1 [accept]:
state 2 [accept]:
  \u0000-\uffff -> 0

However, calling SpecialOperations.getFiniteStrings(a) enters an infinite loop.

Thanks,
Stefanie
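
One possible reading of this report, not confirmed in the thread, is that the call is not looping forever but enumerating an extremely large finite set: the automaton above accepts every string of length 0, 1 or 2 over the 65536 char values. The sketch below only does that arithmetic in plain Java; the class name FiniteLanguageSize is hypothetical and the code does not call the library.

```java
class FiniteLanguageSize {

    // Number of strings of length 0..maxLength over an alphabet of the given size.
    static long countStrings(long alphabetSize, int maxLength) {
        long total = 0;
        long perLength = 1; // exactly one string of length 0
        for (int len = 0; len <= maxLength; len++) {
            total = Math.addExact(total, perLength);
            if (len < maxLength) {
                perLength = Math.multiplyExact(perLength, alphabetSize);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // The range \u0000-\uffff covers 65536 characters.
        System.out.println(countStrings(65536, 2)); // prints 4295032833
    }
}
```

If that reading is right, a bounded call such as the getFiniteStrings(int limit) overload on Automaton may be the more practical choice for patterns like this one.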

Memory consumption problem

I tried to match a URL against a set of site URL patterns. The pattern for each site looks like this: "..google.com/.". I concatenate all the site patterns into one pattern like this: "(..google.com/.)|(..facebook.com/.)|...". When I test with 100 sites, building the automaton requires a huge amount of memory: I set the Java heap size to 8G and it still fails with an OutOfMemory error. I didn't try larger heaps, since memory consumption this high would not meet my requirements anyway. Is building an automaton for a pattern like this expected to use so much memory?

confusion about isEmpty()

I am confused about Automaton.isEmpty(). I would think that the language defined by the regexp for the empty string "()" is the empty set.

I intentionally do not want to use "#" (for the empty language) in the regular expression, because I would like to match against this symbol without having to escape it.

The second assertion in my test below fails:

  public void testEmptyString() throws REException {
    int flags = RegExp.ALL & ~RegExp.EMPTY;
    Automaton automaton = (new RegExp("()", flags)).toAutomaton();

    RunAutomaton ra = new RunAutomaton(automaton);

    assertTrue(ra.run(""));
    assertTrue(automaton.isEmpty()); // fails
  }

The compiled automaton is

initial state: 0
state 0 [accept]:
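
The distinction at play here appears to be between the empty language (no strings at all) and the language that contains only the empty string. The regexp "()" matches "", so its language has one member and is not the empty set, which would explain why the second assertion fails. Below is a toy illustration in which plain Java sets stand in for languages; the class name EmptyLanguageDemo is hypothetical and the code does not use the library. Automaton may also offer an isEmptyString() check for the second case, but treat that as an assumption to verify against the Javadoc.

```java
import java.util.Collections;
import java.util.Set;

class EmptyLanguageDemo {

    // The empty language: accepts nothing, not even "".
    static final Set<String> EMPTY_LANGUAGE = Collections.emptySet();

    // The language of the regexp "()": accepts exactly the empty string.
    static final Set<String> EMPTY_STRING_LANGUAGE = Collections.singleton("");

    // A string is accepted when it is a member of the language.
    static boolean accepts(Set<String> language, String s) {
        return language.contains(s);
    }

    public static void main(String[] args) {
        System.out.println(accepts(EMPTY_LANGUAGE, ""));         // false
        System.out.println(accepts(EMPTY_STRING_LANGUAGE, ""));  // true
        System.out.println(EMPTY_STRING_LANGUAGE.isEmpty());     // false
    }
}
```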

How to match all not containing a specific string?

Hi

This library is used in Gerrit.
In Gerrit, I am trying to set up a mechanism that adds a reviewer only if the commit message does not start with "!!NOT_READY!!".

If it were a standard regex, I think I could use:
message:^(?!!!NOT_READY!!).*

How can I do this in Gerrit using dk.brics.automaton?

BR
Frank

Regex Intersection problem

String url = "https://a.b.com/d.html?p=ttt";
String regex1 = "(http|https)://a\\.b\\.com.*";
String regex2 = "^(http|https)://a\\.b\\.com/d\\.html.*";

Automaton automaton = new RegExp(regex1).toAutomaton();
Automaton match = new RegExp(regex2).toAutomaton();

System.out.println(Pattern.compile(regex1).matcher(url).matches()); //true
System.out.println(Pattern.compile(regex2).matcher(url).matches()); //true

System.out.println(automaton.intersection(match).isEmpty()); //true

Hi, I have two regexes, regex1 and regex2.
Java's Pattern matches the same URL with both of them.
But the Automaton intersection is empty.
It is non-empty only when both regexes lack '^' or both start with '^'.
How can I get a non-empty intersection regardless of whether '^' is present?
Thanks.
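
A likely cause, stated here as an assumption rather than a confirmed diagnosis: dk.brics.automaton always matches the whole input and, as far as I know, has no anchor syntax, so a leading '^' is read as a literal character that the URL would have to start with. Under that assumption, one workaround is to strip java.util.regex-style anchors before building the automaton. The sketch below is plain Java with a hypothetical class name, AnchorStripper, and does not call the library.

```java
class AnchorStripper {

    // Removes a leading '^' and an unescaped trailing '$' from a regex.
    // A trailing "\$" (escaped dollar sign) is kept, since it is a literal.
    static String stripAnchors(String regex) {
        int start = regex.startsWith("^") ? 1 : 0;
        int end = regex.length();
        if (end > start && regex.charAt(end - 1) == '$') {
            // Count the backslashes directly before the '$'.
            int backslashes = 0;
            for (int i = end - 2; i >= start && regex.charAt(i) == '\\'; i--) {
                backslashes++;
            }
            // An even count means the '$' itself is not escaped.
            if (backslashes % 2 == 0) {
                end--;
            }
        }
        return regex.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(stripAnchors("^(http|https)://a\\.b\\.com/d\\.html.*"));
    }
}
```

Passing both regexes through stripAnchors before new RegExp(...) would make the two automata comparable regardless of whether either pattern was written with '^'.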
