cs-au-dk / dk.brics.automaton
dk.brics.automaton - finite-state automata and regular expressions for Java
Home Page: http://www.brics.dk/automaton/
License: Other
Hi, is there any chance that I can extract the corresponding sequence of states after running a string through the automaton? Thank you.
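One way to picture what is being asked for is to drive the automaton one character at a time and record every state visited. The sketch below is self-contained and does not use the library: the Stepper interface is a hypothetical stand-in that mirrors the getInitialState() and step(state, char) calls visible in the find() code quoted in a later issue on this page, and AB_STAR is a toy automaton invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class StateTrace {
    /** Hypothetical stand-in for an automaton exposing getInitialState() and step(). */
    interface Stepper {
        int getInitialState();
        int step(int state, char c); // returns -1 when there is no transition
    }

    /** Returns the initial state followed by the state reached after each character. */
    static List<Integer> trace(Stepper automaton, String input) {
        List<Integer> states = new ArrayList<>();
        int p = automaton.getInitialState();
        states.add(p);
        // Stop as soon as the automaton has no transition for the current character.
        for (int i = 0; i < input.length() && p != -1; i++) {
            p = automaton.step(p, input.charAt(i));
            states.add(p);
        }
        return states;
    }

    /** Toy automaton for the language ab*: 0 -a-> 1 and 1 -b-> 1. */
    static final Stepper AB_STAR = new Stepper() {
        public int getInitialState() { return 0; }
        public int step(int state, char c) {
            if (state == 0 && c == 'a') return 1;
            if (state == 1 && c == 'b') return 1;
            return -1;
        }
    };
}
```

A trailing -1 in the returned list marks the point where the input left the automaton.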
String str = "num1=123&num2=456";
String regex = "num1=.&num2=.";
Pattern p = Pattern.compile(regex);
System.out.println(p.matcher(str).matches());
Automaton automaton = new RegExp(regex).toAutomaton();
System.out.println(automaton.run(str));
// result: true, false
If I add '\\' before '&', the result is "true, true".
Is '&' a special character?
I found that '#' has the same problem.
Intersecting two identical regexes that contain the special character also returns an empty result.
Are there other special characters?
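As far as I understand the dk.brics.automaton syntax, characters such as & (intersection), ~ (complement), # (empty language), @ (any string), " (string literal) and < > (intervals and named automata) are operators there, while java.util.regex treats them as ordinary characters. The helper below is a hypothetical pre-processing sketch, not part of the library: it backslash-escapes those characters so that a pattern written for java.util.regex keeps them literal. The operator list is an assumption and should be checked against the RegExp documentation. If I recall the API correctly, the RegExp constructor also accepts syntax flags (for example RegExp.NONE) that switch the optional operators off.

```java
final class BricsEscape {
    // Assumption: operator characters that java.util.regex reads as literals.
    private static final String OPERATORS = "&~#@\"<>";

    /** Backslash-escapes the assumed operator characters; existing escapes are kept. */
    static String escapeOptionalOperators(String regex) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                // Copy an already escaped character unchanged.
                out.append(c).append(regex.charAt(++i));
            } else if (OPERATORS.indexOf(c) >= 0) {
                out.append('\\').append(c);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The escaped pattern is still valid for java.util.regex, because a backslash before a non-alphabetic character simply quotes that character.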
My expectation with Brics is that, for the following scenario (an AutomatonMatcher obtained via a RunAutomaton, with a pattern beginning with .*), the runtime of matching would be proportional to the number of characters in the input string, i.e. each char would be examined once as part of the matching.
E.g.
public boolean find(final String input) {
// automaton is of type RunAutomaton built using tableize=true
final AutomatonMatcher matcher = automaton.newMatcher(input);
return matcher.find();
}
However, this does not appear to be the case at all. The implementation does some form of "backtracking" in the AutomatonMatcher#find() method. Is this a known problem, or is it simply expected given the ability to match and return the position of the match?
public boolean find() {
...
int l = getChars().length();
while (begin < l) {
int p = automaton.getInitialState();
// inner loop
// will "backtrack per char" if the matching eventually fails
for (int i = begin; i < l; i++) {
final int new_state = automaton.step(p, getChars().charAt(i));
if (new_state == -1) {
break;
} else if (automaton.isAccept(new_state)) {
// found a match from begin to (i+1)
match_start = begin;
match_end=(i+1);
}
p = new_state;
}
if (match_start != -1) {
setMatch(match_start, match_end);
return true;
}
begin += 1;
}
if (match_start != -1) {
setMatch(match_start, match_end);
return true;
} else {
setMatch(-2, -2);
return false;
}
}
You can see that if a match attempt starts at position i, we continue until the match fails and then restart from position i+1.
Using my own version of this class, with extra counting and logging, I can see the following. We initially noticed this because of "poor" performance when testing some inputs and patterns.
In essence, we simply count how often automaton.step is called.
For a test with the pattern:
.*(rhino[£$%])+
we see 5000050000 calls to automaton.step, which is far more than the roughly 100k I expected.
I was able to approach the performance I would expect by hacking a version of the RunAutomaton.run method that returns early if it reaches an accept state, and by giving every regex a wildcard match at the start. Obviously that doesn't handle position checking.
Please don't get me wrong, though: this library is a great tool and we appreciate all the work that went into it.
You might argue that the example is a little dumb, as it has .* at the start and we are doing a match, but I believe the same problem would also occur with other regexes that use alternation with repetition elsewhere than at the start of the regex, depending on the form of the input string.
I'm sure I am missing something and will welcome any assistance.
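The reported figure is consistent with the restart-per-position loop: 5000050000 equals 100000 × 100001 / 2, which is the cost of scanning to the end of a 100,000-character input from every start position when no match is ever found. The sketch below reproduces that counting on a toy scale. It is self-contained and makes several assumptions: the hand-written DFA recognises .*ab (not the pattern from the report), it has no dead state (as with any pattern starting with .*), and the loop only mirrors the shape of the quoted find() code.

```java
final class FindStepCount {
    /** DFA for .*ab: 0 = start, 1 = just saw 'a', 2 = just saw "ab" (accepting). */
    static int step(int state, char c) {
        if (c == 'a') return 1;
        if (c == 'b' && state == 1) return 2;
        return 0;
    }

    /** Counts step() calls made by a find()-style loop that restarts at every position. */
    static long countSteps(String input) {
        long steps = 0;
        int n = input.length();
        for (int begin = 0; begin < n; begin++) {
            int p = 0;
            boolean matched = false;
            // No dead state exists, so every attempt runs to the end of the input.
            for (int i = begin; i < n; i++) {
                p = step(p, input.charAt(i));
                steps++;
                if (p == 2) matched = true;
            }
            if (matched) return steps;
        }
        return steps;
    }

    /** Closed form for the no-match case: n + (n - 1) + ... + 1. */
    static long worstCase(long n) {
        return n * (n + 1) / 2;
    }
}
```

On an input of 1,000 copies of 'a', which never matches, the loop performs 500,500 steps, whereas a single left-to-right pass would need 1,000.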
The ignore-case flag is one JDK/Perl regex feature that is sorely missing from dk.brics.automaton. Since the question mark is a reserved character, supporting it should hopefully not break existing regular expressions. If it would, it could be made optional, hidden behind a RegExp constructor flag.
If that flag style is acceptable, I would be willing to take a shot at implementing it.
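Until such a flag exists, one possible workaround for literal tokens is to expand every letter into a two-character class. The helper below is a hypothetical sketch, not a proposal for the library's implementation: it only handles plain literals (no operators), works char by char, and relies on the assumption that a backslash followed by a non-alphanumeric character denotes that character literally.

```java
final class CaseFold {
    /** Turns a literal such as "ab1." into the pattern "[aA][bB]1\.". */
    static String caseInsensitiveLiteral(String literal) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            char lower = Character.toLowerCase(c);
            char upper = Character.toUpperCase(c);
            if (lower != upper) {
                // Cased letter: accept both variants.
                out.append('[').append(lower).append(upper).append(']');
            } else if (Character.isLetterOrDigit(c)) {
                out.append(c);
            } else {
                // Anything else is escaped so that it cannot act as an operator.
                out.append('\\').append(c);
            }
        }
        return out.toString();
    }
}
```

The generated pattern uses only character classes and escapes, so it can be checked with java.util.regex as well.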
Hi @amoeller, team !
The last commit on the master was more than 2 years ago and it seems the project isn't actively maintained.
Is this project still alive?
I have a probably rather niche use case: I'd like to create RunAutomatons, serialize them with store, and tableize them only once they have been deserialized with load. The reasoning is that for the automata in question (https://github.com/dan2097/opsin/tree/master/opsin-core/src/main/resources/uk/ac/cam/ch/wwmm/opsin/resources/serialisedAutomata) tableizing significantly increases their size, and given how fast setAlphabet() is, there could even be a speed penalty for deserializing this array as opposed to creating it programmatically (Java's built-in serialization code isn't the fastest).
As setAlphabet() isn't public, I can currently achieve this only via reflection or by having a class in the dk.brics.automaton package.
I can't really think of an idiomatic solution to this. A simple solution would be to make setAlphabet() public, but there isn't normally a reason to call it after the RunAutomaton is constructed. A load(InputStream stream, boolean tableize) overload could get confusing in how it interacts with serialized tableized and non-tableized automata, e.g. if you set it to false, would it set classmap to null on the deserialized RunAutomaton?
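To make the size argument concrete: my understanding is that tableizing amounts to precomputing, for every possible char, the index of the character interval it falls into. The sketch below illustrates that idea only. It is not the library's setAlphabet() code, and the names buildClassmap and points are assumptions chosen for this example.

```java
final class ClassmapSketch {
    /**
     * points must be sorted and start with '\u0000'; each entry is the first
     * char of an interval. The result maps every char to its interval index.
     */
    static int[] buildClassmap(char[] points) {
        int[] classmap = new int[Character.MAX_VALUE + 1];
        int interval = 0;
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            // Move to the next interval when its starting char is reached.
            if (interval + 1 < points.length && c == points[interval + 1]) {
                interval++;
            }
            classmap[c] = interval;
        }
        return classmap;
    }
}
```

Such a table always has 65,536 int entries (256 KiB before any serialization overhead) no matter how few intervals there are, yet it is rebuilt by one linear pass over the char range. That is the trade-off described above.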
After converting some regular expressions to automata and attempting to visualize them, I found that some regexes are not rendered correctly. Some examples are shown below.
Output when visualizing $\{upper:([a-z0-9]|:|\. ||/):
In the ([a-z0-9]|:|\. ||/) part, the output does not contain 0-9. Also, :. / is output where . -: is expected.
Output when visualizing $\{([a-z0-9]|:)+:\-([a-z0-9])+:
In the [a-z0-9] part, the part marked in red is the expected output, while the part marked in blue seems to be redundant.
Some of these regular expressions are visualized incorrectly, but matching gives the expected results for both regexes. Are these behaviors due to a bug, or are they part of the library's specification?
When calling toString on a RegExp that is a string, I would expect a double quote to be escaped (if it is not already). This would make it possible to re-parse the resulting string and get the same RegExp. Here is a failing test:
@Test
public void problemRegExpToString() throws IOException {
String reg = "a\"";
System.out.println(reg); // a"
RegExp regexR = new RegExp(reg);
System.out.println(regexR.toString()); // "a""
new RegExp(regexR.toString()); // Exception
}
The second call to the constructor throws an exception (because the string is now "a"").
Provide support for look ahead/behind assertions.
Is there any interest in supporting JPMS modules, introduced in Java 9? I am currently modularizing some projects and get a warning when using BRICS, since the compiler is only able to infer a filename-based module name. I see two possibilities to address this issue.
The least invasive approach would be to include an Automatic-Module-Name entry in the MANIFEST.MF file of the built JAR file. This would allow you to claim a module name (something like dk.brics.automaton) and get rid of the warnings without having to change too much of the build process. For example, in the case of Maven, it would only require an additional config property on the jar plugin.
However, since BRICS barely has any dependencies, it would also be easy to add a full module-info.java descriptor, which would allow people to conveniently use BRICS in more advanced setups that include jlink or jpackage. The drawback of this approach is that you would have to build BRICS on a JDK newer than 8. To avoid breaking compatibility with the current version, I would suggest two compilation phases: one that compiles everything for Java 9+ (including the module-info) and one that re-compiles everything (except the module-info) for Java 8. Since module-info isn't a valid Java identifier, you can't reference this file in regular Java 8 applications. So unless they are doing some reflective JAR scanning, people shouldn't run into any problems with UnsupportedClassVersionErrors.
If you are interested, I could provide a PR for initial support of this feature. However, since I am not well versed in Ant, I would probably need some help adjusting the build.xml as well.
This library is causing dependency check failures in our build pipeline:
org.xml.sax.SAXParseException: Content is not allowed in prolog.
An error occurred while analyzing '....gradle\caches\modules-2\files-2.1\dk.brics.automaton\automaton\1.11-8\6ebfa65eb431ff4b715a23be7a750cbc4cc96d0f\automaton-1.11-8.jar' (Central Analyzer).
Could you add an XML declaration to your POM file?
https://search.maven.org/#artifactdetails%7Cdk.brics.automaton%7Cautomaton%7C1.11-8%7Cjar
It would be great to have tests for different algorithms. It allows regression testing and it helps with familiarization.
The link to the tarball on the Download page (http://www.brics.dk/~amoeller/automaton/automaton-1.12-1.tar.gz) appears to be broken.
Hi, thank you for providing a great regular expression library!
I have noticed that brics handles an input regex string as a sequence of java.lang.Character values, and this can cause somewhat unintuitive behavior.
For example, 𠀋<𠮟<𡵅 as Unicode scalar values (0x2000b, 0x20b9f and 0x21d45 respectively; all of them are expressed with surrogate pairs), but the automaton created from [𠀋-𡵅] doesn't accept 𠮟.
private static void checkOneCodePoint(final String s) {
if (s.codePointCount(0, s.length()) != 1) throw new IllegalArgumentException();
}
public static boolean testBrics(final String a, final String b, final String c) {
checkOneCodePoint(a); checkOneCodePoint(b); checkOneCodePoint(c);
final RegExp regex = new RegExp("[" + a + "-" + c + "]");
return regex.toAutomaton().run(b);
}
public static boolean testJava(final String a, final String b, final String c) {
checkOneCodePoint(a); checkOneCodePoint(b); checkOneCodePoint(c);
final Pattern pattern = Pattern.compile("[" + a + "-" + c + "]");
return pattern.matcher(b).matches();
}
public static void main(final String[] args) throws IOException {
final String a = new String(new int[]{0x2000b}, 0, 1); // 𠀋
final String b = new String(new int[]{0x20b9f}, 0, 1); // 𠮟
final String c = new String(new int[]{0x21d45}, 0, 1); // 𡵅
System.out.println(testBrics(a, b, c)); // false
System.out.println(testJava(a, b, c)); // true
}
Fixing this would require us to do two things. The first is to read the regex as a code point stream (not java.lang.Character-by-java.lang.Character). This also includes fixes for operator precedence, like 𠀋+. The second concerns the java.lang.Characters themselves: if they involve surrogate pairs, do something similar to what we do for the numerical interval <n-m>.
Although won't-fix would totally make sense, it'd be great if this fact could be found in the documentation.
Thanks,
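The effect can be seen without the library by looking at the UTF-16 code units involved. The sketch below is self-contained. The assumption is that a parser working one char at a time reads [𠀋-𡵅] as the char D840, then a range from DC0B to D847, then the char DD45. Since DC0B is greater than D847, that range contains nothing, which would explain why 𠮟 is rejected.

```java
final class SurrogateUnits {
    /** Formats the UTF-16 code units of one code point, e.g. "D840 DC0B". */
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    /**
     * A char-wise reading of "[first-last]" sees the range
     * lowSurrogate(first)..highSurrogate(last); this reports whether it is empty.
     */
    static boolean charLevelRangeIsEmpty(int first, int last) {
        char min = Character.lowSurrogate(first);  // char just before '-'
        char max = Character.highSurrogate(last);  // char just after '-'
        return min > max;
    }

    public static void main(String[] args) {
        System.out.println(utf16Units(0x2000B)); // D840 DC0B
        System.out.println(utf16Units(0x20B9F)); // D842 DF9F
        System.out.println(utf16Units(0x21D45)); // D847 DD45
        System.out.println(charLevelRangeIsEmpty(0x2000B, 0x21D45)); // true
    }
}
```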
The git repository is missing the tag for the 1.12 release (should be put on d646e15)
With a regular expression, the pattern "(?!201[0-8])\d{4}" matches "2022" but not "2012". With automaton, however, the complement pattern "~(201[0-8])\d{4}" matches both "2022" and "2012", and I don't know why. How can I write a pattern that does the same as the regular expression "(?!201[0-8])\d{4}"?
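A likely explanation is that the complement is concatenated with the digits rather than applied to them: ~(201[0-8]) accepts every string other than 2010 to 2018, including the empty string, so any four digits can follow it. For whole-string matching, the lookahead is equivalent to "four digits, and not one of 2010 to 2018", which is a set difference. The sketch below checks that equivalence with java.util.regex only. In brics syntax the same idea would presumably be written with intersection and complement, for example [0-9]{4}&~(201[0-8]), but that expression is an assumption and has not been run here.

```java
import java.util.regex.Pattern;

final class LookaheadAsDifference {
    static final Pattern LOOKAHEAD = Pattern.compile("(?!201[0-8])\\d{4}");
    static final Pattern FOUR_DIGITS = Pattern.compile("\\d{4}");
    static final Pattern EXCLUDED = Pattern.compile("201[0-8]");

    /** Whole-string match using the negative lookahead. */
    static boolean viaLookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    /** Whole-string match expressed as "four digits minus the excluded years". */
    static boolean viaDifference(String s) {
        return FOUR_DIGITS.matcher(s).matches() && !EXCLUDED.matcher(s).matches();
    }
}
```

Both methods accept "2022" and reject "2012".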
Hi!
I ran into a word-boundary issue. I have a long list of tokens, some of which I need to match with a word boundary. The basic Pattern class works, but not Automaton.
Is there any way this can be fixed?
Automaton p0_AA = new RegExp(".*(something|\\b(blah|foo|goo)\\b)").toAutomaton();
RunAutomaton p0_RA = new RunAutomaton(p0_AA);
System.out.println(p0_RA.run("ba foo nery"));
--> false
The basic regex works with the above.
Pattern p0 = Pattern.compile(".*(something|\\b(blah|foo|goo)\\b)");
String s = "ba foo nery";
Matcher m = p0.matcher(s);
if (m.find()) {
System.out.println("pattern found");
} else {
System.out.println("not found");
}
--> pattern found
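Two things differ between the snippets above: run() tests the whole input, whereas find() searches for a substring, and \b is a zero-width assertion that a plain automaton syntax may not offer. A common way to express the same intent using only union, concatenation, optional parts and character classes is to require a non-word character (or the string boundary) on each side of the token. The sketch below demonstrates this with java.util.regex so that it is self-contained. Whether the same expression behaves identically in dk.brics.automaton is an assumption and has not been tested here.

```java
import java.util.regex.Pattern;

final class BoundaryFreePattern {
    // "Not a word character", spelled out so that no \b, \w or \W is needed.
    static final String NON_WORD = "[^A-Za-z0-9_]";

    // Optional prefix ending in a non-word char, the token, optional suffix starting with one.
    static final Pattern TOKEN = Pattern.compile(
            "(.*" + NON_WORD + ")?(blah|foo|goo)(" + NON_WORD + ".*)?");

    /** Whole-string match, comparable to what run() does on an automaton. */
    static boolean containsToken(String s) {
        return TOKEN.matcher(s).matches();
    }
}
```

With this pattern "ba foo nery" is accepted, while "ba foonery" is rejected because foo is followed by a word character.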
The following assertion fails (in Scala):
assert(
BasicOperations.union(BasicAutomata.makeString(""), BasicAutomata.makeEmpty())
.getFiniteStrings().asScala == Set("")
)
It appears there is a bug in the getFiniteStrings function. When:
nothing is added to strings.
The following JDK WARNINGs should be addressed:
100 warnings
[WARNING] Javadoc Warnings
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:505: warning: no @return
[WARNING] public int getNumberOfStates() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:515: warning: no @return
[WARNING] public int getNumberOfTransitions() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:595: warning: no @return
[WARNING] public String toDot() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:682: warning: no @return
[WARNING] public static Automaton load(URL url) throws IOException, ClassCastException, ClassNotFoundException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:693: warning: no @return
[WARNING] public static Automaton load(InputStream stream) throws IOException, ClassCastException, ClassNotFoundException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:712: warning: no @return
[WARNING] public static Automaton makeEmpty() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:719: warning: no @return
[WARNING] public static Automaton makeEmptyString() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:726: warning: no @return
[WARNING] public static Automaton makeAnyString() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:733: warning: no @return
[WARNING] public static Automaton makeAnyChar() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:740: warning: no @param for c
[WARNING] public static Automaton makeChar(char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:740: warning: no @return
[WARNING] public static Automaton makeChar(char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:747: warning: no @param for min
[WARNING] public static Automaton makeCharRange(char min, char max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:747: warning: no @param for max
[WARNING] public static Automaton makeCharRange(char min, char max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:747: warning: no @return
[WARNING] public static Automaton makeCharRange(char min, char max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:754: warning: no @param for set
[WARNING] public static Automaton makeCharSet(String set) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:754: warning: no @return
[WARNING] public static Automaton makeCharSet(String set) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:761: warning: no @param for min
[WARNING] public static Automaton makeInterval(int min, int max, int digits) throws IllegalArgumentException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:761: warning: no @param for max
[WARNING] public static Automaton makeInterval(int min, int max, int digits) throws IllegalArgumentException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:761: warning: no @param for digits
[WARNING] public static Automaton makeInterval(int min, int max, int digits) throws IllegalArgumentException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:761: warning: no @return
[WARNING] public static Automaton makeInterval(int min, int max, int digits) throws IllegalArgumentException {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:768: warning: no @param for s
[WARNING] public static Automaton makeString(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:768: warning: no @return
[WARNING] public static Automaton makeString(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:775: warning: no @param for strings
[WARNING] public static Automaton makeStringUnion(CharSequence... strings) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:775: warning: no @return
[WARNING] public static Automaton makeStringUnion(CharSequence... strings) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:782: warning: no @param for n
[WARNING] public static Automaton makeMaxInteger(String n) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:782: warning: no @return
[WARNING] public static Automaton makeMaxInteger(String n) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:789: warning: no @param for n
[WARNING] public static Automaton makeMinInteger(String n) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:789: warning: no @return
[WARNING] public static Automaton makeMinInteger(String n) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:796: warning: no @param for i
[WARNING] public static Automaton makeTotalDigits(int i) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:796: warning: no @return
[WARNING] public static Automaton makeTotalDigits(int i) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:803: warning: no @param for i
[WARNING] public static Automaton makeFractionDigits(int i) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:803: warning: no @return
[WARNING] public static Automaton makeFractionDigits(int i) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:810: warning: no @param for value
[WARNING] public static Automaton makeIntegerValue(String value) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:810: warning: no @return
[WARNING] public static Automaton makeIntegerValue(String value) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:817: warning: no @param for value
[WARNING] public static Automaton makeDecimalValue(String value) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:817: warning: no @return
[WARNING] public static Automaton makeDecimalValue(String value) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:824: warning: no @param for s
[WARNING] public static Automaton makeStringMatcher(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:824: warning: no @return
[WARNING] public static Automaton makeStringMatcher(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:831: warning: no @param for a
[WARNING] public Automaton concatenate(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:831: warning: no @return
[WARNING] public Automaton concatenate(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:838: warning: no @param for l
[WARNING] static public Automaton concatenate(List<Automaton> l) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:838: warning: no @return
[WARNING] static public Automaton concatenate(List<Automaton> l) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:845: warning: no @return
[WARNING] public Automaton optional() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:852: warning: no @return
[WARNING] public Automaton repeat() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:859: warning: no @param for min
[WARNING] public Automaton repeat(int min) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:859: warning: no @return
[WARNING] public Automaton repeat(int min) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:866: warning: no @param for min
[WARNING] public Automaton repeat(int min, int max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:866: warning: no @param for max
[WARNING] public Automaton repeat(int min, int max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:866: warning: no @return
[WARNING] public Automaton repeat(int min, int max) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:873: warning: no @return
[WARNING] public Automaton complement() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:880: warning: no @param for a
[WARNING] public Automaton minus(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:880: warning: no @return
[WARNING] public Automaton minus(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:887: warning: no @param for a
[WARNING] public Automaton intersection(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:887: warning: no @return
[WARNING] public Automaton intersection(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:894: warning: no @param for a
[WARNING] public boolean subsetOf(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:894: warning: no @return
[WARNING] public boolean subsetOf(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:901: warning: no @param for a
[WARNING] public Automaton union(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:901: warning: no @return
[WARNING] public Automaton union(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:908: warning: no @param for l
[WARNING] static public Automaton union(Collection<Automaton> l) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:908: warning: no @return
[WARNING] static public Automaton union(Collection<Automaton> l) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:922: warning: no @param for pairs
[WARNING] public void addEpsilons(Collection<StatePair> pairs) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:929: warning: no @return
[WARNING] public boolean isEmptyString() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:936: warning: no @return
[WARNING] public boolean isEmpty() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:943: warning: no @return
[WARNING] public boolean isTotal() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:950: warning: no @param for accepted
[WARNING] public String getShortestExample(boolean accepted) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:950: warning: no @return
[WARNING] public String getShortestExample(boolean accepted) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:957: warning: no @param for s
[WARNING] public boolean run(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:957: warning: no @return
[WARNING] public boolean run(String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:972: warning: no @param for a
[WARNING] public static Automaton minimize(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:972: warning: no @return
[WARNING] public static Automaton minimize(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:980: warning: no @param for a
[WARNING] public Automaton overlap(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:980: warning: no @return
[WARNING] public Automaton overlap(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:987: warning: no @return
[WARNING] public Automaton singleChars() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:994: warning: no @param for set
[WARNING] public Automaton trim(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:994: warning: no @param for c
[WARNING] public Automaton trim(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:994: warning: no @return
[WARNING] public Automaton trim(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1001: warning: no @param for set
[WARNING] public Automaton compress(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1001: warning: no @param for c
[WARNING] public Automaton compress(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1001: warning: no @return
[WARNING] public Automaton compress(String set, char c) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1008: warning: no @param for map
[WARNING] public Automaton subst(Map<Character,Set<Character>> map) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1008: warning: no @return
[WARNING] public Automaton subst(Map<Character,Set<Character>> map) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1015: warning: no @param for c
[WARNING] public Automaton subst(char c, String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1015: warning: no @param for s
[WARNING] public Automaton subst(char c, String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1015: warning: no @return
[WARNING] public Automaton subst(char c, String s) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1022: warning: no @param for source
[WARNING] public Automaton homomorph(char[] source, char[] dest) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1022: warning: no @param for dest
[WARNING] public Automaton homomorph(char[] source, char[] dest) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1022: warning: no @return
[WARNING] public Automaton homomorph(char[] source, char[] dest) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1029: warning: no @param for chars
[WARNING] public Automaton projectChars(Set<Character> chars) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1029: warning: no @return
[WARNING] public Automaton projectChars(Set<Character> chars) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1036: warning: no @return
[WARNING] public boolean isFinite() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1043: warning: no @param for length
[WARNING] public Set<String> getStrings(int length) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1043: warning: no @return
[WARNING] public Set<String> getStrings(int length) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1050: warning: no @return
[WARNING] public Set<String> getFiniteStrings() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1057: warning: no @param for limit
[WARNING] public Set<String> getFiniteStrings(int limit) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1057: warning: no @return
[WARNING] public Set<String> getFiniteStrings(int limit) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1064: warning: no @return
[WARNING] public String getCommonPrefix() {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1078: warning: no @param for a
[WARNING] public static Automaton hexCases(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1078: warning: no @return
[WARNING] public static Automaton hexCases(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1085: warning: no @param for a
[WARNING] public static Automaton replaceWhitespace(Automaton a) {
[WARNING] ^
[WARNING] /Users/lmcgibbn/Downloads/dk.brics.automaton/src/dk/brics/automaton/Automaton.java:1085: warning: no @return
[WARNING] public static Automaton replaceWhitespace(Automaton a) {
[WARNING] ^
[INFO] Building jar: /Users/lmcgibbn/Downloads/dk.brics.automaton/target/automaton-1.12-2-javadoc.jar
Automaton automaton1 = new RegExp("^GSA").toAutomaton();
Automaton automaton2 = new RegExp("GSA").toAutomaton();
System.out.println(automaton2.intersection(automaton1).isEmpty());
The two should intersect, but the code tells me that these two patterns have nothing in common.
It would be very good if brics regex subsystem could support natively special chars \d \s \w. I know that we can substitute those with the corresponding pattern (like [0-9] and so on), but if brics could build the automaton directly, it would be better. Note that these are the most used operators brics is not supporting according to
Carl Chapman and Kathryn T. Stolee. 2016. Exploring regular expression usage and context in Python. In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA 2016). ACM, New York, NY, USA, 282-293.
We use brics for generating strings using mutation testing: http://cs.unibg.it/mutrex/ and it works great !
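Until such support exists in the version being used, the substitution mentioned above can be automated. The following is a minimal sketch, not part of brics: `ShorthandExpander` and its methods are hypothetical names, the replacement sets are ASCII approximations of what java.util.regex uses, and the negated forms (\D, \S, \W) are deliberately left untouched. It emits real whitespace characters rather than `\t`-style escapes, on the assumption that a backslash in brics syntax only quotes the next character literally.

```java
final class ShorthandExpander {
    private ShorthandExpander() {}

    /** Rewrites \d, \w and \s into explicit character sets; other escapes are kept as-is. */
    static String expand(String regex) {
        StringBuilder out = new StringBuilder();
        boolean inClass = false; // true while scanning between '[' and ']'
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char next = regex.charAt(++i);
                String body = classBody(next);
                if (body == null) {
                    out.append('\\').append(next); // unrelated escape, e.g. \. or \\
                } else if (inClass) {
                    out.append(body);              // already inside [...], add members only
                } else {
                    out.append('[').append(body).append(']');
                }
            } else {
                if (c == '[') {
                    inClass = true;
                } else if (c == ']') {
                    inClass = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Members of the set for a shorthand letter, or null if the letter is not handled. */
    private static String classBody(char shorthand) {
        switch (shorthand) {
            case 'd': return "0-9";
            case 'w': return "a-zA-Z0-9_";
            case 's': return " \t\n\r\f";
            default:  return null;
        }
    }
}
```

The result of `ShorthandExpander.expand(...)` would then be passed to `new RegExp(...)`. Because the rewrite produces an ordinary bracket expression, a following quantifier such as `{2,3}` applies to it unchanged.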
Hi @amoeller I have worked on upgrading dk.brics.automaton to run on JDK11.
JDK 1.6 was deprecated some time ago...
Pull request coming up...
Hi,
we love working with Brics, but we came across this issue:
Given the pattern (.){0,2}, Brics correctly produces the automaton a:
initial state: 2
state 0 [accept]:
\u0000-\uffff -> 1
state 1 [accept]:
state 2 [accept]:
\u0000-\uffff -> 0
However, calling SpecialOperations.getFiniteStrings(a)
enters an infinite loop.
Thanks,
Stefanie
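One possible explanation, offered as an assumption rather than a diagnosis: the automaton printed above is finite, but it accepts every string of length 0, 1 or 2 over the full \u0000-\uffff range, so enumerating the language means building billions of strings. The sketch below only does the arithmetic; `FiniteLanguageSize` is a hypothetical helper and does not call brics.

```java
final class FiniteLanguageSize {
    private FiniteLanguageSize() {}

    /** Number of distinct strings of length 0..maxLength over an alphabet of the given size. */
    static long countStrings(long alphabetSize, int maxLength) {
        long total = 0;
        long ofThisLength = 1; // exactly one string of length 0
        for (int len = 0; len <= maxLength; len++) {
            total = Math.addExact(total, ofThisLength);
            if (len < maxLength) {
                // Each extra position multiplies the count by the alphabet size.
                ofThisLength = Math.multiplyExact(ofThisLength, alphabetSize);
            }
        }
        return total;
    }
}
```

`FiniteLanguageSize.countStrings(65536, 2)` gives 1 + 65536 + 65536², which is 4295032833 strings. If that is what is happening here, the `getFiniteStrings(int limit)` overload on `Automaton` may be the safer call, since it takes an upper bound on the number of strings.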
Hi, any chance we can implement https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5091862/ ? It claims to be faster than all the existing methods available. Thanks!
I tried to match a URL against a set of site URL patterns. The pattern for each site looks like this: "..google.com/.". I concatenate all the site patterns into one pattern like this: "(..google.com/.)|(..facebook.com/.)|...". When I test with 100 sites, building the automaton requires a huge amount of memory. I set the Java heap size to 8G, but it still fails with an OutOfMemoryError. I did not try a larger heap, since memory consumption this high would not meet my requirements anyway. Is building an automaton for a pattern like this expected to use so much memory?
Predefined character classes followed by a quantifier, such as \d{2} or \d{2,3}, do not work as they do in a Java regex. Instead they have to be written as \d\d or as [0-9]{2,3}.
I am confused about Automaton.isEmpty(). I would think that the language defined by the regexp for the empty string "()" is the empty set.
I intentionally do not want to use "#" (for empty language) in the regular expression, because I would like to match against this symbol without having to escape.
The second assertion in my test below fails:
public void testEmptyString() throws REException {
int flags = RegExp.ALL & ~RegExp.EMPTY;
Automaton automaton = (new RegExp("()", flags)).toAutomaton();
RunAutomaton ra = new RunAutomaton(automaton);
assertTrue(ra.run(""));
assertTrue(automaton.isEmpty()); // fails
}
The compiled automaton is
initial state: 0
state 0 [accept]:
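The printed automaton has a single accepting initial state, so it accepts exactly one string, "". That makes its language {""}, which has one element and is therefore different from the empty language {} that accepts nothing at all. The sketch below models the two cases with plain Java sets; `LanguageKinds` is a hypothetical illustration and does not use brics. For the real check, brics appears to offer `Automaton.isEmptyString()`, but treat that name as an assumption to confirm against the Javadoc.

```java
import java.util.Collections;
import java.util.Set;

final class LanguageKinds {
    private LanguageKinds() {}

    /** The empty language: no string at all is accepted. */
    static boolean isEmptyLanguage(Set<String> language) {
        return language.isEmpty();
    }

    /** The language containing only the empty string: "" is accepted, nothing else. */
    static boolean isEmptyStringLanguage(Set<String> language) {
        return language.size() == 1 && language.contains("");
    }

    /** Model of the language denoted by the regexp "()". */
    static Set<String> languageOfEmptyParens() {
        return Collections.singleton("");
    }
}
```

Under this model `isEmptyLanguage(languageOfEmptyParens())` is false and `isEmptyStringLanguage(languageOfEmptyParens())` is true, which matches the first assertion in the test passing and the second one failing.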
Hi,
this library is used in Gerrit.
In Gerrit I am trying to set up a mechanism that adds a reviewer only if the commit message does not start with "!!NOT_READY!!".
If it were a standard regex, I think I could use:
message:^(?!!!NOT_READY!!).*
How can I do this in Gerrit using dk.brics.automaton?
BR
Frank
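brics has no lookahead, but its syntax has an optional complement operator `~` (enabled by the `RegExp.COMPLEMENT` flag, which is part of `RegExp.ALL`). "Does not start with X" can then be written as the complement of "X followed by anything". The sketch below only builds such a pattern string; `PrefixComplement` is a hypothetical helper, and whether Gerrit passes the pattern to brics with the complement flag enabled is an assumption that would need checking.

```java
final class PrefixComplement {
    private PrefixComplement() {}

    /** Backslash-escapes every character that is not a letter or digit, so it is read literally. */
    static String escapeLiteral(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Pattern for "any string that does not start with prefix", using the ~ operator. */
    static String notStartingWith(String prefix) {
        return "~(" + escapeLiteral(prefix) + ".*)";
    }
}
```

For the prefix above, `PrefixComplement.notStartingWith("!!NOT_READY!!")` yields `~(\!\!NOT\_READY\!\!.*)`. Escaping every non-alphanumeric character is deliberately conservative, so the sketch does not depend on the exact list of brics metacharacters.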
String url = "https://a.b.com/d.html?p=ttt";
String regex1 = "(http|https)://a\\.b\\.com.*";
String regex2 = "^(http|https)://a\\.b\\.com/d\\.html.*";
Automaton automaton = new RegExp(regex1).toAutomaton();
Automaton match = new RegExp(regex2).toAutomaton();
System.out.println(Pattern.compile(regex1).matcher(url).matches()); //true
System.out.println(Pattern.compile(regex2).matcher(url).matches()); //true
System.out.println(automaton.intersection(match).isEmpty()); //true
Hi, I have two regexes, regex1 and regex2.
Java's Pattern matches the same URL against both of them.
But the Automaton intersection is empty.
The intersection is non-empty only when neither regex contains '^' or both start with '^'.
How can I get a non-empty intersection regardless of whether '^' is present?
Thx.
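A likely cause, stated as an assumption about brics rather than a confirmed diagnosis: brics RegExp syntax does not seem to treat '^' and '$' as anchors. An automaton always matches the whole input, so a leading '^' is read as a literal character, and a language whose strings all begin with a literal '^' cannot overlap one whose strings begin with "http". One workaround is to strip the anchors before building the automaton. `AnchorStripper` below is a hypothetical helper that handles only a single leading '^' and a single trailing unescaped '$'; anchors inside alternations are not covered.

```java
final class AnchorStripper {
    private AnchorStripper() {}

    /** Removes one leading '^' and one trailing unescaped '$' from a java.util.regex-style pattern. */
    static String stripAnchors(String regex) {
        int start = regex.startsWith("^") ? 1 : 0;
        int end = regex.length();
        if (end > start && regex.charAt(end - 1) == '$' && !isEscaped(regex, end - 1)) {
            end--;
        }
        return regex.substring(start, end);
    }

    /** True if the character at index is preceded by an odd number of backslashes. */
    private static boolean isEscaped(String s, int index) {
        int backslashes = 0;
        for (int i = index - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }
}
```

Both patterns would be passed through `AnchorStripper.stripAnchors(...)` before `new RegExp(...)`, so that regex2 becomes `(http|https)://a\.b\.com/d\.html.*` and describes the same kind of whole-string language as regex1.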