fare's Introduction

Fare - [F]inite [A]utomata and [R]egular [E]xpressions

Project Fare is an effort to bring a DFA/NFA (finite-state automata) implementation from Java to .NET. There are quite a few implementations available in other languages today. This project aims to fill the gap in .NET.

Fare is a .NET port of the well-established Java library dk.brics.automaton, with an API as close as possible to the corresponding dk.brics.automaton classes.

Is my Regular Expression supported?

Probably yes.

Keep in mind though that Project Fare turns Regular Expressions into Automatons by applying the algorithms of dk.brics.automaton and xeger.

If your Regular Expression isn't supported, it would make sense to debug the C# code but also compare with the results from xeger.

As an alternative, you may use a different pattern or even use a different engine to reverse the Regular Expression into an Automaton. As an example, you can use the Rex engine.

Design changes

Based on version 1.11-8 of dk.brics.automaton released on September 7, 2011. ChangeLog: http://www.brics.dk/automaton/ChangeLog

NuGet package

Fare is available via NuGet.

Versioning

Fare reached version 1 without following a particular versioning scheme. From version 1 and above, Fare follows Semantic Versioning 2.0.0.

Which projects use Fare?

Fare is used in:

fare's People

Contributors

fread75, gukoff, jimic, jnyrup, leolplex, magnusmikkelsen, moodmosaic, vassiliki, zvirja

fare's Issues

Empty intersection not detected

I am trying to write a method that detects whether there is NO intersection between two regular expressions (as in this Stack Overflow answer for Java: https://stackoverflow.com/a/17957180/2987400 ), but it never reports that the intersection is empty.

Here is my code:

public bool IsNotOrthogonal(string productName1, string productName2)
    {
        _log.DebugFormat("Checking for overlap between [{0}] and [{1}]", productName1, productName2);

        var r1 = new RegExp(productName1);
        var a1 = r1.ToAutomaton();

        var r2 = new RegExp(productName2);
        var a2 = r2.ToAutomaton();

        var i2 = a1.Intersection(a2);

        if (i2.IsEmpty)
        {
            _log.Debug("no overlap");
            return false;
        }
        else
        {
            _log.Debug("intersection is not empty");
            return true;
        }
    }

and a unit test that fails when I expect it to pass:

public void SimpleNoOverlap()
    {
        var logger = new Mock<ILog>();
        var sut = new RegexOverlapChecker(x => logger.Object);
        var res = sut.IsNotOrthogonal(@"^A", @"^B");
        Assert.IsFalse(res);
    }

Any help on this would be greatly appreciated

Some expression causes infinite delays and memory + CPU usage

Here is an expression I tried:

^(?=.*[A-Z].*[A-Z])(?=.*[!@#$&*])(?=.*[0-9].*[0-9])(?=.*[a-z].*[a-z].*[a-z]).{8}$

It has been running for two hours now, and memory usage is over 1.2 GB and still going up. I'm trying to generate a very secure, password-like string.

2 Files not included

Hi,

I downloaded the source and the files DataTypes.cs and StringUnionOperations.cs are missing in the Src. What can I do? Thanks!

Alex

It generates the same string being called in a sequence

for _ in 1..10 do
        (Xeger @"^\[\<([A-Z][a-zA-Z0-9]*)*\>\]$").Generate() |> printfn "%s"

run 1

[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]

run 2

[<V5N42Eu>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]
[<>]

run 3

[<Fl>]
[<>]
[<>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
[<IFELv0We6427oTb4116>]
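
One possible cause, stated here as an assumption rather than a confirmed diagnosis: if every `Xeger` created without an explicit `Random` builds its own time-seeded `System.Random`, then on .NET Framework instances created within the same clock tick share a seed and produce the same sequence. The sketch below uses only `System.Random` (no Fare calls) to show the effect of identical seeds and how a single shared instance avoids it. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class SeedDemo
{
    // Creates a fresh Random with the same seed for every draw,
    // mimicking generators that are constructed in quick succession.
    public static int[] FreshInstancePerCall(int seed, int count) =>
        Enumerable.Range(0, count).Select(_ => new Random(seed).Next(1000)).ToArray();

    // Draws every value from one shared Random instance.
    public static int[] SharedInstance(int seed, int count)
    {
        var rnd = new Random(seed);
        return Enumerable.Range(0, count).Select(_ => rnd.Next(1000)).ToArray();
    }
}
```

If the generator accepts a `Random` in its constructor (the `new Fare.Xeger(pattern, rnd)` call quoted in another issue on this page suggests it does), passing one shared instance would be the analogous change.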

a{0,50} generate short string

If I write a regular expression like a{0,100}, I expect Xeger.Generate() to produce one of the possible matching sequences ("a", "aa", "aaa", "aaaa", ..., "aaaaaaaaaaaaa[....]aaaaaaaaaaaaaa").

But Xeger.Generate() almost never generates a sequence longer than 15 characters. On Stack Overflow ("Issue generating multiple occurrence with Fare/Xeger") I was told:

It looks like Xeger is randomly selecting possible transitions at each step and then appending the string matching that transition to the result. For your regex, once the matching string has 1 a, there are two possible allowed transitions: "Add another a" or "End of string". [...]

So, if I understood correctly, this means that the probability of getting a long string is extremely low.

Is there a simple way to make a{0,100} really generate sequences between 0 and 100 characters long (I mean, with similar frequency)?
Thank you, g.
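
To put a number on it, assuming the walk really does choose between "add another a" and "end of string" with equal probability as the quoted answer describes: reaching 15 characters requires 15 consecutive "continue" choices, which has probability 2^-15, roughly 0.003%. A minimal workaround sketch is to pick the length uniformly first and only then build the string or an exact-count pattern. The helper names below are hypothetical and nothing here calls into Fare.

```csharp
using System;

public static class UniformLength
{
    // Picks the repetition count uniformly in [min, max] and returns an
    // exact-count pattern such as "a{37}" that could be handed to a generator.
    public static string ExactCountPattern(string atom, int min, int max, Random rnd)
    {
        int n = rnd.Next(min, max + 1); // upper bound of Next is exclusive, hence + 1
        return atom + "{" + n + "}";
    }

    // Builds the repeated string directly, with a uniformly chosen length.
    public static string Repeat(char c, int min, int max, Random rnd) =>
        new string(c, rnd.Next(min, max + 1));
}
```

With this approach every length from 0 to 100 is equally likely, at the cost of handling the quantifier outside the regular expression.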

Cannot recognize "(?!0000)\d{4}"

var randString = string.Empty;
Xeger xegerGenerator = new Xeger(@"(?!0000)\d{4}");
randString = xegerGenerator.Generate();

randString = "!00004377";

randString should be any four digits except "0000" (four zeros)
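
The output above suggests the negative lookahead is not interpreted as one. A common workaround, sketched here under that assumption, is rejection sampling: generate from a simpler pattern and keep only candidates that pass a `System.Text.RegularExpressions.Regex` check. `RejectionSampler` and its parameters are hypothetical names; the `generate` delegate stands in for any generator, for example a lambda wrapping `Xeger.Generate()` on the pattern `\d{4}`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RejectionSampler
{
    // Calls `generate` until a candidate matches `validator`,
    // giving up after `maxAttempts` tries.
    public static string Generate(Func<string> generate, Regex validator, int maxAttempts = 1000)
    {
        for (int i = 0; i < maxAttempts; i++)
        {
            string candidate = generate();
            if (validator.IsMatch(candidate)) return candidate;
        }
        throw new InvalidOperationException("No matching candidate within the attempt limit.");
    }

    // Usage example: four digits, anything except "0000".
    public static string Example(Random rnd)
    {
        var validator = new Regex(@"^(?!0000)\d{4}$");
        return Generate(() => rnd.Next(0, 10000).ToString("D4"), validator);
    }
}
```

The attempt limit keeps the loop from spinning forever when the validator rejects nearly everything the generator produces.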

Implement IComparable in State class

As discussed in this thread, it looks like the F# comparison constraint looks for (non-generic) IComparable, which the existing IComparable<T> doesn't inherit from.

The State class implements IComparable<T>, so it could implement the non-generic IComparable as well.
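
A minimal sketch of the pattern being requested, using a hypothetical `Node` class rather than the library's actual `State` type, and assuming ordering by an integer id: the non-generic `CompareTo(object)` simply delegates to the generic overload.

```csharp
using System;

// Hypothetical stand-in for a state type; only the comparison plumbing matters here.
public sealed class Node : IComparable<Node>, IComparable
{
    public int Id { get; }

    public Node(int id) { Id = id; }

    // Generic comparison: any instance sorts after null.
    public int CompareTo(Node other) => other is null ? 1 : Id.CompareTo(other.Id);

    // Non-generic entry point, delegating to the generic one.
    public int CompareTo(object obj)
    {
        if (obj is null) return 1;
        if (obj is Node other) return CompareTo(other);
        throw new ArgumentException("Object must be of type Node.", nameof(obj));
    }
}
```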

Issues with Automaton.Intersection

Hello,

First of all, thanks for your port; this is exactly what I was looking for. Unfortunately, I have run into an issue with the only method I need.

I have an issue with the following two basic tests of the Intersection method:

  • I defined the two RegExps "ab." and "a."; when I try to compute their intersection, I get a System.OutOfMemoryException.
  • I defined the two RegExps "ab" and "ab" (they are identical); I compute their intersection and then call "Reduce" on the resulting automaton, which makes the automaton empty (it can no longer recognize "ab" as a valid string).

Positive lookahead?

I'm trying to get a random password with some restrictions, this is my regex

^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@$._*#&%]).{8,16}$

This regex says:

  • have one or more digits
  • have lowercase letters
  • have uppercase letters
  • have one or more of those symbols
  • length between 8 and 16 chars

When I try to run this with Xeger, the code hangs. Is it possible to do this?
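
If lookaheads turn out to be the obstacle (an assumption; the thread does not confirm it), a password with these constraints can be built by construction instead of from a regular expression: take one character from each required class, fill the rest from the union, then shuffle. This is a sketch with hypothetical names. It uses `System.Random`, which is not cryptographically secure, so real passwords should draw from `System.Security.Cryptography.RandomNumberGenerator` instead.

```csharp
using System;
using System.Linq;

public static class PasswordSketch
{
    const string Digits = "0123456789";
    const string Lower = "abcdefghijklmnopqrstuvwxyz";
    const string Upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const string Symbols = "@$._*#&%";

    // minLength must be at least 4, one slot per required class.
    public static string Generate(Random rnd, int minLength = 8, int maxLength = 16)
    {
        string all = Digits + Lower + Upper + Symbols;
        int length = rnd.Next(minLength, maxLength + 1);

        // One character from each required class, then fill from the union.
        char[] chars = new[] { Digits, Lower, Upper, Symbols }
            .Select(set => set[rnd.Next(set.Length)])
            .Concat(Enumerable.Range(0, length - 4).Select(_ => all[rnd.Next(all.Length)]))
            .ToArray();

        // Fisher-Yates shuffle so the required characters are not always first.
        for (int i = chars.Length - 1; i > 0; i--)
        {
            int j = rnd.Next(i + 1);
            (chars[i], chars[j]) = (chars[j], chars[i]);
        }
        return new string(chars);
    }
}
```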

Infinite loop processing expression

The sample code below causes FARE to enter an infinite loop; the problem is in BasicOperations.Determinize. I don't follow the code, but it looks like the bug may be that it's not stopping at the accept check (the break drops out of the foreach loop, not the outer while loop).

 while (...)
 {
     ...
     foreach (State q in s)
     {
         if (q.Accept)
         {
             r.Accept = true;
             break; // I think this should break out of the while loop
         }
     }
 }

Sample code

  string pattern = "\\S+.*";
  Fare.Xeger sut = new Fare.Xeger(pattern, rnd);

Strong Name Signing

Would it be possible to "Strong Name Sign" the assemblies in this NuGet package?

Referencing/Using this package in a strongly-named assembly results in the following error at runtime:
Could not load file or assembly 'Fare, Version=1.0.3.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. A strongly-named assembly is required. (Exception from HRESULT: 0x80131044)

Xeger throws if regex contains "(#)"

We are using Fare / Xeger to generate Data from a specific Regex, but if the Regex contains "(#)"
(like in this one: "\d{8}(#)\d{3}" ) it throws an InvalidOperationException "State" while generating.

To create anything, creates Chinese characters

I'm trying to get a random string. Anything is acceptable. Thus I came with this regular expression:

.*

That literally means any character, any number of times.

However, I ran this line of code about a thousand times, and not even once did I get a plain English character; it was all gibberish and Chinese.

new Fare.Xeger(".*").Generate(); // ran for a thousand times, and I expected outputs like "A坏_#@LA", but I got examples like "☐缨坏庶袗"

Why is it behaving that way? Does it have an internal preference for Unicode characters? Does it ignore ASCII characters during string generation unless explicitly forced via the regular expression pattern?
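
One plausible explanation, offered as an assumption about the implementation rather than a confirmed fact: if `.` is treated as the whole UTF-16 code unit range and each character is drawn uniformly from it, then only 128 of 65,536 values (about 0.2%) are ASCII, while the CJK Unified Ideographs block U+4E00 to U+9FFF alone covers 20,992 values (about 32%). The sketch below simulates such a uniform draw with `System.Random`; it does not call Fare, and the names are hypothetical.

```csharp
using System;

public static class CharRangeDemo
{
    // Fraction of uniformly drawn UTF-16 code units that fall in [lo, hi].
    public static double Fraction(char lo, char hi, int samples, Random rnd)
    {
        int hits = 0;
        for (int i = 0; i < samples; i++)
        {
            // Upper bound of Next is exclusive, so + 1 includes char.MaxValue.
            char c = (char)rnd.Next(char.MinValue, char.MaxValue + 1);
            if (c >= lo && c <= hi) hits++;
        }
        return (double)hits / samples;
    }
}
```

Under that assumption, restricting the alphabet in the pattern itself, for instance with a printable-ASCII class such as `[ -~]*` if the parser accepts that range, would be the usual way to get English-looking output.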

System.InvalidOperationException: state

Xeger throws System.InvalidOperationException: state when trying to generate a string for this regular expression for emails:
^(?=.{6,50}$)([\w-.]+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([\w-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(]?)$

StackOverflowException in Xeger

The following regular expressions create a StackOverflowException:

^[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?
^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)?((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.[a-zA-Z]{2,4})(\:[0-9]+)?(/[^/][a-zA-Z0-9\.\,\?\'\\/\+&%\$#\=~_\-@]*)*$
(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?$

The Xeger class keeps traversing the Automaton's States, resulting in a StackOverflowException. This happens in the IList<Transition> GetSortedTransitions() method of the State class.

The issue appeared after this commit which addresses this issue.

Invalid expression gets generated

What I want is:
"The password must have a minimum/maximum of 8 characters, including 1 special character, 1 digit and one capital letter."
I have the following expression: ^((?=.*\d)(?=.*[A-Z])(?=.*\W).{8,8})$
But it does not generate a valid string as per my requirement.
Please check what the problem is.

https://stackoverflow.com/questions/57006786/not-able-to-generate-string-from-complex-regular-expression-with-fare

Xeger sometimes fails to generate a string matching this regex: @"^[^\s]+$"

The RegEx pattern @"^[^\s]+$" expects a string without whitespace, yet Xeger sometimes generates results that contradict this requirement.
Code to reproduce it:

public void XegerIssueProof()
{
    string pattern = @"^[^\s]+$";
    for (int i = 0; i < 50000; i++)
    {
        string result = new Xeger(pattern).Generate();
        if (!Regex.IsMatch(result, pattern))
        {
            throw new Exception($"Failed at attempt:{i}, generated string is: '{result}'");
        }
    }
}

Some results:
Failed at attempt:9150, generated string is: ' ~v'
Failed at attempt:1, generated string is: ' {uyz{_z'
Failed at attempt:3397, generated string is: 'a t|o'

Push tags corresponding to the releases

I'm looking to work on AutoFixture/AutoFixture#938. I would like to rework the build/versioning system to make it similar to the AutoFixture model (where it's CI based). It's well described here.

As a first step (and mostly for consistency), I need the repo to contain the tags. Could you please add the tags corresponding to the existing releases? Unfortunately, I cannot push them via PR 😕

locks up generating a value

string patternPropertyName = (new Fare.Xeger("(.*)docker_builder")).Generate();

Would maybe be worth adding some kind of timeout or limit to the generate function so issues like this don't kill an application.
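
Until something like that exists in the library, a caller-side guard is possible. The sketch below is hypothetical and not part of Fare: it runs any generator delegate on a worker thread and waits a bounded time for the result. Its main limitation is that a timed-out worker is not stopped and keeps consuming CPU and memory in the background, so this protects the caller's control flow but not the process's resources.

```csharp
using System;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Runs `generate` on a worker thread and waits at most `timeout` for it.
    // On timeout the worker is NOT cancelled; it keeps running in the background.
    public static bool TryGenerate(Func<string> generate, TimeSpan timeout, out string result)
    {
        Task<string> task = Task.Run(generate);
        if (task.Wait(timeout))
        {
            result = task.Result;
            return true;
        }
        result = null;
        return false;
    }
}
```

A call would look like `TimeoutSketch.TryGenerate(() => new Fare.Xeger(pattern).Generate(), TimeSpan.FromSeconds(5), out var value)`, using only the Xeger calls already shown in this issue.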

Issue with expression

The following regex comes from the chrome-manifest.json

When used with the following code, it throws an exception (argument all_urls not found).
I'm no regex expert, but other online parsers validate it, so I assume it's valid.

Fare.Xeger sut = new Fare.Xeger(regexString);
string result = sut.Generate();

^((\*|http|https|file|ftp|chrome-extension):\/\/(\*|\*\.[^\/\*]+|[^\/\*]+)?(\/.*))|<all_urls>$

Fare is a great piece of software and works on most test cases, thanks for all the effort.

Add a build automation script

Add the appropriate build automations in order to:

  • Auto-increment assembly version numbers
  • Build
    • Verify / Code Analysis
    • Test Run
  • Release
  • Create NuGet Packages

Repetitive generation returns same result at high speed

Hi,

I'm generating like 5000 instances of data using Fare in parallel. From some point onward the results get repetitive.

The code is:

static Xeger persianWordPattern = new Xeger(@"[آ ا ب پ ت ج چ ح خ د ذ ر ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن و ه ی]{2,8}");

And in a parallel loop, after a while the results would become:

آآ
آآ
آآ
آآ
آآ
آآ
آآ
آآ

Any idea why this is so?
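
A likely cause, stated as an assumption since the thread does not confirm it: the static `Xeger` shares a single `System.Random` across all threads. `System.Random` is not thread-safe, and unsynchronized concurrent use can corrupt its state so that it keeps returning 0, which would match always getting the first character at the minimum length. The sketch below shows the usual remedy, one `Random` per thread, with a hypothetical word generator in place of Fare.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelGeneration
{
    static int seedCounter = Environment.TickCount;

    // One Random per thread, each with a distinct seed.
    static readonly ThreadLocal<Random> Rnd =
        new ThreadLocal<Random>(() => new Random(Interlocked.Increment(ref seedCounter)));

    // Hypothetical generator: a word of 2 to 8 characters drawn from `alphabet`.
    static string Word(string alphabet, Random rnd)
    {
        var chars = new char[rnd.Next(2, 9)];
        for (int i = 0; i < chars.Length; i++)
            chars[i] = alphabet[rnd.Next(alphabet.Length)];
        return new string(chars);
    }

    public static string[] Generate(string alphabet, int count)
    {
        var bag = new ConcurrentBag<string>();
        Parallel.For(0, count, _ => bag.Add(Word(alphabet, Rnd.Value)));
        return bag.ToArray();
    }
}
```

The same idea would apply to the generator itself: keeping one generator instance per thread (for example in a `ThreadLocal`) instead of a single static one avoids sharing its internal random state.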

Thank you for this great library.

Add AppVeyor CI

We definitely need CI for this project, otherwise it's very dangerous to merge the PRs. Everything is ready for CI (configuration file is already present), you need to do a couple of things only:

  • Go to AppVeyor and create a build for this repo (a few clicks).
  • If you wish to enable automatic release after you create a tag (via GitHub UI), please update the appveyor.yml file and specify the NuGet API Key (encrypted) to the NUGET_API_KEY variable.

Huffman Minimization Mark List Initialization Issue

The initialization of the List called "mark" in MinimizationOperations.MinimizeHuffman is incomplete, and consequently an Index Out of Bounds error is produced for "mark[n1][n2] = true;" (This is of course only when Automaton.Minimization is set to Automaton.MinimizeHuffman).
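
For illustration only, and assuming `mark` is a list of lists of `bool` indexed by state number (the issue does not show the declaration): a complete initialization creates all n rows and fills every row with n entries, so that `mark[n1][n2]` is valid for every pair of indices. The helper name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MarkTable
{
    // Builds an n x n table of false values so that
    // mark[i][j] is a valid index for all 0 <= i, j < n.
    public static List<List<bool>> Create(int n) =>
        Enumerable.Range(0, n)
                  .Select(_ => Enumerable.Repeat(false, n).ToList())
                  .ToList();
}
```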

build.cmd won't run on Windows with only .NET Core installed

Downloading NuGet.exe ...
Feeds used:
  C:\Users\Nikos\.nuget\packages\
  https://api.nuget.org/v3/index.json



Attempting to gather dependency information for package 'FAKE.Core.4.63.2' with respect to project 'build\tools', targeting 'Any,Version=v0.0'
Gathering dependency information took 1.65 sec
Attempting to resolve dependencies for package 'FAKE.Core.4.63.2' with DependencyBehavior 'Lowest'
Resolving dependency information took 0 ms
Resolving actions to install package 'FAKE.Core.4.63.2'
Resolved actions to install package 'FAKE.Core.4.63.2'
Retrieving package 'FAKE.Core 4.63.2' from 'C:\Users\Nikos\.nuget\packages\'.
Adding package 'FAKE.Core.4.63.2' to folder 'C:\Snapshot\Fare\build\tools'
Added package 'FAKE.Core.4.63.2' to folder 'C:\Snapshot\Fare\build\tools'
Successfully installed 'FAKE.Core 4.63.2' to build\tools
Executing nuget actions took 436.42 ms
Running FAKE Build...
C:\Program Files\Git\mingw64\bin\git.exe describe --tags --long --abbrev=40 --first-parent --match=v*
v2.1.1-1-g71eaf12a6553a6ddd84a0fc52e78f19ada766d0a
Building project with version: LocalBuild
Shortened DependencyGraph for Target CompleteBuild:
<== CompleteBuild
   <== NuGetPack
      <== CleanNuGetPackages
      <== Test
         <== Build
            <== Verify
               <== RestoreNuGetPackages
               <== VerifyOnly
            <== BuildOnly
         <== TestOnly

The running order is:
  - CleanNuGetPackages
  - RestoreNuGetPackages
  - VerifyOnly
  - Verify
  - BuildOnly
  - Build
  - TestOnly
  - Test
  - NuGetPack
  - CompleteBuild
Running build with 1 worker
Starting Target: CleanNuGetPackages
Creating C:\Snapshot\Fare\build\NuGetPackages
Finished Target: CleanNuGetPackages
Starting Target: RestoreNuGetPackages
Building project: Src/Fare.sln
  c:\Windows\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe  Src/Fare.sln /t:Restore /m  /nodeReuse:False  /v:m  /p:RestorePackages="False" /p:AssemblyVersion="2.1.0.0" /p:FileVersion="2.1.1.0" /p:InformationalVersion="2.1.1.1-71eaf12a6553a6ddd84a0fc52e78f19ada766d0a" /p:PackageVersion="2.1.1.1" /logger:Fake.MsBuildLogger+ErrorLogger,"C:\Snapshot\Fare\build\tools\FAKE.Core\tools\FakeLib.dll"
Microsoft (R) Build Engine version 4.7.3056.0
[Microsoft .NET Framework, version 4.0.30319.42000]
Copyright (C) Microsoft Corporation. All rights reserved.

C:\Snapshot\Fare\Src\Fare.sln.metaproj : error MSB4057: The target "Restore" does not exist in the project. [C:\Snapshot\Fare\Src\Fare.sln]
Running build failed.
Error:
Building Src/Fare.sln failed with exitcode 1.

---------------------------------------------------------------------
Build Time Report
---------------------------------------------------------------------
Target                 Duration
------                 --------
CleanNuGetPackages     00:00:00.0025071
RestoreNuGetPackages   Failure
Total:                 00:00:01.0394329
---------------------------------------------------------------------
Status:                Failure
---------------------------------------------------------------------
---------------------------------------------------------------------
  1) Building Src/Fare.sln failed with exitcode 1.
  2) MSB4057: C:\Snapshot\Fare\Src\Fare.sln.metaproj(0,0): The target "Restore" does not exist in the project.
---------------------------------------------------------------------

#30, #31, /cc @zvirja

Class of char wrongly parsed

I have this code using Xeger (package 2.2.1), which I used to investigate an AutoFixture call that often crashes (this RegEx is normally in a RegularExpressionAttribute annotation).
new Fare.Xeger(@"^\s*[0-9\s]{0,6}\s*$").Generate();
It generates this C# string:
" s\t \t "

Obviously, the character "s" is not valid for this RegEx.

To reproduce, I called the previous code multiple times in the Execution window in Visual Studio 2022. The overall results were:

new Fare.Xeger(@"^\s*[0-9\s]{0,6}\s*$").Generate();
""
new Fare.Xeger(@"^\s*[0-9\s]{0,6}\s*$").Generate();
""
new Fare.Xeger(@"^\s*[0-9\s]{0,6}\s*$").Generate();
"  s\t \t  "
new Fare.Xeger(@"^\s*[0-9\s]{0,6}\s*$").Generate();
"4\t\t"

NB
Note how the [0-9\s] part is almost never used. This seems bad in a randomized context.

Pattern-problem

If I use "[A-Z][0-9A-Z]{10}" as the pattern, I get a string of 11 chars in length, but chars 2 to 11 are always digits. This is true for other patterns too:

"[A-Z][A-Za-z0-9]{10}"
"[A-Z][a-zA-Z0-9]{10}"

Other samples:
"[A-Za-z0-9]{11}" => 11 numbers
"[A-Za-z]{11}" => 11 upper-case letters

Seems to be a little bit confusing to me.
