PHPVerbalExpressions

VerbalExpressions is a PHP library that helps you construct complex regular expressions.

Installation

The project is distributed through Composer, so install Composer before setting up the project.

$ composer require verbalexpressions/php-verbal-expressions:dev-master

Examples

<?php
// some tests
require './vendor/autoload.php';
use VerbalExpressions\PHPVerbalExpressions\VerbalExpressions;

$regex = new VerbalExpressions();

$regex->startOfLine()
      ->then("http")
      ->maybe("s")
      ->then("://")
      ->maybe("www.")
      ->anythingBut(" ")
      ->endOfLine();


if ($regex->test("http://github.com")) {
    echo "valid url". '<br>';
} else {
    echo "invalid url". '<br>';
}

if (preg_match($regex, 'http://github.com')) {
    echo 'valid url';
} else {
    echo 'invalid url';
}


echo "<pre>". $regex->getRegex() ."</pre>";

echo $regex->clean(array("modifiers" => "m", "replaceLimit" => 4))
           ->find(' ')
           ->replace("This is a small test http://somesite.com and some more text.", "-");

More examples are available in the following files:

Business readable language expression definition

$definition = 'start, then "http", maybe "s", then "://", maybe "www.", anything but " ", end';
$regex = new VerbalExpressionsScenario($definition);
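
To make the idea of a readable definition concrete, here is a minimal sketch of how such a string could be split into steps. It is not the parser used by VerbalExpressionsScenario; the function name parseDefinition and the [keyword, argument] output shape are assumptions made only for illustration.

```php
<?php
// Hypothetical sketch: split a scenario definition into [keyword, argument] steps.
// This is NOT the parser that VerbalExpressionsScenario uses.
function parseDefinition(string $definition): array
{
    $steps = [];
    // A step is a run of words, optionally followed by a double-quoted argument,
    // and terminated by a comma or the end of the string.
    $pattern = '/\s*([a-z ]+?)\s*(?:"([^"]*)")?\s*(?:,|$)/i';
    preg_match_all($pattern, $definition, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Steps without a quoted argument get null as their argument.
        $steps[] = [$m[1], $m[2] ?? null];
    }
    return $steps;
}

$steps = parseDefinition('start, then "http", maybe "s", anything but " ", end');
```

Each step could then be mapped to the matching builder method, for example "anything but" to anythingBut.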

Methods list

Name            Description                                             Usage
add             add values to the expression                            add('abc')
startOfLine     mark the expression with ^                              startOfLine(false)
endOfLine       mark the expression with $                              endOfLine()
then            add a string to the expression                          then('foo')
find            alias for then                                          find('foo')
maybe           define a string that might appear once or not at all    maybe('.com')
anything        accept any string                                       anything()
anythingBut     accept any string except the specified chars            anythingBut(',')
something       accept any non-empty string                             something()
somethingBut    accept any non-empty string except these chars          somethingBut('a')
replace         shorthand for preg_replace()                            replace($source, $val)
lineBreak       match \r \n                                             lineBreak()
br              shorthand for lineBreak                                 br()
tab             match tabs (\t)                                         tab()
word            match \w+                                               word()
anyOf           any of the listed chars                                 anyOf('abc')
any             shorthand for anyOf                                     any('abc')
range           add a range to the expression                           range('a', 'z', 0, 9)
withAnyCase     match case-insensitively (default is case sensitive)    withAnyCase()
stopAtFirst     toggle the g modifier                                   stopAtFirst()
addModifier     add a modifier                                          addModifier('g')
removeModifier  remove a modifier                                       removeModifier('g')
searchOneLine   toggle the m modifier                                   searchOneLine()
multiple        add the multiple modifier                               multiple('*')
_or             wrap the expression in an or with the provided value    _or('bar')
limit           add a char limit                                        limit(1,3)
test            perform a preg_match                                    test('[email protected]')

All of the above methods (except test) can also be used through VerbalExpressionsScenario.
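
To illustrate how chained methods like these can map onto regex fragments, here is a simplified, self-contained sketch. The class MiniExpression is hypothetical and is not the library's implementation; it covers only a handful of methods and assumes every value is a literal that should be escaped with preg_quote().

```php
<?php
// Hypothetical, simplified builder; NOT the library's actual implementation.
final class MiniExpression
{
    private string $prefix = '';
    private string $source = '';
    private string $suffix = '';

    public function startOfLine(): self { $this->prefix = '^'; return $this; }
    public function endOfLine(): self   { $this->suffix = '$'; return $this; }

    // Append a literal value as a non-capturing group.
    public function then(string $value): self
    {
        $this->source .= '(?:' . preg_quote($value, '/') . ')';
        return $this;
    }

    // Same as then(), but the group may be absent.
    public function maybe(string $value): self
    {
        $this->source .= '(?:' . preg_quote($value, '/') . ')?';
        return $this;
    }

    // Any run of characters that are not in $chars.
    public function anythingBut(string $chars): self
    {
        $this->source .= '(?:[^' . preg_quote($chars, '/') . ']*)';
        return $this;
    }

    public function getRegex(): string
    {
        return '/' . $this->prefix . $this->source . $this->suffix . '/m';
    }

    public function test(string $subject): bool
    {
        return preg_match($this->getRegex(), $subject) === 1;
    }
}

$url = (new MiniExpression())
    ->startOfLine()->then('http')->maybe('s')->then('://')
    ->maybe('www.')->anythingBut(' ')->endOfLine();
```

Each method only appends a fragment and returns the same object, which is what makes the fluent chaining possible.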

Other Implementations

You can see an up-to-date list of all ports on VerbalExpressions.github.io.

Building the project and running the tests

Install Composer first, then install the dependencies and run the tests:

curl -sS https://getcomposer.org/installer | php
php composer.phar install --dev
ln -s vendor/phpunit/phpunit/phpunit.php phpunit
./phpunit

Contributors

metal3d, mihai-vlc

implementation's Issues

Add DenoVerbalExpressions

Deno 1.0 was recently announced, so I ported JSVerbalExpressions over. Mainly, I had to make it strict-TypeScript compliant, convert the tests to the core Deno testing framework, and add the mod.ts. You can view it here: https://github.com/evosland/DenoRegularExpressions

I didn't call it TsVerbalExpressions because it's specific to Deno and depends on some of its internals. I didn't write anything new, however. It's really a fixup of the JavaScript version.

Or method enhancements

While still working on my implementation, I see that all the implementations (except mine, though I was wrong to implement this before discussing it with you) use this kind of method:

or(value)
=> ...|value

I'm afraid that this one is not very useful. It can be OK for a simple test, but what about finding:

startofline, then "foo", then "bar", endofline
OR
startofline, then "bar", then "baz", endofline

My implementation does:

StartOfLine().Then("foo").Then("bar").EndOfLine().Or().StartOfLine().Then("bar").Then("baz").EndOfLine()

That returns:

(?m)(?:^(?:foo)(?:bar)$)|(?:^(?:bar)(?:baz)$)

I only record an array of "parts" to concatenate at compile time... My unit tests work as expected.

That works with simple expressions, or multiple expressions...

Find("foo").Or().Find("bar")
(?m)(?:foo)|(?:bar)

StartOfLine().Then("foo").Then("bar").EndOfLine().
Or().StartOfLine().Then("bar").Then("baz").EndOfLine().
Or().StartOfLine().Find("AAA").EndOfLine().
(?m)(?:^(?:foo)(?:bar)$)|(?:^(?:bar)(?:baz)$)|(?:^(?:AAA)$)
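
The "array of parts" approach described above can be sketched in PHP as follows. The class AlternationBuilder and its method names (orElse, compile) are hypothetical and are not taken from any of the ports; the sketch only shows how complete alternatives can be collected and joined with | when the pattern is compiled.

```php
<?php
// Hypothetical sketch of the "parts" idea; not code from any existing port.
final class AlternationBuilder
{
    /** @var string[] Finished alternatives, in the order they were closed. */
    private array $parts = [];
    private string $current = '';

    public function startOfLine(): self { $this->current .= '^'; return $this; }
    public function endOfLine(): self   { $this->current .= '$'; return $this; }

    public function then(string $value): self
    {
        $this->current .= '(?:' . preg_quote($value, '/') . ')';
        return $this;
    }

    // Close the current alternative and start a new, empty one.
    public function orElse(): self
    {
        $this->parts[] = $this->current;
        $this->current = '';
        return $this;
    }

    // Wrap every alternative in its own group and join them with "|".
    public function compile(): string
    {
        $all = array_merge($this->parts, [$this->current]);
        $groups = array_map(fn (string $p): string => '(?:' . $p . ')', $all);
        return '(?m)' . implode('|', $groups);
    }
}

$expr = (new AlternationBuilder())
    ->startOfLine()->then('foo')->then('bar')->endOfLine()
    ->orElse()
    ->startOfLine()->then('bar')->then('baz')->endOfLine();

$pattern = '/' . $expr->compile() . '/';
```

Because each alternative is grouped on its own, the anchors stay attached to the branch they were written for.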

Not() method

While testing the implementation again, I've got some ideas...

We have "Find(value)" method that returns:

(?:value)

Right... but there is no method to return:

(?!value)

We've got AnythingBut and SomethingBut, which work with sets of characters, not with a word. Or maybe I misimplemented these methods.

I suggest "Not(value)". Are you OK with this idea?

PS: Go's regexp engine does not support this notation, so I will have to find another way there. But for the other implementations, can you give an opinion?
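
For engines that do support lookaheads, such as PCRE in PHP, the proposed behaviour could look like the sketch below. The helper name notFollowedBy is an assumption used for illustration; it is not an existing method of any port.

```php
<?php
// Hypothetical helper: emit a negative lookahead for a literal value.
function notFollowedBy(string $value): string
{
    return '(?!' . preg_quote($value, '/') . ')';
}

// A whole word that does not start with "foo".
$pattern = '/^' . notFollowedBy('foo') . '\w+$/';

$rejectsFoobar = preg_match($pattern, 'foobar') === 0; // lookahead fails at the start
$acceptsBarfoo = preg_match($pattern, 'barfoo') === 1; // "foo" is not at the start
```

The lookahead consumes no characters, so it only constrains what may follow at the position where it is placed.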

Question about maintainer ownership

Since these libraries are very clearly ports of the original library, I'm curious to what extent I have ownership of the elm-verbal-expressions repo specifically, so that I could, for example, add a code of conduct of my choosing.

Language-agnostic tests

Would I be alone in thinking that having a unified, language-agnostic test suite would be beneficial?

I've recently been thinking over the issue of unit tests for the different implementations (mostly just the PHP & JS ones as you might guess). A quick google suggested that language-agnostic test suites are few & far between. So I rolled my own :P

It struck me that one would need to define a series of tests for VerbalExpressions that can be read in as many languages as possible. Parsing XML isn't exactly straight-forward in PHP or JS, so I went with JSON.

So with the test suites being defined in JSON, it would seem like a good idea to ensure that the test suites were well-defined (so each implementation would know what to expect of the structure of the files). For this we have JSON Schema and a grunt task I cooked up last night (this issue would've been posted last night, but unicorns got in the way). This allows test definers to be confident that when a new version goes out there should only be issues with implementations when the test schema changes (i.e. if grunt jsonschema fails, don't push it).

The basic schema I've implemented in the proof-of-concept examples allows for:

  • multiple tests per file (no restrictions on file names beyond any silliness that would prevent an implementation from parsing it)
  • abstract call stacks & arguments (only a single call stack for all languages)
  • default expected output + language-specific outputs (if for whatever reason an implementation needs to generate different regex strings to be most performant whilst doing the same job)

Thoughts/comments/suggestions?
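
As a rough illustration of the three points above, a test case and a PHP runner might look like this. The JSON field names (description, callStack, output) and the StubBuilder class are assumptions invented for this sketch; they are not the schema from the proof of concept.

```php
<?php
// Hypothetical test definition: one abstract call stack, a default expected
// output, and room for language-specific overrides under "output".
$json = <<<'JSON'
{
  "description": "then followed by maybe",
  "callStack": [
    {"method": "then",  "arguments": ["foo"]},
    {"method": "maybe", "arguments": ["bar"]}
  ],
  "output": {"default": "(?:foo)(?:bar)?"}
}
JSON;

// Stand-in builder so the sketch is self-contained; a real runner would use
// the implementation under test instead.
final class StubBuilder
{
    public string $source = '';
    public function then(string $v): self  { $this->source .= "(?:$v)";  return $this; }
    public function maybe(string $v): self { $this->source .= "(?:$v)?"; return $this; }
}

function runCase(array $case, object $builder, string $language): bool
{
    // Replay the abstract call stack against the builder.
    foreach ($case['callStack'] as $call) {
        $builder->{$call['method']}(...$call['arguments']);
    }
    // Prefer a language-specific expectation, fall back to the default.
    $expected = $case['output'][$language] ?? $case['output']['default'];
    return $builder->source === $expected;
}

$passed = runCase(json_decode($json, true), new StubBuilder(), 'php');
```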

Matrix with implementation support?

Hi,

Is there any matrix/overview with the implementations of VerbalExpressions for various methods?

It looks like there are a lot of inconsistencies among the language implementations :( .

Thank you.

Testing getRegex/toString

Should the same series of inputs spit out exactly the same regex, or should the regex be platform-specific ?

I'm referring mostly to the differences between the PHP & JS implementations, with JS seeming to have /g by default, and multiple('') giving "(?:)*" instead of "+".

The second part of this question is: should this repo contain a machine-consumable (JSON/XML/other) list of inputs and "acceptable" outputs?

Checklist of function from JS source

These are the functions that need documentation if we consider the original JS version as the reference.

  • add
  • startOfLine
  • endOfLine
  • then
  • find
  • maybe
  • anything
  • anythingBut
  • something
  • somethingBut
  • replace
  • lineBreak
  • br (shorthand for lineBreak)
  • tab
  • word
  • anyOf
  • any (shorthand for anyOf)
  • range
  • withAnyCase
  • stopAtFirst
  • searchOneLine
  • multiple
  • or
  • beginCapture
  • endCapture

Update Kotlin implementation to support Kotlin Multiplatform

KotlinVerbalExpressions only targets Kotlin/JVM, which is not a big surprise since the repo hasn't been updated since Kotlin/JS, Kotlin/Native, and Kotlin/WASM have had more prominent releases.

VerbalExpressions would be especially useful for Kotlin Multiplatform, as currently it is difficult to create Regexes that perform identically on each platform. This is because currently each Kotlin target delegates to the Regex engine available at runtime, which can differ significantly between platforms.

Update KotlinVerbalExpressions

I would like to update the KotlinVerbalExpressions repo. I propose the following updates:

  • Update the Gradle version and build config
  • Update Kotlin from 1.2 to 1.8 (the latest version)
  • Migrate the build config to use Kotlin Multiplatform, but still target Kotlin/JVM

I have made a demonstration PR for this work: aSemy/KotlinVerbalExpressions#1

Kotlin Multiplatform

What is more difficult to determine is how to support additional Kotlin Multiplatform targets:

  • Kotlin/JS
  • Kotlin/Native
  • Kotlin/WASM

I see two approaches:

  1. VerbalExpressions can be written so that the platform's Regex engine does not have any impact.

    This would be the best solution, because then KotlinVerbalExpressions could be written in common Kotlin code, and would not need to worry about platform specific details.

    I suspect the best approach for this is to try to use JSVerbalExpressions as a basis, since JavaScript is also dependent on the available Regex engine, so perhaps it has already overcome the same problem that Kotlin Multiplatform faces now?

  2. KotlinVerbalExpressions uses expect/actual to delegate to platform specific VerbalExpression implementations. For Kotlin/JS and Kotlin/JVM this would be easier, since there are already VerbalExpressions implementations. However, Kotlin/Native requires a pure-C implementation, and I don't think VerbalExpressions has a pure-C implementation?


CC @zsmb13 👋, since you originally implemented KotlinVerbalExpressions way back when, and would probably be able to help with publishing a new version?

Multiple() implementation

As I did for the JS implementation, I am opening this issue here to discuss the "multiple" function.

Some implementations return the given value with "+", others give "{2,}".

On my side, for Go, I developed a method with this signature:

multiple(string value, int min, int max)

where:

  • min is the minimum number of occurrences to match
  • max is the maximum number of occurrences to match

Both arguments are, IMHO, optional; the default should be treated as "value?"

This is how I did:

// get "foo" at least one time
v.Multiple("foo")
v.Multiple("foo", 1)

// get "foo" 0 or more times
v.Multiple("foo", 0)

//get "foo" 0 or 1 times
v.Multiple("foo", 0, 1)

// get "foo" 0 to 10 times
v.Multiple("foo", 0, 10)

//get "foo" at least 10 times
v.Multiple("foo", 10)

//get "foo" exactly 10 times
v.Multiple("foo", 10, 10)

//get "foo" from 1 to 10 times
v.Multiple("foo", 1, 10)
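
The cases listed above can be mapped onto regex quantifiers roughly as follows. This is a PHP sketch that follows the listed examples (so a call with no bounds means "at least once"); the function name multiple and its defaults are assumptions, not the Go port's code.

```php
<?php
// Hypothetical sketch: translate (value, min, max) into a quantified group.
// $max === null means "no upper bound".
function multiple(string $value, int $min = 1, ?int $max = null): string
{
    $group = '(?:' . preg_quote($value, '/') . ')';

    if ($max === null) {
        if ($min === 0) {
            return $group . '*';              // 0 or more
        }
        if ($min === 1) {
            return $group . '+';              // at least once
        }
        return $group . '{' . $min . ',}';    // at least $min times
    }
    if ($min === 0 && $max === 1) {
        return $group . '?';                  // 0 or 1 time
    }
    if ($min === $max) {
        return $group . '{' . $min . '}';     // exactly $min times
    }
    return $group . '{' . $min . ',' . $max . '}';
}

$atLeastTen = multiple('foo', 10);
$oneToTen   = multiple('foo', 1, 10);
```

Using the short forms (*, +, ?) where they apply keeps the generated pattern readable, while the brace form covers every other combination.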

The discussion is open.

Default regular expressions

In the CSharp port we have added some common regular expressions, such as e-mail and URL.
So, for example, one can write a VerbEx like this:

verbEx.StartOfLine().Then(CommonRegex.Email);

I think this should be implemented similarly across the different languages.

If we define default regex words like e-mail and URL, it's important that the underlying regex is identical across the different language ports.

Other examples of common regexes that could be implemented:

email
phone
url
date
ip address
rgb color hex value
decimal number
time format

See original issue in CSharp:
VerbalExpressions/CSharpVerbalExpressions#4
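
One way to keep such defaults identical across ports is to store each one as a plain pattern string that every language can reuse verbatim. The PHP sketch below is only an illustration: the class CommonRegex and the two patterns are assumptions written for this example and are not the expressions used by the CSharp port.

```php
<?php
// Hypothetical shared patterns; NOT the definitions from CSharpVerbalExpressions.
final class CommonRegex
{
    // "#" followed by exactly 3 or exactly 6 hexadecimal digits.
    public const HEX_COLOR = '#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})';

    // 24-hour time in HH:MM form.
    public const TIME_24H = '(?:[01]\d|2[0-3]):[0-5]\d';
}

// Patterns are stored without delimiters or anchors so a builder can embed them.
// "~" is used as the delimiter because HEX_COLOR contains "#".
$isColor = preg_match('~^' . CommonRegex::HEX_COLOR . '$~', '#1a2B3c') === 1;
$isTime  = preg_match('~^' . CommonRegex::TIME_24H . '$~', '23:59') === 1;
```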
