Comments (10)

Aunsiels commented on June 17, 2024

(And regarding your last question: self._test_compare only tests that PythonRegex has the same behavior as re. It makes no assumption about whether the word is accepted. In your example, both modules return False, which is good.)

from pyformlang.

aldo-roseman commented on June 17, 2024

I have found some new hard cases! It looks like the escaping of a dot with backslashes in front does not behave correctly:

from pyformlang.regular_expression import PythonRegex
PythonRegex(r"\.").accepts(["a"]) # Returns False
PythonRegex(r"\.").accepts(["."]) # Returns True
PythonRegex(r"\\.").accepts(["\\a"]) # Returns False, should be True
PythonRegex(r"\\.").accepts(["\\."]) # Returns False, should be True
PythonRegex(r"\\\.").accepts(["\\a"]) # Returns False
PythonRegex(r"\\\.").accepts(["\\."]) # Returns False, should be True
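For reference, here is a minimal sketch of what Python's re returns for the same patterns. It assumes each word is passed as a plain string and must match in full; pairing accepts with re.fullmatch is an assumption made for this comparison, not something the library documents here.

```python
import re

# (pattern, word) pairs taken from the report above, with each word
# written as a plain string instead of a one-element list.
cases = [
    (r"\.", "a"),
    (r"\.", "."),
    (r"\\.", "\\a"),
    (r"\\.", "\\."),
    (r"\\\.", "\\a"),
    (r"\\\.", "\\."),
]

# True when the whole word matches the pattern.
results = [re.fullmatch(pattern, word) is not None for pattern, word in cases]
print(results)  # [False, True, True, True, False, True]
```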

Aunsiels commented on June 17, 2024

Thank you for your feedback. There were three different issues:

  1. "-" denotes a range, but if the range contains escaped characters (like "|"), we need to really escape them when unfolding the range.
  2. "[" and "]" could not be escaped inside a "[...]".
  3. (not mentioned here but appeared) r"a[ab\\]" had a weird behavior. In some cases, we need to ignore the escaping of the last "]" (I am not quite sure what the rule is; if there is an odd number of "\", the re module in Python raises an error).
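The third point can be checked against Python's re directly. This is only a sketch of the reference behavior, not of the Pyformlang fix: with an even number of backslashes the final "]" closes the character class, and with an odd number it is escaped, so the class is never closed and re raises an error.

```python
import re

# Two backslashes: an escaped backslash inside the class, then "]" closes it.
even = re.compile(r"a[ab\\]")
even_accepts_backslash = even.fullmatch("a\\") is not None
print(even_accepts_backslash)  # True

# One backslash: "]" is escaped, so the character class is unterminated.
try:
    re.compile(r"a[ab\]")
    odd_raises = False
except re.error:
    odd_raises = True
print(odd_raises)  # True
```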

Thank you again for your help in finding bugs :) I added your examples to the tests, and they should be working now. A new version was pushed to PyPI.

aldo-roseman commented on June 17, 2024

Thanks for the quick fix! However, I think ranges where the ending character is escaped still do not work correctly. r"[{-}]" correctly accepts {, | and }. However, r"[{-\}]" only accepts { and }, but not |. Escaping the first { does not change the behavior. Similar for the brackets: r"[\[-\]]" only accepts [ and ], but not \.
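As a point of comparison, this is a small sketch of how Python's re treats these ranges. The helper name accepted and the candidate list are only illustrative.

```python
import re

def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c) is not None]

candidates = ["{", "|", "}", "-", "[", "\\", "]"]

plain_range = accepted(r"[{-}]", candidates)
escaped_end = accepted(r"[{-\}]", candidates)
escaped_both = accepted(r"[\{-\}]", candidates)
bracket_range = accepted(r"[\[-\]]", candidates)

print(plain_range)    # ['{', '|', '}']
print(escaped_end)    # ['{', '|', '}']
print(escaped_both)   # ['{', '|', '}']
print(bracket_range)  # ['[', '\\', ']']
```

In re, escaping either endpoint does not change the range, and "-" itself is not accepted by any of these patterns.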

(Also, it looks like you added the test self._test_compare(r"[{-}]", "-"), but r"[{-}]" should not, and does not, accept -, whereas it should, and does, accept |. Shouldn't the automated test fail?)

Aunsiels commented on June 17, 2024

Thanks again.

Ultimately, I had to work harder to understand what Python is actually doing and how escaping works (https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences). I am not quite sure I mimic exactly the behavior of Python re in all edge cases, but it should be closer now.

I discovered some weird things. For example, the following three regexes seem to have the same behavior:

import re

r = re.compile("\\x10")
r = re.compile("\x10")
r = re.compile(r"\x10")

To recognize the string \x10 (not hexadecimal equivalent), we have to escape twice:

r = re.compile("\\\\x10")

At this point, I am not sure anybody is writing such regex...
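The escaping behavior described above can be checked with a short sketch. The variable names are illustrative: control_char is the single character with code point 16, and literal_text is the four-character string made of a backslash followed by x10.

```python
import re

control_char = "\x10"    # one character, code point 16
literal_text = r"\x10"   # four characters: backslash, "x", "1", "0"

# Each pattern is checked against both words: (matches control_char, matches literal_text).
patterns = ["\\x10", "\x10", r"\x10"]
same_behavior = [
    (re.fullmatch(p, control_char) is not None,
     re.fullmatch(p, literal_text) is not None)
    for p in patterns
]
print(same_behavior)  # [(True, False), (True, False), (True, False)]

# Escaping twice makes the regex match the literal text instead.
double_escaped = re.compile("\\\\x10")
matches_literal = double_escaped.fullmatch(literal_text) is not None
matches_control = double_escaped.fullmatch(control_char) is not None
print(matches_literal, matches_control)  # True False
```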

Anyway, it should work better now. A new version was pushed. Tell me if you find other hard cases!

aldo-roseman commented on June 17, 2024

Thanks! It looks like it works correctly now.

Aunsiels commented on June 17, 2024

Thank you again! I quickly found the bug and corrected it.

aldo-roseman commented on June 17, 2024

Thanks for the quick response, but it doesn't seem to be fixed. When I install version 1.0.7, I get the same results in my example as before.

Aunsiels commented on June 17, 2024

In your examples, you have to remove the square brackets. Otherwise, Pyformlang understands that "\\a" is a single symbol/letter (because Pyformlang is more general in a way than Python re).
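The difference between the two ways of writing the word can be seen without the library at all. This sketch only illustrates plain Python iteration; how Pyformlang consumes each form is as described above.

```python
import re

word_as_list = ["\\a"]   # one symbol, which happens to be two characters long
word_as_string = "\\a"   # two symbols: a backslash, then "a"

symbols_in_list = list(word_as_list)
symbols_in_string = list(word_as_string)
print(symbols_in_list)    # ['\\a']
print(symbols_in_string)  # ['\\', 'a']

# Python's re always works on the string form, character by character.
re_accepts = re.fullmatch(r"\\.", word_as_string) is not None
print(re_accepts)  # True
```

A regex such as r"\\." expects two symbols, so only the string form can match it symbol by symbol.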

aldo-roseman commented on June 17, 2024

Ah yes, I forgot about that part. Thanks, it works now! Then my examples were also incorrect as the last one did already return True, but 'luckily' the middle two were indeed wrong...
