aunsiels / pyformlang Goto Github PK

A python library to manipulate formal languages and various automata

Home Page: https://pypi.org/project/pyformlang/

License: MIT License

Makefile 0.17% Python 99.73% Dockerfile 0.09%

regular-languages language-theory python-library formal-languages automata indexed-grammars regex python context-free-grammar context-free-language

pyformlang's Introduction

pyformlang

A python library to manipulate formal grammar. In general, it can be used to better understand algorithms in a formal way.

If you use Pyformlang in your project, please cite our paper:

@InProceedings{pyformlang,
author="Romero, Julien",
title="Pyformlang: An Educational Library for Formal Language Manipulation",
booktitle="SIGCSE",
year="2021"
doi = {https://doi.org/10.1145/3408877.3432464}
}

Installation

pip3 install pyformlang

Sources

Most algorithms come from Introduction to Automata Theory, Languages, and Computation (2nd edition) by John E. Hopcroft, Rajeev Motwani and Jeferey D. Ullman.

Indexed grammars come from the original paper Index Grammars - An Extension of Context-free grammars by Alfred V. Aho.

On the implementation of Hopcroft minimization algorithm: Implementation of Hopcroft's Algorithm, Hang Zhou

Intersection CFG/Regex and a better written version

Usage

Please refer to the official documentation: pyformlang.readthedocs.io.

pyformlang's People

Contributors

Stargazers

Watchers

Forkers

yaccconstructor parthokr ilyaep vadyushkins belonesox longquanjiang sooids marcotcr kubef

pyformlang's Issues

AttributeError: module 'pyformlang' has no attribute 'regular_expression'

from pyformlang.finite_automaton import EpsilonNFA, State, Symbol, Epsilon

enfa = EpsilonNFA()
state0 = State(0)
state1 = State(1)
symb_a = Symbol("0")
symb_b = Symbol("1")
enfa.add_start_state(state0)
enfa.add_final_state(state1)
enfa.add_transition(state0, symb_a, state0)
enfa.add_transition(state1, symb_b, state0)
enfa.add_transition(state1, symb_b, state1)

# Turn a finite automaton into a regex...
regex = enfa.to_regex()
# And turn it back into an epsilon non deterministic automaton
enfa2 = regex.to_epsilon_nfa()

Source: Here

This code generates AttributeError: module 'pyformlang' has no attribute 'regular_expression

But importing Regex from pyformlang.regular_expression fixes that issue.

Why a method call to a library requires explicit import?

Importing Regex from pyformlang.regular_expression at epsilon_nfa.py causes circular import.
So importing Regex at runtime from to_regex() may resolve this. Like

    def to_regex(self) -> "Regex":
        from pyformlang import regular_expression
        ...
        ...
        ...
        return regular_expression.Regex(res)

at epsilon_nfa.py

Standard regular expression

Hello,

I would like to use standard regular expressions along with the core functions of your library, such as building an automaton from a regular expression. How can i do this?

Thanks for pyformlang!

Bug with to_normal_form() function

Hi, @Aunsiels!
First of all, thanks a lot for the wonderful library!
I was translating the context-free grammar into the Chomsky normal form by using to_normal_form() function and noticed this strange bug.

>>> from pyformlang.cfg import CFG
>>> cfg = CFG.from_text("S -> a S b S")
>>> cnf = cfg.to_normal_form()
>>> cnf.productions
set()

However, I was expecting to see something like this.

S -> S0 S
S0 -> S2 S1
S1 -> b
S2 -> S3 S
S3 -> a

Environment

OS: Ubuntu 20.04
Python: 3.8.5
Pyformlang: 0.1.24, 0.1.23, 0.1.22, 0.1.21, 0.1.20
Yes, on all these versions of the library, the bug is also reproduced.

Question: PDA final state or empty stack?

Hi,
I read your doc on readthedocs and browsed through the code, but I'm still not sure when which type of PDA is used.

When adding the transitions step by step to the PDA, should this be a PDA accepting on final state or on empty stack? Is this determined by checking whether final_state contains some states?

I noticed that when calling to_final_state() and to_empty_stack() in either calls the (same) automata changed. Is it assumed that the automata is in accept by state mode when calling to_empty_stack() and analog for accept by empty stack?
(the thought of the functions assuming a mode of the PDA comes to me by the comments like Turns the current PDA final state to empty stack)

Then when converting to CFG, is it also assumed that the automata was in accept by empty stack (that's how I'd interpret the comment Turns this PDA on empty stack accepting into a CFG)?

Please elaborate on what is set how and what is assumed.

pyformlang.regular_expression.regex_objects.MisformedRegexError: Wrong parenthesis regex Regex

Hi, @Aunsiels!

Thank you for the awesome library! 👏

Unfortunately, while using your library, I got the error 🐛 mentioned in the title. 😞

Environment

OS: Ubuntu 20.04
Python: 3.8.10
Pyformlang: 0.1.26

How to reproduce

from pyformlang.cfg import CFG
from pyformlang.rsa import RecursiveAutomaton as RSA

rsa = RSA.from_cfg(CFG.from_text("S -> ((a|((b|c))))"))

Possible reason

I think this is due to the fact that instead of a regular expression on the right side, CFG breaks it down into products(using |) of the form like this:

S -> c))))
S -> ((a
S -> ((b

Which will be converted to text like this

S -> c))))|((a|((b

Possible solution

I can suggest replacing the from_cfg() function with the from_text() function, since regex in production body is not supported in CFG in the expected way.

But these will be backward-incompatible changes and I want to know what you think about this?

Trouble with epsilon in CFG

$ python3
Python 3.7.5 (default, Apr 19 2020, 20:18:17) 
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyformlang.cfg import CFG
>>> gr = CFG.from_text('S -> a S b S\nS -> ')
>>> gr.generate_epsilon()
True
>>> new_gr = gr.to_normal_form()
>>> new_gr.generate_epsilon()
False

But I think that language generated by new_gr should be equal to language generated by gr. Including an epsilon.

Is there algorith to convert epsNFA to NFA (remove epsilon transitions)?

Is there algorith to convert epsNFA to NFA («remove epsilon transitions»)?

I could not find it, but may be I missed something.

Or should it be here? (PR with this algorithms will be welcomed here?)

Bug with CNF

Hello! I found some problem with CNF. After translating my CNF grammar to text I tried to read it from text again, but got another grammar...

cfg = CFG.from_text("S -> a S b | a b")
cnf = cfg.to_normal_form()
cnf.contains([Terminal("a"),Terminal("b")])
True
new_text = cnf.to_text()
new_cnf = CFG.from_text(new_text)
new_cnf.contains([Terminal("a"), Terminal("b")])
False

Can you fix this please?

P.S. I think it might be happening because of wrong translation of variables (creating variables with lowercase first letter from terminals (a#CNF# -> Terminal(a))

Bug with braces and brackets in character ranges in PythonRegex

Character ranges work incorrectly when braces or brackets are used inside them. Starting with braces, the expressions r"[{}]" and r"[\{\}]" work correctly, e.g.

from pyformlang.regular_expression import PythonRegex
PythonRegex(r"[{}]").accepts(["{"])

returns True. However, using r"[{-}]" raises an error and r"[\{-\}]" works incorrectly as the expression does not accept |.

Using brackets does not really work at all. The expressions r"[\[]", r"[Z-\[]", r"[\[-a]" and r"[\[-\]]" do not accept anything and r"[\]]", r"[Z-\]]" and r"[\]-a]" raise errors.

Trouble with Regex

Hello,

When I was using regular expressions in my program, I found some unusual behavior.

For example, I can write this:

Python 3.9.0 (default, Nov 18 2020, 13:28:38) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyformlang.regular_expression import Regex
>>> Regex('a a | a')
((a.a)|a)
>>> Regex('a . a | a')
((a.a)|a)
>>> Regex('a* a | a')
(((a)*.a)|a)

So, I realise that priority of operator . more than priority of operator |. But when I use * just before |, priority of this operations changes:

>>> Regex('a a* | a')
(a.((a)*|a))
>>> Regex('a . a* | a')
(a.((a)*|a))
>>> Regex('a . a* | (a)')
(a.((a)*|a))
>>> Regex('a* a* | a')
((a)*.((a)*|a))

I can define expression ((a.(a)*)|a) only through using parenthesis:

>>> Regex('(a a*) | a')
((a.(a)*)|a)

And error isn't in printing:

>>> Regex('b a* | a').accepts('b')
True
>>> Regex('b a* | a').accepts('a')
False
>>> Regex('(b a*) | a').accepts('a')
True
>>>

aunsiels / pyformlang Goto Github PK

pyformlang's Introduction

pyformlang

Installation

Sources

Usage

pyformlang's People

Contributors

Stargazers

Watchers

Forkers

pyformlang's Issues

Environment

Environment

How to reproduce

Possible reason

Possible solution

Recommend Projects

Recommend Topics

Recommend Org