Code Monkey home page Code Monkey logo

pyformlang's Introduction

pyformlang

Python package pypi Documentation badge

A python library to manipulate formal grammar. In general, it can be used to better understand algorithms in a formal way.

If you use Pyformlang in your project, please cite our paper:

@InProceedings{pyformlang,
author="Romero, Julien",
title="Pyformlang: An Educational Library for Formal Language Manipulation",
booktitle="SIGCSE",
year="2021"
doi = {https://doi.org/10.1145/3408877.3432464}
}

Installation

pip3 install pyformlang

Sources

Most algorithms come from Introduction to Automata Theory, Languages, and Computation (2nd edition) by John E. Hopcroft, Rajeev Motwani and Jeferey D. Ullman.

Indexed grammars come from the original paper Index Grammars - An Extension of Context-free grammars by Alfred V. Aho.

On the implementation of Hopcroft minimization algorithm: Implementation of Hopcroft's Algorithm, Hang Zhou

Intersection CFG/Regex and a better written version

Usage

Please refer to the official documentation: pyformlang.readthedocs.io.

pyformlang's People

Contributors

aunsiels avatar belonesox avatar ilyaep avatar kubef avatar marcotcr avatar snyk-bot avatar vadyushkins avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

pyformlang's Issues

AttributeError: module 'pyformlang' has no attribute 'regular_expression'

from pyformlang.finite_automaton import EpsilonNFA, State, Symbol, Epsilon

enfa = EpsilonNFA()
state0 = State(0)
state1 = State(1)
symb_a = Symbol("0")
symb_b = Symbol("1")
enfa.add_start_state(state0)
enfa.add_final_state(state1)
enfa.add_transition(state0, symb_a, state0)
enfa.add_transition(state1, symb_b, state0)
enfa.add_transition(state1, symb_b, state1)

# Turn a finite automaton into a regex...
regex = enfa.to_regex()
# And turn it back into an epsilon non deterministic automaton
enfa2 = regex.to_epsilon_nfa()

Source: Here

This code generates AttributeError: module 'pyformlang' has no attribute 'regular_expression

But importing Regex from pyformlang.regular_expression fixes that issue.

Why a method call to a library requires explicit import?

Importing Regex from pyformlang.regular_expression at epsilon_nfa.py causes circular import.
So importing Regex at runtime from to_regex() may resolve this. Like

    def to_regex(self) -> "Regex":
        from pyformlang import regular_expression
        ...
        ...
        ...
        return regular_expression.Regex(res)

at epsilon_nfa.py

Bug with to_normal_form() function

Hi, @Aunsiels!
First of all, thanks a lot for the wonderful library!
I was translating the context-free grammar into the Chomsky normal form by using to_normal_form() function and noticed this strange bug.

>>> from pyformlang.cfg import CFG
>>> cfg = CFG.from_text("S -> a S b S")
>>> cnf = cfg.to_normal_form()
>>> cnf.productions
set()

However, I was expecting to see something like this.

S -> S0 S
S0 -> S2 S1
S1 -> b
S2 -> S3 S
S3 -> a

Environment

  • OS: Ubuntu 20.04
  • Python: 3.8.5
  • Pyformlang: 0.1.24, 0.1.23, 0.1.22, 0.1.21, 0.1.20
    Yes, on all these versions of the library, the bug is also reproduced.

Standard regular expression

Hello,

I would like to use standard regular expressions along with the core functions of your library, such as building an automaton from a regular expression. How can i do this?

Thanks for pyformlang!

pyformlang.regular_expression.regex_objects.MisformedRegexError: Wrong parenthesis regex Regex

Hi, @Aunsiels!

Thank you for the awesome library! 👏

Unfortunately, while using your library, I got the error 🐛 mentioned in the title. 😞

Environment

  • OS: Ubuntu 20.04
  • Python: 3.8.10
  • Pyformlang: 0.1.26

How to reproduce

from pyformlang.cfg import CFG
from pyformlang.rsa import RecursiveAutomaton as RSA

rsa = RSA.from_cfg(CFG.from_text("S -> ((a|((b|c))))"))

Possible reason

I think this is due to the fact that instead of a regular expression on the right side, CFG breaks it down into products(using |) of the form like this:

S -> c))))
S -> ((a
S -> ((b

Which will be converted to text like this

S -> c))))|((a|((b

Possible solution

I can suggest replacing the from_cfg() function with the from_text() function, since regex in production body is not supported in CFG in the expected way.

But these will be backward-incompatible changes and I want to know what you think about this?

Question: PDA final state or empty stack?

Hi,
I read your doc on readthedocs and browsed through the code, but I'm still not sure when which type of PDA is used.

When adding the transitions step by step to the PDA, should this be a PDA accepting on final state or on empty stack? Is this determined by checking whether final_state contains some states?

I noticed that when calling to_final_state() and to_empty_stack() in either calls the (same) automata changed. Is it assumed that the automata is in accept by state mode when calling to_empty_stack() and analog for accept by empty stack?
(the thought of the functions assuming a mode of the PDA comes to me by the comments like Turns the current PDA final state to empty stack)

Then when converting to CFG, is it also assumed that the automata was in accept by empty stack (that's how I'd interpret the comment Turns this PDA on empty stack accepting into a CFG)?

Please elaborate on what is set how and what is assumed.

Trouble with epsilon in CFG

$ python3
Python 3.7.5 (default, Apr 19 2020, 20:18:17) 
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyformlang.cfg import CFG
>>> gr = CFG.from_text('S -> a S b S\nS -> ')
>>> gr.generate_epsilon()
True
>>> new_gr = gr.to_normal_form()
>>> new_gr.generate_epsilon()
False

But I think that language generated by new_gr should be equal to language generated by gr. Including an epsilon.

Trouble with Regex

Hello,

When I was using regular expressions in my program, I found some unusual behavior.

For example, I can write this:

Python 3.9.0 (default, Nov 18 2020, 13:28:38) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyformlang.regular_expression import Regex
>>> Regex('a a | a')
((a.a)|a)
>>> Regex('a . a | a')
((a.a)|a)
>>> Regex('a* a | a')
(((a)*.a)|a)

So, I realise that priority of operator . more than priority of operator |. But when I use * just before |, priority of this operations changes:

>>> Regex('a a* | a')
(a.((a)*|a))
>>> Regex('a . a* | a')
(a.((a)*|a))
>>> Regex('a . a* | (a)')
(a.((a)*|a))
>>> Regex('a* a* | a')
((a)*.((a)*|a))

I can define expression ((a.(a)*)|a) only through using parenthesis:

>>> Regex('(a a*) | a')
((a.(a)*)|a)

And error isn't in printing:

>>> Regex('b a* | a').accepts('b')
True
>>> Regex('b a* | a').accepts('a')
False
>>> Regex('(b a*) | a').accepts('a')
True
>>> 

Bug with CNF

Hello! I found some problem with CNF. After translating my CNF grammar to text I tried to read it from text again, but got another grammar...

cfg = CFG.from_text("S -> a S b | a b")
cnf = cfg.to_normal_form()
cnf.contains([Terminal("a"),Terminal("b")])
True
new_text = cnf.to_text()
new_cnf = CFG.from_text(new_text)
new_cnf.contains([Terminal("a"), Terminal("b")])
False

Can you fix this please?

P.S. I think it might be happening because of wrong translation of variables (creating variables with lowercase first letter from terminals (a#CNF# -> Terminal(a))

Bug with braces and brackets in character ranges in PythonRegex

Character ranges work incorrectly when braces or brackets are used inside them. Starting with braces, the expressions r"[{}]" and r"[\{\}]" work correctly, e.g.

from pyformlang.regular_expression import PythonRegex
PythonRegex(r"[{}]").accepts(["{"])

returns True. However, using r"[{-}]" raises an error and r"[\{-\}]" works incorrectly as the expression does not accept |.

Using brackets does not really work at all. The expressions r"[\[]", r"[Z-\[]", r"[\[-a]" and r"[\[-\]]" do not accept anything and r"[\]]", r"[Z-\]]" and r"[\]-a]" raise errors.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.