
arpeggio's Introduction


textX is a meta-language for building Domain-Specific Languages (DSLs) in Python. It is inspired by Xtext.

In a nutshell, textX will help you build your textual language in an easy way. You can invent your own language or build support for an already existing textual language or file format.

From a single language description (grammar), textX will build a parser and a meta-model (a.k.a. abstract syntax) for the language. See the docs for the details.

textX follows the syntax and semantics of Xtext but differs in some places. It is implemented 100% in Python using the Arpeggio PEG parser: no grammar ambiguities, unlimited lookahead, and an interpreter style of work.

Quick intro

Here is a complete example that shows the definition of a simple DSL for drawing. We also show how to define a custom class, interpret models and search for instances of a particular type.

from textx import metamodel_from_str, get_children_of_type

grammar = """
Model: commands*=DrawCommand;
DrawCommand: MoveCommand | ShapeCommand;
ShapeCommand: LineTo | Circle;
MoveCommand: MoveTo | MoveBy;
MoveTo: 'move' 'to' position=Point;
MoveBy: 'move' 'by' vector=Point;
Circle: 'circle' radius=INT;
LineTo: 'line' 'to' point=Point;
Point: x=INT ',' y=INT;
"""

# We will provide our class for Point.
# Classes for other rules will be dynamically generated.
class Point:
    def __init__(self, parent, x, y):
        self.parent = parent
        self.x = x
        self.y = y

    def __str__(self):
        return "{},{}".format(self.x, self.y)

    def __add__(self, other):
        return Point(self.parent, self.x + other.x, self.y + other.y)

# Create meta-model from the grammar. Provide `Point` class to be used for
# the rule `Point` from the grammar.
mm = metamodel_from_str(grammar, classes=[Point])

model_str = """
    move to 5, 10
    line to 10, 10
    line to 20, 20
    move by 5, -7
    circle 10
    line to 10, 10
"""

# Meta-model knows how to parse and instantiate models.
model = mm.model_from_str(model_str)

# At this point model is a plain Python object graph with instances of
# dynamically created classes and attributes following the grammar.

def cname(o):
    return o.__class__.__name__

# Let's interpret the model
position = Point(None, 0, 0)
for command in model.commands:
    if cname(command) == 'MoveTo':
        print('Moving to position', command.position)
        position = command.position
    elif cname(command) == 'MoveBy':
        position = position + command.vector
        print('Moving by', command.vector, 'to a new position', position)
    elif cname(command) == 'Circle':
        print('Drawing circle at', position, 'with radius', command.radius)
    else:
        print('Drawing line from', position, 'to', command.point)
        position = command.point
print('End position is', position)

# Output:
# Moving to position 5,10
# Drawing line from 5,10 to 10,10
# Drawing line from 10,10 to 20,20
# Moving by 5,-7 to a new position 25,13
# Drawing circle at 25,13 with radius 10
# Drawing line from 25,13 to 10,10

# Collect all points starting from the root of the model
points = get_children_of_type("Point", model)
for point in points:
    print('Point: {}'.format(point))

# Output:
# Point: 5,10
# Point: 10,10
# Point: 20,20
# Point: 5,-7
# Point: 10,10

Video tutorials

Introduction to textX

Implementing Martin Fowler's State Machine DSL in textX

Docs and tutorials

The full documentation with tutorials is available at http://textx.github.io/textX/stable/

You can also try textX in our playground. There is a dropdown with several examples to get you started.

Support in IDE/editors

Projects that are currently in progress are:

  • textX-LS - support for the Language Server Protocol and VS Code for any textX based language
  • viewX - creating visualizers for textX languages

If you are a vim user, check out the support for vim.

For emacs there is textx-mode which is also available in MELPA.

You can also check out the textX-ninja project, though it is currently unmaintained.

Discussion and help

For general questions, suggestions, and feature requests please use GitHub Discussions.

For issues please use GitHub issue tracker.

Citing textX

If you are using textX in your research project we would be very grateful if you cite our paper:

Dejanović I., Vaderna R., Milosavljević G., Vuković Ž. (2017). TextX: A Python tool for Domain-Specific Languages implementation. Knowledge-Based Systems, 115, 1-4.

License

MIT

Python versions

Tested with Python 3.8+.

arpeggio's People

Contributors

alensuljkanovic, aluriak, goto40, hugo-trentesaux, ianmmoir, igordejanovic, kloczek, kolanich, mabraham, mcepl, mdamien, mgorny, neirbowj, schmittlauch, sdvillal, sebix, shanecandoit, smbolton, smurfix, stanislaw, tbm, ygg01, zetaraku


arpeggio's Issues

Keeping comments

Related to textX/textX#54 (comment).

There are some situations in which you want to keep comments, for example when transpiling code.

I don't know what the best way to do it is, but having the comment text and the line where it appears should suffice.
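One workaround, if comments can only appear in known places, is to stop treating them as skip content and match them with an ordinary grammar rule so they stay in the parse tree. A minimal sketch (the rule names here are invented for illustration):

from arpeggio import ParserPython, ZeroOrMore, EOF
from arpeggio import RegExMatch as _

# Match comments as ordinary rules instead of passing a comment rule to
# the parser, so they survive in the parse tree with their positions.
def comment():   return _(r'//.*')
def statement(): return _(r'\w+'), ';'
def program():   return ZeroOrMore([comment, statement]), EOF

parser = ParserPython(program)
tree = parser.parse("foo; // keep me\nbar;")
for node in tree:
    if node.rule_name == 'comment':
        print(node.position, node.value)  # offset and text of each comment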

Preserve line number

Hi Igor,

If I find a match in a file, I would like to store a reference to it, such as the line number.

Example: the DOM view of an editor shows all functions of the current document, and when you click on a function you jump to where it is defined in the same document.

Any idea how can I achieve this?

Thanks
Yash
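For reference: every parse tree node carries its absolute input offset in position, and the parser can convert that to a line/column pair. A minimal sketch (the grammar and names are invented here):

from arpeggio import ParserPython, ZeroOrMore, EOF
from arpeggio import RegExMatch as _

def func_def(): return 'def', _(r'\w+')
def doc():      return ZeroOrMore(func_def), EOF

parser = ParserPython(doc)
tree = parser.parse("def foo\ndef bar")

# Node offsets translate to (line, column) via pos_to_linecol.
for node in tree:
    if node.rule_name == 'func_def':
        line, col = parser.pos_to_linecol(node.position)
        print(node[1].value, 'is defined at line', line)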

RecursionError: maximum recursion depth exceeded while calling a Python object

Hello!

I'm trying to use Arpeggio to build a PEG grammar for a new language. I'm new to grammar design, so maybe it's my fault.

Your tool seems great, but I've run into a problem: I get a "RecursionError: maximum recursion depth exceeded while calling a Python object" error once my grammar becomes complex enough.

The call stack is a repetition of

File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/__init__.py", line 380, in _parse
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/__init__.py", line 272, in parse
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/__init__.py", line 350, in _parse
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/__init__.py", line 272, in parse
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/__init__.py", line 380, in _parse
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/__init__.py", line 230, in parse
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/__init__.py", line 141, in dprint

or

File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/peg.py", line 108, in _resolve
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/peg.py", line 108, in _resolve
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/peg.py", line 108, in _resolve
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/peg.py", line 108, in _resolve
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/peg.py", line 108, in _resolve
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/peg.py", line 108, in _resolve
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/peg.py", line 108, in _resolve
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/peg.py", line 109, in _resolve
File "/usr/lib/python3.5/site-packages/Arpeggio-1.1-py3.5.egg/arpeggio/__init__.py", line 753, in __eq__

Below is an example of such a problematic grammar:

number_token     = r'(\d+|\d+\.\d*|\d*\.\d+)'
identifier_token = r'[a-zA-Z_][a-zA-Z0-9_]*'

unqualified_identifier_expression = identifier_token
qualified_identifier_expression   = identifier_token ( "." identifier_token )*

unary_expression              = number_token / qualified_identifier_expression

method_call_expression        = ( collection_index_expression / method_call_expression / qualified_identifier_expression ) "(" ( rvalue_expression ( "," rvalue_expression )* )? ")"
collection_index_expression   = ( method_call_expression / collection_index_expression / qualified_identifier_expression ) "["   rvalue_expression ( "," rvalue_expression )*    "]"

lvalue_expression             = qualified_identifier_expression
rvalue_expression             = collection_index_expression / method_call_expression / unary_expression

expression                    = collection_index_expression / method_call_expression / unary_expression

belang = expression* EOF

When I simplify the method_call_expression and collection_index_expression productions to

method_call_expression        = qualified_identifier_expression "(" ( rvalue_expression ( "," rvalue_expression )* )? ")"
collection_index_expression   = qualified_identifier_expression "["   rvalue_expression ( "," rvalue_expression )*    "]"

everything works, but the grammar is not as powerful as I want.

To be more specific, I want

a [ 0 ] ( 1 ) [ 2 ]

to be parsed as:
assume "a" is a collection, and get element #0
assume the returned value is callable, and call it with an argument of 1
assume the returned value is a collection, and get element #2
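For what it's worth, the rules above are mutually left-recursive (method_call_expression and collection_index_expression each appear as the first element of their own definitions), and recursive-descent PEG parsers cannot handle left recursion, which is what drives the unbounded descent. A standard rewrite matches the primary expression once and then loops over postfix suffixes; a sketch in the same PEG dialect (the suffix rule names are invented here):

postfix_expression = qualified_identifier_expression ( call_suffix / index_suffix )*
call_suffix        = "(" ( rvalue_expression ( "," rvalue_expression )* )? ")"
index_suffix       = "["   rvalue_expression ( "," rvalue_expression )*    "]"

With this shape, a [ 0 ] ( 1 ) [ 2 ] parses as one qualified_identifier_expression followed by an index suffix, a call suffix, and another index suffix.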

incorrect behavior when parsing unordered group in clean PEG

The unordered group will not be parsed correctly in a multi-line grammar unless a backslash is the last character on the line in the grammar.

This is how you would expect to write the grammar (no backslash):

print ParserPEG("""
  letters = "{" ("a" "b")#  "}"
""", "letters").parse(""" { b a } """)

This incorrectly throws an exception: arpeggio.NoMatch: Expected 'a' at position (1, 4) => ' { *b a } '.

Adding a backslash as the last character of the line fixes the parsing:

print ParserPEG("""
  letters = "{" ("a" "b")#  "}" \
""", "letters").parse(""" { b a } """) 

This correctly prints: { | b | a | }

I'm using Arpeggio 1.7.1 installed from pip under Python 2.7 on Windows.

Reserved Keywords ?

Hi, first of all, thanks for this library !

Sorry if I'm missing something obvious, but I'm wondering how I can simply handle reserved keywords in my grammar with Arpeggio.

Suppose that I have some reserved keywords in my grammar, like class or function, and I don't want them to be recognized as valid identifiers:

from arpeggio import Kwd, EOF, ParserPython
from arpeggio import RegExMatch as _

##### GRAMMAR ################################################################

def identifier ():         return _(r'[a-zA-Z]\w*') # generates ambiguities with reserved keywords
# ...
def class_body ():         return '{', '}'
def class_name ():         return identifier
def class_declaration ():  return Kwd ('class'), class_name, class_body, EOF

##### MAIN ###################################################################

input_program = 'class class { }'
parser = ParserPython(class_declaration, ignore_case=False, debug=True)
parser.parse(input_program)

The code above will parse the text 'class class { }' without errors, because the second word class matches the rule class_name:

?? Try match rule class_name=RegExMatch([a-zA-Z]\w*) in class_declaration at position 6 => class *class { }
   ++ Match 'class' at 6 => 'class *class* { }'

For now, I'm using the following workaround that excludes keywords from the identifier regex:

reserved_keywords = ['class', 'function'] # ...
def identifier (): return _(r'(?!\b({})\b)([a-zA-Z]\w*)'.format ('|'.join (reserved_keywords)))

It works as I expected:

arpeggio.NoMatch: Expected class_name at position (1, 7) => 'class *class { }'.

But is there something more automatic in Arpeggio to achieve the same purpose? I'm thinking of something like the Keyword class in PyPEG that internally maintains a list of keywords used in the grammar.

Thanks !

Support for lexical rules

While matching lexical rules, whitespace should be preserved (no whitespace skipping). Comment matching should also be disabled.
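Arpeggio's Combine decorator appears to cover part of this: inside it, sub-expressions match with no whitespace skipping and the whole match collapses into a single Terminal. A sketch, assuming Combine's lexical-mode behaviour as described in the docs:

from arpeggio import ParserPython, Combine, ZeroOrMore, EOF
from arpeggio import RegExMatch as _

# Inside Combine no whitespace is skipped, so "3 . 14" would not match;
# outside it, normal whitespace skipping still applies between tokens.
def float_lit(): return Combine(_(r'\d+'), '.', _(r'\d+'))
def doc():       return ZeroOrMore(float_lit), EOF

parser = ParserPython(doc)
print(parser.parse("3.14  2.71"))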

Implement deepcopy for parsed nodes

Being able to deep-copy nodes would enable strategies that require parse tree transformations, such as desugaring syntactic sugar or static analysis of the language. Hence, it would be good to be able to copy nodes.

"Not" does not work

In [1]: from arpeggio import *
In [2]: def expr(): return OneOrMore("-"), Not("-"), OneOrMore("-")
In [3]: p = ParserPython(expr)
In [4]: p.debug = True
In [5]: p.parse("----o----")
>> Matching rule expr=Sequence at position 0 => *----o----
   >> Matching rule OneOrMore in expr at position 0 => *----o----
      ?? Try match rule StrMatch(-) in expr at position 0 => *----o----
      ++ Match '-' at 0 => '*-*---o----'
      ?? Try match rule StrMatch(-) in expr at position 1 => -*---o----
      ++ Match '-' at 1 => '-*-*--o----'
      ?? Try match rule StrMatch(-) in expr at position 2 => --*--o----
      ++ Match '-' at 2 => '--*-*-o----'
      ?? Try match rule StrMatch(-) in expr at position 3 => ---*-o----
      ++ Match '-' at 3 => '---*-*o----'
      ?? Try match rule StrMatch(-) in expr at position 4 => ----*o----
      -- No match '-' at 4 => '----*o*----'
   <<+ Matched rule OneOrMore in expr at position 4 => ----*o----
   >> Matching rule Not in expr at position 4 => ----*o----
      ?? Try match rule StrMatch(-) in expr at position 4 => ----*o----
      -- No match '-' at 4 => '----*o*----'
   <<- Not matched rule Not in expr at position 4 => ----*o----
   >> Matching rule OneOrMore in expr at position 4 => ----*o----
      ?? Try match rule StrMatch(-) in expr at position 4 => ----*o----
      -- No match '-' at 4 => '----*o*----'
   <<- Not matched rule OneOrMore in expr at position 4 => ----*o----
<<- Not matched rule expr=Sequence in expr at position 0 => *----o----
---------------------------------------------------------------------------
NoMatch                                   Traceback (most recent call last)
<ipython-input-5-6b2f56b7e019> in <module>()
----> 1 p.parse("----o----")

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/arpeggio/__init__.py in parse(self, _input, file_name)
   1314             self.comments_model.clear_cache()
   1315         self.comment_positions = {}
-> 1316         self.parse_tree = self._parse()
   1317 
   1318         # In debug mode export parse tree to dot file for

...

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/arpeggio/__init__.py in _nm_raise(self, *args)
   1513                     self.nm.rules.append(rule)
   1514 
-> 1515         raise self.nm
   1516 
   1517 

NoMatch: Expected '-' or '-' or '-' at position (1, 5) => '----*o----'.

Replace use of imp library with importlib

imp was deprecated a long time ago and is going to be removed soon.

imp-to-importlib.patch replaces the use of the imp library with importlib on Python 3 (Python 2 support is still maintained via imp).

I am sorry for not providing a proper pull request, but I am a Python maintainer at SUSE and this is just one of many packages I maintain.
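For context, the usual Python-3-only replacement loads a module from a file path via importlib.util; a sketch of the pattern (not the attached patch itself):

import importlib.util

def load_module(name, path):
    """Load a module from a file path, the importlib way."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module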

Adding Logical Operators to Grammar

Hello,

I'm trying to add AND and OR operators to my grammar. I have the following:

    # Imports added for completeness:
    from arpeggio import Optional, ZeroOrMore, EOF
    from arpeggio import RegExMatch as _

    def number():
        return _(r'\d*\.\d*|\d+')

    def function_name():
        return _(r'[a-zA-Z_0-9]+')

    def func():
        return function_name, "(", \
               [expression, number], \
               ZeroOrMore((',', [expression, number])), \
               ")"

    def factor():
        return Optional(["+", "-"]), [number, ("(", expression, ")"), func]

    def term():
        return factor, ZeroOrMore(["*", "/"], factor)

    def expression():
        return term, ZeroOrMore(["+", "-"], term)

    def check_term():
        return [func, ("(", check_expression, ")")]
    
    def check_expression():
        return check_term, ZeroOrMore(["AND", "OR"], check_term)

    def check():
        return check_expression, EOF

A function can return a number or a bool. However, when applying the grammar to something like GreaterThan(Avg(5, 6, 7), 0.25) AND LessThan(Avg(5, 6, 7), 0.3), I get the following error:

arpeggio.NoMatch: Expected '*' or '/' or '+' or '-' or EOF at position (1, 33) => ')), 0.25) *AND LessTh'.

GreaterThan and LessThan compare their numeric arguments to return a bool. Avg takes the average of its numeric arguments to return a number.

It seems like it's matching expression instead of check_expression. Any ideas?

NonTerminal to string

Support rendering a NonTerminal to its matched string. This operation should be efficient (cached).
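Newer Arpeggio releases expose flat_str() on parse tree nodes, which renders a node back to its matched text; a minimal sketch (availability depends on your version):

from arpeggio import ParserPython, EOF
from arpeggio import RegExMatch as _

def point(): return _(r'\d+'), ',', _(r'\d+')
def doc():   return point, EOF

parser = ParserPython(doc)
tree = parser.parse("5, 10")
print(tree[0].flat_str())  # renders the `point` subtree back to its matched text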

Why semicolon

Python users like things clean and never use ; to end statements. A semicolon is natural in many languages, but it looks superfluous coming from Python.

While writing a grammar I found it mandatory to put a semicolon (;) at the end of each rule, which looks unnecessary to me.

I hope you got my point :) What is your opinion?

Thanks,
Yash , KineticWing IDE
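For what it's worth, Arpeggio also ships a "clean PEG" dialect (arpeggio.cleanpeg) in which rules end without semicolons; a minimal sketch:

from arpeggio.cleanpeg import ParserPEG

# Clean PEG dialect: `=` for rule definition, no trailing semicolons.
grammar = r"""
number = r'\d+'
calc   = number+ EOF
"""

parser = ParserPEG(grammar, "calc")
print(parser.parse("1 2 3"))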

Support for injecting extra rules written in Python into PEG grammars

It would be useful to be able to inject rules written as python functions into PEG grammars.
This would accomplish two things:

  1. Greater portability for libraries. I could publish a library with python functions which anyone could use regardless of whether they're using the peg, cleanpeg or python parsers. Python functions, although more cumbersome to write, are more composable.
  2. It would allow the user to write special rules that respect whitespace in PEG files, while skipping whitespace in the rest of the rules. I believe this is currently impossible without rewriting the whole grammar in Python.

I'd suggest the following API:

from lib.external import rule1, rule2
from arpeggio.cleanpeg import ParserPEG
parser = ParserPEG(calc_grammar,
    "calc", 
    extra_rules={'rule_name1': rule1,  'rule_name2': rule2})

The user could then use 'rule_name1' and 'rule_name2' in the file, and the rules would be automatically resolved. There might be a problem with name clashes between user-defined rules and inner rules defined by the external functions, though. I'm not familiar enough with Arpeggio's internals to be sure.

Un-Pythonic escapes and escape sequences in PEG regexs and string literals

Hi Igor! Thanks for arpeggio, I'm really enjoying using it.

This could either be a bug report, or a feature/documentation request, since "fixing" the issue may break some existing PEG grammars.

When I started writing PEG grammars for arpeggio, I assumed they would use regular Python syntax regarding escapes and escape sequences in regexs and string literals. Then I discovered several things that don't work that way (they can be done using Python grammars, of course):

This fails by matching too much if any single quote occurs after it in the grammar:

  rule = '\\'  # literal backslash before final single quote

As a work-around, one can use "\", since backslash escape isn't implemented for double-quoted strings. But that also means this doesn't work:

  rule = "a\"b"  # escaped double quote inside double-quoted string

Regular expression recognition is similarly limited:

  rule = r'\\[abc]\\'  # backslash before final single quote can fail
  
  rule = r"."   # double quotes not supported at all

Once string literals are parsed, escape sequences within them are incorrectly converted:

  rule = '\\n\n'  # matches 'backslash-newline-newline', not 'backslash-n-newline'.

  rule = '\a\b\f\r\v\101\x41\u266f\U0001036a'   # none are converted as one might expect

I can send you pull requests that will change quote-escaping to match the usual Python rules, and implement Python-standard escape sequence recognition. But first, I wanted to get your thoughts on the issue. While the changes I'd like to see follow the principle of least surprise, and would enhance the capability of PEG grammars, they probably will break at least a few grammars existing in the wild.

Thanks again,

-Sean

Documentation missing

Why is it not more popular?

The library looks good to me and has a good collection of examples.

What you're surely missing is online documentation. Write some articles and tutorials for the library.

Thanks,
Yash
KineticWing IDE

Sharing examples

Hi Igor,

I would like to share a few grammars based on cleanpeg. I would also like to tweak the comment style in cleanpeg;
I'm working with Python and Bash, so it is easier for me to comment in that style.

Would you accept a pull request, or should I just keep them in my fork?

I can also help with documentation, but please note that my English is not very good, so the documentation may contain errors.

Your PEG module is excellent. I tried other Python PEG libraries as well and found that yours extracts what I need from a grammar effectively, with very little effort. Congratulations on the good work.

Please distribute tests and license in sdist pypi tarball

The PyPI tarball contains neither the license, which is needed for obvious reasons, nor the tests.

The tests would allow distribution vendors to run a basic self-test to validate that the package still works with the stack they ship in their distribution.
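For reference, an sdist built with a MANIFEST.in along these lines would bundle both (a sketch, assuming the tests live under tests/):

include LICENSE
recursive-include tests *.py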

Symbols with empty match are not preserved

Here is a minimal example.

Given the grammar:

abc = a b c
a = "a"
b = r'b?'
c = "c"

When it parses "abc" the result is normal,
but when it parses "ac" the NonTerminal b just disappears from the parse tree, like this:

>>> print(parse_tree)
a | c

The result is the same when using b = "b"? instead.

Since b is actually matched, shouldn't it be in the parse tree (like a or c) with node.value == ''?

Node Visitor

It's basically the same as semantic actions, but it makes things look very clean.

It's from a competing library which, as far as I can tell, is equally good.

If possible, rather than separate semantic functions, it would be good to have a class, like this:
https://github.com/erikrose/parsimonious/blob/master/parsimonious/nodes.py#L151

As far as I know, semantic actions and a node visitor serve the same purpose.

A node visitor makes the code look clean and tight.

What's your opinion?
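Arpeggio did grow exactly this: PTNodeVisitor plus visit_parse_tree (they appear in other issues on this page). A minimal sketch of the class-based style:

from arpeggio import ParserPython, PTNodeVisitor, visit_parse_tree, EOF
from arpeggio import RegExMatch as _

def number(): return _(r'\d+')
def doc():    return number, EOF

class NumberVisitor(PTNodeVisitor):
    # visit_<rule_name> methods replace per-rule semantic actions.
    def visit_number(self, node, children):
        return int(node.value)

tree = ParserPython(doc).parse("42")
print(visit_parse_tree(tree, NumberVisitor()))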

Parsing comments in language?

What's a good way to handle general syntactic elements such as comments, which can appear virtually anywhere in the input? It seems very tedious to add them to the grammar, and since Arpeggio ignores whitespace (which is generally what you want), handling comments that run until the end of a line seems difficult.

So far I'm just running a preprocessing pass on the source code to strip out comments intelligently, but I'm wondering if there's a convenient way to add them to the grammar without too much work, so I don't have to reimplement basic nesting logic for stripping out comments.

Awesome library otherwise! I was using Parsimonious before but it got in my way often, Arpeggio usually does exactly what you want by default.
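Arpeggio supports this directly: the parser takes a comment rule as its second argument and skips its matches anywhere in the input, just like whitespace, so the main grammar never has to mention comments. A minimal sketch:

from arpeggio import ParserPython, ZeroOrMore, EOF
from arpeggio import RegExMatch as _

def comment(): return _(r'//.*')
def word():    return _(r'\w+')
def doc():     return ZeroOrMore(word), EOF

# The second argument is the comment rule; its matches are skipped
# wherever whitespace would be skipped.
parser = ParserPython(doc, comment)
print(parser.parse("one // ignored\ntwo"))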

Token not appearing in child nodes

Hello. I'm not quite sure what I'm doing wrong, but I tried to extend your calc example to work with more operators and to aid my understanding of how to work with Arpeggio, and everything except the exponent and null rules is working as expected.

I get an IndexError when I use 'NULL' in any expression. I also realized that any expression involving '^' didn't work as expected. For example, with the original (commented out) code, the expression '4 ^ 2' would return 4.0.

I'm sure this is all my own mistake, but if you could clarify this, I'd be most grateful. Thanks in advance.

import operator

# Imports added for completeness; the grammar below is written in the
# clean PEG dialect (no trailing semicolons), hence arpeggio.cleanpeg.
from arpeggio import PTNodeVisitor, visit_parse_tree
from arpeggio.cleanpeg import ParserPEG

grammar = r'''
number = r'\d+\.{0,1}\d*'
variable = r'[A-Z]+'
null = "NULL"
value = null / number / variable / "(" expression ")"
exponent = value (("^") value)*
product = exponent (("*" / "/") exponent)*
sum = product (("+" / "-") product)*
comparison = sum ((">=" / ">" / "<=" / "<" / "==" / "!=") sum)*
expression = comparison (("&&" / "||") comparison)*
builder = expression+ EOF
'''

OPERATIONS = {
    '+': operator.iadd,
    '-': operator.isub,
    '*': operator.imul,
    '/': operator.itruediv,
    '^': operator.ipow,
    '>=': operator.ge,
    '>': operator.gt,
    '<=': operator.le,
    '<': operator.lt,
    '==': operator.eq,
    '!=': operator.ne,
    '&&': operator.and_,
    '||': operator.or_
}

class TreeVisitor(PTNodeVisitor):
    def visit_null(self, node, children):
        return None

    def visit_number(self, node, children):
        return float(node.value)

    def visit_value(self, node, children):
        return children[-1]

    def visit_exponent(self, node, children):
        # TODO: not sure why the exponent is a special case,
        # but the sign isn't being consumed/returned in the parse
        # tree
        # ---- ORIGINAL CODE ----
        # exponent = children[0]
        # for i in range(2, len(children), 2):
        #     sign = children[i - 1]
        #     exponent = OPERATIONS[sign](exponent, children[i])
        #
        # return exponent
        # ---- END ORIGINAL CODE ----
        if len(children) == 1:
            return children[0]

        exponent = children[0]
        for i in children[1:]:
            exponent **= i

        return exponent

    def visit_product(self, node, children):
        product = children[0]
        for i in range(2, len(children), 2):
            sign = children[i - 1]
            product = OPERATIONS[sign](product, children[i])

        return product

    def visit_sum(self, node, children):
        total = children[0]
        for i in range(2, len(children), 2):
            sign = children[i - 1]
            total = OPERATIONS[sign](total, children[i])

        return total

    def visit_comparison(self, node, children):
        comparison = children[0]
        for i in range(2, len(children), 2):
            sign = children[i - 1]
            comparison = OPERATIONS[sign](comparison, children[i])

        return comparison

    def visit_expression(self, node, children):
        expression = children[0]
        for i in range(2, len(children), 2):
            sign = children[i - 1]
            expression = OPERATIONS[sign](expression, children[i])

        return expression


parser = ParserPEG(grammar, 'builder')


def process_builder_expression(expression):
    tree = parser.parse(expression)
    return visit_parse_tree(tree, TreeVisitor())

Comment in grammar

Bash- or Python-style comments should be allowed in the grammar.

For long grammars this is a must.

Thanks,
Yash

Matching everything in a subexpression loops infinitely

I made a simple grammar

grammar = \
        """
        rule <- (subexpression)+;
        subexpression <- r'^.*$';
        """

and tried to parse a simple text like so

from arpeggio.peg import ParserPEG
parser = ParserPEG(grammar, "rule")
parsed = parser.parse("something simple")

And the process never finishes. It just goes on parsing, steadily filling up the free memory.
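The regex r'^.*$' can succeed on the empty string, so once it has consumed the whole input the + repetition keeps matching zero characters without ever advancing, which is what loops forever. Requiring a non-empty match (and anchoring with EOF) avoids the loop; a sketch in the same PEG dialect:

rule <- subexpression+ EOF;
subexpression <- r'.+';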

Report all possible matches at the point of failure

Currently, Arpeggio will report the first match it tried at the point of failure.
It should be fairly easy to implement reporting of all of the matches that could be found at that point for the parser to continue.

Transfer of Arpeggio to textX organization

The Arpeggio project is moving to its new home under the textX organization by the end of next week.
Please see here for the discussion.

I'll post the date of the transfer here once I make a detailed plan, along with the necessary steps for you to take if you have local clones of this repo.

A few tests fail with pytest 5.0+

pytest.raises changed its behaviour a bit in the 5.x series, and the exceptions now need to be accessed slightly differently.

See the errors below:

[    5s] =================================== FAILURES ===================================
[    5s] _________________________ test_non_optional_precedence _________________________
[    5s] 
[    5s]     def test_non_optional_precedence():
[    5s]         """
[    5s]         Test that all tried match at position are reported.
[    5s]         """
[    5s]         def grammar():
[    5s]             return Optional('a'), 'b'
[    5s]     
[    5s]         parser = ParserPython(grammar)
[    5s]     
[    5s]         with pytest.raises(NoMatch) as e:
[    5s]             parser.parse('c')
[    5s] >       assert "Expected 'a' or 'b'" in str(e)
[    5s] E       assert "Expected 'a' or 'b'" in '<ExceptionInfo NoMatch tblen=12>'
[    5s] E        +  where '<ExceptionInfo NoMatch tblen=12>' = str(<ExceptionInfo NoMatch tblen=12>)
[    5s] 
[    5s] tests/unit/test_error_reporting.py:27: AssertionError
[    5s] _______________________ test_optional_with_better_match ________________________
[    5s] 
[    5s]     def test_optional_with_better_match():
[    5s]         """
[    5s]         Test that optional match that has gone further in the input stream
[    5s]         has precedence over non-optional.
[    5s]         """
[    5s]     
[    5s]         def grammar():  return [first, Optional(second)]
[    5s]         def first():    return 'one', 'two', 'three', '4'
[    5s]         def second():   return 'one', 'two', 'three', 'four', 'five'
[    5s]     
[    5s]         parser = ParserPython(grammar)
[    5s]     
[    5s]         with pytest.raises(NoMatch) as e:
[    5s]             parser.parse('one two three four 5')
[    5s]     
[    5s] >       assert "Expected 'five'" in str(e)
[    5s] E       assert "Expected 'five'" in '<ExceptionInfo NoMatch tblen=12>'
[    5s] E        +  where '<ExceptionInfo NoMatch tblen=12>' = str(<ExceptionInfo NoMatch tblen=12>)
[    5s] 
[    5s] tests/unit/test_error_reporting.py:57: AssertionError
[    5s] ____________________________ test_alternative_added ____________________________
[    5s] 
[    5s]     def test_alternative_added():
[    5s]         """
[    5s]         Test that matches from alternative branches at the same positiona are
[    5s]         reported.
[    5s]         """
[    5s]     
[    5s]         def grammar():
[    5s]             return ['one', 'two'], _(r'\w+')
[    5s]     
[    5s]         parser = ParserPython(grammar)
[    5s]     
[    5s]         with pytest.raises(NoMatch) as e:
[    5s]             parser.parse('   three ident')
[    5s] >       assert "Expected 'one' or 'two'" in str(e)
[    5s] E       assert "Expected 'one' or 'two'" in '<ExceptionInfo NoMatch tblen=16>'
[    5s] E        +  where '<ExceptionInfo NoMatch tblen=16>' = str(<ExceptionInfo NoMatch tblen=16>)
[    5s] 
[    5s] tests/unit/test_error_reporting.py:74: AssertionError
[    5s] ___________________________ test_file_name_reporting ___________________________
[    5s] 
[    5s]     def test_file_name_reporting():
[    5s]         """
[    5s]         Test that if parser has file name set it will be reported.
[    5s]         """
[    5s]     
[    5s]         def grammar():      return Optional('a'), 'b', EOF
[    5s]     
[    5s]         parser = ParserPython(grammar)
[    5s]     
[    5s]         with pytest.raises(NoMatch) as e:
[    5s]             parser.parse("\n\n   a c", file_name="test_file.peg")
[    5s] >       assert "Expected 'b' at position test_file.peg:(3, 6)" in str(e)
[    5s] E       assert "Expected 'b' at position test_file.peg:(3, 6)" in '<ExceptionInfo NoMatch tblen=8>'
[    5s] E        +  where '<ExceptionInfo NoMatch tblen=8>' = str(<ExceptionInfo NoMatch tblen=8>)
[    5s] 
[    5s] tests/unit/test_error_reporting.py:89: AssertionError
[    5s] ______________________ test_comment_matching_not_reported ______________________
[    5s] 
[    5s]     def test_comment_matching_not_reported():
[    5s]         """
[    5s]         Test that matching of comments is not reported.
[    5s]         """
[    5s]     
[    5s]         def grammar():      return Optional('a'), 'b', EOF
[    5s]         def comments():     return _('\/\/.*$')
[    5s]     
[    5s]         parser = ParserPython(grammar, comments)
[    5s]     
[    5s]         with pytest.raises(NoMatch) as e:
[    5s]             parser.parse('\n\n a // This is a comment \n c')
[    5s] >       assert "Expected 'b' at position (4, 2)" in str(e)
[    5s] E       assert "Expected 'b' at position (4, 2)" in '<ExceptionInfo NoMatch tblen=8>'
[    5s] E        +  where '<ExceptionInfo NoMatch tblen=8>' = str(<ExceptionInfo NoMatch tblen=8>)
[    5s] 
[    5s] tests/unit/test_error_reporting.py:105: AssertionError
[    5s] _________________________ test_not_match_at_beginning __________________________
[    5s] 
[    5s]     def test_not_match_at_beginning():
[    5s]         """
[    5s]         Test that matching of Not ParsingExpression is not reported in the
[    5s]         error message.
[    5s]         """
[    5s]     
[    5s]         def grammar():
[    5s]             return Not('one'), _(r'\w+')
[    5s]     
[    5s]         parser = ParserPython(grammar)
[    5s]     
[    5s]         with pytest.raises(NoMatch) as e:
[    5s]             parser.parse('   one ident')
[    5s] >       assert "Not expected input" in str(e)
[    5s] E       AssertionError: assert 'Not expected input' in '<ExceptionInfo NoMatch tblen=8>'
[    5s] E        +  where '<ExceptionInfo NoMatch tblen=8>' = str(<ExceptionInfo NoMatch tblen=8>)
[    5s] 
[    5s] tests/unit/test_error_reporting.py:122: AssertionError
[    5s] ________________________ test_not_match_as_alternative _________________________
[    5s] 
[    5s]     def test_not_match_as_alternative():
[    5s]         """
[    5s]         Test that Not is not reported if a part of OrderedChoice.
[    5s]         """
[    5s]     
[    5s]         def grammar():
[    5s]             return ['one', Not('two')], _(r'\w+')
[    5s]     
[    5s]         parser = ParserPython(grammar)
[    5s]     
[    5s]         with pytest.raises(NoMatch) as e:
[    5s]             parser.parse('   three ident')
[    5s] >       assert "Expected 'one' at " in str(e)
[    5s] E       assert "Expected 'one' at " in '<ExceptionInfo NoMatch tblen=16>'
[    5s] E        +  where '<ExceptionInfo NoMatch tblen=16>' = str(<ExceptionInfo NoMatch tblen=16>)
[    5s] 
[    5s] tests/unit/test_error_reporting.py:137: AssertionError
[    5s] ____________________________ test_sequence_of_nots _____________________________
[    5s] 
[    5s]     def test_sequence_of_nots():
[    5s]         """
[    5s]         Test that sequence of Not rules is handled properly.
[    5s]         """
[    5s]     
[    5s]         def grammar():
[    5s]             return Not('one'), Not('two'), _(r'\w+')
[    5s]     
[    5s]         parser = ParserPython(grammar)
[    5s]     
[    5s]         with pytest.raises(NoMatch) as e:
[    5s]             parser.parse('   two ident')
[    5s] >       assert "Not expected input" in str(e)
[    5s] E       AssertionError: assert 'Not expected input' in '<ExceptionInfo NoMatch tblen=12>'
[    5s] E        +  where '<ExceptionInfo NoMatch tblen=12>' = str(<ExceptionInfo NoMatch tblen=12>)
[    5s] 
[    5s] tests/unit/test_error_reporting.py:152: AssertionError
[    5s] ___________________________ test_compound_not_match ____________________________
[    5s] 
[    5s]     def test_compound_not_match():
[    5s]         """
[    5s]         Test a more complex Not match error reporting.
[    5s]         """
[    5s]         def grammar():
[    5s]             return [Not(['two', 'three']), 'one', 'two'], _(r'\w+')
[    5s]     
[    5s]         parser = ParserPython(grammar)
[    5s]     
[    5s]         with pytest.raises(NoMatch) as e:
[    5s]             parser.parse('   three ident')
[    5s] >       assert "Expected 'one' or 'two' at" in str(e)
[    5s] E       assert "Expected 'one' or 'two' at" in '<ExceptionInfo NoMatch tblen=24>'
[    5s] E        +  where '<ExceptionInfo NoMatch tblen=24>' = str(<ExceptionInfo NoMatch tblen=24>)
[    5s] 
[    5s] tests/unit/test_error_reporting.py:166: AssertionError
[    5s] =============================== warnings summary ===============================
[    5s] tests/unit/test_error_reporting.py:99
[    5s]   /home/abuild/rpmbuild/BUILD/Arpeggio-1.9.0/tests/unit/test_error_reporting.py:99: DeprecationWarning: invalid escape sequence \/
[    5s]     def comments():     return _('\/\/.*$')
[    5s] 
[    5s] tests/unit/test_peg_parser.py:20
[    5s]   /home/abuild/rpmbuild/BUILD/Arpeggio-1.9.0/tests/unit/test_peg_parser.py:20: DeprecationWarning: invalid escape sequence \d
[    5s]     '''
[    5s] 
[    5s] tests/unit/regressions/issue_16/test_issue_16.py:65
[    5s]   /home/abuild/rpmbuild/BUILD/Arpeggio-1.9.0/tests/unit/regressions/issue_16/test_issue_16.py:65: DeprecationWarning: invalid escape sequence \*
[    5s]     """
[    5s] 
[    5s] tests/unit/test_examples.py::test_examples
[    5s]   /home/abuild/rpmbuild/BUILD/Arpeggio-1.9.0/tests/unit/../../examples/simple/simple.py:19: DeprecationWarning: invalid escape sequence \*
[    5s]     def comment():          return [_("//.*"), _("/\*.*\*/")]
[    5s] 
[    5s] tests/unit/test_examples.py::test_examples
[    5s]   /home/abuild/rpmbuild/BUILD/Arpeggio-1.9.0/tests/unit/../../examples/json/json.py:22: DeprecationWarning: invalid escape sequence \d
[    5s]     def jsonNumber():       return _('-?\d+((\.\d*)?((e|E)(\+|-)?\d+)?)?')
[    5s] 
[    5s] -- Docs: https://docs.pytest.org/en/latest/warnings.html
[    5s] =============== 9 failed, 81 passed, 5 warnings in 0.71 seconds ================
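The usual fix is to assert against str(e.value) rather than str(e), since pytest 5 stopped embedding the exception text in the ExceptionInfo repr; a sketch against the first failing test:

import pytest
from arpeggio import NoMatch, Optional, ParserPython

def grammar():
    return Optional('a'), 'b'

parser = ParserPython(grammar)
with pytest.raises(NoMatch) as e:
    parser.parse('c')
# pytest 5.x: str(e) is the ExceptionInfo repr; the exception itself
# lives in e.value.
assert "Expected 'a' or 'b'" in str(e.value)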

Support for correct PEG syntax

Hello,

I have been using Arpeggio as a tool to test a grammar (re)written in PEG recently. It's a really good tool (the debugging output is a bit hard to understand sometimes), but I found out that it doesn't really use the syntax defined in the original PEG paper by Bryan Ford.

The main differences are the use of # instead of // for comments and the lack of semicolon for rule endings. I've seen that there is also a clean PEG alternative which doesn't follow the actual syntax (uses = instead of <-, as far as I know).

I understand that using the semicolon is much simpler than not using it when parsing PEG itself, but is there any reason behind the decision? Are there any plans to support the correct PEG syntax? It doesn't deviate much from the norm, so I think doing it might be a good idea.

Thanks in advance.

Can't get unordered sequence to work

For me the following sample does not work:

from textx.metamodel import metamodel_from_str

testmm = metamodel_from_str("""               
Colors:
   ("red" "green" "blue")#
;
""")

The following error is raised:

TextXSyntaxError: Expected '*' or '?' or '+' or '-' or '|' or attribute or '!' or '&' or ''' or '"' or '/' or rule_ref or '(' or ';' at position (3, 25) => 'n" "blue")*# ; '.

Since I copied the example from the help's explanation of unordered sequences, it should run, I think. Where is my mistake?

Thanks in advance,

Horst

Do not suppress rules with one child sequences

For grammars that have rules with a one-child sequence (i.e. rules that just delegate to other rules), an optimisation measure will suppress those nodes from the parser model.
This causes problems in semantic analysis, as the visitors will not get called.

from arpeggio import ZeroOrMore, EOF
from arpeggio import RegExMatch as _

def qfile():             return ZeroOrMore(entry), EOF
def entry():            return header, data
def header():           return "Min. 1st Qu.  Median    Mean 3rd Qu.    Max."
def data():             return _min, q1, med, mean, q3, _max
def number():           return _(r'\d*\.\d*|\d+')
def _min():             return number
def q1():               return number
def med():              return number
def mean():             return number
def q3():               return number
def _max():             return number

Here the _min, q1, med... rules delegate to the number rule.
If we now write the visitor method:

    def visit_number(self, node, children):
        return float(node.value)

it will not get called, as the number node from the parser model is suppressed.

This optimisation measure should be controlled with a parser param.

URI Parse: IndexError: list index out of range

I'm trying to implement (pastebin LhSuFXhL) the URI grammar (RFC 3986), but I get an IndexError exception when I code the host rule as indicated in the RFC.

As far as I can tell it could be solved if I introduce the following changes in the grammar, but I would like to stay as close to the RFC as possible.

def host():
    return Optional([ ip_literal, ipv4address, reg_name ])

def reg_name():
    return OneOrMore([ unreserved, pct_encoded, sub_delims])

ZeroOrMore gobbles syntax errors

Hi!

First off, compliments on the wonderful tool. It makes life so much easier!

But here's the but. Take this grammar...

from arpeggio import ParserPython, Optional, ZeroOrMore
from arpeggio import RegExMatch as _

def obj():         return "{", Optional(pair_list), "}"
def number():      return _("\d+")
def value():       return [ obj, number ]
def key():         return _("[a-zA-Z]+")
def pair():        return key, ":", value
def pair_list():   return pair, ZeroOrMore(",", pair)
def head():        return "[", key, "]"
def sect():        return head, ZeroOrMore(pair)

...and apply it to a string formatted in this way (note the comma syntax error after d:1,):

[SECTION]
a: 1
b: {c: 1,d:1,}
>>> ParserPython(sect).parse("[SECTION]\na:1\nb:{c:2,d:3,}")
[ [  '[' [0], key 'SECTION' [1],  ']' [2] ], [ key 'a' [4],  ':' [5], [ number '1' [6] ] ] ]
>>> _

In this parse tree, the entire contents of b got thrown out because of ZeroOrMore, sensibly I must add, since its docstring explicitly states that it will never fail. The problem is that whatever goes on in there is invisible and silent.

Verily, the parent process (visit_parse_tree for one) ends up receiving a perfectly well-formed parse tree, and it has no way of knowing that there was some kind of mistake. pair could resolve in an arbitrarily complex manner, and the input could be of arbitrary size (which is important because the developer has no hints about the precise point of failure).

So, would you consider an option to have ZeroOrMore print NoMatch exceptions and failure locations (if debugging mode is enabled, for example)?

On the user level, it would be a great help to have a validating pass similar to parseAll in pyparsing. If set to True, the parser must match the entire string or fail (and print the deepest point from which it could not proceed further).

Maybe there is an analogous feature in Arpeggio already? If there isn't, I'd appreciate it if you considered adding one.

Thanks!
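On the parseAll point: the idiomatic Arpeggio equivalent seems to be anchoring the top rule with EOF, so leftover input raises NoMatch instead of being silently dropped. A sketch against a trimmed version of the grammar above:

from arpeggio import ParserPython, ZeroOrMore, EOF, NoMatch
from arpeggio import RegExMatch as _

def key():  return _(r"[a-zA-Z]+")
def pair(): return key, ":", _(r"\d+")
def head(): return "[", key, "]"
def sect(): return head, ZeroOrMore(pair), EOF  # EOF makes partial parses fail

try:
    ParserPython(sect).parse("[SECTION]\na:1\nb:{c:2,d:3,}")
except NoMatch as e:
    print(e)  # reports the failure position instead of silently succeeding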

Regex bug

r'[0-9a-zA-Z\s"-']+'

Please note the single quote inside this expression. I tried escaping it with a backslash; same error.

Is this a bug, or am I doing something wrong here?

Thanks,
Yash
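If that regex lives in Python source, the inner single quote terminates the r'...' literal early, which would explain the error regardless of backslashes (raw strings cannot escape their own quote character). Switching the outer quoting sidesteps it; a sketch, assuming the rule is defined in a Python grammar:

from arpeggio import RegExMatch as _

# Triple-quoted raw string: the inner ' and " no longer end the literal.
def text(): return _(r"""[0-9a-zA-Z\s"-']+""")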
