
textx's Introduction


textX is a meta-language for building Domain-Specific Languages (DSLs) in Python. It is inspired by Xtext.

In a nutshell, textX helps you build your textual language in an easy way. You can invent your own language or build support for an already existing textual language or file format.

From a single language description (grammar), textX will build a parser and a meta-model (a.k.a. abstract syntax) for the language. See the docs for the details.

textX follows the syntax and semantics of Xtext but differs in some places and is implemented 100% in Python using the Arpeggio PEG parser: no grammar ambiguities, unlimited lookahead, and an interpreter style of work.

Quick intro

Here is a complete example that shows the definition of a simple DSL for drawing. We also show how to define a custom class, interpret models and search for instances of a particular type.

from textx import metamodel_from_str, get_children_of_type

grammar = """
Model: commands*=DrawCommand;
DrawCommand: MoveCommand | ShapeCommand;
ShapeCommand: LineTo | Circle;
MoveCommand: MoveTo | MoveBy;
MoveTo: 'move' 'to' position=Point;
MoveBy: 'move' 'by' vector=Point;
Circle: 'circle' radius=INT;
LineTo: 'line' 'to' point=Point;
Point: x=INT ',' y=INT;
"""

# We will provide our class for Point.
# Classes for other rules will be dynamically generated.
class Point:
    def __init__(self, parent, x, y):
        self.parent = parent
        self.x = x
        self.y = y

    def __str__(self):
        return "{},{}".format(self.x, self.y)

    def __add__(self, other):
        return Point(self.parent, self.x + other.x, self.y + other.y)

# Create meta-model from the grammar. Provide `Point` class to be used for
# the rule `Point` from the grammar.
mm = metamodel_from_str(grammar, classes=[Point])

model_str = """
    move to 5, 10
    line to 10, 10
    line to 20, 20
    move by 5, -7
    circle 10
    line to 10, 10
"""

# Meta-model knows how to parse and instantiate models.
model = mm.model_from_str(model_str)

# At this point model is a plain Python object graph with instances of
# dynamically created classes and attributes following the grammar.

def cname(o):
    return o.__class__.__name__

# Let's interpret the model
position = Point(None, 0, 0)
for command in model.commands:
    if cname(command) == 'MoveTo':
        print('Moving to position', command.position)
        position = command.position
    elif cname(command) == 'MoveBy':
        position = position + command.vector
        print('Moving by', command.vector, 'to a new position', position)
    elif cname(command) == 'Circle':
        print('Drawing circle at', position, 'with radius', command.radius)
    else:
        print('Drawing line from', position, 'to', command.point)
        position = command.point
print('End position is', position)

# Output:
# Moving to position 5,10
# Drawing line from 5,10 to 10,10
# Drawing line from 10,10 to 20,20
# Moving by 5,-7 to a new position 25,13
# Drawing circle at 25,13 with radius 10
# Drawing line from 25,13 to 10,10
# End position is 10,10

# Collect all points starting from the root of the model
points = get_children_of_type("Point", model)
for point in points:
    print('Point: {}'.format(point))

# Output:
# Point: 5,10
# Point: 10,10
# Point: 20,20
# Point: 5,-7
# Point: 10,10

Video tutorials

Introduction to textX

Implementing Martin Fowler's State Machine DSL in textX

Docs and tutorials

The full documentation with tutorials is available at http://textx.github.io/textX/stable/

You can also try textX in our playground. There is a dropdown with several examples to get you started.

Support in IDE/editors

Projects that are currently in progress are:

  • textX-LS - support for the Language Server Protocol and VS Code for any textX-based language. This project is about to supersede several earlier projects:
  • viewX - creating visualizers for textX languages

If you are a vim user, check out the support for vim.

For emacs there is textx-mode, which is also available in MELPA.

You can also check out the textX-ninja project. It is currently unmaintained.

Discussion and help

For general questions, suggestions, and feature requests please use GitHub Discussions.

For issues please use GitHub issue tracker.

Citing textX

If you are using textX in your research project, we would be very grateful if you cited our paper:

Dejanović I., Vaderna R., Milosavljević G., Vuković Ž. (2017). TextX: A Python tool for Domain-Specific Languages implementation. Knowledge-Based Systems, 115, 1-4.

License

MIT

Python versions

Tested on Python 3.8+

textx's People

Contributors

alensuljkanovic, aluriak, bmwiedemann, borismarin, cedarctic, cr0hn, danixeee, danyeaw, davidchall, dkrikun, dlggr, felixonmars, furkanakkurt1335, goto40, hlouzada, igordejanovic, markusschmaus, mathias-luedtke, mgorny, nevenaal, renatav, sebix, simkimsia, stanislaw


textx's Issues

Ability to split a model into several files and use import

Is there any plan to add built-in support for an import mechanism in models, with grammar support for it?

Right now I see only one way: define a simple grammar just for the import statement, use it to parse the model (the compile context), then use the abstract syntax tree to open the imported files, recursing until all includes resolve. Finally, concatenate all the includes into a single mega-model and parse that with the real grammar, which ignores the import statement.

Is there a better way to do it right now?
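Absent built-in support, here is a minimal sketch of the concatenation workaround described above. The import "file.dsl" statement syntax, file layout, and function names are hypothetical, not part of textX.

import re
from pathlib import Path

# Hypothetical: collect `import "other.dsl"` lines recursively and splice the
# imported content into one string before handing it to the real grammar.
IMPORT_RE = re.compile(r'^\s*import\s+"(?P<path>[^"]+)"\s*$', re.MULTILINE)

def flatten_imports(path, seen=None):
    seen = set() if seen is None else seen
    path = Path(path).resolve()
    if path in seen:          # guard against circular imports
        return ""
    seen.add(path)
    text = path.read_text()

    def splice(match):
        return flatten_imports(path.parent / match.group("path"), seen)

    # Replace each import statement with the (recursively flattened) file body.
    return IMPORT_RE.sub(splice, text)

# model = mm.model_from_str(flatten_imports("main.dsl"))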

Incorrect edge case in treating escaped characters in quoted strings

I believe that the following is parsed incorrectly:

Python 3.6.4 (default, Mar  9 2018, 23:15:03)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(ins)>>> from textx.metamodel import metamodel_from_str
(ins)>>> mm = metamodel_from_str('Root: STRING;')
(ins)>>> mm.model_from_str(r"'\\' a'")
"\\' a"

In my understanding, \\ should be interpreted as a single backslash, and the following ' should terminate the string, which would leave the hanging a' unparseable. At least that's what Python does.

[dot rendering] repetitions on attribute, pipe split

When defining a value as a repetition like tag+=WORD['|'], the resulting dot file has an unescaped pipe between the values, leading to wrong rendering.

P.S. I will try to fix this; I'm creating the issue here because I could not do so on my fork/GitHub.

Rules that only differ by their separator modifier do not parse

I have a simple model:

seq1: these & are & good & friends
seq2: those | exclusive | ones

I try to parse it with this meta model:

Document: sequences*=Sequence;
Sequence: SequenceAnd | SequencePipe;
SequenceAnd: ID ':' values+=ID[eolterm '&'];
SequencePipe: ID ':' values+=ID[eolterm '|'];

The meta model checks OK, but when I try to use it on the model above I get:

Expected '&' or ID or ID or EOF at position .\model.txt:(2, 13) => 'q2: those *| exclusiv'.

To reproduce, take the attached model.txt and meta.tx.txt and run:

textx check meta.tx.txt model.txt

As far as I can understand from reading the grammar this should work, but it might be a mistake on my side.

Crash when 'name' is not a hashable type.

When a rule's name is not a hashable type (e.g. a list), as in this rule:

Root:
	name+=ID[','] ;

process_node crashes ungracefully:

  File "/usr/bin/textx", line 11, in <module>
    sys.exit(textx())
  File "/usr/lib/python3.6/site-packages/textx/commands/console.py", line 51, in textx
    model = metamodel.model_from_file(args.model, debug=args.d)
  File "/usr/lib/python3.6/site-packages/textx/metamodel.py", line 465, in model_from_file
    encoding, debug=debug)
  File "/usr/lib/python3.6/site-packages/textx/model.py", line 168, in get_model_from_file
    debug=debug)
  File "/usr/lib/python3.6/site-packages/textx/model.py", line 194, in get_model_from_str
    model = parse_tree_to_objgraph(self, self.parse_tree[0])
  File "/usr/lib/python3.6/site-packages/textx/model.py", line 520, in parse_tree_to_objgraph
    model = process_node(parse_tree)
  File "/usr/lib/python3.6/site-packages/textx/model.py", line 333, in process_node
    parser._instances[id(inst.__class__)][inst.name] = inst
TypeError: unhashable type: 'list'

Make sequence a higher precedence than ordered choice

To align with EBNF and PEG grammars, sequence should have higher precedence than ordered choice.
Currently this is not the case, which leads to using more parentheses than necessary and is unintuitive to seasoned EBNF grammar writers. I consider this to be a bug in the language.

__version__ is not available

Python packages conventionally expose a __version__ variable in the top-level module.

This helps to determine the version of the module, as there is no other way to do so in an environment without pip.

Since version 1.6.1, textX has no such variable.
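As a workaround, the installed distribution version can be read through the standard library on Python 3.8+; a small sketch, not specific to textX internals:

from importlib.metadata import version, PackageNotFoundError

try:
    # Query the installed distribution metadata instead of a __version__ attribute.
    print(version("textX"))
except PackageNotFoundError:
    print("textX is not installed")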

textx.model.ObjCrossRef not resolved to proper object

I was trying textX and there seems to be a problem with link rule references. The resulting model contains textx.model.ObjCrossRef instances instead of the specified objects.

Here is an example:

from textx.metamodel import metamodel_from_file, metamodel_from_str

grammar_str = """
Expression:
    atts+=Attribute[','] 'formula' form=Formula
 ;
Formula:
    (values=FormulaExp (values=AndOr values=FormulaExp)*)
;


FormulaExp:
    ( values=BOOL )
    | ( values=Cond )
    | ( values='(' values=Formula values=')' )
;

Cond:
    negative?='not' (attribute = [Attribute|attr_id])
    ((operator = BinOp values=STRING) | (operator = 'in' '(' values+=STRING[','] ')') | (operator = 'between' values=STRING 'and' values=STRING))
;

attr_id:
    /attr_[a-f0-9]+/
;

Attribute:
    name = attr_id
;

BinOp:
    '<' | '>'   
;

AndOr:
    'and'|'or'
;
"""

mm = metamodel_from_str(grammar_str, ignore_case=True, auto_init_attributes=False)
m = mm.model_from_str("attr_123, attr_444 formula attr_123 < 'aa'")
print(m.atts)
print(m.form.values[0].values[0].attribute)
print(m.form.values[0].values[0].attribute.cls, m.form.values[0].values[0].attribute.obj_name)

The second print should be an instance of class Attribute with name attr_123. Instead, it is an ObjCrossRef.

Parser serialization/code generation

Currently, each time a program/text needs to be analyzed, the grammar is parsed first to extract the meta-model and the language parser.
Is there a way of serializing the generated parser so that the grammar parsing can be done once and for all?

From prior discussion, @igordejanovic suggested two solutions:

  1. pickle the parser graph,
  2. generate code for the parser graph; since the parser is a graph of objects, this could be done (the option preferred by @igordejanovic).

Indeed, the second solution seems to be the best one, and would add behavior close to ANTLR's without disturbing the existing implementation.

From what I understand, a simple visit of each parser node should be enough to recreate the same parser graph; am I missing something?
In such a scenario, the meta-model would also have to be generated. Could this also be done simply, or is there some catch in the meta-model generation?

Thanks!

ImportError: No module named metamodel

Installing the latest version with pip does not seem to work anymore. After a fresh installation the following import fails:

from textx.metamodel import metamodel_from_file, metamodel_from_str

with the error:
ImportError: No module named metamodel

At first I thought it could be a configuration problem, but this has occurred on two separate machines.

Or maybe there has been a change in how the imports should be used that I'm not aware of?

I should also add that I can get around this by downloading the zip and placing the library directly in the project that uses it.
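For reference, the quick intro at the top of this page imports from the top-level package, which may be the intended form in recent releases:

# Top-level import, as used in the quick intro above.
from textx import metamodel_from_file, metamodel_from_str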

User defined non-trivial builtin types

I am attempting to understand whether it is possible to feed the results of a previous parser run back into textX in order to split up the definition of user-defined reference types.

For the "SimpleType" example provided, defining a class to add to the system is trivial, but when the grammar for a type is very complicated it becomes necessary to define a large portion of the grammar in a user-defined class structure.

What I would like to do is run a preprocessing step on a set of strings, extract the automatically generated classes from the system, and then feed those back into the larger grammar definition as builtins.

The alternative, if this is not possible, would be to simply prepend the types to any submitted language file, but that seems like a naive approach.

What follows is a rather lengthy exploration of this concept.

from textx.metamodel import metamodel_from_file
from textx.metamodel import metamodel_from_str
from textx.export import metamodel_export
from textx.export import model_export

type_grammar = '''
Range:
    /\[/ left=expr ':'  right=expr /\]/;
Type:
    'fundamental'   range*=Range;
Typedef:
    'typedef' name=ID aliasof=Type ';';
expr: prec6;
prec6:   op=prec5  (op=op6   op=prec5)*;
prec5:   op=prec4  (op=op5   op=prec4)*;
prec4:   op=prec3  (op=op4   op=prec3)*;
prec3:   op=prec2  (op=op3   op=prec2)*;
prec2:   op=prec1  (op=op2   op=prec1)*;
prec1:   op=prec0  (op=op1   op=prec0)*;
prec0:   op=term   (op=op0   op=term)*;
unary:              op=opU   op=term;
term:   ('(' op=prec6 ')') | op=number | op=unary;
op6:  '|'                       ;
op5:  '^'                       ;
op4:  '&'                       ;
op3:  '<<' | '>>'               ;
op2:  '+'  | '-'                ;
op1:  '*'  | '\/' | '%' | '\/\/';
op0:  '**'                      ;
opU:  '-'  | '+' | '~'          ;
number: /\d+/;
'''

entity_grammar= '''
EntityModel:
    typedefs*=Typedef
    entities+=Entity
;
Entity:
    'entity' name=ID
    '{'
        fields += EntityField
    '}'
;
EntityField:
    Property | Reference;
Property:
    name=ID ':' type=Type ';';
Reference:
    name=ID ':' type=[RefType] ';';
RefType:
    Entity |  Typedef;
Comment:
    /\/\/.*$/
;
'''

########################################################################################################################
# What if I split up the grammar into the type system and the structural system.
# Could I run the grammar through the parser once, generate the classes, and then
# send those classes back to the parser a second time with the builtins?
########################################################################################################################
builtin_type_str = '''
typedef bar   fundamental[7:0];
'''

test_str='''
entity testExpr {
  name: fundamental[3:0];
  foo : bar;
}
'''

seperate_type_grammar = 'BuiltinTypes: types+=Typedef;' + type_grammar

class TypedefCollector:
    def __init__(self):
        self.objects = []
        pass
    def typedef_obj_processor(self,x):
        self.objects.append(x)
        print("typedef_obj_processor:")
        print(x)

class ModelCollector:
    def __init__(self):
        self.metamodel = []
        self.model = []
    def model_processor(self,metamodel, model):
        self.metamodel = metamodel
        self.model = model
        print("model_processor")
        print(metamodel)
        print(model)

collector = TypedefCollector()
mcollector = ModelCollector()

obj_processors = {
    'Typedef': collector.typedef_obj_processor,
    }

pre_meta = metamodel_from_str(seperate_type_grammar)
metamodel_export(pre_meta, 'seperate_type_model.dot')

pre_meta.register_obj_processors(obj_processors)
pre_meta.register_model_processor(mcollector.model_processor)

type_model = pre_meta.model_from_str(builtin_type_str,debug=False)
model_export(type_model, 'seperate_builtin_types.dot')

# Now we have a type model seperate from the full parser grammar. How do we send that back?
model_ns = mcollector.model.namespaces[None]
model_ns.pop('BuiltinTypes')

classes = list(model_ns.values())


full_grammar = entity_grammar + type_grammar

meta = metamodel_from_str(full_grammar,
                           classes=classes,
                           builtins=collector.objects)

metamodel_export(meta, 'seperate_test_entity_model.dot')


example_model = meta.model_from_str(test_str)
model_export(example_model, 'seperate_test_example.dot')

exit(0)

########################################################################################################################
# One option would be to simply append the typedefs to an incoming string to be parsed...
########################################################################################################################
builtin_type_str = '''
typedef bar   fundamental[7:0];
'''

test_str='''
entity testExpr {
  name: fundamental[3:0];
  foo : bar;
}
'''

new_test_str = builtin_type_str + test_str

full_grammar = entity_grammar + type_grammar
meta = metamodel_from_str(full_grammar)
metamodel_export(meta, 'seperate_type_model.dot')
example_model = meta.model_from_str(new_test_str)
model_export(example_model, 'seperate_type_test.dot')


exit(0)

########################################################################################################################
# Without builtins everything is fine
########################################################################################################################

test_str='''
entity testExpr {
  name:fundamental[3:0];
}
'''

full_grammar = entity_grammar + type_grammar
meta = metamodel_from_str(full_grammar)
metamodel_export(meta, 'seperate_type_model.dot')
example_model = meta.model_from_str(test_str)
model_export(example_model, 'seperate_type_test.dot')

exit(0)

########################################################################################################################
# This no longer works because range is now very complicated.
# Wouldn't I have to duplicate the entire type grammar to express it?
########################################################################################################################

full_grammar = entity_grammar + type_grammar
meta = metamodel_from_str(full_grammar)

class Typedef(object):
    def __init__(self, parent, name,aliasof):
        self.parent = parent
        self.name = name
        self.aliasof = aliasof
    def __str__(self):
        return self.name

type_builtins =  {                 #This string here isn't what we want, we would need a "Type object"
        'integer': Typedef(None, 'bar','fundamental[7:0]')
}


entity_mm = metamodel_from_file(full_grammar,
                                classes=[Typedef],
                                builtins=type_builtins,
                                debug=False)

exit(0)

Recursion difference with Arpeggio and textX

The following grammar definition in Arpeggio works properly and is able to parse a Python arithmetic string.

def expression(): return op_prec6
def op_prec6():   return op_prec5,  ZeroOrMore(['|'],              op_prec5)
def op_prec5():   return op_prec4,  ZeroOrMore(['^'],              op_prec4)
def op_prec4():   return op_prec3,  ZeroOrMore(['&'],              op_prec3)
def op_prec3():   return op_prec2,  ZeroOrMore(['<<','>>'],        op_prec2)
def op_prec2():   return op_prec1,  ZeroOrMore(['+','-'],          op_prec1)
def op_prec1():   return op_prec0,  ZeroOrMore(['*','/','%','//'], op_prec0)
def op_prec0():   return op_prime,  ZeroOrMore(['**'],             op_prime)
def op_unary():   return                       ['-','+','~'],      op_prime
def op_prime():   return [("(", expression, ")"),number,op_unary]
def calc():       return OneOrMore(expression), EOF
def number():     return _(r'\d+')

but when that same grammar is expressed in the textX syntax it fails with an infinite recursion.

arith_grammar = '''
/*  2 */ expression: op_prec6;
/*  3 */ op_prec6:   op_prec5 | ('|'                         | op_prec5)*;
/*  4 */ op_prec5:   op_prec4 | ('^'                         | op_prec4)*;
/*  5 */ op_prec4:   op_prec3 | ('&'                         | op_prec3)*;
/*  6 */ op_prec3:   op_prec2 | ('<<' | '>>'                 | op_prec2)*;
/*  7 */ op_prec2:   op_prec1 | ('+'  | '-'                  | op_prec1)*;
/*  8 */ op_prec1:   op_prec0 | ('*'  | '\/' | '%' | '\/\/'  | op_prec0)*;
/*  9 */ op_prec0:   op_prime | ('**'                        | op_prime)*;
/* 10 */ op_unary:              ('-'  | '+' | '~'            | op_prime);
/* 11 */ op_prime: ('(' expression ')') | INT | op_unary;
/* 12 */ calc: expressions+=expression;
'''

arith_meta = metamodel_from_str(arith_grammar)
metamodel_export(arith_meta, 'arith_model.dot')

The error message is as follows

Traceback (most recent call last):
  File "C:/home/workspace_cdt/fpga_projects/python/grammer/meta-hdl.py", line 134, in <module>
    arith_meta = metamodel_from_str(arith_grammar)
  File "C:\Anaconda2\envs\parse_env\lib\site-packages\textx\metamodel.py", line 474, in metamodel_from_str
    language_from_str(lang_desc, metamodel)
  File "C:\Anaconda2\envs\parse_env\lib\site-packages\textx\textx.py", line 857, in language_from_str
    lang_parser = parser.getASG()
  File "C:\Anaconda2\envs\parse_env\lib\site-packages\arpeggio\__init__.py", line 1433, in getASG
    sem_actions[sa_name].second_pass(self, asg_node)
  File "C:\Anaconda2\envs\parse_env\lib\site-packages\textx\textx.py", line 354, in second_pass
    self._resolve_rule_refs(grammar_parser, model_parser)
  File "C:\Anaconda2\envs\parse_env\lib\site-packages\textx\textx.py", line 284, in _resolve_rule_refs
    cls._tx_peg_rule = resolve(cls._tx_peg_rule, cls.__name__)
  File "C:\Anaconda2\envs\parse_env\lib\site-packages\textx\textx.py", line 278, in resolve
    return _inner_resolve(node, cls_rule_name)
  File "C:\Anaconda2\envs\parse_env\lib\site-packages\textx\textx.py", line 247, in _inner_resolve

...Repeat enough times to blow up the stack...

  File "C:\Anaconda2\envs\parse_env\lib\site-packages\textx\textx.py", line 253, in _inner_resolve
    if initial_rule_name in model_parser.metamodel:
  File "C:\Anaconda2\envs\parse_env\lib\site-packages\textx\metamodel.py", line 411, in __contains__
    self[name]
  File "C:\Anaconda2\envs\parse_env\lib\site-packages\textx\metamodel.py", line 377, in __getitem__
    if name in self._current_namespace:
RecursionError: maximum recursion depth exceeded

Thank you for your work.
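For comparison, the textX translation above replaces Arpeggio's sequence-plus-ZeroOrMore with ordered choice (|), which changes the meaning. An untested sketch of a closer translation, shaped like the working expression grammar shown in the "User defined non-trivial builtin types" issue further down this page:

from textx import metamodel_from_str

# Untested sketch: keep the sequence + repetition-group shape of the
# Arpeggio grammar instead of turning everything into ordered choice.
arith_grammar = r'''
calc:    expressions+=expression;
expression: prec6;
prec6:   op=prec5  (op=op6   op=prec5)*;
prec5:   op=prec4  (op=op5   op=prec4)*;
prec4:   op=prec3  (op=op4   op=prec3)*;
prec3:   op=prec2  (op=op3   op=prec2)*;
prec2:   op=prec1  (op=op2   op=prec1)*;
prec1:   op=prec0  (op=op1   op=prec0)*;
prec0:   op=term   (op=op0   op=term)*;
unary:              op=opU   op=term;
term:    ('(' op=prec6 ')') | op=number | op=unary;
op6:  '|'                       ;
op5:  '^'                       ;
op4:  '&'                       ;
op3:  '<<' | '>>'               ;
op2:  '+'  | '-'                ;
op1:  '*'  | '\/' | '%' | '\/\/';
op0:  '**'                      ;
opU:  '-'  | '+' | '~'          ;
number: /\d+/;
'''

arith_meta = metamodel_from_str(arith_grammar)
# model = arith_meta.model_from_str('2 + 3 * (4 | 1)')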

Infinite recursion in visualization

When I try to visualize this model with textx visualize simple.tx:

List: members+=Value;
Value: ('{' List '}') | ID;

I get

  ...
  File "c:\users\venti\documents\eclipse\textx\textx\export.py", line 48, in r
    result = "|".join([r(x) for x in s.nodes])
  File "c:\users\venti\documents\eclipse\textx\textx\export.py", line 50, in r
    result = " ".join([r(x) for x in s.nodes])
  File "c:\users\venti\documents\eclipse\textx\textx\export.py", line 46, in r
    result = text(s)
RuntimeError: maximum recursion depth exceeded while calling a Python object

textx check simple.tx says the meta-model is ok.

Also, this form works:

List: '{' members+=Value '}';
Value: List | ID;

but this one doesn't:

ListSyntax: '{' List '}';
List: members+=Value;
Value: ListSyntax | ID;

I think all of these should work. I tried with the master branches of Arpeggio and textX, as well as Arpeggio 1.4 and textX 1.3.

Support for mixed abstract/match rules

Currently it is assumed that a rule can be either a normal rule (which results in meta-class creation), an abstract rule, e.g.

RuleName:
   First | Second
;

or a match rule, e.g.

Color:
   "red" | "blue" | "green";

Sometimes it is useful to have a rule that is a combination of an abstract and a match rule, e.g.

SomeRule:
    INT | FLOAT | MyObject | "null"

This is not well supported in the model visualization.

Suppress operator for matches in match rules

Match rules currently return the whole matched string. There are situations where this behavior is not preferable. Although we can always restructure the grammar to work around this, the result won't be elegant.

e.g.

SomeRule:
   color=Color
;
Color:
   'clr' 'black'
;

Color is a match rule, but in this case we don't want the clr keyword to be returned as part of the string. It could be written as:

SomeRule:
   'clr' color=Color
;
Color:
   'black'
;

This will work, but it is not nice because we are consuming the clr keyword related to color in SomeRule. If we have color definitions in other places in the grammar, we must consume the clr keyword there too.

It would be better to write the Color rule like this:

Color:
   'clr'- 'black'

The - would suppress that match from the result. This would be a nice addition alongside repetitions for simple matches (issue #20).

Comment makes correct language parsing fail

I have the following grammar, which is sensitive to the presence of comments.
Whenever a comment exists, the parser fails to return the correct AST.

Symptom 1

The grammar is the following:

file:
    lines*=cascading
;

Comment:
    /#.*$/
;

space[noskipws]:
    /[ \t]*/
;

cascading:
    group | line
;

line[noskipws]:
   /\s*/-
   (modifier=ID space &ID)? keyword=ID 
   /\s*/-
;

group:
    keyword=ID '{' entries*=cascading '}'
;

I would like to parse a file like this.

group
{
    ZERO
    ONE
    TWO
}

What I expect is that all the capitalized words here are captured as keyword, which works well.

But if I add a comment after ZERO, like this,

...
   ZERO # comment
...

the behavior changes completely, and ZERO becomes the modifier for ONE.
This is the first part of what I am reporting as a bug.

Symptom 2

If I change the grammar to remove the keyword from the group rule, like this:

group:
    '{' entries*=cascading '}'
;

and remove the corresponding group keywords from the file like this:

{
    ZERO # comment
    ONE
    TWO
}

then the bug disappears and everything works flawlessly, even with comments.

That means that somehow the group language element affects the parsing of the line rule, which is also weird.

Hint:

I checked the debug output of these variants on the commented input.
Whenever the fault happened, textX did not try to parse the comment; it just jumped over it instead.

Grammar problem

Hi Igor Dejanović, I first want to congratulate you on your work. textX is a really easy and useful DSL tool.
I am currently facing a problem in my grammar.

Method:
    'func('  (params+=Parameter[','])?  ')'
;

Parameter:
    (type=ID)?
    name=ID
;

And here is a simple test:

from os.path import join, dirname
from textx.metamodel import metamodel_from_file

def test():
    meta_model = metamodel_from_file(
        join(dirname(__file__), 'my_grammar.tx'))
    model = meta_model.model_from_str(
        """
        func( a b, c )
        """
    )

I get a syntax error: "Expected ID at position (2, 22) => 'c( a b, c *)'".
So textX has recognized 'c' as a type instead of a name and expects to find a name next. I tried many other kinds of patterns but I still get that error.
Any ideas?
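One possible direction, untested: textX supports syntactic predicates (an &ID lookahead appears in the line rule of the comment-related issue above), so the optional type could be committed only when another ID follows it:

Method:
    'func('  (params+=Parameter[','])?  ')'
;

Parameter:
    (type=ID &ID)? name=ID
;

With the &ID lookahead, the optional group only succeeds when a second ID follows, so in func( a b, c ) the lone c should be taken as the name.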

Support for "include Rules"

A form of rules that will generate attributes on the calling meta-class.

Rule1:
    a=INT Rule2;
Rule2:
    b=STRING;

Rule2 is referenced directly (not from an assignment), thus the b attribute will be created on the Rule1 instance. No Rule2 instance will be created.

Problem with resolution of links with complex names

Hi,

During development I have run into a strange problem. I am not sure whether this is a bug or intentional. If I try to reference an object whose name is more complex than a simple ID, I get an error.
I have created a minimal example to demonstrate the problem:

from textx.metamodel import metamodel_from_str
tx= """
EntityModel:
  Entity
  Reference
;
EntityName:
  ID '.' ID
;
Entity:
  'entity' name=EntityName 
;
Reference:
  'reference' ref=[Entity]
;
"""
text= """
entity Some.Person
reference Some.Person
"""
mm= metamodel_from_str(tx)
model= mm.model_from_str(text)

Running the above program gives the following error:
Expected EOF at position (3, 15) => 'rence Some*.Person

Somehow, when resolving the reference, the parser doesn't use the original name expression but rather expects an ID.
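Untested thought: the Cond rule in the ObjCrossRef issue above writes its reference as attribute = [Attribute|attr_id], i.e. with an explicit match rule for the referenced name. The same form might help here:

Reference:
  'reference' ref=[Entity|EntityName]
;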

[dot rendering] escape curly brackets in string of label

original label="{WORD|[^*.#\[\]\\{\\}\,]+}"]
aimed label="{WORD|[^*.#\[\]\\\{\\\}\,]+}"]

expected rendering on xdot [^*.#\[\]\{\}\,]+

Actually, we need label="{WORD|[^*.#\\[\\]\\\{\\\}\\,]+}"], i.e. extra escaping of the slashes in front of the [ and ]. There is a general issue with the escaping.

P.S. I will try to fix this; I'm creating the issue here because I could not do so on my fork/GitHub.

Canonical way to convert AST objects to dicts

For viewing the whole parsed AST in a Python console/notebook, or iterating over it, it's practical to convert the AST objects to dicts. At the moment, I'm using the following snippet, but it would be nice to have a canonical way in textX, perhaps as an object method.

def obj_to_dict(o):
    if isinstance(o, TextXClass):
        return {
            k: obj_to_dict(v) for k, v in vars(o).items()
            if k[0] != '_' and k != 'parent'
        }
    elif isinstance(o, list):
        return [obj_to_dict(x) for x in o]
    return o
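A possible usage of the snippet, assuming model is any parsed textX model (for example the drawing model from the quick intro) and TextXClass is importable in the textX version in use:

from pprint import pprint

# Dump the whole object graph as nested dicts/lists for inspection.
pprint(obj_to_dict(model))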

Refactoring ambiguous rule

I have been using the following grammar

M: defs+= Def; Def: name=ID '=' val+=ID[eolterm];

to parse data like

a = b
c = d e
f = g

I am now trying to rewrite it without having to rely on newlines (eolterm) to separate each Def. I have been struggling with lookahead predicates to remove ambiguities, to no avail. Any ideas on how to implement that?
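One untested direction: textX supports syntactic predicates (an & lookahead appears in the line rule of the comment issue above, and ! is the negative form), so a value could be rejected when it looks like the start of the next Def. A sketch, not verified:

M: defs+=Def;
Def: name=ID '=' vals+=Val;
Val: !DefStart ID;
DefStart: ID '=';

Here Val matches an ID only when it is not immediately followed by '=', so the left-hand side of the next Def terminates the repetition without relying on eolterm.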

@igordejanovic is there a textX mailing list or some other preferred place for posting 'help requests' like this one, instead of using GitHub issues?

Object processor

Currently there is support for model post-processing using the register_model_processor call on the meta-model.

There should also be similar post-processing support at the object level. The register call will accept a callable and a class/rule name for which the callable will get called.
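For reference, the exploration code in the builtin-types issue above already registers per-class callbacks via register_obj_processors; a minimal sketch of that shape, reusing the drawing meta-model mm and model_str from the quick intro:

# Illustrative only: report every Circle as it is instantiated during parsing.
def circle_processor(circle):
    print('Parsed a circle with radius', circle.radius)

mm.register_obj_processors({'Circle': circle_processor})
model = mm.model_from_str(model_str)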

Match rule processors

Match rules always return strings. In some cases it would be nice to transform the result into other Python types or to process the string in a way not possible with the current textX grammar capabilities.

Match rule processors would work in a similar way to object processors, but with a twist: each would be a Python callable that receives the matched string and returns whatever should be used in its place.

Reverse parsing: from AST to code

Hi, amazing project! I just started using it to parse a small DSL that serves as an input file to a large scientific program.

I was wondering how much work it would be to add the ability to go back from the parsed AST to the source code. The use case would be to parse code, modify the AST to some degree, and write it back out as code.

(I assume this functionality does not exist at the moment, I didn't find any mention of it in the docs.)

Proper treatment of linebreaks

I'm dealing with a language where the end of line has meaning (Fortran). Naturally, there is a mechanism for including a line break (using the & character). Currently, I deal with this by including & *\n * as a possible Comment. This works great, but I was wondering if this approach has some potential catches, or if there is a better way to do it. I feel that a line break is more like whitespace, but whitespaces can only be single characters.

Instantiating user supplied classes

textX currently creates a Python class for each common rule. In some scenarios it would be useful to supply a collection of Python classes (probably a dict) that would be instantiated in the model instead of the dynamically created ones.
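The quick intro at the top of this page already demonstrates this shape with a user-supplied Point class:

mm = metamodel_from_str(grammar, classes=[Point])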

Full unicode support

I'd like to know how to parse a full Unicode string. \w is not enough for parsing lines like

2785-599 São Domingo

I've been trying to use /'regEx'/u or even \p{L} as in Python, but it doesn't seem to work.
Using a string match, it works:

address = '2785-599 São Domingo'

But when the instance is created a new error arises:

'ascii' codec can't encode character

Appreciate your help.

Potential memory leak

I used the IBM Rhapsody LightSwitch.rpy file and multiplied the content a couple of times to get it up to 2.7 MB. Run it with caution: memory consumption is very high.

Sorry, I had to rename it; GitHub does not allow uploading *.rpy files.

LightSwitch.txt

Should comments be treated as whitespaces?

textX/Arpeggio treats comments in a special way. More specifically, the noskipws rule modifier will disable whitespace skipping but will leave comment skipping in place, which may lead to some gotchas.

If comments should be treated like whitespaces, then noskipws would mean "don't skip whitespaces and comments". This would be a backward-incompatible change. It would also mean that comments have to be handled manually in noskipws rules.

This issue is created as a poll to gather feedback from the community. Please vote with emoji thumbs up/down.
