Comments (15)
A parser will be happy with part of the input unless the end-of-file expression ($
) is mentioned in the grammar. This version should work as expected:
identifier = /\w+/ ;
expre = '{' expre '}' | expre '->' identifier | identifier ;
start = expre $ ;
from tatsu.
The fact that TatSu will drop the end of the input by default is a strange feature IMHO, but I got used to that already and this bug report is about a different issue. :)
In this case the parser drops the start of the input, not the end. However, the problem can also happen in the middle of the input: for example, in { { size } test }
, { size }
is dropped.
from tatsu.
@manueljacob Could you write a failing unit test for this case? I have some upcoming time to take a good look at issues like this.
from tatsu.
BTW, nobody ever told me that being specific about a grammar expecting to see the end of input was unexpected for users.
It would seriously break backwards compatibility to change that behavior, but I'll think about it.
from tatsu.
Sorry, I was a bit late, but in 6a84bfc I added a simplified test case.
The problem may be in the _recursive_call() method of the ParseContext class. When parsing "foo bar", in the beginning of the method the beginning "foo" is parsed with the right side of the expr
rule (identifier
). Then, because expr
is a recursive rule, in the end of the method the rule is tried again, but this time with the buffer position at "bar".
from tatsu.
In 825ffde I moved and renamed the test, and modified it to look like the other tests in this file.
EDIT: More precisely, I created a fresh branch in my fork with one commit, including these changes.
from tatsu.
@manueljacob My fix for this bug broke several other things, so I reverted it.
from tatsu.
Hi, I've noticed a similar bug after adding an unary operator to the example calc grammar:
start = expression $ ;
expression
=
| expression ('+'|'-') term
| term
;
term
=
| term ('*'|'/') unary
| unary
;
unary
=
| '-' factor
| factor
;
factor
=
| '(' ~ @:expression ')'
| '-' factor
| number
;
number = /\d+/ ;
For the input -3*-3*-5
, the parser returns
[ [ ['-', '3'], '*', ['-', '3']], '*', ['-', '5']]
However, -3*-3*-5-5
returns only ['-', '5']
. Opening the left-recursion in term by hand parses correctly.
Are those bugs related?
from tatsu.
I did a test with Whimsy using this grammar:
(Based on this failing test case https://github.com/neogeny/TatSu/blob/master/test/grammar/left_recursion_test.py#L349)
class DroppedInputGrammar: Grammar() {
fun identifier() = word { build_str (
syntax = { repeat1 { alphanum() } },
value = { it.toString() }
)}
val expression = leftrec {
self -> choice {
self() && word(",") && self() ||
identifier()
}
}
override fun root() = expression()
}
With the following results:
foo -> [foo]
foo bar -> [foo, bar]
foo, bar -> [foo, bar]
Different, but not quite the expected output either. I might have done something wrong formulating the grammar tho. (Tagging @norswap again, thanks for your great help so far!)
What I'm suspecting is that nothing is telling the parser to stop when doing the left recursion.
It keeps matching "expression->identifier" until the input position doesn't grow anymore. In this case that means eating the first match. Tatsu returns only the last one and Whimsy returns every match.
Something would have to check if we are matching a non left recursive branch and then fail when this happens multiple times in sequence...
Could also be a problem with memoization since I haven't changed anything specific to that yet, but I don't know if it applies to this case, since the output I get from Whimsy isn't the expected one either.
from tatsu.
Interesting experiment. Why didn't I think of trying other parser generators? :-) It would be good to know what something YACC-like does with that grammar.
Memoization can be turned off in TatSu, but I think you have to instantiate Parser()
to do it. It can also be switched off temporarily. I think it is done to handle negative lookaheads.
from tatsu.
@Victorious3 This really made me go WTF until I came to my senses.
There's an issue in the grammar, you didn't put the sequential stuff in a seq
:
val expression = leftrec {
self -> choice {
seq { self() && word(",") && self() } ||
identifier()
}
}
Without it what happens is that on round 1 you match an identifier. On round 2, the first call to self()
recalls this identifier (the input position is restored past "foo"), but when it fails to match the comma, it tries the other side of the ||
expression without resetting the input position to the start of the input, causing it to match a second identifier after the first one. Hence the unexpected result.
It's way too easy to make such a mistake. This is terrible UX and I'm sorry. No such problem in Autumn 4 :p
Oh and thank you too! I's cool to have people scrutinizing all this stuff, it's how good software gets made. Keep up the good work
from tatsu.
@norswap Getting the correct output now.
Tatsu's Trace:
↙start ~1:1
foo bar
↙expr↙start ~1:1
foo bar
↙expr↙expr↙start ~1:1
foo bar
⟲ expr↙expr↙start ~1:1
foo bar
↙identifier↙expr↙start ~1:1
foo bar
≡'foo' /\w+/
bar
≡identifier↙expr↙start ~1:4
bar
↙expr↙expr↙start ~1:5
bar
≡expr↙expr↙start ~1:5
bar
≢','
bar
↙identifier↙expr↙start ~1:5
bar
≡'bar' /\w+/
≡identifier↙expr↙start
↙expr↙expr↙start
≡expr↙expr↙start
≢','
↙identifier↙expr↙start
≢'' /\w+/
≡expr↙start
≡start
It's good until the first comma fails, then it tries identifier
at the same position again...
What does wrapping it in seq
do exactly?
@apalala Maybe the changes from #32 would apply now with my fix
EDIT: No, I just tried it. At least not without changes. The other tests still fail
Tatsu doesn't follow https://github.com/ncellar/autumn_v1/blob/61a2bc95403b06859b9dc7e12f19a95b539c2002/src/com/norswap/autumn/extensions/leftrec/LeftRecursive.java exactly, I suppose #32 was meant to rectify that in some way
from tatsu.
It's what I explain in my last message, seq
makes sure that if the predicate (the &&
-separated chain of parser calls) fail, the input position is reset to what it was at the start of the chain.
The basic heuristic: you're never supposed to mix &&
and ||
when calling parsers: &&
go inside seq
only and ||
inside choice
.
from tatsu.
I was wondering because whatever Tatsu does, it does not reset the input position, similar to the behavior you described when leaving out seq
. Something could be wrong in how Tatsu does sequences, but since they work fine everywhere else I find it unlikely.
It's probably the implementation of the bounded left recursion that's lacking, I might try again from scratch on that one.
from tatsu.
Trying to get a fix for this into #75 as well, having some issues with whitespace (failing to retrieve seeds/saving them at the wrong position) but other than it seems to work. I also disabled memoization completely for the time being.
from tatsu.
Related Issues (20)
- PRs for trace output readability? HOT 2
- tatsu.ast.AST objects don't pickle/unpickle properly HOT 1
- Generated semantics use old syntax for super() in __init__()
- Naming nested parses causes inside nodes to be duplicated in the AST HOT 2
- Calling `str()` on a model containing non-JSON-representable types raises exception
- Generated `main()` passes the wrong arguments to `parser.parse()` HOT 17
- Unusual use of `__new__()` in `NodeWalker` HOT 3
- Generared code parser and dynamic parser return a different AST HOT 4
- NodeWalker._find_walker returns invalid walker due to conflicting ids in node.__class__. HOT 3
- Arguments passed to NodeWalker are not passed correctly if Node is a list instance.
- _walker_cache must be owned by an instance of NodeWalker, not by the NodeWalker class.
- Node missing children with nested dict HOT 5
- How to parse indent based language like yaml? HOT 7
- Support for currently active python versions (e.g. 3.7) HOT 20
- Interpreting parseinfo HOT 3
- Cannot change default comment regexp HOT 1
- Does TatSu 5.8.3 still pick up the regex module automatically? HOT 3
- Name is not used in tatsu.to_python_sourcecode HOT 1
- tests not excluded from sdist HOT 1
- Regex broken after v5.7.3 HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from tatsu.