Code Monkey home page Code Monkey logo

Comments (15)

apalala avatar apalala commented on May 16, 2024

A parser will be happy with part of the input unless the end-of-file expression ($) is mentioned in the grammar. This version should work as expected:

identifier = /\w+/ ;
expre = '{' expre '}' | expre '->' identifier | identifier ;
start = expre $ ;

from tatsu.

manueljacob avatar manueljacob commented on May 16, 2024

The fact that TatSu will drop the end of the input by default is a strange feature IMHO, but I got used to that already and this bug report is about a different issue. :)

In this case the parser drops the start of the input, not the end. However, the problem can also happen in the middle of the input: for example, in { { size } test }, { size } is dropped.

from tatsu.

apalala avatar apalala commented on May 16, 2024

@manueljacob Could you write a failing unit test for this case? I have some upcoming time to take a good look at issues like this.

from tatsu.

apalala avatar apalala commented on May 16, 2024

BTW, nobody ever told me that being specific about a grammar expecting to see the end of input was unexpected for users.

It would seriously break backwards compatibility to change that behavior, but I'll think about it.

from tatsu.

manueljacob avatar manueljacob commented on May 16, 2024

Sorry, I was a bit late, but in 6a84bfc I added a simplified test case.

The problem may be in the _recursive_call() method of the ParseContext class. When parsing "foo bar", in the beginning of the method the beginning "foo" is parsed with the right side of the expr rule (identifier). Then, because expr is a recursive rule, in the end of the method the rule is tried again, but this time with the buffer position at "bar".

from tatsu.

manueljacob avatar manueljacob commented on May 16, 2024

In 825ffde I moved and renamed the test, and modified it to look like the other tests in this file.
EDIT: More precisely, I created a fresh branch in my fork with one commit, including these changes.

from tatsu.

apalala avatar apalala commented on May 16, 2024

@manueljacob My fix for this bug broke several other things, so I reverted it.

from tatsu.

frozenblit avatar frozenblit commented on May 16, 2024

Hi, I've noticed a similar bug after adding an unary operator to the example calc grammar:

    start = expression $ ;
    expression
        =
        | expression ('+'|'-') term
        | term
        ;
    term
        =
        | term ('*'|'/') unary
        | unary
        ;
    unary
        =
        | '-' factor
        | factor
        ;
    factor
        =
        | '(' ~ @:expression ')'
        | '-' factor
        | number
        ;
    number = /\d+/ ;

For the input -3*-3*-5, the parser returns

[ [ ['-', '3'], '*', ['-', '3']], '*', ['-', '5']]

However, -3*-3*-5-5 returns only ['-', '5']. Opening the left-recursion in term by hand parses correctly.

Are those bugs related?

from tatsu.

Victorious3 avatar Victorious3 commented on May 16, 2024

I did a test with Whimsy using this grammar:
(Based on this failing test case https://github.com/neogeny/TatSu/blob/master/test/grammar/left_recursion_test.py#L349)

class DroppedInputGrammar: Grammar() {

    fun identifier() = word { build_str (
        syntax = { repeat1 { alphanum() } },
        value = { it.toString() }
    )}

    val expression = leftrec {
        self -> choice {
            self() && word(",") && self() ||
            identifier()
        }
    }

    override fun root() = expression()
}

With the following results:

foo      -> [foo]
foo bar  -> [foo, bar]
foo, bar -> [foo, bar]

Different, but not quite the expected output either. I might have done something wrong formulating the grammar tho. (Tagging @norswap again, thanks for your great help so far!)

What I'm suspecting is that nothing is telling the parser to stop when doing the left recursion.
It keeps matching "expression->identifier" until the input position doesn't grow anymore. In this case that means eating the first match. Tatsu returns only the last one and Whimsy returns every match.

Something would have to check if we are matching a non left recursive branch and then fail when this happens multiple times in sequence...
Could also be a problem with memoization since I haven't changed anything specific to that yet, but I don't know if it applies to this case, since the output I get from Whimsy isn't the expected one either.

from tatsu.

apalala avatar apalala commented on May 16, 2024

Interesting experiment. Why didn't I think of trying other parser generators? :-) It would be good to know what something YACC-like does with that grammar.

Memoization can be turned off in TatSu, but I think you have to instantiate Parser() to do it. It can also be switched off temporarily. I think it is done to handle negative lookaheads.

from tatsu.

norswap avatar norswap commented on May 16, 2024

@Victorious3 This really made me go WTF until I came to my senses.

There's an issue in the grammar, you didn't put the sequential stuff in a seq:

val expression = leftrec {
        self -> choice {
            seq { self() && word(",") && self() } ||
            identifier()
        }
    }

Without it what happens is that on round 1 you match an identifier. On round 2, the first call to self() recalls this identifier (the input position is restored past "foo"), but when it fails to match the comma, it tries the other side of the || expression without resetting the input position to the start of the input, causing it to match a second identifier after the first one. Hence the unexpected result.

It's way too easy to make such a mistake. This is terrible UX and I'm sorry. No such problem in Autumn 4 :p

Oh and thank you too! I's cool to have people scrutinizing all this stuff, it's how good software gets made. Keep up the good work 👍

from tatsu.

Victorious3 avatar Victorious3 commented on May 16, 2024

@norswap Getting the correct output now.

Tatsu's Trace:

↙start ~1:1
foo bar
↙expr↙start ~1:1
foo bar
↙expr↙expr↙start ~1:1
foo bar
⟲ expr↙expr↙start ~1:1
foo bar
↙identifier↙expr↙start ~1:1
foo bar
≡'foo' /\w+/
 bar
≡identifier↙expr↙start ~1:4
 bar
↙expr↙expr↙start ~1:5
bar
≡expr↙expr↙start ~1:5
bar
≢','
bar
↙identifier↙expr↙start ~1:5
bar
≡'bar' /\w+/
≡identifier↙expr↙start
↙expr↙expr↙start
≡expr↙expr↙start
≢','
↙identifier↙expr↙start
≢'' /\w+/
≡expr↙start
≡start

It's good until the first comma fails, then it tries identifier at the same position again...
What does wrapping it in seq do exactly?

@apalala Maybe the changes from #32 would apply now with my fix
EDIT: No, I just tried it. At least not without changes. The other tests still fail
Tatsu doesn't follow https://github.com/ncellar/autumn_v1/blob/61a2bc95403b06859b9dc7e12f19a95b539c2002/src/com/norswap/autumn/extensions/leftrec/LeftRecursive.java exactly, I suppose #32 was meant to rectify that in some way

from tatsu.

norswap avatar norswap commented on May 16, 2024

It's what I explain in my last message, seq makes sure that if the predicate (the &&-separated chain of parser calls) fail, the input position is reset to what it was at the start of the chain.

The basic heuristic: you're never supposed to mix && and || when calling parsers: && go inside seq only and || inside choice.

from tatsu.

Victorious3 avatar Victorious3 commented on May 16, 2024

I was wondering because whatever Tatsu does, it does not reset the input position, similar to the behavior you described when leaving out seq. Something could be wrong in how Tatsu does sequences, but since they work fine everywhere else I find it unlikely.
It's probably the implementation of the bounded left recursion that's lacking, I might try again from scratch on that one.

from tatsu.

Victorious3 avatar Victorious3 commented on May 16, 2024

Trying to get a fix for this into #75 as well, having some issues with whitespace (failing to retrieve seeds/saving them at the wrong position) but other than it seems to work. I also disabled memoization completely for the time being.

from tatsu.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.