Code Monkey home page Code Monkey logo

Comments (10)

evanmrsampson avatar evanmrsampson commented on July 28, 2024

The conflict was with the nonlogicals token. I added a precedence rule, and made it left associative and that seemed to end the conflict. However, I haven't tested it extensively.

from macleod.

Fxhnd avatar Fxhnd commented on July 28, 2024

Nice!

I have a couple questions on the addition of the *_error states for the grammar though. Is it a good idea for the parser to attempt to parse things that don't match the intended grammar? I see that the error states raise TypeExceptions but I think there is the possibility where those error states would be confusing later on. For example it looks like the p_connective_error() references back to the axiom_list which, while I get the intent, may let the parser continue onwards into the sentence and consume more tokens.

By the time either a p_connective_error is totally matched or the p_error is hit, the error that gets raised won't have anything to do with the actual problem anymore. For this reason I think it's better to lean back on just using the p_error built-in and throw the current token + line#. If we keep the grammar so it only proceeds according to our desired syntax it'll throw earlier and less ambiguously.

from macleod.

evanmrsampson avatar evanmrsampson commented on July 28, 2024

I agree that the rules definitely add ambiguity to the grammar, but for my project I need to be able to identify some common errors. From p_error there's no way to tell what went wrong without doing expensive string comparisons, so I believe this method has lower overhead. I could make a second parser, something like Parser_Unstable.py, if you think that's a good idea.

from macleod.

evanmrsampson avatar evanmrsampson commented on July 28, 2024

I haven't done extensive testing on the grammars I've written, but I have tested them, and they seem to work fine. Additionally, I think if the grammar can't identify the error correctly it should just default to p_error which will give the default syntax error stuff.

from macleod.

thahmann avatar thahmann commented on July 28, 2024

It might be worthwhile creating two versions of the grammar: one "parser" (like the one Rob had) that is just interested in parsing a correct file and throws a possibly cryptic error if something is wrong and a second "debugger" that may continue parsing after encountering an error as attempt to figure out what is wrong.
Ideally, the second one builds on the first one (maybe have a "debug" mode?) and doesn't require two complete sets of grammars.

from macleod.

evanmrsampson avatar evanmrsampson commented on July 28, 2024

I've been messing around and researching this issue and here's what I've found:

  • PLY supports states for lex, but not yacc. As such, you can't conditionally add more grammar rules to the parser.
  • We could have two different parsing files, but both would require the full grammar set (I tried importing the base parser into a debug parser, but PLY couldn't find my new rules)
  • If we do that, we may have to delete parsetab.py every time we change modes, which will slow down performance.

from macleod.

Fxhnd avatar Fxhnd commented on July 28, 2024

I definitely want to avoid having two versions of the parser since that would quickly become a pain to maintain if one version of the parser gets ahead of the other. The maintainer of PLY also states the PLY library doesn't play nicely with trying to support two different parsers within the same namespace so there is that too.

I might be misunderstanding what the extra *_error() states in the BNF give us. Looking at the code I can see that they raise TypeError if certain, expected, malformed input conditions are met. However, since the the *_error() are states just like any other it's possible for them to fail to parse correctly as well. In that case they all fall back to the aforementioned p_error() function. What is the expected output of the parser on malformed input?

Input like

 (forall (x)
 ((and
  (not (ZEX x))
  (Cont x x)
 ))
)

with the proposed scheme will yield errors like, TypeError: There may be a missing ")" parenthesis or if you remove the nested p_error(p) function call it'll yield TypeError: Missing connective

Instead of baking in non-exhaustive error states I think we'd be better off falling into the p_error(p) function and using the current lexing TOKEN plus parsing stack from yacc.yacc to provide useful feedback. e.g. the same snippet from above would produce:

Error in input at line 9!
    Unexpected Token: (
    Last Token: (
    Parser Stack:
        LPAREN START URI LPAREN FORALL LPAREN nonlogicals RPAREN LPAREN

Traceback (most recent call last):
  File "../../bin/parser.py", line 30, in <module>
    ontology = Parser.parse_file(args.file, args.sub, args.base, args.resolve)
  File "/users/rpskillet/Vault/thesis/macleod/macleod/parsing/Parser.py", line 329, in parse_file
    parsed_objects = yacc.parse(buff)
  File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 333, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
  File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 1201, in parseopt_notrack
    tok = call_errorfunc(self.errorfunc, errtoken, self)
  File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 192, in call_errorfunc
    r = errorfunc(token)
  File "/users/rpskillet/Vault/thesis/macleod/macleod/parsing/Parser.py", line 308, in p_error
    raise SyntaxError("Unexpected token found in input!")
SyntaxError: Unexpected token found in input!  

If we really want to decorate further then we could add add grammar rules that expect the error token. This is different than adding additional states to parse malformed input. For example, instead of including the new rule p_connective_error and adding it as a reduction for p_axiom we'd just add mirrors of our existing rules (e.g. p_conjunction_error). It looks like this was done with p_comment_error already. Building off the example from before :

def p_universal_error(p):
    """
    universal : LPAREN FORALL LPAREN nonlogicals RPAREN error axiom RPAREN
              | LPAREN FORALL LPAREN nonlogicals RPAREN error RPAREN
              | LPAREN FORALL LPAREN nonlogicals RPAREN axiom error RPAREN
    """
    raise ParseError("Error parsing term in Universal") 

which would yield the error output of

Error in input at line 9!
    Unexpected Token: (
    Last Token: (
    Parser Stack:
        LPAREN START URI LPAREN FORALL LPAREN nonlogicals RPAREN LPAREN

Traceback (most recent call last):
  File "../../bin/parser.py", line 30, in <module>
    ontology = Parser.parse_file(args.file, args.sub, args.base, args.resolve)
  File "/users/rpskillet/Vault/thesis/macleod/macleod/parsing/Parser.py", line 345, in parse_file
    parsed_objects = yacc.parse(buff)
  File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 333, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
  File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 1120, in parseopt_notrack
    p.callable(pslice)
  File "/users/rpskillet/Vault/thesis/macleod/macleod/parsing/Parser.py", line 238, in         p_universal_error
    raise ParseError("Error parsing term in Universal")
macleod.parsing.Parser.ParseError: Error parsing term in Universal

The nice thing about the error token is that it will continue to consume tokens from the lexer until it's able to resynchronize the parser and settle on which term we fudged. Normally this is used for error recovery and is the recommended procedure to follow that doesn't include black magic and other witchcraft. If mirrors are written for each p_* rule that we have then we both tighten the definition of what we can parse and can give very verbose feedback on what is wrong with a parsed string. Note the example rule I gave above is very minimal, in reality you would look at each of the tokens in p to figure out which error state you're in and print accordingly.

Of course the parser could also stand to have it's states renamed and some of the recursive rules revised so they are more clear. Hopefully that helps, let me know what you think!

from macleod.

evanmrsampson avatar evanmrsampson commented on July 28, 2024

Alrighty, I see what you're saying about malformed input. I suppose I didn't realize what kind of things we can do with PLY. I will remove the malformed input rule and try to just use the error token

from macleod.

evanmrsampson avatar evanmrsampson commented on July 28, 2024

@Fxhnd is it best to have the grammars consider the parentheses at the highest level or the lowest level? It seems like we try to take them into account as close to the terminals as possible, but I'm not sure what the reasoning is for one way or another. Should we try to stick to one convention?

from macleod.

Fxhnd avatar Fxhnd commented on July 28, 2024

I'm not sure if there is a solid design/logic benefit one way or another. I tried to keep the parentheses inside the sub-rules just to keep the rules less cluttered at the higher levels:

axiom : conjunction | disjunction | etc..

looks better to me than

axiom : lparen conjunction rparen | ...

plus having the parens within the sub-rules lets us do things like print the whole "construct" including the parens for debugging.

Also, I whipped up a quick and useful error output for the p_error() function that we could probably build off of. I think it'll probably do something funny if the error token ends up being parsed into a nonlogical so that's something that needs looking at. It prints out the construct that where the expected token was found and also underlines the error within the output.

Error at line 18! Unexpected Token: '(̲' :: "( forall ( x ) ( (̲ and ( not ( ZEX x ) ) ( Cont x x ) ) ) ) "
Error at line 15! Unexpected Token: 'l̲o̲l̲' :: "( forall l̲o̲l̲ ( x ) ( and ( not ( ZEX x ) ) ( Cont x x ) ) ) "

Let me know if you have any other questions!

def p_error(p):

    global parser

    # A little stack manipulation here to get everything we need
    stack = [symbol for symbol in parser.symstack][1:]
    pivot = len(stack) - stack[::-1].index(next((x for x in stack[::-1] if x.type == 'axiom'), None))
    current_axiom = stack[pivot:]
    current_axiom.append(p)

    # Use the brace level to figure out how many future tokens we need to complete the error token
    lparens = len([x for x in current_axiom if x.type == "LPAREN"])
    lookahead_tokens = []
    while lparens != 0:
        lookahead_token = parser.token()
        if lookahead_token == None:
            break
        else:
            lookahead_tokens.append(lookahead_token)
            if lookahead_token.type == "RPAREN":
                lparens -= 1
            elif lookahead_token.type == "LPAREN":
                lparens += 1

    # Put together a full list of tokens for the error token
    current_axiom += lookahead_tokens

    # String manipulation to "underbar" the error token
    axiom_string = []
    overbar_error = ''.join([x+'\u0332' for x in p.value])
    p.value = overbar_error

    for token in current_axiom:
        raw_token = token.value
        if isinstance(raw_token, str):
            axiom_string.append(raw_token + ' ') 
        elif isinstance(raw_token, list):
            for sub_token in raw_token:
                axiom_string.append(sub_token + ' ')

    print("""Error at line {}! Unexpected Token: '{}' :: "{}"\n\n{}""".format(p.lexer.lineno, p.value, ''.join(axiom_string), ' '.join([symbol.type for symbol])
    return p

from macleod.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.