
Comments (5)

CircleCode commented on May 22, 2024

Since lookahead is supported in tokens, maybe this can be done with some magic tokens… I'll investigate.

By the way, even if that is possible, it would mean \s can no longer be skipped, which makes parsing a bit more tedious. So even if some clever tokens can do this, I think it would be better if the compiler handled it itself.

from compiler.

CircleCode commented on May 22, 2024

After thinking about it: since INDENT (or UNINDENT) is relative to the previous line, it would require lookbehind assertions, which I suppose are not supported (because tokens consume the text from the left).


CircleCode commented on May 22, 2024

Maybe something from this paper can be used: http://michaeldadams.org/papers/layout_parsing/LayoutParsing.pdf


CircleCode commented on May 22, 2024

Side note: here are the rules used by Python's lexer to add INDENT and DEDENT tokens (from http://docs.python.org/2/reference/lexical_analysis.html#indentation):

First, tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight (this is intended to be the same rule as used by Unix). The total number of spaces preceding the first non-blank character then determines the line’s indentation. Indentation cannot be split over multiple physical lines using backslashes; the whitespace up to the first backslash determines the indentation.

The indentation levels of consecutive lines are used to generate INDENT and DEDENT tokens, using a stack, as follows.

Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line’s indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.
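To make the quoted rules concrete, here is a minimal Python sketch of the tab expansion and the stack. It is only an illustration under simplifying assumptions: every physical line is treated as a logical line (no backslash continuation, no comments), blank lines are ignored, and the names `indentation_width`, `layout_tokens` and the `LINE` token are made up for this example; they are not part of the compiler.

```python
def indentation_width(line, tabsize=8):
    """Width of the leading whitespace, tabs advancing to the next multiple of tabsize."""
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            width += tabsize - (width % tabsize)
        else:
            break
    return width


def layout_tokens(lines):
    """Yield INDENT / DEDENT / LINE tuples following the stack rule quoted above."""
    stack = [0]  # the initial zero is never popped
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines do not affect indentation
        width = indentation_width(line)
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", width)
        else:
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", width)
            if width != stack[-1]:
                # the new level must be one of the numbers on the stack
                raise ValueError(f"inconsistent dedent to column {width}")
        yield ("LINE", line.strip())
    # end of input: one DEDENT per remaining level above zero
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", 0)
```

For example, `list(layout_tokens(["a", "  b", "    c", "d"]))` produces two INDENT tokens (before `b` and `c`) and two DEDENT tokens before `d`.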

It does not seem too hard to implement, but the difficulty is that it has to be mixed with the user-defined grammar.

If I find some time, I'll try to play with this.

Note: since we parse the stream as a single string (and not line by line), we have to include newlines in our analysis, and the newline handling has to take precedence over user-defined tokens.
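A rough Python sketch of that idea, scanning the whole input as one string. Everything here is hypothetical and not the compiler's API: `NEWLINE` is tried first so that it wins over the other tokens, `SKIP` shows that in-line whitespace can still be skipped, and `WORD` stands in for the user-defined tokens. It also assumes the first line starts at column zero and does not check for inconsistent dedents.

```python
import re

TOKEN_RE = re.compile(
    r"(?P<NEWLINE>\n[ \t]*)"  # tried first: a newline plus the next line's indentation
    r"|(?P<SKIP>[ \t]+)"      # in-line whitespace, still skippable
    r"|(?P<WORD>\S+)"         # stand-in for the user-defined tokens
)


def scan(text):
    """Tokenize a single string, emitting INDENT/DEDENT at line starts."""
    stack = [0]
    out = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "WORD":
            out.append(("WORD", m.group()))
            continue
        # NEWLINE: ignore blank lines and a trailing newline at end of input
        following = text[m.end():m.end() + 1]
        if following in ("", "\n"):
            continue
        # width of the indentation after the newline, tabs expanded to 8 columns
        width = len(m.group()[1:].expandtabs(8))
        if width > stack[-1]:
            stack.append(width)
            out.append(("INDENT", width))
        else:
            while width < stack[-1]:
                stack.pop()
                out.append(("DEDENT", width))
    # end of input: close every level that is still open
    out.extend(("DEDENT", 0) for _ in stack[1:])
    return out
```

With this, `scan("if x\n  y z\n\nw")` emits an INDENT before `y` and a DEDENT before `w`, while the space between `y` and `z` is silently dropped.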


Hywan commented on May 22, 2024

Closing because it's old :-).

