Comments (5)
Since lookahead is supported in tokens, maybe this can be done with some magic tokens… I'll look into it.
By the way, even if it is possible, it would mean one cannot skip \s, thus making parsing a little more tedious. So even if some cool tokens can do this, I suppose it would be great if this could be done by the compiler itself.
from compiler.
After thinking about it: since INDENT (or UNINDENT) is relative to the previous line, it would require lookbehind assertions, which I suppose are not supported (because tokens trim the text from the left).
maybe there is something that can be used from this paper: http://michaeldadams.org/papers/layout_parsing/LayoutParsing.pdf
Side note: here are the rules used by Python's lexer to add INDENT and DEDENT tokens (from http://docs.python.org/2/reference/lexical_analysis.html#indentation):
First, tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight (this is intended to be the same rule as used by Unix). The total number of spaces preceding the first non-blank character then determines the line’s indentation. Indentation cannot be split over multiple physical lines using backslashes; the whitespace up to the first backslash determines the indentation.
The indentation levels of consecutive lines are used to generate INDENT and DEDENT tokens, using a stack, as follows.
Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line’s indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.
It does not seem too hard to implement, but the difficulty is that it has to be mixed with the user-defined grammar.
If I find some time, I'll try to play with this.
Note: since we parse the stream as a single string (and not line by line), we have to include newlines in our analysis, and the indentation handling has to take precedence over user-defined tokens.
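For illustration, the stack rule quoted above can be sketched in a few lines of Python. This is only a minimal sketch, not how the compiler works or would have to implement it: the names `indentation_width` and `layout_tokens` and the `("LINE", text)` placeholder token are made up for the example, and it ignores backslash continuations, comments and bracketed expressions, which Python's real lexer also handles.

```python
def indentation_width(line: str) -> int:
    """Width of the leading whitespace; a tab advances to the next multiple of 8."""
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            width = (width // 8 + 1) * 8
        else:
            break
    return width


def layout_tokens(source: str) -> list:
    """Emit INDENT/DEDENT markers around hypothetical ("LINE", text) tokens."""
    stack = [0]  # the initial zero is never popped
    tokens = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not change the indentation level
        width = indentation_width(line)
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        else:
            # Pop every level deeper than the current line, one DEDENT each.
            while width < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
            if width != stack[-1]:
                raise ValueError(f"inconsistent dedent to column {width}")
        tokens.append(("LINE", line.strip()))
    # At end of input, close every level that is still open.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens
```

Calling `layout_tokens("a\n  b\n    c\nd\n")` yields two INDENT markers on the way in and two DEDENT markers before `d`. Note that the sketch works line by line; doing the same on a single string, interleaved with user-defined tokens, is exactly the hard part mentioned above.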
Closing because it's old :-).