Comments (8)
@munching We aren't talking about your issue directly. If Lark behaved correctly, you would have gotten an AttributeError from accessing those names on .data, which probably would have clued you in faster. That is what we are talking about fixing.
For collecting comments we normally suggest terminal callbacks, but that doesn't easily bring them into the tree; that takes a bit of extra work.
from lark.
node.data isn't part of the input, but of the grammar. Use node.meta instead.
@erezsh We really need to change it back so that Tree doesn't take Tokens, but just their values. Those being Tokens results in many issues.
from lark.
Hi @MegaIng
Thank you for the quick response. I totally misunderstood the "data" thing, now it works fine. Thank you very much for helping!
from lark.
@MegaIng Yeah, makes sense. The tokens of the parsed grammar aren't relevant to the output tree.
from lark.
Not sure what you mean by that, I'm relatively new to Lark. But trying to access start_pos / end_pos of tokens is my attempt to bring comments into the tree that were ignored at the parsing stage. I've done some research and it looks like there isn't an easy way to do that. I'm parsing a Pascal-like language and must use the LALR parser because of its speed: my usual input is roughly 350 MB of source code and Earley is unacceptably slow. I tried to rewrite my grammar to not ignore the code, but that's probably not possible with LALR. So in my case, having tokens with information on where they were found in the input is the last hope of bringing in the comments.
from lark.
@munching You might find this useful: https://lark-parser.readthedocs.io/en/latest/recipes.html#collect-all-comments-with-lexer-callbacks
from lark.
@erezsh @MegaIng
Thank you very much for the explanations!
I'm actually already using terminal callbacks to collect comments, and then, knowing their start/end, I go through the tree and try to find the "tightest" token that fully encloses my comment. Then the task is to figure out between which children to put it. And that's where I was stuck, because data.start_pos was giving me seemingly irrelevant numbers :)
from lark.
There is some code that I wrote once that did something like it.
Maybe I should clean it up and add it to Lark, as a utility function.
I don't know if it will be helpful for you, but this is the code:
from lark.utils import classify

def assign_comments(tree, comments):
    # Group subtrees by starting line; nodes without position info land under None.
    nodes_by_line = classify(tree.iter_subtrees(), lambda t: getattr(t.meta, 'line', None))
    nodes_by_line.pop(None, None)
    # Per line: the node starting furthest right, and the widest node starting furthest left.
    rightmost_nodes = {line: max(nodes, key=lambda n: n.meta.column) for line, nodes in nodes_by_line.items()}
    leftmost_nodes = {line: min(nodes, key=lambda n: (n.meta.column, -(n.meta.end_pos - n.meta.start_pos))) for line, nodes in nodes_by_line.items()}
    for c in comments:
        if c.line == c.end_line:
            # Single-line comment: attach it to the rightmost node on that line.
            n = rightmost_nodes[c.end_line]
            assert not hasattr(n.meta, 'inline_comment')
            n.meta.inline_comment = c
        else:
            # Multi-line comment: attach it as a header of the leftmost node on its last line.
            if c.end_line not in leftmost_nodes:
                # Probably past the end of the file
                # XXX verify this is the case
                continue
            n = leftmost_nodes[c.end_line]
            header_comments = getattr(n.meta, 'header_comments', [])
            n.meta.header_comments = header_comments + [c]
P.S. classify() is basically like itertools.groupby but it returns a dict.
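To make that description concrete, here is a stand-in written from the description alone. The name `classify_sketch` is hypothetical and this is not Lark's implementation. Unlike `itertools.groupby`, which only groups consecutive items, it collects every item with the same key no matter where it appears:

```python
from collections import defaultdict

def classify_sketch(items, key):
    """Group items into a dict of lists, keyed by key(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)

# Items with the same key do not need to be adjacent.
words = ["apple", "banana", "avocado", "cherry", "blueberry"]
by_initial = classify_sketch(words, key=lambda w: w[0])
print(by_initial)
```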
from lark.