Comments (22)
Hi Uriva,
If you present your use-case, perhaps I can provide some insight into the best way to solve it.
Right now, ambiguity is resolved by choosing the shortest matching rule. When the rules are of equal length, then yes, their position in the grammar determines their priority. However, this is not an official feature (at least not right now), so don't rely on it.
My hunch is that adding a priority modifier should be enough for most purposes. Something in the spirit of:
rulename.3 : some thing in side
anotherrule.5: etc etc
Do you need something more intricate than that?
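To make the proposed notation concrete, here is a minimal Python sketch of how a line such as `rulename.3 : ...` could be split into a rule name, an optional numeric priority, and an expansion. The names `split_rule_header` and `parse_rule_line` and the regular expression are hypothetical illustrations and are not taken from Lark's grammar loader.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a rule name, optionally followed by ".<integer>".
_HEADER = re.compile(r"^\s*([A-Za-z_]\w*)(?:\.(\d+))?\s*$")


def split_rule_header(header: str) -> Tuple[str, Optional[int]]:
    """Split 'rulename.3' into ('rulename', 3); priority is None if absent."""
    match = _HEADER.match(header)
    if match is None:
        raise ValueError(f"not a valid rule header: {header!r}")
    name, priority = match.groups()
    return name, (int(priority) if priority is not None else None)


def parse_rule_line(line: str) -> Tuple[str, Optional[int], str]:
    """Split 'name.N : expansion' at the first colon."""
    header, _, expansion = line.partition(":")
    name, priority = split_rule_header(header)
    return name, priority, expansion.strip()
```

In this sketch a rule without a suffix simply reports `None` as its priority, so the caller decides what an unspecified priority means.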
from lark.
If priority is determined by position in the grammar, that is good enough.
I mean just making it part of the library contract (and ignoring the rule length).
If you feel the length of the rule is important, then introducing a priority as you suggested sounds reasonable.
Thanks!!
from lark.
I feel that implicit priority is a bad idea. Grammars are confusing enough already, and ambiguity is even more so.
I will add numbered priority soon. Let me know if you have any preferences regarding it.
from lark.
On second thought I agree. Implicitness here will be unclear.
from lark.
Hi Uriva,
I added this feature and pushed it to master. If you clone the repo at the latest HEAD, you should be able to use it.
For an example of how to use it, see:
tests/test_parser.py : test_earley_prioritization()
Its code should demonstrate proper usage and effects. Let me know whether it works, and if you have any questions.
from lark.
I'm stumbling across the assert in common.py:45
When printing the rule that doesn't have 3 components I'm getting:
('import', ['_IMPORT', 'import_args', '_NL'], 'autoalias_import__IMPORT_import_args__NL', <lark.load_grammar.RuleOptions instance at 0x8d82950>)
Seems to be an import.
My imports:
%import common.NUMBER
%import common.WS
%ignore WS
from lark.
Are you sure you have the updated version? In the latest "master/HEAD", that assert is in line 44.
Please post the full exception. It will help me understand why you're getting it.
from lark.
Probably my mistake because now it seems to be working.
However I have a new error:
AssertionError: Priority is the same between both rules: <rule1 : token1 token2> == <rule1 : token1 token2>
Should this error occur with the same rule on both sides?
from lark.
No, this is a silly bug on my part. Try the latest master and see if it solves the problem for you.
from lark.
Cool:)
from lark.
lexer_conf = LexerConf(tokens, ['WS', 'COMMENT'])
TypeError: __init__() takes exactly 4 arguments (3 given)
from lark.
Some of my rules have priority, but not all - I assume this is ok?
from lark.
That's weird. Try deleting the *.pyc files?
Yeah, it has a default priority if you don't specify it.
from lark.
Sorry, I had a merge issue.
But I'm still getting the same original error.
Could it be that the two rules are identical but still pass the equality check?
from lark.
Maybe. If you can give me some use case that produces this error, it would be much easier for me to correct it.
from lark.
Ok, I'm trying to produce a minimal example.
from lark.
The string: a b c a b c
And the grammar:
rule1.1: "a" rule4 | "a" rule3
rule2.2: rule3 "a"
rule3: "b" "c"
rule4: rule3 | "b"
start: (rule1 | rule2)+
%import common.WS
%ignore WS
from lark.
Okay. Pushed a fix to master. Try it now.
from lark.
Seems good 👍
from lark.
Could you elaborate a bit on how this works?
e.g. if several rules were used in a parse, and they have different scores, how is the parse score computed? Is it simply the min/max priority of the rules used?
from lark.
Basically, whenever there is an ambiguity, the resolver chooses by these conditions, in this order:
- Priority (if and only if specified on both rules)
- If both are part of the same rule (like rule1 in your example), choose the shortest one (if such exists)
- Otherwise, choose the tree with the fewest children
Just from writing it down I can see this isn't good enough (but I already knew that), but I'm not sure yet how to fix it. If you have any ideas, let me know. I will also consider partial fixes that will solve your current problem.
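The three conditions listed above can be sketched in Python roughly as follows. This is only an illustration of the described ordering, not the resolver's actual code: the `Candidate` class and `choose` function are hypothetical, the sketch assumes that a higher number means higher priority (the thread does not say which direction wins), and it reads "shortest" as the alternative with the fewest symbols.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    """One competing derivation (hypothetical stand-in for a parse tree)."""
    rule: str                       # name of the rule that produced it
    expansion: List[str]            # symbols of the matched alternative
    priority: Optional[int] = None  # None means "not specified"
    children: List["Candidate"] = field(default_factory=list)


def choose(a: Candidate, b: Candidate) -> Candidate:
    # 1. Priority, only if specified on both candidates.
    #    Assumption: the larger number wins.
    if a.priority is not None and b.priority is not None and a.priority != b.priority:
        return a if a.priority > b.priority else b
    # 2. Same rule: prefer the shorter alternative, if one is shorter.
    if a.rule == b.rule and len(a.expansion) != len(b.expansion):
        return a if len(a.expansion) < len(b.expansion) else b
    # 3. Otherwise: fewest children; a tie keeps the first candidate.
    return a if len(a.children) <= len(b.children) else b
```

Note that step 1 is skipped entirely when only one side carries a priority, which matches the "if and only if specified on both rules" condition.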
from lark.
I'm closing this issue. Let me know if there's anything that isn't resolved.
from lark.