Comments (13)
Again, Lark does that with a regexp... Anyway, this isn't exactly an issue with Lark. Next time, such discussions are better placed at the discussions tab: https://github.com/lark-parser/lark/discussions
from lark.
Can you provide a few examples? I don't really understand the format, and how it's "contained in json". The example (?) you provided just looks very invalid.
from lark.
@MegaIng it looks invalid because it's an excerpt, but the right-hand side is exactly what I gave. GitHub just filtered out some backslashes, so I have now escaped the excerpt.
Here is a json :-)
{
  "event": "imap_command_finished",
  "hostname": "host.example.com",
  "start_time": "2023-11-23T10:59:53.033463Z",
  "end_time": "2023-11-23T10:59:53.033673Z",
  "categories": [
    "imap",
    "service:imap"
  ],
  "fields": {
    "user": "dovecot_user_login",
    "local_ip": "XXXXXXX",
    "local_port": 993,
    "remote_ip": "YYYYYYY",
    "remote_port": 10904,
    "session": "T2w7vM8KmCrZRrUC",
    "duration": 121,
    "cmd_tag": "41",
    "cmd_name": "ID",
    "cmd_input_name": "ID",
    "cmd_args": "(\"name\" \"Thunderbird\" \"version\" \"115.4.1\")",
    "cmd_human_args": "(\"name\" \"Thunderbird\" \"version\" \"115.4.1\")",
    "tagged_reply_state": "OK",
    "tagged_reply": "OK ID completed.",
    "last_run_time": "2023-11-23T10:59:53.033432Z",
    "running_usecs": 93,
    "lock_wait_usecs": 0,
    "bytes_in": 42,
    "bytes_out": 67,
    "reason_code": [
      "imap:cmd_id"
    ]
  }
}
The content is in cmd_args or cmd_human_args. It follows this RFC.
from lark.
From a short glance, it doesn't really look like Lark is necessary here. You could probably get this done with just a regexp.
from lark.
- You should just use the stdlib json parser for the entire JSON. You will then get a string of the form ("name" "Thunderbird" ...), i.e. without the extra outside quotes or the extra backslashes in the middle.
- You can then use a slightly simpler grammar than what you have right now since you don't have to deal with the backslashes on the outside.
- To convert an ESCAPED_STRING to an actual Python string, you can just use eval. Since you know that it matches a specific regex, this isn't a dangerous use of eval.
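As a minimal sketch of the first point, the snippet below decodes a shortened copy of the event with the stdlib json module and prints the resulting cmd_args value. The variable names (raw_event, event, cmd_args) are illustrative, not taken from the original code.

```python
import json

# Shortened copy of the event from this thread, kept as raw JSON text.
# The raw string literal preserves the backslashes exactly as they
# appear in the JSON document.
raw_event = r'''
{
  "event": "imap_command_finished",
  "fields": {
    "cmd_name": "ID",
    "cmd_args": "(\"name\" \"Thunderbird\" \"version\" \"115.4.1\")"
  }
}
'''

event = json.loads(raw_event)
cmd_args = event["fields"]["cmd_args"]

# The JSON-level escaping is gone; only the IMAP-level quoting remains.
print(cmd_args)  # ("name" "Thunderbird" "version" "115.4.1")
```

Whatever parser handles the tuple afterwards only ever sees the decoded string.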
from lark.
this isn't a dangerous use of eval
That is what https://docs.python.org/3/library/ast.html#ast.literal_eval is for.
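A small sketch of that suggestion, assuming the token text still includes its surrounding double quotes (the token value here is made up for illustration). Python's string escape rules are not identical to those of the log format, so this is only appropriate when the two agree for the escapes that can actually occur.

```python
import ast

# One quoted token, quotes included, as a tokenizer might hand it over.
token = r'"Blah \" blah"'

value = ast.literal_eval(token)
print(value)  # Blah " blah

# Anything that is not a plain literal is rejected instead of executed.
try:
    ast.literal_eval('__import__("os").getcwd()')
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```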
from lark.
That is what https://docs.python.org/3/library/ast.html#ast.literal_eval is for.
Which also just calls eval in this situation; look at the source code.
from lark.
From a short glance, it doesn't really look like Lark is necessary here. You could probably get this done with just a regexp.
You're totally right, and it was my first move, but I think using Lark is more robust and also far more elegant. Also, in theory, the values may contain double quotes, which wouldn't be handled properly by classic regex logic, would they?
- You should just use the stdlib json parser for the entire JSON. You will then get a string of the form ("name" "Thunderbird" ...), i.e. without the extra outside quotes or the extra backslashes in the middle.
Already done. The backslashes you see are only there because the value was extracted from a JSON representation, but of course the string I give to Lark is ("name" "xxx" "version" "xxx")
- You can then use a slightly simpler grammar than what you have right now since you don't have to deal with the backslashes on the outside.
I don't see how, this doesn't work when I try to drop the escaped quotes I put.
- To convert an ESCAPED_STRING to an actual Python string, you can just use eval. Since you know that it matches a specific regex, this isn't a dangerous use of eval.
Ack, thanks!
from lark.
- You can then use a slightly simpler grammar than what you have right now since you don't have to deal with the backslashes on the outside.
I don't see how, this doesn't work when I try to drop the escaped quotes I put.
Right, your current tripled up backslashes are because you aren't using raw strings, missed that part.
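The grammar itself is not shown in this excerpt, so the following is only a generic illustration of the raw-string point: the same two-character sequence (a backslash followed by a double quote) needs three backslashes in a regular Python string literal but only one in a raw string.

```python
# Regular literal: "\\" yields one backslash, "\"" yields one double quote.
escaped_quote_plain = "\\\""

# Raw literal: the backslash is kept as written.
escaped_quote_raw = r'\"'

print(escaped_quote_plain == escaped_quote_raw)  # True
print(len(escaped_quote_raw))                    # 2
```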
from lark.
- You can then use a slightly simpler grammar than what you have right now since you don't have to deal with the backslashes on the outside.
I don't see how, this doesn't work when I try to drop the escaped quotes I put.
Right, your current tripled up backslashes are because you aren't using raw strings, missed that part.
I'm open to implementing something more elegant if you think there is a way.
from lark.
@P-EB At the end of the day, ESCAPED_STRING is parsed as a regexp
If you use a regexp directly, you can also use capture groups to remove the double quotes, and not have to call eval, which is arguably a bit more efficient. (though that doesn't matter much)
from lark.
@P-EB At the end of the day, ESCAPED_STRING is parsed as a regexp
If you use a regexp directly, you can also use capture groups to remove the double quotes, and not have to call eval, which is arguably a bit more efficient. (though that doesn't matter much)
I read you, but as stated, the values in the tuple might contain double quotes (e.g. ("name" "Blah \" blah")), and while tokenization with Lark seems to handle this properly, I am not aware of a way to do that efficiently with a regexp. Do you have a solution I'm not aware of? I'm totally open to the idea that I don't know a feature of re.
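For reference, one common way to match quoted strings with escaped quotes using plain re is sketched below. It assumes a backslash escapes exactly the next character and that unescaping means dropping that backslash; the names QUOTED and parse_quoted_list are hypothetical, and nested lists or other syntax the format may allow are not handled.

```python
import re

# A double-quoted string whose body is made of either ordinary characters
# (anything except a quote or a backslash) or a backslash plus one character.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

def parse_quoted_list(text):
    """Return the unescaped contents of every quoted string in text."""
    return [re.sub(r'\\(.)', r'\1', body) for body in QUOTED.findall(text)]

values = parse_quoted_list(r'("name" "Blah \" blah")')
print(values)  # ['name', 'Blah " blah']

# Key/value pairs can then be rebuilt by pairing up consecutive items.
items = parse_quoted_list('("name" "Thunderbird" "version" "115.4.1")')
pairs = dict(zip(items[::2], items[1::2]))
print(pairs)  # {'name': 'Thunderbird', 'version': '115.4.1'}
```

The capture group removes the outer quotes, so no eval call is needed; the trade-off is that the escape handling has to be written by hand.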
from lark.
Sorry I thought that the question tag was designed for this purpose.
from lark.