Comments (13)

erezsh commented on June 1, 2024

Again, Lark does that with a regexp... Anyway, this isn't exactly an issue with Lark. Next time, such discussions are better placed at the discussions tab: https://github.com/lark-parser/lark/discussions

from lark.

MegaIng commented on June 1, 2024

Can you provide a few examples? I don't really understand the format, and how it's "contained in json". The example (?) you provided just looks very invalid.

P-EB commented on June 1, 2024

@MegaIng it looks invalid because it's an excerpt, but the right-hand side is exactly what I gave. GitHub just filtered out some backslashes, so I escaped the excerpt.

Here is a json :-)

{
  "event": "imap_command_finished",
  "hostname": "host.example.com",
  "start_time": "2023-11-23T10:59:53.033463Z",
  "end_time": "2023-11-23T10:59:53.033673Z",
  "categories": [
    "imap",
    "service:imap"
  ],
  "fields": {
    "user": "dovecot_user_login",
    "local_ip": "XXXXXXX",
    "local_port": 993,
    "remote_ip": "YYYYYYY",
    "remote_port": 10904,
    "session": "T2w7vM8KmCrZRrUC",
    "duration": 121,
    "cmd_tag": "41",
    "cmd_name": "ID",
    "cmd_input_name": "ID",
    "cmd_args": "(\"name\" \"Thunderbird\" \"version\" \"115.4.1\")",
    "cmd_human_args": "(\"name\" \"Thunderbird\" \"version\" \"115.4.1\")",
    "tagged_reply_state": "OK",
    "tagged_reply": "OK ID completed.",
    "last_run_time": "2023-11-23T10:59:53.033432Z",
    "running_usecs": 93,
    "lock_wait_usecs": 0,
    "bytes_in": 42,
    "bytes_out": 67,
    "reason_code": [
      "imap:cmd_id"
    ]
  }
}

The content is in cmd_args or cmd_human_args.

It follows this RFC.

erezsh commented on June 1, 2024

From a short glance, it doesn't really look like Lark is necessary here. You could probably get this done with just a regexp.

MegaIng commented on June 1, 2024
  • You should just use the stdlib json parser for the entire json. You will then get a string of the form ("name" "Thunderbird" ...), i.e. without the extra outside quotes or the extra backslashes in the middle.
  • You can then use a slightly simpler grammar than what you have right now since you don't have to deal with the backslashes on the outside.
  • To convert an ESCAPED_STRING to an actual Python string, you can just use eval. Since you know that it matches a specific regex, this isn't a dangerous use of eval.
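
A minimal sketch of the first bullet, using only the standard library: decoding the event with `json.loads` already removes the JSON-level backslashes, so whatever parser comes next only sees the plain IMAP list. The `raw_event` literal is a trimmed copy of the example posted above, and the variable names are illustrative.

```python
import json

# Trimmed copy of the event shown earlier in the thread. A raw string is used
# so the backslashes reach json.loads exactly as they appear in the log.
raw_event = r'''
{
  "event": "imap_command_finished",
  "fields": {
    "cmd_name": "ID",
    "cmd_args": "(\"name\" \"Thunderbird\" \"version\" \"115.4.1\")"
  }
}
'''

event = json.loads(raw_event)
cmd_args = event["fields"]["cmd_args"]

# The JSON escaping is gone; only the IMAP-level quoting remains.
print(cmd_args)  # ("name" "Thunderbird" "version" "115.4.1")
```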

erezsh commented on June 1, 2024

this isn't a dangerous use of eval

That is what https://docs.python.org/3/library/ast.html#ast.literal_eval is for.
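
For illustration, a small sketch of `ast.literal_eval` applied to the text of a quoted token. It accepts only Python literals, so any other expression raises `ValueError` instead of being executed. The sample tokens are made up, and the sketch assumes the escapes inside the token are also valid Python string escapes.

```python
import ast

# Text of a quoted token, surrounding double quotes included.
token = r'"Blah \" blah"'

value = ast.literal_eval(token)
print(value)  # Blah " blah

# A non-literal expression is rejected rather than evaluated.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
else:
    rejected = False
print(rejected)  # True
```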

MegaIng commented on June 1, 2024

That is what https://docs.python.org/3/library/ast.html#ast.literal_eval is for.

Which also just calls eval in this situation; look at the source code.

P-EB commented on June 1, 2024

From a short glance, it doesn't really look like Lark is necessary here. You could probably get this done with just a regexp.

You're totally right, it was my first move, but I think it's more fail-proof to use Lark, and also far more elegant. Also, theoretically, the values may contain double quotes, which wouldn't be handled properly by classic regex logic, would they?

  • You should just use the stdlib json parser for the entire json. You will then get a string of the form ("name" "Thunderbird" ...), i.e. without the extra outside quotes or the extra backslashes in the middle.

Already done. The backslashes you see are only there because the excerpt was taken from a JSON representation; the string I give to Lark is of course ("name" "xxx" "version" "xxx").

  • You can then use a slightly simpler grammar than what you have right now since you don't have to deal with the backslashes on the outside.

I don't see how, this doesn't work when I try to drop the escaped quotes I put.

  • To convert an ESCAPED_STRING to an actual Python string, you can just use eval. Since you know that it matches a specific regex, this isn't a dangerous use of eval.

Ack, thanks!

MegaIng commented on June 1, 2024
  • You can then use a slightly simpler grammar than what you have right now since you don't have to deal with the backslashes on the outside.

I don't see how, this doesn't work when I try to drop the escaped quotes I put.

Right, your current tripled up backslashes are because you aren't using raw strings, missed that part.
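
A hedged sketch of the raw-string point. The grammar from the original post is not shown in this excerpt, so the pattern below is only an illustrative quoted-string regex, written once as a normal string literal and once as a raw string literal.

```python
import re

# The same regular expression twice. In a normal string literal every
# backslash meant for the regex engine has to be doubled, and the double
# quotes need escaping as well.
normal = "\"(?:[^\"\\\\]|\\\\.)*\""
raw = r'"(?:[^"\\]|\\.)*"'

print(normal == raw)  # True

# Both spellings match a quoted string containing an escaped double quote.
sample = r'"Blah \" blah"'
matched = re.fullmatch(raw, sample) is not None
print(matched)  # True
```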

P-EB commented on June 1, 2024
  • You can then use a slightly simpler grammar than what you have right now since you don't have to deal with the backslashes on the outside.

I don't see how, this doesn't work when I try to drop the escaped quotes I put.

Right, your current tripled up backslashes are because you aren't using raw strings, missed that part.

I'm open to implementing something more elegant if you think there is a way.

erezsh commented on June 1, 2024

@P-EB At the end of the day, ESCAPED_STRING is parsed as a regexp

If you use a regexp directly, you can also use capture groups to remove the double quotes, and not have to call eval, which is arguably a bit more efficient. (though that doesn't matter much)
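
As a rough sketch of the capture-group idea, using only the `re` module: the group keeps what sits between the double quotes, so no `eval` call is needed. This simple pattern assumes that values never contain an escaped double quote; the `client_id` name is illustrative.

```python
import re

cmd_args = '("name" "Thunderbird" "version" "115.4.1")'

# The capture group returns only the text between each pair of double quotes.
values = re.findall(r'"([^"]*)"', cmd_args)

# Keys and values alternate, so pair them up.
client_id = dict(zip(values[0::2], values[1::2]))
print(client_id)  # {'name': 'Thunderbird', 'version': '115.4.1'}
```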

P-EB commented on June 1, 2024

@P-EB At the end of the day, ESCAPED_STRING is parsed as a regexp

If you use a regexp directly, you can also use capture groups to remove the double quotes, and not have to call eval, which is arguably a bit more efficient. (though that doesn't matter much)

I read you, but as stated, the values in the tuple might contain double quotes (e.g. ("name" "Blah \" blah")), and while tokenization with Lark seems to handle this properly, I am not aware of a way to do that efficiently with a regexp. Do you have a solution I'm not aware of? I'm totally open to the idea that there is a feature of re I don't know.
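
One possible way to handle escaped quotes with plain `re`, offered as a sketch rather than a definitive answer: let the pattern accept either an ordinary character or a backslash followed by any character, then drop the backslashes afterwards. The helper name `parse_pairs` is hypothetical, and the sketch assumes every key and value is a double-quoted string.

```python
import re

# A quoted string: between two double quotes, any mix of characters that are
# neither a quote nor a backslash, and backslash-escaped characters.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

# A backslash followed by one character; used to remove the backslash.
ESCAPE = re.compile(r'\\(.)')


def parse_pairs(cmd_args: str) -> dict:
    """Return the alternating key/value strings of cmd_args as a dict."""
    raw_values = QUOTED.findall(cmd_args)
    values = [ESCAPE.sub(r'\1', v) for v in raw_values]
    return dict(zip(values[0::2], values[1::2]))


result = parse_pairs(r'("name" "Blah \" blah" "version" "1.0")')
print(result)  # {'name': 'Blah " blah', 'version': '1.0'}
```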

P-EB commented on June 1, 2024

Sorry, I thought that the question tag was designed for this purpose.
