Comments (7)

badicsalex commented on June 12, 2024

Unfortunately this can only be done with quite a bit of hacking currently. Maybe I should implement proper "external function" support.

But if you really want to do it right now, there is a way.

You have to manually create a parse_XID_START function in the same namespace as the compiled grammar (this should be easy if you use the macro, a bit harder if you use a build script).

Something like the following:

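// Provides is_xid_start / is_xid_continue on char.
// The peginator runtime types used below must also be in scope.
use unicode_xid::UnicodeXID;

peginate!("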
@export
Idents = {idents:Ident};

@string
@no_skip_ws
Ident = (XID_START | '_') {XID_CONTINUE};
");

pub fn parse_XID_START<'a, _CT>(
    state: ParseState<'a>,
    _tracer: impl ParseTracer,
    _cache: &_CT,
) -> ParseResult<'a, char> {
    // Boilerplate
    let result = state.s().chars().next().ok_or_else(|| {
        state
            .clone()
            .report_error(ParseErrorSpecifics::Other)
    })?;

    // Actual business logic
    if !result.is_xid_start() {
        return Err(state.report_error(ParseErrorSpecifics::Other));
    }

    // More boilerplate
    // We are skipping a full character, so we should be OK.
    let state = unsafe { state.advance(result.len_utf8()) };
    Ok(ParseOk {
        result,
        state,
        farthest_error: None,
    })
}

pub fn parse_XID_CONTINUE<'a, _CT>(
    state: ParseState<'a>,
    _tracer: impl ParseTracer,
    _cache: &_CT,
) -> ParseResult<'a, char> {
    // Boilerplate
    let result = state.s().chars().next().ok_or_else(|| {
        state
            .clone()
            .report_error(ParseErrorSpecifics::Other)
    })?;

    // Actual business logic
    if !result.is_xid_continue() {
        return Err(state.report_error(ParseErrorSpecifics::Other));
    }

    // More boilerplate
    // We are skipping a full character, so we should be OK.
    let state = unsafe { state.advance(result.len_utf8()) };
    Ok(ParseOk {
        result,
        state,
        farthest_error: None,
    })
}

#[test]
fn test_macro() {
    let s = Idents::parse("xyz áé8").unwrap();
    assert_eq!(s.idents, vec!["xyz", "áé8"]);
}
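
Stripped of the peginator-specific types, the boilerplate in both functions is the same three steps: peek at the first character, test it, and advance by its UTF-8 length. Here is a minimal std-only sketch of that pattern. Note that take_char_if is a hypothetical helper, not part of peginator, and char::is_alphabetic can stand in for is_xid_start if you want to try it without the unicode_xid dependency.

```rust
/// Hypothetical helper (not a peginator API): consume one leading character
/// from `input` if it satisfies `pred`, returning it with the remaining input.
fn take_char_if(input: &str, pred: impl Fn(char) -> bool) -> Option<(char, &str)> {
    // Peek at the first character; `None` on empty input.
    let c = input.chars().next()?;
    if !pred(c) {
        return None;
    }
    // `len_utf8` is the byte width of `c`, so the slice starts on a
    // character boundary and cannot panic.
    Some((c, &input[c.len_utf8()..]))
}
```

This is also why the unsafe advance in the functions above should be sound: the offset is always the byte width of a character that was just read from the front of the input.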

from peginator.

badicsalex commented on June 12, 2024

I understand the above is not convenient. What if I implemented a syntax like this:

@custom_char(crate::some_module::check_xid)
XID_START;

And then in some_module.rs, you could have a function like this:

use unicode_xid::UnicodeXID;

fn check_xid(c: char) -> bool {
    c.is_xid_start()
}

You could maybe even use unicode_xid directly:

@custom_char(unicode_xid::UnicodeXID::is_xid_continue)
XID_CONTINUE;

Would it fit your use-case?

oovm commented on June 12, 2024

This hack meets my needs.

If it were to stabilize as a feature, I would like it to be:

@custom_char(char_xid_start) // advance 1 char
XID_START = 'ANY';  // annotative description, do not use
@custom_string(keyword_where, 5) // advance 5 chars 
WHERE = 'case insensitive where'; // annotative description, do not use

@check_string(keyword_checker)
KEYWORD = Ident; // Requires successful capture of Ident and keyword_checker to return true

with the function signatures:

fn char_xid_start(char) -> bool;
fn keyword_where(&str) -> bool;
fn keyword_checker(&str) -> bool;

badicsalex commented on June 12, 2024

The syntax I'm currently thinking about is:

@char
@check(unicode_xid::UnicodeXID::is_xid_start)
XID_START = char; # In this case "char" is actually used

@extern(crate::keyword_where -> String)
WHERE; # no body, prefer comments

@check(crate::keyword_check)
KEYWORD = Ident;

There would be two new additions:

@check directive
The function gets whatever the rule spits out (char in case of @char rules, strings or structs in case of string or struct rules), and should return a bool.
So fn char_xid_start(char) -> bool and fn keyword_checker(&str) -> bool both fit here, but you could also do checks on more complex structures with multiple fields in the middle of parsing.
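
As an illustration, a check function for the KEYWORD rule might look like the sketch below. It assumes the rule hands the captured identifier to the function as a &str, as described above. The KEYWORDS list is hypothetical and only there to make the example concrete.

```rust
/// Hypothetical reserved-word list, for illustration only.
const KEYWORDS: &[&str] = &["where", "let", "fn"];

/// Candidate `@check` function for a string rule: accept the captured
/// identifier only if it is a reserved word (ASCII case-insensitive).
fn keyword_checker(ident: &str) -> bool {
    KEYWORDS.iter().any(|kw| kw.eq_ignore_ascii_case(ident))
}
```

With this in place, KEYWORD would match "WHERE" or "let" but reject an ordinary identifier such as "xyz".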

@extern directive

It is a completely external parse function with the following signature:

fn custom_fn(&str) -> Result<(T, usize), &'static str>

If the string can be parsed OK, you return a tuple with the result and the number of bytes (!) the parser consumed from the input, wrapped in Ok. If it cannot be parsed according to the rule, you return a static error message string wrapped in Err.

In case of the keyword where, it would probably look something like this:

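fn keyword_where(s: &str) -> Result<(String, usize), &'static str> {
    // Look only at the first 5 bytes. `get` returns None if the input is
    // too short or byte 5 is not a char boundary, so this never panics.
    match s.get(..5) {
        Some(prefix) if prefix.eq_ignore_ascii_case("where") => {
            Ok((prefix.to_string(), prefix.len()))
        }
        _ => Err("Expected string 'where' (case insensitive)"),
    }
}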

Or you could also return () or a named empty struct for efficiency.

It could also be used to parse numbers in place, with something like fn number(&str) -> Result<(i64, usize), &'static str>.
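
A minimal sketch of such a number function, assuming the proposed signature and a deliberately simple grammar (an optional '-' followed by ASCII digits), could be:

```rust
/// Sketch of an `@extern`-style parse function using the proposed signature:
/// parses an optional '-' followed by ASCII digits from the start of `s`.
fn number(s: &str) -> Result<(i64, usize), &'static str> {
    // Length in bytes of the optional sign.
    let sign_len = if s.starts_with('-') { 1 } else { 0 };
    // Count the ASCII digits after the sign; each is exactly one byte.
    let digit_len = s[sign_len..]
        .bytes()
        .take_while(|b| b.is_ascii_digit())
        .count();
    if digit_len == 0 {
        return Err("Expected a number");
    }
    let consumed = sign_len + digit_len;
    // The slice is only sign + digits, so `parse` can fail only on overflow.
    let value = s[..consumed]
        .parse::<i64>()
        .map_err(|_| "Number out of range")?;
    Ok((value, consumed))
}
```

The second tuple element is the byte count, so the caller knows where to resume parsing after the number.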

You could also do the requested r#"-string feature. In that case you would return the parsed string literal, but skip the starting and ending ##-s. (I really don't want to implement the stack; I don't think it's a good addition to PEGs.)
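
For example, a raw-string function under the same proposed signature might be sketched as follows. The name raw_string and the error messages are made up for illustration. It counts the opening '#' characters and then looks for a closing quote followed by the same number of them, so no stack is needed in the grammar.

```rust
/// Sketch of an `@extern`-style function for Rust-like raw strings such as
/// r#"..."#. Returns the text between the quotes and the total bytes consumed.
fn raw_string(s: &str) -> Result<(String, usize), &'static str> {
    let rest = s.strip_prefix('r').ok_or("Expected raw string")?;
    // Count the opening '#' characters (one byte each).
    let hashes = rest.bytes().take_while(|&b| b == b'#').count();
    let body = rest[hashes..]
        .strip_prefix('"')
        .ok_or("Expected '\"' after the opening hashes")?;
    // The literal ends at the first '"' followed by the same number of '#'.
    let closing = format!("\"{}", "#".repeat(hashes));
    let end = body.find(closing.as_str()).ok_or("Unterminated raw string")?;
    // 'r' + hashes + opening quote + content + closing delimiter.
    let consumed = 1 + hashes + 1 + end + closing.len();
    Ok((body[..end].to_string(), consumed))
}
```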

Any comments?

badicsalex commented on June 12, 2024

By the way, is the case insensitive match common?

Because I think adding a case insensitive string literal and char literal shouldn't be a big problem (the biggest problem is coming up with a good syntax for it).

badicsalex commented on June 12, 2024

Please see if the newly added features satisfy your needs. If so, I'll close the issue.

oovm commented on June 12, 2024

Good, this approach is very scalable.
