Comments (7)
Unfortunately, this currently takes quite a bit of hacking. Maybe I should implement proper "external function" support.
But if you really want to do it right now, there is a way.
You have to manually create parse_XID_START and parse_XID_CONTINUE functions in the same namespace as the compiled grammar (easy if you use the macro, a bit harder if you use a build script).
Something like the following:
// is_xid_start / is_xid_continue on char come from the unicode_xid crate.
// The peginator runtime types used below (ParseState, ParseTracer, ParseResult,
// ParseOk, ParseErrorSpecifics) must also be in scope.
use unicode_xid::UnicodeXID;

peginate!("
@export
Idents = {idents:Ident};

@string
@no_skip_ws
Ident = (XID_START | '_') {XID_CONTINUE};
");

pub fn parse_XID_START<'a, _CT>(
    state: ParseState<'a>,
    _tracer: impl ParseTracer,
    _cache: &_CT,
) -> ParseResult<'a, char> {
    // Boilerplate: fetch the next character, or fail at end of input
    let result = state.s().chars().next().ok_or_else(|| {
        state
            .clone()
            .report_error(ParseErrorSpecifics::Other)
    })?;
    // Actual business logic
    if !result.is_xid_start() {
        return Err(state.report_error(ParseErrorSpecifics::Other));
    }
    // More boilerplate
    // We are skipping a full character, so we should be OK.
    let state = unsafe { state.advance(result.len_utf8()) };
    Ok(ParseOk {
        result,
        state,
        farthest_error: None,
    })
}

pub fn parse_XID_CONTINUE<'a, _CT>(
    state: ParseState<'a>,
    _tracer: impl ParseTracer,
    _cache: &_CT,
) -> ParseResult<'a, char> {
    // Boilerplate: fetch the next character, or fail at end of input
    let result = state.s().chars().next().ok_or_else(|| {
        state
            .clone()
            .report_error(ParseErrorSpecifics::Other)
    })?;
    // Actual business logic: this is the only line that differs from parse_XID_START
    if !result.is_xid_continue() {
        return Err(state.report_error(ParseErrorSpecifics::Other));
    }
    // More boilerplate
    // We are skipping a full character, so we should be OK.
    let state = unsafe { state.advance(result.len_utf8()) };
    Ok(ParseOk {
        result,
        state,
        farthest_error: None,
    })
}

#[test]
fn test_macro() {
    let s = Idents::parse("xyz áé8").unwrap();
    assert_eq!(s.idents, vec!["xyz", "áé8"]);
}
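Stripped of peginator's state and error types, both functions above do the same thing: look at the first character, test it, and advance by its UTF-8 length. Here is a minimal standalone sketch of just that logic. The helper name match_char is hypothetical (it is not part of peginator), and it returns the remaining input instead of a ParseState.

```rust
/// Hypothetical helper, not part of peginator: match one leading char of
/// `input` against `pred`, returning the char and the remaining input.
fn match_char<'a>(input: &'a str, pred: impl Fn(char) -> bool) -> Option<(char, &'a str)> {
    // Fail at end of input
    let c = input.chars().next()?;
    // The "business logic": reject characters the predicate does not accept
    if !pred(c) {
        return None;
    }
    // Advance by the char's UTF-8 length so the remainder starts on a char boundary
    Some((c, &input[c.len_utf8()..]))
}
```

With the std predicate char::is_alphabetic standing in for is_xid_start, match_char("áé8", char::is_alphabetic) yields 'á' and the remainder "é8", while match_char("8x", char::is_alphabetic) yields None.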
from peginator.
I understand the above is not convenient. What if I implemented a syntax like this:
@custom_char(crate::some_module::check_xid)
XID_START;
And then in some_module.rs, you could have a function like this:
pub fn check_xid(c: char) -> bool {
    use unicode_xid::UnicodeXID;
    c.is_xid_start()
}
Maybe even use the unicode_xid directly:
@custom_char(unicode_xid::UnicodeXID::is_xid_continue)
XID_CONTINUE;
Would it fit your use-case?
from peginator.
This hacking meets my needs.
If this were stabilized as a feature, I would like it to look like this:
@custom_char(char_xid_start) // advance 1 char
XID_START = 'ANY'; // annotative description, do not use
@custom_string(keyword_where, 5) // advance 5 chars
WHERE = 'case insensitive where'; // annotative description, do not use
@check_string(keyword_checker)
KEYWORD = Ident; // Requires successful capture of Ident and keyword_checker to return true
with the function signatures:
fn char_xid_start(c: char) -> bool;
fn keyword_where(s: &str) -> bool;
fn keyword_checker(s: &str) -> bool;
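For illustration, the three functions could be as small as the following sketch. It uses only std, so char::is_alphabetic is merely a rough stand-in for a real XID_START test, and the KEYWORDS list is a made-up example.

```rust
/// Rough stand-in for a real XID_START test (assumption: std only, no unicode_xid).
fn char_xid_start(c: char) -> bool {
    c.is_alphabetic()
}

/// True if `s` is exactly the keyword `where`, ignoring ASCII case.
fn keyword_where(s: &str) -> bool {
    s.eq_ignore_ascii_case("where")
}

/// Hypothetical keyword list, for illustration only.
const KEYWORDS: [&str; 3] = ["select", "from", "where"];

/// True if an already captured identifier is one of the keywords.
fn keyword_checker(s: &str) -> bool {
    KEYWORDS.iter().any(|k| s.eq_ignore_ascii_case(k))
}
```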
from peginator.
The syntax I'm currently thinking about is:
@char
@check(unicode_xid::UnicodeXID::is_xid_start)
XID_START = char; # In this case "char" is actually used
@extern(crate::keyword_where -> String)
WHERE; # no body, prefer comments
@check(crate::keyword_check)
KEYWORD = Ident;
There would be two new additions:
@check directive
The function gets whatever the rule produces (a char for @char rules, a string or struct for string or struct rules) and should return a bool.
So fn char_xid_start(char) -> bool and fn keyword_checker(&str) -> bool both fit here, but you could also run checks on more complex structures with multiple fields in the middle of parsing.
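Semantically, a checked rule is "parse first, then reject the result if the check says no". The sketch below shows that idea outside of peginator. All names (parse_ident, with_check, is_where) are hypothetical, and the rule is modelled as a plain function returning the value and the bytes consumed, which is an assumption and not how the generated code looks.

```rust
/// Hypothetical stand-in for a generated rule: parses a run of ASCII letters
/// and returns the matched text plus the number of bytes consumed.
fn parse_ident(input: &str) -> Option<(&str, usize)> {
    let len = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if len == 0 {
        None
    } else {
        Some((&input[..len], len))
    }
}

/// Run `rule`, then keep its result only if `check` accepts the parsed value.
fn with_check<'a, T>(
    input: &'a str,
    rule: impl Fn(&'a str) -> Option<(T, usize)>,
    check: impl Fn(&T) -> bool,
) -> Option<(T, usize)> {
    rule(input).filter(|(value, _)| check(value))
}

/// Example check: the parsed identifier is `where`, ignoring ASCII case.
/// The argument is `&&str` because the rule's output type here is `&str`.
fn is_where(ident: &&str) -> bool {
    ident.eq_ignore_ascii_case("where")
}
```

Here with_check("WHERE x", parse_ident, is_where) succeeds with ("WHERE", 5), while with_check("when x", parse_ident, is_where) fails even though the identifier itself parsed.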
@extern directive
It is a completely external parse function with the following signature:
fn custom_fn(&str) -> Result<(T, usize), &'static str>
If the string parses successfully, you return a tuple of the result and the number of bytes (!) the parser consumed from the input, wrapped in Ok. If it cannot be parsed according to the rule, you return a static error message string wrapped in Err.
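To make the "bytes, not characters" point concrete, here is a sketch of how a caller might consume such a function. apply_extern and e_acute are hypothetical names, and this is only an illustration of the contract, not peginator's actual implementation.

```rust
/// The signature proposed for external parse functions.
type ExternFn<T> = fn(&str) -> Result<(T, usize), &'static str>;

/// Hypothetical driver: call an external parse function and advance the input
/// by the number of bytes it reports.
fn apply_extern<'a, T>(input: &'a str, f: ExternFn<T>) -> Result<(T, &'a str), &'static str> {
    let (value, consumed) = f(input)?;
    // `get` returns None if `consumed` is past the end or splits a multi-byte char
    input
        .get(consumed..)
        .map(|rest| (value, rest))
        .ok_or("extern function reported an invalid byte count")
}

/// Example external function: matches a single leading 'é',
/// which is one character but two bytes in UTF-8.
fn e_acute(input: &str) -> Result<((), usize), &'static str> {
    if input.starts_with('é') {
        Ok(((), 'é'.len_utf8()))
    } else {
        Err("expected 'é'")
    }
}
```

Reporting 1 instead of 2 for 'é' would land in the middle of the character, which is why the driver in this sketch validates the count rather than slicing directly.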
In case of the keyword where, it would probably look something like this:
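fn keyword_where(s: &str) -> Result<(String, usize), &'static str> {
    // Look only at the first 5 bytes; `get` returns None if the input is
    // shorter than that or if byte 5 is not on a char boundary.
    match s.get(..5) {
        Some(prefix) if prefix.eq_ignore_ascii_case("where") => {
            Ok((prefix.to_string(), prefix.len()))
        }
        _ => Err("Expected string 'where' (case insensitive)"),
    }
}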
Or you could also return () or a named empty struct for efficiency.
It could also be used to parse numbers in place with something like fn number(&str) -> Result<(i64, usize), &'static str>
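A sketch of what such a number function might look like, assuming decimal integers with an optional leading minus sign (the exact number syntax is my assumption here):

```rust
fn number(input: &str) -> Result<(i64, usize), &'static str> {
    // Optional leading minus sign, then ASCII digits; all of these are 1-byte chars
    let sign_len = usize::from(input.starts_with('-'));
    let digit_count = input[sign_len..]
        .bytes()
        .take_while(u8::is_ascii_digit)
        .count();
    if digit_count == 0 {
        return Err("expected a number");
    }
    let len = sign_len + digit_count;
    // Parse exactly the matched prefix and report how many bytes it covered
    input[..len]
        .parse::<i64>()
        .map(|value| (value, len))
        .map_err(|_| "number does not fit in i64")
}
```

For the input "42 rest" this returns the value 42 with 2 bytes consumed, leaving " rest" for the rest of the grammar.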
You could also do the requested r#"-string feature. In that case you would return the parsed string literal, but skip the starting and ending ##-s. (I really don't want to implement the stack, I think it's not a good addition to PEGs)
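As a rough sketch of the raw string case, assuming Rust-like syntax (r, then any number of #, then a quoted body closed by a quote and the same number of #), the external function could count the hashes itself, so no parser stack is needed. The name raw_string is hypothetical.

```rust
/// Sketch of an external parse function for Rust-like raw strings such as r##"..."##.
/// Returns the body without the delimiters, plus the total bytes consumed.
fn raw_string(input: &str) -> Result<(String, usize), &'static str> {
    let rest = input.strip_prefix('r').ok_or("expected 'r'")?;
    // Count the opening '#' characters
    let hashes = rest.bytes().take_while(|&b| b == b'#').count();
    let body = rest[hashes..].strip_prefix('"').ok_or("expected '\"'")?;
    // The literal ends at the first '"' followed by the same number of '#'
    let terminator = format!("\"{}", "#".repeat(hashes));
    let end = body
        .find(terminator.as_str())
        .ok_or("unterminated raw string")?;
    // Bytes consumed: 'r' + hashes + opening quote + body + terminator
    let consumed = 1 + hashes + 1 + end + terminator.len();
    Ok((body[..end].to_string(), consumed))
}
```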
Any comments?
from peginator.
By the way, is case-insensitive matching common?
Because I think adding a case-insensitive string literal and char literal shouldn't be a big problem (the biggest problem is coming up with a good syntax for it).
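For reference, the matching itself could be sketched like this. match_ignore_case is a hypothetical helper, and comparing per-character lowercase mappings is only an approximation of full Unicode case folding.

```rust
/// Hypothetical helper: match `literal` at the start of `input`, ignoring case,
/// and return the number of input bytes consumed.
fn match_ignore_case(input: &str, literal: &str) -> Option<usize> {
    let mut consumed = 0;
    let mut input_chars = input.chars();
    for expected in literal.chars() {
        // Fail if the input ends before the literal does
        let actual = input_chars.next()?;
        // Compare the full lowercase mappings; one char may lowercase to several
        if !actual.to_lowercase().eq(expected.to_lowercase()) {
            return None;
        }
        // Count bytes of the input, which may differ from the literal's byte length
        consumed += actual.len_utf8();
    }
    Some(consumed)
}
```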
from peginator.
Please see if the newly added features satisfy your needs. If so, I'll close the issue.
from peginator.
Good, this approach is very scalable.
from peginator.