kevinmehall / rust-peg

Parsing Expression Grammar (PEG) parser generator for Rust

Home Page: https://crates.io/crates/peg

License: MIT License

Shell 0.11% Rust 99.89%

parsing-expression-grammars peg rust grammar parsing parser-generator

rust-peg's Issues

Returning a Mutable Variable from a Rule

Hello, I was wondering if there was any way to return a variable from a rule to a parent rule so that it would be mutable?

For example:

#[export]
document -> Document
    = a:startList ( b:block { a.children.add(b) } )*
    { a }

startList -> Element
    = &.
    { new_mut_list() }

Case insensitive literal match should use case folding instead of to_uppercase

Uppercase is not a correct case folding for all unicode characters. See http://www.w3.org/International/wiki/Case_folding for details.

Of course, the only use of this feature I know of is a SQL parser, and there as in most programming languages, all identifiers are ASCII, so this is likely just pedantry.
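
For the ASCII-only case mentioned above, a minimal sketch in plain Rust (not rust-peg's generated code) could look like the following. The helper name match_literal_ascii_ci is hypothetical; it relies only on str::get and str::eq_ignore_ascii_case from the standard library, which does not itself provide full Unicode case folding.

```rust
/// Hypothetical helper (not part of rust-peg): try to match `literal` at byte
/// offset `pos` of `input`, ignoring ASCII case only.
/// Returns the position just past the match, or None if there is no match.
fn match_literal_ascii_ci(input: &str, pos: usize, literal: &str) -> Option<usize> {
    let end = pos.checked_add(literal.len())?;
    // `get` returns None when the range is out of bounds or splits a character.
    let candidate = input.get(pos..end)?;
    if candidate.eq_ignore_ascii_case(literal) {
        Some(end)
    } else {
        None
    }
}
```

By contrast, comparing through to_uppercase treats "ß" and "SS" as equal, because "ß".to_uppercase() yields "SS" in Rust; this is the kind of mismatch that proper case folding is meant to avoid.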

Performance problems

I wrote a parser to recognize a Lua-like language, and it seems that it runs very slowly: Parsing the function call f(1,2,3,4,5,6,6,7,87,8,9,9,5,5,4) takes a whole 280 ms (according to cargo bench, which takes over a minute to complete).

The grammar I used can be found here.

Can this be due to my expression rules using one rule per precedence level (there are currently 8 precedence levels) or is rust-peg more likely to be the culprit?

Thanks

Error: Expected HashSet

I'm hitting an error with rust-peg. It's expecting a newline in a statement that shouldn't need one:

main.rs:56:1: 123:5 error: Error at Line 58:12: Expected HashSet {"\u{2029}", "\r\n", "\u{2028}", "\n", "\r"}
main.rs:56 peg! dot(r#"
main.rs:57 #[pub]
main.rs:58 graph -> DotGraph
main.rs:59  = __ "strict"? __ gt:GraphType __ id:ID? __ "{" __ stmts:stmt_list __ "}" __
main.rs:60  {
main.rs:61      DotGraph::new(id, gt, stmts)
           ...

"skip"-rule

Is there an easy way to always skip whitespace, or do I need to state explicitly that it is allowed but not required between every token?
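
To make the question concrete, here is a sketch of what a "skip" rule does in a hand-written parser: every token match first consumes optional whitespace. This is plain Rust for illustration only; skip_ws and token are hypothetical names, not rust-peg features.

```rust
/// Skip any run of whitespace starting at `pos` and return the new position.
fn skip_ws(input: &str, pos: usize) -> usize {
    let rest = &input[pos..];
    pos + (rest.len() - rest.trim_start().len())
}

/// Match `literal` after optional leading whitespace.
/// Returns the position just past the literal on success.
fn token(input: &str, pos: usize, literal: &str) -> Option<usize> {
    let start = skip_ws(input, pos);
    if input[start..].starts_with(literal) {
        Some(start + literal.len())
    } else {
        None
    }
}
```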

Support for generic type parameters

Hi,

First of all, thanks for this great library :)

I'd like to instantiate different implementations of objects with the same grammar. This is like an abstract factory where the grammar uses the factory to create the objects.

If I could define a generic type parameter with a bound on a pub generated function then I could use the factory.

What do you think about this use-case?

UPDATE:
This issue is orthogonal to #84 because I don't need an extra argument to the functions just an <T: Factory> generic type parameter on the generated functions. My Factory has only static methods.

Allow to return tuples

Currently it is impossible to return a tuple from a rule, although it would sometimes be quite useful.

Bind variables to lookahead values

Can I bind variables to the values of lookahead expressions? Like:

foo -> &'input str
  = var:&bar { var }

So it'd return the value of the bar expression, without consuming the input. I tried it, but match_str is empty and var is the unit value.

Thanks for your help and this great library!

Custom error messages or rule names

A hack to achieve custom error messages after #71:

_binop
    = "+" / "-" / "*" / "/" / ">=" / "<=" / "==" / "!=" / ">" / "<"

binop
    = op:_binop? {? match op {
        Some(op) => Ok(op),
        None => Err("binary operator"),
    }}

peg.js supports associating human-readable names with rules which are used in the "expected" error message.

integer "integer" = [0-9]+
Line 1, column 1: Expected integer but "a" found.

Since rust-peg already uses pseudo-rust attributes, this could perhaps be:

#[expected = "integer"]
integer = [0-9]+

if we wanted to clone the peg.js semantics.

False-y values don't match

At least, I think that is what happens. Here is an example:

#![feature(globs)]
#![feature(phase)]

#[phase(plugin)]
extern crate peg_syntax_ext;

mod ast {
    #[deriving(PartialEq, Show)]
    pub enum Exp {
        If(Box<Exp>, Box<Exp>, Box<Exp>),
        Int(int),
    }
}

peg! parse(r#"
use ast::*;

#[export]
exp -> Exp = ws e:(if / int) ws { e }

if -> Exp = "if" a:exp "then" b:exp "else" c:exp { If(box a, box b, box c) }

int -> Exp = [1-9][0-9]* { Int(from_str::<int>(match_str).unwrap()) }

ws -> () = [\t\r\n\f ]*
"#)

fn main() {
    use ast::*;
    assert_eq!(parse::exp("if 1 then 2 else 3"),
               Ok(If(box Int(1), box Int(2), box Int(3))));  // ok
    assert_eq!(parse::exp("if 0 then 1 else 2"),
               Ok(If(box Int(0), box Int(1), box Int(2))));  // fail
    assert_eq!(parse::exp("if 2 then 1 else 0"),
               Ok(If(box Int(2), box Int(1), box Int(0))));  // fail
}
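
One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the int rule is written as [1-9][0-9]*, so the input 0 never matches it, which would account for both failing assertions. The sketch below re-implements that character-class sequence by hand in plain Rust, next to a variant that also accepts a lone 0. Both function names are hypothetical.

```rust
/// Hand-written equivalent of the grammar fragment `[1-9][0-9]*`.
/// Returns the number of bytes matched at the start of `input`, if any.
fn match_int_as_written(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    match bytes.first().copied() {
        Some(b'1'..=b'9') => {}
        _ => return None, // a leading `0` is rejected here
    }
    let rest = bytes[1..].iter().take_while(|b| b.is_ascii_digit()).count();
    Some(1 + rest)
}

/// Hypothetical variant equivalent to `"0" / [1-9][0-9]*`.
fn match_int_with_zero(input: &str) -> Option<usize> {
    if input.starts_with('0') {
        Some(1)
    } else {
        match_int_as_written(input)
    }
}
```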

Return Types with Lifetime Specifiers for Rules

It currently doesn't seem to be possible to designate a return type with a lifetime specifier for anything but &str.

Would it be possible to allow something like this:

post -> Post<'input>
    = header:header eol eol content:content
    { Post::new(header, content) }

Weird parsing error

This grammar:

#[pub]
root -> String
    = code:(code_block / inline / non_code) { "".to_string() }

non_code -> String
    = (!code .)* { match_str.to_string() }

code -> String
    = code_block / inline

inline -> String
    = "`" [^`]+ "`" { match_str.to_string() }

code_block -> String
    = ticks (!ticks .)* ticks { match_str.to_string() }

ticks
    = "\n" " "* "```"

fails at parsing https://raw.githubusercontent.com/suhr/rust_book_ru/master/src/drop.md with

ParseError { line: 4, column: 112, offset: 264, expected: {"[^`]"} }

Is this an expected behaviour?

Crate on crates.io broken

Version 0.2.0 doesn't currently compile (complaining about Path), while the git version is fine. I'm guessing 2fe3c92 fixed this, but no 0.2.1 was released afterwards...

Warn if you name the return value of a rule returning ()

If you forget a return type

name = [a-z]+
fncall -> Call = name:name "()" { Call(name.into_string()) }

we can provide an error message at the rust-peg level that will be more helpful than rustc's "error: type () does not implement any method in scope named into_string" and give a hint in the right direction. It is useless to bind a name to (), so the user meant to specify a return type.

Thanks to @untitaker for running into this pitfall and reporting it.

rust-peg broken on nightly build

A breaking change to syntax::ext::base::ExtCtxt::new(), requiring an additional argument, seems to be preventing rust-peg from building on the latest nightly: rust-lang/rust@e364f0e#diff-5100da09ad52c81568562e36ee359343

I get this error:

$ cargo build
   Compiling peg v0.3.1 (file:///Users/lbuckley/Development/rust-peg)
src/fake_extctxt.rs:9:16: 11:4 error: this function takes 4 parameters but 3 parameters were supplied [E0061]
src/fake_extctxt.rs: 9   let mut cx = syntax::ext::base::ExtCtxt::new(&ps, Vec::new(),
src/fake_extctxt.rs:10     syntax::ext::expand::ExpansionConfig::default("rust-peg".to_string())
src/fake_extctxt.rs:11   );
src/fake_extctxt.rs:9:16: 11:4 help: run `rustc --explain E0061` to see a detailed explanation
error: aborting due to previous error
Could not compile `peg`.

I am very new to Rust, but the fix seems to be simply adding a new Vec::<GatedCfg> to the call to ExtCtxt::new(). After making that change locally the build succeeds and the command line tool seems to work, though I am having trouble getting the compiler plugin to load, so I don't know if the tests work or not. I will poke at that a bit more and put up a PR of some kind.

Can't import from parent module

My main file looks like this:

#![feature(plugin, core)]
#![plugin(peg_syntax_ext)]

enum BinOp {
    AddOp,
    SubOp,
    MulOp,
    DivOp,
}

peg_file! lisp("grammar.rustpeg");

But I can't access BinOp with use super::BinOp in my grammar.rustpeg file:

Compiling lisp v0.0.1 (file:///mnt/ubuntu/home/gulshan/code/lang/rust/lisp)
<quote expansion>:1:1: 1:17 error: use of undeclared type name `BinOp`
<quote expansion>:1 name_66,ctxt_0

Auto generated issue for statements related to ? symbol

The grammar rule
statementList = ( (statement NL) / (LINECOMMENT NL) )*
works like a charm. However, the following rule is problematic:
statementList = ( (statement NL?) / (LINECOMMENT NL) )*
because in the generated method NL? returns an Option<()> rather than (), e.g. it gets passed up in the following way:
let choice_res = {
    let seq_res = parse_statement(input, state, pos);
    match seq_res {
        Matched(pos, _) => {
            match parse_NL(input, state, pos) {
                Matched(newpos, value) => { Matched(newpos, Some(value)) }
                Failed => { Matched(pos, None) }
            }
        }
        Failed => Failed,
    }
};

Since the actual signature of the functions is Result<()> while this branch returns Option<()>, the compiler cannot determine whether choice_res has type () or Option<()>.

Look into syntex to make the library work on 1.0 stable

$ git rev-parse --abbrev-ref HEAD ; git rev-parse HEAD
master
7a4ddb202cb457103827bab91a0bb2f745440b07

$ cargo build
   Compiling peg v0.2.2 (file:///home/n/3p/github.com/kevinmehall/rust-peg)
src/grammar.rs:4:5: 4:29 warning: use of deprecated item: use e.g. i32::from_str_radix, #[warn(deprecated)] on by default
src/grammar.rs:4 use std::num::from_str_radix;
                     ^~~~~~~~~~~~~~~~~~~~~~~~
src/grammar.rs:3970:73: 3970:94 warning: use of deprecated item: use e.g. i32::from_str_radix, #[warn(deprecated)] on by default
src/grammar.rs:3970                                                                         from_str_radix::<u32>(match_str,
                                                                                            ^~~~~~~~~~~~~~~~~~~~~
src/grammar.rs:4046:61: 4046:82 warning: use of deprecated item: use e.g. i32::from_str_radix, #[warn(deprecated)] on by default
src/grammar.rs:4046                                                             from_str_radix::<u32>(match_str,
                                                                                ^~~~~~~~~~~~~~~~~~~~~
src/grammar.rs:4134:97: 4134:118 warning: use of deprecated item: use e.g. i32::from_str_radix, #[warn(deprecated)] on by default
src/grammar.rs:4134                                                                                                 from_str_radix::<u32>(match_str,
                                                                                                                    ^~~~~~~~~~~~~~~~~~~~~
src/grammar.rs:4271:145: 4271:166 warning: use of deprecated item: use e.g. i32::from_str_radix, #[warn(deprecated)] on by default
src/grammar.rs:4271                                                                                                                                                 from_str_radix::<u32>(match_str,
                                                                                                                                                                    ^~~~~~~~~~~~~~~~~~~~~
src/peg_syntax_ext.rs:1:1: 1:107 error: unstable feature
src/peg_syntax_ext.rs:1 #![feature(plugin_registrar, quote, box_syntax, core, collections, rustc_private, box_patterns, str_char)]
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: this feature may not be used in the beta release channel
error: aborting due to previous error
Could not compile `peg`.

To learn more, run the command again with --verbose.

$ rustc --version
rustc 1.0.0-beta (9854143cb 2015-04-02) (built 2015-04-02)

Sugar for source mapping

In conjunction with context objects (#84), the library could have excellent support for annotating the parse tree with source positions for error reporting and interactive editing.

With the #[spanned] attribute on a rule, the rule result would be filtered through a method of the context object before being returned. context.wrap_span(result, start_pos, end_pos). The start position and end position are known to rust-peg, so this saves duplicating that code in an action of every rule that collects source position information. The context object could add additional information such as a filename or global SourceMap offset.
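
A rough sketch of what such a context method might look like, in plain Rust. Only the name wrap_span and its three arguments come from the description above; the Spanned and Context types and the base_offset field are assumptions made for illustration.

```rust
/// A parse-tree node annotated with the byte range it was parsed from.
#[derive(Debug, PartialEq)]
struct Spanned<T> {
    node: T,
    start: usize,
    end: usize,
}

/// Hypothetical context object; `base_offset` stands in for a global
/// SourceMap offset.
struct Context {
    base_offset: usize,
}

impl Context {
    /// What a `#[spanned]` rule might call on exit, passing its result and
    /// the start and end positions the parser already tracks.
    fn wrap_span<T>(&self, result: T, start_pos: usize, end_pos: usize) -> Spanned<T> {
        Spanned {
            node: result,
            start: self.base_offset + start_pos,
            end: self.base_offset + end_pos,
        }
    }
}
```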

error: unresolved import `Switch`

Thank you for such a cool library!

The changes to enum scoping described here landed last night, causing peg builds to break:

$ cargo build 2>&1 | head
   Compiling docopt v0.6.8 (git://github.com/docopt/docopt.rs#b8e7cca7)
   Compiling peg v0.1.0 (https://github.com/kevinmehall/rust-peg.git#4ffe8b68)
/home/emk/.cargo/git/checkouts/docopt.rs-643e15305361a76f/master/src/parse.rs:47:13: 47:19 error: unresolved import `Switch`. There is no `Switch` in `???`
/home/emk/.cargo/git/checkouts/docopt.rs-643e15305361a76f/master/src/parse.rs:47 use {Value, Switch, Counted, Plain, List};
                                                                                             ^~~~~~
/home/emk/.cargo/git/checkouts/docopt.rs-643e15305361a76f/master/src/parse.rs:47:21: 47:28 error: unresolved import `Counted`. There is no `Counted` in `???`
/home/emk/.cargo/git/checkouts/docopt.rs-643e15305361a76f/master/src/parse.rs:47 use {Value, Switch, Counted, Plain, List};
                                                                                                     ^~~~~~~

I'll submit a PR for this issue in just a moment. It's a pretty easy fix.

Return types of expressions w/ repetition operators

There appears to be a problem with expressions in the forms expr* and expr+. For example:

fn main() {
    println!("{:?}", parser::sqayz("'aaaaa'"));
}

peg! parser(r#"
    #[pub]
    sqayz -> String
        // No, I didn't really expect this to compile - just experimenting
        = ['] content:[a]* [']  { content.to_string() }
"#);

Attempting to compile the above produces this error:

Compiling pegtest02 v0.0.1 (file:///home/m/scratch/pegtest02)
<quote expansion>:89:85: 89:103 error: type `collections::vec::Vec<()>` does not implement any method in scope named `to_string`
<quote expansion>:89                                                                   name_498,ctxt_0.name_70,ctxt_0()

Okay, so content:[a]* returns a Vec type, as stated in the documentation. But apparently the specific type is Vec<()>. Just to confirm, I tried this:

peg! parser(r#"
    #[pub]
    sqayz -> Vec<()>
        = ['] content:[a]* [']  { content }
"#);

This compiles, and running the resulting program produces:

Ok([(), (), (), (), ()])

... which doesn't seem like a very useful result; it's certainly not what I want. Okay, here's another version:

 peg! parser(r#"
    #[pub]
    sqayz -> String
        = ['] content:ayz [']  { content.to_string() }

    ayz -> String
        = [a]+                       { match_str.to_string() }
"#);

This compiles, and produces the desired output:

Ok("aaaaa")

Okay, one more:

peg! parser(r#"
    #[pub]
    sqayz -> Vec<String>
        = ['] content:ayz+ [']  { content }

    ayz -> String
        = [a]                   { match_str.to_string() }
"#);

This compiles, but produces the output:

Ok(["a", "a", "a", "a", "a"])

I am not sufficiently knowledgeable about PEG to make any statements about standards-compliance (and I gather PEG isn't exactly a standard anyway), but some of the above behavior is certainly counterintuitive. My comments are as follows:

The statement in the documentation: expression+ - Match one or more repetitions of expression and return the results as a Vec is clearly incomplete. My experiments show that <character_class>+ and <character_class>* return &str, as do <literal_string>+ and <literal_string>* while <nonterminal>+ and <nonterminal>* return vectors as stated.
It seems that <identifier>:<terminal> produces a Vec<()>. If that result is intentional, or if the types of expression that can be labeled are restricted, I think some explanation is needed.

Thanks!

Generated parsers should use a custom error type for Result

When using the try! macro around a generated parser from within a function which returns another Result with a different custom error...

fn foo() -> Result<(), MyCustomError> {
  let value = try!(grammar::expr("xxx"));
  ...
  Ok(())
}

...because the generated parsers use String for Result errors, I must implement the following for my custom error type.

impl FromError<String> for MyCustomError {
  ...
}

In the current situation, I have two choices

Option 1

pub enum MyCustomError { IoError(IoError), ParseError(String) }

Send all String to ParseError. This is bad if there are other libraries which also use String for an error; the semantics of FromError do not make sense. (I'd raise the same issues to the other library for using String). It's an error-space collision!

Option 2

pub enum MyCustomError { IoError(IoError), GenericError(String) }

Send all String to GenericError, which loses the ability to have specificity, such as custom formatting in the implementation of FromError.

Proposal

Have parsers generated by rust-peg instead look something like

fn expr(&str) -> Result<a, ParseError>
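
A sketch of how the proposal could play out, written against present-day Rust, where From and the ? operator take the place of FromError and try!. The fields of ParseError and the parse_expr function are assumptions standing in for whatever a generated parser would expose.

```rust
use std::fmt;

/// Dedicated parser error type, as the proposal suggests.
/// The fields are assumptions for illustration.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub line: usize,
    pub column: usize,
    pub expected: String,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "error at {}:{}: expected {}", self.line, self.column, self.expected)
    }
}

#[derive(Debug, PartialEq)]
pub enum MyCustomError {
    Parse(ParseError),
    Other(String),
}

// `From` is the present-day counterpart of the `FromError` trait mentioned above.
impl From<ParseError> for MyCustomError {
    fn from(e: ParseError) -> Self {
        MyCustomError::Parse(e)
    }
}

/// Hypothetical stand-in for a generated parser entry point.
fn parse_expr(input: &str) -> Result<i64, ParseError> {
    input.trim().parse::<i64>().map_err(|_| ParseError {
        line: 1,
        column: 1,
        expected: "integer".to_string(),
    })
}

fn foo(input: &str) -> Result<i64, MyCustomError> {
    // `?` converts ParseError into MyCustomError through the From impl.
    let value = parse_expr(input)?;
    Ok(value * 2)
}
```

Because ParseError is its own type, the conversion no longer collides with other libraries that report errors as String.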

Can't seem to access peg binary when peg is specified as cargo dependency

Sorry for posting this here, but I am unable to figure this out and so far the people in the rust IRC have not responded either.

I'm not sure how to access the peg binary when my project is using cargo to include peg as a dependency. Is there a standard way of doing this?

For instance, I have the following as a dependency in my Cargo.toml file:

[dependencies]
peg = "0.1.1"

It'd be nice if when I ran cargo build there was a directory of binary executables in target. Then I could just do target/bin/peg my_grammar.rustpeg > src/grammar.rs.

Again, sorry this is also a question about Cargo in general, but I can't seem to figure this out.

Grammar ambiguity between {action} and {count}.

Actions that consist of returning a single integer are ambiguous with the repetition count syntax introduced by #20.

number -> u32
  = "one" {1}
  / "two" {2}
  / "three" {3}

Those should be actions, but are parsed as repeat counts.

One option that minimally solves the problem is to simply drop x {5} in favor of x {5, 5}. A bare comma is not valid Rust, but the distinction between actions and the remaining { , } forms may be hard for humans to parse.

Does anyone have ideas for a new syntax for the bounded-repeat functionality?

infinite loop

When I try to use the following contents, cargo test never returns:

#![feature(plugin)]
#![plugin(peg_syntax_ext)]

peg! line(r#"

#[pub]
line -> Vec<&'input str>
  = line_parts* 

line_parts -> &'input str
  = literal
  / command

#[pub]
literal -> &'input str
  = l:(!"$(" .)* { match_str }

command -> &'input str
  = "$(" c:inside_command ")" { c }

inside_command -> &'input str
  = [a-zA-Z0-9_-]+ { match_str }

"#);

#[test]
fn main() {
  assert_eq!(line::literal("foo ").ok().unwrap(), "foo ");
  assert_eq!(line::literal("foo \n").ok().unwrap(), "foo \n");
  assert_eq!(line::line("foo $(a)").ok().unwrap(), vec!["foo ", "a"]);
}

I think the problem is using the .* expression.

Edit: The example was simplified based on @jonas-schievink 's suggestions.
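
A likely cause, stated as an assumption: literal is (!"$(" .)*, which succeeds without consuming anything when the input is exhausted or the next characters are $(, so line_parts* can keep matching the empty string forever. The sketch below shows a repetition loop with a progress guard in plain Rust; repeat_with_progress_guard and the toy literal function are hypothetical and do not reflect rust-peg's generated code.

```rust
/// Repeat `parse_one` starting at `pos`, stopping on failure or on a match
/// that consumes no input. Without the second check, a sub-parser that can
/// match the empty string would keep this loop running forever.
fn repeat_with_progress_guard<'a, T>(
    input: &'a str,
    mut pos: usize,
    parse_one: impl Fn(&'a str, usize) -> Option<(usize, T)>,
) -> (usize, Vec<T>) {
    let mut results = Vec::new();
    while let Some((next, value)) = parse_one(input, pos) {
        if next == pos {
            break; // zero-width match: no progress was made
        }
        results.push(value);
        pos = next;
    }
    (pos, results)
}

/// Toy counterpart of `(!"$(" .)*`: consumes up to the next "$(" and
/// always succeeds, possibly with an empty match.
fn literal(input: &str, pos: usize) -> Option<(usize, &str)> {
    let rest = &input[pos..];
    let len = rest.find("$(").unwrap_or(rest.len());
    Some((pos + len, &rest[..len]))
}
```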

Removing IdentTT

I'm considering removing IdentTT here. Do you have any objections to that?

The syntax would have to change to something like

peg! {
 mod modname(....)
}

On a match, give user the match line number and column, along with the byte position

It would be convenient to give the user line and column numbers as well as the byte position on a match. This is just a guess, but it might be as easy as calling pos_to_line here:

rust-peg/src/translate.rs

Line 566 in de754ea

let start_pos = pos;

If that's the case, I can go ahead and add this.
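
For reference, converting a byte position to a line and column can be sketched as follows. This is a standalone illustration with a hypothetical name (line_col), not a copy of rust-peg's pos_to_line; it assumes 1-based numbering and counts columns in characters.

```rust
/// Convert a byte offset into a 1-based (line, column) pair.
/// Columns count characters, not bytes. `pos` must lie on a char boundary.
fn line_col(input: &str, pos: usize) -> (usize, usize) {
    let before = &input[..pos];
    let line = before.matches('\n').count() + 1;
    // Byte index where the current line starts.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    (line, column)
}
```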

Add bounded repetitions

It would be nice if this syntax was supported:

x{n,m}    at least n times x and at most m times x
x{,m}     at most m times x (including zero times)
x{n,}     at least n times x
x{n}      exactly n times x

It can be emulated with something like x x (x (x x?)?)? but that quickly gets cumbersome.
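
The intended semantics can be sketched as a small combinator in plain Rust. The names repeat_bounded and x are hypothetical, and the repetition is greedy, as is usual for PEG.

```rust
/// Match `parse_one` between `min` and `max` times (inclusive) starting at
/// `pos`. `max = None` means no upper bound. Returns the end position and
/// the collected values, or None if fewer than `min` repetitions matched.
fn repeat_bounded<T>(
    input: &str,
    mut pos: usize,
    min: usize,
    max: Option<usize>,
    parse_one: impl Fn(&str, usize) -> Option<(usize, T)>,
) -> Option<(usize, Vec<T>)> {
    let mut values = Vec::new();
    while max.map_or(true, |m| values.len() < m) {
        match parse_one(input, pos) {
            Some((next, value)) => {
                values.push(value);
                pos = next;
            }
            None => break,
        }
    }
    if values.len() >= min {
        Some((pos, values))
    } else {
        None
    }
}

/// Toy sub-parser: match a single 'x'.
fn x(input: &str, pos: usize) -> Option<(usize, char)> {
    if input[pos..].starts_with('x') {
        Some((pos + 1, 'x'))
    } else {
        None
    }
}
```

With this helper, x{2,4} corresponds to repeat_bounded(input, pos, 2, Some(4), x), x{n,} to a max of None, and x{n} to min and max both set to n.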

/ expression ordering gives different results when there is a leading partial match

From the readme:

e1 / e2 / e3 - Try to match e1. If the match succeeds, return its result, otherwise try e2, and so on.

From this I expected e2 to begin matching from the same place as e1 began (&c), although that may not be the case... consider the following:

foo -> String
  = bar / baz

qux -> String
  = baz / bar

bar -> String
  = [a-z]+[A-Z]+ { match_str.to_string() }

baz -> String
  = [a-z]+ { match_str.to_string() }

Testing foo and qux gives different results:

println!("{:?}", foo("abcABC"));
println!("{:?}", foo("abc"));

Ok("abcABC")
Ok("abc")

println!("{:?}", qux("abcABC"));
println!("{:?}", qux("abc"));

Err(ParseError { line: 1, column: 1, offset: 0, expected: {"[a-z]"} })
Err(ParseError { line: 1, column: 1, offset: 0, expected: {"[a-z]"} })

The latter behaviour surprised me (compared to the former case, and compared to [a-z]+ / [A-Z]+), and I'm not sure if it is intended.

Wikipedia

The choice operator e1 / e2 first invokes e1, and if e1 succeeds, returns its result immediately. Otherwise, if e1 fails, then the choice operator backtracks to the original input position at which it invoked e1, but then calls e2 instead, returning e2's result.
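
The quoted semantics can be sketched in plain Rust as follows. This is an illustration of ordered choice in general, not of rust-peg's generated code; the helper names (one_or_more, choice, parse_all) are hypothetical, while bar and baz mirror the rules above.

```rust
/// A parser takes the input and a start position and returns the end position.
type Parser = fn(&str, usize) -> Option<usize>;

/// Consume one or more bytes satisfying `pred`, starting at `pos`.
fn one_or_more(input: &str, pos: usize, pred: fn(&u8) -> bool) -> Option<usize> {
    let n = input.as_bytes()[pos..].iter().take_while(|b| pred(*b)).count();
    if n > 0 { Some(pos + n) } else { None }
}

/// bar = [a-z]+ [A-Z]+
fn bar(input: &str, pos: usize) -> Option<usize> {
    let mid = one_or_more(input, pos, u8::is_ascii_lowercase)?;
    one_or_more(input, mid, u8::is_ascii_uppercase)
}

/// baz = [a-z]+
fn baz(input: &str, pos: usize) -> Option<usize> {
    one_or_more(input, pos, u8::is_ascii_lowercase)
}

/// Ordered choice: every alternative starts from the same `pos`; the first
/// success wins and later alternatives are never tried.
fn choice(input: &str, pos: usize, alternatives: &[Parser]) -> Option<usize> {
    alternatives.iter().find_map(|p| p(input, pos))
}

/// Top-level entry: the chosen alternative must consume the whole input.
fn parse_all(input: &str, alternatives: &[Parser]) -> bool {
    choice(input, 0, alternatives) == Some(input.len())
}
```

In this sketch, the order baz / bar rejects "abcABC": baz succeeds on the prefix abc, the choice commits to it, and bar is never tried even though the remaining ABC then fails the end-of-input check. The same order accepts "abc".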

Matching Lua's "long strings"

I'm trying to write a parser that matches Lua's long string syntax. Basically, "[" followed by any number of "=" (including 0) followed by "[" opens the string, and "]" followed by the same number of "=" followed by "]" closes it.

Examples of long strings:

[[simple long string]]
[=["level 1" long string]]this is still part of the string]==]as is this]=]

I didn't come up with a parser that works, it's probably not even possible. This is what I currently have:

mlstring_char -> char
    = [^\]] { match_str.char_at(0) }

mlstring_inner -> String
    = "=" s:mlstring_inner "="      { s }
    / "[" chars:mlstring_char* "]"  { chars.into_iter().collect() }

string -> String
    = "[" s:mlstring_inner "]" { s }

This obviously doesn't work: Long strings end with the first "]" inside of them. If PEG wasn't greedy, I could just match any char inside mlstring_inner, but rust-peg doesn't seem to have any non-greedy mode.
Another way to solve this would be to reference captured strings in the pattern (like some Regex implementations allow). That way, I could just capture "="* in the opening braces, and reference it in the closing brace. The match would only succeed if both strings are the same, ie. if both braces use the same number of "="s.
The third way I've come up with is the most flexible solution: Run arbitrary rust code to determine if a rule should match. This way, I could just compare the number of "="s, and match the end of the long string if it has an equal number of "="s.

Is this somehow possible without modifying rust-peg?
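
To illustrate the third idea (running arbitrary Rust code to decide whether the closing bracket matches), here is a hand-written sketch in plain Rust. It is not a rust-peg grammar, parse_long_string is a hypothetical name, and it ignores details of real Lua such as skipping a newline directly after the opening bracket.

```rust
/// Parse a Lua-style long string at the start of `input`.
/// Returns (contents, total bytes consumed), or None if `input` does not
/// start with a well-formed, terminated long string.
fn parse_long_string(input: &str) -> Option<(&str, usize)> {
    let rest = input.strip_prefix('[')?;
    // Count the '=' signs between the two opening brackets.
    let level = rest.bytes().take_while(|&b| b == b'=').count();
    let body = rest[level..].strip_prefix('[')?;
    // The closing delimiter must carry the same number of '=' signs.
    let closer = format!("]{}]", "=".repeat(level));
    let end = body.find(closer.as_str())?;
    let consumed = 1 + level + 1 + end + closer.len();
    Some((&body[..end], consumed))
}
```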

Support for threading a context object through to action code

Allow an object passed to the top-level parse call to be threaded through all parse functions and accessible to the action code.


Use cases: string interners, source mapping.

How to make created module public?

Let's say I have this in my libs.rs:

peg! parsemod(r#"
  #[pub]
  expr = …
  …
"#);

And I want to use it from main.rs:

extern crate mycrate;
use mycrate::parsemod;
fn main () {
  let parsed = parsemod::expr("Foo");
}

Building this with cargo build results in the error "module parsemod is private".

How can I make parsemod public?

crates.io package is out of date

Good morning, and thank you for an excellent parser library!

I'm trying to upload one of my libraries to crates.io, which means that I need to use the packaged version of peg. Unfortunately, that version seems to be out of date:

   Compiling peg v0.1.0
src/rustast.rs:25:2: 25:8 error: attempted access of field `path` on type `syntax::ast::Path`, but no field with that name was found
src/rustast.rs:25   r.path
                    ^~~~~~
src/rustast.rs:43:12: 43:27 error: this function takes 0 parameters but 1 parameter was supplied [E0061]
src/rustast.rs:43   let r = p.parse_ty(false);
                              ^~~~~~~~~~~~~~~
error: aborting due to 2 previous errors
Could not compile `peg`.

Would it be possible to upload a new version to crates.io, so that I can do the same with my own package? Thank you once again for the excellent library!

Ignore unmatched curly braces in strings and comments in Rust code blocks

Test case:

foo -> &str = "foo" {"}"}

Running peg against this grammar yields the following error:

$ peg test.rustpeg 
Error parsing language specification: error at 2:24: expected one of `\r`, `;`, `/*`, `[a-z]`, `#[export]`, `[A-Z]`, `\r\n`, `\u{2028}`, `#[pub]`, `\n`, `\u{2029}`, `/`, `_`, `//`, `[ \t\u{a0}\u{feff}\u{1680}\u{180e}\u{2000}-\u{200a}\u{202f}\u{205f}\u{3000}]`

It seems like the problem is in the "}" part of the grammar. If I replace it with something else (e.g. "a"), the code generation works.

This happens with v0.2.3.

add usage example to readme

It's not very clear what you do once you've generated a parser. I think an example in the readme would be pretty helpful.

I'll try to take a stab at it if I get some time, but it won't be till at least the weekend.

Memoization

Add packrat memoization to improve performance. As this would require that the rule return type be Clone, it would need to be opt-in per-rule, perhaps with rule attribute #[cache]. If the attribute is specified, the translator would add a rule_name_cache: HashMap<usize, RuleResult<ReturnType>> field to the generated ParseState struct. On entry to the rule function, it would check to see if the rule has already been run at that position, and return the cached value if so. On exit from the rule, it could store the start position and result into the HashMap.
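
A minimal sketch of the scheme described above, in plain Rust for a single rule. The names ParseState, RuleResult and the per-rule cache field follow the description; their exact shape, the number rule itself, and the number_runs counter are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Result of running a rule at some position. Must be Clone to be cached.
#[derive(Clone, Debug, PartialEq)]
enum RuleResult<T> {
    Matched(usize, T), // end position and value
    Failed,
}

/// Per-parse state holding one cache per memoized rule.
#[derive(Default)]
struct ParseState {
    number_cache: HashMap<usize, RuleResult<u32>>,
    number_runs: usize, // counts uncached executions, for demonstration only
}

/// Uncached rule body: number = [0-9]+
fn number_uncached(input: &str, pos: usize, state: &mut ParseState) -> RuleResult<u32> {
    state.number_runs += 1;
    let digits = input.as_bytes()[pos..]
        .iter()
        .take_while(|b| b.is_ascii_digit())
        .count();
    match input[pos..pos + digits].parse::<u32>() {
        Ok(value) => RuleResult::Matched(pos + digits, value),
        Err(_) => RuleResult::Failed,
    }
}

/// Memoizing wrapper: check the cache on entry, store the result on exit.
fn number(input: &str, pos: usize, state: &mut ParseState) -> RuleResult<u32> {
    if let Some(cached) = state.number_cache.get(&pos) {
        return cached.clone();
    }
    let result = number_uncached(input, pos, state);
    state.number_cache.insert(pos, result.clone());
    result
}
```

Failures are cached as well as matches, so a rule that is retried at the same position after backtracking runs its body only once.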

Add debug tracing like yydebug

It would be very useful to have some type of debug tracing like yydebug in lex/yacc. All I'd really like is to know when a certain state has been entered and then left.

Compilation failure using newer rustc

After upgrading rustc, I got these errors:

src/rustast.rs:25:2: 25:8 error: attempted access of field `path` on type `syntax::ast::Path`, but no field with that name was found
src/rustast.rs:25       r.path
                        ^~~~~~
src/rustast.rs:43:12: 43:27 error: this function takes 0 parameters but 1 parameter was supplied [E0061]
src/rustast.rs:43       let r = p.parse_ty(false);
                                  ^~~~~~~~~~~~~~~
error: aborting due to 2 previous errors
Build failed, waiting for other jobs to finish...
Could not compile `peg`.

Matching byte-ranges

For example, https://tools.ietf.org/html/rfc5234 defines several byte-ranges to match.

It'd be nice if rust-peg supported a similar syntax:

Hence:

     DIGIT       =  %x30-39

is equivalent to:

     DIGIT       =  "0" / "1" / "2" / "3" / "4" / "5" / "6" /

                    "7" / "8" / "9"

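In plain Rust, the DIGIT range above amounts to an inclusive byte-range check. The sketch below is illustrative only; is_digit and match_byte_range are hypothetical names, not rust-peg syntax or API.

```rust
/// DIGIT = %x30-39, expressed as an inclusive byte-range check.
fn is_digit(byte: u8) -> bool {
    matches!(byte, 0x30..=0x39)
}

/// Match one byte in `lo..=hi` at `pos`; returns the next position on success.
fn match_byte_range(input: &[u8], pos: usize, lo: u8, hi: u8) -> Option<usize> {
    match input.get(pos) {
        Some(b) if (lo..=hi).contains(b) => Some(pos + 1),
        _ => None,
    }
}
```
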
Trouble matching strings with match_str (expected end of input at 1)

#![feature(phase)]
#[phase(plugin)]
extern crate peg_syntax_ext;

use a::name;
peg! a(r#"
#[pub]
name -> ~str
  = [a-zA-Z0-9_]+ { match_str.to_owned() }
"#)

#[test]
fn test_parser() {
  assert_eq!(name("k"), Ok("k"));
}

This test fails at the peg! macro with:

 error: Expected end of input at 1

Any idea what went wrong? This is an example from the README.md.

Tested with:

rustc 0.12.0-pre-nightly (711d71027 2014-08-22 00:56:00 +0000)
cargo 0.0.1-pre-nightly (a40d1fd 2014-08-25 13:44:11 +0000)

Support matching specific number of repetitions with delimiter

Basically, what I'd like is to change expression ++ delim to support a range argument, something like expression ++{a, b} delim, where {a, b} behaves like in regex.

Version 0.2.3 does not build with Rust 1.0.0 beta

I'm unable to compile latest rust-peg 0.2.3 with the Rust 1.0.0 beta release for Windows x64.

Used build tools:

> cargo --version
cargo 0.0.1-pre-nightly (84d6d2c 2015-03-31) (built 2015-04-01)

> rustc --version
rustc 1.0.0-beta (9854143cb 2015-04-02) (built 2015-04-02)

Build errors:

D:\Users\amader\.cargo\git\checkouts\rust-peg-3bb581b77f5cbfba\master\src\rustast.rs:24:4: 24:31 error: type `syntax::ast::Path` does not implement any method in scope named `unwrap_or_else`
D:\Users\amader\.cargo\git\checkouts\rust-peg-3bb581b77f5cbfba\master\src\rustast.rs:24     r.unwrap_or_else(|_|panic!())
                                                                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~
D:\Users\amader\.cargo\git\checkouts\rust-peg-3bb581b77f5cbfba\master\src\rustast.rs:36:4: 36:33 error: type `syntax::ptr::P<syntax::ast::Block>` does not implement any method in scope named `unwrap_or_else`
D:\Users\amader\.cargo\git\checkouts\rust-peg-3bb581b77f5cbfba\master\src\rustast.rs:36     r.unwrap_or_else(|e| panic!(e))
                                                                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
D:\Users\amader\.cargo\git\checkouts\rust-peg-3bb581b77f5cbfba\master\src\peg_syntax_ext.rs:97:49: 97:78 error: type `bool` does not implement any method in scope named `unwrap_or_else`
D:\Users\amader\.cargo\git\checkouts\rust-peg-3bb581b77f5cbfba\master\src\peg_syntax_ext.rs:97                     if !parser.eat(&token::Eof).unwrap_or_else(|e| panic!(e)) {

Support for Rust 1.0?

When I try to compile rust-peg with rustc 1.0.0 (a59de37e9 2015-05-13) (built 2015-05-14), it fails with these errors:

/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/translate.rs:255:65: 255:75 error: type `&collections::string::String` does not implement any method in scope named `to_tokens`
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/translate.rs:255          println!("[PEG_TRACE] Attempting to match rule {} at {}:{}", $rule_name, line, col);
                                                                                                                                                                             ^~~~~~~~~~
note: in expansion of quote_expr!
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/translate.rs:253:3: 265:5 note: expansion site
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/translate.rs:261:71: 261:81 error: type `&collections::string::String` does not implement any method in scope named `to_tokens`
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/translate.rs:261              Matched(_, _) => println!("[PEG_TRACE] Matched rule {} at {}:{}", $rule_name, line, col),
                                                                                                                                                                                      ^~~~~~~~~~
note: in expansion of quote_expr!
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/translate.rs:253:3: 265:5 note: expansion site
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/translate.rs:262:72: 262:82 error: type `&collections::string::String` does not implement any method in scope named `to_tokens`
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/translate.rs:262              Failed => println!("[PEG_TRACE] Failed to match rule {} at {}:{}", $rule_name, line, col)
                                                                                                                                                                                       ^~~~~~~~~~
note: in expansion of quote_expr!
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/translate.rs:253:3: 265:5 note: expansion site
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/rustast.rs:20:11: 20:40 error: type `syntax::parse::ParseSess` does not implement any method in scope named `new`
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/rustast.rs:20     let ps = syntax::parse::ParseSess::new();
                                                                                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/rustast.rs:32:11: 32:40 error: type `syntax::parse::ParseSess` does not implement any method in scope named `new`
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/rustast.rs:32     let ps = syntax::parse::ParseSess::new();
                                                                                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/rustast.rs:40:11: 40:40 error: type `syntax::parse::ParseSess` does not implement any method in scope named `new`
/Users/christopherdumas/.cargo/registry/src/github.com-1ecc6299db9ec823/peg-0.3.0/src/rustast.rs:40     let ps = syntax::parse::ParseSess::new();
                                                                                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
error: aborting due to 6 previous errors
Could not compile `peg`.

vec push_all marked unstable - nightly builds fail

Just updated to the latest nightly via multirust and builds are now failing. Here's the output:

jhaddad@jhaddad-rmbp15 ~/dev/thunderdome$ cargo build
   Compiling civet-sys v0.1.4
   Compiling semver v0.1.19
   Compiling peg v0.3.0
   Compiling libc v0.1.8
   Compiling route-recognizer v0.1.11
   Compiling conduit v0.7.3
   Compiling conduit-router v0.7.1
   Compiling civet v0.7.6
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:57:11: 65:27 error: use of unstable library feature 'vec_push_all': likely to be replaced by a more optimized extend
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:57     items.push_all(&
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:58         header_items(ctxt).into_iter()
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:59       .chain(grammar.rules.iter().map(|rule|{
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:60           compile_rule(ctxt, rule)
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:61       }))
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:62       .chain(grammar.rules.iter().filter(|rule| rule.exported).map(|rule| {
                                                                                                                          ...
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:57:11: 65:27 help: add #![feature(vec_push_all)] to the crate attributes to enable
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:402:74: 402:80 error: use of unstable library feature 'slice_extras': likely to be renamed
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:402                  compile_match_and_then(ctxt, &exprs[0], None, write_seq(ctxt, exprs.tail()))
                                                                                                                                                                                                                ^~~~~~
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:402:74: 402:80 help: add #![feature(slice_extras)] to the crate attributes to enable
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:419:42: 419:48 error: use of unstable library feature 'slice_extras': likely to be renamed
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:419                  let next = write_choice(ctxt, exprs.tail(), result_used);
                                                                                                                                                                                ^~~~~~
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:419:42: 419:48 help: add #![feature(slice_extras)] to the crate attributes to enable
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:552:30: 552:36 error: use of unstable library feature 'slice_extras': likely to be renamed
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:552                          write_seq(ctxt, exprs.tail(), code, is_cond)
                                                                                                                                                                          ^~~~~~
/Users/jhaddad/.multirust/toolchains/nightly/cargo/registry/src/github.com-0a35038f75765ae4/peg-0.3.0/src/translate.rs:552:30: 552:36 help: add #![feature(slice_extras)] to the crate attributes to enable
error: aborting due to 4 previous errors
Could not compile `peg`.

To learn more, run the command again with --verbose.
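
The compiler notes above point at two unstable calls, `push_all` and `tail()`. As a rough sketch of stable equivalents (not the actual rust-peg patch), `Vec::extend` can consume the iterator chain directly and `split_first` can stand in for `tail()`. The function names `build_items` and `tail` and the generated strings are hypothetical and only illustrate the pattern.

```rust
/// Builds an item list by extending a Vec straight from an iterator chain,
/// instead of collecting into a temporary and calling the unstable `push_all`.
/// (Hypothetical helper; the strings stand in for generated AST items.)
fn build_items(header: Vec<String>, rule_names: &[&str]) -> Vec<String> {
    let mut items = Vec::new();
    items.extend(
        header
            .into_iter()
            .chain(rule_names.iter().map(|name| format!("fn parse_{}() {{}}", name))),
    );
    items
}

/// Stable stand-in for the unstable `tail()`: everything after the first element.
/// For an empty slice this returns the (empty) input slice rather than panicking.
fn tail<T>(xs: &[T]) -> &[T] {
    match xs.split_first() {
        Some((_, rest)) => rest,
        None => xs,
    }
}

fn main() {
    let items = build_items(vec!["use std::str;".to_string()], &["expr", "term"]);
    println!("{:?}", items);
    println!("{:?}", tail(&[1, 2, 3]));
}
```

Slicing with `&exprs[1..]` would also work wherever the slice is known to be non-empty.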

Build is broken

I'm getting this error when trying to build with today's nightly:

src/rustast.rs:24:2: 24:3 error: mismatched types:
 expected `syntax::ast::Path`,
    found `core::result::Result<syntax::ast::Path, syntax::diagnostic::FatalError>`
(expected struct `syntax::ast::Path`,
    found enum `core::result::Result`) [E0308]
src/rustast.rs:24   r
                    ^
src/rustast.rs:36:2: 36:3 error: mismatched types:
 expected `syntax::ptr::P<syntax::ast::Block>`,
    found `core::result::Result<syntax::ptr::P<syntax::ast::Block>, syntax::diagnostic::FatalError>`
(expected struct `syntax::ptr::P`,
    found enum `core::result::Result`) [E0308]
src/rustast.rs:36   r
                    ^
src/peg_syntax_ext.rs:97:24: 97:48 error: cannot apply unary operator `!` to type `core::result::Result<bool, syntax::diagnostic::FatalError>`
src/peg_syntax_ext.rs:97                     if !parser.eat(&token::Eof) {
                                                ^~~~~~~~~~~~~~~~~~~~~~~~
error: aborting due to 3 previous errors
Could not compile `peg`.

$ rustc --version
rustc 1.0.0-nightly (d9146bf8b 2015-04-07) (built 2015-04-08)

I use rust-peg for multiple projects, so I'm hoping you can get to this soon.
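
All three errors above have the same shape: a call that used to return a plain value now returns a `Result`, so the value has to be unwrapped or propagated before it is used. A minimal sketch of that pattern follows. `Parser`, `FatalError`, and `expect_eof` are toy stand-ins, not the real compiler-internal types, and the `?` operator is a modern-Rust convenience that did not exist in the 2015 nightlies.

```rust
/// Hypothetical stand-in for the compiler's fatal error type.
#[derive(Debug, PartialEq)]
struct FatalError(String);

/// Toy token parser; not the real libsyntax `Parser`.
struct Parser {
    tokens: Vec<&'static str>,
    pos: usize,
}

impl Parser {
    /// Consumes the next token if it equals `expected`.
    /// Fallible, like the API in the log: running out of tokens is an error.
    fn eat(&mut self, expected: &str) -> Result<bool, FatalError> {
        match self.tokens.get(self.pos) {
            Some(tok) if *tok == expected => {
                self.pos += 1;
                Ok(true)
            }
            Some(_) => Ok(false),
            None => Err(FatalError("unexpected end of token stream".to_string())),
        }
    }
}

/// Returns Ok(()) only if the next token is the end marker.
fn expect_eof(parser: &mut Parser) -> Result<(), FatalError> {
    // `!parser.eat("EOF")` alone does not compile: `!` is not defined for
    // `Result<bool, _>`. The `?` propagates the error and yields the bool.
    if !parser.eat("EOF")? {
        return Err(FatalError("expected end of input".to_string()));
    }
    Ok(())
}

fn main() {
    let mut done = Parser { tokens: vec!["EOF"], pos: 0 };
    println!("{:?}", expect_eof(&mut done));
    let mut trailing = Parser { tokens: vec!["ident", "EOF"], pos: 0 };
    println!("{:?}", expect_eof(&mut trailing));
}
```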

Can rust-peg's dylib annotation be removed?

When building a binary, Cargo links everything dynamically if any dependency is available only as a dylib. I want to build a standalone binary, but rust-peg currently prevents that.

I removed this line from peg_syntax_ext:

#![crate_type = "dylib"]

and everything seems to work fine.

Illegal hardware instruction

I'm trying to run rust-peg but it fails with:

$ RUST_BACKTRACE=1 ./target/peg src/grammar.rustpeg

thread '<main>' has overflowed its stack
[1] 29743 illegal hardware instruction RUST_BACKTRACE=1 ./target/peg src/grammar.rustpeg

This is with the latest nightly Rust build:

$ rustc --version
rustc 0.13.0-nightly (62fb41c32 2014-12-23 02:41:48 +0000)

Any idea what could be going on here?

Thanks!

Allowing grouping of expressions by some brackets

This is quite straightforward, and I know it is more of an enhancement than anything else, but it would be nice to have some way of grouping expressions, so that you could write something like:
ID / (LBRACE ID ++ COMMA RBRACE)

Preprocess matched string with rule before parsing it with another rule

I have a parser that can split a file like this into a hashmap:

FOO:foo
BAR:bar
BAZ:baz

However, a value can span multiple lines: any line that starts with a single space
continues the previous line. The following would be equivalent:

FOO:
 foo
BAR:bar
BAZ:b
 az

The problem is that the line break can happen anywhere, not just in the value.
Therefore I'd like to match the line with a rule similar to this:

continuation = eol whitespace
continuations -> String
    = x:[a-zA-Z:] ++ continuation {
    let mut rv = String::new();
    for p in x.into_iter() {
        rv += p;
    }
    rv
}

The returned value would then be fed into my previous parser.

The progress of this can be viewed at untitaker/rust-vobject#1
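
For comparison, the two-pass idea can be sketched in plain Rust outside the grammar: first unfold the continuation lines, then split the result into key/value pairs. This is only an illustration of the preprocessing step, not a rust-peg feature. The names `unfold` and `parse_lines` are hypothetical, and it assumes that a line break followed by exactly one space is always a continuation.

```rust
use std::collections::HashMap;

/// Removes every line break that is immediately followed by a single space,
/// joining each continuation line back onto the line it continues.
fn unfold(input: &str) -> String {
    input.replace("\r\n ", "").replace("\n ", "")
}

/// Splits unfolded `KEY:value` lines into a map. Lines without a colon are skipped.
fn parse_lines(unfolded: &str) -> HashMap<String, String> {
    unfolded
        .lines()
        .filter_map(|line| line.split_once(':'))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}

fn main() {
    let text = "FOO:\n foo\nBAR:bar\nBAZ:b\n az\n";
    let map = parse_lines(&unfold(text));
    println!("{:?}", map.get("BAZ"));
}
```

Because the unfolding happens before any key/value splitting, a break inside a key (for example `FO`, newline, space, `O:foo`) is handled the same way as a break inside a value.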

Re-usable parsers

Assume I define a few parsers within a module and export the parsing functions, similar to the following:

pub use self::parser::empty as empty;

peg! parser(r#"
#[pub]
empty -> ()
    = "" { () }
"#);

I'm wondering if it is possible to reuse the exported parser in another module, inside another PEG macro call. In general, I'm interested in reusing parsers defined in different source files.

Any ideas?
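
One possible workaround, sketched under assumptions: whatever the grammar syntax allows, exported parsing functions are ordinary Rust functions, so they can at least be composed at the Rust level from another module. The modules below are hand-written stand-ins for generated code. `base`, `list`, and the function signatures are hypothetical and do not reflect rust-peg's generated API.

```rust
/// Stand-in for a module whose parsing functions came from one grammar.
mod base {
    /// Matches the empty string: consumes nothing and returns the remaining input.
    pub fn empty(input: &str) -> Result<((), &str), String> {
        Ok(((), input))
    }

    /// Matches one or more ASCII digits at the start of `input`.
    pub fn number(input: &str) -> Result<(u32, &str), String> {
        let end = input
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(input.len());
        if end == 0 {
            return Err(format!("expected digit at {:?}", input));
        }
        let value = input[..end].parse::<u32>().map_err(|e| e.to_string())?;
        Ok((value, &input[end..]))
    }
}

/// A second module that reuses `base` instead of redefining its rules.
mod list {
    use super::base;

    /// Parses comma-separated numbers such as "1,22,333" by calling `base::number`.
    pub fn numbers(mut input: &str) -> Result<Vec<u32>, String> {
        let mut out = Vec::new();
        loop {
            let (n, rest) = base::number(input)?;
            out.push(n);
            match rest.strip_prefix(',') {
                Some(after) => input = after,
                None if rest.is_empty() => return Ok(out),
                None => return Err(format!("unexpected input {:?}", rest)),
            }
        }
    }
}

fn main() {
    println!("{:?}", base::empty("abc"));
    println!("{:?}", list::numbers("1,22,333"));
}
```

This only covers reuse from Rust code. Referring to a rule from another grammar inside the grammar text itself is a separate question that this sketch does not answer.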