Code Monkey home page Code Monkey logo

racket-peg's Introduction

racket-peg

This library implements a PEG parser generator.

Getting Started

  • run chmod +x ./make to turn ./make executable

  • Run ./make install to install the peg library. You should be able to (require peg) in your own racket programs after that. Or use #lang peg!

  • Run ./make update to apply changes, if you're hacking on it.

  • Run ./make docs to build the documentation.

  • Run ./make test to check that it is working after installation.

Source Code Map

The PEG parser system is built in two stages. There is a lispy s-expression version. Then there is a self-hosted PEG syntax version.

The lispy version is implemented in:

  • peg.rkt - The PEG parsing VM and parser macro.
  • peg-result.rkt - The fundamental data structure used for parse results. It's a kind of automatically joinable sequence.

The PEG aspect is implemented in these files:

  • peg-src/peg-in-peg.rkt - The syntax of our peg language. In peg.
  • peg-src/sexp-parser.rkt - A basic s-expression parser. In peg.

Both of the above files are "bootstrapped" using the racket macro expander to produce the following:

  • peg-in-peg.rkt - expanded version of peg-src/peg-in-peg.peg.
  • s-exp.rkt - expanded version of peg-src/s-exp.peg.
  • peg-to-scheme.rkt - support for peg-in-peg.
  • main.rkt - This adds the #lang peg glue to racket.

Authors

  • Raymond Nicholson
  • João Pedro Abreu de Souza

racket-peg's People

Contributors

jamestmartin avatar kstrafe avatar lwhjp avatar petrolifero avatar rain-1 avatar samth avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

racket-peg's Issues

add positive lookahead operator

add & operator, that is equivalent to !!

&a == ! (! a)

so, if a matchs, !a fails without consume, and !(!a) match, again, without consumes.

Is just a syntatic sugar but make easy write intersection of languages

add quantifier repetition

to write something like password <- [a-zA-Z0-9]{8}[a-zA-Z0-9]* ; to the password has minimun length of 8.
or a <- [a-z]{8,10} to between 8 and 10 char.

or a <- [a-z]{8,} without upper limit, with the semantics of minimun of 8

add epsilon operator

In the same way that positive lookahead, this is a syntatic sugar

epsilon <-- (! .) / (& .) ;

I dont know of a good symbol of that, but I think that will be a good add.

add a primitive to read an s-expression

We will need a primitive that either:

  • (A) invokes rackets s-expression reader
  • (B) calls on our own s-expr reader

that allows us to easily include s-expressions inside PEG grammars.

(A) would depend on #2

perhaps (B) is the better option though. Even though we have to "re"implement s-expr parsing, it would blend more seamlessly with the rest of the peg parsing system.

add keyword support in s-exp

#lang peg

(struct add (a b) #:transparent);

(define (my-sort lst #:comparator [cmp <])
             (sort lst cmp))

Any of this lines works. In parser level we need to add support to keywords.

We can't use #f as a semantic value

#lang peg

a <- 'a' -> #f ;

This don't work because is translated to use #f as semantic value this is a hard-coded code interpreted as don't have semantic action(in real world, this repeat this input) instead of returning #f when I use (peg a "a")

support to "..."

#lang peg


(define (eval-rpn l [aux '()])
  (match l
    ['() (car aux)]
    [(list (? number? a) b ...) (eval-rpn (cdr l) (cons (car l) aux))]
    [(list a b ...) (eval-rpn (cdr l) (cons (a (car aux) (cadr aux)) (cddr aux)))]));

_ < [ \t\n]* ;
number <- value:[0-9]+ -> (string->number value);

top <- v:rpn ! .  -> (eval-rpn v);

rpn <- number (_ rpn _ Bop)* _ ;

Bop <- v:[+\-] -> (if (equal? v "+") + -) ;

this dont work because the identifier parser dont understand "..."

error messages

it needs to produce useful error messages on a parse failure.

make peg-src/sexp-parser.rkt more complete

it doesn't cover the full scheme langauge yet. it doesn't need to cover the complete syntax of scheme. but we should add a bit more than this current bare minimum.

  • characters

allow more permissive rules

A well-formed PEG always accept or fails, never go in a infinite loop.

There's only two ways a PEG is not-well-formed : left recursion or star over nullable expression.

E <- E '+' T ; is a example of left recursion
A <- ('a'*)* ; is a example of star over nullable expression

If we can detect this states, we can show to the user, improving error message instead of the user see a infinite parsing time .

Other possibility is allow more permissive rules

There are algorithms that receive a left-recursive grammar, turn in a non-left recursive grammar, obtain the parse tree and transform that to a parse tree compatible with the original grammar. If we use this, mostly grammar can be just ported to peg, without any change.

PEG.js do this. I am studing this algorithm, but yet don't understand total.

I don't know if there's algorithm to detect star over nullable expression to do this with the second restriction.

benchmark this

do some timing measurements against other parsers e.g. lalr parser in guile scheme. see how efficient this parser is on a range of languages compared to other systems.

shorthand for dropping a string

A common idiom is to define rules like OB < '{" ;

What shorthand could be created for this?

  • maybe "{" could be used, but it does't seem logical.
  • maybe a drop operator that can be applied to anything, which char though?

change (? e) to (or e (epsilon))

the optional operator (?) is not fundamental. He can be written using "or" in the way on title.

If we do that, as we do when add the (&) operator, in a future when a feature must be added, will be less code to alter, less tests to do.

require peg instead of peg/peg

Is there any way we can support (require peg) as well as the #lang peg, instead of being forced to do (require peg/peg)?

cannot pkg

cannot make info.rkt right for the racket pkg page.

Parse Failure

#lang peg

item <-- [^{]+ / '{' item '}' ;

;> (peg item "{abc}")
;     {abc}          
;          ^ here
; peg: parse failed in rule peg-rule:item at location 5 with options '(or (+ (and (! #\{) (any-char))) (and "{" item "}"))

semantic value

put capabilities to get semantic value from "#lang peg"

something like

#lang peg

number <- (name value [0-9]*) (lambda () value) ;

import feature

We need an import feature like this:

import name1.rkt ;
import name2.rkt ;
import name3.rkt ;
rule1 <- .. ;
rule2 <- .. ;

To be able to split parsers up into multiple files, and also to import supporting functions for semantic actions.

document pegvm semantics

#69

When I think of put other continuation, I bumped into my ignorance about the formal functioning of pegvm.

add empty string support

#lang peg

a <- '' 'a' ;

this don't work. Because expect to have at least one char inside single quotes

Allow qualified import

The great advantage of peg is composition. Imagine someone write a peg to a language and define expressions only as arithmetic Expressions. I want to put Boolean expressions, function call and other things. I can't just import aritexp.rkt because he export expression. I will need to change the code.

One possible alternative can be import only some Peg's, or import with a prefix.

import aritexp from otherexp.rkt;
import exp as aritexp from otherexp.rkt;
import otherexp.rkt with prefix other;

In this way, the number of languages can be multiplied by 10 because of reuse

add '[' support

#lang peg

(define (f l)
      (match
                  [(list a b) a]
                  [(list a) (list a 1)]))

a <- 'a' ;

This file don't work because we only support ( and don't support [ but [ is normal in racket code

Bug with context-sensitive grammar

I took example from Ford's original paper:

A ← aAb/ε
B ← bBc/ε
S ← &(A!b)a∗ B!.
#lang racket

(require peg)

(define-peg A
  (or (and 	(char #\a) A (char #\b)) 	(epsilon)))

(define-peg B
  (or (and 	(char #\b) B (char #\c)) 	(epsilon)))

(define-peg S
  (and (& (and A (! (char #\b)))) (* (char #\a)) B (! (any-char))))

(peg S "aaabbcc") ; should not match

(peg S "aabbcc")  ; works as expected

(peg S "abbcc")   ; works as expected

Interesting that this bug is consistent across several implementations.

Port to guile

For porting define-for-syntax consider:

(eval-when (expand load eval) (define (gen-id xxx) ...))

define those functions in a separate library and do (import (for (helper-library) expand run))

implement memorization of results

So, when we do something like

A <- ifElse / if
ifElse <- if else
if <- ....

When try to parse a simple if with A, ifElse will fail, then will try "if", but will remember "Hey, I match a if starting in this text position, so I will apply this result to the continuation sucess and try to parse the rest"

And in this way, the cost of memory can be bigger if not implemented in a clever waty, but the time cost will be smaller

unicode safe

Currently the system only really works with ASCII. We should support full Unicode. This may be difficult, but here is an issue for it.

peg-in-peg don't accept itself

To study how to resolv #6 I try extend peg-in-peg but the peg don't accept itself

#lang peg

nt-char <- [a-zA-Z0-9_\-] ;
nonterminal <-- nt-char+ !nt-char SP ;
SP < [ \t\n]* ;

literal <-- SQ (BS ['\\] / !['\\] .)* SQ SP ;
SQ < ['] ;
BS < [\\] ;

charclass <-- LB '^'? (cc-range / cc-escape / cc-single)+ RB SP ;
cc-range <-- cc-char DASH cc-char ;
cc-escape <-- BS . ;
cc-single <-- cc-char ;
cc-char <- !cc-escape-char . ;
cc-escape-char <- '[' / ']' / '-' / '^' / '\\' / 'n' / 't' ;
LB < '[' ;
RB < ']' ;
DASH < '-' ;

peg <-- SP grammar+ ;
grammar <-- (nonterminal ('<--' / '<-' / '<') SP pattern) ';' SP ;
pattern <-- alternative (SLASH SP alternative)* ;
alternative <-- expression+ ;
expression <-- [!]? SP primary ([*+?] SP)? ;
primary <-- '(' SP pattern ')' SP / '.' SP / literal / charclass / nonterminal ;
SLASH < '/' ;"

DRRACKET TERMINAL

(peg peg "nt-char <- [a-zA-Z0-9_\-] ;
nonterminal <-- nt-char+ !nt-char SP ;
SP < [ \t\n]* ;

literal <-- SQ (BS ['\\] / !['\\] .)* SQ SP ;
SQ < ['] ;
BS < [\\] ;

charclass <-- LB '^'? (cc-range / cc-escape / cc-single)+ RB SP ;
cc-range <-- cc-char DASH cc-char ;
cc-escape <-- BS . ;
cc-single <-- cc-char ;
cc-char <- !cc-escape-char . ;
cc-escape-char <- '[' / ']' / '-' / '^' / '\\' / 'n' / 't' ;
LB < '[' ;
RB < ']' ;
DASH < '-' ;

peg <-- SP grammar+ ;
grammar <-- (nonterminal ('<--' / '<-' / '<') SP pattern) ';' SP ;
pattern <-- alternative (SLASH SP alternative)* ;
alternative <-- expression+ ;
expression <-- [!]? SP primary ([*+?] SP)? ;
primary <-- '(' SP pattern ')' SP / '.' SP / literal / charclass / nonterminal ;
SLASH < '/' ;
")

OUTPUT

read: unknown escape sequence - in string

generate #lang peg/syntax

#7 #13

this is a very future functionality.

The peg lang actually generate structs that is cool to do a lot of things, but is not so cool to implement languages inside racket. In #lang peg/syntax the generate object is a syntax object, so the parser file can be the reader file directly.

We don't need to add lexical information on create a language, just create the semantic, import in the parser, and the parser becomes a reader because generate syntax object.

use case :

  1. The user write a language that have a formal semantic defined.
  2. So, he write the grammar under #lang peg, make some tests, extend, etc
  3. after this, he change the lang in the parser to #lang peg/syntax and voilà, a reader.

[n] charclass

I'm not allowed to put [n] in a charclass. fix this.

Add verbose option on peg

There's only two options to working peg : he match or he dont. When I have problems with peg, I miss some path information

#lang peg

number <- [0-9]+;
identifier <- [a-zA-Z]+ ;
terminal <- number /    identifier ;

(peg terminal "123" #:verbose) outputs :

match "123" with terminal -> number -> "123"

or (peg terminal "foobar" #:verbose) outputs :

match "foobar" with terminal -> number -> fail -> terminal ->identifier -> "foobar"

Release under LGPLv3+

Hi,

I note that the library is released under GPLv3. Is it possible relicensing under LGPLv3+ so it can be used in other projects without forcing them to become GPL projects?

In case of affirmative answer, let me know, that I will create a pull request with correct headers on all source-files, and a clarification of the license in the README, following the GNU guidelines.

Otherwise if the GPL license was chosen deliberately, it is obviously 100% fair, i.e. it is your work! :-)

Many thanks in any case,
Massimo

Overriding/adding alternatives to rules from imported modules

It would be useful to be able to declare a grammar which extends an existing one with new alternatives for a rule, e.g. something like:

expr.rkt:

#lang peg

program <- expr;

expr <- sum;

sum <- (a:product op:('+' / '-') b:sum) / x:product ->
  (if x x (list (string->symbol op) a b));

product <- (a:value op:('*' / '/') b:product) / x:value ->
  (if x x (list (string->symbol op) a b));

value <- x:number / '(' x:expr ')' -> x;

number <- n:[0-9]+ -> (string->number n);

enhanced-expr.rkt:

#lang peg

(require "expr.rkt");

program <- expr;
value <- 'sqrt(' e:expr ')' -> (list sqrt e);

and then have the REPL behave like

Welcome to Racket v7.4.
> (require "enhanced-expr.rkt")
> (require peg)
> (peg program "1+sqrt(16)")
'(+ 1 (#<procedure:sqrt> 16))

I hope the example is clear enough to express the problem I'm facing. Ideally, I'd like to have a way of adding new alternatives to rules defined in another source file, or to override them and have the ability to access the original alternatives to keep them available.

I'm very new to Racket and I apologize if this isn't a limitation of racket-peg but of the Racket module system and I'm open to any workaround suggestion. I think I understand why it's not working this way - definitions in a module have nothing to do with definitions in required modules, which keep their original bindings - however I have no idea how to workaround the issue. Maybe there is some alternative to require which "embeds" the specified module in the current one and has some policy for definition overrides?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.