vladcc / rdpg Goto Github PK
View Code? Open in Web Editor NEWOptimizing LL(1) recursive descent parser generator
License: MIT License
Optimizing LL(1) recursive descent parser generator
License: MIT License
!!!NOTE!!! This project has moved to: https://github.com/vladcc/shawk/tree/main/shawk/awk/rdpg # rdpg Compiler-compiler in awk rdpg is a compiler-compiler in awk. It takes a context free grammar representation and translates that to a LL(1) recursive descent parser for a target language. This translation consists of two or three stages, each of which is represented as a separate awk script: naive translation, optimization, and target language generation. The naive translation and optimization steps output an intermediate representation. In the final stage this ir is translated to the target language. The output of each stage is plain text which becomes the input of the next stage. Note that you can write a backend for any language in any language, as long as that backend understands the ir, and you'd still be able to use rdpg and the optimizer at the front of the pipeline. As a very short example: $ cd rdpg/ $ head examples/infix_calc_grammar.txt # start symbol rule parse defn statements end rule statements defn statement eoi defn statement statements end $ awk -f rdpg_ir.awk -f rdpg.awk examples/infix_calc_grammar.txt | head -n 25 | cat -n 1 comment generated by rdpg.awk 1.11 2 func parse 3 block_open parse_1 4 comment rule parse 5 comment defn statements 6 call tok_next 7 if call statements 8 block_open parse_2 9 return true 10 block_close parse_2 11 else 12 block_open parse_2 13 return false 14 block_close parse_2 15 block_close parse_1 16 func_end 17 func statements 18 block_open statements_1 19 comment rule statements 20 comment defn statement eoi 21 comment defn statement statements 22 if call statement 23 block_open statements_2 24 if call eoi 25 block_open statements_3 A complete run with maximum optimization and translation to awk will look like: $ awk -f rdpg_ir.awk -f rdpg.awk <input-file> | \ > awk -f rdpg_ir.awk -f rdpg-opt.awk -vOlvl=5 | \ > awk -f rdpg_ir.awk -f rdpg-to-awk.awk E.g.: $ awk -f rdpg_ir.awk -f rdpg.awk examples/infix_calc_grammar.txt | awk -f rdpg_ir.awk -f rdpg-opt.awk -vOlvl=5 | awk -f rdpg_ir.awk -f rdpg-to-awk.awk | head -n 25 | cat -n 1 # <definitions> 2 # translated by rdpg-to-awk.awk 1.01 3 # generated by rdpg.awk 1.11 4 # optimized by rdpg-opt.awk 1.1 Olvl=5 5 function parse( _arr) { 6 # rule parse 7 # defn statements 8 tok_next() 9 while (1) { 10 if (statement()) { 11 if (eoi()) { 12 return 1 13 } else { 14 continue 15 } 16 } 17 return 0 18 } 19 } 20 function eoi( _arr) { 21 # rule eoi? 22 # defn EOI 23 if (tok_match(EOI())) { 24 tok_next() 25 return 1 Each stage needs to know about the ir which is defined in rdpg_ir.awk, so it becomes somewhat cumbersome to invoke. The optimization step can be skipped, but then you'd end up with a naive parser, which will use a lot of stack space because of needless recursion. $ awk -f rdpg_ir.awk -f rdpg.awk examples/infix_calc_grammar.txt | awk -f rdpg_ir.awk -f rdpg-to-awk.awk | head -n 25 | cat -n 1 # <definitions> 2 # translated by rdpg-to-awk.awk 1.01 3 # generated by rdpg.awk 1.11 4 function parse( _arr) { 5 # rule parse 6 # defn statements 7 tok_next() 8 if (statements()) { 9 return 1 10 } else { 11 return 0 12 } 13 } 14 function statements( _arr) { 15 # rule statements 16 # defn statement eoi 17 # defn statement statements 18 if (statement()) { 19 if (eoi()) { 20 return 1 21 } else if (statements()) { 22 return 1 23 } else { 24 return 0 25 } Any of the rdpg*.awk scripts except for rdpg_ir.awk can be invoked with the -vHelp=1 flag for more information on what they do. Below is a breakdown of the project structure. 1. rdpg: rdpg.awk - the main translator rdpg_ir.awk - the ir definition, i.e. string constants rdpg-opt.awk - the optimizer rdpg-to-awk.awk - translates ir to awk rdpg-to-c.awk - translates ir to C 2. The rdpg parser: rdpg.awk generates ir from a CFG, but still needs to parse the CFG representation. To keep things simple, I've chosen to use a line oriented state machine parser. Gets the job done, it's trivial to implement in awk and is easy to generate. Also, it maps pretty well to how CFGs are usually written down. gen-rdpg/scriptscript.awk - this script generates the parser part of rdpg.awk gen-rdpg/rdpg-rules.txt - the input to scriptscript.awk 3. The interesting parts of the examples directory: examples/infix_calc_grammar.txt - represents a context free grammar in rdpg syntax for the venerable infix expression calculator example. Supports exponentiation, in order to demonstrate right associativity, and a basic on-error token synchronization. examples/test_input.txt - this file is given to the generated parsers during their test runs. The output of each parser must match examples/accept_result.txt examples/awk - here you can find one awk source file per rdpg optimization level examples/c - same as examples/awk but in C 4. Testing: tests/run-tests.sh - the entry point for all tests. Exercises the whole pipeline. Starts with basic rdpg.awk functionality, then moves to optimization, and ends with the generation of one parser per optimization level per target language. These parsers are then run with examples/test_input.txt as input and are actually the files from the examples/awk and examples/c directories. Correctness is confirmed by diff-ing the current output with the expected one. run-tests.sh can be run from any directory. By default, if all tests pass it doesn't say anything and prints information only when something goes wrong. If you give it an argument, however, it will print the commands for each step it executes. E.g.: $ bash tests/run-tests.sh $ $ bash tests/run-tests.sh x | head diff <(awk -f ../rdpg.awk -vVersion=1) <(echo 'rdpg.awk 1.11') [ 0 -eq 0 ] diff <(awk -f ../rdpg-opt.awk -vVersion=1) <(echo 'rdpg-opt.awk 1.1') [ 0 -eq 0 ] diff <(awk -f ../rdpg-to-c.awk -vVersion=1) <(echo 'rdpg-to-c.awk 1.01') [ 0 -eq 0 ] diff <(awk -f ../rdpg-to-awk.awk -vVersion=1) <(echo 'rdpg-to-awk.awk 1.01') [ 0 -eq 0 ] diff <(awk -f ../rdpg_ir.awk -f ../rdpg.awk ./test_rdpg/test_rdpg_opt_olvl3_inf_loop.txt | awk -f ../rdpg_ir.awk -f ../rdpg-opt.awk -vOlvl=3) ./test_rdpg/accept_rdpg_opt_olvl3_inf_loop.txt [ 0 -eq 0 ]
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.