Code Monkey home page Code Monkey logo

rdpg's Introduction

!!!NOTE!!! This project has moved to:
https://github.com/vladcc/shawk/tree/main/shawk/awk/rdpg



# rdpg
Compiler-compiler in awk

rdpg is a compiler-compiler in awk. It takes a context free grammar
representation and translates that to a LL(1) recursive descent parser for a
target language. This translation consists of two or three stages, each of
which is represented as a separate awk script: naive translation, optimization,
and target language generation. The naive translation and optimization steps
output an intermediate representation. In the final stage this ir is translated
to the target language. The output of each stage is plain text which becomes the
input of the next stage. Note that you can write a backend for any language in
any language, as long as that backend understands the ir, and you'd still be
able to use rdpg and the optimizer at the front of the pipeline.

As a very short example:

$ cd rdpg/
$ head examples/infix_calc_grammar.txt 
# start symbol
rule parse
defn statements
end

rule statements
defn statement eoi
defn statement statements
end
$ awk -f rdpg_ir.awk -f rdpg.awk examples/infix_calc_grammar.txt | head -n 25 | cat -n
     1  comment generated by rdpg.awk 1.11
     2  func parse
     3  block_open parse_1
     4  comment rule parse
     5  comment defn statements
     6          call tok_next
     7          if call statements
     8          block_open parse_2
     9                  return true
    10          block_close parse_2
    11          else
    12          block_open parse_2
    13                  return false
    14          block_close parse_2
    15  block_close parse_1
    16  func_end
    17  func statements
    18  block_open statements_1
    19  comment rule statements
    20  comment defn statement eoi
    21  comment defn statement statements
    22          if call statement
    23          block_open statements_2
    24                  if call eoi
    25                  block_open statements_3
        
A complete run with maximum optimization and translation to awk will look like:

$ awk -f rdpg_ir.awk -f rdpg.awk <input-file> | \
> awk -f rdpg_ir.awk -f rdpg-opt.awk -vOlvl=5 | \
> awk -f rdpg_ir.awk -f rdpg-to-awk.awk

E.g.:

$ awk -f rdpg_ir.awk -f rdpg.awk examples/infix_calc_grammar.txt | awk -f rdpg_ir.awk -f rdpg-opt.awk -vOlvl=5 | awk -f rdpg_ir.awk -f rdpg-to-awk.awk | head -n 25 | cat -n
     1  # <definitions>
     2  # translated by rdpg-to-awk.awk 1.01
     3  # generated by rdpg.awk 1.11
     4  # optimized by rdpg-opt.awk 1.1 Olvl=5
     5  function parse(    _arr) {
     6  # rule parse
     7  # defn statements
     8          tok_next()
     9          while (1) {
    10                  if (statement()) {
    11                          if (eoi()) {
    12                                  return 1
    13                          } else {
    14                                  continue
    15                          }
    16                  }
    17                  return 0
    18          }
    19  }
    20  function eoi(    _arr) {
    21  # rule eoi?
    22  # defn EOI
    23          if (tok_match(EOI())) {
    24                  tok_next()
    25                  return 1

Each stage needs to know about the ir which is defined in rdpg_ir.awk, so it
becomes somewhat cumbersome to invoke. The optimization step can be skipped, but
then you'd end up with a naive parser, which will use a lot of stack space
because of needless recursion.

$ awk -f rdpg_ir.awk -f rdpg.awk examples/infix_calc_grammar.txt | awk -f rdpg_ir.awk -f rdpg-to-awk.awk | head -n 25 | cat -n
     1  # <definitions>
     2  # translated by rdpg-to-awk.awk 1.01
     3  # generated by rdpg.awk 1.11
     4  function parse(    _arr) {
     5  # rule parse
     6  # defn statements
     7          tok_next()
     8          if (statements()) {
     9                  return 1
    10          } else {
    11                  return 0
    12          }
    13  }
    14  function statements(    _arr) {
    15  # rule statements
    16  # defn statement eoi
    17  # defn statement statements
    18          if (statement()) {
    19                  if (eoi()) {
    20                          return 1
    21                  } else if (statements()) {
    22                          return 1
    23                  } else {
    24                          return 0
    25                  }

Any of the rdpg*.awk scripts except for rdpg_ir.awk can be invoked with the
-vHelp=1 flag for more information on what they do. 

Below is a breakdown of the project structure.

1. rdpg:
rdpg.awk        - the main translator
rdpg_ir.awk     - the ir definition, i.e. string constants
rdpg-opt.awk    - the optimizer
rdpg-to-awk.awk - translates ir to awk
rdpg-to-c.awk   - translates ir to C

2. The rdpg parser:
rdpg.awk generates ir from a CFG, but still needs to parse the CFG
representation. To keep things simple, I've chosen to use a line oriented state
machine parser. Gets the job done, it's trivial to implement in awk and is easy
to generate. Also, it maps pretty well to how CFGs are usually written down.

gen-rdpg/scriptscript.awk - this script generates the parser part of rdpg.awk
gen-rdpg/rdpg-rules.txt   - the input to scriptscript.awk

3. The interesting parts of the examples directory:
examples/infix_calc_grammar.txt - represents a context free grammar in rdpg
syntax for the venerable infix expression calculator example. Supports
exponentiation, in order to demonstrate right associativity, and a basic
on-error token synchronization.

examples/test_input.txt - this file is given to the generated parsers during
their test runs. The output of each parser must match examples/accept_result.txt

examples/awk - here you can find one awk source file per rdpg optimization level
examples/c   - same as examples/awk but in C

4. Testing:
tests/run-tests.sh - the entry point for all tests. Exercises the whole
pipeline. Starts with basic rdpg.awk functionality, then moves to optimization,
and ends with the generation of one parser per optimization level per target
language. These parsers are then run with examples/test_input.txt as input and
are actually the files from the examples/awk and examples/c directories.
Correctness is confirmed by diff-ing the current output with the expected one.
run-tests.sh can be run from any directory. By default, if all tests pass it
doesn't say anything and prints information only when something goes wrong. If
you give it an argument, however, it will print the commands for each step it
executes. E.g.:
$ bash tests/run-tests.sh
$
$ bash tests/run-tests.sh x | head
diff <(awk -f ../rdpg.awk -vVersion=1) <(echo 'rdpg.awk 1.11')
[ 0 -eq 0 ]
diff <(awk -f ../rdpg-opt.awk -vVersion=1) <(echo 'rdpg-opt.awk 1.1')
[ 0 -eq 0 ]
diff <(awk -f ../rdpg-to-c.awk -vVersion=1) <(echo 'rdpg-to-c.awk 1.01')
[ 0 -eq 0 ]
diff <(awk -f ../rdpg-to-awk.awk -vVersion=1) <(echo 'rdpg-to-awk.awk 1.01')
[ 0 -eq 0 ]
diff <(awk -f ../rdpg_ir.awk -f ../rdpg.awk ./test_rdpg/test_rdpg_opt_olvl3_inf_loop.txt | awk -f ../rdpg_ir.awk -f ../rdpg-opt.awk -vOlvl=3) ./test_rdpg/accept_rdpg_opt_olvl3_inf_loop.txt
[ 0 -eq 0 ]

rdpg's People

Contributors

vladcc avatar

Stargazers

Volodymyr Gubarkov avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.