
Obelisk

Obelisk is a simple tool which produces pretty-printed output from a Menhir parser file (.mly).

It is inspired by yacc2latex and is also written in OCaml, but it aims to support Menhir features rather than only those of ocamlyacc.


Installation

Dependencies

The Makefile also uses imagemagick and wkhtmltopdf to build documentation images.

In addition to the suffix package, which is used to define starred commands, here is a summary of the package dependencies of the different LaTeX modes:

OPAM

If you use OPAM, just type:

opam install obelisk

Manual installation

Clone the Obelisk repository with git clone, then type:

dune build

This builds an executable that you can run on .mly files with: dune exec src/main.exe -- <options> <file.mly>.

If you want to install obelisk, you can type:

dune install [--prefix <the destination directory>]

Usage

obelisk [latex|html] [options] <files>

If multiple files are specified, Obelisk outputs their concatenated result without any consistency check, so the user is responsible for avoiding, e.g., name clashes between the files.

By default, Obelisk writes to standard output; use -o <file> to specify an output file.

Pattern recognition

Obelisk can infer some common patterns (possibly parameterized):

  • options
  • lists and non-empty lists
  • separated lists and non-empty separated lists

Once these patterns are recognized, if the -i switch is specified, the recognized rules are deleted and their instances are replaced with default constructions (e.g. _*, _+, [_]). Without the -i switch, only the productions of the recognized rules are replaced, so the total number of rules stays the same.

For example, on these simple rules (from this file):

my_option(X, Y):
  |     {}
  | Y X {}

my_list(A):
  |              {}
  | A my_list(A) {}

my_nonempty_list(C):
  | C                     {}
  | C my_nonempty_list(C) {}

my_separated_nonempty_list(X,Y):
  | X                                   {}
  | X Y my_separated_nonempty_list(X,Y) {}

my_separated_list(X,S):
  |                                 {}
  | my_separated_nonempty_list(X,S) {}

my_rule(E,F,S1,S2):
  | my_option(E, F)                    {}
  | my_list(E)                         {}
  | my_nonempty_list(F)                {}
  | my_separated_nonempty_list(E,S1)   {}
  | my_separated_list(F,S2)            {}

Obelisk outputs:

<my_option(X, Y)> ::= [Y X]

<my_list(A)> ::= A*

<my_nonempty_list(C)> ::= C+

<my_separated_nonempty_list(X, Y)> ::= X (Y X)*

<my_separated_list(X, S)> ::= [X (S X)*]

<my_rule(E, F, S1, S2)> ::= <my_option(E, F)>
                          | <my_list(E)>
                          | <my_nonempty_list(F)>
                          | <my_separated_nonempty_list(E, S1)>
                          | <my_separated_list(F, S2)>

And with the -i switch:

<my_rule(E, F, S1, S2)> ::= [F E]
                          | E*
                          | F+
                          | E (S1 E)*
                          | [F (S2 F)*]
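The recognition step can be pictured as matching the shape of each rule's productions. The following is a minimal OCaml sketch under stated assumptions: the symbol/rule AST, recognize and the other names are hypothetical and are not Obelisk's internal API; only the production orders used in the example above are handled, and separated lists are left out for brevity.

```ocaml
(* Hypothetical, simplified grammar AST: not the internal types of Obelisk. *)
type symbol =
  | Sym of string                (* terminal, nonterminal or parameter *)
  | App of string * symbol list  (* parameterized call, e.g. my_list(A) *)

type rule = {
  name : string;
  params : string list;
  prods : symbol list list;      (* each production is a sequence of symbols *)
}

(* The default construction a recognized rule stands for. *)
type pattern =
  | Opt of symbol list  (* [seq] *)
  | Star of string      (* a*    *)
  | Plus of string      (* c+    *)

(* True when [call] is a recursive call of [r] on exactly its own parameters. *)
let is_self_call r call =
  match call with
  | App (n, args) -> n = r.name && args = List.map (fun p -> Sym p) r.params
  | Sym _ -> false

let recognize (r : rule) : pattern option =
  match r.prods with
  (* | (empty) | A r(A)   becomes   A* *)
  | [ []; [ Sym a; call ] ] when List.mem a r.params && is_self_call r call ->
      Some (Star a)
  (* | (empty) | seq   becomes   [seq], provided seq is not recursive *)
  | [ []; seq ] when seq <> [] && not (List.exists (is_self_call r) seq) ->
      Some (Opt seq)
  (* | C | C r(C)   becomes   C+ *)
  | [ [ Sym c ]; [ Sym c'; call ] ]
    when c = c' && List.mem c r.params && is_self_call r call ->
      Some (Plus c)
  | _ -> None

let rec string_of_symbol = function
  | Sym s -> s
  | App (n, args) ->
      n ^ "(" ^ String.concat ", " (List.map string_of_symbol args) ^ ")"

let string_of_pattern = function
  | Opt seq -> "[" ^ String.concat " " (List.map string_of_symbol seq) ^ "]"
  | Star a -> a ^ "*"
  | Plus c -> c ^ "+"

(* The rules my_list(A) and my_option(X, Y) from the example above. *)
let my_list =
  { name = "my_list"; params = [ "A" ];
    prods = [ []; [ Sym "A"; App ("my_list", [ Sym "A" ]) ] ] }

let my_option =
  { name = "my_option"; params = [ "X"; "Y" ];
    prods = [ []; [ Sym "Y"; Sym "X" ] ] }

let () =
  List.iter
    (fun r ->
      match recognize r with
      | Some p -> Printf.printf "%s ::= %s\n" r.name (string_of_pattern p)
      | None -> Printf.printf "%s: not recognized\n" r.name)
    [ my_list; my_option ]
```

Running it prints my_list ::= A* and my_option ::= [Y X], matching the outputs above. A fuller version would also need to accept reordered productions and separated lists.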

Multi-format output

By default the output format is a simple text format close to the BNF syntax. You can use the subcommands latex or html to get a LaTeX (resp. HTML) file.

In default and HTML modes, the -noaliases option prevents token aliases from being printed in the output.

LaTeX

Use the following options to tweak the LaTeX output:

  • -tabular: a tabular-based format from the tabu package (default)
  • -syntax: use the syntax package
  • -backnaur: use the backnaur package (not recommended: line wrapping has to be done manually, through this trick)

In all cases, the output can be customized through LaTeX commands that you can redefine to fit your needs. The command names are generated automatically from the terminal names, and because of LaTeX limitations, underscores are removed and numbers are converted into their Roman form.
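To make the naming rule concrete, here is a minimal OCaml sketch of such a transformation. It assumes that each run of digits is converted as a single number into lowercase Roman numerals; the letter case and the handling of 0 are assumptions of this sketch, and latex_name and roman are hypothetical helpers rather than Obelisk's code.

```ocaml
(* Hypothetical helpers, not the actual Obelisk implementation. *)

(* Lowercase Roman numeral of a positive integer, e.g. 14 gives xiv. *)
let roman (n : int) : string =
  let table =
    [ (1000, "m"); (900, "cm"); (500, "d"); (400, "cd"); (100, "c");
      (90, "xc"); (50, "l"); (40, "xl"); (10, "x"); (9, "ix");
      (5, "v"); (4, "iv"); (1, "i") ]
  in
  let buf = Buffer.create 8 in
  (* Greedily subtract the largest value that still fits. *)
  let rec go n = function
    | [] -> ()
    | ((v, s) :: rest) as t ->
        if n >= v then (Buffer.add_string buf s; go (n - v) t) else go n rest
  in
  go n table;
  Buffer.contents buf

let is_digit c = c >= '0' && c <= '9'

(* Drop underscores and replace each run of digits by its Roman form. *)
let latex_name (s : string) : string =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i < len then
      if s.[i] = '_' then go (i + 1)
      else if is_digit s.[i] then begin
        (* Consume the whole digit run and convert it as one number. *)
        let j = ref i in
        while !j < len && is_digit s.[!j] do incr j done;
        let n = int_of_string (String.sub s i (!j - i)) in
        (* 0 has no Roman numeral; this sketch arbitrarily spells it out *)
        Buffer.add_string buf (if n = 0 then "zero" else roman n);
        go !j
      end
      else begin
        Buffer.add_char buf s.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents buf

let () = print_endline (latex_name "my_rule2") (* prints myruleii *)
```

Distinct grammar names can collide after such a transformation (with this sketch, op_1_2 and op3 both become opiii).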

By default in LaTeX mode, the -o <grammar.tex> switch produces a standalone LaTeX file <grammar.tex> that you can compile directly (e.g. with pdflatex).

But in conjunction with -o <grammar.tex>, you can use -package <definitions> to output two files:

  1. a LaTeX file <grammar.tex> containing only the grammar contents;
  2. a package file <definitions.sty> (the .sty extension is added automatically) containing the required extra packages and command definitions.

These two files are meant to be included in a main LaTeX file, which Obelisk does not generate, following this example skeleton:

\documentclass[preview]{standalone}

\usepackage{definitions}

\begin{document}

\include{grammar}

\end{document}

To avoid name clashes, in particular when using the -package option and, e.g., importing multiple grammars that share LaTeX command names, or when the name of a syntax construction matches an already defined LaTeX macro, you can specify a common prefix for the commands with the option -prefix <myprefix>.

Since LaTeX's \newcommand does not accept command names beginning with end, commands created from rules whose names begin with end are automatically prefixed with zzz.
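As a rough illustration, the two prefixing rules can be combined as follows. This OCaml sketch is hypothetical: it assumes that -prefix simply prepends its argument and that the end check is applied to the final, already prefixed name (the one \newcommand actually sees); Obelisk may order these steps differently.

```ocaml
(* Hypothetical helpers, not the actual Obelisk implementation. *)
let has_prefix ~prefix s =
  let n = String.length prefix in
  String.length s >= n && String.sub s 0 n = prefix

(* Build a LaTeX command name from an (already mangled) grammar name and an
   optional user prefix, as given with -prefix. *)
let command_name ?(prefix = "") (name : string) : string =
  let full = prefix ^ name in
  (* \newcommand rejects names starting with "end", so shift them away *)
  if has_prefix ~prefix:"end" full then "zzz" ^ full else full

let () =
  List.iter
    (fun (p, n) -> print_endline ("\\" ^ command_name ~prefix:p n))
    [ ("", "endif"); ("", "expr"); ("my", "endif") ]
```

With these assumptions, endif becomes \zzzendif, expr stays \expr, and with the prefix my, endif becomes \myendif, which no longer needs the zzz prefix.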

HTML

The HTML file uses an internal CSS stylesheet that lets you customize the output (though less flexibly than in LaTeX mode). By default (-css option), the stylesheet uses CSS content properties for some parts of the grammar, to make it modular and easy to modify; the drawback is that the symbols generated this way are not part of the document text and, for example, cannot be copied and pasted. Use the -nocss option to disable these properties.

Example

Here are the outputs that Obelisk produces in the different formats from its own parser.

Default

<specification> ::= <rule>* EOF

<rule> ::= <old_rule>
         | <new_rule>

<old_rule> ::= [<flags>] <ident> ATTRIBUTE* <parameters(<ident>)> COLON
               <optional_bar> <group> (BAR <group>)* SEMICOLON*

<flags> ::= PUBLIC
          | INLINE
          | PUBLIC INLINE
          | INLINE PUBLIC

<optional_bar> ::= [BAR]

<group> ::= <production> (BAR <production>)* ACTION [<precedence>]

<production> ::= <producer>* [<precedence>]

<producer> ::= [LID EQ] <actual> ATTRIBUTE* SEMICOLON*

<generic_actual(A, B)> ::= <ident> <parameters(A)>
                         | B <modifier>

<actual> ::= <generic_actual(<lax_actual>, <actual>)>

<lax_actual> ::= <generic_actual(<lax_actual>, <actual>)>
               | <group> (BAR <group>)*

<new_rule> ::= [PUBLIC] LET LID ATTRIBUTE* <parameters(<ident>)> <binder>
               <expression>

<binder> ::= COLONEQ
           | EQEQ

<expression> ::= <optional_bar> <seq_expression> (BAR <seq_expression>)*

<seq_expression> ::= [<pattern> EQ] <symbol_expression> SEMICOLON
                     <seq_expression>
                   | <symbol_expression>
                   | <action_expression>

<symbol_expression> ::= <ident> <parameters(<expression>)>
                      | <symbol_expression> <modifier>

<action_expression> ::= <action>
                      | <action> <precedence>
                      | <precedence> <action>

<action> ::= ACTION
           | POINTFREEACTION

<pattern> ::= LID
            | UNDERSCORE
            | TILDE
            | LPAR [<pattern> (COMMA <pattern>)*] RPAR

<modifier> ::= OPT
             | PLUS
             | STAR

<precedence> ::= PREC <ident>

<parameters(X)> ::= [LPAR [X (COMMA X)*] RPAR]

<ident> ::= UID
          | LID
          | QID

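LaTeX

Tabular

(Image: rendering of the tabular output.)

Syntax

(Image: rendering of the syntax output.)

Backnaur

(Image: rendering of the backnaur output.)

HTML

With CSS content properties

(Image: rendering of the HTML output with CSS content properties.)

Without CSS content properties

(Image: rendering of the HTML output without CSS content properties.)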


Contributors

lelio-brun, zapashcanon


obelisk's Issues

Bars

Production and group bars should be treated differently in printers.

[LaTeX] Incorrect or dubious printing

The following file does not produce very nice output for comma_sep and comma_sep1:

%public %inline comma_sep(X) :
  xs = separated_list(COMMA, X)
    { xs }

%public %inline comma_sep1(X) :
  xs = separated_nonempty_list(COMMA, X)
    { xs }

Display of pairs seems erroneous

The grammar

edn_assoc:
  | BEGIN_ASSOC; lp = list(pair(value, value)); RIGHT_BRACE { lp }
;

will result in

<edn_assoc> ::= BEGIN_ASSOC <value> <value>* RIGHT_BRACE

and I would expect

<edn_assoc> ::= BEGIN_ASSOC (<value> <value>)* RIGHT_BRACE

Please let me know if there is something I'm missing!
Also, thank you for the great software 👍
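The expected output follows from precedence: a postfix modifier such as * binds tighter than concatenation, so an operand made of several symbols, like the inlined pair(value, value), has to be parenthesized. A minimal OCaml sketch of a printer that follows this rule, using a hypothetical ebnf type that is not Obelisk's internal representation:

```ocaml
(* A tiny, hypothetical EBNF term type: for illustration only. *)
type ebnf =
  | Atom of string     (* terminal or <nonterminal> *)
  | Seq of ebnf list   (* concatenation *)
  | Star of ebnf       (* repetition, printed as a postfix star *)

let rec print = function
  | Atom s -> s
  | Seq es -> String.concat " " (List.map print es)
  | Star e -> print_operand e ^ "*"

(* A postfix operator binds tighter than concatenation, so its operand needs
   parentheses unless it is a single atom. *)
and print_operand = function
  | Atom _ as e -> print e
  | Seq [ e ] -> print_operand e
  | e -> "(" ^ print e ^ ")"

(* list(pair(value, value)) once pair is inlined into a two-symbol sequence *)
let edn_assoc =
  Seq [ Atom "BEGIN_ASSOC";
        Star (Seq [ Atom "<value>"; Atom "<value>" ]);
        Atom "RIGHT_BRACE" ]

let () = print_endline ("<edn_assoc> ::= " ^ print edn_assoc)
```

This prints the expected <edn_assoc> ::= BEGIN_ASSOC (<value> <value>)* RIGHT_BRACE.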

nonempty_list isn't recognized?

Hi!

I have a .mly file that looks like:

let expr :=
...
| MATCH; m = expr; l = nonempty_list(match_case); END; { Match (Raw, m, l) }

I then run obelisk -i parser.mly and get:

<expr> ::= ...
         ...
         | MATCH <expr> nonempty_list(<match_case>) END

But I would expect the following:

<expr> ::= ...
         ...
         | MATCH <expr> <match_case>+ END

Am I missing something? Note that, in the same file, list(X) is correctly recognized and translated to X*.

EDIT: I'm on OCaml 4.09.0, menhir 20200211 and obelisk 0.5.0.
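For reference, the renderings expected for the Menhir standard-library rules (in the notation of the Pattern recognition section) can be written down as a small lookup. This OCaml sketch is hypothetical: render is not part of Obelisk, and it assumes every argument is a single, already printed symbol (a multi-symbol argument would need parentheses).

```ocaml
(* Hypothetical helper, not part of Obelisk: EBNF rendering of Menhir
   standard-library applications, given already printed arguments that are
   assumed to be single symbols. Menhir separated lists take the separator
   first, as in separated_list(COMMA, X). *)
let render (name : string) (args : string list) : string option =
  match (name, args) with
  | "option", [ x ] -> Some ("[" ^ x ^ "]")
  | "list", [ x ] -> Some (x ^ "*")
  | "nonempty_list", [ x ] -> Some (x ^ "+")
  | "separated_nonempty_list", [ sep; x ] ->
      Some (Printf.sprintf "%s (%s %s)*" x sep x)
  | "separated_list", [ sep; x ] ->
      Some (Printf.sprintf "[%s (%s %s)*]" x sep x)
  | _ -> None (* not a recognized pattern: print the application unchanged *)

let () =
  match render "nonempty_list" [ "<match_case>" ] with
  | Some s -> print_endline ("MATCH <expr> " ^ s ^ " END")
  | None -> print_endline "not recognized"
```

This prints MATCH <expr> <match_case>+ END, the output expected in the report above.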

nonempty_list latex output is incomplete

Hi,

On the same file as the one in #10, on the line with the nonempty_list, here's what I get in the LaTeX output:

  & \grambar &\MATCH*{} \expr*{} nonempty\_list \zzzzzzEND*{}\\

Which visually gives something like: MATCH <expr> nonempty_list END.

Something is obviously missing; it should be: MATCH <expr> nonempty_list(<match_case>) END.

Productions starting with "end" generate wrong LaTeX

Consider the following production definition:

endif: END | ENDIF { }

In the LaTeX version, it generates \newcommand\endif{end\_if}; however, \newcommand does not allow command names starting with end and emits an error. The code generated for productions starting with end should use \def instead.

[LaTeX] Add an option to generate only the grammar

Unless I'm mistaken, it's only possible to generate a complete, standalone file, as of now. If so, it would be nice to add an option to generate only the grammar part, in order to include it in a larger document.
