Code Monkey home page Code Monkey logo

parser-combinator's Introduction

Parser Combinator

This library provides a DSL which you can use to easily generate parsers in Ruby.

At it's core, it's a parser combinator library, but you don't need to worry about that. You build more complex expression based on simple ones, and match any formal language you want.

Here's what the grammars look like, this demo will parse JSON:

parser = Grammar.build do
  rule(:bopen)       { (one "{") > whitespace }
  rule(:bclose)      { whitespace < (one "}") }
  rule(:semicolon)   { whitespace < (one ":") > whitespace }
  rule(:comma)       { whitespace < (one ",") > whitespace }
  rule(:quote)       { one '"' }
  rule(:true)        { str "true" }
  rule(:false)       { str "false" }
  rule(:null)        { str "null" }

  # string
  rule(:hexdigit)           { anyChar %w[0 1 2 3 4 5 6 7 8 9 a b c d e f] }
  rule(:hexdigits)          { (one "u") >> (exactly(4) { (rule :hexdigit) }) }
  rule(:any_escaped_char)   { (one "\\") >> ((anyChar %w[" \\ / b f n r t]) | (rule :hexdigits)) }
  rule(:any_unescaped_char) { (anyCharBut %w[" \\]) }
  rule(:string_char)        { (rule :any_unescaped_char) | (rule :any_escaped_char) }
  rule(:string)             { match (many0 { (rule :string_char) }), between: [(rule :quote), (rule :quote)] }

  # number
  rule(:decimal)              { (one '.') >> many1 { anyNumber } }
  rule(:cientific)            { (anyChar %w[e E]) >> (anyChar %w[+ -]) >> many1 { anyNumber } }
  rule(:decimal_or_cientific) { (rule :decimal) > (rule :cientific) }
  rule(:positive_number)      { ((one "0") | many1 { anyNumber }) > (rule :decimal_or_cientific) }
  rule(:number)               { (one "-") < (rule :positive_number) }

  # array
  rule(:array_body) { (rule :value_group) | empty }
  rule(:array)      { match (rule :array_body), between: [(one "["), (one "]")] }

  rule(:value_group) { ((rule :value) >> (rule :comma) >> (rule :value_group)) | (rule :value)  }
  rule(:value)       { (rule :string) | (rule :number) | (rule :object) | (rule :array) | (rule :true) | (rule :false) | (rule :null) }
  rule(:pair)        { (rule :string) >> (rule :semicolon) >> (rule :value) }
  rule(:pair_group)  { ((rule :pair) >> (rule :comma) >> (rule :pair_group)) | (rule :pair) }
  rule(:pair_body)   { (rule :pair_group) | empty }
  rule(:object)      { match (rule :pair_body), between: [(rule :bopen), (rule :bclose)] }

  # The last rule is always the starting rule, but let's make things clear
  start(:object)
end

parser.run('{ "foo": "bar" }').ok? # => true
parser.run('{ "foo": }').ok?       # => false
parser.run('not even json').ok?    # => false

It might look a bit cryptic at first but the power combinators give us is well worth the initial learning curve! Don't believe me? Let me show you.

Introduction

Okay let's do this! I'll show you how to use this library and why it's awesome. Let's say we want to match an assign statement:

foo = 1

We can start right away!

# this uses minitest, see `test/test_turorial.rb`
parser = Grammar.build do
  rule(:assign) { (str "foo = 1") }

  start(:assign)
end

parser.run("foo = 1").ok?.must_equal true

That was almost cheating wasn't it. Let's say we want to be able to match any number now:

NOTE That looks like RSpec doesn't it? Well it's what most Ruby DSLs look like, and if you've ever worked with any, you'll feel right at home. If not, don't worry a DSL is a tiny language afterall!

parser = Grammar.build do
  rule(:assign) { (str "foo = ") >> anyNumber }

  start(:assign)
end

parser.run("foo = 1").ok?.must_equal true
parser.run("foo = 3").ok?.must_equal true
parser.run("foo = 9").ok?.must_equal true

It works! It really is that easy. But what is that >> thing over there? It just means "match this AND THEN match this other thing". If any of them fails, the rule fails.

Okay now let's say we want real identifiers, not just foo:

parser = Grammar.build do
  rule(:assign) { many1 { anyLetter } >> (str " = ") >> anyNumber }

  start(:assign)
end

parser.run("foo = 1").ok?.must_equal true
parser.run("bar = 3").ok?.must_equal true
parser.run("baz = 9").ok?.must_equal true

Oh! Almost too easy right? many1 is a parser provided for you, it takes a block. The block must return a parser which will be run on the input one or more times. It's the same as saying anyLetter+.

We also use something called anyLetter. It is a parser provided by the library and it matches [a-zA-Z]+.

NOTE As you might have guessed, you are also provided of a parser called many0. many0 { anyLetter } is the equivalent of anyLetter*.

Now, let's tidy up the grammar a bit:

parser = Grammar.build do
  rule(:equals) { whitespace < (one "=") > whitespace }
  rule(:assign) { many1 { anyLetter } >> (rule :equals) >> anyNumber }

  start(:assign)
end

parser.run("foo = 1").ok?.must_equal true
parser.run("bar =3").ok?.must_equal  true
parser.run("baz= 9").ok?.must_equal  true

Nice! What is > and < you ask? Well, they are similar to >>, and they are called combinators. This is a parser combinator afterall right? That whole concept is borrowed from functional programming, you don't really need it to use the library at all though, so don't worry.

The combinator > means "take whatever is on the left, and the right is optional". As you might have guessed, < means the exact opposite, "take whatever is on the right, and the left is optional". We can combine those two in a hacky way to write

whitespace < (one "=") > whitespace

You can think about it as "surround (one "=") with optional stuff".

And that's it! Not bad for a 5 minutes intro huh?

Documentation

The documentation is a WIP, and reflects the actual API as much as possible, the true API lives in the tests, so it's highly recommended to check them out. Tests can be found in test/test_*.rb.

Base Parsers

The library provides several base parsers for you. Those are used to constuct bigger, more complex parsers.

Binary Combinators

Logical OR: |

let(:foo) { rule(:bar) | rule(:baz) }

Logical AND: >>

let(:foo) { rule(:bar) >> rule(:baz) }

Unary Combinators

TODO

  • and +

Development

$ bundle install

Parsers

A parser is an instance of Parser, an object with a run method which takes some input and returns a ParserResult.

Running tests

$ ruby -Ilib:test test/test_parser.rb

TODO

Decent syntax error reporting, eg: Which line, which column failed.

parser-combinator's People

Contributors

gosukiwi avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.