PCREParser

Introduction

An ANTLR 3 grammar that generates a parser for PCRE (Perl Compatible Regular Expressions) patterns, producing an abstract syntax tree (AST) for each expression.

For an ANTLR 4 grammar, have a look here: https://github.com/bkiers/pcre-parser

For a JavaScript version, check out the js branch.

Download/get library

To get the library, check out the project and run mvn clean install, or download the jar.

You can also try the parser online: pcreparser.appspot.com

Some examples

The main class of this library is the pcreparser.PCRE class. Below are some examples of supported functionality.

1. Get the capture group count

source:

PCRE pcre = new PCRE("((.)\\1+ (?<YEAR>(?:19|20)\\d{2})) [^]-x]");
System.out.println(pcre.getGroupCount());

output:

3

Note that the named capture group (?<YEAR>(?:19|20)\\d{2}) is counted as well, while the non-capturing group (?:19|20) is not. The groups, numbered by the position of their opening parenthesis, are:

  1. ((.)\\1+ (?<YEAR>(?:19|20)\\d{2}))
  2. (.)
  3. (?<YEAR>(?:19|20)\\d{2})
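As an independent sanity check, the JDK's own java.util.regex engine reports the same count for this particular pattern. The class name GroupCountCheck is hypothetical and not part of pcreparser. The check only works for patterns that both flavors accept, since Java's regex syntax is not identical to PCRE.

```java
import java.util.regex.Pattern;

public class GroupCountCheck {

    // Number of capturing groups (numbered and named) as seen by java.util.regex.
    // Matcher#groupCount() excludes group 0, the whole match, like getGroupCount() above.
    static int jdkGroupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String regex = "((.)\\1+ (?<YEAR>(?:19|20)\\d{2})) [^]-x]";
        System.out.println(jdkGroupCount(regex)); // prints 3
    }
}
```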

2. Get the named group count

source:

PCRE pcre = new PCRE("((.)\\1+ (?<YEAR>(?:19|20)\\d{2})) [^]-x]");
System.out.println(pcre.getNamedGroupCount());

output:

1
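To illustrate what the named group count covers, here is a rough, hypothetical scanner (NamedGroupScan is not part of pcreparser, which derives this information from the parsed AST instead). It counts the three PCRE named-group syntaxes (?<name>...), (?'name'...) and (?P<name>...), skips escaped characters and character classes, and ignores lookbehinds such as (?<=...). It does not handle \Q...\E quoting, (?#...) comments or POSIX classes, which are exactly the kind of corner cases a real grammar takes care of.

```java
public class NamedGroupScan {

    // Rough count of PCRE named groups: (?<name>...), (?'name'...) and (?P<name>...).
    // Skips escaped characters and character classes; does NOT handle \Q...\E,
    // (?#...) comments or POSIX classes such as [[:digit:]].
    static int countNamedGroups(String regex) {
        int count = 0;
        int i = 0;
        while (i < regex.length()) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i += 2;                          // skip the escaped character
            } else if (c == '[') {
                i = skipCharClass(regex, i);     // "(" inside [...] is a literal
            } else {
                if (c == '(' && isNamedGroupStart(regex, i + 1)) {
                    count++;
                }
                i++;
            }
        }
        return count;
    }

    // True if the text right after "(" opens a named group.
    private static boolean isNamedGroupStart(String regex, int i) {
        if (regex.startsWith("?P<", i) || regex.startsWith("?'", i)) {
            return true;
        }
        if (regex.startsWith("?<", i) && i + 2 < regex.length()) {
            char next = regex.charAt(i + 2);
            return next != '=' && next != '!';   // (?<= and (?<! are lookbehinds
        }
        return false;
    }

    // Returns the index just past the "]" that closes the class opened at 'start'.
    private static int skipCharClass(String regex, int start) {
        int i = start + 1;
        if (i < regex.length() && regex.charAt(i) == '^') i++;
        if (i < regex.length() && regex.charAt(i) == ']') i++;  // leading "]" is literal, as in [^]-x]
        while (i < regex.length()) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i += 2;
            } else if (c == ']') {
                return i + 1;
            } else {
                i++;
            }
        }
        return i;                                // unterminated class: stop at the end
    }

    public static void main(String[] args) {
        String regex = "((.)\\1+ (?<YEAR>(?:19|20)\\d{2})) [^]-x]";
        System.out.println(countNamedGroups(regex)); // prints 1
    }
}
```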

3. Print an ASCII tree of a regex or group

source:

PCRE pcre = new PCRE("((.)\\1+ (?<YEAR>(?:19|20)\\d{2})) [^]-x]");
System.out.println(pcre.toStringASCII()); // equivalent to: pcre.toStringASCII(0)

output:

'- ALTERNATIVE
   |- ELEMENT
   |  '- CAPTURING_GROUP
   |     '- ALTERNATIVE
   |        |- ELEMENT
   |        |  '- CAPTURING_GROUP
   |        |     '- ALTERNATIVE
   |        |        '- ELEMENT
   |        |           '- ANY
   |        |- ELEMENT
   |        |  |- NUMBERED_BACKREFERENCE
   |        |  |  '- NUMBER='1'
   |        |  '- QUANTIFIER
   |        |     |- NUMBER='1'
   |        |     |- NUMBER='2147483647'
   |        |     '- GREEDY
   |        |- ELEMENT
   |        |  '- LITERAL=' '
   |        '- ELEMENT
   |           '- NAMED_CAPTURING_GROUP_PERL
   |              |- NAME='YEAR'
   |              '- ALTERNATIVE
   |                 |- ELEMENT
   |                 |  '- NON_CAPTURING_GROUP
   |                 |     '- OR
   |                 |        |- ALTERNATIVE
   |                 |        |  |- ELEMENT
   |                 |        |  |  '- LITERAL='1'
   |                 |        |  '- ELEMENT
   |                 |        |     '- LITERAL='9'
   |                 |        '- ALTERNATIVE
   |                 |           |- ELEMENT
   |                 |           |  '- LITERAL='2'
   |                 |           '- ELEMENT
   |                 |              '- LITERAL='0'
   |                 '- ELEMENT
   |                    |- DecimalDigit='\d'
   |                    '- QUANTIFIER
   |                       |- NUMBER='2'
   |                       |- NUMBER='2'
   |                       '- GREEDY
   |- ELEMENT
   |  '- LITERAL=' '
   '- ELEMENT
      '- NEGATED_CHARACTER_CLASS
         '- RANGE
            |- LITERAL=']'
            '- LITERAL='x'

To print only a specific numbered or named group, pass its number or name:

source:

PCRE pcre = new PCRE("((.)\\1+ (?<YEAR>(?:19|20)\\d{2})) [^]-x]");
System.out.println(pcre.toStringASCII(2));

output:

'- CAPTURING_GROUP
   '- ALTERNATIVE
      '- ELEMENT
         '- ANY

source:

PCRE pcre = new PCRE("((.)\\1+ (?<YEAR>(?:19|20)\\d{2})) [^]-x]");
System.out.println(pcre.toStringASCII("YEAR"));

output:

'- NAMED_CAPTURING_GROUP_PERL
   |- NAME='YEAR'
   '- ALTERNATIVE
      |- ELEMENT
      |  '- NON_CAPTURING_GROUP
      |     '- OR
      |        |- ALTERNATIVE
      |        |  |- ELEMENT
      |        |  |  '- LITERAL='1'
      |        |  '- ELEMENT
      |        |     '- LITERAL='9'
      |        '- ALTERNATIVE
      |           |- ELEMENT
      |           |  '- LITERAL='2'
      |           '- ELEMENT
      |              '- LITERAL='0'
      '- ELEMENT
         |- DecimalDigit='\d'
         '- QUANTIFIER
            |- NUMBER='2'
            |- NUMBER='2'
            '- GREEDY
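The indentation scheme of the ASCII output can be reproduced with a short recursive printer. This sketch uses a hypothetical Node class rather than the library's CommonTree and is not pcreparser's own implementation; it only assumes the conventions visible in the output above: the last child of a node is drawn with '- and every other child with |-, and a | rail continues below each child that is not the last one.

```java
import java.util.Arrays;
import java.util.List;

public class AsciiTree {

    // Hypothetical AST node: a display label and its children.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        render(root, "", true, out);
        return out.toString();
    }

    // The last child is drawn with "'- " and its subtree indented by spaces;
    // other children use "|- " and keep a "|" rail for the lines below them.
    private static void render(Node node, String prefix, boolean last, StringBuilder out) {
        out.append(prefix).append(last ? "'- " : "|- ").append(node.label).append('\n');
        String childPrefix = prefix + (last ? "   " : "|  ");
        for (int i = 0; i < node.children.size(); i++) {
            render(node.children.get(i), childPrefix, i == node.children.size() - 1, out);
        }
    }

    public static void main(String[] args) {
        // Group 2, (.), from the example above
        Node group2 = new Node("CAPTURING_GROUP",
                new Node("ALTERNATIVE",
                        new Node("ELEMENT",
                                new Node("ANY"))));
        System.out.print(render(group2));
    }
}
```

Running main prints the same four lines as pcre.toStringASCII(2) above.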

Besides toStringASCII(), the AST can also be rendered in Graphviz DOT format:

  • PCRE#toStringDOT(): creates a DOT representation of group 0
  • PCRE#toStringDOT(int n): creates a DOT representation of group n
  • PCRE#toStringDOT(String s): creates a DOT representation of named group s
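A DOT rendering of a tree boils down to one declaration per node plus one edge per parent-child pair. The following is a minimal sketch over a hypothetical Node class, not the exact text produced by toStringDOT(); the node ids n0, n1, ... and the graph name AST are arbitrary choices. The result can be rendered with Graphviz, e.g. dot -Tpng ast.dot -o ast.png.

```java
import java.util.Arrays;
import java.util.List;

public class DotTree {

    // Hypothetical AST node: a display label and its children.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    static String toDot(Node root) {
        StringBuilder out = new StringBuilder("digraph AST {\n");
        emit(root, new int[] {0}, out);
        return out.append("}\n").toString();
    }

    // Declares the node, recurses into each child, then links parent to child.
    // Returns the id assigned to this node.
    private static int emit(Node node, int[] nextId, StringBuilder out) {
        int id = nextId[0]++;
        out.append("  n").append(id)
           .append(" [label=\"").append(escape(node.label)).append("\"];\n");
        for (Node child : node.children) {
            int childId = emit(child, nextId, out);
            out.append("  n").append(id).append(" -> n").append(childId).append(";\n");
        }
        return id;
    }

    // Labels are double-quoted DOT strings, so escape backslashes and quotes
    // (e.g. in DecimalDigit='\d').
    private static String escape(String label) {
        return label.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Node tree = new Node("ELEMENT", new Node("LITERAL=' '"));
        System.out.print(toDot(tree));
    }
}
```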

4. Get the actual AST

To get the actual AST of the pattern, use one of the following methods:

  • PCRE#getCommonTree(): get the AST of group 0
  • PCRE#getCommonTree(int n): get the AST of group n
  • PCRE#getCommonTree(String s): get the AST of named group s

Each of these methods returns an ANTLR CommonTree, which offers, among others, the following methods:

  • CommonTree#getChildren(): List: a java.util.List of all child nodes (ASTs)
  • CommonTree#getType(): int: the token type of the node (the token types are available as static ints in PCRELexer once it has been generated)
  • CommonTree#getText(): String: the text matched during parsing by the token associated with this node
  • for the remaining methods, see the CommonTree API documentation
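These accessors are enough for a recursive walk over the AST, for example to collect the names of all named groups. To keep the sketch self-contained, it uses a hypothetical Tree stand-in with made-up token-type constants instead of CommonTree and the generated PCRELexer constants, and it assumes, as the NAME='YEAR' line in the ASCII output suggests, that a NAME node's text is the group name. With the real CommonTree, the children list holds untyped elements that need a cast, and in the ANTLR 3 runtime getChildren() returns null for a node without children, which is why the walk checks for null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AstWalk {

    // Hypothetical token types; the real values are the static ints in the generated PCRELexer.
    static final int NAMED_CAPTURING_GROUP_PERL = 1;
    static final int NAME = 2;
    static final int ALTERNATIVE = 3;

    // Hypothetical stand-in for CommonTree with the three accessors listed above.
    static final class Tree {
        private final int type;
        private final String text;
        private final List<Tree> children; // null for leaves, mirroring ANTLR 3

        Tree(int type, String text, Tree... children) {
            this.type = type;
            this.text = text;
            this.children = children.length == 0 ? null : Arrays.asList(children);
        }

        int getType() { return type; }
        String getText() { return text; }
        List<Tree> getChildren() { return children; }
    }

    // Depth-first walk collecting the text of every node with the given token type.
    static List<String> collectText(Tree root, int type) {
        List<String> found = new ArrayList<>();
        collect(root, type, found);
        return found;
    }

    private static void collect(Tree node, int type, List<String> found) {
        if (node.getType() == type) {
            found.add(node.getText());
        }
        List<Tree> children = node.getChildren();
        if (children == null) {
            return; // leaf node
        }
        for (Tree child : children) {
            collect(child, type, found);
        }
    }

    public static void main(String[] args) {
        Tree year = new Tree(NAMED_CAPTURING_GROUP_PERL, "NAMED_CAPTURING_GROUP_PERL",
                new Tree(NAME, "YEAR"),
                new Tree(ALTERNATIVE, "ALTERNATIVE"));
        System.out.println(collectText(year, NAME)); // prints [YEAR]
    }
}
```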
