Code Monkey home page Code Monkey logo

ply's Introduction

PLY - Python Lex-Yacc

Author: David Beazley (https://www.dabeaz.com)

Introduction

PLY is a zero-dependency Python implementation of the traditional parsing tools lex and yacc. It uses the same LALR(1) parsing algorithm as yacc and has most of its core features. It is compatible with all modern versions of Python.

PLY was originally created in 2001 to support an Introduction to Compilers course at the University of Chicago. As such, it has almost no features other than the core LALR(1) parsing algorithm. This is by design--students should be made to suffer. Well, at least a little bit. However, from a more practical point of view, there is a lot flexibility in terms of how you decide to use it. You can use PLY to build Abstract Syntax Trees (ASTs), simple one-pass compilers, protocol decoders, or even a more advanced parsing framework.

Download

The current release of PLY requires the use of Python 3.6 or greater. If you need to support an older version, download one of the historical releases.

How to Install and Use

Although PLY is open-source, it is not distributed or installed by package manager. There are only two files: lex.py and yacc.py, both of which are contained in a ply package directory. To use PLY, copy the ply directory into your project and import lex and yacc from the associated ply subpackage.

from .ply import lex
from .ply import yacc

PLY has no third-party dependencies and can be freely renamed or moved around within your project as you see fit. It rarely changes.

Example

The primary use for PLY is writing parsers for programming languages. Here an example that parses simple expressions into an abstract syntax tree (AST):

# -----------------------------------------------------------------------------
# example.py
#
# Example of using PLY To parse the following simple grammar.
#
#   expression : term PLUS term
#              | term MINUS term
#              | term
#
#   term       : factor TIMES factor
#              | factor DIVIDE factor
#              | factor
#
#   factor     : NUMBER
#              | NAME
#              | PLUS factor
#              | MINUS factor
#              | LPAREN expression RPAREN
#
# -----------------------------------------------------------------------------

from ply.lex import lex
from ply.yacc import yacc

# --- Tokenizer

# All tokens must be named in advance.
tokens = ( 'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'LPAREN', 'RPAREN',
           'NAME', 'NUMBER' )

# Ignored characters
t_ignore = ' \t'

# Token matching rules are written as regexs
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

# A function can be used if there is an associated action.
# Write the matching regex in the docstring.
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Ignored token with an action associated with it
def t_ignore_newline(t):
    r'\n+'
    t.lexer.lineno += t.value.count('\n')

# Error handler for illegal characters
def t_error(t):
    print(f'Illegal character {t.value[0]!r}')
    t.lexer.skip(1)

# Build the lexer object
lexer = lex()
    
# --- Parser

# Write functions for each grammar rule which is
# specified in the docstring.
def p_expression(p):
    '''
    expression : term PLUS term
               | term MINUS term
    '''
    # p is a sequence that represents rule contents.
    #
    # expression : term PLUS term
    #   p[0]     : p[1] p[2] p[3]
    # 
    p[0] = ('binop', p[2], p[1], p[3])

def p_expression_term(p):
    '''
    expression : term
    '''
    p[0] = p[1]

def p_term(p):
    '''
    term : factor TIMES factor
         | factor DIVIDE factor
    '''
    p[0] = ('binop', p[2], p[1], p[3])

def p_term_factor(p):
    '''
    term : factor
    '''
    p[0] = p[1]

def p_factor_number(p):
    '''
    factor : NUMBER
    '''
    p[0] = ('number', p[1])

def p_factor_name(p):
    '''
    factor : NAME
    '''
    p[0] = ('name', p[1])

def p_factor_unary(p):
    '''
    factor : PLUS factor
           | MINUS factor
    '''
    p[0] = ('unary', p[1], p[2])

def p_factor_grouped(p):
    '''
    factor : LPAREN expression RPAREN
    '''
    p[0] = ('grouped', p[2])

def p_error(p):
    print(f'Syntax error at {p.value!r}')

# Build the parser
parser = yacc()

# Parse an expression
ast = parser.parse('2 * 3 + 4 * (5 - x)')
print(ast)

Documentation

The doc/ directory has more extensive documentation on PLY.

Examples

The example/ directory has various examples of using PLY.

Test Suite

PLY is not shipped with a test-suite. However, it is extensively tested before releases are made. The file ply-tests.tar.gz contains a current version of the tests should you want to run them on your own.

Bug Reports

PLY is mature software and new features are rarely added. If you think you have found a bug, please report an issue.

Questions and Answers

Q: Why does PLY have such a weird API?

Aside from simply being over 20 years old and predating many advanced Python features, PLY takes inspiration from two primary sources: the Unix yacc utility and John Aycock's Spark toolkit. yacc uses a convention of referencing grammar symbols by $1, $2, and so forth. PLY mirrors this using p[1], p[2], etc. Spark was a parsing toolkit that utilized introspection and Python docstrings for specifying a grammar. PLY borrowed that because it was clever. A modern API can be found in the related SLY project.

Q: Why isn't PLY distributed as a package?

PLY is highly specialized software that is really only of interest to people creating parsers for programming languages. In such a project, I think you should take responsibility for library dependencies. PLY is very small and rarely changes--if it works for your project, there is no reason to ever update it. At the same time, as it's author, I reserve the creative right to make occasional updates to the project, including those that might introduce breaking changes. The bottom line is that I'm not a link in your software supply chain. If you use PLY, you should copy it into your project.

Q: Do you accept pull requests and user contributions?

No. New features are not being added to PLY at this time. However, I am interested in bug reports should you find a problem with PLY. Feel free to use the issue-tracker for feature ideas--just because PLY is not in active development doesn't mean that good ideas will be ignored.

Q: Can I use PLY to make my own parsing tool?

Absolutely! It's free software that you are free to copy and modify in any way that makes sense. This includes remixing it into something that's more powerful. I'd just ask that you keep the original copyright notice in files based on the original PLY source code so that others know where it came from.

Acknowledgements

A special thanks is in order for all of the students in CS326 who suffered through about 25 different versions of this.

The CHANGES file acknowledges those who have contributed patches.

Elias Ioup did the first implementation of LALR(1) parsing in PLY-1.x. Andrew Waters and Markus Schoepflin were instrumental in reporting bugs and testing a revised LALR(1) implementation for PLY-2.0.

Take a Class!

If you'd like to learn more about compiler principles or have a go at implementing a compiler from scratch, come take a course. https://www.dabeaz.com/compiler.html.

ply's People

Contributors

alberth avatar astrofrog avatar bobotig avatar boriel avatar brettcannon avatar carreau avatar dabeaz avatar divergentdave avatar gabrielaraujof avatar gmt avatar graingert avatar gvalkov avatar hugovk avatar jackdanger avatar joeedh avatar jvtrigueros avatar kajojify avatar laerreal avatar manuphatak avatar mdboom avatar moses-palmer avatar oboroc avatar paulsmith avatar psihonavt avatar qyearsley avatar richardustc avatar rreilink avatar segevfiner avatar timgates42 avatar warsaw avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.