Bytecode transformers for CPython inspired by the ast module's NodeTransformer.
codetransformer is a library that allows us to work with CPython's bytecode representation at runtime. codetransformer provides a level of abstraction between the programmer and the raw bytes read by the eval loop so that we can more easily inspect and modify bytecode.
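The standard library's dis module already exposes two views of a function: the raw co_code bytes that the eval loop reads, and a decoded list of instructions. The sketch below uses only dis, not codetransformer, to show that gap. The exact opcodes printed vary between CPython versions.

```python
import dis


def add_one(x):
    return x + 1


# The raw bytes the eval loop executes: opcodes plus integer arguments.
raw = add_one.__code__.co_code

# dis decodes the same bytes into Instruction records. `argval` is the
# resolved object (a local name, a constant, ...) rather than a table index.
decoded = [(ins.opname, ins.argval) for ins in dis.get_instructions(add_one)]

print(raw)
for opname, argval in decoded:
    print(opname, argval)
```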
codetransformer is motivated by the need to override parts of the Python language that are not already hooked into through data model methods. For example:
- Override the is and not operators.
- Custom data structure literals.
- Syntax features that cannot be represented with valid Python AST or source.
- Run without a modified CPython interpreter.
codetransformer was originally developed as part of lazy to implement the transformations needed to override the code objects at runtime.
One use of this is overloading literals. While this can be done as an AST transformation, the transformed code will often need to execute the constructor for the literal every time the expression is evaluated. Also, we need to be sure that any additional names required to run our code are provided when we run. With codetransformer, we can precompute our new literals and emit code that is as fast as loading our unmodified literals, without requiring any additional names to be available implicitly.
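The precomputation idea can be sketched with the standard library alone. Build each new literal once, store it in the code object's co_consts, and the existing LOAD_CONST instruction loads it with no extra work and no extra names. with_decimal_constants is a hypothetical helper, not part of codetransformer. It relies on CodeType.replace (Python 3.8+) and only rewrites the function's own top-level constants.

```python
import types
from decimal import Decimal


def with_decimal_constants(func):
    """Hypothetical helper: copy `func`, replacing float constants with Decimals.

    Each Decimal is constructed once, here, and stored in co_consts, so the
    new function loads it through the same constant-table index the float
    used. Nested functions' constants are not touched.
    """
    code = func.__code__
    consts = tuple(
        Decimal(repr(c)) if type(c) is float else c for c in code.co_consts
    )
    return types.FunctionType(
        code.replace(co_consts=consts),  # CodeType.replace: Python 3.8+
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )


def f():
    return 1.5


g = with_decimal_constants(f)
print(repr(g()))  # Decimal('1.5')
```

Because the Decimal lives in co_consts, every call returns the very same object (g() is g() is True). The constructor never runs at call time.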
In the following block we demonstrate overloading dictionary syntax to produce collections.OrderedDict objects. OrderedDict is like a dict; however, the order of the keys is preserved.
>>> from codetransformer.transformers.literals import ordereddict_literals
>>> @ordereddict_literals
... def f():
...     return {'a': 1, 'b': 2, 'c': 3}
>>> f()
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
This also supports dictionary comprehensions:
>>> @ordereddict_literals
... def f():
...     return {k: v for k, v in zip('abc', (1, 2, 3))}
>>> f()
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
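dis shows the instructions CPython emits for a dict literal, which are what a transformer like ordereddict_literals has to rewrite. The map-building opcode differs with the CPython version and with whether the keys are constants (BUILD_CONST_KEY_MAP or BUILD_MAP, for example). This stdlib-only sketch therefore looks for any map-building opcode rather than one specific name. It does not show how ordereddict_literals itself is implemented.

```python
import dis


def f():
    return {'a': 1, 'b': 2, 'c': 3}


# Every opcode CPython emitted for f, in order.
opnames = [ins.opname for ins in dis.get_instructions(f)]

# The dict is built at call time by a map-building instruction; a literal
# transformer of this kind would replace that instruction (and the loads
# feeding it) with code that builds a different object.
map_builders = [
    name for name in opnames if name.startswith('BUILD_') and 'MAP' in name
]
print(map_builders)
```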
The next block overrides float literals with decimal.Decimal objects. These objects support exact decimal arithmetic with a user-configurable precision.
>>> from codetransformer.transformers.literals import decimal_literals
>>> @decimal_literals
... def f():
...     return 1.5
>>> f()
Decimal('1.5')
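This stdlib-only illustration shows what the switch buys. Binary floats cannot represent most decimal fractions exactly, while Decimal values built from digit strings can. Decimal precision is set by the active context (28 significant digits by default) rather than fixed by the hardware.

```python
from decimal import Decimal, localcontext

# Binary floats cannot represent 1.1 or 2.2 exactly, so the sum is off.
float_sum = 1.1 + 2.2
# Decimals built from strings hold the exact decimal digits.
decimal_sum = Decimal('1.1') + Decimal('2.2')

print(float_sum)    # 3.3000000000000003
print(decimal_sum)  # 3.3

# Precision is a property of the active context, not of the type.
with localcontext() as ctx:
    ctx.prec = 50
    seventh = Decimal(1) / Decimal(7)
print(seventh)  # 50 significant digits
```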
Pattern matched exceptions are a good example of a CodeTransformer that would be very complicated to implement at the AST level. This transformation extends the try/except syntax to accept instances of BaseException as well as subclasses of BaseException. When an except clause names an instance, the args of the exception will be compared for equality to determine which exception handler should be invoked. For example:
>>> @pattern_matched_exceptions()
... def foo():
...     try:
...         raise ValueError('bar')
...     except ValueError('buzz'):
...         return 'buzz'
...     except ValueError('bar'):
...         return 'bar'
>>> foo()
'bar'
This function raises an instance of ValueError and attempts to catch it. The first handler looks for instances of ValueError that were constructed with the argument 'buzz'. Because our exception is raised with 'bar', the arguments are not equal and we do not enter this handler. The next handler looks for ValueError('bar'), which does match the exception we raised. We then enter this block and normal Python rules take over.
We may also pass our own exception matching function:
>>> def match_greater(match_expr, exc_type, exc_value, exc_traceback):
...     return match_expr > exc_value.args[0]
>>> @pattern_matched_exceptions(match_greater)
... def foo():
...     try:
...         raise ValueError(5)
...     except 4:
...         return 4
...     except 5:
...         return 5
...     except 6:
...         return 6
>>> foo()
6
This handler matches when the match expression is greater than the first argument of the raised exception, whatever the exception's type. This particular behavior would be very hard to mimic through AST level transformations.
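The matching rules above can be emulated in plain Python to make the semantics concrete. This sketch does not show how pattern_matched_exceptions works internally, since the library rewrites bytecode instead. default_match and dispatch are hypothetical names. default_match assumes that an instance pattern also requires the raised exception to be an instance of the pattern's class, which the text above does not spell out.

```python
def default_match(match_expr, exc_type, exc_value, exc_traceback):
    """Hypothetical default matcher mirroring the rules described above."""
    if isinstance(match_expr, type) and issubclass(match_expr, BaseException):
        # Ordinary `except SomeError:` behaviour.
        return isinstance(exc_value, match_expr)
    # Instance pattern (assumption: same class required), args compared by ==.
    return (isinstance(exc_value, type(match_expr))
            and match_expr.args == exc_value.args)


def match_greater(match_expr, exc_type, exc_value, exc_traceback):
    # The custom matcher from the example above.
    return match_expr > exc_value.args[0]


def dispatch(exc, handlers, match=default_match):
    """Return the result of the first handler whose expression matches `exc`.

    `handlers` is a list of (expression, result) pairs checked in order,
    like consecutive except clauses; an unmatched exception is re-raised.
    """
    for expr, result in handlers:
        if match(expr, type(exc), exc, exc.__traceback__):
            return result
    raise exc


print(dispatch(ValueError('bar'),
               [(ValueError('buzz'), 'buzz'), (ValueError('bar'), 'bar')]))  # bar
print(dispatch(ValueError(5), [(4, 4), (5, 5), (6, 6)], match=match_greater))  # 6
```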
The three core abstractions of codetransformer are:
- The Instruction object, which represents an opcode that may be paired with some argument.
- The Code object, which represents a collection of Instructions.
- The CodeTransformer object, which represents a set of rules for manipulating Code objects.
The Instruction object represents an atomic operation that can be performed by the CPython virtual machine. These are things like LOAD_NAME, which loads a name onto the stack, or ROT_TWO, which rotates the top two stack elements.
Some instructions, for example LOAD_NAME, accept an argument that modifies the behavior of the instruction. This is much like a function call where some functions accept arguments. Because the bytecode is always packed as raw bytes, the argument must be some integer (before Python 3.6 CPython stored each argument in two bytes; since 3.6 it uses one byte, with EXTENDED_ARG prefixes for larger values). This means that instructions that need a richer argument (like LOAD_NAME, which needs the actual name to look up) must keep the real arguments in a table and use the integer as an index into that table. One of the key abstractions of the Instruction object is that the argument is always a Python object that represents the actual argument. Any lookup table management is handled for the user. This is helpful because several instructions share the same table, so we don't want to add duplicate entries or forget to add them at all.
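The standard library's dis module exposes the same distinction. Each dis.Instruction carries both arg, the raw integer, and argval, the object it resolves to. In this stdlib-only sketch, STORE_NAME and LOAD_NAME both use index 0 into co_names to mean x. That shared table is what anyone editing raw bytecode has to keep consistent.

```python
import dis

# Module-level code uses LOAD_NAME / STORE_NAME, whose argument indexes co_names.
code = compile("x = 1\ny = x", "<example>", "exec")

# (opname, raw integer argument, resolved name) for every name instruction.
name_ops = [
    (ins.opname, ins.arg, ins.argval)
    for ins in dis.get_instructions(code)
    if ins.opname in ("LOAD_NAME", "STORE_NAME")
]

for op in name_ops:
    print(*op)
print(code.co_names)  # ('x', 'y')
```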
Another annoyance is that the instructions that handle control flow use their argument to say what bytecode offset to jump to. Some jumps use an absolute offset, others a relative one. This makes it hard to add or remove instructions, because all of the affected offsets must be recomputed. In codetransformer, the jump instructions all accept another Instruction as the argument so that the assembler can manage this for the user. We also provide an easy way for new instructions to "steal" jumps that targeted another instruction, so that we can safely alter the bytecode around jump targets.
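The idea can be modelled with a toy in which jump arguments are instruction objects and offsets are computed only at assembly time. Instr, assemble and the fixed 2-byte encoding are all hypothetical, and this is not codetransformer's implementation. The steal method used in the FoldNames example takes only the instruction to steal from, while this toy version is passed the program explicitly.

```python
class Instr:
    """Hypothetical toy instruction: a jump's arg is the target Instr itself."""

    def __init__(self, opname, arg=None):
        self.opname = opname
        self.arg = arg

    def steal(self, other, program):
        """Make every instruction in `program` that targeted `other` target self."""
        for ins in program:
            if ins.arg is other:
                ins.arg = self
        return self


SIZE = 2  # toy fixed-width encoding: each instruction occupies 2 bytes


def assemble(program):
    """Resolve Instr-valued args to absolute byte offsets."""
    offsets = {id(ins): i * SIZE for i, ins in enumerate(program)}
    return [
        (ins.opname, offsets[id(ins.arg)] if isinstance(ins.arg, Instr) else ins.arg)
        for ins in program
    ]


end = Instr("RETURN_VALUE")
program = [
    Instr("LOAD_NAME", "x"),
    Instr("POP_JUMP_IF_FALSE", end),  # refers to `end`, not to an offset
    Instr("LOAD_CONST", 1),
    end,
]
before = assemble(program)

# Insert an instruction before the target; no offset is edited by hand.
program.insert(2, Instr("NOP"))
after_insert = assemble(program)

# A new instruction placed in front of `end` takes over its incoming jumps.
new = Instr("LOAD_CONST", None)
position = program.index(end)
program.insert(position, new.steal(end, program))
after_steal = assemble(program)
```

Here before[1] is ('POP_JUMP_IF_FALSE', 6), and the jump becomes 8 after the NOP is inserted. After the steal, the jump targets new (offset 8), while end moves to offset 10.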
Code objects are a nice abstraction over Python's types.CodeType. Quoting the CodeType constructor docstring:
code(argcount, kwonlyargcount, nlocals, stacksize, flags, codestring,
constants, names, varnames, filename, name, firstlineno,
lnotab[, freevars[, cellvars]])
Create a code object. Not for the faint of heart.
The codetransformer abstraction is designed to make it easy to dynamically construct and inspect these objects. This allows us to easily set things like the argument names and to manipulate the line number mappings.
The Code object provides methods for converting to and from Python's code representation:
- from_pycode
- to_pycode
This allows us to take an existing function, decode its code into a Code object, modify it, and then assemble it back into a new Python code object.
Note: Code objects are immutable. When we say "modify", we mean create a copy with different values.
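Python's own types.CodeType is immutable in the same way, and this stdlib-only sketch shows it. It uses CodeType.replace (Python 3.8+) rather than the long constructor signature quoted above, and it is not codetransformer's Code API. The copy renames the function's parameters, one of the edits mentioned above.

```python
import types


def sub(a, b):
    return a - b


code = sub.__code__

# Code objects are immutable: attributes cannot be reassigned in place.
try:
    code.co_name = "other"
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# "Modifying" means building a copy. Here the two parameters are renamed;
# the bytecode is unchanged because locals are accessed by index.
renamed = code.replace(co_varnames=("x", "y"))
new_sub = types.FunctionType(renamed, sub.__globals__, "new_sub")

print(new_sub(y=1, x=5))  # 4
print(code.co_varnames)   # ('a', 'b'): the original is unchanged
```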
A CodeTransformer is the set of rules that are used to actually modify Code objects. These rules are defined as a set of patterns, which are a DSL used to define a DFA for matching against sequences of Instruction objects. Once we have matched a segment, we yield new instructions to replace what we have matched. A simple codetransformer looks like:
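from codetransformer import CodeTransformer, instructions, pattern

class FoldNames(CodeTransformer):
    @pattern(
        instructions.LOAD_GLOBAL,
        instructions.LOAD_GLOBAL,
        instructions.BINARY_ADD,
    )
    def _load_fast(self, a, b, add):
        # Replace `a + b` on two globals with one LOAD_FAST of the pasted
        # name (a + b -> ab), and redirect any jumps that targeted `a`.
        yield instructions.LOAD_FAST(a.arg + b.arg).steal(a)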
This CodeTransformer uses the + operator to implement something like the C preprocessor's token pasting for local variables. We read this pattern as a sequence of two LOAD_GLOBAL instructions (global name lookups) followed by a BINARY_ADD instruction (+ operator call). When the pattern matches, the handler is called with the three matched instructions passed positionally. It replaces this sequence with a single LOAD_FAST instruction (local name lookup) whose name is the two global names concatenated. We then steal any jumps that used to target the first LOAD_GLOBAL.
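This stdlib-only sketch scans a function's dis output for the sequence the pattern targets, using a fixed three-instruction sliding window. That is a much simpler matcher than the DFA codetransformer builds from patterns, and pasted_names is a hypothetical helper. Python 3.11+ replaced BINARY_ADD with BINARY_OP, so the check accepts either.

```python
import dis


def is_add(ins):
    # Python <= 3.10 emits BINARY_ADD; 3.11+ emits BINARY_OP with argrepr '+'.
    return ins.opname == "BINARY_ADD" or (
        ins.opname == "BINARY_OP" and ins.argrepr == "+"
    )


def pasted_names(func):
    """Hypothetical helper: the names FoldNames-style pasting would produce.

    Slides a three-instruction window over the bytecode looking for
    LOAD_GLOBAL, LOAD_GLOBAL, <add>, the sequence the pattern declares.
    """
    instrs = list(dis.get_instructions(func))
    found = []
    for a, b, add in zip(instrs, instrs[1:], instrs[2:]):
        if a.opname == b.opname == "LOAD_GLOBAL" and is_add(add):
            found.append(a.argval + b.argval)
    return found


def f():
    ab = 3
    return a + b  # `a` and `b` are deliberately undefined globals


print(pasted_names(f))  # ['ab']
```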
We can execute this transformer by calling an instance of it on a function object, or using it like a decorator. For example:
>>> @FoldNames()
... def f():
...     ab = 3
...     return a + b
>>> f()
3
codetransformer is free software, licensed under the GNU General Public License, version 2. For more information see the LICENSE file.
Source code is hosted on GitHub at https://github.com/llllllllll/codetransformer.