juliasymbolics / metatheory.jl Goto Github PK

Makes Julia reason with equations. General purpose metaprogramming, symbolic computation and algebraic equational reasoning library for the Julia programming language: E-Graphs & equality saturation, term rewriting and more.

Home Page: https://juliasymbolics.github.io/Metatheory.jl/dev/

License: MIT License

Julia 91.15% TeX 6.92% Rust 1.93%

julia symbolic symbolic-computation programming-language optimization compiler-optimization compiler-construction term-rewriting equality-saturation egraphs

metatheory.jl's Introduction

Metatheory.jl

The 3.0 version is in-development and close to release!

If you want to use Metatheory.jl, please use the `ale/3.0` branch.

It's a lot more stable, and up to ~200x faster than 2.0

Metatheory.jl is a general purpose term rewriting, metaprogramming and algebraic computation library for the Julia programming language, designed to take advantage of the powerful reflection capabilities to bridge the gap between symbolic mathematics, abstract interpretation, equational reasoning, optimization, composable compiler transforms, and advanced homoiconic pattern matching features. The core features of Metatheory.jl are a powerful rewrite rule definition language, a vast library of functional combinators for classical term rewriting and an e-graph rewriting, a fresh approach to term rewriting achieved through an equality saturation algorithm. Metatheory.jl can manipulate any kind of Julia symbolic expression type, as long as it satisfies the TermInterface.jl.

Metatheory.jl provides:

An eDSL (domain specific language) to define different kinds of symbolic rewrite rules.
A classical rewriting backend, derived from the SymbolicUtils.jl pattern matcher, supporting associative-commutative rules. It is based on the pattern matcher in the SICM book.
A flexible library of rewriter combinators.
An e-graph rewriting (equality saturation) backend and pattern matcher, based on the egg library, supporting backtracking and non-deterministic term rewriting by using a data structure called e-graph, efficiently incorporating the notion of equivalence in order to reduce the amount of user effort required to achieve optimization tasks and equational reasoning.
@capture macro for flexible metaprogramming.

Intuitively, Metatheory.jl transforms Julia expressions in other Julia expressions and can achieve such at both compile and run time. This allows Metatheory.jl users to perform customized and composable compiler optimizations specifically tailored to single, arbitrary Julia packages. Our library provides a simple, algebraically composable interface to help scientists in implementing and reasoning about semantics and all kinds of formal systems, by defining concise rewriting rules in pure, syntactically valid Julia on a high level of abstraction. Our implementation of equality saturation on e-graphs is based on the excellent, state-of-the-art technique implemented in the egg library, reimplemented in pure Julia.

2.0 is out!

Second stable version is out:

New e-graph pattern matching system, relies on functional programming and closures, and is much more extensible than 1.0's virtual machine.
No longer dispatch against types, but instead dispatch against objects.
Faster E-Graph Analysis
Better library macros
Updated TermInterface to 0.3.3
New interface for e-graph extraction using EGraphs.egraph_reconstruct_expression
Simplify E-Graph Analysis Interface. Use Symbols or functions for identifying Analyses.
Remove duplicates in E-Graph analyses data.

Many features have been ported from SymbolicUtils.jl. Metatheory.jl can be used in place of SymbolicUtils.jl when you have no need of manipulating mathematical expressions. The introduction of TermInterface.jl has allowed for large potential in generalization of term rewriting and symbolic analysis and manipulation features. Integration between Metatheory.jl with Symbolics.jl, as it has been shown in the "High-performance symbolic-numerics via multiple dispatch" paper.

Contributing

If you'd like to give us a hand and contribute to this repository you can:

Find a high level description of the project architecture in ARCHITECTURE.md
Read the contribution guidelines in CONTRIBUTING.md

Installation

You can install the stable version:

julia> using Pkg; Pkg.add("Metatheory")

Or you can install the developer version (recommended by now for latest bugfixes)

julia> using Pkg; Pkg.add(url="https://github.com/JuliaSymbolics/Metatheory.jl")

Documentation

Extensive Metatheory.jl is available here

Citing

If you use Metatheory.jl in your research, please cite our works.

Sponsors

If you enjoyed Metatheory.jl and would like to help, you can donate a coffee or choose place your logo and name in this page. See 0x0f0f0f's Github Sponsors page!

metatheory.jl's People

Contributors

Stargazers

Watchers

Forkers

sailfish009 drupalio jpfairbanks olynch philzook58 numhack giordano dpsanders chenzhao44 roger-luo murilo gpeairs amadeusine mrvplusone jlperla playfloor emaibb rexwzh gronniger herrgahr jakevossen5 schrauf jmw150 wilkenfeld securitydao pitmonticone ntoxeg dpinol t-bltg chriselrod o314 kylebd99 willow-ahrens ranocha baptistelamic rayegun turibe brugarolas gkronber jknohr nmheim e-kyriakos inqwire nystrom

metatheory.jl's Issues

Equational theories require symmetric rules

When using Metatheory for equational reasoning, we need to add both directions of all the axioms in the theory to the rewriting system. Would \iff be a good symbol for denoting a symmetric rule? ModelingToolkit uses ~ as an equation operator so that is another choice.

Anti-rules

From @bradeneliason :
2) Nonequivalence relations. It is possible to represent nonequivalence edges on the e-graph – anti-rules. The rules are transformations that are always true. An anti-rule would fire on sets of terms that are always false. For example: even ≠ odd. If ever you reach a state in your e-graph where two terms linked by an anti-rule are in the same e-class, then something has gone wrong and you can stop trying to saturate. I could see this used in a theorem prover to perform proof by contradiction. Perhaps anti-rules can be created with the current implementation of egg/MT.

For fun, I turned on DEBUG...

and I noticed a few minor problems:

@debug "updated parents " id g.parents[id] in repair! generates an exception because g has no parents field (anymore).
@debug(n, " not found in memo") in add! has its arguments reversed.
I would make a pull request but I'm not sure how to fix the first problem. Still learning how this all works.

I think it might be useful to add some simple statistics gathering about how many adds, merges, etc. are performed;
a less verbose tracing facility than debug would be useful for those wondering how it all really hangs together.

Parametrize the metadata field type in ENodes.

Experiment conditional rules

Better e-graphs pattern matching for blocks.

By now, code blocks behave as n-ary operators. What can be a good way for better integration of expressions like :(begin x; y; z end)? cc @thautwarm

interpolation of objects in theory macro with $

Knuth-Bendix Completion for Classical Backend

Match on `::` and `...` (and optionally `where`) without triggering special ematching functions.

A @raw macro call could be added in front of left hand of rules where :: and other special e-matching expressions should not be treated as special cases, and matched directly as expressions. This is not very elegant, one may want to use :: type assertions and match on a raw :: expression in the same left hand of a rule.
Usage of $ to "raw escape" expressions in the left hand could be considered and be a more consistent and elegant solution.

@when macro for conditional dynamic rules

@when guard a + foo  => b

becomes

a + foo  |>  if  guard;   b  else :($a + $foo) end

A custom type for specifying goals

Support splatting `...` in egraphs pattern matching

Support splatting in left and right hand of rules in the e-graph e-matcher.
Example:

t = @theory begin 
    f(args...) => foo(args[1:2]...) + bar(args[2:end]...)
end

This is already supported by the MatchCore backend.

Splatting in left hand of rules requires modifying the pattern matcher.
Splatting in right hand of rules does in fact require supporting dynamic rewrites.

Try Smaller Id Sizes

From our conversation: This a possibly silly micro-optimization, but it is easy to try switching from Int64 ids to UInt32 or Int32 ids.

Fix docs Package.toml

For the docs folder, I have to manually add dependencies to docs/Package.toml.

In docs/make.jl I run

include("../src/Metatheory.jl")

How do I specify in docs/Package.toml that I want to include the (unstaged, uncommitted) files and modules from the parent ../src folder, and automatically handle dependencies?

Bug in ematch for no argument functions

@philzook58

simp_theory = @theory begin
    munit() => :foo
end
G = EGraph( :(munit()) )
saturate!(G, simp_theory, timeout=1)
#=
MethodError: no method matching ematchlist(::EGraph, ::Array{Any,1}, ::Array{Any,1}, ::Base.ImmutableDict{Any,Tuple{EClass,Any}})
Closest candidates are:
  ematchlist(::EGraph, ::Array{Any,1}, !Matched::Array{Int64,1}, ::Base.ImmutableDict{Any,Tuple{EClass,Any}}) at /home/philip/.julia/packages/Metatheory/R4YWe/src/EGraphs/ematch.jl:25
=#

Symbol cache not working correctly

It seems that the symbol cache is not working correctly.
It is used to reduce the search space in terms of eclasses when performing e-matching, and should be a map of symbols -> eclasses where the symbols appear.

https://github.com/0x0f0f0f/Metatheory.jl/blob/f823eb8c733f410e10fcbb66aeeb7b71ca6c4c50/src/EGraphs/saturation.jl#L45

Is `@metatheory_init` really needed?

Testing Rule eqivalence with ==

Currently identical rules doesn't return true when compared with ==

a = Rule(:(a + b => b + a))
b = Rule(:(a + b => b + a))
a == b # currently returns false

#fix
==(a::Rule, b::Rule) = (a.expr == b.expr) && (a.mode == b.mode)

Document analyses and AbstractAnalysis

Different pretty prints in CLI of egraphs

title says it all

Use Penrose to visualize e-graphs

This is a speculative issue. It would be great to visualize an e-graph, and in particular how the saturation proceeds. Arranging the diagram in a visually informative way seems in principle quite difficult. However, Penrose supposedly solves this problem by optimizing diagrams.

Support comparisons in AST

Do a workaround/AST binarization of comparisons. Comparisons in Julia's AST are stored in a way not really fitting for Metatheory: a \in b <= c becomes Expr(:comparison, a, \in, b, <=, c)

SuperInterpreter/partial evaluator examples of the WHILE language.

see test/test_while_superinterpreter.jl and test/test_while.jl

Should error on nonsensical rules

simp_theory = @theory begin
    a == b
end

G = EGraph(:x)
@time saturate!(G, simp_theory)
extractor = addanalysis!(G, ExtractionAnalysis, astsize)
ex = extract!(G, extractor)

This rule makes no sense since the pattern b is not found on the left hand side, if I understand correctly. MT currently silently ignores the rules? It should error instead.

Support Conditional/Dynamic rewrite rules for E-Graphs backend

see #4 #3 #1

Detect causes of slowdowns in equality saturation. Improve Schedulers and performance.

See the last lines of https://github.com/0x0f0f0f/Metatheory.jl/blob/master/test/test_logic.jl
After a certain number of iterations, the equality saturation process slows down terribly even if the BackoffScheduler heuristic is applied.
To prove the Constructive Dilemma (:(((p => q) ∧ (r => s) ∧ (p ∨ r)) => (q ∨ s))) in the logic test example, it takes two equality saturation processes with 10 and 5 iterations each (saturate => extract => saturate => extract), instead of a single step with 15 iterations (which is too memory intensive, or too slow).

How to write your own Analysis/Scheduler tutorials

Serious egraphs pattern matcher bug

How to reproduce on latest master commit:

testt = @theory begin
    a + b       ==  b + a
    a + (b + c) ==  (a + b) + c
    d - d       => 0
    a + -b      => a - b
    a - b       => a + (-b)
end

ex = :((a + d) + -d)
g = EGraph(ex)
saturate!(g, testt)
# ERROR: unbound pattern variable b in rule Rule(:(a + -b => a - b))

Symbol literals are treated as QuoteNode in the `rewrite` backend.

Bugs when function calls have no arguments

An issue in extraction

julia> using Metatheory;

julia> using Metatheory.EGraphs;

julia> @metatheory_init ();

julia> ex = :(f());

julia> g = EGraph(ex);

julia> extract!(g, astsize)
:f

The expected result should be :(f()).

Optimize internal operations in E-Graphs

The EGraph type and operations extensively use dictionaries. Consider performance tips from the Julia manual:
https://docs.julialang.org/en/v1/manual/performance-tips/

Replace unefficient/memory hungry data structures (dictionaries, sets, dynamic arrays) with more efficient and specific (e.g. static arrays, sparse matrices) ones.
Consider using StaticArrays
Avoid containers with abstract type parameters
Annotate values taken from untyped locations
Write Type Stable Functions
Avoid Changing Variable Types
Use Loop Vectorization where possible: when the execution order does not matter.
Use parallel for loops with @threads or FLoops.jl

From https://dl.acm.org/doi/pdf/10.1145/3434304 :
"egg sorts e-nodes within each e-class ot enable binary search and also maintains a cache mapping function symbols to e-classes that contain e-nodes with that function symbol"

Cleaner exceptions that are better to handle and highlight which rule errored.

overlap with symbolics.jl?

both seem to be based at the core on graph rewriting and i think they use e graphs too. thoughts?

Consistency of literal symbols and pattern variables between backends

julia> co = @theory begin
               foo(x ⋅ :bazoo ⋅ :woo) => Σ(:n * x)
       end
1-element Vector{Rule}:
 Rule(:(foo((x ⋅ :bazoo) ⋅ :woo) => Σ(:n * x)))


julia> @testset "Consistency with Matchcore backend" begin
               ex = :(foo(wa(rio) ⋅ bazoo ⋅ woo))
               g = EGraph(ex)
               saturate!(g, co)
               extran = addanalysis!(g, ExtractionAnalysis, astsize)

               res = extract!(g, extran)

               resclassic = rewrite(ex, co)

               println(res)
               println(resclassic)

       end

prints

Σ(:n * wa(rio))
Σ(n * wa(rio))

This should print

Σ(n * wa(rio))
Σ(n * wa(rio))

Backoff scheduler

By now, this implementation uses an every-rule-every-time. A scheduler that identifies rewrites that are unperformant would make this implementation a lot faster.

From the egg paper https://dl.acm.org/doi/pdf/10.1145/3434304

By default, egg uses the built-in
backoff scheduler that identifies rewrites that are matching in exponentially-growing locations and
temporarily bans them. We have observed that this greatly reduced run time (producing the same
results) in many settings. egg can also use a conventional every-rule-every-time scheduler, or the
user can supply their own.

Goal-Informed Schedulers

Hard pruning rules

Implement E-Class Analysis

As described in https://dl.acm.org/doi/pdf/10.1145/3434304
An eclass analysis defines a domain D (a julia type?) and associates a value in D to every eclass in the egraph.
It forms a join semilattice. Check section 4.1 for more implementation details (and where to integrate analysis in e-graph adding, merging and repairing).

An interface AbstractAnalysis can be defined, so that it can be implemented by users by providing the make, join and modify methods as seen in the paper.

It is important to support the syntax already used for dynamic rules in the MatchCore.jl compiler. This implies supporting type assertions on literals in left hand sides and supporting dynamic rules, which compute the substitution dynamically based on analysis' data.

There are some built-in analysis that may be useful to provide: AstSize, computing the size of e-nodes for extraction, and TypeAnalysis, enriching the e-graph with type information. The latter could be needed for supporting type assertions for literals in left hand of rules out of the box.

Some questions: How to allow multiple simultaneous analysis per egraph? Storing an array of AbstractAnalysis and a Dict{AbstractAnalysis, Dict{Int64, Any} for storing the data in the egraph? Doesn't sound too performant, but would do the job. The analysis data could also be stored externally to the e-graph.

Produce EGraph images like egg.

Some resources
https://github.com/Keno/GraphViz.jl
https://epatters.github.io/Catlab.jl/dev/generated/graphics/graphviz_wiring_diagrams/
https://github.com/search?q=Compose.jl&type=Repositories

Extraction from E-Graph

Extract the best fitting expression from an egraph given a cost function.
A cost function k is local when the cost of a enode f(a1,...) can be computed from the function symbol f and the cost of the children enodes.
If a cost function is local, extracting an optimal expression can be done with a fixed-point traversal over the egraph: see https://dl.acm.org/doi/pdf/10.1145/2813885.2737959 .

If the cost function is local, extraction can be formulated as an e-class analysis. The analysis data is a tuple (n,k(n)) where n is the cheapest enode in the eclass and k(n) is its cost. make(n) computes k(n) based on the analysis data of n's children. merge simply takes the tuple with lower cost. Extraction can then proceed recursively.

Tests error on Julia v1.5.3 stable.

Why?

Benchmarks

See for example
https://gist.github.com/MasonProtter/0b8717d0e371b8a6efb1a697555c06a8

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Support Type Assertions for literals in the left hand of rules in the egraph backend.

The MatchCore backend already supports this out of the box:
One can add additional type assertions on the left hand of rules and they will match only if the matched metavariable is of the asserted type. This, together with dynamic rewrite rules makes constant folding very easy when using the MatchCore backend:

t = @theory begin 
      # syntactic rewrite rule
      x + x => 2x
      # number constant folding, conditional+dynamic rule
      a::Number + b::Number |> a+b 
end

To support type assertions in the EGraphs backend, two things might be considered:
Modifying the ematch pattern matcher to do match only if the assertions hold (on literals/constants), OR/AND
implicitly adding an e-class analysis for types (TypeAnalysis, see #1 ) during saturation and computing conditional/dynamic rules involving type assertions based on the type analysis. The latter would probably integrate better with the general egg-like workflow, but may not be needed for simply doing assertions in the left hand.

AC Rules/Loop Prevention for MatchCore rewriting backend

While the egg-like backend fully supports Associative-Commutative (and Distributive) rules (see also #8), the MatchCore backend is still heavily affected by non-terminating (infinite) rewriting loops.
When compiling theories for the MatchCore backend, a few improvements could be done:

Identify (and disallow for MatchCore?) associative, commutative and distributive rules, and other rules known to cause computational loops.
Keep an history of the expressions that have been visited by the classic fixpoint recursive rewriting algorithm during reduction. Halt, warn and return the expression when a loop is detected. (A vector of hash-es for storing history can suffice, but vulnerable to collisions)
Consider a "fuel" mechanism, used for example in GHC. Based on the input expression, at the beginning of reduction each rule is assigned a "fuel" counter. When the left hand of a rule is matched repeatedly, decrement its fuel. If a rule runs out of fuel (e.g. because of a loop due to commutativity or associativity), temporarily remove it from the theory and recompile the pattern matching block, after the expression changes, reintroduce the rule (in which position??) and reset its fuel counter.
Consider designing a smarter algorithm that considers both fuel and reduction history.
(Hardest point) Consider compile time transformations and rule reordering for the MatchCore backend, when a set of rewriting rules can be inferred to be terminating a priori.

Evaluate the Lazy Analysis algorithm.

Efficiency? Stability?

Using the interface to work on Syms (presumably correctly) gives wrong answers

using SymbolicUtils
using Metatheory
using Metatheory.EGraphs
import SymbolicUtils: Symbolic, Sym, operation, arguments, Term

TermInterface.istree(t::Symbolic) = SymbolicUtils.istree(t)
TermInterface.gethead(t::Symbolic) = :call 
TermInterface.gethead(t::Sym) = t
TermInterface.getargs(t::Symbolic) = [operation(t), arguments(t)...]
TermInterface.arity(t::Symbolic) = length(arguments(t))

function unflatten_args(f, args, N=4)
    length(args) < N && return Term{Real}(f, args)
    unflatten_args(f, [Term{Real}(f, group)
                                       for group in Iterators.partition(args, N)], N)
end

function TermInterface.preprocess(t::Symbolic)
    if SymbolicUtils.istree(t)
        f = operation(t)
        if f == (+) || f == (*) # check out for other binary ops TODO
            a = arguments(t)
            if length(a) > 2
                return unflatten_args(+, a, 2)
            end
        end
    end
    return t
end

function EGraphs.instantiateterm(g::EGraph, pat::PatTerm, x::Type{<:Symbolic{T}}, children) where {T}
    @assert pat.head == :call
    return Term{T}(children[1], children[2:end])
end


# Define an extraction method dispatching on MyExpr
function EGraphs.extractnode(n::ENode{<:Symbolic{T}}, extractor::Function) where {T}
    # (foo, bar, baz) = n.metadata
    # extracted arguments
    ret_args = [extractor(a) for a in n.args]
    if n.head == :call 
        return Term{T}(ret_args[1], ret_args[2:end])
    end
end



function costfun(n::ENode, g::EGraph, an)
    if arity(n) == 0
        if n.head == :+
            return 1
        elseif n.head == :-
            return 1
        elseif n.head == :*
            return 3
        elseif n.head == :/
            return 30
        else
            return 1
        end
    end
    if !(n.head == :call)
        return 1000000000
    end
    cost = 0

    for id ∈ n.args
        eclass = geteclass(g, id)
        !hasdata(eclass, an) && (cost += Inf; break)
        cost += last(getdata(eclass, an))
    end
    cost
end


@metatheory_init ()


theory = @methodtheory begin
    a * x == x * a
    a * x + a * y == a*(x+y)
end

function optimize(ex)
    g = EGraph()

    settermtype!(g, Term{symtype(ex), Any})

    ec = addexpr!(g, ex)

    g.root = ec.id

    # (2x) * x => 2 * (x * x) => 2x^2


    params = SaturationParams()
    saturate!(g, theory, params)

    extract!(g, costfun) # --> "term" head args
end

In the REPL:

julia> include("the file above")

julia> using SymbolicUtils

julia> 2(a+b) - a*(a+b)
2a + 2b - (a*(a + b))

julia> optimize(2(a+b)-a*(a+b))
2(a + b) + a - 1 + a + b

Optimize the E-Graph Pattern Matcher

The current pattern matcher is an unefficient version of the pattern matcher in
https://www.hpl.hp.com/techreports/2003/HPL-2003-148.pdf
Adapted from https://github.com/philzook58/EGraphs.jl/
By now, the pattern matcher uses channels as generators.
This architecture should be reconsidered for efficient parallelization.

Another pattern matcher architecture, based on a small virtual machine is
http://leodemoura.github.io/files/ematching.pdf
If this solution is considered, the abstract virtual machine could be implemented as low level as possible.

Possible bug involving turnstile operator

See PR #24 for MWE.

Use a Term-like interface instead of Expr

It would be good if Metatheory worked on types with some kind of common interface. Any type can say -- this is how you destructure this as a term, and this is how to put parts of a new term back into my type, now go and apply some rules on it.

We have an interface like this in SymbolicUtils.jl https://symbolicutils.juliasymbolics.org/interface/ it's basically istree, operation, arguments, similarterm, and other optional items. I plan to add issym so that Sym is not the only thing representing an "unknown".

Further, it would be nice to factor out the interface into an "ExpressionBase" package and use that.