The easter-egg from eytans

Colors should remove unneeded colored edges during rebuild

Try using the colored memo built during congruence_closure for each color?

Gather colored node statistics

We need to understand the overhead (memory and runtime) of each color. We should check how many nodes each color has, how many nodes are shared, and how much size does the colored memo take during rebuild. The last one is important to see if we want to hold a memo for a color continually.
We should also collect data on how many black IDs (size of the sets of union_map) each color has (could help with early stopping).

Improve sorting performance

Currently when we sort parents or nodes we need to clone each node (to be able to use it as a sorting key). We can overcome this by using the "permutations" crate. This will complicate the API a bit, so we should first check how much of an overhead the cloning takes.

Add tests for `FilteringSearcher` with colored matches

The matches should be where they can pass the predicate with an additional color, but can not with a normal match

Add tests for `ConditionalApplier` with `check_colored`

check_colored now returns for which colors the condition holds. This means we need to "apply" the Applier on a colored substitution. We also might need to update the Subst assignments according to the color. I already did a quick fix, but we need to have tests for this.

Colored Parents are not canonized during rebuild

Because we separated colored_parents edges from parents #5 , during rebuild we do not canonize colored parents and update the memo. To solve this we need to do a special pass over the colored_parents, but because they do not affect the black merges we should do it only after finishing the classic rebuild.
Doing this will require another loop of

parents.sort
parents.dedup_by(<some color and to_union maintenance>)
update to_do list

But here the unions should only be colored. We should verify that it is safe to merge duplicate nodes for all colors where they appeard (and then delete one of the classes from black - which should happen by colored_rebuild at that point)

Color aware pattern matching

Currently the colored pattern implementation (colored_machine) uses a simple modification to the old machine to add branching to each color. This is inefficient when an edge is colored.
Because some edges are colored we can skip them for some colors during matching.
Currently the machine takes all nodes and filters by node color when relevant.
Instead, we need to change how the machine is implemented, pulling "up" the node matching, and then choosing colors by node.

Code duplication - process_unions

In process unions we are repeating the pattern of

todo.drain:
  sort parents
  dedup parents and update metadata
  union everything

This pattern repeats itself a few time during process_unions and colored_process_unions and for memo in some places.
Currently it is implemented for each case separately as the typing of each case differs a bit. We should find a way to generalize the types and dedup closures to a single framework. It will be extra difficult to maintain performance because in process_unions we are usin Vec::dedup_by which is more efficient then rebuilding the vector, but in process_colored_unions we use an iterator.

Colored Parents

I want to separate colored parents from normal parents. This can be useful when implementing other features (such as removing nodes #2 ), and might be used to improve efficiency. This is also good for memory use because we only hold colors for colored parents (which have dense colors).

Because parents are only used in rebuilding, and color rebuild needs to recreate parents each time it is not expensive to implement.

Verify matching jumps colored_brothers on first eclass

I changed the EGraph API to use search, search_eclass, and colored_search_eclass. Each has different assumptions, but in general search should also return colored results. Therefor, it should jump to colored equivalences from the first eclass we are searching in.
Should also check what are the assumptions for colored_search_eclass, and add tests accordingly.

Add tests for `check_colored_imm`

Sparse node colors need to be updated during rebuild

When adding a colored edge or a black edge, and some colored edge needs to be modified (i.e. added a color or blackened) we can't update sparse node colors (because at this point the class.nodes is not ordered). Instead we should keep a lazy dirty_colors list to be used during class rebuild.
This list does not need to contain updates that happened during parent merging because the appropriate edges are also in class.nodes and will be deduped at rebuild_classes.
Note: We need to update the nodes in dirty_colors before applying it.

Single colored edges

Currently we implemented (now in features/multi_colored_splits) the Colored EGraph with edge sharing between different colors. This lead to a lot of overhead in code during the rebuild to manage parents with colors (also compute overhead), or colored_parents and dirty_colors. In both cases we save some memory (which is hard to amount theoretically, best case is save all colored_nodes from all colors except one, worst case is none), but at the cost of compute during add and rebuild.

The really bad thing about this, is how insurmountable the implementation and accompanying data-structure invariants became.
We will now attempt something a bit different. We will allow only up to a single color for each EClass, and if it is colored it will have only one node (this might change in the future).
This will enable a massive simplification of the rebuild process because we will not need to have dirty_colors (which are used to update the set of colors applicable to a single edge). This means we will only have a single loop of parent deduplication (i.e. finding congruence closure unions) for black, and for colored equality relations.
An additional optimization is mentioned in #14 .

TODO: is it allowed to have more then one colored edge in a colored EClass (might need to change invariants accordingly).

To do this the data structure will need to change to support a few different things:

This will have the following invariants:

Colored EClasss nodes are canonized to said color
After rebuild colored nodes are in colored canonized form
parents always leads to uncolored classes.
colored_parents always leads to colored classes.
after rebuild colored_memo key are canonized to colored equality relation
after rebuild colored_memo values (id's) are the black representative (the node should sit in that class, and it should be colored).

Experiment: Conditional Optimizations

Extraction with colors
Colored conditions
Find dataset of programs to run optimizations on (consider Herbie)
Find metric

Discriminant should not be used

In rust, std::mem::discriminant returns a valid value for only 'Enum's. This is ineffective for dynamic languages. All the code relying on the "EClass" being sorted by discriminant should be changed to use a function defined as part of the language (which can be discriminant by default) which will greatly increase runtime for SymbolLang style languages.

Experiment: Egg vs. Z3 (validity checking)

Collect benchmarks from unsat formulas. Translate formulas to terms and "forall" expressions to rewrite rules.

Blocked by #7.

Early stopping when false=true

We should add a mechanism to detect bad "colors". We can say a color is bad if it has proven something that can not be. Colors are implemented under the assumption that only a few colored unions will happen, but when we add a false assumption it can imply that false = true, and the entire E-Graph can fold into a single E-Class which will be very expensive.

Detect vacuity using classes_by_op
Create delete_color
Implement automatic runner hooks to delete colors
- Make this optional for experiments
Collect to stats how many colors were deleted

Investigate weird case splits in split_and_or

The test case in TheSy splits a non-recursive boolean datatype with the function And, Or, and Not. When looking at the splits it finds many seem unnecessary, so we should take a look and improve the split finders if possible.

Apply is very slow compared to search and rebuild

This is probably due to colored_add's complexity being factorial in the amount of black_ids of it's children nodes.
We should consider letting colored_add add uneeded nodes that will be deleted during rebuild.

Redo reconstructing to be fast (and cache all classes)

Optimize Colored Add

It seems adding and combinatorically looking for black edges is expensive. One proposed solution is to have a separate place for each colored edge. This way can add a colored edge in its colored canonical form, and if it has a black form delete it during rebuild.
We also want to keep a set of colored edges already added. The set is important because we might later delete an edge during rebuild, but add it again in a new iteration of rewriting.

eytans / easter-egg Goto Github PK

easter-egg's People

Contributors

Stargazers

easter-egg's Issues

Recommend Projects

Recommend Topics

Recommend Org