Code Monkey home page Code Monkey logo

easter-egg's People

Contributors

chandrakananandi avatar eytans avatar mwillsey avatar oflatt avatar remysucre avatar waywardmonkeys avatar yycdavid avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

easter-egg's Issues

Gather colored node statistics

We need to understand the overhead (memory and runtime) of each color. We should check how many nodes each color has, how many nodes are shared, and how much size does the colored memo take during rebuild. The last one is important to see if we want to hold a memo for a color continually.
We should also collect data on how many black IDs (size of the sets of union_map) each color has (could help with early stopping).

Improve sorting performance

Currently when we sort parents or nodes we need to clone each node (to be able to use it as a sorting key). We can overcome this by using the "permutations" crate. This will complicate the API a bit, so we should first check how much of an overhead the cloning takes.

Add tests for `ConditionalApplier` with `check_colored`

check_colored now returns for which colors the condition holds. This means we need to "apply" the Applier on a colored substitution. We also might need to update the Subst assignments according to the color. I already did a quick fix, but we need to have tests for this.

Colored Parents are not canonized during rebuild

Because we separated colored_parents edges from parents #5 , during rebuild we do not canonize colored parents and update the memo. To solve this we need to do a special pass over the colored_parents, but because they do not affect the black merges we should do it only after finishing the classic rebuild.
Doing this will require another loop of

parents.sort
parents.dedup_by(<some color and to_union maintenance>)
update to_do list

But here the unions should only be colored. We should verify that it is safe to merge duplicate nodes for all colors where they appeard (and then delete one of the classes from black - which should happen by colored_rebuild at that point)

Color aware pattern matching

Currently the colored pattern implementation (colored_machine) uses a simple modification to the old machine to add branching to each color. This is inefficient when an edge is colored.
Because some edges are colored we can skip them for some colors during matching.
Currently the machine takes all nodes and filters by node color when relevant.
Instead, we need to change how the machine is implemented, pulling "up" the node matching, and then choosing colors by node.

Code duplication - process_unions

In process unions we are repeating the pattern of

todo.drain:
  sort parents
  dedup parents and update metadata
  union everything

This pattern repeats itself a few time during process_unions and colored_process_unions and for memo in some places.
Currently it is implemented for each case separately as the typing of each case differs a bit. We should find a way to generalize the types and dedup closures to a single framework. It will be extra difficult to maintain performance because in process_unions we are usin Vec::dedup_by which is more efficient then rebuilding the vector, but in process_colored_unions we use an iterator.

Colored Parents

I want to separate colored parents from normal parents. This can be useful when implementing other features (such as removing nodes #2 ), and might be used to improve efficiency. This is also good for memory use because we only hold colors for colored parents (which have dense colors).

Because parents are only used in rebuilding, and color rebuild needs to recreate parents each time it is not expensive to implement.

Verify matching jumps colored_brothers on first eclass

I changed the EGraph API to use search, search_eclass, and colored_search_eclass. Each has different assumptions, but in general search should also return colored results. Therefor, it should jump to colored equivalences from the first eclass we are searching in.
Should also check what are the assumptions for colored_search_eclass, and add tests accordingly.

Sparse node colors need to be updated during rebuild

When adding a colored edge or a black edge, and some colored edge needs to be modified (i.e. added a color or blackened) we can't update sparse node colors (because at this point the class.nodes is not ordered). Instead we should keep a lazy dirty_colors list to be used during class rebuild.
This list does not need to contain updates that happened during parent merging because the appropriate edges are also in class.nodes and will be deduped at rebuild_classes.
Note: We need to update the nodes in dirty_colors before applying it.

Single colored edges

Currently we implemented (now in features/multi_colored_splits) the Colored EGraph with edge sharing between different colors. This lead to a lot of overhead in code during the rebuild to manage parents with colors (also compute overhead), or colored_parents and dirty_colors. In both cases we save some memory (which is hard to amount theoretically, best case is save all colored_nodes from all colors except one, worst case is none), but at the cost of compute during add and rebuild.

The really bad thing about this, is how insurmountable the implementation and accompanying data-structure invariants became.
We will now attempt something a bit different. We will allow only up to a single color for each EClass, and if it is colored it will have only one node (this might change in the future).
This will enable a massive simplification of the rebuild process because we will not need to have dirty_colors (which are used to update the set of colors applicable to a single edge). This means we will only have a single loop of parent deduplication (i.e. finding congruence closure unions) for black, and for colored equality relations.
An additional optimization is mentioned in #14 .

TODO: is it allowed to have more then one colored edge in a colored EClass (might need to change invariants accordingly).

To do this the data structure will need to change to support a few different things:

  • Remove dense colors from memo and create a separate colored_memo that can have multiple equal edges (for each color)
  • Remove dirty_colors
  • Add optional color to EClass and filter by it during search and rebuild.
  • Have the colored class canonized to the colored form (also in the colored_memo)
  • Fix colored_process_unions to include only de-duplication by memo and parents ++ colored_parents (no need to canonize colored_parents separately)
  • colored_parents should be keyed by color (should probably use a hashmap).
  • Update colored_add accordingly.
  • Update union and colored_union accordingly.
  • Update classes_by_op to include colored classes and it's usages accordingly
  • When creating a combined color I need to duplicate the colored edges - Also what about colored add?
  • Colored merge should remove colored parents?

This will have the following invariants:

  • Colored EClasss nodes are canonized to said color
  • After rebuild colored nodes are in colored canonized form
  • parents always leads to uncolored classes.
  • colored_parents always leads to colored classes.
  • after rebuild colored_memo key are canonized to colored equality relation
  • after rebuild colored_memo values (id's) are the black representative (the node should sit in that class, and it should be colored).

Discriminant should not be used

In rust, std::mem::discriminant returns a valid value for only 'Enum's. This is ineffective for dynamic languages. All the code relying on the "EClass" being sorted by discriminant should be changed to use a function defined as part of the language (which can be discriminant by default) which will greatly increase runtime for SymbolLang style languages.

Experiment: Egg vs. Z3 (validity checking)

Collect benchmarks from unsat formulas. Translate formulas to terms and "forall" expressions to rewrite rules.

  • Use early stopping
  • Find dataset (SMTLIB)
  • Implement deep case splitting
  • Run colored
  • Run cloned
    • Memory limit
  • Run Z3, CVC5

Blocked by #7.

Early stopping when false=true

We should add a mechanism to detect bad "colors". We can say a color is bad if it has proven something that can not be. Colors are implemented under the assumption that only a few colored unions will happen, but when we add a false assumption it can imply that false = true, and the entire E-Graph can fold into a single E-Class which will be very expensive.

  • Detect vacuity using classes_by_op
  • Create delete_color
  • Implement automatic runner hooks to delete colors
    • Make this optional for experiments
  • Collect to stats how many colors were deleted

Investigate weird case splits in split_and_or

The test case in TheSy splits a non-recursive boolean datatype with the function And, Or, and Not. When looking at the splits it finds many seem unnecessary, so we should take a look and improve the split finders if possible.

Apply is very slow compared to search and rebuild

This is probably due to colored_add's complexity being factorial in the amount of black_ids of it's children nodes.
We should consider letting colored_add add uneeded nodes that will be deleted during rebuild.

Optimize Colored Add

It seems adding and combinatorically looking for black edges is expensive. One proposed solution is to have a separate place for each colored edge. This way can add a colored edge in its colored canonical form, and if it has a black form delete it during rebuild.
We also want to keep a set of colored edges already added. The set is important because we might later delete an edge during rebuild, but add it again in a new iteration of rewriting.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.