egraphs-good / extraction-gym Goto Github PK
View Code? Open in Web Editor NEWbenchmarking e-graph extraction
License: MIT License
benchmarking e-graph extraction
License: MIT License
Thank you for making this repo open-source. It is a really neat and helpful tool for egraph extraction research.
While executing the code, I've noticed an inconsistency in the results produced by different extraction methods. The cost outcomes derived from the ilp-cbc method consistently surpass those from both the greedy-dag and faster-greedy-dag methods. Theoretically, given that ILP aims for the optimal cost, it should always yield a result less than or equal to the greedy methods.
Specifically, here are some examples:
tensat/nasrnn_acyclic.json
{
"name": "data/tensat/nasrnn_acyclic.json",
"extractor": "greedy-dag",
"tree": 1021.1320661730861,
"dag": 1.0898100023314328,
"micros": 3009220
}
{
"name": "data/tensat/nasrnn_acyclic.json",
"extractor": "ilp-cbc",
"tree": 1697.1535464829212,
"dag": 1.5102849980721658,
"micros": 6817584
}
data/babble/list_list_hard_test_ellisk_2019-02-15T11.26.41--bench010_it15.json
{
"name": "data/babble/list_list_hard_test_ellisk_2019-02-15T11.26.41--bench010_it15.json",
"extractor": "greedy-dag",
"tree": 915,
"dag": 425,
"micros": 13953
}
{
"name": "data/babble/list_list_hard_test_ellisk_2019-02-15T11.26.41--bench010_it15.json",
"extractor": "ilp-cbc",
"tree": 1862,
"dag": 813,
"micros": 235790
}
Any insights or clarifications you can offer would be greatly appreciated.
Does anyone know why faster-greedy-dag is less good than greedy-dag?
cumulative dag cost for faster-greedy-dag: 77030
cumulative dag cost for greedy-dag: 76907
Using the tests in #26
target/release/extraction-gym data/fuzz/8.json --extractor=faster-greedy-dag --out=output/fuzz/8.json-faster-greedy-dag.json
Get this error:
thread 'main' panicked at 'assertion failed: self.find_cycles(&egraph, &egraph.root_eclasses).is_empty()', src/extract/mod.rs:79:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
With the extra checking in #26, global-greedy-dag now fails because it selects nodes from the wrong class:
target/release/extraction-gym data/eggcc-bril/duplicate_branch.bril.json --extractor=global-greedy-dag --out=output/eggcc-bril/duplicate_branch.bril.json-global-greedy-dag.json
iteration 1
iteration 2
iteration 3
iteration 4
iteration 5
iteration 6
thread 'main' panicked at 'assertion failed: node.eclass == *cid', src/extract/mod.rs:84:13
Previously this wasn't detected. I realised when global-greedy-dag has some extractions with a lower dag-cost than the optimal.
If I understand correctly, the egraphs end with acyclic
in tensat dataset are egraphs that are pruned ahead of time. However, I found there are cycles existing in some of these acyclic egraphs.
For example, in bert_acyclic.json
, one can easily verify that enodes 553__1
, 475__1
, 7680__0
form a cycle. And there the number of cycles in some acyclic tensat egraphs is not small. According to my analysis, here is the number of cycles for each of the tensat acyclic egraphs:
I am wondering if this is because my understanding is not correct or any other potential bugs?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.