E-GRAPH

In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.

Definition and operations

Let $\Sigma$ be a set of uninterpreted functions, where $\Sigma _{n}$ is the subset of $\Sigma$ consisting of functions of arity $n$ . Let $\mathbb {id}$ be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of $f\in \Sigma _{n}$ to e-class IDs $i_{1},i_{2},\ldots ,i_{n}\in \mathbb {id}$ is denoted $f(i_{1},i_{2},\ldots ,i_{n})$ and called an e-node.

The e-graph then represents equivalence classes of e-nodes, using the following data structures:^[1]

- A union-find structure $U$ representing equivalence classes of e-class IDs, with the usual operations $\mathrm {find}$ , $\mathrm {add}$ and $\mathrm {merge}$ . An e-class ID $e$ is canonical if $\mathrm {find} (U,e)=e$ ; an e-node $f(i_{1},\ldots ,i_{n})$ is canonical if each $i_{j}$ is canonical ( $j$ in $1,\ldots ,n$ ).
- An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
  - a hashcons $H$ (i.e. a mapping) from canonical e-nodes to e-class IDs, and
  - an e-class map $M$ that maps e-class IDs to e-classes, such that $M$ maps equivalent IDs to the same set of e-nodes: $\forall i,j\in \mathbb {id} ,M[i]=M[j]\Leftrightarrow \mathrm {find} (U,i)=\mathrm {find} (U,j)$ .
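
The three structures can be sketched in Python as follows. This is a minimal illustration, not the layout of any particular library: e-class IDs are assumed to be integers, an e-node is assumed to be the tuple `(f, i_1, ..., i_n)`, and the names `UnionFind`, `EGraph`, `canonicalize` and `eclass` are hypothetical. The sketch only adds e-nodes; merging e-classes while preserving the invariants is not shown here.

```python
class UnionFind:
    """U: equivalence classes of e-class IDs (plain integers here)."""

    def __init__(self):
        self.parent = []

    def add(self):
        self.parent.append(len(self.parent))   # a fresh ID is its own root
        return len(self.parent) - 1

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]   # path halving
            i = self.parent[i]
        return i

    def merge(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a


class EGraph:
    def __init__(self):
        self.U = UnionFind()
        self.H = {}   # hashcons: canonical e-node -> e-class ID
        self.M = {}   # e-class map: canonical e-class ID -> set of e-nodes

    def is_canonical(self, node):
        # An e-node is the tuple (f, i_1, ..., i_n).
        return all(self.U.find(i) == i for i in node[1:])

    def canonicalize(self, node):
        return (node[0], *(self.U.find(i) for i in node[1:]))

    def eclass(self, i):
        # M[i] for an arbitrary ID: equivalent IDs share one set of e-nodes.
        return self.M[self.U.find(i)]

    def add(self, f, *ids):
        node = self.canonicalize((f, *ids))
        if node not in self.H:            # otherwise reuse the existing e-class
            self.H[node] = self.U.add()
            self.M[self.H[node]] = {node}
        return self.U.find(self.H[node])


g = EGraph()
a = g.add("a")
fa = g.add("f", a)
print(g.add("f", a) == fa)   # the hashcons deduplicates identical e-nodes
print(g.eclass(fa))
```

Storing $M$ under canonical IDs only, and looking it up through $\mathrm {find}$, is one simple way to make equivalent IDs map to the same set of e-nodes.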

Invariants

In addition to the above structure, a valid e-graph conforms to several data structure invariants.^[2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes $f(i_{1},\ldots ,i_{n})$ and $f(j_{1},\ldots ,j_{n})$ are congruent when $\mathrm {find} (U,i_{k})=\mathrm {find} (U,j_{k})$ for every $k\in \{1,\ldots ,n\}$ ; that is, congruent e-nodes must belong to the same e-class. The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.

Operations

E-graphs expose wrappers around the $\mathrm {add}$ , $\mathrm {find}$ , and $\mathrm {merge}$ operations from the union-find that preserve the e-graph invariants. A further operation, e-matching, is described below.
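
One way such wrappers might look is sketched below, under the same assumptions as before (integer IDs, e-nodes as tuples `(f, i_1, ..., i_n)`, hypothetical names). After every `merge`, the sketch restores the congruence and hashcons invariants with a naive fixpoint loop called `rebuild` here; this is chosen for brevity and is far less efficient than the algorithms used in practice.

```python
class EGraph:
    def __init__(self):
        self.parent = []   # union-find U over integer e-class IDs
        self.H = {}        # hashcons: canonical e-node -> e-class ID
        self.M = {}        # e-class map: canonical ID -> set of e-nodes

    def find(self, i):
        while self.parent[i] != i:
            i = self.parent[i]
        return i

    def canon(self, node):
        # An e-node is the tuple (f, i_1, ..., i_n).
        return (node[0], *(self.find(i) for i in node[1:]))

    def add(self, f, *ids):
        node = self.canon((f, *ids))
        if node not in self.H:
            self.parent.append(len(self.parent))
            self.H[node] = len(self.parent) - 1
            self.M[self.H[node]] = {node}
        return self.find(self.H[node])

    def _union(self, a, b):
        # Precondition: a and b are distinct canonical IDs.
        self.parent[b] = a
        self.M[a] |= self.M.pop(b)

    def merge(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self._union(a, b)
            self.rebuild()
        return self.find(a)

    def rebuild(self):
        # Naive fixpoint: re-canonicalize the hashcons until no two congruent
        # e-nodes remain in different e-classes.
        changed = True
        while changed:
            changed = False
            new_H = {}
            for node, cid in self.H.items():
                node, cid = self.canon(node), self.find(cid)
                seen = self.find(new_H.get(node, cid))
                if seen != cid:            # congruent, but in different classes
                    self._union(seen, cid)
                    changed = True
                new_H[node] = seen
            self.H = new_H
        self.M = {i: {self.canon(n) for n in ns} for i, ns in self.M.items()}


g = EGraph()
a, b = g.add("a"), g.add("b")
fa, fb = g.add("f", a), g.add("f", b)
g.merge(a, b)
print(g.find(fa) == g.find(fb))   # congruence: a = b implies f(a) = f(b)
```

Merging `a` and `b` makes `f(a)` and `f(b)` congruent, so `rebuild` places them in one e-class and leaves a single canonical entry for them in the hashcons.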

E-matching

Let $V$ be a set of variables and let $\mathrm {Term} (\Sigma ,V)$ be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, $\mathrm {Term} (\Sigma ,V)$ is the smallest set such that $V\subset \mathrm {Term} (\Sigma ,V)$ , $\Sigma _{0}\subset \mathrm {Term} (\Sigma ,V)$ , and when $x_{1},\ldots ,x_{n}\in \mathrm {Term} (\Sigma ,V)$ and $f\in \Sigma _{n}$ , then $f(x_{1},\ldots ,x_{n})\in \mathrm {Term} (\Sigma ,V)$ . A term containing variables is called a pattern; a term without variables is called ground.
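
The inductive definition of terms can be mirrored by a small recursive data type. The following Python sketch is illustrative only; the class names `Var` and `App` and the helpers `variables` and `is_ground` are hypothetical. A constant is modelled as an application with no arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str                 # an element of V


@dataclass(frozen=True)
class App:
    op: str                   # a function symbol f from Sigma_n
    args: tuple = ()          # n subterms; empty for constants


def variables(t):
    """Set of variable names occurring in term t."""
    if isinstance(t, Var):
        return {t.name}
    return set().union(*(variables(a) for a in t.args))


def is_ground(t):
    return not variables(t)


pattern = App("f", (Var("x"), App("a")))       # f(x, a): contains a variable
ground = App("f", (App("a"), App("b")))        # f(a, b): no variables
print(is_ground(pattern), is_ground(ground))
```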

An e-graph $E$ represents a ground term $t\in \mathrm {Term} (\Sigma ,\emptyset )$ if one of its e-classes represents $t$ . An e-class $C$ represents $t$ if some e-node $f(i_{1},\ldots ,i_{n})\in C$ does. An e-node $f(i_{1},\ldots ,i_{n})\in C$ represents a term $g(j_{1},\ldots ,j_{n})$ if $f=g$ and each e-class $M[i_{k}]$ represents the term $j_{k}$ ( $k$ in $1,\ldots ,n$ ).
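
The recursive definition of representation translates directly into code. The sketch below assumes, for illustration, that the e-class map $M$ is a Python dictionary from canonical integer IDs to sets of e-nodes, that e-nodes are tuples `(f, i_1, ..., i_n)`, and that ground terms are nested tuples such as `("f", ("a",))`; the function names are hypothetical.

```python
def represents(M, cid, term):
    """True if e-class `cid` of e-class map `M` represents the ground term."""
    f, *subterms = term
    return any(
        g == f and len(ids) == len(subterms)
        and all(represents(M, i, t) for i, t in zip(ids, subterms))
        for g, *ids in M[cid]
    )


def egraph_represents(M, term):
    return any(represents(M, cid, term) for cid in M)


# Class 0 holds the constants a and b (so a = b), class 1 holds f(0),
# and class 2 holds c together with g(2), i.e. c = g(c).
M = {0: {("a",), ("b",)}, 1: {("f", 0)}, 2: {("c",), ("g", 2)}}

print(represents(M, 1, ("f", ("b",))))
print(represents(M, 2, ("g", ("g", ("c",)))))
print(egraph_represents(M, ("f", ("c",))))
```

Class 2 refers to itself, so it represents the infinitely many terms $c$, $g(c)$, $g(g(c))$ and so on. The recursion still terminates because it descends into strictly smaller subterms.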

E-matching is an operation that takes a pattern $p\in \mathrm {Term} (\Sigma ,V)$ and an e-graph $E$ , and yields all pairs $(\sigma ,C)$ where $\sigma \subset V\times \mathbb {id}$ is a substitution mapping the variables in $p$ to e-class IDs and $C\in \mathbb {id}$ is an e-class ID such that the term $\sigma (p)$ is represented by $C$ . There are several known algorithms for e-matching;^[3]^[4] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.^[5]
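
A naive backtracking e-matcher is sketched below to make the definition concrete. It is not one of the cited algorithms and makes no attempt at efficiency. As assumptions for illustration, the e-class map $M$ is a dictionary from canonical integer IDs to sets of e-nodes `(f, i_1, ..., i_n)`, pattern variables are strings such as `"?x"`, and applications are tuples; the function names are hypothetical.

```python
def match_in_class(M, pattern, cid, subst):
    """Yield every extension of `subst` under which class `cid` represents `pattern`."""
    if isinstance(pattern, str):               # pattern variable, e.g. "?x"
        if pattern not in subst:
            yield {**subst, pattern: cid}      # bind the variable to this class
        elif subst[pattern] == cid:
            yield subst                        # consistent with earlier binding
        return
    op, *subpatterns = pattern
    for node in M[cid]:
        f, *children = node
        if f != op or len(children) != len(subpatterns):
            continue
        substs = [subst]
        for sub, child in zip(subpatterns, children):
            substs = [s2 for s in substs
                      for s2 in match_in_class(M, sub, child, s)]
        yield from substs


def ematch(M, pattern):
    """All pairs (substitution, e-class ID) matching `pattern`."""
    return [(s, cid) for cid in M for s in match_in_class(M, pattern, cid, {})]


# Classes: 0 = {a}, 1 = {b}, 2 = {f(0,1), f(1,0)}, 3 = {f(0,0)}
M = {0: {("a",)}, 1: {("b",)}, 2: {("f", 0, 1), ("f", 1, 0)}, 3: {("f", 0, 0)}}

print(ematch(M, ("f", "?x", "?x")))   # only f(0,0) has two equal arguments
```

A variable that occurs twice, as in `f(?x, ?x)`, must be bound to the same e-class ID at both positions, which is why class 2 yields no match for that pattern.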

Complexity

An e-graph with $n$ equalities can be constructed in $O(n\log n)$ time.^[6]

Equality saturation

Equality saturation is a technique for building optimizing compilers using e-graphs.^[7] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
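
The loop described above can be sketched end to end in Python. Everything here is a simplified illustration under stated assumptions: integer e-class IDs, e-nodes as tuples, pattern variables as strings such as `"?x"`, a naive matcher and a naive `rebuild`, an iteration limit as the only halting condition besides saturation, and AST size as the cost function. The three rewrite rules are hypothetical examples (the rule `x / x -> 1` ignores division by zero), and all names are made up for this sketch.

```python
class EGraph:
    def __init__(self):
        self.parent, self.H = [], {}   # union-find; hashcons e-node -> ID

    def find(self, i):
        while self.parent[i] != i:
            i = self.parent[i]
        return i

    def add(self, f, *ids):
        node = (f, *map(self.find, ids))
        if node not in self.H:
            self.parent.append(len(self.parent))
            self.H[node] = len(self.parent) - 1
        return self.find(self.H[node])

    def add_term(self, t, subst=None):
        if isinstance(t, str):                       # pattern variable
            return subst[t]
        return self.add(t[0], *[self.add_term(a, subst) for a in t[1:]])

    def merge(self, a, b):                           # True if classes changed
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a != b

    def rebuild(self):                               # naive congruence closure
        changed = True
        while changed:
            changed, old, self.H = False, self.H, {}
            for node, cid in old.items():
                node = (node[0], *map(self.find, node[1:]))
                changed |= self.merge(self.H.setdefault(node, cid), cid)

    def classes(self):                               # e-class map M
        M = {}
        for node, cid in self.H.items():
            M.setdefault(self.find(cid), set()).add(node)
        return M


def match(M, pat, cid, subst):
    if isinstance(pat, str):
        if subst.get(pat, cid) == cid:
            yield {**subst, pat: cid}
        return
    for node in M[cid]:
        if node[0] == pat[0] and len(node) == len(pat):
            substs = [subst]
            for sub, child in zip(pat[1:], node[1:]):
                substs = [s2 for s in substs for s2 in match(M, sub, child, s)]
            yield from substs


def saturate(g, rules, max_iters=10):
    for _ in range(max_iters):                       # iteration limit
        M = g.classes()
        found = [(rhs, s, cid) for lhs, rhs in rules
                 for cid in M for s in match(M, lhs, cid, {})]
        changed = False
        for rhs, s, cid in found:
            changed |= g.merge(g.add_term(rhs, s), cid)
        g.rebuild()
        if not changed:                              # saturated
            break


def extract(g, root):
    # Smallest term per class, by fixpoint iteration over AST size.
    M, best, changed = g.classes(), {}, True
    while changed:
        changed = False
        for cid, nodes in M.items():
            for f, *ids in nodes:
                if all(i in best for i in ids):
                    cost = 1 + sum(best[i][0] for i in ids)
                    if cid not in best or cost < best[cid][0]:
                        best[cid] = (cost, (f, *(best[i][1] for i in ids)))
                        changed = True
    return best[g.find(root)][1]


rules = [
    (("/", ("*", "?x", "?y"), "?z"), ("*", "?x", ("/", "?y", "?z"))),
    (("/", "?x", "?x"), ("1",)),
    (("*", "?x", ("1",)), "?x"),
]
g = EGraph()
root = g.add_term(("/", ("*", ("a",), ("2",)), ("2",)))   # (a * 2) / 2
saturate(g, rules)
print(extract(g, root))   # ('a',)
```

No single rule rewrites `(a * 2) / 2` to `a`. The e-graph keeps every intermediate form, so the three rules compose across iterations and extraction then picks the smallest equivalent term.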

Applications

E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3^[8] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers.^[9] In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates.^[10] E-graphs are also used in the Simplify theorem prover of ESC/Java.^[11]

Equality saturation is used in specialized optimizing compilers,^[12] e.g. for deep learning^[13] and linear algebra.^[14] Equality saturation has also been used for translation validation applied to the LLVM toolchain.^[15]

E-graphs have been applied to several problems in program analysis, including fuzzing,^[16] abstract interpretation,^[17] and library learning.^[18]

References

[1] (Willsey et al. 2021)
[2] (Willsey et al. 2021)
[3] (de Moura & Bjørner 2007)
[4] Moskal, Michał; Łopuszański, Jakub; Kiniry, Joseph R. (2008-05-06). "E-matching for Fun and Profit". Electronic Notes in Theoretical Computer Science. Proceedings of the 5th International Workshop on Satisfiability Modulo Theories (SMT 2007). 198 (2): 19–35. doi: 10.1016/j.entcs.2008.04.078. ISSN 1571-0661.
[5] Zhang, Yihong; Wang, Yisu Remy; Willsey, Max; Tatlock, Zachary (2022-01-12). "Relational e-matching". Proceedings of the ACM on Programming Languages. 6 (POPL): 35:1–35:22. doi: 10.1145/3498696. S2CID 236924583.
[6] (Flatt et al. 2022, p. 2)
[7] (Tate et al. 2009)
[8] de Moura, Leonardo; Bjørner, Nikolaj (2008). "Z3: An Efficient SMT Solver". In Ramakrishnan, C. R.; Rehof, Jakob (eds.). Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science. Vol. 4963. Berlin, Heidelberg: Springer. pp. 337–340. doi: 10.1007/978-3-540-78800-3_24. ISBN 978-3-540-78800-3.
[9] Rümmer, Philipp (2012). "E-Matching with Free Variables". In Bjørner, Nikolaj; Voronkov, Andrei (eds.). Logic for Programming, Artificial Intelligence, and Reasoning. Proceedings. 18th International Conference, LPAR-18, Merida, Venezuela, March 11–15, 2012. Lecture Notes in Computer Science. Vol. 7180. Berlin, Heidelberg: Springer. pp. 359–374. doi: 10.1007/978-3-642-28717-6_28. ISBN 978-3-642-28717-6.
[10] (Flatt et al. 2022, p. 2)
[11] Detlefs, David; Nelson, Greg; Saxe, James B. (May 2005). "Simplify: a theorem prover for program checking". Journal of the ACM. 52 (3): 365–473. doi: 10.1145/1066100.1066102. ISSN 0004-5411. S2CID 9613854.
[12] Joshi, Rajeev; Nelson, Greg; Randall, Keith (2002-05-17). "Denali: a goal-directed superoptimizer". ACM SIGPLAN Notices. 37 (5): 304–314. doi: 10.1145/543552.512566. ISSN 0362-1340.
[13] Yang, Yichen; Phothilimtha, Phitchaya Mangpo; Wang, Yisu Remy; Willsey, Max; Roy, Sudip; Pienaar, Jacques (2021-03-17). "Equality Saturation for Tensor Graph Superoptimization". arXiv: 2101.01332 [cs.AI].
[14] Wang, Yisu Remy; Hutchison, Shana; Leang, Jonathan; Howe, Bill; Suciu, Dan (2020-12-22). "SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra". arXiv: 2002.07951 [cs.DB].
[15] Stepp, Michael; Tate, Ross; Lerner, Sorin (2011). "Equality-Based Translation Validator for LLVM". In Gopalakrishnan, Ganesh; Qadeer, Shaz (eds.). Computer Aided Verification. Lecture Notes in Computer Science. Vol. 6806. Berlin, Heidelberg: Springer. pp. 737–742. doi: 10.1007/978-3-642-22110-1_59. ISBN 978-3-642-22110-1.
[16] "Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022". pldi22.sigplan.org. Retrieved 2023-02-03.
[17] Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-03-17). "Abstract Interpretation on E-Graphs". arXiv: 2203.09191 [cs.LO].
Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-05-30). "Combining E-Graphs with Abstract Interpretation". arXiv: 2205.14989 [cs.DS].
[18] Cao, David; Kunkel, Rose; Nandi, Chandrakana; Willsey, Max; Tatlock, Zachary; Polikarpova, Nadia (2023-01-09). "babble: Learning Better Abstractions with E-Graphs and Anti-Unification". Proceedings of the ACM on Programming Languages. 7 (POPL): 396–424. arXiv: 2212.04596. doi: 10.1145/3571207. ISSN 2475-1421. S2CID 254536022.

de Moura, Leonardo; Bjørner, Nikolaj (2007). "Efficient E-Matching for SMT Solvers". In Pfenning, Frank (ed.). Automated Deduction – CADE-21. Lecture Notes in Computer Science. Vol. 4603. Berlin, Heidelberg: Springer. pp. 183–198. doi: 10.1007/978-3-540-73595-3_13. ISBN 978-3-540-73595-3.
Willsey, Max; Nandi, Chandrakana; Wang, Yisu Remy; Flatt, Oliver; Tatlock, Zachary; Panchekha, Pavel (2021-01-04). "egg: Fast and extensible equality saturation". Proceedings of the ACM on Programming Languages. 5 (POPL): 23:1–23:29. arXiv: 2004.03082. doi: 10.1145/3434304. S2CID 226282597.
Tate, Ross; Stepp, Michael; Tatlock, Zachary; Lerner, Sorin (2009-01-21). "Equality saturation". Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '09. Savannah, GA, USA: Association for Computing Machinery. pp. 264–276. doi: 10.1145/1480881.1480915. ISBN 978-1-60558-379-2. S2CID 2138086.
Flatt, Oliver; Coward, Samuel; Willsey, Max; Tatlock, Zachary; Panchekha, Pavel (October 2022). "Small Proofs from Congruence Closure". In A. Griggio; N. Rungta (eds.). Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022. TU Wien Academic Press. pp. 75–83. doi: 10.34727/2022/isbn.978-3-85448-053-2_13. ISBN 978-3-85448-053-2. S2CID 252118847.
