In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.
Let be a set of uninterpreted functions, where is the subset of consisting of functions of arity . Let be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of to e-class IDs is denoted and called an e-node.
The e-graph then represents equivalence classes of e-nodes, using the following data structures: [1]
In addition to the above structure, a valid e-graph conforms to several data structure invariants. [2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes are congruent when . The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.
![]() | This section needs expansion. You can help by
adding to it. (June 2021) |
E-graphs expose wrappers around the , , and operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.
Let be a set of variables and let be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, is the smallest set such that , , and when and , then . A term containing variables is called a pattern, a term without variables is called ground.
An e-graph represents a ground term if one of its e-classes represents . An e-class represents if some e-node does. An e-node represents a term if and each e-class represents the term ( in ).
e-matching is an operation that takes a pattern and an e-graph , and yields all pairs where is a substitution mapping the variables in to e-class IDs and is an e-class ID such that each term is represented by . There are several known algorithms for e-matching, [3] [4] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal. [5]
![]() | This section needs expansion. You can help by
adding to it. (June 2021) |
Equality saturation is a technique for building optimizing compilers using e-graphs. [7] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3 [8] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers. [9] In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates. [10] E-graphs are also used in the Simplify theorem prover of ESC/Java. [11]
Equality saturation is used in specialized optimizing compilers, [12] e.g. for deep learning [13] and linear algebra. [14] Equality saturation has also been used for translation validation applied to the LLVM toolchain. [15]
E-graphs have been applied to several problems in program analysis, including fuzzing, [16] abstract interpretation, [17] and library learning. [18]
In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.
Let be a set of uninterpreted functions, where is the subset of consisting of functions of arity . Let be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of to e-class IDs is denoted and called an e-node.
The e-graph then represents equivalence classes of e-nodes, using the following data structures: [1]
In addition to the above structure, a valid e-graph conforms to several data structure invariants. [2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes are congruent when . The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.
![]() | This section needs expansion. You can help by
adding to it. (June 2021) |
E-graphs expose wrappers around the , , and operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.
Let be a set of variables and let be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, is the smallest set such that , , and when and , then . A term containing variables is called a pattern, a term without variables is called ground.
An e-graph represents a ground term if one of its e-classes represents . An e-class represents if some e-node does. An e-node represents a term if and each e-class represents the term ( in ).
e-matching is an operation that takes a pattern and an e-graph , and yields all pairs where is a substitution mapping the variables in to e-class IDs and is an e-class ID such that each term is represented by . There are several known algorithms for e-matching, [3] [4] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal. [5]
![]() | This section needs expansion. You can help by
adding to it. (June 2021) |
Equality saturation is a technique for building optimizing compilers using e-graphs. [7] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3 [8] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers. [9] In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates. [10] E-graphs are also used in the Simplify theorem prover of ESC/Java. [11]
Equality saturation is used in specialized optimizing compilers, [12] e.g. for deep learning [13] and linear algebra. [14] Equality saturation has also been used for translation validation applied to the LLVM toolchain. [15]
E-graphs have been applied to several problems in program analysis, including fuzzing, [16] abstract interpretation, [17] and library learning. [18]