This article has multiple issues. Please help
improve it or discuss these issues on the
talk page. (
Learn how and when to remove these template messages)
|
Original author(s) | |
---|---|
Initial release | March 11, 2010[1] |
Stable release | 2021-04-01
/ April 1, 2021[2]
|
Repository | |
Written in | C++ |
Operating system | Cross-platform |
Type | Pattern matching library |
License | BSD |
Website |
github |
RE2 is a software library for regular expressions via a finite-state machine using automata theory, in contrast to almost all other regular expression libraries, which use backtracking implementations. It provides a C++ interface.
RE2 was implemented and used by Google. The library uses an "on-the-fly" deterministic finite-state automaton algorithm based on Ken Thompson's Plan 9 grep. [3]
RE2 generally compares to
Perl Compatible Regular Expressions (PCRE) in performance. For certain regular expression operators like |
(
logical disjunction or
boolean "or") it exceeds PCRE. On the other hand, RE2 does not support back-references because the Thompson DFA
[3] algorithm cannot implement those efficiently. It is also slightly slower than PCRE for parenthetic capturing operations.
PCRE can use a large recursive stack with corresponding high memory use and have exponential runtime on certain patterns. In contrast, RE2 uses a fixed stack and guarantees that run-time increases linearly (not exponentially) with the size of the input. The maximum memory allocated with RE2 is configurable.
RE2 has a slightly smaller set of features than PCRE, but has very predictable run-time and a maximum memory allotment. This makes it suitable for use in server applications, which require boundaries on memory usage and computational time. PCRE, on the other hand, has almost all of the features that a regular expression library can have, but has unpredictable run-time and memory usage and can grow unbounded.
RE2 is, for example, used by Google products like; Gmail, Google Documents and Google Sheets. [4] See GitHub for a documentation of the syntax: RE2 syntax.
In Google Sheets, it is used in the functions RegexMatch(), RegexReplace(), RegexExtract() and the find and replace feature. RegexExtract(), does not use grouping.
The RE2 algorithm has been rewritten in Rust as the package "regex". CloudFlare's web application firewall uses this package because the RE2 algorithm is immune to ReDoS. [5]
Russ Cox also wrote RE1, an earlier regular expression based on a bytecode interpreter. [6] OpenResty uses a RE1 fork called "sregex". [7]
This article has multiple issues. Please help
improve it or discuss these issues on the
talk page. (
Learn how and when to remove these template messages)
|
Original author(s) | |
---|---|
Initial release | March 11, 2010[1] |
Stable release | 2021-04-01
/ April 1, 2021[2]
|
Repository | |
Written in | C++ |
Operating system | Cross-platform |
Type | Pattern matching library |
License | BSD |
Website |
github |
RE2 is a software library for regular expressions via a finite-state machine using automata theory, in contrast to almost all other regular expression libraries, which use backtracking implementations. It provides a C++ interface.
RE2 was implemented and used by Google. The library uses an "on-the-fly" deterministic finite-state automaton algorithm based on Ken Thompson's Plan 9 grep. [3]
RE2 generally compares to
Perl Compatible Regular Expressions (PCRE) in performance. For certain regular expression operators like |
(
logical disjunction or
boolean "or") it exceeds PCRE. On the other hand, RE2 does not support back-references because the Thompson DFA
[3] algorithm cannot implement those efficiently. It is also slightly slower than PCRE for parenthetic capturing operations.
PCRE can use a large recursive stack with corresponding high memory use and have exponential runtime on certain patterns. In contrast, RE2 uses a fixed stack and guarantees that run-time increases linearly (not exponentially) with the size of the input. The maximum memory allocated with RE2 is configurable.
RE2 has a slightly smaller set of features than PCRE, but has very predictable run-time and a maximum memory allotment. This makes it suitable for use in server applications, which require boundaries on memory usage and computational time. PCRE, on the other hand, has almost all of the features that a regular expression library can have, but has unpredictable run-time and memory usage and can grow unbounded.
RE2 is, for example, used by Google products like; Gmail, Google Documents and Google Sheets. [4] See GitHub for a documentation of the syntax: RE2 syntax.
In Google Sheets, it is used in the functions RegexMatch(), RegexReplace(), RegexExtract() and the find and replace feature. RegexExtract(), does not use grouping.
The RE2 algorithm has been rewritten in Rust as the package "regex". CloudFlare's web application firewall uses this package because the RE2 algorithm is immune to ReDoS. [5]
Russ Cox also wrote RE1, an earlier regular expression based on a bytecode interpreter. [6] OpenResty uses a RE1 fork called "sregex". [7]