real-regex 2026.6.0__tar.gz
- real_regex-2026.6.0/LICENSE +21 -0
- real_regex-2026.6.0/MANIFEST.in +7 -0
- real_regex-2026.6.0/PKG-INFO +206 -0
- real_regex-2026.6.0/README.md +182 -0
- real_regex-2026.6.0/include/real/ast.hpp +853 -0
- real_regex-2026.6.0/include/real/charclass.hpp +182 -0
- real_regex-2026.6.0/include/real/compiler.hpp +455 -0
- real_regex-2026.6.0/include/real/config.hpp +37 -0
- real_regex-2026.6.0/include/real/pike.hpp +638 -0
- real_regex-2026.6.0/include/real/prefilter.hpp +220 -0
- real_regex-2026.6.0/include/real/program.hpp +232 -0
- real_regex-2026.6.0/include/real/real.hpp +670 -0
- real_regex-2026.6.0/include/real/storage.hpp +556 -0
- real_regex-2026.6.0/include/real/utf8.hpp +54 -0
- real_regex-2026.6.0/pyproject.toml +52 -0
- real_regex-2026.6.0/python/real/__init__.py +96 -0
- real_regex-2026.6.0/python/real_regex.egg-info/PKG-INFO +206 -0
- real_regex-2026.6.0/python/real_regex.egg-info/SOURCES.txt +24 -0
- real_regex-2026.6.0/python/real_regex.egg-info/dependency_links.txt +1 -0
- real_regex-2026.6.0/python/real_regex.egg-info/top_level.txt +1 -0
- real_regex-2026.6.0/python/src/_real.cpp +998 -0
- real_regex-2026.6.0/python/tests/test_differential_fuzz.py +200 -0
- real_regex-2026.6.0/python/tests/test_parity.py +125 -0
- real_regex-2026.6.0/python/tests/test_real.py +180 -0
- real_regex-2026.6.0/setup.cfg +4 -0
- real_regex-2026.6.0/setup.py +45 -0
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 René Chenard
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,7 @@
+# Make the sdist self-contained: carry the C++ headers the binding compiles
+# against and the binding source, plus the project docs.
+graft include
+recursive-include python/src *.cpp
+recursive-include python/tests *.py
+include python/README.md
+include README.md LICENSE ARCHITECTURE.md
@@ -0,0 +1,206 @@
+Metadata-Version: 2.4
+Name: real-regex
+Version: 2026.6.0
+Summary: REAL — linear-time (ReDoS-safe) regex engine with an re-compatible API
+Author-email: René Chenard <rene.chenard.1@ulaval.ca>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/RECHE23/real-regex
+Project-URL: Repository, https://github.com/RECHE23/real-regex
+Project-URL: Documentation, https://reche23.github.io/real-regex/
+Project-URL: Issues, https://github.com/RECHE23/real-regex/issues
+Keywords: regex,regular-expression,redos,linear-time,re
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Programming Language :: C++
+Classifier: Topic :: Text Processing
+Classifier: Topic :: Software Development :: Libraries
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Dynamic: license-file
+
+# REAL
+
+**Regular Expression Algorithmic Library** — a header-only C++20 regex engine,
+constexpr from end to end, with an `re`-compatible Python binding.
+
+- **Linear time, always.** The engine is a Pike VM (Thompson NFA simulation):
+  no backtracking, ReDoS-safe by construction.
+- **Constexpr-friendly.** Patterns known at compile time are parsed, compiled
+  and matched at compile time.
+- **Minimal memory.** Static (sizes fixed at compile time, zero allocation),
+  dynamic (storage sized exactly once at pattern compilation), or hybrid
+  (compile-time pattern, runtime text, zero heap allocation).
+- **Zero dependencies.** One include.
+
+Unsupported syntax is rejected with `real::regex_error` rather than silently
+diverging. Deferred (and rejected): lookarounds, backreferences,
+atomic/possessive groups, Unicode property classes, Unicode case folding,
+`re.X`, `pos`/`endpos`. The planned next step is a lazy DFA for the
+dense-candidate cases where `re` is still ahead.
+
+Over the benchmark suite (`make bench-python`), REAL is **1.98x faster than
+Python's `re`** at the geometric mean, with identical outputs; the `(a+)+b`
+ReDoS case completes in microseconds where `re` takes over a second.
+
+## Supported syntax
+
+| Syntax | Meaning |
+|---|---|
+| `abc` | literal bytes (UTF-8 patterns match their UTF-8 bytes) |
+| `\.` `\*` `\\` … | escaped metacharacter, matched literally |
+| `.` | any codepoint except `\n` |
+| `[abc]` `[a-z]` `[^abc]` | character class (members must be ASCII); `[^…]` matches any codepoint outside the set |
+| `\d \D \w \W \s \S` | digit / word / space classes (ASCII sets, like Python's `re.ASCII`) |
+| `\n \t \r \f \v \a \0` `\xHH` | control and hex escapes |
+| `x*` `x+` `x?` | quantifiers (greedy; append `?` for lazy) |
+| `x{n}` `x{n,}` `x{,m}` `x{n,m}` | counted repetition (greedy or lazy; counts capped at 1000) |
+| `a\|b` | alternation, leftmost branch preferred |
+| `(…)` `(?:…)` | capturing / non-capturing group |
+| `(?P<name>…)` `(?<name>…)` | named capturing group (Python and .NET styles) |
+| `^` `$` | line/text anchors (Python semantics: `$` also matches before a final `\n`) |
+| `\A` `\Z` | strict text start / end |
+| `\b` `\B` | word boundary / non-boundary (ASCII word characters) |
+| `(?ims)` prefix | global flags: `i` case-insensitive (ASCII), `m` multiline, `s` dotall — also `real::flags` on the constructor |
+
+**Unicode model:** matching is UTF-8 byte-based, but every construct consumes
+whole codepoints (multi-byte sequences compile to byte-level alternatives), so
+match boundaries never split a character. Class members and the `\d \w \s`
+sets are ASCII by design; `[^…]`, `\D \W \S` and `.` do match non-ASCII
+codepoints.
+
+**Divergence from Python:** when a *nullable* loop body ends with an empty
+iteration — e.g. `(a*)*` on `"aa"` — Python captures that final empty
+iteration (`''`); REAL, like Perl/PCRE, keeps the last non-empty one (`"aa"`).
+Group 0 is identical either way.
+
+## C++ API
+
+```cpp
+#include <real/real.hpp>
+
+real::regex rx("hello"); // runtime pattern, storage sized exactly once
+rx.match("hello world"); // anchored at the start (Python re.match)
+rx.fullmatch("hello"); // whole text (Python re.fullmatch)
+rx.search("say hello"); // leftmost match anywhere (Python re.search)
+```
+
+`match`/`fullmatch`/`search` return a `real::match_result`: `matched()`,
+`operator bool`, `start(g)`, `end(g)`, `m[g]` (a `std::string_view` into the
+searched text, which must outlive the result), and the same accessors by group
+name (`m["year"]`, `group_index`).
+
+```cpp
+for (auto& m : rx.find_iter(text)) { … } // lazy, Python finditer rules
+rx.find_all(text); // eager vector<match_result>
+rx.replace(text, "$2:$1"); // $&, $1…, ${name}, $$ — re.sub
+rx.replace(text, "#", 2); // count limit
+rx.split(text); // Python re.split, with groups
+```
+
+Empty matches follow Python's rules: they are yielded (even right after a
+non-empty match) and the scan then advances one whole codepoint.
+`find_iter`/`find_all` cannot be called on a temporary regex, and
+`match`/`search`/`split` cannot take a temporary `std::string`.
+
+### Three memory modes
+
+```cpp
+// Static: pattern compiled at compile time into exactly-sized constexpr
+// arrays; an invalid pattern is a *compile error*.
+constexpr real::static_regex<"(\\d{4})-(\\d{2})"> date;
+static_assert(date.search("on 2026-06-10")[1] == "2026"); // constexpr match
+
+// Hybrid: compile-time pattern, runtime text — matching performs zero heap
+// allocations (state lives on the stack).
+date.search(runtime_text);
+
+// Dynamic: everything at runtime; the program is sized exactly once at
+// compilation, match state is per-run scratch.
+real::regex rx2(user_pattern, real::flags::icase);
+```
+
+The pure library is standard C++20 with no platform dependencies. `real::real`
+is the CMake `FetchContent`/`find_package` target.
+
+## Python binding
+
+An `re`-compatible module backed by the C++ engine (CPython Limited API, one
+abi3 extension, zero dependencies):
+
+```python
+import real
+
+real.search(r"(?P<y>\d{4})-(?P<m>\d{2})", "on 2026-06-10").groupdict()
+real.compile(r"\w+").findall(text) # findall/finditer/split/sub/subn
+real.sub(r"\s+", " ", text) # templates: \1, \g<name>, callables
+real.compile(rb"[^;]+").findall(raw) # bytes patterns: raw-byte semantics
+```
+
+`str` matching is UTF-8 with character indices in `start/end/span`; `bytes`
+patterns get `re`'s exact raw-byte semantics. Unsupported `re` features raise
+`real.error` at compile time. Build with `make python && make python-test`.
+
+Once published: `pip install real-regex` (one `cp310-abi3` wheel per platform
+serves CPython 3.10+; the self-contained sdist compiles where no wheel
+matches).
+
+**Release process (manual + tag-driven, for reliability):**
+- Use calendar versioning `YYYY.M.PATCH` with monthly patch reset
+  (e.g. `2026.6.0` for the first release of June 2026, then `2026.6.1` etc.;
+  next month starts at `.0`).
+- Update the version in **both** places:
+  - `pyproject.toml` (the one used by the release guard and PyPI)
+  - `python/real/__init__.py` (the runtime `__version__` exposed to users)
+- Commit the change (optionally include `[release]` in the message as a
+  human signal or for future automation).
+- `git tag v2026.6.0`
+- `git push origin main v2026.6.0`
+
+The tag triggers `.github/workflows/release.yml`:
+- `check-version` ensures the tag exactly matches the version in `pyproject.toml`.
+- It builds abi3 wheels with `cibuildwheel` (Linux/macOS/Windows) + sdist.
+- Publishes to PyPI using Trusted Publishing (OIDC) — no secrets.
+
+We deliberately kept the process simple and explicit (no auto-bump on
+merge yet) to avoid accidental publishes and keep the history auditable.
+The tag-based guard + OIDC is the reliable core.
+
+## Development
+
+```bash
+make help # list all targets
+make test # build and run the test suite
+make coverage # line coverage report (LLVM)
+make sanitize # tests under ASan + UBSan
+make lint # clang-tidy
+make misra # MISRA C++:2023-oriented analysis
+make fuzz # libFuzzer robustness fuzzing (clang)
+make doc # API reference (Doxygen)
+```
+
+The API reference is published at <https://reche23.github.io/real-regex/>.
+
+Select the compiler with `make test CXX=g++-14`. Every behaviour is tested at
+runtime and in constexpr (`static_assert`) under Clang and GCC; an equivalence
+suite checks the prefilter and fast paths never change results; a parity suite
+and a randomized differential fuzzer compare Python outputs against `re`.
+
+CI exercises:
+
+| Platform | Architecture | Compiler |
+|----------|--------------|----------|
+| Linux | x86-64 | GCC, Clang |
+| Linux | AArch64 | GCC |
+| macOS | Apple Silicon (arm64) | Apple Clang |
+| Windows | x86-64 | MSVC |
+
+IntelLLVM (`icpx`), x86-64 macOS and the BSDs share the Clang flag set and are
+supported by the build configuration but not exercised in CI.
+
+## License
+
+MIT — Copyright (c) 2026 René Chenard
@@ -0,0 +1,182 @@
+# REAL
+
+**Regular Expression Algorithmic Library** — a header-only C++20 regex engine,
+constexpr from end to end, with an `re`-compatible Python binding.
+
+- **Linear time, always.** The engine is a Pike VM (Thompson NFA simulation):
+  no backtracking, ReDoS-safe by construction.
+- **Constexpr-friendly.** Patterns known at compile time are parsed, compiled
+  and matched at compile time.
+- **Minimal memory.** Static (sizes fixed at compile time, zero allocation),
+  dynamic (storage sized exactly once at pattern compilation), or hybrid
+  (compile-time pattern, runtime text, zero heap allocation).
+- **Zero dependencies.** One include.
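To illustrate why a Thompson-style simulation stays linear on inputs that stall a backtracker, here is a minimal Python sketch of a thread-list matcher. It is not REAL's implementation: the tuple-based instruction format, the hand-assembled program for `(a+)+b`, and the names `PROG`, `add_thread` and `pike_fullmatch` are assumptions made for illustration, and capture tracking is left out entirely.

```python
# Hypothetical instruction set: ("char", c), ("split", x, y), ("match",).
# Hand-assembled program for (a+)+b; not produced by REAL's compiler.
PROG = [
    ("char", "a"),    # 0: consume one 'a'
    ("split", 0, 2),  # 1: inner a+   : loop back or fall through
    ("split", 0, 3),  # 2: outer (..)+: start another group or go on
    ("char", "b"),    # 3: consume 'b'
    ("match",),       # 4: accept
]


def add_thread(prog, pc, threads, seen):
    """Follow split edges from pc, queueing each reachable state once."""
    if pc in seen:
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == "split":
        add_thread(prog, op[1], threads, seen)
        add_thread(prog, op[2], threads, seen)
    else:
        threads.append(pc)


def pike_fullmatch(prog, text):
    """Return True if the whole text is accepted by the program."""
    threads = []
    add_thread(prog, 0, threads, set())
    for ch in text:
        nxt, seen = [], set()
        for pc in threads:
            op = prog[pc]
            if op[0] == "char" and op[1] == ch:
                add_thread(prog, pc + 1, nxt, seen)
        threads = nxt
        if not threads:  # every thread died: no match is possible
            return False
    return any(prog[pc][0] == "match" for pc in threads)


print(pike_fullmatch(PROG, "aaab"))      # accepted
print(pike_fullmatch(PROG, "a" * 5000))  # rejected, with no backtracking
```

Because the `seen` set admits each program state at most once per input character, the work per character is bounded by the program size, so the total cost grows as text length times program length.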
+
+Unsupported syntax is rejected with `real::regex_error` rather than silently
+diverging. Deferred (and rejected): lookarounds, backreferences,
+atomic/possessive groups, Unicode property classes, Unicode case folding,
+`re.X`, `pos`/`endpos`. The planned next step is a lazy DFA for the
+dense-candidate cases where `re` is still ahead.
+
+Over the benchmark suite (`make bench-python`), REAL is **1.98x faster than
+Python's `re`** at the geometric mean, with identical outputs; the `(a+)+b`
+ReDoS case completes in microseconds where `re` takes over a second.
+
+## Supported syntax
+
+| Syntax | Meaning |
+|---|---|
+| `abc` | literal bytes (UTF-8 patterns match their UTF-8 bytes) |
+| `\.` `\*` `\\` … | escaped metacharacter, matched literally |
+| `.` | any codepoint except `\n` |
+| `[abc]` `[a-z]` `[^abc]` | character class (members must be ASCII); `[^…]` matches any codepoint outside the set |
+| `\d \D \w \W \s \S` | digit / word / space classes (ASCII sets, like Python's `re.ASCII`) |
+| `\n \t \r \f \v \a \0` `\xHH` | control and hex escapes |
+| `x*` `x+` `x?` | quantifiers (greedy; append `?` for lazy) |
+| `x{n}` `x{n,}` `x{,m}` `x{n,m}` | counted repetition (greedy or lazy; counts capped at 1000) |
+| `a\|b` | alternation, leftmost branch preferred |
+| `(…)` `(?:…)` | capturing / non-capturing group |
+| `(?P<name>…)` `(?<name>…)` | named capturing group (Python and .NET styles) |
+| `^` `$` | line/text anchors (Python semantics: `$` also matches before a final `\n`) |
+| `\A` `\Z` | strict text start / end |
+| `\b` `\B` | word boundary / non-boundary (ASCII word characters) |
+| `(?ims)` prefix | global flags: `i` case-insensitive (ASCII), `m` multiline, `s` dotall — also `real::flags` on the constructor |
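The anchor rows in the table refer to Python semantics, so the reference behaviour can be checked with the standard-library `re` module. The sketch below uses only `re`, not REAL; the variable names are illustrative.

```python
import re

text = "abc\n"

# `$` matches at the very end and also just before a final newline.
dollar = re.search(r"c$", text)

# `\Z` is strict: it only matches at the very end of the text.
strict = re.search(r"c\Z", text)

# With the `m` flag, `^` also matches right after each newline.
line_starts = [m.start() for m in re.finditer(r"(?m)^", "a\nb")]

print(dollar is not None)  # True
print(strict is None)      # True
print(line_starts)         # [0, 2]
```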
+
+**Unicode model:** matching is UTF-8 byte-based, but every construct consumes
+whole codepoints (multi-byte sequences compile to byte-level alternatives), so
+match boundaries never split a character. Class members and the `\d \w \s`
+sets are ASCII by design; `[^…]`, `\D \W \S` and `.` do match non-ASCII
+codepoints.
+
+**Divergence from Python:** when a *nullable* loop body ends with an empty
+iteration — e.g. `(a*)*` on `"aa"` — Python captures that final empty
+iteration (`''`); REAL, like Perl/PCRE, keeps the last non-empty one (`"aa"`).
+Group 0 is identical either way.
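The Python side of this divergence can be reproduced with the standard-library `re` module alone. The sketch below does not import REAL; per the paragraph above, REAL would report `"aa"` for group 1 on the same input.

```python
import re

m = re.fullmatch(r"(a*)*", "aa")

whole = m.group(0)  # the overall match
last = m.group(1)   # the capture left by the final loop iteration

print(repr(whole))  # 'aa'
print(repr(last))   # '' : Python keeps the trailing empty iteration
```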
+
+## C++ API
+
+```cpp
+#include <real/real.hpp>
+
+real::regex rx("hello"); // runtime pattern, storage sized exactly once
+rx.match("hello world"); // anchored at the start (Python re.match)
+rx.fullmatch("hello"); // whole text (Python re.fullmatch)
+rx.search("say hello"); // leftmost match anywhere (Python re.search)
+```
+
+`match`/`fullmatch`/`search` return a `real::match_result`: `matched()`,
+`operator bool`, `start(g)`, `end(g)`, `m[g]` (a `std::string_view` into the
+searched text, which must outlive the result), and the same accessors by group
+name (`m["year"]`, `group_index`).
+
+```cpp
+for (auto& m : rx.find_iter(text)) { … } // lazy, Python finditer rules
+rx.find_all(text); // eager vector<match_result>
+rx.replace(text, "$2:$1"); // $&, $1…, ${name}, $$ — re.sub
+rx.replace(text, "#", 2); // count limit
+rx.split(text); // Python re.split, with groups
+```
+
+Empty matches follow Python's rules: they are yielded (even right after a
+non-empty match) and the scan then advances one whole codepoint.
+`find_iter`/`find_all` cannot be called on a temporary regex, and
+`match`/`search`/`split` cannot take a temporary `std::string`.
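The empty-match rules referred to here are those of Python's `re.finditer` (as of Python 3.7). A small standard-library sketch of the reference behaviour follows; it exercises `re`, not REAL.

```python
import re

# 'a*' can match the empty string, so every position is a candidate.
spans = [m.span() for m in re.finditer(r"a*", "baac")]

print(spans)  # [(0, 0), (1, 3), (3, 3), (4, 4)]
```

The empty match at `(3, 3)` directly follows the non-empty match `(1, 3)`, which is the "even right after a non-empty match" case described above.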
+
+### Three memory modes
+
+```cpp
+// Static: pattern compiled at compile time into exactly-sized constexpr
+// arrays; an invalid pattern is a *compile error*.
+constexpr real::static_regex<"(\\d{4})-(\\d{2})"> date;
+static_assert(date.search("on 2026-06-10")[1] == "2026"); // constexpr match
+
+// Hybrid: compile-time pattern, runtime text — matching performs zero heap
+// allocations (state lives on the stack).
+date.search(runtime_text);
+
+// Dynamic: everything at runtime; the program is sized exactly once at
+// compilation, match state is per-run scratch.
+real::regex rx2(user_pattern, real::flags::icase);
+```
+
+The pure library is standard C++20 with no platform dependencies. `real::real`
+is the CMake `FetchContent`/`find_package` target.
+
+## Python binding
+
+An `re`-compatible module backed by the C++ engine (CPython Limited API, one
+abi3 extension, zero dependencies):
+
+```python
+import real
+
+real.search(r"(?P<y>\d{4})-(?P<m>\d{2})", "on 2026-06-10").groupdict()
+real.compile(r"\w+").findall(text) # findall/finditer/split/sub/subn
+real.sub(r"\s+", " ", text) # templates: \1, \g<name>, callables
+real.compile(rb"[^;]+").findall(raw) # bytes patterns: raw-byte semantics
+```
+
+`str` matching is UTF-8 with character indices in `start/end/span`; `bytes`
+patterns get `re`'s exact raw-byte semantics. Unsupported `re` features raise
+`real.error` at compile time. Build with `make python && make python-test`.
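Since the engine works on UTF-8 bytes while `str` results are reported in character indices, some byte-to-character translation has to happen somewhere. The sketch below only illustrates the relationship between the two index spaces; `byte_to_char_index` is a hypothetical helper, and nothing here claims to describe how the binding actually performs the conversion.

```python
def byte_to_char_index(text: str, byte_offset: int) -> int:
    """Map an offset into text's UTF-8 encoding to a str index.

    Assumes byte_offset falls on a codepoint boundary; otherwise the
    decode step raises UnicodeDecodeError.
    """
    data = text.encode("utf-8")
    return len(data[:byte_offset].decode("utf-8"))


text = "café 2026"
byte_pos = text.encode("utf-8").index(b"2026")  # 'é' takes two bytes
char_pos = byte_to_char_index(text, byte_pos)

print(byte_pos, char_pos)  # 6 5
```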
+
+Once published: `pip install real-regex` (one `cp310-abi3` wheel per platform
+serves CPython 3.10+; the self-contained sdist compiles where no wheel
+matches).
+
+**Release process (manual + tag-driven, for reliability):**
+- Use calendar versioning `YYYY.M.PATCH` with monthly patch reset
+  (e.g. `2026.6.0` for the first release of June 2026, then `2026.6.1` etc.;
+  next month starts at `.0`).
+- Update the version in **both** places:
+  - `pyproject.toml` (the one used by the release guard and PyPI)
+  - `python/real/__init__.py` (the runtime `__version__` exposed to users)
+- Commit the change (optionally include `[release]` in the message as a
+  human signal or for future automation).
+- `git tag v2026.6.0`
+- `git push origin main v2026.6.0`
+
+The tag triggers `.github/workflows/release.yml`:
+- `check-version` ensures the tag exactly matches the version in `pyproject.toml`.
+- It builds abi3 wheels with `cibuildwheel` (Linux/macOS/Windows) + sdist.
+- Publishes to PyPI using Trusted Publishing (OIDC) — no secrets.
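A tag-versus-version guard of the kind described can be sketched in a few lines of Python. This is an assumption-laden illustration, not the contents of the project's `check-version` job: the function name `tag_matches_version`, the `CALVER` pattern, and the decision to reject zero-padded months are all hypothetical.

```python
import re

# YYYY.M.PATCH: four-digit year, month 1-12 without zero padding,
# and a non-negative patch number without leading zeros.
CALVER = re.compile(r"\d{4}\.([1-9]|1[0-2])\.(0|[1-9]\d*)")


def tag_matches_version(tag: str, version: str) -> bool:
    """Return True if tag is 'v' + version and version is valid CalVer."""
    if not tag.startswith("v"):
        return False
    if CALVER.fullmatch(version) is None:
        return False
    return tag[1:] == version


print(tag_matches_version("v2026.6.0", "2026.6.0"))  # True
print(tag_matches_version("v2026.6.1", "2026.6.0"))  # False
```

In a real workflow the `version` argument would come from `pyproject.toml` and the tag from the CI environment; both are passed in as plain strings here to keep the sketch self-contained.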
+
+We deliberately kept the process simple and explicit (no auto-bump on
+merge yet) to avoid accidental publishes and keep the history auditable.
+The tag-based guard + OIDC is the reliable core.
+
+## Development
+
+```bash
+make help # list all targets
+make test # build and run the test suite
+make coverage # line coverage report (LLVM)
+make sanitize # tests under ASan + UBSan
+make lint # clang-tidy
+make misra # MISRA C++:2023-oriented analysis
+make fuzz # libFuzzer robustness fuzzing (clang)
+make doc # API reference (Doxygen)
+```
+
+The API reference is published at <https://reche23.github.io/real-regex/>.
+
+Select the compiler with `make test CXX=g++-14`. Every behaviour is tested at
+runtime and in constexpr (`static_assert`) under Clang and GCC; an equivalence
+suite checks the prefilter and fast paths never change results; a parity suite
+and a randomized differential fuzzer compare Python outputs against `re`.
+
+CI exercises:
+
+| Platform | Architecture | Compiler |
+|----------|--------------|----------|
+| Linux | x86-64 | GCC, Clang |
+| Linux | AArch64 | GCC |
+| macOS | Apple Silicon (arm64) | Apple Clang |
+| Windows | x86-64 | MSVC |
+
+IntelLLVM (`icpx`), x86-64 macOS and the BSDs share the Clang flag set and are
+supported by the build configuration but not exercised in CI.
+
+## License
+
+MIT — Copyright (c) 2026 René Chenard