descent 0.7.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 75c7bef6464798eee6b29147f663d41fcb237c71338e67f14596bd6ce98d4574
+   data.tar.gz: 9a76dd2385b9087cc95640437fa5be20fcde8468525211c4d1c1cfd6b72bd626
+ SHA512:
+   metadata.gz: f3065ddce128bf8864915aef45ba67b2f24403d67ce2d159d7616cdceb9559821080692ee7be330802134868b1819a5a381c40d538addc12852f5abbdbd2f7ce
+   data.tar.gz: 79231da63252a41e0ac42666cbf17001a87397a7885bd3c63bfb79d153656c904a5f6f04b3b9c80591f3f39effec841fc902112748bfafad19cd03dda4297509
data/CHANGELOG.md ADDED
@@ -0,0 +1,285 @@
+ # Changelog
+
+ All notable changes to descent will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [0.7.1] - 2026-01-09
+
+ ### Fixed
+ - **Commands after /error preserved**: Removed stale `filter_unreachable_after_error`
+   that incorrectly dropped commands (including `|return`) after `/error` calls.
+
+ ## [0.7.0] - 2026-01-09
+
+ ### Added
+ - **Comprehensive test suite**: 148 tests covering Lexer, Parser, IRBuilder, Validator,
+   and Generator modules. Includes integration harness tests that compile and run parsers.
+ - **SYNTAX.md reference**: Complete .desc DSL syntax reference document with BNF-style
+   grammar, all directives, actions, and character classes.
+ - **Validator checks**: Missing parser name validation, duplicate types, undefined
+   function calls, invalid state transitions, and more.
+ - **`streaming:` option**: Generator now accepts `streaming: false` to omit StreamingParser
+   infrastructure (~200 lines) when not needed.
+
+ ### Changed
+ - **`/error` no longer auto-returns**: The `/error(Code)` command now only emits the
+   error event. Add explicit `|return` to exit. This enables error recovery patterns
+   like `/error(NoTabs) | ->['\n'] |>>`.
+ - **Escape sequences consolidated**: Single `ESCAPE_SEQUENCES` constant used by both
+   `rust_expr` and `transform_call_args`, eliminating duplication.
+
+ ### Fixed
+ - **COL in function args**: `/element(COL)` now correctly generates `self.parse_element(self.col(), on_event)`
+   instead of broken output. Function calls are processed before COL/LINE/PREV expansion.
+ - **Unused variable warnings**: Locals only assigned at entry (not reassigned in body)
+   now emit `let` instead of `let mut`, eliminating `unused_mut` warnings.
+ - **Double return warnings**: Fixed template generating two consecutive returns when
+   `/error` was followed by explicit `|return`.
+
+ ## [0.6.17] - 2026-01-02
+
+ ### Fixed
+ - **O(n²) chained memchr**: scan_to4/5/6 now limit the second search to the range
+   found by the first search. Previously both searches scanned the entire remaining
+   input independently, causing O(n²) behavior on large documents.
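
The chained-search fix described above can be sketched as follows. This is an illustration only, not the generated `scan_to4/5/6` code: `find_any` is a plain-std stand-in for the SIMD `memchr` calls, and the names `find_any` and `scan_chained` are hypothetical.

```rust
/// Stand-in for a memchr-style search (assumption: the real code uses the
/// memchr crate). Returns the first index in `hay` holding any byte of `needles`.
fn find_any(hay: &[u8], needles: &[u8]) -> Option<usize> {
    hay.iter().position(|b| needles.contains(b))
}

/// Hypothetical chained scan over two needle groups.
/// The second search is limited to the bytes before the first hit, so one call
/// never inspects more input than the distance to the first group's match.
fn scan_chained(hay: &[u8], group_a: &[u8], group_b: &[u8]) -> Option<usize> {
    let first = find_any(hay, group_a);
    // Bound the second search by the first result (or the end of input).
    let limit = first.unwrap_or(hay.len());
    let second = find_any(&hay[..limit], group_b);
    // A hit inside the bounded range is necessarily earlier than `first`.
    second.or(first)
}
```

If the second search instead covered the whole remaining input, a byte from `group_b` that is rare or absent would force a full scan on every call, even when `group_a` matches a few bytes ahead. Repeated over a large document, that is the quadratic behavior the entry describes.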
+
+ ## [0.6.16] - 2026-01-02
+
+ ### Changed
+ - **SIMD newline injection for line tracking**: Scannable states now automatically
+   inject `'\n'` into scan targets (if not already present and size < 6). This enables
+   correct line/column tracking during SIMD scans without runtime checks. When the
+   injected newline is hit, the parser updates line/column and continues scanning.
+   Scan functions simplified to just add offset to column, trusting no newlines exist
+   between start and found position.
+
+ ## [0.6.15] - 2026-01-02
+
+ ### Fixed
+ - **pascalcase preserves PascalCase**: The `pascalcase` filter now correctly handles
+   already-PascalCase input like `UnclosedInterpolation` instead of lowercasing it
+   to `Unclosedinterpolation`. Splits on case transitions in addition to delimiters.
+ - **Error code deduplication**: Custom `/error(Code)` calls no longer create duplicate
+   enum variants when the same code is auto-generated from `expects_char` inference.
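
As a rough illustration of the `pascalcase` behavior described above, the sketch below splits on delimiters and on lower-to-upper case transitions. It is written in Rust for consistency with the generated code; the actual filter lives in the generator, and the delimiter set (`_`, `-`, space) is an assumption.

```rust
/// Hypothetical sketch of a pascalcase conversion. A new word starts after a
/// delimiter or when an uppercase letter follows a lowercase one.
fn pascalcase(input: &str) -> String {
    let mut out = String::new();
    let mut start_word = true;
    let mut prev_lower = false;
    for ch in input.chars() {
        // Assumed delimiter set; delimiters are dropped from the output.
        if ch == '_' || ch == '-' || ch == ' ' {
            start_word = true;
            prev_lower = false;
            continue;
        }
        // A lower-to-upper transition also begins a new word.
        if ch.is_uppercase() && prev_lower {
            start_word = true;
        }
        if start_word {
            out.extend(ch.to_uppercase());
            start_word = false;
        } else {
            out.extend(ch.to_lowercase());
        }
        prev_lower = ch.is_lowercase();
    }
    out
}
```

Without the case-transition check, `UnclosedInterpolation` is treated as a single word and everything after the first letter is lowercased, which is the bug the entry reports.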
+
+ ## [0.6.14] - 2026-01-02
+
+ ### Added
+ - **advance_to validation**: `->[...]` now validates its arguments at IR build time.
+   Errors on: character classes (LETTER, DIGIT, etc.), parameter references (:param),
+   and >6 characters. Only literal bytes are supported (uses SIMD memchr).
+
+ ### Fixed
+ - **advance_to 4-6 chars**: `->[...]` now correctly supports 4-6 characters using
+   chained memchr (scan_to4/5/6). Previously the template generated broken code for >3 chars.
+
+ ## [0.6.13] - 2026-01-01
+
+ ### Fixed
+ - **BRACKET End event on inline emit**: BRACKET type functions now correctly emit
+   their End event on return even when preceded by inline emits like `RawContent(USE_MARK)`.
+   Previously `suppress_auto_emit` incorrectly skipped the End event for BRACKET types.
+ - **If-case break after return**: `|if[cond] |return` followed by `| -> |>> :state` now
+   correctly generates two separate match arms. Previously the bare action case commands
+   were appended to the if-case, causing unreachable code warnings.
+ - **Entry actions preserved**: Function-level entry actions like `| val = 5` are now
+   correctly preserved through IR transformations (prepend_values, type coercion).
+ - **Local variable initialization**: Locals with entry action assignments now initialize
+   directly with the value (`let mut val: i32 = 5;`) instead of init-then-assign,
+   eliminating "value assigned is never read" warnings.
+ - **set_term helper emission**: `TERM` commands now correctly trigger emission of the
+   `set_term` helper method regardless of offset value. Previously `TERM(0)` failed to
+   compile because the helper wasn't generated.
+
+ ## [0.6.12] - 2026-01-01
+
+ ### Changed
+ - **Conditional helper emission**: Generated parsers now only include helper methods
+   that are actually used, eliminating dead_code warnings. The generator analyzes
+   usage of `col()`, `prev()`, character class methods (`is_letter`, `is_digit`, etc.),
+   and scan methods (`scan_to1` through `scan_to6`) and only emits what's needed.
+
+ ## [0.6.9] - 2026-01-01
+
+ ### Fixed
+ - **Unconditional state handling**: States with bare action cases (no character match)
+   now execute immediately without waiting for a byte. Previously, `| MARK |>> :next`
+   would generate `Some(_) =>` which waited for a byte before executing MARK.
+
+ ## [0.6.8] - 2026-01-01
+
+ ### Fixed
+ - **Empty content span bug**: `span_from_mark()` and `term()` now correctly handle
+   empty content where TERM is called at the same position as MARK. Uses sentinel
+   value (`usize::MAX`) to distinguish "TERM not called" from "TERM called with
+   empty content". Fixes spans like `!{{}}` returning 6..8 instead of 6..6.
+ - **Example syntax**: Fixed `c[\n]` → `c['\n']` in example .desc files. Bare
+   escape sequences must be quoted per characters.md spec.
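
The sentinel approach from the empty content span fix can be sketched like this. The struct and field names (`Spans`, `pos`, `mark_pos`, `term_pos`) are assumptions made for illustration; only `span_from_mark`, `set_term`, and the `usize::MAX` sentinel come from the changelog.

```rust
use std::ops::Range;

/// Sentinel meaning "TERM has not been called since the last MARK".
/// usize::MAX is used because it cannot be a real offset in practice.
const TERM_UNSET: usize = usize::MAX;

/// Hypothetical, reduced parser state.
struct Spans {
    pos: usize,      // current byte offset
    mark_pos: usize, // offset recorded by MARK
    term_pos: usize, // offset recorded by TERM, or TERM_UNSET
}

impl Spans {
    /// MARK: remember the start and forget any earlier TERM.
    fn mark(&mut self) {
        self.mark_pos = self.pos;
        self.term_pos = TERM_UNSET;
    }

    /// TERM: remember the end, even if it equals the mark position.
    fn set_term(&mut self) {
        self.term_pos = self.pos;
    }

    /// Span from MARK to TERM, falling back to the current position
    /// only when TERM was never called.
    fn span_from_mark(&self) -> Range<usize> {
        let end = if self.term_pos == TERM_UNSET { self.pos } else { self.term_pos };
        self.mark_pos..end
    }
}
```

With MARK and TERM both at offset 6 and the parser then advancing to 8, this yields `6..6`, matching the corrected `!{{}}` case. Skipping `set_term` yields `6..8`.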
+
+ ## [0.6.7] - 2025-01-01
+
+ ### Fixed
+ - **`:param` in conditionals**: `if[col <= :line_col]` now correctly generates
+   `col <= line_col` instead of literal `:line_col`.
+ - **`<>` for `:byte` params**: Empty class now generates `0u8` (never-match sentinel)
+   instead of `b'?'` which incorrectly matched question marks.
+ - **Function call arg validation**: `/func(param)` where `param` matches a known
+   parameter now errors with helpful message suggesting `:param` or `'param'`.
+
+ ## [0.6.6] - 2025-01-01
+
+ ### Added
+ - **Unified CharacterClass parser**: New `CharacterClass` module implements the
+   `characters.md` spec with consistent parsing everywhere (c[...], function args,
+   PREPEND). All character class syntax now goes through a single code path.
+ - **Param reference validation**: Bare identifiers matching param names now raise
+   helpful errors in both PREPEND and function calls:
+   - `PREPEND(foo)` → suggests `PREPEND(:foo)` or `PREPEND('foo')`
+   - `/func(foo)` → suggests `/func(:foo)` or `/func('foo')`
+   - This prevents confusing bugs where param names are treated as literal strings
+
+ ### Fixed
+ - **`<>` empty class consistency**: `<>` now correctly means "empty" everywhere:
+   - `PREPEND(<>)` → `b""` (no-op, empty prepend)
+   - `/func(<>)` for `:bytes` param → `b""` (empty byte slice)
+   - Previously `PREPEND(<>)` incorrectly output literal `<>` characters
+ - **Type inference for numeric comparisons**: Conditions like `space_term == 0`
+   no longer incorrectly type the param as `:byte`. Numeric flag comparisons stay
+   as `:i32`; only character literal comparisons (e.g., `close == '|'`) set `:byte`.
+ - **`:byte` type propagation**: When function A passes `:param` to function B
+   where B's param is `:byte`, A's param now correctly becomes `:byte`. Previously
+   only `:bytes` was propagated.
+ - **Hex escapes in literals**: `'\x00'` and other hex escapes now work correctly
+   in PREPEND and function arguments, producing actual byte values.
+
+ ### Changed
+ - Removed duplicate constant definitions (PREDEFINED_RANGES, SINGLE_CHAR_CLASSES)
+   in favor of unified CharacterClass module.
+ - `bytes_like_value?` now only matches `<>` - single-char values like `'|'` are
+   typed based on usage, not call-site inference.
+
+ ## [0.6.5] - 2024-12-31
+
+ ### Fixed
+ - **PREPEND quote stripping**: `PREPEND('|')` now correctly generates `b"|"` (1 byte)
+   instead of `b"'|'"` (3 bytes). Quoted literals are properly unquoted before embedding.
+ - **Lexer bracket extraction**: `c[']']` now works correctly - the lexer respects
+   single quotes when extracting bracket content, so `]` inside quotes doesn't close.
+
+ ### Changed
+ - **Stricter character validation**: Characters outside `/A-Za-z0-9_-/` in `c[...]`
+   must now be quoted. This catches common errors and enforces consistent syntax:
+   - `c["]` is invalid, use `c['"']`
+   - `c[#]` is invalid, use `c['#']`
+   - `c[abc]` is valid (alphanumeric)
+   - `c[-_]` is valid (hyphen and underscore allowed bare)
+ - **Escape sequences outside class wrapper**: Using `<SQ>`, `<P>` etc. outside a
+   `<...>` class wrapper now raises a clear error suggesting proper syntax.
+
+ ## [0.6.3] - 2024-12-31
+
+ ### Fixed
+ - **Semicolon in quoted strings**: `PREPEND(';')` no longer treats the semicolon
+   as a comment start. The lexer now tracks quotes when stripping comments.
+ - **Pipe in quoted arguments**: Function calls like `/func('|')` now parse correctly.
+   The lexer tracks quotes when splitting on pipe delimiters.
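
Quote tracking of the kind described in these two fixes can be sketched as below. This is a simplified illustration in Rust, not the lexer's actual code: `split_on_pipes` is a hypothetical helper, and treating backslash as an escape inside quotes is an assumption.

```rust
/// Hypothetical helper: split a line on '|' while ignoring pipes that appear
/// inside single-quoted literals.
fn split_on_pipes(line: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut in_quote = false;
    let mut escaped = false;
    for ch in line.chars() {
        // The character after a backslash is taken literally.
        if escaped {
            current.push(ch);
            escaped = false;
            continue;
        }
        match ch {
            // Assumed escape handling, so '\'' does not close the quote.
            '\\' if in_quote => {
                current.push(ch);
                escaped = true;
            }
            '\'' => {
                in_quote = !in_quote;
                current.push(ch);
            }
            // Only an unquoted pipe acts as a delimiter.
            '|' if !in_quote => {
                parts.push(current.trim().to_string());
                current.clear();
            }
            _ => current.push(ch),
        }
    }
    parts.push(current.trim().to_string());
    parts
}
```

The same `in_quote` flag would serve for comment stripping: a `;` starts a comment only while the flag is false, which is why `PREPEND(';')` survives.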
+
+ ### Changed
+ - **Validation for character syntax**: Added comprehensive validation for `c[...]`
+   patterns to catch unterminated quotes, bare special characters, and invalid
+   legacy syntax before parsing.
+
+ ## [0.6.2] - 2024-12-31
+
+ ### Fixed
+ - **Conditionals in SCAN branches**: Character literals and escape sequences like
+   `<P>` now work correctly in conditional expressions (e.g., `|if[PREV == <P>]`).
+ - **Escape sequences in expressions**: `rust_expr` filter now transforms embedded
+   escape sequences like `<P>` to `b'|'` in all expression contexts.
+
+ ## [0.6.1] - 2024-12-31
+
+ ### Added
+ - **LINE variable**: Access current line number (1-indexed) in expressions.
+   Transforms to `self.line as i32` in generated Rust code.
+
+ ## [0.6.0] - 2024-12-31
+
+ ### Changed
+ - **PREPEND semantics fixed**: PREPEND now correctly adds bytes to the accumulation
+   buffer instead of emitting a separate Text event. The prepended content is combined
+   with the next `term()` result using `Cow<[u8]>` for zero-copy in the common case.
+ - **Event content type**: Content fields in events are now `Cow<'a, [u8]>` instead of
+   `&'a [u8]`. This enables zero-copy when no PREPEND is used, with owned data only
+   when prepending is needed.
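
The `Cow<'a, [u8]>` pattern behind these two changes can be sketched as follows. The function name `term_content` and the shape of the prepend buffer are assumptions for illustration; the generated parser's real `term()` may differ.

```rust
use std::borrow::Cow;

/// Hypothetical helper: combine a pending prepend buffer with a slice of the
/// input. Borrows the input when nothing was prepended, and allocates only
/// when prepended bytes must be joined with the slice.
fn term_content<'a>(prepend: &mut Vec<u8>, slice: &'a [u8]) -> Cow<'a, [u8]> {
    if prepend.is_empty() {
        // Common case: zero-copy view into the input.
        Cow::Borrowed(slice)
    } else {
        // Take the buffer (leaving it empty) and append the slice to it.
        let mut owned = std::mem::take(prepend);
        owned.extend_from_slice(slice);
        Cow::Owned(owned)
    }
}
```

Consumers can treat both cases uniformly, since `Cow<[u8]>` dereferences to `[u8]`. The cost of the owned variant is paid only by parsers that actually use PREPEND.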
+
+ ### Added
+ - **Unicode identifier classes**: `XID_START`, `XID_CONT`, `XLBL_START`, `XLBL_CONT`
+   for Unicode-aware identifier parsing (requires `unicode-xid` crate)
+ - **Conditional unicode-xid import**: The crate is only required when Unicode
+   classes are actually used in the parser
+
+ ### Fixed
+ - **PREPEND buffer persistence**: The prepend buffer now persists across nested
+   function calls, allowing `PREPEND(*) | /paragraph` patterns to work correctly
+
+ ## [0.2.1] - 2024-12-30
+
+ ### Added
+ - **DIGIT character class**: Matches ASCII digits (0-9) using `is_ascii_digit()`
+ - **HEX_DIGIT character class**: Matches hex digits (0-9, a-f, A-F) using `is_ascii_hexdigit()`
+ - **`|eof` directive**: Explicit EOF handling with custom actions and inline emits
+ - **Parameterized byte terminators**: Functions can take byte parameters for dynamic character matching
+   - Syntax: `|c[:param]|` matches against parameter value
+   - Parameters used in char matches become `u8` type automatically
+   - Enables single functions to handle multiple bracket types ([], {}, ())
+ - **Escape sequences**: `<LP>` for `(` and `<RP>` for `)` in function arguments
+ - **PREPEND with parameter references**: `PREPEND(:param)` emits parameter value as Text event
+   - `PREPEND()` with empty content is a no-op
+   - `PREPEND(:param)` where param is 0 is also a no-op (runtime check)
+   - Parameters used in PREPEND are inferred as `u8` type
+
+ ### Fixed
+ - **Double emit bug (#11)**: CONTENT functions with inline emits no longer emit twice
+   - Inline emit (e.g., `Integer(USE_MARK)`) followed by bare `|return` now correctly
+     suppresses the auto-emit for the function's return type
+ - **EOF bypasses inline emits (#12)**: Use `|eof` directive for explicit EOF behavior
+ - **`|eof` not generating code (#13)**: The `|eof` directive now properly generates
+   action code including inline emits
+ - **Quote characters in function parameters**: Bare `"` and `'` now correctly convert
+   to `b'"'` and `b'\''` when passed as function arguments
+
+ ### Changed
+ - EOF handling documentation updated to reflect explicit `|eof` support
+ - README and CLAUDE.md updated with all character classes and EOF directive
+
+ ## [0.2.0] - 2024-12-29
+
+ ### Added
+ - Parameterized functions with `:param` syntax
+ - Combined character classes: `|c[LETTER'[.?!]|` matches class OR literal chars
+ - TERM adjustments: `TERM(-1)` terminates slice before current position
+ - PREPEND command: `PREPEND(|)` emits literal as text event
+ - Inline literal emits: `TypeName`, `TypeName(literal)`, `TypeName(USE_MARK)`
+ - PREV variable for previous byte context
+ - Custom error codes via `/error(ErrorCode)`
+
+ ### Fixed
+ - Duplicate error code generation for same return types
+ - Local variable scoping across states
+ - Functions with no states now handled gracefully
+ - Return with value for INTERNAL types
+
+ ## [0.1.0] - 2024-12-20
+
+ ### Added
+ - Initial release
+ - Lexer, Parser, IR Builder, Generator pipeline
+ - Rust code generation via Liquid templates
+ - SCAN optimization inference (memchr-based SIMD scanning)
+ - Type system: BRACKET, CONTENT, INTERNAL
+ - Character classes: LETTER, LABEL_CONT
+ - Automatic MARK/TERM for CONTENT types
+ - Recursive descent with true call stack