rip-lang 3.7.3 → 3.8.8

@@ -0,0 +1,234 @@
1
+ # Solar & Lunar — Dual Parser Generators
2
+
3
+ One grammar. Two parsers. Two fundamentally different parsing strategies.
4
+
5
+ ```
6
+ grammar.rip ──→ Solar ──→ parser.js (SLR(1) table-driven, 215KB)
7
+ └→ Lunar ──→ parser-rd.js (predictive recursive descent, 110KB)
8
+ ```
9
+
10
+ Both parsers accept the same token stream from the lexer and are designed to produce identical
11
+ s-expression ASTs. They share no code at runtime — they are completely
12
+ independent implementations derived from the same grammar specification.
13
+
14
+ **Test parity:** the Lunar-generated parser passes 1,162 of 1,182 tests (98.3%); 20 test files pass at 100%.
15
+
16
+ ---
17
+
18
+ ## Files
19
+
20
+ | File | Lines | Purpose |
21
+ |------|-------|---------|
22
+ | `grammar.rip` | 944 | Grammar specification — defines all syntax rules |
23
+ | `solar.rip` | 926 | SLR(1) parser generator — produces table-driven parsers |
24
+ | `lunar.rip` | 2,412 | Predictive recursive descent generator — produces PRD parsers |
25
+
26
+ ---
27
+
28
+ ## Grammar (`grammar.rip`)
29
+
30
+ The grammar defines Rip's syntax as production rules with semantic actions.
31
+ Each rule maps a pattern of tokens and nonterminals to an s-expression; a number in an action refers to the 1-based position of a symbol in the pattern:
32
+
33
+ ```coffee
34
+ # Assignment: Assignable = Expression → ["=", target, value]
35
+ Assign: [
36
+ o 'Assignable = Expression' , '["=", 1, 3]'
37
+ o 'Assignable = INDENT Expression OUTDENT' , '["=", 1, 4]'
38
+ ]
39
+
40
+ # If/else: builds nested s-expression nodes
41
+ IfBlock: [
42
+ o 'IF Expression Block' , '["if", 2, 3]'
43
+ o 'IfBlock ELSE IF Expression Block' , '...' # left-recursive chain
44
+ ]
45
+
46
+ # Binary operators: Expression OP Expression with precedence
47
+ Operation: [
48
+ o 'Expression + Expression' , '["+", 1, 3]'
49
+ o 'Expression MATH Expression' , '[2, 1, 3]'
50
+ o 'Expression ** Expression' , '["**", 1, 3]'
51
+ ]
52
+ ```
53
+
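To make the action notation concrete, here is a minimal JavaScript sketch of how a template such as `["=", 1, 3]` could be filled in from the values matched by a rule. The helper name `applyAction` is hypothetical and is not taken from `solar.rip`; the sketch also assumes every number in a template is a position, never a literal.

```javascript
// Hypothetical helper: instantiate an action template against the values
// matched by one rule. Numbers are 1-based symbol positions; anything
// else (such as the string "=") is copied through as a literal.
function applyAction(template, matched) {
  return template.map((slot) =>
    typeof slot === "number" ? matched[slot - 1] : slot
  );
}

// 'Assignable = Expression' with action ["=", 1, 3]
const assign = applyAction(["=", 1, 3], ["x", "=", 42]);

// 'Expression MATH Expression' with action [2, 1, 3]:
// position 2 is the operator token itself.
const math = applyAction([2, 1, 3], ["a", "*", "b"]);

console.log(JSON.stringify(assign)); // ["=","x",42]
console.log(JSON.stringify(math)); // ["*","a","b"]
```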
54
+ ---
55
+
56
+ ## Solar (`solar.rip`) — SLR(1) Table Parser Generator
57
+
58
+ Solar processes the grammar through a classic compiler-theory pipeline:
59
+
60
+ ```
61
+ Grammar → Process Rules → Build LR Automaton → Compute FIRST/FOLLOW
62
+ → Build Parse Table → Resolve Conflicts → Generate Code
63
+ ```
64
+
65
+ ### What it produces
66
+
67
+ A table-driven parser where every parsing decision is a lookup:
68
+ `parseTable[state][symbol]` → shift, reduce, or accept. The parse table
69
+ encodes 801 states, with all transitions delta-compressed to reduce output size.
70
+
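The lookup-driven loop can be illustrated with a toy grammar. The sketch below is not Solar's generated code: the five-state table, the rule objects, and the token shape are assumptions made for illustration, built by hand for the grammar `E → E + n | n`.

```javascript
// Toy SLR(1) table for:  1) E → E + n   2) E → n
// Each entry is [kind, argument]; "$" marks end of input.
const rules = {
  1: { lhs: "E", length: 3, build: (v) => ["+", v[0], v[2]] },
  2: { lhs: "E", length: 1, build: (v) => v[0] },
};
const parseTable = [
  { n: ["shift", 2], E: ["goto", 1] },
  { "+": ["shift", 3], $: ["accept"] },
  { "+": ["reduce", 2], $: ["reduce", 2] },
  { n: ["shift", 4] },
  { "+": ["reduce", 1], $: ["reduce", 1] },
];

function parse(tokens) {
  const input = [...tokens, { type: "$" }];
  const states = [0]; // state stack
  const values = []; // semantic value stack
  let pos = 0;
  while (true) {
    const state = states[states.length - 1];
    const token = input[pos];
    const entry = parseTable[state][token.type];
    if (!entry) throw new Error(`Unexpected ${token.type} in state ${state}`);
    const [kind, arg] = entry;
    if (kind === "shift") {
      states.push(arg);
      values.push(token.value);
      pos++;
    } else if (kind === "reduce") {
      const rule = rules[arg];
      const popped = values.splice(values.length - rule.length, rule.length);
      states.length -= rule.length; // pop one state per symbol
      values.push(rule.build(popped));
      const [, next] = parseTable[states[states.length - 1]][rule.lhs];
      states.push(next);
    } else {
      return values[0]; // accept
    }
  }
}

const n = (value) => ({ type: "n", value });
const plus = { type: "+", value: "+" };
const ast = parse([n(1), plus, n(2), plus, n(3)]);
console.log(JSON.stringify(ast)); // ["+",["+",1,2],3]
```

Every decision in the loop is a single `parseTable[state][symbol]` lookup, which is also why error messages from this style of parser tend to mention state numbers rather than rule names.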
71
+ ### How to use
72
+
73
+ ```bash
74
+ bun src/grammar/solar.rip grammar.rip # → parser.js (SLR table)
75
+ bun src/grammar/solar.rip -r grammar.rip # → parser-rd.js (PRD via Lunar)
76
+ bun src/grammar/solar.rip --info grammar.rip # Show grammar statistics
77
+ bun src/grammar/solar.rip --sexpr grammar.rip # Show grammar as s-expression
78
+ ```
79
+
80
+ ### Integration with Lunar
81
+
82
+ Solar imports Lunar and installs it with a single call:
83
+
84
+ ```coffee
85
+ import { install as installLunar } from './lunar.rip'
86
+ # ... (Generator class definition) ...
87
+ installLunar Generator
88
+ ```
89
+
90
+ This adds `generateRD()` to the Generator prototype. When `-r` is passed,
91
+ Solar calls `generator.generateRD()` instead of `generator.generate()`.
92
+
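The install pattern described above can be sketched in plain JavaScript. Only the names `install`, `Generator`, `generate()`, `generateRD()`, and the `-r` flag come from this document; the method bodies and the `run` helper are placeholders assumed for illustration.

```javascript
// Stand-in for Solar's Generator class (body is a placeholder).
class Generator {
  generate() {
    return "table-driven parser source";
  }
}

// Stand-in for Lunar's exported install(): it extends the prototype
// so every Generator instance gains generateRD().
function install(GeneratorClass) {
  GeneratorClass.prototype.generateRD = function generateRD() {
    return "recursive descent parser source";
  };
}

install(Generator);

// Hypothetical command-line dispatch: -r selects the Lunar backend.
function run(args) {
  const generator = new Generator();
  return args.includes("-r") ? generator.generateRD() : generator.generate();
}

console.log(run(["-r", "grammar.rip"])); // recursive descent parser source
console.log(run(["grammar.rip"])); // table-driven parser source
```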
93
+ ---
94
+
95
+ ## Lunar (`lunar.rip`) — Predictive Recursive Descent Generator
96
+
97
+ Lunar analyzes the same grammar that Solar processes and generates a
98
+ hand-rolled-looking recursive descent parser with Pratt expression parsing.
99
+
100
+ ### Architecture
101
+
102
+ The generated parser has five layers:
103
+
104
+ 1. **Token management** — `advance()`, `expect()`, `match()`, `loc()`, `withLoc()`
105
+ 2. **Speculation** — `mark()`, `reset()`, `speculate()` for backtracking
106
+ 3. **Pratt expression parser** — `parseExpression(minBP)` with binding powers
107
+ 4. **Nonterminal functions** — one `parseX()` function per grammar nonterminal
108
+ 5. **Parser shell** — same API as the table parser (`parser.parse()`, exports)
109
+
110
+ ### How it works
111
+
112
+ Lunar derives everything from the grammar — no hardcoded token or nonterminal
113
+ names in the core generators:
114
+
115
+ **Expression analysis** (`_analyzeExpressionRules`) — walks the grammar to detect:
116
+ - Which nonterminal is the "expression" (contains the operation nonterminal as an alternative)
117
+ - Which is the "operation" (has the most `NT OP NT` binary rules with precedence)
118
+ - Which is the "value" (pure choice nonterminal with the most atom alternatives)
119
+ - Which is "code" (FIRST set contains `->` or `=>`)
120
+ - Assignment operators (nonterminals where all rules are `LHS TOKEN RHS`)
121
+ - Prefix starters (keyword tokens that begin expression alternatives)
122
+ - Postfix chains (left-recursive property/index/call rules through Value chain)
123
+ - Atom types (terminals reachable through the Value nonterminal chain)
124
+ - Expression-handled tokens (for skipping redundant choice alternatives)
125
+
126
+ **Pratt parser generation** (`_generateRDExpression`) — builds the while loop:
127
+ - Prefix starters dispatch to keyword-led parsers (IF→parseIf, FOR→parseFor, etc.)
128
+ - Assignment operators checked at binding power 0
129
+ - Postfix operators from Operation rules (with INDENT/TERMINATOR variants)
130
+ - Postfix chains from property/index/call rules (resolved to FIRST sets)
131
+ - Infix binary operators with correct associativity and control-flow merging
132
+ - Ternary operators
133
+ - Postfix if/unless/while/until and comprehensions
134
+ - Statement tokens (RETURN, STATEMENT) enter the Pratt loop for postfix patterns
135
+
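For readers unfamiliar with Pratt parsing, the core of a `parseExpression(minBP)` loop looks roughly like the sketch below. It covers only atoms and infix operators; the binding-power numbers and the flat token array are assumptions for illustration, not values taken from Lunar's output.

```javascript
// Assumed binding powers as [left, right]. A right power lower than the
// left power makes the operator right-associative.
const INFIX = {
  "+": [10, 11],
  "*": [20, 21],
  "**": [31, 30],
};

function parse(tokens) {
  let pos = 0;

  function parseExpression(minBP) {
    let left = tokens[pos++]; // atoms only in this sketch
    if (typeof left !== "number") throw new Error("Expected a number");
    while (pos < tokens.length) {
      const op = tokens[pos];
      const power = INFIX[op];
      if (!power || power[0] < minBP) break; // operator binds too loosely
      pos++;
      const right = parseExpression(power[1]);
      left = [op, left, right];
    }
    return left;
  }

  const result = parseExpression(0);
  if (pos !== tokens.length) throw new Error("Unexpected trailing input");
  return result;
}

console.log(JSON.stringify(parse([1, "+", 2, "*", 3]))); // ["+",1,["*",2,3]]
console.log(JSON.stringify(parse([2, "**", 3, "**", 2]))); // ["**",2,["**",3,2]]
```

The prefix starters, assignment checks, postfix chains, and ternaries listed above would appear as extra branches before and inside the same loop.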
136
+ **Nonterminal classification** (`_classifyNonterminal`) — detects patterns:
137
+
138
+ | Pattern | Detection | Example |
139
+ |---------|-----------|---------|
140
+ | `root` | Grammar start symbol | Root |
141
+ | `body-list` | Left-recursive with TERMINATOR | Body, ComponentBody |
142
+ | `comma-list` | Left-recursive with `,` | ArgList, ParamList |
143
+ | `concat-list` | Left-recursive, no separator | Interpolations, Whens |
144
+ | `left-rec-loop` | Self-referential with terminal continuation | IfBlock |
145
+ | `expression` | Contains the operation nonterminal | Expression |
146
+ | `operation` | Has binary operator rules | Operation |
147
+ | `token` | Single rule, single terminal | Identifier, Property |
148
+ | `keyword` | All rules start with unique terminals | Return, Def, Enum |
149
+ | `choice` | All rules are single-nonterminal passthroughs | Value, Line, Statement |
150
+ | `sequence` | Everything else | Assign, Catch, Block |
151
+
152
+ **Shared prefix disambiguation** (`_generateRDSharedPrefix`) — handles rules
153
+ that share a common beginning:
154
+ - Optional chain detection with rule-length-based action dispatch
155
+ - Same-token grouping with deeper disambiguation
156
+ - Terminal and nonterminal suffix separation with FIRST set grouping
157
+ - Empty-rule-as-default when nonterminal suffixes have FIRST checks
158
+
159
+ **Speculation** — `mark()`/`reset()`/`speculate()` for grammar ambiguities:
160
+ - Range vs Array: `[1..10]` vs `[1, 2, 3]` — try Range first
161
+ - Slice vs Expression in INDEX_START: `arr[1..3]` vs `arr[0]`
162
+ - Range vs destructuring in For: `for [1..5]` vs `for [a, b] as iter`
163
+ - Terminal/nonterminal overlap: `...` as Splat vs expansion marker
164
+ - Left-rec-loop lookahead: `ELSE IF` vs `ELSE Block` in IfBlock
165
+
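A minimal sketch of the speculation layer, using the Range vs Array case as the example. The function names `mark()`, `reset()`, `speculate()`, and `expect()` appear in this document; their bodies, the token shape, and the `"array"` node tag are assumptions for illustration.

```javascript
function makeParser(tokens) {
  let pos = 0;
  const mark = () => pos;
  const reset = (saved) => {
    pos = saved;
  };

  function expect(type) {
    const token = tokens[pos];
    if (!token || token.type !== type) {
      throw new SyntaxError(`Expected ${type} at token ${pos}`);
    }
    pos++;
    return token.value;
  }

  // Try one alternative; on a syntax error, rewind and report failure.
  function speculate(attempt) {
    const saved = mark();
    try {
      return { ok: true, value: attempt() };
    } catch (err) {
      if (!(err instanceof SyntaxError)) throw err;
      reset(saved);
      return { ok: false };
    }
  }

  function parseRange() {
    expect("[");
    const from = expect("NUMBER");
    expect("..");
    const to = expect("NUMBER");
    expect("]");
    return ["..", from, to];
  }

  function parseArray() {
    expect("[");
    const items = [expect("NUMBER")];
    while (tokens[pos] && tokens[pos].type === ",") {
      pos++;
      items.push(expect("NUMBER"));
    }
    expect("]");
    return ["array", ...items];
  }

  // Range first, then Array, as in the first bullet above.
  function parseBracket() {
    const range = speculate(parseRange);
    return range.ok ? range.value : parseArray();
  }

  return { parseBracket };
}

const t = (type, value) => ({ type, value });
const rangeTokens = [t("["), t("NUMBER", 1), t(".."), t("NUMBER", 10), t("]")];
const arrayTokens = [
  t("["), t("NUMBER", 1), t(","), t("NUMBER", 2), t(","), t("NUMBER", 3), t("]"),
];
const rangeAst = makeParser(rangeTokens).parseBracket();
const arrayAst = makeParser(arrayTokens).parseBracket();
console.log(JSON.stringify(rangeAst)); // ["..",1,10]
console.log(JSON.stringify(arrayAst)); // ["array",1,2,3]
```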
166
+ ### Three specialized generators
167
+
168
+ Three nonterminals (out of 93) have patterns that require multi-token lookahead
169
+ and can't be handled by the generic generators:
170
+
171
+ - **For** (34 rules) — FORIN/FOROF/FORAS variants with optional WHEN/BY in both orders
172
+ - **Object** (5 rules) — comprehension vs regular object determined after parsing key:value
173
+ - **AssignObj** (6 rules) — key:value vs key=default vs shorthand vs rest after parsing key
174
+
175
+ 90 of 93 nonterminals (96.8%) are generated purely from grammar analysis.
176
+
177
+ ### Remaining 20 test failures (98.3% → 100%)
178
+
179
+ The remaining failures cluster into a few fixable categories:
180
+
181
+ | Category | Tests | Root Cause |
182
+ |----------|-------|------------|
183
+ | **Array elisions** | 5 | `[,1]`, `[1,,3]` — ArgElisionList/Elision not parsed |
184
+ | **Trailing comma** | 1 | `[1,2,]` — OptElisions at end of array |
185
+ | **Export/Import edge** | 3 | `export { x }` without FROM, `export x ~= ...`, `import x, * as m` — deeper shared-prefix disambiguation generates duplicate conditions |
186
+ | **Semicolons context** | 3 | `def foo(); 42` — Def/async without block body (CALL_END before Block) |
187
+ | **Class patterns** | 2 | Class expression without name, `@bar:` static in class body |
188
+ | **Postfix ternary** | 1 | `a = x if true else 0` — assignment context for postfix ternary |
189
+ | **Other edge cases** | 5 | `invalid extends`, `array destructuring skip`, type alias, typed runtime |
190
+
191
+ **Key fix needed for Export/Import:** The shared-prefix handler generates duplicate
192
+ `else if` branches with identical conditions when two rules share a common prefix
193
+ through nonterminals but diverge at a terminal after parsing (e.g., `} FROM` vs `}`).
194
+ The inner terminal group disambiguator needs to check terminal continuation instead
195
+ of generating separate branches.
196
+
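The intended shape of that fix can be sketched as follows: parse the shared prefix once, then decide with a single check of the next terminal. This is an illustrative hand-written parser, not Lunar's generated output; the token types and the `"export"` / `"export-from"` node tags are hypothetical.

```javascript
function parseExport(tokens) {
  let pos = 0;
  const peekType = () => (tokens[pos] ? tokens[pos].type : "EOF");

  function expect(type) {
    const token = tokens[pos];
    if (!token || token.type !== type) {
      throw new SyntaxError(`Expected ${type} at token ${pos}`);
    }
    pos++;
    return token.value;
  }

  // Shared prefix: EXPORT { name, name, ... }
  expect("EXPORT");
  expect("{");
  const names = [expect("IDENTIFIER")];
  while (peekType() === ",") {
    pos++;
    names.push(expect("IDENTIFIER"));
  }
  expect("}");

  // One terminal-continuation check replaces two branches that would
  // otherwise carry the same leading condition.
  if (peekType() === "FROM") {
    pos++;
    return ["export-from", names, expect("STRING")];
  }
  return ["export", names];
}

const t = (type, value) => ({ type, value });
const plain = parseExport([t("EXPORT"), t("{"), t("IDENTIFIER", "x"), t("}")]);
const withFrom = parseExport([
  t("EXPORT"), t("{"), t("IDENTIFIER", "x"), t("}"), t("FROM"), t("STRING", "./m"),
]);
console.log(JSON.stringify(plain)); // ["export",["x"]]
console.log(JSON.stringify(withFrom)); // ["export-from",["x"],"./m"]
```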
197
+ **Key fix needed for elisions:** The Array parser routes `[` to Range speculation,
198
+ then to Array. Array uses ArgElisionList, which calls Arg → Expression. But leading
199
+ commas `[,1]` and sparse arrays `[1,,3]` need the Elision nonterminal, which the generic
200
+ comma-list handler doesn't generate.
201
+
202
+ ---
203
+
204
+ ## Comparison
205
+
206
+ | Aspect | Solar (SLR) | Lunar (PRD) |
207
+ |--------|-------------|-------------|
208
+ | Strategy | Bottom-up table lookup | Top-down predictive |
209
+ | Output size | 215KB (encoded tables) | 110KB (readable code) |
210
+ | Startup | Decode table on load | Zero initialization |
211
+ | Debugging | "State 437" errors | Named function call stacks |
212
+ | Correctness | Mathematically derived | Grammar-derived + 3 specializations |
213
+ | Test parity | 1,235/1,235 (100%) | 1,162/1,182 (98.3%) |
214
+ | Speed | O(n) tight loop | O(n) function calls, plus backtracking where speculation is used |
215
+ | Expressions | Shift/reduce with precedence | Pratt with binding powers |
216
+
217
+ Both produce identical s-expressions for 98.3% of the test suite. Having two
218
+ independent implementations from the same grammar provides cross-validation.
219
+
220
+ ---
221
+
222
+ ## The Innovation
223
+
224
+ Most parser generators commit to one strategy: Yacc produces LALR tables,
225
+ ANTLR produces LL recursive descent, PEG.js produces PEG parsers. Solar and
226
+ Lunar generate **both** from a single grammar specification.
227
+
228
+ The key insight: the FIRST/FOLLOW sets and operator precedence table that
229
+ Solar computes for SLR(1) contain nearly all the information a Pratt-based recursive
230
+ descent parser needs. The data is the same — it's just read differently.
231
+ Solar reads it as table entries. Lunar reads it as dispatch conditions and
232
+ binding powers.
233
+
234
+ Same grammar. Same semantics. Two fundamentally different parsers.