rip-lang 2.9.1 → 3.0.0

@@ -0,0 +1,576 @@
1
+ # Rip Compiler Internals
2
+
3
+ > Architecture, design decisions, and technical reference for the Rip compiler.
4
+
5
+ ---
6
+
7
+ ## Table of Contents
8
+
9
+ 1. [Why Rip](#1-why-rip)
10
+ 2. [Architecture](#2-architecture)
11
+ 3. [S-Expressions](#3-s-expressions)
12
+ 4. [Lexer & Rewriter](#4-lexer--rewriter)
13
+ 5. [Code Generation](#5-code-generation)
14
+ 6. [Compiler](#6-compiler)
15
+ 7. [Solar Parser Generator](#7-solar-parser-generator)
16
+ 8. [Debug Tools](#8-debug-tools)
17
+ 9. [Future Work](#9-future-work)
18
+
19
+ ---
20
+
21
+ # 1. Why Rip
22
+
23
+ ## The Short Version
24
+
25
+ 1. **Simplicity scales** — S-expressions make compilers 50% smaller and 10x easier to maintain
26
+ 2. **Zero dependencies** — True autonomy from the npm ecosystem
27
+ 3. **Modern output** — ES2022 everywhere, no legacy baggage
28
+ 4. **Reactivity as operators** — `:=`, `~=`, `~>` are language syntax, not library imports
29
+ 5. **Self-hosting** — Rip compiles itself, including its own parser generator
30
+
31
+ ## Why S-Expressions
32
+
33
+ Most compilers use complex AST node classes. Rip uses **simple arrays**:
34
+
35
+ ```javascript
36
+ // Traditional AST (CoffeeScript, TypeScript, Babel)
37
+ class BinaryOp {
38
+   constructor(op, left, right) { ... }
39
+   compile() { /* 50+ lines */ }
40
+ }
41
+
42
+ // Rip's S-Expression
43
+ ["+", left, right] // That's it!
44
+ ```
45
+
46
+ **Result:** CoffeeScript's compiler is 17,760 LOC. Rip's is ~7,700 LOC — smaller, yet includes a complete reactive runtime.
47
+
48
+ > **Transform the IR (s-expressions), not the output (strings).**
49
+
50
+ This single principle eliminates entire categories of bugs. When your IR is simple data (arrays), transformations are trivial and debuggable:
51
+
52
+ ```javascript
53
+ // Debugging: inspect the data directly
54
+ console.log(sexpr);
55
+ // ["comprehension", ["*", "x", 2], [["for-in", ["x"], ["array", 1, 2, 3]]], []]
56
+
57
+ // vs. string manipulation
58
+ console.log(code);
59
+ // "(() => {\n const result = [];\n for (const x of arr) {\n..."
60
+ ```
61
+
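As a minimal sketch of what transforming the IR can look like, the pass below folds arithmetic on numeric literals before any code is generated. `foldConstants` and `FOLDABLE` are hypothetical names used only for illustration; they are not taken from the Rip source.

```javascript
// Hypothetical pass, not part of Rip: fold arithmetic on numeric literals.
const FOLDABLE = new Map([
  ['+', (a, b) => a + b],
  ['-', (a, b) => a - b],
  ['*', (a, b) => a * b],
]);

function foldConstants(sexpr) {
  if (!Array.isArray(sexpr)) return sexpr;            // leaf: identifier or literal
  const [head, ...rest] = sexpr.map((child) => foldConstants(child)); // children first
  const fold = FOLDABLE.get(head);
  if (fold && rest.length === 2 && rest.every((n) => typeof n === 'number')) {
    return fold(rest[0], rest[1]);                    // replace the node with its value
  }
  return [head, ...rest];
}

console.log(JSON.stringify(foldConstants(['=', 'x', ['+', 2, ['*', 3, 4]]])));
// prints ["=","x",14]
```

Because the input and output are both plain arrays, the pass can be checked with `JSON.stringify` alone, without running a code generator.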
62
+ ## Rip vs CoffeeScript
63
+
64
+ | Feature | CoffeeScript | Rip |
65
+ |---------|-------------|------|
66
+ | Optional chaining | 4 soak operators | ES2020 `?.` / `?.[]` / `?.()` |
67
+ | Ternary | No | `x ? a : b` |
68
+ | Regex features | Basic | Ruby-style (`=~`, indexing, captures in `_`) |
69
+ | Async shorthand | No | Dammit operator (`!`) |
70
+ | Void functions | No | `def fn!` |
71
+ | Reactivity | None | `:=`, `~=`, `~>` |
72
+ | Comprehensions | Always IIFE | Context-aware |
73
+ | Modules | CommonJS | ES6 |
74
+ | Classes | ES5 | ES6 |
75
+ | Dependencies | Multiple | **Zero** |
76
+ | Parser generator | External (Jison) | **Built-in (Solar)** |
77
+ | Self-hosting | No | **Yes** |
78
+ | Total LOC | 17,760 | ~7,700 |
79
+
80
+ ## Design Principles
81
+
82
+ - **Simplicity scales** — Simple IR, clear pipeline, minimal code, comprehensive tests
83
+ - **Zero dependencies is a feature** — No supply chain attacks, no version conflicts, no `node_modules` bloat
84
+ - **Self-hosting proves quality** — Rip compiles its own parser generator; if it can compile itself, it works
85
+
86
+ ---
87
+
88
+ # 2. Architecture
89
+
90
+ ## The Pipeline
91
+
92
+ ```
93
+ Source Code  →  Lexer  →  Parser  →  S-Expressions  →  Codegen  →  JavaScript
94
+                (1,542)     (352)    (simple arrays)    (3,148)      (ES2022)
95
+ ```
96
+
97
+ ## Key Files
98
+
99
+ | File | Purpose | Lines | Modify? |
100
+ |------|---------|-------|---------|
101
+ | `src/lexer.js` | Lexer + Rewriter | 1,542 | Yes |
102
+ | `src/compiler.js` | Compiler + Code Generator | 3,148 | Yes |
103
+ | `src/parser.js` | Generated parser | 352 | No (auto-gen) |
104
+ | `src/grammar/grammar.rip` | Grammar specification | 887 | Yes (carefully) |
105
+ | `src/grammar/solar.rip` | Parser generator | 1,001 | No |
106
+
107
+ ## Example Flow
108
+
109
+ ```coffee
110
+ # Input
111
+ x = 42
112
+
113
+ # Tokens (from lexer)
114
+ [["IDENTIFIER", "x"], ["=", "="], ["NUMBER", "42"]]
115
+
116
+ # S-Expression (from parser)
117
+ ["program", ["=", "x", 42]]
118
+
119
+ # Generated Code (from codegen)
120
+ "x = 42;"
121
+ ```
122
+
123
+ ---
124
+
125
+ # 3. S-Expressions
126
+
127
+ S-expressions are simple arrays that serve as Rip's intermediate representation (IR). Each has a **head** (string identifying node type) and **rest** (arguments/children).
128
+
129
+ ## Complete Node Type Reference
130
+
131
+ ### Top Level
132
+ ```javascript
133
+ ['program', ...statements]
134
+ ```
135
+
136
+ ### Variables & Assignment
137
+ ```javascript
138
+ ['=', target, value]
139
+ ['+=', target, value] // And all compound assigns: -=, *=, /=, %=, **=
140
+ ['&&=', target, value]
141
+ ['||=', target, value]
142
+ ['?=', target, value] // Maps to ??=
143
+ ['??=', target, value]
144
+ ```
145
+
146
+ ### Functions
147
+ ```javascript
148
+ ['def', name, params, body] // Named function
149
+ ['->', params, body] // Thin arrow (unbound this)
150
+ ['=>', params, body] // Fat arrow (bound this)
151
+
152
+ // Parameters can be:
153
+ 'name' // Simple param
154
+ ['rest', 'name'] // Rest: ...name
155
+ ['default', 'name', expr] // Default: name = expr
156
+ ['expansion'] // Expansion marker: (a, ..., b)
157
+ ['object', ...] // Object destructuring
158
+ ['array', ...] // Array destructuring
159
+ ```
160
+
161
+ ### Calls & Property Access
162
+ ```javascript
163
+ [callee, ...args] // Function call
164
+ ['await', expr] // Await
165
+ ['.', obj, 'prop'] // Property: obj.prop
166
+ ['?.', obj, 'prop'] // Optional: obj?.prop
167
+ ['[]', arr, index] // Index: arr[index]
168
+ ['optindex', arr, index] // Optional: arr?.[index]
169
+ ['optcall', fn, ...args] // Optional: fn?.(args)
170
+ ['new', constructorExpr] // Constructor
171
+ ['super', ...args] // Super call
172
+ ['tagged-template', tag, str] // Tagged template
173
+ ```
174
+
175
+ ### Data Structures
176
+ ```javascript
177
+ ['array', ...elements] // Array literal
178
+ ['object', ...pairs] // Object literal (pairs: [key, value])
179
+ ['...', expr] // Spread (prefix only)
180
+ ```
181
+
182
+ ### Operators
183
+ ```javascript
184
+ // Arithmetic
185
+ ['+', left, right] ['-', left, right] ['*', left, right]
186
+ ['/', left, right] ['%', left, right] ['**', left, right]
187
+
188
+ // Comparison (== compiles to ===)
189
+ ['==', left, right] ['!=', left, right]
190
+ ['<', left, right] ['<=', left, right]
191
+ ['>', left, right] ['>=', left, right]
192
+
193
+ // Logical
194
+ ['&&', left, right] ['||', left, right] ['??', left, right]
195
+
196
+ // Unary
197
+ ['!', expr] ['~', expr] ['-', expr]
198
+ ['+', expr] ['typeof', expr] ['delete', expr]
199
+ ['++', expr, isPostfix] ['--', expr, isPostfix]
200
+
201
+ // Special
202
+ ['instanceof', expr, type]
203
+ ['?', expr] // Existence check
204
+ ```
205
+
206
+ ### Control Flow
207
+ ```javascript
208
+ ['if', condition, thenBlock, elseBlock?]
209
+ ['unless', condition, body]
210
+ ['?:', condition, thenExpr, elseExpr] // Ternary
211
+ ['switch', discriminant, cases, defaultCase?]
212
+ ```
213
+
214
+ ### Loops
215
+ ```javascript
216
+ ['for-in', vars, iterable, step?, guard?, body]
217
+ ['for-of', vars, object, guard?, body]
218
+ ['for-as', vars, iterable, async?, guard?, body] // ES6 for-of on iterables
219
+ ['while', condition, body]
220
+ ['until', condition, body]
221
+ ['loop', body]
222
+ ['break'] ['continue']
223
+ ['break-if', condition] ['continue-if', condition]
224
+ ```
225
+
226
+ ### Comprehensions
227
+ ```javascript
228
+ ['comprehension', expr, iterators, guards]
229
+ ['object-comprehension', keyExpr, valueExpr, iterators, guards]
230
+ ```
231
+
232
+ ### Exceptions
233
+ ```javascript
234
+ ['try', tryBlock, [catchParam, catchBlock]?, finallyBlock?]
235
+ ['throw', expr]
236
+ ```
237
+
238
+ ### Classes
239
+ ```javascript
240
+ ['class', name, parent?, ...members]
241
+ ```
242
+
243
+ ### Ranges & Slicing
244
+ ```javascript
245
+ ['..', from, to] // Inclusive range
246
+ ['...', from, to] // Exclusive range
247
+ ```
248
+
249
+ ### Blocks & Modules
250
+ ```javascript
251
+ ['block', ...statements] // Multiple statements
252
+ ['do-iife', expr] // Do expression (IIFE)
253
+ ['import', specifiers, source]
254
+ ['export', statement]
255
+ ['export-default', expr]
256
+ ['export-all', source]
257
+ ['export-from', specifiers, source]
258
+ ```
259
+
260
+ ---
261
+
262
+ # 4. Lexer & Rewriter
263
+
264
+ The lexer (`src/lexer.js`) is a clean reimplementation that replaces the old lexer (3,260 lines) with ~1,550 lines producing the same token vocabulary the parser expects.
265
+
266
+ ## Architecture
267
+
268
+ - **9 tokenizers** in priority order: identifier, comment, whitespace, line, string, number, regex, js, literal
269
+ - **7 rewriter passes**: removeLeadingNewlines, closeOpenCalls, closeOpenIndexes, normalizeLines, tagPostfixConditionals, addImplicitBracesAndParens, addImplicitCallCommas
270
+ - **Token format**: `[tag, val]` array with `.pre`, `.data`, `.loc`, `.spaced`, `.newLine` properties
271
+
272
+ ## Token Properties
273
+
274
+ | Property | Type | Purpose |
275
+ |----------|------|---------|
276
+ | `.pre` | number | Whitespace count before this token |
277
+ | `.data` | object/null | Metadata: `{await, predicate, quote, invert, parsedValue, ...}` |
278
+ | `.loc` | `{r, c, n}` | Row, column, length |
279
+ | `.spaced` | boolean | Sugar for `.pre > 0` |
280
+ | `.newLine` | boolean | Preceded by a newline |
281
+
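The token shape in the table can be modeled as an array with extra properties attached. The sketch below assumes a hypothetical `makeToken` helper and made-up location values; the real lexer may build tokens differently.

```javascript
// Hypothetical constructor; property names follow the Token Properties table.
function makeToken(tag, val, { pre = 0, data = null, loc = null, newLine = false } = {}) {
  const token = [tag, val];
  token.pre = pre;          // whitespace count before the token
  token.data = data;        // metadata object or null
  token.loc = loc;          // { r, c, n }
  token.spaced = pre > 0;   // derived from .pre
  token.newLine = newLine;
  return token;
}

const tok = makeToken('IDENTIFIER', 'x', { pre: 1, loc: { r: 0, c: 4, n: 1 } });
const [tag, val] = tok;             // still destructures like a plain pair
console.log(tag, val, tok.spaced);  // prints IDENTIFIER x true
```

Keeping the token an array means code that only needs `[tag, val]` can ignore the metadata entirely.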
282
+ ## Identifier Suffixes
283
+
284
+ | Suffix | Data flag | Meaning | JS output |
285
+ |--------|-----------|---------|-----------|
286
+ | `!` | `.data.await = true` | Dammit operator | `await` + base name |
287
+ | `?` | `.data.predicate = true` | Existence check | `(expr != null)` |
288
+
289
+ The `?` suffix is captured only when NOT followed by `.`, `?`, `[`, or `(` — so `?.` (optional chaining), `??` (nullish coalescing), `?.()`, and `?.[i]` remain unambiguous.
290
+
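One way to express that rule is a negative lookahead. The pattern and the `scanIdentifier` helper below are assumptions made for illustration; the lexer's actual regex may differ.

```javascript
// Assumed pattern: an identifier, then an optional `?` that is NOT followed
// by `.`, `?`, `[`, or `(`.
const IDENT_WITH_PREDICATE = /^([A-Za-z_$][\w$]*)(\?(?![.?\[(]))?/;

function scanIdentifier(source) {
  const match = IDENT_WITH_PREDICATE.exec(source);
  if (!match) return null;
  return {
    name: match[1],
    predicate: match[2] !== undefined,  // true only when the `?` suffix was consumed
    length: match[0].length,
  };
}

console.log(scanIdentifier('ready? and go').predicate); // prints true
console.log(scanIdentifier('user?.name').predicate);    // prints false
console.log(scanIdentifier('a ?? b').predicate);        // prints false
```

When the lookahead rejects the `?`, the match stops at the bare identifier, which leaves `?.` and `??` for a later tokenizer to consume.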
291
+ The `!` suffix on `as` in for-loops (`as!`) emits `FORASAWAIT` instead of `FORAS`, enabling `for x as! iterable` as shorthand for `for await x as iterable`.
292
+
293
+ ## Language Changes (3.0 Rewrite)
294
+
295
+ ### Removed
296
+
297
+ | Feature | Old syntax | Replacement |
298
+ |---------|-----------|-------------|
299
+ | Postfix spread/rest | `x...` | `...x` (ES6 prefix only) |
300
+ | Prototype access | `x::y`, `x?::y` | Direct `.prototype` or class syntax; `::` reserved for type annotations |
301
+ | Soak call sugar | `x?(args)` | `x?.(args)` (ES2020 optional call) |
302
+ | Soak index sugar | `x?[i]` | `x?.[i]` (ES2020 optional index) |
303
+ | `is not` contraction | `x is not y` | `x isnt y` |
304
+
305
+ ### Added
306
+
307
+ | Feature | Syntax | Purpose |
308
+ |---------|--------|---------|
309
+ | `for...as` iteration | `for x as iter` | ES6 `for...of` on iterables (replaces `for x from iter`) |
310
+ | `as!` async shorthand | `for x as! iter` | Shorthand for `for await x as iter` |
311
+
312
+ ### Changed
313
+
314
+ | Item | Old | New |
315
+ |------|-----|-----|
316
+ | Location data | `locationData` (object) | `.loc = {r, c, n}` |
317
+ | `for...from` keyword | `FORFROM` | `FORAS` |
318
+ | Token metadata | `new String(val)` with props | `.data` object on token |
319
+ | Category arrays | `Array` + `indexOf` | `Set` + `.has()` |
320
+ | Variable style | `const`/`let` mix | All `let` |
321
+ | Rewriter passes | 13 | 7 |
322
+
323
+ ### Preserved
324
+
325
+ All 9 tokenizer methods, full token vocabulary, implicit call/object/brace detection, string interpolation with recursive sub-lexing, heredoc indent processing, arrow function parameter tagging, `do` IIFE support, `for own x of obj`, all reactive operators (`:=`, `~=`, `~>`, `=!`), all Rip aliases (`and`, `or`, `is`, `isnt`, `not`, `yes`, `no`, `on`, `off`).
326
+
327
+ ---
328
+
329
+ # 5. Code Generation
330
+
331
+ The compiler (`src/compiler.js`) transforms s-expressions into JavaScript. The `CodeGenerator` class is a dispatch table — s-expression heads map to generator methods.
332
+
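The dispatch-table idea can be sketched in a few lines. `MiniGenerator` below is a toy written for this document, covering only three node types; it is not Rip's `CodeGenerator`, and its method names are illustrative.

```javascript
// Toy generator for illustration only; heads map to method names.
class MiniGenerator {
  static GENERATORS = new Map([
    ['program', 'generateProgram'],
    ['=', 'generateAssign'],
    ['+', 'generateBinary'],
    ['*', 'generateBinary'],
  ]);

  generate(sexpr) {
    if (!Array.isArray(sexpr)) return String(sexpr);  // identifier or literal
    const [head, ...rest] = sexpr;
    const method = MiniGenerator.GENERATORS.get(head);
    if (!method) throw new Error(`No generator for "${head}"`);
    return this[method](head, rest);
  }

  generateProgram(_head, statements) {
    return statements.map((s) => `${this.generate(s)};`).join('\n');
  }

  generateAssign(_head, [target, value]) {
    return `${this.generate(target)} = ${this.generate(value)}`;
  }

  generateBinary(op, [left, right]) {
    return `(${this.generate(left)} ${op} ${this.generate(right)})`;
  }
}

console.log(new MiniGenerator().generate(['program', ['=', 'x', 42]]));
// prints x = 42;
```

Adding a node type in this style means adding one table entry and one method, with no changes to `generate()` itself.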
333
+ ## Context-Aware Generation
334
+
335
+ Some patterns generate different code based on usage context:
336
+
337
+ ```javascript
338
+ generate(sexpr, context = 'statement') {
339
+   // context can be 'statement' or 'value'
340
+ }
341
+ ```
342
+
343
+ **Comprehensions** are the primary example:
344
+
345
+ ```coffee
346
+ # Statement context (result discarded) → Plain loop
347
+ console.log x for x in arr
348
+
349
+ # Value context (result used) → IIFE with array building
350
+ result = (x * 2 for x in arr)
351
+ ```
352
+
353
+ | Parent Node | Child | Context | Reason |
354
+ |-------------|-------|---------|--------|
355
+ | Assignment | RHS | `'value'` | Value assigned to variable |
356
+ | Call | Arguments | `'value'` | Values passed to function |
357
+ | Return | Expression | `'value'` | Value returned from function |
358
+ | Function | Last statement | `'value'` | Implicit return |
359
+ | Function | Non-last statements | `'statement'` | Result discarded |
360
+ | Loop | Body | `'statement'` | Loops don't return values |
361
+ | If/Unless | Branches | Inherit parent | Pass through context |
362
+ | Array | Elements | `'value'` | Values stored in array |
363
+
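A simplified model of the context switch for comprehensions is sketched below. It assumes a single iterator, no guards, and expression and iterable operands that are already rendered as strings; `generateComprehension` and `generateAssign` here are illustrative stand-ins, not the compiler's methods.

```javascript
function generateComprehension(sexpr, context = 'statement') {
  const [, expr, iterators] = sexpr;
  const [, [item], iterable] = iterators[0];   // e.g. ['for-in', ['x'], 'arr']
  if (context === 'statement') {
    // Result is discarded, so a plain loop is enough.
    return `for (const ${item} of ${iterable}) { ${expr}; }`;
  }
  // Result is used, so collect values inside an IIFE.
  return `(() => { const result = []; for (const ${item} of ${iterable}) { result.push(${expr}); } return result; })()`;
}

function generateAssign([, target, value]) {
  // The parent decides the child's context: an assignment RHS is a value.
  return `${target} = ${generateComprehension(value, 'value')};`;
}

const comp = ['comprehension', 'x * 2', [['for-in', ['x'], 'arr']], []];
console.log(generateComprehension(comp));
// prints for (const x of arr) { x * 2; }
console.log(generateAssign(['=', 'result', comp]).startsWith('result = (() =>'));
// prints true
```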
364
+ ## Variable Scoping
365
+
366
+ CoffeeScript semantics: function-level scoping with closure access.
367
+
368
+ - `collectProgramVariables()` — Walks top-level, stops at functions
369
+ - `collectFunctionVariables()` — Walks function body, stops at nested functions
370
+ - Filters out outer variables (accessed via closure)
371
+ - Emits `let` declarations at scope top
372
+
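The collection steps above can be sketched as a tree walk that stops at function boundaries. This version only looks at plain `=` targets, and `collectVariables` and `emitDeclarations` are hypothetical names, not the compiler's own functions.

```javascript
// Simplified sketch: only simple '=' targets count as declarations.
const FUNCTION_HEADS = new Set(['def', '->', '=>']);

function collectVariables(sexpr, found = new Set()) {
  if (!Array.isArray(sexpr)) return found;
  const [head, ...rest] = sexpr;
  if (FUNCTION_HEADS.has(head)) return found;   // nested scope: stop here
  if (head === '=' && typeof rest[0] === 'string') found.add(rest[0]);
  for (const child of rest) collectVariables(child, found);
  return found;
}

function emitDeclarations(body, outerVariables = new Set()) {
  // Names already declared in an enclosing scope are reached via closure.
  const locals = [...collectVariables(body)].filter((name) => !outerVariables.has(name));
  return locals.length ? `let ${locals.join(', ')};` : '';
}

const body = ['block',
  ['=', 'total', 0],
  ['=', 'count', 1],
  ['=', 'helper', ['->', [], ['=', 'inner', 2]]],
];
console.log(emitDeclarations(body, new Set(['count'])));
// prints let total, helper;
```

`inner` is skipped because it belongs to the nested function, and `count` is skipped because the outer scope already owns it.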
373
+ ## Auto-Detection
374
+
375
+ Functions automatically become async or generators:
376
+
377
+ ```coffee
378
+ # Contains await or dammit → becomes async
379
+ def loadData(id)
380
+   user = getUser!(id)
381
+   user.posts
382
+
383
+ # Contains yield → becomes generator
384
+ counter = ->
385
+   yield 1
386
+   yield 2
387
+ ```
388
+
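Detection of this kind can be sketched as a search over the function body that ignores nested functions. The sketch assumes that awaits (including dammit calls) appear as `['await', expr]` and yields as `['yield', expr]` nodes; the `yield` node shape is an assumption, since it is not listed in the node reference above.

```javascript
const NESTED_FUNCTION = new Set(['def', '->', '=>']);

function containsHead(sexpr, wanted) {
  if (!Array.isArray(sexpr)) return false;
  const [head, ...rest] = sexpr;
  if (head === wanted) return true;
  if (NESTED_FUNCTION.has(head)) return false;  // inner functions decide for themselves
  return containsHead(head, wanted) || rest.some((child) => containsHead(child, wanted));
}

function functionPrefix(body) {
  const isAsync = containsHead(body, 'await');
  const isGenerator = containsHead(body, 'yield');
  return `${isAsync ? 'async ' : ''}function${isGenerator ? '*' : ''}`;
}

console.log(functionPrefix(['block', ['=', 'user', ['await', ['getUser', 'id']]]]));
// prints async function
console.log(functionPrefix(['block', ['yield', 1], ['yield', 2]]));
// prints function*
```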
389
+ ## Existence Check
390
+
391
+ | Syntax | Compiles To |
392
+ |--------|-------------|
393
+ | `x?` | `(x != null)` |
394
+ | `obj.prop?` | `(obj.prop != null)` |
395
+ | `x ?? y` | `x ?? y` |
396
+ | `x ??= 10` | `x ??= 10` |
397
+
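The compiled form relies on loose inequality, which treats `null` and `undefined` alike while leaving other falsy values untouched. A quick check in plain JavaScript (the `exists` helper is just a wrapper for the demonstration):

```javascript
// Mirrors the compiled form `(x != null)`.
const exists = (x) => x != null;

console.log(JSON.stringify([undefined, null, 0, '', false].map(exists)));
// prints [false,false,true,true,true]

// `??` applies the same null-or-undefined test, so falsy values survive:
console.log(0 ?? 10, null ?? 10); // prints 0 10
```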
398
+ Optional chaining passes through unchanged as ES2020 syntax:
399
+
400
+ | Syntax | Compiles To |
401
+ |--------|-------------|
402
+ | `obj?.prop` | `obj?.prop` |
403
+ | `arr?.[0]` | `arr?.[0]` |
404
+ | `fn?.(x)` | `fn?.(x)` |
405
+
406
+ ## Range Optimization
407
+
408
+ ```coffee
409
+ for i in [1...100]
410
+   process(i)
411
+ # → for (let i = 1; i < 100; i++) { process(i); }
412
+ ```
413
+
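A sketch of the rewrite, assuming an ascending range with simple bounds and a loop body that has already been rendered to a string. `generateRangeLoop` is a hypothetical helper; the inclusive `..` case using `<=` follows from the range node definitions rather than from the compiler source.

```javascript
function generateRangeLoop(variable, range, body) {
  const [kind, from, to] = range;              // ['..', a, b] or ['...', a, b]
  const compare = kind === '..' ? '<=' : '<';  // inclusive vs exclusive upper bound
  return `for (let ${variable} = ${from}; ${variable} ${compare} ${to}; ${variable}++) { ${body} }`;
}

console.log(generateRangeLoop('i', ['...', 1, 100], 'process(i);'));
// prints for (let i = 1; i < 100; i++) { process(i); }
```

No intermediate array is allocated, which is the point of the optimization.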
414
+ ## String & Regex Processing
415
+
416
+ String tokens carry metadata in `.data`:
417
+ - `quote`: The quote delimiter (`"`, `'`, `"""`, `'''`, `///`)
418
+ - `quote.length === 3`: Indicates a heredoc
419
+
420
+ Heredocs use the closing delimiter's column position as the baseline for indentation stripping.
421
+
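As a rough sketch of that rule, the helper below removes up to `closingColumn` leading spaces from each line. It assumes space-only indentation and a hypothetical `stripHeredocIndent` name; the lexer's handling of tabs and blank lines is not covered.

```javascript
function stripHeredocIndent(body, closingColumn) {
  return body
    .split('\n')
    .map((line) => {
      // Remove at most `closingColumn` leading spaces; never eat real content.
      let i = 0;
      while (i < closingColumn && line[i] === ' ') i++;
      return line.slice(i);
    })
    .join('\n');
}

const raw = '    SELECT *\n      FROM users\n    WHERE id = 1';
console.log(stripHeredocIndent(raw, 4));
// prints:
// SELECT *
//   FROM users
// WHERE id = 1
```

Indentation deeper than the baseline is preserved, so nested structure inside the heredoc survives.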
422
+ REGEX tokens store `delimiter` and optional `heregex` flags in `token.data`.
423
+
424
+ ---
425
+
426
+ # 6. Compiler
427
+
428
+ The compiler (`src/compiler.js`) is a clean reimplementation replacing the old compiler (6,016 lines) with ~3,150 lines producing identical JavaScript output.
429
+
430
+ ## Structure
431
+
432
+ ```
433
+ CodeGenerator class
434
+   - GENERATORS dispatch table (~55 generators)
435
+   - Variable collection (program + function scope)
436
+   - Main generate() dispatch
437
+   - ~55 generate* methods
438
+   - Body/formatting/utility helpers
439
+   - Reactive runtime (inline string, ~270 lines)
440
+ Compiler class (with shim adapter for new lexer)
441
+ Convenience exports
442
+ ```
443
+
444
+ ## Metadata Bridge
445
+
446
+ Two one-line helpers isolate all `new String()` awareness:
447
+
448
+ ```javascript
449
+ let meta = (node, key) => node instanceof String ? node[key] : undefined;
450
+ let str = (node) => node instanceof String ? node.valueOf() : node;
451
+ ```
452
+
453
+ The `Compiler` class's lexer adapter reconstructs `new String()` wrapping from the new lexer's `token.data` property, so grammar actions pass metadata through s-expressions unchanged.
454
+
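To see the bridge end to end, the sketch below pairs the two helpers with an assumed adapter step. `wrapToken` is hypothetical: it shows one plausible way to copy `token.data` onto a `String` wrapper, not the adapter's actual code.

```javascript
let meta = (node, key) => node instanceof String ? node[key] : undefined;
let str = (node) => node instanceof String ? node.valueOf() : node;

// Assumed adapter step: wrap the value only when there is metadata to carry.
function wrapToken([tag, val], data) {
  return data ? Object.assign(new String(val), data) : val;
}

const wrapped = wrapToken(['IDENTIFIER', 'getUser'], { await: true });
console.log(str(wrapped), meta(wrapped, 'await'), meta('plain', 'await'));
// prints getUser true undefined
```

Primitive strings pass through both helpers untouched, so generators never need to know which kind of node they received.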
455
+ ## Removed Generators
456
+
457
+ | Generator | S-expr | Reason |
458
+ |-----------|--------|--------|
459
+ | `generatePrototype` | `::` | Feature removed from lexer |
460
+ | `generateOptionalPrototype` | `?::` | Feature removed from lexer |
461
+ | `generateSoakIndex` | `?[]` | Replaced by `optindex` / `?.[]` |
462
+ | `generateSoakCall` | `?call` | Replaced by `optcall` / `?.()` |
463
+
464
+ ## Renamed: `for-from` → `for-as`
465
+
466
+ - `GENERATORS['for-as']` replaces `GENERATORS['for-from']`
467
+ - Grammar adds `FORASAWAIT` token: `for x as! iter` → `for await x as iter`
468
+ - Both async forms (`for x as! iter` and `for await x as iter`) produce the same s-expression, with the `async?` slot set to `true`: `["for-as", vars, iterable, true, guard, body]`
469
+
470
+ ## Consolidation
471
+
472
+ | Area | Old lines | New lines | Reduction |
473
+ |------|-----------|-----------|-----------|
474
+ | Total file | 6,016 | ~3,150 | **48%** |
475
+ | Body generation | ~500 | ~200 | 60% |
476
+ | Variable collection | ~230 | ~100 | 57% |
477
+ | Helper methods | ~600 | ~250 | 58% |
478
+
479
+ ---
480
+
481
+ # 7. Solar Parser Generator
482
+
483
+ **Solar** is a complete SLR(1) parser generator included with Rip — written in Rip, compiled by Rip, zero external dependencies.
484
+
485
+ **Location:** `src/grammar/solar.rip` (1,001 lines)
486
+
487
+ ## Grammar Syntax
488
+
489
+ ```coffeescript
490
+ o = (pattern, action, options) ->
491
+   pattern = pattern.trim().replace /\s{2,}/g, ' '
491
+   [pattern, action ? 1, options]
493
+ ```
494
+
495
+ **Style 1: Pass-Through** — Omit action, returns first token:
496
+ ```coffeescript
497
+ Expression: [
498
+   o 'Value'
499
+   o 'Operation'
500
+ ]
501
+ ```
502
+
503
+ **Style 2: S-Expression** — Bare numbers become token references:
504
+ ```coffeescript
505
+ For: [
506
+   o 'FOR ForVariables FOROF Expression Block', '["for-of", 2, 4, null, 5]'
507
+ ]
508
+ ```
509
+
510
+ **Style 3: Advanced** — `$n` patterns for conditional logic:
511
+ ```coffeescript
512
+ Parenthetical: [
513
+   o '( Body )', '$2.length === 1 ? $2[0] : $2'
514
+ ]
515
+ ```
516
+
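One way the three styles could be normalized into `$n` references is sketched below. The rewrite rule and the `expandAction` name are assumptions for illustration; Solar's actual handling may differ, in particular in how it tells Style 2 apart from Style 3.

```javascript
function expandAction(action) {
  if (typeof action === 'number') return '$' + action;  // Style 1: default action 1
  if (action.includes('$')) return action;              // Style 3: explicit $n, leave as-is
  // Style 2: rewrite bare numbers, skipping digits inside quoted strings.
  return action
    .split(/("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')/)
    .map((part, i) => (i % 2 === 1 ? part : part.replace(/\b(\d+)\b/g, '$$$1')))
    .join('');
}

console.log(expandAction('["for-of", 2, 4, null, 5]'));
// prints ["for-of", $2, $4, null, $5]
console.log(expandAction(1));
// prints $1
```

Splitting on string literals first keeps a head such as `"base64"` from being rewritten.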
517
+ ## Performance
518
+
519
+ | Metric | Jison | Solar |
520
+ |--------|-------|-------|
521
+ | Parse time | 12,500ms | ~50ms |
522
+ | Dependencies | Many | Zero |
523
+ | Self-hosting | No | Yes |
524
+ | Code size | 2,285 LOC | 1,001 LOC |
525
+
526
+ After modifying `src/grammar/grammar.rip`:
527
+
528
+ ```bash
529
+ bun run parser # Regenerates src/parser.js
530
+ ```
531
+
532
+ ---
533
+
534
+ # 8. Debug Tools
535
+
536
+ ```bash
537
+ # See tokens from lexer
538
+ echo 'x = 42' | ./bin/rip -t
539
+
540
+ # See s-expressions from parser
541
+ echo 'x = 42' | ./bin/rip -s
542
+
543
+ # See generated JavaScript
544
+ echo 'x = 42' | ./bin/rip -c
545
+
546
+ # Interactive REPL with debug modes
547
+ ./bin/rip
548
+ rip> .tokens # Toggle token display
549
+ rip> .sexp # Toggle s-expression display
550
+ rip> .js # Toggle JS display
551
+ ```
552
+
553
+ Compare old and new compilers across all test suites:
554
+
555
+ ```bash
556
+ bun src/compare-compilers.js
557
+ ```
558
+
559
+ ---
560
+
561
+ # 9. Future Work
562
+
563
+ - `::` token for type annotations (see `docs/RIP-TYPES.md`)
564
+ - Comment preservation for source maps
565
+ - Parser update to read `.data` directly instead of `new String()` properties
566
+ - Once the parser supports `.data`, the `meta()`/`str()` helpers become trivial to update
567
+
568
+ ---
569
+
570
+ **See Also:**
571
+ - [RIP-LANG.md](RIP-LANG.md) — Language reference
572
+ - [RIP-REACTIVITY.md](RIP-REACTIVITY.md) — Reactivity deep dive
573
+
574
+ ---
575
+
576
+ *Rip 3.0.0 — 1,048 tests passing — Zero dependencies — Self-hosting — ~7,700 LOC*