kumi 0.0.9 → 0.0.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (103)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +18 -0
  3. data/CLAUDE.md +18 -258
  4. data/README.md +188 -121
  5. data/docs/AST.md +1 -1
  6. data/docs/FUNCTIONS.md +52 -8
  7. data/docs/VECTOR_SEMANTICS.md +286 -0
  8. data/docs/compiler_design_principles.md +86 -0
  9. data/docs/features/README.md +15 -2
  10. data/docs/features/hierarchical-broadcasting.md +349 -0
  11. data/docs/features/javascript-transpiler.md +148 -0
  12. data/docs/features/performance.md +1 -3
  13. data/docs/features/s-expression-printer.md +2 -2
  14. data/docs/schema_metadata.md +7 -7
  15. data/examples/deep_schema_compilation_and_evaluation_benchmark.rb +21 -15
  16. data/examples/game_of_life.rb +2 -4
  17. data/lib/kumi/analyzer.rb +34 -14
  18. data/lib/kumi/compiler.rb +4 -283
  19. data/lib/kumi/core/analyzer/passes/broadcast_detector.rb +717 -66
  20. data/lib/kumi/core/analyzer/passes/dependency_resolver.rb +1 -1
  21. data/lib/kumi/core/analyzer/passes/input_access_planner_pass.rb +47 -0
  22. data/lib/kumi/core/analyzer/passes/input_collector.rb +118 -99
  23. data/lib/kumi/core/analyzer/passes/join_reduce_planning_pass.rb +293 -0
  24. data/lib/kumi/core/analyzer/passes/lower_to_ir_pass.rb +993 -0
  25. data/lib/kumi/core/analyzer/passes/pass_base.rb +2 -2
  26. data/lib/kumi/core/analyzer/passes/scope_resolution_pass.rb +346 -0
  27. data/lib/kumi/core/analyzer/passes/semantic_constraint_validator.rb +28 -0
  28. data/lib/kumi/core/analyzer/passes/toposorter.rb +9 -3
  29. data/lib/kumi/core/analyzer/passes/type_checker.rb +9 -5
  30. data/lib/kumi/core/analyzer/passes/type_consistency_checker.rb +2 -2
  31. data/lib/kumi/core/analyzer/passes/{type_inferencer.rb → type_inferencer_pass.rb} +4 -4
  32. data/lib/kumi/core/analyzer/passes/unsat_detector.rb +92 -48
  33. data/lib/kumi/core/analyzer/plans.rb +52 -0
  34. data/lib/kumi/core/analyzer/structs/access_plan.rb +20 -0
  35. data/lib/kumi/core/analyzer/structs/input_meta.rb +29 -0
  36. data/lib/kumi/core/compiler/access_builder.rb +36 -0
  37. data/lib/kumi/core/compiler/access_planner.rb +219 -0
  38. data/lib/kumi/core/compiler/accessors/base.rb +69 -0
  39. data/lib/kumi/core/compiler/accessors/each_indexed_accessor.rb +84 -0
  40. data/lib/kumi/core/compiler/accessors/materialize_accessor.rb +55 -0
  41. data/lib/kumi/core/compiler/accessors/ravel_accessor.rb +73 -0
  42. data/lib/kumi/core/compiler/accessors/read_accessor.rb +41 -0
  43. data/lib/kumi/core/compiler_base.rb +137 -0
  44. data/lib/kumi/core/error_reporter.rb +6 -5
  45. data/lib/kumi/core/errors.rb +4 -0
  46. data/lib/kumi/core/explain.rb +157 -205
  47. data/lib/kumi/core/export/node_builders.rb +2 -2
  48. data/lib/kumi/core/export/node_serializers.rb +1 -1
  49. data/lib/kumi/core/function_registry/collection_functions.rb +100 -6
  50. data/lib/kumi/core/function_registry/conditional_functions.rb +14 -4
  51. data/lib/kumi/core/function_registry/function_builder.rb +142 -53
  52. data/lib/kumi/core/function_registry/logical_functions.rb +173 -3
  53. data/lib/kumi/core/function_registry/stat_functions.rb +156 -0
  54. data/lib/kumi/core/function_registry.rb +138 -98
  55. data/lib/kumi/core/ir/execution_engine/combinators.rb +117 -0
  56. data/lib/kumi/core/ir/execution_engine/interpreter.rb +336 -0
  57. data/lib/kumi/core/ir/execution_engine/values.rb +46 -0
  58. data/lib/kumi/core/ir/execution_engine.rb +50 -0
  59. data/lib/kumi/core/ir.rb +58 -0
  60. data/lib/kumi/core/ruby_parser/build_context.rb +2 -2
  61. data/lib/kumi/core/ruby_parser/declaration_reference_proxy.rb +0 -12
  62. data/lib/kumi/core/ruby_parser/dsl_cascade_builder.rb +37 -16
  63. data/lib/kumi/core/ruby_parser/input_builder.rb +61 -8
  64. data/lib/kumi/core/ruby_parser/parser.rb +1 -1
  65. data/lib/kumi/core/ruby_parser/schema_builder.rb +2 -2
  66. data/lib/kumi/core/ruby_parser/sugar.rb +7 -0
  67. data/lib/kumi/errors.rb +2 -0
  68. data/lib/kumi/js.rb +23 -0
  69. data/lib/kumi/registry.rb +17 -22
  70. data/lib/kumi/runtime/executable.rb +213 -0
  71. data/lib/kumi/schema.rb +15 -4
  72. data/lib/kumi/schema_metadata.rb +2 -2
  73. data/lib/kumi/support/ir_dump.rb +491 -0
  74. data/lib/kumi/support/s_expression_printer.rb +17 -16
  75. data/lib/kumi/syntax/array_expression.rb +6 -6
  76. data/lib/kumi/syntax/call_expression.rb +4 -4
  77. data/lib/kumi/syntax/cascade_expression.rb +4 -4
  78. data/lib/kumi/syntax/case_expression.rb +4 -4
  79. data/lib/kumi/syntax/declaration_reference.rb +4 -4
  80. data/lib/kumi/syntax/hash_expression.rb +4 -4
  81. data/lib/kumi/syntax/input_declaration.rb +6 -5
  82. data/lib/kumi/syntax/input_element_reference.rb +5 -5
  83. data/lib/kumi/syntax/input_reference.rb +5 -5
  84. data/lib/kumi/syntax/literal.rb +4 -4
  85. data/lib/kumi/syntax/location.rb +5 -0
  86. data/lib/kumi/syntax/node.rb +33 -34
  87. data/lib/kumi/syntax/root.rb +6 -6
  88. data/lib/kumi/syntax/trait_declaration.rb +4 -4
  89. data/lib/kumi/syntax/value_declaration.rb +4 -4
  90. data/lib/kumi/version.rb +1 -1
  91. data/lib/kumi.rb +6 -15
  92. data/scripts/analyze_broadcast_methods.rb +68 -0
  93. data/scripts/analyze_cascade_methods.rb +74 -0
  94. data/scripts/check_broadcasting_coverage.rb +51 -0
  95. data/scripts/find_dead_code.rb +114 -0
  96. metadata +36 -9
  97. data/docs/features/array-broadcasting.md +0 -170
  98. data/lib/kumi/cli.rb +0 -449
  99. data/lib/kumi/core/compiled_schema.rb +0 -43
  100. data/lib/kumi/core/evaluation_wrapper.rb +0 -40
  101. data/lib/kumi/core/schema_instance.rb +0 -111
  102. data/lib/kumi/core/vectorization_metadata.rb +0 -110
  103. data/migrate_to_core_iterative.rb +0 -938
@@ -0,0 +1,286 @@
1
+ # Kumi Vector Semantics — Short Guide
2
+
3
+ This note documents how Kumi handles **vectorized traversal** over **arbitrary nested objects**, how **alignment/broadcasting** works, and how **reducers** and **structure functions** behave. It’s intentionally concise but hits all the sharp edges.
4
+
5
+ ---
6
+
7
+ ## Terminology
8
+
9
+ * **Path** – a dot-separated traversal, e.g. `input.regions.offices.employees.salary`.
10
+ * **Scope (axes)** – the list of array segments encountered along a path.
11
+ Example: for `regions.offices.employees.salary` the scope is `[:regions, :offices, :employees]`.
12
+ * **Rank** – number of axes = `scope.length`.
13
+ * **Index tuple** – one coordinate per axis, e.g. `[region_i, office_j, employee_k]`; tuples are ordered lexicographically.
14
+
15
+ **Three Laws (think of them as invariants):**
16
+
17
+ 1. **Enumeration**
18
+ `each_indexed(path).map(&:first) == ravel(path)`
19
+
20
+ 2. **Reconstruction**
21
+ `lift(to_scope, each_indexed(path))` regroups by `to_scope` (must be a prefix of `scope(path)`).
22
+
23
+ 3. **Counting**
24
+ `size(path) == ravel(path).length == each_indexed(path).count`
25
+
26
+ These laws are the mental model. Everything else is just mechanics.
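Laws 1 and 3 can be checked mechanically. The sketch below is a minimal plain-Ruby model for pure nested arrays (element access), not Kumi's accessor code. `ravel_at`, `each_indexed_at`, and the `depth` argument (standing in for a path's rank) are hypothetical names. It assumes every element above the target depth is itself an array, and it uses the cube data from example C.

```ruby
# Hypothetical model for pure nested arrays: a path of rank `depth`
# is represented only by how many array levels it descends.

# ravel: the items found `depth` array levels down, in order.
def ravel_at(array, depth)
  array.flatten(depth - 1)
end

# each_indexed: the same items, each paired with one index per axis.
def each_indexed_at(node, depth, idx = [])
  return [[node, idx]] if depth.zero?

  node.each_with_index.flat_map { |el, i| each_indexed_at(el, depth - 1, idx + [i]) }
end

cube = [[[1, 2], [3]], [[4]]]

(1..3).each do |depth|
  pairs = each_indexed_at(cube, depth)
  # Law 1 (Enumeration): dropping the indices gives the ravel.
  raise "enumeration fails at #{depth}" unless pairs.map(&:first) == ravel_at(cube, depth)
  # Law 3 (Counting): ravel length == number of indexed pairs.
  raise "counting fails at #{depth}" unless ravel_at(cube, depth).length == pairs.count
end

each_indexed_at(cube, 3).map(&:last)
# => [[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```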
27
+
28
+ ---
29
+
30
+ ## Access Modes
31
+
32
+ Kumi’s Access Planner emits low-level ops (`enter_hash`, `enter_array`) and supports three vector modes per path:
33
+
34
+ ### 1) `:materialize`
35
+
36
+ Return the **original nested structure** down to that path (no enumeration).
37
+ Good for “give me the data shaped like the input.”
38
+
39
+ ```ruby
40
+ # Input (object mode)
41
+ {
42
+ regions: [
43
+ { name: "E", offices: [{ employees: [{salary: 100}, {salary: 120}] }] },
44
+ { name: "D", offices: [{ employees: [{salary: 90}] }] }
45
+ ]
46
+ }
47
+
48
+ materialize("regions.offices.employees.salary")
49
+ # => [[ [100,120] ], [ [90] ]]
50
+ ```
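A minimal object-mode sketch of `:materialize`, assuming the path is given as an array of symbol keys and that every Array reached through a key is an axis. `materialize` here is a hypothetical plain-Ruby helper, not the accessor in `materialize_accessor.rb`: it descends key by key and maps over each array, so the nesting of the input survives.

```ruby
# Hypothetical helper: follow `path` (array of symbol keys) through hashes,
# mapping over every array it meets so the input's nesting is preserved.
def materialize(node, path)
  return node if path.empty?

  child = node.fetch(path.first) # raises KeyError on a missing key
  rest = path.drop(1)
  child.is_a?(Array) ? child.map { |el| materialize(el, rest) } : materialize(child, rest)
end

data = {
  regions: [
    { name: "E", offices: [{ employees: [{ salary: 100 }, { salary: 120 }] }] },
    { name: "D", offices: [{ employees: [{ salary: 90 }] }] }
  ]
}

materialize(data, %i[regions offices employees salary])
# => [[[100, 120]], [[90]]]
```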
51
+
52
+ ### 2) `:ravel`
53
+
54
+ **Enumerate elements at the next array boundary** for that path, i.e., “collect the items at this depth.”
55
+ It is **not** NumPy’s “flatten everything.” It collects the next level.
56
+
57
+ ```ruby
58
+ ravel("regions") # => [ {…E…}, {…D…} ] (enumerate regions)
59
+ ravel("regions.offices") # => [ {employees:[…]}, {employees:[…]} ] (each office)
60
+ ravel("regions.offices.employees.salary") # => [ 100, 120, 90 ] (each salary at the employees depth)
61
+ ```
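A matching object-mode sketch of `:ravel` under the same assumptions (hypothetical helper, path as an array of symbol keys). Each step replaces the running list with the children found under the next key and splices arrays in, so the result holds one item per element at the path's deepest axis, in line with the Counting law.

```ruby
# Hypothetical helper: collect the items at the depth `path` reaches.
def ravel(node, path)
  path.reduce([node]) do |items, key|
    # Arrays found under `key` contribute their elements; scalars contribute themselves.
    items.flat_map do |item|
      child = item.fetch(key)
      child.is_a?(Array) ? child : [child]
    end
  end
end

data = {
  regions: [
    { name: "E", offices: [{ employees: [{ salary: 100 }, { salary: 120 }] }] },
    { name: "D", offices: [{ employees: [{ salary: 90 }] }] }
  ]
}

ravel(data, %i[regions]).map { |r| r[:name] }     # => ["E", "D"]
ravel(data, %i[regions offices]).length           # => 2
ravel(data, %i[regions offices employees salary]) # => [100, 120, 90]
```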
62
+
63
+ ### 3) `:each_indexed`
64
+
65
+ Enumerate leaf values **with** their index tuple (authoritative for `lift` and alignment):
66
+
67
+ ```ruby
68
+ each_indexed("regions.offices.employees.salary")
69
+ # => [
70
+ # [100, [0,0,0]], [120, [0,0,1]],
71
+ # [ 90, [1,0,0]]
72
+ # ]
73
+ ```
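An object-mode sketch of `:each_indexed`, again a hypothetical helper over a path given as an array of symbol keys. Each array reached through a key appends one position to the index tuple, so the tuple length equals the rank, and the depth-first walk emits rows in lexicographic index order.

```ruby
# Hypothetical helper: return [value, index_tuple] rows for `path`.
def each_indexed(node, path, idx = [])
  return [[node, idx]] if path.empty?

  child = node.fetch(path.first)
  rest = path.drop(1)
  if child.is_a?(Array)
    # A new axis: record each element's position.
    child.each_with_index.flat_map { |el, i| each_indexed(el, rest, idx + [i]) }
  else
    each_indexed(child, rest, idx)
  end
end

data = {
  regions: [
    { name: "E", offices: [{ employees: [{ salary: 100 }, { salary: 120 }] }] },
    { name: "D", offices: [{ employees: [{ salary: 90 }] }] }
  ]
}

each_indexed(data, %i[regions offices employees salary])
# => [[100, [0, 0, 0]], [120, [0, 0, 1]], [90, [1, 0, 0]]]
```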
74
+
75
+ ---
76
+
77
+ ## Lift (Regroup by prefix)
78
+
79
+ `lift(to_scope)` turns a vector-of-rows (from `each_indexed`) into a nested array grouped by `to_scope`.
80
+
81
+ ```ruby
82
+ # Given values from each_indexed above:
83
+ lift([:regions], …) # => [ [100,120], [90] ]
84
+ lift([:regions,:offices], …) # => [ [[100,120]], [[90]] ]
85
+ lift([:regions,:offices,:employees], …) # => [ [[[100,120]]], [[[90]]] ]
86
+ ```
87
+
88
+ * `to_scope` must be a **prefix** of the vector’s `scope`.
89
+ * Depth is derived mechanically from index arity; the VM does not guess.
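A sketch of `lift` over the `each_indexed` rows shown earlier, assuming rows arrive in lexicographic index order. `lift` here is a hypothetical helper that, like the rule above, only uses the arity of `to_scope`: it groups consecutive rows by their leading index, recurses on the remaining axes, and lists whatever values are left once `to_scope` is exhausted. It does not enforce the prefix rule.

```ruby
# Hypothetical helper. rows: [[value, index_tuple], ...] in lexicographic order.
def lift(to_scope, rows)
  return rows.map(&:first) if to_scope.empty?

  rows
    .chunk_while { |(_, a), (_, b)| a.first == b.first } # same leading index => same group
    .map { |group| lift(to_scope.drop(1), group.map { |v, idx| [v, idx.drop(1)] }) }
end

rows = [[100, [0, 0, 0]], [120, [0, 0, 1]], [90, [1, 0, 0]]]
lift([:regions], rows)           # => [[100, 120], [90]]
lift([:regions, :offices], rows) # => [[[100, 120]], [[90]]]
```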
90
+
91
+ ---
92
+
93
+ ## Alignment & Broadcasting
94
+
95
+ When mapping a function over multiple arguments, Kumi:
96
+
97
+ 1. Picks a **carrier** vector (the one with the longest scope).
98
+ 2. **Aligns** other vectors to the carrier if they are **prefix-compatible** (their scope is a prefix of the carrier's scope).
99
+ 3. **Broadcasts** scalars across the carrier.
100
+
101
+ If scopes aren’t prefix-compatible, lowering raises:
102
+ `cross-scope map without join: [:a] vs [:b,:c]`
103
+
104
+ ```ruby
105
+ # price, quantity both scope [:items]
106
+ final = price * quantity # zip by position (same scope)
107
+
108
+ # Broadcast scalar across [:items]
109
+ discounted = price * 0.9
110
+
111
+ # Align prefix [:regions] to carrier [:regions,:offices]
112
+ aligned_tax = align_to(offices_subtotals, regions_tax)
113
+ total = offices_subtotals * (1 - aligned_tax)
114
+ ```
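A sketch of the three mechanics on vectors stored as `[value, index_tuple]` rows. The names mirror the VM operations listed under "Planner & VM" (`broadcast_scalar`, `zip_same_scope`, `align_to`), but the signatures and bodies here are hypothetical, and the office subtotals and regional tax rates are made-up numbers.

```ruby
# Hypothetical vector representation: [[value, index_tuple], ...].

# Scalar -> vector: copy the scalar onto every row of the carrier.
def broadcast_scalar(scalar, carrier)
  carrier.map { |_, idx| [scalar, idx] }
end

# Same scope: combine row by row; the index tuples must match exactly.
def zip_same_scope(a, b)
  a.zip(b).map do |(x, ia), (y, ib)|
    raise ArgumentError, "index mismatch: #{ia} vs #{ib}" unless ia == ib
    [yield(x, y), ia]
  end
end

# Prefix alignment: look each carrier row up by the prefix of its index tuple.
# A missing prefix raises KeyError, i.e. an on_missing: :error policy.
def align_to(carrier, source)
  by_prefix = source.to_h { |v, idx| [idx, v] }
  width = source.map { |_, idx| idx.length }.max || 0
  carrier.map { |_, idx| [by_prefix.fetch(idx.take(width)), idx] }
end

offices_subtotals = [[1000.0, [0, 0]], [500.0, [0, 1]], [800.0, [1, 0]]] # scope [:regions, :offices]
regions_tax       = [[0.25, [0]], [0.5, [1]]]                             # scope [:regions]

aligned_tax = align_to(offices_subtotals, regions_tax)
one_minus   = zip_same_scope(broadcast_scalar(1, aligned_tax), aligned_tax) { |x, y| x - y }
total       = zip_same_scope(offices_subtotals, one_minus) { |x, y| x * y }
total.map(&:first) # => [750.0, 375.0, 400.0]
```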
115
+
116
+ ---
117
+
118
+ ## Structure Functions vs Reducers
119
+
120
+ * **Reducers** collapse a vector to a **scalar** (e.g., `sum`, `min`, `avg`).
121
+ Lowering selects a vector argument and emits a `Reduce`.
122
+
123
+ * **Structure functions** observe or reshape **structure** (e.g., `size`, `flatten`, `count_across`).
124
+ Lowering usually uses a `:ravel` plan and a plain `Map` (no indices required).
125
+
126
+ ### Laws for `size` and `flatten`
127
+
128
+ * `size(path) == ravel(path).length` (Counting Law)
129
+ * `flatten(path)` flattens nested arrays (by default all levels; use `flatten_one` for one level).
130
+
131
+ ---
132
+
133
+ ## End-to-End Mini Examples
134
+
135
+ ### A. Simple vector math + reducers (object access)
136
+
137
+ ```ruby
138
+ module Cart
139
+ extend Kumi::Schema
140
+ schema do
141
+ input do
142
+ array :items do
143
+ float :price
144
+ integer :qty
145
+ end
146
+ float :shipping_threshold
147
+ end
148
+
149
+ value :subtotals, input.items.price * input.items.qty
150
+ value :subtotal, fn(:sum, subtotals)
151
+ trait :free_shipping, subtotal > input.shipping_threshold
+ value :shipping do
+ on :free_shipping, 0.0
+ base 9.99
+ end
152
+ value :total, subtotal + shipping
153
+ end
154
+ end
155
+
156
+ data = {
157
+ items: [{price: 100.0, qty: 2}, {price: 200.0, qty: 1}],
158
+ shipping_threshold: 50.0
159
+ }
160
+
161
+ r = Cart.from(data)
162
+ r[:subtotals] # => [200.0, 200.0] (vector map)
163
+ r[:subtotal] # => 400.0 (reducer)
164
+ r[:shipping] # => 0.0
165
+ r[:total] # => 400.0
166
+ ```
167
+
168
+ **Internal truths**:
169
+
170
+ * `each_indexed(input.items.price)` → `[[100.0,[0]],[200.0,[1]]]`
171
+ * `size(input.items)` → `2` because `ravel(input.items)` has length 2.
172
+
173
+ ### B. Mixed scopes + alignment
174
+
175
+ ```ruby
176
+ module Regions
177
+ extend Kumi::Schema
178
+ schema do
179
+ input do
180
+ array :regions do
181
+ float :tax
182
+ array :offices do
183
+ array :employees do
184
+ float :salary
185
+ end
186
+ end
187
+ end
188
+ end
189
+
190
+ value :office_payrolls, fn(:sum, input.regions.offices.employees.salary) # vector reduce per office
191
+ value :taxed, office_payrolls * (1 - input.regions.tax) # tax (align regions.tax to [:regions,:offices])
192
+ end
193
+ end
194
+
195
+ # Alignment rule: regions.tax (scope [:regions]) aligns to office_payrolls (scope [:regions,:offices])
196
+ ```
197
+
198
+ ### C. Element access (pure arrays) + structure functions
199
+
200
+ ```ruby
201
+ module Cube
202
+ extend Kumi::Schema
203
+ schema do
204
+ input do
205
+ array :cube do
206
+ element :array, :layer do
207
+ element :array, :row do
208
+ element :float, :cell
209
+ end
210
+ end
211
+ end
212
+ end
213
+
214
+ value :layers, fn(:size, input.cube) # == ravel(input.cube).length
215
+ value :matrices, fn(:size, input.cube.layer) # enumerate at next depth
216
+ value :rows, fn(:size, input.cube.layer.row)
217
+ value :all_values, fn(:flatten, input.cube.layer.row.cell)
218
+ value :total, fn(:sum, all_values)
219
+ end
220
+ end
221
+
222
+ data = { cube: [ [[1,2],[3]], [[4]] ] }
223
+
224
+ # ravel views (intuition)
225
+ # ravel(cube) => [ [[1,2],[3]], [[4]] ]
226
+ # ravel(cube.layer) => [ [1,2], [3], [4] ]
227
+ # ravel(cube.layer.row) => [ 1, 2, 3, 4 ]
228
+ # ravel(cube.layer.row.cell) => [ 1, 2, 3, 4 ] (same leaf)
229
+
230
+ c = Cube.from(data)
231
+ c[:layers] # => 2
232
+ c[:matrices] # => 3 (rows across both layers)
233
+ c[:rows] # => 4 (cells across all rows)
234
+ c[:all_values] # => [1,2,3,4]
235
+ c[:total] # => 10
236
+ ```
237
+
238
+ ---
239
+
240
+ ## Planner & VM: Who does what?
241
+
242
+ * **Planner**: Emits deterministic `enter_hash`/`enter_array` sequences per path and mode.
243
+
244
+ * For element edges (inline array aliases), it **does not** emit `enter_hash`.
245
+ * For `:each_indexed` / `:ravel`, it appends a terminal `enter_array` **only if** the final node is an array.
246
+ * **Lowerer**: Decides plans (`:ravel`, `:each_indexed`, `:materialize`), inserts `align_to`, and emits `lift` at the declaration boundary when a vector result must be exposed as a single scalar holding a nested array.
247
+ * **VM**: Purely mechanical:
248
+
249
+ * `broadcast_scalar` for scalar→vec expansion,
250
+ * `zip_same_scope` when scopes match,
251
+ * `align_to` for prefix alignment,
252
+ * `group_rows` inside `lift` to reconstruct prefixes.
253
+
254
+ No type sniffing or guesses: the IR is the source of truth.
255
+
256
+ ---
257
+
258
+ ## Jagged & Sparse Arrays
259
+
260
+ * Ordering is **lexicographic by index tuple** (stable).
261
+ * No padding is introduced; missing branches are just… missing.
262
+ * `align_to(..., on_missing: :error|:nil)` enforces policy.
263
+
264
+ ---
265
+
266
+ ## Error Policies
267
+
268
+ For missing keys/arrays, accessors obey policy:
269
+
270
+ * `:error` (default) – raise descriptive error with the path/mode.
271
+ * `:skip` – drop the missing branch (useful in ravels).
272
+ * `:yield_nil` – emit `nil` in place (preserves cardinality).
273
+
274
+ Document these on any user-facing accessor.
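The three policies can be sketched on a single ravel step over a list of hashes. `ravel_key` and its `policy:` keyword are hypothetical and only illustrate the contract described above: `:error` raises with the key and mode, `:skip` drops the branch, and `:yield_nil` keeps cardinality by emitting `nil`.

```ruby
# Hypothetical helper: one ravel step (`key`) over `items`, with a missing-key policy.
def ravel_key(items, key, policy: :error)
  items.flat_map do |item|
    next [item[key]] if item.key?(key)

    case policy
    when :error     then raise KeyError, "missing key #{key.inspect} (mode: :ravel)"
    when :skip      then []    # drop the branch
    when :yield_nil then [nil] # keep one slot per item
    else raise ArgumentError, "unknown policy #{policy.inspect}"
    end
  end
end

employees = [{ salary: 100 }, {}, { salary: 90 }]
ravel_key(employees, :salary, policy: :skip)      # => [100, 90]
ravel_key(employees, :salary, policy: :yield_nil) # => [100, nil, 90]
```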
275
+
276
+ ---
277
+
278
+ ## Quick Cheatsheet
279
+
280
+ * Use **`ravel(path)`** to “list the things at this level.”
281
+ * Use **`each_indexed(path)`** when you need `(value, idx)` pairs for joins/regroup.
282
+ * Use **`lift(to_scope, each_indexed(path))`** to reconstruct nested structure.
283
+ * **Reducers** (e.g., `sum`, `avg`, `min`) consume the raveled view of their argument.
284
+ * **Structure functions** (e.g., `size`, `flatten`, `flatten_one`, `count_across`) operate on structure at that depth and usually compile via `:ravel`.
285
+
286
+ Keep the three laws in mind and Kumi’s behavior is predictable—even over deeply nested, heterogeneous data.
@@ -0,0 +1,86 @@
1
+ # Compiler Design Principles
2
+
3
+ ## Core Principle: Smart Analyzer, Dumb Compiler
4
+
5
+ The Kumi compiler follows a strict separation of concerns:
6
+
7
+ ### Analyzer Phase (Smart)
8
+ - **Makes all decisions** about how operations should be executed
9
+ - **Analyzes semantic context** to determine operation modes
10
+ - **Pre-computes execution strategies** and stores in metadata
11
+ - **Resolves complex logic** like nested array broadcasting, reduction flattening, etc.
12
+ - **Produces complete instructions** for the compiler to follow
13
+
14
+ ### Compiler Phase (Dumb)
15
+ - **Follows metadata instructions** without making decisions
16
+ - **No conditional logic** based on data types, function types, or structure analysis
17
+ - **Mechanically executes** the pre-computed strategy from the analyzer
18
+ - **Pure translation** from AST + metadata → executable functions
19
+
20
+ ## Examples
21
+
22
+ ### ❌ BAD: Compiler Making Decisions
23
+ ```ruby
24
+ def compile_call(expr)
25
+ if Kumi::Registry.reducer?(expr.fn_name)
26
+ if nested_array_detected?(values)
27
+ # Compiler deciding to flatten
28
+ flatten_and_call(expr.fn_name, values)
29
+ end
30
+ end
31
+ end
32
+ ```
33
+
34
+ ### ✅ GOOD: Compiler Following Metadata
35
+ ```ruby
36
+ def compile_call(expr)
37
+ # Just read the pre-computed strategy
38
+ strategy = @analysis.metadata[:call_strategies][expr]
39
+ execute_strategy(strategy, expr)
40
+ end
41
+ ```
42
+
43
+ ### ❌ BAD: Runtime Structure Analysis
44
+ ```ruby
45
+ def vectorized_function_call(fn_name, values)
46
+ # Compiler analyzing structure at runtime
47
+ if values.any? { |v| deeply_nested?(v) }
48
+ apply_nested_broadcasting(fn, values)
49
+ end
50
+ end
51
+ ```
52
+
53
+ ### ✅ GOOD: Pre-computed Broadcasting Plan
54
+ ```ruby
55
+ def compile_element_field_reference(expr)
56
+ # Analyzer already determined the strategy
57
+ metadata = @analysis.state[:broadcasts][:nested_paths][expr.path]
58
+ traverse_nested_path(ctx, expr.path, metadata[:operation_mode])
59
+ end
60
+ ```
61
+
62
+ ## Benefits
63
+
64
+ 1. **Predictable Performance**: No runtime analysis or decision-making
65
+ 2. **Easier Testing**: Compiler behavior determined entirely by metadata
66
+ 3. **Maintainable**: Complex logic isolated in analyzer passes
67
+ 4. **Extensible**: New features added by extending analyzer, not compiler
68
+ 5. **Debuggable**: All decisions visible in analyzer metadata
69
+
70
+ ## Implementation Pattern
71
+
72
+ For any new compiler feature:
73
+
74
+ 1. **Analyzer Pass**: Analyze the requirement and store strategy in metadata
75
+ 2. **Metadata Schema**: Define clear data structure for the strategy
76
+ 3. **Compiler Method**: Read metadata and execute strategy mechanically
77
+ 4. **No Conditionals**: Avoid `if` statements based on runtime data in compiler
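The four steps can be sketched in a few lines. Everything here is hypothetical (`Strategy`, `analyze_call`, `compile_call`, and the two small function tables are not Kumi's API): the analyzer records a strategy per call, and the compiler only looks that strategy up, with no branching on data or function type.

```ruby
# Metadata schema: the decision the analyzer hands to the compiler.
Strategy = Struct.new(:kind, :fn)

REDUCERS    = { sum: ->(xs) { xs.sum } }.freeze
ELEMENTWISE = { add: ->(a, b) { a + b } }.freeze

# Analyzer pass (smart): inspects the function once and records a strategy.
def analyze_call(fn_name)
  if REDUCERS.key?(fn_name)
    Strategy.new(:reduce, REDUCERS.fetch(fn_name))
  else
    Strategy.new(:map, ELEMENTWISE.fetch(fn_name))
  end
end

# Compiler (dumb): a table lookup keyed by the recorded strategy.
EXECUTORS = {
  reduce: ->(fn, args) { fn.call(args.first) },
  map:    ->(fn, args) { args.first.zip(*args.drop(1)).map { |row| fn.call(*row) } }
}.freeze

def compile_call(strategy)
  executor = EXECUTORS.fetch(strategy.kind)
  ->(*args) { executor.call(strategy.fn, args) }
end

sum = compile_call(analyze_call(:sum))
add = compile_call(analyze_call(:add))
sum.call([1, 2, 3])        # => 6
add.call([1, 2], [10, 20]) # => [11, 22]
```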
78
+
79
+ ## Metadata-Driven Architecture
80
+
81
+ The compiler should be a pure **metadata interpreter**:
82
+ - Input: AST + Analyzer Metadata
83
+ - Output: Executable Functions
84
+ - Process: Mechanical translation following metadata instructions
85
+
86
+ This ensures the compiler remains simple, fast, and maintainable as the system grows in complexity.
@@ -30,9 +30,15 @@ Defines expected inputs with types and constraints.
30
30
  - Domain validation at runtime
31
31
  - Separates input metadata from business logic
32
32
 
33
+ ### [Hierarchical Broadcasting](hierarchical-broadcasting.md)
34
+ Automatic vectorization over hierarchical data structures with dual access modes.
35
+
36
+ - Object access for structured business data
37
+ - Element access for multi-dimensional arrays
38
+ - Mixed access modes in same schema
39
+
33
40
  ### [Performance](performance.md)
34
- TODO: Add benchmark data
35
- Processes large schemas with optimized algorithms.
41
+ Processes large schemas.
36
42
 
37
43
  - Result caching
38
44
  - Selective evaluation
@@ -44,6 +50,13 @@ Debug and inspect AST structures with readable S-expression notation output.
44
50
  - Proper indentation and hierarchical structure
45
51
  - Useful for debugging schema parsing and AST analysis
46
52
 
53
+ ### [JavaScript Transpiler](javascript-transpiler.md)
54
+ Transpiles compiled schemas to standalone JavaScript code.
55
+
56
+ - Generates bundles with only required functions
57
+ - Supports CommonJS and browser environments
58
+ - Maintains identical behavior across platforms
59
+
47
60
  ## Integration
48
61
 
49
62
  - Type inference uses input declarations