RubyGems - kumi-parser - Versions diffs - 0.0.32 → 0.1.0 - Mend

kumi-parser 0.0.32 → 0.1.0

This diff shows the changes between two publicly released versions of this package, as they appear in the public registry. It is provided for informational purposes only.

Files changed (32)

checksums.yaml +4 -4
data/.rubocop.yml +41 -0
data/CHANGELOG.md +64 -0
data/CLAUDE.md +59 -120
data/README.md +28 -6
data/examples/parse_and_inspect.rb +34 -0
data/kumi-parser.gemspec +3 -4
data/lib/kumi/parser/grammar.rb +120 -0
data/lib/kumi/parser/lexer.rb +232 -0
data/lib/kumi/parser/parse_error.rb +52 -0
data/lib/kumi/parser/parser.rb +692 -0
data/lib/kumi/parser/source.rb +76 -0
data/lib/kumi/parser/text_parser.rb +37 -27
data/lib/kumi/parser/token.rb +10 -71
data/lib/kumi/parser/version.rb +1 -1
data/lib/kumi-parser.rb +9 -10
metadata +16 -37
data/examples/debug_text_parser.rb +0 -41
data/examples/debug_transform_rule.rb +0 -26
data/examples/text_parser_comprehensive_test.rb +0 -333
data/examples/text_parser_test_with_comments.rb +0 -146
data/lib/kumi/parser/base.rb +0 -51
data/lib/kumi/parser/direct_parser.rb +0 -698
data/lib/kumi/parser/error_extractor.rb +0 -89
data/lib/kumi/parser/errors.rb +0 -40
data/lib/kumi/parser/helpers.rb +0 -154
data/lib/kumi/parser/smart_tokenizer.rb +0 -373
data/lib/kumi/parser/syntax_validator.rb +0 -21
data/lib/kumi/parser/text_parser/api.rb +0 -60
data/lib/kumi/parser/token_constants.rb +0 -467
data/lib/kumi/text_parser.rb +0 -40
data/lib/kumi/text_schema.rb +0 -31

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a48abb72f68e3db4876b1e3d7a56563f048046a34524359a042829d7a98360dd
-  data.tar.gz: 8ae0bb61111aa3179a8e097b7a4f0b9f47f11e6b20d71aa48a51742eaed84487
+  metadata.gz: b880347a083ba29083167538910e1d1453a1b7c9f334bd4a8ec86437878de8e3
+  data.tar.gz: 529fcc0f2bb7102ff8a79e3c8e69ef4cc3a613f13378f81d3d86feaf359d6cb8
 SHA512:
-  metadata.gz: 92bd51d6b1df52e197c912226a04dedb94fb162d9f8302ead05a1e6f5acd2917124909f0d212564c50ae74ab92b17c579871463f9fe8bb4bf43c8cc23802dfba
-  data.tar.gz: 89c50b842361f72dc4249b129532425efae1089e7729deb1b9b5d9bcdcda41afd6f5ae994b5dce4b78526c8a28d68579ac7aa17a82c5ee6356dbd4727e882aa5
+  metadata.gz: 59a7da2e91de9ff04d804ad3eefe846dea9dac256083220d286ea4d29f509fdd456019f84683e9b147667da767ed090e0f5a0769777edcdeed6b9585e20b36a4
+  data.tar.gz: 8faaaf5d58767f4f5aa776be90033d0e65b9b6484c402434c24db305f0759ff874091445341f47b50a46e915552dcd9f4f2f63ec13caf94fdc9a12c58147e4d2

data/.rubocop.yml ADDED

@@ -0,0 +1,41 @@
+AllCops:
+  TargetRubyVersion: 3.1
+  NewCops: disable
+  SuggestExtensions: false
+# A recursive-descent parser and a single-pass lexer naturally have a handful
+# of longer dispatch methods; the default 10-line limit fights that structure
+# rather than improving it. Keep generous ceilings and let the grammar read
+# linearly.
+Metrics/MethodLength:
+  Max: 35
+Metrics/AbcSize:
+  Max: 30
+Metrics/CyclomaticComplexity:
+  Max: 12
+Metrics/PerceivedComplexity:
+  Max: 12
+Metrics/ClassLength:
+  Max: 600
+Metrics/ModuleLength:
+  Max: 200
+# The gem's entry point must be named after the gem (`kumi-parser`), which is
+# hyphenated by convention.
+Naming/FileName:
+  Exclude:
+    - "lib/kumi-parser.rb"
+# Positional format tokens read fine for short, local format strings.
+Style/FormatStringToken:
+  Enabled: false
+# Specs read better with descriptive backtick-quoted example names and longer
+# example/group bodies than the metric defaults allow.
+Metrics/BlockLength:
+  Exclude:
+    - "spec/**/*"
+    - "*.gemspec"
+Style/Documentation:
+  Enabled: false

data/CHANGELOG.md ADDED

@@ -0,0 +1,64 @@
+# Changelog
+All notable changes to kumi-parser are documented here. The format is based on
+[Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project
+adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.0] – 2026-06-17
+### Changed
+- **Full rewrite of the lexer and parser.** The hand-rolled `SmartTokenizer`
+  (char-by-char loop with string accumulation and a context stack) and the
+  metadata-bag `TOKEN_METADATA` table are replaced by a single-pass
+  `StringScanner` lexer producing typed tokens, plus a compact `Grammar` of
+  lookup tables (keywords, type keywords, function sugar, operator
+  precedence/associativity). The recursive-descent + Pratt parser builds the
+  same `Kumi::Syntax::*` AST as before — verified byte-for-byte against
+  kumi-core's golden AST snapshots (50 schemas) and the full compile/runtime
+  pipeline.
+- **First-class parse errors.** Every syntax error now reports an exact
+  `file:line:col`, a plain-English "expected X, but found Y", and a
+  caret-annotated source frame. Errors are strictly scoped to the parse phase
+  (lexing + AST construction); name resolution, types, and axes remain the
+  analyzer's concern. The raised `Kumi::Errors::SyntaxError` carries a
+  structured `Location`, so callers render the frame from data rather than
+  scraping the message.
+- The public surface is a single `Kumi::Parser::TextParser` facade
+  (`parse` / `valid?` / `validate`). The duplicate `Base`, `Api`,
+  `SyntaxValidator`, and `ErrorExtractor` entry points, the `Kumi::TextParser`
+  and `Kumi::TextSchema` shims, and the unused `parslet` / `zeitwerk`
+  dependencies are removed. `kumi` is now a declared runtime dependency
+  (the parser builds its AST nodes).
+### Fixed
+- `element :type, :name` array-element declarations and chained array access
+  through deeply nested inputs now parse (the old parser failed its own specs
+  for these).
+### Removed
+- The `element` input-declaration keyword in the text grammar (it was unused by
+  any schema and duplicated the standard `array :x do <type> :name end` form).
+## [0.0.33] – 2026-06-14
+### Added
+- `outer(...)` recognized as function sugar, mirroring `cross(...)`. `outer` is
+  the cross-array all-pairs operator (A × B) — the sibling of `cross` (A × A').
+  Both bare `outer(expr)` and `fn(:outer, expr)` parse to the same
+  `CallExpression(:outer, …)`, so text schemas can now express two-array
+  all-pairs (e.g. a pixels × lights field).
+## [0.0.32] – 2026-06
+### Added
+- `cross(...)` recognized as function sugar (self-join all-pairs / N-body axis op).
+## [0.0.31] – 2026-06
+### Added
+- Multi-level namespace constants in the tokenizer.
+## [0.0.29] – 2026-06
+### Added
+- `import` syntax for composing schemas.
+## [0.0.25] – 2026-06
+### Changed
+- Renamed `token_metadata` to `token_constants`; told Zeitwerk to ignore it
+  (it is a plain constants module, not an autoloadable class).
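
Reviewer note on the 0.0.33 entry: the claim that bare `outer(expr)` and `fn(:outer, expr)` produce the same `CallExpression(:outer, …)` can be pictured with a small lowering step. The sketch below is an illustration only. `lower_call`, the local `CallExpression` struct, and the abridged `FUNCTION_SUGAR` table are hypothetical stand-ins, not the gem's parser code or kumi-core's node classes.

```ruby
# Hypothetical sketch, not the gem's implementation.
# Abridged copy of the sugar table; the real one lives in Grammar::FUNCTION_SUGAR.
FUNCTION_SUGAR = { 'cross' => :cross, 'outer' => :outer }.freeze

# Stand-in for Kumi::Syntax::CallExpression (assumed shape: name plus args).
CallExpression = Struct.new(:fn_name, :args)

# Lower a call head and its already-parsed arguments to one node shape.
def lower_call(head, args)
  if head == 'fn'
    # Explicit form: the first argument is the function name symbol.
    name, *rest = args
    raise ArgumentError, 'fn(...) needs a symbol as its first argument' unless name.is_a?(Symbol)

    CallExpression.new(name, rest)
  elsif (fn_name = FUNCTION_SUGAR[head])
    # Sugar form: the bare word itself selects the function.
    CallExpression.new(fn_name, args)
  else
    raise ArgumentError, "#{head} is not function sugar"
  end
end

sugar    = lower_call('outer', %i[pixels lights])
explicit = lower_call('fn', %i[outer pixels lights])
puts sugar == explicit
```

Because both branches build the same struct, later stages would not need to know which spelling the schema author used.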

data/CLAUDE.md CHANGED

@@ -1,120 +1,59 @@
-# Kumi Parser - Technical Context
-## Current Architecture (January 2025)
-## Key Files
-- `lib/kumi/parser/smart_tokenizer.rb` - Tokenizer with context tracking
-- `lib/kumi/parser/direct_parser.rb` - Parser implementation (renamed from direct_ast_parser.rb)
-- `lib/kumi/parser/token_metadata.rb` - Token types and metadata
-- `lib/kumi/parser/text_parser.rb` - Public API maintaining compatibility
-- `lib/kumi/parser/base.rb` - Core parsing interface
-- `lib/kumi/parser/syntax_validator.rb` - Validation with proper diagnostics
-- `lib/kumi/parser/errors.rb` - Custom error types
-## Important Syntax Rules
-- **Functions**: `fn(:symbol, args...)` only (no dot notation like `fn.max()`)
-- **Operators**: Standard precedence (*/% > +- > comparisons > & > |)
-- **Array access**: Uses `array[index]` syntax (converted to `:at` function internally)
-- **Equality**: `==` and `!=` operators (converted from `:eq`/`:ne` tokens)
-- **Multi-line expressions**: Parser skips newlines within expressions
-- **Cascade**: `value :name do ... on condition, result ... base result ... end`
-- **Constants**: Text parser cannot resolve Ruby constants - use inline values
-## AST Structure & Compatibility
-All nodes from `Kumi::Syntax::*` (defined in main kumi gem):
-- `Root(inputs, values, traits)`
-- `InputDeclaration(name, domain, type, children)`
-- `ValueDeclaration(name, expression)`
-- `TraitDeclaration(name, expression)`
-- `CallExpression(fn_name, args)`
-- `InputReference(name)` / `InputElementReference(path)`
-- `DeclarationReference(name)`
-- `Literal(value)`
-- `CascadeExpression(cases)` / `CaseExpression(condition, result)`
-- `ArrayExpression(elements)`
-**Ruby DSL Compatibility**:
-- Cascade conditions: Simple trait references wrapped in `all?([trait])` function calls
-- Array access: `[index]` becomes `CallExpression(:at, [array, index])`
-- Operators: `:eq` → `:==`, `:ne` → `:!=` for consistency
-- Constants: Ruby constants resolved to values in DSL, remain as `DeclarationReference` in text parser
-## Debugging & Testing
-**View AST structure**:
-```ruby
-ast = Kumi::Parser::TextParser.parse(schema)
-puts Kumi::Support::SExpressionPrinter.print(ast)
-# => (Root
-#      inputs: [(InputDeclaration :income :float)]
-#      values: [(ValueDeclaration :tax (CallExpression :+ ...))]
-#      traits: [(TraitDeclaration :adult (CallExpression :>= ...))])
-```
-**Quick validation test**:
-```ruby
-ruby -r./lib/kumi/parser/text_parser -e "p Kumi::Parser::TextParser.valid?('schema do input do float :x end end')"
-```
-**Compare with Ruby DSL**:
-```ruby
-# Define schema in Ruby
-module TestSchema
-  extend Kumi::Schema
-  schema do
-    input do
-      float :income
-    end
-    value :tax, fn(:calc, input.income)
-  end
-end
-# Parse equivalent text
-text_ast = Kumi::Parser::TextParser.parse(<<~KUMI)
-  schema do
-    input do
-      float :income
-    end
-    value :tax, fn(:calc, input.income)
-  end
-KUMI
-# Compare ASTs
-ruby_ast = TestSchema.__kumi_syntax_tree__
-text_ast == ruby_ast # Should be true
-```
-- Tax schema in `spec/kumi/parser/text_parser_example tax_schema_spec.rb` is canonical test
-- Run all tests: `rspec spec/kumi/parser/`
-- Integration tests: `rspec spec/kumi/parser/text_parser_integration_spec.rb`
-## Error Handling & Validation
-- **Parse errors**: `Kumi::Parser::Errors::ParseError` (internal) → `Kumi::Errors::SyntaxError` (public API)
-- **Tokenizer errors**: `Kumi::Parser::Errors::TokenizerError` with location info
-- **Diagnostics**: Use `SyntaxValidator` for detailed error reporting with line/column info
-- **Location tracking**: All tokens and AST nodes include `Kumi::Syntax::Location(file, line, column)`
-## Test Status (January 2025)
-✅ **All specs passing**: 32 examples, 0 failures, 1 pending
-- ✅ Syntax validation with proper diagnostics
-- ✅ AST compatibility with Ruby DSL (when constants aren't used)
-- ✅ Integration with analyzer and compiler
-- ✅ End-to-end execution testing
-- ✅ Error type compatibility
-## Known Limitations
-- **Ruby constants**: Text parser cannot resolve Ruby constants like `CONST_NAME` - use inline values instead
-- **Domain specification**: Parsing not fully implemented
-- **Diagnostic APIs**: Monaco/CodeMirror/JSON format methods not implemented
-## Performance
-- Tokenization: <1ms for typical schemas
-- Parsing: ~4ms for complete tax schema (21 values, 4 traits)
-- Direct AST construction eliminates transformation overhead
+# Kumi Parser — Technical Context
+## Architecture
+`source → Lexer → tokens → Parser → AST`. The AST is kumi-core's
+`Kumi::Syntax::*` (so this gem depends on `kumi`).
+- `lib/kumi/parser/lexer.rb` — single-pass `StringScanner` lexer. Emits a flat
+  array of `Token`s; newlines and comments are emitted but skipped by the
+  parser. Context-free (no input/schema context stack).
+- `lib/kumi/parser/parser.rb` — recursive descent for declarations, Pratt for
+  expressions. Builds `Kumi::Syntax::*` nodes directly.
+- `lib/kumi/parser/grammar.rb` — lookup tables: `KEYWORDS`, `TYPE_KEYWORDS`,
+  `FUNCTION_SUGAR`, `BINARY_OPERATORS` (precedence + associativity + fn name),
+  `BOOLEANS`. Replaces the old `TOKEN_METADATA` bag.
+- `lib/kumi/parser/token.rb` — `Struct(:kind, :value, :offset)`. No metadata.
+- `lib/kumi/parser/source.rb` — offset → `Location` and caret code frames.
+- `lib/kumi/parser/parse_error.rb` — `ParseError` (located, framed).
+- `lib/kumi/parser/text_parser.rb` — public API: `parse` / `valid?` /
+  `validate`. Raises `Kumi::Errors::SyntaxError` carrying a `Location`.
+## Grammar notes
+- Functions: `fn(:name, args...)` or bare sugar `name(...)` for the entries in
+  `FUNCTION_SUGAR` (`select`, `shift`, `roll`, `cross`, `outer`, `index`, the
+  `to_*` casts). `select` lowers to `:select`.
+- Operators lower to fn names: `==`→`:==`, `!=`→`:!=`, `**`→`:power`,
+  `&`→`:and`, `|`→`:or`, the rest to `:add`/`:multiply`/etc.
+- `array[i]` → `CallExpression(:at, [array, i])`.
+- Unary minus on a non-literal → `subtract(0, x)`; a `-` directly before a digit
+  in operand position is a negative literal (`Literal(-1)`), while a spaced `-`
+  after a value is the binary operator.
+- `let :n, …` → `ValueDeclaration` with `hints: { inline: true }`.
+- Cascade `on`: a single trait ref is wrapped in `cascade_and([ref])`; multiple
+  conditions become `cascade_and([...])`.
+- Function-option kwargs (`policy: :clamp`, `axis_offset: 1`) are stored as raw
+  scalars on `CallExpression#opts`. Imported-call kwargs are full expressions
+  on `ImportCall#input_mapping`.
+- A bare capitalized word is an identifier (e.g. `let :W, …` referenced as `W`);
+  only `Foo::Bar` paths are constants. `Float::INFINITY` is the one resolved
+  constant value.
+## Error boundary
+Parse errors are about shape only (unexpected char, missing `end`, malformed
+hash pair). Anything needing meaning — undefined references, types, axes — is
+the analyzer's job in kumi-core and must not be flagged here.
+## Testing
+- `bundle exec rspec` — the parser's own specs (lexer, AST, errors). The
+  `Gemfile` points `kumi` at `../kumi-core` for local dev.
+- **The real equivalence gate is in kumi-core**: from that repo,
+  `bundle exec bin/kumi golden_v2 verify --repr ast` byte-compares the parser's
+  AST against 50 frozen `golden/*/expected/ast.txt` snapshots; the full
+  `golden_v2 verify` runs the whole pipeline (incl. Ruby/JS runtime parity).
+  Keep both green. kumi-core's `Gemfile` points `kumi-parser` at this path for
+  local dev.
+- `bundle exec rubocop` — clean.
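
Reviewer note on the architecture described above: a single-pass `StringScanner` lexer that emits flat `Token`s of kind, value, and offset might look roughly like the following. This is a minimal sketch under assumptions. The token kinds, the regexes, the abridged keyword table, and the `lex` method name are invented for illustration and are not taken from `lexer.rb`.

```ruby
require 'strscan'

# Hypothetical sketch, not the gem's lexer.
# Token shape follows the description above: kind, value, start offset.
Token = Struct.new(:kind, :value, :offset)

# Abridged keyword table (assumption); the real one is Grammar::KEYWORDS.
KEYWORDS = { 'schema' => :schema, 'input' => :input, 'value' => :value,
             'do' => :do, 'end' => :end }.freeze

def lex(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    offset = scanner.pos
    if scanner.skip(/[ \t]+/)
      next # horizontal whitespace produces no token
    elsif scanner.scan(/\n/)
      tokens << Token.new(:newline, "\n", offset) # emitted, skipped later by the parser
    elsif scanner.scan(/#[^\n]*/)
      tokens << Token.new(:comment, scanner.matched, offset)
    elsif scanner.scan(/\d+/)
      tokens << Token.new(:integer, scanner.matched.to_i, offset)
    elsif scanner.scan(/:[a-z_]\w*/)
      tokens << Token.new(:symbol, scanner.matched[1..].to_sym, offset)
    elsif scanner.scan(/[A-Za-z_]\w*/)
      word = scanner.matched
      tokens << Token.new(KEYWORDS.fetch(word, :identifier), word, offset)
    elsif scanner.scan(/>=|<=|==|!=|\*\*|[-+*\/%<>&|.,()\[\]]/)
      tokens << Token.new(:punct, scanner.matched, offset)
    else
      raise ArgumentError, "unexpected character #{scanner.getch.inspect} at offset #{offset}"
    end
  end
  tokens << Token.new(:eof, nil, scanner.pos)
end

lex("value :x, 1 # note\n").each { |t| puts format('%-10s %-10p @%d', t.kind, t.value, t.offset) }
```

The scanner never backtracks and keeps no context stack, which is the property the notes call "context-free". Each token records only where it starts, so line and column can be derived later, and only when an error needs them.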

data/README.md CHANGED

@@ -1,6 +1,6 @@
 # Kumi::Parser
-Text parser for [Kumi](https://github.com/amuta/kumi) schemas. Direct tokenizer → AST construction with ~4ms parse time.
+Text parser for [Kumi](https://github.com/amuta/kumi) schemas: a single-pass lexer feeding a recursive-descent + Pratt parser that builds kumi-core's AST directly, with located, framed parse errors.
 ## Installation
@@ -59,8 +59,7 @@ end
 **Function calls**: `fn(:name, arg1, arg2, ...)`
 **Operators**: `+` `-` `*` `**` `` `/` `%` `>` `<` `>=` `<=` `==` `!=` `&` `|`
 **References**: `input.field`, `value_name`, `array[index]`
-**Strings**: Both `"double"` and `'single'` quotes supported
-**Element syntax**: `element :type, :name` for array element specifications
+**Strings**: Both `"double"` and `'single'` quotes supported
 ## Ruby DSL Differences
@@ -70,9 +69,32 @@ end
 ## Architecture
-- `smart_tokenizer.rb` - Context-aware tokenization with embedded metadata
-- `direct_ast_parser.rb` - Recursive descent parser, direct AST construction
-- `token_metadata.rb` - Token types, precedence, and semantic hints
+The pipeline is `source → Lexer → tokens → Parser → AST`, where the AST is
+kumi-core's `Kumi::Syntax::*` nodes.
+- `lexer.rb` — single-pass `StringScanner` lexer producing a flat array of
+  typed `Token`s, each carrying only its kind, value, and start offset.
+- `parser.rb` — recursive descent for declarations, Pratt for expressions.
+- `grammar.rb` — the lookup tables (keywords, type keywords, function sugar,
+  operator precedence/associativity) shared by the lexer and parser.
+- `source.rb` / `parse_error.rb` — turn a byte offset into a `file:line:col`
+  location and a caret-annotated code frame for error messages.
+- `text_parser.rb` — the public `parse` / `valid?` / `validate` facade.
+### Error reporting
+Parse errors report an exact location, a plain-English description of what was
+expected versus what was found, and a source frame:
+```
+demo.kumi:2:3: expected an `input do` block, but found `value`
+➤    2 |   value :y, input.x
+       |   ^
+```
+Errors are confined to the parse phase. Resolving names, checking types, and
+reasoning about axes are semantic concerns handled later by kumi-core's
+analyzer, not by this gem.
 ## License
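
Reviewer note on the error-reporting section of the README: turning an offset into a `file:line:col` location and a caret frame can be sketched as below. The helper names `locate` and `code_frame`, the local `Location` struct, and the exact gutter layout are assumptions for illustration. The sketch also treats the offset as a character index, which matches a byte offset only for ASCII sources.

```ruby
# Hypothetical sketch, not the gem's Source class.
Location = Struct.new(:file, :line, :column)

# Convert a 0-based offset into a 1-based line and column.
def locate(source, offset, file: '(text)')
  before = source[0, offset]
  line = before.count("\n") + 1
  last_newline = before.rindex("\n")
  line_start = last_newline ? last_newline + 1 : 0
  Location.new(file, line, offset - line_start + 1)
end

# Render the offending line with a caret under the reported column.
def code_frame(source, location)
  text = source.lines[location.line - 1].to_s.chomp
  gutter = format('%4d', location.line)
  "#{gutter} | #{text}\n#{' ' * gutter.length} | #{' ' * (location.column - 1)}^"
end

source = "schema do\n  value :y, input.x\nend\n"
loc = locate(source, source.index('value'), file: 'demo.kumi')
puts "#{loc.file}:#{loc.line}:#{loc.column}: expected an `input do` block, but found `value`"
puts code_frame(source, loc)
```

Keeping the location as data and building the frame from it separately is what lets a caller render the frame itself instead of scraping the message string.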

data/examples/parse_and_inspect.rb ADDED

@@ -0,0 +1,34 @@
+# frozen_string_literal: true
+# Parse a Kumi text schema and print its AST, then show what a parse error
+# looks like. Run with: ruby -Ilib examples/parse_and_inspect.rb
+require 'kumi-parser'
+schema = <<~KUMI
+  schema do
+    input do
+      integer :age, domain: 18..120
+      array :scores do
+        float :value
+      end
+    end
+    trait :adult, input.age >= 18
+    let   :total, fn(:sum, input.scores.value)
+    value :tier do
+      on adult, "adult"
+      base "minor"
+    end
+  end
+KUMI
+ast = Kumi::Parser::TextParser.parse(schema)
+puts Kumi::Support::SExpressionPrinter.print(ast)
+puts "\n--- a parse error ---"
+begin
+  Kumi::Parser::TextParser.parse("schema do\n  value :y input.x\nend\n", source_file: 'demo.kumi')
+rescue Kumi::Errors::SyntaxError => e
+  puts "#{e.message} (#{e.location})"
+end

data/kumi-parser.gemspec CHANGED

@@ -12,7 +12,7 @@ Gem::Specification.new do |spec|
   spec.description = 'Allows Kumi schemas to be written as plain text with syntax validation and editor integration.'
   spec.homepage = 'https://github.com/amuta/kumi-parser'
   spec.license = 'MIT'
-  spec.required_ruby_version = '>= 3.0.0'
+  spec.required_ruby_version = '>= 3.1.0'
   spec.metadata['allowed_push_host'] = 'https://rubygems.org'
   spec.metadata['homepage_uri'] = spec.homepage
@@ -31,9 +31,8 @@ Gem::Specification.new do |spec|
   spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
   spec.require_paths = ['lib']
-  # Dependencies
-  spec.add_dependency 'parslet', '~> 2.0'
-  spec.add_dependency 'zeitwerk', '~> 2.6'
+  # The parser builds Kumi::Syntax::* nodes, so it needs the core gem's AST.
+  spec.add_dependency 'kumi'
   # Development dependencies
   spec.add_development_dependency 'bundler', '~> 2.0'

data/lib/kumi/parser/grammar.rb ADDED

@@ -0,0 +1,120 @@
+# frozen_string_literal: true
+module Kumi
+  module Parser
+    # Static grammar tables shared by the lexer and parser. These replace the
+    # old per-token TOKEN_METADATA bag: lookups are keyed by a word or token
+    # kind, so the data lives once here instead of being copied onto every
+    # token instance.
+    module Grammar
+      # Bare words that are keywords rather than identifiers. `true`/`false` are
+      # not here: they are boolean literals, handled directly by the lexer.
+      KEYWORDS = {
+        'schema' => :schema,
+        'input' => :input,
+        'value' => :value,
+        'let' => :let,
+        'trait' => :trait,
+        'import' => :import,
+        'codegen' => :codegen,
+        'do' => :do,
+        'end' => :end,
+        'on' => :on,
+        'base' => :base,
+        'fn' => :fn
+      }.freeze
+      # The two boolean literal words, mapped to their Ruby values.
+      BOOLEANS = { 'true' => true, 'false' => false }.freeze
+      # Type keywords introduce input declarations. The value is the canonical
+      # type symbol stored on InputDeclaration.
+      TYPE_KEYWORDS = {
+        'integer' => :integer,
+        'float' => :float,
+        'decimal' => :decimal,
+        'string' => :string,
+        'boolean' => :boolean,
+        'any' => :any,
+        'array' => :array,
+        'hash' => :hash
+      }.freeze
+      # Container types whose declarations may open a nested `do … end` block.
+      CONTAINER_TYPES = %i[array hash].freeze
+      # Bare-call sugar: `name(args)` parses as `fn(:resolved_name, args)`.
+      # The value is the function symbol the call lowers to.
+      FUNCTION_SUGAR = {
+        'select' => :select,
+        'shift' => :shift,
+        'roll' => :roll,
+        'cross' => :cross,
+        'outer' => :outer,
+        'index' => :index,
+        'to_decimal' => :to_decimal,
+        'to_integer' => :to_integer,
+        'to_float' => :to_float,
+        'to_string' => :to_string
+      }.freeze
+      # Binary operators: kind => [precedence, :left/:right, fn_name].
+      # Higher precedence binds tighter. fn_name is the symbol the operator
+      # lowers to in the AST (CallExpression fn_name).
+      BINARY_OPERATORS = {
+        power: [7, :right, :power],
+        multiply: [6, :left, :multiply],
+        divide: [6, :left,  :divide],
+        modulo: [6, :left,  :modulo],
+        add: [5, :left, :add],
+        subtract: [5, :left, :subtract],
+        gte: [4, :left,  :>=],
+        lte: [4, :left,  :<=],
+        gt: [4, :left,  :>],
+        lt: [4, :left,  :<],
+        eq: [4, :left,  :==],
+        ne: [4, :left,  :!=],
+        and: [3, :left, :and],
+        or: [2, :left, :or]
+      }.freeze
+      module_function
+      def keyword(word)
+        KEYWORDS[word]
+      end
+      def boolean?(word)
+        BOOLEANS.key?(word)
+      end
+      def boolean(word)
+        BOOLEANS[word]
+      end
+      def type_keyword(word)
+        TYPE_KEYWORDS[word]
+      end
+      def function_sugar(word)
+        FUNCTION_SUGAR[word]
+      end
+      def binary_operator?(kind)
+        BINARY_OPERATORS.key?(kind)
+      end
+      def precedence(kind)
+        BINARY_OPERATORS.fetch(kind)[0]
+      end
+      def right_associative?(kind)
+        BINARY_OPERATORS.fetch(kind)[1] == :right
+      end
+      def operator_fn(kind)
+        BINARY_OPERATORS.fetch(kind)[2]
+      end
+    end
+  end
+end
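
Reviewer note on `BINARY_OPERATORS`: the `[precedence, associativity, fn_name]` triples are the shape a Pratt (precedence-climbing) expression loop consumes. The following is a minimal sketch of such a loop. `MiniPratt` and its `[kind, value]` token pairs are hypothetical and much simpler than `parser.rb`. It handles only numeric operands and an abridged copy of the table.

```ruby
# Hypothetical sketch, not the gem's Parser class.
# Abridged copy of the table above: kind => [precedence, associativity, fn_name].
BINARY_OPERATORS = {
  power: [7, :right, :power],
  multiply: [6, :left, :multiply],
  add: [5, :left, :add],
  subtract: [5, :left, :subtract],
  gt: [4, :left, :>],
  and: [3, :left, :and]
}.freeze

# Tokens are [kind, value] pairs; operands are [:number, n].
class MiniPratt
  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  # Parse an expression whose operators all have precedence >= min_prec.
  def parse_expression(min_prec = 0)
    left = parse_operand
    loop do
      kind = peek_kind
      break unless BINARY_OPERATORS.key?(kind)

      prec, assoc, fn_name = BINARY_OPERATORS.fetch(kind)
      break if prec < min_prec

      @pos += 1
      # Left-associative: the right side must bind strictly tighter.
      # Right-associative: the right side may reuse the same precedence.
      next_min = assoc == :right ? prec : prec + 1
      left = [fn_name, left, parse_expression(next_min)]
    end
    left
  end

  private

  def peek_kind
    token = @tokens[@pos]
    token && token[0]
  end

  def parse_operand
    kind, value = @tokens[@pos]
    raise ArgumentError, "expected a number, but found #{kind.inspect}" unless kind == :number

    @pos += 1
    value
  end
end

# 1 + 2 * 3
tokens = [[:number, 1], [:add, nil], [:number, 2], [:multiply, nil], [:number, 3]]
p MiniPratt.new(tokens).parse_expression # => [:add, 1, [:multiply, 2, 3]]
```

With this table, `2 ** 3 ** 2` groups to the right and `5 - 2 - 1` groups to the left, without a separate grammar rule per precedence level.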