ruby_llm-contract 0.3.0 → 0.3.6 (RubyGems version diff)


Files changed (20)

checksums.yaml +4 -4
data/CHANGELOG.md +46 -0
data/Gemfile.lock +2 -2
data/lib/ruby_llm/contract/adapters/ruby_llm.rb +3 -3
data/lib/ruby_llm/contract/concerns/eval_host.rb +1 -0
data/lib/ruby_llm/contract/contract/schema_validator.rb +70 -3
data/lib/ruby_llm/contract/eval/baseline_diff.rb +5 -1
data/lib/ruby_llm/contract/eval/eval_definition.rb +25 -4
data/lib/ruby_llm/contract/eval/report.rb +1 -1
data/lib/ruby_llm/contract/eval/runner.rb +2 -1
data/lib/ruby_llm/contract/eval/trait_evaluator.rb +6 -0
data/lib/ruby_llm/contract/pipeline/result.rb +1 -1
data/lib/ruby_llm/contract/prompt/builder.rb +2 -1
data/lib/ruby_llm/contract/step/base.rb +2 -2
data/lib/ruby_llm/contract/step/limit_checker.rb +1 -1
data/lib/ruby_llm/contract/step/retry_policy.rb +1 -1
data/lib/ruby_llm/contract/step/runner.rb +7 -1
data/lib/ruby_llm/contract/version.rb +1 -1
data/lib/ruby_llm/contract.rb +16 -0
metadata +1 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: b032109a7818caa3f68cae651f9f99210765d4257825f52a332944a6120ad522
-  data.tar.gz: 8f4c1bb95cbcf79236723e100becf8c8f2b87061bd7c29827152e4d716a99ce3
+  metadata.gz: 35a61fe65d6a7939e3ef22bdd37732d2ae6cd5643f51d595a3f26b4281eea396
+  data.tar.gz: 9b1b95b29c31e433af60c25e85dfdebf3e8e71cb85c0e568835309a7cd855926
 SHA512:
-  metadata.gz: e84f8e58367e2eae1ea6a0a712e125be6b3edb361ce6feca984c659f15ca11ce658143adf7fdfcd09f5c1ff57d09fad31e431320f780dd08da7ab7499dd9b961
-  data.tar.gz: 29c98d8fb09a92df1a88136d7c67094784fdf2ae01ae9ec1aaa3fc5f1cd589fd27c7139c84663ba9e49c89e5537f98480eb451076c8a00dffcccfc3bf062f5d8
+  metadata.gz: 0bb0333b6c362b1687b51f6bf360fd6d659c066a2a5b4b539bab4795150e5c1c8dbebe8dac6d05791b62958058d60418e5ff1f2b5db1f050f29412ed136494a5
+  data.tar.gz: ff5a8e7c30344993617bdd5f85d857e91d0cb633e2b7fe35a08aadf0790a4c7c0389cb017f92a192d199fe1eaba9526c509d5731321b36bd2c6e5fdedb5ca6d0

data/CHANGELOG.md CHANGED

@@ -1,5 +1,51 @@
 # Changelog
+## 0.3.6 (2026-03-24)
+- **Recursive array/object validation** — nested arrays (`array of array of string`) validated recursively. Object items validated even without `:properties` (e.g. `additionalProperties: false`).
+- **Deep symbolize in sample pre-validation** — array samples with string keys (`[{"name" => "Alice"}]`) correctly symbolized before schema validation.
+## 0.3.5 (2026-03-24)
+- **String constraints in SchemaValidator** — `minLength`/`maxLength` enforced for root and nested strings.
+- **Array item validation** — scalar items (string, integer) validated against items schema type and constraints.
+- **Non-JSON sample_response fails fast** — `sample_response("hello")` with object schema raises ArgumentError at definition time instead of silently passing.
+- **`max_tokens` in KNOWN_CONTEXT_KEYS** — no more spurious "Unknown context keys" warning.
+- **Duplicate models deduplicated** — `compare_models(models: ["m", "m"])` runs model once.
+## 0.3.4 (2026-03-24)
+- **SchemaValidator validates non-object roots** — boolean, integer, number, array root schemas now enforce type, min/max, enum, minItems/maxItems. Previously only object schemas were validated.
+- **Removed passing cases = regression** — `regressed?` returns true when baseline had passing cases that are now missing. Prevents gate bypass by deleting eval cases.
+- **JSON string sample_response fixed** — `sample_response('{"name":"Alice"}')` correctly parsed for pre-validation instead of double-encoding.
+- **`context[:max_tokens]` forwarded** — overrides step's `max_output` for adapter call AND budget precheck.
+## 0.3.3 (2026-03-23)
+- **Skipped cases visible in regression diff** — baseline PASS → current SKIP now detected as regression by `without_regressions` and `fail_on_regression`.
+- **Skip only on missing adapter** — eval runner no longer masks evaluator errors as SKIP. Only "No adapter configured" triggers skip.
+- **Array/Hash sample pre-validation** — `sample_response([{...}])` correctly validated against schema instead of silently skipping.
+- **`assume_model_exists: false` forwarded** — boolean `false` no longer dropped by truthiness check in adapter options.
+- **Duplicate case names caught at definition** — `add_case`/`verify` with same name raises immediately, not at run time.
+## 0.3.2 (2026-03-23)
+- **Array response preserved** — `Adapters::RubyLLM` no longer stringifies Array content. Steps with `output_type Array` work correctly.
+- **Falsy prompt input** — `run(false)` and `build_messages(false)` pass `false` to dynamic prompt blocks instead of falling back to `instance_eval`.
+- **`retry_on` flatten** — `retry_on([:a, :b])` no longer wraps in nested array.
+- **Builder reset** — `Prompt::Builder` resets nodes on each build (no accumulation on reuse).
+- **Pipeline false output** — `output: false` no longer shows "(no output)" in pretty_print.
+## 0.3.1 (2026-03-23)
+Fixes from persona_tool production deployment (4 services migrated).
+- **Proc/Lambda in `expected_traits`** — `expected_traits: { score: ->(v) { v > 3 } }` now works.
+- **Zeitwerk eager-load** — `load_evals!` eager-loads `app/contracts/` and `app/steps/` before loading eval files. Fixes uninitialized constant errors in Rake tasks.
+- **Falsy values** — `expected: false`, `input: false`, `sample_response(nil)` all handled correctly.
+- **Context key forwarding** — `provider:` and `assume_model_exists:` forwarded to adapter. `schema:` and `max_tokens:` are step-level only (no split-brain).
+- **Deep-freeze immutability** — constructors never mutate caller's data.
 ## 0.3.0 (2026-03-23)
 Baseline regression detection — know when quality drops before users do.

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    ruby_llm-contract (0.3.0)
+    ruby_llm-contract (0.3.6)
       dry-types (~> 1.7)
       ruby_llm (~> 1.0)
       ruby_llm-schema (~> 0.3)
@@ -165,7 +165,7 @@ CHECKSUMS
   rubocop-ast (1.49.1) sha256=4412f3ee70f6fe4546cc489548e0f6fcf76cafcfa80fa03af67098ffed755035
   ruby-progressbar (1.13.0) sha256=80fc9c47a9b640d6834e0dc7b3c94c9df37f08cb072b7761e4a71e22cff29b33
   ruby_llm (1.14.0) sha256=57c6f7034fc4a44504ea137d70f853b07824f1c1cdbe774ab3ab3522e7098deb
-  ruby_llm-contract (0.3.0)
+  ruby_llm-contract (0.3.6)
   ruby_llm-schema (0.3.0) sha256=a591edc5ca1b7f0304f0e2261de61ba4b3bea17be09f5cf7558153adfda3dec6
   unicode-display_width (3.2.0) sha256=0cdd96b5681a5949cdbc2c55e7b420facae74c4aaf9a9815eee1087cb1853c42
   unicode-emoji (4.2.0) sha256=519e69150f75652e40bf736106cfbc8f0f73aa3fb6a65afe62fefa7f80b0f80f

data/lib/ruby_llm/contract/adapters/ruby_llm.rb CHANGED

@@ -43,8 +43,8 @@ module RubyLLM
         def chat_constructor_options(options)
           opts = { model: options[:model] }
-          opts[:provider] = options[:provider] if options[:provider]
-          opts[:assume_model_exists] = options[:assume_model_exists] if options[:assume_model_exists]
+          opts[:provider] = options[:provider] if options.key?(:provider)
+          opts[:assume_model_exists] = options[:assume_model_exists] if options.key?(:assume_model_exists)
           opts
         end
@@ -57,7 +57,7 @@ module RubyLLM
         def build_response(response)
           content = response.content
-          content = content.to_s unless content.is_a?(Hash)
+          content = content.to_s unless content.is_a?(Hash) || content.is_a?(Array)
           Response.new(
             content: content,
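
The guard change in `chat_constructor_options` can be illustrated in isolation. The sketch below is a minimal stand-alone comparison, not the gem's code: both method names and the model string are hypothetical, and only the two guard styles are taken from the hunk above.

```ruby
# Hypothetical stand-ins for the old and new guard; not the gem's own methods.
def options_with_truthiness_guard(options)
  opts = { model: options[:model] }
  # Old style: a false value fails the truthiness check and is silently dropped.
  opts[:assume_model_exists] = options[:assume_model_exists] if options[:assume_model_exists]
  opts
end

def options_with_key_guard(options)
  opts = { model: options[:model] }
  # New style: the value is forwarded whenever the caller supplied the key.
  opts[:assume_model_exists] = options[:assume_model_exists] if options.key?(:assume_model_exists)
  opts
end

caller_options = { model: "some-model", assume_model_exists: false }

old_opts = options_with_truthiness_guard(caller_options)
new_opts = options_with_key_guard(caller_options)

puts old_opts.inspect
puts new_opts.inspect
```

One consequence of the `key?` form is that an explicit `nil` is forwarded as well, since the key is present even though the value is not truthy.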

data/lib/ruby_llm/contract/concerns/eval_host.rb CHANGED

@@ -46,6 +46,7 @@ module RubyLLM
         def compare_models(eval_name, models:, context: {})
           context ||= {}
+          models = models.uniq
           reports = models.each_with_object({}) do |model, hash|
             model_context = deep_dup_context(context).merge(model: model)
             hash[model] = run_single_eval(eval_name, model_context)

data/lib/ruby_llm/contract/contract/schema_validator.rb CHANGED

@@ -40,10 +40,77 @@ module RubyLLM
       def validate_non_hash_output
         expected_type = @json_schema[:type]&.to_s
         if expected_type == "object" || @json_schema.key?(:properties)
-          ["expected object, got #{@output.class}"]
-        else
-          []
+          return ["expected object, got #{@output.class}"]
+        end
+        errors = []
+        validate_type_match(errors, @output, expected_type, "root") if expected_type
+        validate_constraints(errors, @output, @json_schema, "root")
+        if expected_type == "array" && @output.is_a?(Array) && @json_schema[:items]
+          validate_array_items(errors, @output, @json_schema[:items], "")
+        end
+        errors
+      end
+      def validate_array_items(errors, array, items_schema, prefix)
+        array.each_with_index do |item, i|
+          item_prefix = "#{prefix}[#{i}]"
+          validate_value(errors, item, items_schema, item_prefix)
+        end
+      end
+      def validate_value(errors, value, schema, prefix)
+        value_type = schema[:type]&.to_s
+        validate_type_match(errors, value, value_type, prefix) if value_type
+        validate_constraints(errors, value, schema, prefix)
+        if value.is_a?(Hash) && (schema.key?(:properties) || value_type == "object")
+          validate_object(value, schema, prefix: prefix)
+          errors.concat(@errors)
+          @errors = []
+        elsif value.is_a?(Array) && schema[:items]
+          validate_array_items(errors, value, schema[:items], prefix)
+        end
+      end
+      def validate_type_match(errors, value, expected_type, prefix)
+        valid = case expected_type
+                when "string" then value.is_a?(String)
+                when "integer" then value.is_a?(Integer)
+                when "number" then value.is_a?(Numeric)
+                when "boolean" then value.is_a?(TrueClass) || value.is_a?(FalseClass)
+                when "array" then value.is_a?(Array)
+                else true
+                end
+        errors << "#{prefix}: expected #{expected_type}, got #{value.class}" unless valid
+      end
+      def validate_constraints(errors, value, schema, prefix)
+        if schema[:minimum] && value.is_a?(Numeric) && value < schema[:minimum]
+          errors << "#{prefix}: #{value} is less than minimum #{schema[:minimum]}"
+        end
+        if schema[:maximum] && value.is_a?(Numeric) && value > schema[:maximum]
+          errors << "#{prefix}: #{value} is greater than maximum #{schema[:maximum]}"
+        end
+        if schema[:enum] && !schema[:enum].include?(value)
+          errors << "#{prefix}: #{value.inspect} is not in enum #{schema[:enum].inspect}"
+        end
+        if schema[:minItems] && value.is_a?(Array) && value.length < schema[:minItems]
+          errors << "#{prefix}: array has #{value.length} items, minimum #{schema[:minItems]}"
+        end
+        if schema[:maxItems] && value.is_a?(Array) && value.length > schema[:maxItems]
+          errors << "#{prefix}: array has #{value.length} items, maximum #{schema[:maxItems]}"
+        end
+        if schema[:minLength] && value.is_a?(String) && value.length < schema[:minLength]
+          errors << "#{prefix}: string length #{value.length} is less than minLength #{schema[:minLength]}"
+        end
+        if schema[:maxLength] && value.is_a?(String) && value.length > schema[:maxLength]
+          errors << "#{prefix}: string length #{value.length} is greater than maxLength #{schema[:maxLength]}"
         end
       end
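
The recursion added here (type check, then constraints, then descent into `items`) can be sketched without the rest of the class. The module below is a simplified, hypothetical `MiniValidator`, not the gem's `SchemaValidator`: it handles only type checks, `minLength`/`maxLength`, and nested arrays, and it prefixes every path with `root`, which may differ from the gem's messages.

```ruby
# Simplified stand-alone sketch; MiniValidator is a hypothetical name.
module MiniValidator
  TYPE_CHECKS = {
    "string"  => ->(v) { v.is_a?(String) },
    "integer" => ->(v) { v.is_a?(Integer) },
    "number"  => ->(v) { v.is_a?(Numeric) },
    "boolean" => ->(v) { v == true || v == false },
    "array"   => ->(v) { v.is_a?(Array) }
  }.freeze

  module_function

  # Validates value against schema and returns the accumulated error strings.
  def validate(value, schema, prefix = "root", errors = [])
    type = schema[:type]&.to_s
    check = TYPE_CHECKS[type]
    errors << "#{prefix}: expected #{type}, got #{value.class}" if check && !check.call(value)
    check_length(value, schema, prefix, errors) if value.is_a?(String)
    # Recurse into each element so arrays of arrays are covered too.
    if value.is_a?(Array) && schema[:items]
      value.each_with_index do |item, i|
        validate(item, schema[:items], "#{prefix}[#{i}]", errors)
      end
    end
    errors
  end

  def check_length(value, schema, prefix, errors)
    min = schema[:minLength]
    max = schema[:maxLength]
    errors << "#{prefix}: string length #{value.length} is less than minLength #{min}" if min && value.length < min
    errors << "#{prefix}: string length #{value.length} is greater than maxLength #{max}" if max && value.length > max
  end
end

schema = { type: "array", items: { type: "array", items: { type: "string", minLength: 2 } } }
errors = MiniValidator.validate([["ok", "x"], [42]], schema)
puts errors
```

Passing one shared `errors` array down the recursion keeps the error order aligned with traversal order, which is the same accumulation style the hunk above uses.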

data/lib/ruby_llm/contract/eval/baseline_diff.rb CHANGED

@@ -48,7 +48,11 @@ module RubyLLM
         end
         def regressed?
-          regressions.any?
+          regressions.any? || removed_passing_cases.any?
+        end
+        def removed_passing_cases
+          removed_cases.select { |name| @baseline[name]&.dig(:passed) }
         end
         def improved?
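
To see why removed passing cases now count as a regression, here is a minimal sketch under stated assumptions: `MiniBaselineDiff` is a hypothetical class, and both inputs are assumed to be hashes mapping a case name to `{ passed: true/false }`. The real `BaselineDiff` takes different constructor arguments and tracks more state.

```ruby
# Hypothetical simplified model of the regression gate.
class MiniBaselineDiff
  def initialize(baseline:, current:)
    @baseline = baseline
    @current = current
  end

  # Cases that passed in the baseline and fail in the current run.
  def regressions
    @baseline.select do |name, kase|
      kase[:passed] && @current.key?(name) && !@current[name][:passed]
    end.keys
  end

  # Cases present in the baseline but absent from the current run.
  def removed_cases
    @baseline.keys - @current.keys
  end

  def removed_passing_cases
    removed_cases.select { |name| @baseline[name]&.dig(:passed) }
  end

  def regressed?
    regressions.any? || removed_passing_cases.any?
  end
end

baseline = { "greets user" => { passed: true }, "handles empty" => { passed: false } }

# Deleting a case that used to pass trips the gate.
diff_a = MiniBaselineDiff.new(baseline: baseline, current: { "handles empty" => { passed: false } })
# Deleting a case that was already failing does not.
diff_b = MiniBaselineDiff.new(baseline: baseline, current: { "greets user" => { passed: true } })

puts diff_a.regressed?
puts diff_b.regressed?
```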

data/lib/ruby_llm/contract/eval/eval_definition.rb CHANGED

@@ -34,6 +34,7 @@ module RubyLLM
         def add_case(description, input: nil, expected: nil, expected_traits: nil, evaluator: nil)
           case_input = input.nil? ? @default_input : input
           raise ArgumentError, "add_case requires input (set default_input or pass input:)" if case_input.nil?
+          validate_unique_case_name!(description)
           @cases << {
             name: description,
@@ -52,6 +53,7 @@ module RubyLLM
           expected_or_proc = expect unless expect.nil?
           case_input = input.nil? ? @default_input : input
           validate_verify_args!(expected_or_proc, case_input)
+          validate_unique_case_name!(description)
           evaluator = expected_or_proc.is_a?(::Proc) ? expected_or_proc : nil
@@ -85,6 +87,12 @@ module RubyLLM
           [{ name: "contract check", input: @default_input, expected: nil, evaluator: nil }]
         end
+        def validate_unique_case_name!(name)
+          return unless @cases.any? { |c| c[:name] == name }
+          raise ArgumentError, "Duplicate case name '#{name}' in eval '#{@name}'. Case names must be unique."
+        end
         def validate_verify_args!(expected_or_proc, case_input)
           raise ArgumentError, "verify requires either a positional argument or expect: keyword" if expected_or_proc.nil?
           raise ArgumentError, "verify requires input (set default_input or pass input:)" if case_input.nil?
@@ -98,15 +106,28 @@ module RubyLLM
           return if errors.empty?
           raise ArgumentError, "sample_response does not satisfy step schema: #{errors.join(", ")}"
-        rescue JSON::ParserError
-          # Not JSON -- skip pre-validation
+        rescue JSON::ParserError => e
+          # Non-JSON string with a structured schema = clear error
+          raise ArgumentError, "sample_response is not valid JSON: #{e.message}"
         end
         def validate_sample_against_schema(schema)
-          response_hash = @sample_response.is_a?(Hash) ? @sample_response : JSON.parse(@sample_response.to_s)
-          symbolized = Parser.symbolize_keys(response_hash)
+          parsed = case @sample_response
+                   when Hash, Array then @sample_response
+                   when String then JSON.parse(@sample_response)
+                   else @sample_response
+                   end
+          symbolized = deep_symbolize(parsed)
           SchemaValidator.validate(symbolized, schema)
         end
+        def deep_symbolize(obj)
+          case obj
+          when Hash then Parser.symbolize_keys(obj)
+          when Array then obj.map { |item| deep_symbolize(item) }
+          else obj
+          end
+        end
       end
     end
   end
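
The sample normalization in this file (parse strings as JSON, pass Hash and Array through, then symbolize keys) can be sketched as follows. This is an illustration only: `normalize_sample` is a hypothetical name, and the hash branch here recurses by itself, whereas the gem delegates to `Parser.symbolize_keys`, whose exact behavior is not visible in this diff.

```ruby
require "json"

# Recursively converts string keys to symbols in nested hashes and arrays.
def deep_symbolize(obj)
  case obj
  when Hash
    obj.each_with_object({}) do |(key, value), out|
      out[key.respond_to?(:to_sym) ? key.to_sym : key] = deep_symbolize(value)
    end
  when Array
    obj.map { |item| deep_symbolize(item) }
  else
    obj
  end
end

# Hypothetical helper: strings are parsed as JSON, everything else is used as given.
def normalize_sample(sample)
  parsed = sample.is_a?(String) ? JSON.parse(sample) : sample
  deep_symbolize(parsed)
end

from_json  = normalize_sample('[{"name":"Alice"}]')
from_array = normalize_sample([{ "name" => "Alice" }])

puts from_json.inspect
puts from_array.inspect
```

A non-JSON string such as `"hello"` makes `JSON.parse` raise `JSON::ParserError`, which is the condition the hunk above now converts into an `ArgumentError` instead of skipping pre-validation.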

data/lib/ruby_llm/contract/eval/report.rb CHANGED

@@ -97,7 +97,7 @@ module RubyLLM
           validate_baseline!(baseline_data)
           BaselineDiff.new(
             baseline_cases: baseline_data[:cases],
-            current_cases: evaluated_results.map { |r| serialize_case(r) }
+            current_cases: results.map { |r| serialize_case(r) }
           )
         end

data/lib/ruby_llm/contract/eval/runner.rb CHANGED

@@ -32,7 +32,8 @@ module RubyLLM
           build_case_result(test_case, step_result, eval_result)
         rescue RubyLLM::Contract::Error => e
-          # No adapter configured — skip this case (offline mode without sample_response)
+          raise unless e.message.include?("No adapter configured")
           skipped_result(test_case, e.message)
         end
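
The narrowed rescue can be sketched on its own. Everything below is hypothetical scaffolding (`ContractError`, `run_case`, the result hashes) standing in for the gem's error class and runner. Only the pattern of re-raising unless the message names the missing adapter comes from the hunk above.

```ruby
# Hypothetical stand-in for RubyLLM::Contract::Error.
class ContractError < StandardError; end

# Runs the block for one eval case; only a missing adapter becomes a skip.
def run_case(name)
  yield
  { name: name, status: :ran }
rescue ContractError => e
  # Any other contract error propagates instead of being masked as SKIP.
  raise unless e.message.include?("No adapter configured")

  { name: name, status: :skipped, reason: e.message }
end

skipped = run_case("offline case") { raise ContractError, "No adapter configured" }

propagated = begin
  run_case("broken evaluator") { raise ContractError, "evaluator blew up" }
  false
rescue ContractError
  true
end

puts skipped.inspect
puts propagated
```

Matching on the message text is brittle if the wording ever changes; a dedicated error subclass would be one alternative, though nothing in this diff indicates the gem defines one.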

data/lib/ruby_llm/contract/eval/trait_evaluator.rb CHANGED

@@ -26,6 +26,8 @@ module RubyLLM
         def trait_error(key, value, expectation)
           case expectation
+          when ::Proc
+            trait_proc_error(key, value, expectation)
           when ::Regexp
             trait_regexp_error(key, value, expectation)
           when Range
@@ -56,6 +58,10 @@ module RubyLLM
           "#{key}: expected falsy, got #{value.inspect}" if value
         end
+        def trait_proc_error(key, value, expectation)
+          "#{key}: trait check failed, got #{value.inspect}" unless expectation.call(value)
+        end
         def trait_equality_error(key, value, expectation)
           "#{key}: expected #{expectation.inspect}, got #{value.inspect}" unless value == expectation
         end
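
A stand-alone sketch of the dispatch, assuming the output is a plain hash keyed by trait name. The method names mirror the hunk above, but the bodies for `Regexp`, `Range`, and equality are illustrative guesses rather than the gem's implementation, and `trait_errors` is a hypothetical wrapper.

```ruby
# Returns an error string for a failed trait, or nil when the trait holds.
def trait_error(key, value, expectation)
  case expectation
  when Proc
    "#{key}: trait check failed, got #{value.inspect}" unless expectation.call(value)
  when Regexp
    "#{key}: expected to match #{expectation.inspect}, got #{value.inspect}" unless expectation.match?(value.to_s)
  when Range
    "#{key}: expected in #{expectation.inspect}, got #{value.inspect}" unless expectation.cover?(value)
  else
    "#{key}: expected #{expectation.inspect}, got #{value.inspect}" unless value == expectation
  end
end

# Hypothetical wrapper: collects the errors for every expected trait.
def trait_errors(output, expected_traits)
  expected_traits.map { |key, expectation| trait_error(key, output[key], expectation) }.compact
end

passing = trait_errors({ score: 5, label: "ok" }, { score: ->(v) { v > 3 }, label: /ok/ })
failing = trait_errors({ score: 2 }, { score: ->(v) { v > 3 } })

puts passing.inspect
puts failing.inspect
```

Without a `Proc` branch, a lambda expectation would fall through to the equality comparison and fail for every output, which is presumably what the 0.3.1 changelog entry refers to.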

data/lib/ruby_llm/contract/pipeline/result.rb CHANGED

@@ -116,7 +116,7 @@ module RubyLLM
         end
         def format_output(output)
-          return ["(no output)"] unless output
+          return ["(no output)"] if output.nil?
           pairs = output.is_a?(Hash) ? output : { value: output }
           pairs.map do |key, val|

data/lib/ruby_llm/contract/prompt/builder.rb CHANGED

@@ -10,7 +10,8 @@ module RubyLLM
         end
         def build(input = nil)
-          if input && @block.arity >= 1
+          @nodes = []
+          if !input.nil? && @block.arity >= 1
             instance_exec(input, &@block)
           else
             instance_eval(&@block)
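
Both fixes in `build` (resetting `@nodes` and testing `!input.nil?` rather than truthiness) can be demonstrated with a toy builder. `MiniBuilder` and its `user` method are hypothetical. The body of `build` follows the hunk above, except that this sketch returns the node list directly, which the real builder may not do.

```ruby
# Hypothetical toy builder; only the build logic follows the diff.
class MiniBuilder
  def initialize(&block)
    @block = block
    @nodes = []
  end

  def user(text)
    @nodes << { role: :user, content: text }
  end

  def build(input = nil)
    @nodes = [] # reset so a reused builder does not accumulate nodes
    if !input.nil? && @block.arity >= 1
      instance_exec(input, &@block) # false is a valid input and is passed through
    else
      instance_eval(&@block)
    end
    @nodes
  end
end

builder = MiniBuilder.new { |flag| user("flag=#{flag.inspect}") }
first  = builder.build(false)
second = builder.build(false)

puts first.inspect
puts second.length
```

With the earlier `if input` guard, `build(false)` would take the `instance_eval` branch, where the block parameter receives the builder itself instead of `false`.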

data/lib/ruby_llm/contract/step/base.rb CHANGED

@@ -58,7 +58,7 @@ module RubyLLM
             end
           end
-          KNOWN_CONTEXT_KEYS = %i[adapter model temperature provider assume_model_exists].freeze
+          KNOWN_CONTEXT_KEYS = %i[adapter model temperature max_tokens provider assume_model_exists].freeze
           def run(input, context: {})
             context = (context || {}).transform_keys { |k| k.respond_to?(:to_sym) ? k.to_sym : k }
@@ -68,7 +68,7 @@ module RubyLLM
             policy = retry_policy
             ctx_temp = context[:temperature]
-            extra = context.slice(:provider, :assume_model_exists)
+            extra = context.slice(:provider, :assume_model_exists, :max_tokens)
             result = if policy
                        run_with_retry(input, adapter: adapter, default_model: default_model,
                                       policy: policy, context_temperature: ctx_temp, extra_options: extra)

data/lib/ruby_llm/contract/step/limit_checker.rb CHANGED

@@ -29,7 +29,7 @@ module RubyLLM
         end
         def append_cost_error(estimated, errors)
-          estimated_output = @max_output || 0
+          estimated_output = effective_max_output || 0
           estimated_cost = CostCalculator.calculate(
             model_name: @model,
             usage: { input_tokens: estimated, output_tokens: estimated_output }

data/lib/ruby_llm/contract/step/retry_policy.rb CHANGED

@@ -39,7 +39,7 @@ module RubyLLM
         end
         def retry_on(*statuses)
-          @retryable_statuses = statuses
+          @retryable_statuses = statuses.flatten
         end
         def retryable?(result)
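
The effect of `flatten` on a splat parameter is easy to check in isolation. `MiniRetryPolicy` below is a hypothetical reduction of the policy class, and the status symbols are placeholders rather than statuses the gem is known to define.

```ruby
# Hypothetical reduction of the retry policy; status names are placeholders.
class MiniRetryPolicy
  attr_reader :retryable_statuses

  def retry_on(*statuses)
    # The splat wraps an array argument in another array; flatten undoes that.
    @retryable_statuses = statuses.flatten
  end
end

policy = MiniRetryPolicy.new

policy.retry_on(:status_a, :status_b)
from_list = policy.retryable_statuses

policy.retry_on([:status_a, :status_b])
from_array = policy.retryable_statuses

puts from_list.inspect
puts from_array.inspect
```

Before the change, the second call would have stored `[[:status_a, :status_b]]`, so an `include?` check against a single status symbol would never match.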

data/lib/ruby_llm/contract/step/runner.rb CHANGED

@@ -83,14 +83,20 @@ module RubyLLM
         end
         def build_adapter_options
+          effective_max_tokens = @extra_options[:max_tokens] || @max_output
           { model: @model }.tap do |opts|
             opts[:schema] = @output_schema if @output_schema
-            opts[:max_tokens] = @max_output if @max_output
+            opts[:max_tokens] = effective_max_tokens if effective_max_tokens
             opts[:temperature] = @temperature if @temperature
             @extra_options.each { |k, v| opts[k] = v unless opts.key?(k) }
           end
         end
+        def effective_max_output
+          @extra_options[:max_tokens] || @max_output
+        end
         def build_error_result(error_result, messages)
           Result.new(
             status: error_result.status,
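
The precedence between `context[:max_tokens]` and the step's `max_output` can be sketched as a pure function. The keyword-argument signature, the model string, and the `provider` value below are assumptions made for illustration. The real method reads instance variables and also forwards a schema.

```ruby
# Hypothetical pure-function version of the option builder.
def build_adapter_options(model:, max_output: nil, temperature: nil, extra_options: {})
  # A per-call max_tokens wins over the step-level max_output.
  effective_max_tokens = extra_options[:max_tokens] || max_output
  { model: model }.tap do |opts|
    opts[:max_tokens] = effective_max_tokens if effective_max_tokens
    opts[:temperature] = temperature if temperature
    # Remaining extras are copied only when the key is not already set.
    extra_options.each { |key, value| opts[key] = value unless opts.key?(key) }
  end
end

overridden = build_adapter_options(
  model: "some-model", max_output: 500,
  extra_options: { max_tokens: 100, provider: :example_provider }
)
defaulted = build_adapter_options(model: "some-model", max_output: 500)

puts overridden.inspect
puts defaulted.inspect
```

Because the same fallback appears in `effective_max_output`, the budget precheck and the adapter call should see the same token limit, matching the 0.3.4 changelog entry.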

data/lib/ruby_llm/contract/version.rb CHANGED

@@ -2,6 +2,6 @@
 module RubyLLM
   module Contract
-    VERSION = "0.3.0"
+    VERSION = "0.3.6"
   end
 end

data/lib/ruby_llm/contract.rb CHANGED

@@ -51,6 +51,10 @@ module RubyLLM
         return if dirs.empty?
+        # In Rails, eager-load parent directories so contract classes
+        # are available when eval files reference them.
+        eager_load_contract_dirs! if defined?(::Rails)
         # Clear file-sourced evals ONCE, then load ALL dirs.
         Thread.current[:ruby_llm_contract_reloading] = true
         eval_hosts.each do |host|
@@ -79,6 +83,18 @@ module RubyLLM
         @eval_hosts || []
       end
+      def eager_load_contract_dirs!
+        %w[app/contracts app/steps].each do |path|
+          full = ::Rails.root.join(path)
+          next unless full.exist?
+          ::Rails.autoloaders.main.eager_load_dir(full.to_s)
+        rescue StandardError
+          # Zeitwerk not available or dir not managed — skip
+          nil
+        end
+      end
       def auto_create_adapter!
         require "ruby_llm"
         configuration.default_adapter = Adapters::RubyLLM.new

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ruby_llm-contract
 version: !ruby/object:Gem::Version
-  version: 0.3.0
+  version: 0.3.6
 platform: ruby
 authors:
 - Justyna