ruby_llm-contract 0.3.0 → 0.3.6

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: b032109a7818caa3f68cae651f9f99210765d4257825f52a332944a6120ad522
-   data.tar.gz: 8f4c1bb95cbcf79236723e100becf8c8f2b87061bd7c29827152e4d716a99ce3
+   metadata.gz: 35a61fe65d6a7939e3ef22bdd37732d2ae6cd5643f51d595a3f26b4281eea396
+   data.tar.gz: 9b1b95b29c31e433af60c25e85dfdebf3e8e71cb85c0e568835309a7cd855926
  SHA512:
-   metadata.gz: e84f8e58367e2eae1ea6a0a712e125be6b3edb361ce6feca984c659f15ca11ce658143adf7fdfcd09f5c1ff57d09fad31e431320f780dd08da7ab7499dd9b961
-   data.tar.gz: 29c98d8fb09a92df1a88136d7c67094784fdf2ae01ae9ec1aaa3fc5f1cd589fd27c7139c84663ba9e49c89e5537f98480eb451076c8a00dffcccfc3bf062f5d8
+   metadata.gz: 0bb0333b6c362b1687b51f6bf360fd6d659c066a2a5b4b539bab4795150e5c1c8dbebe8dac6d05791b62958058d60418e5ff1f2b5db1f050f29412ed136494a5
+   data.tar.gz: ff5a8e7c30344993617bdd5f85d857e91d0cb633e2b7fe35a08aadf0790a4c7c0389cb017f92a192d199fe1eaba9526c509d5731321b36bd2c6e5fdedb5ca6d0
data/CHANGELOG.md CHANGED
@@ -1,5 +1,51 @@
  # Changelog
  
+ ## 0.3.6 (2026-03-24)
+
+ - **Recursive array/object validation** — nested arrays (`array of array of string`) validated recursively. Object items validated even without `:properties` (e.g. `additionalProperties: false`).
+ - **Deep symbolize in sample pre-validation** — array samples with string keys (`[{"name" => "Alice"}]`) correctly symbolized before schema validation.
+
+ ## 0.3.5 (2026-03-24)
+
+ - **String constraints in SchemaValidator** — `minLength`/`maxLength` enforced for root and nested strings.
+ - **Array item validation** — scalar items (string, integer) validated against items schema type and constraints.
+ - **Non-JSON sample_response fails fast** — `sample_response("hello")` with object schema raises ArgumentError at definition time instead of silently passing.
+ - **`max_tokens` in KNOWN_CONTEXT_KEYS** — no more spurious "Unknown context keys" warning.
+ - **Duplicate models deduplicated** — `compare_models(models: ["m", "m"])` runs model once.
+
+ ## 0.3.4 (2026-03-24)
+
+ - **SchemaValidator validates non-object roots** — boolean, integer, number, array root schemas now enforce type, min/max, enum, minItems/maxItems. Previously only object schemas were validated.
+ - **Removed passing cases = regression** — `regressed?` returns true when baseline had passing cases that are now missing. Prevents gate bypass by deleting eval cases.
+ - **JSON string sample_response fixed** — `sample_response('{"name":"Alice"}')` correctly parsed for pre-validation instead of double-encoding.
+ - **`context[:max_tokens]` forwarded** — overrides step's `max_output` for adapter call AND budget precheck.
+
+ ## 0.3.3 (2026-03-23)
+
+ - **Skipped cases visible in regression diff** — baseline PASS → current SKIP now detected as regression by `without_regressions` and `fail_on_regression`.
+ - **Skip only on missing adapter** — eval runner no longer masks evaluator errors as SKIP. Only "No adapter configured" triggers skip.
+ - **Array/Hash sample pre-validation** — `sample_response([{...}])` correctly validated against schema instead of silently skipping.
+ - **`assume_model_exists: false` forwarded** — boolean `false` no longer dropped by truthiness check in adapter options.
+ - **Duplicate case names caught at definition** — `add_case`/`verify` with same name raises immediately, not at run time.
+
+ ## 0.3.2 (2026-03-23)
+
+ - **Array response preserved** — `Adapters::RubyLLM` no longer stringifies Array content. Steps with `output_type Array` work correctly.
+ - **Falsy prompt input** — `run(false)` and `build_messages(false)` pass `false` to dynamic prompt blocks instead of falling back to `instance_eval`.
+ - **`retry_on` flatten** — `retry_on([:a, :b])` no longer wraps in nested array.
+ - **Builder reset** — `Prompt::Builder` resets nodes on each build (no accumulation on reuse).
+ - **Pipeline false output** — `output: false` no longer shows "(no output)" in pretty_print.
+
+ ## 0.3.1 (2026-03-23)
+
+ Fixes from persona_tool production deployment (4 services migrated).
+
+ - **Proc/Lambda in `expected_traits`** — `expected_traits: { score: ->(v) { v > 3 } }` now works.
+ - **Zeitwerk eager-load** — `load_evals!` eager-loads `app/contracts/` and `app/steps/` before loading eval files. Fixes uninitialized constant errors in Rake tasks.
+ - **Falsy values** — `expected: false`, `input: false`, `sample_response(nil)` all handled correctly.
+ - **Context key forwarding** — `provider:` and `assume_model_exists:` forwarded to adapter. `schema:` and `max_tokens:` are step-level only (no split-brain).
+ - **Deep-freeze immutability** — constructors never mutate caller's data.
+
  ## 0.3.0 (2026-03-23)
  
  Baseline regression detection — know when quality drops before users do.
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     ruby_llm-contract (0.3.0)
+     ruby_llm-contract (0.3.6)
        dry-types (~> 1.7)
        ruby_llm (~> 1.0)
        ruby_llm-schema (~> 0.3)
@@ -165,7 +165,7 @@ CHECKSUMS
    rubocop-ast (1.49.1) sha256=4412f3ee70f6fe4546cc489548e0f6fcf76cafcfa80fa03af67098ffed755035
    ruby-progressbar (1.13.0) sha256=80fc9c47a9b640d6834e0dc7b3c94c9df37f08cb072b7761e4a71e22cff29b33
    ruby_llm (1.14.0) sha256=57c6f7034fc4a44504ea137d70f853b07824f1c1cdbe774ab3ab3522e7098deb
-   ruby_llm-contract (0.3.0)
+   ruby_llm-contract (0.3.6)
    ruby_llm-schema (0.3.0) sha256=a591edc5ca1b7f0304f0e2261de61ba4b3bea17be09f5cf7558153adfda3dec6
    unicode-display_width (3.2.0) sha256=0cdd96b5681a5949cdbc2c55e7b420facae74c4aaf9a9815eee1087cb1853c42
    unicode-emoji (4.2.0) sha256=519e69150f75652e40bf736106cfbc8f0f73aa3fb6a65afe62fefa7f80b0f80f
@@ -43,8 +43,8 @@ module RubyLLM
  
  def chat_constructor_options(options)
    opts = { model: options[:model] }
-   opts[:provider] = options[:provider] if options[:provider]
-   opts[:assume_model_exists] = options[:assume_model_exists] if options[:assume_model_exists]
+   opts[:provider] = options[:provider] if options.key?(:provider)
+   opts[:assume_model_exists] = options[:assume_model_exists] if options.key?(:assume_model_exists)
    opts
  end
  
@@ -57,7 +57,7 @@ module RubyLLM
  
  def build_response(response)
    content = response.content
-   content = content.to_s unless content.is_a?(Hash)
+   content = content.to_s unless content.is_a?(Hash) || content.is_a?(Array)
  
    Response.new(
      content: content,
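
Both adapter changes above stop legitimate values from being discarded: an explicit `false` option and Array content. The following is a minimal standalone sketch of the two guards. The method names `options_by_truthiness`, `options_by_key`, and `normalize_content` are hypothetical stand-ins for illustration and are not part of the gem's API.

```ruby
# Hypothetical stand-ins for the adapter helpers shown in the diff above.

# Old behaviour: a truthiness guard silently drops an explicit `false`.
def options_by_truthiness(options)
  opts = { model: options[:model] }
  opts[:assume_model_exists] = options[:assume_model_exists] if options[:assume_model_exists]
  opts
end

# New behaviour: `key?` forwards whatever the caller set, including `false`.
def options_by_key(options)
  opts = { model: options[:model] }
  opts[:assume_model_exists] = options[:assume_model_exists] if options.key?(:assume_model_exists)
  opts
end

# Structured content (Hash or Array) is kept as-is; anything else is stringified.
def normalize_content(content)
  return content if content.is_a?(Hash) || content.is_a?(Array)

  content.to_s
end

input = { model: "some-model", assume_model_exists: false }
p options_by_truthiness(input) # the false flag is missing from the result
p options_by_key(input)        # the false flag is forwarded
p normalize_content([{ "id" => 1 }])
```

One consequence of the `key?` guard is that an explicitly passed `nil` is forwarded as well, so callers who want the default should omit the key.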
@@ -46,6 +46,7 @@ module RubyLLM
  
  def compare_models(eval_name, models:, context: {})
    context ||= {}
+   models = models.uniq
    reports = models.each_with_object({}) do |model, hash|
      model_context = deep_dup_context(context).merge(model: model)
      hash[model] = run_single_eval(eval_name, model_context)
@@ -40,10 +40,77 @@ module RubyLLM
  
  def validate_non_hash_output
    expected_type = @json_schema[:type]&.to_s
+
    if expected_type == "object" || @json_schema.key?(:properties)
-     ["expected object, got #{@output.class}"]
-   else
-     []
+     return ["expected object, got #{@output.class}"]
+   end
+
+   errors = []
+   validate_type_match(errors, @output, expected_type, "root") if expected_type
+   validate_constraints(errors, @output, @json_schema, "root")
+
+   if expected_type == "array" && @output.is_a?(Array) && @json_schema[:items]
+     validate_array_items(errors, @output, @json_schema[:items], "")
+   end
+
+   errors
+ end
+
+ def validate_array_items(errors, array, items_schema, prefix)
+   array.each_with_index do |item, i|
+     item_prefix = "#{prefix}[#{i}]"
+     validate_value(errors, item, items_schema, item_prefix)
+   end
+ end
+
+ def validate_value(errors, value, schema, prefix)
+   value_type = schema[:type]&.to_s
+
+   validate_type_match(errors, value, value_type, prefix) if value_type
+   validate_constraints(errors, value, schema, prefix)
+
+   if value.is_a?(Hash) && (schema.key?(:properties) || value_type == "object")
+     validate_object(value, schema, prefix: prefix)
+     errors.concat(@errors)
+     @errors = []
+   elsif value.is_a?(Array) && schema[:items]
+     validate_array_items(errors, value, schema[:items], prefix)
+   end
+ end
+
+ def validate_type_match(errors, value, expected_type, prefix)
+   valid = case expected_type
+           when "string" then value.is_a?(String)
+           when "integer" then value.is_a?(Integer)
+           when "number" then value.is_a?(Numeric)
+           when "boolean" then value.is_a?(TrueClass) || value.is_a?(FalseClass)
+           when "array" then value.is_a?(Array)
+           else true
+           end
+   errors << "#{prefix}: expected #{expected_type}, got #{value.class}" unless valid
+ end
+
+ def validate_constraints(errors, value, schema, prefix)
+   if schema[:minimum] && value.is_a?(Numeric) && value < schema[:minimum]
+     errors << "#{prefix}: #{value} is less than minimum #{schema[:minimum]}"
+   end
+   if schema[:maximum] && value.is_a?(Numeric) && value > schema[:maximum]
+     errors << "#{prefix}: #{value} is greater than maximum #{schema[:maximum]}"
+   end
+   if schema[:enum] && !schema[:enum].include?(value)
+     errors << "#{prefix}: #{value.inspect} is not in enum #{schema[:enum].inspect}"
+   end
+   if schema[:minItems] && value.is_a?(Array) && value.length < schema[:minItems]
+     errors << "#{prefix}: array has #{value.length} items, minimum #{schema[:minItems]}"
+   end
+   if schema[:maxItems] && value.is_a?(Array) && value.length > schema[:maxItems]
+     errors << "#{prefix}: array has #{value.length} items, maximum #{schema[:maxItems]}"
+   end
+   if schema[:minLength] && value.is_a?(String) && value.length < schema[:minLength]
+     errors << "#{prefix}: string length #{value.length} is less than minLength #{schema[:minLength]}"
+   end
+   if schema[:maxLength] && value.is_a?(String) && value.length > schema[:maxLength]
+     errors << "#{prefix}: string length #{value.length} is greater than maxLength #{schema[:maxLength]}"
    end
  end
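
The validator hunk above recurses through array items so that nested arrays and their scalar constraints are checked. Below is a small self-contained sketch of that idea, assuming a symbol-keyed schema Hash. It is illustrative only: the `TYPE_CHECKS` table, the path format, and the single `minLength` constraint are simplifications, not the gem's `SchemaValidator`.

```ruby
# Illustrative only: a tiny recursive validator, not the gem's SchemaValidator.
TYPE_CHECKS = {
  "string"  => ->(v) { v.is_a?(String) },
  "integer" => ->(v) { v.is_a?(Integer) },
  "number"  => ->(v) { v.is_a?(Numeric) },
  "boolean" => ->(v) { v == true || v == false },
  "array"   => ->(v) { v.is_a?(Array) }
}.freeze

# Returns an array of error strings; an empty array means the value is valid.
def validate_value(value, schema, path = "root", errors = [])
  type = schema[:type]&.to_s
  check = TYPE_CHECKS[type]
  errors << "#{path}: expected #{type}, got #{value.class}" if check && !check.call(value)

  # Constraints only apply when the value has the matching Ruby type.
  if value.is_a?(String) && schema[:minLength] && value.length < schema[:minLength]
    errors << "#{path}: shorter than minLength #{schema[:minLength]}"
  end

  # Recurse into each item, extending the path with its index.
  if value.is_a?(Array) && schema[:items]
    value.each_with_index do |item, i|
      validate_value(item, schema[:items], "#{path}[#{i}]", errors)
    end
  end

  errors
end

schema = { type: "array", items: { type: "array", items: { type: "string", minLength: 2 } } }
puts validate_value([["ok", "x"], [3]], schema)
```

Because every level calls the same method with the item's own schema, arbitrarily deep nesting needs no special casing.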
  
@@ -48,7 +48,11 @@ module RubyLLM
  end
  
  def regressed?
-   regressions.any?
+   regressions.any? || removed_passing_cases.any?
+ end
+
+ def removed_passing_cases
+   removed_cases.select { |name| @baseline[name]&.dig(:passed) }
  end
  
  def improved?
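
The `regressed?` change above treats a baseline case that passed and has since been deleted as a regression. A minimal sketch of that rule follows, assuming the baseline and current results are Hashes keyed by case name. `TinyBaselineDiff` and its `regressions` logic are hypothetical; the gem's `BaselineDiff` internals are not shown in this diff.

```ruby
# Hypothetical, simplified stand-in for the gem's BaselineDiff.
class TinyBaselineDiff
  # Both arguments look like { "case name" => { passed: true } }.
  def initialize(baseline:, current:)
    @baseline = baseline
    @current = current
  end

  # Cases present in both runs that passed before and fail now.
  def regressions
    @baseline.select do |name, data|
      data[:passed] && @current.key?(name) && !@current[name][:passed]
    end.keys
  end

  # Cases that existed in the baseline but are missing from the current run.
  def removed_cases
    @baseline.keys - @current.keys
  end

  # Only removed cases that used to pass count against the gate.
  def removed_passing_cases
    removed_cases.select { |name| @baseline[name]&.dig(:passed) }
  end

  def regressed?
    regressions.any? || removed_passing_cases.any?
  end
end

diff = TinyBaselineDiff.new(
  baseline: { "greets user" => { passed: true }, "flaky" => { passed: false } },
  current:  { "flaky" => { passed: false } }
)
p diff.regressed?
```

Deleting a case that was already failing does not trip the check, which keeps cleanup of known-bad cases possible.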
@@ -34,6 +34,7 @@ module RubyLLM
  def add_case(description, input: nil, expected: nil, expected_traits: nil, evaluator: nil)
    case_input = input.nil? ? @default_input : input
    raise ArgumentError, "add_case requires input (set default_input or pass input:)" if case_input.nil?
+   validate_unique_case_name!(description)
  
    @cases << {
      name: description,
@@ -52,6 +53,7 @@ module RubyLLM
  expected_or_proc = expect unless expect.nil?
  case_input = input.nil? ? @default_input : input
  validate_verify_args!(expected_or_proc, case_input)
+ validate_unique_case_name!(description)
  
  evaluator = expected_or_proc.is_a?(::Proc) ? expected_or_proc : nil
  
@@ -85,6 +87,12 @@ module RubyLLM
    [{ name: "contract check", input: @default_input, expected: nil, evaluator: nil }]
  end
  
+ def validate_unique_case_name!(name)
+   return unless @cases.any? { |c| c[:name] == name }
+
+   raise ArgumentError, "Duplicate case name '#{name}' in eval '#{@name}'. Case names must be unique."
+ end
+
  def validate_verify_args!(expected_or_proc, case_input)
    raise ArgumentError, "verify requires either a positional argument or expect: keyword" if expected_or_proc.nil?
    raise ArgumentError, "verify requires input (set default_input or pass input:)" if case_input.nil?
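
The three hunks above add a uniqueness guard that runs when a case is defined rather than when the eval is executed. A reduced sketch of that fail-fast pattern is given here. `TinyEvalDefinition` is a hypothetical class that models only the guard, not the gem's eval DSL.

```ruby
# Hypothetical miniature of an eval definition; only the uniqueness guard is modelled.
class TinyEvalDefinition
  attr_reader :cases

  def initialize(name)
    @name = name
    @cases = []
  end

  # Registers a case, rejecting a name that is already taken.
  def add_case(description, input:)
    validate_unique_case_name!(description)
    @cases << { name: description, input: input }
  end

  private

  def validate_unique_case_name!(name)
    return unless @cases.any? { |c| c[:name] == name }

    raise ArgumentError, "Duplicate case name '#{name}' in eval '#{@name}'. Case names must be unique."
  end
end

definition = TinyEvalDefinition.new("smoke")
definition.add_case("greets user", input: "hi")
begin
  definition.add_case("greets user", input: "hello")
rescue ArgumentError => e
  puts e.message
end
```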
@@ -98,15 +106,28 @@ module RubyLLM
        return if errors.empty?
  
        raise ArgumentError, "sample_response does not satisfy step schema: #{errors.join(", ")}"
-     rescue JSON::ParserError
-       # Not JSON -- skip pre-validation
+     rescue JSON::ParserError => e
+       # Non-JSON string with a structured schema = clear error
+       raise ArgumentError, "sample_response is not valid JSON: #{e.message}"
      end
  
      def validate_sample_against_schema(schema)
-       response_hash = @sample_response.is_a?(Hash) ? @sample_response : JSON.parse(@sample_response.to_s)
-       symbolized = Parser.symbolize_keys(response_hash)
+       parsed = case @sample_response
+                when Hash, Array then @sample_response
+                when String then JSON.parse(@sample_response)
+                else @sample_response
+                end
+       symbolized = deep_symbolize(parsed)
        SchemaValidator.validate(symbolized, schema)
      end
+
+     def deep_symbolize(obj)
+       case obj
+       when Hash then Parser.symbolize_keys(obj)
+       when Array then obj.map { |item| deep_symbolize(item) }
+       else obj
+       end
+     end
    end
  end
  end
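
The sample pre-validation hunk above parses String samples as JSON, passes Hash and Array samples through unchanged, symbolizes keys inside arrays, and turns a parse failure into an `ArgumentError`. Here is a self-contained sketch of the same flow. `normalize_sample` is a hypothetical helper, and this `deep_symbolize` recurses into Hash values itself because the behaviour of the gem's `Parser.symbolize_keys` is not visible in this diff.

```ruby
require "json"

# Recursively convert Hash keys to symbols in nested Hash/Array data.
def deep_symbolize(obj)
  case obj
  when Hash
    obj.each_with_object({}) do |(key, value), out|
      out[key.respond_to?(:to_sym) ? key.to_sym : key] = deep_symbolize(value)
    end
  when Array then obj.map { |item| deep_symbolize(item) }
  else obj
  end
end

# Hypothetical helper: parse String samples, pass structured samples through,
# and surface invalid JSON as an ArgumentError instead of skipping validation.
def normalize_sample(sample)
  parsed = sample.is_a?(String) ? JSON.parse(sample) : sample
  deep_symbolize(parsed)
rescue JSON::ParserError => e
  raise ArgumentError, "sample_response is not valid JSON: #{e.message}"
end

p normalize_sample('{"name":"Alice"}')
p normalize_sample([{ "name" => "Alice" }])
```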
@@ -97,7 +97,7 @@ module RubyLLM
    validate_baseline!(baseline_data)
    BaselineDiff.new(
      baseline_cases: baseline_data[:cases],
-     current_cases: evaluated_results.map { |r| serialize_case(r) }
+     current_cases: results.map { |r| serialize_case(r) }
    )
  end
  
@@ -32,7 +32,8 @@ module RubyLLM
  
    build_case_result(test_case, step_result, eval_result)
  rescue RubyLLM::Contract::Error => e
-   # No adapter configured — skip this case (offline mode without sample_response)
+   raise unless e.message.include?("No adapter configured")
+
    skipped_result(test_case, e.message)
  end
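
The runner hunk above narrows the rescue so that only the "No adapter configured" error becomes a skip and every other contract error propagates. A minimal sketch of that selective re-raise follows. `ContractError` and `run_case` are hypothetical names used for illustration, not the gem's classes.

```ruby
# Hypothetical error class and runner; the gem's real classes are not reproduced here.
class ContractError < StandardError; end

# Runs the block; converts only the missing-adapter error into a skip result.
def run_case
  { status: :ok, value: yield }
rescue ContractError => e
  # A bare `raise` re-raises the current exception unchanged.
  raise unless e.message.include?("No adapter configured")

  { status: :skipped, reason: e.message }
end

p run_case { 42 }
p run_case { raise ContractError, "No adapter configured" }
```

Matching on the message text is brittle if the wording ever changes; a dedicated error subclass would be a sturdier signal, though the diff does not show one.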
  
@@ -26,6 +26,8 @@ module RubyLLM
  
  def trait_error(key, value, expectation)
    case expectation
+   when ::Proc
+     trait_proc_error(key, value, expectation)
    when ::Regexp
      trait_regexp_error(key, value, expectation)
    when Range
@@ -56,6 +58,10 @@ module RubyLLM
    "#{key}: expected falsy, got #{value.inspect}" if value
  end
  
+ def trait_proc_error(key, value, expectation)
+   "#{key}: trait check failed, got #{value.inspect}" unless expectation.call(value)
+ end
+
  def trait_equality_error(key, value, expectation)
    "#{key}: expected #{expectation.inspect}, got #{value.inspect}" unless value == expectation
  end
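
The trait hunks above add a `Proc` branch so that a lambda in `expected_traits` is called with the value instead of being compared by equality. The sketch below shows that dispatch in a trimmed-down form. The Regexp and Range messages are assumptions made for the example; only the Proc and equality messages appear in the diff.

```ruby
# Hypothetical, trimmed-down trait checker: returns an error string or nil.
def trait_error(key, value, expectation)
  case expectation
  when Proc
    # `case` uses Proc === expectation here, i.e. a class check, so the
    # lambda is only invoked in this branch.
    "#{key}: trait check failed, got #{value.inspect}" unless expectation.call(value)
  when Regexp
    "#{key}: expected to match #{expectation.inspect}, got #{value.inspect}" unless value.to_s.match?(expectation)
  when Range
    "#{key}: expected in #{expectation}, got #{value.inspect}" unless expectation.include?(value)
  else
    "#{key}: expected #{expectation.inspect}, got #{value.inspect}" unless value == expectation
  end
end

p trait_error(:score, 5, ->(v) { v > 3 })
p trait_error(:score, 2, ->(v) { v > 3 })
```

Without the `Proc` branch a lambda would fall through to the equality comparison, which can never succeed against a plain value.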
@@ -116,7 +116,7 @@ module RubyLLM
  end
  
  def format_output(output)
-   return ["(no output)"] unless output
+   return ["(no output)"] if output.nil?
  
    pairs = output.is_a?(Hash) ? output : { value: output }
    pairs.map do |key, val|
@@ -10,7 +10,8 @@ module RubyLLM
  end
  
  def build(input = nil)
-   if input && @block.arity >= 1
+   @nodes = []
+   if !input.nil? && @block.arity >= 1
      instance_exec(input, &@block)
    else
      instance_eval(&@block)
@@ -58,7 +58,7 @@ module RubyLLM
    end
  end
  
- KNOWN_CONTEXT_KEYS = %i[adapter model temperature provider assume_model_exists].freeze
+ KNOWN_CONTEXT_KEYS = %i[adapter model temperature max_tokens provider assume_model_exists].freeze
  
  def run(input, context: {})
    context = (context || {}).transform_keys { |k| k.respond_to?(:to_sym) ? k.to_sym : k }
@@ -68,7 +68,7 @@ module RubyLLM
  policy = retry_policy
  
  ctx_temp = context[:temperature]
- extra = context.slice(:provider, :assume_model_exists)
+ extra = context.slice(:provider, :assume_model_exists, :max_tokens)
  result = if policy
             run_with_retry(input, adapter: adapter, default_model: default_model,
                                   policy: policy, context_temperature: ctx_temp, extra_options: extra)
@@ -29,7 +29,7 @@ module RubyLLM
  end
  
  def append_cost_error(estimated, errors)
-   estimated_output = @max_output || 0
+   estimated_output = effective_max_output || 0
    estimated_cost = CostCalculator.calculate(
      model_name: @model,
      usage: { input_tokens: estimated, output_tokens: estimated_output }
@@ -39,7 +39,7 @@ module RubyLLM
  end
  
  def retry_on(*statuses)
-   @retryable_statuses = statuses
+   @retryable_statuses = statuses.flatten
  end
  
  def retryable?(result)
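
The `retry_on` change above matters because a splat parameter wraps an Array argument in another Array. A short sketch of the effect, using a hypothetical `TinyRetryPolicy` that models only status collection:

```ruby
# Hypothetical minimal retry policy; only status collection is modelled.
class TinyRetryPolicy
  attr_reader :retryable_statuses

  # Accepts retry_on(:a, :b) as well as retry_on([:a, :b]).
  def retry_on(*statuses)
    # Without flatten, retry_on([:a, :b]) would store [[:a, :b]].
    @retryable_statuses = statuses.flatten
  end

  def retryable?(status)
    retryable_statuses.include?(status)
  end
end

policy = TinyRetryPolicy.new
policy.retry_on([:parse_error, :validation_failed])
p policy.retryable?(:parse_error)
```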
@@ -83,14 +83,20 @@ module RubyLLM
  end
  
  def build_adapter_options
+   effective_max_tokens = @extra_options[:max_tokens] || @max_output
+
    { model: @model }.tap do |opts|
      opts[:schema] = @output_schema if @output_schema
-     opts[:max_tokens] = @max_output if @max_output
+     opts[:max_tokens] = effective_max_tokens if effective_max_tokens
      opts[:temperature] = @temperature if @temperature
      @extra_options.each { |k, v| opts[k] = v unless opts.key?(k) }
    end
  end
  
+ def effective_max_output
+   @extra_options[:max_tokens] || @max_output
+ end
+
  def build_error_result(error_result, messages)
    Result.new(
      status: error_result.status,
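
The hunk above gives a per-call `max_tokens` precedence over the step-level `max_output`, and the same effective value feeds the cost precheck. The precedence rule can be sketched as a standalone function. This version takes keyword arguments instead of instance variables, which is an assumption made to keep the example self-contained.

```ruby
# Hypothetical helper: a per-call max_tokens wins over the step-level max_output.
def build_adapter_options(model:, max_output: nil, extra_options: {})
  effective_max_tokens = extra_options[:max_tokens] || max_output

  { model: model }.tap do |opts|
    opts[:max_tokens] = effective_max_tokens if effective_max_tokens
    # Remaining extras are copied only when they do not clobber a key set above.
    extra_options.each { |k, v| opts[k] = v unless opts.key?(k) }
  end
end

p build_adapter_options(model: "some-model", max_output: 500, extra_options: { max_tokens: 100 })
p build_adapter_options(model: "some-model", max_output: 500)
```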
@@ -2,6 +2,6 @@
  
  module RubyLLM
    module Contract
-     VERSION = "0.3.0"
+     VERSION = "0.3.6"
    end
  end
@@ -51,6 +51,10 @@ module RubyLLM
  
  return if dirs.empty?
  
+ # In Rails, eager-load parent directories so contract classes
+ # are available when eval files reference them.
+ eager_load_contract_dirs! if defined?(::Rails)
+
  # Clear file-sourced evals ONCE, then load ALL dirs.
  Thread.current[:ruby_llm_contract_reloading] = true
  eval_hosts.each do |host|
@@ -79,6 +83,18 @@ module RubyLLM
    @eval_hosts || []
  end
  
+ def eager_load_contract_dirs!
+   %w[app/contracts app/steps].each do |path|
+     full = ::Rails.root.join(path)
+     next unless full.exist?
+
+     ::Rails.autoloaders.main.eager_load_dir(full.to_s)
+   rescue StandardError
+     # Zeitwerk not available or dir not managed — skip
+     nil
+   end
+ end
+
  def auto_create_adapter!
    require "ruby_llm"
    configuration.default_adapter = Adapters::RubyLLM.new
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: ruby_llm-contract
  version: !ruby/object:Gem::Version
-   version: 0.3.0
+   version: 0.3.6
  platform: ruby
  authors:
  - Justyna