ruby_llm-contract 0.3.0 → 0.3.7
- checksums.yaml +4 -4
- data/CHANGELOG.md +54 -0
- data/Gemfile.lock +2 -2
- data/README.md +1 -1
- data/lib/ruby_llm/contract/adapters/ruby_llm.rb +3 -3
- data/lib/ruby_llm/contract/concerns/eval_host.rb +1 -0
- data/lib/ruby_llm/contract/contract/schema_validator.rb +70 -3
- data/lib/ruby_llm/contract/eval/baseline_diff.rb +15 -3
- data/lib/ruby_llm/contract/eval/eval_definition.rb +24 -4
- data/lib/ruby_llm/contract/eval/report.rb +1 -1
- data/lib/ruby_llm/contract/eval/runner.rb +2 -1
- data/lib/ruby_llm/contract/eval/trait_evaluator.rb +11 -2
- data/lib/ruby_llm/contract/pipeline/result.rb +1 -1
- data/lib/ruby_llm/contract/prompt/builder.rb +6 -3
- data/lib/ruby_llm/contract/step/base.rb +4 -3
- data/lib/ruby_llm/contract/step/limit_checker.rb +1 -1
- data/lib/ruby_llm/contract/step/retry_policy.rb +1 -1
- data/lib/ruby_llm/contract/step/runner.rb +7 -1
- data/lib/ruby_llm/contract/version.rb +1 -1
- data/lib/ruby_llm/contract.rb +19 -0
- data/ruby_llm-contract.gemspec +5 -3
- metadata +6 -4
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: dee963c252704634b8b9452e4e0460561e7795385e2dc59f4d5cc089a16d9210
|
|
4
|
+
data.tar.gz: ce289e0f1dee22a75d7079b28775c6dd0e5d85b01a54e5a97e4f47b40c2f5741
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: d10ff4021462051d80cb5205174a24f9c5093ee096fc5add7d5bfacc88fb936a364d474871c05d87dad404ffc9577c998e7a1ae73cc8a8e0a5868e7cef629c83
|
|
7
|
+
data.tar.gz: 914a370baf65d5e8fc62f78a22e3bc6ee9eba83b78257ac95b87c8d5965ae23e54dbb7a66de7b2b6c7dc3c848a513be22c2e37e76445d9094dd576f3d3867215
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,59 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 0.3.7 (2026-03-24)
|
|
4
|
+
|
|
5
|
+
- **Trait missing key = error** — `expected_traits: { title: 0..5 }` on output `{}` now fails instead of silently passing.
|
|
6
|
+
- **nil input in dynamic prompts** — `run(nil)` with `prompt { |input| ... }` correctly passes nil to block.
|
|
7
|
+
- **Defensive sample pre-validation** — `sample_response` uses the same parser as runtime (handles code fences, BOM, prose around JSON).
|
|
8
|
+
- **Baseline diff excludes skipped** — self-compare with skipped cases no longer shows artificial score delta.
|
|
9
|
+
- **Zeitwerk eval/ ignore** — `eager_load_contract_dirs!` ignores `eval/` subdirs before eager load.
|
|
10
|
+
|
|
11
|
+
## 0.3.6 (2026-03-24)
|
|
12
|
+
|
|
13
|
+
- **Recursive array/object validation** — nested arrays (`array of array of string`) validated recursively. Object items validated even without `:properties` (e.g. `additionalProperties: false`).
|
|
14
|
+
- **Deep symbolize in sample pre-validation** — array samples with string keys (`[{"name" => "Alice"}]`) correctly symbolized before schema validation.
|
|
15
|
+
|
|
16
|
+
## 0.3.5 (2026-03-24)
|
|
17
|
+
|
|
18
|
+
- **String constraints in SchemaValidator** — `minLength`/`maxLength` enforced for root and nested strings.
|
|
19
|
+
- **Array item validation** — scalar items (string, integer) validated against items schema type and constraints.
|
|
20
|
+
- **Non-JSON sample_response fails fast** — `sample_response("hello")` with object schema raises ArgumentError at definition time instead of silently passing.
|
|
21
|
+
- **`max_tokens` in KNOWN_CONTEXT_KEYS** — no more spurious "Unknown context keys" warning.
|
|
22
|
+
- **Duplicate models deduplicated** — `compare_models(models: ["m", "m"])` runs model once.
|
|
23
|
+
|
|
24
|
+
## 0.3.4 (2026-03-24)
|
|
25
|
+
|
|
26
|
+
- **SchemaValidator validates non-object roots** — boolean, integer, number, array root schemas now enforce type, min/max, enum, minItems/maxItems. Previously only object schemas were validated.
|
|
27
|
+
- **Removed passing cases = regression** — `regressed?` returns true when baseline had passing cases that are now missing. Prevents gate bypass by deleting eval cases.
|
|
28
|
+
- **JSON string sample_response fixed** — `sample_response('{"name":"Alice"}')` correctly parsed for pre-validation instead of double-encoding.
|
|
29
|
+
- **`context[:max_tokens]` forwarded** — overrides step's `max_output` for adapter call AND budget precheck.
|
|
30
|
+
|
|
31
|
+
## 0.3.3 (2026-03-23)
|
|
32
|
+
|
|
33
|
+
- **Skipped cases visible in regression diff** — baseline PASS → current SKIP now detected as regression by `without_regressions` and `fail_on_regression`.
|
|
34
|
+
- **Skip only on missing adapter** — eval runner no longer masks evaluator errors as SKIP. Only "No adapter configured" triggers skip.
|
|
35
|
+
- **Array/Hash sample pre-validation** — `sample_response([{...}])` correctly validated against schema instead of silently skipping.
|
|
36
|
+
- **`assume_model_exists: false` forwarded** — boolean `false` no longer dropped by truthiness check in adapter options.
|
|
37
|
+
- **Duplicate case names caught at definition** — `add_case`/`verify` with same name raises immediately, not at run time.
|
|
38
|
+
|
|
39
|
+
## 0.3.2 (2026-03-23)
|
|
40
|
+
|
|
41
|
+
- **Array response preserved** — `Adapters::RubyLLM` no longer stringifies Array content. Steps with `output_type Array` work correctly.
|
|
42
|
+
- **Falsy prompt input** — `run(false)` and `build_messages(false)` pass `false` to dynamic prompt blocks instead of falling back to `instance_eval`.
|
|
43
|
+
- **`retry_on` flatten** — `retry_on([:a, :b])` no longer wraps in nested array.
|
|
44
|
+
- **Builder reset** — `Prompt::Builder` resets nodes on each build (no accumulation on reuse).
|
|
45
|
+
- **Pipeline false output** — `output: false` no longer shows "(no output)" in pretty_print.
|
|
46
|
+
|
|
47
|
+
## 0.3.1 (2026-03-23)
|
|
48
|
+
|
|
49
|
+
Fixes from persona_tool production deployment (4 services migrated).
|
|
50
|
+
|
|
51
|
+
- **Proc/Lambda in `expected_traits`** — `expected_traits: { score: ->(v) { v > 3 } }` now works.
|
|
52
|
+
- **Zeitwerk eager-load** — `load_evals!` eager-loads `app/contracts/` and `app/steps/` before loading eval files. Fixes uninitialized constant errors in Rake tasks.
|
|
53
|
+
- **Falsy values** — `expected: false`, `input: false`, `sample_response(nil)` all handled correctly.
|
|
54
|
+
- **Context key forwarding** — `provider:` and `assume_model_exists:` forwarded to adapter. `schema:` and `max_tokens:` are step-level only (no split-brain).
|
|
55
|
+
- **Deep-freeze immutability** — constructors never mutate caller's data.
|
|
56
|
+
|
|
3
57
|
## 0.3.0 (2026-03-23)
|
|
4
58
|
|
|
5
59
|
Baseline regression detection — know when quality drops before users do.
|
data/Gemfile.lock
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
PATH
|
|
2
2
|
remote: .
|
|
3
3
|
specs:
|
|
4
|
-
ruby_llm-contract (0.3.
|
|
4
|
+
ruby_llm-contract (0.3.7)
|
|
5
5
|
dry-types (~> 1.7)
|
|
6
6
|
ruby_llm (~> 1.0)
|
|
7
7
|
ruby_llm-schema (~> 0.3)
|
|
@@ -165,7 +165,7 @@ CHECKSUMS
|
|
|
165
165
|
rubocop-ast (1.49.1) sha256=4412f3ee70f6fe4546cc489548e0f6fcf76cafcfa80fa03af67098ffed755035
|
|
166
166
|
ruby-progressbar (1.13.0) sha256=80fc9c47a9b640d6834e0dc7b3c94c9df37f08cb072b7761e4a71e22cff29b33
|
|
167
167
|
ruby_llm (1.14.0) sha256=57c6f7034fc4a44504ea137d70f853b07824f1c1cdbe774ab3ab3522e7098deb
|
|
168
|
-
ruby_llm-contract (0.3.
|
|
168
|
+
ruby_llm-contract (0.3.7)
|
|
169
169
|
ruby_llm-schema (0.3.0) sha256=a591edc5ca1b7f0304f0e2261de61ba4b3bea17be09f5cf7558153adfda3dec6
|
|
170
170
|
unicode-display_width (3.2.0) sha256=0cdd96b5681a5949cdbc2c55e7b420facae74c4aaf9a9815eee1087cb1853c42
|
|
171
171
|
unicode-emoji (4.2.0) sha256=519e69150f75652e40bf736106cfbc8f0f73aa3fb6a65afe62fefa7f80b0f80f
|
data/README.md
CHANGED
|
@@ -6,7 +6,7 @@ Companion gem for [ruby_llm](https://github.com/crmne/ruby_llm).
|
|
|
6
6
|
|
|
7
7
|
## The problem
|
|
8
8
|
|
|
9
|
-
|
|
9
|
+
Which model should you use? The expensive one is accurate but costs 4x more. The cheap one is fast but hallucinates on edge cases. You tweak a prompt — did accuracy improve or drop? You have no data. Just gut feeling.
|
|
10
10
|
|
|
11
11
|
## The fix
|
|
12
12
|
|
|
data/lib/ruby_llm/contract/adapters/ruby_llm.rb
CHANGED
|
@@ -43,8 +43,8 @@ module RubyLLM
|
|
|
43
43
|
|
|
44
44
|
def chat_constructor_options(options)
|
|
45
45
|
opts = { model: options[:model] }
|
|
46
|
-
opts[:provider] = options[:provider] if options
|
|
47
|
-
opts[:assume_model_exists] = options[:assume_model_exists] if options
|
|
46
|
+
opts[:provider] = options[:provider] if options.key?(:provider)
|
|
47
|
+
opts[:assume_model_exists] = options[:assume_model_exists] if options.key?(:assume_model_exists)
|
|
48
48
|
opts
|
|
49
49
|
end
|
|
50
50
|
|
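The change to `chat_constructor_options` above replaces a guard that the changelog describes as a truthiness check with `options.key?`. A minimal sketch of why that matters for `assume_model_exists: false`; the two helper names are hypothetical and are not part of the gem:

```ruby
# Hypothetical helpers that contrast the two guard styles.
# They mirror the shape of the change, not the gem's actual methods.

# Truthiness guard: a literal `false` is treated like "not given".
def options_by_truthiness(options)
  opts = {}
  opts[:assume_model_exists] = options[:assume_model_exists] if options[:assume_model_exists]
  opts
end

# Presence guard: any explicitly passed value, including `false`, is forwarded.
def options_by_key(options)
  opts = {}
  opts[:assume_model_exists] = options[:assume_model_exists] if options.key?(:assume_model_exists)
  opts
end

options_by_truthiness({ assume_model_exists: false }) # => {}
options_by_key({ assume_model_exists: false })        # => {:assume_model_exists=>false}
options_by_key({})                                    # => {}
```

The presence guard still leaves the key out when the caller never supplied it, so downstream defaults stay in effect.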
|
@@ -57,7 +57,7 @@ module RubyLLM
|
|
|
57
57
|
|
|
58
58
|
def build_response(response)
|
|
59
59
|
content = response.content
|
|
60
|
-
content = content.to_s unless content.is_a?(Hash)
|
|
60
|
+
content = content.to_s unless content.is_a?(Hash) || content.is_a?(Array)
|
|
61
61
|
|
|
62
62
|
Response.new(
|
|
63
63
|
content: content,
|
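The `build_response` change above stops calling `to_s` on Array content. A small standalone sketch of the same rule, using a hypothetical `normalize_content` helper rather than the adapter itself:

```ruby
# Hypothetical helper: structured content passes through untouched,
# everything else is coerced to a String.
def normalize_content(content)
  return content if content.is_a?(Hash) || content.is_a?(Array)

  content.to_s
end

normalize_content([{ name: "Alice" }]) # => the same Array, still enumerable
normalize_content({ name: "Alice" })   # => the same Hash
normalize_content(:ok)                 # => "ok"
normalize_content(nil)                 # => ""
```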
|
data/lib/ruby_llm/contract/concerns/eval_host.rb
CHANGED
|
@@ -46,6 +46,7 @@ module RubyLLM
|
|
|
46
46
|
|
|
47
47
|
def compare_models(eval_name, models:, context: {})
|
|
48
48
|
context ||= {}
|
|
49
|
+
models = models.uniq
|
|
49
50
|
reports = models.each_with_object({}) do |model, hash|
|
|
50
51
|
model_context = deep_dup_context(context).merge(model: model)
|
|
51
52
|
hash[model] = run_single_eval(eval_name, model_context)
|
|
data/lib/ruby_llm/contract/contract/schema_validator.rb
CHANGED
|
@@ -40,10 +40,77 @@ module RubyLLM
|
|
|
40
40
|
|
|
41
41
|
def validate_non_hash_output
|
|
42
42
|
expected_type = @json_schema[:type]&.to_s
|
|
43
|
+
|
|
43
44
|
if expected_type == "object" || @json_schema.key?(:properties)
|
|
44
|
-
["expected object, got #{@output.class}"]
|
|
45
|
-
|
|
46
|
-
|
|
45
|
+
return ["expected object, got #{@output.class}"]
|
|
46
|
+
end
|
|
47
|
+
|
|
48
|
+
errors = []
|
|
49
|
+
validate_type_match(errors, @output, expected_type, "root") if expected_type
|
|
50
|
+
validate_constraints(errors, @output, @json_schema, "root")
|
|
51
|
+
|
|
52
|
+
if expected_type == "array" && @output.is_a?(Array) && @json_schema[:items]
|
|
53
|
+
validate_array_items(errors, @output, @json_schema[:items], "")
|
|
54
|
+
end
|
|
55
|
+
|
|
56
|
+
errors
|
|
57
|
+
end
|
|
58
|
+
|
|
59
|
+
def validate_array_items(errors, array, items_schema, prefix)
|
|
60
|
+
array.each_with_index do |item, i|
|
|
61
|
+
item_prefix = "#{prefix}[#{i}]"
|
|
62
|
+
validate_value(errors, item, items_schema, item_prefix)
|
|
63
|
+
end
|
|
64
|
+
end
|
|
65
|
+
|
|
66
|
+
def validate_value(errors, value, schema, prefix)
|
|
67
|
+
value_type = schema[:type]&.to_s
|
|
68
|
+
|
|
69
|
+
validate_type_match(errors, value, value_type, prefix) if value_type
|
|
70
|
+
validate_constraints(errors, value, schema, prefix)
|
|
71
|
+
|
|
72
|
+
if value.is_a?(Hash) && (schema.key?(:properties) || value_type == "object")
|
|
73
|
+
validate_object(value, schema, prefix: prefix)
|
|
74
|
+
errors.concat(@errors)
|
|
75
|
+
@errors = []
|
|
76
|
+
elsif value.is_a?(Array) && schema[:items]
|
|
77
|
+
validate_array_items(errors, value, schema[:items], prefix)
|
|
78
|
+
end
|
|
79
|
+
end
|
|
80
|
+
|
|
81
|
+
def validate_type_match(errors, value, expected_type, prefix)
|
|
82
|
+
valid = case expected_type
|
|
83
|
+
when "string" then value.is_a?(String)
|
|
84
|
+
when "integer" then value.is_a?(Integer)
|
|
85
|
+
when "number" then value.is_a?(Numeric)
|
|
86
|
+
when "boolean" then value.is_a?(TrueClass) || value.is_a?(FalseClass)
|
|
87
|
+
when "array" then value.is_a?(Array)
|
|
88
|
+
else true
|
|
89
|
+
end
|
|
90
|
+
errors << "#{prefix}: expected #{expected_type}, got #{value.class}" unless valid
|
|
91
|
+
end
|
|
92
|
+
|
|
93
|
+
def validate_constraints(errors, value, schema, prefix)
|
|
94
|
+
if schema[:minimum] && value.is_a?(Numeric) && value < schema[:minimum]
|
|
95
|
+
errors << "#{prefix}: #{value} is less than minimum #{schema[:minimum]}"
|
|
96
|
+
end
|
|
97
|
+
if schema[:maximum] && value.is_a?(Numeric) && value > schema[:maximum]
|
|
98
|
+
errors << "#{prefix}: #{value} is greater than maximum #{schema[:maximum]}"
|
|
99
|
+
end
|
|
100
|
+
if schema[:enum] && !schema[:enum].include?(value)
|
|
101
|
+
errors << "#{prefix}: #{value.inspect} is not in enum #{schema[:enum].inspect}"
|
|
102
|
+
end
|
|
103
|
+
if schema[:minItems] && value.is_a?(Array) && value.length < schema[:minItems]
|
|
104
|
+
errors << "#{prefix}: array has #{value.length} items, minimum #{schema[:minItems]}"
|
|
105
|
+
end
|
|
106
|
+
if schema[:maxItems] && value.is_a?(Array) && value.length > schema[:maxItems]
|
|
107
|
+
errors << "#{prefix}: array has #{value.length} items, maximum #{schema[:maxItems]}"
|
|
108
|
+
end
|
|
109
|
+
if schema[:minLength] && value.is_a?(String) && value.length < schema[:minLength]
|
|
110
|
+
errors << "#{prefix}: string length #{value.length} is less than minLength #{schema[:minLength]}"
|
|
111
|
+
end
|
|
112
|
+
if schema[:maxLength] && value.is_a?(String) && value.length > schema[:maxLength]
|
|
113
|
+
errors << "#{prefix}: string length #{value.length} is greater than maxLength #{schema[:maxLength]}"
|
|
47
114
|
end
|
|
48
115
|
end
|
|
49
116
|
|
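The validator methods above recurse through array items and apply type and constraint checks at each level. The following is a simplified, standalone sketch of that idea, not the gem's `SchemaValidator`: the `collect_errors` name and the `root`-prefixed paths are assumptions made for illustration, and only `type`, `minLength`, and `items` are handled.

```ruby
# Type predicates keyed by JSON Schema type name.
TYPE_CHECKS = {
  "string"  => ->(v) { v.is_a?(String) },
  "integer" => ->(v) { v.is_a?(Integer) },
  "number"  => ->(v) { v.is_a?(Numeric) },
  "boolean" => ->(v) { v == true || v == false },
  "array"   => ->(v) { v.is_a?(Array) }
}.freeze

# Hypothetical recursive walker: records an error per failing value
# and descends into array items when the schema declares :items.
def collect_errors(value, schema, path = "root", errors = [])
  type = schema[:type]&.to_s
  check = TYPE_CHECKS[type]
  errors << "#{path}: expected #{type}, got #{value.class}" if check && !check.call(value)

  if schema[:minLength] && value.is_a?(String) && value.length < schema[:minLength]
    errors << "#{path}: string length #{value.length} is less than minLength #{schema[:minLength]}"
  end

  if value.is_a?(Array) && schema[:items]
    value.each_with_index do |item, i|
      collect_errors(item, schema[:items], "#{path}[#{i}]", errors)
    end
  end

  errors
end

# Array of array of string, each string at least 2 characters long.
schema = { type: "array", items: { type: "array", items: { type: "string", minLength: 2 } } }

collect_errors([["ok", "x"], ["fine", 3]], schema)
# => ["root[0][1]: string length 1 is less than minLength 2",
#     "root[1][1]: expected string, got Integer"]
```

Passing the same `errors` array down the recursion keeps the messages in traversal order, which makes the reported paths easy to match against the offending output.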
|
data/lib/ruby_llm/contract/eval/baseline_diff.rb
CHANGED
|
@@ -9,8 +9,8 @@ module RubyLLM
|
|
|
9
9
|
def initialize(baseline_cases:, current_cases:)
|
|
10
10
|
@baseline = index_by_name(baseline_cases)
|
|
11
11
|
@current = index_by_name(current_cases)
|
|
12
|
-
@baseline_score = baseline_cases
|
|
13
|
-
@current_score = current_cases
|
|
12
|
+
@baseline_score = compute_score(baseline_cases)
|
|
13
|
+
@current_score = compute_score(current_cases)
|
|
14
14
|
freeze
|
|
15
15
|
end
|
|
16
16
|
|
|
@@ -48,7 +48,11 @@ module RubyLLM
|
|
|
48
48
|
end
|
|
49
49
|
|
|
50
50
|
def regressed?
|
|
51
|
-
regressions.any?
|
|
51
|
+
regressions.any? || removed_passing_cases.any?
|
|
52
|
+
end
|
|
53
|
+
|
|
54
|
+
def removed_passing_cases
|
|
55
|
+
removed_cases.select { |name| @baseline[name]&.dig(:passed) }
|
|
52
56
|
end
|
|
53
57
|
|
|
54
58
|
def improved?
|
|
@@ -74,6 +78,14 @@ module RubyLLM
|
|
|
74
78
|
|
|
75
79
|
private
|
|
76
80
|
|
|
81
|
+
def compute_score(cases)
|
|
82
|
+
# Exclude skipped cases from score (consistent with Report#score)
|
|
83
|
+
evaluated = cases.reject { |c| c[:details]&.start_with?("skipped:") }
|
|
84
|
+
return 0.0 if evaluated.empty?
|
|
85
|
+
|
|
86
|
+
evaluated.sum { |c| c[:score] } / evaluated.length
|
|
87
|
+
end
|
|
88
|
+
|
|
77
89
|
def index_by_name(cases)
|
|
78
90
|
cases.each_with_object({}) { |c, h| h[c[:name]] = c }
|
|
79
91
|
end
|
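Two behaviours change in the hunks above: skipped cases no longer count toward the score, and a passing baseline case that disappears counts as a regression. A standalone sketch under stated assumptions: `compute_score` follows the added method, while `removed_passing_case_names` is a hypothetical stand-in for the gem's `removed_cases` logic, which this diff does not show.

```ruby
# Average score over evaluated cases only; skipped cases are identified
# by a details string that starts with "skipped:".
def compute_score(cases)
  evaluated = cases.reject { |c| c[:details]&.start_with?("skipped:") }
  return 0.0 if evaluated.empty?

  evaluated.sum { |c| c[:score] } / evaluated.length
end

# Hypothetical helper: names of baseline cases that passed and are
# absent from the current run.
def removed_passing_case_names(baseline_cases, current_cases)
  current_names = current_cases.map { |c| c[:name] }
  baseline_cases
    .select { |c| c[:passed] && !current_names.include?(c[:name]) }
    .map { |c| c[:name] }
end

baseline = [
  { name: "greeting", score: 1.0, passed: true,  details: "ok" },
  { name: "refund",   score: 1.0, passed: true,  details: "ok" },
  { name: "offline",  score: 0.0, passed: false, details: "skipped: No adapter configured" }
]
current = [
  { name: "greeting", score: 1.0, passed: true,  details: "ok" },
  { name: "offline",  score: 0.0, passed: false, details: "skipped: No adapter configured" }
]

compute_score(current)                        # => 1.0 (a plain average would give 0.5)
removed_passing_case_names(baseline, current) # => ["refund"]
```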
|
data/lib/ruby_llm/contract/eval/eval_definition.rb
CHANGED
|
@@ -34,6 +34,7 @@ module RubyLLM
|
|
|
34
34
|
def add_case(description, input: nil, expected: nil, expected_traits: nil, evaluator: nil)
|
|
35
35
|
case_input = input.nil? ? @default_input : input
|
|
36
36
|
raise ArgumentError, "add_case requires input (set default_input or pass input:)" if case_input.nil?
|
|
37
|
+
validate_unique_case_name!(description)
|
|
37
38
|
|
|
38
39
|
@cases << {
|
|
39
40
|
name: description,
|
|
@@ -52,6 +53,7 @@ module RubyLLM
|
|
|
52
53
|
expected_or_proc = expect unless expect.nil?
|
|
53
54
|
case_input = input.nil? ? @default_input : input
|
|
54
55
|
validate_verify_args!(expected_or_proc, case_input)
|
|
56
|
+
validate_unique_case_name!(description)
|
|
55
57
|
|
|
56
58
|
evaluator = expected_or_proc.is_a?(::Proc) ? expected_or_proc : nil
|
|
57
59
|
|
|
@@ -85,6 +87,12 @@ module RubyLLM
|
|
|
85
87
|
[{ name: "contract check", input: @default_input, expected: nil, evaluator: nil }]
|
|
86
88
|
end
|
|
87
89
|
|
|
90
|
+
def validate_unique_case_name!(name)
|
|
91
|
+
return unless @cases.any? { |c| c[:name] == name }
|
|
92
|
+
|
|
93
|
+
raise ArgumentError, "Duplicate case name '#{name}' in eval '#{@name}'. Case names must be unique."
|
|
94
|
+
end
|
|
95
|
+
|
|
88
96
|
def validate_verify_args!(expected_or_proc, case_input)
|
|
89
97
|
raise ArgumentError, "verify requires either a positional argument or expect: keyword" if expected_or_proc.nil?
|
|
90
98
|
raise ArgumentError, "verify requires input (set default_input or pass input:)" if case_input.nil?
|
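The `validate_unique_case_name!` guard above makes a duplicate name fail when the case is defined rather than when the eval runs. A minimal sketch of that behaviour; `EvalSketch` is a hypothetical class, not the gem's `EvalDefinition`:

```ruby
# Hypothetical stand-in for an eval definition that rejects duplicate names.
class EvalSketch
  attr_reader :cases

  def initialize(name)
    @name = name
    @cases = []
  end

  # Raises at definition time if the description was already used.
  def add_case(description, input:)
    if @cases.any? { |c| c[:name] == description }
      raise ArgumentError, "Duplicate case name '#{description}' in eval '#{@name}'. Case names must be unique."
    end

    @cases << { name: description, input: input }
    self
  end
end

definition = EvalSketch.new("smoke")
definition.add_case("billing", input: "I was charged twice")

begin
  definition.add_case("billing", input: "Where is my invoice?")
rescue ArgumentError => e
  e.message # => "Duplicate case name 'billing' in eval 'smoke'. Case names must be unique."
end
```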
|
@@ -98,15 +106,27 @@ module RubyLLM
|
|
|
98
106
|
return if errors.empty?
|
|
99
107
|
|
|
100
108
|
raise ArgumentError, "sample_response does not satisfy step schema: #{errors.join(", ")}"
|
|
101
|
-
rescue JSON::ParserError
|
|
102
|
-
|
|
109
|
+
rescue JSON::ParserError, RubyLLM::Contract::ParseError => e
|
|
110
|
+
raise ArgumentError, "sample_response is not valid JSON: #{e.message}"
|
|
103
111
|
end
|
|
104
112
|
|
|
105
113
|
def validate_sample_against_schema(schema)
|
|
106
|
-
|
|
107
|
-
|
|
114
|
+
parsed = case @sample_response
|
|
115
|
+
when Hash, Array then @sample_response
|
|
116
|
+
when String then Parser.parse(@sample_response, strategy: :json)
|
|
117
|
+
else @sample_response
|
|
118
|
+
end
|
|
119
|
+
symbolized = deep_symbolize(parsed)
|
|
108
120
|
SchemaValidator.validate(symbolized, schema)
|
|
109
121
|
end
|
|
122
|
+
|
|
123
|
+
def deep_symbolize(obj)
|
|
124
|
+
case obj
|
|
125
|
+
when Hash then Parser.symbolize_keys(obj)
|
|
126
|
+
when Array then obj.map { |item| deep_symbolize(item) }
|
|
127
|
+
else obj
|
|
128
|
+
end
|
|
129
|
+
end
|
|
110
130
|
end
|
|
111
131
|
end
|
|
112
132
|
end
|
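The `deep_symbolize` helper above delegates Hash handling to `Parser.symbolize_keys`, whose body is not part of this diff. A self-contained sketch of the same idea, assuming the Hash branch also recurses into values:

```ruby
# Standalone approximation: symbolizes keys in nested Hashes and Arrays.
# The recursive Hash branch is an assumption about Parser.symbolize_keys.
def deep_symbolize(obj)
  case obj
  when Hash
    obj.each_with_object({}) do |(key, value), result|
      new_key = key.respond_to?(:to_sym) ? key.to_sym : key
      result[new_key] = deep_symbolize(value)
    end
  when Array
    obj.map { |item| deep_symbolize(item) }
  else
    obj
  end
end

deep_symbolize([{ "name" => "Alice", "tags" => [{ "id" => 1 }] }])
# => [{:name=>"Alice", :tags=>[{:id=>1}]}]
```

Symbolizing before validation matters because a schema keyed by symbols would otherwise report every string-keyed property as missing.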
|
data/lib/ruby_llm/contract/eval/runner.rb
CHANGED
|
@@ -32,7 +32,8 @@ module RubyLLM
|
|
|
32
32
|
|
|
33
33
|
build_case_result(test_case, step_result, eval_result)
|
|
34
34
|
rescue RubyLLM::Contract::Error => e
|
|
35
|
-
|
|
35
|
+
raise unless e.message.include?("No adapter configured")
|
|
36
|
+
|
|
36
37
|
skipped_result(test_case, e.message)
|
|
37
38
|
end
|
|
38
39
|
|
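The `raise unless e.message.include?("No adapter configured")` line above narrows what counts as a skip. A rough sketch of that control flow, with a hypothetical `ContractError` class and `run_case` helper standing in for the gem's error hierarchy and runner; the `skipped:` details prefix is assumed from the baseline score code:

```ruby
# Hypothetical error class standing in for RubyLLM::Contract::Error.
class ContractError < StandardError; end

# Runs the block; only a missing adapter is converted into a skipped
# result, every other contract error propagates to the caller.
def run_case(name)
  yield
  { name: name, status: :evaluated }
rescue ContractError => e
  raise unless e.message.include?("No adapter configured")

  { name: name, status: :skipped, details: "skipped: #{e.message}" }
end

run_case("offline") { raise ContractError, "No adapter configured" }
# => {:name=>"offline", :status=>:skipped, :details=>"skipped: No adapter configured"}

begin
  run_case("broken") { raise ContractError, "evaluator blew up" }
rescue ContractError => e
  e.message # => "evaluator blew up"
end
```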
|
data/lib/ruby_llm/contract/eval/trait_evaluator.rb
CHANGED
|
@@ -19,13 +19,18 @@ module RubyLLM
|
|
|
19
19
|
end
|
|
20
20
|
|
|
21
21
|
def check_trait(output, key, expectation, errors)
|
|
22
|
-
|
|
23
|
-
|
|
22
|
+
unless output.is_a?(Hash) && output.key?(key)
|
|
23
|
+
errors << "#{key}: missing key"
|
|
24
|
+
return
|
|
25
|
+
end
|
|
26
|
+
error_msg = trait_error(key, output[key], expectation)
|
|
24
27
|
errors << error_msg if error_msg
|
|
25
28
|
end
|
|
26
29
|
|
|
27
30
|
def trait_error(key, value, expectation)
|
|
28
31
|
case expectation
|
|
32
|
+
when ::Proc
|
|
33
|
+
trait_proc_error(key, value, expectation)
|
|
29
34
|
when ::Regexp
|
|
30
35
|
trait_regexp_error(key, value, expectation)
|
|
31
36
|
when Range
|
|
@@ -56,6 +61,10 @@ module RubyLLM
|
|
|
56
61
|
"#{key}: expected falsy, got #{value.inspect}" if value
|
|
57
62
|
end
|
|
58
63
|
|
|
64
|
+
def trait_proc_error(key, value, expectation)
|
|
65
|
+
"#{key}: trait check failed, got #{value.inspect}" unless expectation.call(value)
|
|
66
|
+
end
|
|
67
|
+
|
|
59
68
|
def trait_equality_error(key, value, expectation)
|
|
60
69
|
"#{key}: expected #{expectation.inspect}, got #{value.inspect}" unless value == expectation
|
|
61
70
|
end
|
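The trait evaluator changes above add a missing-key check and a `Proc` branch. A compact sketch of both, using a hypothetical `check_traits` helper; the `Regexp` and `Range` branches of the real evaluator are left out because their bodies are not shown in this diff:

```ruby
# Hypothetical helper: returns a list of error strings for the given
# output and expected traits. Handles Proc and plain-equality traits only.
def check_traits(output, expected_traits)
  expected_traits.each_with_object([]) do |(key, expectation), errors|
    # A missing key is an error instead of a silent pass.
    unless output.is_a?(Hash) && output.key?(key)
      errors << "#{key}: missing key"
      next
    end

    value = output[key]
    if expectation.is_a?(Proc)
      errors << "#{key}: trait check failed, got #{value.inspect}" unless expectation.call(value)
    elsif value != expectation
      errors << "#{key}: expected #{expectation.inspect}, got #{value.inspect}"
    end
  end
end

traits = { score: ->(v) { v > 3 }, lang: "en" }

check_traits({ score: 5, lang: "en" }, traits) # => []
check_traits({ score: 2, lang: "en" }, traits) # => ["score: trait check failed, got 2"]
check_traits({}, traits)                       # => ["score: missing key", "lang: missing key"]
```

Checking for the key first also keeps a `Proc` such as `->(v) { v > 3 }` from being called with `nil`.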
|
data/lib/ruby_llm/contract/prompt/builder.rb
CHANGED
|
@@ -4,13 +4,16 @@ module RubyLLM
|
|
|
4
4
|
module Contract
|
|
5
5
|
module Prompt
|
|
6
6
|
class Builder
|
|
7
|
+
NOT_PROVIDED = Object.new.freeze
|
|
8
|
+
|
|
7
9
|
def initialize(block)
|
|
8
10
|
@block = block
|
|
9
11
|
@nodes = []
|
|
10
12
|
end
|
|
11
13
|
|
|
12
|
-
def build(input =
|
|
13
|
-
|
|
14
|
+
def build(input = NOT_PROVIDED)
|
|
15
|
+
@nodes = []
|
|
16
|
+
if input != NOT_PROVIDED && @block.arity >= 1
|
|
14
17
|
instance_exec(input, &@block)
|
|
15
18
|
else
|
|
16
19
|
instance_eval(&@block)
|
|
@@ -38,7 +41,7 @@ module RubyLLM
|
|
|
38
41
|
@nodes << Nodes::SectionNode.new(name, text)
|
|
39
42
|
end
|
|
40
43
|
|
|
41
|
-
def self.build(input:
|
|
44
|
+
def self.build(input: NOT_PROVIDED, &block)
|
|
42
45
|
new(block).build(input)
|
|
43
46
|
end
|
|
44
47
|
end
|
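The `NOT_PROVIDED` constant above is a sentinel object: it lets `build` tell "no argument" apart from an explicit `nil` or `false`. A minimal sketch of the pattern, with a hypothetical `PromptSketch` class; it compares with `equal?` (object identity) where the diff uses `!=`, and it returns a tag alongside the block result purely to make the chosen branch visible:

```ruby
# Hypothetical class illustrating the sentinel-default pattern.
class PromptSketch
  # Unique object that no caller can pass by accident.
  NOT_PROVIDED = Object.new.freeze

  def initialize(&block)
    @block = block
  end

  # Passes the input to the block whenever one was given, even nil or false.
  def build(input = NOT_PROVIDED)
    if !input.equal?(NOT_PROVIDED) && @block.arity >= 1
      [:with_input, @block.call(input)]
    else
      [:without_input, @block.call]
    end
  end
end

dynamic = PromptSketch.new { |input| "got #{input.inspect}" }
dynamic.build(false) # => [:with_input, "got false"]
dynamic.build(nil)   # => [:with_input, "got nil"]
dynamic.build        # => [:without_input, "got nil"]
```

A `nil` default cannot make this distinction, which is why the falsy-input fixes in 0.3.2 and 0.3.7 both touch this method.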
|
data/lib/ruby_llm/contract/step/base.rb
CHANGED
|
@@ -58,7 +58,7 @@ module RubyLLM
|
|
|
58
58
|
end
|
|
59
59
|
end
|
|
60
60
|
|
|
61
|
-
KNOWN_CONTEXT_KEYS = %i[adapter model temperature provider assume_model_exists].freeze
|
|
61
|
+
KNOWN_CONTEXT_KEYS = %i[adapter model temperature max_tokens provider assume_model_exists].freeze
|
|
62
62
|
|
|
63
63
|
def run(input, context: {})
|
|
64
64
|
context = (context || {}).transform_keys { |k| k.respond_to?(:to_sym) ? k.to_sym : k }
|
|
@@ -68,7 +68,7 @@ module RubyLLM
|
|
|
68
68
|
policy = retry_policy
|
|
69
69
|
|
|
70
70
|
ctx_temp = context[:temperature]
|
|
71
|
-
extra = context.slice(:provider, :assume_model_exists)
|
|
71
|
+
extra = context.slice(:provider, :assume_model_exists, :max_tokens)
|
|
72
72
|
result = if policy
|
|
73
73
|
run_with_retry(input, adapter: adapter, default_model: default_model,
|
|
74
74
|
policy: policy, context_temperature: ctx_temp, extra_options: extra)
|
|
@@ -82,7 +82,8 @@ module RubyLLM
|
|
|
82
82
|
|
|
83
83
|
def build_messages(input)
|
|
84
84
|
dynamic = prompt.arity >= 1
|
|
85
|
-
|
|
85
|
+
builder_input = dynamic ? input : Prompt::Builder::NOT_PROVIDED
|
|
86
|
+
ast = Prompt::Builder.build(input: builder_input, &prompt)
|
|
86
87
|
variables = dynamic ? {} : { input: input }
|
|
87
88
|
variables.merge!(input.transform_keys(&:to_sym)) if !dynamic && input.is_a?(Hash)
|
|
88
89
|
Prompt::Renderer.render(ast, variables: variables)
|
|
data/lib/ruby_llm/contract/step/limit_checker.rb
CHANGED
|
@@ -29,7 +29,7 @@ module RubyLLM
|
|
|
29
29
|
end
|
|
30
30
|
|
|
31
31
|
def append_cost_error(estimated, errors)
|
|
32
|
-
estimated_output =
|
|
32
|
+
estimated_output = effective_max_output || 0
|
|
33
33
|
estimated_cost = CostCalculator.calculate(
|
|
34
34
|
model_name: @model,
|
|
35
35
|
usage: { input_tokens: estimated, output_tokens: estimated_output }
|
|
data/lib/ruby_llm/contract/step/runner.rb
CHANGED
|
@@ -83,14 +83,20 @@ module RubyLLM
|
|
|
83
83
|
end
|
|
84
84
|
|
|
85
85
|
def build_adapter_options
|
|
86
|
+
effective_max_tokens = @extra_options[:max_tokens] || @max_output
|
|
87
|
+
|
|
86
88
|
{ model: @model }.tap do |opts|
|
|
87
89
|
opts[:schema] = @output_schema if @output_schema
|
|
88
|
-
opts[:max_tokens] =
|
|
90
|
+
opts[:max_tokens] = effective_max_tokens if effective_max_tokens
|
|
89
91
|
opts[:temperature] = @temperature if @temperature
|
|
90
92
|
@extra_options.each { |k, v| opts[k] = v unless opts.key?(k) }
|
|
91
93
|
end
|
|
92
94
|
end
|
|
93
95
|
|
|
96
|
+
def effective_max_output
|
|
97
|
+
@extra_options[:max_tokens] || @max_output
|
|
98
|
+
end
|
|
99
|
+
|
|
94
100
|
def build_error_result(error_result, messages)
|
|
95
101
|
Result.new(
|
|
96
102
|
status: error_result.status,
|
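In the runner changes above, `context[:max_tokens]` takes precedence over the step's `max_output`, and the same effective value feeds both the adapter options and the cost precheck. A standalone sketch of that precedence; the method signatures here are hypothetical, since the real runner reads instance variables:

```ruby
# Context override wins; otherwise fall back to the step-level limit.
def effective_max_output(extra_options, max_output)
  extra_options[:max_tokens] || max_output
end

# Hypothetical free-function version of the option builder.
def build_adapter_options(model:, extra_options:, max_output: nil)
  limit = effective_max_output(extra_options, max_output)

  { model: model }.tap do |opts|
    opts[:max_tokens] = limit if limit
    # Remaining extras are copied only when they do not clash with keys set above.
    extra_options.each { |k, v| opts[k] = v unless opts.key?(k) }
  end
end

build_adapter_options(model: "m", extra_options: { max_tokens: 50 }, max_output: 200)
# => {:model=>"m", :max_tokens=>50}
build_adapter_options(model: "m", extra_options: {}, max_output: 200)
# => {:model=>"m", :max_tokens=>200}
build_adapter_options(model: "m", extra_options: { provider: :openai })
# => {:model=>"m", :provider=>:openai}
```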
data/lib/ruby_llm/contract.rb
CHANGED
|
@@ -51,6 +51,10 @@ module RubyLLM
|
|
|
51
51
|
|
|
52
52
|
return if dirs.empty?
|
|
53
53
|
|
|
54
|
+
# In Rails, eager-load parent directories so contract classes
|
|
55
|
+
# are available when eval files reference them.
|
|
56
|
+
eager_load_contract_dirs! if defined?(::Rails)
|
|
57
|
+
|
|
54
58
|
# Clear file-sourced evals ONCE, then load ALL dirs.
|
|
55
59
|
Thread.current[:ruby_llm_contract_reloading] = true
|
|
56
60
|
eval_hosts.each do |host|
|
|
@@ -79,6 +83,21 @@ module RubyLLM
|
|
|
79
83
|
@eval_hosts || []
|
|
80
84
|
end
|
|
81
85
|
|
|
86
|
+
def eager_load_contract_dirs!
|
|
87
|
+
%w[app/contracts app/steps].each do |path|
|
|
88
|
+
full = ::Rails.root.join(path)
|
|
89
|
+
next unless full.exist?
|
|
90
|
+
|
|
91
|
+
# Ignore eval/ subdirs — they don't define Zeitwerk-compatible
|
|
92
|
+
# constants and are loaded separately by load_evals!
|
|
93
|
+
eval_dir = full.join("eval")
|
|
94
|
+
::Rails.autoloaders.main.ignore(eval_dir.to_s) if eval_dir.exist?
|
|
95
|
+
::Rails.autoloaders.main.eager_load_dir(full.to_s)
|
|
96
|
+
rescue StandardError
|
|
97
|
+
nil
|
|
98
|
+
end
|
|
99
|
+
end
|
|
100
|
+
|
|
82
101
|
def auto_create_adapter!
|
|
83
102
|
require "ruby_llm"
|
|
84
103
|
configuration.default_adapter = Adapters::RubyLLM.new
|
data/ruby_llm-contract.gemspec
CHANGED
|
@@ -7,9 +7,10 @@ Gem::Specification.new do |spec|
|
|
|
7
7
|
spec.version = RubyLLM::Contract::VERSION
|
|
8
8
|
spec.authors = ["Justyna"]
|
|
9
9
|
|
|
10
|
-
spec.summary = "
|
|
11
|
-
spec.description = "
|
|
12
|
-
"
|
|
10
|
+
spec.summary = "Know which LLM model to use, what it costs, and when accuracy drops"
|
|
11
|
+
spec.description = "Compare LLM models by accuracy and cost. Regression-test prompts in CI. " \
|
|
12
|
+
"Start on nano, auto-escalate to bigger models when quality drops. " \
|
|
13
|
+
"Companion gem for ruby_llm."
|
|
13
14
|
spec.homepage = "https://github.com/justi/ruby_llm-contract"
|
|
14
15
|
spec.license = "MIT"
|
|
15
16
|
spec.required_ruby_version = ">= 3.2.0"
|
|
@@ -17,6 +18,7 @@ Gem::Specification.new do |spec|
|
|
|
17
18
|
spec.metadata["homepage_uri"] = spec.homepage
|
|
18
19
|
spec.metadata["source_code_uri"] = spec.homepage
|
|
19
20
|
spec.metadata["changelog_uri"] = "#{spec.homepage}/blob/main/CHANGELOG.md"
|
|
21
|
+
spec.metadata["documentation_uri"] = "#{spec.homepage}#readme"
|
|
20
22
|
spec.metadata["rubygems_mfa_required"] = "true"
|
|
21
23
|
|
|
22
24
|
spec.files = Dir.chdir(__dir__) do
|
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: ruby_llm-contract
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.3.
|
|
4
|
+
version: 0.3.7
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Justyna
|
|
@@ -51,8 +51,9 @@ dependencies:
|
|
|
51
51
|
- - "~>"
|
|
52
52
|
- !ruby/object:Gem::Version
|
|
53
53
|
version: '0.3'
|
|
54
|
-
description:
|
|
55
|
-
|
|
54
|
+
description: Compare LLM models by accuracy and cost. Regression-test prompts in CI.
|
|
55
|
+
Start on nano, auto-escalate to bigger models when quality drops. Companion gem
|
|
56
|
+
for ruby_llm.
|
|
56
57
|
executables: []
|
|
57
58
|
extensions: []
|
|
58
59
|
extra_rdoc_files: []
|
|
@@ -154,6 +155,7 @@ metadata:
|
|
|
154
155
|
homepage_uri: https://github.com/justi/ruby_llm-contract
|
|
155
156
|
source_code_uri: https://github.com/justi/ruby_llm-contract
|
|
156
157
|
changelog_uri: https://github.com/justi/ruby_llm-contract/blob/main/CHANGELOG.md
|
|
158
|
+
documentation_uri: https://github.com/justi/ruby_llm-contract#readme
|
|
157
159
|
rubygems_mfa_required: 'true'
|
|
158
160
|
rdoc_options: []
|
|
159
161
|
require_paths:
|
|
@@ -171,5 +173,5 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
|
171
173
|
requirements: []
|
|
172
174
|
rubygems_version: 3.6.7
|
|
173
175
|
specification_version: 4
|
|
174
|
-
summary:
|
|
176
|
+
summary: Know which LLM model to use, what it costs, and when accuracy drops
|
|
175
177
|
test_files: []
|