ruby_llm-contract 0.4.2 → 0.4.5
- checksums.yaml +4 -4
- data/CHANGELOG.md +40 -0
- data/Gemfile.lock +2 -2
- data/lib/ruby_llm/contract/cost_calculator.rb +41 -1
- data/lib/ruby_llm/contract/minitest.rb +116 -2
- data/lib/ruby_llm/contract/pipeline/base.rb +5 -1
- data/lib/ruby_llm/contract/rake_task.rb +20 -1
- data/lib/ruby_llm/contract/rspec/helpers.rb +91 -6
- data/lib/ruby_llm/contract/rspec.rb +13 -0
- data/lib/ruby_llm/contract/step/base.rb +4 -2
- data/lib/ruby_llm/contract/step/dsl.rb +51 -16
- data/lib/ruby_llm/contract/step/limit_checker.rb +20 -3
- data/lib/ruby_llm/contract/step/runner.rb +3 -1
- data/lib/ruby_llm/contract/version.rb +1 -1
- data/lib/ruby_llm/contract.rb +28 -0
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 502a22f4a2c8f88416bac904fb2ca370f25ba70076b3257700ae296705320314
+  data.tar.gz: '096dd32146b497b400984185185b9e2e81e6b5b53169896946a43545e368b25c'
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2111cd0c66eee5c1bec53ae4e5278aa9a79643304f3812bba65113ded58b7a42fa56b4d612461e1e5553e4cebd529417760bc07c919a52b1462498ca3ececbf3
+  data.tar.gz: 61e8112e9ec2c577d675458d53ecbae303da8db31351803d6e0758b7b7f8b6566587147efa8b889d93a955b85217ebe7d1883d6c506f53d04490a50b6448cf2a
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,45 @@
 # Changelog
 
+## 0.4.5 (2026-03-24)
+
+Audit hardening — 18 bugs fixed across 4 audit rounds.
+
+### Fixes
+
+- **RakeTask history before abort** — `track_history` now saves all reports (pass and fail) before gating, so failed runs appear in eval history.
+- **RSpec/Minitest stub scoping** — block form `stub_step` uses thread-local overrides with real cleanup. Non-block `stub_all_steps` auto-restored by RSpec `around(:each)` hook and Minitest `setup`/`teardown`.
+- **StepAdapterOverride** — handles `context: nil` and respects string key `"adapter"`. Moved to `contract.rb` so both test frameworks share one mechanism.
+- **max_cost fail closed output estimate** — preflight uses 1x input tokens as output estimate when `max_output` not set, preventing cost bypass for output-expensive models.
+- **reset_configuration! clears overrides** — `step_adapter_overrides` now cleared on reset.
+- **CostCalculator.register_model** — validates `Numeric`, `finite?`, non-negative. Rejects NaN, Infinity, strings, nil.
+- **Pipeline token_budget** — rejects negative and zero values (parity with `timeout_ms`).
+- **track_history model fallback** — uses step DSL `model`, then `default_model` when context has no model. Handles string key `"model"`.
+- **estimate_cost / estimate_eval_cost** — falls back to step DSL model when no explicit model arg given.
+- **stub_steps string keys** — both RSpec and Minitest normalize string-keyed options with `transform_keys(:to_sym)`.
+- **DSL `:default` reset** — `model(:default)`, `temperature(:default)`, `max_cost(:default)` reset inherited parent values.
+
+## 0.4.4 (2026-03-24)
+
+- **`stub_steps` (plural)** — stub multiple steps with different responses in one block. No nesting needed. Works in RSpec and Minitest:
+  ```ruby
+  stub_steps(
+    ClassifyTicket => { response: { priority: "high" } },
+    RouteToTeam => { response: { team: "billing" } }
+  ) { TicketPipeline.run("test") }
+  ```
+
+## 0.4.3 (2026-03-24)
+
+Production feedback release — driven by ADR-0016 (real Rails 8.1 deployment).
+
+### Features
+
+- **`stub_step` block form** — `stub_step(Step, response: x) { test }` auto-resets adapter after block. Works in RSpec and Minitest. Eliminates leaked test state.
+- **Minitest per-step routing** — `stub_step(StepA, ...)` now actually routes to StepA only (was setting global adapter, ignoring step class).
+- **`track_history` in RakeTask** — `t.track_history = true` auto-appends every eval run (pass and fail) to `.eval_history/`. Drift detection without manual `save_history!` calls.
+- **`max_cost` fail closed** — unknown model pricing now refuses the call instead of silently skipping. Set `on_unknown_pricing: :warn` for old behavior.
+- **`CostCalculator.register_model`** — register pricing for custom/fine-tuned models: `register_model("ft:gpt-4o", input_per_1m: 3.0, output_per_1m: 6.0)`.
+
 ## 0.4.2 (2026-03-24)
 
 - **RakeTask lazy context** — `t.context` now accepts a Proc, resolved at task runtime (after `:environment`). Fixes adapter not being available at Rake load time in Rails apps.
data/Gemfile.lock
CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    ruby_llm-contract (0.4.
+    ruby_llm-contract (0.4.5)
       dry-types (~> 1.7)
       ruby_llm (~> 1.0)
       ruby_llm-schema (~> 0.3)
@@ -165,7 +165,7 @@ CHECKSUMS
   rubocop-ast (1.49.1) sha256=4412f3ee70f6fe4546cc489548e0f6fcf76cafcfa80fa03af67098ffed755035
   ruby-progressbar (1.13.0) sha256=80fc9c47a9b640d6834e0dc7b3c94c9df37f08cb072b7761e4a71e22cff29b33
   ruby_llm (1.14.0) sha256=57c6f7034fc4a44504ea137d70f853b07824f1c1cdbe774ab3ab3522e7098deb
-  ruby_llm-contract (0.4.
+  ruby_llm-contract (0.4.5)
   ruby_llm-schema (0.3.0) sha256=a591edc5ca1b7f0304f0e2261de61ba4b3bea17be09f5cf7558153adfda3dec6
   unicode-display_width (3.2.0) sha256=0cdd96b5681a5949cdbc2c55e7b420facae74c4aaf9a9815eee1087cb1853c42
   unicode-emoji (4.2.0) sha256=519e69150f75652e40bf736106cfbc8f0f73aa3fb6a65afe62fefa7f80b0f80f
data/lib/ruby_llm/contract/cost_calculator.rb
CHANGED
@@ -3,6 +3,36 @@
 module RubyLLM
   module Contract
     module CostCalculator
+      # Simple struct for custom-registered model pricing
+      RegisteredModel = Struct.new(:input_price_per_million, :output_price_per_million, keyword_init: true)
+
+      @custom_models = {}
+
+      # Register pricing for custom or fine-tuned models not in the RubyLLM registry.
+      #
+      # CostCalculator.register_model("ft:gpt-4o-custom",
+      # input_per_1m: 3.0, output_per_1m: 6.0)
+      #
+      def self.register_model(model_name, input_per_1m:, output_per_1m:)
+        validate_price!(:input_per_1m, input_per_1m)
+        validate_price!(:output_per_1m, output_per_1m)
+
+        @custom_models[model_name] = RegisteredModel.new(
+          input_price_per_million: input_per_1m,
+          output_price_per_million: output_per_1m
+        )
+      end
+
+      # Remove a previously registered custom model. Mainly useful in tests.
+      def self.unregister_model(model_name)
+        @custom_models.delete(model_name)
+      end
+
+      # Reset all custom model registrations. Mainly useful in tests.
+      def self.reset_custom_models!
+        @custom_models.clear
+      end
+
       def self.calculate(model_name:, usage:)
         return nil unless model_name && usage.is_a?(Hash)
 
@@ -25,6 +55,10 @@ module RubyLLM
       end
 
       def self.find_model(model_name)
+        # Check custom registry first
+        custom = @custom_models[model_name]
+        return custom if custom
+
         return nil unless defined?(RubyLLM)
 
         RubyLLM.models.find(model_name)
@@ -32,7 +66,13 @@ module RubyLLM
         nil
       end
 
-
+      def self.validate_price!(name, value)
+        unless value.is_a?(Numeric) && value.finite? && !value.negative?
+          raise ArgumentError, "#{name} must be a finite non-negative number, got #{value.inspect}"
+        end
+      end
+
+      private_class_method :compute_cost, :token_cost, :find_model, :validate_price!
     end
   end
 end
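The price validation added to `register_model` can be tried in isolation. The following is a standalone sketch, not the gem's code: `PriceRegistry`, `Pricing`, `register`, `lookup`, and `rejected?` are hypothetical names, and only the predicate `is_a?(Numeric) && finite? && !negative?` is taken from the diff.

```ruby
# Hypothetical stand-in for CostCalculator's custom-model registry.
module PriceRegistry
  Pricing = Struct.new(:input_per_1m, :output_per_1m, keyword_init: true)

  @models = {}

  # Same predicate as validate_price! in the diff: the value must be
  # Numeric, finite, and not negative.
  def self.validate_price!(name, value)
    return if value.is_a?(Numeric) && value.finite? && !value.negative?

    raise ArgumentError, "#{name} must be a finite non-negative number, got #{value.inspect}"
  end

  # Validate both prices before storing anything.
  def self.register(model_name, input_per_1m:, output_per_1m:)
    validate_price!(:input_per_1m, input_per_1m)
    validate_price!(:output_per_1m, output_per_1m)
    @models[model_name] = Pricing.new(input_per_1m: input_per_1m, output_per_1m: output_per_1m)
  end

  def self.lookup(model_name)
    @models[model_name]
  end
end

# Returns true when registering the given input price raises ArgumentError.
def rejected?(price)
  PriceRegistry.register("probe", input_per_1m: price, output_per_1m: 1.0)
  false
rescue ArgumentError
  true
end
```

Checking `is_a?(Numeric)` first matters here: it short-circuits before `finite?` is called on a string or `nil`, which would otherwise raise `NoMethodError` instead of the intended `ArgumentError`.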
data/lib/ruby_llm/contract/minitest.rb
CHANGED
@@ -5,6 +5,20 @@ require "ruby_llm/contract"
 module RubyLLM
   module Contract
     module MinitestHelpers
+      # Snapshot adapter before each test so teardown can restore it.
+      def setup
+        super if defined?(super)
+        @_contract_original_adapter = RubyLLM::Contract.configuration.default_adapter
+      end
+
+      # Auto-cleanup: clear overrides AND restore original adapter.
+      # Prevents both non-block stub_step and stub_all_steps from leaking.
+      def teardown
+        RubyLLM::Contract.step_adapter_overrides.clear
+        RubyLLM::Contract.configuration.default_adapter = @_contract_original_adapter
+        super if defined?(super)
+      end
+
       def assert_satisfies_contract(result, msg = nil)
         assert result.ok?, msg || "Expected step result to satisfy contract, " \
                                   "but got status: #{result.status}. Errors: #{result.validation_errors.join(", ")}"
@@ -33,13 +47,113 @@ module RubyLLM
         report
       end
 
-
+      # Stub a specific step to return a canned response without API calls.
+      # Routes per-step — other steps are not affected.
+      #
+      # stub_step(ClassifyTicket, response: { priority: "high" })
+      #
+      # Supports an optional block form — the override is removed after the
+      # block returns (even if it raises):
+      #
+      # stub_step(ClassifyTicket, response: data) do
+      # result = ClassifyTicket.run("test")
+      # end
+      # # ClassifyTicket.run no longer stubbed
+      #
+      def stub_step(step_class, response: nil, responses: nil, &block)
+        adapter = if responses
+                    Adapters::Test.new(responses: responses)
+                  else
+                    Adapters::Test.new(response: response)
+                  end
+
+        overrides = RubyLLM::Contract.step_adapter_overrides
+        previous = overrides[step_class]
+        overrides[step_class] = adapter
+
+        if block
+          begin
+            yield
+          ensure
+            if previous
+              overrides[step_class] = previous
+            else
+              overrides.delete(step_class)
+            end
+          end
+        end
+      end
+
+      # Stub multiple steps at once with different responses.
+      # Takes a hash of step_class => options. Requires a block.
+      #
+      # stub_steps(
+      # ClassifyTicket => { response: { priority: "high" } },
+      # RouteToTeam => { response: { team: "billing" } }
+      # ) do
+      # result = TicketPipeline.run("test")
+      # end
+      #
+      def stub_steps(stubs, &block)
+        raise ArgumentError, "stub_steps requires a block" unless block
+
+        overrides = RubyLLM::Contract.step_adapter_overrides
+        previous = {}
+
+        stubs.each do |step_class, opts|
+          opts = opts.transform_keys(&:to_sym)
+          adapter = if opts[:responses]
+                      Adapters::Test.new(responses: opts[:responses])
+                    else
+                      Adapters::Test.new(response: opts[:response])
+                    end
+          previous[step_class] = overrides[step_class]
+          overrides[step_class] = adapter
+        end
+
+        begin
+          yield
+        ensure
+          stubs.each_key do |step_class|
+            if previous[step_class]
+              overrides[step_class] = previous[step_class]
+            else
+              overrides.delete(step_class)
+            end
+          end
+        end
+      end
+
+      # Set a global test adapter for ALL steps.
+      #
+      # stub_all_steps(response: { default: true })
+      #
+      # Supports an optional block form — the previous adapter is restored
+      # after the block returns (even if it raises):
+      #
+      # stub_all_steps(response: { default: true }) do
+      # # all steps use test adapter
+      # end
+      # # original adapter restored
+      #
+      def stub_all_steps(response: nil, responses: nil, &block)
         adapter = if responses
                     Adapters::Test.new(responses: responses)
                   else
                     Adapters::Test.new(response: response)
                   end
-
+
+        if block
+          previous = RubyLLM::Contract.configuration.default_adapter
+          begin
+            RubyLLM::Contract.configuration.default_adapter = adapter
+            yield
+          ensure
+            RubyLLM::Contract.configuration.default_adapter = previous
+          end
+        else
+          RubyLLM::Contract.configure { |c| c.default_adapter = adapter }
+        end
       end
     end
   end
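The save/restore pattern behind the block form of `stub_step` can be shown without the gem. This is a minimal sketch under stated assumptions: `ScopedOverrides`, `current`, and `with` are hypothetical names, and the map simply lives in `Thread.current`, as `step_adapter_overrides` does in the diff.

```ruby
# Hypothetical thread-local registry, modeled on step_adapter_overrides.
module ScopedOverrides
  def self.current
    Thread.current[:sketch_overrides] ||= {}
  end

  # Install `value` for `key` while the block runs. Afterwards restore the
  # previous entry, or delete the key if there was none. The ensure clause
  # runs even when the block raises.
  def self.with(key, value)
    previous = current[key]
    current[key] = value
    yield
  ensure
    if previous
      current[key] = previous
    else
      current.delete(key)
    end
  end
end
```

Because the map is stored per thread, an override installed in one test thread is not visible from another. Nested calls for the same key unwind in order, since each call remembers only the value it replaced.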
data/lib/ruby_llm/contract/pipeline/base.rb
CHANGED
@@ -29,7 +29,11 @@ module RubyLLM
           end
 
           def token_budget(limit = nil)
-
+            if limit
+              raise ArgumentError, "token_budget must be positive, got #{limit}" unless limit.positive?
+
+              return @token_budget = limit
+            end
 
             @token_budget
           end
data/lib/ruby_llm/contract/rake_task.rb
CHANGED
@@ -7,7 +7,7 @@ module RubyLLM
   module Contract
     class RakeTask < ::Rake::TaskLib
       attr_accessor :name, :context, :fail_on_empty, :minimum_score, :maximum_cost,
-                    :eval_dirs, :save_baseline, :fail_on_regression
+                    :eval_dirs, :save_baseline, :fail_on_regression, :track_history
 
       def initialize(name = :"ruby_llm_contract:eval", &block)
         super()
@@ -19,6 +19,7 @@ module RubyLLM
         @eval_dirs = [] # directories to load eval files from (non-Rails)
         @save_baseline = false
         @fail_on_regression = false
+        @track_history = false
         block&.call(self)
         define_task
       end
@@ -47,18 +48,23 @@ module RubyLLM
           suite_cost = 0.0
 
           passed_reports = []
+          all_reports = []
 
           results.each do |host, reports|
             puts "\n#{host.name || host.to_s}"
             reports.each_value do |report|
               report.print_summary
               suite_cost += report.total_cost
+              all_reports << [host, report]
               report_ok = report_meets_score?(report) && !check_regression(report)
               gate_passed = false unless report_ok
               passed_reports << report if report_ok
             end
           end
 
+          # Save history BEFORE gating — failures are valuable trend data (ADR-0016 F3)
+          save_all_history!(all_reports, context) if @track_history
+
           if @maximum_cost && suite_cost > @maximum_cost
             abort "\nEval suite FAILED: total cost $#{format("%.4f", suite_cost)} " \
                   "exceeds budget $#{format("%.4f", @maximum_cost)}"
@@ -68,6 +74,7 @@ module RubyLLM
 
           # Save baselines only after ALL gates pass
           passed_reports.each { |r| save_baseline!(r) } if @save_baseline
+
           puts "\nAll evals passed."
         end
       end
@@ -98,6 +105,18 @@ module RubyLLM
         puts " Baseline saved: #{path}"
       end
 
+      def save_all_history!(host_reports, context)
+        context_model = (context[:model] || context["model"]) if context.is_a?(Hash)
+        host_reports.each do |host, report|
+          # Model priority: context > step DSL > default config
+          model = context_model
+          model ||= (host.model if host.respond_to?(:model))
+          model ||= RubyLLM::Contract.configuration.default_model rescue nil
+          path = report.save_history!(model: model)
+          puts " History saved: #{path}"
+        end
+      end
+
       def task_prerequisites
         defined?(::Rails) ? [:environment] : []
       end
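The model fallback order used by `save_all_history!` (context, then step DSL, then configured default) can be written as a small pure function. This is an illustrative sketch only: `resolve_history_model` and its `default_model` parameter are hypothetical, with the parameter standing in for `RubyLLM::Contract.configuration.default_model`.

```ruby
# Resolve the model name to record in eval history.
# Priority: context (symbol or string key) > step-level model > configured default.
# `host` may be any object; it is consulted only if it responds to `model`.
def resolve_history_model(context, host, default_model)
  # A non-Hash context (for example nil) contributes no model.
  context_model = (context[:model] || context["model"]) if context.is_a?(Hash)
  step_model = host.model if host.respond_to?(:model)
  context_model || step_model || default_model
end
```

Keeping the lookup free of side effects makes each tier easy to test, including the string-key case that the 0.4.5 changelog calls out.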
data/lib/ruby_llm/contract/rspec/helpers.rb
CHANGED
@@ -12,11 +12,77 @@ module RubyLLM
         #
         # Only affects the specified step — other steps are not affected.
         #
-
+        # With a block, the stub is scoped — cleaned up after the block:
+        #
+        # stub_step(ClassifyTicket, response: data) do
+        # # only stubbed inside this block
+        # end
+        # # ClassifyTicket no longer stubbed
+        #
+        # Without a block, the stub lives until the RSpec example ends.
+        #
+        def stub_step(step_class, response: nil, responses: nil, &block)
           adapter = build_test_adapter(response: response, responses: responses)
-
-
-
+
+          if block
+            # Block form: use thread-local overrides with save/restore for real scoping
+            overrides = RubyLLM::Contract.step_adapter_overrides
+            previous = overrides[step_class]
+            overrides[step_class] = adapter
+            begin
+              yield
+            ensure
+              if previous
+                overrides[step_class] = previous
+              else
+                overrides.delete(step_class)
+              end
+            end
+          else
+            # Non-block: use RSpec allow (auto-cleaned after example)
+            allow(step_class).to receive(:run).and_wrap_original do |original, input, **kwargs|
+              context = kwargs[:context] || {}
+              unless context.key?(:adapter) || context.key?("adapter")
+                context = context.merge(adapter: adapter)
+              end
+              original.call(input, context: context)
+            end
+          end
+        end
+
+        # Stub multiple steps at once with different responses.
+        # Takes a hash of step_class => options. Requires a block.
+        #
+        # stub_steps(
+        # ClassifyTicket => { response: { priority: "high" } },
+        # RouteToTeam => { response: { team: "billing" } }
+        # ) do
+        # result = TicketPipeline.run("test")
+        # end
+        #
+        def stub_steps(stubs, &block)
+          raise ArgumentError, "stub_steps requires a block" unless block
+
+          overrides = RubyLLM::Contract.step_adapter_overrides
+          previous = {}
+
+          stubs.each do |step_class, opts|
+            opts = opts.transform_keys(&:to_sym)
+            adapter = build_test_adapter(**opts)
+            previous[step_class] = overrides[step_class]
+            overrides[step_class] = adapter
+          end
+
+          begin
+            yield
+          ensure
+            stubs.each_key do |step_class|
+              if previous[step_class]
+                overrides[step_class] = previous[step_class]
+              else
+                overrides.delete(step_class)
+              end
+            end
           end
         end
 
@@ -24,9 +90,28 @@ module RubyLLM
         #
         # stub_all_steps(response: { default: true })
         #
-
+        # Supports an optional block form — the previous adapter is restored
+        # after the block returns (even if it raises):
+        #
+        # stub_all_steps(response: { default: true }) do
+        # # all steps use test adapter
+        # end
+        # # original adapter restored
+        #
+        def stub_all_steps(response: nil, responses: nil, &block)
           adapter = build_test_adapter(response: response, responses: responses)
-
+
+          if block
+            previous = RubyLLM::Contract.configuration.default_adapter
+            begin
+              RubyLLM::Contract.configuration.default_adapter = adapter
+              yield
+            ensure
+              RubyLLM::Contract.configuration.default_adapter = previous
+            end
+          else
+            RubyLLM::Contract.configure { |c| c.default_adapter = adapter }
+          end
         end
 
         private
data/lib/ruby_llm/contract/rspec.rb
CHANGED
@@ -8,4 +8,17 @@ require_relative "rspec/helpers"
 
 RSpec.configure do |config|
   config.include RubyLLM::Contract::RSpec::Helpers
+
+  # Auto-cleanup: snapshot adapter before each example, restore after.
+  # Prevents non-block stub_all_steps from leaking between examples.
+  config.around(:each) do |example|
+    original_adapter = RubyLLM::Contract.configuration.default_adapter
+    original_overrides = RubyLLM::Contract.step_adapter_overrides.dup
+    begin
+      example.run
+    ensure
+      RubyLLM::Contract.configuration.default_adapter = original_adapter
+      RubyLLM::Contract.step_adapter_overrides.replace(original_overrides)
+    end
+  end
 end if defined?(::RSpec)
data/lib/ruby_llm/contract/step/base.rb
CHANGED
@@ -24,7 +24,7 @@ module RubyLLM
           end
 
           def estimate_cost(input:, model: nil)
-            model_name = model || RubyLLM::Contract.configuration.default_model
+            model_name = model || (self.model if respond_to?(:model)) || RubyLLM::Contract.configuration.default_model
             messages = build_messages(input)
             input_tokens = TokenEstimator.estimate(messages)
             output_tokens = max_output || 256 # conservative default
@@ -46,7 +46,8 @@ module RubyLLM
             defn = send(:all_eval_definitions)[eval_name.to_s]
             raise ArgumentError, "No eval '#{eval_name}' defined" unless defn
 
-
+            step_model = (self.model if respond_to?(:model))
+            model_list = models || [step_model || RubyLLM::Contract.configuration.default_model].compact
             cases = defn.build_dataset.cases
 
             model_list.each_with_object({}) do |model_name, result|
@@ -117,6 +118,7 @@ module RubyLLM
               prompt_block: prompt, contract_definition: effective_contract,
               adapter: adapter, model: model, output_schema: output_schema,
               max_output: max_output, max_input: max_input, max_cost: max_cost,
+              on_unknown_pricing: on_unknown_pricing,
               temperature: effective_temp, extra_options: extra_options
             ).call(input)
           rescue ArgumentError => e
data/lib/ruby_llm/contract/step/dsl.rb
CHANGED
@@ -111,48 +111,83 @@ module RubyLLM
           end
         end
 
-        def max_cost(amount = nil)
+        def max_cost(amount = nil, on_unknown_pricing: nil)
+          if amount == :default
+            @max_cost = nil
+            @max_cost_explicitly_unset = true
+            @on_unknown_pricing = nil
+            return nil
+          end
+
           if amount
             unless amount.is_a?(Numeric) && amount.positive?
               raise ArgumentError, "max_cost must be positive, got #{amount}"
             end
 
-
+            if on_unknown_pricing && !%i[refuse warn].include?(on_unknown_pricing)
+              raise ArgumentError, "on_unknown_pricing must be :refuse or :warn, got #{on_unknown_pricing.inspect}"
+            end
+
+            @max_cost_explicitly_unset = false
+            @max_cost = amount
+            @on_unknown_pricing = on_unknown_pricing || :refuse
+            return @max_cost
           end
 
-          if defined?(@max_cost)
-
-
-
+          return @max_cost if defined?(@max_cost) && !@max_cost_explicitly_unset
+          return nil if @max_cost_explicitly_unset
+
+          superclass.max_cost if superclass.respond_to?(:max_cost)
+        end
+
+        def on_unknown_pricing
+          if defined?(@on_unknown_pricing)
+            @on_unknown_pricing
+          elsif superclass.respond_to?(:on_unknown_pricing)
+            superclass.on_unknown_pricing
+          else
+            :refuse
           end
         end
 
         def model(name = nil)
+          if name == :default
+            @model = nil
+            @model_explicitly_unset = true
+            return nil
+          end
+
           if name
+            @model_explicitly_unset = false
             return @model = name
           end
 
-          if defined?(@model)
-
-
-
-          end
+          return @model if defined?(@model) && !@model_explicitly_unset
+          return nil if @model_explicitly_unset
+
+          superclass.model if superclass.respond_to?(:model)
         end
 
         def temperature(value = nil)
+          if value == :default
+            @temperature = nil
+            @temperature_explicitly_unset = true
+            return nil
+          end
+
           if value
             unless value.is_a?(Numeric) && value >= 0 && value <= 2
               raise ArgumentError, "temperature must be 0.0-2.0, got #{value}"
             end
 
+            @temperature_explicitly_unset = false
             return @temperature = value
           end
 
-          if defined?(@temperature)
-
-
-
-          end
+          return @temperature if defined?(@temperature) && !@temperature_explicitly_unset
+          return nil if @temperature_explicitly_unset
+
+          superclass.temperature if superclass.respond_to?(:temperature)
         end
 
         def around_call(&block)
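The `:default` sentinel relies on a second flag because setting the instance variable to `nil` cannot, on its own, be told apart from "inherit from the parent". A minimal sketch of the idea, assuming a hypothetical `SketchStep` hierarchy in place of the gem's step classes:

```ruby
# Hypothetical base class with an inheritable class-level setting.
# Passing :default stops inheritance from the parent for that subclass.
class SketchStep
  def self.model(name = nil)
    if name == :default
      @model = nil
      @model_explicitly_unset = true
      return nil
    end

    if name
      @model_explicitly_unset = false
      return @model = name
    end

    # Reader path: an explicit reset wins over both local and inherited values.
    return nil if @model_explicitly_unset
    return @model if defined?(@model)

    superclass.model if superclass.respond_to?(:model)
  end
end

class ParentStep < SketchStep
  model "parent-model" # illustrative model name
end

# Sets nothing, so the reader falls through to ParentStep.
class InheritingStep < ParentStep; end

# Opts out of the inherited value.
class ResetStep < ParentStep
  model :default
end
```

Here `InheritingStep.model` resolves through `superclass`, while `ResetStep.model` returns `nil` even though its parent has a value. The flag is stored on the subclass itself, so the reset does not affect the parent or sibling classes.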
data/lib/ruby_llm/contract/step/limit_checker.rb
CHANGED
@@ -28,16 +28,22 @@ module RubyLLM
           errors
         end
 
+        # Default output estimate when max_output is not set.
+        # Uses input token count as a conservative proxy — most LLM responses
+        # are shorter than the input, so this overestimates slightly.
+        # Without this, output cost is zero and max_cost can be bypassed
+        # for models expensive on completion side.
+        DEFAULT_OUTPUT_RATIO = 1
+
         def append_cost_error(estimated, errors)
-          estimated_output = effective_max_output ||
+          estimated_output = effective_max_output || (estimated * DEFAULT_OUTPUT_RATIO)
           estimated_cost = CostCalculator.calculate(
             model_name: @model,
             usage: { input_tokens: estimated, output_tokens: estimated_output }
           )
 
           if estimated_cost.nil?
-
-            "has no pricing data — cost limit not enforced"
+            handle_unknown_pricing(errors)
           elsif estimated_cost > @max_cost
             errors << "Cost limit exceeded: estimated $#{format("%.6f", estimated_cost)} " \
                       "(#{estimated} input + #{estimated_output} output tokens), " \
@@ -45,6 +51,17 @@ module RubyLLM
           end
         end
 
+        def handle_unknown_pricing(errors)
+          if @on_unknown_pricing == :warn
+            warn "[ruby_llm-contract] max_cost is configured but model '#{@model}' " \
+                 "has no pricing data — cost limit not enforced"
+          else
+            errors << "max_cost is set but model '#{@model}' has no pricing data. " \
+                      "Register pricing via CostCalculator.register_model or set " \
+                      "on_unknown_pricing: :warn to proceed without cost checks."
+          end
+        end
+
         def build_limit_result(messages, estimated, errors)
           Result.new(
             status: :limit_exceeded,
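A worked example shows why the output estimate matters for the preflight check. The sketch below is an approximation of the fail-closed logic, not the gem's implementation: `CostPreflight`, `errors_for`, and the price table are hypothetical, and the prices are made-up numbers chosen to make the arithmetic easy to follow.

```ruby
module CostPreflight
  # Hypothetical per-million-token prices; not real pricing data.
  PRICES = {
    "cheap-in-pricey-out" => { input_per_1m: 1.0, output_per_1m: 50.0 }
  }.freeze

  # Same value as DEFAULT_OUTPUT_RATIO in the diff: assume output == input tokens.
  DEFAULT_OUTPUT_RATIO = 1

  # Returns a list of error strings; an empty list means the call may proceed.
  def self.errors_for(model:, input_tokens:, max_cost:, max_output: nil, on_unknown_pricing: :refuse)
    price = PRICES[model]
    if price.nil?
      # Fail closed unless the caller explicitly opted into :warn.
      if on_unknown_pricing == :warn
        warn "model '#{model}' has no pricing data, cost limit not enforced"
        return []
      end
      return ["max_cost is set but model '#{model}' has no pricing data"]
    end

    output_tokens = max_output || (input_tokens * DEFAULT_OUTPUT_RATIO)
    cost = (input_tokens * price[:input_per_1m] + output_tokens * price[:output_per_1m]) / 1_000_000.0
    cost > max_cost ? [format("Cost limit exceeded: estimated $%.6f", cost)] : []
  end
end
```

With 10,000 input tokens and no `max_output`, the estimate is (10,000 × 1.0 + 10,000 × 50.0) / 1,000,000 = $0.51, which exceeds a $0.05 limit. Counting input tokens alone would give $0.01 and let the call through, which is the bypass the 0.4.5 changelog describes.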
data/lib/ruby_llm/contract/step/runner.rb
CHANGED
@@ -8,7 +8,8 @@ module RubyLLM
 
         def initialize(input_type:, output_type:, prompt_block:, contract_definition:,
                        adapter:, model:, output_schema: nil, max_output: nil,
-                       max_input: nil, max_cost: nil,
+                       max_input: nil, max_cost: nil, on_unknown_pricing: :refuse,
+                       temperature: nil, extra_options: {})
           @input_type = input_type
           @output_type = output_type
           @prompt_block = prompt_block
@@ -19,6 +20,7 @@ module RubyLLM
           @max_output = max_output
           @max_input = max_input
           @max_cost = max_cost
+          @on_unknown_pricing = on_unknown_pricing
           @temperature = temperature
           @extra_options = extra_options
         end
data/lib/ruby_llm/contract.rb
CHANGED
@@ -18,6 +18,7 @@ module RubyLLM
 
       def reset_configuration!
         @configuration = Configuration.new
+        step_adapter_overrides.clear
       end
 
       # --- Eval host registry ---
@@ -40,6 +41,15 @@ module RubyLLM
         @eval_hosts = []
       end
 
+      # Thread-local per-step adapter overrides used by test helpers (RSpec + Minitest).
+      def step_adapter_overrides
+        Thread.current[:ruby_llm_contract_step_overrides] ||= {}
+      end
+
+      def step_adapter_overrides=(map)
+        Thread.current[:ruby_llm_contract_step_overrides] = map
+      end
+
       def load_evals!(*dirs)
         dirs = dirs.flatten.compact
         if dirs.empty? && defined?(::Rails)
@@ -102,6 +112,21 @@ module RubyLLM
         nil
       end
     end
+
+    # One-time prepend on Step::Base that checks the override map before
+    # falling through to the normal adapter resolution.
+    # Used by both RSpec and Minitest test helpers.
+    module StepAdapterOverride
+      def run(input, context: {})
+        context = context || {}
+        overrides = RubyLLM::Contract.step_adapter_overrides
+        unless overrides.empty? || context.key?(:adapter) || context.key?("adapter")
+          override = overrides[self]
+          context = context.merge(adapter: override) if override
+        end
+        super(input, context: context)
+      end
+    end
   end
 end
 
@@ -126,3 +151,6 @@ require_relative "contract/pipeline"
 require_relative "contract/eval"
 require_relative "contract/dsl"
 require_relative "contract/railtie" if defined?(Rails::Railtie)
+
+# Prepend after Step::Base is loaded
+RubyLLM::Contract::Step::Base.singleton_class.prepend(RubyLLM::Contract::StepAdapterOverride)
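Prepending a module to a class's singleton class is what lets `StepAdapterOverride` intercept the class-level `run` of every step subclass. The following self-contained sketch illustrates the mechanism under assumptions: `SketchBase`, `SketchRegistry`, `SketchAdapterOverride`, `Classify`, and `Route` are hypothetical stand-ins, and `run` only reports which adapter it received.

```ruby
# Hypothetical step base class; `run` reports the adapter it was given.
class SketchBase
  def self.run(input, context: {})
    "#{input} via #{context.fetch(:adapter, :real)}"
  end
end

# Hypothetical thread-local override map keyed by step class.
module SketchRegistry
  def self.overrides
    Thread.current[:sketch_step_overrides] ||= {}
  end
end

# Prepended to the singleton class, so it sits in front of the class method
# `run` for SketchBase and all subclasses. Inside `run`, `self` is the class
# being invoked, which is what makes per-step routing possible.
module SketchAdapterOverride
  def run(input, context: {})
    context ||= {} # tolerate an explicit context: nil
    unless context.key?(:adapter) || context.key?("adapter")
      override = SketchRegistry.overrides[self]
      context = context.merge(adapter: override) if override
    end
    super(input, context: context)
  end
end

SketchBase.singleton_class.prepend(SketchAdapterOverride)

class Classify < SketchBase; end
class Route < SketchBase; end
```

An adapter passed explicitly in `context` always wins over the override map, matching the precedence in the diff. Subclasses need no extra wiring because their singleton classes inherit from the base class's singleton class.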