ruby_llm-contract 0.8.0 → 0.10.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +64 -0
- data/Gemfile.lock +2 -2
- data/README.md +96 -37
- data/lib/ruby_llm/contract/adapters/ruby_llm.rb +9 -1
- data/lib/ruby_llm/contract/concerns/eval_host.rb +6 -9
- data/lib/ruby_llm/contract/concerns/stub_helpers.rb +97 -0
- data/lib/ruby_llm/contract/contract/definition.rb +2 -0
- data/lib/ruby_llm/contract/cost_calculator.rb +11 -2
- data/lib/ruby_llm/contract/eval/recommender.rb +3 -1
- data/lib/ruby_llm/contract/eval/retry_optimizer.rb +16 -13
- data/lib/ruby_llm/contract/eval.rb +13 -0
- data/lib/ruby_llm/contract/minitest.rb +6 -108
- data/lib/ruby_llm/contract/pipeline/result.rb +1 -1
- data/lib/ruby_llm/contract/rake_task/suite_gate.rb +117 -0
- data/lib/ruby_llm/contract/rake_task.rb +30 -51
- data/lib/ruby_llm/contract/rspec/helpers.rb +9 -123
- data/lib/ruby_llm/contract/step/base.rb +56 -24
- data/lib/ruby_llm/contract/step/dsl.rb +91 -63
- data/lib/ruby_llm/contract/step/limit_checker.rb +34 -1
- data/lib/ruby_llm/contract/step/retry_executor.rb +6 -13
- data/lib/ruby_llm/contract/step/runner.rb +22 -20
- data/lib/ruby_llm/contract/step/runner_config.rb +26 -0
- data/lib/ruby_llm/contract/version.rb +1 -1
- data/lib/ruby_llm/contract.rb +1 -0
- data/ruby_llm-contract.gemspec +5 -1
- metadata +3 -4
- data/.rspec +0 -3
- data/.rubycritic.yml +0 -8
- data/.simplecov +0 -22
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 66d3692be182ac8958132d06b53c1d47e01d07769d0e7a1942f81b71eb23a759
|
|
4
|
+
data.tar.gz: f254fcbc6bc8c2a42a78cff9ea3b24b23f04d860c60d79084bbd8a5d6ea6ec0b
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: f035aca56a5c0e2ca0607a06431cbb76a6ff4cc45f2f35fe9309d534b881910fa05d63c5c581a6e9cbac5374377f6fc187e521a844a653da5b53d612e9a23a89
|
|
7
|
+
data.tar.gz: 31bb34333a28224e7c00ce2dea7b3905dce878c320fc0fb8a0e5db1313259a0e12e136ab6cc74126b25f0a47984830ae2172e3f824ccfdcb28f847929f40080a
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,69 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 0.10.1 (2026-06-01)
|
|
4
|
+
|
|
5
|
+
Patch release fixing gem packaging. 0.10.0 was yanked from rubygems.org due to the issue documented below; 0.10.1 is the recommended upgrade target. No code behavior change vs 0.10.0.
|
|
6
|
+
|
|
7
|
+
### Fixed
|
|
8
|
+
|
|
9
|
+
- **Gem no longer ships internal tracker / dev configs.** Excluded from `spec.files`: `TODO.md`, `.rspec`, `.rubycritic.yml`, `.simplecov`, and the `.revive/` directory. Pre-0.10.1 the published gem contained these files; adopters who already extracted 0.10.0 can safely delete them.
|
|
10
|
+
|
|
11
|
+
## 0.10.0 (2026-06-01)
|
|
12
|
+
|
|
13
|
+
First published release since 0.8.0. Consolidates work originally tagged as 0.9.0 (multimodal input) and 0.9.1 (internal quality refactor), neither of which was pushed to rubygems. Adopters upgrading from 0.8.0 should read the **Behavioural change** and **Breaking changes** sections below before installing.
|
|
14
|
+
|
|
15
|
+
### Breaking changes
|
|
16
|
+
|
|
17
|
+
- **`validate(description, &block)` and `Definition#invariant(description, &block)` now raise `ArgumentError` when `description` is `nil` or empty.** Pre-0.10.0 the empty descriptor was silently accepted and produced `""` entries in `result.validation_errors`, making debugging impossible. Codex audit found zero production use sites across `lib/`, `examples/`, `README` — only the regression-marker test certifying the bug.
|
|
18
|
+
|
|
19
|
+
### Migration
|
|
20
|
+
|
|
21
|
+
Ensure every `validate` / `invariant` call has a non-empty descriptor (this is already how every README example writes them):
|
|
22
|
+
|
|
23
|
+
```ruby
|
|
24
|
+
# Before (silently accepted, produced "" in validation_errors):
|
|
25
|
+
validate("") { |o| o[:score].between?(0, 100) }
|
|
26
|
+
|
|
27
|
+
# After (required):
|
|
28
|
+
validate("score in range 0-100") { |o| o[:score].between?(0, 100) }
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
### Added
|
|
32
|
+
|
|
33
|
+
- **Multimodal input via `context: { attachment: ... }`** — pass a file/IO/URL through `Step.run(input, context: { attachment: path })`; the adapter forwards it to `RubyLLM::Chat#ask(content, with: attachment)`. RubyLLM normalises wire format per provider (Anthropic url/base64, OpenAI `image_url`/`file`, Gemini `inline_data`). Multi-attachment supported natively (`with: [pdf1, pdf2]` or `with: { images: [...], pdfs: [...] }`). See [multimodal input guide](docs/guide/multimodal_input.md) and [ADR-0022](doc/decisions/ADR-0022-v09-multimodal-input.md).
|
|
34
|
+
- **`attachment_token_estimate(n)` class macro** — adopter-declared conservative estimate of attachment input tokens. Applied to BOTH runtime (`limit_checker`) and pre-flight (`estimate_cost`) — same source of truth, no estimate/runtime drift.
|
|
35
|
+
- **`on_unknown_attachment_size(:refuse | :warn)` class macro** — mirrors `on_unknown_pricing` opt-out semantics. Defaults to `:refuse`. Never settable as global default — same invariant as `max_cost` fail-closed.
|
|
36
|
+
|
|
37
|
+
### Behavioural change (READ BEFORE UPGRADING)
|
|
38
|
+
|
|
39
|
+
- **Contracts with `max_cost` or `max_input` AND `context[:attachment]` set AND no `attachment_token_estimate` declared → REFUSE with `:limit_exceeded`.** This is fail-closed semantics: the gem cannot bound vision/PDF token cost without an adopter-declared estimate. Opt out per-step with `on_unknown_attachment_size :warn`. Text-only contracts and contracts without `max_cost`/`max_input` are unaffected.
|
|
40
|
+
|
|
41
|
+
### Changed
|
|
42
|
+
|
|
43
|
+
- **`run_eval` (no args) return shape pinned to `Hash<String, Report>` keyed by eval name.** Documents the existing contract used by `RubyLLM::Contract::RakeTask#collect_host_reports` and adopters. No runtime change vs 0.8.0 — only the spec assertion now locks the shape.
|
|
44
|
+
- **`Parser.parse(text, strategy: :json)` first-bracket-wins boundary documented.** Extraction commits to the first balanced `{` or `[` structure and does NOT retry on later candidates. Empty `{}` followed by real JSON parses as the empty Hash; non-JSON `{braces}` before real JSON raises `ParseError`. No runtime change — this codifies long-standing behavior with explicit boundary tests.
|
|
45
|
+
|
|
46
|
+
### Fixed
|
|
47
|
+
|
|
48
|
+
- **`with_retry_disabled` no longer mutates the step class's singleton method.** The optimizer now passes `retry_policy_override: nil` through `context:` to `compare_models`, which `Step::Base#runtime_settings` already honours. Removes a concurrency hazard where two parallel `optimize_retry_policy` calls on the same step would race on the singleton restore in `ensure`.
|
|
49
|
+
- **`CostCalculator.find_model` exposed as a public class method.** Removes two `CostCalculator.send(:find_model, ...)` workarounds in `Step::Base#estimate_cost`. The `estimated_cost_for` helper is gone — `estimate_cost` now routes through the existing public `CostCalculator.calculate(model_name:, usage:)`.
|
|
50
|
+
- **`stub_step` unified on a single storage path.** Both block and non-block forms now write to `RubyLLM::Contract.step_adapter_overrides` (thread-local). The `around(:each)` hook in `rspec.rb` handles cleanup between examples. Removes the prior `allow(step).to receive(:run)` branch.
|
|
51
|
+
|
|
52
|
+
### Internal
|
|
53
|
+
|
|
54
|
+
- **Anti-facade audit complete: 89/89 spec files under per-test 17-mode walk** (Phase A: 26 specs, Phase C: 63 specs via parallel Codex fan-out). Net +30 strengthened tests against mutation-blind assertions, zero public API change beyond the breaking entry above.
|
|
55
|
+
- **Dead `ObjectSpace.each_object(Class)` fallback removed** in `concerns/eval_host.rb#register_subclasses`. The gemspec requires Ruby `>= 3.2.0`, so `Class#subclasses` (Ruby ≥ 3.1) is always available; the legacy fallback was unreachable code that would have iterated all loaded classes O(n) and was not thread-safe.
|
|
56
|
+
|
|
57
|
+
### Deferred (not in 0.10.x)
|
|
58
|
+
|
|
59
|
+
- `add_history` multi-turn replay of prior attachments — single-turn multimodal supported; follow-up questions on the same document deferred to a later release.
|
|
60
|
+
- Streaming + attachment — contract steps remain synchronous.
|
|
61
|
+
- Provider-specific attachment size caps — surface only via `attachment_token_estimate` calibration; consult provider docs.
|
|
62
|
+
|
|
63
|
+
### Tests
|
|
64
|
+
|
|
65
|
+
- Suite: 1401 examples / 0 failures / 7 pending (was 1346/0/8 at 0.8.0).
|
|
66
|
+
|
|
3
67
|
## 0.8.0 (2026-04-26)
|
|
4
68
|
|
|
5
69
|
Narrative repositioning + small API additions. Internal architecture unchanged: no `Step::Base` refactor, no breaking changes to existing DSL.
|
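The first-bracket-wins boundary described in the 0.10.0 changelog entry for `Parser.parse` can be illustrated with a small standalone sketch. `first_bracket_parse` is a hypothetical helper written for this illustration, not the gem's parser, and it raises `JSON::ParserError` where the gem is documented to raise `ParseError`. It commits to the first balanced `{...}` or `[...]` span and never retries on a later candidate.

```ruby
require "json"

# Hypothetical helper (not the gem's Parser): extract the first balanced
# `{...}` or `[...]` span and parse only that span.
def first_bracket_parse(text)
  start = text.index(/[{\[]/)
  raise ArgumentError, "no JSON structure found" if start.nil?

  depth = 0
  in_string = false
  escaped = false

  text[start..].each_char.with_index do |char, offset|
    if in_string
      # Brackets inside string literals must not affect the depth count.
      if escaped
        escaped = false
      elsif char == "\\"
        escaped = true
      elsif char == '"'
        in_string = false
      end
      next
    end

    case char
    when '"' then in_string = true
    when "{", "[" then depth += 1
    when "}", "]" then depth -= 1
    end

    # First balanced candidate wins. If it is not valid JSON, the
    # JSON::ParserError propagates; later candidates are never tried.
    return JSON.parse(text[start, offset + 1], symbolize_names: true) if depth.zero?
  end

  raise ArgumentError, "unbalanced JSON structure"
end

first_bracket_parse('note {} then {"score": 90}')   # => {}
first_bracket_parse('prefix {"score": 90} suffix')  # => { score: 90 }
```

Under this sketch, `'{braces} {"score": 90}'` raises instead of falling through to the second structure, which matches the boundary the changelog describes.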
data/Gemfile.lock
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
PATH
|
|
2
2
|
remote: .
|
|
3
3
|
specs:
|
|
4
|
-
ruby_llm-contract (0.
|
|
4
|
+
ruby_llm-contract (0.10.1)
|
|
5
5
|
dry-types (~> 1.7)
|
|
6
6
|
ruby_llm (~> 1.12)
|
|
7
7
|
ruby_llm-schema (~> 0.3)
|
|
@@ -258,7 +258,7 @@ CHECKSUMS
|
|
|
258
258
|
rubocop-ast (1.49.1) sha256=4412f3ee70f6fe4546cc489548e0f6fcf76cafcfa80fa03af67098ffed755035
|
|
259
259
|
ruby-progressbar (1.13.0) sha256=80fc9c47a9b640d6834e0dc7b3c94c9df37f08cb072b7761e4a71e22cff29b33
|
|
260
260
|
ruby_llm (1.14.0) sha256=57c6f7034fc4a44504ea137d70f853b07824f1c1cdbe774ab3ab3522e7098deb
|
|
261
|
-
ruby_llm-contract (0.
|
|
261
|
+
ruby_llm-contract (0.10.1)
|
|
262
262
|
ruby_llm-schema (0.3.0) sha256=a591edc5ca1b7f0304f0e2261de61ba4b3bea17be09f5cf7558153adfda3dec6
|
|
263
263
|
ruby_parser (3.22.0) sha256=1eb4937cd9eb220aa2d194e352a24dba90aef00751e24c8dfffdb14000f15d23
|
|
264
264
|
rubycritic (4.12.0) sha256=024fed90fe656fa939f6ea80aab17569699ac3863d0b52fd72cb99892247abc8
|
data/README.md
CHANGED
|
@@ -2,9 +2,9 @@
|
|
|
2
2
|
|
|
3
3
|
**Contracts + Evals for [ruby_llm](https://github.com/crmne/ruby_llm).**
|
|
4
4
|
|
|
5
|
-
Your eval passed. Prod broke anyway? This gem wraps `RubyLLM::Chat` with input/output contracts, business-rule validation, retry with model escalation on validation failure, pre-flight cost ceilings, and
|
|
5
|
+
Your eval passed. Prod broke anyway? This gem wraps `RubyLLM::Chat` with input/output contracts, business-rule validation, retry with model escalation on validation failure, pre-flight cost ceilings, and a regression-eval framework — so a flaky cheap-model call escalates to a stronger model instead of shipping garbage to your user.
|
|
6
6
|
|
|
7
|
-
`ruby_llm` handles the HTTP side (rate limits, timeouts, streaming, tool calls, embeddings). This gem handles what the model *returned
|
|
7
|
+
`ruby_llm` handles the HTTP side (rate limits, timeouts, streaming, tool calls, embeddings). This gem handles what the model *returned* at **runtime**: schema validation, business rules, model escalation on failed validation, regression datasets that gate prompt/model changes in CI.
|
|
8
8
|
|
|
9
9
|
## Install
|
|
10
10
|
|
|
@@ -13,23 +13,24 @@ gem "ruby_llm-contract"
|
|
|
13
13
|
```
|
|
14
14
|
|
|
15
15
|
```ruby
|
|
16
|
-
RubyLLM.configure
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
Works with any `ruby_llm` provider (OpenAI, Anthropic, Gemini, etc).
|
|
21
|
-
|
|
22
|
-
## Do I need this?
|
|
16
|
+
RubyLLM.configure do |c|
|
|
17
|
+
c.openai_api_key = ENV["OPENAI_API_KEY"]
|
|
18
|
+
c.default_model = "gpt-4.1-mini" # used when a Step has no explicit model
|
|
19
|
+
end
|
|
23
20
|
|
|
24
|
-
|
|
21
|
+
# Required: boots the gem so `Step.run` knows how to talk to your LLM.
|
|
22
|
+
# Empty block is fine. Pass options here if you need them (e.g. `c.logger`).
|
|
23
|
+
RubyLLM::Contract.configure { }
|
|
24
|
+
```
|
|
25
25
|
|
|
26
|
-
|
|
26
|
+
Works with any `ruby_llm` provider (OpenAI, Anthropic, Gemini, etc). Requires `ruby_llm ~> 1.12` and Ruby ≥ 3.2.
|
|
27
27
|
|
|
28
28
|
## Example
|
|
29
29
|
|
|
30
30
|
A Rails app takes article text extracted from a user-submitted URL and wants to show a summary card: a short TL;DR, 3–5 key takeaways, and a tone label. The output has to fit the UI (TL;DR under 200 chars) and the schema has to be strict enough to render without conditionals.
|
|
31
31
|
|
|
32
32
|
```ruby
|
|
33
|
+
# app/contracts/summarize_article.rb
|
|
33
34
|
class SummarizeArticle < RubyLLM::Contract::Step::Base
|
|
34
35
|
prompt <<~PROMPT
|
|
35
36
|
Summarize this article for a UI card. Return a short TL;DR,
|
|
@@ -45,48 +46,93 @@ class SummarizeArticle < RubyLLM::Contract::Step::Base
|
|
|
45
46
|
end
|
|
46
47
|
|
|
47
48
|
validate("TL;DR fits the card") { |o, _| o[:tldr].length <= 200 }
|
|
48
|
-
validate("takeaways are unique") { |o, _| o[:takeaways]
|
|
49
|
+
validate("takeaways are unique") { |o, _| o[:takeaways] == o[:takeaways].uniq }
|
|
49
50
|
|
|
50
|
-
|
|
51
|
+
# Cheapest first; last step adds a reasoning model with more thinking.
|
|
52
|
+
retry_policy do
|
|
53
|
+
escalate "gpt-4.1-nano",
|
|
54
|
+
"gpt-4.1-mini",
|
|
55
|
+
{ model: "gpt-5", reasoning_effort: "high" }
|
|
56
|
+
end
|
|
51
57
|
end
|
|
52
58
|
|
|
53
59
|
result = SummarizeArticle.run(article_text)
|
|
54
|
-
result.
|
|
55
|
-
result.
|
|
56
|
-
result.trace[:
|
|
60
|
+
result.status # => :ok (or :validation_failed if all steps fail)
|
|
61
|
+
result.parsed_output # => { tldr: "...", takeaways: [...], tone: "..." }
|
|
62
|
+
result.trace[:model] # => "gpt-4.1-mini" (winning step)
|
|
63
|
+
result.trace[:cost] # => 0.000520 (total across all attempts)
|
|
64
|
+
|
|
65
|
+
result.trace[:attempts]
|
|
66
|
+
# => [
|
|
67
|
+
# {
|
|
68
|
+
# attempt: 1,
|
|
69
|
+
# model: "gpt-4.1-nano",
|
|
70
|
+
# status: :validation_failed,
|
|
71
|
+
# usage: { input_tokens: 256, output_tokens: 84 },
|
|
72
|
+
# latency_ms: 45,
|
|
73
|
+
# cost: 0.000100
|
|
74
|
+
# },
|
|
75
|
+
# {
|
|
76
|
+
# attempt: 2,
|
|
77
|
+
# model: "gpt-4.1-mini",
|
|
78
|
+
# status: :ok,
|
|
79
|
+
# usage: { input_tokens: 256, output_tokens: 92 },
|
|
80
|
+
# latency_ms: 92,
|
|
81
|
+
# cost: 0.000420
|
|
82
|
+
# }
|
|
83
|
+
# ]
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
If the response is malformed, the TL;DR overflows the card, or the takeaway count is off, the gem moves to the next step. This is model **escalation**, not a fallback list — each step is an independent config (`model`, `reasoning_effort`), so the retry policy spends more compute only when the cheaper one couldn't satisfy the contract.
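As a rough mental model of the escalation described above, the loop below walks an ordered list of step configs and stops at the first output that satisfies every rule. Everything here is a hypothetical stand-in for illustration: `run_with_escalation`, the `rules` hash, and the `call_model` lambda are not part of ruby_llm-contract, and the real gem also tracks usage, latency, and cost per attempt.

```ruby
# Hypothetical sketch of escalation: try each config in order, validate the
# output, and move to the next config only when a rule fails.
def run_with_escalation(steps, rules, call_model)
  attempts = []

  steps.each_with_index do |config, index|
    output = call_model.call(config)
    # Collect the descriptions of every rule the output did not satisfy.
    failed = rules.reject { |_description, rule| rule.call(output) }.keys
    status = failed.empty? ? :ok : :validation_failed
    attempts << { attempt: index + 1, model: config[:model], status: status, errors: failed }

    return { status: :ok, output: output, attempts: attempts } if failed.empty?
  end

  # Every step failed validation: report failure instead of returning bad output.
  { status: :validation_failed, output: nil, attempts: attempts }
end

rules = {
  "TL;DR fits the card" => ->(o) { o[:tldr].length <= 200 },
  "takeaways are unique" => ->(o) { o[:takeaways] == o[:takeaways].uniq }
}

# Fake model call: the cheapest model overflows the card, the next one fits.
fake_call = lambda do |config|
  tldr = config[:model] == "gpt-4.1-nano" ? "x" * 250 : "short summary"
  { tldr: tldr, takeaways: %w[a b c] }
end

steps = [{ model: "gpt-4.1-nano" }, { model: "gpt-4.1-mini" }]
result = run_with_escalation(steps, rules, fake_call)
result[:status]                          # => :ok
result[:attempts].map { |a| a[:model] }  # => ["gpt-4.1-nano", "gpt-4.1-mini"]
```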
|
|
87
|
+
|
|
88
|
+
### Add a CI gate in 6 lines
|
|
89
|
+
|
|
90
|
+
The contract above already runs in production. The same `Step` doubles as the unit your regression eval runs against:
|
|
91
|
+
|
|
92
|
+
```ruby
|
|
93
|
+
SummarizeArticle.define_eval("regression") do
|
|
94
|
+
# `expected:` is a partial hash match — only listed keys check parsed_output.
|
|
95
|
+
add_case "neutral release",
|
|
96
|
+
input: "Ruby 3.4 shipped frozen string literals...",
|
|
97
|
+
expected: { tone: "analytical" }
|
|
98
|
+
add_case "outage post",
|
|
99
|
+
input: "Service was down for 4 hours...",
|
|
100
|
+
expected: { tone: "negative" }
|
|
101
|
+
end
|
|
102
|
+
|
|
103
|
+
# in CI (RSpec):
|
|
104
|
+
expect(SummarizeArticle).to pass_eval("regression").without_regressions
|
|
57
105
|
```
|
|
58
106
|
|
|
59
|
-
|
|
107
|
+
A bad prompt edit or model swap that drops accuracy on the frozen dataset → red CI, blocked merge. The first CI run records a baseline; subsequent runs compare against it. Every production miss should become the next `add_case`. See [Prevent silent prompt regressions](docs/guide/eval_first.md) for the full flywheel.
|
|
108
|
+
|
|
109
|
+
## Do I need this?
|
|
110
|
+
|
|
111
|
+
Use this if LLM output affects production behaviour, money, user trust, or downstream code. You probably don't need it if you have one low-risk prompt, manually inspect every result, or only generate best-effort prose.
|
|
60
112
|
|
|
61
|
-
|
|
113
|
+
Already using structured outputs from your provider? This gem adds business-rule validation, retry with model escalation, evals, regression gating, and test stubs on top of them — the layer that stops schema-valid-but-wrong output from reaching users. See [Why contracts?](docs/guide/why.md) for the four production failure modes the gem exists for.
|
|
62
114
|
|
|
63
115
|
## Most useful next
|
|
64
116
|
|
|
65
117
|
Everything below is optional — the example above is a complete step. Reach for these when one step isn't enough.
|
|
66
118
|
|
|
67
|
-
- **[CI regression gates](docs/guide/getting_started.md)** —
|
|
68
|
-
- **[Find the cheapest viable fallback list](docs/guide/optimizing_retry_policy.md)** —
|
|
69
|
-
- **[A/B test prompts](docs/guide/eval_first.md)** —
|
|
70
|
-
- **[Budget caps](docs/guide/getting_started.md)** —
|
|
71
|
-
- **[Reasoning effort / thinking config](docs/guide/optimizing_retry_policy.md)** —
|
|
119
|
+
- **[CI regression gates](docs/guide/getting_started.md)** — block CI when accuracy drops on a model update or prompt tweak.
|
|
120
|
+
- **[Find the cheapest viable fallback list](docs/guide/optimizing_retry_policy.md)** — empirically pick the cheapest model chain that still passes your evals.
|
|
121
|
+
- **[A/B test prompts](docs/guide/eval_first.md)** — measure whether a new prompt is safe to ship before merging.
|
|
122
|
+
- **[Budget caps](docs/guide/getting_started.md)** — refuse the request pre-flight when an estimate exceeds the limit.
|
|
123
|
+
- **[Reasoning effort / thinking config](docs/guide/optimizing_retry_policy.md)** — Anthropic / OpenAI thinking configuration on the Step class.
|
|
72
124
|
|
|
73
|
-
Also supports [multi-step pipelines](docs/guide/pipeline.md) with fail-fast and
|
|
125
|
+
Also supports [multi-step pipelines](docs/guide/pipeline.md) with fail-fast and per-step models.
|
|
74
126
|
|
|
75
127
|
## Relation to `RubyLLM::Agent`
|
|
76
128
|
|
|
77
|
-
`RubyLLM::Agent` (since RubyLLM 1.12)
|
|
129
|
+
`Step::Base` and `RubyLLM::Agent` (since RubyLLM 1.12) are **siblings** targeting the same niche: reusable, class-based prompts. Both call into `RubyLLM::Chat` directly — Step does not wrap Agent. Step adds the contract layer: `validate` (business invariants), `retry_policy escalate(...)` (model escalation on validation failure), `max_cost` pre-flight refusal, regression-eval framework, pipeline composition. **[Full feature mapping →](docs/guide/relation_to_agent.md)**
|
|
78
130
|
|
|
79
|
-
|
|
80
|
-
|---|---|
|
|
81
|
-
| `model`, `temperature`, `schema`, `instructions`, `tools`, `thinking` | covered by both — same idea, different DSL surface |
|
|
82
|
-
| `validate :rule do |out| ... end` business invariants | only here |
|
|
83
|
-
| `retry_policy escalate(...)` model escalation on validation failure | only here (different from RubyLLM's network-level retry) |
|
|
84
|
-
| `max_cost` / `max_input` / `max_output` pre-flight refusal | only here |
|
|
85
|
-
| `define_eval` + baseline regression + `compare_models` + `optimize_retry_policy` | only here (RubyLLM does not ship an eval framework) |
|
|
86
|
-
| Pipeline composition with `step SomeStep, as: :alias` | only here (RubyLLM intentionally leaves workflows as plain Ruby) |
|
|
87
|
-
| `around_call`, named `observe` hooks with pass/fail in trace | only here |
|
|
131
|
+
## Relation to `ruby_llm-tribunal`
|
|
88
132
|
|
|
89
|
-
|
|
133
|
+
Different layers, complementary. [`ruby_llm-tribunal`](https://github.com/Alqemist-labs/ruby_llm-tribunal) is a **test framework** that grades outputs **after they've reached your code**, typically in a spec. `ruby_llm-contract` is **runtime** — schema + `validate` rules gate the call **before the output reaches your code**, retry/escalate attempts to recover from failed outputs, `max_cost` refuses pre-flight. Our `define_eval` is *regression* (does this prompt/model still pass on a frozen dataset?), not *grading*.
|
|
134
|
+
|
|
135
|
+
**One-liner:** Tribunal answers *"is this output good?"* (fail → red test in CI). Contract answers *"what do we do when it isn't?"* (fail → retry/escalate, or fail closed). **[Visual flows + coexistence patterns →](docs/guide/relation_to_tribunal.md)**
|
|
90
136
|
|
|
91
137
|
## Docs
|
|
92
138
|
|
|
@@ -95,6 +141,8 @@ Also supports [multi-step pipelines](docs/guide/pipeline.md) with fail-fast and
|
|
|
95
141
|
| Guide | What it does for your app |
|
|
96
142
|
|-------|---------------------------|
|
|
97
143
|
| [Why contracts?](docs/guide/why.md) | Recognise the four production failures the gem exists for |
|
|
144
|
+
| [Relation to RubyLLM::Agent](docs/guide/relation_to_agent.md) | Sibling abstractions; what each adds; runtime call path; coexistence patterns |
|
|
145
|
+
| [Relation to ruby_llm-tribunal](docs/guide/relation_to_tribunal.md) | Different layers (test framework vs runtime contract); visual flows; integration recipes |
|
|
98
146
|
| [Getting Started](docs/guide/getting_started.md) | Walk the full feature set on one concrete step |
|
|
99
147
|
| [Rails integration](docs/guide/rails_integration.md) | Directory, initializer, jobs, logging, specs, CI gate — 7 FAQs for Rails devs |
|
|
100
148
|
| [Adopt in an existing Rails app](docs/guide/migration.md) | Replace raw `LlmClient.call` with a contract, Before/After |
|
|
@@ -103,12 +151,23 @@ Also supports [multi-step pipelines](docs/guide/pipeline.md) with fail-fast and
|
|
|
103
151
|
| [Write validate rules that catch real bugs](docs/guide/best_practices.md) | Patterns for cross-input checks and content-quality rules |
|
|
104
152
|
| [Stub LLM calls in tests](docs/guide/testing.md) | Deterministic specs, RSpec + Minitest matchers |
|
|
105
153
|
| [Chain LLM calls into a pipeline](docs/guide/pipeline.md) | Multi-step with fail-fast and per-step models |
|
|
154
|
+
| [Multimodal input (PDF / image / audio)](docs/guide/multimodal_input.md) | Route attachments through the contract; `attachment_token_estimate`, fail-closed cost, calibration table |
|
|
106
155
|
| [Schema DSL reference](docs/guide/output_schema.md) | Every constraint, nested objects, pattern table |
|
|
107
156
|
| [Prompt DSL reference](docs/guide/prompt_ast.md) | `system` / `rule` / `section` / `example` / `user` nodes |
|
|
108
157
|
|
|
109
|
-
##
|
|
158
|
+
## Status & versioning
|
|
159
|
+
|
|
160
|
+
Pre-1.0 (currently **0.10.1**). Semver tracked; breaking changes flagged in [CHANGELOG](CHANGELOG.md). Pin `~> 0.10.1` until 1.0 ships.
|
|
161
|
+
|
|
162
|
+
## FAQ
|
|
163
|
+
|
|
164
|
+
**Thread-safe / Sidekiq?** Yes. Each `Step.run` builds an isolated `RubyLLM::Chat`; class-level state (`output_schema`, `validate`, `retry_policy`) is set up once at class load and read-only afterwards. Safe to run from concurrent jobs/threads.
|
|
165
|
+
|
|
166
|
+
**How do I stub `Step.run` in specs?** Include `RubyLLM::Contract::RSpec::Helpers` and use `stub_step(MyStep, response: { ... })`. The block form scopes the stub to one `it`. See [testing guide](docs/guide/testing.md).
|
|
167
|
+
|
|
168
|
+
**Where in a Rails app?** Default `app/contracts/`. The Railtie reloads `app/contracts/eval/` and `app/steps/eval/` in development; any autoloaded directory also works. See [Rails integration](docs/guide/rails_integration.md).
|
|
110
169
|
|
|
111
|
-
|
|
170
|
+
**Upgraded to 0.9.0 and my contract started refusing — why?** 0.9.0 added multimodal input. If your contract has `max_cost` or `max_input` set AND now receives `context: { attachment: ... }`, you must declare `attachment_token_estimate(n)` (conservative input-token budget for the attachment) — otherwise the call fails closed with `:limit_exceeded`. The gem cannot bound vision/PDF cost without your estimate. Opt out per-step with `on_unknown_attachment_size :warn`. Text-only contracts are unaffected. See [multimodal input guide](docs/guide/multimodal_input.md).
|
|
112
171
|
|
|
113
172
|
## License
|
|
114
173
|
|
|
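The fail-closed rule from the changelog's "Behavioural change" entry and the README FAQ can be summarised as a small decision function. This is a sketch under stated assumptions: `attachment_limit_decision` and its keyword arguments are hypothetical names, and it covers only the unknown-size branch. When an estimate is declared, the gem counts it against the limit, which this sketch does not model.

```ruby
# Hypothetical decision function, not the gem's limit checker.
# limits:         Hash that may contain :max_cost and/or :max_input
# attachment:     the value passed as context[:attachment], or nil
# token_estimate: the adopter-declared attachment_token_estimate, or nil
# on_unknown:     :refuse (default) or :warn
def attachment_limit_decision(limits:, attachment:, token_estimate:, on_unknown: :refuse)
  bounded = limits.key?(:max_cost) || limits.key?(:max_input)

  # Text-only calls and calls without cost/input limits are unaffected.
  return :proceed unless bounded && attachment

  # A declared estimate lets the normal limit check run.
  return :proceed if token_estimate

  # No estimate: the attachment's token cost cannot be bounded.
  on_unknown == :warn ? :proceed_with_warning : :limit_exceeded
end

attachment_limit_decision(limits: { max_cost: 0.01 }, attachment: "report.pdf", token_estimate: nil)
# => :limit_exceeded
attachment_limit_decision(limits: { max_cost: 0.01 }, attachment: "report.pdf", token_estimate: 4_000)
# => :proceed
```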
data/lib/ruby_llm/contract/adapters/ruby_llm.rb
CHANGED
|
@@ -13,7 +13,15 @@ module RubyLLM
|
|
|
13
13
|
chat = build_chat(options, system_contents)
|
|
14
14
|
add_history(chat, conversation[0..-2])
|
|
15
15
|
|
|
16
|
-
|
|
16
|
+
# `with: nil` is a documented no-op in RubyLLM (verified against
|
|
17
|
+
# 1.15.0: chat.rb:36-37 `build_content(message, nil)` -> content.rb:8-14
|
|
18
|
+
# `Content.new(text, nil)` keeps text-only path when attachments
|
|
19
|
+
# are empty; raise only fires when BOTH text and attachments are nil,
|
|
20
|
+
# and we always pass a non-nil string thanks to `&.fetch(:content, "")`).
|
|
21
|
+
response = chat.ask(
|
|
22
|
+
conversation.last&.fetch(:content, ""),
|
|
23
|
+
with: options[:attachment]
|
|
24
|
+
)
|
|
17
25
|
build_response(response)
|
|
18
26
|
end
|
|
19
27
|
|
|
data/lib/ruby_llm/contract/concerns/eval_host.rb
CHANGED
|
@@ -190,16 +190,13 @@ module RubyLLM
|
|
|
190
190
|
context.merge(adapter: sample_adapter)
|
|
191
191
|
end
|
|
192
192
|
|
|
193
|
+
# `Class#subclasses` is available from Ruby 3.1; the gemspec requires
|
|
194
|
+
# `>= 3.2.0` so the legacy `ObjectSpace.each_object` fallback would be
|
|
195
|
+
# dead code on every supported runtime.
|
|
193
196
|
def register_subclasses(klass)
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
|
|
197
|
-
register_subclasses(sub)
|
|
198
|
-
end
|
|
199
|
-
else
|
|
200
|
-
ObjectSpace.each_object(Class) do |sub|
|
|
201
|
-
Contract.register_eval_host(sub) if sub < klass
|
|
202
|
-
end
|
|
197
|
+
klass.subclasses.each do |sub|
|
|
198
|
+
Contract.register_eval_host(sub)
|
|
199
|
+
register_subclasses(sub)
|
|
203
200
|
end
|
|
204
201
|
end
|
|
205
202
|
|
|
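The recursive walk in `register_subclasses` relies on `Class#subclasses` (Ruby 3.1+) returning only direct subclasses, so recursion is what reaches grandchildren. A minimal standalone sketch of that traversal follows. `collect_descendants` is a hypothetical helper that appends to an array where the gem calls `Contract.register_eval_host`.

```ruby
# Hypothetical helper mirroring the recursion: Class#subclasses returns only
# direct subclasses, so each one is visited and then walked in turn.
def collect_descendants(klass, registry = [])
  klass.subclasses.each do |sub|
    registry << sub
    collect_descendants(sub, registry)
  end
  registry
end

base = Class.new
child = Class.new(base)
grandchild = Class.new(child)

base.subclasses.include?(grandchild)            # => false (direct subclasses only)
collect_descendants(base).include?(grandchild)  # => true
```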
data/lib/ruby_llm/contract/concerns/stub_helpers.rb
ADDED
|
@@ -0,0 +1,97 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module RubyLLM
|
|
4
|
+
module Contract
|
|
5
|
+
module Concerns
|
|
6
|
+
# Shared implementation of `stub_step`, `stub_steps`, and `stub_all_steps`.
|
|
7
|
+
# Included by both `RubyLLM::Contract::RSpec::Helpers` and
|
|
8
|
+
# `RubyLLM::Contract::MinitestHelpers` so the two test-framework
|
|
9
|
+
# adapters cannot drift on stub semantics (Codex DRY finding #1: the
|
|
10
|
+
# prior parallel implementations had already diverged on
|
|
11
|
+
# `normalize_test_response` — RSpec had it, Minitest didn't).
|
|
12
|
+
#
|
|
13
|
+
# Cleanup between examples is the responsibility of the host helper:
|
|
14
|
+
# - RSpec: `around(:each)` hook in `lib/ruby_llm/contract/rspec.rb`
|
|
15
|
+
# restores `step_adapter_overrides`.
|
|
16
|
+
# - Minitest: `teardown` in `MinitestHelpers` clears overrides and
|
|
17
|
+
# restores `default_adapter`.
|
|
18
|
+
module StubHelpers
|
|
19
|
+
# Stub a single step to return a canned response without API calls.
|
|
20
|
+
# Block form scopes the stub to the block; non-block form lives
|
|
21
|
+
# until the host's teardown/around hook fires.
|
|
22
|
+
def stub_step(step_class, response: nil, responses: nil, &block)
|
|
23
|
+
adapter = build_test_adapter(response: response, responses: responses)
|
|
24
|
+
overrides = RubyLLM::Contract.step_adapter_overrides
|
|
25
|
+
|
|
26
|
+
if block
|
|
27
|
+
previous = overrides[step_class]
|
|
28
|
+
overrides[step_class] = adapter
|
|
29
|
+
begin
|
|
30
|
+
yield
|
|
31
|
+
ensure
|
|
32
|
+
previous ? (overrides[step_class] = previous) : overrides.delete(step_class)
|
|
33
|
+
end
|
|
34
|
+
else
|
|
35
|
+
overrides[step_class] = adapter
|
|
36
|
+
end
|
|
37
|
+
end
|
|
38
|
+
|
|
39
|
+
# Stub multiple steps with different responses. Requires a block.
|
|
40
|
+
def stub_steps(stubs, &block)
|
|
41
|
+
raise ArgumentError, "stub_steps requires a block" unless block
|
|
42
|
+
|
|
43
|
+
overrides = RubyLLM::Contract.step_adapter_overrides
|
|
44
|
+
previous = {}
|
|
45
|
+
|
|
46
|
+
stubs.each do |step_class, opts|
|
|
47
|
+
opts = opts.transform_keys(&:to_sym)
|
|
48
|
+
previous[step_class] = overrides[step_class]
|
|
49
|
+
overrides[step_class] = build_test_adapter(**opts.slice(:response, :responses))
|
|
50
|
+
end
|
|
51
|
+
|
|
52
|
+
begin
|
|
53
|
+
yield
|
|
54
|
+
ensure
|
|
55
|
+
stubs.each_key do |step_class|
|
|
56
|
+
previous[step_class] ? (overrides[step_class] = previous[step_class]) : overrides.delete(step_class)
|
|
57
|
+
end
|
|
58
|
+
end
|
|
59
|
+
end
|
|
60
|
+
|
|
61
|
+
# Set a global test adapter for ALL steps. Block form restores the
|
|
62
|
+
# previous adapter on exit; non-block form persists until host cleanup.
|
|
63
|
+
def stub_all_steps(response: nil, responses: nil, &block)
|
|
64
|
+
adapter = build_test_adapter(response: response, responses: responses)
|
|
65
|
+
|
|
66
|
+
if block
|
|
67
|
+
previous = RubyLLM::Contract.configuration.default_adapter
|
|
68
|
+
begin
|
|
69
|
+
RubyLLM::Contract.configuration.default_adapter = adapter
|
|
70
|
+
yield
|
|
71
|
+
ensure
|
|
72
|
+
RubyLLM::Contract.configuration.default_adapter = previous
|
|
73
|
+
end
|
|
74
|
+
else
|
|
75
|
+
RubyLLM::Contract.configure { |c| c.default_adapter = adapter }
|
|
76
|
+
end
|
|
77
|
+
end
|
|
78
|
+
|
|
79
|
+
private
|
|
80
|
+
|
|
81
|
+
def build_test_adapter(response: nil, responses: nil)
|
|
82
|
+
if responses
|
|
83
|
+
Adapters::Test.new(responses: responses.map { |r| normalize_test_response(r) })
|
|
84
|
+
else
|
|
85
|
+
Adapters::Test.new(response: normalize_test_response(response))
|
|
86
|
+
end
|
|
87
|
+
end
|
|
88
|
+
|
|
89
|
+
# Hook for host frameworks to inject custom serialization (e.g.
|
|
90
|
+
# turning hashes into JSON strings). Default: identity.
|
|
91
|
+
def normalize_test_response(value)
|
|
92
|
+
value
|
|
93
|
+
end
|
|
94
|
+
end
|
|
95
|
+
end
|
|
96
|
+
end
|
|
97
|
+
end
|
|
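The block form of `stub_step` above follows a save, override, restore pattern. The sketch below isolates it, with a plain Hash standing in for `RubyLLM::Contract.step_adapter_overrides`. `with_override` and its arguments are hypothetical names used only for this illustration.

```ruby
# Hypothetical helper: scope an override to a block and restore the previous
# value (or remove the key) afterwards, even if the block raises.
def with_override(overrides, key, adapter)
  previous = overrides[key]
  overrides[key] = adapter
  yield
ensure
  previous ? (overrides[key] = previous) : overrides.delete(key)
end

overrides = {}
with_override(overrides, :summarize, :outer_stub) do
  with_override(overrides, :summarize, :inner_stub) do
    overrides[:summarize] # => :inner_stub
  end
  overrides[:summarize]   # => :outer_stub (inner block restored it)
end
overrides.key?(:summarize) # => false (outer block removed the key)
```

Because the restore runs in `ensure`, a failing example inside the block cannot leak its stub into the next one.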
data/lib/ruby_llm/contract/contract/definition.rb
CHANGED
|
@@ -18,6 +18,8 @@ module RubyLLM
|
|
|
18
18
|
end
|
|
19
19
|
|
|
20
20
|
def invariant(description, &block)
|
|
21
|
+
raise ArgumentError, "invariant description must be a non-empty string", caller if description.to_s.empty?
|
|
22
|
+
|
|
21
23
|
@invariants << Invariant.new(description, block)
|
|
22
24
|
end
|
|
23
25
|
alias validate invariant
|
|
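The guard added to `invariant` uses `description.to_s.empty?`, which rejects both `nil` and `""` in one check. A minimal sketch of that behaviour, using a hypothetical `add_invariant` helper and a plain array where the gem uses its `Definition` class:

```ruby
# Hypothetical stand-in for Definition#invariant: nil.to_s is "", so a single
# `to_s.empty?` check covers both nil and the empty string.
def add_invariant(invariants, description, &block)
  raise ArgumentError, "invariant description must be a non-empty string" if description.to_s.empty?

  invariants << { description: description, check: block }
end

invariants = []
add_invariant(invariants, "score in range 0-100") { |o| o[:score].between?(0, 100) }
invariants.length # => 1

begin
  add_invariant(invariants, nil) { |o| o[:score].between?(0, 100) }
rescue ArgumentError => e
  e.message # => "invariant description must be a non-empty string"
end
```

Note that a whitespace-only description such as `"  "` is not empty under this check and would still be accepted.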
data/lib/ruby_llm/contract/cost_calculator.rb
CHANGED
|
@@ -89,8 +89,13 @@ module RubyLLM
|
|
|
89
89
|
(input_cost + output_cost).round(6)
|
|
90
90
|
end
|
|
91
91
|
|
|
92
|
+
# Provider pricing is denominated per 1M tokens; divide here to get
|
|
93
|
+
# the dollar cost for the actual usage count. Named constant for
|
|
94
|
+
# consistency with how RubyLLM and provider docs express prices.
|
|
95
|
+
TOKENS_PER_MILLION = 1_000_000.0
|
|
96
|
+
|
|
92
97
|
def self.token_cost(tokens, price_per_million)
|
|
93
|
-
(tokens || 0) * (price_per_million || 0) /
|
|
98
|
+
(tokens || 0) * (price_per_million || 0) / TOKENS_PER_MILLION
|
|
94
99
|
end
|
|
95
100
|
|
|
96
101
|
def self.find_model(model_name)
|
|
@@ -111,7 +116,11 @@ module RubyLLM
|
|
|
111
116
|
end
|
|
112
117
|
end
|
|
113
118
|
|
|
114
|
-
|
|
119
|
+
# `find_model` is intentionally public: `Step::Base#estimate_cost` needs
|
|
120
|
+
# to inspect model pricing before invoking `calculate` (e.g., to short-
|
|
121
|
+
# circuit estimate when the model is unknown). Exposing it removes
|
|
122
|
+
# `CostCalculator.send(:find_model)` workarounds at call sites.
|
|
123
|
+
private_class_method :compute_cost, :token_cost, :validate_price!
|
|
115
124
|
end
|
|
116
125
|
end
|
|
117
126
|
end
|
|
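As a worked example of the per-million arithmetic in `token_cost`, the sketch below recomputes a call's cost from its usage. The prices are made-up illustrative values, not real provider pricing, and `call_cost` is a hypothetical helper, not the gem's `calculate`.

```ruby
# Same formula as the diff: tokens * price_per_million / 1_000_000.0,
# with nil treated as zero.
def token_cost(tokens, price_per_million)
  (tokens || 0) * (price_per_million || 0) / 1_000_000.0
end

# Hypothetical helper: sum input and output cost, rounded to 6 decimals
# as in the surrounding code.
def call_cost(usage, pricing)
  input_cost = token_cost(usage[:input_tokens], pricing[:input_per_million])
  output_cost = token_cost(usage[:output_tokens], pricing[:output_per_million])
  (input_cost + output_cost).round(6)
end

# Illustrative prices only (USD per 1M tokens).
pricing = { input_per_million: 0.40, output_per_million: 1.60 }
usage = { input_tokens: 256, output_tokens: 92 }

# 256 * 0.40 / 1e6 = 0.0001024; 92 * 1.60 / 1e6 = 0.0001472; sum = 0.0002496
call_cost(usage, pricing) # => 0.00025
```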
data/lib/ruby_llm/contract/eval/recommender.rb
CHANGED
|
@@ -4,7 +4,9 @@ module RubyLLM
|
|
|
4
4
|
module Contract
|
|
5
5
|
module Eval
|
|
6
6
|
class Recommender
|
|
7
|
-
def initialize(comparison:, min_score:,
|
|
7
|
+
def initialize(comparison:, min_score:,
|
|
8
|
+
min_first_try_pass_rate: DEFAULT_MIN_FIRST_TRY_PASS_RATE,
|
|
9
|
+
current_config: nil)
|
|
8
10
|
@comparison = comparison
|
|
9
11
|
@min_score = min_score
|
|
10
12
|
@min_first_try_pass_rate = min_first_try_pass_rate
|
|
data/lib/ruby_llm/contract/eval/retry_optimizer.rb
CHANGED
|
@@ -98,7 +98,9 @@ module RubyLLM
|
|
|
98
98
|
end
|
|
99
99
|
end
|
|
100
100
|
|
|
101
|
-
def initialize(step:, candidates:, context: {},
|
|
101
|
+
def initialize(step:, candidates:, context: {},
|
|
102
|
+
min_score: DEFAULT_MIN_SCORE,
|
|
103
|
+
runs: 1, production_mode: nil)
|
|
102
104
|
@step = step
|
|
103
105
|
@candidates = candidates
|
|
104
106
|
@context = context
|
|
@@ -113,10 +115,19 @@ module RubyLLM
|
|
|
113
115
|
|
|
114
116
|
score_matrix = {}
|
|
115
117
|
evals.each do |eval_name|
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
118
|
+
# `retry_policy_override: nil` in context disables the step's
|
|
119
|
+
# class-level retry policy for this comparison run — see
|
|
120
|
+
# step/base.rb#runtime_settings, which honours the key when
|
|
121
|
+
# present (even when value is nil). Replaces the prior
|
|
122
|
+
# `define_singleton_method(:retry_policy)` mutation, which was
|
|
123
|
+
# not thread-safe across concurrent optimizer calls.
|
|
124
|
+
comparison = @step.compare_models(
|
|
125
|
+
eval_name,
|
|
126
|
+
candidates: @candidates,
|
|
127
|
+
context: @context.merge(retry_policy_override: nil),
|
|
128
|
+
runs: @runs,
|
|
129
|
+
production_mode: @production_mode
|
|
130
|
+
)
|
|
120
131
|
score_matrix[eval_name] = extract_scores(comparison)
|
|
121
132
|
end
|
|
122
133
|
|
|
@@ -203,14 +214,6 @@ module RubyLLM
|
|
|
203
214
|
end
|
|
204
215
|
end
|
|
205
216
|
|
|
206
|
-
def with_retry_disabled(&block)
|
|
207
|
-
original = @step.retry_policy if @step.respond_to?(:retry_policy)
|
|
208
|
-
@step.define_singleton_method(:retry_policy) { nil }
|
|
209
|
-
block.call
|
|
210
|
-
ensure
|
|
211
|
-
@step.define_singleton_method(:retry_policy) { original }
|
|
212
|
-
end
|
|
213
|
-
|
|
214
217
|
def empty_result(evals)
|
|
215
218
|
Result.new(
|
|
216
219
|
step_name: @step.name || @step.to_s,
|
|
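The optimizer change above depends on the override being honoured when the key is present, even with a `nil` value. The sketch below shows why a key-presence check is needed instead of `||`. `effective_retry_policy` is a hypothetical helper, not the gem's `runtime_settings`.

```ruby
# Hypothetical helper: a present key wins even when its value is nil, which
# is how `retry_policy_override: nil` can disable a class-level policy.
def effective_retry_policy(class_policy, context)
  context.key?(:retry_policy_override) ? context[:retry_policy_override] : class_policy
end

class_policy = { escalate: ["gpt-4.1-nano", "gpt-4.1-mini"] }

effective_retry_policy(class_policy, {})                             # => class_policy
effective_retry_policy(class_policy, { retry_policy_override: nil }) # => nil

# A naive `context[:retry_policy_override] || class_policy` would return the
# class policy in both cases, so the override could never disable retries.
```

Passing the override through `context:` keeps the change local to one call, so concurrent optimizer runs on the same step class do not share mutable state.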
data/lib/ruby_llm/contract/eval.rb
CHANGED
|
@@ -33,3 +33,16 @@ require_relative "eval/eval_history"
|
|
|
33
33
|
require_relative "eval/recommendation"
|
|
34
34
|
require_relative "eval/recommender"
|
|
35
35
|
require_relative "eval/retry_optimizer"
|
|
36
|
+
|
|
37
|
+
module RubyLLM
|
|
38
|
+
module Contract
|
|
39
|
+
module Eval
|
|
40
|
+
# Default thresholds shared across `recommend`, `optimize_retry_policy`,
|
|
41
|
+
# `Recommender`, and `RetryOptimizer`. Centralised so a single change in
|
|
42
|
+
# what "viable" means (e.g. tightening from 0.95 to 0.97) propagates
|
|
43
|
+
# everywhere instead of needing the same edit in 4 places.
|
|
44
|
+
DEFAULT_MIN_SCORE = 0.95
|
|
45
|
+
DEFAULT_MIN_FIRST_TRY_PASS_RATE = 0.8
|
|
46
|
+
end
|
|
47
|
+
end
|
|
48
|
+
end
|