ruby_llm-contract 0.8.0 → 0.10.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 33f6a8a686f7f20791904c4fbfacd19f6ea5b8bad428c374ec14b7e33521354d
4
- data.tar.gz: a2b2f7d9ff1e6cd69b39d55a3809b1babbd82a5d42afe3c567733076f03fa317
3
+ metadata.gz: 66d3692be182ac8958132d06b53c1d47e01d07769d0e7a1942f81b71eb23a759
4
+ data.tar.gz: f254fcbc6bc8c2a42a78cff9ea3b24b23f04d860c60d79084bbd8a5d6ea6ec0b
5
5
  SHA512:
6
- metadata.gz: '0895986db9cde7d26d2e91ffc7c6469b7d34df299a361148c9ee339dbf1dc61539e44adf250c00c06383717ed2e47ff250d4490d1cce43c3cdf8c3169529fba5'
7
- data.tar.gz: ed35a4b4cc9ab1afd46c427468dcb33844d8d54207531f98a9f1d775004efc5a1e19d64fb22c64b20da81750165a3a979f4de4ecaa46aff4121eb7ba80a27ed2
6
+ metadata.gz: f035aca56a5c0e2ca0607a06431cbb76a6ff4cc45f2f35fe9309d534b881910fa05d63c5c581a6e9cbac5374377f6fc187e521a844a653da5b53d612e9a23a89
7
+ data.tar.gz: 31bb34333a28224e7c00ce2dea7b3905dce878c320fc0fb8a0e5db1313259a0e12e136ab6cc74126b25f0a47984830ae2172e3f824ccfdcb28f847929f40080a
data/CHANGELOG.md CHANGED
@@ -1,5 +1,69 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.10.1 (2026-06-01)
4
+
5
+ Patch release fixing gem packaging. 0.10.0 was yanked from rubygems.org due to the issue documented below; 0.10.1 is the recommended upgrade target. No code behavior change vs 0.10.0.
6
+
7
+ ### Fixed
8
+
9
+ - **Gem no longer ships internal tracker / dev configs.** Excluded from `spec.files`: `TODO.md`, `.rspec`, `.rubycritic.yml`, `.simplecov`, and the `.revive/` directory. Pre-0.10.1 the published gem contained these files; adopters who already extracted 0.10.0 can safely delete them.
10
+
11
+ ## 0.10.0 (2026-06-01)
12
+
13
+ First published release since 0.8.0. Consolidates work originally tagged as 0.9.0 (multimodal input) and 0.9.1 (internal quality refactor), neither of which was pushed to rubygems. Adopters upgrading from 0.8.0 should read the **Behavioural change** and **Breaking changes** sections below before installing.
14
+
15
+ ### Breaking changes
16
+
17
+ - **`validate(description, &block)` and `Definition#invariant(description, &block)` now raise `ArgumentError` when `description` is `nil` or empty.** Pre-0.10.0 the empty descriptor was silently accepted and produced `""` entries in `result.validation_errors`, making debugging impossible. Codex audit found zero production use sites across `lib/`, `examples/`, `README` — only the regression-marker test certifying the bug.
18
+
19
+ ### Migration
20
+
21
+ Ensure every `validate` / `invariant` call has a non-empty descriptor (this is already how every README example writes them):
22
+
23
+ ```ruby
24
+ # Before (silently accepted, produced "" in validation_errors):
25
+ validate("") { |o| o[:score].between?(0, 100) }
26
+
27
+ # After (required):
28
+ validate("score in range 0-100") { |o| o[:score].between?(0, 100) }
29
+ ```
30
+
31
+ ### Added
32
+
33
+ - **Multimodal input via `context: { attachment: ... }`** — pass a file/IO/URL through `Step.run(input, context: { attachment: path })`; the adapter forwards it to `RubyLLM::Chat#ask(content, with: attachment)`. RubyLLM normalises wire format per provider (Anthropic url/base64, OpenAI `image_url`/`file`, Gemini `inline_data`). Multi-attachment supported natively (`with: [pdf1, pdf2]` or `with: { images: [...], pdfs: [...] }`). See [multimodal input guide](docs/guide/multimodal_input.md) and [ADR-0022](doc/decisions/ADR-0022-v09-multimodal-input.md).
34
+ - **`attachment_token_estimate(n)` class macro** — adopter-declared conservative estimate of attachment input tokens. Applied to BOTH runtime (`limit_checker`) and pre-flight (`estimate_cost`) — same source of truth, no estimate/runtime drift.
35
+ - **`on_unknown_attachment_size(:refuse | :warn)` class macro** — mirrors `on_unknown_pricing` opt-out semantics. Defaults to `:refuse`. Never settable as global default — same invariant as `max_cost` fail-closed.
36
+
37
+ ### Behavioural change (READ BEFORE UPGRADING)
38
+
39
+ - **Contracts with `max_cost` or `max_input` AND `context[:attachment]` set AND no `attachment_token_estimate` declared → REFUSE with `:limit_exceeded`.** This is fail-closed semantics: the gem cannot bound vision/PDF token cost without an adopter-declared estimate. Opt out per-step with `on_unknown_attachment_size :warn`. Text-only contracts and contracts without `max_cost`/`max_input` are unaffected.
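Reviewer note: the refusal rule in the entry above can be read as a small decision function. The sketch below is a minimal illustration under stated assumptions: `attachment_decision` and its keyword arguments are hypothetical names chosen here, and the gem's actual `limit_checker` is not shown in this diff.

```ruby
# Hypothetical sketch of the fail-closed rule described in the changelog.
# None of these names come from the gem; they only mirror its wording.
def attachment_decision(limits_set:, attachment:, estimate:, on_unknown: :refuse)
  # Text-only contracts and contracts without max_cost/max_input are unaffected.
  return :proceed unless limits_set && attachment
  # An adopter-declared attachment_token_estimate makes the cost boundable.
  return :proceed if estimate

  # No estimate: refuse by default, or continue when the step opted into :warn.
  on_unknown == :warn ? :proceed_with_warning : :limit_exceeded
end

attachment_decision(limits_set: true, attachment: "report.pdf", estimate: nil)
# => :limit_exceeded
attachment_decision(limits_set: true, attachment: "report.pdf", estimate: 4_000)
# => :proceed
```

The default of `:refuse` in the sketch follows the changelog's statement that `on_unknown_attachment_size` defaults to `:refuse`.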
40
+
41
+ ### Changed
42
+
43
+ - **`run_eval` (no args) return shape pinned to `Hash<String, Report>` keyed by eval name.** Documents the existing contract used by `RubyLLM::Contract::RakeTask#collect_host_reports` and adopters. No runtime change vs 0.8.0 — only the spec assertion now locks the shape.
44
+ - **`Parser.parse(text, strategy: :json)` first-bracket-wins boundary documented.** Extraction commits to the first balanced `{` or `[` structure and does NOT retry on later candidates. Empty `{}` followed by real JSON parses as the empty Hash; non-JSON `{braces}` before real JSON raises `ParseError`. No runtime change — this codifies long-standing behavior with explicit boundary tests.
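Reviewer note: the first-bracket-wins boundary described above can be sketched in plain Ruby. This is an illustration only: `extract_first_json` is a hypothetical helper, not the gem's `Parser`, and it surfaces `JSON::ParserError` where the gem is documented to raise `ParseError`.

```ruby
require "json"

# Hypothetical helper: commit to the first balanced {...} or [...] span
# and never retry on later candidates.
def extract_first_json(text)
  start = text.index(/[{\[]/)
  raise ArgumentError, "no JSON structure found" unless start

  depth = 0
  in_string = false
  escaped = false
  text[start..].each_char.with_index do |ch, i|
    if in_string
      # Inside a JSON string, brackets do not count towards nesting depth.
      if escaped
        escaped = false
      elsif ch == "\\"
        escaped = true
      elsif ch == '"'
        in_string = false
      end
      next
    end

    case ch
    when '"' then in_string = true
    when "{", "[" then depth += 1
    when "}", "]"
      depth -= 1
      # First balanced candidate wins; a parse failure propagates to the caller.
      return JSON.parse(text[start, i + 1]) if depth.zero?
    end
  end
  raise ArgumentError, "unbalanced JSON structure"
end

extract_first_json('{} then {"a": 1}') # => {}
```

With this sketch, `extract_first_json('note {braces} then {"a": 1}')` raises instead of falling through to the later, valid object, which is the boundary the changelog entry documents.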
45
+
46
+ ### Fixed
47
+
48
+ - **`with_retry_disabled` no longer mutates the step class's singleton method.** The optimizer now passes `retry_policy_override: nil` through `context:` to `compare_models`, which `Step::Base#runtime_settings` already honours. Removes a concurrency hazard where two parallel `optimize_retry_policy` calls on the same step would race on the singleton restore in `ensure`.
49
+ - **`CostCalculator.find_model` exposed as a public class method.** Removes two `CostCalculator.send(:find_model, ...)` workarounds in `Step::Base#estimate_cost`. The `estimated_cost_for` helper is gone — `estimate_cost` now routes through the existing public `CostCalculator.calculate(model_name:, usage:)`.
50
+ - **`stub_step` unified on a single storage path.** Both block and non-block forms now write to `RubyLLM::Contract.step_adapter_overrides` (thread-local). The `around(:each)` hook in `rspec.rb` handles cleanup between examples. Removes the prior `allow(step).to receive(:run)` branch.
51
+
52
+ ### Internal
53
+
54
+ - **Anti-facade audit complete: 89/89 spec files under per-test 17-mode walk** (Phase A: 26 specs, Phase C: 63 specs via parallel Codex fan-out). Net +30 strengthened tests against mutation-blind assertions, zero public API change beyond the breaking entry above.
55
+ - **Dead `ObjectSpace.each_object(Class)` fallback removed** in `concerns/eval_host.rb#register_subclasses`. The gemspec requires Ruby `>= 3.2.0`, so `Class#subclasses` (Ruby ≥ 3.1) is always available; the legacy fallback was unreachable code that would have iterated all loaded classes O(n) and was not thread-safe.
56
+
57
+ ### Deferred (not in 0.10.x)
58
+
59
+ - `add_history` multi-turn replay of prior attachments — single-turn multimodal supported; follow-up questions on the same document deferred to a later release.
60
+ - Streaming + attachment — contract steps remain synchronous.
61
+ - Provider-specific attachment size caps — surface only via `attachment_token_estimate` calibration; consult provider docs.
62
+
63
+ ### Tests
64
+
65
+ - Suite: 1401 examples / 0 failures / 7 pending (was 1346/0/8 at 0.8.0).
66
+
3
67
  ## 0.8.0 (2026-04-26)
4
68
 
5
69
  Narrative repositioning + small API additions. Internal architecture unchanged: no `Step::Base` refactor, no breaking changes to existing DSL.
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- ruby_llm-contract (0.8.0)
4
+ ruby_llm-contract (0.10.1)
5
5
  dry-types (~> 1.7)
6
6
  ruby_llm (~> 1.12)
7
7
  ruby_llm-schema (~> 0.3)
@@ -258,7 +258,7 @@ CHECKSUMS
258
258
  rubocop-ast (1.49.1) sha256=4412f3ee70f6fe4546cc489548e0f6fcf76cafcfa80fa03af67098ffed755035
259
259
  ruby-progressbar (1.13.0) sha256=80fc9c47a9b640d6834e0dc7b3c94c9df37f08cb072b7761e4a71e22cff29b33
260
260
  ruby_llm (1.14.0) sha256=57c6f7034fc4a44504ea137d70f853b07824f1c1cdbe774ab3ab3522e7098deb
261
- ruby_llm-contract (0.8.0)
261
+ ruby_llm-contract (0.10.1)
262
262
  ruby_llm-schema (0.3.0) sha256=a591edc5ca1b7f0304f0e2261de61ba4b3bea17be09f5cf7558153adfda3dec6
263
263
  ruby_parser (3.22.0) sha256=1eb4937cd9eb220aa2d194e352a24dba90aef00751e24c8dfffdb14000f15d23
264
264
  rubycritic (4.12.0) sha256=024fed90fe656fa939f6ea80aab17569699ac3863d0b52fd72cb99892247abc8
data/README.md CHANGED
@@ -2,9 +2,9 @@
2
2
 
3
3
  **Contracts + Evals for [ruby_llm](https://github.com/crmne/ruby_llm).**
4
4
 
5
- Your eval passed. Prod broke anyway? This gem wraps `RubyLLM::Chat` with input/output contracts, business-rule validation, retry with model escalation on validation failure, pre-flight cost ceilings, and an evaluation framework — so a flaky cheap-model call escalates to a stronger model instead of shipping garbage to your user.
5
+ Your eval passed. Prod broke anyway? This gem wraps `RubyLLM::Chat` with input/output contracts, business-rule validation, retry with model escalation on validation failure, pre-flight cost ceilings, and a regression-eval framework — so a flaky cheap-model call escalates to a stronger model instead of shipping garbage to your user.
6
6
 
7
- `ruby_llm` handles the HTTP side (rate limits, timeouts, streaming, tool calls, embeddings). This gem handles what the model *returned*: schema validation, business rules, model escalation on failed validation, datasets, regression tests.
7
+ `ruby_llm` handles the HTTP side (rate limits, timeouts, streaming, tool calls, embeddings). This gem handles what the model *returned* at **runtime**: schema validation, business rules, model escalation on failed validation, regression datasets that gate prompt/model changes in CI.
8
8
 
9
9
  ## Install
10
10
 
@@ -13,23 +13,24 @@ gem "ruby_llm-contract"
13
13
  ```
14
14
 
15
15
  ```ruby
16
- RubyLLM.configure { |c| c.openai_api_key = ENV["OPENAI_API_KEY"] }
17
- RubyLLM::Contract.configure { |c| c.default_model = "gpt-4.1-mini" }
18
- ```
19
-
20
- Works with any `ruby_llm` provider (OpenAI, Anthropic, Gemini, etc).
21
-
22
- ## Do I need this?
16
+ RubyLLM.configure do |c|
17
+ c.openai_api_key = ENV["OPENAI_API_KEY"]
18
+ c.default_model = "gpt-4.1-mini" # used when a Step has no explicit model
19
+ end
23
20
 
24
- Use this if LLM output affects production behaviour, money, user trust, or downstream code. You probably don't need it if you have one low-risk prompt, manually inspect every result, or only generate best-effort prose.
21
+ # Required: boots the gem so `Step.run` knows how to talk to your LLM.
22
+ # Empty block is fine. Pass options here if you need them (e.g. `c.logger`).
23
+ RubyLLM::Contract.configure { }
24
+ ```
25
25
 
26
- Already using structured outputs from your provider? This gem adds business-rule validation, retry with model fallback, evals, regression gating, and test stubs on top of them — the layer that stops schema-valid-but-wrong output from reaching users. See [Why contracts?](docs/guide/why.md) for the four production failure modes the gem exists for, or run `ruby examples/01_fallback_showcase.rb` to see the fallback loop in 30 seconds (no API key needed).
26
+ Works with any `ruby_llm` provider (OpenAI, Anthropic, Gemini, etc.). Requires `ruby_llm ~> 1.12` and Ruby 3.2 or newer.
27
27
 
28
28
  ## Example
29
29
 
30
30
  A Rails app takes article text extracted from a user-submitted URL and wants to show a summary card: a short TL;DR, 3–5 key takeaways, and a tone label. The output has to fit the UI (TL;DR under 200 chars) and the schema has to be strict enough to render without conditionals.
31
31
 
32
32
  ```ruby
33
+ # app/contracts/summarize_article.rb
33
34
  class SummarizeArticle < RubyLLM::Contract::Step::Base
34
35
  prompt <<~PROMPT
35
36
  Summarize this article for a UI card. Return a short TL;DR,
@@ -45,48 +46,93 @@ class SummarizeArticle < RubyLLM::Contract::Step::Base
45
46
  end
46
47
 
47
48
  validate("TL;DR fits the card") { |o, _| o[:tldr].length <= 200 }
48
- validate("takeaways are unique") { |o, _| o[:takeaways].uniq.size == o[:takeaways].size }
49
+ validate("takeaways are unique") { |o, _| o[:takeaways] == o[:takeaways].uniq }
49
50
 
50
- retry_policy models: %w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]
51
+ # Cheapest first; last step adds a reasoning model with more thinking.
52
+ retry_policy do
53
+ escalate "gpt-4.1-nano",
54
+ "gpt-4.1-mini",
55
+ { model: "gpt-5", reasoning_effort: "high" }
56
+ end
51
57
  end
52
58
 
53
59
  result = SummarizeArticle.run(article_text)
54
- result.parsed_output # => { tldr: "...", takeaways: [...], tone: "analytical" }
55
- result.trace[:model] # => "gpt-4.1-nano" (first model that passed)
56
- result.trace[:cost] # => 0.000032
60
+ result.status # => :ok (or :validation_failed if all steps fail)
61
+ result.parsed_output # => { tldr: "...", takeaways: [...], tone: "..." }
62
+ result.trace[:model] # => "gpt-4.1-mini" (winning step)
63
+ result.trace[:cost] # => 0.000520 (total across all attempts)
64
+
65
+ result.trace[:attempts]
66
+ # => [
67
+ # {
68
+ # attempt: 1,
69
+ # model: "gpt-4.1-nano",
70
+ # status: :validation_failed,
71
+ # usage: { input_tokens: 256, output_tokens: 84 },
72
+ # latency_ms: 45,
73
+ # cost: 0.000100
74
+ # },
75
+ # {
76
+ # attempt: 2,
77
+ # model: "gpt-4.1-mini",
78
+ # status: :ok,
79
+ # usage: { input_tokens: 256, output_tokens: 92 },
80
+ # latency_ms: 92,
81
+ # cost: 0.000420
82
+ # }
83
+ # ]
84
+ ```
85
+
86
+ If the response is malformed, the TL;DR overflows the card, or the takeaway count is off, the gem moves to the next step. This is model **escalation**, not a fallback list — each step is an independent config (`model`, `reasoning_effort`), so the retry policy spends more compute only when the cheaper one couldn't satisfy the contract.
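Reviewer note: the escalation behaviour described in this paragraph can be sketched as a loop over step configs. Everything here is a stand-in under stated assumptions: `run_with_escalation`, the `call_model` callable, and the result hash shape are hypothetical and only loosely mirror the `result.trace[:attempts]` output shown in the README.

```ruby
# Hypothetical escalation loop: try each config in order and stop at the
# first output that satisfies every validator.
def run_with_escalation(steps, validators, call_model)
  attempts = []
  steps.each_with_index do |config, index|
    output = call_model.call(config)
    # Keep only the descriptions of rules that the output fails.
    failed = validators.reject { |_description, rule| rule.call(output) }.keys
    status = failed.empty? ? :ok : :validation_failed
    attempts << { attempt: index + 1, model: config[:model], status: status }
    return { status: :ok, output: output, attempts: attempts } if failed.empty?
  end
  { status: :validation_failed, output: nil, attempts: attempts }
end

steps = [{ model: "gpt-4.1-nano" }, { model: "gpt-4.1-mini" }]
validators = { "TL;DR fits the card" => ->(o) { o[:tldr].length <= 200 } }
# Fake model: the cheapest config overflows the card, the next one does not.
fake_model = ->(config) { { tldr: config[:model] == "gpt-4.1-nano" ? "x" * 300 : "short" } }

result = run_with_escalation(steps, validators, fake_model)
result[:attempts].map { |a| a[:status] } # => [:validation_failed, :ok]
```

The more expensive config is only called after the cheaper one fails validation, which is the cost argument the README makes for escalation.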
87
+
88
+ ### Add a CI gate in 6 lines
89
+
90
+ The contract above already runs in production. The same `Step` doubles as the unit your regression eval runs against:
91
+
92
+ ```ruby
93
+ SummarizeArticle.define_eval("regression") do
94
+ # `expected:` is a partial hash match — only listed keys check parsed_output.
95
+ add_case "neutral release",
96
+ input: "Ruby 3.4 shipped frozen string literals...",
97
+ expected: { tone: "analytical" }
98
+ add_case "outage post",
99
+ input: "Service was down for 4 hours...",
100
+ expected: { tone: "negative" }
101
+ end
102
+
103
+ # in CI (RSpec):
104
+ expect(SummarizeArticle).to pass_eval("regression").without_regressions
57
105
  ```
58
106
 
59
- The model returns JSON matching the schema. If the response is malformed, the TL;DR overflows the card, or the takeaway count is off, the gem retries, moving to the next model in `models:` only when the cheaper one can't satisfy the rules. In this setup cheaper models are tried first and the expensive ones are used only when cheaper models fail.
107
+ A bad prompt edit or model swap that drops accuracy on the frozen dataset turns CI red and blocks the merge. The first CI run records a baseline; subsequent runs compare against it. Every production miss should become the next `add_case`. See [Prevent silent prompt regressions](docs/guide/eval_first.md) for the full flywheel.
108
+
109
+ ## Do I need this?
110
+
111
+ Use this if LLM output affects production behaviour, money, user trust, or downstream code. You probably don't need it if you have one low-risk prompt, manually inspect every result, or only generate best-effort prose.
60
112
 
61
- You could write this loop yourself once. The gem gives you the loop, a trace of every attempt (model, status, cost, latency), fallback policy, evals, baselines, and CI checks as one contract object, tracked per-step, so adding a new LLM feature to your app is one class, not one-off scaffolding.
113
+ Already using structured outputs from your provider? This gem adds business-rule validation, retry with model escalation, evals, regression gating, and test stubs on top of them: the layer that stops schema-valid-but-wrong output from reaching users. See [Why contracts?](docs/guide/why.md) for the four production failure modes the gem exists for.
62
114
 
63
115
  ## Most useful next
64
116
 
65
117
  Everything below is optional — the example above is a complete step. Reach for these when one step isn't enough.
66
118
 
67
- - **[CI regression gates](docs/guide/getting_started.md)** — `define_eval` + `save_baseline!` + `pass_eval(...).without_regressions` blocks CI when accuracy drops on a model update or prompt tweak.
68
- - **[Find the cheapest viable fallback list](docs/guide/optimizing_retry_policy.md)** — `Step.recommend("regression", candidates: [...], min_score: 0.95)` returns the cheapest list of models that still passes your evals. `production_mode:` measures retry-aware cost.
69
- - **[A/B test prompts](docs/guide/eval_first.md)** — `SummarizeArticleV2.compare_with(SummarizeArticleV1, eval: "regression")` reports whether the new prompt is safe to ship.
70
- - **[Budget caps](docs/guide/getting_started.md)** — `max_cost`, `max_input`, `max_output` refuse the request before calling the API when a heuristic estimate (~±30% accuracy) exceeds the limit.
71
- - **[Reasoning effort / thinking config](docs/guide/optimizing_retry_policy.md)** — `thinking effort: :low` (or alias `reasoning_effort :low`) on the Step class; mirrors `RubyLLM::Agent.thinking` and forwards through `Chat#with_thinking`.
119
+ - **[CI regression gates](docs/guide/getting_started.md)** — block CI when accuracy drops on a model update or prompt tweak.
120
+ - **[Find the cheapest viable fallback list](docs/guide/optimizing_retry_policy.md)** — empirically pick the cheapest model chain that still passes your evals.
121
+ - **[A/B test prompts](docs/guide/eval_first.md)** — measure whether a new prompt is safe to ship before merging.
122
+ - **[Budget caps](docs/guide/getting_started.md)** — refuse the request pre-flight when an estimate exceeds the limit.
123
+ - **[Reasoning effort / thinking config](docs/guide/optimizing_retry_policy.md)** — Anthropic / OpenAI thinking configuration on the Step class.
72
124
 
73
- Also supports [multi-step pipelines](docs/guide/pipeline.md) with fail-fast and `retry_policy attempts: N` for niche cases (we measured this empirically — for `gpt-4.1-nano` / `gpt-5-nano` on tasks with clear correctness criteria, same-model retry rarely helps; `escalate(model_2)` is the strategy that moves the needle, see [optimizing_retry_policy.md](docs/guide/optimizing_retry_policy.md)).
125
+ Also supports [multi-step pipelines](docs/guide/pipeline.md) with fail-fast and per-step models.
74
126
 
75
127
  ## Relation to `RubyLLM::Agent`
76
128
 
77
- `RubyLLM::Agent` (since RubyLLM 1.12) and `Step::Base` here target the **same niche**: reusable, class-based prompts. They are siblings, not foundation-and-roof.
129
+ `Step::Base` and `RubyLLM::Agent` (since RubyLLM 1.12) are **siblings** targeting the same niche: reusable, class-based prompts. Both call into `RubyLLM::Chat` directly — Step does not wrap Agent. Step adds the contract layer: `validate` (business invariants), `retry_policy escalate(...)` (model escalation on validation failure), `max_cost` pre-flight refusal, regression-eval framework, pipeline composition. **[Full feature mapping →](docs/guide/relation_to_agent.md)**
78
130
 
79
- | What you write | Where it lives |
80
- |---|---|
81
- | `model`, `temperature`, `schema`, `instructions`, `tools`, `thinking` | covered by both — same idea, different DSL surface |
82
- | `validate :rule do |out| ... end` business invariants | only here |
83
- | `retry_policy escalate(...)` model escalation on validation failure | only here (different from RubyLLM's network-level retry) |
84
- | `max_cost` / `max_input` / `max_output` pre-flight refusal | only here |
85
- | `define_eval` + baseline regression + `compare_models` + `optimize_retry_policy` | only here (RubyLLM does not ship an eval framework) |
86
- | Pipeline composition with `step SomeStep, as: :alias` | only here (RubyLLM intentionally leaves workflows as plain Ruby) |
87
- | `around_call`, named `observe` hooks with pass/fail in trace | only here |
131
+ ## Relation to `ruby_llm-tribunal`
88
132
 
89
- `Step::Base` does NOT use `Agent` internally today; both call into `RubyLLM::Chat` directly. The two abstractions can coexist on the same project: use `Agent` for prompt-only reuse, use `Step` when you need any of the contract-layer features above. The retry-strategy framing here (favouring `escalate(...)` over same-model `attempts: N`) is grounded in an empirical comparison; `attempts: N` stays in the API for niche cases.
133
+ Different layers, complementary. [`ruby_llm-tribunal`](https://github.com/Alqemist-labs/ruby_llm-tribunal) is a **test framework** that grades outputs **after they've reached your code**, typically in a spec. `ruby_llm-contract` is **runtime**: schema + `validate` rules gate the call **before the output reaches your code**, retry/escalate attempts to recover from failed outputs, and `max_cost` refuses pre-flight. Our `define_eval` is *regression* (does this prompt/model still pass on a frozen dataset?), not *grading*.
134
+
135
+ **One-liner:** Tribunal answers *"is this output good?"* (fail → red test in CI). Contract answers *"what do we do when it isn't?"* (fail → retry/escalate, or fail closed). **[Visual flows + coexistence patterns →](docs/guide/relation_to_tribunal.md)**
90
136
 
91
137
  ## Docs
92
138
 
@@ -95,6 +141,8 @@ Also supports [multi-step pipelines](docs/guide/pipeline.md) with fail-fast and
95
141
  | Guide | What it does for your app |
96
142
  |-------|---------------------------|
97
143
  | [Why contracts?](docs/guide/why.md) | Recognise the four production failures the gem exists for |
144
+ | [Relation to RubyLLM::Agent](docs/guide/relation_to_agent.md) | Sibling abstractions; what each adds; runtime call path; coexistence patterns |
145
+ | [Relation to ruby_llm-tribunal](docs/guide/relation_to_tribunal.md) | Different layers (test framework vs runtime contract); visual flows; integration recipes |
98
146
  | [Getting Started](docs/guide/getting_started.md) | Walk the full feature set on one concrete step |
99
147
  | [Rails integration](docs/guide/rails_integration.md) | Directory, initializer, jobs, logging, specs, CI gate — 7 FAQs for Rails devs |
100
148
  | [Adopt in an existing Rails app](docs/guide/migration.md) | Replace raw `LlmClient.call` with a contract, Before/After |
@@ -103,12 +151,23 @@ Also supports [multi-step pipelines](docs/guide/pipeline.md) with fail-fast and
103
151
  | [Write validate rules that catch real bugs](docs/guide/best_practices.md) | Patterns for cross-input checks and content-quality rules |
104
152
  | [Stub LLM calls in tests](docs/guide/testing.md) | Deterministic specs, RSpec + Minitest matchers |
105
153
  | [Chain LLM calls into a pipeline](docs/guide/pipeline.md) | Multi-step with fail-fast and per-step models |
154
+ | [Multimodal input (PDF / image / audio)](docs/guide/multimodal_input.md) | Route attachments through the contract; `attachment_token_estimate`, fail-closed cost, calibration table |
106
155
  | [Schema DSL reference](docs/guide/output_schema.md) | Every constraint, nested objects, pattern table |
107
156
  | [Prompt DSL reference](docs/guide/prompt_ast.md) | `system` / `rule` / `section` / `example` / `user` nodes |
108
157
 
109
- ## Roadmap
158
+ ## Status & versioning
159
+
160
+ Pre-1.0 (currently **0.10.1**). Semver tracked; breaking changes flagged in [CHANGELOG](CHANGELOG.md). Pin `~> 0.10.1` until 1.0 ships.
161
+
162
+ ## FAQ
163
+
164
+ **Thread-safe / Sidekiq?** Yes. Each `Step.run` builds an isolated `RubyLLM::Chat`; class-level state (`output_schema`, `validate`, `retry_policy`) is set up once at class load and read-only afterwards. Safe to run from concurrent jobs/threads.
165
+
166
+ **How do I stub `Step.run` in specs?** Include `RubyLLM::Contract::RSpec::Helpers` and use `stub_step(MyStep, response: { ... })`. The block form scopes the stub to one `it`. See [testing guide](docs/guide/testing.md).
167
+
168
+ **Where in a Rails app?** Default `app/contracts/`. The Railtie reloads `app/contracts/eval/` and `app/steps/eval/` in development; any autoloaded directory also works. See [Rails integration](docs/guide/rails_integration.md).
110
169
 
111
- Latest: **v0.8.0**: tagline + narrative repositioning around "Contracts + Evals for RubyLLM", `thinking` / `reasoning_effort` class macro, TokenEstimator labelled as heuristic, CostCalculator repositioned. See [CHANGELOG](CHANGELOG.md) for history.
170
+ **Upgraded to 0.9.0 and my contract started refusing. Why?** 0.9.0 added multimodal input. If your contract has `max_cost` or `max_input` set AND now receives `context: { attachment: ... }`, you must declare `attachment_token_estimate(n)` (conservative input-token budget for the attachment); otherwise the call fails closed with `:limit_exceeded`. The gem cannot bound vision/PDF cost without your estimate. Opt out per-step with `on_unknown_attachment_size :warn`. Text-only contracts are unaffected. See [multimodal input guide](docs/guide/multimodal_input.md).
112
171
 
113
172
  ## License
114
173
 
@@ -13,7 +13,15 @@ module RubyLLM
13
13
  chat = build_chat(options, system_contents)
14
14
  add_history(chat, conversation[0..-2])
15
15
 
16
- response = chat.ask(conversation.last&.fetch(:content, ""))
16
+ # `with: nil` is a documented no-op in RubyLLM (verified against
17
+ # 1.15.0: chat.rb:36-37 `build_content(message, nil)` -> content.rb:8-14
18
+ # `Content.new(text, nil)` keeps text-only path when attachments
19
+ # are empty; raise only fires when BOTH text and attachments are nil,
20
+ # and we always pass a non-nil string thanks to `&.fetch(:content, "")`).
21
+ response = chat.ask(
22
+ conversation.last&.fetch(:content, ""),
23
+ with: options[:attachment]
24
+ )
17
25
  build_response(response)
18
26
  end
19
27
 
@@ -190,16 +190,13 @@ module RubyLLM
190
190
  context.merge(adapter: sample_adapter)
191
191
  end
192
192
 
193
+ # `Class#subclasses` is available from Ruby 3.1; the gemspec requires
194
+ # `>= 3.2.0` so the legacy `ObjectSpace.each_object` fallback would be
195
+ # dead code on every supported runtime.
193
196
  def register_subclasses(klass)
194
- if klass.respond_to?(:subclasses)
195
- klass.subclasses.each do |sub|
196
- Contract.register_eval_host(sub)
197
- register_subclasses(sub)
198
- end
199
- else
200
- ObjectSpace.each_object(Class) do |sub|
201
- Contract.register_eval_host(sub) if sub < klass
202
- end
197
+ klass.subclasses.each do |sub|
198
+ Contract.register_eval_host(sub)
199
+ register_subclasses(sub)
203
200
  end
204
201
  end
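Reviewer note: the simplified `register_subclasses` relies on `Class#subclasses` returning direct subclasses only, which is why it still recurses. A minimal sketch of the same walk, assuming Ruby 3.1 or newer; `collect_descendants` and the array registry are hypothetical stand-ins for `Contract.register_eval_host`.

```ruby
# Hypothetical stand-in for the recursive registration walk.
def collect_descendants(klass, registry = [])
  klass.subclasses.each do |sub|
    registry << sub
    # Class#subclasses lists direct children only, so recurse for deeper levels.
    collect_descendants(sub, registry)
  end
  registry
end

base = Class.new
child = Class.new(base)
grandchild = Class.new(child)

collect_descendants(base) # => [child, grandchild]
```

Unlike the removed `ObjectSpace.each_object(Class)` fallback, this only touches classes that actually descend from `klass`.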
205
202
 
@@ -0,0 +1,97 @@
1
+ # frozen_string_literal: true
2
+
3
+ module RubyLLM
4
+ module Contract
5
+ module Concerns
6
+ # Shared implementation of `stub_step`, `stub_steps`, and `stub_all_steps`.
7
+ # Included by both `RubyLLM::Contract::RSpec::Helpers` and
8
+ # `RubyLLM::Contract::MinitestHelpers` so the two test-framework
9
+ # adapters cannot drift on stub semantics (Codex DRY finding #1: the
10
+ # prior parallel implementations had already diverged on
11
+ # `normalize_test_response` — RSpec had it, Minitest didn't).
12
+ #
13
+ # Cleanup between examples is the responsibility of the host helper:
14
+ # - RSpec: `around(:each)` hook in `lib/ruby_llm/contract/rspec.rb`
15
+ # restores `step_adapter_overrides`.
16
+ # - Minitest: `teardown` in `MinitestHelpers` clears overrides and
17
+ # restores `default_adapter`.
18
+ module StubHelpers
19
+ # Stub a single step to return a canned response without API calls.
20
+ # Block form scopes the stub to the block; non-block form lives
21
+ # until the host's teardown/around hook fires.
22
+ def stub_step(step_class, response: nil, responses: nil, &block)
23
+ adapter = build_test_adapter(response: response, responses: responses)
24
+ overrides = RubyLLM::Contract.step_adapter_overrides
25
+
26
+ if block
27
+ previous = overrides[step_class]
28
+ overrides[step_class] = adapter
29
+ begin
30
+ yield
31
+ ensure
32
+ previous ? (overrides[step_class] = previous) : overrides.delete(step_class)
33
+ end
34
+ else
35
+ overrides[step_class] = adapter
36
+ end
37
+ end
38
+
39
+ # Stub multiple steps with different responses. Requires a block.
40
+ def stub_steps(stubs, &block)
41
+ raise ArgumentError, "stub_steps requires a block" unless block
42
+
43
+ overrides = RubyLLM::Contract.step_adapter_overrides
44
+ previous = {}
45
+
46
+ stubs.each do |step_class, opts|
47
+ opts = opts.transform_keys(&:to_sym)
48
+ previous[step_class] = overrides[step_class]
49
+ overrides[step_class] = build_test_adapter(**opts.slice(:response, :responses))
50
+ end
51
+
52
+ begin
53
+ yield
54
+ ensure
55
+ stubs.each_key do |step_class|
56
+ previous[step_class] ? (overrides[step_class] = previous[step_class]) : overrides.delete(step_class)
57
+ end
58
+ end
59
+ end
60
+
61
+ # Set a global test adapter for ALL steps. Block form restores the
62
+ # previous adapter on exit; non-block form persists until host cleanup.
63
+ def stub_all_steps(response: nil, responses: nil, &block)
64
+ adapter = build_test_adapter(response: response, responses: responses)
65
+
66
+ if block
67
+ previous = RubyLLM::Contract.configuration.default_adapter
68
+ begin
69
+ RubyLLM::Contract.configuration.default_adapter = adapter
70
+ yield
71
+ ensure
72
+ RubyLLM::Contract.configuration.default_adapter = previous
73
+ end
74
+ else
75
+ RubyLLM::Contract.configure { |c| c.default_adapter = adapter }
76
+ end
77
+ end
78
+
79
+ private
80
+
81
+ def build_test_adapter(response: nil, responses: nil)
82
+ if responses
83
+ Adapters::Test.new(responses: responses.map { |r| normalize_test_response(r) })
84
+ else
85
+ Adapters::Test.new(response: normalize_test_response(response))
86
+ end
87
+ end
88
+
89
+ # Hook for host frameworks to inject custom serialization (e.g.
90
+ # turning hashes into JSON strings). Default: identity.
91
+ def normalize_test_response(value)
92
+ value
93
+ end
94
+ end
95
+ end
96
+ end
97
+ end
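Reviewer note: the block form of `stub_step` follows a save, override, restore-in-`ensure` pattern. The sketch below isolates that pattern with a plain Hash; `with_override` is a hypothetical name, and the Hash stands in for `RubyLLM::Contract.step_adapter_overrides`.

```ruby
# Hypothetical helper isolating the scoped-override pattern.
def with_override(overrides, key, value)
  previous = overrides[key]
  overrides[key] = value
  yield
ensure
  # Restore the earlier entry, or remove the key if there was none,
  # even when the block raises.
  previous ? (overrides[key] = previous) : overrides.delete(key)
end

overrides = { summarize: :real_adapter }
with_override(overrides, :summarize, :stub) { overrides[:summarize] } # => :stub
overrides[:summarize] # => :real_adapter
```

Because the restore sits in `ensure`, a failing example inside the block does not leak the stub into later examples.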
@@ -18,6 +18,8 @@ module RubyLLM
18
18
  end
19
19
 
20
20
  def invariant(description, &block)
21
+ raise ArgumentError, "invariant description must be a non-empty string", caller if description.to_s.empty?
22
+
21
23
  @invariants << Invariant.new(description, block)
22
24
  end
23
25
  alias validate invariant
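Reviewer note: the new guard can be exercised in isolation. The class below is a hypothetical miniature (`SketchDefinition` is not the gem's `Definition`) that copies only the guard line from this hunk.

```ruby
# Hypothetical miniature of the definition class, keeping only the guard.
class SketchDefinition
  attr_reader :invariants

  def initialize
    @invariants = []
  end

  def invariant(description, &block)
    # nil.to_s is "", so a single emptiness check covers both nil and "".
    raise ArgumentError, "invariant description must be a non-empty string", caller if description.to_s.empty?

    @invariants << [description, block]
  end
  alias validate invariant
end

definition = SketchDefinition.new
definition.validate("score in range 0-100") { |o| o[:score].between?(0, 100) }
definition.invariants.size # => 1
```

Passing `caller` as the third argument to `raise` makes the backtrace start at the offending `validate` call rather than inside the guard. Note that a whitespace-only description such as `" "` still passes this check, since `" ".empty?` is false.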
@@ -89,8 +89,13 @@ module RubyLLM
89
89
  (input_cost + output_cost).round(6)
90
90
  end
91
91
 
92
+ # Provider pricing is denominated per 1M tokens; divide here to get
93
+ # the dollar cost for the actual usage count. Named constant for
94
+ # consistency with how RubyLLM and provider docs express prices.
95
+ TOKENS_PER_MILLION = 1_000_000.0
96
+
92
97
  def self.token_cost(tokens, price_per_million)
93
- (tokens || 0) * (price_per_million || 0) / 1_000_000.0
98
+ (tokens || 0) * (price_per_million || 0) / TOKENS_PER_MILLION
94
99
  end
95
100
 
96
101
  def self.find_model(model_name)
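Reviewer note: a worked example of the per-million arithmetic in `token_cost`. The method body is copied from this hunk as a top-level method; the prices are illustrative numbers chosen for the example, not real provider pricing.

```ruby
TOKENS_PER_MILLION = 1_000_000.0

def token_cost(tokens, price_per_million)
  (tokens || 0) * (price_per_million || 0) / TOKENS_PER_MILLION
end

# Illustrative prices in USD per 1M tokens (assumed for this example only).
input_cost  = token_cost(256, 0.10) # 256 * 0.10 / 1_000_000
output_cost = token_cost(84, 0.40)  #  84 * 0.40 / 1_000_000
total = (input_cost + output_cost).round(6)
total # => 5.9e-05, i.e. about $0.000059
```

A `nil` token count or price falls back to 0, so `token_cost(nil, 0.10)` is `0.0` rather than an error.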
@@ -111,7 +116,11 @@ module RubyLLM
111
116
  end
112
117
  end
113
118
 
114
- private_class_method :compute_cost, :token_cost, :find_model, :validate_price!
119
+ # `find_model` is intentionally public: `Step::Base#estimate_cost` needs
120
+ # to inspect model pricing before invoking `calculate` (e.g., to short-
121
+ # circuit estimate when the model is unknown). Exposing it removes
122
+ # `CostCalculator.send(:find_model)` workarounds at call sites.
123
+ private_class_method :compute_cost, :token_cost, :validate_price!
115
124
  end
116
125
  end
117
126
  end
@@ -4,7 +4,9 @@ module RubyLLM
4
4
  module Contract
5
5
  module Eval
6
6
  class Recommender
7
- def initialize(comparison:, min_score:, min_first_try_pass_rate: 0.8, current_config: nil)
7
+ def initialize(comparison:, min_score:,
8
+ min_first_try_pass_rate: DEFAULT_MIN_FIRST_TRY_PASS_RATE,
9
+ current_config: nil)
8
10
  @comparison = comparison
9
11
  @min_score = min_score
10
12
  @min_first_try_pass_rate = min_first_try_pass_rate
@@ -98,7 +98,9 @@ module RubyLLM
98
98
  end
99
99
  end
100
100
 
101
- def initialize(step:, candidates:, context: {}, min_score: 0.95, runs: 1, production_mode: nil)
101
+ def initialize(step:, candidates:, context: {},
102
+ min_score: DEFAULT_MIN_SCORE,
103
+ runs: 1, production_mode: nil)
102
104
  @step = step
103
105
  @candidates = candidates
104
106
  @context = context
@@ -113,10 +115,19 @@ module RubyLLM
113
115
 
114
116
  score_matrix = {}
115
117
  evals.each do |eval_name|
116
- comparison = with_retry_disabled do
117
- @step.compare_models(eval_name, candidates: @candidates, context: @context,
118
- runs: @runs, production_mode: @production_mode)
119
- end
118
+ # `retry_policy_override: nil` in context disables the step's
119
+ # class-level retry policy for this comparison run — see
120
+ # step/base.rb#runtime_settings, which honours the key when
121
+ # present (even when value is nil). Replaces the prior
122
+ # `define_singleton_method(:retry_policy)` mutation, which was
123
+ # not thread-safe across concurrent optimizer calls.
124
+ comparison = @step.compare_models(
125
+ eval_name,
126
+ candidates: @candidates,
127
+ context: @context.merge(retry_policy_override: nil),
128
+ runs: @runs,
129
+ production_mode: @production_mode
130
+ )
120
131
  score_matrix[eval_name] = extract_scores(comparison)
121
132
  end
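Reviewer note: the new comment says `runtime_settings` honours `retry_policy_override` when the key is present, even with a `nil` value. That method is not part of this diff, so the sketch below is an assumption about the shape of such a lookup; `effective_retry_policy` is a hypothetical name.

```ruby
# Hypothetical resolver showing key-presence semantics.
def effective_retry_policy(class_policy, context)
  # Hash#key? distinguishes "present with nil" from "absent";
  # `context[:retry_policy_override] || class_policy` would not.
  context.key?(:retry_policy_override) ? context[:retry_policy_override] : class_policy
end

base_context = { adapter: :sample }
# Hash#merge returns a new Hash, so the caller's context is left untouched.
run_context = base_context.merge(retry_policy_override: nil)

effective_retry_policy(:escalate, base_context) # => :escalate
effective_retry_policy(:escalate, run_context)  # => nil
```

Since the override travels in a per-call Hash instead of a redefined singleton method, two concurrent optimizer runs on the same step class do not share mutable state through it.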
122
133
 
@@ -203,14 +214,6 @@ module RubyLLM
203
214
  end
204
215
  end
205
216
 
206
- def with_retry_disabled(&block)
207
- original = @step.retry_policy if @step.respond_to?(:retry_policy)
208
- @step.define_singleton_method(:retry_policy) { nil }
209
- block.call
210
- ensure
211
- @step.define_singleton_method(:retry_policy) { original }
212
- end
213
-
214
217
  def empty_result(evals)
215
218
  Result.new(
216
219
  step_name: @step.name || @step.to_s,
@@ -33,3 +33,16 @@ require_relative "eval/eval_history"
33
33
  require_relative "eval/recommendation"
34
34
  require_relative "eval/recommender"
35
35
  require_relative "eval/retry_optimizer"
36
+
37
+ module RubyLLM
38
+ module Contract
39
+ module Eval
40
+ # Default thresholds shared across `recommend`, `optimize_retry_policy`,
41
+ # `Recommender`, and `RetryOptimizer`. Centralised so a single change in
42
+ # what "viable" means (e.g. tightening from 0.95 to 0.97) propagates
43
+ # everywhere instead of needing the same edit in 4 places.
44
+ DEFAULT_MIN_SCORE = 0.95
45
+ DEFAULT_MIN_FIRST_TRY_PASS_RATE = 0.8
46
+ end
47
+ end
48
+ end