ruby_llm-contract 0.10.1 → 0.10.2

@@ -0,0 +1,185 @@
1
+ # Migration Guide
2
+
3
+ > Read this when adopting the gem in an existing Rails app with a raw `LlmClient.call` service you want to replace. Skip if starting fresh — go to [Getting Started](getting_started.md) instead.
4
+
5
+ Examples use `SummarizeArticle`, the flagship step from the [README](../../README.md), but the pattern applies to any single-input / structured-output service.
6
+
7
+ ## Step 1: Start with the simplest service
8
+
9
+ Pick an LLM service with the simplest shape: single input → JSON output → DB save. Don't start with parallel batches or complex pipelines.
10
+
11
+ ## Step 2: Define the contract
12
+
13
+ **Before (direct `LlmClient` call):**
14
+
15
+ ```ruby
16
+ class ArticleSummaryService
17
+ def call(article_text)
18
+ response = LlmClient.new(model: "gpt-4o-mini").call(prompt(article_text))
19
+ JSON.parse(response[:content], symbolize_names: true)
20
+ end
21
+ end
22
+ ```
23
+
24
+ **After — contract:**
25
+
26
+ ```ruby
27
+ class SummarizeArticle < RubyLLM::Contract::Step::Base
28
+ model "gpt-4.1-mini"
29
+
30
+ prompt do
31
+ system "You summarize articles for a UI card."
32
+ rule "Return valid JSON only."
33
+ user "{input}"
34
+ end
35
+
36
+ output_schema do
37
+ string :tldr
38
+ array :takeaways, of: :string, min_items: 3, max_items: 5
39
+ string :tone, enum: %w[neutral positive negative analytical]
40
+ end
41
+
42
+ validate("TL;DR fits the card") { |o, _| o[:tldr].length <= 200 }
43
+ retry_policy models: %w[gpt-4.1-mini gpt-4.1]
44
+ end
45
+ ```
46
+
47
+ ## Step 3: Replace the caller
48
+
49
+ ```ruby
50
+ # Before
51
+ parsed = ArticleSummaryService.new.call(article_text)
52
+ # The old service parses with symbolize_names: true, so keys are symbols.
+ article.update!(summary: parsed[:tldr])
52
+
53
+ # After
54
+ result = SummarizeArticle.run(article_text)
55
+ if result.ok?
56
+ article.update!(summary: result.parsed_output[:tldr])
58
+ else
59
+ Rails.logger.warn "Summary failed: #{result.status}"
60
+ end
61
+ ```
62
+
63
+ ## Step 4: Add logging via around_call
64
+
65
+ ```ruby
66
+ class SummarizeArticle < RubyLLM::Contract::Step::Base
67
+ # ... prompt, schema, validates ...
68
+
69
+ around_call do |step, input, result|
70
+ AiCallLog.create!(
71
+ ai_model: result.trace.model,
72
+ duration_ms: result.trace.latency_ms,
73
+ input_tokens: result.trace.usage&.dig(:input_tokens),
74
+ output_tokens: result.trace.usage&.dig(:output_tokens),
75
+ cost: result.trace.cost,
76
+ status: result.status.to_s
77
+ )
78
+ end
79
+ end
80
+ ```
81
+
82
+ ## Step 5: Add eval cases
83
+
84
+ Use real inputs from production logs:
85
+
86
+ ```ruby
87
+ SummarizeArticle.define_eval("regression") do
88
+ add_case "short news",
89
+ input: "Ruby 3.4 ships with frozen string literals by default...",
90
+ expected: { tone: "analytical" }
91
+
92
+ add_case "critical review",
93
+ input: "The new mesh networking hardware failed under load...",
94
+ expected: { tone: "negative" }
95
+ end
96
+ ```
97
+
98
+ ## Step 6: Find the cheapest model
99
+
100
+ ```ruby
101
+ comparison = SummarizeArticle.compare_models("regression",
102
+ candidates: [{ model: "gpt-4.1-nano" }, { model: "gpt-4.1-mini" }])
103
+
104
+ comparison.print_summary
105
+ comparison.best_for(min_score: 0.95) # => cheapest model at >= 95%
106
+ ```
107
+
108
+ The full optimization workflow (multi-eval, fallback list, production-mode cost) is covered in [Optimizing retry_policy](optimizing_retry_policy.md).
109
+
110
+ ## Step 7: Add CI gate
111
+
112
+ ```ruby
113
+ # Rakefile
114
+ require "ruby_llm/contract/rake_task"
115
+ RubyLLM::Contract::RakeTask.new do |t|
116
+ t.minimum_score = 0.8
117
+ t.maximum_cost = 0.05
118
+ t.fail_on_regression = true
119
+ t.save_baseline = true
120
+ end
121
+ ```
122
+
123
+ **Rails apps:** if your adapter is configured in an initializer, use a Proc so context is resolved after Rails boots:
124
+
125
+ ```ruby
126
+ RubyLLM::Contract::RakeTask.new do |t|
127
+ t.context = -> { { adapter: RubyLLM::Contract.configuration.default_adapter } }
128
+ t.minimum_score = 0.8
129
+ end
130
+ ```
131
+
132
+ ## Common patterns
133
+
134
+ | Old pattern | New pattern |
135
+ |---|---|
136
+ | `LlmClient.new(model:).call(prompt)` | `MyStep.run(input)` |
137
+ | `JSON.parse(response[:content])` | `result.parsed_output` |
138
+ | `begin; rescue; retry; end` | `retry_policy models: [...]` |
139
+ | `body[:temperature] = 0.7` | `temperature 0.7` |
140
+ | `AiCallLog.create(...)` | `around_call { \|s, i, r\| ... }` |
141
+ | `response_format: JsonSchema.build(...)` | `output_schema do...end` |
142
+ | `stub_request(:post, ...)` | `stub_step(MyStep, response: {...})` |
143
+
144
+ ## Anti-patterns
145
+
146
+ - **Don't migrate markdown/text output services.** The gem is for structured JSON. Prose output gets no benefit from schema validation.
147
+ - **Don't put parallelism in the gem.** Thread management is your app's concern. The gem provides the contract; you call it however you want.
148
+ - **Don't migrate all services at once.** Start with one. Validate the pattern. Then migrate the next.
149
+
150
+ ## Parallel batch generation
151
+
152
+ The gem handles single calls. You handle parallelism:
153
+
154
+ ```ruby
155
+ class SummarizeBatch < RubyLLM::Contract::Step::Base
156
+ output_schema do
157
+ array :summaries do
158
+ object do
159
+ string :article_id
160
+ string :tldr
161
+ end
162
+ end
163
+ end
164
+ retry_policy models: %w[gpt-4.1-mini gpt-4.1]
165
+ end
166
+
167
+ # Your orchestrator
168
+ threads = 10.times.map do |i|
169
+ Thread.new { Rails.application.executor.wrap { SummarizeBatch.run(input(i)) } }
170
+ end
171
+ results = threads.map(&:value)
172
+ ```
173
+
174
+ **Note:** in tests, `stub_step` overrides are thread-local. If your orchestrator spawns threads, propagate overrides manually:
175
+
176
+ ```ruby
177
+ overrides = RubyLLM::Contract.step_adapter_overrides.dup
178
+ Thread.new { RubyLLM::Contract.step_adapter_overrides = overrides; SummarizeBatch.run(input) }
179
+ ```
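Why the manual hand-off is needed can be seen in plain Ruby, without the gem. The sketch below is only an illustration: the `:step_overrides` key is hypothetical and is not how the gem stores its overrides. It uses `Thread.current[]`, which holds per-thread (strictly, per-fiber) state that a newly spawned thread does not inherit.

```ruby
# Hypothetical storage key, used only for this illustration.
Thread.current[:step_overrides] = { "SummarizeBatch" => :stubbed }

# A new thread starts with empty thread-local storage.
seen_without = Thread.new { Thread.current[:step_overrides] }.value

# Copy the value in the parent, then assign it inside the child.
overrides = Thread.current[:step_overrides].dup
seen_with = Thread.new do
  Thread.current[:step_overrides] = overrides
  Thread.current[:step_overrides]
end.value

seen_without # => nil
seen_with    # => {"SummarizeBatch"=>:stubbed}
```

The same copy-then-assign shape is what the `step_adapter_overrides` snippet above does for the gem's own storage.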
180
+
181
+ ## See also
182
+
183
+ - [Getting Started](getting_started.md) — the full walkthrough of every feature `SummarizeArticle` uses.
184
+ - [Testing](testing.md) — `stub_step` reference for migrating your test adapter mocks.
185
+ - [Eval-First](eval_first.md) — how to build the "regression" eval from production logs.
@@ -0,0 +1,160 @@
1
+ # Multimodal input
2
+
3
+ > Read this when your contract needs to send a PDF, image, or audio file to the LLM — not just text.
4
+
5
+ `ruby_llm-contract` 0.10.0+ routes attachments through the contract layer, so `max_cost`, `validate`, `retry_policy escalate(...)`, and trace observability still apply. The gem does **not** ship its own multimodal API — it forwards `with:` to `RubyLLM::Chat#ask`, which RubyLLM 1.15+ normalises per provider (Anthropic, OpenAI, Gemini).
6
+
7
+ ## Minimal example
8
+
9
+ ```ruby
10
+ # app/contracts/extract_invoice_data.rb
11
+ class ExtractInvoiceData < RubyLLM::Contract::Step::Base
12
+ prompt "Extract invoice fields from the attached PDF. Return JSON."
13
+
14
+ output_schema do
15
+ string :vendor
16
+ string :invoice_number
17
+ number :total_amount
18
+ string :currency, enum: %w[USD EUR PLN GBP]
19
+ end
20
+
21
+ validate("currency present") { |o, _| !o[:currency].nil? }
22
+
23
+ # REQUIRED when max_cost is set and the contract receives an attachment.
24
+ # Conservative estimate of attachment input tokens (provider/model-specific).
25
+ attachment_token_estimate 15_000 # ~12 PDF pages at ~1250 tokens/page
26
+
27
+ max_cost 0.10
28
+
29
+ retry_policy do
30
+ escalate "gpt-4.1-mini",
31
+ { model: "gpt-5", reasoning_effort: "high" }
32
+ end
33
+ end
34
+
35
+ result = ExtractInvoiceData.run(
36
+ "Look for vendor, amount, currency.",
37
+ context: { attachment: "tmp/invoice.pdf" }
38
+ )
39
+
40
+ result.status # => :ok
41
+ result.parsed_output # => { vendor: "...", invoice_number: "...", ... }
42
+ result.trace[:cost] # => 0.0042 (total across attempts)
43
+ ```
44
+
45
+ ## How it works
46
+
47
+ 1. **Input vs attachment.** The `input` argument to `Step.run` is the text prompt. The attachment travels via `context: { attachment: ... }` — opaque to the contract layer, forwarded to the adapter.
48
+ 2. **Adapter forwards `with:`.** `RubyLLM::Contract::Adapters::RubyLLM` reads `options[:attachment]` and passes it to `chat.ask(content, with: attachment)`. RubyLLM picks the right wire format per provider.
49
+ 3. **Multi-attachment supported.** `with: [pdf1, pdf2]` or `with: { images: [...], pdfs: [...] }` works natively (`RubyLLM::Content#process_attachments`).
50
+ 4. **`with: nil` is a no-op.** Text-only contracts are unaffected because the kwarg defaults to `nil`.
51
+
52
+ ## Cost: `attachment_token_estimate` is required
53
+
54
+ The gem cannot count attachment input tokens precisely — vision/PDF token cost depends on model, image resolution, page count, and provider. To keep `max_cost` and `max_input` fail-closed, you declare a **conservative estimate** of attachment input tokens at the class level:
55
+
56
+ ```ruby
57
+ class TranscribePDF < RubyLLM::Contract::Step::Base
58
+ # ...
59
+ attachment_token_estimate 15_000 # safe upper bound for ~12-page docs
60
+ max_cost 0.05
61
+ end
62
+ ```
63
+
64
+ The same estimate applies to:
65
+
66
+ - **Runtime** (`limit_checker`): adds the estimate to `input_tokens` before checking `max_cost`/`max_input`, and refuses the call before any request is sent if the budget would be exceeded.
67
+ - **Pre-flight** (`estimate_cost`): accepts an `attachment:` kwarg and applies the same accounting, so estimate and runtime decisions do not drift.
68
+
69
+ ### Fail-closed without estimate
70
+
71
+ If your contract has `max_cost` or `max_input` set, receives an attachment, and `attachment_token_estimate` is **not declared**, the call fails with `:limit_exceeded` — the gem refuses to spend money on cost it cannot bound.
72
+
73
+ ```ruby
74
+ class MyContract < RubyLLM::Contract::Step::Base
75
+ max_cost 0.05
76
+ # no attachment_token_estimate declared
77
+ end
78
+
79
+ result = MyContract.run("text", context: { attachment: "doc.pdf" })
80
+ result.status # => :limit_exceeded
81
+ result.validation_errors # => ["attachment present but attachment_token_estimate not set; ..."]
82
+ ```
83
+
84
+ ### Opting out per-step
85
+
86
+ If you do not want fail-closed (e.g., experimental or development contracts), set:
87
+
88
+ ```ruby
89
+ class FlexibleContract < RubyLLM::Contract::Step::Base
90
+ on_unknown_attachment_size :warn # log a warning instead of refusing
91
+ max_cost 0.05
92
+ end
93
+ ```
94
+
95
+ `:warn` is per-step. There is no global opt-out — the same invariant as `on_unknown_pricing`.
96
+
97
+ ## Pre-flight cost estimation
98
+
99
+ `estimate_cost` accepts an optional `attachment:` kwarg for budget planning:
100
+
101
+ ```ruby
102
+ ExtractInvoiceData.estimate_cost(
103
+ input: "Look for vendor, amount...",
104
+ attachment: "tmp/invoice.pdf"
105
+ )
106
+ # => { model: "gpt-4.1-mini",
107
+ # input_tokens: 15_320,
108
+ # output_tokens_estimate: 256,
109
+ # estimated_cost: 0.0123 }
110
+ ```
111
+
112
+ The `input_tokens` figure includes both the text estimate (chars/4 heuristic) and the declared `attachment_token_estimate`. Pre-flight refusal mirrors runtime: if `attachment` is passed and `attachment_token_estimate` is not declared, `estimate_cost` returns `nil` and emits the same fail-closed reason.
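The accounting in this paragraph can be sketched in plain Ruby. This is an illustration under stated assumptions, not the gem's implementation: the method name and keyword arguments are hypothetical, and rounding the chars/4 figure up is an assumption.

```ruby
CHARS_PER_TOKEN = 4 # the chars/4 heuristic

# Hypothetical helper: returns the input token estimate, or nil when the
# attachment cost cannot be bounded (the fail-closed case).
def estimated_input_tokens(text, attachment_present:, attachment_token_estimate: nil)
  return nil if attachment_present && attachment_token_estimate.nil?

  text_tokens = (text.length / CHARS_PER_TOKEN.to_f).ceil # rounding up is an assumption
  text_tokens + (attachment_present ? attachment_token_estimate : 0)
end

with_estimate = estimated_input_tokens("a" * 40, attachment_present: true,
                                                 attachment_token_estimate: 15_000)
without_estimate = estimated_input_tokens("a" * 40, attachment_present: true)

with_estimate    # => 15010
without_estimate # => nil
```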
113
+
114
+ **Note on output tokens.** `attachment_token_estimate` adds to `input_tokens` only — not to `output_tokens_estimate`. Vision-heavy responses (long image descriptions, transcribed paragraphs) may exceed the conservative `output_tokens_estimate` default. Treat `estimated_cost` as a floor for budget planning, not a precise predictor; inflate `max_output` or `max_cost` accordingly if your prompt routinely produces verbose descriptions.
115
+
116
+ ## Calibrating `attachment_token_estimate`
117
+
118
+ The number depends on provider, model, and content shape. Some baselines:
119
+
120
+ | Content | Provider | Approx tokens |
121
+ |-------------------------------|----------|---------------|
122
+ | 1 PDF page (text-heavy) | OpenAI | ~1000-1500 |
123
+ | 1 PDF page (text-heavy) | Anthropic | ~1000-1500 |
124
+ | 1 image (1024x1024, low res) | OpenAI | ~85 |
125
+ | 1 image (1024x1024, high res) | OpenAI | ~765 |
126
+ | 1 image | Anthropic | ~1500 max |
127
+ | 1 image | Gemini | ~258 (fixed) |
128
+
129
+ Pick a value at or above the provider's worst case. The estimate is a **conservative upper bound for safety**, not a precise count: use it to gate budget refusal, not to predict exact cost.
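A worked example of picking the value, as a rough sketch: the constant and helper names are hypothetical, and the per-page figure is simply the top of the approximate range in the table above.

```ruby
# Top of the table's approximate range for a text-heavy PDF page.
WORST_CASE_TOKENS_PER_PDF_PAGE = 1_500

# Hypothetical helper: bound the estimate by the largest document you accept.
def pdf_token_estimate(max_pages)
  max_pages * WORST_CASE_TOKENS_PER_PDF_PAGE
end

pdf_token_estimate(12) # => 18000
```

For documents capped at 12 pages, this would suggest declaring `attachment_token_estimate 18_000`.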
130
+
131
+ ## Multi-turn caveat
132
+
133
+ If your contract uses history (`add_history`), attachments from prior turns are **not** replayed in 0.10.x. Single-turn multimodal works; follow-up questions on the same document require additional work that is deferred to a later release. See [ADR-0022](../decisions/ADR-0022-v09-multimodal-input.md) (internal) for the rationale.
134
+
135
+ ## Provider notes
136
+
137
+ - **OpenAI** — PDFs sent as `type: 'file'` with `file_data` (base64). Images as `image_url`. Audio as `input_audio`. Vision pricing varies by image detail; check the model card.
138
+ - **Anthropic** — PDFs sent as `type: 'document'` with `source.type: 'base64'` or `'url'` (auto-selected). Images same. Page limit ~100 per call.
139
+ - **Gemini** — Everything via `inline_data` with `mime_type`. Multimodal token counting is unified.
140
+
141
+ RubyLLM dispatches on `attachment.type` (`:image`, `:pdf`, `:audio`, `:text`, `:unknown`). Tempfiles must have the right extension (`.pdf`, `.png`, etc.) — RubyLLM detects MIME from the filename; an unsuffixed tempfile becomes `:unknown` and is rejected by the provider.
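One way to get a correctly suffixed tempfile is Ruby's standard `Tempfile.new([basename, extension])` form. A minimal sketch; the file content is a placeholder:

```ruby
require "tempfile"

# Passing [basename, extension] keeps the suffix in the generated path.
file = Tempfile.new(["invoice", ".pdf"])
file.binmode
file.write("%PDF-1.4\n") # placeholder bytes for the sketch
file.flush

File.extname(file.path) # => ".pdf"
```

The resulting `file.path` can then be passed as the `attachment` context value.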
142
+
143
+ ## Testing contracts with attachments
144
+
145
+ The Test adapter ignores the `attachment` context key (it returns deterministic responses per step). To verify your adapter call shape, stub `RubyLLM::Chat` directly:
146
+
147
+ ```ruby
148
+ RSpec.describe ExtractInvoiceData do
149
+ it "forwards attachment to chat.ask" do
150
+ expect_any_instance_of(RubyLLM::Chat).to receive(:ask)
151
+ .with(anything, with: "fixtures/invoice.pdf")
152
+ .and_return(double(content: '{"vendor":"X",...}', input_tokens: 200, output_tokens: 50))
153
+
154
+ result = described_class.run("extract", context: { attachment: "fixtures/invoice.pdf" })
155
+ expect(result.status).to eq(:ok)
156
+ end
157
+ end
158
+ ```
159
+
160
+ For pre-flight estimate tests, just call `.estimate_cost(input: ..., attachment: ...)` — no adapter stub needed.
@@ -0,0 +1,131 @@
1
+ # Find the cheapest viable fallback list
2
+
3
+ > Read this when you have 2+ evals and want to know empirically — not by guessing — which models belong in `retry_policy` and in what order.
4
+
5
+ You defined `SummarizeArticle` in the [README](../../README.md) with `retry_policy models: %w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]`. That list was a guess. `optimize_retry_policy` tells you which models your evals actually need, so you stop paying for the strong model when `nano` was enough — or stop shipping `nano` when the hardest eval proves it isn't.
6
+
7
+ ## Requirements
8
+
9
+ - **`SummarizeArticle` already has `retry_policy`.** If your step has none, add one first ([getting started](getting_started.md)).
10
+ - **2–3 evals per step.** One eval optimizes for one scenario; with only `smoke`, you get a recommendation that passes smoke but may miss production edge cases. See [eval-first](eval_first.md).
11
+ - **Rake tasks.** The standard `RubyLLM::Contract::RakeTask` includes `ruby_llm_contract:optimize`. Non-Rails projects: set `EVAL_DIRS=...`.
12
+
13
+ > **Two orthogonal dimensions to a retry chain.** A chain element is `{ model:, reasoning_effort: }` — model identity AND thinking budget. `optimize_retry_policy` explores both. You can also fix the thinking config at class level via `thinking effort: :low` (or alias `reasoning_effort :low`) on the Step — it becomes the default for every chain element unless an override is passed. See the `thinking` DSL note at the bottom of this guide.
14
+
15
+ For this guide, assume `SummarizeArticle` has three evals:
16
+
17
+ ```ruby
18
+ SummarizeArticle.define_eval("smoke") { ... } # short news article
19
+ SummarizeArticle.define_eval("dense_article") { ... } # long form, 5 takeaways required
20
+ SummarizeArticle.define_eval("critical_tone") { ... } # negative review, tone must match
21
+ ```
22
+
23
+ ## Offline check first
24
+
25
+ Run once offline to verify the wiring:
26
+
27
+ ```bash
28
+ rake ruby_llm_contract:optimize \
29
+ STEP=SummarizeArticle \
30
+ CANDIDATES=gpt-4.1-nano,gpt-4.1-mini@low,gpt-4.1-mini,gpt-4.1
31
+ ```
32
+
33
+ Offline uses each eval's `sample_response` — zero API calls. **Every candidate gets the same score** because they all receive the canned response. That's fine for a smoke test (verifying evals load, candidates parse, output renders) but it doesn't compare model quality. For real optimization, go live.
34
+
35
+ ## Optimize against real models
36
+
37
+ ```bash
38
+ LIVE=1 RUNS=3 rake ruby_llm_contract:optimize \
39
+ STEP=SummarizeArticle \
40
+ CANDIDATES=gpt-4.1-nano,gpt-4.1-mini@low,gpt-4.1-mini,gpt-4.1
41
+ ```
42
+
43
+ `LIVE=1` makes real API calls. `RUNS=3` averages each `(candidate, eval)` pair over three runs — necessary because OpenAI forces `temperature=1.0` on gpt-5 / o-series and the same pair can score `0.00` on one run and `1.00` on the next.
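To see why averaging matters, here is the arithmetic for one `(candidate, eval)` pair as a plain Ruby sketch. It assumes a simple arithmetic mean; the helper name is hypothetical.

```ruby
# Mean score across repeated runs of the same (candidate, eval) pair.
def average_score(run_scores)
  run_scores.sum / run_scores.size.to_f
end

average_score([0.0, 1.0, 1.0]).round(2) # => 0.67
```

A single run could have reported either `0.00` or `1.00` for this pair; three runs place it in between.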
44
+
45
+ Output (illustrative):
46
+
47
+ ```
48
+ SummarizeArticle — fallback list optimization
49
+
50
+ eval 4.1-nano 4.1-mini@low 4.1-mini 4.1
51
+ ---------------------------------------------------------
52
+ smoke 1.00 1.00 1.00 1.00
53
+ dense_article 0.67 ← 1.00 1.00 1.00
54
+ critical_tone 0.50 ← 0.67 ← 1.00 1.00
55
+
56
+ Hardest eval: critical_tone
57
+
58
+ Suggested fallback list:
59
+ gpt-4.1-nano — covers 1 eval(s)
60
+ gpt-4.1-mini — passes all 3 evals
61
+
62
+ DSL:
63
+ retry_policy models: %w[gpt-4.1-nano gpt-4.1-mini]
64
+ ```
65
+
66
+ Reading the table:
67
+ - **`←` marks scores below threshold in the hardest eval.** Not a selection hint — just "this candidate fails the row that matters most".
68
+ - **Hardest eval** = the one that forces the strong fallback. Here, `critical_tone` demands `gpt-4.1-mini`.
69
+ - **Suggested fallback list** = the shortest chain where each step covers more evals, built greedy-cheapest-first. Stops when all evals pass. Order matters: `gpt-4.1-nano` is tried first; on validation failure, the gem falls back to `gpt-4.1-mini`.
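The suggestion in the illustrative table can be reproduced with a deliberately simplified selection: take the cheapest candidate that passes at least one eval, then the cheapest candidate that passes every eval. This is a sketch for intuition only, not the gem's algorithm; the threshold value and method names are assumptions.

```ruby
THRESHOLD = 0.95 # hypothetical pass threshold

# Candidates listed cheapest first; scores copied from the illustrative table.
SCORES = {
  "gpt-4.1-nano"     => { "smoke" => 1.00, "dense_article" => 0.67, "critical_tone" => 0.50 },
  "gpt-4.1-mini@low" => { "smoke" => 1.00, "dense_article" => 1.00, "critical_tone" => 0.67 },
  "gpt-4.1-mini"     => { "smoke" => 1.00, "dense_article" => 1.00, "critical_tone" => 1.00 },
  "gpt-4.1"          => { "smoke" => 1.00, "dense_article" => 1.00, "critical_tone" => 1.00 }
}.freeze

# Names of the evals a candidate passes at the threshold.
def passed_evals(scores)
  scores.select { |_eval, score| score >= THRESHOLD }.keys
end

# Simplified two-tier suggestion; returns [] when nothing passes every eval.
def suggest_chain(scores_by_candidate)
  first = scores_by_candidate.find { |_model, s| passed_evals(s).any? }&.first
  last  = scores_by_candidate.find { |_model, s| passed_evals(s).size == s.size }&.first
  return [] if last.nil?

  [first, last].uniq
end

suggest_chain(SCORES) # => ["gpt-4.1-nano", "gpt-4.1-mini"]
```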
70
+
71
+ Copy the DSL, paste into your step, verify with `rake ruby_llm_contract:eval`. You just dropped `gpt-4.1` from the chain — most requests finish on nano, mini handles what nano misses, and the strong model was never needed.
72
+
73
+ ## Measure effective cost before shipping
74
+
75
+ `optimize` shows **first-attempt** cost. In production, a candidate whose validator rejects 20% of outputs actually costs `first_try_cost + fallback_cost × 0.20` per successful output. The first-attempt number hides this.
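The formula above as a small Ruby sketch. The helper name and the prices are hypothetical, chosen only to make the arithmetic visible:

```ruby
# Effective cost per successful output for a two-tier [candidate, fallback] chain.
# fallback_rate is the fraction of calls whose first attempt the validator rejects.
def effective_cost(first_try_cost:, fallback_cost:, fallback_rate:)
  first_try_cost + fallback_cost * fallback_rate
end

# Hypothetical prices, not measured values.
cost = effective_cost(first_try_cost: 0.0010, fallback_cost: 0.0030, fallback_rate: 0.20)
format("%.4f", cost) # => "0.0016" (0.0010 + 0.0030 * 0.20)
```

In this example the first-attempt figure understates the real cost by 60%.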
76
+
77
+ `production_mode: { fallback: "..." }` runs each candidate with a runtime `[candidate, fallback]` chain and reports effective cost:
78
+
79
+ ```ruby
80
+ SummarizeArticle.compare_models(
81
+ "dense_article",
82
+ candidates: [{ model: "gpt-5-nano" }, { model: "gpt-5-mini", reasoning_effort: "low" }],
83
+ production_mode: { fallback: "gpt-5-mini" }
84
+ ).print_summary
85
+ ```
86
+
87
+ Output (live mode, illustrative):
88
+
89
+ ```
90
+ dense_article — model comparison
91
+
92
+ Chain first-attempt fallback % effective cost latency score
93
+ ---------------------------------------------------------------------------------------------------
94
+ gpt-5-nano → gpt-5-mini $0.0010 33% $0.0018 164ms 1.00
95
+ gpt-5-mini (effort: low) → gpt-5-mini $0.0015 5% $0.0016 210ms 1.00
96
+ gpt-5-mini $0.0030 — $0.0030 220ms 1.00
97
+ ```
98
+
99
+ - **first-attempt** — cost of the first run alone.
100
+ - **fallback %** — fraction of cases where the validator rejected and the fallback ran.
101
+ - **effective cost** — total per successful output including retries.
102
+ - **`—`** — candidate equals fallback, no chain to observe.
103
+
104
+ Run this before finalizing: a candidate saving 3× on first-attempt but falling back 60% of the time may save only 1.2× in production.
105
+
106
+ **Scope.** Single-fallback (2-tier) chains only. For multi-tier chains, inspect `trace.attempts` instead. This is a step-level feature: calling it on `Pipeline::Base` raises `ArgumentError`.
107
+
108
+ ## When results look wrong
109
+
110
+ - **"No viable chain" from a single live run.** Re-run with `RUNS=3`. If scores jump, the first run was noise. Never trust single-run results with gpt-5 / o-series in the pool — `temperature=1.0` is server-enforced.
111
+ - **Every candidate fails the same eval**, including the strongest. The eval is rejecting correct answers. Run the step directly (`context: { retry_policy_override: nil, model: "gpt-4.1" }`), inspect the output, compare with the `verify` block. Loosen the eval if the output is correct but not one of the accepted values.
112
+ - **Testing one specific hypothesis.** (e.g. "does `mini@medium` help on `critical_tone`?") Use `SummarizeArticle.compare_models("critical_tone", candidates: [{ model: "gpt-5-mini", reasoning_effort: "medium" }], runs: 3)` directly — three calls instead of rerunning the whole optimize pass.
113
+
114
+ ## Programmatic API names
115
+
116
+ Metrics exposed on `Report` / `AggregatedReport` keep their original names: `single_shot_cost`, `single_shot_latency_ms`, `escalation_rate`. The optimize Result struct also exposes `hardest_eval` as an alias for `constraining_eval`.
117
+
118
+ ## thinking DSL note
119
+
120
+ Set the default reasoning effort once on the Step class — mirrors `RubyLLM::Agent.thinking` exactly:
121
+
122
+ ```ruby
123
+ class SummarizeArticle < RubyLLM::Contract::Step::Base
124
+ model "gpt-5-nano"
125
+ thinking effort: :low # canonical
126
+ # or
127
+ reasoning_effort :low # alias for thinking(effort: :low)
128
+ end
129
+ ```
130
+
131
+ Forwarded to `Chat#with_thinking(**)` through the adapter — works provider-agnostically (OpenAI `reasoning_effort`, Anthropic extended-thinking budget). A per-call override via `context: { reasoning_effort: :high }` still wins over the class default.
@@ -0,0 +1,93 @@
1
+ # Output Schema
2
+
3
+ > Read this as a reference for the schema DSL — every constraint, nested objects, arrays of objects, the full pattern table.
4
+
5
+ Declare the expected output structure using [ruby_llm-schema](https://github.com/danielfriis/ruby_llm-schema) DSL. The schema serves **two purposes**:
6
+
7
+ 1. **Output validation** — replaces type and shape checks (enums, ranges, required fields). One declaration instead of many.
8
+ 2. **Provider-side request** — with the RubyLLM adapter, the schema is sent to the LLM provider via `chat.with_schema(...)`, asking the model to return JSON matching the shape. Cheaper models sometimes ignore the request, which is why client-side validation (point 1) still matters.
9
+
10
+ > **Same DSL `RubyLLM::Agent.schema` accepts.** `output_schema do ... end` here is a wrapper around `RubyLLM::Schema.create(&block)` plus a client-side validation step. `RubyLLM::Agent.schema` accepts the same block — choosing one over the other does not change the schema language. The difference: `Agent.schema` lets you pass a `Proc` evaluated in runtime context (dynamic per-call schema); `Step.output_schema` is eager-compiled at class load and additionally drives `output_type` inference and `Validator.validate`. Both can coexist.
11
+
12
+ All examples below extend the `SummarizeArticle` step from the [README](../../README.md).
13
+
14
+ ## Schema replaces type and shape checks
15
+
16
+ ```ruby
17
+ # WITHOUT schema — many validates:
18
+ validate("tldr must be a string") { |o| o[:tldr].is_a?(String) }
19
+ validate("takeaways must be an array") { |o| o[:takeaways].is_a?(Array) }
20
+ validate("takeaways 3 to 5") { |o| (3..5).cover?(o[:takeaways].size) }
21
+ ALLOWED_TONES = %w[neutral positive negative analytical].freeze
22
+ validate("tone must be an allowed label") { |o| ALLOWED_TONES.include?(o[:tone]) }
23
+
24
+ # WITH schema — one declaration:
25
+ output_schema do
26
+ string :tldr
27
+ array :takeaways, of: :string, min_items: 3, max_items: 5
28
+ string :tone, enum: %w[neutral positive negative analytical]
29
+ end
30
+ ```
31
+
32
+ ## Nested objects in arrays
33
+
34
+ Use `object do...end` inside `array` when you need more than a primitive per element. Concrete scenario: the UI card grows a "confidence bar" next to each takeaway so editors can see which points the model was sure about vs guessing. That requires `confidence` paired with `text`, not two parallel arrays that could desync. Nested objects make the pairing a schema invariant:
35
+
36
+ ```ruby
37
+ output_schema do
38
+ string :tldr
39
+ array :takeaways, min_items: 3, max_items: 5 do
40
+ object do
41
+ string :text
42
+ number :confidence, minimum: 0.0, maximum: 1.0
43
+ end
44
+ end
45
+ string :tone, enum: %w[neutral positive negative analytical]
46
+ end
47
+ ```
48
+
49
+ ## Schema pattern reference
50
+
51
+ | Your output looks like | Schema pattern | Example |
52
+ |---|---|---|
53
+ | `{"tldr": "...", "tone": "positive"}` | Flat fields | `string :tldr; string :tone, enum: [...]` |
54
+ | `{"takeaways": ["...", "..."]}` | Array of primitives | `array :takeaways, of: :string, min_items: 3, max_items: 5` |
55
+ | `{"takeaways": [{"text": "...", "confidence": 0.9}]}` | Array of objects | `array :takeaways do; object do; string :text; number :confidence; end; end` |
56
+
57
+ Without `object do...end`, `array :takeaways do; string :text; end` tells the provider "takeaways is an array of strings" — not objects. That's what you get back.
58
+
59
+ ## Why schema alone is not enough
60
+
61
+ Schema validates **shape**: correct types, allowed values, field presence. But LLMs can return structurally valid JSON that is **logically wrong**. `validate` blocks catch what the schema can't:
62
+
63
+ ```ruby
64
+ output_schema do
65
+ string :tldr
66
+ array :takeaways, of: :string, min_items: 3, max_items: 5
67
+ string :tone, enum: %w[neutral positive negative analytical]
68
+ end
69
+
70
+ # Schema allows any string for :tldr — but a 500-char "summary" breaks the UI card.
71
+ validate("TL;DR fits the card") { |o, _| o[:tldr].length <= 200 }
72
+
73
+ # Schema enforces 3–5 takeaways — but says nothing about them being distinct.
74
+ validate("takeaways are unique") { |o, _| o[:takeaways] == o[:takeaways].uniq }
75
+
76
+ # Schema can't express cross-field rules.
77
+ validate("critical tone requires at least one concrete risk") do |o, _|
78
+ next true unless o[:tone] == "negative"
79
+ o[:takeaways].any? { |t| t.match?(/fail|break|crash|outage|vulnerab/i) }
80
+ end
81
+ ```
82
+
83
+ ## Supported constraints
84
+
85
+ | Constraint | Types | Example |
86
+ |---|---|---|
87
+ | `enum` | string, integer | `string :tone, enum: %w[neutral positive negative analytical]` |
88
+ | `minimum` / `maximum` | number, integer | `number :confidence, minimum: 0.0, maximum: 1.0` |
89
+ | `min_length` / `max_length` | string | `string :tldr, min_length: 1, max_length: 200` |
90
+ | `min_items` / `max_items` | array | `array :takeaways, of: :string, min_items: 3, max_items: 5` |
91
+ | `additional_properties` | object | Set to `false` in the schema to reject extra keys |
92
+
93
+ Keyword args use Ruby snake_case (`min_length`, `min_items`). The DSL converts them internally to JSON Schema's camelCase (`minLength`, `minItems`) before sending the schema to the provider — you don't need to write camelCase in Ruby.
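The conversion itself is a simple string transformation. A plain Ruby sketch of the idea follows; the helper name is hypothetical and this is not the DSL's internal code.

```ruby
# Convert a snake_case keyword into its JSON Schema camelCase spelling.
def camelize_keyword(name)
  name.to_s.gsub(/_([a-z])/) { Regexp.last_match(1).upcase }
end

camelize_keyword(:min_length)            # => "minLength"
camelize_keyword(:max_items)             # => "maxItems"
camelize_keyword(:additional_properties) # => "additionalProperties"
```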