ruby_llm-contract 0.7.0 → 0.7.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +6 -0
- data/Gemfile.lock +2 -2
- data/README.md +75 -5
- data/lib/ruby_llm/contract/step/base.rb +28 -12
- data/lib/ruby_llm/contract/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: a165b7d8436d99e564fd5e774c601bae38562c0827c226995f1922e75f9987cf
|
|
4
|
+
data.tar.gz: 4124a3a13caba843448529a55eb1420601cbde70873f5924c792d9ed0c97161b
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: e5f17ef5241d9f7ddc047a16608b607d5fefedeff207cc6c3fb969c937192527c5b49c4213b3a3635db4e6aa68784f10f2363a5d99ed8528740b0d4ecebf3790
|
|
7
|
+
data.tar.gz: e58336f998f4df510b707534fd5fecdd87c8e4d04187f6ed9512e2b391dd359e3c9c4cfab106022a9a34ed8dfc27669156f5fc4214320ffe9dec3a0176aab181
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,11 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 0.7.1 (2026-04-22)
|
|
4
|
+
|
|
5
|
+
### Changed (behavioral, follow-up to v0.7.0)
|
|
6
|
+
|
|
7
|
+
- **`Step::Base#run_once` no longer swallows adapter-phase `ArgumentError` as `:input_error`.** The previous blanket `rescue ArgumentError` was there to convert DSL misconfiguration (e.g. missing `prompt`) into an `:input_error` Result. Side effect: programmer bugs in adapter code that raised `ArgumentError` (wrong arity, bad config argument) were silently coerced into `:input_error` and retried as if the user had given bad input. Now the rescue is narrowed to the Runner-construction phase only — DSL configuration errors still produce `:input_error` (the `prompt has not been set` case is regression-tested), but `ArgumentError` raised from adapter code during `Runner#call` propagates to the caller. Input-type validation failures continue to produce `:input_error` through `InputValidator`'s own scoped rescue, unchanged.
|
|
8
|
+
|
|
3
9
|
## 0.7.0 (2026-04-21)
|
|
4
10
|
|
|
5
11
|
### Breaking changes
|
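The 0.7.1 changelog entry above describes narrowing a rescue from the whole method to the construction phase only. A minimal Ruby sketch of that pattern follows. Every name in it (`SketchResult`, `SketchRunner`, `sketch_run_once`, `OK_ADAPTER`, `BUGGY_ADAPTER`) is a hypothetical stand-in for illustration, not the gem's API.

```ruby
# Hypothetical stand-ins for illustration only; these are not the gem's classes.
SketchResult = Struct.new(:status, :validation_errors, keyword_init: true)

class SketchRunner
  def initialize(prompt:, adapter:)
    # Configuration problems surface while the runner is being built.
    raise ArgumentError, "prompt has not been set" if prompt.nil?

    @prompt = prompt
    @adapter = adapter
  end

  def call(input)
    # Anything raised here comes from adapter/runtime code.
    @adapter.call(@prompt, input)
  end
end

def sketch_run_once(input, prompt:, adapter:)
  runner =
    begin
      SketchRunner.new(prompt: prompt, adapter: adapter)
    rescue ArgumentError => e
      # Only construction-phase errors are converted into a result.
      return SketchResult.new(status: :input_error, validation_errors: [e.message])
    end

  # No rescue around the call: an ArgumentError from the adapter propagates.
  runner.call(input)
end

OK_ADAPTER = ->(_prompt, _input) { SketchResult.new(status: :ok, validation_errors: []) }
BUGGY_ADAPTER = ->(_prompt, _input) { raise ArgumentError, "wrong number of arguments" }

misconfigured = sketch_run_once("doc", prompt: nil, adapter: OK_ADAPTER)
puts misconfigured.status

begin
  sketch_run_once("doc", prompt: "Summarize", adapter: BUGGY_ADAPTER)
rescue ArgumentError => e
  puts "propagated: #{e.message}"
end
```

With a blanket `rescue ArgumentError` around the whole method body, the second call would also have come back as `:input_error`, which is the masking the changelog entry says was removed.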
data/Gemfile.lock
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
PATH
|
|
2
2
|
remote: .
|
|
3
3
|
specs:
|
|
4
|
-
ruby_llm-contract (0.7.
|
|
4
|
+
ruby_llm-contract (0.7.1)
|
|
5
5
|
dry-types (~> 1.7)
|
|
6
6
|
ruby_llm (~> 1.0)
|
|
7
7
|
ruby_llm-schema (~> 0.3)
|
|
@@ -258,7 +258,7 @@ CHECKSUMS
|
|
|
258
258
|
rubocop-ast (1.49.1) sha256=4412f3ee70f6fe4546cc489548e0f6fcf76cafcfa80fa03af67098ffed755035
|
|
259
259
|
ruby-progressbar (1.13.0) sha256=80fc9c47a9b640d6834e0dc7b3c94c9df37f08cb072b7761e4a71e22cff29b33
|
|
260
260
|
ruby_llm (1.14.0) sha256=57c6f7034fc4a44504ea137d70f853b07824f1c1cdbe774ab3ab3522e7098deb
|
|
261
|
-
ruby_llm-contract (0.7.
|
|
261
|
+
ruby_llm-contract (0.7.1)
|
|
262
262
|
ruby_llm-schema (0.3.0) sha256=a591edc5ca1b7f0304f0e2261de61ba4b3bea17be09f5cf7558153adfda3dec6
|
|
263
263
|
ruby_parser (3.22.0) sha256=1eb4937cd9eb220aa2d194e352a24dba90aef00751e24c8dfffdb14000f15d23
|
|
264
264
|
rubycritic (4.12.0) sha256=024fed90fe656fa939f6ea80aab17569699ac3863d0b52fd72cb99892247abc8
|
data/README.md
CHANGED
|
@@ -1,6 +1,21 @@
|
|
|
1
1
|
# ruby_llm-contract
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
**Handle LLM output variance for [ruby_llm](https://github.com/crmne/ruby_llm).** Transport is a solved problem — ruby_llm already retries rate limits, timeouts, and server errors at the Faraday layer. What it can't do: retry when the model returns malformed JSON or a wrong answer, escalate to a smarter model when the cheap one fails the rules, measure variance on your dataset, and gate CI on regressions. That's what this gem adds.
|
|
4
|
+
|
|
5
|
+
## Where the boundary sits
|
|
6
|
+
|
|
7
|
+
| Concern | Handled by |
|
|
8
|
+
|---|---|
|
|
9
|
+
| Rate limits, timeouts, 5xx, connection errors | `ruby_llm` (Faraday retry middleware) |
|
|
10
|
+
| Streaming, tool calls, embeddings, images, transcription | `ruby_llm` |
|
|
11
|
+
| Chat history persistence (`acts_as_chat`) | `ruby_llm` |
|
|
12
|
+
| **Malformed JSON / parse errors** | **`ruby_llm-contract`** |
|
|
13
|
+
| **Business rule violations (invariants, schema)** | **`ruby_llm-contract`** |
|
|
14
|
+
| **Retry with model escalation on bad output** | **`ruby_llm-contract`** |
|
|
15
|
+
| **Measuring output variance on datasets** | **`ruby_llm-contract`** |
|
|
16
|
+
| **Regression detection + CI gates** | **`ruby_llm-contract`** |
|
|
17
|
+
|
|
18
|
+
Put together: `ruby_llm` owns the wire, this gem owns what the model *said*.
|
|
4
19
|
|
|
5
20
|
```
|
|
6
21
|
YOU WRITE THE GEM HANDLES YOU GET
|
|
@@ -115,9 +130,9 @@ RubyLLM::Contract.configure { |c| c.default_model = "gpt-4.1-mini" }
|
|
|
115
130
|
|
|
116
131
|
Works with any ruby_llm provider (OpenAI, Anthropic, Gemini, etc).
|
|
117
132
|
|
|
118
|
-
##
|
|
133
|
+
## Handle output variance with model escalation
|
|
119
134
|
|
|
120
|
-
|
|
135
|
+
Models are non-deterministic. A prompt that works on 95% of inputs can break on the edge case sitting in your production traffic right now. The defensive response is to pick the strongest model and pay for it on every call. The measured response is to define a contract and let the gem escalate only when the cheaper model's output actually fails the rules:
|
|
121
136
|
|
|
122
137
|
```ruby
|
|
123
138
|
retry_policy models: %w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]
|
|
@@ -129,7 +144,58 @@ Attempt 2: gpt-4.1-mini → contract passed ($0.0004)
|
|
|
129
144
|
gpt-4.1 → never called ($0.00)
|
|
130
145
|
```
|
|
131
146
|
|
|
132
|
-
Most requests succeed on the cheapest model.
|
|
147
|
+
Most requests succeed on the cheapest model. The expensive ones fire only when output variance demands it. The cost win is a consequence of measuring variance correctly — not the primary goal. Want to know how often each tier triggers? Run `compare_models` against your dataset.
|
|
148
|
+
|
|
149
|
+
Default retry statuses (since 0.7.0) are `:validation_failed` and `:parse_error` — the two flavors of LLM output variance. Transport errors (rate limits, timeouts, 5xx) are retried by ruby_llm at the HTTP layer and intentionally not duplicated here. If you want `:adapter_error` in retry too, opt in explicitly — it's meaningful paired with an escalation chain.
|
|
150
|
+
|
|
151
|
+
## Soft delivery: retry for variance, ship the last attempt
|
|
152
|
+
|
|
153
|
+
Sometimes validation is a **soft quality check** — "options balanced", "style consistent", "tone friendly" — and a partial output is better than none. The same model generating the same prompt produces different samples run-to-run (OpenAI forces `temperature=1.0` on gpt-5/o-series), so a single unlucky draw shouldn't fail the user. Use `attempts:` to retry on the SAME model — no escalation — and get the last attempt back even if it still failed the contract:
|
|
154
|
+
|
|
155
|
+
```ruby
|
|
156
|
+
class GenerateQuiz < RubyLLM::Contract::Step::Base
|
|
157
|
+
  output_schema do
|
|
158
|
+
    # ... 15 questions × 4 options ...
|
|
159
|
+
  end
|
|
160
|
+
|
|
161
|
+
  validate("answer options balanced") do |out, _|
|
|
162
|
+
    out[:questions].all? do |q|
|
|
163
|
+
      lens = q[:answer_options].map(&:length)
|
|
164
|
+
      next false if lens.empty? || lens.min.zero?
|
|
165
|
+
|
|
166
|
+
      lens.min >= 15 && (lens.max.to_f / lens.min) <= 1.7
|
|
167
|
+
    end
|
|
168
|
+
  end
|
|
169
|
+
|
|
170
|
+
  retry_policy attempts: 3
|
|
171
|
+
end
|
|
172
|
+
|
|
173
|
+
result = GenerateQuiz.run(document)
|
|
174
|
+
if result.ok?
|
|
175
|
+
  publish(result.parsed_output)
|
|
176
|
+
else
|
|
177
|
+
  # Three unlucky draws in a row — ship the last one anyway, log the miss.
|
|
178
|
+
  Rails.logger.warn "Quiz delivered with soft-validation miss: #{result.validation_errors.join('; ')}"
|
|
179
|
+
  publish(result.parsed_output)
|
|
180
|
+
end
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
How this differs from the escalation chain:
|
|
184
|
+
|
|
185
|
+
- `retry_policy models: %w[nano mini full]` — **document hardness.** Retry means "the cheap model isn't enough, use a smarter one."
|
|
186
|
+
- `retry_policy attempts: 3` — **sampling variance.** Retry means "same model, different random seed — the model can do better on a second try."
|
|
187
|
+
|
|
188
|
+
After all `attempts` are exhausted (`attempts: 3` means 3 total tries, not 3 retries on top of the first), the Result carries `status: :validation_failed` plus the last attempt's `parsed_output`. The caller decides: ship anyway, fall back to a template, or surface an error. The gem does not raise on exhaustion — your application policy, your choice.
|
|
189
|
+
|
|
190
|
+
Combine both when helpful:
|
|
191
|
+
|
|
192
|
+
```ruby
|
|
193
|
+
retry_policy do
|
|
194
|
+
  escalate({ model: "gpt-4.1-mini" }, { model: "gpt-4.1-mini" }, { model: "gpt-4.1" })
|
|
195
|
+
end
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
Two tries on mini (variance retry) before paying for full-fat gpt-4.1.
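The retry semantics described in the README additions above (retry only on output-variance statuses, stop at the first acceptable result, hand back the last attempt on exhaustion without raising) can be sketched as a plain loop. This is an illustration under stated assumptions, not the gem's implementation: `AttemptResult`, `SKETCH_RETRY_ON`, and `run_with_retry` are hypothetical names, and the block stands in for a single model call.

```ruby
# Hypothetical stand-ins for illustration only; not the gem's internals.
AttemptResult = Struct.new(:status, :parsed_output, :model, keyword_init: true)

# Mirrors the default retry statuses named in the README text.
SKETCH_RETRY_ON = %i[validation_failed parse_error].freeze

# chain: model names tried in order. Repeating a name means "same model,
# another sample"; a different name means "escalate".
# The block performs one attempt and must return an AttemptResult.
def run_with_retry(chain, retry_on: SKETCH_RETRY_ON)
  raise ArgumentError, "chain must not be empty" if chain.empty?

  last = nil
  chain.each do |model|
    last = yield(model)
    # Stop as soon as the status is not one we retry on.
    break unless retry_on.include?(last.status)
  end
  # On exhaustion the last attempt is returned as-is; nothing is raised.
  last
end

# Escalation chain: the cheapest model fails the contract, the next one passes.
escalated = run_with_retry(%w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]) do |model|
  status = model == "gpt-4.1-nano" ? :validation_failed : :ok
  AttemptResult.new(status: status, parsed_output: { from: model }, model: model)
end
puts "#{escalated.model}: #{escalated.status}"

# Soft delivery: three total tries on one model, all failing validation.
soft = run_with_retry(["gpt-4.1-mini"] * 3) do |model|
  AttemptResult.new(status: :validation_failed, parsed_output: { from: model }, model: model)
end
puts "#{soft.model}: #{soft.status}"
```

In this sketch `attempts: 3` corresponds to a chain of three identical entries, which is also how the combined `escalate` example above expresses two tries on the same model before moving up.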
|
|
133
199
|
|
|
134
200
|
## Know which model to use — with data
|
|
135
201
|
|
|
@@ -284,7 +350,11 @@ Full procedure with examples: **[Optimizing retry_policy](docs/guide/optimizing_
|
|
|
284
350
|
|
|
285
351
|
## Roadmap
|
|
286
352
|
|
|
287
|
-
**v0.
|
|
353
|
+
**v0.7.1 (current):** Follow-up — `Step::Base#run_once` no longer masks adapter-phase `ArgumentError` as `:input_error`. Programmer bugs in adapter code now propagate; DSL misconfiguration still becomes `:input_error` via narrower rescue.
|
|
354
|
+
|
|
355
|
+
**v0.7.0:** Sharpened retry semantics. `DEFAULT_RETRY_ON` now targets LLM output variance only (`:validation_failed`, `:parse_error`); transport errors are delegated to ruby_llm's Faraday retry. `AdapterCaller` narrowed to let programmer errors propagate instead of masking them as retries. Breaking change — see [CHANGELOG](CHANGELOG.md) for migration.
|
|
356
|
+
|
|
357
|
+
**v0.6:** "What should I do?" — `Step.recommend` returns optimal model, reasoning effort, and retry chain. Per-attempt `reasoning_effort` in retry policies.
|
|
288
358
|
|
|
289
359
|
**v0.5:** Prompt A/B testing with `compare_with`. Soft observations with `observe`.
|
|
290
360
|
|
data/lib/ruby_llm/contract/step/base.rb
CHANGED
|
@@ -186,20 +186,36 @@ module RubyLLM
|
|
|
186
186
|
"{ |c| c.default_adapter = ... } or pass context: { adapter: ... }"
|
|
187
187
|
end
|
|
188
188
|
|
|
189
|
+
# ADR-0021 deliverable 2: narrow ArgumentError rescue to DSL-setup phase only.
|
|
190
|
+
#
|
|
191
|
+
# DSL misconfiguration (e.g. `prompt has not been set`, missing required
|
|
192
|
+
# attributes) surfaces as ArgumentError when constructing Runner. We catch
|
|
193
|
+
# that and return :input_error — these are contract-definition issues the
|
|
194
|
+
# caller can handle as "bad input to the step definition".
|
|
195
|
+
#
|
|
196
|
+
# Runner#call itself does NOT get a blanket rescue: input-type validation
|
|
197
|
+
# failures return :input_error from within InputValidator; adapter/runtime
|
|
198
|
+
# programmer bugs (NoMethodError, adapter-code ArgumentError) must propagate
|
|
199
|
+
# instead of being silently masked as :input_error.
|
|
189
200
|
def run_once(input, adapter:, model:, context_temperature: nil, extra_options: {})
|
|
190
201
|
  effective_temp = context_temperature || temperature
|
|
191
|
-
|
|
192
|
-
|
|
193
|
-
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
|
|
197
|
-
|
|
198
|
-
|
|
199
|
-
|
|
200
|
-
|
|
201
|
-
|
|
202
|
-
|
|
202
|
+
  runner =
|
|
203
|
+
    begin
|
|
204
|
+
      Runner.new(
|
|
205
|
+
        input_type: input_type, output_type: output_type,
|
|
206
|
+
        prompt_block: prompt, contract_definition: effective_contract,
|
|
207
|
+
        adapter: adapter, model: model, output_schema: output_schema,
|
|
208
|
+
        max_output: max_output, max_input: max_input, max_cost: max_cost,
|
|
209
|
+
        on_unknown_pricing: on_unknown_pricing,
|
|
210
|
+
        temperature: effective_temp, extra_options: extra_options,
|
|
211
|
+
        observers: class_observers
|
|
212
|
+
      )
|
|
213
|
+
    rescue ArgumentError => e
|
|
214
|
+
      return Result.new(status: :input_error, raw_output: nil, parsed_output: nil,
|
|
215
|
+
                        validation_errors: [e.message])
|
|
216
|
+
    end
|
|
217
|
+
|
|
218
|
+
  runner.call(input)
|
|
203
219
|
end
|
|
204
220
|
|
|
205
221
|
def log_result(result)
|