ruby_llm-contract 0.7.0 → 0.7.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 5decfb7456338baa05d0e6bb79287bb3fb3af0e0cc2c3f001e09122fa76ac298
4
- data.tar.gz: be3b46ff015fd651e885c4c078dbdaf06623865f4e4b80b864b66e8dd9e16c34
3
+ metadata.gz: a165b7d8436d99e564fd5e774c601bae38562c0827c226995f1922e75f9987cf
4
+ data.tar.gz: 4124a3a13caba843448529a55eb1420601cbde70873f5924c792d9ed0c97161b
5
5
  SHA512:
6
- metadata.gz: fd3882613ac2b500c46dc6b1084d8f298db96800bf01932e3bc2638a7a3d2a8588610c1de8a1f3754a8e15fb7b1ef29c0c0a7bddd11709cd95bbf12fcd48e83e
7
- data.tar.gz: 4cb584323a5575de4131b0eb82cdb1426743ca6651030708673d17264c9fec60619bb7d5d54f62eee4c1fa357b4022f57bf08b95d81f153eccf8e625ce3ef5e7
6
+ metadata.gz: e5f17ef5241d9f7ddc047a16608b607d5fefedeff207cc6c3fb969c937192527c5b49c4213b3a3635db4e6aa68784f10f2363a5d99ed8528740b0d4ecebf3790
7
+ data.tar.gz: e58336f998f4df510b707534fd5fecdd87c8e4d04187f6ed9512e2b391dd359e3c9c4cfab106022a9a34ed8dfc27669156f5fc4214320ffe9dec3a0176aab181
data/CHANGELOG.md CHANGED
@@ -1,5 +1,11 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.7.1 (2026-04-22)
4
+
5
+ ### Changed (behavioral, follow-up to v0.7.0)
6
+
7
+ - **`Step::Base#run_once` no longer swallows adapter-phase `ArgumentError` as `:input_error`.** The previous blanket `rescue ArgumentError` was there to convert DSL misconfiguration (e.g. missing `prompt`) into an `:input_error` Result. Side effect: programmer bugs in adapter code that raised `ArgumentError` (wrong arity, bad config argument) were silently coerced into `:input_error` and retried as if the user had given bad input. Now the rescue is narrowed to the Runner-construction phase only: DSL configuration errors still produce `:input_error` (the `prompt has not been set` case is regression-tested), but `ArgumentError` raised from adapter code during `Runner#call` propagates to the caller. Input-type validation failures continue to produce `:input_error` through `InputValidator`'s own scoped rescue, unchanged.
8
+
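The narrowed rescue described in the 0.7.1 entry can be illustrated with a minimal, self-contained sketch. `SketchRunner`, `SketchResult`, and this standalone `run_once` are hypothetical stand-ins written for illustration; they are not the gem's real classes and only mimic the two phases (construction, then call).

```ruby
# Hypothetical stand-ins for illustration only; not the gem's real API.
SketchResult = Struct.new(:status, :validation_errors, keyword_init: true)

class SketchRunner
  def initialize(prompt:, adapter:)
    # DSL misconfiguration surfaces at construction time.
    raise ArgumentError, "prompt has not been set" if prompt.nil?

    @prompt = prompt
    @adapter = adapter
  end

  def call(input)
    @adapter.call(@prompt, input)
  end
end

def run_once(input, prompt:, adapter:)
  runner =
    begin
      SketchRunner.new(prompt: prompt, adapter: adapter)
    rescue ArgumentError => e
      # Only construction-phase errors are converted to :input_error.
      return SketchResult.new(status: :input_error, validation_errors: [e.message])
    end

  # No rescue here: an ArgumentError raised by adapter code propagates.
  runner.call(input)
end

ok_adapter = ->(_prompt, _input) { SketchResult.new(status: :ok, validation_errors: []) }
buggy_adapter = ->(_prompt, _input) { raise ArgumentError, "wrong number of arguments" }

misconfigured = run_once("doc", prompt: nil, adapter: ok_adapter)
puts misconfigured.status

propagated =
  begin
    run_once("doc", prompt: "Summarize", adapter: buggy_adapter)
    false
  rescue ArgumentError
    true
  end
puts propagated
```

The point of splitting construction from the call is that the `rescue` can only see errors raised while building the runner, so a bug inside the adapter is no longer reported as bad input.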
3
9
  ## 0.7.0 (2026-04-21)
4
10
 
5
11
  ### Breaking changes
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- ruby_llm-contract (0.7.0)
4
+ ruby_llm-contract (0.7.1)
5
5
  dry-types (~> 1.7)
6
6
  ruby_llm (~> 1.0)
7
7
  ruby_llm-schema (~> 0.3)
@@ -258,7 +258,7 @@ CHECKSUMS
258
258
  rubocop-ast (1.49.1) sha256=4412f3ee70f6fe4546cc489548e0f6fcf76cafcfa80fa03af67098ffed755035
259
259
  ruby-progressbar (1.13.0) sha256=80fc9c47a9b640d6834e0dc7b3c94c9df37f08cb072b7761e4a71e22cff29b33
260
260
  ruby_llm (1.14.0) sha256=57c6f7034fc4a44504ea137d70f853b07824f1c1cdbe774ab3ab3522e7098deb
261
- ruby_llm-contract (0.7.0)
261
+ ruby_llm-contract (0.7.1)
262
262
  ruby_llm-schema (0.3.0) sha256=a591edc5ca1b7f0304f0e2261de61ba4b3bea17be09f5cf7558153adfda3dec6
263
263
  ruby_parser (3.22.0) sha256=1eb4937cd9eb220aa2d194e352a24dba90aef00751e24c8dfffdb14000f15d23
264
264
  rubycritic (4.12.0) sha256=024fed90fe656fa939f6ea80aab17569699ac3863d0b52fd72cb99892247abc8
data/README.md CHANGED
@@ -1,6 +1,21 @@
1
1
  # ruby_llm-contract
2
2
 
3
- The missing link between LLM cost and quality. Stop choosing between "cheap but wrong" and "accurate but expensive"; get both. Contracts, model escalation, and data-driven recommendations for [ruby_llm](https://github.com/crmne/ruby_llm).
3
+ **Handle LLM output variance for [ruby_llm](https://github.com/crmne/ruby_llm).** Transport is a solved problem: ruby_llm already retries rate limits, timeouts, and server errors at the Faraday layer. What it can't do: retry when the model returns malformed JSON or a wrong answer, escalate to a smarter model when the cheap one fails the rules, measure variance on your dataset, and gate CI on regressions. That's what this gem adds.
4
+
5
+ ## Where the boundary sits
6
+
7
+ | Concern | Handled by |
8
+ |---|---|
9
+ | Rate limits, timeouts, 5xx, connection errors | `ruby_llm` (Faraday retry middleware) |
10
+ | Streaming, tool calls, embeddings, images, transcription | `ruby_llm` |
11
+ | Chat history persistence (`acts_as_chat`) | `ruby_llm` |
12
+ | **Malformed JSON / parse errors** | **`ruby_llm-contract`** |
13
+ | **Business rule violations (invariants, schema)** | **`ruby_llm-contract`** |
14
+ | **Retry with model escalation on bad output** | **`ruby_llm-contract`** |
15
+ | **Measuring output variance on datasets** | **`ruby_llm-contract`** |
16
+ | **Regression detection + CI gates** | **`ruby_llm-contract`** |
17
+
18
+ Put together: `ruby_llm` owns the wire, this gem owns what the model *said*.
4
19
 
5
20
  ```
6
21
  YOU WRITE THE GEM HANDLES YOU GET
@@ -115,9 +130,9 @@ RubyLLM::Contract.configure { |c| c.default_model = "gpt-4.1-mini" }
115
130
 
116
131
  Works with any ruby_llm provider (OpenAI, Anthropic, Gemini, etc).
117
132
 
118
- ## Save money with model escalation
133
+ ## Handle output variance with model escalation
119
134
 
120
- Without a contract, you use gpt-4.1 for everything because you can't tell when a cheaper model gets it wrong. With a contract, you start on nano and only escalate when the answer fails the contract:
135
+ Models are non-deterministic. A prompt that works on 95% of inputs can break on the edge case sitting in your production traffic right now. The defensive response is to pick the strongest model and pay for it on every call. The measured response is to define a contract and let the gem escalate only when the cheaper model's output actually fails the rules:
121
136
 
122
137
  ```ruby
123
138
  retry_policy models: %w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]
@@ -129,7 +144,58 @@ Attempt 2: gpt-4.1-mini → contract passed ($0.0004)
129
144
  gpt-4.1 → never called ($0.00)
130
145
  ```
131
146
 
132
- Most requests succeed on the cheapest model. You pay full price only for the ones that need it. How many? Run `compare_models` and find out.
147
+ Most requests succeed on the cheapest model. The expensive ones fire only when output variance demands it. The cost win is a consequence of measuring variance correctly — not the primary goal. Want to know how often each tier triggers? Run `compare_models` against your dataset.
148
+
149
+ Default retry statuses (since 0.7.0) are `:validation_failed` and `:parse_error` — the two flavors of LLM output variance. Transport errors (rate limits, timeouts, 5xx) are retried by ruby_llm at the HTTP layer and intentionally not duplicated here. If you want `:adapter_error` in retry too, opt in explicitly — it's meaningful paired with an escalation chain.
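As a rough illustration of the escalation and retry-status behavior described above, the following plain-Ruby sketch walks a model chain and moves to the next model only when the status is in the retry list. `run_with_escalation`, `Attempt`, and `SKETCH_RETRY_ON` are hypothetical names invented for this sketch, and the per-model outcomes are canned; the gem's internals may differ.

```ruby
# Hypothetical sketch of an escalation loop; not the gem's implementation.
Attempt = Struct.new(:model, :status, keyword_init: true)

# Mirrors the default retry statuses named in the text above.
SKETCH_RETRY_ON = %i[validation_failed parse_error].freeze

# Tries each model in order; the block returns a status symbol for a model.
# Escalates only while the status is listed in retry_on.
def run_with_escalation(models, retry_on: SKETCH_RETRY_ON)
  trace = []
  models.each do |model|
    status = yield(model)
    trace << Attempt.new(model: model, status: status)
    break unless retry_on.include?(status)
  end
  trace
end

# Canned outcomes: nano fails the contract, mini passes.
outcomes = {
  "gpt-4.1-nano" => :validation_failed,
  "gpt-4.1-mini" => :ok,
  "gpt-4.1" => :ok
}
trace = run_with_escalation(%w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]) { |m| outcomes.fetch(m) }
trace.each { |a| puts "#{a.model}: #{a.status}" }

# :adapter_error is not retried unless explicitly opted in.
default_trace = run_with_escalation(%w[gpt-4.1-nano gpt-4.1-mini]) { |_m| :adapter_error }
opted_in_trace = run_with_escalation(
  %w[gpt-4.1-nano gpt-4.1-mini],
  retry_on: SKETCH_RETRY_ON + [:adapter_error]
) { |_m| :adapter_error }
puts default_trace.size
puts opted_in_trace.size
```

In this sketch the strongest model is never called when an earlier tier passes, which matches the attempt trace shown in the README.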
150
+
151
+ ## Soft delivery: retry for variance, ship the last attempt
152
+
153
+ Sometimes validation is a **soft quality check** — "options balanced", "style consistent", "tone friendly" — and a partial output is better than none. The same model generating the same prompt produces different samples run-to-run (OpenAI forces `temperature=1.0` on gpt-5/o-series), so a single unlucky draw shouldn't fail the user. Use `attempts:` to retry on the SAME model — no escalation — and get the last attempt back even if it still failed the contract:
154
+
155
+ ```ruby
156
+ class GenerateQuiz < RubyLLM::Contract::Step::Base
157
+ output_schema do
158
+ # ... 15 questions × 4 options ...
159
+ end
160
+
161
+ validate("answer options balanced") do |out, _|
162
+ out[:questions].all? do |q|
163
+ lens = q[:answer_options].map(&:length)
164
+ next false if lens.empty? || lens.min.zero?
165
+
166
+ lens.min >= 15 && (lens.max.to_f / lens.min) <= 1.7
167
+ end
168
+ end
169
+
170
+ retry_policy attempts: 3
171
+ end
172
+
173
+ result = GenerateQuiz.run(document)
174
+ if result.ok?
175
+ publish(result.parsed_output)
176
+ else
177
+ # Three unlucky draws in a row — ship the last one anyway, log the miss.
178
+ Rails.logger.warn "Quiz delivered with soft-validation miss: #{result.validation_errors.join('; ')}"
179
+ publish(result.parsed_output)
180
+ end
181
+ ```
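To see what the "answer options balanced" check accepts and rejects, the same predicate can be exercised on its own, outside a Step. `options_balanced?` is a hypothetical helper name used only for this sketch; the thresholds (minimum length 15, max/min ratio at most 1.7) are taken from the example above.

```ruby
# Hypothetical standalone version of the validation predicate shown above.
def options_balanced?(questions)
  questions.all? do |q|
    lens = q[:answer_options].map(&:length)
    # Reject questions with no options or with an empty option.
    next false if lens.empty? || lens.min.zero?

    # Shortest option is at least 15 chars and the longest is at most 1.7x it.
    lens.min >= 15 && (lens.max.to_f / lens.min) <= 1.7
  end
end

balanced   = [{ answer_options: ["a" * 20, "b" * 30] }] # ratio 1.5
lopsided   = [{ answer_options: ["a" * 15, "b" * 30] }] # ratio 2.0
too_short  = [{ answer_options: ["a" * 10, "b" * 12] }] # min below 15
no_options = [{ answer_options: [] }]

puts options_balanced?(balanced)
puts options_balanced?(lopsided)
puts options_balanced?(too_short)
puts options_balanced?(no_options)
```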
182
+
183
+ How this differs from the escalation chain:
184
+
185
+ - `retry_policy models: %w[nano mini full]`: **document hardness.** Retry means "the cheap model isn't enough, use a smarter one."
186
+ - `retry_policy attempts: 3`: **sampling variance.** Retry means "same model, different random seed; the model can do better on a second try."
187
+
188
+ After all `attempts` are exhausted (`attempts: 3` means 3 total tries, not 3 retries on top of the first), the Result carries `status: :validation_failed` plus the last attempt's `parsed_output`. The caller decides: ship anyway, fall back to a template, or surface an error. The gem does not raise on exhaustion — your application policy, your choice.
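The exhaustion semantics can be sketched in plain Ruby: `attempts: 3` means three calls in total, and the last result is returned rather than an exception being raised. `run_with_attempts` and `SoftResult` are hypothetical names for this sketch, not part of the gem.

```ruby
# Hypothetical sketch of same-model retries; not the gem's implementation.
SoftResult = Struct.new(:status, :parsed_output, :validation_errors, keyword_init: true) do
  def ok?
    status == :ok
  end
end

# Calls the block up to `attempts` times in total and returns the first
# passing result, or the last failing one if every attempt fails.
def run_with_attempts(attempts:)
  last = nil
  attempts.times do |index|
    last = yield(index + 1)
    return last if last.ok?
  end
  last
end

# Every draw fails the soft check: the third (last) draft is still returned.
calls = 0
exhausted = run_with_attempts(attempts: 3) do |n|
  calls += 1
  SoftResult.new(status: :validation_failed, parsed_output: { draft: n },
                 validation_errors: ["answer options balanced"])
end
puts calls
puts exhausted.status
puts exhausted.parsed_output[:draft]

# The second draw passes: the loop stops early.
early_calls = 0
passed = run_with_attempts(attempts: 3) do |n|
  early_calls += 1
  status = n == 2 ? :ok : :validation_failed
  SoftResult.new(status: status, parsed_output: { draft: n }, validation_errors: [])
end
puts early_calls
```

Returning the last result instead of raising keeps the ship-or-fallback decision with the caller, as the paragraph above describes.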
189
+
190
+ Combine both when helpful:
191
+
192
+ ```ruby
193
+ retry_policy do
194
+ escalate({ model: "gpt-4.1-mini" }, { model: "gpt-4.1-mini" }, { model: "gpt-4.1" })
195
+ end
196
+ ```
197
+
198
+ Two tries on mini (variance retry) before paying for full-fat gpt-4.1.
133
199
 
134
200
  ## Know which model to use — with data
135
201
 
@@ -284,7 +350,11 @@ Full procedure with examples: **[Optimizing retry_policy](docs/guide/optimizing_
284
350
 
285
351
  ## Roadmap
286
352
 
287
- **v0.6 (current):** "What should I do?" - `Step.recommend` returns optimal model, reasoning effort, and retry chain. Per-attempt `reasoning_effort` in retry policies.
353
+ **v0.7.1 (current):** Follow-up: `Step::Base#run_once` no longer masks adapter-phase `ArgumentError` as `:input_error`. Programmer bugs in adapter code now propagate; DSL misconfiguration still becomes `:input_error` via a narrower rescue.
354
+
355
+ **v0.7.0:** Sharpened retry semantics. `DEFAULT_RETRY_ON` now targets LLM output variance only (`:validation_failed`, `:parse_error`); transport errors are delegated to ruby_llm's Faraday retry. `AdapterCaller` narrowed to let programmer errors propagate instead of masking them as retries. Breaking change — see [CHANGELOG](CHANGELOG.md) for migration.
356
+
357
+ **v0.6:** "What should I do?" — `Step.recommend` returns optimal model, reasoning effort, and retry chain. Per-attempt `reasoning_effort` in retry policies.
288
358
 
289
359
  **v0.5:** Prompt A/B testing with `compare_with`. Soft observations with `observe`.
290
360
 
@@ -186,20 +186,36 @@ module RubyLLM
186
186
  "{ |c| c.default_adapter = ... } or pass context: { adapter: ... }"
187
187
  end
188
188
 
189
+ # ADR-0021 deliverable 2: narrow ArgumentError rescue to DSL-setup phase only.
190
+ #
191
+ # DSL misconfiguration (e.g. `prompt has not been set`, missing required
192
+ # attributes) surfaces as ArgumentError when constructing Runner. We catch
193
+ # that and return :input_error — these are contract-definition issues the
194
+ # caller can handle as "bad input to the step definition".
195
+ #
196
+ # Runner#call itself does NOT get a blanket rescue: input-type validation
197
+ # failures return :input_error from within InputValidator; adapter/runtime
198
+ # programmer bugs (NoMethodError, adapter-code ArgumentError) must propagate
199
+ # instead of being silently masked as :input_error.
189
200
  def run_once(input, adapter:, model:, context_temperature: nil, extra_options: {})
190
201
  effective_temp = context_temperature || temperature
191
- Runner.new(
192
- input_type: input_type, output_type: output_type,
193
- prompt_block: prompt, contract_definition: effective_contract,
194
- adapter: adapter, model: model, output_schema: output_schema,
195
- max_output: max_output, max_input: max_input, max_cost: max_cost,
196
- on_unknown_pricing: on_unknown_pricing,
197
- temperature: effective_temp, extra_options: extra_options,
198
- observers: class_observers
199
- ).call(input)
200
- rescue ArgumentError => e
201
- Result.new(status: :input_error, raw_output: nil, parsed_output: nil,
202
- validation_errors: [e.message])
202
+ runner =
203
+ begin
204
+ Runner.new(
205
+ input_type: input_type, output_type: output_type,
206
+ prompt_block: prompt, contract_definition: effective_contract,
207
+ adapter: adapter, model: model, output_schema: output_schema,
208
+ max_output: max_output, max_input: max_input, max_cost: max_cost,
209
+ on_unknown_pricing: on_unknown_pricing,
210
+ temperature: effective_temp, extra_options: extra_options,
211
+ observers: class_observers
212
+ )
213
+ rescue ArgumentError => e
214
+ return Result.new(status: :input_error, raw_output: nil, parsed_output: nil,
215
+ validation_errors: [e.message])
216
+ end
217
+
218
+ runner.call(input)
203
219
  end
204
220
 
205
221
  def log_result(result)
@@ -2,6 +2,6 @@
2
2
 
3
3
  module RubyLLM
4
4
  module Contract
5
- VERSION = "0.7.0"
5
+ VERSION = "0.7.1"
6
6
  end
7
7
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: ruby_llm-contract
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.7.0
4
+ version: 0.7.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Justyna