RubyGems - ruby_llm-contract - Versions diffs - 0.7.0 → 0.7.1 - Mend

ruby_llm-contract 0.7.0 → 0.7.1

Files changed (7)

checksums.yaml +4 -4
data/CHANGELOG.md +6 -0
data/Gemfile.lock +2 -2
data/README.md +75 -5
data/lib/ruby_llm/contract/step/base.rb +28 -12
data/lib/ruby_llm/contract/version.rb +1 -1
metadata +1 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 5decfb7456338baa05d0e6bb79287bb3fb3af0e0cc2c3f001e09122fa76ac298
-  data.tar.gz: be3b46ff015fd651e885c4c078dbdaf06623865f4e4b80b864b66e8dd9e16c34
+  metadata.gz: a165b7d8436d99e564fd5e774c601bae38562c0827c226995f1922e75f9987cf
+  data.tar.gz: 4124a3a13caba843448529a55eb1420601cbde70873f5924c792d9ed0c97161b
 SHA512:
-  metadata.gz: fd3882613ac2b500c46dc6b1084d8f298db96800bf01932e3bc2638a7a3d2a8588610c1de8a1f3754a8e15fb7b1ef29c0c0a7bddd11709cd95bbf12fcd48e83e
-  data.tar.gz: 4cb584323a5575de4131b0eb82cdb1426743ca6651030708673d17264c9fec60619bb7d5d54f62eee4c1fa357b4022f57bf08b95d81f153eccf8e625ce3ef5e7
+  metadata.gz: e5f17ef5241d9f7ddc047a16608b607d5fefedeff207cc6c3fb969c937192527c5b49c4213b3a3635db4e6aa68784f10f2363a5d99ed8528740b0d4ecebf3790
+  data.tar.gz: e58336f998f4df510b707534fd5fecdd87c8e4d04187f6ed9512e2b391dd359e3c9c4cfab106022a9a34ed8dfc27669156f5fc4214320ffe9dec3a0176aab181
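
The digests in this hunk can be checked locally. What follows is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` archive and `checksums.yaml` has been parsed into a Hash with the same layout as above. `digests_for` and `checksums_match?` are hypothetical helper names, not part of RubyGems.

```ruby
require "digest"

# Hypothetical helper: compute the two digests that checksums.yaml records per file.
def digests_for(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Hypothetical helper: `checksums` mirrors the YAML layout shown in the hunk,
# e.g. { "SHA256" => { "data.tar.gz" => "..." }, "SHA512" => { "data.tar.gz" => "..." } }.
# Returns true only if both digests of `bytes` equal the recorded entries for `name`.
def checksums_match?(checksums, name, bytes)
  digests_for(bytes).all? do |algorithm, hexdigest|
    checksums.dig(algorithm, name) == hexdigest
  end
end
```

In practice `bytes` would come from something like `File.binread("data.tar.gz")`. A missing entry counts as a mismatch because `dig` returns `nil`.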

data/CHANGELOG.md CHANGED

@@ -1,5 +1,11 @@
 # Changelog
+## 0.7.1 (2026-04-22)
+### Changed (behavioral, follow-up to v0.7.0)
+- **`Step::Base#run_once` no longer swallows adapter-phase `ArgumentError` as `:input_error`.** The previous blanket `rescue ArgumentError` was there to convert DSL misconfiguration (e.g. missing `prompt`) into an `:input_error` Result. Side effect: programmer bugs in adapter code that raised `ArgumentError` (wrong arity, bad config argument) were silently coerced into `:input_error` and retried as if the user had given bad input. Now the rescue is narrowed to the Runner-construction phase only — DSL configuration errors still produce `:input_error` (the `prompt has not been set` case is regression-tested), but `ArgumentError` raised from adapter code during `Runner#call` propagates to the caller. Input-type validation failures continue to produce `:input_error` through `InputValidator`'s own scoped rescue, unchanged.
 ## 0.7.0 (2026-04-21)
 ### Breaking changes

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    ruby_llm-contract (0.7.0)
+    ruby_llm-contract (0.7.1)
       dry-types (~> 1.7)
       ruby_llm (~> 1.0)
       ruby_llm-schema (~> 0.3)
@@ -258,7 +258,7 @@ CHECKSUMS
   rubocop-ast (1.49.1) sha256=4412f3ee70f6fe4546cc489548e0f6fcf76cafcfa80fa03af67098ffed755035
   ruby-progressbar (1.13.0) sha256=80fc9c47a9b640d6834e0dc7b3c94c9df37f08cb072b7761e4a71e22cff29b33
   ruby_llm (1.14.0) sha256=57c6f7034fc4a44504ea137d70f853b07824f1c1cdbe774ab3ab3522e7098deb
-  ruby_llm-contract (0.7.0)
+  ruby_llm-contract (0.7.1)
   ruby_llm-schema (0.3.0) sha256=a591edc5ca1b7f0304f0e2261de61ba4b3bea17be09f5cf7558153adfda3dec6
   ruby_parser (3.22.0) sha256=1eb4937cd9eb220aa2d194e352a24dba90aef00751e24c8dfffdb14000f15d23
   rubycritic (4.12.0) sha256=024fed90fe656fa939f6ea80aab17569699ac3863d0b52fd72cb99892247abc8

data/README.md CHANGED

@@ -1,6 +1,21 @@
 # ruby_llm-contract
-The missing link between LLM cost and quality. Stop choosing between "cheap but wrong" and "accurate but expensive" — get both. Contracts, model escalation, and data-driven recommendations for [ruby_llm](https://github.com/crmne/ruby_llm).
+**Handle LLM output variance for [ruby_llm](https://github.com/crmne/ruby_llm).** Transport is a solved problem — ruby_llm already retries rate limits, timeouts, and server errors at the Faraday layer. What it can't do: retry when the model returns malformed JSON or a wrong answer, escalate to a smarter model when the cheap one fails the rules, measure variance on your dataset, and gate CI on regressions. That's what this gem adds.
+## Where the boundary sits
+| Concern | Handled by |
+|---|---|
+| Rate limits, timeouts, 5xx, connection errors | `ruby_llm` (Faraday retry middleware) |
+| Streaming, tool calls, embeddings, images, transcription | `ruby_llm` |
+| Chat history persistence (`acts_as_chat`) | `ruby_llm` |
+| **Malformed JSON / parse errors** | **`ruby_llm-contract`** |
+| **Business rule violations (invariants, schema)** | **`ruby_llm-contract`** |
+| **Retry with model escalation on bad output** | **`ruby_llm-contract`** |
+| **Measuring output variance on datasets** | **`ruby_llm-contract`** |
+| **Regression detection + CI gates** | **`ruby_llm-contract`** |
+Put together: `ruby_llm` owns the wire, this gem owns what the model *said*.
 ```
   YOU WRITE                       THE GEM HANDLES                 YOU GET
@@ -115,9 +130,9 @@ RubyLLM::Contract.configure { |c| c.default_model = "gpt-4.1-mini" }
 Works with any ruby_llm provider (OpenAI, Anthropic, Gemini, etc).
-## Save money with model escalation
+## Handle output variance with model escalation
-Without a contract, you use gpt-4.1 for everything because you can't tell when a cheaper model gets it wrong. With a contract, you start on nano and only escalate when the answer fails the contract:
+Models are non-deterministic. A prompt that works on 95% of inputs can break on the edge case sitting in your production traffic right now. The defensive response is to pick the strongest model and pay for it on every call. The measured response is to define a contract and let the gem escalate only when the cheaper model's output actually fails the rules:
 ```ruby
 retry_policy models: %w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]
@@ -129,7 +144,58 @@ Attempt 2: gpt-4.1-mini  → contract passed  ($0.0004)
            gpt-4.1       → never called      ($0.00)
 ```
-Most requests succeed on the cheapest model. You pay full price only for the ones that need it. How many? Run `compare_models` and find out.
+Most requests succeed on the cheapest model. The expensive ones fire only when output variance demands it. The cost win is a consequence of measuring variance correctly — not the primary goal. Want to know how often each tier triggers? Run `compare_models` against your dataset.
+Default retry statuses (since 0.7.0) are `:validation_failed` and `:parse_error` — the two flavors of LLM output variance. Transport errors (rate limits, timeouts, 5xx) are retried by ruby_llm at the HTTP layer and intentionally not duplicated here. If you want `:adapter_error` in retry too, opt in explicitly — it's meaningful paired with an escalation chain.
+## Soft delivery: retry for variance, ship the last attempt
+Sometimes validation is a **soft quality check** — "options balanced", "style consistent", "tone friendly" — and a partial output is better than none. The same model generating the same prompt produces different samples run-to-run (OpenAI forces `temperature=1.0` on gpt-5/o-series), so a single unlucky draw shouldn't fail the user. Use `attempts:` to retry on the SAME model — no escalation — and get the last attempt back even if it still failed the contract:
+```ruby
+class GenerateQuiz < RubyLLM::Contract::Step::Base
+  output_schema do
+    # ... 15 questions × 4 options ...
+  end
+  validate("answer options balanced") do |out, _|
+    out[:questions].all? do |q|
+      lens = q[:answer_options].map(&:length)
+      next false if lens.empty? || lens.min.zero?
+      lens.min >= 15 && (lens.max.to_f / lens.min) <= 1.7
+    end
+  end
+  retry_policy attempts: 3
+end
+result = GenerateQuiz.run(document)
+if result.ok?
+  publish(result.parsed_output)
+else
+  # Three unlucky draws in a row — ship the last one anyway, log the miss.
+  Rails.logger.warn "Quiz delivered with soft-validation miss: #{result.validation_errors.join('; ')}"
+  publish(result.parsed_output)
+end
+```
+How this differs from the escalation chain:
+- `retry_policy models: %w[nano mini full]` — **document hardness.** Retry means "the cheap model isn't enough, use a smarter one."
+- `retry_policy attempts: 3` — **sampling variance.** Retry means "same model, different random seed — the model can do better on a second try."
+After all `attempts` are exhausted (`attempts: 3` means 3 total tries, not 3 retries on top of the first), the Result carries `status: :validation_failed` plus the last attempt's `parsed_output`. The caller decides: ship anyway, fall back to a template, or surface an error. The gem does not raise on exhaustion — your application policy, your choice.
+Combine both when helpful:
+```ruby
+retry_policy do
+  escalate({ model: "gpt-4.1-mini" }, { model: "gpt-4.1-mini" }, { model: "gpt-4.1" })
+end
+```
+Two tries on mini (variance retry) before paying for full-fat gpt-4.1.
 ## Know which model to use — with data
@@ -284,7 +350,11 @@ Full procedure with examples: **[Optimizing retry_policy](docs/guide/optimizing_
 ## Roadmap
-**v0.6 (current):** "What should I do?" — `Step.recommend` returns optimal model, reasoning effort, and retry chain. Per-attempt `reasoning_effort` in retry policies.
+**v0.7.1 (current):** Follow-up — `Step::Base#run_once` no longer masks adapter-phase `ArgumentError` as `:input_error`. Programmer bugs in adapter code now propagate; DSL misconfiguration still becomes `:input_error` via narrower rescue.
+**v0.7.0:** Sharpened retry semantics. `DEFAULT_RETRY_ON` now targets LLM output variance only (`:validation_failed`, `:parse_error`); transport errors are delegated to ruby_llm's Faraday retry. `AdapterCaller` narrowed to let programmer errors propagate instead of masking them as retries. Breaking change — see [CHANGELOG](CHANGELOG.md) for migration.
+**v0.6:** "What should I do?" — `Step.recommend` returns optimal model, reasoning effort, and retry chain. Per-attempt `reasoning_effort` in retry policies.
 **v0.5:** Prompt A/B testing with `compare_with`. Soft observations with `observe`.
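
The two retry modes described in the README changes above (escalation across models versus repeated attempts on one model) can be illustrated without the gem. This is a sketch under stated assumptions, not the gem's implementation: `SketchAttempt`, `run_chain`, and `same_model_chain` are hypothetical names, and `generate` and `valid` stand in for the model call and the contract check.

```ruby
# Stand-in for one attempt's outcome; the gem's real Result carries more fields.
SketchAttempt = Struct.new(:model, :status, :output, keyword_init: true)

# Hypothetical helper: try each model in order and stop at the first output
# that passes `valid`. If none passes, the last attempt is returned as-is
# with status :validation_failed, so the caller decides what to do with it.
def run_chain(models, generate:, valid:)
  last = nil
  models.each do |model|
    output = generate.call(model)
    status = valid.call(output) ? :ok : :validation_failed
    last = SketchAttempt.new(model: model, status: status, output: output)
    break if status == :ok
  end
  last
end

# Under this sketch, `attempts: 3` on one model is a chain that repeats
# that model three times: three total tries, not one try plus three retries.
def same_model_chain(model, attempts:)
  Array.new(attempts) { model }
end
```

Passing `%w[gpt-4.1-nano gpt-4.1-mini gpt-4.1]` models the escalation chain, while `same_model_chain("gpt-4.1-mini", attempts: 3)` models the soft-delivery case where the last output is handed back even though it failed validation.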

data/lib/ruby_llm/contract/step/base.rb CHANGED

@@ -186,20 +186,36 @@ module RubyLLM
                                             "{ |c| c.default_adapter = ... } or pass context: { adapter: ... }"
           end
+          # ADR-0021 deliverable 2: narrow ArgumentError rescue to DSL-setup phase only.
+          #
+          # DSL misconfiguration (e.g. `prompt has not been set`, missing required
+          # attributes) surfaces as ArgumentError when constructing Runner. We catch
+          # that and return :input_error — these are contract-definition issues the
+          # caller can handle as "bad input to the step definition".
+          #
+          # Runner#call itself does NOT get a blanket rescue: input-type validation
+          # failures return :input_error from within InputValidator; adapter/runtime
+          # programmer bugs (NoMethodError, adapter-code ArgumentError) must propagate
+          # instead of being silently masked as :input_error.
           def run_once(input, adapter:, model:, context_temperature: nil, extra_options: {})
             effective_temp = context_temperature || temperature
-            Runner.new(
-              input_type: input_type, output_type: output_type,
-              prompt_block: prompt, contract_definition: effective_contract,
-              adapter: adapter, model: model, output_schema: output_schema,
-              max_output: max_output, max_input: max_input, max_cost: max_cost,
-              on_unknown_pricing: on_unknown_pricing,
-              temperature: effective_temp, extra_options: extra_options,
-              observers: class_observers
-            ).call(input)
-          rescue ArgumentError => e
-            Result.new(status: :input_error, raw_output: nil, parsed_output: nil,
-                       validation_errors: [e.message])
+            runner =
+              begin
+                Runner.new(
+                  input_type: input_type, output_type: output_type,
+                  prompt_block: prompt, contract_definition: effective_contract,
+                  adapter: adapter, model: model, output_schema: output_schema,
+                  max_output: max_output, max_input: max_input, max_cost: max_cost,
+                  on_unknown_pricing: on_unknown_pricing,
+                  temperature: effective_temp, extra_options: extra_options,
+                  observers: class_observers
+                )
+              rescue ArgumentError => e
+                return Result.new(status: :input_error, raw_output: nil, parsed_output: nil,
+                                  validation_errors: [e.message])
+              end
+            runner.call(input)
           end
           def log_result(result)
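
The effect of the narrowed rescue in the hunk above can be reproduced with stand-ins. This is a sketch only: `SketchResult`, `FakeRunner`, `sketch_run_once`, and the two adapter lambdas are hypothetical and merely mimic the shape of the change. An `ArgumentError` raised while constructing the runner becomes `:input_error`, while one raised during `call` propagates.

```ruby
# Stand-ins only; none of these are the gem's classes.
SketchResult = Struct.new(:status, :validation_errors, keyword_init: true)

class FakeRunner
  def initialize(prompt:, adapter:)
    # Mimics DSL misconfiguration surfacing at construction time.
    raise ArgumentError, "prompt has not been set" if prompt.nil?

    @adapter = adapter
  end

  def call(input)
    @adapter.call(input)
  end
end

# Same structure as the new run_once: the rescue wraps construction only.
def sketch_run_once(input, prompt:, adapter:)
  runner =
    begin
      FakeRunner.new(prompt: prompt, adapter: adapter)
    rescue ArgumentError => e
      return SketchResult.new(status: :input_error, validation_errors: [e.message])
    end
  runner.call(input) # not rescued: adapter bugs reach the caller
end

OK_ADAPTER = ->(_input) { SketchResult.new(status: :ok, validation_errors: []) }
BUGGY_ADAPTER = ->(_input) { raise ArgumentError, "wrong number of arguments" }
```

With `prompt: nil` the sketch returns an `:input_error` result. With `BUGGY_ADAPTER` the `ArgumentError` escapes `sketch_run_once`, which is the behavior the 0.7.1 changelog entry describes for adapter-phase errors.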

data/lib/ruby_llm/contract/version.rb CHANGED

@@ -2,6 +2,6 @@
 module RubyLLM
   module Contract
-    VERSION = "0.7.0"
+    VERSION = "0.7.1"
   end
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ruby_llm-contract
 version: !ruby/object:Gem::Version
-  version: 0.7.0
+  version: 0.7.1
 platform: ruby
 authors:
 - Justyna