ruby-pi 0.1.5 → 0.1.8

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 45a50058497c3e040f81e977ffdbe0030b901ea053e91002e4ddddba06c23a6f
4
- data.tar.gz: 6066bc184d7f8eb951ae137595a8ed082ec41c3abc8ca84cbfe5b4206fde6f66
3
+ metadata.gz: b0054adb6a0863a8f296917be736df0ebfd789aa7205589b82689199d4bf4c06
4
+ data.tar.gz: fc79dcc61dbefce874e609807d989cf2293b0ecb45a6aa036069b11038ac5c9a
5
5
  SHA512:
6
- metadata.gz: d486b3a171dca8c59bf442ba5c03f135ee450a083c47c05f075af6b23d87eb24285c72c734a9579456d71d6ac247ed4357f7bd16b813219e77655a98f44b0cc2
7
- data.tar.gz: 602ed6dc493731203c9fedb8d69f118c81ce292c653ca42ac4ca0d9f8d793dc281697fa87f9df9f7601022812dd791d4420d0f22ba89ad6051119791e87f33ca
6
+ metadata.gz: c130ada9b7ed93f5c9a0d16596c1176fec258204be26af15c61db3c18effee94bc7a8a1783620397780b0e3501e660b4e8ff48d8463e4089067edfcbf3bf9b60
7
+ data.tar.gz: dc179fe40cb063c4321a1c7a1aff5abb7b441d5fd87ced19908f9875c1f5b26bf2e17555b44658f603b32627577fe99feb8f34d061ed7e53eaba3c28cecd8bbb
data/CHANGELOG.md CHANGED
@@ -5,6 +5,57 @@ All notable changes to this project will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [0.1.8] - 2026-06-09
9
+
10
+ ### Fixed (adversarial review round 6)
11
+
12
+ - **`Retry-After` header was parsed but never honored (High)**: On a 429, `handle_error_response` stored the server's `Retry-After` on `RateLimitError#retry_after`, but the retry loop in `BaseProvider#complete` always slept the local exponential backoff (capped at `retry_max_delay`) — hammering a server that asked for a longer cooldown until the retry budget burned out. The retry delay now prefers a positive `retry_after` (capped at `RETRY_AFTER_CEILING`, 60s); HTTP-date values (which parse to `0.0`) and absent headers fall through to the computed backoff
13
+ - **Parallel executor timeout/rejection Results reported `name: "unknown"` (High)**: `execute_parallel` hardcoded `"unknown"` in the timeout and rejected-future branches, so with several tools timing out concurrently, logs and `:tool_execution_end` subscribers could not tell which tool hung. Futures are now zipped with their originating calls and failure Results carry the real tool name; the timeout message matches sequential mode (`Tool 'x' timed out after Ns`). The rejected-future branch also no longer reports a misleading full-timeout `duration_ms` for what may have been an instant failure (now `0.0`)
14
+ - **Keyword-parameter tool blocks failed on every call (High)**: `Definition#call` passed a single positional Hash, so a block written `{ |content:, platform:| ... }` — the natural style given named schema parameters — raised `ArgumentError: missing keyword` on every invocation (surfacing as a confusing failed Result). `Definition` now detects keyword parameters at construction and splats the arguments hash to keywords; positional-Hash blocks are unchanged. Keyword blocks without `**rest` raise on unexpected keys — strict by design, since the keys come from the LLM
15
+ - **`:compaction` event was never emitted in production (Medium)**: `Compaction#emitter` defaults to nil and nothing ever assigned it, so the documented `agent.on(:compaction)` subscription silently never fired — the only place the emitter was set was the spec itself. `Loop#initialize` now wires its emitter into the compaction strategy (an explicitly preassigned emitter is left untouched)
16
+ - **Streaming chunks were never normalized to UTF-8 (Medium)**: Faraday delivers `on_data` chunks as ASCII-8BIT; appending a chunk to a UTF-8 SSE buffer already holding non-ASCII text raises `Encoding::CompatibilityError`, and yielded deltas could carry binary encoding into consumers' UTF-8 buffers. All three providers now buffer in BINARY and re-encode each complete SSE line to UTF-8 (with `scrub` guarding invalid bytes) before parsing, so `:text_delta` events are always valid UTF-8 — including multi-byte characters split across network chunks
17
+ - **Streaming fallback gave consumers no way to truncate partial primary output (Medium)**: If the primary streamed text and then died mid-stream, the fallback streamed a complete fresh response — a delta-appending consumer rendered `"<partial primary><full fallback>"` with no signal of how much to discard. The `:fallback_start` payload now includes `partial_output` (Boolean) and `partial_chars` (characters already yielded), so consumers can deterministically reset
18
+ - **Tool names were not validated against provider constraints (Medium)**: A tool named `send.email` registered fine and then 400'd on every Anthropic request with an opaque server error. `Definition` now validates names against `/\A[a-zA-Z0-9_-]{1,64}\z/` (the strictest provider constraint) and raises `ArgumentError` at definition time with a pointed message
19
+ - **`json` was used everywhere but never declared or required (Medium)**: `JSON.parse`/`JSON.generate` are called throughout the providers and agent loop, but the gem relied on Faraday's transitive `json` dependency and the entry point's single `require "json"` — loading `agent/loop.rb` in isolation raised `NameError`, contradicting the composability principle. The gemspec now declares `json >= 2.0` and every file referencing `JSON` requires it directly (pinned by a source-scan spec)
20
+ - **Configuration accepted negative retry/timeout values (Low)**: `max_retries = -1` silently disabled retries and a negative delay raised deep inside the retry loop's `sleep`. The numeric settings now have validated writers that raise `ArgumentError` at assignment time
21
+ - **Global configuration first-access race (Low)**: `@configuration ||= Configuration.new` was unsynchronized; two threads racing the first call could each construct a Configuration with one silently discarded. The configuration is now eagerly initialized at require time
22
+ - **`continue()` Result accounting documented (Docs)**: Each `run`/`continue` builds a fresh Loop, so the returned Result's `usage`/`tool_calls_made`/`turns` cover only that invocation while `messages` is cumulative — an undocumented asymmetry, now documented on `Core#continue`
23
+ - **Schema DSL documented as LLM-facing hints, not validation (Docs)**: Nothing validates model-supplied arguments against `tool.parameters` before invoking the block — `required`/`enum`/`minimum` constrain what the model is asked to produce, with no runtime enforcement or type coercion. This is deliberate (anti-framework), but the schema header now says so loudly and directs tool blocks to treat arguments as untrusted input
24
+ - **`State#add_message` unbounded growth documented (Docs)**: Long-lived agents calling `continue()` repeatedly accumulate messages linearly without compaction configured; documented on the method
25
+ - **CLAUDE.md module map corrected (Docs)**: The map referenced a nonexistent `agent/agent.rb`, omitted `core.rb`/`loop.rb`/`state.rb`/`events.rb`, hardcoded version `0.1.0`, and the extension example used the one-arg `|event|` block signature instead of the actual `|data, agent|`. All corrected
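
The `Retry-After` entry above describes a delay-selection rule that is easy to get subtly wrong, so a minimal sketch may help. It is not the gem's code: `retry_delay`, its keyword arguments, and the doubling schedule without jitter are assumptions made for illustration. Only the 60-second ceiling and the "positive value wins, otherwise fall through" rule come from the changelog.

```ruby
# Hypothetical helper illustrating the selection rule from the changelog.
# The method name, arguments, and backoff formula are assumptions.
RETRY_AFTER_CEILING = 60.0 # seconds, as stated in the changelog entry

def retry_delay(attempt:, retry_after:, base_delay: 1.0, max_delay: 30.0)
  # A positive server-provided value wins, capped at the ceiling.
  if retry_after.is_a?(Numeric) && retry_after.positive?
    return [retry_after.to_f, RETRY_AFTER_CEILING].min
  end

  # Absent or zero values (an HTTP-date parses to 0.0 per the changelog)
  # fall through to capped exponential backoff. Jitter is omitted here.
  [base_delay * (2**attempt), max_delay].min
end

honored  = retry_delay(attempt: 0, retry_after: 45)
capped   = retry_delay(attempt: 0, retry_after: 300)
fallback = retry_delay(attempt: 2, retry_after: 0.0)
```

Capping the server's value keeps a misconfigured or hostile endpoint from stalling a caller indefinitely, while still respecting any cooldown shorter than the ceiling.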
26
+
27
+ ### Release-history note
28
+
29
+ - **`[0.1.4]` below was never actually released**: `lib/ruby_pi/version.rb` went from `0.1.3` directly to `0.1.5` — the round-2 fixes documented under 0.1.4 shipped without a version bump and were first published as part of 0.1.5. There is intentionally no `v0.1.4` git tag or gem. (Discovered during round 6; the entry is kept for historical accuracy of *what* changed.)
30
+
31
+ ## [0.1.7] - 2026-05-28
32
+
33
+ ### Fixed (adversarial review round 5)
34
+
35
+ - **Compaction produced an Anthropic-invalid leading `:assistant` message (Critical)**: The 0.1.6 orphan-`:tool` strip fixed tool-result splitting but left the summary-role logic (`first_preserved == :assistant ? :user : :assistant`) intact. Whenever the first preserved message was `:user` (multi-turn reuse) or the preserved window emptied out (all tool results), the summary became an `:assistant` message at the head of the conversation — which Anthropic rejects with HTTP 400 "first message must use the 'user' role". The summary is now **always** a `:user` message (valid as the first message and never overwriting the system prompt). When the first preserved message is itself `:user`, the summary is merged into it to avoid consecutive same-role messages; an empty preserved window yields a lone `:user` summary. Extracted into `Compaction#build_compacted_history`
36
+ - **Compaction dead "mirror case" branch removed (Minor)**: The 0.1.6 `if droppable.last … && preserved.first[:role] == :tool` block was unreachable — the preceding `while` loop guarantees `preserved.first` is never `:tool`. Removed it (the originating assistant message is already in droppable alongside its now-moved tool results, so the pair is never split), eliminating misleading dead code
37
+ - **Deterministic `ProviderError` was retried with backoff (Minor)**: 0.1.6 added `RubyPi::ProviderError` to the retryable set in `BaseProvider#complete`, but provider errors are overwhelmingly deterministic request-construction failures (missing `tool_call_id`, invalid tool-argument JSON) raised before any HTTP call — retrying only burned the backoff schedule before re-raising the identical error. `ProviderError` is no longer retried. Fallback failover is unaffected (it rescues the `RubyPi::Error` superclass)
38
+ - **Lifecycle hooks saw string-keyed tool arguments while events saw symbols (Minor)**: `before_tool_call`/`after_tool_call` received the raw `ToolCall` (string-keyed `arguments`) while the `:tool_execution_start` event and `tool_calls_made` carried symbol keys — so a hook and an event subscriber disagreed on the key type for the same call. `Loop#act` now rebuilds each `ToolCall` with symbol-keyed arguments up front, so hooks, events, `tool_calls_made`, and the tool block all observe the identical shape
39
+ - **Anthropic streaming `finish_reason` could be clobbered to nil (Minor)**: A trailing `message_delta` event without a `stop_reason` overwrote the previously captured value, yielding a `Response` with no `finish_reason`. The assignment is now guarded (`finish_reason = delta["stop_reason"] if delta["stop_reason"]`), matching the OpenAI/Gemini guards
40
+ - **Gemini `finishReason` assumed a String (Minor)**: `finishReason.downcase` would raise `NoMethodError` on a non-String payload mid-stream. Both the streaming and standard paths now coerce via `to_s` before `downcase`, and remain consistent with each other
41
+ - **Dead streamed-content accumulator removed (Cleanup)**: `Loop#think` accumulated `streamed_content` that was never read (the recorded assistant message uses `Response#content`); the `.clear` on `:fallback_start` was a no-op and its comment was inaccurate. Removed the local; the `:provider_fallback` event still fires
42
+ - **`Fallback` class docstring corrected (Docs)**: The class-level docstring still described the removed happy-path buffering ("the Fallback now buffers deltas… buffered deltas are discarded"), contradicting the real-time direct-streaming implementation. Updated to describe direct streaming plus the `:fallback_start` signal
43
+
44
+ ### Investigated, no change
45
+
46
+ - **Streaming HTTP error bodies via `env.status`**: A prior review raised that streaming error responses might lose their body if Faraday's `on_data` callback received a nil `env.status`. Verified against the actual stack (faraday 2.14.1 / faraday-net_http 3.3.0): the net_http adapter calls `save_http_response` (which sets `env.status`) before `response.read_body` streams chunks, and `Env#stream_response` passes that same populated `env` to the user's `on_data` proc. `env.status` is therefore reliably available before the first chunk, so the existing `error_body` recovery works. No fix needed
47
+
48
+ ## [0.1.6] - 2026-05-01
49
+
50
+ ### Fixed (adversarial review round 4)
51
+
52
+ - **Faraday transport errors leaked untyped, bypassed retry (Critical)**: `BaseProvider#complete` rescued only `RubyPi::*` errors, but providers never wrapped Faraday network exceptions. A `Faraday::TimeoutError`, `Faraday::ConnectionFailed`, or `Faraday::SSLError` propagated as the raw Faraday class — breaking the documented error hierarchy and skipping the retry loop entirely (the exact case retries exist for). Added `BaseProvider#with_transport_errors` which translates `Faraday::TimeoutError` → `RubyPi::TimeoutError` and `Faraday::ConnectionFailed`/`SSLError`/other `Faraday::Error` → `RubyPi::ApiError`. Wrapped every `conn.post` call in all three providers (standard and streaming paths). `RubyPi::ProviderError` is now also retryable
53
+ - **Gemini multi-turn tool use was broken (Critical)**: `Gemini#format_message` rendered assistant messages as text-only and silently dropped the `:tool_calls` field set by the agent loop. The next turn's `functionResponse` had no preceding `functionCall` to bind to, so Gemini rejected any conversation that included a tool call followed by a tool result. Assistant messages now emit one `functionCall` part per tool call (mirroring Anthropic's `tool_use` and OpenAI's `tool_calls` behavior). Empty text parts are also no longer emitted on tool-only assistant turns
54
+ - **Compaction split tool_use/tool_result pairs (Critical)**: When `preserve_last_n` cut between an assistant `tool_calls` message (in droppable) and its matching `:tool` result (in preserved), Anthropic and OpenAI rejected the conversation with "tool_result without preceding tool_use". Compaction now strips orphan `:tool` messages from the head of preserved (moves them into droppable so they're summarized away). The mirror case, where preserved starts with a tool result whose assistant is the last droppable message, is also handled
55
+ - **`Tools::Executor` swallowed non-StandardError exceptions as nil success (Major)**: The worker thread rescued only `StandardError`. A tool block raising `Interrupt`, `SystemExit`, or any other `Exception` subclass left both `value` and `error` nil; the join then reported a *successful* `nil` result. Now rescues `Exception`, captures it as a failed `Result`. Worker thread also sets `report_on_exception = false` to avoid stderr spam
56
+ - **Gemini tool_call IDs collided across turns (Major)**: IDs were generated as `"gemini_#{accumulated_tool_calls.length}"` — every response restarted numbering at 0, so a multi-turn conversation produced multiple tool calls all named `"gemini_0"`. Any caller using ID as a hash key (observability, result correlation) saw collisions. IDs now use `SecureRandom.hex(8)` for global uniqueness across both standard and streaming responses
57
+ - **OpenAI passed malformed tool_call.arguments JSON verbatim (Minor)**: A non-JSON string in `tool_call.arguments` on an assistant message was forwarded unchanged to OpenAI, producing an opaque HTTP 400. Now validated up-front with `JSON.parse`; malformed input raises a typed `RubyPi::ProviderError` with the tool name and parse error before sending the request, matching Anthropic's input validation
58
+
8
59
  ## [0.1.5] - 2026-04-30
9
60
 
10
61
  ### Fixed (adversarial review round 3)
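
The 0.1.8 changelog entry on keyword-parameter tool blocks says the block's signature is inspected once and the arguments hash is splatted when keywords are present. One way this could look, using Ruby's `Proc#parameters`, is sketched below. `keyword_block?` and `invoke_tool` are hypothetical names; the gem's `Definition` class is not reproduced here.

```ruby
# Hypothetical stand-ins for the dispatch rule described in the changelog.
def keyword_block?(block)
  # Proc#parameters reports :key, :keyreq, or :keyrest for keyword params.
  block.parameters.any? { |type, _name| %i[key keyreq keyrest].include?(type) }
end

def invoke_tool(block, arguments)
  if keyword_block?(block)
    block.call(**arguments) # requires symbol-keyed arguments
  else
    block.call(arguments)   # positional-Hash style, unchanged
  end
end

keyword_tool = proc { |content:, platform:| "#{platform}: #{content}" }
hash_tool    = proc { |args| "#{args[:platform]}: #{args[:content]}" }
arguments    = { content: "hi", platform: "web" }

keyword_result = invoke_tool(keyword_tool, arguments)
hash_result    = invoke_tool(hash_tool, arguments)
```

Both block styles receive the same data. In the gem the check is done at construction, so it would run once per tool rather than once per call.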
@@ -140,12 +140,18 @@ module RubyPi
140
140
  # the existing conversation history and appends the new prompt before
141
141
  # resuming the loop.
142
142
  #
143
+ # NOTE on Result accounting: each run/continue builds a fresh Loop, so
144
+ # the returned Result's `usage`, `tool_calls_made`, and `turns` cover
145
+ # ONLY this invocation — while `messages` is cumulative across the whole
146
+ # conversation. Sum the per-call Results if you need session totals.
147
+ #
143
148
  # Issue #16: Uses the encapsulated reset_iteration! method instead of
144
149
  # the old approach that bypassed encapsulation
145
150
  # and was fragile.
146
151
  #
147
152
  # @param prompt [String] the follow-up user message
148
153
  # @return [RubyPi::Agent::Result] the outcome of the continued run
154
+ # (usage/tool_calls_made/turns are per-invocation; messages cumulative)
149
155
  def continue(prompt)
150
156
  @state.reset_iteration!
151
157
  @state.add_message(role: :user, content: prompt)
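
The accounting note in this hunk suggests summing per-call Results for session totals. A minimal sketch of that bookkeeping follows, using a hypothetical `Result` struct that models only the four fields named in the note. Treating `usage` as a Hash of token counts is an assumption.

```ruby
# Hypothetical stand-in for the Result returned by run/continue.
# The shape of `usage` (a Hash of token counts) is an assumption.
Result = Struct.new(:usage, :tool_calls_made, :turns, :messages, keyword_init: true)

def session_totals(results)
  {
    # Per-invocation fields are summed or concatenated across calls.
    usage: results.map(&:usage).reduce({}) { |acc, u| acc.merge(u) { |_key, a, b| a + b } },
    tool_calls_made: results.flat_map(&:tool_calls_made),
    turns: results.sum(&:turns),
    # messages is already cumulative, so only the latest Result is used.
    messages: results.last&.messages || []
  }
end

first = Result.new(
  usage: { input_tokens: 10, output_tokens: 5 },
  tool_calls_made: [{ tool_name: "search" }],
  turns: 2,
  messages: %w[m1 m2]
)
second = Result.new(
  usage: { input_tokens: 7, output_tokens: 3 },
  tool_calls_made: [],
  turns: 1,
  messages: %w[m1 m2 m3]
)

totals = session_totals([first, second])
```

Summing `messages` the same way as the other fields would double-count history, which is the asymmetry the note warns about.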
@@ -10,6 +10,8 @@
10
10
  # is reached. It handles streaming, lifecycle events, compaction, and all
11
11
  # pre/post tool call hooks.
12
12
 
13
+ require "json"
14
+
13
15
  module RubyPi
14
16
  module Agent
15
17
  # Executes the think-act-observe cycle against a given State, emitting
@@ -55,6 +57,14 @@ module RubyPi
55
57
  @state = state
56
58
  @emitter = emitter
57
59
  @compaction = compaction
60
+ # Wire the loop's emitter into the compaction strategy so the
61
+ # documented :compaction event actually reaches agent subscribers.
62
+ # Compaction#emitter defaults to nil and nothing else ever sets it —
63
+ # without this, `agent.on(:compaction)` never fires. An emitter that
64
+ # was already assigned explicitly is left untouched.
65
+ if @compaction.respond_to?(:emitter=) && @compaction.respond_to?(:emitter) && @compaction.emitter.nil?
66
+ @compaction.emitter = emitter
67
+ end
58
68
  @execution_mode = execution_mode
59
69
  @tool_timeout = tool_timeout
60
70
  @tool_calls_made = []
@@ -145,17 +155,16 @@ module RubyPi
145
155
  # Build tools array for the LLM
146
156
  tools = build_tools_array
147
157
 
148
- # Accumulate streamed content
149
- streamed_content = +""
150
-
151
- # Call the LLM with streaming
158
+ # Call the LLM with streaming. The recorded assistant message uses
159
+ # the returned Response#content (already the final, authoritative
160
+ # text), so there is no need to accumulate deltas here — we only
161
+ # re-emit them for subscribers.
152
162
  response = @state.model.complete(
153
163
  messages: messages,
154
164
  tools: tools,
155
165
  stream: true
156
166
  ) do |event|
157
167
  if event.text_delta?
158
- streamed_content << event.data.to_s
159
168
  @emitter.emit(:text_delta, content: event.data)
160
169
  elsif event.tool_call_delta?
161
170
  # Emit tool call delta events so subscribers can observe partial
@@ -164,12 +173,11 @@ module RubyPi
164
173
  @emitter.emit(:tool_call_delta, data: event.data)
165
174
  elsif event.fallback_start?
166
175
  # The primary LLM provider failed mid-stream and a Fallback
167
- # provider is now taking over. Discard the partial text we
168
- # accumulated from the failed primary so the agent's recorded
169
- # response reflects only the fallback's output, and surface a
170
- # :provider_fallback event so subscribers can clear any UI
171
- # state they rendered from the discarded primary deltas.
172
- streamed_content.clear
176
+ # provider is now taking over. Surface a :provider_fallback event
177
+ # so subscribers can clear any UI state they rendered from the
178
+ # discarded primary deltas. The recorded response is unaffected:
179
+ # it comes from the fallback provider's returned Response#content,
180
+ # never from the failed primary's partial text.
173
181
  @emitter.emit(:provider_fallback, **event.data)
174
182
  end
175
183
  end
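
The comment above leaves it to subscribers to clear UI state on `:provider_fallback`. A sketch of a delta-appending consumer that does so is shown below. `TranscriptBuffer` and its method names are hypothetical. The `content:` key matches the `:text_delta` emit in this hunk, and `partial_output` / `partial_chars` are the payload fields named in the 0.1.8 changelog. That they arrive unchanged on `:provider_fallback` is an assumption based on the `**event.data` splat.

```ruby
# Hypothetical consumer that appends deltas and rewinds on fallback.
class TranscriptBuffer
  attr_reader :text

  def initialize
    @text = +""
  end

  # Intended to be wired as a :text_delta handler.
  def on_text_delta(data)
    @text << data[:content].to_s
  end

  # Intended to be wired as a :provider_fallback handler. Drops the
  # characters the failed primary had already streamed.
  def on_provider_fallback(data)
    return unless data[:partial_output]

    keep = [@text.length - data[:partial_chars].to_i, 0].max
    @text = @text[0, keep]
  end
end

buffer = TranscriptBuffer.new
buffer.on_text_delta(content: "Earlier turn. ")
buffer.on_text_delta(content: "Hel")
buffer.on_text_delta(content: "lo")
buffer.on_provider_fallback(partial_output: true, partial_chars: 5)
buffer.on_text_delta(content: "Hi there")
```

This sketch assumes `partial_chars` counts only the characters streamed during the failed attempt, so earlier turns in the same buffer survive the rewind.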
@@ -208,32 +216,39 @@ module RubyPi
208
216
  timeout: @tool_timeout
209
217
  )
210
218
 
211
- # Symbolize the JSON-parsed (string-keyed) tool_call arguments once,
212
- # up front. Both the executor (which actually invokes the tool block)
213
- # and the recorded `tool_calls_made` payload use this symbol-keyed
214
- # form, keeping a single consistent shape across the pipeline rather
215
- # than mixing string keys (raw from JSON) and symbol keys (post-
216
- # symbolize) in different places.
217
- symbolized = response.tool_calls.map do |tc|
218
- RubyPi::Tools::Executor.deep_symbolize_keys(tc.arguments)
219
+ # Normalize each tool call's arguments to symbol keys ONCE, up front,
220
+ # by rebuilding the ToolCall objects. Every downstream consumer, namely the
221
+ # executor (which invokes the tool block), the before/after_tool_call
222
+ # hooks (which receive the ToolCall directly), the emitted
223
+ # :tool_execution_start event, and the recorded `tool_calls_made`
224
+ # payload, then observes the identical symbol-keyed shape. Carrying
225
+ # the symbolized form on the ToolCall itself (rather than in a side
226
+ # array) is what keeps the hooks consistent with everything else;
227
+ # previously hooks saw raw string keys while events/records saw symbols.
228
+ tool_calls = response.tool_calls.map do |tc|
229
+ RubyPi::LLM::ToolCall.new(
230
+ id: tc.id,
231
+ name: tc.name,
232
+ arguments: RubyPi::Tools::Executor.deep_symbolize_keys(tc.arguments)
233
+ )
219
234
  end
220
235
 
221
236
  # Prepare call hashes for the executor
222
- calls = response.tool_calls.each_with_index.map do |tc, idx|
223
- { name: tc.name, arguments: symbolized[idx] }
237
+ calls = tool_calls.map do |tc|
238
+ { name: tc.name, arguments: tc.arguments }
224
239
  end
225
240
 
226
241
  # Fire before_tool_call hooks and emit start events
227
- response.tool_calls.each_with_index do |tc, idx|
242
+ tool_calls.each do |tc|
228
243
  @state.before_tool_call&.call(tc)
229
- @emitter.emit(:tool_execution_start, tool_name: tc.name, arguments: symbolized[idx])
244
+ @emitter.emit(:tool_execution_start, tool_name: tc.name, arguments: tc.arguments)
230
245
  end
231
246
 
232
247
  # Execute all tool calls
233
248
  results = executor.execute(calls)
234
249
 
235
250
  # Fire after_tool_call hooks, emit end events, and add results to messages
236
- response.tool_calls.each_with_index do |tc, idx|
251
+ tool_calls.each_with_index do |tc, idx|
237
252
  result = results[idx]
238
253
 
239
254
  @state.after_tool_call&.call(tc, result)
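
The hunk above relies on `RubyPi::Tools::Executor.deep_symbolize_keys` to turn JSON-parsed, string-keyed arguments into symbol keys. The gem's implementation is not shown in this diff, so the following is only an illustrative re-implementation of the general technique, applied to a made-up argument payload.

```ruby
require "json"

# Illustrative re-implementation; the gem's own method may differ in detail.
def deep_symbolize_keys(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, inner), out|
      # Convert keys that can become symbols; leave other keys untouched.
      out[key.respond_to?(:to_sym) ? key.to_sym : key] = deep_symbolize_keys(inner)
    end
  when Array
    value.map { |item| deep_symbolize_keys(item) }
  else
    value
  end
end

# JSON.parse yields string keys at every nesting level.
raw = JSON.parse('{"to": "a@example.com", "options": {"cc": ["b@example.com"], "retries": 2}}')
symbolized = deep_symbolize_keys(raw)
```

Doing this once and carrying the result on the rebuilt `ToolCall` is what lets hooks, events, and `tool_calls_made` agree on key type.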
@@ -247,7 +262,7 @@ module RubyPi
247
262
  # arguments so callers see the same shape the tool itself received.
248
263
  @tool_calls_made << {
249
264
  tool_name: tc.name,
250
- arguments: symbolized[idx],
265
+ arguments: tc.arguments,
251
266
  result: result.to_h
252
267
  }
253
268
 
@@ -91,6 +91,12 @@ module RubyPi
91
91
 
92
92
  # Appends a message to the conversation history.
93
93
  #
94
+ # NOTE: history grows without bound — there is no built-in cap. Growth
95
+ # per run is limited by max_iterations, but long-lived agents that call
96
+ # continue() repeatedly (or use a high max_iterations with large tool
97
+ # outputs) accumulate messages linearly. Configure
98
+ # Agent.new(compaction: ...) to keep the context bounded.
99
+ #
94
100
  # @param role [Symbol, String] the message role (:user, :assistant, :system, :tool)
95
101
  # @param content [String, nil] the text content of the message
96
102
  # @param options [Hash] additional fields (e.g., :tool_call_id, :tool_calls)
@@ -37,19 +37,53 @@ module RubyPi
37
37
  attr_accessor :openai_api_key
38
38
 
39
39
  # @return [Integer] Maximum number of retry attempts for transient errors (default: 3)
40
- attr_accessor :max_retries
40
+ attr_reader :max_retries
41
41
 
42
42
  # @return [Float] Base delay in seconds for exponential backoff (default: 1.0)
43
- attr_accessor :retry_base_delay
43
+ attr_reader :retry_base_delay
44
44
 
45
45
  # @return [Float] Maximum delay in seconds between retries (default: 30.0)
46
- attr_accessor :retry_max_delay
46
+ attr_reader :retry_max_delay
47
47
 
48
48
  # @return [Integer] HTTP request timeout in seconds (default: 120)
49
- attr_accessor :request_timeout
49
+ attr_reader :request_timeout
50
50
 
51
51
  # @return [Integer] HTTP connection open timeout in seconds (default: 10)
52
- attr_accessor :open_timeout
52
+ attr_reader :open_timeout
53
+
54
+ # Validated writers for numeric settings. A negative max_retries silently
55
+ # disables retries and a negative delay raises deep inside the retry
56
+ # loop's sleep — fail fast at assignment time instead, where the typo is.
57
+
58
+ # @param value [Integer] must be a non-negative integer
59
+ def max_retries=(value)
60
+ validate_numeric!(:max_retries, value)
61
+ @max_retries = value
62
+ end
63
+
64
+ # @param value [Numeric] must be non-negative
65
+ def retry_base_delay=(value)
66
+ validate_numeric!(:retry_base_delay, value)
67
+ @retry_base_delay = value
68
+ end
69
+
70
+ # @param value [Numeric] must be non-negative
71
+ def retry_max_delay=(value)
72
+ validate_numeric!(:retry_max_delay, value)
73
+ @retry_max_delay = value
74
+ end
75
+
76
+ # @param value [Numeric] must be non-negative
77
+ def request_timeout=(value)
78
+ validate_numeric!(:request_timeout, value)
79
+ @request_timeout = value
80
+ end
81
+
82
+ # @param value [Numeric] must be non-negative
83
+ def open_timeout=(value)
84
+ validate_numeric!(:open_timeout, value)
85
+ @open_timeout = value
86
+ end
53
87
 
54
88
  # @return [String] Default model name for Gemini provider
55
89
  attr_accessor :default_gemini_model
@@ -78,6 +112,17 @@ module RubyPi
78
112
 
79
113
  private
80
114
 
115
+ # Raises unless the value is a non-negative Numeric.
116
+ #
117
+ # @param name [Symbol] the setting name (for the error message)
118
+ # @param value [Object] the value being assigned
119
+ # @raise [ArgumentError] if value is not a Numeric or is negative
120
+ def validate_numeric!(name, value)
121
+ return if value.is_a?(Numeric) && value >= 0
122
+
123
+ raise ArgumentError, "#{name} must be a non-negative number, got #{value.inspect}"
124
+ end
125
+
81
126
  # Sets all configuration ivars to their default values. Called by both
82
127
  # initialize and reset! to ensure consistent defaults without the
83
128
  # anti-pattern of calling initialize from reset!.
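
To see the fail-fast behaviour of the validated writers in isolation, here is a minimal stand-in that copies the pattern from the two hunks above. The class name `RetrySettings` is hypothetical. The writer and the `validate_numeric!` body mirror the diff.

```ruby
# Minimal stand-in mirroring the validated-writer pattern in the diff.
class RetrySettings
  attr_reader :max_retries

  def max_retries=(value)
    validate_numeric!(:max_retries, value)
    @max_retries = value
  end

  private

  # Raises unless the value is a non-negative Numeric.
  def validate_numeric!(name, value)
    return if value.is_a?(Numeric) && value >= 0

    raise ArgumentError, "#{name} must be a non-negative number, got #{value.inspect}"
  end
end

settings = RetrySettings.new
settings.max_retries = 5

# A negative value is rejected at assignment and the old value is kept.
rejected = begin
  settings.max_retries = -1
  nil
rescue ArgumentError => e
  e
end
```

Note that the check accepts any non-negative `Numeric`, so `settings.max_retries = 2.5` would pass even though the writer's doc comment asks for an Integer.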
@@ -75,40 +75,77 @@ module RubyPi
75
75
 
76
76
  # Split into messages to summarize and messages to keep
77
77
  preserved_count = [@preserve_last_n, messages.size].min
78
- droppable = messages[0...(messages.size - preserved_count)]
79
- preserved = messages[(messages.size - preserved_count)..]
78
+ droppable = messages[0...(messages.size - preserved_count)].dup
79
+ preserved = messages[(messages.size - preserved_count)..].dup
80
80
 
81
81
  # If there's nothing to drop, we can't compact further
82
82
  return nil if droppable.empty?
83
83
 
84
+ # Anthropic and OpenAI both require every tool_result / tool message
85
+ # to reference a tool_use / tool_call from a preceding assistant
86
+ # message. If we summarize the assistant turn that originated a tool
87
+ # call but keep the matching tool_result, the API rejects the
88
+ # request with "tool_result without preceding tool_use".
89
+ #
90
+ # When the boundary between droppable and preserved cuts mid-exchange,
91
+ # preserved can start with one or more orphan :tool messages whose
92
+ # matching assistant turn is in droppable. Strip those off the head of
93
+ # preserved and move them into droppable so they are summarized away
94
+ # rather than sent. Because the originating assistant message is older,
95
+ # it is already in droppable, so the pair stays together there — there
96
+ # is no mirror case to handle (once a tool result is moved across, its
97
+ # assistant is never left stranded on the preserved side).
98
+ while preserved.first && preserved.first[:role] == :tool
99
+ droppable << preserved.shift
100
+ end
101
+
102
+ # The orphan-strip only moves messages INTO droppable, so droppable
103
+ # cannot have shrunk; it is still non-empty here. preserved, however,
104
+ # may now be empty (the whole window was tool results) — the summary
105
+ # construction below handles that case.
106
+
84
107
  # Generate a summary of the dropped messages
85
108
  summary = summarize(droppable)
86
109
 
87
110
  # Emit compaction event if an emitter is available
88
111
  @emitter&.emit(:compaction, dropped_count: droppable.size, summary: summary)
89
112
 
90
- # Build the compacted history: summary message + preserved.
91
- #
92
- # The summary role MUST NOT be :system (that would overwrite the real
93
- # system prompt on Anthropic, which extracts the last :system message
94
- # as the top-level `system:` parameter).
95
- #
96
- # The summary role must also NOT match the role of the first preserved
97
- # message: consecutive same-role messages are rejected by Anthropic.
98
- # We pick :user when the next preserved message is :assistant, and
99
- # :assistant otherwise (covers :user, :tool, and an empty preserved).
100
- # On Anthropic, :tool messages become role :user with tool_result
101
- # blocks, so :assistant is the safe choice when the next message is
102
- # :tool too.
103
- first_preserved_role = preserved.first&.dig(:role)
104
- summary_role = first_preserved_role == :assistant ? :user : :assistant
105
-
106
- summary_message = {
107
- role: summary_role,
108
- content: "[Conversation Summary]\n#{summary}"
109
- }
110
-
111
- [summary_message] + preserved
113
+ build_compacted_history(summary, preserved)
114
+ end
115
+
116
+ # Builds the compacted history: a summary message followed by the
117
+ # preserved tail.
118
+ #
119
+ # The summary becomes the FIRST message of the compacted history, so it
120
+ # must satisfy the strictest provider constraints (Anthropic):
121
+ # 1. The summary role MUST NOT be :system; that would overwrite the
122
+ # real system prompt on Anthropic, which promotes the last :system
123
+ # message to the top-level `system:` parameter.
124
+ # 2. The first message MUST use role :user.
125
+ # 3. Consecutive same-role messages are rejected.
126
+ #
127
+ # A :user summary satisfies (1) and (2). For (3): the orphan-strip above
128
+ # guarantees the first preserved message is :assistant, :user, or absent
129
+ # (never :tool). When it is :assistant or absent, a standalone :user
130
+ # summary alternates correctly. When it is :user, a separate :user
131
+ # summary would create two consecutive user messages, so we instead
132
+ # merge the summary text into that existing user message — keeping the
133
+ # first message a single :user message with no role collision.
134
+ #
135
+ # @param summary [String] the generated summary text
136
+ # @param preserved [Array<Hash>] the preserved tail of messages
137
+ # @return [Array<Hash>] the compacted history
138
+ def build_compacted_history(summary, preserved)
139
+ summary_text = "[Conversation Summary]\n#{summary}"
140
+ first_preserved = preserved.first
141
+
142
+ if first_preserved && first_preserved[:role] == :user
143
+ merged = first_preserved.dup
144
+ merged[:content] = "#{summary_text}\n\n#{first_preserved[:content]}"
145
+ [merged] + preserved.drop(1)
146
+ else
147
+ [{ role: :user, content: summary_text }] + preserved
148
+ end
112
149
  end
113
150
 
114
151
  # Estimates the total token count for a system prompt and message array
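
The two compaction steps in this hunk (orphan-strip, then summary placement) can be traced on a small conversation. The sketch below lifts `build_compacted_history` from the diff as a free function and wraps the split and orphan-strip in a hypothetical helper, `split_for_compaction`. The summary string is a placeholder, not the output of the gem's summarizer.

```ruby
# Hypothetical helper combining the split and orphan-strip from the diff.
def split_for_compaction(messages, preserve_last_n)
  preserved_count = [preserve_last_n, messages.size].min
  droppable = messages[0...(messages.size - preserved_count)].dup
  preserved = messages[(messages.size - preserved_count)..].dup
  # Move orphan :tool results to the side that holds their assistant turn.
  droppable << preserved.shift while preserved.first && preserved.first[:role] == :tool
  [droppable, preserved]
end

# Same logic as the method added in the diff, as a free function.
def build_compacted_history(summary, preserved)
  summary_text = "[Conversation Summary]\n#{summary}"
  first_preserved = preserved.first

  if first_preserved && first_preserved[:role] == :user
    merged = first_preserved.dup
    merged[:content] = "#{summary_text}\n\n#{first_preserved[:content]}"
    [merged] + preserved.drop(1)
  else
    [{ role: :user, content: summary_text }] + preserved
  end
end

messages = [
  { role: :user, content: "Find the report" },
  { role: :assistant, content: nil, tool_calls: [{ id: "t1" }] },
  { role: :tool, content: "report.pdf", tool_call_id: "t1" },
  { role: :assistant, content: "Found it." },
  { role: :user, content: "Thanks, summarize it" }
]

# preserve_last_n = 3 cuts between the tool call and its result.
droppable, preserved = split_for_compaction(messages, 3)
compacted = build_compacted_history("placeholder summary", preserved)

# preserve_last_n = 1 leaves a lone :user message, so the summary is merged.
_, tail = split_for_compaction(messages, 1)
merged = build_compacted_history("placeholder summary", tail)
```

In the first case the compacted roles alternate as `:user`, `:assistant`, `:user`. In the second the history collapses to a single `:user` message that carries both the summary and the original prompt.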
@@ -6,6 +6,8 @@
6
6
  # the Anthropic Messages API for both synchronous and streaming completions,
7
7
  # including tool_use block support.
8
8
 
9
+ require "json"
10
+
9
11
  module RubyPi
10
12
  module LLM
11
13
  # Anthropic Claude provider implementation. Communicates with the Anthropic
@@ -330,9 +332,11 @@ module RubyPi
330
332
  headers: default_headers
331
333
  )
332
334
 
333
- response = conn.post("/v1/messages") do |req|
334
- req.headers["Content-Type"] = "application/json"
335
- req.body = JSON.generate(body)
335
+ response = with_transport_errors do
336
+ conn.post("/v1/messages") do |req|
337
+ req.headers["Content-Type"] = "application/json"
338
+ req.body = JSON.generate(body)
339
+ end
336
340
  end
337
341
 
338
342
  handle_error_response(response) unless response.success?
@@ -368,18 +372,26 @@ module RubyPi
368
372
  # process complete lines incrementally so that deltas reach the caller
369
373
  # as soon as each SSE event is fully received — not after the entire
  # response has been buffered.
- sse_buffer = +""
+ #
+ # The buffer is BINARY because chunks arrive as ASCII-8BIT and may end
+ # mid-way through a multi-byte UTF-8 character; appending such a chunk
+ # to a UTF-8 buffer that already holds non-ASCII text raises
+ # Encoding::CompatibilityError. Each complete line is re-encoded to
+ # UTF-8 (and scrubbed) before parsing, so deltas reach the caller as
+ # valid UTF-8 strings.
+ sse_buffer = (+"").force_encoding(Encoding::BINARY)
  response_status = nil
 
  # Accumulate error response body separately so ApiError gets the
  # full body even though on_data consumed the chunks.
- error_body = +""
+ error_body = (+"").force_encoding(Encoding::BINARY)
 
- response = conn.post("/v1/messages") do |req|
- req.headers["Content-Type"] = "application/json"
- req.body = JSON.generate(body)
+ response = with_transport_errors do
+ conn.post("/v1/messages") do |req|
+ req.headers["Content-Type"] = "application/json"
+ req.body = JSON.generate(body)
 
- # Use Faraday's on_data callback for real incremental streaming.
+ # Use Faraday's on_data callback for real incremental streaming.
  # Without this, Faraday buffers the entire response body before
  # returning, which means no deltas reach the caller until the model
  # finishes generating (fake streaming).
@@ -391,14 +403,17 @@ module RubyPi
  # calls on_data for error responses too, which would otherwise
  # consume the body and leave response.body empty.
  if response_status && response_status >= 400
- error_body << chunk
+ error_body << chunk.b
  next
  end
 
- sse_buffer << chunk
- # Process all complete lines in the buffer
+ sse_buffer << chunk.b
+ # Process all complete lines in the buffer. A complete line holds
+ # complete UTF-8 sequences (multi-byte characters split across
+ # chunks are repaired by the buffering), so re-encode it to UTF-8
+ # here; scrub guards against a server sending invalid bytes.
  while (line_end = sse_buffer.index("\n"))
- line = sse_buffer.slice!(0, line_end + 1).strip
+ line = sse_buffer.slice!(0, line_end + 1).force_encoding(Encoding::UTF_8).scrub.strip
  next if line.empty?
  next unless line.start_with?("data: ")
 
@@ -424,7 +439,8 @@ module RubyPi
  finish_reason = stream_state[:finish_reason]
  end
  end
- end
+ end # conn.post
+ end # with_transport_errors
 
  # Check for HTTP errors. When on_data was active, the response body
  # was consumed by the callback, so we pass the accumulated error_body
@@ -432,12 +448,12 @@ module RubyPi
  unless response.success?
  # Reconstruct the response body from what on_data accumulated
  error_response = response
- error_body_str = error_body.empty? ? response.body : error_body
+ error_body_str = error_body.empty? ? response.body : error_body.force_encoding(Encoding::UTF_8).scrub
  handle_error_response(error_response, override_body: error_body_str)
  end
 
  # Process any remaining data in the buffer after the connection closes
- sse_buffer.each_line do |line|
+ sse_buffer.force_encoding(Encoding::UTF_8).scrub.each_line do |line|
  line = line.strip
  next if line.empty?
  next unless line.start_with?("data: ")
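The encoding problem described in the streaming hunks can be reproduced without Faraday. The sketch below is a minimal standalone illustration, assuming a hypothetical SSE line `data: héllo` whose two-byte `é` is split across two network chunks; `drain_lines` is an illustrative helper name, not a method of the gem.

```ruby
# Hypothetical SSE payload. "data: h" is 7 bytes and "é" is the two bytes C3 A9,
# so cutting after byte 8 splits the character across two chunks.
payload = "data: héllo\n"
chunks = [
  payload.byteslice(0, 8).b,
  payload.byteslice(8, payload.bytesize).b
]

# Binary buffer approach: append raw bytes, re-encode only complete lines.
def drain_lines(chunks)
  buffer = (+"").force_encoding(Encoding::BINARY)
  lines = []
  chunks.each do |chunk|
    buffer << chunk.b
    while (line_end = buffer.index("\n"))
      line = buffer.slice!(0, line_end + 1).force_encoding(Encoding::UTF_8).scrub.strip
      lines << line unless line.empty?
    end
  end
  lines
end

lines = drain_lines(chunks)

# The failure mode the binary buffer avoids: a UTF-8 buffer that already holds
# non-ASCII text cannot accept a binary chunk containing high bytes.
naive_error =
  begin
    utf8_buffer = +"é"
    utf8_buffer << "\xC3".b
    nil
  rescue Encoding::CompatibilityError => e
    e.class
  end

puts lines.inspect
puts naive_error.inspect
```

Because the split character is only decoded once the newline arrives, the line reaches the parser as valid UTF-8, while `scrub` would still replace genuinely invalid bytes rather than raise.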
@@ -558,7 +574,12 @@ module RubyPi
 
  when "message_delta"
  delta = data["delta"] || {}
- finish_reason = delta["stop_reason"]
+ # Only overwrite finish_reason when this delta actually carries a
+ # stop_reason. Anthropic emits the stop_reason on a single
+ # message_delta near the end of the stream; a later message_delta
+ # without one must not clobber the captured value back to nil
+ # (which would yield a Response with no finish_reason).
+ finish_reason = delta["stop_reason"] if delta["stop_reason"]
  if data.key?("usage")
  usage_info = data["usage"]
  usage_data[:completion_tokens] = usage_info["output_tokens"]
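The effect of the guarded assignment can be seen on a small, made-up event sequence. The sketch below assumes two hypothetical already-parsed `message_delta` payloads, the second of which carries no `stop_reason`; `last_finish_reason` is an illustrative helper, not part of the gem.

```ruby
# Hypothetical parsed SSE payloads: the stop_reason arrives first, then a
# later message_delta carries only usage data.
events = [
  { "type" => "message_delta", "delta" => { "stop_reason" => "end_turn" } },
  { "type" => "message_delta", "delta" => {}, "usage" => { "output_tokens" => 12 } }
]

# Replays the events with either the old (unguarded) or new (guarded) assignment.
def last_finish_reason(events, guarded:)
  finish_reason = nil
  events.each do |data|
    next unless data["type"] == "message_delta"

    delta = data["delta"] || {}
    if guarded
      finish_reason = delta["stop_reason"] if delta["stop_reason"]
    else
      finish_reason = delta["stop_reason"]
    end
  end
  finish_reason
end

guarded_result = last_finish_reason(events, guarded: true)
unguarded_result = last_finish_reason(events, guarded: false)

puts guarded_result.inspect
puts unguarded_result.inspect
```

With the guard the captured value survives the second event; without it the value is reset to `nil`.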
@@ -79,13 +79,22 @@ module RubyPi
  # Authentication errors are not retryable — raise immediately
  raise
  rescue RubyPi::RateLimitError, RubyPi::ApiError, RubyPi::TimeoutError => e
+ # NOTE: RubyPi::ProviderError is intentionally NOT retried. Provider
+ # errors are overwhelmingly deterministic request-construction
+ # failures (missing tool_call_id, invalid tool-argument JSON, missing
+ # tool name) raised by build_request_body BEFORE any HTTP call. They
+ # produce the identical error on every attempt, so retrying only
+ # burns the backoff schedule before surfacing the same failure.
+ # Fallback wrappers still rescue RubyPi::Error (the ProviderError
+ # superclass), so provider failover is unaffected.
+ #
  # Retry up to max_retries times AFTER the initial attempt.
  # With max_retries: 3, attempt goes 1 (initial), 2, 3, 4 — the condition
  # `attempt <= @max_retries` allows retries on attempts 1..3, so we get
  # 3 retries + 1 initial = 4 total attempts. Previously used `< @max_retries`
  # which was off-by-one (only 2 retries with max_retries: 3).
  if attempt <= @max_retries
- delay = calculate_backoff(attempt)
+ delay = retry_delay_for(e, attempt)
  log_retry(attempt, delay, e)
  sleep(delay)
  retry
@@ -127,6 +136,29 @@ module RubyPi
  raise RubyPi::AbstractMethodError, :perform_complete
  end
 
+ # Maximum delay (seconds) honored from a server-provided Retry-After
+ # header. Caps pathological or misconfigured server values so a single
+ # 429 cannot stall the client indefinitely.
+ RETRY_AFTER_CEILING = 60.0
+
+ # Picks the delay before the next retry. A server-provided Retry-After
+ # on a 429 takes precedence over the local exponential backoff: the
+ # server knows its own cooldown window, and retrying earlier just burns
+ # the retry budget against guaranteed 429s. Retry-After parsed from an
+ # HTTP-date (rather than delta-seconds) arrives as 0.0 and falls through
+ # to the computed backoff.
+ #
+ # @param error [Exception] the error that triggered the retry
+ # @param attempt [Integer] the current attempt number (1-based)
+ # @return [Float] delay in seconds
+ def retry_delay_for(error, attempt)
+ if error.is_a?(RubyPi::RateLimitError) && error.retry_after&.positive?
+ [error.retry_after, RETRY_AFTER_CEILING].min
+ else
+ calculate_backoff(attempt)
+ end
+ end
+
  # Calculates the backoff delay for a given retry attempt using
  # exponential backoff with jitter.
  #
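The delay selection in `retry_delay_for` can be traced with a few inputs. The sketch below mirrors its branch structure in a standalone module; `RetryDemo`, `DemoRateLimitError`, and the jitter-free `backoff` are hypothetical stand-ins for the gem's own `RubyPi::RateLimitError` and `calculate_backoff`, whose real implementations are not shown in this hunk.

```ruby
module RetryDemo
  # Same ceiling value as RETRY_AFTER_CEILING in the hunk above.
  CEILING = 60.0

  # Hypothetical stand-in for RubyPi::RateLimitError.
  class DemoRateLimitError < StandardError
    attr_reader :retry_after

    def initialize(message = "rate limited", retry_after: nil)
      super(message)
      @retry_after = retry_after
    end
  end

  # Assumed deterministic backoff (no jitter) so the results are predictable;
  # the gem's calculate_backoff differs.
  def self.backoff(attempt)
    [1.0 * (2**(attempt - 1)), 30.0].min
  end

  # Mirrors the branch structure of retry_delay_for.
  def self.delay_for(error, attempt)
    if error.is_a?(DemoRateLimitError) && error.retry_after&.positive?
      [error.retry_after, CEILING].min
    else
      backoff(attempt)
    end
  end
end

short_wait = RetryDemo.delay_for(RetryDemo::DemoRateLimitError.new(retry_after: 5.0), 1)
capped_wait = RetryDemo.delay_for(RetryDemo::DemoRateLimitError.new(retry_after: 600.0), 1)
http_date = RetryDemo.delay_for(RetryDemo::DemoRateLimitError.new(retry_after: 0.0), 2)
no_header = RetryDemo.delay_for(RetryDemo::DemoRateLimitError.new, 3)
other_error = RetryDemo.delay_for(StandardError.new("timeout"), 2)

puts [short_wait, capped_wait, http_date, no_header, other_error].inspect
```

A positive `retry_after` wins but is clamped to the ceiling, while `0.0` (the HTTP-date case described above), a missing header, and non-rate-limit errors all fall back to the computed backoff.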
@@ -178,6 +210,45 @@ module RubyPi
  end
  end
 
+ # Wraps an HTTP block, translating Faraday transport-level exceptions
+ # (DNS failures, connection resets, TLS handshakes, read/write timeouts)
+ # into the RubyPi typed-error hierarchy so callers and the retry loop
+ # can rescue them uniformly.
+ #
+ # Without this wrapper, a `Faraday::TimeoutError` or
+ # `Faraday::ConnectionFailed` would propagate out of the provider as
+ # the raw Faraday class. That breaks two contracts:
+ # 1. The documented retry policy (BaseProvider#complete) only rescues
+ # RubyPi errors, so transport failures would not be retried —
+ # exactly the case retries exist for.
+ # 2. Callers `rescue RubyPi::TimeoutError` per the documented error
+ # hierarchy and would not catch real network timeouts.
+ #
+ # @yield the HTTP call to wrap
+ # @return [Object] whatever the block returns
+ # @raise [RubyPi::TimeoutError] on Faraday::TimeoutError
+ # @raise [RubyPi::ApiError] on connection failures, SSL errors, or
+ # any other Faraday::Error not otherwise classified
+ def with_transport_errors
+ yield
+ rescue Faraday::TimeoutError => e
+ raise RubyPi::TimeoutError, "#{provider_name} request timed out: #{e.message}"
+ rescue Faraday::ConnectionFailed, Faraday::SSLError => e
+ raise RubyPi::ApiError.new(
+ "#{provider_name} transport error: #{e.class}: #{e.message}",
+ status_code: nil,
+ response_body: nil
+ )
+ rescue Faraday::Error => e
+ # Catch-all for any other Faraday-level failure (parsing, adapter
+ # issues, etc.) so transport problems never leak provider internals.
+ raise RubyPi::ApiError.new(
+ "#{provider_name} HTTP client error: #{e.class}: #{e.message}",
+ status_code: nil,
+ response_body: nil
+ )
+ end
+
  # Handles HTTP error responses by raising the appropriate RubyPi error.
  # When streaming with on_data, the response body is consumed by the
  # callback and response.body may be empty. Pass override_body with the