ruby-pi 0.1.5 → 0.1.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +11 -0
- data/lib/ruby_pi/context/compaction.rb +37 -2
- data/lib/ruby_pi/llm/anthropic.rb +12 -8
- data/lib/ruby_pi/llm/base_provider.rb +40 -1
- data/lib/ruby_pi/llm/gemini.rb +112 -27
- data/lib/ruby_pi/llm/openai.rb +36 -12
- data/lib/ruby_pi/tools/executor.rb +10 -1
- data/lib/ruby_pi/version.rb +1 -1
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c78d37122ed67d80e61cf51b182dcd79a20a7efa77b503c8b0340963ad60b728
+  data.tar.gz: e3b147cb2b01fe28ac15c2a65d6177156992be7560601886296b16941784ee08
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: cbc0c9abddf98885bf1a22352a9cd09475c324f9aff4bcdff66ce3a6a87e06eb677ab045c966038666744cf9819d5114714c66ba5b7c676de5958d5d964a6242
+  data.tar.gz: 3f9c28b1a30d0e3ad0f1badd391c95065adea822927c1d334dc5fc5c9867e658b43e339e9307dd3eba8dd5a534043c9fae3ea8d0384bfae8eb35a1a09356f035
data/CHANGELOG.md
CHANGED
@@ -5,6 +5,17 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.1.6] - 2026-05-01
+
+### Fixed (adversarial review round 4)
+
+- **Faraday transport errors leaked untyped, bypassed retry (Critical)**: `BaseProvider#complete` rescued only `RubyPi::*` errors, but providers never wrapped Faraday network exceptions. A `Faraday::TimeoutError`, `Faraday::ConnectionFailed`, or `Faraday::SSLError` propagated as the raw Faraday class — breaking the documented error hierarchy and skipping the retry loop entirely (the exact case retries exist for). Added `BaseProvider#with_transport_errors`, which translates `Faraday::TimeoutError` → `RubyPi::TimeoutError` and `Faraday::ConnectionFailed`/`SSLError`/other `Faraday::Error` → `RubyPi::ApiError`. Wrapped every `conn.post` call in all three providers (standard and streaming paths). `RubyPi::ProviderError` is now also retryable
+- **Gemini multi-turn tool use was broken (Critical)**: `Gemini#format_message` rendered assistant messages as text-only and silently dropped the `:tool_calls` field set by the agent loop. The next turn's `functionResponse` had no preceding `functionCall` to bind to, so Gemini rejected any conversation that included a tool call followed by a tool result. Assistant messages now emit one `functionCall` part per tool call (mirroring Anthropic's `tool_use` and OpenAI's `tool_calls` behavior). Empty text parts are also no longer emitted on tool-only assistant turns
+- **Compaction split tool_use/tool_result pairs (Critical)**: When `preserve_last_n` cut between an assistant `tool_calls` message (in droppable) and its matching `:tool` result (in preserved), Anthropic and OpenAI rejected the conversation with "tool_result without preceding tool_use". Compaction now strips orphan `:tool` messages from the head of preserved (moving them into droppable so they're summarized away). The mirror case, where preserved starts with a tool result whose assistant turn is the last droppable message, is also handled
+- **`Tools::Executor` swallowed non-StandardError exceptions as nil success (Major)**: The worker thread rescued only `StandardError`. A tool block raising `Interrupt`, `SystemExit`, or any other `Exception` subclass left both `value` and `error` nil; the join then reported a *successful* `nil` result. Now rescues `Exception` and captures it as a failed `Result`. The worker thread also sets `report_on_exception = false` to avoid stderr spam
+- **Gemini tool_call IDs collided across turns (Major)**: IDs were generated as `"gemini_#{accumulated_tool_calls.length}"` — every response restarted numbering at 0, so a multi-turn conversation produced multiple tool calls all named `"gemini_0"`. Any caller using the ID as a hash key (observability, result correlation) saw collisions. IDs now use `SecureRandom.hex(8)` for global uniqueness across both standard and streaming responses
+- **OpenAI passed malformed tool_call.arguments JSON verbatim (Minor)**: A non-JSON string in `tool_call.arguments` on an assistant message was forwarded unchanged to OpenAI, producing an opaque HTTP 400. Now validated up-front with `JSON.parse`; malformed input raises a typed `RubyPi::ProviderError` with the tool name and parse error before sending the request, matching Anthropic's input validation
+
 ## [0.1.5] - 2026-04-30

 ### Fixed (adversarial review round 3)
data/lib/ruby_pi/context/compaction.rb
CHANGED
@@ -75,12 +75,47 @@ module RubyPi

       # Split into messages to summarize and messages to keep
       preserved_count = [@preserve_last_n, messages.size].min
-      droppable = messages[0...(messages.size - preserved_count)]
-      preserved = messages[(messages.size - preserved_count)..]
+      droppable = messages[0...(messages.size - preserved_count)].dup
+      preserved = messages[(messages.size - preserved_count)..].dup

       # If there's nothing to drop, we can't compact further
       return nil if droppable.empty?

+      # Anthropic and OpenAI both require every tool_result / tool message
+      # to reference a tool_use / tool_call from a preceding assistant
+      # message. If we summarize the assistant turn that originated a tool
+      # call but keep the matching tool_result, the API rejects the
+      # request with "tool_result without preceding tool_use".
+      #
+      # The boundary between droppable and preserved can split a tool
+      # exchange in two ways:
+      #   (a) preserved starts with one or more :tool messages whose
+      #       matching assistant turn is in droppable. Strip those
+      #       orphan tool messages from the head of preserved (move
+      #       them into droppable so they are summarized, not sent).
+      #   (b) the last droppable message is an :assistant with tool_calls,
+      #       but its matching :tool result(s) are in preserved. Pull
+      #       that assistant message back into preserved so the pair
+      #       stays intact.
+      #
+      # We apply (a) first: it's the common case (preserve_last_n=4 cuts
+      # mid-pair, leaving a stranded tool message). Then (b) catches the
+      # mirror case.
+      while preserved.first && preserved.first[:role] == :tool
+        droppable << preserved.shift
+      end
+
+      if droppable.last &&
+         droppable.last[:role] == :assistant &&
+         droppable.last[:tool_calls].is_a?(Array) &&
+         !droppable.last[:tool_calls].empty? &&
+         preserved.first && preserved.first[:role] == :tool
+        preserved.unshift(droppable.pop)
+      end
+
+      # After the boundary fix-ups, droppable may have become empty.
+      return nil if droppable.empty?
+
       # Generate a summary of the dropped messages
       summary = summarize(droppable)

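The boundary repair in this hunk is easy to exercise in isolation. Below is a simplified, stand-alone sketch of the same two fix-ups on plain message hashes; `repair_tool_boundary` is a hypothetical name for illustration, not the gem's API:

```ruby
# Simplified sketch of the compaction boundary repair (hypothetical helper
# name; not the gem's actual API).
def repair_tool_boundary(droppable, preserved)
  # (a) Orphan :tool results at the head of preserved would reach the API
  # without their preceding tool_use; move them into droppable so they
  # are summarized instead of sent.
  droppable << preserved.shift while preserved.first && preserved.first[:role] == :tool

  # (b) Mirror case: the last droppable message is an assistant turn with
  # tool_calls whose results sit in preserved; pull it back so the pair
  # stays intact.
  if droppable.last && droppable.last[:role] == :assistant &&
     droppable.last[:tool_calls].is_a?(Array) && !droppable.last[:tool_calls].empty? &&
     preserved.first && preserved.first[:role] == :tool
    preserved.unshift(droppable.pop)
  end

  [droppable, preserved]
end

messages = [
  { role: :user, content: "What's the weather?" },
  { role: :assistant, content: "", tool_calls: [{ name: "weather", arguments: {} }] },
  { role: :tool, content: '{"temp":21}' },
  { role: :user, content: "thanks" }
]
# A cut between index 1 and 2 would strand the tool result in preserved.
droppable, preserved = repair_tool_boundary(messages[0...2], messages[2..])
# The stranded tool result moved into droppable; preserved starts clean.
```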
data/lib/ruby_pi/llm/anthropic.rb
CHANGED
@@ -330,9 +330,11 @@ module RubyPi
         headers: default_headers
       )

-      response =
-
-
+      response = with_transport_errors do
+        conn.post("/v1/messages") do |req|
+          req.headers["Content-Type"] = "application/json"
+          req.body = JSON.generate(body)
+        end
       end

       handle_error_response(response) unless response.success?
@@ -375,11 +377,12 @@ module RubyPi
       # full body even though on_data consumed the chunks.
       error_body = +""

-      response =
-
-
+      response = with_transport_errors do
+        conn.post("/v1/messages") do |req|
+          req.headers["Content-Type"] = "application/json"
+          req.body = JSON.generate(body)

-
+          # Use Faraday's on_data callback for real incremental streaming.
           # Without this, Faraday buffers the entire response body before
           # returning, which means no deltas reach the caller until the model
           # finishes generating (fake streaming).
@@ -424,7 +427,8 @@ module RubyPi
             finish_reason = stream_state[:finish_reason]
           end
         end
-
+        end # conn.post
+      end # with_transport_errors

       # Check for HTTP errors. When on_data was active, the response body
       # was consumed by the callback, so we pass the accumulated error_body
data/lib/ruby_pi/llm/base_provider.rb
CHANGED
@@ -78,7 +78,7 @@ module RubyPi
     rescue RubyPi::AuthenticationError
       # Authentication errors are not retryable — raise immediately
       raise
-    rescue RubyPi::RateLimitError, RubyPi::ApiError, RubyPi::TimeoutError => e
+    rescue RubyPi::RateLimitError, RubyPi::ApiError, RubyPi::TimeoutError, RubyPi::ProviderError => e
       # Retry up to max_retries times AFTER the initial attempt.
       # With max_retries: 3, attempt goes 1 (initial), 2, 3, 4 — the condition
       # `attempt <= @max_retries` allows retries on attempts 1..3, so we get
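The attempt arithmetic described in that comment (1 initial attempt plus 3 retries = 4 total calls with `max_retries: 3`) can be sketched directly. `call_with_retries` and `FakeRetryableError` below are illustrative stand-ins, not the gem's classes:

```ruby
# Sketch of the retry arithmetic: `attempt <= max_retries` permits retries
# after attempts 1..3, so a persistently failing call runs 4 times total.
class FakeRetryableError < StandardError; end

def call_with_retries(max_retries:)
  attempt = 0
  begin
    attempt += 1
    yield
  rescue FakeRetryableError
    retry if attempt <= max_retries
    raise
  end
end

calls = 0
begin
  call_with_retries(max_retries: 3) do
    calls += 1
    raise FakeRetryableError, "transient"
  end
rescue FakeRetryableError
  # retries exhausted after the final attempt
end
```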
@@ -178,6 +178,45 @@ module RubyPi
       end
     end

+    # Wraps an HTTP block, translating Faraday transport-level exceptions
+    # (DNS failures, connection resets, TLS handshakes, read/write timeouts)
+    # into the RubyPi typed-error hierarchy so callers and the retry loop
+    # can rescue them uniformly.
+    #
+    # Without this wrapper, a `Faraday::TimeoutError` or
+    # `Faraday::ConnectionFailed` would propagate out of the provider as
+    # the raw Faraday class. That breaks two contracts:
+    #   1. The documented retry policy (BaseProvider#complete) only rescues
+    #      RubyPi errors, so transport failures would not be retried —
+    #      exactly the case retries exist for.
+    #   2. Callers `rescue RubyPi::TimeoutError` per the documented error
+    #      hierarchy and would not catch real network timeouts.
+    #
+    # @yield the HTTP call to wrap
+    # @return [Object] whatever the block returns
+    # @raise [RubyPi::TimeoutError] on Faraday::TimeoutError
+    # @raise [RubyPi::ApiError] on connection failures, SSL errors, or
+    #   any other Faraday::Error not otherwise classified
+    def with_transport_errors
+      yield
+    rescue Faraday::TimeoutError => e
+      raise RubyPi::TimeoutError, "#{provider_name} request timed out: #{e.message}"
+    rescue Faraday::ConnectionFailed, Faraday::SSLError => e
+      raise RubyPi::ApiError.new(
+        "#{provider_name} transport error: #{e.class}: #{e.message}",
+        status_code: nil,
+        response_body: nil
+      )
+    rescue Faraday::Error => e
+      # Catch-all for any other Faraday-level failure (parsing, adapter
+      # issues, etc.) so transport problems never leak provider internals.
+      raise RubyPi::ApiError.new(
+        "#{provider_name} HTTP client error: #{e.class}: #{e.message}",
+        status_code: nil,
+        response_body: nil
+      )
+    end
+
     # Handles HTTP error responses by raising the appropriate RubyPi error.
     # When streaming with on_data, the response body is consumed by the
     # callback and response.body may be empty. Pass override_body with the
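The translation pattern in `with_transport_errors` can be demonstrated without Faraday installed. In this sketch, `FakeTimeoutError`, `AppTimeoutError`, and friends are stand-in classes standing in for `Faraday::TimeoutError` and the `RubyPi::*` hierarchy; only the rescue-and-retranslate shape matches the real code:

```ruby
# Stand-alone sketch of the with_transport_errors pattern, using stand-in
# exception classes so it runs without Faraday; the real method rescues
# Faraday::TimeoutError / ConnectionFailed / SSLError / Error.
class FakeTimeoutError < StandardError; end
class FakeConnectionFailed < StandardError; end
class AppTimeoutError < StandardError; end
class AppApiError < StandardError; end

def with_transport_errors_sketch(provider_name)
  yield
rescue FakeTimeoutError => e
  # Translate to the typed hierarchy so the retry loop can rescue it.
  raise AppTimeoutError, "#{provider_name} request timed out: #{e.message}"
rescue FakeConnectionFailed => e
  raise AppApiError, "#{provider_name} transport error: #{e.class}: #{e.message}"
end

captured =
  begin
    with_transport_errors_sketch("anthropic") { raise FakeTimeoutError, "read timeout" }
  rescue AppTimeoutError => e
    e
  end
```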
data/lib/ruby_pi/llm/gemini.rb
CHANGED
@@ -6,6 +6,8 @@
 # the Gemini REST API for both synchronous and streaming completions, including
 # tool/function calling support.

+require "securerandom"
+
 module RubyPi
   module LLM
     # Google Gemini provider implementation. Communicates with the Gemini
@@ -115,44 +117,116 @@ module RubyPi

       # Converts a normalized message hash to Gemini's content format.
       #
+      # Critically, an assistant message that carries `tool_calls` (set by
+      # the agent loop after a tool-using turn) must be rendered with one
+      # `functionCall` part per tool call. Without those parts, Gemini
+      # rejects any subsequent `functionResponse` on the next turn because
+      # the response has nothing to correlate against. Earlier versions
+      # dropped `tool_calls` here, breaking multi-turn tool use.
+      #
       # @param message [Hash] a message with :role and :content keys
       # @return [Hash] Gemini-formatted content object
       def format_message(message)
         role = message[:role]&.to_s || message["role"]&.to_s || "user"
-        content = message[:content] || message["content"]
-
-        #
-        #
-        #
-        # by build_request_body before reaching
-
-          when "assistant" then "model"
-          when "tool" then "user"
-          else role
-          end
-
-        # Tool-role messages carry function call results. When tool_call_id
-        # and name are present, send as a Gemini functionResponse so the
-        # model can correlate the result with its earlier functionCall.
+        content = message[:content] || message["content"]
+
+        # Tool-role messages carry function-call results. When the tool name
+        # is present, send as a Gemini functionResponse so the model can
+        # correlate the result with its earlier functionCall. System messages
+        # should have been extracted by build_request_body before reaching
+        # this method.
         tool_name = message[:name] || message["name"]
         if role == "tool" && tool_name
+          # Gemini's functionResponse expects a structured `response` object.
+          # Tool results are pre-serialized by the loop as either a JSON
+          # string (success) or an "Error: ..." string (failure). Try to
+          # parse JSON so the model receives structured data; fall back to
+          # wrapping the raw string under :result for plain-text content.
+          response_payload = parse_tool_response(content)
           return {
             role: "user",
             parts: [{
               functionResponse: {
                 name: tool_name.to_s,
-                response:
+                response: response_payload
               }
             }]
           }
         end

+        # Assistant messages may carry `tool_calls` from a prior turn. Each
+        # one must be emitted as a `functionCall` part on the model turn so
+        # that the next turn's `functionResponse` has something to bind to.
+        if role == "assistant"
+          parts = []
+          text = content.to_s
+          parts << { text: text } unless text.empty?
+
+          tool_calls = message[:tool_calls] || message["tool_calls"]
+          if tool_calls.is_a?(Array)
+            tool_calls.each do |tc|
+              tc_name = (tc[:name] || tc["name"]).to_s
+              tc_args = tc[:arguments] || tc["arguments"] || {}
+              tc_args = parse_tool_arguments(tc_args)
+              parts << { functionCall: { name: tc_name, args: tc_args } }
+            end
+          end
+
+          # Gemini rejects an empty parts array on a model turn. If the
+          # assistant truly had no content and no tool_calls, fall back to
+          # an empty text part.
+          parts << { text: "" } if parts.empty?
+
+          return { role: "model", parts: parts }
+        end
+
         {
-          role:
+          role: role,
           parts: [{ text: content.to_s }]
         }
       end

+      # Best-effort parse of a tool-result string into a structured object
+      # for Gemini's `functionResponse.response`. JSON content is returned
+      # as-is (wrapped in a hash if it parsed to a non-hash); non-JSON
+      # content (e.g., "Error: ...") is wrapped under :result.
+      #
+      # @param content [String, Hash, nil]
+      # @return [Hash]
+      def parse_tool_response(content)
+        return { result: "" } if content.nil?
+        return content if content.is_a?(Hash)
+
+        str = content.to_s
+        return { result: str } if str.strip.empty?
+
+        begin
+          parsed = JSON.parse(str)
+          parsed.is_a?(Hash) ? parsed : { result: parsed }
+        rescue JSON::ParserError
+          { result: str }
+        end
+      end
+
+      # Coerce a tool_call.arguments value (Hash, JSON string, or other)
+      # into a Hash suitable for Gemini's `functionCall.args`. Malformed
+      # or non-Hash values become an empty hash so the request is still
+      # well-formed.
+      #
+      # @param args [Hash, String, nil]
+      # @return [Hash]
+      def parse_tool_arguments(args)
+        return args if args.is_a?(Hash)
+        return {} unless args.is_a?(String) && !args.strip.empty?
+
+        begin
+          parsed = JSON.parse(args)
+          parsed.is_a?(Hash) ? parsed : {}
+        rescue JSON::ParserError
+          {}
+        end
+      end
+
       # Converts a tool definition to Gemini's function declaration format.
       # Accepts either a RubyPi::Tools::Definition or a plain Hash.
       #
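The fallback rules of `parse_tool_response` are easy to check in isolation. The sketch below is a stand-alone copy of the same logic (as a top-level method rather than the provider's private method), exercised against the three input shapes the comment describes:

```ruby
require "json"

# Stand-alone copy of the parse_tool_response fallback rules, runnable
# outside the provider class.
def parse_tool_response_sketch(content)
  return { result: "" } if content.nil?
  return content if content.is_a?(Hash)

  str = content.to_s
  return { result: str } if str.strip.empty?

  begin
    parsed = JSON.parse(str)
    # JSON objects pass through; JSON non-objects get wrapped.
    parsed.is_a?(Hash) ? parsed : { result: parsed }
  rescue JSON::ParserError
    # Non-JSON tool output (e.g. "Error: ...") is wrapped verbatim.
    { result: str }
  end
end

structured = parse_tool_response_sketch('{"temp": 21}')
wrapped    = parse_tool_response_sketch("[1, 2, 3]")
plain      = parse_tool_response_sketch("Error: boom")
```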
@@ -198,9 +272,11 @@ module RubyPi
       conn = build_connection(base_url: BASE_URL, headers: default_headers)
       url = "/#{API_VERSION}/models/#{@model}:generateContent"

-      response =
-
-
+      response = with_transport_errors do
+        conn.post(url) do |req|
+          req.headers["Content-Type"] = "application/json"
+          req.body = JSON.generate(body)
+        end
       end

       handle_error_response(response) unless response.success?
@@ -233,11 +309,12 @@ module RubyPi
       response_status = nil
       error_body = +""

-      response =
-
-
+      response = with_transport_errors do
+        conn.post(url) do |req|
+          req.headers["Content-Type"] = "application/json"
+          req.body = JSON.generate(body)

-
+          # Use Faraday's on_data callback for real incremental streaming.
           # Without this, Faraday buffers the entire response body before
           # returning — no deltas reach the caller until the model finishes
           # generating (fake streaming).
@@ -281,7 +358,12 @@ module RubyPi
         elsif part.key?("functionCall")
           fc = part["functionCall"]
           tool_call = ToolCall.new(
-
+            # Generate a globally-unique ID per tool call. A simple
+            # length-based counter ("gemini_0", "gemini_1") collides
+            # across turns since each response restarts numbering at
+            # 0, breaking any caller that uses ID as a hash key for
+            # observability or result correlation.
+            id: "gemini_#{SecureRandom.hex(8)}",
             name: fc["name"],
             arguments: fc["args"] || {}
           )
@@ -308,7 +390,8 @@ module RubyPi
           end
         end
       end
-
+        end # conn.post
+      end # with_transport_errors

       # When on_data is active, the response body was consumed by the
       # callback. Pass the accumulated error_body so ApiError carries the
@@ -347,7 +430,9 @@ module RubyPi
         elsif part.key?("functionCall")
           fc = part["functionCall"]
           tool_calls << ToolCall.new(
-
+            # See note in perform_streaming_request: per-response counters
+            # collide across turns, so we generate a globally-unique ID.
+            id: "gemini_#{SecureRandom.hex(8)}",
             name: fc["name"],
             arguments: fc["args"] || {}
           )
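The collision the ID fix addresses is straightforward to reproduce: a counter derived from the per-response accumulator restarts at zero on every response, while `SecureRandom.hex(8)` gives 16 hex characters of entropy per call. `counter_id`/`random_id` below are illustrative reductions of the old and new schemes:

```ruby
require "securerandom"

# Why the counter-based scheme collided: each response restarted numbering
# at zero, so two turns both produced "gemini_0". Random IDs do not.
def counter_id(accumulated_tool_calls)
  "gemini_#{accumulated_tool_calls.length}"
end

def random_id
  "gemini_#{SecureRandom.hex(8)}"
end

# Simulate two separate responses, each starting with an empty accumulator.
counter_ids = 2.times.map { counter_id([]) }
random_ids  = 2.times.map { random_id }
```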
data/lib/ruby_pi/llm/openai.rb
CHANGED
@@ -183,11 +183,31 @@ module RubyPi
           tc_name = tc[:name] || tc["name"]
           tc_args = tc[:arguments] || tc["arguments"] || {}

-          # OpenAI requires arguments
-
-
-
+          # OpenAI requires arguments to be a JSON-encoded string. We
+          # validate up-front so a malformed string fails fast with a
+          # typed error here rather than as an opaque HTTP 400 from
+          # OpenAI. This mirrors Anthropic's input validation in
+          # build_assistant_message.
+          args_string = case tc_args
+                        when Hash
                           JSON.generate(tc_args)
+                        when String
+                          stripped = tc_args.strip
+                          if stripped.empty?
+                            "{}"
+                          else
+                            begin
+                              JSON.parse(tc_args)
+                              tc_args
+                            rescue JSON::ParserError => e
+                              raise RubyPi::ProviderError.new(
+                                "Invalid JSON in assistant tool_call.arguments " \
+                                "for tool '#{tc_name || "unknown"}': #{e.message} " \
+                                "(raw: #{tc_args.inspect})",
+                                provider: :openai
+                              )
+                            end
+                          end
                         else
                           "{}"
                         end
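The fail-fast coercion in this hunk can be sketched on its own. Here `coerce_arguments` and `InvalidArgumentsError` are illustrative names (the gem raises `RubyPi::ProviderError`); the three branches mirror the hunk's `case`:

```ruby
require "json"

# Fail-fast coercion of tool_call.arguments, mirroring the validation in
# the hunk above: a Hash is serialized, a valid JSON string passes through
# unchanged, a malformed string raises before any HTTP request is made.
class InvalidArgumentsError < StandardError; end

def coerce_arguments(tc_args, tool_name)
  case tc_args
  when Hash
    JSON.generate(tc_args)
  when String
    return "{}" if tc_args.strip.empty?
    begin
      JSON.parse(tc_args) # validate only; forward the original string
      tc_args
    rescue JSON::ParserError => e
      raise InvalidArgumentsError,
            "Invalid JSON in tool_call.arguments for '#{tool_name}': #{e.message}"
    end
  else
    "{}"
  end
end

from_hash   = coerce_arguments({ city: "Oslo" }, "weather")
passthrough = coerce_arguments('{"city":"Oslo"}', "weather")
```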
@@ -261,9 +281,11 @@ module RubyPi
         headers: default_headers
       )

-      response =
-
-
+      response = with_transport_errors do
+        conn.post("/v1/chat/completions") do |req|
+          req.headers["Content-Type"] = "application/json"
+          req.body = JSON.generate(body)
+        end
       end

       handle_error_response(response) unless response.success?
@@ -300,11 +322,12 @@ module RubyPi
       response_status = nil
       error_body = +""

-      response =
-
-
+      response = with_transport_errors do
+        conn.post("/v1/chat/completions") do |req|
+          req.headers["Content-Type"] = "application/json"
+          req.body = JSON.generate(body)

-
+          # Use Faraday's on_data callback for real incremental streaming.
           # Without this, Faraday buffers the entire response body before
           # returning — no deltas reach the caller until the model finishes
           # generating (fake streaming).
@@ -389,7 +412,8 @@ module RubyPi
           end
         end
       end
-
+        end # conn.post
+      end # with_transport_errors

       # When on_data is active, the response body was consumed by the
       # callback. Pass the accumulated error_body so ApiError carries the
data/lib/ruby_pi/tools/executor.rb
CHANGED
@@ -195,9 +195,18 @@ module RubyPi
       error = nil

       worker = Thread.new do
+        # Don't spam stderr from the rescued worker thread.
+        Thread.current.report_on_exception = false
         begin
           value = tool.call(arguments)
-        rescue
+        rescue Exception => e # rubocop:disable Lint/RescueException
+          # Rescue the full Exception hierarchy (not just StandardError).
+          # If a tool block raises Interrupt, SystemExit, or any other
+          # non-StandardError, rescuing only StandardError leaves both
+          # `value` and `error` nil; the join then reports a successful
+          # nil result — a panic in a tool silently becomes "returned nil".
+          # Capture the failure here; the main thread surfaces it as a
+          # failed Result. The worker thread itself does not propagate.
           error = e
         end
       end
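The failure mode this hunk fixes is easy to demonstrate: `Interrupt` and `SystemExit` are not `StandardError` subclasses, so a plain `rescue` in the worker misses them and the join sees a "successful" nil. The sketch below, with the hypothetical name `run_tool`, reduces the executor's worker-thread pattern to its essentials:

```ruby
# Illustrative reduction of the executor's worker-thread pattern (run_tool
# is a hypothetical name, not the gem's API). Rescuing Exception captures
# non-StandardError raises that a bare `rescue` would miss.
def run_tool(&block)
  value = nil
  error = nil
  worker = Thread.new do
    # Keep the rescued exception from also being dumped to stderr.
    Thread.current.report_on_exception = false
    begin
      value = block.call
    rescue Exception => e # rubocop:disable Lint/RescueException
      error = e
    end
  end
  worker.join
  [value, error]
end

ok_value, ok_error = run_tool { 42 }
_panicked_value, panic_error = run_tool { raise Interrupt }
```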
data/lib/ruby_pi/version.rb
CHANGED
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: ruby-pi
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.6
 platform: ruby
 authors:
 - RubyPi Contributors
 bindir: bin
 cert_chain: []
-date:
+date: 2026-05-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: faraday
@@ -157,7 +157,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-rubygems_version: 3.6.
+rubygems_version: 3.6.2
 specification_version: 4
 summary: AI agent harness for Ruby — build LLM agents with tool calling, streaming,
   and a unified interface to OpenAI, Anthropic Claude, and Google Gemini.