RubyGems - pikuri-core - Versions diffs - 0.0.6 → 0.0.7 - Mend

pikuri-core 0.0.6 → 0.0.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25)

checksums.yaml +4 -4
data/README.md +5 -3
data/lib/pikuri/agent/chat_transport.rb +135 -11
data/lib/pikuri/agent/configurator.rb +4 -4
data/lib/pikuri/agent/context_window_detector.rb +103 -52
data/lib/pikuri/agent/control/step_limit.rb +39 -7
data/lib/pikuri/agent/event.rb +43 -16
data/lib/pikuri/agent/extension.rb +31 -17
data/lib/pikuri/agent/extension_context.rb +147 -0
data/lib/pikuri/agent/listener/terminal.rb +13 -2
data/lib/pikuri/agent/listener/token_log.rb +60 -13
data/lib/pikuri/agent/listener.rb +12 -5
data/lib/pikuri/agent/listener_list.rb +7 -17
data/lib/pikuri/agent/synthesizer.rb +93 -67
data/lib/pikuri/agent.rb +358 -403
data/lib/pikuri/sanitizer.rb +179 -0
data/lib/pikuri/tool/parameters.rb +65 -2
data/lib/pikuri/tool/search/brave.rb +32 -18
data/lib/pikuri/tool/search/duckduckgo.rb +18 -7
data/lib/pikuri/tool/search/engines.rb +72 -49
data/lib/pikuri/tool/search/exa.rb +34 -22
data/lib/pikuri/tool/web_search.rb +45 -26
data/lib/pikuri/version.rb +1 -1
data/lib/pikuri-core.rb +11 -9
metadata +5 -6

data/lib/pikuri/agent.rb CHANGED

@@ -18,10 +18,16 @@ module Pikuri
   # Two seams are visible:
   #
   # 1. **Listeners** ({ListenerList} + {Listener::Base} subclasses)
-  #    — pure consumers of the event stream. The +Agent+ is the
-  #    sole emitter; listeners never write back. New rendering or
-  #    capture targets (a web sink, a structured log) are added
-  #    here without touching {Agent}.
+  #    — pure consumers of the event stream; they never write back.
+  #    The +Agent+ emits every loop-narration {Event} variant;
+  #    extensions emit their own domain events through the
+  #    {ExtensionContext} capability facade handed to
+  #    {Extension#bind}. There is no public path from an +Agent+
+  #    reference to emission — no +listeners+ reader, no +chat+
+  #    reader, no emit method — so holding an agent grants read
+  #    access to its configuration and nothing more. New rendering
+  #    or capture targets (a web sink, a structured log) are added
+  #    as listeners without touching {Agent}.
   # 2. **Controls** ({Control::StepLimit}, {Control::Cancellable},
   #    {Control::Interloper}) — host-facing signal holders. The
   #    +Agent+ reads from them at well-defined boundaries:
@@ -32,24 +38,32 @@ module Pikuri
   #    {Control::Interloper#drain!} on every +after_tool_result+.
   #
   # The two roles are named separately so "what fires when" is a
-  # single grep for +@listeners.emit+ in this file.
+  # single grep for +@listeners.emit+ in this file (loop narration)
+  # plus the capability calls in {ExtensionContext} (domain events).
   #
-  # == Step-exhaustion rescue
+  # == Step-exhaustion policy
   #
   # If the +step_limit:+ {Control::StepLimit} trips during
-  # +Chat#ask+, {#run_loop} catches the +Exceeded+ exception,
-  # emits an {Event::FallbackNotice} to the listener stream, and
-  # hands off to {Synthesizer.run} on a fresh +RubyLLM::Chat+.
-  # The synth reuses the parent's listener stream via
-  # {ListenerList#for_sub_agent} (Terminal padded, TokenLog
-  # zeroed, recorder shared by reference) with a +name:+ derived
-  # from the parent's. The synth shares the parent's
-  # +cancellable+ so a user cancel during synthesis still works,
-  # and gets a fresh +step_limit+ at +max: 1+ (defensive — the
-  # synth has no tools and shouldn't trip it). The synth's
-  # answer becomes the value reported by
-  # {#last_assistant_content}, so callers (notably the +agent+ tool
-  # from +pikuri-subagents+) still get a usable reply.
+  # completion, {#run_loop} catches the +Exceeded+ exception and
+  # applies the budget's {Control::StepLimit#on_exhausted} policy
+  # (see that class header for how hosts pick):
+  #
+  # - +:raise+ (the default) — re-raise to the host, same shape as
+  #   the cancellation rescue below. The chat history survives and
+  #   {Control::StepLimit#reset!} fires at the next turn boundary,
+  #   so a REPL user can simply say "continue".
+  # - +:synthesize+ — emit an {Event::FallbackNotice} and run the
+  #   {Synthesizer} prompt on a nested tools-free +Agent+ (the
+  #   same construction shape the +agent+ tool from
+  #   +pikuri-subagents+ uses for sub-agents): parent's listener
+  #   stream derived via {ListenerList#for_sub_agent} (Terminal
+  #   padded, TokenLog zeroed, recorder shared by reference),
+  #   parent's +cancellable+ shared so a user cancel during
+  #   synthesis still works, a defensive +step_limit+ at +max: 1+
+  #   (the synth has no tools and shouldn't tick it). The synth's
+  #   answer becomes the value reported by
+  #   {#last_assistant_content}, so callers (notably the +agent+
+  #   tool from +pikuri-subagents+) still get a usable reply.
   #
   # == Cancellation rescue
   #
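The step-exhaustion policy described in the hunk above can be sketched in isolation. This is a minimal illustration, not pikuri-core's code: `SketchStepLimit`, `SketchExceeded`, and `sketch_turn` are hypothetical stand-ins, and only the `:raise` / `:synthesize` policy names, `tick!`, `reset!`, and `on_exhausted` are taken from the comments in the diff.

```ruby
# Hypothetical stand-ins; none of these classes come from pikuri-core.
class SketchExceeded < StandardError; end

class SketchStepLimit
  attr_reader :on_exhausted

  def initialize(max:, on_exhausted: :raise)
    @max = max
    @on_exhausted = on_exhausted
    @used = 0
  end

  # Called at the start of every turn so each turn gets a full budget.
  def reset!
    @used = 0
  end

  # Called once per tool call; raises once the budget is overspent.
  def tick!
    @used += 1
    raise SketchExceeded, "exceeded #{@max} steps" if @used > @max
  end
end

# Simulates one turn that makes `tool_calls` tool calls under `limit`.
def sketch_turn(limit, tool_calls)
  limit.reset!
  tool_calls.times { limit.tick! }
  'finished normally'
rescue SketchExceeded
  # :raise hands the exception to the host; :synthesize falls back.
  raise unless limit.on_exhausted == :synthesize

  'synthesized answer'
end
```

Because `reset!` runs at the top of every turn, a host that rescues the exception under `:raise` can call the turn again and start with a fresh budget, which appears to be the "continue" behaviour the comment describes.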
@@ -68,253 +82,13 @@ module Pikuri
     LOGGER = Pikuri.logger_for('Agent')
     private_constant :LOGGER
-    # Wire one +RubyLLM::Chat+ for pikuri's event stream and
-    # controls. Used by both {#initialize} (on the main chat) and
-    # {Synthesizer.run} (on the synth chat) so the two share one
-    # source of truth for "which callback emits which event."
-    #
-    # Handles the three message-level registered callbacks
-    # (+after_message+, +before_tool_call+, +after_tool_result+);
-    # the per-chunk streaming callback is separate because
-    # ruby_llm takes it as a block to +Chat#ask+ rather than a
-    # registered hook — see {.streaming_block}.
-    #
-    # @param chat [RubyLLM::Chat] the chat instance to wire
-    # @param listeners [ListenerList] the listener stream events
-    #   flow into
-    # @param step_limit [Control::StepLimit, nil] when set,
-    #   {Control::StepLimit#tick!} is poked on every
-    #   +before_tool_call+ (and raises {Control::StepLimit::Exceeded}
-    #   when over budget)
-    # @param cancellable [Control::Cancellable, nil] when set,
-    #   {Control::Cancellable#check!} is poked on every
-    #   +before_tool_call+ (and raises
-    #   {Control::Cancellable::Cancelled} when the flag is up)
-    # @param interloper [Control::Interloper, nil] when set, the
-    #   queue is drained on every +after_tool_result+, each item
-    #   appended as a +role: :user+ message and emitted as
-    #   {Event::UserTurn} with +mid_loop: true+
-    # @param on_user_message [Proc, nil] when set, called with each
-    #   drained interloper +content+ String *after* it is appended
-    #   to the chat — the per-turn {Extension#on_user_message}
-    #   dispatch (prefetch + recording). Threaded through here rather
-    #   than fired inline so {Synthesizer.run}, which reuses this
-    #   wiring without an interloper or memory, simply passes +nil+.
-    #   Only consulted when +interloper+ is also set.
-    # @return [void]
-    def self.wire_chat(chat, listeners:, step_limit: nil, cancellable: nil, interloper: nil,
-                       on_user_message: nil)
-      chat.after_message do |msg|
-        emit_after_message(msg, listeners)
-      end
-      chat.before_tool_call do |tc|
-        listeners.emit(Event::ToolCall.new(name: tc.name, arguments: tc.arguments))
-        step_limit&.tick!
-        cancellable&.check!
-      end
-      chat.after_tool_result do |result|
-        listeners.emit(Event::ToolResult.new(content: result))
-        drain_interloper(interloper, chat, listeners, on_user_message) if interloper
-      end
-    end
-    # Build the per-chunk streaming block passed to +Chat#ask+.
-    # Each invocation of the returned proc converts one
-    # +RubyLLM::Chunk+ into zero, one, or two delta events
-    # ({Event::ThinkingDelta} / {Event::AssistantDelta}) on
-    # +listeners+. Tool-call chunks are intentionally ignored —
-    # partial JSON has no useful rendering; the assembled
-    # +tool_calls+ surface through {Event::ToolCall} once the
-    # message completes.
-    #
-    # Lives parallel to {.wire_chat} (instead of being folded into
-    # it) because +Chat#ask+ takes the streaming block as an
-    # argument rather than a registered callback, so both
-    # {#run_loop} and {Synthesizer.run} pass it inline at the call
-    # site with +&Agent.streaming_block(listeners: ..., cancellable: ...)+.
-    #
-    # == Cancellation polling
-    #
-    # When +cancellable+ is non-nil, {Control::Cancellable#check!}
-    # fires *before* each chunk's emit. The +before_tool_call+
-    # wiring in {.wire_chat} only fires when the model requests a
-    # tool, which leaves a no-tool turn (e.g. a plain greeting)
-    # with zero cancellation points — Ctrl+C trips the flag but
-    # nothing reads it. Polling on every streamed chunk closes
-    # that gap: an in-flight Cancellation+check! raises on the
-    # next chunk delivered after the flag flips, the exception
-    # propagates out through ruby_llm's streaming path
-    # (+Chat#ask+ doesn't rescue), and {#run_loop} catches it,
-    # emits {Event::Cancelled}, and re-raises. The pre-emit
-    # ordering is deliberate: a chunk that arrives after a cancel
-    # request shouldn't render — the user has said stop.
-    #
-    # @param listeners [ListenerList] the listener stream chunk
-    #   events flow into
-    # @param cancellable [Control::Cancellable, nil] when non-nil,
-    #   polled on every chunk so a flag flipped mid-stream raises
-    #   {Control::Cancellable::Cancelled} on the very next chunk
-    # @return [Proc] a +-> (chunk) { ... }+ proc suitable for
-    #   passing to +Chat#ask+ with +&+
-    def self.streaming_block(listeners:, cancellable: nil)
-      ->(chunk) {
-        cancellable&.check!
-        emit_chunk(chunk, listeners)
-      }
-    end
-    # Normalize a +RubyLLM::Chat+ +after_message+ payload into
-    # zero, one, or two {Event} variants (+Thinking+ and/or
-    # +Assistant+) plus one {Event::Tokens} for the usage block.
-    # Empty thinking / empty content are filtered here so
-    # listeners never see vacuous events. Non-assistant roles
-    # (e.g. tool-role messages echoed back through
-    # +after_message+) are skipped entirely.
-    #
-    # +msg+ is a +RubyLLM::Message+. Beyond +role+, +content+,
-    # +thinking+, and the +*_tokens+ accessors used here, it also
-    # carries +msg.tool_calls+ on assistant turns that requested
-    # one and +msg.raw+ for the unparsed provider payload.
-    #
-    # @param msg [RubyLLM::Message]
-    # @param listeners [ListenerList]
-    # @return [void]
-    def self.emit_after_message(msg, listeners)
-      return unless msg.role == :assistant
-      text = msg.thinking&.text
-      listeners.emit(Event::Thinking.new(content: text)) if text && !text.empty?
-      content = msg.content
-      listeners.emit(Event::Assistant.new(content: content)) if content.is_a?(String) && !content.empty?
-      listeners.emit(Event::Tokens.new(
-                       input: msg.input_tokens,
-                       output: msg.output_tokens,
-                       cached: msg.cached_tokens,
-                       cache_creation: msg.cache_creation_tokens,
-                       thinking: msg.thinking_tokens,
-                       model_id: msg.model_id
-                     ))
-    end
-    private_class_method :emit_after_message
-    # Normalize a +RubyLLM::Chunk+ from a streaming +Chat#ask+
-    # into zero, one, or two delta events
-    # ({Event::ThinkingDelta} / {Event::AssistantDelta}). Empty
-    # +thinking.text+ and empty +content+ are filtered here so
-    # listeners never see vacuous fragments. Tool-call deltas are
-    # intentionally skipped — see {.streaming_block}.
-    #
-    # +chunk+ is a +RubyLLM::Chunk+ (subclass of +RubyLLM::Message+),
-    # so the same +.thinking+ / +.content+ accessors used in
-    # {.emit_after_message} apply.
-    #
-    # @param chunk [RubyLLM::Chunk]
-    # @param listeners [ListenerList]
-    # @return [void]
-    def self.emit_chunk(chunk, listeners)
-      thinking = chunk.thinking&.text
-      listeners.emit(Event::ThinkingDelta.new(content: thinking)) if thinking && !thinking.empty?
-      content = chunk.content
-      listeners.emit(Event::AssistantDelta.new(content: content)) if content.is_a?(String) && !content.empty?
-    end
-    private_class_method :emit_chunk
-    # Drain the interloper queue: for each pending item, append a
-    # +role: :user+ message to the chat history so the next
-    # round-trip sees it, emit an {Event::UserTurn} with
-    # +mid_loop: true+ to the listener stream so renderers see the
-    # injection, then run the per-turn {Extension#on_user_message}
-    # dispatch (so mid-loop injections are prefetched + recorded
-    # exactly like initial turns).
-    #
-    # The dispatch runs *after* the +:user+ append so any
-    # +<memory-context>+ it injects lands as a +:system+ message
-    # right behind the user turn it annotates — the same
-    # append-at-the-tail ordering {#run_loop} produces for initial
-    # turns.
-    #
-    # @param interloper [Control::Interloper]
-    # @param chat [RubyLLM::Chat]
-    # @param listeners [ListenerList]
-    # @param on_user_message [Proc, nil] per-content dispatch; +nil+
-    #   skips it (e.g. an interloper with no memory extension wired)
-    # @return [void]
-    def self.drain_interloper(interloper, chat, listeners, on_user_message = nil)
-      interloper.drain!.each do |content|
-        chat.add_message(role: :user, content: content)
-        listeners.emit(Event::UserTurn.new(content: content, mid_loop: true))
-        on_user_message&.call(content)
-      end
-    end
-    private_class_method :drain_interloper
-    # One-shot inference. Builds a fresh +RubyLLM::Chat+ with no
-    # tools, no MCP, no listeners, no step budget, asks +prompt+ as
-    # the single user turn, and returns the assistant's reply as a
-    # plain String. Lives parallel to {#initialize} / {#run_loop}
-    # because the use case (e.g. summarizing an MCP server's tool
-    # set into a short description block before any agent turn
-    # runs) is genuinely one-shot — there is no loop, no tool
-    # iteration, no listener stream.
-    #
-    # +prompt+ is sent as the user message. For a one-shot call
-    # there is no behavioral difference between the system slot
-    # and the user slot, so we use one parameter; pack any
-    # "instructions + data" framing into +prompt+ directly.
-    #
-    # == Cancellation
-    #
-    # {Control::Cancellable#check!} fires once before the call and
-    # once after, so a flag flipped right around the request
-    # raises {Control::Cancellable::Cancelled} promptly. The
-    # in-flight HTTP call itself is *not* interrupted — same
-    # "gentle cancel" semantic the main loop offers (see
-    # {Control::Cancellable}'s class header). For 30s synthesis
-    # passes at boot this is still a useful escape hatch: the next
-    # check raises and the call returns.
-    #
-    # == Failure
-    #
-    # Errors from the provider (HTTP failure, malformed response,
-    # +RubyLLM+ raising) propagate to the caller verbatim — there
-    # is no recovery layer here. Callers that want "fail soft on
-    # synthesis errors" (e.g. {Mcp::Servers}) rescue at their level
-    # and fall back to a default; this method stays loud.
-    #
-    # @param transport [ChatTransport] same model-resolution
-    #   triple {#initialize} uses; if +model+ is +nil+, falls
-    #   back to +RubyLLM.config.default_model+
-    # @param prompt [String] the prompt sent as the single user
-    #   turn; must be non-blank
-    # @param cancellable [Control::Cancellable, nil] when set,
-    #   checked before the call so a flag flipped right
-    #   around the request raises {Control::Cancellable::Cancelled}
-    # @return [String] the assistant's reply content
-    # @raise [ArgumentError] when +prompt+ is +nil+, empty, or
-    #   whitespace-only
-    # @raise [Control::Cancellable::Cancelled] when the
-    #   +cancellable+ flag was tripped at the pre-call check
-    def self.think(transport:, prompt:, cancellable: nil)
-      raise ArgumentError, "prompt must not be blank, got #{prompt.inspect}" \
-        if prompt.nil? || prompt.to_s.strip.empty?
-      transport = transport.with(model: RubyLLM.config.default_model) unless transport.model
-      cancellable&.check!
-      chat = RubyLLM.chat(**transport.to_h)
-      chat.ask(prompt)
-      last = chat.messages.reverse.find { |m| m.role == :assistant }
-      last&.content.to_s
-    end
-    # @param transport [ChatTransport] the model-resolution triple
-    #   (+model+ / +provider+ / +assume_model_exists+) forwarded
-    #   to +RubyLLM.chat+. Bundled into one value object so every
-    #   construction site — this constructor and the synthesizer
-    #   rescue below — can forward all three with one assignment
-    #   instead of three kwargs (where dropping one would silently
+    # @param transport [ChatTransport] the model-resolution bundle
+    #   (+model+ / +provider+ / +assume_model_exists+ and, for a model
+    #   on a non-global server, +api_base+ / +api_key+) the chat is
+    #   built from. Bundled into one value object so every construction
+    #   site — this constructor, the synthesizer rescue below, a
+    #   mid-conversation switch — can forward it with one assignment
+    #   instead of loose kwargs (where dropping one would silently
     #   route the chat elsewhere or raise
     #   +RubyLLM::ModelNotFoundError+). If +transport.model+ is
     #   +nil+, it's filled in from +RubyLLM.config.default_model+.
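The "one value object, one assignment" idea in the `transport` doc comment can be illustrated with a small sketch. `SketchTransport` and `resolve` are hypothetical; only the field names and the `with(model: ...)` fallback to a default model are taken from the diff, and the real `ChatTransport` may be built quite differently.

```ruby
# Hypothetical value object; field names follow the doc comment only.
SketchTransport = Struct.new(:model, :provider, :assume_model_exists,
                             keyword_init: true) do
  # Returns a copy with some fields replaced; the receiver is untouched.
  def with(**changes)
    self.class.new(**to_h.merge(changes))
  end
end

# Fills in a default model only when the bundle does not name one.
def resolve(transport, default_model)
  transport.model ? transport : transport.with(model: default_model)
end

base = SketchTransport.new(provider: :openai, assume_model_exists: true)
resolved = resolve(base, 'default-model')
```

Forwarding `resolved` as a single object keeps `provider` and `assume_model_exists` attached to the model id, so a construction site cannot drop one of them by accident.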
@@ -326,7 +100,10 @@ module Pikuri
     # @param step_limit [Control::StepLimit, nil] step budget
     #   control. When set, {Control::StepLimit#tick!} fires on
     #   every +before_tool_call+ and {Control::StepLimit#reset!}
-    #   at the start of each turn. +nil+ means "no step budget"
+    #   at the start of each turn; the budget's
+    #   {Control::StepLimit#on_exhausted} policy decides what
+    #   {#run_loop} does when it trips (see "Step-exhaustion
+    #   policy" in the class header). +nil+ means "no step budget"
     #   (the agent can loop indefinitely).
     # @param cancellable [Control::Cancellable, nil] cancellation
     #   control. When set, {Control::Cancellable#check!} fires on
@@ -339,18 +116,6 @@ module Pikuri
     #   every +after_tool_result+ and each item becomes a
     #   {Event::UserTurn} with +mid_loop: true+. +nil+ means
     #   "no mid-loop injection" (the bundled CLIs default).
-    # @param context_window [Integer, nil] explicit override for
-    #   the model's context-window cap. When set, it wins over
-    #   ruby_llm's reported value and the llama.cpp probe — see
-    #   {ContextWindowDetector} for precedence. Resolved cap is
-    #   emitted as an {Event::ContextCap} immediately after
-    #   construction.
-    # @param llama_probe_url [String, nil] llama.cpp +/props+ URL
-    #   used as the third detection source. Only consulted when
-    #   neither +context_window+ nor ruby_llm's reported value is
-    #   set. Typically derived by +bin/pikuri-chat+ from its
-    #   configured +openai_api_base+; leave +nil+ when the
-    #   configured server is anything other than llama.cpp.
     # @param id [String] unique identifier for this agent. Empty
     #   for the main agent; sub-agents get persona-rooted ids
     #   like +"researcher 0"+, +"researcher 1"+, +"file_miner 0"+, ...
@@ -362,8 +127,8 @@ module Pikuri
     #   the codebase for the persona-name load (the value the LLM
     #   picks in the +agent+ tool's +name:+ argument).
     # @param streaming [Boolean] opt into chunk-level streaming.
-    #   When +true+, {#run_loop} passes the block returned by
-    #   {.streaming_block} to +Chat#ask+, and ruby_llm requests
+    #   When +true+, {#run_loop} passes a per-chunk block to
+    #   +Chat#complete+, and ruby_llm requests
     #   SSE responses from the provider — chunks are normalized
     #   into {Event::ThinkingDelta} / {Event::AssistantDelta} on
     #   the listener stream as they arrive. When +false+ (the
@@ -386,7 +151,7 @@ module Pikuri
     # @return [Agent]
     def initialize(transport:, system_prompt:,
                    step_limit: nil, cancellable: nil, interloper: nil,
-                   context_window: nil, llama_probe_url: nil, id: '',
+                   id: '',
                    streaming: false,
                    &block)
       @transport = transport.model ? transport : transport.with(model: RubyLLM.config.default_model)
@@ -403,8 +168,6 @@ module Pikuri
       # Stashed for {#run_configure}, which runs the failure-prone
       # build phase below out of a separate method.
       @block = block
-      @context_window = context_window
-      @llama_probe_url = llama_probe_url
       # Register *before* the build phase so a mid-construction raise
       # is still recoverable: extensions arm their cleanup via
@@ -427,9 +190,6 @@ module Pikuri
       end
     end
-    # @return [RubyLLM::Chat] underlying chat; the extension seam
-    attr_reader :chat
     # @return [ChatTransport] the resolved transport bundle this
     #   agent was constructed with — same model id / provider /
     #   assume-model-exists flag passed to every +RubyLLM.chat+
@@ -473,10 +233,6 @@ module Pikuri
     #   each persona owns its own system prompt verbatim.
     attr_reader :system_prompt
-    # @return [ListenerList] the listener list attached to this
-    #   agent's chat
-    attr_reader :listeners
     # @return [Control::StepLimit, nil] the step-budget control
     #   this agent was constructed with, or +nil+ when none.
     attr_reader :step_limit
@@ -521,13 +277,15 @@ module Pikuri
     #   extensions).
     attr_reader :extensions
-    # @return [Integer, nil] context-window cap resolved by
-    #   {ContextWindowDetector} at construction time. +nil+ when
-    #   no source produced a value (custom local model with no
-    #   override and no reachable llama.cpp +/props+). Read by
-    #   extensions that spawn their own ruby_llm calls (notably
-    #   the +agent+ tool from +pikuri-subagents+, so spawned
-    #   sub-agents inherit the same cap without re-probing).
+    # @return [Integer, nil] resolved context-window cap — the
+    #   {ChatTransport#context_window} if one was given, else what
+    #   {ContextWindowDetector} probed. +nil+ when neither produced
+    #   a value (a non-llama server with no explicit cap). Re-resolved
+    #   on every model switch (see {#run_loop}'s +transport:+). Read by
+    #   extensions that spawn their own ruby_llm calls (notably the
+    #   +agent+ tool from +pikuri-subagents+, which hands a sub-agent
+    #   +parent.transport.with(context_window: this)+ so the resolved
+    #   cap rides along without a re-probe).
     attr_reader :context_window_cap
     # Final assistant message content for the most recent
@@ -552,11 +310,15 @@ module Pikuri
     # and any other observable output is the listeners'
     # responsibility.
     #
-    # If the +step_limit+ control trips during +ask+, the rescue
-    # branch emits an {Event::FallbackNotice} and runs
-    # {Synthesizer.run} on a fresh +RubyLLM::Chat+. The synth's
-    # answer is captured for {#last_assistant_content}; the
-    # exception does not bubble out.
+    # If the +step_limit+ control trips during completion, the
+    # rescue branch applies its {Control::StepLimit#on_exhausted}
+    # policy: +:raise+ re-raises the +Exceeded+ exception to the
+    # host (chat history intact — the next turn's +reset!+
+    # refreshes the budget, so "continue" just works);
+    # +:synthesize+ emits an {Event::FallbackNotice} and runs the
+    # {Synthesizer} prompt on a nested tools-free agent, capturing
+    # its answer for {#last_assistant_content}. See
+    # "Step-exhaustion policy" in the class header.
     #
     # If the +cancellable+ control trips during +ask+, the rescue
     # branch emits an {Event::Cancelled} and re-raises the
@@ -566,19 +328,45 @@ module Pikuri
     # Subsequent calls keep building on the same chat history, so
     # the model sees full multi-turn context.
     #
+    # == Switching models mid-conversation
+    #
+    # Passing a +transport:+ that differs from the current one
+    # switches the underlying chat to that model — via
+    # +Chat#with_model+, so the history and the registered
+    # callbacks survive — re-resolves the context-window cap, and
+    # emits an {Event::ModelSwitched} followed by a fresh
+    # {Event::ContextCap}. The switch is deliberately confined to
+    # the top of this method (a private +apply_transport!+) rather
+    # than exposed as a standalone setter: the chat is
+    # single-thread-confined, so doing it here serializes the swap
+    # with the turn on the loop's own thread — a background thread
+    # mutating +with_model+'s connection mid-completion would tear
+    # an in-flight stream. A +nil+ +transport:+ (the default) keeps
+    # the current model. The conversation is *not* re-baselined: a
+    # switch is the same conversation under a new model, so the
+    # message count and running context size carry over (the next
+    # turn's token report self-corrects to the new model's count).
+    #
     # @param user_message [String] the user's request for this
     #   turn; must not be +nil+, empty, or whitespace-only
+    # @param transport [ChatTransport, nil] when non-+nil+ and
+    #   structurally different from the current transport, switch to
+    #   it before running the turn (see above); +nil+ keeps the
+    #   current model
     # @raise [ArgumentError] if +user_message+ is +nil+, empty,
     #   or contains only whitespace — an empty turn would poison
     #   the chat history and burn a step budget on nothing
     # @raise [Control::Cancellable::Cancelled] if the registered
     #   {Control::Cancellable} was triggered during the turn;
     #   the listener stream sees an {Event::Cancelled} first
+    # @raise [Control::StepLimit::Exceeded] if the step budget
+    #   tripped and its policy is +:raise+ (the default)
     # @return [nil]
-    def run_loop(user_message:)
+    def run_loop(user_message:, transport: nil)
       raise ArgumentError, "user_message must not be blank, got #{user_message.inspect}" \
         if user_message.nil? || user_message.to_s.strip.empty?
+      apply_transport!(transport) if transport
       @synth_answer = nil
       @step_limit&.reset!
       @cancellable&.reset!
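The reason for confining the model switch to the top of `run_loop` can be seen in a toy version. This sketch assumes plain Hashes as transports and an in-memory event array; `SketchAgent` and its event tuples are hypothetical, and the structural-equality check inside `apply_transport!` is an assumption drawn from the `@param transport` wording rather than from the method body, which this part of the diff does not show.

```ruby
# Hypothetical agent; Hashes stand in for transport bundles.
class SketchAgent
  attr_reader :transport, :events

  def initialize(transport)
    @transport = transport
    @events = []
  end

  # The switch happens here, on the caller's thread, before the turn starts.
  def run_loop(user_message:, transport: nil)
    apply_transport!(transport) if transport
    @events << [:user_turn, user_message]
    nil
  end

  private

  # No-op when the new bundle is structurally equal to the current one.
  def apply_transport!(new_transport)
    return if new_transport == @transport

    @transport = new_transport
    @events << [:model_switched, new_transport[:model]]
    @events << [:context_cap, new_transport[:context_window]]
  end
end
```

With no public setter, the only way to change the model is to pass `transport:` alongside a turn, so the swap and the turn are always serialized on the same thread.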
@@ -594,7 +382,7 @@ module Pikuri
       @listeners.emit(Event::UserTurn.new(content: user_message, mid_loop: false))
       dispatch_ext_on_user_message(user_message)
       if @streaming
-        @chat.complete(&self.class.streaming_block(listeners: @listeners, cancellable: @cancellable))
+        @chat.complete(&streaming_block)
       else
         @chat.complete
       end
@@ -603,42 +391,16 @@ module Pikuri
       @listeners.emit(Event::Cancelled.new)
       raise
     rescue Control::StepLimit::Exceeded => e
-      @listeners.emit(Event::FallbackNotice.new(
-                        reason: "agent exhausted #{e.max_steps} steps; synthesizing answer from gathered evidence"
-                      ))
+      raise unless @step_limit&.on_exhausted == :synthesize
-      # Synth runs under this agent's identity but on a fresh
-      # chat with a different system prompt, so it gets a
-      # distinct +_synthesizer+ suffix on the id — same +_+
-      # separator the sub-agent generator uses, so main becomes
-      # +"synthesizer"+ and a sub-agent +"researcher 0"+ becomes
-      # +"researcher 0_synthesizer"+. Any +TokenLog+ in the list
-      # tags the synth's prompt under that bracket so it's
-      # obvious from the log which turns were the rescue rather
-      # than the original loop.
-      synth_id = @id.empty? ? 'synthesizer' : "#{@id}_synthesizer"
-      synth_chat = RubyLLM.chat(**@transport.to_h)
-      # Defensive step limit on the synth: the synth has no
-      # tools so it should never trip +before_tool_call+, but
-      # guarding the budget anyway means a buggy provider that
-      # somehow returns a tool call doesn't loop forever.
-      synth_step_limit = @step_limit && Control::StepLimit.new(max: 1)
-      @synth_answer = Synthesizer.run(
-        chat: synth_chat,
-        parent_messages: @chat.messages,
-        user_message: user_message,
-        listeners: @listeners.for_sub_agent(id: synth_id),
-        step_limit: synth_step_limit,
-        cancellable: @cancellable,
-        streaming: @streaming
-      )
+      @synth_answer = Synthesizer.run_synthesizer(@extension_context, @chat.messages, user_message)
       nil
     end
     # Release agent-owned resources. Fires every handler registered
     # via {Configurator#on_close} (during the +Agent.new+ block) and
-    # {#on_close} (during {Extension#bind} or any post-construction
-    # call), in LIFO order — matches Ruby +ensure+-block semantics
+    # {ExtensionContext#on_close} (during {Extension#bind} or any
+    # later hook), in LIFO order — matches Ruby +ensure+-block semantics
     # so handlers registered later (which may depend on handlers
     # registered earlier) tear down first. Each handler runs inside
     # its own +rescue+; an exception is logged via
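The close semantics described above (LIFO order, one `rescue` per handler, safe to call twice) can be sketched as follows. `SketchCloser` is hypothetical; it collects error messages in an array where the real code logs them, and the idempotence guard is an assumption based on the "idempotent" wording in the removed `on_close` doc comment.

```ruby
# Hypothetical holder for close handlers; not pikuri-core's Agent.
class SketchCloser
  attr_reader :errors

  def initialize
    @handlers = []
    @errors = []
    @closed = false
  end

  # Registers a cleanup block to run at close time.
  def on_close(&blk)
    raise ArgumentError, 'on_close requires a block' unless blk

    @handlers << blk
    nil
  end

  # Runs handlers newest-first; one failing handler does not stop the rest.
  def close
    return if @closed

    @closed = true
    @handlers.reverse_each do |handler|
      handler.call
    rescue StandardError => e
      @errors << e.message
    end
    nil
  end
end
```

Running newest-first mirrors nested `ensure` blocks: a handler registered later may depend on a resource registered earlier, so it has to tear down before that resource does.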
@@ -661,46 +423,6 @@ module Pikuri
       end
     end
-    # Register a handler called by {#close}. Symmetric to
-    # {Configurator#on_close} — same LIFO + per-handler-rescue +
-    # idempotent semantics — but available post-construction, so
-    # an {Extension}'s +bind(agent)+ can install per-agent cleanup
-    # that's keyed to this specific agent rather than the parent.
-    #
-    # @yield called with no arguments at close time
-    # @return [void]
-    def on_close(&blk)
-      raise ArgumentError, 'on_close requires a block' unless block_given?
-      @on_close_handlers << blk
-      nil
-    end
-    # Register a raw +RubyLLM::Tool+ subclass on this agent's
-    # underlying chat, bypassing the {Pikuri::Tool} strict-validation
-    # seam. Sole intended caller: {Mcp::Servers::Connect}, which uses
-    # this to lazy-add MCP-exposed tools after the LLM invokes
-    # +mcp_connect+ in a turn.
-    #
-    # The +internal_+ prefix is the warning: native pikuri tools
-    # should go through {Pikuri::Tool} so they get
-    # {Tool::Parameters} validation and the LLM-actionable
-    # +"Error: ..."+ contract. MCP tools deliberately don't — see
-    # IDEAS.md §"v1 implementation shape" / "MCP tools bypass
-    # +Pikuri::Tool+ entirely."
-    #
-    # The added tool does NOT enter +@tools+, only +@chat+'s tool
-    # list. Sub-agents (the +agent+ tool from +pikuri-subagents+)
-    # therefore cannot snapshot it — which is the whole point:
-    # activation is strictly per-agent, see IDEAS.md §"Per-agent
-    # activation, no propagation".
-    #
-    # @param ruby_llm_tool [Class] subclass of +RubyLLM::Tool+
-    # @return [void]
-    def internal_add_tool(ruby_llm_tool)
-      @chat.with_tool(ruby_llm_tool)
-    end
     # Short, single-line config dump suitable for a startup
     # banner or a debug print.
     #
@@ -748,42 +470,164 @@ module Pikuri
       end
       @extensions = configurator.extensions.dup
-      @chat = RubyLLM.chat(**@transport.to_h)
+      @chat = build_chat(@transport)
       @chat.with_instructions(@system_prompt)
       @tools.each { |t| @chat.with_tool(t.to_ruby_llm_tool) }
-      @context_window_cap = ContextWindowDetector.new(
-        override: @context_window,
-        ruby_llm_reported: @chat.model.context_window,
-        llama_probe_url: @llama_probe_url,
-        model_id: @chat.model.id
-      ).detect
+      # Wire @chat for pikuri's event stream and controls — the
+      # three message-level registered callbacks (+after_message+,
+      # +before_tool_call+, +after_tool_result+). The per-chunk
+      # streaming callback is separate because ruby_llm takes it as
+      # a block to +Chat#complete+ rather than a registered hook —
+      # see {#streaming_block}. Together with the +@listeners.emit+
+      # calls in {#run_loop} / {#dispatch_ext_on_user_message} this
+      # is the complete "which callback emits which event" map.
+      @chat.after_message do |msg|
+        emit_after_message(msg)
+      end
+      @chat.before_tool_call do |tc|
+        @listeners.emit(Event::ToolCall.new(name: tc.name, arguments: tc.arguments))
+        @step_limit&.tick!
+        @cancellable&.check!
+      end
+      @chat.after_tool_result do |result|
+        @listeners.emit(Event::ToolResult.new(content: result))
+        drain_interloper if @interloper
+      end
-      self.class.wire_chat(
-        @chat,
-        listeners: @listeners,
-        step_limit: @step_limit,
-        cancellable: @cancellable,
-        interloper: @interloper,
-        on_user_message: method(:dispatch_ext_on_user_message)
+      # Context-window cap: lets every listener that cares (notably
+      # TokenLog) pick the value off the stream before any Tokens
+      # event arrives. Re-fires on each model switch (see
+      # {#apply_transport!}).
+      detect_and_emit_context_cap!
+      # The runtime capability facade — constructed once, after the
+      # chat and listener list are final, and handed to every
+      # extension's #bind / #on_user_message. The ONLY object that
+      # grants emission / raw-tool-registration / close-handler
+      # capabilities; the Agent itself exposes no public path to
+      # them. See {ExtensionContext}.
+      @extension_context = ExtensionContext.new(
+        agent: self, chat: @chat, listeners: @listeners,
+        on_close_sink: @on_close_handlers
       )
-      # One-shot context-window cap: lets every listener that
-      # cares (notably TokenLog) pick the value off the stream
-      # before any Tokens event arrives.
+      # Bind sweep — each extension gets its chance to install
+      # per-agent state (dynamic tools via
+      # {ExtensionContext#add_raw_tool}, per-agent close hooks via
+      # {ExtensionContext#on_close}, domain-event wiring via
+      # {ExtensionContext#emit_event}, etc.) now that the chat is
+      # fully wired. See IDEAS.md §"Extension protocol design" for
+      # what #configure vs #bind are each for.
+      @extensions.each { |ext| ext.bind(@extension_context) }
+    end
+    # Resolve the context-window cap and announce it on the listener
+    # stream as an {Event::ContextCap}. The transport's explicit
+    # +context_window+ wins verbatim; otherwise {ContextWindowDetector}
+    # probes the server (yielding +nil+ for a non-llama one). Because the
+    # cap rides {ChatTransport}, a model switch resolves the *new*
+    # transport's cap — explicit caps don't bleed across models. Shared
+    # by {#run_configure} (construction) and {#apply_transport!} (each
+    # model switch) so "how the cap is resolved and emitted" lives in
+    # one place.
+    #
+    # @return [void]
+    def detect_and_emit_context_cap!
+      # Probe the server this transport actually targets — after a
+      # cross-server switch the chat's connection points at
+      # +@transport.api_base+, but the process-global config the
+      # detector defaults to still names the *old* server, so derive the
+      # base from the transport (falling back to the global base for a
+      # transport that rides it).
+      @context_window_cap = @transport.context_window ||
+                            ContextWindowDetector.detect(
+                              @transport,
+                              openai_base: @transport.api_base || RubyLLM.config.openai_api_base
+                            )
       @listeners.emit(Event::ContextCap.new(cap: @context_window_cap))
+    end
-      # Bind sweep — each extension gets its chance to install
-      # per-agent state (dynamic tools via #internal_add_tool,
-      # per-agent close hooks via #on_close, etc.) now that the
-      # chat is fully wired. See IDEAS.md §"Extension protocol
-      # design" for what #configure vs #bind are each for.
-      @extensions.each { |ext| ext.bind(self) }
+    # Switch the underlying chat to +transport+ when it differs from
+    # the current one. Called only from the top of {#run_loop} — see
+    # that method's "Switching models mid-conversation" section for why
+    # the swap is confined to the loop's own thread.
+    #
+    # Mirrors {#initialize}'s +nil+-model fill before the structural
+    # comparison, so a +transport+ that defers its model to the default
+    # doesn't read as "different" and switch spuriously.
+    # +Chat#with_model+ swaps only the model / provider / connection,
+    # leaving +@chat+'s message history and registered callbacks
+    # intact, so the conversation continues seamlessly under the new
+    # model. Emits {Event::ModelSwitched} (the narration: old → new
+    # transport, unformatted, for the chrome to present) then re-resolves
+    # and re-emits the cap via {#detect_and_emit_context_cap!}.
+    #
+    # == Cross-server switches
+    #
+    # +with_model+ alone re-resolves against the chat's *existing*
+    # connection, so it can only move between models on one server. When
+    # either the new or the old transport overrides the connection
+    # (+ChatTransport#connection_overrides?+ — a different +api_base+ /
+    # +api_key+), the swap first installs a fresh +RubyLLM::Context+ via
+    # +Chat#with_context+ (which re-points the connection and re-resolves
+    # the model in place, again preserving history + callbacks) before
+    # +with_model+ lands the new model id. The "old overrode" half of
+    # the guard handles switching *back* to a global-config model: the
+    # rebuilt context dups the process-global config, resetting the
+    # connection the previous override installed.
+    #
+    # @param transport [ChatTransport] the model to switch to
+    # @return [void]
+    def apply_transport!(transport)
+      filled = transport.model ? transport : transport.with(model: RubyLLM.config.default_model)
+      return if filled == @transport
+      old = @transport
+      @chat.with_context(build_context(filled)) if filled.connection_overrides? || old.connection_overrides?
+      @chat.with_model(filled.model, provider: filled.provider, assume_exists: filled.assume_model_exists)
+      @transport = filled
+      @listeners.emit(Event::ModelSwitched.new(from: old, to: filled))
+      detect_and_emit_context_cap!
+    end
+    # Build the chat for +transport+: through a dedicated
+    # +RubyLLM::Context+ when it overrides the connection, else through
+    # the process-global +RubyLLM.chat+ (which the construction-time
+    # path has always used).
+    #
+    # @param transport [ChatTransport]
+    # @return [RubyLLM::Chat]
+    def build_chat(transport)
+      if transport.connection_overrides?
+        build_context(transport).chat(**transport.chat_kwargs)
+      else
+        RubyLLM.chat(**transport.chat_kwargs)
+      end
+    end
+    # A +RubyLLM::Context+ carrying +transport+'s connection overrides
+    # mapped onto the provider's ruby_llm config slots
+    # (+#{provider}_api_base+ / +#{provider}_api_key+). +RubyLLM.context+
+    # dups the process-global config, so an absent override inherits the
+    # global value; a transport with no overrides yields a plain dup
+    # (used by {#apply_transport!} to reset a prior override). The
+    # +ChatTransport+ guarantees a non-+nil+ +provider+ whenever an
+    # override is set, so the slot name is always resolvable.
+    #
+    # @param transport [ChatTransport]
+    # @return [RubyLLM::Context]
+    def build_context(transport)
+      slug = transport.provider
+      RubyLLM.context do |c|
+        c.public_send("#{slug}_api_base=", transport.api_base) unless transport.api_base.nil?
+        c.public_send("#{slug}_api_key=", transport.api_key) unless transport.api_key.nil?
+      end
     end
     # Fire the per-turn {Extension#on_user_message} hook on every
     # extension that defines it, appending any returned
-    # +<memory-context>+ block to the chat as a +role: :system+
+    # String text block to the chat as a +role: :system+
     # message right after the user turn it annotates (callers append
     # the +:user+ message first; this runs last). The system role is
     # load-bearing — it tags the block as recalled reference (not new
@@ -797,9 +641,8 @@ module Pikuri
     #
     # Private and the single place the chat log grows by a memory
     # block — keeps "what mutates the log, when" one grep in this
-    # file. Fired from {#run_loop} (initial turn) and, via the
-    # +on_user_message:+ proc threaded into {.wire_chat}, from
-    # {.drain_interloper} (mid-loop interlopers). Called on every
+    # file. Fired from {#run_loop} (initial turn) and from
+    # {#drain_interloper} (mid-loop interlopers). Called on every
     # extension unconditionally — same as {Extension#configure} /
     # {Extension#bind}: the hook is part of the protocol and the
     # {Extension} module supplies a no-op default, so any extension
@@ -811,7 +654,7 @@ module Pikuri
     # @return [void]
     def dispatch_ext_on_user_message(content)
       @extensions.each do |ext|
-        message = ext.on_user_message(self, content)
+        message = ext.on_user_message(@extension_context, content)
         next unless message.is_a?(String) && !message.strip.empty?
         block = message.strip
@@ -820,5 +663,117 @@ module Pikuri
       end
       nil
     end
+    # Build the per-chunk streaming block passed to +Chat#complete+.
+    # Each invocation of the returned proc converts one
+    # +RubyLLM::Chunk+ into zero, one, or two delta events
+    # ({Event::ThinkingDelta} / {Event::AssistantDelta}) on
+    # +@listeners+. Tool-call chunks are intentionally ignored —
+    # partial JSON has no useful rendering; the assembled
+    # +tool_calls+ surface through {Event::ToolCall} once the
+    # message completes.
+    #
+    # == Cancellation polling
+    #
+    # When +@cancellable+ is non-nil, {Control::Cancellable#check!}
+    # fires *before* each chunk's emit. The +before_tool_call+
+    # wiring in {#run_configure} only fires when the model requests a
+    # tool, which leaves a no-tool turn (e.g. a plain greeting)
+    # with zero cancellation points — Ctrl+C trips the flag but
+    # nothing reads it. Polling on every streamed chunk closes
+    # that gap: +Cancellable#check!+ raises on the
+    # next chunk delivered after the flag flips, the exception
+    # propagates out through ruby_llm's streaming path
+    # (+Chat#complete+ doesn't rescue), and {#run_loop} catches it,
+    # emits {Event::Cancelled}, and re-raises. The pre-emit
+    # ordering is deliberate: a chunk that arrives after a cancel
+    # request shouldn't render — the user has said stop.
+    #
+    # @return [Proc] a +-> (chunk) { ... }+ proc suitable for
+    #   passing to +Chat#complete+ with +&+
+    def streaming_block
+      ->(chunk) {
+        @cancellable&.check!
+        emit_chunk(chunk)
+      }
+    end
+    # Normalize a +RubyLLM::Chat+ +after_message+ payload into
+    # zero, one, or two {Event} variants (+Thinking+ and/or
+    # +Assistant+) plus one {Event::Tokens} for the usage block.
+    # Empty thinking / empty content are filtered here so
+    # listeners never see vacuous events. Non-assistant roles
+    # (e.g. tool-role messages echoed back through
+    # +after_message+) are skipped entirely.
+    #
+    # +msg+ is a +RubyLLM::Message+. Beyond +role+, +content+,
+    # +thinking+, and the +*_tokens+ accessors used here, it also
+    # carries +msg.tool_calls+ on assistant turns that requested
+    # one and +msg.raw+ for the unparsed provider payload.
+    #
+    # @param msg [RubyLLM::Message]
+    # @return [void]
+    def emit_after_message(msg)
+      return unless msg.role == :assistant
+      text = msg.thinking&.text
+      @listeners.emit(Event::Thinking.new(content: text)) if text && !text.empty?
+      content = msg.content
+      @listeners.emit(Event::Assistant.new(content: content)) if content.is_a?(String) && !content.empty?
+      @listeners.emit(Event::Tokens.new(
+                        input: msg.input_tokens,
+                        output: msg.output_tokens,
+                        cached: msg.cached_tokens,
+                        cache_creation: msg.cache_creation_tokens,
+                        thinking: msg.thinking_tokens,
+                        model_id: msg.model_id
+                      ))
+    end
+    # Normalize a +RubyLLM::Chunk+ from a streaming completion
+    # into zero, one, or two delta events
+    # ({Event::ThinkingDelta} / {Event::AssistantDelta}). Empty
+    # +thinking.text+ and empty +content+ are filtered here so
+    # listeners never see vacuous fragments. Tool-call deltas are
+    # intentionally skipped — see {#streaming_block}.
+    #
+    # +chunk+ is a +RubyLLM::Chunk+ (subclass of +RubyLLM::Message+),
+    # so the same +.thinking+ / +.content+ accessors used in
+    # {#emit_after_message} apply.
+    #
+    # @param chunk [RubyLLM::Chunk]
+    # @return [void]
+    def emit_chunk(chunk)
+      thinking = chunk.thinking&.text
+      @listeners.emit(Event::ThinkingDelta.new(content: thinking)) if thinking && !thinking.empty?
+      content = chunk.content
+      @listeners.emit(Event::AssistantDelta.new(content: content)) if content.is_a?(String) && !content.empty?
+    end
+    # Drain the interloper queue: for each pending item, append a
+    # +role: :user+ message to the chat history so the next
+    # round-trip sees it, emit an {Event::UserTurn} with
+    # +mid_loop: true+ to the listener stream so renderers see the
+    # injection, then run the per-turn {Extension#on_user_message}
+    # dispatch (so mid-loop injections are prefetched + recorded
+    # exactly like initial turns).
+    #
+    # The dispatch runs *after* the +:user+ append so any
+    # +<memory-context>+ it injects lands as a +:system+ message
+    # right behind the user turn it annotates — the same
+    # append-at-the-tail ordering {#run_loop} produces for initial
+    # turns.
+    #
+    # @return [void]
+    def drain_interloper
+      @interloper.drain!.each do |content|
+        @chat.add_message(role: :user, content: content)
+        @listeners.emit(Event::UserTurn.new(content: content, mid_loop: true))
+        dispatch_ext_on_user_message(content)
+      end
+    end
   end
 end
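The "Cancellation polling" note on `#streaming_block` above can be illustrated in isolation. The following is a minimal sketch, not pikuri's implementation: `Cancelled`, `CancelFlag`, and `run_stream` are hypothetical stand-ins for `Control::Cancellable`, the ruby_llm streaming path, and `#run_loop`, and plain strings stand in for `RubyLLM::Chunk`. It only shows why checking the flag before each emit gives a no-tool turn a cancellation point.

```ruby
# Hypothetical stand-ins for illustration; none of these are pikuri classes.
class Cancelled < StandardError; end

class CancelFlag
  def initialize
    @requested = false
  end

  # Flip the flag (what a Ctrl+C handler would do).
  def request!
    @requested = true
  end

  # Raise if a cancel has been requested; otherwise do nothing.
  def check!
    raise Cancelled, 'cancel requested' if @requested
  end
end

# Build a per-chunk block: poll the flag first, then emit.
# A nil flag means "no cancellation configured", mirroring `&.`.
def streaming_block(flag, sink)
  lambda do |chunk|
    flag&.check!                      # pre-emit: a late chunk must not render
    sink << chunk unless chunk.empty? # skip vacuous fragments
  end
end

# Feed chunks through the block, optionally requesting a cancel just
# before the chunk at index `cancel_after` is delivered.
def run_stream(chunks, flag, cancel_after: nil)
  rendered = []
  block = streaming_block(flag, rendered)
  chunks.each_with_index do |chunk, i|
    flag.request! if cancel_after == i
    block.call(chunk)
  end
  [:done, rendered]
rescue Cancelled
  [:cancelled, rendered]
end

p run_stream(%w[Hel lo !], CancelFlag.new, cancel_after: 1)
# prints [:cancelled, ["Hel"]]
```

Because the check runs before the emit, the chunk that arrives after the cancel request is never rendered, which matches the ordering the comment describes.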