pikuri-core 0.0.6 → 0.0.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25)

checksums.yaml +4 -4
data/README.md +5 -3
data/lib/pikuri/agent/chat_transport.rb +135 -11
data/lib/pikuri/agent/configurator.rb +4 -4
data/lib/pikuri/agent/context_window_detector.rb +103 -52
data/lib/pikuri/agent/control/step_limit.rb +39 -7
data/lib/pikuri/agent/event.rb +43 -16
data/lib/pikuri/agent/extension.rb +31 -17
data/lib/pikuri/agent/extension_context.rb +147 -0
data/lib/pikuri/agent/listener/terminal.rb +13 -2
data/lib/pikuri/agent/listener/token_log.rb +60 -13
data/lib/pikuri/agent/listener.rb +12 -5
data/lib/pikuri/agent/listener_list.rb +7 -17
data/lib/pikuri/agent/synthesizer.rb +93 -67
data/lib/pikuri/agent.rb +358 -403
data/lib/pikuri/sanitizer.rb +179 -0
data/lib/pikuri/tool/parameters.rb +65 -2
data/lib/pikuri/tool/search/brave.rb +32 -18
data/lib/pikuri/tool/search/duckduckgo.rb +18 -7
data/lib/pikuri/tool/search/engines.rb +72 -49
data/lib/pikuri/tool/search/exa.rb +34 -22
data/lib/pikuri/tool/web_search.rb +45 -26
data/lib/pikuri/version.rb +1 -1
data/lib/pikuri-core.rb +11 -9
metadata +5 -6

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: ac822a7bd46228f2eea2994c2e1428e3aa90c269e6ebafd603474fb630ba34ee
-  data.tar.gz: 00c69d139bc38c1a881bf87980970672517db51f59469aef266009a803a874db
+  metadata.gz: 3b034cfd9a32f43694e474444155ea4e18c0c728449f50f33755a789b4879ab6
+  data.tar.gz: 0fdff7d50e07f52f9063b240db87e67d8bb93ac85a09cbb56b3caace3232478c
 SHA512:
-  metadata.gz: d148d78b2027d747ef10f4dd7a19252f66bb1b5f99e8eef763149bf3e93ae608a3f7cf739b02ed4d92ab3ab39d8fe6888615a5acaf90dcb44ca783058a95a716
-  data.tar.gz: 32ef75bbd6d825970e5a1e6b5e27cf0fc812980e87e5ddd59b14f2c10772c91cf93003c2ebe2c9501f5e9682e726f77c4c387c7c83b467b9aa57dc91a9a178f0
+  metadata.gz: 25ac79d6951ce4574d6400742bec3603306c72ca334786a13c34c3a3a5ab48de2735be5b5f51b1b6a25aa647c8f3d36da2f53d3bc97072ea276528464d6d4e16
+  data.tar.gz: 60100a4ab334e08066b29057d0eea4003a6c0db67947cf17a81484308dcd305ef08869b740965d6e81be246fd3ca964ad5afb2ca668bc101eaa867ed63c1bc02

data/README.md CHANGED

@@ -12,8 +12,10 @@ AI-assistant toolkit:
   token accounting, and structured capture.
 - Controls (`StepLimit`, `Cancellable`, `Interloper`) for budget
   enforcement + cancellation.
-- Four stateless bundled tools: `CALCULATOR`, `WEB_SEARCH`,
-  `WEB_SCRAPE`, `FETCH`.
+- Three stateless bundled tools (`CALCULATOR`, `WEB_SCRAPE`,
+  `FETCH`) plus a host-configured web-search tool built via
+  `Pikuri::Tool::WebSearch.build` (provider keys passed in, never
+  read from the environment).
 - A demo binary, `bin/pikuri-chat`.
 Extensions (skills, MCP, workspace, coding stack, named-agent
@@ -57,7 +59,7 @@ agent = Pikuri::Agent.new(
   step_limit: Pikuri::Agent::Control::StepLimit.new(max: 20)
 ) do |c|
   c.add_tool Pikuri::Tool::CALCULATOR
-  c.add_tool Pikuri::Tool::WEB_SEARCH
+  c.add_tool Pikuri::Tool::WebSearch.build
   c.add_listener Pikuri::Agent::Listener::Terminal.new
 end
 agent.run_loop(user_message: 'What is 17 * 23?')

data/lib/pikuri/agent/chat_transport.rb CHANGED

@@ -2,18 +2,32 @@
 module Pikuri
   class Agent
-    # The trio of arguments that has to travel together to +RubyLLM.chat+
-    # for model resolution to come out the same on every construction:
-    # the model id, the provider hint, and the registry-bypass flag.
+    # Everything that has to travel together for a chat to resolve to
+    # the same model *on the same server* on every construction: the
+    # model id, the provider hint, the registry-bypass flag, and — when
+    # the model lives on a server other than the process-global
+    # +RubyLLM.config+ default — that server's base URL and API key.
     #
     # Bundling them is structural protection against a recurring bug
     # class — every forwarding site (the synthesizer rescue in
     # {Agent#run_loop}, the +agent+ tool from +pikuri-subagents+
-    # spawning a sub-agent) used to pass the three individually, and
-    # dropping one routed the spawned chat to a different server or
-    # raised +RubyLLM::ModelNotFoundError+ on the unknown model id.
-    # With a single value object the call site can't silently miss a
-    # field.
+    # spawning a sub-agent, a mid-conversation model switch) used to
+    # pass the resolution fields individually, and dropping one routed
+    # the chat to a different server or raised
+    # +RubyLLM::ModelNotFoundError+ on the unknown model id. With a
+    # single value object the call site can't silently miss a field.
+    #
+    # == Why +api_base+ / +api_key+ live here
+    #
+    # +RubyLLM::Chat#with_model+ swaps only the model/provider against
+    # the chat's *existing* connection config, so switching to a model
+    # on a different server (a small local llama.cpp vs a big cloud
+    # model) needs the connection to travel with the model — otherwise
+    # the new model id is sent to the old server's URL with the old
+    # key. {Agent} maps these two generic fields onto the provider's
+    # ruby_llm config slots (+#{provider}_api_base+ /
+    # +#{provider}_api_key+) via a per-chat +RubyLLM::Context+; both are
+    # +nil+ for a transport that rides the process-global config.
     #
     # Pure data carrier: no +RubyLLM+ references here, so the seam stays
     # in {Agent}, +bin/pikuri-chat+, and {Tool}.
@@ -25,18 +39,128 @@ module Pikuri
     #   @return [Symbol, nil] forwarded to +RubyLLM.chat+. Required
     #     together with +assume_model_exists+ when pointing at a local
     #     OpenAI-compatible server (llama.cpp, gpustack, ...) whose model
-    #     ids are not in ruby_llm's bundled registry.
+    #     ids are not in ruby_llm's bundled registry; required whenever
+    #     +api_base+ / +api_key+ is set (it names the config slots).
     # @!attribute [r] assume_model_exists
     #   @return [Boolean] forwarded to +RubyLLM.chat+; +true+ skips
     #     ruby_llm's registry lookup and trusts the supplied model id.
     #     Requires +provider+.
-    class ChatTransport < Data.define(:model, :provider, :assume_model_exists)
+    # @!attribute [r] api_base
+    #   @return [String, nil] connection base URL for this model's
+    #     server (e.g. +http://localhost:8080/v1+). +nil+ rides the
+    #     process-global +RubyLLM.config+ base. Mapped to the provider's
+    #     +#{provider}_api_base+ slot by {Agent}.
+    # @!attribute [r] api_key
+    #   @return [String, nil] API key for this model's server. +nil+
+    #     rides the process-global config key. Mapped to the provider's
+    #     +#{provider}_api_key+ slot by {Agent}. Redacted in {#inspect}
+    #     so it never leaks into a log line or backtrace.
+    # @!attribute [r] context_window
+    #   @return [Integer, nil] explicit context-window cap for this
+    #     model on this server, or +nil+ to defer to
+    #     {ContextWindowDetector}'s probe. Travels with the model
+    #     because the cap *is* a per-model-per-server property: a
+    #     {Ctrl+P}-style switch to a different transport must carry its
+    #     own cap, not inherit the previous model's. Never sent to
+    #     ruby_llm (it is neither a {#chat_kwargs} entry nor a
+    #     connection slot) — pure pikuri metadata read by
+    #     {Agent#detect_and_emit_context_cap!}. The cap-inheritance
+    #     channel too: the +agent+ tool from +pikuri-subagents+ and the
+    #     synthesizer hand a spawned agent +parent.transport.with(
+    #     context_window: parent.context_window_cap)+ so the parent's
+    #     *resolved* cap (explicit or probed) rides along without a
+    #     re-probe.
+    class ChatTransport < Data.define(:model, :provider, :assume_model_exists, :api_base, :api_key, :context_window)
+      # Build an +:openai+-provider transport for an OpenAI-compatible
+      # server (a local llama.cpp, a cloud endpoint, ...), carrying that
+      # server's connection so the agent rides a per-chat
+      # +RubyLLM::Context+ instead of the process-global +RubyLLM.config+.
+      # This is the host-boot factory the +bin/pikuri-*+ demos use in
+      # place of +RubyLLM.configure+ — one isolated connection per agent,
+      # so several agents pointed at different servers (and different
+      # keys) don't stomp a shared global.
+      #
+      # +server+ is the bare server origin; a trailing +/v1+ (the
+      # OpenAI-compatible suffix ruby_llm appends to reach
+      # +/v1/chat/completions+) is stripped and re-appended exactly once,
+      # so +https://api.x.ai+, +https://api.x.ai/v1+, and
+      # +https://api.x.ai/v1/+ all normalize to the same +.../v1+ base.
+      # Without this, a +server+ value that already ended in +/v1+ would
+      # double to +/v1/v1+ and every request would 404.
+      #
+      # @param server [String] server origin, with or without a trailing
+      #   +/v1+, e.g. +"http://localhost:8080"+ or +"https://api.x.ai/v1"+
+      # @param model [String] model id served there, trusted verbatim
+      #   (+assume_model_exists+ is +true+, so it need not appear in
+      #   ruby_llm's bundled registry)
+      # @param api_key [String] API key for the server; the conventional
+      #   +"not-needed"+ placeholder for a keyless local server
+      # @param context_window [Integer, nil] explicit context-window cap
+      #   for this model, or +nil+ to defer to {ContextWindowDetector}'s
+      #   +/props+ probe (the right default for a local llama.cpp, which
+      #   reports its launched +n_ctx+; the right *override* for a cloud
+      #   server the probe can't reach, e.g. a 2M-window model on x.ai)
+      # @return [ChatTransport] a transport whose +api_base+ is the
+      #   normalized +.../v1+ URL and whose +api_key+ is +api_key+
+      def self.from_openai_server(server:, model:, api_key: 'not-needed', context_window: nil)
+        base = server.to_s.strip.chomp('/').delete_suffix('/v1')
+        new(
+          model: model,
+          provider: :openai,
+          assume_model_exists: true,
+          api_base: "#{base}/v1",
+          api_key: api_key,
+          context_window: context_window
+        )
+      end
       # @param model [String, nil]
       # @param provider [Symbol, nil]
       # @param assume_model_exists [Boolean]
-      def initialize(model:, provider: nil, assume_model_exists: false)
+      # @param api_base [String, nil]
+      # @param api_key [String, nil]
+      # @param context_window [Integer, nil]
+      # @raise [ArgumentError] if +api_base+ or +api_key+ is set without
+      #   a +provider+ (the provider names the config slots the
+      #   connection overrides map onto)
+      def initialize(model:, provider: nil, assume_model_exists: false,
+                     api_base: nil, api_key: nil, context_window: nil)
+        if (api_base || api_key) && provider.nil?
+          raise ArgumentError, "api_base/api_key require a provider, got #{provider.inspect}"
+        end
         super
       end
+      # The model-resolution kwargs to spread into +RubyLLM.chat+ /
+      # +RubyLLM::Context#chat+. Excludes the connection fields — those
+      # configure the +Context+ the chat is built from, not the +chat+
+      # call itself.
+      #
+      # @return [Hash{Symbol => String, Symbol, Boolean, nil}]
+      def chat_kwargs
+        { model: model, provider: provider, assume_model_exists: assume_model_exists }
+      end
+      # Whether this transport overrides the process-global connection
+      # (and so needs a dedicated +RubyLLM::Context+).
+      #
+      # @return [Boolean]
+      def connection_overrides?
+        !api_base.nil? || !api_key.nil?
+      end
+      # Default +Data#inspect+ would print +api_key+ verbatim, leaking
+      # the secret into any log line, +to_s+ interpolation, or backtrace
+      # that touches the transport. Redact it.
+      #
+      # @return [String]
+      def inspect
+        "#<#{self.class} model=#{model.inspect} provider=#{provider.inspect} " \
+          "assume_model_exists=#{assume_model_exists} api_base=#{api_base.inspect} " \
+          "api_key=#{api_key.nil? ? 'nil' : '[REDACTED]'} context_window=#{context_window.inspect}>"
+      end
+      alias to_s inspect
     end
   end
 end
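
The `/v1` normalization and key redaction introduced in this hunk can be tried in isolation. The following is a minimal stand-alone sketch, assuming Ruby 3.2+ (for `Data.define`). `SketchTransport` is a hypothetical stand-in, not the gem's `ChatTransport`, and it keeps only the fields needed to show the two behaviors; `demo-model` and the key value are made-up placeholders.

```ruby
# Hypothetical stand-in for Pikuri::Agent::ChatTransport, reduced to the
# fields needed to show URL normalization and key redaction.
SketchTransport = Data.define(:model, :provider, :api_base, :api_key) do
  # Strip one trailing slash and one trailing "/v1", then re-append "/v1"
  # exactly once, mirroring the normalization shown in the diff.
  def self.from_openai_server(server:, model:, api_key: 'not-needed')
    base = server.to_s.strip.chomp('/').delete_suffix('/v1')
    new(model: model, provider: :openai, api_base: "#{base}/v1", api_key: api_key)
  end

  # Never print the key itself; only reveal whether one is set.
  def inspect
    "#<SketchTransport model=#{model.inspect} api_base=#{api_base.inspect} " \
      "api_key=#{api_key.nil? ? 'nil' : '[REDACTED]'}>"
  end
  alias_method :to_s, :inspect
end

# All three spellings of the same server should collapse to one base URL.
bases = ['https://api.x.ai', 'https://api.x.ai/v1', 'https://api.x.ai/v1/'].map do |server|
  SketchTransport.from_openai_server(server: server, model: 'demo-model').api_base
end
puts bases.uniq.inspect

# Printing the transport goes through to_s, so the key stays hidden.
puts SketchTransport.from_openai_server(
  server: 'http://localhost:8080', model: 'demo-model', api_key: 'secret'
)
```

Aliasing `to_s` to the redacted `inspect` matters because string interpolation calls `to_s`, which is the most likely route for a key to reach a log line.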

data/lib/pikuri/agent/configurator.rb CHANGED

@@ -16,7 +16,7 @@ module Pikuri
     #
     #   Pikuri::Agent.new(transport: ..., system_prompt: ...) do |c|
     #     c.add_listener Pikuri::Agent::Listener::Terminal.new
-    #     c.add_tool     Pikuri::Tool::WEB_SEARCH
+    #     c.add_tool     Pikuri::Tool::WebSearch.build
     #     c.add_tool     Pikuri::Tool::WEB_SCRAPE
     #     c.add_tool     Pikuri::Tool::FETCH
     #     c.add_extension Pikuri::Skill::Extension.new(catalog: catalog)
@@ -116,8 +116,8 @@ module Pikuri
       # @return [Array<#configure>] extension instances added via
       #   {#add_extension}, in declaration order. The Agent ctor
-      #   walks this list and calls +bind(self)+ on each after
-      #   wiring is complete.
+      #   walks this list after wiring is complete and calls +bind+
+      #   on each with the agent's {ExtensionContext}.
       attr_reader :extensions
       # @param transport [Agent::ChatTransport]
@@ -219,7 +219,7 @@ module Pikuri
       # Register an extension. The extension's +configure(self)+ is
       # called immediately so source-order matches execution-order.
-      # The instance is also retained for the +bind(agent)+ sweep
+      # The instance is also retained for the +bind(ctx)+ sweep
       # that runs at the end of {Agent#initialize}.
       #
       # Extensions must implement both +configure+ and +bind+. The

data/lib/pikuri/agent/context_window_detector.rb CHANGED

@@ -1,36 +1,55 @@
 # frozen_string_literal: true
+require 'ruby_llm'
 require 'faraday'
 require 'json'
 require 'cgi'
 module Pikuri
   class Agent
-    # Resolves the model's context-window cap from three sources, in order:
-    # an explicit override, the value ruby_llm reports for the model, or a
-    # llama.cpp +/props+ probe. Returns +nil+ if none of those produce a
-    # value.
+    # Resolves the model's context-window cap by asking the server that
+    # actually serves it. The only authoritative runtime source pikuri
+    # has is llama.cpp's non-standard +/props+ endpoint, which reports
+    # the server's *launched* +n_ctx+ (the real window — possibly
+    # smaller than the model's theoretical max, e.g. +llama-server -c
+    # 8192+ on a 128k model). Returns +nil+ — an honest "we don't
+    # know" — for anything else.
     #
-    # Used by {Agent#initialize} at construction time to feed
-    # {Listener::TokenLog} a cap it can render alongside the running
-    # context size (so the +ctx=12.2k/32.0k+ line tells the operator how
-    # close the conversation is to the limit).
+    # Used by {Agent#detect_and_emit_context_cap!} at construction and
+    # after every model switch to feed {Listener::TokenLog} a cap it can
+    # render alongside the running context size (so the +ctx=12.2k/32.0k+
+    # line tells the operator how close the conversation is to the limit).
+    # The caller prefers an explicit/inherited
+    # {ChatTransport#context_window} over this probe; this runs only when
+    # the transport carries none.
     #
-    # == Precedence
+    # == Why no ruby_llm registry source
     #
-    # 1. +override+ — the +Agent.new(context_window:)+ kwarg. Wins over
-    #    everything; an explicit value is the operator's statement of
-    #    truth.
-    # 2. +ruby_llm_reported+ — +RubyLLM::Model::Info#context_window+ from
-    #    {Agent#chat}'s resolved model. Populated for models in ruby_llm's
-    #    bundled registry (OpenAI, Anthropic, Gemini, …); +nil+ for custom
-    #    local model ids that fall through to +Model::Info.default+.
-    # 3. +llama_probe_url+ — HTTP GET against llama.cpp's non-standard
-    #    +/props+ endpoint. The server exposes the launched +n_ctx+ at
-    #    +default_generation_settings.n_ctx+ there. Probed only when the
-    #    first two are +nil+. Provider-specific to llama.cpp; the caller
-    #    (typically +bin/pikuri-chat+) derives the right URL from its configured
-    #    base.
+    # +RubyLLM::Model::Info#context_window+ is a static lookup in a
+    # bundled +models.json+ snapshot: +nil+ for every +assume_exists+
+    # local model id, +nil+ for anything newer than the snapshot, and —
+    # worst — a *frozen* value for known models, so a window the provider
+    # later bumped (256k → 1M) still reports the old number. A cap you
+    # have to caveat defeats the cap's only job (a number trustworthy
+    # enough to act on before +RubyLLM::ContextLengthExceededError+), so
+    # pikuri deliberately does not consult it. The probe (server truth)
+    # and an explicit {ChatTransport#context_window} (operator/parent
+    # truth) are the only two sources; absent both, the cap is +nil+.
+    #
+    # == The openai-provider gate + auto-derived URL
+    #
+    # The probe only makes sense against an OpenAI-compatible local
+    # server (llama.cpp), reached through ruby_llm's +:openai+ provider
+    # with a custom base. So {.detect} runs only when
+    # +transport.provider == :openai+ and derives the probe URL from the
+    # *same* +RubyLLM.config.openai_api_base+ the chat itself uses —
+    # +/props+ lives at the host root, NOT under +/v1+, so the +/v1+
+    # suffix is stripped. Deriving from the live config (rather than a
+    # URL passed in) means the probe can't target a different server than
+    # the chat. A bare +:openai+ pointed at real +api.openai.com+ gets
+    # one fast +/props+ 404 that degrades to +nil+ (the simple gate; not
+    # worth narrowing — you're already sending that server the whole
+    # conversation).
     #
     # == llama.cpp router mode
     #
@@ -62,48 +81,80 @@ module Pikuri
       LOGGER = Pikuri.logger_for('ContextWindowDetector')
       # Connect timeout in seconds for the llama.cpp +/props+ probe.
-      # Short on purpose: this runs synchronously during +Agent.new+ and
-      # a wedged server should not stall startup noticeably.
+      # Short on purpose: a server that isn't even listening should fail
+      # fast rather than stall +Agent+ construction.
       #
       # @return [Integer]
       OPEN_TIMEOUT = 2
-      # Read timeout in seconds for the llama.cpp +/props+ probe; matches
-      # {OPEN_TIMEOUT} for the same reason.
+      # Read timeout in seconds for the llama.cpp +/props+ probe.
+      # Generous on purpose, and the reason it differs from
+      # {OPEN_TIMEOUT}: a llama.cpp router answers +/props?model=<id>+
+      # only *after* spinning up that model's instance, and a cold model
+      # load can take 10+ seconds — which the next chat turn must wait
+      # for anyway. A read timeout shorter than the load would abandon
+      # the probe (and lose the cap) precisely when switching to a
+      # cold model. A server that accepts the connection but then hangs
+      # would stall the actual chat identically, so tolerating the wait
+      # here costs nothing extra.
       #
       # @return [Integer]
-      READ_TIMEOUT = 2
-      # @param override [Integer, nil] explicit cap from the caller; wins if
-      #   non-+nil+
-      # @param ruby_llm_reported [Integer, nil] value off
-      #   +RubyLLM::Chat#model.context_window+
-      # @param llama_probe_url [String, nil] full URL to llama.cpp +/props+;
-      #   +nil+ or empty string skips the probe
-      # @param model_id [String, nil] the chat model id, used only to
-      #   follow a llama.cpp router via +/props?model=<id>+ when the bare
-      #   probe reports +role: router+. +nil+ or empty disables that
-      #   second hop.
-      def initialize(override:, ruby_llm_reported:, llama_probe_url:, model_id: nil)
-        @override = override
-        @ruby_llm_reported = ruby_llm_reported
-        @llama_probe_url = llama_probe_url
-        @model_id = model_id
+      READ_TIMEOUT = 30
+      # Resolve the context-window cap for +transport+ by probing the
+      # server that serves it.
+      #
+      # @param transport [Agent::ChatTransport] the model-resolution
+      #   triple; +provider+ gates the probe and +model+ drives the
+      #   router +?model=+ hop
+      # @param openai_base [String, nil] the configured OpenAI-compatible
+      #   base URL the probe URL is derived from; defaults to the live
+      #   +RubyLLM.config.openai_api_base+. Passed explicitly only by
+      #   tests, which don't want to mutate global config.
+      # @return [Integer, nil] the launched +n_ctx+, or +nil+ for a
+      #   non-+:openai+ transport, an unconfigured base, or any probe
+      #   failure
+      def self.detect(transport, openai_base: RubyLLM.config.openai_api_base)
+        return nil unless transport.provider == :openai
+        url = props_url(openai_base)
+        return nil if url.nil?
+        new(probe_url: url, model_id: transport.model).probe
       end
-      # @return [Integer, nil] resolved cap, or +nil+ if no source produced
-      #   one
-      def detect
-        return @override if @override
-        return @ruby_llm_reported if @ruby_llm_reported
-        return nil if @llama_probe_url.nil? || @llama_probe_url.empty?
+      # Derive the llama.cpp +/props+ URL from the OpenAI-compatible
+      # base. +/props+ sits at the host root, so a trailing +/v1+ is
+      # stripped before appending.
+      #
+      # @param openai_base [String, nil]
+      # @return [String, nil] the +/props+ URL, or +nil+ when the base
+      #   is blank
+      def self.props_url(openai_base)
+        base = openai_base.to_s.strip.chomp('/')
+        return nil if base.empty?
+        "#{base.delete_suffix('/v1')}/props"
+      end
+      # @param probe_url [String] full URL to llama.cpp +/props+
+      # @param model_id [String, nil] the chat model id, used to follow a
+      #   llama.cpp router via +/props?model=<id>+ when the bare probe
+      #   reports +role: router+. +nil+ or empty disables that second hop.
+      def initialize(probe_url:, model_id:)
+        @probe_url = probe_url
+        @model_id = model_id
+      end
+      # @return [Integer, nil] resolved cap, or +nil+ if the probe
+      #   produced none
+      def probe
         probe_llama_cpp
       end
       private
       def probe_llama_cpp
-        data = fetch_props(@llama_probe_url)
+        data = fetch_props(@probe_url)
         return nil if data.nil?
         n_ctx = positive_n_ctx(data)
@@ -114,12 +165,12 @@ module Pikuri
         return probe_router_model if data['role'] == 'router' && model_id_present?
         warn_and_nil(
-          "no positive integer at default_generation_settings.n_ctx in #{@llama_probe_url} response"
+          "no positive integer at default_generation_settings.n_ctx in #{@probe_url} response"
         )
       end
       def probe_router_model
-        url = "#{@llama_probe_url}?model=#{CGI.escape(@model_id)}"
+        url = "#{@probe_url}?model=#{CGI.escape(@model_id)}"
         data = fetch_props(url)
         return nil if data.nil?
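
The URL derivation and provider gate described in this file's comments can be exercised without any network access. The sketch below is a rough approximation using top-level methods: `props_url` follows the body shown in the diff, while `probe_target` and `router_props_url` are hypothetical helper names introduced only for illustration, and the model id is a made-up placeholder.

```ruby
require 'cgi'

# Derive the llama.cpp /props URL from an OpenAI-compatible base URL.
# /props sits at the host root, so a trailing "/v1" is dropped first.
def props_url(openai_base)
  base = openai_base.to_s.strip.chomp('/')
  return nil if base.empty?

  "#{base.delete_suffix('/v1')}/props"
end

# Hypothetical helper: decide which URL (if any) a probe would hit.
# Only an :openai-provider transport is probed; anything else yields nil.
def probe_target(provider:, openai_base:)
  return nil unless provider == :openai

  props_url(openai_base)
end

# Hypothetical helper: the second hop used when the bare probe reports
# a router, with the model id escaped for use in a query string.
def router_props_url(probe_url, model_id)
  "#{probe_url}?model=#{CGI.escape(model_id)}"
end

puts props_url('http://localhost:8080/v1/')
puts probe_target(provider: :anthropic, openai_base: 'http://localhost:8080/v1').inspect
puts router_props_url(props_url('http://localhost:8080/v1'), 'demo/model 8b')
```

Returning `nil` for a blank base or a non-`:openai` provider keeps the "we don't know" result distinct from a probed value, which is the behavior the comments above describe.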

data/lib/pikuri/agent/control/step_limit.rb CHANGED

@@ -8,10 +8,27 @@ module Pikuri
       # +Agent+ pokes {#tick!} on every +before_tool_call+
       # callback and {#reset!} at the start of each turn. Once the
       # counter exceeds the configured cap, {#tick!} raises
-      # {Exceeded}, the +Agent+ catches it, and the step-
-      # exhaustion synthesizer rescues to salvage a partial
-      # answer.
+      # {Exceeded} and the +Agent+ applies the {#on_exhausted}
+      # policy: re-raise to the host (the default), or run the
+      # step-exhaustion synthesizer to salvage a partial answer.
+      #
+      # == Why the policy lives here, not on +Agent+
+      #
+      # Synthesis can only ever fire off a tripped step limit, so
+      # an +Agent.new(synthesize: ...)+ kwarg would be meaningless
+      # whenever +step_limit:+ is +nil+ — an invalid combination
+      # the API would have to document away. Attaching the policy
+      # to the budget makes "what happens when the budget runs
+      # out" travel with the budget, and the nonsense state is
+      # unrepresentable. The host picks per wiring: a Q&A REPL
+      # wants +:synthesize+ (salvage an answer from the evidence
+      # gathered so far); a coding agent wants the default
+      # +:raise+ (a tools-free pass can't finish writing code —
+      # stop, let the user say "continue"; {#reset!} at the next
+      # turn boundary refreshes the budget).
       class StepLimit
+        # Valid {#on_exhausted} policies.
+        ON_EXHAUSTED = %i[raise synthesize].freeze
         # Raised by {#tick!} once tool-call count exceeds +max+.
         # Carries the budget that was tripped so rescue clauses
         # can include it in user-facing messages.
@@ -29,13 +46,25 @@ module Pikuri
         # @return [Integer] the configured cap
         attr_reader :max
+        # @return [Symbol] what {Agent#run_loop} does when this
+        #   budget trips: +:raise+ lets {Exceeded} propagate to the
+        #   host; +:synthesize+ runs the tools-free synthesizer
+        #   rescue. See the class header for how to pick.
+        attr_reader :on_exhausted
         # @param max [Integer] hard cap on tool-call rounds; must
         #   be positive
-        # @raise [ArgumentError] if +max+ is zero or negative
-        def initialize(max:)
+        # @param on_exhausted [Symbol] +:raise+ (default) or
+        #   +:synthesize+ — see {#on_exhausted}
+        # @raise [ArgumentError] if +max+ is zero or negative, or
+        #   +on_exhausted+ is not one of {ON_EXHAUSTED}
+        def initialize(max:, on_exhausted: :raise)
           raise ArgumentError, "max must be positive, got #{max}" if max <= 0
+          raise ArgumentError, "on_exhausted must be one of #{ON_EXHAUSTED.inspect}, got #{on_exhausted.inspect}" \
+            unless ON_EXHAUSTED.include?(on_exhausted)
           @max = max
+          @on_exhausted = on_exhausted
           @step = 0
         end
@@ -69,9 +98,12 @@ module Pikuri
         #   can introspect it (and so tests can assert it)
         attr_reader :step
-        # @return [String] short config dump for {Agent#to_s}
+        # @return [String] short config dump for {Agent#to_s}.
+        #   The policy only renders when it's the non-default
+        #   +:synthesize+, so existing banner output is unchanged.
         def to_s
-          "StepLimit(max=#{@max})"
+          policy = @on_exhausted == :raise ? '' : ", on_exhausted=#{@on_exhausted}"
+          "StepLimit(max=#{@max}#{policy})"
         end
       end
     end
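
To see how attaching the exhaustion policy to the budget plays out, here is a small self-contained sketch. `SketchStepLimit` and `run_rounds` are hypothetical stand-ins for the gem's `StepLimit` and the agent loop; the real agent's rescue path is not shown in this diff, so the `:synthesized` return value is only a placeholder for a tools-free synthesis pass.

```ruby
# Hypothetical stand-in for Pikuri::Agent::Control::StepLimit, showing how
# the exhaustion policy is validated up front and then consulted by a caller.
class SketchStepLimit
  ON_EXHAUSTED = %i[raise synthesize].freeze

  class Exceeded < StandardError; end

  attr_reader :max, :on_exhausted, :step

  def initialize(max:, on_exhausted: :raise)
    raise ArgumentError, "max must be positive, got #{max}" if max <= 0
    unless ON_EXHAUSTED.include?(on_exhausted)
      raise ArgumentError, "on_exhausted must be one of #{ON_EXHAUSTED.inspect}, got #{on_exhausted.inspect}"
    end

    @max = max
    @on_exhausted = on_exhausted
    @step = 0
  end

  # Count one tool-call round; raise once the count exceeds the cap.
  def tick!
    @step += 1
    raise Exceeded, "step limit #{@max} exceeded" if @step > @max
  end
end

# Hypothetical driver loop: the policy decides what a tripped budget means.
def run_rounds(limit, rounds)
  rounds.times { limit.tick! }
  :finished
rescue SketchStepLimit::Exceeded
  raise if limit.on_exhausted == :raise # default: propagate to the host

  :synthesized # placeholder for running a tools-free synthesis pass
end

puts run_rounds(SketchStepLimit.new(max: 2, on_exhausted: :synthesize), 5)
```

Because the policy is a constructor argument of the limit itself, there is no way to request synthesis without also supplying a budget, which is the invalid combination the class comment sets out to rule out.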

data/lib/pikuri/agent/event.rb CHANGED

@@ -19,6 +19,18 @@ module Pikuri
     # +case+-match on the variant they care about. The per-variant
     # docs below name the emission site for each (which {Agent}
     # callback wires it and what payload it carries).
+    #
+    # == Sealed for loop narration; gems add domain events
+    #
+    # "Sealed" applies to the *loop-narration* vocabulary: the
+    # variants below are the complete set, all emitted by {Agent},
+    # and new chat-loop observability belongs here, not in a gem.
+    # Gems may define their own *domain* events in their own
+    # namespace (e.g. +Pikuri::Tasks::ListChanged+) and emit them
+    # via {ExtensionContext#emit_event}; they ride the same stream.
+    # Listeners must no-op on variants they don't recognize —
+    # {Listener::Base#on_event}'s default plus +case+-fallthrough
+    # give that for free.
     module Event
       # User's input for a turn (+mid_loop: false+, the default) or a
       # host-supplied injection delivered while the loop is running
@@ -74,10 +86,11 @@ module Pikuri
       Assistant = Data.define(:content)
       # Streaming fragment of an assistant reasoning block, pulled
-      # off a +RubyLLM::Chunk+ during a +Chat#ask+ stream. Emitted
-      # by the per-chunk streaming block {Agent.streaming_block}
-      # builds and {Agent#run_loop} / {Synthesizer.run} pass to
-      # +ask+; empty fragments are filtered at the dispatch site.
+      # off a +RubyLLM::Chunk+ during a streaming completion.
+      # Emitted by the per-chunk streaming block {Agent#run_loop}
+      # passes to +Chat#complete+ when the agent's +streaming:+
+      # flag is on; empty fragments are filtered at the dispatch
+      # site.
       #
       # Preview-only, not authoritative: the {Thinking} event
       # emitted from +after_message+ at the end of the round-trip
@@ -100,10 +113,10 @@ module Pikuri
       ThinkingDelta = Data.define(:content)
       # Streaming fragment of an assistant Markdown content block,
-      # pulled off a +RubyLLM::Chunk+ during a +Chat#ask+ stream.
-      # Emitted by the per-chunk streaming block
-      # {Agent.streaming_block} builds and {Agent#run_loop} /
-      # {Synthesizer.run} pass to +ask+; empty fragments are
+      # pulled off a +RubyLLM::Chunk+ during a streaming
+      # completion. Emitted by the per-chunk streaming block
+      # {Agent#run_loop} passes to +Chat#complete+ when the
+      # agent's +streaming:+ flag is on; empty fragments are
       # filtered at the dispatch site.
       #
       # Preview-only, same semantics as {ThinkingDelta}: the
@@ -173,16 +186,30 @@ module Pikuri
       # {Listener::TokenLog#context_window_size} tracks.
       Tokens = Data.define(:input, :output, :cached, :cache_creation, :thinking, :model_id)
-      # Model's resolved context-window cap. Emitted once by
-      # {Agent#initialize} immediately after
-      # {Agent::ContextWindowDetector} runs. Carries +nil+ when no
-      # source produced a value (custom local model with no override
-      # and no reachable llama.cpp +/props+). Listeners that care —
-      # {Listener::TokenLog} renders +ctx=<used>/<cap>+ when set,
-      # +ctx=<used>+ when +nil+ — pick the value off this event and
-      # cache it; non-caring listeners ignore.
+      # Model's resolved context-window cap. Emitted at construction
+      # by {Agent#initialize}, and again after every model switch
+      # (see {Agent#run_loop}'s +transport:+) since the cap is a
+      # property of the model. Carries +nil+ when no source produced
+      # a value (a non-llama server with no explicit cap). Listeners
+      # that care — {Listener::TokenLog} renders +ctx=<used>/<cap>+
+      # when set, +ctx=<used>+ when +nil+ — pick the value off this
+      # event and cache it; non-caring listeners ignore. A second
+      # ContextCap simply overwrites the first; the conversation is
+      # not re-baselined (a switch keeps the running context size).
       ContextCap = Data.define(:cap)
+      # The agent switched to a different model mid-conversation,
+      # emitted by {Agent#run_loop} (via +apply_transport!+) just
+      # before the matching {ContextCap} for the new model. Carries
+      # the old and new {Agent::ChatTransport}s verbatim — unformatted
+      # by design, so each chrome presents them its own way (a TUI
+      # adds ANSI off +.model+, a web client adds CSS). The cap rides
+      # on the paired {ContextCap}, not here, so {Listener::TokenLog}
+      # needs no awareness of this event (its existing {ContextCap}
+      # arm picks up the new cap); a renderer wanting "switched to X
+      # (128k)" on one line correlates the two.
+      ModelSwitched = Data.define(:from, :to)
       # Out-of-band notice that the agent had to take a rescue path.
       # Emitted by {Agent#run_loop} when {Control::StepLimit} trips
       # and the synthesizer fallback runs; carries the reason string