phronomy 0.8.0 → 0.9.0 (RubyGems)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (77)

checksums.yaml +4 -4
data/README.md +31 -41
data/benchmark/baseline.json +1 -1
data/benchmark/bench_agent_invoke.rb +1 -1
data/benchmark/bench_context_assembler.rb +9 -1
data/benchmark/bench_regression.rb +8 -8
data/benchmark/bench_tool_schema.rb +2 -2
data/benchmark/bench_vector_store.rb +1 -1
data/docs/decisions/011-build-context-as-single-llm-input-authority.md +224 -0
data/lib/phronomy/agent/base.rb +253 -351
data/lib/phronomy/agent/concerns/suspendable.rb +6 -6
data/lib/phronomy/agent/context/capability/base.rb +689 -0
data/lib/phronomy/agent/context/capability/scope_policy.rb +54 -0
data/lib/phronomy/agent/context/knowledge/base.rb +58 -0
data/lib/phronomy/agent/context/knowledge/entity_knowledge.rb +102 -0
data/lib/phronomy/agent/context/knowledge/static_knowledge.rb +58 -0
data/lib/phronomy/agent/invocation_pipeline.rb +10 -1
data/lib/phronomy/agent/react_agent.rb +24 -23
data/lib/phronomy/agent/shared_state.rb +2 -2
data/lib/phronomy/agent/tool_executor.rb +1 -1
data/lib/phronomy/concurrency/gate_registry.rb +0 -1
data/lib/phronomy/configuration.rb +0 -6
data/lib/phronomy/llm_context_window/assembler.rb +77 -44
data/lib/phronomy/multi_agent/handoff.rb +4 -4
data/lib/phronomy/multi_agent/orchestrator.rb +1 -1
data/lib/phronomy/multi_agent/team_coordinator.rb +2 -2
data/lib/phronomy/runtime/runtime_metrics.rb +0 -1
data/lib/phronomy/runtime.rb +1 -2
data/lib/phronomy/tool.rb +3 -4
data/lib/phronomy/{tool/agent_tool.rb → tools/agent.rb} +6 -6
data/lib/phronomy/{tool/mcp_tool.rb → tools/mcp.rb} +9 -9
data/lib/phronomy/tools/vector_search.rb +70 -0
data/lib/phronomy/vector_store/async_backend.rb +110 -0
data/lib/phronomy/vector_store/base.rb +89 -0
data/lib/phronomy/vector_store/embeddings/base.rb +41 -0
data/lib/phronomy/vector_store/embeddings/ruby_llm_embeddings.rb +47 -0
data/lib/phronomy/vector_store/in_memory.rb +103 -0
data/lib/phronomy/vector_store/loader/base.rb +27 -0
data/lib/phronomy/vector_store/loader/csv_loader.rb +58 -0
data/lib/phronomy/vector_store/loader/markdown_loader.rb +78 -0
data/lib/phronomy/vector_store/loader/plain_text_loader.rb +24 -0
data/lib/phronomy/vector_store/pgvector.rb +127 -0
data/lib/phronomy/vector_store/redis_search.rb +192 -0
data/lib/phronomy/vector_store/splitter/base.rb +49 -0
data/lib/phronomy/vector_store/splitter/fixed_size_splitter.rb +53 -0
data/lib/phronomy/vector_store/splitter/recursive_splitter.rb +107 -0
data/lib/phronomy/vector_store.rb +16 -4
data/lib/phronomy/version.rb +1 -1
data/lib/phronomy.rb +2 -1
data/scripts/api_snapshot.rb +11 -9
metadata +28 -32
data/lib/phronomy/agent/context/conversation/compaction_context.rb +0 -117
data/lib/phronomy/agent/context/conversation/trigger_context.rb +0 -43
data/lib/phronomy/agent/context/conversation/trim_context.rb +0 -82
data/lib/phronomy/agent/context/knowledge/embeddings/base.rb +0 -45
data/lib/phronomy/agent/context/knowledge/embeddings/ruby_llm_embeddings.rb +0 -51
data/lib/phronomy/agent/context/knowledge/loader/base.rb +0 -31
data/lib/phronomy/agent/context/knowledge/loader/csv_loader.rb +0 -62
data/lib/phronomy/agent/context/knowledge/loader/markdown_loader.rb +0 -82
data/lib/phronomy/agent/context/knowledge/loader/plain_text_loader.rb +0 -28
data/lib/phronomy/agent/context/knowledge/source/base.rb +0 -60
data/lib/phronomy/agent/context/knowledge/source/entity_knowledge.rb +0 -102
data/lib/phronomy/agent/context/knowledge/source/rag_knowledge.rb +0 -63
data/lib/phronomy/agent/context/knowledge/source/static_knowledge.rb +0 -58
data/lib/phronomy/agent/context/knowledge/splitter/base.rb +0 -53
data/lib/phronomy/agent/context/knowledge/splitter/fixed_size_splitter.rb +0 -57
data/lib/phronomy/agent/context/knowledge/splitter/recursive_splitter.rb +0 -111
data/lib/phronomy/agent/context/knowledge/vector_store/async_backend.rb +0 -116
data/lib/phronomy/agent/context/knowledge/vector_store/base.rb +0 -95
data/lib/phronomy/agent/context/knowledge/vector_store/in_memory.rb +0 -109
data/lib/phronomy/agent/context/knowledge/vector_store/pgvector.rb +0 -133
data/lib/phronomy/agent/context/knowledge/vector_store/redis_search.rb +0 -198
data/lib/phronomy/embeddings.rb +0 -11
data/lib/phronomy/loader.rb +0 -13
data/lib/phronomy/splitter.rb +0 -12
data/lib/phronomy/tool/base.rb +0 -685
data/lib/phronomy/tool/scope_policy.rb +0 -50

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: d4410424efcdcdf0ab529106ba2c872bddae9decc322995f37065f426255a05b
-  data.tar.gz: c9ae0dff7f184244b92fc91c585536f767aded5ff6ea8ecaafe86221863738e8
+  metadata.gz: d91e0fb85732153a69d268b41bdfe865791dd8f007e8bed983269284478af002
+  data.tar.gz: c334678280139ac7934b6804b06e282051218472985c022823d26913a3f64905
 SHA512:
-  metadata.gz: e7afa1749dc1431e27e225dfe7a8eafebb2e781c0e6a6ca6e0bdda9712c22b4b5b68d3a9897bc92b466026c698070045774d79a0197ad0463da3f81ff103b36c
-  data.tar.gz: be93a29c2b98b2069ef912815e847f0e0115de07bb435e36fbcb834433757fc72442d7c5db129c9f97dd891113ecf75c46606a6295b2e9f3357831c24246d974
+  metadata.gz: 393567f7c01633ea20160101705b0fde21ddd009a4950f1cb44a106285500b90a3bec88d4c9681cebb7656d0529c09c9e7c52da42e3e12f103231423921b43aa
+  data.tar.gz: 03f5d2e764df9d3becb782ecdec0bf42f03b0f3fc7414efaad2334fe1d047443ef3180e1993244cad92c305607113d0afe2915caa6ff53d14c05c779a61f6b4b

data/README.md CHANGED

@@ -7,7 +7,7 @@
 > We apologise for the instability this may cause.
 **Phronomy** is a Ruby AI agent framework inspired by open-source AI agent frameworks.
-It provides composable building blocks — Workflows, Agents, Tools, Guardrails, RAG, and Tracing — all powered by [RubyLLM](https://github.com/crmne/ruby_llm) for LLM abstraction.
+It provides composable building blocks — Workflows, Agents, Tools, Guardrails, and Tracing — all powered by [RubyLLM](https://github.com/crmne/ruby_llm) for LLM abstraction.
 ## Features
@@ -30,10 +30,10 @@ It provides composable building blocks — Workflows, Agents, Tools, Guardrails,
 | **Workflow action_timeout** — Per-state `action_timeout:` keyword on `state` DSL; cancels Task-returning entry actions that exceed the limit and raises `Phronomy::ActionTimeoutError` | Beta |
 | **Agent** — ReAct-style tool-calling agents with guardrails and conversation history | Stable |
 | **Before-Completion Hook** — Three-tier LLM parameter injection | Stable |
-| **Context Management** — Token budget calculation, estimation, and pruning | Stable |
+| **Context Management** — Token budget calculation, estimation, and pruning; `Agent::Base` protected hooks: `build_context` (overridable), `trim_messages`, `trim_to_budget`, `compact_messages`, `budget_exceeded?`, `drop_messages_over` | Stable |
 | **Guardrails** — Input/output validation with custom `InputGuardrail`/`OutputGuardrail` | Beta |
 | **`PromptInjectionGuardrail`** — Built-in `InputGuardrail` subclass that detects prompt-injection patterns; usable standalone or as part of a guardrail chain | Beta |
-| **`Tool::Base.redact_params` / `.max_result_size`** — Class-level DSL: `redact_params` masks parameter values in log/trace output; `max_result_size` truncates oversized tool results before they reach the LLM | Beta |
+| **`Agent::Context::Capability::Base.redact_params` / `.max_result_size`** — Class-level DSL: `redact_params` masks parameter values in log/trace output; `max_result_size` truncates oversized tool results before they reach the LLM | Beta |
 | **Output Parser** — JSON and Struct-mapped parsers for structured LLM responses | Stable |
 | **Eval Framework** — Dataset-driven evaluation with multiple scorer types | Beta |
 | **Tracing** — Pluggable span-based observability | Stable |
@@ -43,11 +43,11 @@ It provides composable building blocks — Workflows, Agents, Tools, Guardrails,
 | Feature | Stability |
 |---|---|
-| **Knowledge/RAG** — Retrieval sources with pluggable loaders, splitters, and vector stores; `static_knowledge_refresh!` for runtime cache invalidation | Beta |
+| **Knowledge** — Static context injection with pluggable loaders, splitters, and vector stores; `static_knowledge_refresh!` for runtime cache invalidation | Beta |
 | **`VectorStore#size`** — Returns document count for all three backends (InMemory, RedisSearch, Pgvector) | Beta |
-| **`Agent::Context::Knowledge::VectorStore::AsyncBackend` mixin** — Pluggable async interface for `VectorStore`; default pool-backed implementations for `search_async`, `add_async`, `remove_async`, `clear_async`; backends with native async drivers override individual methods to bypass `BlockingAdapterPool` entirely; all existing backends remain unchanged | Beta |
-| **Parallel RAG multi-source fetch** — `Agent#build_context` fetches all `knowledge_sources` concurrently via `TaskGroup`; `config[:rag_failure_policy]` `:skip` (default) silently ignores failed sources so the agent answers with partial context, `:fail` surfaces the first error; per-source latency is emitted to `Phronomy.configuration.logger` at debug level | Beta |
-| **MCP Tool** — Model Context Protocol server integration | Beta |
+| **`VectorStore::AsyncBackend` mixin** — Pluggable async interface for `VectorStore`; default pool-backed implementations for `search_async`, `add_async`, `remove_async`, `clear_async`; backends with native async drivers override individual methods to bypass `BlockingAdapterPool` entirely; all existing backends remain unchanged | Beta |
+| **MCP Tool** — `Phronomy::Tools::Mcp`: Model Context Protocol server integration; `Phronomy::Tools::Agent`: wraps an agent class as a callable tool via `from_agent` | Beta |
+| **Vector Search Tool** — `Phronomy::Tools::VectorSearch`: wraps a `VectorStore` and `Embeddings` adapter as a callable agent tool via `from_store` | Beta |
 **Execution and reliability**
@@ -55,14 +55,14 @@ It provides composable building blocks — Workflows, Agents, Tools, Guardrails,
 |---|---|
 | **Workflow EventLoop Mode** — Opt-in event-driven execution: `Phronomy.configure { \|c\| c.event_loop = true }` | Experimental |
 | **Agent EventLoop Mode** — `Agent#invoke` (non-blocking via EventLoop), `Agent#run_as_child` (child-FSM pattern for Workflow integration), parallel tool dispatch via `ParallelToolChat` | Experimental |
-| **`invoke_async` / `call_async`** — `Agent::Base#invoke_async` and `Workflow#invoke_async` return a `Task`; `Tool::Base#call_async` similarly; compatible with EventLoop and standalone contexts | Experimental |
+| **`invoke_async` / `call_async`** — `Agent::Base#invoke_async` and `Workflow#invoke_async` return a `Task`; `Agent::Context::Capability::Base#call_async` similarly; compatible with EventLoop and standalone contexts | Experimental |
 | **CancellationToken** — Cooperative cancellation via `cancel!`/`cancelled?`/`raise_if_cancelled!`; `timeout_after(seconds)` for monotonic-clock deadlines; optional `deadline:` (wall-clock) for backward compatibility; passed as `config: { cancellation_token: token }` to agents and `dispatch_parallel`; injected into `tool.execute` when the method declares a `cancellation_token:` keyword | Experimental |
 | **`dispatch_parallel` / `fan_out` `force_kill:` option** — `force_kill: false` (default) leaves timed-out workers running and raises `TimeoutError` immediately; `force_kill: true` restores the old `Thread#kill` behaviour with a `logger.warn` | Beta |
-| **`execution_mode` DSL on `Tool::Base`** — Declares how a tool's `execute` should be dispatched: `:cooperative` (same scheduler thread), `:blocking_io` (default; offloaded to `BlockingAdapterPool`), `:cpu_bound`, `:external_process` | Experimental |
+| **`execution_mode` DSL on `Agent::Context::Capability::Base`** — Declares how a tool's `execute` should be dispatched: `:cooperative` (same scheduler thread), `:blocking_io` (default; offloaded to `BlockingAdapterPool`), `:cpu_bound`, `:external_process` | Experimental |
 | **`invocation_context:` keyword on `Agent#invoke` / `Workflow#invoke`** — Pass a `Phronomy::InvocationContext` directly; `thread_id`, `cancellation_token`, and `deadline`-based timeout are derived from it; `task_id` / `parent_task_id` appear in trace spans automatically; `config:` keys remain supported as backward-compat aliases | Beta |
-| **`ConcurrencyGate` — unified backpressure** — Counting semaphore that enforces per-resource concurrency caps (`max_concurrent_agent_tasks`, `max_concurrent_tool_tasks`, `max_concurrent_workflow_tasks`, `max_concurrent_llm_calls`, `max_concurrent_rag_fetches`, `max_concurrent_vector_searches`); configured via `Phronomy.configure`; backpressure behaviour follows the global `backpressure` setting (`:wait`, `:raise`/`:reject`, `:timeout`); `nil` cap = unlimited (default) | Beta |
+| **`ConcurrencyGate` — unified backpressure** — Counting semaphore that enforces per-resource concurrency caps (`max_concurrent_agent_tasks`, `max_concurrent_tool_tasks`, `max_concurrent_workflow_tasks`, `max_concurrent_llm_calls`, `max_concurrent_vector_searches`); configured via `Phronomy.configure`; backpressure behaviour follows the global `backpressure` setting (`:wait`, `:raise`/`:reject`, `:timeout`); `nil` cap = unlimited (default) | Beta |
 | **Cooperative scheduler yield points** — `Runtime#yield` (cooperative yield; yields the current task's time slice); `Runtime#yield_if_needed(every: N)` (thread-local counter, yields every N calls); CPU-bound detection when `blocking_detect_threshold_ms` is set (warns and increments `non_yield_threshold_violation_count` when a task runs longer than the threshold without yielding); `starvation_threshold_ms` configuration field (default: 50ms) | Beta |
-| **`Phronomy::Metrics`** — `Phronomy::Metrics.snapshot` returns task-tree and pool counters; task-centric keys: `active_agent_tasks`, `active_tool_tasks`, `active_workflow_tasks`, `active_rag_tasks`, `active_llm_tasks`, `task_wait_time_p50_ms`, `task_wait_time_p95_ms`, `task_run_time_p50_ms`, `task_run_time_p95_ms`, `cancelled_tasks`, `failed_tasks`, `non_yield_threshold_violation_count`; pool/event-loop keys remain for backward compatibility; `Runtime#task_snapshot` exposes task-centric metrics directly | Beta |
+| **`Phronomy::Metrics`** — `Phronomy::Metrics.snapshot` returns task-tree and pool counters; task-centric keys: `active_agent_tasks`, `active_tool_tasks`, `active_workflow_tasks`, `active_llm_tasks`, `task_wait_time_p50_ms`, `task_wait_time_p95_ms`, `task_run_time_p50_ms`, `task_run_time_p95_ms`, `cancelled_tasks`, `failed_tasks`, `non_yield_threshold_violation_count`; pool/event-loop keys remain for backward compatibility; `Runtime#task_snapshot` exposes task-centric metrics directly | Beta |
 | **`Phronomy.with_configuration` / `Phronomy.reset_runtime!`** — Scoped configuration override and full runtime reset for test isolation | Beta |
 **Agent patterns**
@@ -131,8 +131,8 @@ Install additional gems only for the features you use:
 | Gem | Required for |
 |-----|-------------|
-| `pgvector` | `Phronomy::Agent::Context::Knowledge::VectorStore::Pgvector` |
-| `redis` | `Phronomy::Agent::Context::Knowledge::VectorStore::RedisSearch` |
+| `pgvector` | `Phronomy::VectorStore::Pgvector` |
+| `redis` | `Phronomy::VectorStore::RedisSearch` |
 | `opentelemetry-api` | `Phronomy::Tracing::OpenTelemetryTracer` |
 ## Quick Start
@@ -140,7 +140,7 @@ Install additional gems only for the features you use:
 ### Agent — ReAct tool-calling agent
 ```ruby runnable
-class WebSearch < Phronomy::Tool::Base
+class WebSearch < Phronomy::Agent::Context::Capability::Base
   description "Search the web"
   param :query, type: :string, desc: "Search query"
@@ -216,10 +216,10 @@ transition from: :run_agent, on: :child_failed,    to: :handle_error
 ### Multi-Agent — Agent-as-Tool pattern
-Wrap sub-agents as `Tool::Base` subclasses so the orchestrator LLM can call them on demand.
+Wrap sub-agents as `Agent::Context::Capability::Base` subclasses so the orchestrator LLM can call them on demand.
 ```ruby
-class ResearchTool < Phronomy::Tool::Base
+class ResearchTool < Phronomy::Agent::Context::Capability::Base
   description "Research a topic and return key findings as bullet points."
   param :topic, type: :string, desc: "The topic to research"
@@ -233,7 +233,7 @@ class WriterAgent < Phronomy::Agent::Base
   instructions "You are a professional technical writer."
 end
-class WriteTool < Phronomy::Tool::Base
+class WriteTool < Phronomy::Agent::Context::Capability::Base
   description "Write a technical blog post given research notes and a writing brief."
   param :instructions, type: :string, desc: "Writing brief including research notes"
@@ -280,35 +280,25 @@ end
 > that logic must be implemented by the application. Reference implementations for
 > common patterns are available in `phronomy-examples` (example 06).
-### Knowledge/RAG — Context injection and vector retrieval
+### Knowledge — Static context injection
 ```ruby
 # Static knowledge (policy files, reference docs)
-policy = Phronomy::Agent::Context::Knowledge::Source::StaticKnowledge.new(
+policy = Phronomy::Agent::Context::Knowledge::StaticKnowledge.new(
   File.read("policy.md"),
   type:   :policy,
   source: "policy.md"   # exposed to LLM for citation
 )
-# RAG retrieval from a vector store
-store      = Phronomy::Agent::Context::Knowledge::VectorStore::InMemory.new
-embeddings = Phronomy::Agent::Context::Knowledge::Embeddings::RubyLLMEmbeddings.new(model: "text-embedding-3-small")
-# Add documents before querying
-text1 = "Refunds are processed within 5 business days."
-text2 = "Contact support@example.com for refund requests."
-store.add(id: "doc-1", embedding: embeddings.embed(text1), metadata: { content: text1, source: "policy.md" })
-store.add(id: "doc-2", embedding: embeddings.embed(text2), metadata: { content: text2, source: "policy.md" })
-rag = Phronomy::Agent::Context::Knowledge::Source::RAGKnowledge.new(store: store, embeddings: embeddings, k: 5)
-# Inject at invocation time
-result = MyAgent.new.invoke("What is the refund policy?",
-  config: { knowledge_sources: [policy, rag] })
+# Inject at invocation time via the agent DSL
+class MyAgent < Phronomy::Agent::Base
+  model "gpt-4o-mini"
+  knowledge policy
+end
 ```
-`static_knowledge_refresh!` invalidates the class-level cache of *static* knowledge sources
-(not RAG stores). Call it when the underlying file or content has changed:
+`static_knowledge_refresh!` invalidates the class-level cache of static knowledge sources.
+Call it when the underlying file or content has changed:
 ```ruby
 # Static knowledge sources are cached at the class level after the first fetch.
@@ -319,8 +309,8 @@ MyAgent.static_knowledge_refresh!
 Load and split documents with built-in loaders:
 ```ruby
-chunks = Phronomy::Agent::Context::Knowledge::Loader::MarkdownLoader.new.load("docs/guide.md")
-         .then { |docs| Phronomy::Agent::Context::Knowledge::Splitter::RecursiveSplitter.new(chunk_size: 512).split(docs) }
+chunks = Phronomy::VectorStore::Loader::MarkdownLoader.new.load("docs/guide.md")
+         .then { |docs| Phronomy::VectorStore::Splitter::RecursiveSplitter.new(chunk_size: 512).split(docs) }
 ```
 ### Multi-Agent Handoff — Hub-and-spoke routing
@@ -539,7 +529,7 @@ end
 ### MCP Tool — External tool servers
 ```ruby
-search_tool = Phronomy::Tool::McpTool.from_server(
+search_tool = Phronomy::Tools::Mcp.from_server(
   "stdio://./mcp-server",
   tool_name: "web_search"
 )
@@ -723,8 +713,8 @@ registry the budget is silently skipped.
 ### CancellationToken — Cooperative cancellation
 Pass a `CancellationToken` to any agent via `config: { cancellation_token: token }`.
-Cancellation is checked at multiple granular checkpoints: before the LLM call, before
-each RAG knowledge-source fetch, after each streaming chunk, before each parallel
+Cancellation is checked at multiple granular checkpoints: before the LLM call,
+after each streaming chunk, before each parallel
 tool-call batch, and after each `before_completion` hook. `CancellationError` is
 raised immediately and is never retried. No threads are force-killed — `ensure`
 blocks always execute.
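
Note on the README paragraph above: the checkpoint behaviour it describes can be illustrated with a minimal, self-contained sketch. `SketchToken`, `SketchCancellationError`, and `consume_chunks` are hypothetical stand-ins written for this note, not the gem's classes. Only the method names `cancel!`, `cancelled?`, and `raise_if_cancelled!` come from the README.

```ruby
# Hypothetical stand-in for the gem's CancellationError.
class SketchCancellationError < StandardError; end

# Hypothetical stand-in for a cooperative cancellation token.
class SketchToken
  def initialize
    @cancelled = false
  end

  def cancel!
    @cancelled = true
  end

  def cancelled?
    @cancelled
  end

  def raise_if_cancelled!
    raise SketchCancellationError, "cancelled" if cancelled?
  end
end

# Processes chunks one at a time. The token is checked before each chunk
# (a checkpoint), and the ensure block runs whether or not we were cancelled.
def consume_chunks(chunks, token, log)
  chunks.each do |chunk|
    token.raise_if_cancelled! # checkpoint: stop before doing more work
    log << chunk
    yield chunk if block_given?
  end
ensure
  log << :cleanup
end

token = SketchToken.new
log = []
begin
  # Cancel from inside the loop after the second chunk has been handled.
  consume_chunks(%w[a b c], token, log) { |c| token.cancel! if c == "b" }
rescue SketchCancellationError
  log << :cancelled
end
```

In this sketch the third chunk is never processed: the checkpoint raises first, the cleanup still runs, and nothing is force-killed.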

data/benchmark/baseline.json CHANGED

@@ -5,5 +5,5 @@
   "dispatch_parallel_10": 886.0,
   "cancellation_token_cancelled": 4335060.97443425,
   "cancellation_token_raise_if_cancelled_noop": 3566903.189098373,
-  "trim_context_remove_2000": 1761.5700678986254
+  "trim_messages_2000": 2896552.0
 }

data/benchmark/bench_agent_invoke.rb CHANGED

@@ -53,7 +53,7 @@ class BenchStubChat
 end
 # A stub tool that does nothing but conforms to the Tool::Base interface.
-class BenchNullTool < Phronomy::Tool::Base
+class BenchNullTool < Phronomy::Agent::Context::Capability::Base
   description "No-op benchmark tool"
   param :x, type: :string, desc: "input"

data/benchmark/bench_context_assembler.rb CHANGED

@@ -41,6 +41,14 @@ Benchmark.bm(40) do |x|
   end
   x.report("build(1000 msgs, 10 chunks, budgeted)") do
-    (BENCH_ASM_ITERATIONS / 10).times { make_assembler(n_messages: 1000, n_chunks: 10, with_budget: true).build }
+    (BENCH_ASM_ITERATIONS / 10).times do
+      # Assembler raises ContextLengthError when messages exceed the budget;
+      # callers (e.g. Agent::Base#build_context) are expected to pre-trim via
+      # trim_to_budget before calling build. The rescue here keeps the benchmark
+      # measuring build's fast path without triggering the error path.
+      make_assembler(n_messages: 1000, n_chunks: 10, with_budget: true).build
+    rescue Phronomy::ContextLengthError
+      # expected — budget exceeded
+    end
   end
 end

data/benchmark/bench_regression.rb CHANGED

@@ -62,7 +62,7 @@ end
 # ---------------------------------------------------------------------------
 # Target 3: Tool::Base#params_schema generation (10 params)
 # ---------------------------------------------------------------------------
-tool_class = Class.new(Phronomy::Tool::Base) do
+tool_class = Class.new(Phronomy::Agent::Context::Capability::Base) do
   description "Test tool with 10 params"
   param :p1, type: :string, desc: "param 1"
   param :p2, type: :string, desc: "param 2"
@@ -130,18 +130,18 @@ t6 = Benchmark.measure("CancellationToken#raise_if_cancelled! (no-op)") do
 end
 # ---------------------------------------------------------------------------
-# Target 7: Context::TrimContext#remove on a 2000-element history
+# Target 7: Agent::Base#trim_messages on a 2000-message history
 # ---------------------------------------------------------------------------
 BenchMsg = Struct.new(:content) unless defined?(BenchMsg)
-TRIM_ELEMENTS = Array.new(2_000) { |i| {seq: i, message: BenchMsg.new("msg #{i}"), tokens: 10, role: :user} }
-TRIM_BUDGET = Phronomy::LlmContextWindow::TokenBudget.new(context_window: 4096, max_output_tokens: 512)
+TRIM_MESSAGES = Array.new(2_000) { |i| BenchMsg.new("msg #{i}") }
 TRIM_ITERATIONS = 500
-t7 = Benchmark.measure("TrimContext#remove (2000-element history)") do
+bench_trim_agent = Class.new(Phronomy::Agent::Base).new
+t7 = Benchmark.measure("Agent::Base#trim_messages (2000-msg history)") do
   TRIM_ITERATIONS.times do
-    tc = Phronomy::Agent::Context::Conversation::TrimContext.new(message_elements: TRIM_ELEMENTS, budget: TRIM_BUDGET)
-    tc.remove((0...200).to_a)  # remove 200 oldest messages
+    bench_trim_agent.send(:trim_messages, TRIM_MESSAGES, keep: 1_800)
   end
 end
@@ -159,7 +159,7 @@ metrics = {
   "dispatch_parallel_10" => [t4, PARALLEL_ITERATIONS],
   "cancellation_token_cancelled" => [t5, 8 * CANCEL_ITERATIONS],
   "cancellation_token_raise_if_cancelled_noop" => [t6, RAISE_ITERATIONS],
-  "trim_context_remove_2000" => [t7, TRIM_ITERATIONS]
+  "trim_messages_2000" => [t7, TRIM_ITERATIONS]
 }
 REGRESSION_RESULTS = {} # rubocop:disable Style/MutableConstant
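
Note on Target 7 above: the benchmark now calls `trim_messages(messages, keep: 1_800)` on plain message objects instead of building a `TrimContext` and removing 200 indices. The diff does not show the method body, so the sketch below only illustrates one plausible reading, assuming `keep:` means "retain the newest N messages". `sketch_trim_messages` and `SketchMsg` are hypothetical names used for illustration.

```ruby
# Hypothetical message type, mirroring the BenchMsg struct in the benchmark.
SketchMsg = Struct.new(:content)

# Assumed semantics: return a new array holding the newest `keep` messages.
# The input array is never mutated.
def sketch_trim_messages(messages, keep:)
  raise ArgumentError, "keep must be non-negative" if keep.negative?
  return messages.dup if keep >= messages.size

  messages.last(keep)
end

history = Array.new(2_000) { |i| SketchMsg.new("msg #{i}") }
trimmed = sketch_trim_messages(history, keep: 1_800)
# trimmed drops the 200 oldest entries; history is left unchanged.
```

Under this reading the operation is a single array slice, with no per-call object construction. The old and new benchmarks measure different operations, so the `trim_context_remove_2000` and `trim_messages_2000` baseline values are probably not directly comparable.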

data/benchmark/bench_tool_schema.rb CHANGED

@@ -11,7 +11,7 @@ require_relative "../lib/phronomy"
 # --- Tool schema ---
-class BenchTool10Params < Phronomy::Tool::Base
+class BenchTool10Params < Phronomy::Agent::Context::Capability::Base
   description "A tool with 10 parameters for benchmarking purposes"
   param :param1, type: :string, desc: "First parameter"
   param :param2, type: :integer, desc: "Second parameter"
@@ -43,7 +43,7 @@ end
 # --- static_knowledge_chunks cache ---
-class BenchKnowledgeSource < Phronomy::Agent::Context::Knowledge::Source::Base
+class BenchKnowledgeSource < Phronomy::Agent::Context::Knowledge::Base
   def fetch(query: nil)
     [{content: "Cached knowledge fact.", type: :static}]
   end

data/benchmark/bench_vector_store.rb CHANGED

@@ -28,7 +28,7 @@ BENCH_VS_ITERS = {100 => 100, 1_000 => 20, 10_000 => 5}.freeze
 puts "=== bench_vector_store_inmemory ==="
 Benchmark.bm(35) do |x|
   [100, 1_000, 10_000].each do |n|
-    store = Phronomy::Agent::Context::Knowledge::VectorStore::InMemory.new(dimension: DIM)
+    store = Phronomy::VectorStore::InMemory.new(dimension: DIM)
     populate(store, n)
     iters = BENCH_VS_ITERS[n]

data/docs/decisions/011-build-context-as-single-llm-input-authority.md ADDED

@@ -0,0 +1,224 @@
+# ADR-011: build_context as the Single Authority for LLM Input
+## Status
+Proposed — 2026-05-31
+## Context
+### Background
+`Agent::Base#build_context` was introduced as a hook for subclasses to customise
+the system prompt and conversation history passed to the LLM.  Its original return
+value was `{ system: String|nil, messages: Array }`, covering only two of the four
+conceptual regions of an LLM context window.
+`LlmContextWindow::Assembler` documents the four regions explicitly:
+```
+1. Instruction  — system prompt text
+2. Capability   — tool definitions
+3. Knowledge    — external facts (XML context tags)
+4. Conversation — conversation messages
+```
+However, the Assembler itself states Region 2 is "handled by RubyLLM, not here",
+leaving tool registration entirely outside the `build_context` path.
+### Problems identified
+**P1 — Tool definitions are not part of `build_context` output**
+Tools were registered with `chat.with_tool(tc)` *after* `build_context` returned,
+directly in `InvocationPipeline`, `_stream_impl`, and `ReactAgent#step`.
+This means a subclass that overrides `build_context` cannot control which tools
+are actually sent to the LLM; tools are always added behind its back.
+**P2 — `_handoff_tools` bypass `build_context` entirely**
+`Runner` adds handoff tools via `_add_handoff_tool` onto the agent instance.
+These were registered with `chat.with_tool(tc)` at every call site, separately
+from `context[:tool_classes]`, without going through `build_context` at all.
+Even if a subclass override returned a modified tool list, handoff tools would
+still be added unconditionally.
+**P3 — Tool token cost excluded from budget calculation**
+LLM providers (OpenAI, Anthropic, Gemini) count tool schema tokens against the
+context window.  The `TokenBudget` / `Assembler` pipeline never subtracted tool
+tokens from the available budget before trimming conversation messages.  This
+caused the budget calculation to be consistently optimistic: the `effective_input_limit`
+was always larger than the tokens actually available for messages, risking context
+window overflow on long conversations with many or complex tools.
+The existing `context_overhead` DSL was a manual workaround:
+```ruby
+class MyAgent < Phronomy::Agent::Base
+  context_overhead 800  # developer guesses tool token cost
+end
+```
+This is inaccurate by design and should not be necessary.
+**P4 — RAG fetch called inside `build_context` on every invocation**
+`build_context` called `fetch_knowledge_chunks` dynamically.  In a ReAct loop
+with N iterations, RAG was fetched N times for the same query.  More importantly,
+dynamic per-call RAG fetch is architecturally misplaced:
+- Knowledge fetched by RAG and injected as Region 3 context belongs to the
+  *agent's knowledge*, not to the per-invocation message flow.
+- If the LLM needs to retrieve information dynamically, the correct mechanism is
+  **function calling**: the LLM calls a retrieval tool, and the result appears in
+  the conversation log as a tool result message (Region 4).
+- Static knowledge that the agent always needs should be registered once at
+  agent initialisation time, not re-fetched on every `build_context` call.
+**P5 — `build_capability_tool_classes` is redundant indirection**
+`build_capability_tool_classes` was introduced as a narrower override hook to
+avoid requiring subclasses to copy `build_context` just to change tool selection.
+However, it has no documentation, no usage examples, and provides no capability
+that overriding `build_context` itself does not already provide.  Its existence
+adds a public API surface and conceptual overhead without commensurate value.
+**P6 — No access to previous context**
+`build_context` builds from scratch every call with no knowledge of what was sent
+to the LLM in the previous call.  This prevents:
+- Token cache hit optimisations (OpenAI prompt caching, Anthropic `cache_control`)
+  which require a stable prompt prefix
+- Incremental context strategies that avoid recomputing unchanged regions
+## Decision
+### D1 — `build_context` is the single authority for all LLM input
+**Nothing may be added to or removed from the LLM request outside of
+`build_context`.**  Every call site (`InvocationPipeline`, `_stream_impl`,
+`ReactAgent#step`, `ReactAgent#stream_step`) must:
+1. Call `build_context` to obtain `{ system:, messages:, tool_classes: }`.
+2. Apply the result to `chat` — and *only* the result.
+3. Not register any additional tools, messages, or instructions independently.
+### D2 — Assembler handles all four regions including Capability
+`LlmContextWindow::Assembler` gains `add_capability(tool_classes)`:
+```ruby
+assembler.add_capability(tools)  # Region 2
+```
+Responsibilities of `add_capability`:
+1. Store `tool_classes` for pass-through in `build` return value.
+2. Serialise each tool's schema (via RubyLLM's provider-specific `tool_for` /
+   `function_declaration_for`) and estimate its token cost.
+3. Add that cost to the `used` token count before conversation message trimming.
+`build` return value expands to:
+```ruby
+{ system: String|nil, messages: Array, tool_classes: Array }
+```
+### D3 — `build_context` includes all tools (user tools + handoff tools)
+`build_context` passes `self.class.tools + _handoff_tools` to
+`assembler.add_capability`.  `_handoff_tools` are framework-managed routing tools;
+they are always included and are not subject to user-level filtering.
+Subclasses that need dynamic tool selection override `build_context` and call
+`assembler.add_capability` with their own selection logic.
+`build_capability_tool_classes` is **removed** (P5 resolution).
+### D4 — `fetch_knowledge_chunks` is removed from `build_context`
+Knowledge enters Region 3 through exactly two paths:
+**Path A — Agent initialisation (static knowledge)**
+```ruby
+class MyAgent < Phronomy::Agent::Base
+  knowledge "The capital of Japan is Tokyo.", type: :entity
+end
+```
+Registered once; the Assembler always includes it.
+**Path B — Per-invocation dynamic knowledge via `config[:knowledge_sources]`**
+The caller passes knowledge sources in the invocation config:
+```ruby
+agent.invoke(input, config: { knowledge_sources: [my_rag_source] })
+```
+`build_context` calls `fetch_knowledge_chunks` exactly **once per `invoke`**,
+not once per LLM call within a ReAct loop.  The result is cached on the agent
+instance for the duration of that invocation.
+This is a caller responsibility: if the caller needs fresh knowledge on every
+`invoke`, it passes new sources.  Within a single `invoke`, knowledge is stable.
+### D5 — Previous context stored as instance variable
+After each `build_context` call, the result is stored:
+```ruby
+@last_context = { system: ..., messages: ..., tool_classes: ... }
+```
+`build_context` may reference `@last_context` for optimisations such as:
+- Detecting that `system` and `tool_classes` are unchanged → skip regeneration
+  of the stable prefix to improve LLM provider token cache hit rate.
+- Skipping Assembler work when the context is provably identical to the last call.
+`@last_context` is **not** passed as a method parameter; it is read from the
+instance.  This avoids changing call-site signatures.
+Note: `Agent` instances are not thread-safe (already documented).  `@last_context`
+inherits this constraint — concurrent invocations on the same instance are not
+supported.
+## Consequences
+### Token budget accuracy
+With D2, `effective_input_limit` correctly reflects the tokens actually available
+for conversation messages after system prompt, tool schemas, and knowledge are
+accounted for.  `context_overhead` becomes unnecessary for tool costs; it may
+still be used as a manual reserve for provider-specific overhead not captured by
+schema serialisation.
+### `build_context` as the integration surface
+Subclasses override `build_context` for all customisation: tool selection,
+knowledge injection, system prompt variants, context compression strategies.
+There is one integration point, not several.
+### RAG fetch frequency
+`fetch_knowledge_chunks` runs at most once per `invoke` call (P4 resolution).
+In ReAct loops with N iterations, RAG is fetched once, not N times.
+### Removed API
+`build_capability_tool_classes` is removed.  It was never documented or used
+outside of internal framework code, so there is no public API break.
+## Migration notes
+- All call sites (`InvocationPipeline`, `_stream_impl`, `ReactAgent#step/stream_step`)
+  must be updated to remove the separate `_handoff_tools` registration lines and
+  rely solely on `context[:tool_classes]`.
+- `Assembler#add_capability` and the token estimation for tool schemas must be
+  implemented.
+- `build_context` must be updated to pass all tools to `assembler.add_capability`
+  and to cache `@last_context`.
+- `fetch_knowledge_chunks` must be lifted out of `build_context` into the
+  invocation-scoped cache described in D4.
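
Note on ADR-011 (P3 and D2): the budget problem described there can be illustrated with a small, self-contained sketch. Nothing below is the gem's implementation. `estimate_tokens`, `tool_schema_tokens`, and `fit_messages` are hypothetical helpers, and the four-characters-per-token estimate is an assumption standing in for a real tokenizer and for provider-specific schema serialisation. The sketch shows only the ordering the ADR asks for: charge the system prompt and tool schemas against the input limit first, then fit conversation messages into what remains.

```ruby
require "json"

# Rough token estimate: about four characters per token (an assumption).
def estimate_tokens(text)
  (text.length / 4.0).ceil
end

# Estimated cost of all tool schemas once serialised as JSON.
def tool_schema_tokens(tool_schemas)
  tool_schemas.sum { |schema| estimate_tokens(JSON.generate(schema)) }
end

# Charges the system prompt and tool schemas first, then keeps the newest
# messages that still fit into the remaining budget.
def fit_messages(system:, tool_schemas:, messages:, input_limit:)
  remaining = input_limit - estimate_tokens(system.to_s) - tool_schema_tokens(tool_schemas)
  kept = []
  messages.reverse_each do |msg|
    cost = estimate_tokens(msg)
    break if cost > remaining

    kept.unshift(msg) # preserve chronological order
    remaining -= cost
  end
  { system: system, messages: kept, tool_schemas: tool_schemas }
end

system_prompt = "You are helpful."
search_schema = {
  name: "search",
  description: "Search the web",
  parameters: { query: "string" }
}
messages = Array.new(10) { "x" * 40 } # ten messages of about 10 tokens each

without_tools = fit_messages(system: system_prompt, tool_schemas: [],
                             messages: messages, input_limit: 60)
with_tools    = fit_messages(system: system_prompt, tool_schemas: [search_schema],
                             messages: messages, input_limit: 60)
```

With the same limit, `with_tools[:messages]` is shorter than `without_tools[:messages]`. That gap is the "consistently optimistic" budget P3 describes when tool schemas are not counted.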