RubyGems - legion-llm - Versions diffs - 0.6.20 → 0.6.23 - Mend

legion-llm 0.6.20 → 0.6.23

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml +4 -4
data/CHANGELOG.md +35 -0
data/README.md +1 -1
data/docs/llm-schema-spec.md +145 -2
data/lib/legion/llm/pipeline/executor.rb +150 -13
data/lib/legion/llm/pipeline/request.rb +35 -16
data/lib/legion/llm/pipeline/response.rb +9 -1
data/lib/legion/llm/pipeline/steps/classification.rb +2 -2
data/lib/legion/llm/routes.rb +143 -56
data/lib/legion/llm/version.rb +1 -1
metadata +1 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 45d07a2c60a8663ba1b62165b3b489d49a2aac37ee1e1ec6abff7bd5f4357d6c
-  data.tar.gz: 9ee8246c75fee6d7e690b55f4e2a91b030f6b142c91dd79acb7bf66edf4d9d05
+  metadata.gz: bd422bcc5c5b6da0dbd4906df8ac394e5c712e709eb8cb367cc676fbf6e45f97
+  data.tar.gz: 148e5741014313918781e757c87a50e40b2d5e5ef164631b71959f6027c70316
 SHA512:
-  metadata.gz: 92b102167bb6f346fab490787baedda2f2fa6fb528713c6b055b269f747c490d56fed5d21027616bc2c6d1f7cf4069ce14bb9b7607f7a6ad07c2c69b05ce0814
-  data.tar.gz: f1fded39722bf678936df28f3bbf3ec095265bdabc28f70eaf67e64fae5519b7c58842a8432e4b4bfdc0476c9275473f75a9c83bf0c77d6f5cc2afe1fa700aeb
+  metadata.gz: dc80d32daf35e53bfe514a0e318911c97e9e3971374eb711128c68db6a02084cc9fd259f68ccf6e0242fb10ff1cbccf2a1ca9132b37e2aa1e6ee23cd5cbe0b5d
+  data.tar.gz: 5673f3536126bc1d3e17e69ab2892edb1b8bd9524bdc6c38993e85ac0869e48a42bd82f9d8691923527f701940a42969052bbbf9db40976434f1ad00210f3934

data/CHANGELOG.md CHANGED

@@ -1,5 +1,40 @@
 # Legion LLM Changelog
+## [0.6.23] - 2026-04-07
+### Fixed
+- `build_response_routing` now always sets `routing[:escalated]` (defaults to `false`) instead of conditionally omitting the key
+- Schema spec annotations updated: Thinking, Cache, Config(Generation) corrected to reflect `from_chat_args` first-class field mapping; ErrorResponse annotation updated with complete error hierarchy including `EscalationExhausted`, `PrivacyModeError`, `TokenBudgetExceeded`, `DaemonDeniedError`, `DaemonRateLimitedError`
+## [0.6.22] - 2026-04-07
+### Fixed
+- Classification LEVELS ordering: swapped `[:public, :internal, :restricted, :confidential]` to correct `[:public, :internal, :confidential, :restricted]` so severity comparisons work properly
+- `Response.from_ruby_llm` now extracts actual `stop_reason` from provider response instead of hardcoding `:end_turn`
+- `Request.from_chat_args` maps 16 fields (`tool_choice`, `generation`, `thinking`, `response_format`, `context_strategy`, `cache`, `fork`, `tokens`, `stop`, `modality`, `hooks`, `idempotency_key`, `ttl`, `metadata`, `enrichments`, `predictions`) to first-class struct members instead of dumping into `extra`
+- `build_response` populates routing details (strategy, tier, escalation chain, latency), cost estimation via `CostEstimator`, and actual stop reason instead of hardcoded defaults
+- `response_tool_calls` merges execution data (exchange_id, source, status, duration_ms, result) from timeline events into tool call hashes
+- `step_conversation_uuid` now auto-generates `conv_<hex>` when no conversation_id is provided (was a no-op)
+- `step_response_normalization` now normalizes all enrichment keys to string format (was a no-op)
+- Enrichment key `[:conversation_history]` corrected to `['context:conversation_history']` for consistent `source:type` pattern
+### Changed
+- Schema spec (`docs/llm-schema-spec.md`) updated: ToolCall, Config(Generation), Cost, Routing(response), Stop status changed from Partial/Not-implemented to Implemented
+## [0.6.21] - 2026-04-07
+### Added
+- Real-time tool call SSE streaming: tool-call, tool-result, and tool-error events emitted during execution, not after completion
+- `ClientToolMethods` module extracted from inline tool class for cleaner separation
+- Rich tool execution logging: command, path, pattern, url shown per tool type instead of just key names
+- `summarize_tool_args` produces structured log details per tool type (sh, file_read, file_write, file_edit, grep, glob, web_fetch, list_directory)
+- `tool_event_handler` callback on `Pipeline::Executor` for real-time tool event forwarding via `Thread.current`
+### Fixed
+- `install_tool_loop_guard` now uses `session.on_tool_call` instead of `session.on(:tool_call)` — RubyLLM callback was never firing, tool_call_id was always nil
+- `list_directory` tool now expands `~` via `File.expand_path` — previously failed with `ENOENT` on tilde paths
+- SSE text-delta events logged at debug level instead of info to reduce log noise
 ## [0.6.20] - 2026-04-06
 ### Added
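
The 0.6.22 `LEVELS` fix matters wherever severity is compared by array position. A minimal sketch, assuming levels are ranked by their index; the `at_least?` helper is hypothetical and is not taken from the gem:

```ruby
# Corrected ordering from the 0.6.22 changelog entry: least to most severe.
LEVELS = %i[public internal confidential restricted].freeze

# Hypothetical helper (not part of legion-llm): true when `level` is at least
# as severe as `threshold`, judged by position in LEVELS.
def at_least?(level, threshold)
  LEVELS.index(level) >= LEVELS.index(threshold)
end

at_least?(:restricted, :confidential) # => true
at_least?(:internal, :confidential)   # => false
```

Under the previous ordering, `:restricted` sat before `:confidential`, so the same index comparison would have ranked `:restricted` as the less severe of the two.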

data/README.md CHANGED

@@ -2,7 +2,7 @@
 LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension.
-**Version**: 0.6.14
+**Version**: 0.6.23
 ## Installation

data/docs/llm-schema-spec.md CHANGED

@@ -1,6 +1,75 @@
 # Legion::LLM Schema Specification
-## Status: Draft / Brainstorming
+## Status: Mixed — Envelope Implemented, Inner Types Aspirational
+**Implemented in**: `Pipeline::Request` and `Pipeline::Response` (`lib/legion/llm/pipeline/request.rb`, `response.rb`)
+**Version**: 1.0.0 (schema_version field on all payloads)
+**Last verified**: 2026-04-07
+The outer envelope is implemented: all 32 `Request` fields and 34 `Response` fields exist as `Data.define` members. However, many inner types (Message, ContentBlock, ToolCall, Chunk, Conversation, Feedback, ErrorResponse) are **not yet implemented as dedicated structs** — they are plain hashes or strings in the current code. Several Response fields are **always nil or empty** in the pipeline today.
+This document serves as both the **canonical reference** for what is implemented and the **target specification** for what inner types should look like. Sections are annotated with implementation status.
+For the AMQP wire protocol (exchange topology, queue configuration, message envelope, routing keys), see the Legion Wire Protocol spec in the LegionIO docs repo.
+### Implementation Status Matrix
+| Section | Status | Notes |
+|---------|--------|-------|
+| **Request (envelope)** | Implemented | All 32 fields exist on `Data.define`. `from_chat_args` maps all to first-class fields. |
+| **Response (envelope)** | Partial | All 34 fields exist. 10 fields always nil/empty (see below). |
+| **Message** | Not implemented | Plain `{ role:, content: }` hashes. No struct, no id/seq/status/version. |
+| **ContentBlock** | Not implemented | Content is always String. Only `:text` block used (system prompt caching). |
+| **Tool** | Partial | `ToolAdapter` has name/description/parameters. No `source` on object, no `version`. |
+| **ToolCall** | Partial | `id`, `name`, `arguments` + `exchange_id`, `source`, `status`, `duration_ms`, `result` merged from Timeline. `error` field never populated. Timeline lookup by tool name, not call ID (breaks duplicate tool calls). |
+| **ToolChoice** | Stub | Field exists on Request, defaults to `{ mode: :auto }`, never forwarded to provider. |
+| **Enrichment** | Implemented | RAG/GAIA enrichments work. Value shapes vary between steps. |
+| **Prediction** | Partial | Request-side works. Response-side actuals never filled in. |
+| **Tracing** | Implemented | trace_id, span_id, exchange_id all generated and propagated. |
+| **Classification** | Partial | Labels applied but routing restrictions not enforced. |
+| **Caller** | Implemented | Identity propagated, Profile derived. |
+| **Agent** | Not implemented | Response `agent` field always nil. |
+| **Billing** | Partial | Per-request cap only. No cumulative budget enforcement. |
+| **Test** | Implemented | Test mode flags propagated. |
+| **Modality** | Not implemented | Field exists, not acted upon. |
+| **Hooks** | Partial | Pre/post hooks on Request. Response hooks not fired. |
+| **Feedback** | Not implemented | No struct, class, or storage. Spec only. |
+| **Audit** | Implemented | Uses symbol keys (not string keys as spec claims). |
+| **Timeline** | Implemented | Event recording works. Participant tracking works. |
+| **Participants** | Implemented | Tracked via Timeline. |
+| **Wire Capture** | Not implemented | Response `wire` field always nil. |
+| **Retry** | Not implemented | Response `retry` field always nil. |
+| **Safety** | Not implemented | Response `safety` field always nil. |
+| **Rate Limit** | Not implemented | Response `rate_limit` field always nil. |
+| **Thinking** | Partial | Request thinking config mapped to first-class field. Response thinking **never populated** by executor (always nil). |
+| **Context Window** | Not implemented | `tokens.context_window`, `utilization`, `headroom` never populated. |
+| **Validation** | Not implemented | Response `validation` field always nil. |
+| **Provider Features** | Not implemented | Response `features` field always nil. |
+| **Model Deprecation** | Not implemented | Response `deprecation` field always nil. |
+| **Cache** | Partial | Request cache mapped to first-class field. Response `cache` always `{}`. |
+| **Chunk (Streaming)** | Not implemented | Raw RubyLLM chunks passed through; no spec-compliant Chunk struct. |
+| **ErrorResponse** | Not implemented | No struct; only exception classes (`LLMError` hierarchy). |
+| **Conversation** | Partial | `ConversationStore` exists but no `Conversation` struct. Limited fields. |
+| **Config (Generation)** | Implemented | `from_chat_args` now maps generation, thinking, response_format, etc. to first-class fields. |
+| **Quality** | Implemented | Returns `{ score:, band:, source: }` (not `{ score:, acceptable:, checker: }` as spec says). |
+| **Cost** | Implemented | Populated via `CostEstimator.estimate` with `estimated_usd`, `provider`, `model`. |
+| **Routing (response)** | Implemented | `provider`, `model`, `strategy`, `tier`, `escalated`, `escalation_chain`, `latency_ms` populated. |
+| **Stop** | Implemented | `stop.reason` extracted from provider response (`:end_turn`, `:tool_use`, etc.). |
+| **Metering** | Not implemented | Module exists but not wired into pipeline steps. |
+#### Response Fields Always Nil/Empty
+These Response fields exist on the `Data.define` but are **never populated** by the executor today:
+- `agent` — always nil
+- `cache` — always `{}`
+- `safety` — always nil
+- `rate_limit` — always nil
+- `features` — always nil
+- `deprecation` — always nil
+- `validation` — always nil
+- `wire` — always nil
+- `retry` — always nil
 ## Design Principles
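
The ToolCall row in the status matrix above notes that Timeline lookup is by tool name rather than call ID. A minimal sketch of why that loses execution data for duplicate invocations, and how keying by call ID would keep it. The event shape below is an assumption made for illustration; only the field names come from the matrix:

```ruby
# Hypothetical timeline events for two invocations of the same tool.
events = [
  { tool_call_id: 'call_1', tool_name: 'grep', status: :ok,    duration_ms: 12 },
  { tool_call_id: 'call_2', tool_name: 'grep', status: :error, duration_ms: 40 }
]

# Keyed by tool name: the second event overwrites the first.
by_name = events.to_h { |event| [event[:tool_name], event] }

# Keyed by call ID: each invocation keeps its own execution data.
by_id = events.to_h { |event| [event[:tool_call_id], event] }

by_name.size                   # => 1
by_name['grep'][:tool_call_id] # => "call_2"
by_id.size                     # => 2
```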
@@ -27,6 +96,8 @@ schema_version: "1.0.0"   # semver -- major.minor.patch
 ## Message
+> **Implementation status: NOT IMPLEMENTED** — No `Message` struct exists. Messages are plain hashes with only `role` and `content` in the pipeline. `ConversationStore` persists additional fields (`id`, `seq`, `parent_id`, `agent_id`, `created_at`) in its DB rows, but these are not surfaced as a structured Message object.
 The atomic unit of conversation. Every exchange between user, assistant, and tools is a Message.
 ```
@@ -78,6 +149,8 @@ message.text  # returns text content regardless of String vs Array<ContentBlock>
 ## Content Blocks
+> **Implementation status: NOT IMPLEMENTED** — No `ContentBlock` struct exists. Content is always a plain String in the pipeline. The only place a typed block hash is constructed is for system prompt caching (`{ type: :text, content: ..., cache_control: ... }`). No image, audio, video, document, tool_use, tool_result, citation, or error block handling exists.
 Multimodal content. When `Message.content` is an array, each element is a ContentBlock.
 ### Block Types
@@ -199,6 +272,8 @@ data:            Hash?         # structured error data
 ## Tool
+> **Implementation status: PARTIAL** — `ToolAdapter` wraps `RubyLLM::Tool` with `name`, `description`, `parameters`. The `source` field exists as a parallel lookup in `find_tool_source` (not on the tool object). `version` does not exist.
 Tool definitions available to the LLM.
 ```
@@ -227,6 +302,8 @@ Used by RBAC (can this caller use tools from this source?) and audit (which syst
 ## ToolCall
+> **Implementation status: PARTIAL** — Tool calls are hashes with `id`, `name`, `arguments` and optionally `exchange_id`, `source`, `status`, `duration_ms`, `result` merged from matching Timeline events. The `error` field is never populated. Timeline lookup uses tool name (not call ID), so duplicate invocations of the same tool in one response will only have execution data for the last invocation.
 A tool invocation made by the assistant, with execution results.
 ```
@@ -251,6 +328,8 @@ Always a parsed Hash, never a JSON string. Provider adapters that receive argume
 ## ToolChoice
+> **Implementation status: STUB** — Field exists on Request, defaults to `{ mode: :auto }`. The `:specific` mode's `name` field is not handled. The `tool_choice` value is never forwarded to the underlying RubyLLM provider call.
 Controls how the LLM uses available tools.
 ```
@@ -263,6 +342,8 @@ ToolChoice
 ## Enrichment
+> **Implementation status: IMPLEMENTED** — RAG and GAIA enrichments work. Note: value shapes are inconsistent across pipeline steps — not all enrichments include `content:`, `data:`, `duration_ms:`, `timestamp:` as spec describes.
 Things that *shaped* the request during processing. Any system can contribute enrichments without schema changes. Enrichments modify or observe the request -- for decisions and outcomes, see [Audit](#audit).
 Enrichments are a **Hash keyed by `"source:type"`**, not an array. This enables direct lookup and clean request-vs-response comparison without looping.
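
A minimal sketch of the hash-keyed lookup described above, together with the string-key normalization that the 0.6.22 changelog attributes to `step_response_normalization`. The keys and values here are illustrative only and are not taken from the gem:

```ruby
# Mixed symbol and string keys, as might accumulate across pipeline steps.
enrichments = {
  conversation_history: [{ role: 'user', content: 'hi' }],
  'rag:context_retrieval' => { content: 'retrieved context' }
}

# Convert every key to a String so lookups use one consistent key type.
normalized = enrichments.transform_keys(&:to_s)

normalized.key?('rag:context_retrieval') # => true
normalized.key?(:conversation_history)   # => false
normalized.keys.all?(String)             # => true
```

Converting keys with `to_s` only unifies the key type. It does not add a missing `source:` prefix, which is why the changelog lists the rename to `'context:conversation_history'` as a separate fix.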
@@ -319,6 +400,8 @@ Adding a new system requires zero schema changes -- just add a new key.
 ## Prediction
+> **Implementation status: PARTIAL** — Request-side predictions work (components can contribute predictions). Response-side actuals (`actual_value`, `accurate`) are never filled in — no post-execution comparison occurs.
 Hypothesis recorded before execution, compared to reality after execution. Enables self-improving systems. Any component in the pipeline can contribute predictions.
 Predictions are a **Hash keyed by `"source:type"`**, same pattern as enrichments. Direct lookup, no looping.
@@ -395,6 +478,8 @@ response.predictions.count { |_, v| v[:correct] }.to_f / response.predictions.si
 ## Tracing & Correlation
+> **Implementation status: IMPLEMENTED** — `trace_id`, `span_id`, `exchange_id` all generated and propagated via `Pipeline::Tracing`.
 OpenTelemetry-compatible distributed tracing. Groups related requests across agentic loops, forks, and multi-step tasks.
 ```
@@ -424,6 +509,8 @@ Tracing is present on Request, Response, ErrorResponse, and Chunk.
 ## Exchange (Per-Hop Tracking)
+> **Implementation status: IMPLEMENTED** — `conversation_id`, `request_id` (mapped to `id`), and `exchange_id` all generated via `Pipeline::Tracing` and propagated through the pipeline.
 Three-level ID hierarchy inspired by SIP's Call-ID / CSeq / Branch/Via model. Tracks every hop within a single request.
 ```
@@ -487,6 +574,8 @@ In practice, each exchange would become a child span under the request's span in
 ## Data Classification & Compliance
+> **Implementation status: PARTIAL** — Classification labels are applied to requests. However, routing restrictions (e.g., preventing PHI-tagged data from going to certain providers) are not enforced.
 Data governance for enterprise adoption. Controls where data can be processed, how long it's retained, and what it contains.
 ```
@@ -538,6 +627,8 @@ Provider registry includes each provider's processing jurisdiction. Router match
 ## Caller
+> **Implementation status: IMPLEMENTED** — Caller identity propagated through the pipeline. `Profile.derive` reads `caller[:requested_by][:type]` to determine step skipping.
 Auth-level identity tracking. Who authenticated to make this request, and on whose behalf. Separate from `agent` (which tracks AI entity identity).
 ```
@@ -602,6 +693,8 @@ RBAC checks `caller.requested_by` for permission evaluation. If `requested_for`
 ## Agent Identity
+> **Implementation status: NOT IMPLEMENTED** — The `agent` field exists on both Request and Response but is always nil. No agent identity is attached during pipeline execution.
 Tracks which AI entity is executing the request. Not about auth (that's `caller`) -- about the AI agent doing the work.
 ```
@@ -648,6 +741,8 @@ Multiple LLM requests can share a `task_id`, enabling: "Show me everything that
 ## Billing & Budget
+> **Implementation status: PARTIAL** — Per-request cost cap works. Cumulative budget tracking (daily/monthly limits) is not implemented. Metering module exists but is not wired into pipeline steps.
 Cost tracking, budget enforcement, and rate limiting.
 ```
@@ -690,6 +785,8 @@ Checked in the pipeline before the provider call:
 ## Test & Evaluation Mode
+> **Implementation status: IMPLEMENTED** — Test mode flags propagated through the pipeline.
 Controls for testing, benchmarking, replay, and experimentation.
 ```
@@ -743,6 +840,8 @@ Experiment results are tracked via predictions (expected: better quality with GA
 ## Modality
+> **Implementation status: NOT IMPLEMENTED** — The `modality` field exists on Request but is not acted upon by the pipeline or provider adapters.
 Declares input and output modality expectations. Guides routing (not all providers support all combinations) and future-proofs for multimodal evolution.
 ```
@@ -799,6 +898,8 @@ Provider capabilities:
 ## Lifecycle Hooks
+> **Implementation status: PARTIAL** — Pre/post hooks on Request are supported. Response-side hook firing is not implemented.
 Caller-declared injection points in the pipeline. Named hooks registered by extensions or configuration.
 ```
@@ -839,6 +940,8 @@ Hooks receive the full request/response context and can add enrichments, but can
 ## Feedback
+> **Implementation status: NOT IMPLEMENTED** — No Feedback struct, class, or storage exists. No code submits, receives, or stores feedback.
 User or automated quality feedback on specific messages. Lives on the Conversation, not on individual requests. Closes the learning loop.
 ```
@@ -884,6 +987,8 @@ Quality checkers and GAIA can also submit feedback:
 ## Audit
+> **Implementation status: IMPLEMENTED** — Audit records are populated by the pipeline. Note: uses symbol keys (`:step`, `:action`), not string keys as some examples in this spec show.
 Record of what *happened* during pipeline processing -- decisions, actions, outcomes. Separate from enrichments (which record what *shaped* the request). Response-only.
 Audit is a **Hash keyed by `"step:action"`**, same pattern as enrichments and predictions.
@@ -990,6 +1095,8 @@ response.audit[:"persistence:store"][:data][:method]  # => :direct
 ## Pipeline Timeline
+> **Implementation status: IMPLEMENTED** — `Pipeline::Timeline` records ordered events with participant tracking.
 Inspired by [Homer/SIPCAPTURE](https://github.com/sipcapture/homer) call flow diagrams. A unified, globally-sequenced timeline of **everything** that happened during a request. Reconstructs the full call flow across all systems -- enrichments, audit, tool calls, provider calls, connections -- in one ordered record.
 This is the **one place an array is correct**. Timeline is ordered data, not lookup data. You iterate it in sequence to reconstruct the call flow, like Homer's ladder diagram.
@@ -1125,6 +1232,8 @@ The timeline is built during pipeline execution and returned on the response. It
 ## Participants
+> **Implementation status: IMPLEMENTED** — Tracked via `Pipeline::Timeline`.
 All systems that touched this request. Enables Homer-style column headers for call flow visualization. Response-only, populated by the pipeline.
 ```
@@ -1154,6 +1263,8 @@ Auto-populated: every unique `from` and `to` value in the timeline becomes a par
 ## Wire Capture
+> **Implementation status: NOT IMPLEMENTED** — Response `wire` field is always nil. No capture of raw provider payloads occurs.
 Raw request and response payloads as sent to/received from the provider. For debugging translator issues, you need both sides of the wire. Opt-in (can be expensive to store).
 Keyed by `exchange_id` -- one capture per provider call, not per request. A request with retries or tool loops produces multiple wire captures.
@@ -1223,6 +1334,8 @@ This lives on `response.routing.connection` since it's part of the routing outco
 ## Retry
+> **Implementation status: NOT IMPLEMENTED** — Response `retry` field is always nil. Retry logic exists in the executor (rate limit rescue) but results are not captured in the retry struct.
 Distinct from escalation. Retries are the same provider/model attempted again after a transient failure. Escalation is switching to a different provider/model.
 ```
@@ -1269,6 +1382,8 @@ response.retry = {
 ## Content Safety
+> **Implementation status: NOT IMPLEMENTED** — Response `safety` field is always nil. Provider safety results are not captured.
 Provider-reported content filtering results. Different from classification (which is our data governance). This is the provider saying "I evaluated this content against my safety policies."
 Response-only. Not all providers return this.
@@ -1322,6 +1437,8 @@ response.safety = {
 ## Rate Limit State
+> **Implementation status: NOT IMPLEMENTED** — Response `rate_limit` field is always nil. Provider rate limit headers are not captured (rate limit errors are rescued and retried, but quota state is not stored).
 Provider quota state returned in response headers. Structured and always captured (not opt-in like wire). Critical for routing decisions.
 ```
@@ -1361,6 +1478,8 @@ end
 ## Thinking & Reasoning
+> **Implementation status: PARTIAL** — Request-side thinking configuration is mapped to the first-class `thinking` field by `from_chat_args`. Response-side `thinking` field exists on the Response struct but is **never populated** by the executor — it is always nil.
 Controls for extended thinking, chain-of-thought, and reasoning behavior. Separate from generation parameters (temperature, top_p) because reasoning is about *how deeply* the model thinks, not *how randomly* it samples.
 ### Request side
@@ -1395,6 +1514,8 @@ Thinking tokens are tracked separately from regular output tokens because they h
 ## Context Window Utilization
+> **Implementation status: NOT IMPLEMENTED** — `tokens.context_window`, `tokens.utilization`, and `tokens.headroom` are never populated on the Response. Only `input_tokens` and `output_tokens` are set.
 Expands response-side tokens with capacity information. Drives context strategy decisions.
 Added to `response.tokens`:
@@ -1439,6 +1560,8 @@ end
 ## Structured Output Validation
+> **Implementation status: NOT IMPLEMENTED** — Response `validation` field is always nil. `StructuredOutput` module exists for enforcing schemas but does not populate this struct.
 When `response_format.type` is `:json` or `:json_schema`, reports whether the response actually validated.
 Response-only. Added to response alongside quality.
@@ -1479,6 +1602,8 @@ response.validation = {
 ## Provider Features
+> **Implementation status: NOT IMPLEMENTED** — Response `features` field is always nil.
 Post-hoc report of which provider-specific features actually activated on this request. Different from capabilities (what the provider CAN do) -- this is what it DID.
 Response-only. Hash-keyed by feature name.
@@ -1519,6 +1644,8 @@ end
 ## Model Deprecation
+> **Implementation status: NOT IMPLEMENTED** — Response `deprecation` field is always nil.
 Structured deprecation warnings from providers. Separate from the `warnings` array because automated systems need to act on these programmatically.
 Response-only.
@@ -1564,6 +1691,8 @@ end
 ## Cache
+> **Implementation status: PARTIAL** — Request-side `cache` field is mapped to the first-class field by `from_chat_args` (defaults to `{ strategy: :default, cacheable: true }`). Response-side `cache` field is always `{}`.
 Symmetric caching controls on request and response. Replaces a flat strategy symbol with structured metadata.
 ### Request side (what I want)
@@ -1631,6 +1760,8 @@ Response: cache: { hit: true, key: "sha256:abc123", tier: :local, age: 45, expir
 ## Request
+> **Implementation status: IMPLEMENTED (envelope)** — All 32 fields exist as `Data.define` members with `.build` and `.from_chat_args` constructors. All fields including `generation`, `thinking`, `response_format`, `context_strategy`, `cache`, `fork`, `tokens`, `stop`, `modality`, `hooks`, `idempotency_key`, `ttl`, `metadata`, `enrichments`, and `predictions` are mapped to first-class struct members. Convenience accessors (`.model`, `.provider`) described in the spec are not defined.
 What goes into the Legion::LLM pipeline.
 ```
@@ -1795,6 +1926,8 @@ For queue ordering when requests go through RMQ:
 ## Response
+> **Implementation status: PARTIAL (envelope)** — All 34 fields exist as `Data.define` members. 9 fields are always nil/empty (see status matrix above). `routing` populates `provider`, `model`, `strategy`, `tier`, `escalated`, `escalation_chain`, `latency_ms`. `stop.reason` extracted from provider response (falls back to `:end_turn`). `quality` returns `{ score:, band:, source:, signals: }` from `ConfidenceScorer` (not `{ score:, acceptable:, checker: }` as the Response struct below shows). `cost` populated via `CostEstimator.estimate` with `estimated_usd`, `provider`, `model`. Convenience accessors (`.model`, `.provider`) are not defined.
 What comes back from the Legion::LLM pipeline.
 ```
@@ -1843,7 +1976,7 @@ Response
   # Stop (symmetric with request)
   stop:              Hash
-    reason:          Symbol       # :end_turn, :tool_calls, :max_tokens, :safety, :stop_sequence
+    reason:          Symbol       # :end_turn, :tool_use, :max_tokens, :safety, :stop_sequence
     sequence:        String?      # which stop sequence was hit (nil if none)
   # Tools (symmetric with request)
@@ -1984,6 +2117,8 @@ response.participants          # ["pipeline", "rbac", "provider:claude", ...]
 ## Chunk (Streaming)
+> **Implementation status: NOT IMPLEMENTED** — No `Chunk` struct exists. Streaming (`call_stream`) yields raw RubyLLM chunk objects directly to callers with no translation to the spec format.
 Incremental data during a streamed response.
 ```
@@ -2015,6 +2150,8 @@ Chunk
 ## ErrorResponse
+> **Implementation status: NOT IMPLEMENTED** — No `ErrorResponse` struct exists. Errors are raised as exceptions from the `Legion::LLM` error hierarchy: `LLMError` (base), `AuthError`, `RateLimitError`, `ContextOverflow`, `ProviderError`, `ProviderDown`, `UnsupportedCapability`, `PipelineError`, `TokenBudgetExceeded`, `EmbeddingUnavailableError`. Additionally, `EscalationExhausted`, `DaemonDeniedError`, `DaemonRateLimitedError`, and `PrivacyModeError` inherit from `StandardError` directly (not `LLMError`). These are Ruby exceptions, not structured response payloads.
 Standard error format for failed requests.
 ```
@@ -2059,6 +2196,8 @@ ErrorResponse
 ## Conversation
+> **Implementation status: PARTIAL** — `ConversationStore` exists as an in-memory LRU (256 slots) with optional DB persistence. No `Conversation` struct — conversations are plain hashes (`{ messages: [], metadata: {}, lru_tick: N }`). DB persistence stores `id`, `caller_identity`, `metadata` (JSON blob), `created_at`, `updated_at`. Most spec fields (`title`, `summary`, `state`, `shared`, `participants`, `tags`, `pinned`, `usage_total`, `routing_history`) exist only as arbitrary metadata blob entries, not first-class fields.
 The persistent conversation object stored in the ConversationStore.
 ```
@@ -2119,6 +2258,8 @@ Legion::LLM.chat(
 ## Config (Generation Parameters)
+> **Implementation status: PARTIAL** — Generation parameters are mapped to the first-class `generation` field by `from_chat_args`. However, provider adapters only forward `model` and `provider` to RubyLLM, not temperature/top_p/etc from the `generation` hash.
 Sent in `request.generation`. Provider adapters map supported parameters and ignore unsupported ones.
 ```
@@ -2162,6 +2303,8 @@ response_format:
 ## Provider Adapter Contract
+> **Implementation status: PARTIAL** — Provider LEXs (extensions-ai/) exist and work for chat/embed. The formal `ProviderAdapter` interface with `Translator` is not enforced — providers integrate via RubyLLM's native provider system.
 Every provider LEX must implement `Legion::LLM::ProviderAdapter` including a `Translator`.
 ### Required methods

data/lib/legion/llm/pipeline/executor.rb CHANGED

@@ -17,6 +17,7 @@ module Legion
         attr_reader :request, :profile, :timeline, :tracing, :enrichments,
                     :audit, :warnings, :discovered_tools, :confidence_score,
                     :escalation_chain
+        attr_accessor :tool_event_handler
         include Steps::ToolDiscovery
         include Steps::ToolCalls
@@ -67,6 +68,7 @@ module Legion
           @escalation_chain = nil
           @escalation_history = []
           @proactive_tier_assignment = nil
+          @tool_event_handler = nil
         end
         def call
@@ -164,7 +166,11 @@ module Legion
         def step_idempotency; end
-        def step_conversation_uuid; end
+        def step_conversation_uuid
+          return if @request.conversation_id
+          @request = @request.with(conversation_id: "conv_#{SecureRandom.hex(8)}")
+        end
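
A standalone sketch of the ID fallback that `step_conversation_uuid` now performs. The `ensure_conversation_id` helper is hypothetical and exists only to show the format that `SecureRandom.hex(8)` produces:

```ruby
require 'securerandom'

# Returns the given ID unchanged, or a generated "conv_<hex>" ID when none is set.
# SecureRandom.hex(8) yields 8 random bytes rendered as 16 hex characters.
def ensure_conversation_id(conversation_id)
  conversation_id || "conv_#{SecureRandom.hex(8)}"
end

ensure_conversation_id('conv_existing') # => "conv_existing"
ensure_conversation_id(nil)             # matches /\Aconv_\h{16}\z/
```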
         def step_context_load
           conv_id = @request.conversation_id
@@ -187,7 +193,7 @@ module Legion
                       maybe_compact_history(conv_id, history)
                     end
-          @enrichments[:conversation_history] = history
+          @enrichments['context:conversation_history'] = history
           @timeline.record(
             category: :internal, key: 'context:loaded',
             direction: :internal, detail: "loaded #{history.size} prior messages",
@@ -656,7 +662,15 @@ module Legion
           session, message_content = build_ruby_llm_session
           install_tool_loop_guard(session)
-          @raw_response = message_content ? session.ask(message_content, &) : session
+          Thread.current[:legion_tool_event_handler] = @tool_event_handler
+          begin
+            @raw_response = message_content ? session.ask(message_content, &) : session
+          ensure
+            Thread.current[:legion_tool_event_handler] = nil
+            Thread.current[:legion_current_tool_call_id] = nil
+            Thread.current[:legion_current_tool_name] = nil
+          end
           @timestamps[:provider_end] = Time.now
           record_provider_response
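
The hunk above publishes the handler through `Thread.current` and clears it in an `ensure` block. A minimal sketch of that pattern in isolation, assuming a hypothetical `with_thread_local` helper and a demo key that are not part of the gem:

```ruby
# Hypothetical helper: expose a value through Thread.current for the duration
# of a block, then clear it whether or not the block raises.
def with_thread_local(key, value)
  Thread.current[key] = value
  yield
ensure
  Thread.current[key] = nil
end

events = []
handler = ->(**event) { events << event }

begin
  with_thread_local(:demo_tool_event_handler, handler) do
    Thread.current[:demo_tool_event_handler].call(type: :tool_call, tool_name: 'grep')
    raise 'provider failed'
  end
rescue RuntimeError
  # The error still propagates to the caller; the slot was cleared on the way out.
end

events.size                              # => 1
Thread.current[:demo_tool_event_handler] # => nil
```

Clearing in `ensure` keeps a stale handler from leaking into the next request served by the same thread. Note that `Thread#[]` storage in Ruby is fiber-local, so code running in a different fiber would not see the value.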
@@ -690,18 +704,47 @@ module Legion
         end
         def install_tool_loop_guard(session)
-          return unless session.respond_to?(:on)
+          unless session.respond_to?(:on_tool_call)
+            log.warn('[pipeline] tool loop guard unavailable: ruby_llm session does not respond to on_tool_call')
+            return
+          end
           tool_round = 0
-          session.on(:tool_call) do |_tool_call|
+          session.on_tool_call do |tool_call|
             tool_round += 1
             if tool_round > MAX_RUBY_LLM_TOOL_ROUNDS
               log.warn("[pipeline] tool loop cap hit: #{tool_round} rounds, halting")
               raise Legion::LLM::PipelineError, "tool loop exceeded #{MAX_RUBY_LLM_TOOL_ROUNDS} rounds"
             end
+            emit_tool_call_event(tool_call, tool_round)
           end
         end
+        def emit_tool_call_event(tool_call, round)
+          tc_id   = tool_call_field(tool_call, :id)
+          tc_name = tool_call_field(tool_call, :name)
+          tc_args = tool_call_field(tool_call, :arguments)
+          log.info("[pipeline][tool-call] round=#{round} id=#{tc_id} tool=#{tc_name}")
+          Thread.current[:legion_current_tool_call_id] = tc_id
+          Thread.current[:legion_current_tool_name] = tc_name
+          @tool_event_handler&.call(
+            type: :tool_call, tool_call_id: tc_id, tool_name: tc_name,
+            arguments: tc_args, round: round
+          )
+        end
+        def tool_call_field(tool_call, field)
+          return tool_call.public_send(field) if tool_call.respond_to?(field)
+          tool_call[field]
+        rescue StandardError
+          nil
+        end
         def apply_ruby_llm_instructions(session)
           injected_system = EnrichmentInjector.inject(
             system:      @request.system,
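
`tool_call_field` in the hunk above reads a field from either an object with accessor methods or a hash, and returns `nil` on any error. The sketch below copies that accessor pattern under a different name (`read_field`) with a hypothetical `ToolCall` struct, to show which inputs resolve and which fall back to `nil`:

```ruby
# Stand-alone illustration of the accessor pattern from the diff; the name
# read_field and the ToolCall struct are hypothetical.
def read_field(source, field)
  return source.public_send(field) if source.respond_to?(field)

  source[field]
rescue StandardError
  nil
end

ToolCall = Struct.new(:id, :name) # stand-in for a tool call object

from_object = read_field(ToolCall.new('tc_1', 'sh'), :name)
from_hash   = read_field({ id: 'tc_2', name: 'grep' }, :name)
string_keys = read_field({ 'name' => 'glob' }, :name) # symbol lookup misses string keys
from_nil    = read_field(nil, :name)                  # NoMethodError is swallowed
```

One consequence worth noting: a hash with string keys yields `nil` here, unlike `response_tool_calls` further down, which checks both symbol and string keys.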
@@ -758,7 +801,7 @@ module Legion
           attrs = Steps::SpanAnnotator.attributes_for(step_name, audit: @audit, enrichments: @enrichments)
           attrs.each { |key, val| span.set_attribute(key, val) unless val.nil? }
         rescue StandardError => e
-          handle_exception(e, level: :debug, operation: 'llm.pipeline.annotate_span', step: step_name)
+          handle_exception(e, level: :warn, operation: 'llm.pipeline.annotate_span', step: step_name)
           nil
         end
@@ -783,7 +826,7 @@ module Legion
             span.set_attribute('routing.tier', data[:tier].to_s) if data[:tier]
           end
         rescue StandardError => e
-          handle_exception(e, level: :debug, operation: 'llm.pipeline.annotate_top_level_span')
+          handle_exception(e, level: :warn, operation: 'llm.pipeline.annotate_top_level_span')
           nil
         end
@@ -800,7 +843,14 @@ module Legion
           nil
         end
-        def step_response_normalization; end
+        def step_response_normalization
+          # Normalize enrichment keys to consistent string "source:type" format
+          normalized = {}
+          @enrichments.each do |key, value|
+            normalized[key.to_s] = value
+          end
+          @enrichments = normalized
+        end
         def step_context_store
           conv_id = @request.conversation_id
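
The new `step_response_normalization` body stringifies every enrichment key. As a rough sketch with made-up enrichment data, the same result can be obtained with `Hash#transform_keys`, and the example also shows what happens when a symbol key and a string key collide:

```ruby
# Hypothetical enrichment data; only the key handling matters here.
enrichments = { 'rag:context': ['doc'], 'context:conversation_history' => [] }

# Equivalent to the each-loop in the diff: every key becomes a String.
normalized = enrichments.transform_keys(&:to_s)

# When a Symbol key and a String key stringify to the same name,
# the entry that comes later in the hash wins.
collided = { 'a:b': 1, 'a:b' => 2 }.transform_keys(&:to_s)
```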
@@ -865,10 +915,11 @@ module Legion
             request_id:      @request.id,
             conversation_id: @request.conversation_id || "conv_#{SecureRandom.hex(8)}",
             message:         msg,
-            routing:         { provider: @resolved_provider, model: @resolved_model },
+            routing:         build_response_routing,
             tokens:          extract_tokens,
-            stop:            { reason: :end_turn },
+            stop:            extract_stop_reason,
             tools:           response_tool_calls,
+            cost:            estimate_response_cost,
             timestamps:      @timestamps,
             enrichments:     @enrichments,
             audit:           @audit,
@@ -890,17 +941,103 @@ module Legion
           Array(requested).map { |name| name.to_s.tr('.', '_') }.reject(&:empty?)
         end
+        def build_response_routing
+          routing = { provider: @resolved_provider, model: @resolved_model }
+          routing_audit = @audit[:'routing:provider_selection']
+          if routing_audit.is_a?(Hash) && routing_audit[:data].is_a?(Hash)
+            routing[:strategy] = routing_audit[:data][:strategy]
+            routing[:tier]     = routing_audit[:data][:tier]
+          end
+          routing[:escalated] = @escalation_history.size > 1
+          routing[:escalation_chain] = @escalation_history if @escalation_history.any?
+          if @timestamps[:provider_start] && @timestamps[:provider_end]
+            routing[:latency_ms] = ((@timestamps[:provider_end] - @timestamps[:provider_start]) * 1000).round
+          end
+          routing
+        end
+        def extract_stop_reason
+          reason = if @raw_response.respond_to?(:stop_reason)
+                     @raw_response.stop_reason&.to_sym
+                   elsif @raw_response.respond_to?(:tool_calls) && @raw_response.tool_calls&.any?
+                     :tool_use
+                   end
+          { reason: reason || :end_turn }
+        rescue StandardError
+          { reason: :end_turn }
+        end
+        def estimate_response_cost
+          tokens = extract_tokens
+          input  = tokens.respond_to?(:input_tokens) ? tokens.input_tokens : tokens[:input].to_i
+          output = tokens.respond_to?(:output_tokens) ? tokens.output_tokens : tokens[:output].to_i
+          return {} unless @resolved_model && (input + output).positive?
+          estimated = CostEstimator.estimate(
+            model_id:      @resolved_model,
+            input_tokens:  input,
+            output_tokens: output
+          )
+          { estimated_usd: estimated, provider: @resolved_provider, model: @resolved_model }
+        rescue StandardError
+          {}
+        end
         def response_tool_calls
           return [] unless @raw_response.respond_to?(:tool_calls) && @raw_response.tool_calls
+          tool_timeline = build_tool_timeline_index
           Array(@raw_response.tool_calls).map do |tool_call|
-            {
-              id:        tool_call[:id] || tool_call['id'],
-              name:      tool_call[:name] || tool_call['name'],
+            tc_id   = tool_call[:id] || tool_call['id']
+            tc_name = tool_call[:name] || tool_call['name']
+            entry = {
+              id:        tc_id,
+              name:      tc_name,
               arguments: tool_call[:arguments] || tool_call['arguments'] || {}
             }
+            # Merge execution data from timeline if available
+            timeline_data = tool_timeline[tc_name]
+            if timeline_data
+              entry[:exchange_id] = timeline_data[:exchange_id]
+              entry[:source]      = timeline_data[:source]
+              entry[:status]      = timeline_data[:status]
+              entry[:duration_ms] = timeline_data[:duration_ms]
+              entry[:result]      = timeline_data[:result]
+            end
+            entry
           end
         end
+        def build_tool_timeline_index
+          index = {}
+          @timeline.events.each do |event|
+            key = event[:key]
+            data = event[:data] || {}
+            if key&.start_with?('tool:execute:')
+              tool_name = key.sub('tool:execute:', '')
+              index[tool_name] = {
+                exchange_id: event[:exchange_id],
+                source:      data[:source],
+                status:      data[:status],
+                duration_ms: event[:duration_ms]
+              }
+            elsif key&.start_with?('tool:result:')
+              tool_name = key.sub('tool:result:', '')
+              index[tool_name][:result] = data[:result] if index[tool_name]
+            end
+          end
+          index
+        end
       end
     end
   end
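
`build_tool_timeline_index` keys its index by tool name rather than by tool call id. The sketch below uses hypothetical timeline events and a stand-alone function (`index_by_tool_name`, not the gem's method) to show one consequence under that assumption: when the same tool runs twice, only the later execution survives in the index.

```ruby
# Hypothetical timeline events shaped like the ones read in the diff.
events = [
  { key: 'tool:execute:grep', exchange_id: 'ex_1', duration_ms: 12, data: { status: :ok } },
  { key: 'tool:result:grep',  data: { result: 'first' } },
  { key: 'tool:execute:grep', exchange_id: 'ex_2', duration_ms: 30, data: { status: :ok } },
  { key: 'tool:result:grep',  data: { result: 'second' } }
]

# Illustrative re-statement of the indexing logic, keyed by tool name.
def index_by_tool_name(events)
  events.each_with_object({}) do |event, index|
    key  = event[:key].to_s
    data = event[:data] || {}
    if key.start_with?('tool:execute:')
      name = key.delete_prefix('tool:execute:')
      # A second execution of the same tool replaces the first entry.
      index[name] = { exchange_id: event[:exchange_id], status: data[:status],
                      duration_ms: event[:duration_ms] }
    elsif key.start_with?('tool:result:')
      name = key.delete_prefix('tool:result:')
      index[name][:result] = data[:result] if index[name]
    end
  end
end

index = index_by_tool_name(events)
```

Whether repeated calls to one tool occur within a single response depends on timeline contents that this diff does not show.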

data/lib/legion/llm/pipeline/request.rb CHANGED

@@ -67,26 +67,45 @@ module Legion
           extra = kwargs.except(
             :message, :messages, :model, :provider, :system,
-            :tools, :stream, :caller, :classification, :billing,
+            :tools, :tool_choice, :stream, :caller, :classification, :billing,
             :agent, :test, :tracing, :priority, :conversation_id,
-            :request_id, :id
+            :request_id, :id, :generation, :thinking, :response_format,
+            :context_strategy, :cache, :fork, :tokens, :stop,
+            :modality, :hooks, :idempotency_key, :ttl, :metadata,
+            :enrichments, :predictions
           )
           build_args = {
-            messages:        messages,
-            system:          kwargs[:system],
-            routing:         routing,
-            tools:           kwargs.fetch(:tools, []),
-            stream:          kwargs.fetch(:stream, false),
-            caller:          kwargs[:caller],
-            classification:  kwargs[:classification],
-            billing:         kwargs[:billing],
-            agent:           kwargs[:agent],
-            test:            kwargs[:test],
-            tracing:         kwargs[:tracing],
-            priority:        kwargs.fetch(:priority, :normal),
-            conversation_id: kwargs[:conversation_id],
-            extra:           extra
+            messages:         messages,
+            system:           kwargs[:system],
+            routing:          routing,
+            tools:            kwargs.fetch(:tools, []),
+            tool_choice:      kwargs[:tool_choice] || { mode: :auto },
+            stream:           kwargs.fetch(:stream, false),
+            generation:       kwargs[:generation] || {},
+            thinking:         kwargs[:thinking],
+            response_format:  kwargs[:response_format] || { type: :text },
+            context_strategy: kwargs.fetch(:context_strategy, :auto),
+            cache:            kwargs[:cache] || { strategy: :default, cacheable: true },
+            fork:             kwargs[:fork],
+            tokens:           kwargs[:tokens] || { max: 4096 },
+            stop:             kwargs[:stop] || { sequences: [] },
+            modality:         kwargs[:modality],
+            hooks:            kwargs[:hooks],
+            caller:           kwargs[:caller],
+            classification:   kwargs[:classification],
+            billing:          kwargs[:billing],
+            agent:            kwargs[:agent],
+            test:             kwargs[:test],
+            tracing:          kwargs[:tracing],
+            priority:         kwargs.fetch(:priority, :normal),
+            conversation_id:  kwargs[:conversation_id],
+            idempotency_key:  kwargs[:idempotency_key],
+            ttl:              kwargs[:ttl],
+            metadata:         kwargs[:metadata] || {},
+            enrichments:      kwargs[:enrichments] || {},
+            predictions:      kwargs[:predictions] || {},
+            extra:            extra
           }
           build_args[:id] = request_id if request_id
           build(**build_args)
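
The new `build_args` hash mixes two defaulting styles: `kwargs.fetch(:key, default)` and `kwargs[:key] || default`. They differ when a caller passes an explicit `nil`, as this small sketch with made-up input shows:

```ruby
# Hypothetical caller input with explicit nils.
kwargs = { stream: nil, tool_choice: nil, priority: :high }

# fetch only falls back when the key is absent, so an explicit nil is kept.
stream = kwargs.fetch(:stream, false)

# || falls back whenever the value is nil or false.
tool_choice = kwargs[:tool_choice] || { mode: :auto }

priority         = kwargs.fetch(:priority, :normal)        # key present, value used
context_strategy = kwargs.fetch(:context_strategy, :auto)  # key absent, default used
```

So under this reading, `stream: nil` stays `nil` while `tool_choice: nil` becomes `{ mode: :auto }`.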

data/lib/legion/llm/pipeline/response.rb CHANGED

@@ -55,13 +55,21 @@ module Legion
           input  = msg.respond_to?(:input_tokens) ? msg.input_tokens.to_i : 0
           output = msg.respond_to?(:output_tokens) ? msg.output_tokens.to_i : 0
+          stop_reason = if msg.respond_to?(:stop_reason)
+                          msg.stop_reason&.to_sym || :end_turn
+                        elsif msg.respond_to?(:tool_calls) && msg.tool_calls&.any?
+                          :tool_use
+                        else
+                          :end_turn
+                        end
           build(
             request_id:      request_id,
             conversation_id: conversation_id,
             message:         { role: :assistant, content: msg.content },
             routing:         { provider: provider, model: model || (msg.respond_to?(:model_id) ? msg.model_id : nil) },
             tokens:          { input: input, output: output, total: input + output },
-            stop:            { reason: :end_turn },
+            stop:            { reason: stop_reason },
             **extra
           )
         end
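
The stop reason is now derived from the provider message instead of being hard-coded to `:end_turn`. The sketch below mirrors the branch order from the hunk in a stand-alone function (`derive_stop_reason`) with hypothetical message structs, to make the precedence explicit:

```ruby
# Illustrative only: same branch order as the diff, hypothetical inputs.
def derive_stop_reason(msg)
  if msg.respond_to?(:stop_reason)
    msg.stop_reason&.to_sym || :end_turn
  elsif msg.respond_to?(:tool_calls) && msg.tool_calls&.any?
    :tool_use
  else
    :end_turn
  end
end

WithReason = Struct.new(:stop_reason, :tool_calls) # exposes stop_reason
ToolsOnly  = Struct.new(:tool_calls)               # no stop_reason accessor

explicit    = derive_stop_reason(WithReason.new('max_tokens', nil))
inferred    = derive_stop_reason(ToolsOnly.new({ 'tc_1' => {} }))
nil_reason  = derive_stop_reason(WithReason.new(nil, { 'tc_1' => {} }))
no_signal   = derive_stop_reason(ToolsOnly.new(nil))
```

Note the third case: when the message responds to `stop_reason` but returns `nil`, the tool-call branch is never consulted and the result is `:end_turn`.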

data/lib/legion/llm/pipeline/steps/classification.rb CHANGED

@@ -9,7 +9,7 @@ module Legion
         module Classification
           include Legion::Logging::Helper
-          LEVELS = %i[public internal restricted confidential].freeze
+          LEVELS = %i[public internal confidential restricted].freeze
           PII_PATTERNS = {
             ssn:   /\b\d{3}-\d{2}-\d{4}\b/,
@@ -105,7 +105,7 @@ module Legion
             { level: level.to_sym }
           rescue StandardError => e
-            handle_exception(e, level: :debug, operation: 'llm.pipeline.steps.classification.default')
+            handle_exception(e, level: :warn, operation: 'llm.pipeline.steps.classification.default')
             nil
           end
         end
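
The `LEVELS` change swaps the positions of `:restricted` and `:confidential`. The code that compares levels is not part of this diff, so the following is only a sketch under the assumption that levels are ranked by their position in the array; the `stricter` helper is hypothetical.

```ruby
OLD_LEVELS = %i[public internal restricted confidential].freeze
NEW_LEVELS = %i[public internal confidential restricted].freeze

# Hypothetical helper: pick the stricter of two levels by array position.
def stricter(levels, first, second)
  [first, second].max_by { |level| levels.index(level) }
end

old_pick = stricter(OLD_LEVELS, :restricted, :confidential)
new_pick = stricter(NEW_LEVELS, :restricted, :confidential)
```

If ranking does work this way, the reorder changes which of the two levels is treated as the stricter one.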

data/lib/legion/llm/routes.rb CHANGED

@@ -15,6 +15,93 @@ require 'legion/logging/helper'
 module Legion
   module LLM
     module Routes
+      # Mixin for dynamically-built client tool classes — keeps build_client_tool_class small.
+      module ClientToolMethods
+        private
+        def log_tool(level, ref, status, **details)
+          return unless defined?(Legion::Logging)
+          parts = ["[tool][#{ref}] #{status}"]
+          details.each { |k, v| parts << "#{k}=#{v}" }
+          Legion::Logging.send(level, parts.join(' '))
+        end
+        def summarize_tool_arg_keys(kwargs)
+          kwargs.keys.map(&:to_s).sort.join(',')
+        end
+        def summarize_tool_args(ref, kwargs)
+          case ref
+          when 'sh'
+            { args: summarize_tool_arg_keys(kwargs), command_provided: kwargs.key?(:command) || kwargs.key?(:cmd) || !kwargs.empty? }
+          when 'file_write'
+            content = kwargs[:content] || kwargs[:contents]
+            { args: summarize_tool_arg_keys(kwargs), bytes: content.to_s.bytesize }
+          when 'file_edit'
+            { args: summarize_tool_arg_keys(kwargs),
+              old_len: kwargs[:old_text].to_s.length, new_len: kwargs[:new_text].to_s.length }
+          else
+            { args: summarize_tool_arg_keys(kwargs) }
+          end
+        end
+        def dispatch_client_tool(ref, **kwargs)
+          case ref
+          when 'sh'
+            cmd = kwargs[:command] || kwargs[:cmd] || kwargs.values.first.to_s
+            output, status = ::Open3.capture2e(cmd, chdir: Dir.pwd)
+            "exit=#{status.exitstatus}\n#{output}"
+          when 'file_read'
+            path = kwargs[:path] || kwargs[:file_path] || kwargs.values.first.to_s
+            ::File.exist?(path) ? ::File.read(path, encoding: 'utf-8') : "File not found: #{path}"
+          when 'file_write'
+            path = kwargs[:path] || kwargs[:file_path]
+            content = kwargs[:content] || kwargs[:contents]
+            ::File.write(path, content)
+            "Written #{content.to_s.bytesize} bytes to #{path}"
+          when 'file_edit'
+            path = kwargs[:path] || kwargs[:file_path]
+            old_text = kwargs[:old_text] || kwargs[:search]
+            new_text = kwargs[:new_text] || kwargs[:replace]
+            content = ::File.read(path, encoding: 'utf-8')
+            content.sub!(old_text, new_text)
+            ::File.write(path, content)
+            "Edited #{path}"
+          when 'list_directory'
+            path = ::File.expand_path(kwargs[:path] || kwargs[:dir] || Dir.pwd)
+            Dir.entries(path).reject { |e| e.start_with?('.') }.sort.join("\n")
+          when 'grep'
+            pattern = kwargs[:pattern] || kwargs[:query] || kwargs.values.first.to_s
+            path = kwargs[:path] || Dir.pwd
+            output, = ::Open3.capture2e('grep', '-rn', '--include=*.rb', pattern, path)
+            output.lines.first(50).join
+          when 'glob'
+            pattern = kwargs[:pattern] || kwargs.values.first.to_s
+            Dir.glob(pattern).first(100).join("\n")
+          when 'web_fetch'
+            url = kwargs[:url] || kwargs.values.first.to_s
+            require 'net/http'
+            uri = URI(url)
+            Net::HTTP.get(uri)
+          else
+            "Tool #{ref} is not executable server-side. Use a legion_ prefixed tool instead."
+          end
+        end
+        def notify_tool_event(type, ref, **data)
+          handler = Thread.current[:legion_tool_event_handler]
+          return unless handler
+          handler.call(
+            type:         type,
+            tool_call_id: Thread.current[:legion_current_tool_call_id],
+            tool_name:    ref,
+            **data
+          )
+        end
+      end
       def self.registered(app) # rubocop:disable Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity,Metrics/AbcSize,Metrics/MethodLength
         app.helpers do # rubocop:disable Metrics/BlockLength
           include Legion::Logging::Helper
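
`summarize_tool_args` and `log_tool` in the hunk above log which arguments a tool received and how large they were, not the argument values themselves. A reduced sketch of that idea, with hypothetical function names and only the `file_write` case:

```ruby
# Hypothetical stand-alone version: record argument names and payload size,
# never the payload itself.
def summarize_args(tool, kwargs)
  summary = { args: kwargs.keys.map(&:to_s).sort.join(',') }
  summary[:bytes] = (kwargs[:content] || kwargs[:contents]).to_s.bytesize if tool == 'file_write'
  summary
end

# Builds the same "[tool][name] status key=value ..." shape as log_tool,
# but returns the line instead of sending it to a logger.
def format_log_line(tool, status, **details)
  parts = ["[tool][#{tool}] #{status}"]
  details.each { |key, value| parts << "#{key}=#{value}" }
  parts.join(' ')
end

summary = summarize_args('file_write', { path: '/tmp/x.txt', content: 'hello' })
line    = format_log_line('file_write', 'executing', **summary)
```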
@@ -31,7 +118,7 @@ module Legion
               begin
                 parsed = Legion::JSON.load(raw)
               rescue StandardError => e
-                handle_exception(e, level: :debug, operation: 'llm.routes.parse_request_body')
+                handle_exception(e, level: :warn, operation: 'llm.routes.parse_request_body')
                 halt 400, { 'Content-Type' => 'application/json' },
                      Legion::JSON.dump({ error: { code: 'invalid_json', message: 'request body is not valid JSON' } })
               end
@@ -140,55 +227,31 @@ module Legion
             end
           end
-          # rubocop:disable Metrics/BlockLength
           define_method(:build_client_tool_class) do |tname, tdesc, tschema|
+            tool_ref = tname
             klass = Class.new(RubyLLM::Tool) do
+              include Legion::LLM::Routes::ClientToolMethods
               description tdesc
-              define_method(:name) { tname }
-              tool_ref = tname
+              define_method(:name) { tool_ref }
               define_method(:execute) do |**kwargs|
-                case tool_ref
-                when 'sh'
-                  cmd = kwargs[:command] || kwargs[:cmd] || kwargs.values.first.to_s
-                  output, status = ::Open3.capture2e(cmd, chdir: Dir.pwd)
-                  "exit=#{status.exitstatus}\n#{output}"
-                when 'file_read'
-                  path = kwargs[:path] || kwargs[:file_path] || kwargs.values.first.to_s
-                  ::File.exist?(path) ? ::File.read(path, encoding: 'utf-8') : "File not found: #{path}"
-                when 'file_write'
-                  path = kwargs[:path] || kwargs[:file_path]
-                  content = kwargs[:content] || kwargs[:contents]
-                  ::File.write(path, content)
-                  "Written #{content.to_s.bytesize} bytes to #{path}"
-                when 'file_edit'
-                  path = kwargs[:path] || kwargs[:file_path]
-                  old_text = kwargs[:old_text] || kwargs[:search]
-                  new_text = kwargs[:new_text] || kwargs[:replace]
-                  content = ::File.read(path, encoding: 'utf-8')
-                  content.sub!(old_text, new_text)
-                  ::File.write(path, content)
-                  "Edited #{path}"
-                when 'list_directory'
-                  path = kwargs[:path] || kwargs[:dir] || Dir.pwd
-                  Dir.entries(path).reject { |e| e.start_with?('.') }.sort.join("\n")
-                when 'grep'
-                  pattern = kwargs[:pattern] || kwargs[:query] || kwargs.values.first.to_s
-                  path = kwargs[:path] || Dir.pwd
-                  output, = ::Open3.capture2e('grep', '-rn', '--include=*.rb', pattern, path)
-                  output.lines.first(50).join
-                when 'glob'
-                  pattern = kwargs[:pattern] || kwargs.values.first.to_s
-                  Dir.glob(pattern).first(100).join("\n")
-                when 'web_fetch'
-                  url = kwargs[:url] || kwargs.values.first.to_s
-                  require 'net/http'
-                  uri = URI(url)
-                  Net::HTTP.get(uri)
-                else
-                  "Tool #{tool_ref} is not executable server-side. Use a legion_ prefixed tool instead."
-                end
+                summary = summarize_tool_args(tool_ref, kwargs)
+                log_tool(:info, tool_ref, 'executing', **summary)
+                t0 = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+                result = dispatch_client_tool(tool_ref, **kwargs)
+                ms = ((::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - t0) * 1000).round(1)
+                log_tool(:info, tool_ref, 'completed', duration_ms: ms, result_size: result.to_s.bytesize)
+                notify_tool_event(:tool_result, tool_ref, result: result.to_s[0, 4096])
+                result
               rescue StandardError => e
+                ms = begin
+                  ((::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - t0) * 1000).round(1)
+                rescue StandardError
+                  nil
+                end
+                log_tool(:error, tool_ref, 'failed', duration_ms: ms, error: e.message)
+                notify_tool_event(:tool_error, tool_ref, error: e.message)
                 if defined?(Legion::Logging) && Legion::Logging.respond_to?(:log_exception)
                   Legion::Logging.log_exception(e, payload_summary: "client tool #{tool_ref} failed", component_type: :api)
                 end
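
The rewritten `execute` measures duration with `Process::CLOCK_MONOTONIC`, which is not affected by wall-clock adjustments. The diff also wraps the duration calculation inside `rescue` in its own `begin`/`rescue`, presumably because `t0` is still `nil` if the failure happened before the clock was read. A minimal timing sketch with hypothetical helper names:

```ruby
# Hypothetical helpers: time a block in milliseconds using the monotonic clock.
def elapsed_ms(started)
  ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round(1)
end

def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, elapsed_ms(started)]
end

result, duration_ms = timed do
  sleep 0.01 # stand-in for tool work
  :done
end
```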
@@ -201,7 +264,6 @@ module Legion
             handle_exception(e, level: :warn, operation: "llm.routes.build_client_tool_class.#{tname}")
             nil
           end
-          # rubocop:enable Metrics/BlockLength
           define_method(:extract_tool_calls) do |pipeline_response|
             tools_data = pipeline_response.tools
@@ -217,10 +279,12 @@ module Legion
           end
           define_method(:emit_sse_event) do |stream, event_name, payload|
+            level = event_name == 'text-delta' ? :debug : :info
+            log.send(level, "[sse][emit] event=#{event_name} keys=#{payload.is_a?(Hash) ? payload.keys.join(',') : 'n/a'}")
             stream << "event: #{event_name}\ndata: #{Legion::JSON.dump(payload)}\n\n"
           end
-          define_method(:emit_timeline_tool_events) do |stream, pipeline_response|
+          define_method(:emit_timeline_tool_events) do |stream, pipeline_response, skip_tool_results: false|
             timeline = Array(pipeline_response.timeline)
             timeline.each do |event|
               key = event[:key].to_s
@@ -230,6 +294,9 @@ module Legion
               next if name.to_s.empty?
               if key.start_with?('tool:result:')
+                # Skip replay when real-time tool events already emitted these during streaming
+                next if skip_tool_results
                 event_name = data[:status].to_s == 'error' ? 'tool-error' : 'tool-result'
                 emit_sse_event(stream, event_name, {
                                  toolCallId: data[:tool_call_id],
@@ -520,6 +587,35 @@ module Legion
             # rubocop:disable Metrics/BlockLength
             stream do |out|
               full_text = +''
+              executor.tool_event_handler = lambda { |event|
+                log.info("[inference][tool-event] type=#{event[:type]} tool=#{event[:tool_name]} id=#{event[:tool_call_id]}")
+                case event[:type]
+                when :tool_call
+                  emit_sse_event(out, 'tool-call', {
+                                   toolCallId: event[:tool_call_id],
+                                   toolName:   event[:tool_name],
+                                   args:       event[:arguments],
+                                   timestamp:  Time.now.utc.iso8601
+                                 })
+                when :tool_result
+                  emit_sse_event(out, 'tool-result', {
+                                   toolCallId: event[:tool_call_id],
+                                   toolName:   event[:tool_name],
+                                   result:     event[:result],
+                                   timestamp:  Time.now.utc.iso8601
+                                 })
+                when :tool_error
+                  emit_sse_event(out, 'tool-error', {
+                                   toolCallId: event[:tool_call_id],
+                                   toolName:   event[:tool_name],
+                                   result:     event[:error],
+                                   status:     'error',
+                                   timestamp:  Time.now.utc.iso8601
+                                 })
+                end
+              }
               pipeline_response = executor.call_stream do |chunk|
                 text = chunk.respond_to?(:content) ? chunk.content.to_s : chunk.to_s
                 next if text.empty?
@@ -528,16 +624,7 @@ module Legion
                 emit_sse_event(out, 'text-delta', { delta: text })
               end
-              extract_tool_calls(pipeline_response).each do |tool_call|
-                emit_sse_event(out, 'tool-call', {
-                                 toolCallId: tool_call[:id],
-                                 toolName:   tool_call[:name],
-                                 args:       tool_call[:arguments],
-                                 timestamp:  Time.now.utc.iso8601
-                               })
-              end
-              emit_timeline_tool_events(out, pipeline_response)
+              emit_timeline_tool_events(out, pipeline_response, skip_tool_results: !executor.tool_event_handler.nil?)
               enrichments = pipeline_response.enrichments
               emit_sse_event(out, 'enrichment', enrichments) if enrichments.is_a?(Hash) && !enrichments.empty?
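
Each call to `emit_sse_event` writes one Server-Sent Events frame: an `event:` line, a `data:` line, and a blank line as terminator. The sketch below reproduces that framing with the standard-library `JSON` module standing in for `Legion::JSON`, whose behaviour is not shown in this diff; `sse_frame` is a hypothetical name.

```ruby
require 'json'

# Stand-in for emit_sse_event: returns the frame instead of writing to a stream.
def sse_frame(event_name, payload)
  "event: #{event_name}\ndata: #{JSON.generate(payload)}\n\n"
end

out = +''
out << sse_frame('tool-call', { toolCallId: 'tc_1', toolName: 'sh' })
out << sse_frame('text-delta', { delta: 'hi' })

# A client splits the stream on blank lines to recover individual frames.
frames = out.split("\n\n")
```

`JSON.generate` emits a single line, which keeps the payload within one `data:` field.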

data/lib/legion/llm/version.rb CHANGED

@@ -2,6 +2,6 @@
 module Legion
   module LLM
-    VERSION = '0.6.20'
+    VERSION = '0.6.23'
   end
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: legion-llm
 version: !ruby/object:Gem::Version
-  version: 0.6.20
+  version: 0.6.23
 platform: ruby
 authors:
 - Esity