legion-llm 0.6.20 → 0.6.23

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 45d07a2c60a8663ba1b62165b3b489d49a2aac37ee1e1ec6abff7bd5f4357d6c
- data.tar.gz: 9ee8246c75fee6d7e690b55f4e2a91b030f6b142c91dd79acb7bf66edf4d9d05
+ metadata.gz: bd422bcc5c5b6da0dbd4906df8ac394e5c712e709eb8cb367cc676fbf6e45f97
+ data.tar.gz: 148e5741014313918781e757c87a50e40b2d5e5ef164631b71959f6027c70316
  SHA512:
- metadata.gz: 92b102167bb6f346fab490787baedda2f2fa6fb528713c6b055b269f747c490d56fed5d21027616bc2c6d1f7cf4069ce14bb9b7607f7a6ad07c2c69b05ce0814
- data.tar.gz: f1fded39722bf678936df28f3bbf3ec095265bdabc28f70eaf67e64fae5519b7c58842a8432e4b4bfdc0476c9275473f75a9c83bf0c77d6f5cc2afe1fa700aeb
+ metadata.gz: dc80d32daf35e53bfe514a0e318911c97e9e3971374eb711128c68db6a02084cc9fd259f68ccf6e0242fb10ff1cbccf2a1ca9132b37e2aa1e6ee23cd5cbe0b5d
+ data.tar.gz: 5673f3536126bc1d3e17e69ab2892edb1b8bd9524bdc6c38993e85ac0869e48a42bd82f9d8691923527f701940a42969052bbbf9db40976434f1ad00210f3934
data/CHANGELOG.md CHANGED
@@ -1,5 +1,40 @@
  # Legion LLM Changelog

+ ## [0.6.23] - 2026-04-07
+
+ ### Fixed
+ - `build_response_routing` now always sets `routing[:escalated]` (defaults to `false`) instead of conditionally omitting the key
+ - Schema spec annotations updated: Thinking, Cache, Config(Generation) corrected to reflect `from_chat_args` first-class field mapping; ErrorResponse annotation updated with complete error hierarchy including `EscalationExhausted`, `PrivacyModeError`, `TokenBudgetExceeded`, `DaemonDeniedError`, `DaemonRateLimitedError`
+
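With `routing[:escalated]` always set, consumers can read the key without guarding against its absence. The sketch below illustrates that contract only; `build_routing` and its arguments are hypothetical stand-ins, not the gem's `build_response_routing`, and deriving the flag from a non-empty chain is an assumption.

```ruby
# Hypothetical stand-in for the routing builder; names and arguments are
# illustrative assumptions, not the gem's API.
def build_routing(provider:, model:, escalation_chain: [])
  {
    provider: provider,
    model: model,
    # Always present: false when no escalation happened.
    escalated: !escalation_chain.empty?,
    escalation_chain: escalation_chain
  }
end

plain = build_routing(provider: :anthropic, model: 'example-model')
escalated = build_routing(provider: :anthropic, model: 'example-model',
                          escalation_chain: %w[tier1 tier2])

# Hash#fetch would raise KeyError if the key were conditionally omitted.
puts plain.fetch(:escalated)
puts escalated.fetch(:escalated)
```

Using `fetch` rather than `[]` in the caller makes a missing key fail loudly instead of silently reading as `nil`.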
+ ## [0.6.22] - 2026-04-07
+
+ ### Fixed
+ - Classification LEVELS ordering: swapped `[:public, :internal, :restricted, :confidential]` to correct `[:public, :internal, :confidential, :restricted]` so severity comparisons work properly
+ - `Response.from_ruby_llm` now extracts actual `stop_reason` from provider response instead of hardcoding `:end_turn`
+ - `Request.from_chat_args` maps 16 fields (`tool_choice`, `generation`, `thinking`, `response_format`, `context_strategy`, `cache`, `fork`, `tokens`, `stop`, `modality`, `hooks`, `idempotency_key`, `ttl`, `metadata`, `enrichments`, `predictions`) to first-class struct members instead of dumping into `extra`
+ - `build_response` populates routing details (strategy, tier, escalation chain, latency), cost estimation via `CostEstimator`, and actual stop reason instead of hardcoded defaults
+ - `response_tool_calls` merges execution data (exchange_id, source, status, duration_ms, result) from timeline events into tool call hashes
+ - `step_conversation_uuid` now auto-generates `conv_<hex>` when no conversation_id is provided (was a no-op)
+ - `step_response_normalization` now normalizes all enrichment keys to string format (was a no-op)
+ - Enrichment key `[:conversation_history]` corrected to `['context:conversation_history']` for consistent `source:type` pattern
+
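The LEVELS fix matters whenever severity is compared by position in the list. The sketch below assumes an index-based comparison; `rank` and `at_least?` are hypothetical helpers, not the gem's API.

```ruby
# The corrected ordering from the changelog entry, least to most severe.
LEVELS = %i[public internal confidential restricted].freeze

# Hypothetical helper: position of a level in LEVELS.
def rank(level)
  LEVELS.index(level) || raise(ArgumentError, "unknown level: #{level.inspect}")
end

# Hypothetical helper: is `level` at least as severe as `threshold`?
def at_least?(level, threshold)
  rank(level) >= rank(threshold)
end

puts at_least?(:restricted, :confidential) # true with the corrected order
puts at_least?(:internal, :confidential)   # false
```

Under the previous ordering, `:restricted` sat below `:confidential`, so the first comparison would have returned `false`.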
+ ### Changed
+ - Schema spec (`docs/llm-schema-spec.md`) updated: ToolCall, Config(Generation), Cost, Routing(response), Stop status changed from Partial/Not-implemented to Implemented
+
+ ## [0.6.21] - 2026-04-07
+
+ ### Added
+ - Real-time tool call SSE streaming: tool-call, tool-result, and tool-error events emitted during execution, not after completion
+ - `ClientToolMethods` module extracted from inline tool class for cleaner separation
+ - Rich tool execution logging: command, path, pattern, url shown per tool type instead of just key names
+ - `summarize_tool_args` produces structured log details per tool type (sh, file_read, file_write, file_edit, grep, glob, web_fetch, list_directory)
+ - `tool_event_handler` callback on `Pipeline::Executor` for real-time tool event forwarding via `Thread.current`
+
+ ### Fixed
+ - `install_tool_loop_guard` now uses `session.on_tool_call` instead of `session.on(:tool_call)` — RubyLLM callback was never firing, tool_call_id was always nil
+ - `list_directory` tool now expands `~` via `File.expand_path` — previously failed with `ENOENT` on tilde paths
+ - SSE text-delta events logged at debug level instead of info to reduce log noise
+
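The `list_directory` fix relies on the fact that Ruby's directory APIs do not expand a leading `~` themselves. A minimal sketch of the pattern follows; the helper shown is hypothetical and is not the gem's tool class.

```ruby
# Hypothetical directory-listing helper, not the gem's tool class.
def list_directory(path)
  # File.expand_path resolves a leading "~" to the current user's home
  # directory; Dir.children performs no tilde expansion on its own.
  expanded = File.expand_path(path)
  Dir.children(expanded).sort
end

puts File.expand_path('~/projects') == File.join(Dir.home, 'projects')
puts list_directory('.').is_a?(Array)
```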
  ## [0.6.20] - 2026-04-06

  ### Added
data/README.md CHANGED
@@ -2,7 +2,7 @@

  LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension.

- **Version**: 0.6.14
+ **Version**: 0.6.23

  ## Installation

@@ -1,6 +1,75 @@
  # Legion::LLM Schema Specification

- ## Status: Draft / Brainstorming
+ ## Status: Mixed Envelope Implemented, Inner Types Aspirational
+
+ **Implemented in**: `Pipeline::Request` and `Pipeline::Response` (`lib/legion/llm/pipeline/request.rb`, `response.rb`)
+ **Version**: 1.0.0 (schema_version field on all payloads)
+ **Last verified**: 2026-04-07
+
+ The outer envelope is implemented: all 32 `Request` fields and 34 `Response` fields exist as `Data.define` members. However, many inner types (Message, ContentBlock, ToolCall, Chunk, Conversation, Feedback, ErrorResponse) are **not yet implemented as dedicated structs** — they are plain hashes or strings in the current code. Several Response fields are **always nil or empty** in the pipeline today.
+
+ This document serves as both the **canonical reference** for what is implemented and the **target specification** for what inner types should look like. Sections are annotated with implementation status.
+
+ For the AMQP wire protocol (exchange topology, queue configuration, message envelope, routing keys), see the Legion Wire Protocol spec in the LegionIO docs repo.
+
+ ### Implementation Status Matrix
+
+ | Section | Status | Notes |
+ |---------|--------|-------|
+ | **Request (envelope)** | Implemented | All 32 fields exist on `Data.define`. `from_chat_args` maps all to first-class fields. |
+ | **Response (envelope)** | Partial | All 34 fields exist. 10 fields always nil/empty (see below). |
+ | **Message** | Not implemented | Plain `{ role:, content: }` hashes. No struct, no id/seq/status/version. |
+ | **ContentBlock** | Not implemented | Content is always String. Only `:text` block used (system prompt caching). |
+ | **Tool** | Partial | `ToolAdapter` has name/description/parameters. No `source` on object, no `version`. |
+ | **ToolCall** | Partial | `id`, `name`, `arguments` + `exchange_id`, `source`, `status`, `duration_ms`, `result` merged from Timeline. `error` field never populated. Timeline lookup by tool name, not call ID (breaks duplicate tool calls). |
+ | **ToolChoice** | Stub | Field exists on Request, defaults to `{ mode: :auto }`, never forwarded to provider. |
+ | **Enrichment** | Implemented | RAG/GAIA enrichments work. Value shapes vary between steps. |
+ | **Prediction** | Partial | Request-side works. Response-side actuals never filled in. |
+ | **Tracing** | Implemented | trace_id, span_id, exchange_id all generated and propagated. |
+ | **Classification** | Partial | Labels applied but routing restrictions not enforced. |
+ | **Caller** | Implemented | Identity propagated, Profile derived. |
+ | **Agent** | Not implemented | Response `agent` field always nil. |
+ | **Billing** | Partial | Per-request cap only. No cumulative budget enforcement. |
+ | **Test** | Implemented | Test mode flags propagated. |
+ | **Modality** | Not implemented | Field exists, not acted upon. |
+ | **Hooks** | Partial | Pre/post hooks on Request. Response hooks not fired. |
+ | **Feedback** | Not implemented | No struct, class, or storage. Spec only. |
+ | **Audit** | Implemented | Uses symbol keys (not string keys as spec claims). |
+ | **Timeline** | Implemented | Event recording works. Participant tracking works. |
+ | **Participants** | Implemented | Tracked via Timeline. |
+ | **Wire Capture** | Not implemented | Response `wire` field always nil. |
+ | **Retry** | Not implemented | Response `retry` field always nil. |
+ | **Safety** | Not implemented | Response `safety` field always nil. |
+ | **Rate Limit** | Not implemented | Response `rate_limit` field always nil. |
+ | **Thinking** | Partial | Request thinking config mapped to first-class field. Response thinking **never populated** by executor (always nil). |
+ | **Context Window** | Not implemented | `tokens.context_window`, `utilization`, `headroom` never populated. |
+ | **Validation** | Not implemented | Response `validation` field always nil. |
+ | **Provider Features** | Not implemented | Response `features` field always nil. |
+ | **Model Deprecation** | Not implemented | Response `deprecation` field always nil. |
+ | **Cache** | Partial | Request cache mapped to first-class field. Response `cache` always `{}`. |
+ | **Chunk (Streaming)** | Not implemented | Raw RubyLLM chunks passed through; no spec-compliant Chunk struct. |
+ | **ErrorResponse** | Not implemented | No struct; only exception classes (`LLMError` hierarchy). |
+ | **Conversation** | Partial | `ConversationStore` exists but no `Conversation` struct. Limited fields. |
+ | **Config (Generation)** | Implemented | `from_chat_args` now maps generation, thinking, response_format, etc. to first-class fields. |
+ | **Quality** | Implemented | Returns `{ score:, band:, source: }` (not `{ score:, acceptable:, checker: }` as spec says). |
+ | **Cost** | Implemented | Populated via `CostEstimator.estimate` with `estimated_usd`, `provider`, `model`. |
+ | **Routing (response)** | Implemented | `provider`, `model`, `strategy`, `tier`, `escalated`, `escalation_chain`, `latency_ms` populated. |
+ | **Stop** | Implemented | `stop.reason` extracted from provider response (`:end_turn`, `:tool_use`, etc.). |
+ | **Metering** | Not implemented | Module exists but not wired into pipeline steps. |
+
+ #### Response Fields Always Nil/Empty
+
+ These Response fields exist on the `Data.define` but are **never populated** by the executor today:
+
+ - `agent` — always nil
+ - `cache` — always `{}`
+ - `safety` — always nil
+ - `rate_limit` — always nil
+ - `features` — always nil
+ - `deprecation` — always nil
+ - `validation` — always nil
+ - `wire` — always nil
+ - `retry` — always nil

  ## Design Principles

@@ -27,6 +96,8 @@ schema_version: "1.0.0" # semver -- major.minor.patch

  ## Message

+ > **Implementation status: NOT IMPLEMENTED** — No `Message` struct exists. Messages are plain hashes with only `role` and `content` in the pipeline. `ConversationStore` persists additional fields (`id`, `seq`, `parent_id`, `agent_id`, `created_at`) in its DB rows, but these are not surfaced as a structured Message object.
+
  The atomic unit of conversation. Every exchange between user, assistant, and tools is a Message.

  ```
@@ -78,6 +149,8 @@ message.text # returns text content regardless of String vs Array<ContentBlock>

  ## Content Blocks

+ > **Implementation status: NOT IMPLEMENTED** — No `ContentBlock` struct exists. Content is always a plain String in the pipeline. The only place a typed block hash is constructed is for system prompt caching (`{ type: :text, content: ..., cache_control: ... }`). No image, audio, video, document, tool_use, tool_result, citation, or error block handling exists.
+
  Multimodal content. When `Message.content` is an array, each element is a ContentBlock.

  ### Block Types
@@ -199,6 +272,8 @@ data: Hash? # structured error data

  ## Tool

+ > **Implementation status: PARTIAL** — `ToolAdapter` wraps `RubyLLM::Tool` with `name`, `description`, `parameters`. The `source` field exists as a parallel lookup in `find_tool_source` (not on the tool object). `version` does not exist.
+
  Tool definitions available to the LLM.

  ```
@@ -227,6 +302,8 @@ Used by RBAC (can this caller use tools from this source?) and audit (which syst

  ## ToolCall

+ > **Implementation status: PARTIAL** — Tool calls are hashes with `id`, `name`, `arguments` and optionally `exchange_id`, `source`, `status`, `duration_ms`, `result` merged from matching Timeline events. The `error` field is never populated. Timeline lookup uses tool name (not call ID), so duplicate invocations of the same tool in one response will only have execution data for the last invocation.
+
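One way the duplicate-invocation problem in this annotation can arise is sketched below. The event layout (`tool_call_id`, `tool_name`) is an assumption made for illustration and is not taken from the gem's `Timeline`; only the tool call field names follow the annotation.

```ruby
# Hypothetical data shapes; the timeline event layout is an assumption.
tool_calls = [
  { id: 'call_1', name: 'grep', arguments: { pattern: 'foo' } },
  { id: 'call_2', name: 'grep', arguments: { pattern: 'bar' } }
]
timeline_events = [
  { tool_call_id: 'call_1', tool_name: 'grep', status: :success, duration_ms: 12 },
  { tool_call_id: 'call_2', tool_name: 'grep', status: :error, duration_ms: 40 }
]

# Keyed by name: the second event overwrites the first.
by_name = timeline_events.to_h { |e| [e[:tool_name], e] }
# Keyed by call ID: every invocation keeps its own event.
by_id = timeline_events.to_h { |e| [e[:tool_call_id], e] }

merged_by_name = tool_calls.map do |call|
  call.merge(by_name.fetch(call[:name], {}).slice(:status, :duration_ms))
end
merged_by_id = tool_calls.map do |call|
  call.merge(by_id.fetch(call[:id], {}).slice(:status, :duration_ms))
end

puts merged_by_name.map { |c| c[:status] }.inspect # [:error, :error]
puts merged_by_id.map { |c| c[:status] }.inspect   # [:success, :error]
```

In this sketch, keying by name gives both calls the data of the last `grep` event, while keying by call ID keeps each invocation's own status and duration.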
  A tool invocation made by the assistant, with execution results.

  ```
@@ -251,6 +328,8 @@ Always a parsed Hash, never a JSON string. Provider adapters that receive argume

  ## ToolChoice

+ > **Implementation status: STUB** — Field exists on Request, defaults to `{ mode: :auto }`. The `:specific` mode's `name` field is not handled. The `tool_choice` value is never forwarded to the underlying RubyLLM provider call.
+
  Controls how the LLM uses available tools.

  ```
@@ -263,6 +342,8 @@ ToolChoice

  ## Enrichment

+ > **Implementation status: IMPLEMENTED** — RAG and GAIA enrichments work. Note: value shapes are inconsistent across pipeline steps — not all enrichments include `content:`, `data:`, `duration_ms:`, `timestamp:` as spec describes.
+
  Things that *shaped* the request during processing. Any system can contribute enrichments without schema changes. Enrichments modify or observe the request -- for decisions and outcomes, see [Audit](#audit).

  Enrichments are a **Hash keyed by `"source:type"`**, not an array. This enables direct lookup and clean request-vs-response comparison without looping.
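The keyed-hash design only gives direct lookup if every key has the same type, which is what the string-key normalization in 0.6.22 addresses. A minimal sketch follows, assuming a plain `transform_keys` pass; `normalize_enrichment_keys` and the `rag:context_retrieval` key are hypothetical and may differ from what `step_response_normalization` does.

```ruby
# Hypothetical normalization helper; the gem's step may differ in detail.
def normalize_enrichment_keys(enrichments)
  enrichments.transform_keys(&:to_s)
end

# Mixed key types, as can happen when different steps write enrichments.
enrichments = {
  'context:conversation_history' => [{ role: :user, content: 'hi' }],
  :'rag:context_retrieval' => { data: { chunks: 3 } } # illustrative key
}

normalized = normalize_enrichment_keys(enrichments)

# Direct lookup by "source:type", no iteration needed.
puts normalized.key?('rag:context_retrieval')
puts normalized['context:conversation_history'].size
```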
@@ -319,6 +400,8 @@ Adding a new system requires zero schema changes -- just add a new key.

  ## Prediction

+ > **Implementation status: PARTIAL** — Request-side predictions work (components can contribute predictions). Response-side actuals (`actual_value`, `accurate`) are never filled in — no post-execution comparison occurs.
+
  Hypothesis recorded before execution, compared to reality after execution. Enables self-improving systems. Any component in the pipeline can contribute predictions.

  Predictions are a **Hash keyed by `"source:type"`**, same pattern as enrichments. Direct lookup, no looping.
@@ -395,6 +478,8 @@ response.predictions.count { |_, v| v[:correct] }.to_f / response.predictions.si

  ## Tracing & Correlation

+ > **Implementation status: IMPLEMENTED** — `trace_id`, `span_id`, `exchange_id` all generated and propagated via `Pipeline::Tracing`.
+
  OpenTelemetry-compatible distributed tracing. Groups related requests across agentic loops, forks, and multi-step tasks.

  ```
@@ -424,6 +509,8 @@ Tracing is present on Request, Response, ErrorResponse, and Chunk.

  ## Exchange (Per-Hop Tracking)

+ > **Implementation status: IMPLEMENTED** — `conversation_id`, `request_id` (mapped to `id`), and `exchange_id` all generated via `Pipeline::Tracing` and propagated through the pipeline.
+
  Three-level ID hierarchy inspired by SIP's Call-ID / CSeq / Branch/Via model. Tracks every hop within a single request.

  ```
@@ -487,6 +574,8 @@ In practice, each exchange would become a child span under the request's span in

  ## Data Classification & Compliance

+ > **Implementation status: PARTIAL** — Classification labels are applied to requests. However, routing restrictions (e.g., preventing PHI-tagged data from going to certain providers) are not enforced.
+
  Data governance for enterprise adoption. Controls where data can be processed, how long it's retained, and what it contains.

  ```
@@ -538,6 +627,8 @@ Provider registry includes each provider's processing jurisdiction. Router match

  ## Caller

+ > **Implementation status: IMPLEMENTED** — Caller identity propagated through the pipeline. `Profile.derive` reads `caller[:requested_by][:type]` to determine step skipping.
+
  Auth-level identity tracking. Who authenticated to make this request, and on whose behalf. Separate from `agent` (which tracks AI entity identity).

  ```
@@ -602,6 +693,8 @@ RBAC checks `caller.requested_by` for permission evaluation. If `requested_for`

  ## Agent Identity

+ > **Implementation status: NOT IMPLEMENTED** — The `agent` field exists on both Request and Response but is always nil. No agent identity is attached during pipeline execution.
+
  Tracks which AI entity is executing the request. Not about auth (that's `caller`) -- about the AI agent doing the work.

  ```
@@ -648,6 +741,8 @@ Multiple LLM requests can share a `task_id`, enabling: "Show me everything that

  ## Billing & Budget

+ > **Implementation status: PARTIAL** — Per-request cost cap works. Cumulative budget tracking (daily/monthly limits) is not implemented. Metering module exists but is not wired into pipeline steps.
+
  Cost tracking, budget enforcement, and rate limiting.

  ```
@@ -690,6 +785,8 @@ Checked in the pipeline before the provider call:

  ## Test & Evaluation Mode

+ > **Implementation status: IMPLEMENTED** — Test mode flags propagated through the pipeline.
+
  Controls for testing, benchmarking, replay, and experimentation.

  ```
@@ -743,6 +840,8 @@ Experiment results are tracked via predictions (expected: better quality with GA

  ## Modality

+ > **Implementation status: NOT IMPLEMENTED** — The `modality` field exists on Request but is not acted upon by the pipeline or provider adapters.
+
  Declares input and output modality expectations. Guides routing (not all providers support all combinations) and future-proofs for multimodal evolution.

  ```
@@ -799,6 +898,8 @@ Provider capabilities:

  ## Lifecycle Hooks

+ > **Implementation status: PARTIAL** — Pre/post hooks on Request are supported. Response-side hook firing is not implemented.
+
  Caller-declared injection points in the pipeline. Named hooks registered by extensions or configuration.

  ```
@@ -839,6 +940,8 @@ Hooks receive the full request/response context and can add enrichments, but can

  ## Feedback

+ > **Implementation status: NOT IMPLEMENTED** — No Feedback struct, class, or storage exists. No code submits, receives, or stores feedback.
+
  User or automated quality feedback on specific messages. Lives on the Conversation, not on individual requests. Closes the learning loop.

  ```
@@ -884,6 +987,8 @@ Quality checkers and GAIA can also submit feedback:

  ## Audit

+ > **Implementation status: IMPLEMENTED** — Audit records are populated by the pipeline. Note: uses symbol keys (`:step`, `:action`), not string keys as some examples in this spec show.
+
  Record of what *happened* during pipeline processing -- decisions, actions, outcomes. Separate from enrichments (which record what *shaped* the request). Response-only.

  Audit is a **Hash keyed by `"step:action"`**, same pattern as enrichments and predictions.
@@ -990,6 +1095,8 @@ response.audit[:"persistence:store"][:data][:method] # => :direct

  ## Pipeline Timeline

+ > **Implementation status: IMPLEMENTED** — `Pipeline::Timeline` records ordered events with participant tracking.
+
  Inspired by [Homer/SIPCAPTURE](https://github.com/sipcapture/homer) call flow diagrams. A unified, globally-sequenced timeline of **everything** that happened during a request. Reconstructs the full call flow across all systems -- enrichments, audit, tool calls, provider calls, connections -- in one ordered record.

  This is the **one place an array is correct**. Timeline is ordered data, not lookup data. You iterate it in sequence to reconstruct the call flow, like Homer's ladder diagram.
@@ -1125,6 +1232,8 @@ The timeline is built during pipeline execution and returned on the response. It

  ## Participants

+ > **Implementation status: IMPLEMENTED** — Tracked via `Pipeline::Timeline`.
+
  All systems that touched this request. Enables Homer-style column headers for call flow visualization. Response-only, populated by the pipeline.

  ```
@@ -1154,6 +1263,8 @@ Auto-populated: every unique `from` and `to` value in the timeline becomes a par

  ## Wire Capture

+ > **Implementation status: NOT IMPLEMENTED** — Response `wire` field is always nil. No capture of raw provider payloads occurs.
+
  Raw request and response payloads as sent to/received from the provider. For debugging translator issues, you need both sides of the wire. Opt-in (can be expensive to store).

  Keyed by `exchange_id` -- one capture per provider call, not per request. A request with retries or tool loops produces multiple wire captures.
@@ -1223,6 +1334,8 @@ This lives on `response.routing.connection` since it's part of the routing outco

  ## Retry

+ > **Implementation status: NOT IMPLEMENTED** — Response `retry` field is always nil. Retry logic exists in the executor (rate limit rescue) but results are not captured in the retry struct.
+
  Distinct from escalation. Retries are the same provider/model attempted again after a transient failure. Escalation is switching to a different provider/model.

  ```
@@ -1269,6 +1382,8 @@ response.retry = {

  ## Content Safety

+ > **Implementation status: NOT IMPLEMENTED** — Response `safety` field is always nil. Provider safety results are not captured.
+
  Provider-reported content filtering results. Different from classification (which is our data governance). This is the provider saying "I evaluated this content against my safety policies."

  Response-only. Not all providers return this.
@@ -1322,6 +1437,8 @@ response.safety = {

  ## Rate Limit State

+ > **Implementation status: NOT IMPLEMENTED** — Response `rate_limit` field is always nil. Provider rate limit headers are not captured (rate limit errors are rescued and retried, but quota state is not stored).
+
  Provider quota state returned in response headers. Structured and always captured (not opt-in like wire). Critical for routing decisions.

  ```
@@ -1361,6 +1478,8 @@ end

  ## Thinking & Reasoning

+ > **Implementation status: PARTIAL** — Request-side thinking configuration is mapped to the first-class `thinking` field by `from_chat_args`. Response-side `thinking` field exists on the Response struct but is **never populated** by the executor — it is always nil.
+
  Controls for extended thinking, chain-of-thought, and reasoning behavior. Separate from generation parameters (temperature, top_p) because reasoning is about *how deeply* the model thinks, not *how randomly* it samples.

  ### Request side
@@ -1395,6 +1514,8 @@ Thinking tokens are tracked separately from regular output tokens because they h

  ## Context Window Utilization

+ > **Implementation status: NOT IMPLEMENTED** — `tokens.context_window`, `tokens.utilization`, and `tokens.headroom` are never populated on the Response. Only `input_tokens` and `output_tokens` are set.
+
  Expands response-side tokens with capacity information. Drives context strategy decisions.

  Added to `response.tokens`:
@@ -1439,6 +1560,8 @@ end

  ## Structured Output Validation

+ > **Implementation status: NOT IMPLEMENTED** — Response `validation` field is always nil. `StructuredOutput` module exists for enforcing schemas but does not populate this struct.
+
  When `response_format.type` is `:json` or `:json_schema`, reports whether the response actually validated.

  Response-only. Added to response alongside quality.
@@ -1479,6 +1602,8 @@ response.validation = {

  ## Provider Features

+ > **Implementation status: NOT IMPLEMENTED** — Response `features` field is always nil.
+
  Post-hoc report of which provider-specific features actually activated on this request. Different from capabilities (what the provider CAN do) -- this is what it DID.

  Response-only. Hash-keyed by feature name.
@@ -1519,6 +1644,8 @@ end

  ## Model Deprecation

+ > **Implementation status: NOT IMPLEMENTED** — Response `deprecation` field is always nil.
+
  Structured deprecation warnings from providers. Separate from the `warnings` array because automated systems need to act on these programmatically.

  Response-only.
@@ -1564,6 +1691,8 @@ end

  ## Cache

+ > **Implementation status: PARTIAL** — Request-side `cache` field is mapped to the first-class field by `from_chat_args` (defaults to `{ strategy: :default, cacheable: true }`). Response-side `cache` field is always `{}`.
+
  Symmetric caching controls on request and response. Replaces a flat strategy symbol with structured metadata.

  ### Request side (what I want)
@@ -1631,6 +1760,8 @@ Response: cache: { hit: true, key: "sha256:abc123", tier: :local, age: 45, expir

  ## Request

+ > **Implementation status: IMPLEMENTED (envelope)** — All 32 fields exist as `Data.define` members with `.build` and `.from_chat_args` constructors. All fields including `generation`, `thinking`, `response_format`, `context_strategy`, `cache`, `fork`, `tokens`, `stop`, `modality`, `hooks`, `idempotency_key`, `ttl`, `metadata`, `enrichments`, and `predictions` are mapped to first-class struct members. Convenience accessors (`.model`, `.provider`) described in the spec are not defined.
+
  What goes into the Legion::LLM pipeline.

  ```
@@ -1795,6 +1926,8 @@ For queue ordering when requests go through RMQ:

  ## Response

+ > **Implementation status: PARTIAL (envelope)** — All 34 fields exist as `Data.define` members. 9 fields are always nil/empty (see status matrix above). `routing` populates `provider`, `model`, `strategy`, `tier`, `escalated`, `escalation_chain`, `latency_ms`. `stop.reason` extracted from provider response (falls back to `:end_turn`). `quality` returns `{ score:, band:, source:, signals: }` from `ConfidenceScorer` (not `{ score:, acceptable:, checker: }` as the Response struct below shows). `cost` populated via `CostEstimator.estimate` with `estimated_usd`, `provider`, `model`. Convenience accessors (`.model`, `.provider`) are not defined.
+
  What comes back from the Legion::LLM pipeline.

  ```
@@ -1843,7 +1976,7 @@ Response

  # Stop (symmetric with request)
  stop: Hash
-   reason: Symbol # :end_turn, :tool_calls, :max_tokens, :safety, :stop_sequence
+   reason: Symbol # :end_turn, :tool_use, :max_tokens, :safety, :stop_sequence
    sequence: String? # which stop sequence was hit (nil if none)

  # Tools (symmetric with request)
@@ -1984,6 +2117,8 @@ response.participants # ["pipeline", "rbac", "provider:claude", ...]

  ## Chunk (Streaming)

+ > **Implementation status: NOT IMPLEMENTED** — No `Chunk` struct exists. Streaming (`call_stream`) yields raw RubyLLM chunk objects directly to callers with no translation to the spec format.
+
  Incremental data during a streamed response.

  ```
@@ -2015,6 +2150,8 @@ Chunk

  ## ErrorResponse

+ > **Implementation status: NOT IMPLEMENTED** — No `ErrorResponse` struct exists. Errors are raised as exceptions from the `Legion::LLM` error hierarchy: `LLMError` (base), `AuthError`, `RateLimitError`, `ContextOverflow`, `ProviderError`, `ProviderDown`, `UnsupportedCapability`, `PipelineError`, `TokenBudgetExceeded`, `EmbeddingUnavailableError`. Additionally, `EscalationExhausted`, `DaemonDeniedError`, `DaemonRateLimitedError`, and `PrivacyModeError` inherit from `StandardError` directly (not `LLMError`). These are Ruby exceptions, not structured response payloads.
+
  Standard error format for failed requests.

  ```
@@ -2059,6 +2196,8 @@ ErrorResponse

  ## Conversation

+ > **Implementation status: PARTIAL** — `ConversationStore` exists as an in-memory LRU (256 slots) with optional DB persistence. No `Conversation` struct — conversations are plain hashes (`{ messages: [], metadata: {}, lru_tick: N }`). DB persistence stores `id`, `caller_identity`, `metadata` (JSON blob), `created_at`, `updated_at`. Most spec fields (`title`, `summary`, `state`, `shared`, `participants`, `tags`, `pinned`, `usage_total`, `routing_history`) exist only as arbitrary metadata blob entries, not first-class fields.
+
  The persistent conversation object stored in the ConversationStore.

  ```
@@ -2119,6 +2258,8 @@ Legion::LLM.chat(

  ## Config (Generation Parameters)

+ > **Implementation status: PARTIAL** — Generation parameters are mapped to the first-class `generation` field by `from_chat_args`. However, provider adapters only forward `model` and `provider` to RubyLLM, not temperature/top_p/etc from the `generation` hash.
+
  Sent in `request.generation`. Provider adapters map supported parameters and ignore unsupported ones.

  ```
@@ -2162,6 +2303,8 @@ response_format:

  ## Provider Adapter Contract

+ > **Implementation status: PARTIAL** — Provider LEXs (extensions-ai/) exist and work for chat/embed. The formal `ProviderAdapter` interface with `Translator` is not enforced — providers integrate via RubyLLM's native provider system.
+
  Every provider LEX must implement `Legion::LLM::ProviderAdapter` including a `Translator`.

  ### Required methods
@@ -17,6 +17,7 @@ module Legion
  attr_reader :request, :profile, :timeline, :tracing, :enrichments,
              :audit, :warnings, :discovered_tools, :confidence_score,
              :escalation_chain
+ attr_accessor :tool_event_handler

  include Steps::ToolDiscovery
  include Steps::ToolCalls
@@ -67,6 +68,7 @@ module Legion
    @escalation_chain = nil
    @escalation_history = []
    @proactive_tier_assignment = nil
+   @tool_event_handler = nil
  end

  def call
@@ -164,7 +166,11 @@ module Legion
  
  def step_idempotency; end
  
- def step_conversation_uuid; end
+ def step_conversation_uuid
+ return if @request.conversation_id
+
+ @request = @request.with(conversation_id: "conv_#{SecureRandom.hex(8)}")
+ end
  
  def step_context_load
  conv_id = @request.conversation_id
@@ -187,7 +193,7 @@ module Legion
  maybe_compact_history(conv_id, history)
  end
  
- @enrichments[:conversation_history] = history
+ @enrichments['context:conversation_history'] = history
  @timeline.record(
  category: :internal, key: 'context:loaded',
  direction: :internal, detail: "loaded #{history.size} prior messages",
@@ -656,7 +662,15 @@ module Legion
  
  session, message_content = build_ruby_llm_session
  install_tool_loop_guard(session)
- @raw_response = message_content ? session.ask(message_content, &) : session
+
+ Thread.current[:legion_tool_event_handler] = @tool_event_handler
+ begin
+ @raw_response = message_content ? session.ask(message_content, &) : session
+ ensure
+ Thread.current[:legion_tool_event_handler] = nil
+ Thread.current[:legion_current_tool_call_id] = nil
+ Thread.current[:legion_current_tool_name] = nil
+ end
  
  @timestamps[:provider_end] = Time.now
  record_provider_response
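
The hunk above publishes the handler through `Thread.current` for the duration of `session.ask` and clears it in `ensure`. A minimal sketch of that set-then-always-clear pattern follows. The wrapper name `with_tool_event_handler` is hypothetical (the gem inlines this logic rather than defining such a method); only the thread-local key names come from the diff.

```ruby
# Hypothetical wrapper for illustration; key names are taken from the diff.
def with_tool_event_handler(handler)
  Thread.current[:legion_tool_event_handler] = handler
  yield
ensure
  # Runs on success and on error, so a stale handler is not left behind
  # for the next request served by the same thread.
  Thread.current[:legion_tool_event_handler] = nil
  Thread.current[:legion_current_tool_call_id] = nil
  Thread.current[:legion_current_tool_name] = nil
end

events = []
with_tool_event_handler(->(event) { events << event }) do
  # Code running inside the block can look the handler up without
  # receiving it as an argument.
  Thread.current[:legion_tool_event_handler]&.call(type: :tool_call, tool_name: 'sh')
end
```

The design trades explicit argument passing for an ambient lookup, which is presumably why the cleanup sits in `ensure`: any code that reads the key later on the same thread would otherwise see a handler from a finished request.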
@@ -690,18 +704,47 @@ module Legion
  end
  
  def install_tool_loop_guard(session)
- return unless session.respond_to?(:on)
+ unless session.respond_to?(:on_tool_call)
+ log.warn('[pipeline] tool loop guard unavailable: ruby_llm session does not respond to on_tool_call')
+ return
+ end
  
  tool_round = 0
- session.on(:tool_call) do |_tool_call|
+ session.on_tool_call do |tool_call|
  tool_round += 1
  if tool_round > MAX_RUBY_LLM_TOOL_ROUNDS
  log.warn("[pipeline] tool loop cap hit: #{tool_round} rounds, halting")
  raise Legion::LLM::PipelineError, "tool loop exceeded #{MAX_RUBY_LLM_TOOL_ROUNDS} rounds"
  end
+
+ emit_tool_call_event(tool_call, tool_round)
  end
  end
  
+ def emit_tool_call_event(tool_call, round)
+ tc_id = tool_call_field(tool_call, :id)
+ tc_name = tool_call_field(tool_call, :name)
+ tc_args = tool_call_field(tool_call, :arguments)
+
+ log.info("[pipeline][tool-call] round=#{round} id=#{tc_id} tool=#{tc_name}")
+
+ Thread.current[:legion_current_tool_call_id] = tc_id
+ Thread.current[:legion_current_tool_name] = tc_name
+
+ @tool_event_handler&.call(
+ type: :tool_call, tool_call_id: tc_id, tool_name: tc_name,
+ arguments: tc_args, round: round
+ )
+ end
+
+ def tool_call_field(tool_call, field)
+ return tool_call.public_send(field) if tool_call.respond_to?(field)
+
+ tool_call[field]
+ rescue StandardError
+ nil
+ end
+
  def apply_ruby_llm_instructions(session)
  injected_system = EnrichmentInjector.inject(
  system: @request.system,
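
The new `tool_call_field` helper in the hunk above reads a field from either an object with accessor methods or a Hash, and swallows any error. The sketch below repeats the method body from the diff and exercises it with stand-in inputs. The `ToolCall` Struct and the sample values are assumptions made for illustration, not types from RubyLLM or the gem.

```ruby
# Method body as shown in the diff: try a reader method first, then
# fall back to Hash-style indexing; any error yields nil.
def tool_call_field(tool_call, field)
  return tool_call.public_send(field) if tool_call.respond_to?(field)

  tool_call[field]
rescue StandardError
  nil
end

# Stand-in type for illustration only.
ToolCall = Struct.new(:id, :name, :arguments)

from_object = tool_call_field(ToolCall.new('call_1', 'grep', { pattern: 'foo' }), :name)
from_hash   = tool_call_field({ id: 'call_2', name: 'sh' }, :name)
from_nil    = tool_call_field(nil, :name) # nil[:name] raises, rescued to nil
```

One consequence worth noting: a Hash with string keys (`{ 'name' => 'sh' }`) yields `nil` here, because the fallback indexes with the symbol only.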
@@ -758,7 +801,7 @@ module Legion
  attrs = Steps::SpanAnnotator.attributes_for(step_name, audit: @audit, enrichments: @enrichments)
  attrs.each { |key, val| span.set_attribute(key, val) unless val.nil? }
  rescue StandardError => e
- handle_exception(e, level: :debug, operation: 'llm.pipeline.annotate_span', step: step_name)
+ handle_exception(e, level: :warn, operation: 'llm.pipeline.annotate_span', step: step_name)
  nil
  end
  
@@ -783,7 +826,7 @@ module Legion
  span.set_attribute('routing.tier', data[:tier].to_s) if data[:tier]
  end
  rescue StandardError => e
- handle_exception(e, level: :debug, operation: 'llm.pipeline.annotate_top_level_span')
+ handle_exception(e, level: :warn, operation: 'llm.pipeline.annotate_top_level_span')
  nil
  end
  
@@ -800,7 +843,14 @@ module Legion
  nil
  end
  
- def step_response_normalization; end
+ def step_response_normalization
+ # Normalize enrichment keys to consistent string "source:type" format
+ normalized = {}
+ @enrichments.each do |key, value|
+ normalized[key.to_s] = value
+ end
+ @enrichments = normalized
+ end
  
  def step_context_store
  conv_id = @request.conversation_id
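
As written in the hunk above, `step_response_normalization` only stringifies keys; it does not add a `source:` prefix to keys that lack one. A small sketch of the same loop as a standalone function makes that visible. The function name `normalize_enrichment_keys` and the sample hashes are hypothetical.

```ruby
# Same logic as the loop in the diff, extracted into a hypothetical helper.
def normalize_enrichment_keys(enrichments)
  enrichments.each_with_object({}) { |(key, value), out| out[key.to_s] = value }
end

normalized = normalize_enrichment_keys(
  conversation_history: [],               # symbol key, no source prefix
  'context:conversation_history' => ['hi'] # already in "source:type" form
)

# When a symbol key and a string key stringify to the same value,
# the entry visited later wins.
collided = normalize_enrichment_keys({ a: 1, 'a' => 2 })
```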
@@ -865,10 +915,11 @@ module Legion
  request_id: @request.id,
  conversation_id: @request.conversation_id || "conv_#{SecureRandom.hex(8)}",
  message: msg,
- routing: { provider: @resolved_provider, model: @resolved_model },
+ routing: build_response_routing,
  tokens: extract_tokens,
- stop: { reason: :end_turn },
+ stop: extract_stop_reason,
  tools: response_tool_calls,
+ cost: estimate_response_cost,
  timestamps: @timestamps,
  enrichments: @enrichments,
  audit: @audit,
@@ -890,17 +941,103 @@ module Legion
  Array(requested).map { |name| name.to_s.tr('.', '_') }.reject(&:empty?)
  end
  
+ def build_response_routing
+ routing = { provider: @resolved_provider, model: @resolved_model }
+
+ routing_audit = @audit[:'routing:provider_selection']
+ if routing_audit.is_a?(Hash) && routing_audit[:data].is_a?(Hash)
+ routing[:strategy] = routing_audit[:data][:strategy]
+ routing[:tier] = routing_audit[:data][:tier]
+ end
+
+ routing[:escalated] = @escalation_history.size > 1
+ routing[:escalation_chain] = @escalation_history if @escalation_history.any?
+
+ if @timestamps[:provider_start] && @timestamps[:provider_end]
+ routing[:latency_ms] = ((@timestamps[:provider_end] - @timestamps[:provider_start]) * 1000).round
+ end
+
+ routing
+ end
+
+ def extract_stop_reason
+ reason = if @raw_response.respond_to?(:stop_reason)
+ @raw_response.stop_reason&.to_sym
+ elsif @raw_response.respond_to?(:tool_calls) && @raw_response.tool_calls&.any?
+ :tool_use
+ end
+ { reason: reason || :end_turn }
+ rescue StandardError
+ { reason: :end_turn }
+ end
+
+ def estimate_response_cost
+ tokens = extract_tokens
+ input = tokens.respond_to?(:input_tokens) ? tokens.input_tokens : tokens[:input].to_i
+ output = tokens.respond_to?(:output_tokens) ? tokens.output_tokens : tokens[:output].to_i
+ return {} unless @resolved_model && (input + output).positive?
+
+ estimated = CostEstimator.estimate(
+ model_id: @resolved_model,
+ input_tokens: input,
+ output_tokens: output
+ )
+ { estimated_usd: estimated, provider: @resolved_provider, model: @resolved_model }
+ rescue StandardError
+ {}
+ end
+
  def response_tool_calls
  return [] unless @raw_response.respond_to?(:tool_calls) && @raw_response.tool_calls
  
+ tool_timeline = build_tool_timeline_index
+
  Array(@raw_response.tool_calls).map do |tool_call|
- {
- id: tool_call[:id] || tool_call['id'],
- name: tool_call[:name] || tool_call['name'],
+ tc_id = tool_call[:id] || tool_call['id']
+ tc_name = tool_call[:name] || tool_call['name']
+
+ entry = {
+ id: tc_id,
+ name: tc_name,
  arguments: tool_call[:arguments] || tool_call['arguments'] || {}
  }
+
+ # Merge execution data from timeline if available
+ timeline_data = tool_timeline[tc_name]
+ if timeline_data
+ entry[:exchange_id] = timeline_data[:exchange_id]
+ entry[:source] = timeline_data[:source]
+ entry[:status] = timeline_data[:status]
+ entry[:duration_ms] = timeline_data[:duration_ms]
+ entry[:result] = timeline_data[:result]
+ end
+
+ entry
  end
  end
+
+ def build_tool_timeline_index
+ index = {}
+ @timeline.events.each do |event|
+ key = event[:key]
+ data = event[:data] || {}
+
+ if key&.start_with?('tool:execute:')
+ tool_name = key.sub('tool:execute:', '')
+ index[tool_name] = {
+ exchange_id: event[:exchange_id],
+ source: data[:source],
+ status: data[:status],
+ duration_ms: event[:duration_ms]
+ }
+ elsif key&.start_with?('tool:result:')
+ tool_name = key.sub('tool:result:', '')
+ index[tool_name][:result] = data[:result] if index[tool_name]
+ end
+ end
+
+ index
+ end
  end
  end
  end
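
`build_tool_timeline_index` in the hunk above folds `tool:execute:<name>` and `tool:result:<name>` timeline events into one Hash keyed by tool name. The sketch below restates that logic as a free function over a plain array so it can be run in isolation. Taking `events` as a parameter is an adaptation (the gem reads `@timeline.events`), and the sample events are invented data.

```ruby
# Standalone adaptation of the method in the diff; events are passed in
# instead of being read from @timeline.
def build_tool_timeline_index(events)
  index = {}
  events.each do |event|
    key  = event[:key].to_s
    data = event[:data] || {}

    if key.start_with?('tool:execute:')
      index[key.delete_prefix('tool:execute:')] = {
        exchange_id: event[:exchange_id],
        source: data[:source],
        status: data[:status],
        duration_ms: event[:duration_ms]
      }
    elsif key.start_with?('tool:result:')
      name = key.delete_prefix('tool:result:')
      # A result is attached only if an execute event was seen first.
      index[name][:result] = data[:result] if index[name]
    end
  end
  index
end

# Invented sample events for illustration.
events = [
  { key: 'tool:execute:grep', exchange_id: 'ex_1', duration_ms: 12,
    data: { source: :client, status: :ok } },
  { key: 'tool:result:grep', data: { result: 'lib/a.rb:3: foo' } },
  { key: 'tool:result:sh', data: { result: 'dropped: no execute event' } }
]
index = build_tool_timeline_index(events)
```

Because the index is keyed by tool name rather than by call id, two calls to the same tool in one response would share (and overwrite) a single entry under this scheme.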
@@ -67,26 +67,45 @@ module Legion
  
  extra = kwargs.except(
  :message, :messages, :model, :provider, :system,
- :tools, :stream, :caller, :classification, :billing,
+ :tools, :tool_choice, :stream, :caller, :classification, :billing,
  :agent, :test, :tracing, :priority, :conversation_id,
- :request_id, :id
+ :request_id, :id, :generation, :thinking, :response_format,
+ :context_strategy, :cache, :fork, :tokens, :stop,
+ :modality, :hooks, :idempotency_key, :ttl, :metadata,
+ :enrichments, :predictions
  )
  
  build_args = {
- messages: messages,
- system: kwargs[:system],
- routing: routing,
- tools: kwargs.fetch(:tools, []),
- stream: kwargs.fetch(:stream, false),
- caller: kwargs[:caller],
- classification: kwargs[:classification],
- billing: kwargs[:billing],
- agent: kwargs[:agent],
- test: kwargs[:test],
- tracing: kwargs[:tracing],
- priority: kwargs.fetch(:priority, :normal),
- conversation_id: kwargs[:conversation_id],
- extra: extra
+ messages: messages,
+ system: kwargs[:system],
+ routing: routing,
+ tools: kwargs.fetch(:tools, []),
+ tool_choice: kwargs[:tool_choice] || { mode: :auto },
+ stream: kwargs.fetch(:stream, false),
+ generation: kwargs[:generation] || {},
+ thinking: kwargs[:thinking],
+ response_format: kwargs[:response_format] || { type: :text },
+ context_strategy: kwargs.fetch(:context_strategy, :auto),
+ cache: kwargs[:cache] || { strategy: :default, cacheable: true },
+ fork: kwargs[:fork],
+ tokens: kwargs[:tokens] || { max: 4096 },
+ stop: kwargs[:stop] || { sequences: [] },
+ modality: kwargs[:modality],
+ hooks: kwargs[:hooks],
+ caller: kwargs[:caller],
+ classification: kwargs[:classification],
+ billing: kwargs[:billing],
+ agent: kwargs[:agent],
+ test: kwargs[:test],
+ tracing: kwargs[:tracing],
+ priority: kwargs.fetch(:priority, :normal),
+ conversation_id: kwargs[:conversation_id],
+ idempotency_key: kwargs[:idempotency_key],
+ ttl: kwargs[:ttl],
+ metadata: kwargs[:metadata] || {},
+ enrichments: kwargs[:enrichments] || {},
+ predictions: kwargs[:predictions] || {},
+ extra: extra
  }
  build_args[:id] = request_id if request_id
  build(**build_args)
@@ -55,13 +55,21 @@ module Legion
  input = msg.respond_to?(:input_tokens) ? msg.input_tokens.to_i : 0
  output = msg.respond_to?(:output_tokens) ? msg.output_tokens.to_i : 0
  
+ stop_reason = if msg.respond_to?(:stop_reason)
+ msg.stop_reason&.to_sym || :end_turn
+ elsif msg.respond_to?(:tool_calls) && msg.tool_calls&.any?
+ :tool_use
+ else
+ :end_turn
+ end
+
  build(
  request_id: request_id,
  conversation_id: conversation_id,
  message: { role: :assistant, content: msg.content },
  routing: { provider: provider, model: model || (msg.respond_to?(:model_id) ? msg.model_id : nil) },
  tokens: { input: input, output: output, total: input + output },
- stop: { reason: :end_turn },
+ stop: { reason: stop_reason },
  **extra
  )
  end
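
The stop-reason selection added in the hunk above can be read as a three-way fallback: an explicit `stop_reason`, then the presence of tool calls, then `:end_turn`. A runnable sketch under stated assumptions: `stop_reason_for`, `WithReason`, and `WithTools` are hypothetical names standing in for the inline expression and for whatever message object the provider returns.

```ruby
# Hypothetical extraction of the inline expression from the diff.
def stop_reason_for(msg)
  if msg.respond_to?(:stop_reason)
    msg.stop_reason&.to_sym || :end_turn
  elsif msg.respond_to?(:tool_calls) && msg.tool_calls&.any?
    :tool_use
  else
    :end_turn
  end
end

# Stand-in message types for illustration only.
WithReason = Struct.new(:stop_reason)
WithTools  = Struct.new(:tool_calls)

explicit  = stop_reason_for(WithReason.new('max_tokens'))
missing   = stop_reason_for(WithReason.new(nil))
tool_use  = stop_reason_for(WithTools.new([{ name: 'sh' }]))
no_tools  = stop_reason_for(WithTools.new(nil))
```

Note the branch order: a message that responds to `stop_reason` but returns `nil` resolves to `:end_turn` even if it also carries tool calls, since the tool-call check is only reached when `stop_reason` is not defined at all.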
@@ -9,7 +9,7 @@ module Legion
  module Classification
  include Legion::Logging::Helper
  
- LEVELS = %i[public internal restricted confidential].freeze
+ LEVELS = %i[public internal confidential restricted].freeze
  
  PII_PATTERNS = {
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
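
The changelog says the `LEVELS` reorder exists so that severity comparisons work. One plausible way such a comparison is done is by array position, sketched below. Only the `LEVELS` array is taken from the diff; the helper names `severity` and `at_least?` are assumptions, and the gem's actual comparison code is not shown in this hunk.

```ruby
# New ordering from the diff: least to most sensitive.
LEVELS = %i[public internal confidential restricted].freeze

# Hypothetical helper: rank a level by its position in LEVELS.
def severity(level)
  LEVELS.index(level.to_sym) or raise ArgumentError, "unknown level: #{level}"
end

# Hypothetical helper: is `level` at least as sensitive as `threshold`?
def at_least?(level, threshold)
  severity(level) >= severity(threshold)
end

restricted_over_confidential = at_least?(:restricted, :confidential)
confidential_over_restricted = at_least?(:confidential, :restricted)
```

With the old ordering (`restricted` before `confidential`), an index-based check like this would have ranked `:confidential` above `:restricted`, inverting the two results.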
@@ -105,7 +105,7 @@ module Legion
  
  { level: level.to_sym }
  rescue StandardError => e
- handle_exception(e, level: :debug, operation: 'llm.pipeline.steps.classification.default')
+ handle_exception(e, level: :warn, operation: 'llm.pipeline.steps.classification.default')
  nil
  end
  end
@@ -15,6 +15,93 @@ require 'legion/logging/helper'
  module Legion
  module LLM
  module Routes
+ # Mixin for dynamically-built client tool classes — keeps build_client_tool_class small.
+ module ClientToolMethods
+ private
+
+ def log_tool(level, ref, status, **details)
+ return unless defined?(Legion::Logging)
+
+ parts = ["[tool][#{ref}] #{status}"]
+ details.each { |k, v| parts << "#{k}=#{v}" }
+ Legion::Logging.send(level, parts.join(' '))
+ end
+
+ def summarize_tool_arg_keys(kwargs)
+ kwargs.keys.map(&:to_s).sort.join(',')
+ end
+
+ def summarize_tool_args(ref, kwargs)
+ case ref
+ when 'sh'
+ { args: summarize_tool_arg_keys(kwargs), command_provided: kwargs.key?(:command) || kwargs.key?(:cmd) || !kwargs.empty? }
+ when 'file_write'
+ content = kwargs[:content] || kwargs[:contents]
+ { args: summarize_tool_arg_keys(kwargs), bytes: content.to_s.bytesize }
+ when 'file_edit'
+ { args: summarize_tool_arg_keys(kwargs),
+ old_len: kwargs[:old_text].to_s.length, new_len: kwargs[:new_text].to_s.length }
+ else
+ { args: summarize_tool_arg_keys(kwargs) }
+ end
+ end
+
+ def dispatch_client_tool(ref, **kwargs)
+ case ref
+ when 'sh'
+ cmd = kwargs[:command] || kwargs[:cmd] || kwargs.values.first.to_s
+ output, status = ::Open3.capture2e(cmd, chdir: Dir.pwd)
+ "exit=#{status.exitstatus}\n#{output}"
+ when 'file_read'
+ path = kwargs[:path] || kwargs[:file_path] || kwargs.values.first.to_s
+ ::File.exist?(path) ? ::File.read(path, encoding: 'utf-8') : "File not found: #{path}"
+ when 'file_write'
+ path = kwargs[:path] || kwargs[:file_path]
+ content = kwargs[:content] || kwargs[:contents]
+ ::File.write(path, content)
+ "Written #{content.to_s.bytesize} bytes to #{path}"
+ when 'file_edit'
+ path = kwargs[:path] || kwargs[:file_path]
+ old_text = kwargs[:old_text] || kwargs[:search]
+ new_text = kwargs[:new_text] || kwargs[:replace]
+ content = ::File.read(path, encoding: 'utf-8')
+ content.sub!(old_text, new_text)
+ ::File.write(path, content)
+ "Edited #{path}"
+ when 'list_directory'
+ path = ::File.expand_path(kwargs[:path] || kwargs[:dir] || Dir.pwd)
+ Dir.entries(path).reject { |e| e.start_with?('.') }.sort.join("\n")
+ when 'grep'
+ pattern = kwargs[:pattern] || kwargs[:query] || kwargs.values.first.to_s
+ path = kwargs[:path] || Dir.pwd
+ output, = ::Open3.capture2e('grep', '-rn', '--include=*.rb', pattern, path)
+ output.lines.first(50).join
+ when 'glob'
+ pattern = kwargs[:pattern] || kwargs.values.first.to_s
+ Dir.glob(pattern).first(100).join("\n")
+ when 'web_fetch'
+ url = kwargs[:url] || kwargs.values.first.to_s
+ require 'net/http'
+ uri = URI(url)
+ Net::HTTP.get(uri)
+ else
+ "Tool #{ref} is not executable server-side. Use a legion_ prefixed tool instead."
+ end
+ end
+
+ def notify_tool_event(type, ref, **data)
+ handler = Thread.current[:legion_tool_event_handler]
+ return unless handler
+
+ handler.call(
+ type: type,
+ tool_call_id: Thread.current[:legion_current_tool_call_id],
+ tool_name: ref,
+ **data
+ )
+ end
+ end
+
  def self.registered(app) # rubocop:disable Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity,Metrics/AbcSize,Metrics/MethodLength
  app.helpers do # rubocop:disable Metrics/BlockLength
  include Legion::Logging::Helper
@@ -31,7 +118,7 @@ module Legion
  begin
  parsed = Legion::JSON.load(raw)
  rescue StandardError => e
- handle_exception(e, level: :debug, operation: 'llm.routes.parse_request_body')
+ handle_exception(e, level: :warn, operation: 'llm.routes.parse_request_body')
  halt 400, { 'Content-Type' => 'application/json' },
  Legion::JSON.dump({ error: { code: 'invalid_json', message: 'request body is not valid JSON' } })
  end
@@ -140,55 +227,31 @@ module Legion
  end
  end
  
- # rubocop:disable Metrics/BlockLength
  define_method(:build_client_tool_class) do |tname, tdesc, tschema|
+ tool_ref = tname
  klass = Class.new(RubyLLM::Tool) do
+ include Legion::LLM::Routes::ClientToolMethods
+
  description tdesc
- define_method(:name) { tname }
- tool_ref = tname
+ define_method(:name) { tool_ref }
  
  define_method(:execute) do |**kwargs|
- case tool_ref
- when 'sh'
- cmd = kwargs[:command] || kwargs[:cmd] || kwargs.values.first.to_s
- output, status = ::Open3.capture2e(cmd, chdir: Dir.pwd)
- "exit=#{status.exitstatus}\n#{output}"
- when 'file_read'
- path = kwargs[:path] || kwargs[:file_path] || kwargs.values.first.to_s
- ::File.exist?(path) ? ::File.read(path, encoding: 'utf-8') : "File not found: #{path}"
- when 'file_write'
- path = kwargs[:path] || kwargs[:file_path]
- content = kwargs[:content] || kwargs[:contents]
- ::File.write(path, content)
- "Written #{content.to_s.bytesize} bytes to #{path}"
- when 'file_edit'
- path = kwargs[:path] || kwargs[:file_path]
- old_text = kwargs[:old_text] || kwargs[:search]
- new_text = kwargs[:new_text] || kwargs[:replace]
- content = ::File.read(path, encoding: 'utf-8')
- content.sub!(old_text, new_text)
- ::File.write(path, content)
- "Edited #{path}"
- when 'list_directory'
- path = kwargs[:path] || kwargs[:dir] || Dir.pwd
- Dir.entries(path).reject { |e| e.start_with?('.') }.sort.join("\n")
- when 'grep'
- pattern = kwargs[:pattern] || kwargs[:query] || kwargs.values.first.to_s
- path = kwargs[:path] || Dir.pwd
- output, = ::Open3.capture2e('grep', '-rn', '--include=*.rb', pattern, path)
- output.lines.first(50).join
- when 'glob'
- pattern = kwargs[:pattern] || kwargs.values.first.to_s
- Dir.glob(pattern).first(100).join("\n")
- when 'web_fetch'
- url = kwargs[:url] || kwargs.values.first.to_s
- require 'net/http'
- uri = URI(url)
- Net::HTTP.get(uri)
- else
- "Tool #{tool_ref} is not executable server-side. Use a legion_ prefixed tool instead."
- end
+ summary = summarize_tool_args(tool_ref, kwargs)
+ log_tool(:info, tool_ref, 'executing', **summary)
+ t0 = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+ result = dispatch_client_tool(tool_ref, **kwargs)
+ ms = ((::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - t0) * 1000).round(1)
+ log_tool(:info, tool_ref, 'completed', duration_ms: ms, result_size: result.to_s.bytesize)
+ notify_tool_event(:tool_result, tool_ref, result: result.to_s[0, 4096])
+ result
  rescue StandardError => e
+ ms = begin
+ ((::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - t0) * 1000).round(1)
+ rescue StandardError
+ nil
+ end
+ log_tool(:error, tool_ref, 'failed', duration_ms: ms, error: e.message)
+ notify_tool_event(:tool_error, tool_ref, error: e.message)
  if defined?(Legion::Logging) && Legion::Logging.respond_to?(:log_exception)
  Legion::Logging.log_exception(e, payload_summary: "client tool #{tool_ref} failed", component_type: :api)
  end
@@ -201,7 +264,6 @@ module Legion
  handle_exception(e, level: :warn, operation: "llm.routes.build_client_tool_class.#{tname}")
  nil
  end
- # rubocop:enable Metrics/BlockLength
  
  define_method(:extract_tool_calls) do |pipeline_response|
  tools_data = pipeline_response.tools
@@ -217,10 +279,12 @@ module Legion
  end
  
  define_method(:emit_sse_event) do |stream, event_name, payload|
+ level = event_name == 'text-delta' ? :debug : :info
+ log.send(level, "[sse][emit] event=#{event_name} keys=#{payload.is_a?(Hash) ? payload.keys.join(',') : 'n/a'}")
  stream << "event: #{event_name}\ndata: #{Legion::JSON.dump(payload)}\n\n"
  end
  
- define_method(:emit_timeline_tool_events) do |stream, pipeline_response|
+ define_method(:emit_timeline_tool_events) do |stream, pipeline_response, skip_tool_results: false|
  timeline = Array(pipeline_response.timeline)
  timeline.each do |event|
  key = event[:key].to_s
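
`emit_sse_event` in the hunk above writes one Server-Sent Events frame per call: an `event:` line, a `data:` line holding the JSON payload, and a blank line as terminator. A minimal sketch of just the framing follows. It assumes Ruby's standard `json` library in place of `Legion::JSON`, and `sse_frame` is a hypothetical name.

```ruby
require 'json'

# Hypothetical helper: build one SSE frame. The blank line ("\n\n")
# marks the end of the event for the client-side parser.
def sse_frame(event_name, payload)
  "event: #{event_name}\ndata: #{JSON.generate(payload)}\n\n"
end

stream = +''
stream << sse_frame('text-delta', { delta: 'hi' })
stream << sse_frame('tool-call', { toolCallId: 'call_1', toolName: 'grep' })
```

Since `JSON.generate` emits a single line by default, the payload stays within one `data:` field, which keeps each frame to exactly two field lines.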
@@ -230,6 +294,9 @@ module Legion
  next if name.to_s.empty?
  
  if key.start_with?('tool:result:')
+ # Skip replay when real-time tool events already emitted these during streaming
+ next if skip_tool_results
+
  event_name = data[:status].to_s == 'error' ? 'tool-error' : 'tool-result'
  emit_sse_event(stream, event_name, {
  toolCallId: data[:tool_call_id],
@@ -520,6 +587,35 @@ module Legion
  # rubocop:disable Metrics/BlockLength
  stream do |out|
  full_text = +''
+
+ executor.tool_event_handler = lambda { |event|
+ log.info("[inference][tool-event] type=#{event[:type]} tool=#{event[:tool_name]} id=#{event[:tool_call_id]}")
+ case event[:type]
+ when :tool_call
+ emit_sse_event(out, 'tool-call', {
+ toolCallId: event[:tool_call_id],
+ toolName: event[:tool_name],
+ args: event[:arguments],
+ timestamp: Time.now.utc.iso8601
+ })
+ when :tool_result
+ emit_sse_event(out, 'tool-result', {
+ toolCallId: event[:tool_call_id],
+ toolName: event[:tool_name],
+ result: event[:result],
+ timestamp: Time.now.utc.iso8601
+ })
+ when :tool_error
+ emit_sse_event(out, 'tool-error', {
+ toolCallId: event[:tool_call_id],
+ toolName: event[:tool_name],
+ result: event[:error],
+ status: 'error',
+ timestamp: Time.now.utc.iso8601
+ })
+ end
+ }
+
  pipeline_response = executor.call_stream do |chunk|
  text = chunk.respond_to?(:content) ? chunk.content.to_s : chunk.to_s
  next if text.empty?
@@ -528,16 +624,7 @@ module Legion
  emit_sse_event(out, 'text-delta', { delta: text })
  end
  
- extract_tool_calls(pipeline_response).each do |tool_call|
- emit_sse_event(out, 'tool-call', {
- toolCallId: tool_call[:id],
- toolName: tool_call[:name],
- args: tool_call[:arguments],
- timestamp: Time.now.utc.iso8601
- })
- end
-
- emit_timeline_tool_events(out, pipeline_response)
+ emit_timeline_tool_events(out, pipeline_response, skip_tool_results: !executor.tool_event_handler.nil?)
  
  enrichments = pipeline_response.enrichments
  emit_sse_event(out, 'enrichment', enrichments) if enrichments.is_a?(Hash) && !enrichments.empty?
@@ -2,6 +2,6 @@
  
  module Legion
  module LLM
- VERSION = '0.6.20'
+ VERSION = '0.6.23'
  end
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: legion-llm
  version: !ruby/object:Gem::Version
- version: 0.6.20
+ version: 0.6.23
  platform: ruby
  authors:
  - Esity