legion-llm 0.12.14 → 0.13.0

Files changed (83)
  1. checksums.yaml +4 -4
  2. data/.gitignore +1 -0
  3. data/.rubocop.yml +63 -0
  4. data/AGENTS.md +48 -57
  5. data/CHANGELOG.md +293 -0
  6. data/CLAUDE.md +104 -762
  7. data/Gemfile +12 -8
  8. data/README.md +97 -4
  9. data/legion-llm.gemspec +1 -1
  10. data/lib/legion/llm/api/client_translators/anthropic_messages.rb +761 -0
  11. data/lib/legion/llm/api/client_translators/openai_chat.rb +623 -0
  12. data/lib/legion/llm/api/client_translators/openai_responses.rb +852 -0
  13. data/lib/legion/llm/api/client_translators/shared_extractors.rb +150 -0
  14. data/lib/legion/llm/api/debug_formats.rb +356 -0
  15. data/lib/legion/llm/api/namespaces/anthropic/messages.rb +66 -408
  16. data/lib/legion/llm/api/namespaces/openai/batches.rb +1 -1
  17. data/lib/legion/llm/api/namespaces/openai/chat/completions.rb +71 -175
  18. data/lib/legion/llm/api/namespaces/openai/responses.rb +90 -456
  19. data/lib/legion/llm/api/native/models.rb +2 -2
  20. data/lib/legion/llm/api/native/tiers.rb +2 -2
  21. data/lib/legion/llm/api/openai/responses.rb +1 -1
  22. data/lib/legion/llm/api/stream_assembler.rb +705 -0
  23. data/lib/legion/llm/api.rb +8 -4
  24. data/lib/legion/llm/cache/response.rb +2 -2
  25. data/lib/legion/llm/cache.rb +9 -7
  26. data/lib/legion/llm/call/dispatch.rb +347 -215
  27. data/lib/legion/llm/call/embeddings.rb +3 -3
  28. data/lib/legion/llm/call/lex_llm_adapter.rb +80 -23
  29. data/lib/legion/llm/call/structured_output.rb +2 -2
  30. data/lib/legion/llm/capabilities.rb +46 -0
  31. data/lib/legion/llm/compat.rb +1 -2
  32. data/lib/legion/llm/content_hash.rb +52 -0
  33. data/lib/legion/llm/context/compressor.rb +1 -1
  34. data/lib/legion/llm/context/curator.rb +1 -1
  35. data/lib/legion/llm/deprecation.rb +34 -0
  36. data/lib/legion/llm/discovery/rule_generator.rb +126 -15
  37. data/lib/legion/llm/discovery/system.rb +1 -9
  38. data/lib/legion/llm/discovery.rb +205 -23
  39. data/lib/legion/llm/errors.rb +37 -0
  40. data/lib/legion/llm/fleet/dispatcher.rb +1 -3
  41. data/lib/legion/llm/fleet/lane.rb +16 -1
  42. data/lib/legion/llm/fleet/token_issuer.rb +2 -1
  43. data/lib/legion/llm/inference/audit_publisher.rb +25 -0
  44. data/lib/legion/llm/inference/context_accounting.rb +111 -0
  45. data/lib/legion/llm/inference/embed_pipeline.rb +187 -0
  46. data/lib/legion/llm/inference/executor/context_window.rb +199 -0
  47. data/lib/legion/llm/inference/executor/escalation.rb +798 -0
  48. data/lib/legion/llm/inference/executor/routing.rb +471 -0
  49. data/lib/legion/llm/inference/executor/tool_injection.rb +396 -0
  50. data/lib/legion/llm/inference/executor.rb +306 -1635
  51. data/lib/legion/llm/inference/native_tool_loop.rb +307 -53
  52. data/lib/legion/llm/inference/request.rb +9 -4
  53. data/lib/legion/llm/inference/route_attempts.rb +41 -4
  54. data/lib/legion/llm/inference/steps/debate.rb +10 -3
  55. data/lib/legion/llm/inference/steps/knowledge_capture.rb +1 -1
  56. data/lib/legion/llm/inference/steps/metering.rb +16 -2
  57. data/lib/legion/llm/inference/steps/post_response.rb +18 -46
  58. data/lib/legion/llm/inference/steps/rag_context.rb +18 -0
  59. data/lib/legion/llm/inference/steps/tier_assigner.rb +4 -4
  60. data/lib/legion/llm/inference/steps/tool_calls.rb +63 -10
  61. data/lib/legion/llm/inference/steps/trigger_match.rb +20 -1
  62. data/lib/legion/llm/inference.rb +104 -15
  63. data/lib/legion/llm/inventory.rb +107 -22
  64. data/lib/legion/llm/metering/tracker.rb +1 -1
  65. data/lib/legion/llm/metering.rb +1 -1
  66. data/lib/legion/llm/quality/checker.rb +5 -1
  67. data/lib/legion/llm/quality/confidence/scorer.rb +7 -1
  68. data/lib/legion/llm/router/availability.rb +178 -0
  69. data/lib/legion/llm/router/candidates.rb +263 -0
  70. data/lib/legion/llm/router/health_tracker.rb +31 -2
  71. data/lib/legion/llm/router/registry_lookup.rb +121 -0
  72. data/lib/legion/llm/router/rule.rb +3 -2
  73. data/lib/legion/llm/router.rb +295 -344
  74. data/lib/legion/llm/scheduling/batch.rb +3 -3
  75. data/lib/legion/llm/scheduling.rb +2 -2
  76. data/lib/legion/llm/settings.rb +78 -25
  77. data/lib/legion/llm/tools/dispatcher.rb +45 -2
  78. data/lib/legion/llm/tools/special.rb +45 -3
  79. data/lib/legion/llm/types/tool_definition.rb +3 -1
  80. data/lib/legion/llm/vector_store/storage.rb +0 -2
  81. data/lib/legion/llm/version.rb +1 -1
  82. data/lib/legion/llm.rb +66 -6
  83. metadata +21 -3
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 7d61b50d6573478325baba59ea7b05a8e7a6bce2c66c453d15eec40b1380b891
-   data.tar.gz: e14038bcac7c816169e31bc2f8a08fb76331e0bc7b18766f029fe217c3b57d2d
+   metadata.gz: eaa596812e2320baa0d8ae31b44225c80e9ad9b54e62531b3cd1c7640ff09fe6
+   data.tar.gz: a98b90ae07c7040fcd8a13adccf1d07037f253cc845cdc4dc55b65f434805ce0
  SHA512:
-   metadata.gz: e1abe73f183b7b6e135db20bb2bc2b875b6bb40234630c5327a906d38892b6ef83f35c2c9f8a8b0ee193451d1609f1e30c1ebc4b0ebf28e844b3a9e8044ffefa
-   data.tar.gz: bf44ad26a524c018b042dda702b076c98ef7068aaf48e638305776532d809036778c86a0e4e385974c2d59bffbdf61da4a20067b89d09cbc85a2e5a6b34f5203
+   metadata.gz: a7611f997b2163792aa4f29d8ca2e3b8c10ec11af08ff8d89fae578ab0b0a138fbd9ad4957a5c42b793c30e959ab4c17290f01972cd0d8de8128e7e79873e28c
+   data.tar.gz: 3b0acc643ffe9c06c6d07df9c5dce9a54d79351878354e30c1cc22fb5c5001debf9ac529ddeb873868ff713f455665c6e5fe992558d70ef376151f9b368b7086
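The digests above cover the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). As a minimal sketch, assuming the archives have already been extracted locally, a recorded value could be checked with Ruby's standard `Digest` library. The helper name `digest_matches?` is hypothetical and not part of RubyGems or legion-llm.

```ruby
require 'digest'

# Hypothetical helper: compare the digest of raw bytes with a hex value
# copied from checksums.yaml. Defaults to SHA256; pass Digest::SHA512 for
# the SHA512 section.
def digest_matches?(bytes, expected_hex, algorithm: Digest::SHA256)
  algorithm.hexdigest(bytes) == expected_hex.strip.downcase
end

# Example call (the file path is an assumption about where the archive was unpacked):
#   digest_matches?(File.binread('data.tar.gz'),
#                   'a98b90ae07c7040fcd8a13adccf1d07037f253cc845cdc4dc55b65f434805ce0')
```

Comparing against both the SHA256 and SHA512 entries gives two independent checks on the same bytes.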
data/.gitignore CHANGED
@@ -24,3 +24,4 @@ docs/
  bin/apollo-setup-postreboot.sh
  bin/apollo-setup-prereboot.sh
  legionio-bootstrap-uhg-v3.json
+ docs/
data/.rubocop.yml CHANGED
@@ -1,8 +1,71 @@
+ plugins:
+   - rubocop-legion
+
+ # These rubocop-legion cops surface in the local-path 0.1.7 build but were not
+ # in the published 0.1.7 gem the repo previously tracked. They flag broad
+ # pre-existing patterns unrelated to the N×N enforcement pass; deferred to
+ # their own cleanup task. NoUnderscorePrefixedKwargs / NoInlineSettingDefaults
+ # / NoDirectDispatch / NoShapeDuckTyping (the B4 set Phase 6 adopts) remain
+ # enabled.
+ Legion/RescueLogging/NoCapture:
+   Enabled: false
+ Legion/ConstantSafety/InheritParam:
+   Enabled: false
+
  AllCops:
    TargetRubyVersion: 3.4
    NewCops: enable
    SuggestExtensions: false

+ # N×N routing guard cops (Phase 6 enforcement; defaults from rubocop-legion config/default.yml).
+ #
+ # - NoUnderscorePrefixedKwargs / NoInlineSettingDefaults / NoDirectDispatch are
+ # enabled repo-wide.
+ # - NoShapeDuckTyping is enabled on the canonical-only surface where the shape
+ # contract is fully established by translators. Code at the HTTP/wire ingress
+ # (client_translator parse_request, StreamAssembler chunk adapter, DebugFormats
+ # request envelope reader, Response.from_provider_message bridge) legitimately
+ # inspects shape because that's the layer responsible for normalising into
+ # canonical. The cop's scope expands here as the executor's canonical
+ # migration (Phase 4 follow-up) lands.
+ Legion/Framework/NoShapeDuckTyping:
+   Enabled: true
+   Include:
+     - 'lib/legion/llm/api/**/*.rb'
+     - 'lib/legion/llm/inference/**/*.rb'
+   Exclude:
+     # Legacy tree deprecated this release (R11); deleted next minor.
+     - 'lib/legion/llm/api/translators/**/*.rb'
+     - 'lib/legion/llm/api/anthropic/**/*.rb'
+     - 'lib/legion/llm/api/openai/**/*.rb'
+     - 'lib/legion/llm/api/native/**/*.rb'
+     - 'lib/legion/llm/api/shared_helpers.rb'
+     # Boundary code that legitimately bridges wire ↔ canonical: parse_request
+     # at the HTTP ingress, the StreamAssembler chunk adapter (P5 explicitly
+     # accepts both Canonical::Chunk and the legacy StreamChunk shape during
+     # migration), DebugFormats (reads raw env / reflects request), and
+     # shared_extractors (normalises arbitrary thinking content shapes).
+     # Pre-canonical inference steps still navigate raw wire hashes; that
+     # scope tightens once the executor finishes the canonical migration
+     # (Phase 4 follow-up).
+     - 'lib/legion/llm/api/client_translators/anthropic_messages.rb'
+     - 'lib/legion/llm/api/client_translators/openai_chat.rb'
+     - 'lib/legion/llm/api/client_translators/openai_responses.rb'
+     - 'lib/legion/llm/api/client_translators/shared_extractors.rb'
+     - 'lib/legion/llm/api/stream_assembler.rb'
+     - 'lib/legion/llm/api/debug_formats.rb'
+     - 'lib/legion/llm/api/namespaces/**/*.rb'
+     - 'lib/legion/llm/inference/audit_publisher.rb'
+     - 'lib/legion/llm/inference/embed_pipeline.rb'
+     - 'lib/legion/llm/inference/enrichment_injector.rb'
+     - 'lib/legion/llm/inference/executor.rb'
+     - 'lib/legion/llm/inference/executor/**/*.rb'
+     - 'lib/legion/llm/inference/native_tool_loop.rb'
+     - 'lib/legion/llm/inference/profile.rb'
+     - 'lib/legion/llm/inference/response.rb'
+     - 'lib/legion/llm/inference/route_attempts.rb'
+     - 'lib/legion/llm/inference/steps/**/*.rb'
+
  Layout/LineLength:
    Max: 195
  Layout/SpaceAroundEqualsInParameterDefault:
data/AGENTS.md CHANGED
@@ -1,46 +1,61 @@
1
- # legion-llm Agent Notes
1
+ # legion-llm Agent Notes (v0.13.0)
2
2
 
3
- ## Scope
4
-
5
- `legion-llm` provides provider configuration, chat/embed/structured interfaces, dynamic routing, escalation, quality checks, and pipeline execution for Legion.
3
+ `legion-llm` is a **universal translation proxy** for LLM traffic: N client dialects (OpenAI Chat,
4
+ OpenAI Responses, Anthropic Messages) × N provider backends (Bedrock, Anthropic, OpenAI, vLLM,
5
+ Ollama, fleet), any direction. Every request parses once into `Canonical::Request`, is
6
+ routed/executed, then renders once back to the caller's dialect. See `CLAUDE.md` for the full
7
+ invariant set; `README.md` for detailed reference.
6
8
 
7
9
  ## Fast Start
8
10
 
9
11
  ```bash
10
12
  bundle install
11
- bundle exec rspec
12
- bundle exec rubocop
13
+ bundle exec rspec # 0 failures required before commit
14
+ bundle exec rubocop # 0 offenses required
13
15
  ```
14
16
 
17
+ **The in-process matrix harness (`spec/legion/llm/api/matrix/`) is the commit gate.** Touch
18
+ `lib/legion/llm/api/`, the executor, or the canonical/translator boundary → it must pass before push.
19
+
15
20
  ## Primary Entry Points
16
21
 
17
- - `lib/legion/llm.rb`
18
- - `lib/legion/llm/providers.rb`
19
- - `lib/legion/llm/router/`
20
- - `lib/legion/llm/pipeline/`
21
- - `lib/legion/llm/structured_output.rb`
22
- - `lib/legion/llm/embeddings.rb`
23
- - `lib/legion/llm/fleet/`
22
+ - `lib/legion/llm.rb` — facade (`start`, `chat`, `ask`, `embed`, `structured`)
23
+ - `lib/legion/llm/inventory.rb` — **single source of truth** for the model catalog
24
+ - `lib/legion/llm/router.rb` + `router/{candidates,availability,health_tracker,escalation/}` — routing
25
+ - `lib/legion/llm/inference/executor.rb` + `executor/{routing,escalation}.rb` — pipeline
26
+ - `lib/legion/llm/inference/steps/` — the 18 pipeline steps
27
+ - `lib/legion/llm/api/{openai,anthropic,native}/` — client routes
28
+ - `lib/legion/llm/api/client_translators/` — canonical ↔ client wire formats
29
+ - `lib/legion/llm/context/curator.rb` — async conversation curation (context-cost control)
30
+ - Provider behaviour (defaults, capabilities, model filtering) lives in `../extensions-ai/lex-llm-*`
24
31
 
25
32
  ## Guardrails
26
33
 
27
- - Keep typed error behavior and retry semantics stable (`ProviderDown`, `RateLimitError`, `EscalationExhausted`, etc.).
28
- - Routing and escalation must remain deterministic given the same inputs/settings.
29
- - Preserve pipeline feature-flag behavior; avoid forcing pipeline-only code paths.
30
- - Keep provider credentials resolved through settings secret resolution flow; never hardcode secrets.
31
- - Maintain compatibility with direct methods (`chat_direct`, `embed_direct`, `structured_direct`) and daemon-aware flows.
32
- - Health tracker and rule scoring are contract-sensitive; changes require spec updates.
34
+ - **Always translate, never passthrough**; **no `provider == :x` branches** outside translators.
35
+ - **Inventory is the only catalog**; `Discovery`/`Registry`/`HealthTracker` are feeders.
36
+ - Never dispatch a triple absent from the live catalog or unhealthy; **fail over, don't hard-fail**.
37
+ - **Model policy = compliance**: `model_whitelist`/`model_blacklist` honored at dispatch, fail-closed;
38
+ a policy-denied model is terminal (never escalated, never trips circuits).
39
+ - Thinking never crosses providers; mid-stream failover must not kill an in-flight conversation.
40
+ - Every pipeline exit emits ledger events (metering/audit) — no bypasses.
41
+ - `Legion::JSON` only (symbol keys); every `rescue` re-raises or `handle_exception`s; no
42
+ `defined?(Legion::Settings)` guards; `log.*` not `puts`.
43
+ - **No personal/company identifiers in VCS**; never force-push.
44
+ - Routing/escalation deterministic for the same inputs/settings; health-tracker & rule scoring are
45
+ contract-sensitive — changes require spec updates.
33
46
 
34
47
  ## Validation
35
48
 
36
- - Run targeted specs for modified router/pipeline/provider code.
37
- - Before handoff, run full `bundle exec rspec` and `bundle exec rubocop`.
49
+ Run targeted specs for modified router/pipeline/translator code, then full `rspec` + `rubocop` +
50
+ the matrix harness before handoff.
38
51
 
39
52
  ---
40
53
 
41
54
  ## Client Request Headers Reference
42
55
 
43
- Verified from source code (Claude Code binary + Codex `codex-rs` Rust source).
56
+ Verified from source (Claude Code binary + Codex `codex-rs`). Useful when working on `/v1/messages`
57
+ and `/v1/responses` handlers. Routing/identity headers `X-Legion-{Provider,Model,Instance,Tier}` are
58
+ honored as **rules** (hard constraints), not hints.
44
59
 
45
60
  ### Claude Code → `POST /v1/messages`
46
61
 
@@ -48,15 +63,10 @@ Verified from source code (Claude Code binary + Codex `codex-rs` Rust source).
48
63
  |---|---|---|
49
64
  | `X-Claude-Code-Session-Id` | Stable UUID for the CLI session | Yes |
50
65
  | `x-app` | `"cli"` (foreground) or `"cli-bg"` (background) | Yes |
51
- | `x-claude-remote-session-id` | Remote container session ID | Conditional |
52
- | `x-claude-remote-container-id` | Remote container ID | Conditional |
53
- | `x-claude-code-agent-id` | Agent UUID for multi-agent sessions | Conditional |
54
- | `x-claude-code-parent-agent-id` | Parent agent UUID (spawned subagent) | Conditional |
55
- | `x-client-app` | Additional client app identifier | Conditional |
56
-
57
- Conversation threading is **stateless** — full `messages[]` history sent in the body on every request. No conversation ID, turn ID, or `x-client-request-id` header is sent.
66
+ | `x-claude-code-agent-id` / `x-claude-code-parent-agent-id` | Agent / parent-agent UUIDs | Conditional |
58
67
 
59
- In Rack/Sinatra env keys, headers arrive as `HTTP_X_CLAUDE_CODE_SESSION_ID`, `HTTP_X_APP`, etc.
68
+ Threading is **stateless**: full `messages[]` history in the body on every request; no conversation/turn
69
+ ID header. In Rack env: `HTTP_X_CLAUDE_CODE_SESSION_ID`, `HTTP_X_APP`, etc.
60
70
 
61
71
  ### Codex → `POST /v1/responses`
62
72
 
@@ -66,34 +76,15 @@ In Rack/Sinatra env keys, headers arrive as `HTTP_X_CLAUDE_CODE_SESSION_ID`, `HT
66
76
  | `thread-id` | Stable UUID for the thread/conversation | Yes |
67
77
  | `x-client-request-id` | Same value as `thread-id` | Yes |
68
78
  | `x-codex-installation-id` | Installation-scoped UUID | Yes |
69
- | `x-codex-window-id` | `"{thread_id}:{window_generation}"` | Yes |
70
- | `x-codex-turn-state` | Sticky-routing token returned by server, replayed by client | After first response |
71
- | `x-codex-turn-metadata` | Per-turn observability metadata | Conditional |
72
- | `x-codex-parent-thread-id` | Parent thread UUID (sub-agents) | Conditional |
73
- | `x-openai-subagent` | Sub-agent type (`"review"`, `"compact"`, `"memory_consolidation"`, etc.) | Conditional |
74
- | `x-openai-memgen-request` | `"true"` for memory generation requests | Conditional |
75
-
76
- In Rack/Sinatra env keys: `HTTP_SESSION_ID`, `HTTP_THREAD_ID`, `HTTP_X_CLIENT_REQUEST_ID`, `HTTP_X_CODEX_INSTALLATION_ID`, etc.
77
-
78
- **`HTTP_THREAD_ID` is the stable Codex thread/conversation ID** — it is stable for the lifetime of a thread, not per-request. `HTTP_X_CLIENT_REQUEST_ID` equals `HTTP_THREAD_ID` (Codex sets them to the same value).
79
-
80
- Conversation threading over HTTP uses full input in body (stateless like Anthropic). Over WebSocket, `previous_response_id` is sent in the request body to enable delta-only input.
79
+ | `x-codex-turn-state` | Sticky-routing token, replayed by client | After first response |
80
+ | `x-openai-subagent` | Sub-agent type (`review`, `compact`, …) | Conditional |
81
81
 
82
- ### Practical Usage in `/v1/messages` and `/v1/responses` Handlers
82
+ `HTTP_THREAD_ID` is the stable thread/conversation ID (not per-request); `HTTP_X_CLIENT_REQUEST_ID`
83
+ equals it. HTTP threading is stateless (full input in body); over WebSocket, `previous_response_id`
84
+ enables delta-only input.
83
85
 
84
86
  ```ruby
85
- # Stable request ID (Claude Code sends X-Claude-Code-Session-Id; Codex sends x-client-request-id = thread-id)
86
- request_id = env['HTTP_X_CLIENT_REQUEST_ID'] || "req_#{SecureRandom.hex(12)}"
87
-
88
- # Stable conversation/thread ID
89
- # Claude Code: no header — generate per-request or use Legion conversation tracking
90
- # Codex: HTTP_THREAD_ID is stable for the thread lifetime
91
- conversation_id = env['HTTP_THREAD_ID'] ||
92
- env['HTTP_X_LEGION_CONVERSATION_ID'] ||
93
- body[:conversation_id] ||
94
- "conv_#{SecureRandom.hex(8)}"
95
-
96
- # Identify the calling client
97
- claude_code_session = env['HTTP_X_CLAUDE_CODE_SESSION_ID'] # present only for Claude Code
98
- codex_installation = env['HTTP_X_CODEX_INSTALLATION_ID'] # present only for Codex
87
+ request_id = env['HTTP_X_CLIENT_REQUEST_ID'] || "req_#{SecureRandom.hex(12)}"
88
+ conversation_id = env['HTTP_THREAD_ID'] || env['HTTP_X_LEGION_CONVERSATION_ID'] ||
89
+ body[:conversation_id] || "conv_#{SecureRandom.hex(8)}"
99
90
  ```
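The header tables in this file suggest a simple way to tell the two clients apart and to derive stable IDs from a Rack `env`. The sketch below is illustrative only: `client_identity` is a hypothetical helper, not a method in legion-llm, and it takes a plain Hash standing in for the Rack environment. The header names and fallback order follow the AGENTS.md text.

```ruby
require 'securerandom'

# Hypothetical helper: classify the caller and pick request/conversation IDs.
# `env` is a Rack-style Hash; `body` is the parsed request body with symbol keys.
def client_identity(env, body = {})
  client =
    if env.key?('HTTP_X_CLAUDE_CODE_SESSION_ID') then :claude_code # sent only by Claude Code
    elsif env.key?('HTTP_X_CODEX_INSTALLATION_ID') then :codex     # sent only by Codex
    else :unknown
    end

  {
    client: client,
    # Codex sets x-client-request-id to the thread ID; otherwise generate one.
    request_id: env['HTTP_X_CLIENT_REQUEST_ID'] || "req_#{SecureRandom.hex(12)}",
    # Claude Code sends no conversation header, so fall through to the body or a fresh ID.
    conversation_id: env['HTTP_THREAD_ID'] || env['HTTP_X_LEGION_CONVERSATION_ID'] ||
                     body[:conversation_id] || "conv_#{SecureRandom.hex(8)}"
  }
end
```

A generated `conv_…` ID is only stable for one request, so a Claude Code conversation needs tracking on the server side if it is to be followed across turns.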
data/CHANGELOG.md CHANGED
@@ -1,5 +1,298 @@
1
1
  # Legion LLM Changelog
2
2
 
3
+ ## [0.13.0] - 2026-06-17
4
+
5
+ Consolidated release. This single version bundles every change from `0.12.14` through `0.12.35`
6
+ into one published release — the patch series was developed on a long-running branch and is shipped
7
+ together as `0.13.0`. The per-patch entries below (`0.12.14`–`0.12.35`) remain the authoritative
8
+ detail; this section summarizes the themes.
9
+
10
+ ### Highlights
11
+
12
+ - **N × N routing with Inventory as the single source of truth** — `Inventory.offerings` is now the
13
+ one catalog (registration + liveness + health/circuit/denied); `Call::Registry`, `Discovery`, and
14
+ `HealthTracker` are feeders only. Routing, availability, and the executor read Inventory exclusively.
15
+ Cloud/frontier providers (Bedrock, Anthropic, OpenAI) are first-class routable and are no longer
16
+ shadowed by discovered local models.
17
+ - **Canonical / execution-proxy translation boundary** — every request parses into `Canonical::Request`
18
+ and every response renders from canonical back to the caller's dialect; no passthrough, no
19
+ provider-name branching outside translators. Tool-loop linkage (OpenAI Responses
20
+ `function_call`/`function_call_output`, qwen single-tag synthesis), per-format tool-arg typing, and
21
+ prompt-cache `cache_control` preservation are aligned and asserted by the in-process matrix harness.
22
+ - **Resilient multi-tier routing** — automatic escalation, mid-stream provider failover,
23
+ per-instance circuit breakers, multi-instance failover that exhausts a provider's own instances
24
+ before crossing providers, and account-scoped (credit/quota) errors that deprioritize the failing
25
+ instance instead of denying the model.
26
+ - **Model-policy compliance** — `model_whitelist`/`model_blacklist` enforced at dispatch, fail-closed;
27
+ a policy-denied model is terminal (never escalated, never trips circuits). Requires `lex-llm >= 0.5.4`.
28
+ - **Context curation, validated** — the Curator's deterministic strategies were validated against
29
+ ground-truth wire payloads (86.8% context reduction across 29 turns).
30
+ - **G14 router decomposition** — `Router::Candidates` and `Router::RegistryLookup` extracted from the
31
+ router (1030 → 694 lines) with no behavior change.
32
+ - **CI / observability** — RSpec and RuboCop dependency pins corrected (`lex-llm >= 0.5.4`,
33
+ `rubocop-legion >= 0.1.8`); discovery model-divergence warning made tolerant of versioned families.
34
+
35
+ ### Fixed
36
+
37
+ - **A legacy `:capability` routing-intent key no longer bricks routing.** The `:capability`
38
+ dimension was renamed to `:operation` + `:effort`, but three paths (`Router#normalize_intent`,
39
+ `Inference::Request.default_auto_routing_intent`, and the executor's `routing_intent_for_request`)
40
+ *raised* `ArgumentError` whenever the key was present — so any install whose on-disk
41
+ `default_intent` still carried the pre-rename `{ capability: 'moderate' }` default hit an error on
42
+ **every** request. The key is now tolerated (ignored) wherever it appears; `:operation`/`:effort`
43
+ are what's read.
44
+
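The `:capability` fix above amounts to dropping a retired key instead of raising on it. A minimal sketch of that idea follows. It is an assumption about the shape of the fix, not the gem's actual `Router#normalize_intent`, and `LEGACY_INTENT_KEYS` is a hypothetical constant.

```ruby
# Keys that older on-disk settings may still carry (hypothetical list).
LEGACY_INTENT_KEYS = %i[capability].freeze

# Symbolize keys and silently discard retired dimensions so that a stale
# `default_intent` no longer raises on every request.
def normalize_intent(intent)
  (intent || {})
    .transform_keys(&:to_sym)
    .reject { |key, _value| LEGACY_INTENT_KEYS.include?(key) }
end
```

Only the current dimensions (`:operation`, `:effort`) survive normalization, so downstream code never sees the legacy key.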
45
+ ### Docs
46
+
47
+ - README rewritten with an N × N overview, the execution-proxy contract, and a validated
48
+ context-curation showcase. `CLAUDE.md` trimmed to the high-value invariants and gotchas; `AGENTS.md`
49
+ refreshed with current entry points and guardrails.
50
+
51
+ ## [0.12.35] - 2026-06-17
52
+
53
+ ### Fixed
54
+
55
+ - **Explicit provider failover exhausts all of the provider's instances first** — an explicit `X-Legion-Provider` hint only prepended the provider's first registered instance to the escalation chain, so a failing instance (e.g. one account hitting a provider error) failed straight over to a *different* provider, skipping the provider's other configured instances. `prepend_hinted_provider` now prepends every registered instance of the hinted provider (registry order), so failover stays within the provider — across all its accounts/instances — before ever crossing to another provider. The fallback-chain builders (`build_fallback_resolutions`, `enabled_provider_chain`) also source a sibling instance's model from `Inventory` when it has no configured registry default (e.g. a whitelist-restricted instance whose policy-aware default resolved to `nil`), so such siblings are no longer dropped from the chain.
56
+ - **Native offerings attributed to the registry instance, not a generic default** — `Inventory#native_provider_offerings` honored the adapter's self-reported instance (often a generic `default`, because the adapter is not told its registration name) over the authoritative registry instance Inventory was enumerating. That collapsed multiple configured instances of a cloud provider into a single `default` offering. The registry instance now wins, so each configured instance appears as its own offering.
57
+ - **Account-scoped errors fail over to a sibling instance and deprioritize the failing one** — the escalation loop's generic provider-error handler called `skip_all_provider_model_instances!`, marking *every* instance of the failing provider+model as tried, and reported a provider-health failure that opened the instance's circuit. So a credit-balance error on one account (`anthropic` instance A) skipped the *other* account (instance B, same model) and crossed to a different provider — making the outcome depend on instance order. Account/instance-scoped errors (credit balance, payment, quota) now (a) skip only the failing instance so failover walks the provider's sibling instances first, and (b) **deprioritize** the failing instance by tripping *its* per-instance circuit (immediate, since the condition is deterministic) without `deny_model` and without penalizing the provider globally — so future requests prefer the healthy sibling and the circuit's cooldown→half_open re-probe auto-recovers the instance once topped up. Model-intrinsic errors still skip all instances of the model to preserve the attempt budget.
58
+ - **Fleet lane renders the instance label, not rejects it** — an `instance_id` is a trusted operator label on the internal (datacenter-hosted) `Legion::Transport` RabbitMQ, not secret material, so `Fleet::Lane.offering_key` now *sanitizes* it (`Fleet::Lane.label_segment`) rather than rejecting any label containing a credential-ish word (e.g. an instance named `env_bearer`). The credential denylist still applies to genuinely untrusted values (`boundary`, eligibility facts). `Inventory#add_fleet_lane` also no longer lets a malformed (empty/over-length) label break offering construction — the offering builds without a fleet lane. Previously a credential-word instance name raised `ArgumentError` mid-build and made that instance unroutable.
59
+
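The first 0.12.35 fix changes the order of the escalation chain: every instance of the hinted provider comes first, then the rest. The sketch below illustrates that ordering under assumed data shapes (a registry Hash of provider to ordered instance names, and a chain of `[provider, instance]` pairs). `prepend_hinted_instances` is a hypothetical name and does not reproduce the gem's `prepend_hinted_provider` signature.

```ruby
# registry: { provider => [instance, ...] } in registration order (assumed shape).
# chain:    ordered [[provider, instance], ...] escalation candidates.
def prepend_hinted_instances(chain, registry, hinted_provider)
  hinted = registry.fetch(hinted_provider, []).map { |instance| [hinted_provider, instance] }
  # uniq keeps the first occurrence, so hinted instances already in the chain move to the front.
  (hinted + chain).uniq
end
```

With this ordering, a failure on one account walks the provider's sibling instances before any other provider is tried.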
60
+ ## [0.12.34] - 2026-06-17
61
+
62
+ ### Fixed
63
+
64
+ - **Never dispatch a model the provider doesn't offer** — routing could pair an explicit provider with a foreign/stale model (observed: `anthropic` + `qwen3.6-27b`, which Anthropic never offered). Two sources closed: (1) `Router#explicit_resolution` now sources the model from `Inventory` (the SSOT, already whitelist/blacklist-filtered) before any stale registry/tier default, so an explicit provider resolves to a model it actually offers; (2) the executor's no-model fallback no longer drops the global `default_model` onto an unrelated provider — a resolved provider gets *its own* catalog model, and the global default applies only when no provider resolved (or it belongs to the resolved provider).
65
+ - **Availability enforces the live catalog for every provider** — the Inventory model-existence gate previously exempted cloud/frontier providers; it now applies to all of them. A `(provider, model)` the catalog doesn't list is rejected (`:model_not_offered`), so a foreign or policy-excluded model can never reach dispatch. Empty/nil catalogs stay permissive (cold-boot safe).
66
+
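The availability gate described in 0.12.34 can be pictured as a lookup against the live catalog that stays permissive while the catalog is still empty. The sketch below assumes the catalog is a Hash of provider to model IDs. That shape and the method name `availability` are illustrative assumptions, not the Inventory API.

```ruby
# catalog: { provider => [model_id, ...] } or nil before discovery has run (assumed shape).
def availability(catalog, provider, model)
  offered = catalog && catalog[provider]
  # Cold boot: nothing is known yet, so do not block dispatch.
  return :ok if offered.nil? || offered.empty?

  offered.include?(model) ? :ok : :model_not_offered
end
```

Once a provider's catalog is populated, a pairing such as `anthropic` with `qwen3.6-27b` is rejected before it can reach dispatch.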
67
+ ## [0.12.33] - 2026-06-17
68
+
69
+ ### Added
70
+
71
+ - **Daemon-side model-policy enforcement (compliance)** — `Call::Dispatch.call` now refuses to dispatch a model excluded by a provider's `model_whitelist`/`model_blacklist`, failing closed with the new terminal `Legion::LLM::ModelNotAllowed` error before the provider call (the provider enforces the same policy as a backstop). Provider-raised `lex-llm` `ModelNotAllowedError` is mapped to the same type.
72
+
73
+ ### Fixed
74
+
75
+ - **A policy-denied model is not an escalation** — both escalation paths (`Inference::Executor#run_escalation_resolution` and `Inference.chat_with_escalation`) now treat `ModelNotAllowed` as terminal: it is re-raised immediately rather than escalated to the next model, and it does not record a health failure, trip a circuit breaker, or deny-record the model. `ModelNotAllowed` is non-retryable.
76
+
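The distinction drawn in 0.12.33 is that an ordinary provider failure moves to the next candidate, while a policy denial ends the request. A minimal sketch of such a loop is shown below. The two error classes are local stand-ins (the real type is `Legion::LLM::ModelNotAllowed`), and `with_escalation` is a hypothetical helper rather than the executor's code.

```ruby
class ProviderError < StandardError; end
class ModelNotAllowed < StandardError; end # stand-in for Legion::LLM::ModelNotAllowed

# Try each candidate in order; the block performs the actual dispatch.
def with_escalation(candidates)
  failures = []
  candidates.each do |candidate|
    return yield(candidate)
  rescue ModelNotAllowed
    raise # terminal: no escalation, no health failure, no circuit trip
  rescue ProviderError => e
    failures << [candidate, e.message] # a real router would record health here
  end
  raise ProviderError, "all candidates failed: #{failures.inspect}"
end
```

Because the denial is re-raised before anything is recorded, it cannot affect circuit state or the deny list.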
77
+ ## [0.12.32] - 2026-06-16
78
+
79
+ ### Fixed
80
+
81
+ - **Discovery model-divergence false positives** — The divergence warning now treats a configured default as present when a discovered id is a versioned family member of it (e.g. `anthropic.claude-sonnet-4` matches `anthropic.claude-sonnet-4-6`), instead of requiring an exact string or Ollama-style `:` tag. Multi-model cloud providers (Bedrock lists ~90 models) no longer warn on every boot. The warning also reports `discovered_count` and truncates the id list to a sample, so a divergence no longer dumps the full catalog into a single log line.
82
+
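The divergence check in 0.12.32 treats a discovered ID as matching when it is a versioned member of the configured family. One way to express that rule is sketched below, under the assumption that a family member is the configured ID followed by `-` (version suffix) or `:` (Ollama-style tag). The method names are hypothetical and the gem's actual matching rule may differ.

```ruby
# True when `discovered` equals `configured` or extends it with a '-' or ':' suffix.
def family_member?(configured, discovered)
  return true if discovered == configured

  discovered.start_with?(configured) && ['-', ':'].include?(discovered[configured.length])
end

# True when any discovered ID counts as the configured default.
def default_present?(configured, discovered_ids)
  discovered_ids.any? { |id| family_member?(configured, id) }
end
```

Checking the separator character avoids false matches such as `claude-sonnet-45` against `claude-sonnet-4`.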
83
+ ## [0.12.31] - 2026-06-16
84
+
85
+ ### Changed
86
+
87
+ - **lex-llm dependency** — Require `lex-llm >= 0.5.3`, the first published release carrying the `Legion::Extensions::Llm::Canonical` types the native dispatch path depends on. Resolves CI `NameError: uninitialized constant Legion::Extensions::Llm::Canonical` when the published gem (rather than a local checkout) is resolved.
88
+
89
+ ### Build
90
+
91
+ - **RuboCop tooling** — Track `rubocop-legion` main until `0.1.8` (which ships the four `Legion/Framework` N×N guard cops referenced by `.rubocop.yml`, including `NoShapeDuckTyping`) is published; the published `0.1.7` predates those cops.
92
+
93
+ ## [0.12.30] - 2026-06-16
94
+
95
+ ### Fixed
96
+
97
+ - **Legion routing header precedence** — Client translators now ignore protocol body `model` values for Legion routing and route only from `X-Legion-Provider`, `X-Legion-Model`, `X-Legion-Instance`, and `X-Legion-Tier` preferences.
98
+ - **LegionIO alias routing** — The internal `legionio` model alias no longer erases existing provider, instance, or tier routing preferences when normalizing inference requests.
99
+ - **Routing preference scoring** — Router hint matches now carry a dominant preference bonus without filtering fallback candidates, so `X-Legion-*` headers bias routing strongly while preserving normal fallback behavior.
100
+
101
+ ## [0.12.29] - 2026-06-16
102
+
103
+ ### Fixed
104
+
105
+ - **Canonical content-block rendering** — Claude Messages and OpenAI Chat responses now unwrap canonical content blocks before client formatting, preventing Ruby object inspect strings from crossing HTTP response boundaries.
106
+ - **OpenAI Chat server tool visibility** — Mixed LegionIO-executed tool failures and client passthrough tool calls now render the server tool result in assistant content while leaving only client tools actionable.
107
+
108
+ ## [0.12.28] - 2026-06-16
109
+
110
+ ### Fixed
111
+
112
+ - **Canonical tool-loop result propagation** — Native tool loops now attach LegionIO-executed tool results to immutable canonical `ToolCall` objects without Hash mutation, preserving server-resolved source/result state alongside client passthrough tool calls for `/v1/responses`.
113
+ - **Tool error result preservation** — Dispatcher failure details now survive native tool result content rendering so failed LegionIO-executed tools surface useful server-side tool output instead of `{}`.
114
+
115
+ ## [0.12.27] - 2026-06-16
116
+
117
+ ### Fixed
118
+
119
+ - **Context-window escalation** — Provider-wrapped maximum-context-length errors now classify as `ContextOverflow` even when a provider gem reports them as a generic server/provider error, allowing escalation to skip same-tier candidates and seek a larger-context model.
120
+
121
+ ## [0.12.26] - 2026-06-16
122
+
123
+ ### Fixed
124
+
125
+ - **Legion tool failure diagnostics** — Tool dispatch failure logs now use configurable `llm.tool_error_log_chars` with a 500-character default and prefer structured runtime failure details (`exit_status`, error line, output tail) over generated command prefixes.
126
+ - **Legion-executed tool wording** — Native tool-loop logs now report `legion_executed_tools` and `all_legion_executed_tools_failed`, avoiding ambiguous server/client terminology while preserving client wire protocols.
127
+
128
+ ## [0.12.25] - 2026-06-16
129
+
130
+ ### Fixed
131
+
132
+ - **Codex Responses rendering** — `/v1/responses` now unwraps canonical content-block arrays into plain `output_text` strings for both non-streaming responses and streaming fallback finalization, preventing Ruby object inspect strings from leaking into Codex.
133
+
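The rendering bug fixed in 0.12.25 is the usual result of interpolating a content-block object straight into a string. The sketch below shows the unwrapping idea with a stand-in `TextBlock` struct. Both the struct and `plain_text` are hypothetical, and the real canonical block types live in `lex-llm`.

```ruby
# Hypothetical stand-in for a canonical content block.
TextBlock = Struct.new(:type, :text)

# Flatten a String, a single block, or an Array of blocks into plain text.
def plain_text(content)
  case content
  when String then content
  when Array  then content.map { |block| plain_text(block) }.join
  when Hash   then content[:text].to_s
  else content.respond_to?(:text) ? content.text.to_s : ''
  end
end
```

Calling `to_s` or `inspect` on the array instead would emit text like `#<struct TextBlock ...>`, which is the kind of leak the changelog entry describes.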
134
+ ## [0.12.24] - 2026-06-16
135
+
136
+ ### Fixed
137
+
138
+ - **Codex Responses routing** — `/v1/responses` no longer performs a provider-capability shortcut before routing. Codex requests always run through the router first, then dispatch via upstream Responses only when the resolved provider supports it.
139
+ - **Responses escalation dispatch** — Escalation attempts for Responses-origin requests now use upstream Responses for capable providers and fall back to the normal routed chat/stream path for providers without Responses support.
140
+ - **Escalation visibility** — Non-primary escalation attempts now log at `WARN` and include previous failure context so actual failover is visible in live logs. Primary attempts remain `INFO`.
141
+
142
+ ## [0.12.23] - 2026-06-16
143
+
144
+ ### Fixed
145
+
146
+ - **Streaming escalation failover** — Provider switch notifications now build `Legion::LLM::Router::Resolution` with the fully-qualified namespace, preventing `NameError` after the first streaming attempt fails.
147
+ - **Auth failure health handling** — Authentication and provider configuration failures now deny the affected provider instance/model and immediately trip that instance circuit instead of waiting for normal error-threshold health decay.
148
+ - **Final context preflight** — Direct and Responses dispatch now re-estimate the final provider payload after system enrichment, tool definitions, tool preferences, and thinking options are materialized, raising `ContextOverflow` before submitting an oversized request to the provider.
149
+ - **Failed attempt metering** — Escalation attempt metering events now include `status`, `error`, and `provider_submitted` fields so submitted failed calls can be audited without looking like successful zero-token completions.
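To make the metering change concrete, here is a sketch of an attempt event carrying the `status`, `error`, and `provider_submitted` fields named above. The builder name, the `:ok`/`:failed` status values, and the token keys are assumptions for illustration.

```ruby
# Hypothetical event builder; status values and key names are assumed.
def attempt_metering_event(provider:, model:, error: nil, provider_submitted: false,
                           input_tokens: 0, output_tokens: 0)
  {
    provider: provider,
    model: model,
    status: error ? :failed : :ok,
    error: error && "#{error.class}: #{error.message}",
    provider_submitted: provider_submitted,
    input_tokens: input_tokens,
    output_tokens: output_tokens
  }
end
```

A failed call that did reach the provider then shows `status: :failed` with `provider_submitted: true`, which distinguishes it from a successful completion that happened to report zero tokens.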
150
+
151
+ ## [0.12.22] - 2026-06-16
152
+
153
+ ### Added
154
+
155
+ - **Context token accounting** — `llm_message_inference_metrics` is now the canonical source of truth for all pipeline context token metrics. Every inference request emits a normalized `context_accounting` payload with per-component token estimates covering: loaded history, curated history, curation savings, thinking strip savings, archive savings, context-window enforcement savings, RAG injection, system/baseline prompt, tool definitions, and final estimated context size.
156
+ - **`Inference::ContextAccounting` module** — Deterministic char/4 estimator with structured event builder for pipeline instrumentation.
157
+ - **Executor instrumentation** — `step_context_load` records loaded/curated/archived history tokens; `ContextWindow` records thinking-strip and context-window enforcement savings; `RagContext` records RAG injection tokens; `ToolInjection` records tool definition payload tokens; system/baseline enrichment tokens recorded at dispatch.
158
+ - **Provider reconciliation** — Finalized accounting includes a reconciliation block comparing estimated input tokens against provider-reported input tokens with delta.
159
+ - **Metering event enrichment** — `Steps::Metering.build_event` carries the `context_accounting` payload for downstream ledger persistence.
160
+ - **Audit event enrichment** — `AuditPublisher.build_event` exposes `context_accounting` as a top-level key for ledger writer convenience.
161
+ - **Component status tracking** — Each accounting-producing pipeline step sets its component status (`:observed`, `:not_observed`, `:profile_skipped`) so zero-valued columns are distinguishable from skipped steps.
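The char/4 estimator and the reconciliation block can be sketched as follows. The module name, the payload keys, and the choice to round up are assumptions; the changelog only states that the estimator is deterministic, uses char/4, and that estimated and provider-reported input tokens are compared with a delta.

```ruby
# Sketch under assumptions: names, keys, and rounding are not taken from the gem.
module ContextAccountingSketch
  CHARS_PER_TOKEN = 4

  module_function

  # Deterministic estimate: characters divided by four, rounded up (assumed rounding).
  def estimate_tokens(text)
    (text.to_s.length.to_f / CHARS_PER_TOKEN).ceil
  end

  # components maps a component name to the text it contributes to the prompt.
  def build(components, provider_input_tokens: nil)
    estimates = components.transform_values { |text| estimate_tokens(text) }
    total = estimates.values.sum
    payload = { components: estimates, estimated_input_tokens: total }
    if provider_input_tokens
      payload[:reconciliation] = {
        estimated: total,
        provider_reported: provider_input_tokens,
        delta: provider_input_tokens - total
      }
    end
    payload
  end
end
```

A positive `delta` in this sketch means the provider counted more input tokens than the estimator predicted.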
162
+
163
+ ## [0.12.21] - 2026-06-15
164
+
165
+ ### Added
166
+
167
+ - **Capability source metadata** — Discovery, rule generation, and availability logs now carry per-capability source tags (`:model_override`, `:instance_override`, `:provider_override`, `:model_metadata`, `:provider_catalog`, `:probe`, `:provider_envelope`, `:default_false`).
168
+ - **Conservative router hard gates** — Empty or unconfirmed capability data no longer passes `required_capabilities` checks. Absent means false.
169
+ - **Source-aware cold boot** — During `:unknown` discovery status, capabilities must be explicitly confirmed by settings overrides or explicit metadata to satisfy hard gates.
170
+ - **Typed routing errors** — `RoutingTooEarly` (425) is raised when discovery is not yet authoritative; `RoutingFailedDependency` (424) is raised when no candidate satisfies the hard gates. These replace the generic `EscalationExhausted` for routing-policy failures.
171
+ - **Instance resolution enforcement** — `nil` instance on a resolution returns `:instance_unresolved` rejection.
172
+ - **Diagnostic logging** — Missing-capability rejections include `sources=thinking:default_false` detail for operator visibility.
173
+ - **Discovery schema v3** — `DISCOVERED_MODELS_SCHEMA_VERSION` bumped to invalidate cached entries lacking `capability_sources`.
174
+ - **Rule generator source awareness** — Generated rules only include capabilities confirmed by source-tagged offering truth; stale registry metadata is no longer blindly merged.
175
+ - **Operator contract documentation** — `docs/work/planning/2026-06-15-capability-source-operator-contract.md`.
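The "absent means false" rule and the `sources=` diagnostic can be illustrated with a small sketch. The data shape (capability name mapped to `{ value:, source: }`) and both helper names are assumptions; the source tags and the `sources=thinking:default_false` format come from the entries above.

```ruby
# Hypothetical shape: { tools: { value: true, source: :probe } }.
def satisfies_required?(capabilities, required)
  # A capability passes only when it is explicitly confirmed as true.
  required.all? { |cap| capabilities.dig(cap, :value) == true }
end

def rejection_detail(capabilities, required)
  missing = required.reject { |cap| capabilities.dig(cap, :value) == true }
  sources = missing.map { |cap| "#{cap}:#{capabilities.dig(cap, :source) || :default_false}" }
  "missing=#{missing.join(',')} sources=#{sources.join(',')}"
end
```

Under this sketch an empty capability Hash fails every `required_capabilities` check, which is the conservative behavior the release describes.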
176
+
177
+ ### Fixed
178
+
179
+ - Discovery no longer merges stale registry metadata capabilities over live offering data when offerings carry `capability_sources`.
180
+ - Tool trigger matching strips `<system-reminder>...</system-reminder>` blocks from its scan text without mutating request history, preventing startup/handoff prompts from triggering broad tool injection.
181
+
182
+ ## [0.12.20] - 2026-06-15
183
+
184
+ ### Breaking
185
+
186
+ - Routing intent key `:capability` removed; use `:operation` and `:effort`. Supplying `capability:` raises `ArgumentError`.
187
+ - Settings `default_intent` must use `effort:`/`operation:` instead of `capability:`.
188
+
189
+ ### Added
190
+
191
+ - Routing intent separates `effort` (soft preference), `operation` (hard filter), and `required_capabilities` (hard filter).
192
+ - Effort levels: `:low`, `:moderate`, `:high`, `:reasoning`. `:medium` normalizes to `:moderate`.
193
+ - Thinking is a hard capability only when explicitly requested via thinking config.
194
+ - Router chains reject stale registry defaults not present in live discovered offerings.
195
+ - Discovery status policy: `:unknown` permissive, `:ok` authoritative, `:empty` rejects, `:unreachable`/`:error` rejects.
196
+ - `Discovery::DISCOVERED_MODELS_SCHEMA_VERSION` and `Cache::RESPONSE_CACHE_SCHEMA_VERSION` invalidate stale payloads.
197
+ - Multi-instance provider routing: same provider with different instances carries distinct capabilities and availability.
198
+ - `Inventory.invalidate_offerings_cache!` public method for discovery refresh actors.
199
+ - Per-offering health bridging: discovery reports `:success`, `:error`, `:latency` to `Router.health_tracker`.
200
+ - Discovered model entries include `health` and `loaded` fields from live offerings.
201
+ - `loaded_model_bonus` scoring (+5) for models confirmed running by provider.
202
+ - `resolve.no_rules_matched` warning includes rejection trace breakdown.
203
+ - `missing_capability` availability log includes required and available capabilities.
204
+ - Determinism and regression spec coverage (`spec/legion/llm/router/determinism_spec.rb`, `multi_instance_spec.rb`).
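The intent changes in this release (removal of `:capability`, the four effort levels, and the `:medium` alias) might be enforced by a normalizer like the one sketched here. `normalize_intent` is a hypothetical name, and rejecting unknown effort values is an assumption not stated in the changelog.

```ruby
# Sketch only: helper name and unknown-effort handling are assumptions.
EFFORT_LEVELS = %i[low moderate high reasoning].freeze
EFFORT_ALIASES = { medium: :moderate }.freeze

def normalize_intent(intent)
  intent = intent.transform_keys(&:to_sym)
  if intent.key?(:capability)
    raise ArgumentError, 'intent key :capability was removed; use :operation and :effort'
  end

  effort = intent[:effort]&.to_sym
  effort = EFFORT_ALIASES.fetch(effort, effort)
  if effort && !EFFORT_LEVELS.include?(effort)
    raise ArgumentError, "unknown effort #{effort.inspect}"
  end

  intent.merge(effort: effort).compact
end
```

For example, `normalize_intent({ effort: :medium, operation: :chat })` yields an intent whose effort is `:moderate`.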
205
+
206
+ ### Fixed
207
+
208
+ - vLLM live catalog IDs honored before dispatch; stale `qwen3.6-27b` rejected when only `legion-code-27b-v1` is offered.
209
+ - `enabled_provider_chain` includes all registered instances, not just first per provider family.
210
+ - `chain_from_defaults` primary resolution carries registered instance.
211
+ - `chain_from_intent` dedup includes instance (same-model/different-instance preserved).
212
+ - `build_fallback_resolutions` preserves instance directly.
213
+ - Last-resort fallback resolutions filtered through live availability.
214
+ - `enterprise_privacy_spec` order-dependency fixed.
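The instance-aware dedup fix can be pictured with the following sketch. `SketchResolution` is a stand-in Struct, not the gem's `Router::Resolution`, and the key choice is an assumption based on the "same-model/different-instance preserved" wording.

```ruby
# Stand-in type for illustration; not the gem's Resolution class.
SketchResolution = Struct.new(:provider, :instance, :model, keyword_init: true)

# Dedup on provider, instance, and model so two instances serving the same
# model both stay in the chain.
def dedup_resolutions(resolutions)
  resolutions.uniq { |r| [r.provider, r.instance, r.model] }
end
```

Keying on provider and model alone would collapse two vLLM instances that serve the same model into one entry, which is the behavior the fix removes.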
215
+
216
+ ### Removed
217
+
218
+ - `Discovery::System.memory_pressure?` (confirmed dead, no production callers).
219
+
220
+ ## [0.12.19] - 2026-06-12
221
+
222
+ ### Fixed
223
+ - **Request-payload errors no longer deny models or trip circuit breakers** — `ValidationException` for malformed tool schemas (e.g., `tools.16.custom.input_schema.type: Field required`) is now classified as a request-payload error, not a provider config error. Models are no longer permanently denied for client-side schema bugs. (lib/legion/llm/inference/executor/escalation.rb)
224
+ - **HealthTracker honors signal value and logs honestly** — Error handler now uses `payload[:value]` (default 1.0) instead of always incrementing by 1. Already-open circuits no longer re-log fake `closed→open` transitions. (lib/legion/llm/router/health_tracker.rb)
225
+ - **Health keying consistency** — All `health_tracker.report` calls now include `instance:` from the resolution, ensuring discovery/escalation/post-request signals accumulate on the same provider/instance key. (lib/legion/llm/inference.rb, lib/legion/llm/inference/executor/escalation.rb)
226
+ - **Discovery unreachable trips circuit immediately** — Connection failures during discovery now call `trip_circuit` instead of a `value: 1` report. A boot-time unreachable vLLM is marked `:open` without requiring 3 separate failures. (lib/legion/llm/discovery.rb)
227
+ - **Tool schema normalization at canonical boundary** — `Canonical::ToolDefinition.normalize_parameters` guarantees every tool schema has a valid top-level `type`. Prevents Bedrock `tools.16.custom.input_schema.type: Field required` rejections. (lex-llm, legion-llm types/tool_definition.rb)
228
+ - **Anthropic translator double-wrap eliminated** — `render_tools` no longer wraps the full JSON schema inside `{type: 'object', properties: schema}`. Passes canonical schemas through directly. (lex-llm-anthropic translator.rb, provider.rb)
229
+ - **Provider tool renderers accept canonical ToolDefinition objects** — All provider gems (Gemini, Ollama, Vertex, vLLM, OpenAI-compatible) now use `ToolSchema.extract` instead of calling `tool.params_schema` directly. (lex-llm, lex-llm-gemini, lex-llm-ollama, lex-llm-vertex)
230
+ - **Discovery unreachable propagates to legion-llm** — `discover_offerings(raise_on_unreachable: true)` raises transport failures instead of swallowing into `[]`. (lex-llm provider.rb, legion-llm lex_llm_adapter.rb)
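The schema normalization item above guarantees a top-level `type`. A minimal sketch of that guarantee follows; the method shares the name `normalize_parameters` for readability, but the body, the string-key normalization, and the default empty `properties` are assumptions rather than the gem's implementation.

```ruby
# Sketch under assumptions: not the actual Canonical::ToolDefinition code.
def normalize_parameters(schema)
  schema = (schema || {}).transform_keys(&:to_s)
  # Guarantee a top-level type so strict providers do not reject the tool.
  schema['type'] = 'object' if schema['type'].to_s.empty?
  schema['properties'] ||= {} if schema['type'] == 'object'
  schema
end
```

A tool declared with no parameters at all then still renders as `{ "type" => "object", "properties" => {} }`.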
231
+
232
+ ### Added
233
+ - **Router::Availability oracle** — Single availability filter for routing and escalation. Checks circuit state, denied models, discovery status, context length, and required capabilities before building escalation chains. (lib/legion/llm/router/availability.rb)
234
+ - **Legion::LLM::Capabilities module** — Normalized capability alias handling (`:function_calling` → `:tools`, `:stream` → `:streaming`). Shared across Router, Discovery, RuleGenerator. (lib/legion/llm/capabilities.rb)
235
+ - **Escalation loop circuit guard** — Open circuits are skipped in the escalation loop (`:half_open` allowed as recovery probe). Empty chains raise `EscalationExhausted` immediately without opening sockets. (lib/legion/llm/inference/executor/escalation.rb)
236
+ - **G6 streaming failover hooks** — `StreamAssembler` gains `provider_failed`/`provider_switched`/`safe_replay_snapshot` observer hooks. All client emitters gain `on_tool_call_abort`. Executor accepts `stream_observer:` kwarg. (lib/legion/llm/api/stream_assembler.rb, client_translators/*)
237
+ - **Canonical::ToolSchema extractor** — Shared tool schema extraction regardless of input shape (ToolDefinition, Hash, legacy tool). (lex-llm canonical/tool_schema.rb)
238
+ - **Provider contract strengthened** — `discover_offerings` requires `raise_on_unreachable:` parameter. Providers must accept canonical `ToolDefinition` objects. (lex-llm provider_contract.rb)
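Capability alias normalization, as described for the `Capabilities` module, could be sketched like this. The two aliases are the ones listed above; the module name, method names, and the set-difference check are illustrative assumptions.

```ruby
# Illustrative module; only the two alias mappings come from the changelog.
module CapabilitiesSketch
  ALIASES = { function_calling: :tools, stream: :streaming }.freeze

  module_function

  # Maps aliases to canonical names and removes duplicates.
  def normalize(capabilities)
    Array(capabilities).map { |cap| ALIASES.fetch(cap.to_sym, cap.to_sym) }.uniq
  end

  # True when every required capability is present after normalization.
  def satisfies?(available, required)
    (normalize(required) - normalize(available)).empty?
  end
end
```

Normalizing both sides before comparing means a provider that advertises `:function_calling` still satisfies a request that requires `:tools`.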
239
+
240
+ ### Removed
241
+ - **Legacy non-stateful fallback path** — `try_fallback_or_raise`, `find_fallback_provider`, `fallback_local_providers?` deleted. One provider-switching mechanism: stateful escalation through `Router::EscalationChain`. (lib/legion/llm/inference/executor/routing.rb)
242
+
243
+ ### Changed
244
+ - **Settings defaults** — `escalation.enabled: true`, `gaia.advisory_enabled: true`, `context_curation.thinking_eviction: true`, `context_curation.exchange_folding: true`, `streaming.emit_thinking_blocks: true`, `discovery.trip_circuit_on_unreachable: true`, `escalation.skip_open_circuits: true`.
245
+ - **Provider capabilities include `:tools`** — All tool-capable providers now emit canonical `:tools` in capability metadata alongside aliases. (lex-llm-openai, lex-llm-gemini, lex-llm-vertex, lex-llm-bedrock, lex-llm-azure-foundry, lex-llm-ollama)
246
+
247
+ ## [0.12.18] - 2026-06-12
248
+
249
+ ### Fixed
250
+ - **Anthropic `tool_use.input` regression — must be Object, not JSON string** — The P6 SharedExtractors dedup folded `serialize_args` into a single uniform helper that always returned a JSON string. The two client wire formats are incompatible: Anthropic `/v1/messages` REQUIRES `tool_use.input` (and `server_tool_use.input`) to be an Object; OpenAI `/v1/responses` and `/v1/chat/completions` REQUIRE `function_call.arguments` to be a JSON String. Replaced the uniform helper with two explicit per-format helpers — `args_as_object` (Anthropic) and `args_as_json_string` (OpenAI). Both helpers also defensively coerce degraded provider output (e.g. `1.01` numeric or unparsed JSON string from a qwen3.6-27b run that fell back to plain content) to the format-correct shape — `{}` for Anthropic, `"{}"` for OpenAI — rather than letting an off-spec value reach the wire. Live evidence: `legionio-e2e/results/claude/vllm_multi_turn_*` showed `"input": 1.01` against an Anthropic spec demanding an Object. Sibling check: G24 `server_tool_use.input` (Anthropic) and server `function_call.arguments` (Responses + chat completions) all use the per-format helper. (lib/legion/llm/api/client_translators/{shared_extractors,anthropic_messages,openai_chat,openai_responses}.rb)
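The per-format split described above can be sketched as two small helpers. The names `args_as_object` and `args_as_json_string` come from the changelog; the bodies are an assumption about how such coercion might work, including the choice to parse a String that happens to contain a JSON object.

```ruby
require 'json'

# Anthropic-style: tool_use.input must be an Object (a Hash in Ruby).
def args_as_object(args)
  return args if args.is_a?(Hash)

  if args.is_a?(String)
    begin
      parsed = JSON.parse(args)
      return parsed if parsed.is_a?(Hash)
    rescue JSON::ParserError
      # Fall through to the empty-object default below.
    end
  end
  {} # Degraded values such as 1.01 coerce to an empty object.
end

# OpenAI-style: function_call.arguments must be a JSON String.
def args_as_json_string(args)
  JSON.generate(args_as_object(args))
end
```

With these helpers a degraded value such as `1.01` becomes `{}` on the Anthropic side and `"{}"` on the OpenAI side, matching the shapes named in the entry.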
251
+
252
+ ### Added
253
+ - **Matrix harness oracle: `tool_args_typing_matrix_spec.rb`** — Asserts `tool_use.input is Hash` for Anthropic and `function_call.arguments is String` for OpenAI Responses + chat completions, on three input shapes (normal Hash args, degraded numeric, G24 server-tool block). Verified to fail with the exact regression signature ("got Float: 1.01") when the per-format coercion is reverted. Closes the assertion gap that let 3056 specs pass while the regression shipped to live e2e. (spec/legion/llm/api/matrix/tool_args_typing_matrix_spec.rb, spec/support/fake_provider.rb new `:tool_degraded_args` scenario)
254
+
255
+ ## [0.12.17] - 2026-06-12
256
+
257
+ ### Deprecated
258
+ - **Legacy flat API tree under `lib/legion/llm/api/{anthropic,openai,native}/`** — The flat-file route tree is deprecated. `llm.api.use_namespaces` defaults to `true`; setting it to `false` continues to register the legacy chain but now logs a deprecation warning at registration time. The legacy tree (`api/anthropic/messages.rb`, `api/openai/{chat_completions,embeddings,models,responses}.rb`, `api/native/*.rb` flat files) and the `register_legacy` dispatcher will be **deleted in the next minor release**. All routing is consolidated under `api/namespaces/` and the new `api/client_translators/` + `api/stream_assembler.rb` (P5). (lib/legion/llm/api.rb)
259
+ - **`Legion::LLM::Inference.ask_direct` is a deprecated shim** — Previously routed through `chat_direct_raw`, an ungoverned path that bypassed metering/audit. Now routes through the governed pipeline via `chat_direct` and emits a `Deprecation.warn_once` warning. Same compliance gap closure as the `chat_direct`/`embed_direct`/`structured_direct` deprecation in 0.12.16. Use `Legion::LLM.ask` instead. (lib/legion/llm/inference.rb)
260
+
261
+ ### Removed
262
+ - **Absorbed translator shims** — `lib/legion/llm/api/translators/{anthropic,openai}_{request,response}.rb`, `Legion::LLM::Call::NativeResponseAdapter`, and the per-route thinking/token/tool-call extractor duplicates absorbed by `api/client_translators/` and `api/stream_assembler.rb` are removed.
263
+
264
+ ### Added
265
+ - **rubocop-legion guard cops adopted** — Repo-wide enable for `Legion/Framework/NoUnderscorePrefixedKwargs` (G13), `Legion/Framework/NoInlineSettingDefaults` (G13), and `Legion/Framework/NoDirectDispatch` (G16). `Legion/Framework/NoShapeDuckTyping` (R10) enabled scoped to `lib/legion/llm/api/**` and `lib/legion/llm/inference/**`. (.rubocop.yml, Gemfile)
266
+ - **CLAUDE.md "LLM Routing Invariants" section (G2)** — Public-safe invariant set: execution-proxy contract (LegionIO tools look server-side to clients, client-side to providers), always-translate (never passthrough), no provider-name conditionals outside translators, thinking never crosses providers, mid-stream failover required, every pipeline exit emits ledger events, the canary prompt.
267
+ - **RuleGenerator merges instance-level capabilities into chat rules** — Auto-generated chat rules now carry the provider's instance-level `:tools`/`:streaming`/`:vision` capabilities (e.g. lex-llm-vllm declares `capabilities: %i[completion streaming vision tools]` on its DEFAULT_INSTANCE_TIER) when the per-model offerings hash only surfaces `[:completion]`. Without this, the router logged `resolve.no_rules_matched required_capabilities=[:tools]` on every tool request and fell through to the default-provider chain. `Discovery.discovered_instances` threads `Call::Registry` instance metadata into the grouped instance hash; `RuleGenerator.merged_capabilities` unions per-model and instance-level caps. (lib/legion/llm/discovery.rb, lib/legion/llm/discovery/rule_generator.rb)
268
+ - **B3 OpenAI Responses reasoning summary opt-in** — `OpenAIResponses#ensure_reasoning_summary` defaults `reasoning.summary` to `'auto'` when the caller asked for reasoning (effort set) but didn't pin a summary mode. OpenAI's `/v1/responses` lane omits reasoning content otherwise, which left codex→openai cells returning only the message item with no reasoning. (lib/legion/llm/api/client_translators/openai_responses.rb, lib/legion/llm/api/namespaces/openai/responses.rb)
269
+ - **Matrix harness regression encoding** — `spec/legion/llm/api/matrix/tool_injection_matrix_spec.rb` asserts that registered LegionIO tools reach the upstream provider's `tools:` kwarg on all three client formats. The cell that previously surfaced this failure live (claude/vllm legionio_tool_injection answering "There is no tool") now fails offline with a deterministic FakeProvider when injection drops out.
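The reasoning summary opt-in from the B3 entry amounts to a small defaulting rule. The sketch below borrows the method name `ensure_reasoning_summary` and the `'auto'` default from the changelog, while the parameter shape (a Hash with a `:reasoning` key) is assumed.

```ruby
# Sketch only: the params shape is an assumption.
def ensure_reasoning_summary(params)
  reasoning = params[:reasoning]
  # Only act when the caller asked for reasoning by setting an effort.
  return params unless reasoning.is_a?(Hash) && reasoning[:effort]
  # Respect a summary mode the caller pinned explicitly.
  return params if reasoning.key?(:summary)

  params.merge(reasoning: reasoning.merge(summary: 'auto'))
end
```

The helper returns a new Hash instead of mutating its input, so the caller's original request stays untouched.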
270
+
271
+ ### Changed
272
+ - **`Legion::LLM::Router` signature cleanup** — Removed unused `**_opts` swallow-splats from `resolve`, `resolve_chain`, `select_candidates`, `chain_from_intent` (no callers passed extra kwargs).
273
+ - **`routing.last_resort_{model,provider}` settings** — Replace inline `'claude-sonnet-4-6'`/`:anthropic` defaults in the router's last-resort fallback chain.
274
+ - **`telemetry.unknown_model_tag` setting** — Replaces the inline `'unknown'` default in OpenInference span tagging.
275
+
276
+ ## [0.12.16] - 2026-06-11
277
+
278
+ ### Deprecated
279
+ - **`chat_direct`, `embed_direct`, `structured_direct` are deprecated shims** — These methods previously bypassed the Inference pipeline (no metering/audit). They are now rerouted through the governed pipeline using a `:system` caller profile that skips governance steps but preserves metering and audit emission. Use `Legion::LLM.chat`, `Legion::LLM.embed`, and `Legion::LLM.structured` instead. The deprecated names will be removed in the next major version. (lib/legion/llm/inference.rb, lib/legion/llm.rb, lib/legion/llm/deprecation.rb)
280
+
281
+ ### Changed
282
+ - **scheduling/batch.rb uses governed pipeline** — `Batch.submit_single` now calls `Legion::LLM.chat` with a `:system` caller identity instead of `chat_direct`, ensuring batched requests are metered (lib/legion/llm/scheduling/batch.rb)
283
+ - **inference/steps/debate.rb uses governed pipeline** — Debate role calls now use `Legion::LLM.chat` with a `:system` caller identity instead of `chat_direct`, ensuring debate invocations are metered (lib/legion/llm/inference/steps/debate.rb)
284
+
285
+ ### Added
286
+ - **Deprecation helper** — `Legion::LLM::Deprecation.warn_once` emits a single `log.warn` per process per method name, thread-safe via Mutex (lib/legion/llm/deprecation.rb)
287
+ - **Recursion guard in Executor** — `Thread.current[:legion_llm_in_pipeline]` prevents infinite loops when pipeline steps internally call `chat_direct` (lib/legion/llm/inference/executor.rb)
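A once-per-process, thread-safe warning in the spirit of `Deprecation.warn_once` might look like the sketch below. The module name is hypothetical, `Kernel#warn` stands in for the gem's logger, and returning a boolean is an addition made here so the behavior is easy to check.

```ruby
# Hypothetical stand-in for the deprecation helper described above.
module DeprecationSketch
  @warned = {}
  @mutex = Mutex.new

  class << self
    # Returns true only the first time a given name is seen in this process.
    def warn_once(name, message)
      first = @mutex.synchronize do
        next false if @warned[name]

        @warned[name] = true
      end
      warn("[DEPRECATION] #{name}: #{message}") if first
      first
    end
  end
end
```

Holding the Mutex only around the check-and-set keeps the critical section short; the warning itself is written outside the lock.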
288
+
289
+ ## [0.12.15] - 2026-06-10
290
+
291
+ ### Fixed
292
+ - **Async post-step race condition in test suite** — Disabled `pipeline_async_post_steps` in spec_helper's global `before(:each)` to prevent `ASYNC_THREAD_POOL` from racing with `Settings.reset!` between examples, which caused 4 intermittent `NoMethodError: undefined method '[]' for nil` failures in executor_stream_spec and pre_rollout_integration_spec (spec_helper.rb)
293
+ - **knowledge_capture_spec missing build_response** — Added minimal `build_response` to the test harness klass so `current_response` (from PostResponse) can construct a Response object instead of silently failing, which caused the ingest assertion to never fire (steps/knowledge_capture_spec.rb)
294
+ - **executor_async_spec stale stub target** — Fixed string-keyed async test that stubbed `Legion::LLM.settings` (unused by production code) instead of setting `Legion::Settings[:llm][:pipeline_async_post_steps]` directly (executor_async_spec.rb)
295
+
3
296
  ## [0.12.14] - 2026-06-10
4
297
 
5
298
  ### Added