RubyGems - legion-llm - Versions diffs - 0.12.14 → 0.13.0 - Mend

legion-llm 0.12.14 → 0.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (83)

checksums.yaml +4 -4
data/.gitignore +1 -0
data/.rubocop.yml +63 -0
data/AGENTS.md +48 -57
data/CHANGELOG.md +293 -0
data/CLAUDE.md +104 -762
data/Gemfile +12 -8
data/README.md +97 -4
data/legion-llm.gemspec +1 -1
data/lib/legion/llm/api/client_translators/anthropic_messages.rb +761 -0
data/lib/legion/llm/api/client_translators/openai_chat.rb +623 -0
data/lib/legion/llm/api/client_translators/openai_responses.rb +852 -0
data/lib/legion/llm/api/client_translators/shared_extractors.rb +150 -0
data/lib/legion/llm/api/debug_formats.rb +356 -0
data/lib/legion/llm/api/namespaces/anthropic/messages.rb +66 -408
data/lib/legion/llm/api/namespaces/openai/batches.rb +1 -1
data/lib/legion/llm/api/namespaces/openai/chat/completions.rb +71 -175
data/lib/legion/llm/api/namespaces/openai/responses.rb +90 -456
data/lib/legion/llm/api/native/models.rb +2 -2
data/lib/legion/llm/api/native/tiers.rb +2 -2
data/lib/legion/llm/api/openai/responses.rb +1 -1
data/lib/legion/llm/api/stream_assembler.rb +705 -0
data/lib/legion/llm/api.rb +8 -4
data/lib/legion/llm/cache/response.rb +2 -2
data/lib/legion/llm/cache.rb +9 -7
data/lib/legion/llm/call/dispatch.rb +347 -215
data/lib/legion/llm/call/embeddings.rb +3 -3
data/lib/legion/llm/call/lex_llm_adapter.rb +80 -23
data/lib/legion/llm/call/structured_output.rb +2 -2
data/lib/legion/llm/capabilities.rb +46 -0
data/lib/legion/llm/compat.rb +1 -2
data/lib/legion/llm/content_hash.rb +52 -0
data/lib/legion/llm/context/compressor.rb +1 -1
data/lib/legion/llm/context/curator.rb +1 -1
data/lib/legion/llm/deprecation.rb +34 -0
data/lib/legion/llm/discovery/rule_generator.rb +126 -15
data/lib/legion/llm/discovery/system.rb +1 -9
data/lib/legion/llm/discovery.rb +205 -23
data/lib/legion/llm/errors.rb +37 -0
data/lib/legion/llm/fleet/dispatcher.rb +1 -3
data/lib/legion/llm/fleet/lane.rb +16 -1
data/lib/legion/llm/fleet/token_issuer.rb +2 -1
data/lib/legion/llm/inference/audit_publisher.rb +25 -0
data/lib/legion/llm/inference/context_accounting.rb +111 -0
data/lib/legion/llm/inference/embed_pipeline.rb +187 -0
data/lib/legion/llm/inference/executor/context_window.rb +199 -0
data/lib/legion/llm/inference/executor/escalation.rb +798 -0
data/lib/legion/llm/inference/executor/routing.rb +471 -0
data/lib/legion/llm/inference/executor/tool_injection.rb +396 -0
data/lib/legion/llm/inference/executor.rb +306 -1635
data/lib/legion/llm/inference/native_tool_loop.rb +307 -53
data/lib/legion/llm/inference/request.rb +9 -4
data/lib/legion/llm/inference/route_attempts.rb +41 -4
data/lib/legion/llm/inference/steps/debate.rb +10 -3
data/lib/legion/llm/inference/steps/knowledge_capture.rb +1 -1
data/lib/legion/llm/inference/steps/metering.rb +16 -2
data/lib/legion/llm/inference/steps/post_response.rb +18 -46
data/lib/legion/llm/inference/steps/rag_context.rb +18 -0
data/lib/legion/llm/inference/steps/tier_assigner.rb +4 -4
data/lib/legion/llm/inference/steps/tool_calls.rb +63 -10
data/lib/legion/llm/inference/steps/trigger_match.rb +20 -1
data/lib/legion/llm/inference.rb +104 -15
data/lib/legion/llm/inventory.rb +107 -22
data/lib/legion/llm/metering/tracker.rb +1 -1
data/lib/legion/llm/metering.rb +1 -1
data/lib/legion/llm/quality/checker.rb +5 -1
data/lib/legion/llm/quality/confidence/scorer.rb +7 -1
data/lib/legion/llm/router/availability.rb +178 -0
data/lib/legion/llm/router/candidates.rb +263 -0
data/lib/legion/llm/router/health_tracker.rb +31 -2
data/lib/legion/llm/router/registry_lookup.rb +121 -0
data/lib/legion/llm/router/rule.rb +3 -2
data/lib/legion/llm/router.rb +295 -344
data/lib/legion/llm/scheduling/batch.rb +3 -3
data/lib/legion/llm/scheduling.rb +2 -2
data/lib/legion/llm/settings.rb +78 -25
data/lib/legion/llm/tools/dispatcher.rb +45 -2
data/lib/legion/llm/tools/special.rb +45 -3
data/lib/legion/llm/types/tool_definition.rb +3 -1
data/lib/legion/llm/vector_store/storage.rb +0 -2
data/lib/legion/llm/version.rb +1 -1
data/lib/legion/llm.rb +66 -6
metadata +21 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 7d61b50d6573478325baba59ea7b05a8e7a6bce2c66c453d15eec40b1380b891
-  data.tar.gz: e14038bcac7c816169e31bc2f8a08fb76331e0bc7b18766f029fe217c3b57d2d
+  metadata.gz: eaa596812e2320baa0d8ae31b44225c80e9ad9b54e62531b3cd1c7640ff09fe6
+  data.tar.gz: a98b90ae07c7040fcd8a13adccf1d07037f253cc845cdc4dc55b65f434805ce0
 SHA512:
-  metadata.gz: e1abe73f183b7b6e135db20bb2bc2b875b6bb40234630c5327a906d38892b6ef83f35c2c9f8a8b0ee193451d1609f1e30c1ebc4b0ebf28e844b3a9e8044ffefa
-  data.tar.gz: bf44ad26a524c018b042dda702b076c98ef7068aaf48e638305776532d809036778c86a0e4e385974c2d59bffbdf61da4a20067b89d09cbc85a2e5a6b34f5203
+  metadata.gz: a7611f997b2163792aa4f29d8ca2e3b8c10ec11af08ff8d89fae578ab0b0a138fbd9ad4957a5c42b793c30e959ab4c17290f01972cd0d8de8128e7e79873e28c
+  data.tar.gz: 3b0acc643ffe9c06c6d07df9c5dce9a54d79351878354e30c1cc22fb5c5001debf9ac529ddeb873868ff713f455665c6e5fe992558d70ef376151f9b368b7086
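
The new digests can be checked against a downloaded copy of the gem. The following is a minimal sketch, assuming the `.gem` has already been unpacked (it is a tar archive holding `metadata.gz`, `data.tar.gz`, and a gzipped `checksums.yaml`). The helper name `verify_sha256` is hypothetical and not part of RubyGems or legion-llm.

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper: check each blob against the SHA256 section of a
# checksums.yaml document. `blobs` maps file name => raw bytes.
def verify_sha256(checksums_yaml, blobs)
  expected = YAML.safe_load(checksums_yaml).fetch('SHA256')
  blobs.to_h do |name, bytes|
    [name, Digest::SHA256.hexdigest(bytes) == expected[name].to_s]
  end
end

# Usage sketch (file paths are assumptions):
#   verify_sha256(File.read('checksums.yaml'),
#                 'data.tar.gz' => File.binread('data.tar.gz'))
```

The result maps each file name to `true` or `false`, so a mismatch on one entry does not hide the status of the others.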

data/.gitignore CHANGED

@@ -24,3 +24,4 @@ docs/
 bin/apollo-setup-postreboot.sh
 bin/apollo-setup-prereboot.sh
 legionio-bootstrap-uhg-v3.json
+docs/

data/.rubocop.yml CHANGED

@@ -1,8 +1,71 @@
+plugins:
+  - rubocop-legion
+# These rubocop-legion cops surface in the local-path 0.1.7 build but were not
+# in the published 0.1.7 gem the repo previously tracked. They flag broad
+# pre-existing patterns unrelated to the N×N enforcement pass; deferred to
+# their own cleanup task. NoUnderscorePrefixedKwargs / NoInlineSettingDefaults
+# / NoDirectDispatch / NoShapeDuckTyping (the B4 set Phase 6 adopts) remain
+# enabled.
+Legion/RescueLogging/NoCapture:
+  Enabled: false
+Legion/ConstantSafety/InheritParam:
+  Enabled: false
 AllCops:
   TargetRubyVersion: 3.4
   NewCops: enable
   SuggestExtensions: false
+# N×N routing guard cops (Phase 6 enforcement; defaults from rubocop-legion config/default.yml).
+#
+# - NoUnderscorePrefixedKwargs / NoInlineSettingDefaults / NoDirectDispatch are
+#   enabled repo-wide.
+# - NoShapeDuckTyping is enabled on the canonical-only surface where the shape
+#   contract is fully established by translators. Code at the HTTP/wire ingress
+#   (client_translator parse_request, StreamAssembler chunk adapter, DebugFormats
+#   request envelope reader, Response.from_provider_message bridge) legitimately
+#   inspects shape because that's the layer responsible for normalising into
+#   canonical. The cop's scope expands here as the executor's canonical
+#   migration (Phase 4 follow-up) lands.
+Legion/Framework/NoShapeDuckTyping:
+  Enabled: true
+  Include:
+    - 'lib/legion/llm/api/**/*.rb'
+    - 'lib/legion/llm/inference/**/*.rb'
+  Exclude:
+    # Legacy tree deprecated this release (R11); deleted next minor.
+    - 'lib/legion/llm/api/translators/**/*.rb'
+    - 'lib/legion/llm/api/anthropic/**/*.rb'
+    - 'lib/legion/llm/api/openai/**/*.rb'
+    - 'lib/legion/llm/api/native/**/*.rb'
+    - 'lib/legion/llm/api/shared_helpers.rb'
+    # Boundary code that legitimately bridges wire ↔ canonical: parse_request
+    # at the HTTP ingress, the StreamAssembler chunk adapter (P5 explicitly
+    # accepts both Canonical::Chunk and the legacy StreamChunk shape during
+    # migration), DebugFormats (reads raw env / reflects request), and
+    # shared_extractors (normalises arbitrary thinking content shapes).
+    # Pre-canonical inference steps still navigate raw wire hashes; that
+    # scope tightens once the executor finishes the canonical migration
+    # (Phase 4 follow-up).
+    - 'lib/legion/llm/api/client_translators/anthropic_messages.rb'
+    - 'lib/legion/llm/api/client_translators/openai_chat.rb'
+    - 'lib/legion/llm/api/client_translators/openai_responses.rb'
+    - 'lib/legion/llm/api/client_translators/shared_extractors.rb'
+    - 'lib/legion/llm/api/stream_assembler.rb'
+    - 'lib/legion/llm/api/debug_formats.rb'
+    - 'lib/legion/llm/api/namespaces/**/*.rb'
+    - 'lib/legion/llm/inference/audit_publisher.rb'
+    - 'lib/legion/llm/inference/embed_pipeline.rb'
+    - 'lib/legion/llm/inference/enrichment_injector.rb'
+    - 'lib/legion/llm/inference/executor.rb'
+    - 'lib/legion/llm/inference/executor/**/*.rb'
+    - 'lib/legion/llm/inference/native_tool_loop.rb'
+    - 'lib/legion/llm/inference/profile.rb'
+    - 'lib/legion/llm/inference/response.rb'
+    - 'lib/legion/llm/inference/route_attempts.rb'
+    - 'lib/legion/llm/inference/steps/**/*.rb'
 Layout/LineLength:
   Max: 195
 Layout/SpaceAroundEqualsInParameterDefault:

data/AGENTS.md CHANGED

@@ -1,46 +1,61 @@
-# legion-llm Agent Notes
+# legion-llm — Agent Notes (v0.13.0)
-## Scope
-`legion-llm` provides provider configuration, chat/embed/structured interfaces, dynamic routing, escalation, quality checks, and pipeline execution for Legion.
+`legion-llm` is a **universal translation proxy** for LLM traffic: N client dialects (OpenAI Chat,
+OpenAI Responses, Anthropic Messages) × N provider backends (Bedrock, Anthropic, OpenAI, vLLM,
+Ollama, fleet), any direction. Every request parses once into `Canonical::Request`, is
+routed/executed, then renders once back to the caller's dialect. See `CLAUDE.md` for the full
+invariant set; `README.md` for detailed reference.
 ## Fast Start
 ```bash
 bundle install
-bundle exec rspec
-bundle exec rubocop
+bundle exec rspec      # 0 failures required before commit
+bundle exec rubocop    # 0 offenses required
 ```
+**The in-process matrix harness (`spec/legion/llm/api/matrix/`) is the commit gate.** Touch
+`lib/legion/llm/api/`, the executor, or the canonical/translator boundary → it must pass before push.
 ## Primary Entry Points
-- `lib/legion/llm.rb`
-- `lib/legion/llm/providers.rb`
-- `lib/legion/llm/router/`
-- `lib/legion/llm/pipeline/`
-- `lib/legion/llm/structured_output.rb`
-- `lib/legion/llm/embeddings.rb`
-- `lib/legion/llm/fleet/`
+- `lib/legion/llm.rb` — facade (`start`, `chat`, `ask`, `embed`, `structured`)
+- `lib/legion/llm/inventory.rb` — **single source of truth** for the model catalog
+- `lib/legion/llm/router.rb` + `router/{candidates,availability,health_tracker,escalation/}` — routing
+- `lib/legion/llm/inference/executor.rb` + `executor/{routing,escalation}.rb` — pipeline
+- `lib/legion/llm/inference/steps/` — the 18 pipeline steps
+- `lib/legion/llm/api/{openai,anthropic,native}/` — client routes
+- `lib/legion/llm/api/client_translators/` — canonical ↔ client wire formats
+- `lib/legion/llm/context/curator.rb` — async conversation curation (context-cost control)
+- Provider behaviour (defaults, capabilities, model filtering) lives in `../extensions-ai/lex-llm-*`
 ## Guardrails
-- Keep typed error behavior and retry semantics stable (`ProviderDown`, `RateLimitError`, `EscalationExhausted`, etc.).
-- Routing and escalation must remain deterministic given the same inputs/settings.
-- Preserve pipeline feature-flag behavior; avoid forcing pipeline-only code paths.
-- Keep provider credentials resolved through settings secret resolution flow; never hardcode secrets.
-- Maintain compatibility with direct methods (`chat_direct`, `embed_direct`, `structured_direct`) and daemon-aware flows.
-- Health tracker and rule scoring are contract-sensitive; changes require spec updates.
+- **Always translate, never passthrough**; **no `provider == :x` branches** outside translators.
+- **Inventory is the only catalog**; `Discovery`/`Registry`/`HealthTracker` are feeders.
+- Never dispatch a triple absent from the live catalog or unhealthy; **fail over, don't hard-fail**.
+- **Model policy = compliance**: `model_whitelist`/`model_blacklist` honored at dispatch, fail-closed;
+  a policy-denied model is terminal (never escalated, never trips circuits).
+- Thinking never crosses providers; mid-stream failover must not kill an in-flight conversation.
+- Every pipeline exit emits ledger events (metering/audit) — no bypasses.
+- `Legion::JSON` only (symbol keys); every `rescue` re-raises or `handle_exception`s; no
+  `defined?(Legion::Settings)` guards; `log.*` not `puts`.
+- **No personal/company identifiers in VCS**; never force-push.
+- Routing/escalation deterministic for the same inputs/settings; health-tracker & rule scoring are
+  contract-sensitive — changes require spec updates.
 ## Validation
-- Run targeted specs for modified router/pipeline/provider code.
-- Before handoff, run full `bundle exec rspec` and `bundle exec rubocop`.
+Run targeted specs for modified router/pipeline/translator code, then full `rspec` + `rubocop` +
+the matrix harness before handoff.
 ---
 ## Client Request Headers Reference
-Verified from source code (Claude Code binary + Codex `codex-rs` Rust source).
+Verified from source (Claude Code binary + Codex `codex-rs`). Useful when working on `/v1/messages`
+and `/v1/responses` handlers. Routing/identity headers `X-Legion-{Provider,Model,Instance,Tier}` are
+honored as **rules** (hard constraints), not hints.
 ### Claude Code → `POST /v1/messages`
@@ -48,15 +63,10 @@ Verified from source code (Claude Code binary + Codex `codex-rs` Rust source).
 |---|---|---|
 | `X-Claude-Code-Session-Id` | Stable UUID for the CLI session | Yes |
 | `x-app` | `"cli"` (foreground) or `"cli-bg"` (background) | Yes |
-| `x-claude-remote-session-id` | Remote container session ID | Conditional |
-| `x-claude-remote-container-id` | Remote container ID | Conditional |
-| `x-claude-code-agent-id` | Agent UUID for multi-agent sessions | Conditional |
-| `x-claude-code-parent-agent-id` | Parent agent UUID (spawned subagent) | Conditional |
-| `x-client-app` | Additional client app identifier | Conditional |
-Conversation threading is **stateless** — full `messages[]` history sent in the body on every request. No conversation ID, turn ID, or `x-client-request-id` header is sent.
+| `x-claude-code-agent-id` / `x-claude-code-parent-agent-id` | Agent / parent-agent UUIDs | Conditional |
-In Rack/Sinatra env keys, headers arrive as `HTTP_X_CLAUDE_CODE_SESSION_ID`, `HTTP_X_APP`, etc.
+Threading is **stateless** — full `messages[]` history in the body every request; no conversation/turn
+ID header. In Rack env: `HTTP_X_CLAUDE_CODE_SESSION_ID`, `HTTP_X_APP`, etc.
 ### Codex → `POST /v1/responses`
@@ -66,34 +76,15 @@ In Rack/Sinatra env keys, headers arrive as `HTTP_X_CLAUDE_CODE_SESSION_ID`, `HT
 | `thread-id` | Stable UUID for the thread/conversation | Yes |
 | `x-client-request-id` | Same value as `thread-id` | Yes |
 | `x-codex-installation-id` | Installation-scoped UUID | Yes |
-| `x-codex-window-id` | `"{thread_id}:{window_generation}"` | Yes |
-| `x-codex-turn-state` | Sticky-routing token returned by server, replayed by client | After first response |
-| `x-codex-turn-metadata` | Per-turn observability metadata | Conditional |
-| `x-codex-parent-thread-id` | Parent thread UUID (sub-agents) | Conditional |
-| `x-openai-subagent` | Sub-agent type (`"review"`, `"compact"`, `"memory_consolidation"`, etc.) | Conditional |
-| `x-openai-memgen-request` | `"true"` for memory generation requests | Conditional |
-In Rack/Sinatra env keys: `HTTP_SESSION_ID`, `HTTP_THREAD_ID`, `HTTP_X_CLIENT_REQUEST_ID`, `HTTP_X_CODEX_INSTALLATION_ID`, etc.
-**`HTTP_THREAD_ID` is the stable Codex thread/conversation ID** — it is stable for the lifetime of a thread, not per-request. `HTTP_X_CLIENT_REQUEST_ID` equals `HTTP_THREAD_ID` (Codex sets them to the same value).
-Conversation threading over HTTP uses full input in body (stateless like Anthropic). Over WebSocket, `previous_response_id` is sent in the request body to enable delta-only input.
+| `x-codex-turn-state` | Sticky-routing token, replayed by client | After first response |
+| `x-openai-subagent` | Sub-agent type (`review`, `compact`, …) | Conditional |
-### Practical Usage in `/v1/messages` and `/v1/responses` Handlers
+`HTTP_THREAD_ID` is the stable thread/conversation ID (not per-request); `HTTP_X_CLIENT_REQUEST_ID`
+equals it. HTTP threading is stateless (full input in body); over WebSocket, `previous_response_id`
+enables delta-only input.
 ```ruby
-# Stable request ID (Claude Code sends X-Claude-Code-Session-Id; Codex sends x-client-request-id = thread-id)
-request_id = env['HTTP_X_CLIENT_REQUEST_ID'] || "req_#{SecureRandom.hex(12)}"
-# Stable conversation/thread ID
-# Claude Code: no header — generate per-request or use Legion conversation tracking
-# Codex: HTTP_THREAD_ID is stable for the thread lifetime
-conversation_id = env['HTTP_THREAD_ID'] ||
-                  env['HTTP_X_LEGION_CONVERSATION_ID'] ||
-                  body[:conversation_id] ||
-                  "conv_#{SecureRandom.hex(8)}"
-# Identify the calling client
-claude_code_session = env['HTTP_X_CLAUDE_CODE_SESSION_ID']  # present only for Claude Code
-codex_installation  = env['HTTP_X_CODEX_INSTALLATION_ID']   # present only for Codex
+request_id      = env['HTTP_X_CLIENT_REQUEST_ID'] || "req_#{SecureRandom.hex(12)}"
+conversation_id = env['HTTP_THREAD_ID'] || env['HTTP_X_LEGION_CONVERSATION_ID'] ||
+                  body[:conversation_id] || "conv_#{SecureRandom.hex(8)}"
 ```
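
The `HTTP_*` keys used in the header reference follow the standard Rack/CGI mapping: upcase the header name, replace `-` with `_`, and prefix `HTTP_`. A small sketch of that mapping follows. The helper name `rack_env_key` is hypothetical, and `Content-Type` and `Content-Length` are the usual exceptions that carry no `HTTP_` prefix.

```ruby
# Hypothetical helper: map an HTTP header name to its Rack env key.
# Content-Type and Content-Length are exposed without the HTTP_ prefix.
UNPREFIXED_HEADERS = %w[CONTENT_TYPE CONTENT_LENGTH].freeze

def rack_env_key(header)
  key = header.upcase.tr('-', '_')
  UNPREFIXED_HEADERS.include?(key) ? key : "HTTP_#{key}"
end

rack_env_key('X-Claude-Code-Session-Id') # => "HTTP_X_CLAUDE_CODE_SESSION_ID"
rack_env_key('thread-id')                # => "HTTP_THREAD_ID"
```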

data/CHANGELOG.md CHANGED

@@ -1,5 +1,298 @@
 # Legion LLM Changelog
+## [0.13.0] - 2026-06-17
+Consolidated release. This single version bundles every change from `0.12.14` through `0.12.35`
+into one published release — the patch series was developed on a long-running branch and is shipped
+together as `0.13.0`. The per-patch entries below (`0.12.14`–`0.12.35`) remain the authoritative
+detail; this section summarizes the themes.
+### Highlights
+- **N × N routing with Inventory as the single source of truth** — `Inventory.offerings` is now the
+  one catalog (registration + liveness + health/circuit/denied); `Call::Registry`, `Discovery`, and
+  `HealthTracker` are feeders only. Routing, availability, and the executor read Inventory exclusively.
+  Cloud/frontier providers (Bedrock, Anthropic, OpenAI) are first-class routable and are no longer
+  shadowed by discovered local models.
+- **Canonical / execution-proxy translation boundary** — every request parses into `Canonical::Request`
+  and every response renders from canonical back to the caller's dialect; no passthrough, no
+  provider-name branching outside translators. Tool-loop linkage (OpenAI Responses
+  `function_call`/`function_call_output`, qwen single-tag synthesis), per-format tool-arg typing, and
+  prompt-cache `cache_control` preservation are aligned and asserted by the in-process matrix harness.
+- **Resilient multi-tier routing** — automatic escalation, mid-stream provider failover,
+  per-instance circuit breakers, multi-instance failover that exhausts a provider's own instances
+  before crossing providers, and account-scoped (credit/quota) errors that deprioritize the failing
+  instance instead of denying the model.
+- **Model-policy compliance** — `model_whitelist`/`model_blacklist` enforced at dispatch, fail-closed;
+  a policy-denied model is terminal (never escalated, never trips circuits). Requires `lex-llm >= 0.5.4`.
+- **Context curation, validated** — the Curator's deterministic strategies were validated against
+  ground-truth wire payloads (86.8% context reduction across 29 turns).
+- **G14 router decomposition** — `Router::Candidates` and `Router::RegistryLookup` extracted from the
+  router (1030 → 694 lines) with no behavior change.
+- **CI / observability** — RSpec and RuboCop dependency pins corrected (`lex-llm >= 0.5.4`,
+  `rubocop-legion >= 0.1.8`); discovery model-divergence warning made tolerant of versioned families.
+### Fixed
+- **A legacy `:capability` routing-intent key no longer bricks routing.** The `:capability`
+  dimension was renamed to `:operation` + `:effort`, but three paths (`Router#normalize_intent`,
+  `Inference::Request.default_auto_routing_intent`, and the executor's `routing_intent_for_request`)
+  *raised* `ArgumentError` whenever the key was present — so any install whose on-disk
+  `default_intent` still carried the pre-rename `{ capability: 'moderate' }` default hit an error on
+  **every** request. The key is now tolerated (ignored) wherever it appears; `:operation`/`:effort`
+  are what's read.
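
The "tolerate, don't raise" behaviour described in this entry can be illustrated with a short sketch. This is not the gem's `Router#normalize_intent`: the method body and the `LEGACY_INTENT_KEYS` constant are assumptions made for illustration.

```ruby
# Hypothetical sketch: drop the pre-rename :capability key instead of raising,
# leaving :operation / :effort (and any other keys) untouched.
LEGACY_INTENT_KEYS = %i[capability].freeze

def normalize_intent(intent)
  intent.to_h.transform_keys(&:to_sym).except(*LEGACY_INTENT_KEYS)
end

normalize_intent({ capability: 'moderate', operation: :chat })
# => { operation: :chat }
```

An on-disk `default_intent` that still carries `{ capability: 'moderate' }` would then normalize to an empty intent rather than failing every request.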
+### Docs
+- README rewritten with an N × N overview, the execution-proxy contract, and a validated
+  context-curation showcase. `CLAUDE.md` trimmed to the high-value invariants and gotchas; `AGENTS.md`
+  refreshed with current entry points and guardrails.
+## [0.12.35] - 2026-06-17
+### Fixed
+- **Explicit provider failover exhausts all of the provider's instances first** — an explicit `X-Legion-Provider` hint only prepended the provider's first registered instance to the escalation chain, so a failing instance (e.g. one account hitting a provider error) failed straight over to a *different* provider, skipping the provider's other configured instances. `prepend_hinted_provider` now prepends every registered instance of the hinted provider (registry order), so failover stays within the provider — across all its accounts/instances — before ever crossing to another provider. The fallback-chain builders (`build_fallback_resolutions`, `enabled_provider_chain`) also source a sibling instance's model from `Inventory` when it has no configured registry default (e.g. a whitelist-restricted instance whose policy-aware default resolved to `nil`), so such siblings are no longer dropped from the chain.
+- **Native offerings attributed to the registry instance, not a generic default** — `Inventory#native_provider_offerings` honored the adapter's self-reported instance (often a generic `default`, because the adapter is not told its registration name) over the authoritative registry instance Inventory was enumerating. That collapsed multiple configured instances of a cloud provider into a single `default` offering. The registry instance now wins, so each configured instance appears as its own offering.
+- **Account-scoped errors fail over to a sibling instance and deprioritize the failing one** — the escalation loop's generic provider-error handler called `skip_all_provider_model_instances!`, marking *every* instance of the failing provider+model as tried, and reported a provider-health failure that opened the instance's circuit. So a credit-balance error on one account (`anthropic` instance A) skipped the *other* account (instance B, same model) and crossed to a different provider — making the outcome depend on instance order. Account/instance-scoped errors (credit balance, payment, quota) now (a) skip only the failing instance so failover walks the provider's sibling instances first, and (b) **deprioritize** the failing instance by tripping *its* per-instance circuit (immediate, since the condition is deterministic) without `deny_model` and without penalizing the provider globally — so future requests prefer the healthy sibling and the circuit's cooldown→half_open re-probe auto-recovers the instance once topped up. Model-intrinsic errors still skip all instances of the model to preserve the attempt budget.
+- **Fleet lane renders the instance label, not rejects it** — an `instance_id` is a trusted operator label on the internal (datacenter-hosted) `Legion::Transport` RabbitMQ, not secret material, so `Fleet::Lane.offering_key` now *sanitizes* it (`Fleet::Lane.label_segment`) rather than rejecting any label containing a credential-ish word (e.g. an instance named `env_bearer`). The credential denylist still applies to genuinely untrusted values (`boundary`, eligibility facts). `Inventory#add_fleet_lane` also no longer lets a malformed (empty/over-length) label break offering construction — the offering builds without a fleet lane. Previously a credential-word instance name raised `ArgumentError` mid-build and made that instance unroutable.
+## [0.12.34] - 2026-06-17
+### Fixed
+- **Never dispatch a model the provider doesn't offer** — routing could pair an explicit provider with a foreign/stale model (observed: `anthropic` + `qwen3.6-27b`, which Anthropic never offered). Two sources closed: (1) `Router#explicit_resolution` now sources the model from `Inventory` (the SSOT, already whitelist/blacklist-filtered) before any stale registry/tier default, so an explicit provider resolves to a model it actually offers; (2) the executor's no-model fallback no longer drops the global `default_model` onto an unrelated provider — a resolved provider gets *its own* catalog model, and the global default applies only when no provider resolved (or it belongs to the resolved provider).
+- **Availability enforces the live catalog for every provider** — the Inventory model-existence gate previously exempted cloud/frontier providers; it now applies to all of them. A `(provider, model)` the catalog doesn't list is rejected (`:model_not_offered`), so a foreign or policy-excluded model can never reach dispatch. Empty/nil catalogs stay permissive (cold-boot safe).
+## [0.12.33] - 2026-06-17
+### Added
+- **Daemon-side model-policy enforcement (compliance)** — `Call::Dispatch.call` now refuses to dispatch a model excluded by a provider's `model_whitelist`/`model_blacklist`, failing closed with the new terminal `Legion::LLM::ModelNotAllowed` error before the provider call (the provider enforces the same policy as a backstop). Provider-raised `lex-llm` `ModelNotAllowedError` is mapped to the same type.
+### Fixed
+- **A policy-denied model is not an escalation** — both escalation paths (`Inference::Executor#run_escalation_resolution` and `Inference.chat_with_escalation`) now treat `ModelNotAllowed` as terminal: it is re-raised immediately rather than escalated to the next model, and it does not record a health failure, trip a circuit breaker, or deny-record the model. `ModelNotAllowed` is non-retryable.
+## [0.12.32] - 2026-06-16
+### Fixed
+- **Discovery model-divergence false positives** — The divergence warning now treats a configured default as present when a discovered id is a versioned family member of it (e.g. `anthropic.claude-sonnet-4` matches `anthropic.claude-sonnet-4-6`), instead of requiring an exact string or Ollama-style `:` tag. Multi-model cloud providers (Bedrock lists ~90 models) no longer warn on every boot. The warning also reports `discovered_count` and truncates the id list to a sample, so a divergence no longer dumps the full catalog into a single log line.
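
A rough sketch of the versioned-family comparison described above follows. The helper name `family_member?` and the exact delimiter rule (`-` for versioned ids, `:` for Ollama-style tags) are assumptions; the gem's actual matching logic may differ.

```ruby
# Hypothetical sketch: true when `discovered` is the configured default itself,
# an Ollama-style tag of it ("name:tag"), or a versioned member ("name-6").
def family_member?(configured, discovered)
  return true if discovered == configured

  discovered.start_with?("#{configured}:", "#{configured}-")
end

family_member?('anthropic.claude-sonnet-4', 'anthropic.claude-sonnet-4-6') # => true
family_member?('anthropic.claude-sonnet-4', 'anthropic.claude-sonnet-45')  # => false
```

Requiring the delimiter keeps a plain prefix such as `claude-sonnet-45` from counting as a member of the `claude-sonnet-4` family.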
+## [0.12.31] - 2026-06-16
+### Changed
+- **lex-llm dependency** — Require `lex-llm >= 0.5.3`, the first published release carrying the `Legion::Extensions::Llm::Canonical` types the native dispatch path depends on. Resolves CI `NameError: uninitialized constant Legion::Extensions::Llm::Canonical` when the published gem (rather than a local checkout) is resolved.
+### Build
+- **RuboCop tooling** — Track `rubocop-legion` main until `0.1.8` (which ships the four `Legion/Framework` N×N guard cops referenced by `.rubocop.yml`, including `NoShapeDuckTyping`) is published; the published `0.1.7` predates those cops.
+## [0.12.30] - 2026-06-16
+### Fixed
+- **Legion routing header precedence** — Client translators now ignore protocol body `model` values for Legion routing and route only from `X-Legion-Provider`, `X-Legion-Model`, `X-Legion-Instance`, and `X-Legion-Tier` preferences.
+- **LegionIO alias routing** — The internal `legionio` model alias no longer erases existing provider, instance, or tier routing preferences when normalizing inference requests.
+- **Routing preference scoring** — Router hint matches now carry a dominant preference bonus without filtering fallback candidates, so `X-Legion-*` headers bias routing strongly while preserving normal fallback behavior.
+## [0.12.29] - 2026-06-16
+### Fixed
+- **Canonical content-block rendering** — Claude Messages and OpenAI Chat responses now unwrap canonical content blocks before client formatting, preventing Ruby object inspect strings from crossing HTTP response boundaries.
+- **OpenAI Chat server tool visibility** — Mixed LegionIO-executed tool failures and client passthrough tool calls now render the server tool result in assistant content while leaving only client tools actionable.
+## [0.12.28] - 2026-06-16
+### Fixed
+- **Canonical tool-loop result propagation** — Native tool loops now attach LegionIO-executed tool results to immutable canonical `ToolCall` objects without Hash mutation, preserving server-resolved source/result state alongside client passthrough tool calls for `/v1/responses`.
+- **Tool error result preservation** — Dispatcher failure details now survive native tool result content rendering so failed LegionIO-executed tools surface useful server-side tool output instead of `{}`.
+## [0.12.27] - 2026-06-16
+### Fixed
+- **Context-window escalation** — Provider-wrapped maximum-context-length errors now classify as `ContextOverflow` even when a provider gem reports them as a generic server/provider error, allowing escalation to skip same-tier candidates and seek a larger-context model.
+## [0.12.26] - 2026-06-16
+### Fixed
+- **Legion tool failure diagnostics** — Tool dispatch failure logs now use configurable `llm.tool_error_log_chars` with a 500-character default and prefer structured runtime failure details (`exit_status`, error line, output tail) over generated command prefixes.
+- **Legion-executed tool wording** — Native tool-loop logs now report `legion_executed_tools` and `all_legion_executed_tools_failed`, avoiding ambiguous server/client terminology while preserving client wire protocols.
+## [0.12.25] - 2026-06-16
+### Fixed
+- **Codex Responses rendering** — `/v1/responses` now unwraps canonical content-block arrays into plain `output_text` strings for both non-streaming responses and streaming fallback finalization, preventing Ruby object inspect strings from leaking into Codex.
+## [0.12.24] - 2026-06-16
+### Fixed
+- **Codex Responses routing** — `/v1/responses` no longer performs a provider-capability shortcut before routing. Codex requests always run through the router first, then dispatch via upstream Responses only when the resolved provider supports it.
+- **Responses escalation dispatch** — Escalation attempts for Responses-origin requests now use upstream Responses for capable providers and fall back to the normal routed chat/stream path for providers without Responses support.
+- **Escalation visibility** — Non-primary escalation attempts now log at `WARN` and include previous failure context so actual failover is visible in live logs. Primary attempts remain `INFO`.
+## [0.12.23] - 2026-06-16
+### Fixed
+- **Streaming escalation failover** — Provider switch notifications now build `Legion::LLM::Router::Resolution` with the fully-qualified namespace, preventing `NameError` after the first streaming attempt fails.
+- **Auth failure health handling** — Authentication and provider configuration failures now deny the affected provider instance/model and immediately trip that instance circuit instead of waiting for normal error-threshold health decay.
+- **Final context preflight** — Direct and Responses dispatch now re-estimate the final provider payload after system enrichment, tool definitions, tool preferences, and thinking options are materialized, raising `ContextOverflow` before submitting an oversized request to the provider.
+- **Failed attempt metering** — Escalation attempt metering events now include `status`, `error`, and `provider_submitted` fields so submitted failed calls can be audited without looking like successful zero-token completions.
+## [0.12.22] - 2026-06-16
+### Added
+- **Context token accounting** — `llm_message_inference_metrics` is now the canonical source of truth for all pipeline context token metrics. Every inference request emits a normalized `context_accounting` payload with per-component token estimates covering: loaded history, curated history, curation savings, thinking strip savings, archive savings, context-window enforcement savings, RAG injection, system/baseline prompt, tool definitions, and final estimated context size.
+- **`Inference::ContextAccounting` module** — Deterministic char/4 estimator with structured event builder for pipeline instrumentation.
+- **Executor instrumentation** — `step_context_load` records loaded/curated/archived history tokens; `ContextWindow` records thinking-strip and context-window enforcement savings; `RagContext` records RAG injection tokens; `ToolInjection` records tool definition payload tokens; system/baseline enrichment tokens recorded at dispatch.
+- **Provider reconciliation** — Finalized accounting includes a reconciliation block comparing estimated input tokens against provider-reported input tokens with delta.
+- **Metering event enrichment** — `Steps::Metering.build_event` carries the `context_accounting` payload for downstream ledger persistence.
+- **Audit event enrichment** — `AuditPublisher.build_event` exposes `context_accounting` as a top-level key for ledger writer convenience.
+- **Component status tracking** — Each accounting-producing pipeline step sets its component status (`:observed`, `:not_observed`, `:profile_skipped`) so zero-valued columns are distinguishable from skipped steps.
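The char/4 estimator and the provider reconciliation block described in this release can be pictured with a minimal sketch. The module name, method names, rounding mode, and the sign of `delta` are all assumptions made for illustration; the changelog only states that the estimator is deterministic, uses char/4, and that the reconciliation compares estimated against provider-reported input tokens.

```ruby
# Hypothetical sketch; not the gem's Inference::ContextAccounting implementation.
module ContextAccountingSketch
  CHARS_PER_TOKEN = 4

  # Deterministic char/4 estimate. Rounding up is an assumption.
  def self.estimate_tokens(text)
    (text.to_s.length.to_f / CHARS_PER_TOKEN).ceil
  end

  # Sum per-component estimates and compare them with the provider-reported count.
  # Reporting delta as provider minus estimate is an assumption.
  def self.reconcile(components, provider_input_tokens)
    estimated = components.values.sum { |text| estimate_tokens(text) }
    {
      estimated_input_tokens: estimated,
      provider_input_tokens: provider_input_tokens,
      delta: provider_input_tokens - estimated
    }
  end
end
```

Because the estimate depends only on character counts, the same payload always yields the same number, which is what makes a per-request delta against the provider's figure meaningful.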
+## [0.12.21] - 2026-06-15
+### Added
+- **Capability source metadata** — Discovery, rule generation, and availability logs now carry per-capability source tags (`:model_override`, `:instance_override`, `:provider_override`, `:model_metadata`, `:provider_catalog`, `:probe`, `:provider_envelope`, `:default_false`).
+- **Conservative router hard gates** — Empty or unconfirmed capability data no longer passes `required_capabilities` checks. Absent means false.
+- **Source-aware cold boot** — During `:unknown` discovery status, capabilities must be explicitly confirmed by settings overrides or explicit metadata to satisfy hard gates.
+- **Typed routing errors** — `RoutingTooEarly` (425) is raised when discovery is not authoritative; `RoutingFailedDependency` (424) is raised when no candidate satisfies the hard gates. These replace the generic `EscalationExhausted` for routing-policy failures.
+- **Instance resolution enforcement** — `nil` instance on a resolution returns `:instance_unresolved` rejection.
+- **Diagnostic logging** — Missing-capability rejections include `sources=thinking:default_false` detail for operator visibility.
+- **Discovery schema v3** — `DISCOVERED_MODELS_SCHEMA_VERSION` bumped to invalidate cached entries lacking `capability_sources`.
+- **Rule generator source awareness** — Generated rules only include capabilities confirmed by source-tagged offering truth; stale registry metadata no longer blindly merged.
+- **Operator contract documentation** — `docs/work/planning/2026-06-15-capability-source-operator-contract.md`.
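The "absent means false" gate and the `sources=thinking:default_false` diagnostic above could look roughly like the following. This is a sketch under stated assumptions: the module and method names are hypothetical, every source tag other than `:default_false` is treated as confirming, and a capability with no recorded source is reported as `:default_false`. The cold-boot rule, which is stricter, is not modeled here.

```ruby
# Hypothetical sketch; names and the confirming-source rule are assumptions.
module HardGateSketch
  NON_CONFIRMING = %i[default_false].freeze

  # capability_sources maps capability => source tag, e.g. { tools: :provider_catalog }.
  # A capability with no source, or with a non-confirming source, counts as missing.
  def self.missing(required, capability_sources)
    required.reject do |cap|
      source = capability_sources[cap]
      source && !NON_CONFIRMING.include?(source)
    end
  end

  # Build an operator-facing detail string, or nil when nothing is missing.
  def self.rejection_detail(required, capability_sources)
    detail = missing(required, capability_sources)
             .map { |cap| "#{cap}:#{capability_sources.fetch(cap, :default_false)}" }
             .join(',')
    detail.empty? ? nil : "sources=#{detail}"
  end
end
```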
+### Fixed
+- Discovery no longer merges stale registry metadata capabilities over live offering data when offerings carry `capability_sources`.
+- Tool trigger matching strips `<system-reminder>...</system-reminder>` blocks from its scan text without mutating request history, preventing startup/handoff prompts from triggering broad tool injection.
+## [0.12.20] - 2026-06-15
+### Breaking
+- Routing intent key `:capability` removed; use `:operation` and `:effort`. Supplying `capability:` raises `ArgumentError`.
+- Settings `default_intent` must use `effort:`/`operation:` instead of `capability:`.
+### Added
+- Routing intent separates `effort` (soft preference), `operation` (hard filter), and `required_capabilities` (hard filter).
+- Effort levels: `:low`, `:moderate`, `:high`, `:reasoning`. `:medium` normalizes to `:moderate`.
+- Thinking is a hard capability only when explicitly requested via thinking config.
+- Router chains reject stale registry defaults not present in live discovered offerings.
+- Discovery status policy: `:unknown` is permissive, `:ok` is authoritative, and `:empty`, `:unreachable`, and `:error` all reject.
+- `Discovery::DISCOVERED_MODELS_SCHEMA_VERSION` and `Cache::RESPONSE_CACHE_SCHEMA_VERSION` invalidate stale payloads.
+- Multi-instance provider routing: same provider with different instances carries distinct capabilities and availability.
+- `Inventory.invalidate_offerings_cache!` public method for discovery refresh actors.
+- Per-offering health bridging: discovery reports `:success`, `:error`, `:latency` to `Router.health_tracker`.
+- Discovered model entries include `health` and `loaded` fields from live offerings.
+- `loaded_model_bonus` scoring (+5) for models confirmed running by provider.
+- `resolve.no_rules_matched` warning includes rejection trace breakdown.
+- `missing_capability` availability log includes required and available capabilities.
+- Determinism and regression spec coverage (`spec/legion/llm/router/determinism_spec.rb`, `multi_instance_spec.rb`).
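For readers migrating intents, the new shape can be illustrated with a small normalizer. This is only a sketch: `IntentSketch` and its `normalize` method are hypothetical, `:chat` is an illustrative operation value, and raising on an unrecognized effort level is an assumption. The parts taken from the entries above are the removal of `capability:` (raising `ArgumentError`), the four effort levels, and `:medium` normalizing to `:moderate`.

```ruby
# Hypothetical sketch; not the router's actual intent handling.
module IntentSketch
  EFFORT_LEVELS = %i[low moderate high reasoning].freeze
  EFFORT_ALIASES = { medium: :moderate }.freeze

  def self.normalize(intent)
    intent = intent.transform_keys(&:to_sym)
    raise ArgumentError, 'capability: was removed; use operation: and effort:' if intent.key?(:capability)

    effort = intent[:effort]&.to_sym
    effort = EFFORT_ALIASES.fetch(effort, effort)
    # Rejecting unknown effort levels is an assumption of this sketch.
    raise ArgumentError, "unknown effort: #{effort.inspect}" if effort && !EFFORT_LEVELS.include?(effort)

    {
      effort: effort,                                   # soft preference
      operation: intent[:operation]&.to_sym,            # hard filter
      required_capabilities: Array(intent[:required_capabilities]).map(&:to_sym) # hard filter
    }
  end
end
```

Under these assumptions, `IntentSketch.normalize(effort: :medium, operation: :chat)` yields an intent with `effort: :moderate`, while passing `capability:` fails fast instead of being silently ignored.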
+### Fixed
+- vLLM live catalog IDs honored before dispatch; stale `qwen3.6-27b` rejected when only `legion-code-27b-v1` is offered.
+- `enabled_provider_chain` includes all registered instances, not just first per provider family.
+- `chain_from_defaults` primary resolution carries registered instance.
+- `chain_from_intent` dedup includes instance (same-model/different-instance preserved).
+- `build_fallback_resolutions` preserves instance directly.
+- Last-resort fallback resolutions filtered through live availability.
+- `enterprise_privacy_spec` order-dependency fixed.
+### Removed
+- `Discovery::System.memory_pressure?` (confirmed dead, no production callers).
+## [0.12.19] - 2026-06-12
+### Fixed
+- **Request-payload errors no longer deny models or trip circuit breakers** — `ValidationException` for malformed tool schemas (e.g., `tools.16.custom.input_schema.type: Field required`) is now classified as a request-payload error, not a provider config error. Models are no longer permanently denied for client-side schema bugs. (lib/legion/llm/inference/executor/escalation.rb)
+- **HealthTracker honors signal value and logs honestly** — Error handler now uses `payload[:value]` (default 1.0) instead of always incrementing by 1. Already-open circuits no longer re-log fake `closed→open` transitions. (lib/legion/llm/router/health_tracker.rb)
+- **Health keying consistency** — All `health_tracker.report` calls now include `instance:` from the resolution, ensuring discovery/escalation/post-request signals accumulate on the same provider/instance key. (lib/legion/llm/inference.rb, lib/legion/llm/inference/executor/escalation.rb)
+- **Discovery unreachable trips circuit immediately** — Connection failures during discovery now call `trip_circuit` instead of a `value: 1` report. A boot-time unreachable vLLM is marked `:open` without requiring 3 separate failures. (lib/legion/llm/discovery.rb)
+- **Tool schema normalization at canonical boundary** — `Canonical::ToolDefinition.normalize_parameters` guarantees every tool schema has a valid top-level `type`. Prevents Bedrock `tools.16.custom.input_schema.type: Field required` rejections. (lex-llm, legion-llm types/tool_definition.rb)
+- **Anthropic translator double-wrap eliminated** — `render_tools` no longer wraps the full JSON schema inside `{type: 'object', properties: schema}`. Passes canonical schemas through directly. (lex-llm-anthropic translator.rb, provider.rb)
+- **Provider tool renderers accept canonical ToolDefinition objects** — All provider gems (Gemini, Ollama, Vertex, vLLM, OpenAI-compatible) now use `ToolSchema.extract` instead of calling `tool.params_schema` directly. (lex-llm, lex-llm-gemini, lex-llm-ollama, lex-llm-vertex)
+- **Discovery unreachable propagates to legion-llm** — `discover_offerings(raise_on_unreachable: true)` raises transport failures instead of swallowing into `[]`. (lex-llm provider.rb, legion-llm lex_llm_adapter.rb)
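The HealthTracker fixes in this list (weighted signals, consistent provider/instance keying, immediate tripping, and no repeated open transitions) can be sketched together. Everything here is illustrative: the class name, method signatures, and the threshold of 3.0 (inferred from the "3 separate failures" wording) are assumptions, not the gem's API.

```ruby
# Hypothetical sketch; not the gem's Router health tracker.
class HealthTrackerSketch
  ERROR_THRESHOLD = 3.0 # assumed from the "3 separate failures" wording

  attr_reader :transitions

  def initialize
    @errors = Hash.new(0.0)
    @states = Hash.new(:closed)
    @transitions = []
  end

  def state(provider, instance)
    @states[[provider, instance]]
  end

  # Accumulate the signal's weight (default 1.0) on the provider/instance key.
  def report_error(provider, instance, payload = {})
    key = [provider, instance]
    @errors[key] += payload.fetch(:value, 1.0)
    trip_circuit(provider, instance) if @errors[key] >= ERROR_THRESHOLD
  end

  # Open the circuit immediately; an already-open circuit records no new transition.
  def trip_circuit(provider, instance)
    key = [provider, instance]
    return if @states[key] == :open

    @transitions << [key, @states[key], :open]
    @states[key] = :open
  end
end
```

Keying every signal on the same `[provider, instance]` pair is what lets discovery, escalation, and post-request reports accumulate on one counter rather than on several partial ones.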
+### Added
+- **Router::Availability oracle** — Single availability filter for routing and escalation. Checks circuit state, denied models, discovery status, context length, and required capabilities before building escalation chains. (lib/legion/llm/router/availability.rb)
+- **Legion::LLM::Capabilities module** — Normalized capability alias handling (`:function_calling` → `:tools`, `:stream` → `:streaming`). Shared across Router, Discovery, RuleGenerator. (lib/legion/llm/capabilities.rb)
+- **Escalation loop circuit guard** — Open circuits are skipped in the escalation loop (`:half_open` allowed as recovery probe). Empty chains raise `EscalationExhausted` immediately without opening sockets. (lib/legion/llm/inference/executor/escalation.rb)
+- **G6 streaming failover hooks** — `StreamAssembler` gains `provider_failed`/`provider_switched`/`safe_replay_snapshot` observer hooks. All client emitters gain `on_tool_call_abort`. Executor accepts `stream_observer:` kwarg. (lib/legion/llm/api/stream_assembler.rb, client_translators/*)
+- **Canonical::ToolSchema extractor** — Shared tool schema extraction regardless of input shape (ToolDefinition, Hash, legacy tool). (lex-llm canonical/tool_schema.rb)
+- **Provider contract strengthened** — `discover_offerings` requires `raise_on_unreachable:` parameter. Providers must accept canonical `ToolDefinition` objects. (lex-llm provider_contract.rb)
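The capability alias handling mentioned for `Legion::LLM::Capabilities` amounts to mapping aliases onto canonical names before any comparison. A minimal sketch follows, with a hypothetical module name and method names; only the two alias pairs come from the entry above.

```ruby
# Hypothetical sketch; not the gem's Legion::LLM::Capabilities module.
module CapabilitiesSketch
  ALIASES = { function_calling: :tools, stream: :streaming }.freeze

  # Map aliases to canonical names and drop duplicates.
  def self.normalize(capabilities)
    Array(capabilities).map { |cap| ALIASES.fetch(cap.to_sym, cap.to_sym) }.uniq
  end

  # True when every required capability is present after normalization.
  def self.satisfies?(available, required)
    (normalize(required) - normalize(available)).empty?
  end
end
```

Normalizing both sides means a provider that advertises `:function_calling` still satisfies a request that requires `:tools`.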
+### Removed
+- **Legacy non-stateful fallback path** — `try_fallback_or_raise`, `find_fallback_provider`, `fallback_local_providers?` deleted. One provider-switching mechanism: stateful escalation through `Router::EscalationChain`. (lib/legion/llm/inference/executor/routing.rb)
+### Changed
+- **Settings defaults** — `escalation.enabled: true`, `gaia.advisory_enabled: true`, `context_curation.thinking_eviction: true`, `context_curation.exchange_folding: true`, `streaming.emit_thinking_blocks: true`, `discovery.trip_circuit_on_unreachable: true`, `escalation.skip_open_circuits: true`.
+- **Provider capabilities include `:tools`** — All tool-capable providers now emit canonical `:tools` in capability metadata alongside aliases. (lex-llm-openai, lex-llm-gemini, lex-llm-vertex, lex-llm-bedrock, lex-llm-azure-foundry, lex-llm-ollama)
+## [0.12.18] - 2026-06-12
+### Fixed
+- **Anthropic `tool_use.input` regression — must be Object, not JSON string** — The P6 SharedExtractors dedup folded `serialize_args` into a single uniform helper that always returned a JSON string. The two client wire formats are incompatible: Anthropic `/v1/messages` REQUIRES `tool_use.input` (and `server_tool_use.input`) to be an Object; OpenAI `/v1/responses` and `/v1/chat/completions` REQUIRE `function_call.arguments` to be a JSON String. Replaced the uniform helper with two explicit per-format helpers — `args_as_object` (Anthropic) and `args_as_json_string` (OpenAI). Both helpers also defensively coerce degraded provider output (e.g. `1.01` numeric or unparsed JSON string from a qwen3.6-27b run that fell back to plain content) to the format-correct shape — `{}` for Anthropic, `"{}"` for OpenAI — rather than letting an off-spec value reach the wire. Live evidence: `legionio-e2e/results/claude/vllm_multi_turn_*` showed `"input": 1.01` against an Anthropic spec demanding an Object. Sibling check: G24 `server_tool_use.input` (Anthropic) and server `function_call.arguments` (Responses + chat completions) all use the per-format helper. (lib/legion/llm/api/client_translators/{shared_extractors,anthropic_messages,openai_chat,openai_responses}.rb)
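The split into two per-format helpers can be sketched as below. The helper names `args_as_object` and `args_as_json_string` come from the entry above; the enclosing module, the bodies, and the choice to parse a string that contains a valid JSON object are assumptions made for illustration and may differ from the shipped code.

```ruby
require 'json'

# Hypothetical sketch of the per-format coercion; not the gem's SharedExtractors.
module ToolArgsSketch
  # Anthropic shape: always a Hash. Degraded values collapse to {}.
  def self.args_as_object(args)
    case args
    when Hash then args
    when String
      parsed = begin
        JSON.parse(args)
      rescue JSON::ParserError
        nil
      end
      parsed.is_a?(Hash) ? parsed : {}
    else
      {}
    end
  end

  # OpenAI shape: always a JSON String. Degraded values collapse to "{}".
  def self.args_as_json_string(args)
    JSON.generate(args_as_object(args))
  end
end
```

Deriving the string form from the object form keeps the two helpers consistent: a value such as `1.01` becomes `{}` on the Anthropic side and `"{}"` on the OpenAI side, never a bare number on either wire.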
+### Added
+- **Matrix harness oracle: `tool_args_typing_matrix_spec.rb`** — Asserts `tool_use.input is Hash` for Anthropic and `function_call.arguments is String` for OpenAI Responses + chat completions, on three input shapes (normal Hash args, degraded numeric, G24 server-tool block). Verified to fail with the exact regression signature ("got Float: 1.01") when the per-format coercion is reverted. Closes the assertion gap that let 3056 specs pass while the regression shipped to live e2e. (spec/legion/llm/api/matrix/tool_args_typing_matrix_spec.rb, spec/support/fake_provider.rb new `:tool_degraded_args` scenario)
+## [0.12.17] - 2026-06-12
+### Deprecated
+- **Legacy flat API tree under `lib/legion/llm/api/{anthropic,openai,native}/`** — The flat-file route tree is deprecated. `llm.api.use_namespaces` defaults to `true`; setting it to `false` continues to register the legacy chain but now logs a deprecation warning at registration time. The legacy tree (`api/anthropic/messages.rb`, `api/openai/{chat_completions,embeddings,models,responses}.rb`, `api/native/*.rb` flat files) and the `register_legacy` dispatcher will be **deleted in the next minor release**. All routing is consolidated under `api/namespaces/` and the new `api/client_translators/` + `api/stream_assembler.rb` (P5). (lib/legion/llm/api.rb)
+- **`Legion::LLM::Inference.ask_direct` is a deprecated shim** — Previously routed through `chat_direct_raw`, an ungoverned path that bypassed metering/audit. Now routes through the governed pipeline via `chat_direct` and emits a `Deprecation.warn_once` warning. Same compliance gap closure as the `chat_direct`/`embed_direct`/`structured_direct` deprecation in 0.12.16. Use `Legion::LLM.ask` instead. (lib/legion/llm/inference.rb)
+### Removed
+- **Absorbed translator shims** — `lib/legion/llm/api/translators/{anthropic,openai}_{request,response}.rb`, `Legion::LLM::Call::NativeResponseAdapter`, and the per-route thinking/token/tool-call extractor duplicates absorbed by `api/client_translators/` and `api/stream_assembler.rb` are removed.
+### Added
+- **rubocop-legion guard cops adopted** — Repo-wide enable for `Legion/Framework/NoUnderscorePrefixedKwargs` (G13), `Legion/Framework/NoInlineSettingDefaults` (G13), and `Legion/Framework/NoDirectDispatch` (G16). `Legion/Framework/NoShapeDuckTyping` (R10) enabled scoped to `lib/legion/llm/api/**` and `lib/legion/llm/inference/**`. (.rubocop.yml, Gemfile)
+- **CLAUDE.md "LLM Routing Invariants" section (G2)** — Public-safe invariant set: execution-proxy contract (LegionIO tools look server-side to clients, client-side to providers), always-translate (never passthrough), no provider-name conditionals outside translators, thinking never crosses providers, mid-stream failover required, every pipeline exit emits ledger events, the canary prompt.
+- **RuleGenerator merges instance-level capabilities into chat rules** — Auto-generated chat rules now carry the provider's instance-level `:tools`/`:streaming`/`:vision` capabilities (e.g. lex-llm-vllm declares `capabilities: %i[completion streaming vision tools]` on its DEFAULT_INSTANCE_TIER) when the per-model offerings hash only surfaces `[:completion]`. Without this, the router logged `resolve.no_rules_matched required_capabilities=[:tools]` on every tool request and fell through to the default-provider chain. `Discovery.discovered_instances` threads `Call::Registry` instance metadata into the grouped instance hash; `RuleGenerator.merged_capabilities` unions per-model and instance-level caps. (lib/legion/llm/discovery.rb, lib/legion/llm/discovery/rule_generator.rb)
+- **B3 OpenAI Responses reasoning summary opt-in** — `OpenAIResponses#ensure_reasoning_summary` defaults `reasoning.summary` to `'auto'` when the caller asked for reasoning (effort set) but didn't pin a summary mode. OpenAI's `/v1/responses` lane omits reasoning content otherwise, which left codex→openai cells returning only the message item with no reasoning. (lib/legion/llm/api/client_translators/openai_responses.rb, lib/legion/llm/api/namespaces/openai/responses.rb)
+- **Matrix harness regression encoding** — `spec/legion/llm/api/matrix/tool_injection_matrix_spec.rb` asserts that registered LegionIO tools reach the upstream provider's `tools:` kwarg on all three client formats. The cell that previously surfaced this failure live (claude/vllm legionio_tool_injection answering "There is no tool") now fails offline with a deterministic FakeProvider when injection drops out.
+### Changed
+- **`Legion::LLM::Router` signature cleanup** — Removed unused `**_opts` swallow-splats from `resolve`, `resolve_chain`, `select_candidates`, `chain_from_intent` (no callers passed extra kwargs).
+- **`routing.last_resort_{model,provider}` settings** — Replace inline `'claude-sonnet-4-6'`/`:anthropic` defaults in the router's last-resort fallback chain.
+- **`telemetry.unknown_model_tag` setting** — Replaces the inline `'unknown'` default in OpenInference span tagging.
+## [0.12.16] - 2026-06-11
+### Deprecated
+- **`chat_direct`, `embed_direct`, `structured_direct` are deprecated shims** — These methods previously bypassed the Inference pipeline (no metering/audit). They are now rerouted through the governed pipeline using a `:system` caller profile that skips governance steps but preserves metering and audit emission. Use `Legion::LLM.chat`, `Legion::LLM.embed`, and `Legion::LLM.structured` instead. The deprecated names will be removed in the next major version. (lib/legion/llm/inference.rb, lib/legion/llm.rb, lib/legion/llm/deprecation.rb)
+### Changed
+- **scheduling/batch.rb uses governed pipeline** — `Batch.submit_single` now calls `Legion::LLM.chat` with a `:system` caller identity instead of `chat_direct`, ensuring batched requests are metered (lib/legion/llm/scheduling/batch.rb)
+- **inference/steps/debate.rb uses governed pipeline** — Debate role calls now use `Legion::LLM.chat` with a `:system` caller identity instead of `chat_direct`, ensuring debate invocations are metered (lib/legion/llm/inference/steps/debate.rb)
+### Added
+- **Deprecation helper** — `Legion::LLM::Deprecation.warn_once` emits a single `log.warn` per process per method name, thread-safe via Mutex (lib/legion/llm/deprecation.rb)
+- **Recursion guard in Executor** — `Thread.current[:legion_llm_in_pipeline]` prevents infinite loops when pipeline steps internally call `chat_direct` (lib/legion/llm/inference/executor.rb)
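A once-per-process, thread-safe deprecation warning of the kind described for `Deprecation.warn_once` can be built from a `Set` guarded by a `Mutex`. The sketch below is an assumption-laden stand-in: the module name, the method signature, the message text, and writing to an IO object instead of `log.warn` are all hypothetical.

```ruby
require 'set'

# Hypothetical sketch; not the gem's Legion::LLM::Deprecation module.
module DeprecationSketch
  @warned = Set.new
  @mutex = Mutex.new

  # Emits the warning and returns true only the first time a method name is reported.
  def self.warn_once(method_name, replacement, io: $stderr)
    # Set#add? returns nil when the element is already present.
    first = @mutex.synchronize { @warned.add?(method_name.to_s) }
    return false unless first

    io.puts("[DEPRECATION] #{method_name} is deprecated; use #{replacement} instead")
    true
  end
end
```

Holding the lock only around the set update keeps the critical section short, and the I/O happens outside it, so a slow logger cannot block other threads that are checking different method names.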
+## [0.12.15] - 2026-06-10
+### Fixed
+- **Async post-step race condition in test suite** — Disabled `pipeline_async_post_steps` in spec_helper's global `before(:each)` to prevent `ASYNC_THREAD_POOL` from racing with `Settings.reset!` between examples, which caused 4 intermittent `NoMethodError: undefined method '[]' for nil` failures in executor_stream_spec and pre_rollout_integration_spec (spec_helper.rb)
+- **knowledge_capture_spec missing build_response** — Added minimal `build_response` to the test harness klass so `current_response` (from PostResponse) can construct a Response object instead of silently failing, which caused the ingest assertion to never fire (steps/knowledge_capture_spec.rb)
+- **executor_async_spec stale stub target** — Fixed string-keyed async test that stubbed `Legion::LLM.settings` (unused by production code) instead of setting `Legion::Settings[:llm][:pipeline_async_post_steps]` directly (executor_async_spec.rb)
 ## [0.12.14] - 2026-06-10
 ### Added