legion-llm 0.12.3 → 0.13.0

Files changed (116)
  1. checksums.yaml +4 -4
  2. data/.gitignore +1 -0
  3. data/.rubocop.yml +71 -20
  4. data/AGENTS.md +74 -21
  5. data/CHANGELOG.md +434 -0
  6. data/CLAUDE.md +104 -762
  7. data/Gemfile +12 -8
  8. data/README.md +97 -4
  9. data/legion-llm.gemspec +1 -1
  10. data/lib/legion/llm/api/anthropic/messages.rb +15 -3
  11. data/lib/legion/llm/api/client_translators/anthropic_messages.rb +761 -0
  12. data/lib/legion/llm/api/client_translators/openai_chat.rb +623 -0
  13. data/lib/legion/llm/api/client_translators/openai_responses.rb +852 -0
  14. data/lib/legion/llm/api/client_translators/shared_extractors.rb +150 -0
  15. data/lib/legion/llm/api/debug_formats.rb +356 -0
  16. data/lib/legion/llm/api/namespaces/anthropic/files.rb +0 -3
  17. data/lib/legion/llm/api/namespaces/anthropic/messages.rb +80 -168
  18. data/lib/legion/llm/api/namespaces/native/chat.rb +12 -3
  19. data/lib/legion/llm/api/namespaces/native/inference.rb +19 -12
  20. data/lib/legion/llm/api/namespaces/native/tiers.rb +1 -1
  21. data/lib/legion/llm/api/namespaces/openai/audio/speech.rb +0 -2
  22. data/lib/legion/llm/api/namespaces/openai/audio/transcriptions.rb +0 -2
  23. data/lib/legion/llm/api/namespaces/openai/audio/translations.rb +0 -2
  24. data/lib/legion/llm/api/namespaces/openai/batches.rb +3 -3
  25. data/lib/legion/llm/api/namespaces/openai/chat/completions.rb +82 -175
  26. data/lib/legion/llm/api/namespaces/openai/completions.rb +11 -5
  27. data/lib/legion/llm/api/namespaces/openai/conversations/items.rb +51 -2
  28. data/lib/legion/llm/api/namespaces/openai/conversations.rb +1 -1
  29. data/lib/legion/llm/api/namespaces/openai/embeddings.rb +1 -1
  30. data/lib/legion/llm/api/namespaces/openai/files.rb +2 -2
  31. data/lib/legion/llm/api/namespaces/openai/images.rb +0 -8
  32. data/lib/legion/llm/api/namespaces/openai/moderations.rb +0 -3
  33. data/lib/legion/llm/api/namespaces/openai/responses.rb +103 -233
  34. data/lib/legion/llm/api/namespaces/openai/uploads/parts.rb +1 -1
  35. data/lib/legion/llm/api/namespaces/openai/uploads.rb +2 -2
  36. data/lib/legion/llm/api/namespaces/openai/vector_stores/file_batches.rb +0 -3
  37. data/lib/legion/llm/api/namespaces/openai/vector_stores/files.rb +0 -3
  38. data/lib/legion/llm/api/namespaces/openai/vector_stores.rb +0 -3
  39. data/lib/legion/llm/api/native/chat.rb +2 -2
  40. data/lib/legion/llm/api/native/helpers.rb +1 -1
  41. data/lib/legion/llm/api/native/inference.rb +0 -2
  42. data/lib/legion/llm/api/native/models.rb +2 -2
  43. data/lib/legion/llm/api/native/tiers.rb +3 -3
  44. data/lib/legion/llm/api/openai/chat_completions.rb +20 -5
  45. data/lib/legion/llm/api/openai/responses.rb +15 -6
  46. data/lib/legion/llm/api/shared_helpers.rb +141 -4
  47. data/lib/legion/llm/api/stream_assembler.rb +705 -0
  48. data/lib/legion/llm/api/translators/anthropic_response.rb +208 -33
  49. data/lib/legion/llm/api/translators/openai_response.rb +20 -1
  50. data/lib/legion/llm/api.rb +8 -4
  51. data/lib/legion/llm/cache/response.rb +2 -2
  52. data/lib/legion/llm/cache.rb +9 -7
  53. data/lib/legion/llm/call/dispatch.rb +349 -200
  54. data/lib/legion/llm/call/embeddings.rb +3 -3
  55. data/lib/legion/llm/call/lex_llm_adapter.rb +253 -39
  56. data/lib/legion/llm/call/structured_output.rb +3 -3
  57. data/lib/legion/llm/capabilities.rb +46 -0
  58. data/lib/legion/llm/compat.rb +2 -3
  59. data/lib/legion/llm/content_hash.rb +52 -0
  60. data/lib/legion/llm/context/compressor.rb +1 -1
  61. data/lib/legion/llm/context/curator.rb +23 -6
  62. data/lib/legion/llm/deprecation.rb +34 -0
  63. data/lib/legion/llm/discovery/rule_generator.rb +126 -15
  64. data/lib/legion/llm/discovery/system.rb +1 -9
  65. data/lib/legion/llm/discovery.rb +205 -23
  66. data/lib/legion/llm/errors.rb +37 -0
  67. data/lib/legion/llm/fleet/dispatcher.rb +1 -3
  68. data/lib/legion/llm/fleet/lane.rb +16 -1
  69. data/lib/legion/llm/fleet/token_issuer.rb +2 -1
  70. data/lib/legion/llm/inference/audit_publisher.rb +25 -0
  71. data/lib/legion/llm/inference/context_accounting.rb +111 -0
  72. data/lib/legion/llm/inference/embed_pipeline.rb +187 -0
  73. data/lib/legion/llm/inference/executor/context_window.rb +199 -0
  74. data/lib/legion/llm/inference/executor/escalation.rb +798 -0
  75. data/lib/legion/llm/inference/executor/routing.rb +471 -0
  76. data/lib/legion/llm/inference/executor/tool_injection.rb +396 -0
  77. data/lib/legion/llm/inference/executor.rb +439 -1419
  78. data/lib/legion/llm/inference/native_tool_loop.rb +474 -45
  79. data/lib/legion/llm/inference/prompt.rb +1 -1
  80. data/lib/legion/llm/inference/request.rb +9 -4
  81. data/lib/legion/llm/inference/route_attempts.rb +43 -6
  82. data/lib/legion/llm/inference/steps/debate.rb +10 -3
  83. data/lib/legion/llm/inference/steps/knowledge_capture.rb +40 -1
  84. data/lib/legion/llm/inference/steps/metering.rb +23 -1
  85. data/lib/legion/llm/inference/steps/post_response.rb +27 -48
  86. data/lib/legion/llm/inference/steps/rag_context.rb +20 -0
  87. data/lib/legion/llm/inference/steps/rbac.rb +2 -2
  88. data/lib/legion/llm/inference/steps/sticky_persist.rb +1 -1
  89. data/lib/legion/llm/inference/steps/tier_assigner.rb +4 -4
  90. data/lib/legion/llm/inference/steps/tool_calls.rb +85 -10
  91. data/lib/legion/llm/inference/steps/tool_history.rb +62 -9
  92. data/lib/legion/llm/inference/steps/trigger_match.rb +20 -1
  93. data/lib/legion/llm/inference.rb +105 -18
  94. data/lib/legion/llm/inventory.rb +107 -22
  95. data/lib/legion/llm/metering/tracker.rb +1 -1
  96. data/lib/legion/llm/metering/usage.rb +6 -3
  97. data/lib/legion/llm/metering.rb +1 -1
  98. data/lib/legion/llm/quality/checker.rb +34 -2
  99. data/lib/legion/llm/quality/confidence/scorer.rb +7 -1
  100. data/lib/legion/llm/router/availability.rb +178 -0
  101. data/lib/legion/llm/router/candidates.rb +263 -0
  102. data/lib/legion/llm/router/health_tracker.rb +32 -3
  103. data/lib/legion/llm/router/registry_lookup.rb +121 -0
  104. data/lib/legion/llm/router/rule.rb +3 -2
  105. data/lib/legion/llm/router.rb +352 -309
  106. data/lib/legion/llm/scheduling/batch.rb +3 -3
  107. data/lib/legion/llm/scheduling.rb +2 -2
  108. data/lib/legion/llm/settings.rb +103 -35
  109. data/lib/legion/llm/tools/dispatcher.rb +49 -11
  110. data/lib/legion/llm/tools/special.rb +45 -3
  111. data/lib/legion/llm/transport/message.rb +43 -1
  112. data/lib/legion/llm/types/tool_definition.rb +3 -1
  113. data/lib/legion/llm/vector_store/storage.rb +0 -2
  114. data/lib/legion/llm/version.rb +1 -1
  115. data/lib/legion/llm.rb +71 -6
  116. metadata +21 -3
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 733b0a06bf37557dfe7c661d6d63596e30ef5cde4b01c9ae12e801c5257b684e
- data.tar.gz: f24cfe5b32ed137a47efaac7adf57f348d911cb1f511fbdfc5a769ce54d7c32a
+ metadata.gz: eaa596812e2320baa0d8ae31b44225c80e9ad9b54e62531b3cd1c7640ff09fe6
+ data.tar.gz: a98b90ae07c7040fcd8a13adccf1d07037f253cc845cdc4dc55b65f434805ce0
  SHA512:
- metadata.gz: c006671e6a1dc02bff77b18e3c31dc4cb771ffd1ba5a6b773e2d7a62a9e886b76e95fb1e9780b0c1285e0165a218112f239fe088ceae9296da440918bd682648
- data.tar.gz: 14eaff3d563eb80e058f699bca53c79308642f745710c1a546b7e00ecfde049129d90e91a52cea5e13a0e19c39afb1282daf1228640680b8f7b94db777676f81
+ metadata.gz: a7611f997b2163792aa4f29d8ca2e3b8c10ec11af08ff8d89fae578ab0b0a138fbd9ad4957a5c42b793c30e959ab4c17290f01972cd0d8de8128e7e79873e28c
+ data.tar.gz: 3b0acc643ffe9c06c6d07df9c5dce9a54d79351878354e30c1cc22fb5c5001debf9ac529ddeb873868ff713f455665c6e5fe992558d70ef376151f9b368b7086
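The checksums.yaml entries above are hex digests of the two archives packed inside the .gem file (a .gem is a tar archive holding metadata.gz, data.tar.gz, and checksums.yaml.gz). As a minimal sketch, assuming the gem has already been unpacked with `tar`, a digest can be checked in Ruby as follows. The helper name `verify_digest` and the file path in the usage comment are hypothetical and not part of legion-llm.

```ruby
require 'digest'

# Hypothetical helper: compares the SHA256 hex digest of raw bytes against an
# expected value, such as the data.tar.gz entry recorded in checksums.yaml.
def verify_digest(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.strip.downcase
end

# Usage sketch (the path is an assumption; point it at the unpacked archive):
#   bytes = File.binread('data.tar.gz')
#   verify_digest(bytes, 'a98b90ae07c7040fcd8a13adccf1d07037f253cc845cdc4dc55b65f434805ce0')
```

The SHA512 entries can be checked the same way by swapping in `Digest::SHA512`.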
data/.gitignore CHANGED
@@ -24,3 +24,4 @@ docs/
  bin/apollo-setup-postreboot.sh
  bin/apollo-setup-prereboot.sh
  legionio-bootstrap-uhg-v3.json
+ docs/
data/.rubocop.yml CHANGED
@@ -1,60 +1,111 @@
+ plugins:
+ - rubocop-legion
+
+ # These rubocop-legion cops surface in the local-path 0.1.7 build but were not
+ # in the published 0.1.7 gem the repo previously tracked. They flag broad
+ # pre-existing patterns unrelated to the N×N enforcement pass; deferred to
+ # their own cleanup task. NoUnderscorePrefixedKwargs / NoInlineSettingDefaults
+ # / NoDirectDispatch / NoShapeDuckTyping (the B4 set Phase 6 adopts) remain
+ # enabled.
+ Legion/RescueLogging/NoCapture:
+ Enabled: false
+ Legion/ConstantSafety/InheritParam:
+ Enabled: false
+
  AllCops:
  TargetRubyVersion: 3.4
  NewCops: enable
  SuggestExtensions: false

- Layout/LineLength:
- Max: 160
+ # N×N routing guard cops (Phase 6 enforcement; defaults from rubocop-legion config/default.yml).
+ #
+ # - NoUnderscorePrefixedKwargs / NoInlineSettingDefaults / NoDirectDispatch are
+ # enabled repo-wide.
+ # - NoShapeDuckTyping is enabled on the canonical-only surface where the shape
+ # contract is fully established by translators. Code at the HTTP/wire ingress
+ # (client_translator parse_request, StreamAssembler chunk adapter, DebugFormats
+ # request envelope reader, Response.from_provider_message bridge) legitimately
+ # inspects shape because that's the layer responsible for normalising into
+ # canonical. The cop's scope expands here as the executor's canonical
+ # migration (Phase 4 follow-up) lands.
+ Legion/Framework/NoShapeDuckTyping:
+ Enabled: true
+ Include:
+ - 'lib/legion/llm/api/**/*.rb'
+ - 'lib/legion/llm/inference/**/*.rb'
+ Exclude:
+ # Legacy tree deprecated this release (R11); deleted next minor.
+ - 'lib/legion/llm/api/translators/**/*.rb'
+ - 'lib/legion/llm/api/anthropic/**/*.rb'
+ - 'lib/legion/llm/api/openai/**/*.rb'
+ - 'lib/legion/llm/api/native/**/*.rb'
+ - 'lib/legion/llm/api/shared_helpers.rb'
+ # Boundary code that legitimately bridges wire ↔ canonical: parse_request
+ # at the HTTP ingress, the StreamAssembler chunk adapter (P5 explicitly
+ # accepts both Canonical::Chunk and the legacy StreamChunk shape during
+ # migration), DebugFormats (reads raw env / reflects request), and
+ # shared_extractors (normalises arbitrary thinking content shapes).
+ # Pre-canonical inference steps still navigate raw wire hashes; that
+ # scope tightens once the executor finishes the canonical migration
+ # (Phase 4 follow-up).
+ - 'lib/legion/llm/api/client_translators/anthropic_messages.rb'
+ - 'lib/legion/llm/api/client_translators/openai_chat.rb'
+ - 'lib/legion/llm/api/client_translators/openai_responses.rb'
+ - 'lib/legion/llm/api/client_translators/shared_extractors.rb'
+ - 'lib/legion/llm/api/stream_assembler.rb'
+ - 'lib/legion/llm/api/debug_formats.rb'
+ - 'lib/legion/llm/api/namespaces/**/*.rb'
+ - 'lib/legion/llm/inference/audit_publisher.rb'
+ - 'lib/legion/llm/inference/embed_pipeline.rb'
+ - 'lib/legion/llm/inference/enrichment_injector.rb'
+ - 'lib/legion/llm/inference/executor.rb'
+ - 'lib/legion/llm/inference/executor/**/*.rb'
+ - 'lib/legion/llm/inference/native_tool_loop.rb'
+ - 'lib/legion/llm/inference/profile.rb'
+ - 'lib/legion/llm/inference/response.rb'
+ - 'lib/legion/llm/inference/route_attempts.rb'
+ - 'lib/legion/llm/inference/steps/**/*.rb'

+ Layout/LineLength:
+ Max: 195
  Layout/SpaceAroundEqualsInParameterDefault:
  EnforcedStyle: space
-
  Layout/HashAlignment:
  EnforcedHashRocketStyle: table
  EnforcedColonStyle: table
-
  Metrics/MethodLength:
- Max: 60
-
+ Max: 150
  Metrics/ClassLength:
  Max: 1500
-
  Metrics/ModuleLength:
  Max: 1500
-
  Metrics/BlockLength:
- Max: 40
+ Max: 150
  Exclude:
  - 'spec/**/*'

  Metrics/AbcSize:
- Max: 85
-
+ Max: 110
+ Metrics/BlockNesting:
+ Max: 4
  Metrics/CyclomaticComplexity:
- Max: 35
+ Max: 50

  Metrics/PerceivedComplexity:
- Max: 35
-
+ Max: 50
  Style/Documentation:
  Enabled: false
-
  Style/SymbolArray:
  Enabled: true
-
  Style/FrozenStringLiteralComment:
  Enabled: true
  EnforcedStyle: always
-
  Naming/FileName:
  Enabled: false
-
  Naming/PredicateMethod:
  Enabled: false
-
  Metrics/ParameterLists:
  Max: 9
-
  Style/RedundantConstantBase:
  Exclude:
  - 'spec/**/*'
data/AGENTS.md CHANGED
@@ -1,37 +1,90 @@
- # legion-llm Agent Notes
+ # legion-llm Agent Notes (v0.13.0)

- ## Scope
-
- `legion-llm` provides provider configuration, chat/embed/structured interfaces, dynamic routing, escalation, quality checks, and pipeline execution for Legion.
+ `legion-llm` is a **universal translation proxy** for LLM traffic: N client dialects (OpenAI Chat,
+ OpenAI Responses, Anthropic Messages) × N provider backends (Bedrock, Anthropic, OpenAI, vLLM,
+ Ollama, fleet), any direction. Every request parses once into `Canonical::Request`, is
+ routed/executed, then renders once back to the caller's dialect. See `CLAUDE.md` for the full
+ invariant set; `README.md` for detailed reference.

  ## Fast Start

  ```bash
  bundle install
- bundle exec rspec
- bundle exec rubocop
+ bundle exec rspec # 0 failures required before commit
+ bundle exec rubocop # 0 offenses required
  ```

+ **The in-process matrix harness (`spec/legion/llm/api/matrix/`) is the commit gate.** Touch
+ `lib/legion/llm/api/`, the executor, or the canonical/translator boundary → it must pass before push.
+
  ## Primary Entry Points

- - `lib/legion/llm.rb`
- - `lib/legion/llm/providers.rb`
- - `lib/legion/llm/router/`
- - `lib/legion/llm/pipeline/`
- - `lib/legion/llm/structured_output.rb`
- - `lib/legion/llm/embeddings.rb`
- - `lib/legion/llm/fleet/`
+ - `lib/legion/llm.rb` — facade (`start`, `chat`, `ask`, `embed`, `structured`)
+ - `lib/legion/llm/inventory.rb` — **single source of truth** for the model catalog
+ - `lib/legion/llm/router.rb` + `router/{candidates,availability,health_tracker,escalation/}` — routing
+ - `lib/legion/llm/inference/executor.rb` + `executor/{routing,escalation}.rb` — pipeline
+ - `lib/legion/llm/inference/steps/` — the 18 pipeline steps
+ - `lib/legion/llm/api/{openai,anthropic,native}/` — client routes
+ - `lib/legion/llm/api/client_translators/` — canonical ↔ client wire formats
+ - `lib/legion/llm/context/curator.rb` — async conversation curation (context-cost control)
+ - Provider behaviour (defaults, capabilities, model filtering) lives in `../extensions-ai/lex-llm-*`

  ## Guardrails

- - Keep typed error behavior and retry semantics stable (`ProviderDown`, `RateLimitError`, `EscalationExhausted`, etc.).
- - Routing and escalation must remain deterministic given the same inputs/settings.
- - Preserve pipeline feature-flag behavior; avoid forcing pipeline-only code paths.
- - Keep provider credentials resolved through settings secret resolution flow; never hardcode secrets.
- - Maintain compatibility with direct methods (`chat_direct`, `embed_direct`, `structured_direct`) and daemon-aware flows.
- - Health tracker and rule scoring are contract-sensitive; changes require spec updates.
+ - **Always translate, never passthrough**; **no `provider == :x` branches** outside translators.
+ - **Inventory is the only catalog**; `Discovery`/`Registry`/`HealthTracker` are feeders.
+ - Never dispatch a triple absent from the live catalog or unhealthy; **fail over, don't hard-fail**.
+ - **Model policy = compliance**: `model_whitelist`/`model_blacklist` honored at dispatch, fail-closed;
+ a policy-denied model is terminal (never escalated, never trips circuits).
+ - Thinking never crosses providers; mid-stream failover must not kill an in-flight conversation.
+ - Every pipeline exit emits ledger events (metering/audit) — no bypasses.
+ - `Legion::JSON` only (symbol keys); every `rescue` re-raises or `handle_exception`s; no
+ `defined?(Legion::Settings)` guards; `log.*` not `puts`.
+ - **No personal/company identifiers in VCS**; never force-push.
+ - Routing/escalation deterministic for the same inputs/settings; health-tracker & rule scoring are
+ contract-sensitive — changes require spec updates.

  ## Validation

- - Run targeted specs for modified router/pipeline/provider code.
- - Before handoff, run full `bundle exec rspec` and `bundle exec rubocop`.
+ Run targeted specs for modified router/pipeline/translator code, then full `rspec` + `rubocop` +
+ the matrix harness before handoff.
+
+ ---
+
+ ## Client Request Headers Reference
+
+ Verified from source (Claude Code binary + Codex `codex-rs`). Useful when working on `/v1/messages`
+ and `/v1/responses` handlers. Routing/identity headers `X-Legion-{Provider,Model,Instance,Tier}` are
+ honored as **rules** (hard constraints), not hints.
+
+ ### Claude Code → `POST /v1/messages`
+
+ | Header | Value | Always? |
+ |---|---|---|
+ | `X-Claude-Code-Session-Id` | Stable UUID for the CLI session | Yes |
+ | `x-app` | `"cli"` (foreground) or `"cli-bg"` (background) | Yes |
+ | `x-claude-code-agent-id` / `x-claude-code-parent-agent-id` | Agent / parent-agent UUIDs | Conditional |
+
+ Threading is **stateless** — full `messages[]` history in the body every request; no conversation/turn
+ ID header. In Rack env: `HTTP_X_CLAUDE_CODE_SESSION_ID`, `HTTP_X_APP`, etc.
+
+ ### Codex → `POST /v1/responses`
+
+ | Header | Value | Always? |
+ |---|---|---|
+ | `session-id` | Stable UUID for the Codex session | Yes |
+ | `thread-id` | Stable UUID for the thread/conversation | Yes |
+ | `x-client-request-id` | Same value as `thread-id` | Yes |
+ | `x-codex-installation-id` | Installation-scoped UUID | Yes |
+ | `x-codex-turn-state` | Sticky-routing token, replayed by client | After first response |
+ | `x-openai-subagent` | Sub-agent type (`review`, `compact`, …) | Conditional |
+
+ `HTTP_THREAD_ID` is the stable thread/conversation ID (not per-request); `HTTP_X_CLIENT_REQUEST_ID`
+ equals it. HTTP threading is stateless (full input in body); over WebSocket, `previous_response_id`
+ enables delta-only input.
+
+ ```ruby
+ request_id = env['HTTP_X_CLIENT_REQUEST_ID'] || "req_#{SecureRandom.hex(12)}"
+ conversation_id = env['HTTP_THREAD_ID'] || env['HTTP_X_LEGION_CONVERSATION_ID'] ||
+ body[:conversation_id] || "conv_#{SecureRandom.hex(8)}"
+ ```
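
The Rack env names quoted in the AGENTS.md hunk above (`HTTP_X_CLAUDE_CODE_SESSION_ID`, `HTTP_THREAD_ID`, `HTTP_X_CLIENT_REQUEST_ID`) follow the usual Rack/CGI convention: upper-case the header name, replace dashes with underscores, and prefix `HTTP_`. A minimal sketch of that mapping is shown below; `rack_env_key` is a hypothetical helper name used for illustration only and is not taken from legion-llm.

```ruby
# Hypothetical helper (not from legion-llm): maps an HTTP header name to the
# key under which Rack exposes it in the request env.
def rack_env_key(header_name)
  key = header_name.upcase.tr('-', '_')
  # Content-Type and Content-Length are the two headers Rack stores unprefixed.
  %w[CONTENT_TYPE CONTENT_LENGTH].include?(key) ? key : "HTTP_#{key}"
end

rack_env_key('X-Claude-Code-Session-Id') # => "HTTP_X_CLAUDE_CODE_SESSION_ID"
rack_env_key('thread-id')                # => "HTTP_THREAD_ID"
rack_env_key('x-client-request-id')      # => "HTTP_X_CLIENT_REQUEST_ID"
```

Because header names are case-insensitive on the wire, the mixed casing in the tables (`X-Claude-Code-Session-Id` versus `x-app`) makes no difference to the env key a handler reads.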