legion-llm 0.5.24 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 7faa875219df27f00724444b192db231e6ef9f0b9a56da041791fbec1a17abb0
4
- data.tar.gz: 271946a65933ed5366a9240490f4ba2df08f9a5c70200ab3b2605d413f6da611
3
+ metadata.gz: f65ac724c32de98ddfa324545b62e81cda38e27efdcdcbceb21abd21729ae599
4
+ data.tar.gz: 7c02a90eac3bda99512da956c889a06084980468c034c25e0c602d7e06db7ac3
5
5
  SHA512:
6
- metadata.gz: 5b75a996f84fa52a56e2dd145ae411007e7b0e82ebbf4c3ebf472d3368c8fb9647c311de67eb3500861c2758ba2c62ae7d90c7f8540c2080625ee99ba40d21c2
7
- data.tar.gz: d390c9821b76cca8889c46a5f1999408de60dadb5d0c6db05c3fa7b0e68faa152bd557b9e19e38897b5ff52af8a05be26f74f0dc32a10ea09a282247a62f0c60
6
+ metadata.gz: 71f7496e4df651c8d93bf3ac27059a2075f0b82299afa1f61f98138dc81db90ed3139c27b933774969d7d727ff9483db2a92514d82460a3f3de1c2dfbbff44ff
7
+ data.tar.gz: 6757e931ab1bef7d95c1470a3cf24077fa777683955bc0e5ed6ab6b7d7ef6a2f6f4f613668b6395a52a4200cdb97ae9aebc80f6bc9af5ee896c8a44142215425
data/CHANGELOG.md CHANGED
@@ -2,6 +2,29 @@
2
2
 
3
3
  ## [Unreleased]
4
4
 
5
+ ## [0.6.0] - 2026-03-31
6
+
7
+ ### Added
8
+ - `Legion::LLM::ProviderRegistry` — thread-safe registry for native lex-* provider extensions: `register(name, ext)`, `for(name)`, `available`, `registered?(name)`, `reset!`; cleared automatically on `Legion::LLM.shutdown` (closes #37)
9
+ - `Legion::LLM::NativeDispatch` — native provider dispatch layer: `dispatch_chat`, `dispatch_embed`, `dispatch_stream`, `dispatch_count_tokens` route calls to registered lex-* extension modules and return standardized `{ result:, usage: Usage }` hashes; raises `ProviderError` when provider is not registered (closes #37)
10
+ - `Legion::LLM::NativeResponseAdapter` — adapter wrapping native dispatch result hash to expose the same `.content`, `.input_tokens`, `.output_tokens`, `.usage` interface as a RubyLLM response object (closes #37)
11
+ - `provider_layer` settings section: `mode` (`'ruby_llm'` default / `'native'` / `'auto'`), `native_providers` (default `['claude', 'bedrock']`), `fallback_to_ruby_llm` (default `true`); `ruby_llm` mode preserves all existing behavior unchanged (closes #37)
12
+ - Auto-registration in `Legion::LLM.start`: detects loaded lex-* extensions via `Object.const_defined?` and registers them — `lex-claude` → `:claude`/`:anthropic`, `lex-bedrock` → `:bedrock`, `lex-openai` → `:openai`, `lex-gemini` → `:gemini`; no hard dependencies added (closes #37)
13
+ - `Pipeline::Executor` provider layer integration: `use_native_dispatch?` checks `provider_layer.mode`; `execute_provider_request_native` calls `NativeDispatch.dispatch_chat` and wraps result in `NativeResponseAdapter`, falls back to RubyLLM when `fallback_to_ruby_llm: true`; `execute_provider_request_ruby_llm` is the extracted RubyLLM path (default, no behavior change) (closes #37)
14
+ - Optional adversarial debate pipeline step for high-stakes decisions (closes #28): `Pipeline::Steps::Debate` runs a multi-round advocate/challenger/judge debate after `provider_call`; the initial response is the advocate, a challenger model critiques it, the advocate rebuts, and a judge model synthesizes all sides into the final response; activation via `debate: true` in `chat()` kwargs, or `Legion::Settings[:llm][:debate][:enabled]`, or GAIA auto-trigger when `gaia_auto_trigger: true` and `high_stakes`/`debate_recommended` are set in the advisory enrichment; debate is disabled by default; GAIA auto-trigger defaults to false in v0.6.0; different models are required for each role (advocate, challenger, judge) to avoid training bias — model rotation picks from enabled providers automatically when not explicitly configured; model strings use `provider:model` format; all LLM calls use `chat_direct` to avoid pipeline recursion; configurable via `debate.default_rounds` (default 1), `debate.max_rounds` (cap, default 3), `debate.advocate_model`, `debate.challenger_model`, `debate.judge_model`, `debate.model_selection_strategy` (default `'rotate'`); debate metadata (`enabled`, `rounds`, `advocate_model`, `challenger_model`, `judge_model`, `advocate_summary`, `challenger_summary`, `judge_confidence`) stored in `enrichments['debate:result']`; gracefully degrades to single-model mode with a warning when fewer than 2 models are available
15
+ - Async context curation (`Legion::LLM::ContextCurator`): keeps LLM context lean without compaction (closes #38). Heuristic curation runs async in `Thread.new` after each `step_context_store` — zero latency impact. Curated messages are used in `step_context_load` when available, falling back to raw history. Heuristic pipeline: `strip_thinking` removes `<thinking>` blocks; `distill_tool_result` summarizes large tool outputs by tool type (`read_file` → line count + first/last, `search`/`grep` → match counts, `bash` → exit code + last lines, default → char count + preview); `fold_resolved_exchanges` detects multi-turn clarification reaching agreement and folds to a system note; `evict_superseded` keeps only the latest read of each file path; `dedup_similar` removes near-duplicate messages via Jaccard similarity (delegates to `Compressor.deduplicate_messages`). LLM-assisted mode is built but off by default (`llm_assisted: false`); when enabled with `mode: 'llm_assisted'`, a configurable small/fast model produces better summaries with automatic fallback to heuristic on any error. All behavior gated by `Legion::Settings[:llm][:context_curation]`: `enabled` (default `true`), `mode` (`'heuristic'`), `llm_assisted` (`false`), `llm_model` (`nil`), `tool_result_max_chars` (2000), `thinking_eviction` (`true`), `exchange_folding` (`true`), `superseded_eviction` (`true`), `dedup_enabled` (`true`), `dedup_threshold` (0.85), `target_context_tokens` (40000).
16
+ - Message chain architecture with parent links and sidechain support in `ConversationStore` (closes #39): every message now carries `id` (UUID), `parent_id`, `sidechain` (default `false`), `message_group_id`, and `agent_id` fields; `build_chain(conversation_id, include_sidechains: false)` reconstructs ordered message history from parent links with rooted-leaf selection, parallel sibling recovery via `message_group_id`, and orphan appending; `sidechain_messages(conversation_id, agent_id: nil)` queries background/subagent messages with optional agent filter; `branch(conversation_id, from_message_id:)` creates a new conversation by copying history up to the given message; `store_metadata` / `read_metadata` provide tail-window session metadata storage; `migrate_parent_links!` backfills parent links on pre-migration sequential data; `messages()` backward-compatible flat array uses chain reconstruction when parent links are present, seq ordering otherwise; DB persistence adds `message_id`, `parent_id`, `sidechain`, `message_group_id`, `agent_id` columns when present (graceful degradation without migration)
17
+ - Per-pipeline-step OTEL child spans for distributed tracing (closes #21): `Pipeline::Steps::SpanAnnotator` maps step audit/enrichment data to OTEL span attributes (`rbac.outcome`, `classification.pii_detected`, `billing.estimated_cost_usd`, `rag.entry_count`, `routing.strategy`, `gen_ai.usage.input_tokens`, `confidence.score`, etc.); `Pipeline::Executor#execute_step` wraps each step in a `Legion::Telemetry.with_span("pipeline.<name>", kind: :internal)` child span; `annotate_top_level_span` sets `legion.pipeline.steps_executed`, `legion.pipeline.steps_skipped`, and `gen_ai.usage.cost_usd` on the top-level span after all steps complete; all wrapping gracefully no-ops when `Legion::Telemetry` is not defined or `enabled?` returns false, or when `telemetry.pipeline_spans` is set to `false`; telemetry errors never crash the pipeline
18
+ - Proactive model tier routing by task role and caller context (`Pipeline::Steps::TierAssigner`, step 8a): assigns routing tier before `step_routing` fires, based on GAIA routing hints, caller identity pattern matching (via `File.fnmatch?`), content classification (PHI/PII), and request priority; overrides are suppressed when the caller already sets an explicit `tier:`; default role mappings cover `gaia:tick:*`, `gaia:dream:*`, `system:guardrails`, `system:reflection`, and `user:*`; custom mappings configurable via `Legion::Settings[:llm][:routing][:tier_mappings]`; `step_routing` consumes the proactive assignment when no explicit caller intent is present (closes #22)
19
+ - `:quick_reply` pipeline profile for latency-sensitive conversational turns — skips 12 non-essential steps (idempotency, conversation_uuid, context_load, classification, gaia_advisory, rag_context, mcp_discovery, confidence_scoring, tool_calls, context_store, post_response, knowledge_capture), retaining only the 8 steps required for a valid provider round-trip (closes #27)
20
+ - Conversation auto-summarization at token threshold: `Compressor.auto_compact` compacts history when estimated tokens exceed `conversation.summarize_threshold` (default 50,000); preserves the most recent N turns (`preserve_recent`, default 10); older turns are summarized via `Compressor.summarize_messages` with LLM or stopword fallback; `Compressor.estimate_tokens` provides character-count/4 approximation; `ConversationStore.replace` atomically replaces in-memory history after compaction; wired into `Pipeline::Executor#step_context_load`; controlled by `conversation.auto_compact` (default `true`) (closes #26)
21
+ - `Legion::LLM::Usage` standard struct (`lib/legion/llm/usage.rb`): immutable `::Data.define` value object with `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_write_tokens`, and `total_tokens` fields; `total_tokens` auto-calculated as `input + output` when not explicitly provided; all fields default to 0 (closes #35)
22
+ - Pipeline `extract_tokens` now returns a `Usage` struct instead of a plain hash when the provider response exposes token counts; populates `cache_read_tokens` and `cache_write_tokens` from response when available
23
+ - Asymmetric embedding prefix injection by task type: `generate` and `generate_batch` accept a `task:` keyword (`:document` or `:query`, default `:document`). `PREFIX_REGISTRY` maps model names to task-specific prefixes (`nomic-embed-text` gets `search_document:` / `search_query:`, `mxbai-embed-large` gets a query prefix). Prefix injection is controlled by `Legion::Settings.dig(:llm, :embedding, :prefix_injection)` (default `true`). Unknown models are passed through unchanged (closes #24).
24
+ - Prompt caching pipeline step (`Pipeline::Steps::PromptCache`): `apply_cache_control` marks the last system block with `cache_control: { type: 'ephemeral' }` when content exceeds `min_tokens * 4` chars; `sort_tools_deterministically` sorts tool schemas by name for stable cache keys; `apply_conversation_breakpoint` marks the last stable prior message with a cache breakpoint; all behavior gated behind `Legion::Settings.dig(:llm, :prompt_caching, :enabled)` (default: `false`); individual sub-features controlled by `cache_system_prompt`, `cache_tools`, `cache_conversation`, `sort_tools` flags; `scope` defaults to `'ephemeral'`; wired into `Pipeline::Executor#execute_provider_request` for system prompt and conversation history (closes #36)
25
+ - Escalation chain wired into `Pipeline::Executor#step_provider_call`: when `routing.escalation.enabled` and `pipeline_enabled` are both `true`, the provider call runs through the `EscalationChain` with per-attempt `QualityChecker` evaluation; non-retryable errors (`AuthError`, `RateLimitError`, `PrivacyModeError`) bubble up immediately; quality failures and transient errors advance to the next resolution in the chain; raises `EscalationExhausted` when all attempts are exhausted; timeline records an `escalation:attempt` event per try; `step_routing` populates `@escalation_chain` via `Router.resolve_chain` when escalation is enabled; `pipeline_enabled: true` added to `routing.escalation` defaults (closes #23).
26
+ - Token budget enforcement at the LLM call boundary (closes #25): `Legion::LLM::TokenTracker` thread-safe per-session accumulator (`record`, `total_tokens`, `session_exceeded?`, `session_warning?`, `reset!`, `summary`); `Pipeline::Steps::TokenBudget` pipeline step runs before `provider_call` — raises `TokenBudgetExceeded` when the estimated request input exceeds `max_input_tokens` (from `request.extra`) or the session total hits `session_max_tokens`; logs a warning at `session_warn_tokens`; `TokenBudgetExceeded` added to typed error hierarchy; token counts recorded automatically via `Pipeline::Steps::PostResponse#record_token_usage` after each successful provider call; budget settings under `Legion::Settings[:llm][:budget]`: `session_max_tokens` (nil = off), `session_warn_tokens` (nil = off), `daily_max_tokens` (nil = off, future enforcement).
27
+
5
28
  ## [0.5.24] - 2026-03-31
6
29
 
7
30
  ### Added
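The changelog entry for `Legion::LLM::ProviderRegistry` lists its public surface (`register(name, ext)`, `for(name)`, `available`, `registered?(name)`, `reset!`) but this diff does not show the implementation. The following is a minimal sketch of one way such a thread-safe registry could look. The class name `SketchProviderRegistry`, the symbol normalization of names, and the return values are all assumptions for illustration, not the gem's code.

```ruby
# Hypothetical stand-in for a thread-safe provider registry.
# Not the gem's implementation; return values and key normalization are assumed.
class SketchProviderRegistry
  def initialize
    @mutex = Mutex.new
    @providers = {}
  end

  # Store an extension module under a normalized symbol key.
  def register(name, ext)
    @mutex.synchronize { @providers[name.to_sym] = ext }
  end

  # Look up a registered extension; nil when the name is unknown (assumption).
  def for(name)
    @mutex.synchronize { @providers[name.to_sym] }
  end

  # Names of all registered providers, in registration order.
  def available
    @mutex.synchronize { @providers.keys }
  end

  def registered?(name)
    @mutex.synchronize { @providers.key?(name.to_sym) }
  end

  # Drop every registration, e.g. on shutdown.
  def reset!
    @mutex.synchronize { @providers.clear }
  end
end

registry = SketchProviderRegistry.new
claude_ext = Module.new
registry.register(:claude, claude_ext)
registry.register('bedrock', Module.new)
names_before_reset = registry.available
```

Guarding every read and write with the same `Mutex` keeps the hash consistent when extensions register from different threads during startup.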
data/Gemfile CHANGED
@@ -9,6 +9,7 @@ group :test do
9
9
  gem 'rspec'
10
10
  gem 'rspec_junit_formatter'
11
11
  gem 'rubocop'
12
+ gem 'rubocop-legion'
12
13
  gem 'simplecov'
13
14
  gem 'webmock'
14
15
  end
@@ -63,7 +63,7 @@ module Legion
63
63
  end
64
64
 
65
65
  private_class_method def self.llm_settings
66
- if Legion.const_defined?('Settings')
66
+ if Legion.const_defined?('Settings', false)
67
67
  Legion::Settings[:llm]
68
68
  else
69
69
  Legion::LLM::Settings.default
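The hunk above adds `false` as the second argument to `const_defined?`. That argument is Ruby's `inherit` flag: when it is `true` (the default), the lookup on a module also searches `Object`, so an unrelated top-level constant with the same name counts as a match. The diff does not state the motivation, so the sketch below only illustrates the language behavior. The constant names `DemoNamespace`, `DemoSettings`, and `Nested` are hypothetical.

```ruby
# Hypothetical constants, for illustration only.
module DemoNamespace
  module Nested; end
end

class DemoSettings; end # top-level constant, NOT nested inside DemoNamespace

# Default lookup (inherit = true) also searches Object for modules.
inherited = DemoNamespace.const_defined?('DemoSettings')

# inherit = false checks only constants defined directly on the receiver.
strict = DemoNamespace.const_defined?('DemoSettings', false)

# A genuinely nested constant is found either way.
nested = DemoNamespace.const_defined?('Nested', false)
```

Here `inherited` is `true` while `strict` is `false`, which is the difference the one-line change relies on.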
@@ -88,6 +88,39 @@ module Legion
88
88
  { messages: kept, removed: removed, original_count: messages.size }
89
89
  end
90
90
 
91
+ def auto_compact(messages, target_tokens:, preserve_recent: 10)
92
+ return messages if messages.size <= preserve_recent
93
+
94
+ recent = messages.last(preserve_recent)
95
+ older = messages[0..-(preserve_recent + 1)]
96
+
97
+ summarized = summarize_messages(older, max_tokens: target_tokens / 2)
98
+
99
+ compaction_msg = {
100
+ role: 'system',
101
+ content: "[Conversation compacted: #{older.size} turns summarized]",
102
+ metadata: {
103
+ compacted_at: Time.now.utc.iso8601,
104
+ original_count: messages.size,
105
+ preserved: recent.size
106
+ }
107
+ }
108
+
109
+ summary_msg = {
110
+ role: 'system',
111
+ content: summarized[:summary]
112
+ }
113
+
114
+ [compaction_msg, summary_msg, *recent].flatten
115
+ end
116
+
117
+ def estimate_tokens(messages)
118
+ return 0 if messages.nil? || messages.empty?
119
+
120
+ total_chars = messages.sum { |m| m[:content].to_s.length }
121
+ total_chars / 4
122
+ end
123
+
91
124
  def stopwords_for_level(level)
92
125
  return [] if level <= NONE
93
126
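To make the arithmetic in `auto_compact` and `estimate_tokens` concrete, here is a small standalone sketch. It copies the character-count/4 estimate and the older/recent slicing from the hunk above, but leaves out summarization entirely. The helper name `split_for_compaction` and the sample history are hypothetical.

```ruby
# Same approximation as the diff: total characters divided by 4 (integer division).
def estimate_tokens(messages)
  return 0 if messages.nil? || messages.empty?

  messages.sum { |m| m[:content].to_s.length } / 4
end

# Hypothetical helper mirroring the slicing inside auto_compact:
# returns [older, recent], where recent holds the last preserve_recent messages.
def split_for_compaction(messages, preserve_recent: 10)
  return [[], messages] if messages.size <= preserve_recent

  [messages[0..-(preserve_recent + 1)], messages.last(preserve_recent)]
end

# Twelve sample turns: nine of 16 characters and three of 17 characters.
history = (1..12).map { |i| { role: 'user', content: "message number #{i}" } }

estimated = estimate_tokens(history) # (9 * 16 + 3 * 17) / 4 = 195 / 4 = 48
older, recent = split_for_compaction(history, preserve_recent: 10)
```

With twelve messages and `preserve_recent: 10`, `older` holds the first two turns (the ones that would be summarized) and `recent` holds the last ten unchanged.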
 
@@ -0,0 +1,308 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Legion
4
+ module LLM
5
+ class ContextCurator
6
+ CURATED_KEY = :__curated__
7
+
8
+ def initialize(conversation_id:)
9
+ @conversation_id = conversation_id
10
+ @curated_cache = nil
11
+ end
12
+
13
+ # Called async after each turn completes — zero latency impact.
14
+ def curate_turn(turn_messages:, assistant_response:)
15
+ return unless enabled?
16
+
17
+ Thread.new do
18
+ curated = turn_messages.map { |msg| curate_message(msg, assistant_response) }
19
+ store_curated(@conversation_id, curated)
20
+ @curated_cache = nil
21
+ rescue StandardError => e
22
+ Legion::Logging.warn("ContextCurator: async curation failed: #{e.message}") if defined?(Legion::Logging)
23
+ end
24
+ end
25
+
26
+ # Called sync when building next API request.
27
+ # Returns curated messages when available; nil means use raw history.
28
+ def curated_messages
29
+ return nil unless enabled?
30
+
31
+ @curated_cache ||= load_curated(@conversation_id)
32
+ end
33
+
34
+ # Heuristic: distill a single tool-result message to a compact summary.
35
+ def distill_tool_result(msg, _assistant_context = nil)
36
+ content = msg[:content].to_s
37
+ max_chars = setting(:tool_result_max_chars, 2000)
38
+ return msg if content.length <= max_chars
39
+
40
+ summary = heuristic_tool_summary(content, tool_name_from(msg))
41
+ msg.merge(content: summary, curated: true, original_content: content)
42
+ end
43
+
44
+ # Heuristic: remove extended thinking blocks, keep conclusions.
45
+ def strip_thinking(msg)
46
+ return msg unless setting(:thinking_eviction, true)
47
+
48
+ content = msg[:content].to_s
49
+ stripped = content
50
+ .gsub(%r{<thinking>.*?</thinking>}m, '')
51
+ .gsub(/^#+\s*[Tt]hinking.*?\n(?:(?!^#+\s).)*\n/m, '')
52
+ .strip
53
+
54
+ return msg if stripped == content || stripped.empty?
55
+
56
+ msg.merge(content: stripped, curated: true, original_content: content)
57
+ end
58
+
59
+ # Heuristic: detect multi-turn clarification that reached agreement; fold to single system note.
60
+ def fold_resolved_exchanges(messages)
61
+ return messages unless setting(:exchange_folding, true)
62
+
63
+ result = []
64
+ i = 0
65
+ while i < messages.length
66
+ window = messages[i, 4]
67
+ if resolved_exchange?(window)
68
+ conclusion = window.last[:content].to_s[0, 300]
69
+ note = {
70
+ role: :system,
71
+ content: "[Exchange resolved: #{conclusion}]",
72
+ curated: true,
73
+ original_content: window.map { |m| m[:content] }.join("\n")
74
+ }
75
+ result << note
76
+ i += window.length
77
+ else
78
+ result << messages[i]
79
+ i += 1
80
+ end
81
+ end
82
+ result
83
+ end
84
+
85
+ # Heuristic: if same file was read multiple times, keep only the latest read.
86
+ def evict_superseded(messages)
87
+ return messages unless setting(:superseded_eviction, true)
88
+
89
+ file_last_seen = {}
90
+ messages.each_with_index do |msg, idx|
91
+ path = extract_file_path(msg[:content].to_s)
92
+ file_last_seen[path] = idx if path
93
+ end
94
+
95
+ messages.each_with_index.reject do |msg, idx|
96
+ path = extract_file_path(msg[:content].to_s)
97
+ path && file_last_seen[path] != idx
98
+ end.map(&:first)
99
+ end
100
+
101
+ # Heuristic: deduplicate near-identical messages using Jaccard similarity.
102
+ def dedup_similar(messages, threshold: nil)
103
+ return messages unless setting(:dedup_enabled, true)
104
+
105
+ threshold ||= setting(:dedup_threshold, 0.85)
106
+ result = Compressor.deduplicate_messages(messages, threshold: threshold)
107
+ result[:messages]
108
+ end
109
+
110
+ # LLM-assisted distillation: uses small/fast model to summarize tool results.
111
+ # Falls back to heuristic on any error.
112
+ def llm_distill_tool_result(msg, assistant_response = nil)
113
+ return distill_tool_result(msg, assistant_response) unless llm_assisted?
114
+
115
+ content = msg[:content].to_s
116
+ max_chars = setting(:tool_result_max_chars, 2000)
117
+ return msg if content.length <= max_chars
118
+
119
+ summary = llm_summarize_tool_result(content, tool_name_from(msg))
120
+ if summary
121
+ msg.merge(content: summary, curated: true, original_content: content)
122
+ else
123
+ distill_tool_result(msg, assistant_response)
124
+ end
125
+ rescue StandardError => e
126
+ Legion::Logging.warn("ContextCurator: LLM distillation failed, using heuristic: #{e.message}") if defined?(Legion::Logging)
127
+ distill_tool_result(msg, assistant_response)
128
+ end
129
+
130
+ private
131
+
132
+ def enabled?
133
+ setting(:enabled, true)
134
+ end
135
+
136
+ def llm_assisted?
137
+ enabled? &&
138
+ setting(:llm_assisted, false) &&
139
+ setting(:mode, 'heuristic') == 'llm_assisted'
140
+ end
141
+
142
+ def curation_settings
143
+ Legion::Settings.dig(:llm, :context_curation) || {}
144
+ rescue StandardError
145
+ {}
146
+ end
147
+
148
+ def setting(key, default)
149
+ val = curation_settings[key]
150
+ val.nil? ? default : val
151
+ end
152
+
153
+ def curate_message(msg, assistant_response)
154
+ return msg if msg[:role] == :system
155
+
156
+ msg = strip_thinking(msg)
157
+ if llm_assisted?
158
+ llm_distill_tool_result(msg, assistant_response)
159
+ else
160
+ distill_tool_result(msg, assistant_response)
161
+ end
162
+ end
163
+
164
+ def store_curated(conversation_id, curated_messages)
165
+ curated_messages.each do |msg|
166
+ next unless msg[:curated]
167
+
168
+ ConversationStore.append(
169
+ conversation_id,
170
+ role: CURATED_KEY,
171
+ content: msg[:content],
172
+ original_content: msg[:original_content],
173
+ source_role: msg[:role]
174
+ )
175
+ end
176
+ rescue StandardError => e
177
+ Legion::Logging.warn("ContextCurator: store_curated failed: #{e.message}") if defined?(Legion::Logging)
178
+ end
179
+
180
+ def load_curated(conversation_id)
181
+ return nil unless ConversationStore.conversation_exists?(conversation_id)
182
+
183
+ raw = ConversationStore.messages(conversation_id)
184
+ curated = raw.select { |m| m[:role] == CURATED_KEY }
185
+ return nil if curated.empty?
186
+
187
+ regular = raw.reject { |m| m[:role] == CURATED_KEY }
188
+ apply_curation_pipeline(regular)
189
+ rescue StandardError => e
190
+ Legion::Logging.warn("ContextCurator: load_curated failed: #{e.message}") if defined?(Legion::Logging)
191
+ nil
192
+ end
193
+
194
+ # Apply heuristic curation pipeline to a set of messages.
195
+ def apply_curation_pipeline(messages)
196
+ return messages if messages.nil? || messages.empty?
197
+
198
+ result = messages.map { |msg| strip_thinking(msg) }
199
+ result = result.map { |msg| distill_tool_result(msg) }
200
+ result = fold_resolved_exchanges(result)
201
+ result = evict_superseded(result)
202
+ dedup_similar(result)
203
+ rescue StandardError => e
204
+ Legion::Logging.warn("ContextCurator: apply_curation_pipeline failed: #{e.message}") if defined?(Legion::Logging)
205
+ messages
206
+ end
207
+
208
+ # Build a heuristic summary for a tool result based on detected tool type.
209
+ def heuristic_tool_summary(content, tool_name)
210
+ lines = content.lines
211
+ line_count = lines.length
212
+ char_count = content.length
213
+
214
+ case tool_name&.to_s
215
+ when /read_file|read/
216
+ first_line = lines.first.to_s.chomp
217
+ last_line = lines.last.to_s.chomp
218
+ "Read file (#{line_count} lines). First: #{first_line[0, 80]}... Last: #{last_line[0, 80]}"
219
+ when /search|grep|glob/
220
+ file_count = content.scan(%r{[^\s/]+/[^\s]+}).uniq.length
221
+ "Search returned #{line_count} matches across #{file_count} files"
222
+ when /bash|run_command|execute/
223
+ exit_match = content.match(/exit(?:\s+code)?:?\s*(\d+)/i)
224
+ exit_code = exit_match ? exit_match[1] : '0'
225
+ last_lines = lines.last(3).map(&:chomp).join(' | ')
226
+ "Command output (#{line_count} lines), exit #{exit_code}: #{last_lines[0, 200]}"
227
+ else
228
+ preview = content[0, 200]
229
+ "Tool result (#{line_count} lines, #{char_count} chars): #{preview}"
230
+ end
231
+ end
232
+
233
+ # Detect tool name from message metadata or content.
234
+ def tool_name_from(msg)
235
+ msg[:tool_name] || msg[:name] || infer_tool_name(msg[:content].to_s)
236
+ end
237
+
238
+ def infer_tool_name(content)
239
+ return :read_file if content.match?(/\A(?:File:|Read:|#\s+\S+\.rb|\d+\t)/)
240
+ return :bash if content.match?(/exit code|STDOUT|STDERR/i)
241
+ return :search if content.match?(/\d+ match(?:es)? (?:across|in)/i)
242
+
243
+ nil
244
+ end
245
+
246
+ # Detect if a 2–4 message window represents a resolved Q&A exchange.
247
+ def resolved_exchange?(window)
248
+ return false if window.length < 2
249
+
250
+ roles = window.map { |m| m[:role].to_s }
251
+ # Simple pattern: user -> assistant -> user -> assistant with clarification signals
252
+ return false unless roles.first == 'user' && roles.last == 'assistant'
253
+
254
+ contents = window.map { |m| m[:content].to_s.downcase }
255
+ clarification_signals = ['clarif', 'what do you mean', 'i see', 'understood', 'got it', 'correct', 'exactly', 'yes', 'right', 'agree']
256
+ conclusion_signals = ['in summary', 'to summarize', 'in conclusion', 'therefore', 'so to answer', 'the answer is']
257
+
258
+ has_clarification = contents.any? { |c| clarification_signals.any? { |s| c.include?(s) } }
259
+ has_conclusion = contents.last.length < 500 || conclusion_signals.any? { |s| contents.last.include?(s) }
260
+
261
+ has_clarification && has_conclusion
262
+ end
263
+
264
+ # Extract a file path from content heuristically.
265
+ def extract_file_path(content)
266
+ match = content.match(%r{(?:reading|read|loaded?|opened?|file:)\s+[`'"]?(/[^\s`'"]+)[`'"]?}i) ||
267
+ content.match(%r{^(/(?:[\w.-]+/)*[\w.-]+\.\w+)})
268
+ match ? match[1] : nil
269
+ end
270
+
271
+ # Use a small/fast LLM model to distill a tool result.
272
+ def llm_summarize_tool_result(content, tool_name)
273
+ return nil unless defined?(Legion::LLM) && Legion::LLM.respond_to?(:chat_direct)
274
+
275
+ model = setting(:llm_model, nil) || detect_small_model
276
+ return nil unless model
277
+
278
+ prompt = build_distillation_prompt(content, tool_name)
279
+ response = Legion::LLM.chat_direct(model: model, message: prompt)
280
+ response.respond_to?(:content) ? response.content : nil
281
+ rescue StandardError => e
282
+ Legion::Logging.warn("ContextCurator: llm_summarize_tool_result failed: #{e.message}") if defined?(Legion::Logging)
283
+ nil
284
+ end
285
+
286
+ def build_distillation_prompt(content, tool_name)
287
+ tool_hint = tool_name ? " (from #{tool_name})" : ''
288
+ <<~PROMPT.strip
289
+ Summarize this tool result#{tool_hint} in 1-3 sentences, preserving key facts, file paths, line numbers, and error messages. Omit irrelevant details.
290
+
291
+ Tool result:
292
+ #{content[0, 4000]}
293
+ PROMPT
294
+ end
295
+
296
+ def detect_small_model
297
+ providers = Legion::Settings.dig(:llm, :providers) || {}
298
+ %w[ollama].each do |provider|
299
+ config = providers[provider.to_sym] || providers[provider]
300
+ return config[:default_model] if config.is_a?(Hash) && config[:enabled] && config[:default_model]
301
+ end
302
+ nil
303
+ rescue StandardError
304
+ nil
305
+ end
306
+ end
307
+ end
308
+ end
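The superseded-read heuristic in the file above can be tried in isolation. The sketch below reuses the two regular expressions from `extract_file_path` and the two-pass logic from `evict_superseded` as standalone methods, without the settings gate. The sample messages are hypothetical.

```ruby
# Standalone copy of the path-extraction heuristic from the diff.
def extract_file_path(content)
  match = content.match(%r{(?:reading|read|loaded?|opened?|file:)\s+[`'"]?(/[^\s`'"]+)[`'"]?}i) ||
          content.match(%r{^(/(?:[\w.-]+/)*[\w.-]+\.\w+)})
  match ? match[1] : nil
end

# Pass 1 records the last index at which each path appears.
# Pass 2 drops every message that mentions a path at an earlier index.
def evict_superseded(messages)
  file_last_seen = {}
  messages.each_with_index do |msg, idx|
    path = extract_file_path(msg[:content].to_s)
    file_last_seen[path] = idx if path
  end

  messages.each_with_index.reject do |msg, idx|
    path = extract_file_path(msg[:content].to_s)
    path && file_last_seen[path] != idx
  end.map(&:first)
end

# Hypothetical conversation: the same file is read twice.
messages = [
  { role: :tool, content: 'Read /app/lib/foo.rb (v1)' },
  { role: :user, content: 'What does foo do?' },
  { role: :tool, content: 'Read /app/lib/foo.rb (v2)' }
]

kept = evict_superseded(messages)
```

The first read is dropped because a later message refers to the same path. Messages without a detectable path are always kept.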