RubyGems - legion-llm - Versions diffs - 0.5.24 → 0.6.0 - Mend

legion-llm 0.5.24 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32)

checksums.yaml +4 -4
data/CHANGELOG.md +23 -0
data/Gemfile +1 -0
data/lib/legion/llm/cache.rb +1 -1
data/lib/legion/llm/compressor.rb +33 -0
data/lib/legion/llm/context_curator.rb +308 -0
data/lib/legion/llm/conversation_store.rb +270 -10
data/lib/legion/llm/discovery/ollama.rb +2 -2
data/lib/legion/llm/discovery/system.rb +1 -1
data/lib/legion/llm/embeddings.rb +29 -2
data/lib/legion/llm/errors.rb +2 -0
data/lib/legion/llm/hooks/rag_guard.rb +3 -3
data/lib/legion/llm/hooks/response_guard.rb +1 -1
data/lib/legion/llm/native_dispatch.rb +128 -0
data/lib/legion/llm/pipeline/executor.rb +350 -19
data/lib/legion/llm/pipeline/profile.rb +10 -2
data/lib/legion/llm/pipeline/steps/debate.rb +286 -0
data/lib/legion/llm/pipeline/steps/post_response.rb +10 -1
data/lib/legion/llm/pipeline/steps/prompt_cache.rb +90 -0
data/lib/legion/llm/pipeline/steps/span_annotator.rb +95 -0
data/lib/legion/llm/pipeline/steps/tier_assigner.rb +61 -0
data/lib/legion/llm/pipeline/steps/token_budget.rb +47 -0
data/lib/legion/llm/pipeline/steps.rb +5 -0
data/lib/legion/llm/pipeline.rb +1 -0
data/lib/legion/llm/provider_registry.rb +32 -0
data/lib/legion/llm/router.rb +2 -2
data/lib/legion/llm/settings.rb +80 -7
data/lib/legion/llm/token_tracker.rb +117 -0
data/lib/legion/llm/usage.rb +30 -0
data/lib/legion/llm/version.rb +1 -1
data/lib/legion/llm.rb +43 -3
metadata +11 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 7faa875219df27f00724444b192db231e6ef9f0b9a56da041791fbec1a17abb0
-  data.tar.gz: 271946a65933ed5366a9240490f4ba2df08f9a5c70200ab3b2605d413f6da611
+  metadata.gz: f65ac724c32de98ddfa324545b62e81cda38e27efdcdcbceb21abd21729ae599
+  data.tar.gz: 7c02a90eac3bda99512da956c889a06084980468c034c25e0c602d7e06db7ac3
 SHA512:
-  metadata.gz: 5b75a996f84fa52a56e2dd145ae411007e7b0e82ebbf4c3ebf472d3368c8fb9647c311de67eb3500861c2758ba2c62ae7d90c7f8540c2080625ee99ba40d21c2
-  data.tar.gz: d390c9821b76cca8889c46a5f1999408de60dadb5d0c6db05c3fa7b0e68faa152bd557b9e19e38897b5ff52af8a05be26f74f0dc32a10ea09a282247a62f0c60
+  metadata.gz: 71f7496e4df651c8d93bf3ac27059a2075f0b82299afa1f61f98138dc81db90ed3139c27b933774969d7d727ff9483db2a92514d82460a3f3de1c2dfbbff44ff
+  data.tar.gz: 6757e931ab1bef7d95c1470a3cf24077fa777683955bc0e5ed6ab6b7d7ef6a2f6f4f613668b6395a52a4200cdb97ae9aebc80f6bc9af5ee896c8a44142215425

data/CHANGELOG.md CHANGED

@@ -2,6 +2,29 @@
 ## [Unreleased]
+## [0.6.0] - 2026-03-31
+### Added
+- `Legion::LLM::ProviderRegistry` — thread-safe registry for native lex-* provider extensions: `register(name, ext)`, `for(name)`, `available`, `registered?(name)`, `reset!`; cleared automatically on `Legion::LLM.shutdown` (closes #37)
+- `Legion::LLM::NativeDispatch` — native provider dispatch layer: `dispatch_chat`, `dispatch_embed`, `dispatch_stream`, `dispatch_count_tokens` route calls to registered lex-* extension modules and return standardized `{ result:, usage: Usage }` hashes; raises `ProviderError` when provider is not registered (closes #37)
+- `Legion::LLM::NativeResponseAdapter` — adapter wrapping native dispatch result hash to expose the same `.content`, `.input_tokens`, `.output_tokens`, `.usage` interface as a RubyLLM response object (closes #37)
+- `provider_layer` settings section: `mode` (`'ruby_llm'` default / `'native'` / `'auto'`), `native_providers` (default `['claude', 'bedrock']`), `fallback_to_ruby_llm` (default `true`); `ruby_llm` mode preserves all existing behavior unchanged (closes #37)
+- Auto-registration in `Legion::LLM.start`: detects loaded lex-* extensions via `Object.const_defined?` and registers them — `lex-claude` → `:claude`/`:anthropic`, `lex-bedrock` → `:bedrock`, `lex-openai` → `:openai`, `lex-gemini` → `:gemini`; no hard dependencies added (closes #37)
+- `Pipeline::Executor` provider layer integration: `use_native_dispatch?` checks `provider_layer.mode`; `execute_provider_request_native` calls `NativeDispatch.dispatch_chat` and wraps result in `NativeResponseAdapter`, falls back to RubyLLM when `fallback_to_ruby_llm: true`; `execute_provider_request_ruby_llm` is the extracted RubyLLM path (default, no behavior change) (closes #37)
+- Optional adversarial debate pipeline step for high-stakes decisions (closes #28): `Pipeline::Steps::Debate` runs a multi-round advocate/challenger/judge debate after `provider_call`; the initial response is the advocate, a challenger model critiques it, the advocate rebuts, and a judge model synthesizes all sides into the final response; activation via `debate: true` in `chat()` kwargs, or `Legion::Settings[:llm][:debate][:enabled]`, or GAIA auto-trigger when `gaia_auto_trigger: true` and `high_stakes`/`debate_recommended` are set in the advisory enrichment; debate is disabled by default; GAIA auto-trigger defaults to false in v0.6.0; different models are required for each role (advocate, challenger, judge) to avoid training bias — model rotation picks from enabled providers automatically when not explicitly configured; model strings use `provider:model` format; all LLM calls use `chat_direct` to avoid pipeline recursion; configurable via `debate.default_rounds` (default 1), `debate.max_rounds` (cap, default 3), `debate.advocate_model`, `debate.challenger_model`, `debate.judge_model`, `debate.model_selection_strategy` (default `'rotate'`); debate metadata (`enabled`, `rounds`, `advocate_model`, `challenger_model`, `judge_model`, `advocate_summary`, `challenger_summary`, `judge_confidence`) stored in `enrichments['debate:result']`; gracefully degrades to single-model mode with a warning when fewer than 2 models are available
+- Async context curation (`Legion::LLM::ContextCurator`): keeps LLM context lean without compaction (closes #38). Heuristic curation runs async in `Thread.new` after each `step_context_store` — zero latency impact. Curated messages are used in `step_context_load` when available, falling back to raw history. Heuristic pipeline: `strip_thinking` removes `<thinking>` blocks; `distill_tool_result` summarizes large tool outputs by tool type (`read_file` → line count + first/last, `search`/`grep` → match counts, `bash` → exit code + last lines, default → char count + preview); `fold_resolved_exchanges` detects multi-turn clarification reaching agreement and folds to a system note; `evict_superseded` keeps only the latest read of each file path; `dedup_similar` removes near-duplicate messages via Jaccard similarity (delegates to `Compressor.deduplicate_messages`). LLM-assisted mode is built but off by default (`llm_assisted: false`); when enabled with `mode: 'llm_assisted'`, a configurable small/fast model produces better summaries with automatic fallback to heuristic on any error. All behavior gated by `Legion::Settings[:llm][:context_curation]`: `enabled` (default `true`), `mode` (`'heuristic'`), `llm_assisted` (`false`), `llm_model` (`nil`), `tool_result_max_chars` (2000), `thinking_eviction` (`true`), `exchange_folding` (`true`), `superseded_eviction` (`true`), `dedup_enabled` (`true`), `dedup_threshold` (0.85), `target_context_tokens` (40000).
+- Message chain architecture with parent links and sidechain support in `ConversationStore` (closes #39): every message now carries `id` (UUID), `parent_id`, `sidechain` (default `false`), `message_group_id`, and `agent_id` fields; `build_chain(conversation_id, include_sidechains: false)` reconstructs ordered message history from parent links with rooted-leaf selection, parallel sibling recovery via `message_group_id`, and orphan appending; `sidechain_messages(conversation_id, agent_id: nil)` queries background/subagent messages with optional agent filter; `branch(conversation_id, from_message_id:)` creates a new conversation by copying history up to the given message; `store_metadata` / `read_metadata` provide tail-window session metadata storage; `migrate_parent_links!` backfills parent links on pre-migration sequential data; `messages()` backward-compatible flat array uses chain reconstruction when parent links are present, seq ordering otherwise; DB persistence adds `message_id`, `parent_id`, `sidechain`, `message_group_id`, `agent_id` columns when present (graceful degradation without migration)
+- Per-pipeline-step OTEL child spans for distributed tracing (closes #21): `Pipeline::Steps::SpanAnnotator` maps step audit/enrichment data to OTEL span attributes (`rbac.outcome`, `classification.pii_detected`, `billing.estimated_cost_usd`, `rag.entry_count`, `routing.strategy`, `gen_ai.usage.input_tokens`, `confidence.score`, etc.); `Pipeline::Executor#execute_step` wraps each step in a `Legion::Telemetry.with_span("pipeline.<name>", kind: :internal)` child span; `annotate_top_level_span` sets `legion.pipeline.steps_executed`, `legion.pipeline.steps_skipped`, and `gen_ai.usage.cost_usd` on the top-level span after all steps complete; all wrapping gracefully no-ops when `Legion::Telemetry` is not defined or `enabled?` returns false, or when `telemetry.pipeline_spans` is set to `false`; telemetry errors never crash the pipeline
+- Proactive model tier routing by task role and caller context (`Pipeline::Steps::TierAssigner`, step 8a): assigns routing tier before `step_routing` fires, based on GAIA routing hints, caller identity pattern matching (via `File.fnmatch?`), content classification (PHI/PII), and request priority; overrides are suppressed when the caller already sets an explicit `tier:`; default role mappings cover `gaia:tick:*`, `gaia:dream:*`, `system:guardrails`, `system:reflection`, and `user:*`; custom mappings configurable via `Legion::Settings[:llm][:routing][:tier_mappings]`; `step_routing` consumes the proactive assignment when no explicit caller intent is present (closes #22)
+- `:quick_reply` pipeline profile for latency-sensitive conversational turns — skips 12 non-essential steps (idempotency, conversation_uuid, context_load, classification, gaia_advisory, rag_context, mcp_discovery, confidence_scoring, tool_calls, context_store, post_response, knowledge_capture), retaining only the 8 steps required for a valid provider round-trip (closes #27)
+- Conversation auto-summarization at token threshold: `Compressor.auto_compact` compacts history when estimated tokens exceed `conversation.summarize_threshold` (default 50,000); preserves the most recent N turns (`preserve_recent`, default 10); older turns are summarized via `Compressor.summarize_messages` with LLM or stopword fallback; `Compressor.estimate_tokens` provides character-count/4 approximation; `ConversationStore.replace` atomically replaces in-memory history after compaction; wired into `Pipeline::Executor#step_context_load`; controlled by `conversation.auto_compact` (default `true`) (closes #26)
+- `Legion::LLM::Usage` standard struct (`lib/legion/llm/usage.rb`): immutable `::Data.define` value object with `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_write_tokens`, and `total_tokens` fields; `total_tokens` auto-calculated as `input + output` when not explicitly provided; all fields default to 0 (closes #35)
+- Pipeline `extract_tokens` now returns a `Usage` struct instead of a plain hash when the provider response exposes token counts; populates `cache_read_tokens` and `cache_write_tokens` from response when available
+- Asymmetric embedding prefix injection by task type: `generate` and `generate_batch` accept a `task:` keyword (`:document` or `:query`, default `:document`). `PREFIX_REGISTRY` maps model names to task-specific prefixes (`nomic-embed-text` gets `search_document:` / `search_query:`, `mxbai-embed-large` gets a query prefix). Prefix injection is controlled by `Legion::Settings.dig(:llm, :embedding, :prefix_injection)` (default `true`). Unknown models are passed through unchanged (closes #24).
+- Prompt caching pipeline step (`Pipeline::Steps::PromptCache`): `apply_cache_control` marks the last system block with `cache_control: { type: 'ephemeral' }` when content exceeds `min_tokens * 4` chars; `sort_tools_deterministically` sorts tool schemas by name for stable cache keys; `apply_conversation_breakpoint` marks the last stable prior message with a cache breakpoint; all behavior gated behind `Legion::Settings.dig(:llm, :prompt_caching, :enabled)` (default: `false`); individual sub-features controlled by `cache_system_prompt`, `cache_tools`, `cache_conversation`, `sort_tools` flags; `scope` defaults to `'ephemeral'`; wired into `Pipeline::Executor#execute_provider_request` for system prompt and conversation history (closes #36)
+- Escalation chain wired into `Pipeline::Executor#step_provider_call`: when `routing.escalation.enabled` and `pipeline_enabled` are both `true`, the provider call runs through the `EscalationChain` with per-attempt `QualityChecker` evaluation; non-retryable errors (`AuthError`, `RateLimitError`, `PrivacyModeError`) bubble up immediately; quality failures and transient errors advance to the next resolution in the chain; raises `EscalationExhausted` when all attempts are exhausted; timeline records an `escalation:attempt` event per try; `step_routing` populates `@escalation_chain` via `Router.resolve_chain` when escalation is enabled; `pipeline_enabled: true` added to `routing.escalation` defaults (closes #23).
+- Token budget enforcement at the LLM call boundary (closes #25): `Legion::LLM::TokenTracker` thread-safe per-session accumulator (`record`, `total_tokens`, `session_exceeded?`, `session_warning?`, `reset!`, `summary`); `Pipeline::Steps::TokenBudget` pipeline step runs before `provider_call` — raises `TokenBudgetExceeded` when the estimated request input exceeds `max_input_tokens` (from `request.extra`) or the session total hits `session_max_tokens`; logs a warning at `session_warn_tokens`; `TokenBudgetExceeded` added to typed error hierarchy; token counts recorded automatically via `Pipeline::Steps::PostResponse#record_token_usage` after each successful provider call; budget settings under `Legion::Settings[:llm][:budget]`: `session_max_tokens` (nil = off), `session_warn_tokens` (nil = off), `daily_max_tokens` (nil = off, future enforcement).
 ## [0.5.24] - 2026-03-31
 ### Added
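
The caller-pattern part of the proactive tier routing entry above can be pictured with a short sketch. This is not the gem's `TierAssigner`; it only illustrates glob matching with `File.fnmatch?` and the rule that an explicit `tier:` suppresses any override. The patterns come from the changelog entry, while the tier names (`:local`, `:fleet`, `:cloud`) and the `assign_tier` helper are hypothetical.

```ruby
# Hypothetical mapping table: the patterns are quoted from the changelog,
# the tier names on the right are invented for illustration only.
TIER_MAPPINGS = {
  'gaia:tick:*'       => :local,
  'gaia:dream:*'      => :local,
  'system:guardrails' => :fleet,
  'system:reflection' => :fleet,
  'user:*'            => :cloud
}.freeze

# Returns the caller's explicit tier when given, otherwise the tier of the
# first glob pattern that matches the caller identity, or nil if none match.
def assign_tier(caller_id, explicit_tier: nil, mappings: TIER_MAPPINGS)
  return explicit_tier if explicit_tier

  _pattern, tier = mappings.find { |pattern, _tier| File.fnmatch?(pattern, caller_id) }
  tier
end
```

Because a Ruby `Hash` preserves insertion order, the first matching pattern wins in this sketch, so more specific patterns would need to be listed before broader ones.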

data/Gemfile CHANGED

@@ -9,6 +9,7 @@ group :test do
   gem 'rspec'
   gem 'rspec_junit_formatter'
   gem 'rubocop'
+  gem 'rubocop-legion'
   gem 'simplecov'
   gem 'webmock'
 end

data/lib/legion/llm/cache.rb CHANGED

@@ -63,7 +63,7 @@ module Legion
       end
       private_class_method def self.llm_settings
-        if Legion.const_defined?('Settings')
+        if Legion.const_defined?('Settings', false)
           Legion::Settings[:llm]
         else
           Legion::LLM::Settings.default
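
The second argument added in this hunk is the `inherit` flag of `Module#const_defined?`. With the default (`true`), a lookup on a module also finds constants defined at the top level on `Object`; with `false`, only constants defined directly on the receiver count. A minimal sketch of the difference, using made-up names (`DemoNamespace`, `DemoTopLevelSettings`) rather than anything from the gem:

```ruby
# Both names below are hypothetical and exist only for this demonstration.
module DemoNamespace; end
class DemoTopLevelSettings; end # lives on Object, not inside DemoNamespace

# Default lookup (inherit = true) also searches Object for modules.
INHERITED_LOOKUP = DemoNamespace.const_defined?('DemoTopLevelSettings')

# inherit = false restricts the check to constants owned by DemoNamespace.
OWN_ONLY_LOOKUP = DemoNamespace.const_defined?('DemoTopLevelSettings', false)
```

Here `INHERITED_LOOKUP` is `true` and `OWN_ONLY_LOOKUP` is `false`, which suggests the change guards against an unrelated top-level `Settings` constant being mistaken for `Legion::Settings`.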

data/lib/legion/llm/compressor.rb CHANGED

@@ -88,6 +88,39 @@ module Legion
           { messages: kept, removed: removed, original_count: messages.size }
         end
+        def auto_compact(messages, target_tokens:, preserve_recent: 10)
+          return messages if messages.size <= preserve_recent
+          recent = messages.last(preserve_recent)
+          older  = messages[0..-(preserve_recent + 1)]
+          summarized = summarize_messages(older, max_tokens: target_tokens / 2)
+          compaction_msg = {
+            role:     'system',
+            content:  "[Conversation compacted: #{older.size} turns summarized]",
+            metadata: {
+              compacted_at:   Time.now.utc.iso8601,
+              original_count: messages.size,
+              preserved:      recent.size
+            }
+          }
+          summary_msg = {
+            role:    'system',
+            content: summarized[:summary]
+          }
+          [compaction_msg, summary_msg, *recent].flatten
+        end
+        def estimate_tokens(messages)
+          return 0 if messages.nil? || messages.empty?
+          total_chars = messages.sum { |m| m[:content].to_s.length }
+          total_chars / 4
+        end
         def stopwords_for_level(level)
           return [] if level <= NONE
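
To make the shape of the compaction in this hunk easier to follow, here is a standalone sketch of how the character-count/4 estimate could gate it. It is an assumption-laden illustration, not the gem's code: `compact_if_needed` is a hypothetical helper, and the summary message is a fixed placeholder where the real `auto_compact` calls `summarize_messages`.

```ruby
# Rough token estimate: total content characters divided by 4,
# mirroring the approximation used by estimate_tokens in the hunk above.
def estimate_tokens(messages)
  return 0 if messages.nil? || messages.empty?

  messages.sum { |m| m[:content].to_s.length } / 4
end

# Hypothetical wrapper: compacts only when the estimate exceeds the threshold.
# The summary is a placeholder string; no LLM or stopword summarizer is called.
def compact_if_needed(messages, threshold:, preserve_recent: 10)
  return messages if estimate_tokens(messages) <= threshold
  return messages if messages.size <= preserve_recent

  recent  = messages.last(preserve_recent)
  older   = messages[0..-(preserve_recent + 1)]
  summary = { role: 'system', content: "[#{older.size} earlier turns summarized]" }
  [summary, *recent]
end
```

For example, twelve messages of 40 characters each estimate to 120 tokens; with `threshold: 100` and the default `preserve_recent`, the two oldest messages collapse into one placeholder and the ten most recent are kept verbatim.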

data/lib/legion/llm/context_curator.rb ADDED

@@ -0,0 +1,308 @@
+# frozen_string_literal: true
+module Legion
+  module LLM
+    class ContextCurator
+      CURATED_KEY = :__curated__
+      def initialize(conversation_id:)
+        @conversation_id = conversation_id
+        @curated_cache   = nil
+      end
+      # Called async after each turn completes — zero latency impact.
+      def curate_turn(turn_messages:, assistant_response:)
+        return unless enabled?
+        Thread.new do
+          curated = turn_messages.map { |msg| curate_message(msg, assistant_response) }
+          store_curated(@conversation_id, curated)
+          @curated_cache = nil
+        rescue StandardError => e
+          Legion::Logging.warn("ContextCurator: async curation failed: #{e.message}") if defined?(Legion::Logging)
+        end
+      end
+      # Called sync when building next API request.
+      # Returns curated messages when available; nil means use raw history.
+      def curated_messages
+        return nil unless enabled?
+        @curated_messages ||= load_curated(@conversation_id)
+      end
+      # Heuristic: distill a single tool-result message to a compact summary.
+      def distill_tool_result(msg, _assistant_context = nil)
+        content = msg[:content].to_s
+        max_chars = setting(:tool_result_max_chars, 2000)
+        return msg if content.length <= max_chars
+        summary = heuristic_tool_summary(content, tool_name_from(msg))
+        msg.merge(content: summary, curated: true, original_content: content)
+      end
+      # Heuristic: remove extended thinking blocks, keep conclusions.
+      def strip_thinking(msg)
+        return msg unless setting(:thinking_eviction, true)
+        content = msg[:content].to_s
+        stripped = content
+                   .gsub(%r{<thinking>.*?</thinking>}m, '')
+                   .gsub(/^#+\s*[Tt]hinking.*?\n(?:(?!^#+\s).)*\n/m, '')
+                   .strip
+        return msg if stripped == content || stripped.empty?
+        msg.merge(content: stripped, curated: true, original_content: content)
+      end
+      # Heuristic: detect multi-turn clarification that reached agreement; fold to single system note.
+      def fold_resolved_exchanges(messages)
+        return messages unless setting(:exchange_folding, true)
+        result = []
+        i = 0
+        while i < messages.length
+          window = messages[i, 4]
+          if resolved_exchange?(window)
+            conclusion = window.last[:content].to_s[0, 300]
+            note = {
+              role:             :system,
+              content:          "[Exchange resolved: #{conclusion}]",
+              curated:          true,
+              original_content: window.map { |m| m[:content] }.join("\n")
+            }
+            result << note
+            i += window.length
+          else
+            result << messages[i]
+            i += 1
+          end
+        end
+        result
+      end
+      # Heuristic: if same file was read multiple times, keep only the latest read.
+      def evict_superseded(messages)
+        return messages unless setting(:superseded_eviction, true)
+        file_last_seen = {}
+        messages.each_with_index do |msg, idx|
+          path = extract_file_path(msg[:content].to_s)
+          file_last_seen[path] = idx if path
+        end
+        messages.each_with_index.reject do |msg, idx|
+          path = extract_file_path(msg[:content].to_s)
+          path && file_last_seen[path] != idx
+        end.map(&:first)
+      end
+      # Heuristic: deduplicate near-identical messages using Jaccard similarity.
+      def dedup_similar(messages, threshold: nil)
+        return messages unless setting(:dedup_enabled, true)
+        threshold ||= setting(:dedup_threshold, 0.85)
+        result = Compressor.deduplicate_messages(messages, threshold: threshold)
+        result[:messages]
+      end
+      # LLM-assisted distillation: uses small/fast model to summarize tool results.
+      # Falls back to heuristic on any error.
+      def llm_distill_tool_result(msg, assistant_response = nil)
+        return distill_tool_result(msg, assistant_response) unless llm_assisted?
+        content = msg[:content].to_s
+        max_chars = setting(:tool_result_max_chars, 2000)
+        return msg if content.length <= max_chars
+        summary = llm_summarize_tool_result(content, tool_name_from(msg))
+        if summary
+          msg.merge(content: summary, curated: true, original_content: content)
+        else
+          distill_tool_result(msg, assistant_response)
+        end
+      rescue StandardError => e
+        Legion::Logging.warn("ContextCurator: LLM distillation failed, using heuristic: #{e.message}") if defined?(Legion::Logging)
+        distill_tool_result(msg, assistant_response)
+      end
+      private
+      def enabled?
+        setting(:enabled, true)
+      end
+      def llm_assisted?
+        enabled? &&
+          setting(:llm_assisted, false) &&
+          setting(:mode, 'heuristic') == 'llm_assisted'
+      end
+      def curation_settings
+        Legion::Settings.dig(:llm, :context_curation) || {}
+      rescue StandardError
+        {}
+      end
+      def setting(key, default)
+        val = curation_settings[key]
+        val.nil? ? default : val
+      end
+      def curate_message(msg, assistant_response)
+        return msg if msg[:role] == :system
+        msg = strip_thinking(msg)
+        if llm_assisted?
+          llm_distill_tool_result(msg, assistant_response)
+        else
+          distill_tool_result(msg, assistant_response)
+        end
+      end
+      def store_curated(conversation_id, curated_messages)
+        curated_messages.each do |msg|
+          next unless msg[:curated]
+          ConversationStore.append(
+            conversation_id,
+            role:             CURATED_KEY,
+            content:          msg[:content],
+            original_content: msg[:original_content],
+            source_role:      msg[:role]
+          )
+        end
+      rescue StandardError => e
+        Legion::Logging.warn("ContextCurator: store_curated failed: #{e.message}") if defined?(Legion::Logging)
+      end
+      def load_curated(conversation_id)
+        return nil unless ConversationStore.conversation_exists?(conversation_id)
+        raw = ConversationStore.messages(conversation_id)
+        curated = raw.select { |m| m[:role] == CURATED_KEY }
+        return nil if curated.empty?
+        regular = raw.reject { |m| m[:role] == CURATED_KEY }
+        apply_curation_pipeline(regular)
+      rescue StandardError => e
+        Legion::Logging.warn("ContextCurator: load_curated failed: #{e.message}") if defined?(Legion::Logging)
+        nil
+      end
+      # Apply heuristic curation pipeline to a set of messages.
+      def apply_curation_pipeline(messages)
+        return messages if messages.nil? || messages.empty?
+        result = messages.map { |msg| strip_thinking(msg) }
+        result = result.map { |msg| distill_tool_result(msg) }
+        result = fold_resolved_exchanges(result)
+        result = evict_superseded(result)
+        dedup_similar(result)
+      rescue StandardError => e
+        Legion::Logging.warn("ContextCurator: apply_curation_pipeline failed: #{e.message}") if defined?(Legion::Logging)
+        messages
+      end
+      # Build a heuristic summary for a tool result based on detected tool type.
+      def heuristic_tool_summary(content, tool_name)
+        lines = content.lines
+        line_count = lines.length
+        char_count = content.length
+        case tool_name&.to_s
+        when /read_file|read/
+          first_line = lines.first.to_s.chomp
+          last_line  = lines.last.to_s.chomp
+          "Read file (#{line_count} lines). First: #{first_line[0, 80]}... Last: #{last_line[0, 80]}"
+        when /search|grep|glob/
+          file_count = content.scan(%r{[^\s/]+/[^\s]+}).uniq.length
+          "Search returned #{line_count} matches across #{file_count} files"
+        when /bash|run_command|execute/
+          exit_match = content.match(/exit(?:\s+code)?:?\s*(\d+)/i)
+          exit_code  = exit_match ? exit_match[1] : '0'
+          last_lines = lines.last(3).map(&:chomp).join(' | ')
+          "Command output (#{line_count} lines), exit #{exit_code}: #{last_lines[0, 200]}"
+        else
+          preview = content[0, 200]
+          "Tool result (#{line_count} lines, #{char_count} chars): #{preview}"
+        end
+      end
+      # Detect tool name from message metadata or content.
+      def tool_name_from(msg)
+        msg[:tool_name] || msg[:name] || infer_tool_name(msg[:content].to_s)
+      end
+      def infer_tool_name(content)
+        return :read_file   if content.match?(/\A(?:File:|Read:|#\s+\S+\.rb|\d+\t)/)
+        return :bash        if content.match?(/exit code|STDOUT|STDERR/i)
+        return :search      if content.match?(/\d+ match(?:es)? (?:across|in)/i)
+        nil
+      end
+      # Detect if a 2–4 message window represents a resolved Q&A exchange.
+      def resolved_exchange?(window)
+        return false if window.length < 2
+        roles = window.map { |m| m[:role].to_s }
+        # Simple pattern: user -> assistant -> user -> assistant with clarification signals
+        return false unless roles.first == 'user' && roles.last == 'assistant'
+        contents = window.map { |m| m[:content].to_s.downcase }
+        clarification_signals = ['clarif', 'what do you mean', 'i see', 'understood', 'got it', 'correct', 'exactly', 'yes', 'right', 'agree']
+        conclusion_signals    = ['in summary', 'to summarize', 'in conclusion', 'therefore', 'so to answer', 'the answer is']
+        has_clarification = contents.any? { |c| clarification_signals.any? { |s| c.include?(s) } }
+        has_conclusion    = contents.last.length < 500 || conclusion_signals.any? { |s| contents.last.include?(s) }
+        has_clarification && has_conclusion
+      end
+      # Extract a file path from content heuristically.
+      def extract_file_path(content)
+        match = content.match(%r{(?:reading|read|loaded?|opened?|file:)\s+[`'"]?(/[^\s`'"]+)[`'"]?}i) ||
+                content.match(%r{^(/(?:[\w.-]+/)*[\w.-]+\.\w+)})
+        match ? match[1] : nil
+      end
+      # Use a small/fast LLM model to distill a tool result.
+      def llm_summarize_tool_result(content, tool_name)
+        return nil unless defined?(Legion::LLM) && Legion::LLM.respond_to?(:chat_direct)
+        model = setting(:llm_model, nil) || detect_small_model
+        return nil unless model
+        prompt = build_distillation_prompt(content, tool_name)
+        response = Legion::LLM.chat_direct(model: model, message: prompt)
+        response.respond_to?(:content) ? response.content : nil
+      rescue StandardError => e
+        Legion::Logging.warn("ContextCurator: llm_summarize_tool_result failed: #{e.message}") if defined?(Legion::Logging)
+        nil
+      end
+      def build_distillation_prompt(content, tool_name)
+        tool_hint = tool_name ? " (from #{tool_name})" : ''
+        <<~PROMPT.strip
+          Summarize this tool result#{tool_hint} in 1-3 sentences, preserving key facts, file paths, line numbers, and error messages. Omit irrelevant details.
+          Tool result:
+          #{content[0, 4000]}
+        PROMPT
+      end
+      def detect_small_model
+        providers = Legion::Settings.dig(:llm, :providers) || {}
+        %w[ollama].each do |provider|
+          config = providers[provider.to_sym] || providers[provider]
+          return config[:default_model] if config.is_a?(Hash) && config[:enabled] && config[:default_model]
+        end
+        nil
+      rescue StandardError
+        nil
+      end
+    end
+  end
+end
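
The two-pass structure of `evict_superseded` in the file above can be tried in isolation. The sketch below lifts the first path pattern from `extract_file_path` into a standalone script and omits the settings gate and the second fallback pattern; `PATH_PATTERN` and `extract_path` are names chosen for this illustration.

```ruby
# First of the two patterns used by extract_file_path in the added file.
PATH_PATTERN = %r{(?:reading|read|loaded?|opened?|file:)\s+[`'"]?(/[^\s`'"]+)[`'"]?}i

# Returns the absolute path mentioned in the content, or nil when none is found.
def extract_path(content)
  match = content.match(PATH_PATTERN)
  match && match[1]
end

# Pass 1 records the last index at which each path appears.
# Pass 2 drops any message that mentions a path at an earlier index.
def evict_superseded(messages)
  last_seen = {}
  messages.each_with_index do |msg, idx|
    path = extract_path(msg[:content].to_s)
    last_seen[path] = idx if path
  end

  messages.each_with_index.reject do |msg, idx|
    path = extract_path(msg[:content].to_s)
    path && last_seen[path] != idx
  end.map(&:first)
end
```

Messages that mention no path are never evicted, so ordinary conversational turns pass through untouched while only earlier reads of a re-read file are dropped.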