kairos-chain 3.12.0 → 3.13.0 (RubyGems)

This diff compares the contents of two publicly released versions of the package as they appear in its public registry. It is provided for informational purposes only.

Files changed (10)

checksums.yaml +4 -4
data/CHANGELOG.md +45 -0
data/lib/kairos_mcp/version.rb +1 -1
data/templates/skillsets/agent/config/agent.yml +14 -0
data/templates/skillsets/agent/lib/agent/session.rb +31 -0
data/templates/skillsets/agent/test/test_agent_complexity_review.rb +585 -0
data/templates/skillsets/agent/tools/agent_step.rb +429 -1
data/templates/skillsets/llm_client/lib/llm_client/claude_code_adapter.rb +24 -3
data/templates/skillsets/llm_client/tools/llm_call.rb +6 -3
metadata +2 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 9e4f14b45f5ece26ac4279095f818d8772efcf5ae0d1dfb95924e2f69a2c2bcc
-  data.tar.gz: a51ac4d001f7e5c5c17491fdb168da11b04d17502972d80d1119ac99f2243acf
+  metadata.gz: 9677cd39be103647a6f733640b46d4785fb7898bff420596ac9bf34880b5d9f7
+  data.tar.gz: daaff4ac3d451e272429787ccd1ffd0c22c3522c355a382f0129d89d806b39c3
 SHA512:
-  metadata.gz: 0cb3cba2544bbb1244945199b783db04da81573dfde6d45d8c1c46d0c310e0fe26e9c521b7983263e66ec2d363c5614f630336ecdd10af056ca3ea9cb8b5ccf6
-  data.tar.gz: 64ef0f93ebc9922c5acc8bbc6f0733734e9be5445c810f51f1c2b4f55c975764f39277bf07d48a5d1477010fdb7b1c7dfe9452f88bcf3db0ed8bbd1339caf3f9
+  metadata.gz: 73aa8e44ecb933877178888fe64c494a5a85544051aa68f58d7823e541322a53a9a39d73d907bd6c30974bb79a7b4ada52c31c77e0165d5a17ba71df7981e663
+  data.tar.gz: de7dc8b133bdc276781da26aa6fea468b26e7190af199cfaa4c0de2e966622cd171c040431c4baa7c0712e4728b4ffbe4fe8d661e594cf07c54c26b0c28b0b1d

data/CHANGELOG.md CHANGED

@@ -4,6 +4,51 @@ All notable changes to the `kairos-chain` gem will be documented in this file.
 This project follows [Semantic Versioning](https://semver.org/).
+## [3.13.0] - 2026-04-02
+### Added
+- **Complexity-driven review for Agent auto mode** — New Gate 5.5a/b (pre-ACT) and
+  Gate 6.5 (post-ACT) in the Agent autonomous OODA loop.
+  - **Structural complexity assessment** with 7 signals: `high_risk`, `many_steps`,
+    `design_scope`, `l0_change`, `core_files`, `multi_file`, `state_mutation`
+  - **LLM self-assessment merge**: DECIDE prompt requests `complexity_hint`; merge rule
+    caps LLM at structural + 1 level (prevents over-reporting)
+  - **Gate 5.5a**: L0 changes always checkpoint with multi-LLM review prompt generation
+  - **Gate 5.5b**: High complexity triggers Persona Assembly review (inner retry loop
+    with max re-DECIDE attempts, risk/loop/complexity re-checks per revision)
+  - **Gate 6.5**: Medium complexity runs post-ACT lightweight advisory review
+  - Low complexity: no overhead (unchanged flow)
+  - Parse failures default to REVISE (never silent APPROVE)
+  - Persona definitions loaded from L1 knowledge with hardcoded fallback
+  - Configuration: `complexity_review` section in `agent.yml` (personas, retries,
+    L0 checkpoint policy, post-ACT toggle)
+  - New `review` phase config in `agent.yml` (max_llm_calls, max_tool_calls)
+  - Session: `save_review_result`, `load_review_result`, `save_progress_amendment`
+  - Tests: 37 new (complexity assessment, persona review parsing, config, session, prompts)
+### Fixed
+- **`llm_call.rb` eager adapter loading** — All provider adapters were unconditionally
+  required at startup, crashing with `LoadError` when optional gems (`faraday`,
+  `aws-sdk`) were not installed. Now lazy-loads adapters in `build_adapter()`;
+  only `claude_code_adapter` and base modules loaded at startup.
+- **`claude_code_adapter` recursive MCP server loading** — `claude -p` subprocess
+  loaded `.mcp.json` and spawned additional MCP server instances, causing deadlocks
+  (stdio) or port conflicts (HTTP). Fixed with `--mcp-config '{"mcpServers":{}}'`
+  and `--no-session-persistence`.
+- **`claude_code_adapter` missing timeout** — `Open3.capture3` had no timeout,
+  risking indefinite hangs. Wrapped with `Timeout.timeout` (default 120s,
+  configurable via `timeout_seconds` in `llm_client.yml`).
+### Design Process
+- Complexity review design: 2R x 2 LLMs (Claude Team, Cursor Composer) → APPROVED
+- Complexity review impl: R1 x 3 LLMs (Claude Team, Cursor, Codex) → fixes applied
+  - Codex found off-by-one in retry counter and cycle number (both fixed)
+  - R2 (Claude Team) → APPROVED
+- llm_client fixes: reported by SUSHI self-maintenance MCP project, verified in upstream
 ## [3.12.0] - 2026-04-02
 ### Added
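
The merge rule in the 3.13.0 entry above ("caps LLM at structural + 1 level") can be sketched as follows. This is a minimal illustration inferred from the changelog wording and the merge tests in this release, not the gem's actual `merge_complexity`; the method name `merge_complexity_sketch` and the `LEVELS` constant are assumptions made for the example.

```ruby
# Ordered complexity levels; the index doubles as the numeric rank.
LEVELS = %w[low medium high].freeze

# Hypothetical stand-in for the gem's merge_complexity.
# structural: { level: 'low'|'medium'|'high', signals: [...] }
# llm_hint:   hash with string or symbol keys, or nil
def merge_complexity_sketch(structural, llm_hint)
  return structural if llm_hint.nil?

  hint_level   = (llm_hint['level'] || llm_hint[:level]).to_s
  hint_signals = Array(llm_hint['signals'] || llm_hint[:signals])

  s_idx = LEVELS.index(structural[:level]) || 0
  h_idx = LEVELS.index(hint_level) || 0

  # The LLM may raise the level by at most one step above the structural
  # assessment, and it can never lower it.
  capped = [h_idx, s_idx + 1].min
  final  = [s_idx, capped].max

  { level: LEVELS[final], signals: (structural[:signals] + hint_signals).uniq }
end

merged = merge_complexity_sketch({ level: 'low', signals: [] },
                                 { 'level' => 'high', 'signals' => ['semantic_complexity'] })
puts merged[:level]
```

Under these assumptions an LLM hint of `high` on top of a structural `low` yields `medium`, while a hint of `low` leaves a structural `high` untouched.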

data/lib/kairos_mcp/version.rb CHANGED

@@ -1,4 +1,4 @@
 module KairosMcp
-  VERSION = "3.12.0"
+  VERSION = "3.13.0"
   CHANGELOG_URL = "https://github.com/masaomi/KairosChain_2026/blob/main/CHANGELOG.md"
 end

data/templates/skillsets/agent/config/agent.yml CHANGED

@@ -17,6 +17,10 @@ phases:
   reflect:
     max_llm_calls: 3
     max_tool_calls: 0
+  review:
+    max_llm_calls: 5
+    max_tool_calls: 0
+    max_repair_attempts: 1
 # Default policy
 tool_blacklist:
@@ -62,5 +66,15 @@ agent_execute:
   default_budget_usd: 0.50
   max_budget_usd: 2.00
+# Complexity-driven review (Gate 5.5 / Gate 6.5)
+complexity_review:
+  enabled: true
+  personas: [pragmatic, skeptic]              # default personas for review
+  high_personas: [kairos, pragmatic, skeptic] # personas for high complexity
+  max_review_retries: 2                       # max re-DECIDE from review feedback
+  # review_budget_llm is controlled by phases.review.max_llm_calls (above)
+  l0_always_checkpoint: true                  # L0 changes always pause for human
+  post_act_review: true                       # enable medium-complexity post-ACT review
 # Audit
 audit_level: summary
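
To illustrate how the new keys might be consumed, here is a minimal sketch that loads a fragment equivalent to the additions above and applies defaults. The helper `review_settings` is hypothetical; the defaults it uses (`enabled` true, `max_review_retries` 2, `l0_always_checkpoint` true) mirror what the `agent_step.rb` hunk and the tests in this release suggest, but the gem's real lookup code may differ.

```ruby
require 'yaml'

# Fragment reproducing only the keys added in this release.
AGENT_YML = <<~YAML
  phases:
    review:
      max_llm_calls: 5
      max_tool_calls: 0
      max_repair_attempts: 1
  complexity_review:
    enabled: true
    personas: [pragmatic, skeptic]
    high_personas: [kairos, pragmatic, skeptic]
    max_review_retries: 2
    l0_always_checkpoint: true
    post_act_review: true
YAML

# Hypothetical helper: resolve review settings, falling back to defaults
# when the complexity_review section or a key is missing.
def review_settings(config)
  review_cfg = config['complexity_review'] || {}
  {
    enabled:              review_cfg.fetch('enabled', true),
    max_review_retries:   review_cfg['max_review_retries'] || 2,
    l0_always_checkpoint: review_cfg.fetch('l0_always_checkpoint', true),
    # The LLM budget for reviews lives under phases.review, not complexity_review.
    llm_budget:           config.dig('phases', 'review', 'max_llm_calls')
  }
end

settings = review_settings(YAML.safe_load(AGENT_YML))
puts settings.inspect
```

Using `fetch` with a default for the boolean keys matters here: `review_cfg['enabled'] || true` would silently turn an explicit `false` back into `true`.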

data/templates/skillsets/agent/lib/agent/session.rb CHANGED

@@ -161,8 +161,39 @@ module KairosMcp
           end
         end
+        # Save persona review result for audit trail.
+        def save_review_result(review)
+          File.write(review_path, JSON.pretty_generate(review))
+        end
+        # Load the last persona review result.
+        def load_review_result
+          return nil unless File.exist?(review_path)
+          JSON.parse(File.read(review_path), symbolize_names: true)
+        rescue JSON::ParserError
+          nil
+        end
+        # Append review concerns as a progress amendment entry.
+        # Called after run_act_reflect_internal which already incremented cycle_number,
+        # so @cycle_number is the current (post-increment) cycle.
+        def save_progress_amendment(concerns)
+          entry = {
+            'cycle'     => @cycle_number,
+            'timestamp' => Time.now.utc.iso8601,
+            'type'      => 'review_amendment',
+            'concerns'  => concerns
+          }
+          File.open(progress_path, 'a') { |f| f.puts(JSON.generate(entry)) }
+        end
         private
+        def review_path
+          File.join(session_dir, 'last_review.json')
+        end
         def session_dir
           dir = self.class.storage_path("agent_sessions/#{@session_id}")
           FileUtils.mkdir_p(dir)
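
The persistence pattern in this hunk (a pretty-printed JSON file for the last review, plus one JSON object per line appended to the progress log) can be tried in isolation. The sketch below uses free-standing helper functions and a temporary directory instead of the real `Session` class; the helper names and the `progress.jsonl` filename are assumptions for illustration only.

```ruby
require 'json'
require 'time'
require 'tmpdir'

# Write the review as pretty-printed JSON (overwrites any previous review).
def save_review(path, review)
  File.write(path, JSON.pretty_generate(review))
end

# Return the stored review with symbol keys, or nil if the file is
# missing or does not contain valid JSON.
def load_review(path)
  return nil unless File.exist?(path)

  JSON.parse(File.read(path), symbolize_names: true)
rescue JSON::ParserError
  nil
end

# Append one amendment entry as a single JSON line.
def append_amendment(path, cycle, concerns)
  entry = {
    'cycle'     => cycle,
    'timestamp' => Time.now.utc.iso8601,
    'type'      => 'review_amendment',
    'concerns'  => concerns
  }
  File.open(path, 'a') { |f| f.puts(JSON.generate(entry)) }
end

Dir.mktmpdir('review_sketch') do |dir|
  review_path = File.join(dir, 'last_review.json')
  save_review(review_path, { overall_verdict: 'APPROVE', key_findings: [] })
  puts load_review(review_path)[:overall_verdict]
end
```

Because the review is parsed with `symbolize_names: true`, a hash saved with symbol keys can be read back with the same keys even though JSON itself only stores strings.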

data/templates/skillsets/agent/test/test_agent_complexity_review.rb ADDED

@@ -0,0 +1,585 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+# Test suite for Complexity-Driven Review Integration
+# Tests: M1 (complexity assessment), M2 (persona review), M3 (autonomous loop), M4 (config)
+# Usage: ruby test_agent_complexity_review.rb
+$LOAD_PATH.unshift File.expand_path('../../lib', __dir__)
+$LOAD_PATH.unshift File.expand_path('../../../../lib', __dir__)
+require 'json'
+require 'yaml'
+require 'fileutils'
+require 'tmpdir'
+require 'digest'
+require 'time'
+require 'kairos_mcp/invocation_context'
+require 'kairos_mcp/tools/base_tool'
+require 'kairos_mcp/tool_registry'
+require_relative '../lib/agent'
+require_relative '../tools/agent_start'
+require_relative '../tools/agent_step'
+require_relative '../tools/agent_status'
+require_relative '../tools/agent_stop'
+$pass = 0
+$fail = 0
+def assert(description, &block)
+  result = block.call
+  if result
+    $pass += 1
+    puts "  PASS: #{description}"
+  else
+    $fail += 1
+    puts "  FAIL: #{description}"
+  end
+rescue StandardError => e
+  $fail += 1
+  puts "  FAIL: #{description} (#{e.class}: #{e.message})"
+  puts "        #{e.backtrace.first(3).join("\n        ")}"
+end
+def section(title)
+  puts "\n#{'=' * 60}"
+  puts "TEST: #{title}"
+  puts '=' * 60
+end
+# ---- Test infrastructure ----
+TMPDIR = Dir.mktmpdir('agent_complexity_test')
+module Autonomos
+  @storage_base = TMPDIR
+  def self.storage_path(subpath)
+    path = File.join(@storage_base, subpath)
+    FileUtils.mkdir_p(path)
+    path
+  end
+  def self.config
+    {}
+  end
+  module Ooda
+    COMPLEX_KEYWORDS = /\b(architect|design|refactor|migrat|restructur|integrat|security|auth)/i
+  end
+end
+require File.expand_path('../../../../.kairos/skillsets/autonomos/lib/autonomos/mandate',
+  File.dirname(__dir__))
+Session = KairosMcp::SkillSets::Agent::Session
+AgentStep = KairosMcp::SkillSets::Agent::Tools::AgentStep
+MandateAdapter = KairosMcp::SkillSets::Agent::MandateAdapter
+module Autoexec
+  class TaskDsl
+    def self.from_json(json_str)
+      parsed = JSON.parse(json_str)
+      raise ArgumentError, "Missing task_id" unless parsed['task_id']
+      raise ArgumentError, "Missing steps" unless parsed['steps'].is_a?(Array)
+      parsed
+    end
+  end
+end
+# ---- Mock tools ----
+class MockLlmCall < KairosMcp::Tools::BaseTool
+  @@responses = []
+  def self.queue_response(r); @@responses << r; end
+  def self.clear!; @@responses.clear; end
+  def name; 'llm_call'; end
+  def description; 'mock'; end
+  def input_schema; { type: 'object', properties: {} }; end
+  def call(arguments)
+    resp = @@responses.shift || { 'content' => 'default', 'tool_use' => nil, 'stop_reason' => 'end_turn' }
+    text_content(JSON.generate({
+      'status' => 'ok', 'provider' => 'mock', 'model' => 'mock-1',
+      'response' => resp, 'usage' => { 'input_tokens' => 10, 'output_tokens' => 20 },
+      'snapshot' => { 'model' => 'mock-1', 'timestamp' => Time.now.iso8601 }
+    }))
+  end
+end
+class MockKnowledgeGet < KairosMcp::Tools::BaseTool
+  def name; 'knowledge_get'; end
+  def description; 'mock'; end
+  def input_schema; { type: 'object', properties: {} }; end
+  def call(arguments)
+    text_content(JSON.generate({ 'name' => arguments['name'], 'content' => 'mock persona content' }))
+  end
+end
+class MockAutoexecPlan < KairosMcp::Tools::BaseTool
+  def name; 'autoexec_plan'; end
+  def description; 'mock'; end
+  def input_schema; { type: 'object', properties: {} }; end
+  def call(arguments)
+    task_json = JSON.parse(arguments['task_json'])
+    text_content(JSON.generate({
+      'status' => 'ok', 'task_id' => task_json['task_id'] || 'mock_001',
+      'plan_hash' => Digest::SHA256.hexdigest(arguments['task_json'])[0..15]
+    }))
+  end
+end
+class MockAutoexecRun < KairosMcp::Tools::BaseTool
+  def name; 'autoexec_run'; end
+  def description; 'mock'; end
+  def input_schema; { type: 'object', properties: {} }; end
+  def call(arguments)
+    text_content(JSON.generate({ 'status' => 'ok', 'outcome' => 'step_complete' }))
+  end
+end
+def build_registry
+  registry = KairosMcp::ToolRegistry.allocate
+  registry.instance_variable_set(:@safety, KairosMcp::Safety.new)
+  registry.instance_variable_set(:@tools, {})
+  KairosMcp::ToolRegistry.clear_gates!
+  tools = {
+    'llm_call' => MockLlmCall.new(nil, registry: registry),
+    'knowledge_get' => MockKnowledgeGet.new(nil, registry: registry),
+    'autoexec_plan' => MockAutoexecPlan.new(nil, registry: registry),
+    'autoexec_run' => MockAutoexecRun.new(nil, registry: registry),
+    'agent_start' => KairosMcp::SkillSets::Agent::Tools::AgentStart.new(nil, registry: registry),
+    'agent_step' => AgentStep.new(nil, registry: registry),
+    'agent_status' => KairosMcp::SkillSets::Agent::Tools::AgentStatus.new(nil, registry: registry),
+    'agent_stop' => KairosMcp::SkillSets::Agent::Tools::AgentStop.new(nil, registry: registry)
+  }
+  registry.instance_variable_set(:@tools, tools)
+  registry
+end
+# Helper to get AgentStep instance for testing private methods
+def build_step_tool
+  registry = build_registry
+  registry.instance_variable_get(:@tools)['agent_step']
+end
+# ---- Decision payload factories ----
+def low_complexity_payload
+  {
+    'summary' => 'Update readme file',
+    'task_json' => {
+      'task_id' => 'test_001', 'meta' => { 'description' => 'test', 'risk_default' => 'low' },
+      'steps' => [
+        { 'step_id' => 's1', 'action' => 'edit file', 'tool_name' => 'Edit',
+          'tool_arguments' => { 'file_path' => '/tmp/readme.md' }, 'risk' => 'low',
+          'depends_on' => [], 'requires_human_cognition' => false }
+      ]
+    }
+  }
+end
+def medium_complexity_payload
+  {
+    'summary' => 'Add logging to API handler',
+    'task_json' => {
+      'task_id' => 'test_002', 'meta' => { 'description' => 'test', 'risk_default' => 'high' },
+      'steps' => [
+        { 'step_id' => 's1', 'action' => 'modify handler', 'tool_name' => 'Edit',
+          'tool_arguments' => { 'file_path' => '/tmp/handler.rb' }, 'risk' => 'high',
+          'depends_on' => [], 'requires_human_cognition' => false }
+      ]
+    }
+  }
+end
+def high_complexity_payload
+  {
+    'summary' => 'Refactor authentication architecture',
+    'task_json' => {
+      'task_id' => 'test_003', 'meta' => { 'description' => 'test', 'risk_default' => 'high' },
+      'steps' => [
+        { 'step_id' => 's1', 'action' => 'modify auth', 'tool_name' => 'Edit',
+          'tool_arguments' => { 'file_path' => '/tmp/auth.rb' }, 'risk' => 'high',
+          'depends_on' => [], 'requires_human_cognition' => false },
+        { 'step_id' => 's2', 'action' => 'update config', 'tool_name' => 'Write',
+          'tool_arguments' => { 'file_path' => '/tmp/config.yml' }, 'risk' => 'medium',
+          'depends_on' => ['s1'], 'requires_human_cognition' => false }
+      ]
+    }
+  }
+end
+def l0_change_payload
+  {
+    'summary' => 'Update skill definitions',
+    'task_json' => {
+      'task_id' => 'test_004', 'meta' => { 'description' => 'test', 'risk_default' => 'low' },
+      'steps' => [
+        { 'step_id' => 's1', 'action' => 'evolve skill', 'tool_name' => 'skills_evolve',
+          'tool_arguments' => { 'name' => 'test_skill' }, 'risk' => 'low',
+          'depends_on' => [], 'requires_human_cognition' => false }
+      ]
+    }
+  }
+end
+def multi_file_payload
+  {
+    'summary' => 'Update multiple modules',
+    'task_json' => {
+      'task_id' => 'test_005', 'meta' => { 'description' => 'test', 'risk_default' => 'low' },
+      'steps' => (1..5).map { |i|
+        { 'step_id' => "s#{i}", 'action' => 'edit', 'tool_name' => 'Edit',
+          'tool_arguments' => { 'file_path' => "/tmp/file#{i}.rb" }, 'risk' => 'low',
+          'depends_on' => [], 'requires_human_cognition' => false }
+      }
+    }
+  }
+end
+def state_mutation_payload
+  {
+    'summary' => 'Record knowledge update',
+    'task_json' => {
+      'task_id' => 'test_006', 'meta' => { 'description' => 'test', 'risk_default' => 'low' },
+      'steps' => [
+        { 'step_id' => 's1', 'action' => 'update knowledge', 'tool_name' => 'knowledge_update',
+          'tool_arguments' => { 'name' => 'test' }, 'risk' => 'low',
+          'depends_on' => [], 'requires_human_cognition' => false }
+      ]
+    }
+  }
+end
+# ============================================================
+# M1: Complexity Assessment
+# ============================================================
+section "M1: Complexity Assessment"
+step = build_step_tool
+assert "test_low_complexity: single low-risk step → level 'low'" do
+  result = step.send(:assess_decision_complexity, low_complexity_payload)
+  result[:level] == 'low' && result[:signals].empty?
+end
+assert "test_medium_complexity_risk: one high-risk step → level 'medium'" do
+  result = step.send(:assess_decision_complexity, medium_complexity_payload)
+  result[:level] == 'medium' && result[:signals] == ['high_risk']
+end
+assert "test_high_complexity: high_risk + design_scope → level 'high'" do
+  result = step.send(:assess_decision_complexity, high_complexity_payload)
+  result[:level] == 'high' &&
+    result[:signals].include?('high_risk') &&
+    result[:signals].include?('design_scope')
+end
+assert "test_l0_forces_high: single l0_change → level 'high' (not medium)" do
+  result = step.send(:assess_decision_complexity, l0_change_payload)
+  result[:level] == 'high' && result[:signals] == ['l0_change']
+end
+assert "test_multi_file_signal: 4+ distinct file paths → 'multi_file' signal" do
+  result = step.send(:assess_decision_complexity, multi_file_payload)
+  result[:signals].include?('multi_file')
+end
+assert "test_state_mutation_signal: knowledge_update tool → 'state_mutation' signal" do
+  result = step.send(:assess_decision_complexity, state_mutation_payload)
+  result[:signals].include?('state_mutation')
+end
+assert "test_many_steps_signal: >5 steps → 'many_steps' signal" do
+  payload = {
+    'summary' => 'Big task',
+    'task_json' => {
+      'task_id' => 'test_007', 'meta' => {},
+      'steps' => (1..7).map { |i|
+        { 'step_id' => "s#{i}", 'action' => 'do', 'tool_name' => 'Read',
+          'tool_arguments' => {}, 'risk' => 'low', 'depends_on' => [],
+          'requires_human_cognition' => false }
+      }
+    }
+  }
+  result = step.send(:assess_decision_complexity, payload)
+  result[:signals].include?('many_steps')
+end
+assert "test_core_files_signal: kairos lib path → 'core_files' signal" do
+  payload = {
+    'summary' => 'Fix bug',
+    'task_json' => {
+      'task_id' => 'test_008', 'meta' => {},
+      'steps' => [
+        { 'step_id' => 's1', 'action' => 'fix', 'tool_name' => 'Edit',
+          'tool_arguments' => { 'file_path' => '/project/kairos_mcp/lib/kairos_mcp/chain.rb' },
+          'risk' => 'low', 'depends_on' => [], 'requires_human_cognition' => false }
+      ]
+    }
+  }
+  result = step.send(:assess_decision_complexity, payload)
+  result[:signals].include?('core_files')
+end
+assert "test_nil_task_json: missing task_json → low complexity, no crash" do
+  payload = { 'summary' => 'nothing' }
+  result = step.send(:assess_decision_complexity, payload)
+  result[:level] == 'low' && result[:signals].empty?
+end
+# ---- Merge complexity ----
+assert "test_merge_llm_structural: LLM high + structural low → final medium (capped +1)" do
+  structural = { level: 'low', signals: [] }
+  llm_hint = { 'level' => 'high', 'signals' => ['semantic_complexity'] }
+  result = step.send(:merge_complexity, structural, llm_hint)
+  result[:level] == 'medium'
+end
+assert "test_merge_llm_cannot_lower: LLM low + structural high → final high" do
+  structural = { level: 'high', signals: ['high_risk', 'design_scope'] }
+  llm_hint = { 'level' => 'low', 'signals' => [] }
+  result = step.send(:merge_complexity, structural, llm_hint)
+  result[:level] == 'high'
+end
+assert "test_merge_llm_same_level: LLM medium + structural medium → final medium" do
+  structural = { level: 'medium', signals: ['high_risk'] }
+  llm_hint = { 'level' => 'medium', 'signals' => ['moderate_scope'] }
+  result = step.send(:merge_complexity, structural, llm_hint)
+  result[:level] == 'medium' && result[:signals].include?('moderate_scope')
+end
+assert "test_merge_nil_hint: nil LLM hint → structural unchanged" do
+  structural = { level: 'medium', signals: ['high_risk'] }
+  result = step.send(:merge_complexity, structural, nil)
+  result[:level] == 'medium'
+end
+assert "test_merge_symbol_keys: symbol-key llm_hint works" do
+  structural = { level: 'low', signals: [] }
+  llm_hint = { level: 'high', signals: ['deep'] }
+  result = step.send(:merge_complexity, structural, llm_hint)
+  result[:level] == 'medium' && result[:signals].include?('deep')
+end
+# ============================================================
+# M2: Persona Review Parsing
+# ============================================================
+section "M2: Persona Review Parsing"
+assert "test_parse_approve: valid JSON with APPROVE → overall_verdict APPROVE" do
+  content = JSON.generate({
+    'personas' => { 'pragmatic' => { 'verdict' => 'APPROVE' } },
+    'overall_verdict' => 'APPROVE',
+    'key_findings' => []
+  })
+  result = step.send(:parse_persona_review, content)
+  result[:overall_verdict] == 'APPROVE'
+end
+assert "test_parse_revise: REVISE verdict → correctly parsed" do
+  content = JSON.generate({
+    'personas' => { 'skeptic' => { 'verdict' => 'REVISE', 'concerns' => ['No rollback'] } },
+    'overall_verdict' => 'revise',
+    'key_findings' => ['No rollback plan']
+  })
+  result = step.send(:parse_persona_review, content)
+  result[:overall_verdict] == 'REVISE' && result[:key_findings] == ['No rollback plan']
+end
+assert "test_parse_reject: REJECT verdict → correctly parsed" do
+  content = JSON.generate({
+    'personas' => {},
+    'overall_verdict' => 'REJECT',
+    'key_findings' => ['Violates layer boundaries']
+  })
+  result = step.send(:parse_persona_review, content)
+  result[:overall_verdict] == 'REJECT'
+end
+assert "test_parse_json_error: malformed content → fallback REVISE with parse_error" do
+  result = step.send(:parse_persona_review, 'not json at all')
+  result[:overall_verdict] == 'REVISE' && result[:parse_error] == true
+end
+assert "test_parse_nil_content: nil → fallback REVISE" do
+  result = step.send(:parse_persona_review, nil)
+  result[:overall_verdict] == 'REVISE' && result[:parse_error] == true
+end
+assert "test_parse_missing_verdict: valid JSON but no overall_verdict → fallback REVISE" do
+  content = JSON.generate({ 'personas' => {}, 'key_findings' => [] })
+  result = step.send(:parse_persona_review, content)
+  result[:overall_verdict] == 'REVISE' && result[:parse_error] == true
+end
+assert "test_parse_non_string_verdict: numeric verdict → fallback REVISE" do
+  content = JSON.generate({ 'overall_verdict' => 42, 'key_findings' => [], 'personas' => {} })
+  result = step.send(:parse_persona_review, content)
+  result[:overall_verdict] == 'REVISE' && result[:parse_error] == true
+end
+assert "test_parse_code_fenced_json: JSON in code fences → parsed correctly" do
+  content = "Here is my review:\n```json\n{\"overall_verdict\": \"APPROVE\", \"key_findings\": [], \"personas\": {}}\n```"
+  result = step.send(:parse_persona_review, content)
+  result[:overall_verdict] == 'APPROVE'
+end
+assert "test_parse_bare_json_after_prose: JSON after text (no fences) → parsed" do
+  content = "Here is my analysis:\n{\"overall_verdict\": \"REVISE\", \"key_findings\": [\"issue found\"], \"personas\": {}}"
+  result = step.send(:parse_persona_review, content)
+  result[:overall_verdict] == 'REVISE' && result[:key_findings] == ['issue found']
+end
+# ---- Lightweight review parsing ----
+assert "test_parse_lightweight_concerns: valid JSON → concerns extracted" do
+  content = JSON.generate({ 'concerns' => ['edge case missed'], 'suggestions' => ['add test'] })
+  result = step.send(:parse_lightweight_review, content)
+  result[:concerns] == ['edge case missed']
+end
+assert "test_parse_lightweight_nil: nil content → empty concerns" do
+  result = step.send(:parse_lightweight_review, nil)
+  result[:concerns] == []
+end
+assert "test_parse_lightweight_malformed: bad JSON → empty concerns" do
+  result = step.send(:parse_lightweight_review, 'garbage')
+  result[:concerns] == []
+end
+# ============================================================
+# M3: review_enabled? and configuration
+# ============================================================
+section "M3: Configuration"
+assert "test_review_enabled_default: no config → true" do
+  session = Session.new(
+    session_id: 'test_cfg_1', mandate_id: 'test_m', goal_name: 'test',
+    invocation_context: KairosMcp::InvocationContext.new, config: {}
+  )
+  step.send(:review_enabled?, session) == true
+end
+assert "test_review_disabled: enabled=false → false" do
+  session = Session.new(
+    session_id: 'test_cfg_2', mandate_id: 'test_m', goal_name: 'test',
+    invocation_context: KairosMcp::InvocationContext.new,
+    config: { 'complexity_review' => { 'enabled' => false } }
+  )
+  step.send(:review_enabled?, session) == false
+end
+assert "test_review_enabled_explicit: enabled=true → true" do
+  session = Session.new(
+    session_id: 'test_cfg_3', mandate_id: 'test_m', goal_name: 'test',
+    invocation_context: KairosMcp::InvocationContext.new,
+    config: { 'complexity_review' => { 'enabled' => true } }
+  )
+  step.send(:review_enabled?, session) == true
+end
+# ============================================================
+# M4: Session review methods
+# ============================================================
+section "M4: Session review persistence"
+assert "test_save_load_review_result: round-trip review result (symbol keys)" do
+  session = Session.new(
+    session_id: 'test_review_1', mandate_id: 'test_m', goal_name: 'test',
+    invocation_context: KairosMcp::InvocationContext.new, config: {}
+  )
+  review = { overall_verdict: 'APPROVE', key_findings: [], personas: {} }
+  session.save_review_result(review)
+  loaded = session.load_review_result
+  loaded && loaded[:overall_verdict] == 'APPROVE'
+end
+assert "test_load_review_result_missing: no file → nil" do
+  session = Session.new(
+    session_id: 'test_review_nonexist', mandate_id: 'test_m', goal_name: 'test',
+    invocation_context: KairosMcp::InvocationContext.new, config: {}
+  )
+  session.load_review_result.nil?
+end
+assert "test_load_review_result_symbol_keys: round-trip preserves symbol keys" do
+  session = Session.new(
+    session_id: 'test_review_sym', mandate_id: 'test_m', goal_name: 'test',
+    invocation_context: KairosMcp::InvocationContext.new, config: {}
+  )
+  review = { overall_verdict: 'APPROVE', key_findings: [], personas: {} }
+  session.save_review_result(review)
+  loaded = session.load_review_result
+  loaded && loaded[:overall_verdict] == 'APPROVE'
+end
+assert "test_save_progress_amendment: appends review concerns to progress" do
+  session = Session.new(
+    session_id: 'test_amend_1', mandate_id: 'test_m', goal_name: 'test',
+    invocation_context: KairosMcp::InvocationContext.new, config: {}
+  )
+  session.save_progress_amendment(['concern 1', 'concern 2'])
+  progress = session.load_progress
+  progress.any? { |e| e['type'] == 'review_amendment' && e['concerns'].include?('concern 1') }
+end
+assert "test_save_progress_amendment_cycle_number: uses current cycle_number" do
+  session = Session.new(
+    session_id: 'test_amend_cycle', mandate_id: 'test_m', goal_name: 'test',
+    invocation_context: KairosMcp::InvocationContext.new, config: {}
+  )
+  session.increment_cycle  # simulate post-ACT increment → cycle_number = 1
+  session.save_progress_amendment(['test concern'])
+  progress = session.load_progress
+  progress.any? { |e| e['type'] == 'review_amendment' && e['cycle'] == 1 }
+end
+# ============================================================
+# M5: Persona review prompt building
+# ============================================================
+section "M5: Prompt building"
+assert "test_persona_review_prompt_contains_summary: summary in prompt" do
+  payload = high_complexity_payload
+  complexity = { level: 'high', signals: ['high_risk', 'design_scope'] }
+  persona_defs = { 'skeptic' => 'Be critical.' }
+  prompt = step.send(:build_persona_review_prompt, payload, complexity, persona_defs)
+  prompt.include?('Refactor authentication architecture') && prompt.include?('skeptic')
+end
+assert "test_lightweight_review_prompt: contains plan summary" do
+  payload = medium_complexity_payload
+  ar_result = { act: { 'summary' => 'completed' }, reflect: { 'confidence' => 0.8, 'achieved' => ['done'] } }
+  prompt = step.send(:build_lightweight_review_prompt, payload, ar_result)
+  prompt.include?('Add logging to API handler') && prompt.include?('skeptical')
+end
+assert "test_multi_llm_review_prompt: L0 review prompt generated" do
+  session = Session.new(
+    session_id: 'test_mlp_1', mandate_id: 'test_m', goal_name: 'test_goal',
+    invocation_context: KairosMcp::InvocationContext.new, config: {}
+  )
+  prompt = step.send(:generate_multi_llm_review_prompt, session, l0_change_payload)
+  prompt.include?('L0 Change Review') && prompt.include?('evolve skill') && prompt.include?('test_goal')
+end
+# ============================================================
+# Summary
+# ============================================================
+puts "\n#{'=' * 60}"
+puts "RESULTS: #{$pass} passed, #{$fail} failed (total: #{$pass + $fail})"
+puts '=' * 60
+FileUtils.rm_rf(TMPDIR)
+exit($fail > 0 ? 1 : 0)
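
The M2 tests above pin down a fail-safe contract for review parsing: anything that cannot be read as a valid verdict becomes `REVISE` with `parse_error` set, never a silent `APPROVE`. A minimal sketch of a parser satisfying that contract is shown below. It is not the gem's `parse_persona_review`; the names `parse_review_sketch` and `fallback_review`, the brace-slicing strategy, and the fallback `key_findings` text are all assumptions made for illustration.

```ruby
require 'json'

VALID_VERDICTS = %w[APPROVE REVISE REJECT].freeze

# Fail-safe result: an unreadable review asks for revision, never approval.
def fallback_review(reason)
  { overall_verdict: 'REVISE', key_findings: [reason], personas: {}, parse_error: true }
end

def parse_review_sketch(content)
  return fallback_review('empty review') unless content.is_a?(String)

  # Slice from the first '{' to the last '}' so that JSON wrapped in code
  # fences or preceded by prose can still be parsed.
  start  = content.index('{')
  finish = content.rindex('}')
  return fallback_review('no JSON object found') unless start && finish && finish > start

  data    = JSON.parse(content[start..finish])
  verdict = data['overall_verdict']
  unless verdict.is_a?(String) && VALID_VERDICTS.include?(verdict.upcase)
    return fallback_review('missing or invalid verdict')
  end

  {
    overall_verdict: verdict.upcase,
    key_findings:    Array(data['key_findings']),
    personas:        data['personas'] || {}
  }
rescue JSON::ParserError
  fallback_review('unparseable review')
end

puts parse_review_sketch('not json at all')[:overall_verdict]
```

Normalising the verdict with `upcase` lets a lowercase `revise` from the model pass validation, while a numeric or missing verdict falls through to the fallback.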

data/templates/skillsets/agent/tools/agent_step.rb CHANGED

@@ -372,6 +372,105 @@ module KairosMcp
                   return finalize_autonomous(session, results, paused: 'risk_exceeded')
                 end
+                # Gate 5.5: Complexity-driven review
+                review_cfg = session.config['complexity_review'] || {}
+                complexity = assess_decision_complexity(decision_payload)
+                llm_hint = decision_payload['complexity_hint']
+                complexity = merge_complexity(complexity, llm_hint) if llm_hint
+                if review_enabled?(session)
+                  # Gate 5.5a: L0 escalation (before persona review — save LLM cost)
+                  if complexity[:signals].include?('l0_change') &&
+                     review_cfg.fetch('l0_always_checkpoint', true)
+                    multi_llm_prompt = generate_multi_llm_review_prompt(session, decision_payload)
+                    session.update_state('checkpoint')
+                    session.save
+                    return finalize_autonomous(session, results, checkpoint: true,
+                                               warning: 'l0_requires_external_review',
+                                               multi_llm_prompt: multi_llm_prompt)
+                  end
+                  # Gate 5.5b: High-complexity persona review (inner retry loop)
+                  if complexity[:level] == 'high'
+                    review_retries = 0
+                    max_retries = review_cfg['max_review_retries'] || 2
+                    loop do
+                      # Budget guard inside inner loop (P1-2 fix)
+                      if total_llm_calls >= max_total_llm
+                        session.update_state('checkpoint')
+                        session.save
+                        return finalize_autonomous(session, results, checkpoint: true,
+                                                   paused: 'llm_budget_exceeded')
+                      end
+                      review = run_persona_review(session, decision_payload, complexity)
+                      total_llm_calls += review[:llm_calls] || 0
+                      session.save_review_result(review)
+                      case review[:overall_verdict]
+                      when 'APPROVE'
+                        break
+                      when 'REJECT'
+                        session.update_state('checkpoint')
+                        session.save
+                        return finalize_autonomous(session, results, checkpoint: true,
+                                                   warning: 'review_rejected', review: review)
+                      else # REVISE or parse fallback
+                        review_retries += 1
+                        if review_retries > max_retries
+                          session.update_state('checkpoint')
+                          session.save
+                          return finalize_autonomous(session, results, checkpoint: true,
+                                                     warning: 'review_max_retries', review: review)
+                        end
+                        findings = Array(review[:key_findings]).join("\n- ")
+                        feedback = "Persona review (attempt #{review_retries}/#{max_retries}) found issues:\n- #{findings}\n\nRevise the plan to address these concerns."
+                        decide_result = run_decide_with_review_feedback_internal(session, feedback)
+                        total_llm_calls += decide_result[:llm_calls] || 0
+                        if decide_result[:error]
+                          session.update_state('paused_error')
+                          session.save
+                          return finalize_autonomous(session, results, error: decide_result[:error])
+                        end
+                        decision_payload = session.load_decision
+                        # Re-check loop detection with review-tagged summary
+                        tagged_summary = "#{decision_payload['summary']}_review_rev#{review_retries}"
+                        loop_term = check_loop_detection(
+                          session, nil,
+                          decision_payload.merge('summary' => tagged_summary),
+                          mandate_override: mandate
+                        )
+                        if loop_term
+                          session.update_state('terminated')
+                          return finalize_autonomous(session, results, terminated: 'loop_detected')
+                        end
+                        # Re-check risk budget on revised plan
+                        proposal = MandateAdapter.to_mandate_proposal(decision_payload)
+                        if ::Autonomos::Mandate.risk_exceeds_budget?(proposal, mandate[:risk_budget])
+                          mandate[:status] = 'paused_risk_exceeded'
+                          ::Autonomos::Mandate.save(session.mandate_id, mandate)
+                          session.update_state('paused_risk')
+                          session.save
+                          return finalize_autonomous(session, results, paused: 'risk_exceeded')
+                        end
+                        # Re-assess complexity for revised plan
+                        complexity = assess_decision_complexity(decision_payload)
+                        llm_hint = decision_payload['complexity_hint']
+                        complexity = merge_complexity(complexity, llm_hint) if llm_hint
+                        break unless complexity[:level] == 'high'
+                      end
+                    end
+                  end
+                end
                 # ACT + REFLECT
                 ar_result = run_act_reflect_internal(session)
                 total_llm_calls += ar_result[:llm_calls] || 0
@@ -391,6 +490,19 @@ module KairosMcp
                   return finalize_autonomous(session, results, terminated: term_reason)
                 end
+                # Gate 6.5: Post-ACT advisory review for medium complexity
+                if review_enabled?(session) &&
+                   review_cfg.fetch('post_act_review', true) &&
+                   complexity[:level] == 'medium' &&
+                   ar_result[:act_succeeded]
+                  post_review = run_lightweight_review(session, decision_payload, ar_result)
+                  total_llm_calls += post_review[:llm_calls] || 0
+                  if Array(post_review[:concerns]).any?
+                    ar_result[:reflect]['review_concerns'] = post_review[:concerns]
+                    session.save_progress_amendment(post_review[:concerns])
+                  end
+                end
                 # Gate 7: Confidence-based early exit
                 if session.cycle_number >= min_exit_cycles
                   confidence = clamp_confidence(ar_result.dig(:reflect, 'confidence'))
@@ -425,7 +537,8 @@ module KairosMcp
           end
           def finalize_autonomous(session, cycle_results, terminated: nil, paused: nil,
-                                  checkpoint: nil, error: nil, warning: nil)
+                                  checkpoint: nil, error: nil, warning: nil,
+                                  review: nil, multi_llm_prompt: nil)
             session.save
             status = if checkpoint then 'checkpoint'
@@ -451,6 +564,8 @@ module KairosMcp
               }
             }
             response['permission_advisory'] = session.permission_advisory if session.permission_advisory
+            response['review'] = review if review
+            response['multi_llm_prompt'] = multi_llm_prompt if multi_llm_prompt
             text_content(JSON.generate(response))
           end
@@ -806,6 +921,216 @@ module KairosMcp
             warn "[agent] Failed to record cycle: #{e.message}"
           end
+          # ---- Complexity Assessment ----
+          L0_TOOLS = %w[skills_evolve skills_rollback instructions_update system_upgrade].freeze
+          STATE_MUTATION_TOOLS = %w[state_commit chain_record knowledge_update formalization_record].freeze
+          def assess_decision_complexity(decision_payload)
+            signals = []
+            steps = decision_payload.dig('task_json', 'steps') || []
+            signals << 'high_risk' if steps.any? { |s| s['risk'] == 'high' }
+            signals << 'many_steps' if steps.size > 5
+            signals << 'design_scope' if decision_payload['summary']&.match?(
+              ::Autonomos::Ooda::COMPLEX_KEYWORDS
+            )
+            signals << 'l0_change' if steps.any? { |s| L0_TOOLS.include?(s['tool_name']) }
+            signals << 'core_files' if steps.any? { |s|
+              path = s.dig('tool_arguments', 'file_path').to_s
+              path.include?('/lib/') && path.include?('kairos')
+            }
+            file_paths = steps.filter_map { |s| s.dig('tool_arguments', 'file_path') }.uniq
+            signals << 'multi_file' if file_paths.size > 3
+            signals << 'state_mutation' if steps.any? { |s| STATE_MUTATION_TOOLS.include?(s['tool_name']) }
+            level = case signals.size
+                    when 0 then 'low'
+                    when 1 then 'medium'
+                    else 'high'
+                    end
+            # L0 override: always high
+            level = 'high' if signals.include?('l0_change')
+            { level: level, signals: signals }
+          end
+          def merge_complexity(structural, llm_hint)
+            levels = { 'low' => 0, 'medium' => 1, 'high' => 2 }
+            s_val = levels[structural[:level]] || 0
+            l_val = levels[llm_hint&.dig('level') || llm_hint&.dig(:level)] || 0
+            # LLM can raise by at most 1 level
+            capped_llm = [l_val, s_val + 1].min
+            final_val = [s_val, capped_llm].max
+            final_level = levels.key(final_val) || 'low'
+            {
+              level: final_level,
+              signals: (structural[:signals] + Array(llm_hint&.dig('signals') || llm_hint&.dig(:signals))).uniq
+            }
+          end
+          def review_enabled?(session)
+            review_cfg = session.config['complexity_review'] || {}
+            review_cfg.fetch('enabled', true)
+          end
+          # ---- Persona Review ----
+          def run_persona_review(session, decision_payload, complexity)
+            review_cfg = session.config['complexity_review'] || {}
+            personas = if complexity[:signals].include?('l0_change')
+                         review_cfg['high_personas'] || %w[kairos pragmatic skeptic]
+                       else
+                         review_cfg['personas'] || %w[pragmatic skeptic]
+                       end
+            persona_defs = load_persona_definitions(personas, session)
+            prompt = build_persona_review_prompt(decision_payload, complexity, persona_defs)
+            review_loop = CognitiveLoop.new(self, session)
+            messages = [{ 'role' => 'user', 'content' => prompt }]
+            result = review_loop.run_phase('review', persona_review_system_prompt, messages, [])
+            parsed = parse_persona_review(result['content'])
+            parsed[:llm_calls] = review_loop.total_calls
+            parsed
+          end
+          def run_lightweight_review(session, decision_payload, ar_result)
+            prompt = build_lightweight_review_prompt(decision_payload, ar_result)
+            review_loop = CognitiveLoop.new(self, session)
+            messages = [{ 'role' => 'user', 'content' => prompt }]
+            result = review_loop.run_phase('review', lightweight_review_system_prompt, messages, [])
+            parsed = parse_lightweight_review(result['content'])
+            parsed[:llm_calls] = review_loop.total_calls
+            parsed
+          end
+          def parse_persona_review(content)
+            return review_parse_fallback('no content') unless content
+            json_str = extract_json_from_content(content)
+            return review_parse_fallback('no JSON found') unless json_str
+            parsed = JSON.parse(json_str)
+            verdict = parsed['overall_verdict']
+            unless verdict.is_a?(String)
+              return review_parse_fallback("invalid overall_verdict type: #{verdict.class}")
+            end
+            parsed['overall_verdict'] = verdict.upcase
+            parsed.transform_keys(&:to_sym)
+          rescue JSON::ParserError => e
+            review_parse_fallback("JSON parse error: #{e.message}")
+          end
+          def review_parse_fallback(reason)
+            {
+              overall_verdict: 'REVISE',
+              key_findings: ["Review parse failed (#{reason}) — defaulting to REVISE"],
+              parse_error: true,
+              personas: {}
+            }
+          end
+          def parse_lightweight_review(content)
+            return { concerns: [], llm_calls: 0 } unless content
+            json_str = extract_json_from_content(content)
+            if json_str
+              parsed = JSON.parse(json_str)
+              { concerns: Array(parsed['concerns']), suggestions: Array(parsed['suggestions']) }
+            else
+              { concerns: [], suggestions: [], parse_error: true }
+            end
+          rescue JSON::ParserError
+            { concerns: [], suggestions: [], parse_error: true }
+          end
+          def load_persona_definitions(persona_names, session)
+            result = invoke_tool('knowledge_get', { 'name' => 'persona_definitions' },
+                                 context: session.invocation_context)
+            parsed = JSON.parse(result.map { |b| b[:text] || b['text'] }.compact.join)
+            content = parsed['content'] || ''
+            extract_persona_sections(content, persona_names)
+          rescue StandardError
+            # Hardcoded fallback
+            {
+              'pragmatic' => 'Evaluate for real-world utility, implementation complexity, and maintenance burden.',
+              'skeptic' => 'Challenge assumptions, identify edge cases, failure modes, and unintended consequences.',
+              'kairos' => 'Evaluate alignment with KairosChain philosophy: self-referentiality, structural integrity, and layer boundaries.'
+            }.slice(*persona_names)
+          end
+          def extract_persona_sections(content, persona_names)
+            defs = {}
+            persona_names.each do |name|
+              # Try to find "### name" or "## name" section
+              if content =~ /##\s*#{Regexp.escape(name)}\s*\n(.*?)(?=\n##|\z)/mi
+                defs[name] = $1.strip[0..300]
+              end
+            end
+            defs
+          end
+          # ---- Internal DECIDE with Review Feedback ----
+          def run_decide_with_review_feedback_internal(session, feedback)
+            loop_inst = CognitiveLoop.new(self, session)
+            prior_decision = session.load_decision
+            prior_json = prior_decision ? JSON.generate(prior_decision) : '(none)'
+            catalog = build_tool_catalog(session)
+            messages = [
+              { 'role' => 'user', 'content' =>
+                "## Available Tools\n#{catalog}\n\n" \
+                "Previous plan:\n#{prior_json}\n\n" \
+                "This plan was flagged by persona review. Feedback:\n#{feedback}\n\n" \
+                "Revise the plan and output a new decision_payload as JSON. " \
+                "Include a complexity_hint key in your output. Use ONLY tools listed above." }
+            ]
+            decide_result = loop_inst.run_decide(decide_system_prompt, messages)
+            if decide_result['error']
+              return { error: decide_result['error'], llm_calls: loop_inst.total_calls }
+            end
+            session.save_decision(decide_result['decision_payload'])
+            { decision_payload: decide_result['decision_payload'],
+              llm_calls: loop_inst.total_calls, error: nil }
+          end
+          # ---- Multi-LLM Review Prompt Generation ----
+          def generate_multi_llm_review_prompt(session, decision_payload)
+            summary = decision_payload['summary'] || 'unknown'
+            steps = decision_payload.dig('task_json', 'steps') || []
+            step_desc = steps.map.with_index(1) { |s, i|
+              "  #{i}. #{s['action'] || s['tool_name']} (risk: #{s['risk']})"
+            }.join("\n")
+            <<~PROMPT
+              # L0 Change Review Required
+              An autonomous agent proposed the following L0-level change.
+              L0 changes modify the KairosChain framework itself and require external review.
+              ## Goal: #{session.goal_name}
+              ## Summary: #{summary}
+              ## Steps:
+              #{step_desc}
+              ## Review Criteria
+              1. Does this change preserve structural self-referentiality?
+              2. Is the change recorded on the blockchain?
+              3. Could this be a SkillSet instead of core infrastructure?
+              4. Are layer boundaries (L0/L1/L2) respected?
+              Please evaluate with APPROVE / REVISE / REJECT and explain your reasoning.
+            PROMPT
+          end
           # ---- Prompts ----
           def orient_system_prompt
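Note on the hunk above: `merge_complexity` lets the LLM-supplied `complexity_hint` raise the structural level by at most one step and never lower it. The following is a minimal standalone sketch of that cap, not the gem's own method; the names `LEVELS` and `capped_merge` are hypothetical and exist only for this illustration.

```ruby
# Standalone illustration of the "raise by at most one level" rule.
# LEVELS and capped_merge are hypothetical names, not part of kairos-chain.
LEVELS = { 'low' => 0, 'medium' => 1, 'high' => 2 }.freeze

def capped_merge(structural_level, hinted_level)
  s_val = LEVELS.fetch(structural_level, 0)
  l_val = LEVELS.fetch(hinted_level, 0)   # unknown or nil hints count as 'low'
  capped = [l_val, s_val + 1].min         # the hint may exceed structure by one step only
  LEVELS.key([s_val, capped].max)         # the hint can never lower the structural level
end

puts capped_merge('low', 'high')    # prints "medium"
puts capped_merge('medium', 'high') # prints "high"
puts capped_merge('high', 'low')    # prints "high"
```

Under this rule a plan with no structural signals cannot be pushed into the high-complexity persona review by the hint alone; it needs at least one structural signal first.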
@@ -830,6 +1155,19 @@ module KairosMcp
             "remaining: [...], learnings: [...], open_questions: [...]}."
           end
+          def persona_review_system_prompt
+            "You are a multi-perspective review panel evaluating an autonomous agent's " \
+            "proposed action plan. Each persona has a distinct viewpoint. Evaluate " \
+            "independently, then synthesize. Output ONLY a JSON object with the " \
+            "structure specified in the prompt."
+          end
+          def lightweight_review_system_prompt
+            "You are a skeptical reviewer evaluating the results of an autonomous agent's " \
+            "execution. Identify any concerns, edge cases, or quality issues. " \
+            "Output a JSON object: {concerns: [...], suggestions: [...]}."
+          end
           def build_orient_prompt(session, observation_text = nil)
             parts = ["Goal: #{session.goal_name}", "Cycle: #{session.cycle_number + 1}"]
             # M5: Prepend progress summary for cross-cycle continuity
@@ -858,9 +1196,71 @@ module KairosMcp
             "Based on this analysis:\n#{analysis}\n\n" \
             "## Available Tools\n#{catalog}\n\n" \
             "Create a task execution plan as JSON (decision_payload format). " \
+            "Include a 'complexity_hint' key: {\"level\": \"low\"|\"medium\"|\"high\", " \
+            "\"signals\": [\"reason1\"]}. Assess complexity based on risk, step count, " \
+            "architectural scope, and L0 framework changes. " \
             "Use ONLY tools listed above."
           end
+          def build_persona_review_prompt(decision_payload, complexity, persona_defs)
+            summary = decision_payload['summary'] || 'unknown'
+            steps = decision_payload.dig('task_json', 'steps') || []
+            step_text = steps.map.with_index(1) { |s, i|
+              "  #{i}. #{s['action'] || s['tool_name']} (risk: #{s['risk']}, tool: #{s['tool_name']})"
+            }.join("\n")
+            persona_sections = persona_defs.map { |name, desc|
+              "### #{name}\n#{desc}\nEvaluate from this perspective."
+            }.join("\n\n")
+            <<~PROMPT
+              Evaluate the following proposed action plan from multiple perspectives.
+              ## Proposal
+              Summary: #{summary}
+              Complexity: #{complexity[:level]} (#{complexity[:signals].join(', ')})
+              Steps:
+              #{step_text}
+              ## Personas
+              #{persona_sections}
+              For EACH persona, provide:
+              - VERDICT: APPROVE | REVISE | REJECT
+              - CONCERNS: [list of specific concerns]
+              - SUGGESTIONS: [list of improvements]
+              Then provide:
+              - OVERALL_VERDICT: APPROVE (all approve) | REVISE (any revise) | REJECT (any reject)
+              - KEY_FINDINGS: [consolidated list of all concerns]
+              Output as a single JSON object.
+            PROMPT
+          end
+          def build_lightweight_review_prompt(decision_payload, ar_result)
+            summary = decision_payload['summary'] || 'unknown'
+            act_summary = ar_result.dig(:act, 'summary') || 'unknown'
+            confidence = ar_result.dig(:reflect, 'confidence') || 0.0
+            achieved = ar_result.dig(:reflect, 'achieved') || []
+            <<~PROMPT
+              Review the execution results of an autonomous agent cycle.
+              Plan summary: #{summary}
+              Execution result: #{act_summary}
+              Confidence: #{confidence}
+              Achieved: #{achieved.join(', ')}
+              As a skeptical reviewer, identify any concerns about:
+              1. Whether the execution actually achieved what was planned
+              2. Edge cases or error handling that may have been missed
+              3. Quality issues in the approach taken
+              Output a JSON object: {"concerns": [...], "suggestions": [...]}
+            PROMPT
+          end
           def build_reflect_prompt(session, act_result)
             "Goal: #{session.goal_name}\n" \
             "Execution result:\n#{JSON.generate(act_result)}\n\n" \
@@ -928,6 +1328,34 @@ module KairosMcp
             required.map(&:to_s)
           end
+          # ---- JSON Extraction ----
+          # Extract valid JSON from content that may include code fences or prose.
+          # Same logic as CognitiveLoop#extract_json but accessible from AgentStep.
+          def extract_json_from_content(content)
+            JSON.parse(content)
+            content
+          rescue JSON::ParserError
+            # Try code fences first
+            if content =~ /```(?:json)?\s*\n?(.*?)\n?```/m
+              begin
+                JSON.parse($1)
+                return $1
+              rescue JSON::ParserError
+                # fall through
+              end
+            end
+            # Bare JSON after prose: find first { to last }
+            if content =~ /(\{.*\})/m
+              begin
+                JSON.parse($1)
+                return $1
+              rescue JSON::ParserError
+                nil
+              end
+            end
+          end
           # ---- Helpers ----
           def load_last_decision(session)
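Note on the last hunk above: `extract_json_from_content` tries the whole reply first, then a fenced block, then the span from the first `{` to the last `}`. A compact sketch of the same three-stage idea follows. It is a variant written for illustration, assuming a hypothetical helper `first_json_object` that returns the parsed object rather than the raw string.

```ruby
require 'json'

# Hypothetical names for illustration; not part of kairos-chain.
FENCE = '`' * 3 # built at runtime so this listing contains no literal fence
FENCED_BLOCK = /#{FENCE}(?:json)?\s*\n?(.*?)\n?#{FENCE}/m

# Returns the first candidate that parses as JSON, or nil if none does.
def first_json_object(text)
  # Candidates in priority order: whole text, fenced block, first "{" to last "}".
  candidates = [text, text[FENCED_BLOCK, 1], text[/\{.*\}/m]].compact
  candidates.each do |candidate|
    begin
      return JSON.parse(candidate)
    rescue JSON::ParserError
      next
    end
  end
  nil
end

reply = "Here is my verdict:\n#{FENCE}json\n{\"overall_verdict\": \"approve\"}\n#{FENCE}"
p first_json_object(reply)
```

Returning `nil` instead of raising keeps the caller's fallback path simple, which matches how `parse_persona_review` treats a missing JSON body as a `REVISE` verdict.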

data/templates/skillsets/llm_client/lib/llm_client/claude_code_adapter.rb CHANGED

@@ -2,6 +2,7 @@
 require 'json'
 require 'open3'
+require 'timeout'
 require_relative 'adapter'
 module KairosMcp
@@ -10,15 +11,30 @@ module KairosMcp
       # Adapter that uses Claude Code CLI as the LLM backend.
       # No API costs — uses the Claude Code subscription.
       # Invokes `claude -p --output-format json` as a subprocess.
+      #
+      # Key safety measures:
+      # - --mcp-config '{"mcpServers":{}}' prevents recursive MCP server loading
+      # - --no-session-persistence avoids polluting session state
+      # - Timeout.timeout prevents indefinite hangs
       class ClaudeCodeAdapter < Adapter
+        DEFAULT_TIMEOUT = 120
         def call(messages:, system: nil, tools: nil, model: nil,
                  max_tokens: nil, temperature: nil, output_schema: nil)
           prompt = build_prompt(messages, system, tools, output_schema)
-          args = ['claude', '-p', '--output-format', 'json']
+          timeout_seconds = @config&.dig('timeout_seconds') || DEFAULT_TIMEOUT
+          args = [
+            'claude', '-p',
+            '--output-format', 'json',
+            '--no-session-persistence',
+            '--mcp-config', '{"mcpServers":{}}'
+          ]
           args += ['--model', model] if model
-          stdout, stderr, status = Open3.capture3(*args, stdin_data: prompt)
+          stdout, stderr, status = Timeout.timeout(timeout_seconds) do
+            Open3.capture3(*args, stdin_data: prompt)
+          end
           unless status.success?
             raise ApiError.new(
@@ -28,6 +44,11 @@ module KairosMcp
           end
           parse_response(stdout)
+        rescue Timeout::Error
+          raise ApiError.new(
+            "Claude Code timed out after #{timeout_seconds}s",
+            provider: 'claude_code', retryable: true
+          )
         rescue Errno::ENOENT
           raise ApiError.new(
             "Claude Code CLI not found. Install: https://docs.anthropic.com/en/docs/claude-code",
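Note on the timeout change above: `Timeout.timeout` interrupts the calling Ruby thread, but it does not by itself send a signal to the child process. The sketch below shows one way to also terminate the child when the deadline passes. It is an assumption-laden illustration rather than the adapter's behaviour: `capture_with_deadline` is a hypothetical helper, and the `TERM` signal assumes a Unix-like platform.

```ruby
require 'open3'
require 'timeout'
require 'rbconfig'

# Hypothetical helper, not part of kairos-chain: run cmd, feed stdin_data,
# and terminate the child if it does not finish within `seconds`.
def capture_with_deadline(cmd, stdin_data:, seconds:)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    # Drain both pipes in background threads so the child never blocks on output.
    out_reader = Thread.new { stdout.read }
    err_reader = Thread.new { stderr.read }
    begin
      Timeout.timeout(seconds) do
        stdin.write(stdin_data)
        stdin.close
        wait_thr.value # blocks until the child exits
      end
    rescue Timeout::Error
      Process.kill('TERM', wait_thr.pid) # assumes a Unix-like platform
      wait_thr.join
      [out_reader, err_reader].each(&:kill)
      raise
    end
    [out_reader.value, err_reader.value, wait_thr.value]
  end
end

# Usage: echo stdin back through a child Ruby process.
out, _err, status = capture_with_deadline(
  [RbConfig.ruby, '-e', 'print STDIN.read'], stdin_data: 'hi', seconds: 10
)
puts out if status.success?
```

A caller could rescue `Timeout::Error` from this helper and wrap it in a retryable error, in the same way the adapter wraps it in `ApiError`.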

data/templates/skillsets/llm_client/tools/llm_call.rb CHANGED

@@ -3,11 +3,11 @@
 require 'json'
 require 'digest'
 require 'time'
+# Only load always-needed modules at startup.
+# Provider adapters are lazy-loaded in build_adapter() to avoid
+# crashing when optional gems (faraday, aws-sdk) are not installed.
 require_relative '../lib/llm_client/adapter'
-require_relative '../lib/llm_client/anthropic_adapter'
-require_relative '../lib/llm_client/openai_adapter'
 require_relative '../lib/llm_client/claude_code_adapter'
-require_relative '../lib/llm_client/bedrock_adapter'
 require_relative '../lib/llm_client/schema_converter'
 module KairosMcp
@@ -172,12 +172,15 @@ module KairosMcp
           def build_adapter(config)
             case config['provider']
             when 'openai', 'local', 'openrouter'
+              require_relative '../lib/llm_client/openai_adapter'
               OpenaiAdapter.new(config)
             when 'claude_code'
               ClaudeCodeAdapter.new(config)
             when 'bedrock'
+              require_relative '../lib/llm_client/bedrock_adapter'
               BedrockAdapter.new(config)
             else
+              require_relative '../lib/llm_client/anthropic_adapter'
               AnthropicAdapter.new(config)
             end
           end
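Note on the lazy-loading change above: moving each `require_relative` into its `build_adapter` branch means an optional dependency is only loaded when its provider is actually selected. The sketch below illustrates the same pattern in isolation, with a clearer error when the dependency is missing. Everything in it is hypothetical: the `ADAPTERS` table, the provider names, and `load_adapter_class` are stand-ins, and the standard library's `set` plays the role of an installed optional gem.

```ruby
# Hypothetical registry for illustration: provider => [library to require, constant name].
ADAPTERS = {
  'set_backed'  => %w[set Set],                              # stand-in for an installed dependency
  'missing_gem' => %w[definitely_not_installed_gem_xyz Nope] # stand-in for an absent optional gem
}.freeze

# Requires the provider's library only when that provider is requested.
def load_adapter_class(provider)
  lib, const_name = ADAPTERS.fetch(provider)
  require lib
  Object.const_get(const_name)
rescue LoadError => e
  raise ArgumentError, "provider '#{provider}' needs an optional dependency: #{e.message}"
end

puts load_adapter_class('set_backed').name # prints "Set"
```

With this shape, a process that only ever uses one provider never pays for, or fails on, the libraries behind the others.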

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: kairos-chain
 version: !ruby/object:Gem::Version
-  version: 3.12.0
+  version: 3.13.0
 platform: ruby
 authors:
 - Masaomi Hatakeyama
@@ -214,6 +214,7 @@ files:
 - templates/skillsets/agent/lib/agent/session.rb
 - templates/skillsets/agent/skillset.json
 - templates/skillsets/agent/test/test_agent_capability_discovery.rb
+- templates/skillsets/agent/test/test_agent_complexity_review.rb
 - templates/skillsets/agent/test/test_agent_m1.rb
 - templates/skillsets/agent/test/test_agent_m2.rb
 - templates/skillsets/agent/test/test_agent_m3.rb