RubyGems - kairos-chain - Versions diffs - 3.30.0 → 3.31.0 - Mend

kairos-chain 3.30.0 → 3.31.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml +4 -4
data/CHANGELOG.md +35 -0
data/lib/kairos_mcp/drift_detection/correspondence_checker.rb +136 -0
data/lib/kairos_mcp/knowledge_provider.rb +5 -0
data/lib/kairos_mcp/tools/knowledge_get.rb +50 -1
data/lib/kairos_mcp/tools/knowledge_update.rb +7 -5
data/lib/kairos_mcp/version.rb +1 -1
data/templates/knowledge/llm_cross_evaluation/scripts/run_cross_eval.rb +8 -0
data/templates/knowledge/multi_llm_review_workflow/multi_llm_review_workflow.md +77 -52
data/templates/skillsets/multi_llm_review/config/multi_llm_review.yml +42 -16
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3ec94fadb095a49a89c0235010c80e37913aaf6781aa16d22a2e5855b1eb154d
-  data.tar.gz: 38c3b631326b87f07285f9a5f4ecdd64fc22d086b6b08de76d5b2aaf2d7d7040
+  metadata.gz: 227d024f36839c595295ed0e3b4415c6764750597283b59a21ea1f5e16112210
+  data.tar.gz: 8fd8767580dbe3db617cbaf4ee0dcfafdb38f5823315f0321bae0c4893b2f45d
 SHA512:
-  metadata.gz: 7d3d7e9d5f52b58109a795ce4f67243551ad353d18d273d4062034e09bcafe8201559ceabcef16c5ad0d0fc0bca4b8b8629c27cb49e224091ae05ffd5aa364ac
-  data.tar.gz: a20e8e7a2cb18d8cbcbd76f39c7fa69a9bbd55d4447c16fa289ca8227fd2537325693748e711e46e57693c6572b41f15096dc358368f89911d95eaea3c714ca8
+  metadata.gz: 7486fef959f54577c0a13a301112f746f8c440100f11f5d424f27771759ad3024d6d30988b7a9e7c58a4544db1ffe6c2705b3271226d9445f1537dc591f96ef7
+  data.tar.gz: 8d670bb2a2f5d8d848072101527fcb5b3153c23ea4c5ae1359c1d46c30187011911b32512086e1e1734e72ba5d0d3ae20913499d5d1d8c4b4d1d359b79963281
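The hex strings above are digests of the two members of the `.gem` archive, `metadata.gz` and `data.tar.gz`. Below is a minimal sketch of checking them. It assumes the member bytes have already been pulled out of the archive. `verify_gem_checksums` and `CHECKSUM_ALGORITHMS` are illustrative names, not RubyGems APIs.

```ruby
require 'digest'
require 'yaml'

# Digest classes for the two sections of a checksums.yaml document.
CHECKSUM_ALGORITHMS = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }.freeze

# Hypothetical helper: `members` maps archive member names (e.g. "data.tar.gz")
# to their raw bytes; `checksums` is the parsed checksums.yaml Hash.
# Returns human-readable mismatch descriptions; an empty Array means all match.
def verify_gem_checksums(members, checksums)
  mismatches = []
  CHECKSUM_ALGORITHMS.each do |algo, digest_class|
    (checksums[algo] || {}).each do |member, expected|
      unless members.key?(member)
        mismatches << "#{algo} #{member}: member not found"
        next
      end
      actual = digest_class.hexdigest(members[member])
      next if actual == expected

      mismatches << "#{algo} #{member}: expected #{expected[0, 12]}..., got #{actual[0, 12]}..."
    end
  end
  mismatches
end
```

For a real gem, the member bytes could be read with `Gem::Package::TarReader` (`require 'rubygems/package'`), and the gzipped `checksums.yaml.gz` member could be inflated with `Zlib::GzipReader`. That extraction step is omitted here.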

data/CHANGELOG.md CHANGED

@@ -4,6 +4,41 @@ All notable changes to the `kairos-chain` gem will be documented in this file.
 This project follows [Semantic Versioning](https://semver.org/).
+## [3.31.0] - 2026-06-11
+### Changed — multi_llm_review roster: Fable 5 + Opus 4.6/4.8 (6 reviewers)
+Default reviewer roster updated for the Fable 5 / Opus 4.8 model generation:
+- Orchestrator/team slot: `claude-fable-5` (`claude_team_fable5`) replaces
+  Opus 4.7. Opus 4.8 added as a second subprocess CLI reviewer
+  (`claude_cli_opus4.8`) alongside Opus 4.6, which is retained for its
+  documented complementary bias (ambiguity-preserving, self-reference-friendly).
+  4.7 retired: its register is covered by 4.8 and Fable 5.
+- Convergence rules: `4/6 APPROVE` full roster, `3/5 APPROVE` after
+  orchestrator exclusion ("exclude" strategy only — "subprocess" keeps the
+  full roster; "delegate" re-adds the slot at collect, so 4/6 governs there).
+- `timeout_seconds` raised 300 → 600 (live 6-roster run measured 381s
+  wall-clock with `max_concurrent: 2`).
+- Validated by a 2-round self-referential review of the workflow L1 with the
+  new roster itself (R1 REVISE → fixes → R2 with 4.6/4.8/codex-5.4 APPROVE,
+  including a code-grounded Cursor correction of the exclusion semantics).
+### Changed — `multi_llm_review_workflow` L1 v3.5
+- All roster-dependent sections updated (pre-flight checklist, CLI tool
+  matrix, convergence rules, orchestrator self-identification, orchestration
+  template, LLM identifiers). Claude CLI 4.6/4.8 rows verified live 2026-06-10.
+- Effort escalation paragraph scoped to coding/design sub-agents and the
+  revision phase (reviewers stay at default per the 2026-04-29 policy).
+### Fixed — `knowledge_update` size guidance
+- Removed the "~2 KB MCP stdio limit" warning from the tool description and
+  nil-content diagnostic: two ~40 KB updates succeeded over stdio on
+  2026-06-10/11. The old figure generalized a single unreproduced
+  nil-content incident; the nil-content detection itself is retained.
 ## [3.30.0] - 2026-06-08
 ### Added — `dream_digest`: derived narrative view over L2/L1 fragments (dream SkillSet v0.3.0)

data/lib/kairos_mcp/drift_detection/correspondence_checker.rb ADDED

@@ -0,0 +1,136 @@
+# frozen_string_literal: true
+require 'digest'
+require 'json'
+require_relative '../kairos_chain/chain'
+module KairosMcp
+  module DriftDetection
+    # CorrespondenceChecker — INV-A detection floor (Cycle 1, toward by-construction).
+    #
+    # Checks whether a live L0/L1 artifact still corresponds to its *current
+    # recorded provenance*: the content digest stored for that artifact at the
+    # head of the constitutive record (the hash chain). This is detection only —
+    # it surfaces divergence; it does not prevent edits or gate writes. Those are
+    # later cycles (single-source enforcement, record-as-gate).
+    #
+    # Provenance is rooted in the hash chain, not in the SQLite knowledge_meta
+    # cache: INV-A names the chain head as the non-editable anchor. The chain is
+    # therefore the single source consulted here; the meta table (when present)
+    # is a derived view and is intentionally not used for the comparison.
+    #
+    # The digest is computed over the *raw file content* (frontmatter included),
+    # matching exactly how it was recorded on create/update (a verbatim write,
+    # no normalization). Comparing the parsed/stripped body would never match.
+    class CorrespondenceChecker
+      # Result of a single correspondence check.
+      #
+      # status:
+      #   :match            live artifact corresponds to recorded provenance
+      #   :mismatch         live content diverged from recorded provenance (silent edit)
+      #   :missing_record   live artifact relied upon, but no recorded provenance exists
+      #   :missing_artifact recorded/expected artifact is absent at the reliance point
+      #   :error            the check itself could not complete (not a correspondence claim)
+      Result = Struct.new(
+        :status, :name, :active_digest, :recorded_digest, :message,
+        keyword_init: true
+      ) do
+        def corresponds?
+          status == :match
+        end
+        # A surfaced non-correspondence per INV-A (divergence, not an internal error).
+        def divergence?
+          %i[mismatch missing_record missing_artifact].include?(status)
+        end
+      end
+      class << self
+        # Check an L1 knowledge artifact against its recorded provenance.
+        #
+        # @param name [String] knowledge id (knowledge_id on the chain record)
+        # @param md_file_path [String, nil] path to the live .md file relied upon
+        # @param storage_backend [Storage::Backend, nil] backend for chain access
+        # @return [Result]
+        def check_l1(name:, md_file_path:, storage_backend: nil)
+          unless md_file_path && File.file?(md_file_path)
+            # Relied upon but absent — a missing artifact is itself a
+            # non-correspondence under INV-A (the expected set is recorded).
+            return Result.new(
+              status: :missing_artifact, name: name,
+              active_digest: nil, recorded_digest: nil,
+              message: "L1 '#{name}': artifact missing at the point of reliance"
+            )
+          end
+          active = Digest::SHA256.hexdigest(File.read(md_file_path))
+          recorded = recorded_digest_for(name, storage_backend)
+          if recorded.nil?
+            return Result.new(
+              status: :missing_record, name: name,
+              active_digest: active, recorded_digest: nil,
+              message: "L1 '#{name}': live artifact has no recorded provenance on the chain"
+            )
+          end
+          if active == recorded
+            Result.new(
+              status: :match, name: name,
+              active_digest: active, recorded_digest: recorded, message: nil
+            )
+          else
+            Result.new(
+              status: :mismatch, name: name,
+              active_digest: active, recorded_digest: recorded,
+              message: "L1 '#{name}': live content diverged from recorded provenance " \
+                       "(active #{short(active)} ≠ recorded #{short(recorded)})"
+            )
+          end
+        rescue StandardError => e
+          Result.new(
+            status: :error, name: name,
+            active_digest: nil, recorded_digest: nil,
+            message: "L1 '#{name}': correspondence check could not complete: #{e.message}"
+          )
+        end
+        private
+        # The current recorded content digest for a knowledge_id: the next_hash of
+        # the most recent knowledge_update record, scanning the chain from head
+        # backward. Returns nil when the most recent relevant record removed the
+        # artifact (next_hash nil — delete/archive) or when none exists.
+        def recorded_digest_for(name, storage_backend)
+          chain = KairosChain::Chain.new(storage_backend: storage_backend)
+          chain.chain.reverse_each do |block|
+            Array(block.data).each do |entry|
+              record = parse_entry(entry)
+              next unless record.is_a?(Hash)
+              next unless record['type'] == 'knowledge_update'
+              next unless record['knowledge_id'] == name
+              # First match from the head is the current provenance (may be nil
+              # if the artifact was removed — caller treats nil as no record).
+              return record['next_hash']
+            end
+          end
+          nil
+        end
+        def parse_entry(entry)
+          return entry if entry.is_a?(Hash)
+          return JSON.parse(entry) if entry.is_a?(String)
+          nil
+        rescue JSON::ParserError
+          nil
+        end
+        def short(digest)
+          digest ? digest[0, 12] : '-'
+        end
+      end
+    end
+  end
+end
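`check_l1` does two things. It hashes the raw file bytes with SHA256. It then takes the `next_hash` of the newest `knowledge_update` record for the same id, scanning from the chain head. Below is a self-contained sketch of that logic. It replaces `KairosChain::Chain` with a plain array of `Block` structs whose `data` holds JSON strings or Hashes, which is an assumption mirroring what `parse_entry` accepts. It omits the `:error` path. `recorded_digest` and `correspondence_status` are illustrative names.

```ruby
require 'digest'
require 'json'

# Stand-in for a chain block; only `data` is consulted (assumed shape).
Block = Struct.new(:data)

# Newest block first: the first knowledge_update for `name` wins, even when its
# next_hash is nil (delete/archive), mirroring recorded_digest_for above.
def recorded_digest(blocks, name)
  blocks.reverse_each do |block|
    Array(block.data).each do |entry|
      record = entry.is_a?(String) ? (JSON.parse(entry) rescue nil) : entry
      next unless record.is_a?(Hash)
      next unless record['type'] == 'knowledge_update' && record['knowledge_id'] == name

      return record['next_hash']
    end
  end
  nil
end

# Digest the raw bytes (frontmatter included) and compare with the record.
def correspondence_status(path, name, blocks)
  return :missing_artifact unless path && File.file?(path)

  active = Digest::SHA256.hexdigest(File.read(path))
  recorded = recorded_digest(blocks, name)
  return :missing_record if recorded.nil?

  active == recorded ? :match : :mismatch
end
```

One detail carried over from the original: blocks are scanned newest-first, but the entries inside a single block are scanned in stored order. If one block held two updates for the same id, the entry stored first in that block would be returned.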

data/lib/kairos_mcp/knowledge_provider.rb CHANGED

@@ -23,6 +23,11 @@ module KairosMcp
   # - Blockchain: Uses the configured storage backend
   #
   class KnowledgeProvider
+    # Main knowledge directory (constitutively-recorded L1). Exposed so callers
+    # can distinguish main-dir knowledge from read-only external SkillSet
+    # knowledge, e.g. to scope INV-A correspondence checks to recorded artifacts.
+    attr_reader :knowledge_dir
     ARCHIVED_DIR = '.archived'
     ARCHIVE_META_FILE = '.archive_meta.yml'
     # Backup directories created by upgrade flow (`.bak.<timestamp>`).
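Exposing `knowledge_dir` lets a caller classify a file as main-dir L1 with a path-prefix test, which is what `main_dir_l1?` in `knowledge_get.rb` does. Below is a standalone sketch of that test; `under_dir?` is an illustrative name. Appending `File::SEPARATOR` to the expanded root keeps a sibling such as `knowledge_extra/` from matching `knowledge/`, and `File.expand_path` folds `..` segments before the comparison. It does not resolve symlinks, so a symlinked file is classified by the path it is read through.

```ruby
# Illustrative helper mirroring main_dir_l1?: true when `path` lies inside
# `dir` after expansion. The trailing separator stops a sibling directory
# that merely shares the name prefix from matching.
def under_dir?(path, dir)
  root = File.expand_path(dir) + File::SEPARATOR
  File.expand_path(path).start_with?(root)
end
```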

data/lib/kairos_mcp/tools/knowledge_get.rb CHANGED

@@ -2,6 +2,7 @@
 require_relative 'base_tool'
 require_relative '../knowledge_provider'
+require_relative '../drift_detection/correspondence_checker'
 module KairosMcp
   module Tools
@@ -77,11 +78,59 @@ module KairosMcp
         end
         output = build_output(skill, arguments, provider)
-        text_content(output)
+        # INV-A detection floor: reading L1 knowledge "in order to act upon" is a
+        # point of reliance. Surface any divergence from the recorded provenance
+        # here — never silently. Scoped to main-dir L1 (external SkillSet
+        # knowledge has no chain provenance and would false-positive).
+        banner = correspondence_banner(skill, provider)
+        text_content(banner ? banner + output : output)
       end
       private
+      # Returns a surfacing prefix if the live artifact does not correspond to its
+      # recorded provenance, or nil when it corresponds (stay silent on match).
+      #
+      # Surfacing is graded by signal strength (the invariant requires only
+      # non-silence; grading is a Cycle-1 backlog policy):
+      #   :mismatch       a recorded artifact whose content silently changed —
+      #                   high signal, rare → an alarm banner.
+      #   :missing_record a live artifact with no chain provenance — overwhelmingly
+      #                   template-provisioned knowledge whose provenance root is the
+      #                   gem/template, not a per-instance record. Chain-rooting the
+      #                   expected set is explicit Cycle-1 backlog, so this is a muted
+      #                   one-line note, not an alarm — bannering every bundled read
+      #                   would train the reader to ignore the banner.
+      def correspondence_banner(skill, provider)
+        return nil unless main_dir_l1?(skill, provider)
+        result = DriftDetection::CorrespondenceChecker.check_l1(
+          name: skill.name,
+          md_file_path: skill.md_file_path
+        )
+        case result.status
+        when :mismatch, :missing_artifact
+          "> ⚠️ **Drift detected (INV-A)** — #{result.message}.\n" \
+            "> This content was modified outside the recorded change path; treat it as unverified.\n\n"
+        when :missing_record
+          "> ℹ️ No recorded provenance for this entry (provisioning not yet chain-rooted — Cycle-1 backlog).\n\n"
+        end
+      rescue StandardError => e
+        # A failed check must not break the read; report it without claiming correspondence.
+        warn "[knowledge_get] correspondence check failed: #{e.message}"
+        nil
+      end
+      # True only for knowledge living under the provider's main knowledge dir —
+      # i.e. constitutively-recorded L1, not read-only external SkillSet knowledge.
+      def main_dir_l1?(skill, provider)
+        return false unless skill.md_file_path && provider.respond_to?(:knowledge_dir)
+        root = File.expand_path(provider.knowledge_dir) + File::SEPARATOR
+        File.expand_path(skill.md_file_path).start_with?(root)
+      end
       def build_output(skill, arguments, provider)
         output = "## [#{skill.name}] #{skill.description || 'No description'}\n\n"
         output += "**Layer:** L1 (Knowledge)\n"
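The grading in `correspondence_banner` maps the result status to an optional prefix:

- an alarm for `:mismatch` and `:missing_artifact`;
- a muted note for `:missing_record`;
- nothing for anything else.

Below is a standalone sketch of that mapping, with shortened banner text; `banner_for` and `with_banner` are hypothetical names. One consequence is visible in the code as shown. `check_l1` rescues its own exceptions into an `:error` result, and that result has no `case` branch, so a failed check adds nothing to the output. The `warn` in `correspondence_banner`'s `rescue` only fires for exceptions raised outside `check_l1`.

```ruby
# Hypothetical stand-in for the status-to-banner grading in
# correspondence_banner (wording shortened, emoji omitted).
def banner_for(status, message)
  case status
  when :mismatch, :missing_artifact
    "> **Drift detected (INV-A)**: #{message}.\n> Treat this content as unverified.\n\n"
  when :missing_record
    "> No recorded provenance for this entry (provisioning not yet chain-rooted).\n\n"
  end
  # :match and :error have no branch, so the case (and the method) returns nil.
end

# Prepend the banner only when there is one, as the tool's call method does.
def with_banner(status, message, output)
  banner = banner_for(status, message)
  banner ? banner + output : output
end
```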

data/lib/kairos_mcp/tools/knowledge_update.rb CHANGED

@@ -13,8 +13,9 @@ module KairosMcp
       def description
         'Create, update, or delete L1 knowledge skills. Changes are recorded with hash references to the blockchain. ' \
-        'NOTE: MCP stdio transport may silently drop large arguments. Keep combined content + reason under ~2 KB. ' \
-        'For larger L1 entries, trim prose or split into multiple entries.'
+        'Large content is supported (a ~40 KB update was verified over stdio on 2026-06-10; the earlier ~2 KB guidance ' \
+        'generalized a single unreproduced nil-content incident). If content ever arrives as nil, the error message ' \
+        'below explains recovery; no preemptive size limit applies.'
       end
       def category
@@ -128,9 +129,10 @@ module KairosMcp
       def content_missing_error(command, content)
         if content.nil?
           "Error: content is required for #{command}. " \
-          "The content argument arrived as nil — this often means the MCP transport silently dropped it " \
-          "because the combined argument size exceeded the client limit (~2 KB for stdio). " \
-          "Try trimming content and reason, or split into smaller entries."
+          "The content argument arrived as nil — the MCP transport or the calling LLM dropped it. " \
+          "(A ~2 KB stdio limit was once suspected, but a ~40 KB update succeeded on 2026-06-10, " \
+          "so size alone is unlikely to be the cause.) " \
+          "Retry the call; if nil persists, write the content to a file and report the incident."
         else
           "Error: content is required for #{command} (received empty string)"
         end

data/lib/kairos_mcp/version.rb CHANGED

@@ -1,4 +1,4 @@
 module KairosMcp
-  VERSION = "3.30.0"
+  VERSION = "3.31.0"
   CHANGELOG_URL = "https://github.com/masaomi/KairosChain_2026/blob/main/CHANGELOG.md"
 end

data/templates/knowledge/llm_cross_evaluation/scripts/run_cross_eval.rb CHANGED

@@ -45,6 +45,14 @@ MODELS = {
     input_mode: :stdin,
     thinking_effort: "medium",
   },
+  "claude_fable5" => {
+    tool: :claude,
+    cmd: "claude --print --model claude-fable-5 --effort medium",
+    label: "Claude Fable 5",
+    provider: "anthropic",
+    input_mode: :stdin,
+    thinking_effort: "medium",
+  },
   "claude_opus46" => {
     tool: :claude,
     cmd: "claude --print --model claude-opus-4-6 --effort medium",
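Each `MODELS` entry pairs a command line with `input_mode: :stdin`, meaning the prompt is fed to the process on standard input. The dispatch code of `run_cross_eval.rb` is not part of this hunk. The following is therefore only a sketch of how such an entry could be executed with Ruby's `Open3.capture3`; `run_model` is a hypothetical name.

```ruby
require 'open3'

# Hypothetical runner for a MODELS-style entry with input_mode :stdin:
# the prompt is written to the command's stdin and its stdout is returned.
def run_model(entry, prompt)
  unless entry[:input_mode] == :stdin
    raise ArgumentError, "unsupported input_mode: #{entry[:input_mode].inspect}"
  end

  stdout, stderr, status = Open3.capture3(entry[:cmd], stdin_data: prompt)
  raise "#{entry[:label]} failed (exit #{status.exitstatus}): #{stderr.strip}" unless status.success?

  stdout
end
```

`Open3.capture3` receives the command as a single string. Ruby routes such a string through `/bin/sh` only when it contains shell metacharacters. The `claude --print ...` commands above contain none, so they would be split on whitespace and executed directly.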

data/templates/knowledge/multi_llm_review_workflow/multi_llm_review_workflow.md CHANGED

@@ -1,7 +1,7 @@
 ---
 name: multi_llm_review_workflow
 description: "Multi-LLM review methodology and execution — workflow pattern, CLI tooling, consensus analysis, Persona Assembly. Applicable to design, implementation, documentation, or any artifact."
-version: "3.4"
+version: "3.5"
 tags:
   - workflow
   - review
@@ -192,11 +192,11 @@ starting** and verify each against `config/multi_llm_review.yml`:
 ```
 - [ ] Your model (orchestrator): ___
 - [ ] Agent Team Personas model: = orchestrator model (NOT a different model)
-- [ ] Subprocess CLI model: opposite Opus (4.6 if you are 4.7, vice versa)
+- [ ] Subprocess CLI models: Opus 4.6 AND Opus 4.8 (both, not either/or)
 - [ ] Codex models: gpt-5.5 (default) AND gpt-5.4 (both, not either/or)
 - [ ] Cursor model: default (composer-2.5, no --model flag)
-- [ ] Total reviewer count: 5 (or 4 after orchestrator exclusion from subprocess)
-- [ ] Convergence rule: 3/5 APPROVE (full) or 3/4 APPROVE (after exclusion)
+- [ ] Total reviewer count: 6 (or 5 after orchestrator exclusion from subprocess)
+- [ ] Convergence rule: 4/6 APPROVE (full) or 3/5 APPROVE (after exclusion)
 ```
 ### Common mistakes (Path A)
@@ -206,7 +206,7 @@ starting** and verify each against `config/multi_llm_review.yml`:
 | Exclude orchestrator model from Agent Team Personas | Agent Team uses orchestrator model — they provide persona diversity, not epistemic diversity | LLM misreads "do not assign yourself as a reviewer" as applying to Agent Team; it applies only to subprocess CLI |
 | Run only Codex GPT-5.4, skip 5.5 | Run both — they catch different things (5.5 found §5 schema contradiction in Phase 2 Case A that no other reviewer caught) | Cost-saving heuristic; roster has both for a reason |
 | Use a smaller/cheaper model as Agent Team substitute | Use the orchestrator's own model with different personas | Confusing "model diversity" with "persona diversity" — Agent Team is the latter |
-| Run 3 reviewers instead of 5 (or 4 after exclusion) | Use the full roster from config | Ad-hoc "3 is enough" reasoning; config specifies 5 for empirical reasons |
+| Run 3 reviewers instead of 6 (or 5 after exclusion) | Use the full roster from config | Ad-hoc "3 is enough" reasoning; config specifies 6 for empirical reasons |
 ## Roles
@@ -304,10 +304,10 @@ The rule applies **after** orchestrator classifies each finding as (a)/(b)/(c) p
 findings count toward the thresholds below; (c) findings are recorded as advisory
 and never block.
-- **3/4 APPROVE** (no (a)/(b) REJECT) = proceed to next step
+- **4/6 APPROVE** full roster, or **3/5 APPROVE** after orchestrator exclusion ("exclude" strategy only — the default "delegate" strategy keeps 6 voters via collect) (no (a)/(b) REJECT) = proceed to next step
 - **Any (a) or (b) REJECT or FAIL** = revise and re-review
 - **(c)-only REJECT** = record as advisory, non-blocking
-- **4/4 APPROVE** (no (a)/(b)) = highest confidence, proceed
+- **Unanimous APPROVE** (no (a)/(b)) = highest confidence, proceed
 - Legacy 3-reviewer mode: 2/3 APPROVE (no (a)/(b)) = proceed
 - Codex REJECT with (a)/(b) findings + others APPROVE = likely real issue, investigate before overriding
 - Codex REJECT with only (c) findings = expected per Codex value-system divergence; non-blocking
@@ -319,11 +319,11 @@ For normative detail and the underlying classification, see
 | Agreement | Meaning | Action |
 |-----------|---------|--------|
-| **4/4** (or **3/3**) | Architectural-level gap | Must fix |
-| **3/4** (or **2/3**) | Implementation-level issue | Should fix |
-| **1/4 only** | Specialty-specific insight | Do NOT ignore — often the most novel finding |
+| **N/N** (unanimous) | Architectural-level gap | Must fix |
+| **Majority** (e.g. 4/6, 3/5) | Implementation-level issue | Should fix |
+| **1/N only** | Specialty-specific insight | Do NOT ignore — often the most novel finding |
-1/4 (or 1/N) findings are not "minority opinions to discard." They represent unique expertise.
+1/N findings are not "minority opinions to discard." They represent unique expertise.
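The convergence thresholds above depend on how many reviewers actually vote, and that count depends on the orchestrator strategy. Only "exclude" removes the entry matching `orchestrator_model`; "subprocess" and "delegate" both end with the full 6 voters. Below is a minimal sketch of that dependency. It assumes verdicts have already been classified. The reviewer IDs are partly assumed, the vote symbols and method names are illustrative, and FAIL verdicts are not modeled.

```ruby
# Illustrative 6-entry roster (IDs partly assumed; models per the CLI matrix).
ROSTER = {
  'claude_team_fable5' => 'claude-fable-5',
  'claude_cli_opus4.6' => 'claude-opus-4-6',
  'claude_cli_opus4.8' => 'claude-opus-4-8',
  'codex_gpt5.5' => 'gpt-5.5',
  'codex_gpt5.4' => 'gpt-5.4',
  'cursor_composer2.5' => 'composer-2.5'
}.freeze

# Only "exclude" drops the entry whose model matches the orchestrator;
# "subprocess" and "delegate" keep every entry as a voter.
def voters(roster, orchestrator_model, strategy)
  return roster.keys unless strategy == 'exclude' && orchestrator_model

  roster.reject { |_id, model| model == orchestrator_model }.keys
end

# "4/6 APPROVE" -> [4, 6]
def parse_rule(rule)
  m = rule.match(%r{\A(\d+)/(\d+) APPROVE\z})
  raise ArgumentError, "unrecognized rule: #{rule}" unless m

  [m[1].to_i, m[2].to_i]
end

# votes: { id => :approve | :reject_ab | :reject_c }. :reject_ab (REJECT with
# (a)/(b) findings) blocks; :reject_c is advisory. Whether a (c)-only REJECT
# counts toward the APPROVE threshold is not stated; this sketch does not count it.
def converged?(votes, roster_size:, full_rule: '4/6 APPROVE', after_exclusion_rule: '3/5 APPROVE')
  return false if votes.value?(:reject_ab)

  rule = votes.size == roster_size ? full_rule : after_exclusion_rule
  required, total = parse_rule(rule)
  raise ArgumentError, "#{votes.size} voters, but #{rule} assumes #{total}" unless votes.size == total

  votes.values.count(:approve) >= required
end
```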
 ### Majority Rule — Reference Only
@@ -401,19 +401,20 @@ which agent 2>/dev/null && echo "agent: available" || echo "agent: NOT FOUND"
 which claude 2>/dev/null && echo "claude: available" || echo "claude: NOT FOUND"
 ```
-- All three available → Auto mode (4 reviewers, default)
-- Codex + Agent only → Auto mode (3 reviewers, legacy)
+- All three available → Auto mode (6 reviewers, default)
+- Codex + Agent only → Auto mode (legacy, reduced roster — apply "Legacy 3-reviewer mode: 2/3 APPROVE" from Convergence Rules)
 - Any of codex/agent missing → Manual mode
 - User override: `mode: manual` or `mode: auto`
-### CLI Tool Matrix (Tested 2026-03-28)
+### CLI Tool Matrix (tested 2026-03-28; Claude CLI 4.6/4.8 rows verified live 2026-06-10)
 | Tool | Command | Prompt Input | Output Collection | Model |
 |------|---------|-------------|-------------------|-------|
-| **Codex** | `codex exec` | stdin pipe: `cat prompt.md \| codex exec -` | `-o /path/output.md` | GPT-5.5 (default) |
+| **Codex** | `codex exec -m <model>` | stdin pipe: `cat prompt.md \| codex exec -` | `-o /path/output.md` | GPT-5.5 + GPT-5.4 (both roster entries, `-m` per entry) |
 | **Cursor Agent** | `agent -p` | File reference (stdin NOT supported) | stdout redirect: `> output.md` | Composer-2.5 (default) |
-| **Claude Code** | Agent tool (internal) | Direct prompt string | Write to workspace file | Opus 4.6 (session) |
-| **Claude CLI (4.7)** | `claude -p --model claude-opus-4-7 --bare` | stdin pipe: `cat prompt.md \| claude -p --model claude-opus-4-7 --bare` | stdout redirect: `> output.md` | Opus 4.7 |
+| **Claude Code** | Agent tool (internal) | Direct prompt string | Write to workspace file | Fable 5 (session) |
+| **Claude CLI (4.6)** | `claude -p --model claude-opus-4-6 --bare` | stdin pipe: `cat prompt.md \| claude -p --model claude-opus-4-6 --bare` | stdout redirect: `> output.md` | Opus 4.6 |
+| **Claude CLI (4.8)** | `claude -p --model claude-opus-4-8 --bare` | stdin pipe: `cat prompt.md \| claude -p --model claude-opus-4-8 --bare` | stdout redirect: `> output.md` | Opus 4.8 |
 ### Thinking Effort Configuration (validated 2026-04-20)
@@ -421,23 +422,30 @@ Based on cross-evaluation experiment (7 models × 4 tasks + Nomic, 518 CLI calls
 | Role | Model | Effort Flag | Rationale |
 |------|-------|-------------|-----------|
-| **Primary (orchestrator)** | Opus 4.6 | `--effort medium` | Sufficient for integration, dialogue, judgment |
-| **Reviewer: Agent Team** | Opus 4.6 | `--effort medium` | Evaluator quality adequate at medium |
-| **Reviewer: Claude CLI** | Opus 4.7 | `--effort low` | Evaluator quality is effort-independent (low≈high: 8.35 vs 8.16) |
+| **Primary (orchestrator)** | Fable 5 (session default) | (default) | Sufficient for integration, dialogue, judgment |
+| **Reviewer: Agent Team** | = orchestrator (Fable 5) | (default) | Personas inherit orchestrator model |
+| **Reviewer: Claude CLI** | Opus 4.6 / Opus 4.8 | (default; config `effort: medium`) | Evaluator quality is effort-independent (low≈high: 8.35 vs 8.16) — per 2026-04-29 policy reviewers stay at default |
 | **Coding sub-agent** | Opus 4.7 | `--effort medium` | Cost-effective default; use `high` for complex tasks |
 | **Design sub-agent** | Opus 4.7 | `--effort medium` | Cost-effective default; use `high` for complex tasks |
 | **Codex** | GPT-5.5 (default) | (no flag) | Fixed effort |
 | **Cursor Agent** | Composer-2.5 | (no flag) | Fixed effort |
+Note (2026-06-10): the effort experiment data is from the Opus 4.6/4.7
+generation. Fable 5 and Opus 4.8 effort sensitivity is not yet calibrated;
+defaults apply until re-measured.
 Key findings:
 - **Opus 4.6** high effort improves Evaluator/Strategy (+0.43/+0.200 Nomic), not Response
 - **Opus 4.7** high effort improves Response/Thinking (+0.81 code, +0.53 philosophy), not Evaluator
 - **Opus 4.7 low > Opus 4.6 high** in combined score — model generation > effort setting
-**Effort escalation**: For particularly complex tasks (Tier 3+ architecture, security-critical
-code, multi-component refactoring), the LLM accessing this skill SHOULD escalate to `--effort high`
-at its own judgment. No human approval is needed for effort escalation — it is a cost/quality
-tradeoff that the executing LLM is best positioned to evaluate in context.
+**Effort escalation** (coding/design sub-agents and the post-aggregation revision
+phase only — NOT reviewers, who stay at default per the 2026-04-29 policy): For
+particularly complex tasks (Tier 3+ architecture, security-critical code,
+multi-component refactoring), the LLM accessing this skill SHOULD escalate to
+`--effort high` at its own judgment. No human approval is needed for effort
+escalation — it is a cost/quality tradeoff that the executing LLM is best
+positioned to evaluate in context.
 ### Model Detection
@@ -454,15 +462,17 @@ agent --list-models 2>&1 | grep "(current\|default)"
 **Rule**: When invoking `multi_llm_review` (or running this workflow manually), the
 orchestrating LLM MUST pass its own model identifier as `orchestrator_model`.
-**Rationale**: The reviewer roster typically contains both Opus 4.6 and Opus 4.7
-entries. To avoid the orchestrator reviewing its own output (no independent signal),
-the dispatcher excludes any roster entry whose `model` matches `orchestrator_model`.
-This keeps the same SkillSet useful regardless of which Opus version the user has
-toggled to via `/model` — review composition adapts automatically.
+**Rationale**: The reviewer roster contains multiple Claude entries (Fable 5
+team slot, Opus 4.6 CLI, Opus 4.8 CLI). To avoid the orchestrator reviewing its
+own output (no independent signal), the dispatcher excludes or delegates the
+roster entry whose `model` matches `orchestrator_model` (per
+`orchestrator_strategy`). This keeps the same SkillSet useful
+regardless of which Claude model the user has toggled to via `/model` — review
+composition adapts automatically.
 **Why "argument-passing" not "file-introspection"**:
 - The orchestrator's model identity lives in *its own context* (system prompt
-  declares e.g. "You are powered by Opus 4.7"). No external file or env var is
+  declares e.g. "You are powered by Fable 5"). No external file or env var is
   authoritative — `/model` switches change context immediately.
 - MCP protocol does not transmit caller-model info; only the orchestrator can
   truthfully report its own identity. This is genuine self-reference: the system
@@ -473,7 +483,8 @@ toggled to via `/model` — review composition adapts automatically.
 **How orchestrator obtains its model ID**:
 - Claude Code sessions: read the system prompt line "You are powered by the
-  model named ... The exact model ID is `claude-opus-X-Y`". Use the exact ID.
+  model named ... The exact model ID is ...". Use the exact ID as stated,
+  whatever its form (e.g. `claude-fable-5`, `claude-opus-4-8`).
 - Other hosts: use whatever introspection the host provides; if none, pass
   `null` and accept that no exclusion happens.
@@ -482,27 +493,33 @@ toggled to via `/model` — review composition adapts automatically.
 multi_llm_review(
   artifact_path: "log/design.md",
   review_type: "design",
-  orchestrator_model: "claude-opus-4-7"   # MUST be set by caller
+  orchestrator_model: "claude-fable-5"    # MUST be set by caller
 )
 ```
 **Dispatcher behavior** (config: `exclude_orchestrator_model: true`, default `true`):
 - If `orchestrator_model` matches a roster entry's `model`, that entry is skipped.
 - `min_quorum` and `convergence_rule` apply to the remaining reviewers.
-- 4-reviewer roster → 3 reviewer; recommended `convergence_rule: "2/3 APPROVE"`
-  when one Opus is excluded.
+- 6-reviewer roster → 5 reviewers; `convergence_rule_after_exclusion: "3/5 APPROVE"`
+  (from config) replaces the full-roster rule. This reduced count applies to the
+  "exclude" strategy only. The "subprocess" strategy keeps the full roster (the
+  matching entry runs as a fresh CLI process instead of being skipped). Under the
+  default "delegate" strategy, the matching entry is dropped at dispatch but
+  re-added at collect as the persona-team entry, so the voter count returns to 6
+  and the full-roster rule (4/6 APPROVE) applies — verified live 2026-06-10.
 - If `orchestrator_model` is `null` or unmatched, full roster runs (back-compat).
 **Manual-mode equivalent**: When orchestrating by hand, do not assign yourself
-as a reviewer. Pick the *other* Opus version for the Claude CLI subprocess
-reviewer (4.6 if you are 4.7, and vice versa).
+as a subprocess reviewer. Run the Claude CLI subprocess reviewers (Opus 4.6 and
+Opus 4.8); if your own model matches one of them, skip that entry and use the
+after-exclusion convergence rule.
-### Orchestrator Delegation Protocol (Two-Phase, opt-in)
+### Orchestrator Delegation Protocol (Two-Phase, default)
 The `delegate` strategy lets the orchestrator perform persona-based "Agent Team"
 review in its own context — preserving inherited project context that a fresh
-`claude -p` subprocess loses. Subprocess reviewers (codex, cursor, opposite-Opus)
-remain single-LLM.
+`claude -p` subprocess loses. Subprocess reviewers (codex, cursor, Claude CLI
+Opus 4.6/4.8) remain single-LLM.
 **Why**: The orchestrator already holds the artifact in context with full project
 awareness. Re-shipping it to a sandboxed subprocess discards that context. Same-
@@ -537,8 +554,10 @@ cross-model subprocess reviewers give epistemic diversity. The two are complemen
   required fields. Fix and retry collect with the same token.
 - All-subprocess-failed at Call 1: returns error immediately; no token issued.
-**Default**: `orchestrator_strategy` defaults to `"exclude"` (back-compat). Use
-`"delegate"` explicitly until validated by use.
+**Default**: `orchestrator_strategy` defaults to `"delegate"` (config key
+`default_orchestrator_strategy`). `"exclude"` remains available as the legacy
+strategy. (Historical note: delegate was opt-in until validated by use; it has
+been the config default since v3.x.)
 #### Async/Parallel Collect Timing — Iron Rule
@@ -603,8 +622,8 @@ readable until GC. Read them directly and synthesize manually, then re-run
 - **Cursor Agent trust**: `--trust` required for headless/non-interactive mode
 - **Codex workspace**: `-C /path/to/workspace` to set working directory
 - **Claude Agent paths**: Write within workspace (e.g., `log/`), not `/tmp`
-- **Claude CLI (Opus 4.7)**: `claude -p --model claude-opus-4-7 --bare` runs as external process. Uses stdin pipe (like Codex). `--bare` required for review tasks (skips hooks, CLAUDE.md, avoids bias from project instructions). Without `--bare`, CLAUDE.md's three-layer response structure may distort review output
-- **Claude CLI parallelism**: Agent tool (internal, Opus 4.6) + Bash `claude -p` (external, Opus 4.7) run truly in parallel as separate processes
+- **Claude CLI (Opus 4.6 / 4.8)**: `claude -p --model claude-opus-4-6 --bare` (likewise `claude-opus-4-8`) runs as external process. Uses stdin pipe (like Codex). `--bare` required for review tasks (skips hooks, CLAUDE.md, avoids bias from project instructions). Without `--bare`, CLAUDE.md's three-layer response structure may distort review output
+- **Claude CLI parallelism**: Agent tool (internal, orchestrator model = Fable 5) + Bash `claude -p` (external, Opus 4.6 / 4.8) run truly in parallel as separate processes
 - **Claude CLI file access**: `claude -p` with `--bare` has no MCP tools or file access. Ensure review prompt includes all artifact content inline (rule #6). Use `--add-dir` + `--allowedTools "Read,Glob,Grep"` if file access is needed (but note: this loads CLAUDE.md unless `--bare` is also used)
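The parallelism notes above, combined with the config values cited in the CHANGELOG (`max_concurrent: 2`, `timeout_seconds: 600`), suggest a bounded runner: at most two reviewer processes at once, each killed if it exceeds the timeout. Below is a Unix-only sketch of such a runner. It assumes each reviewer is a shell command that reads the prompt on stdin and writes its review to stdout, as the `claude -p ... --bare` pipelines do. `run_reviewer` and `run_reviewers` are hypothetical and are not taken from the `multi_llm_review` SkillSet.

```ruby
require 'open3'

# Run one reviewer command: pipe `prompt` to stdin, collect stdout/stderr,
# and kill its whole process group if it runs longer than `timeout` seconds.
def run_reviewer(cmd, prompt, timeout)
  Open3.popen3(cmd, pgroup: true) do |stdin, stdout, stderr, wait_thr|
    writer = Thread.new do
      begin
        stdin.write(prompt)
      rescue Errno::EPIPE
        nil # reviewer exited before consuming all input
      ensure
        stdin.close
      end
    end
    out_reader = Thread.new { stdout.read }
    err_reader = Thread.new { stderr.read }

    timed_out = wait_thr.join(timeout).nil?
    if timed_out
      begin
        Process.kill('-TERM', wait_thr.pid) # leading "-" signals the process group
      rescue Errno::ESRCH
        nil # exited between the join and the kill
      end
      wait_thr.join
    end
    writer.join
    status = if timed_out then :timeout
             elsif wait_thr.value.success? then :ok
             else :failed
             end
    { status: status, output: out_reader.value, stderr: err_reader.value }
  end
rescue SystemCallError => e
  { status: :failed, output: '', stderr: e.message } # e.g. command not found
end

# Run `jobs` ({ reviewer_id => command }) with at most `max_concurrent`
# processes alive at once; returns { reviewer_id => result Hash }.
def run_reviewers(jobs, prompt, max_concurrent: 2, timeout: 600)
  queue = Queue.new
  jobs.each_pair { |id, cmd| queue << [id, cmd] }
  results = {}
  lock = Mutex.new
  workers = Array.new([max_concurrent, jobs.size].min) do
    Thread.new do
      loop do
        id, cmd = begin
          queue.pop(true) # non-blocking; raises ThreadError once drained
        rescue ThreadError
          break
        end
        result = run_reviewer(cmd, prompt, timeout)
        lock.synchronize { results[id] = result }
      end
    end
  end
  workers.each(&:join)
  results
end
```

Starting each command in its own process group (`pgroup: true`) matters when the command goes through `/bin/sh`. Signalling only the shell could leave its child running and holding the output pipes open, so the reads would never finish.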
 ## Prompt Generation Rules
@@ -709,13 +728,15 @@ Step 1: Generate review prompt
 Step 2: Detect environment and models
   - Run: which codex && which agent && which claude
   - Detect default models
-  - Report: "Auto mode: Codex (gpt-5.5), Agent (composer-2.5), Claude (opus-4.6), Claude CLI (opus-4.7)"
+  - Report: "Auto mode: Codex (gpt-5.5, gpt-5.4), Agent (composer-2.5), Claude Team (claude-fable-5), Claude CLI (opus-4.6, opus-4.8)"
-Step 3: Execute N reviews in parallel (default 4 reviewers)
-  - Bash(background): cat prompt.md | codex exec -C workspace -o log/review_codex.md -
+Step 3: Execute N reviews in parallel (default 6 reviewers)
+  - Bash(background): cat prompt.md | codex exec -m gpt-5.5 -C workspace -o log/review_codex_gpt5.5.md -
+  - Bash(background): cat prompt.md | codex exec -m gpt-5.4 -C workspace -o log/review_codex_gpt5.4.md -
   - Bash(background): agent -p --trust "Read prompt and review..." > log/review_cursor.md
-  - Agent(background): Claude Team (Opus 4.6) → write to log/review_claude_opus4.6.md
-  - Bash(background): cat prompt.md | claude -p --model claude-opus-4-7 --bare > log/review_claude_opus4.7.md 2>log/review_claude_opus4.7.stderr.log
+  - Agent(background): Claude Team (orchestrator model, Fable 5) → write to log/review_claude_team_fable5.md
+  - Bash(background): cat prompt.md | claude -p --model claude-opus-4-6 --bare > log/review_claude_opus4.6.md 2>log/review_claude_opus4.6.stderr.log
+  - Bash(background): cat prompt.md | claude -p --model claude-opus-4-8 --bare > log/review_claude_opus4.8.md 2>log/review_claude_opus4.8.stderr.log
 Step 4: Collect and validate
   - Wait for all to complete (background task notifications)
@@ -752,9 +773,11 @@ log/{artifact}_review{N}_{llm_id}_{date}.md       # Individual reviews
 log/{artifact}_review{N}_consensus_{date}.md       # Consensus analysis
 ```
-LLM identifiers: `claude_opus4.6`, `claude_team_opus4.6`,
-`claude_cli_opus4.7`, `codex_gpt5.5`, `codex_gpt5.4`, `cursor_composer2`, `cursor_gpt5.4`,
+LLM identifiers: `claude_team_fable5`, `claude_cli_opus4.6`, `claude_cli_opus4.8`,
+`codex_gpt5.5`, `codex_gpt5.4`, `cursor_composer2.5`, `cursor_gpt5.4`,
 `cursor_premium`
+(legacy, pre-2026-06-10: `claude_opus4.6`, `claude_team_opus4.6`, `claude_team_opus4.7`,
+`claude_cli_opus4.7`, `cursor_composer2`)
 ## Internal Agent Team Review
@@ -774,7 +797,7 @@ Compression ratio: parallel agent raw → Assembly ≈ 2:1
 - Don't advance to Phase N+1 before Phase N review converges
 - Don't re-review from scratch — each round checks only the delta
 - Don't use only internal agent team — different providers catch different bugs
-- Don't dismiss 1/4 (or 1/N) findings without evaluating substance
+- Don't dismiss 1/N findings without evaluating substance
 - Don't use Persona Assembly in every intermediate round (save for final gate)
 ---
@@ -793,6 +816,8 @@ Compression ratio: parallel agent raw → Assembly ≈ 2:1
 - Codex convergence: REJECT → REJECT → REJECT → APPROVE (4 rounds)
 - Self-referential review: v3.0 of this skill reviewed by its own process → v3.1
 - Self-referential review: v3.2 (4-reviewer update, 2026-04-19) reviewed with new 4-reviewer default (Opus 4.6 + 4.7 + Codex + Composer-2). 4/4 APPROVE WITH CHANGES R1. Findings integrated → v3.3
+- Roster update (v3.5, 2026-06-10): Fable 5 replaces Opus 4.7 as orchestrator/team slot; Opus 4.8 added as second subprocess CLI reviewer alongside Opus 4.6. 4.6 retained for its documented complementary bias (ambiguity-preserving, self-reference-friendly); 4.7 retired as its register is covered by 4.8 and Fable 5. 4.8/Fable 5 bias profiles uncalibrated — record (a)/(b)/(c) breakdowns per round until profiles accumulate in `multi_llm_reviewer_evaluation`
+- Self-referential review of v3.5 (2 rounds, 2026-06-10/11, first run of the 6-reviewer roster): R1 REVISE (1 APPROVE / 4 REJECT — stale pre-v3.5 passages) → fixes → R2 3/6 APPROVE (4.6, 4.8, codex 5.4) with Cursor contributing a code-grounded correction (subprocess strategy keeps the full roster). 4.6/4.8 verdicts split along the predicted lenient/strict axis in R1 and converged to APPROVE in R2
 **Key insight**: Design reviews and implementation reviews find
-**categorically different bugs**. Both phases are necessary.
+**categorically different bugs**. Both phases are necessary.
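Step 3 of the workflow launches every reviewer as a background job. The multi_llm_review.yml config in this release bounds that dispatch with `max_concurrent` (a semaphore limit) and a single global `timeout_seconds` deadline. The Ruby sketch below shows one way to express the pattern. It is an illustration under assumptions, not kairos-chain's dispatcher: `dispatch` and its arguments are hypothetical names, each job is a plain callable standing in for a CLI run such as `codex exec` or `claude -p`, and any reviewer still running when the deadline passes is reported as `:timeout`.

```ruby
# Hypothetical sketch of bounded-concurrency review dispatch; not the
# kairos-chain dispatcher. Each job is a callable standing in for one CLI run.
def dispatch(jobs, max_concurrent:, timeout_seconds:)
  queue = Queue.new
  jobs.each { |llm_id, job| queue << [llm_id, job] }
  queue.close # pop returns nil once the closed queue is drained

  results = {}
  lock = Mutex.new
  workers = Array.new([max_concurrent, jobs.size].min) do
    Thread.new do
      while (item = queue.pop)
        llm_id, job = item
        outcome =
          begin
            [:ok, job.call]
          rescue StandardError => e
            [:error, e.message]
          end
        lock.synchronize { results[llm_id] = outcome }
      end
    end
  end

  # One deadline shared by all workers, in the spirit of timeout_seconds.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_seconds
  workers.each do |worker|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    worker.join([remaining, 0].max)
  end
  # Stop workers that are still busy; their reviewers count as timed out.
  workers.each { |worker| worker.kill.join }

  lock.synchronize do
    jobs.keys.to_h { |llm_id| [llm_id, results.fetch(llm_id, [:timeout, nil])] }
  end
end
```

Killing a Ruby thread does not terminate a child process it started, so a real dispatcher would also have to signal the spawned CLI (for example with `Process.kill`) once the deadline passes.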

data/templates/skillsets/multi_llm_review/config/multi_llm_review.yml CHANGED

@@ -4,21 +4,24 @@
 # to avoid duplication.
 # Convergence rules
-# Roster has 5 reviewers (claude_team_opus4.7, claude_cli_opus4.6,
-# codex_gpt5.4, codex_gpt5.5, cursor_composer2.5). Rules are ratio-based
-# (parser interprets "N/M" as N/M fraction applied to successful count),
-# so the literal numerator/denominator is informational; what matters is
-# the ratio.
-convergence_rule: "3/5 APPROVE"  # 60% of successful reviewers must APPROVE
+# Roster has 6 reviewers (claude_team_fable5, claude_cli_opus4.6,
+# claude_cli_opus4.8, codex_gpt5.4, codex_gpt5.5, cursor_composer2.5).
+# Rules are ratio-based (parser interprets "N/M" as N/M fraction applied
+# to successful count), so the literal numerator/denominator is
+# informational; what matters is the ratio.
+convergence_rule: "4/6 APPROVE"  # ceil(6 * 0.6) = 4 of the 6-reviewer roster
 min_quorum: 2                    # minimum successful reviews for any verdict
 # Self-referential orchestrator exclusion.
-# When the caller passes orchestrator_model (e.g. "claude-opus-4-7"), any
+# When the caller passes orchestrator_model (e.g. "claude-fable-5"), any
 # roster entry with that exact model is dropped before dispatch — the
 # orchestrator should not review its own output.
-# When at least one entry is excluded, convergence_rule_after_exclusion
-# replaces convergence_rule for that dispatch (4 reviewers → 3 reviewers
-# makes "3/4 APPROVE" require unanimity, which is too strict).
+# Under the "exclude" strategy, when at least one entry is excluded,
+# convergence_rule_after_exclusion replaces convergence_rule for that
+# dispatch (the literal rule for the full roster would otherwise be too
+# strict for the reduced count). The "subprocess" strategy keeps the full
+# roster; the default "delegate" strategy re-adds the slot at collect, so
+# the full-roster rule governs there.
 exclude_orchestrator_model: true
 # Default orchestrator_strategy when the caller does not specify one.
@@ -29,9 +32,15 @@ exclude_orchestrator_model: true
 # "exclude": legacy behavior — drop the matching reviewer entirely.
 # "subprocess": spawn fresh claude -p for the matching reviewer.
 default_orchestrator_strategy: "delegate"
-# After excluding 1 orchestrator from 5 → 4 reviewers. Maintain the same
-# 60% ratio (ceil(4 * 0.6) = 3 → 3 of 4 must APPROVE).
-convergence_rule_after_exclusion: "3/4 APPROVE"
+# After excluding 1 orchestrator from 6 → 5 reviewers. Same ceil(N * 0.6)
+# majority basis (ceil(5 * 0.6) = 3 → 3 of 5 must APPROVE). Note the two
+# rules are not the same literal ratio (4/6 ≈ 0.67 vs 3/5 = 0.60); since the
+# parser applies the ratio to the successful count, the full-roster rule is
+# slightly stricter when some reviewers fail. Accepted as-is.
+# Applies to the "exclude" strategy only — "subprocess" keeps the full
+# roster, and "delegate" re-adds the orchestrator slot at collect, so the
+# full-roster rule (4/6) governs both (verified live 2026-06-10).
+convergence_rule_after_exclusion: "3/5 APPROVE"
 # Two-phase delegation (orchestrator_strategy: "delegate").
 # Phase 1 dispatches subprocess reviewers synchronously, persists their
@@ -73,7 +82,9 @@ delegation:
     wait_still_pending_streak_limit: 3        # consecutive still_pending returns before crashed/wait_exhausted
 # Dispatch settings
-timeout_seconds: 300              # global deadline for all reviewers
+timeout_seconds: 600              # global deadline for all reviewers
+                                  # (raised 300 -> 600 on 2026-06-10: live 6-roster
+                                  # run took 381s wall-clock with max_concurrent: 2)
 max_concurrent: 2                 # semaphore limit (2 for laptop, 4 for CI)
 # Reviewer roster
@@ -84,16 +95,31 @@ max_concurrent: 2                 # semaphore limit (2 for laptop, 4 for CI)
 # tool receives a `complexity` argument (or auto-detects it), these defaults
 # are overridden per-dispatch by the effort_map below.
 reviewers:
+  # Orchestrator slot. With orchestrator_strategy "delegate" (the default),
+  # this entry is replaced by the orchestrator's own persona Agent Team
+  # review when orchestrator_model matches. Updated 2026-06-10: Fable 5
+  # replaces Opus 4.7 as the session default model.
   - provider: claude_code
-    model: claude-opus-4-7
+    model: claude-fable-5
     effort: medium
-    role_label: claude_team_opus4.7
+    role_label: claude_team_fable5
   - provider: claude_code
     model: claude-opus-4-6
     effort: medium
     role_label: claude_cli_opus4.6
+  # Opus 4.8 added 2026-06-10, replacing Opus 4.7 in the roster. Rationale:
+  # 4.7's register (strictness / systematizing) is covered by its direct
+  # successor 4.8 and by Fable 5; 4.6 stays for its documented complementary
+  # bias (ambiguity-preserving, self-reference-friendly). 4.8's reviewer
+  # bias profile is not yet calibrated — record (a)/(b)/(c) breakdowns per
+  # round (see multi_llm_reviewer_evaluation) until a profile accumulates.
+  - provider: claude_code
+    model: claude-opus-4-8
+    effort: medium
+    role_label: claude_cli_opus4.8
   - provider: codex
     model: gpt-5.4
     effort: medium
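The YAML comments describe `convergence_rule` as a ratio applied to the number of successful reviews, with `convergence_rule_after_exclusion` used only when the `"exclude"` strategy actually drops the orchestrator's slot. A minimal Ruby sketch of that reading follows. The helper names (`parse_rule`, `required_approvals`, `plan_roster`, `effective_rule`, `converged?`) are hypothetical, the config is a plain Hash carrying the YAML's keys, and rounding up with integer arithmetic is an assumption chosen to match the `ceil(N * 0.6)` figures in the comments.

```ruby
# Hypothetical sketch of the ratio-based convergence rules described in
# multi_llm_review.yml; not the kairos-chain implementation.
Rule = Struct.new(:numerator, :denominator, :verdict)

# Parses rules such as "4/6 APPROVE" into their parts.
def parse_rule(text)
  match = text.match(%r{\A\s*(\d+)\s*/\s*(\d+)\s+(\w+)\s*\z})
  raise ArgumentError, "unparseable rule: #{text.inspect}" unless match

  Rule.new(match[1].to_i, match[2].to_i, match[3])
end

# Applies the N/M ratio to the successful count and rounds up, using integer
# arithmetic so that 6 * 4/6 is exactly 4 (no float rounding).
def required_approvals(rule, successful)
  (successful * rule.numerator + rule.denominator - 1) / rule.denominator
end

# Under "exclude", entries whose model equals orchestrator_model are dropped.
# "delegate" and "subprocess" keep the full roster, so nothing is filtered.
def plan_roster(roster, orchestrator_model, strategy)
  return [roster, 0] unless strategy == "exclude" && orchestrator_model

  kept = roster.reject { |entry| entry["model"] == orchestrator_model }
  [kept, roster.size - kept.size]
end

# The after-exclusion rule applies only when "exclude" removed a slot.
def effective_rule(config, strategy, excluded_count)
  if strategy == "exclude" && excluded_count.positive?
    config.fetch("convergence_rule_after_exclusion")
  else
    config.fetch("convergence_rule")
  end
end

# verdicts: one verdict string per reviewer that returned successfully.
def converged?(config, verdicts, strategy:, excluded_count: 0)
  return false if verdicts.size < config.fetch("min_quorum")

  rule = parse_rule(effective_rule(config, strategy, excluded_count))
  approvals = verdicts.count { |v| v == rule.verdict }
  approvals >= required_approvals(rule, verdicts.size)
end

config = {
  "convergence_rule" => "4/6 APPROVE",
  "convergence_rule_after_exclusion" => "3/5 APPROVE",
  "min_quorum" => 2
}
roster = %w[claude-fable-5 claude-opus-4-6 claude-opus-4-8 gpt-5.4 gpt-5.5 composer-2.5]
         .map { |model| { "model" => model } }
kept, excluded = plan_roster(roster, "claude-fable-5", "exclude")
verdicts = %w[APPROVE APPROVE APPROVE REJECT REJECT]
puts kept.size # 5
puts converged?(config, verdicts, strategy: "exclude", excluded_count: excluded) # true
puts converged?(config, verdicts, strategy: "delegate") # false
```

With five successful reviews, `"4/6 APPROVE"` still requires 4 approvals while `"3/5 APPROVE"` requires 3, which is the "slightly stricter when some reviewers fail" effect the config comment accepts.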

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: kairos-chain
 version: !ruby/object:Gem::Version
-  version: 3.30.0
+  version: 3.31.0
 platform: ruby
 authors:
 - Masaomi Hatakeyama
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2026-06-08 00:00:00.000000000 Z
+date: 2026-06-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: minitest
@@ -128,6 +128,7 @@ files:
 - lib/kairos_mcp/daemon/wal.rb
 - lib/kairos_mcp/daemon/wal_phase_recorder.rb
 - lib/kairos_mcp/daemon/wal_recovery.rb
+- lib/kairos_mcp/drift_detection/correspondence_checker.rb
 - lib/kairos_mcp/dsl_ast/ast_engine.rb
 - lib/kairos_mcp/dsl_ast/decompiler.rb
 - lib/kairos_mcp/dsl_ast/drift_detector.rb