kairos-chain 3.30.0 → 3.31.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 3ec94fadb095a49a89c0235010c80e37913aaf6781aa16d22a2e5855b1eb154d
4
- data.tar.gz: 38c3b631326b87f07285f9a5f4ecdd64fc22d086b6b08de76d5b2aaf2d7d7040
3
+ metadata.gz: 227d024f36839c595295ed0e3b4415c6764750597283b59a21ea1f5e16112210
4
+ data.tar.gz: 8fd8767580dbe3db617cbaf4ee0dcfafdb38f5823315f0321bae0c4893b2f45d
5
5
  SHA512:
6
- metadata.gz: 7d3d7e9d5f52b58109a795ce4f67243551ad353d18d273d4062034e09bcafe8201559ceabcef16c5ad0d0fc0bca4b8b8629c27cb49e224091ae05ffd5aa364ac
7
- data.tar.gz: a20e8e7a2cb18d8cbcbd76f39c7fa69a9bbd55d4447c16fa289ca8227fd2537325693748e711e46e57693c6572b41f15096dc358368f89911d95eaea3c714ca8
6
+ metadata.gz: 7486fef959f54577c0a13a301112f746f8c440100f11f5d424f27771759ad3024d6d30988b7a9e7c58a4544db1ffe6c2705b3271226d9445f1537dc591f96ef7
7
+ data.tar.gz: 8d670bb2a2f5d8d848072101527fcb5b3153c23ea4c5ae1359c1d46c30187011911b32512086e1e1734e72ba5d0d3ae20913499d5d1d8c4b4d1d359b79963281
data/CHANGELOG.md CHANGED
@@ -4,6 +4,41 @@ All notable changes to the `kairos-chain` gem will be documented in this file.
4
4
 
5
5
  This project follows [Semantic Versioning](https://semver.org/).
6
6
 
7
+ ## [3.31.0] - 2026-06-11
8
+
9
+ ### Changed — multi_llm_review roster: Fable 5 + Opus 4.6/4.8 (6 reviewers)
10
+
11
+ Default reviewer roster updated for the Fable 5 / Opus 4.8 model generation:
12
+
13
+ - Orchestrator/team slot: `claude-fable-5` (`claude_team_fable5`) replaces
14
+ Opus 4.7. Opus 4.8 added as a second subprocess CLI reviewer
15
+ (`claude_cli_opus4.8`) alongside Opus 4.6, which is retained for its
16
+ documented complementary bias (ambiguity-preserving, self-reference-friendly).
17
+ 4.7 retired: its register is covered by 4.8 and Fable 5.
18
+ - Convergence rules: `4/6 APPROVE` full roster, `3/5 APPROVE` after
19
+ orchestrator exclusion ("exclude" strategy only — "subprocess" keeps the
20
+ full roster; "delegate" re-adds the slot at collect, so 4/6 governs there).
21
+ - `timeout_seconds` raised 300 → 600 (live 6-roster run measured 381s
22
+ wall-clock with `max_concurrent: 2`).
23
+ - Validated by a 2-round self-referential review of the workflow L1 with the
24
+ new roster itself (R1 REVISE → fixes → R2 with 4.6/4.8/codex-5.4 APPROVE,
25
+ including a code-grounded Cursor correction of the exclusion semantics).
26
+
27
+ ### Changed — `multi_llm_review_workflow` L1 v3.5
28
+
29
+ - All roster-dependent sections updated (pre-flight checklist, CLI tool
30
+ matrix, convergence rules, orchestrator self-identification, orchestration
31
+ template, LLM identifiers). Claude CLI 4.6/4.8 rows verified live 2026-06-10.
32
+ - Effort escalation paragraph scoped to coding/design sub-agents and the
33
+ revision phase (reviewers stay at default per the 2026-04-29 policy).
34
+
35
+ ### Fixed — `knowledge_update` size guidance
36
+
37
+ - Removed the "~2 KB MCP stdio limit" warning from the tool description and
38
+ nil-content diagnostic: two ~40 KB updates succeeded over stdio on
39
+ 2026-06-10/11. The old figure generalized a single unreproduced
40
+ nil-content incident; the nil-content detection itself is retained.
41
+
7
42
  ## [3.30.0] - 2026-06-08
8
43
 
9
44
  ### Added — `dream_digest`: derived narrative view over L2/L1 fragments (dream SkillSet v0.3.0)
@@ -0,0 +1,136 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'digest'
4
+ require 'json'
5
+ require_relative '../kairos_chain/chain'
6
+
7
+ module KairosMcp
8
+ module DriftDetection
9
+ # CorrespondenceChecker — INV-A detection floor (Cycle 1, toward by-construction).
10
+ #
11
+ # Checks whether a live L0/L1 artifact still corresponds to its *current
12
+ # recorded provenance*: the content digest stored for that artifact at the
13
+ # head of the constitutive record (the hash chain). This is detection only —
14
+ # it surfaces divergence; it does not prevent edits or gate writes. Those are
15
+ # later cycles (single-source enforcement, record-as-gate).
16
+ #
17
+ # Provenance is rooted in the hash chain, not in the SQLite knowledge_meta
18
+ # cache: INV-A names the chain head as the non-editable anchor. The chain is
19
+ # therefore the single source consulted here; the meta table (when present)
20
+ # is a derived view and is intentionally not used for the comparison.
21
+ #
22
+ # The digest is computed over the *raw file content* (frontmatter included),
23
+ # matching exactly how it was recorded on create/update (a verbatim write,
24
+ # no normalization). Comparing the parsed/stripped body would never match.
25
+ class CorrespondenceChecker
26
+ # Result of a single correspondence check.
27
+ #
28
+ # status:
29
+ # :match live artifact corresponds to recorded provenance
30
+ # :mismatch live content diverged from recorded provenance (silent edit)
31
+ # :missing_record live artifact relied upon, but no recorded provenance exists
32
+ # :missing_artifact recorded/expected artifact is absent at the reliance point
33
+ # :error the check itself could not complete (not a correspondence claim)
34
+ Result = Struct.new(
35
+ :status, :name, :active_digest, :recorded_digest, :message,
36
+ keyword_init: true
37
+ ) do
38
+ def corresponds?
39
+ status == :match
40
+ end
41
+
42
+ # A surfaced non-correspondence per INV-A (divergence, not an internal error).
43
+ def divergence?
44
+ %i[mismatch missing_record missing_artifact].include?(status)
45
+ end
46
+ end
47
+
48
+ class << self
49
+ # Check an L1 knowledge artifact against its recorded provenance.
50
+ #
51
+ # @param name [String] knowledge id (knowledge_id on the chain record)
52
+ # @param md_file_path [String, nil] path to the live .md file relied upon
53
+ # @param storage_backend [Storage::Backend, nil] backend for chain access
54
+ # @return [Result]
55
+ def check_l1(name:, md_file_path:, storage_backend: nil)
56
+ unless md_file_path && File.file?(md_file_path)
57
+ # Relied upon but absent — a missing artifact is itself a
58
+ # non-correspondence under INV-A (the expected set is recorded).
59
+ return Result.new(
60
+ status: :missing_artifact, name: name,
61
+ active_digest: nil, recorded_digest: nil,
62
+ message: "L1 '#{name}': artifact missing at the point of reliance"
63
+ )
64
+ end
65
+
66
+ active = Digest::SHA256.hexdigest(File.read(md_file_path))
67
+ recorded = recorded_digest_for(name, storage_backend)
68
+
69
+ if recorded.nil?
70
+ return Result.new(
71
+ status: :missing_record, name: name,
72
+ active_digest: active, recorded_digest: nil,
73
+ message: "L1 '#{name}': live artifact has no recorded provenance on the chain"
74
+ )
75
+ end
76
+
77
+ if active == recorded
78
+ Result.new(
79
+ status: :match, name: name,
80
+ active_digest: active, recorded_digest: recorded, message: nil
81
+ )
82
+ else
83
+ Result.new(
84
+ status: :mismatch, name: name,
85
+ active_digest: active, recorded_digest: recorded,
86
+ message: "L1 '#{name}': live content diverged from recorded provenance " \
87
+ "(active #{short(active)} ≠ recorded #{short(recorded)})"
88
+ )
89
+ end
90
+ rescue StandardError => e
91
+ Result.new(
92
+ status: :error, name: name,
93
+ active_digest: nil, recorded_digest: nil,
94
+ message: "L1 '#{name}': correspondence check could not complete: #{e.message}"
95
+ )
96
+ end
97
+
98
+ private
99
+
100
+ # The current recorded content digest for a knowledge_id: the next_hash of
101
+ # the most recent knowledge_update record, scanning the chain from head
102
+ # backward. Returns nil when the most recent relevant record removed the
103
+ # artifact (next_hash nil — delete/archive) or when none exists.
104
+ def recorded_digest_for(name, storage_backend)
105
+ chain = KairosChain::Chain.new(storage_backend: storage_backend)
106
+ chain.chain.reverse_each do |block|
107
+ Array(block.data).each do |entry|
108
+ record = parse_entry(entry)
109
+ next unless record.is_a?(Hash)
110
+ next unless record['type'] == 'knowledge_update'
111
+ next unless record['knowledge_id'] == name
112
+
113
+ # First match from the head is the current provenance (may be nil
114
+ # if the artifact was removed — caller treats nil as no record).
115
+ return record['next_hash']
116
+ end
117
+ end
118
+ nil
119
+ end
120
+
121
+ def parse_entry(entry)
122
+ return entry if entry.is_a?(Hash)
123
+ return JSON.parse(entry) if entry.is_a?(String)
124
+
125
+ nil
126
+ rescue JSON::ParserError
127
+ nil
128
+ end
129
+
130
+ def short(digest)
131
+ digest ? digest[0, 12] : '-'
132
+ end
133
+ end
134
+ end
135
+ end
136
+ end
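The provenance lookup and digest comparison in `CorrespondenceChecker` can be exercised without a real chain. The sketch below is a hypothetical stand-in, not the gem's code: `FakeBlock`, `recorded_digest` and `correspondence_status` are illustrative names. It assumes each block's `data` holds Hash or JSON-string entries, as `parse_entry` suggests. It also wraps a bare Hash before iterating, because `Array()` on a Hash yields `[key, value]` pairs.

```ruby
require 'digest'
require 'json'

# Hypothetical stand-in for a chain block: only `data` is modelled.
FakeBlock = Struct.new(:data)

# Mirrors parse_entry: Hash as-is, JSON string parsed, anything else ignored.
def parse_entry(entry)
  return entry if entry.is_a?(Hash)
  return JSON.parse(entry) if entry.is_a?(String)

  nil
rescue JSON::ParserError
  nil
end

# Scan from the head backward; the first knowledge_update for `name` wins,
# even when its next_hash is nil (delete/archive).
def recorded_digest(blocks, name)
  blocks.reverse_each do |block|
    # Wrap a bare Hash so Array() does not split it into [key, value] pairs.
    entries = block.data.is_a?(Hash) ? [block.data] : Array(block.data)
    entries.each do |entry|
      record = parse_entry(entry)
      next unless record.is_a?(Hash)
      next unless record['type'] == 'knowledge_update' && record['knowledge_id'] == name

      return record['next_hash']
    end
  end
  nil
end

# Compare the raw live content (frontmatter included, no normalization).
def correspondence_status(live_content, recorded)
  return :missing_record if recorded.nil?

  Digest::SHA256.hexdigest(live_content) == recorded ? :match : :mismatch
end

v1 = "---\ntitle: demo\n---\nbody v1\n"
v2 = "---\ntitle: demo\n---\nbody v2\n"
chain = [
  FakeBlock.new([{ 'type' => 'knowledge_update', 'knowledge_id' => 'demo',
                   'next_hash' => Digest::SHA256.hexdigest(v1) }]),
  FakeBlock.new([JSON.generate({ 'type' => 'knowledge_update', 'knowledge_id' => 'demo',
                                 'next_hash' => Digest::SHA256.hexdigest(v2) })])
]

recorded = recorded_digest(chain, 'demo')
puts correspondence_status(v2, recorded)                         # => match
puts correspondence_status(v1, recorded)                         # => mismatch
puts correspondence_status(v1, recorded_digest(chain, 'other'))  # => missing_record
```

Because the head record is the newest update, an artifact rewritten back to an older version (v1 here) still counts as a mismatch.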
@@ -23,6 +23,11 @@ module KairosMcp
23
23
  # - Blockchain: Uses the configured storage backend
24
24
  #
25
25
  class KnowledgeProvider
26
+ # Main knowledge directory (constitutively-recorded L1). Exposed so callers
27
+ # can distinguish main-dir knowledge from read-only external SkillSet
28
+ # knowledge, e.g. to scope INV-A correspondence checks to recorded artifacts.
29
+ attr_reader :knowledge_dir
30
+
26
31
  ARCHIVED_DIR = '.archived'
27
32
  ARCHIVE_META_FILE = '.archive_meta.yml'
28
33
  # Backup directories created by upgrade flow (`.bak.<timestamp>`).
@@ -2,6 +2,7 @@
2
2
 
3
3
  require_relative 'base_tool'
4
4
  require_relative '../knowledge_provider'
5
+ require_relative '../drift_detection/correspondence_checker'
5
6
 
6
7
  module KairosMcp
7
8
  module Tools
@@ -77,11 +78,59 @@ module KairosMcp
77
78
  end
78
79
 
79
80
  output = build_output(skill, arguments, provider)
80
- text_content(output)
81
+ # INV-A detection floor: reading L1 knowledge "in order to act upon" is a
82
+ # point of reliance. Surface any divergence from the recorded provenance
83
+ # here — never silently. Scoped to main-dir L1 (external SkillSet
84
+ # knowledge has no chain provenance and would false-positive).
85
+ banner = correspondence_banner(skill, provider)
86
+ text_content(banner ? banner + output : output)
81
87
  end
82
88
 
83
89
  private
84
90
 
91
+ # Returns a surfacing prefix if the live artifact does not correspond to its
92
+ # recorded provenance, or nil when it corresponds (stay silent on match).
93
+ #
94
+ # Surfacing is graded by signal strength (the invariant requires only
95
+ # non-silence; grading is a Cycle-1 backlog policy):
96
+ # :mismatch a recorded artifact whose content silently changed —
97
+ # high signal, rare → an alarm banner.
98
+ # :missing_record a live artifact with no chain provenance — overwhelmingly
99
+ # template-provisioned knowledge whose provenance root is the
100
+ # gem/template, not a per-instance record. Chain-rooting the
101
+ # expected set is explicit Cycle-1 backlog, so this is a muted
102
+ # one-line note, not an alarm — bannering every bundled read
103
+ # would train the reader to ignore the banner.
104
+ def correspondence_banner(skill, provider)
105
+ return nil unless main_dir_l1?(skill, provider)
106
+
107
+ result = DriftDetection::CorrespondenceChecker.check_l1(
108
+ name: skill.name,
109
+ md_file_path: skill.md_file_path
110
+ )
111
+
112
+ case result.status
113
+ when :mismatch, :missing_artifact
114
+ "> ⚠️ **Drift detected (INV-A)** — #{result.message}.\n" \
115
+ "> This content was modified outside the recorded change path; treat it as unverified.\n\n"
116
+ when :missing_record
117
+ "> ℹ️ No recorded provenance for this entry (provisioning not yet chain-rooted — Cycle-1 backlog).\n\n"
118
+ end
119
+ rescue StandardError => e
120
+ # A failed check must not break the read; report it without claiming correspondence.
121
+ warn "[knowledge_get] correspondence check failed: #{e.message}"
122
+ nil
123
+ end
124
+
125
+ # True only for knowledge living under the provider's main knowledge dir —
126
+ # i.e. constitutively-recorded L1, not read-only external SkillSet knowledge.
127
+ def main_dir_l1?(skill, provider)
128
+ return false unless skill.md_file_path && provider.respond_to?(:knowledge_dir)
129
+
130
+ root = File.expand_path(provider.knowledge_dir) + File::SEPARATOR
131
+ File.expand_path(skill.md_file_path).start_with?(root)
132
+ end
133
+
85
134
  def build_output(skill, arguments, provider)
86
135
  output = "## [#{skill.name}] #{skill.description || 'No description'}\n\n"
87
136
  output += "**Layer:** L1 (Knowledge)\n"
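`main_dir_l1?` appends `File::SEPARATOR` before the prefix test. Without it, a sibling directory such as `knowledge_extra/` would also pass as main-dir knowledge. The sketch below isolates that check together with a status-to-banner mapping. `under_dir?` and `banner_for` are hypothetical names. Unlike the diff, where an `:error` result from `check_l1` falls through to `nil`, this variant also surfaces `:error` as a note. That is an assumed policy choice, not the gem's behavior.

```ruby
# Hypothetical helper mirroring main_dir_l1?: true only for paths strictly under dir.
def under_dir?(path, dir)
  return false if path.nil?

  root = File.expand_path(dir) + File::SEPARATOR
  File.expand_path(path).start_with?(root)
end

# Hypothetical grading of CorrespondenceChecker statuses into banner text.
def banner_for(status, message)
  case status
  when :mismatch, :missing_artifact
    "> Drift detected (INV-A): #{message}.\n\n"
  when :missing_record
    "> No recorded provenance for this entry.\n\n"
  when :error
    # Variant: report a check that could not complete instead of staying silent.
    "> Correspondence check did not complete: #{message}.\n\n"
  end
  # :match (or any unknown status) returns nil, so the read stays silent.
end

puts under_dir?('/srv/knowledge/a/a.md', '/srv/knowledge')     # => true
puts under_dir?('/srv/knowledge_extra/a.md', '/srv/knowledge') # => false
p banner_for(:match, nil)                                       # => nil
```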
@@ -13,8 +13,9 @@ module KairosMcp
13
13
 
14
14
  def description
15
15
  'Create, update, or delete L1 knowledge skills. Changes are recorded with hash references to the blockchain. ' \
16
- 'NOTE: MCP stdio transport may silently drop large arguments. Keep combined content + reason under ~2 KB. ' \
17
- 'For larger L1 entries, trim prose or split into multiple entries.'
16
+ 'Large content is supported (a ~40 KB update was verified over stdio on 2026-06-10; the earlier ~2 KB guidance ' \
17
+ 'generalized a single unreproduced nil-content incident). If content ever arrives as nil, the returned error message ' \
18
+ 'explains recovery; no preemptive size limit applies.'
18
19
  end
19
20
 
20
21
  def category
@@ -128,9 +129,10 @@ module KairosMcp
128
129
  def content_missing_error(command, content)
129
130
  if content.nil?
130
131
  "Error: content is required for #{command}. " \
131
- "The content argument arrived as nil — this often means the MCP transport silently dropped it " \
132
- "because the combined argument size exceeded the client limit (~2 KB for stdio). " \
133
- "Try trimming content and reason, or split into smaller entries."
132
+ "The content argument arrived as nil — the MCP transport or the calling LLM dropped it. " \
133
+ "(A ~2 KB stdio limit was once suspected, but a ~40 KB update succeeded on 2026-06-10, " \
134
+ "so size alone is unlikely to be the cause.) " \
135
+ "Retry the call; if nil persists, write the content to a file and report the incident."
134
136
  else
135
137
  "Error: content is required for #{command} (received empty string)"
136
138
  end
@@ -1,4 +1,4 @@
1
1
  module KairosMcp
2
- VERSION = "3.30.0"
2
+ VERSION = "3.31.0"
3
3
  CHANGELOG_URL = "https://github.com/masaomi/KairosChain_2026/blob/main/CHANGELOG.md"
4
4
  end
@@ -45,6 +45,14 @@ MODELS = {
45
45
  input_mode: :stdin,
46
46
  thinking_effort: "medium",
47
47
  },
48
+ "claude_fable5" => {
49
+ tool: :claude,
50
+ cmd: "claude --print --model claude-fable-5 --effort medium",
51
+ label: "Claude Fable 5",
52
+ provider: "anthropic",
53
+ input_mode: :stdin,
54
+ thinking_effort: "medium",
55
+ },
48
56
  "claude_opus46" => {
49
57
  tool: :claude,
50
58
  cmd: "claude --print --model claude-opus-4-6 --effort medium",
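Entries with `input_mode: :stdin` receive the review prompt on standard input, matching the `cat prompt.md | claude -p ...` pipes in the CLI tool matrix. The dispatcher itself is not part of this diff. The sketch below is a hypothetical stand-in (`run_stdin_reviewer` is an illustrative name) that shows one way to run such an entry with Ruby's `Open3.capture3`. It does not model the `timeout_seconds` setting.

```ruby
require 'open3'

# Hypothetical dispatcher for one MODELS entry with input_mode: :stdin.
# Returns stdout on success; raises with stderr on a non-zero exit.
def run_stdin_reviewer(entry, prompt)
  unless entry[:input_mode] == :stdin
    raise ArgumentError, "#{entry[:label]}: expected input_mode :stdin"
  end

  # The prompt is written to the child's stdin; stdout/stderr are collected.
  stdout, stderr, status = Open3.capture3(entry[:cmd], stdin_data: prompt)
  raise "#{entry[:label]} failed (#{status.exitstatus}): #{stderr}" unless status.success?

  stdout
end

# Usage with a real entry (assumes MODELS is in scope and the claude CLI is on PATH):
#   run_stdin_reviewer(MODELS["claude_fable5"], File.read("log/prompt.md"))
```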
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  name: multi_llm_review_workflow
3
3
  description: "Multi-LLM review methodology and execution — workflow pattern, CLI tooling, consensus analysis, Persona Assembly. Applicable to design, implementation, documentation, or any artifact."
4
- version: "3.4"
4
+ version: "3.5"
5
5
  tags:
6
6
  - workflow
7
7
  - review
@@ -192,11 +192,11 @@ starting** and verify each against `config/multi_llm_review.yml`:
192
192
  ```
193
193
  - [ ] Your model (orchestrator): ___
194
194
  - [ ] Agent Team Personas model: = orchestrator model (NOT a different model)
195
- - [ ] Subprocess CLI model: opposite Opus (4.6 if you are 4.7, vice versa)
195
+ - [ ] Subprocess CLI models: Opus 4.6 AND Opus 4.8 (both, not either/or)
196
196
  - [ ] Codex models: gpt-5.5 (default) AND gpt-5.4 (both, not either/or)
197
197
  - [ ] Cursor model: default (composer-2.5, no --model flag)
198
- - [ ] Total reviewer count: 5 (or 4 after orchestrator exclusion from subprocess)
199
- - [ ] Convergence rule: 3/5 APPROVE (full) or 3/4 APPROVE (after exclusion)
198
+ - [ ] Total reviewer count: 6 (or 5 after orchestrator exclusion from subprocess)
199
+ - [ ] Convergence rule: 4/6 APPROVE (full) or 3/5 APPROVE (after exclusion)
200
200
  ```
201
201
 
202
202
  ### Common mistakes (Path A)
@@ -206,7 +206,7 @@ starting** and verify each against `config/multi_llm_review.yml`:
206
206
  | Exclude orchestrator model from Agent Team Personas | Agent Team uses orchestrator model — they provide persona diversity, not epistemic diversity | LLM misreads "do not assign yourself as a reviewer" as applying to Agent Team; it applies only to subprocess CLI |
207
207
  | Run only Codex GPT-5.4, skip 5.5 | Run both — they catch different things (5.5 found §5 schema contradiction in Phase 2 Case A that no other reviewer caught) | Cost-saving heuristic; roster has both for a reason |
208
208
  | Use a smaller/cheaper model as Agent Team substitute | Use the orchestrator's own model with different personas | Confusing "model diversity" with "persona diversity" — Agent Team is the latter |
209
- | Run 3 reviewers instead of 5 (or 4 after exclusion) | Use the full roster from config | Ad-hoc "3 is enough" reasoning; config specifies 5 for empirical reasons |
209
+ | Run 3 reviewers instead of 6 (or 5 after exclusion) | Use the full roster from config | Ad-hoc "3 is enough" reasoning; config specifies 6 for empirical reasons |
210
210
 
211
211
  ## Roles
212
212
 
@@ -304,10 +304,10 @@ The rule applies **after** orchestrator classifies each finding as (a)/(b)/(c) p
304
304
  findings count toward the thresholds below; (c) findings are recorded as advisory
305
305
  and never block.
306
306
 
307
- - **3/4 APPROVE** (no (a)/(b) REJECT) = proceed to next step
307
+ - **4/6 APPROVE** full roster, or **3/5 APPROVE** after orchestrator exclusion (no (a)/(b) REJECT) = proceed to next step. The 3/5 rule applies to the "exclude" strategy only: "subprocess" keeps the full roster, and the default "delegate" strategy restores 6 voters at collect
308
308
  - **Any (a) or (b) REJECT or FAIL** = revise and re-review
309
309
  - **(c)-only REJECT** = record as advisory, non-blocking
310
- - **4/4 APPROVE** (no (a)/(b)) = highest confidence, proceed
310
+ - **Unanimous APPROVE** (no (a)/(b)) = highest confidence, proceed
311
311
  - Legacy 3-reviewer mode: 2/3 APPROVE (no (a)/(b)) = proceed
312
312
  - Codex REJECT with (a)/(b) findings + others APPROVE = likely real issue, investigate before overriding
313
313
  - Codex REJECT with only (c) findings = expected per Codex value-system divergence; non-blocking
@@ -319,11 +319,11 @@ For normative detail and the underlying classification, see
319
319
 
320
320
  | Agreement | Meaning | Action |
321
321
  |-----------|---------|--------|
322
- | **4/4** (or **3/3**) | Architectural-level gap | Must fix |
323
- | **3/4** (or **2/3**) | Implementation-level issue | Should fix |
324
- | **1/4 only** | Specialty-specific insight | Do NOT ignore — often the most novel finding |
322
+ | **N/N** (unanimous) | Architectural-level gap | Must fix |
323
+ | **Majority** (e.g. 4/6, 3/5) | Implementation-level issue | Should fix |
324
+ | **1/N only** | Specialty-specific insight | Do NOT ignore — often the most novel finding |
325
325
 
326
- 1/4 (or 1/N) findings are not "minority opinions to discard." They represent unique expertise.
326
+ 1/N findings are not "minority opinions to discard." They represent unique expertise.
327
327
 
328
328
  ### Majority Rule — Reference Only
329
329
 
@@ -401,19 +401,20 @@ which agent 2>/dev/null && echo "agent: available" || echo "agent: NOT FOUND"
401
401
  which claude 2>/dev/null && echo "claude: available" || echo "claude: NOT FOUND"
402
402
  ```
403
403
 
404
- - All three available → Auto mode (4 reviewers, default)
405
- - Codex + Agent only → Auto mode (3 reviewers, legacy)
404
+ - All three available → Auto mode (6 reviewers, default)
405
+ - Codex + Agent only → Auto mode (legacy, reduced roster — apply "Legacy 3-reviewer mode: 2/3 APPROVE" from Convergence Rules)
406
406
  - Any of codex/agent missing → Manual mode
407
407
  - User override: `mode: manual` or `mode: auto`
408
408
 
409
- ### CLI Tool Matrix (Tested 2026-03-28)
409
+ ### CLI Tool Matrix (tested 2026-03-28; Claude CLI 4.6/4.8 rows verified live 2026-06-10)
410
410
 
411
411
  | Tool | Command | Prompt Input | Output Collection | Model |
412
412
  |------|---------|-------------|-------------------|-------|
413
- | **Codex** | `codex exec` | stdin pipe: `cat prompt.md \| codex exec -` | `-o /path/output.md` | GPT-5.5 (default) |
413
+ | **Codex** | `codex exec -m <model>` | stdin pipe: `cat prompt.md \| codex exec -m <model> -` | `-o /path/output.md` | GPT-5.5 + GPT-5.4 (both roster entries, `-m` per entry) |
414
414
  | **Cursor Agent** | `agent -p` | File reference (stdin NOT supported) | stdout redirect: `> output.md` | Composer-2.5 (default) |
415
- | **Claude Code** | Agent tool (internal) | Direct prompt string | Write to workspace file | Opus 4.6 (session) |
416
- | **Claude CLI (4.7)** | `claude -p --model claude-opus-4-7 --bare` | stdin pipe: `cat prompt.md \| claude -p --model claude-opus-4-7 --bare` | stdout redirect: `> output.md` | Opus 4.7 |
415
+ | **Claude Code** | Agent tool (internal) | Direct prompt string | Write to workspace file | Fable 5 (session) |
416
+ | **Claude CLI (4.6)** | `claude -p --model claude-opus-4-6 --bare` | stdin pipe: `cat prompt.md \| claude -p --model claude-opus-4-6 --bare` | stdout redirect: `> output.md` | Opus 4.6 |
417
+ | **Claude CLI (4.8)** | `claude -p --model claude-opus-4-8 --bare` | stdin pipe: `cat prompt.md \| claude -p --model claude-opus-4-8 --bare` | stdout redirect: `> output.md` | Opus 4.8 |
417
418
 
418
419
  ### Thinking Effort Configuration (validated 2026-04-20)
419
420
 
@@ -421,23 +422,30 @@ Based on cross-evaluation experiment (7 models × 4 tasks + Nomic, 518 CLI calls
421
422
 
422
423
  | Role | Model | Effort Flag | Rationale |
423
424
  |------|-------|-------------|-----------|
424
- | **Primary (orchestrator)** | Opus 4.6 | `--effort medium` | Sufficient for integration, dialogue, judgment |
425
- | **Reviewer: Agent Team** | Opus 4.6 | `--effort medium` | Evaluator quality adequate at medium |
426
- | **Reviewer: Claude CLI** | Opus 4.7 | `--effort low` | Evaluator quality is effort-independent (low≈high: 8.35 vs 8.16) |
425
+ | **Primary (orchestrator)** | Fable 5 (session default) | (default) | Sufficient for integration, dialogue, judgment |
426
+ | **Reviewer: Agent Team** | = orchestrator (Fable 5) | (default) | Personas inherit orchestrator model |
427
+ | **Reviewer: Claude CLI** | Opus 4.6 / Opus 4.8 | (default; config `effort: medium`) | Evaluator quality is effort-independent (low≈high: 8.35 vs 8.16) — per 2026-04-29 policy reviewers stay at default |
427
428
  | **Coding sub-agent** | Opus 4.7 | `--effort medium` | Cost-effective default; use `high` for complex tasks |
428
429
  | **Design sub-agent** | Opus 4.7 | `--effort medium` | Cost-effective default; use `high` for complex tasks |
429
430
  | **Codex** | GPT-5.5 (default) | (no flag) | Fixed effort |
430
431
  | **Cursor Agent** | Composer-2.5 | (no flag) | Fixed effort |
431
432
 
433
+ Note (2026-06-10): the effort experiment data is from the Opus 4.6/4.7
434
+ generation. Fable 5 and Opus 4.8 effort sensitivity is not yet calibrated;
435
+ defaults apply until re-measured.
436
+
432
437
  Key findings:
433
438
  - **Opus 4.6** high effort improves Evaluator/Strategy (+0.43/+0.200 Nomic), not Response
434
439
  - **Opus 4.7** high effort improves Response/Thinking (+0.81 code, +0.53 philosophy), not Evaluator
435
440
  - **Opus 4.7 low > Opus 4.6 high** in combined score — model generation > effort setting
436
441
 
437
- **Effort escalation**: For particularly complex tasks (Tier 3+ architecture, security-critical
438
- code, multi-component refactoring), the LLM accessing this skill SHOULD escalate to `--effort high`
439
- at its own judgment. No human approval is needed for effort escalation — it is a cost/quality
440
- tradeoff that the executing LLM is best positioned to evaluate in context.
442
+ **Effort escalation** (coding/design sub-agents and the post-aggregation revision
443
+ phase only — NOT reviewers, who stay at default per the 2026-04-29 policy): For
444
+ particularly complex tasks (Tier 3+ architecture, security-critical code,
445
+ multi-component refactoring), the LLM accessing this skill SHOULD escalate to
446
+ `--effort high` at its own judgment. No human approval is needed for effort
447
+ escalation — it is a cost/quality tradeoff that the executing LLM is best
448
+ positioned to evaluate in context.
441
449
 
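The escalation rule above separates roles that may escalate from reviewers that must not. The skill leaves the decision itself to the executing LLM's judgment; in the sketch below, a fixed list of task kinds only stands in for that judgment. The role and task-kind symbols and `effort_override` are hypothetical names.

```ruby
# Hypothetical symbols; only the policy (who may escalate) comes from the table above.
ESCALATABLE_ROLES = %i[coding_subagent design_subagent revision].freeze
COMPLEX_TASK_KINDS = %i[tier3_architecture security_critical multi_component_refactor].freeze

# Returns "--effort high" when escalation is allowed and warranted, else nil.
# nil means "keep the role's configured default" (reviewers always get nil).
def effort_override(role, task_kind)
  return nil unless ESCALATABLE_ROLES.include?(role)

  COMPLEX_TASK_KINDS.include?(task_kind) ? '--effort high' : nil
end

p effort_override(:coding_subagent, :security_critical)     # => "--effort high"
p effort_override(:claude_cli_reviewer, :security_critical) # => nil
```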
442
450
  ### Model Detection
443
451
 
@@ -454,15 +462,17 @@ agent --list-models 2>&1 | grep "(current\|default)"
454
462
  **Rule**: When invoking `multi_llm_review` (or running this workflow manually), the
455
463
  orchestrating LLM MUST pass its own model identifier as `orchestrator_model`.
456
464
 
457
- **Rationale**: The reviewer roster typically contains both Opus 4.6 and Opus 4.7
458
- entries. To avoid the orchestrator reviewing its own output (no independent signal),
459
- the dispatcher excludes any roster entry whose `model` matches `orchestrator_model`.
460
- This keeps the same SkillSet useful regardless of which Opus version the user has
461
- toggled to via `/model` — review composition adapts automatically.
465
+ **Rationale**: The reviewer roster contains multiple Claude entries (Fable 5
466
+ team slot, Opus 4.6 CLI, Opus 4.8 CLI). To avoid the orchestrator reviewing its
467
+ own output (no independent signal), the dispatcher excludes or delegates the
468
+ roster entry whose `model` matches `orchestrator_model` (per
469
+ `orchestrator_strategy`). This keeps the same SkillSet useful
470
+ regardless of which Claude model the user has toggled to via `/model` — review
471
+ composition adapts automatically.
462
472
 
463
473
  **Why "argument-passing" not "file-introspection"**:
464
474
  - The orchestrator's model identity lives in *its own context* (system prompt
465
- declares e.g. "You are powered by Opus 4.7"). No external file or env var is
475
+ declares e.g. "You are powered by Fable 5"). No external file or env var is
466
476
  authoritative — `/model` switches change context immediately.
467
477
  - MCP protocol does not transmit caller-model info; only the orchestrator can
468
478
  truthfully report its own identity. This is genuine self-reference: the system
@@ -473,7 +483,8 @@ toggled to via `/model` — review composition adapts automatically.
473
483
 
474
484
  **How orchestrator obtains its model ID**:
475
485
  - Claude Code sessions: read the system prompt line "You are powered by the
476
- model named ... The exact model ID is `claude-opus-X-Y`". Use the exact ID.
486
+ model named ... The exact model ID is ...". Use the exact ID as stated,
487
+ whatever its form (e.g. `claude-fable-5`, `claude-opus-4-8`).
477
488
  - Other hosts: use whatever introspection the host provides; if none, pass
478
489
  `null` and accept that no exclusion happens.
479
490
 
@@ -482,27 +493,33 @@ toggled to via `/model` — review composition adapts automatically.
482
493
  multi_llm_review(
483
494
  artifact_path: "log/design.md",
484
495
  review_type: "design",
485
- orchestrator_model: "claude-opus-4-7" # MUST be set by caller
496
+ orchestrator_model: "claude-fable-5" # MUST be set by caller
486
497
  )
487
498
  ```
488
499
 
489
500
  **Dispatcher behavior** (config: `exclude_orchestrator_model: true`, default `true`):
490
501
  - If `orchestrator_model` matches a roster entry's `model`, that entry is skipped.
491
502
  - `min_quorum` and `convergence_rule` apply to the remaining reviewers.
492
- - 4-reviewer roster → 3 reviewer; recommended `convergence_rule: "2/3 APPROVE"`
493
- when one Opus is excluded.
503
+ - 6-reviewer roster → 5 reviewers; `convergence_rule_after_exclusion: "3/5 APPROVE"`
504
+ (from config) replaces the full-roster rule. This reduced count applies to the
505
+ "exclude" strategy only. The "subprocess" strategy keeps the full roster (the
506
+ matching entry runs as a fresh CLI process instead of being skipped). Under the
507
+ default "delegate" strategy, the matching entry is dropped at dispatch but
508
+ re-added at collect as the persona-team entry, so the voter count returns to 6
509
+ and the full-roster rule (4/6 APPROVE) applies — verified live 2026-06-10.
494
510
  - If `orchestrator_model` is `null` or unmatched, full roster runs (back-compat).
495
511
 
496
512
  **Manual-mode equivalent**: When orchestrating by hand, do not assign yourself
497
- as a reviewer. Pick the *other* Opus version for the Claude CLI subprocess
498
- reviewer (4.6 if you are 4.7, and vice versa).
513
+ as a subprocess reviewer. Run the Claude CLI subprocess reviewers (Opus 4.6 and
514
+ Opus 4.8); if your own model matches one of them, skip that entry and use the
515
+ after-exclusion convergence rule.
499
516
 
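The interaction between `orchestrator_model`, `orchestrator_strategy` and the convergence rule can be summarized as a small function. The sketch below assumes "N/M APPROVE" means at least N approvals among M voters with no (a)/(b) REJECT. The `Reviewer` struct, `convergence_for` and `proceed?` are hypothetical names. Of the reviewer ids, only `claude_team_fable5` and `claude_cli_opus4.8` appear in the changelog; the others are illustrative.

```ruby
# Hypothetical roster shape: only the reviewer id and its model are modelled.
Reviewer = Struct.new(:id, :model)

ROSTER = [
  Reviewer.new('claude_team_fable5', 'claude-fable-5'),
  Reviewer.new('claude_cli_opus4.6', 'claude-opus-4-6'),
  Reviewer.new('claude_cli_opus4.8', 'claude-opus-4-8'),
  Reviewer.new('codex_gpt5.5', 'gpt-5.5'),
  Reviewer.new('codex_gpt5.4', 'gpt-5.4'),
  Reviewer.new('cursor_composer2.5', 'composer-2.5')
].freeze

# Required approvals by voter count: "4/6 APPROVE" full, "3/5 APPROVE" after exclusion.
REQUIRED_APPROVALS = { 6 => 4, 5 => 3 }.freeze

# Returns [voter_count, required_approvals] for one review round.
def convergence_for(roster, orchestrator_model, strategy)
  matched = !orchestrator_model.nil? && roster.any? { |r| r.model == orchestrator_model }
  voters = roster.size
  # Only "exclude" shrinks the voter set. "subprocess" runs the matching entry as a
  # fresh CLI process; "delegate" drops it at dispatch and re-adds it at collect.
  voters -= 1 if matched && strategy == 'exclude'
  [voters, REQUIRED_APPROVALS.fetch(voters)]
end

# Proceed only with enough approvals and no (a)/(b) REJECT.
def proceed?(approvals:, blocking_rejects:, required:)
  blocking_rejects.zero? && approvals >= required
end

voters, required = convergence_for(ROSTER, 'claude-fable-5', 'exclude')
puts "#{voters} voters, need #{required}" # => 5 voters, need 3
```

A null or unmatched `orchestrator_model` leaves `matched` false, so the full roster and the 4/6 rule apply, as the back-compat rule above states.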
500
- ### Orchestrator Delegation Protocol (Two-Phase, opt-in)
517
+ ### Orchestrator Delegation Protocol (Two-Phase, default)
501
518
 
502
519
  The `delegate` strategy lets the orchestrator perform persona-based "Agent Team"
503
520
  review in its own context — preserving inherited project context that a fresh
504
- `claude -p` subprocess loses. Subprocess reviewers (codex, cursor, opposite-Opus)
505
- remain single-LLM.
521
+ `claude -p` subprocess loses. Subprocess reviewers (codex, cursor, Claude CLI
522
+ Opus 4.6/4.8) remain single-LLM.
506
523
 
507
524
  **Why**: The orchestrator already holds the artifact in context with full project
508
525
  awareness. Re-shipping it to a sandboxed subprocess discards that context. Same-
@@ -537,8 +554,10 @@ cross-model subprocess reviewers give epistemic diversity. The two are complemen
537
554
  required fields. Fix and retry collect with the same token.
538
555
  - All-subprocess-failed at Call 1: returns error immediately; no token issued.
539
556
 
540
- **Default**: `orchestrator_strategy` defaults to `"exclude"` (back-compat). Use
541
- `"delegate"` explicitly until validated by use.
557
+ **Default**: `orchestrator_strategy` defaults to `"delegate"` (config key
558
+ `default_orchestrator_strategy`). `"exclude"` remains available as the legacy
559
+ strategy. (Historical note: delegate was opt-in until validated by use; it has
560
+ been the config default since v3.x.)
542
561
 
543
562
  #### Async/Parallel Collect Timing — Iron Rule
544
563
 
@@ -603,8 +622,8 @@ readable until GC. Read them directly and synthesize manually, then re-run
603
622
  - **Cursor Agent trust**: `--trust` required for headless/non-interactive mode
604
623
  - **Codex workspace**: `-C /path/to/workspace` to set working directory
605
624
  - **Claude Agent paths**: Write within workspace (e.g., `log/`), not `/tmp`
606
- - **Claude CLI (Opus 4.7)**: `claude -p --model claude-opus-4-7 --bare` runs as external process. Uses stdin pipe (like Codex). `--bare` required for review tasks (skips hooks, CLAUDE.md, avoids bias from project instructions). Without `--bare`, CLAUDE.md's three-layer response structure may distort review output
607
- - **Claude CLI parallelism**: Agent tool (internal, Opus 4.6) + Bash `claude -p` (external, Opus 4.7) run truly in parallel as separate processes
625
+ - **Claude CLI (Opus 4.6 / 4.8)**: `claude -p --model claude-opus-4-6 --bare` (likewise `claude-opus-4-8`) runs as external process. Uses stdin pipe (like Codex). `--bare` required for review tasks (skips hooks, CLAUDE.md, avoids bias from project instructions). Without `--bare`, CLAUDE.md's three-layer response structure may distort review output
626
+ - **Claude CLI parallelism**: Agent tool (internal, orchestrator model = Fable 5) + Bash `claude -p` (external, Opus 4.6 / 4.8) run truly in parallel as separate processes
608
627
  - **Claude CLI file access**: `claude -p` with `--bare` has no MCP tools or file access. Ensure review prompt includes all artifact content inline (rule #6). Use `--add-dir` + `--allowedTools "Read,Glob,Grep"` if file access is needed (but note: this loads CLAUDE.md unless `--bare` is also used)
609
628
 
610
629
  ## Prompt Generation Rules
@@ -709,13 +728,15 @@ Step 1: Generate review prompt
709
728
  Step 2: Detect environment and models
710
729
  - Run: which codex && which agent && which claude
711
730
  - Detect default models
712
- - Report: "Auto mode: Codex (gpt-5.5), Agent (composer-2.5), Claude (opus-4.6), Claude CLI (opus-4.7)"
731
+ - Report: "Auto mode: Codex (gpt-5.5, gpt-5.4), Agent (composer-2.5), Claude Team (claude-fable-5), Claude CLI (opus-4.6, opus-4.8)"
713
732
 
714
- Step 3: Execute N reviews in parallel (default 4 reviewers)
715
- - Bash(background): cat prompt.md | codex exec -C workspace -o log/review_codex.md -
733
+ Step 3: Execute N reviews in parallel (default 6 reviewers)
734
+ - Bash(background): cat prompt.md | codex exec -m gpt-5.5 -C workspace -o log/review_codex_gpt5.5.md -
735
+ - Bash(background): cat prompt.md | codex exec -m gpt-5.4 -C workspace -o log/review_codex_gpt5.4.md -
716
736
  - Bash(background): agent -p --trust "Read prompt and review..." > log/review_cursor.md
717
- - Agent(background): Claude Team (Opus 4.6) → write to log/review_claude_opus4.6.md
718
- - Bash(background): cat prompt.md | claude -p --model claude-opus-4-7 --bare > log/review_claude_opus4.7.md 2>log/review_claude_opus4.7.stderr.log
737
+ - Agent(background): Claude Team (orchestrator model, Fable 5) → write to log/review_claude_team_fable5.md
738
+ - Bash(background): cat prompt.md | claude -p --model claude-opus-4-6 --bare > log/review_claude_opus4.6.md 2>log/review_claude_opus4.6.stderr.log
739
+ - Bash(background): cat prompt.md | claude -p --model claude-opus-4-8 --bare > log/review_claude_opus4.8.md 2>log/review_claude_opus4.8.stderr.log
719
740
 
720
741
  Step 4: Collect and validate
721
742
  - Wait for all to complete (background task notifications)
@@ -752,9 +773,11 @@ log/{artifact}_review{N}_{llm_id}_{date}.md # Individual reviews
752
773
  log/{artifact}_review{N}_consensus_{date}.md # Consensus analysis
753
774
  ```
754
775
 
755
- LLM identifiers: `claude_opus4.6`, `claude_team_opus4.6`,
756
- `claude_cli_opus4.7`, `codex_gpt5.5`, `codex_gpt5.4`, `cursor_composer2`, `cursor_gpt5.4`,
776
+ LLM identifiers: `claude_team_fable5`, `claude_cli_opus4.6`, `claude_cli_opus4.8`,
777
+ `codex_gpt5.5`, `codex_gpt5.4`, `cursor_composer2.5`, `cursor_gpt5.4`,
757
778
  `cursor_premium`
779
+ (legacy, pre-2026-06-10: `claude_opus4.6`, `claude_team_opus4.6`, `claude_team_opus4.7`,
780
+ `claude_cli_opus4.7`, `cursor_composer2`)
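The naming convention above can be sketched as two path helpers. The `{date}` format is not specified here, so `YYYYMMDD` is an assumption, as are the helper names. The legacy list mirrors the pre-2026-06-10 identifiers and is useful when reading older logs.

```ruby
require "date"

# Pre-2026-06-10 identifiers, still present in older log file names.
LEGACY_LLM_IDS = %w[
  claude_opus4.6 claude_team_opus4.6 claude_team_opus4.7
  claude_cli_opus4.7 cursor_composer2
].freeze

# Individual review: log/{artifact}_review{N}_{llm_id}_{date}.md
# Assumption: {date} is rendered as YYYYMMDD.
def review_log_path(artifact, round, llm_id, date)
  "log/#{artifact}_review#{round}_#{llm_id}_#{date.strftime('%Y%m%d')}.md"
end

# Consensus analysis: log/{artifact}_review{N}_consensus_{date}.md
def consensus_log_path(artifact, round, date)
  review_log_path(artifact, round, "consensus", date)
end

def legacy_llm_id?(llm_id)
  LEGACY_LLM_IDS.include?(llm_id)
end
```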
758
781
 
759
782
  ## Internal Agent Team Review
760
783
 
@@ -774,7 +797,7 @@ Compression ratio: parallel agent raw → Assembly ≈ 2:1
774
797
  - Don't advance to Phase N+1 before Phase N review converges
775
798
  - Don't re-review from scratch — each round checks only the delta
776
799
  - Don't use only internal agent team — different providers catch different bugs
777
- - Don't dismiss 1/4 (or 1/N) findings without evaluating substance
800
+ - Don't dismiss 1/N findings without evaluating substance
778
801
  - Don't use Persona Assembly in every intermediate round (save for final gate)
779
802
 
780
803
  ---
@@ -793,6 +816,8 @@ Compression ratio: parallel agent raw → Assembly ≈ 2:1
793
816
  - Codex convergence: REJECT → REJECT → REJECT → APPROVE (4 rounds)
794
817
  - Self-referential review: v3.0 of this skill reviewed by its own process → v3.1
795
818
  - Self-referential review: v3.2 (4-reviewer update, 2026-04-19) reviewed with new 4-reviewer default (Opus 4.6 + 4.7 + Codex + Composer-2). 4/4 APPROVE WITH CHANGES R1. Findings integrated → v3.3
819
+ - Roster update (v3.5, 2026-06-10): Fable 5 replaces Opus 4.7 as orchestrator/team slot; Opus 4.8 added as second subprocess CLI reviewer alongside Opus 4.6. 4.6 retained for its documented complementary bias (ambiguity-preserving, self-reference-friendly); 4.7 retired as its register is covered by 4.8 and Fable 5. 4.8/Fable 5 bias profiles uncalibrated — record (a)/(b)/(c) breakdowns per round until profiles accumulate in `multi_llm_reviewer_evaluation`
820
+ - Self-referential review of v3.5 (2 rounds, 2026-06-10/11, first run of the 6-reviewer roster): R1 REVISE (1 APPROVE / 4 REJECT — stale pre-v3.5 passages) → fixes → R2 3/6 APPROVE (4.6, 4.8, codex 5.4) with Cursor contributing a code-grounded correction (subprocess strategy keeps the full roster). 4.6/4.8 verdicts split along the predicted lenient/strict axis in R1 and converged to APPROVE in R2
796
821
 
797
822
  **Key insight**: Design reviews and implementation reviews find
798
- **categorically different bugs**. Both phases are necessary.
823
+ **categorically different bugs**. Both phases are necessary.
@@ -4,21 +4,24 @@
4
4
  # to avoid duplication.
5
5
 
6
6
  # Convergence rules
7
- # Roster has 5 reviewers (claude_team_opus4.7, claude_cli_opus4.6,
8
- # codex_gpt5.4, codex_gpt5.5, cursor_composer2.5). Rules are ratio-based
9
- # (parser interprets "N/M" as N/M fraction applied to successful count),
10
- # so the literal numerator/denominator is informational; what matters is
11
- # the ratio.
12
- convergence_rule: "3/5 APPROVE" # 60% of successful reviewers must APPROVE
7
+ # Roster has 6 reviewers (claude_team_fable5, claude_cli_opus4.6,
8
+ # claude_cli_opus4.8, codex_gpt5.4, codex_gpt5.5, cursor_composer2.5).
9
+ # Rules are ratio-based (parser interprets "N/M" as N/M fraction applied
10
+ # to successful count), so the literal numerator/denominator is
11
+ # informational; what matters is the ratio.
12
+ convergence_rule: "4/6 APPROVE" # ceil(6 * 0.6) = 4 of the 6-reviewer roster
13
13
  min_quorum: 2 # minimum successful reviews for any verdict
14
14
 
15
15
  # Self-referential orchestrator exclusion.
16
- # When the caller passes orchestrator_model (e.g. "claude-opus-4-7"), any
16
+ # When the caller passes orchestrator_model (e.g. "claude-fable-5"), any
17
17
  # roster entry with that exact model is dropped before dispatch — the
18
18
  # orchestrator should not review its own output.
19
- # When at least one entry is excluded, convergence_rule_after_exclusion
20
- # replaces convergence_rule for that dispatch (4 reviewers → 3 reviewers
21
- # makes "3/4 APPROVE" require unanimity, which is too strict).
19
+ # Under the "exclude" strategy, when at least one entry is excluded,
20
+ # convergence_rule_after_exclusion replaces convergence_rule for that
21
+ # dispatch (the literal rule for the full roster would otherwise be too
22
+ # strict for the reduced count). The "subprocess" strategy keeps the full
23
+ # roster; the default "delegate" strategy re-adds the slot at collect, so
24
+ # the full-roster rule governs both.
22
25
  exclude_orchestrator_model: true
23
26
 
24
27
  # Default orchestrator_strategy when the caller does not specify one.
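The ratio semantics in the convergence comments can be sketched as follows. The comments say that the parser turns "N/M" into a fraction applied to the successful count. Rounding up with `ceil` is an assumption here, chosen because it matches the `ceil(6 * 0.6) = 4` arithmetic in the comments. `evaluate`, `parse_rule` and `Verdict` are illustrative names only.

```ruby
# Outcome of one convergence check.
Verdict = Struct.new(:status, :required, :approvals, :successful)

# "4/6 APPROVE" -> Rational(2, 3); only the ratio matters.
def parse_rule(rule)
  m = rule.match(%r{\A(\d+)/(\d+) APPROVE\z}) or raise ArgumentError, "bad rule: #{rule}"
  Rational(m[1].to_i, m[2].to_i)
end

# verdicts: one entry per dispatched reviewer. nil marks a failed or
# timed-out reviewer, which does not count toward the successful total.
def evaluate(rule, verdicts, min_quorum: 2)
  successful = verdicts.compact
  approvals = successful.count(:approve)
  return Verdict.new(:no_quorum, nil, approvals, successful.size) if successful.size < min_quorum

  required = (parse_rule(rule) * successful.size).ceil # assumed rounding
  status = approvals >= required ? :converged : :not_converged
  Verdict.new(status, required, approvals, successful.size)
end
```

Suppose five reviews succeed. Under these assumptions, `"4/6 APPROVE"` then needs `ceil(5 * 2/3) = 4` approvals, while `"3/5 APPROVE"` needs 3. This is the "slightly stricter when some reviewers fail" effect that the exclusion comment accepts.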
@@ -29,9 +32,15 @@ exclude_orchestrator_model: true
29
32
  # "exclude": legacy behavior — drop the matching reviewer entirely.
30
33
  # "subprocess": spawn fresh claude -p for the matching reviewer.
31
34
  default_orchestrator_strategy: "delegate"
32
- # After excluding 1 orchestrator from 5 → 4 reviewers. Maintain the same
33
- # 60% ratio (ceil(4 * 0.6) = 3 → 3 of 4 must APPROVE).
34
- convergence_rule_after_exclusion: "3/4 APPROVE"
35
+ # After excluding 1 orchestrator from 6 → 5 reviewers. Same ceil(N * 0.6)
36
+ # majority basis (ceil(5 * 0.6) = 3 → 3 of 5 must APPROVE). Note the two
37
+ # rules are not the same literal ratio (4/6 ≈ 0.67 vs 3/5 = 0.60); since the
38
+ # parser applies the ratio to the successful count, the full-roster rule is
39
+ # slightly stricter when some reviewers fail. Accepted as-is.
40
+ # Applies to the "exclude" strategy only — "subprocess" keeps the full
41
+ # roster, and "delegate" re-adds the orchestrator slot at collect, so the
42
+ # full-roster rule (4/6) governs both (verified live 2026-06-10).
43
+ convergence_rule_after_exclusion: "3/5 APPROVE"
35
44
 
36
45
  # Two-phase delegation (orchestrator_strategy: "delegate").
37
46
  # Phase 1 dispatches subprocess reviewers synchronously, persists their
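The strategy rules in these comments reduce to a small decision table. The following is a sketch under two assumptions: roster entries are hashes keyed like the YAML, and the config values are the ones shown in this diff. `plan_dispatch` and its return shape (`dispatch`, `in_session`, `rule`) are hypothetical.

```ruby
# Config values as shown in this diff (assumed loaded into a Hash).
CONFIG = {
  "convergence_rule" => "4/6 APPROVE",
  "convergence_rule_after_exclusion" => "3/5 APPROVE",
  "default_orchestrator_strategy" => "delegate",
  "exclude_orchestrator_model" => true
}.freeze

STRATEGIES = %w[delegate exclude subprocess].freeze

# Hypothetical planner. It returns which reviewers are dispatched as
# subprocesses, which slot the orchestrator's own Agent Team review fills
# in-session, and which convergence rule applies.
def plan_dispatch(roster, orchestrator_model, strategy: nil, config: CONFIG)
  strategy ||= config["default_orchestrator_strategy"]
  raise ArgumentError, "unknown orchestrator_strategy: #{strategy}" unless STRATEGIES.include?(strategy)

  full_rule = config["convergence_rule"]
  matching, others = roster.partition { |r| r["model"] == orchestrator_model }
  unless config["exclude_orchestrator_model"] && matching.any?
    return { dispatch: roster, in_session: [], rule: full_rule }
  end

  case strategy
  when "exclude"
    # Slot dropped entirely; the reduced-roster rule replaces the full one.
    { dispatch: others, in_session: [], rule: config["convergence_rule_after_exclusion"] }
  when "subprocess"
    # A fresh claude -p fills the slot, so the full roster is dispatched.
    { dispatch: roster, in_session: [], rule: full_rule }
  when "delegate"
    # Subprocess reviewers run first; the slot is re-added at collect.
    { dispatch: others, in_session: matching, rule: full_rule }
  end
end
```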
@@ -73,7 +82,9 @@ delegation:
73
82
  wait_still_pending_streak_limit: 3 # consecutive still_pending returns before crashed/wait_exhausted
74
83
 
75
84
  # Dispatch settings
76
- timeout_seconds: 300 # global deadline for all reviewers
85
+ timeout_seconds: 600 # global deadline for all reviewers
86
+ # (raised 300 -> 600 on 2026-06-10: live 6-roster
87
+ # run took 381s wall-clock with max_concurrent: 2)
77
88
  max_concurrent: 2 # semaphore limit (2 for laptop, 4 for CI)
78
89
 
79
90
  # Reviewer roster
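`timeout_seconds` is a single deadline shared by all reviewers, and `max_concurrent` caps how many reviewers run at once. The sketch below combines the two with Ruby threads and uses a `SizedQueue` as a counting semaphore. `run_with_deadline` and the `:timed_out` / `:error` markers are illustrative, and the gem's dispatcher may work differently.

```ruby
# Hypothetical dispatcher. jobs maps role_label => callable returning a
# verdict. At most max_concurrent jobs run at once, all under one deadline.
def run_with_deadline(jobs, timeout_seconds:, max_concurrent:)
  slots = SizedQueue.new(max_concurrent) # counting semaphore
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_seconds
  results = {}
  lock = Mutex.new

  threads = jobs.map do |label, job|
    Thread.new do
      slots.push(true) # blocks while max_concurrent jobs are in flight
      begin
        verdict = begin
          job.call
        rescue StandardError
          :error # a crashed reviewer is unsuccessful, not fatal
        end
        lock.synchronize { results[label] = verdict }
      ensure
        slots.pop # free the slot for a waiting reviewer
      end
    end
  end

  # Every join draws on the same remaining budget, so the deadline is global.
  threads.each do |t|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    t.join([remaining, 0].max)
  end
  threads.each { |t| t.kill if t.alive? }
  lock.synchronize { jobs.keys.to_h { |label| [label, results.fetch(label, :timed_out)] } }
end
```

With `max_concurrent: 2`, six reviewers cannot all start at once. A slot frees as soon as any reviewer finishes, but the shared deadline still has to cover roughly three reviewer durations back to back.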
@@ -84,16 +95,31 @@ max_concurrent: 2 # semaphore limit (2 for laptop, 4 for CI)
84
95
  # tool receives a `complexity` argument (or auto-detects it), these defaults
85
96
  # are overridden per-dispatch by the effort_map below.
86
97
  reviewers:
98
+ # Orchestrator slot. With orchestrator_strategy "delegate" (the default),
99
+ # this entry is replaced by the orchestrator's own persona Agent Team
100
+ # review when orchestrator_model matches. Updated 2026-06-10: Fable 5
101
+ # replaces Opus 4.7 as the session default model.
87
102
  - provider: claude_code
88
- model: claude-opus-4-7
103
+ model: claude-fable-5
89
104
  effort: medium
90
- role_label: claude_team_opus4.7
105
+ role_label: claude_team_fable5
91
106
 
92
107
  - provider: claude_code
93
108
  model: claude-opus-4-6
94
109
  effort: medium
95
110
  role_label: claude_cli_opus4.6
96
111
 
112
+ # Opus 4.8 added 2026-06-10, replacing Opus 4.7 in the roster. Rationale:
113
+ # 4.7's register (strictness / systematizing) is covered by its direct
114
+ # successor 4.8 and by Fable 5; 4.6 stays for its documented complementary
115
+ # bias (ambiguity-preserving, self-reference-friendly). 4.8's reviewer
116
+ # bias profile is not yet calibrated — record (a)/(b)/(c) breakdowns per
117
+ # round (see multi_llm_reviewer_evaluation) until a profile accumulates.
118
+ - provider: claude_code
119
+ model: claude-opus-4-8
120
+ effort: medium
121
+ role_label: claude_cli_opus4.8
122
+
97
123
  - provider: codex
98
124
  model: gpt-5.4
99
125
  effort: medium
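The roster entries above share one shape (`provider`, `model`, `effort`, `role_label`). The sketch below shows sanity checks one could run after loading the list. The required-key set and the duplicate-label check are assumptions, because the gem's own validation is not part of this diff. `validate_roster` is a hypothetical name.

```ruby
require "yaml"

REQUIRED_KEYS = %w[provider model effort role_label].freeze

# Hypothetical check: every reviewer has the four keys, and role_labels
# (which also name the log files) are unique.
def validate_roster(yaml_text)
  roster = YAML.safe_load(yaml_text).fetch("reviewers")
  roster.each_with_index do |entry, i|
    missing = REQUIRED_KEYS - entry.keys
    raise ArgumentError, "reviewer ##{i} missing #{missing.join(', ')}" unless missing.empty?
  end
  dupes = roster.map { |e| e["role_label"] }.tally.select { |_, n| n > 1 }.keys
  raise ArgumentError, "duplicate role_label: #{dupes.join(', ')}" unless dupes.empty?
  roster
end
```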
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: kairos-chain
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.30.0
4
+ version: 3.31.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Masaomi Hatakeyama
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2026-06-08 00:00:00.000000000 Z
11
+ date: 2026-06-10 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: minitest
@@ -128,6 +128,7 @@ files:
128
128
  - lib/kairos_mcp/daemon/wal.rb
129
129
  - lib/kairos_mcp/daemon/wal_phase_recorder.rb
130
130
  - lib/kairos_mcp/daemon/wal_recovery.rb
131
+ - lib/kairos_mcp/drift_detection/correspondence_checker.rb
131
132
  - lib/kairos_mcp/dsl_ast/ast_engine.rb
132
133
  - lib/kairos_mcp/dsl_ast/decompiler.rb
133
134
  - lib/kairos_mcp/dsl_ast/drift_detector.rb