kairos-chain 3.30.0 → 3.31.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +35 -0
- data/lib/kairos_mcp/drift_detection/correspondence_checker.rb +136 -0
- data/lib/kairos_mcp/knowledge_provider.rb +5 -0
- data/lib/kairos_mcp/tools/knowledge_get.rb +50 -1
- data/lib/kairos_mcp/tools/knowledge_update.rb +7 -5
- data/lib/kairos_mcp/version.rb +1 -1
- data/templates/knowledge/llm_cross_evaluation/scripts/run_cross_eval.rb +8 -0
- data/templates/knowledge/multi_llm_review_workflow/multi_llm_review_workflow.md +77 -52
- data/templates/skillsets/multi_llm_review/config/multi_llm_review.yml +42 -16
- metadata +3 -2

checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 227d024f36839c595295ed0e3b4415c6764750597283b59a21ea1f5e16112210
+  data.tar.gz: 8fd8767580dbe3db617cbaf4ee0dcfafdb38f5823315f0321bae0c4893b2f45d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7486fef959f54577c0a13a301112f746f8c440100f11f5d424f27771759ad3024d6d30988b7a9e7c58a4544db1ffe6c2705b3271226d9445f1537dc591f96ef7
+  data.tar.gz: 8d670bb2a2f5d8d848072101527fcb5b3153c23ea4c5ae1359c1d46c30187011911b32512086e1e1734e72ba5d0d3ae20913499d5d1d8c4b4d1d359b79963281
```
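
The checksums above are SHA256 and SHA512 hex digests of the two archives packed inside the `.gem` file. As a minimal sketch (the helper `checksum_report` and the constant names are hypothetical, not part of RubyGems or kairos-chain), an extracted member can be compared against the listed values with Ruby's standard `digest` library:

```ruby
require 'digest'

# Hypothetical helper: hash a file's raw bytes and compare them against the
# hex digests published in checksums.yaml for that archive member.
def checksum_report(path, expected_sha256:, expected_sha512:)
  bytes = File.binread(path)
  {
    sha256: Digest::SHA256.hexdigest(bytes) == expected_sha256,
    sha512: Digest::SHA512.hexdigest(bytes) == expected_sha512
  }
end

# Values for data.tar.gz in kairos-chain 3.31.0, copied from the diff above.
DATA_TAR_GZ_SHA256 = '8fd8767580dbe3db617cbaf4ee0dcfafdb38f5823315f0321bae0c4893b2f45d'
DATA_TAR_GZ_SHA512 = '8d670bb2a2f5d8d848072101527fcb5b3153c23ea4c5ae1359c1d46c30187011911b32512086e1e1734e72ba5d0d3ae20913499d5d1d8c4b4d1d359b79963281'
```

A `.gem` file is a plain tar archive, so `tar -xf kairos-chain-3.31.0.gem data.tar.gz` extracts the member; `checksum_report('data.tar.gz', expected_sha256: DATA_TAR_GZ_SHA256, expected_sha512: DATA_TAR_GZ_SHA512)` should then report `true` for both keys if the download is intact.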

data/CHANGELOG.md
CHANGED

```diff
@@ -4,6 +4,41 @@ All notable changes to the `kairos-chain` gem will be documented in this file.
 
 This project follows [Semantic Versioning](https://semver.org/).
 
+## [3.31.0] - 2026-06-11
+
+### Changed — multi_llm_review roster: Fable 5 + Opus 4.6/4.8 (6 reviewers)
+
+Default reviewer roster updated for the Fable 5 / Opus 4.8 model generation:
+
+- Orchestrator/team slot: `claude-fable-5` (`claude_team_fable5`) replaces
+  Opus 4.7. Opus 4.8 added as a second subprocess CLI reviewer
+  (`claude_cli_opus4.8`) alongside Opus 4.6, which is retained for its
+  documented complementary bias (ambiguity-preserving, self-reference-friendly).
+  4.7 retired: its register is covered by 4.8 and Fable 5.
+- Convergence rules: `4/6 APPROVE` full roster, `3/5 APPROVE` after
+  orchestrator exclusion ("exclude" strategy only — "subprocess" keeps the
+  full roster; "delegate" re-adds the slot at collect, so 4/6 governs there).
+- `timeout_seconds` raised 300 → 600 (live 6-roster run measured 381s
+  wall-clock with `max_concurrent: 2`).
+- Validated by a 2-round self-referential review of the workflow L1 with the
+  new roster itself (R1 REVISE → fixes → R2 with 4.6/4.8/codex-5.4 APPROVE,
+  including a code-grounded Cursor correction of the exclusion semantics).
+
+### Changed — `multi_llm_review_workflow` L1 v3.5
+
+- All roster-dependent sections updated (pre-flight checklist, CLI tool
+  matrix, convergence rules, orchestrator self-identification, orchestration
+  template, LLM identifiers). Claude CLI 4.6/4.8 rows verified live 2026-06-10.
+- Effort escalation paragraph scoped to coding/design sub-agents and the
+  revision phase (reviewers stay at default per the 2026-04-29 policy).
+
+### Fixed — `knowledge_update` size guidance
+
+- Removed the "~2 KB MCP stdio limit" warning from the tool description and
+  nil-content diagnostic: two ~40 KB updates succeeded over stdio on
+  2026-06-10/11. The old figure generalized a single unreproduced
+  nil-content incident; the nil-content detection itself is retained.
+
 ## [3.30.0] - 2026-06-08
 
 ### Added — `dream_digest`: derived narrative view over L2/L1 fragments (dream SkillSet v0.3.0)
```

data/lib/kairos_mcp/drift_detection/correspondence_checker.rb
ADDED

```diff
@@ -0,0 +1,136 @@
+# frozen_string_literal: true
+
+require 'digest'
+require 'json'
+require_relative '../kairos_chain/chain'
+
+module KairosMcp
+  module DriftDetection
+    # CorrespondenceChecker — INV-A detection floor (Cycle 1, toward by-construction).
+    #
+    # Checks whether a live L0/L1 artifact still corresponds to its *current
+    # recorded provenance*: the content digest stored for that artifact at the
+    # head of the constitutive record (the hash chain). This is detection only —
+    # it surfaces divergence; it does not prevent edits or gate writes. Those are
+    # later cycles (single-source enforcement, record-as-gate).
+    #
+    # Provenance is rooted in the hash chain, not in the SQLite knowledge_meta
+    # cache: INV-A names the chain head as the non-editable anchor. The chain is
+    # therefore the single source consulted here; the meta table (when present)
+    # is a derived view and is intentionally not used for the comparison.
+    #
+    # The digest is computed over the *raw file content* (frontmatter included),
+    # matching exactly how it was recorded on create/update (a verbatim write,
+    # no normalization). Comparing the parsed/stripped body would never match.
+    class CorrespondenceChecker
+      # Result of a single correspondence check.
+      #
+      # status:
+      #   :match             live artifact corresponds to recorded provenance
+      #   :mismatch          live content diverged from recorded provenance (silent edit)
+      #   :missing_record    live artifact relied upon, but no recorded provenance exists
+      #   :missing_artifact  recorded/expected artifact is absent at the reliance point
+      #   :error             the check itself could not complete (not a correspondence claim)
+      Result = Struct.new(
+        :status, :name, :active_digest, :recorded_digest, :message,
+        keyword_init: true
+      ) do
+        def corresponds?
+          status == :match
+        end
+
+        # A surfaced non-correspondence per INV-A (divergence, not an internal error).
+        def divergence?
+          %i[mismatch missing_record missing_artifact].include?(status)
+        end
+      end
+
+      class << self
+        # Check an L1 knowledge artifact against its recorded provenance.
+        #
+        # @param name [String] knowledge id (knowledge_id on the chain record)
+        # @param md_file_path [String, nil] path to the live .md file relied upon
+        # @param storage_backend [Storage::Backend, nil] backend for chain access
+        # @return [Result]
+        def check_l1(name:, md_file_path:, storage_backend: nil)
+          unless md_file_path && File.file?(md_file_path)
+            # Relied upon but absent — a missing artifact is itself a
+            # non-correspondence under INV-A (the expected set is recorded).
+            return Result.new(
+              status: :missing_artifact, name: name,
+              active_digest: nil, recorded_digest: nil,
+              message: "L1 '#{name}': artifact missing at the point of reliance"
+            )
+          end
+
+          active = Digest::SHA256.hexdigest(File.read(md_file_path))
+          recorded = recorded_digest_for(name, storage_backend)
+
+          if recorded.nil?
+            return Result.new(
+              status: :missing_record, name: name,
+              active_digest: active, recorded_digest: nil,
+              message: "L1 '#{name}': live artifact has no recorded provenance on the chain"
+            )
+          end
+
+          if active == recorded
+            Result.new(
+              status: :match, name: name,
+              active_digest: active, recorded_digest: recorded, message: nil
+            )
+          else
+            Result.new(
+              status: :mismatch, name: name,
+              active_digest: active, recorded_digest: recorded,
+              message: "L1 '#{name}': live content diverged from recorded provenance " \
+                       "(active #{short(active)} ≠ recorded #{short(recorded)})"
+            )
+          end
+        rescue StandardError => e
+          Result.new(
+            status: :error, name: name,
+            active_digest: nil, recorded_digest: nil,
+            message: "L1 '#{name}': correspondence check could not complete: #{e.message}"
+          )
+        end
+
+        private
+
+        # The current recorded content digest for a knowledge_id: the next_hash of
+        # the most recent knowledge_update record, scanning the chain from head
+        # backward. Returns nil when the most recent relevant record removed the
+        # artifact (next_hash nil — delete/archive) or when none exists.
+        def recorded_digest_for(name, storage_backend)
+          chain = KairosChain::Chain.new(storage_backend: storage_backend)
+          chain.chain.reverse_each do |block|
+            Array(block.data).each do |entry|
+              record = parse_entry(entry)
+              next unless record.is_a?(Hash)
+              next unless record['type'] == 'knowledge_update'
+              next unless record['knowledge_id'] == name
+
+              # First match from the head is the current provenance (may be nil
+              # if the artifact was removed — caller treats nil as no record).
+              return record['next_hash']
+            end
+          end
+          nil
+        end
+
+        def parse_entry(entry)
+          return entry if entry.is_a?(Hash)
+          return JSON.parse(entry) if entry.is_a?(String)
+
+          nil
+        rescue JSON::ParserError
+          nil
+        end
+
+        def short(digest)
+          digest ? digest[0, 12] : '-'
+        end
+      end
+    end
+  end
+end
```
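
Stripped of the chain I/O, `check_l1` reduces to two steps: find the newest `knowledge_update` record for the id by scanning from the head backward, then compare its `next_hash` with the SHA256 of the raw file bytes. The sketch below replays that logic over an in-memory array. `Block`, `recorded_digest` and `classify` are hypothetical stand-ins (the real code reads blocks through `KairosChain::Chain`), and file content is passed as a string, with `nil` standing for an absent file.

```ruby
require 'digest'
require 'json'

# Hypothetical stand-in for a chain block; entries may be Hashes or JSON strings.
Block = Struct.new(:data)

# Newest-first scan: the first matching knowledge_update seen from the head is
# the current provenance. A nil next_hash (delete/archive) comes back as nil.
def recorded_digest(chain, name)
  chain.reverse_each do |block|
    Array(block.data).each do |entry|
      record = entry.is_a?(String) ? (JSON.parse(entry) rescue nil) : entry
      next unless record.is_a?(Hash)
      next unless record['type'] == 'knowledge_update' && record['knowledge_id'] == name

      return record['next_hash']
    end
  end
  nil
end

# Mirrors the status ladder of check_l1 (without the :error path).
def classify(chain, name, content)
  return :missing_artifact if content.nil?

  recorded = recorded_digest(chain, name)
  return :missing_record if recorded.nil?

  Digest::SHA256.hexdigest(content) == recorded ? :match : :mismatch
end

# Raw content includes the frontmatter, exactly as it would be hashed on disk.
v1 = "---\ntitle: demo\n---\nbody v1\n"
v2 = "---\ntitle: demo\n---\nbody v2\n"
chain = [
  Block.new([{ 'type' => 'knowledge_update', 'knowledge_id' => 'demo',
               'next_hash' => Digest::SHA256.hexdigest(v1) }]),
  Block.new([JSON.generate({ 'type' => 'knowledge_update', 'knowledge_id' => 'demo',
                             'next_hash' => Digest::SHA256.hexdigest(v2) })])
]

classify(chain, 'demo', v2)  # => :match (head records v2)
classify(chain, 'demo', v1)  # => :mismatch (stale content)
classify(chain, 'other', v2) # => :missing_record
classify(chain, 'demo', nil) # => :missing_artifact
```

Because the scan stops at the first hit from the head, a later record with a `nil` `next_hash` (archive or delete) turns a previously matching file into `:missing_record`, the same way `check_l1` treats a `nil` recorded digest.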

data/lib/kairos_mcp/knowledge_provider.rb
CHANGED

```diff
@@ -23,6 +23,11 @@ module KairosMcp
   # - Blockchain: Uses the configured storage backend
   #
   class KnowledgeProvider
+    # Main knowledge directory (constitutively-recorded L1). Exposed so callers
+    # can distinguish main-dir knowledge from read-only external SkillSet
+    # knowledge, e.g. to scope INV-A correspondence checks to recorded artifacts.
+    attr_reader :knowledge_dir
+
     ARCHIVED_DIR = '.archived'
     ARCHIVE_META_FILE = '.archive_meta.yml'
     # Backup directories created by upgrade flow (`.bak.<timestamp>`).
```
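
Exposing `knowledge_dir` lets `knowledge_get` decide whether a file is main-dir L1 by comparing expanded paths. The sketch below uses a hypothetical name, `under_dir?`, and mirrors the `main_dir_l1?` helper that `knowledge_get.rb` adds; it shows why the appended `File::SEPARATOR` matters: without it, a sibling directory such as `knowledge_extra` would pass a plain prefix test.

```ruby
# Hypothetical mirror of main_dir_l1?: true only when md_path resolves to a
# location inside knowledge_dir. Appending File::SEPARATOR stops sibling
# directories that merely share the textual prefix from matching.
def under_dir?(md_path, knowledge_dir)
  return false unless md_path && knowledge_dir

  root = File.expand_path(knowledge_dir) + File::SEPARATOR
  File.expand_path(md_path).start_with?(root)
end

under_dir?('/srv/kb/knowledge/demo/demo.md', '/srv/kb/knowledge')                  # => true
under_dir?('/srv/kb/knowledge_extra/demo/demo.md', '/srv/kb/knowledge')            # => false
under_dir?('/srv/kb/knowledge/../skillsets/x/knowledge/x.md', '/srv/kb/knowledge') # => false
```

`File.expand_path` collapses `..` segments but does not follow symlinks; a symlinked knowledge directory would need `File.realpath` on both sides (which also requires both paths to exist) for the same comparison to hold.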

data/lib/kairos_mcp/tools/knowledge_get.rb
CHANGED

```diff
@@ -2,6 +2,7 @@
 
 require_relative 'base_tool'
 require_relative '../knowledge_provider'
+require_relative '../drift_detection/correspondence_checker'
 
 module KairosMcp
   module Tools
```

```diff
@@ -77,11 +78,59 @@ module KairosMcp
         end
 
         output = build_output(skill, arguments, provider)
-
+        # INV-A detection floor: reading L1 knowledge "in order to act upon" is a
+        # point of reliance. Surface any divergence from the recorded provenance
+        # here — never silently. Scoped to main-dir L1 (external SkillSet
+        # knowledge has no chain provenance and would false-positive).
+        banner = correspondence_banner(skill, provider)
+        text_content(banner ? banner + output : output)
       end
 
       private
 
+      # Returns a surfacing prefix if the live artifact does not correspond to its
+      # recorded provenance, or nil when it corresponds (stay silent on match).
+      #
+      # Surfacing is graded by signal strength (the invariant requires only
+      # non-silence; grading is a Cycle-1 backlog policy):
+      #   :mismatch        a recorded artifact whose content silently changed —
+      #                    high signal, rare → an alarm banner.
+      #   :missing_record  a live artifact with no chain provenance — overwhelmingly
+      #                    template-provisioned knowledge whose provenance root is the
+      #                    gem/template, not a per-instance record. Chain-rooting the
+      #                    expected set is explicit Cycle-1 backlog, so this is a muted
+      #                    one-line note, not an alarm — bannering every bundled read
+      #                    would train the reader to ignore the banner.
+      def correspondence_banner(skill, provider)
+        return nil unless main_dir_l1?(skill, provider)
+
+        result = DriftDetection::CorrespondenceChecker.check_l1(
+          name: skill.name,
+          md_file_path: skill.md_file_path
+        )
+
+        case result.status
+        when :mismatch, :missing_artifact
+          "> ⚠️ **Drift detected (INV-A)** — #{result.message}.\n" \
+            "> This content was modified outside the recorded change path; treat it as unverified.\n\n"
+        when :missing_record
+          "> ℹ️ No recorded provenance for this entry (provisioning not yet chain-rooted — Cycle-1 backlog).\n\n"
+        end
+      rescue StandardError => e
+        # A failed check must not break the read; report it without claiming correspondence.
+        warn "[knowledge_get] correspondence check failed: #{e.message}"
+        nil
+      end
+
+      # True only for knowledge living under the provider's main knowledge dir —
+      # i.e. constitutively-recorded L1, not read-only external SkillSet knowledge.
+      def main_dir_l1?(skill, provider)
+        return false unless skill.md_file_path && provider.respond_to?(:knowledge_dir)
+
+        root = File.expand_path(provider.knowledge_dir) + File::SEPARATOR
+        File.expand_path(skill.md_file_path).start_with?(root)
+      end
+
       def build_output(skill, arguments, provider)
         output = "## [#{skill.name}] #{skill.description || 'No description'}\n\n"
         output += "**Layer:** L1 (Knowledge)\n"
```
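
The `case` in `correspondence_banner` is the whole grading policy. Pulled out as a pure function (the name `banner_for` is hypothetical), it is easy to check which statuses speak and which stay silent; the banner strings are copied from the diff above.

```ruby
# Hypothetical extraction of the grading in correspondence_banner: strong
# signals get an alarm, missing provenance gets a muted note, and any other
# status (including :match) falls through the case and returns nil.
def banner_for(status, message)
  case status
  when :mismatch, :missing_artifact
    "> ⚠️ **Drift detected (INV-A)** — #{message}.\n" \
      "> This content was modified outside the recorded change path; treat it as unverified.\n\n"
  when :missing_record
    "> ℹ️ No recorded provenance for this entry (provisioning not yet chain-rooted — Cycle-1 backlog).\n\n"
  end
end

output = "## [demo] Demo entry\n\n"
banner = banner_for(:mismatch, "L1 'demo': live content diverged from recorded provenance")
page = banner ? banner + output : output # same composition as the tool's call path
```

One consequence is visible when reading the diff: `check_l1` converts its own exceptions into a `Result` with status `:error`, and that status has no branch in the `case`, so a check that could not complete adds no prefix. The `rescue` and `warn` in `correspondence_banner` only fire when something outside `check_l1`'s own `rescue` raises.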

data/lib/kairos_mcp/tools/knowledge_update.rb
CHANGED

```diff
@@ -13,8 +13,9 @@ module KairosMcp
 
       def description
         'Create, update, or delete L1 knowledge skills. Changes are recorded with hash references to the blockchain. ' \
-          '
-          '
+          'Large content is supported (a ~40 KB update was verified over stdio on 2026-06-10; the earlier ~2 KB guidance ' \
+          'generalized a single unreproduced nil-content incident). If content ever arrives as nil, the error message ' \
+          'below explains recovery; no preemptive size limit applies.'
       end
 
       def category
```

```diff
@@ -128,9 +129,10 @@ module KairosMcp
       def content_missing_error(command, content)
         if content.nil?
           "Error: content is required for #{command}. " \
-            "The content argument arrived as nil —
-            "
-            "
+            "The content argument arrived as nil — the MCP transport or the calling LLM dropped it. " \
+            "(A ~2 KB stdio limit was once suspected, but a ~40 KB update succeeded on 2026-06-10, " \
+            "so size alone is unlikely to be the cause.) " \
+            "Retry the call; if nil persists, write the content to a file and report the incident."
         else
           "Error: content is required for #{command} (received empty string)"
         end
```

data/templates/knowledge/llm_cross_evaluation/scripts/run_cross_eval.rb
CHANGED

```diff
@@ -45,6 +45,14 @@ MODELS = {
     input_mode: :stdin,
     thinking_effort: "medium",
   },
+  "claude_fable5" => {
+    tool: :claude,
+    cmd: "claude --print --model claude-fable-5 --effort medium",
+    label: "Claude Fable 5",
+    provider: "anthropic",
+    input_mode: :stdin,
+    thinking_effort: "medium",
+  },
   "claude_opus46" => {
     tool: :claude,
     cmd: "claude --print --model claude-opus-4-6 --effort medium",
```
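
How `run_cross_eval.rb` executes these entries is not part of this diff. As a sketch under that caveat (`argv_for` and `run_entry` are hypothetical helpers), an entry with `input_mode: :stdin` can be run by splitting `cmd` into an argv array and piping the prompt through standard input with `Open3.capture3`:

```ruby
require 'open3'
require 'shellwords'

# The new roster entry, copied from the hunk above.
FABLE5_ENTRY = {
  tool: :claude,
  cmd: "claude --print --model claude-fable-5 --effort medium",
  label: "Claude Fable 5",
  provider: "anthropic",
  input_mode: :stdin,
  thinking_effort: "medium",
}.freeze

# Split cmd into an argv array so the command runs without an intermediate shell.
def argv_for(entry)
  Shellwords.split(entry.fetch(:cmd))
end

# Hypothetical runner: with input_mode :stdin the prompt is piped, not passed as an argument.
def run_entry(entry, prompt)
  raise ArgumentError, "unsupported input_mode: #{entry[:input_mode]}" unless entry[:input_mode] == :stdin

  stdout, stderr, status = Open3.capture3(*argv_for(entry), stdin_data: prompt)
  [stdout, stderr, status.success?]
end
```

Passing an argv array means the prompt text never needs shell quoting; `run_entry` still requires the `claude` CLI to be on `PATH`.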

data/templates/knowledge/multi_llm_review_workflow/multi_llm_review_workflow.md
CHANGED

````diff
@@ -1,7 +1,7 @@
 ---
 name: multi_llm_review_workflow
 description: "Multi-LLM review methodology and execution — workflow pattern, CLI tooling, consensus analysis, Persona Assembly. Applicable to design, implementation, documentation, or any artifact."
-version: "3.
+version: "3.5"
 tags:
 - workflow
 - review
````

````diff
@@ -192,11 +192,11 @@ starting** and verify each against `config/multi_llm_review.yml`:
 ```
 - [ ] Your model (orchestrator): ___
 - [ ] Agent Team Personas model: = orchestrator model (NOT a different model)
-- [ ] Subprocess CLI
+- [ ] Subprocess CLI models: Opus 4.6 AND Opus 4.8 (both, not either/or)
 - [ ] Codex models: gpt-5.5 (default) AND gpt-5.4 (both, not either/or)
 - [ ] Cursor model: default (composer-2.5, no --model flag)
-- [ ] Total reviewer count:
-- [ ] Convergence rule:
+- [ ] Total reviewer count: 6 (or 5 after orchestrator exclusion from subprocess)
+- [ ] Convergence rule: 4/6 APPROVE (full) or 3/5 APPROVE (after exclusion)
 ```
 
 ### Common mistakes (Path A)
````

````diff
@@ -206,7 +206,7 @@ starting** and verify each against `config/multi_llm_review.yml`:
 | Exclude orchestrator model from Agent Team Personas | Agent Team uses orchestrator model — they provide persona diversity, not epistemic diversity | LLM misreads "do not assign yourself as a reviewer" as applying to Agent Team; it applies only to subprocess CLI |
 | Run only Codex GPT-5.4, skip 5.5 | Run both — they catch different things (5.5 found §5 schema contradiction in Phase 2 Case A that no other reviewer caught) | Cost-saving heuristic; roster has both for a reason |
 | Use a smaller/cheaper model as Agent Team substitute | Use the orchestrator's own model with different personas | Confusing "model diversity" with "persona diversity" — Agent Team is the latter |
-| Run 3 reviewers instead of
+| Run 3 reviewers instead of 6 (or 5 after exclusion) | Use the full roster from config | Ad-hoc "3 is enough" reasoning; config specifies 6 for empirical reasons |
 
 ## Roles
 
````

````diff
@@ -304,10 +304,10 @@ The rule applies **after** orchestrator classifies each finding as (a)/(b)/(c) p
 findings count toward the thresholds below; (c) findings are recorded as advisory
 and never block.
 
-- **3/
+- **4/6 APPROVE** full roster, or **3/5 APPROVE** after orchestrator exclusion ("exclude" strategy only — the default "delegate" strategy keeps 6 voters via collect) (no (a)/(b) REJECT) = proceed to next step
 - **Any (a) or (b) REJECT or FAIL** = revise and re-review
 - **(c)-only REJECT** = record as advisory, non-blocking
-- **
+- **Unanimous APPROVE** (no (a)/(b)) = highest confidence, proceed
 - Legacy 3-reviewer mode: 2/3 APPROVE (no (a)/(b)) = proceed
 - Codex REJECT with (a)/(b) findings + others APPROVE = likely real issue, investigate before overriding
 - Codex REJECT with only (c) findings = expected per Codex value-system divergence; non-blocking
````
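
The thresholds only count after each finding is classified as (a), (b) or (c). The sketch below assumes a review shape of `{ verdict:, findings: }`, uses hypothetical helper names, and reads "Any (a) or (b) REJECT or FAIL" as a REJECT or FAIL that carries at least one (a)/(b) finding. The rules do not say what happens when nothing blocks but approvals fall short of the threshold, so the sketch reports that case separately instead of guessing.

```ruby
# Assumed review shape: { verdict: :approve | :reject | :fail, findings: [:a, :b, :c] }.
# A REJECT or FAIL blocks only when it carries an (a) or (b) finding.
def blocking?(review)
  %i[reject fail].include?(review[:verdict]) && (review[:findings] & %i[a b]).any?
end

# "4/6 APPROVE" -> [4, 6]
def parse_rule(rule)
  m = rule.match(%r{\A(\d+)/(\d+) APPROVE\z})
  raise ArgumentError, "unrecognized rule: #{rule}" unless m

  [m[1].to_i, m[2].to_i]
end

def decide(reviews, rule)
  needed, voters = parse_rule(rule)
  unless reviews.size == voters
    raise ArgumentError, "#{rule} expects #{voters} reviews, got #{reviews.size}"
  end
  return :revise if reviews.any? { |r| blocking?(r) }

  approvals = reviews.count { |r| r[:verdict] == :approve }
  approvals >= needed ? :proceed : :insufficient # shortfall without a blocker: not covered by the rules
end
```

Taking the rule as a string keeps one code path for `4/6`, `3/5` and the legacy `2/3`, and the size check catches a 6-voter rule being applied to a 5-voter run.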

````diff
@@ -319,11 +319,11 @@ For normative detail and the underlying classification, see
 
 | Agreement | Meaning | Action |
 |-----------|---------|--------|
-| **
-| **
-| **1/
+| **N/N** (unanimous) | Architectural-level gap | Must fix |
+| **Majority** (e.g. 4/6, 3/5) | Implementation-level issue | Should fix |
+| **1/N only** | Specialty-specific insight | Do NOT ignore — often the most novel finding |
 
-1/
+1/N findings are not "minority opinions to discard." They represent unique expertise.
 
 ### Majority Rule — Reference Only
 
````
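
The agreement table keys on how many of the N reviewers raised the same finding. A sketch of that mapping follows (hypothetical helper; "majority" is read as strictly more than half, which matches the 4/6 and 3/5 examples):

```ruby
# Hypothetical mapping from "raised by k of n reviewers" to the table's rows.
def agreement(k, n)
  raise ArgumentError, 'need 1 <= k <= n' unless n.positive? && k.between?(1, n)

  if k == n
    [:unanimous, 'Must fix']
  elsif 2 * k > n
    [:majority, 'Should fix']
  elsif k == 1
    [:single, 'Do NOT ignore']
  else
    [:minority, nil] # e.g. 2/6 or 3/6: no row in the table covers this
  end
end
```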

````diff
@@ -401,19 +401,20 @@ which agent 2>/dev/null && echo "agent: available" || echo "agent: NOT FOUND"
 which claude 2>/dev/null && echo "claude: available" || echo "claude: NOT FOUND"
 ```
 
-- All three available → Auto mode (
-- Codex + Agent only → Auto mode (3
+- All three available → Auto mode (6 reviewers, default)
+- Codex + Agent only → Auto mode (legacy, reduced roster — apply "Legacy 3-reviewer mode: 2/3 APPROVE" from Convergence Rules)
 - Any of codex/agent missing → Manual mode
 - User override: `mode: manual` or `mode: auto`
 
-### CLI Tool Matrix (
+### CLI Tool Matrix (tested 2026-03-28; Claude CLI 4.6/4.8 rows verified live 2026-06-10)
 
 | Tool | Command | Prompt Input | Output Collection | Model |
 |------|---------|-------------|-------------------|-------|
-| **Codex** | `codex exec
+| **Codex** | `codex exec -m <model>` | stdin pipe: `cat prompt.md \| codex exec -` | `-o /path/output.md` | GPT-5.5 + GPT-5.4 (both roster entries, `-m` per entry) |
 | **Cursor Agent** | `agent -p` | File reference (stdin NOT supported) | stdout redirect: `> output.md` | Composer-2.5 (default) |
-| **Claude Code** | Agent tool (internal) | Direct prompt string | Write to workspace file |
-| **Claude CLI (4.
+| **Claude Code** | Agent tool (internal) | Direct prompt string | Write to workspace file | Fable 5 (session) |
+| **Claude CLI (4.6)** | `claude -p --model claude-opus-4-6 --bare` | stdin pipe: `cat prompt.md \| claude -p --model claude-opus-4-6 --bare` | stdout redirect: `> output.md` | Opus 4.6 |
+| **Claude CLI (4.8)** | `claude -p --model claude-opus-4-8 --bare` | stdin pipe: `cat prompt.md \| claude -p --model claude-opus-4-8 --bare` | stdout redirect: `> output.md` | Opus 4.8 |
 
 ### Thinking Effort Configuration (validated 2026-04-20)
 
````
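
The mode rules depend only on which CLIs `which` finds. The sketch below uses hypothetical helpers and illustrative return symbols (not config values). It separates the PATH lookup from the decision, so the decision can be tested without the tools installed.

```ruby
# Hypothetical equivalent of `which NAME`: any executable regular file named
# `name` in a PATH directory counts as available.
def command_available?(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Applies the detection rules: codex and agent are required for any auto
# mode; claude on top of them selects the full 6-reviewer roster.
def select_mode(available, override: nil)
  return override if %i[manual auto].include?(override) # user override wins
  return :manual unless available.include?('codex') && available.include?('agent')

  available.include?('claude') ? :auto_full : :auto_legacy
end

available = %w[codex agent claude].select { |tool| command_available?(tool) }
mode = select_mode(available)
```

The rules do not say which roster a forced `mode: auto` should use when a CLI is missing, so the sketch simply returns the override.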

````diff
@@ -421,23 +422,30 @@ Based on cross-evaluation experiment (7 models × 4 tasks + Nomic, 518 CLI calls
 
 | Role | Model | Effort Flag | Rationale |
 |------|-------|-------------|-----------|
-| **Primary (orchestrator)** |
-| **Reviewer: Agent Team** |
-| **Reviewer: Claude CLI** | Opus 4.
+| **Primary (orchestrator)** | Fable 5 (session default) | (default) | Sufficient for integration, dialogue, judgment |
+| **Reviewer: Agent Team** | = orchestrator (Fable 5) | (default) | Personas inherit orchestrator model |
+| **Reviewer: Claude CLI** | Opus 4.6 / Opus 4.8 | (default; config `effort: medium`) | Evaluator quality is effort-independent (low≈high: 8.35 vs 8.16) — per 2026-04-29 policy reviewers stay at default |
 | **Coding sub-agent** | Opus 4.7 | `--effort medium` | Cost-effective default; use `high` for complex tasks |
 | **Design sub-agent** | Opus 4.7 | `--effort medium` | Cost-effective default; use `high` for complex tasks |
 | **Codex** | GPT-5.5 (default) | (no flag) | Fixed effort |
 | **Cursor Agent** | Composer-2.5 | (no flag) | Fixed effort |
 
+Note (2026-06-10): the effort experiment data is from the Opus 4.6/4.7
+generation. Fable 5 and Opus 4.8 effort sensitivity is not yet calibrated;
+defaults apply until re-measured.
+
 Key findings:
 - **Opus 4.6** high effort improves Evaluator/Strategy (+0.43/+0.200 Nomic), not Response
 - **Opus 4.7** high effort improves Response/Thinking (+0.81 code, +0.53 philosophy), not Evaluator
 - **Opus 4.7 low > Opus 4.6 high** in combined score — model generation > effort setting
 
-**Effort escalation
-
-
-
+**Effort escalation** (coding/design sub-agents and the post-aggregation revision
+phase only — NOT reviewers, who stay at default per the 2026-04-29 policy): For
+particularly complex tasks (Tier 3+ architecture, security-critical code,
+multi-component refactoring), the LLM accessing this skill SHOULD escalate to
+`--effort high` at its own judgment. No human approval is needed for effort
+escalation — it is a cost/quality tradeoff that the executing LLM is best
+positioned to evaluate in context.
 
 ### Model Detection
 
````
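
Together, the table and the escalation paragraph amount to a small policy. The orchestrator, reviewers and fixed-effort tools never change effort, while coding/design sub-agents and the revision phase may go to `high` for complex work. The sketch below uses hypothetical role symbols and a hypothetical `complex_task?` predicate built from the paragraph's three examples.

```ruby
# Hypothetical predicate for "particularly complex": Tier 3+ architecture,
# security-critical code, or multi-component refactoring.
def complex_task?(tier: 0, security_critical: false, multi_component: false)
  tier >= 3 || security_critical || multi_component
end

# Returns the effort flag to add, or nil for "add nothing; keep the default or configured value".
def effort_flag(role, complex: false)
  case role
  when :coding_subagent, :design_subagent
    complex ? '--effort high' : '--effort medium'
  when :revision
    '--effort high' if complex # the base effort for revision is not given in the table
  when :orchestrator, :agent_team, :claude_cli_reviewer, :codex, :cursor
    nil # stays at default regardless of task complexity
  else
    raise ArgumentError, "unknown role: #{role}"
  end
end
```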

````diff
@@ -454,15 +462,17 @@ agent --list-models 2>&1 | grep "(current\|default)"
 **Rule**: When invoking `multi_llm_review` (or running this workflow manually), the
 orchestrating LLM MUST pass its own model identifier as `orchestrator_model`.
 
-**Rationale**: The reviewer roster
-
-
-
-
+**Rationale**: The reviewer roster contains multiple Claude entries (Fable 5
+team slot, Opus 4.6 CLI, Opus 4.8 CLI). To avoid the orchestrator reviewing its
+own output (no independent signal), the dispatcher excludes or delegates the
+roster entry whose `model` matches `orchestrator_model` (per
+`orchestrator_strategy`). This keeps the same SkillSet useful
+regardless of which Claude model the user has toggled to via `/model` — review
+composition adapts automatically.
 
 **Why "argument-passing" not "file-introspection"**:
 - The orchestrator's model identity lives in *its own context* (system prompt
-  declares e.g. "You are powered by
+  declares e.g. "You are powered by Fable 5"). No external file or env var is
   authoritative — `/model` switches change context immediately.
 - MCP protocol does not transmit caller-model info; only the orchestrator can
   truthfully report its own identity. This is genuine self-reference: the system
````

````diff
@@ -473,7 +483,8 @@ toggled to via `/model` — review composition adapts automatically.
 
 **How orchestrator obtains its model ID**:
 - Claude Code sessions: read the system prompt line "You are powered by the
-  model named ... The exact model ID is
+  model named ... The exact model ID is ...". Use the exact ID as stated,
+  whatever its form (e.g. `claude-fable-5`, `claude-opus-4-8`).
 - Other hosts: use whatever introspection the host provides; if none, pass
   `null` and accept that no exclusion happens.
 
````

````diff
@@ -482,27 +493,33 @@ toggled to via `/model` — review composition adapts automatically.
 multi_llm_review(
   artifact_path: "log/design.md",
   review_type: "design",
-  orchestrator_model: "claude-
+  orchestrator_model: "claude-fable-5" # MUST be set by caller
 )
 ```
 
 **Dispatcher behavior** (config: `exclude_orchestrator_model: true`, default `true`):
 - If `orchestrator_model` matches a roster entry's `model`, that entry is skipped.
 - `min_quorum` and `convergence_rule` apply to the remaining reviewers.
--
-
+- 6-reviewer roster → 5 reviewers; `convergence_rule_after_exclusion: "3/5 APPROVE"`
+  (from config) replaces the full-roster rule. This reduced count applies to the
+  "exclude" strategy only. The "subprocess" strategy keeps the full roster (the
+  matching entry runs as a fresh CLI process instead of being skipped). Under the
+  default "delegate" strategy, the matching entry is dropped at dispatch but
+  re-added at collect as the persona-team entry, so the voter count returns to 6
+  and the full-roster rule (4/6 APPROVE) applies — verified live 2026-06-10.
 - If `orchestrator_model` is `null` or unmatched, full roster runs (back-compat).
 
 **Manual-mode equivalent**: When orchestrating by hand, do not assign yourself
-as a reviewer.
-
+as a subprocess reviewer. Run the Claude CLI subprocess reviewers (Opus 4.6 and
+Opus 4.8); if your own model matches one of them, skip that entry and use the
+after-exclusion convergence rule.
 
-### Orchestrator Delegation Protocol (Two-Phase,
+### Orchestrator Delegation Protocol (Two-Phase, default)
 
 The `delegate` strategy lets the orchestrator perform persona-based "Agent Team"
 review in its own context — preserving inherited project context that a fresh
-`claude -p` subprocess loses. Subprocess reviewers (codex, cursor,
-remain single-LLM.
+`claude -p` subprocess loses. Subprocess reviewers (codex, cursor, Claude CLI
+Opus 4.6/4.8) remain single-LLM.
 
 **Why**: The orchestrator already holds the artifact in context with full project
 awareness. Re-shipping it to a sandboxed subprocess discards that context. Same-
````
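
The three strategies differ only in what happens to the roster entry whose `model` equals `orchestrator_model`. The sketch below builds a hypothetical six-entry roster. Its identifiers come from the LLM identifiers listed in the workflow; the `model` strings for the Codex and Cursor entries are assumptions, since `multi_llm_review.yml` is not shown here. For each run it computes what is dispatched, how many voters reach collect, and which rule applies.

```ruby
Entry = Struct.new(:id, :model, keyword_init: true)

# Hypothetical roster; model strings for the non-Claude entries are assumptions.
ROSTER = [
  Entry.new(id: 'claude_team_fable5', model: 'claude-fable-5'),
  Entry.new(id: 'claude_cli_opus4.6', model: 'claude-opus-4-6'),
  Entry.new(id: 'claude_cli_opus4.8', model: 'claude-opus-4-8'),
  Entry.new(id: 'codex_gpt5.5', model: 'gpt-5.5'),
  Entry.new(id: 'codex_gpt5.4', model: 'gpt-5.4'),
  Entry.new(id: 'cursor_composer2.5', model: 'composer-2.5')
].freeze

FULL_RULE = '4/6 APPROVE'            # convergence_rule
AFTER_EXCLUSION_RULE = '3/5 APPROVE' # convergence_rule_after_exclusion

# Returns { dispatched:, voters:, rule: } for one review run.
def plan(roster, orchestrator_model, strategy)
  match = roster.find { |e| e.model == orchestrator_model }
  full = { dispatched: roster.map(&:id), voters: roster.size, rule: FULL_RULE }
  return full if match.nil? # null or unmatched: full roster (back-compat)

  rest = roster.reject { |e| e.equal?(match) }.map(&:id)
  case strategy
  when 'exclude'    then { dispatched: rest, voters: rest.size, rule: AFTER_EXCLUSION_RULE }
  when 'subprocess' then full # matching entry runs as a fresh CLI process
  when 'delegate'   then { dispatched: rest, voters: rest.size + 1, rule: FULL_RULE } # re-added at collect
  else raise ArgumentError, "unknown strategy: #{strategy}"
  end
end
```

The delegate case is the subtle one: fewer processes are launched than votes are counted, which is why 4/6 rather than 3/5 governs there.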

````diff
@@ -537,8 +554,10 @@ cross-model subprocess reviewers give epistemic diversity. The two are complemen
 required fields. Fix and retry collect with the same token.
 - All-subprocess-failed at Call 1: returns error immediately; no token issued.
 
-**Default**: `orchestrator_strategy` defaults to `"
-`"
+**Default**: `orchestrator_strategy` defaults to `"delegate"` (config key
+`default_orchestrator_strategy`). `"exclude"` remains available as the legacy
+strategy. (Historical note: delegate was opt-in until validated by use; it has
+been the config default since v3.x.)
 
 #### Async/Parallel Collect Timing — Iron Rule
 
````

````diff
@@ -603,8 +622,8 @@ readable until GC. Read them directly and synthesize manually, then re-run
 - **Cursor Agent trust**: `--trust` required for headless/non-interactive mode
 - **Codex workspace**: `-C /path/to/workspace` to set working directory
 - **Claude Agent paths**: Write within workspace (e.g., `log/`), not `/tmp`
-- **Claude CLI (Opus 4.
-- **Claude CLI parallelism**: Agent tool (internal,
+- **Claude CLI (Opus 4.6 / 4.8)**: `claude -p --model claude-opus-4-6 --bare` (likewise `claude-opus-4-8`) runs as external process. Uses stdin pipe (like Codex). `--bare` required for review tasks (skips hooks, CLAUDE.md, avoids bias from project instructions). Without `--bare`, CLAUDE.md's three-layer response structure may distort review output
+- **Claude CLI parallelism**: Agent tool (internal, orchestrator model = Fable 5) + Bash `claude -p` (external, Opus 4.6 / 4.8) run truly in parallel as separate processes
 - **Claude CLI file access**: `claude -p` with `--bare` has no MCP tools or file access. Ensure review prompt includes all artifact content inline (rule #6). Use `--add-dir` + `--allowedTools "Read,Glob,Grep"` if file access is needed (but note: this loads CLAUDE.md unless `--bare` is also used)
 
 ## Prompt Generation Rules
````

````diff
@@ -709,13 +728,15 @@ Step 1: Generate review prompt
 Step 2: Detect environment and models
 - Run: which codex && which agent && which claude
 - Detect default models
-- Report: "Auto mode: Codex (gpt-5.5), Agent (composer-2.5), Claude (
+- Report: "Auto mode: Codex (gpt-5.5, gpt-5.4), Agent (composer-2.5), Claude Team (claude-fable-5), Claude CLI (opus-4.6, opus-4.8)"
 
-Step 3: Execute N reviews in parallel (default
-- Bash(background): cat prompt.md | codex exec -C workspace -o log/
+Step 3: Execute N reviews in parallel (default 6 reviewers)
+- Bash(background): cat prompt.md | codex exec -m gpt-5.5 -C workspace -o log/review_codex_gpt5.5.md -
+- Bash(background): cat prompt.md | codex exec -m gpt-5.4 -C workspace -o log/review_codex_gpt5.4.md -
 - Bash(background): agent -p --trust "Read prompt and review..." > log/review_cursor.md
-- Agent(background): Claude Team (
-- Bash(background): cat prompt.md | claude -p --model claude-opus-4-
+- Agent(background): Claude Team (orchestrator model, Fable 5) → write to log/review_claude_team_fable5.md
+- Bash(background): cat prompt.md | claude -p --model claude-opus-4-6 --bare > log/review_claude_opus4.6.md 2>log/review_claude_opus4.6.stderr.log
+- Bash(background): cat prompt.md | claude -p --model claude-opus-4-8 --bare > log/review_claude_opus4.8.md 2>log/review_claude_opus4.8.stderr.log
 
 Step 4: Collect and validate
 - Wait for all to complete (background task notifications)
````
@@ -752,9 +773,11 @@ log/{artifact}_review{N}_{llm_id}_{date}.md # Individual reviews
 log/{artifact}_review{N}_consensus_{date}.md # Consensus analysis
 ```
 
-LLM identifiers: `
-`
+LLM identifiers: `claude_team_fable5`, `claude_cli_opus4.6`, `claude_cli_opus4.8`,
+`codex_gpt5.5`, `codex_gpt5.4`, `cursor_composer2.5`, `cursor_gpt5.4`,
 `cursor_premium`
+(legacy, pre-2026-06-10: `claude_opus4.6`, `claude_team_opus4.6`, `claude_team_opus4.7`,
+`claude_cli_opus4.7`, `cursor_composer2`)
 
 ## Internal Agent Team Review
 
@@ -774,7 +797,7 @@ Compression ratio: parallel agent raw → Assembly ≈ 2:1
 - Don't advance to Phase N+1 before Phase N review converges
 - Don't re-review from scratch — each round checks only the delta
 - Don't use only internal agent team — different providers catch different bugs
-- Don't dismiss 1/
+- Don't dismiss 1/N findings without evaluating substance
 - Don't use Persona Assembly in every intermediate round (save for final gate)
 
 ---
@@ -793,6 +816,8 @@ Compression ratio: parallel agent raw → Assembly ≈ 2:1
 - Codex convergence: REJECT → REJECT → REJECT → APPROVE (4 rounds)
 - Self-referential review: v3.0 of this skill reviewed by its own process → v3.1
 - Self-referential review: v3.2 (4-reviewer update, 2026-04-19) reviewed with new 4-reviewer default (Opus 4.6 + 4.7 + Codex + Composer-2). 4/4 APPROVE WITH CHANGES R1. Findings integrated → v3.3
+- Roster update (v3.5, 2026-06-10): Fable 5 replaces Opus 4.7 as orchestrator/team slot; Opus 4.8 added as second subprocess CLI reviewer alongside Opus 4.6. 4.6 retained for its documented complementary bias (ambiguity-preserving, self-reference-friendly); 4.7 retired as its register is covered by 4.8 and Fable 5. 4.8/Fable 5 bias profiles uncalibrated — record (a)/(b)/(c) breakdowns per round until profiles accumulate in `multi_llm_reviewer_evaluation`
+- Self-referential review of v3.5 (2 rounds, 2026-06-10/11, first run of the 6-reviewer roster): R1 REVISE (1 APPROVE / 4 REJECT — stale pre-v3.5 passages) → fixes → R2 3/6 APPROVE (4.6, 4.8, codex 5.4) with Cursor contributing a code-grounded correction (subprocess strategy keeps the full roster). 4.6/4.8 verdicts split along the predicted lenient/strict axis in R1 and converged to APPROVE in R2
 
 **Key insight**: Design reviews and implementation reviews find
-**categorically different bugs**. Both phases are necessary.
+**categorically different bugs**. Both phases are necessary.
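For readers scripting around the workflow, the Ruby sketch below restates the log naming scheme and the Step 3 invocations from `multi_llm_review_workflow.md`. It is an illustration under stated assumptions, not the skill's implementation: `review_log_path`, `consensus_log_path`, `claude_cli_argv`, `codex_argv`, and `run_parallel` are hypothetical names, `{date}` is passed through verbatim because the workflow does not fix its format, and only CLI flags that appear in the diff (`-p`, `--model`, `--bare`, `exec -m -C -o`) are used.

```ruby
require "open3"

# Hypothetical helpers: none of these names come from kairos-chain. They only
# restate the log naming scheme and the CLI invocations listed in the workflow.

CURRENT_LLM_IDS = %w[
  claude_team_fable5 claude_cli_opus4.6 claude_cli_opus4.8
  codex_gpt5.5 codex_gpt5.4 cursor_composer2.5 cursor_gpt5.4 cursor_premium
].freeze

# Identifiers used before 2026-06-10; accepted so older logs still resolve.
LEGACY_LLM_IDS = %w[
  claude_opus4.6 claude_team_opus4.6 claude_team_opus4.7
  claude_cli_opus4.7 cursor_composer2
].freeze

# log/{artifact}_review{N}_{llm_id}_{date}.md (date passed through verbatim).
def review_log_path(artifact, round, llm_id, date)
  known = CURRENT_LLM_IDS.include?(llm_id) || LEGACY_LLM_IDS.include?(llm_id)
  raise ArgumentError, "unknown LLM identifier: #{llm_id}" unless known

  "log/#{artifact}_review#{round}_#{llm_id}_#{date}.md"
end

# log/{artifact}_review{N}_consensus_{date}.md
def consensus_log_path(artifact, round, date)
  "log/#{artifact}_review#{round}_consensus_#{date}.md"
end

# External Claude CLI reviewer. --bare skips hooks and CLAUDE.md; the prompt
# is fed on stdin, mirroring `cat prompt.md | claude -p ...`.
def claude_cli_argv(model)
  ["claude", "-p", "--model", model, "--bare"]
end

# Codex reviewer as invoked in Step 3: -o names the output file and the
# trailing "-" takes the piped prompt.
def codex_argv(model, workspace, out_path)
  ["codex", "exec", "-m", model, "-C", workspace, "-o", out_path, "-"]
end

# Runs every argv concurrently with the same prompt on stdin and returns
# [stdout, stderr, Process::Status] per job, in input order. A missing
# executable surfaces as Errno::ENOENT when its thread's value is read.
def run_parallel(argvs, prompt)
  threads = argvs.map do |argv|
    Thread.new { Open3.capture3(*argv, stdin_data: prompt) }
  end
  threads.map(&:value)
end
```

Threads are used here only because each reviewer is an independent external process; the workflow itself launches the same commands as background Bash tasks, which is equivalent for this purpose.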
data/templates/skillsets/multi_llm_review/config/multi_llm_review.yml
CHANGED
@@ -4,21 +4,24 @@
 # to avoid duplication.
 
 # Convergence rules
-# Roster has
-# codex_gpt5.4, codex_gpt5.5, cursor_composer2.5).
-# (parser interprets "N/M" as N/M fraction applied
-# so the literal numerator/denominator is
-# the ratio.
-convergence_rule: "
+# Roster has 6 reviewers (claude_team_fable5, claude_cli_opus4.6,
+# claude_cli_opus4.8, codex_gpt5.4, codex_gpt5.5, cursor_composer2.5).
+# Rules are ratio-based (parser interprets "N/M" as N/M fraction applied
+# to successful count), so the literal numerator/denominator is
+# informational; what matters is the ratio.
+convergence_rule: "4/6 APPROVE" # ceil(6 * 0.6) = 4 of the 6-reviewer roster
 min_quorum: 2 # minimum successful reviews for any verdict
 
 # Self-referential orchestrator exclusion.
-# When the caller passes orchestrator_model (e.g. "claude-
+# When the caller passes orchestrator_model (e.g. "claude-fable-5"), any
 # roster entry with that exact model is dropped before dispatch — the
 # orchestrator should not review its own output.
-#
-# replaces convergence_rule for that
-#
+# Under the "exclude" strategy, when at least one entry is excluded,
+# convergence_rule_after_exclusion replaces convergence_rule for that
+# dispatch (the literal rule for the full roster would otherwise be too
+# strict for the reduced count). The "subprocess" strategy keeps the full
+# roster; the default "delegate" strategy re-adds the slot at collect, so
+# the full-roster rule governs there.
 exclude_orchestrator_model: true
 
 # Default orchestrator_strategy when the caller does not specify one.
@@ -29,9 +32,15 @@ exclude_orchestrator_model: true
 # "exclude": legacy behavior — drop the matching reviewer entirely.
 # "subprocess": spawn fresh claude -p for the matching reviewer.
 default_orchestrator_strategy: "delegate"
-# After excluding 1 orchestrator from
-#
-
+# After excluding 1 orchestrator from 6 → 5 reviewers. Same ceil(N * 0.6)
+# majority basis (ceil(5 * 0.6) = 3 → 3 of 5 must APPROVE). Note the two
+# rules are not the same literal ratio (4/6 ≈ 0.67 vs 3/5 = 0.60); since the
+# parser applies the ratio to the successful count, the full-roster rule is
+# slightly stricter when some reviewers fail. Accepted as-is.
+# Applies to the "exclude" strategy only — "subprocess" keeps the full
+# roster, and "delegate" re-adds the orchestrator slot at collect, so the
+# full-roster rule (4/6) governs both (verified live 2026-06-10).
+convergence_rule_after_exclusion: "3/5 APPROVE"
 
 # Two-phase delegation (orchestrator_strategy: "delegate").
 # Phase 1 dispatches subprocess reviewers synchronously, persists their
@@ -73,7 +82,9 @@ delegation:
 wait_still_pending_streak_limit: 3 # consecutive still_pending returns before crashed/wait_exhausted
 
 # Dispatch settings
-timeout_seconds:
+timeout_seconds: 600 # global deadline for all reviewers
+# (raised 300 -> 600 on 2026-06-10: live 6-roster
+# run took 381s wall-clock with max_concurrent: 2)
 max_concurrent: 2 # semaphore limit (2 for laptop, 4 for CI)
 
 # Reviewer roster
@@ -84,16 +95,31 @@ max_concurrent: 2 # semaphore limit (2 for laptop, 4 for CI)
 # tool receives a `complexity` argument (or auto-detects it), these defaults
 # are overridden per-dispatch by the effort_map below.
 reviewers:
+# Orchestrator slot. With orchestrator_strategy "delegate" (the default),
+# this entry is replaced by the orchestrator's own persona Agent Team
+# review when orchestrator_model matches. Updated 2026-06-10: Fable 5
+# replaces Opus 4.7 as the session default model.
 - provider: claude_code
-model: claude-
+model: claude-fable-5
 effort: medium
-role_label:
+role_label: claude_team_fable5
 
 - provider: claude_code
 model: claude-opus-4-6
 effort: medium
 role_label: claude_cli_opus4.6
 
+# Opus 4.8 added 2026-06-10, replacing Opus 4.7 in the roster. Rationale:
+# 4.7's register (strictness / systematizing) is covered by its direct
+# successor 4.8 and by Fable 5; 4.6 stays for its documented complementary
+# bias (ambiguity-preserving, self-reference-friendly). 4.8's reviewer
+# bias profile is not yet calibrated — record (a)/(b)/(c) breakdowns per
+# round (see multi_llm_reviewer_evaluation) until a profile accumulates.
+- provider: claude_code
+model: claude-opus-4-8
+effort: medium
+role_label: claude_cli_opus4.8
+
 - provider: codex
 model: gpt-5.4
 effort: medium
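The convergence and orchestrator-strategy comments in `multi_llm_review.yml` amount to a small piece of arithmetic. The Ruby sketch below restates it under two assumptions: the parser rounds the ratio up (matching the `ceil(N * 0.6)` notes), and under "delegate" the orchestrator slot counts toward the full roster once it is re-added at collect. `approvals_required`, `plan_dispatch`, `effective_rule`, and `verdict` are hypothetical names, not the gem's API; the config values are copied from the 3.31.0 file above.

```ruby
require "yaml"

# Hypothetical sketch: method names are illustrative, not kairos-chain's API.
CONFIG = YAML.safe_load(<<~YML)
  convergence_rule: "4/6 APPROVE"
  min_quorum: 2
  exclude_orchestrator_model: true
  default_orchestrator_strategy: "delegate"
  convergence_rule_after_exclusion: "3/5 APPROVE"
YML

STRATEGIES = %w[delegate exclude subprocess].freeze

# "N/M APPROVE" is applied as a ratio to the successful count. Rounding up
# is an assumption, chosen to match the ceil(N * 0.6) notes in the config.
def approvals_required(rule, successful)
  m = rule.match(%r{\A(\d+)/(\d+) APPROVE\z})
  raise ArgumentError, "unrecognised rule: #{rule}" unless m

  (Rational(m[1].to_i, m[2].to_i) * successful).ceil
end

# Splits the roster according to orchestrator_strategy.
def plan_dispatch(reviewers, orchestrator_model, strategy = CONFIG["default_orchestrator_strategy"])
  raise ArgumentError, "unknown orchestrator_strategy: #{strategy}" unless STRATEGIES.include?(strategy)

  matching, others = reviewers.partition { |r| r["model"] == orchestrator_model }
  if matching.empty? || strategy == "subprocess" || !CONFIG["exclude_orchestrator_model"]
    # Every entry runs as its own subprocess; the full roster is kept.
    return { subprocess: reviewers, delegated: [], excluded: 0 }
  end

  if strategy == "exclude"
    { subprocess: others, delegated: [], excluded: matching.size }
  else
    # "delegate": the slot is filled by the orchestrator's Agent Team at collect.
    { subprocess: others, delegated: matching, excluded: 0 }
  end
end

# The after-exclusion rule applies only when "exclude" actually dropped an entry.
def effective_rule(plan)
  plan[:excluded].positive? ? CONFIG["convergence_rule_after_exclusion"] : CONFIG["convergence_rule"]
end

def verdict(plan, successful:, approvals:)
  return :no_quorum if successful < CONFIG["min_quorum"]

  needed = approvals_required(effective_rule(plan), successful)
  approvals >= needed ? :converged : :not_converged
end
```

With five successful reviews, the full-roster rule asks for 4 approvals (ceil(5 × 4/6)) while the after-exclusion rule asks for 3, which is the "slightly stricter" gap the config comment accepts.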
metadata
CHANGED
|
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: kairos-chain
 version: !ruby/object:Gem::Version
-version: 3.
+version: 3.31.0
 platform: ruby
 authors:
 - Masaomi Hatakeyama
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2026-06-
+date: 2026-06-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: minitest
@@ -128,6 +128,7 @@ files:
 - lib/kairos_mcp/daemon/wal.rb
 - lib/kairos_mcp/daemon/wal_phase_recorder.rb
 - lib/kairos_mcp/daemon/wal_recovery.rb
+- lib/kairos_mcp/drift_detection/correspondence_checker.rb
 - lib/kairos_mcp/dsl_ast/ast_engine.rb
 - lib/kairos_mcp/dsl_ast/decompiler.rb
 - lib/kairos_mcp/dsl_ast/drift_detector.rb