kairos-chain 3.25.0 → 3.25.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +27 -0
- data/bin/kairos-chain +39 -0
- data/lib/kairos_mcp/version.rb +1 -1
- data/templates/knowledge/multi_llm_review_workflow/multi_llm_review_workflow.md +27 -6
- data/templates/knowledge/multi_llm_reviewer_evaluation/multi_llm_reviewer_evaluation.md +11 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2ff49b5dba49e9c78161990be5bbf9bc94252542aeafe636b0c6cd424952ccff
+  data.tar.gz: 816f240e2cea9d045b4ac0c19c622792fe603dc49d2b454ec9409a8a6ab4f74a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: aa98790fe6f4b71d1995b6d6a6822692a246c5aff24e3d99c9087ed44e78c9dc92ee19497628335862b1b6c93d156da514a7059bb275116d6e2ee62474e091a2
+  data.tar.gz: d50285be992138c811fb27bf9bf375648bbab801b71f37c9150d980996fc418da9e8bbd0bbaed6b506d19ecb187df932395f0a70f4501772b027c2bea8695581
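The checksum entries above can be checked by recomputing the digests of the two archive members. The following is a minimal sketch, assuming the bytes of `metadata.gz` and `data.tar.gz` have already been read out of the `.gem` archive; the helper name `checksum_mismatches` and the input shape are illustrative, not part of RubyGems or kairos-chain, and the demo bytes are made up.

```ruby
require 'digest'
require 'yaml'

# Map the algorithm names used in checksums.yaml to Ruby digest classes.
DIGESTS = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }.freeze

# entries:   { "metadata.gz" => bytes, "data.tar.gz" => bytes } (hypothetical input shape)
# checksums: parsed checksums.yaml, e.g. { "SHA256" => { "metadata.gz" => "<hex>" } }
# Returns "ALGO/file" strings for every entry whose recomputed digest differs.
def checksum_mismatches(entries, checksums)
  checksums.flat_map do |algo, files|
    digest = DIGESTS.fetch(algo)
    files.reject { |name, expected| digest.hexdigest(entries.fetch(name)) == expected }
         .map { |name, _| "#{algo}/#{name}" }
  end
end

# Demo with made-up bytes (not the real gem contents); one SHA512 entry is
# deliberately wrong so that the check has something to report.
entries = { 'metadata.gz' => 'meta-bytes', 'data.tar.gz' => 'data-bytes' }
checksums = YAML.safe_load(<<~YAML)
  ---
  SHA256:
    metadata.gz: #{Digest::SHA256.hexdigest('meta-bytes')}
    data.tar.gz: #{Digest::SHA256.hexdigest('data-bytes')}
  SHA512:
    metadata.gz: #{Digest::SHA512.hexdigest('meta-bytes')}
    data.tar.gz: #{Digest::SHA512.hexdigest('tampered')}
YAML

puts checksum_mismatches(entries, checksums).inspect # prints ["SHA512/data.tar.gz"]
```

An empty result would mean every listed digest matches the bytes it was computed from.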
data/CHANGELOG.md
CHANGED
@@ -4,6 +4,33 @@ All notable changes to the `kairos-chain` gem will be documented in this file.
 
 This project follows [Semantic Versioning](https://semver.org/).
 
+## [3.25.2] - 2026-05-07
+
+### Changed (L1 knowledge: reviewer evaluation feedback loop)
+
+- `multi_llm_review_workflow` § L2 Save Points: now states explicitly that, at the end of
+  each review round, per-reviewer observations (verdict, (a)/(b)/(c) breakdown,
+  briefing-reaction shift, anomalies) are recorded in an L2 context under the
+  `reviewer_evaluation_observation_<reviewer>_<date>` prefix. This builds a channel
+  into the workflow for accumulating samples for future refinement of
+  `multi_llm_reviewer_evaluation`.
+- `multi_llm_reviewer_evaluation`: added a "Refinement Source" section at the end.
+  Naming the L2 contexts above as the refinement source closes the L2 → L1
+  promotion loop for the reviewer profiles themselves (Prop 5 constitutive
+  recording + Prop 6 incompleteness as driving force).
+
+No surface expansion: only bullets added to existing sections plus one new paragraph.
+No new mechanism, field, or tool.
+
+## [3.25.1] - 2026-05-07
+
+### Changed (L1 knowledge: multi-LLM review)
+
+- `multi_llm_reviewer_evaluation` v1.2 → v1.3: consolidated knowledge of reviewer tendencies that had been scattered across harness memory (Codex 3 structural biases, Cursor vs Codex briefing-reaction data, Codex GPT-5.5 profile). Added a new section, "Reviewer Value-System Divergence", and the (a)/(b)/(c) finding classification. Updated the Convergence Rule to apply after classification (only (a)+(b) findings block). Renamed Cost-Benefit to "Phase 1 baseline (5 reviewers)" to make its scope explicit.
+- `multi_llm_review_workflow`: added Step 0 (mandatory `knowledge_get multi_llm_reviewer_evaluation`) and Step 0.5 (Design Direction Block for design / docs reviews). Aligned § Convergence Rules and § Workflow Pattern step [4] to be (a)/(b)/(c)-aware. Added an invariant preface to the Step 0.5 block structure (for anti-enumeration consistency).
+
+The design history and validation were carried out in two rounds of self-review (Codex GPT-5.5 / Cursor Composer-2 / Claude CLI Opus 4.6 / Persona Team Opus 4.7). Result: 4/4 APPROVE / APPROVE WITH CHANGES, no REJECT. Starting from the value-system divergence observed in Phase 2 Case A (Context Graph review loop, 2026-05-04), the experimental briefing protocol that existed only in KairosChain_2026 (project CLAUDE.md) was promoted to L1 as an operational extension.
+
 ## [3.25.0] - 2026-05-07
 
 ### Added (Instruction mode projection)
data/bin/kairos-chain
CHANGED
@@ -348,6 +348,42 @@ when 'mode'
 
   mode_action = ARGV.shift || 'project'
 
+  if %w[-h --help help].include?(mode_action)
+    puts <<~HELP
+      Usage: kairos-chain mode <action> [--data-dir DIR]
+
+      Project the active instruction mode (Masa Mode, Tutorial Mode, ...)
+      to project-root CLAUDE.md via a managed @-import region. Required
+      for the mode body to reach Agent tool sub-agents (which do not
+      receive MCP `instructions`) and to bypass the harness truncation
+      cap on long mode bodies.
+
+      Actions:
+        project   Materialize the active mode body to .claude/kairos/
+                  instruction_mode.md and merge a marker region into
+                  project-root CLAUDE.md. Default action when no action
+                  is given. Idempotent — safe to re-run.
+        status    Print the current projection state (active mode name,
+                  version, artifact path/size, region presence, last
+                  projection time).
+        remove    Delete the projected artifact and remove the marker
+                  region from CLAUDE.md. Manifest is cleared.
+
+      Options:
+        --data-dir DIR   Override the .kairos/ data directory location.
+
+      Notes:
+        - The active mode is read from `instructions_mode` in
+          .kairos/skills/config.yml. Use `instructions_update` MCP tool
+          to change it; then re-run `mode project`.
+        - CLAUDE.md @-imports resolve at Claude Code session start;
+          you must restart Claude Code (`exit` then `claude`) for any
+          projection or removal to take effect.
+        - Body size policy: warn at >=150KB, refuse at >=256KB.
+    HELP
+    exit 0
+  end
+
   $LOAD_PATH.unshift File.expand_path('../lib', __dir__)
   require 'kairos_mcp'
 
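The help text describes `mode project` as idempotent and `mode remove` as deleting the marker region. One way such a managed region could behave is sketched below. The marker strings, helper names, and region layout are assumptions made for illustration; the diff does not show the markers or merge logic that kairos-chain actually uses.

```ruby
# Hypothetical marker strings; the real markers are not shown in this diff.
BEGIN_MARK = '<!-- kairos:instruction-mode:begin -->'
END_MARK   = '<!-- kairos:instruction-mode:end -->'
REGION_RE  = /#{Regexp.escape(BEGIN_MARK)}.*?#{Regexp.escape(END_MARK)}\n?/m

# Build the managed region: an @-import line wrapped in the two markers.
def region(import_path)
  "#{BEGIN_MARK}\n@#{import_path}\n#{END_MARK}\n"
end

# Replace the managed region if it is already present, otherwise append it.
# Running this twice with the same path yields the same text (idempotent).
def merge_region(claude_md, import_path)
  block = region(import_path)
  if claude_md.match?(REGION_RE)
    claude_md.sub(REGION_RE) { block }
  else
    base = claude_md.empty? || claude_md.end_with?("\n") ? claude_md : "#{claude_md}\n"
    base + block
  end
end

# Remove the managed region and leave the rest of the file untouched.
def remove_region(claude_md)
  claude_md.sub(REGION_RE, '')
end

original = "# Project notes\n"
once  = merge_region(original, '.claude/kairos/instruction_mode.md')
twice = merge_region(once, '.claude/kairos/instruction_mode.md')

puts once
puts twice == once                    # prints true
puts remove_region(twice) == original # prints true
```

Keeping all tool-owned text between two markers means hand-written content in CLAUDE.md is never rewritten, which is what makes re-running the projection safe.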
@@ -484,6 +520,9 @@ OptionParser.new do |opts|
     puts "  init [DIR]          Initialize data directory with default templates"
     puts "  upgrade [--apply]   Check/apply template migrations after gem update"
     puts "  skillset <cmd>      Manage SkillSet plugins (list/install/enable/disable/remove/info)"
+    puts "  mode <action>       Project active instruction mode to CLAUDE.md (project/status/remove)"
+    puts ""
+    puts "Run a subcommand with -h for details, e.g. 'kairos-chain mode -h'."
     exit
   end
 end.parse!
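The `mode -h` notes state a body size policy of "warn at >=150KB, refuse at >=256KB". A minimal sketch of that check follows. Treating 1 KB as 1024 bytes, measuring the body in bytes rather than characters, and the helper name `body_size_verdict` are all assumptions; the diff does not include the actual implementation.

```ruby
# Thresholds from the help text; 1 KB = 1024 bytes is an assumption.
WARN_BYTES   = 150 * 1024
REFUSE_BYTES = 256 * 1024

# Returns :ok, :warn, or :refuse for a mode body (hypothetical helper name).
# Both thresholds are inclusive, matching the ">=" wording in the help text.
def body_size_verdict(body)
  size = body.bytesize
  return :refuse if size >= REFUSE_BYTES
  return :warn if size >= WARN_BYTES

  :ok
end

puts body_size_verdict('x' * 1_000)        # prints ok
puts body_size_verdict('x' * WARN_BYTES)   # prints warn
puts body_size_verdict('x' * REFUSE_BYTES) # prints refuse
```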
|
data/lib/kairos_mcp/version.rb
CHANGED
data/templates/knowledge/multi_llm_review_workflow/multi_llm_review_workflow.md
CHANGED
@@ -234,8 +234,10 @@ The user always has the final say.
     ├── outputs: revised artifact + new review prompt
     └── L2 save: consensus + revised artifact
 
-[4]
-If
+[4] Classify findings as (a)/(b)/(c) per `multi_llm_reviewer_evaluation`
+    If no (a)/(b) blocking findings → proceed to next phase
+    If any (a)/(b) finding → repeat from [2] with revised artifact
+    (c) findings are recorded as advisory; non-blocking
 ```
 
 ## Review Types
@@ -263,10 +265,21 @@ likely to be missed by a single LLM reviewing its own design. For per-model prof
 
 ## Convergence Rules
 
-
-
-
-
+The rule applies **after** orchestrator classifies each finding as (a)/(b)/(c) per
+`multi_llm_reviewer_evaluation` § Reviewer Value-System Divergence. Only (a)+(b)
+findings count toward the thresholds below; (c) findings are recorded as advisory
+and never block.
+
+- **3/4 APPROVE** (no (a)/(b) REJECT) = proceed to next step
+- **Any (a) or (b) REJECT or FAIL** = revise and re-review
+- **(c)-only REJECT** = record as advisory, non-blocking
+- **4/4 APPROVE** (no (a)/(b)) = highest confidence, proceed
+- Legacy 3-reviewer mode: 2/3 APPROVE (no (a)/(b)) = proceed
+- Codex REJECT with (a)/(b) findings + others APPROVE = likely real issue, investigate before overriding
+- Codex REJECT with only (c) findings = expected per Codex value-system divergence; non-blocking
+
+For normative detail and the underlying classification, see
+`multi_llm_reviewer_evaluation` § Convergence Rule (Updated).
 
 ### Consensus Patterns
 
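The convergence rules in this hunk can be read as a small decision function. The sketch below is one interpretation, not the workflow's normative logic: the `Review` struct, the verdict symbols, and the `:needs_judgment` fallback for cases the rules do not spell out (for example two approvals plus two (c)-only rejects) are all assumptions introduced here.

```ruby
# One reviewer's result. findings holds the orchestrator's classification of
# each finding as :a, :b, or :c (hypothetical data shape).
Review = Struct.new(:reviewer, :verdict, :findings, keyword_init: true)

# A review blocks only if it is a REJECT/FAIL carrying at least one (a) or (b)
# finding; a (c)-only REJECT is advisory.
def blocking?(review)
  %i[reject fail].include?(review.verdict) &&
    review.findings.any? { |f| %i[a b].include?(f) }
end

# Returns :revise, :proceed, or :needs_judgment.
def convergence(reviews)
  return :revise if reviews.any? { |r| blocking?(r) }

  approvals = reviews.count { |r| r.verdict == :approve }
  # 2/3 in legacy 3-reviewer mode, otherwise the 3/4 rule; applying 3 to any
  # other panel size is an assumption.
  threshold = reviews.size == 3 ? 2 : 3
  approvals >= threshold ? :proceed : :needs_judgment
end

reviews = [
  Review.new(reviewer: 'claude',   verdict: :approve, findings: []),
  Review.new(reviewer: 'cursor',   verdict: :approve, findings: [:c]),
  Review.new(reviewer: 'composer', verdict: :approve, findings: []),
  Review.new(reviewer: 'codex',    verdict: :reject,  findings: [:c, :c])
]
puts convergence(reviews) # prints proceed: the Codex REJECT is (c)-only

reviews.last.findings << :a
puts convergence(reviews) # prints revise: the REJECT now carries an (a) finding
```

Classification happens before counting, so a strict reviewer's REJECT changes the outcome only when at least one of its findings is (a) or (b).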
@@ -331,6 +344,14 @@ Save to L2 context at these moments:
 - After design/implementation complete (before review)
 - After synthesis of reviews (revised version)
 - After final convergence (implementation-ready / merge-ready)
+- **After each review round**: capture per-reviewer observations — verdict,
+  (a)/(b)/(c) classification breakdown, briefing-reaction shift (did the
+  reviewer change verdict after Step 0.5 design direction?), anomalies
+  (off-pattern findings, format failures, refusal). Tag context name with
+  prefix `reviewer_evaluation_observation_<reviewer>_<date>` so future
+  refinement of `multi_llm_reviewer_evaluation` can sample these records
+  systematically. This closes the L2→L1 promotion loop for reviewer
+  profiles themselves.
 
 ---
 
data/templates/knowledge/multi_llm_reviewer_evaluation/multi_llm_reviewer_evaluation.md
CHANGED
@@ -271,7 +271,8 @@ Deployment: Composer-2 or Cursor GPT-5.4
 | Reviewer | Summary |
 |----------|---------|
 | Claude Opus 4.6 | Guardian of design. Finds security threats and novel architectural alternatives |
-| Codex GPT-5.4 | Strictest judge.
+| Codex GPT-5.4 | Strictest judge. Classify findings (a)/(b)/(c) before treating REJECT as blocking; APPROVE is a strong signal **when reachable**, not a mandatory gate (see Phase 2 Case A caveat) |
+| Codex GPT-5.5 | Stricter sibling of 5.4. Same value-system divergence (3 biases); apply the same classification discipline |
 | Cursor Premium | Implementation craftsman. Bug hunter for concurrency and resource management |
 | Composer-2 | Fastest pragmatist. First to determine if something is deployable |
 | Cursor GPT-5.4 | Binary sword. Clear approve-or-reject, strictest on test coverage |
@@ -292,3 +293,12 @@ Deployment: Composer-2 or Cursor GPT-5.4
 5. Some REJECTs reflect the reviewer's value system, not the artifact. The (a)/(b)/(c)
    classification (see § Reviewer Value-System Divergence) is required to separate
    blocking signal from advisory noise. Codex models in particular require this lens.
+
+## Refinement Source
+
+Profiles in this knowledge are refined from accumulated L2 contexts named with prefix
+`reviewer_evaluation_observation_<reviewer>_<date>`, recorded after each multi-LLM
+review round per `multi_llm_review_workflow` § L2 Save Points. When updating this
+file, sample those records to revise per-reviewer profiles, Strength Matrix entries,
+Cost-Benefit ratings, and the value-system divergence section. This closes the
+L2 → L1 promotion loop for reviewer profiles themselves.
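The naming convention `reviewer_evaluation_observation_<reviewer>_<date>` makes it possible to collect observation records by prefix when refining the profiles. The sketch below shows one way to build such names and group them per reviewer. The reviewer slug format, the `YYYYMMDD` date format, and both helper names are assumptions; the source specifies only the prefix and the two placeholders.

```ruby
require 'date'

PREFIX = 'reviewer_evaluation_observation_'

# Build an L2 context name. The slug and date formats are assumptions.
def observation_name(reviewer, date)
  slug = reviewer.downcase.gsub(/[^a-z0-9]+/, '-')
  "#{PREFIX}#{slug}_#{date.strftime('%Y%m%d')}"
end

# Group the context names that carry the prefix by reviewer slug, so that a
# later refinement pass can sample records per reviewer.
def observations_by_reviewer(context_names)
  context_names
    .select { |name| name.start_with?(PREFIX) }
    .group_by { |name| name.delete_prefix(PREFIX).rpartition('_').first }
end

names = [
  observation_name('Codex GPT-5.5', Date.new(2026, 5, 7)),
  observation_name('Codex GPT-5.5', Date.new(2026, 5, 8)),
  observation_name('Composer-2', Date.new(2026, 5, 7)),
  'design_draft_v2' # unrelated L2 context, ignored by the prefix filter
]

puts names.first # prints reviewer_evaluation_observation_codex-gpt-5-5_20260507
puts observations_by_reviewer(names).transform_values(&:size).inspect
```

Because the slug contains no underscore, the last underscore in a name always separates the reviewer from the date, which keeps the grouping step unambiguous.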