kairos-chain 3.23.1 → 3.24.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 35124b7f595066816a5e59823ca5f871bcb2c009f12db8127f7570ee0e923206
4
- data.tar.gz: 482cef573f07539663a119db81cbdcb5b600362a551b3ee45459d9727c78c755
3
+ metadata.gz: 20e2a223137f51dc61025e57dd6fd205a8f702ef923b5b0b2e0d464a308d279f
4
+ data.tar.gz: 51627cb487cf5fc2e46b8e6055bf36e0cf5c8839f15cb2c2236f2ff56efa2def
5
5
  SHA512:
6
- metadata.gz: a07f31d1e33713c4993aaf396838ffaac1ac5c9979ba7ff24f0d590a39fa4341f64cb1899d061507cc13b7fdfff3a85ee6177dc21b54578a7ef2af9b8de5fe81
7
- data.tar.gz: aebf6ebc84bf36682c74ebf7b000c5168051cab9685842c75dcd2a9ebc4b733713b4f1389e7e2d8fe00f61eefd3dc201885e9582a620c77e937a53999b349e3b
6
+ metadata.gz: 9fd4b17a28bdc06b7b19195e7274ef41e4ba7a93fc70e2c3a38083c58eae85ea8dfd432cb7cc6219cf7ba49ba637dbed12c48ea12312f4a3f26f46dfb936438f
7
+ data.tar.gz: fadb35fdbf47eeebfc9b667685452a0c222ecb396dbb2031de08ac27b2df1de1a3985fc41c03e4e767d22a58ec98a77d52c2059e921f8084d1663a2a099bdd1d
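The SHA256 and SHA512 entries in `checksums.yaml` are digests of the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. Below is a minimal sketch of checking an extracted archive against one of these values with Ruby's standard `digest` library. `checksum_matches?` is a hypothetical helper name, and the path and expected hex string are placeholders you would take from the `+` lines above.

```ruby
require 'digest'

# Hypothetical helper: compare a file's digest with an expected hex string.
# :sha256 and :sha512 correspond to the two sections of checksums.yaml.
def checksum_matches?(path, expected_hex, algorithm: :sha256)
  digest_class =
    case algorithm
    when :sha256 then Digest::SHA256
    when :sha512 then Digest::SHA512
    else raise ArgumentError, "unsupported algorithm: #{algorithm}"
    end
  # .file feeds the file to the digest in chunks instead of loading it whole.
  digest_class.file(path).hexdigest == expected_hex.strip.downcase
end
```

For example, call `checksum_matches?('data.tar.gz', expected)` with `expected` set to the full 3.24.0 `data.tar.gz` SHA256 value shown above.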
data/CHANGELOG.md CHANGED
@@ -4,6 +4,77 @@ All notable changes to the `kairos-chain` gem will be documented in this file.
4
4
 
5
5
  This project follows [Semantic Versioning](https://semver.org/).
6
6
 
7
+ ## [3.24.0] - 2026-04-27
8
+
9
+ ### Added
10
+
11
+ - **multi_llm_review_wait MCP tool** (Phase 1.5) — optional blocking gate
12
+ between `multi_llm_review` (Phase 1) and `multi_llm_review_collect`
13
+ (Phase 2). Wraps the existing `WaitForWorker.wait` polling loop and
14
+ exposes 6 distinct status codes (`ready`, `still_pending`, `crashed`,
15
+ `unknown_token`, `already_collected`, `past_collect_deadline`) each with
16
+ a `next_action` recovery hint pointing at the right next tool.
17
+ - **`next_action` hint on `multi_llm_review` delegation_pending response** —
18
+ structured `{tool, args, purpose}` field nudging the orchestrator to call
19
+ `multi_llm_review_wait` after persona Agent dispatch. MCP does not enforce
20
+ ordering; this is a hint, not a constraint, but in practice LLMs follow
21
+ it reliably.
22
+ - **Path A vs Path B disambiguation in workflow knowledge doc** — surfaces
23
+ the long-implicit distinction between the host-tracked Bash workflow
24
+ (Claude Code's `Bash(background)` pattern, statusbar shows `XX shells`)
25
+ and the MCP-managed SkillSet (detached worker, no host-side tracking,
26
+ polling required).
27
+ - New config keys under `delegation.parallel`:
28
+ `wait_poll_interval_seconds: 1.0`, `wait_max_default_seconds: 600`,
29
+ `wait_max_hard_cap_seconds: 1800`, `wait_still_pending_streak_limit: 3`.
30
+ - Streak guard: 3 consecutive `still_pending` returns escalate to
31
+ `crashed/wait_exhausted` so a wedged worker cannot trap the orchestrator
32
+ in an infinite wait loop.
33
+ - 14 new tests in `test_multi_llm_review_wait.rb` covering all status
34
+ paths, streak persistence/reset, hard cap clamping, deadline-remaining
35
+ clamping, and backward compatibility (collect still works without wait).
36
+
37
+ ### Changed
38
+
39
+ - `multi_llm_review` SkillSet version 0.4.0 → 0.5.0.
40
+ - `delegation` instruction text now mentions wait → collect chain.
41
+
42
+ ### Notes
43
+
44
+ - Backward compatible: callers that skip wait and call collect directly
45
+ still work via collect's existing internal polling.
46
+ - Design review (Codex GPT-5.5 + Cursor Composer-2 + Claude Team Opus 4.7)
47
+ produced 3/3 REVISE with 6-7 P1 issues; revisions R1-R14 captured in
48
+ handoff L2 `multi_llm_review_wait_tool_handoff` before implementation.
49
+
50
+ ## [3.23.3] - 2026-04-27
51
+
52
+ ### Documentation
53
+
54
+ - **multi_llm_review_workflow knowledge** — Added "Async/Parallel Collect
55
+ Timing — Iron Rule" subsection. Documents the workflow constraint that the
56
+ orchestrator must call `multi_llm_review_collect` immediately after persona
57
+ Agent reviews complete, without intervening user dialogue. Explains the
58
+ underlying mechanics (LLM is not event-driven; collect already polls
59
+ internally at 0.5s intervals; token expiry vs subprocess completion). Adds
60
+ recommended flow, anti-pattern, and manual recovery instructions.
61
+ - Updated stale `must_collect_by` default reference (600s → 1800s).
62
+
63
+ ## [3.23.2] - 2026-04-26
64
+
65
+ ### Fixed
66
+
67
+ - **multi_llm_review collect_deadline bug** — `timeout_seconds_override` no longer
68
+ leaves the orchestrator's submission window shorter than the worker lifespan.
69
+ In the async/parallel path, `collect_deadline` is now auto-extended to cover
70
+ `worker self_timeout + poll margin` so raising `timeout_seconds_override`
71
+ alone keeps the token alive while the worker is healthy.
72
+ - New `collect_deadline_seconds_override` argument on `multi_llm_review` for
73
+ explicit control of the orchestrator's submission window.
74
+ - Default `delegation.collect_deadline_seconds` raised from `600` (10 min) to
75
+ `1800` (30 min) to better fit interactive runs where user dialogue intervenes
76
+ between Phase 1 and `multi_llm_review_collect`.
77
+
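The streak guard described under 3.24.0 can be modeled as a small state transition. The sketch below is a hypothetical reconstruction from the CHANGELOG entry and the behavior asserted in `test_multi_llm_review_wait.rb`: the counter persists across calls, a `ready` result resets it, and a call that starts at the limit escalates without polling again. It is not the gem's implementation, `apply_streak_guard` is an illustrative name, and the block stands in for one bounded polling pass.

```ruby
# Mirrors the wait_still_pending_streak_limit default (assumed constant name).
STREAK_LIMIT = 3

# state: per-token Hash that outlives a single wait call.
# The block performs one bounded polling pass and returns :ready or :still_pending.
def apply_streak_guard(state, limit: STREAK_LIMIT)
  streak = state.fetch('wait_still_pending_streak', 0).to_i
  # Already at the limit: escalate immediately instead of blocking again.
  return { 'status' => 'crashed', 'crashed_reason' => 'wait_exhausted' } if streak >= limit

  if yield == :ready
    state['wait_still_pending_streak'] = 0 # completion resets the streak
    { 'status' => 'ready' }
  else
    state['wait_still_pending_streak'] = streak + 1
    { 'status' => 'still_pending', 'still_pending_streak' => streak + 1 }
  end
end
```

Keeping the counter in per-token state rather than in the tool instance is what lets the guard work across separate MCP calls, which is why the tests read it back with `PendingState.load_state`.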
7
78
  ## [3.17.0] - 2026-04-22
8
79
 
9
80
  ### Added
@@ -1,4 +1,4 @@
1
1
  module KairosMcp
2
- VERSION = "3.23.1"
2
+ VERSION = "3.24.0"
3
3
  CHANGELOG_URL = "https://github.com/masaomi/KairosChain_2026/blob/main/CHANGELOG.md"
4
4
  end
@@ -29,6 +29,54 @@ This skill covers:
29
29
  For **WHO** (which LLM is good at what), see: `multi_llm_reviewer_evaluation`
30
30
  For **development lifecycle** (design → implement → verify), see: `design_to_implementation_workflow`
31
31
 
32
+ ## Two Execution Paths (read this first)
33
+
34
+ There are **two distinct execution paths** that share the name "multi-LLM review".
35
+ They differ in subprocess lifecycle ownership and completion-detection mechanics.
36
+ Pick the right one for your environment:
37
+
38
+ ### Path A — Host-tracked (Bash workflow)
39
+
40
+ - **Trigger**: orchestrator (LLM) calls Claude Code's `Bash` tool with
41
+ `run_in_background: true` to spawn `claude -p`, `codex exec`, and `agent -p` directly.
42
+ - **Process parent**: Claude Code (the host harness).
43
+ - **Completion detection**: **event-driven**. Claude Code's shell tracker monitors
44
+ the spawned shells; when they exit, the LLM is notified through the standard
45
+ tool-result mechanism. Statusbar shows `XX shells` while reviewers are running.
46
+ - **When to use**: interactive Claude Code sessions for one-off Tier 3 reviews.
47
+ - **Reference**: see "Orchestration Template" section below for the canonical
48
+ `Bash(background)` pattern.
49
+
50
+ ### Path B — MCP-managed (multi_llm_review SkillSet)
51
+
52
+ - **Trigger**: orchestrator calls the MCP tool `multi_llm_review`.
53
+ - **Process parent**: the kairos-chain Ruby gem (MCP server). The gem forks a
54
+ detached worker (`bin/dispatch_worker.rb`), which calls `Process.setsid` to become
55
+ the leader of a new session and then spawns the CLI reviewers.
56
+ - **Completion detection**: **polling required**. Claude Code is not the parent,
57
+ so the spawned subprocesses do NOT appear in the `XX shells` statusbar count.
58
+ The orchestrator must call `multi_llm_review_collect` (and optionally
59
+ `multi_llm_review_wait` first) to observe completion.
60
+ - **When to use**: portable execution (other MCP hosts, autonomous Agent SkillSet),
61
+ or any case where you want the consensus computation done server-side.
62
+ - **Recommended chain (3-step)**: `multi_llm_review` → `multi_llm_review_wait` →
63
+ `multi_llm_review_collect`. Each Phase-1/1.5 response carries a `next_action`
64
+ hint pointing at the next tool. `wait` is optional but recommended: without it,
65
+ collect's internal polling still detects worker completion, though the recovery hints
66
+ for `still_pending`, `crashed`, and `past_collect_deadline` are less explicit.
67
+ - **Reference**: see "Orchestrator Delegation Protocol" + "Async/Parallel Collect
68
+ Timing — Iron Rule" sections below.
69
+
70
+ ### Quick selector
71
+
72
+ | Question | Answer |
73
+ |----------|--------|
74
+ | Are you in an interactive Claude Code session and just need one review? | **Path A** |
75
+ | Do you need this to work in Cursor / autonomous mode / other MCP host? | **Path B** |
76
+ | Do you want the consensus result inside the MCP tool response? | **Path B** |
77
+ | Did you observe `XX shells` in the statusbar last time it worked? | That was Path A |
78
+ | Did the run produce a `collect_token` and a `pending/<token>/` directory? | That was Path B |
79
+
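To make the Path B mechanics concrete, here is a minimal, POSIX-only sketch of the shape described above: a forked worker detaches into its own session, and the parent learns about completion only by polling a state file. The helper names, the single-key `state.json` layout, and the `sleep` standing in for reviewer work are assumptions for illustration; the real worker is `bin/dispatch_worker.rb`.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical: fork a worker that leaves the parent's session, does its work,
# and records completion in a state file. Requires fork(2), so not Windows.
def spawn_detached_worker(state_path)
  pid = Process.fork do
    Process.setsid # worker becomes leader of a new session and process group
    sleep 0.2      # stand-in for running the CLI reviewers
    tmp = "#{state_path}.tmp"
    File.write(tmp, JSON.generate('subprocess_status' => 'done'))
    File.rename(tmp, state_path) # atomic replace: readers never see a partial file
    exit!(0) # skip at_exit hooks inherited from the parent
  end
  Process.detach(pid) # reap the child in a background thread
  pid
end

# Nothing notifies the parent, so it polls until "done" or the timeout elapses.
def poll_until_done(state_path, timeout:, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    if File.exist?(state_path)
      status = JSON.parse(File.read(state_path))['subprocess_status']
      return status if status == 'done'
    end
    return 'still_pending' if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end
```

Usage: inside `Dir.mktmpdir`, call `spawn_detached_worker(path)` and then `poll_until_done(path, timeout: 5)`. `exit!` in the child keeps inherited `at_exit` hooks (a test runner, for instance) from running twice, and the write-then-rename keeps the poller from parsing a half-written file.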
32
80
  ## Roles
33
81
 
34
82
  | Role | Who | Responsibility |
@@ -331,8 +379,8 @@ cross-model subprocess reviewers give epistemic diversity. The two are complemen
331
379
 
332
380
  **Failure modes**:
333
381
  - `expired_or_unknown_token`: orchestrator missed `must_collect_by` deadline
334
- (default 600s), or token never existed. The pending review is gone; call
335
- `multi_llm_review` again from scratch.
382
+ (default 1800s since v3.23.2; was 600s), or token never existed. The pending
383
+ review is gone; call `multi_llm_review` again from scratch.
336
384
  - `error: invalid orchestrator_reviews`: persona count outside 2-4 or missing
337
385
  required fields. Fix and retry collect with the same token.
338
386
  - All-subprocess-failed at Call 1: returns error immediately; no token issued.
@@ -340,6 +388,62 @@ cross-model subprocess reviewers give epistemic diversity. The two are complemen
340
388
  **Default**: `orchestrator_strategy` defaults to `"exclude"` (back-compat). Use
341
389
  `"delegate"` explicitly until validated by use.
342
390
 
391
+ #### Async/Parallel Collect Timing — Iron Rule
392
+
393
+ When `delegation.parallel.default: true` (the v3.x default), Call 1 returns
394
+ `delegation_pending` **immediately** (~50ms) and a detached worker runs the
395
+ subprocess reviewers in parallel with the orchestrator's persona Agent
396
+ reviews. This is faster, but introduces a timing trap:
397
+
398
+ > **The orchestrator MUST call `multi_llm_review_collect` immediately after
399
+ > the persona Agent reviews complete — without intervening user dialogue,
400
+ > unrelated tool calls, or context switches.**
401
+
402
+ Why this matters:
403
+
404
+ - The LLM is **not event-driven**. When the worker finishes writing
405
+ `subprocess_status: "done"` to `state.json`, nothing wakes the orchestrator.
406
+ The orchestrator only notices when it next calls `multi_llm_review_collect`.
407
+ - `multi_llm_review_collect` already polls internally at
408
+ `poll_interval_seconds: 0.5` for up to `collect_max_wait_seconds: 420` (7min)
409
+ per call. Polling is not the bottleneck — the bottleneck is the orchestrator
410
+ forgetting to call collect at all.
411
+ - The token expires at `collect_deadline` (default 30min since v3.23.2). If
412
+ user dialogue or other work intervenes between persona Agent completion and
413
+ the collect call, the token can expire while the subprocess results sit
414
+ ready and unread on disk.
415
+
416
+ Recommended orchestrator flow (single LLM turn, no detours):
417
+
418
+ ```
419
+ 1. multi_llm_review(...) → receive delegation_pending + collect_token
420
+ 2. Spawn persona Agent reviews (Agent tool, parallel, 2-4 personas)
421
+ 3. As soon as ALL personas return → multi_llm_review_collect(collect_token, ...)
422
+ 4. Return final consensus to user
423
+ ```
424
+
425
+ Anti-pattern (do NOT do this):
426
+
427
+ ```
428
+ 1. multi_llm_review(...) → delegation_pending
429
+ 2. Run persona Agent reviews
430
+ 3. ❌ "By the way, while we wait, let me explain X to the user…"
431
+ 4. ❌ User asks an unrelated question, conversation drifts
432
+ 5. ❌ 30+ minutes later, finally try collect → expired_or_unknown_token
433
+ ```
434
+
435
+ If the orchestrator is genuinely interrupted (user explicitly switches topic,
436
+ or a persona Agent itself takes a long time and the orchestrator wants to
437
+ report progress), it should still **call collect first** — collect returns
438
+ quickly if the worker is already done, or blocks up to 7min if not. Either
439
+ way, the token stays alive and consensus is captured before resuming side
440
+ work.
441
+
442
+ Manual recovery if expiry happens: subprocess results are persisted at
443
+ `.kairos/multi_llm_review/pending/<token>/subprocess_results.json` and remain
444
+ readable until GC. Read them directly and synthesize manually, then re-run
445
+ `multi_llm_review` for fresh results if needed.
446
+
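The manual recovery step above can be sketched as a small script that tallies verdicts from a persisted `subprocess_results.json`. The `{"results" => [...]}` shape is taken from the fixtures in `test_multi_llm_review_wait.rb`. Treating the first `APPROVE` or `REVISE` token in `raw_text` as the verdict is an assumption, and real reviewer output may need a stricter parser. `tally_verdicts` is a hypothetical name.

```ruby
require 'json'

# Assumed verdict vocabulary; extend if reviewers emit other labels.
VERDICT_RE = /\b(APPROVE|REVISE)\b/

# Hypothetical manual-recovery helper: count verdicts from successful reviewers.
def tally_verdicts(results_path)
  data = JSON.parse(File.read(results_path))
  tally = Hash.new(0)
  data.fetch('results', []).each do |r|
    next unless r['status'] == 'success' # skip failed or timed-out reviewers
    verdict = r['raw_text'].to_s[VERDICT_RE, 1] || 'UNKNOWN'
    tally[verdict] += 1
  end
  tally
end
```

Point it at `.kairos/multi_llm_review/pending/<token>/subprocess_results.json` and compare the counts against the configured convergence rule (for example `3/4 APPROVE`) by hand.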
343
447
  ### Critical CLI Notes
344
448
 
345
449
  - **Cursor Agent stdin**: `cat file | agent -p -` does NOT work. Use file-reference:
@@ -39,7 +39,7 @@ convergence_rule_after_exclusion: "3/4 APPROVE"
39
39
  # Phase 2 (multi_llm_review_collect) receives the orchestrator's persona
40
40
  # team review and computes final consensus.
41
41
  delegation:
42
- collect_deadline_seconds: 600 # how long the orchestrator has to call collect
42
+ collect_deadline_seconds: 1800 # how long the orchestrator has to call collect (30min — interactive runs often have user dialogue between Phase 1 and collect)
43
43
  retain_collected_seconds: 3600 # how long collected results stay for idempotent replay
44
44
  # v0.3.0 parallel subprocess worker (Phase 11.5). When default:true, Phase 1
45
45
  # returns a delegation_pending token immediately and a detached OS worker
@@ -66,6 +66,11 @@ delegation:
66
66
  worker_self_timeout_floor_seconds: 60
67
67
  main_call_max_timeout_seconds: 300
68
68
  main_call_timeout_margin_seconds: 60
69
+ # multi_llm_review_wait tool (Phase 1.5) — see tools/multi_llm_review_wait.rb
70
+ wait_poll_interval_seconds: 1.0 # wait tool polling cadence (separate from collect's 0.5s)
71
+ wait_max_default_seconds: 600 # default per-call blocking ceiling
72
+ wait_max_hard_cap_seconds: 1800 # per-call hard cap (clamps max_wait_seconds arg)
73
+ wait_still_pending_streak_limit: 3 # consecutive still_pending returns before crashed/wait_exhausted
69
74
 
70
75
  # Dispatch settings
71
76
  timeout_seconds: 300 # global deadline for all reviewers
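The new `wait_*` keys imply a per-call blocking budget. Below is a hedged sketch of how the clamps could compose, based on the config comments and the deadline-remaining behavior exercised in the wait tests. The requested `max_wait_seconds` (or the default) is capped at the hard cap, then at the time left before `collect_deadline`, and a call already past the deadline does not block at all. `effective_wait_seconds` and its keyword defaults are illustrative, not the tool's actual code.

```ruby
require 'time'

# Hypothetical: how long a single wait call may block.
# collect_deadline may be a Time or an ISO 8601 string as stored in state.json.
def effective_wait_seconds(requested, collect_deadline, now: Time.now,
                           default: 600, hard_cap: 1800)
  deadline = collect_deadline.is_a?(String) ? Time.iso8601(collect_deadline) : collect_deadline
  remaining = deadline - now # seconds left before the token expires
  return 0.0 if remaining <= 0 # past deadline: answer immediately, never block

  budget = (requested || default).to_f
  budget = [budget, hard_cap.to_f].min # per-call hard cap
  [budget, remaining].min              # never wait past collect_deadline
end
```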
@@ -1,19 +1,21 @@
1
1
  {
2
2
  "name": "multi_llm_review",
3
- "version": "0.4.0",
4
- "description": "Parallel multi-LLM review orchestration. Dispatches review prompts to N LLM backends via llm_client, collects verdicts, and computes consensus. v0.4.0 (Phase 12): adds feedback_text + schema_version to response, sanitization contract for prompt-injection defense, and multi_llm_review_bundle tool for human-handoff paths without dispatch.",
3
+ "version": "0.5.0",
4
+ "description": "Parallel multi-LLM review orchestration. Dispatches review prompts to N LLM backends via llm_client, collects verdicts, and computes consensus. v0.5.0: adds multi_llm_review_wait (Phase 1.5) for explicit subprocess completion gating with next_action recovery hints, and Path A/B doc disambiguation. v0.4.0 (Phase 12): feedback_text + schema_version, sanitization contract for prompt-injection defense, and multi_llm_review_bundle tool for human-handoff paths without dispatch.",
5
5
  "author": "Masaomi Hatakeyama",
6
6
  "layer": "L1",
7
7
  "depends_on": ["llm_client"],
8
8
  "provides": [
9
9
  "multi_llm_review_orchestration",
10
10
  "review_consensus",
11
- "review_bundle_human_handoff"
11
+ "review_bundle_human_handoff",
12
+ "review_wait_gate"
12
13
  ],
13
14
  "tool_classes": [
14
15
  "KairosMcp::SkillSets::MultiLlmReview::Tools::MultiLlmReview",
15
16
  "KairosMcp::SkillSets::MultiLlmReview::Tools::MultiLlmReviewCollect",
16
- "KairosMcp::SkillSets::MultiLlmReview::Tools::MultiLlmReviewBundle"
17
+ "KairosMcp::SkillSets::MultiLlmReview::Tools::MultiLlmReviewBundle",
18
+ "KairosMcp::SkillSets::MultiLlmReview::Tools::MultiLlmReviewWait"
17
19
  ],
18
20
  "config_files": ["config/multi_llm_review.yml"],
19
21
  "knowledge_dirs": [],
@@ -861,6 +861,19 @@ module KairosMcp
861
861
  FileUtils.rm_rf(@tmp)
862
862
  end
863
863
 
864
+ # Replace WorkerSpawner.spawn with a no-op for the duration of the block.
865
+ # Avoids actually forking a detached worker process during async tests.
866
+ def with_stubbed_worker_spawner
867
+ singleton = WorkerSpawner.singleton_class
868
+ original = WorkerSpawner.method(:spawn)
869
+ singleton.send(:define_method, :spawn) { |**_kwargs| true }
870
+ begin
871
+ yield
872
+ ensure
873
+ singleton.send(:define_method, :spawn, original)
874
+ end
875
+ end
876
+
864
877
  def test_partition_for_strategy_delegate_drops_match
865
878
  reviewers = [
866
879
  { provider: 'claude_code', model: 'claude-opus-4-7', role_label: 'r47' },
@@ -1024,9 +1037,150 @@ module KairosMcp
1024
1037
  )
1025
1038
  payload = JSON.parse(result.first[:text])
1026
1039
  deadline = Time.iso8601(payload['must_collect_by'])
1027
- # Should be ~60s from now, not the default 600s
1040
+ # Should be ~60s from now, not the default 1800s
1028
1041
  assert_in_delta 60, deadline - Time.now, 5
1029
1042
  end
1043
+
1044
+ # Bug #1 fix: collect_deadline_seconds_override must extend the sync
1045
+ # delegate_response deadline beyond the config default.
1046
+ def test_delegate_sync_respects_collect_deadline_override
1047
+ subprocess_results = [
1048
+ { role_label: 'codex', provider: 'codex', model: 'm',
1049
+ raw_text: 'APPROVE', elapsed_seconds: 1, error: nil, status: :success }
1050
+ ]
1051
+ result = @tool.send(:delegate_response,
1052
+ raw_results: subprocess_results,
1053
+ arguments: {
1054
+ 'review_type' => 'design', 'artifact_name' => 'x',
1055
+ 'collect_deadline_seconds_override' => 3000
1056
+ },
1057
+ config: { 'delegation' => { 'collect_deadline_seconds' => 60 } },
1058
+ orchestrator_model: 'claude-opus-4-7',
1059
+ convergence_rule: '3/4 APPROVE',
1060
+ min_quorum: 2,
1061
+ review_round: 1,
1062
+ complexity: 'high'
1063
+ )
1064
+ payload = JSON.parse(result.first[:text])
1065
+ deadline = Time.iso8601(payload['must_collect_by'])
1066
+ # Override (3000s) wins over config (60s)
1067
+ assert_in_delta 3000, deadline - Time.now, 5
1068
+ end
1069
+
1070
+ # Bug #3 fix: when no override and no config, default is now 1800s (was 600s).
1071
+ def test_delegate_sync_default_deadline_is_1800
1072
+ subprocess_results = [
1073
+ { role_label: 'codex', provider: 'codex', model: 'm',
1074
+ raw_text: 'APPROVE', elapsed_seconds: 1, error: nil, status: :success }
1075
+ ]
1076
+ result = @tool.send(:delegate_response,
1077
+ raw_results: subprocess_results,
1078
+ arguments: { 'review_type' => 'design', 'artifact_name' => 'x' },
1079
+ config: {},
1080
+ orchestrator_model: 'claude-opus-4-7',
1081
+ convergence_rule: '3/4 APPROVE',
1082
+ min_quorum: 2,
1083
+ review_round: 1,
1084
+ complexity: 'high'
1085
+ )
1086
+ payload = JSON.parse(result.first[:text])
1087
+ deadline = Time.iso8601(payload['must_collect_by'])
1088
+ assert_in_delta 1800, deadline - Time.now, 5
1089
+ end
1090
+
1091
+ # Bug #1 fix (async): when timeout_seconds_override raises the worker
1092
+ # self_timeout above the configured collect_deadline, the deadline must
1093
+ # auto-extend to cover the worker lifespan + poll margin. Otherwise the
1094
+ # token expires while the worker is still healthy.
1095
+ def test_delegate_async_auto_extends_deadline_to_worker_lifespan
1096
+ reviewers = [{ provider: 'codex', model: 'codex-default', role_label: 'codex' }]
1097
+ arguments = {
1098
+ 'review_type' => 'design',
1099
+ 'artifact_name' => 'x',
1100
+ 'timeout_seconds_override' => 1500
1101
+ }
1102
+ config = {
1103
+ 'delegation' => {
1104
+ 'collect_deadline_seconds' => 600,
1105
+ 'parallel' => {
1106
+ 'worker_self_timeout_multiplier' => 1.5,
1107
+ 'worker_self_timeout_floor_seconds' => 60,
1108
+ 'poll_interval_seconds' => 0.5
1109
+ }
1110
+ }
1111
+ }
1112
+ parallel_cfg = config.dig('delegation', 'parallel')
1113
+
1114
+ result = nil
1115
+ with_stubbed_worker_spawner do
1116
+ result = @tool.send(:delegate_response_async,
1117
+ reviewers: reviewers,
1118
+ messages: [{ 'role' => 'user', 'content' => 'x' }],
1119
+ system_prompt: 'sys',
1120
+ arguments: arguments,
1121
+ config: config,
1122
+ orchestrator_model: 'claude-opus-4-7',
1123
+ convergence_rule: '3/4 APPROVE',
1124
+ min_quorum: 2,
1125
+ review_round: 1,
1126
+ complexity: 'high',
1127
+ review_context: 'independent',
1128
+ max_concurrent: 2,
1129
+ timeout_secs: 1500,
1130
+ parallel_cfg: parallel_cfg
1131
+ )
1132
+ end
1133
+ payload = JSON.parse(result.first[:text])
1134
+ assert_equal 'delegation_pending', payload['status']
1135
+ deadline = Time.iso8601(payload['must_collect_by'])
1136
+ # worker_lifespan = 1500*1.5 + 60 = 2310; +10s poll margin = 2320
1137
+ # Deadline must be at least worker_lifespan + margin, NOT 600
1138
+ assert_operator deadline - Time.now, :>=, 2320 - 5
1139
+ end
1140
+
1141
+ # Async: explicit collect_deadline_seconds_override above the auto-min wins.
1142
+ def test_delegate_async_respects_explicit_override
1143
+ reviewers = [{ provider: 'codex', model: 'codex-default', role_label: 'codex' }]
1144
+ arguments = {
1145
+ 'review_type' => 'design',
1146
+ 'artifact_name' => 'x',
1147
+ 'collect_deadline_seconds_override' => 5000
1148
+ }
1149
+ config = {
1150
+ 'delegation' => {
1151
+ 'collect_deadline_seconds' => 600,
1152
+ 'parallel' => {
1153
+ 'worker_self_timeout_multiplier' => 1.5,
1154
+ 'worker_self_timeout_floor_seconds' => 60,
1155
+ 'poll_interval_seconds' => 0.5
1156
+ }
1157
+ }
1158
+ }
1159
+ parallel_cfg = config.dig('delegation', 'parallel')
1160
+
1161
+ result = nil
1162
+ with_stubbed_worker_spawner do
1163
+ result = @tool.send(:delegate_response_async,
1164
+ reviewers: reviewers,
1165
+ messages: [{ 'role' => 'user', 'content' => 'x' }],
1166
+ system_prompt: 'sys',
1167
+ arguments: arguments,
1168
+ config: config,
1169
+ orchestrator_model: 'claude-opus-4-7',
1170
+ convergence_rule: '3/4 APPROVE',
1171
+ min_quorum: 2,
1172
+ review_round: 1,
1173
+ complexity: 'high',
1174
+ review_context: 'independent',
1175
+ max_concurrent: 2,
1176
+ timeout_secs: 300,
1177
+ parallel_cfg: parallel_cfg
1178
+ )
1179
+ end
1180
+ payload = JSON.parse(result.first[:text])
1181
+ deadline = Time.iso8601(payload['must_collect_by'])
1182
+ assert_in_delta 5000, deadline - Time.now, 5
1183
+ end
1030
1184
  end
1031
1185
 
1032
1186
  class TestCollectTool < Minitest::Test
@@ -0,0 +1,249 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'minitest/autorun'
4
+ require 'json'
5
+ require 'tmpdir'
6
+ require 'fileutils'
7
+ require 'time'
8
+
9
+ # Stub BaseTool so we can load the tool file in isolation.
10
+ module KairosMcp
11
+ module Tools
12
+ class BaseTool
13
+ def text_content(s); [{ text: s }]; end
14
+ end
15
+ end unless defined?(KairosMcp::Tools::BaseTool)
16
+ end
17
+
18
+ require_relative '../lib/multi_llm_review/pending_state'
19
+ require_relative '../lib/multi_llm_review/wait_for_worker'
20
+ require_relative '../tools/multi_llm_review_wait'
21
+
22
+ module KairosMcp
23
+ module SkillSets
24
+ module MultiLlmReview
25
+ class TestMultiLlmReviewWait < Minitest::Test
26
+ def setup
27
+ @tmp = Dir.mktmpdir('mlr-wait-')
28
+ @orig_cwd = Dir.pwd
29
+ Dir.chdir(@tmp)
30
+ @tool = Tools::MultiLlmReviewWait.new
31
+ @token = '11111111-2222-4333-8444-555555555555'
32
+ end
33
+
34
+ def teardown
35
+ Dir.chdir(@orig_cwd)
36
+ FileUtils.rm_rf(@tmp)
37
+ end
38
+
39
+ def write_state(extra = {})
40
+ PendingState.create_token_dir!(@token)
41
+ PendingState.write_state(@token, {
42
+ 'schema_version' => 4,
43
+ 'token' => @token,
44
+ 'created_at' => Time.now.iso8601,
45
+ 'collect_deadline' => (Time.now + 1800).iso8601,
46
+ 'subprocess_status' => 'pending',
47
+ 'subprocess_total' => 3,
48
+ 'parallel' => true
49
+ }.merge(extra))
50
+ FileUtils.touch(PendingState.collect_lock_path(@token))
51
+ end
52
+
53
+ def call_wait(args = {})
54
+ payload = JSON.parse(@tool.call({ 'collect_token' => @token }.merge(args)).first[:text])
55
+ payload
56
+ end
57
+
58
+ # ── unknown_token ────────────────────────────────────────────────
59
+
60
+ def test_unknown_token_returns_unknown_with_redispatch_hint
61
+ payload = call_wait
62
+ assert_equal 'unknown_token', payload['status']
63
+ assert_equal @token, payload['collect_token']
64
+ assert_equal 'multi_llm_review', payload['next_action']['tool']
65
+ assert_match(/never existed|garbage-collected|new dispatch/i,
66
+ payload['next_action']['purpose'])
67
+ end
68
+
69
+ def test_invalid_token_format_returns_unknown
70
+ payload = JSON.parse(@tool.call({ 'collect_token' => 'not-a-uuid' }).first[:text])
71
+ assert_equal 'unknown_token', payload['status']
72
+ end
73
+
74
+ # ── already_collected ────────────────────────────────────────────
75
+
76
+ def test_already_collected_returns_replay_hint
77
+ write_state
78
+ PendingState.write_collected(@token, {
79
+ 'final_payload' => { 'status' => 'ok', 'verdict' => 'APPROVE' }
80
+ })
81
+ payload = call_wait
82
+ assert_equal 'already_collected', payload['status']
83
+ assert_equal 'multi_llm_review_collect', payload['next_action']['tool']
84
+ assert_match(/idempotent replay/i, payload['next_action']['purpose'])
85
+ end
86
+
87
+ # ── past_collect_deadline ────────────────────────────────────────
88
+
89
+ def test_past_deadline_returns_redispatch_without_blocking
90
+ write_state('collect_deadline' => (Time.now - 60).iso8601)
91
+ t0 = Time.now
92
+ payload = call_wait('max_wait_seconds' => 5)
93
+ elapsed = Time.now - t0
94
+ assert_equal 'past_collect_deadline', payload['status']
95
+ assert_equal 'multi_llm_review', payload['next_action']['tool']
96
+ assert_operator elapsed, :<, 1.0, 'must not block when past deadline'
97
+ end
98
+
99
+ # ── ready ────────────────────────────────────────────────────────
100
+
101
+ def test_ready_when_subprocess_results_present
102
+ write_state
103
+ PendingState.write_subprocess_results(@token, {
104
+ 'results' => [
105
+ { 'role_label' => 'codex', 'raw_text' => 'APPROVE', 'status' => 'success' },
106
+ { 'role_label' => 'cursor', 'raw_text' => 'APPROVE', 'status' => 'success' },
107
+ { 'role_label' => 'claude', 'raw_text' => 'APPROVE', 'status' => 'success' }
108
+ ],
109
+ 'elapsed_seconds' => 12.3
110
+ })
111
+ payload = call_wait('max_wait_seconds' => 2)
112
+ assert_equal 'ready', payload['status']
113
+ assert_equal 3, payload['subprocess_done']
114
+ assert_equal 3, payload['subprocess_total']
115
+ assert_equal 'multi_llm_review_collect', payload['next_action']['tool']
116
+ assert_includes payload['next_action']['args'].keys, 'orchestrator_reviews'
117
+ end
118
+
119
+ # ── still_pending + streak escalation ────────────────────────────
120
+
121
+ def test_still_pending_returned_when_worker_healthy_but_slow
122
+ write_state
123
+ # Live heartbeat so WaitForWorker sees a healthy worker.
124
+ FileUtils.touch(PendingState.worker_heartbeat_path(@token))
125
+ PendingState.write_worker_pid(@token, { 'pid' => Process.pid, 'pgid' => Process.pid })
126
+
127
+ payload = call_wait('max_wait_seconds' => 1)
128
+ assert_equal 'still_pending', payload['status']
129
+ assert_equal 1, payload['still_pending_streak']
130
+ assert_equal 'multi_llm_review_wait', payload['next_action']['tool']
131
+ end
132
+
133
+ def test_still_pending_streak_persists_across_calls
134
+ write_state
135
+ FileUtils.touch(PendingState.worker_heartbeat_path(@token))
136
+ PendingState.write_worker_pid(@token, { 'pid' => Process.pid, 'pgid' => Process.pid })
137
+
138
+ p1 = call_wait('max_wait_seconds' => 1)
139
+ assert_equal 1, p1['still_pending_streak']
140
+ p2 = call_wait('max_wait_seconds' => 1)
141
+ assert_equal 2, p2['still_pending_streak']
142
+ end
143
+
144
+ def test_streak_at_limit_escalates_to_crashed
145
+ write_state('wait_still_pending_streak' => 3)
146
+ payload = call_wait('max_wait_seconds' => 1)
147
+ assert_equal 'crashed', payload['status']
148
+ assert_equal 'wait_exhausted', payload['crashed_reason']
149
+ assert_equal 'multi_llm_review', payload['next_action']['tool']
150
+ end
151
+
152
+ def test_ready_resets_streak
153
+ write_state('wait_still_pending_streak' => 2)
154
+ PendingState.write_subprocess_results(@token, { 'results' => [], 'elapsed_seconds' => 1 })
155
+ payload = call_wait('max_wait_seconds' => 1)
156
+ assert_equal 'ready', payload['status']
157
+ state = PendingState.load_state(@token)
158
+ assert_equal 0, state['wait_still_pending_streak'].to_i
159
+ end
160
+
161
+ # ── crashed (worker terminal) ────────────────────────────────────
162
+
163
+ def test_crashed_status_propagates_reason
164
+ write_state('subprocess_status' => 'crashed', 'crash_reason' => 'segfault')
165
+ payload = call_wait('max_wait_seconds' => 1)
166
+ assert_equal 'crashed', payload['status']
167
+ assert_equal 'segfault', payload['crashed_reason']
168
+ assert_equal 'multi_llm_review', payload['next_action']['tool']
169
+ end
170
+
171
+ # ── hard cap ─────────────────────────────────────────────────────
172
+ # Hard cap is enforced before WaitForWorker is invoked. We verify the
173
+ # clamping logic without actually waiting for the cap by checking the
174
+ # request was processed (well-formed payload returned in bounded time)
175
+ # and the deadline-remaining check fired.
176
+ def test_max_wait_clamped_when_request_exceeds_hard_cap
177
+ # Set a very short deadline so the deadline-remaining clamp fires
178
+ # almost immediately.
179
+ write_state('collect_deadline' => (Time.now + 2).iso8601)
180
+ FileUtils.touch(PendingState.worker_heartbeat_path(@token))
181
+ PendingState.write_worker_pid(@token, { 'pid' => Process.pid, 'pgid' => Process.pid })
182
+
183
+ t0 = Time.now
184
+ payload = call_wait('max_wait_seconds' => 999_999)
185
+ elapsed = Time.now - t0
186
+ # Whatever status comes back (still_pending or past_collect_deadline
187
+ # depending on timing), elapsed must be bounded — never the 999_999s
188
+ # the caller requested. Enforces the clamp path is not bypassed.
189
+ refute_nil payload['status']
190
+ assert_operator elapsed, :<, 30.0,
191
+ 'elapsed must be bounded by deadline-remaining clamp, not by raw max_wait_seconds'
192
+ end
193
+
194
+ # ── elapsed_seconds field is always present ──────────────────────
195
+
196
+ def test_elapsed_seconds_always_present
197
+ write_state
198
+ PendingState.write_subprocess_results(@token, { 'results' => [], 'elapsed_seconds' => 0.1 })
199
+ payload = call_wait('max_wait_seconds' => 1)
200
+ assert payload.key?('elapsed_seconds'), 'elapsed_seconds field missing'
201
+ assert_kind_of Float, payload['elapsed_seconds']
202
+ end
203
+
204
+ # ── next_action present on every status ──────────────────────────
205
+
206
+ def test_next_action_present_on_every_status
207
+ write_state
208
+ # ready
209
+ PendingState.write_subprocess_results(@token, { 'results' => [], 'elapsed_seconds' => 1 })
210
+ assert call_wait('max_wait_seconds' => 1)['next_action'], 'ready missing next_action'
211
+
212
+ # past_collect_deadline
213
+ File.delete(PendingState.subprocess_results_path(@token))
214
+ PendingState.write_state(@token, PendingState.load_state(@token)
215
+ .merge('collect_deadline' => (Time.now - 1).iso8601))
216
+ assert call_wait['next_action'], 'past_collect_deadline missing next_action'
217
+
218
+ # crashed
219
+ PendingState.write_state(@token, PendingState.load_state(@token).merge(
220
+ 'collect_deadline' => (Time.now + 600).iso8601,
221
+ 'subprocess_status' => 'crashed', 'crash_reason' => 'oom'
222
+ ))
223
+ assert call_wait['next_action'], 'crashed missing next_action'
224
+ end
225
+ end
226
+
227
+ # ── backward compat: collect can still be called without wait ────────
228
+ # Verifies that introducing wait does not break the existing
229
+ # "delegation_pending → collect" path. The collect tool already polls
230
+ # internally and remains the primary completion gate.
231
+ class TestWaitToolBackwardCompat < Minitest::Test
232
+ def test_collect_works_without_wait_tool
233
+ # Smoke test: load the collect tool and verify it has not gained a
234
+ # required dependency on wait. (Full collect integration is covered
235
+ # in test_multi_llm_review.rb; this is a presence check.)
236
+ require_relative '../tools/multi_llm_review_collect'
237
+ collect = Tools::MultiLlmReviewCollect.new
238
+ schema = collect.input_schema
239
+ assert_equal 'object', schema[:type]
240
+ # The collect tool's required fields must still be just collect_token
241
+ # + orchestrator_reviews — wait must NOT have been added as required.
242
+ required = schema[:required] || []
243
+ refute_includes required, 'wait_completed'
244
+ refute_includes required, 'wait_token'
245
+ end
246
+ end
247
+ end
248
+ end
249
+ end
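The first test above depends on `max_wait_seconds` being clamped to the time left before the token's collect deadline, not honored as the raw 999_999 s. Here is a minimal sketch of that clamp order, modeled on the steps in `MultiLlmReviewWait#call`: first bound to `[1, hard_cap]`, then to the remaining deadline, never below 1. `effective_max_wait` is a hypothetical helper, not a method in the gem.

```ruby
require 'time'

# Hypothetical helper mirroring the two clamp steps in MultiLlmReviewWait#call.
def effective_max_wait(requested, hard_cap:, deadline: nil, now: Time.now)
  max = requested.to_i
  max = hard_cap if max > hard_cap # per-call hard cap (1800 by default)
  max = 1 if max < 1               # always poll at least once
  if deadline
    remaining = (deadline - now).to_i # Time - Time yields Float seconds
    max = remaining if remaining < max
    max = 1 if max < 1
  end
  max
end

now = Time.iso8601('2026-04-27T00:00:00Z')
deadline = now + 20
puts effective_max_wait(999_999, hard_cap: 1800, deadline: deadline, now: now) # prints 20
puts effective_max_wait(999_999, hard_cap: 1800, now: now)                     # prints 1800
puts effective_max_wait(0, hard_cap: 1800, deadline: deadline, now: now)       # prints 1
```

In the real tool, a token already past its deadline exits earlier with `past_collect_deadline`. The helper's final floor only matters when the deadline is less than a second away.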
@@ -92,6 +92,16 @@ module KairosMcp
92
92
  type: 'integer',
93
93
  description: 'Override dispatch timeout in seconds (default from config)'
94
94
  },
95
+ collect_deadline_seconds_override: {
96
+ type: 'integer',
97
+ description: 'Override how long the orchestrator has to call ' \
98
+ 'multi_llm_review_collect before the pending token expires ' \
99
+ '(default from config: delegation.collect_deadline_seconds). ' \
100
+ 'In the async/parallel path, the effective deadline is also ' \
101
+ 'auto-extended to cover the worker self_timeout plus a poll margin, ' \
102
+ 'so raising timeout_seconds_override alone cannot leave the ' \
103
+ 'collect deadline shorter than the worker lifespan.'
104
+ },
95
105
  complexity: {
96
106
  type: 'string',
97
107
  enum: %w[auto low medium high critical],
@@ -446,7 +456,8 @@ module KairosMcp
446
456
  }))
447
457
  end
448
458
 
449
- deadline_secs = config.dig('delegation', 'collect_deadline_seconds') || 600
459
+ deadline_secs = arguments['collect_deadline_seconds_override'] ||
460
+ config.dig('delegation', 'collect_deadline_seconds') || 1800
450
461
  now = Time.now
451
462
  token = PendingState.generate_token
452
463
 
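The sync path now resolves `deadline_secs` in three steps: the per-call override, then `delegation.collect_deadline_seconds` from config, then a built-in 1800 s fallback (600 s before this release). The sketch below restates that precedence as a standalone method. `resolve_collect_deadline` is hypothetical and is not part of kairos-chain.

```ruby
# Hypothetical helper (not part of kairos-chain) restating the precedence
# used for deadline_secs in the sync path.
def resolve_collect_deadline(arguments, config)
  arguments['collect_deadline_seconds_override'] ||        # 1. per-call override
    config.dig('delegation', 'collect_deadline_seconds') || # 2. config value
    1800                                                    # 3. built-in fallback
end

config = { 'delegation' => { 'collect_deadline_seconds' => 900 } }
puts resolve_collect_deadline({}, {})                                                 # prints 1800
puts resolve_collect_deadline({}, config)                                             # prints 900
puts resolve_collect_deadline({ 'collect_deadline_seconds_override' => 120 }, config) # prints 120
```

Ruby treats 0 as truthy, so an override of 0 is used as-is and does not fall through to the config value. In the sync path, such a token would presumably reach its deadline at creation time.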
@@ -507,11 +518,22 @@ module KairosMcp
507
518
  }))
508
519
  end
509
520
 
510
- deadline_secs = config.dig('delegation', 'collect_deadline_seconds') || 600
521
+ deadline_secs = arguments['collect_deadline_seconds_override'] ||
522
+ config.dig('delegation', 'collect_deadline_seconds') || 1800
511
523
  multiplier = parallel_cfg['worker_self_timeout_multiplier'] || 1.5
512
524
  floor = parallel_cfg['worker_self_timeout_floor_seconds'] || 60
525
+ poll_interval = parallel_cfg['poll_interval_seconds'] || 0.5
513
526
  now = Time.now
514
- self_timeout_at = (now + timeout_secs * multiplier + floor).iso8601
527
+ worker_lifespan_secs = (timeout_secs * multiplier + floor).to_f
528
+ self_timeout_at = (now + worker_lifespan_secs).iso8601
529
+
530
+ # Auto-extend collect_deadline to cover the worker's self_timeout plus
531
+ # a polling margin. Without this, raising timeout_seconds_override alone
532
+ # leaves the orchestrator's submission window shorter than the worker
533
+ # lifespan — the collect token expires while the worker is still healthy.
534
+ # Only kicks in for the async path; sync delegate_response has no worker.
535
+ min_deadline_secs = (worker_lifespan_secs + (poll_interval * 20)).ceil
536
+ deadline_secs = [deadline_secs.to_i, min_deadline_secs].max
515
537
 
516
538
  # UUID collision retry (EEXIST on Dir.mkdir per PendingState§token_dir).
517
539
  token = nil
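The async-path auto-extension reduces to a small piece of arithmetic. The worker lifespan is `timeout_secs * multiplier + floor`. The collect deadline is then raised, never lowered, to cover that lifespan plus 20 poll intervals. The helper name below is hypothetical, and its keyword defaults are taken from the `|| 1.5`, `|| 60` and `|| 0.5` fallbacks in the hunk above.

```ruby
# Hypothetical restatement of the async-path deadline extension.
def effective_collect_deadline(deadline_secs, timeout_secs,
                               multiplier: 1.5, floor: 60, poll_interval: 0.5)
  worker_lifespan_secs = (timeout_secs * multiplier + floor).to_f
  # Margin of 20 poll intervals on top of the worker's self_timeout.
  min_deadline_secs = (worker_lifespan_secs + (poll_interval * 20)).ceil
  [deadline_secs.to_i, min_deadline_secs].max
end

# timeout 1200 s: lifespan = 1200 * 1.5 + 60 = 1860 s, margin = 10 s
puts effective_collect_deadline(1800, 1200) # prints 1870
# timeout 300 s: lifespan 510 s, so the configured 1800 s already suffices
puts effective_collect_deadline(1800, 300)  # prints 1800
```

Taking the maximum lets an operator still configure a longer window. Only a deadline shorter than the worker's own lifespan is corrected.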
@@ -578,7 +600,7 @@ module KairosMcp
578
600
  'instruction' => 'Run persona-based review using your Agent tool. ' \
579
601
  "Choose #{PersonaAssembly::MIN_PERSONAS}-#{PersonaAssembly::MAX_PERSONAS} " \
580
602
  'personas appropriate to the artifact and review_type. ' \
581
- 'Submit findings via multi_llm_review_collect with the collect_token below.',
603
+ 'Then call multi_llm_review_wait, then multi_llm_review_collect.',
582
604
  'review_type' => arguments['review_type'],
583
605
  'persona_count_min' => PersonaAssembly::MIN_PERSONAS,
584
606
  'persona_count_max' => PersonaAssembly::MAX_PERSONAS
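The dispatch instruction now sequences wait before collect. On the client side, the intended flow is to keep calling `multi_llm_review_wait` while the status is `still_pending`, then follow the returned `next_action`. A minimal sketch of that loop follows. `call_tool` is an assumed callable standing in for however the MCP client invokes a tool and returns its JSON text, and `wait_until_settled` is hypothetical. Neither is part of kairos-chain.

```ruby
require 'json'

# Hypothetical orchestrator-side loop: re-call wait while still_pending,
# then hand back the last payload so the caller can follow next_action.
def wait_until_settled(call_tool, collect_token, max_rounds: 5)
  payload = nil
  max_rounds.times do
    payload = JSON.parse(call_tool.call('multi_llm_review_wait',
                                        { 'collect_token' => collect_token }))
    break unless payload['status'] == 'still_pending'
  end
  payload
end

# Fake tool for illustration: still_pending twice, then ready.
responses = [
  { 'status' => 'still_pending', 'next_action' => { 'tool' => 'multi_llm_review_wait' } },
  { 'status' => 'still_pending', 'next_action' => { 'tool' => 'multi_llm_review_wait' } },
  { 'status' => 'ready', 'next_action' => { 'tool' => 'multi_llm_review_collect' } }
]
fake = ->(_tool, _args) { JSON.generate(responses.shift) }

final = wait_until_settled(fake, 'tok')
puts final['status']              # prints ready
puts final['next_action']['tool'] # prints multi_llm_review_collect
```

The client-side `max_rounds` is only a safety net. The server escalates a long `still_pending` streak to `crashed` on its own.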
@@ -586,7 +608,18 @@ module KairosMcp
586
608
  'subprocess_status' => 'pending',
587
609
  'subprocess_total' => reviewers.size,
588
610
  'must_collect_by' => (now + deadline_secs).iso8601,
589
- 'orchestrator_model' => orchestrator_model
611
+ 'orchestrator_model' => orchestrator_model,
612
+ # next_action hint (R1, R8): MCP does not enforce ordering, but
613
+ # the LLM usually follows this hint. Calling wait is
614
+ # optional — collect alone still works via its internal polling —
615
+ # but wait surfaces structural completion deterministically.
616
+ 'next_action' => {
617
+ 'tool' => 'multi_llm_review_wait',
618
+ 'args' => { 'collect_token' => token, 'max_wait_seconds' => 600 },
619
+ 'purpose' => 'Phase 1.5: block until subprocess reviewers complete. Call ' \
620
+ 'AFTER spawning persona Agent reviews, BEFORE multi_llm_review_collect. ' \
621
+ 'Optional but strongly recommended for deterministic recovery hints.'
622
+ }
590
623
  }))
591
624
  end
592
625
 
@@ -0,0 +1,313 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'json'
4
+ require 'time'
+ require 'yaml'
5
+ require_relative '../lib/multi_llm_review/pending_state'
6
+ require_relative '../lib/multi_llm_review/wait_for_worker'
7
+
8
+ module KairosMcp
9
+ module SkillSets
10
+ module MultiLlmReview
11
+ module Tools
12
+ # Phase 1.5 of the orchestrator delegation protocol.
13
+ #
14
+ # Optional blocking gate that the orchestrator can call AFTER spawning
15
+ # persona Agent reviews and BEFORE multi_llm_review_collect. The server
16
+ # polls the detached worker's state and returns when subprocess
17
+ # reviewers complete (or earlier on terminal conditions).
18
+ #
19
+ # Without this tool, the orchestrator can still call collect directly;
20
+ # collect's own internal polling covers worker completion. wait is a
21
+ # tool-chain checkpoint that surfaces structural status (ready,
22
+ # still_pending, crashed/wait_exhausted) with next_action recovery hints, so
23
+ # the LLM can choose the right next step deterministically.
24
+ #
25
+ # Status enum (R10):
26
+ # ready — subprocess_results.json present, proceed to collect
27
+ # still_pending — max_wait elapsed, worker healthy, may call wait again
28
+ # crashed — worker terminal failure (with reason)
29
+ # unknown_token — token dir missing (never existed or GC'd)
30
+ # already_collected — collected.json present, retrieve cached payload
31
+ # past_collect_deadline — token alive but past deadline; collect would reject
32
+ class MultiLlmReviewWait < KairosMcp::Tools::BaseTool
33
+ # Per-call hard cap on max_wait_seconds (R7).
34
+ MAX_WAIT_HARD_CAP_DEFAULT = 1800
35
+
36
+ # Default streak limit before still_pending escalates to crashed (R7).
37
+ STILL_PENDING_STREAK_LIMIT_DEFAULT = 3
38
+
39
+ def name
40
+ 'multi_llm_review_wait'
41
+ end
42
+
43
+ def description
44
+ 'Phase 1.5 — block until subprocess reviewers complete for a delegated ' \
45
+ 'multi_llm_review token. Optional but recommended: call after spawning ' \
46
+ 'persona Agent reviews and before multi_llm_review_collect. Returns ' \
47
+ 'a status enum with a next_action recovery hint for every status.'
48
+ end
49
+
50
+ def category
51
+ :review
52
+ end
53
+
54
+ def usecase_tags
55
+ %w[review multi-llm wait blocking polling]
56
+ end
57
+
58
+ def related_tools
59
+ %w[multi_llm_review multi_llm_review_collect]
60
+ end
61
+
62
+ def input_schema
63
+ {
64
+ type: 'object',
65
+ properties: {
66
+ collect_token: {
67
+ type: 'string',
68
+ description: 'UUID v4 token returned by multi_llm_review delegation_pending'
69
+ },
70
+ max_wait_seconds: {
71
+ type: 'integer',
72
+ description: 'Server-side blocking duration cap in seconds. ' \
73
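+ 'Default from config (delegation.parallel.wait_max_default_seconds, 600 if unset). ' \
74
+ 'Hard cap 1800 by default (delegation.parallel.wait_max_hard_cap_seconds); ' \
+ 'the value is also clamped to the time remaining before collect_deadline.'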
75
+ }
76
+ },
77
+ required: %w[collect_token]
78
+ }
79
+ end
80
+
81
+ def call(arguments)
82
+ token = arguments['collect_token'].to_s
83
+ unless PendingState.valid_token?(token)
84
+ return text_content(JSON.generate({
85
+ 'status' => 'unknown_token',
86
+ 'collect_token' => token,
87
+ 'elapsed_seconds' => 0.0,
88
+ 'next_action' => next_action_redispatch(
89
+ 'Token format invalid. Re-run multi_llm_review to start a new dispatch.'
90
+ )
91
+ }))
92
+ end
93
+
94
+ cfg = config_parallel
95
+ default_max = (cfg['wait_max_default_seconds'] || 600).to_i
96
+ hard_cap = (cfg['wait_max_hard_cap_seconds'] || MAX_WAIT_HARD_CAP_DEFAULT).to_i
97
+ poll_int = (cfg['wait_poll_interval_seconds'] || 1.0).to_f
98
+ streak_limit = (cfg['wait_still_pending_streak_limit'] ||
99
+ STILL_PENDING_STREAK_LIMIT_DEFAULT).to_i
100
+
101
+ requested_max = (arguments['max_wait_seconds'] || default_max).to_i
102
+ requested_max = hard_cap if requested_max > hard_cap
103
+ requested_max = 1 if requested_max < 1
104
+
105
+ # 1. already_collected check (collected.json present) — before any
106
+ # deadline / token-dir checks so a successful collect always
107
+ # returns deterministically even after deadline expiry.
108
+ if File.exist?(safe_path { PendingState.collected_path(token) })
109
+ return reply('already_collected', token, 0.0,
110
+ next_action: next_action_collect_replay(token,
111
+ 'Collect already completed for this token. Call multi_llm_review_collect ' \
112
+ 'to retrieve the cached final consensus (idempotent replay).'))
113
+ end
114
+
115
+ # 2. unknown_token check (state.json missing).
116
+ state = PendingState.load_state(token)
117
+ if state.nil?
118
+ return reply('unknown_token', token, 0.0,
119
+ next_action: next_action_redispatch(
120
+ 'Token not found (never existed or already garbage-collected). ' \
121
+ 'Re-run multi_llm_review to start a new dispatch.'))
122
+ end
123
+
124
+ # 3. past_collect_deadline early exit (collect would reject anyway).
125
+ deadline = (Time.iso8601(state['collect_deadline']) rescue nil)
126
+ if deadline && Time.now > deadline
127
+ return reply('past_collect_deadline', token, 0.0,
128
+ subprocess_total: state['subprocess_total'] ||
129
+ (PendingState.load_request(token)&.dig('reviewers')&.size),
130
+ next_action: next_action_redispatch(
131
+ 'Token deadline elapsed. multi_llm_review_collect would reject. ' \
132
+ 'Re-run multi_llm_review to start a new dispatch.'))
133
+ end
134
+
135
+ # 4. Cap max_wait by remaining deadline (R7) so we never block
136
+ # longer than the useful lifetime of the token.
137
+ if deadline
138
+ remaining = (deadline - Time.now).to_i
139
+ requested_max = remaining if remaining < requested_max
140
+ requested_max = 1 if requested_max < 1
141
+ end
142
+
143
+ # 5. Streak guard: if still_pending was returned too many times in
144
+ # a row, escalate to crashed/wait_exhausted.
145
+ streak = (state['wait_still_pending_streak'] || 0).to_i
146
+ if streak >= streak_limit
147
+ return reply('crashed', token, 0.0,
148
+ crashed_reason: 'wait_exhausted',
149
+ still_pending_streak: streak,
150
+ next_action: next_action_redispatch(
151
+ "still_pending streak reached limit (#{streak_limit}). Worker may be " \
152
+ 'wedged or pathologically slow. Re-run multi_llm_review.'))
153
+ end
154
+
155
+ # 6. Delegate to existing WaitForWorker for the actual polling.
156
+ outcome = WaitForWorker.wait(token, {
157
+ max_wait_seconds: requested_max,
158
+ poll_interval_seconds: poll_int,
159
+ startup_grace_seconds: cfg['startup_grace_seconds'] || 30,
160
+ heartbeat_stale_threshold_seconds: cfg['heartbeat_stale_threshold_seconds'] || 15
161
+ })
162
+
163
+ translate_outcome(token, outcome, streak, requested_max, state)
164
+ rescue StandardError => e
165
+ warn "[multi_llm_review_wait] INTERNAL ERROR: #{e.class}: #{e.message}"
166
+ warn e.backtrace.first(10).join("\n") if e.backtrace
167
+ text_content(JSON.generate({
168
+ 'status' => 'error',
169
+ 'error_class' => 'internal',
170
+ 'error' => "#{e.class}: #{e.message}",
171
+ 'collect_token' => arguments['collect_token']
172
+ }))
173
+ end
174
+
175
+ private
176
+
177
+ def translate_outcome(token, outcome, prior_streak, requested_max, state)
178
+ elapsed = (outcome[:elapsed] || requested_max).to_f
179
+ subprocess_total = state['subprocess_total'] ||
180
+ PendingState.load_request(token)&.dig('reviewers')&.size
181
+
182
+ case outcome[:status]
183
+ when :ready
184
+ reset_streak(token)
185
+ done = (outcome[:results].is_a?(Array) ? outcome[:results].size : nil) ||
186
+ subprocess_total
187
+ reply('ready', token, elapsed,
188
+ subprocess_done: done,
189
+ subprocess_total: subprocess_total,
190
+ next_action: next_action_collect(token,
191
+ 'Subprocess reviewers complete. Submit your persona Agent findings to ' \
192
+ 'multi_llm_review_collect to compute the final consensus.'))
193
+ when :crashed
194
+ reset_streak(token)
195
+ reply('crashed', token, elapsed,
196
+ crashed_reason: outcome[:reason] || 'crashed',
197
+ subprocess_total: subprocess_total,
198
+ next_action: next_action_redispatch(
199
+ "Worker terminated abnormally (#{outcome[:reason] || 'crashed'}). " \
200
+ 'Re-run multi_llm_review to start a new dispatch.'))
201
+ when :timeout
202
+ new_streak = prior_streak + 1
203
+ persist_streak(token, new_streak)
204
+ reply('still_pending', token, elapsed,
205
+ subprocess_total: subprocess_total,
206
+ still_pending_streak: new_streak,
207
+ next_action: next_action_wait(token,
208
+ "Worker still healthy after #{requested_max}s. Call multi_llm_review_wait " \
209
+ "again with the same token (streak #{new_streak}/#{(config_parallel['wait_still_pending_streak_limit'] || STILL_PENDING_STREAK_LIMIT_DEFAULT)})."))
210
+ else
211
+ reply('crashed', token, elapsed,
212
+ crashed_reason: "unknown_outcome:#{outcome[:status]}",
213
+ subprocess_total: subprocess_total,
214
+ next_action: next_action_redispatch(
215
+ 'Worker reported an unexpected outcome. Re-run multi_llm_review.'))
216
+ end
217
+ end
218
+
219
+ def reply(status, token, elapsed, **fields)
220
+ payload = {
221
+ 'status' => status,
222
+ 'collect_token' => token,
223
+ 'elapsed_seconds' => elapsed.round(3)
224
+ }
225
+ payload['subprocess_done'] = fields[:subprocess_done] if fields.key?(:subprocess_done)
226
+ payload['subprocess_total'] = fields[:subprocess_total] if fields.key?(:subprocess_total)
227
+ payload['crashed_reason'] = fields[:crashed_reason] if fields.key?(:crashed_reason)
228
+ payload['still_pending_streak'] = fields[:still_pending_streak] if fields.key?(:still_pending_streak)
229
+ payload['next_action'] = fields[:next_action] if fields.key?(:next_action)
230
+ text_content(JSON.generate(payload))
231
+ end
232
+
233
+ def next_action_collect(token, purpose)
234
+ {
235
+ 'tool' => 'multi_llm_review_collect',
236
+ 'args' => {
237
+ 'collect_token' => token,
238
+ 'orchestrator_reviews' => '<persona findings array, 2-4 entries>'
239
+ },
240
+ 'purpose' => purpose
241
+ }
242
+ end
243
+
244
+ def next_action_collect_replay(token, purpose)
245
+ {
246
+ 'tool' => 'multi_llm_review_collect',
247
+ 'args' => { 'collect_token' => token },
248
+ 'purpose' => purpose
249
+ }
250
+ end
251
+
252
+ def next_action_wait(token, purpose)
253
+ {
254
+ 'tool' => 'multi_llm_review_wait',
255
+ 'args' => { 'collect_token' => token },
256
+ 'purpose' => purpose
257
+ }
258
+ end
259
+
260
+ def next_action_redispatch(purpose)
261
+ {
262
+ 'tool' => 'multi_llm_review',
263
+ 'args' => '<original arguments>',
264
+ 'purpose' => purpose
265
+ }
266
+ end
267
+
268
+ # Streak persistence via PendingState.update_state (atomic read-modify-write).
269
+ def persist_streak(token, n)
270
+ PendingState.update_state(token) do |state|
271
+ next nil unless state
272
+ state['wait_still_pending_streak'] = n
273
+ state
274
+ end
275
+ rescue StandardError
276
+ # Best-effort: if the write fails, the orchestrator gets one extra retry,
277
+ # which is acceptable degradation.
278
+ end
279
+
280
+ def reset_streak(token)
281
+ PendingState.update_state(token) do |state|
282
+ next nil unless state
283
+ if state['wait_still_pending_streak'].to_i.positive?
284
+ state['wait_still_pending_streak'] = 0
285
+ state
286
+ else
287
+ nil
288
+ end
289
+ end
290
+ rescue StandardError
291
+ # Best-effort.
292
+ end
293
+
294
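+ # Evaluates the path-building block; if it raises, returns a path that
+ # cannot exist (/dev/null is not a directory) so File.exist? is false.
+ def safe_path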
295
+ yield
296
+ rescue StandardError
297
+ '/dev/null/never_exists'
298
+ end
299
+
300
+ def config_parallel
301
+ # Reads delegation.parallel from ../config/multi_llm_review.yml; {} on any error.
302
+ path = File.expand_path('../config/multi_llm_review.yml', __dir__)
303
+ return {} unless File.exist?(path)
304
+ cfg = YAML.safe_load_file(path, permitted_classes: [Symbol], aliases: true)
305
+ (cfg.dig('delegation', 'parallel') || {}).to_h
306
+ rescue StandardError
307
+ {}
308
+ end
309
+ end
310
+ end
311
+ end
312
+ end
313
+ end
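The streak guard in `MultiLlmReviewWait` works as a small state machine over `wait_still_pending_streak`. A poll timeout increments the streak and reports `still_pending`. `ready` or `crashed` resets it. Once the streak reaches the limit, the next call short-circuits to `crashed` with reason `wait_exhausted` and does not poll. The in-memory model below is a hypothetical sketch: `StreakModel` is not in the gem, and a plain Hash stands in for the on-disk state.json that the real tool updates through `PendingState.update_state`. The model therefore says nothing about concurrency.

```ruby
# Hypothetical in-memory model of the still_pending streak handling.
class StreakModel
  def initialize(limit: 3)
    @limit = limit
    @state = { 'wait_still_pending_streak' => 0 } # stands in for state.json
  end

  # outcome: :timeout, :ready, :crashed, or anything else (unexpected)
  def wait(outcome)
    streak = @state['wait_still_pending_streak']
    # Escalate before polling once the limit has been reached.
    return { 'status' => 'crashed', 'crashed_reason' => 'wait_exhausted' } if streak >= @limit

    case outcome
    when :timeout
      @state['wait_still_pending_streak'] = streak + 1
      { 'status' => 'still_pending', 'still_pending_streak' => streak + 1 }
    when :ready
      @state['wait_still_pending_streak'] = 0
      { 'status' => 'ready' }
    when :crashed
      @state['wait_still_pending_streak'] = 0
      { 'status' => 'crashed', 'crashed_reason' => 'crashed' }
    else
      # Unexpected outcomes map to crashed without touching the streak.
      { 'status' => 'crashed', 'crashed_reason' => "unknown_outcome:#{outcome}" }
    end
  end
end

model = StreakModel.new(limit: 3)
statuses = Array.new(4) { model.wait(:timeout)['status'] }
p statuses # prints ["still_pending", "still_pending", "still_pending", "crashed"]
```

Because escalation is checked before polling, the orchestrator still receives `limit` ordinary `still_pending` replies. Only the following call is converted into `wait_exhausted`.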
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: kairos-chain
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.23.1
4
+ version: 3.24.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Masaomi Hatakeyama
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2026-04-26 00:00:00.000000000 Z
11
+ date: 2026-04-27 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: minitest
@@ -497,6 +497,7 @@ files:
497
497
  - templates/skillsets/multi_llm_review/test/test_feedback_formatter.rb
498
498
  - templates/skillsets/multi_llm_review/test/test_multi_llm_review.rb
499
499
  - templates/skillsets/multi_llm_review/test/test_multi_llm_review_bundle.rb
500
+ - templates/skillsets/multi_llm_review/test/test_multi_llm_review_wait.rb
500
501
  - templates/skillsets/multi_llm_review/test/test_pending_state_v3.rb
501
502
  - templates/skillsets/multi_llm_review/test/test_pin_resolver.rb
502
503
  - templates/skillsets/multi_llm_review/test/test_sanitizer.rb
@@ -504,6 +505,7 @@ files:
504
505
  - templates/skillsets/multi_llm_review/tools/multi_llm_review.rb
505
506
  - templates/skillsets/multi_llm_review/tools/multi_llm_review_bundle.rb
506
507
  - templates/skillsets/multi_llm_review/tools/multi_llm_review_collect.rb
508
+ - templates/skillsets/multi_llm_review/tools/multi_llm_review_wait.rb
507
509
  - templates/skillsets/multiuser/config/multiuser.yml
508
510
  - templates/skillsets/multiuser/lib/multiuser.rb
509
511
  - templates/skillsets/multiuser/lib/multiuser/authorization_gate.rb