kairos-chain 3.23.1 → 3.24.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +71 -0
- data/lib/kairos_mcp/version.rb +1 -1
- data/templates/knowledge/multi_llm_review_workflow/multi_llm_review_workflow.md +106 -2
- data/templates/skillsets/multi_llm_review/config/multi_llm_review.yml +6 -1
- data/templates/skillsets/multi_llm_review/skillset.json +6 -4
- data/templates/skillsets/multi_llm_review/test/test_multi_llm_review.rb +155 -1
- data/templates/skillsets/multi_llm_review/test/test_multi_llm_review_wait.rb +249 -0
- data/templates/skillsets/multi_llm_review/tools/multi_llm_review.rb +38 -5
- data/templates/skillsets/multi_llm_review/tools/multi_llm_review_wait.rb +313 -0
- metadata +4 -2
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 20e2a223137f51dc61025e57dd6fd205a8f702ef923b5b0b2e0d464a308d279f
|
|
4
|
+
data.tar.gz: 51627cb487cf5fc2e46b8e6055bf36e0cf5c8839f15cb2c2236f2ff56efa2def
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 9fd4b17a28bdc06b7b19195e7274ef41e4ba7a93fc70e2c3a38083c58eae85ea8dfd432cb7cc6219cf7ba49ba637dbed12c48ea12312f4a3f26f46dfb936438f
|
|
7
|
+
data.tar.gz: fadb35fdbf47eeebfc9b667685452a0c222ecb396dbb2031de08ac27b2df1de1a3985fc41c03e4e767d22a58ec98a77d52c2059e921f8084d1663a2a099bdd1d
|
data/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,77 @@ All notable changes to the `kairos-chain` gem will be documented in this file.
|
|
|
4
4
|
|
|
5
5
|
This project follows [Semantic Versioning](https://semver.org/).
|
|
6
6
|
|
|
7
|
+
## [3.24.0] - 2026-04-27
|
|
8
|
+
|
|
9
|
+
### Added
|
|
10
|
+
|
|
11
|
+
- **multi_llm_review_wait MCP tool** (Phase 1.5) — optional blocking gate
|
|
12
|
+
between `multi_llm_review` (Phase 1) and `multi_llm_review_collect`
|
|
13
|
+
(Phase 2). Wraps the existing `WaitForWorker.wait` polling loop and
|
|
14
|
+
exposes 6 distinct status codes (`ready`, `still_pending`, `crashed`,
|
|
15
|
+
`unknown_token`, `already_collected`, `past_collect_deadline`) each with
|
|
16
|
+
a `next_action` recovery hint pointing at the right next tool.
|
|
17
|
+
- **`next_action` hint on `multi_llm_review` delegation_pending response** —
|
|
18
|
+
structured `{tool, args, purpose}` field nudging the orchestrator to call
|
|
19
|
+
`multi_llm_review_wait` after persona Agent dispatch. MCP does not enforce
|
|
20
|
+
ordering; this is a hint, not a constraint, but in practice LLMs follow
|
|
21
|
+
it reliably.
|
|
22
|
+
- **Path A vs Path B disambiguation in workflow knowledge doc** — surfaces
|
|
23
|
+
the long-implicit distinction between the host-tracked Bash workflow
|
|
24
|
+
(Claude Code's `Bash(background)` pattern, statusbar shows `XX shells`)
|
|
25
|
+
and the MCP-managed SkillSet (detached worker, no host-side tracking,
|
|
26
|
+
polling required).
|
|
27
|
+
- New config keys under `delegation.parallel`:
|
|
28
|
+
`wait_poll_interval_seconds: 1.0`, `wait_max_default_seconds: 600`,
|
|
29
|
+
`wait_max_hard_cap_seconds: 1800`, `wait_still_pending_streak_limit: 3`.
|
|
30
|
+
- Streak guard: 3 consecutive `still_pending` returns escalate to
|
|
31
|
+
`crashed/wait_exhausted` so a wedged worker cannot trap the orchestrator
|
|
32
|
+
in an infinite wait loop.
|
|
33
|
+
- 14 new tests in `test_multi_llm_review_wait.rb` covering all status
|
|
34
|
+
paths, streak persistence/reset, hard cap clamping, deadline-remaining
|
|
35
|
+
clamping, and backward compatibility (collect still works without wait).
|
|
36
|
+
|
|
37
|
+
### Changed
|
|
38
|
+
|
|
39
|
+
- `multi_llm_review` SkillSet version 0.4.0 → 0.5.0.
|
|
40
|
+
- `delegation` instruction text now mentions wait → collect chain.
|
|
41
|
+
|
|
42
|
+
### Notes
|
|
43
|
+
|
|
44
|
+
- Backward compatible: callers that skip wait and call collect directly
|
|
45
|
+
still work via collect's existing internal polling.
|
|
46
|
+
- Design review (Codex GPT-5.5 + Cursor Composer-2 + Claude Team Opus 4.7)
|
|
47
|
+
produced 3/3 REVISE with 6-7 P1 issues; revisions R1-R14 captured in
|
|
48
|
+
handoff L2 `multi_llm_review_wait_tool_handoff` before implementation.
|
|
49
|
+
|
|
50
|
+
## [3.23.3] - 2026-04-27
|
|
51
|
+
|
|
52
|
+
### Documentation
|
|
53
|
+
|
|
54
|
+
- **multi_llm_review_workflow knowledge** — Added "Async/Parallel Collect
|
|
55
|
+
Timing — Iron Rule" subsection. Documents the workflow constraint that the
|
|
56
|
+
orchestrator must call `multi_llm_review_collect` immediately after persona
|
|
57
|
+
Agent reviews complete, without intervening user dialogue. Explains the
|
|
58
|
+
underlying mechanics (LLM is not event-driven; collect already polls
|
|
59
|
+
internally at 0.5s intervals; token expiry vs subprocess completion). Adds
|
|
60
|
+
recommended flow, anti-pattern, and manual recovery instructions.
|
|
61
|
+
- Updated stale `must_collect_by` default reference (600s → 1800s).
|
|
62
|
+
|
|
63
|
+
## [3.23.2] - 2026-04-26
|
|
64
|
+
|
|
65
|
+
### Fixed
|
|
66
|
+
|
|
67
|
+
- **multi_llm_review collect_deadline bug** — `timeout_seconds_override` no longer
|
|
68
|
+
leaves the orchestrator's submission window shorter than the worker lifespan.
|
|
69
|
+
In the async/parallel path, `collect_deadline` is now auto-extended to cover
|
|
70
|
+
`worker self_timeout + poll margin` so raising `timeout_seconds_override`
|
|
71
|
+
alone keeps the token alive while the worker is healthy.
|
|
72
|
+
- New `collect_deadline_seconds_override` argument on `multi_llm_review` for
|
|
73
|
+
explicit control of the orchestrator's submission window.
|
|
74
|
+
- Default `delegation.collect_deadline_seconds` raised from `600` (10 min) to
|
|
75
|
+
`1800` (30 min) to better fit interactive runs where user dialogue intervenes
|
|
76
|
+
between Phase 1 and `multi_llm_review_collect`.
|
|
77
|
+
|
|
7
78
|
## [3.17.0] - 2026-04-22
|
|
8
79
|
|
|
9
80
|
### Added
|
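The 3.24.0 changelog entry above lists six `multi_llm_review_wait` status codes, each paired with a `next_action` hint. As a reading aid, here is a minimal Ruby sketch of how an orchestrator-side helper might route on them. The status-to-tool pairings mirror the `next_action['tool']` values asserted in `test_multi_llm_review_wait.rb`; the names `NEXT_TOOL_BY_STATUS` and `next_tool_for` are hypothetical and are not part of the gem.

```ruby
# Hypothetical routing table. The pairings follow the next_action['tool']
# values asserted in test_multi_llm_review_wait.rb, not the gem's own code.
NEXT_TOOL_BY_STATUS = {
  'ready'                 => 'multi_llm_review_collect',
  'already_collected'     => 'multi_llm_review_collect',
  'still_pending'         => 'multi_llm_review_wait',
  'crashed'               => 'multi_llm_review',
  'unknown_token'         => 'multi_llm_review',
  'past_collect_deadline' => 'multi_llm_review'
}.freeze

# Prefer the server-provided hint; fall back to the table when it is absent.
# An unrecognised status raises KeyError rather than guessing a tool.
def next_tool_for(payload)
  hinted = payload.dig('next_action', 'tool')
  hinted || NEXT_TOOL_BY_STATUS.fetch(payload['status'])
end
```

Because the changelog describes `next_action` as a hint rather than a constraint, the sketch treats the server's value as authoritative and uses the table only as a fallback.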
data/lib/kairos_mcp/version.rb
CHANGED
data/templates/knowledge/multi_llm_review_workflow/multi_llm_review_workflow.md
CHANGED
|
@@ -29,6 +29,54 @@ This skill covers:
|
|
|
29
29
|
For **WHO** (which LLM is good at what), see: `multi_llm_reviewer_evaluation`
|
|
30
30
|
For **development lifecycle** (design → implement → verify), see: `design_to_implementation_workflow`
|
|
31
31
|
|
|
32
|
+
## Two Execution Paths (read this first)
|
|
33
|
+
|
|
34
|
+
There are **two distinct execution paths** with the same name "multi-LLM review".
|
|
35
|
+
They differ in subprocess lifecycle ownership and completion-detection mechanics.
|
|
36
|
+
Pick the right one for your environment:
|
|
37
|
+
|
|
38
|
+
### Path A — Host-tracked (Bash workflow)
|
|
39
|
+
|
|
40
|
+
- **Trigger**: orchestrator (LLM) calls Claude Code's `Bash` tool with
|
|
41
|
+
`run_in_background: true` to spawn `claude -p`, `codex exec`, `agent -p` directly.
|
|
42
|
+
- **Process parent**: Claude Code (the host harness).
|
|
43
|
+
- **Completion detection**: **event-driven**. Claude Code's shell tracker monitors
|
|
44
|
+
the spawned shells; when they exit, the LLM is notified through the standard
|
|
45
|
+
tool-result mechanism. Statusbar shows `XX shells` while reviewers are running.
|
|
46
|
+
- **When to use**: interactive Claude Code sessions for one-off Tier 3 reviews.
|
|
47
|
+
- **Reference**: see "Orchestration Template" section below for the canonical
|
|
48
|
+
`Bash(background)` pattern.
|
|
49
|
+
|
|
50
|
+
### Path B — MCP-managed (multi_llm_review SkillSet)
|
|
51
|
+
|
|
52
|
+
- **Trigger**: orchestrator calls the MCP tool `multi_llm_review`.
|
|
53
|
+
- **Process parent**: the kairos-chain Ruby gem (MCP server). The gem forks a
|
|
54
|
+
detached worker (`bin/dispatch_worker.rb`) which calls `Process.setsid` and
|
|
55
|
+
spawns CLI reviewers as a separate session leader.
|
|
56
|
+
- **Completion detection**: **polling required**. Claude Code is not the parent,
|
|
57
|
+
so the spawned subprocesses do NOT appear in the `XX shells` statusbar count.
|
|
58
|
+
The orchestrator must call `multi_llm_review_collect` (and optionally
|
|
59
|
+
`multi_llm_review_wait` first) to observe completion.
|
|
60
|
+
- **When to use**: portable execution (other MCP hosts, autonomous Agent SkillSet),
|
|
61
|
+
or any case where you want the consensus computation done server-side.
|
|
62
|
+
- **Recommended chain (3-step)**: `multi_llm_review` → `multi_llm_review_wait` →
|
|
63
|
+
`multi_llm_review_collect`. Each Phase-1/1.5 response carries a `next_action`
|
|
64
|
+
hint pointing at the next tool. wait is optional but recommended — without it,
|
|
65
|
+
collect's internal polling still covers worker completion, but recovery hints
|
|
66
|
+
for `still_pending`, `crashed`, and `past_collect_deadline` are less explicit.
|
|
67
|
+
- **Reference**: see "Orchestrator Delegation Protocol" + "Async/Parallel Collect
|
|
68
|
+
Timing — Iron Rule" sections below.
|
|
69
|
+
|
|
70
|
+
### Quick selector
|
|
71
|
+
|
|
72
|
+
| Question | Answer |
|
|
73
|
+
|----------|--------|
|
|
74
|
+
| Are you in an interactive Claude Code session and just need one review? | **Path A** |
|
|
75
|
+
| Do you need this to work in Cursor / autonomous mode / other MCP host? | **Path B** |
|
|
76
|
+
| Do you want the consensus result inside the MCP tool response? | **Path B** |
|
|
77
|
+
| Did you observe `XX shells` in the statusbar last time it worked? | That was Path A |
|
|
78
|
+
| Did the run produce a `collect_token` and a `pending/<token>/` directory? | That was Path B |
|
|
79
|
+
|
|
32
80
|
## Roles
|
|
33
81
|
|
|
34
82
|
| Role | Who | Responsibility |
|
|
@@ -331,8 +379,8 @@ cross-model subprocess reviewers give epistemic diversity. The two are complemen
|
|
|
331
379
|
|
|
332
380
|
**Failure modes**:
|
|
333
381
|
- `expired_or_unknown_token`: orchestrator missed `must_collect_by` deadline
|
|
334
|
-
(default 600s), or token never existed. The pending
|
|
335
|
-
`multi_llm_review` again from scratch.
|
|
382
|
+
(default 1800s since v3.23.2; was 600s), or token never existed. The pending
|
|
383
|
+
review is gone; call `multi_llm_review` again from scratch.
|
|
336
384
|
- `error: invalid orchestrator_reviews`: persona count outside 2-4 or missing
|
|
337
385
|
required fields. Fix and retry collect with the same token.
|
|
338
386
|
- All-subprocess-failed at Call 1: returns error immediately; no token issued.
|
|
@@ -340,6 +388,62 @@ cross-model subprocess reviewers give epistemic diversity. The two are complemen
|
|
|
340
388
|
**Default**: `orchestrator_strategy` defaults to `"exclude"` (back-compat). Use
|
|
341
389
|
`"delegate"` explicitly until validated by use.
|
|
342
390
|
|
|
391
|
+
#### Async/Parallel Collect Timing — Iron Rule
|
|
392
|
+
|
|
393
|
+
When `delegation.parallel.default: true` (the v3.x default), Call 1 returns
|
|
394
|
+
`delegation_pending` **immediately** (~50ms) and a detached worker runs the
|
|
395
|
+
subprocess reviewers in parallel with the orchestrator's persona Agent
|
|
396
|
+
reviews. This is faster, but introduces a timing trap:
|
|
397
|
+
|
|
398
|
+
> **The orchestrator MUST call `multi_llm_review_collect` immediately after
|
|
399
|
+
> the persona Agent reviews complete — without intervening user dialogue,
|
|
400
|
+
> unrelated tool calls, or context switches.**
|
|
401
|
+
|
|
402
|
+
Why this matters:
|
|
403
|
+
|
|
404
|
+
- The LLM is **not event-driven**. When the worker finishes writing
|
|
405
|
+
`subprocess_status: "done"` to `state.json`, nothing wakes the orchestrator.
|
|
406
|
+
The orchestrator only notices when it next calls `multi_llm_review_collect`.
|
|
407
|
+
- `multi_llm_review_collect` already polls internally at
|
|
408
|
+
`poll_interval_seconds: 0.5` for up to `collect_max_wait_seconds: 420` (7min)
|
|
409
|
+
per call. Polling is not the bottleneck — the bottleneck is the orchestrator
|
|
410
|
+
forgetting to call collect at all.
|
|
411
|
+
- The token expires at `collect_deadline` (default 30min since v3.23.2). If
|
|
412
|
+
user dialogue or other work intervenes between persona Agent completion and
|
|
413
|
+
the collect call, the token can expire while the subprocess results sit
|
|
414
|
+
ready and unread on disk.
|
|
415
|
+
|
|
416
|
+
Recommended orchestrator flow (single LLM turn, no detours):
|
|
417
|
+
|
|
418
|
+
```
|
|
419
|
+
1. multi_llm_review(...) → receive delegation_pending + collect_token
|
|
420
|
+
2. Spawn persona Agent reviews (Agent tool, parallel, 2-4 personas)
|
|
421
|
+
3. As soon as ALL personas return → multi_llm_review_collect(collect_token, ...)
|
|
422
|
+
4. Return final consensus to user
|
|
423
|
+
```
|
|
424
|
+
|
|
425
|
+
Anti-pattern (do NOT do this):
|
|
426
|
+
|
|
427
|
+
```
|
|
428
|
+
1. multi_llm_review(...) → delegation_pending
|
|
429
|
+
2. Run persona Agent reviews
|
|
430
|
+
3. ❌ "By the way, while we wait, let me explain X to the user…"
|
|
431
|
+
4. ❌ User asks an unrelated question, conversation drifts
|
|
432
|
+
5. ❌ 30+ minutes later, finally try collect → expired_or_unknown_token
|
|
433
|
+
```
|
|
434
|
+
|
|
435
|
+
If the orchestrator is genuinely interrupted (user explicitly switches topic,
|
|
436
|
+
or persona Agent itself takes a long time and the orchestrator wants to
|
|
437
|
+
report progress), it should still **call collect first** — collect returns
|
|
438
|
+
quickly if the worker is already done, or blocks up to 7min if not. Either
|
|
439
|
+
way, the token stays alive and consensus is captured before resuming side
|
|
440
|
+
work.
|
|
441
|
+
|
|
442
|
+
Manual recovery if expiry happens: subprocess results are persisted at
|
|
443
|
+
`.kairos/multi_llm_review/pending/<token>/subprocess_results.json` and remain
|
|
444
|
+
readable until GC. Read them directly and synthesize manually, then re-run
|
|
445
|
+
`multi_llm_review` for fresh results if needed.
|
|
446
|
+
|
|
343
447
|
### Critical CLI Notes
|
|
344
448
|
|
|
345
449
|
- **Cursor Agent stdin**: `cat file | agent -p -` does NOT work. Use file-reference:
|
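The recommended Path B chain in the workflow document above (`multi_llm_review` → `multi_llm_review_wait` → `multi_llm_review_collect`) can be sketched as a small driver loop. This is an illustration under stated assumptions, not the gem's implementation: `client` is a stand-in for any MCP client object that responds to `call(tool_name, args)` and returns a parsed Hash, and `run_review_chain`, `max_rounds`, and the `gave_up` status are hypothetical. The argument names `collect_token` and `orchestrator_reviews` are taken from the tests in this diff.

```ruby
# Sketch of the Path B chain: review -> wait -> collect.
# `client` is an assumed stand-in for a real MCP client; it must respond to
# call(tool_name, args) and return a parsed Hash.
def run_review_chain(client, review_args, orchestrator_reviews, max_rounds: 5)
  pending = client.call('multi_llm_review', review_args)
  return pending unless pending['status'] == 'delegation_pending'

  token = pending['collect_token']
  max_rounds.times do
    waited = client.call('multi_llm_review_wait', { 'collect_token' => token })
    case waited['status']
    when 'ready', 'already_collected'
      # Collect immediately, with no detours, as the Iron Rule requires.
      return client.call('multi_llm_review_collect',
                         { 'collect_token' => token,
                           'orchestrator_reviews' => orchestrator_reviews })
    when 'still_pending'
      next # worker is healthy but slow; wait again
    else
      return waited # crashed, unknown_token, past_collect_deadline
    end
  end
  # Hypothetical client-side give-up marker, not a status the gem returns.
  { 'status' => 'gave_up', 'collect_token' => token }
end
```

In a real session the persona Agent reviews would run between the first call and the wait loop; the sketch takes their output as an argument to keep the control flow visible.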
data/templates/skillsets/multi_llm_review/config/multi_llm_review.yml
CHANGED
|
@@ -39,7 +39,7 @@ convergence_rule_after_exclusion: "3/4 APPROVE"
|
|
|
39
39
|
# Phase 2 (multi_llm_review_collect) receives the orchestrator's persona
|
|
40
40
|
# team review and computes final consensus.
|
|
41
41
|
delegation:
|
|
42
|
-
collect_deadline_seconds:
|
|
42
|
+
collect_deadline_seconds: 1800 # how long the orchestrator has to call collect (30min — interactive runs often have user dialogue between Phase 1 and collect)
|
|
43
43
|
retain_collected_seconds: 3600 # how long collected results stay for idempotent replay
|
|
44
44
|
# v0.3.0 parallel subprocess worker (Phase 11.5). When default:true, Phase 1
|
|
45
45
|
# returns a delegation_pending token immediately and a detached OS worker
|
|
@@ -66,6 +66,11 @@ delegation:
|
|
|
66
66
|
worker_self_timeout_floor_seconds: 60
|
|
67
67
|
main_call_max_timeout_seconds: 300
|
|
68
68
|
main_call_timeout_margin_seconds: 60
|
|
69
|
+
# multi_llm_review_wait tool (Phase 1.5) — see tools/multi_llm_review_wait.rb
|
|
70
|
+
wait_poll_interval_seconds: 1.0 # wait tool polling cadence (separate from collect's 0.5s)
|
|
71
|
+
wait_max_default_seconds: 600 # default per-call blocking ceiling
|
|
72
|
+
wait_max_hard_cap_seconds: 1800 # per-call hard cap (clamps max_wait_seconds arg)
|
|
73
|
+
wait_still_pending_streak_limit: 3 # consecutive still_pending returns before crashed/wait_exhausted
|
|
69
74
|
|
|
70
75
|
# Dispatch settings
|
|
71
76
|
timeout_seconds: 300 # global deadline for all reviewers
|
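The four `wait_*` keys added to the config above bound how long one `multi_llm_review_wait` call may block. A minimal Ruby sketch of that clamping, assuming the order default → hard cap → deadline remaining that the changelog and comments describe; `WAIT_DEFAULTS`, `effective_wait_seconds`, and `streak_exhausted?` are hypothetical names, and the `>=` comparison for the streak limit is an assumption consistent with `test_streak_at_limit_escalates_to_crashed`.

```ruby
# Hypothetical helper, not the gem's code. Values copied from the
# delegation.parallel keys shown in multi_llm_review.yml.
WAIT_DEFAULTS = {
  'wait_poll_interval_seconds'      => 1.0,
  'wait_max_default_seconds'        => 600,
  'wait_max_hard_cap_seconds'       => 1800,
  'wait_still_pending_streak_limit' => 3
}.freeze

# Blocking budget for one wait call: the requested value (or the default),
# clamped by the hard cap and by the time left before collect_deadline.
def effective_wait_seconds(requested, deadline_remaining, cfg = WAIT_DEFAULTS)
  wanted = requested || cfg['wait_max_default_seconds']
  capped = [wanted, cfg['wait_max_hard_cap_seconds']].min
  [[capped, deadline_remaining].min, 0].max
end

# True once the persisted still_pending streak has reached the limit
# (assumed >= comparison), at which point wait reports crashed/wait_exhausted.
def streak_exhausted?(streak, cfg = WAIT_DEFAULTS)
  streak >= cfg['wait_still_pending_streak_limit']
end
```

For example, a caller asking for 999_999 seconds with 2 seconds left before the deadline would get a 2-second budget under this sketch.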
data/templates/skillsets/multi_llm_review/skillset.json
CHANGED
|
@@ -1,19 +1,21 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "multi_llm_review",
|
|
3
|
-
"version": "0.
|
|
4
|
-
"description": "Parallel multi-LLM review orchestration. Dispatches review prompts to N LLM backends via llm_client, collects verdicts, and computes consensus. v0.4.0 (Phase 12):
|
|
3
|
+
"version": "0.5.0",
|
|
4
|
+
"description": "Parallel multi-LLM review orchestration. Dispatches review prompts to N LLM backends via llm_client, collects verdicts, and computes consensus. v0.5.0: adds multi_llm_review_wait (Phase 1.5) for explicit subprocess completion gating with next_action recovery hints, and Path A/B doc disambiguation. v0.4.0 (Phase 12): feedback_text + schema_version, sanitization contract for prompt-injection defense, and multi_llm_review_bundle tool for human-handoff paths without dispatch.",
|
|
5
5
|
"author": "Masaomi Hatakeyama",
|
|
6
6
|
"layer": "L1",
|
|
7
7
|
"depends_on": ["llm_client"],
|
|
8
8
|
"provides": [
|
|
9
9
|
"multi_llm_review_orchestration",
|
|
10
10
|
"review_consensus",
|
|
11
|
-
"review_bundle_human_handoff"
|
|
11
|
+
"review_bundle_human_handoff",
|
|
12
|
+
"review_wait_gate"
|
|
12
13
|
],
|
|
13
14
|
"tool_classes": [
|
|
14
15
|
"KairosMcp::SkillSets::MultiLlmReview::Tools::MultiLlmReview",
|
|
15
16
|
"KairosMcp::SkillSets::MultiLlmReview::Tools::MultiLlmReviewCollect",
|
|
16
|
-
"KairosMcp::SkillSets::MultiLlmReview::Tools::MultiLlmReviewBundle"
|
|
17
|
+
"KairosMcp::SkillSets::MultiLlmReview::Tools::MultiLlmReviewBundle",
|
|
18
|
+
"KairosMcp::SkillSets::MultiLlmReview::Tools::MultiLlmReviewWait"
|
|
17
19
|
],
|
|
18
20
|
"config_files": ["config/multi_llm_review.yml"],
|
|
19
21
|
"knowledge_dirs": [],
|
data/templates/skillsets/multi_llm_review/test/test_multi_llm_review.rb
CHANGED
|
@@ -861,6 +861,19 @@ module KairosMcp
|
|
|
861
861
|
FileUtils.rm_rf(@tmp)
|
|
862
862
|
end
|
|
863
863
|
|
|
864
|
+
# Replace WorkerSpawner.spawn with a no-op for the duration of the block.
|
|
865
|
+
# Avoids actually forking a detached worker process during async tests.
|
|
866
|
+
def with_stubbed_worker_spawner
|
|
867
|
+
singleton = WorkerSpawner.singleton_class
|
|
868
|
+
original = WorkerSpawner.method(:spawn)
|
|
869
|
+
singleton.send(:define_method, :spawn) { |**_kwargs| true }
|
|
870
|
+
begin
|
|
871
|
+
yield
|
|
872
|
+
ensure
|
|
873
|
+
singleton.send(:define_method, :spawn, original)
|
|
874
|
+
end
|
|
875
|
+
end
|
|
876
|
+
|
|
864
877
|
def test_partition_for_strategy_delegate_drops_match
|
|
865
878
|
reviewers = [
|
|
866
879
|
{ provider: 'claude_code', model: 'claude-opus-4-7', role_label: 'r47' },
|
|
@@ -1024,9 +1037,150 @@ module KairosMcp
|
|
|
1024
1037
|
)
|
|
1025
1038
|
payload = JSON.parse(result.first[:text])
|
|
1026
1039
|
deadline = Time.iso8601(payload['must_collect_by'])
|
|
1027
|
-
# Should be ~60s from now, not the default
|
|
1040
|
+
# Should be ~60s from now, not the default 1800s
|
|
1028
1041
|
assert_in_delta 60, deadline - Time.now, 5
|
|
1029
1042
|
end
|
|
1043
|
+
|
|
1044
|
+
# Bug #1 fix: collect_deadline_seconds_override must extend the sync
|
|
1045
|
+
# delegate_response deadline beyond the config default.
|
|
1046
|
+
def test_delegate_sync_respects_collect_deadline_override
|
|
1047
|
+
subprocess_results = [
|
|
1048
|
+
{ role_label: 'codex', provider: 'codex', model: 'm',
|
|
1049
|
+
raw_text: 'APPROVE', elapsed_seconds: 1, error: nil, status: :success }
|
|
1050
|
+
]
|
|
1051
|
+
result = @tool.send(:delegate_response,
|
|
1052
|
+
raw_results: subprocess_results,
|
|
1053
|
+
arguments: {
|
|
1054
|
+
'review_type' => 'design', 'artifact_name' => 'x',
|
|
1055
|
+
'collect_deadline_seconds_override' => 3000
|
|
1056
|
+
},
|
|
1057
|
+
config: { 'delegation' => { 'collect_deadline_seconds' => 60 } },
|
|
1058
|
+
orchestrator_model: 'claude-opus-4-7',
|
|
1059
|
+
convergence_rule: '3/4 APPROVE',
|
|
1060
|
+
min_quorum: 2,
|
|
1061
|
+
review_round: 1,
|
|
1062
|
+
complexity: 'high'
|
|
1063
|
+
)
|
|
1064
|
+
payload = JSON.parse(result.first[:text])
|
|
1065
|
+
deadline = Time.iso8601(payload['must_collect_by'])
|
|
1066
|
+
# Override (3000s) wins over config (60s)
|
|
1067
|
+
assert_in_delta 3000, deadline - Time.now, 5
|
|
1068
|
+
end
|
|
1069
|
+
|
|
1070
|
+
# Bug #3 fix: when no override and no config, default is now 1800s (was 600s).
|
|
1071
|
+
def test_delegate_sync_default_deadline_is_1800
|
|
1072
|
+
subprocess_results = [
|
|
1073
|
+
{ role_label: 'codex', provider: 'codex', model: 'm',
|
|
1074
|
+
raw_text: 'APPROVE', elapsed_seconds: 1, error: nil, status: :success }
|
|
1075
|
+
]
|
|
1076
|
+
result = @tool.send(:delegate_response,
|
|
1077
|
+
raw_results: subprocess_results,
|
|
1078
|
+
arguments: { 'review_type' => 'design', 'artifact_name' => 'x' },
|
|
1079
|
+
config: {},
|
|
1080
|
+
orchestrator_model: 'claude-opus-4-7',
|
|
1081
|
+
convergence_rule: '3/4 APPROVE',
|
|
1082
|
+
min_quorum: 2,
|
|
1083
|
+
review_round: 1,
|
|
1084
|
+
complexity: 'high'
|
|
1085
|
+
)
|
|
1086
|
+
payload = JSON.parse(result.first[:text])
|
|
1087
|
+
deadline = Time.iso8601(payload['must_collect_by'])
|
|
1088
|
+
assert_in_delta 1800, deadline - Time.now, 5
|
|
1089
|
+
end
|
|
1090
|
+
|
|
1091
|
+
# Bug #1 fix (async): when timeout_seconds_override raises the worker
|
|
1092
|
+
# self_timeout above the configured collect_deadline, the deadline must
|
|
1093
|
+
# auto-extend to cover the worker lifespan + poll margin. Otherwise the
|
|
1094
|
+
# token expires while the worker is still healthy.
|
|
1095
|
+
def test_delegate_async_auto_extends_deadline_to_worker_lifespan
|
|
1096
|
+
reviewers = [{ provider: 'codex', model: 'codex-default', role_label: 'codex' }]
|
|
1097
|
+
arguments = {
|
|
1098
|
+
'review_type' => 'design',
|
|
1099
|
+
'artifact_name' => 'x',
|
|
1100
|
+
'timeout_seconds_override' => 1500
|
|
1101
|
+
}
|
|
1102
|
+
config = {
|
|
1103
|
+
'delegation' => {
|
|
1104
|
+
'collect_deadline_seconds' => 600,
|
|
1105
|
+
'parallel' => {
|
|
1106
|
+
'worker_self_timeout_multiplier' => 1.5,
|
|
1107
|
+
'worker_self_timeout_floor_seconds' => 60,
|
|
1108
|
+
'poll_interval_seconds' => 0.5
|
|
1109
|
+
}
|
|
1110
|
+
}
|
|
1111
|
+
}
|
|
1112
|
+
parallel_cfg = config.dig('delegation', 'parallel')
|
|
1113
|
+
|
|
1114
|
+
result = nil
|
|
1115
|
+
with_stubbed_worker_spawner do
|
|
1116
|
+
result = @tool.send(:delegate_response_async,
|
|
1117
|
+
reviewers: reviewers,
|
|
1118
|
+
messages: [{ 'role' => 'user', 'content' => 'x' }],
|
|
1119
|
+
system_prompt: 'sys',
|
|
1120
|
+
arguments: arguments,
|
|
1121
|
+
config: config,
|
|
1122
|
+
orchestrator_model: 'claude-opus-4-7',
|
|
1123
|
+
convergence_rule: '3/4 APPROVE',
|
|
1124
|
+
min_quorum: 2,
|
|
1125
|
+
review_round: 1,
|
|
1126
|
+
complexity: 'high',
|
|
1127
|
+
review_context: 'independent',
|
|
1128
|
+
max_concurrent: 2,
|
|
1129
|
+
timeout_secs: 1500,
|
|
1130
|
+
parallel_cfg: parallel_cfg
|
|
1131
|
+
)
|
|
1132
|
+
end
|
|
1133
|
+
payload = JSON.parse(result.first[:text])
|
|
1134
|
+
assert_equal 'delegation_pending', payload['status']
|
|
1135
|
+
deadline = Time.iso8601(payload['must_collect_by'])
|
|
1136
|
+
# worker_lifespan = 1500*1.5 + 60 = 2310; +10s poll margin = 2320
|
|
1137
|
+
# Deadline must be at least worker_lifespan + margin, NOT 600
|
|
1138
|
+
assert_operator deadline - Time.now, :>=, 2320 - 5
|
|
1139
|
+
end
|
|
1140
|
+
|
|
1141
|
+
# Async: explicit collect_deadline_seconds_override above the auto-min wins.
|
|
1142
|
+
def test_delegate_async_respects_explicit_override
|
|
1143
|
+
reviewers = [{ provider: 'codex', model: 'codex-default', role_label: 'codex' }]
|
|
1144
|
+
arguments = {
|
|
1145
|
+
'review_type' => 'design',
|
|
1146
|
+
'artifact_name' => 'x',
|
|
1147
|
+
'collect_deadline_seconds_override' => 5000
|
|
1148
|
+
}
|
|
1149
|
+
config = {
|
|
1150
|
+
'delegation' => {
|
|
1151
|
+
'collect_deadline_seconds' => 600,
|
|
1152
|
+
'parallel' => {
|
|
1153
|
+
'worker_self_timeout_multiplier' => 1.5,
|
|
1154
|
+
'worker_self_timeout_floor_seconds' => 60,
|
|
1155
|
+
'poll_interval_seconds' => 0.5
|
|
1156
|
+
}
|
|
1157
|
+
}
|
|
1158
|
+
}
|
|
1159
|
+
parallel_cfg = config.dig('delegation', 'parallel')
|
|
1160
|
+
|
|
1161
|
+
result = nil
|
|
1162
|
+
with_stubbed_worker_spawner do
|
|
1163
|
+
result = @tool.send(:delegate_response_async,
|
|
1164
|
+
reviewers: reviewers,
|
|
1165
|
+
messages: [{ 'role' => 'user', 'content' => 'x' }],
|
|
1166
|
+
system_prompt: 'sys',
|
|
1167
|
+
arguments: arguments,
|
|
1168
|
+
config: config,
|
|
1169
|
+
orchestrator_model: 'claude-opus-4-7',
|
|
1170
|
+
convergence_rule: '3/4 APPROVE',
|
|
1171
|
+
min_quorum: 2,
|
|
1172
|
+
review_round: 1,
|
|
1173
|
+
complexity: 'high',
|
|
1174
|
+
review_context: 'independent',
|
|
1175
|
+
max_concurrent: 2,
|
|
1176
|
+
timeout_secs: 300,
|
|
1177
|
+
parallel_cfg: parallel_cfg
|
|
1178
|
+
)
|
|
1179
|
+
end
|
|
1180
|
+
payload = JSON.parse(result.first[:text])
|
|
1181
|
+
deadline = Time.iso8601(payload['must_collect_by'])
|
|
1182
|
+
assert_in_delta 5000, deadline - Time.now, 5
|
|
1183
|
+
end
|
|
1030
1184
|
end
|
|
1031
1185
|
|
|
1032
1186
|
class TestCollectTool < Minitest::Test
|
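The async tests above check that `must_collect_by` is extended to cover the worker lifespan. The arithmetic in the test comment (`1500*1.5 + 60 = 2310`, plus a 10s poll margin) can be written out as a short Ruby sketch. The method name `collect_deadline_seconds`, the keyword names, and the choice to take the larger of the override and the auto-minimum are assumptions; the tests only show that an override above the auto-minimum wins, and do not show what happens when it is below.

```ruby
# Sketch of the async deadline arithmetic, following the test comment
# "worker_lifespan = 1500*1.5 + 60 = 2310; +10s poll margin = 2320".
# Names, defaults, and the max() tie-break are assumptions, not the gem's code.
def collect_deadline_seconds(timeout_secs:, configured:, override: nil,
                             multiplier: 1.5, floor: 60, poll_margin: 10)
  worker_lifespan = timeout_secs * multiplier + floor
  auto_minimum = worker_lifespan + poll_margin
  [override || configured, auto_minimum].max
end
```

With `timeout_secs: 1500` and `configured: 600` the sketch yields 2320 seconds, matching the lower bound asserted in `test_delegate_async_auto_extends_deadline_to_worker_lifespan`; with `timeout_secs: 300` and `override: 5000` it yields 5000, matching `test_delegate_async_respects_explicit_override`.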
data/templates/skillsets/multi_llm_review/test/test_multi_llm_review_wait.rb
ADDED
|
@@ -0,0 +1,249 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require 'minitest/autorun'
|
|
4
|
+
require 'json'
|
|
5
|
+
require 'tmpdir'
|
|
6
|
+
require 'fileutils'
|
|
7
|
+
require 'time'
|
|
8
|
+
|
|
9
|
+
# Stub BaseTool so we can load the tool file in isolation.
|
|
10
|
+
module KairosMcp
|
|
11
|
+
module Tools
|
|
12
|
+
class BaseTool
|
|
13
|
+
def text_content(s); [{ text: s }]; end
|
|
14
|
+
end
|
|
15
|
+
end unless defined?(KairosMcp::Tools::BaseTool)
|
|
16
|
+
end
|
|
17
|
+
|
|
18
|
+
require_relative '../lib/multi_llm_review/pending_state'
|
|
19
|
+
require_relative '../lib/multi_llm_review/wait_for_worker'
|
|
20
|
+
require_relative '../tools/multi_llm_review_wait'
|
|
21
|
+
|
|
22
|
+
module KairosMcp
|
|
23
|
+
module SkillSets
|
|
24
|
+
module MultiLlmReview
|
|
25
|
+
class TestMultiLlmReviewWait < Minitest::Test
|
|
26
|
+
def setup
|
|
27
|
+
@tmp = Dir.mktmpdir('mlr-wait-')
|
|
28
|
+
@orig_cwd = Dir.pwd
|
|
29
|
+
Dir.chdir(@tmp)
|
|
30
|
+
@tool = Tools::MultiLlmReviewWait.new
|
|
31
|
+
@token = '11111111-2222-4333-8444-555555555555'
|
|
32
|
+
end
|
|
33
|
+
|
|
34
|
+
def teardown
|
|
35
|
+
Dir.chdir(@orig_cwd)
|
|
36
|
+
FileUtils.rm_rf(@tmp)
|
|
37
|
+
end
|
|
38
|
+
|
|
39
|
+
def write_state(extra = {})
|
|
40
|
+
PendingState.create_token_dir!(@token)
|
|
41
|
+
PendingState.write_state(@token, {
|
|
42
|
+
'schema_version' => 4,
|
|
43
|
+
'token' => @token,
|
|
44
|
+
'created_at' => Time.now.iso8601,
|
|
45
|
+
'collect_deadline' => (Time.now + 1800).iso8601,
|
|
46
|
+
'subprocess_status' => 'pending',
|
|
47
|
+
'subprocess_total' => 3,
|
|
48
|
+
'parallel' => true
|
|
49
|
+
}.merge(extra))
|
|
50
|
+
FileUtils.touch(PendingState.collect_lock_path(@token))
|
|
51
|
+
end
|
|
52
|
+
|
|
53
|
+
def call_wait(args = {})
|
|
54
|
+
payload = JSON.parse(@tool.call({ 'collect_token' => @token }.merge(args)).first[:text])
|
|
55
|
+
payload
|
|
56
|
+
end
|
|
57
|
+
|
|
58
|
+
# ── unknown_token ────────────────────────────────────────────────
|
|
59
|
+
|
|
60
|
+
def test_unknown_token_returns_unknown_with_redispatch_hint
|
|
61
|
+
payload = call_wait
|
|
62
|
+
assert_equal 'unknown_token', payload['status']
|
|
63
|
+
assert_equal @token, payload['collect_token']
|
|
64
|
+
assert_equal 'multi_llm_review', payload['next_action']['tool']
|
|
65
|
+
assert_match(/never existed|garbage-collected|new dispatch/i,
|
|
66
|
+
payload['next_action']['purpose'])
|
|
67
|
+
end
|
|
68
|
+
|
|
69
|
+
def test_invalid_token_format_returns_unknown
|
|
70
|
+
payload = JSON.parse(@tool.call({ 'collect_token' => 'not-a-uuid' }).first[:text])
|
|
71
|
+
assert_equal 'unknown_token', payload['status']
|
|
72
|
+
end
|
|
73
|
+
|
|
74
|
+
# ── already_collected ────────────────────────────────────────────
|
|
75
|
+
|
|
76
|
+
def test_already_collected_returns_replay_hint
|
|
77
|
+
write_state
|
|
78
|
+
PendingState.write_collected(@token, {
|
|
79
|
+
'final_payload' => { 'status' => 'ok', 'verdict' => 'APPROVE' }
|
|
80
|
+
})
|
|
81
|
+
payload = call_wait
|
|
82
|
+
assert_equal 'already_collected', payload['status']
|
|
83
|
+
assert_equal 'multi_llm_review_collect', payload['next_action']['tool']
|
|
84
|
+
assert_match(/idempotent replay/i, payload['next_action']['purpose'])
|
|
85
|
+
end
|
|
86
|
+
|
|
87
|
+
# ── past_collect_deadline ────────────────────────────────────────
|
|
88
|
+
|
|
89
|
+
def test_past_deadline_returns_redispatch_without_blocking
|
|
90
|
+
write_state('collect_deadline' => (Time.now - 60).iso8601)
|
|
91
|
+
t0 = Time.now
|
|
92
|
+
payload = call_wait('max_wait_seconds' => 5)
|
|
93
|
+
elapsed = Time.now - t0
|
|
94
|
+
assert_equal 'past_collect_deadline', payload['status']
|
|
95
|
+
assert_equal 'multi_llm_review', payload['next_action']['tool']
|
|
96
|
+
assert_operator elapsed, :<, 1.0, 'must not block when past deadline'
|
|
97
|
+
end
|
|
98
|
+
|
|
99
|
+
# ── ready ────────────────────────────────────────────────────────
|
|
100
|
+
|
|
101
|
+
def test_ready_when_subprocess_results_present
|
|
102
|
+
write_state
|
|
103
|
+
PendingState.write_subprocess_results(@token, {
|
|
104
|
+
'results' => [
|
|
105
|
+
{ 'role_label' => 'codex', 'raw_text' => 'APPROVE', 'status' => 'success' },
|
|
106
|
+
{ 'role_label' => 'cursor', 'raw_text' => 'APPROVE', 'status' => 'success' },
|
|
107
|
+
{ 'role_label' => 'claude', 'raw_text' => 'APPROVE', 'status' => 'success' }
|
|
108
|
+
],
|
|
109
|
+
'elapsed_seconds' => 12.3
|
|
110
|
+
})
|
|
111
|
+
payload = call_wait('max_wait_seconds' => 2)
|
|
112
|
+
assert_equal 'ready', payload['status']
|
|
113
|
+
assert_equal 3, payload['subprocess_done']
|
|
114
|
+
assert_equal 3, payload['subprocess_total']
|
|
115
|
+
assert_equal 'multi_llm_review_collect', payload['next_action']['tool']
|
|
116
|
+
assert_includes payload['next_action']['args'].keys, 'orchestrator_reviews'
|
|
117
|
+
end
|
|
118
|
+
|
|
119
|
+
# ── still_pending + streak escalation ────────────────────────────
|
|
120
|
+
|
|
121
|
+
def test_still_pending_returned_when_worker_healthy_but_slow
|
|
122
|
+
write_state
|
|
123
|
+
# Live heartbeat so WaitForWorker sees a healthy worker.
|
|
124
|
+
FileUtils.touch(PendingState.worker_heartbeat_path(@token))
|
|
125
|
+
PendingState.write_worker_pid(@token, { 'pid' => Process.pid, 'pgid' => Process.pid })
|
|
126
|
+
|
|
127
|
+
payload = call_wait('max_wait_seconds' => 1)
|
|
128
|
+
assert_equal 'still_pending', payload['status']
|
|
129
|
+
assert_equal 1, payload['still_pending_streak']
|
|
130
|
+
assert_equal 'multi_llm_review_wait', payload['next_action']['tool']
|
|
131
|
+
end
|
|
132
|
+
|
|
133
|
+
def test_still_pending_streak_persists_across_calls
|
|
134
|
+
write_state
|
|
135
|
+
FileUtils.touch(PendingState.worker_heartbeat_path(@token))
|
|
136
|
+
PendingState.write_worker_pid(@token, { 'pid' => Process.pid, 'pgid' => Process.pid })
|
|
137
|
+
|
|
138
|
+
p1 = call_wait('max_wait_seconds' => 1)
|
|
139
|
+
assert_equal 1, p1['still_pending_streak']
|
|
140
|
+
p2 = call_wait('max_wait_seconds' => 1)
|
|
141
|
+
assert_equal 2, p2['still_pending_streak']
|
|
142
|
+
end
|
|
143
|
+
|
|
144
|
+
def test_streak_at_limit_escalates_to_crashed
|
|
145
|
+
write_state('wait_still_pending_streak' => 3)
|
|
146
|
+
payload = call_wait('max_wait_seconds' => 1)
|
|
147
|
+
assert_equal 'crashed', payload['status']
|
|
148
|
+
assert_equal 'wait_exhausted', payload['crashed_reason']
|
|
149
|
+
assert_equal 'multi_llm_review', payload['next_action']['tool']
|
|
150
|
+
end
|
|
151
|
+
|
|
152
|
+
def test_ready_resets_streak
|
|
153
|
+
write_state('wait_still_pending_streak' => 2)
|
|
154
|
+
PendingState.write_subprocess_results(@token, { 'results' => [], 'elapsed_seconds' => 1 })
|
|
155
|
+
payload = call_wait('max_wait_seconds' => 1)
|
|
156
|
+
assert_equal 'ready', payload['status']
|
|
157
|
+
state = PendingState.load_state(@token)
|
|
158
|
+
assert_equal 0, state['wait_still_pending_streak'].to_i
|
|
159
|
+
end
|
|
160
|
+
|
|
161
|
+
# ── crashed (worker terminal) ────────────────────────────────────
|
|
162
|
+
|
|
163
|
+
def test_crashed_status_propagates_reason
|
|
164
|
+
write_state('subprocess_status' => 'crashed', 'crash_reason' => 'segfault')
|
|
165
|
+
payload = call_wait('max_wait_seconds' => 1)
|
|
166
|
+
assert_equal 'crashed', payload['status']
|
|
167
|
+
assert_equal 'segfault', payload['crashed_reason']
|
|
168
|
+
assert_equal 'multi_llm_review', payload['next_action']['tool']
|
|
169
|
+
end
|
|
170
|
+
|
|
171
|
+
# ── hard cap ─────────────────────────────────────────────────────
|
|
172
|
+
# Hard cap is enforced before WaitForWorker is invoked. We verify the
|
|
173
|
+
# clamping logic without actually waiting for the cap by checking the
|
|
174
|
+
# request was processed (well-formed payload returned in bounded time)
|
|
175
|
+
# and the deadline-remaining check fired.
|
|
176
|
+
def test_max_wait_clamped_when_request_exceeds_hard_cap
|
|
177
|
+
# Set a very short deadline so the deadline-remaining clamp fires
|
|
178
|
+
# almost immediately.
|
|
179
|
+
write_state('collect_deadline' => (Time.now + 2).iso8601)
|
|
180
|
+
FileUtils.touch(PendingState.worker_heartbeat_path(@token))
|
|
181
|
+
PendingState.write_worker_pid(@token, { 'pid' => Process.pid, 'pgid' => Process.pid })
|
|
182
|
+
|
|
183
|
+
t0 = Time.now
|
|
184
|
+
payload = call_wait('max_wait_seconds' => 999_999)
|
|
185
|
+
elapsed = Time.now - t0
|
|
186
|
+
# Whatever status comes back (still_pending or past_collect_deadline
|
|
187
|
+
# depending on timing), elapsed must be bounded — never the 999_999s
|
|
188
|
+
# the caller requested. Ensures the clamp path is not bypassed.
|
|
189
|
+
refute_nil payload['status']
|
|
190
|
+
assert_operator elapsed, :<, 30.0,
|
|
191
|
+
'elapsed must be bounded by deadline-remaining clamp, not by raw max_wait_seconds'
|
|
192
|
+
end
|
|
193
|
+
|
|
194
|
+
# ── elapsed_seconds field is always present ──────────────────────
|
|
195
|
+
|
|
196
|
+
def test_elapsed_seconds_always_present
|
|
197
|
+
write_state
|
|
198
|
+
PendingState.write_subprocess_results(@token, { 'results' => [], 'elapsed_seconds' => 0.1 })
|
|
199
|
+
payload = call_wait('max_wait_seconds' => 1)
|
|
200
|
+
assert payload.key?('elapsed_seconds'), 'elapsed_seconds field missing'
|
|
201
|
+
assert_kind_of Float, payload['elapsed_seconds']
|
|
202
|
+
end
|
|
203
|
+
|
|
204
|
+
# ── next_action present on every status ──────────────────────────
|
|
205
|
+
|
|
206
|
+
def test_next_action_present_on_every_status
|
|
207
|
+
write_state
|
|
208
|
+
# ready
|
|
209
|
+
PendingState.write_subprocess_results(@token, { 'results' => [], 'elapsed_seconds' => 1 })
|
|
210
|
+
assert call_wait('max_wait_seconds' => 1)['next_action'], 'ready missing next_action'
|
|
211
|
+
|
|
212
|
+
# past_collect_deadline
|
|
213
|
+
File.delete(PendingState.subprocess_results_path(@token))
|
|
214
|
+
PendingState.write_state(@token, PendingState.load_state(@token)
|
|
215
|
+
.merge('collect_deadline' => (Time.now - 1).iso8601))
|
|
216
|
+
assert call_wait['next_action'], 'past_collect_deadline missing next_action'
|
|
217
|
+
|
|
218
|
+
# crashed
|
|
219
|
+
PendingState.write_state(@token, PendingState.load_state(@token).merge(
|
|
220
|
+
'collect_deadline' => (Time.now + 600).iso8601,
|
|
221
|
+
'subprocess_status' => 'crashed', 'crash_reason' => 'oom'
|
|
222
|
+
))
|
|
223
|
+
assert call_wait['next_action'], 'crashed missing next_action'
|
|
224
|
+
end
|
|
225
|
+
end
|
|
226
|
+
|
|
227
|
+
# ── backward compat: collect can still be called without wait ────────
|
|
228
|
+
# Verifies that introducing wait does not break the existing
|
|
229
|
+
# "delegation_pending → collect" path. The collect tool already polls
|
|
230
|
+
# internally and remains the primary completion gate.
|
|
231
|
+
class TestWaitToolBackwardCompat < Minitest::Test
|
|
232
|
+
def test_collect_works_without_wait_tool
|
|
233
|
+
# Smoke test: load the collect tool and verify it has not gained a
|
|
234
|
+
# required dependency on wait. (Full collect integration is covered
|
|
235
|
+
# in test_multi_llm_review.rb; this is a presence check.)
|
|
236
|
+
require_relative '../tools/multi_llm_review_collect'
|
|
237
|
+
collect = Tools::MultiLlmReviewCollect.new
|
|
238
|
+
schema = collect.input_schema
|
|
239
|
+
assert_equal 'object', schema[:type]
|
|
240
|
+
# The collect tool's required fields must still be just collect_token
|
|
241
|
+
# + orchestrator_reviews — wait must NOT have been added as required.
|
|
242
|
+
required = schema[:required] || []
|
|
243
|
+
refute_includes required, 'wait_completed'
|
|
244
|
+
refute_includes required, 'wait_token'
|
|
245
|
+
end
|
|
246
|
+
end
|
|
247
|
+
end
|
|
248
|
+
end
|
|
249
|
+
end
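The tests above pin a status-to-tool mapping: `still_pending` points back at `multi_llm_review_wait`, `crashed` points at `multi_llm_review`, and `ready` hands off to collect. A caller-side loop over that contract might look like the sketch below. `drive_wait`, `max_rounds`, and the `call_tool` lambda are hypothetical names standing in for whatever MCP client call the orchestrator really uses; none of them is part of the gem.

```ruby
# Hypothetical orchestrator-side helper (not part of kairos-chain).
# `call_tool` is assumed to take (tool_name, args) and return the parsed
# JSON payload as a Hash.
def drive_wait(call_tool, token, max_rounds: 5)
  max_rounds.times do
    payload = call_tool.call('multi_llm_review_wait', { 'collect_token' => token })
    # Only still_pending asks for another wait; any other status hands off
    # to the tool named in payload['next_action'].
    return payload unless payload['status'] == 'still_pending'
  end
  nil # still pending after max_rounds calls
end

# Usage with a canned sequence of responses instead of a real server.
responses = [
  { 'status' => 'still_pending', 'still_pending_streak' => 1 },
  { 'status' => 'still_pending', 'still_pending_streak' => 2 },
  { 'status' => 'ready', 'next_action' => { 'tool' => 'multi_llm_review_collect' } }
]
calls = []
fake = lambda do |tool, args|
  calls << [tool, args]
  responses.shift
end

final = drive_wait(fake, 'token-123')
puts final['status']
```

Keeping the loop bounded on the caller side is a belt-and-braces choice. The tests already show the server escalating to `crashed` once the streak reaches its limit.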
|
|
@@ -92,6 +92,16 @@ module KairosMcp
|
|
|
92
92
|
type: 'integer',
|
|
93
93
|
description: 'Override dispatch timeout in seconds (default from config)'
|
|
94
94
|
},
|
|
95
|
+
collect_deadline_seconds_override: {
|
|
96
|
+
type: 'integer',
|
|
97
|
+
description: 'Override how long the orchestrator has to call ' \
|
|
98
|
+
'multi_llm_review_collect before the pending token expires ' \
|
|
99
|
+
'(default from config: delegation.collect_deadline_seconds). ' \
|
|
100
|
+
'In the async/parallel path, the effective deadline is also ' \
|
|
101
|
+
'auto-extended to cover the worker self_timeout plus a poll margin, ' \
|
|
102
|
+
'so raising timeout_seconds_override alone no longer leaves the ' \
|
|
103
|
+
'collect deadline shorter than the worker lifespan.'
|
|
104
|
+
},
|
|
95
105
|
complexity: {
|
|
96
106
|
type: 'string',
|
|
97
107
|
enum: %w[auto low medium high critical],
|
|
@@ -446,7 +456,8 @@ module KairosMcp
|
|
|
446
456
|
}))
|
|
447
457
|
end
|
|
448
458
|
|
|
449
|
-
deadline_secs =
|
|
459
|
+
deadline_secs = arguments['collect_deadline_seconds_override'] ||
|
|
460
|
+
config.dig('delegation', 'collect_deadline_seconds') || 1800
|
|
450
461
|
now = Time.now
|
|
451
462
|
token = PendingState.generate_token
|
|
452
463
|
|
|
@@ -507,11 +518,22 @@ module KairosMcp
|
|
|
507
518
|
}))
|
|
508
519
|
end
|
|
509
520
|
|
|
510
|
-
deadline_secs =
|
|
521
|
+
deadline_secs = arguments['collect_deadline_seconds_override'] ||
|
|
522
|
+
config.dig('delegation', 'collect_deadline_seconds') || 1800
|
|
511
523
|
multiplier = parallel_cfg['worker_self_timeout_multiplier'] || 1.5
|
|
512
524
|
floor = parallel_cfg['worker_self_timeout_floor_seconds'] || 60
|
|
525
|
+
poll_interval = parallel_cfg['poll_interval_seconds'] || 0.5
|
|
513
526
|
now = Time.now
|
|
514
|
-
|
|
527
|
+
worker_lifespan_secs = (timeout_secs * multiplier + floor).to_f
|
|
528
|
+
self_timeout_at = (now + worker_lifespan_secs).iso8601
|
|
529
|
+
|
|
530
|
+
# Auto-extend collect_deadline to cover the worker's self_timeout plus
|
|
531
|
+
# a polling margin. Without this, raising timeout_seconds_override alone
|
|
532
|
+
# leaves the orchestrator's submission window shorter than the worker
|
|
533
|
+
# lifespan — the collect token expires while the worker is still healthy.
|
|
534
|
+
# Only kicks in for the async path; sync delegate_response has no worker.
|
|
535
|
+
min_deadline_secs = (worker_lifespan_secs + (poll_interval * 20)).ceil
|
|
536
|
+
deadline_secs = [deadline_secs.to_i, min_deadline_secs].max
|
|
515
537
|
|
|
516
538
|
# UUID collision retry (EEXIST on Dir.mkdir per PendingState§token_dir).
|
|
517
539
|
token = nil
|
|
@@ -578,7 +600,7 @@ module KairosMcp
|
|
|
578
600
|
'instruction' => 'Run persona-based review using your Agent tool. ' \
|
|
579
601
|
"Choose #{PersonaAssembly::MIN_PERSONAS}-#{PersonaAssembly::MAX_PERSONAS} " \
|
|
580
602
|
'personas appropriate to the artifact and review_type. ' \
|
|
581
|
-
'
|
|
603
|
+
'Then call multi_llm_review_wait, then multi_llm_review_collect.',
|
|
582
604
|
'review_type' => arguments['review_type'],
|
|
583
605
|
'persona_count_min' => PersonaAssembly::MIN_PERSONAS,
|
|
584
606
|
'persona_count_max' => PersonaAssembly::MAX_PERSONAS
|
|
@@ -586,7 +608,18 @@ module KairosMcp
|
|
|
586
608
|
'subprocess_status' => 'pending',
|
|
587
609
|
'subprocess_total' => reviewers.size,
|
|
588
610
|
'must_collect_by' => (now + deadline_secs).iso8601,
|
|
589
|
-
'orchestrator_model' => orchestrator_model
|
|
611
|
+
'orchestrator_model' => orchestrator_model,
|
|
612
|
+
# next_action hint (R1, R8): MCP does not enforce ordering, but
|
|
613
|
+
# the LLM is highly likely to follow this hint. Calling wait is
|
|
614
|
+
# optional — collect alone still works via its internal polling —
|
|
615
|
+
# but wait surfaces structural completion deterministically.
|
|
616
|
+
'next_action' => {
|
|
617
|
+
'tool' => 'multi_llm_review_wait',
|
|
618
|
+
'args' => { 'collect_token' => token, 'max_wait_seconds' => 600 },
|
|
619
|
+
'purpose' => 'Phase 1.5: block until subprocess reviewers complete. Call ' \
|
|
620
|
+
'AFTER spawning persona Agent reviews, BEFORE multi_llm_review_collect. ' \
|
|
621
|
+
'Optional but strongly recommended for deterministic recovery hints.'
|
|
622
|
+
}
|
|
590
623
|
}))
|
|
591
624
|
end
|
|
592
625
|
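The auto-extension in the `@@ -507,11 +518,22 @@` hunk is plain arithmetic, so a few worked values may help. The sketch below lifts the three added lines into a standalone method. The method name and keyword arguments are illustrative, and the defaults mirror the fallback values visible in the hunk (1800, 1.5, 60, 0.5).

```ruby
# Sketch of the auto-extension arithmetic; the method name and keyword
# arguments are illustrative. Defaults mirror the fallbacks in the hunk.
def effective_collect_deadline(timeout_secs:, requested_deadline_secs: 1800,
                               multiplier: 1.5, floor: 60, poll_interval: 0.5)
  worker_lifespan_secs = (timeout_secs * multiplier + floor).to_f
  # Margin of 20 poll intervals on top of the worker lifespan.
  min_deadline_secs = (worker_lifespan_secs + (poll_interval * 20)).ceil
  [requested_deadline_secs.to_i, min_deadline_secs].max
end

puts effective_collect_deadline(timeout_secs: 300)   # 1800 (default already covers 520)
puts effective_collect_deadline(timeout_secs: 1800)  # 2770 (extended past the default)
puts effective_collect_deadline(timeout_secs: 300, requested_deadline_secs: 100) # 520
```

The last call shows that an explicit `collect_deadline_seconds_override` shorter than the worker lifespan is raised as well, since the `max` is applied after the override is read.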
|
|
@@ -0,0 +1,313 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require 'json'
|
|
4
|
+
require 'time'
|
|
5
|
+
require_relative '../lib/multi_llm_review/pending_state'
|
|
6
|
+
require_relative '../lib/multi_llm_review/wait_for_worker'
|
|
7
|
+
|
|
8
|
+
module KairosMcp
|
|
9
|
+
module SkillSets
|
|
10
|
+
module MultiLlmReview
|
|
11
|
+
module Tools
|
|
12
|
+
# Phase 1.5 of the orchestrator delegation protocol.
|
|
13
|
+
#
|
|
14
|
+
# Optional blocking gate that the orchestrator can call AFTER spawning
|
|
15
|
+
# persona Agent reviews and BEFORE multi_llm_review_collect. The server
|
|
16
|
+
# polls the detached worker's state and returns when subprocess
|
|
17
|
+
# reviewers complete (or earlier on terminal conditions).
|
|
18
|
+
#
|
|
19
|
+
# Without this tool, orchestrator can still call collect directly —
|
|
20
|
+
# collect's own internal polling covers worker completion. wait is a
|
|
21
|
+
# tool-chain checkpoint that surfaces structural status (ready,
|
|
22
|
+
# crashed, exhausted) with explicit next_action recovery hints, so
|
|
23
|
+
# the LLM can choose the right next step deterministically.
|
|
24
|
+
#
|
|
25
|
+
# Status enum (R10):
|
|
26
|
+
# ready — subprocess_results.json present, proceed to collect
|
|
27
|
+
# still_pending — max_wait elapsed, worker healthy, may call wait again
|
|
28
|
+
# crashed — worker terminal failure (with reason)
|
|
29
|
+
# unknown_token — token dir missing (never existed or GC'd)
|
|
30
|
+
# already_collected — collected.json present, retrieve cached payload
|
|
31
|
+
# past_collect_deadline — token alive but past deadline; collect would reject
|
|
32
|
+
class MultiLlmReviewWait < KairosMcp::Tools::BaseTool
|
|
33
|
+
# Per-call hard cap on max_wait_seconds (R7).
|
|
34
|
+
MAX_WAIT_HARD_CAP_DEFAULT = 1800
|
|
35
|
+
|
|
36
|
+
# Default streak limit before still_pending escalates to crashed (R7).
|
|
37
|
+
STILL_PENDING_STREAK_LIMIT_DEFAULT = 3
|
|
38
|
+
|
|
39
|
+
def name
|
|
40
|
+
'multi_llm_review_wait'
|
|
41
|
+
end
|
|
42
|
+
|
|
43
|
+
def description
|
|
44
|
+
'Phase 1.5 — block until subprocess reviewers complete for a delegated ' \
|
|
45
|
+
'multi_llm_review token. Optional but recommended: call after spawning ' \
|
|
46
|
+
'persona Agent reviews and before multi_llm_review_collect. Returns ' \
|
|
47
|
+
'a status enum with a next_action recovery hint for every status.'
|
|
48
|
+
end
|
|
49
|
+
|
|
50
|
+
def category
|
|
51
|
+
:review
|
|
52
|
+
end
|
|
53
|
+
|
|
54
|
+
def usecase_tags
|
|
55
|
+
%w[review multi-llm wait blocking polling]
|
|
56
|
+
end
|
|
57
|
+
|
|
58
|
+
def related_tools
|
|
59
|
+
%w[multi_llm_review multi_llm_review_collect]
|
|
60
|
+
end
|
|
61
|
+
|
|
62
|
+
def input_schema
|
|
63
|
+
{
|
|
64
|
+
type: 'object',
|
|
65
|
+
properties: {
|
|
66
|
+
collect_token: {
|
|
67
|
+
type: 'string',
|
|
68
|
+
description: 'UUID v4 token returned by multi_llm_review delegation_pending'
|
|
69
|
+
},
|
|
70
|
+
max_wait_seconds: {
|
|
71
|
+
type: 'integer',
|
|
72
|
+
description: 'Server-side blocking duration cap in seconds. ' \
|
|
73
|
+
'Default from config (delegation.parallel.wait_max_default_seconds). ' \
|
|
74
|
+
'Hard cap 1800 (delegation.parallel.wait_max_hard_cap_seconds).'
|
|
75
|
+
}
|
|
76
|
+
},
|
|
77
|
+
required: %w[collect_token]
|
|
78
|
+
}
|
|
79
|
+
end
|
|
80
|
+
|
|
81
|
+
def call(arguments)
|
|
82
|
+
token = arguments['collect_token'].to_s
|
|
83
|
+
unless PendingState.valid_token?(token)
|
|
84
|
+
return text_content(JSON.generate({
|
|
85
|
+
'status' => 'unknown_token',
|
|
86
|
+
'collect_token' => token,
|
|
87
|
+
'elapsed_seconds' => 0.0,
|
|
88
|
+
'next_action' => next_action_redispatch(
|
|
89
|
+
'Token format invalid. Re-run multi_llm_review to start a new dispatch.'
|
|
90
|
+
)
|
|
91
|
+
}))
|
|
92
|
+
end
|
|
93
|
+
|
|
94
|
+
cfg = config_parallel
|
|
95
|
+
default_max = (cfg['wait_max_default_seconds'] || 600).to_i
|
|
96
|
+
hard_cap = (cfg['wait_max_hard_cap_seconds'] || MAX_WAIT_HARD_CAP_DEFAULT).to_i
|
|
97
|
+
poll_int = (cfg['wait_poll_interval_seconds'] || 1.0).to_f
|
|
98
|
+
streak_limit = (cfg['wait_still_pending_streak_limit'] ||
|
|
99
|
+
STILL_PENDING_STREAK_LIMIT_DEFAULT).to_i
|
|
100
|
+
|
|
101
|
+
requested_max = (arguments['max_wait_seconds'] || default_max).to_i
|
|
102
|
+
requested_max = hard_cap if requested_max > hard_cap
|
|
103
|
+
requested_max = 1 if requested_max < 1
|
|
104
|
+
|
|
105
|
+
# 1. already_collected check (collected.json present) — before any
|
|
106
|
+
# deadline / token-dir checks so a successful collect always
|
|
107
|
+
# returns deterministically even after deadline expiry.
|
|
108
|
+
if File.exist?(safe_path { PendingState.collected_path(token) })
|
|
109
|
+
return reply('already_collected', token, 0.0,
|
|
110
|
+
next_action: next_action_collect_replay(token,
|
|
111
|
+
'Collect already completed for this token. Call multi_llm_review_collect ' \
|
|
112
|
+
'to retrieve the cached final consensus (idempotent replay).'))
|
|
113
|
+
end
|
|
114
|
+
|
|
115
|
+
# 2. unknown_token check (state.json missing).
|
|
116
|
+
state = PendingState.load_state(token)
|
|
117
|
+
if state.nil?
|
|
118
|
+
return reply('unknown_token', token, 0.0,
|
|
119
|
+
next_action: next_action_redispatch(
|
|
120
|
+
'Token not found (never existed or already garbage-collected). ' \
|
|
121
|
+
'Re-run multi_llm_review to start a new dispatch.'))
|
|
122
|
+
end
|
|
123
|
+
|
|
124
|
+
# 3. past_collect_deadline early exit (collect would reject anyway).
|
|
125
|
+
deadline = (Time.iso8601(state['collect_deadline']) rescue nil)
|
|
126
|
+
if deadline && Time.now > deadline
|
|
127
|
+
return reply('past_collect_deadline', token, 0.0,
|
|
128
|
+
subprocess_total: state['subprocess_total'] ||
|
|
129
|
+
(PendingState.load_request(token)&.dig('reviewers')&.size),
|
|
130
|
+
next_action: next_action_redispatch(
|
|
131
|
+
'Token deadline elapsed. multi_llm_review_collect would reject. ' \
|
|
132
|
+
'Re-run multi_llm_review to start a new dispatch.'))
|
|
133
|
+
end
|
|
134
|
+
|
|
135
|
+
# 4. Cap max_wait by remaining deadline (R7) so we never block
|
|
136
|
+
# longer than the useful lifetime of the token.
|
|
137
|
+
if deadline
|
|
138
|
+
remaining = (deadline - Time.now).to_i
|
|
139
|
+
requested_max = remaining if remaining < requested_max
|
|
140
|
+
requested_max = 1 if requested_max < 1
|
|
141
|
+
end
|
|
142
|
+
|
|
143
|
+
# 5. Streak guard: if still_pending was returned too many times in
|
|
144
|
+
# a row, escalate to crashed/wait_exhausted.
|
|
145
|
+
streak = (state['wait_still_pending_streak'] || 0).to_i
|
|
146
|
+
if streak >= streak_limit
|
|
147
|
+
return reply('crashed', token, 0.0,
|
|
148
|
+
crashed_reason: 'wait_exhausted',
|
|
149
|
+
still_pending_streak: streak,
|
|
150
|
+
next_action: next_action_redispatch(
|
|
151
|
+
"still_pending streak reached limit (#{streak_limit}). Worker may be " \
|
|
152
|
+
'wedged or pathologically slow. Re-run multi_llm_review.'))
|
|
153
|
+
end
|
|
154
|
+
|
|
155
|
+
# 6. Delegate to existing WaitForWorker for the actual polling.
|
|
156
|
+
outcome = WaitForWorker.wait(token, {
|
|
157
|
+
max_wait_seconds: requested_max,
|
|
158
|
+
poll_interval_seconds: poll_int,
|
|
159
|
+
startup_grace_seconds: cfg['startup_grace_seconds'] || 30,
|
|
160
|
+
heartbeat_stale_threshold_seconds: cfg['heartbeat_stale_threshold_seconds'] || 15
|
|
161
|
+
})
|
|
162
|
+
|
|
163
|
+
translate_outcome(token, outcome, streak, requested_max, state)
|
|
164
|
+
rescue StandardError => e
|
|
165
|
+
warn "[multi_llm_review_wait] INTERNAL ERROR: #{e.class}: #{e.message}"
|
|
166
|
+
warn e.backtrace.first(10).join("\n") if e.backtrace
|
|
167
|
+
text_content(JSON.generate({
|
|
168
|
+
'status' => 'error',
|
|
169
|
+
'error_class' => 'internal',
|
|
170
|
+
'error' => "#{e.class}: #{e.message}",
|
|
171
|
+
'collect_token' => arguments['collect_token']
|
|
172
|
+
}))
|
|
173
|
+
end
|
|
174
|
+
|
|
175
|
+
private
|
|
176
|
+
|
|
177
|
+
def translate_outcome(token, outcome, prior_streak, requested_max, state)
|
|
178
|
+
elapsed = (outcome[:elapsed] || requested_max).to_f
|
|
179
|
+
subprocess_total = state['subprocess_total'] ||
|
|
180
|
+
PendingState.load_request(token)&.dig('reviewers')&.size
|
|
181
|
+
|
|
182
|
+
case outcome[:status]
|
|
183
|
+
when :ready
|
|
184
|
+
reset_streak(token)
|
|
185
|
+
done = (outcome[:results].is_a?(Array) ? outcome[:results].size : nil) ||
|
|
186
|
+
subprocess_total
|
|
187
|
+
reply('ready', token, elapsed,
|
|
188
|
+
subprocess_done: done,
|
|
189
|
+
subprocess_total: subprocess_total,
|
|
190
|
+
next_action: next_action_collect(token,
|
|
191
|
+
'Subprocess reviewers complete. Submit your persona Agent findings to ' \
|
|
192
|
+
'multi_llm_review_collect to compute the final consensus.'))
|
|
193
|
+
when :crashed
|
|
194
|
+
reset_streak(token)
|
|
195
|
+
reply('crashed', token, elapsed,
|
|
196
|
+
crashed_reason: outcome[:reason] || 'crashed',
|
|
197
|
+
subprocess_total: subprocess_total,
|
|
198
|
+
next_action: next_action_redispatch(
|
|
199
|
+
"Worker terminated abnormally (#{outcome[:reason] || 'crashed'}). " \
|
|
200
|
+
'Re-run multi_llm_review to start a new dispatch.'))
|
|
201
|
+
when :timeout
|
|
202
|
+
new_streak = prior_streak + 1
|
|
203
|
+
persist_streak(token, new_streak)
|
|
204
|
+
reply('still_pending', token, elapsed,
|
|
205
|
+
subprocess_total: subprocess_total,
|
|
206
|
+
still_pending_streak: new_streak,
|
|
207
|
+
next_action: next_action_wait(token,
|
|
208
|
+
"Worker still healthy after #{requested_max}s. Call multi_llm_review_wait " \
|
|
209
|
+
"again with the same token (streak #{new_streak}/#{(state.dig('wait_still_pending_streak_limit') || STILL_PENDING_STREAK_LIMIT_DEFAULT)})."))
|
|
210
|
+
else
|
|
211
|
+
reply('crashed', token, elapsed,
|
|
212
|
+
crashed_reason: "unknown_outcome:#{outcome[:status]}",
|
|
213
|
+
subprocess_total: subprocess_total,
|
|
214
|
+
next_action: next_action_redispatch(
|
|
215
|
+
'Worker reported an unexpected outcome. Re-run multi_llm_review.'))
|
|
216
|
+
end
|
|
217
|
+
end
|
|
218
|
+
|
|
219
|
+
def reply(status, token, elapsed, **fields)
|
|
220
|
+
payload = {
|
|
221
|
+
'status' => status,
|
|
222
|
+
'collect_token' => token,
|
|
223
|
+
'elapsed_seconds' => elapsed.round(3)
|
|
224
|
+
}
|
|
225
|
+
payload['subprocess_done'] = fields[:subprocess_done] if fields.key?(:subprocess_done)
|
|
226
|
+
payload['subprocess_total'] = fields[:subprocess_total] if fields.key?(:subprocess_total)
|
|
227
|
+
payload['crashed_reason'] = fields[:crashed_reason] if fields.key?(:crashed_reason)
|
|
228
|
+
payload['still_pending_streak'] = fields[:still_pending_streak] if fields.key?(:still_pending_streak)
|
|
229
|
+
payload['next_action'] = fields[:next_action] if fields.key?(:next_action)
|
|
230
|
+
text_content(JSON.generate(payload))
|
|
231
|
+
end
|
|
232
|
+
|
|
233
|
+
def next_action_collect(token, purpose)
|
|
234
|
+
{
|
|
235
|
+
'tool' => 'multi_llm_review_collect',
|
|
236
|
+
'args' => {
|
|
237
|
+
'collect_token' => token,
|
|
238
|
+
'orchestrator_reviews' => '<persona findings array, 2-4 entries>'
|
|
239
|
+
},
|
|
240
|
+
'purpose' => purpose
|
|
241
|
+
}
|
|
242
|
+
end
|
|
243
|
+
|
|
244
|
+
def next_action_collect_replay(token, purpose)
|
|
245
|
+
{
|
|
246
|
+
'tool' => 'multi_llm_review_collect',
|
|
247
|
+
'args' => { 'collect_token' => token },
|
|
248
|
+
'purpose' => purpose
|
|
249
|
+
}
|
|
250
|
+
end
|
|
251
|
+
|
|
252
|
+
def next_action_wait(token, purpose)
|
|
253
|
+
{
|
|
254
|
+
'tool' => 'multi_llm_review_wait',
|
|
255
|
+
'args' => { 'collect_token' => token },
|
|
256
|
+
'purpose' => purpose
|
|
257
|
+
}
|
|
258
|
+
end
|
|
259
|
+
|
|
260
|
+
def next_action_redispatch(purpose)
|
|
261
|
+
{
|
|
262
|
+
'tool' => 'multi_llm_review',
|
|
263
|
+
'args' => '<original arguments>',
|
|
264
|
+
'purpose' => purpose
|
|
265
|
+
}
|
|
266
|
+
end
|
|
267
|
+
|
|
268
|
+
# Streak persistence via PendingState.update_state (atomic RMW).
|
|
269
|
+
def persist_streak(token, n)
|
|
270
|
+
PendingState.update_state(token) do |state|
|
|
271
|
+
next nil unless state
|
|
272
|
+
state['wait_still_pending_streak'] = n
|
|
273
|
+
state
|
|
274
|
+
end
|
|
275
|
+
rescue StandardError
|
|
276
|
+
# Best-effort. Streak loss = orchestrator gets one more retry,
|
|
277
|
+
# acceptable degradation.
|
|
278
|
+
end
|
|
279
|
+
|
|
280
|
+
def reset_streak(token)
|
|
281
|
+
PendingState.update_state(token) do |state|
|
|
282
|
+
next nil unless state
|
|
283
|
+
if state['wait_still_pending_streak'].to_i.positive?
|
|
284
|
+
state['wait_still_pending_streak'] = 0
|
|
285
|
+
state
|
|
286
|
+
else
|
|
287
|
+
nil
|
|
288
|
+
end
|
|
289
|
+
end
|
|
290
|
+
rescue StandardError
|
|
291
|
+
# Best-effort.
|
|
292
|
+
end
|
|
293
|
+
|
|
294
|
+
def safe_path
|
|
295
|
+
yield
|
|
296
|
+
rescue StandardError
|
|
297
|
+
'/dev/null/never_exists'
|
|
298
|
+
end
|
|
299
|
+
|
|
300
|
+
def config_parallel
|
|
301
|
+
return {} unless self.class.const_defined?(:CONFIG_PATH) || true # no-op guard: "|| true" makes the condition always true
|
|
302
|
+
path = File.expand_path('../config/multi_llm_review.yml', __dir__)
|
|
303
|
+
return {} unless File.exist?(path)
|
|
304
|
+
cfg = YAML.safe_load_file(path, permitted_classes: [Symbol], aliases: true)
|
|
305
|
+
(cfg.dig('delegation', 'parallel') || {}).to_h
|
|
306
|
+
rescue StandardError
|
|
307
|
+
{}
|
|
308
|
+
end
|
|
309
|
+
end
|
|
310
|
+
end
|
|
311
|
+
end
|
|
312
|
+
end
|
|
313
|
+
end
|
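The `call` method in `multi_llm_review_wait.rb` bounds `max_wait_seconds` in two stages: first against the hard cap, then against the time left before `collect_deadline`, with a floor of 1 second after each stage. A minimal standalone sketch of that clamp follows. The method name and keyword arguments are illustrative; the 1800 default matches `MAX_WAIT_HARD_CAP_DEFAULT`.

```ruby
# Sketch of the two-stage clamp; method name and keywords are illustrative.
def clamp_max_wait(requested, hard_cap: 1800, remaining: nil)
  wait = requested.to_i
  wait = hard_cap if wait > hard_cap   # stage 1: per-call hard cap
  wait = 1 if wait < 1
  unless remaining.nil?                # stage 2: seconds left before the deadline
    wait = remaining if remaining < wait
    wait = 1 if wait < 1
  end
  wait
end

puts clamp_max_wait(999_999)                # 1800
puts clamp_max_wait(0)                      # 1
puts clamp_max_wait(999_999, remaining: 2)  # 2
puts clamp_max_wait(600, remaining: -5)     # 1
```

The second call pattern is what `test_max_wait_clamped_when_request_exceeds_hard_cap` exercises: with a deadline about 2 seconds away, a 999_999-second request blocks for only a couple of seconds.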
metadata
CHANGED
|
@@ -1,14 +1,14 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: kairos-chain
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 3.
|
|
4
|
+
version: 3.24.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Masaomi Hatakeyama
|
|
8
8
|
autorequire:
|
|
9
9
|
bindir: bin
|
|
10
10
|
cert_chain: []
|
|
11
|
-
date: 2026-04-
|
|
11
|
+
date: 2026-04-27 00:00:00.000000000 Z
|
|
12
12
|
dependencies:
|
|
13
13
|
- !ruby/object:Gem::Dependency
|
|
14
14
|
name: minitest
|
|
@@ -497,6 +497,7 @@ files:
|
|
|
497
497
|
- templates/skillsets/multi_llm_review/test/test_feedback_formatter.rb
|
|
498
498
|
- templates/skillsets/multi_llm_review/test/test_multi_llm_review.rb
|
|
499
499
|
- templates/skillsets/multi_llm_review/test/test_multi_llm_review_bundle.rb
|
|
500
|
+
- templates/skillsets/multi_llm_review/test/test_multi_llm_review_wait.rb
|
|
500
501
|
- templates/skillsets/multi_llm_review/test/test_pending_state_v3.rb
|
|
501
502
|
- templates/skillsets/multi_llm_review/test/test_pin_resolver.rb
|
|
502
503
|
- templates/skillsets/multi_llm_review/test/test_sanitizer.rb
|
|
@@ -504,6 +505,7 @@ files:
|
|
|
504
505
|
- templates/skillsets/multi_llm_review/tools/multi_llm_review.rb
|
|
505
506
|
- templates/skillsets/multi_llm_review/tools/multi_llm_review_bundle.rb
|
|
506
507
|
- templates/skillsets/multi_llm_review/tools/multi_llm_review_collect.rb
|
|
508
|
+
- templates/skillsets/multi_llm_review/tools/multi_llm_review_wait.rb
|
|
507
509
|
- templates/skillsets/multiuser/config/multiuser.yml
|
|
508
510
|
- templates/skillsets/multiuser/lib/multiuser.rb
|
|
509
511
|
- templates/skillsets/multiuser/lib/multiuser/authorization_gate.rb
|