ultimate-pi 0.14.0 → 0.15.0

Files changed (52)
  1. package/.agents/skills/harness-debate-plan/SKILL.md +41 -61
  2. package/.agents/skills/harness-orchestration/SKILL.md +2 -2
  3. package/.agents/skills/harness-plan/SKILL.md +10 -8
  4. package/.pi/agents/harness/planning/decompose.md +4 -2
  5. package/.pi/agents/harness/planning/execution-plan-author.md +25 -14
  6. package/.pi/agents/harness/planning/hypothesis-validator.md +21 -5
  7. package/.pi/agents/harness/planning/implementation-researcher.md +42 -0
  8. package/.pi/agents/harness/planning/plan-adversary.md +19 -3
  9. package/.pi/agents/harness/planning/plan-evaluator.md +26 -5
  10. package/.pi/agents/harness/planning/review-integrator.md +23 -9
  11. package/.pi/agents/harness/planning/scout-graphify.md +1 -1
  12. package/.pi/agents/harness/planning/sprint-contract-auditor.md +19 -4
  13. package/.pi/agents/harness/planning/stack-researcher.md +19 -10
  14. package/.pi/extensions/harness-debate-tools.ts +238 -16
  15. package/.pi/extensions/harness-live-widget.ts +39 -159
  16. package/.pi/extensions/harness-plan-approval.ts +47 -5
  17. package/.pi/extensions/lib/debate-bus-core.ts +69 -15
  18. package/.pi/extensions/lib/debate-bus-state.ts +6 -0
  19. package/.pi/extensions/lib/plan-approval/plan-review.ts +56 -0
  20. package/.pi/extensions/lib/plan-approval/types.ts +1 -0
  21. package/.pi/extensions/lib/plan-debate-eligibility.ts +214 -0
  22. package/.pi/extensions/lib/plan-debate-focus.ts +151 -0
  23. package/.pi/extensions/lib/plan-debate-gate.ts +77 -34
  24. package/.pi/extensions/lib/plan-debate-lanes.ts +44 -0
  25. package/.pi/extensions/lib/plan-debate-round-status.ts +63 -20
  26. package/.pi/extensions/lib/plan-messenger.ts +93 -17
  27. package/.pi/extensions/policy-gate.ts +1 -1
  28. package/.pi/harness/README.md +1 -1
  29. package/.pi/harness/agents.manifest.json +15 -11
  30. package/.pi/harness/docs/adrs/0034-darwin-plan-research-pipeline.md +1 -3
  31. package/.pi/harness/docs/adrs/0035-plan-phase-review-gate.md +13 -5
  32. package/.pi/harness/docs/adrs/0036-implementation-research-and-selective-debate.md +51 -0
  33. package/.pi/harness/docs/adrs/README.md +2 -0
  34. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/implementation-research.yaml +28 -0
  35. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/review-round-r1.yaml +24 -0
  36. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/review-round-r2.yaml +25 -0
  37. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/plan-packet.yaml +196 -0
  38. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/plan-review.md +14 -0
  39. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/research-brief.yaml +62 -0
  40. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/implementation-research.yaml +28 -0
  41. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/review-round-r2.yaml +24 -0
  42. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/review-round-r3.yaml +24 -0
  43. package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/research-brief.yaml +29 -0
  44. package/.pi/harness/evals/smoke/smoke-harness-plan.mjs +97 -16
  45. package/.pi/harness/specs/plan-implementation-research-brief.schema.json +128 -0
  46. package/.pi/harness/specs/plan-review-round-draft.schema.json +1 -1
  47. package/.pi/harness/specs/round-result.schema.json +15 -2
  48. package/.pi/lib/harness-ui-state.ts +92 -0
  49. package/.pi/prompts/harness-plan.md +87 -37
  50. package/.pi/prompts/planning-rubrics.md +31 -0
  51. package/CHANGELOG.md +11 -0
  52. package/package.json +2 -2
@@ -5,7 +5,7 @@ description: Plan-phase Review Gate debate — pi-messenger threads, lane YAML,
 
  # harness-debate-plan
 
- Use when running **Phase 5** of `/harness-plan` — four Review Gate rounds with **pi-messenger-style** turn-taking (claims → rebuttals → integrate), then bus submission.
+ Use when running **Phase 5** of `/harness-plan` — outcome-based Review Gate with **within-round dialogue** (claims → rebuttals → clarifications → counters → integrate), then bus submission.
 
  ## Open
 
@@ -14,71 +14,51 @@ harness_debate_open({})
  ```
 
  - Debate id is always `plan-<run_id>` (tool normalizes wrong ids).
- - Creates `.pi/harness/runs/<run_id>/debate-messenger/` (`inbox/<Agent>/`, `threads/round-N/transcript.jsonl`).
-
- Budget profile **plan**: `max_rounds=4`, `round_token_cap=2000`, `debate_global_cap=12000`.
-
- ## Per-round spawn order (P1 sequential lanes)
-
- 1. Round-specific lane spawns (write lane YAML with `write_harness_yaml`)
- 2. `plan-evaluator` → lane artifact + `harness_messenger_post` (claims)
- 3. `harness_messenger_read_round` → spawn `plan-adversary` with transcript
- 4. `plan-adversary` → lane artifact + `harness_messenger_post` (rebuttals with `in_reply_to`)
- 5. R1: `hypothesis-validator` first (blind — no decomposition/PlanPacket in prompt)
- 6. R4: `sprint-contract-auditor` required before integrator
- 7. `review-integrator` → integrator draft + `harness_messenger_post` (`integrate`)
- 8. `harness_debate_submit_round({ round_index, integrator_draft })` — **only** path for `review-round-r{N}.yaml`
-
- | Round | Extra lane artifacts |
- |-------|----------------------|
- | 1 | `hypothesis-validation-r1.yaml` |
- | 4 | `sprint-audit-r4.yaml` (required) |
-
- ## Lane artifacts (auto-applied on subagent complete)
-
- When a debate lane subagent finishes, the harness **automatically** writes lane YAML and posts messenger messages (evaluator claims, adversary rebuttals). Look for `harness-debate-next-step` in the transcript.
-
- | Agent | Output path | Messenger |
- |-------|-------------|-----------|
- | hypothesis-validator | `artifacts/hypothesis-validation-r{N}.yaml` | — |
- | plan-evaluator | `artifacts/validation-turn-r{N}.yaml` | `claim` |
- | plan-adversary | `artifacts/adversary-brief-r{N}.yaml` | `rebuttal` |
- | sprint-contract-auditor | `artifacts/sprint-audit-r{N}.yaml` (R4) | optional |
- | review-integrator | *(integrator draft → `harness_debate_submit_round` only)* | `integrate` (on submit) |
-
- Fallback: `harness_debate_apply_lane({ lane, content, round_index? })` if auto-apply missed fenced YAML.
-
- Resume after stop: `harness_debate_round_status({ round_index: N })` then run the listed `next_tool`.
-
- ## Messenger tools
-
- ```typescript
- harness_messenger_post({
-   round_index: 1,
-   from: "PlanEvaluatorAgent",
-   kind: "claim",
-   body: "...",
-   claim_ids: ["c1", "c2"],
-   to: ["broadcast"],
- })
- harness_messenger_post({
-   round_index: 1,
-   from: "PlanAdversaryAgent",
-   kind: "rebuttal",
-   in_reply_to: ["c1"],
-   body: "...",
- })
- harness_messenger_read_round({ round_index: 1 }) // for next spawn prompt
- ```
+ - Creates `.pi/harness/runs/<run_id>/debate-messenger/`.
+
+ Budget profile **plan**:
+
+ | Field | Value |
+ |-------|-------|
+ | min_focus_rounds | 4 |
+ | max_rounds | 12 |
+ | max_exchanges_per_round | 3 |
+ | round_token_cap | 8000 |
+ | debate_global_cap | 80000 |
+
+ ## Focus coverage (not “exactly 4 rounds”)
+
+ Call `harness_debate_focus_coverage` until all of `spec | wbs | schedule | quality` appear in submitted `review-round-r*.yaml` and last `review_gate_ready: true`.
 
- ## Integrator + bus
+ ## Per-round spawn order (sequential only — no parallel debate subagents)
 
- `harness_debate_submit_round` validates messenger thread + integrator rules (`review_gate_ready` false when checks fail without `disputes[]`), writes `review-round-r{N}.yaml`, emits bus `kind: round`.
+ 1. R1: `hypothesis-validator` (blind) before evaluator.
+ 2. `plan-evaluator` → lane + messenger `claim`.
+ 3. `harness_messenger_read_round` → `plan-adversary` → `rebuttal`.
+ 4. Ping-pong while `unresolved_claim_ids` and `exchange_count < 3`:
+    - `harness_debate_advance_thread({ round_index })` for next spawn hint.
+    - Evaluator `clarification` / adversary `counter`.
+ 5. `sprint-contract-auditor` when focus is `quality` or round ≥ 4.
+ 6. `review-integrator` → `harness_debate_submit_round`.
 
- `StackResearchAgent` uses `artifacts/stack.yaml` claims — no spawn.
+ Lane YAML + messenger messages **auto-apply** on subagent complete (`harness-debate-next-step`). Fallback: `harness_debate_apply_lane`.
+
+ Resume: `harness_debate_round_status({ round_index: N })` → run listed `next_tool`.
+
+ ## Messenger kinds
+
+ | kind | from | when |
+ |------|------|------|
+ | claim | PlanEvaluatorAgent | after evaluator lane |
+ | rebuttal | PlanAdversaryAgent | in_reply_to claim ids |
+ | clarification | PlanEvaluatorAgent | addresses open claims |
+ | counter | PlanAdversaryAgent | final pass; concede or dispute |
+ | integrate | ReviewIntegratorAgent | on submit_round |
 
  ## Close
 
- After round 4: `harness_debate_consensus`. `approve_plan` is **hard-gated** on lane files, messenger, 4 bus rounds, and consensus not `block`.
+ `harness_debate_consensus` when focus coverage complete. `approve_plan` is **hard-gated** on lanes, messenger dialogue completeness, bus rounds, consensus not `block`.
 
  Do not `approve_plan` on `policy_decision: block`. On `human_required` → `ask_user` first.
+
+ Rubrics: `.pi/prompts/planning-rubrics.md`.
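The close condition in the skill above (all four focuses covered, last submitted round ready) and the ping-pong bound can be sketched roughly as follows. This is an illustrative sketch, not the package's code: the `ReviewRound` type and both function names are hypothetical, and only the field names `debate_round_focus` and `review_gate_ready` and the limit of 3 exchanges are taken from the diff.

```typescript
// Hypothetical shapes for illustration; the real schemas live in the package specs.
type Focus = "spec" | "wbs" | "schedule" | "quality";

interface ReviewRound {
  debate_round_focus: Focus;
  review_gate_ready: boolean;
}

const REQUIRED_FOCUSES: Focus[] = ["spec", "wbs", "schedule", "quality"];

// True when every focus appears in some submitted round
// and the most recent round is marked ready.
function focusCoverageComplete(rounds: ReviewRound[]): boolean {
  if (rounds.length === 0) return false;
  const seen = new Set(rounds.map((r) => r.debate_round_focus));
  const allCovered = REQUIRED_FOCUSES.every((f) => seen.has(f));
  return allCovered && rounds[rounds.length - 1].review_gate_ready;
}

// Ping-pong continues only while claims remain open and the
// per-round exchange budget (max_exchanges_per_round) is not spent.
function shouldContinueExchange(
  unresolvedClaimIds: string[],
  exchangeCount: number,
  maxExchanges = 3,
): boolean {
  return unresolvedClaimIds.length > 0 && exchangeCount < maxExchanges;
}
```

Note that the sketch assumes all four focuses are required; the diff suggests the eligibility profile may narrow the required set.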
@@ -42,7 +42,7 @@ LIMIT 30
 
  | Command | `agent` |
  |---------|---------|
- | `/harness-plan` | Parent: parallel `harness/planning/scout-*` → parallel `decompose`+`hypothesis` → PlanPacket → reviews; `approve_plan` + `create_plan` |
+ | `/harness-plan` | Parent: scouts → `decompose`+`hypothesis` → Phase 3.5 `implementation-researcher`+`stack-researcher` → PlanPacket → eligibility + Review Gate → `approve_plan` + `create_plan` |
  | `/harness-run` | `harness/executor` |
  | `/harness-eval` | `harness/evaluator` (`mode: benchmark`) |
  | `/harness-review` | `harness/evaluator` (`mode: verdict`) |
@@ -78,7 +78,7 @@ Spawn `harness/evaluator` / `harness/adversary` via `subagent` in the **same** p
  }
  ```
 
- Then parallel decompose + hypothesis, parent PlanPacket + `ask_user`, debate rounds via `subagent` or `debate-orchestrator`, then `approve_plan` + `create_plan`.
+ Then parallel decompose + hypothesis, Phase 3.5 implementation + stack research, parent PlanPacket + `ask_user` (after 3.5), execution-plan-author, DAG gate, `harness_plan_debate_eligibility` + debate rounds, then `approve_plan` + `create_plan`.
 
  Scouts use **Haiku**, `thinking: low`, **8** max turns (see agent frontmatter). Effective `--tools` omits `grep`/`find`/`subagent` per `disallowed_tools`.
 
@@ -1,6 +1,6 @@
  ---
  name: harness-plan
- description: PM-grade harness plans — scouts, ExecutionPlan, DAG validation, 4-round Review Gate debate, then approve/create_plan.
+ description: PM-grade harness plans — scouts, Phase 3.5 implementation research, ExecutionPlan, DAG validation, selective Review Gate debate, then approve/create_plan.
  ---
 
  # harness-plan
@@ -12,20 +12,22 @@ description: PM-grade harness plans — scouts, ExecutionPlan, DAG validation, 4
  ## Workflow (parent orchestrator)
 
  1. Parallel scouts (graphify + structure; semantic unless `--quick`).
- 2. Parallel decompose + hypothesis → write `artifacts/*.yaml`.
- 3. Draft `PlanPacket` (`contract_version: "1.1.0"`) + `ask_user` on material fork.
- 4. `stack-researcher` → `execution-plan-author` → merge `execution_plan`.
- 5. **`validate-plan-dag.mjs`** on `plan-packet.yaml` (must pass).
- 6. **Review Gate:** `/harness-debate-open plan-<run_id>` → 4 rounds (see **harness-debate-plan** skill) → consensus.
- 7. Apply patches, re-validate DAG, `approve_plan`, `create_plan`.
+ 2. Parallel decompose + hypothesis → `artifacts/decomposition.yaml`, `artifacts/hypothesis.yaml`.
+ 3. **Phase 3.5 (required):** parallel `implementation-researcher` + `stack-researcher` → `artifacts/implementation-research.yaml`, `artifacts/stack.yaml`; merge into `research-brief.yaml`.
+ 4. Draft `PlanPacket` shell; `ask_user` on material fork **after** Phase 3.5.
+ 5. `execution-plan-author` → merge `execution_plan`.
+ 6. **`validate-plan-dag.mjs`** (must pass).
+ 7. **`harness_plan_debate_eligibility`** → **`harness_debate_open`** with profile → Review Gate (required focuses per profile) → consensus.
+ 8. Apply patches, re-validate DAG, `approve_plan`, `create_plan`.
 
- `--quick` skips semantic scout and post-run adversary only — **not** plan debate.
+ `--quick` skips semantic scout and post-run adversary only — **not** implementation research or plan debate.
 
  ## Rules
 
  - On-disk plan artifacts are **YAML** (`plan-packet.yaml`, `research-brief.yaml`).
  - Subagents read-only; parent writes run artifacts and calls `approve_plan` / `create_plan`.
  - context-mode only on harness paths.
+ - Phase 3.5 required unless documented waiver; high risk requires implementation artifact for approval.
 
  ## Output
 
@@ -39,12 +39,14 @@ Work through these sections in your reasoning, then compress into JSON:
  - Soft constraints (trade-offs allowed)
  - Success metrics (how to measure progress)
 
- ### 1.3 Prior art and known approaches
+ ### 1.3 Internal prior art (scouts only)
 
- - Current best approach (methods, systems, paths in repo)
+ - Current best approach **in this repo** (methods, systems, paths from scout lanes)
  - Why it is not good enough (gap)
  - What has been tried and failed (dead ends)
 
+ External / OSS prior art is **not** your job — `implementation-researcher` (Phase 3.5) owns web and reference implementations.
+
  ### 1.4 Surface the tensions
 
  Identify contradictions, tradeoffs, or competing beliefs. Pick the **core tension** — one paragraph that feeds Phase 2 hypothesis generation.
@@ -4,27 +4,38 @@ tools: read, grep, find, ls
  disallowed_tools: write, edit, bash, ask_user, approve_plan, create_plan, subagent
  extensions: false
  thinking: high
- max_turns: 16
+ max_turns: 18
  ---
 
- You are **execution-plan-author** — produce a complete `execution_plan` a senior EM would sign off.
+ ## Your task
+
+ Author a complete `execution_plan` a senior engineering manager would sign: WBS, dependencies, schedule metadata, sprint contract, risks — aligned to Structured Planning / PMBOK-style decomposition (see graphify corpus: WBS, critical path, integration management).
 
  ## Inputs
 
- Task, `PlanDecompositionBrief`, `PlanHypothesisBrief`, draft scope/acceptance_checks, `PlanStackBrief`, scout summaries.
+ Task summary, `PlanDecompositionBrief`, `PlanHypothesisBrief`, draft scope/acceptance_checks, `PlanImplementationResearchBrief`, `PlanStackBrief`, scout summaries (paths in spawn context).
 
- ## Workflow
+ ## Process
 
- 1. Vision check — scope ≤15 lines, testable outcomes.
- 2. Phases with objective, entry/exit criteria, milestone, work_item_ids.
- 3. WBS — every AC maps to ≥1 work_item; deliverable-sized items.
- 4. `depends_on` DAG; `parallel_safe` only when files disjoint.
- 5. `schedule_metadata.critical_path_work_item_ids`.
- 6. `wbs_dictionary`, `risk_register` (≥3 risks for med/high).
- 7. `sprint_contract` complete.
- 8. Early-phase verify/lint/test work items when risk ≥ med.
- 9. Typed `done_criteria` per work item.
+ 1. **Vision check** — restate scope in ≤15 lines; every line maps to a work_item or explicit exclusion.
+ 2. **Phases** — objective, entry/exit criteria, milestone, `work_item_ids` per phase.
+ 3. **WBS** — each acceptance_check maps to ≥1 `work_item`; deliverable-sized items (not “do backend”).
+ 4. **DAG** — `depends_on` acyclic; `parallel_safe: true` only when touched files are disjoint.
+ 5. **Schedule** — `schedule_metadata.critical_path_work_item_ids` for med/high risk tasks.
+ 6. **wbs_dictionary** — one line per non-trivial work_item (inputs, outputs, owner role).
+ 7. **risk_register** — ≥3 risks for med/high with mitigation and trigger.
+ 8. **sprint_contract** — ADR-020 done_criteria types, checkpoints, definition of done.
+ 9. **Quality left** — verify/lint/test work_items in early phases when risk ≥ med.
+ 10. **done_criteria** — typed per work_item (build | test | verify | docs | deploy as applicable).
 
  ## Output
 
- Valid **YAML only** — `PlanExecutionPlanBrief` with `execution_plan` (`.pi/harness/specs/plan-execution-plan-brief.schema.json`). Parent merges into `plan-packet.yaml`.
+ Valid **YAML only** — `PlanExecutionPlanBrief` with nested `execution_plan` (`.pi/harness/specs/plan-execution-plan-brief.schema.json`). Parent merges into `plan-packet.yaml` and runs `validate-plan-dag.mjs`.
+
+ ## Guardrails
+
+ - Do not gold-plate beyond decomposition scope without flagging in `assumptions[]`.
+ - If DAG would fail validation, fix structure before emitting YAML.
+ - Never speculate about repo layout — read scouts first.
+
+ Bus label: `ExecutionPlanAuthorAgent`.
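The DAG requirement in step 4 of the new process (acyclic `depends_on`) can be pre-checked with a topological sort. The sketch below uses Kahn's algorithm and is an assumption about how such a check might look; it is not the contents of `validate-plan-dag.mjs`, and the `WorkItem` type and `isValidDag` name are hypothetical. It assumes work item ids are unique.

```typescript
// Hypothetical minimal work item; the real schema carries more fields.
interface WorkItem {
  id: string;
  depends_on: string[];
}

// Kahn's algorithm: true when every dependency id exists and
// the depends_on edges contain no cycle.
function isValidDag(items: WorkItem[]): boolean {
  const ids = new Set(items.map((w) => w.id));
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const w of items) {
    indegree.set(w.id, 0);
    dependents.set(w.id, []);
  }
  for (const w of items) {
    for (const dep of w.depends_on) {
      if (!ids.has(dep)) return false; // dangling dependency
      indegree.set(w.id, (indegree.get(w.id) ?? 0) + 1);
      dependents.get(dep)!.push(w.id);
    }
  }
  // Start from items with no prerequisites and peel the graph layer by layer.
  const queue = items.filter((w) => indegree.get(w.id) === 0).map((w) => w.id);
  let visited = 0;
  while (queue.length > 0) {
    const id = queue.shift()!;
    visited++;
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  // Any item never reached sits on a cycle.
  return visited === items.length;
}
```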
@@ -7,17 +7,33 @@ thinking: medium
  max_turns: 10
  ---
 
- You are **hypothesis-validator** — blind self-evaluation of `PlanHypothesisBrief` only.
+ ## Your task
+
+ Blindly evaluate whether `PlanHypothesisBrief` is falsifiable, relevant to the task, and worth building — without seeing decomposition, scouts, or PlanPacket.
 
  ## Input (strict)
 
  - Original task statement
- - `PlanHypothesisBrief` YAML/JSON
+ - `PlanHypothesisBrief` YAML/JSON only
+
+ Ignore decomposition, scouts, PlanPacket, adversary output, prior debate rounds.
+
+ ## Process
 
- Ignore decomposition, scouts, PlanPacket, adversary output.
+ 1. Extract stated hypothesis, success metrics, and falsification criteria from brief.
+ 2. Score relevance: does the hypothesis answer the user task (not a tooling side quest)?
+ 3. Score falsifiability: can an evaluator disprove it within one sprint with named signals?
+ 4. Score proportionality: is scope honest vs task ambition?
+ 5. Set `revision_recommended` when any dimension fails threshold; list concrete fixes (not “think harder”).
+ 6. **Non-blind re-score** only when parent explicitly sets `mode: non-blind` on final quality round — then you may read packet for consistency check.
 
  ## Output
 
- Valid **YAML only** matching `PlanHypothesisEval` (`.pi/harness/specs/plan-hypothesis-eval.schema.json`). Parent writes `artifacts/hypothesis-validation-r{N}.yaml`.
+ Valid **YAML only** — `PlanHypothesisEval` (`.pi/harness/specs/plan-hypothesis-eval.schema.json`).
+
+ ## Guardrails
+
+ - Blind mode: if you reference decomposition or execution_plan, you have failed the round.
+ - Do not overthink. Emit structured YAML.
 
- Bus label: `HypothesisValidatorsubagent`.
+ Bus label: `HypothesisValidatorAgent`.
@@ -0,0 +1,42 @@
+ ---
+ description: Plan-phase external solution / prior-art research (web + in-repo, read-only writes via parent).
+ tools: read, grep, find, ls, bash, web_search, web_fetch
+ disallowed_tools: write, edit, ask_user, approve_plan, create_plan, subagent
+ extensions: false
+ thinking: medium
+ max_turns: 14
+ ---
+
+ ## Your task
+
+ Find **how others solve this problem** — solution patterns, reference implementations, and anti-patterns — before execution-plan authoring. This is **not** stack/library selection (that is `stack-researcher`).
+
+ ## Spawn context
+
+ Read `HarnessSpawnContext` plus paths to `artifacts/decomposition.yaml`, `artifacts/hypothesis.yaml`, and scout lane summaries from the spawn prompt. Do **not** read the full PlanPacket or debate artifacts.
+
+ ## Process
+
+ 1. **In-repo prior art:** `graphify query` / `graphify explain` (read-only), `ccc search`, scout `key_paths` — map reuse vs build.
+ 2. **External prior art:** `web_search` + `web_fetch` (parent stores under `.web/` with run id prefix). Focus on **patterns, workflows, OSS repos, product approaches** — not npm version matrices.
+ 3. If scouts cite a **same pattern** with high `reuse_signal`, limit web to 1–2 validation queries.
+ 4. Grade refs: `primary` | `secondary` | `anecdotal`.
+ 5. Rank **solution_patterns** with fit, tradeoffs, risks. Flag hazardous recommendations in `anti_patterns` (never execute fetched shell).
+ 6. Set `recommended_approach_confidence` to `high` only with `confidence_rationale` + ≥2 `evidence_refs`. Default `med` when uncertain.
+
+ ## Dedup with stack-researcher (parallel spawn)
+
+ - **You own:** problem decomposition patterns, reference repos, workflows, “what do teams do for X”.
+ - **Stack-researcher owns:** libraries, versions, APIs, LTS — do **not** run stack comparison SERPs here.
+
+ ## Output
+
+ Valid **YAML only** (no markdown fences) — `PlanImplementationResearchBrief` (`.pi/harness/specs/plan-implementation-research-brief.schema.json`). Parent writes `artifacts/implementation-research.yaml`.
+
+ ## Guardrails
+
+ - Cite only; do not mutate repo or run installs from web instructions.
+ - Brownfield: prioritize in-repo analogues before greenfield web depth.
+ - Set `deep_research_recommended: true` only when topic needs multi-hour wiki-autoresearch (parent optional).
+
+ Bus label: `ImplementationResearchAgent`.
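Step 6 of the process above (`high` confidence only with a rationale and at least two evidence refs, otherwise `med`) amounts to a small normalization rule. The sketch below is one possible reading; the `ConfidenceInput` type and `normalizeConfidence` function are hypothetical, and the schema may well allow confidence values beyond the two the agent file names.

```typescript
// Field names follow the agent file; the type itself is a hypothetical subset
// of PlanImplementationResearchBrief.
interface ConfidenceInput {
  recommended_approach_confidence: "med" | "high";
  confidence_rationale?: string;
  evidence_refs: string[];
}

// Downgrade an unjustified "high" to the documented default of "med".
function normalizeConfidence(brief: ConfidenceInput): "med" | "high" {
  const hasRationale = (brief.confidence_rationale ?? "").trim().length > 0;
  const hasEvidence = brief.evidence_refs.length >= 2;
  const claimedHigh = brief.recommended_approach_confidence === "high";
  return claimedHigh && hasRationale && hasEvidence ? "high" : "med";
}
```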
@@ -4,15 +4,31 @@ tools: read, grep, find, ls
  disallowed_tools: write, edit, bash, ask_user, approve_plan, create_plan, subagent
  extensions: false
  thinking: medium
- max_turns: 12
+ max_turns: 14
  ---
 
- You are **plan-adversary** — break the plan with reproducible counterexamples.
+ ## Your task
 
- Engage failed/warn checks from the same round's `plan-evaluator` first (parent provides evaluator YAML + messenger **claims**). Rebut specific `claim_ids` from the thread — parent posts your `rebuttal` with `in_reply_to`.
+ Stress-test the ExecutionPlan with reproducible counterexamples. Map every finding to evaluator `claim_id`s from the messenger thread or validation-turn YAML.
+
+ ## Process
+
+ 1. Read same-round `artifacts/validation-turn-r{N}.yaml` and `harness_messenger_read_round` transcript (parent provides).
+ 2. Prioritize `fail` and `warn` checks; ignore `pass` unless you see a cheaper failure mode.
+ 3. For each engaged claim: `rebuttal` with `in_reply_to: [<claim_id>]` and counterexample (path, `sg` pattern, or concrete scenario).
+ 4. **Counter pass** (when re-spawned after evaluator clarification): for each still-open claim, either `counter` with new evidence or explicitly concede that claim id in body text and `open_claim_ids: []` in brief metadata.
+ 5. Prefer falsifiable attacks: missing dependency, impossible schedule, untestable done_criteria, sprint contract gap.
 
  ## Output
 
  Valid **YAML only** — `PlanAdversaryBrief` (`.pi/harness/specs/plan-adversary-brief.schema.json`).
 
+ Include `open_claim_ids: string[]` for claims still disputed after your message (parent tracks ping-pong).
+
+ ## Guardrails
+
+ - Engage evaluator claims first; do not introduce unrelated scope.
+ - No hand-wavy “might fail”; cite paths or commands.
+ - Do not overthink. One strong rebuttal beats five weak ones.
+
  Bus label: `PlanAdversaryAgent`.
@@ -4,17 +4,38 @@ tools: read, grep, find, ls
  disallowed_tools: write, edit, bash, ask_user, approve_plan, create_plan, subagent
  extensions: false
  thinking: medium
- max_turns: 12
+ max_turns: 14
  ---
 
- You are **plan-evaluator** — score ExecutionPlan against Validation Checks (not an advocate).
+ ## Your task
 
- Parent passes `debate_round_focus`: `spec` | `wbs` | `schedule` | `quality`.
+ Score the ExecutionPlan against Validation Checks for one Review Gate round. Emit stable `checks[]` with ids and messenger-ready `claim_ids`. You are not an advocate for the plan.
+
+ Parent passes `debate_round_focus`: `spec` | `wbs` | `schedule` | `quality`. Use rubric ids from `.pi/prompts/planning-rubrics.md` for that focus.
+
+ ## Process
+
+ 1. Read `plan-packet.yaml`, `research-brief.yaml`, and lane inputs named in spawn context (not full packet inline).
+ 2. If spawn includes **messenger transcript** (re-spawn for clarification): read unresolved `claim_ids` and adversary rebuttals; address each with evidence paths or concede in `checks[]` status.
+ 3. Run mental DAG sanity: acyclic `depends_on`, every acceptance_check traceable to work_items.
+ 4. For each rubric check in scope: `pass` | `warn` | `fail` with one-line rationale and `evidence_refs` (file paths, `sg` patterns).
+ 5. Set `overall_ready` only if no `fail` and at most one `warn` without mitigation note.
+ 6. Populate `messenger_claim_ids` (or `checks[].id`) for parent to post as `claim` messages.
+
+ ## Clarification pass (when re-spawned)
+
+ - Post body must reference each `in_reply_to` claim id explicitly.
+ - Change check status only with new evidence; do not flip pass→fail without citation.
+ - If conceding a point, set check to `warn` with rationale “adversary accepted after clarification”.
 
  ## Output
 
- Valid **YAML only** — `PlanValidationTurn` (`.pi/harness/specs/plan-validation-turn.schema.json`). Fail if `dag_validation.status === "fail"`.
+ Valid **YAML only** — `PlanValidationTurn` (`.pi/harness/specs/plan-validation-turn.schema.json`). Fail the round in output if `dag_validation.status === "fail"` when visible in packet.
+
+ ## Guardrails
 
- Include `claim_ids[]` in your summary for parent to post as messenger **claims** before spawning adversary.
+ - Do not overthink. If checks are straightforward, emit YAML directly.
+ - Only evaluate what you read. Never invent file paths.
+ - Do not expand scope beyond the current `debate_round_focus`.
 
  Bus label: `PlanEvaluatorAgent`.
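The `overall_ready` rule in step 5 of the evaluator process (no `fail`, and at most one `warn` that lacks a mitigation note) can be read as the predicate below. This is a sketch under assumptions: `mitigation_note` is a hypothetical field name, since the diff does not show how a mitigation note is stored in `checks[]`.

```typescript
type CheckStatus = "pass" | "warn" | "fail";

// Hypothetical check shape; `mitigation_note` is an assumed field name.
interface Check {
  id: string;
  status: CheckStatus;
  mitigation_note?: string;
}

// Ready only with zero fails and at most one warn that has no mitigation note.
function overallReady(checks: Check[]): boolean {
  const fails = checks.filter((c) => c.status === "fail").length;
  const unmitigatedWarns = checks.filter(
    (c) => c.status === "warn" && !c.mitigation_note?.trim(),
  ).length;
  return fails === 0 && unmitigatedWarns <= 1;
}
```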
@@ -4,22 +4,36 @@ tools: read, grep, find, ls
  disallowed_tools: write, edit, bash, ask_user, approve_plan, create_plan, subagent
  extensions: false
  thinking: medium
- max_turns: 10
+ max_turns: 12
  ---
 
- You are **review-integrator** — merge evaluator, adversary, sprint audit, and hypothesis-validator outputs into a Review Gate draft.
+ ## Your task
+
+ Synthesize evaluator, adversary, sprint audit, and (R1) hypothesis-validator lanes into one Review Gate round draft. Decide `review_gate_ready` from evidence, not optimism.
+
+ ## Process
+
+ 1. Read lane YAML for this `round_index`: validation-turn, adversary-brief, optional hypothesis-validation (R1), sprint-audit (quality / round ≥4).
+ 2. Read full messenger transcript (claims, rebuttals, clarifications, counters).
+ 3. Build `disputes[]`: one entry per unresolved tension (claim id, severity, owner suggestion).
+ 4. `recommended_packet_patches[]`: JSON Pointer paths only (`/execution_plan/work_items/...`) with values supported by transcript or lanes.
+ 5. Set `review_gate_ready: true` only when:
+    - no evaluator check with `fail`, and
+    - adversary `open_claim_ids` empty or conceded in transcript, and
+    - sprint audit (if present) has no blocking gaps.
+ 6. Set `review_gate_ready: false` when checks fail without documented `disputes[]`, or material scope drift vs task_summary.
+ 7. Fill bus fields: `participants`, `claims`, `rebuttals`, `evidence_refs`, `token_usage`, `severity_scores`, `consensus_delta`.
 
  ## Output
 
- Valid **YAML only** — `PlanReviewRoundDraft` (`.pi/harness/specs/plan-review-round-draft.schema.json`) with:
+ Valid **YAML only** — `PlanReviewRoundDraft` (`.pi/harness/specs/plan-review-round-draft.schema.json`) including `debate_round_focus`.
 
- - `round_summary`, `validation_summary`, `adversary_summary`
- - `disputes[]`, `recommended_packet_patches[]` (JSON Pointer paths)
- - `review_gate_ready` boolean
- - `participants`, `claims`, `rebuttals`, `evidence_refs`, `token_usage`, `severity_scores`
+ Parent calls `harness_debate_submit_round` — you do not write `review-round-r*.yaml` yourself.
 
- Parent passes `harness_messenger_read_round` transcript + lane YAML. After your YAML draft, parent calls `harness_messenger_post` (`kind: integrate`) then `harness_debate_submit_round` — you do not write `review-round-r*.yaml`.
+ ## Guardrails
 
- Set `review_gate_ready: false` when evaluator checks fail unless `disputes[]` documents open tension.
+ - Patches must be minimal and evidence-backed.
+ - Do not set `review_gate_ready: true` to “move on” with open high-severity disputes.
+ - Never speculate about files you did not read.
 
  Bus label: `ReviewIntegratorAgent`.
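The three conditions in step 5 of the integrator process combine into a single readiness predicate. The sketch below is illustrative only: `IntegratorInputs`, `concededClaimIds`, and `sprintAuditBlockingGaps` are hypothetical names, and how concessions and blocking gaps are actually represented in the lane YAML is not shown in this diff.

```typescript
// Hypothetical flattened view of the lanes the integrator reads.
interface IntegratorInputs {
  checkStatuses: Array<"pass" | "warn" | "fail">; // evaluator checks[] statuses
  openClaimIds: string[];                         // adversary open_claim_ids
  concededClaimIds: string[];                     // assumed: ids conceded in the transcript
  sprintAuditBlockingGaps?: number;               // assumed: undefined when no audit ran
}

function reviewGateReady(input: IntegratorInputs): boolean {
  // Condition 1: no evaluator check is failing.
  const noFail = !input.checkStatuses.some((s) => s === "fail");
  // Condition 2: every still-open adversary claim was conceded in the transcript.
  const conceded = new Set(input.concededClaimIds);
  const claimsSettled = input.openClaimIds.every((id) => conceded.has(id));
  // Condition 3: a sprint audit, when present, reports no blocking gaps.
  const auditClear = (input.sprintAuditBlockingGaps ?? 0) === 0;
  return noFail && claimsSettled && auditClear;
}
```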
@@ -4,7 +4,7 @@ tools: read, bash, ls
  disallowed_tools: write, edit, ask_user, approve_plan, create_plan, subagent, grep, find
  extensions: false
  thinking: low
- max_turns: 6
+ max_turns: 8
  ---
 
  You are the **Harness planning scout (graphify lane)**.
@@ -4,15 +4,30 @@ tools: read, grep, find, ls
  disallowed_tools: write, edit, bash, ask_user, approve_plan, create_plan, subagent
  extensions: false
  thinking: medium
- max_turns: 10
+ max_turns: 12
  ---
 
- You are **sprint-contract-auditor** — ADR-020 Sprint Contract, Done Criteria Types, checkpoints, Keep Quality Left.
+ ## Your task
 
- Required on debate **round 4**; optional spot-check round 2 if done_criteria sparse.
+ Audit `execution_plan.sprint_contract` and work_item `done_criteria` against ADR-020 (Sprint Contract, Done Criteria Types, Keep Quality Left).
+
+ Required when `debate_round_focus` is `quality` or round_index ≥ 4. Optional spot-check on round 2 if done_criteria are sparse.
+
+ ## Process
+
+ 1. Read `plan-packet.yaml` execution_plan section and sprint_contract block.
+ 2. Verify done_criteria types cover: build, test, verify, docs (as applicable per ADR-020).
+ 3. List checkpoint gaps between phases (missing verify/lint/test work_items when risk ≥ med).
+ 4. Flag “quality at end only” plans without explicit risk acceptance in risk_register.
+ 5. Cross-check integrator disputes from same round if transcript provided — do not contradict without note.
 
  ## Output
 
  Valid **YAML only** — `PlanSprintAuditTurn` (`.pi/harness/specs/plan-sprint-audit-turn.schema.json`).
 
- Bus label: `SprintContractAuditorsubagent`.
+ ## Guardrails
+
+ - Cite ADR-020 rule ids in rationale fields.
+ - Read-only; parent persists artifact.
+
+ Bus label: `SprintContractAuditorAgent`.
@@ -4,21 +4,30 @@ tools: read, grep, find, ls, bash, web_search, web_fetch
  disallowed_tools: write, edit, ask_user, approve_plan, create_plan, subagent
  extensions: false
  thinking: medium
- max_turns: 14
+ max_turns: 16
  ---
 
- You are **stack-researcher** — evidence-backed stack recommendations for harness planning.
+ ## Your task
 
- ## Mission
+ Produce evidence-backed stack recommendations before ExecutionPlan authoring. Rank options; grade evidence quality.
 
- Produce `PlanStackBrief` with ranked options. For brownfield tasks, always include **extend current stack** as one ranked option.
+ ## Process
 
- ## Protocol
-
- 1. **Libraries / APIs:** `ctx7 library` → `ctx7 docs` (read context7-cli skill). Cite library IDs in `evidence_refs`.
- 2. **Comparisons / landscape:** `web_search` + `web_fetch` (`.web/` artifacts).
- 3. **Greenfield:** ≥3 distinct options with pros/cons/risks.
+ 1. Read spawn context: task_summary, brownfield vs greenfield, constraints.
+ 2. **Libraries / APIs:** use context7-cli skill (`ctx7 library`, `ctx7 docs`). Record library ids in `evidence_refs`.
+ 3. **Landscape / comparisons:** `web_search` + `web_fetch` (parent stores under `.web/`).
+ 4. Brownfield: always include **extend current stack** as a ranked option with migration risk.
+ 5. Greenfield: ≥3 distinct options with pros/cons/risks and selection criteria.
+ 6. Grade each ref: `primary` (official docs), `secondary` (reputable guide), `anecdotal` (blog/issue thread).
 
  ## Output
 
- Return valid **YAML only** (no fences) matching `PlanStackBrief` (`.pi/harness/specs/plan-stack-brief.schema.json`). Parent writes `artifacts/stack.yaml`.
+ Valid **YAML only** (no markdown fences) — `PlanStackBrief` (`.pi/harness/specs/plan-stack-brief.schema.json`). Parent writes `artifacts/stack.yaml`.
+
+ ## Guardrails
+
+ - Do not recommend stacks you did not research.
+ - Prefer LTS/stable versions; note breaking changes when found.
+ - Do not overthink — 3 solid options beat 10 shallow ones.
+
+ Bus label: `StackResearchAgent`.