ultimate-pi 0.14.0 → 0.16.0
- package/.agents/skills/harness-debate-plan/SKILL.md +41 -61
- package/.agents/skills/harness-governor/SKILL.md +11 -0
- package/.agents/skills/harness-orchestration/SKILL.md +5 -3
- package/.agents/skills/harness-plan/SKILL.md +11 -9
- package/.pi/agents/harness/adversary.md +1 -1
- package/.pi/agents/harness/evaluator.md +1 -1
- package/.pi/agents/harness/executor.md +1 -1
- package/.pi/agents/harness/incident-recorder.md +1 -1
- package/.pi/agents/harness/meta-optimizer.md +1 -1
- package/.pi/agents/harness/planning/decompose.md +8 -35
- package/.pi/agents/harness/planning/execution-plan-author.md +27 -15
- package/.pi/agents/harness/planning/hypothesis-validator.md +23 -6
- package/.pi/agents/harness/planning/hypothesis.md +4 -27
- package/.pi/agents/harness/planning/implementation-researcher.md +43 -0
- package/.pi/agents/harness/planning/plan-adversary.md +20 -5
- package/.pi/agents/harness/planning/plan-evaluator.md +28 -6
- package/.pi/agents/harness/planning/review-integrator.md +23 -10
- package/.pi/agents/harness/planning/scout-graphify.md +4 -23
- package/.pi/agents/harness/planning/scout-semantic.md +3 -18
- package/.pi/agents/harness/planning/scout-structure.md +3 -18
- package/.pi/agents/harness/planning/sprint-contract-auditor.md +22 -6
- package/.pi/agents/harness/planning/stack-researcher.md +21 -11
- package/.pi/agents/harness/tie-breaker.md +1 -1
- package/.pi/agents/harness/trace-librarian.md +1 -1
- package/.pi/extensions/budget-guard.ts +33 -19
- package/.pi/extensions/harness-debate-tools.ts +280 -19
- package/.pi/extensions/harness-live-widget.ts +39 -159
- package/.pi/extensions/harness-plan-approval.ts +47 -5
- package/.pi/extensions/harness-run-context.ts +96 -2
- package/.pi/extensions/harness-subagent-submit.ts +195 -0
- package/.pi/extensions/lib/debate-bus-core.ts +108 -17
- package/.pi/extensions/lib/debate-bus-state.ts +6 -0
- package/.pi/extensions/lib/harness-subagent-policy.ts +45 -0
- package/.pi/extensions/lib/harness-subagent-submit-pipeline.ts +82 -0
- package/.pi/extensions/lib/harness-subagent-submit-registry.ts +172 -0
- package/.pi/extensions/lib/harness-subagents-bridge.ts +42 -0
- package/.pi/extensions/lib/plan-approval/plan-review.ts +56 -0
- package/.pi/extensions/lib/plan-approval/types.ts +1 -0
- package/.pi/extensions/lib/plan-debate-eligibility.ts +214 -0
- package/.pi/extensions/lib/plan-debate-focus.ts +151 -0
- package/.pi/extensions/lib/plan-debate-gate.ts +88 -34
- package/.pi/extensions/lib/plan-debate-lane.ts +15 -0
- package/.pi/extensions/lib/plan-debate-lanes.ts +44 -0
- package/.pi/extensions/lib/plan-debate-round-status.ts +63 -20
- package/.pi/extensions/lib/plan-messenger.ts +93 -17
- package/.pi/extensions/policy-gate.ts +1 -1
- package/.pi/harness/README.md +1 -1
- package/.pi/harness/agents.manifest.json +25 -21
- package/.pi/harness/docs/adrs/0034-darwin-plan-research-pipeline.md +1 -3
- package/.pi/harness/docs/adrs/0035-plan-phase-review-gate.md +13 -5
- package/.pi/harness/docs/adrs/0036-implementation-research-and-selective-debate.md +51 -0
- package/.pi/harness/docs/adrs/0037-subagent-submit-tools.md +31 -0
- package/.pi/harness/docs/adrs/0038-budget-telemetry-only.md +23 -0
- package/.pi/harness/docs/adrs/README.md +4 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/implementation-research.yaml +28 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/review-round-r1.yaml +24 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/artifacts/review-round-r2.yaml +25 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/plan-packet.yaml +196 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/plan-review.md +14 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-low-light/research-brief.yaml +62 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/implementation-research.yaml +28 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/review-round-r2.yaml +24 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/artifacts/review-round-r3.yaml +24 -0
- package/.pi/harness/evals/smoke/fixtures/plan-phase/minimal-med/research-brief.yaml +29 -0
- package/.pi/harness/evals/smoke/smoke-harness-plan.mjs +97 -16
- package/.pi/harness/specs/harness-executor-handoff.schema.json +19 -0
- package/.pi/harness/specs/harness-human-required.schema.json +16 -0
- package/.pi/harness/specs/plan-implementation-research-brief.schema.json +128 -0
- package/.pi/harness/specs/plan-review-round-draft.schema.json +1 -1
- package/.pi/harness/specs/plan-scout-findings.schema.json +19 -0
- package/.pi/harness/specs/round-result.schema.json +15 -2
- package/.pi/lib/harness-agent-output.ts +45 -0
- package/.pi/lib/harness-budget-enforce.ts +18 -0
- package/.pi/lib/harness-schema-validate.ts +89 -0
- package/.pi/lib/harness-spawn-parse.ts +86 -0
- package/.pi/lib/harness-subagent-submit-path.ts +41 -0
- package/.pi/lib/harness-ui-state.ts +107 -2
- package/.pi/prompts/harness-auto.md +2 -2
- package/.pi/prompts/harness-plan.md +94 -42
- package/.pi/prompts/harness-run.md +2 -2
- package/.pi/prompts/planning-rubrics.md +31 -0
- package/.pi/scripts/harness-verify.mjs +2 -0
- package/.pi/scripts/harness_web/__pycache__/__init__.cpython-314.pyc +0 -0
- package/.pi/scripts/harness_web/__pycache__/config.cpython-314.pyc +0 -0
- package/.pi/scripts/harness_web/__pycache__/output.cpython-314.pyc +0 -0
- package/.pi/scripts/harness_web/__pycache__/scrape.cpython-314.pyc +0 -0
- package/.pi/scripts/harness_web/__pycache__/search.cpython-314.pyc +0 -0
- package/.pi/scripts/harness_web/__pycache__/search_ddg.cpython-314.pyc +0 -0
- package/.pi/scripts/harness_web/__pycache__/search_searxng.cpython-314.pyc +0 -0
- package/CHANGELOG.md +21 -0
- package/package.json +4 -2
- package/vendor/pi-subagents/src/subagents.ts +29 -3
|
@@ -5,7 +5,7 @@ description: Plan-phase Review Gate debate — pi-messenger threads, lane YAML,
|
|
|
5
5
|
|
|
6
6
|
# harness-debate-plan
|
|
7
7
|
|
|
8
|
-
Use when running **Phase 5** of `/harness-plan` —
|
|
8
|
+
Use when running **Phase 5** of `/harness-plan` — outcome-based Review Gate with **within-round dialogue** (claims → rebuttals → clarifications → counters → integrate), then bus submission.
|
|
9
9
|
|
|
10
10
|
## Open
|
|
11
11
|
|
|
@@ -14,71 +14,51 @@ harness_debate_open({})
|
|
|
14
14
|
```
|
|
15
15
|
|
|
16
16
|
- Debate id is always `plan-<run_id>` (tool normalizes wrong ids).
|
|
17
|
-
- Creates `.pi/harness/runs/<run_id>/debate-messenger
|
|
18
|
-
|
|
19
|
-
Budget profile **plan**:
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
| Round | Extra lane artifacts |
|
|
33
|
-
|-------|----------------------|
|
|
34
|
-
| 1 | `hypothesis-validation-r1.yaml` |
|
|
35
|
-
| 4 | `sprint-audit-r4.yaml` (required) |
|
|
36
|
-
|
|
37
|
-
## Lane artifacts (auto-applied on subagent complete)
|
|
38
|
-
|
|
39
|
-
When a debate lane subagent finishes, the harness **automatically** writes lane YAML and posts messenger messages (evaluator claims, adversary rebuttals). Look for `harness-debate-next-step` in the transcript.
|
|
40
|
-
|
|
41
|
-
| Agent | Output path | Messenger |
|
|
42
|
-
|-------|-------------|-----------|
|
|
43
|
-
| hypothesis-validator | `artifacts/hypothesis-validation-r{N}.yaml` | — |
|
|
44
|
-
| plan-evaluator | `artifacts/validation-turn-r{N}.yaml` | `claim` |
|
|
45
|
-
| plan-adversary | `artifacts/adversary-brief-r{N}.yaml` | `rebuttal` |
|
|
46
|
-
| sprint-contract-auditor | `artifacts/sprint-audit-r{N}.yaml` (R4) | optional |
|
|
47
|
-
| review-integrator | *(integrator draft → `harness_debate_submit_round` only)* | `integrate` (on submit) |
|
|
48
|
-
|
|
49
|
-
Fallback: `harness_debate_apply_lane({ lane, content, round_index? })` if auto-apply missed fenced YAML.
|
|
50
|
-
|
|
51
|
-
Resume after stop: `harness_debate_round_status({ round_index: N })` then run the listed `next_tool`.
|
|
52
|
-
|
|
53
|
-
## Messenger tools
|
|
54
|
-
|
|
55
|
-
```typescript
|
|
56
|
-
harness_messenger_post({
|
|
57
|
-
round_index: 1,
|
|
58
|
-
from: "PlanEvaluatorAgent",
|
|
59
|
-
kind: "claim",
|
|
60
|
-
body: "...",
|
|
61
|
-
claim_ids: ["c1", "c2"],
|
|
62
|
-
to: ["broadcast"],
|
|
63
|
-
})
|
|
64
|
-
harness_messenger_post({
|
|
65
|
-
round_index: 1,
|
|
66
|
-
from: "PlanAdversaryAgent",
|
|
67
|
-
kind: "rebuttal",
|
|
68
|
-
in_reply_to: ["c1"],
|
|
69
|
-
body: "...",
|
|
70
|
-
})
|
|
71
|
-
harness_messenger_read_round({ round_index: 1 }) // for next spawn prompt
|
|
72
|
-
```
|
|
17
|
+
- Creates `.pi/harness/runs/<run_id>/debate-messenger/`.
|
|
18
|
+
|
|
19
|
+
Budget profile **plan**:
|
|
20
|
+
|
|
21
|
+
| Field | Value |
|
|
22
|
+
|-------|-------|
|
|
23
|
+
| min_focus_rounds | 4 |
|
|
24
|
+
| max_rounds | 12 |
|
|
25
|
+
| max_exchanges_per_round | 3 |
|
|
26
|
+
| round_token_cap | 8000 |
|
|
27
|
+
| debate_global_cap | 80000 |
|
|
28
|
+
|
|
29
|
+
## Focus coverage (not “exactly 4 rounds”)
|
|
30
|
+
|
|
31
|
+
Call `harness_debate_focus_coverage` until all of `spec | wbs | schedule | quality` appear in submitted `review-round-r*.yaml` files and the last round reports `review_gate_ready: true`.
|
|
73
32
|
|
|
74
|
-
##
|
|
33
|
+
## Per-round spawn order (sequential only — no parallel debate subagents)
|
|
75
34
|
|
|
76
|
-
|
|
35
|
+
1. R1: `hypothesis-validator` (blind) before evaluator.
|
|
36
|
+
2. `plan-evaluator` → lane + messenger `claim`.
|
|
37
|
+
3. `harness_messenger_read_round` → `plan-adversary` → `rebuttal`.
|
|
38
|
+
4. Ping-pong while `unresolved_claim_ids` and `exchange_count < 3`:
|
|
39
|
+
- `harness_debate_advance_thread({ round_index })` for next spawn hint.
|
|
40
|
+
- Evaluator `clarification` / adversary `counter`.
|
|
41
|
+
5. `sprint-contract-auditor` when focus is `quality` or round ≥ 4.
|
|
42
|
+
6. `review-integrator` → `harness_debate_submit_round`.
|
|
77
43
|
|
|
78
|
-
|
|
44
|
+
Lane YAML + messenger messages **auto-apply** on subagent complete (`harness-debate-next-step`). Fallback: `harness_debate_apply_lane`.
|
|
45
|
+
|
|
46
|
+
Resume: `harness_debate_round_status({ round_index: N })` → run listed `next_tool`.
|
|
47
|
+
|
|
48
|
+
## Messenger kinds
|
|
49
|
+
|
|
50
|
+
| kind | from | when |
|
|
51
|
+
|------|------|------|
|
|
52
|
+
| claim | PlanEvaluatorAgent | after evaluator lane |
|
|
53
|
+
| rebuttal | PlanAdversaryAgent | in_reply_to claim ids |
|
|
54
|
+
| clarification | PlanEvaluatorAgent | addresses open claims |
|
|
55
|
+
| counter | PlanAdversaryAgent | final pass; concede or dispute |
|
|
56
|
+
| integrate | ReviewIntegratorAgent | on submit_round |
|
|
79
57
|
|
|
80
58
|
## Close
|
|
81
59
|
|
|
82
|
-
|
|
60
|
+
Call `harness_debate_consensus` once focus coverage is complete. `approve_plan` is **hard-gated** on lanes, messenger dialogue completeness, bus rounds, and a consensus that is not `block`.
|
|
83
61
|
|
|
84
62
|
Do not `approve_plan` on `policy_decision: block`. On `human_required` → `ask_user` first.
|
|
63
|
+
|
|
64
|
+
Rubrics: `.pi/prompts/planning-rubrics.md`.
|
|
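The focus-coverage rule and the ping-pong bound in this skill can be sketched as two predicates. This is a minimal illustration under stated assumptions, not the harness implementation: the `SubmittedRound` and `ThreadState` shapes, the `focus` field name, and both function names are hypothetical. Only the four focus names, `review_gate_ready`, `unresolved_claim_ids`, `exchange_count`, and the limit of 3 come from the skill text.

```typescript
// Hypothetical shapes for illustration; the real review-round YAML may differ.
type Focus = "spec" | "wbs" | "schedule" | "quality";

interface SubmittedRound {
  focus: Focus; // assumed field name
  review_gate_ready: boolean;
}

interface ThreadState {
  unresolved_claim_ids: string[];
  exchange_count: number;
}

const REQUIRED_FOCI: readonly Focus[] = ["spec", "wbs", "schedule", "quality"];
const MAX_EXCHANGES_PER_ROUND = 3; // value from the plan budget profile table

// True when every focus has appeared in a submitted round and the most
// recent round reports review_gate_ready: true.
function focusCoverageComplete(rounds: SubmittedRound[]): boolean {
  const last = rounds[rounds.length - 1];
  if (last === undefined) return false;
  const seen = new Set(rounds.map((r) => r.focus));
  const allCovered = REQUIRED_FOCI.every((f) => seen.has(f));
  return allCovered && last.review_gate_ready;
}

// True while the evaluator/adversary exchange should continue.
function shouldContinueExchange(thread: ThreadState): boolean {
  return (
    thread.unresolved_claim_ids.length > 0 &&
    thread.exchange_count < MAX_EXCHANGES_PER_ROUND
  );
}
```

Under this reading, a debate may run more than four rounds: coverage depends on which focuses were submitted and on the last round's readiness flag, not on the round count.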
@@ -28,6 +28,17 @@ When refining plans from noisy requirements:
|
|
|
28
28
|
3. When gates return `human_required` or promotion is blocked, the orchestrator calls `ask_user` — do not guess scope.
|
|
29
29
|
4. Reference graphify wiki or `graphify query` for architecture constraints before execute.
|
|
30
30
|
|
|
31
|
+
## Budgets (ADR 0038)
|
|
32
|
+
|
|
33
|
+
- Default: **`HARNESS_BUDGET_ENFORCE` off** — token/debate caps are telemetry-only (`harness-budget-telemetry`, `harness-budget-soft-limit`). They do **not** block phases or debate lanes.
|
|
34
|
+
- Do **not** skip scouts, debate rounds, or `approve_plan` because of soft budget hints in the widget.
|
|
35
|
+
- Re-enable hard caps only with `HARNESS_BUDGET_ENFORCE=1` and `HARNESS_BUDGET_HARD_STOP` / `HARNESS_DEBATE_HARD_STOP`.
|
|
36
|
+
|
|
37
|
+
## Subagent artifacts (ADR 0037)
|
|
38
|
+
|
|
39
|
+
- Subagents call scoped **`submit_*`** tools; parent verifies with **`harness_artifact_ready`**, not JSON parsing from `finalOutput`.
|
|
40
|
+
- Parent **`write_harness_yaml`** is for merges (`research-brief.yaml`, plan shell) — not subagent payloads.
|
|
41
|
+
|
|
31
42
|
## Rules
|
|
32
43
|
|
|
33
44
|
- Never auto-merge; harness-auto may open PR only when all gates pass (see release-readiness-report).
|
|
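The budget rules added to this skill can be read as a small decision function. The sketch below is an assumption-laden illustration, not the code in `harness-budget-enforce.ts`: the function name `hardCapActive` is hypothetical, "enabled" is assumed to mean the literal value `1`, and a hard-stop variable is assumed to count as configured whenever it is set to a non-empty string (its actual value format is not shown in this diff).

```typescript
type Env = Record<string, string | undefined>;
type CapScope = "budget" | "debate";

// Returns true only when enforcement is switched on AND the matching
// hard-stop variable is configured. Otherwise caps are telemetry-only
// and must not block phases or debate lanes.
function hardCapActive(env: Env, scope: CapScope): boolean {
  // Assumption: only the exact value "1" enables enforcement.
  if (env["HARNESS_BUDGET_ENFORCE"] !== "1") return false;
  const key =
    scope === "budget" ? "HARNESS_BUDGET_HARD_STOP" : "HARNESS_DEBATE_HARD_STOP";
  const value = env[key];
  // Assumption: any non-empty value means the hard stop is configured.
  return value !== undefined && value !== "";
}
```

With an empty environment the function returns `false` for both scopes, which matches the documented default of telemetry-only budgets.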
@@ -14,6 +14,8 @@ description: >-
|
|
|
14
14
|
|
|
15
15
|
Every spawn includes **HarnessSpawnContext** JSON in the task text (subprocess agents do not get `[HarnessActivePlan]` injection). Use `agentScope: "both"` so package agents under `$UP_PKG/.pi/agents/**` resolve.
|
|
16
16
|
|
|
17
|
+
Harness subprocesses load **`harness-subagent-submit`** (`PI_HARNESS_SUBPROCESS=1`, `HARNESS_RUN_ID`, `HARNESS_RUN_DIR`). Agents must call their scoped **`submit_*`** tool before exit; parent gates use **`harness_artifact_ready`** and debate reads submit from `tool_result` (set `HARNESS_SUBMIT_TOOLS=0` only to fall back to `finalOutput` parsing).
|
|
18
|
+
|
|
17
19
|
## Subprocess telemetry
|
|
18
20
|
|
|
19
21
|
Harness bridge emits `harness_subagent_spawned` / `harness_subagent_completed` (replaces in-process setup/blackboard events).
|
|
@@ -35,14 +37,14 @@ LIMIT 30
|
|
|
35
37
|
|
|
36
38
|
1. **Parallel `tasks`** — one `subagent({ tasks: [...] })` for scouts, decompose+hypothesis, or review fan-in; subprocesses run in parallel upstream.
|
|
37
39
|
2. **Blocking calls** — each `subagent` returns when the subprocess exits; no `get_subagent_result` polling.
|
|
38
|
-
3. **Compact handoffs** —
|
|
40
|
+
3. **Compact handoffs** — read artifacts written by submit tools (or `harness_artifact_ready`); never paste full subprocess message logs into the next spawn.
|
|
39
41
|
4. **No spawn cap** — harness subagent spawns are unlimited per session (active count is telemetry only). Do **not** pass `timeoutMs` unless the user wants a cap — subprocesses wait for natural exit (`PI_SUBAGENT_TIMEOUT_MS` optional env backstop only).
|
|
40
42
|
|
|
41
43
|
## Command → agent
|
|
42
44
|
|
|
43
45
|
| Command | `agent` |
|
|
44
46
|
|---------|---------|
|
|
45
|
-
| `/harness-plan` | Parent:
|
|
47
|
+
| `/harness-plan` | Parent: scouts → `decompose`+`hypothesis` → Phase 3.5 `implementation-researcher`+`stack-researcher` → PlanPacket → eligibility + Review Gate → `approve_plan` + `create_plan` |
|
|
46
48
|
| `/harness-run` | `harness/executor` |
|
|
47
49
|
| `/harness-eval` | `harness/evaluator` (`mode: benchmark`) |
|
|
48
50
|
| `/harness-review` | `harness/evaluator` (`mode: verdict`) |
|
|
@@ -78,7 +80,7 @@ Spawn `harness/evaluator` / `harness/adversary` via `subagent` in the **same** p
|
|
|
78
80
|
}
|
|
79
81
|
```
|
|
80
82
|
|
|
81
|
-
Then parallel decompose + hypothesis, parent PlanPacket + `ask_user
|
|
83
|
+
Then parallel decompose + hypothesis, Phase 3.5 implementation + stack research, parent PlanPacket + `ask_user` (after 3.5), execution-plan-author, DAG gate, `harness_plan_debate_eligibility` + debate rounds, then `approve_plan` + `create_plan`.
|
|
82
84
|
|
|
83
85
|
Scouts use **Haiku**, `thinking: low`, **8** max turns (see agent frontmatter). Effective `--tools` omits `grep`/`find`/`subagent` per `disallowed_tools`.
|
|
84
86
|
|
|
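The submit-tool handoff described for harness subprocesses in this skill can be illustrated with two small helpers. This is a sketch under stated assumptions: `subprocessEnv` and `artifactSource` are hypothetical names, and the bridge's real spawn code is not shown in this diff. The three environment variables and the `HARNESS_SUBMIT_TOOLS=0` fallback are taken from the skill text.

```typescript
type Env = Record<string, string | undefined>;
type ArtifactSource = "submit_tool_result" | "final_output_parse";

// Environment a harness subprocess is documented to receive.
// runDir is passed in rather than derived, since its layout is not assumed here.
function subprocessEnv(runId: string, runDir: string): Record<string, string> {
  return {
    PI_HARNESS_SUBPROCESS: "1",
    HARNESS_RUN_ID: runId,
    HARNESS_RUN_DIR: runDir,
  };
}

// Where the parent should look for a subagent's artifact.
// Only an explicit "0" switches back to parsing finalOutput.
function artifactSource(env: Env): ArtifactSource {
  return env["HARNESS_SUBMIT_TOOLS"] === "0"
    ? "final_output_parse"
    : "submit_tool_result";
}
```

Keeping the fallback behind an explicit opt-out means an unset variable always selects the submit-tool path.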
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: harness-plan
|
|
3
|
-
description: PM-grade harness plans — scouts, ExecutionPlan, DAG validation,
|
|
3
|
+
description: PM-grade harness plans — scouts, Phase 3.5 implementation research, ExecutionPlan, DAG validation, selective Review Gate debate, then approve/create_plan.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
6
|
# harness-plan
|
|
@@ -11,21 +11,23 @@ description: PM-grade harness plans — scouts, ExecutionPlan, DAG validation, 4
|
|
|
11
11
|
|
|
12
12
|
## Workflow (parent orchestrator)
|
|
13
13
|
|
|
14
|
-
1. Parallel scouts (graphify + structure; semantic unless `--quick`).
|
|
15
|
-
2. Parallel decompose + hypothesis
|
|
16
|
-
3.
|
|
17
|
-
4. `
|
|
18
|
-
5.
|
|
19
|
-
6.
|
|
20
|
-
7.
|
|
14
|
+
1. Parallel scouts (graphify + structure; semantic unless `--quick`) — each scout ends with **`submit_scout_findings`** (not JSON in final message).
|
|
15
|
+
2. Parallel decompose + hypothesis — **`submit_decomposition`** / **`submit_hypothesis`**.
|
|
16
|
+
3. **Phase 3.5 (required):** parallel `implementation-researcher` + `stack-researcher` — **`submit_implementation_research`** / **`submit_stack`**; parent merges into `research-brief.yaml` via `write_harness_yaml`.
|
|
17
|
+
4. Draft `PlanPacket` shell; `ask_user` on material fork **after** Phase 3.5.
|
|
18
|
+
5. `execution-plan-author` → merge `execution_plan`.
|
|
19
|
+
6. **`validate-plan-dag.mjs`** (must pass).
|
|
20
|
+
7. **`harness_plan_debate_eligibility`** → **`harness_debate_open`** with profile → Review Gate (debate agents use lane **`submit_*`** tools; parent reads submit from `tool_result`, not `finalOutput` JSON).
|
|
21
|
+
8. **`harness_artifact_ready`** on required paths → apply patches, re-validate DAG, `approve_plan`, `create_plan`.
|
|
21
22
|
|
|
22
|
-
`--quick` skips semantic scout and post-run adversary only — **not** plan debate.
|
|
23
|
+
`--quick` skips semantic scout and post-run adversary only — **not** implementation research or plan debate.
|
|
23
24
|
|
|
24
25
|
## Rules
|
|
25
26
|
|
|
26
27
|
- On-disk plan artifacts are **YAML** (`plan-packet.yaml`, `research-brief.yaml`).
|
|
27
28
|
- Subagents read-only; parent writes run artifacts and calls `approve_plan` / `create_plan`.
|
|
28
29
|
- context-mode only on harness paths.
|
|
30
|
+
- Phase 3.5 is required unless a waiver is documented; high-risk plans require the implementation-research artifact for approval.
|
|
29
31
|
|
|
30
32
|
## Output
|
|
31
33
|
|
|
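The effect of `--quick` on the workflow above can be sketched as a pair of selection functions. The function names are hypothetical and the sketch only models which agents are spawned; the agent names match the files listed in this diff, and the rule that `--quick` skips the semantic scout but not Phase 3.5 comes from the skill text.

```typescript
type Scout = "scout-graphify" | "scout-structure" | "scout-semantic";
type Researcher = "implementation-researcher" | "stack-researcher";

// Step 1: graphify + structure always run; semantic is dropped under --quick.
function scoutsToSpawn(quick: boolean): Scout[] {
  const scouts: Scout[] = ["scout-graphify", "scout-structure"];
  if (!quick) scouts.push("scout-semantic");
  return scouts;
}

// Step 3 (Phase 3.5): both researchers run regardless of --quick.
function researchersToSpawn(_quick: boolean): Researcher[] {
  return ["implementation-researcher", "stack-researcher"];
}
```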
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
description: Harness executor that implements only within approved PlanPacket scope.
|
|
3
|
-
tools: read, write, edit, bash, grep, find, ls
|
|
3
|
+
tools: read, write, edit, bash, grep, find, ls, submit_executor_handoff
|
|
4
4
|
extensions: true
|
|
5
5
|
disallowed_tools: ask_user
|
|
6
6
|
thinking: medium
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
description: Plan-phase DeepMind-style problem decomposition (read-only).
|
|
3
|
-
tools: read, grep, find, ls, bash
|
|
3
|
+
tools: read, grep, find, ls, bash, submit_decomposition_brief
|
|
4
4
|
disallowed_tools: write, edit, ask_user, approve_plan, create_plan, subagent
|
|
5
5
|
extensions: false
|
|
6
6
|
thinking: medium
|
|
@@ -39,45 +39,18 @@ Work through these sections in your reasoning, then compress into JSON:
|
|
|
39
39
|
- Soft constraints (trade-offs allowed)
|
|
40
40
|
- Success metrics (how to measure progress)
|
|
41
41
|
|
|
42
|
-
### 1.3
|
|
42
|
+
### 1.3 Internal prior art (scouts only)
|
|
43
43
|
|
|
44
|
-
- Current best approach (methods, systems, paths
|
|
44
|
+
- Current best approach **in this repo** (methods, systems, paths from scout lanes)
|
|
45
45
|
- Why it is not good enough (gap)
|
|
46
46
|
- What has been tried and failed (dead ends)
|
|
47
47
|
|
|
48
|
+
External / OSS prior art is **not** your job — `implementation-researcher` (Phase 3.5) owns web and reference implementations.
|
|
49
|
+
|
|
48
50
|
### 1.4 Surface the tensions
|
|
49
51
|
|
|
50
52
|
Identify contradictions, tradeoffs, or competing beliefs. Pick the **core tension** — one paragraph that feeds Phase 2 hypothesis generation.
|
|
51
53
|
|
|
52
|
-
## Output
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
```json
|
|
57
|
-
{
|
|
58
|
-
"schema_version": "1.0.0",
|
|
59
|
-
"problem_restatement": "…",
|
|
60
|
-
"problem_types": ["design"],
|
|
61
|
-
"scope": {
|
|
62
|
-
"narrowed_focus": "…",
|
|
63
|
-
"excluded": ["…"]
|
|
64
|
-
},
|
|
65
|
-
"hard_constraints": ["…"],
|
|
66
|
-
"soft_constraints": ["…"],
|
|
67
|
-
"success_metrics": ["…"],
|
|
68
|
-
"prior_art": {
|
|
69
|
-
"best_approach": "…",
|
|
70
|
-
"gap": "…",
|
|
71
|
-
"dead_ends": ["…"]
|
|
72
|
-
},
|
|
73
|
-
"tensions": [
|
|
74
|
-
{
|
|
75
|
-
"claim_a": "…",
|
|
76
|
-
"claim_b": "…",
|
|
77
|
-
"why_matters": "…"
|
|
78
|
-
}
|
|
79
|
-
],
|
|
80
|
-
"core_tension": "…",
|
|
81
|
-
"human_summary": "…"
|
|
82
|
-
}
|
|
83
|
-
```
|
|
54
|
+
## Output
|
|
55
|
+
|
|
56
|
+
Before ending, call `submit_decomposition_brief` exactly once with the full `PlanDecompositionBrief` document. Do not paste the artifact as prose or a fenced JSON block — the tool write is the deliverable.
|
|
@@ -1,30 +1,42 @@
|
|
|
1
1
|
---
|
|
2
2
|
description: Plan-phase ExecutionPlan generator (PM-grade WBS + DAG).
|
|
3
|
-
tools: read, grep, find, ls
|
|
3
|
+
tools: read, grep, find, ls, submit_execution_plan_brief
|
|
4
4
|
disallowed_tools: write, edit, bash, ask_user, approve_plan, create_plan, subagent
|
|
5
5
|
extensions: false
|
|
6
6
|
thinking: high
|
|
7
|
-
max_turns:
|
|
7
|
+
max_turns: 18
|
|
8
8
|
---
|
|
9
9
|
|
|
10
|
-
|
|
10
|
+
## Your task
|
|
11
|
+
|
|
12
|
+
Author a complete `execution_plan` a senior engineering manager would sign: WBS, dependencies, schedule metadata, sprint contract, risks — aligned to Structured Planning / PMBOK-style decomposition (see graphify corpus: WBS, critical path, integration management).
|
|
11
13
|
|
|
12
14
|
## Inputs
|
|
13
15
|
|
|
14
|
-
Task, `PlanDecompositionBrief`, `PlanHypothesisBrief`, draft scope/acceptance_checks, `PlanStackBrief`, scout summaries.
|
|
16
|
+
Task summary, `PlanDecompositionBrief`, `PlanHypothesisBrief`, draft scope/acceptance_checks, `PlanImplementationResearchBrief`, `PlanStackBrief`, scout summaries (paths in spawn context).
|
|
15
17
|
|
|
16
|
-
##
|
|
18
|
+
## Process
|
|
17
19
|
|
|
18
|
-
1. Vision check — scope ≤15 lines
|
|
19
|
-
2. Phases
|
|
20
|
-
3. WBS —
|
|
21
|
-
4. `depends_on`
|
|
22
|
-
5. `schedule_metadata.critical_path_work_item_ids
|
|
23
|
-
6.
|
|
24
|
-
7.
|
|
25
|
-
8.
|
|
26
|
-
9.
|
|
20
|
+
1. **Vision check** — restate scope in ≤15 lines; every line maps to a work_item or explicit exclusion.
|
|
21
|
+
2. **Phases** — objective, entry/exit criteria, milestone, `work_item_ids` per phase.
|
|
22
|
+
3. **WBS** — each acceptance_check maps to ≥1 `work_item`; deliverable-sized items (not “do backend”).
|
|
23
|
+
4. **DAG** — `depends_on` acyclic; `parallel_safe: true` only when touched files are disjoint.
|
|
24
|
+
5. **Schedule** — `schedule_metadata.critical_path_work_item_ids` for med/high risk tasks.
|
|
25
|
+
6. **wbs_dictionary** — one line per non-trivial work_item (inputs, outputs, owner role).
|
|
26
|
+
7. **risk_register** — ≥3 risks for med/high with mitigation and trigger.
|
|
27
|
+
8. **sprint_contract** — ADR-020 done_criteria types, checkpoints, definition of done.
|
|
28
|
+
9. **Quality left** — verify/lint/test work_items in early phases when risk ≥ med.
|
|
29
|
+
10. **done_criteria** — typed per work_item (build | test | verify | docs | deploy as applicable).
|
|
27
30
|
|
|
28
31
|
## Output
|
|
29
32
|
|
|
30
|
-
|
|
33
|
+
Before ending, call `submit_execution_plan_brief` exactly once with the full document. Prose summary is optional; the artifact is the tool call.
|
|
34
|
+
|
|
35
|
+
|
|
36
|
+
## Guardrails
|
|
37
|
+
|
|
38
|
+
- Do not gold-plate beyond decomposition scope without flagging in `assumptions[]`.
|
|
39
|
+
- If DAG would fail validation, fix structure before emitting YAML.
|
|
40
|
+
- Never speculate about repo layout — read scouts first.
|
|
41
|
+
|
|
42
|
+
Bus label: `ExecutionPlanAuthorAgent`.
|
|
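The DAG requirement in the process above (`depends_on` must be acyclic) can be checked with Kahn's topological sort. The sketch below is illustrative only: the `WorkItem` shape is reduced to the two fields it needs, `isAcyclic` is a hypothetical name, and this is not the logic of `validate-plan-dag.mjs`, whose contents are not part of this diff.

```typescript
// Reduced, hypothetical work item: only the fields the check needs.
interface WorkItem {
  id: string;
  depends_on: string[];
}

// Kahn's algorithm: repeatedly remove items with no unmet dependencies.
// Returns false on a cycle, an unknown dependency id, or duplicate ids.
function isAcyclic(items: WorkItem[]): boolean {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const item of items) {
    indegree.set(item.id, 0);
    dependents.set(item.id, []);
  }
  for (const item of items) {
    for (const dep of item.depends_on) {
      const list = dependents.get(dep);
      if (list === undefined) return false; // dependency on an unknown id
      list.push(item.id);
      indegree.set(item.id, (indegree.get(item.id) ?? 0) + 1);
    }
  }
  const ready = Array.from(indegree.entries())
    .filter(([, n]) => n === 0)
    .map(([id]) => id);
  let visited = 0;
  while (ready.length > 0) {
    const id = ready.pop() as string;
    visited += 1;
    for (const next of dependents.get(id) ?? []) {
      const n = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, n);
      if (n === 0) ready.push(next);
    }
  }
  // Items stuck with a positive indegree sit on a cycle and are never visited.
  return visited === items.length;
}
```

Running a check like this before emitting the plan is one way to honor the guardrail that the structure should be fixed before validation would fail.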
@@ -1,23 +1,40 @@
|
|
|
1
1
|
---
|
|
2
2
|
description: Plan-phase blind hypothesis validation (debate R1 only).
|
|
3
|
-
tools: read, grep, find, ls
|
|
3
|
+
tools: read, grep, find, ls, submit_hypothesis_validation
|
|
4
4
|
disallowed_tools: write, edit, bash, ask_user, approve_plan, create_plan, subagent
|
|
5
5
|
extensions: false
|
|
6
6
|
thinking: medium
|
|
7
7
|
max_turns: 10
|
|
8
8
|
---
|
|
9
9
|
|
|
10
|
-
|
|
10
|
+
## Your task
|
|
11
|
+
|
|
12
|
+
Blindly evaluate whether `PlanHypothesisBrief` is falsifiable, relevant to the task, and worth building — without seeing decomposition, scouts, or PlanPacket.
|
|
11
13
|
|
|
12
14
|
## Input (strict)
|
|
13
15
|
|
|
14
16
|
- Original task statement
|
|
15
|
-
- `PlanHypothesisBrief` YAML/JSON
|
|
17
|
+
- `PlanHypothesisBrief` YAML/JSON only
|
|
18
|
+
|
|
19
|
+
Ignore decomposition, scouts, PlanPacket, adversary output, prior debate rounds.
|
|
20
|
+
|
|
21
|
+
## Process
|
|
16
22
|
|
|
17
|
-
|
|
23
|
+
1. Extract stated hypothesis, success metrics, and falsification criteria from brief.
|
|
24
|
+
2. Score relevance: does the hypothesis answer the user task (not a tooling side quest)?
|
|
25
|
+
3. Score falsifiability: can an evaluator disprove it within one sprint with named signals?
|
|
26
|
+
4. Score proportionality: is scope honest vs task ambition?
|
|
27
|
+
5. Set `revision_recommended` when any dimension fails threshold; list concrete fixes (not “think harder”).
|
|
28
|
+
6. **Non-blind re-score** only when parent explicitly sets `mode: non-blind` on final quality round — then you may read packet for consistency check.
|
|
18
29
|
|
|
19
30
|
## Output
|
|
20
31
|
|
|
21
|
-
|
|
32
|
+
Before ending, call `submit_hypothesis_validation` exactly once with the full document. Prose summary is optional; the artifact is the tool call.
|
|
33
|
+
|
|
34
|
+
|
|
35
|
+
## Guardrails
|
|
36
|
+
|
|
37
|
+
- Blind mode: if you reference decomposition or execution_plan, you have failed the round.
|
|
38
|
+
- Do not overthink. Emit structured YAML.
|
|
22
39
|
|
|
23
|
-
Bus label: `
|
|
40
|
+
Bus label: `HypothesisValidatorAgent`.
|
|
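Step 5 of the validator process (set `revision_recommended` when any dimension fails its threshold) can be sketched as follows. The numeric scale, the single shared threshold, and the names `HypothesisScores` and `revisionRecommended` are assumptions made for illustration; the agent file names the three dimensions but does not specify how they are scored.

```typescript
// Hypothetical scores, assumed to be numbers where higher is better.
interface HypothesisScores {
  relevance: number;
  falsifiability: number;
  proportionality: number;
}

// True when at least one dimension falls below the threshold.
function revisionRecommended(
  scores: HypothesisScores,
  threshold: number,
): boolean {
  return (
    scores.relevance < threshold ||
    scores.falsifiability < threshold ||
    scores.proportionality < threshold
  );
}
```

A single weak dimension is enough to trigger a revision, so a strong relevance score cannot compensate for a hypothesis that is not falsifiable.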
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
description: Plan-phase DARWIN hypothesis generation (read-only).
|
|
3
|
-
tools: read, grep, find, ls, bash
|
|
3
|
+
tools: read, grep, find, ls, bash, submit_hypothesis_brief
|
|
4
4
|
disallowed_tools: write, edit, ask_user, approve_plan, create_plan, subagent
|
|
5
5
|
extensions: false
|
|
6
6
|
thinking: medium
|
|
@@ -61,29 +61,6 @@ Up to two alternatives with a different approach and **key_bet** (what it assume
|
|
|
61
61
|
|
|
62
62
|
Do **not** include self-evaluation scores — a separate agent handles that.
|
|
63
63
|
|
|
64
|
-
## Output
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
{
|
|
68
|
-
"schema_version": "1.0.0",
|
|
69
|
-
"primary": {
|
|
70
|
-
"claim": "…",
|
|
71
|
-
"mechanism": "…",
|
|
72
|
-
"prediction": "…",
|
|
73
|
-
"experiment": "…",
|
|
74
|
-
"tension_resolution": "…"
|
|
75
|
-
},
|
|
76
|
-
"dialectical_fork": {
|
|
77
|
-
"fork": "…",
|
|
78
|
-
"path_a": "…",
|
|
79
|
-
"path_b": "…"
|
|
80
|
-
},
|
|
81
|
-
"alternatives": [
|
|
82
|
-
{ "claim": "…", "key_bet": "…" }
|
|
83
|
-
],
|
|
84
|
-
"recommended_next_steps": ["…"],
|
|
85
|
-
"human_summary": "…"
|
|
86
|
-
}
|
|
87
|
-
```
|
|
88
|
-
|
|
89
|
-
Match `PlanHypothesisBrief` (`.pi/harness/specs/plan-hypothesis-brief.schema.json`).
|
|
64
|
+
## Output
|
|
65
|
+
|
|
66
|
+
Before ending, call `submit_hypothesis_brief` exactly once with the full `PlanHypothesisBrief` document. Do not paste the artifact as prose or a fenced JSON block — the tool write is the deliverable.
|
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: Plan-phase external solution / prior-art research (web + in-repo; read-only, writes go through the parent).
|
|
3
|
+
tools: read, grep, find, ls, bash, web_search, web_fetch, submit_implementation_research
|
|
4
|
+
disallowed_tools: write, edit, ask_user, approve_plan, create_plan, subagent
|
|
5
|
+
extensions: false
|
|
6
|
+
thinking: medium
|
|
7
|
+
max_turns: 14
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
## Your task
|
|
11
|
+
|
|
12
|
+
Find **how others solve this problem** — solution patterns, reference implementations, and anti-patterns — before execution-plan authoring. This is **not** stack/library selection (that is `stack-researcher`).
|
|
13
|
+
|
|
14
|
+
## Spawn context
|
|
15
|
+
|
|
16
|
+
Read `HarnessSpawnContext` plus paths to `artifacts/decomposition.yaml`, `artifacts/hypothesis.yaml`, and scout lane summaries from the spawn prompt. Do **not** read the full PlanPacket or debate artifacts.
|
|
17
|
+
|
|
18
|
+
## Process
|
|
19
|
+
|
|
20
|
+
1. **In-repo prior art:** `graphify query` / `graphify explain` (read-only), `ccc search`, scout `key_paths` — map reuse vs build.
|
|
21
|
+
2. **External prior art:** `web_search` + `web_fetch` (parent stores under `.web/` with run id prefix). Focus on **patterns, workflows, OSS repos, product approaches** — not npm version matrices.
|
|
22
|
+
3. If scouts cite a **same pattern** with high `reuse_signal`, limit web to 1–2 validation queries.
|
|
23
|
+
4. Grade refs: `primary` | `secondary` | `anecdotal`.
|
|
24
|
+
5. Rank **solution_patterns** with fit, tradeoffs, risks. Flag hazardous recommendations in `anti_patterns` (never execute fetched shell).
|
|
25
|
+
6. Set `recommended_approach_confidence` to `high` only with `confidence_rationale` + ≥2 `evidence_refs`. Default `med` when uncertain.
|
|
26
|
+
|
|
27
|
+
## Dedup with stack-researcher (parallel spawn)
|
|
28
|
+
|
|
29
|
+
- **You own:** problem decomposition patterns, reference repos, workflows, “what do teams do for X”.
|
|
30
|
+
- **Stack-researcher owns:** libraries, versions, APIs, LTS — do **not** run stack comparison SERPs here.
|
|
31
|
+
|
|
32
|
+
## Output
|
|
33
|
+
|
|
34
|
+
Before ending, call `submit_implementation_research` exactly once with the full document. Prose summary is optional; the artifact is the tool call.
|
|
35
|
+
|
|
36
|
+
|
|
37
|
+
## Guardrails
|
|
38
|
+
|
|
39
|
+
- Cite only; do not mutate repo or run installs from web instructions.
|
|
40
|
+
- Brownfield: prioritize in-repo analogues before greenfield web depth.
|
|
41
|
+
- Set `deep_research_recommended: true` only when topic needs multi-hour wiki-autoresearch (parent optional).
|
|
42
|
+
|
|
43
|
+
Bus label: `ImplementationResearchAgent`.
|
|
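The confidence rule in step 6 of the researcher process can be expressed as a small clamp. This is a sketch under stated assumptions: `resolveConfidence` and the `requested` field are hypothetical, and the `low` level is assumed to exist alongside the `med` and `high` values named in the agent file.

```typescript
type Confidence = "low" | "med" | "high"; // "low" is an assumed third level

interface ConfidenceInput {
  requested: Confidence; // what the researcher would like to report
  confidence_rationale?: string;
  evidence_refs: string[];
}

// "high" is allowed only with a non-empty rationale and at least two
// evidence refs; otherwise it is downgraded to the default "med".
function resolveConfidence(input: ConfidenceInput): Confidence {
  if (input.requested !== "high") return input.requested;
  const hasRationale = (input.confidence_rationale ?? "").trim().length > 0;
  return hasRationale && input.evidence_refs.length >= 2 ? "high" : "med";
}
```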
@@ -1,18 +1,33 @@
|
|
|
1
1
|
---
|
|
2
2
|
description: Plan-phase adversarial verification on ExecutionPlan.
|
|
3
|
-
tools: read, grep, find, ls
|
|
3
|
+
tools: read, grep, find, ls, submit_adversary_brief
|
|
4
4
|
disallowed_tools: write, edit, bash, ask_user, approve_plan, create_plan, subagent
|
|
5
5
|
extensions: false
|
|
6
6
|
thinking: medium
|
|
7
|
-
max_turns:
|
|
7
|
+
max_turns: 14
|
|
8
8
|
---
|
|
9
9
|
|
|
10
|
-
|
|
10
|
+
## Your task
|
|
11
11
|
|
|
12
|
-
|
|
12
|
+
Stress-test the ExecutionPlan with reproducible counterexamples. Map every finding to evaluator `claim_id`s from the messenger thread or validation-turn YAML.
|
|
13
|
+
|
|
14
|
+
## Process
|
|
15
|
+
|
|
16
|
+
1. Read same-round `artifacts/validation-turn-r{N}.yaml` and `harness_messenger_read_round` transcript (parent provides).
|
|
17
|
+
2. Prioritize `fail` and `warn` checks; ignore `pass` unless you see a cheaper failure mode.
|
|
18
|
+
3. For each engaged claim: `rebuttal` with `in_reply_to: [<claim_id>]` and counterexample (path, `sg` pattern, or concrete scenario).
|
|
19
|
+
4. **Counter pass** (when re-spawned after evaluator clarification): for each still-open claim, either post a `counter` with new evidence, or explicitly concede that claim id in the body text and set `open_claim_ids: []` in the brief metadata.
|
|
20
|
+
5. Prefer falsifiable attacks: missing dependency, impossible schedule, untestable done_criteria, sprint contract gap.
|
|
13
21
|
|
|
14
22
|
## Output
|
|
15
23
|
|
|
16
|
-
|
|
24
|
+
Before ending, call `submit_adversary_brief` exactly once with the full document. Prose summary is optional; the artifact is the tool call.
|
|
25
|
+
|
|
26
|
+
|
|
27
|
+
## Guardrails
|
|
28
|
+
|
|
29
|
+
- Engage evaluator claims first; do not introduce unrelated scope.
|
|
30
|
+
- No hand-wavy “might fail”; cite paths or commands.
|
|
31
|
+
- Do not overthink. One strong rebuttal beats five weak ones.
|
|
17
32
|
|
|
18
33
|
Bus label: `PlanAdversaryAgent`.
|
|
@@ -1,20 +1,42 @@
|
|
|
1
1
|
---
|
|
2
2
|
description: Plan-phase Validation Checks evaluator (neutral pass/fail).
|
|
3
|
-
tools: read, grep, find, ls
|
|
3
|
+
tools: read, grep, find, ls, submit_validation_turn
|
|
4
4
|
disallowed_tools: write, edit, bash, ask_user, approve_plan, create_plan, subagent
|
|
5
5
|
extensions: false
|
|
6
6
|
thinking: medium
|
|
7
|
-
max_turns:
|
|
7
|
+
max_turns: 14
|
|
8
8
|
---
|
|
9
9
|
|
|
10
|
-
|
|
10
|
+
## Your task
|
|
11
11
|
|
|
12
|
-
|
|
12
|
+
Score the ExecutionPlan against Validation Checks for one Review Gate round. Emit stable `checks[]` with ids and messenger-ready `claim_ids`. You are not an advocate for the plan.
|
|
13
|
+
|
|
14
|
+
Parent passes `debate_round_focus`: `spec` | `wbs` | `schedule` | `quality`. Use rubric ids from `.pi/prompts/planning-rubrics.md` for that focus.
|
|
15
|
+
|
|
16
|
+
## Process
|
|
17
|
+
|
|
18
|
+
1. Read `plan-packet.yaml`, `research-brief.yaml`, and lane inputs named in spawn context (not full packet inline).
|
|
19
|
+
2. If spawn includes **messenger transcript** (re-spawn for clarification): read unresolved `claim_ids` and adversary rebuttals; address each with evidence paths or concede in `checks[]` status.
|
|
20
|
+
3. Run mental DAG sanity: acyclic `depends_on`, every acceptance_check traceable to work_items.
|
|
21
|
+
4. For each rubric check in scope: `pass` | `warn` | `fail` with one-line rationale and `evidence_refs` (file paths, `sg` patterns).
|
|
22
|
+
5. Set `overall_ready` only if no `fail` and at most one `warn` without mitigation note.
|
|
23
|
+
6. Populate `messenger_claim_ids` (or `checks[].id`) for parent to post as `claim` messages.
|
|
24
|
+
|
|
25
|
+
## Clarification pass (when re-spawned)
|
|
26
|
+
|
|
27
|
+
- Post body must reference each `in_reply_to` claim id explicitly.
|
|
28
|
+
- Change check status only with new evidence; do not flip pass→fail without citation.
|
|
29
|
+
- If conceding a point, set check to `warn` with rationale “adversary accepted after clarification”.
|
|
13
30
|
|
|
14
31
|
## Output
|
|
15
32
|
|
|
16
|
-
|
|
33
|
+
Before ending, call `submit_validation_turn` exactly once with the full document. Prose summary is optional; the artifact is the tool call.
|
|
34
|
+
|
|
35
|
+
|
|
36
|
+
## Guardrails
|
|
17
37
|
|
|
18
|
-
|
|
38
|
+
- Do not overthink. If checks are straightforward, emit YAML directly.
|
|
39
|
+
- Only evaluate what you read. Never invent file paths.
|
|
40
|
+
- Do not expand scope beyond the current `debate_round_focus`.
|
|
19
41
|
|
|
20
42
|
Bus label: `PlanEvaluatorAgent`.
|
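Step 5 of the evaluator process (`overall_ready` only with no `fail` and at most one `warn` lacking a mitigation note) can be sketched as a predicate over `checks[]`. The `Check` shape, the `mitigation_note` field name, and the function name `overallReady` are hypothetical, and the reading that warns carrying a mitigation note do not count against the limit is an interpretation of the rule rather than something this diff confirms.

```typescript
type CheckStatus = "pass" | "warn" | "fail";

// Hypothetical check record; only `status` values come from the agent file.
interface Check {
  id: string;
  status: CheckStatus;
  rationale: string;
  mitigation_note?: string; // assumed field name
}

// Ready when nothing fails and no more than one warn is unmitigated.
function overallReady(checks: Check[]): boolean {
  const hasFail = checks.some((c) => c.status === "fail");
  const unmitigatedWarns = checks.filter(
    (c) => c.status === "warn" && !c.mitigation_note,
  ).length;
  return !hasFail && unmitigatedWarns <= 1;
}
```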