@codename_inc/spectre 5.1.0 → 5.2.1
- package/package.json +1 -1
- package/plugins/spectre/.claude-plugin/plugin.json +1 -1
- package/plugins/spectre/skills/code_review/SKILL.md +2 -0
- package/plugins/spectre/skills/execute/SKILL.md +69 -24
- package/plugins/spectre/skills/plan_review/SKILL.md +33 -28
- package/plugins/spectre-codex/skills/code_review/SKILL.md +2 -0
- package/plugins/spectre-codex/skills/execute/SKILL.md +69 -24
- package/plugins/spectre-codex/skills/plan_review/SKILL.md +33 -28
package/package.json
CHANGED
|
@@ -171,6 +171,8 @@ Optional user input to seed this workflow.
|
|
|
171
171
|
- **MEDIUM**: Quality improvements, test coverage, configuration, performance (non-critical)
|
|
172
172
|
- **LOW**: Documentation, polish, cleanup
|
|
173
173
|
|
|
174
|
+
**Evidence rule:** Every CRITICAL or HIGH finding MUST include (1) `file:line` and (2) a reproducible failure scenario or exploit path describing observable behavior. Findings without an evidence chain are auto-downgraded one severity level. "Could potentially" is not evidence.
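The evidence rule added above can be read as a small normalization pass over reviewer findings. The following is a minimal sketch under stated assumptions: the `Finding` class, its field names, and the helper functions are hypothetical and not part of the package, and the severity ordering CRITICAL > HIGH > MEDIUM > LOW is inferred from the surrounding list. It does not attempt to detect "could potentially" phrasing; it only checks that both pieces of evidence are present.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Assumed ordering, lowest to highest (the diff names all four levels).
SEVERITIES = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


@dataclass(frozen=True)
class Finding:
    """Hypothetical container for one reviewer finding."""
    severity: str
    description: str
    location: Optional[str] = None   # expected shape: "file:line"
    scenario: Optional[str] = None   # reproducible failure scenario or exploit path


def has_evidence_chain(finding: Finding) -> bool:
    """True when both a file:line reference and a scenario are present."""
    if not finding.location or not finding.scenario:
        return False
    path, _, line = finding.location.rpartition(":")
    return bool(path) and line.isdigit()


def apply_evidence_rule(finding: Finding) -> Finding:
    """Downgrade CRITICAL/HIGH findings by one level when evidence is missing."""
    if finding.severity in ("CRITICAL", "HIGH") and not has_evidence_chain(finding):
        lower = SEVERITIES[SEVERITIES.index(finding.severity) - 1]
        return replace(finding, severity=lower)
    return finding
```

MEDIUM and LOW findings pass through unchanged, since the rule as written applies only to the top two levels.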
|
|
175
|
+
|
|
174
176
|
**Perform comprehensive analysis covering all aspects:**
|
|
175
177
|
|
|
176
178
|
### 🔧 Foundation & Correctness
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: execute
|
|
3
|
-
description: 👻 | Adaptive Wave-Based Build
|
|
3
|
+
description: 👻 | Adaptive Wave-Based Build with Per-Wave Verification Gate
|
|
4
4
|
user-invocable: true
|
|
5
5
|
---
|
|
6
6
|
|
|
@@ -11,9 +11,9 @@ user-invocable: true
|
|
|
11
11
|
Treat the current command arguments as this workflow's input. When invoked from a slash command, use the forwarded `$ARGUMENTS` value.
|
|
12
12
|
|
|
13
13
|
|
|
14
|
-
# execute: Adaptive Task Execution with
|
|
14
|
+
# execute: Adaptive Task Execution with Per-Wave Verification
|
|
15
15
|
|
|
16
|
-
Execute tasks in parallel waves with full scope context, adapt based on learnings,
|
|
16
|
+
Execute tasks in parallel waves with full scope context, verify each wave before proceeding, adapt based on learnings, audit cross-wave integration, generate manual test guide. Outcome: complete implementation with verified quality and E2E requirement coverage.
|
|
17
17
|
|
|
18
18
|
## ARGUMENTS
|
|
19
19
|
|
|
@@ -39,7 +39,9 @@ $ARGUMENTS
|
|
|
39
39
|
|
|
40
40
|
2. **Dispatch Wave**: Launch parallel @dev subagents (1 per task batch)
|
|
41
41
|
- **CRITICAL**: Each subagent MUST read `SCOPE_DOCS` before executing
|
|
42
|
-
- Each receives: task batch assignment,
|
|
42
|
+
- Each receives: task batch assignment, SCOPE_DOCS paths, and (after wave 1) a **Prior-Wave Context** block
|
|
43
|
+
- **Prior-Wave Context** (REQUIRED in waves 2+): the orchestrator appends each prior wave's @dev Completion Reports verbatim into this wave's dispatch prompt under a `## Prior-Wave Context` header. Includes Completed tasks, Files changed, Scope signal, Discoveries, and Guidance from each prior batch. This is how state is carried forward — there is no separate state file.
|
|
44
|
+
- **Test discovery**: instruct @dev to use the project's native related-test command (`jest --findRelatedTests <file>`, `pytest` by path, `vitest related`, `cargo test <path>`). Do not create parallel test files for code already covered.
|
|
43
45
|
- Instruct: "Read scope docs first to understand E2E UX and integration points. Load @skill-spectre:spectre-tdd, then execute tasks sequentially using its TDD methodology. **Commit after each parent task** with conventional commit format (e.g., `feat(module): add X`, `fix(module): resolve Y`). Return completion report with **Implementation Insights** + **E2E Completeness Check**."
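As an illustration of how the Prior-Wave Context block described above might be assembled, here is a minimal sketch. The function name, parameter names, and the layout of everything other than the `## Prior-Wave Context` header are assumptions for illustration; the skill text only specifies that prior completion reports are appended verbatim under that header from wave 2 onward.

```python
from typing import List


def build_dispatch_prompt(task_batch: str,
                          scope_docs: List[str],
                          prior_reports: List[str]) -> str:
    """Assemble a @dev dispatch prompt (hypothetical layout).

    prior_reports holds the completion reports from earlier waves, unedited.
    It is empty for wave 1, so no Prior-Wave Context block is emitted then.
    """
    parts = [
        "Task batch:\n" + task_batch,
        "SCOPE_DOCS:\n" + "\n".join(scope_docs),
    ]
    if prior_reports:
        # Reports are joined verbatim; no summarizing or paraphrasing.
        parts.append("## Prior-Wave Context\n\n" + "\n\n".join(prior_reports))
    return "\n\n".join(parts)
```

Because the reports travel inside the prompt itself, nothing needs to be persisted between waves, which matches the "no separate state file" note.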
|
|
44
46
|
|
|
45
47
|
**E2E Completeness Check** (subagent returns one per batch):
|
|
@@ -47,15 +49,64 @@ $ARGUMENTS
|
|
|
47
49
|
- 🟡 Gap — [specific functionality missing for E2E UX]
|
|
48
50
|
- 🔴 Blocker — [cannot deliver spec without changes to other tasks]
|
|
49
51
|
|
|
50
|
-
3. **
|
|
52
|
+
3. **Per-Wave Verification Gate**: Verify the wave's output before adapting or advancing.
|
|
51
53
|
|
|
52
|
-
|
|
54
|
+
**3a. Deterministic pre-gate (no AI)**
|
|
55
|
+
- Detect project commands from `package.json` / `pyproject.toml` / `Cargo.toml` / `Makefile`
|
|
56
|
+
- Run lint, typecheck, build — whichever apply
|
|
57
|
+
- If any fail: dispatch @dev to fix the failures, re-run the gate. Do NOT invoke @reviewer until all deterministic checks pass.
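The deterministic pre-gate could be sketched roughly as follows. This is an illustration only, not the package's implementation: the script names `lint`, `typecheck`, and `build` are assumed conventions for `package.json`, the Cargo commands are one plausible choice, and `pyproject.toml` and `Makefile` detection is left out because the right commands there are project-specific.

```python
import json
import subprocess
from pathlib import Path
from typing import List

# Assumed npm script names; a real project may use different ones.
GATE_SCRIPT_NAMES = ("lint", "typecheck", "build")


def detect_gate_commands(root: Path) -> List[List[str]]:
    """Return the deterministic checks that apply to the project at root."""
    commands: List[List[str]] = []
    pkg = root / "package.json"
    if pkg.is_file():
        scripts = json.loads(pkg.read_text(encoding="utf-8")).get("scripts", {})
        commands += [["npm", "run", name] for name in GATE_SCRIPT_NAMES if name in scripts]
    if (root / "Cargo.toml").is_file():
        commands += [["cargo", "clippy"], ["cargo", "build"]]
    return commands


def run_gate(commands: List[List[str]], cwd: Path) -> List[List[str]]:
    """Run each command and return the ones that exited non-zero."""
    return [cmd for cmd in commands if subprocess.run(cmd, cwd=cwd).returncode != 0]
```

An empty list from `run_gate` would mean the gate passed and the reviewer lenses may be dispatched; a non-empty list is what gets handed to @dev for fixing before the gate is re-run.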
|
|
58
|
+
|
|
59
|
+
**3b. Parallel review lenses (single message, two @reviewer dispatches)**
|
|
60
|
+
|
|
61
|
+
Build each reviewer prompt from:
|
|
62
|
+
- Wave diff: `git diff <parent-of-first-wave-commit>..HEAD`
|
|
63
|
+
- Acceptance criteria: verbatim text from scope/tasks docs for this wave's tasks
|
|
64
|
+
- Files-touched manifest
|
|
65
|
+
|
|
66
|
+
**Forbidden in reviewer prompts**: @dev completion reports, implementer rationale, orchestrator paraphrase of "what the dev did and why". The reviewer is a clean room — diff + criteria only.
|
|
67
|
+
|
|
68
|
+
**Lens 1 — security + correctness**
|
|
69
|
+
- OWASP Top-10, injection, auth, secrets, data exposure
|
|
70
|
+
- Logic, edge cases, state transitions
|
|
71
|
+
- Scope adherence (flag only in-scope issues; do not flag missing out-of-scope work)
|
|
72
|
+
|
|
73
|
+
**Lens 2 — wiring**
|
|
74
|
+
- Apply the Defined → Connected → Reachable methodology:
|
|
75
|
+
- Defined: code exists in a file
|
|
76
|
+
- Connected: code is imported/called by other code
|
|
77
|
+
- Reachable: a user action can trigger the code path
|
|
78
|
+
- For each new function/component, grep for usage (not just definition)
|
|
79
|
+
- For UI features, trace render-backward: JSX ← variable ← source ← user action
|
|
80
|
+
- Flag dead computations (computed but never reach output) and old code paths still active when replaced
|
|
81
|
+
|
|
82
|
+
**Severity & evidence rule** (enforced in both lens prompts):
|
|
83
|
+
- Every CRITICAL or HIGH finding MUST include:
|
|
84
|
+
1. `file:line` reference
|
|
85
|
+
2. A reproducible failure scenario or exploit path describing observable behavior
|
|
86
|
+
- Findings without an evidence chain are auto-downgraded one severity level. "Could potentially" is not evidence.
|
|
87
|
+
- Each finding includes a hash: `sha256(file_path + line + finding_category)` for the fix-loop ledger (3c).
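The finding hash can be computed with the standard library. The sketch below is one possible reading of `sha256(file_path + line + finding_category)`; the function name is hypothetical, and the unit-separator character between fields is an added assumption rather than something the skill text specifies.

```python
import hashlib


def finding_hash(file_path: str, line: int, category: str) -> str:
    """Stable identifier for a finding, for use in the fix-loop ledger."""
    # Assumption: join fields with a separator. Plain concatenation would make
    # ("a.py1", 2) and ("a.py", 12) produce the same input string.
    key = "\x1f".join((file_path, str(line), category))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Note that a hash keyed on the line number changes whenever a fix shifts the surrounding code, so a finding that moves by a few lines would look new to the ledger. That trade-off is inherent in the formula as stated.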
|
|
88
|
+
|
|
89
|
+
**3c. Bounded fix loop**
|
|
90
|
+
|
|
91
|
+
If lens dispatches return CRITICAL/HIGH:
|
|
92
|
+
- **Iteration cap**: 3 fix waves maximum
|
|
93
|
+
- **Hash ledger**: maintain a set of finding hashes addressed. If a finding with a hash already in the ledger reappears in a later review, classify as "reviewer disagreement" and escalate to user — do NOT re-queue.
|
|
94
|
+
- **Fix/test ratio**: monitor changes per fix wave. If test-file changes > 0.5 × implementation-file changes, halt and surface to user — likely "fixing the test instead of the bug."
|
|
95
|
+
- **Diff-growth circuit-breaker**: if cumulative fix-wave diff grows > 25% per iteration, halt and surface — fixes are adding surface area, not reducing it.
|
|
96
|
+
- **Dispatch fix**: parallel @dev subagents address each CRITICAL/HIGH finding. Each fix-dev receives the finding's full evidence chain (file:line + scenario), not just the description.
|
|
97
|
+
- **Re-verify**: after fixes commit, return to 3a (deterministic) then 3b (lenses).
|
|
98
|
+
|
|
99
|
+
**3d. Exit condition**: No CRITICAL/HIGH remain, OR iteration cap reached and user has been notified of unresolved findings.
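The guards in 3c and the exit condition in 3d can be expressed as two small checks. This is a hedged sketch, not the orchestrator's actual logic: all class and function names are hypothetical, "changes" is assumed to mean changed lines (the text does not say whether it counts lines or files), and diff growth is assumed to be measured against the previous iteration's cumulative diff size.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

MAX_FIX_WAVES = 3          # iteration cap
TEST_RATIO_LIMIT = 0.5     # test changes vs. implementation changes
DIFF_GROWTH_LIMIT = 0.25   # allowed growth of the cumulative diff per iteration


@dataclass
class FixWaveStats:
    test_lines_changed: int
    impl_lines_changed: int
    cumulative_diff_lines: int


@dataclass
class FixLoopState:
    iteration: int = 0
    ledger: Set[str] = field(default_factory=set)   # hashes already addressed
    previous_diff_lines: Optional[int] = None


def plan_fix_wave(state: FixLoopState, hashes: List[str]) -> Tuple[List[str], List[str]]:
    """Split CRITICAL/HIGH finding hashes into (to_fix, escalate_to_user)."""
    if state.iteration >= MAX_FIX_WAVES:
        return [], list(hashes)            # cap reached: everything is surfaced
    to_fix = [h for h in hashes if h not in state.ledger]
    escalate = [h for h in hashes if h in state.ledger]   # reviewer disagreement
    if to_fix:
        state.ledger.update(to_fix)
        state.iteration += 1
    return to_fix, escalate


def check_fix_wave(state: FixLoopState, stats: FixWaveStats) -> Optional[str]:
    """Return a halt reason after a fix wave, or None to proceed to re-verify."""
    if stats.test_lines_changed > TEST_RATIO_LIMIT * stats.impl_lines_changed:
        return "fix/test ratio exceeded"
    prev = state.previous_diff_lines
    if prev and stats.cumulative_diff_lines > prev * (1 + DIFF_GROWTH_LIMIT):
        return "diff growth exceeded"
    state.previous_diff_lines = stats.cumulative_diff_lines
    return None
```

In this reading, `plan_fix_wave` runs before each dispatch and `check_fix_wave` runs after the fixes commit; a non-`None` result or a non-empty escalation list is what gets surfaced to the user.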
|
|
100
|
+
|
|
101
|
+
4. **Mark Complete**: Update tasks doc with `[x]` for completed tasks
|
|
102
|
+
|
|
103
|
+
5. **Reflect**: Review completion reports for:
|
|
53
104
|
- Scope signals (🟡/🟠/🔴) from implementation insights
|
|
54
105
|
- E2E completeness gaps (🟡/🔴) from completeness checks
|
|
55
|
-
- **If** all ⚪ across both → skip to step
|
|
106
|
+
- **If** all ⚪ across both → skip to step 7
|
|
56
107
|
- **Else** → adapt tasks
|
|
57
108
|
|
|
58
|
-
|
|
109
|
+
6. **Adapt** (only if triggered):
|
|
59
110
|
- Modify future tasks with learned context
|
|
60
111
|
- Add tasks for E2E gaps with `[ADDED - E2E gap]` prefix
|
|
61
112
|
- Add required sub-tasks with `[ADDED]` prefix
|
|
@@ -63,34 +114,28 @@ $ARGUMENTS
|
|
|
63
114
|
- Flag cross-task integration issues to remaining waves
|
|
64
115
|
- **Guardrails**: ❌ No "nice-to-have" additions, ❌ No scope expansion, ✅ Only adapt for spec compliance
|
|
65
116
|
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
## Step 2 - Code Review Loop
|
|
69
|
-
|
|
70
|
-
- **Action** — ExecutedeveviewLoop: Until no critical/high feedback:
|
|
117
|
+
7. **Next Wave**: Identify next tasks, gather prior-wave completion reports for the Prior-Wave Context block, return to step 1
|
|
71
118
|
|
|
72
|
-
|
|
73
|
-
2. **Analyze**: Identify critical/high items
|
|
74
|
-
- **If** none → exit loop
|
|
75
|
-
3. **Address**: Parallel @dev subagents fix feedback
|
|
76
|
-
4. **Re-verify**: Return to step 1
|
|
119
|
+
## Step 2 - Cross-Wave Validate
|
|
77
120
|
|
|
78
|
-
|
|
121
|
+
- **Action** — SpawnValidation: @analyst runs `Skill(validate)` (Claude slash route: `/spectre:validate`) with **narrowed scope**:
|
|
122
|
+
- Focus: cross-wave integration audit (did later waves silently break earlier waves' wiring?) + scope-creep audit (anything implemented that is NOT in the acceptance criteria?) + dead-computation sweep across the full cumulative diff
|
|
123
|
+
- Skip: per-area wiring verification (already done per-wave in Step 1.3b's wiring lens)
|
|
79
124
|
|
|
80
|
-
- **Action** —
|
|
81
|
-
- **Action** — AddressGaps: If high priority gaps → dispatch @dev subagents to fix
|
|
125
|
+
- **Action** — AddressGaps: If high priority gaps surface → dispatch @dev subagents to fix.
|
|
82
126
|
|
|
83
|
-
## Step
|
|
127
|
+
## Step 3 - Prepare for QA
|
|
84
128
|
|
|
85
129
|
- **Action** — GenerateTestGuide: @dev runs `Skill(create_test_guide)` (Claude slash route: `/spectre:create_test_guide`)
|
|
86
130
|
- Save to `{OUT_DIR}/test_guide.md`
|
|
87
131
|
|
|
88
|
-
## Step
|
|
132
|
+
## Step 4 - Report
|
|
89
133
|
|
|
90
134
|
- **Action** — SummarizeCompletion:
|
|
91
|
-
- Tasks completed, waves executed,
|
|
135
|
+
- Tasks completed, waves executed, per-wave fix-loop iteration counts, validation status
|
|
92
136
|
- Test guide location
|
|
93
137
|
- **Task Evolution Summary**: Adaptations made (or "None - original plan executed")
|
|
94
138
|
- **E2E Gaps Addressed**: Summary of completeness issues found and resolved
|
|
139
|
+
- **Unresolved Findings** (if any): Any CRITICAL/HIGH that hit the fix-loop cap and were escalated to user
|
|
95
140
|
|
|
96
141
|
- **Action** — RenderFooter: Use `@skill-spectre:spectre-guide` skill for Next Steps
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: plan_review
|
|
3
|
-
description: 👻 | Independent multi-lens review of plan.md
|
|
3
|
+
description: 👻 | Independent multi-lens review of plan.md and/or tasks.md — finds overengineering, missing verification, hallucinated deps, weak references
|
|
4
4
|
user-invocable: true
|
|
5
5
|
---
|
|
6
6
|
|
|
@@ -10,12 +10,12 @@ user-invocable: true
|
|
|
10
10
|
|
|
11
11
|
Treat the current command arguments as this workflow's input. When invoked from a slash command, use the forwarded `$ARGUMENTS` value.
|
|
12
12
|
|
|
13
|
-
# plan_review: Multi-Lens Review of Plan
|
|
13
|
+
# plan_review: Multi-Lens Review of Plan and/or Tasks
|
|
14
14
|
|
|
15
15
|
## Description
|
|
16
16
|
|
|
17
|
-
- **What** — Independent review of `plan.md`
|
|
18
|
-
- **Outcome** — Structured findings with concrete edit suggestions; optional write-back to update
|
|
17
|
+
- **What** — Independent review of any available planning artifacts (`plan.md`, `tasks.md`, and optional `task_context.md`) from four specialized lenses, dispatched in parallel
|
|
18
|
+
- **Outcome** — Structured findings with concrete edit suggestions; optional write-back to update the available artifacts
|
|
19
19
|
- **Role** — Senior staff engineer + reviewer panel; bias toward pragmatic problem-solving, YAGNI enforcement, and verifiability
|
|
20
20
|
|
|
21
21
|
## ARGUMENTS Input
|
|
@@ -42,25 +42,30 @@ A single reviewer biases toward the issues it notices first. Published practice
|
|
|
42
42
|
- **If** user specifies path in ARGUMENTS → `TASK_DIR={that value}`
|
|
43
43
|
- **Else** → `TASK_DIR=docs/tasks/{branch_name}`
|
|
44
44
|
|
|
45
|
-
- **Action** — ResolveArtifacts: Locate the
|
|
45
|
+
- **Action** — ResolveArtifacts: Locate the available review inputs.
|
|
46
46
|
- `PLAN=${TASK_DIR}/specs/plan.md` (or scoped name)
|
|
47
47
|
- `TASKS=${TASK_DIR}/specs/tasks.md` (or scoped name)
|
|
48
48
|
- `CONTEXT=${TASK_DIR}/task_context.md`
|
|
49
|
-
-
|
|
49
|
+
- `plan.md` and `tasks.md` are independently reviewable. It is valid to review only `plan.md`, only `tasks.md`, or both.
|
|
50
|
+
- `task_context.md` is helpful context but is not required. If it is missing, continue and note that requirements traceability is limited.
|
|
51
|
+
- If both `plan.md` and `tasks.md` are missing, stop and suggest the user run `/spectre:plan` or `/spectre:create_tasks` first.
|
|
52
|
+
- If exactly one of `plan.md` or `tasks.md` is missing, list it as absent context and continue. Do not decline, stop, or ask the user to create the missing artifact.
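The artifact-resolution rules above reduce to a presence check with one stop condition. The sketch below assumes the default file names under `TASK_DIR` and does not model the "(or scoped name)" variants; the function name and return shape are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def resolve_artifacts(task_dir: Path) -> Tuple[Dict[str, Path], List[str], bool]:
    """Return (present, absent, can_review) for the plan_review inputs."""
    candidates = {
        "plan.md": task_dir / "specs" / "plan.md",
        "tasks.md": task_dir / "specs" / "tasks.md",
        "task_context.md": task_dir / "task_context.md",
    }
    present = {name: path for name, path in candidates.items() if path.is_file()}
    absent = [name for name in candidates if name not in present]
    # Review proceeds if at least one of plan.md / tasks.md exists;
    # task_context.md alone is not reviewable.
    can_review = "plan.md" in present or "tasks.md" in present
    return present, absent, can_review
```

The `present` and `absent` values together correspond to the artifact manifest that each reviewer receives, and `can_review` being false is the only case where the user is routed to `/spectre:plan` or `/spectre:create_tasks`.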
|
|
50
53
|
|
|
51
|
-
- **Action** —
|
|
54
|
+
- **Action** — ReadAvailable: Read each available file completely into context before dispatching reviewers. Reviewers receive curated excerpts plus an artifact manifest that says which files are present and absent. Every reviewer must review the artifacts that exist and must not treat absent artifacts as a blocker.
|
|
52
55
|
|
|
53
56
|
## Step 2 — Dispatch Four Parallel Reviewers
|
|
54
57
|
|
|
55
|
-
Spawn all four subagents in a single message (parallel). Each receives the same artifact excerpts
|
|
58
|
+
Spawn all four subagents in a single message (parallel). Each receives the same available artifact excerpts, the artifact manifest, and a different review brief.
|
|
59
|
+
|
|
60
|
+
Missing-artifact rule for every lens: review what exists. If a finding depends on a missing artifact, phrase it as "not reviewable because `<artifact>` is absent" only when that context is necessary; do not fail the review or ask for the missing artifact.
|
|
56
61
|
|
|
57
62
|
### Lens 1 — YAGNI / Familiar-Shape Bias (`@reviewer`)
|
|
58
63
|
|
|
59
|
-
> Review
|
|
64
|
+
> Review the available plan and/or task list for unrequested complexity. Agents have a documented "familiar-shape bias": shown a feature, they reproduce the mature-system shape from their training data (auth → adds rate-limiting; CRUD → adds soft-delete; form → adds optimistic UI; service → adds telemetry; module → adds feature flags). Your job is to find that bias here.
|
|
60
65
|
>
|
|
61
66
|
> Find:
|
|
62
|
-
> 1.
|
|
63
|
-
> 2.
|
|
67
|
+
> 1. When `plan.md` is present: anything in Technical Approach that isn't traceable to a requirement in available context (`task_context.md` / scope / PRD). If context is absent, use the plan's own requirements and boundaries.
|
|
68
|
+
> 2. When `tasks.md` is present: tasks that implement something the available requirements don't ask for. If requirements context is absent, use the task list's stated goals and boundaries.
|
|
64
69
|
> 3. Abstractions, interfaces, or layers introduced for a single concrete caller.
|
|
65
70
|
> 4. Generality (config files, plugin points, factories) where the actual need is one specific behavior.
|
|
66
71
|
> 5. Overlap with the `Out-of-Bounds — DO NOT add` list (if anything violates that list, it's a hard fail).
|
|
@@ -69,36 +74,36 @@ Spawn all four subagents in a single message (parallel). Each receives the same
|
|
|
69
74
|
|
|
70
75
|
### Lens 2 — Verifiability (`@analyst`)
|
|
71
76
|
|
|
72
|
-
> Review
|
|
77
|
+
> Review the available plan and/or task list for verification quality. The single highest-correlate of successful AI-agent execution is the ability to self-verify. Find every place where verification is missing, prose-only, or disconnected.
|
|
73
78
|
>
|
|
74
79
|
> Find:
|
|
75
|
-
> 1.
|
|
76
|
-
> 2.
|
|
77
|
-
> 3.
|
|
78
|
-
> 4.
|
|
79
|
-
> 5.
|
|
80
|
+
> 1. When `plan.md` is present: items in "Verification — How We Know This Works" that are prose ("works correctly", "is consistent") rather than executable (test name / observable behavior / state condition).
|
|
81
|
+
> 2. When `plan.md` is present: phases that don't declare a verification signal.
|
|
82
|
+
> 3. When `tasks.md` is present: sub-tasks whose acceptance criteria aren't one of the three executable types (test passes / observable behavior / state condition).
|
|
83
|
+
> 4. When both `plan.md` and `tasks.md` are present: verification signals in `plan.md` with no matching acceptance criterion in `tasks.md`.
|
|
84
|
+
> 5. When `tasks.md` is present: behavior-changing sub-tasks that lack a preceding RED test sub-task.
|
|
80
85
|
>
|
|
81
86
|
> Required output: list every non-executable criterion with a proposed rewrite in one of the three types. Cite file:line for each.
|
|
82
87
|
|
|
83
88
|
### Lens 3 — Existence / Hallucination (`@finder`)
|
|
84
89
|
|
|
85
|
-
> Review
|
|
90
|
+
> Review the available plan and/or task list for references to things that may not exist. AI-generated plans hallucinate file paths, package names, function signatures, and API endpoints at measurable rates (~20% for packages per Snyk analysis). Your job is to verify every reference is real.
|
|
86
91
|
>
|
|
87
92
|
> Verify:
|
|
88
|
-
> 1. Every file path mentioned in `plan.md` "Critical Files for Implementation" and
|
|
89
|
-
> 2. Every package in `plan.md` "External Dependencies" — does it exist at the named version? (Note: actual install/registry check is the executor's Phase 0 job; your job is to flag suspicious names — typos, near-misses to well-known packages, lookalikes.)
|
|
90
|
-
> 3. Every function, class, or symbol named in plan/tasks — grep the repo, confirm it exists where claimed.
|
|
93
|
+
> 1. Every file path mentioned in available artifacts, including `plan.md` "Critical Files for Implementation" and `tasks.md` Context blocks when present — does the file exist in the repo today? Use Glob/Read to confirm.
|
|
94
|
+
> 2. Every package in `plan.md` "External Dependencies" when `plan.md` is present — does it exist at the named version? (Note: actual install/registry check is the executor's Phase 0 job; your job is to flag suspicious names — typos, near-misses to well-known packages, lookalikes.)
|
|
95
|
+
> 3. Every function, class, or symbol named in available plan/tasks — grep the repo, confirm it exists where claimed.
|
|
91
96
|
> 4. Every API endpoint, env var, or CLI flag referenced — confirm it's defined in the codebase.
|
|
92
97
|
>
|
|
93
98
|
> Required output: list every reference that fails verification, with `expected: <plan claim>` and `actual: <repo state>`. If everything checks out, say so explicitly — don't pad.
|
|
94
99
|
|
|
95
100
|
### Lens 4 — Canonical Reference Quality (`@patterns`)
|
|
96
101
|
|
|
97
|
-
> Review
|
|
102
|
+
> Review the available plan and/or task list for the quality of "follow existing pattern" references. Anthropic's own guidance is to anchor plans with concrete examples (e.g., "HotDogWidget.php is a good example"). Vague "follow existing patterns" without a file:line anchor is a documented failure mode.
|
|
98
103
|
>
|
|
99
104
|
> Find:
|
|
100
|
-
> 1.
|
|
101
|
-
> 2.
|
|
105
|
+
> 1. When `plan.md` is present: places in Technical Approach that reference "existing patterns" or "similar features" without a specific file:line.
|
|
106
|
+
> 2. When `tasks.md` is present: sub-tasks whose Context block lacks a canonical reference pointer.
|
|
102
107
|
> 3. Better canonical references that the plan missed — actual files in the codebase that more closely match the intended shape.
|
|
103
108
|
> 4. Reuse opportunities the plan ignored: utilities, hooks, helpers, or types already in the repo that the plan re-implements.
|
|
104
109
|
>
|
|
@@ -150,12 +155,12 @@ Spawn all four subagents in a single message (parallel). Each receives the same
|
|
|
150
155
|
> - `1,3,5` — apply specific finding numbers
|
|
151
156
|
> - `skip` — leave artifacts unchanged
|
|
152
157
|
>
|
|
153
|
-
> For findings I apply, I'll edit
|
|
158
|
+
> For findings I apply, I'll edit the relevant available artifact(s) inline and re-run a fast self-check.
|
|
154
159
|
|
|
155
160
|
- **Wait** — User selects.
|
|
156
161
|
|
|
157
162
|
- **Action** — ApplyEdits: For each selected finding:
|
|
158
|
-
- Open the named artifact (plan.md or tasks.md)
|
|
163
|
+
- Open the named artifact (`plan.md` or `tasks.md`)
|
|
159
164
|
- Apply the Suggested Edit verbatim where possible; if the edit needs adaptation, make the minimum change consistent with the finding's intent
|
|
160
165
|
- Track which findings were applied
|
|
161
166
|
|
|
@@ -168,7 +173,7 @@ Spawn all four subagents in a single message (parallel). Each receives the same
|
|
|
168
173
|
- **Action** — ReportApplied:
|
|
169
174
|
|
|
170
175
|
> Applied: {list of finding numbers}. Skipped: {list}.
|
|
171
|
-
> {Path to updated
|
|
176
|
+
> {Path to updated artifact(s)}.
|
|
172
177
|
|
|
173
178
|
## Step 5 — Next Steps
|
|
174
179
|
|
|
@@ -178,6 +183,6 @@ Spawn all four subagents in a single message (parallel). Each receives the same
|
|
|
178
183
|
|
|
179
184
|
## Notes
|
|
180
185
|
|
|
181
|
-
- This skill does NOT generate plans or tasks. It reviews
|
|
186
|
+
- This skill does NOT generate plans or tasks. It reviews available planning artifacts. If only one of `plan.md` or `tasks.md` exists, review that artifact. Only route the user to `/spectre:plan` or `/spectre:create_tasks` when neither reviewable artifact exists.
|
|
182
187
|
- The four lenses are intentionally non-overlapping by design but will surface overlap in practice — dedupe at synthesis, don't ask reviewers to coordinate.
|
|
183
188
|
- The "Must-Delete" nomination from Lens 1 is mandatory output — even on a tight plan, naming the single weakest element is a forcing function against under-review.
|
|
@@ -171,6 +171,8 @@ Optional user input to seed this workflow.
|
|
|
171
171
|
- **MEDIUM**: Quality improvements, test coverage, configuration, performance (non-critical)
|
|
172
172
|
- **LOW**: Documentation, polish, cleanup
|
|
173
173
|
|
|
174
|
+
**Evidence rule:** Every CRITICAL or HIGH finding MUST include (1) `file:line` and (2) a reproducible failure scenario or exploit path describing observable behavior. Findings without an evidence chain are auto-downgraded one severity level. "Could potentially" is not evidence.
|
|
175
|
+
|
|
174
176
|
**Perform comprehensive analysis covering all aspects:**
|
|
175
177
|
|
|
176
178
|
### 🔧 Foundation & Correctness
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: "execute"
|
|
3
|
-
description: "👻 | Adaptive Wave-Based Build
|
|
3
|
+
description: "👻 | Adaptive Wave-Based Build with Per-Wave Verification Gate"
|
|
4
4
|
user-invocable: true
|
|
5
5
|
---
|
|
6
6
|
|
|
@@ -11,9 +11,9 @@ user-invocable: true
|
|
|
11
11
|
Treat the current command arguments as this workflow's input. When invoked from a slash command, use the forwarded `$ARGUMENTS` value.
|
|
12
12
|
|
|
13
13
|
|
|
14
|
-
# execute: Adaptive Task Execution with
|
|
14
|
+
# execute: Adaptive Task Execution with Per-Wave Verification
|
|
15
15
|
|
|
16
|
-
Execute tasks in parallel waves with full scope context, adapt based on learnings,
|
|
16
|
+
Execute tasks in parallel waves with full scope context, verify each wave before proceeding, adapt based on learnings, audit cross-wave integration, generate manual test guide. Outcome: complete implementation with verified quality and E2E requirement coverage.
|
|
17
17
|
|
|
18
18
|
## ARGUMENTS
|
|
19
19
|
|
|
@@ -39,7 +39,9 @@ $ARGUMENTS
|
|
|
39
39
|
|
|
40
40
|
2. **Dispatch Wave**: Launch parallel @dev subagents (1 per task batch)
|
|
41
41
|
- **CRITICAL**: Each subagent MUST read `SCOPE_DOCS` before executing
|
|
42
|
-
- Each receives: task batch assignment,
|
|
42
|
+
- Each receives: task batch assignment, SCOPE_DOCS paths, and (after wave 1) a **Prior-Wave Context** block
|
|
43
|
+
- **Prior-Wave Context** (REQUIRED in waves 2+): the orchestrator appends each prior wave's @dev Completion Reports verbatim into this wave's dispatch prompt under a `## Prior-Wave Context` header. Includes Completed tasks, Files changed, Scope signal, Discoveries, and Guidance from each prior batch. This is how state is carried forward — there is no separate state file.
|
|
44
|
+
- **Test discovery**: instruct @dev to use the project's native related-test command (`jest --findRelatedTests <file>`, `pytest` by path, `vitest related`, `cargo test <path>`). Do not create parallel test files for code already covered.
|
|
43
45
|
- Instruct: "Read scope docs first to understand E2E UX and integration points. Load Skill(spectre-tdd), then execute tasks sequentially using its TDD methodology. **Commit after each parent task** with conventional commit format (e.g., `feat(module): add X`, `fix(module): resolve Y`). Return completion report with **Implementation Insights** + **E2E Completeness Check**."
|
|
44
46
|
|
|
45
47
|
**E2E Completeness Check** (subagent returns one per batch):
|
|
@@ -47,15 +49,64 @@ $ARGUMENTS
|
|
|
47
49
|
- 🟡 Gap — [specific functionality missing for E2E UX]
|
|
48
50
|
- 🔴 Blocker — [cannot deliver spec without changes to other tasks]
|
|
49
51
|
|
|
50
|
-
3. **
|
|
52
|
+
3. **Per-Wave Verification Gate**: Verify the wave's output before adapting or advancing.
|
|
51
53
|
|
|
52
|
-
|
|
54
|
+
**3a. Deterministic pre-gate (no AI)**
|
|
55
|
+
- Detect project commands from `package.json` / `pyproject.toml` / `Cargo.toml` / `Makefile`
|
|
56
|
+
- Run lint, typecheck, build — whichever apply
|
|
57
|
+
- If any fail: dispatch @dev to fix the failures, re-run the gate. Do NOT invoke @reviewer until all deterministic checks pass.
|
|
58
|
+
|
|
59
|
+
**3b. Parallel review lenses (single message, two @reviewer dispatches)**
|
|
60
|
+
|
|
61
|
+
Build each reviewer prompt from:
|
|
62
|
+
- Wave diff: `git diff <parent-of-first-wave-commit>..HEAD`
|
|
63
|
+
- Acceptance criteria: verbatim text from scope/tasks docs for this wave's tasks
|
|
64
|
+
- Files-touched manifest
|
|
65
|
+
|
|
66
|
+
**Forbidden in reviewer prompts**: @dev completion reports, implementer rationale, orchestrator paraphrase of "what the dev did and why". The reviewer is a clean room — diff + criteria only.
|
|
67
|
+
|
|
68
|
+
**Lens 1 — security + correctness**
|
|
69
|
+
- OWASP Top-10, injection, auth, secrets, data exposure
|
|
70
|
+
- Logic, edge cases, state transitions
|
|
71
|
+
- Scope adherence (flag only in-scope issues; do not flag missing out-of-scope work)
|
|
72
|
+
|
|
73
|
+
**Lens 2 — wiring**
|
|
74
|
+
- Apply the Defined → Connected → Reachable methodology:
|
|
75
|
+
- Defined: code exists in a file
|
|
76
|
+
- Connected: code is imported/called by other code
|
|
77
|
+
- Reachable: a user action can trigger the code path
|
|
78
|
+
- For each new function/component, grep for usage (not just definition)
|
|
79
|
+
- For UI features, trace render-backward: JSX ← variable ← source ← user action
|
|
80
|
+
- Flag dead computations (computed but never reach output) and old code paths still active when replaced
|
|
81
|
+
|
|
82
|
+
**Severity & evidence rule** (enforced in both lens prompts):
|
|
83
|
+
- Every CRITICAL or HIGH finding MUST include:
|
|
84
|
+
1. `file:line` reference
|
|
85
|
+
2. A reproducible failure scenario or exploit path describing observable behavior
|
|
86
|
+
- Findings without an evidence chain are auto-downgraded one severity level. "Could potentially" is not evidence.
|
|
87
|
+
- Each finding includes a hash: `sha256(file_path + line + finding_category)` for the fix-loop ledger (3c).
|
|
88
|
+
|
|
89
|
+
**3c. Bounded fix loop**
|
|
90
|
+
|
|
91
|
+
If lens dispatches return CRITICAL/HIGH:
|
|
92
|
+
- **Iteration cap**: 3 fix waves maximum
|
|
93
|
+
- **Hash ledger**: maintain a set of finding hashes addressed. If a finding with a hash already in the ledger reappears in a later review, classify as "reviewer disagreement" and escalate to user — do NOT re-queue.
|
|
94
|
+
- **Fix/test ratio**: monitor changes per fix wave. If test-file changes > 0.5 × implementation-file changes, halt and surface to user — likely "fixing the test instead of the bug."
|
|
95
|
+
- **Diff-growth circuit-breaker**: if cumulative fix-wave diff grows > 25% per iteration, halt and surface — fixes are adding surface area, not reducing it.
|
|
96
|
+
- **Dispatch fix**: parallel @dev subagents address each CRITICAL/HIGH finding. Each fix-dev receives the finding's full evidence chain (file:line + scenario), not just the description.
|
|
97
|
+
- **Re-verify**: after fixes commit, return to 3a (deterministic) then 3b (lenses).
|
|
98
|
+
|
|
99
|
+
**3d. Exit condition**: No CRITICAL/HIGH remain, OR iteration cap reached and user has been notified of unresolved findings.
|
|
100
|
+
|
|
101
|
+
4. **Mark Complete**: Update tasks doc with `[x]` for completed tasks
|
|
102
|
+
|
|
103
|
+
5. **Reflect**: Review completion reports for:
|
|
53
104
|
- Scope signals (🟡/🟠/🔴) from implementation insights
|
|
54
105
|
- E2E completeness gaps (🟡/🔴) from completeness checks
|
|
55
|
-
- **If** all ⚪ across both → skip to step
|
|
106
|
+
- **If** all ⚪ across both → skip to step 7
|
|
56
107
|
- **Else** → adapt tasks
|
|
57
108
|
|
|
58
|
-
|
|
109
|
+
6. **Adapt** (only if triggered):
|
|
59
110
|
- Modify future tasks with learned context
|
|
60
111
|
- Add tasks for E2E gaps with `[ADDED - E2E gap]` prefix
|
|
61
112
|
- Add required sub-tasks with `[ADDED]` prefix
|
|
@@ -63,34 +114,28 @@ $ARGUMENTS
|
|
|
63
114
|
- Flag cross-task integration issues to remaining waves
|
|
64
115
|
- **Guardrails**: ❌ No "nice-to-have" additions, ❌ No scope expansion, ✅ Only adapt for spec compliance
|
|
65
116
|
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
## Step 2 - Code Review Loop
|
|
69
|
-
|
|
70
|
-
- **Action** — ExecutedeveviewLoop: Until no critical/high feedback:
|
|
117
|
+
7. **Next Wave**: Identify next tasks, gather prior-wave completion reports for the Prior-Wave Context block, return to step 1
|
|
71
118
|
|
|
72
|
-
|
|
73
|
-
2. **Analyze**: Identify critical/high items
|
|
74
|
-
- **If** none → exit loop
|
|
75
|
-
3. **Address**: Parallel @dev subagents fix feedback
|
|
76
|
-
4. **Re-verify**: Return to step 1
|
|
119
|
+
## Step 2 - Cross-Wave Validate
|
|
77
120
|
|
|
78
|
-
|
|
121
|
+
- **Action** — SpawnValidation: @analyst runs `Skill(validate)` (Claude slash route: `validate`) with **narrowed scope**:
|
|
122
|
+
- Focus: cross-wave integration audit (did later waves silently break earlier waves' wiring?) + scope-creep audit (anything implemented that is NOT in the acceptance criteria?) + dead-computation sweep across the full cumulative diff
|
|
123
|
+
- Skip: per-area wiring verification (already done per-wave in Step 1.3b's wiring lens)
|
|
79
124
|
|
|
80
|
-
- **Action** —
|
|
81
|
-
- **Action** — AddressGaps: If high priority gaps → dispatch @dev subagents to fix
|
|
125
|
+
- **Action** — AddressGaps: If high priority gaps surface → dispatch @dev subagents to fix.
|
|
82
126
|
|
|
83
|
-
## Step
|
|
127
|
+
## Step 3 - Prepare for QA
|
|
84
128
|
|
|
85
129
|
- **Action** — GenerateTestGuide: @dev runs `Skill(create_test_guide)` (Claude slash route: `create_test_guide`)
|
|
86
130
|
- Save to `{OUT_DIR}/test_guide.md`
|
|
87
131
|
|
|
88
|
-
## Step
|
|
132
|
+
## Step 4 - Report
|
|
89
133
|
|
|
90
134
|
- **Action** — SummarizeCompletion:
|
|
91
|
-
- Tasks completed, waves executed,
|
|
135
|
+
- Tasks completed, waves executed, per-wave fix-loop iteration counts, validation status
|
|
92
136
|
- Test guide location
|
|
93
137
|
- **Task Evolution Summary**: Adaptations made (or "None - original plan executed")
|
|
94
138
|
- **E2E Gaps Addressed**: Summary of completeness issues found and resolved
|
|
139
|
+
- **Unresolved Findings** (if any): Any CRITICAL/HIGH findings that hit the fix-loop cap and were escalated to the user
|
|
95
140
|
|
|
96
141
|
- **Action** — RenderFooter: Use `Skill(spectre-guide)` skill for Next Steps
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: "plan_review"
|
|
3
|
-
description: "👻 | Independent multi-lens review of plan.md
|
|
3
|
+
description: "👻 | Independent multi-lens review of plan.md and/or tasks.md — finds overengineering, missing verification, hallucinated deps, weak references"
|
|
4
4
|
user-invocable: true
|
|
5
5
|
---
|
|
6
6
|
|
|
@@ -10,12 +10,12 @@ user-invocable: true
|
|
|
10
10
|
|
|
11
11
|
Treat the current command arguments as this workflow's input. When invoked from a slash command, use the forwarded `$ARGUMENTS` value.
|
|
12
12
|
|
|
13
|
-
# plan_review: Multi-Lens Review of Plan
|
|
13
|
+
# plan_review: Multi-Lens Review of Plan and/or Tasks
|
|
14
14
|
|
|
15
15
|
## Description
|
|
16
16
|
|
|
17
|
-
- **What** — Independent review of `plan.md`
|
|
18
|
-
- **Outcome** — Structured findings with concrete edit suggestions; optional write-back to update
|
|
17
|
+
- **What** — Independent review of any available planning artifacts (`plan.md`, `tasks.md`, and optional `task_context.md`) from four specialized lenses, dispatched in parallel
|
|
18
|
+
- **Outcome** — Structured findings with concrete edit suggestions; optional write-back to update the available artifacts
|
|
19
19
|
- **Role** — Senior staff engineer + reviewer panel; bias toward pragmatic problem-solving, YAGNI enforcement, and verifiability
|
|
20
20
|
|
|
21
21
|
## ARGUMENTS Input
|
|
@@ -42,25 +42,30 @@ A single reviewer biases toward the issues it notices first. Published practice
|
|
|
42
42
|
- **If** user specifies path in ARGUMENTS → `TASK_DIR={that value}`
|
|
43
43
|
- **Else** → `TASK_DIR=docs/tasks/{branch_name}`
|
|
44
44
|
|
|
45
|
-
- **Action** — ResolveArtifacts: Locate the
|
|
45
|
+
- **Action** — ResolveArtifacts: Locate the available review inputs.
|
|
46
46
|
- `PLAN=${TASK_DIR}/specs/plan.md` (or scoped name)
|
|
47
47
|
- `TASKS=${TASK_DIR}/specs/tasks.md` (or scoped name)
|
|
48
48
|
- `CONTEXT=${TASK_DIR}/task_context.md`
|
|
49
|
-
-
|
|
49
|
+
- `plan.md` and `tasks.md` are independently reviewable. It is valid to review only `plan.md`, only `tasks.md`, or both.
|
|
50
|
+
- `task_context.md` is helpful context but is not required. If it is missing, continue and note that requirements traceability is limited.
|
|
51
|
+
- If both `plan.md` and `tasks.md` are missing, stop and suggest the user run `plan` or `create_tasks` first.
|
|
52
|
+
- If exactly one of `plan.md` or `tasks.md` is missing, list it as absent context and continue. Do not decline, stop, or ask the user to create the missing artifact.
|
|
50
53
|
|
|
51
|
-
- **Action** —
|
|
54
|
+
- **Action** — ReadAvailable: Read each available file completely into context before dispatching reviewers. Reviewers receive curated excerpts plus an artifact manifest that says which files are present and absent. Every reviewer must review the artifacts that exist and must not treat absent artifacts as a blocker.
|
|
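The artifact-resolution rules above can be sketched as a small function. This is an illustrative sketch only, not the skill's actual implementation: `ArtifactManifest` and `resolve_artifacts` are hypothetical names, and the sketch ignores the "(or scoped name)" variants of the file names.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ArtifactManifest:
    """Hypothetical manifest handed to every reviewer."""
    present: dict[str, Path]
    absent: list[str]
    notes: list[str]


def resolve_artifacts(task_dir: Path) -> ArtifactManifest:
    # Default locations taken from the ResolveArtifacts step; scoped names are not handled.
    candidates = {
        "plan.md": task_dir / "specs" / "plan.md",
        "tasks.md": task_dir / "specs" / "tasks.md",
        "task_context.md": task_dir / "task_context.md",
    }
    present = {name: path for name, path in candidates.items() if path.is_file()}
    absent = [name for name in candidates if name not in present]

    # Stop only when neither reviewable artifact exists.
    if "plan.md" not in present and "tasks.md" not in present:
        raise FileNotFoundError(
            "No reviewable artifact found; run `plan` or `create_tasks` first."
        )

    # A missing task_context.md is noted, never treated as a blocker.
    notes = []
    if "task_context.md" in absent:
        notes.append("requirements traceability is limited")
    return ArtifactManifest(present=present, absent=absent, notes=notes)
```

With only `tasks.md` on disk, the call succeeds and lists `plan.md` and `task_context.md` as absent context rather than failing the review.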
52
55
|
|
|
53
56
|
## Step 2 — Dispatch Four Parallel Reviewers
|
|
54
57
|
|
|
55
|
-
Spawn all four subagents in a single message (parallel). Each receives the same artifact excerpts
|
|
58
|
+
Spawn all four subagents in a single message (parallel). Each receives the same available artifact excerpts, the artifact manifest, and a different review brief.
|
|
59
|
+
|
|
60
|
+
Missing-artifact rule for every lens: review what exists. If a finding depends on a missing artifact, phrase it as "not reviewable because `<artifact>` is absent" only when that context is necessary; do not fail the review or ask for the missing artifact.
|
|
56
61
|
|
|
57
62
|
### Lens 1 — YAGNI / Familiar-Shape Bias (`@reviewer`)
|
|
58
63
|
|
|
59
|
-
> Review
|
|
64
|
+
> Review the available plan and/or task list for unrequested complexity. Agents have a documented "familiar-shape bias": shown a feature, they reproduce the mature-system shape from their training data (auth → adds rate-limiting; CRUD → adds soft-delete; form → adds optimistic UI; service → adds telemetry; module → adds feature flags). Your job is to find that bias here.
|
|
60
65
|
>
|
|
61
66
|
> Find:
|
|
62
|
-
> 1.
|
|
63
|
-
> 2.
|
|
67
|
+
> 1. When `plan.md` is present: anything in Technical Approach that isn't traceable to a requirement in available context (`task_context.md` / scope / PRD). If context is absent, use the plan's own requirements and boundaries.
|
|
68
|
+
> 2. When `tasks.md` is present: tasks that implement something the available requirements don't ask for. If requirements context is absent, use the task list's stated goals and boundaries.
|
|
64
69
|
> 3. Abstractions, interfaces, or layers introduced for a single concrete caller.
|
|
65
70
|
> 4. Generality (config files, plugin points, factories) where the actual need is one specific behavior.
|
|
66
71
|
> 5. Overlap with the `Out-of-Bounds — DO NOT add` list (if anything violates that list, it's a hard fail).
|
|
@@ -69,36 +74,36 @@ Spawn all four subagents in a single message (parallel). Each receives the same
|
|
|
69
74
|
|
|
70
75
|
### Lens 2 — Verifiability (`@analyst`)
|
|
71
76
|
|
|
72
|
-
> Review
|
|
77
|
+
> Review the available plan and/or task list for verification quality. The single strongest correlate of successful AI-agent execution is the ability to self-verify. Find every place where verification is missing, prose-only, or disconnected.
|
|
73
78
|
>
|
|
74
79
|
> Find:
|
|
75
|
-
> 1.
|
|
76
|
-
> 2.
|
|
77
|
-
> 3.
|
|
78
|
-
> 4.
|
|
79
|
-
> 5.
|
|
80
|
+
> 1. When `plan.md` is present: items in "Verification — How We Know This Works" that are prose ("works correctly", "is consistent") rather than executable (test name / observable behavior / state condition).
|
|
81
|
+
> 2. When `plan.md` is present: phases that don't declare a verification signal.
|
|
82
|
+
> 3. When `tasks.md` is present: sub-tasks whose acceptance criteria aren't one of the three executable types (test passes / observable behavior / state condition).
|
|
83
|
+
> 4. When both `plan.md` and `tasks.md` are present: verification signals in `plan.md` with no matching acceptance criterion in `tasks.md`.
|
|
84
|
+
> 5. When `tasks.md` is present: behavior-changing sub-tasks that lack a preceding RED test sub-task.
|
|
80
85
|
>
|
|
81
86
|
> Required output: list every non-executable criterion with a proposed rewrite in one of the three types. Cite file:line for each.
|
|
82
87
|
|
|
83
88
|
### Lens 3 — Existence / Hallucination (`@finder`)
|
|
84
89
|
|
|
85
|
-
> Review
|
|
90
|
+
> Review the available plan and/or task list for references to things that may not exist. AI-generated plans hallucinate file paths, package names, function signatures, and API endpoints at measurable rates (~20% for packages per Snyk analysis). Your job is to verify every reference is real.
|
|
86
91
|
>
|
|
87
92
|
> Verify:
|
|
88
|
-
> 1. Every file path mentioned in `plan.md` "Critical Files for Implementation" and
|
|
89
|
-
> 2. Every package in `plan.md` "External Dependencies" — does it exist at the named version? (Note: actual install/registry check is the executor's Phase 0 job; your job is to flag suspicious names — typos, near-misses to well-known packages, lookalikes.)
|
|
90
|
-
> 3. Every function, class, or symbol named in plan/tasks — grep the repo, confirm it exists where claimed.
|
|
93
|
+
> 1. Every file path mentioned in available artifacts, including `plan.md` "Critical Files for Implementation" and `tasks.md` Context blocks when present — does the file exist in the repo today? Use Glob/Read to confirm.
|
|
94
|
+
> 2. Every package in `plan.md` "External Dependencies" when `plan.md` is present — does it exist at the named version? (Note: actual install/registry check is the executor's Phase 0 job; your job is to flag suspicious names — typos, near-misses to well-known packages, lookalikes.)
|
|
95
|
+
> 3. Every function, class, or symbol named in available plan/tasks — grep the repo, confirm it exists where claimed.
|
|
91
96
|
> 4. Every API endpoint, env var, or CLI flag referenced — confirm it's defined in the codebase.
|
|
92
97
|
>
|
|
93
98
|
> Required output: list every reference that fails verification, with `expected: <plan claim>` and `actual: <repo state>`. If everything checks out, say so explicitly — don't pad.
|
|
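The file-path part of the Lens 3 check could look roughly like the sketch below. It is an assumption-laden illustration: `find_missing_paths` and `PATH_PATTERN` are hypothetical names, and the regular expression is a simple heuristic (backtick-quoted tokens containing a slash and a file extension), not the skill's real extraction logic. The actual brief asks the reviewer to use Glob/Read.

```python
import re
from pathlib import Path

# Hypothetical heuristic: a backtick-quoted token with at least one "/" and a
# file extension, optionally followed by ":<line>" (for file:line citations).
PATH_PATTERN = re.compile(r"`([\w./-]+/[\w.-]+\.\w+)(?::\d+)?`")


def find_missing_paths(artifact_text: str, repo_root: Path) -> list[dict[str, str]]:
    """Return one expected/actual record per referenced path that is not in the repo."""
    failures = []
    seen = set()
    for match in PATH_PATTERN.finditer(artifact_text):
        rel = match.group(1)
        if rel in seen:
            continue  # report each path once
        seen.add(rel)
        if not (repo_root / rel).is_file():
            failures.append(
                {"expected": f"{rel} exists", "actual": "not found in repo"}
            )
    return failures
```

An empty list corresponds to the "everything checks out" case, which the brief asks the reviewer to state explicitly.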
94
99
|
|
|
95
100
|
### Lens 4 — Canonical Reference Quality (`@patterns`)
|
|
96
101
|
|
|
97
|
-
> Review
|
|
102
|
+
> Review the available plan and/or task list for the quality of "follow existing pattern" references. Anthropic's own guidance is to anchor plans with concrete examples (e.g., "HotDogWidget.php is a good example"). Vague "follow existing patterns" without a file:line anchor is a documented failure mode.
|
|
98
103
|
>
|
|
99
104
|
> Find:
|
|
100
|
-
> 1.
|
|
101
|
-
> 2.
|
|
105
|
+
> 1. When `plan.md` is present: places in Technical Approach that reference "existing patterns" or "similar features" without a specific file:line.
|
|
106
|
+
> 2. When `tasks.md` is present: sub-tasks whose Context block lacks a canonical reference pointer.
|
|
102
107
|
> 3. Better canonical references that the plan missed — actual files in the codebase that more closely match the intended shape.
|
|
103
108
|
> 4. Reuse opportunities the plan ignored: utilities, hooks, helpers, or types already in the repo that the plan re-implements.
|
|
104
109
|
>
|
|
@@ -150,12 +155,12 @@ Spawn all four subagents in a single message (parallel). Each receives the same
|
|
|
150
155
|
> - `1,3,5` — apply specific finding numbers
|
|
151
156
|
> - `skip` — leave artifacts unchanged
|
|
152
157
|
>
|
|
153
|
-
> For findings I apply, I'll edit
|
|
158
|
+
> For findings I apply, I'll edit the relevant available artifact(s) inline and re-run a fast self-check.
|
|
154
159
|
|
|
155
160
|
- **Wait** — User selects.
|
|
156
161
|
|
|
157
162
|
- **Action** — ApplyEdits: For each selected finding:
|
|
158
|
-
- Open the named artifact (plan.md or tasks.md)
|
|
163
|
+
- Open the named artifact (`plan.md` or `tasks.md`)
|
|
159
164
|
- Apply the Suggested Edit verbatim where possible; if the edit needs adaptation, make the minimum change consistent with the finding's intent
|
|
160
165
|
- Track which findings were applied
|
|
161
166
|
|
|
@@ -168,7 +173,7 @@ Spawn all four subagents in a single message (parallel). Each receives the same
|
|
|
168
173
|
- **Action** — ReportApplied:
|
|
169
174
|
|
|
170
175
|
> Applied: {list of finding numbers}. Skipped: {list}.
|
|
171
|
-
> {Path to updated
|
|
176
|
+
> {Path to updated artifact(s)}.
|
|
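The selection and reporting steps above can be sketched as follows. This covers only the two reply forms visible in this hunk (`1,3,5` and `skip`); `parse_selection` and `report_applied` are hypothetical helpers, and printing `none` for an empty list is an assumption, not something the skill specifies.

```python
def parse_selection(reply: str, finding_count: int) -> list[int]:
    """Turn a reply such as '1,3,5' or 'skip' into a list of finding numbers."""
    text = reply.strip().lower()
    if text == "skip":
        return []  # leave artifacts unchanged
    selected = []
    for part in text.split(","):
        part = part.strip()
        if not part.isdigit():
            raise ValueError(f"not a finding number: {part!r}")
        number = int(part)
        if not 1 <= number <= finding_count:
            raise ValueError(f"finding {number} is out of range 1..{finding_count}")
        if number not in selected:
            selected.append(number)
    return selected


def report_applied(selected: list[int], finding_count: int) -> str:
    """Format the 'Applied / Skipped' line from the ReportApplied step."""
    skipped = [n for n in range(1, finding_count + 1) if n not in selected]

    def fmt(numbers: list[int]) -> str:
        # "none" for an empty list is an assumption made for readability.
        return ", ".join(str(n) for n in numbers) or "none"

    return f"Applied: {fmt(selected)}. Skipped: {fmt(skipped)}."
```

Rejecting out-of-range numbers up front keeps ApplyEdits from silently skipping a finding the user believed was applied.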
172
177
|
|
|
173
178
|
## Step 5 — Next Steps
|
|
174
179
|
|
|
@@ -178,6 +183,6 @@ Spawn all four subagents in a single message (parallel). Each receives the same
|
|
|
178
183
|
|
|
179
184
|
## Notes
|
|
180
185
|
|
|
181
|
-
- This skill does NOT generate plans or tasks. It reviews
|
|
186
|
+
- This skill does NOT generate plans or tasks. It reviews available planning artifacts. If only one of `plan.md` or `tasks.md` exists, review that artifact. Only route the user to `plan` or `create_tasks` when neither reviewable artifact exists.
|
|
182
187
|
- The four lenses are non-overlapping by design but will surface overlap in practice; dedupe at synthesis rather than asking reviewers to coordinate.
|
|
183
188
|
- The "Must-Delete" nomination from Lens 1 is mandatory output — even on a tight plan, naming the single weakest element is a forcing function against under-review.
|