@codename_inc/spectre 5.0.0 → 5.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: "execute"
3
- description: "👻 | Adaptive Wave-Based Build -> Code_Review -> Validate Flow"
3
+ description: "👻 | Adaptive Wave-Based Build with Per-Wave Verification Gate"
4
4
  user-invocable: true
5
5
  ---
6
6
 
@@ -11,9 +11,9 @@ user-invocable: true
11
11
  Treat the current command arguments as this workflow's input. When invoked from a slash command, use the forwarded `$ARGUMENTS` value.
12
12
 
13
13
 
14
- # execute: Adaptive Task Execution with Quality Gates
14
+ # execute: Adaptive Task Execution with Per-Wave Verification
15
15
 
16
- Execute tasks in parallel waves with full scope context, adapt based on learnings, code review loop, validate requirements. Outcome: complete implementation with verified quality and E2E requirement coverage.
16
+ Execute tasks in parallel waves with full scope context, verify each wave before proceeding, adapt based on learnings, audit cross-wave integration, and generate a manual test guide. Outcome: complete implementation with verified quality and E2E requirement coverage.
17
17
 
18
18
  ## ARGUMENTS
19
19
 
@@ -39,7 +39,9 @@ $ARGUMENTS
39
39
 
40
40
  2. **Dispatch Wave**: Launch parallel @dev subagents (1 per task batch)
41
41
  - **CRITICAL**: Each subagent MUST read `SCOPE_DOCS` before executing
42
- - Each receives: task batch assignment, dependency completion reports, SCOPE_DOCS paths
42
+ - Each receives: task batch assignment, SCOPE_DOCS paths, and (after wave 1) a **Prior-Wave Context** block
43
+ - **Prior-Wave Context** (REQUIRED in waves 2+): the orchestrator appends each prior wave's @dev Completion Reports verbatim into this wave's dispatch prompt under a `## Prior-Wave Context` header. Includes Completed tasks, Files changed, Scope signal, Discoveries, and Guidance from each prior batch. This is how state is carried forward — there is no separate state file.
44
+ - **Test discovery**: instruct @dev to use the project's native related-test command (`jest --findRelatedTests <file>`, `pytest` by path, `vitest related <file>`, `cargo test <test-name-filter>`). Do not create parallel test files for code already covered.
43
45
  - Instruct: "Read scope docs first to understand E2E UX and integration points. Load Skill(spectre-tdd), then execute tasks sequentially using its TDD methodology. **Commit after each parent task** with conventional commit format (e.g., `feat(module): add X`, `fix(module): resolve Y`). Return completion report with **Implementation Insights** + **E2E Completeness Check**."
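A minimal Python sketch of how an orchestrator might assemble the Prior-Wave Context block, assuming completion reports arrive as raw strings grouped by wave. The function names `build_prior_wave_context` and `build_dispatch_prompt` are hypothetical; the point is that reports are pasted verbatim under the header rather than summarized, and that wave 1 gets no block at all.

```python
# Hypothetical helpers; nothing here is part of spectre itself.


def build_prior_wave_context(prior_waves: list[list[str]]) -> str:
    """prior_waves[i] holds the raw @dev Completion Reports returned by wave i + 1."""
    if not prior_waves:
        return ""  # wave 1: there is no prior state to carry forward
    parts = ["## Prior-Wave Context"]
    for wave_no, reports in enumerate(prior_waves, start=1):
        for batch_no, report in enumerate(reports, start=1):
            parts.append(f"### Wave {wave_no}, batch {batch_no}")
            parts.append(report)  # verbatim: no paraphrase, no truncation
    return "\n\n".join(parts)


def build_dispatch_prompt(task_batch: str, scope_docs: list[str],
                          prior_waves: list[list[str]]) -> str:
    sections = [
        f"Task batch: {task_batch}",
        "Read these SCOPE_DOCS first:\n" + "\n".join(f"- {p}" for p in scope_docs),
    ]
    context = build_prior_wave_context(prior_waves)
    if context:
        sections.append(context)
    return "\n\n".join(sections)
```

Because the block is rebuilt from stored reports on every dispatch, the orchestrator's own history is the only state, which is consistent with the "no separate state file" rule.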
44
46
 
45
47
  **E2E Completeness Check** (subagent returns one per batch):
@@ -47,15 +49,64 @@ $ARGUMENTS
47
49
  - 🟡 Gap — [specific functionality missing for E2E UX]
48
50
  - 🔴 Blocker — [cannot deliver spec without changes to other tasks]
49
51
 
50
- 3. **Mark Complete**: Update tasks doc with `[x]` for completed tasks
52
+ 3. **Per-Wave Verification Gate**: Verify the wave's output before adapting or advancing.
51
53
 
52
- 4. **Reflect**: Review completion reports for:
54
+ **3a. Deterministic pre-gate (no AI)**
55
+ - Detect project commands from `package.json` / `pyproject.toml` / `Cargo.toml` / `Makefile`
56
+ - Run lint, typecheck, build — whichever apply
57
+ - If any fail: dispatch @dev to fix the failures, re-run the gate. Do NOT invoke @reviewer until all deterministic checks pass.
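One way to implement the deterministic pre-gate, sketched in Python. It assumes a `package.json` exposes `lint`/`typecheck`/`build` scripts and a `Makefile` exposes targets of the same names; the fallback commands for `Cargo.toml` (`cargo clippy`, `cargo build`) and `pyproject.toml` (`ruff`, `mypy`) are assumptions about tooling, not something the workflow prescribes. No model is involved: a non-zero exit code is the whole signal.

```python
import json
import subprocess
from pathlib import Path

GATES = ("lint", "typecheck", "build")


def detect_gate_commands(root: Path) -> dict[str, list[str]]:
    """Map each applicable gate to an argv list, preferring the project's own scripts."""
    cmds: dict[str, list[str]] = {}
    pkg = root / "package.json"
    if pkg.is_file():
        scripts = json.loads(pkg.read_text()).get("scripts", {})
        for gate in GATES:
            if gate in scripts:
                cmds[gate] = ["npm", "run", gate]
    makefile = root / "Makefile"
    if makefile.is_file():
        # Target lines look like "name:" and are not indented recipe lines.
        targets = {line.split(":", 1)[0] for line in makefile.read_text().splitlines()
                   if ":" in line and not line.startswith(("\t", " ", "#"))}
        for gate in GATES:
            if gate in targets:
                cmds.setdefault(gate, ["make", gate])
    if (root / "Cargo.toml").is_file():  # assumed defaults
        cmds.setdefault("lint", ["cargo", "clippy", "--", "-D", "warnings"])
        cmds.setdefault("build", ["cargo", "build"])
    if (root / "pyproject.toml").is_file():  # assumed defaults
        cmds.setdefault("lint", ["ruff", "check", "."])
        cmds.setdefault("typecheck", ["mypy", "."])
    return cmds


def run_pre_gate(root: Path, cmds: dict[str, list[str]]) -> list[str]:
    """Return failing gate names. Only an empty list lets @reviewer run."""
    failed = []
    for gate, argv in cmds.items():
        try:
            ok = subprocess.run(argv, cwd=root, capture_output=True).returncode == 0
        except FileNotFoundError:  # tool not installed counts as a failure
            ok = False
        if not ok:
            failed.append(gate)
    return failed
```

A non-empty result from `run_pre_gate` is what would trigger the @dev fix dispatch; the loop re-runs both functions until it comes back empty.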
58
+
59
+ **3b. Parallel review lenses (single message, two @reviewer dispatches)**
60
+
61
+ Build each reviewer prompt from:
62
+ - Wave diff: `git diff <first-commit-of-this-wave>^..HEAD`
63
+ - Acceptance criteria: verbatim text from scope/tasks docs for this wave's tasks
64
+ - Files-touched manifest
65
+
66
+ **Forbidden in reviewer prompts**: @dev completion reports, implementer rationale, orchestrator paraphrase of "what the dev did and why". The reviewer is a clean room — diff + criteria only.
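The clean-room rule is easiest to keep when the prompt builder cannot accept forbidden inputs at all. This Python sketch is hypothetical (`ReviewInput`, `build_reviewer_prompt`, and the lens brief wording are illustrative, not spectre's API): its only inputs are the wave diff, verbatim acceptance criteria, and the files-touched manifest, and both lenses receive identical inputs.

```python
from dataclasses import dataclass

# Hypothetical sketch: the builder has no parameter for completion reports or
# implementer rationale, so the clean-room rule is enforced by construction.


@dataclass(frozen=True)
class ReviewInput:
    wave_diff: str                   # output of the wave's `git diff` range
    acceptance_criteria: list[str]   # verbatim text from scope/tasks docs
    files_touched: list[str]         # manifest of paths changed in the wave


LENS_BRIEFS = {
    "security+correctness": (
        "OWASP Top-10, injection, auth, secrets, data exposure; logic, edge cases, "
        "state transitions. Flag only in-scope issues."
    ),
    "wiring": (
        "Defined -> Connected -> Reachable for every new function/component; "
        "flag dead computations and replaced code paths that are still active."
    ),
}

EVIDENCE_RULE = (
    "Every CRITICAL or HIGH finding must include a file:line reference and a "
    "reproducible failure scenario; otherwise it is downgraded one level."
)


def build_reviewer_prompt(lens: str, review: ReviewInput) -> str:
    criteria = "\n".join(f"- {c}" for c in review.acceptance_criteria)
    files = "\n".join(f"- {f}" for f in review.files_touched)
    return (
        f"Lens: {lens}\n{LENS_BRIEFS[lens]}\n{EVIDENCE_RULE}\n\n"
        f"## Acceptance criteria\n{criteria}\n\n"
        f"## Files touched\n{files}\n\n"
        f"## Diff\n{review.wave_diff}"
    )


def build_wave_review_prompts(review: ReviewInput) -> list[str]:
    # Both lenses get identical inputs; the orchestrator sends them in one message.
    return [build_reviewer_prompt(lens, review) for lens in LENS_BRIEFS]
```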
67
+
68
+ **Lens 1 — security + correctness**
69
+ - OWASP Top-10, injection, auth, secrets, data exposure
70
+ - Logic, edge cases, state transitions
71
+ - Scope adherence (flag only in-scope issues; do not flag missing out-of-scope work)
72
+
73
+ **Lens 2 — wiring**
74
+ - Apply the Defined → Connected → Reachable methodology:
75
+ - Defined: code exists in a file
76
+ - Connected: code is imported/called by other code
77
+ - Reachable: a user action can trigger the code path
78
+ - For each new function/component, grep for usage (not just definition)
79
+ - For UI features, trace render-backward: JSX ← variable ← source ← user action
80
+ - Flag dead computations (values computed but never reaching output) and old code paths that remain active after being replaced
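The "grep for usage, not just definition" step covers the first two stages of Defined → Connected → Reachable. A rough Python sketch of that check follows; `classify_symbol` is a hypothetical helper, and its regexes assume Python/JS/TS-style declarations, so treat it as a heuristic rather than a parser.

```python
import re

# Hypothetical grep-style check; heuristic regexes, not a language parser.


def classify_symbol(symbol: str, files: dict[str, str]) -> str:
    """Return 'missing', 'defined' (dead), or 'connected' for one symbol."""
    name = re.escape(symbol)
    definition = re.compile(rf"\b(?:def|class|function|const|let|var)\s+{name}\b")
    usage = re.compile(rf"\b{name}\b")
    defined, uses = False, 0
    for text in files.values():
        for line in text.splitlines():
            if definition.search(line):
                defined = True
            elif usage.search(line):
                uses += 1  # imports and call sites both count as "connected"
    if not defined:
        return "missing"
    return "connected" if uses else "defined"
```

A `defined` result is exactly the "Defined but not Connected" case the lens should flag. A `connected` result still says nothing about Reachable; that needs the render-backward trace from a user action.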
81
+
82
+ **Severity & evidence rule** (enforced in both lens prompts):
83
+ - Every CRITICAL or HIGH finding MUST include:
84
+ 1. `file:line` reference
85
+ 2. A reproducible failure scenario or exploit path describing observable behavior
86
+ - Findings without an evidence chain are auto-downgraded one severity level. "Could potentially" is not evidence.
87
+ - Each finding includes a hash: `sha256(file_path + line + finding_category)` for the fix-loop ledger (3c).
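The hash and the auto-downgrade rule can be sketched in a few lines of Python. Two assumptions are labeled in the code: a NUL separator between fields (the spec's bare `+` could otherwise let `("a1", 23)` and `("a12", 3)` collide), and a literal "could potentially" check as a stand-in for "not evidence". Note that because `line` is part of the key, a fix that shifts code by even one line gives an otherwise identical finding a new hash.

```python
import hashlib
from typing import Optional

SEVERITIES = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def finding_hash(file_path: str, line: int, category: str) -> str:
    # Assumed NUL separators keep ("a1", 23) and ("a12", 3) from colliding.
    key = f"{file_path}\x00{line}\x00{category}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def apply_evidence_rule(severity: str, file_line: Optional[str],
                        scenario: Optional[str]) -> str:
    """Downgrade CRITICAL/HIGH one level unless both evidence parts are present."""
    # Assumed heuristic: hedged speculation does not count as a failure scenario.
    weak = not file_line or not scenario or "could potentially" in scenario.lower()
    if severity in ("CRITICAL", "HIGH") and weak:
        return SEVERITIES[SEVERITIES.index(severity) - 1]
    return severity
```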
88
+
89
+ **3c. Bounded fix loop**
90
+
91
+ If lens dispatches return CRITICAL/HIGH:
92
+ - **Iteration cap**: 3 fix waves maximum
93
+ - **Hash ledger**: maintain the set of hashes for findings already addressed. If a finding whose hash is already in the ledger reappears in a later review, classify it as "reviewer disagreement" and escalate to the user; do NOT re-queue it.
94
+ - **Fix/test ratio**: monitor changes per fix wave. If test-file changes > 0.5 × implementation-file changes, halt and surface to user — likely "fixing the test instead of the bug."
95
+ - **Diff-growth circuit-breaker**: if cumulative fix-wave diff grows > 25% per iteration, halt and surface — fixes are adding surface area, not reducing it.
96
+ - **Dispatch fix**: parallel @dev subagents address each CRITICAL/HIGH finding. Each fix-dev receives the finding's full evidence chain (file:line + scenario), not just the description.
97
+ - **Re-verify**: after fixes commit, return to 3a (deterministic) then 3b (lenses).
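The four guards in 3c can live in one small state object. In this hypothetical Python sketch, `triage` runs when a review returns and `after_fix_wave` runs once fixes commit; "diff growth > 25% per iteration" is read as the cumulative fix diff exceeding 1.25× its size after the previous fix wave, which is an interpretation, not a quote.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_FIX_WAVES = 3


@dataclass
class FixLoop:
    iteration: int = 0
    ledger: set[str] = field(default_factory=set)    # hashes already addressed
    last_cumulative_diff: Optional[int] = None       # lines after previous fix wave

    def triage(self, review_hashes: set[str]) -> tuple[set[str], list[str]]:
        """Split a review's CRITICAL/HIGH hashes into (to_fix, halt_reasons)."""
        reasons = []
        repeats = review_hashes & self.ledger
        if repeats:  # escalate as reviewer disagreement; never re-queue
            reasons.append(f"reviewer disagreement: {len(repeats)} repeated finding(s)")
        if self.iteration >= MAX_FIX_WAVES:
            reasons.append("iteration cap reached")
        return review_hashes - self.ledger, reasons

    def after_fix_wave(self, fixed: set[str], test_lines: int, impl_lines: int,
                       cumulative_diff: int) -> list[str]:
        """Record a committed fix wave and return any circuit-breaker reasons."""
        reasons = []
        if test_lines > 0.5 * impl_lines:
            reasons.append("fix/test ratio exceeded")
        if self.last_cumulative_diff and cumulative_diff > 1.25 * self.last_cumulative_diff:
            reasons.append("diff growth > 25%")
        self.ledger |= fixed
        self.iteration += 1
        self.last_cumulative_diff = cumulative_diff
        return reasons
```

Any non-empty reason list means halt and surface to the user rather than dispatching another fix wave.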
98
+
99
+ **3d. Exit condition**: No CRITICAL/HIGH remain, OR iteration cap reached and user has been notified of unresolved findings.
100
+
101
+ 4. **Mark Complete**: Update tasks doc with `[x]` for completed tasks
102
+
103
+ 5. **Reflect**: Review completion reports for:
53
104
  - Scope signals (🟡/🟠/🔴) from implementation insights
54
105
  - E2E completeness gaps (🟡/🔴) from completeness checks
55
- - **If** all ⚪ across both → skip to step 6
106
+ - **If** all ⚪ across both → skip to step 7
56
107
  - **Else** → adapt tasks
57
108
 
58
- 5. **Adapt** (only if triggered):
109
+ 6. **Adapt** (only if triggered):
59
110
  - Modify future tasks with learned context
60
111
  - Add tasks for E2E gaps with `[ADDED - E2E gap]` prefix
61
112
  - Add required sub-tasks with `[ADDED]` prefix
@@ -63,34 +114,28 @@ $ARGUMENTS
63
114
  - Flag cross-task integration issues to remaining waves
64
115
  - **Guardrails**: ❌ No "nice-to-have" additions, ❌ No scope expansion, ✅ Only adapt for spec compliance
65
116
 
66
- 6. **Next Wave**: Identify next tasks, gather relevant completion reports, return to step 1
67
-
68
- ## Step 2 - Code Review Loop
69
-
70
- - **Action** — ExecutedeveviewLoop: Until no critical/high feedback:
117
+ 7. **Next Wave**: Identify next tasks, gather prior-wave completion reports for the Prior-Wave Context block, return to step 1
71
118
 
72
- 1. **Spawn Review**: @dev subagent runs `Skill(code_review)` (Claude slash route: `code_review`)
73
- 2. **Analyze**: Identify critical/high items
74
- - **If** none → exit loop
75
- 3. **Address**: Parallel @dev subagents fix feedback
76
- 4. **Re-verify**: Return to step 1
119
+ ## Step 2 - Cross-Wave Validate
77
120
 
78
- ## Step 3 - Validate Requirements
121
+ - **Action** SpawnValidation: @analyst runs `Skill(validate)` (Claude slash route: `validate`) with **narrowed scope**:
122
+ - Focus: cross-wave integration audit (did later waves silently break earlier waves' wiring?) + scope-creep audit (anything implemented that is NOT in the acceptance criteria?) + dead-computation sweep across the full cumulative diff
123
+ - Skip: per-area wiring verification (already done per-wave in Step 1.3b's wiring lens)
79
124
 
80
- - **Action** — SpawnValidation: @reviewer runs `Skill(validate)` (Claude slash route: `validate`) with task list
81
- - **Action** — AddressGaps: If high priority gaps → dispatch @dev subagents to fix
125
+ - **Action** — AddressGaps: If high-priority gaps surface, dispatch @dev subagents to fix them.
82
126
 
83
- ## Step 4 - Prepare for QA
127
+ ## Step 3 - Prepare for QA
84
128
 
85
129
  - **Action** — GenerateTestGuide: @dev runs `Skill(create_test_guide)` (Claude slash route: `create_test_guide`)
86
130
  - Save to `{OUT_DIR}/test_guide.md`
87
131
 
88
- ## Step 5 - Report
132
+ ## Step 4 - Report
89
133
 
90
134
  - **Action** — SummarizeCompletion:
91
- - Tasks completed, waves executed, code review iterations, validation status
135
+ - Tasks completed, waves executed, per-wave fix-loop iteration counts, validation status
92
136
  - Test guide location
93
137
  - **Task Evolution Summary**: Adaptations made (or "None - original plan executed")
94
138
  - **E2E Gaps Addressed**: Summary of completeness issues found and resolved
139
+ - **Unresolved Findings** (if any): Any CRITICAL/HIGH that hit the fix-loop cap and were escalated to user
95
140
 
96
141
  - **Action** — RenderFooter: Use `Skill(spectre-guide)` skill for Next Steps
@@ -49,9 +49,11 @@ Treat the current command arguments as this workflow's input. When invoked from
49
49
  - `OUT_DIR=docs/tasks/{branch_name}` (or user-specified)
50
50
  - `mkdir -p "${OUT_DIR}"`
51
51
 
52
- - **Action** — ScanExistingContext: Read all existing artifacts in `{OUT_DIR}/ (if you havent already)` and assess coverage across 4 dimensions.
52
+ - **Action** — ScanExistingContext: Read all existing artifacts in `{OUT_DIR}/` (if you haven't already) and assess coverage across 4 dimensions.
53
53
 
54
- Scan for: `task_context.md`, `specs/plan.md`, `concepts/scope.md`, `research/*.md`
54
+ Scan for: `task_context.md`, `specs/plan.md`, `concepts/scope.md`, `specs/ux.md`, `research/*.md`
55
+
56
+ While scanning `concepts/scope.md` and `specs/ux.md`, extract any **filled assumptions** — places where the upstream artifact defaulted a value because the user didn't specify (e.g., DB choice, retry policy, copy variants, segment fallbacks). Carry these forward to Step 3's design surface so they're reviewer-visible before plan generation.
55
57
 
56
58
  | Dimension | Covered if artifact contains... | Covered by |
57
59
  | --- | --- | --- |
@@ -95,13 +97,26 @@ Use research findings from Step 1 to determine appropriate planning depth.
95
97
  | Integration points | Research findings | Internal only = Low, 1-2 external = Med, 3+ external = High |
96
98
  | External complexity | @web-research | Well-documented with libraries = Low, Some prior art = Med, Novel/emerging = High |
97
99
 
98
- - **Action** — CheckHardStops: Any true = automatic COMPREHENSIVE | db_schema_destructive | new_service_or_component | auth_or_pii_change | | payment_billing_logic | public_api_change | caching_consistency | slo_sla_risk |
100
+ - **Action** — CheckHardStops: Any true = automatic COMPREHENSIVE.
101
+ - `db_schema_destructive` — drops, renames, or non-additive column changes
102
+ - `data_migration_required` — backfill, transform, or row-by-row data change
103
+ - `new_service_or_component` — net-new service, daemon, or top-level component
104
+ - `auth_or_pii_change` — authn/authz flow, session handling, PII storage/exposure
105
+ - `secrets_or_credentials_handling` — new secret introduced, rotation, or boundary change
106
+ - `payment_billing_logic` — money flow, invoicing, charge logic
107
+ - `public_api_change` — externally-consumed API surface modified
108
+ - `concurrent_writes_or_locking` — concurrency, locking, or distributed coordination
109
+ - `caching_consistency` — cache invalidation, staleness windows, multi-tier caching
110
+ - `cross_service_or_cross_workspace_change` — coordinated change across services or workspaces
111
+ - `slo_sla_risk` — latency, throughput, or availability budget at stake
112
+
113
+ - **Action** — DetermineTier (decisive rules, not point-scoring):
99
114
 
100
- - **Action** — DetermineTier:
115
+ - **COMPREHENSIVE** — if ANY hard-stop is triggered OR any signal scores High OR two or more signals score Medium
116
+ - **STANDARD** — if no hard-stops AND no High signals AND at most one Medium signal
117
+ - **LIGHT** — only if every signal scores Low AND no hard-stops AND the change is plausibly a single-file diff
101
118
 
102
- - **LIGHT**: All/most Low signals, single component, clear pattern match, no hard-stops
103
- - **STANDARD**: Mix of Low/Med signals, multi-file but contained scope, no hard-stops
104
- - **COMPREHENSIVE**: Any High signal, multiple Med signals, or any hard-stop triggered
119
+ When in doubt between two tiers, choose the higher. The cost of over-planning a small change is hours; the cost of under-planning a large one is weeks.
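The decisive tier rules translate directly into a short function. This Python sketch assumes signal scores use the table's Low/Med/High labels and that "plausibly a single-file diff" is supplied as a boolean judgment. Checking COMPREHENSIVE first, then LIGHT, and falling back to STANDARD is one reading of the rules that resolves every overlap toward the higher tier.

```python
HARD_STOPS = frozenset({
    "db_schema_destructive", "data_migration_required", "new_service_or_component",
    "auth_or_pii_change", "secrets_or_credentials_handling", "payment_billing_logic",
    "public_api_change", "concurrent_writes_or_locking", "caching_consistency",
    "cross_service_or_cross_workspace_change", "slo_sla_risk",
})

LEVEL = {"low": "low", "med": "med", "medium": "med", "high": "high"}


def determine_tier(signals: dict[str, str], triggered: set[str], single_file: bool) -> str:
    """signals maps a signal name to Low/Med/High; triggered lists hard-stops hit."""
    unknown = triggered - HARD_STOPS
    if unknown:
        raise ValueError(f"unknown hard-stop(s): {sorted(unknown)}")
    levels = [LEVEL[v.strip().lower()] for v in signals.values()]
    highs, meds = levels.count("high"), levels.count("med")
    if triggered or highs or meds >= 2:
        return "COMPREHENSIVE"
    if all(level == "low" for level in levels) and single_file:
        return "LIGHT"
    return "STANDARD"  # no hard-stops, no High, at most one Med
```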
105
120
 
106
121
  - **Action** — LogTier: Note the assessed tier in your response for transparency, then proceed immediately to the next step. Do NOT ask for confirmation.
107
122
 
@@ -129,6 +144,14 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
129
144
  > - [decision] — [rationale; alternative considered]
130
145
  > - [decision] — [rationale; alternative considered]
131
146
  >
147
+ > **How we'll know it works** (verification spine):
148
+ > - [change] → [test name | observable behavior | state condition]
149
+ > - [change] → [test name | observable behavior | state condition]
150
+ >
151
+ > **Filled assumptions** (surfaced from scope.md / ux.md / inferred):
152
+ > - [assumption] — *source: [scope.md / ux.md / default]*
153
+ > - [assumption] — *source: [scope.md / ux.md / default]*
154
+ >
132
155
  > **Open questions** (with default assumption):
133
156
  > 1. [question] — *default: [assumption]*
134
157
  > 2. [question] — *default: [assumption]*
@@ -138,6 +161,8 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
138
161
  **CRITICAL**:
139
162
  - **Single proposed approach**, not a menu. If a true fork exists, surface it as an open question with your recommendation — not as parallel options.
140
163
  - Stay at the *shape* level: components, key decisions, structural changes. Defer file-by-file detail to `create_plan`.
164
+ - **Verification is mandatory.** Every major change in the approach must declare how it will be checked — falsifiable signal, not prose. This becomes the spine that `create_plan` and `create_tasks` build on.
165
+ - **Filled assumptions are mandatory.** If scope.md or ux.md left something silent and you defaulted it, surface the default here. Reviewer-visible by design — these are the silent decisions that bite at execution.
141
166
  - Open questions should be specific and answerable; pair each with a default assumption so the user can skip if the default is fine.
142
167
 
143
168
  - **Action** — IterateDesign: If the user replies with answers, edits, or pushback, update the design and re-present. Loop until user says 'looks good' (or equivalent).
@@ -208,4 +233,24 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
208
233
 
209
234
  ---
210
235
 
236
+ ### Post-Tasks Tier Re-check
237
+
238
+ After tasks return, do a fast self-check against tier signals:
239
+
240
+ - Count parent tasks, sub-tasks, files touched (number of unique paths across Context blocks), and Phase 0 dependency count.
241
+ - **Escalation triggers** (any true → recommend re-running at a higher tier):
242
+ - Tier was LIGHT but tasks touch >3 files OR have >2 parent tasks
243
+ - Tier was STANDARD but tasks reveal a hard-stop signal not caught earlier (e.g., a migration sub-task appeared)
244
+ - Tasks contain any Out-of-Bounds violation
245
+ - **Downgrade triggers** (rare; only suggest if confident):
246
+ - Tier was COMPREHENSIVE but tasks collapsed to a single parent with no migrations, no new components, and no API change
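A Python sketch of the re-check that only *recommends* a tier and never re-runs anything. `TaskStats` and `recheck_tier` are hypothetical names. Two choices are assumptions: ordinary escalation moves up one tier (a newly revealed hard-stop jumps straight to COMPREHENSIVE, matching the hard-stop rule), and a downgrade from COMPREHENSIVE lands on STANDARD.

```python
from dataclasses import dataclass
from typing import Optional

ORDER = ["LIGHT", "STANDARD", "COMPREHENSIVE"]


@dataclass
class TaskStats:
    parent_tasks: int
    unique_files: int
    new_hard_stop: bool = False        # e.g. a migration sub-task appeared
    out_of_bounds_violation: bool = False
    has_migration: bool = False
    new_component: bool = False
    api_change: bool = False


def count_unique_files(context_blocks: list[list[str]]) -> int:
    """Number of distinct paths across all sub-task Context blocks."""
    return len({path for block in context_blocks for path in block})


def recheck_tier(tier: str, stats: TaskStats) -> Optional[str]:
    """Return a recommended tier, or None to keep the current one."""
    idx = ORDER.index(tier)
    if stats.new_hard_stop and tier != "COMPREHENSIVE":
        return "COMPREHENSIVE"  # any hard-stop forces the top tier
    light_overflow = tier == "LIGHT" and (stats.unique_files > 3 or stats.parent_tasks > 2)
    if (light_overflow or stats.out_of_bounds_violation) and idx < len(ORDER) - 1:
        return ORDER[idx + 1]
    collapsed = stats.parent_tasks == 1 and not (
        stats.has_migration or stats.new_component or stats.api_change
        or stats.out_of_bounds_violation
    )
    if tier == "COMPREHENSIVE" and collapsed:
        return "STANDARD"  # assumed downgrade target
    return None
```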
247
+
248
+ If an escalation/downgrade is triggered, surface it as a recommendation — do NOT silently re-run. Format:
249
+
250
+ > Tier reassessment: I planned this as {original tier}, but tasks revealed {signal}. Recommend re-running as {new tier}. Reply 'rerun' to regenerate or 'keep' to proceed as-is.
251
+
252
+ When a reassessment is surfaced, proceed past this checkpoint only after the user replies.
253
+
254
+ ---
255
+
211
256
  - **Action** — RenderFooter: Use `Skill(spectre-guide)` skill for Next Steps
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: "plan_review"
3
- description: "👻 | Find simplifications in a plan or tasks"
3
+ description: "👻 | Independent multi-lens review of plan.md + tasks.md — finds overengineering, missing verification, hallucinated deps, weak references"
4
4
  user-invocable: true
5
5
  ---
6
6
 
@@ -10,33 +10,174 @@ user-invocable: true
10
10
 
11
11
  Treat the current command arguments as this workflow's input. When invoked from a slash command, use the forwarded `$ARGUMENTS` value.
12
12
 
13
+ # plan_review: Multi-Lens Review of Plan & Tasks
13
14
 
14
- You are a senior staff engineer with deep expertise in system design, architecture, and pragmatic problem-solving. Your specialty is finding the simplest path to meet all requirements.
15
+ ## Description
15
16
 
16
- Review the following [plan/document/tasks/context] and identify opportunities to simplify while ensuring all requirements and functionality are delivered.
17
+ - **What** Independent review of `plan.md` + `tasks.md` from four specialized lenses, dispatched in parallel
18
+ - **Outcome** — Structured findings with concrete edit suggestions; optional write-back to update both artifacts
19
+ - **Role** — Senior staff engineer + reviewer panel; bias toward pragmatic problem-solving, YAGNI enforcement, and verifiability
17
20
 
18
- For each simplification opportunity, provide:
19
- 1. **What to simplify** - Specific component, process, or decision
20
- 2. **Why** - What complexity it removes (cognitive load, dependencies, maintenance burden, etc.)
21
- 3. **Impact** - Confirm that all original requirements remain satisfied
22
- 4. **Risk** - Any trade-offs or risks introduced by the simplification
21
+ ## ARGUMENTS Input
23
22
 
24
- Focus on:
25
- - Removing unnecessary abstractions or indirection
26
- - Consolidating duplicated logic or patterns
27
- - Questioning assumptions that add complexity
28
- - Identifying over-engineering
29
- - Suggesting proven, boring solutions over novel approaches
23
+ <ARGUMENTS>
24
+ $ARGUMENTS
25
+ </ARGUMENTS>
30
26
 
31
- ## Testing Review
32
- **Context**: We use fast TDD with 1 happy path test + 1 unhappy path test per feature. A separate task handles achieving 100% test coverage post-feature work.
27
+ ## Why Four Lenses
33
28
 
34
- Evaluate the testing approach and flag:
35
- - **Over-testing**: Tests beyond 1 happy + 1 unhappy path that should be deferred to the coverage task
36
- - **Wrong tests**: Testing implementation details instead of behavior, brittle tests that will break on refactors, or tests that don't actually validate requirements
37
- - **Missing critical paths**: Cases where the 1+1 approach genuinely misses a requirement-breaking scenario (rare, but call it out)
38
- - **Test complexity**: Overly elaborate test setup, mocking, or assertions that could be simpler
29
+ A single reviewer biases toward the issues it notices first. Published practice (Cognition, Anthropic, Osmani) converges on four high-yield review angles for AI-agent-authored plans. We dispatch each as a parallel subagent so coverage is structurally guaranteed, not dependent on a single reviewer remembering everything.
39
30
 
40
- Remember: The goal is fast feedback during development. More comprehensive testing comes later.
31
+ | Lens | Subagent | Finds |
32
+ |------|----------|-------|
33
+ | **YAGNI / familiar-shape bias** | `@reviewer` | Mature-system patterns that crept in unprompted (auth → rate-limit, CRUD → soft-delete, etc.). Forces ONE "delete this" recommendation. |
34
+ | **Verifiability** | `@analyst` | Acceptance criteria that aren't executable; verification gaps between plan and tasks. |
35
+ | **Existence / hallucination** | `@finder` | File paths, packages, APIs, or symbols referenced that don't actually exist. The slopsquatting fence. |
36
+ | **Canonical reference quality** | `@patterns` | "Follow existing pattern" claims without a real file:line anchor; missed reuse opportunities. |
41
37
 
42
- End with a prioritized list of recommendations (high/medium/low impact).
38
+ ## Step 1 Locate Artifacts
39
+
40
+ - **Action** — DetermineTaskDir:
41
+ - `branch_name=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo unknown)`
42
+ - **If** user specifies path in ARGUMENTS → `TASK_DIR={that value}`
43
+ - **Else** → `TASK_DIR=docs/tasks/{branch_name}`
44
+
45
+ - **Action** — ResolveArtifacts: Locate the three required inputs.
46
+ - `PLAN=${TASK_DIR}/specs/plan.md` (or scoped name)
47
+ - `TASKS=${TASK_DIR}/specs/tasks.md` (or scoped name)
48
+ - `CONTEXT=${TASK_DIR}/task_context.md`
49
+ - If any are missing, list what's missing and stop — do NOT review against a partial set. Suggest the user run `plan` or `create_tasks` first.
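The two actions above amount to a path lookup plus a completeness check. This Python sketch mirrors the shell fallback (`|| echo unknown`) and returns the names of missing artifacts so the caller can stop instead of reviewing a partial set. The helper names are hypothetical, and the "(or scoped name)" variants are not handled here.

```python
import subprocess
from pathlib import Path
from typing import Optional


def current_branch() -> str:
    """Mirror of `git rev-parse --abbrev-ref HEAD 2>/dev/null || echo unknown`."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip() or "unknown"
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def resolve_artifacts(task_dir: Optional[str] = None) -> tuple[dict[str, Path], list[str]]:
    """Return the three artifact paths plus the names of any that are missing."""
    root = Path(task_dir) if task_dir else Path("docs/tasks") / current_branch()
    artifacts = {
        "PLAN": root / "specs" / "plan.md",
        "TASKS": root / "specs" / "tasks.md",
        "CONTEXT": root / "task_context.md",
    }
    missing = [name for name, path in artifacts.items() if not path.is_file()]
    return artifacts, missing
```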
50
+
51
+ - **Action** — ReadAll: Read each file completely into context before dispatching reviewers. Reviewers receive curated excerpts, not raw paths.
52
+
53
+ ## Step 2 — Dispatch Four Parallel Reviewers
54
+
55
+ Spawn all four subagents in a single message (parallel). Each receives the same artifact excerpts but a different review brief.
56
+
57
+ ### Lens 1 — YAGNI / Familiar-Shape Bias (`@reviewer`)
58
+
59
+ > Review this plan and task list for unrequested complexity. Agents have a documented "familiar-shape bias": shown a feature, they reproduce the mature-system shape from their training data (auth → adds rate-limiting; CRUD → adds soft-delete; form → adds optimistic UI; service → adds telemetry; module → adds feature flags). Your job is to find that bias here.
60
+ >
61
+ > Find:
62
+ > 1. Anything in `plan.md` Technical Approach that isn't traceable to a requirement in `task_context.md` / scope / PRD.
63
+ > 2. Tasks in `tasks.md` that implement something the requirements don't ask for.
64
+ > 3. Abstractions, interfaces, or layers introduced for a single concrete caller.
65
+ > 4. Generality (config files, plugin points, factories) where the actual need is one specific behavior.
66
+ > 5. Overlap with the `Out-of-Bounds — DO NOT add` list (if anything violates that list, it's a hard fail).
67
+ >
68
+ > Required output: nominate the SINGLE highest-leverage thing to delete and justify it. You must pick one. Then list other simplifications ranked by impact. For each finding, cite the exact file:line or section header it lives in.
69
+
70
+ ### Lens 2 — Verifiability (`@analyst`)
71
+
72
+ > Review this plan and task list for verification quality. The single highest-correlate of successful AI-agent execution is the ability to self-verify. Find every place where verification is missing, prose-only, or disconnected.
73
+ >
74
+ > Find:
75
+ > 1. Items in `plan.md` "Verification — How We Know This Works" that are prose ("works correctly", "is consistent") rather than executable (test name / observable behavior / state condition).
76
+ > 2. Phases in `plan.md` that don't declare a verification signal.
77
+ > 3. Sub-tasks in `tasks.md` whose acceptance criteria aren't one of the three executable types (test passes / observable behavior / state condition).
78
+ > 4. Verification signals in `plan.md` with no matching acceptance criterion in `tasks.md`.
79
+ > 5. Behavior-changing sub-tasks in `tasks.md` that lack a preceding RED test sub-task.
80
+ >
81
+ > Required output: list every non-executable criterion with a proposed rewrite in one of the three types. Cite file:line for each.
82
+
83
+ ### Lens 3 — Existence / Hallucination (`@finder`)
84
+
85
+ > Review this plan and task list for references to things that may not exist. AI-generated plans hallucinate file paths, package names, function signatures, and API endpoints at measurable rates (~20% for packages per Snyk analysis). Your job is to verify every reference is real.
86
+ >
87
+ > Verify:
88
+ > 1. Every file path mentioned in `plan.md` "Critical Files for Implementation" and in `tasks.md` Context blocks — does the file exist in the repo today? Use Glob/Read to confirm.
89
+ > 2. Every package in `plan.md` "External Dependencies" — does it exist at the named version? (Note: actual install/registry check is the executor's Phase 0 job; your job is to flag suspicious names — typos, near-misses to well-known packages, lookalikes.)
90
+ > 3. Every function, class, or symbol named in plan/tasks — grep the repo, confirm it exists where claimed.
91
+ > 4. Every API endpoint, env var, or CLI flag referenced — confirm it's defined in the codebase.
92
+ >
93
+ > Required output: list every reference that fails verification, with `expected: <plan claim>` and `actual: <repo state>`. If everything checks out, say so explicitly — don't pad.
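Two of the Lens 3 checks are mechanical enough to sketch. In this hypothetical Python version, `missing_paths` produces the expected/actual pairs for file references, and `lookalike_packages` uses the standard library's `difflib.get_close_matches` to flag near-miss names. The `known_packages` list is an assumption (for example, names from a lockfile or an allow-list); it is not a registry query, which stays the executor's Phase 0 job.

```python
import difflib
from pathlib import Path


def missing_paths(root: Path, claimed: list[str]) -> list[tuple[str, str]]:
    """(expected, actual) pairs for every claimed path absent from the repo."""
    return [(p, "not found") for p in claimed if not (root / p).exists()]


def lookalike_packages(claimed: list[str], known_packages: list[str],
                       cutoff: float = 0.8) -> dict[str, str]:
    """Names not in the known list that closely resemble one that is."""
    flagged = {}
    for name in claimed:
        if name in known_packages:
            continue
        match = difflib.get_close_matches(name, known_packages, n=1, cutoff=cutoff)
        if match:
            flagged[name] = match[0]
    return flagged
```

A genuinely new package that resembles nothing known passes silently here, which matches the brief: flag suspicious names, don't adjudicate existence.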
94
+
95
+ ### Lens 4 — Canonical Reference Quality (`@patterns`)
96
+
97
+ > Review this plan and task list for the quality of "follow existing pattern" references. Anthropic's own guidance is to anchor plans with concrete examples (e.g., "HotDogWidget.php is a good example"). Vague "follow existing patterns" without a file:line anchor is a documented failure mode.
98
+ >
99
+ > Find:
100
+ > 1. Places in `plan.md` Technical Approach that reference "existing patterns" or "similar features" without a specific file:line.
101
+ > 2. Sub-tasks in `tasks.md` whose Context block lacks a canonical reference pointer.
102
+ > 3. Better canonical references that the plan missed — actual files in the codebase that more closely match the intended shape.
103
+ > 4. Reuse opportunities the plan ignored: utilities, hooks, helpers, or types already in the repo that the plan re-implements.
104
+ >
105
+ > Required output: for each weak/missing reference, propose a specific file:line that should be the anchor. For each missed reuse, cite the existing utility and which task should use it.
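A crude first pass for Lens 4 is to find lines that appeal to a pattern without any `path:line` anchor. This Python sketch is heuristic; the list of vague phrases and the anchor regex are assumptions, and proposing the *better* anchor still requires reading the codebase.

```python
import re

# Assumed phrase list; extend with the wording your plans actually use.
VAGUE = re.compile(
    r"\b(existing patterns?|similar (?:features?|components?)|follow (?:the )?conventions?)\b",
    re.IGNORECASE,
)
ANCHOR = re.compile(r"[\w./-]+\.\w+:\d+")  # e.g. src/forms/Login.tsx:42


def weak_references(markdown: str) -> list[int]:
    """1-based line numbers that cite a pattern vaguely with no file:line anchor."""
    return [
        i for i, line in enumerate(markdown.splitlines(), start=1)
        if VAGUE.search(line) and not ANCHOR.search(line)
    ]
```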
106
+
107
+ ## Step 3 — Synthesize Findings
108
+
109
+ - **Action** — CollectFindings: Wait for all four reviewers to return. Read every finding.
110
+
111
+ - **Action** — DeduplicateAndPrioritize: Merge findings that overlap (e.g., a missing canonical reference may surface from both Lens 4 and Lens 2). Assign severity:
112
+ - **Blocker** — would cause execution to fail or produce wrong output (hallucinated file path, criterion the executor can't check, Out-of-Bounds violation)
113
+ - **High** — meaningfully reduces output quality (missing RED test, weak canonical reference, prose criterion)
114
+ - **Medium** — overengineering or reuse miss without functional blast radius
115
+ - **Low** — stylistic or nice-to-have
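A Python sketch of the merge-and-rank step, under a simplifying assumption: findings are treated as overlapping only when they cite the same location. Merged entries keep the worst severity and list every contributing lens; real semantic overlap still needs the orchestrator's judgment. `Finding` and `dedupe_and_sort` are hypothetical names.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"Blocker": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass
class Finding:
    severity: str
    lens: str
    location: str
    finding: str
    suggested_edit: str


def dedupe_and_sort(findings: list[Finding]) -> list[Finding]:
    """Merge findings that share a location, keep the worst severity, sort Blocker first."""
    merged: dict[str, Finding] = {}
    for f in findings:
        prev = merged.get(f.location)
        if prev is None:
            merged[f.location] = f
            continue
        lenses = sorted(set(prev.lens.split(" + ")) | {f.lens})
        worst = min(prev.severity, f.severity, key=SEVERITY_RANK.__getitem__)
        text = prev.finding if f.finding == prev.finding else f"{prev.finding} / {f.finding}"
        edit = (prev.suggested_edit if f.suggested_edit == prev.suggested_edit
                else f"{prev.suggested_edit} / {f.suggested_edit}")
        merged[f.location] = Finding(worst, " + ".join(lenses), f.location, text, edit)
    # sorted() is stable, so equal severities keep the order reviewers returned them.
    return sorted(merged.values(), key=lambda f: SEVERITY_RANK[f.severity])
```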
116
+
117
+ - **Action** — RenderFindingsTable: Output a single structured table. Schema is fixed.
118
+
119
+ ```markdown
120
+ ## Review Findings — {feature name}
121
+
122
+ ### Must-Delete (Lens 1 — YAGNI)
123
+ > {The single nominated highest-leverage cut, with rationale.}
124
+
125
+ ### Findings
126
+
127
+ | # | Severity | Lens | Location | Finding | Suggested Edit |
128
+ |---|----------|------|----------|---------|----------------|
129
+ | 1 | Blocker | Existence | plan.md `## External Dependencies` | `react-use-undocumented@2.4.0` doesn't exist on npm | Remove; the plan can use React's built-in `useReducer` hook (see `src/hooks/useFormState.ts:18`) |
130
+ | 2 | High | Verifiability | tasks.md `1.2.1` | "Component renders correctly" is prose | Replace with: Test passes `<ProductCard /> renders product.title and product.price` |
131
+ | 3 | High | YAGNI | plan.md `## Technical Approach` | Adds retry-with-backoff for a sync internal call | Delete; not in requirements; Out-of-Bounds list already forbids retry logic |
132
+ | … | | | | | |
133
+
134
+ ### Summary
135
+ - Blockers: {N} — must resolve before /execute
136
+ - High: {N}
137
+ - Medium: {N}
138
+ - Low: {N}
139
+ ```
140
+
141
+ ## Step 4 — Surface Findings & Apply Edits
142
+
143
+ - **Action** — PresentFindings: Render the findings table inline.
144
+
145
+ - **Action** — OfferWriteBack: After the table, prompt:
146
+
147
+ > Reply with which findings to apply:
148
+ > - `all` — apply every suggested edit
149
+ > - `blockers` — apply Blocker + High severity only
150
+ > - `1,3,5` — apply specific finding numbers
151
+ > - `skip` — leave artifacts unchanged
152
+ >
153
+ > For findings I apply, I'll edit plan.md and/or tasks.md inline and re-run a fast self-check.
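The reply grammar in this prompt is small enough to parse strictly. In this hypothetical Python sketch, `findings` is the table's `(number, severity)` pairs; `blockers` maps to Blocker + High as the prompt states, and an unknown number raises so the orchestrator can re-prompt instead of guessing.

```python
def parse_selection(reply: str, findings: list[tuple[int, str]]) -> list[int]:
    """Return the finding numbers to apply for a reply like 'all', 'blockers', '1,3,5', 'skip'."""
    reply = reply.strip().lower()
    if reply == "skip":
        return []
    if reply == "all":
        return [n for n, _ in findings]
    if reply == "blockers":
        return [n for n, sev in findings if sev in ("Blocker", "High")]
    valid = {n for n, _ in findings}
    picked: list[int] = []
    for part in reply.split(","):
        n = int(part.strip())  # ValueError on junk input
        if n not in valid:
            raise ValueError(f"no finding #{n}")
        if n not in picked:
            picked.append(n)
    return picked
```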
154
+
155
+ - **Wait** — User selects.
156
+
157
+ - **Action** — ApplyEdits: For each selected finding:
158
+ - Open the named artifact (plan.md or tasks.md)
159
+ - Apply the Suggested Edit verbatim where possible; if the edit needs adaptation, make the minimum change consistent with the finding's intent
160
+ - Track which findings were applied
161
+
162
+ - **Action** — SelfCheck: After edits, run a fast pass over the modified sections:
163
+ - Re-verify any file:line refs touched
164
+ - Re-verify acceptance criteria are still executable
165
+ - Confirm no edit introduced a new Out-of-Bounds violation
166
+ - If any check fails, surface it and ask the user before continuing
167
+
168
+ - **Action** — ReportApplied:
169
+
170
+ > Applied: {list of finding numbers}. Skipped: {list}.
171
+ > {Path to updated plan.md and tasks.md}.
172
+
173
+ ## Step 5 — Next Steps
174
+
175
+ - **Action** — RenderFooter: Use `Skill(spectre-guide)` skill for Next Steps footer.
176
+
177
+ ---
178
+
179
+ ## Notes
180
+
181
+ - This skill does NOT generate plans or tasks. It reviews them. If `plan.md` or `tasks.md` doesn't exist, route the user to `plan` first.
182
+ - The four lenses are non-overlapping by design but will surface overlap in practice; dedupe at synthesis rather than asking reviewers to coordinate.
183
+ - The "Must-Delete" nomination from Lens 1 is mandatory output — even on a tight plan, naming the single weakest element is a forcing function against under-review.