@codename_inc/spectre 5.0.0 → 5.1.0
- package/package.json +1 -1
- package/plugins/spectre/.claude-plugin/plugin.json +1 -1
- package/plugins/spectre/skills/create_plan/SKILL.md +25 -13
- package/plugins/spectre/skills/create_tasks/SKILL.md +137 -38
- package/plugins/spectre/skills/plan/SKILL.md +52 -7
- package/plugins/spectre/skills/plan_review/SKILL.md +164 -23
- package/plugins/spectre-codex/skills/create_plan/SKILL.md +25 -13
- package/plugins/spectre-codex/skills/create_tasks/SKILL.md +137 -38
- package/plugins/spectre-codex/skills/plan/SKILL.md +52 -7
- package/plugins/spectre-codex/skills/plan_review/SKILL.md +164 -23
package/package.json
CHANGED
|
@@ -29,8 +29,9 @@ Treat the current command arguments as this workflow's input. When invoked from
|
|
|
29
29
|
- **If** found with comprehensive analysis → use existing research; skip to Step 3.
|
|
30
30
|
- **Else** → proceed with new research below.
|
|
31
31
|
- **Action** — AutomatedResearch: Spawn parallel research agents for comprehensive analysis.
|
|
32
|
-
- Use `
|
|
33
|
-
- Dispatch multiple parallel `
|
|
32
|
+
- Use `@finder` to find all files related to feature area.
|
|
33
|
+
- Dispatch multiple parallel `@analyst` subagents to understand current implementation patterns. Pay particular attention to how and where data is accessed that will be needed for this feature.
|
|
34
|
+
- Use `@patterns` to surface canonical reference implementations already in the codebase — these become "follow this file" anchors in the plan.
|
|
34
35
|
- Wait for ALL agents to complete before proceeding.
|
|
35
36
|
- Read ALL identified files into context.
|
|
36
37
|
- **Action** — TraceCodePaths: Trace through relevant execution paths.
|
|
@@ -97,17 +98,28 @@ Dynamically generate up to 10 technical questions based on research findings. **
|
|
|
97
98
|
|
|
98
99
|
- **Action** — DesignTechnicalApproach: Create the implementation plan.
|
|
99
100
|
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
**
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
-
|
|
109
|
-
|
|
110
|
-
|
|
101
|
+
Every plan, regardless of depth, MUST include these seven sections. They are the verification spine — without them, downstream agents cannot self-check their work.
|
|
102
|
+
|
|
103
|
+
**Required for both STANDARD and COMPREHENSIVE:**
|
|
104
|
+
1. **Overview** — 1–2 paragraphs: what problem, what shape the solution takes, why this approach.
|
|
105
|
+
2. **Technical Approach** — How the change actually lands: components touched, data flow, key decisions with rationale. Reference existing patterns from `@patterns` research by file:line (e.g., "follow the shape of `src/widgets/HotDogWidget.ts:42` for the registration step").
|
|
106
|
+
3. **Critical Files for Implementation** — 3–7 specific files from research. Format: `path/to/file.ts` — *reason* (Core logic to modify / Pattern to follow / Interface to implement / Test to extend). No guesses — only files surfaced during Step 1 research.
|
|
107
|
+
4. **External Dependencies — Verify Before Implementation** — Every third-party package required, with exact version and a one-line existence check. Format: `package@1.2.3 — verify: npm view package@1.2.3` (or pip equivalent). Required even if "no new packages" (write that explicitly). This is the slopsquatting fence: ~20% of AI-suggested packages don't exist; we catch that here, not in production.
|
|
108
|
+
5. **Verification — How We Know This Works** — For each major change in Technical Approach, 1–3 falsifiable signals: a test name, an observable behavior, or a state/file condition. Prose like "the feature works" is not acceptable — it must be checkable. Format: `<change> → verifies by: <test name | observable behavior | state condition>`. These become acceptance criteria in `create_tasks` downstream.
|
|
109
|
+
6. **Out-of-Bounds — DO NOT add** — 4–8 concrete things the implementation must NOT add, even if "best practice." Examples: rate limiting, retry/backoff, caching layer, optimistic UI, soft-delete, telemetry events, feature flags, admin UI. This is the YAGNI fence against familiar-shape bias (agents reproduce mature-system patterns unprompted). Be specific to this feature, not generic.
|
|
110
|
+
7. **Risks & Filled Assumptions** — Two short subsections:
|
|
111
|
+
- *Risks*: what could go wrong (e.g., concurrent write race, migration ordering, third-party rate limit). Each with a one-line mitigation or "accept and monitor."
|
|
112
|
+
- *Filled Assumptions*: things the plan defaulted because the spec didn't say (e.g., "Assumed Postgres; spec didn't specify DB." "Assumed retry count = 0; spec didn't mention failure modes."). Reviewer-visible by design — these are the silent decisions that bite at execution.
|
|
113
|
+
|
|
114
|
+
**COMPREHENSIVE additionally requires:**
|
|
115
|
+
8. **Current State** — How the affected code path works today, with file:line refs. Anchored to research findings.
|
|
116
|
+
9. **Implementation Phases** — Ordered phases, each with its own Verification subsection (Phase N succeeds when …). Phases must be sequenced by dependency, not by file. Migration phases come before consumer phases.
|
|
117
|
+
10. **Component / Data Architecture** — Where data is created, mutated, and read. Schema deltas if any.
|
|
118
|
+
11. **API Design** — Endpoint signatures, request/response shapes, error contracts. Required if any external or internal API surface changes.
|
|
119
|
+
12. **Migration Plan** — Required if any data-layer change. Up + down migration sketch, backfill strategy, rollback plan.
|
|
120
|
+
13. **Testing Strategy** — What test types cover what (unit / integration / e2e), where new tests live, what's deferred to the post-feature coverage task.
|
|
121
|
+
|
|
122
|
+
Use your judgment on section length, not on inclusion. If a required section is genuinely N/A for this feature, write the section header followed by *"N/A — <one-line reason>"*. Empty section headers are not acceptable; absent section headers are not acceptable.
|
|
111
123
|
|
|
112
124
|
- **Action** — DocumentPlan: Save to `{OUT_DIR}/specs/plan.md` (use scoped name if exists)
|
|
113
125
|
|
|
@@ -129,12 +129,46 @@ Read completely (no limits):
|
|
|
129
129
|
- This section helps the user understand how the work integrates with the product before diving into tasks
|
|
130
130
|
|
|
131
131
|
### Task Hierarchy (4 Levels)
|
|
132
|
-
-
|
|
133
|
-
-
|
|
134
|
-
-
|
|
135
|
-
-
|
|
132
|
+
- **Phase**: Organizational header (no checkbox) — groups related parent tasks
|
|
133
|
+
- **Parent Task**: Cohesive deliverable (small-medium scope) — one component/file
|
|
134
|
+
- **Sub-task**: Atomic work (single focused change) — single action, 2-3 acceptance criteria
|
|
135
|
+
- **Acceptance Criteria**: Executable, verifiable outcomes (see Acceptance Criteria Types below)
|
|
136
136
|
|
|
137
|
-
**Numbering**: Phase 1 → Parent 1.1, 1.2 → Sub-tasks 1.1.1, 1.1.2 → Criteria
|
|
137
|
+
**Numbering**: Phase 1 → Parent 1.1, 1.2 → Sub-tasks 1.1.1, 1.1.2 → Criteria
|
|
138
|
+
|
|
139
|
+
### Right-Sized for AI Execution
|
|
140
|
+
|
|
141
|
+
Published data on AI agent execution (Cognition's Devin reviews, Anthropic's Claude Code guidance) converges on a bounded sweet spot: each sub-task should be roughly what a junior engineer could finish in a 4–8 hour window, neither a multi-day epic nor a 10-line tweak.
|
|
142
|
+
|
|
143
|
+
**Hard size cap — split a sub-task if ANY of these is true:**
|
|
144
|
+
- Touches more than 3 files
|
|
145
|
+
- Has more than 5 acceptance criteria
|
|
146
|
+
- Would require more than ~200 lines of diff
|
|
147
|
+
- Requires a mid-execution judgment call about scope (split the judgment into its own predecessor task)
|
|
148
|
+
- Spans more than one concern (e.g., schema + UI in one sub-task)
|
|
149
|
+
|
|
150
|
+
When splitting, keep the integration-aware principle intact: each split task still names its Producer / Consumer / Replaces.
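
As a rough illustration of the hard size cap above, the check could be sketched as a small function. This is only a sketch under stated assumptions: the `SubTask` record and its field names are hypothetical and are not part of spectre; only the thresholds come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class SubTask:
    """Hypothetical sub-task record; field names are illustrative only."""
    task_id: str
    files_touched: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    estimated_diff_lines: int = 0
    needs_scope_judgment: bool = False
    concerns: list[str] = field(default_factory=list)


def split_reasons(task: SubTask) -> list[str]:
    """Return every hard-cap rule the sub-task violates (empty list = right-sized)."""
    reasons = []
    if len(set(task.files_touched)) > 3:
        reasons.append("touches more than 3 files")
    if len(task.acceptance_criteria) > 5:
        reasons.append("has more than 5 acceptance criteria")
    if task.estimated_diff_lines > 200:
        reasons.append("exceeds ~200 lines of diff")
    if task.needs_scope_judgment:
        reasons.append("requires a mid-execution scope judgment")
    if len(set(task.concerns)) > 1:
        reasons.append("spans more than one concern")
    return reasons
```

A non-empty result means the sub-task should be split; each resulting piece would still need its own Producer / Consumer / Replaces entries.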
|
|
151
|
+
|
|
152
|
+
### Acceptance Criteria Types
|
|
153
|
+
|
|
154
|
+
Every acceptance criterion MUST be one of three executable types. Prose criteria like "feature works correctly" or "behavior is consistent" are forbidden — an executor cannot self-check them.
|
|
155
|
+
|
|
156
|
+
1. **Test passes** — `Test \`<test_name>\` passes` (or `tests in <file_path> pass`)
|
|
157
|
+
2. **Observable behavior** — A specific, checkable runtime signal: `GET /api/x returns 200 with field \`y\``, `Console logs \`event=loaded params={...}\``, `Button click triggers <handler> within 100ms`
|
|
158
|
+
3. **State / file condition** — `File \`<path>\` exists and contains <pattern>`, `Migration \`<id>\` applied`, `Env var \`X\` is read at startup`
|
|
159
|
+
|
|
160
|
+
Mixing types within a sub-task is fine. What's not fine: criteria the agent cannot verify without asking the user.
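
One way to picture the three executable types is as a lint over criterion text. The sketch below assumes each criterion is written with a leading type label (`Test passes:`, `Observable behavior:`, `State condition:`); that labeling convention and the function name are assumptions made for illustration, not something the skill enforces in code.

```python
# Assumed labels for the three executable types described above.
ALLOWED_PREFIXES = ("Test passes:", "Observable behavior:", "State condition:")
# Prose phrases the section calls out as not self-checkable.
FORBIDDEN_PHRASES = ("works correctly", "is consistent")


def check_criterion(text: str) -> list[str]:
    """Return a list of problems with one acceptance criterion (empty = acceptable)."""
    problems = []
    stripped = text.strip()
    if not stripped.startswith(ALLOWED_PREFIXES):
        problems.append("missing executable type prefix")
    lowered = stripped.lower()
    for phrase in FORBIDDEN_PHRASES:
        if phrase in lowered:
            problems.append(f"prose phrase: {phrase!r}")
    return problems
```

A criterion such as "Feature works correctly" fails both checks, while "Test passes: `test_login` passes" yields no problems.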
|
|
161
|
+
|
|
162
|
+
### Test-First Task Pairing
|
|
163
|
+
|
|
164
|
+
For any sub-task that changes observable behavior (not pure refactors or cleanup), pair it with a preceding RED task. Pattern:
|
|
165
|
+
|
|
166
|
+
- **N.M.k RED**: Write failing test `<test_name>` asserting `<behavior>`. Acceptance: test exists and fails for the documented reason.
|
|
167
|
+
- **N.M.(k+1) Build**: Implement `<change>`. Acceptance: the RED test passes; no other tests regress.
|
|
168
|
+
|
|
169
|
+
This is the TDAD pattern (test-driven agentic development): the failing test is the executor's self-correction signal. Without it, the executor is guessing whether the implementation is right.
|
|
170
|
+
|
|
171
|
+
Pure refactors, cleanups, and config-only tasks don't require RED pairing — but if behavior changes, the RED comes first.
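
The pairing rule can be checked mechanically. The following is a minimal sketch, assuming sub-tasks are available as ordered dictionaries with hypothetical `id`, `kind`, and `changes_behavior` keys; none of these names come from spectre itself.

```python
def missing_red_pairs(subtasks: list[dict]) -> list[str]:
    """Return IDs of behavior-changing Build sub-tasks not directly preceded by a RED sub-task.

    Each dict is assumed to look like:
        {"id": "1.1.2", "kind": "RED" | "Build" | "Other", "changes_behavior": bool}
    A missing "changes_behavior" key is treated as True (the conservative default).
    """
    missing = []
    previous_kind = None
    for task in subtasks:
        is_behavioral_build = task["kind"] == "Build" and task.get("changes_behavior", True)
        if is_behavioral_build and previous_kind != "RED":
            missing.append(task["id"])
        previous_kind = task["kind"]
    return missing
```

Refactors and config-only work can be marked `changes_behavior: False` so they pass without a RED predecessor.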
|
|
138
172
|
|
|
139
173
|
### Integration-Aware Task Principle
|
|
140
174
|
|
|
@@ -153,11 +187,23 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
|
|
|
153
187
|
- **Cleanup tasks**: Remove/redirect old code paths (MANDATORY when replacing patterns)
|
|
154
188
|
|
|
155
189
|
### 4b. Create Parent Tasks
|
|
156
|
-
- **Action** — CreateParentTasks: Draft as many phases as needed to logically organize work, each with as many parent tasks
|
|
190
|
+
- **Action** — CreateParentTasks: Draft as many phases as needed to logically organize work, each with as many parent tasks as required to cover complete scope.
|
|
157
191
|
- Each parent task = single cohesive deliverable (small-medium scope)
|
|
158
192
|
- Cover ALL extracted requirements with no gaps
|
|
159
193
|
- Group related work into phases for clarity
|
|
160
194
|
- Align with technical approach (from research or existing docs)
|
|
195
|
+
- Every parent task carries explicit sequencing in its body:
|
|
196
|
+
- **Predecessor**: parent task IDs that must complete first (or "none")
|
|
197
|
+
- **Unblocks**: parent task IDs this unblocks (or "terminal")
|
|
198
|
+
- The first phase is always **Phase 0 — Dependency Verification** (see 4a-Phase0 below). Other phases start at Phase 1.
|
|
199
|
+
|
|
200
|
+
### 4a-Phase0. Phase 0 — Dependency Verification (always present)
|
|
201
|
+
|
|
202
|
+
Before any implementation, generate a Phase 0 containing one sub-task per external dependency listed in `plan.md`'s "External Dependencies — Verify Before Implementation" section. Each sub-task verifies the package exists at the named version and exposes the API the plan assumed.
|
|
203
|
+
|
|
204
|
+
- Acceptance type: state condition (`npm view <pkg>@<ver>` returns valid metadata) and/or test passes (a minimal import-and-call smoke test).
|
|
205
|
+
- If `plan.md` declared "no new packages," Phase 0 is a single sub-task that confirms no new dependencies were silently introduced during implementation (cross-check `package.json` diff at end).
|
|
206
|
+
- Phase 0 unblocks Phase 1; it cannot be skipped or run in parallel with Phase 1.
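
To make the Phase 0 rules above concrete, here is a hedged sketch of how the sub-tasks might be derived from the plan's dependency list. The function names and the dictionary shape are hypothetical; the only real command referenced is `npm view <pkg>@<ver>`, which the sketch formats as a string and does not execute.

```python
def parse_spec(spec: str) -> tuple[str, str]:
    """Split 'name@version' into (name, version); scoped names like '@scope/pkg@1.0.0' work too."""
    name, sep, version = spec.rpartition("@")
    if not sep or not name or not version:
        raise ValueError(f"expected 'package@version', got {spec!r}")
    return name, version


def phase0_subtasks(specs: list[str]) -> list[dict]:
    """Build one verification sub-task per declared dependency (illustrative shape)."""
    if not specs:
        # Plan declared "no new packages": a single confirmation sub-task.
        return [{
            "id": "0.1.1",
            "title": "Confirm no new dependencies were introduced",
            "check": "cross-check package.json diff at end",
        }]
    tasks = []
    for index, spec in enumerate(specs, start=1):
        name, version = parse_spec(spec)
        tasks.append({
            "id": f"0.1.{index}",
            "title": f"Verify {name}@{version} exists",
            "check": f"npm view {name}@{version}",
        })
    return tasks
```

Splitting on the last `@` is a deliberate choice so that scoped npm package names keep their leading `@`.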
|
|
161
207
|
|
|
162
208
|
### 4c. Break Down Sub-tasks
|
|
163
209
|
- **Action** — BreakdownSubTasks: For each parent, generate as many detailed sub-tasks as needed to complete the parent.
|
|
@@ -171,14 +217,19 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
|
|
|
171
217
|
- Completable as a single focused change
|
|
172
218
|
|
|
173
219
|
- **What to INCLUDE in sub-tasks:**
|
|
174
|
-
-
|
|
175
|
-
-
|
|
176
|
-
-
|
|
177
|
-
-
|
|
178
|
-
-
|
|
179
|
-
-
|
|
180
|
-
-
|
|
181
|
-
-
|
|
220
|
+
- Technical terms (JWT, REST, WebSocket, React hooks, SQL queries)
|
|
221
|
+
- Architecture patterns (middleware, pub/sub, observer, factory)
|
|
222
|
+
- Integration points (which components connect, API contracts)
|
|
223
|
+
- File/component names (UserProfileComponent, authMiddleware.ts)
|
|
224
|
+
- Technical constraints (max file size, timeout duration, data format)
|
|
225
|
+
- **Produces**: What output this creates (variable name, return value, prop)
|
|
226
|
+
- **Consumed by**: What uses this output (component, hook, render path)
|
|
227
|
+
- **Replaces**: What old code path this supersedes (if any)
|
|
228
|
+
- **Context** (required): a self-contained payload an executor can use without re-reading the full plan. Include:
|
|
229
|
+
- 2–4 file:line refs pulled from research (the exact code being modified or extended)
|
|
230
|
+
- 1 canonical reference pointer (a file:line from `@patterns` research that shows the shape to follow)
|
|
231
|
+
- 1 link/anchor into `plan.md` for the relevant section
|
|
232
|
+
- **Predecessor** (sub-task level, optional): a sub-task ID this depends on. Only when intra-parent ordering is non-obvious.
|
|
182
233
|
|
|
183
234
|
- **What to AVOID in sub-tasks:**
|
|
184
235
|
- ❌ Code snippets or pseudo-code
|
|
@@ -187,12 +238,11 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
|
|
|
187
238
|
- ❌ Specific library API calls (unless architecturally significant)
|
|
188
239
|
|
|
189
240
|
- **Acceptance criteria**:
|
|
190
|
-
-
|
|
191
|
-
-
|
|
192
|
-
-
|
|
193
|
-
- Be specific about technical requirements
|
|
241
|
+
- Every criterion MUST be one of the three executable types (see "Acceptance Criteria Types" above): test passes / observable behavior / state condition.
|
|
242
|
+
- 2–3 criteria per sub-task. If a sub-task needs more than 3 to be checkable, split it.
|
|
243
|
+
- Prose criteria ("works correctly", "is consistent", "user-friendly") are forbidden — they're not self-checkable.
|
|
194
244
|
|
|
195
|
-
- **Decomposition**: Split if 5
|
|
245
|
+
- **Decomposition (hard size cap)**: Split if ANY of: >3 files touched, >5 criteria, >~200 LOC, mid-task scope judgment required, or more than one concern.
|
|
196
246
|
|
|
197
247
|
### 4d. Validate Task Structure
|
|
198
248
|
- **Action** — VerifyCoverage: Cross-reference tasks against extracted requirements.
|
|
@@ -204,13 +254,19 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
|
|
|
204
254
|
- **Coverage Validation**:
|
|
205
255
|
- [ ] All extracted requirements from Step 3 addressed by tasks?
|
|
206
256
|
- [ ] No gaps in requirement coverage?
|
|
257
|
+
- [ ] Every "Verification" entry from `plan.md` mapped to at least one acceptance criterion?
|
|
207
258
|
- **Exclusion Validation**:
|
|
208
|
-
- [ ]
|
|
209
|
-
- [ ]
|
|
259
|
+
- [ ] No additions beyond explicit requests?
|
|
260
|
+
- [ ] `plan.md`'s "Out-of-Bounds — DO NOT add" list carried forward verbatim into tasks.md banner?
|
|
261
|
+
- [ ] No task implements anything in the Out-of-Bounds list?
|
|
210
262
|
- **Structure Validation**:
|
|
211
263
|
- [ ] Parent tasks are small-medium scope, sub-tasks are atomic?
|
|
212
|
-
- [ ] Each sub-task has 2-3 acceptance criteria?
|
|
213
|
-
- [ ]
|
|
264
|
+
- [ ] Each sub-task has 2-3 acceptance criteria, each one of the three executable types?
|
|
265
|
+
- [ ] No sub-task exceeds the size cap (>3 files / >5 criteria / >~200 LOC / multi-concern / mid-task scope judgment)?
|
|
266
|
+
- [ ] Every behavior-changing sub-task is preceded by a RED test sub-task?
|
|
267
|
+
- [ ] Every sub-task has a Context payload (2–4 file:line refs, 1 canonical reference, 1 plan.md anchor)?
|
|
268
|
+
- [ ] Every parent task has Predecessor and Unblocks declared?
|
|
269
|
+
- [ ] Phase 0 — Dependency Verification is present and unblocks Phase 1?
|
|
214
270
|
|
|
215
271
|
- **Action** — ValidateIntegration: Verify every build task is wired to consumers.
|
|
216
272
|
- **Consumer Specified**:
|
|
@@ -288,6 +344,13 @@ Save to `${TASKS_FILE}`:
|
|
|
288
344
|
- **In Scope**: {bullet list}
|
|
289
345
|
- **Out of Scope**: {bullet list}
|
|
290
346
|
|
|
347
|
+
## Out-of-Bounds — DO NOT add
|
|
348
|
+
*Carried forward verbatim from plan.md. Executors: if a task tempts you to add any of these, stop and ask.*
|
|
349
|
+
- {Forbidden addition 1, e.g. "rate limiting"}
|
|
350
|
+
- {Forbidden addition 2, e.g. "retry/backoff"}
|
|
351
|
+
- {Forbidden addition 3, e.g. "telemetry events"}
|
|
352
|
+
- {Forbidden addition 4, e.g. "admin UI"}
|
|
353
|
+
|
|
291
354
|
## Requirements Traced
|
|
292
355
|
| ID | Description | Source | Tasks |
|
|
293
356
|
|----|-------------|--------|-------|
|
|
@@ -315,30 +378,66 @@ Save to `${TASKS_FILE}`:
|
|
|
315
378
|
|
|
316
379
|
## Tasks
|
|
317
380
|
|
|
381
|
+
### Phase 0: Dependency Verification
|
|
382
|
+
*Confirms every external dependency in plan.md exists at the declared version before any implementation begins.*
|
|
383
|
+
|
|
384
|
+
#### [0.1] Verify external dependencies
|
|
385
|
+
- **Predecessor**: none
|
|
386
|
+
- **Unblocks**: 1.1
|
|
387
|
+
- [ ] **0.1.1** Verify each package@version from plan.md "External Dependencies" section exists
|
|
388
|
+
- **Produces**: confirmation log of resolved package metadata
|
|
389
|
+
- **Consumed by**: Phase 1 implementation tasks
|
|
390
|
+
- **Context**:
|
|
391
|
+
- plan.md anchor: `## External Dependencies — Verify Before Implementation`
|
|
392
|
+
- check commands listed in plan section
|
|
393
|
+
- [ ] State condition: `npm view <pkg>@<ver>` returns valid metadata for every package
|
|
394
|
+
- [ ] State condition: no package in the list is flagged as deprecated or subject to a security advisory
|
|
395
|
+
- [ ] Test passes: minimal import-and-call smoke for each new package
|
|
396
|
+
|
|
318
397
|
### Phase 1: {Phase Name}
|
|
319
398
|
|
|
320
399
|
#### [1.1] {Parent Task Title}
|
|
321
|
-
-
|
|
400
|
+
- **Predecessor**: 0.1
|
|
401
|
+
- **Unblocks**: 1.2
|
|
402
|
+
|
|
403
|
+
- [ ] **1.1.1 RED** Write failing test `{test_name}` asserting `{behavior}`
|
|
404
|
+
- **Produces**: a failing test that pins the desired behavior
|
|
405
|
+
- **Consumed by**: 1.1.2 (turns this red to green)
|
|
406
|
+
- **Replaces**: N/A
|
|
407
|
+
- **Context**:
|
|
408
|
+
- `path/to/existing/code.ts:42` — current behavior being changed
|
|
409
|
+
- `path/to/similar/test.ts:18` — canonical test shape to follow
|
|
410
|
+
- plan.md anchor: `### Verification — How We Know This Works`
|
|
411
|
+
- [ ] State condition: file `path/to/test.ts` exists and contains test `{test_name}`
|
|
412
|
+
- [ ] Observable behavior: the new test runs and fails, with the failure message referencing the unimplemented behavior
|
|
413
|
+
|
|
414
|
+
- [ ] **1.1.2 Build** {Implement the change}
|
|
322
415
|
- **Produces**: {output variable/value/prop}
|
|
323
416
|
- **Consumed by**: {component/hook that uses this}
|
|
324
417
|
- **Replaces**: {old code path, or "N/A" if new}
|
|
325
|
-
-
|
|
326
|
-
|
|
327
|
-
|
|
328
|
-
|
|
329
|
-
-
|
|
330
|
-
-
|
|
331
|
-
-
|
|
332
|
-
- [ ] {
|
|
333
|
-
- [ ] {Technical outcome 2}
|
|
418
|
+
- **Context**:
|
|
419
|
+
- `path/to/file.ts:120` — code to modify
|
|
420
|
+
- `path/to/file.ts:180` — adjacent code that must not regress
|
|
421
|
+
- `path/to/canonical/example.ts:55` — pattern to follow (from @patterns research)
|
|
422
|
+
- plan.md anchor: `## Technical Approach`
|
|
423
|
+
- [ ] Test passes: `{test_name}` (from 1.1.1) now passes
|
|
424
|
+
- [ ] Test passes: existing tests in `path/to/related.test.ts` still pass
|
|
425
|
+
- [ ] Observable behavior: `{specific runtime signal, e.g. log line, HTTP response shape}`
|
|
334
426
|
|
|
335
427
|
#### [1.2] {Parent Task Title} — Integration
|
|
336
|
-
|
|
337
|
-
-
|
|
338
|
-
|
|
428
|
+
- **Predecessor**: 1.1
|
|
429
|
+
- **Unblocks**: {next parent or "terminal"}
|
|
430
|
+
|
|
431
|
+
- [ ] **1.2.1** Wire {1.1.2 output} to {consumer}
|
|
432
|
+
- **Wires**: {1.1.2 output} → {consumer component/render}
|
|
339
433
|
- **Removes**: {old code path being replaced}
|
|
340
|
-
-
|
|
341
|
-
|
|
434
|
+
- **Context**:
|
|
435
|
+
- `path/to/consumer.tsx:30` — where the wire lands
|
|
436
|
+
- `path/to/old/path.ts:12` — old code path to remove
|
|
437
|
+
- plan.md anchor: `## Technical Approach`
|
|
438
|
+
- [ ] Test passes: integration test asserting consumer renders new data source
|
|
439
|
+
- [ ] State condition: old code path file `path/to/old/path.ts` deleted or import removed
|
|
440
|
+
- [ ] Observable behavior: data flows from producer to rendered output (with `{specific assertion}`)
|
|
342
441
|
|
|
343
442
|
### Phase 2: {Phase Name}
|
|
344
443
|
...
|
|
@@ -49,9 +49,11 @@ Treat the current command arguments as this workflow's input. When invoked from
|
|
|
49
49
|
- `OUT_DIR=docs/tasks/{branch_name}` (or user-specified)
|
|
50
50
|
- `mkdir -p "${OUT_DIR}"`
|
|
51
51
|
|
|
52
|
-
- **Action** — ScanExistingContext: Read all existing artifacts in `{OUT_DIR}/ (if you haven
|
|
52
|
+
- **Action** — ScanExistingContext: Read all existing artifacts in `{OUT_DIR}/` (if you haven't already) and assess coverage across 4 dimensions.
|
|
53
53
|
|
|
54
|
-
Scan for: `task_context.md`, `specs/plan.md`, `concepts/scope.md`, `research/*.md`
|
|
54
|
+
Scan for: `task_context.md`, `specs/plan.md`, `concepts/scope.md`, `specs/ux.md`, `research/*.md`
|
|
55
|
+
|
|
56
|
+
While scanning `concepts/scope.md` and `specs/ux.md`, extract any **filled assumptions** — places where the upstream artifact defaulted a value because the user didn't specify (e.g., DB choice, retry policy, copy variants, segment fallbacks). Carry these forward to Step 3's design surface so they're reviewer-visible before plan generation.
|
|
55
57
|
|
|
56
58
|
| Dimension | Covered if artifact contains... | Covered by |
|
|
57
59
|
| --- | --- | --- |
|
|
@@ -95,13 +97,26 @@ Use research findings from Step 1 to determine appropriate planning depth.
|
|
|
95
97
|
| Integration points | Research findings | Internal only = Low, 1-2 external = Med, 3+ external = High |
|
|
96
98
|
| External complexity | @web-research | Well-documented with libraries = Low, Some prior art = Med, Novel/emerging = High |
|
|
97
99
|
|
|
98
|
-
- **Action** — CheckHardStops: Any true = automatic COMPREHENSIVE
|
|
100
|
+
- **Action** — CheckHardStops: Any true = automatic COMPREHENSIVE.
|
|
101
|
+
- `db_schema_destructive` — drops, renames, or non-additive column changes
|
|
102
|
+
- `data_migration_required` — backfill, transform, or row-by-row data change
|
|
103
|
+
- `new_service_or_component` — net-new service, daemon, or top-level component
|
|
104
|
+
- `auth_or_pii_change` — authn/authz flow, session handling, PII storage/exposure
|
|
105
|
+
- `secrets_or_credentials_handling` — new secret introduced, rotation, or boundary change
|
|
106
|
+
- `payment_billing_logic` — money flow, invoicing, charge logic
|
|
107
|
+
- `public_api_change` — externally-consumed API surface modified
|
|
108
|
+
- `concurrent_writes_or_locking` — concurrency, locking, or distributed coordination
|
|
109
|
+
- `caching_consistency` — cache invalidation, staleness windows, multi-tier caching
|
|
110
|
+
- `cross_service_or_cross_workspace_change` — coordinated change across services or workspaces
|
|
111
|
+
- `slo_sla_risk` — latency, throughput, or availability budget at stake
|
|
112
|
+
|
|
113
|
+
- **Action** — DetermineTier (decisive rules, not point-scoring):
|
|
99
114
|
|
|
100
|
-
- **
|
|
115
|
+
- **COMPREHENSIVE** — if ANY hard-stop is triggered OR any signal scores High OR two or more signals score Medium
|
|
116
|
+
- **STANDARD** — if no hard-stops AND no High signals AND at most one Medium signal
|
|
117
|
+
- **LIGHT** — only if every signal scores Low AND no hard-stops AND the change is plausibly a single-file diff
|
|
101
118
|
|
|
102
|
-
|
|
103
|
-
- **STANDARD**: Mix of Low/Med signals, multi-file but contained scope, no hard-stops
|
|
104
|
-
- **COMPREHENSIVE**: Any High signal, multiple Med signals, or any hard-stop triggered
|
|
119
|
+
When in doubt between two tiers, choose the higher. The cost of over-planning a small change is hours; the cost of under-planning a large one is weeks.
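
The decisive rules above can be read as a short decision function. This is a sketch only: the function name, the `signals` mapping, and the `single_file` flag are assumptions introduced for illustration, and signal levels are assumed to be exactly `"Low"`, `"Med"`, or `"High"` as in the signals table.

```python
def determine_tier(signals: dict[str, str], hard_stops: set[str], single_file: bool) -> str:
    """Map signal levels and triggered hard-stops to a planning tier.

    signals:     signal name -> "Low" | "Med" | "High"
    hard_stops:  names of hard-stop conditions that are true (e.g. "public_api_change")
    single_file: whether the change is plausibly a single-file diff
    """
    levels = list(signals.values())
    highs = levels.count("High")
    mediums = levels.count("Med")

    # Any hard-stop, any High, or two or more Medium signals force COMPREHENSIVE.
    if hard_stops or highs > 0 or mediums >= 2:
        return "COMPREHENSIVE"
    # LIGHT requires every signal Low and a plausibly single-file change.
    if mediums == 0 and single_file:
        return "LIGHT"
    # Everything else (at most one Medium, no High, no hard-stops) is STANDARD.
    return "STANDARD"
```

Checking COMPREHENSIVE first mirrors the "when in doubt, choose the higher tier" guidance: a change only reaches LIGHT after every escalation condition has been ruled out.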
|
|
105
120
|
|
|
106
121
|
- **Action** — LogTier: Note the assessed tier in your response for transparency, then proceed immediately to the next step. Do NOT ask for confirmation.
|
|
107
122
|
|
|
@@ -129,6 +144,14 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
|
|
|
129
144
|
> - [decision] — [rationale; alternative considered]
|
|
130
145
|
> - [decision] — [rationale; alternative considered]
|
|
131
146
|
>
|
|
147
|
+
> **How we'll know it works** (verification spine):
|
|
148
|
+
> - [change] → [test name | observable behavior | state condition]
|
|
149
|
+
> - [change] → [test name | observable behavior | state condition]
|
|
150
|
+
>
|
|
151
|
+
> **Filled assumptions** (surfaced from scope.md / ux.md / inferred):
|
|
152
|
+
> - [assumption] — *source: [scope.md / ux.md / default]*
|
|
153
|
+
> - [assumption] — *source: [scope.md / ux.md / default]*
|
|
154
|
+
>
|
|
132
155
|
> **Open questions** (with default assumption):
|
|
133
156
|
> 1. [question] — *default: [assumption]*
|
|
134
157
|
> 2. [question] — *default: [assumption]*
|
|
@@ -138,6 +161,8 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
|
|
|
138
161
|
**CRITICAL**:
|
|
139
162
|
- **Single proposed approach**, not a menu. If a true fork exists, surface it as an open question with your recommendation — not as parallel options.
|
|
140
163
|
- Stay at the *shape* level: components, key decisions, structural changes. Defer file-by-file detail to `create_plan`.
|
|
164
|
+
- **Verification is mandatory.** Every major change in the approach must declare how it will be checked — falsifiable signal, not prose. This becomes the spine that `create_plan` and `create_tasks` build on.
|
|
165
|
+
- **Filled assumptions are mandatory.** If scope.md or ux.md left something silent and you defaulted it, surface the default here. Reviewer-visible by design — these are the silent decisions that bite at execution.
|
|
141
166
|
- Open questions should be specific and answerable; pair each with a default assumption so the user can skip if the default is fine.
|
|
142
167
|
|
|
143
168
|
- **Action** — IterateDesign: If the user replies with answers, edits, or pushback, update the design and re-present. Loop until user says 'looks good' (or equivalent).
|
|
@@ -208,4 +233,24 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
|
|
|
208
233
|
|
|
209
234
|
---
|
|
210
235
|
|
|
236
|
+
### Post-Tasks Tier Re-check
|
|
237
|
+
|
|
238
|
+
After `create_tasks` returns, run a quick self-check against the tier signals:
|
|
239
|
+
|
|
240
|
+
- Count parent tasks, sub-tasks, files touched (sum of unique paths in Context blocks), and Phase 0 dep count.
|
|
241
|
+
- **Escalation triggers** (any true → recommend re-running at a higher tier):
|
|
242
|
+
- Tier was LIGHT but tasks touch >3 files OR have >2 parent tasks
|
|
243
|
+
- Tier was STANDARD but tasks reveal a hard-stop signal not caught earlier (e.g., a migration sub-task appeared)
|
|
244
|
+
- Tasks contain any Out-of-Bounds violation
|
|
245
|
+
- **Downgrade triggers** (rare; only suggest if confident):
|
|
246
|
+
- Tier was COMPREHENSIVE but tasks collapsed to a single parent with no migrations, no new components, and no API change
|
|
247
|
+
|
|
248
|
+
If an escalation/downgrade is triggered, surface it as a recommendation — do NOT silently re-run. Format:
|
|
249
|
+
|
|
250
|
+
> Tier reassessment: I planned this as {original tier}, but tasks revealed {signal}. Recommend re-running as {new tier}. Reply 'rerun' to regenerate or 'keep' to proceed as-is.
|
|
251
|
+
|
|
252
|
+
Only proceed past this checkpoint when the user confirms.
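
The escalation and downgrade triggers above might be sketched as follows. All parameter names are hypothetical, and the function only returns a recommendation label; consistent with the checkpoint above, it never re-runs anything on its own.

```python
from typing import Optional


def tier_recheck(
    tier: str,
    files_touched: int,
    parent_tasks: int,
    late_hard_stops: list[str],
    oob_violations: list[str],
    has_migration: bool = False,
    has_new_component: bool = False,
    has_api_change: bool = False,
) -> Optional[str]:
    """Return "escalate", "downgrade", or None based on the post-tasks counts."""
    # Any Out-of-Bounds violation is an escalation trigger regardless of tier.
    if oob_violations:
        return "escalate"
    if tier == "LIGHT" and (files_touched > 3 or parent_tasks > 2):
        return "escalate"
    # A hard-stop signal that only surfaced in the tasks (e.g. a migration sub-task).
    if tier == "STANDARD" and late_hard_stops:
        return "escalate"
    collapsed = parent_tasks == 1 and not (has_migration or has_new_component or has_api_change)
    if tier == "COMPREHENSIVE" and collapsed:
        return "downgrade"
    return None
```

A non-`None` result would be surfaced to the user in the recommendation format shown above, and the workflow would wait for 'rerun' or 'keep'.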
|
|
253
|
+
|
|
254
|
+
---
|
|
255
|
+
|
|
211
256
|
- **Action** — RenderFooter: Use `@skill-spectre:spectre-guide` skill for Next Steps
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: plan_review
|
|
3
|
-
description: 👻 |
|
|
3
|
+
description: 👻 | Independent multi-lens review of plan.md + tasks.md — finds overengineering, missing verification, hallucinated deps, weak references
|
|
4
4
|
user-invocable: true
|
|
5
5
|
---
|
|
6
6
|
|
|
@@ -10,33 +10,174 @@ user-invocable: true
|
|
|
10
10
|
|
|
11
11
|
Treat the current command arguments as this workflow's input. When invoked from a slash command, use the forwarded `$ARGUMENTS` value.
|
|
12
12
|
|
|
13
|
+
# plan_review: Multi-Lens Review of Plan & Tasks
|
|
13
14
|
|
|
14
|
-
|
|
15
|
+
## Description
|
|
15
16
|
|
|
16
|
-
|
|
17
|
+
- **What** — Independent review of `plan.md` + `tasks.md` from four specialized lenses, dispatched in parallel
|
|
18
|
+
- **Outcome** — Structured findings with concrete edit suggestions; optional write-back to update both artifacts
|
|
19
|
+
- **Role** — Senior staff engineer + reviewer panel; bias toward pragmatic problem-solving, YAGNI enforcement, and verifiability
|
|
17
20
|
|
|
18
|
-
|
|
19
|
-
1. **What to simplify** - Specific component, process, or decision
|
|
20
|
-
2. **Why** - What complexity it removes (cognitive load, dependencies, maintenance burden, etc.)
|
|
21
|
-
3. **Impact** - Confirm that all original requirements remain satisfied
|
|
22
|
-
4. **Risk** - Any trade-offs or risks introduced by the simplification
|
|
21
|
+
## ARGUMENTS Input
|
|
23
22
|
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
- Questioning assumptions that add complexity
|
|
28
|
-
- Identifying over-engineering
|
|
29
|
-
- Suggesting proven, boring solutions over novel approaches
|
|
23
|
+
<ARGUMENTS>
|
|
24
|
+
$ARGUMENTS
|
|
25
|
+
</ARGUMENTS>
|
|
30
26
|
|
|
31
|
-
##
|
|
32
|
-
**Context**: We use fast TDD with 1 happy path test + 1 unhappy path test per feature. A separate task handles achieving 100% test coverage post-feature work.
|
|
27
|
+
## Why Four Lenses
|
|
33
28
|
|
|
34
|
-
|
|
35
|
-
- **Over-testing**: Tests beyond 1 happy + 1 unhappy path that should be deferred to the coverage task
|
|
36
|
-
- **Wrong tests**: Testing implementation details instead of behavior, brittle tests that will break on refactors, or tests that don't actually validate requirements
|
|
37
|
-
- **Missing critical paths**: Cases where the 1+1 approach genuinely misses a requirement-breaking scenario (rare, but call it out)
|
|
38
|
-
- **Test complexity**: Overly elaborate test setup, mocking, or assertions that could be simpler
|
|
29
|
+
A single reviewer biases toward the issues it notices first. Published practice (Cognition, Anthropic, Osmani) converges on four high-yield review angles for AI-agent-authored plans. We dispatch each as a parallel subagent so coverage is structurally guaranteed, not dependent on a single reviewer remembering everything.
|
|
39
30
|
|
|
40
|
-
|
|
31
|
+
| Lens | Subagent | Finds |
|
|
32
|
+
|------|----------|-------|
|
|
33
|
+
| **YAGNI / familiar-shape bias** | `@reviewer` | Mature-system patterns that crept in unprompted (auth → rate-limit, CRUD → soft-delete, etc.). Forces ONE "delete this" recommendation. |
|
|
34
|
+
| **Verifiability** | `@analyst` | Acceptance criteria that aren't executable; verification gaps between plan and tasks. |
|
|
35
|
+
| **Existence / hallucination** | `@finder` | File paths, packages, APIs, or symbols referenced that don't actually exist. The slopsquatting fence. |
|
|
36
|
+
| **Canonical reference quality** | `@patterns` | "Follow existing pattern" claims without a real file:line anchor; missed reuse opportunities. |
|
|
41
37
|
|
|
42
|
-
|
|
38
|
+
## Step 1 — Locate Artifacts
|
|
39
|
+
|
|
40
|
+
- **Action** — DetermineTaskDir:
|
|
41
|
+
- `branch_name=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo unknown)`
|
|
42
|
+
- **If** user specifies path in ARGUMENTS → `TASK_DIR={that value}`
|
|
43
|
+
- **Else** → `TASK_DIR=docs/tasks/{branch_name}`
|
|
44
|
+
|
|
45
|
+
- **Action** — ResolveArtifacts: Locate the three required inputs.
|
|
46
|
+
- `PLAN=${TASK_DIR}/specs/plan.md` (or scoped name)
|
|
47
|
+
- `TASKS=${TASK_DIR}/specs/tasks.md` (or scoped name)
|
|
48
|
+
- `CONTEXT=${TASK_DIR}/task_context.md`
|
|
49
|
+
- If any are missing, list what's missing and stop — do NOT review against a partial set. Suggest the user run `/spectre:plan` or `/spectre:create_tasks` first.
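
The artifact check above can be sketched as follows. This is a minimal illustration, assuming the default (unscoped) file names; the function name `resolve_artifacts` is hypothetical and not part of the skill.

```python
from pathlib import Path


def resolve_artifacts(task_dir: str) -> tuple[dict[str, Path], list[str]]:
    """Return the three expected artifact paths and the names of any that are missing."""
    root = Path(task_dir)
    artifacts = {
        "PLAN": root / "specs" / "plan.md",
        "TASKS": root / "specs" / "tasks.md",
        "CONTEXT": root / "task_context.md",
    }
    # A partial set is treated as a stop condition, so collect every gap at once.
    missing = [name for name, path in artifacts.items() if not path.is_file()]
    return artifacts, missing
```

A caller would stop and report `missing` whenever the list is non-empty, rather than reviewing against a partial set.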
|
|
50
|
+
|
|
51
|
+
- **Action** — ReadAll: Read each file completely into context before dispatching reviewers. Reviewers receive curated excerpts, not raw paths.
|
|
52
|
+
|
|
53
|
+
## Step 2 — Dispatch Four Parallel Reviewers
|
|
54
|
+
|
|
55
|
+
Spawn all four subagents in a single message (parallel). Each receives the same artifact excerpts but a different review brief.
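
As a rough sketch of the fan-out, the four lenses can be modeled as a mapping from subagent to brief and run concurrently. The `run` callable is a hypothetical stand-in for whatever actually invokes a subagent; only the subagent names and lens titles come from the table above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Subagent name -> lens title, as listed in the lens table.
LENSES = {
    "@reviewer": "YAGNI / familiar-shape bias",
    "@analyst": "Verifiability",
    "@finder": "Existence / hallucination",
    "@patterns": "Canonical reference quality",
}


def dispatch_reviewers(excerpts: str, run: Callable[[str, str, str], str]) -> dict[str, str]:
    """Start all four reviewers at once and wait for every result."""
    with ThreadPoolExecutor(max_workers=len(LENSES)) as pool:
        futures = {
            agent: pool.submit(run, agent, brief, excerpts)
            for agent, brief in LENSES.items()
        }
        # result() blocks, so the returned dict is complete before synthesis starts.
        return {agent: future.result() for agent, future in futures.items()}
```

Every reviewer receives the same `excerpts` and differs only in its brief, which mirrors the dispatch rule stated above.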
|
|
56
|
+
|
|
57
|
+
### Lens 1 — YAGNI / Familiar-Shape Bias (`@reviewer`)
|
|
58
|
+
|
|
59
|
+
> Review this plan and task list for unrequested complexity. Agents have a documented "familiar-shape bias": shown a feature, they reproduce the mature-system shape from their training data (auth → adds rate-limiting; CRUD → adds soft-delete; form → adds optimistic UI; service → adds telemetry; module → adds feature flags). Your job is to find that bias here.
|
|
60
|
+
>
|
|
61
|
+
> Find:
|
|
62
|
+
> 1. Anything in `plan.md` Technical Approach that isn't traceable to a requirement in `task_context.md` / scope / PRD.
|
|
63
|
+
> 2. Tasks in `tasks.md` that implement something the requirements don't ask for.
|
|
64
|
+
> 3. Abstractions, interfaces, or layers introduced for a single concrete caller.
|
|
65
|
+
> 4. Generality (config files, plugin points, factories) where the actual need is one specific behavior.
|
|
66
|
+
> 5. Overlap with the `Out-of-Bounds — DO NOT add` list (if anything violates that list, it's a hard fail).
|
|
67
|
+
>
|
|
68
|
+
> Required output: nominate the SINGLE highest-leverage thing to delete and justify it. You must pick one. Then list other simplifications ranked by impact. For each finding, cite the exact file:line or section header it lives in.
|
|
69
|
+
|
|
70
|
+
### Lens 2 — Verifiability (`@analyst`)
|
|
71
|
+
|
|
72
|
+
> Review this plan and task list for verification quality. The single strongest correlate of successful AI-agent execution is the ability to self-verify. Find every place where verification is missing, prose-only, or disconnected.
|
|
73
|
+
>
|
|
74
|
+
> Find:
|
|
75
|
+
> 1. Items in `plan.md` "Verification — How We Know This Works" that are prose ("works correctly", "is consistent") rather than executable (test name / observable behavior / state condition).
|
|
76
|
+
> 2. Phases in `plan.md` that don't declare a verification signal.
|
|
77
|
+
> 3. Sub-tasks in `tasks.md` whose acceptance criteria aren't one of the three executable types (test passes / observable behavior / state condition).
|
|
78
|
+
> 4. Verification signals in `plan.md` with no matching acceptance criterion in `tasks.md`.
|
|
79
|
+
> 5. Behavior-changing sub-tasks in `tasks.md` that lack a preceding RED test sub-task.
|
|
80
|
+
>
|
|
81
|
+
> Required output: list every non-executable criterion with a proposed rewrite in one of the three types. Cite file:line for each.
|
|
82
|
+
|
|
83
|
+
### Lens 3 — Existence / Hallucination (`@finder`)
|
|
84
|
+
|
|
85
|
+
> Review this plan and task list for references to things that may not exist. AI-generated plans hallucinate file paths, package names, function signatures, and API endpoints at measurable rates (~20% for packages per Snyk analysis). Your job is to verify every reference is real.
|
|
86
|
+
>
|
|
87
|
+
> Verify:
|
|
88
|
+
> 1. Every file path mentioned in `plan.md` "Critical Files for Implementation" and in `tasks.md` Context blocks — does the file exist in the repo today? Use Glob/Read to confirm.
|
|
89
|
+
> 2. Every package in `plan.md` "External Dependencies" — does it exist at the named version? (Note: actual install/registry check is the executor's Phase 0 job; your job is to flag suspicious names — typos, near-misses to well-known packages, lookalikes.)
|
|
90
|
+
> 3. Every function, class, or symbol named in plan/tasks — grep the repo, confirm it exists where claimed.
|
|
91
|
+
> 4. Every API endpoint, env var, or CLI flag referenced — confirm it's defined in the codebase.
|
|
92
|
+
>
|
|
93
|
+
> Required output: list every reference that fails verification, with `expected: <plan claim>` and `actual: <repo state>`. If everything checks out, say so explicitly — don't pad.
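
A minimal sketch of two of these checks, assuming the claimed paths and package names have already been extracted from the plan. The helper names and the `well_known` list are hypothetical; the similarity cutoff of 0.8 is an arbitrary illustrative choice.

```python
import difflib
from pathlib import Path


def check_paths(repo_root: str, claimed: list[str]) -> list[dict[str, str]]:
    """Report every claimed path that does not exist under repo_root."""
    failures = []
    for rel in claimed:
        if not (Path(repo_root) / rel).exists():
            failures.append({"expected": rel, "actual": "not found in repo"})
    return failures


def suspicious_names(packages: list[str], well_known: list[str]) -> dict[str, str]:
    """Flag names that are close to, but not equal to, a well-known package."""
    flagged = {}
    for name in packages:
        if name in well_known:
            continue
        close = difflib.get_close_matches(name, well_known, n=1, cutoff=0.8)
        if close:
            flagged[name] = close[0]
    return flagged
```

The second helper only flags lookalikes for human attention; it does not replace the registry check that the brief assigns to the executor's Phase 0.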
|
|
94
|
+
|
|
95
|
+
### Lens 4 — Canonical Reference Quality (`@patterns`)
|
|
96
|
+
|
|
97
|
+
> Review this plan and task list for the quality of "follow existing pattern" references. Anthropic's own guidance is to anchor plans with concrete examples (e.g., "HotDogWidget.php is a good example"). Vague "follow existing patterns" without a file:line anchor is a documented failure mode.
|
|
98
|
+
>
|
|
99
|
+
> Find:
|
|
100
|
+
> 1. Places in `plan.md` Technical Approach that reference "existing patterns" or "similar features" without a specific file:line.
|
|
101
|
+
> 2. Sub-tasks in `tasks.md` whose Context block lacks a canonical reference pointer.
|
|
102
|
+
> 3. Better canonical references that the plan missed — actual files in the codebase that more closely match the intended shape.
|
|
103
|
+
> 4. Reuse opportunities the plan ignored: utilities, hooks, helpers, or types already in the repo that the plan re-implements.
|
|
104
|
+
>
|
|
105
|
+
> Required output: for each weak/missing reference, propose a specific file:line that should be the anchor. For each missed reuse, cite the existing utility and which task should use it.
|
|
106
|
+
|
|
107
|
+
## Step 3 — Synthesize Findings
|
|
108
|
+
|
|
109
|
+
- **Action** — CollectFindings: Wait for all four reviewers to return. Read every finding.
|
|
110
|
+
|
|
111
|
+
- **Action** — DeduplicateAndPrioritize: Merge findings that overlap (e.g., a missing canonical reference may surface from both Lens 4 and Lens 2). Assign severity:
|
|
112
|
+
- **Blocker** — would cause execution to fail or produce wrong output (hallucinated file path, criterion the executor can't check, Out-of-Bounds violation)
|
|
113
|
+
- **High** — meaningfully reduces output quality (missing RED test, weak canonical reference, prose criterion)
|
|
114
|
+
- **Medium** — overengineering or reuse miss without functional blast radius
|
|
115
|
+
- **Low** — stylistic or nice-to-have
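
The merge-and-rank step might look like the sketch below. It assumes each finding is a dict with `severity`, `location`, and `finding` keys, and it treats two findings as duplicates only when location and text match exactly, which is a simplification of the overlap merging described above.

```python
from collections import Counter

SEVERITY_ORDER = ["Blocker", "High", "Medium", "Low"]


def prioritize(findings: list[dict]) -> tuple[list[dict], dict[str, int]]:
    """Drop exact duplicates, sort by severity, and count findings per level."""
    seen = set()
    unique = []
    for f in findings:
        key = (f["location"], f["finding"])
        if key not in seen:
            seen.add(key)
            unique.append(f)
    # list.sort is stable, so findings keep their original order within a level.
    unique.sort(key=lambda f: SEVERITY_ORDER.index(f["severity"]))
    counts = Counter(f["severity"] for f in unique)
    return unique, {level: counts.get(level, 0) for level in SEVERITY_ORDER}
```

The second return value maps directly onto the per-severity counts in the Summary block of the findings output.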
|
|
116
|
+
|
|
117
|
+
- **Action** — RenderFindingsTable: Output a single structured table. Schema is fixed.
|
|
118
|
+
|
|
119
|
+
```markdown
|
|
120
|
+
## Review Findings — {feature name}
|
|
121
|
+
|
|
122
|
+
### Must-Delete (Lens 1 — YAGNI)
|
|
123
|
+
> {The single nominated highest-leverage cut, with rationale.}
|
|
124
|
+
|
|
125
|
+
### Findings
|
|
126
|
+
|
|
127
|
+
| # | Severity | Lens | Location | Finding | Suggested Edit |
|
|
128
|
+
|---|----------|------|----------|---------|----------------|
|
|
129
|
+
| 1 | Blocker | Existence | plan.md `## External Dependencies` | `react-use-undocumented@2.4.0` doesn't exist on npm | Remove; the plan can use `useReducer` from React stdlib (see `src/hooks/useFormState.ts:18`) |
|
|
130
|
+
| 2 | High | Verifiability | tasks.md `1.2.1` | "Component renders correctly" is prose | Replace with: Test passes `<ProductCard /> renders product.title and product.price` |
|
|
131
|
+
| 3 | High | YAGNI | plan.md `## Technical Approach` | Adds retry-with-backoff for a sync internal call | Delete; not in requirements; Out-of-Bounds list already forbids retry logic |
|
|
132
|
+
| … | | | | | |
|
|
133
|
+
|
|
134
|
+
### Summary
|
|
135
|
+
- Blockers: {N} — must resolve before /execute
|
|
136
|
+
- High: {N}
|
|
137
|
+
- Medium: {N}
|
|
138
|
+
- Low: {N}
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
## Step 4 — Surface Findings & Apply Edits
|
|
142
|
+
|
|
143
|
+
- **Action** — PresentFindings: Render the findings table inline.
|
|
144
|
+
|
|
145
|
+
- **Action** — OfferWriteBack: After the table, prompt:
|
|
146
|
+
|
|
147
|
+
> Reply with which findings to apply:
|
|
148
|
+
> - `all` — apply every suggested edit
|
|
149
|
+
> - `blockers` — apply Blocker + High severity only
|
|
150
|
+
> - `1,3,5` — apply specific finding numbers
|
|
151
|
+
> - `skip` — leave artifacts unchanged
|
|
152
|
+
>
|
|
153
|
+
> For findings I apply, I'll edit plan.md and/or tasks.md inline and re-run a fast self-check.
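
The four reply forms can be parsed as in this sketch. It assumes each finding carries an integer `id` and a `severity` string; the function name is hypothetical.

```python
def select_findings(reply: str, findings: list[dict]) -> list[int]:
    """Translate the user's reply into the list of finding numbers to apply."""
    reply = reply.strip().lower()
    known = [f["id"] for f in findings]
    if reply == "all":
        return known
    if reply == "skip":
        return []
    if reply == "blockers":
        # Per the prompt text, this option covers Blocker and High severity.
        return [f["id"] for f in findings if f["severity"] in ("Blocker", "High")]
    chosen = [int(part) for part in reply.split(",") if part.strip()]
    unknown = [n for n in chosen if n not in known]
    if unknown:
        raise ValueError(f"unknown finding numbers: {unknown}")
    return chosen
```

Rejecting unknown numbers, rather than ignoring them, is a design choice of this sketch so that a typo never silently skips an edit.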
|
|
154
|
+
|
|
155
|
+
- **Wait** — User selects.
|
|
156
|
+
|
|
157
|
+
- **Action** — ApplyEdits: For each selected finding:
|
|
158
|
+
- Open the named artifact (plan.md or tasks.md)
|
|
159
|
+
- Apply the Suggested Edit verbatim where possible; if the edit needs adaptation, make the minimum change consistent with the finding's intent
|
|
160
|
+
- Track which findings were applied
|
|
161
|
+
|
|
162
|
+
- **Action** — SelfCheck: After edits, run a fast pass over the modified sections:
|
|
163
|
+
- Re-verify any file:line refs touched
|
|
164
|
+
- Re-verify acceptance criteria are still executable
|
|
165
|
+
- Confirm no edit introduced a new Out-of-Bounds violation
|
|
166
|
+
- If any check fails, surface it and ask the user before continuing
|
|
167
|
+
|
|
168
|
+
- **Action** — ReportApplied:
|
|
169
|
+
|
|
170
|
+
> Applied: {list of finding numbers}. Skipped: {list}.
|
|
171
|
+
> {Path to updated plan.md and tasks.md}.
|
|
172
|
+
|
|
173
|
+
## Step 5 — Next Steps
|
|
174
|
+
|
|
175
|
+
- **Action** — RenderFooter: Use `@skill-spectre:spectre-guide` skill for Next Steps footer.
|
|
176
|
+
|
|
177
|
+
---
|
|
178
|
+
|
|
179
|
+
## Notes
|
|
180
|
+
|
|
181
|
+
- This skill does NOT generate plans or tasks. It reviews them. If `plan.md` or `tasks.md` doesn't exist, route the user to `/spectre:plan` first.
|
|
182
|
+
- The four lenses are non-overlapping by design but will surface overlap in practice; dedupe at synthesis, don't ask reviewers to coordinate.
|
|
183
|
+
- The "Must-Delete" nomination from Lens 1 is mandatory output — even on a tight plan, naming the single weakest element is a forcing function against under-review.
|
|
@@ -29,8 +29,9 @@ Treat the current command arguments as this workflow's input. When invoked from
|
|
|
29
29
|
- **If** found with comprehensive analysis → use existing research; skip to Step 3.
|
|
30
30
|
- **Else** → proceed with new research below.
|
|
31
31
|
- **Action** — AutomatedResearch: Spawn parallel research agents for comprehensive analysis.
|
|
32
|
-
- Use `
|
|
33
|
-
- Dispatch multiple parallel `
|
|
32
|
+
- Use `@finder` to find all files related to feature area.
|
|
33
|
+
- Dispatch multiple parallel `@analyst` subagents to understand current implementation patterns. Pay particular attention to how and where data is accessed that will be needed for this feature.
|
|
34
|
+
- Use `@patterns` to surface canonical reference implementations already in the codebase — these become "follow this file" anchors in the plan.
|
|
34
35
|
- Wait for ALL agents to complete before proceeding.
|
|
35
36
|
- Read ALL identified files into context.
|
|
36
37
|
- **Action** — TraceCodePaths: Trace through relevant execution paths.
|
|
@@ -97,17 +98,28 @@ Dynamically generate up to 10 technical questions based on research findings. **
|
|
|
97
98
|
|
|
98
99
|
- **Action** — DesignTechnicalApproach: Create the implementation plan.
|
|
99
100
|
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
**
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
107
|
-
|
|
108
|
-
-
|
|
109
|
-
|
|
110
|
-
|
|
101
|
+
Every plan, regardless of depth, MUST include these seven sections. They are the verification spine — without them, downstream agents cannot self-check their work.
|
|
102
|
+
|
|
103
|
+
**Required for both STANDARD and COMPREHENSIVE:**
|
|
104
|
+
1. **Overview** — 1–2 paragraphs: what problem, what shape the solution takes, why this approach.
|
|
105
|
+
2. **Technical Approach** — How the change actually lands: components touched, data flow, key decisions with rationale. Reference existing patterns from `@patterns` research by file:line (e.g., "follow the shape of `src/widgets/HotDogWidget.ts:42` for the registration step").
|
|
106
|
+
3. **Critical Files for Implementation** — 3–7 specific files from research. Format: `path/to/file.ts` — *reason* (Core logic to modify / Pattern to follow / Interface to implement / Test to extend). No guesses — only files surfaced during Step 1 research.
|
|
107
|
+
4. **External Dependencies — Verify Before Implementation** — Every third-party package required, with exact version and a one-line existence check. Format: `package@1.2.3 — verify: npm view package@1.2.3` (or pip equivalent). Required even if "no new packages" (write that explicitly). This is the slopsquatting fence: ~20% of AI-suggested packages don't exist; we catch that here, not in production.
|
|
108
|
+
5. **Verification — How We Know This Works** — For each major change in Technical Approach, 1–3 falsifiable signals: a test name, an observable behavior, or a state/file condition. Prose like "the feature works" is not acceptable — it must be checkable. Format: `<change> → verifies by: <test name | observable behavior | state condition>`. These become acceptance criteria in `create_tasks` downstream.
|
|
109
|
+
6. **Out-of-Bounds — DO NOT add** — 4–8 concrete things the implementation must NOT add, even if "best practice." Examples: rate limiting, retry/backoff, caching layer, optimistic UI, soft-delete, telemetry events, feature flags, admin UI. This is the YAGNI fence against familiar-shape bias (agents reproduce mature-system patterns unprompted). Be specific to this feature, not generic.
|
|
110
|
+
7. **Risks & Filled Assumptions** — Two short subsections:
|
|
111
|
+
- *Risks*: what could go wrong (e.g., concurrent write race, migration ordering, third-party rate limit). Each with a one-line mitigation or "accept and monitor."
|
|
112
|
+
- *Filled Assumptions*: things the plan defaulted because the spec didn't say (e.g., "Assumed Postgres; spec didn't specify DB." "Assumed retry count = 0; spec didn't mention failure modes."). Reviewer-visible by design — these are the silent decisions that bite at execution.
|
|
113
|
+
|
|
114
|
+
**COMPREHENSIVE additionally requires:**
|
|
115
|
+
8. **Current State** — How the affected code path works today, with file:line refs. Anchored to research findings.
|
|
116
|
+
9. **Implementation Phases** — Ordered phases, each with its own Verification subsection (Phase N succeeds when …). Phases must be sequenced by dependency, not by file. Migration phases come before consumer phases.
|
|
117
|
+
10. **Component / Data Architecture** — Where data is created, mutated, and read. Schema deltas if any.
|
|
118
|
+
11. **API Design** — Endpoint signatures, request/response shapes, error contracts. Required if any external or internal API surface changes.
|
|
119
|
+
12. **Migration Plan** — Required if any data-layer change. Up + down migration sketch, backfill strategy, rollback plan.
|
|
120
|
+
13. **Testing Strategy** — What test types cover what (unit / integration / e2e), where new tests live, what's deferred to the post-feature coverage task.
|
|
121
|
+
|
|
122
|
+
Use your judgment on section length, not on inclusion. If a required section is genuinely N/A for this feature, write the section header followed by *"N/A — <one-line reason>"*. Empty section headers are not acceptable; absent section headers are not acceptable.
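
A presence check for the seven required sections could be sketched like this. It matches headers by prefix and assumes the plan uses Markdown `#` headers; it does not detect empty sections, which the rule above also forbids.

```python
# Prefixes of the seven section titles required for every plan.
REQUIRED_PREFIXES = [
    "Overview",
    "Technical Approach",
    "Critical Files for Implementation",
    "External Dependencies",
    "Verification",
    "Out-of-Bounds",
    "Risks & Filled Assumptions",
]


def missing_sections(plan_text: str) -> list[str]:
    """Return the required section prefixes that have no matching header."""
    headers = [
        line.lstrip("#").strip()
        for line in plan_text.splitlines()
        if line.startswith("#")
    ]
    return [p for p in REQUIRED_PREFIXES if not any(h.startswith(p) for h in headers)]
```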
|
|
111
123
|
|
|
112
124
|
- **Action** — DocumentPlan: Save to `{OUT_DIR}/specs/plan.md` (use scoped name if exists)
|
|
113
125
|
|
|
@@ -129,12 +129,46 @@ Read completely (no limits):
|
|
|
129
129
|
- This section helps the user understand how the work integrates with the product before diving into tasks
|
|
130
130
|
|
|
131
131
|
### Task Hierarchy (4 Levels)
|
|
132
|
-
-
|
|
133
|
-
-
|
|
134
|
-
-
|
|
135
|
-
-
|
|
132
|
+
- **Phase**: Organizational header (no checkbox) — groups related parent tasks
|
|
133
|
+
- **Parent Task**: Cohesive deliverable (small-medium scope) — one component/file
|
|
134
|
+
- **Sub-task**: Atomic work (single focused change) — single action, 2-3 acceptance criteria
|
|
135
|
+
- **Acceptance Criteria**: Executable, verifiable outcomes (see Acceptance Criteria Types below)
|
|
136
136
|
|
|
137
|
-
**Numbering**: Phase 1 → Parent 1.1, 1.2 → Sub-tasks 1.1.1, 1.1.2 → Criteria
|
|
137
|
+
**Numbering**: Phase 1 → Parent 1.1, 1.2 → Sub-tasks 1.1.1, 1.1.2 → Criteria
|
|
138
|
+
|
|
139
|
+
### Right-Sized for AI Execution
|
|
140
|
+
|
|
141
|
+
Published data on AI agent execution (Cognition's Devin reviews, Anthropic's Claude Code guidance) converges on a bounded sweet spot: each sub-task should be roughly what a junior engineer could complete in a 4–8 hour window, neither a multi-day epic nor a 10-line tweak.
|
|
142
|
+
|
|
143
|
+
**Hard size cap — split a sub-task if ANY of these is true:**
|
|
144
|
+
- Touches more than 3 files
|
|
145
|
+
- Has more than 5 acceptance criteria
|
|
146
|
+
- Would require more than ~200 lines of diff
|
|
147
|
+
- Requires a mid-execution judgment call about scope (split the judgment into its own predecessor task)
|
|
148
|
+
- Spans more than one concern (e.g., schema + UI in one sub-task)
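
The size cap reads naturally as a set of independent checks. The sketch below assumes the numbers are already estimated per sub-task; the `SubTask` fields are hypothetical names for the five conditions listed above.

```python
from dataclasses import dataclass


@dataclass
class SubTask:
    files_touched: int
    criteria: int
    est_diff_lines: int
    needs_scope_judgment: bool = False
    concerns: int = 1


def split_reasons(task: SubTask) -> list[str]:
    """Return every size-cap rule the sub-task violates; empty means it fits."""
    reasons = []
    if task.files_touched > 3:
        reasons.append("more than 3 files")
    if task.criteria > 5:
        reasons.append("more than 5 acceptance criteria")
    if task.est_diff_lines > 200:
        reasons.append("more than ~200 lines of diff")
    if task.needs_scope_judgment:
        reasons.append("mid-execution scope judgment")
    if task.concerns > 1:
        reasons.append("more than one concern")
    return reasons
```

Any non-empty result means the sub-task should be split, since a single true condition is enough.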
|
|
149
|
+
|
|
150
|
+
When splitting, keep the integration-aware principle intact: each split task still names its Producer / Consumer / Replaces.
|
|
151
|
+
|
|
152
|
+
### Acceptance Criteria Types
|
|
153
|
+
|
|
154
|
+
Every acceptance criterion MUST be one of three executable types. Prose criteria like "feature works correctly" or "behavior is consistent" are forbidden — an executor cannot self-check them.
|
|
155
|
+
|
|
156
|
+
1. **Test passes** — `Test \`<test_name>\` passes` (or `tests in <file_path> pass`)
|
|
157
|
+
2. **Observable behavior** — A specific, checkable runtime signal: `GET /api/x returns 200 with field \`y\``, `Console logs \`event=loaded params={...}\``, `Button click triggers <handler> within 100ms`
|
|
158
|
+
3. **State / file condition** — `File \`<path>\` exists and contains <pattern>`, `Migration \`<id>\` applied`, `Env var \`X\` is read at startup`
|
|
159
|
+
|
|
160
|
+
Mixing types within a sub-task is fine. What's not fine: criteria the agent cannot verify without asking the user.
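
One way to enforce the three types mechanically is to require a type label before the first colon, as the task template in this file does (for example `Test passes: ...`). That labeling convention is an assumption of this sketch, not a rule stated in the section above.

```python
from typing import Optional

CRITERION_TYPES = ("Test passes", "Observable behavior", "State condition")


def criterion_type(text: str) -> Optional[str]:
    """Return the declared type of a criterion, or None if it is unlabeled prose."""
    head = text.split(":", 1)[0].strip().lower()
    for kind in CRITERION_TYPES:
        if head == kind.lower():
            return kind
    return None
```

A criterion that returns `None` is a candidate for rewriting into one of the three executable forms.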
|
|
161
|
+
|
|
162
|
+
### Test-First Task Pairing
|
|
163
|
+
|
|
164
|
+
For any sub-task that changes observable behavior (not pure refactors or cleanup), pair it with a preceding RED task. Pattern:
|
|
165
|
+
|
|
166
|
+
- **N.M.k RED**: Write failing test `<test_name>` asserting `<behavior>`. Acceptance: test exists and fails for the documented reason.
|
|
167
|
+
- **N.M.(k+1) Build**: Implement `<change>`. Acceptance: the RED test passes; no other tests regress.
|
|
168
|
+
|
|
169
|
+
This is the TDAD pattern (test-driven agentic development): the failing test is the executor's self-correction signal. Without it, the executor is guessing whether the implementation is right.
|
|
170
|
+
|
|
171
|
+
Pure refactors, cleanups, and config-only tasks don't require RED pairing — but if behavior changes, the RED comes first.
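
The pairing rule can be checked over an ordered list of sub-tasks. This sketch assumes each sub-task is a dict with `id`, `kind` (such as `"RED"` or `"Build"`), and a `changes_behavior` flag; those field names are hypothetical.

```python
def missing_red(subtasks: list[dict]) -> list[str]:
    """IDs of behavior-changing sub-tasks not immediately preceded by a RED sub-task."""
    missing = []
    for i, task in enumerate(subtasks):
        if not task["changes_behavior"]:
            continue  # refactors, cleanups, and config-only tasks are exempt
        previous = subtasks[i - 1] if i > 0 else None
        if previous is None or previous["kind"] != "RED":
            missing.append(task["id"])
    return missing
```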
|
|
138
172
|
|
|
139
173
|
### Integration-Aware Task Principle
|
|
140
174
|
|
|
@@ -153,11 +187,23 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
|
|
|
153
187
|
- **Cleanup tasks**: Remove/redirect old code paths (MANDATORY when replacing patterns)
|
|
154
188
|
|
|
155
189
|
### 4b. Create Parent Tasks
|
|
156
|
-
- **Action** — CreateParentTasks: Draft as many phases as needed to logically organize work, each with as many parent tasks
|
|
190
|
+
- **Action** — CreateParentTasks: Draft as many phases as needed to logically organize work, each with as many parent tasks as required to cover complete scope.
|
|
157
191
|
- Each parent task = single cohesive deliverable (small-medium scope)
|
|
158
192
|
- Cover ALL extracted requirements with no gaps
|
|
159
193
|
- Group related work into phases for clarity
|
|
160
194
|
- Align with technical approach (from research or existing docs)
|
|
195
|
+
- Every parent task carries explicit sequencing in its body:
|
|
196
|
+
- **Predecessor**: parent task IDs that must complete first (or "none")
|
|
197
|
+
- **Unblocks**: parent task IDs this unblocks (or "terminal")
|
|
198
|
+
- The first phase is always **Phase 0 — Dependency Verification** (see 4a-Phase0 below). Other phases start at Phase 1.
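
Declared predecessors form a dependency graph, so a valid execution order (or a cycle) can be derived mechanically. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the task IDs in the usage note are illustrative.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(predecessors: dict[str, list[str]]) -> list[str]:
    """Order parent tasks so that every predecessor runs before its dependents.

    Raises graphlib.CycleError if the declared sequencing contains a cycle.
    """
    return list(TopologicalSorter(predecessors).static_order())
```

For example, `execution_order({"0.1": [], "1.1": ["0.1"], "1.2": ["1.1"]})` places `0.1` first, which matches the rule that Phase 0 precedes all other phases.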
|
|
199
|
+
|
|
200
|
+
### 4a-Phase0. Phase 0 — Dependency Verification (always present)
|
|
201
|
+
|
|
202
|
+
Before any implementation, generate a Phase 0 containing one sub-task per external dependency listed in `plan.md`'s "External Dependencies — Verify Before Implementation" section. Each sub-task verifies the package exists at the named version and exposes the API the plan assumed.
|
|
203
|
+
|
|
204
|
+
- Acceptance type: state condition (`npm view <pkg>@<ver>` returns valid metadata) and/or test passes (a minimal import-and-call smoke test).
|
|
205
|
+
- If `plan.md` declared "no new packages," Phase 0 is a single sub-task that confirms no new dependencies were silently introduced during implementation (cross-check `package.json` diff at end).
|
|
206
|
+
- Phase 0 unblocks Phase 1; it cannot be skipped or run in parallel with Phase 1.
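
A small sketch of how a Phase 0 sub-task might turn a `package@version` entry into an `npm view` invocation. The helper names are hypothetical, and the code only builds the command; it does not contact the registry.

```python
def split_spec(spec: str) -> tuple[str, str]:
    """Split 'name@version' into its parts, keeping a leading '@scope/' intact."""
    name, sep, version = spec.rpartition("@")
    if not sep or not name or not version:
        raise ValueError(f"expected name@version, got {spec!r}")
    return name, version


def verify_command(spec: str) -> list[str]:
    """Build the argument list for checking that the exact version is published."""
    name, version = split_spec(spec)
    return ["npm", "view", f"{name}@{version}", "version"]
```

An executor could pass the result to `subprocess.run` and treat a non-zero exit or empty output as a failed check; that interpretation is an assumption of this sketch rather than something the skill specifies.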
|
|
161
207
|
|
|
162
208
|
### 4c. Break Down Sub-tasks
|
|
163
209
|
- **Action** — BreakdownSubTasks: For each parent, generate as many detailed sub-tasks as needed to complete the parent.
|
|
@@ -171,14 +217,19 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
|
|
|
171
217
|
- Completable as a single focused change
|
|
172
218
|
|
|
173
219
|
- **What to INCLUDE in sub-tasks:**
|
|
174
|
-
-
|
|
175
|
-
-
|
|
176
|
-
-
|
|
177
|
-
-
|
|
178
|
-
-
|
|
179
|
-
-
|
|
180
|
-
-
|
|
181
|
-
-
|
|
220
|
+
- Technical terms (JWT, REST, WebSocket, React hooks, SQL queries)
|
|
221
|
+
- Architecture patterns (middleware, pub/sub, observer, factory)
|
|
222
|
+
- Integration points (which components connect, API contracts)
|
|
223
|
+
- File/component names (UserProfileComponent, authMiddleware.ts)
|
|
224
|
+
- Technical constraints (max file size, timeout duration, data format)
|
|
225
|
+
- **Produces**: What output this creates (variable name, return value, prop)
|
|
226
|
+
- **Consumed by**: What uses this output (component, hook, render path)
|
|
227
|
+
- **Replaces**: What old code path this supersedes (if any)
|
|
228
|
+
- **Context** (required): a self-contained payload an executor can use without re-reading the full plan. Include:
|
|
229
|
+
- 2–4 file:line refs pulled from research (the exact code being modified or extended)
|
|
230
|
+
- 1 canonical reference pointer (a file:line from `@patterns` research that shows the shape to follow)
|
|
231
|
+
- 1 link/anchor into `plan.md` for the relevant section
|
|
232
|
+
- **Predecessor** (sub-task level, optional): a sub-task ID this depends on. Only when intra-parent ordering is non-obvious.
|
|
182
233
|
|
|
183
234
|
- **What to AVOID in sub-tasks:**
|
|
184
235
|
- ❌ Code snippets or pseudo-code
|
|
@@ -187,12 +238,11 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
|
|
|
187
238
|
- ❌ Specific library API calls (unless architecturally significant)
|
|
188
239
|
|
|
189
240
|
- **Acceptance criteria**:
|
|
190
|
-
-
|
|
191
|
-
-
|
|
192
|
-
-
|
|
193
|
-
- Be specific about technical requirements
|
|
241
|
+
- Every criterion MUST be one of the three executable types (see "Acceptance Criteria Types" above): test passes / observable behavior / state condition.
|
|
242
|
+
- 2–3 criteria per sub-task. If a sub-task needs more than 3 to be checkable, split it.
|
|
243
|
+
- Prose criteria ("works correctly", "is consistent", "user-friendly") are forbidden — they're not self-checkable.
|
|
194
244
|
|
|
195
|
-
- **Decomposition**: Split if 5
|
|
245
|
+
- **Decomposition (hard size cap)**: Split if ANY of: >3 files touched, >5 criteria, >~200 LOC, mid-task scope judgment required, or more than one concern.
|
|
196
246
|
|
|
197
247
|
### 4d. Validate Task Structure
|
|
198
248
|
- **Action** — VerifyCoverage: Cross-reference tasks against extracted requirements.
|
|
@@ -204,13 +254,19 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
|
|
|
204
254
|
- **Coverage Validation**:
|
|
205
255
|
- [ ] All extracted requirements from Step 3 addressed by tasks?
|
|
206
256
|
- [ ] No gaps in requirement coverage?
|
|
257
|
+
- [ ] Every "Verification" entry from `plan.md` mapped to at least one acceptance criterion?
|
|
207
258
|
- **Exclusion Validation**:
|
|
208
|
-
- [ ]
|
|
209
|
-
- [ ]
|
|
259
|
+
- [ ] No additions beyond explicit requests?
|
|
260
|
+
- [ ] `plan.md`'s "Out-of-Bounds — DO NOT add" list carried forward verbatim into tasks.md banner?
|
|
261
|
+
- [ ] No task implements anything in the Out-of-Bounds list?
|
|
210
262
|
- **Structure Validation**:
|
|
211
263
|
- [ ] Parent tasks are small-medium scope, sub-tasks are atomic?
|
|
212
|
-
- [ ] Each sub-task has 2-3 acceptance criteria?
|
|
213
|
-
- [ ]
|
|
264
|
+
- [ ] Each sub-task has 2-3 acceptance criteria, each one of the three executable types?
|
|
265
|
+
- [ ] No sub-task exceeds the size cap (>3 files / >5 criteria / >~200 LOC / multi-concern / mid-task scope judgment)?
|
|
266
|
+
- [ ] Every behavior-changing sub-task is preceded by a RED test sub-task?
|
|
267
|
+
- [ ] Every sub-task has a Context payload (2–4 file:line refs, 1 canonical reference, 1 plan.md anchor)?
|
|
268
|
+
- [ ] Every parent task has Predecessor and Unblocks declared?
|
|
269
|
+
- [ ] Phase 0 — Dependency Verification is present and unblocks Phase 1?
|
|
214
270
|
|
|
215
271
|
- **Action** — ValidateIntegration: Verify every build task is wired to consumers.
|
|
216
272
|
- **Consumer Specified**:
|
|
@@ -288,6 +344,13 @@ Save to `${TASKS_FILE}`:
|
|
|
288
344
|
- **In Scope**: {bullet list}
|
|
289
345
|
- **Out of Scope**: {bullet list}
|
|
290
346
|
|
|
347
|
+
## Out-of-Bounds — DO NOT add
|
|
348
|
+
*Carried forward verbatim from plan.md. Executors: if a task tempts you to add any of these, stop and ask.*
|
|
349
|
+
- {Forbidden addition 1, e.g. "rate limiting"}
|
|
350
|
+
- {Forbidden addition 2, e.g. "retry/backoff"}
|
|
351
|
+
- {Forbidden addition 3, e.g. "telemetry events"}
|
|
352
|
+
- {Forbidden addition 4, e.g. "admin UI"}
|
|
353
|
+
|
|
291
354
|
## Requirements Traced
|
|
292
355
|
| ID | Description | Source | Tasks |
|
|
293
356
|
|----|-------------|--------|-------|
|
|
@@ -315,30 +378,66 @@ Save to `${TASKS_FILE}`:
|
|
|
315
378
|
|
|
316
379
|
## Tasks
|
|
317
380
|
|
|
381
|
+
### Phase 0: Dependency Verification
|
|
382
|
+
*Confirms every external dependency in plan.md exists at the declared version before any implementation begins.*
|
|
383
|
+
|
|
384
|
+
#### [0.1] Verify external dependencies
|
|
385
|
+
- **Predecessor**: none
|
|
386
|
+
- **Unblocks**: 1.1
|
|
387
|
+
- [ ] **0.1.1** Verify each package@version from plan.md "External Dependencies" section exists
|
|
388
|
+
- **Produces**: confirmation log of resolved package metadata
|
|
389
|
+
- **Consumed by**: Phase 1 implementation tasks
|
|
390
|
+
- **Context**:
|
|
391
|
+
- plan.md anchor: `## External Dependencies — Verify Before Implementation`
|
|
392
|
+
- check commands listed in plan section
|
|
393
|
+
- [ ] State condition: `npm view <pkg>@<ver>` returns valid metadata for every package
|
|
394
|
+
- [ ] State condition: no package in the list is flagged as deprecated or security-advised
|
|
395
|
+
- [ ] Test passes: minimal import-and-call smoke for each new package
|
|
396
|
+
|
|
318
397
|
### Phase 1: {Phase Name}
|
|
319
398
|
|
|
320
399
|
#### [1.1] {Parent Task Title}
|
|
321
|
-
-
|
|
400
|
+
- **Predecessor**: 0.1
|
|
401
|
+
- **Unblocks**: 1.2
|
|
402
|
+
|
|
403
|
+
- [ ] **1.1.1 RED** Write failing test `{test_name}` asserting `{behavior}`
|
|
404
|
+
- **Produces**: a failing test that pins the desired behavior
|
|
405
|
+
- **Consumed by**: 1.1.2 (turns this red to green)
|
|
406
|
+
- **Replaces**: N/A
|
|
407
|
+
- **Context**:
|
|
408
|
+
- `path/to/existing/code.ts:42` — current behavior being changed
|
|
409
|
+
- `path/to/similar/test.ts:18` — canonical test shape to follow
|
|
410
|
+
- plan.md anchor: `## Verification — How We Know This Works`
|
|
411
|
+
- [ ] State condition: file `path/to/test.ts` exists and contains test `{test_name}`
|
|
412
|
+
- [ ] Observable behavior: running `{test_name}` fails, with a failure message referencing the unimplemented behavior
|
|
413
|
+
|
|
414
|
+
- [ ] **1.1.2 Build** {Implement the change}
|
|
322
415
|
- **Produces**: {output variable/value/prop}
|
|
323
416
|
- **Consumed by**: {component/hook that uses this}
|
|
324
417
|
- **Replaces**: {old code path, or "N/A" if new}
|
|
325
|
-
-
|
|
326
|
-
|
|
327
|
-
|
|
328
|
-
|
|
329
|
-
-
|
|
330
|
-
-
|
|
331
|
-
-
|
|
332
|
-
- [ ] {
|
|
333
|
-
- [ ] {Technical outcome 2}
|
|
418
|
+
- **Context**:
|
|
419
|
+
- `path/to/file.ts:120` — code to modify
|
|
420
|
+
- `path/to/file.ts:180` — adjacent code that must not regress
|
|
421
|
+
- `path/to/canonical/example.ts:55` — pattern to follow (from @patterns research)
|
|
422
|
+
- plan.md anchor: `## Technical Approach`
|
|
423
|
+
- [ ] Test passes: `{test_name}` (from 1.1.1) now passes
|
|
424
|
+
- [ ] Test passes: existing tests in `path/to/related.test.ts` still pass
|
|
425
|
+
- [ ] Observable behavior: `{specific runtime signal, e.g. log line, HTTP response shape}`
|
|
334
426
|
|
|
335
427
|
#### [1.2] {Parent Task Title} — Integration
|
|
336
|
-
|
|
337
|
-
-
|
|
338
|
-
|
|
428
|
+
- **Predecessor**: 1.1
|
|
429
|
+
- **Unblocks**: {next parent or "terminal"}
|
|
430
|
+
|
|
431
|
+
- [ ] **1.2.1** Wire {1.1.2 output} to {consumer}
|
|
432
|
+
- **Wires**: {1.1.2 output} → {consumer component/render}
|
|
339
433
|
- **Removes**: {old code path being replaced}
|
|
340
|
-
-
|
|
341
|
-
|
|
434
|
+
- **Context**:
|
|
435
|
+
- `path/to/consumer.tsx:30` — where the wire lands
|
|
436
|
+
- `path/to/old/path.ts:12` — old code path to remove
|
|
437
|
+
- plan.md anchor: `## Technical Approach`
|
|
438
|
+
- [ ] Test passes: integration test asserting consumer renders new data source
|
|
439
|
+
- [ ] State condition: old code path file `path/to/old/path.ts` deleted or import removed
|
|
440
|
+
- [ ] Observable behavior: data flows from producer to rendered output (with `{specific assertion}`)
|
|
342
441
|
|
|
343
442
|
### Phase 2: {Phase Name}
|
|
344
443
|
...
|
|
@@ -49,9 +49,11 @@ Treat the current command arguments as this workflow's input. When invoked from
|
|
|
49
49
|
- `OUT_DIR=docs/tasks/{branch_name}` (or user-specified)
|
|
50
50
|
- `mkdir -p "${OUT_DIR}"`
|
|
51
51
|
|
|
52
|
-
- **Action** — ScanExistingContext: Read all existing artifacts in `{OUT_DIR}/ (if you haven
|
|
52
|
+
- **Action** — ScanExistingContext: Read all existing artifacts in `{OUT_DIR}/` (if you haven't already) and assess coverage across 4 dimensions.
|
|
53
53
|
|
|
54
|
-
Scan for: `task_context.md`, `specs/plan.md`, `concepts/scope.md`, `research/*.md`
|
|
54
|
+
Scan for: `task_context.md`, `specs/plan.md`, `concepts/scope.md`, `specs/ux.md`, `research/*.md`
|
|
55
|
+
|
|
56
|
+
While scanning `concepts/scope.md` and `specs/ux.md`, extract any **filled assumptions** — places where the upstream artifact defaulted a value because the user didn't specify (e.g., DB choice, retry policy, copy variants, segment fallbacks). Carry these forward to Step 3's design surface so they're reviewer-visible before plan generation.
|
|
55
57
|
|
|
56
58
|
| Dimension | Covered if artifact contains... | Covered by |
|
|
57
59
|
| --- | --- | --- |
|
|
@@ -95,13 +97,26 @@ Use research findings from Step 1 to determine appropriate planning depth.
|
|
|
95
97
|
| Integration points | Research findings | Internal only = Low, 1-2 external = Med, 3+ external = High |
|
|
96
98
|
| External complexity | @web-research | Well-documented with libraries = Low, Some prior art = Med, Novel/emerging = High |
|
|
97
99
|
|
|
98
|
-
- **Action** — CheckHardStops: Any true = automatic COMPREHENSIVE
|
|
100
|
+
- **Action** — CheckHardStops: Any true = automatic COMPREHENSIVE.
|
|
101
|
+
- `db_schema_destructive` — drops, renames, or non-additive column changes
|
|
102
|
+
- `data_migration_required` — backfill, transform, or row-by-row data change
|
|
103
|
+
- `new_service_or_component` — net-new service, daemon, or top-level component
|
|
104
|
+
- `auth_or_pii_change` — authn/authz flow, session handling, PII storage/exposure
|
|
105
|
+
- `secrets_or_credentials_handling` — new secret introduced, rotation, or boundary change
|
|
106
|
+
- `payment_billing_logic` — money flow, invoicing, charge logic
|
|
107
|
+
- `public_api_change` — externally-consumed API surface modified
|
|
108
|
+
- `concurrent_writes_or_locking` — concurrency, locking, or distributed coordination
|
|
109
|
+
- `caching_consistency` — cache invalidation, staleness windows, multi-tier caching
|
|
110
|
+
- `cross_service_or_cross_workspace_change` — coordinated change across services or workspaces
|
|
111
|
+
- `slo_sla_risk` — latency, throughput, or availability budget at stake
|
|
112
|
+
|
|
113
|
+
- **Action** — DetermineTier (decisive rules, not point-scoring):
|
|
99
114
|
|
|
100
|
-
- **
|
|
115
|
+
- **COMPREHENSIVE** — if ANY hard-stop is triggered OR any signal scores High OR two or more signals score Medium
|
|
116
|
+
- **STANDARD** — if no hard-stops AND no High signals AND at most one Medium signal
|
|
117
|
+
- **LIGHT** — only if every signal scores Low AND no hard-stops AND the change is plausibly a single-file diff
|
|
101
118
|
|
|
102
|
-
|
|
103
|
-
- **STANDARD**: Mix of Low/Med signals, multi-file but contained scope, no hard-stops
|
|
104
|
-
- **COMPREHENSIVE**: Any High signal, multiple Med signals, or any hard-stop triggered
|
|
119
|
+
When in doubt between two tiers, choose the higher. The cost of over-planning a small change is hours; the cost of under-planning a large one is weeks.
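The tier rules above can be read as a small decision function. The following is a minimal sketch, not part of the skill itself: the function name `determine_tier`, the `single_file` flag, and the `"Low"`/`"Med"`/`"High"` level strings are assumptions chosen for illustration. Because the STANDARD and LIGHT conditions overlap when every signal is Low, the sketch resolves the overlap by returning LIGHT only when the change is also a plausible single-file diff.

```python
from typing import Dict, Set

# Hard-stop names copied from the CheckHardStops list.
HARD_STOPS = {
    "db_schema_destructive", "data_migration_required", "new_service_or_component",
    "auth_or_pii_change", "secrets_or_credentials_handling", "payment_billing_logic",
    "public_api_change", "concurrent_writes_or_locking", "caching_consistency",
    "cross_service_or_cross_workspace_change", "slo_sla_risk",
}
LEVELS = {"Low", "Med", "High"}


def determine_tier(signals: Dict[str, str], hard_stops: Set[str],
                   single_file: bool = False) -> str:
    """Map signal levels and triggered hard-stops to a planning tier.

    `signals` maps a signal name to "Low", "Med", or "High" (hypothetical encoding).
    """
    unknown = hard_stops - HARD_STOPS
    if unknown:
        raise ValueError(f"unknown hard-stops: {sorted(unknown)}")
    bad = {v for v in signals.values() if v not in LEVELS}
    if bad:
        raise ValueError(f"unknown signal levels: {sorted(bad)}")

    levels = list(signals.values())
    highs = levels.count("High")
    meds = levels.count("Med")

    # Any hard-stop, any High, or two or more Medium signals.
    if hard_stops or highs > 0 or meds >= 2:
        return "COMPREHENSIVE"
    # Every signal Low and plausibly a single-file diff.
    if meds == 0 and single_file:
        return "LIGHT"
    # Everything else: no hard-stops, no High, at most one Medium.
    return "STANDARD"
```

An all-Low change that spans several files falls through to STANDARD here, which matches the "choose the higher tier when in doubt" guidance.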
|
|
105
120
|
|
|
106
121
|
- **Action** — LogTier: Note the assessed tier in your response for transparency, then proceed immediately to the next step. Do NOT ask for confirmation.
|
|
107
122
|
|
|
@@ -129,6 +144,14 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
|
|
|
129
144
|
> - [decision] — [rationale; alternative considered]
|
|
130
145
|
> - [decision] — [rationale; alternative considered]
|
|
131
146
|
>
|
|
147
|
+
> **How we'll know it works** (verification spine):
|
|
148
|
+
> - [change] → [test name | observable behavior | state condition]
|
|
149
|
+
> - [change] → [test name | observable behavior | state condition]
|
|
150
|
+
>
|
|
151
|
+
> **Filled assumptions** (surfaced from scope.md / ux.md / inferred):
|
|
152
|
+
> - [assumption] — *source: [scope.md / ux.md / default]*
|
|
153
|
+
> - [assumption] — *source: [scope.md / ux.md / default]*
|
|
154
|
+
>
|
|
132
155
|
> **Open questions** (with default assumption):
|
|
133
156
|
> 1. [question] — *default: [assumption]*
|
|
134
157
|
> 2. [question] — *default: [assumption]*
|
|
@@ -138,6 +161,8 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
|
|
|
138
161
|
**CRITICAL**:
|
|
139
162
|
- **Single proposed approach**, not a menu. If a true fork exists, surface it as an open question with your recommendation — not as parallel options.
|
|
140
163
|
- Stay at the *shape* level: components, key decisions, structural changes. Defer file-by-file detail to `create_plan`.
|
|
164
|
+
- **Verification is mandatory.** Every major change in the approach must declare how it will be checked — falsifiable signal, not prose. This becomes the spine that `create_plan` and `create_tasks` build on.
|
|
165
|
+
- **Filled assumptions are mandatory.** If scope.md or ux.md left something silent and you defaulted it, surface the default here. Reviewer-visible by design — these are the silent decisions that bite at execution.
|
|
141
166
|
- Open questions should be specific and answerable; pair each with a default assumption so the user can skip if the default is fine.
|
|
142
167
|
|
|
143
168
|
- **Action** — IterateDesign: If the user replies with answers, edits, or pushback, update the design and re-present. Loop until user says 'looks good' (or equivalent).
|
|
@@ -208,4 +233,24 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
|
|
|
208
233
|
|
|
209
234
|
---
|
|
210
235
|
|
|
236
|
+
### Post-Tasks Tier Re-check
|
|
237
|
+
|
|
238
|
+
After tasks return, do a fast self-check against tier signals:
|
|
239
|
+
|
|
240
|
+
- Count parent tasks, sub-tasks, files touched (sum of unique paths in Context blocks), and Phase 0 dep count.
|
|
241
|
+
- **Escalation triggers** (any true → recommend re-running at a higher tier):
|
|
242
|
+
- Tier was LIGHT but tasks touch >3 files OR have >2 parent tasks
|
|
243
|
+
- Tier was STANDARD but tasks reveal a hard-stop signal not caught earlier (e.g., a migration sub-task appeared)
|
|
244
|
+
- Tasks contain any Out-of-Bounds violation
|
|
245
|
+
- **Downgrade triggers** (rare; only suggest if confident):
|
|
246
|
+
- Tier was COMPREHENSIVE but tasks collapsed to a single parent with no migrations, no new components, and no API change
|
|
247
|
+
|
|
248
|
+
If an escalation/downgrade is triggered, surface it as a recommendation — do NOT silently re-run. Format:
|
|
249
|
+
|
|
250
|
+
> Tier reassessment: I planned this as {original tier}, but tasks revealed {signal}. Recommend re-running as {new tier}. Reply 'rerun' to regenerate or 'keep' to proceed as-is.
|
|
251
|
+
|
|
252
|
+
Only proceed past this checkpoint when the user confirms.
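As a rough illustration of the re-check above, the triggers can be expressed as a pure function over counts taken from the generated tasks. This is a sketch under assumptions: `recheck_tier`, `count_unique_paths`, and all parameter names are hypothetical, and the function only returns a recommendation label, leaving the choice of target tier and the user confirmation to the workflow.

```python
import re
from typing import Iterable, Optional


def count_unique_paths(context_refs: Iterable[str]) -> int:
    """Count unique file paths in Context refs, ignoring a trailing ':<line>'."""
    return len({re.sub(r":\d+$", "", ref.strip()) for ref in context_refs})


def recheck_tier(tier: str, *, parent_tasks: int, files_touched: int,
                 hard_stop_found: bool = False, out_of_bounds: bool = False,
                 has_migration: bool = False, has_new_component: bool = False,
                 has_api_change: bool = False) -> Optional[str]:
    """Return "escalate", "downgrade", or None (tier still fits)."""
    escalate = (
        (tier == "LIGHT" and (files_touched > 3 or parent_tasks > 2))
        or (tier == "STANDARD" and hard_stop_found)
        or out_of_bounds
    )
    if escalate:
        return "escalate"

    # Downgrades are rare: only when a COMPREHENSIVE plan collapsed to one parent
    # with no migrations, no new components, and no API change.
    if (tier == "COMPREHENSIVE" and parent_tasks == 1
            and not (has_migration or has_new_component or has_api_change)):
        return "downgrade"
    return None
```

A non-`None` result should only produce the reassessment message; the sketch never triggers a re-run on its own.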
|
|
253
|
+
|
|
254
|
+
---
|
|
255
|
+
|
|
211
256
|
- **Action** — RenderFooter: Use `Skill(spectre-guide)` skill for Next Steps
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: "plan_review"
|
|
3
|
-
description: "👻 |
|
|
3
|
+
description: "👻 | Independent multi-lens review of plan.md + tasks.md — finds overengineering, missing verification, hallucinated deps, weak references"
|
|
4
4
|
user-invocable: true
|
|
5
5
|
---
|
|
6
6
|
|
|
@@ -10,33 +10,174 @@ user-invocable: true
|
|
|
10
10
|
|
|
11
11
|
Treat the current command arguments as this workflow's input. When invoked from a slash command, use the forwarded `$ARGUMENTS` value.
|
|
12
12
|
|
|
13
|
+
# plan_review: Multi-Lens Review of Plan & Tasks
|
|
13
14
|
|
|
14
|
-
|
|
15
|
+
## Description
|
|
15
16
|
|
|
16
|
-
|
|
17
|
+
- **What** — Independent review of `plan.md` + `tasks.md` from four specialized lenses, dispatched in parallel
|
|
18
|
+
- **Outcome** — Structured findings with concrete edit suggestions; optional write-back to update both artifacts
|
|
19
|
+
- **Role** — Senior staff engineer + reviewer panel; bias toward pragmatic problem-solving, YAGNI enforcement, and verifiability
|
|
17
20
|
|
|
18
|
-
|
|
19
|
-
1. **What to simplify** - Specific component, process, or decision
|
|
20
|
-
2. **Why** - What complexity it removes (cognitive load, dependencies, maintenance burden, etc.)
|
|
21
|
-
3. **Impact** - Confirm that all original requirements remain satisfied
|
|
22
|
-
4. **Risk** - Any trade-offs or risks introduced by the simplification
|
|
21
|
+
## ARGUMENTS Input
|
|
23
22
|
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
- Questioning assumptions that add complexity
|
|
28
|
-
- Identifying over-engineering
|
|
29
|
-
- Suggesting proven, boring solutions over novel approaches
|
|
23
|
+
<ARGUMENTS>
|
|
24
|
+
$ARGUMENTS
|
|
25
|
+
</ARGUMENTS>
|
|
30
26
|
|
|
31
|
-
##
|
|
32
|
-
**Context**: We use fast TDD with 1 happy path test + 1 unhappy path test per feature. A separate task handles achieving 100% test coverage post-feature work.
|
|
27
|
+
## Why Four Lenses
|
|
33
28
|
|
|
34
|
-
|
|
35
|
-
- **Over-testing**: Tests beyond 1 happy + 1 unhappy path that should be deferred to the coverage task
|
|
36
|
-
- **Wrong tests**: Testing implementation details instead of behavior, brittle tests that will break on refactors, or tests that don't actually validate requirements
|
|
37
|
-
- **Missing critical paths**: Cases where the 1+1 approach genuinely misses a requirement-breaking scenario (rare, but call it out)
|
|
38
|
-
- **Test complexity**: Overly elaborate test setup, mocking, or assertions that could be simpler
|
|
29
|
+
A single reviewer biases toward the issues it notices first. Published practice (Cognition, Anthropic, Osmani) converges on four high-yield review angles for AI-agent-authored plans. We dispatch each as a parallel subagent so coverage is structurally guaranteed, not dependent on a single reviewer remembering everything.
|
|
39
30
|
|
|
40
|
-
|
|
31
|
+
| Lens | Subagent | Finds |
|
|
32
|
+
|------|----------|-------|
|
|
33
|
+
| **YAGNI / familiar-shape bias** | `@reviewer` | Mature-system patterns that crept in unprompted (auth → rate-limit, CRUD → soft-delete, etc.). Forces ONE "delete this" recommendation. |
|
|
34
|
+
| **Verifiability** | `@analyst` | Acceptance criteria that aren't executable; verification gaps between plan and tasks. |
|
|
35
|
+
| **Existence / hallucination** | `@finder` | File paths, packages, APIs, or symbols referenced that don't actually exist. The slopsquatting fence. |
|
|
36
|
+
| **Canonical reference quality** | `@patterns` | "Follow existing pattern" claims without a real file:line anchor; missed reuse opportunities. |
|
|
41
37
|
|
|
42
|
-
|
|
38
|
+
## Step 1 — Locate Artifacts
|
|
39
|
+
|
|
40
|
+
- **Action** — DetermineTaskDir:
|
|
41
|
+
- `branch_name=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo unknown)`
|
|
42
|
+
- **If** user specifies path in ARGUMENTS → `TASK_DIR={that value}`
|
|
43
|
+
- **Else** → `TASK_DIR=docs/tasks/{branch_name}`
|
|
44
|
+
|
|
45
|
+
- **Action** — ResolveArtifacts: Locate the three required inputs.
|
|
46
|
+
- `PLAN=${TASK_DIR}/specs/plan.md` (or scoped name)
|
|
47
|
+
- `TASKS=${TASK_DIR}/specs/tasks.md` (or scoped name)
|
|
48
|
+
- `CONTEXT=${TASK_DIR}/task_context.md`
|
|
49
|
+
- If any are missing, list what's missing and stop — do NOT review against a partial set. Suggest the user run `plan` or `create_tasks` first.
|
|
50
|
+
|
|
51
|
+
- **Action** — ReadAll: Read each file completely into context before dispatching reviewers. Reviewers receive curated excerpts, not raw paths.
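The artifact resolution in Step 1 amounts to a presence check that refuses to continue on a partial set. A minimal sketch follows, assuming the default file names only (it does not handle the "scoped name" variants); `resolve_artifacts` and the demo directory layout are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

# Default relative locations from ResolveArtifacts (scoped names not handled here).
REQUIRED = {
    "PLAN": "specs/plan.md",
    "TASKS": "specs/tasks.md",
    "CONTEXT": "task_context.md",
}


def resolve_artifacts(task_dir: str) -> Tuple[Dict[str, Path], List[str]]:
    """Return (found, missing); callers should stop when `missing` is non-empty."""
    root = Path(task_dir)
    found: Dict[str, Path] = {}
    missing: List[str] = []
    for name, rel in REQUIRED.items():
        path = root / rel
        if path.is_file():
            found[name] = path
        else:
            missing.append(rel)
    return found, missing


# Demo: a task directory that only contains specs/plan.md.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "specs").mkdir()
    (Path(tmp) / "specs" / "plan.md").write_text("# plan\n", encoding="utf-8")
    demo_found, demo_missing = resolve_artifacts(tmp)
    demo_found_names = sorted(demo_found)

if demo_missing:
    print("Missing artifacts:", ", ".join(demo_missing))
```

Returning the missing list rather than raising keeps the "list what's missing and stop" behavior in the caller's hands.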
|
|
52
|
+
|
|
53
|
+
## Step 2 — Dispatch Four Parallel Reviewers
|
|
54
|
+
|
|
55
|
+
Spawn all four subagents in a single message (parallel). Each receives the same artifact excerpts but a different review brief.
|
|
56
|
+
|
|
57
|
+
### Lens 1 — YAGNI / Familiar-Shape Bias (`@reviewer`)
|
|
58
|
+
|
|
59
|
+
> Review this plan and task list for unrequested complexity. Agents have a documented "familiar-shape bias": shown a feature, they reproduce the mature-system shape from their training data (auth → adds rate-limiting; CRUD → adds soft-delete; form → adds optimistic UI; service → adds telemetry; module → adds feature flags). Your job is to find that bias here.
|
|
60
|
+
>
|
|
61
|
+
> Find:
|
|
62
|
+
> 1. Anything in `plan.md` Technical Approach that isn't traceable to a requirement in `task_context.md` / scope / PRD.
|
|
63
|
+
> 2. Tasks in `tasks.md` that implement something the requirements don't ask for.
|
|
64
|
+
> 3. Abstractions, interfaces, or layers introduced for a single concrete caller.
|
|
65
|
+
> 4. Generality (config files, plugin points, factories) where the actual need is one specific behavior.
|
|
66
|
+
> 5. Overlap with the `Out-of-Bounds — DO NOT add` list (if anything violates that list, it's a hard fail).
|
|
67
|
+
>
|
|
68
|
+
> Required output: nominate the SINGLE highest-leverage thing to delete and justify it. You must pick one. Then list other simplifications ranked by impact. For each finding, cite the exact file:line or section header it lives in.
|
|
69
|
+
|
|
70
|
+
### Lens 2 — Verifiability (`@analyst`)
|
|
71
|
+
|
|
72
|
+
> Review this plan and task list for verification quality. The single strongest correlate of successful AI-agent execution is the ability to self-verify. Find every place where verification is missing, prose-only, or disconnected.
|
|
73
|
+
>
|
|
74
|
+
> Find:
|
|
75
|
+
> 1. Items in `plan.md` "Verification — How We Know This Works" that are prose ("works correctly", "is consistent") rather than executable (test name / observable behavior / state condition).
|
|
76
|
+
> 2. Phases in `plan.md` that don't declare a verification signal.
|
|
77
|
+
> 3. Sub-tasks in `tasks.md` whose acceptance criteria aren't one of the three executable types (test passes / observable behavior / state condition).
|
|
78
|
+
> 4. Verification signals in `plan.md` with no matching acceptance criterion in `tasks.md`.
|
|
79
|
+
> 5. Behavior-changing sub-tasks in `tasks.md` that lack a preceding RED test sub-task.
|
|
80
|
+
>
|
|
81
|
+
> Required output: list every non-executable criterion with a proposed rewrite in one of the three types. Cite file:line for each.
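A first mechanical pass for the check in item 3 could flag checkbox criteria that do not start with one of the three executable prefixes. The sketch below is an assumption-laden illustration: `find_prose_criteria` is a hypothetical helper, it treats checkbox lines that begin with a bold task number (for example `**1.2.1**`) as sub-task headings rather than criteria, and it only inspects prefixes, so a reviewer still has to judge whether the text after the prefix is truly falsifiable.

```python
import re
from typing import List, Optional, Tuple

EXECUTABLE_PREFIXES = ("Test passes", "Observable behavior", "State condition")
_CHECKBOX = re.compile(r"^\s*-\s*\[[ xX]\]\s*(?P<body>.+)$")


def criterion_type(line: str) -> Optional[str]:
    """Return the executable type, "prose", or None if the line is not a criterion."""
    match = _CHECKBOX.match(line)
    if not match:
        return None
    body = match.group("body").strip()
    if body.startswith("**"):
        return None  # assumed to be a sub-task heading such as "**1.2.1** ..."
    for prefix in EXECUTABLE_PREFIXES:
        if body.lower().startswith(prefix.lower() + ":"):
            return prefix
    return "prose"


def find_prose_criteria(tasks_md: str) -> List[Tuple[int, str]]:
    """List (line number, text) for every criterion that is not an executable type."""
    hits = []
    for lineno, line in enumerate(tasks_md.splitlines(), start=1):
        if criterion_type(line) == "prose":
            hits.append((lineno, _CHECKBOX.match(line).group("body").strip()))
    return hits
```

The line numbers it returns map directly onto the file:line citations the lens is asked to produce.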
|
|
82
|
+
|
|
83
|
+
### Lens 3 — Existence / Hallucination (`@finder`)
|
|
84
|
+
|
|
85
|
+
> Review this plan and task list for references to things that may not exist. AI-generated plans hallucinate file paths, package names, function signatures, and API endpoints at measurable rates (~20% for packages per Snyk analysis). Your job is to verify every reference is real.
|
|
86
|
+
>
|
|
87
|
+
> Verify:
|
|
88
|
+
> 1. Every file path mentioned in `plan.md` "Critical Files for Implementation" and in `tasks.md` Context blocks — does the file exist in the repo today? Use Glob/Read to confirm.
|
|
89
|
+
> 2. Every package in `plan.md` "External Dependencies" — does it exist at the named version? (Note: actual install/registry check is the executor's Phase 0 job; your job is to flag suspicious names — typos, near-misses to well-known packages, lookalikes.)
|
|
90
|
+
> 3. Every function, class, or symbol named in plan/tasks — grep the repo, confirm it exists where claimed.
|
|
91
|
+
> 4. Every API endpoint, env var, or CLI flag referenced — confirm it's defined in the codebase.
|
|
92
|
+
>
|
|
93
|
+
> Required output: list every reference that fails verification, with `expected: <plan claim>` and `actual: <repo state>`. If everything checks out, say so explicitly — don't pad.
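The file-path part of this lens (item 1) is easy to approximate in code. The sketch below assumes references are written as backticked `path:line` pairs, as in the Context blocks; `check_file_refs` and the demo files are hypothetical, and the sketch does not cover packages, symbols, endpoints, or env vars.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List

# Matches backticked references such as `path/to/file.ts:120`.
_REF = re.compile(r"`([^`\s]+?\.[A-Za-z0-9]+):(\d+)`")


def check_file_refs(markdown: str, repo_root: str) -> List[Dict[str, str]]:
    """Return one {expected, actual} record per reference that fails verification."""
    failures = []
    for path, line in _REF.findall(markdown):
        claim = f"{path}:{line}"
        target = Path(repo_root) / path
        if not target.is_file():
            failures.append({"expected": claim, "actual": "file not found"})
            continue
        n_lines = len(target.read_text(encoding="utf-8", errors="replace").splitlines())
        if int(line) < 1 or int(line) > n_lines:
            failures.append({"expected": claim, "actual": f"file has {n_lines} lines"})
    return failures


# Demo: one valid reference, one out-of-range line, one missing file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "src").mkdir()
    (Path(tmp) / "src" / "a.ts").write_text("a\nb\nc\n", encoding="utf-8")
    demo_md = "- `src/a.ts:2`\n- `src/a.ts:10`\n- `src/missing.ts:1`\n"
    demo_failures = check_file_refs(demo_md, tmp)
```

An empty result corresponds to the "everything checks out" case, which the lens asks reviewers to state explicitly.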
|
|
94
|
+
|
|
95
|
+
### Lens 4 — Canonical Reference Quality (`@patterns`)
|
|
96
|
+
|
|
97
|
+
> Review this plan and task list for the quality of "follow existing pattern" references. Anthropic's own guidance is to anchor plans with concrete examples (e.g., "HotDogWidget.php is a good example"). Vague "follow existing patterns" without a file:line anchor is a documented failure mode.
|
|
98
|
+
>
|
|
99
|
+
> Find:
|
|
100
|
+
> 1. Places in `plan.md` Technical Approach that reference "existing patterns" or "similar features" without a specific file:line.
|
|
101
|
+
> 2. Sub-tasks in `tasks.md` whose Context block lacks a canonical reference pointer.
|
|
102
|
+
> 3. Better canonical references that the plan missed — actual files in the codebase that more closely match the intended shape.
|
|
103
|
+
> 4. Reuse opportunities the plan ignored: utilities, hooks, helpers, or types already in the repo that the plan re-implements.
|
|
104
|
+
>
|
|
105
|
+
> Required output: for each weak/missing reference, propose a specific file:line that should be the anchor. For each missed reuse, cite the existing utility and which task should use it.
|
|
106
|
+
|
|
107
|
+
## Step 3 — Synthesize Findings
|
|
108
|
+
|
|
109
|
+
- **Action** — CollectFindings: Wait for all four reviewers to return. Read every finding.
|
|
110
|
+
|
|
111
|
+
- **Action** — DeduplicateAndPrioritize: Merge findings that overlap (e.g., a missing canonical reference may surface from both Lens 4 and Lens 2). Assign severity:
|
|
112
|
+
- **Blocker** — would cause execution to fail or produce wrong output (hallucinated file path, criterion the executor can't check, Out-of-Bounds violation)
|
|
113
|
+
- **High** — meaningfully reduces output quality (missing RED test, weak canonical reference, prose criterion)
|
|
114
|
+
- **Medium** — overengineering or reuse miss without functional blast radius
|
|
115
|
+
- **Low** — stylistic or nice-to-have
|
|
116
|
+
|
|
117
|
+
- **Action** — RenderFindingsTable: Output a single structured table. Schema is fixed.
|
|
118
|
+
|
|
119
|
+
```markdown
|
|
120
|
+
## Review Findings — {feature name}
|
|
121
|
+
|
|
122
|
+
### Must-Delete (Lens 1 — YAGNI)
|
|
123
|
+
> {The single nominated highest-leverage cut, with rationale.}
|
|
124
|
+
|
|
125
|
+
### Findings
|
|
126
|
+
|
|
127
|
+
| # | Severity | Lens | Location | Finding | Suggested Edit |
|
|
128
|
+
|---|----------|------|----------|---------|----------------|
|
|
129
|
+
| 1 | Blocker | Existence | plan.md `## External Dependencies` | `react-use-undocumented@2.4.0` doesn't exist on npm | Remove; the plan can use `useReducer` from React stdlib (see `src/hooks/useFormState.ts:18`) |
|
|
130
|
+
| 2 | High | Verifiability | tasks.md `1.2.1` | "Component renders correctly" is prose | Replace with: Test passes `<ProductCard /> renders product.title and product.price` |
|
|
131
|
+
| 3 | High | YAGNI | plan.md `## Technical Approach` | Adds retry-with-backoff for a sync internal call | Delete; not in requirements; Out-of-Bounds list already forbids retry logic |
|
|
132
|
+
| … | | | | | |
|
|
133
|
+
|
|
134
|
+
### Summary
|
|
135
|
+
- Blockers: {N} — must resolve before /execute
|
|
136
|
+
- High: {N}
|
|
137
|
+
- Medium: {N}
|
|
138
|
+
- Low: {N}
|
|
139
|
+
```
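The dedupe, prioritize, and render actions can be sketched as one small function. This is an illustration under assumptions: findings are plain dictionaries with hypothetical keys (`severity`, `lens`, `location`, `finding`, `edit`), and two findings count as duplicates only when both location and finding text match exactly, which is a simpler rule than a human synthesis pass would apply.

```python
from typing import Dict, List, Tuple

SEVERITY_ORDER = ["Blocker", "High", "Medium", "Low"]
HEADER = [
    "| # | Severity | Lens | Location | Finding | Suggested Edit |",
    "|---|----------|------|----------|---------|----------------|",
]


def _cell(text: str) -> str:
    """Escape pipes so cell text cannot break the table layout."""
    return text.replace("|", "\\|")


def render_findings(findings: List[Dict[str, str]]) -> Tuple[str, Dict[str, int]]:
    """Dedupe, sort by severity, and render the fixed-schema findings table."""
    rank = lambda f: SEVERITY_ORDER.index(f["severity"])

    # Keep the most severe copy of findings that share location and text.
    best: Dict[Tuple[str, str], Dict[str, str]] = {}
    for f in findings:
        key = (f["location"], f["finding"])
        if key not in best or rank(f) < rank(best[key]):
            best[key] = f

    ranked = sorted(best.values(), key=rank)  # stable: ties keep input order
    rows = list(HEADER)
    for i, f in enumerate(ranked, start=1):
        cells = [str(i), f["severity"], f["lens"], f["location"], f["finding"], f["edit"]]
        rows.append("| " + " | ".join(_cell(c) for c in cells) + " |")

    counts = {s: sum(1 for f in ranked if f["severity"] == s) for s in SEVERITY_ORDER}
    return "\n".join(rows), counts
```

The returned counts feed the Summary section directly, so the table and the summary cannot drift apart.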
|
|
140
|
+
|
|
141
|
+
## Step 4 — Surface Findings & Apply Edits
|
|
142
|
+
|
|
143
|
+
- **Action** — PresentFindings: Render the findings table inline.
|
|
144
|
+
|
|
145
|
+
- **Action** — OfferWriteBack: After the table, prompt:
|
|
146
|
+
|
|
147
|
+
> Reply with which findings to apply:
|
|
148
|
+
> - `all` — apply every suggested edit
|
|
149
|
+
> - `blockers` — apply Blocker + High severity only
|
|
150
|
+
> - `1,3,5` — apply specific finding numbers
|
|
151
|
+
> - `skip` — leave artifacts unchanged
|
|
152
|
+
>
|
|
153
|
+
> For findings I apply, I'll edit plan.md and/or tasks.md inline and re-run a fast self-check.
|
|
154
|
+
|
|
155
|
+
- **Wait** — User selects.
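The four reply forms in the write-back prompt map to a simple parser. The sketch below is hypothetical (`parse_selection` is not part of the skill) and assumes findings are tracked as a mapping from finding number to severity; note that `blockers` selects both Blocker and High findings, as the prompt states.

```python
from typing import Dict, List


def parse_selection(reply: str, findings: Dict[int, str]) -> List[int]:
    """Translate the user's reply into the sorted finding numbers to apply."""
    choice = reply.strip().lower()
    if choice == "skip":
        return []
    if choice == "all":
        return sorted(findings)
    if choice == "blockers":
        return sorted(n for n, sev in findings.items() if sev in ("Blocker", "High"))

    # Otherwise expect a comma-separated list such as "1,3,5".
    try:
        picked = {int(part) for part in choice.split(",") if part.strip()}
    except ValueError:
        raise ValueError(f"unrecognized selection: {reply!r}") from None
    unknown = picked - set(findings)
    if not picked or unknown:
        raise ValueError(f"unrecognized selection: {reply!r}")
    return sorted(picked)
```

Rejecting unknown numbers, rather than ignoring them, keeps the later "Applied / Skipped" report honest about what the user asked for.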
|
|
156
|
+
|
|
157
|
+
- **Action** — ApplyEdits: For each selected finding:
|
|
158
|
+
- Open the named artifact (plan.md or tasks.md)
|
|
159
|
+
- Apply the Suggested Edit verbatim where possible; if the edit needs adaptation, make the minimum change consistent with the finding's intent
|
|
160
|
+
- Track which findings were applied
|
|
161
|
+
|
|
162
|
+
- **Action** — SelfCheck: After edits, run a fast pass over the modified sections:
|
|
163
|
+
- Re-verify any file:line refs touched
|
|
164
|
+
- Re-verify acceptance criteria are still executable
|
|
165
|
+
- Confirm no edit introduced a new Out-of-Bounds violation
|
|
166
|
+
- If any check fails, surface it and ask the user before continuing
|
|
167
|
+
|
|
168
|
+
- **Action** — ReportApplied:
|
|
169
|
+
|
|
170
|
+
> Applied: {list of finding numbers}. Skipped: {list}.
|
|
171
|
+
> {Path to updated plan.md and tasks.md}.
|
|
172
|
+
|
|
173
|
+
## Step 5 — Next Steps
|
|
174
|
+
|
|
175
|
+
- **Action** — RenderFooter: Use `Skill(spectre-guide)` skill for Next Steps footer.
|
|
176
|
+
|
|
177
|
+
---
|
|
178
|
+
|
|
179
|
+
## Notes
|
|
180
|
+
|
|
181
|
+
- This skill does NOT generate plans or tasks. It reviews them. If `plan.md` or `tasks.md` doesn't exist, route the user to `plan` first.
|
|
182
|
+
- The four lenses are designed to be non-overlapping but will surface overlap in practice — dedupe at synthesis; don't ask reviewers to coordinate.
|
|
183
|
+
- The "Must-Delete" nomination from Lens 1 is mandatory output — even on a tight plan, naming the single weakest element is a forcing function against under-review.
|