@codename_inc/spectre 5.0.0 → 5.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@codename_inc/spectre",
3
- "version": "5.0.0",
3
+ "version": "5.2.0",
4
4
  "type": "module",
5
5
  "bin": {
6
6
  "spectre": "./bin/spectre.js"
@@ -1,5 +1,5 @@
1
1
  {
2
2
  "name": "spectre",
3
- "version": "5.0.0",
3
+ "version": "5.2.0",
4
4
  "description": "Agentic coding workflow with session memory. spectre guides you through Scope, Plan, Execute, Clean, Test, Rebase, and Extract phases."
5
5
  }
@@ -171,6 +171,8 @@ Optional user input to seed this workflow.
171
171
  - **MEDIUM**: Quality improvements, test coverage, configuration, performance (non-critical)
172
172
  - **LOW**: Documentation, polish, cleanup
173
173
 
174
+ **Evidence rule:** Every CRITICAL or HIGH finding MUST include (1) `file:line` and (2) a reproducible failure scenario or exploit path describing observable behavior. Findings without an evidence chain are auto-downgraded one severity level. "Could potentially" is not evidence.
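A minimal TypeScript sketch of the auto-downgrade rule. The `Finding` shape, the `file:line` regex and the "could potentially" check are illustrative assumptions, not part of spectre. Only the severity ladder and the one-level downgrade come from the text.

```typescript
// Severity ladder, highest first (from the severity list in this section).
const SEVERITIES = ["CRITICAL", "HIGH", "MEDIUM", "LOW"] as const;
type Severity = (typeof SEVERITIES)[number];

// Hypothetical finding shape; spectre does not define this type.
interface Finding {
  severity: Severity;
  location?: string; // expected as "file:line"
  scenario?: string; // reproducible failure scenario or exploit path
}

const FILE_LINE = /^[^\s:]+:\d+$/;
// Hedged wording is not evidence of observable behavior.
const HEDGE = /could potentially/i;

function hasEvidenceChain(f: Finding): boolean {
  const locationOk = f.location !== undefined && FILE_LINE.test(f.location);
  const scenario = f.scenario?.trim() ?? "";
  return locationOk && scenario.length > 0 && !HEDGE.test(scenario);
}

// Only CRITICAL and HIGH require evidence; other severities pass through unchanged.
function applyEvidenceRule(f: Finding): Finding {
  if ((f.severity === "CRITICAL" || f.severity === "HIGH") && !hasEvidenceChain(f)) {
    const downgraded = SEVERITIES[SEVERITIES.indexOf(f.severity) + 1];
    return { ...f, severity: downgraded };
  }
  return f;
}
```

The rule downgrades by exactly one level, so an unsupported CRITICAL becomes HIGH, not MEDIUM.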
175
+
174
176
  **Perform comprehensive analysis covering all aspects:**
175
177
 
176
178
  ### 🔧 Foundation & Correctness
@@ -29,8 +29,9 @@ Treat the current command arguments as this workflow's input. When invoked from
29
29
  - **If** found with comprehensive analysis → use existing research; skip to Step 3.
30
30
  - **Else** → proceed with new research below.
31
31
  - **Action** — AutomatedResearch: Spawn parallel research agents for comprehensive analysis.
32
- - Use `codebase-locator` to find all files related to feature area.
33
- - Dispatch multiple parallel `codebase-analyzer` subagents to understand current implementation patterns. Pay particular attention to how and where data is accessed that will be needed for this feature.
32
+ - Use `@finder` to find all files related to feature area.
33
+ - Dispatch multiple parallel `@analyst` subagents to understand current implementation patterns. Pay particular attention to how and where data is accessed that will be needed for this feature.
34
+ - Use `@patterns` to surface canonical reference implementations already in the codebase — these become "follow this file" anchors in the plan.
34
35
  - Wait for ALL agents to complete before proceeding.
35
36
  - Read ALL identified files into context.
36
37
  - **Action** — TraceCodePaths: Trace through relevant execution paths.
@@ -97,17 +98,28 @@ Dynamically generate up to 10 technical questions based on research findings. **
97
98
 
98
99
  - **Action** — DesignTechnicalApproach: Create the implementation plan.
99
100
 
100
- **STANDARD** depth Focused plan for contained changes. Include the sections that matter for THIS feature. Typical sections: Overview, Desired End State, Out of Scope, Technical Approach.
101
-
102
- **COMPREHENSIVE** depth — Full technical design for complex/risky changes. Consider all of the following, but only include sections relevant to the feature: Overview, Current State (with file:line refs), Desired End State, Out of Scope, Technical Approach, System Architecture, Implementation Phases, Component/Data Architecture, API Design, Testing Strategy.
103
-
104
- Use your judgment — the goal is a plan that gives a developer everything they need to implement, not a template with empty sections.
105
-
106
- - **Action** — AppendCriticalFiles: End the plan with a "Critical Files for Implementation" section.
107
-
108
- - List 3-7 files most critical for implementing this plan.
109
- - Format: `path/to/file.ts` brief reason (e.g., "Core logic to modify", "Pattern to follow", "Interface to implement").
110
- - These should be specific files discovered during research, not guesses.
101
+ Every plan, regardless of depth, MUST include these seven sections. They are the verification spine: without them, downstream agents cannot self-check their work.
102
+
103
+ **Required for both STANDARD and COMPREHENSIVE:**
104
+ 1. **Overview** — 1–2 paragraphs: what problem, what shape the solution takes, why this approach.
105
+ 2. **Technical Approach** — How the change actually lands: components touched, data flow, key decisions with rationale. Reference existing patterns from `@patterns` research by file:line (e.g., "follow the shape of `src/widgets/HotDogWidget.ts:42` for the registration step").
106
+ 3. **Critical Files for Implementation** — 3–7 specific files from research. Format: `path/to/file.ts` — *reason* (Core logic to modify / Pattern to follow / Interface to implement / Test to extend). No guesses — only files surfaced during Step 1 research.
107
+ 4. **External Dependencies — Verify Before Implementation** — Every third-party package required, with exact version and a one-line existence check. Format: `package@1.2.3 — verify: npm view package@1.2.3` (or pip equivalent). Required even if "no new packages" (write that explicitly). This is the slopsquatting fence: ~20% of AI-suggested packages don't exist; we catch that here, not in production.
108
+ 5. **Verification — How We Know This Works** — For each major change in Technical Approach, 1–3 falsifiable signals: a test name, an observable behavior, or a state/file condition. Prose like "the feature works" is not acceptable — it must be checkable. Format: `<change> → verifies by: <test name | observable behavior | state condition>`. These become acceptance criteria in `create_tasks` downstream.
109
+ 6. **Out-of-Bounds — DO NOT add** — 4–8 concrete things the implementation must NOT add, even if "best practice." Examples: rate limiting, retry/backoff, caching layer, optimistic UI, soft-delete, telemetry events, feature flags, admin UI. This is the YAGNI fence against familiar-shape bias (agents reproduce mature-system patterns unprompted). Be specific to this feature, not generic.
110
+ 7. **Risks & Filled Assumptions** — Two short subsections:
111
+ - *Risks*: what could go wrong (e.g., concurrent write race, migration ordering, third-party rate limit). Each with a one-line mitigation or "accept and monitor."
112
+ - *Filled Assumptions*: things the plan defaulted because the spec didn't say (e.g., "Assumed Postgres; spec didn't specify DB." "Assumed retry count = 0; spec didn't mention failure modes."). Reviewer-visible by design — these are the silent decisions that bite at execution.
113
+
114
+ **COMPREHENSIVE additionally requires:**
115
+ 8. **Current State** — How the affected code path works today, with file:line refs. Anchored to research findings.
116
+ 9. **Implementation Phases** — Ordered phases, each with its own Verification subsection (Phase N succeeds when …). Phases must be sequenced by dependency, not by file. Migration phases come before consumer phases.
117
+ 10. **Component / Data Architecture** — Where data is created, mutated, and read. Schema deltas if any.
118
+ 11. **API Design** — Endpoint signatures, request/response shapes, error contracts. Required if any external or internal API surface changes.
119
+ 12. **Migration Plan** — Required if any data-layer change. Up + down migration sketch, backfill strategy, rollback plan.
120
+ 13. **Testing Strategy** — What test types cover what (unit / integration / e2e), where new tests live, what's deferred to the post-feature coverage task.
121
+
122
+ Use your judgment on section length, not on inclusion. If a required section is genuinely N/A for this feature, write the section header followed by *"N/A — <one-line reason>"*. Empty section headers are not acceptable; absent section headers are not acceptable.
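The "present and non-empty" rule can be checked mechanically. This is a sketch under stated assumptions. It assumes each section is a markdown heading (`#` to `######`) whose text starts with the section name, and it treats any non-empty body as content, including an N/A line with a reason. It does not judge whether the reason is adequate.

```typescript
const STANDARD_SECTIONS = [
  "Overview",
  "Technical Approach",
  "Critical Files for Implementation",
  "External Dependencies",
  "Verification",
  "Out-of-Bounds",
  "Risks & Filled Assumptions",
];
const COMPREHENSIVE_EXTRA = [
  "Current State",
  "Implementation Phases",
  "Component / Data Architecture",
  "API Design",
  "Migration Plan",
  "Testing Strategy",
];

interface Section { level: number; title: string; body: string; }

function parseSections(markdown: string): Section[] {
  const lines = markdown.split("\n");
  const heads: { idx: number; level: number; title: string }[] = [];
  lines.forEach((line, idx) => {
    const m = /^(#{1,6})\s+(.*)$/.exec(line);
    if (m) heads.push({ idx, level: m[1].length, title: m[2].trim() });
  });
  return heads.map((h, i) => {
    // A body runs until the next heading at the same or a higher level,
    // so nested sub-headings (e.g. per-phase Verification) count as content.
    const end = heads.slice(i + 1).find((n) => n.level <= h.level)?.idx ?? lines.length;
    return { level: h.level, title: h.title, body: lines.slice(h.idx + 1, end).join("\n").trim() };
  });
}

// Returns the required section names that are absent or have an empty body.
function missingOrEmpty(markdown: string, depth: "STANDARD" | "COMPREHENSIVE"): string[] {
  const required = depth === "COMPREHENSIVE" ? [...STANDARD_SECTIONS, ...COMPREHENSIVE_EXTRA] : STANDARD_SECTIONS;
  const sections = parseSections(markdown);
  return required.filter((name) => {
    const s = sections.find((sec) => sec.title.startsWith(name));
    return s === undefined || s.body.length === 0;
  });
}
```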
111
123
 
112
124
  - **Action** — DocumentPlan: Save to `{OUT_DIR}/specs/plan.md` (use scoped name if exists)
113
125
 
@@ -129,12 +129,46 @@ Read completely (no limits):
129
129
  - This section helps the user understand how the work integrates with the product before diving into tasks
130
130
 
131
131
  ### Task Hierarchy (4 Levels)
132
- - **📦 Phase**: Organizational header (no checkbox) — groups related parent tasks
133
- - **📋 Parent Task**: Cohesive deliverable (small-medium scope) — one component/file
134
- - **✓ Sub-task**: Atomic work (single focused change) — single action, 2-3 acceptance criteria
135
- - **✓ Acceptance Criteria**: Verifiable outcomes (not implementation steps)
132
+ - **Phase**: Organizational header (no checkbox) — groups related parent tasks
133
+ - **Parent Task**: Cohesive deliverable (small-medium scope) — one component/file
134
+ - **Sub-task**: Atomic work (single focused change) — single action, 2-3 acceptance criteria
135
+ - **Acceptance Criteria**: Executable, verifiable outcomes (see Acceptance Criteria Types below)
136
136
 
137
- **Numbering**: Phase 1 → Parent 1.1, 1.2 → Sub-tasks 1.1.1, 1.1.2 → Criteria
137
+ **Numbering**: Phase 1 → Parent 1.1, 1.2 → Sub-tasks 1.1.1, 1.1.2 → Criteria
138
+
139
+ ### Right-Sized for AI Execution
140
+
141
+ Published data on AI agent execution (Cognition's Devin reviews, Anthropic's Claude Code guidance) converges on a bounded sweet spot: each sub-task should be roughly what a junior engineer could finish in 4–8 hours — not a multi-day epic, not a 10-line tweak.
142
+
143
+ **Hard size cap — split a sub-task if ANY of these is true:**
144
+ - Touches more than 3 files
145
+ - Has more than 5 acceptance criteria
146
+ - Would require more than ~200 lines of diff
147
+ - Requires a mid-execution judgment call about scope (split the judgment into its own predecessor task)
148
+ - Spans more than one concern (e.g., schema + UI in one sub-task)
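The hard size cap is a simple disjunction, so a planner could evaluate it directly. In this sketch the `SubTaskSize` fields are hypothetical, and the diff-size figure is assumed to be an estimate made at planning time.

```typescript
// Hypothetical planning-time metrics for one sub-task.
interface SubTaskSize {
  filesTouched: number;
  criteriaCount: number;
  estimatedDiffLines: number;
  needsScopeJudgment: boolean;
  concerns: string[]; // e.g. ["schema", "ui"]
}

// Returns every cap the sub-task violates; an empty array means it is within the cap.
function splitReasons(t: SubTaskSize): string[] {
  const reasons: string[] = [];
  if (t.filesTouched > 3) reasons.push("touches more than 3 files");
  if (t.criteriaCount > 5) reasons.push("more than 5 acceptance criteria");
  if (t.estimatedDiffLines > 200) reasons.push("more than ~200 lines of diff");
  if (t.needsScopeJudgment) reasons.push("requires a mid-execution scope judgment");
  if (new Set(t.concerns).size > 1) reasons.push("spans more than one concern");
  return reasons;
}
```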
149
+
150
+ When splitting, keep the integration-aware principle intact: each split task still names its **Produces** / **Consumed by** / **Replaces** fields.
151
+
152
+ ### Acceptance Criteria Types
153
+
154
+ Every acceptance criterion MUST be one of three executable types. Prose criteria like "feature works correctly" or "behavior is consistent" are forbidden — an executor cannot self-check them.
155
+
156
+ 1. **Test passes** — `` Test `<test_name>` passes `` (or `tests in <file_path> pass`)
157
+ 2. **Observable behavior** — A specific, checkable runtime signal: `` GET /api/x returns 200 with field `y` ``, `` Console logs `event=loaded params={...}` ``, `Button click triggers <handler> within 100ms`
158
+ 3. **State / file condition** — `` File `<path>` exists and contains <pattern> ``, `` Migration `<id>` applied ``, `` Env var `X` is read at startup ``
159
+
160
+ Mixing types within a sub-task is fine. What's not fine: criteria the agent cannot verify without asking the user.
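A lint pass for the three types might look like this sketch. It assumes criteria are written with the label prefixes used in the tasks.md template (`Test passes:`, `Observable behavior:`, `State condition:`). Any criterion without one of those labels is treated as prose and rejected.

```typescript
type CriterionType = "test" | "observable" | "state";

// Label prefixes assumed from the tasks.md template in this workflow.
const PREFIXES: [string, CriterionType][] = [
  ["Test passes", "test"],
  ["Observable behavior", "observable"],
  ["State condition", "state"],
];

function classifyCriterion(text: string): CriterionType | null {
  const trimmed = text.trim();
  for (const [prefix, type] of PREFIXES) {
    if (trimmed.startsWith(prefix)) return type;
  }
  return null; // prose criterion: an executor cannot self-check it
}

// Returns the criteria that must be rewritten before the task list is accepted.
function rejectProse(criteria: string[]): string[] {
  return criteria.filter((c) => classifyCriterion(c) === null);
}
```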
161
+
162
+ ### Test-First Task Pairing
163
+
164
+ For any sub-task that changes observable behavior (not pure refactors or cleanup), pair it with a preceding RED task. Pattern:
165
+
166
+ - **N.M.k RED**: Write failing test `<test_name>` asserting `<behavior>`. Acceptance: test exists and fails for the documented reason.
167
+ - **N.M.(k+1) Build**: Implement `<change>`. Acceptance: the RED test passes; no other tests regress.
168
+
169
+ This is the TDAD pattern (test-driven agentic development): the failing test is the executor's self-correction signal. Without it, the executor is guessing whether the implementation is right.
170
+
171
+ Pure refactors, cleanups, and config-only tasks don't require RED pairing — but if behavior changes, the RED comes first.
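The pairing rule can be checked over an ordered sub-task list. This sketch assumes a hypothetical `SubTask` record with a `kind` tag and a `changesBehavior` flag. It requires the RED task to sit immediately before the behavior-changing task under the same parent, following the N.M.k / N.M.(k+1) pattern above.

```typescript
// Hypothetical sub-task record; IDs follow the Phase.Parent.Sub numbering.
interface SubTask {
  id: string; // e.g. "1.1.2"
  kind: "RED" | "Build" | "Other";
  changesBehavior: boolean;
}

function parentOf(id: string): string {
  return id.split(".").slice(0, 2).join(".");
}

// IDs of behavior-changing sub-tasks not immediately preceded by a RED task in the same parent.
function missingRed(subtasks: SubTask[]): string[] {
  return subtasks
    .filter((t, i) => {
      if (!t.changesBehavior || t.kind === "RED") return false;
      const prev = subtasks[i - 1];
      return !(prev !== undefined && prev.kind === "RED" && parentOf(prev.id) === parentOf(t.id));
    })
    .map((t) => t.id);
}
```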
138
172
 
139
173
  ### Integration-Aware Task Principle
140
174
 
@@ -153,11 +187,23 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
153
187
  - **Cleanup tasks**: Remove/redirect old code paths (MANDATORY when replacing patterns)
154
188
 
155
189
  ### 4b. Create Parent Tasks
156
- - **Action** — CreateParentTasks: Draft as many phases as needed to logically organize work, each with as many parent tasks (📋) as required to cover complete scope.
190
+ - **Action** — CreateParentTasks: Draft as many phases as needed to logically organize work, each with as many parent tasks as required to cover complete scope.
157
191
  - Each parent task = single cohesive deliverable (small-medium scope)
158
192
  - Cover ALL extracted requirements with no gaps
159
193
  - Group related work into phases for clarity
160
194
  - Align with technical approach (from research or existing docs)
195
+ - Every parent task carries explicit sequencing in its body:
196
+ - **Predecessor**: parent task IDs that must complete first (or "none")
197
+ - **Unblocks**: parent task IDs this unblocks (or "terminal")
198
+ - The first phase is always **Phase 0 — Dependency Verification** (see 4a-Phase0 below). Other phases start at Phase 1.
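Because **Unblocks** is the inverse of **Predecessor**, an execution order can be derived from the Predecessor declarations alone. This sketch uses Kahn's topological sort over a hypothetical `ParentTask` record. It throws when the declarations contain a cycle or name an unknown predecessor.

```typescript
// Hypothetical parent-task record; "none" is represented as an empty array.
interface ParentTask {
  id: string;
  predecessors: string[];
}

// Kahn's algorithm: repeatedly emit tasks whose predecessors have all been emitted.
function executionOrder(tasks: ParentTask[]): string[] {
  const remaining = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of tasks) {
    remaining.set(t.id, t.predecessors.length);
    for (const p of t.predecessors) {
      dependents.set(p, [...(dependents.get(p) ?? []), t.id]);
    }
  }
  const ready = tasks.filter((t) => t.predecessors.length === 0).map((t) => t.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const d of dependents.get(id) ?? []) {
      const left = remaining.get(d)! - 1;
      remaining.set(d, left);
      if (left === 0) ready.push(d);
    }
  }
  if (order.length !== tasks.length) throw new Error("cycle or unknown predecessor in declarations");
  return order;
}
```

With Phase 0 declared as the only task with no predecessors, it always comes first in the derived order.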
199
+
200
+ ### 4a-Phase0. Phase 0 — Dependency Verification (always present)
201
+
202
+ Before any implementation, generate a Phase 0 containing one sub-task per external dependency listed in `plan.md`'s "External Dependencies — Verify Before Implementation" section. Each sub-task verifies the package exists at the named version and exposes the API the plan assumed.
203
+
204
+ - Acceptance type: state condition (`npm view <pkg>@<ver>` returns valid metadata) and/or test passes (a minimal import-and-call smoke test).
205
+ - If `plan.md` declared "no new packages," Phase 0 is a single sub-task that confirms no new dependencies were silently introduced during implementation (cross-check `package.json` diff at end).
206
+ - Phase 0 unblocks Phase 1; it cannot be skipped or run in parallel with Phase 1.
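A Phase 0 check could parse each dependency line and run `npm view <pkg>@<ver> version`. This is a sketch under stated assumptions. It assumes lines follow the `package@1.2.3 — verify: ...` format from plan.md. It handles scoped names by splitting on the last `@`. It treats a non-zero exit or empty output as "not found", because `npm view` may print nothing when no published version matches. The command runner is injectable so the logic can be exercised offline. Deprecation and advisory checks are not covered here.

```typescript
import { execFileSync } from "node:child_process";

interface Dependency { name: string; version: string; }

// Parses the leading "name@version" token; strips a list bullet and backticks first.
function parseDependency(line: string): Dependency | null {
  const cleaned = line.trim().replace(/^[-*]\s*/, "").replace(/`/g, "");
  const spec = cleaned.split(/\s+/)[0] ?? "";
  const at = spec.lastIndexOf("@");
  // at <= 0: no version (or a bare "@scope/pkg"); trailing "@": empty version.
  if (at <= 0 || at === spec.length - 1) return null;
  return { name: spec.slice(0, at), version: spec.slice(at + 1) };
}

type Runner = (cmd: string, args: string[]) => string;

// Default runner: throws if npm exits non-zero (e.g. the package does not exist).
const npmRunner: Runner = (cmd, args) => execFileSync(cmd, args, { encoding: "utf8" });

function verifyDependency(dep: Dependency, run: Runner = npmRunner): boolean {
  try {
    return run("npm", ["view", `${dep.name}@${dep.version}`, "version"]).trim().length > 0;
  } catch {
    return false;
  }
}
```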
161
207
 
162
208
  ### 4c. Break Down Sub-tasks
163
209
  - **Action** — BreakdownSubTasks: For each parent, generate as many detailed sub-tasks as needed to complete the parent.
@@ -171,14 +217,19 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
171
217
  - Completable as a single focused change
172
218
 
173
219
  - **What to INCLUDE in sub-tasks:**
174
- - Technical terms (JWT, REST, WebSocket, React hooks, SQL queries)
175
- - Architecture patterns (middleware, pub/sub, observer, factory)
176
- - Integration points (which components connect, API contracts)
177
- - File/component names (UserProfileComponent, authMiddleware.ts)
178
- - Technical constraints (max file size, timeout duration, data format)
179
- - **Produces**: What output this creates (variable name, return value, prop)
180
- - **Consumed by**: What uses this output (component, hook, render path)
181
- - **Replaces**: What old code path this supersedes (if any)
220
+ - Technical terms (JWT, REST, WebSocket, React hooks, SQL queries)
221
+ - Architecture patterns (middleware, pub/sub, observer, factory)
222
+ - Integration points (which components connect, API contracts)
223
+ - File/component names (UserProfileComponent, authMiddleware.ts)
224
+ - Technical constraints (max file size, timeout duration, data format)
225
+ - **Produces**: What output this creates (variable name, return value, prop)
226
+ - **Consumed by**: What uses this output (component, hook, render path)
227
+ - **Replaces**: What old code path this supersedes (if any)
228
+ - **Context** (required): a self-contained payload an executor can use without re-reading the full plan. Include:
229
+ - 2–4 file:line refs pulled from research (the exact code being modified or extended)
230
+ - 1 canonical reference pointer (a file:line from `@patterns` research that shows the shape to follow)
231
+ - 1 link/anchor into `plan.md` for the relevant section
232
+ - **Predecessor** (sub-task level, optional): a sub-task ID this depends on. Only when intra-parent ordering is non-obvious.
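The Context payload counts can be validated per sub-task. This sketch reads the three items as separate counts: 2–4 research refs, exactly 1 canonical pointer, and exactly 1 plan.md anchor. It also assumes a hypothetical `ContextPayload` shape in which both kinds of file refs are plain `path:line` strings.

```typescript
// Hypothetical structured form of a sub-task's Context bullet list.
interface ContextPayload {
  refs: string[];        // file:line refs to the code being modified or extended
  canonical: string[];   // file:line pointer(s) from @patterns research
  planAnchors: string[]; // anchors into plan.md
}

const FILE_LINE_REF = /^[^\s:]+:\d+$/;

// Returns human-readable problems; an empty array means the payload is complete.
function contextProblems(c: ContextPayload): string[] {
  const problems: string[] = [];
  if (c.refs.length < 2 || c.refs.length > 4) problems.push(`expected 2-4 file:line refs, got ${c.refs.length}`);
  const malformed = [...c.refs, ...c.canonical].filter((r) => !FILE_LINE_REF.test(r));
  if (malformed.length > 0) problems.push(`not in file:line form: ${malformed.join(", ")}`);
  if (c.canonical.length !== 1) problems.push("expected exactly 1 canonical reference");
  if (c.planAnchors.length !== 1) problems.push("expected exactly 1 plan.md anchor");
  return problems;
}
```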
182
233
 
183
234
  - **What to AVOID in sub-tasks:**
184
235
  - ❌ Code snippets or pseudo-code
@@ -187,12 +238,11 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
187
238
  - ❌ Specific library API calls (unless architecturally significant)
188
239
 
189
240
  - **Acceptance criteria**:
190
- - Describe technical behaviors and observable outcomes
191
- - Include integration expectations and error handling
192
- - 2-3 verifiable outcomes per sub-task
193
- - Be specific about technical requirements
241
+ - Every criterion MUST be one of the three executable types (see "Acceptance Criteria Types" above): test passes / observable behavior / state condition.
242
+ - 2–3 criteria per sub-task. If a sub-task needs more than 3 to be checkable, split it.
243
+ - Prose criteria ("works correctly", "is consistent", "user-friendly") are forbidden because they're not self-checkable.
194
244
 
195
- - **Decomposition**: Split if 5+ criteria or multiple concerns
245
+ - **Decomposition (hard size cap)**: Split if ANY of: >3 files touched, >5 criteria, >~200 LOC, mid-task scope judgment required, or more than one concern.
196
246
 
197
247
  ### 4d. Validate Task Structure
198
248
  - **Action** — VerifyCoverage: Cross-reference tasks against extracted requirements.
@@ -204,13 +254,19 @@ Tasks without consumers are incomplete. Tasks that don't address old code paths
204
254
  - **Coverage Validation**:
205
255
  - [ ] All extracted requirements from Step 3 addressed by tasks?
206
256
  - [ ] No gaps in requirement coverage?
257
+ - [ ] Every "Verification" entry from `plan.md` mapped to at least one acceptance criterion?
207
258
  - **Exclusion Validation**:
208
- - [ ] Adding anything beyond explicit requests?
209
- - [ ] Avoiding "nice-to-have" additions not requested?
259
+ - [ ] No additions beyond explicit requests?
260
+ - [ ] `plan.md`'s "Out-of-Bounds — DO NOT add" list carried forward verbatim into tasks.md banner?
261
+ - [ ] No task implements anything in the Out-of-Bounds list?
210
262
  - **Structure Validation**:
211
263
  - [ ] Parent tasks are small-medium scope, sub-tasks are atomic?
212
- - [ ] Each sub-task has 2-3 acceptance criteria?
213
- - [ ] Acceptance criteria verifiable (not implementation steps)?
264
+ - [ ] Each sub-task has 2-3 acceptance criteria, each one of the three executable types?
265
+ - [ ] No sub-task exceeds the size cap (>3 files / >5 criteria / >~200 LOC / multi-concern / mid-task scope judgment)?
266
+ - [ ] Every behavior-changing sub-task is preceded by a RED test sub-task?
267
+ - [ ] Every sub-task has a Context payload (2–4 file:line refs, 1 canonical reference, 1 plan.md anchor)?
268
+ - [ ] Every parent task has Predecessor and Unblocks declared?
269
+ - [ ] Phase 0 — Dependency Verification is present and unblocks Phase 1?
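The "carried forward verbatim" item in the checklist above can be checked by comparing the bullet lists under the Out-of-Bounds heading in each file. This sketch assumes `- item` bullets and a heading whose text starts with `Out-of-Bounds`. It treats "verbatim" as the same items in the same order.

```typescript
// Collects "- item" bullets under the first heading whose text starts with `heading`.
function bulletsUnder(markdown: string, heading: string): string[] {
  const lines = markdown.split("\n");
  const start = lines.findIndex((l) => /^#{1,6}\s/.test(l) && l.replace(/^#+\s*/, "").startsWith(heading));
  if (start === -1) return [];
  const items: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (/^#{1,6}\s/.test(line)) break; // the next heading ends the section
    const m = /^\s*-\s+(.*)$/.exec(line);
    if (m) items.push(m[1].trim());
  }
  return items;
}

function carriedForwardVerbatim(planMd: string, tasksMd: string): boolean {
  const plan = bulletsUnder(planMd, "Out-of-Bounds");
  const banner = bulletsUnder(tasksMd, "Out-of-Bounds");
  return plan.length > 0 && plan.length === banner.length && plan.every((item, i) => item === banner[i]);
}
```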
214
270
 
215
271
  - **Action** — ValidateIntegration: Verify every build task is wired to consumers.
216
272
  - **Consumer Specified**:
@@ -288,6 +344,13 @@ Save to `${TASKS_FILE}`:
288
344
  - **In Scope**: {bullet list}
289
345
  - **Out of Scope**: {bullet list}
290
346
 
347
+ ## Out-of-Bounds — DO NOT add
348
+ *Carried forward verbatim from plan.md. Executors: if a task tempts you to add any of these, stop and ask.*
349
+ - {Forbidden addition 1, e.g. "rate limiting"}
350
+ - {Forbidden addition 2, e.g. "retry/backoff"}
351
+ - {Forbidden addition 3, e.g. "telemetry events"}
352
+ - {Forbidden addition 4, e.g. "admin UI"}
353
+
291
354
  ## Requirements Traced
292
355
  | ID | Description | Source | Tasks |
293
356
  |----|-------------|--------|-------|
@@ -315,30 +378,66 @@ Save to `${TASKS_FILE}`:
315
378
 
316
379
  ## Tasks
317
380
 
381
+ ### Phase 0: Dependency Verification
382
+ *Confirms every external dependency in plan.md exists at the declared version before any implementation begins.*
383
+
384
+ #### [0.1] Verify external dependencies
385
+ - **Predecessor**: none
386
+ - **Unblocks**: 1.1
387
+ - [ ] **0.1.1** Verify each package@version from plan.md "External Dependencies" section exists
388
+ - **Produces**: confirmation log of resolved package metadata
389
+ - **Consumed by**: Phase 1 implementation tasks
390
+ - **Context**:
391
+ - plan.md anchor: `## External Dependencies — Verify Before Implementation`
392
+ - check commands listed in plan section
393
+ - [ ] State condition: `npm view <pkg>@<ver>` returns valid metadata for every package
394
+ - [ ] State condition: no package in the list is flagged as deprecated or security-advised
395
+ - [ ] Test passes: minimal import-and-call smoke for each new package
396
+
318
397
  ### Phase 1: {Phase Name}
319
398
 
320
399
  #### [1.1] {Parent Task Title}
321
- - [ ] **1.1.1** {Sub-task with technical specifics}
400
+ - **Predecessor**: 0.1
401
+ - **Unblocks**: 1.2
402
+
403
+ - [ ] **1.1.1 RED** Write failing test `{test_name}` asserting `{behavior}`
404
+ - **Produces**: a failing test that pins the desired behavior
405
+ - **Consumed by**: 1.1.2 (turns this red to green)
406
+ - **Replaces**: N/A
407
+ - **Context**:
408
+ - `path/to/existing/code.ts:42` — current behavior being changed
409
+ - `path/to/similar/test.ts:18` — canonical test shape to follow
410
+ - plan.md anchor: `### Verification — How We Know This Works`
411
+ - [ ] State condition: file `path/to/test.ts` exists and contains test `{test_name}`
412
+ - [ ] Observable behavior: running `{test_name}` fails, with a failure message referencing the unimplemented behavior
413
+
414
+ - [ ] **1.1.2 Build** {Implement the change}
322
415
  - **Produces**: {output variable/value/prop}
323
416
  - **Consumed by**: {component/hook that uses this}
324
417
  - **Replaces**: {old code path, or "N/A" if new}
325
- - [ ] {Technical outcome 1}
326
- - [ ] {Technical outcome 2}
327
- - [ ] {Technical outcome 3}
328
-
329
- - [ ] **1.1.2** {Sub-task with technical specifics}
330
- - **Produces**: {output variable/value/prop}
331
- - **Consumed by**: {component/hook that uses this}
332
- - [ ] {Technical outcome 1}
333
- - [ ] {Technical outcome 2}
418
+ - **Context**:
419
+ - `path/to/file.ts:120` — code to modify
420
+ - `path/to/file.ts:180` — adjacent code that must not regress
421
+ - `path/to/canonical/example.ts:55` — pattern to follow (from @patterns research)
422
+ - plan.md anchor: `## Technical Approach`
423
+ - [ ] Test passes: `{test_name}` (from 1.1.1) now passes
424
+ - [ ] Test passes: existing tests in `path/to/related.test.ts` still pass
425
+ - [ ] Observable behavior: `{specific runtime signal, e.g. log line, HTTP response shape}`
334
426
 
335
427
  #### [1.2] {Parent Task Title} — Integration
336
- *This task wires outputs from 1.1 to consumers*
337
- - [ ] **1.2.1** {Wire X to Y}
338
- - **Wires**: {1.1.1 output} → {consumer component/render}
428
+ - **Predecessor**: 1.1
429
+ - **Unblocks**: {next parent or "terminal"}
430
+
431
+ - [ ] **1.2.1** Wire {1.1.2 output} to {consumer}
432
+ - **Wires**: {1.1.2 output} → {consumer component/render}
339
433
  - **Removes**: {old code path being replaced}
340
- - [ ] {Consumer uses new data source}
341
- - [ ] {Old data source removed/redirected}
434
+ - **Context**:
435
+ - `path/to/consumer.tsx:30` — where the wire lands
436
+ - `path/to/old/path.ts:12` — old code path to remove
437
+ - plan.md anchor: `## Technical Approach`
438
+ - [ ] Test passes: integration test asserting consumer renders new data source
439
+ - [ ] State condition: old code path file `path/to/old/path.ts` deleted or import removed
440
+ - [ ] Observable behavior: data flows from producer to rendered output (with `{specific assertion}`)
342
441
 
343
442
  ### Phase 2: {Phase Name}
344
443
  ...
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: execute
3
- description: 👻 | Adaptive Wave-Based Build -> Code_Review -> Validate Flow
3
+ description: 👻 | Adaptive Wave-Based Build with Per-Wave Verification Gate
4
4
  user-invocable: true
5
5
  ---
6
6
 
@@ -11,9 +11,9 @@ user-invocable: true
11
11
  Treat the current command arguments as this workflow's input. When invoked from a slash command, use the forwarded `$ARGUMENTS` value.
12
12
 
13
13
 
14
- # execute: Adaptive Task Execution with Quality Gates
14
+ # execute: Adaptive Task Execution with Per-Wave Verification
15
15
 
16
- Execute tasks in parallel waves with full scope context, adapt based on learnings, code review loop, validate requirements. Outcome: complete implementation with verified quality and E2E requirement coverage.
16
+ Execute tasks in parallel waves with full scope context, verify each wave before proceeding, adapt based on learnings, audit cross-wave integration, generate manual test guide. Outcome: complete implementation with verified quality and E2E requirement coverage.
17
17
 
18
18
  ## ARGUMENTS
19
19
 
@@ -39,7 +39,9 @@ $ARGUMENTS
39
39
 
40
40
  2. **Dispatch Wave**: Launch parallel @dev subagents (1 per task batch)
41
41
  - **CRITICAL**: Each subagent MUST read `SCOPE_DOCS` before executing
42
- - Each receives: task batch assignment, dependency completion reports, SCOPE_DOCS paths
42
+ - Each receives: task batch assignment, SCOPE_DOCS paths, and (after wave 1) a **Prior-Wave Context** block
43
+ - **Prior-Wave Context** (REQUIRED in waves 2+): the orchestrator appends each prior wave's @dev Completion Reports verbatim into this wave's dispatch prompt under a `## Prior-Wave Context` header. Includes Completed tasks, Files changed, Scope signal, Discoveries, and Guidance from each prior batch. This is how state is carried forward — there is no separate state file.
44
+ - **Test discovery**: instruct @dev to use the project's native related-test command (`jest --findRelatedTests <file>`, `pytest` by path, `vitest related <file>`, `cargo test <name-filter>`). Do not create parallel test files for code already covered.
43
45
  - Instruct: "Read scope docs first to understand E2E UX and integration points. Load @skill-spectre:spectre-tdd, then execute tasks sequentially using its TDD methodology. **Commit after each parent task** with conventional commit format (e.g., `feat(module): add X`, `fix(module): resolve Y`). Return completion report with **Implementation Insights** + **E2E Completeness Check**."
44
46
 
45
47
  **E2E Completeness Check** (subagent returns one per batch):
@@ -47,15 +49,64 @@ $ARGUMENTS
47
49
  - 🟡 Gap — [specific functionality missing for E2E UX]
48
50
  - 🔴 Blocker — [cannot deliver spec without changes to other tasks]
49
51
 
50
- 3. **Mark Complete**: Update tasks doc with `[x]` for completed tasks
52
+ 3. **Per-Wave Verification Gate**: Verify the wave's output before adapting or advancing.
51
53
 
52
- 4. **Reflect**: Review completion reports for:
54
+ **3a. Deterministic pre-gate (no AI)**
55
+ - Detect project commands from `package.json` / `pyproject.toml` / `Cargo.toml` / `Makefile`
56
+ - Run lint, typecheck, build — whichever apply
57
+ - If any fail: dispatch @dev to fix the failures, re-run the gate. Do NOT invoke @reviewer until all deterministic checks pass.
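The deterministic pre-gate could be sketched as follows. This covers only `package.json` scripts and `Cargo.toml`. The script names `lint`, `typecheck` and `build` are assumptions, since projects name these differently. `cargo clippy` also requires the clippy component to be installed. The gate stops at the first failing command, and the executor is injectable for testing.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical manifest summary: which files exist, plus package.json "scripts" if present.
interface ProjectManifest {
  hasCargoToml: boolean;
  npmScripts?: Record<string, string>;
}

// Commands in gate order: lint, typecheck, build.
function preGateCommands(m: ProjectManifest): string[][] {
  const cmds: string[][] = [];
  if (m.npmScripts) {
    for (const name of ["lint", "typecheck", "build"]) {
      if (name in m.npmScripts) cmds.push(["npm", "run", name]);
    }
  }
  if (m.hasCargoToml) cmds.push(["cargo", "clippy"], ["cargo", "check"], ["cargo", "build"]);
  return cmds;
}

type Exec = (cmd: string[]) => number; // returns the exit code

const defaultExec: Exec = (cmd) => spawnSync(cmd[0], cmd.slice(1), { stdio: "inherit" }).status ?? 1;

// Returns the first failing command, or null when every check passes (reviewers may then run).
function firstFailure(cmds: string[][], exec: Exec = defaultExec): string[] | null {
  for (const c of cmds) {
    if (exec(c) !== 0) return c;
  }
  return null;
}
```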
58
+
59
+ **3b. Parallel review lenses (single message, two @reviewer dispatches)**
60
+
61
+ Build each reviewer prompt from:
62
+ - Wave diff: `git diff <parent-of-first-wave-commit>..HEAD`
63
+ - Acceptance criteria: verbatim text from scope/tasks docs for this wave's tasks
64
+ - Files-touched manifest
65
+
66
+ **Forbidden in reviewer prompts**: @dev completion reports, implementer rationale, orchestrator paraphrase of "what the dev did and why". The reviewer is a clean room — diff + criteria only.
67
+
68
+ **Lens 1 — security + correctness**
69
+ - OWASP Top-10, injection, auth, secrets, data exposure
70
+ - Logic, edge cases, state transitions
71
+ - Scope adherence (flag only in-scope issues; do not flag missing out-of-scope work)
72
+
73
+ **Lens 2 — wiring**
74
+ - Apply the Defined → Connected → Reachable methodology:
75
+ - Defined: code exists in a file
76
+ - Connected: code is imported/called by other code
77
+ - Reachable: a user action can trigger the code path
78
+ - For each new function/component, grep for usage (not just definition)
79
+ - For UI features, trace render-backward: JSX ← variable ← source ← user action
80
+ - Flag dead computations (computed but never reach output) and old code paths still active when replaced
81
+
82
+ **Severity & evidence rule** (enforced in both lens prompts):
83
+ - Every CRITICAL or HIGH finding MUST include:
84
+ 1. `file:line` reference
85
+ 2. A reproducible failure scenario or exploit path describing observable behavior
86
+ - Findings without an evidence chain are auto-downgraded one severity level. "Could potentially" is not evidence.
87
+ - Each finding includes a hash: `sha256(file_path + line + finding_category)` for the fix-loop ledger (3c).
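A sketch of the finding hash and the fix-loop ledger using Node's `node:crypto`. The hash follows the text literally: plain concatenation, hex digest. Note that plain concatenation can collide, because `("a.ts1", 2)` and `("a.ts", 12)` both produce `a.ts12`. A delimiter between fields would avoid this, at the cost of departing from the stated formula. The `FixLedger` class is a hypothetical illustration of the reappearance rule in 3c.

```typescript
import { createHash } from "node:crypto";

// sha256(file_path + line + finding_category), as written in the rule above.
function findingHash(filePath: string, line: number, category: string): string {
  return createHash("sha256").update(filePath + line + category).digest("hex");
}

// A hash seen once is queued for a fix; seen again, it signals reviewer disagreement.
class FixLedger {
  private addressed = new Set<string>();

  route(hash: string): "fix" | "escalate" {
    if (this.addressed.has(hash)) return "escalate";
    this.addressed.add(hash);
    return "fix";
  }
}
```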
88
+
89
+ **3c. Bounded fix loop**
90
+
91
+ If lens dispatches return CRITICAL/HIGH:
92
+ - **Iteration cap**: 3 fix waves maximum
93
+ - **Hash ledger**: maintain a set of finding hashes addressed. If a finding with a hash already in the ledger reappears in a later review, classify as "reviewer disagreement" and escalate to user — do NOT re-queue.
94
+ - **Fix/test ratio**: monitor changes per fix wave. If test-file changes > 0.5 × implementation-file changes, halt and surface to user — likely "fixing the test instead of the bug."
95
+ - **Diff-growth circuit-breaker**: if cumulative fix-wave diff grows > 25% per iteration, halt and surface — fixes are adding surface area, not reducing it.
96
+ - **Dispatch fix**: parallel @dev subagents address each CRITICAL/HIGH finding. Each fix-dev receives the finding's full evidence chain (file:line + scenario), not just the description.
97
+ - **Re-verify**: after fixes commit, return to 3a (deterministic) then 3b (lenses).
98
+
99
+ **3d. Exit condition**: No CRITICAL/HIGH remain, OR iteration cap reached and user has been notified of unresolved findings.
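The three circuit-breakers in 3c can be evaluated from per-iteration stats. This sketch assumes "changes" means changed lines and that the orchestrator collects these numbers from `git diff --stat`-style output. The `FixWaveStats` fields are hypothetical.

```typescript
// Hypothetical per-iteration stats collected by the orchestrator.
interface FixWaveStats {
  iteration: number;                   // 1-based fix-wave number about to run or just run
  testFileChanges: number;             // changed lines in test files this wave
  implFileChanges: number;             // changed lines in implementation files this wave
  cumulativeDiffLines: number;         // cumulative fix-wave diff after this wave
  previousCumulativeDiffLines: number; // cumulative fix-wave diff before this wave
}

const MAX_FIX_WAVES = 3;

// Returns why the loop must halt and surface to the user, or null to continue.
function haltReason(s: FixWaveStats): string | null {
  if (s.iteration > MAX_FIX_WAVES) return "iteration cap reached";
  if (s.testFileChanges > 0.5 * s.implFileChanges) return "fix/test ratio exceeded: likely fixing the test instead of the bug";
  if (s.previousCumulativeDiffLines > 0 && s.cumulativeDiffLines > 1.25 * s.previousCumulativeDiffLines) {
    return "cumulative fix diff grew more than 25% this iteration";
  }
  return null;
}
```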
100
+
101
+ 4. **Mark Complete**: Update tasks doc with `[x]` for completed tasks
102
+
103
+ 5. **Reflect**: Review completion reports for:
53
104
  - Scope signals (🟡/🟠/🔴) from implementation insights
54
105
  - E2E completeness gaps (🟡/🔴) from completeness checks
55
- - **If** all ⚪ across both → skip to step 6
106
+ - **If** all ⚪ across both → skip to step 7
56
107
  - **Else** → adapt tasks
57
108
 
58
- 5. **Adapt** (only if triggered):
109
+ 6. **Adapt** (only if triggered):
59
110
  - Modify future tasks with learned context
60
111
  - Add tasks for E2E gaps with `[ADDED - E2E gap]` prefix
61
112
  - Add required sub-tasks with `[ADDED]` prefix
@@ -63,34 +114,28 @@ $ARGUMENTS
63
114
  - Flag cross-task integration issues to remaining waves
64
115
  - **Guardrails**: ❌ No "nice-to-have" additions, ❌ No scope expansion, ✅ Only adapt for spec compliance
65
116
 
66
- 6. **Next Wave**: Identify next tasks, gather relevant completion reports, return to step 1
67
-
68
- ## Step 2 - Code Review Loop
69
-
70
- - **Action** — ExecutedeveviewLoop: Until no critical/high feedback:
117
+ 7. **Next Wave**: Identify next tasks, gather prior-wave completion reports for the Prior-Wave Context block, return to step 1
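The Prior-Wave Context hand-off can be sketched as plain string assembly. Reports are appended unedited, matching the "verbatim" requirement in step 2. The `### Wave N` sub-headers are a hypothetical addition for readability. Wave 1 receives the base prompt unchanged.

```typescript
// Each inner array holds one prior wave's @dev Completion Reports exactly as returned.
function buildDispatchPrompt(basePrompt: string, priorWaves: string[][]): string {
  if (priorWaves.length === 0) return basePrompt; // wave 1: no prior context
  const sections = priorWaves.map((reports, i) => [`### Wave ${i + 1}`, ...reports].join("\n\n"));
  return [basePrompt, "## Prior-Wave Context", ...sections].join("\n\n");
}
```

Because state lives only in the prompt, each later wave's prompt grows with the number of prior reports. That is the trade-off of having no separate state file.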
71
118
 
72
- 1. **Spawn Review**: @dev subagent runs `Skill(code_review)` (Claude slash route: `/spectre:code_review`)
73
- 2. **Analyze**: Identify critical/high items
74
- - **If** none → exit loop
75
- 3. **Address**: Parallel @dev subagents fix feedback
76
- 4. **Re-verify**: Return to step 1
119
+ ## Step 2 - Cross-Wave Validate
77
120
 
78
- ## Step 3 - Validate Requirements
121
+ - **Action** — SpawnValidation: @analyst runs `Skill(validate)` (Claude slash route: `/spectre:validate`) with **narrowed scope**:
122
+ - Focus: cross-wave integration audit (did later waves silently break earlier waves' wiring?) + scope-creep audit (anything implemented that is NOT in the acceptance criteria?) + dead-computation sweep across the full cumulative diff
123
+ - Skip: per-area wiring verification (already done per-wave in Step 1.3b's wiring lens)
79
124
 
80
- - **Action** — SpawnValidation: @reviewer runs `Skill(validate)` (Claude slash route: `/spectre:validate`) with task list
81
- - **Action** — AddressGaps: If high priority gaps → dispatch @dev subagents to fix
125
+ - **Action** — AddressGaps: If high-priority gaps surface → dispatch @dev subagents to fix.
82
126
 
83
- ## Step 4 - Prepare for QA
127
+ ## Step 3 - Prepare for QA
84
128
 
85
129
  - **Action** — GenerateTestGuide: @dev runs `Skill(create_test_guide)` (Claude slash route: `/spectre:create_test_guide`)
86
130
  - Save to `{OUT_DIR}/test_guide.md`
87
131
 
88
- ## Step 5 - Report
132
+ ## Step 4 - Report
89
133
 
90
134
  - **Action** — SummarizeCompletion:
91
- - Tasks completed, waves executed, code review iterations, validation status
135
+ - Tasks completed, waves executed, per-wave fix-loop iteration counts, validation status
92
136
  - Test guide location
93
137
  - **Task Evolution Summary**: Adaptations made (or "None - original plan executed")
94
138
  - **E2E Gaps Addressed**: Summary of completeness issues found and resolved
139
+ - **Unresolved Findings** (if any): Any CRITICAL/HIGH findings that hit the fix-loop cap and were escalated to the user
95
140
 
96
141
  - **Action** — RenderFooter: Use `@skill-spectre:spectre-guide` skill for Next Steps
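The new Report step tracks per-wave fix-loop iteration counts and lists CRITICAL/HIGH findings that "hit the fix-loop cap." A capped loop of that shape might look like the sketch below. It is illustrative only: `Finding`, `runFixLoop`, and `maxIterations` are hypothetical names, `review`/`fix` stand in for subagent dispatch, and the actual cap value is not stated in this diff.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  id: string;
  severity: Severity;
  location: string; // hypothetical "file:line" field
}

interface FixLoopResult {
  iterations: number; // reported as the per-wave fix-loop iteration count
  unresolved: Finding[]; // CRITICAL/HIGH still open when the cap was reached
}

// Only these severities keep the loop running.
const BLOCKING: ReadonlySet<Severity> = new Set<Severity>(["CRITICAL", "HIGH"]);

// Hypothetical: `review` and `fix` stand in for reviewer/@dev subagents;
// `maxIterations` is an assumed cap, not a value taken from spectre.
function runFixLoop(
  review: () => Finding[],
  fix: (findings: Finding[]) => void,
  maxIterations: number,
): FixLoopResult {
  let iterations = 0;
  let blocking = review().filter((f) => BLOCKING.has(f.severity));
  while (blocking.length > 0 && iterations < maxIterations) {
    fix(blocking);
    iterations += 1;
    // Re-review after each fix pass.
    blocking = review().filter((f) => BLOCKING.has(f.severity));
  }
  // Anything still blocking here is escalated to the user instead of looping again.
  return { iterations, unresolved: blocking };
}
```

Returning the leftover findings rather than throwing keeps the wave moving while still giving the Report step something concrete to list under Unresolved Findings.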
@@ -49,9 +49,11 @@ Treat the current command arguments as this workflow's input. When invoked from
49
49
  - `OUT_DIR=docs/tasks/{branch_name}` (or user-specified)
50
50
  - `mkdir -p "${OUT_DIR}"`
51
51
 
52
- - **Action** — ScanExistingContext: Read all existing artifacts in `{OUT_DIR}/ (if you havent already)` and assess coverage across 4 dimensions.
52
+ - **Action** — ScanExistingContext: Read all existing artifacts in `{OUT_DIR}/ (if you haven't already)` and assess coverage across 4 dimensions.
53
53
 
54
- Scan for: `task_context.md`, `specs/plan.md`, `concepts/scope.md`, `research/*.md`
54
+ Scan for: `task_context.md`, `specs/plan.md`, `concepts/scope.md`, `specs/ux.md`, `research/*.md`
55
+
56
+ While scanning `concepts/scope.md` and `specs/ux.md`, extract any **filled assumptions** — places where the upstream artifact defaulted a value because the user didn't specify (e.g., DB choice, retry policy, copy variants, segment fallbacks). Carry these forward to Step 3's design surface so they're reviewer-visible before plan generation.
55
57
 
56
58
  | Dimension | Covered if artifact contains... | Covered by |
57
59
  | --- | --- | --- |
@@ -95,13 +97,26 @@ Use research findings from Step 1 to determine appropriate planning depth.
95
97
  | Integration points | Research findings | Internal only = Low, 1-2 external = Med, 3+ external = High |
96
98
  | External complexity | @web-research | Well-documented with libraries = Low, Some prior art = Med, Novel/emerging = High |
97
99
 
98
- - **Action** — CheckHardStops: Any true = automatic COMPREHENSIVE | db_schema_destructive | new_service_or_component | auth_or_pii_change | | payment_billing_logic | public_api_change | caching_consistency | slo_sla_risk |
100
+ - **Action** — CheckHardStops: Any true = automatic COMPREHENSIVE.
101
+ - `db_schema_destructive` — drops, renames, or non-additive column changes
102
+ - `data_migration_required` — backfill, transform, or row-by-row data change
103
+ - `new_service_or_component` — net-new service, daemon, or top-level component
104
+ - `auth_or_pii_change` — authn/authz flow, session handling, PII storage/exposure
105
+ - `secrets_or_credentials_handling` — new secret introduced, rotation, or boundary change
106
+ - `payment_billing_logic` — money flow, invoicing, charge logic
107
+ - `public_api_change` — externally-consumed API surface modified
108
+ - `concurrent_writes_or_locking` — concurrency, locking, or distributed coordination
109
+ - `caching_consistency` — cache invalidation, staleness windows, multi-tier caching
110
+ - `cross_service_or_cross_workspace_change` — coordinated change across services or workspaces
111
+ - `slo_sla_risk` — latency, throughput, or availability budget at stake
112
+
113
+ - **Action** — DetermineTier (decisive rules, not point-scoring):
99
114
 
100
- - **Action** — DetermineTier:
115
+ - **COMPREHENSIVE** — if ANY hard-stop is triggered OR any signal scores High OR two or more signals score Medium
116
+ - **STANDARD** — if no hard-stops AND no High signals AND at most one Medium signal
117
+ - **LIGHT** — only if every signal scores Low AND no hard-stops AND the change is plausibly a single-file diff
101
118
 
102
- - **LIGHT**: All/most Low signals, single component, clear pattern match, no hard-stops
103
- - **STANDARD**: Mix of Low/Med signals, multi-file but contained scope, no hard-stops
104
- - **COMPREHENSIVE**: Any High signal, multiple Med signals, or any hard-stop triggered
119
+ When in doubt between two tiers, choose the higher. The cost of over-planning a small change is hours; the cost of under-planning a large one is weeks.
105
120
 
106
121
  - **Action** — LogTier: Note the assessed tier in your response for transparency, then proceed immediately to the next step. Do NOT ask for confirmation.
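The new DetermineTier rules are decisive enough to express as a pure function. The sketch below is an assumption-laden illustration, not spectre code: `determineTier`, `Assessment`, and `singleFileDiff` are hypothetical names, and the "plausibly a single-file diff" judgment is modeled as a boolean the caller supplies. Because LIGHT requires that flag to be true, an unclear case falls through to STANDARD, which matches "when in doubt, choose the higher."

```typescript
type Level = "Low" | "Medium" | "High";
type Tier = "LIGHT" | "STANDARD" | "COMPREHENSIVE";

// Hard-stop flags from the CheckHardStops list; any one forces COMPREHENSIVE.
const HARD_STOPS = [
  "db_schema_destructive",
  "data_migration_required",
  "new_service_or_component",
  "auth_or_pii_change",
  "secrets_or_credentials_handling",
  "payment_billing_logic",
  "public_api_change",
  "concurrent_writes_or_locking",
  "caching_consistency",
  "cross_service_or_cross_workspace_change",
  "slo_sla_risk",
] as const;
type HardStop = (typeof HARD_STOPS)[number];

interface Assessment {
  signals: Level[]; // one score per signal row (integration points, external complexity, ...)
  hardStops: ReadonlySet<HardStop>;
  singleFileDiff: boolean; // hypothetical: caller's "plausibly single-file" judgment
}

function determineTier(a: Assessment): Tier {
  const high = a.signals.filter((s) => s === "High").length;
  const medium = a.signals.filter((s) => s === "Medium").length;
  // COMPREHENSIVE: any hard-stop, any High, or two or more Mediums.
  if (a.hardStops.size > 0 || high > 0 || medium >= 2) return "COMPREHENSIVE";
  // LIGHT: every signal Low (no Medium left at this point) and a single-file change.
  if (medium === 0 && a.singleFileDiff) return "LIGHT";
  // Otherwise: no hard-stops, no High, at most one Medium.
  return "STANDARD";
}
```

Checking COMPREHENSIVE first means the later branches only ever see inputs that already satisfy "no hard-stops and no High," so each rule stays a one-line condition.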
107
122
 
@@ -129,6 +144,14 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
129
144
  > - [decision] — [rationale; alternative considered]
130
145
  > - [decision] — [rationale; alternative considered]
131
146
  >
147
+ > **How we'll know it works** (verification spine):
148
+ > - [change] → [test name | observable behavior | state condition]
149
+ > - [change] → [test name | observable behavior | state condition]
150
+ >
151
+ > **Filled assumptions** (surfaced from scope.md / ux.md / inferred):
152
+ > - [assumption] — *source: [scope.md / ux.md / default]*
153
+ > - [assumption] — *source: [scope.md / ux.md / default]*
154
+ >
132
155
  > **Open questions** (with default assumption):
133
156
  > 1. [question] — *default: [assumption]*
134
157
  > 2. [question] — *default: [assumption]*
@@ -138,6 +161,8 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
138
161
  **CRITICAL**:
139
162
  - **Single proposed approach**, not a menu. If a true fork exists, surface it as an open question with your recommendation — not as parallel options.
140
163
  - Stay at the *shape* level: components, key decisions, structural changes. Defer file-by-file detail to `create_plan`.
164
+ - **Verification is mandatory.** Every major change in the approach must declare how it will be checked — falsifiable signal, not prose. This becomes the spine that `create_plan` and `create_tasks` build on.
165
+ - **Filled assumptions are mandatory.** If scope.md or ux.md left something silent and you defaulted it, surface the default here. Reviewer-visible by design — these are the silent decisions that bite at execution.
141
166
  - Open questions should be specific and answerable; pair each with a default assumption so the user can skip if the default is fine.
142
167
 
143
168
  - **Action** — IterateDesign: If the user replies with answers, edits, or pushback, update the design and re-present. Loop until user says 'looks good' (or equivalent).
@@ -208,4 +233,24 @@ Goal: align on the *shape* of the solution before generating a full plan. This c
208
233
 
209
234
  ---
210
235
 
236
+ ### Post-Tasks Tier Re-check
237
+
238
+ After tasks return, do a fast self-check against tier signals:
239
+
240
+ - Count parent tasks, sub-tasks, files touched (number of unique paths across Context blocks), and Phase 0 dep count.
241
+ - **Escalation triggers** (any true → recommend re-running at a higher tier):
242
+ - Tier was LIGHT but tasks touch >3 files OR have >2 parent tasks
243
+ - Tier was STANDARD but tasks reveal a hard-stop signal not caught earlier (e.g., a migration sub-task appeared)
244
+ - Tasks contain any Out-of-Bounds violation
245
+ - **Downgrade triggers** (rare; only suggest if confident):
246
+ - Tier was COMPREHENSIVE but tasks collapsed to a single parent with no migrations, no new components, and no API change
247
+
248
+ If an escalation/downgrade is triggered, surface it as a recommendation — do NOT silently re-run. Format:
249
+
250
+ > Tier reassessment: I planned this as {original tier}, but tasks revealed {signal}. Recommend re-running as {new tier}. Reply 'rerun' to regenerate or 'keep' to proceed as-is.
251
+
252
+ Only proceed past this checkpoint when the user confirms.
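The Post-Tasks Tier Re-check reduces to a few threshold checks plus a templated message. Here is a minimal sketch that uses hypothetical names (`recheckTier`, `TaskStats`, `lateHardStop`). It assumes escalation and downgrade each move exactly one tier, capped at the ends, because the diff does not say how far to move. Phase 0 dep count is left out since no trigger references it. The function only returns the recommendation text; per the checkpoint rule, nothing re-runs until the user replies.

```typescript
type Tier = "LIGHT" | "STANDARD" | "COMPREHENSIVE";
const ORDER: Tier[] = ["LIGHT", "STANDARD", "COMPREHENSIVE"];

interface TaskStats {
  parentTasks: number;
  filesTouched: number; // unique paths across all Context blocks
  lateHardStop: string | null; // hard-stop first visible in the tasks, e.g. "data_migration_required"
  outOfBoundsViolation: boolean;
  hasMigration: boolean;
  hasNewComponent: boolean;
  hasApiChange: boolean;
}

// Returns the reassessment prompt to show the user, or null if the tier holds.
// Assumption: a recommendation moves exactly one tier, clamped to LIGHT..COMPREHENSIVE.
function recheckTier(tier: Tier, s: TaskStats): string | null {
  let signal: string | null = null;
  let step = 0;
  if (tier === "LIGHT" && (s.filesTouched > 3 || s.parentTasks > 2)) {
    signal = `${s.filesTouched} files touched and ${s.parentTasks} parent task(s)`;
    step = 1;
  } else if (tier === "STANDARD" && s.lateHardStop !== null) {
    signal = `an uncaught hard-stop (${s.lateHardStop})`;
    step = 1;
  } else if (s.outOfBoundsViolation) {
    signal = "an Out-of-Bounds violation";
    step = 1;
  } else if (
    tier === "COMPREHENSIVE" &&
    s.parentTasks === 1 &&
    !s.hasMigration &&
    !s.hasNewComponent &&
    !s.hasApiChange
  ) {
    // Downgrade is rare: only one collapsed parent with none of the heavy signals.
    signal = "a single parent task with no migrations, new components, or API changes";
    step = -1;
  }
  if (signal === null) return null;
  const idx = Math.min(Math.max(ORDER.indexOf(tier) + step, 0), ORDER.length - 1);
  return (
    `Tier reassessment: I planned this as ${tier}, but tasks revealed ${signal}. ` +
    `Recommend re-running as ${ORDER[idx]}. Reply 'rerun' to regenerate or 'keep' to proceed as-is.`
  );
}
```

Escalation checks run before the downgrade check, so a COMPREHENSIVE plan with an Out-of-Bounds violation is never offered a downgrade.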
253
+
254
+ ---
255
+
211
256
  - **Action** — RenderFooter: Use `@skill-spectre:spectre-guide` skill for Next Steps