@harness-engineering/cli 1.6.2 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (180)
  1. package/dist/agents/personas/documentation-maintainer.yaml +3 -1
  2. package/dist/agents/personas/performance-guardian.yaml +23 -0
  3. package/dist/agents/personas/planner.yaml +27 -0
  4. package/dist/agents/personas/verifier.yaml +30 -0
  5. package/dist/agents/skills/claude-code/align-documentation/SKILL.md +13 -0
  6. package/dist/agents/skills/claude-code/cleanup-dead-code/SKILL.md +25 -1
  7. package/dist/agents/skills/claude-code/cleanup-dead-code/skill.yaml +5 -2
  8. package/dist/agents/skills/claude-code/detect-doc-drift/SKILL.md +12 -0
  9. package/dist/agents/skills/claude-code/enforce-architecture/SKILL.md +67 -1
  10. package/dist/agents/skills/claude-code/enforce-architecture/skill.yaml +5 -2
  11. package/dist/agents/skills/claude-code/harness-accessibility/SKILL.md +281 -0
  12. package/dist/agents/skills/claude-code/harness-accessibility/skill.yaml +51 -0
  13. package/dist/agents/skills/claude-code/harness-autopilot/SKILL.md +119 -72
  14. package/dist/agents/skills/claude-code/harness-autopilot/skill.yaml +4 -2
  15. package/dist/agents/skills/claude-code/harness-brainstorming/SKILL.md +76 -4
  16. package/dist/agents/skills/claude-code/harness-brainstorming/skill.yaml +2 -0
  17. package/dist/agents/skills/claude-code/harness-code-review/SKILL.md +487 -234
  18. package/dist/agents/skills/claude-code/harness-code-review/skill.yaml +15 -2
  19. package/dist/agents/skills/claude-code/harness-codebase-cleanup/SKILL.md +226 -0
  20. package/dist/agents/skills/claude-code/harness-codebase-cleanup/skill.yaml +64 -0
  21. package/dist/agents/skills/claude-code/harness-dependency-health/SKILL.md +35 -6
  22. package/dist/agents/skills/claude-code/harness-dependency-health/skill.yaml +1 -1
  23. package/dist/agents/skills/claude-code/harness-design/SKILL.md +265 -0
  24. package/dist/agents/skills/claude-code/harness-design/skill.yaml +53 -0
  25. package/dist/agents/skills/claude-code/harness-design-mobile/SKILL.md +336 -0
  26. package/dist/agents/skills/claude-code/harness-design-mobile/skill.yaml +49 -0
  27. package/dist/agents/skills/claude-code/harness-design-system/SKILL.md +282 -0
  28. package/dist/agents/skills/claude-code/harness-design-system/skill.yaml +50 -0
  29. package/dist/agents/skills/claude-code/harness-design-web/SKILL.md +360 -0
  30. package/dist/agents/skills/claude-code/harness-design-web/skill.yaml +52 -0
  31. package/dist/agents/skills/claude-code/harness-docs-pipeline/SKILL.md +460 -0
  32. package/dist/agents/skills/claude-code/harness-docs-pipeline/skill.yaml +69 -0
  33. package/dist/agents/skills/claude-code/harness-execution/SKILL.md +73 -8
  34. package/dist/agents/skills/claude-code/harness-execution/skill.yaml +1 -0
  35. package/dist/agents/skills/claude-code/harness-hotspot-detector/SKILL.md +32 -6
  36. package/dist/agents/skills/claude-code/harness-hotspot-detector/skill.yaml +1 -1
  37. package/dist/agents/skills/claude-code/harness-i18n/SKILL.md +484 -0
  38. package/dist/agents/skills/claude-code/harness-i18n/skill.yaml +54 -0
  39. package/dist/agents/skills/claude-code/harness-i18n-process/SKILL.md +388 -0
  40. package/dist/agents/skills/claude-code/harness-i18n-process/skill.yaml +43 -0
  41. package/dist/agents/skills/claude-code/harness-i18n-workflow/SKILL.md +512 -0
  42. package/dist/agents/skills/claude-code/harness-i18n-workflow/skill.yaml +53 -0
  43. package/dist/agents/skills/claude-code/harness-impact-analysis/SKILL.md +51 -6
  44. package/dist/agents/skills/claude-code/harness-integrity/SKILL.md +35 -1
  45. package/dist/agents/skills/claude-code/harness-knowledge-mapper/SKILL.md +46 -5
  46. package/dist/agents/skills/claude-code/harness-knowledge-mapper/skill.yaml +1 -1
  47. package/dist/agents/skills/claude-code/harness-onboarding/SKILL.md +19 -1
  48. package/dist/agents/skills/claude-code/harness-perf/SKILL.md +37 -8
  49. package/dist/agents/skills/claude-code/harness-perf/skill.yaml +3 -0
  50. package/dist/agents/skills/claude-code/harness-perf-tdd/SKILL.md +17 -4
  51. package/dist/agents/skills/claude-code/harness-planning/SKILL.md +57 -3
  52. package/dist/agents/skills/claude-code/harness-planning/skill.yaml +2 -0
  53. package/dist/agents/skills/claude-code/harness-release-readiness/SKILL.md +29 -9
  54. package/dist/agents/skills/claude-code/harness-roadmap/SKILL.md +562 -0
  55. package/dist/agents/skills/claude-code/harness-roadmap/skill.yaml +43 -0
  56. package/dist/agents/skills/claude-code/harness-security-review/SKILL.md +36 -2
  57. package/dist/agents/skills/claude-code/harness-security-review/skill.yaml +8 -6
  58. package/dist/agents/skills/claude-code/harness-security-scan/skill.yaml +1 -1
  59. package/dist/agents/skills/claude-code/harness-soundness-review/SKILL.md +1267 -0
  60. package/dist/agents/skills/claude-code/harness-soundness-review/skill.yaml +48 -0
  61. package/dist/agents/skills/claude-code/harness-test-advisor/SKILL.md +35 -6
  62. package/dist/agents/skills/claude-code/harness-verification/SKILL.md +66 -0
  63. package/dist/agents/skills/claude-code/harness-verification/skill.yaml +1 -0
  64. package/dist/agents/skills/claude-code/harness-verify/SKILL.md +37 -0
  65. package/dist/agents/skills/claude-code/initialize-harness-project/SKILL.md +15 -1
  66. package/dist/agents/skills/claude-code/validate-context-engineering/SKILL.md +12 -0
  67. package/dist/agents/skills/gemini-cli/harness-accessibility/SKILL.md +281 -0
  68. package/dist/agents/skills/gemini-cli/harness-accessibility/skill.yaml +51 -0
  69. package/dist/agents/skills/gemini-cli/harness-autopilot/SKILL.md +119 -72
  70. package/dist/agents/skills/gemini-cli/harness-autopilot/skill.yaml +4 -2
  71. package/dist/agents/skills/gemini-cli/harness-codebase-cleanup/SKILL.md +226 -0
  72. package/dist/agents/skills/gemini-cli/harness-codebase-cleanup/skill.yaml +64 -0
  73. package/dist/agents/skills/gemini-cli/harness-dependency-health/SKILL.md +35 -6
  74. package/dist/agents/skills/gemini-cli/harness-dependency-health/skill.yaml +1 -1
  75. package/dist/agents/skills/gemini-cli/harness-design/SKILL.md +265 -0
  76. package/dist/agents/skills/gemini-cli/harness-design/skill.yaml +53 -0
  77. package/dist/agents/skills/gemini-cli/harness-design-mobile/SKILL.md +336 -0
  78. package/dist/agents/skills/gemini-cli/harness-design-mobile/skill.yaml +49 -0
  79. package/dist/agents/skills/gemini-cli/harness-design-system/SKILL.md +282 -0
  80. package/dist/agents/skills/gemini-cli/harness-design-system/skill.yaml +50 -0
  81. package/dist/agents/skills/gemini-cli/harness-design-web/SKILL.md +360 -0
  82. package/dist/agents/skills/gemini-cli/harness-design-web/skill.yaml +52 -0
  83. package/dist/agents/skills/gemini-cli/harness-docs-pipeline/SKILL.md +460 -0
  84. package/dist/agents/skills/gemini-cli/harness-docs-pipeline/skill.yaml +69 -0
  85. package/dist/agents/skills/gemini-cli/harness-hotspot-detector/SKILL.md +32 -6
  86. package/dist/agents/skills/gemini-cli/harness-hotspot-detector/skill.yaml +1 -1
  87. package/dist/agents/skills/gemini-cli/harness-i18n/SKILL.md +484 -0
  88. package/dist/agents/skills/gemini-cli/harness-i18n/skill.yaml +54 -0
  89. package/dist/agents/skills/gemini-cli/harness-i18n-process/SKILL.md +388 -0
  90. package/dist/agents/skills/gemini-cli/harness-i18n-process/skill.yaml +43 -0
  91. package/dist/agents/skills/gemini-cli/harness-i18n-workflow/SKILL.md +512 -0
  92. package/dist/agents/skills/gemini-cli/harness-i18n-workflow/skill.yaml +53 -0
  93. package/dist/agents/skills/gemini-cli/harness-impact-analysis/SKILL.md +51 -6
  94. package/dist/agents/skills/gemini-cli/harness-knowledge-mapper/SKILL.md +46 -5
  95. package/dist/agents/skills/gemini-cli/harness-knowledge-mapper/skill.yaml +1 -1
  96. package/dist/agents/skills/gemini-cli/harness-perf/SKILL.md +37 -8
  97. package/dist/agents/skills/gemini-cli/harness-perf/skill.yaml +3 -0
  98. package/dist/agents/skills/gemini-cli/harness-perf-tdd/SKILL.md +17 -4
  99. package/dist/agents/skills/gemini-cli/harness-release-readiness/SKILL.md +29 -9
  100. package/dist/agents/skills/gemini-cli/harness-roadmap/SKILL.md +562 -0
  101. package/dist/agents/skills/gemini-cli/harness-roadmap/skill.yaml +43 -0
  102. package/dist/agents/skills/gemini-cli/harness-security-review/skill.yaml +8 -6
  103. package/dist/agents/skills/gemini-cli/harness-security-scan/skill.yaml +1 -1
  104. package/dist/agents/skills/gemini-cli/harness-soundness-review/SKILL.md +1267 -0
  105. package/dist/agents/skills/gemini-cli/harness-soundness-review/skill.yaml +48 -0
  106. package/dist/agents/skills/gemini-cli/harness-test-advisor/SKILL.md +35 -6
  107. package/dist/agents/skills/node_modules/.bin/vitest +2 -2
  108. package/dist/agents/skills/shared/design-knowledge/anti-patterns/color.yaml +106 -0
  109. package/dist/agents/skills/shared/design-knowledge/anti-patterns/layout.yaml +109 -0
  110. package/dist/agents/skills/shared/design-knowledge/anti-patterns/motion.yaml +109 -0
  111. package/dist/agents/skills/shared/design-knowledge/anti-patterns/typography.yaml +112 -0
  112. package/dist/agents/skills/shared/design-knowledge/industries/creative.yaml +80 -0
  113. package/dist/agents/skills/shared/design-knowledge/industries/ecommerce.yaml +80 -0
  114. package/dist/agents/skills/shared/design-knowledge/industries/emerging-tech.yaml +83 -0
  115. package/dist/agents/skills/shared/design-knowledge/industries/fintech.yaml +80 -0
  116. package/dist/agents/skills/shared/design-knowledge/industries/healthcare.yaml +80 -0
  117. package/dist/agents/skills/shared/design-knowledge/industries/lifestyle.yaml +80 -0
  118. package/dist/agents/skills/shared/design-knowledge/industries/saas.yaml +80 -0
  119. package/dist/agents/skills/shared/design-knowledge/industries/services.yaml +80 -0
  120. package/dist/agents/skills/shared/design-knowledge/palettes/curated.yaml +234 -0
  121. package/dist/agents/skills/shared/design-knowledge/platform-rules/android.yaml +125 -0
  122. package/dist/agents/skills/shared/design-knowledge/platform-rules/flutter.yaml +144 -0
  123. package/dist/agents/skills/shared/design-knowledge/platform-rules/ios.yaml +106 -0
  124. package/dist/agents/skills/shared/design-knowledge/platform-rules/web.yaml +102 -0
  125. package/dist/agents/skills/shared/design-knowledge/typography/pairings.yaml +274 -0
  126. package/dist/agents/skills/shared/i18n-knowledge/accessibility/intersection.yaml +142 -0
  127. package/dist/agents/skills/shared/i18n-knowledge/anti-patterns/encoding.yaml +67 -0
  128. package/dist/agents/skills/shared/i18n-knowledge/anti-patterns/formatting.yaml +106 -0
  129. package/dist/agents/skills/shared/i18n-knowledge/anti-patterns/layout.yaml +80 -0
  130. package/dist/agents/skills/shared/i18n-knowledge/anti-patterns/pluralization.yaml +80 -0
  131. package/dist/agents/skills/shared/i18n-knowledge/anti-patterns/string-handling.yaml +106 -0
  132. package/dist/agents/skills/shared/i18n-knowledge/frameworks/android-resources.yaml +47 -0
  133. package/dist/agents/skills/shared/i18n-knowledge/frameworks/apple-strings.yaml +47 -0
  134. package/dist/agents/skills/shared/i18n-knowledge/frameworks/backend-patterns.yaml +50 -0
  135. package/dist/agents/skills/shared/i18n-knowledge/frameworks/flutter-intl.yaml +47 -0
  136. package/dist/agents/skills/shared/i18n-knowledge/frameworks/i18next.yaml +47 -0
  137. package/dist/agents/skills/shared/i18n-knowledge/frameworks/react-intl.yaml +47 -0
  138. package/dist/agents/skills/shared/i18n-knowledge/frameworks/vue-i18n.yaml +47 -0
  139. package/dist/agents/skills/shared/i18n-knowledge/industries/ecommerce.yaml +66 -0
  140. package/dist/agents/skills/shared/i18n-knowledge/industries/fintech.yaml +66 -0
  141. package/dist/agents/skills/shared/i18n-knowledge/industries/gaming.yaml +69 -0
  142. package/dist/agents/skills/shared/i18n-knowledge/industries/healthcare.yaml +66 -0
  143. package/dist/agents/skills/shared/i18n-knowledge/industries/legal.yaml +66 -0
  144. package/dist/agents/skills/shared/i18n-knowledge/locales/ar.yaml +41 -0
  145. package/dist/agents/skills/shared/i18n-knowledge/locales/de.yaml +35 -0
  146. package/dist/agents/skills/shared/i18n-knowledge/locales/en.yaml +32 -0
  147. package/dist/agents/skills/shared/i18n-knowledge/locales/es.yaml +35 -0
  148. package/dist/agents/skills/shared/i18n-knowledge/locales/fi.yaml +35 -0
  149. package/dist/agents/skills/shared/i18n-knowledge/locales/fr.yaml +35 -0
  150. package/dist/agents/skills/shared/i18n-knowledge/locales/he.yaml +41 -0
  151. package/dist/agents/skills/shared/i18n-knowledge/locales/hi.yaml +35 -0
  152. package/dist/agents/skills/shared/i18n-knowledge/locales/it.yaml +32 -0
  153. package/dist/agents/skills/shared/i18n-knowledge/locales/ja.yaml +38 -0
  154. package/dist/agents/skills/shared/i18n-knowledge/locales/ko.yaml +38 -0
  155. package/dist/agents/skills/shared/i18n-knowledge/locales/nl.yaml +32 -0
  156. package/dist/agents/skills/shared/i18n-knowledge/locales/pl.yaml +35 -0
  157. package/dist/agents/skills/shared/i18n-knowledge/locales/pt.yaml +32 -0
  158. package/dist/agents/skills/shared/i18n-knowledge/locales/ru.yaml +35 -0
  159. package/dist/agents/skills/shared/i18n-knowledge/locales/sv.yaml +32 -0
  160. package/dist/agents/skills/shared/i18n-knowledge/locales/th.yaml +35 -0
  161. package/dist/agents/skills/shared/i18n-knowledge/locales/tr.yaml +35 -0
  162. package/dist/agents/skills/shared/i18n-knowledge/locales/zh-Hans.yaml +38 -0
  163. package/dist/agents/skills/shared/i18n-knowledge/locales/zh-Hant.yaml +35 -0
  164. package/dist/agents/skills/shared/i18n-knowledge/mcp-interop/i18next-mcp.yaml +56 -0
  165. package/dist/agents/skills/shared/i18n-knowledge/mcp-interop/lingo-dev.yaml +56 -0
  166. package/dist/agents/skills/shared/i18n-knowledge/mcp-interop/lokalise.yaml +60 -0
  167. package/dist/agents/skills/shared/i18n-knowledge/mcp-interop/tolgee.yaml +60 -0
  168. package/dist/agents/skills/shared/i18n-knowledge/testing/locale-testing.yaml +107 -0
  169. package/dist/agents/skills/shared/i18n-knowledge/testing/pseudo-localization.yaml +86 -0
  170. package/dist/bin/harness.js +64 -4
  171. package/dist/{chunk-UDWGSL3T.js → chunk-3JWCBVUZ.js} +3 -3
  172. package/dist/{chunk-IUFFBBYV.js → chunk-LNI4T7R6.js} +179 -61
  173. package/dist/{chunk-USEYPS7F.js → chunk-SJECMKSS.js} +2250 -40
  174. package/dist/{dist-4MYPT3OE.js → dist-BDO5GFEM.js} +295 -14
  175. package/dist/{dist-RBZXXJHG.js → dist-NT3GXHQZ.js} +95 -1
  176. package/dist/index.d.ts +266 -7
  177. package/dist/index.js +7 -3
  178. package/dist/validate-cross-check-2OPGCGGU.js +7 -0
  179. package/package.json +7 -7
  180. package/dist/validate-cross-check-CPEPNLOD.js +0 -7
@@ -0,0 +1,1267 @@
# Harness Soundness Review

> Deep soundness analysis of specs and plans. Auto-fixes inferrable issues, surfaces design decisions to you. Runs automatically before sign-off — no extra commands needed.

## When to Use

- Automatically invoked by harness-brainstorming before spec sign-off (`--mode spec`)
- Automatically invoked by harness-planning before plan sign-off (`--mode plan`)
- Manually invoked to review a spec or plan on demand
- NOT for reviewing implementation code (use harness-code-review)
- NOT as a replacement for mechanical validation (harness validate, check-deps remain as-is)
- NOT in CI — this is a design-time skill

## Arguments

- **`--mode spec`** — Run spec-mode checks (S1-S7) against a draft spec. Invoked by harness-brainstorming.
- **`--mode plan`** — Run plan-mode checks (P1-P7) against a draft plan. Invoked by harness-planning.

## Process

### Iron Law

**No spec or plan may be signed off without a converged soundness review. Inferrable fixes are applied silently. Design decisions are always surfaced to the user.**

---

### Finding Schema

Every finding produced by a check conforms to this structure:

```json
{
  "id": "string — unique identifier for this finding",
  "check": "string — check ID, e.g. S1, P3",
  "title": "string — one-line summary",
  "detail": "string — full explanation with evidence",
  "severity": "error | warning — errors block sign-off, warnings are advisory",
  "autoFixable": "boolean — whether this can be fixed without user input",
  "suggestedFix": "string | undefined — what the auto-fix would do, or suggestion for user",
  "evidence": ["string[] — references to spec/plan sections, codebase files"]
}
```
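In TypeScript terms, the schema can be sketched as an interface. This is illustrative only (the package is not known to export such a type); the sample instance mirrors the S2-001 example finding used later in this skill:

```typescript
// Illustrative sketch of the finding schema. The CLI is not known to export
// this exact type; field meanings follow the JSON schema above.
interface Finding {
  id: string;
  check: string;                 // check ID, e.g. "S1", "P3"
  title: string;                 // one-line summary
  detail: string;                // full explanation with evidence
  severity: "error" | "warning"; // errors block sign-off, warnings are advisory
  autoFixable: boolean;          // fixable without user input?
  suggestedFix?: string;
  evidence: string[];            // spec/plan sections, codebase files
}

// A well-formed instance, condensed from the S2-001 example finding:
const example: Finding = {
  id: "S2-001",
  check: "S2",
  title: "Goal has no success criterion",
  detail: "Overview goal 'Support offline mode' has no corresponding success criterion.",
  severity: "warning",
  autoFixable: true,
  suggestedFix: "Add a criterion covering offline reads.",
  evidence: ["Overview: 'Support offline mode'"],
};

// Advisory warnings never block sign-off; errors always do.
const blocksSignOff = example.severity === "error";
console.log(blocksSignOff);
```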

---

### Phase 1: CHECK — Run All Checks for Current Mode

Execute all checks for the active mode. Classify each finding as `autoFixable: true` or `autoFixable: false`. Record the total issue count.

#### Graph Detection and Fallback

Before running checks, determine graph availability:

1. Check whether `.harness/graph/` exists in the project root.
2. If the directory exists, the following MCP tools are available for enhanced analysis during checks S3, S5, P1, P3, and P4:
   - `query_graph` — traverse module and dependency nodes to verify referenced patterns exist and check architectural compatibility
   - `find_context_for` — search the graph for related design decisions and assumptions from other specs
   - `get_relationships` — get inbound/outbound relationships for a node to verify dependency direction and layer compliance
   - `get_impact` — analyze downstream impact of file changes to verify dependency completeness and detect indirect conflicts
3. If the directory does not exist, use the "Without graph" path for every check. Do not block, warn, or degrade the review — all checks produce useful results from document analysis and codebase reads alone.

The per-check procedures below include "Without graph" and "With graph" variants. Use the variant matching the detection result from step 1.
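Step 1 is a plain directory-existence check. A minimal sketch, assuming a Node.js runtime (the helper name and signature are illustrative, not part of the CLI):

```typescript
import { existsSync, mkdtempSync, mkdirSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Illustrative helper; name and signature are not part of the CLI.
// Graph-backed check variants are enabled only if .harness/graph/ exists.
function isGraphAvailable(projectRoot: string): boolean {
  return existsSync(join(projectRoot, ".harness", "graph"));
}

// Demo against a throwaway project root:
const root = mkdtempSync(join(tmpdir(), "harness-demo-"));
console.log(isGraphAvailable(root)); // → false: use the "Without graph" path
mkdirSync(join(root, ".harness", "graph"), { recursive: true });
console.log(isGraphAvailable(root)); // → true: MCP graph tools may be used
```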

#### Spec Mode Checks (`--mode spec`)

| # | Check | What it detects | Auto-fixable? |
| --- | --- | --- | --- |
| S1 | Internal coherence | Contradictions between decisions, technical design, and success criteria | No — surface to user |
| S2 | Goal-criteria traceability | Goals without success criteria; orphan criteria not tied to any goal | Yes — add missing links, flag orphans |
| S3 | Unstated assumptions | Implicit assumptions in the design not called out (e.g., single-tenant, always-online) | Partially — infer and add obvious ones, surface ambiguous |
| S4 | Requirement completeness | Missing error cases, edge cases, failure modes; apply EARS patterns for unwanted-behavior gaps | Partially — add obvious error cases, surface design-dependent |
| S5 | Feasibility red flags | Design depends on nonexistent codebase capabilities or incompatible patterns | No — surface to user with evidence |
| S6 | YAGNI re-scan | Speculative features that crept in during conversation | No — surface to user (removing features is a design decision) |
| S7 | Testability | Vague success criteria that are not observable or measurable ("should be fast") | Yes — add thresholds/specificity where inferrable |

##### S1 Internal Coherence

**What to analyze:** Decisions table, Technical Design section, Success Criteria section, Non-goals section.

**How to detect:**

1. For each decision in the Decisions table, verify it is consistent with the Technical Design. A decision that says "use approach A" while the Technical Design describes approach B is a contradiction.
2. For each success criterion, verify it does not contradict a decision or a non-goal. A criterion that requires behavior explicitly excluded by a non-goal is a contradiction.
3. For each non-goal, verify no part of the Technical Design implements it. A non-goal with a corresponding implementation section is a contradiction.
4. Flag any pair where one section asserts X and another section asserts not-X or a conflicting approach.

**Finding classification:** Always `severity: "error"`, always `autoFixable: false`. Contradictions are design decisions — the user must resolve which side is correct.

**Example finding:**

```json
{
  "id": "S1-001",
  "check": "S1",
  "title": "Decision contradicts Technical Design",
  "detail": "Decision D3 says 'use SQLite for local storage' but Technical Design section 'Data Layer' describes a PostgreSQL schema with migrations. These are incompatible storage approaches.",
  "severity": "error",
  "autoFixable": false,
  "suggestedFix": "Align the Technical Design with the decision (use SQLite) or update the decision to reflect the PostgreSQL approach.",
  "evidence": [
    "Decisions table, row D3: 'Use SQLite for local storage'",
    "Technical Design > Data Layer: 'PostgreSQL schema with up/down migrations'"
  ]
}
```

##### S2 Goal-Criteria Traceability

**What to analyze:** Overview section (goals), Success Criteria section.

**How to detect:**

1. Extract the stated goals from the Overview section. Goals are typically phrased as desired outcomes or capabilities the system should have after implementation.
2. For each goal, check that at least one success criterion covers it. A goal without a corresponding criterion is a **gap** — there is no way to verify the goal was achieved.
3. For each success criterion, check that it traces back to a stated goal or an explicit design decision. A criterion with no corresponding goal is an **orphan** — it may be testing something that was never requested.
4. Flag gaps (goals without criteria) and orphans (criteria without goals) separately, as they have different fix strategies.
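Steps 2 and 3 amount to a set difference in each direction. A minimal sketch with illustrative data shapes (the actual skill works on markdown sections, not structured arrays):

```typescript
// Illustrative shapes only; the real check extracts these from the spec text.
interface Criterion { text: string; coversGoal?: string }

const goals = ["Support offline mode", "Fast startup"];
const criteria: Criterion[] = [
  { text: "Cold start completes in under 500ms", coversGoal: "Fast startup" },
  { text: "All API responses include request-id headers" }, // no goal link
];

// Step 2: goals with no covering criterion are gaps (auto-fixable).
const gaps = goals.filter(g => !criteria.some(c => c.coversGoal === g));

// Step 3: criteria tracing to no stated goal are orphans (surfaced to user).
const orphans = criteria.filter(
  c => !c.coversGoal || !goals.includes(c.coversGoal)
);

console.log(gaps);                       // → ["Support offline mode"]
console.log(orphans.map(o => o.text));   // → ["All API responses include request-id headers"]
```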
117
+
118
+ **Finding classification:**
119
+
120
+ - Missing traceability links (goals without criteria): `severity: "warning"`, `autoFixable: true`. The fix is to add a new success criterion that covers the uncovered goal, derived from the Technical Design context.
121
+ - Orphan criteria (criteria without goals): `severity: "warning"`, `autoFixable: false`. Removing or reassigning criteria is a design decision — the criterion may be intentional prerequisite work.
122
+
123
+ **Example findings:**
124
+
125
+ ```json
126
+ {
127
+ "id": "S2-001",
128
+ "check": "S2",
129
+ "title": "Goal has no success criterion",
130
+ "detail": "Overview goal 'Support offline mode' has no corresponding success criterion. There is no way to verify this goal was achieved.",
131
+ "severity": "warning",
132
+ "autoFixable": true,
133
+ "suggestedFix": "Add criterion: 'The application functions without network connectivity for all read operations, returning cached data when available.'",
134
+ "evidence": ["Overview: 'Support offline mode'", "Success Criteria: no match found"]
135
+ }
136
+ ```
137
+
138
+ ```json
139
+ {
140
+ "id": "S2-002",
141
+ "check": "S2",
142
+ "title": "Orphan criterion not tied to any goal",
143
+ "detail": "Success criterion 7 ('All API responses include request-id headers') does not trace to any stated goal in the Overview. It may be an operational requirement that should be added as a goal, or it may be out of scope.",
144
+ "severity": "warning",
145
+ "autoFixable": false,
146
+ "suggestedFix": "Either add a corresponding goal to the Overview (e.g., 'Support request tracing for debugging') or remove this criterion if it is out of scope.",
147
+ "evidence": [
148
+ "Success Criteria #7: 'All API responses include request-id headers'",
149
+ "Overview: no matching goal found"
150
+ ]
151
+ }
152
+ ```
153
+
154
+ ##### S3 Unstated Assumptions
155
+
156
+ **What to analyze:** Technical Design section, Decisions table, data structures, integration points.
157
+
158
+ **How to detect:**
159
+
160
+ - **Document analysis:** Scan for implicit assumptions about runtime environment (single-process, always-online, specific OS), data characteristics (fits in memory, UTF-8 only, no concurrent access), deployment model (single-tenant, monolith, specific cloud provider), and user context (has admin access, uses specific tools). Check whether the spec explicitly states or acknowledges these assumptions.
161
+ - **Without graph (codebase reads):** Read referenced source files (from Technical Design) to identify conventions the spec assumes but does not state (e.g., "uses the existing email utility" — does that utility exist? Does it have the expected interface?). Use Grep/Glob to verify referenced patterns and modules exist.
162
+ - **With graph:** Use `query_graph` to find related modules and their documented assumptions. Use `find_context_for` to surface design decisions from related specs that may conflict.
163
+
164
+ **Finding classification:**
165
+
166
+ - Obvious assumptions (e.g., Node.js runtime, filesystem access, UTF-8 encoding): `severity: "warning"`, `autoFixable: true`. The fix is to add them to an explicit Assumptions section in the spec.
167
+ - Ambiguous assumptions (e.g., single-tenant vs multi-tenant, concurrency model, deployment topology): `severity: "warning"`, `autoFixable: false`. The user must decide which assumption is correct.
168
+
169
+ **Example findings:**
170
+
171
+ ```json
172
+ {
173
+ "id": "S3-001",
174
+ "check": "S3",
175
+ "title": "Implicit Node.js runtime assumption",
176
+ "detail": "Technical Design references 'path.join' and 'fs.readFileSync' without stating the runtime environment. The spec assumes Node.js but does not declare it.",
177
+ "severity": "warning",
178
+ "autoFixable": true,
179
+ "suggestedFix": "Add to Assumptions section: 'Runtime: Node.js >= 18.x (LTS). The implementation uses Node.js built-in modules (fs, path, child_process).'",
180
+ "evidence": [
181
+ "Technical Design > File Operations: uses path.join, fs.readFileSync",
182
+ "No Assumptions section found in spec"
183
+ ]
184
+ }
185
+ ```
186
+
187
+ ```json
188
+ {
189
+ "id": "S3-002",
190
+ "check": "S3",
191
+ "title": "Ambiguous concurrency model",
192
+ "detail": "Technical Design describes a background job processor but does not specify whether it runs in-process (setTimeout/setInterval), as a separate worker thread, or as an independent process. This affects error isolation, memory limits, and deployment.",
193
+ "severity": "warning",
194
+ "autoFixable": false,
195
+ "suggestedFix": "Add a decision to the Decisions table specifying the concurrency model: in-process event loop, worker_threads, or separate process.",
196
+ "evidence": [
197
+ "Technical Design > Job Processor: 'processes background jobs'",
198
+ "Decisions table: no entry for concurrency model"
199
+ ]
200
+ }
201
+ ```
202
+
203
+ ##### S4 Requirement Completeness
204
+
205
+ **What to analyze:** Technical Design section (especially data structures, API endpoints, integration points), Success Criteria section.
206
+
207
+ **How to detect:**
208
+
209
+ - **Error cases:** For each data structure, identify what happens when fields are missing, null, or malformed. For each API endpoint or function, identify error responses. Flag any operation that has no defined error behavior.
210
+ - **Edge cases:** For each numeric field, check if boundary values are specified (zero, negative, overflow). For each string field, check if empty string, very long string, and special character handling is defined. For each collection, check if empty collection behavior is defined.
211
+ - **Failure modes:** For each external dependency (network call, file I/O, third-party service), check if timeout, unavailability, and partial failure behaviors are defined. Apply the EARS "Unwanted" pattern: "If [failure condition], then the system shall [graceful behavior]."
212
+ - **Codebase context:** Read referenced modules to identify error patterns already established in the codebase that the spec should follow.
213
+
214
+ **Finding classification:**
215
+
216
+ - Obvious error cases (missing error handling for file I/O, network calls, JSON parsing): `severity: "warning"`, `autoFixable: true`. The fix is to add the error case following established codebase patterns.
217
+ - Design-dependent error handling (what to do when a service is down — retry? cache? fail?): `severity: "warning"`, `autoFixable: false`. The user must decide the error strategy.
218
+
219
+ **Example findings:**
220
+
221
+ ```json
222
+ {
223
+ "id": "S4-001",
224
+ "check": "S4",
225
+ "title": "Missing file-not-found error case",
226
+ "detail": "Technical Design describes reading a config file with fs.readFileSync but does not specify behavior when the file does not exist. The codebase convention (see packages/core/src/config.ts) is to return a default config object.",
227
+ "severity": "warning",
228
+ "autoFixable": true,
229
+ "suggestedFix": "Add error case: 'If the config file does not exist (ENOENT), return the default configuration object. Log a debug message indicating defaults are being used.'",
230
+ "evidence": [
231
+ "Technical Design > Configuration: 'read config from harness.config.json'",
232
+ "No error handling specified for missing file",
233
+ "Codebase pattern: packages/core/src/config.ts returns defaults on ENOENT"
234
+ ]
235
+ }
236
+ ```
237
+
238
+ ```json
239
+ {
240
+ "id": "S4-002",
241
+ "check": "S4",
242
+ "title": "Undefined retry strategy for external service",
243
+ "detail": "Technical Design describes calling an external API for license validation but does not specify behavior when the API is unavailable, times out, or returns an error. This is a design decision that affects user experience (block vs. degrade gracefully).",
244
+ "severity": "warning",
245
+ "autoFixable": false,
246
+ "suggestedFix": "Add a decision: 'When the license API is unavailable: (a) fail open — allow usage with a warning, (b) fail closed — block usage until validated, or (c) cache — use last known result for N hours.'",
247
+ "evidence": [
248
+ "Technical Design > License Check: 'call /api/validate on startup'",
249
+ "No timeout, retry, or fallback behavior specified"
250
+ ]
251
+ }
252
+ ```
253
+
254
##### S5 Feasibility Red Flags

**What to analyze:** Technical Design section (referenced modules, dependencies, patterns, APIs).

**How to detect:**

- **Without graph (codebase reads):** For each module, function, or class referenced in the Technical Design, use Glob/Grep to verify it exists in the codebase. For each API or interface referenced, read the source to verify the expected signature matches. For each pattern referenced ("uses the existing X"), verify X exists and has the capabilities assumed. Flag references to nonexistent modules, functions with different signatures than assumed, or patterns incompatible with the codebase architecture.
- **With graph:** Use `query_graph` to verify referenced modules exist and check their dependency relationships. Use `get_relationships` to verify architectural compatibility (e.g., a module in layer A should not depend on layer B). Use `get_impact` to assess whether the proposed changes have cascading effects not accounted for in the spec.

**Finding classification:** Always `severity: "error"`, always `autoFixable: false`. Feasibility problems require the user to revise the technical design.

**Example finding:**

```json
{
  "id": "S5-001",
  "check": "S5",
  "title": "Referenced function has different signature",
  "detail": "Technical Design says 'call validateDependencies(projectPath)' but the actual function signature is 'validateDependencies(config: ProjectConfig): ValidationResult'. The spec assumes a simpler interface than what exists.",
  "severity": "error",
  "autoFixable": false,
  "suggestedFix": "Update the Technical Design to use the actual signature: validateDependencies(config) where config is a ProjectConfig object. This may require adding a config construction step before the call.",
  "evidence": [
    "Technical Design > Validation: 'call validateDependencies(projectPath)'",
    "packages/core/src/validator.ts:42: export function validateDependencies(config: ProjectConfig): ValidationResult"
  ]
}
```

##### S6 YAGNI Re-scan

**What to analyze:** Technical Design section, Decisions table, Implementation Order.

**How to detect:**

1. For each technical component, interface, or configuration option described in Technical Design, check whether it is required by a stated goal or success criterion. Flag components that exist "for future use", "in case we need", or that implement functionality explicitly listed in Non-goals.
2. Flag decision rationale that references hypothetical future requirements rather than current needs (e.g., "we might need this later", "for extensibility", "in case the requirements change").
3. Flag configuration options that toggle features not yet defined in any goal or criterion.
4. Flag abstraction layers or interfaces introduced solely for "flexibility" without a concrete current consumer.

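Steps 1 and 2 above reduce to a phrase scan over the design text. A hedged sketch (the phrase list is taken directly from the indicators named in this section; a real review would read for meaning, not just match strings):

```python
import re

# Speculative-future phrases from the S6 indicators above.
SPECULATIVE = [
    r"for future use",
    r"in case we need",
    r"we might need",
    r"for extensibility",
    r"in case the requirements change",
]

def yagni_hits(design_text: str) -> list[str]:
    """Return the speculative phrases found in a Technical Design section."""
    return [p for p in SPECULATIVE
            if re.search(p, design_text, re.IGNORECASE)]

hits = yagni_hits("Add pluginDir for extensibility, in case we need plugins.")
```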
294
**Finding classification:** Always `severity: "warning"`, always `autoFixable: false`. Removing speculative features is a design decision — the user must decide whether the feature is truly needed now or can be deferred.

**Example finding:**

```json
{
  "id": "S6-001",
  "check": "S6",
  "title": "Speculative configuration option",
  "detail": "Technical Design defines a 'pluginDir' configuration option for loading third-party plugins, but no goal or success criterion mentions plugins. The Non-goals section does not exclude plugins, but no current requirement needs them.",
  "severity": "warning",
  "autoFixable": false,
  "suggestedFix": "Remove the pluginDir configuration option and plugin loading logic from the Technical Design. If plugin support is needed later, it can be added in a future spec.",
  "evidence": [
    "Technical Design > Configuration: 'pluginDir: string — directory for third-party plugins'",
    "Overview goals: no mention of plugins",
    "Success Criteria: no criterion references plugins"
  ]
}
```

##### S7 Testability

**What to analyze:** Success Criteria section.

**How to detect:**

1. For each success criterion, evaluate whether it is observable and measurable. A testable criterion describes a specific behavior that can be verified with a concrete test or measurement.
2. Flag criteria that use vague qualifiers without specific thresholds: "should be fast", "handles errors well", "is user-friendly", "scales appropriately", "is robust", "performs efficiently".
3. Flag criteria that describe internal implementation details rather than externally observable outcomes (e.g., "uses a clean architecture" — what does "clean" mean in observable terms?).
4. For vague criteria where the Technical Design provides context, infer a specific threshold. For example, if the Technical Design mentions a 100ms timeout, "should be fast" can be replaced with "responds within 100ms".

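The qualifier scan in step 2 can be sketched as a lookup from vague phrases to the question the criterion leaves unanswered. A minimal illustration (the qualifier list comes from this section; the paired questions are an assumption about what a reviewer would ask):

```python
import re

# Vague qualifiers from the S7 check, mapped to the clarifying question
# that must be answered before the criterion becomes testable.
VAGUE = {
    r"\bfast\b": "fast compared to what threshold?",
    r"\bhandles errors well\b": "which errors, with what behavior?",
    r"\buser-friendly\b": "observable how?",
    r"\bscales?\b": "to what load, measured how?",
    r"\brobust\b": "against which failure modes?",
}

def vague_qualifiers(criterion: str) -> list[str]:
    """Return the clarifying questions raised by a success criterion."""
    return [question for pattern, question in VAGUE.items()
            if re.search(pattern, criterion, re.IGNORECASE)]

qs = vague_qualifiers("the build should be fast and robust")
```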
326
**Finding classification:**

- Vague criteria with inferrable thresholds (context in Technical Design provides a specific number or behavior): `severity: "warning"`, `autoFixable: true`. The fix is to replace the vague qualifier with the specific threshold or observable behavior derived from the Technical Design.
- Criteria that are fundamentally unmeasurable (subjective quality, aesthetic judgment, no Technical Design context to infer from): `severity: "error"`, `autoFixable: false`. The user must rewrite the criterion to be observable.

**Example findings:**

```json
{
  "id": "S7-001",
  "check": "S7",
  "title": "Vague performance criterion",
  "detail": "Success criterion 3 says 'the build should be fast' without specifying a threshold. The Technical Design mentions a 30-second CI timeout, suggesting a concrete threshold exists.",
  "severity": "warning",
  "autoFixable": true,
  "suggestedFix": "Replace 'the build should be fast' with 'the build completes in under 30 seconds on CI (as specified in Technical Design > CI Configuration).'",
  "evidence": [
    "Success Criteria #3: 'the build should be fast'",
    "Technical Design > CI Configuration: '30-second timeout'"
  ]
}
```

```json
{
  "id": "S7-002",
  "check": "S7",
  "title": "Unmeasurable quality criterion",
  "detail": "Success criterion 8 says 'the code is clean and maintainable'. This is a subjective judgment with no observable behavior. There is no Technical Design context to infer specific metrics.",
  "severity": "error",
  "autoFixable": false,
  "suggestedFix": "Rewrite with observable criteria, e.g., 'all functions are under 50 lines, cyclomatic complexity under 10, no eslint warnings' or remove if covered by existing linting rules.",
  "evidence": [
    "Success Criteria #8: 'the code is clean and maintainable'",
    "No Technical Design context for measurable thresholds"
  ]
}
```

#### Plan Mode Checks (`--mode plan`)

| # | Check | What it detects | Auto-fixable? |
| --- | --- | --- | --- |
| P1 | Spec-plan coverage | Success criteria from spec with no corresponding task(s) | Yes — add missing tasks |
| P2 | Task completeness | Tasks missing clear inputs, outputs, or verification criteria | Yes — infer from context and fill in |
| P3 | Dependency correctness | Cycles in dependency graph; task B uses output of A but does not declare dependency | Yes — add missing dependency edges |
| P4 | Ordering sanity | Tasks touching same files scheduled in parallel; consumers before producers | Yes — reorder |
| P5 | Risk coverage | Spec risks without mitigation in plan (no task or explicit acceptance) | Partially — add mitigation tasks for obvious risks, surface others |
| P6 | Scope drift | Plan tasks not traceable to any spec requirement | No — surface to user (might be intentional prerequisite work) |
| P7 | Task-level feasibility | Tasks requiring decisions not made in brainstorming; tasks too vague to execute | No — surface to user |

##### P1 Spec-Plan Coverage

**What to analyze:** The spec's Success Criteria section and the plan's Tasks section. Requires access to both the spec document (referenced in the plan header) and the plan being reviewed.

**How to detect:**

- Without graph: Extract each numbered success criterion from the spec. For each criterion, search the plan's task descriptions, verification steps, and observable truths for text that covers the criterion. A criterion is "covered" if at least one task's verification step or observable truth would confirm the criterion is met. Flag any criterion with no corresponding task coverage.
- With graph: If graph traceability edges exist between spec criteria and plan tasks, use those edges to verify coverage. Flag criteria with no inbound traceability edge.

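A minimal sketch of the no-graph path: flag criteria whose key terms appear in no task's text. The token-overlap threshold here is an illustrative assumption; the actual check reads for meaning, not string overlap:

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased content words of four or more characters."""
    return set(re.findall(r"[a-z][a-z-]{3,}", text.lower()))

def uncovered(criteria: list[str], task_texts: list[str],
              min_overlap: int = 2) -> list[str]:
    """Criteria sharing fewer than min_overlap key terms with every task."""
    return [c for c in criteria
            if not any(len(tokens(c) & tokens(t)) >= min_overlap
                       for t in task_texts)]

missing = uncovered(
    ["All API errors return structured error responses with request-id"],
    ["Task 3: add API routes", "Task 4: write route tests"],
)
```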
386
**Finding classification:** Always `severity: "error"`, always `autoFixable: true`. The fix is to add a new task (or extend an existing task's verification step) that covers the uncovered criterion.

**Example finding:**

```json
{
  "id": "P1-001",
  "check": "P1",
  "title": "Spec criterion not covered by any plan task",
  "detail": "Success criterion #4 ('All API errors return structured error responses with request-id') has no corresponding task in the plan. No task's verification step or observable truth would confirm this criterion is met.",
  "severity": "error",
  "autoFixable": true,
  "suggestedFix": "Add a new task that implements structured error responses and verifies request-id headers are included. Place it after the API route tasks (Task 3) with appropriate dependencies.",
  "evidence": [
    "Spec Success Criteria #4: 'All API errors return structured error responses with request-id'",
    "Plan Tasks 1-8: no task references error response format or request-id"
  ]
}
```

##### P2 Task Completeness

**What to analyze:** Each task in the plan's Tasks section.

**How to detect:** For each task, verify it has: (a) clear inputs (what files/artifacts the task reads or depends on), (b) clear outputs (what files the task creates or modifies), (c) a verification criterion (a test command, observable behavior, or check that confirms the task succeeded). Flag tasks missing any of these three elements.

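The three-element check can be sketched as a marker scan over a task's text. The marker strings below are assumptions drawn from the plan conventions this skill references ("Depends on:", "Create", "Run:"); a real plan format may use different headers:

```python
def missing_elements(task_text: str) -> list[str]:
    """Return which of inputs/outputs/verification a task's text lacks."""
    probes = {
        "inputs": ("Depends on:", "Input:", "Reads:"),
        "outputs": ("Create ", "Modify ", "Output:"),
        "verification": ("Run:", "Verify:", "Check:"),
    }
    return [name for name, markers in probes.items()
            if not any(marker in task_text for marker in markers)]

task3 = "Task 3: Create src/services/notification-service.ts. Depends on: Task 1."
gaps = missing_elements(task3)
```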
412
**Finding classification:** Always `severity: "warning"`, always `autoFixable: true`. The fix is to infer the missing element from the task description and surrounding context (e.g., if a task says "create src/foo.ts" but has no verification, add "Run: `npx vitest run src/foo.test.ts`" if a test file exists in the plan, or "Run: `tsc --noEmit`" as a minimal verification).

**Example finding:**

```json
{
  "id": "P2-001",
  "check": "P2",
  "title": "Task missing verification criterion",
  "detail": "Task 3 ('Create notification service') specifies inputs (notification types from Task 1) and outputs (src/services/notification-service.ts) but has no verification criterion. There is no test command, observable behavior, or check that confirms the task succeeded.",
  "severity": "warning",
  "autoFixable": true,
  "suggestedFix": "Add verification: 'Run: `npx vitest run src/services/notification-service.test.ts`' (test file exists in Task 4 of the plan).",
  "evidence": [
    "Task 3: no 'Run:', 'Verify:', or 'Check:' step found",
    "Task 4 creates src/services/notification-service.test.ts — can be referenced as verification"
  ]
}
```

##### P3 Dependency Correctness

**What to analyze:** The "Depends on" declarations across all tasks, and the file paths / artifacts referenced in each task.

**How to detect:**

- Build a dependency graph from all "Depends on: Task N" declarations.
- **Cycle detection:** Run a topological sort on the graph. If the sort fails, a cycle exists. Report the cycle as the set of tasks involved (e.g., "Task 3 -> Task 5 -> Task 3").
- **Missing edges:** For each task, extract the files it reads or imports. If a file is created by another task (check the File Map), verify the creating task is declared as a dependency. Flag missing edges.
- Without graph (static analysis): Parse file paths from task descriptions ("Create src/types/foo.ts", "Modify src/services/bar.ts") and match creators to consumers.
- With graph: Use `get_impact` on each task's output files to verify that all downstream consumers are declared as dependents. Graph edges provide more accurate dependency data than text parsing.

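The cycle check is a standard topological sort. A self-contained sketch using Kahn's algorithm (the task IDs and dependency map are hypothetical; any traversal that detects unsortable nodes works equally well):

```python
from collections import deque

def find_cycle_members(deps: dict[int, set[int]]) -> set[int]:
    """deps maps task -> tasks it depends on. Returns tasks stuck in cycles.

    Kahn's algorithm: repeatedly remove tasks whose dependencies are all
    resolved; whatever cannot be removed is part of (or behind) a cycle.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    ready = deque(task for task, d in remaining.items() if not d)
    while ready:
        task = ready.popleft()
        del remaining[task]
        for other, d in remaining.items():
            if task in d:
                d.discard(task)
                if not d:
                    ready.append(other)
    return set(remaining)

# Task 3 and Task 5 depend on each other; Tasks 1 and 2 are fine.
cycle = find_cycle_members({3: {5}, 5: {3}, 1: set(), 2: {1}})
```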
444
**Finding classification:**

- Cycles: `severity: "error"`, `autoFixable: false`. Cycles indicate a decomposition error that requires restructuring tasks. Surface to user.
- Missing dependency edges: `severity: "warning"`, `autoFixable: true`. The fix is to add the missing "Depends on" declaration to the consuming task.

**Example findings:**

```json
{
  "id": "P3-001",
  "check": "P3",
  "title": "Dependency cycle detected",
  "detail": "Tasks form a cycle: Task 3 depends on Task 5, Task 5 depends on Task 3. Topological sort fails. These tasks cannot be executed in any valid order without restructuring.",
  "severity": "error",
  "autoFixable": false,
  "suggestedFix": "Break the cycle by merging Tasks 3 and 5 into a single task, or by extracting the shared dependency into a new task that both depend on.",
  "evidence": [
    "Task 3: 'Depends on: Task 5'",
    "Task 5: 'Depends on: Task 3'",
    "Topological sort failed — cycle: Task 3 -> Task 5 -> Task 3"
  ]
}
```

```json
{
  "id": "P3-002",
  "check": "P3",
  "title": "Missing dependency edge",
  "detail": "Task 5 imports from 'src/types/notification.ts' which is created by Task 1, but Task 5 does not declare 'Depends on: Task 1'. If Task 5 runs before Task 1, it will fail.",
  "severity": "warning",
  "autoFixable": true,
  "suggestedFix": "Add 'Depends on: Task 1' to Task 5's header.",
  "evidence": [
    "Task 5: imports src/types/notification.ts",
    "File Map: src/types/notification.ts created by Task 1",
    "Task 5 'Depends on' line: 'Depends on: Task 4' (Task 1 not listed)"
  ]
}
```

##### P4 Ordering Sanity

**What to analyze:** The task execution order (numbering and dependency graph), the file paths each task touches, and any parallel opportunities declared.

**How to detect:**

- **File conflict detection:** Extract file paths from each task. If two tasks touch the same file and are not sequenced by a dependency edge (one could run before the other), flag them as a potential conflict. Tasks touching the same file must be ordered.
- **Consumer-before-producer:** If Task A creates a type or export that Task B imports, but Task B has a lower number and no dependency on Task A, the consumer is scheduled before the producer. Flag the ordering violation.
- Without graph: Parse file paths from task descriptions and the File Map. Build a file-to-task mapping and check for conflicts.
- With graph: Use graph file ownership data to get accurate file-to-module mappings. This catches indirect conflicts (e.g., two tasks modify different files in the same module, and the module has a single barrel export that both affect).

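The file-conflict step builds a file-to-task map and flags unsequenced pairs. A minimal sketch (task numbers, file paths, and the `ordered` edge set are hypothetical inputs):

```python
from collections import defaultdict
from itertools import combinations

def file_conflicts(task_files: dict[int, set[str]],
                   ordered: set[tuple[int, int]]) -> list[tuple[int, int, str]]:
    """Pairs of tasks touching the same file with no ordering edge between them.

    task_files: task number -> files it creates or modifies.
    ordered: pairs (a, b) meaning a is sequenced before b via dependencies.
    """
    owners: dict[str, set[int]] = defaultdict(set)
    for task, files in task_files.items():
        for path in files:
            owners[path].add(task)
    conflicts = []
    for path, tasks in owners.items():
        for a, b in combinations(sorted(tasks), 2):
            if (a, b) not in ordered and (b, a) not in ordered:
                conflicts.append((a, b, path))
    return conflicts

# Tasks 2 and 4 both modify the same file and declare no ordering.
c = file_conflicts({2: {"src/routes/index.ts"}, 4: {"src/routes/index.ts"}},
                   ordered=set())
```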
496
**Finding classification:** Always `severity: "warning"`, always `autoFixable: true`. The fix is to reorder the tasks (update task numbers and "Depends on" declarations) so that producers come before consumers and file-conflicting tasks are sequenced.

**Example finding:**

```json
{
  "id": "P4-001",
  "check": "P4",
  "title": "Consumer task scheduled before producer",
  "detail": "Task 2 imports from 'src/types/user.ts' which is created by Task 4. Task 2 has no dependency on Task 4, so it could execute first and fail on the missing import.",
  "severity": "warning",
  "autoFixable": true,
  "suggestedFix": "Add 'Depends on: Task 4' to Task 2, or reorder so the type definition task (currently Task 4) comes before Task 2.",
  "evidence": [
    "Task 2: 'import { User } from src/types/user.ts'",
    "Task 4: 'Create src/types/user.ts with User interface'",
    "Task 2 'Depends on': 'none' (Task 4 not listed)"
  ]
}
```

##### P5 Risk Coverage

**What to analyze:** The spec's risk-related content (any section mentioning risks, caveats, concerns, open questions) and the plan's tasks and checkpoints.

**How to detect:** Identify risks stated in the spec. These appear in: explicit "Risks" sections, decision rationale mentioning tradeoffs, success criteria that imply failure modes, non-goals that have adjacent risk (e.g., "not in CI" implies no automated gate). For each identified risk, check whether the plan contains: (a) a task that directly mitigates it, (b) a checkpoint that acknowledges it, or (c) an explicit "accepted risk" note. Flag risks with no coverage.

**Finding classification:**

- Obvious mitigation (the risk is technical and a straightforward task addresses it, e.g., "add error handling for X"): `severity: "warning"`, `autoFixable: true`. The fix is to add a mitigation task or extend an existing task's verification step.
- Judgment-dependent mitigation (the risk involves a design tradeoff, e.g., "performance vs correctness" or "scope vs timeline"): `severity: "warning"`, `autoFixable: false`. Surface to user with mitigation options.

**Example findings:**

```json
{
  "id": "P5-001",
  "check": "P5",
  "title": "Spec risk has no mitigation in plan",
  "detail": "The spec identifies 'convergence loop may not terminate' as a risk in the Risks section, but no plan task tests termination behavior. The mitigation is straightforward: add a test that verifies the loop terminates on fixed-point inputs.",
  "severity": "warning",
  "autoFixable": true,
  "suggestedFix": "Add a task that tests convergence termination with inputs that produce a fixed point (zero auto-fixable findings on first pass).",
  "evidence": [
    "Spec Risks: 'The convergence loop may not terminate if auto-fixes oscillate'",
    "Plan Tasks 1-8: no task references termination testing or loop bounds"
  ]
}
```

```json
{
  "id": "P5-002",
  "check": "P5",
  "title": "Risk requires design judgment to mitigate",
  "detail": "The spec notes 'auto-fix may introduce new issues' as a risk. Mitigation depends on a design choice: (a) add a rollback mechanism, (b) limit auto-fixes to one pass, or (c) require human approval for cascading fixes. This is a design tradeoff the user must decide.",
  "severity": "warning",
  "autoFixable": false,
  "suggestedFix": "Choose a mitigation strategy: (a) rollback mechanism — add undo capability, (b) single-pass limit — simpler but less thorough, (c) human gate — safer but slower.",
  "evidence": [
    "Spec Risks: 'Auto-fixes may introduce new issues in subsequent passes'",
    "No mitigation strategy specified in spec Decisions table"
  ]
}
```

##### P6 Scope Drift

**What to analyze:** The plan's tasks and the spec's goals, success criteria, and technical design.

**How to detect:** For each plan task, check whether it is traceable to a spec requirement. A task is traceable if it (a) directly implements a success criterion, (b) is a necessary prerequisite for a task that implements a criterion (type definitions, shared utilities), or (c) is infrastructure work explicitly called for in the spec's implementation order. Flag tasks that cannot be traced to any spec requirement.

**Finding classification:** Always `severity: "warning"`, always `autoFixable: false`. Untraceable tasks might be intentional prerequisite work that the planner identified as necessary. The user must confirm whether each flagged task is in scope or should be removed.

**Example finding:**

```json
{
  "id": "P6-001",
  "check": "P6",
  "title": "Plan task not traceable to spec requirement",
  "detail": "Task 8 ('Add Redis caching layer for API responses') is not traceable to any spec goal, success criterion, or technical design section. The spec does not mention caching, Redis, or response-time optimization.",
  "severity": "warning",
  "autoFixable": false,
  "suggestedFix": "Either (a) remove Task 8 if caching is not needed for the current scope, or (b) add a corresponding goal and success criterion to the spec if caching is a genuine requirement.",
  "evidence": [
    "Task 8: 'Add Redis caching layer for API responses'",
    "Spec goals: no mention of caching or performance optimization",
    "Spec success criteria: no criterion references response time or caching",
    "Spec technical design: no caching architecture described"
  ]
}
```

##### P7 Task-Level Feasibility

**What to analyze:** Each task's description, file paths, code snippets, and referenced decisions.

**How to detect:**

- **Undecided dependencies:** Check whether any task requires a design decision that was not made during brainstorming. Indicators: task description says "depending on the approach chosen", "if we go with option A", or references a decision not present in the spec's Decisions table.
- **Vague instructions:** Check whether any task lacks the specificity required by the harness-planning iron law ("every task must be completable in one context window"). Indicators: task says "implement the service" without specifying which functions, "add validation" without specifying what validation rules, or "handle errors" without specifying which errors and how.
- **Oversized tasks:** Check whether any task touches more than 3 files, or combines multiple independent concerns (e.g., "create the type, implement the service, write tests, and integrate with the API" in a single task).

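The three indicator classes above can be sketched as a small heuristic, with the phrase patterns and the 3-file threshold taken from this section. This is a first-pass filter only; the actual P7 judgment reads the task for meaning:

```python
import re

def p7_flags(task_text: str, files_touched: int) -> list[str]:
    """Heuristic P7 indicators: undecided, vague, or oversized tasks."""
    flags = []
    # Undecided dependency: language deferring a design choice.
    if re.search(r"depending on the approach|if we go with option",
                 task_text, re.IGNORECASE):
        flags.append("undecided-dependency")
    # Vague instruction: verb phrases with no concrete specification.
    if re.search(r"implement the \w+|add validation|handle errors",
                 task_text, re.IGNORECASE):
        flags.append("vague-instruction")
    # Oversized: more files than one context window comfortably holds.
    if files_touched > 3:
        flags.append("oversized")
    return flags

flags = p7_flags("Implement the service and handle errors", files_touched=5)
```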
600
**Finding classification:** Always `severity: "error"`, always `autoFixable: false`. Feasibility problems require the planner to revise the task — either by making a decision, splitting the task, or adding specificity. These are judgment calls that an auto-fix cannot resolve correctly.

**Example findings:**

```json
{
  "id": "P7-001",
  "check": "P7",
  "title": "Task depends on undecided design choice",
  "detail": "Task 7 says 'implement caching layer' but the spec's Decisions table has no entry for caching strategy (LRU, TTL, write-through, etc.). The task cannot be executed without knowing which caching approach to use.",
  "severity": "error",
  "autoFixable": false,
  "suggestedFix": "Make the caching decision in the spec's Decisions table (e.g., 'D5: Use LRU cache with 5-minute TTL'), then update Task 7 with the specific implementation details.",
  "evidence": [
    "Task 7: 'Implement caching layer for API responses'",
    "Spec Decisions table: no entry for caching strategy",
    "Task 7 references no specific cache implementation"
  ]
}
```

```json
{
  "id": "P7-002",
  "check": "P7",
  "title": "Task too vague to execute in one context window",
  "detail": "Task 4 says 'implement the notification service' without specifying which methods to implement, what the function signatures are, or what error handling to apply. A developer cannot complete this task without making design decisions that should have been made during planning.",
  "severity": "error",
  "autoFixable": false,
  "suggestedFix": "Split Task 4 into specific sub-tasks: (a) create NotificationService.create() with signature and error handling, (b) create NotificationService.list() with filtering logic, (c) create NotificationService.markRead() with idempotency handling.",
  "evidence": [
    "Task 4: 'Implement the notification service'",
    "No function signatures, no error handling spec, no test expectations",
    "harness-planning iron law: every task must be completable in one context window"
  ]
}
```

---

### Phase 2: FIX — Auto-Fix Inferrable Issues

For every finding where `autoFixable: true`:

1. Apply the fix to the spec or plan document in place.
2. Log what changed and why (visible to the user after convergence).
3. Do NOT prompt the user for auto-fixable issues — they are mechanical.

For findings where `autoFixable: false`: skip them in this phase. They will be surfaced in Phase 4.

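The FIX phase is a partition over the findings list: apply the `autoFixable` ones without prompting, carry the rest forward. A minimal sketch reusing the finding shape from the examples (the `apply_fix` callback is a placeholder for the in-place document edit):

```python
def run_fix_phase(findings, apply_fix):
    """Apply auto-fixable findings; return (fix_log, surfaced).

    Auto-fixes are mechanical, so the user is never prompted for them;
    everything else is deferred to Phase 4 for the user to decide.
    """
    fix_log, surfaced = [], []
    for finding in findings:
        if finding["autoFixable"]:
            apply_fix(finding)  # edit the spec or plan document in place
            fix_log.append(f"[{finding['id']}] FIXED: {finding['title']}")
        else:
            surfaced.append(finding)
    return fix_log, surfaced

log, surfaced = run_fix_phase(
    [{"id": "S7-001", "title": "Vague criterion", "autoFixable": True},
     {"id": "P3-001", "title": "Dependency cycle", "autoFixable": False}],
    apply_fix=lambda finding: None,  # stand-in for the real document edit
)
```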
650
+ #### Silent vs Surfaced Classification
651
+
652
+ | Check | Auto-fixable findings | Fix behavior |
653
+ | ----- | ----------------------------------------------------------- | ------------------------------------------------- |
654
+ | S1 | None — all findings need user input | Always surfaced |
655
+ | S2 | Missing traceability links (goals without criteria) | Silent fix |
656
+ | S2 | Orphan criteria (criteria without goals) | Surfaced — removing criteria is a design decision |
657
+ | S3 | Obvious assumptions (runtime, encoding, filesystem) | Silent fix |
658
+ | S3 | Ambiguous assumptions (concurrency, tenancy, deployment) | Surfaced — user must choose |
659
+ | S4 | Obvious error cases (file I/O, JSON parse, network timeout) | Silent fix |
660
+ | S4 | Design-dependent error handling (retry strategy, failover) | Surfaced — user must choose strategy |
661
+ | S5 | None — all findings need user input | Always surfaced |
662
+ | S6 | None — all findings need user input | Always surfaced |
663
+ | S7 | Vague criteria with inferrable thresholds | Silent fix |
664
+ | S7 | Unmeasurable criteria (no context to infer) | Surfaced — user must rewrite |
665
+ | P1 | Missing task for uncovered criterion | Silent fix |
666
+ | P2 | Missing inputs, outputs, or verification | Silent fix |
667
+ | P3 | Missing dependency edges | Silent fix |
668
+ | P3 | Dependency cycles | Surfaced — restructuring is a design decision |
669
+ | P4 | File conflicts or consumer-before-producer | Silent fix |
670
+ | P5 | Obvious risk mitigation (technical, straightforward) | Silent fix |
671
+ | P5 | Judgment-dependent risk mitigation | Surfaced — user must choose strategy |
672
+ | P6 | None — all findings need user input | Always surfaced |
673
+ | P7 | None — all findings need user input | Always surfaced |
674
+
675
+ **Rule:** A fix is silent when the correct resolution can be determined from the document context alone, with no design judgment required. If there are two or more plausible resolutions, the fix is surfaced.
676
+
677
+ #### Fix Procedures by Check
678
+
679
+ ##### S2 Fix: Add Missing Success Criteria
680
+
681
+ **When:** A goal in the Overview has no corresponding success criterion.
682
+
683
+ **Procedure:**
684
+
685
+ 1. Read the Technical Design section for context about the uncovered goal.
686
+ 2. Draft a success criterion that is specific, observable, and testable — following the EARS patterns if applicable.
687
+ 3. Append the new criterion to the Success Criteria section with the next available number.
688
+ 4. Record a fix log entry.
689
+
690
+ **Edit operation:** Append to the Success Criteria list.
691
+
692
+ **Fix log entry example:**
693
+
694
+ ```
695
+ [S2-001] FIXED: Added success criterion #11 for goal 'Support offline mode':
696
+ 'The application functions without network connectivity for all read operations,
697
+ returning cached data when available.'
698
+ Derived from: Technical Design > Offline Cache section.
699
+ ```
700
+
701
+ ##### S7 Fix: Replace Vague Criteria with Specific Thresholds
702
+
703
+ **When:** A success criterion uses vague qualifiers ("should be fast", "handles errors well") and the Technical Design provides a concrete threshold or behavior to reference.
704
+
705
+ **Procedure:**
706
+
707
+ 1. Identify the vague qualifier in the criterion.
708
+ 2. Search the Technical Design for a related threshold, timeout, limit, or behavioral specification.
709
+ 3. Replace the vague qualifier with the specific threshold, citing the Technical Design source.
710
+ 4. Record a fix log entry.
711
+
712
+ **Edit operation:** Replace the vague criterion text in place.
713
+
714
+ **Fix log entry example:**
715
+
716
+ ```
717
+ [S7-001] FIXED: Replaced vague criterion #3 'the build should be fast' with:
718
+ 'The build completes in under 30 seconds on CI
719
+ (per Technical Design > CI Configuration: 30-second timeout).'
720
+ ```
721
+
722
+ ##### S3 Fix: Add Obvious Assumptions
723
+
724
+ **When:** The Technical Design uses patterns or APIs that imply a specific runtime, encoding, or environment, and no Assumptions section exists or the assumption is missing from it.
725
+
726
+ **Procedure:**
727
+
728
+ 1. Identify the assumption from the Technical Design evidence (e.g., `fs.readFileSync` implies Node.js, `UTF-8` encoding implied by string operations).
729
+ 2. If no Assumptions section exists in the spec, create one after the Non-goals section.
730
+ 3. Add the assumption as a bullet point with a brief rationale.
731
+ 4. Record a fix log entry.
732
+
733
+ **Edit operation:** Append to the Assumptions section (create section if missing).
734
+
735
+ **Fix log entry example:**
736
+
737
+ ```
738
+ [S3-001] FIXED: Added assumption to Assumptions section:
739
+ 'Runtime: Node.js >= 18.x (LTS). The implementation uses Node.js
740
+ built-in modules (fs, path, child_process).'
741
+ Evidence: Technical Design references path.join, fs.readFileSync.
742
+ ```
743
+
744
+ ##### S4 Fix: Add Obvious Error Cases
745
+
746
+ **When:** A Technical Design operation (file I/O, JSON parsing, network call) has no defined error behavior, and the codebase has an established pattern for that error.
747
+
748
+ **Procedure:**
749
+
750
+ 1. Identify the operation missing error handling.
751
+ 2. Read the referenced codebase module (if cited) to find the established error pattern (e.g., return defaults on ENOENT, log and rethrow on parse errors).
752
+ 3. Add the error case to the Technical Design section near the operation, following EARS "Unwanted" pattern: "If [failure condition], then the system shall [graceful behavior]."
753
+ 4. Record a fix log entry.
754
+
755
+ **Edit operation:** Insert error case after the operation description in Technical Design.
756
+
757
+ **Fix log entry example:**
758
+
759
+ ```
760
+ [S4-001] FIXED: Added error case for config file read:
761
+ 'If the config file does not exist (ENOENT), return the default
762
+ configuration object. Log a debug message indicating defaults are used.'
763
+ Following codebase pattern: packages/core/src/config.ts returns defaults on ENOENT.
764
+ ```
765
+
766
+ ##### P1 Fix: Add Missing Tasks for Uncovered Criteria
767
+
768
+ **When:** A spec success criterion has no corresponding plan task.
769
+
770
+ **Procedure:**
771
+
772
+ 1. Read the spec criterion and Technical Design for context.
773
+ 2. Draft a new task that would verify the criterion, including file paths, test commands, and commit message.
774
+ 3. Insert the task at the appropriate position in the task list (respecting dependencies).
775
+ 4. Update the File Map if new files are introduced.
776
+ 5. Record a fix log entry.
777
+
778
+ **Edit operation:** Insert new task in Tasks section; update File Map.
779
+
780
+ **Fix log entry example:**
781
+
782
+ ```
783
+ [P1-001] FIXED: Added Task 9 covering spec criterion #5 (error logging):
784
+ 'Create src/utils/error-logger.ts with structured error logging.
785
+ Verify: npx vitest run src/utils/error-logger.test.ts'
786
+ Derived from: Spec criterion #5 and Technical Design > Error Handling section.
787
+ ```

##### P2 Fix: Fill In Missing Task Elements

**When:** A task is missing clear inputs, outputs, or verification criteria.

**Procedure:**

1. Identify which element is missing (inputs, outputs, or verification).
2. Infer the missing element from the task description and surrounding tasks.
3. Add the missing element to the task.
4. Record a fix log entry.

**Edit operation:** Modify the task in place.

**Fix log entry example:**

```
[P2-001] FIXED: Added verification step to Task 3:
'Run: npx vitest run src/services/notification-service.test.ts'
Inferred from: Task 4 creates the test file for the service Task 3 implements.
```

##### P3 Fix: Add Missing Dependency Edges

**When:** Task B uses a file or artifact produced by Task A but does not declare "Depends on: Task A".

**Procedure:**

1. Identify the producer task from the File Map.
2. Add "Depends on: Task N" to the consuming task's header.
3. Record a fix log entry.

**Edit operation:** Modify the consuming task's "Depends on" line.

**Fix log entry example:**

```
[P3-001] FIXED: Added 'Depends on: Task 2' to Task 5:
Task 5 imports src/types/notification.ts which is created by Task 2.
```
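Steps 1-2 amount to a File Map lookup. A minimal sketch, assuming a hypothetical `PlanTask` record whose `creates`/`reads` lists have already been extracted from the free-text plan (that extraction is the hard part and is not shown):

```typescript
// Hypothetical task shape (illustration only): `creates` and `reads` would be
// parsed from each task's description and the plan's File Map.
interface PlanTask {
  id: number;
  dependsOn: number[];
  creates: string[];
  reads: string[];
}

// P3 fix: for every file a task reads, find its producer in the File Map and
// add the missing 'Depends on' edge. Returns fix log lines.
function addMissingDependencyEdges(tasks: PlanTask[]): string[] {
  const fileMap = new Map<string, number>(); // file path -> producing task id
  for (const t of tasks) for (const f of t.creates) fileMap.set(f, t.id);

  const log: string[] = [];
  for (const task of tasks) {
    for (const file of task.reads) {
      const producer = fileMap.get(file);
      if (producer !== undefined && producer !== task.id && !task.dependsOn.includes(producer)) {
        task.dependsOn.push(producer);
        log.push(`[P3] FIXED: Added 'Depends on: Task ${producer}' to Task ${task.id} (reads ${file}).`);
      }
    }
  }
  return log;
}
```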

##### P4 Fix: Reorder Conflicting Tasks

**When:** Two tasks touch the same file but are not sequenced, or a consumer task is numbered before its producer.

**Procedure:**

1. Identify the conflict.
2. Reorder by updating task numbers or adding a dependency edge.
3. If reordering changes task numbers, update all "Depends on" references throughout the plan.
4. Record a fix log entry.

**Edit operation:** Reorder tasks and update cross-references.

**Fix log entry example:**

```
[P4-001] FIXED: Added 'Depends on: Task 4' to Task 2:
Both tasks modify src/routes/index.ts. Task 4 creates the base route
that Task 2 extends. Sequencing prevents merge conflicts.
```
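Step 3, updating every cross-reference after renumbering, is the error-prone part of this fix. A sketch of the remapping, using a hypothetical minimal task shape (not a real API of this package):

```typescript
// Hypothetical minimal shape (illustration only).
interface TaskRef { id: number; dependsOn: number[] }

// P4 fix helper: given tasks in their corrected execution order, renumber
// them 1..N and remap every 'Depends on' reference from old ids to new ids.
function renumberTasks(ordered: TaskRef[]): TaskRef[] {
  const newId = new Map<number, number>();
  ordered.forEach((t, i) => newId.set(t.id, i + 1));
  return ordered.map((t, i) => ({
    id: i + 1,
    // Fall back to the old id if a reference points outside the plan.
    dependsOn: t.dependsOn.map((d) => newId.get(d) ?? d),
  }));
}
```

Doing the remap in one pass over a frozen old-id map avoids the classic bug of rewriting "Task 2" references after Task 2 itself has already been renumbered.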

##### P5 Fix: Add Obvious Mitigation Tasks

**When:** A spec risk has no coverage in the plan and the mitigation is straightforward (e.g., add error handling, add a test for an edge case).

**Procedure:**

1. Read the risk description from the spec.
2. Draft a mitigation task or extend an existing task's verification step.
3. Insert at the appropriate position.
4. Record a fix log entry.

**Edit operation:** Insert new task or extend existing task; update File Map if needed.

**Fix log entry example:**

```
[P5-001] FIXED: Added Task 10 for convergence termination testing:
'Add test that verifies convergence loop terminates on fixed-point inputs
(zero auto-fixable findings on first pass).'
Mitigates spec risk: 'convergence loop may not terminate'.
```

#### Fix Log Format

Every auto-fix MUST be logged. The fix log is accumulated during Phase 2 and presented to the user after convergence (in Phase 4) as an informational summary. The format is:

```
[{finding-id}] FIXED: {one-line description of what changed}
{the new text or criterion that was added/modified}
{source/evidence for the fix}
```

The fix log serves two purposes: (1) the user can review every change that was applied without prompting, and (2) if a fix introduces a new issue in the re-check, the log helps trace the cause.
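As a sketch, the three-line format can be assembled by a trivial helper. The function and its parameter names are illustrative assumptions, not a defined API:

```typescript
// Assemble one fix log entry in the three-line format above.
function fixLogEntry(findingId: string, change: string, newText: string, evidence: string): string {
  return [`[${findingId}] FIXED: ${change}`, newText, evidence].join("\n");
}
```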

---

### Phase 3: CONVERGE — Re-Check and Loop

After auto-fixes are applied in Phase 2, the convergence loop determines whether further progress is possible.

#### Convergence Procedure

1. **Record the issue count.** After Phase 2 completes, note the total number of remaining findings (both auto-fixable and non-auto-fixable) as `count_previous`.

2. **Re-run all checks.** Execute every check for the current mode (S1-S7 for spec mode; P1-P7 for plan mode) against the updated document. Produce a fresh set of findings. Note the new total as `count_current`.

3. **Compare counts.**
   - If `count_current < count_previous`: progress was made. Some auto-fixes resolved issues, or a fix in one area resolved a finding in another (cascading fix). Go to Phase 2 (FIX) and apply any new auto-fixable findings, then return here.
   - If `count_current == count_previous`: no progress. The remaining issues either need user input or cannot be resolved by auto-fix. Stop looping and proceed to Phase 4 (SURFACE).
   - If `count_current > count_previous`: auto-fixes introduced new issues. Log a warning in the fix log ("Auto-fixes increased issue count from {previous} to {current} — review fix procedures for unintended side effects.") and proceed to Phase 4 (SURFACE). Do not continue looping.

4. **Repeat.** Steps 1-3 repeat until no progress is detected. There is no arbitrary iteration cap — the "no progress" check is the termination condition.
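The procedure above reduces to a small driver loop. A sketch under assumed interfaces: `runChecks` and `applyAutoFixes` stand in for Phases 1 and 2 and are hypothetical, not real APIs of this package:

```typescript
interface Finding { id: string; autoFixable: boolean }

// Convergence driver: loop while the finding count strictly decreases.
// Returns the findings that remain for Phase 4 (SURFACE).
function converge(
  runChecks: () => Finding[],
  applyAutoFixes: (findings: Finding[]) => void,
  log: string[] = [],
): Finding[] {
  let countPrevious = Infinity;
  for (;;) {
    const findings = runChecks();
    const countCurrent = findings.length;
    if (countCurrent > countPrevious) {
      log.push(
        `Auto-fixes increased issue count from ${countPrevious} to ${countCurrent} — ` +
          `review fix procedures for unintended side effects.`,
      );
      return findings; // stop: do not continue looping
    }
    if (countCurrent === countPrevious || countCurrent === 0) {
      return findings; // no progress (converged) or clean
    }
    countPrevious = countCurrent;
    applyAutoFixes(findings.filter((f) => f.autoFixable)); // Phase 2
  }
}
```

Note the three exits mirror step 3 exactly: strictly fewer findings continues, equal stops cleanly, and more findings stops with a logged warning.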

#### Cascading Fixes

A fix applied in one pass can make a previously non-auto-fixable finding become auto-fixable in the next pass. This is called a **cascading fix**. Examples:

- **S4 enables S3:** The S4 fix adds an error case that creates an Assumptions section. In the next pass, S3 finds that additional obvious assumptions can now be appended to the existing section (previously S3 could not infer whether to create the section or append to it).
- **S2 enables S7:** The S2 fix adds a new success criterion. In the next pass, S7 checks the new criterion and finds it can be made more specific using Technical Design context.
- **S4 enables S4:** The S4 fix adds an error case for one operation. In the next pass, S4 finds a related operation that can now follow the same error pattern (the first fix established a local convention).

Plan-mode cascading fix examples:

- **P1 enables P3:** The P1 fix adds a new task (covering a missing spec criterion). In the next pass, P3 detects that existing tasks import files created by the new task but do not declare a dependency on it. P3 adds the missing dependency edges.
- **P1 enables P4:** The P1 fix adds a new task that creates a type file. In the next pass, P4 detects that the new task should be ordered before tasks that import that type, and reorders accordingly.
- **P2 enables P5:** The P2 fix adds a verification step to a task, making its outputs explicit. In the next pass, P5 finds that a spec risk (previously unmatched to any task) is now mitigated by the newly explicit verification step.

Cascading fixes are the reason the loop re-runs all checks, not just the checks that produced auto-fixable findings in the previous pass.

#### Worked Example: Two-Pass Convergence

```
Pass 1 (initial check):
  S1: 0 findings
  S2: 1 finding (auto-fixable: missing criterion for 'offline mode' goal)
  S3: 2 findings (1 auto-fixable: Node.js runtime, 1 needs user input: concurrency)
  S4: 1 finding (auto-fixable: missing ENOENT error case)
  S5: 0 findings
  S6: 0 findings
  S7: 1 finding (auto-fixable: vague 'fast' criterion)
  Total: 5 findings, 4 auto-fixable, 1 needs user input.
  → count_previous = 5

Phase 2 (FIX): Apply 4 auto-fixes.
  [S2-001] FIXED: Added success criterion #11 for 'offline mode'.
  [S3-001] FIXED: Added Node.js runtime assumption to new Assumptions section.
  [S4-001] FIXED: Added ENOENT error case for config read.
  [S7-001] FIXED: Replaced 'fast' with 'under 30 seconds on CI'.

Pass 2 (re-check):
  S1: 0 findings
  S2: 0 findings (criterion added — gap closed)
  S3: 1 finding — CASCADING: the Assumptions section created by the S3-001
      fix now exists, so a new obvious assumption (UTF-8 encoding) can be
      appended to it. (1 auto-fixable)
  S3: 1 finding (unchanged: concurrency model still needs user input)
  S4: 0 findings (error case added)
  S5: 0 findings
  S6: 0 findings
  S7: 0 findings (criterion sharpened)
  Total: 2 findings, 1 auto-fixable, 1 needs user input.
  → count_current = 2 < count_previous = 5. Progress made. Continue.

Phase 2 (FIX): Apply 1 auto-fix.
  [S3-003] FIXED: Added UTF-8 encoding assumption to Assumptions section.

Pass 3 (re-check):
  S3: 1 finding (unchanged: concurrency model still needs user input)
  Total: 1 finding, 0 auto-fixable, 1 needs user input.
  → count_current = 1 < count_previous = 2. Progress made. Continue.

Phase 2 (FIX): 0 auto-fixable findings. Nothing to fix.

Pass 4 (re-check):
  Total: 1 finding, 0 auto-fixable.
  → count_current = 1 = count_previous = 1. No progress. Converged.
  → Proceed to Phase 4 (SURFACE) with 1 remaining issue.
```

#### Worked Example: Plan-Mode Two-Pass Convergence

```
Pass 1 (initial check):
  P1: 1 finding (auto-fixable: spec criterion #6 has no plan task)
  P2: 1 finding (auto-fixable: Task 4 missing verification step)
  P3: 0 findings
  P4: 0 findings
  P5: 1 finding (needs user input: performance vs correctness tradeoff)
  P6: 0 findings
  P7: 1 finding (needs user input: Task 7 depends on undecided caching strategy)
  Total: 4 findings, 2 auto-fixable, 2 need user input.
  → count_previous = 4

Phase 2 (FIX): Apply 2 auto-fixes.
  [P1-001] FIXED: Added Task 9 covering spec criterion #6 (structured error logging).
      Creates src/utils/error-logger.ts and src/utils/error-logger.test.ts.
  [P2-001] FIXED: Added verification step to Task 4:
      'Run: npx vitest run src/services/notification-service.test.ts'

Pass 2 (re-check):
  P1: 0 findings (criterion now covered by Task 9)
  P2: 0 findings (Task 4 now has verification)
  P3: 1 finding — CASCADING: Task 9 (added by P1-001) creates
      src/utils/error-logger.ts, but Task 6 imports from it without
      declaring 'Depends on: Task 9'. (1 auto-fixable)
  P4: 0 findings
  P5: 1 finding (unchanged: performance tradeoff still needs user input)
  P6: 0 findings
  P7: 1 finding (unchanged: caching decision still needed)
  Total: 3 findings, 1 auto-fixable, 2 need user input.
  → count_current = 3 < count_previous = 4. Progress made. Continue.

Phase 2 (FIX): Apply 1 auto-fix.
  [P3-001] FIXED: Added 'Depends on: Task 9' to Task 6.

Pass 3 (re-check):
  P5: 1 finding (unchanged: performance tradeoff)
  P7: 1 finding (unchanged: caching decision)
  Total: 2 findings, 0 auto-fixable, 2 need user input.
  → count_current = 2 < count_previous = 3. Progress made. Continue.

Phase 2 (FIX): 0 auto-fixable findings. Nothing to fix.

Pass 4 (re-check):
  Total: 2 findings, 0 auto-fixable.
  → count_current = 2 = count_previous = 2. No progress. Converged.
  → Proceed to Phase 4 (SURFACE) with 2 remaining issues.
```

#### Termination Guarantee

The loop terminates because:

1. Each pass can only fix auto-fixable findings. The set of auto-fixable findings is finite (bounded by the document size).
2. Each fix modifies the document, so the "same" finding cannot be auto-fixed twice (the context has changed).
3. If no auto-fixable findings remain, Phase 2 applies zero fixes, and the re-check produces the same count — triggering the "no progress" exit.
4. Cascading fixes can only occur a finite number of times because each adds content to the document, and the checks that detect missing content will eventually find nothing missing.

---

### Phase 4: SURFACE — Present Remaining Issues

When findings remain after the convergence loop (Phase 3 determined no further auto-fix progress), present them to the user. If no `needs-user-input` findings remain (all were resolved by auto-fix), skip this phase entirely and proceed to Clean Exit.

#### Step 1: Group and Prioritize Findings

Organize remaining findings for presentation:

1. **Group by severity.** Present all `error` findings before `warning` findings. Errors block sign-off; warnings are advisory.
2. **Within each severity group, order by check ID.** S1 before S2 before S3 (spec mode); P1 before P2 before P3 (plan mode). This gives the user a predictable reading order.
3. **Count and announce.** State the total: `N remaining issues need your input (X errors, Y warnings).`
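The grouping rule can be sketched as a comparator. The `SurfacedFinding` shape is an assumption; note the check-id comparison is lexicographic, which is sufficient for single-digit check numbers like S1-S7 and P1-P7:

```typescript
interface SurfacedFinding { id: string; severity: "error" | "warning" }

const severityRank = (s: SurfacedFinding["severity"]): number => (s === "error" ? 0 : 1);

// Errors before warnings; within a severity, ascending check id (S1, S2, ...).
function presentationOrder(findings: SurfacedFinding[]): SurfacedFinding[] {
  return [...findings].sort(
    (a, b) => severityRank(a.severity) - severityRank(b.severity) || a.id.localeCompare(b.id),
  );
}
```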

#### Step 2: Present Each Finding

For each finding, present exactly three sections:

**What is wrong** — Use the finding's `title` as a heading, followed by the `detail` field. Include the `evidence` references so the user can locate the problem in context.

```
[{id}] {title} ({severity})
{detail}
Evidence: {evidence[0]}, {evidence[1]}, ...
```

**Why it matters** — Explain the consequence of leaving this unresolved:

- For `error` severity: "This blocks sign-off. The spec/plan cannot be finalized until this is resolved."
- For `warning` severity: "This is advisory. You may dismiss it with a reason, but the concern will be logged."

**Suggested resolution** — Present the `suggestedFix` as the primary option, then list alternative resolution paths:

- **Option A (recommended):** The suggested fix from the finding.
- **Option B:** An alternative approach if one is apparent from context.
- **Option C (warnings only):** "Dismiss with reason — explain why this is acceptable."

#### Step 3: User Interaction

Wait for the user to respond to each finding. Accepted responses:

1. **Resolve:** The user makes the suggested change (or an alternative). Mark the finding as `resolved`. The user may edit the spec/plan directly, add a decision to the Decisions table, add a task, or modify an existing task.

2. **Dismiss with reason (warnings only):** The user provides a reason why the warning is acceptable. Mark the finding as `dismissed` and log: `[{id}] DISMISSED by user: {reason}`. Dismissed findings are not re-surfaced in subsequent loop iterations.

3. **Clarify:** The user asks for more context about the finding. Provide additional detail from the evidence and codebase reads. Do not mark the finding as resolved — wait for a resolve or dismiss response.

Error-severity findings cannot be dismissed. They must be resolved before sign-off.

#### Step 4: Track Resolution Progress

Maintain a running status of all surfaced findings:

```
Surfaced findings: N total
  Resolved: X
  Dismissed: Y (warnings only)
  Pending: Z
```

After each user resolution or dismissal, update the count and present the next pending finding. When all findings are either resolved or dismissed, proceed to Step 5.
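A sketch of rendering that status block, assuming each surfaced finding carries a status value (hypothetical; the skill tracks this in conversation, not in code):

```typescript
type FindingStatus = "resolved" | "dismissed" | "pending";

// Render the running status block shown above.
function progressSummary(statuses: FindingStatus[]): string {
  const count = (s: FindingStatus): number => statuses.filter((x) => x === s).length;
  return [
    `Surfaced findings: ${statuses.length} total`,
    `  Resolved: ${count("resolved")}`,
    `  Dismissed: ${count("dismissed")} (warnings only)`,
    `  Pending: ${count("pending")}`,
  ].join("\n");
}
```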
1090
+
1091
+ #### Step 5: Re-Check After Resolution
1092
+
1093
+ After all surfaced findings have been addressed:
1094
+
1095
+ 1. Loop back to Phase 1 (CHECK) to verify that user resolutions are correct and catch any cascading issues introduced by the changes.
1096
+ 2. Previously dismissed findings (logged with reason) are excluded from this re-check. They do not re-appear.
1097
+ 3. If the re-check produces new findings, the full convergence loop runs again (Phase 1 through Phase 4). This is expected — user changes can introduce new issues.
1098
+ 4. If the re-check produces zero findings, proceed to Clean Exit.
1099
+
1100
+ #### Clean Exit
1101
+
1102
+ Clean Exit occurs when ALL of the following are true:
1103
+
1104
+ - All checks pass with zero findings (excluding dismissed warnings).
1105
+ - No `error`-severity findings are pending or dismissed (errors cannot be dismissed).
1106
+ - The convergence loop has terminated (issue count stopped decreasing or reached zero).
1107
+
1108
+ On clean exit:
1109
+
1110
+ 1. Announce: `CLEAN EXIT — all checks pass. Returning control to {parent skill} for sign-off.`
1111
+ 2. If any warnings were dismissed, include a summary: `Note: {N} warnings were dismissed by user. See log for reasons.`
1112
+ 3. Return control to the parent skill (harness-brainstorming or harness-planning).
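The clean-exit conditions reduce to a predicate over the final finding set. Shapes here are hypothetical illustrations:

```typescript
interface FinalFinding {
  severity: "error" | "warning";
  status: "resolved" | "dismissed" | "pending";
}

// Clean exit: everything resolved, except that warnings may be dismissed.
// A dismissed error or any pending finding blocks the exit.
function isCleanExit(findings: FinalFinding[]): boolean {
  return findings.every(
    (f) => f.status === "resolved" || (f.status === "dismissed" && f.severity === "warning"),
  );
}
```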

---

### Codebase and Graph Integration

Checks that benefit from codebase awareness (S3, S5, P1, P3, P4) use these tools:

| Check | Without graph                        | With graph                                                                                |
| ----- | ------------------------------------ | ----------------------------------------------------------------------------------------- |
| S5    | Grep/glob for referenced patterns    | `query_graph` + `get_relationships` to verify dependencies and architecture compatibility |
| S3    | Infer from codebase conventions      | `find_context_for` to surface related design decisions                                    |
| P1    | Text matching criteria to tasks      | Graph traceability edges if available                                                     |
| P3    | Static analysis of task descriptions | `get_impact` to verify dependency completeness                                            |
| P4    | Parse file paths, detect conflicts   | Graph file ownership for accurate conflict detection                                      |

All checks produce useful results from document analysis and basic codebase reads alone. Graph adds precision but is never required.

## Harness Integration

- **`harness validate`** — Run by the parent skill (harness-brainstorming or harness-planning) before and after the soundness review. This skill does not invoke validate directly.
- **Parent skill invocation** — harness-brainstorming invokes `--mode spec` before sign-off; harness-planning invokes `--mode plan` before sign-off.
- **No new user commands** — Users invoke brainstorming and planning exactly as before. The soundness review is invisible until it surfaces an issue.
- **Graph queries** — When `.harness/graph/` exists, use `query_graph` and `get_impact` for enhanced feasibility and dependency checks. Fall back to file-based reads when no graph is available.

## Success Criteria

These criteria validate the skill implementation artifacts. The behavioral success criteria from the spec (automatic invocation, coherence detection, convergence termination, etc.) are verified by running the skill against real specs and plans.

1. The skill.yaml passes schema validation with all required fields
2. The SKILL.md contains all required sections and passes structure tests
3. Both platform copies (claude-code, gemini-cli) are byte-identical and pass parity tests
4. The two modes (spec, plan) are defined with their check tables (S1-S7, P1-P7)
5. The `SoundnessFinding` schema is defined in the SKILL.md
6. The convergence loop structure (CHECK, FIX, CONVERGE, SURFACE, CLEAN EXIT) is documented
7. `harness validate` passes after all files are written
8. The skill test suite passes (structure, schema, platform-parity, references)

## Examples

### Example: Spec Mode Invocation

**Context:** harness-brainstorming has drafted a spec and is about to sign off.

```
Invoking harness-soundness-review --mode spec...

Phase 1: CHECK
  Running S1 (internal coherence)... 0 findings
  Running S2 (goal-criteria traceability)... 1 finding (auto-fixable)
  Running S3 (unstated assumptions)... 2 findings (2 need user input)
  Running S4 (requirement completeness)... 1 finding (auto-fixable)
  Running S5 (feasibility red flags)... 0 findings
  Running S6 (YAGNI re-scan)... 0 findings
  Running S7 (testability)... 1 finding (auto-fixable)

  5 findings total: 3 auto-fixable, 2 need user input.

Phase 2: FIX
  [S2-001] FIXED: Added success criterion for 'Support offline mode' goal.
  [S4-001] FIXED: Added ENOENT error case for config file read (following codebase pattern).
  [S7-001] FIXED: Replaced 'build should be fast' with 'completes in under 30 seconds on CI'.
  3 auto-fixes applied.

Phase 3: CONVERGE
  Re-running checks...
  S3-001 (implicit Node.js assumption) — now auto-fixable (S4-001 fix added
  Assumptions section, so S3-001 can append to it instead of creating one).
  [S3-001] FIXED: Added Node.js runtime assumption to Assumptions section.
  1 additional fix applied. Re-checking...
  Issue count: 1 (was 2). Decreased — continuing.
  Re-running checks...
  Issue count: 1 (unchanged). Converged.

Phase 4: SURFACE
  1 remaining issue needs your input:

  [S3-002] Ambiguous concurrency model (warning)
  Technical Design describes a background job processor but does not specify
  whether it runs in-process, as a worker thread, or as a separate process.
  → Add a decision to the Decisions table specifying the concurrency model.

User resolves S3-002 → adds decision: "in-process event loop"
Re-running checks... 0 findings.

CLEAN EXIT — returning control to harness-brainstorming for sign-off.
```

### Example: Plan Mode Invocation

**Context:** harness-planning has drafted a plan and is about to sign off.

```
Invoking harness-soundness-review --mode plan...

Phase 1: CHECK
  Running P1 (spec-plan coverage)... 1 finding (auto-fixable)
  Running P2 (task completeness)... 2 findings (auto-fixable)
  Running P3 (dependency correctness)... 1 finding (auto-fixable)
  Running P4 (ordering sanity)... 0 findings
  Running P5 (risk coverage)... 1 finding (1 needs user input)
  Running P6 (scope drift)... 0 findings
  Running P7 (task-level feasibility)... 1 finding (needs user input)

  6 findings total: 4 auto-fixable, 2 need user input.

Phase 2: FIX
  [P1-001] FIXED: Added Task 9 covering spec criterion #5 (error logging).
  [P2-001] FIXED: Added verification step to Task 3 (run vitest).
  [P2-002] FIXED: Added outputs to Task 6 (creates src/utils/helper.ts).
  [P3-001] FIXED: Added 'Depends on: Task 2' to Task 5 (uses types from Task 2).
  4 auto-fixes applied.

Phase 3: CONVERGE
  Re-running checks...
  Issue count: 2 (was 6). Decreased — continuing.
  Re-running checks...
  Issue count: 2 (unchanged). Converged.

Phase 4: SURFACE
  2 remaining issues need your input:

  [P5-001] Spec risk 'performance vs correctness tradeoff' has no mitigation (warning)
  The spec notes that strict validation may impact throughput, but no plan task
  addresses performance testing or defines an acceptable latency threshold.
  → Add a performance benchmark task, relax validation, or accept the risk.

  [P7-001] Task 7 depends on undecided caching strategy (error)
  Task 7 says 'implement caching layer' but the spec Decisions table has no
  entry for caching strategy. This task cannot be executed without a decision.
  → Make the caching decision in the spec, then update Task 7 with specifics.

User resolves P5-001 → adds Task 10 for performance benchmark at 100ms threshold.
User resolves P7-001 → updates spec with LRU cache decision, updates Task 7.
Re-running checks... 0 findings.

CLEAN EXIT — returning control to harness-planning for sign-off.
```

## Gates

These are hard stops. Violating any gate means the process has broken down.

- **No sign-off without convergence.** The soundness review must reach a clean exit (zero findings, excluding dismissed warnings) before the parent skill proceeds to write the spec or plan. If issues remain, the user must resolve them.
- **No silent resolution of design decisions.** Contradictions (S1), feasibility concerns (S5), YAGNI violations (S6), scope drift (P6), and task-level feasibility (P7) are NEVER auto-fixed. The user must always decide.
- **No auto-fix without logging.** Every auto-fix must be logged with what changed and why. Silent, unlogged mutations are not allowed.
- **Convergence must terminate.** The loop stops when the issue count stops decreasing. There is no retry cap — "no progress" is the hard stop.

## Escalation

- **When the spec or plan is too large for a single pass:** Break the document into sections and run checks section by section. Present findings grouped by section.
- **When a check produces false positives:** Log the false positive and skip it. Do not block sign-off on a finding that the user has explicitly dismissed.
- **When the convergence loop makes no progress on the first iteration:** All remaining findings need user input. Skip directly to Phase 4 (SURFACE) without looping.
- **When graph queries are unavailable:** Fall back to document analysis and codebase reads. All checks are designed to work without graph. Do not block or warn about missing graph — just use the fallback path.
- **When user resolutions repeatedly introduce new errors:** After 2 consecutive resolution attempts that each introduce a new error-severity finding, suggest pausing the soundness review to revisit the spec design holistically rather than fixing issues one at a time.
- **When codebase files referenced in the spec cannot be read:** Skip the feasibility sub-check for that file. Log the skip and continue with the remaining checks. Do not block the review on inaccessible files.