pan-wizard 2.9.0 → 3.4.1

Files changed (69)
  1. package/README.md +8 -8
  2. package/agents/pan-conductor.md +189 -0
  3. package/agents/pan-counterfactual.md +112 -0
  4. package/agents/pan-debugger.md +15 -1
  5. package/agents/pan-document_code.md +21 -0
  6. package/agents/pan-executor.md +16 -0
  7. package/agents/pan-hardener.md +113 -0
  8. package/agents/pan-integration-checker.md +2 -0
  9. package/agents/pan-knowledge.md +81 -0
  10. package/agents/pan-meta-reviewer.md +91 -0
  11. package/agents/pan-plan-checker.md +2 -0
  12. package/agents/pan-previewer.md +98 -0
  13. package/agents/pan-project-researcher.md +4 -4
  14. package/agents/pan-reviewer.md +2 -0
  15. package/agents/pan-verifier.md +2 -0
  16. package/bin/install-lib.cjs +197 -0
  17. package/bin/install.js +1999 -1959
  18. package/commands/pan/assumptions.md +38 -3
  19. package/commands/pan/audit-deployment.md +6 -0
  20. package/commands/pan/cost.md +132 -0
  21. package/commands/pan/debug.md +71 -2
  22. package/commands/pan/exec-phase.md +105 -0
  23. package/commands/pan/focus-auto.md +199 -18
  24. package/commands/pan/focus-design.md +67 -2
  25. package/commands/pan/focus-exec.md +178 -47
  26. package/commands/pan/focus-scan.md +17 -5
  27. package/commands/pan/knowledge.md +129 -0
  28. package/commands/pan/map-codebase.md +47 -6
  29. package/commands/pan/mcp-bridge.md +145 -0
  30. package/commands/pan/milestone-audit.md +23 -0
  31. package/commands/pan/new-project.md +64 -0
  32. package/commands/pan/pause.md +42 -1
  33. package/commands/pan/plan-phase.md +95 -0
  34. package/commands/pan/preview.md +114 -0
  35. package/commands/pan/profile.md +37 -0
  36. package/commands/pan/quick.md +15 -0
  37. package/commands/pan/resume.md +62 -2
  38. package/commands/pan/review-deep.md +128 -0
  39. package/commands/pan/verify-phase.md +53 -0
  40. package/commands/pan/what-if.md +146 -0
  41. package/hooks/dist/pan-cost-logger.js +102 -0
  42. package/hooks/dist/pan-statusline.js +154 -108
  43. package/package.json +1 -1
  44. package/pan-wizard-core/bin/lib/bridge.cjs +269 -0
  45. package/pan-wizard-core/bin/lib/bus.cjs +251 -0
  46. package/pan-wizard-core/bin/lib/codebase.cjs +118 -0
  47. package/pan-wizard-core/bin/lib/constants.cjs +42 -1
  48. package/pan-wizard-core/bin/lib/context-budget.cjs +27 -0
  49. package/pan-wizard-core/bin/lib/core.cjs +91 -6
  50. package/pan-wizard-core/bin/lib/cost.cjs +359 -0
  51. package/pan-wizard-core/bin/lib/focus.cjs +105 -2
  52. package/pan-wizard-core/bin/lib/init.cjs +5 -5
  53. package/pan-wizard-core/bin/lib/knowledge.cjs +331 -0
  54. package/pan-wizard-core/bin/lib/memory.cjs +252 -0
  55. package/pan-wizard-core/bin/lib/phase.cjs +40 -13
  56. package/pan-wizard-core/bin/lib/preview.cjs +480 -0
  57. package/pan-wizard-core/bin/lib/review-deep.cjs +280 -0
  58. package/pan-wizard-core/bin/lib/roadmap.cjs +4 -4
  59. package/pan-wizard-core/bin/lib/state.cjs +2 -2
  60. package/pan-wizard-core/bin/lib/verify.cjs +34 -1
  61. package/pan-wizard-core/bin/lib/whatif.cjs +289 -0
  62. package/pan-wizard-core/bin/pan-tools.cjs +239 -4
  63. package/pan-wizard-core/templates/playbook.md +53 -0
  64. package/pan-wizard-core/templates/preview-report.md +93 -0
  65. package/pan-wizard-core/templates/roadmap.md +24 -24
  66. package/pan-wizard-core/templates/state.md +12 -9
  67. package/pan-wizard-core/workflows/plan-phase.md +1 -1
  68. package/scripts/build-hooks.js +2 -1
  69. package/scripts/generate-skills-docs.py +560 -0
@@ -27,15 +27,50 @@ Phase number: $ARGUMENTS (required)
27
27
  Project state and roadmap are loaded in-workflow using targeted reads.
28
28
  </context>
29
29
 
30
+ <investigate_before_claiming>
31
+ Before surfacing any assumption, read the actual codebase first.
32
+ - Read existing source files related to the phase's domain
33
+ - Grep for relevant function names, imports, patterns
34
+ - Base assumptions on what the code actually shows, not speculation
35
+ Do not claim "the project uses X" without verifying it in the files.
36
+ </investigate_before_claiming>
37
+
38
+ <citation_requirement>
39
+ Every assumption MUST cite the evidence that supports it.
40
+
41
+ **Before presenting assumptions to the user, scan your draft for unsourced claims.** Any assumption without file:line evidence is speculation, not a grounded assumption.
42
+
43
+ **Format:** "Assumption: [claim] — Evidence: [file:line or grep result]"
44
+
45
+ **Grounding rules:**
46
+ - Technical approach assumptions require: file:line showing the current pattern/framework in use
47
+ - Dependency assumptions require: import/require evidence from the relevant module
48
+ - Scope boundary assumptions require: file paths showing what exists vs what doesn't
49
+ - Risk assumptions require: file:line showing the fragile pattern or grep showing the coupling
50
+
51
+ **Anti-pattern:**
52
+ ```
53
+ BAD: "Assumption: The project uses Express for routing"
54
+ → Did you check? Maybe it uses Fastify, or has no server at all.
55
+ GOOD: "Assumption: The project uses Express for routing — Evidence: require('express')
56
+ at src/server.ts:3, route definitions at src/routes/index.ts:12-45"
57
+ ```
58
+ </citation_requirement>
59
+
30
60
  <process>
31
61
  1. Validate phase number argument (error if missing or invalid)
32
62
  2. Check if phase exists in roadmap
33
- 3. Follow assumptions.md workflow:
63
+ 3. Read relevant source files to ground assumptions in evidence
64
+ 4. For each assumption, follow observe-think-conclude:
65
+ - OBSERVE: What does the code show?
66
+ - THINK: What does this imply for the phase approach?
67
+ - CONCLUDE: State the assumption with file:line evidence
68
+ 5. Follow assumptions.md workflow:
34
69
  - Analyze roadmap description
35
70
  - Surface assumptions about: technical approach, implementation order, scope, risks, dependencies
36
- - Present assumptions clearly
71
+ - Present assumptions clearly with file:line references where applicable
37
72
  - Prompt "What do you think?"
38
- 4. Gather feedback and offer next steps
73
+ 6. Gather feedback and offer next steps
39
74
  </process>
40
75
 
41
76
  <success_criteria>
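
The citation rule added in this hunk ("scan your draft for unsourced claims") can be checked mechanically. The following is a minimal sketch, not code from the package: `findUnsourced` and `FILE_LINE` are hypothetical names, and the check is deliberately narrower than the documented format because it accepts only `file:line` evidence, not grep output.

```javascript
// Hypothetical helper, not part of pan-wizard.
// Matches references such as "src/server.ts:3" or "src/routes/index.ts:12-45".
const FILE_LINE = /[\w./-]+\.\w+:\d+(-\d+)?/;

// Returns the assumptions that carry no file:line reference after "Evidence:".
function findUnsourced(assumptions) {
  return assumptions.filter((text) => {
    const marker = text.indexOf("Evidence:");
    if (marker === -1) return true; // no evidence clause at all
    return !FILE_LINE.test(text.slice(marker));
  });
}
```

Running it on the BAD and GOOD examples from the hunk would flag only the BAD one, since it has no `Evidence:` clause.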
@@ -38,6 +38,12 @@ If no target directory is provided, ask the user:
38
38
  Validate the directory exists before proceeding.
39
39
  </step>
40
40
 
41
+ <investigate_before_judging>
42
+ Never report a file as missing, broken, or misconfigured without reading it first.
43
+ For every audit check: read the actual file, verify its contents, then state the finding with evidence.
44
+ Do not speculate about file contents based on filenames alone.
45
+ </investigate_before_judging>
46
+
41
47
  <step name="installation_audit">
42
48
  **Phase 1 — Installation Integrity Audit**
43
49
 
@@ -0,0 +1,132 @@
1
+ ---
2
+ name: pan:cost
3
+ group: Observability
4
+ description: Show token usage and estimated cost across PAN commands and agents
5
+ argument-hint: "[report|append|clear] [--format json|table|chart] [--since YYYY-MM-DD] [--until YYYY-MM-DD]"
6
+ allowed-tools:
7
+ - Read
8
+ - Bash
9
+ ---
10
+
11
+ <objective>
12
+ Report token usage and estimated cost across all PAN invocations in this project.
13
+
14
+ Reads `.planning/metrics/tokens.jsonl` — an append-only log where each line is one call (agent or command) with token counts and model. Cost is computed from a built-in rate table (overridable via `.planning/config.json` → `cost.rates`).
15
+
16
+ Default output is JSON for piping. Use `--format table` for human-readable tables or `--format chart` for an ASCII bar chart of daily spend.
17
+ </objective>
18
+
19
+ <execution_context>
20
+ @~/.claude/pan-wizard-core/bin/lib/cost.cjs
21
+ </execution_context>
22
+
23
+ <subcommands>
24
+
25
+ ### `report` (default)
26
+
27
+ Aggregate all records into totals + breakdowns by agent, command, tier, and day.
28
+
29
+ ```
30
+ pan-tools cost report [--format json|table|chart] [--since YYYY-MM-DD] [--until YYYY-MM-DD]
31
+ ```
32
+
33
+ **Flags:**
34
+ - `--format` — `json` (default, for tools) | `table` (aligned text columns) | `chart` (per-day ASCII bars).
35
+ - `--since` — ISO date lower bound (inclusive). Records without `ts` always pass.
36
+ - `--until` — ISO date upper bound (inclusive).
37
+
38
+ **JSON output shape:**
39
+ ```json
40
+ {
41
+ "totals": {
42
+ "calls": 42,
43
+ "input_tokens": 123456,
44
+ "output_tokens": 4567,
45
+ "cache_read_tokens": 50000,
46
+ "cache_write_tokens": 5000,
47
+ "cost_usd": 2.1234,
48
+ "cost_unknown": 0
49
+ },
50
+ "cache_hit_rate_pct": 40.5,
51
+ "by_agent": { "pan-planner": { "calls": 8, "input": 50000, ... } },
52
+ "by_command": { ... },
53
+ "by_tier": { ... },
54
+ "by_day": { "2026-04-18": { ... } },
55
+ "window": { "since": null, "until": null }
56
+ }
57
+ ```
58
+
59
+ ### `append`
60
+
61
+ Append a single cost record. Normally called by instrumented agent spawns; users rarely invoke directly.
62
+
63
+ ```
64
+ pan-tools cost append \
65
+ [--agent <name>] [--command <name>] [--model <id>] [--tier reasoning|mid|fast] \
66
+ [--input-tokens N] [--output-tokens N] \
67
+ [--cache-read-tokens N] [--cache-write-tokens N] \
68
+ [--phase <num>] [--session <id>]
69
+ ```
70
+
71
+ Missing fields are stored as `null` / `0`. Cost is auto-computed when `model` or `tier` resolves to a known rate.
72
+
73
+ ### `clear`
74
+
75
+ Delete the cost log. Useful at the start of a billing cycle.
76
+
77
+ ```
78
+ pan-tools cost clear
79
+ ```
80
+
81
+ </subcommands>
82
+
83
+ <rate_table>
84
+ Default rates (USD per million tokens) as of 2026-04. Override per-model in `.planning/config.json`:
85
+
86
+ ```json
87
+ {
88
+ "cost": {
89
+ "rates": {
90
+ "claude-opus-4-7": { "input": 15.0, "output": 75.0, "cache_read": 1.5, "cache_write": 18.75 },
91
+ "my-custom-model": { "input": 1.0, "output": 2.0, "cache_read": 0.1, "cache_write": 1.25 }
92
+ }
93
+ }
94
+ }
95
+ ```
96
+
97
+ When a record has neither a known model nor a known tier, its cost is `null` and it counts toward `totals.cost_unknown`.
98
+ </rate_table>
99
+
100
+ <workflow>
101
+
102
+ **Daily check:** run `/pan:cost --format chart` at the end of a working day to see the spend shape.
103
+
104
+ **Before shipping:** run `/pan:cost --since 2026-04-01 --format table` to get a total for the billing period.
105
+
106
+ **After an expensive run:** check `by_agent` and `by_command` to see which stage drove the spend.
107
+
108
+ **To reconcile with the provider bill:** providers report total tokens; PAN's log is append-only with ISO-8601 timestamps, so set `--since` / `--until` to match the provider's billing window.
109
+
110
+ </workflow>
111
+
112
+ <instrumentation_note>
113
+
114
+ Token records are written by any caller that knows its usage — typically the host runtime or a wrapper. PAN ships the log format + aggregator (this command); the capture hook itself is opt-in (Wave 5 of Spec B v2). Until then, records can be appended manually via `pan-tools cost append` or by external scripts reading the provider API.
115
+
116
+ If `.planning/metrics/tokens.jsonl` is empty, `/pan:cost` returns zero totals — the feature is inert, not broken.
117
+
118
+ </instrumentation_note>
119
+
120
+ <runtime_compatibility>
121
+
122
+ | Runtime | Support |
123
+ |---------|---------|
124
+ | Claude Code | Full — data format + aggregation + all output formats |
125
+ | OpenCode | Full aggregator; token capture depends on OpenCode's own hooks |
126
+ | Gemini | Full aggregator; token capture depends on Gemini CLI instrumentation |
127
+ | Codex | Full aggregator; token capture via external script |
128
+ | Copilot CLI | Full aggregator; Copilot doesn't currently expose per-call usage |
129
+
130
+ The aggregator is runtime-agnostic. What varies across runtimes is how records *get into* `tokens.jsonl` in the first place.
131
+
132
+ </runtime_compatibility>
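
The rate table and the `--since` / `--until` rules in this new file imply a simple aggregation. The sketch below illustrates it under stated assumptions and is not the code in `cost.cjs`: the function names are hypothetical, the record field names (`input_tokens`, `ts`, and so on) are assumed to mirror the report totals, the four token categories are assumed to be billed additively, and the tier fallback is omitted.

```javascript
// Hypothetical helpers, not taken from cost.cjs.
const TOKENS_PER_MILLION = 1e6; // rates are quoted in USD per million tokens

// Cost of one record, or null when the model has no known rate.
function recordCost(record, rates) {
  const rate = rates[record.model];
  if (!rate) return null;
  const usd =
    (record.input_tokens || 0) * rate.input +
    (record.output_tokens || 0) * rate.output +
    (record.cache_read_tokens || 0) * rate.cache_read +
    (record.cache_write_tokens || 0) * rate.cache_write;
  return usd / TOKENS_PER_MILLION;
}

// Inclusive date window on YYYY-MM-DD strings; records without `ts` always pass.
function inWindow(record, since, until) {
  if (!record.ts) return true;
  const day = record.ts.slice(0, 10);
  if (since && day < since) return false;
  if (until && day > until) return false;
  return true;
}

// Totals for the window; unknown-cost records are counted, not priced.
function summarize(records, rates, since = null, until = null) {
  const totals = { calls: 0, cost_usd: 0, cost_unknown: 0 };
  for (const record of records) {
    if (!inWindow(record, since, until)) continue;
    totals.calls += 1;
    const cost = recordCost(record, rates);
    if (cost === null) totals.cost_unknown += 1;
    else totals.cost_usd += cost;
  }
  return totals;
}
```

Comparing ISO dates as strings works here because `YYYY-MM-DD` sorts lexicographically in date order.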
@@ -49,6 +49,30 @@ If active sessions exist AND no $ARGUMENTS:
49
49
  If $ARGUMENTS provided OR user describes new issue:
50
50
  - Continue to symptom gathering
51
51
 
52
+ ## Reasoning Protocol
53
+
54
+ For each debugging step, follow the observe-think-act pattern:
55
+ 1. **OBSERVE** — State what you see (error message, unexpected output, file contents)
56
+ 2. **THINK** — Reason about what this means and what to investigate next
57
+ 3. **ACT** — Execute one targeted tool call based on the reasoning
58
+ This prevents random exploration and keeps investigation systematic.
59
+
60
+ ## Meta-Prompting: Self-Generated Debug Strategy
61
+
62
+ After gathering symptoms (step 2), generate your own investigation plan before spawning the debugger:
63
+
64
+ ```
65
+ Given symptoms: "{summary}"
66
+ My debug strategy:
67
+ 1. Most likely cause: {hypothesis} → Test by: {specific check}
68
+ 2. Second most likely: {hypothesis} → Test by: {specific check}
69
+ 3. Long shot: {hypothesis} → Test by: {specific check}
70
+ 4. Files to read first: {ordered list, most relevant first}
71
+ 5. What would DISPROVE each hypothesis: {falsification criteria}
72
+ ```
73
+
74
+ This self-generated strategy is passed to the pan-debugger agent as part of the prompt, giving it a targeted investigation plan rather than open-ended exploration. The falsification criteria are critical — they prevent the agent from confirming a hypothesis by only looking for supporting evidence.
75
+
52
76
  ## 2. Gather Symptoms (if new issue)
53
77
 
54
78
  Use AskUserQuestion for each:
@@ -123,9 +147,47 @@ Task(
123
147
  - "Manual investigation" - done
124
148
  - "Add more context" - gather more symptoms, spawn again
125
149
 
150
+ <debug_handoff_schema>
151
+ Debug session files (`.planning/debug/{slug}.md`) MUST contain structured state for cross-agent handoff:
152
+
153
+ ```yaml
154
+ # Required sections in debug session file
155
+ session: "{slug}"
156
+ status: "investigating | root-cause-found | fix-applied | resolved"
157
+ created: "{ISO-8601}"
158
+ updated: "{ISO-8601}"
159
+
160
+ symptoms:
161
+ expected: "{what should happen}"
162
+ actual: "{what happens instead}"
163
+ errors: "{error messages}"
164
+ reproduction: "{steps to reproduce}"
165
+
166
+ investigation:
167
+ hypotheses_tested:
168
+ - hypothesis: "{what we thought}"
169
+ result: "confirmed | eliminated"
170
+ evidence: "{file:line or command output}"
171
+ hypotheses_remaining:
172
+ - "{what still needs checking}"
173
+
174
+ root_cause: # Populated when found
175
+ description: "{what's actually wrong}"
176
+ evidence: "{file:line proof}"
177
+ confidence: "high | medium | low"
178
+
179
+ fix: # Populated when applied
180
+ files_changed: ["{paths}"]
181
+ approach: "{what was done}"
182
+ tests_added: ["{test paths}"]
183
+ ```
184
+
185
+ **Why structured:** Each continuation agent starts with 0 context. Without structured state, it re-reads the entire investigation log and may re-test eliminated hypotheses. With structured state, it reads `hypotheses_tested` (skip these), checks `hypotheses_remaining` (do these next), and picks up exactly where the previous agent stopped.
186
+ </debug_handoff_schema>
187
+
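
The handoff rule described above (skip what was tested, continue with what remains) can be sketched as a small function. This is an illustration only: `planContinuation` is a hypothetical name, and `state` is assumed to be the structured section of the debug file already parsed into a plain object (the YAML parsing step is not shown).

```javascript
// Hypothetical helper, not part of pan-wizard.
function planContinuation(state) {
  const investigation = state.investigation || {};
  const tested = investigation.hypotheses_tested || [];
  const remaining = investigation.hypotheses_remaining || [];

  // Anything already tested, confirmed or eliminated, is not re-run.
  const alreadyTested = new Set(tested.map((entry) => entry.hypothesis));
  const eliminated = tested
    .filter((entry) => entry.result === "eliminated")
    .map((entry) => entry.hypothesis);

  return {
    skip: eliminated, // dead ends the continuation agent must not revisit
    next: remaining.filter((hypothesis) => !alreadyTested.has(hypothesis)),
  };
}
```

Filtering `next` against the tested set guards against a stale file in which a hypothesis was tested but never removed from `hypotheses_remaining`.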
126
188
  ## 5. Spawn Continuation Agent (After Checkpoint)
127
189
 
128
- When user responds to checkpoint, spawn fresh agent:
190
+ When user responds to checkpoint, spawn fresh agent with the structured debug state:
129
191
 
130
192
  ```markdown
131
193
  <objective>
@@ -134,7 +196,7 @@ Continue debugging {slug}. Evidence is in the debug file.
134
196
 
135
197
  <prior_state>
136
198
  <files_to_read>
137
- - .planning/debug/{slug}.md (Debug session state)
199
+ - .planning/debug/{slug}.md (Debug session state — parse structured sections)
138
200
  </files_to_read>
139
201
  </prior_state>
140
202
 
@@ -146,6 +208,13 @@ Continue debugging {slug}. Evidence is in the debug file.
146
208
  <mode>
147
209
  goal: find_and_fix
148
210
  </mode>
211
+
212
+ <handoff_instructions>
213
+ 1. Parse the debug file's structured sections (symptoms, investigation, root_cause, fix)
214
+ 2. Do NOT re-test hypotheses marked "eliminated" — they are dead ends
215
+ 3. Start from hypotheses_remaining or the checkpoint's next action
216
+ 4. Update the debug file's structured sections as you progress
217
+ </handoff_instructions>
149
218
  ```
150
219
 
151
220
  ```
@@ -27,6 +27,32 @@ Context budget: ~15% orchestrator, 100% fresh per subagent.
27
27
  @~/.claude/pan-wizard-core/references/ui-brand.md
28
28
  </execution_context>
29
29
 
30
+ <completion_contract>
31
+ Execution is complete when ALL conditions are met:
32
+ 1. Every plan in the phase has been dispatched to a subagent
33
+ 2. All subagents have returned (success or failure)
34
+ 3. Full test suite passes with count >= pre-execution baseline
35
+ 4. All verified tasks committed with accurate commit messages
36
+ 5. state.md updated with phase progress
37
+ 6. Failed tasks (if any) logged with error classification and root cause
38
+
39
+ Execution FAILS if: test count drops below baseline after all retries, or state corruption is detected.
40
+ </completion_contract>
41
+
42
+ <wave_dependencies>
43
+ Discovery → Baseline: Test baseline MUST be captured before any wave executes (regression detection requires it)
44
+ Baseline → Wave N: Each wave MUST wait for the previous wave to complete and pass verification
45
+ Wave N → Commit N: Wave changes MUST pass tests before committing (don't commit broken code)
46
+ All Waves → Final Verify: Full test suite MUST pass after all waves complete
47
+ Final Verify → State Update: state.md MUST only be updated after verification passes
48
+
49
+ HARD STOP conditions (do not proceed to next wave):
50
+ - Baseline capture fails (test suite broken before we start) → STOP, report to user
51
+ - Wave N test count drops below baseline after 3 retries → revert wave, mark all wave tasks FAILED, continue to next wave
52
+ - State corruption detected (malformed state.md or plan files) → STOP execution entirely, report to user
53
+ - All waves complete but final test count < baseline → revert last wave, re-verify
54
+ </wave_dependencies>
55
+
30
56
  <context>
31
57
  Phase: $ARGUMENTS
32
58
 
@@ -35,11 +61,90 @@ Phase: $ARGUMENTS
35
61
  - `--skip-tests` — Skip automatic test generation after execution completes.
36
62
  - `--skip-review` — Skip automatic code review after execution completes.
37
63
  - `--fast` — Skip both test generation and code review (implies `--skip-tests --skip-review`).
64
+ - `--deep-review` (v3.4+) — After the normal reviewer step, also run `/pan:review-deep <phase>` (security audit via pan-hardener + cross-check via pan-meta-reviewer). Produces `.planning/reviews/<N>/deep-review.md`. Recommended for phases touching auth, payment, PII, migrations, or public APIs. Costs roughly 3× a normal review.
65
+ - `--hierarchical` (v3.4+, Claude + Opus 4.7 only) — Spawn `pan-conductor` as a top-level orchestrator that decomposes the phase and spawns executor/reviewer/verifier sub-agents in sequence. Bounded by safety harness: max 2 nesting levels, 12 spawns per phase, budget ceiling, `.planning/orchestration/abort` kill-switch. On non-Claude runtimes or older models, this flag is a no-op with a warning and falls back to flat exec. Use only for large phases (≥4 autonomous plans) where wall-clock reduction justifies the ~20-30% orchestration tax.
38
66
 
39
67
  Context files are resolved inside the workflow via `pan-tools init execute-phase` and per-subagent `<files_to_read>` blocks.
40
68
  </context>
41
69
 
70
+ <action_gating>
71
+ Each execution stage has a restricted set of appropriate actions. Using the wrong tool at the wrong stage causes regressions.
72
+
73
+ | Stage | Read | Grep/Glob | Edit/Write | Bash (tests) | Bash (git) | Agent |
74
+ |-------|------|-----------|------------|--------------|------------|-------|
75
+ | Discovery (find plans) | YES | YES | NO | NO | NO | NO |
76
+ | Baseline capture | YES | NO | NO | YES | YES | NO |
77
+ | Wave execution | YES | YES | YES | YES | NO | YES |
78
+ | Wave verification | YES | YES | NO | YES | NO | NO |
79
+ | Wave commit | NO | NO | NO | NO | YES | NO |
80
+ | Final verification | YES | YES | NO | YES | NO | NO |
81
+ | State update | YES | NO | YES | NO | YES | NO |
82
+
83
+ **Key constraints:**
84
+ - Discovery: read-only — do not modify files while figuring out what to execute
85
+ - Baseline: run tests + git status only — no code changes before baseline is captured
86
+ - Wave verification: NO Edit/Write — you are checking work, not doing more work
87
+ - Wave commit: git operations only — all code changes must be done before committing
88
+ </action_gating>
89
+
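
The action-gating table above reads naturally as a lookup from stage to permitted actions. The sketch below transcribes the YES cells of the table; the stage and action keys (`wave_commit`, `bash_git`, and so on) and the `isAllowed` helper are hypothetical names chosen for illustration, not identifiers from the package.

```javascript
// Hypothetical encoding of the action-gating table; keys are illustrative.
const ALLOWED_ACTIONS = {
  discovery: ["read", "grep_glob"],
  baseline_capture: ["read", "bash_tests", "bash_git"],
  wave_execution: ["read", "grep_glob", "edit_write", "bash_tests", "agent"],
  wave_verification: ["read", "grep_glob", "bash_tests"],
  wave_commit: ["bash_git"],
  final_verification: ["read", "grep_glob", "bash_tests"],
  state_update: ["read", "edit_write", "bash_git"],
};

// True only when the table marks the action YES for that stage.
function isAllowed(stage, action) {
  const allowed = ALLOWED_ACTIONS[stage];
  if (!allowed) throw new Error("unknown stage: " + stage);
  return allowed.includes(action);
}
```

For example, `isAllowed("wave_verification", "edit_write")` is `false`, which matches the constraint that verification checks work rather than doing more of it.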
90
+ <cache_priming>
91
+ **Before Discovery, prime the prompt cache once per invocation.** All subagents spawned within the next 5 minutes will hit the cache instead of re-sending the full context.
92
+
93
+ Run once:
94
+ ```
95
+ pan-tools cache prime --summary
96
+ ```
97
+
98
+ This returns `{blocks: [{path, bytes, cache}], total_bytes, sha}` for the cacheable set (project.md, requirements.md, roadmap.md, state.md, standards.md). The `sha` is stable across identical inputs, so repeated calls within the phase hit cached reads.
99
+
100
+ When spawning subagents for wave execution, include the cacheable block paths in each agent's system-context so the host runtime (Claude Code with Opus 4.7) can mark them `cache_control: ephemeral`. On non-Claude runtimes or older models, this step is a no-op — nothing breaks, just no savings.
101
+ </cache_priming>
102
+
42
103
  <process>
43
104
  Execute the execute-phase workflow from @~/.claude/pan-wizard-core/workflows/exec-phase.md end-to-end.
44
105
  Preserve all workflow gates (wave execution, checkpoint handling, verification, state updates, routing).
106
+
107
+ **Context Management Across Waves:**
108
+ - KEEP: Phase goals, test baseline, current wave tasks, file paths being modified
109
+ - SUMMARIZE: Completed wave results to one-line summaries
110
+ - DISCARD: Raw tool output from previous waves
111
+
112
+ **Attention Anchor — emit after each wave completes:**
113
+ ```
114
+ Wave {N}/{total} complete | Tasks: {done}/{total} | Tests: {baseline} → {current}
115
+ Remaining waves: {list of wave numbers with task counts}
116
+ Next: Wave {N+1} — {task count} tasks [{task IDs}]
117
+ ```
118
+ This prevents drift in multi-wave phases where the agent loses track of which waves remain and what the test baseline was.
119
+
120
+ **State Intent Before Implementing (M+ tasks):**
121
+ For each STANDARD or FULL task, state before coding: "I will modify [files], adding [what], to achieve [goal]. Risk: [what could break]."
122
+
123
+ **Pre-Commit Verification Checklist — apply before each wave commit:**
124
+ 1. Every modified file was read before editing
125
+ 2. `git diff --stat` contains only files related to the current wave's tasks
126
+ 3. Test suite passes and count meets or exceeds pre-wave baseline
127
+ 4. Commit message lists only tasks that are verified (tests ran, tests passed)
128
+ 5. No secrets or credentials staged
129
+
130
+ If any check fails: fix and re-verify before committing.
131
+
132
+ **Error Recovery Classification — apply when any task fails:**
133
+ - RECOVERABLE (retry up to 3 times): test failure after code change, build syntax error, file not found (search for moved path)
134
+ - UNRECOVERABLE (mark task FAILED, continue to next): same failure after 3 retries, permission errors, state corruption, unrelated test regression
135
+ Never let a failed task block the rest of the wave.
136
+
137
+ **Anti-Overengineering:**
138
+ Implement exactly what the plan says. Do not add features, refactor surrounding code, add comments to unchanged files, or create abstractions for one-time operations.
139
+
140
+ **Common Anti-Patterns (avoid these):**
141
+ ```
142
+ BAD: Task says "add input validation" → you also refactor the error handler, add logging, and rename variables
143
+ → 3 unrelated changes pollute the diff, risk regressions in untested paths
144
+ GOOD: Add validation only → commit → let the next task handle error handling if planned
145
+
146
+ BAD: Test fails → change the test's expected output to match the broken code
147
+ → Bug is now hidden, passes CI, breaks in production
148
+ GOOD: Test fails → read the test intent → fix the code to match the expected behavior
149
+ ```
45
150
  </process>
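
The error recovery classification in this hunk can be sketched as a retry loop. This is a minimal illustration, not the workflow's implementation: `runTask`, `runWave`, and the failure `kind` strings are hypothetical, and "retry up to 3 times" is read here as one initial attempt plus at most three retries.

```javascript
// Hypothetical failure kinds that are never retried.
const UNRECOVERABLE_KINDS = new Set([
  "permission_error",
  "state_corruption",
  "unrelated_regression",
]);

// `attempt` is a caller-supplied function returning { ok, kind }.
function runTask(attempt, maxRetries = 3) {
  let attempts = 0;
  for (;;) {
    const result = attempt(attempts);
    attempts += 1;
    if (result.ok) return { status: "DONE", attempts };
    const outOfRetries = attempts > maxRetries;
    if (UNRECOVERABLE_KINDS.has(result.kind) || outOfRetries) {
      return { status: "FAILED", attempts, kind: result.kind };
    }
  }
}

// A failed task is recorded and the wave moves on to the next task.
function runWave(tasks) {
  return tasks.map((task) => ({ id: task.id, ...runTask(task.attempt) }));
}
```

Because `runWave` maps over every task regardless of earlier outcomes, one FAILED task never blocks the rest of the wave.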