shipwright-cli 3.1.0 → 3.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (283)
  1. package/.claude/agents/code-reviewer.md +2 -0
  2. package/.claude/agents/devops-engineer.md +2 -0
  3. package/.claude/agents/doc-fleet-agent.md +2 -0
  4. package/.claude/agents/pipeline-agent.md +2 -0
  5. package/.claude/agents/shell-script-specialist.md +2 -0
  6. package/.claude/agents/test-specialist.md +2 -0
  7. package/.claude/hooks/agent-crash-capture.sh +32 -0
  8. package/.claude/hooks/post-tool-use.sh +3 -2
  9. package/.claude/hooks/pre-tool-use.sh +35 -3
  10. package/README.md +22 -8
  11. package/claude-code/hooks/config-change.sh +18 -0
  12. package/claude-code/hooks/instructions-reloaded.sh +7 -0
  13. package/claude-code/hooks/worktree-create.sh +25 -0
  14. package/claude-code/hooks/worktree-remove.sh +20 -0
  15. package/config/code-constitution.json +130 -0
  16. package/config/defaults.json +25 -2
  17. package/config/policy.json +1 -1
  18. package/dashboard/middleware/auth.ts +134 -0
  19. package/dashboard/middleware/constants.ts +21 -0
  20. package/dashboard/public/index.html +8 -6
  21. package/dashboard/public/styles.css +176 -97
  22. package/dashboard/routes/auth.ts +38 -0
  23. package/dashboard/server.ts +117 -25
  24. package/dashboard/services/config.ts +26 -0
  25. package/dashboard/services/db.ts +118 -0
  26. package/dashboard/src/canvas/pixel-agent.ts +298 -0
  27. package/dashboard/src/canvas/pixel-sprites.ts +440 -0
  28. package/dashboard/src/canvas/shipyard-effects.ts +367 -0
  29. package/dashboard/src/canvas/shipyard-scene.ts +616 -0
  30. package/dashboard/src/canvas/submarine-layout.ts +267 -0
  31. package/dashboard/src/components/header.ts +8 -7
  32. package/dashboard/src/core/api.ts +5 -0
  33. package/dashboard/src/core/router.ts +1 -0
  34. package/dashboard/src/design/submarine-theme.ts +253 -0
  35. package/dashboard/src/main.ts +2 -0
  36. package/dashboard/src/types/api.ts +12 -1
  37. package/dashboard/src/views/activity.ts +2 -1
  38. package/dashboard/src/views/metrics.ts +69 -1
  39. package/dashboard/src/views/shipyard.ts +39 -0
  40. package/dashboard/types/index.ts +166 -0
  41. package/docs/plans/2026-02-28-compound-audit-and-shipyard-design.md +186 -0
  42. package/docs/plans/2026-02-28-skipper-shipwright-implementation-plan.md +1182 -0
  43. package/docs/plans/2026-02-28-skipper-shipwright-integration-design.md +531 -0
  44. package/docs/plans/2026-03-01-ai-powered-skill-injection-design.md +298 -0
  45. package/docs/plans/2026-03-01-ai-powered-skill-injection-plan.md +1109 -0
  46. package/docs/plans/2026-03-01-capabilities-cleanup-plan.md +658 -0
  47. package/docs/plans/2026-03-01-clean-architecture-plan.md +924 -0
  48. package/docs/plans/2026-03-01-compound-audit-cascade-design.md +191 -0
  49. package/docs/plans/2026-03-01-compound-audit-cascade-plan.md +921 -0
  50. package/docs/plans/2026-03-01-deep-integration-plan.md +851 -0
  51. package/docs/plans/2026-03-01-pipeline-audit-trail-design.md +145 -0
  52. package/docs/plans/2026-03-01-pipeline-audit-trail-plan.md +770 -0
  53. package/docs/plans/2026-03-01-refined-depths-brand-design.md +382 -0
  54. package/docs/plans/2026-03-01-refined-depths-implementation.md +599 -0
  55. package/docs/plans/2026-03-01-skipper-kernel-integration-design.md +203 -0
  56. package/docs/plans/2026-03-01-unified-platform-design.md +272 -0
  57. package/docs/plans/2026-03-07-claude-code-feature-integration-design.md +189 -0
  58. package/docs/plans/2026-03-07-claude-code-feature-integration-plan.md +1165 -0
  59. package/docs/research/BACKLOG_QUICK_REFERENCE.md +352 -0
  60. package/docs/research/CUTTING_EDGE_RESEARCH_2026.md +546 -0
  61. package/docs/research/RESEARCH_INDEX.md +439 -0
  62. package/docs/research/RESEARCH_SOURCES.md +440 -0
  63. package/docs/research/RESEARCH_SUMMARY.txt +275 -0
  64. package/docs/superpowers/specs/2026-03-10-pipeline-quality-revolution-design.md +341 -0
  65. package/package.json +2 -2
  66. package/scripts/lib/adaptive-model.sh +427 -0
  67. package/scripts/lib/adaptive-timeout.sh +316 -0
  68. package/scripts/lib/audit-trail.sh +309 -0
  69. package/scripts/lib/auto-recovery.sh +471 -0
  70. package/scripts/lib/bandit-selector.sh +431 -0
  71. package/scripts/lib/bootstrap.sh +104 -2
  72. package/scripts/lib/causal-graph.sh +455 -0
  73. package/scripts/lib/compat.sh +126 -0
  74. package/scripts/lib/compound-audit.sh +337 -0
  75. package/scripts/lib/constitutional.sh +454 -0
  76. package/scripts/lib/context-budget.sh +359 -0
  77. package/scripts/lib/convergence.sh +594 -0
  78. package/scripts/lib/cost-optimizer.sh +634 -0
  79. package/scripts/lib/daemon-adaptive.sh +14 -2
  80. package/scripts/lib/daemon-dispatch.sh +106 -17
  81. package/scripts/lib/daemon-failure.sh +34 -4
  82. package/scripts/lib/daemon-patrol.sh +25 -4
  83. package/scripts/lib/daemon-poll-github.sh +361 -0
  84. package/scripts/lib/daemon-poll-health.sh +299 -0
  85. package/scripts/lib/daemon-poll.sh +27 -611
  86. package/scripts/lib/daemon-state.sh +119 -66
  87. package/scripts/lib/daemon-triage.sh +10 -0
  88. package/scripts/lib/dod-scorecard.sh +442 -0
  89. package/scripts/lib/error-actionability.sh +300 -0
  90. package/scripts/lib/formal-spec.sh +461 -0
  91. package/scripts/lib/helpers.sh +180 -5
  92. package/scripts/lib/intent-analysis.sh +409 -0
  93. package/scripts/lib/loop-convergence.sh +350 -0
  94. package/scripts/lib/loop-iteration.sh +682 -0
  95. package/scripts/lib/loop-progress.sh +48 -0
  96. package/scripts/lib/loop-restart.sh +185 -0
  97. package/scripts/lib/memory-effectiveness.sh +506 -0
  98. package/scripts/lib/mutation-executor.sh +352 -0
  99. package/scripts/lib/outcome-feedback.sh +521 -0
  100. package/scripts/lib/pipeline-cli.sh +336 -0
  101. package/scripts/lib/pipeline-commands.sh +1216 -0
  102. package/scripts/lib/pipeline-detection.sh +101 -3
  103. package/scripts/lib/pipeline-execution.sh +897 -0
  104. package/scripts/lib/pipeline-github.sh +28 -3
  105. package/scripts/lib/pipeline-intelligence-compound.sh +431 -0
  106. package/scripts/lib/pipeline-intelligence-scoring.sh +407 -0
  107. package/scripts/lib/pipeline-intelligence-skip.sh +181 -0
  108. package/scripts/lib/pipeline-intelligence.sh +104 -1138
  109. package/scripts/lib/pipeline-quality-bash-compat.sh +182 -0
  110. package/scripts/lib/pipeline-quality-checks.sh +17 -711
  111. package/scripts/lib/pipeline-quality-gates.sh +563 -0
  112. package/scripts/lib/pipeline-stages-build.sh +730 -0
  113. package/scripts/lib/pipeline-stages-delivery.sh +965 -0
  114. package/scripts/lib/pipeline-stages-intake.sh +1133 -0
  115. package/scripts/lib/pipeline-stages-monitor.sh +407 -0
  116. package/scripts/lib/pipeline-stages-review.sh +1022 -0
  117. package/scripts/lib/pipeline-stages.sh +161 -2901
  118. package/scripts/lib/pipeline-state.sh +36 -5
  119. package/scripts/lib/pipeline-util.sh +487 -0
  120. package/scripts/lib/policy-learner.sh +438 -0
  121. package/scripts/lib/process-reward.sh +493 -0
  122. package/scripts/lib/project-detect.sh +649 -0
  123. package/scripts/lib/quality-profile.sh +334 -0
  124. package/scripts/lib/recruit-commands.sh +885 -0
  125. package/scripts/lib/recruit-learning.sh +739 -0
  126. package/scripts/lib/recruit-roles.sh +648 -0
  127. package/scripts/lib/reward-aggregator.sh +458 -0
  128. package/scripts/lib/rl-optimizer.sh +362 -0
  129. package/scripts/lib/root-cause.sh +427 -0
  130. package/scripts/lib/scope-enforcement.sh +445 -0
  131. package/scripts/lib/session-restart.sh +493 -0
  132. package/scripts/lib/skill-memory.sh +300 -0
  133. package/scripts/lib/skill-registry.sh +775 -0
  134. package/scripts/lib/spec-driven.sh +476 -0
  135. package/scripts/lib/test-helpers.sh +18 -7
  136. package/scripts/lib/test-holdout.sh +429 -0
  137. package/scripts/lib/test-optimizer.sh +511 -0
  138. package/scripts/shipwright-file-suggest.sh +45 -0
  139. package/scripts/skills/adversarial-quality.md +61 -0
  140. package/scripts/skills/api-design.md +44 -0
  141. package/scripts/skills/architecture-design.md +50 -0
  142. package/scripts/skills/brainstorming.md +43 -0
  143. package/scripts/skills/data-pipeline.md +44 -0
  144. package/scripts/skills/deploy-safety.md +64 -0
  145. package/scripts/skills/documentation.md +38 -0
  146. package/scripts/skills/frontend-design.md +45 -0
  147. package/scripts/skills/generated/.gitkeep +0 -0
  148. package/scripts/skills/generated/_refinements/.gitkeep +0 -0
  149. package/scripts/skills/generated/_refinements/adversarial-quality.patch.md +3 -0
  150. package/scripts/skills/generated/_refinements/architecture-design.patch.md +3 -0
  151. package/scripts/skills/generated/_refinements/brainstorming.patch.md +3 -0
  152. package/scripts/skills/generated/cli-version-management.md +29 -0
  153. package/scripts/skills/generated/collection-system-validation.md +99 -0
  154. package/scripts/skills/generated/large-scale-c-refactoring-coordination.md +97 -0
  155. package/scripts/skills/generated/pattern-matching-similarity-scoring.md +195 -0
  156. package/scripts/skills/generated/test-parallelization-detection.md +65 -0
  157. package/scripts/skills/observability.md +79 -0
  158. package/scripts/skills/performance.md +48 -0
  159. package/scripts/skills/pr-quality.md +49 -0
  160. package/scripts/skills/product-thinking.md +43 -0
  161. package/scripts/skills/security-audit.md +49 -0
  162. package/scripts/skills/systematic-debugging.md +40 -0
  163. package/scripts/skills/testing-strategy.md +47 -0
  164. package/scripts/skills/two-stage-review.md +52 -0
  165. package/scripts/skills/validation-thoroughness.md +55 -0
  166. package/scripts/sw +9 -3
  167. package/scripts/sw-activity.sh +9 -8
  168. package/scripts/sw-adaptive.sh +8 -7
  169. package/scripts/sw-adversarial.sh +2 -1
  170. package/scripts/sw-architecture-enforcer.sh +3 -1
  171. package/scripts/sw-auth.sh +12 -2
  172. package/scripts/sw-autonomous.sh +5 -1
  173. package/scripts/sw-changelog.sh +4 -1
  174. package/scripts/sw-checkpoint.sh +2 -1
  175. package/scripts/sw-ci.sh +15 -6
  176. package/scripts/sw-cleanup.sh +4 -26
  177. package/scripts/sw-code-review.sh +45 -20
  178. package/scripts/sw-connect.sh +2 -1
  179. package/scripts/sw-context.sh +2 -1
  180. package/scripts/sw-cost.sh +107 -5
  181. package/scripts/sw-daemon.sh +71 -11
  182. package/scripts/sw-dashboard.sh +3 -1
  183. package/scripts/sw-db.sh +71 -20
  184. package/scripts/sw-decide.sh +8 -2
  185. package/scripts/sw-decompose.sh +360 -17
  186. package/scripts/sw-deps.sh +4 -1
  187. package/scripts/sw-developer-simulation.sh +4 -1
  188. package/scripts/sw-discovery.sh +378 -5
  189. package/scripts/sw-doc-fleet.sh +4 -1
  190. package/scripts/sw-docs-agent.sh +3 -1
  191. package/scripts/sw-docs.sh +2 -1
  192. package/scripts/sw-doctor.sh +453 -2
  193. package/scripts/sw-dora.sh +4 -1
  194. package/scripts/sw-durable.sh +12 -7
  195. package/scripts/sw-e2e-orchestrator.sh +17 -16
  196. package/scripts/sw-eventbus.sh +13 -4
  197. package/scripts/sw-evidence.sh +364 -12
  198. package/scripts/sw-feedback.sh +550 -9
  199. package/scripts/sw-fix.sh +20 -1
  200. package/scripts/sw-fleet-discover.sh +6 -2
  201. package/scripts/sw-fleet-viz.sh +9 -4
  202. package/scripts/sw-fleet.sh +5 -1
  203. package/scripts/sw-github-app.sh +18 -4
  204. package/scripts/sw-github-checks.sh +3 -2
  205. package/scripts/sw-github-deploy.sh +3 -2
  206. package/scripts/sw-github-graphql.sh +18 -7
  207. package/scripts/sw-guild.sh +5 -1
  208. package/scripts/sw-heartbeat.sh +5 -30
  209. package/scripts/sw-hello.sh +67 -0
  210. package/scripts/sw-hygiene.sh +10 -3
  211. package/scripts/sw-incident.sh +273 -5
  212. package/scripts/sw-init.sh +18 -2
  213. package/scripts/sw-instrument.sh +10 -2
  214. package/scripts/sw-intelligence.sh +44 -7
  215. package/scripts/sw-jira.sh +5 -1
  216. package/scripts/sw-launchd.sh +2 -1
  217. package/scripts/sw-linear.sh +4 -1
  218. package/scripts/sw-logs.sh +4 -1
  219. package/scripts/sw-loop.sh +436 -1076
  220. package/scripts/sw-memory.sh +357 -3
  221. package/scripts/sw-mission-control.sh +6 -1
  222. package/scripts/sw-model-router.sh +483 -27
  223. package/scripts/sw-otel.sh +15 -4
  224. package/scripts/sw-oversight.sh +14 -5
  225. package/scripts/sw-patrol-meta.sh +334 -0
  226. package/scripts/sw-pipeline-composer.sh +7 -1
  227. package/scripts/sw-pipeline-vitals.sh +12 -6
  228. package/scripts/sw-pipeline.sh +54 -2653
  229. package/scripts/sw-pm.sh +16 -8
  230. package/scripts/sw-pr-lifecycle.sh +2 -1
  231. package/scripts/sw-predictive.sh +17 -5
  232. package/scripts/sw-prep.sh +185 -2
  233. package/scripts/sw-ps.sh +5 -25
  234. package/scripts/sw-public-dashboard.sh +17 -4
  235. package/scripts/sw-quality.sh +14 -6
  236. package/scripts/sw-reaper.sh +8 -25
  237. package/scripts/sw-recruit.sh +156 -2303
  238. package/scripts/sw-regression.sh +19 -12
  239. package/scripts/sw-release-manager.sh +3 -1
  240. package/scripts/sw-release.sh +4 -1
  241. package/scripts/sw-remote.sh +3 -1
  242. package/scripts/sw-replay.sh +7 -1
  243. package/scripts/sw-retro.sh +158 -1
  244. package/scripts/sw-review-rerun.sh +3 -1
  245. package/scripts/sw-scale.sh +14 -5
  246. package/scripts/sw-security-audit.sh +6 -1
  247. package/scripts/sw-self-optimize.sh +173 -6
  248. package/scripts/sw-session.sh +9 -3
  249. package/scripts/sw-setup.sh +3 -1
  250. package/scripts/sw-stall-detector.sh +406 -0
  251. package/scripts/sw-standup.sh +15 -7
  252. package/scripts/sw-status.sh +3 -1
  253. package/scripts/sw-strategic.sh +14 -6
  254. package/scripts/sw-stream.sh +13 -4
  255. package/scripts/sw-swarm.sh +20 -7
  256. package/scripts/sw-team-stages.sh +13 -6
  257. package/scripts/sw-templates.sh +7 -31
  258. package/scripts/sw-testgen.sh +17 -6
  259. package/scripts/sw-tmux-pipeline.sh +4 -1
  260. package/scripts/sw-tmux-role-color.sh +2 -0
  261. package/scripts/sw-tmux-status.sh +1 -1
  262. package/scripts/sw-tmux.sh +37 -1
  263. package/scripts/sw-trace.sh +3 -1
  264. package/scripts/sw-tracker-github.sh +3 -0
  265. package/scripts/sw-tracker-jira.sh +3 -0
  266. package/scripts/sw-tracker-linear.sh +3 -0
  267. package/scripts/sw-tracker.sh +3 -1
  268. package/scripts/sw-triage.sh +3 -2
  269. package/scripts/sw-upgrade.sh +3 -1
  270. package/scripts/sw-ux.sh +5 -2
  271. package/scripts/sw-webhook.sh +5 -2
  272. package/scripts/sw-widgets.sh +9 -4
  273. package/scripts/sw-worktree.sh +15 -3
  274. package/scripts/test-skill-injection.sh +1233 -0
  275. package/templates/pipelines/autonomous.json +27 -3
  276. package/templates/pipelines/cost-aware.json +34 -8
  277. package/templates/pipelines/deployed.json +12 -0
  278. package/templates/pipelines/enterprise.json +12 -0
  279. package/templates/pipelines/fast.json +6 -0
  280. package/templates/pipelines/full.json +27 -3
  281. package/templates/pipelines/hotfix.json +6 -0
  282. package/templates/pipelines/standard.json +12 -0
  283. package/templates/pipelines/tdd.json +12 -0
@@ -0,0 +1,195 @@
# Pattern Matching & Failure Prevention Scoring

## Overview

This skill guides design and implementation of pattern-based proactive failure prevention: matching incoming issues against captured failure patterns, scoring similarity, injecting relevant context, and measuring whether patterns actually prevent repeat failures.

## Similarity Scoring Algorithm (0-100 scale)

For each incoming issue, compute a composite similarity score against each known failure pattern:

### Component 1: Title Similarity (40% weight)
- Fuzzy string matching using token overlap or Levenshtein distance normalized by string length
- Captures semantic closeness of the problem description
- Example: "API timeout on user endpoint" vs "Timeout in auth middleware" → ~0.7 similarity → 28 points

### Component 2: File Overlap (35% weight)
- Compare changed files in original failure vs incoming issue
- Score = (overlapping_files / max(original_files, incoming_files)) * 100
- Files touching the same components are more likely to have similar root causes
- Example: Both touched `scripts/sw-daemon.sh` and `scripts/lib/daemon-dispatch.sh` → 35 points if full overlap

### Component 3: Error Signature Match (25% weight)
- Check if error message substrings or error codes appear in both
- Extract from error-summary.json or stack trace (structured format preferred)
- Example: Both contain "pipefail" or "ENOENT" → 25 points

**Formula: score = (title_score * 0.4) + (file_score * 0.35) + (error_score * 0.25)**

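The composite formula can be sketched as a small scoring function. This is an illustrative sketch, not shipwright's actual implementation: the function names and record fields are assumptions, and `SequenceMatcher` stands in for whichever fuzzy matcher is chosen.

```python
from difflib import SequenceMatcher

WEIGHTS = {"title": 0.40, "files": 0.35, "error": 0.25}

def title_score(a: str, b: str) -> float:
    """Fuzzy title similarity on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def file_overlap_score(original: set, incoming: set) -> float:
    """overlapping_files / max(original_files, incoming_files) * 100."""
    if not original or not incoming:
        return 0.0
    return 100.0 * len(original & incoming) / max(len(original), len(incoming))

def error_score(sig_a: str, sig_b: str) -> float:
    """All-or-nothing substring match on error signatures."""
    if sig_a and sig_b and (sig_a in sig_b or sig_b in sig_a):
        return 100.0
    return 0.0

def composite_score(pattern: dict, issue: dict) -> float:
    """Weighted sum of the three components, 0-100."""
    return (
        WEIGHTS["title"] * title_score(pattern["title"], issue["title"])
        + WEIGHTS["files"] * file_overlap_score(set(pattern["files"]), set(issue["files"]))
        + WEIGHTS["error"] * error_score(pattern["error_sig"], issue["error_sig"])
    )
```

An identical pattern and issue score 100; each component saturates at its weight's share of the total.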
## Injection Thresholds

- **Below 60**: Pattern not relevant, no injection
- **60-80**: Inject with confidence tag ("medium confidence match")
- **80-100**: Inject with high confidence ("strong pattern match")
- **Configurable threshold**: daemon-config.json `memory_pattern_matching.similarity_threshold`

## Proactive Injection Strategy

When score > threshold (at pipeline spawn time, before plan stage):

1. Extract relevant context from memory pattern:
   - Root cause description
   - Applied fix(es)
   - Environment/version context if present
   - What worked vs what didn't

2. Inject into pipeline prompt:
   ```
   Similar pattern found (confidence: 85%): Issue #123 "API timeout on user endpoint"
   Root cause: Unbounded goroutine creation in event loop
   Applied fix: Add semaphore to limit concurrent handlers
   Files affected: scripts/sw-daemon.sh, internal/loop.go
   ```

3. Tag injection metadata:
   - `pattern_id`: which pattern was injected
   - `injection_score`: 0-100 confidence
   - `timestamp`: when injected
   - `source_issue_id`: which failure this pattern came from

4. **Important**: No coercion. Agent can ignore injected pattern if context doesn't match.

## Outcome Tracking

After pipeline completes, record:

```json
{
  "pattern_injected": true,
  "pattern_id": "mem_abc123",
  "injection_score": 85,
  "incoming_issue_id": "456",
  "source_issue_id": "123",
  "failure_occurred": false,
  "failure_type": null,
  "expected_failure_type": "timeout",
  "failure_type_matched": null,
  "failure_prevented": true,
  "confidence_in_prevention": 0.7,
  "notes": "Applied semaphore fix suggested by pattern; no timeout occurred."
}
```

**Caveat**: `failure_prevented` is **inference**, not proof.
- True positive: pattern injected, agent applied the fix, no failure occurred, failure type matched expected type
- Could be false positive: pattern coincidentally matched, but agent's own skill prevented failure
- Always include confidence score; never claim 100% causation

## Effectiveness Metrics (Dashboard)

**Per-pattern metrics:**
- **Success Rate**: (times_injected AND failure_prevented) / times_injected
- **Usage Frequency**: times_injected in last 30 days
- **Confidence Distribution**: histogram of injection_scores
- **False Positive Rate**: (times_injected AND failure_occurred) / times_injected

**Aggregate metrics:**
- **Overall Memory Injection ROI**: sum(successful_injections) / sum(total_injections)
- **Patterns Needing Refinement**: patterns with > 30% usage but < 40% success rate (candidates for root cause re-analysis)
- **Trending**: success rate on 7-day and 30-day windows; alert if trending down
- **Pattern Lifecycle**: which patterns are becoming obsolete (< 1% usage in 90 days)?

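Given the outcome-tracking records, the per-pattern rates reduce to simple counting. A hedged sketch, assuming records use the JSON keys from the outcome-tracking example (the function name is illustrative):

```python
def pattern_metrics(records: list[dict]) -> dict:
    """Aggregate one pattern's success and false-positive rates.

    success_rate = prevented injections / total injections
    false_positive_rate = injections where the failure still occurred / total injections
    """
    injected = [r for r in records if r.get("pattern_injected")]
    if not injected:
        # Never injected: rates are undefined, not zero.
        return {"success_rate": None, "false_positive_rate": None, "times_injected": 0}
    prevented = sum(1 for r in injected if r.get("failure_prevented"))
    occurred = sum(1 for r in injected if r.get("failure_occurred"))
    n = len(injected)
    return {
        "success_rate": prevented / n,
        "false_positive_rate": occurred / n,
        "times_injected": n,
    }
```

Returning `None` rather than 0 for never-injected patterns keeps the "patterns never injected" dashboard query honest.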
## Integration Points

1. **sw-memory.sh**
   - Call `memory_get_patterns()` to retrieve all patterns with timestamps, failure_type, root_cause
   - Call `memory_add_outcome_tracking()` to record success/failure outcome

2. **sw-intelligence.sh**
   - Integrate pattern scoring into `intake` stage
   - Score issue at pipeline spawn time (before plan stage)
   - Return top 3 matching patterns sorted by score

3. **Pipeline prompt composition**
   - Add `memory_pattern_context` section to prompt if score > threshold
   - Include confidence score so agent is aware this is a suggestion, not a fact

4. **Pipeline state tracking**
   - Add `memory_patterns` section to pipeline-state.md with injected pattern details
   - Track injection_score, outcome_recorded=true/false

5. **Loop iteration context**
   - If issue re-runs in build loop, re-score with new error context
   - Emerging error signatures may match different patterns on retry

## Testing Strategy

**Unit tests:**
- Similarity scoring against known issue pairs with ground truth
- Threshold boundary behavior (59, 60, 61)
- Weight adjustment: verify 0.4 + 0.35 + 0.25 = 1.0

**Edge cases:**
- Empty pattern database → score undefined, no injection
- Identical issues with different outcomes → verify both outcomes tracked
- Pattern with malformed error_signature → graceful fallback
- Very high similarity (> 95%) → verify no over-confidence

**Integration tests:**
- Inject pattern, verify it appears in pipeline prompt
- Run build, record outcome, verify outcome_tracking fires
- Query dashboard, verify metrics match recorded outcomes

**Effectiveness validation:**
- Mock a failure type, populate patterns database, run scoring
- Verify pattern was injected at expected score
- Mock outcome (failure_prevented=true/false), verify metrics compute correctly

## Configuration Example

```json
{
  "memory_pattern_matching": {
    "enabled": true,
    "similarity_threshold": 60,
    "weights": {
      "title_similarity": 0.4,
      "file_overlap": 0.35,
      "error_signature": 0.25
    },
    "confidence_tiers": {
      "high": 80,
      "medium": 60,
      "low": 30
    },
    "max_patterns_to_inject": 3,
    "metrics_retention_days": 90,
    "anomaly_detection_enabled": true
  }
}
```

## Risk Mitigation

**Risk 1: False positive injection**
- Monitoring: alert if false_positive_rate > 15%
- Mitigation: lower threshold, disable for specific pattern types, or retire pattern

**Risk 2: Outcome attribution confusion**
- Always show confidence_in_prevention as a float (0.0-1.0), never binary
- Document that "prevented" is inferred, not measured
- Quarterly review of patterns with low confidence

**Risk 3: Circular reasoning**
- Patterns must capture ROOT CAUSE, not just "solution"
- Red flag: if a pattern's root_cause is identical to another pattern's → merge them
- Quarterly audit of pattern root_cause quality

**Risk 4: Performance at scale**
- Scoring 100+ patterns should take < 500ms
- Use cached similarity scores if possible
- Parallelize scoring if the pattern database grows beyond 500 patterns

**Risk 5: Stale patterns**
- Patterns from > 180 days ago with < 5 uses → mark for review
- Dashboard should surface "patterns never injected" for root cause analysis
@@ -0,0 +1,65 @@
## Test Parallelization Detection & Coordination

### Problem
Test parallelization is dangerous: undetected shared state (temp files, global state, database connections) causes race conditions and flaky failures. This skill provides a systematic approach to detect parallelizable test suites and coordinate their execution safely.

### Shared State Detection Heuristics

**Static Analysis (file scanning):**
- Scan test file imports for singleton patterns (db connections, file handles, global state modules)
- Detect hardcoded file paths (temp dirs) and network ports — tests using fixed resources conflict
- Check for `beforeAll`/`afterAll` hooks that modify global state
- Identify test files importing shared fixtures/setup modules

**Dynamic Analysis (test execution):**
- Run test suite with `--detectOpenHandles` (Node.js) or equivalent to catch file/port leaks
- Track temp directory usage per test file — any overlap = unsafe to parallelize
- Monitor for test isolation violations (tests passing in isolation but failing when run together)

**Safety Levels:**
- **Green (parallelizable)**: No shared state detected, no fixture conflicts, passes isolation tests
- **Yellow (conditional)**: Shared fixtures but isolated datasets, parallel execution with coordination (e.g., separate DB schemas)
- **Red (sequential)**: Database transaction rollback, process spawning, hardware resource contention — must run serially

### Affected-Test Detection via Git Diff

**Module Dependency Tracking:**
1. Build module-to-test mapping (which tests exercise which modules)
2. On each commit, run `git diff --name-only HEAD~1` to identify changed modules
3. Find all tests that import/test those modules
4. Prioritize affected tests first in execution order (fail-fast on functionality regression)
5. Cache mapping per commit to avoid re-scanning on retries

+
33
+ **False Negatives to Handle:**
34
+ - Integration tests that cross module boundaries (require broader analysis)
35
+ - Tests that exercise shared utilities or base classes (conservative: mark as affected if any parent module changed)
36
+ - Dynamic imports and string-based test discovery (fallback: scan test code for patterns)
37
+
38
+ ### Parallel Execution Coordination
39
+
40
+ **Scheduler:**
41
+ - Detect CPU core count, default to `cores - 1` (reserve 1 for OS)
42
+ - Group parallelizable tests into batches, run batches in parallel
43
+ - Within each batch, respect test file order (some test runners depend on execution order)
44
+ - Run non-parallelizable (red) tests serially, either before or after parallel batches (configurable)
45
+
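A minimal sketch of this scheduler, assuming tests are tagged with the green/yellow/red safety levels above. Names are illustrative; yellow tests are conservatively treated as serial here since their coordination is suite-specific:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def run_suite(tests: list[dict], run_one, serial_first: bool = False) -> dict:
    """Run green tests on cores-1 workers; everything else serially."""
    parallel = [t for t in tests if t["level"] == "green"]
    serial = [t for t in tests if t["level"] != "green"]  # red + yellow
    results: dict[str, bool] = {}

    def do_serial() -> None:
        for t in serial:
            results[t["name"]] = run_one(t)

    if serial_first:
        do_serial()
    workers = max(1, (os.cpu_count() or 2) - 1)  # reserve 1 core for the OS
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, respecting test file order.
        for t, ok in zip(parallel, pool.map(run_one, parallel)):
            results[t["name"]] = ok
    if not serial_first:
        do_serial()
    return results
```

A real runner would spawn worker processes rather than threads; the batching and ordering logic is the point here.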
**Fast-Fail Policy:**
- Critical failures: assertion errors, uncaught exceptions → abort immediately
- Flaky failures: timeout, process exit, known-flaky markers → retry up to N times before aborting
- Aggregate results across parallel workers before reporting
- Time tracking: measure wall-clock time for each batch, report parallelization efficiency (theoretical vs actual speedup)

### Dashboard Integration

- Display parallel execution summary: N tests in M workers, X% speedup
- Visualize test dependency graph (which tests block which)
- Alert on shared-state violations (test passed alone, failed in parallel)
- Trend: parallelization efficiency over time (detect regressions where new tests add serial bottlenecks)

### Key Decisions for This Issue

1. **Minimum Parallelization Threshold**: What's the smallest safe granularity? (per file, per suite, per test?)
2. **Flaky Detection**: How many retries before marking as critical failure? (recommend 3)
3. **Shared-State Confidence**: Are heuristics sufficient, or require explicit opt-in per test file?
4. **Fast-Fail Behavior**: Abort on first critical failure globally, or let all workers finish for faster feedback iteration?
5. **Fallback**: If parallelization detection is uncertain, run serial — safety over speed.
@@ -0,0 +1,79 @@
## Observability: Watch the Deploy Like a Hawk

Post-deploy monitoring catches what tests miss. Real traffic reveals real problems.

### What to Monitor (by Priority)

**P0 — Immediate (first 5 minutes):**
- Error rate: any increase over baseline?
- Health check: still returning 200?
- Latency: p50/p95/p99 within normal range?
- Memory/CPU: any sudden spikes?

**P1 — Short-term (5-30 minutes):**
- Business metrics: are users completing key flows?
- Queue depths: are background jobs processing normally?
- Connection pools: any exhaustion or leak patterns?
- Disk usage: any unexpected growth?

**P2 — Medium-term (1-24 hours):**
- Memory trends: gradual leak over time?
- Error rate trends: slowly increasing?
- User-reported issues: any new support tickets?
- Performance degradation under sustained load?

### Anomaly Detection Patterns
- **Spike detection**: >2x baseline error rate in any 1-minute window
- **Trend detection**: steadily increasing error rate over a 5-minute window
- **Absence detection**: expected periodic events stop occurring
- **Latency shift**: p95 latency increases >50% from baseline

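These triggers reduce to small predicates. A hedged sketch with the thresholds above as defaults (function names are illustrative, not an existing API):

```python
def error_spike(errors_per_min: float, baseline: float) -> bool:
    """Spike detection: >2x baseline error rate in a 1-minute window."""
    return baseline > 0 and errors_per_min > 2 * baseline

def error_trend(window: list[float]) -> bool:
    """Trend detection: strictly increasing error rate across a
    window of per-minute samples (e.g., 5 samples = 5 minutes)."""
    return len(window) >= 5 and all(b > a for a, b in zip(window, window[1:]))

def latency_shift(p95_now_ms: float, p95_baseline_ms: float) -> bool:
    """Latency shift: p95 increases more than 50% over baseline."""
    return p95_now_ms > 1.5 * p95_baseline_ms
```

In practice the thresholds would come from config, and absence detection needs a timestamp of the last expected event rather than a rate.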
### Log Analysis
- Search for new ERROR/FATAL/PANIC entries not present before deploy
- Check for stack traces — they indicate unhandled exceptions
- Look for retry storms — repeated failed attempts at the same operation
- Monitor for resource exhaustion messages (OOM, connection refused, disk full)

### Auto-Rollback Triggers
Automatically roll back if ANY of these occur:
- Health check fails 3 consecutive times
- Error rate exceeds threshold for 2+ minutes
- Critical service dependency becomes unreachable
- Memory usage exceeds 90% of limit

### Monitoring by Issue Type

**Frontend changes:**
- JavaScript error rates in browser (if client-side monitoring exists)
- Asset load failures (404s on new bundles)
- Core Web Vitals regression (LCP, FID, CLS)

**API changes:**
- Response status code distribution (2xx vs 4xx vs 5xx)
- Request throughput — drops indicate client-side breakage
- Authentication failures — spikes indicate auth regression

**Database changes:**
- Query latency per endpoint
- Connection pool utilization
- Slow query log entries
- Replication lag (if applicable)

### Incident Escalation
If monitoring detects issues:
1. Execute rollback (if auto-rollback enabled)
2. Create incident issue with monitoring data
3. Attach relevant logs and metrics
4. Tag the original issue with the `incident` label
5. Do NOT silence alerts — let them fire

### Required Output (Mandatory)

Your output MUST include these sections when this skill is active:

1. **Monitoring Checklist**: P0/P1/P2 metrics to watch (error rate, latency, memory, health checks) with specific thresholds
2. **Anomaly Detection Triggers**: Explicit conditions that trigger alerts (spike detection >2x, trend detection over 5min, absence detection, latency shift >50%)
3. **Log Analysis**: Search strategy for new ERROR/FATAL entries, stack traces, retry storms, resource exhaustion patterns
4. **Auto-Rollback Decision Criteria**: Conditions that trigger automatic rollback (health check failures, error rate threshold, critical dependency unreachable, memory exhaustion)

If any section is not applicable, explicitly state why it's skipped.
@@ -0,0 +1,48 @@
## Performance Expertise

Apply these optimization patterns:

### Profiling First
- Measure before optimizing — identify the actual bottleneck
- Use profiling tools appropriate to the language/runtime
- Focus on the critical path — optimize what users experience

### Caching Strategy
- Cache expensive computations and repeated queries
- Set appropriate TTLs — stale data vs freshness trade-off
- Invalidate caches on write operations
- Use cache layers: in-memory (L1) → distributed (L2) → database (L3)

+
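
The TTL and invalidate-on-write rules can be sketched as a minimal in-memory (L1) layer. This class and its API are illustrative, not an existing module:

```typescript
type Entry<V> = { value: V; expiresAt: number };

// Minimal L1 cache: entries expire after ttlMs, writes invalidate explicitly.
class TtlCache<V> {
  private store = new Map<string, Entry<V>>();
  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // expired: drop stale data, caller refetches
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }

  invalidate(key: string): void {
    this.store.delete(key); // call this on write so readers see fresh data
  }
}
```

A distributed L2 (e.g. Redis) would sit behind this with a longer TTL; misses fall through to the database (L3).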
+ ### Database Performance
+ - Add indexes for frequently queried columns (check EXPLAIN plans)
+ - Avoid N+1 queries — use batch loading or JOINs
+ - Use connection pooling
+ - Consider read replicas for read-heavy workloads
+
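
The batch-loading fix for N+1 looks like this in outline. Here `fetchUsersByIds` is a hypothetical stand-in for a single `WHERE id IN (...)` query, not a real API:

```typescript
type User = { id: number; name: string };

// One round trip for the whole batch (placeholder data source standing in
// for a single `SELECT ... WHERE id IN (...)` query).
async function fetchUsersByIds(ids: number[]): Promise<User[]> {
  return ids.map((id) => ({ id, name: `user-${id}` }));
}

// Instead of one query per id (N+1), dedupe and issue a single batch query.
async function loadUsers(ids: number[]): Promise<Map<number, User>> {
  const users = await fetchUsersByIds([...new Set(ids)]);
  return new Map(users.map((u) => [u.id, u]));
}
```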
+ ### Algorithm Complexity
+ - Prefer O(n log n) over O(n²) for sorting/searching
+ - Use appropriate data structures (hash maps for lookups, trees for ranges)
+ - Avoid unnecessary allocations in hot paths
+ - Pre-compute values that are used repeatedly
+
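
As a small illustration of the data-structure point, duplicate detection with a hash set replaces an O(n²) pairwise scan with a single O(n) pass:

```typescript
// O(n) duplicate check: each membership test is an O(1) expected Set lookup,
// versus comparing every pair of items (O(n^2)).
function hasDuplicate(items: string[]): boolean {
  const seen = new Set<string>();
  for (const item of items) {
    if (seen.has(item)) return true;
    seen.add(item);
  }
  return false;
}
```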
+ ### Network Optimization
+ - Minimize round trips — batch API calls where possible
+ - Use compression for large payloads
+ - Implement pagination — never return unbounded result sets
+ - Use CDNs for static assets
+
+ ### Benchmarking
+ - Include before/after benchmarks for performance changes
+ - Test with realistic data volumes (not just unit test fixtures)
+ - Measure p50, p95, p99 latencies — not just averages
+
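
Percentiles are easy to compute from a latency sample. A minimal sketch using the nearest-rank method (one of several valid percentile definitions):

```typescript
// Nearest-rank percentile: p in [0, 100] over a sample of latencies.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}
```

Reporting p50/p95/p99 from the same sample exposes tail latency that an average hides.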
+ ### Required Output (Mandatory)
+
+ Your output MUST include these sections when this skill is active:
+
+ 1. **Baseline Metrics**: Current performance metrics before optimization (p50/p95/p99 latency, throughput, resource usage)
+ 2. **Optimization Targets**: Specific targets (e.g., "reduce p95 latency from 250ms to <100ms") with rationale
+ 3. **Profiling Strategy**: Tools and methodology to identify bottlenecks (CPU profiler, memory profiler, query analyzer, benchmarks)
+ 4. **Benchmark Plan**: Before/after benchmarks with realistic data volume and success criteria for each optimization
+
+ If any section is not applicable, explicitly state why it's skipped.
@@ -0,0 +1,49 @@
+ ## PR Quality: Ship a Reviewable Pull Request
+
+ Write a PR that a reviewer can understand in 5 minutes.
+
+ ### PR Description Structure
+ 1. **What** — One sentence: what does this PR do?
+ 2. **Why** — Link to issue. Why is this change needed?
+ 3. **How** — Brief technical approach (2-3 sentences max)
+ 4. **Testing** — What was tested? How to verify?
+ 5. **Screenshots** — If UI changes, before/after screenshots
+
+ ### Commit Hygiene
+ - Each commit should be a logical unit of work
+ - Commit messages: imperative mood, 50-char subject, blank line, body explains WHY
+ - No WIP/fixup/squash commits in final PR
+ - No merge commits — rebase onto base branch
+ - Separate refactoring commits from feature commits
+
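
A commit message following these rules might look like this (the change it describes is hypothetical):

```text
fix(auth): reject expired tokens on refresh

The refresh endpoint accepted tokens past their expiry because the
check compared seconds against milliseconds. Compare both values in
milliseconds so expired sessions are rejected as intended.
```

Subject under 50 characters in the imperative mood, a blank line, then a body explaining why, not what.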
+ ### Diff Quality
+ - Remove all debugging artifacts (console.log, print statements, commented-out code)
+ - No unrelated formatting changes mixed with logic changes
+ - Generated files should be committed separately or excluded
+ - File renames should be separate commits (so git tracks them)
+
+ ### Reviewer Empathy
+ - If the diff is >500 lines, add a "Review guide" section explaining the reading order
+ - Call out non-obvious decisions with inline comments
+ - Flag areas where you're least confident and want careful review
+ - If you changed a pattern used elsewhere, note whether existing code needs updating
+
+ ### Self-Review Checklist
+ Before marking as ready:
+ - [ ] PR description explains what, why, and how
+ - [ ] All CI checks pass
+ - [ ] No secrets, credentials, or API keys in diff
+ - [ ] No TODO/FIXME comments without issue links
+ - [ ] Breaking changes documented in description
+ - [ ] Migration steps documented if applicable
+
+ ### Required Output (Mandatory)
+
+ Your output MUST include these sections when this skill is active:
+
+ 1. **PR Description**: What (one sentence), Why (issue link), How (2-3 sentence technical approach), Testing (what was tested)
+ 2. **Commit Hygiene Check**: Verification that each commit is a logical unit, no WIP/fixup/squash, no merge commits
+ 3. **Diff Review**: Confirmation that all debugging artifacts removed (console.log, commented code), no unrelated formatting changes
+ 4. **Self-Review Checklist Completion**: All items from checklist checked (secrets scanned, CI green, breaking changes documented)
+
+ If any section is not applicable, explicitly state why it's skipped.
@@ -0,0 +1,43 @@
+ ## Product Thinking Expertise
+
+ Consider the user perspective in your implementation:
+
+ ### User Stories
+ - Who is the user for this feature?
+ - What problem does this solve for them?
+ - What is their workflow before and after this change?
+ - Define acceptance criteria from the user's perspective
+
+ ### User Experience
+ - What is the simplest interaction that solves the problem?
+ - How does the user discover this feature?
+ - What happens when things go wrong? (error states, recovery)
+ - Is the feature accessible to users with disabilities?
+
+ ### Edge Cases from User Perspective
+ - What if the user has no data yet? (empty state)
+ - What if the user has too much data? (pagination, filtering)
+ - What if the user makes a mistake? (undo, confirmation)
+ - What if the user is on a slow connection? (loading states)
+
+ ### Progressive Disclosure
+ - Show the most important information first
+ - Hide complexity behind progressive interactions
+ - Don't overwhelm with options — provide sensible defaults
+ - Use contextual help instead of documentation
+
+ ### Feedback & Communication
+ - Confirm successful actions immediately
+ - Explain errors in plain language — not error codes
+ - Show progress for long-running operations
+ - Preserve user context across navigation
+
+ ### Required Output (Mandatory)
+
+ Your output MUST include these sections when this skill is active:
+
+ 1. **User Stories**: In "As a..., I want..., So that..." format with at least one primary and one secondary user story
+ 2. **Acceptance Criteria**: Given/When/Then format for how to verify the feature works from the user's perspective
+ 3. **Edge Cases from User Perspective**: At least 3 specific scenarios (empty state, error state, overload state)
+
+ If any section is not applicable, explicitly state why it's skipped.
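
Acceptance criteria in the Given/When/Then format mentioned above might look like this (the feature and wording are hypothetical):

```gherkin
Feature: Export build report
  Scenario: User exports a report with no builds yet
    Given a user with no completed builds
    When they open the export page
    Then they see an empty state explaining how to run a first build
```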
@@ -0,0 +1,49 @@
+ ## Security Audit Expertise
+
+ Apply OWASP Top 10 and security best practices:
+
+ ### Injection Prevention
+ - Use parameterized queries for ALL database access
+ - Sanitize user input before rendering in HTML/templates
+ - Validate and sanitize file paths — prevent directory traversal
+ - Never execute user-supplied strings as code or commands
+
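
The path-traversal check can be sketched like this: resolve the user-supplied path against the allowed root and reject anything that escapes it (the function name and root directory are illustrative):

```typescript
import path from "node:path";

// Resolve a user-supplied path and confirm it stays inside rootDir.
// A payload like "../../etc/passwd" resolves outside the root and is rejected.
function resolveSafe(rootDir: string, userPath: string): string | null {
  const root = path.resolve(rootDir);
  const resolved = path.resolve(root, userPath);
  return resolved === root || resolved.startsWith(root + path.sep) ? resolved : null;
}
```

Note the `+ path.sep` guard: without it, a root of `/srv/files` would also accept `/srv/files-secret`.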
+ ### Authentication
+ - Hash passwords with bcrypt/argon2 (never MD5/SHA1)
+ - Implement account lockout after failed attempts
+ - Use secure session management (HttpOnly, Secure, SameSite cookies)
+ - Require re-authentication for sensitive operations
+
+ ### Authorization
+ - Check permissions server-side on EVERY request
+ - Use deny-by-default — explicitly grant access
+ - Verify resource ownership (user can only access their own data)
+ - Log authorization failures for monitoring
+
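
Deny-by-default plus ownership verification can be sketched as a server-side check; the types and role names here are assumptions for illustration:

```typescript
type Session = { userId: string; roles: string[] };
type Doc = { id: string; ownerId: string };

// Deny-by-default: every branch that returns true is an explicit grant.
function canDelete(session: Session | null, doc: Doc): boolean {
  if (!session) return false;                       // unauthenticated: deny
  if (session.roles.includes("admin")) return true; // explicit role grant
  return session.userId === doc.ownerId;            // resource ownership
}
```

The fall-through result is denial, so forgetting a rule fails closed rather than open.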
+ ### Data Protection
+ - Never log sensitive data (passwords, tokens, PII)
+ - Encrypt sensitive data at rest
+ - Use HTTPS for all communications
+ - Set appropriate CORS headers — never use wildcard in production
+
+ ### Secrets Management
+ - Never hardcode secrets in source code
+ - Use environment variables or secret managers
+ - Rotate secrets regularly
+ - Check for accidentally committed secrets (API keys, passwords, tokens)
+
+ ### Dependency Security
+ - Check for known vulnerabilities in dependencies
+ - Pin dependency versions to prevent supply chain attacks
+ - Review new dependencies before adding them
+
+ ### Required Output (Mandatory)
+
+ Your output MUST include these sections when this skill is active:
+
+ 1. **Threat Model (STRIDE)**: Identify threats across Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege
+ 2. **Auth Flow**: Step-by-step diagram of authentication/authorization flow with session/token handling
+ 3. **Input Validation Points**: List all places where user input enters the system and how each is validated/sanitized
+ 4. **Security Checklist**: Items verified (no secrets in code, secrets rotated, HTTPS enforced, CORS configured, rate limiting applied)
+
+ If any section is not applicable, explicitly state why it's skipped.
@@ -0,0 +1,40 @@
+ ## Systematic Debugging: Root Cause Analysis
+
+ A previous attempt at this stage FAILED. Do NOT blindly retry the same approach. Follow this 4-phase investigation:
+
+ ### Phase 1: Evidence Collection
+ - Read the error output from the previous attempt carefully
+ - Identify the EXACT line/file where the failure occurred
+ - Check if the error is a symptom or the root cause
+ - Look for patterns: is this a known error type?
+
+ ### Phase 2: Hypothesis Formation
+ - List 3 possible root causes for this failure
+ - For each hypothesis, identify what evidence would confirm or deny it
+ - Rank hypotheses by likelihood
+
+ ### Phase 3: Root Cause Verification
+ - Test the most likely hypothesis first
+ - Read the relevant source code — don't guess
+ - Check if previous artifacts (plan.md, design.md) are correct or flawed
+ - If the plan was correct but execution failed, focus on execution
+ - If the plan was flawed, document what was wrong
+
+ ### Phase 4: Targeted Fix
+ - Fix the ROOT CAUSE, not the symptom
+ - If the previous approach was fundamentally wrong, choose a different approach
+ - If it was a minor error, make the minimal fix
+ - Document what went wrong and why the new approach is better
+
+ IMPORTANT: If you find existing artifacts from a successful previous stage, USE them — don't regenerate from scratch.
+
+ ### Required Output (Mandatory)
+
+ Your output MUST include these sections when this skill is active:
+
+ 1. **Root Cause Hypothesis**: List 3 possible root causes ranked by likelihood with specific evidence that would confirm/deny each
+ 2. **Evidence Gathered**: Exact file:line location of failure, error messages, logs, code examination results, artifact validation (plan.md, design.md correctness)
+ 3. **Fix Strategy**: Description of the ROOT CAUSE fix (not the symptom), with rationale for why this approach differs from the previous failed attempt
+ 4. **Verification Plan**: How to verify the fix works (test cases, specific checks, expected behavior confirmation)
+
+ If any section is not applicable, explicitly state why it's skipped.
@@ -0,0 +1,47 @@
+ ## Testing Strategy Expertise
+
+ Apply these testing patterns:
+
+ ### Test Pyramid
+ - **Unit tests** (70%): Test individual functions/methods in isolation
+ - **Integration tests** (20%): Test component interactions and boundaries
+ - **E2E tests** (10%): Test critical user flows end-to-end
+
+ ### What to Test
+ - Happy path: the expected successful flow
+ - Error cases: what happens when things go wrong?
+ - Edge cases: empty inputs, maximum values, concurrent access
+ - Boundary conditions: off-by-one, empty collections, null/undefined
+
+ ### Test Quality
+ - Each test should verify ONE behavior
+ - Test names should describe the expected behavior, not the implementation
+ - Tests should be independent — no shared mutable state between tests
+ - Tests should be deterministic — same result every run
+
+ ### Coverage Strategy
+ - Aim for meaningful coverage, not 100% line coverage
+ - Focus coverage on business logic and error handling
+ - Don't test framework code or simple getters/setters
+ - Cover the branches, not just the lines
+
+ ### Mocking Guidelines
+ - Mock external dependencies (APIs, databases, file system)
+ - Don't mock the code under test
+ - Use realistic test data — edge cases reveal bugs
+ - Verify mock interactions when the side effect IS the behavior
+
+ ### Regression Testing
+ - Write a failing test FIRST that reproduces the bug
+ - Then fix the bug and verify the test passes
+ - Keep regression tests — they prevent the bug from recurring
+
+ ### Required Output (Mandatory)
+
+ Your output MUST include these sections when this skill is active:
+
+ 1. **Test Pyramid Breakdown**: Explicit count of unit/integration/E2E tests and their coverage targets (e.g., "70 unit tests covering business logic, 12 integration tests for API boundaries, 3 E2E tests for critical paths")
+ 2. **Coverage Targets**: Target coverage percentage per layer and which critical paths MUST be tested
+ 3. **Critical Paths to Test**: Specific test cases for the happy path, 2+ error cases, and 2+ edge cases
+
+ If any section is not applicable, explicitly state why it's skipped.
@@ -0,0 +1,52 @@
+ ## Two-Stage Code Review
+
+ This review runs in two passes. Complete Pass 1 fully before starting Pass 2.
+
+ ### Pass 1: Spec Compliance
+
+ Compare the implementation against the plan and issue requirements:
+
+ 1. **Task Checklist**: Does the code implement every task from plan.md?
+ 2. **Files Modified**: Were all planned files actually modified?
+ 3. **Requirements Coverage**: Does the implementation satisfy every requirement from the issue?
+ 4. **Missing Features**: Is anything from the plan NOT implemented?
+ 5. **Scope Creep**: Was anything added that WASN'T in the plan?
+
+ For each gap found:
+ - **[SPEC-GAP]** description — what was planned vs what was implemented
+
+ If all requirements are met, write: "Spec compliance: PASS — all planned tasks implemented."
+
+ ---
+
+ ### Pass 2: Code Quality
+
+ Now review the code for engineering quality:
+
+ 1. **Logic bugs** — incorrect conditions, off-by-one errors, null handling
+ 2. **Security** — injection, XSS, auth bypass, secret exposure
+ 3. **Error handling** — missing catch blocks, silent failures, unclear error messages
+ 4. **Performance** — unnecessary loops, missing indexes, N+1 queries
+ 5. **Naming and clarity** — confusing names, missing context, magic numbers
+ 6. **Test coverage** — are new code paths tested? Edge cases covered?
+
+ For each issue found, use format:
+ - **[SEVERITY]** file:line — description
+
+ Severity: Critical, Bug, Security, Warning, Suggestion
+
+ ### Required Output (Mandatory)
+
+ Your output MUST include these sections for EACH review pass:
+
+ **Pass 1 Output:**
+ - **Spec Compliance Verdict**: PASS or FAIL with explicit gaps found (if any)
+ - **Unimplemented Tasks**: List any planned tasks NOT in the code
+ - **Unplanned Code**: List any code added that was NOT in the plan
+
+ **Pass 2 Output:**
+ - **Code Quality Verdict**: PASS or list all findings by severity
+ - **Critical/Security Issues Found**: Explicit count with file:line references
+ - **Suggested Improvements**: Optional suggestions that don't block PASS
+
+ If any finding is not applicable, explicitly state why it's skipped.