@qball-inc/the-bulwark 1.0.0

Files changed (175)
  1. package/.claude-plugin/plugin.json +43 -0
  2. package/agents/bulwark-fix-validator.md +633 -0
  3. package/agents/bulwark-implementer.md +391 -0
  4. package/agents/bulwark-issue-analyzer.md +308 -0
  5. package/agents/bulwark-standards-reviewer.md +221 -0
  6. package/agents/plan-creation-architect.md +323 -0
  7. package/agents/plan-creation-eng-lead.md +352 -0
  8. package/agents/plan-creation-po.md +300 -0
  9. package/agents/plan-creation-qa-critic.md +334 -0
  10. package/agents/product-ideation-competitive-analyzer.md +298 -0
  11. package/agents/product-ideation-idea-validator.md +268 -0
  12. package/agents/product-ideation-market-researcher.md +292 -0
  13. package/agents/product-ideation-pattern-documenter.md +308 -0
  14. package/agents/product-ideation-segment-analyzer.md +303 -0
  15. package/agents/product-ideation-strategist.md +259 -0
  16. package/agents/statusline-setup.md +97 -0
  17. package/hooks/hooks.json +59 -0
  18. package/package.json +45 -0
  19. package/scripts/hooks/cleanup-stale.sh +13 -0
  20. package/scripts/hooks/enforce-quality.sh +166 -0
  21. package/scripts/hooks/implementer-quality.sh +256 -0
  22. package/scripts/hooks/inject-protocol.sh +52 -0
  23. package/scripts/hooks/suggest-pipeline.sh +175 -0
  24. package/scripts/hooks/track-pipeline-start.sh +37 -0
  25. package/scripts/hooks/track-pipeline-stop.sh +52 -0
  26. package/scripts/init-rules.sh +35 -0
  27. package/scripts/init.sh +151 -0
  28. package/skills/anthropic-validator/SKILL.md +607 -0
  29. package/skills/anthropic-validator/references/agents-checklist.md +131 -0
  30. package/skills/anthropic-validator/references/commands-checklist.md +102 -0
  31. package/skills/anthropic-validator/references/hooks-checklist.md +151 -0
  32. package/skills/anthropic-validator/references/mcp-checklist.md +136 -0
  33. package/skills/anthropic-validator/references/plugins-checklist.md +148 -0
  34. package/skills/anthropic-validator/references/skills-checklist.md +85 -0
  35. package/skills/assertion-patterns/SKILL.md +296 -0
  36. package/skills/bug-magnet-data/SKILL.md +284 -0
  37. package/skills/bug-magnet-data/context/cli-args.md +91 -0
  38. package/skills/bug-magnet-data/context/db-query.md +104 -0
  39. package/skills/bug-magnet-data/context/file-contents.md +103 -0
  40. package/skills/bug-magnet-data/context/http-body.md +91 -0
  41. package/skills/bug-magnet-data/context/process-spawn.md +123 -0
  42. package/skills/bug-magnet-data/data/booleans/boundaries.yaml +143 -0
  43. package/skills/bug-magnet-data/data/collections/arrays.yaml +114 -0
  44. package/skills/bug-magnet-data/data/collections/objects.yaml +123 -0
  45. package/skills/bug-magnet-data/data/concurrency/race-conditions.yaml +118 -0
  46. package/skills/bug-magnet-data/data/concurrency/state-machines.yaml +115 -0
  47. package/skills/bug-magnet-data/data/dates/boundaries.yaml +137 -0
  48. package/skills/bug-magnet-data/data/dates/invalid.yaml +132 -0
  49. package/skills/bug-magnet-data/data/dates/timezone.yaml +118 -0
  50. package/skills/bug-magnet-data/data/encoding/charset.yaml +79 -0
  51. package/skills/bug-magnet-data/data/encoding/normalization.yaml +105 -0
  52. package/skills/bug-magnet-data/data/formats/email.yaml +154 -0
  53. package/skills/bug-magnet-data/data/formats/json.yaml +187 -0
  54. package/skills/bug-magnet-data/data/formats/url.yaml +165 -0
  55. package/skills/bug-magnet-data/data/language-specific/javascript.yaml +182 -0
  56. package/skills/bug-magnet-data/data/language-specific/python.yaml +174 -0
  57. package/skills/bug-magnet-data/data/language-specific/rust.yaml +148 -0
  58. package/skills/bug-magnet-data/data/numbers/boundaries.yaml +161 -0
  59. package/skills/bug-magnet-data/data/numbers/precision.yaml +89 -0
  60. package/skills/bug-magnet-data/data/numbers/special.yaml +69 -0
  61. package/skills/bug-magnet-data/data/strings/boundaries.yaml +109 -0
  62. package/skills/bug-magnet-data/data/strings/injection.yaml +208 -0
  63. package/skills/bug-magnet-data/data/strings/special-chars.yaml +190 -0
  64. package/skills/bug-magnet-data/data/strings/unicode.yaml +139 -0
  65. package/skills/bug-magnet-data/references/external-lists.md +115 -0
  66. package/skills/bulwark-brainstorm/SKILL.md +563 -0
  67. package/skills/bulwark-brainstorm/references/at-teammate-prompts.md +60 -0
  68. package/skills/bulwark-brainstorm/references/role-critical-analyst.md +78 -0
  69. package/skills/bulwark-brainstorm/references/role-development-lead.md +66 -0
  70. package/skills/bulwark-brainstorm/references/role-product-delivery-lead.md +79 -0
  71. package/skills/bulwark-brainstorm/references/role-product-manager.md +62 -0
  72. package/skills/bulwark-brainstorm/references/role-project-sme.md +59 -0
  73. package/skills/bulwark-brainstorm/references/role-technical-architect.md +66 -0
  74. package/skills/bulwark-research/SKILL.md +298 -0
  75. package/skills/bulwark-research/references/viewpoint-contrarian.md +63 -0
  76. package/skills/bulwark-research/references/viewpoint-direct-investigation.md +62 -0
  77. package/skills/bulwark-research/references/viewpoint-first-principles.md +65 -0
  78. package/skills/bulwark-research/references/viewpoint-practitioner.md +62 -0
  79. package/skills/bulwark-research/references/viewpoint-prior-art.md +66 -0
  80. package/skills/bulwark-scaffold/SKILL.md +330 -0
  81. package/skills/bulwark-statusline/SKILL.md +161 -0
  82. package/skills/bulwark-statusline/scripts/statusline.sh +144 -0
  83. package/skills/bulwark-verify/SKILL.md +519 -0
  84. package/skills/code-review/SKILL.md +428 -0
  85. package/skills/code-review/examples/anti-patterns/linting.ts +181 -0
  86. package/skills/code-review/examples/anti-patterns/security.ts +91 -0
  87. package/skills/code-review/examples/anti-patterns/standards.ts +195 -0
  88. package/skills/code-review/examples/anti-patterns/type-safety.ts +108 -0
  89. package/skills/code-review/examples/recommended/linting.ts +195 -0
  90. package/skills/code-review/examples/recommended/security.ts +154 -0
  91. package/skills/code-review/examples/recommended/standards.ts +231 -0
  92. package/skills/code-review/examples/recommended/type-safety.ts +181 -0
  93. package/skills/code-review/frameworks/angular.md +218 -0
  94. package/skills/code-review/frameworks/django.md +235 -0
  95. package/skills/code-review/frameworks/express.md +207 -0
  96. package/skills/code-review/frameworks/flask.md +298 -0
  97. package/skills/code-review/frameworks/generic.md +146 -0
  98. package/skills/code-review/frameworks/react.md +152 -0
  99. package/skills/code-review/frameworks/vue.md +244 -0
  100. package/skills/code-review/references/linting-patterns.md +221 -0
  101. package/skills/code-review/references/security-patterns.md +125 -0
  102. package/skills/code-review/references/standards-patterns.md +246 -0
  103. package/skills/code-review/references/type-safety-patterns.md +130 -0
  104. package/skills/component-patterns/SKILL.md +131 -0
  105. package/skills/component-patterns/references/pattern-cli-command.md +118 -0
  106. package/skills/component-patterns/references/pattern-database.md +166 -0
  107. package/skills/component-patterns/references/pattern-external-api.md +139 -0
  108. package/skills/component-patterns/references/pattern-file-parser.md +168 -0
  109. package/skills/component-patterns/references/pattern-http-server.md +162 -0
  110. package/skills/component-patterns/references/pattern-process-spawner.md +133 -0
  111. package/skills/continuous-feedback/SKILL.md +327 -0
  112. package/skills/continuous-feedback/references/collect-instructions.md +81 -0
  113. package/skills/continuous-feedback/references/specialize-code-review.md +82 -0
  114. package/skills/continuous-feedback/references/specialize-general.md +98 -0
  115. package/skills/continuous-feedback/references/specialize-test-audit.md +81 -0
  116. package/skills/create-skill/SKILL.md +359 -0
  117. package/skills/create-skill/references/agent-conventions.md +194 -0
  118. package/skills/create-skill/references/agent-template.md +195 -0
  119. package/skills/create-skill/references/content-guidance.md +291 -0
  120. package/skills/create-skill/references/decision-framework.md +124 -0
  121. package/skills/create-skill/references/template-pipeline.md +217 -0
  122. package/skills/create-skill/references/template-reference-heavy.md +111 -0
  123. package/skills/create-skill/references/template-research.md +210 -0
  124. package/skills/create-skill/references/template-script-driven.md +172 -0
  125. package/skills/create-skill/references/template-simple.md +80 -0
  126. package/skills/create-subagent/SKILL.md +353 -0
  127. package/skills/create-subagent/references/agent-conventions.md +268 -0
  128. package/skills/create-subagent/references/content-guidance.md +232 -0
  129. package/skills/create-subagent/references/decision-framework.md +134 -0
  130. package/skills/create-subagent/references/template-single-agent.md +192 -0
  131. package/skills/fix-bug/SKILL.md +241 -0
  132. package/skills/governance-protocol/SKILL.md +116 -0
  133. package/skills/init/SKILL.md +341 -0
  134. package/skills/issue-debugging/SKILL.md +385 -0
  135. package/skills/issue-debugging/references/anti-patterns.md +245 -0
  136. package/skills/issue-debugging/references/debug-report-schema.md +227 -0
  137. package/skills/mock-detection/SKILL.md +511 -0
  138. package/skills/mock-detection/references/false-positive-prevention.md +402 -0
  139. package/skills/mock-detection/references/stub-patterns.md +236 -0
  140. package/skills/pipeline-templates/SKILL.md +215 -0
  141. package/skills/pipeline-templates/references/code-change-workflow.md +277 -0
  142. package/skills/pipeline-templates/references/code-review.md +336 -0
  143. package/skills/pipeline-templates/references/fix-validation.md +421 -0
  144. package/skills/pipeline-templates/references/new-feature.md +335 -0
  145. package/skills/pipeline-templates/references/research-brainstorm.md +161 -0
  146. package/skills/pipeline-templates/references/research-planning.md +257 -0
  147. package/skills/pipeline-templates/references/test-audit.md +389 -0
  148. package/skills/pipeline-templates/references/test-execution-fix.md +238 -0
  149. package/skills/plan-creation/SKILL.md +497 -0
  150. package/skills/product-ideation/SKILL.md +372 -0
  151. package/skills/product-ideation/references/analysis-frameworks.md +161 -0
  152. package/skills/session-handoff/SKILL.md +139 -0
  153. package/skills/session-handoff/references/examples.md +223 -0
  154. package/skills/setup-lsp/SKILL.md +312 -0
  155. package/skills/setup-lsp/references/server-registry.md +85 -0
  156. package/skills/setup-lsp/references/troubleshooting.md +135 -0
  157. package/skills/subagent-output-templating/SKILL.md +415 -0
  158. package/skills/subagent-output-templating/references/examples.md +440 -0
  159. package/skills/subagent-prompting/SKILL.md +364 -0
  160. package/skills/subagent-prompting/references/examples.md +342 -0
  161. package/skills/test-audit/SKILL.md +531 -0
  162. package/skills/test-audit/references/known-limitations.md +41 -0
  163. package/skills/test-audit/references/priority-classification.md +30 -0
  164. package/skills/test-audit/references/prompts/deep-mode-detection.md +83 -0
  165. package/skills/test-audit/references/prompts/synthesis.md +57 -0
  166. package/skills/test-audit/references/rewrite-instructions.md +46 -0
  167. package/skills/test-audit/references/schemas/audit-output.yaml +100 -0
  168. package/skills/test-audit/references/schemas/diagnostic-output.yaml +49 -0
  169. package/skills/test-audit/scripts/data-flow-analyzer.ts +509 -0
  170. package/skills/test-audit/scripts/integration-mock-detector.ts +462 -0
  171. package/skills/test-audit/scripts/package.json +20 -0
  172. package/skills/test-audit/scripts/skip-detector.ts +211 -0
  173. package/skills/test-audit/scripts/verification-counter.ts +295 -0
  174. package/skills/test-classification/SKILL.md +310 -0
  175. package/skills/test-fixture-creation/SKILL.md +295 -0
@@ -0,0 +1,531 @@
1
+ ---
2
+ name: test-audit
3
+ description: Audit test suites for T1-T4 violations using AST analysis, mock detection, and multi-stage synthesis. Invoke when user asks to audit tests, check test quality, find mock violations, review test effectiveness, or inspect test suites for over-mocking. Triggers automatic rewrites when quality gates fail.
4
+ user-invocable: true
5
+ argument-hint: [path] [--threshold=N]
6
+ skills:
7
+ - test-classification
8
+ - mock-detection
9
+ - assertion-patterns
10
+ - component-patterns
11
+ - bug-magnet-data
12
+ ---
13
+
14
+ # Test Audit
15
+
16
+ User-facing entry point for test suite quality auditing. Orchestrates classification, mock detection, and synthesis stages to identify T1-T4 violations and trigger automatic rewrites when required.
17
+
18
+ ---
19
+
20
+ ## When to Use This Skill
21
+
22
+ **Load this skill when the user request matches ANY of these patterns:**
23
+
24
+ | Trigger Pattern | Example User Request |
25
+ |-----------------|---------------------|
26
+ | Test quality audit | "Audit my tests", "Check test quality", "Review test suite" |
27
+ | Mock detection | "Find mock violations", "Check for T1 violations", "Are my tests over-mocked?" |
28
+ | Test effectiveness | "How effective are my tests?", "Are my tests real or mocked?" |
29
+ | After writing tests | "I just wrote tests for X, can you audit them?" |
30
+ | CI/CD integration | "Add test audit to pipeline", "Validate tests before merge" |
31
+
32
+ **DO NOT use for:**
33
+ - Running tests (use `just test`)
34
+ - Writing new tests (implement directly)
35
+ - General code review (use `code-review` skill)
36
+ - Debugging test failures (use `issue-debugging` skill)
37
+
38
+ ---
39
+
40
+ ## Pre-Flight Gate (BLOCKING)
41
+
42
+ **STOP. Before ANY analysis, you MUST acknowledge what this skill requires.**
43
+
44
+ This skill uses a **multi-stage pipeline with sub-agents**. You are the orchestrator, NOT the executor.
45
+
46
+ ### What You MUST Do
47
+
48
+ 1. **Run Stage 0 AST scripts** before any LLM stages:
49
+ - `just verify-count {target}` → `/tmp/claude/ast-verify-count.json`
50
+ - `just skip-detect {target}` → `/tmp/claude/ast-skip-detect.json`
51
+ - `just ast-analyze {target}` → `/tmp/claude/ast-data-flow.json`
+ - `just integration-mocks {target}` → `/tmp/claude/ast-integration-mocks.json`
52
+
53
+ 2. **Select mode** based on file count and threshold (default 5)
54
+
55
+ 3. **Spawn sub-agents** for each applicable stage:
56
+ - Stage 1 (Scale mode only): Classification → `Task(subagent_type="general-purpose", model="haiku", ...)`
57
+ - Stage 2: Mock Detection → `Task(subagent_type="general-purpose", model="sonnet", ...)`
58
+ - Stage 3: Synthesis → `Task(subagent_type="general-purpose", model="sonnet", ...)`
59
+
60
+ 4. **Write outputs to logs/**:
61
+ - `logs/test-classification-{YYYYMMDD-HHMMSS}.yaml` (Scale mode only)
62
+ - `logs/mock-detection-{YYYYMMDD-HHMMSS}.yaml`
63
+ - `logs/test-audit-{YYYYMMDD-HHMMSS}.yaml`
64
+ - `logs/diagnostics/test-audit-{YYYYMMDD-HHMMSS}.yaml`
65
+
66
+ 5. **Follow the orchestration instructions exactly** - do not substitute your own judgment
67
+
68
+ ### What You MUST NOT Do
69
+
70
+ - **Do NOT skip Stage 0** - AST scripts provide deterministic metadata that LLM stages depend on
71
+ - **Do NOT perform classification yourself** - spawn a Haiku sub-agent (Scale mode)
72
+ - **Do NOT perform mock detection yourself** - spawn a Sonnet sub-agent
73
+ - **Do NOT perform synthesis yourself** - spawn a Sonnet sub-agent
74
+ - **Do NOT skip stages** because you think you can do it faster
75
+ - **Do NOT return to user** until all log files are written
76
+
77
+ ### Why This Matters
78
+
79
+ The pipeline exists for:
80
+ - **Bias avoidance** - Different models for different stages prevent self-review bias
81
+ - **Structured artifacts** - Logs enable observability and debugging
82
+ - **Deterministic workflow** - Reproducible results across sessions
83
+ - **Separation of concerns** - Each stage has a specific role
84
+
85
+ **If you find yourself thinking "I can just analyze this directly" - STOP. That violates SC1-SC2 in Rules.md.**
86
+
87
+ ### Completion Checklist
88
+
89
+ Before returning to user, verify ALL items:
90
+
91
+ - [ ] Stage 0 (AST) completed - outputs in `/tmp/claude/ast-*.json` (or graceful degradation logged)
92
+ - [ ] Mode selected (Deep or Scale) and displayed to user
93
+ - [ ] Stage 1 (Classification) completed (Scale mode only) - output written to `logs/test-classification-*.yaml`
94
+ - [ ] Stage 2 (Mock Detection) completed - output written to `logs/mock-detection-*.yaml`
95
+ - [ ] Stage 3 (Synthesis) completed - output written to `logs/test-audit-*.yaml`
96
+ - [ ] Summary presented to user with violation counts and REWRITE_REQUIRED status
97
+ - [ ] Diagnostic output written to `logs/diagnostics/test-audit-*.yaml` (includes mode, threshold, AST status)
98
+
99
+ **If REWRITE_REQUIRED == true, also verify:**
100
+ - [ ] For each file: component type identified
101
+ - [ ] For each file: `bug-magnet-data` context file loaded for component type
102
+ - [ ] For each file: T0 + T1 edge cases loaded from bug-magnet-data
103
+ - [ ] Verification scripts include edge cases from bug-magnet-data
104
+ - [ ] Destructive patterns (`safe_for_automation: false`) excluded or marked manual-only
105
+ - [ ] Rewrites applied using assertion-patterns and component-patterns
106
+
107
+ **Do NOT return to user until all applicable checklist items are verified.**
108
+
109
+ ---
110
+
111
+ ## Usage
112
+
113
+ ```
114
+ /test-audit [path] [--threshold=N]
115
+ ```
116
+
117
+ **Examples:**
118
+ - `/test-audit tests/` - Audit all tests in tests/ directory
119
+ - `/test-audit src/__tests__/api.test.ts` - Audit specific file
120
+ - `/test-audit tests/ --threshold=10` - Raise the threshold so Deep mode is used for ≤10 files
121
+ - `/test-audit` - Audit tests mentioned in recent context (or prompt for path)
122
+
123
+ ---
124
+
125
+ ## Pipeline Overview
126
+
127
+ ```
128
+ /test-audit tests/
129
+
130
+ ┌─────────────────────────────────────────────────────────────────────┐
131
+ │ ORCHESTRATOR (Opus) - Main Context │
132
+ │ │
133
+ │ Stage 0: AST Pre-Processing (deterministic, no LLM) │
134
+ │ └─ just verify-count {target} │
135
+ │ └─ just skip-detect {target} │
136
+ │ └─ just ast-analyze {target} │
137
+ │ └─ Output: /tmp/claude/ast-*.json │
138
+ │ │
139
+ │ Mode Selection: file_count ≤ threshold → Deep, else → Scale │
140
+ │ │
141
+ │ ┌─── DEEP MODE (≤5 files) ──────── SCALE MODE (>5 files) ────┐ │
142
+ │ │ │ │
143
+ │ │ [skip classification] Stage 1: Classification │ │
144
+ │ │ └─ Haiku + AST hints │ │
145
+ │ │ │ │
146
+ │ │ Stage 2: Detection Stage 2: Detection │ │
147
+ │ │ └─ Sonnet, ALL files └─ Sonnet, flagged only │ │
148
+ │ │ └─ Self-computes metadata └─ Uses classification │ │
149
+ │ │ │ │
150
+ │ └──────────────────────────────────────────────────────────────┘ │
151
+ │ │
152
+ │ Stage 3: Synthesis (Sonnet) — unified for both modes │
153
+ │ │
154
+ │ Step 4: Present summary to user │
155
+ │ │
156
+ │ Step 5: If REWRITE_REQUIRED → Implement rewrites (Opus) │
157
+ │ │
158
+ └─────────────────────────────────────────────────────────────────────┘
159
+ ```
160
+
161
+ ---
162
+
163
+ ## Orchestration Instructions
164
+
165
+ When this skill is loaded, follow these steps exactly:
166
+
167
+ ### Step 1: Resolve Target
168
+
169
+ ```
170
+ IF $ARGUMENTS provided:
171
+ target = $1 (first argument)
172
+ Parse optional flags:
173
+ --threshold=N → override default threshold (default: 5)
174
+ ELSE:
175
+ Look for test files in recent conversation context
176
+ IF found: target = that path
177
+ ELSE: Ask user: "Which test directory or file should I audit?"
178
+ ```
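The target-resolution pseudocode above can be sketched as a small parser. This is only an illustration: `parseAuditArgs`, `AuditArgs`, and the choice to return `null` when no path is given are hypothetical names and conventions, not part of the package.

```typescript
// Hypothetical helper (not from the package): parses the raw argument
// string given to /test-audit into a target path and a threshold.
interface AuditArgs {
  target: string | null; // null: fall back to conversation context or ask the user
  threshold: number;
}

const DEFAULT_THRESHOLD = 5; // default stated in the skill

function parseAuditArgs(raw: string): AuditArgs {
  const tokens = raw.trim().split(/\s+/).filter((t) => t.length > 0);
  let target: string | null = null;
  let threshold = DEFAULT_THRESHOLD;

  for (const token of tokens) {
    const match = /^--threshold=(\d+)$/.exec(token);
    const value = match?.[1];
    if (value !== undefined) {
      threshold = Number.parseInt(value, 10);
    } else if (!token.startsWith("--") && target === null) {
      target = token; // first positional argument is the target
    }
  }
  return { target, threshold };
}
```

When `target` comes back as `null`, the orchestrator would continue with the `ELSE` branch above (search recent context, then ask the user).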
179
+
180
+ ### Step 2: Stage 0 — AST Pre-Processing (MANDATORY)
181
+
182
+ **This step is BINDING. Do NOT skip it.** AST scripts provide deterministic metadata that replaces heuristic estimates. Skipping Stage 0 degrades audit accuracy.
183
+
184
+ 1. Generate timestamp: `YYYYMMDD-HHMMSS`
185
+ 2. Count test files in target (glob `**/*.test.{ts,tsx,js,jsx}` + `**/*.spec.{ts,tsx,js,jsx}`)
186
+ 3. Run all four AST scripts via Justfile recipes:
187
+
188
+ ```bash
189
+ just verify-count {target} > /tmp/claude/ast-verify-count.json
190
+ just skip-detect {target} > /tmp/claude/ast-skip-detect.json
191
+ just ast-analyze {target} > /tmp/claude/ast-data-flow.json
192
+ just integration-mocks {target} > /tmp/claude/ast-integration-mocks.json
193
+ ```
194
+
195
+ 4. Read each output file and verify valid JSON
196
+ 5. If any script fails: log warning in diagnostics, continue with LLM-only analysis for that dimension (graceful degradation)
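A minimal sketch of steps 4 and 5, assuming the raw stdout of each recipe has already been read into a string. The function name `checkAstOutputs` and the result shape are hypothetical; only the `ok/failed` and `success/partial/failed` labels come from the skill text.

```typescript
// Hypothetical helper: validates each recipe's raw output as JSON and
// derives an overall Stage 0 status for the diagnostics log.
type RecipeStatus = "ok" | "failed";
type Stage0Status = "success" | "partial" | "failed";

interface Stage0Check {
  perRecipe: Record<string, RecipeStatus>;
  status: Stage0Status;
}

function checkAstOutputs(rawOutputs: Record<string, string>): Stage0Check {
  const perRecipe: Record<string, RecipeStatus> = {};
  let okCount = 0;
  const entries = Object.entries(rawOutputs);

  for (const [recipe, raw] of entries) {
    try {
      JSON.parse(raw); // throws SyntaxError on empty or malformed output
      perRecipe[recipe] = "ok";
      okCount += 1;
    } catch {
      perRecipe[recipe] = "failed"; // degrade gracefully: LLM-only for this dimension
    }
  }

  let status: Stage0Status = "partial";
  if (okCount === entries.length) status = "success";
  else if (okCount === 0) status = "failed";
  return { perRecipe, status };
}
```

A `partial` result does not stop the pipeline; it only marks which dimensions fall back to LLM-only analysis.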
197
+
198
+ **AST output schemas** (for prompt injection into LLM stages):
199
+
200
+ ```json
201
+ // verify-count output (per file)
202
+ { "file": "tests/user.test.ts", "metrics": { "total_lines": 156, "test_logic_lines": 98, "assertion_lines": 42, "setup_lines": 56, "effectiveness_percent": 42.86, "framework_detected": "jest" } }
203
+
204
+ // skip-detect output (per file)
205
+ { "file": "tests/user.test.ts", "markers": [{ "type": "test.skip", "line": 42, "test_name": "should handle edge case", "severity": "medium", "rule": "T4" }], "summary": { "skip_count": 1, "only_count": 0, "todo_count": 0 } }
206
+
207
+ // ast-analyze output (per file)
208
+ { "file": "tests/workflow.integration.ts", "violations": [{ "line": 42, "type": "T3+", "confidence": "high", "variable": "orderData", "source": "object_literal", "message": "Variable 'orderData' is manually constructed", "suggestion": "Replace with factory function or upstream function output" }] }
209
+
210
+ // integration-mocks output (per file)
211
+ { "file": "tests/error-handler.test.ts", "sections": [{ "name": "Error Handler Integration", "type": "integration", "signal": "keyword_in_name", "line_start": 559, "line_end": 628 }], "leads": [{ "line": 562, "type": "T3", "confidence": "high", "mock_pattern": "jest.fn().mockImplementation()", "enclosing_block": "Error Handler Integration", "block_type": "integration", "message": "Mock call in integration test block", "suggestion": "Replace mock with actual implementation" }], "summary": { "sections_found": 1, "integration_sections": 1, "e2e_sections": 0, "leads_count": 1, "mock_calls_in_integration": 1, "mock_calls_in_e2e": 0 } }
212
+ ```
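The sample `verify-count` record above is consistent with `effectiveness_percent` being `assertion_lines / test_logic_lines`, rounded to two decimals (42 / 98 ≈ 42.86). The skill does not state the formula, so the sketch below is an inference from that one sample, and `effectivenessPercent` is a hypothetical name.

```typescript
// Assumption: effectiveness is assertion lines as a share of test-logic
// lines, rounded to two decimals. Inferred from the sample record only.
interface VerifyCountMetrics {
  test_logic_lines: number;
  assertion_lines: number;
}

function effectivenessPercent(metrics: VerifyCountMetrics): number {
  if (metrics.test_logic_lines === 0) {
    return 0; // avoid division by zero for files with no test logic
  }
  const ratio = metrics.assertion_lines / metrics.test_logic_lines;
  return Math.round(ratio * 10000) / 100;
}
```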
213
+
214
+ ### Step 3: Mode Selection
215
+
216
+ ```
217
+ threshold = $THRESHOLD_FLAG OR 5 (default)
218
+ file_count = count of test files in target
219
+
220
+ IF file_count <= threshold:
221
+ IF file_count > 25:
222
+ mode = "scale"
223
+ WARN "Deep mode safety cap exceeded (>25 files). Falling back to Scale mode."
224
+ ELSE:
225
+ mode = "deep"
226
+ ELSE:
227
+ mode = "scale"
228
+ ```
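The mode-selection rule above translates directly into a small pure function. The names `selectMode` and `DEEP_MODE_SAFETY_CAP` are illustrative; the branch structure and the warning text follow the pseudocode.

```typescript
// Sketch of the mode-selection pseudocode; names are hypothetical.
type Mode = "deep" | "scale";

interface ModeSelection {
  mode: Mode;
  warning?: string;
}

const DEEP_MODE_SAFETY_CAP = 25;

function selectMode(fileCount: number, threshold: number = 5): ModeSelection {
  if (fileCount <= threshold) {
    // The cap only matters when the user raises the threshold above 25.
    if (fileCount > DEEP_MODE_SAFETY_CAP) {
      return {
        mode: "scale",
        warning: "Deep mode safety cap exceeded (>25 files). Falling back to Scale mode.",
      };
    }
    return { mode: "deep" };
  }
  return { mode: "scale" };
}
```

For example, `selectMode(30, 40)` selects Scale mode with the warning, because 30 files pass the user-supplied threshold but exceed the cap.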
229
+
230
+ **Display mode selection to user:**
231
+
232
+ ```
233
+ ## Test Audit: {mode} Mode
234
+
235
+ **Target:** {target}
236
+ **Files:** {file_count}
237
+ **Threshold:** {threshold}
238
+ **Mode:** {mode} ({rationale})
239
+
240
+ Stage 0 (AST): {status — success/partial/failed}
241
+ verify-count: {ok/failed}
242
+ skip-detect: {ok/failed}
243
+ ast-analyze: {ok/failed}
+ integration-mocks: {ok/failed}
244
+
245
+ Proceeding with {mode} mode pipeline...
246
+ ```
247
+
248
+ ### Step 4: Classification Stage — Scale Mode Only
249
+
250
+ **Skip this step entirely in Deep mode.** In Deep mode, detection (Step 5) self-computes classification metadata using AST output.
251
+
252
+ 1. Access the `test-classification` skill (loaded via frontmatter dependency)
253
+
254
+ **Batching check:**
255
+ ```
256
+ IF file_count > 20:
257
+ Split files into batches of 20-25
258
+ FOR each batch IN PARALLEL:
259
+ Construct 4-part prompt with batch file list
260
+ INCLUDE AST hints in CONTEXT (verify-count + skip-detect per file)
261
+ Task(subagent_type="general-purpose", model="haiku",
262
+ prompt=batch_prompt, run_in_background=true)
263
+ Read all batch outputs
264
+ Merge into single classification YAML
265
+ ELSE:
266
+ Construct 4-part prompt using the skill's template
267
+ INCLUDE AST hints in CONTEXT (verify-count + skip-detect per file)
268
+ Task(subagent_type="general-purpose", model="haiku", prompt=...)
269
+ ```
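The batching check can be sketched as a generic helper. The skill gives a range ("batches of 20-25") rather than an exact size, so using a fixed batch size with a smaller final batch is an assumption here, and `toBatches` is a hypothetical name.

```typescript
// Hypothetical helper: returns one batch when the list is at or below the
// batching threshold, otherwise fixed-size chunks (the last may be smaller).
function toBatches<T>(items: T[], batchThreshold: number, batchSize: number): T[][] {
  if (items.length <= batchThreshold) {
    return [items.slice()]; // single sub-agent call, no batching
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}
```

With the classification settings this would be called as `toBatches(files, 20, 25)`; the detection stages describe the same shape with a threshold of 10 and batches of 10-15.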
270
+
271
+ **AST hints for classification CONTEXT:**
272
+ ```
273
+ The following AST-computed metadata is available for each file.
274
+ Use this to improve classification accuracy — these are deterministic,
275
+ not heuristic.
276
+
277
+ {for each file in target}:
278
+ file: {path}
279
+ ast_verification_lines: {metrics.test_logic_lines}
280
+ ast_assertion_lines: {metrics.assertion_lines}
281
+ ast_skip_markers: {markers array or "none"}
282
+ ast_data_flow_violations: {violations array or "none"}
283
+ ```
284
+
285
+ 2. Read output from `logs/test-classification-{YYYYMMDD-HHMMSS}.yaml`
286
+ 3. Verify output contains `files` array with classification data
287
+
288
+ ### Step 5: Detection Stage (Sonnet)
289
+
290
+ **Behavior differs by mode:**
291
+
292
+ #### Deep Mode Detection
293
+
294
+ In Deep mode, ALL files are analyzed (no classification filtering). The detection agent self-computes classification metadata from AST output.
295
+
296
+ 1. Access the `mock-detection` skill (loaded via frontmatter dependency)
297
+ 2. Construct the Deep Mode Detection Prompt (see "Deep Mode Detection Prompt" section below)
298
+ 3. Include ALL test files in the prompt with their AST metadata
299
+
300
+ **Batching check (deep mode):**
301
+ ```
302
+ IF file_count > 10:
303
+ Split files into batches of 10-15
304
+ FOR each batch:
305
+ Include full AST metadata per file
306
+ Task(subagent_type="general-purpose", model="sonnet",
307
+ prompt=deep_mode_batch_prompt, run_in_background=true)
308
+ Read all batch outputs
309
+ Merge into single detection YAML
310
+ ELSE:
311
+ Task(subagent_type="general-purpose", model="sonnet",
312
+ prompt=deep_mode_prompt)
313
+ ```
314
+
315
+ #### Scale Mode Detection
316
+
317
+ In Scale mode, only files flagged by classification are analyzed.
318
+
319
+ 1. Access the `mock-detection` skill (loaded via frontmatter dependency)
320
+ 2. Extract files with `needs_deep_analysis: true` from classification output
321
+ 3. Count flagged files
322
+
323
+ **Batching check (scale mode):**
324
+ ```
325
+ IF flagged_file_count > 10:
326
+ Split flagged files into batches of 10-15
327
+ FOR each batch:
328
+ Include verification_lines from classification for each file
329
+ Include AST metadata (data-flow violations, skip markers) per file
330
+ Task(subagent_type="general-purpose", model="sonnet",
331
+ prompt=batch_prompt, run_in_background=true)
332
+ Read all batch outputs
333
+ Merge into single detection YAML
334
+ ELSE:
335
+ Construct 4-part prompt using the skill's template
336
+ Include AST metadata in CONTEXT
337
+ Task(subagent_type="general-purpose", model="sonnet", prompt=...)
338
+ ```
339
+
340
+ 4. Read output from `logs/mock-detection-{YYYYMMDD-HHMMSS}.yaml`
341
+ 5. Verify output contains `violations` array and `file_summaries`
342
+
343
+ ### Step 6: Synthesis Stage (Sonnet)
344
+
345
+ 1. Construct synthesis prompt using template below (unified for both modes)
346
+ 2. Include detection output in CONTEXT
347
+ 3. Include classification output in CONTEXT (Scale mode) or note "Deep mode — no classification stage" (Deep mode)
348
+ 4. Include AST skip-detect output for T4 violation synthesis
349
+ 5. Spawn sub-agent:
350
+ ```
351
+ Task(
352
+ subagent_type="general-purpose",
353
+ model="sonnet",
354
+ prompt="[synthesis 4-part prompt]"
355
+ )
356
+ ```
357
+ 6. Read output from `logs/test-audit-{YYYYMMDD-HHMMSS}.yaml`
358
+ 7. Verify output contains `directive.REWRITE_REQUIRED` field
359
+
360
+ ### Step 7: Present Summary
361
+
362
+ Display audit summary to user before any rewrites:
363
+
364
+ ```
365
+ ## Test Audit Complete ({mode} Mode)
366
+
367
+ **Target:** {target}
368
+ **Files audited:** {total_files}
369
+ **Files analyzed:** {files_analyzed} (deep: all, scale: flagged only)
370
+ **Overall test effectiveness:** {percentage}%
371
+
372
+ ### Stage 0 (AST)
373
+ - Verification lines: AST-precise (not heuristic)
374
+ - Skip markers (T4): {count} found
375
+ - Data flow leads (T3+): {count} found
376
+
377
+ ### Violations by Priority
378
+ - P0 (False confidence): {count}
379
+ - P1 (Incomplete verification): {count}
380
+ - P2 (Pattern issues): {count}
381
+
382
+ ### REWRITE_REQUIRED: {true/false}
383
+ Gate triggered: {gate description}
384
+
385
+ [If true] Proceeding with automatic rewrites...
386
+ [If false] No automatic rewrites needed. See recommendations below.
387
+ ```
388
+
389
+ ### Step 8: Evaluate REWRITE_REQUIRED (Two-Gate)
390
+
391
+ Apply two-gate logic from audit report:
392
+
393
+ **Gate 1 (Impact):**
394
+ ```
395
+ IF any P0 violations exist:
396
+ REWRITE_REQUIRED = true
397
+ gate_triggered = "Gate 1: Impact (P0 violations - false confidence)"
398
+ ```
399
+
400
+ **Gate 2 (Threshold):**
401
+ ```
402
+ ELSE IF P1 violations exist:
403
+ IF any file has test_effectiveness < 95%:
404
+ REWRITE_REQUIRED = true
405
+ gate_triggered = "Gate 2: Threshold (P1 + effectiveness < 95%)"
406
+ ELSE:
407
+ REWRITE_REQUIRED = false
408
+ status = "Advisory only (P1 above 95% threshold)"
409
+ ```
410
+
411
+ **Advisory:**
412
+ ```
413
+ ELSE (P2 only):
414
+ REWRITE_REQUIRED = false
415
+ status = "Advisory only (P2 pattern issues)"
416
+ ```
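Taken together, the three blocks above form a single decision chain. A minimal sketch follows, assuming per-file violation counts are already available. The `FileAudit` shape and the `evaluateGates` name are hypothetical and do not reproduce the schema in `references/schemas/audit-output.yaml`.

```typescript
// Sketch of the two-gate logic; input and output shapes are hypothetical.
interface FileAudit {
  file: string;
  p0: number; // count of P0 violations
  p1: number; // count of P1 violations
  p2: number; // count of P2 violations
  test_effectiveness: number; // percentage, 0-100
}

interface GateResult {
  rewriteRequired: boolean;
  detail: string; // gate_triggered or advisory status text
}

const EFFECTIVENESS_THRESHOLD = 95;

function evaluateGates(files: FileAudit[]): GateResult {
  // Gate 1 (Impact): any P0 violation forces a rewrite.
  if (files.some((f) => f.p0 > 0)) {
    return { rewriteRequired: true, detail: "Gate 1: Impact (P0 violations - false confidence)" };
  }
  // Gate 2 (Threshold): P1 violations plus any file below 95% effectiveness.
  if (files.some((f) => f.p1 > 0)) {
    const belowThreshold = files.some((f) => f.test_effectiveness < EFFECTIVENESS_THRESHOLD);
    return belowThreshold
      ? { rewriteRequired: true, detail: "Gate 2: Threshold (P1 + effectiveness < 95%)" }
      : { rewriteRequired: false, detail: "Advisory only (P1 above 95% threshold)" };
  }
  // Advisory: only P2 issues remain.
  return { rewriteRequired: false, detail: "Advisory only (P2 pattern issues)" };
}
```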
417
+
418
+ ### Step 9: Rewrite (If Required)
419
+
420
+ ```
421
+ IF REWRITE_REQUIRED == true:
422
+ Read `references/rewrite-instructions.md` and follow the procedure
423
+ for each file in directive.files_to_rewrite (ordered by priority, then effectiveness).
424
+ Uses: assertion-patterns, component-patterns, bug-magnet-data skills.
425
+ ELSE:
426
+ Display recommendations without auto-rewrite
427
+ ```
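The phrase "ordered by priority, then effectiveness" does not say which direction either key runs. The sketch below assumes P0 comes before P1 before P2, and that lower effectiveness comes first within a priority. Both directions are assumptions, as are the `RewriteCandidate` shape and the `orderForRewrite` name.

```typescript
// Assumed ordering: most severe priority first, then lowest effectiveness.
type Priority = "P0" | "P1" | "P2";

interface RewriteCandidate {
  file: string;
  priority: Priority;
  test_effectiveness: number; // percentage, 0-100
}

const PRIORITY_RANK: Record<Priority, number> = { P0: 0, P1: 1, P2: 2 };

function orderForRewrite(candidates: RewriteCandidate[]): RewriteCandidate[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...candidates].sort((a, b) => {
    const byPriority = PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority];
    if (byPriority !== 0) return byPriority;
    return a.test_effectiveness - b.test_effectiveness;
  });
}
```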
428
+
429
+ ---
430
+
431
+ ## Deep Mode Detection Prompt
432
+
433
+ Read `references/prompts/deep-mode-detection.md` and use as the Task() prompt for the Sonnet detection sub-agent in Deep mode. Inject per-file AST metadata into the prompt's CONTEXT placeholders (verification_lines, skip_markers, data_flow_leads, integration_mock_leads from Stage 0 output).
434
+
435
+ ---
436
+
437
+ ## Synthesis Prompt Template
438
+
439
+ Read `references/prompts/synthesis.md` and use as the Task() prompt for the Sonnet synthesis sub-agent. Inject the following into the prompt's CONTEXT placeholders:
440
+ - `{deep or scale}` → current mode
441
+ - `{classification_yaml_path}` → classification log path (Scale) or "N/A" (Deep)
442
+ - `{detection_yaml_path}` → detection log path
443
+ - `{skip_detect_json}` → AST skip-detect output
444
+ - `{verify_count_json}` → AST verify-count output
445
+
446
+ ---
447
+
448
+ ## Priority Classification
449
+
450
+ Full definitions: `references/priority-classification.md`
451
+
452
+ - **P0 (False confidence):** T1 (mock SUT), T3+ (broken chain) — test passes but provides no assurance
453
+ - **P1 (Incomplete verification):** T2 (call-only), T3 (mocked boundary) — real code runs but not fully verified
454
+ - **P2 (Pattern issues):** T4 (skip/only/todo), minor patterns — style and disabled tests
455
+
456
+ ---
457
+
458
+ ## Output Schema
459
+
460
+ Full schema with example: `references/schemas/audit-output.yaml`
461
+
462
+ Key fields the orchestrator validates after synthesis:
463
+ - `directive.REWRITE_REQUIRED` — boolean, drives Step 9
464
+ - `directive.gate_triggered` — which gate fired
465
+ - `directive.files_to_rewrite` — ordered list for rewrite step
466
+ - `audit.file_analysis[].test_effectiveness` — per-file percentage
467
+ - `audit.overview.overall_effectiveness` — aggregate metric
468
+
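A rough sketch of the post-synthesis check on these key fields, assuming the YAML has already been parsed into a plain object. The field names come from the list above; the value types, the 0 to 100 percentage range, and the `validateAuditOutput` helper are assumptions, since the authoritative schema is in `references/schemas/audit-output.yaml`.

```typescript
// Field names are from the list above; the value types are assumptions.
interface AuditOutput {
  directive: {
    REWRITE_REQUIRED: boolean;
    gate_triggered: string | null; // assumed: a gate name, or null when no gate fired
    files_to_rewrite: string[];    // assumed: file paths, already ordered
  };
  audit: {
    file_analysis: { test_effectiveness: number }[];
    overview: { overall_effectiveness: number };
  };
}

// Assumes percentages are expressed on a 0-100 scale.
function isPercentage(value: unknown): value is number {
  return typeof value === "number" && value >= 0 && value <= 100;
}

// Returns a list of problems; an empty list means the key fields look usable.
function validateAuditOutput(doc: unknown): string[] {
  const parsed = doc as Partial<AuditOutput> | null | undefined;
  const problems: string[] = [];
  const directive = parsed?.directive;
  if (typeof directive?.REWRITE_REQUIRED !== "boolean") {
    problems.push("directive.REWRITE_REQUIRED must be a boolean");
  }
  if (directive?.gate_triggered === undefined) {
    problems.push("directive.gate_triggered is missing");
  }
  if (!Array.isArray(directive?.files_to_rewrite)) {
    problems.push("directive.files_to_rewrite must be a list");
  }
  const fileAnalysis = parsed?.audit?.file_analysis;
  if (!Array.isArray(fileAnalysis)) {
    problems.push("audit.file_analysis must be a list");
  } else {
    fileAnalysis.forEach((entry, index) => {
      if (!isPercentage(entry?.test_effectiveness)) {
        problems.push(`audit.file_analysis[${index}].test_effectiveness must be 0-100`);
      }
    });
  }
  if (!isPercentage(parsed?.audit?.overview?.overall_effectiveness)) {
    problems.push("audit.overview.overall_effectiveness must be 0-100");
  }
  return problems;
}
```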
469
+ ---
470
+
471
+ ## Diagnostic Output
472
+
473
+ Write diagnostic output to `logs/diagnostics/test-audit-{YYYYMMDD-HHMMSS}.yaml`.
474
+
475
+ Schema: `references/schemas/diagnostic-output.yaml`. Includes mode selection, Stage 0 AST status, gate evaluation, and per-file decisions with `verification_lines_source: ast | heuristic`.
476
+
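One way to build the diagnostic file name from the `{YYYYMMDD-HHMMSS}` pattern above is sketched here. The helper names are hypothetical, and the use of local time rather than UTC is an assumption; the source does not specify the time zone.

```typescript
// Formats a Date as YYYYMMDD-HHMMSS using local time (assumed, not confirmed by the source).
function formatTimestamp(date: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const day = `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}`;
  const time = `${pad(date.getHours())}${pad(date.getMinutes())}${pad(date.getSeconds())}`;
  return `${day}-${time}`;
}

// Builds the diagnostic output path following the pattern given above.
function diagnosticPath(date: Date): string {
  return `logs/diagnostics/test-audit-${formatTimestamp(date)}.yaml`;
}

console.log(diagnosticPath(new Date()));
```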
477
+ ---
478
+
479
+ ## Integration Notes
480
+
481
+ ### Hook Integration
482
+
483
+ This skill can be triggered by:
484
+ 1. **Direct invocation:** `/test-audit [path]`
485
+ 2. **Pipeline hook:** a PostToolUse hook on `*.test.*` files suggests the Test Audit pipeline
486
+
487
+ Both paths use the same orchestration flow.
488
+
489
+ ### AST Scripts
490
+
491
+ All AST scripts live in `skills/test-audit/scripts/` and are invoked via Justfile recipes:
492
+
493
+ | Recipe | Script | Purpose |
494
+ |--------|--------|---------|
495
+ | `just verify-count` | `verification-counter.ts` | Precise line counting (replaces heuristic) |
496
+ | `just skip-detect` | `skip-detector.ts` | T4 skip/only/todo marker detection |
497
+ | `just ast-analyze` | `data-flow-analyzer.ts` | T3+ broken chain detection via data flow tracing |
498
+
499
+ Scripts use ts-morph for AST parsing, run via `npx tsx`, and output JSON to stdout. Dependencies are in `skills/test-audit/scripts/package.json`.
500
+
501
+ ---
502
+
503
+ ## Known Limitations
504
+
505
+ See `references/known-limitations.md` for full details including resolved limitations history.
506
+
507
+ **Active limitations:** T3+ single-file scope (~90% coverage), manual stub detection gaps (mitigated by Deep mode + extended patterns), context limits at scale (mitigated by batching).
508
+
509
+ ---
510
+
511
+ ## Supporting Files
512
+
513
+ | File | Purpose |
514
+ |------|---------|
515
+ | `references/prompts/deep-mode-detection.md` | 4-part prompt for Deep mode detection sub-agent |
516
+ | `references/prompts/synthesis.md` | 4-part prompt for synthesis sub-agent |
517
+ | `references/schemas/audit-output.yaml` | Output schema with example for audit report |
518
+ | `references/schemas/diagnostic-output.yaml` | Diagnostic output schema |
519
+ | `references/priority-classification.md` | P0/P1/P2 definitions with T-rule impact tables |
520
+ | `references/known-limitations.md` | Active and resolved limitations |
521
+ | `references/rewrite-instructions.md` | Step 9 rewrite procedure with bug-magnet-data integration |
522
+
523
+ ---
524
+
525
+ ## Related Skills
526
+
527
+ - `test-classification` (P0.6) - Classification prompt template
528
+ - `mock-detection` (P0.7) - Detection prompt template + `references/stub-patterns.md`, `references/false-positive-prevention.md`
529
+ - `pipeline-templates` (P0.3) - Test Audit pipeline definition
530
+ - `subagent-prompting` (P0.1) - 4-part template reference
531
+ - `bug-magnet-data` (P4.2) - Curated edge case test data
@@ -0,0 +1,41 @@
1
+ # Known Limitations
2
+
3
+ This skill has the following known limitations, documented here for transparency.
4
+
5
+ ## T3+ Detection: Single-File Scope
6
+
7
+ **Issue:** AST-based data flow analysis (`ast-analyze`) traces data flow within a single file. Cross-file integration chains (e.g., mock data imported from a shared fixtures file) are not traced.
8
+
9
+ **Impact:** An estimated 90% of T3+ violations are single-file (the variable is constructed in the same test body). The remaining cross-file violations (~10%) depend on LLM heuristics from the detection stage.
10
+
11
+ **Mitigation:** AST provides high-confidence leads for single-file cases. Detection agent (Sonnet) uses call graph analysis for cross-file patterns. Future enhancement: cross-file data flow analysis (deferred to P6+).
12
+
13
+ **Resolved (P5.10):** Previously, T3+ detection relied entirely on LLM pattern matching for variable names containing "mock". AST data-flow-analyzer now detects violations with generic variable names (e.g., `testOrder`).
14
+
15
+ ## Manual Stub Pattern Detection
16
+
17
+ **Issue:** In projects that use manual stub classes (e.g., `StubSharedContext`) instead of `jest.mock()`, the stubs are not detected by mock indicator scanning alone.
18
+
19
+ **Impact:** Classification may not flag all files needing analysis in projects with custom stubbing patterns.
20
+
21
+ **Mitigation:** Integration/e2e files are always flagged regardless of indicators. Extended pattern detection reference docs (`references/stub-patterns.md` in mock-detection skill) provide Meszaros taxonomy patterns to the detection agent. Deep mode (≤5 files) bypasses classification entirely, analyzing all files.
22
+
23
+ **Improved (P5.12):** Detection agent now has access to extended stub/fake patterns and false positive prevention reference docs.
24
+
25
+ ## Context Limits at Scale
26
+
27
+ **Issue:** A single sub-agent call can handle roughly 20-25 test files before approaching context limits.
28
+
29
+ **Mitigation:** Batching with parallel sub-agents is implemented for >20 files (classification) and >10 files (detection). A Deep mode safety cap prevents analyzing >25 files without classification filtering.
30
+
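The batching rule in the mitigation above could look roughly like this. The thresholds (20 for classification, 10 for detection) come from the text; reusing each threshold as the batch size, and the `planBatches` name, are assumptions made for illustration, since the source only states when batching starts.

```typescript
type Stage = "classification" | "detection";

// Thresholds from the mitigation above. Using the threshold as the batch size
// is an assumption; the source does not state the actual batch size.
const BATCH_THRESHOLD: Record<Stage, number> = { classification: 20, detection: 10 };

// Splits the file list into batches once it exceeds the stage's threshold.
function planBatches(files: string[], stage: Stage): string[][] {
  const limit = BATCH_THRESHOLD[stage];
  if (files.length <= limit) {
    return [files]; // small enough for a single sub-agent call
  }
  const batches: string[][] = [];
  for (let start = 0; start < files.length; start += limit) {
    batches.push(files.slice(start, start + limit));
  }
  return batches;
}
```

Each returned batch would then be handed to its own sub-agent, which is what allows the batches to run in parallel.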
31
+ ## Resolved Limitations
32
+
33
+ The following limitations from earlier versions have been fully addressed:
34
+
35
+ | Limitation | Resolution | Version |
36
+ |-----------|-----------|---------|
37
+ | Verification line counting approximation | AST-based `verification-counter.ts` provides exact counts | P5.10 |
38
+ | Negative effectiveness percentages | Impossible with AST-precise line counts | P5.10 |
39
+ | T4 detection not automated | AST-based `skip-detector.ts` finds all skip/only/todo markers | P5.10 |
40
+ | No dual-mode for small audits | Deep mode (≤5 files) skips classification | P5.11 |
41
+ | T3+ relies on "mock" in variable names | AST `data-flow-analyzer.ts` traces data sources structurally | P5.10 |
@@ -0,0 +1,30 @@
1
+ # Priority Classification
2
+
3
+ ## P0: False Confidence
4
+
5
+ Tests that pass but should not be trusted:
6
+
7
+ | Rule | Impact |
8
+ |------|--------|
9
+ | T1 | Mock hides real failures - test always passes regardless of SUT behavior |
10
+ | T3+ | Broken integration chain - no real integration is tested |
11
+
12
+ ## P1: Incomplete Verification
13
+
14
+ Tests that run real code but don't fully verify:
15
+
16
+ | Rule | Impact |
17
+ |------|--------|
18
+ | T2 | Call happened but effect not verified |
19
+ | T3 | Integration boundary mocked - partial integration only |
20
+
21
+ ## P2: Pattern Issues
22
+
23
+ Style, organization, and disabled test issues:
24
+
25
+ | Rule | Impact |
26
+ |------|--------|
27
+ | T4 (.skip) | Test disabled — not running, medium severity |
28
+ | T4 (.only) | Focus marker — other tests excluded from CI, high severity |
29
+ | T4 (.todo) | Test placeholder — not implemented, low severity |
30
+ | Minor patterns | Style and organization recommendations |
@@ -0,0 +1,83 @@
1
+ # Deep Mode Detection Prompt
2
+
3
+ Use this template for Stage 2 in **Deep mode only**. In Deep mode, the detection agent self-computes classification metadata (normally provided by Stage 1) using AST output.
4
+
5
+ ## GOAL
6
+
7
+ Analyze ALL provided test files for T1-T4 violations using mock appropriateness rubric and call graph analysis. For each file, self-compute classification metadata (test type, mock indicators, needs_deep_analysis) before performing detection. Track the full scope of each violation for test effectiveness calculation.
8
+
9
+ ## CONSTRAINTS
10
+
11
+ - Do NOT modify any files
12
+ - Analyze ALL provided files (no classification filtering — this is Deep mode)
13
+ - Use AST metadata as ground truth for verification_lines (do not re-estimate)
14
+ - Use AST data-flow violations as starting leads for T3+ analysis
15
+ - Use AST skip markers as T4 violations (deterministic — no further analysis needed)
16
+ - Use call graph analysis to detect T1-T3 violations beyond AST leads
17
+ - Track violation scope (all affected lines, not just violation line)
18
+ - Provide full context for each violation (line, snippet, reason, fix)
19
+ - Complete within 50 tool calls per batch
20
+
21
+ ## CONTEXT
22
+
23
+ **Mode:** Deep (all files analyzed, no classification stage)
24
+
25
+ **Files to analyze:** {list of ALL test files}
26
+
27
+ **AST metadata per file:**
28
+ ```
29
+ {for each file}:
30
+ file: {path}
31
+ verification_lines: {metrics.test_logic_lines from verify-count}
32
+ assertion_lines: {metrics.assertion_lines}
33
+ framework: {metrics.framework_detected}
34
+ skip_markers: {markers from skip-detect, or "none"}
35
+ data_flow_leads: {violations from ast-analyze, or "none"}
36
+ integration_mock_leads: {leads from just integration-mocks, or "none"}
37
+ ```
38
+
39
+ **Self-classification instructions (MANDATORY — per-section, not per-file):**
40
+
41
+ Files commonly contain multiple test types in different sections. You MUST classify each top-level describe/test block independently. DO NOT assign a single test type to the entire file.
42
+
43
+ For each top-level describe block or section:
44
+ 1. **Test type**: unit / integration / e2e — determine from:
45
+ - Block/suite name (e.g., "Integration Tests", "E2E: checkout flow")
46
+ - Preceding comments or section headers (e.g., `// INTEGRATION TESTS`, `# E2E`, `/* system tests */`)
47
+ - Setup patterns within the block (real DB connections = integration, browser launch = e2e)
48
+ - These signals are language-agnostic — apply regardless of whether the file is TypeScript, Python, Java, Go, Ruby, etc.
49
+ 2. **Mock indicators within that block**: list mock/stub/spy framework calls found
50
+ 3. **Evaluate each block against the rubric for ITS test type** — not the file's majority type
51
+
52
+ If AST integration-mock metadata is available (from `just integration-mocks`), use it as ground truth for section classification and mock locations. Validate AST leads and add any the AST missed.
53
+
54
+ **BINDING: AST classification is final.** When the AST script classifies a section as integration or e2e, that classification is NOT subject to LLM override. You MUST evaluate mocks in that section against integration/e2e rules — even if you believe the section is "actually" a unit test. Your role is to evaluate mock appropriateness within the classified type, not to re-classify sections.
55
+
56
+ - If the test author labeled a block "Integration" and the AST confirmed it, both the author's intent and the deterministic signal agree. Do NOT introduce personal judgment to override them.
57
+ - If you believe a section is mislabeled, you MAY note "Advisory: consider renaming this section" — but you MUST still flag T3 violations against the integration/e2e rubric.
58
+ - Dismissing an AST T3 lead by re-classifying the section as "actually unit" is a rule violation.
59
+
60
+ **Mock appropriateness rubric:** See mock-detection skill's "Mock Appropriateness Rubric" section
61
+
62
+ **T1-T4 detection patterns:** See mock-detection skill's "T1-T4 Detection Patterns" section
63
+
64
+ **Extended stub/fake patterns:** See `skills/mock-detection/references/stub-patterns.md` (loaded via mock-detection dependency)
65
+
66
+ **False positive prevention:** See `skills/mock-detection/references/false-positive-prevention.md` (loaded via mock-detection dependency) — consult BEFORE flagging borderline patterns
67
+
68
+ ## OUTPUT
69
+
70
+ Write violations to: `logs/mock-detection-{YYYYMMDD-HHMMSS}.yaml`
71
+
72
+ Write diagnostics to: `logs/diagnostics/mock-detection-{YYYYMMDD-HHMMSS}.yaml`
73
+
74
+ Use the same output schema as the mock-detection skill's "Output Schema" section, with one addition — include a `self_classification` block per file:
75
+
76
+ ```yaml
77
+ self_classification:
78
+ - file: tests/proxy.test.ts
79
+ test_type: unit
80
+ mock_indicators: ["jest.spyOn(child_process, 'spawn')"]
81
+ needs_deep_analysis: true
82
+ reason: "Mock intercepts core dependency"
83
+ ```