@tplog/hasapi 0.1.0

Files changed (42)
  1. package/LICENSE +21 -0
  2. package/README.md +54 -0
  3. package/bin/hasapi.mjs +292 -0
  4. package/hasapi-skills/README.md +56 -0
  5. package/hasapi-skills/_shared/common.md +240 -0
  6. package/hasapi-skills/_shared/custom-risks-guide.md +48 -0
  7. package/hasapi-skills/_shared/decay-risks.md +294 -0
  8. package/hasapi-skills/_shared/remedy-guide.md +37 -0
  9. package/hasapi-skills/_shared/source-coverage.md +248 -0
  10. package/hasapi-skills/_shared/test-decay-risks.md +246 -0
  11. package/hasapi-skills/hasapi-audit/SKILL.md +42 -0
  12. package/hasapi-skills/hasapi-audit/architecture-guide.md +195 -0
  13. package/hasapi-skills/hasapi-audit/onboarding-guide.md +89 -0
  14. package/hasapi-skills/hasapi-debt/SKILL.md +35 -0
  15. package/hasapi-skills/hasapi-debt/debt-guide.md +125 -0
  16. package/hasapi-skills/hasapi-diagnosing-bugs/SKILL.md +134 -0
  17. package/hasapi-skills/hasapi-diagnosing-bugs/scripts/hitl-loop.template.sh +41 -0
  18. package/hasapi-skills/hasapi-grilling/SKILL.md +10 -0
  19. package/hasapi-skills/hasapi-handoff/SKILL.md +16 -0
  20. package/hasapi-skills/hasapi-health/SKILL.md +37 -0
  21. package/hasapi-skills/hasapi-health/health-guide.md +89 -0
  22. package/hasapi-skills/hasapi-implement/SKILL.md +15 -0
  23. package/hasapi-skills/hasapi-resolving-merge-conflicts/SKILL.md +14 -0
  24. package/hasapi-skills/hasapi-review/SKILL.md +37 -0
  25. package/hasapi-skills/hasapi-review/pr-review-guide.md +163 -0
  26. package/hasapi-skills/hasapi-setup/SKILL.md +121 -0
  27. package/hasapi-skills/hasapi-setup/domain.md +51 -0
  28. package/hasapi-skills/hasapi-setup/issue-tracker-github.md +22 -0
  29. package/hasapi-skills/hasapi-setup/issue-tracker-gitlab.md +23 -0
  30. package/hasapi-skills/hasapi-setup/issue-tracker-local.md +19 -0
  31. package/hasapi-skills/hasapi-setup/triage-labels.md +15 -0
  32. package/hasapi-skills/hasapi-sweep/SKILL.md +38 -0
  33. package/hasapi-skills/hasapi-sweep/sweep-guide.md +264 -0
  34. package/hasapi-skills/hasapi-tdd/SKILL.md +108 -0
  35. package/hasapi-skills/hasapi-tdd/mocking.md +59 -0
  36. package/hasapi-skills/hasapi-tdd/refactoring.md +10 -0
  37. package/hasapi-skills/hasapi-tdd/tests.md +61 -0
  38. package/hasapi-skills/hasapi-test/SKILL.md +36 -0
  39. package/hasapi-skills/hasapi-test/test-guide.md +147 -0
  40. package/hasapi-skills/hasapi-to-issues/SKILL.md +84 -0
  41. package/hasapi-skills/hasapi-to-prd/SKILL.md +75 -0
  42. package/package.json +39 -0
+++ package/hasapi-skills/hasapi-sweep/sweep-guide.md
@@ -0,0 +1,264 @@
1
+ # Brooks-Lint — Full Sweep Guide
2
+
3
+ Sequential autonomous pipeline: **review → test → debt → audit**. Fixes findings
4
+ in place, iterates until clean or capped, reports residuals. One interaction point:
5
+ Step 0 (pre-flight consent) — after approval the pipeline runs hands-free until Step 8.
6
+
7
+ Every finding follows the Iron Law: **Symptom → Source → Consequence → Remedy**.
8
+
9
+ ---
10
+
11
+ ### Step 0 — Pre-flight consent gate
12
+
13
+ **Goal:** State scope, cost, and irreversibility up front; get explicit consent
14
+ once so later steps never have to ask.
15
+
16
+ 0a. Estimate the file count using `git ls-files | wc -l` if in a git repo, or
17
+ `find . -type f -not -path '*/.git/*' -not -path '*/node_modules/*' -not -path '*/.venv/*' -not -path '*/build/*' -not -path '*/dist/*' -not -path '*/vendor/*' -not -path '*/target/*' | wc -l` otherwise. Order-of-magnitude is enough.
18
+
19
+ 0b. Show this notice verbatim with the estimate filled in. Do not paraphrase —
20
+ the user is agreeing to this exact scope.
21
+
22
+ ```
23
+ ⚠️ /brooks-sweep — Full Repository Sweep & Auto-Fix
24
+
25
+ Scope: Four analysis dimensions run in sequence — PR code decay (R1–R6),
26
+ test quality (T1–T6), tech debt, architecture. Edits are made in
27
+ place inside the detected project scope.
28
+ Estimated files in scope: ~N
29
+
30
+ Order: brooks-review → brooks-test → brooks-debt → brooks-audit.
31
+ Each dimension scans, queues, and fixes before the next starts.
32
+
33
+ Autonomy: Fully autonomous. Safe single-file fixes apply directly. Multi-file
34
+ fixes that have test coverage AND do not break a public interface
35
+ also apply directly. High-risk fixes (public API break, cross-module
36
+ structural change, or no test coverage) are NOT applied — they are
37
+ recorded in the residual report for human review.
38
+
39
+ Iteration: After each dimension pass, modified files + same-module + static
40
+ consumers are re-scanned. A finding that fails to fix 3 times is
41
+ retired to the unresolvable set and never re-queued. Non-critical
42
+ rounds cap at 3 iterations; critical findings iterate until
43
+ resolved or retired.
44
+
45
+ Git impact: The pipeline edits files. It does NOT commit, push, or amend.
46
+ If you have uncommitted work you want to preserve, commit or stash
47
+ first.
48
+
49
+ Proceed with full autonomous sweep? [Y/n]
50
+ ```
51
+
52
+ 0c. Parse the reply (first match wins, evaluate rules in order):
53
+ 1. **Hard negation** (`no`, `n`, `abort`, `cancel`, `取消`, `不要`): abort with "Aborted before scan — no files modified."
54
+ 2. **Consent** (`Y`, `yes`, `ok`, `sure`, `proceed`, `go`, `continue`, `好`, `好的`, `行`, `可以`): proceed to Step 1.
55
+ 3. **Soft pause** (`wait`, `hold on`, `等一下`, `等我`, `let me`): acknowledge in one line ("Understood, waiting"), then wait for the user's next message and re-evaluate from rule 1.
56
+ 4. **Question**: answer it, then re-show the notice once and wait for the next reply. If the next reply is not Consent (rule 2) — whether a second question, another pause, or anything else — abort with "Aborted — did not receive consent after clarification."
57
+
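The rule order in 0c can be sketched as a small classifier. This is a minimal sketch, not the skill's own implementation: `classifyReply`, the `ReplyKind` names, whole-reply matching for rules 1 and 2, and prefix matching for rule 3 are all assumptions. The guide specifies only the tokens and the evaluation order.

```typescript
// Reply categories from Step 0c, checked in the order the guide lists them.
type ReplyKind = "hard-negation" | "consent" | "soft-pause" | "question";

// Token lists copied from the guide. Matching the whole trimmed, lowercased
// reply is an assumption; the guide does not say how tokens are matched.
const HARD_NEGATION = ["no", "n", "abort", "cancel", "取消", "不要"];
const CONSENT = ["y", "yes", "ok", "sure", "proceed", "go", "continue", "好", "好的", "行", "可以"];
// Soft-pause phrases are matched as prefixes (also an assumption), so that
// "let me check something" counts as a pause.
const SOFT_PAUSE = ["wait", "hold on", "等一下", "等我", "let me"];

function classifyReply(reply: string): ReplyKind {
  const text = reply.trim().toLowerCase();
  if (HARD_NEGATION.includes(text)) return "hard-negation"; // rule 1
  if (CONSENT.includes(text)) return "consent"; // rule 2
  if (SOFT_PAUSE.some((phrase) => text.startsWith(phrase))) return "soft-pause"; // rule 3
  return "question"; // rule 4: in this sketch, anything else is handled as a question
}
```

Because the rules are checked top to bottom, a reply that could match several lists is resolved by the first one, which mirrors "first match wins".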
58
+ 0d. After consent, do not ask further questions until Step 8.
59
+
60
+ ---
61
+
62
+ ### Step 1 — Scope enumeration and state init
63
+
64
+ 1a. Apply Auto Scope Detection from `../_shared/common.md` if the user did not
65
+ specify files or a directory. Otherwise honor the user's explicit scope.
66
+
67
+ 1b. Read `.brooks-lint.yaml` from the project root if present. Apply `disable`,
68
+ `severity`, `ignore`, `focus`, and `custom_risks` per common.md. Record the
69
+ applied config values and reuse them across all iteration rounds — do not
70
+ re-read the file in Step 6 even if files were modified.
71
+
72
+ 1c. Initialize pipeline state (persists across all rounds):
73
+
74
+ - **`unresolvable`** (set): findings retired after 3 failed attempts — keyed by `(file, line_range, risk_code)`; `signature` breaks ties. Never re-queued.
75
+ - **`non_critical_rounds`** (int, 0): incremented each round producing Warning/Suggestion; reset on clean round.
76
+ - **`fix_log`** (list): each fix with file, line range, risk code, description, and outcome (`applied` / `reverted` / `retired`).
77
+
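One possible shape for the state in 1c, as a sketch under stated assumptions: the type names, `findingKey`, and the string key format are hypothetical, and the `signature` tie-breaker mentioned above is left out because the guide does not define it.

```typescript
type Outcome = "applied" | "reverted" | "retired";

interface FixLogEntry {
  file: string;
  lineRange: [number, number];
  riskCode: string;
  description: string;
  outcome: Outcome;
}

interface PipelineState {
  unresolvable: Set<string>; // keys built by findingKey()
  nonCriticalRounds: number; // starts at 0
  fixLog: FixLogEntry[];
}

// Hypothetical key format for (file, line_range, risk_code).
function findingKey(file: string, lineRange: [number, number], riskCode: string): string {
  return `${file}:${lineRange[0]}-${lineRange[1]}:${riskCode}`;
}

function initState(): PipelineState {
  return { unresolvable: new Set<string>(), nonCriticalRounds: 0, fixLog: [] };
}
```

Keying the set by a string keeps the "never re-queued" check a single `unresolvable.has(key)` lookup.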
78
+ 1d. Record the final scope file list in the Fix Report output buffer for Step 8.
79
+
80
+ ---
81
+
82
+ ### Step 2 — brooks-review pass (R1–R6 code decay)
83
+
84
+ Scan every file in scope against all R-series risks defined in
85
+ `../_shared/decay-risks.md`.
86
+
87
+ 2a. For each R-risk, apply its symptom checklist. Record each hit as a finding
88
+ with: risk code, file + approximate line range, Symptom, Source,
89
+ Consequence, Remedy, Severity (Critical / Warning / Suggestion), and
90
+ **Fix-Class** (see Step 2b).
91
+
92
+ 2b. Assign Fix-Class per finding:
93
+
94
+ | Class | Criteria |
95
+ |-------|----------|
96
+ | **Safe** | Single-file AND fully local: rename a non-exported symbol, extract a constant, remove dead code, add a null guard at a leaf, add a test scaffold for an untested pure function. Any change that modifies or removes an exported symbol is NOT Safe even if in one file. |
97
+ | **Extended-Safe** | Multi-file but (a) a project test command exists and passes pre-fix, AND (b) the change does not rename, remove, or alter the signature of any publicly exported symbol, AND (c) touches ≤ 5 files in this pass. |
98
+ | **Residual** | Public API break, cross-service boundary change, no test coverage to fall back on, or remedy ambiguous. NOT applied — carried to the Step 8 residual report. |
99
+
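The table can be read as a decision function. The sketch below is one interpretation, not the guide's definition: the `FixCandidate` fields are hypothetical inputs, and treating every single-file change that leaves exports untouched as Safe simplifies the "fully local" criterion.

```typescript
type FixClass = "Safe" | "Extended-Safe" | "Residual";

// Hypothetical summary of a proposed fix; the field names are assumptions.
interface FixCandidate {
  filesTouched: number;
  changesExportedSymbol: boolean; // renames, removes, or alters a public signature
  crossesServiceBoundary: boolean;
  testsPassPreFix: boolean; // a project test command exists and passes
  remedyAmbiguous: boolean;
}

function classifyFix(fix: FixCandidate): FixClass {
  // Anything that breaks a public contract or has no clear remedy is never applied.
  if (fix.changesExportedSymbol || fix.crossesServiceBoundary || fix.remedyAmbiguous) {
    return "Residual";
  }
  if (fix.filesTouched === 1) return "Safe";
  // Multi-file fixes need a passing test command and at most 5 files.
  if (fix.testsPassPreFix && fix.filesTouched <= 5) return "Extended-Safe";
  return "Residual";
}
```

Checking the Residual conditions first ensures that a one-file change to an exported symbol is not misfiled as Safe.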
100
+ 2c. Skip any finding that matches an entry in the `unresolvable` set.
101
+
102
+ 2d. Apply every Safe and Extended-Safe fix in this dimension, lowest risk
103
+ within each severity tier first. For each fix: Edit or Write, then append
104
+ one row to `fix_log` with outcome `applied`. If two fixes touch overlapping
105
+ line ranges in the same file, apply higher-severity first, re-read the file,
106
+ then apply the next.
107
+
108
+ 2e. After all fixes in this dimension, run the project test/lint command if one
109
+ exists (`package.json` scripts, `pytest`, `cargo test`, `go test ./...`, etc.).
110
+ If tests fail: revert fixes from this dimension in reverse order one at a
111
+ time, re-running the test command after each revert, until tests pass.
112
+ Mark each reverted fix with outcome `reverted` in `fix_log` and promote the
113
+ finding to **Residual**. If no test command is found, note this once in the
114
+ report and continue.
115
+
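The revert procedure in 2e can be sketched as a loop that undoes fixes newest-first and re-checks the tests after each one. `AppliedFix`, `revertUntilGreen`, and the injected `testsPass` callback are hypothetical names; how a fix is actually undone and how the test command is run are left to the caller.

```typescript
interface AppliedFix {
  id: string;
  revert: () => void; // undoes this fix's edits
}

// Reverts fixes in reverse application order until the tests pass.
// Returns the ids of the reverted fixes, in the order they were reverted.
function revertUntilGreen(
  applied: AppliedFix[], // in the order the fixes were applied
  testsPass: () => boolean, // runs the project test command
): string[] {
  const reverted: string[] = [];
  // testsPass() runs once up front and again after every revert.
  for (let i = applied.length - 1; i >= 0 && !testsPass(); i--) {
    applied[i].revert();
    reverted.push(applied[i].id);
  }
  return reverted;
}
```

Every id in the returned list would then be marked `reverted` in `fix_log` and its finding promoted to Residual.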
116
+ 2f. Record dimension summary: N scanned, M Safe applied, K Extended-Safe applied,
117
+ R reverted, P Residual.
118
+
119
+ ---
120
+
121
+ ### Step 3 — brooks-test pass (T1–T6 test decay)
122
+
123
+ Scan test files (and untested production code) against T-series risks defined
124
+ in `../_shared/test-decay-risks.md`.
125
+
126
+ Follow the same sub-steps as Step 2 (classify → apply → verify → summarize),
127
+ using T-prefix risk codes. For production files with no test coverage at all,
128
+ record as T2 (Missing Tests). A test scaffold that adds a pure-function test is
129
+ **Safe**; adding tests that require new test infrastructure is **Residual**.
130
+
131
+ ---
132
+
133
+ ### Step 4 — brooks-debt pass (tech debt accumulation)
134
+
135
+ Re-classify R-findings through a debt lens — same symptoms at accumulation scale:
136
+ repeated duplication, layered workarounds, stale `TODO`/`FIXME` clusters, dead
137
+ flags. Score each with **Pain (1–3) × Spread (1–3)**; product 7–9 = Critical,
138
+ 4–6 = Warning, 1–3 = Suggestion. Apply a severity bump for pattern-level
139
+ occurrences (isolated Suggestion → 4+ modules Warning).
140
+
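A worked version of the scoring rule, as a minimal sketch. The function names are hypothetical, and the bump is implemented only for the one case the guide spells out (a Suggestion seen in 4 or more modules becomes a Warning).

```typescript
type Severity = "Critical" | "Warning" | "Suggestion";
type Level = 1 | 2 | 3;

function debtSeverity(pain: Level, spread: Level): Severity {
  const score = pain * spread; // possible values: 1, 2, 3, 4, 6, 9
  if (score >= 7) return "Critical"; // only 3 × 3 reaches this band
  if (score >= 4) return "Warning";
  return "Suggestion";
}

// Pattern-level bump: a Suggestion that recurs in 4+ modules becomes a Warning.
// Whether higher severities are also bumped is not stated, so they are left as-is.
function bumpForPattern(severity: Severity, moduleCount: number): Severity {
  return severity === "Suggestion" && moduleCount >= 4 ? "Warning" : severity;
}
```

Since the score is a product of two values in 1–3, the Critical band is reached only when both Pain and Spread are 3.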
141
+ Follow the same sub-steps as Step 2. Debt findings often span multiple files
142
+ and are more likely to land in Extended-Safe or Residual than Safe.
143
+
144
+ ---
145
+
146
+ ### Step 5 — brooks-audit pass (architecture integrity)
147
+
148
+ Scan the full scope for architecture-level issues. The dependency-direction
149
+ symptoms (inverted dependencies, circular imports, cross-domain coupling) are
150
+ defined in `../_shared/decay-risks.md` Risk 5 — use that checklist. Step 5
151
+ additionally covers architecture-only concerns that R5 does not: missing
152
+ abstraction layers, god modules, leaked infrastructure inside domain code,
153
+ and seam-boundary violations.
154
+
155
+ Most architecture findings are **Residual** by definition — they require human
156
+ judgment on module boundaries. A few are Extended-Safe (e.g. extract a shared
157
+ constant used in 3+ modules into a new module that nothing else imports yet).
158
+ Do not auto-refactor module layouts, rename packages, or change public exports.
159
+
160
+ Follow the same sub-steps as Step 2.
161
+
162
+ ---
163
+
164
+ ### Step 6 — Iteration loop
165
+
166
+ **Goal:** Re-scan what the fixes touched and converge. Stop on clean round,
167
+ cap, or no progress.
168
+
169
+ 6a. Build the re-scan scope:
170
+ - every file modified in Steps 2–5 of the current round, PLUS
171
+ - every file in the same module as a modified file, PLUS
172
+ - every file that statically imports from a modified file.
173
+
174
+ Do not re-scan files whose dependencies were not touched. On monorepos
175
+ where a "module" may span hundreds of files, narrow the same-module bucket
176
+ to files that import from or are imported by a modified file (direct
177
+ dependency graph only).
178
+
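The three buckets in 6a amount to a set union. In this sketch `moduleOf` and `importsOf` are hypothetical callbacks supplied by the caller; the guide does not say how modules or static imports are resolved, and the monorepo narrowing is not modeled.

```typescript
function buildRescanScope(
  modified: string[], // files changed in Steps 2–5 of this round
  allFiles: string[], // every file in the sweep scope
  moduleOf: (file: string) => string, // which module a file belongs to
  importsOf: (file: string) => string[], // files that `file` statically imports
): Set<string> {
  const modifiedSet = new Set(modified);
  const touchedModules = new Set(modified.map(moduleOf));
  const scope = new Set(modified);
  for (const file of allFiles) {
    const sameModule = touchedModules.has(moduleOf(file));
    const importsModified = importsOf(file).some((dep) => modifiedSet.has(dep));
    if (sameModule || importsModified) scope.add(file);
  }
  return scope;
}
```

Files that are neither in a touched module nor importers of a modified file never enter the set, which is what keeps later rounds cheaper than the first pass.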
179
+ 6b. Re-run Steps 2–5 on the re-scan scope. For each new finding in this round:
180
+ - If it matches an entry in `unresolvable` → skip.
181
+ - Else if 🔴 Critical → queue and fix; Critical findings iterate until
182
+ resolved OR retired (3 failed attempts → `unresolvable`).
183
+ - Else 🟡 Warning / 🟢 Suggestion → queue and fix, subject to cap below.
184
+
185
+ 6c. Classify the round after all fixes attempted:
186
+ - **Clean round** (no new findings outside `unresolvable`): pipeline
187
+ converged → proceed to Step 7.
188
+ - **Critical-only round**: do NOT increment `non_critical_rounds`; return
189
+ to 6a.
190
+ - **Mixed or non-critical round** (any Warning / Suggestion produced):
191
+ increment `non_critical_rounds` by 1. If it reaches the cap (default 3,
192
+ or `sweep.max_iterations` from `.brooks-lint.yaml`), proceed to Step 7
193
+ with remaining non-critical findings recorded as
194
+ `"Unresolved — iteration cap reached"`. Otherwise return to 6a.
195
+
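The round classification in 6c can be sketched as a pure function over the severities found in the round. `classifyRound` and the `NextStep` labels are hypothetical; the default cap of 3 comes from the guide, and resetting the counter on a clean round follows Step 1c.

```typescript
type Severity = "Critical" | "Warning" | "Suggestion";
type NextStep = "converged" | "rescan" | "cap-reached";

function classifyRound(
  newFindings: Severity[], // findings outside `unresolvable` in this round
  nonCriticalRounds: number,
  cap: number = 3,
): { next: NextStep; nonCriticalRounds: number } {
  // Clean round: converged, counter reset.
  if (newFindings.length === 0) return { next: "converged", nonCriticalRounds: 0 };
  // Critical-only round: loop again without touching the counter.
  const hasNonCritical = newFindings.some((s) => s !== "Critical");
  if (!hasNonCritical) return { next: "rescan", nonCriticalRounds };
  // Mixed or non-critical round: count it and stop at the cap.
  const rounds = nonCriticalRounds + 1;
  return { next: rounds >= cap ? "cap-reached" : "rescan", nonCriticalRounds: rounds };
}
```

Only rounds that produce a Warning or Suggestion consume the budget, so Critical findings keep iterating until they are resolved or retired.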
196
+ 6d. Fix-retry rule: if a single finding fails verification (Step 2e) 3 times
197
+ across any combination of rounds, retire it to `unresolvable` with reason
198
+ `"3-retry budget exhausted"` and stop attempting it.
199
+
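A minimal sketch of the retry budget in 6d, assuming failures are counted per finding key across rounds. `recordFailure` and the two maps are hypothetical; only the limit of 3 and the reason string come from the guide.

```typescript
const MAX_ATTEMPTS = 3;

// Records one failed verification for a finding.
// Returns true when the finding has just been retired.
function recordFailure(
  attempts: Map<string, number>, // finding key → failed attempts so far
  unresolvable: Map<string, string>, // finding key → retirement reason
  key: string,
): boolean {
  const count = (attempts.get(key) ?? 0) + 1;
  attempts.set(key, count);
  if (count >= MAX_ATTEMPTS) {
    unresolvable.set(key, "3-retry budget exhausted");
    return true;
  }
  return false;
}
```

Because `attempts` outlives any single round, three failures spread over different rounds retire the finding just as three failures in one round would.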
200
+ ---
201
+
202
+ ### Step 7 — Residual aggregation
203
+
204
+ Collect everything that was NOT fixed in place, de-duplicated:
205
+
206
+ - All Residual-class findings from Steps 2–5 (first round + re-scan rounds)
207
+ - All `unresolvable` entries with their retirement reason
208
+ - All iteration-cap residuals from Step 6c
209
+
210
+ Sort Critical → Warning → Suggestion. Within each severity, list file path,
211
+ risk code, Symptom (one line), Remedy (one line), and the reason it was not
212
+ applied (`public API break` / `no test coverage` / `3-retry budget` /
213
+ `iteration cap`).
214
+
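De-duplication and ordering for Step 7 might look like the sketch below. The `Residual` shape and the de-duplication key (file, risk code, symptom) are assumptions; the guide says only "de-duplicated" and gives the sort order and the reason labels.

```typescript
type Severity = "Critical" | "Warning" | "Suggestion";

interface Residual {
  file: string;
  riskCode: string;
  severity: Severity;
  symptom: string; // one line
  remedy: string; // one line
  reason: "public API break" | "no test coverage" | "3-retry budget" | "iteration cap";
}

const RANK: Record<Severity, number> = { Critical: 0, Warning: 1, Suggestion: 2 };

function aggregateResiduals(items: Residual[]): Residual[] {
  const seen = new Map<string, Residual>();
  for (const item of items) {
    const key = `${item.file}|${item.riskCode}|${item.symptom}`;
    if (!seen.has(key)) seen.set(key, item); // keep the first occurrence
  }
  return Array.from(seen.values()).sort((a, b) => RANK[a.severity] - RANK[b.severity]);
}
```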
215
+ ---
216
+
217
+ ### Step 8 — Sweep report
218
+
219
+ Output the final report. Use the standard Report Template from
220
+ `../_shared/common.md` with these additions:
221
+
222
+ ```
223
+ # Brooks-Lint — Full Sweep Report
224
+ Mode: Full Sweep | Scope: <files or directory>
225
+ Config: .brooks-lint.yaml applied (N risks disabled, M paths ignored) # omit if no config
226
+
227
+ ## Dimension Summary
228
+ | Dimension | Scanned | Safe Applied | Extended Applied | Reverted | Residual |
229
+ |-----------|---------|--------------|------------------|----------|----------|
230
+ | Review (R1–R6) | ... | ... | ... | ... | ... |
231
+ | Test (T1–T6) | ... | ... | ... | ... | ... |
232
+ | Debt | ... | ... | ... | ... | ... |
233
+ | Audit | ... | ... | ... | ... | ... |
234
+
235
+ ## Iteration History
236
+ Round 1: <classification — clean / critical-only / mixed>, <N> new findings
237
+ Round 2: ...
238
+ Stopped at: clean round | iteration cap | no outstanding criticals
239
+
240
+ ## Fix Log
241
+ | # | File | Lines | Risk | Outcome | Change |
242
+ |---|------|-------|------|----------|--------|
243
+ | 1 | ... | ... | R2 | applied | Extract repeated constant |
244
+ | 2 | ... | ... | T4 | reverted | Test regression; promoted to Residual |
245
+ ...
246
+
247
+ ## Health Score Delta
248
+ Before: <estimated score>/100 → After: <estimated score>/100
249
+ (Re-run /brooks-health for an exact recalculation.)
250
+
251
+ ## Residual Items (<K> not applied)
252
+ <Iron Law entries, sorted Critical → Suggestion, with "Not applied because: ..." line>
253
+
254
+ ## Summary
255
+ - Total findings detected: <N>
256
+ - Fixed this sweep: <M>
257
+ - Residual (needs human review): <K>
258
+ - Unresolvable (3-retry exhausted): <U>
259
+ ```
260
+
261
+ If there are zero residual items and zero unresolvable entries, end with:
262
+ **"Sweep complete — codebase is clean."**
263
+
264
+ **Mode line in report:** `Full Sweep`
+++ package/hasapi-skills/hasapi-tdd/SKILL.md
@@ -0,0 +1,108 @@
1
+ ---
2
+ name: hasapi-tdd
3
+ description: Test-driven development. Use when the user wants to build features or fix bugs test-first, mentions "red-green-refactor", or wants integration tests.
4
+ ---
5
+
6
+ # Test-Driven Development
7
+
8
+ ## Philosophy
9
+
10
+ **Core principle**: Tests should verify behavior through public interfaces, not implementation details. Code can change entirely; tests shouldn't.
11
+
12
+ **Good tests** are integration-style: they exercise real code paths through public APIs. They describe _what_ the system does, not _how_ it does it. A good test reads like a specification - "user can checkout with valid cart" tells you exactly what capability exists. These tests survive refactors because they don't care about internal structure.
13
+
14
+ **Bad tests** are coupled to implementation. They mock internal collaborators, test private methods, or verify through external means (like querying a database directly instead of using the interface). The warning sign: your test breaks when you refactor, but behavior hasn't changed. If you rename an internal function and tests fail, those tests were testing implementation, not behavior.
15
+
16
+ See [tests.md](tests.md) for examples and [mocking.md](mocking.md) for mocking guidelines.
17
+
18
+ ## Anti-Pattern: Horizontal Slices
19
+
20
+ **DO NOT write all tests first, then all implementation.** This is "horizontal slicing" - treating RED as "write all tests" and GREEN as "write all code."
21
+
22
+ This produces **crap tests**:
23
+
24
+ - Tests written in bulk test _imagined_ behavior, not _actual_ behavior
25
+ - You end up testing the _shape_ of things (data structures, function signatures) rather than user-facing behavior
26
+ - Tests become insensitive to real changes - they pass when behavior breaks, fail when behavior is fine
27
+ - You outrun your headlights, committing to test structure before understanding the implementation
28
+
29
+ **Correct approach**: Vertical slices via tracer bullets. One test → one implementation → repeat. Each test responds to what you learned from the previous cycle. Because you just wrote the code, you know exactly what behavior matters and how to verify it.
30
+
31
+ ```
32
+ WRONG (horizontal):
33
+ RED: test1, test2, test3, test4, test5
34
+ GREEN: impl1, impl2, impl3, impl4, impl5
35
+
36
+ RIGHT (vertical):
37
+ RED→GREEN: test1→impl1
38
+ RED→GREEN: test2→impl2
39
+ RED→GREEN: test3→impl3
40
+ ...
41
+ ```
42
+
43
+ ## Workflow
44
+
45
+ ### 1. Planning
46
+
47
+ When exploring the codebase, read `CONTEXT.md` (if it exists) so that test names and interface vocabulary match the project's domain language, and respect ADRs in the area you're touching.
48
+
49
+ Before writing any code:
50
+
51
+ - [ ] Confirm with user what interface changes are needed
52
+ - [ ] Confirm with user which behaviors to test (prioritize)
53
+ - [ ] Identify opportunities for deep modules (small interface, deep implementation) — run the `/codebase-design` skill for the vocabulary and the testability checks
54
+ - [ ] List the behaviors to test (not implementation steps)
55
+ - [ ] Get user approval on the plan
56
+
57
+ Ask: "What should the public interface look like? Which behaviors are most important to test?"
58
+
59
+ **You can't test everything.** Confirm with the user exactly which behaviors matter most. Focus testing effort on critical paths and complex logic, not every possible edge case.
60
+
61
+ ### 2. Tracer Bullet
62
+
63
+ Write ONE test that confirms ONE thing about the system:
64
+
65
+ ```
66
+ RED: Write test for first behavior → test fails
67
+ GREEN: Write minimal code to pass → test passes
68
+ ```
69
+
70
+ This is your tracer bullet: it proves the path works end-to-end.
71
+
72
+ ### 3. Incremental Loop
73
+
74
+ For each remaining behavior:
75
+
76
+ ```
77
+ RED: Write next test → fails
78
+ GREEN: Minimal code to pass → passes
79
+ ```
80
+
81
+ Rules:
82
+
83
+ - One test at a time
84
+ - Only enough code to pass current test
85
+ - Don't anticipate future tests
86
+ - Keep tests focused on observable behavior
87
+
88
+ ### 4. Refactor
89
+
90
+ After all tests pass, look for [refactor candidates](refactoring.md):
91
+
92
+ - [ ] Extract duplication
93
+ - [ ] Deepen modules (move complexity behind simple interfaces)
94
+ - [ ] Apply SOLID principles where natural
95
+ - [ ] Consider what new code reveals about existing code
96
+ - [ ] Run tests after each refactor step
97
+
98
+ **Never refactor while RED.** Get to GREEN first.
99
+
100
+ ## Checklist Per Cycle
101
+
102
+ ```
103
+ [ ] Test describes behavior, not implementation
104
+ [ ] Test uses public interface only
105
+ [ ] Test would survive internal refactor
106
+ [ ] Code is minimal for this test
107
+ [ ] No speculative features added
108
+ ```
+++ package/hasapi-skills/hasapi-tdd/mocking.md
@@ -0,0 +1,59 @@
1
+ # When to Mock
2
+
3
+ Mock at **system boundaries** only:
4
+
5
+ - External APIs (payment, email, etc.)
6
+ - Databases (sometimes - prefer test DB)
7
+ - Time/randomness
8
+ - File system (sometimes)
9
+
10
+ Don't mock:
11
+
12
+ - Your own classes/modules
13
+ - Internal collaborators
14
+ - Anything you control
15
+
16
+ ## Designing for Mockability
17
+
18
+ At system boundaries, design interfaces that are easy to mock:
19
+
20
+ **1. Use dependency injection**
21
+
22
+ Pass external dependencies in rather than creating them internally:
23
+
24
+ ```typescript
25
+ // Easy to mock
26
+ function processPayment(order, paymentClient) {
27
+   return paymentClient.charge(order.total);
28
+ }
29
+
30
+ // Hard to mock
31
+ function processPayment(order) {
32
+   const client = new StripeClient(process.env.STRIPE_KEY);
33
+   return client.charge(order.total);
34
+ }
35
+ ```
36
+
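To illustrate why the injected version is easy to test, here is a minimal sketch that swaps in a hand-written fake at the boundary. The `PaymentClient` and `Order` types, the `{ status }` return shape, and `createFakePaymentClient` are all assumptions made for the example; the original snippet is untyped and does not define them.

```typescript
// Assumed boundary interface; the real payment SDK will differ.
interface PaymentClient {
  charge(amount: number): Promise<{ status: string }>;
}

interface Order {
  total: number;
}

async function processPayment(order: Order, paymentClient: PaymentClient) {
  return paymentClient.charge(order.total);
}

// Hand-written fake: records the charged amounts and always confirms.
function createFakePaymentClient(): PaymentClient & { charged: number[] } {
  const charged: number[] = [];
  return {
    charged,
    async charge(amount: number) {
      charged.push(amount);
      return { status: "confirmed" };
    },
  };
}
```

A test can pass the fake to `processPayment` and assert on the returned status, with no network access and no module-level patching. The "hard to mock" variant offers no such seam because it constructs its client internally.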
37
+ **2. Prefer SDK-style interfaces over generic fetchers**
38
+
39
+ Create specific functions for each external operation instead of one generic function with conditional logic:
40
+
41
+ ```typescript
42
+ // GOOD: Each function is independently mockable
43
+ const api = {
44
+   getUser: (id) => fetch(`/users/${id}`),
45
+   getOrders: (userId) => fetch(`/users/${userId}/orders`),
46
+   createOrder: (data) => fetch('/orders', { method: 'POST', body: data }),
47
+ };
48
+
49
+ // BAD: Mocking requires conditional logic inside the mock
50
+ const api = {
51
+   fetch: (endpoint, options) => fetch(endpoint, options),
52
+ };
53
+ ```
54
+
55
+ The SDK approach means:
56
+ - Each mock returns one specific shape
57
+ - No conditional logic in test setup
58
+ - Easier to see which endpoints a test exercises
59
+ - Type safety per endpoint
+++ package/hasapi-skills/hasapi-tdd/refactoring.md
@@ -0,0 +1,10 @@
1
+ # Refactor Candidates
2
+
3
+ After each TDD cycle, look for:
4
+
5
+ - **Duplication** → Extract function/class
6
+ - **Long methods** → Break into private helpers (keep tests on public interface)
7
+ - **Shallow modules** → Combine or deepen
8
+ - **Feature envy** → Move logic to where data lives
9
+ - **Primitive obsession** → Introduce value objects
10
+ - **Existing code** the new code reveals as problematic
+++ package/hasapi-skills/hasapi-tdd/tests.md
@@ -0,0 +1,61 @@
1
+ # Good and Bad Tests
2
+
3
+ ## Good Tests
4
+
5
+ **Integration-style**: Test through real interfaces, not mocks of internal parts.
6
+
7
+ ```typescript
8
+ // GOOD: Tests observable behavior
9
+ test("user can checkout with valid cart", async () => {
10
+   const cart = createCart();
11
+   cart.add(product);
12
+   const result = await checkout(cart, paymentMethod);
13
+   expect(result.status).toBe("confirmed");
14
+ });
15
+ ```
16
+
17
+ Characteristics:
18
+
19
+ - Tests behavior users/callers care about
20
+ - Uses public API only
21
+ - Survives internal refactors
22
+ - Describes WHAT, not HOW
23
+ - One logical assertion per test
24
+
25
+ ## Bad Tests
26
+
27
+ **Implementation-detail tests**: Coupled to internal structure.
28
+
29
+ ```typescript
30
+ // BAD: Tests implementation details
31
+ test("checkout calls paymentService.process", async () => {
32
+   const processSpy = jest.spyOn(paymentService, "process");
33
+   await checkout(cart, payment);
34
+   expect(processSpy).toHaveBeenCalledWith(cart.total);
35
+ });
36
+ ```
37
+
38
+ Red flags:
39
+
40
+ - Mocking internal collaborators
41
+ - Testing private methods
42
+ - Asserting on call counts/order
43
+ - Test breaks when refactoring without behavior change
44
+ - Test name describes HOW not WHAT
45
+ - Verifying through external means instead of interface
46
+
47
+ ```typescript
48
+ // BAD: Bypasses interface to verify
49
+ test("createUser saves to database", async () => {
50
+   await createUser({ name: "Alice" });
51
+   const row = await db.query("SELECT * FROM users WHERE name = ?", ["Alice"]);
52
+   expect(row).toBeDefined();
53
+ });
54
+
55
+ // GOOD: Verifies through interface
56
+ test("createUser makes user retrievable", async () => {
57
+   const user = await createUser({ name: "Alice" });
58
+   const retrieved = await getUser(user.id);
59
+   expect(retrieved.name).toBe("Alice");
60
+ });
61
+ ```
+++ package/hasapi-skills/hasapi-test/SKILL.md
@@ -0,0 +1,36 @@
1
+ ---
2
+ name: hasapi-test
3
+ description: >
4
+ Test quality review drawing on twelve classic engineering books — with primary focus
5
+ on xUnit Test Patterns, The Art of Unit Testing, How Google Tests Software, and
6
+ Working Effectively with Legacy Code — that diagnoses structural problems in an
7
+ existing test suite: brittleness, mock abuse, coverage illusions, slow execution,
8
+ poor readability.
9
+ Triggers when: user asks about test quality, shares test files for review, or
10
+ expresses frustration: "tests keep breaking whenever I change anything", "our tests
11
+ take forever", "I can't understand what this test is doing", "tests pass but bugs
12
+ still reach production", "we have too many mocks".
13
+ Do NOT trigger for: writing new tests from scratch (use the regular test-writing
14
+ workflow) or testing framework/syntax questions — this skill reviews an existing
15
+ suite for structural quality problems, not individual test authoring.
16
+ ---
17
+
18
+ # Brooks-Lint — Test Quality Review
19
+
20
+ ## Setup
21
+
22
+ 1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
23
+ 2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
24
+ 3. Read `../_shared/test-decay-risks.md` for test-space symptom definitions and source attributions
25
+ 4. Read `test-guide.md` in this directory for the test quality review framework
26
+
27
+ ## Process
28
+
29
+ **If the user has not shared test files or pointed to a test directory:** apply Auto
30
+ Scope Detection from `../_shared/common.md` to determine the review scope before proceeding.
31
+
32
+ 1. Build the test suite map (guide's "Before You Start" section)
33
+ 2. Scan for each test decay risk in the order specified (Steps 1–4 of the guide)
34
+ 3. Apply the Iron Law and output using the Report Template (Step 5 of the guide)
35
+
36
+ **Mode line in report:** `Test Quality Review`
+++ package/hasapi-skills/hasapi-test/test-guide.md
@@ -0,0 +1,147 @@
1
+ # Test Quality Review Guide — Mode 4
2
+
3
+ **Purpose:** Diagnose the health of a test suite using six test-space decay risks.
4
+ Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
5
+
6
+ ---
7
+
8
+ ## Before You Start: Build the Test Suite Map
9
+
10
+ Before scanning for any risk, map the current test suite structure:
11
+
12
+ ```
13
+ Unit tests: X files, ~N tests
14
+ Integration tests: X files, ~N tests
15
+ E2E tests: X files, ~N tests
16
+ Ratio: Unit X% : Integration X% : E2E X%
17
+ Coverage areas: [modules with tests] vs [modules without tests]
18
+ ```
19
+
20
+ If you cannot access test files directly, ask the user **one question** — choose the
21
+ most relevant:
22
+ 1. "Which module is hardest to test or has the least coverage?"
23
+ 2. "When you make a change, how often do unrelated tests break?"
24
+ 3. "Is there a part of the codebase your team avoids touching because it has no tests?"
25
+
26
+ After one answer, proceed. Do not ask more than one question.
27
+
28
+ ---
29
+
30
+ ## Analysis Process
31
+
32
+ Work through these five steps in order.
33
+
34
+ ### Step 1: Scan for Test Obscurity
35
+
36
+ *Scan this first — the most visible risk and the one that determines whether the suite
37
+ is maintainable at all.*
38
+
39
+ Look for:
40
+ - Read 5–10 test names at random: can each one communicate subject + scenario + expected
41
+ outcome without opening the test body?
42
+ - Are there tests where a failure gives no clue which behavior broke (multiple assertions,
43
+ no message strings)?
44
+ - Does any test depend on external state (files, database rows, env variables, shared mutable
45
+ fixtures) that is invisible from within the test body?
46
+ - Is there a single massive setUp or beforeEach that every test inherits regardless of
47
+ what it actually needs?
48
+
49
+ If all test names are clear and setups are minimal → no finding.
50
+
51
+ ### Step 2a: Scan for Test Brittleness
52
+
53
+ *Brittle tests break on refactors that do not change observable behavior — they test
54
+ implementation, not contracts.*
55
+
56
+ Look for:
57
+ - Ask (or check git history): did any recent refactor cause test failures with no
58
+ behavior change?
59
+ - Are there test methods where the name contains "and" or that assert on 3 or more
60
+ unrelated behaviors (Eager Test)?
61
+ - Do assertions specify mock call order or exact parameter values that are irrelevant
62
+ to the observable behavior?
63
+ - Are tests coupled to private methods or internal state directly?
64
+
65
+ If brittleness is systemic (most tests in the file break on a rename) → 🔴 Critical.
66
+ If isolated (1–2 brittle tests) → 🟢 Suggestion.
67
+
68
+ ### Step 2b: Scan for Mock Abuse
69
+
70
+ *Mock Abuse produces tests that pass regardless of whether the real behavior is correct.
71
+ Scan this separately from brittleness — over-mocking is often the cause of brittleness,
72
+ but it is a distinct problem worth its own finding.*
73
+
74
+ **Sample 3–5 tests once for both steps 2a and 2b together** — read each test body and
75
+ check brittleness signals and mock-setup ratio in the same pass, then write separate
76
+ findings if both problems are present.
77
+
78
+ Look for:
79
+ - Is mock setup code longer than the assertion logic in the sampled tests?
80
+ - Are the primary assertions `expect(mock).toHaveBeenCalledWith(...)` rather than
81
+ assertions on outputs, state, or observable events?
82
+ - Are there methods in production classes that are only called from test files
83
+ (test-induced design damage)?
84
+ - Does any single test create more than 3 mock objects?
85
+
86
+ If mock setup-to-assertion ratio exceeds 3:1 → 🟡 Warning.
87
+ If production methods exist only for test access → 🔴 Critical (architecture is being
88
+ distorted by the test suite).
89
+
90
+ ### Step 3: Scan for Test Duplication
91
+
92
+ Look for:
93
+ - Is the same setup block (same variables initialized the same way) repeated across
94
+ 5 or more test files without a shared helper?
95
+ - Are there multiple tests that pass identical inputs and assert identical outputs
96
+ with no differentiation (Lazy Test)?
97
+ - Is the same business scenario covered at unit, integration, and E2E level with no
98
+ difference in what each layer is testing?
99
+
100
+ If duplication is systemic (10 or more instances) → 🔴 Critical.
101
+ If localized (3–5 instances) → 🟡 Warning.
102
+
103
+ ### Step 4: Scan for Coverage Illusion and Architecture Mismatch
104
+
105
+ Look for Coverage Illusion:
106
+ - Pick the most recently modified core module. Are its error-handling branches and
107
+ null/boundary inputs covered by tests?
108
+ - Are there legacy areas (old functions, no test files nearby) that are actively
109
+ being changed?
110
+ - Do the tests assert on side effects (DB writes, events emitted, state transitions)
111
+ or only on return values?
112
+
113
+ **Characterization Test check:** If legacy code is being modified without existing tests,
114
+ the team needs Characterization Tests before making the change — not after. Look for
115
+ this pattern and flag it when absent.
116
+
117
+ A Characterization Test locks in current behavior (right or wrong) so future changes
118
+ do not silently regress it. Template:
119
+ ```
120
+ test("characterize: [module].[method] given [input], returns [current output]") {
121
+ // Call the code under test with realistic inputs
122
+ // Assert on whatever it currently returns — even if you suspect the output is wrong
123
+ // Add a comment: "This captures current behavior, not necessarily correct behavior"
124
+ }
125
+ ```
126
+ Source: Feathers — Working Effectively with Legacy Code, Ch. 13: Characterization Tests
127
+
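A concrete instance of the template, as a sketch only. `legacyFormatPrice` is an invented stand-in for legacy code, and the check is written as a plain function rather than against a particular test framework. The expected strings record what the function returns today, not what it ought to return.

```typescript
// Hypothetical legacy function: truncates to whole cents and drops the
// decimals when the result is a whole number. Whether that is intended is unknown.
function legacyFormatPrice(amount: number): string {
  const floored = Math.floor(amount * 100) / 100;
  return floored % 1 === 0 ? `$${floored}` : `$${floored.toFixed(2)}`;
}

// Characterization check: pins down current outputs before any change is made.
// This captures current behavior, not necessarily correct behavior.
function characterizeLegacyFormatPrice(): string[] {
  const cases: Array<[number, string]> = [
    [5, "$5"], // no ".00" suffix today
    [5.5, "$5.50"],
    [1.999, "$1.99"], // truncates instead of rounding
  ];
  const failures: string[] = [];
  for (const [input, current] of cases) {
    const actual = legacyFormatPrice(input);
    if (actual !== current) failures.push(`${input}: expected ${current}, got ${actual}`);
  }
  return failures;
}
```

An empty result means the recorded behavior still holds. Any entry after a refactor marks a behavior change that someone should accept or reject deliberately.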
128
+ Look for Architecture Mismatch:
129
+ - Compare the suite map from the start: is the ratio close to 70% unit / 20% integration / 10% E2E?
130
+ - Are high-risk modules tested at higher density than trivial utilities?
131
+
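The ratio comparison can be made mechanical. In this sketch the 15-point tolerance is an assumption, since the guide says only "close to", and `SuiteCounts`, `suiteRatio`, and `deviatesFromPyramid` are hypothetical names.

```typescript
interface SuiteCounts {
  unit: number;
  integration: number;
  e2e: number;
}

// Converts raw test counts into whole-number percentages.
function suiteRatio(counts: SuiteCounts): SuiteCounts {
  const total = counts.unit + counts.integration + counts.e2e;
  if (total === 0) return { unit: 0, integration: 0, e2e: 0 };
  const pct = (n: number) => Math.round((n / total) * 100);
  return { unit: pct(counts.unit), integration: pct(counts.integration), e2e: pct(counts.e2e) };
}

// True when any layer is more than `tolerance` points away from 70 / 20 / 10.
function deviatesFromPyramid(counts: SuiteCounts, tolerance: number = 15): boolean {
  const r = suiteRatio(counts);
  return (
    Math.abs(r.unit - 70) > tolerance ||
    Math.abs(r.integration - 20) > tolerance ||
    Math.abs(r.e2e - 10) > tolerance
  );
}
```

A suite with 140 unit, 40 integration, and 20 E2E tests sits exactly on 70/20/10; one dominated by E2E tests would be flagged.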
132
+ **Test suite performance:** A slow test suite is a first-class maintainability risk — it
133
+ breaks the fast-feedback loop and causes developers to skip running tests locally.
134
+ - If the full suite runtime is known and > 10 minutes → 🟡 Warning
135
+ - If the full suite runtime is > 30 minutes or unknown → 🔴 Critical (unknown suite time
136
+ means nobody is running it regularly)
137
+ - If tests that could be unit tests are integration tests, that is a Performance Mismatch:
138
+ each misclassified test adds seconds of avoidable wait time
139
+
140
+ Source: Meszaros — xUnit Test Patterns, Slow Tests (p. 253)
141
+
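The runtime thresholds translate directly into a small function. `runtimeSeverity` is a hypothetical name, and `undefined` stands for an unknown suite runtime.

```typescript
type RuntimeFinding = "Critical" | "Warning" | "none";

// minutes: full-suite runtime, or undefined when nobody knows it.
function runtimeSeverity(minutes: number | undefined): RuntimeFinding {
  if (minutes === undefined || minutes > 30) return "Critical";
  if (minutes > 10) return "Warning";
  return "none";
}
```

An unknown runtime is deliberately ranked with the slowest suites, matching the guide's reasoning that an unmeasured suite is probably not being run regularly.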
142
+ ### Step 5: Apply Iron Law, Output Report
143
+
144
+ Apply the Iron Law format from `../_shared/common.md` to each finding.
145
+
146
+ Use the standard Report Template. Mode: Test Quality Review.
147
+ Include the Test Suite Map as a code block immediately before the `## Findings` heading, labeled "Test Suite Map".