npm - @tplog/hasapi - Versions diffs - 0.1.0 - Mend

@tplog/hasapi 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42)

package/LICENSE +21 -0
package/README.md +54 -0
package/bin/hasapi.mjs +292 -0
package/hasapi-skills/README.md +56 -0
package/hasapi-skills/_shared/common.md +240 -0
package/hasapi-skills/_shared/custom-risks-guide.md +48 -0
package/hasapi-skills/_shared/decay-risks.md +294 -0
package/hasapi-skills/_shared/remedy-guide.md +37 -0
package/hasapi-skills/_shared/source-coverage.md +248 -0
package/hasapi-skills/_shared/test-decay-risks.md +246 -0
package/hasapi-skills/hasapi-audit/SKILL.md +42 -0
package/hasapi-skills/hasapi-audit/architecture-guide.md +195 -0
package/hasapi-skills/hasapi-audit/onboarding-guide.md +89 -0
package/hasapi-skills/hasapi-debt/SKILL.md +35 -0
package/hasapi-skills/hasapi-debt/debt-guide.md +125 -0
package/hasapi-skills/hasapi-diagnosing-bugs/SKILL.md +134 -0
package/hasapi-skills/hasapi-diagnosing-bugs/scripts/hitl-loop.template.sh +41 -0
package/hasapi-skills/hasapi-grilling/SKILL.md +10 -0
package/hasapi-skills/hasapi-handoff/SKILL.md +16 -0
package/hasapi-skills/hasapi-health/SKILL.md +37 -0
package/hasapi-skills/hasapi-health/health-guide.md +89 -0
package/hasapi-skills/hasapi-implement/SKILL.md +15 -0
package/hasapi-skills/hasapi-resolving-merge-conflicts/SKILL.md +14 -0
package/hasapi-skills/hasapi-review/SKILL.md +37 -0
package/hasapi-skills/hasapi-review/pr-review-guide.md +163 -0
package/hasapi-skills/hasapi-setup/SKILL.md +121 -0
package/hasapi-skills/hasapi-setup/domain.md +51 -0
package/hasapi-skills/hasapi-setup/issue-tracker-github.md +22 -0
package/hasapi-skills/hasapi-setup/issue-tracker-gitlab.md +23 -0
package/hasapi-skills/hasapi-setup/issue-tracker-local.md +19 -0
package/hasapi-skills/hasapi-setup/triage-labels.md +15 -0
package/hasapi-skills/hasapi-sweep/SKILL.md +38 -0
package/hasapi-skills/hasapi-sweep/sweep-guide.md +264 -0
package/hasapi-skills/hasapi-tdd/SKILL.md +108 -0
package/hasapi-skills/hasapi-tdd/mocking.md +59 -0
package/hasapi-skills/hasapi-tdd/refactoring.md +10 -0
package/hasapi-skills/hasapi-tdd/tests.md +61 -0
package/hasapi-skills/hasapi-test/SKILL.md +36 -0
package/hasapi-skills/hasapi-test/test-guide.md +147 -0
package/hasapi-skills/hasapi-to-issues/SKILL.md +84 -0
package/hasapi-skills/hasapi-to-prd/SKILL.md +75 -0
package/package.json +39 -0

package/hasapi-skills/hasapi-sweep/sweep-guide.md ADDED

@@ -0,0 +1,264 @@
+# Brooks-Lint — Full Sweep Guide
+Sequential autonomous pipeline: **review → test → debt → audit**. Fixes findings
+in place, iterates until clean or capped, reports residuals. One interaction point:
+Step 0 (pre-flight consent) — after approval the pipeline runs hands-free until Step 8.
+Every finding follows the Iron Law: **Symptom → Source → Consequence → Remedy**.
+---
+### Step 0 — Pre-flight consent gate
+**Goal:** State scope, cost, and irreversibility up front; get explicit consent
+once so later steps never have to ask.
+0a. Estimate the file count using `git ls-files | wc -l` if in a git repo, or
+   `find . -type f -not -path '*/.git/*' -not -path '*/node_modules/*' -not -path '*/.venv/*' -not -path '*/build/*' -not -path '*/dist/*' -not -path '*/vendor/*' -not -path '*/target/*' | wc -l` otherwise. Order-of-magnitude is enough.
+0b. Show this notice verbatim with the estimate filled in. Do not paraphrase —
+   the user is agreeing to this exact scope.
+   ```
+   ⚠️  /brooks-sweep — Full Repository Sweep & Auto-Fix
+   Scope:    Four analysis dimensions run in sequence — PR code decay (R1–R6),
+             test quality (T1–T6), tech debt, architecture. Edits are made in
+             place inside the detected project scope.
+   Estimated files in scope: ~N
+   Order:    brooks-review → brooks-test → brooks-debt → brooks-audit.
+             Each dimension scans, queues, and fixes before the next starts.
+   Autonomy: Fully autonomous. Safe single-file fixes apply directly. Multi-file
+             fixes that have test coverage AND do not break a public interface
+             also apply directly. High-risk fixes (public API break, cross-module
+             structural change, or no test coverage) are NOT applied — they are
+             recorded in the residual report for human review.
+   Iteration: After each dimension pass, modified files + same-module + static
+             consumers are re-scanned. A finding that fails to fix 3 times is
+             retired to the unresolvable set and never re-queued. Non-critical
+             rounds cap at 3 iterations; critical findings iterate until
+             resolved or retired.
+   Git impact: The pipeline edits files. It does NOT commit, push, or amend.
+             If you have uncommitted work you want to preserve, commit or stash
+             first.
+   Proceed with full autonomous sweep? [Y/n]
+   ```
+0c. Parse the reply (first match wins, evaluate rules in order):
+   1. **Hard negation** (`no`, `n`, `abort`, `cancel`, `取消`, `不要`): abort with "Aborted before scan — no files modified."
+   2. **Consent** (`Y`, `yes`, `ok`, `sure`, `proceed`, `go`, `continue`, `好`, `好的`, `行`, `可以`): proceed to Step 1.
+   3. **Soft pause** (`wait`, `hold on`, `等一下`, `等我`, `let me`): acknowledge in one line ("Understood, waiting"), then wait for the user's next message and re-evaluate from rule 1.
+   4. **Question**: answer it, then re-show the notice once and wait for the next reply. If the next reply is not Consent (rule 2) — whether a second question, another pause, or anything else — abort with "Aborted — did not receive consent after clarification."
+0d. After consent, do not ask further questions until Step 8.
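The first-match-wins rules in 0c can be sketched as a small classifier. This is a minimal sketch, not part of the package: `classifyReply` and `ReplyKind` are hypothetical names, and the normalization (trim, lowercase, strip trailing punctuation, exact match for rules 1 and 2, prefix match for rule 3) is an assumption the guide does not spell out.

```typescript
// Hypothetical helper: classifies a reply to the Step 0 consent notice.
type ReplyKind = "hard-negation" | "consent" | "soft-pause" | "question";

// Token lists copied from rules 1-3 of Step 0c.
const HARD_NEGATION = new Set(["no", "n", "abort", "cancel", "取消", "不要"]);
const CONSENT = new Set([
  "y", "yes", "ok", "sure", "proceed", "go", "continue", "好", "好的", "行", "可以",
]);
const SOFT_PAUSE = ["wait", "hold on", "等一下", "等我", "let me"];

function classifyReply(reply: string): ReplyKind {
  // Assumption: normalize before matching so "Y", " yes " and "ok." all match.
  const text = reply.trim().toLowerCase().replace(/[.!。！]+$/, "");
  if (HARD_NEGATION.has(text)) return "hard-negation"; // rule 1
  if (CONSENT.has(text)) return "consent"; // rule 2
  if (SOFT_PAUSE.some((phrase) => text.startsWith(phrase))) return "soft-pause"; // rule 3
  return "question"; // rule 4: anything else gets an answer and one re-show
}
```

Exact matching for rules 1 and 2 keeps a reply such as "now" from being read as the negation `n`; a real implementation might choose a looser match.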
+---
+### Step 1 — Scope enumeration and state init
+1a. Apply Auto Scope Detection from `../_shared/common.md` if the user did not
+   specify files or a directory. Otherwise honor the user's explicit scope.
+1b. Read `.brooks-lint.yaml` from the project root if present. Apply `disable`,
+   `severity`, `ignore`, `focus`, and `custom_risks` per common.md. Record the
+   applied config values and reuse them across all iteration rounds — do not
+   re-read the file in Step 6 even if files were modified.
+1c. Initialize pipeline state (persists across all rounds):
+   - **`unresolvable`** (set): findings retired after 3 failed attempts — keyed by `(file, line_range, risk_code)`; `signature` breaks ties. Never re-queued.
+   - **`non_critical_rounds`** (int, 0): incremented each round producing Warning/Suggestion; reset on clean round.
+   - **`fix_log`** (list): each fix with file, line range, risk code, description, and outcome (`applied` / `reverted` / `retired`).
+1d. Record the final scope file list in the Fix Report output buffer for Step 8.
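The pipeline state from 1c could be modeled roughly as follows. The type and function names (`PipelineState`, `FixLogEntry`, `findingKey`, `initState`) are illustrative assumptions, and the `signature` tie-breaker is left out because the guide does not define it.

```typescript
type Outcome = "applied" | "reverted" | "retired";

interface FixLogEntry {
  file: string;
  lineRange: [number, number];
  riskCode: string;
  description: string;
  outcome: Outcome;
}

interface PipelineState {
  unresolvable: Set<string>; // holds keys produced by findingKey()
  nonCriticalRounds: number; // reset to 0 on a clean round
  fixLog: FixLogEntry[];
}

// Assumption: the (file, line_range, risk_code) key is serialized to a string.
function findingKey(file: string, lineRange: [number, number], riskCode: string): string {
  return `${file}:${lineRange[0]}-${lineRange[1]}:${riskCode}`;
}

function initState(): PipelineState {
  return { unresolvable: new Set<string>(), nonCriticalRounds: 0, fixLog: [] };
}
```

A string key keeps the `unresolvable` lookup in 2c to a single `Set.has` call.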
+---
+### Step 2 — brooks-review pass (R1–R6 code decay)
+Scan every file in scope against all R-series risks defined in
+`../_shared/decay-risks.md`.
+2a. For each R-risk, apply its symptom checklist. Record each hit as a finding
+   with: risk code, file + approximate line range, Symptom, Source,
+   Consequence, Remedy, Severity (Critical / Warning / Suggestion), and
+   **Fix-Class** (see Step 2b).
+2b. Assign Fix-Class per finding:
+   | Class | Criteria |
+   |-------|----------|
+   | **Safe** | Single-file AND fully local: rename a non-exported symbol, extract a constant, remove dead code, add a null guard at a leaf, add a test scaffold for an untested pure function. Any change that modifies or removes an exported symbol is NOT Safe even if in one file. |
+   | **Extended-Safe** | Multi-file but (a) a project test command exists and passes pre-fix, AND (b) the change does not rename, remove, or alter the signature of any publicly exported symbol, AND (c) touches ≤ 5 files in this pass. |
+   | **Residual** | Public API break, cross-service boundary change, no test coverage to fall back on, or remedy ambiguous. NOT applied — carried to the Step 8 residual report. |
+2c. Skip any finding that matches an entry in the `unresolvable` set.
+2d. Apply every Safe and Extended-Safe fix in this dimension, lowest risk
+   within each severity tier first. For each fix: Edit or Write, then append
+   one row to `fix_log` with outcome `applied`. If two fixes touch overlapping
+   line ranges in the same file, apply higher-severity first, re-read the file,
+   then apply the next.
+2e. After all fixes in this dimension, run the project test/lint command if one
+   exists (`package.json` scripts, `pytest`, `cargo test`, `go test ./...`, etc.).
+   If tests fail: revert fixes from this dimension in reverse order one at a
+   time, re-running the test command after each revert, until tests pass.
+   Mark each reverted fix with outcome `reverted` in `fix_log` and promote the
+   finding to **Residual**. If no test command is found, note this once in the
+   report and continue.
+2f. Record dimension summary: N scanned, M Safe applied, K Extended-Safe applied,
+   R reverted, P Residual.
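Sub-steps 2b (classify) and 2e (verify) can be sketched together. This is a simplified illustration under stated assumptions: the `FixCandidate` fields are hypothetical summaries of the table's criteria, "fully local" is approximated by "does not change an exported symbol", and `revert` and `testsPass` stand in for whatever the project actually uses.

```typescript
type FixClass = "Safe" | "Extended-Safe" | "Residual";

// Hypothetical summary of one proposed fix; all field names are illustrative.
interface FixCandidate {
  filesTouched: number;
  changesExportedSymbol: boolean; // renames, removes, or alters a public signature
  crossesServiceBoundary: boolean;
  remedyAmbiguous: boolean;
  testsPassPreFix: boolean; // a test command exists and passes before the fix
}

function classifyFix(c: FixCandidate): FixClass {
  if (c.changesExportedSymbol || c.crossesServiceBoundary || c.remedyAmbiguous) {
    return "Residual";
  }
  if (c.filesTouched === 1) return "Safe";
  return c.testsPassPreFix && c.filesTouched <= 5 ? "Extended-Safe" : "Residual";
}

interface AppliedFix {
  id: string;
  revert: () => void;
}

// Step 2e: undo fixes newest-first, re-running tests after each revert.
function revertUntilGreen(applied: AppliedFix[], testsPass: () => boolean): string[] {
  const reverted: string[] = [];
  for (let i = applied.length - 1; i >= 0 && !testsPass(); i--) {
    applied[i].revert();
    reverted.push(applied[i].id);
  }
  return reverted; // each id is logged as `reverted` and promoted to Residual
}
```

The loop checks the test command before every revert, so it stops as soon as the suite passes and leaves earlier fixes in place.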
+---
+### Step 3 — brooks-test pass (T1–T6 test decay)
+Scan test files (and untested production code) against T-series risks defined
+in `../_shared/test-decay-risks.md`.
+Follow the same sub-steps as Step 2 (classify → apply → verify → summarize),
+using T-prefix risk codes. For production files with no test coverage at all,
+record as T2 (Missing Tests). A test scaffold that adds a pure-function test is
+**Safe**; adding tests that require new test infrastructure is **Residual**.
+---
+### Step 4 — brooks-debt pass (tech debt accumulation)
+Re-classify R-findings through a debt lens — same symptoms at accumulation scale:
+repeated duplication, layered workarounds, stale `TODO`/`FIXME` clusters, dead
+flags. Score each as the product **Pain (1–3) × Spread (1–3)**: 7–9 = Critical,
+4–6 = Warning, 1–3 = Suggestion. Apply a severity bump for pattern-level
+occurrences (isolated Suggestion → 4+ modules Warning).
+Follow the same sub-steps as Step 2. Debt findings often span multiple files
+and are more likely to land in Extended-Safe or Residual than Safe.
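The scoring rule above can be sketched as a single function. It assumes the score is the product of the two factors (so only 1, 2, 3, 4, 6, and 9 can occur) and that the pattern-level bump covers only the one case the guide names, Suggestion to Warning at 4 or more modules; `debtSeverity` and `moduleCount` are hypothetical names.

```typescript
type Severity = "Critical" | "Warning" | "Suggestion";
type Level = 1 | 2 | 3;

function debtSeverity(pain: Level, spread: Level, moduleCount = 1): Severity {
  const score = pain * spread; // possible values: 1, 2, 3, 4, 6, 9
  let severity: Severity =
    score >= 7 ? "Critical" : score >= 4 ? "Warning" : "Suggestion";
  // Assumption: the bump only lifts an isolated Suggestion once it spans 4+ modules.
  if (severity === "Suggestion" && moduleCount >= 4) severity = "Warning";
  return severity;
}
```

For example, `debtSeverity(2, 3)` scores 6 and lands in the Warning band, while `debtSeverity(1, 2, 4)` scores 2 but is bumped to Warning by its spread across modules.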
+---
+### Step 5 — brooks-audit pass (architecture integrity)
+Scan the full scope for architecture-level issues. The dependency-direction
+symptoms (inverted dependencies, circular imports, cross-domain coupling) are
+defined in `../_shared/decay-risks.md` Risk 5 — use that checklist. Step 5
+additionally covers architecture-only concerns that R5 does not: missing
+abstraction layers, god modules, leaked infrastructure inside domain code,
+and seam-boundary violations.
+Most architecture findings are **Residual** by definition — they require human
+judgment on module boundaries. A few are Extended-Safe (e.g. extract a shared
+constant used in 3+ modules into a new module that nothing else imports yet).
+Do not auto-refactor module layouts, rename packages, or change public exports.
+Follow the same sub-steps as Step 2.
+---
+### Step 6 — Iteration loop
+**Goal:** Re-scan what the fixes touched and converge. Stop on clean round,
+cap, or no progress.
+6a. Build the re-scan scope:
+   - every file modified in Steps 2–5 of the current round, PLUS
+   - every file in the same module as a modified file, PLUS
+   - every file that statically imports from a modified file.
+   Do not re-scan files whose dependencies were not touched. On monorepos
+   where a "module" may span hundreds of files, narrow the same-module bucket
+   to files that import from or are imported by a modified file (direct
+   dependency graph only).
+6b. Re-run Steps 2–5 on the re-scan scope. For each new finding in this round:
+   - If it matches an entry in `unresolvable` → skip.
+   - Else if 🔴 Critical → queue and fix; Critical findings iterate until
+     resolved OR retired (3 failed attempts → `unresolvable`).
+   - Else 🟡 Warning / 🟢 Suggestion → queue and fix, subject to cap below.
+6c. Classify the round after all fixes attempted:
+   - **Clean round** (no new findings outside `unresolvable`): pipeline
+     converged → proceed to Step 7.
+   - **Critical-only round**: do NOT increment `non_critical_rounds`; return
+     to 6a.
+   - **Mixed or non-critical round** (any Warning / Suggestion produced):
+     increment `non_critical_rounds` by 1. If it reaches the cap (default 3,
+     or `sweep.max_iterations` from `.brooks-lint.yaml`), proceed to Step 7
+     with remaining non-critical findings recorded as
+     `"Unresolved — iteration cap reached"`. Otherwise return to 6a.
+6d. Fix-retry rule: if a single finding fails verification (Step 2e) 3 times
+   across any combination of rounds, retire it to `unresolvable` with reason
+   `"3-retry budget exhausted"` and stop attempting it.
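The round classification in 6c reduces to a small state transition. The sketch below is an illustration only: `classifyRound` and `advance` are hypothetical names, and the input is assumed to already exclude findings in `unresolvable`.

```typescript
type Severity = "Critical" | "Warning" | "Suggestion";
type RoundKind = "clean" | "critical-only" | "mixed";
type NextStep = "rescan" | "residuals"; // back to 6a, or on to Step 7

// newFindings: severities of this round's findings, `unresolvable` already filtered out.
function classifyRound(newFindings: Severity[]): RoundKind {
  if (newFindings.length === 0) return "clean";
  return newFindings.every((s) => s === "Critical") ? "critical-only" : "mixed";
}

function advance(
  kind: RoundKind,
  nonCriticalRounds: number,
  cap = 3, // default cap; `sweep.max_iterations` may override it
): { nonCriticalRounds: number; next: NextStep } {
  if (kind === "clean") return { nonCriticalRounds: 0, next: "residuals" };
  if (kind === "critical-only") return { nonCriticalRounds, next: "rescan" };
  const rounds = nonCriticalRounds + 1;
  return { nonCriticalRounds: rounds, next: rounds >= cap ? "residuals" : "rescan" };
}
```

Critical-only rounds never touch the counter, which is what lets critical findings iterate until they are resolved or retired by the 3-retry rule in 6d.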
+---
+### Step 7 — Residual aggregation
+Collect everything that was NOT fixed in place, de-duplicated:
+- All Residual-class findings from Steps 2–5 (first round + re-scan rounds)
+- All `unresolvable` entries with their retirement reason
+- All iteration-cap residuals from Step 6c
+Sort Critical → Warning → Suggestion. Within each severity, list file path,
+risk code, Symptom (one line), Remedy (one line), and the reason it was not
+applied (`public API break` / `no test coverage` / `3-retry budget` /
+`iteration cap`).
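De-duplication and severity ordering for the residual list might look like this. It is a sketch with assumed names (`Residual`, `aggregateResiduals`) and an assumed de-duplication key of file, line range, and risk code, mirroring the key used for `unresolvable`.

```typescript
type Severity = "Critical" | "Warning" | "Suggestion";

interface Residual {
  file: string;
  lines: string; // e.g. "10-20"
  riskCode: string;
  severity: Severity;
  reason: string; // e.g. "public API break", "iteration cap"
}

const RANK: Record<Severity, number> = { Critical: 0, Warning: 1, Suggestion: 2 };

function aggregateResiduals(items: Residual[]): Residual[] {
  const seen = new Set<string>();
  const unique = items.filter((r) => {
    const key = `${r.file}|${r.lines}|${r.riskCode}`;
    if (seen.has(key)) return false; // keep the first occurrence only
    seen.add(key);
    return true;
  });
  // Critical first, then Warning, then Suggestion.
  return unique.sort((a, b) => RANK[a.severity] - RANK[b.severity]);
}
```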
+---
+### Step 8 — Sweep report
+Output the final report. Use the standard Report Template from
+`../_shared/common.md` with these additions:
+```
+# Brooks-Lint — Full Sweep Report
+Mode: Full Sweep | Scope: <files or directory>
+Config: .brooks-lint.yaml applied (N risks disabled, M paths ignored)   # omit if no config
+## Dimension Summary
+| Dimension | Scanned | Safe Applied | Extended Applied | Reverted | Residual |
+|-----------|---------|--------------|------------------|----------|----------|
+| Review (R1–R6) | ... | ... | ... | ... | ... |
+| Test (T1–T6)   | ... | ... | ... | ... | ... |
+| Debt           | ... | ... | ... | ... | ... |
+| Audit          | ... | ... | ... | ... | ... |
+## Iteration History
+Round 1: <classification — clean / critical-only / mixed>, <N> new findings
+Round 2: ...
+Stopped at: clean round | iteration cap | no outstanding criticals
+## Fix Log
+| # | File | Lines | Risk | Outcome  | Change |
+|---|------|-------|------|----------|--------|
+| 1 | ...  | ...   | R2   | applied  | Extract repeated constant |
+| 2 | ...  | ...   | T4   | reverted | Test regression; promoted to Residual |
+...
+## Health Score Delta
+Before: <estimated score>/100  →  After: <estimated score>/100
+(Re-run /brooks-health for an exact recalculation.)
+## Residual Items  (<K> not applied)
+<Iron Law entries, sorted Critical → Suggestion, with "Not applied because: ..." line>
+## Summary
+- Total findings detected: <N>
+- Fixed this sweep: <M>
+- Residual (needs human review): <K>
+- Unresolvable (3-retry exhausted): <U>
+```
+If there are zero residual items and zero unresolvable entries, end with:
+**"Sweep complete — codebase is clean."**
+**Mode line in report:** `Full Sweep`

package/hasapi-skills/hasapi-tdd/SKILL.md ADDED

@@ -0,0 +1,108 @@
+---
+name: hasapi-tdd
+description: Test-driven development. Use when the user wants to build features or fix bugs test-first, mentions "red-green-refactor", or wants integration tests.
+---
+# Test-Driven Development
+## Philosophy
+**Core principle**: Tests should verify behavior through public interfaces, not implementation details. Code can change entirely; tests shouldn't.
+**Good tests** are integration-style: they exercise real code paths through public APIs. They describe _what_ the system does, not _how_ it does it. A good test reads like a specification - "user can checkout with valid cart" tells you exactly what capability exists. These tests survive refactors because they don't care about internal structure.
+**Bad tests** are coupled to implementation. They mock internal collaborators, test private methods, or verify through external means (like querying a database directly instead of using the interface). The warning sign: your test breaks when you refactor, but behavior hasn't changed. If you rename an internal function and tests fail, those tests were testing implementation, not behavior.
+See [tests.md](tests.md) for examples and [mocking.md](mocking.md) for mocking guidelines.
+## Anti-Pattern: Horizontal Slices
+**DO NOT write all tests first, then all implementation.** This is "horizontal slicing" - treating RED as "write all tests" and GREEN as "write all code."
+This produces **crap tests**:
+- Tests written in bulk test _imagined_ behavior, not _actual_ behavior
+- You end up testing the _shape_ of things (data structures, function signatures) rather than user-facing behavior
+- Tests become insensitive to real changes - they pass when behavior breaks, fail when behavior is fine
+- You outrun your headlights, committing to test structure before understanding the implementation
+**Correct approach**: Vertical slices via tracer bullets. One test → one implementation → repeat. Each test responds to what you learned from the previous cycle. Because you just wrote the code, you know exactly what behavior matters and how to verify it.
+```
+WRONG (horizontal):
+  RED:   test1, test2, test3, test4, test5
+  GREEN: impl1, impl2, impl3, impl4, impl5
+RIGHT (vertical):
+  RED→GREEN: test1→impl1
+  RED→GREEN: test2→impl2
+  RED→GREEN: test3→impl3
+  ...
+```
+## Workflow
+### 1. Planning
+When exploring the codebase, read `CONTEXT.md` (if it exists) so that test names and interface vocabulary match the project's domain language, and respect ADRs in the area you're touching.
+Before writing any code:
+- [ ] Confirm with user what interface changes are needed
+- [ ] Confirm with user which behaviors to test (prioritize)
+- [ ] Identify opportunities for deep modules (small interface, deep implementation) — run the `/codebase-design` skill for the vocabulary and the testability checks
+- [ ] List the behaviors to test (not implementation steps)
+- [ ] Get user approval on the plan
+Ask: "What should the public interface look like? Which behaviors are most important to test?"
+**You can't test everything.** Confirm with the user exactly which behaviors matter most. Focus testing effort on critical paths and complex logic, not every possible edge case.
+### 2. Tracer Bullet
+Write ONE test that confirms ONE thing about the system:
+```
+RED:   Write test for first behavior → test fails
+GREEN: Write minimal code to pass → test passes
+```
+This is your tracer bullet: it proves the path works end-to-end.
+### 3. Incremental Loop
+For each remaining behavior:
+```
+RED:   Write next test → fails
+GREEN: Minimal code to pass → passes
+```
+Rules:
+- One test at a time
+- Only enough code to pass current test
+- Don't anticipate future tests
+- Keep tests focused on observable behavior
+### 4. Refactor
+After all tests pass, look for [refactor candidates](refactoring.md):
+- [ ] Extract duplication
+- [ ] Deepen modules (move complexity behind simple interfaces)
+- [ ] Apply SOLID principles where natural
+- [ ] Consider what new code reveals about existing code
+- [ ] Run tests after each refactor step
+**Never refactor while RED.** Get to GREEN first.
+## Checklist Per Cycle
+```
+[ ] Test describes behavior, not implementation
+[ ] Test uses public interface only
+[ ] Test would survive internal refactor
+[ ] Code is minimal for this test
+[ ] No speculative features added
+```

package/hasapi-skills/hasapi-tdd/mocking.md ADDED

@@ -0,0 +1,59 @@
+# When to Mock
+Mock at **system boundaries** only:
+- External APIs (payment, email, etc.)
+- Databases (sometimes - prefer test DB)
+- Time/randomness
+- File system (sometimes)
+Don't mock:
+- Your own classes/modules
+- Internal collaborators
+- Anything you control
+## Designing for Mockability
+At system boundaries, design interfaces that are easy to mock:
+**1. Use dependency injection**
+Pass external dependencies in rather than creating them internally:
+```typescript
+// Easy to mock
+function processPayment(order, paymentClient) {
+  return paymentClient.charge(order.total);
+}
+// Hard to mock
+function processPayment(order) {
+  const client = new StripeClient(process.env.STRIPE_KEY);
+  return client.charge(order.total);
+}
+```
+**2. Prefer SDK-style interfaces over generic fetchers**
+Create specific functions for each external operation instead of one generic function with conditional logic:
+```typescript
+// GOOD: Each function is independently mockable
+const api = {
+  getUser: (id) => fetch(`/users/${id}`),
+  getOrders: (userId) => fetch(`/users/${userId}/orders`),
+  createOrder: (data) => fetch('/orders', { method: 'POST', body: JSON.stringify(data) }),
+};
+// BAD: Mocking requires conditional logic inside the mock
+const api = {
+  fetch: (endpoint, options) => fetch(endpoint, options),
+};
+```
+The SDK approach means:
+- Each mock returns one specific shape
+- No conditional logic in test setup
+- Easier to see which endpoints a test exercises
+- Type safety per endpoint
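To show what dependency injection buys at a system boundary, here is a minimal sketch that swaps the payment client for a hand-written fake, with no mocking library involved. `Order`, `PaymentClient`, and `makeFakePaymentClient` are illustrative names, and the client is synchronous only to keep the example short; a real client would likely return a promise.

```typescript
interface Order {
  total: number;
}

// The boundary interface: the only thing processPayment knows about payments.
interface PaymentClient {
  charge(amount: number): { status: string };
}

function processPayment(order: Order, paymentClient: PaymentClient): { status: string } {
  return paymentClient.charge(order.total);
}

// Hand-written fake for tests: records charges and always confirms.
function makeFakePaymentClient(): PaymentClient & { charged: number[] } {
  const charged: number[] = [];
  return {
    charged,
    charge: (amount: number) => {
      charged.push(amount);
      return { status: "confirmed" };
    },
  };
}

const fakeClient = makeFakePaymentClient();
const paymentResult = processPayment({ total: 42 }, fakeClient);
```

Because the fake sits at the external boundary, the code under test runs unmodified and the test can assert on `paymentResult.status`, the observable outcome.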

package/hasapi-skills/hasapi-tdd/refactoring.md ADDED

@@ -0,0 +1,10 @@
+# Refactor Candidates
+After each TDD cycle, look for:
+- **Duplication** → Extract function/class
+- **Long methods** → Break into private helpers (keep tests on public interface)
+- **Shallow modules** → Combine or deepen
+- **Feature envy** → Move logic to where data lives
+- **Primitive obsession** → Introduce value objects
+- **Existing code** the new code reveals as problematic

package/hasapi-skills/hasapi-tdd/tests.md ADDED

@@ -0,0 +1,61 @@
+# Good and Bad Tests
+## Good Tests
+**Integration-style**: Test through real interfaces, not mocks of internal parts.
+```typescript
+// GOOD: Tests observable behavior
+test("user can checkout with valid cart", async () => {
+  const cart = createCart();
+  cart.add(product);
+  const result = await checkout(cart, paymentMethod);
+  expect(result.status).toBe("confirmed");
+});
+```
+Characteristics:
+- Tests behavior users/callers care about
+- Uses public API only
+- Survives internal refactors
+- Describes WHAT, not HOW
+- One logical assertion per test
+## Bad Tests
+**Implementation-detail tests**: Coupled to internal structure.
+```typescript
+// BAD: Tests implementation details
+test("checkout calls paymentService.process", async () => {
+  const processSpy = jest.spyOn(paymentService, "process");
+  await checkout(cart, payment);
+  expect(processSpy).toHaveBeenCalledWith(cart.total);
+});
+```
+Red flags:
+- Mocking internal collaborators
+- Testing private methods
+- Asserting on call counts/order
+- Test breaks when refactoring without behavior change
+- Test name describes HOW not WHAT
+- Verifying through external means instead of interface
+```typescript
+// BAD: Bypasses interface to verify
+test("createUser saves to database", async () => {
+  await createUser({ name: "Alice" });
+  const row = await db.query("SELECT * FROM users WHERE name = ?", ["Alice"]);
+  expect(row).toBeDefined();
+});
+// GOOD: Verifies through interface
+test("createUser makes user retrievable", async () => {
+  const user = await createUser({ name: "Alice" });
+  const retrieved = await getUser(user.id);
+  expect(retrieved.name).toBe("Alice");
+});
+```

package/hasapi-skills/hasapi-test/SKILL.md ADDED

@@ -0,0 +1,36 @@
+---
+name: hasapi-test
+description: >
+  Test quality review drawing on twelve classic engineering books — with primary focus
+  on xUnit Test Patterns, The Art of Unit Testing, How Google Tests Software, and
+  Working Effectively with Legacy Code — that diagnoses structural problems in an
+  existing test suite: brittleness, mock abuse, coverage illusions, slow execution,
+  poor readability.
+  Triggers when: user asks about test quality, shares test files for review, or
+  expresses frustration: "tests keep breaking whenever I change anything", "our tests
+  take forever", "I can't understand what this test is doing", "tests pass but bugs
+  still reach production", "we have too many mocks".
+  Do NOT trigger for: writing new tests from scratch (use the regular test-writing
+  workflow) or testing framework/syntax questions — this skill reviews an existing
+  suite for structural quality problems, not individual test authoring.
+---
+# Brooks-Lint — Test Quality Review
+## Setup
+1. Read `../_shared/common.md` for the Iron Law, Project Config, Report Template, and Health Score rules
+2. Read `../_shared/source-coverage.md` for book-level coverage, exceptions, and tradeoffs
+3. Read `../_shared/test-decay-risks.md` for test-space symptom definitions and source attributions
+4. Read `test-guide.md` in this directory for the test quality review framework
+## Process
+**If the user has not shared test files or pointed to a test directory:** apply Auto
+Scope Detection from `../_shared/common.md` to determine the review scope before proceeding.
+1. Build the test suite map (guide's "Before You Start" section)
+2. Scan for each test decay risk in the order specified (Steps 1–4 of the guide)
+3. Apply the Iron Law and output using the Report Template (Step 5 of the guide)
+**Mode line in report:** `Test Quality Review`

package/hasapi-skills/hasapi-test/test-guide.md ADDED

@@ -0,0 +1,147 @@
+# Test Quality Review Guide — Mode 4
+**Purpose:** Diagnose the health of a test suite using six test-space decay risks.
+Every finding must follow the Iron Law: Symptom → Source → Consequence → Remedy.
+---
+## Before You Start: Build the Test Suite Map
+Before scanning for any risk, map the current test suite structure:
+```
+Unit tests:        X files, ~N tests
+Integration tests: X files, ~N tests
+E2E tests:         X files, ~N tests
+Ratio:             Unit X%  :  Integration X%  :  E2E X%
+Coverage areas:    [modules with tests] vs [modules without tests]
+```
+If you cannot access test files directly, ask the user **one question** — choose the
+most relevant:
+1. "Which module is hardest to test or has the least coverage?"
+2. "When you make a change, how often do unrelated tests break?"
+3. "Is there a part of the codebase your team avoids touching because it has no tests?"
+After one answer, proceed. Do not ask more than one question.
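The `Ratio` line of the suite map is simple arithmetic over the three test counts. A possible sketch, with `SuiteCounts`, `suiteRatio`, and `formatRatio` as hypothetical names and rounding to whole percentages as an assumption:

```typescript
interface SuiteCounts {
  unit: number;
  integration: number;
  e2e: number;
}

// Converts raw test counts into whole-number percentages of the suite.
function suiteRatio(c: SuiteCounts): SuiteCounts {
  const total = c.unit + c.integration + c.e2e;
  if (total === 0) return { unit: 0, integration: 0, e2e: 0 };
  const pct = (n: number) => Math.round((n / total) * 100);
  return { unit: pct(c.unit), integration: pct(c.integration), e2e: pct(c.e2e) };
}

function formatRatio(c: SuiteCounts): string {
  const r = suiteRatio(c);
  return `Ratio: Unit ${r.unit}% : Integration ${r.integration}% : E2E ${r.e2e}%`;
}
```

A suite with 140 unit, 40 integration, and 20 E2E tests comes out at 70% : 20% : 10%. Independent rounding means the three figures may not always sum to exactly 100.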
+---
+## Analysis Process
+Work through these five steps in order.
+### Step 1: Scan for Test Obscurity
+*Scan this first — the most visible risk and the one that determines whether the suite
+is maintainable at all.*
+Look for:
+- Read 5–10 test names at random: can each one communicate subject + scenario + expected
+  outcome without opening the test body?
+- Are there tests where a failure gives no clue which behavior broke (multiple assertions,
+  no message strings)?
+- Does any test depend on external state (files, database rows, env variables, shared mutable
+  fixtures) that is invisible from within the test body?
+- Is there a single massive setUp or beforeEach that every test inherits regardless of
+  what it actually needs?
+If all test names are clear and setups are minimal → no finding.
+### Step 2a: Scan for Test Brittleness
+*Brittle tests break on refactors that do not change observable behavior — they test
+implementation, not contracts.*
+Look for:
+- Ask (or check git history): did any recent refactor cause test failures with no
+  behavior change?
+- Are there test methods where the name contains "and" or that assert on 3 or more
+  unrelated behaviors (Eager Test)?
+- Do assertions specify mock call order or exact parameter values that are irrelevant
+  to the observable behavior?
+- Are tests coupled to private methods or internal state directly?
+If brittleness is systemic (most tests in the file break on a rename) → 🔴 Critical.
+If isolated (1–2 brittle tests) → 🟢 Suggestion.
+### Step 2b: Scan for Mock Abuse
+*Mock Abuse produces tests that pass regardless of whether the real behavior is correct.
+Scan this separately from brittleness — over-mocking is often the cause of brittleness,
+but it is a distinct problem worth its own finding.*
+**Sample 3–5 tests once for both steps 2a and 2b together** — read each test body and
+check brittleness signals and mock-setup ratio in the same pass, then write separate
+findings if both problems are present.
+Look for:
+- Is mock setup code longer than the assertion logic in the sampled tests?
+- Are the primary assertions `expect(mock).toHaveBeenCalledWith(...)` rather than
+  assertions on outputs, state, or observable events?
+- Are there methods in production classes that are only called from test files
+  (test-induced design damage)?
+- Does any single test create more than 3 mock objects?
+If mock setup-to-assertion ratio exceeds 3:1 → 🟡 Warning.
+If production methods exist only for test access → 🔴 Critical (architecture is being
+distorted by the test suite).
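The two severity rules for Mock Abuse can be expressed as a small check over the sampled tests. This sketch assumes the ratio is measured in lines of code and aggregated across the sample; neither detail is fixed by the guide, and `SampledTest` and `mockAbuseSeverity` are hypothetical names.

```typescript
interface SampledTest {
  mockSetupLines: number;
  assertionLines: number;
}

function mockAbuseSeverity(
  sample: SampledTest[],
  hasTestOnlyProductionMethods: boolean,
): "Critical" | "Warning" | "None" {
  // Production methods that exist only for test access outrank any ratio.
  if (hasTestOnlyProductionMethods) return "Critical";
  const setup = sample.reduce((sum, t) => sum + t.mockSetupLines, 0);
  const assertions = sample.reduce((sum, t) => sum + t.assertionLines, 0);
  // "Exceeds 3:1" is read strictly: exactly 3:1 does not trigger a finding.
  return setup > 3 * assertions ? "Warning" : "None";
}
```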
+### Step 3: Scan for Test Duplication
+Look for:
+- Is the same setup block (same variables initialized the same way) repeated across
+  5 or more test files without a shared helper?
+- Are there multiple tests that pass identical inputs and assert identical outputs
+  with no differentiation (Lazy Test)?
+- Is the same business scenario covered at unit, integration, and E2E level with no
+  difference in what each layer is testing?
+If duplication is systemic (10 or more instances) → 🔴 Critical.
+If localized (3–5 instances) → 🟡 Warning.
+### Step 4: Scan for Coverage Illusion and Architecture Mismatch
+Look for Coverage Illusion:
+- Pick the most recently modified core module. Are its error-handling branches and
+  null/boundary inputs covered by tests?
+- Are there legacy areas (old functions, no test files nearby) that are actively
+  being changed?
+- Do the tests assert on side effects (DB writes, events emitted, state transitions)
+  or only on return values?
+**Characterization Test check:** If legacy code is being modified without existing tests,
+the team needs Characterization Tests before making the change — not after. Look for
+this pattern and flag it when absent.
+A Characterization Test locks in current behavior (right or wrong) so future changes
+do not silently regress it. Template:
+```
+test("characterize: [module].[method] given [input], returns [current output]", () => {
+  // Call the code under test with realistic inputs
+  // Assert on whatever it currently returns, even if you suspect the output is wrong
+  // Add a comment: "This captures current behavior, not necessarily correct behavior"
+});
+```
+Source: Feathers — Working Effectively with Legacy Code, Ch. 13: Characterization Tests
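A filled-in instance of the template might look like the following. Everything here is hypothetical: `legacyFormatPrice` stands in for real legacy code, and `characterize` is a tiny stand-in for a test runner so the sketch runs on its own.

```typescript
// Hypothetical legacy function about to be modified.
function legacyFormatPrice(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}

// Minimal stand-in for a test runner's `test` function (an assumption for this sketch).
function characterize(name: string, body: () => void): void {
  body();
}

characterize("characterize: pricing.legacyFormatPrice given -5, returns $-0.05", () => {
  const actual = legacyFormatPrice(-5);
  // This captures current behavior, not necessarily correct behavior:
  // the sign lands after the currency symbol.
  if (actual !== "$-0.05") {
    throw new Error(`behavior changed: got ${actual}`);
  }
});
```

The odd-looking output is pinned on purpose. If a later change moves the sign, the test fails and the team decides deliberately whether that change is wanted.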
+Look for Architecture Mismatch:
+- Compare the suite map from the start: is the ratio close to 70% unit / 20% integration / 10% E2E?
+- Are high-risk modules tested at higher density than trivial utilities?
+**Test suite performance:** A slow test suite is a first-class maintainability risk — it
+breaks the fast-feedback loop and causes developers to skip running tests locally.
+- If the full suite runtime is known and > 10 minutes → 🟡 Warning
+- If the full suite runtime is > 30 minutes or unknown → 🔴 Critical (unknown suite time
+  means nobody is running it regularly)
+- If tests that could be unit tests are integration tests, that is a Performance Mismatch:
+  each misclassified test adds seconds of avoidable wait time
+Source: Meszaros — xUnit Test Patterns, Slow Tests (p. 253)
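The runtime thresholds above map directly onto a small function. `suiteRuntimeSeverity` is a hypothetical name, and `undefined` is used here to represent an unknown runtime.

```typescript
function suiteRuntimeSeverity(
  minutes: number | undefined,
): "Critical" | "Warning" | "None" {
  // Unknown runtime is treated like a very slow suite.
  if (minutes === undefined || minutes > 30) return "Critical";
  return minutes > 10 ? "Warning" : "None";
}
```

A 12-minute suite yields a Warning, while a suite nobody has timed yields Critical, matching the guide's reasoning that an unknown runtime means the suite is not run regularly.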
+### Step 5: Apply Iron Law, Output Report
+Apply the Iron Law format from `../_shared/common.md` to each finding.
+Use the standard Report Template. Mode: Test Quality Review.
+Include the Test Suite Map as a code block immediately before the `## Findings` heading, labeled "Test Suite Map".