@lvlup-sw/axiom 0.2.0

# Error Patterns Reference

Taxonomy of error handling patterns, anti-patterns, and severity guidance for the `harden` skill. Use this reference when classifying catch blocks and evaluating error propagation.

## Silent Catch Taxonomy

Four categories of catch block behavior, ordered from most dangerous to least:

### 1. Empty Catch (Severity: HIGH)

```typescript
// Pattern: catch body is empty or contains only whitespace
try { riskyOperation(); } catch (e) {}
try { riskyOperation(); } catch { }
```

**Why it matters:** Errors are completely invisible. The operation appears to succeed when it failed. Downstream code operates on incorrect assumptions.

**Action:** Add error handling. At minimum, log with context and re-throw if the caller needs to know.
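
A minimal sketch of that fix, with the operation and logger injected so the pattern is visible in isolation (`guarded` is a hypothetical helper, not part of the skill):

```typescript
// Fix sketch for empty catch: make the failure visible, then propagate it.
function guarded(riskyOperation: () => void, log: (msg: string) => void): void {
  try {
    riskyOperation();
  } catch (e) {
    // At minimum: log with context so the failure is observable...
    log(`riskyOperation failed: ${String(e)}`);
    // ...and re-throw so the caller can decide how to recover.
    throw e;
  }
}
```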

### 2. Log-Only (Severity: MEDIUM)

```typescript
// Pattern: catch logs but takes no corrective action
try { riskyOperation(); } catch (e) { console.log(e); }
try { riskyOperation(); } catch (e) { logger.warn('failed', e); }
```

**Why it matters:** The error is visible in logs but the system continues as if nothing happened. If the operation was important, downstream code operates on stale or missing data.

**Action:** Evaluate whether the operation needs recovery. If yes, add recovery logic. If the operation is truly optional, document why in a comment.

### 3. Swallow-and-Default (Severity: MEDIUM to HIGH)

```typescript
// Pattern: catch replaces the result with a default value
try { config = loadConfig(); } catch { config = DEFAULT_CONFIG; }
try { data = fetchRemote(); } catch { data = cachedData; }
```

**Why it matters:** The system silently switches to degraded behavior. The operator has no visibility into the fallback. If the default is incorrect or stale, the system produces wrong results while appearing healthy.

**Severity escalation:** HIGH when the default can cause data loss or incorrect behavior. MEDIUM when the default is safe but operators should know.

**Action:** Log the fallback activation. Add metrics or health checks that surface degraded mode.
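
A sketch of that action applied to the `loadConfig` example above; the `Config` type, the flag, and the logging hook are illustrative stand-ins for application code:

```typescript
// Fix sketch for swallow-and-default: the fallback still happens, but its
// activation is logged and surfaced instead of being invisible.
type Config = { mode: string };
const DEFAULT_CONFIG: Config = { mode: "default" };

let usingDefaultConfig = false; // expose via a health check or metric

function getConfig(loadConfig: () => Config, log: (msg: string) => void): Config {
  try {
    return loadConfig();
  } catch (e) {
    usingDefaultConfig = true; // degraded mode is now observable
    log(`loadConfig failed, falling back to DEFAULT_CONFIG: ${String(e)}`);
    return DEFAULT_CONFIG;
  }
}
```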
46
+
47
+ ### 4. Catch-and-Rethrow-Generic (Severity: MEDIUM)
48
+
49
+ ```typescript
50
+ // Pattern: catch wraps the error but loses context
51
+ try { riskyOperation(); } catch (e) { throw new Error('Operation failed'); }
52
+ // vs. correct:
53
+ try { riskyOperation(); } catch (e) { throw new Error('Failed to load user config', { cause: e }); }
54
+ ```
55
+
56
+ **Why it matters:** The original error's stack trace, message, and context are lost. Debugging requires reproducing the issue rather than reading the error chain.
57
+
58
+ **Action:** Preserve the cause chain using `{ cause: e }`. Include what operation failed and why in the wrapper message.
59
+
60
+ ---
61
+
62
+ ## Error Context Checklist
63
+
64
+ Every error message should answer these four questions:
65
+
66
+ | Question | Example (Good) | Example (Bad) |
67
+ |----------|----------------|---------------|
68
+ | **What failed?** | "Failed to read workflow state for feature-123" | "Read failed" |
69
+ | **Why did it fail?** | "File not found at /tmp/state/feature-123.json" | "Error occurred" |
70
+ | **What to do about it?** | "Ensure the workflow was initialized with `init`" | (nothing) |
71
+ | **Cause chain?** | `new Error('...', { cause: originalError })` | `new Error(originalError.message)` |
72
+
73
+ ### Context Completeness Scoring
74
+
75
+ - **4/4 questions answered:** Excellent error — no finding
76
+ - **3/4 questions answered:** Acceptable — LOW finding if missing "what to do"
77
+ - **2/4 questions answered:** Incomplete — MEDIUM finding
78
+ - **1/4 or 0/4 questions answered:** Poor — HIGH finding (effectively opaque)

---

## Fallback Anti-Patterns

### Silent Degradation (Severity: HIGH)

The system switches to a less capable mode without any signal to the operator.

```typescript
// Anti-pattern: silent mode switch
function getStore() {
  if (!configuredStore) {
    return new InMemoryStore(); // Silently degrades to non-persistent store
  }
  return configuredStore;
}
```

**Fix:** Log when fallback activates. Add a health check endpoint or metric that surfaces degraded state.

### Invisible Mode Switches (Severity: HIGH)

```typescript
// Anti-pattern: behavior changes silently based on error
let mode = 'full';
try { await connectToService(); } catch { mode = 'limited'; }
// Rest of code behaves differently but nothing signals the switch
```

**Fix:** Make mode visible via logging, metrics, or return value. Callers should know they're operating in degraded mode.

### Best-Effort Without Signaling (Severity: MEDIUM)

```typescript
// Anti-pattern: best-effort with no visibility
async function syncData() {
  try { await pushToRemote(data); } catch { /* best effort */ }
}
```

**Fix:** Even best-effort operations should log failures. The operator needs to know sync is failing so they can investigate and fix the root cause.

---

## Promise Rejection Patterns

### Swallowed Rejections (Severity: HIGH)

```typescript
// Pattern: promise rejection silently consumed
promise.catch(() => {});
promise.catch(() => undefined);
someAsyncFn().catch(() => {});
```

**Why it matters:** Same as empty catch — the error is invisible. Worse in async contexts because the failure may surface much later as corrupted state.

### Unhandled Rejection Handlers (Severity: MEDIUM)

```typescript
// Pattern: global handler as a band-aid
process.on('unhandledRejection', (err) => {
  console.error('Unhandled rejection:', err);
});
```

**Why it matters:** Global handlers are a safety net, not a solution. Each rejection should be handled at the call site with appropriate recovery or propagation.

**Action:** Keep the global handler as a safety net, but fix each unhandled rejection at its source.

### Fire-and-Forget Without Error Handling (Severity: MEDIUM)

```typescript
// Pattern: async call started but never awaited and no catch
sendAnalytics(event); // Returns a promise, never awaited
cleanupTempFiles(); // Async, failure silently ignored
```

**Fix:** Either await and handle the error, or explicitly catch and log:

```typescript
sendAnalytics(event).catch(err => logger.warn('Analytics send failed', { err }));
```

**Exception:** Non-critical telemetry and observability side-effects (e.g., `emitGateEvent(...)`, `sendAnalytics(event)`) may be allowed to fail silently when all of the following hold: (1) the call is clearly annotated as fire-and-forget, (2) failure cannot affect primary execution correctness, and (3) the scope is limited to observability. Do not flag these as findings.

---

## Severity Summary

| Pattern | Default Severity | Escalation Condition |
|---------|------------------|----------------------|
| Empty catch | HIGH | Always HIGH |
| Log-only catch | MEDIUM | HIGH if operation affects data integrity |
| Swallow-and-default | MEDIUM | HIGH if default can cause data loss |
| Catch-and-rethrow-generic | MEDIUM | HIGH if error is user-facing or triggers retry |
| Silent degradation | HIGH | Always HIGH |
| Invisible mode switch | HIGH | Always HIGH |
| Best-effort without signaling | MEDIUM | HIGH if operation is data-critical |
| Swallowed promise rejection | HIGH | Always HIGH |
| Unhandled rejection handler as fix | MEDIUM | HIGH if in production critical path |
| Fire-and-forget | MEDIUM | HIGH if operation has side effects |

# Resilience Checklist

Structured checklist for evaluating operational resilience. Each section covers a resilience concern with pass/fail criteria. Use during the qualitative assessment phase of the `harden` skill.

---

## Resource Management

Verify that every acquired resource has a corresponding release, and that long-lived collections are bounded.

| # | Check | Pass Criteria | Fail Criteria |
|---|-------|---------------|---------------|
| R-1 | Bounded caches | Every `Map`, `Set`, or array used as a cache has a documented max size and eviction policy (LRU, TTL, or explicit `.clear()` on lifecycle events) | Collection grows without limit; no `.delete()`, `.clear()`, or size check nearby |
| R-2 | Connection pools | Database and HTTP connections use pooling with max connections configured; pools are drained on shutdown | Connections opened per-request without pooling, or pool has no max size |
| R-3 | File handle lifecycle | File handles opened in `try` are closed in `finally` (or use `using`/`await using` with disposable pattern); error paths close handles too | File opened in try, closed only on success path; error path leaks the handle |
| R-4 | Event listener cleanup | Listeners registered with `.on()` or `.addEventListener()` are removed with `.off()` / `.removeEventListener()` when the owner is disposed | Listeners accumulate without removal, causing memory leaks and duplicate processing |
| R-5 | Stream lifecycle | Streams are `.end()`ed or `.destroy()`ed on both success and error paths; pipeline errors trigger cleanup | Stream left open on error, causing resource leak or hanging process |
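
A minimal sketch of a cache that passes R-1, using `Map` insertion order for LRU eviction (`BoundedCache` is an illustrative name, not part of the package):

```typescript
// R-1 sketch: a cache with a hard size bound and LRU eviction. Map preserves
// insertion order, so the first key is always the least recently used.
class BoundedCache<K, V> {
  private map = new Map<K, V>();
  constructor(private readonly maxSize: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Refresh recency: re-insert so the key moves to the end.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      // Evict the least recently used entry (first insertion-order key).
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }

  get size(): number {
    return this.map.size;
  }
}
```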

---

## Timeout Patterns

Verify that every external call is bounded by a timeout.

| # | Check | Pass Criteria | Fail Criteria |
|---|-------|---------------|---------------|
| T-1 | HTTP timeout | Every `fetch()` or HTTP client call has a `signal` (AbortController) or `timeout` option configured | `fetch()` called without timeout; could hang indefinitely on network issues |
| T-2 | Database query timeout | Database queries have statement-level or connection-level timeouts configured | Queries can run unbounded; a slow query blocks the connection pool |
| T-3 | File system timeout | Long-running file operations (directory walks, large reads) have either a timeout or are chunked | Unbounded `readdir` on large directories or `readFile` on potentially huge files |
| T-4 | AbortController usage | AbortController signals are wired correctly — controller created, signal passed, abort called on timeout/cancellation | AbortController created but signal never passed to the operation, or abort never called |
| T-5 | IPC/subprocess timeout | Child processes and IPC calls have timeouts; hung processes are killed | `child_process.exec` without timeout; process could hang forever |
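
A sketch of correct T-1/T-4 wiring: one controller per call, signal passed in, abort fired on timeout, timer cleared either way (`withTimeout` is a hypothetical helper, not a package API):

```typescript
// T-1/T-4 sketch: bound any signal-aware operation with a timeout.
function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  // Pass the signal in (T-4) and clear the timer on success or failure.
  return run(controller.signal).finally(() => clearTimeout(timer));
}
```

For T-1 this wraps an HTTP call as `withTimeout((signal) => fetch(url, { signal }), 5000)`; the fetch rejects when the signal aborts.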

---

## Retry Patterns

Verify that retry logic is bounded and uses appropriate backoff.

| # | Check | Pass Criteria | Fail Criteria |
|---|-------|---------------|---------------|
| Y-1 | Maximum attempts | Every retry loop has a configured max attempt count (typically 3-5 for transient failures) | `while (true) { try ... catch { continue } }` with no attempt counter |
| Y-2 | Exponential backoff | Retry delays increase exponentially (e.g., 100ms, 200ms, 400ms, 800ms) rather than fixed intervals | Fixed delay between retries (e.g., always `sleep(1000)`) or no delay at all |
| Y-3 | Jitter | Retry delays include random jitter to prevent thundering herd on shared resources | All instances retry at exactly the same intervals, causing load spikes |
| Y-4 | Circuit breaker | For services with sustained failures, a circuit breaker stops retrying and fails fast after N consecutive failures | System retries indefinitely against a down service, wasting resources and delaying fallback |
| Y-5 | Idempotency awareness | Retried operations are safe to repeat (idempotent) or use deduplication keys | Non-idempotent operations (e.g., payment charges) retried without deduplication |
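
A sketch combining Y-1 through Y-3; the defaults (4 attempts, 100ms base delay) are illustrative, not prescribed values:

```typescript
// Retry sketch: bounded attempts (Y-1), exponential backoff (Y-2), jitter (Y-3).
async function retry<T>(
  op: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (e) {
      lastError = e;
      if (attempt === maxAttempts - 1) break; // Y-1: give up after max attempts
      // Y-2: delays double each attempt; Y-3: random jitter avoids lockstep retries.
      const delay = baseDelayMs * 2 ** attempt + Math.random() * baseDelayMs;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw lastError;
}
```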

---

## Concurrency Safety

Verify that concurrent access to shared state is safe.

| # | Check | Pass Criteria | Fail Criteria |
|---|-------|---------------|---------------|
| C-1 | Mutex / lock patterns | Shared mutable state accessed by concurrent operations is protected by a mutex, lock, or serialization queue | Two async operations can interleave reads and writes to the same state |
| C-2 | Compare-and-swap (CAS) | State updates that depend on current value use CAS or optimistic locking patterns | Read-then-write without checking that the value hasn't changed between read and write |
| C-3 | Single-instance guards | Operations that must run exactly once (initialization, migration) have guard mechanisms (flags, locks, or idempotency checks) | Initialization can run concurrently from two entry points, causing duplicate setup or race conditions |
| C-4 | Async iteration safety | Collections are not mutated during async iteration (`for await ... of`) | Array/Map modified while being iterated, causing skipped or duplicate items |
| C-5 | Promise.all error handling | `Promise.all` failures are handled (consider `Promise.allSettled` when partial success is acceptable) | One rejection in `Promise.all` causes all results to be lost; no partial success handling |
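
A minimal sketch of C-1's serialization-queue option: chaining every operation onto one promise so they cannot interleave (`SerialQueue` is a hypothetical name, not a library API):

```typescript
// C-1 sketch: a minimal async mutex built from a promise chain.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    // Each task starts only after the previous one settles.
    const result = this.tail.then(task, task);
    // Keep the chain alive even when a task rejects.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```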

---

## Graceful Degradation

Verify that the system handles partial failures without cascading collapse.

| # | Check | Pass Criteria | Fail Criteria |
|---|-------|---------------|---------------|
| G-1 | Partial failure tolerance | System continues operating when a non-critical subsystem fails; critical path is isolated from optional features | Single subsystem failure takes down the entire system |
| G-2 | Feature flags for degraded mode | Degraded behavior can be toggled via configuration without code deployment; operators can disable failing features | Degraded mode requires a code change and redeployment to activate |
| G-3 | Health check accuracy | Health endpoints reflect actual system state, including degraded subsystems | Health check returns "healthy" while a critical subsystem is down |
| G-4 | Bulkhead isolation | Independent request paths don't share failure domains; one slow endpoint doesn't block others | All requests share a single thread pool or connection pool; one slow path starves others |
| G-5 | Load shedding | System rejects excess load with clear error (429/503) rather than accepting and timing out | System accepts all requests regardless of capacity, causing timeouts and cascading failures |

---

## How to Use This Checklist

1. **Scope:** Apply to the files/modules specified in the harden skill invocation
2. **Evaluate:** For each check, determine Pass or Fail based on the criteria
3. **Report:** Failed checks become findings with the severity from the corresponding dimension (DIM-7 for resource/timeout/retry/concurrency, DIM-2 for degradation visibility)
4. **Prioritize:** HIGH findings (unbounded growth, missing timeouts on critical paths, no retry limits) before MEDIUM (suboptimal patterns that don't risk failure)

---
name: scan
description: "Run deterministic pattern checks against backend code. Use when you need mechanical detection of known anti-patterns, code smells, or structural issues. Triggers: 'scan code', 'check patterns', 'run checks', or /axiom:scan. Do NOT use for qualitative architecture review — use axiom:critique instead."
user-invokable: true
metadata:
  author: lvlup-sw
  version: 0.1.0
  category: assessment
dimensions:
  - pluggable
---

# Scan Skill

## Overview

Deterministic check engine that runs grep patterns and structural analysis against backend code. Accepts a `scope` argument (file path, directory path, or cwd) and a `dimensions` argument (comma-separated dimension list or "all") to control which checks execute.

This skill performs purely mechanical detection — it matches known patterns, counts structural violations, and reports findings with exact file locations. It does not make qualitative judgments about architecture or design. Other skills (like `axiom:critique`) handle subjective assessment and invoke `scan` when they need deterministic evidence.

See `@skills/scan/references/check-catalog.md` for scan-specific execution guidance including ordering, batching, and exclusions.

## Triggers

Activate this skill when:
- User says "scan code", "check patterns", "run checks", or "detect anti-patterns"
- User runs `/axiom:scan`
- Another axiom skill requests deterministic pattern detection
- User wants mechanical verification of known code smells

Do not activate this skill when:
- User wants qualitative architecture review — use `axiom:critique` instead
- User wants a full quality audit — use `axiom:audit` instead
- User needs subjective design feedback rather than pattern matching

## Process

### Step 1: Load Check Catalog

Load the canonical check definitions from `@skills/backend-quality/references/deterministic-checks.md`. This file defines every grep pattern, structural check, and their associated dimensions and severities.

### Step 2: Load Project-Specific Checks (Optional)

If `.axiom/checks.md` exists in the project root, load additional project-specific check definitions. These follow the same format as the canonical catalog and are merged into the check set. If the file does not exist, silently skip this step.

### Step 3: Filter by Requested Dimensions

Apply the `dimensions` argument to filter the merged check set:
- If `dimensions` is `"all"` or omitted, run every check in the catalog
- If `dimensions` is a comma-separated list (e.g., `"topology,observability"`), only run checks tagged with those dimensions
- Invalid dimension names produce an actionable error message listing valid dimensions
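
A sketch of that filtering and validation logic; the dimension names here are an illustrative subset, not the canonical axiom list:

```typescript
// Sketch of the dimensions-argument filter with an actionable error message.
const VALID_DIMENSIONS = ["topology", "observability", "contracts", "test-fidelity"];

function parseDimensions(arg: string | undefined): string[] {
  if (!arg || arg === "all") return [...VALID_DIMENSIONS];
  const requested = arg.split(",").map((d) => d.trim());
  const invalid = requested.filter((d) => !VALID_DIMENSIONS.includes(d));
  if (invalid.length > 0) {
    // Actionable: name the bad inputs and list what is valid.
    throw new Error(
      `Unknown dimension(s): ${invalid.join(", ")}. ` +
        `Valid dimensions: ${VALID_DIMENSIONS.join(", ")}.`,
    );
  }
  return requested;
}
```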

### Step 4: Execute Checks

For each check in the filtered set, run the grep or structural pattern against the resolved `scope`:
- Execute grep patterns using the exact expressions from the catalog
- Run structural analysis checks (file counts, nesting depth, import graphs)
- Collect all matches with file path, line number, and matched content
- Record checks that produced zero matches as passing

### Step 5: Format Findings

Format each match as a finding per `@skills/backend-quality/references/findings-format.md`. Every finding produced by this skill is marked `deterministic: true` to distinguish mechanical detections from qualitative assessments.

### Step 6: Group and Output

Output findings grouped by dimension, then by severity (HIGH, MEDIUM, LOW) within each dimension. Include:
- Total checks run and total findings
- Per-dimension summary with finding counts
- Individual findings with file location, matched pattern, and remediation hint

## Error Handling

- **Invalid patterns:** If a grep pattern from the catalog fails to compile or execute, report the specific pattern and error message. Do not silently skip — the user needs to know which check is broken so they can fix the catalog entry.
- **Empty scope:** If the resolved scope contains no files to scan (empty directory, nonexistent path), return a "nothing to scan" message with the resolved path. Do not treat this as a failure.
- **Missing `.axiom/checks.md`:** Silently skip project-specific check loading. This file is optional and its absence is expected for projects that have not customized their check set.
- **Permission errors:** If a file cannot be read due to permissions, log the file path and continue scanning remaining files.

## Output Format

All findings use the standard format from `@skills/backend-quality/references/findings-format.md` with one addition: every finding includes `deterministic: true` to signal that it was produced by mechanical pattern matching, not qualitative judgment.

```yaml
dimension: observability
severity: MEDIUM
deterministic: true
file: src/handlers/query.ts
line: 42
pattern: readEvents
match: "const items = await readEvents(stream)"
remediation: "Move raw event reads out of query handlers — use a read model projection instead"
```

## Anti-Patterns

| Don't | Do Instead |
|-------|------------|
| Make qualitative judgments about matches | Report the match and let critique/review skills interpret |
| Skip checks that produce many matches | Report all matches — downstream skills handle prioritization |
| Suppress false positives silently | Report them and let the user tune exclusions in `.axiom/checks.md` |
| Run checks outside the requested scope | Respect the scope boundary strictly |
| Invent patterns not in the catalog | Only run checks defined in the canonical or project-specific catalogs |

# Check Catalog — Scan Execution Guidance

Scan-specific guidance for executing the deterministic checks defined in the canonical catalog at `@skills/backend-quality/references/deterministic-checks.md`. This document covers execution order, performance strategies, and result interpretation.

## Execution Order

Run checks from cheapest to most expensive. This lets you fail fast on simple violations before investing time in structural analysis.

1. **Simple grep patterns** (fast, single-pass text matching) — string literals, regex against individual files
2. **Multi-file grep patterns** (fast, but touches more files) — cross-file import checks, naming conventions
3. **Structural analysis** (slower, requires parsing or counting) — nesting depth, cyclomatic complexity proxies, file size checks
4. **Cross-reference checks** (slowest, requires building dependency graphs) — unused exports, circular imports, missing test coverage mapping

Within each tier, run checks in dimension order (DIM-1 through DIM-7) for predictable output grouping.

## Timeout Guidance

- **Per-check timeout:** 30 seconds for any individual grep or structural check. If a single pattern takes longer, report it as a timeout finding rather than blocking the entire scan.
- **Total scan timeout:** 5 minutes for a full "all dimensions" scan on a typical repository. For monorepos or very large codebases, recommend running dimension-by-dimension.
- **Early termination:** If a scope contains more than 10,000 files after exclusions, warn the user and suggest narrowing the scope before proceeding.

## Batch Strategies

Running patterns one at a time is wasteful when scanning large codebases. Use these strategies to reduce I/O overhead:

- **Combined grep:** Where multiple patterns target the same file set and dimension, combine them into a single grep invocation using alternation (`pattern1\|pattern2\|pattern3`). Parse the output to attribute matches back to individual checks.
- **File-type grouping:** Group checks by their `--include` glob (e.g., all `*.ts` checks together, all `*.py` checks together) to avoid re-traversing the directory tree.
- **Scope pre-filtering:** Resolve the file list once via `find` or glob expansion, then run all grep patterns against the pre-filtered list rather than letting each grep re-walk the tree.
- **Parallel execution:** When the check set is large, run independent dimension groups in parallel. DIM-1 checks have no dependency on DIM-3 checks, so they can execute concurrently.
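
The combined-grep attribution step can be sketched in TypeScript terms: one alternation regex per line, with a capture group per check mapping each match back to its check id. The ids and patterns are illustrative, and any groups inside a check's pattern must be non-capturing (`(?:...)`) so group index N maps to check N:

```typescript
// Sketch of combined-pattern scanning with match attribution.
const checks = [
  { id: "OBS-1", pattern: /console\.log\(/ },
  { id: "ERR-1", pattern: /catch\s*(?:\(\w*\))?\s*\{\s*\}/ },
];

function scanLines(lines: string[]): { id: string; line: number }[] {
  // One alternation regex, one capture group per check.
  const combined = new RegExp(checks.map((c) => `(${c.pattern.source})`).join("|"));
  const findings: { id: string; line: number }[] = [];
  lines.forEach((text, i) => {
    const m = combined.exec(text);
    if (!m) return;
    // The first defined capture group identifies which check matched.
    const index = m.slice(1).findIndex((g) => g !== undefined);
    findings.push({ id: checks[index].id, line: i + 1 });
  });
  return findings;
}
```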

## Default Exclusions

Always exclude these paths and file types from scanning unless the user explicitly overrides:

- `node_modules/` — third-party dependencies, not project code
- `dist/` — build output, generated from source
- `.git/` — version control internals
- `build/` — alternative build output directory
- `coverage/` — test coverage reports
- `*.min.js`, `*.min.css` — minified assets
- Binary files (images, fonts, compiled artifacts) — not scannable as text
- Generated files (`*.generated.ts`, `*.g.ts`, `*.pb.ts`) — produced by code generators, not authored

These exclusions prevent false positives from non-authored code and keep scan times reasonable.

## Interpreting Results

### True Positives

A match against a catalog pattern in authored code is a true positive. Report it with the full context: file path, line number, matched text, and the remediation hint from the catalog entry.

### False Positives

Some patterns intentionally cast a wide net. Common false positive scenarios:

- **Test files:** Patterns that detect anti-patterns in production code may match intentional test doubles or test assertions. The catalog should tag checks with `exclude-tests: true` where appropriate.
- **Comments and documentation:** A grep pattern may match a comment explaining why a pattern is avoided. Context lines (-B/-A) help the user distinguish these.
- **Legacy code with suppression markers:** If a file contains `// axiom-ignore-next-line` or `// axiom-ignore: <check-id>`, skip that specific match. This is the project-level false positive suppression mechanism.

When in doubt, report the match. It is better to surface a false positive that the user can dismiss than to silently hide a true violation. Users can add exclusions to `.axiom/checks.md` to suppress recurring false positives.

### Zero Matches

A check that produces zero matches is a passing check. Record it in the summary as passed — this gives the user confidence that the dimension was actually evaluated, not just skipped.

## Cross-Reference

The canonical check definitions (patterns, dimensions, severities, remediation hints) live in `@skills/backend-quality/references/deterministic-checks.md`. This file is the single source of truth for what to check. The scan skill does not define its own patterns — it only defines how to execute them efficiently.

---
name: verify
description: "Validate test quality by finding test-production divergence, mock overuse, and schema drift. Use when evaluating test suite health or after discovering a bug that tests missed. Triggers: 'check tests', 'test quality', 'verify contracts', or /axiom:verify. Do NOT use for architecture review — use axiom:critique instead."
user-invokable: true
metadata:
  author: lvlup-sw
  version: 0.1.0
  category: assessment
dimensions:
  - test-fidelity
  - contracts
---

# Verify — Test Validation

## Overview

Test validation skill covering DIM-4 (Test Fidelity) and DIM-3 (Contracts) from the backend quality dimension taxonomy. Verify finds the gap between what your tests claim to prove and what they actually exercise — the space where bugs hide behind passing suites.

This skill focuses on two complementary concerns:

- **Test Fidelity (DIM-4):** Do tests exercise actual production behavior, or do they test a parallel universe of mocks and test-only wiring?
- **Contracts (DIM-3):** Do schemas, types, and API boundaries stay in sync between declaration and usage?

## Triggers

Activate this skill when:
- Evaluating test suite health after a milestone
- A bug was found that existing tests should have caught
- Reviewing test quality during code review
- Investigating why tests pass but production breaks
- Checking schema/contract integrity after API changes

Do NOT activate when:
- Reviewing architecture, coupling, or SOLID compliance — use `axiom:critique`
- Investigating error handling or observability — use `axiom:harden`
- Looking for dead code or vestigial patterns — use `axiom:distill`

## Process

### Step 1: Load Dimension Definitions

Load the relevant dimension definitions from `@skills/backend-quality/references/dimensions.md` — specifically the DIM-4 (Test Fidelity) and DIM-3 (Contracts) sections. These define the invariants, detectable signals, and severity guides for each dimension.

### Step 2: Run Deterministic Checks

Run `axiom:scan` targeting the Test Fidelity and Contracts dimensions. This surfaces mechanical findings — grep-detectable patterns like:
- `describe.skip` / `it.skip` without issue references
- More than 3 `vi.mock()` or `jest.mock()` calls in a single test file
- `as Type` assertions without preceding type guards
- Schema fields referenced in code but absent from Zod/JSON schema definitions

### Step 3: Layer Qualitative Assessment

On top of deterministic findings, apply human-judgment assessment for patterns that require understanding intent:

- **Test-production divergence:** Compare test setup and factory functions against production initialization code. Are tests creating instances the same way production does? Different instances of shared resources, different initialization order, different configuration, and different wiring are all divergence signals.

- **Mock fidelity:** Are mocks placed at true infrastructure boundaries only (HTTP, database, filesystem)? More than 3 mocks in a single test is a smell — it usually means the test is operating at the wrong layer. Check whether mocks verify behavior (what happened) or implementation (how it happened).

- **Missing integration tests:** Identify cross-cutting concerns tested only with unit tests. Shared state, event propagation, multi-module workflows, and initialization sequences need integration-level coverage.

- **Schema/contract drift:** Look for types removed but still read at runtime, breaking API changes without versioning, and Zod schemas that have diverged from their TypeScript type counterparts. See `@skills/verify/references/contract-testing.md` for detailed detection approaches.

- **Test coverage gap analysis:** Are tests exercising only the happy path? Look for missing error paths, boundary cases, empty inputs, and concurrent scenarios.

For detailed patterns and taxonomy, see `@skills/verify/references/test-antipatterns.md`.

### Step 4: Output Findings

Format all findings per `@skills/backend-quality/references/findings-format.md`. Each finding must include:
- Dimension (DIM-3 or DIM-4)
- Severity (HIGH, MEDIUM, LOW)
- Evidence (file:line references)
- Explanation and optional suggestion

## The "Passing Tests, Broken System" Problem

High test counts and high coverage percentages can create false confidence when tests do not exercise production paths. A suite of thousands of tests proves nothing if every test creates its own isolated world that diverges from how the system actually runs.

**Canonical example — the EventStore divergence bug:** 4192 tests passed while the system silently lost events. The root cause: tests created and consumed events through the same EventStore instance, but production wired two separate instances that were never connected. Every test exercised a path that did not exist in production. The tests were not wrong in isolation — they were wrong in aggregate, testing a topology that production never used.

This is the most dangerous class of test failure: the test suite becomes a confidence generator rather than a defect detector. Verify exists to find these gaps before they become production incidents.

**Warning signs:**
- Test setup differs from production startup sequence
- All tests use in-memory implementations of dependencies that production resolves differently
- No test exercises the actual wiring/initialization path
- Tests mock the very thing they should be testing
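
The EventStore divergence above reduces to a short sketch. The class and function names are illustrative, but the shape is the warning sign: test wiring shares one instance while production wires two that are never connected:

```typescript
// Sketch of test-production wiring divergence.
class EventStore {
  events: string[] = [];
  append(event: string): void {
    this.events.push(event);
  }
}

// Production wiring: writer and reader hold *different* instances.
function productionWiring() {
  return { writer: new EventStore(), reader: new EventStore() };
}

// Test wiring: one shared instance, so appended events always "arrive".
function testWiring() {
  const store = new EventStore();
  return { writer: store, reader: store };
}
```

A test built on `testWiring()` passes while the `productionWiring()` topology silently loses every event, which is exactly the divergence signal Step 3 looks for.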

## Error Handling

- **Empty scope:** If no files match the provided scope (or no scope is provided), output an informative message: "No files in scope for verify analysis. Provide a file path, directory, or glob pattern." Do not produce empty findings.
- **No test files found:** If the scope contains source code but no test files, report this as a DIM-4 finding (severity depends on context).
- **Parse failures:** If a file cannot be parsed for schema analysis, log and skip with a note in the output.

## References

- Dimension definitions: `@skills/backend-quality/references/dimensions.md`
- Finding output format: `@skills/backend-quality/references/findings-format.md`
- Test antipattern catalog: `@skills/verify/references/test-antipatterns.md`
- Contract testing guide: `@skills/verify/references/contract-testing.md`