@lvlup-sw/axiom 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +16 -0
- package/CLAUDE.md +26 -0
- package/LICENSE +21 -0
- package/package.json +37 -0
- package/skills/audit/SKILL.md +126 -0
- package/skills/audit/references/composition-guide.md +105 -0
- package/skills/backend-quality/SKILL.md +40 -0
- package/skills/backend-quality/references/deterministic-checks.md +151 -0
- package/skills/backend-quality/references/dimensions.md +206 -0
- package/skills/backend-quality/references/findings-format.md +61 -0
- package/skills/backend-quality/references/scoring-model.md +86 -0
- package/skills/critique/SKILL.md +132 -0
- package/skills/critique/references/dependency-patterns.md +319 -0
- package/skills/critique/references/solid-principles.md +359 -0
- package/skills/distill/SKILL.md +83 -0
- package/skills/distill/references/dead-code-patterns.md +152 -0
- package/skills/distill/references/simplification-guide.md +128 -0
- package/skills/harden/SKILL.md +161 -0
- package/skills/harden/references/error-patterns.md +180 -0
- package/skills/harden/references/resilience-checklist.md +82 -0
- package/skills/scan/SKILL.md +102 -0
- package/skills/scan/references/check-catalog.md +68 -0
- package/skills/verify/SKILL.md +102 -0
- package/skills/verify/references/contract-testing.md +185 -0
- package/skills/verify/references/test-antipatterns.md +161 -0
@@ -0,0 +1,180 @@ package/skills/harden/references/error-patterns.md

# Error Patterns Reference

Taxonomy of error handling patterns, anti-patterns, and severity guidance for the `harden` skill. Use this reference when classifying catch blocks and evaluating error propagation.

## Silent Catch Taxonomy

Four categories of catch block behavior, ordered from most dangerous to least:

### 1. Empty Catch (Severity: HIGH)

```typescript
// Pattern: catch body is empty or contains only whitespace
try { riskyOperation(); } catch (e) {}
try { riskyOperation(); } catch { }
```

**Why it matters:** Errors are completely invisible. The operation appears to succeed when it failed. Downstream code operates on incorrect assumptions.

**Action:** Must add error handling. At minimum, log with context and re-throw if the caller needs to know.
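
A minimal sketch of that remediation, assuming a structured `logger` is available (the logger shape and `riskyOperation` are illustrative stand-ins for whatever the surrounding code provides):

```typescript
// Hypothetical fix for an empty catch: log with context, then re-throw so the
// failure stays visible to callers.
function runRisky(
  logger: { error(msg: string, meta: object): void },
  riskyOperation: () => void,
): void {
  try {
    riskyOperation();
  } catch (e) {
    logger.error('riskyOperation failed', { cause: e });
    throw e; // the caller still needs to know
  }
}
```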

### 2. Log-Only (Severity: MEDIUM)

```typescript
// Pattern: catch logs but takes no corrective action
try { riskyOperation(); } catch (e) { console.log(e); }
try { riskyOperation(); } catch (e) { logger.warn('failed', e); }
```

**Why it matters:** The error is visible in logs but the system continues as if nothing happened. If the operation was important, downstream code operates on stale or missing data.

**Action:** Evaluate whether the operation needs recovery. If yes, add recovery logic. If the operation is truly optional, document why in a comment.

### 3. Swallow-and-Default (Severity: MEDIUM to HIGH)

```typescript
// Pattern: catch replaces the result with a default value
try { config = loadConfig(); } catch { config = DEFAULT_CONFIG; }
try { data = fetchRemote(); } catch { data = cachedData; }
```

**Why it matters:** The system silently switches to degraded behavior. The operator has no visibility into the fallback. If the default is incorrect or stale, the system produces wrong results while appearing healthy.

**Severity escalation:** HIGH when the default can cause data loss or incorrect behavior. MEDIUM when the default is safe but operators should know.

**Action:** Log the fallback activation. Add metrics or health checks that surface degraded mode.

### 4. Catch-and-Rethrow-Generic (Severity: MEDIUM)

```typescript
// Pattern: catch wraps the error but loses context
try { riskyOperation(); } catch (e) { throw new Error('Operation failed'); }
// vs. correct:
try { riskyOperation(); } catch (e) { throw new Error('Failed to load user config', { cause: e }); }
```

**Why it matters:** The original error's stack trace, message, and context are lost. Debugging requires reproducing the issue rather than reading the error chain.

**Action:** Preserve the cause chain using `{ cause: e }`. Include what operation failed and why in the wrapper message.

---

## Error Context Checklist

Every error message should answer these four questions:

| Question | Example (Good) | Example (Bad) |
|----------|----------------|---------------|
| **What failed?** | "Failed to read workflow state for feature-123" | "Read failed" |
| **Why did it fail?** | "File not found at /tmp/state/feature-123.json" | "Error occurred" |
| **What to do about it?** | "Ensure the workflow was initialized with `init`" | (nothing) |
| **Cause chain?** | `new Error('...', { cause: originalError })` | `new Error(originalError.message)` |

### Context Completeness Scoring

- **4/4 questions answered:** Excellent error — no finding
- **3/4 questions answered:** Acceptable — LOW finding if missing "what to do"
- **2/4 questions answered:** Incomplete — MEDIUM finding
- **1/4 or 0/4 questions answered:** Poor — HIGH finding (effectively opaque)

---

## Fallback Anti-Patterns

### Silent Degradation (Severity: HIGH)

The system switches to a less capable mode without any signal to the operator.

```typescript
// Anti-pattern: silent mode switch
function getStore() {
  if (!configuredStore) {
    return new InMemoryStore(); // Silently degrades to non-persistent store
  }
  return configuredStore;
}
```

**Fix:** Log when fallback activates. Add a health check endpoint or metric that surfaces degraded state.
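
A sketch of that fix, with the logger, metric sink, and store types as illustrative stand-ins:

```typescript
interface Store { persistent: boolean }

class InMemoryStore implements Store { persistent = false; }

const logger = { warn: (msg: string) => console.warn(msg) };
const metrics = { increment: (_name: string) => { /* emit a counter here */ } };

let configuredStore: Store | undefined;

// Same fallback as before, but the switch is now visible to operators.
function getStore(): Store {
  if (!configuredStore) {
    logger.warn('no store configured, falling back to non-persistent InMemoryStore');
    metrics.increment('store.fallback_activated'); // surfaces degraded mode in dashboards
    return new InMemoryStore();
  }
  return configuredStore;
}
```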

### Invisible Mode Switches (Severity: HIGH)

```typescript
// Anti-pattern: behavior changes silently based on error
let mode = 'full';
try { await connectToService(); } catch { mode = 'limited'; }
// Rest of code behaves differently but nothing signals the switch
```

**Fix:** Make mode visible via logging, metrics, or return value. Callers should know they're operating in degraded mode.

### Best-Effort Without Signaling (Severity: MEDIUM)

```typescript
// Anti-pattern: best-effort with no visibility
async function syncData() {
  try { await pushToRemote(data); } catch { /* best effort */ }
}
```

**Fix:** Even best-effort operations should log failures. The operator needs to know sync is failing so they can investigate and fix the root cause.

---

## Promise Rejection Patterns

### Swallowed Rejections (Severity: HIGH)

```typescript
// Pattern: promise rejection silently consumed
promise.catch(() => {});
promise.catch(() => undefined);
someAsyncFn().catch(() => {});
```

**Why it matters:** Same as empty catch — the error is invisible. Worse in async contexts because the failure may surface much later as corrupted state.

### Unhandled Rejection Handlers (Severity: MEDIUM)

```typescript
// Pattern: global handler as a band-aid
process.on('unhandledRejection', (err) => {
  console.error('Unhandled rejection:', err);
});
```

**Why it matters:** Global handlers are a safety net, not a solution. Each rejection should be handled at the call site with appropriate recovery or propagation.

**Action:** Keep the global handler as a safety net, but fix each unhandled rejection at its source.

### Fire-and-Forget Without Error Handling (Severity: MEDIUM)

```typescript
// Pattern: async call started but never awaited and no catch
sendAnalytics(event); // Returns a promise, never awaited
cleanupTempFiles(); // Async, failure silently ignored
```

**Fix:** Either await and handle the error, or explicitly catch and log:

```typescript
sendAnalytics(event).catch(err => logger.warn('Analytics send failed', { err }));
```

**Exception:** Non-critical telemetry and observability side-effects (e.g., `emitGateEvent(...)`, `sendAnalytics(event)`) may be allowed to fail silently when all of: (1) the call is clearly annotated as fire-and-forget, (2) failure cannot affect primary execution correctness, and (3) the scope is limited to observability. Do not flag these as findings.

---

## Severity Summary

| Pattern | Default Severity | Escalation Condition |
|---------|-----------------|---------------------|
| Empty catch | HIGH | Always HIGH |
| Log-only catch | MEDIUM | HIGH if operation affects data integrity |
| Swallow-and-default | MEDIUM | HIGH if default can cause data loss |
| Catch-and-rethrow-generic | MEDIUM | HIGH if error is user-facing or triggers retry |
| Silent degradation | HIGH | Always HIGH |
| Invisible mode switch | HIGH | Always HIGH |
| Best-effort without signaling | MEDIUM | HIGH if operation is data-critical |
| Swallowed promise rejection | HIGH | Always HIGH |
| Unhandled rejection handler as fix | MEDIUM | HIGH if in production critical path |
| Fire-and-forget | MEDIUM | HIGH if operation has side effects |
@@ -0,0 +1,82 @@ package/skills/harden/references/resilience-checklist.md

# Resilience Checklist

Structured checklist for evaluating operational resilience. Each section covers a resilience concern with pass/fail criteria. Use during the qualitative assessment phase of the `harden` skill.

---

## Resource Management

Verify that every acquired resource has a corresponding release, and that long-lived collections are bounded.

| # | Check | Pass Criteria | Fail Criteria |
|---|-------|---------------|---------------|
| R-1 | Bounded caches | Every `Map`, `Set`, or array used as a cache has a documented max size and eviction policy (LRU, TTL, or explicit `.clear()` on lifecycle events) | Collection grows without limit; no `.delete()`, `.clear()`, or size check nearby |
| R-2 | Connection pools | Database and HTTP connections use pooling with max connections configured; pools are drained on shutdown | Connections opened per-request without pooling, or pool has no max size |
| R-3 | File handle lifecycle | File handles opened in `try` are closed in `finally` (or use `using`/`await using` with disposable pattern); error paths close handles too | File opened in try, closed only on success path; error path leaks the handle |
| R-4 | Event listener cleanup | Listeners registered with `.on()` or `.addEventListener()` are removed with `.off()` / `.removeEventListener()` when the owner is disposed | Listeners accumulate without removal, causing memory leaks and duplicate processing |
| R-5 | Stream lifecycle | Streams are `.end()`ed or `.destroy()`ed on both success and error paths; pipeline errors trigger cleanup | Stream left open on error, causing resource leak or hanging process |
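
R-1's pass criteria can be sketched as a small Map wrapper with a max size and least-recently-used eviction (the size cap and policy choice are illustrative):

```typescript
// Minimal LRU-bounded cache: Map preserves insertion order, so the first key
// is always the least recently used entry.
class BoundedCache<K, V> {
  private map = new Map<K, V>();
  constructor(private maxSize: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Re-insert to mark this entry as most recently used.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      const oldest = this.map.keys().next().value as K; // LRU entry
      this.map.delete(oldest);
    }
  }

  get size(): number { return this.map.size; }
}
```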

---

## Timeout Patterns

Verify that every external call is bounded by a timeout.

| # | Check | Pass Criteria | Fail Criteria |
|---|-------|---------------|---------------|
| T-1 | HTTP timeout | Every `fetch()` or HTTP client call has a `signal` (AbortController) or `timeout` option configured | `fetch()` called without timeout; could hang indefinitely on network issues |
| T-2 | Database query timeout | Database queries have statement-level or connection-level timeouts configured | Queries can run unbounded; a slow query blocks the connection pool |
| T-3 | File system timeout | Long-running file operations (directory walks, large reads) have either a timeout or are chunked | Unbounded `readdir` on large directories or `readFile` on potentially huge files |
| T-4 | AbortController usage | AbortController signals are wired correctly — controller created, signal passed, abort called on timeout/cancellation | AbortController created but signal never passed to the operation, or abort never called |
| T-5 | IPC/subprocess timeout | Child processes and IPC calls have timeouts; hung processes are killed | `child_process.exec` without timeout; process could hang forever |
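
T-1 and T-4 together might look like this sketch (the URL and timeout value are illustrative):

```typescript
// An HTTP call bounded by an AbortController-based timeout.
async function fetchWithTimeout(url: string, timeoutMs: number): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Passing the signal is the step T-4 warns about forgetting.
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer); // release the timer on both success and error paths
  }
}
```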

---

## Retry Patterns

Verify that retry logic is bounded and uses appropriate backoff.

| # | Check | Pass Criteria | Fail Criteria |
|---|-------|---------------|---------------|
| Y-1 | Maximum attempts | Every retry loop has a configured max attempt count (typically 3-5 for transient failures) | `while (true) { try ... catch { continue } }` with no attempt counter |
| Y-2 | Exponential backoff | Retry delays increase exponentially (e.g., 100ms, 200ms, 400ms, 800ms) rather than fixed intervals | Fixed delay between retries (e.g., always `sleep(1000)`) or no delay at all |
| Y-3 | Jitter | Retry delays include random jitter to prevent thundering herd on shared resources | All instances retry at exactly the same intervals, causing load spikes |
| Y-4 | Circuit breaker | For services with sustained failures, a circuit breaker stops retrying and fails fast after N consecutive failures | System retries indefinitely against a down service, wasting resources and delaying fallback |
| Y-5 | Idempotency awareness | Retried operations are safe to repeat (idempotent) or use deduplication keys | Non-idempotent operations (e.g., payment charges) retried without deduplication |
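
Y-1 through Y-3 combine into a sketch like this (the attempt cap and base delay are illustrative defaults):

```typescript
// Bounded retry with exponential backoff and random jitter.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,    // Y-1: hard cap on attempts
  baseDelayMs = 100,  // Y-2: delays grow 100ms, 200ms, 400ms, ...
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (e) {
      lastError = e;
      if (attempt === maxAttempts - 1) break;
      const backoff = baseDelayMs * 2 ** attempt;
      const jitter = Math.random() * backoff; // Y-3: spread instances apart
      await new Promise((resolve) => setTimeout(resolve, backoff + jitter));
    }
  }
  throw lastError;
}
```

Note that this sketch does not address Y-4 or Y-5; a circuit breaker and idempotency keys sit outside the retry loop itself.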

---

## Concurrency Safety

Verify that concurrent access to shared state is safe.

| # | Check | Pass Criteria | Fail Criteria |
|---|-------|---------------|---------------|
| C-1 | Mutex / lock patterns | Shared mutable state accessed by concurrent operations is protected by a mutex, lock, or serialization queue | Two async operations can interleave reads and writes to the same state |
| C-2 | Compare-and-swap (CAS) | State updates that depend on current value use CAS or optimistic locking patterns | Read-then-write without checking that the value hasn't changed between read and write |
| C-3 | Single-instance guards | Operations that must run exactly once (initialization, migration) have guard mechanisms (flags, locks, or idempotency checks) | Initialization can run concurrently from two entry points, causing duplicate setup or race conditions |
| C-4 | Async iteration safety | Collections are not mutated during async iteration (`for await ... of`) | Array/Map modified while being iterated, causing skipped or duplicate items |
| C-5 | Promise.all error handling | `Promise.all` failures are handled (consider `Promise.allSettled` when partial success is acceptable) | One rejection in `Promise.all` causes all results to be lost; no partial success handling |
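
C-5's `Promise.allSettled` alternative can be sketched as follows (the task-list shape is illustrative):

```typescript
// One rejection no longer discards the successful results; both partitions
// are surfaced to the caller.
async function runAll<T>(
  tasks: Array<() => Promise<T>>,
): Promise<{ ok: T[]; failed: unknown[] }> {
  const results = await Promise.allSettled(tasks.map((task) => task()));
  const ok: T[] = [];
  const failed: unknown[] = [];
  for (const r of results) {
    if (r.status === 'fulfilled') ok.push(r.value);
    else failed.push(r.reason); // report partial failures instead of losing everything
  }
  return { ok, failed };
}
```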

---

## Graceful Degradation

Verify that the system handles partial failures without cascading collapse.

| # | Check | Pass Criteria | Fail Criteria |
|---|-------|---------------|---------------|
| G-1 | Partial failure tolerance | System continues operating when a non-critical subsystem fails; critical path is isolated from optional features | Single subsystem failure takes down the entire system |
| G-2 | Feature flags for degraded mode | Degraded behavior can be toggled via configuration without code deployment; operators can disable failing features | Degraded mode requires a code change and redeployment to activate |
| G-3 | Health check accuracy | Health endpoints reflect actual system state, including degraded subsystems | Health check returns "healthy" while a critical subsystem is down |
| G-4 | Bulkhead isolation | Independent request paths don't share failure domains; one slow endpoint doesn't block others | All requests share a single thread pool or connection pool; one slow path starves others |
| G-5 | Load shedding | System rejects excess load with clear error (429/503) rather than accepting and timing out | System accepts all requests regardless of capacity, causing timeouts and cascading failures |

---

## How to Use This Checklist

1. **Scope:** Apply to the files/modules specified in the harden skill invocation
2. **Evaluate:** For each check, determine Pass or Fail based on the criteria
3. **Report:** Failed checks become findings with the severity from the corresponding dimension (DIM-7 for resource/timeout/retry/concurrency, DIM-2 for degradation visibility)
4. **Prioritize:** HIGH findings (unbounded growth, missing timeouts on critical paths, no retry limits) before MEDIUM (suboptimal patterns that don't risk failure)
@@ -0,0 +1,102 @@ package/skills/scan/SKILL.md

---
name: scan
description: "Run deterministic pattern checks against backend code. Use when you need mechanical detection of known anti-patterns, code smells, or structural issues. Triggers: 'scan code', 'check patterns', 'run checks', or /axiom:scan. Do NOT use for qualitative architecture review — use axiom:critique instead."
user-invokable: true
metadata:
  author: lvlup-sw
  version: 0.1.0
  category: assessment
  dimensions:
    - pluggable
---

# Scan Skill

## Overview

Deterministic check engine that runs grep patterns and structural analysis against backend code. Accepts a `scope` argument (file path, directory path, or cwd) and a `dimensions` argument (comma-separated dimension list or "all") to control which checks execute.

This skill performs purely mechanical detection — it matches known patterns, counts structural violations, and reports findings with exact file locations. It does not make qualitative judgments about architecture or design. Other skills (like `axiom:critique`) handle subjective assessment and invoke `scan` when they need deterministic evidence.

See `@skills/scan/references/check-catalog.md` for scan-specific execution guidance including ordering, batching, and exclusions.

## Triggers

Activate this skill when:
- User says "scan code", "check patterns", "run checks", or "detect anti-patterns"
- User runs `/axiom:scan`
- Another axiom skill requests deterministic pattern detection
- User wants mechanical verification of known code smells

Do not activate this skill when:
- User wants qualitative architecture review — use `axiom:critique` instead
- User wants a full quality audit — use `axiom:audit` instead
- User needs subjective design feedback rather than pattern matching

## Process

### Step 1: Load Check Catalog

Load the canonical check definitions from `@skills/backend-quality/references/deterministic-checks.md`. This file defines every grep pattern, structural check, and their associated dimensions and severities.

### Step 2: Load Project-Specific Checks (Optional)

If `.axiom/checks.md` exists in the project root, load additional project-specific check definitions. These follow the same format as the canonical catalog and are merged into the check set. If the file does not exist, silently skip this step.

### Step 3: Filter by Requested Dimensions

Apply the `dimensions` argument to filter the merged check set:
- If `dimensions` is `"all"` or omitted, run every check in the catalog
- If `dimensions` is a comma-separated list (e.g., `"topology,observability"`), only run checks tagged with those dimensions
- Invalid dimension names produce an actionable error message listing valid dimensions
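
The filtering rules above can be sketched like this (the check shape and dimension names are illustrative; the real catalog and dimension list live in the referenced files):

```typescript
interface Check { id: string; dimensions: string[] }

function filterChecks(
  checks: Check[],
  dimensionsArg: string | undefined,
  validDimensions: string[],
): Check[] {
  if (!dimensionsArg || dimensionsArg === 'all') return checks;
  const requested = dimensionsArg.split(',').map((d) => d.trim());
  const invalid = requested.filter((d) => !validDimensions.includes(d));
  if (invalid.length > 0) {
    // Actionable error: name the bad input and list what is accepted.
    throw new Error(
      `Unknown dimension(s): ${invalid.join(', ')}. ` +
      `Valid dimensions: ${validDimensions.join(', ')}`,
    );
  }
  return checks.filter((c) => c.dimensions.some((d) => requested.includes(d)));
}
```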

### Step 4: Execute Checks

For each check in the filtered set, run the grep or structural pattern against the resolved `scope`:
- Execute grep patterns using the exact expressions from the catalog
- Run structural analysis checks (file counts, nesting depth, import graphs)
- Collect all matches with file path, line number, and matched content
- Record checks that produced zero matches as passing

### Step 5: Format Findings

Format each match as a finding per `@skills/backend-quality/references/findings-format.md`. Every finding produced by this skill is marked `deterministic: true` to distinguish mechanical detections from qualitative assessments.

### Step 6: Group and Output

Output findings grouped by dimension, then by severity (HIGH, MEDIUM, LOW) within each dimension. Include:
- Total checks run and total findings
- Per-dimension summary with finding counts
- Individual findings with file location, matched pattern, and remediation hint

## Error Handling

- **Invalid patterns:** If a grep pattern from the catalog fails to compile or execute, report the specific pattern and error message. Do not silently skip — the user needs to know which check is broken so they can fix the catalog entry.
- **Empty scope:** If the resolved scope contains no files to scan (empty directory, nonexistent path), return a "nothing to scan" message with the resolved path. Do not treat this as a failure.
- **Missing `.axiom/checks.md`:** Silently skip project-specific check loading. This file is optional and its absence is expected for projects that have not customized their check set.
- **Permission errors:** If a file cannot be read due to permissions, log the file path and continue scanning remaining files.

## Output Format

All findings use the standard format from `@skills/backend-quality/references/findings-format.md` with one addition: every finding includes `deterministic: true` to signal that it was produced by mechanical pattern matching, not qualitative judgment.

```yaml
dimension: observability
severity: MEDIUM
deterministic: true
file: src/handlers/query.ts
line: 42
pattern: readEvents
match: "const items = await readEvents(stream)"
remediation: "Move raw event reads out of query handlers — use a read model projection instead"
```

## Anti-Patterns

| Don't | Do Instead |
|-------|------------|
| Make qualitative judgments about matches | Report the match and let critique/review skills interpret |
| Skip checks that produce many matches | Report all matches — downstream skills handle prioritization |
| Suppress false positives silently | Report them and let the user tune exclusions in `.axiom/checks.md` |
| Run checks outside the requested scope | Respect the scope boundary strictly |
| Invent patterns not in the catalog | Only run checks defined in the canonical or project-specific catalogs |
@@ -0,0 +1,68 @@ package/skills/scan/references/check-catalog.md

# Check Catalog — Scan Execution Guidance

Scan-specific guidance for executing the deterministic checks defined in the canonical catalog at `@skills/backend-quality/references/deterministic-checks.md`. This document covers execution order, performance strategies, and result interpretation.

## Execution Order

Run checks from cheapest to most expensive. This lets you fail fast on simple violations before investing time in structural analysis.

1. **Simple grep patterns** (fast, single-pass text matching) — string literals, regex against individual files
2. **Multi-file grep patterns** (fast, but touches more files) — cross-file import checks, naming conventions
3. **Structural analysis** (slower, requires parsing or counting) — nesting depth, cyclomatic complexity proxies, file size checks
4. **Cross-reference checks** (slowest, requires building dependency graphs) — unused exports, circular imports, missing test coverage mapping

Within each tier, run checks in dimension order (DIM-1 through DIM-7) for predictable output grouping.

## Timeout Guidance

- **Per-check timeout:** 30 seconds for any individual grep or structural check. If a single pattern takes longer, report it as a timeout finding rather than blocking the entire scan.
- **Total scan timeout:** 5 minutes for a full "all dimensions" scan on a typical repository. For monorepos or very large codebases, recommend running dimension-by-dimension.
- **Early termination:** If a scope contains more than 10,000 files after exclusions, warn the user and suggest narrowing the scope before proceeding.

## Batch Strategies

Running patterns one at a time is wasteful when scanning large codebases. Use these strategies to reduce I/O overhead:

- **Combined grep:** Where multiple patterns target the same file set and dimension, combine them into a single grep invocation using alternation (`pattern1\|pattern2\|pattern3`). Parse the output to attribute matches back to individual checks.
- **File-type grouping:** Group checks by their `--include` glob (e.g., all `*.ts` checks together, all `*.py` checks together) to avoid re-traversing the directory tree.
- **Scope pre-filtering:** Resolve the file list once via `find` or glob expansion, then run all grep patterns against the pre-filtered list rather than letting each grep re-walk the tree.
- **Parallel execution:** When the check set is large, run independent dimension groups in parallel. DIM-1 checks have no dependency on DIM-3 checks, so they can execute concurrently.
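
The combined-pattern idea can be sketched in code: one alternation regex per pass, with capture-group position attributing each match back to its check. Check IDs and patterns below are illustrative, and the sketch assumes the individual patterns contain no capture groups of their own:

```typescript
interface PatternCheck { id: string; source: string } // source: regex text without capture groups

function runCombined(
  checks: PatternCheck[],
  lines: string[],
): Array<{ checkId: string; line: number }> {
  // One alternation regex instead of one pass per check.
  const combined = new RegExp(checks.map((c) => `(${c.source})`).join('|'));
  const findings: Array<{ checkId: string; line: number }> = [];
  lines.forEach((text, i) => {
    const match = combined.exec(text);
    if (!match) return;
    // The first defined capture group identifies the originating check.
    const idx = match.slice(1).findIndex((group) => group !== undefined);
    findings.push({ checkId: checks[idx].id, line: i + 1 });
  });
  return findings;
}
```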

## Default Exclusions

Always exclude these paths and file types from scanning unless the user explicitly overrides:

- `node_modules/` — third-party dependencies, not project code
- `dist/` — build output, generated from source
- `.git/` — version control internals
- `build/` — alternative build output directory
- `coverage/` — test coverage reports
- `*.min.js`, `*.min.css` — minified assets
- Binary files (images, fonts, compiled artifacts) — not scannable as text
- Generated files (`*.generated.ts`, `*.g.ts`, `*.pb.ts`) — produced by code generators, not authored

These exclusions prevent false positives from non-authored code and keep scan times reasonable.

## Interpreting Results

### True Positives

A match against a catalog pattern in authored code is a true positive. Report it with the full context: file path, line number, matched text, and the remediation hint from the catalog entry.

### False Positives

Some patterns intentionally cast a wide net. Common false positive scenarios:

- **Test files:** Patterns that detect anti-patterns in production code may match intentional test doubles or test assertions. The catalog should tag checks with `exclude-tests: true` where appropriate.
- **Comments and documentation:** A grep pattern may match a comment explaining why a pattern is avoided. Context lines (-B/-A) help the user distinguish these.
- **Legacy code with suppression markers:** If a file contains `// axiom-ignore-next-line` or `// axiom-ignore: <check-id>`, skip that specific match. This is the project-level false positive suppression mechanism.
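
The suppression-marker rule might be implemented along these lines (a sketch: line numbers are 1-based, and the exact marker semantics are assumed from the bullet above):

```typescript
// A match is skipped when the preceding line carries the next-line marker, or
// the matched line itself carries a check-specific ignore marker.
function isSuppressed(lines: string[], matchLine: number, checkId: string): boolean {
  const previous = matchLine > 1 ? lines[matchLine - 2] : '';
  if (previous.includes('// axiom-ignore-next-line')) return true;
  return lines[matchLine - 1].includes(`// axiom-ignore: ${checkId}`);
}
```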

When in doubt, report the match. It is better to surface a false positive that the user can dismiss than to silently hide a true violation. Users can add exclusions to `.axiom/checks.md` to suppress recurring false positives.

### Zero Matches

A check that produces zero matches is a passing check. Record it in the summary as passed — this gives the user confidence that the dimension was actually evaluated, not just skipped.

## Cross-Reference

The canonical check definitions (patterns, dimensions, severities, remediation hints) live in `@skills/backend-quality/references/deterministic-checks.md`. This file is the single source of truth for what to check. The scan skill does not define its own patterns — it only defines how to execute them efficiently.
@@ -0,0 +1,102 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: verify
|
|
3
|
+
description: "Validate test quality by finding test-production divergence, mock overuse, and schema drift. Use when evaluating test suite health or after discovering a bug that tests missed. Triggers: 'check tests', 'test quality', 'verify contracts', or /axiom:verify. Do NOT use for architecture review — use axiom:critique instead."
|
|
4
|
+
user-invokable: true
|
|
5
|
+
metadata:
|
|
6
|
+
author: lvlup-sw
|
|
7
|
+
version: 0.1.0
|
|
8
|
+
category: assessment
|
|
9
|
+
dimensions:
|
|
10
|
+
- test-fidelity
|
|
11
|
+
- contracts
|
|
12
|
+
---

# Verify — Test Validation

## Overview

Test validation skill covering DIM-4 (Test Fidelity) and DIM-3 (Contracts) from the backend quality dimension taxonomy. Verify finds the gap between what your tests claim to prove and what they actually exercise — the space where bugs hide behind passing suites.

This skill focuses on two complementary concerns:

- **Test Fidelity (DIM-4):** Do tests exercise actual production behavior, or do they test a parallel universe of mocks and test-only wiring?
- **Contracts (DIM-3):** Do schemas, types, and API boundaries stay in sync between declaration and usage?

## Triggers

Activate this skill when:
- Evaluating test suite health after a milestone
- A bug was found that existing tests should have caught
- Reviewing test quality during code review
- Investigating why tests pass but production breaks
- Checking schema/contract integrity after API changes

Do NOT activate when:
- Reviewing architecture, coupling, or SOLID compliance — use `axiom:critique`
- Investigating error handling or observability — use `axiom:harden`
- Looking for dead code or vestigial patterns — use `axiom:distill`

## Process

### Step 1: Load Dimension Definitions

Load the relevant dimension definitions from `@skills/backend-quality/references/dimensions.md` — specifically the DIM-4 (Test Fidelity) and DIM-3 (Contracts) sections. These define the invariants, detectable signals, and severity guides for each dimension.

### Step 2: Run Deterministic Checks

Run `axiom:scan` targeting the Test Fidelity and Contracts dimensions. This surfaces mechanical findings — grep-detectable patterns like:
- `describe.skip` / `it.skip` without issue references
- More than 3 `vi.mock()` or `jest.mock()` calls in a single test file
- `as Type` assertions without preceding type guards
- Schema fields referenced in code but absent from Zod/JSON schema definitions

### Step 3: Layer Qualitative Assessment

On top of deterministic findings, apply human-judgment assessment for patterns that require understanding intent:

- **Test-production divergence:** Compare test setup and factory functions against production initialization code. Are tests creating instances the same way production does? Different instances of shared resources, different initialization order, different configuration, and different wiring are all divergence signals.

- **Mock fidelity:** Are mocks placed at true infrastructure boundaries only (HTTP, database, filesystem)? More than 3 mocks in a single test is a smell — it usually means the test is operating at the wrong layer. Check whether mocks verify behavior (what happened) or implementation (how it happened).

- **Missing integration tests:** Identify cross-cutting concerns tested only with unit tests. Shared state, event propagation, multi-module workflows, and initialization sequences need integration-level coverage.

- **Schema/contract drift:** Look for types removed but still read at runtime, breaking API changes without versioning, and Zod schemas that have diverged from their TypeScript type counterparts. See `@skills/verify/references/contract-testing.md` for detailed detection approaches.

- **Test coverage gap analysis:** Are tests exercising only the happy path? Look for missing error paths, boundary cases, empty inputs, and concurrent scenarios.

For detailed patterns and taxonomy, see `@skills/verify/references/test-antipatterns.md`.

### Step 4: Output Findings

Format all findings per `@skills/backend-quality/references/findings-format.md`. Each finding must include:
- Dimension (DIM-3 or DIM-4)
- Severity (HIGH, MEDIUM, LOW)
- Evidence (file:line references)
- Explanation and optional suggestion

## The "Passing Tests, Broken System" Problem

High test counts and high coverage percentages can create false confidence when tests do not exercise production paths. A suite of thousands of tests proves nothing if every test creates its own isolated world that diverges from how the system actually runs.

**Canonical example — the EventStore divergence bug:** 4192 tests passed while the system silently lost events. The root cause: tests created and consumed events through the same EventStore instance, but production wired two separate instances that were never connected. Every test exercised a path that did not exist in production. The tests were not wrong in isolation — they were wrong in aggregate, testing a topology that production never used.

This is the most dangerous class of test failure: the test suite becomes a confidence generator rather than a defect detector. Verify exists to find these gaps before they become production incidents.

**Warning signs:**
- Test setup differs from production startup sequence
- All tests use in-memory implementations of dependencies that production resolves differently
- No test exercises the actual wiring/initialization path
- Tests mock the very thing they should be testing

## Error Handling

- **Empty scope:** If no files match the provided scope (or no scope is provided), output an informative message: "No files in scope for verify analysis. Provide a file path, directory, or glob pattern." Do not produce empty findings.
- **No test files found:** If the scope contains source code but no test files, report this as a DIM-4 finding (severity depends on context).
- **Parse failures:** If a file cannot be parsed for schema analysis, log and skip with a note in the output.

## References

- Dimension definitions: `@skills/backend-quality/references/dimensions.md`
- Finding output format: `@skills/backend-quality/references/findings-format.md`
- Test antipattern catalog: `@skills/verify/references/test-antipatterns.md`
- Contract testing guide: `@skills/verify/references/contract-testing.md`
|