@lvlup-sw/axiom 0.2.0

@@ -0,0 +1,206 @@
1
+ # Backend Quality Dimensions
2
+
3
+ Seven canonical dimensions for assessing backend architectural health. Each dimension is independently assessable — no dimension requires another's output to produce findings.
4
+
5
+ ## DIM-1: Topology
6
+
7
+ **Definition:** The structural health of dependency graphs, wiring correctness, and state ownership. Topology violations create invisible coupling where modules behave differently depending on initialization order or runtime context.
8
+
9
+ **Invariants:**
10
+ - Every shared resource has a single source of truth for its lifecycle
11
+ - Dependencies are explicit (parameter/constructor injection), not ambient (module globals)
12
+ - No module silently creates degraded instances of shared resources
13
+
14
+ **Detectable Signals:**
15
+ - Module-global mutable state (`let moduleStore = ...` at file scope)
16
+ - Lazy fallback constructors (`if (!store) { store = new Store() }`)
17
+ - Manual wiring functions (`configureXxx()`, `registerXxx()`) without validation
18
+ - Divergent instances of the same resource across modules
19
+ - Circular dependency chains
20
+
21
+ **Severity Guide:**
22
+ - **HIGH:** Lazy fallback creates degraded instance silently (masks broken wiring)
23
+ - **MEDIUM:** Module-global mutable state without documented rationale
24
+ - **LOW:** Manual wiring that works but could be simplified
25
+
26
+ **Examples:**
27
+ - Violation: `getStore()` silently creates an in-memory store when the real store wasn't wired, causing events to be invisible across modules
28
+ - Healthy: Constructor injection where the absence of a dependency is a startup error, not a silent fallback
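The contrast can be sketched as follows (`EventStore`, `getStore`, and `EventPublisher` are illustrative names, not modules from this package):

```typescript
interface EventStore {
  append(event: string): void;
  all(): string[];
}

class InMemoryEventStore implements EventStore {
  private events: string[] = [];
  append(event: string) { this.events.push(event); }
  all() { return this.events; }
}

// Violation: ambient module state with a lazy fallback. If wiring never
// ran, callers silently receive a private, degraded instance.
let moduleStore: EventStore | undefined;
function getStore(): EventStore {
  if (!moduleStore) moduleStore = new InMemoryEventStore(); // masks broken wiring
  return moduleStore;
}

// Healthy: the dependency is explicit, and its absence is a startup error.
class EventPublisher {
  constructor(private readonly store: EventStore) {
    if (!store) throw new Error("EventPublisher requires a wired EventStore");
  }
  publish(event: string) { this.store.append(event); }
}
```

Note that the healthy variant fails loudly at construction time, while the lazy fallback hands out an instance that no other module can see.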
29
+
30
+ ---
31
+
32
+ ## DIM-2: Observability
33
+
34
+ **Definition:** The visibility of errors, failures, and system behavior. Observability violations hide problems, making bugs harder to find and diagnose. A system with poor observability may appear healthy while silently losing data.
35
+
36
+ **Invariants:**
37
+ - Every catch block either re-throws, logs with context, or has documented rationale for swallowing
38
+ - Error messages include what failed, why, and what to do about it
39
+ - Fallback behavior is visible (logged, recorded as a metric, or signaled), never silent
40
+
41
+ **Detectable Signals:**
42
+ - Empty catch blocks (`catch {}`, `catch (e) {}`)
43
+ - Catch blocks that only log without context (`catch (e) { console.log(e) }`)
44
+ - Silent fallbacks that switch behavior modes without signaling
45
+ - Missing error context (generic "something went wrong" messages)
46
+ - Swallowed promise rejections (`.catch(() => {})`)
47
+
48
+ **Severity Guide:**
49
+ - **HIGH:** Silent catch that masks data loss or incorrect behavior
50
+ - **MEDIUM:** Catch that logs but lacks actionable context
51
+ - **LOW:** Error message that is verbose but could be more specific
52
+
53
+ **Examples:**
54
+ - Violation: `catch { mutableState._events = [] }` — silently resets state on error, hiding the failure
55
+ - Healthy: `catch (e) { throw new Error('Failed to load events from store', { cause: e }) }`
56
+
57
+ ---
58
+
59
+ ## DIM-3: Contracts
60
+
61
+ **Definition:** The integrity of schemas, APIs, and type boundaries. Contract violations occur when the actual runtime behavior diverges from the declared interface — fields removed from schemas but still read, breaking API changes without versioning, or type assertions that bypass safety.
62
+
63
+ **Invariants:**
64
+ - Every field read at runtime is present in the declared schema/type
65
+ - API changes are versioned or backward-compatible
66
+ - Type assertions (`as`, `!`) have validated preconditions
67
+
68
+ **Detectable Signals:**
69
+ - Schema fields removed but still accessed at runtime
70
+ - Zod/JSON schemas that don't match TypeScript types
71
+ - Unversioned breaking API changes
72
+ - Type assertions without guards (`value as Type` without `typeof`/`instanceof` check)
73
+ - Interface implementations that silently ignore new required members
74
+
75
+ **Severity Guide:**
76
+ - **HIGH:** Schema-runtime divergence (field removed from schema but read at runtime)
77
+ - **MEDIUM:** Type assertion without validation guard
78
+ - **LOW:** Overly permissive schema (accepts more than necessary)
79
+
80
+ **Examples:**
81
+ - Violation: `_events` removed from Zod schema but guard code still reads `state._events`, silently getting `undefined`
82
+ - Healthy: Schema changes accompanied by grep for all field references, with type system enforcing the change
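The assertion-versus-guard distinction can be sketched as follows (type and function names are illustrative):

```typescript
type StoredEvent = { id: string; payload: unknown };

// Violation: a bare assertion bypasses safety. If a field was dropped from
// the schema, reads like this return undefined with no error at any stage.
function parseUnsafe(value: unknown): StoredEvent {
  return value as StoredEvent;
}

// Healthy: a guard validates the precondition before the type is narrowed.
function isStoredEvent(value: unknown): value is StoredEvent {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { id?: unknown }).id === "string"
  );
}

function parseSafe(value: unknown): StoredEvent {
  if (!isStoredEvent(value)) {
    throw new Error("Expected a StoredEvent with a string 'id'");
  }
  return value;
}
```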
83
+
84
+ ---
85
+
86
+ ## DIM-4: Test Fidelity
87
+
88
+ **Definition:** The degree to which tests exercise actual production behavior. Low test fidelity means tests can pass while the system is broken — the most dangerous kind of false confidence.
89
+
90
+ **Invariants:**
91
+ - Test setup matches production wiring (same instances, same initialization)
92
+ - Mocks are used only at true infrastructure boundaries (HTTP, DB, filesystem)
93
+ - Critical paths have integration tests, not just unit tests
94
+
95
+ **Detectable Signals:**
96
+ - Test setup creates different instances than production wiring
97
+ - More than 3 mocked dependencies in a single test (over-isolation)
98
+ - Unit tests for cross-cutting concerns that need integration tests
99
+ - Tests that assert on mock calls rather than observable behavior
100
+ - Test helpers that hide important setup details
101
+ - `describe.skip` or `it.skip` without tracked issue references
102
+
103
+ **Severity Guide:**
104
+ - **HIGH:** Test-production divergence on shared state (different instances)
105
+ - **MEDIUM:** Over-mocking hides real integration behavior
106
+ - **LOW:** Test naming doesn't follow conventions
107
+
108
+ **Examples:**
109
+ - Violation: All tests use the same EventStore instance for producer and consumer, but production has two separate instances that were never connected — 4192 tests pass, system is broken
110
+ - Healthy: Test creates the same wiring as production startup, catching initialization bugs
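One way to enforce that invariant is a single composition function that both production startup and tests call (names are illustrative; this is a sketch, not the package's actual wiring):

```typescript
interface EventStore {
  append(event: string): void;
  all(): string[];
}

class InMemoryEventStore implements EventStore {
  private events: string[] = [];
  append(event: string) { this.events.push(event); }
  all() { return this.events; }
}

// Single composition root: producer and consumer receive the SAME store.
// Because tests and production both wire through here, a bug where the two
// sides hold disconnected instances cannot pass in tests yet ship broken.
function wireApp(store: EventStore) {
  return {
    producer: { emit: (event: string) => store.append(event) },
    consumer: { seen: () => store.all() },
  };
}
```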
111
+
112
+ ---
113
+
114
+ ## DIM-5: Hygiene
115
+
116
+ **Definition:** The absence of dead code, vestigial patterns, and evolutionary leftovers. Poor hygiene increases cognitive load, hides the actual architecture, and provides misleading signals about what the system does.
117
+
118
+ **Invariants:**
119
+ - Every exported symbol has at least one consumer
120
+ - No commented-out code blocks (use version control instead)
121
+ - No divergent implementations of the same behavior
122
+
123
+ **Detectable Signals:**
124
+ - Unreachable code paths (after unconditional return/throw)
125
+ - Unused exports (exported but never imported)
126
+ - Commented-out code blocks (more than 3 lines)
127
+ - Feature flags for features that shipped long ago
128
+ - Duplicate implementations (same behavior in multiple places)
129
+ - Functions that are declared but never called
130
+
131
+ **Severity Guide:**
132
+ - **HIGH:** Divergent implementations causing inconsistent behavior
133
+ - **MEDIUM:** Dead code actively misleading about system behavior
134
+ - **LOW:** Minor unused exports or stale comments
135
+
136
+ **Examples:**
137
+ - Violation: `registerEventTools()` exists but is never called in production — vestigial from an earlier design that was refactored
138
+ - Healthy: Unused code removed, version history preserves it if needed
139
+
140
+ ---
141
+
142
+ ## DIM-6: Architecture
143
+
144
+ **Definition:** Compliance with fundamental design principles — SOLID, coupling/cohesion, dependency direction. Architecture violations make the system rigid, fragile, and resistant to change.
145
+
146
+ **Invariants:**
147
+ - Dependencies point inward (high-level modules don't depend on low-level details)
148
+ - No circular dependency chains between modules
149
+ - Each module has a single, well-defined responsibility
150
+ - Interfaces are at domain boundaries, not within a module
151
+
152
+ **Detectable Signals:**
153
+ - God objects (classes/modules with >10 public methods or >500 lines)
154
+ - Circular imports between modules
155
+ - Dependency inversion violations (core depends on infrastructure)
156
+ - Feature envy (method primarily uses another class's data)
157
+ - Shotgun surgery indicators (one change requires edits in >5 files)
158
+
159
+ **Severity Guide:**
160
+ - **HIGH:** Circular dependencies creating build or runtime issues
161
+ - **MEDIUM:** SOLID violations that resist planned changes
162
+ - **LOW:** Mild coupling that doesn't impede current work
163
+
164
+ **Examples:**
165
+ - Violation: Event store module imports from CLI module, creating a circular dependency that constrains refactoring
166
+ - Healthy: Event store depends on interfaces; CLI implements those interfaces
167
+
168
+ ---
169
+
170
+ ## DIM-7: Resilience
171
+
172
+ **Definition:** Operational robustness under stress, failure, and resource pressure. Resilience violations don't break normal operation but cause cascading failures under load, resource exhaustion, or partial outages.
173
+
174
+ **Invariants:**
175
+ - Every cache has a maximum size and eviction policy
176
+ - Every external call has a timeout
177
+ - Retry logic has bounded attempts and backoff
178
+ - Resource acquisition has corresponding release (open/close symmetry)
179
+
180
+ **Detectable Signals:**
181
+ - Unbounded caches (`Map` or `Set` that grows without limit)
182
+ - Missing timeouts on HTTP calls, database queries, or file operations
183
+ - Retry loops without maximum attempts
184
+ - Resource leaks (file handles, connections opened but not closed in error paths)
185
+ - Missing graceful degradation (all-or-nothing behavior)
186
+ - Synchronous blocking on I/O in async contexts
187
+
188
+ **Severity Guide:**
189
+ - **HIGH:** Unbounded resource growth that will eventually crash
190
+ - **MEDIUM:** Missing timeout that could hang indefinitely
191
+ - **LOW:** Suboptimal resource management that doesn't impact normal operation
192
+
193
+ **Examples:**
194
+ - Violation: In-memory cache grows without limit as events are processed, eventually exhausting heap
195
+ - Healthy: LRU cache with configurable max size, eviction logged for observability
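A minimal bounded-LRU sketch using `Map` insertion order; a production version would also emit a log or metric at the eviction point:

```typescript
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private readonly maxSize: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      this.map.delete(key); // refresh recency by re-inserting
      this.map.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      // First key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest); // eviction point: log or metric here
    }
  }

  get size() { return this.map.size; }
}
```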
196
+
197
+ ---
198
+
199
+ ## Dimension Independence
200
+
201
+ Each dimension can be assessed in isolation. However, some findings may span multiple dimensions:
202
+
203
+ - A lazy fallback constructor (DIM-1: Topology) may also be a silent error (DIM-2: Observability)
204
+ - Dead code (DIM-5: Hygiene) may also be a test fidelity issue if tests reference it (DIM-4)
205
+
206
+ When a finding spans dimensions, it should be reported under the **primary** dimension (the one most directly violated) with a cross-reference note. The `audit` skill handles deduplication when the same evidence appears under multiple dimensions.
@@ -0,0 +1,61 @@
1
+ # Finding Format
2
+
3
+ All axiom skills emit findings in a shared schema. This enables composition, deduplication, and aggregation across skills.
4
+
5
+ ## Finding Schema
6
+
7
+ ```typescript
8
+ interface Finding {
9
+ dimension: string; // DIM-1 through DIM-7 (see dimensions.md)
10
+ severity: 'HIGH' | 'MEDIUM' | 'LOW';
11
+ title: string; // Short description, <100 characters
12
+ evidence: string[]; // file:line references (e.g., ["src/store.ts:42", "src/store.ts:87"])
13
+ explanation: string; // What's wrong for context (2-4 sentences)
14
+ suggestion?: string; // How to fix, when actionable (optional)
15
+ skill: string; // Which skill produced this (e.g., "critique", "harden")
16
+ deterministic: boolean; // true if found by scan, false if qualitative assessment
17
+ }
18
+ ```
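A conforming finding might look like this (the interface is repeated so the snippet stands alone; the values are illustrative):

```typescript
interface Finding {
  dimension: string;
  severity: "HIGH" | "MEDIUM" | "LOW";
  title: string;
  evidence: string[];
  explanation: string;
  suggestion?: string;
  skill: string;
  deterministic: boolean;
}

const example: Finding = {
  dimension: "DIM-1",
  severity: "HIGH",
  title: "Lazy fallback creates degraded EventStore",
  evidence: ["src/events/tools.ts:15", "src/events/tools.ts:42"],
  explanation:
    "getStore() creates an in-memory instance when the configured store " +
    "isn't available, so events become invisible to other modules.",
  suggestion: "Remove the fallback; fail fast if the store isn't configured.",
  skill: "critique",
  deterministic: true,
};
```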
19
+
20
+ ## Severity Tiers
21
+
22
+ | Tier | Definition | Action |
23
+ |------|-----------|--------|
24
+ | **HIGH** | Violates correctness invariant, risks data loss, or causes silent failure. The system may appear to work but produces incorrect results. | Must fix before merge. |
25
+ | **MEDIUM** | Degrades quality, maintainability, or performance but doesn't break correctness. The system works correctly but is harder to change or operate. | Should fix. May defer with documented rationale. |
26
+ | **LOW** | Polish, minor inefficiencies, aspirational improvements. The system works well but could be better. | Track for future. Don't block. |
27
+
28
+ ## Output Format
29
+
30
+ Skills present findings as a Markdown list grouped by severity:
31
+
32
+ ```markdown
33
+ ## Findings
34
+
35
+ ### HIGH
36
+
37
+ - **[DIM-1] Lazy fallback creates degraded EventStore** (deterministic)
38
+ - Evidence: `src/events/tools.ts:15`, `src/events/tools.ts:42`
39
+ - `getStore()` creates an in-memory instance when the configured store isn't available, causing events to be invisible to other modules.
40
+ - Suggestion: Remove fallback; fail fast if store isn't configured.
41
+
42
+ ### MEDIUM
43
+
44
+ - **[DIM-2] Empty catch block hides initialization errors** (qualitative)
45
+ - Evidence: `src/config/loader.ts:88`
46
+ - Configuration errors are caught and silently ignored, falling back to defaults. This hides broken configuration that may cause subtle behavioral differences.
47
+ - Suggestion: Log the error with context, or re-throw if configuration is required.
48
+
49
+ ### LOW
50
+
51
+ (none)
52
+ ```
53
+
54
+ ## Deduplication Rules
55
+
56
+ When `audit` aggregates findings from multiple skills:
57
+
58
+ 1. **Same evidence + same dimension** → merge into a single finding (keep the most detailed explanation)
59
+ 2. **Same evidence + different dimensions** → keep both (the finding genuinely spans two concerns)
60
+ 3. **Same pattern + different files** → keep as separate findings (each location needs attention)
61
+ 4. **Deterministic + qualitative for same issue** → merge, mark as `deterministic: true` (the mechanical check grounds the qualitative assessment)
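Rules 1, 2, and 4 can be sketched with a dimension-plus-evidence key (a hypothetical helper, not the audit skill's actual implementation):

```typescript
interface Finding {
  dimension: string;
  evidence: string[];
  explanation: string;
  deterministic: boolean;
}

// Rules 1-2: identical evidence under the same dimension merges; identical
// evidence under different dimensions produces distinct keys and stays separate.
function dedupeKey(f: Finding): string {
  return `${f.dimension}|${[...f.evidence].sort().join(",")}`;
}

function dedupe(findings: Finding[]): Finding[] {
  const byKey = new Map<string, Finding>();
  for (const f of findings) {
    const key = dedupeKey(f);
    const existing = byKey.get(key);
    if (!existing) {
      byKey.set(key, { ...f });
      continue;
    }
    // Rule 1: keep the most detailed explanation.
    if (f.explanation.length > existing.explanation.length) {
      existing.explanation = f.explanation;
    }
    // Rule 4: a deterministic duplicate grounds the merged finding.
    existing.deterministic = existing.deterministic || f.deterministic;
  }
  return [...byKey.values()];
}
```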
@@ -0,0 +1,86 @@
1
+ # Scoring Model
2
+
3
+ How findings are aggregated into a verdict. The plugin produces standalone verdicts (no workflow concepts). Workflow tools map plugin verdicts to their own status values.
4
+
5
+ ## Plugin Verdict
6
+
7
+ ```text
8
+ if HIGH_count > 0:
9
+ verdict = NEEDS_ATTENTION
10
+ elif MEDIUM_count > 5:
11
+ verdict = NEEDS_ATTENTION
12
+ else:
13
+ verdict = CLEAN
14
+ ```
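The rule transcribes directly into code (a sketch; the function name is illustrative):

```typescript
type Verdict = "CLEAN" | "NEEDS_ATTENTION";

// Direct transcription of the pseudocode above: any HIGH finding, or more
// than five MEDIUMs, flags the scope for attention.
function computeVerdict(highCount: number, mediumCount: number): Verdict {
  if (highCount > 0) return "NEEDS_ATTENTION";
  if (mediumCount > 5) return "NEEDS_ATTENTION";
  return "CLEAN";
}
```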
15
+
16
+ | Verdict | Meaning |
17
+ |---------|---------|
18
+ | **CLEAN** | No significant issues found. Code meets quality standards. |
19
+ | **NEEDS_ATTENTION** | Issues found that should be addressed. Review findings and prioritize fixes. |
20
+
21
+ ## Per-Dimension Metrics
22
+
23
+ For each dimension, compute:
24
+
25
+ - **Pass rate:** `checks_passed / total_checks` (deterministic checks only)
26
+ - **Finding count:** total findings (deterministic + qualitative)
27
+ - **Severity distribution:** count of HIGH / MEDIUM / LOW findings
28
+
29
+ ## Aggregate Metrics
30
+
31
+ - **Overall pass rate:** average of per-dimension pass rates (dimensions without deterministic checks are excluded)
32
+ - **Finding density:** `total_findings / files_analyzed` (lower is better)
33
+ - **Coverage:** `dimensions_assessed / 7` (should be 1.0 for a full audit)
34
+
35
+ ## Health Thresholds
36
+
37
+ | Metric | Healthy | Concerning | Unhealthy |
38
+ |--------|---------|-----------|-----------|
39
+ | Overall pass rate | >90% | 70-90% | <70% |
40
+ | Finding density | <0.5 | 0.5-1.0 | >1.0 |
41
+ | HIGH count | 0 | 1-2 | >2 |
42
+ | Dimension coverage | 7/7 | 5-6/7 | <5/7 |
43
+
44
+ ## Report Structure
45
+
46
+ ```markdown
47
+ # Backend Quality Report
48
+
49
+ **Scope:** [scope assessed]
50
+ **Verdict:** [CLEAN | NEEDS_ATTENTION]
51
+ **Date:** [assessment date]
52
+
53
+ ## Summary
54
+
55
+ | Dimension | Findings | HIGH | MED | LOW | Pass Rate |
56
+ |-----------|----------|------|-----|-----|-----------|
57
+ | Topology | N | N | N | N | N% |
58
+ | ... | | | | | |
59
+
60
+ **Aggregate:** N findings across N files (density: N.N)
61
+
62
+ ## HIGH-Priority Findings
63
+ [Grouped findings with evidence and suggestions]
64
+
65
+ ## MEDIUM-Priority Findings
66
+ [Grouped findings]
67
+
68
+ ## LOW-Priority Findings
69
+ [Grouped findings]
70
+
71
+ ## Dimensional Coverage
72
+ [Which dimensions were assessed, which were skipped and why]
73
+
74
+ ## Recommendations
75
+ [Prioritized action items]
76
+ ```
77
+
78
+ ## Consumer Mapping
79
+
80
+ Workflow tools that consume axiom verdicts should define their own mapping. Example:
81
+
82
+ | Plugin Verdict | Consumer Verdict | Condition |
83
+ |---------------|-----------------|-----------|
84
+ | CLEAN | APPROVED | No additional consumer-specific findings |
85
+ | NEEDS_ATTENTION | NEEDS_FIXES | Consumer wants fixes before merge |
86
+ | NEEDS_ATTENTION | BLOCKED | Consumer's domain-specific HIGH findings present |
@@ -0,0 +1,132 @@
1
+ ---
2
+ name: critique
3
+ description: "Review backend architecture for SOLID violations, coupling issues, and dependency direction problems. Use when evaluating structural design decisions or preparing for refactoring. Triggers: 'review architecture', 'check SOLID', 'critique code', or /axiom:critique. Do NOT use for error handling — use axiom:harden instead."
4
+ user-invokable: true
5
+ metadata:
6
+ author: lvlup-sw
7
+ version: 0.1.0
8
+ category: assessment
9
+ dimensions:
10
+ - architecture
11
+ - topology
12
+ ---
13
+
14
+ # Critique Skill — Architecture Review
15
+
16
+ ## Overview
17
+
18
+ Architecture review skill covering two quality dimensions:
19
+
20
+ - **DIM-6 (Architecture):** SOLID principles adherence, module boundaries, responsibility allocation
21
+ - **DIM-1 (Topology):** Dependency graph health, coupling metrics, layering discipline
22
+
23
+ Use this skill to evaluate structural design decisions, identify architectural drift, or prepare a codebase for refactoring. It combines deterministic scanning (via `axiom:scan`) with qualitative agent assessment to produce actionable findings.
24
+
25
+ ## Triggers
26
+
27
+ ### Positive Triggers
28
+
29
+ Activate this skill when:
30
+ - User says "review architecture" or "architecture review"
31
+ - User says "check SOLID" or "SOLID violations"
32
+ - User says "critique code" or "critique this module"
33
+ - User runs `/axiom:critique`
34
+ - User asks about coupling, dependency direction, or module boundaries
35
+ - Preparing for a major refactoring effort
36
+
37
+ ### Negative Triggers
38
+
39
+ Do NOT activate this skill when:
40
+ - User wants error handling review — use `axiom:harden` instead
41
+ - User wants test quality review — use `axiom:verify` instead
42
+ - User wants performance review — performance profiling is out of scope for axiom
43
+ - User wants a general code review — use `axiom:scan` for a broad sweep first
44
+
45
+ ## Process
46
+
47
+ ### Step 1: Load Dimension Definitions
48
+
49
+ Load the relevant dimension definitions for this review:
50
+
51
+ - `@skills/backend-quality/references/dimensions.md` — Read DIM-1 (Topology) and DIM-6 (Architecture) sections for scoring criteria, signal definitions, and severity thresholds.
52
+
53
+ ### Step 2: Run Deterministic Scan
54
+
55
+ Execute `axiom:scan` targeting Architecture and Topology dimensions specifically:
56
+
57
+ - Collects measurable signals: file sizes, parameter counts, import depth, circular references
58
+ - Establishes a baseline of deterministic findings before qualitative assessment
59
+ - Each automated finding sets `skill: "scan"` and `deterministic: true`
60
+
61
+ ### Step 3: Layer Qualitative Assessment
62
+
63
+ On top of the scan baseline, perform agent-driven qualitative evaluation across five areas:
64
+
65
+ #### 3a. SOLID Evaluation
66
+
67
+ Assess adherence to each SOLID principle. For definitions, violation signals, and severity guidance, see `@skills/critique/references/solid-principles.md`.
68
+
69
+ - **Single Responsibility Principle (SRP):** Does each module/class have one reason to change?
70
+ - **Open/Closed Principle (OCP):** Are modules open for extension but closed for modification?
71
+ - **Liskov Substitution Principle (LSP):** Can subtypes replace their base types without breaking behavior?
72
+ - **Interface Segregation Principle (ISP):** Are interfaces focused, or do clients depend on methods they do not use?
73
+ - **Dependency Inversion Principle (DIP):** Do high-level modules depend on abstractions, not concretions?
74
+
75
+ #### 3b. Coupling Analysis
76
+
77
+ Measure and evaluate module coupling:
78
+
79
+ - **Afferent coupling (Ca):** How many modules depend on this module?
80
+ - **Efferent coupling (Ce):** How many modules does this module depend on?
81
+ - **Instability (I = Ce / (Ca + Ce)):** Is the module stable (mostly depended upon) or unstable (mostly depending on others)?
82
+ - Flag modules with high instability that are also heavily depended-upon (unstable foundation)
83
+ - For detailed coupling metrics and patterns, see `@skills/critique/references/dependency-patterns.md`
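The metrics above compute as follows (the flag thresholds are assumptions for illustration, not values from the reference):

```typescript
// I = Ce / (Ca + Ce); 0 = maximally stable, 1 = maximally unstable.
// An isolated module (no dependencies either way) is treated as stable.
function instability(afferent: number, efferent: number): number {
  const total = afferent + efferent;
  return total === 0 ? 0 : efferent / total;
}

// "Unstable foundation" flag: heavily depended-upon yet highly unstable.
// The defaults (Ca >= 5, I >= 0.7) are illustrative, not prescribed.
function isUnstableFoundation(
  afferent: number,
  efferent: number,
  minAfferent = 5,
  minInstability = 0.7,
): boolean {
  return afferent >= minAfferent && instability(afferent, efferent) >= minInstability;
}
```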
84
+
85
+ #### 3c. Dependency Direction
86
+
87
+ Evaluate whether dependencies point in the correct direction:
88
+
89
+ - Dependencies should flow inward: infrastructure depends on domain, not the reverse
90
+ - Core/domain modules should never import from infrastructure, framework, or I/O layers
91
+ - Check for proper use of dependency inversion — abstractions at boundaries
92
+ - See `@skills/critique/references/dependency-patterns.md` for healthy vs unhealthy patterns
93
+
94
+ #### 3d. God Object Detection
95
+
96
+ Identify modules with too many responsibilities:
97
+
98
+ - Modules handling more than 3 distinct concerns
99
+ - Files exceeding complexity thresholds (lines, function count, branching depth)
100
+ - Classes or modules that are modified in every feature branch (shotgun surgery indicator)
101
+ - Modules that import from many unrelated domains
102
+
103
+ #### 3e. Circular Dependency Identification
104
+
105
+ Detect import cycles between modules:
106
+
107
+ - Direct circular imports (A imports B, B imports A)
108
+ - Transitive cycles (A -> B -> C -> A)
109
+ - Barrel-file-mediated cycles (index.ts re-exports creating hidden loops)
110
+ - See `@skills/critique/references/dependency-patterns.md` for detection approach and remediation
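Direct and transitive cycles can both be found with a depth-first search over the import graph (a sketch over an adjacency map; real tooling such as madge or dependency-cruiser also resolves barrel files and path aliases):

```typescript
// Returns one cycle as a path (first node repeated at the end), or null.
function findCycle(graph: Map<string, string[]>): string[] | null {
  const visiting = new Set<string>(); // nodes on the current DFS path
  const done = new Set<string>();     // nodes fully explored, cycle-free
  const stack: string[] = [];

  function visit(node: string): string[] | null {
    if (done.has(node)) return null;
    if (visiting.has(node)) {
      // Back edge: slice the current path from the repeated node.
      return stack.slice(stack.indexOf(node)).concat(node);
    }
    visiting.add(node);
    stack.push(node);
    for (const dep of graph.get(node) ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    visiting.delete(node);
    done.add(node);
    return null;
  }

  for (const node of graph.keys()) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}
```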
111
+
112
+ ### Step 4: Output Findings
113
+
114
+ Format all findings per `@skills/backend-quality/references/findings-format.md`:
115
+
116
+ - Each finding includes: dimension, severity, title, evidence, explanation, suggestion (optional), skill, deterministic
117
+ - Scan findings use `deterministic: true`; qualitative findings use `deterministic: false`
118
+ - Grouped by dimension (Architecture, then Topology), sorted by severity within each group
119
+ - Include an executive summary with finding counts by severity
120
+
121
+ ## Error Handling
122
+
123
+ - **Empty scope:** If the target scope contains no analyzable files (e.g., empty directory, only config files), return an informative message: "No backend source files found in the specified scope. Verify the path and ensure it contains TypeScript/JavaScript source files."
124
+ - **Scope validation:** Before analysis, validate that the provided path exists and contains source files. If the path does not exist, report the error immediately rather than producing empty results.
125
+ - **Partial failures:** If the deterministic scan fails on a subset of checks, continue with available results and note which checks were skipped in the output.
126
+
127
+ ## References
128
+
129
+ - `@skills/backend-quality/references/dimensions.md` — Dimension definitions for DIM-1 and DIM-6
130
+ - `@skills/backend-quality/references/findings-format.md` — Standard output format for findings
131
+ - `@skills/critique/references/solid-principles.md` — SOLID principle definitions, violation signals, severity guide, and detection heuristics
132
+ - `@skills/critique/references/dependency-patterns.md` — Dependency pattern catalog, coupling metrics, circular dependency detection, and layered architecture guidance