@lvlup-sw/axiom 0.2.0

@@ -0,0 +1,206 @@
1
+ # Backend Quality Dimensions
2
+
3
+ Seven canonical dimensions for assessing backend architectural health. Each dimension is independently assessable — no dimension requires another's output to produce findings.
4
+
5
+ ## DIM-1: Topology
6
+
7
+ **Definition:** The structural health of dependency graphs, wiring correctness, and state ownership. Topology violations create invisible coupling where modules behave differently depending on initialization order or runtime context.
8
+
9
+ **Invariants:**
10
+ - Every shared resource has a single source of truth for its lifecycle
11
+ - Dependencies are explicit (parameter/constructor injection), not ambient (module globals)
12
+ - No module silently creates degraded instances of shared resources
13
+
14
+ **Detectable Signals:**
15
+ - Module-global mutable state (`let moduleStore = ...` at file scope)
16
+ - Lazy fallback constructors (`if (!store) { store = new Store() }`)
17
+ - Manual wiring functions (`configureXxx()`, `registerXxx()`) without validation
18
+ - Divergent instances of the same resource across modules
19
+ - Circular dependency chains
20
+
21
+ **Severity Guide:**
22
+ - **HIGH:** Lazy fallback creates degraded instance silently (masks broken wiring)
23
+ - **MEDIUM:** Module-global mutable state without documented rationale
24
+ - **LOW:** Manual wiring that works but could be simplified
25
+
26
+ **Examples:**
27
+ - Violation: `getStore()` silently creates an in-memory store when the real store wasn't wired, causing events to be invisible across modules
28
+ - Healthy: Constructor injection where the absence of a dependency is a startup error, not a silent fallback
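The contrast can be sketched as follows (`EventStore`, `getStore`, and `EventPublisher` are illustrative names, not modules from this package):

```typescript
interface EventStore {
  append(event: string): void;
  all(): string[];
}

class InMemoryEventStore implements EventStore {
  private events: string[] = [];
  append(event: string) { this.events.push(event); }
  all() { return this.events; }
}

// Violation: ambient module state with a lazy fallback. If wiring never
// ran, callers silently receive a private, degraded instance.
let moduleStore: EventStore | undefined;
function getStore(): EventStore {
  if (!moduleStore) moduleStore = new InMemoryEventStore(); // masks broken wiring
  return moduleStore;
}

// Healthy: the dependency is explicit, and its absence is a startup error.
class EventPublisher {
  constructor(private readonly store: EventStore) {
    if (!store) throw new Error("EventPublisher requires a wired EventStore");
  }
  publish(event: string) { this.store.append(event); }
}
```

Note that the healthy variant fails loudly at construction time, while the lazy fallback hands out an instance that no other module can see.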
29
+
30
+ ---
31
+
32
+ ## DIM-2: Observability
33
+
34
+ **Definition:** The visibility of errors, failures, and system behavior. Observability violations hide problems, making bugs harder to find and diagnose. A system with poor observability may appear healthy while silently losing data.
35
+
36
+ **Invariants:**
37
+ - Every catch block either re-throws, logs with context, or has documented rationale for swallowing
38
+ - Error messages include what failed, why, and what to do about it
39
+ - Fallback behavior is visible (logged, recorded as a metric, or signaled), never silent
40
+
41
+ **Detectable Signals:**
42
+ - Empty catch blocks (`catch {}`, `catch (e) {}`)
43
+ - Catch blocks that only log without context (`catch (e) { console.log(e) }`)
44
+ - Silent fallbacks that switch behavior modes without signaling
45
+ - Missing error context (generic "something went wrong" messages)
46
+ - Swallowed promise rejections (`.catch(() => {})`)
47
+
48
+ **Severity Guide:**
49
+ - **HIGH:** Silent catch that masks data loss or incorrect behavior
50
+ - **MEDIUM:** Catch that logs but lacks actionable context
51
+ - **LOW:** Error message that is verbose but could be more specific
52
+
53
+ **Examples:**
54
+ - Violation: `catch { mutableState._events = [] }` — silently resets state on error, hiding the failure
55
+ - Healthy: `catch (e) { throw new Error('Failed to load events from store', { cause: e }) }`
56
+
57
+ ---
58
+
59
+ ## DIM-3: Contracts
60
+
61
+ **Definition:** The integrity of schemas, APIs, and type boundaries. Contract violations occur when the actual runtime behavior diverges from the declared interface — fields removed from schemas but still read, breaking API changes without versioning, or type assertions that bypass safety.
62
+
63
+ **Invariants:**
64
+ - Every field read at runtime is present in the declared schema/type
65
+ - API changes are versioned or backward-compatible
66
+ - Type assertions (`as`, `!`) have validated preconditions
67
+
68
+ **Detectable Signals:**
69
+ - Schema fields removed but still accessed at runtime
70
+ - Zod/JSON schemas that don't match TypeScript types
71
+ - Unversioned breaking API changes
72
+ - Type assertions without guards (`value as Type` without `typeof`/`instanceof` check)
73
+ - Interface implementations that silently ignore new required members
74
+
75
+ **Severity Guide:**
76
+ - **HIGH:** Schema-runtime divergence (field removed from schema but read at runtime)
77
+ - **MEDIUM:** Type assertion without validation guard
78
+ - **LOW:** Overly permissive schema (accepts more than necessary)
79
+
80
+ **Examples:**
81
+ - Violation: `_events` removed from Zod schema but guard code still reads `state._events`, silently getting `undefined`
82
+ - Healthy: Schema changes accompanied by grep for all field references, with type system enforcing the change
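The assertion-versus-guard distinction can be sketched as follows (type and function names are illustrative):

```typescript
type StoredEvent = { id: string; payload: unknown };

// Violation: a bare assertion bypasses safety. If a field was dropped from
// the schema, reads like this return undefined with no error at any stage.
function parseUnsafe(value: unknown): StoredEvent {
  return value as StoredEvent;
}

// Healthy: a guard validates the precondition before the type is narrowed.
function isStoredEvent(value: unknown): value is StoredEvent {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { id?: unknown }).id === "string"
  );
}

function parseSafe(value: unknown): StoredEvent {
  if (!isStoredEvent(value)) {
    throw new Error("Expected a StoredEvent with a string 'id'");
  }
  return value;
}
```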
83
+
84
+ ---
85
+
86
+ ## DIM-4: Test Fidelity
87
+
88
+ **Definition:** The degree to which tests exercise actual production behavior. Low test fidelity means tests can pass while the system is broken — the most dangerous kind of false confidence.
89
+
90
+ **Invariants:**
91
+ - Test setup matches production wiring (same instances, same initialization)
92
+ - Mocks are used only at true infrastructure boundaries (HTTP, DB, filesystem)
93
+ - Critical paths have integration tests, not just unit tests
94
+
95
+ **Detectable Signals:**
96
+ - Test setup creates different instances than production wiring
97
+ - More than 3 mocked dependencies in a single test (over-isolation)
98
+ - Unit tests for cross-cutting concerns that need integration tests
99
+ - Tests that assert on mock calls rather than observable behavior
100
+ - Test helpers that hide important setup details
101
+ - `describe.skip` or `it.skip` without tracked issue references
102
+
103
+ **Severity Guide:**
104
+ - **HIGH:** Test-production divergence on shared state (different instances)
105
+ - **MEDIUM:** Over-mocking hides real integration behavior
106
+ - **LOW:** Test naming doesn't follow conventions
107
+
108
+ **Examples:**
109
+ - Violation: All tests use the same EventStore instance for producer and consumer, but production has two separate instances that were never connected — 4192 tests pass, system is broken
110
+ - Healthy: Test creates the same wiring as production startup, catching initialization bugs
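One way to enforce that invariant is a single composition function that both production startup and tests call (names are illustrative; this is a sketch, not the package's actual wiring):

```typescript
interface EventStore {
  append(event: string): void;
  all(): string[];
}

class InMemoryEventStore implements EventStore {
  private events: string[] = [];
  append(event: string) { this.events.push(event); }
  all() { return this.events; }
}

// Single composition root: producer and consumer receive the SAME store.
// Because tests and production both wire through here, a bug where the two
// sides hold disconnected instances cannot pass in tests yet ship broken.
function wireApp(store: EventStore) {
  return {
    producer: { emit: (event: string) => store.append(event) },
    consumer: { seen: () => store.all() },
  };
}
```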
111
+
112
+ ---
113
+
114
+ ## DIM-5: Hygiene
115
+
116
+ **Definition:** The absence of dead code, vestigial patterns, and evolutionary leftovers. Poor hygiene increases cognitive load, hides the actual architecture, and provides misleading signals about what the system does.
117
+
118
+ **Invariants:**
119
+ - Every exported symbol has at least one consumer
120
+ - No commented-out code blocks (use version control instead)
121
+ - No divergent implementations of the same behavior
122
+
123
+ **Detectable Signals:**
124
+ - Unreachable code paths (after unconditional return/throw)
125
+ - Unused exports (exported but never imported)
126
+ - Commented-out code blocks (more than 3 lines)
127
+ - Feature flags for features that shipped long ago
128
+ - Duplicate implementations (same behavior in multiple places)
129
+ - Functions that are declared but never called
130
+
131
+ **Severity Guide:**
132
+ - **HIGH:** Divergent implementations causing inconsistent behavior
133
+ - **MEDIUM:** Dead code actively misleading about system behavior
134
+ - **LOW:** Minor unused exports or stale comments
135
+
136
+ **Examples:**
137
+ - Violation: `registerEventTools()` exists but is never called in production — vestigial from an earlier design that was refactored
138
+ - Healthy: Unused code removed, version history preserves it if needed
139
+
140
+ ---
141
+
142
+ ## DIM-6: Architecture
143
+
144
+ **Definition:** Compliance with fundamental design principles — SOLID, coupling/cohesion, dependency direction. Architecture violations make the system rigid, fragile, and resistant to change.
145
+
146
+ **Invariants:**
147
+ - Dependencies point inward (high-level modules don't depend on low-level details)
148
+ - No circular dependency chains between modules
149
+ - Each module has a single, well-defined responsibility
150
+ - Interfaces are at domain boundaries, not within a module
151
+
152
+ **Detectable Signals:**
153
+ - God objects (classes/modules with >10 public methods or >500 lines)
154
+ - Circular imports between modules
155
+ - Dependency inversion violations (core depends on infrastructure)
156
+ - Feature envy (method primarily uses another class's data)
157
+ - Shotgun surgery indicators (one change requires edits in >5 files)
158
+
159
+ **Severity Guide:**
160
+ - **HIGH:** Circular dependencies creating build or runtime issues
161
+ - **MEDIUM:** SOLID violations that resist planned changes
162
+ - **LOW:** Mild coupling that doesn't impede current work
163
+
164
+ **Examples:**
165
+ - Violation: Event store module imports from CLI module, creating a circular dependency that constrains refactoring
166
+ - Healthy: Event store depends on interfaces; CLI implements those interfaces
167
+
168
+ ---
169
+
170
+ ## DIM-7: Resilience
171
+
172
+ **Definition:** Operational robustness under stress, failure, and resource pressure. Resilience violations don't break normal operation but cause cascading failures under load, resource exhaustion, or partial outages.
173
+
174
+ **Invariants:**
175
+ - Every cache has a maximum size and eviction policy
176
+ - Every external call has a timeout
177
+ - Retry logic has bounded attempts and backoff
178
+ - Resource acquisition has corresponding release (open/close symmetry)
179
+
180
+ **Detectable Signals:**
181
+ - Unbounded caches (`Map` or `Set` that grows without limit)
182
+ - Missing timeouts on HTTP calls, database queries, or file operations
183
+ - Retry loops without maximum attempts
184
+ - Resource leaks (file handles, connections opened but not closed in error paths)
185
+ - Missing graceful degradation (all-or-nothing behavior)
186
+ - Synchronous blocking on I/O in async contexts
187
+
188
+ **Severity Guide:**
189
+ - **HIGH:** Unbounded resource growth that will eventually crash
190
+ - **MEDIUM:** Missing timeout that could hang indefinitely
191
+ - **LOW:** Suboptimal resource management that doesn't impact normal operation
192
+
193
+ **Examples:**
194
+ - Violation: In-memory cache grows without limit as events are processed, eventually exhausting heap
195
+ - Healthy: LRU cache with configurable max size, eviction logged for observability
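A minimal bounded-LRU sketch using `Map` insertion order; a production version would also emit a log or metric at the eviction point:

```typescript
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private readonly maxSize: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      this.map.delete(key); // refresh recency by re-inserting
      this.map.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      // First key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest); // eviction point: log or metric here
    }
  }

  get size() { return this.map.size; }
}
```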
196
+
197
+ ---
198
+
199
+ ## Dimension Independence
200
+
201
+ Each dimension can be assessed in isolation. However, some findings may span multiple dimensions:
202
+
203
+ - A lazy fallback constructor (DIM-1: Topology) may also be a silent error (DIM-2: Observability)
204
+ - Dead code (DIM-5: Hygiene) may also be a test fidelity issue if tests reference it (DIM-4)
205
+
206
+ When a finding spans dimensions, it should be reported under the **primary** dimension (the one most directly violated) with a cross-reference note. The `audit` skill handles deduplication when the same evidence appears under multiple dimensions.
@@ -0,0 +1,61 @@
1
+ # Finding Format
2
+
3
+ All axiom skills emit findings in a shared schema. This enables composition, deduplication, and aggregation across skills.
4
+
5
+ ## Finding Schema
6
+
7
+ ```typescript
8
+ interface Finding {
9
+ dimension: string; // DIM-1 through DIM-7 (see dimensions.md)
10
+ severity: 'HIGH' | 'MEDIUM' | 'LOW';
11
+ title: string; // Short description, <100 characters
12
+ evidence: string[]; // file:line references (e.g., ["src/store.ts:42", "src/store.ts:87"])
13
+ explanation: string; // What's wrong for context (2-4 sentences)
14
+ suggestion?: string; // How to fix, when actionable (optional)
15
+ skill: string; // Which skill produced this (e.g., "critique", "harden")
16
+ deterministic: boolean; // true if found by scan, false if qualitative assessment
17
+ }
18
+ ```
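A conforming finding might look like this (the interface is repeated so the snippet stands alone; the values are illustrative):

```typescript
interface Finding {
  dimension: string;
  severity: "HIGH" | "MEDIUM" | "LOW";
  title: string;
  evidence: string[];
  explanation: string;
  suggestion?: string;
  skill: string;
  deterministic: boolean;
}

const example: Finding = {
  dimension: "DIM-1",
  severity: "HIGH",
  title: "Lazy fallback creates degraded EventStore",
  evidence: ["src/events/tools.ts:15", "src/events/tools.ts:42"],
  explanation:
    "getStore() creates an in-memory instance when the configured store " +
    "isn't available, so events become invisible to other modules.",
  suggestion: "Remove the fallback; fail fast if the store isn't configured.",
  skill: "critique",
  deterministic: true,
};
```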
19
+
20
+ ## Severity Tiers
21
+
22
+ | Tier | Definition | Action |
23
+ |------|-----------|--------|
24
+ | **HIGH** | Violates correctness invariant, risks data loss, or causes silent failure. The system may appear to work but produces incorrect results. | Must fix before merge. |
25
+ | **MEDIUM** | Degrades quality, maintainability, or performance but doesn't break correctness. The system works correctly but is harder to change or operate. | Should fix. May defer with documented rationale. |
26
+ | **LOW** | Polish, minor inefficiencies, aspirational improvements. The system works well but could be better. | Track for future. Don't block. |
27
+
28
+ ## Output Format
29
+
30
+ Skills present findings as a Markdown list grouped by severity:
31
+
32
+ ```markdown
33
+ ## Findings
34
+
35
+ ### HIGH
36
+
37
+ - **[DIM-1] Lazy fallback creates degraded EventStore** (deterministic)
38
+ - Evidence: `src/events/tools.ts:15`, `src/events/tools.ts:42`
39
+ - `getStore()` creates an in-memory instance when the configured store isn't available, causing events to be invisible to other modules.
40
+ - Suggestion: Remove fallback; fail fast if store isn't configured.
41
+
42
+ ### MEDIUM
43
+
44
+ - **[DIM-2] Empty catch block hides initialization errors** (qualitative)
45
+ - Evidence: `src/config/loader.ts:88`
46
+ - Configuration errors are caught and silently ignored, falling back to defaults. This hides broken configuration that may cause subtle behavioral differences.
47
+ - Suggestion: Log the error with context, or re-throw if configuration is required.
48
+
49
+ ### LOW
50
+
51
+ (none)
52
+ ```
53
+
54
+ ## Deduplication Rules
55
+
56
+ When `audit` aggregates findings from multiple skills:
57
+
58
+ 1. **Same evidence + same dimension** → merge into a single finding (keep the most detailed explanation)
59
+ 2. **Same evidence + different dimensions** → keep both (the finding genuinely spans two concerns)
60
+ 3. **Same pattern + different files** → keep as separate findings (each location needs attention)
61
+ 4. **Deterministic + qualitative for same issue** → merge, mark as `deterministic: true` (the mechanical check grounds the qualitative assessment)
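Rules 1, 2, and 4 can be sketched with a dimension-plus-evidence key (a hypothetical helper, not the audit skill's actual implementation):

```typescript
interface Finding {
  dimension: string;
  evidence: string[];
  explanation: string;
  deterministic: boolean;
}

// Rules 1-2: identical evidence under the same dimension merges; identical
// evidence under different dimensions produces distinct keys and stays separate.
function dedupeKey(f: Finding): string {
  return `${f.dimension}|${[...f.evidence].sort().join(",")}`;
}

function dedupe(findings: Finding[]): Finding[] {
  const byKey = new Map<string, Finding>();
  for (const f of findings) {
    const key = dedupeKey(f);
    const existing = byKey.get(key);
    if (!existing) {
      byKey.set(key, { ...f });
      continue;
    }
    // Rule 1: keep the most detailed explanation.
    if (f.explanation.length > existing.explanation.length) {
      existing.explanation = f.explanation;
    }
    // Rule 4: a deterministic duplicate grounds the merged finding.
    existing.deterministic = existing.deterministic || f.deterministic;
  }
  return [...byKey.values()];
}
```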
@@ -0,0 +1,86 @@
1
+ # Scoring Model
2
+
3
+ How findings are aggregated into a verdict. The plugin produces standalone verdicts (no workflow concepts). Workflow tools map plugin verdicts to their own status values.
4
+
5
+ ## Plugin Verdict
6
+
7
+ ```text
8
+ if HIGH_count > 0:
9
+ verdict = NEEDS_ATTENTION
10
+ elif MEDIUM_count > 5:
11
+ verdict = NEEDS_ATTENTION
12
+ else:
13
+ verdict = CLEAN
14
+ ```
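The rule transcribes directly into code (a sketch; the function name is illustrative):

```typescript
type Verdict = "CLEAN" | "NEEDS_ATTENTION";

// Direct transcription of the pseudocode above: any HIGH finding, or more
// than five MEDIUMs, flags the scope for attention.
function computeVerdict(highCount: number, mediumCount: number): Verdict {
  if (highCount > 0) return "NEEDS_ATTENTION";
  if (mediumCount > 5) return "NEEDS_ATTENTION";
  return "CLEAN";
}
```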
15
+
16
+ | Verdict | Meaning |
17
+ |---------|---------|
18
+ | **CLEAN** | No significant issues found. Code meets quality standards. |
19
+ | **NEEDS_ATTENTION** | Issues found that should be addressed. Review findings and prioritize fixes. |
20
+
21
+ ## Per-Dimension Metrics
22
+
23
+ For each dimension, compute:
24
+
25
+ - **Pass rate:** `checks_passed / total_checks` (deterministic checks only)
26
+ - **Finding count:** total findings (deterministic + qualitative)
27
+ - **Severity distribution:** count of HIGH / MEDIUM / LOW findings
28
+
29
+ ## Aggregate Metrics
30
+
31
+ - **Overall pass rate:** average of per-dimension pass rates (dimensions without deterministic checks are excluded)
32
+ - **Finding density:** `total_findings / files_analyzed` (lower is better)
33
+ - **Coverage:** `dimensions_assessed / 7` (should be 1.0 for a full audit)
34
+
35
+ ## Health Thresholds
36
+
37
+ | Metric | Healthy | Concerning | Unhealthy |
38
+ |--------|---------|-----------|-----------|
39
+ | Overall pass rate | >90% | 70-90% | <70% |
40
+ | Finding density | <0.5 | 0.5-1.0 | >1.0 |
41
+ | HIGH count | 0 | 1-2 | >2 |
42
+ | Dimension coverage | 7/7 | 5-6/7 | <5/7 |
43
+
44
+ ## Report Structure
45
+
46
+ ```markdown
47
+ # Backend Quality Report
48
+
49
+ **Scope:** [scope assessed]
50
+ **Verdict:** [CLEAN | NEEDS_ATTENTION]
51
+ **Date:** [assessment date]
52
+
53
+ ## Summary
54
+
55
+ | Dimension | Findings | HIGH | MED | LOW | Pass Rate |
56
+ |-----------|----------|------|-----|-----|-----------|
57
+ | Topology | N | N | N | N | N% |
58
+ | ... | | | | | |
59
+
60
+ **Aggregate:** N findings across N files (density: N.N)
61
+
62
+ ## HIGH-Priority Findings
63
+ [Grouped findings with evidence and suggestions]
64
+
65
+ ## MEDIUM-Priority Findings
66
+ [Grouped findings]
67
+
68
+ ## LOW-Priority Findings
69
+ [Grouped findings]
70
+
71
+ ## Dimensional Coverage
72
+ [Which dimensions were assessed, which were skipped and why]
73
+
74
+ ## Recommendations
75
+ [Prioritized action items]
76
+ ```
77
+
78
+ ## Consumer Mapping
79
+
80
+ Workflow tools that consume axiom verdicts should define their own mapping. Example:
81
+
82
+ | Plugin Verdict | Consumer Verdict | Condition |
83
+ |---------------|-----------------|-----------|
84
+ | CLEAN | APPROVED | No additional consumer-specific findings |
85
+ | NEEDS_ATTENTION | NEEDS_FIXES | Consumer wants fixes before merge |
86
+ | NEEDS_ATTENTION | BLOCKED | Consumer's domain-specific HIGH findings present |
@@ -0,0 +1,132 @@
1
+ ---
2
+ name: critique
3
+ description: "Review backend architecture for SOLID violations, coupling issues, and dependency direction problems. Use when evaluating structural design decisions or preparing for refactoring. Triggers: 'review architecture', 'check SOLID', 'critique code', or /axiom:critique. Do NOT use for error handling — use axiom:harden instead."
4
+ user-invokable: true
5
+ metadata:
6
+ author: lvlup-sw
7
+ version: 0.1.0
8
+ category: assessment
9
+ dimensions:
10
+ - architecture
11
+ - topology
12
+ ---
13
+
14
+ # Critique Skill — Architecture Review
15
+
16
+ ## Overview
17
+
18
+ Architecture review skill covering two quality dimensions:
19
+
20
+ - **DIM-6 (Architecture):** SOLID principles adherence, module boundaries, responsibility allocation
21
+ - **DIM-1 (Topology):** Dependency graph health, coupling metrics, layering discipline
22
+
23
+ Use this skill to evaluate structural design decisions, identify architectural drift, or prepare a codebase for refactoring. It combines deterministic scanning (via `axiom:scan`) with qualitative agent assessment to produce actionable findings.
24
+
25
+ ## Triggers
26
+
27
+ ### Positive Triggers
28
+
29
+ Activate this skill when:
30
+ - User says "review architecture" or "architecture review"
31
+ - User says "check SOLID" or "SOLID violations"
32
+ - User says "critique code" or "critique this module"
33
+ - User runs `/axiom:critique`
34
+ - User asks about coupling, dependency direction, or module boundaries
35
+ - Preparing for a major refactoring effort
36
+
37
+ ### Negative Triggers
38
+
39
+ Do NOT activate this skill when:
40
+ - User wants error handling review — use `axiom:harden` instead
41
+ - User wants test quality review — use `axiom:verify` instead
42
+ - User wants performance review — performance profiling is out of scope for axiom
43
+ - User wants a general code review — use `axiom:scan` for a broad sweep first
44
+
45
+ ## Process
46
+
47
+ ### Step 1: Load Dimension Definitions
48
+
49
+ Load the relevant dimension definitions for this review:
50
+
51
+ - `@skills/backend-quality/references/dimensions.md` — Read DIM-1 (Topology) and DIM-6 (Architecture) sections for scoring criteria, signal definitions, and severity thresholds.
52
+
53
+ ### Step 2: Run Deterministic Scan
54
+
55
+ Execute `axiom:scan` targeting Architecture and Topology dimensions specifically:
56
+
57
+ - Collects measurable signals: file sizes, parameter counts, import depth, circular references
58
+ - Establishes a baseline of deterministic findings before qualitative assessment
59
+ - Each automated finding sets `skill: "scan"` and `deterministic: true`
60
+
61
+ ### Step 3: Layer Qualitative Assessment
62
+
63
+ On top of the scan baseline, perform agent-driven qualitative evaluation across five areas:
64
+
65
+ #### 3a. SOLID Evaluation
66
+
67
+ Assess adherence to each SOLID principle. For definitions, violation signals, and severity guidance, see `@skills/critique/references/solid-principles.md`.
68
+
69
+ - **Single Responsibility Principle (SRP):** Does each module/class have one reason to change?
70
+ - **Open/Closed Principle (OCP):** Are modules open for extension but closed for modification?
71
+ - **Liskov Substitution Principle (LSP):** Can subtypes replace their base types without breaking behavior?
72
+ - **Interface Segregation Principle (ISP):** Are interfaces focused, or do clients depend on methods they do not use?
73
+ - **Dependency Inversion Principle (DIP):** Do high-level modules depend on abstractions, not concretions?
74
+
75
+ #### 3b. Coupling Analysis
76
+
77
+ Measure and evaluate module coupling:
78
+
79
+ - **Afferent coupling (Ca):** How many modules depend on this module?
80
+ - **Efferent coupling (Ce):** How many modules does this module depend on?
81
+ - **Instability (I = Ce / (Ca + Ce)):** Is the module stable (mostly depended upon) or unstable (mostly depending on others)?
82
+ - Flag modules with high instability that are also heavily depended-upon (unstable foundation)
83
+ - For detailed coupling metrics and patterns, see `@skills/critique/references/dependency-patterns.md`
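The metrics above compute as follows (the flag thresholds are assumptions for illustration, not values from the reference):

```typescript
// I = Ce / (Ca + Ce); 0 = maximally stable, 1 = maximally unstable.
// An isolated module (no dependencies either way) is treated as stable.
function instability(afferent: number, efferent: number): number {
  const total = afferent + efferent;
  return total === 0 ? 0 : efferent / total;
}

// "Unstable foundation" flag: heavily depended-upon yet highly unstable.
// The defaults (Ca >= 5, I >= 0.7) are illustrative, not prescribed.
function isUnstableFoundation(
  afferent: number,
  efferent: number,
  minAfferent = 5,
  minInstability = 0.7,
): boolean {
  return afferent >= minAfferent && instability(afferent, efferent) >= minInstability;
}
```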
84
+
85
+ #### 3c. Dependency Direction
86
+
87
+ Evaluate whether dependencies point in the correct direction:
88
+
89
+ - Dependencies should flow inward: infrastructure depends on domain, not the reverse
90
+ - Core/domain modules should never import from infrastructure, framework, or I/O layers
91
+ - Check for proper use of dependency inversion — abstractions at boundaries
92
+ - See `@skills/critique/references/dependency-patterns.md` for healthy vs unhealthy patterns
93
+
94
+ #### 3d. God Object Detection
95
+
96
+ Identify modules with too many responsibilities:
97
+
98
+ - Modules handling more than 3 distinct concerns
99
+ - Files exceeding complexity thresholds (lines, function count, branching depth)
100
+ - Classes or modules that are modified in every feature branch (shotgun surgery indicator)
101
+ - Modules that import from many unrelated domains
102
+
103
+ #### 3e. Circular Dependency Identification
104
+
105
+ Detect import cycles between modules:
106
+
107
+ - Direct circular imports (A imports B, B imports A)
108
+ - Transitive cycles (A -> B -> C -> A)
109
+ - Barrel-file-mediated cycles (index.ts re-exports creating hidden loops)
110
+ - See `@skills/critique/references/dependency-patterns.md` for detection approach and remediation
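Direct and transitive cycles can both be found with a depth-first search over the import graph (a sketch over an adjacency map; real tooling such as madge or dependency-cruiser also resolves barrel files and path aliases):

```typescript
// Returns one cycle as a path (first node repeated at the end), or null.
function findCycle(graph: Map<string, string[]>): string[] | null {
  const visiting = new Set<string>(); // nodes on the current DFS path
  const done = new Set<string>();     // nodes fully explored, cycle-free
  const stack: string[] = [];

  function visit(node: string): string[] | null {
    if (done.has(node)) return null;
    if (visiting.has(node)) {
      // Back edge: slice the current path from the repeated node.
      return stack.slice(stack.indexOf(node)).concat(node);
    }
    visiting.add(node);
    stack.push(node);
    for (const dep of graph.get(node) ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    visiting.delete(node);
    done.add(node);
    return null;
  }

  for (const node of graph.keys()) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}
```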
111
+
112
+ ### Step 4: Output Findings
113
+
114
+ Format all findings per `@skills/backend-quality/references/findings-format.md`:
115
+
116
+ - Each finding includes: dimension, severity, title, evidence, explanation, suggestion (optional), skill, deterministic
117
+ - Scan findings use `deterministic: true`; qualitative findings use `deterministic: false`
118
+ - Grouped by dimension (Architecture, then Topology), sorted by severity within each group
119
+ - Include an executive summary with finding counts by severity
120
+
121
+ ## Error Handling
122
+
123
+ - **Empty scope:** If the target scope contains no analyzable files (e.g., empty directory, only config files), return an informative message: "No backend source files found in the specified scope. Verify the path and ensure it contains TypeScript/JavaScript source files."
124
+ - **Scope validation:** Before analysis, validate that the provided path exists and contains source files. If the path does not exist, report the error immediately rather than producing empty results.
125
+ - **Partial failures:** If the deterministic scan fails on a subset of checks, continue with available results and note which checks were skipped in the output.
126
+
127
+ ## References
128
+
129
+ - `@skills/backend-quality/references/dimensions.md` — Dimension definitions for DIM-1 and DIM-6
130
+ - `@skills/backend-quality/references/findings-format.md` — Standard output format for findings
131
+ - `@skills/critique/references/solid-principles.md` — SOLID principle definitions, violation signals, severity guide, and detection heuristics
132
+ - `@skills/critique/references/dependency-patterns.md` — Dependency pattern catalog, coupling metrics, circular dependency detection, and layered architecture guidance