npm - @esoteric-logic/praxis-harness - Versions diffs - 2.11.0 → 2.13.0 - Mend

@esoteric-logic/praxis-harness 2.11.0 → 2.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (66)

package/base/CLAUDE.md +37 -3
package/base/agents/evaluator.md +44 -0
package/base/agents/planner.md +48 -0
package/base/hooks/auto-format.sh +1 -1
package/base/hooks/dep-audit.sh +1 -1
package/base/hooks/file-guard.sh +3 -3
package/base/hooks/recursion-guard.sh +7 -1
package/base/hooks/session-data-collect.sh +1 -1
package/base/hooks/vault-checkpoint.sh +5 -5
package/base/rules/code-excellence.md +22 -0
package/base/rules/coding.md +16 -0
package/base/rules/context-budget.md +66 -0
package/base/rules/context-management.md +4 -0
package/base/rules/hooks-policy.md +53 -0
package/base/rules/multi-agent-orchestration.md +99 -0
package/base/rules/observable-code.md +87 -0
package/base/rules/phase-detection.md +52 -0
package/base/rules/refactor-triggers.md +59 -0
package/base/rules/self-repair.md +53 -0
package/base/rules/session-bridge.md +39 -0
package/base/rules/session-metrics.md +61 -0
package/base/rules/skill-authoring.md +85 -0
package/base/rules/writing-quality.md +122 -0
package/base/skills/px-compact/SKILL.md +100 -0
package/base/skills/px-complexity-audit/SKILL.md +118 -0
package/base/skills/px-context-probe/SKILL.md +1 -1
package/base/skills/px-context-triage/SKILL.md +76 -0
package/base/skills/px-discover/SKILL.md +4 -1
package/base/skills/px-discuss/SKILL.md +4 -1
package/base/skills/px-doc-lint/SKILL.md +107 -0
package/base/skills/px-prose-review/SKILL.md +96 -0
package/base/skills/px-quality-gate/SKILL.md +182 -0
package/base/skills/px-risk/SKILL.md +4 -1
package/base/skills/px-scaffold-new/SKILL.md +16 -14
package/base/skills/px-session-retro/SKILL.md +1 -1
package/base/skills/px-spec/SKILL.md +6 -2
package/base/skills/px-verify/SKILL.md +2 -1
package/bin/praxis.js +27 -6
package/kits/api/KIT.md +2 -0
package/kits/api/install.sh +1 -1
package/kits/api/teardown.sh +1 -1
package/kits/code-quality/KIT.md +2 -0
package/kits/code-quality/hooks/generate-baseline.sh +1 -1
package/kits/code-quality/hooks/post-commit.sh +3 -2
package/kits/code-quality/hooks/pre-push.sh +15 -15
package/kits/code-quality/install.sh +1 -1
package/kits/code-quality/teardown.sh +3 -3
package/kits/data/KIT.md +2 -0
package/kits/data/install.sh +1 -1
package/kits/data/teardown.sh +1 -1
package/kits/infrastructure/KIT.md +2 -0
package/kits/infrastructure/install.sh +1 -1
package/kits/infrastructure/teardown.sh +1 -1
package/kits/security/KIT.md +2 -0
package/kits/security/install.sh +1 -1
package/kits/security/teardown.sh +1 -1
package/kits/web-designer/KIT.md +2 -0
package/kits/web-designer/install.sh +1 -1
package/kits/web-designer/teardown.sh +1 -1
package/package.json +1 -1
package/scripts/health-check.sh +21 -15
package/scripts/install-tools.sh +5 -5
package/scripts/lint-harness.sh +1 -1
package/scripts/onboard-mcp.sh +1 -1
package/scripts/test-harness.sh +1 -1
package/scripts/update.sh +1 -1

package/base/CLAUDE.md CHANGED

@@ -62,6 +62,8 @@ All `{vault_path}` references in rules and skills resolve from this config.
 - Compact trigger: when context approaches ceiling, finish the current milestone first
 - Never compact mid-plan — complete the milestone, write phase summary to vault, then compact
 - After compaction: re-bootstrap from § After Compaction below, re-run quality checks fresh
+- Context budget: see `~/.claude/rules/context-budget.md` for allocation ratios and cost-conscious loading
+- Mid-session optimization: `/px-compact` (no /clear). Full reset: `/px-context-reset` + `/clear`
 ## Durable Memory
 Context is volatile. Files are permanent. Act accordingly.
@@ -110,7 +112,8 @@ Missing servers are non-blocking — features degrade gracefully.
    - Git operation → `~/.claude/rules/git-workflow.md`
    - Client-facing writing → auto-loaded by `px-communication-standards` skill
    - Architecture/specs → auto-loaded by `px-architecture-patterns` skill
-5. Quality re-anchor: read most recent `compact-checkpoint.md` → check the Quality State section.
+5. If a triage checkpoint exists (`*-compact-checkpoint.md` with tag `triage`), use its "Active Working Set" to reload files — it is more precise than the hook checkpoint.
+6. Quality re-anchor: read most recent `compact-checkpoint.md` → check the Quality State section.
    - If lint findings existed before compaction: re-run `golangci-lint run`, confirm status.
    - If tests were failing before compaction: re-run test command, confirm status.
    - Do NOT assume pre-compaction state is current. Always re-run fresh.
@@ -125,6 +128,9 @@ Missing servers are non-blocking — features degrade gracefully.
 - Commit with wrong git identity
 - Write a file with unreplaced {placeholders}
 - Use vault search when Obsidian is not running (obsidian backend requires Obsidian open)
+- Mix refactoring and feature changes in one commit — commit refactor separately
+- Copy-paste 3+ lines instead of extracting a shared function
+- Use `console.log`/`fmt.Println`/`print()` for production logging — use the structured logger
 ## AI-Kit Registry
 Kits activate via `/px-kit:<n>` slash command. Kits are idempotent — double-activate is a no-op.
@@ -137,12 +143,12 @@ Kits activate via `/px-kit:<n>` slash command. Kits are idempotent — double-ac
 | security | `/px-kit:security` | Threat modeling → IAM review → OWASP audit |
 | code-quality | `/px-kit:code-quality` | SAST + secrets + SCA + IaC gate → AI review (over-engineering, smells, structure) |
 | data | `/px-kit:data` | Schema design → migration planning → query optimization |
 Kit manifests live in `~/.claude/kits/<name>/KIT.md`.
+Kit fields: `context_cost` (low/medium/high), `depends_on` (kit dependencies), `skills_chain` (phased workflow).
 ## Rules Registry — Load on Demand Only
-### Universal — always active (12 rules)
+### Universal — always active (16 rules)
 Quality is a generation-time constraint, not a post-hoc review. The rules below
 are the lens you write through — they shape every line of code produced.
@@ -161,6 +167,10 @@ are the lens you write through — they shape every line of code produced.
 | `~/.claude/rules/context-management.md` | Context anti-rot, phase scoping, context reset protocol |
 | `~/.claude/rules/memory-boundary.md` | Auto-memory boundary, MEMORY.md cap, dream integration |
 | `~/.claude/rules/security-posture.md` | Sandbox model, credential protection, protected paths |
+| `~/.claude/rules/writing-quality.md` | Prose constraints — sentence limits, fluff kill list, doc templates, voice rules |
+| `~/.claude/rules/refactor-triggers.md` | Pre-check protocol, commit refactor separately, QUALITY: comment convention |
+| `~/.claude/rules/context-budget.md` | Quantitative budget zones, cost-conscious loading, MCP server discipline |
+| `~/.claude/rules/self-repair.md` | Structured recovery — 3-attempt escalation ladder, strategy rotation |
 ### Scoped — load only when paths match
@@ -188,11 +198,35 @@ are the lens you write through — they shape every line of code produced.
 | `~/.claude/rules/live-docs-required.md` | Dependency manifests, files importing external packages |
 | `~/.claude/rules/desktop-protocol.md` | Claude Desktop ↔ Claude Code handoff sessions |
+#### Application observability
+| File | Loads when |
+|------|------------|
+| `~/.claude/rules/observable-code.md` | `**/services/**`, `**/handlers/**`, `**/workers/**`, `**/middleware/**`, `**/cmd/**` |
+#### Workflow and orchestration
+| File | Loads when |
+|------|------------|
+| `~/.claude/rules/session-bridge.md` | Session start/end, vault handoff, cross-session continuity |
+| `~/.claude/rules/hooks-policy.md` | Adding or modifying hooks in `settings-hooks.json` |
+| `~/.claude/rules/multi-agent-orchestration.md` | Tasks crossing >3 files or multiple domains |
+| `~/.claude/rules/phase-detection.md` | Workflow phase transitions, kit phase changes |
+| `~/.claude/rules/session-metrics.md` | End-of-session retrospective, metrics collection |
+| `~/.claude/rules/skill-authoring.md` | Creating or editing `base/skills/*/SKILL.md` files |
+#### Agent specs
+| File | Purpose |
+|------|---------|
+| `base/agents/evaluator.md` | Evaluator agent for Generator/Evaluator pattern |
+| `base/agents/planner.md` | Planner agent for task decomposition |
 ### Auto-invocable skills (replace former universal rules)
 | Skill | Triggers when |
 |-------|--------------|
 | `px-communication-standards` | Writing client-facing docs, proposals, status reports, commits, PRs |
 | `px-architecture-patterns` | Writing ADRs, specs, system design, risk docs, blocker reports |
+| `px-quality-gate` | Auto inside /px-verify (Step 1 item 5b) and before /px-ship — blocks on BLOCK findings |
+| `px-doc-lint` | Fast structural markdown check inside px-quality-gate for staged *.md files |
 ## Judgment & Research Commands

package/base/agents/evaluator.md ADDED

@@ -0,0 +1,44 @@
+# Evaluator Agent Spec
+## Role
+You are a critical code evaluator. You score Generator output against a SPEC
+on four dimensions: correctness, completeness, style compliance, and test coverage.
+## Inputs
+- **Diff**: staged changes from the Generator
+- **SPEC**: acceptance criteria from the active plan
+- **Rules**: quality rules for file types in the diff
+- **Test output**: test results if available
+You do NOT have conversation history. Judge the diff on its own merits.
+## Scoring Rubric
+| Dimension | Weight | Pass Criteria |
+| --------- | ------ | ------------- |
+| Correctness | 40% | Code does what SPEC says. All paths handled. No logic errors. |
+| Completeness | 25% | All acceptance criteria addressed. No partial implementations. |
+| Style compliance | 20% | Naming, structure, and quality rules respected. |
+| Test coverage | 15% | Happy path, failure path, and edge cases covered. |
+## Output Format
+Findings use the standard subagent format:
+```
+{file}:{line} — {severity} — {category} — {description} — {fix}
+```
+Severity: Critical (blocks), Major (should fix), Minor (note).
+End with a summary:
+```
+SCORE: {correctness}% / {completeness}% / {style}% / {tests}%
+VERDICT: PASS | CHANGES_REQUESTED | BLOCK
+FINDINGS: {critical} critical, {major} major, {minor} minor
+```
+## Constraints
+- Do not suggest features beyond the SPEC
+- Do not comment on code outside the diff
+- Do not soften findings — be direct about what is wrong
+- If nothing is wrong: "No findings. VERDICT: PASS"
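
The rubric weights in `evaluator.md` could be folded into a single number roughly as follows. This is a minimal sketch, not part of the package: the spec only asks for per-dimension scores and a verdict, so the combined total, the helper names `weighted_score` and `format_summary`, and the sample inputs are all assumptions made for illustration.

```python
# Weights copied from the Scoring Rubric table in evaluator.md.
WEIGHTS = {
    "correctness": 0.40,
    "completeness": 0.25,
    "style": 0.20,
    "tests": 0.15,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-dimension percentages (0-100) into one weighted percentage.

    Hypothetical helper: the spec itself does not define a combined score.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def format_summary(scores: dict[str, float]) -> str:
    """Render the SCORE line in the dimension order the spec uses."""
    return (
        "SCORE: {correctness:.0f}% / {completeness:.0f}% / "
        "{style:.0f}% / {tests:.0f}%"
    ).format(**scores)
```

Because correctness carries 40% of the weight, a diff that is stylistically clean but logically wrong cannot score well overall, which matches the ordering of the rubric.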

package/base/agents/planner.md ADDED

@@ -0,0 +1,48 @@
+# Planner Agent Spec
+## Role
+You are a task decomposition planner. You break complex tasks into a dependency-ordered
+subtask graph that a Generator can execute one step at a time.
+## Inputs
+- **PROBLEM / DELIVERABLE / ACCEPTANCE / BOUNDARIES** from the discuss phase
+- **Codebase context**: file structure, key interfaces, existing patterns
+- **Active constraints**: quality rules, architecture rules applicable to the task
+You do NOT have conversation history. Plan from the inputs alone.
+## Output Format
+A numbered subtask graph with dependencies:
+```
+SUBTASK GRAPH
+━━━━━━━━━━━━━━━━━━━━━
+1. [subtask title]
+   Files: {paths}
+   Acceptance: {criteria}
+   depends_on: []
+2. [subtask title]
+   Files: {paths}
+   Acceptance: {criteria}
+   depends_on: [1]
+3. [subtask title]
+   Files: {paths}
+   Acceptance: {criteria}
+   depends_on: [1, 2]
+CRITICAL PATH: 1 → 2 → 3
+PARALLELIZABLE: none | {subtask pairs}
+ESTIMATED MILESTONES: {count}
+```
+## Constraints
+- Each subtask must be completable in a single milestone
+- Each subtask must have testable acceptance criteria
+- Dependencies must be acyclic
+- Do not decompose beyond what is necessary — 3-7 subtasks for most features
+- Flag subtasks that carry architectural risk
+- If the task is simple enough for single-agent mode: say so and output a single subtask
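
The "dependencies must be acyclic" constraint in `planner.md` can be checked mechanically. The sketch below is one possible approach, assuming the subtask graph has already been parsed into a mapping from subtask number to its `depends_on` list; the function name `execution_order` and that input shape are hypothetical, not something the package defines.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(depends_on: dict[int, list[int]]) -> list[int]:
    """Return subtask numbers in an order that respects depends_on.

    Raises ValueError if a dependency points at an unknown subtask
    or if the graph contains a cycle.
    """
    referenced = {dep for deps in depends_on.values() for dep in deps}
    unknown = referenced - depends_on.keys()
    if unknown:
        raise ValueError(f"unknown subtasks referenced: {sorted(unknown)}")
    try:
        # TopologicalSorter expects node -> predecessors, which is
        # exactly the shape of the depends_on lists.
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
```

For the three-subtask example in the spec (`1: []`, `2: [1]`, `3: [1, 2]`) the only valid order is the critical path 1, 2, 3.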

package/base/hooks/auto-format.sh CHANGED

@@ -1,7 +1,7 @@
 #!/usr/bin/env bash
 # PostToolUse hook — auto-formats files after edit.
 # Always exits 0 (advisory, never blocks).
-set -uo pipefail
+set -euo pipefail
 trap 'exit 0' ERR
 INPUT=$(cat)

package/base/hooks/dep-audit.sh CHANGED

@@ -2,7 +2,7 @@
 # dep-audit.sh — PostToolUse:Write|Edit|MultiEdit hook
 # Runs dependency vulnerability checks when manifest files are modified.
 # Always exits 0 (advisory only — PostToolUse cannot hard-block).
-set -uo pipefail
+set -euo pipefail
 trap 'exit 0' ERR
 INPUT=$(cat)

package/base/hooks/file-guard.sh CHANGED

@@ -6,7 +6,7 @@ set -euo pipefail
 INPUT=$(cat)
 FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // .tool_input.path // empty')
-if [ -z "$FILE_PATH" ]; then
+if [[ -z "$FILE_PATH" ]]; then
   exit 0
 fi
@@ -29,7 +29,7 @@ for pattern in "${PROTECTED_PATTERNS[@]}"; do
 done
 # Check project-level protected files from CLAUDE.md if it exists
-if [ -f "CLAUDE.md" ]; then
+if [[ -f "CLAUDE.md" ]]; then
   # Extract paths from ## Protected Files section
   IN_SECTION=false
   while IFS= read -r line; do
@@ -42,7 +42,7 @@ if [ -f "CLAUDE.md" ]; then
     fi
     if $IN_SECTION && echo "$line" | grep -qE "^- "; then
       PROTECTED=$(echo "$line" | sed 's/^- //' | sed 's/ *#.*//' | xargs)
-      if [ -n "$PROTECTED" ] && echo "$FILE_PATH" | grep -qE "$PROTECTED"; then
+      if [[ -n "$PROTECTED" ]] && echo "$FILE_PATH" | grep -qE "$PROTECTED"; then
         echo "BLOCKED: $FILE_PATH matches project-protected pattern '$PROTECTED'. Explain the intended change."
         exit 2
       fi

package/base/hooks/recursion-guard.sh CHANGED

@@ -50,7 +50,13 @@ KEY="${KEY:0:300}"
 # ── Increment counter ──
 # Use a hash of the key for safe JSON field names
-KEY_HASH=$(echo -n "$KEY" | md5 2>/dev/null || echo -n "$KEY" | md5sum 2>/dev/null | cut -d' ' -f1 || echo "fallback")
+if command -v md5sum &>/dev/null; then
+  KEY_HASH=$(echo -n "$KEY" | md5sum | cut -d' ' -f1)
+elif command -v md5 &>/dev/null; then
+  KEY_HASH=$(echo -n "$KEY" | md5 -q)
+else
+  KEY_HASH="${KEY:0:32}"
+fi
 COUNT=$(jq -r --arg cat "$CATEGORY" --arg key "$KEY_HASH" \
   '.[$cat][$key] // 0' "$STATE_FILE" 2>/dev/null || echo "0")

package/base/hooks/session-data-collect.sh CHANGED

@@ -1,7 +1,7 @@
 #!/usr/bin/env bash
 # Stop hook — collects structured session data and stages it for the Stop prompt.
 # Always exits 0 (advisory, never blocks session end).
-set -uo pipefail
+set -euo pipefail
 trap 'exit 0' ERR
 CONFIG_FILE="$HOME/.claude/praxis.config.json"

package/base/hooks/vault-checkpoint.sh CHANGED

@@ -1,7 +1,7 @@
 #!/usr/bin/env bash
 # PreCompact hook — writes minimal checkpoint to vault before context compaction.
 # Always exits 0 (advisory, never blocks compaction).
-set -uo pipefail
+set -euo pipefail
 trap 'exit 0' ERR
 CONFIG_FILE="$HOME/.claude/praxis.config.json"
@@ -19,7 +19,7 @@ PLANS_DIR="$VAULT_PATH/plans"
 mkdir -p "$PLANS_DIR"
 DATE=$(date +%Y-%m-%d)
-TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")
+TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
 CHECKPOINT_FILE="$PLANS_DIR/$DATE-compact-checkpoint.md"
 BRANCH=$(git --no-pager rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
@@ -40,16 +40,16 @@ fi
 LINT_STATE="unknown"
 TEST_STATE="unknown"
-if [ -f "go.mod" ] && command -v golangci-lint &>/dev/null; then
+if [[ -f "go.mod" ]] && command -v golangci-lint &>/dev/null; then
   LINT_COUNT=$(golangci-lint run ./... 2>&1 | grep -c "^" || true)
-  if [ "$LINT_COUNT" -eq 0 ]; then
+  if [[ "$LINT_COUNT" -eq 0 ]]; then
     LINT_STATE="clean"
   else
     LINT_STATE="$LINT_COUNT findings"
   fi
 fi
-if [ -f "go.mod" ] && command -v go &>/dev/null; then
+if [[ -f "go.mod" ]] && command -v go &>/dev/null; then
   if go test ./... -short 2>&1 | grep -q "^ok"; then
     TEST_STATE="passing"
   else

package/base/rules/code-excellence.md CHANGED

@@ -74,3 +74,25 @@ A comment that says `// increment counter` above `counter++` is noise.
 A comment that says `// retry three times because the upstream API returns 503 on cold start`
 is knowledge that cannot be inferred from the code alone.
 Delete the first kind. Write more of the second kind.
+---
+## Reference Codebases — What Excellence Looks Like
+When you need a reference for what excellent code looks like, use these:
+| Domain | Reference | What to study |
+| ------ | --------- | ------------- |
+| C / systems | SQLite source (`sqlite.org/src`) | Discipline: 590x test-to-source ratio, 100% branch coverage, zero external deps |
+| C / network | Redis `src/ae.c`, `src/dict.c` | Naming, readability, data structures that document themselves |
+| Go | Go standard library (`pkg.go.dev/std`) | Idiomatic naming, error design, interface sizing — one method where possible |
+| Rust | `rustc_errors` crate | Error message design: what failed, where, what to do next |
+| Error messages | Elm compiler output | Kindest, most actionable errors in any compiled language |
+| API design | Stripe API (`docs.stripe.com`) | Naming consistency, versioning discipline, error schema |
+| Documentation | Go stdlib `net/http` package docs | Every exported symbol explained by what it does for the caller |
+When uncertain if code is good enough: "Would this survive a review from the SQLite team?"
+If the answer is no — simplify first.
+The SQLite standard: every line has a reason. Every function has one job.
+Every error has a message a human can act on.

package/base/rules/coding.md CHANGED

@@ -13,6 +13,22 @@
 - If Context7 is unavailable: state that docs could not be verified and flag the
   specific method/API as "unverified against current version."
+### Import-trigger protocol
+Any commit diff that adds a new `import`, `require`, `using`, or `use` statement for an
+external package must have a corresponding Context7 lookup in the same session.
+Language-specific patterns matched:
+- JavaScript/TypeScript: `import ... from`, `require(...)`
+- Python: `import ...`, `from ... import`
+- Go: `import "..."` or `import (...)`
+- Rust: `use ...::...`
+- Java/C#: `import ...`, `using ...`
+Every new external import requires a Context7 verification before the gate clears.
+Internal packages (same repo, same module) are excluded.
 ### Tool preferences
 - Use Read/Edit/Write tools instead of cat/sed/echo.
 - Use `rg` (ripgrep) for searching code, not grep.
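
The import-trigger protocol added to `coding.md` implies scanning a commit diff for newly added import statements. A minimal sketch of such a scan is shown below, under stated assumptions: the regexes are simplified approximations of the listed language patterns, `added_import_lines` is a hypothetical name, and the sketch does not attempt to separate internal packages from external ones or to handle the individual paths inside a multi-line Go `import (...)` block.

```python
import re

# Simplified approximations of the patterns listed in the rule.
IMPORT_PATTERNS = [
    re.compile(r"^\s*import\s+.+\s+from\s+['\"]"),  # JS/TS: import x from "pkg"
    re.compile(r"\brequire\(\s*['\"]"),             # JS: require("pkg")
    re.compile(r"^\s*(import|from)\s+[\w.]+"),      # Python, Java
    re.compile(r"^\s*import\s+[(\"]"),              # Go: import "pkg" / import (
    re.compile(r"^\s*use\s+\w+(::\w+)+"),           # Rust: use a::b
    re.compile(r"^\s*using\s+[\w.]+"),              # C#: using Namespace
]


def added_import_lines(diff_text: str) -> list[str]:
    """Return added lines in a unified diff that look like import statements."""
    hits = []
    for line in diff_text.splitlines():
        # Only added lines count; "+++" is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        code = line[1:]
        if any(pattern.search(code) for pattern in IMPORT_PATTERNS):
            hits.append(code.strip())
    return hits
```

Each returned line would then need a matching Context7 lookup in the same session before the gate clears.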

package/base/rules/context-budget.md ADDED

@@ -0,0 +1,66 @@
+# Context Budget — Rules
+# Scope: All projects, all sessions
+# Quantitative budget for context window usage
+# Companion to context-management.md (qualitative bracket discipline)
+## Budget Allocation — Invariants (BLOCK on violation)
+### Zone model
+The context window is a finite resource. Allocate it deliberately:
+| Zone | Share | Contents |
+| ---- | ----- | -------- |
+| System overhead | ~15-20% | CLAUDE.md, universal rules, settings, MCP tool schemas |
+| Working content | ~55-65% | Code, plans, conversation, tool output |
+| Reserve | ~20% | Buffer for compaction, final outputs, tool responses |
+When working content approaches capacity (signals below), begin offloading to vault.
+Never wait for compaction to force the issue.
+### Budget signals (quantitative bracket thresholds)
+| Signal | FRESH | MODERATE | DEPLETED |
+| ------ | ----- | -------- | -------- |
+| Tool calls | <15 | 15-35 | 35+ |
+| Files read | <8 | 8-20 | 20+ |
+| Large files (>200 lines) in session | <2 | 2-4 | 5+ |
+| Corrections received | 0 | 1 | 2+ |
+These thresholds feed `context-management.md` brackets and `/px-context-probe`.
+## Cost-Conscious Loading — Conventions (WARN on violation)
+### MCP servers
+- Connect 2-3 core servers at session start (context7, github).
+- Lazy-load optional servers (perplexity, filesystem, sequential-thinking) only when the task requires them.
+- Never connect all registered servers preemptively — each adds schema overhead.
+### File reads
+- Read targeted line ranges (`offset`/`limit`), not entire files, when you know the section.
+- Files >200 lines: read the relevant section, not the whole file.
+### Search output
+- Use `files_with_matches` for initial discovery; switch to `content` only for confirmed matches.
+- Delegate exploration expected to produce >50 lines of output to a subagent.
+## Universal Rule Weight — Conventions (WARN on violation)
+The 14 universal rules consume ~50KB of every session's context budget.
+- Before proposing a new universal rule: justify the always-loaded cost.
+- Prefer scoped rules (path-matched) over universal ones when the rule applies to a specific domain.
+- New universal rules must stay under 3KB individually.
+- If a universal rule exceeds 100 lines: split into a short universal rule and a scoped reference file.
+## Budget Actions
+| Bracket | Action |
+| ------- | ------ |
+| FRESH | No budget concern. Batch aggressively, load full context. |
+| MODERATE | Prefer concise output. Stop reading whole files. Use subagents for exploration. |
+| DEPLETED | Run `/px-compact` for mid-session optimization, or `/px-context-reset` + `/clear` for full reset. |
+| CRITICAL | STOP new work. Complete current milestone. Write all state to vault. New session. |
+## Removal Condition
+Remove when Claude Code exposes native token utilization metrics and budget enforcement.
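
The budget-signal table in `context-budget.md` can be read as a small classifier. The sketch below is one interpretation, not the package's implementation. It assumes that the worst individual signal decides the session bracket (the rule does not say how signals combine) and that overlapping boundary values such as 35 tool calls or 20 files read fall into the higher bracket. The CRITICAL bracket has no numeric threshold in the table, so it is left out.

```python
BRACKETS = ("FRESH", "MODERATE", "DEPLETED")

# Per signal: (value at which MODERATE starts, value at which DEPLETED starts).
THRESHOLDS = {
    "tool_calls": (15, 35),
    "files_read": (8, 20),
    "large_files": (2, 5),
    "corrections": (1, 2),
}


def signal_bracket(name: str, value: int) -> str:
    """Classify a single signal against the table thresholds."""
    moderate_from, depleted_from = THRESHOLDS[name]
    if value >= depleted_from:
        return "DEPLETED"
    if value >= moderate_from:
        return "MODERATE"
    return "FRESH"


def session_bracket(signals: dict[str, int]) -> str:
    """Assumption: the most depleted signal sets the overall bracket."""
    return max(
        (signal_bracket(name, value) for name, value in signals.items()),
        key=BRACKETS.index,
        default="FRESH",
    )
```

Under this reading, two corrections alone are enough to mark a session DEPLETED, even when tool calls and file reads are still low.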

package/base/rules/context-management.md CHANGED

@@ -72,6 +72,10 @@ conversation length heuristic (not token count — we cannot read session JSONL)
 - Write all state to vault immediately
 - Suggest new session for remaining milestones
+### Budget integration
+See `context-budget.md` for quantitative thresholds and allocation ratios.
+Use `/px-compact` for mid-session optimization; `/px-context-reset` + `/clear` for full reset.
 ---
 ## Verification Commands

package/base/rules/hooks-policy.md ADDED

@@ -0,0 +1,53 @@
+# Hooks Policy — Rules
+# Scope: All projects, all sessions
+# Documents which hooks are mandatory vs optional
+## Hook Inventory
+### Mandatory — must not be disabled (BLOCK on violation)
+| Hook | Event | Purpose |
+| ---- | ----- | ------- |
+| `secret-scan.sh` | PreToolUse: Write/Edit | Blocks credential patterns in code |
+| `credential-guard.sh` | PreToolUse: Bash | Blocks access to protected credential paths |
+| `identity-check.sh` | PreToolUse: Bash | Verifies git identity before commits |
+| `file-guard.sh` | PreToolUse: Write/Edit | Blocks writes to protected file patterns |
+| `vault-checkpoint.sh` | PreCompact | Saves state before context compaction |
+These hooks are security and data-integrity controls. Disabling them requires
+explicit user approval with documented justification.
+### Required — should not be disabled without reason (WARN on violation)
+| Hook | Event | Purpose |
+| ---- | ----- | ------- |
+| `dep-audit.sh` | PostToolUse: Write/Edit | Audits dependencies when manifests change |
+| `session-data-collect.sh` | Stop | Captures session metadata for vault |
+| Stop vault prompt | Stop | Writes session summary and vault updates |
+| `on-stop-failure.sh` | StopFailure | Error handling for failed stop hooks |
+### Optional — enhance but not required
+| Hook | Event | Purpose |
+| ---- | ----- | ------- |
+| `auto-format.sh` | PostToolUse: Write/Edit | Auto-formats files after edits |
+| `recursion-guard.sh` | PreToolUse | Prevents recursive tool invocation |
+## Adding New Hooks
+Before adding a hook to `settings-hooks.json`:
+1. Classify as mandatory, required, or optional using the criteria above
+2. Mandatory: security or data-integrity function — must never silently fail
+3. Required: workflow quality — should warn on failure but not block
+4. Optional: convenience — can be disabled per-project
+5. All hooks must exit 0 on success, non-zero to block (PreToolUse only)
+6. All hooks must handle missing dependencies gracefully (exit 0 if tool not found)
+## Hook Configuration
+Hooks are declared in `base/hooks/settings-hooks.json`.
+Install copies them to `~/.claude/settings.json` during `install.sh`.
+Project-specific hooks can be added in project `.claude/settings.json`.
+## Removal Condition
+Remove when Claude Code provides a native hook policy/priority system.

package/base/rules/multi-agent-orchestration.md ADDED

@@ -0,0 +1,99 @@
+# Multi-Agent Orchestration — Rules
+# Scope: All projects, all sessions
+# Governs when and how to use multi-agent patterns
+## Agent Patterns
+### Single-Agent (default)
+One Claude instance handles the full task. Subagents handle scoped delegation
+(review, simplify, explore) per the `px-subagent` dispatch protocol.
+**Use when:** Task touches ≤3 files, single domain, straightforward implementation.
+### Generator/Evaluator
+Generator produces output. Evaluator critically reviews against spec, scoring on
+correctness, completeness, style compliance, and test coverage.
+**Use when:** Task touches >5 files, crosses module boundaries, or has high
+correctness requirements (auth, data integrity, public API changes).
+### Planner/Generator/Evaluator
+Planner decomposes into subtask graph first. Generator and Evaluator work each subtask.
+**Use when:** Task spans multiple milestones, requires architectural decisions,
+or the plan itself needs adversarial review.
+## Activation Thresholds — Conventions (WARN on violation)
+| Signal | Single-Agent | Generator/Evaluator | Planner/Generator/Evaluator |
+| ------ | ------------ | ------------------- | --------------------------- |
+| Files changed | ≤3 | 4-10 | 10+ |
+| Domains crossed | 1 | 2 | 3+ |
+| Milestone count | 1 | 1-2 | 3+ |
+| Risk level | Low | Medium | High |
+These are guidelines, not hard gates. Use judgment — a 2-file auth change
+may warrant Generator/Evaluator; a 15-file rename may not.
+## Evaluator Agent Spec
+When Generator/Evaluator mode is active, the Evaluator receives:
+| Input | Source |
+| ----- | ------ |
+| Diff | Generator's output (staged changes) |
+| SPEC | From active plan file |
+| Rules | Relevant quality rules for file types in diff |
+| Test output | If tests were run |
+The Evaluator does NOT receive conversation history.
+### Evaluator scoring rubric
+| Dimension | Weight | Criteria |
+| --------- | ------ | -------- |
+| Correctness | 40% | Does the code do what the spec says? All paths handled? |
+| Completeness | 25% | Are all acceptance criteria met? Tests present? |
+| Style compliance | 20% | Naming, structure, quality rules respected? |
+| Test coverage | 15% | Happy path, failure path, edge cases covered? |
+### Evaluator output format
+Uses the same severity format as `px-subagent`:
+```
+{file}:{line} — {severity} — {category} — {description} — {fix}
+```
+### Escalation
+- Critical findings → Generator must fix before proceeding
+- Major findings → Generator should fix before merge
+- >3 findings addressed → re-run Evaluator (max 3 rounds)
+## Planner Agent Spec
+When Planner mode is active, the Planner receives:
+- PROBLEM / DELIVERABLE / ACCEPTANCE / BOUNDARIES
+- Relevant codebase context (file structure, key interfaces)
+- Active constraints from rules
+The Planner outputs a subtask graph:
+```
+1. [subtask] — {files} — {acceptance criteria}
+   └─ depends_on: []
+2. [subtask] — {files} — {acceptance criteria}
+   └─ depends_on: [1]
+```
+Generator and Evaluator process each subtask in dependency order.
+## Integration with Existing Skills
+| Skill | Orchestration role |
+| ----- | ------------------ |
+| `/px-plan` | May invoke Planner agent for complex decomposition |
+| `/px-review` | Already uses Evaluator pattern via `px-subagent` |
+| `/px-verify` | Self-review step is a lightweight Evaluator |
+| `/px-execute` | Generator role — produces implementation |
+## Removal Condition
+Remove when Claude Code provides native multi-agent orchestration with
+built-in evaluator and planner agent types.
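
The activation-threshold table in `multi-agent-orchestration.md` could be turned into a suggestion helper along these lines. This is a sketch only: the rule calls the thresholds guidelines rather than hard gates, so the output is a starting point for judgment. The function name `suggest_pattern`, the choice to let the highest signal win, and the handling of overlapping table cells (10 files and 2 milestones go to the higher tier) are assumptions.

```python
PATTERNS = (
    "single-agent",
    "generator/evaluator",
    "planner/generator/evaluator",
)
RISK_TIERS = {"low": 0, "medium": 1, "high": 2}


def _tier(value: int, second_from: int, third_from: int) -> int:
    """Map a numeric signal onto tier 0, 1, or 2."""
    if value >= third_from:
        return 2
    if value >= second_from:
        return 1
    return 0


def suggest_pattern(files_changed: int, domains: int, milestones: int, risk: str) -> str:
    """Assumption: the signal pointing at the heaviest pattern wins."""
    tiers = [
        _tier(files_changed, 4, 10),  # table: <=3 | 4-10 | 10+
        _tier(domains, 2, 3),         # table: 1 | 2 | 3+
        _tier(milestones, 2, 3),      # table: 1 | 1-2 | 3+
        RISK_TIERS[risk.lower()],     # table: Low | Medium | High
    ]
    return PATTERNS[max(tiers)]
```

A helper like this cannot capture the rule's own counterexamples, such as a 15-file rename that needs no evaluator, which is why the result should stay advisory.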

package/base/rules/observable-code.md ADDED

@@ -0,0 +1,87 @@
+# Observable Code — Instrumentation Constraints
+# Scope: **/services/**, **/handlers/**, **/workers/**, **/middleware/**, **/cmd/**
+# Active during code generation for service-layer code
+# Cross-reference: api-quality.md covers request-level logging and correlation IDs.
+#   This rule covers application-level observability: structured logging, metrics, traces.
+Code is not production-ready if it cannot be debugged without attaching a debugger.
+Observable code tells you what happened, when, and why — from logs, metrics, and traces alone.
+## Invariants — BLOCK on violation
+### Structured logging only
+- All log statements use structured format (key-value pairs, not string interpolation)
+- No `fmt.Println` / `console.log` / `print()` in production code paths — use the structured logger
+- Log at the point of failure, not at the catch site (log once, propagate)
+### Log levels are semantic
+- ERROR: something failed and a human needs to know immediately
+- WARN: something unexpected happened but the system recovered
+- INFO: a significant state transition (service started, job completed, user authenticated)
+- DEBUG: internal detail useful during development — must not appear in production by default
+### Structured log format — mandatory fields
+```json
+{
+  "timestamp": "ISO-8601 UTC",
+  "level": "error|warn|info|debug",
+  "service": "service-name",
+  "correlation_id": "request or trace identifier",
+  "message": "what happened — actionable, not generic",
+  "context": { "relevant_key": "relevant_value" }
+}
+```
+### What NOT to log
+- Passwords, tokens, secrets, full credit card numbers
+- Full request/response bodies in production (may contain PII)
+- DEBUG logs in production services (log level must be configurable)
+- The same event more than once in the same request path
+### External call discipline
+- Every external call (HTTP, DB, queue) has a timeout
+- Every external call logs duration on completion
+- Failed external calls log: target, duration, error type, and whether retry will occur
+## Conventions — WARN on violation
+### Metrics naming
+Format: `{service}_{subsystem}_{name}_{unit}`
+All lowercase, underscores as separators.
+Mandatory metrics per service:
+- `{service}_requests_total` — counter, labeled by method and status code
+- `{service}_errors_total` — counter, labeled by error type
+- `{service}_latency_seconds` — histogram, labeled by operation
+- `{service}_active_connections` or `{service}_queue_depth` — gauge (if applicable)
+GOOD: `auth_login_attempts_total`, `cache_hit_ratio`, `queue_messages_pending`
+BAD: `loginAttempts`, `CacheHitRatio`, `queue-messages-pending`
+### Trace spans (OpenTelemetry)
+Span naming: `{service}/{operation}` — lowercase, slash separator
+GOOD: `auth/validate-token`, `db/query-users`, `cache/get`
+BAD: `validateToken`, `DB Query`, `GET /users`
+Mandatory span attributes:
+- `service.name`
+- `http.method` and `http.status_code` for HTTP operations
+- `db.system` and `db.operation` for database calls
+- `error.type` and `error.message` on error spans
+### Health endpoints
+- Liveness: `/healthz` — "is the process alive?"
+- Readiness: `/readyz` — "can the process serve traffic?"
+- Both return structured JSON with component status
+### The Observability Contract
+An error is only production-observable if ALL three are true:
+1. It appears in structured logs with correlation_id and context
+2. It increments an error metric labeled by error type
+3. It is captured in a trace span with error attributes
+If only one or two are true: the code is not fully observable. Fix before shipping.
+## Removal Condition
+Remove when an observability linter or OpenTelemetry SDK auto-instrumentation
+replaces these generation-time constraints entirely.
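
To make the mandatory log fields and the metrics naming convention in `observable-code.md` concrete, here is a small sketch using only the Python standard library. The helper names `build_log_record` and `is_valid_metric_name` are hypothetical, and the metric regex checks only the "all lowercase, underscores as separators" part of the convention, not the full `{service}_{subsystem}_{name}_{unit}` structure.

```python
import json
import re
from datetime import datetime, timezone

LEVELS = ("error", "warn", "info", "debug")
# Lowercase words joined by underscores, at least two segments.
METRIC_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")


def build_log_record(level: str, service: str, correlation_id: str,
                     message: str, **context: object) -> str:
    """Serialize one log event with the six mandatory fields."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": level,
        "service": service,
        "correlation_id": correlation_id,
        "message": message,
        "context": context,  # key-value pairs, never interpolated into message
    }
    return json.dumps(record)


def is_valid_metric_name(name: str) -> bool:
    """Check the lowercase-with-underscores part of the naming convention."""
    return METRIC_NAME.fullmatch(name) is not None
```

A call such as `build_log_record("error", "auth", "req-1", "token validation failed", user_id="42")` keeps the variable data in `context`, which is what distinguishes structured logging from string interpolation.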