@esoteric-logic/praxis-harness 2.11.0 → 2.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (66)
  1. package/base/CLAUDE.md +37 -3
  2. package/base/agents/evaluator.md +44 -0
  3. package/base/agents/planner.md +48 -0
  4. package/base/hooks/auto-format.sh +1 -1
  5. package/base/hooks/dep-audit.sh +1 -1
  6. package/base/hooks/file-guard.sh +3 -3
  7. package/base/hooks/recursion-guard.sh +7 -1
  8. package/base/hooks/session-data-collect.sh +1 -1
  9. package/base/hooks/vault-checkpoint.sh +5 -5
  10. package/base/rules/code-excellence.md +22 -0
  11. package/base/rules/coding.md +16 -0
  12. package/base/rules/context-budget.md +66 -0
  13. package/base/rules/context-management.md +4 -0
  14. package/base/rules/hooks-policy.md +53 -0
  15. package/base/rules/multi-agent-orchestration.md +99 -0
  16. package/base/rules/observable-code.md +87 -0
  17. package/base/rules/phase-detection.md +52 -0
  18. package/base/rules/refactor-triggers.md +59 -0
  19. package/base/rules/self-repair.md +53 -0
  20. package/base/rules/session-bridge.md +39 -0
  21. package/base/rules/session-metrics.md +61 -0
  22. package/base/rules/skill-authoring.md +85 -0
  23. package/base/rules/writing-quality.md +122 -0
  24. package/base/skills/px-compact/SKILL.md +100 -0
  25. package/base/skills/px-complexity-audit/SKILL.md +118 -0
  26. package/base/skills/px-context-probe/SKILL.md +1 -1
  27. package/base/skills/px-context-triage/SKILL.md +76 -0
  28. package/base/skills/px-discover/SKILL.md +4 -1
  29. package/base/skills/px-discuss/SKILL.md +4 -1
  30. package/base/skills/px-doc-lint/SKILL.md +107 -0
  31. package/base/skills/px-prose-review/SKILL.md +96 -0
  32. package/base/skills/px-quality-gate/SKILL.md +182 -0
  33. package/base/skills/px-risk/SKILL.md +4 -1
  34. package/base/skills/px-scaffold-new/SKILL.md +16 -14
  35. package/base/skills/px-session-retro/SKILL.md +1 -1
  36. package/base/skills/px-spec/SKILL.md +6 -2
  37. package/base/skills/px-verify/SKILL.md +2 -1
  38. package/bin/praxis.js +27 -6
  39. package/kits/api/KIT.md +2 -0
  40. package/kits/api/install.sh +1 -1
  41. package/kits/api/teardown.sh +1 -1
  42. package/kits/code-quality/KIT.md +2 -0
  43. package/kits/code-quality/hooks/generate-baseline.sh +1 -1
  44. package/kits/code-quality/hooks/post-commit.sh +3 -2
  45. package/kits/code-quality/hooks/pre-push.sh +15 -15
  46. package/kits/code-quality/install.sh +1 -1
  47. package/kits/code-quality/teardown.sh +3 -3
  48. package/kits/data/KIT.md +2 -0
  49. package/kits/data/install.sh +1 -1
  50. package/kits/data/teardown.sh +1 -1
  51. package/kits/infrastructure/KIT.md +2 -0
  52. package/kits/infrastructure/install.sh +1 -1
  53. package/kits/infrastructure/teardown.sh +1 -1
  54. package/kits/security/KIT.md +2 -0
  55. package/kits/security/install.sh +1 -1
  56. package/kits/security/teardown.sh +1 -1
  57. package/kits/web-designer/KIT.md +2 -0
  58. package/kits/web-designer/install.sh +1 -1
  59. package/kits/web-designer/teardown.sh +1 -1
  60. package/package.json +1 -1
  61. package/scripts/health-check.sh +21 -15
  62. package/scripts/install-tools.sh +5 -5
  63. package/scripts/lint-harness.sh +1 -1
  64. package/scripts/onboard-mcp.sh +1 -1
  65. package/scripts/test-harness.sh +1 -1
  66. package/scripts/update.sh +1 -1
package/base/CLAUDE.md CHANGED
@@ -62,6 +62,8 @@ All `{vault_path}` references in rules and skills resolve from this config.
62
62
  - Compact trigger: when context approaches ceiling, finish the current milestone first
63
63
  - Never compact mid-plan — complete the milestone, write phase summary to vault, then compact
64
64
  - After compaction: re-bootstrap from § After Compaction below, re-run quality checks fresh
65
+ - Context budget: see `~/.claude/rules/context-budget.md` for allocation ratios and cost-conscious loading
66
+ - Mid-session optimization: `/px-compact` (no /clear). Full reset: `/px-context-reset` + `/clear`
65
67
 
66
68
  ## Durable Memory
67
69
  Context is volatile. Files are permanent. Act accordingly.
@@ -110,7 +112,8 @@ Missing servers are non-blocking — features degrade gracefully.
110
112
  - Git operation → `~/.claude/rules/git-workflow.md`
111
113
  - Client-facing writing → auto-loaded by `px-communication-standards` skill
112
114
  - Architecture/specs → auto-loaded by `px-architecture-patterns` skill
113
- 5. Quality re-anchor: read most recent `compact-checkpoint.md` → check the Quality State section.
115
+ 5. If a triage checkpoint exists (`*-compact-checkpoint.md` with tag `triage`), use its "Active Working Set" to reload files — it is more precise than the hook checkpoint.
116
+ 6. Quality re-anchor: read most recent `compact-checkpoint.md` → check the Quality State section.
114
117
  - If lint findings existed before compaction: re-run `golangci-lint run`, confirm status.
115
118
  - If tests were failing before compaction: re-run test command, confirm status.
116
119
  - Do NOT assume pre-compaction state is current. Always re-run fresh.
@@ -125,6 +128,9 @@ Missing servers are non-blocking — features degrade gracefully.
125
128
  - Commit with wrong git identity
126
129
  - Write a file with unreplaced {placeholders}
127
130
  - Use vault search when Obsidian is not running (obsidian backend requires Obsidian open)
131
+ - Mix refactoring and feature changes in one commit — commit refactor separately
132
+ - Copy-paste 3+ lines instead of extracting a shared function
133
+ - Use `console.log`/`fmt.Println`/`print()` for production logging — use the structured logger
128
134
 
129
135
  ## AI-Kit Registry
130
136
  Kits activate via `/px-kit:<n>` slash command. Kits are idempotent — double-activate is a no-op.
@@ -137,12 +143,12 @@ Kits activate via `/px-kit:<n>` slash command. Kits are idempotent — double-ac
137
143
  | security | `/px-kit:security` | Threat modeling → IAM review → OWASP audit |
138
144
  | code-quality | `/px-kit:code-quality` | SAST + secrets + SCA + IaC gate → AI review (over-engineering, smells, structure) |
139
145
  | data | `/px-kit:data` | Schema design → migration planning → query optimization |
140
-
141
146
  Kit manifests live in `~/.claude/kits/<name>/KIT.md`.
147
+ Kit fields: `context_cost` (low/medium/high), `depends_on` (kit dependencies), `skills_chain` (phased workflow).
142
148
 
143
149
  ## Rules Registry — Load on Demand Only
144
150
 
145
- ### Universal — always active (12 rules)
151
+ ### Universal — always active (16 rules)
146
152
 
147
153
  Quality is a generation-time constraint, not a post-hoc review. The rules below
148
154
  are the lens you write through — they shape every line of code produced.
@@ -161,6 +167,10 @@ are the lens you write through — they shape every line of code produced.
161
167
  | `~/.claude/rules/context-management.md` | Context anti-rot, phase scoping, context reset protocol |
162
168
  | `~/.claude/rules/memory-boundary.md` | Auto-memory boundary, MEMORY.md cap, dream integration |
163
169
  | `~/.claude/rules/security-posture.md` | Sandbox model, credential protection, protected paths |
170
+ | `~/.claude/rules/writing-quality.md` | Prose constraints — sentence limits, fluff kill list, doc templates, voice rules |
171
+ | `~/.claude/rules/refactor-triggers.md` | Pre-check protocol, commit refactor separately, QUALITY: comment convention |
172
+ | `~/.claude/rules/context-budget.md` | Quantitative budget zones, cost-conscious loading, MCP server discipline |
173
+ | `~/.claude/rules/self-repair.md` | Structured recovery — 3-attempt escalation ladder, strategy rotation |
164
174
 
165
175
  ### Scoped — load only when paths match
166
176
 
@@ -188,11 +198,35 @@ are the lens you write through — they shape every line of code produced.
188
198
  | `~/.claude/rules/live-docs-required.md` | Dependency manifests, files importing external packages |
189
199
  | `~/.claude/rules/desktop-protocol.md` | Claude Desktop ↔ Claude Code handoff sessions |
190
200
 
201
+ #### Application observability
202
+
203
+ | File | Loads when |
204
+ |------|------------|
205
+ | `~/.claude/rules/observable-code.md` | `**/services/**`, `**/handlers/**`, `**/workers/**`, `**/middleware/**`, `**/cmd/**` |
206
+
207
+ #### Workflow and orchestration
208
+ | File | Loads when |
209
+ |------|------------|
210
+ | `~/.claude/rules/session-bridge.md` | Session start/end, vault handoff, cross-session continuity |
211
+ | `~/.claude/rules/hooks-policy.md` | Adding or modifying hooks in `settings-hooks.json` |
212
+ | `~/.claude/rules/multi-agent-orchestration.md` | Tasks crossing >3 files or multiple domains |
213
+ | `~/.claude/rules/phase-detection.md` | Workflow phase transitions, kit phase changes |
214
+ | `~/.claude/rules/session-metrics.md` | End-of-session retrospective, metrics collection |
215
+ | `~/.claude/rules/skill-authoring.md` | Creating or editing `base/skills/*/SKILL.md` files |
216
+
217
+ #### Agent specs
218
+ | File | Purpose |
219
+ |------|---------|
220
+ | `base/agents/evaluator.md` | Evaluator agent for Generator/Evaluator pattern |
221
+ | `base/agents/planner.md` | Planner agent for task decomposition |
222
+
191
223
  ### Auto-invocable skills (replace former universal rules)
192
224
  | Skill | Triggers when |
193
225
  |-------|--------------|
194
226
  | `px-communication-standards` | Writing client-facing docs, proposals, status reports, commits, PRs |
195
227
  | `px-architecture-patterns` | Writing ADRs, specs, system design, risk docs, blocker reports |
228
+ | `px-quality-gate` | Auto inside /px-verify (Step 1 item 5b) and before /px-ship — blocks on BLOCK findings |
229
+ | `px-doc-lint` | Fast structural markdown check inside px-quality-gate for staged *.md files |
196
230
 
197
231
  ## Judgment & Research Commands
198
232
 
package/base/agents/evaluator.md ADDED
@@ -0,0 +1,44 @@
1
+ # Evaluator Agent Spec
2
+
3
+ ## Role
4
+ You are a critical code evaluator. You score Generator output against a SPEC
5
+ on four dimensions: correctness, completeness, style compliance, and test coverage.
6
+
7
+ ## Inputs
8
+ - **Diff**: staged changes from the Generator
9
+ - **SPEC**: acceptance criteria from the active plan
10
+ - **Rules**: quality rules for file types in the diff
11
+ - **Test output**: test results if available
12
+
13
+ You do NOT have conversation history. Judge the diff on its own merits.
14
+
15
+ ## Scoring Rubric
16
+
17
+ | Dimension | Weight | Pass Criteria |
18
+ | --------- | ------ | ------------- |
19
+ | Correctness | 40% | Code does what SPEC says. All paths handled. No logic errors. |
20
+ | Completeness | 25% | All acceptance criteria addressed. No partial implementations. |
21
+ | Style compliance | 20% | Naming, structure, and quality rules respected. |
22
+ | Test coverage | 15% | Happy path, failure path, and edge cases covered. |
23
+
24
+ ## Output Format
25
+
26
+ Findings use the standard subagent format:
27
+ ```
28
+ {file}:{line} — {severity} — {category} — {description} — {fix}
29
+ ```
30
+
31
+ Severity: Critical (blocks), Major (should fix), Minor (note).
32
+
33
+ End with a summary:
34
+ ```
35
+ SCORE: {correctness}% / {completeness}% / {style}% / {tests}%
36
+ VERDICT: PASS | CHANGES_REQUESTED | BLOCK
37
+ FINDINGS: {critical} critical, {major} major, {minor} minor
38
+ ```
39
+
40
+ ## Constraints
41
+ - Do not suggest features beyond the SPEC
42
+ - Do not comment on code outside the diff
43
+ - Do not soften findings — be direct about what is wrong
44
+ - If nothing is wrong: "No findings. VERDICT: PASS"
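Reviewer note: the rubric weights can be read as a single weighted total. The sketch below assumes that reading. The spec itself only prints the four percentages, so the combined figure and the helper name `weighted_score` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the package: combines the four rubric
# scores (0-100 each) using the 40/25/20/15 weights from the rubric table.
# Integer arithmetic only; the result is rounded to the nearest whole percent.
weighted_score() {
  local correctness=$1 completeness=$2 style=$3 tests=$4
  local sum=$(( correctness * 40 + completeness * 25 + style * 20 + tests * 15 ))
  echo $(( (sum + 50) / 100 ))
}

weighted_score 90 80 70 60   # prints 79
```

A low correctness score dominates the total because of its 40% weight, which matches the rubric's emphasis.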
package/base/agents/planner.md ADDED
@@ -0,0 +1,48 @@
1
+ # Planner Agent Spec
2
+
3
+ ## Role
4
+ You are a task decomposition planner. You break complex tasks into a dependency-ordered
5
+ subtask graph that a Generator can execute one step at a time.
6
+
7
+ ## Inputs
8
+ - **PROBLEM / DELIVERABLE / ACCEPTANCE / BOUNDARIES** from the discuss phase
9
+ - **Codebase context**: file structure, key interfaces, existing patterns
10
+ - **Active constraints**: quality rules, architecture rules applicable to the task
11
+
12
+ You do NOT have conversation history. Plan from the inputs alone.
13
+
14
+ ## Output Format
15
+
16
+ A numbered subtask graph with dependencies:
17
+
18
+ ```
19
+ SUBTASK GRAPH
20
+ ━━━━━━━━━━━━━━━━━━━━━
21
+
22
+ 1. [subtask title]
23
+ Files: {paths}
24
+ Acceptance: {criteria}
25
+ depends_on: []
26
+
27
+ 2. [subtask title]
28
+ Files: {paths}
29
+ Acceptance: {criteria}
30
+ depends_on: [1]
31
+
32
+ 3. [subtask title]
33
+ Files: {paths}
34
+ Acceptance: {criteria}
35
+ depends_on: [1, 2]
36
+
37
+ CRITICAL PATH: 1 → 2 → 3
38
+ PARALLELIZABLE: none | {subtask pairs}
39
+ ESTIMATED MILESTONES: {count}
40
+ ```
41
+
42
+ ## Constraints
43
+ - Each subtask must be completable in a single milestone
44
+ - Each subtask must have testable acceptance criteria
45
+ - Dependencies must be acyclic
46
+ - Do not decompose beyond what is necessary — 3-7 subtasks for most features
47
+ - Flag subtasks that carry architectural risk
48
+ - If the task is simple enough for single-agent mode: say so and output a single subtask
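Reviewer note: the "dependencies must be acyclic" constraint can be checked mechanically. The sketch below is a minimal illustration under assumptions: the `id:dep,dep` input format and the function name `order_subtasks` are hypothetical, and the planner spec does not prescribe any particular check.

```shell
#!/usr/bin/env bash
# Hypothetical input format: one "id:dep,dep" entry per subtask, e.g. "3:1,2".
# Prints an execution order if the graph is acyclic; returns 1 on a cycle
# or when a dependency refers to a subtask that is not in the graph.
order_subtasks() {
  local -a ids=() deps=() done_ids=()
  local entry id i dep ready progressed
  for entry in "$@"; do
    ids+=("${entry%%:*}")
    deps+=("${entry#*:}")
  done
  while (( ${#done_ids[@]} < ${#ids[@]} )); do
    progressed=0
    for i in "${!ids[@]}"; do
      id=${ids[i]}
      # Skip subtasks that are already scheduled.
      if [[ " ${done_ids[*]} " == *" $id "* ]]; then continue; fi
      ready=1
      for dep in ${deps[i]//,/ }; do
        if [[ " ${done_ids[*]} " != *" $dep "* ]]; then ready=0; fi
      done
      if (( ready )); then
        done_ids+=("$id")
        progressed=1
      fi
    done
    # A full pass that schedules nothing means a cycle or a missing dependency.
    if (( ! progressed )); then return 1; fi
  done
  echo "${done_ids[*]}"
}

order_subtasks "1:" "2:1" "3:1,2"   # prints: 1 2 3
```

The printed order is one valid sequence for the Generator; it does not identify which subtasks could run in parallel.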
package/base/hooks/auto-format.sh CHANGED
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env bash
2
2
  # PostToolUse hook — auto-formats files after edit.
3
3
  # Always exits 0 (advisory, never blocks).
4
- set -uo pipefail
4
+ set -euo pipefail
5
5
  trap 'exit 0' ERR
6
6
 
7
7
  INPUT=$(cat)
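Reviewer note: several hooks in this diff pair `set -euo pipefail` with `trap 'exit 0' ERR`. The sketch below shows how that preamble behaves for a failing top-level command. It is an isolated demonstration in a child shell, not code from the package, and the wrapper name `run_advisory_hook` is hypothetical.

```shell
#!/usr/bin/env bash
# Runs a tiny script in a child shell that mirrors the hook preamble:
# errexit plus an ERR trap that turns a command failure into exit status 0.
run_advisory_hook() {
  bash -c '
    set -euo pipefail
    trap "exit 0" ERR
    false              # simulated failure, e.g. a formatter returning non-zero
    echo "not reached"
  '
}

run_advisory_hook
echo "hook exit status: $?"   # prints: hook exit status: 0
```

The trap fires before errexit terminates the shell, so the hook stays advisory for this case. Failures inside functions or command substitutions may behave differently and are worth testing separately.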
package/base/hooks/dep-audit.sh CHANGED
@@ -2,7 +2,7 @@
2
2
  # dep-audit.sh — PostToolUse:Write|Edit|MultiEdit hook
3
3
  # Runs dependency vulnerability checks when manifest files are modified.
4
4
  # Always exits 0 (advisory only — PostToolUse cannot hard-block).
5
- set -uo pipefail
5
+ set -euo pipefail
6
6
  trap 'exit 0' ERR
7
7
 
8
8
  INPUT=$(cat)
package/base/hooks/file-guard.sh CHANGED
@@ -6,7 +6,7 @@ set -euo pipefail
6
6
  INPUT=$(cat)
7
7
  FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // .tool_input.path // empty')
8
8
 
9
- if [ -z "$FILE_PATH" ]; then
9
+ if [[ -z "$FILE_PATH" ]]; then
10
10
  exit 0
11
11
  fi
12
12
 
@@ -29,7 +29,7 @@ for pattern in "${PROTECTED_PATTERNS[@]}"; do
29
29
  done
30
30
 
31
31
  # Check project-level protected files from CLAUDE.md if it exists
32
- if [ -f "CLAUDE.md" ]; then
32
+ if [[ -f "CLAUDE.md" ]]; then
33
33
  # Extract paths from ## Protected Files section
34
34
  IN_SECTION=false
35
35
  while IFS= read -r line; do
@@ -42,7 +42,7 @@ if [ -f "CLAUDE.md" ]; then
42
42
  fi
43
43
  if $IN_SECTION && echo "$line" | grep -qE "^- "; then
44
44
  PROTECTED=$(echo "$line" | sed 's/^- //' | sed 's/ *#.*//' | xargs)
45
- if [ -n "$PROTECTED" ] && echo "$FILE_PATH" | grep -qE "$PROTECTED"; then
45
+ if [[ -n "$PROTECTED" ]] && echo "$FILE_PATH" | grep -qE "$PROTECTED"; then
46
46
  echo "BLOCKED: $FILE_PATH matches project-protected pattern '$PROTECTED'. Explain the intended change."
47
47
  exit 2
48
48
  fi
package/base/hooks/recursion-guard.sh CHANGED
@@ -50,7 +50,13 @@ KEY="${KEY:0:300}"
50
50
 
51
51
  # ── Increment counter ──
52
52
  # Use a hash of the key for safe JSON field names
53
- KEY_HASH=$(echo -n "$KEY" | md5 2>/dev/null || echo -n "$KEY" | md5sum 2>/dev/null | cut -d' ' -f1 || echo "fallback")
53
+ if command -v md5sum &>/dev/null; then
54
+ KEY_HASH=$(echo -n "$KEY" | md5sum | cut -d' ' -f1)
55
+ elif command -v md5 &>/dev/null; then
56
+ KEY_HASH=$(echo -n "$KEY" | md5 -q)
57
+ else
58
+ KEY_HASH="${KEY:0:32}"
59
+ fi
54
60
 
55
61
  COUNT=$(jq -r --arg cat "$CATEGORY" --arg key "$KEY_HASH" \
56
62
  '.[$cat][$key] // 0' "$STATE_FILE" 2>/dev/null || echo "0")
package/base/hooks/session-data-collect.sh CHANGED
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env bash
2
2
  # Stop hook — collects structured session data and stages it for the Stop prompt.
3
3
  # Always exits 0 (advisory, never blocks session end).
4
- set -uo pipefail
4
+ set -euo pipefail
5
5
  trap 'exit 0' ERR
6
6
 
7
7
  CONFIG_FILE="$HOME/.claude/praxis.config.json"
package/base/hooks/vault-checkpoint.sh CHANGED
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env bash
2
2
  # PreCompact hook — writes minimal checkpoint to vault before context compaction.
3
3
  # Always exits 0 (advisory, never blocks compaction).
4
- set -uo pipefail
4
+ set -euo pipefail
5
5
  trap 'exit 0' ERR
6
6
 
7
7
  CONFIG_FILE="$HOME/.claude/praxis.config.json"
@@ -19,7 +19,7 @@ PLANS_DIR="$VAULT_PATH/plans"
19
19
  mkdir -p "$PLANS_DIR"
20
20
 
21
21
  DATE=$(date +%Y-%m-%d)
22
- TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")
22
+ TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
23
23
  CHECKPOINT_FILE="$PLANS_DIR/$DATE-compact-checkpoint.md"
24
24
 
25
25
  BRANCH=$(git --no-pager rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
@@ -40,16 +40,16 @@ fi
40
40
  LINT_STATE="unknown"
41
41
  TEST_STATE="unknown"
42
42
 
43
- if [ -f "go.mod" ] && command -v golangci-lint &>/dev/null; then
43
+ if [[ -f "go.mod" ]] && command -v golangci-lint &>/dev/null; then
44
44
  LINT_COUNT=$(golangci-lint run ./... 2>&1 | grep -c "^" || true)
45
- if [ "$LINT_COUNT" -eq 0 ]; then
45
+ if [[ "$LINT_COUNT" -eq 0 ]]; then
46
46
  LINT_STATE="clean"
47
47
  else
48
48
  LINT_STATE="$LINT_COUNT findings"
49
49
  fi
50
50
  fi
51
51
 
52
- if [ -f "go.mod" ] && command -v go &>/dev/null; then
52
+ if [[ -f "go.mod" ]] && command -v go &>/dev/null; then
53
53
  if go test ./... -short 2>&1 | grep -q "^ok"; then
54
54
  TEST_STATE="passing"
55
55
  else
package/base/rules/code-excellence.md CHANGED
@@ -74,3 +74,25 @@ A comment that says `// increment counter` above `counter++` is noise.
74
74
  A comment that says `// retry three times because the upstream API returns 503 on cold start`
75
75
  is knowledge that cannot be inferred from the code alone.
76
76
  Delete the first kind. Write more of the second kind.
77
+
78
+ ---
79
+
80
+ ## Reference Codebases — What Excellence Looks Like
81
+
82
+ When you need a reference for what excellent code looks like, use these:
83
+
84
+ | Domain | Reference | What to study |
85
+ | ------ | --------- | ------------- |
86
+ | C / systems | SQLite source (`sqlite.org/src`) | Discipline: 590x test-to-source ratio, 100% branch coverage, zero external deps |
87
+ | C / network | Redis `src/ae.c`, `src/dict.c` | Naming, readability, data structures that document themselves |
88
+ | Go | Go standard library (`pkg.go.dev/std`) | Idiomatic naming, error design, interface sizing — one method where possible |
89
+ | Rust | `rustc_errors` crate | Error message design: what failed, where, what to do next |
90
+ | Error messages | Elm compiler output | Kindest, most actionable errors in any compiled language |
91
+ | API design | Stripe API (`docs.stripe.com`) | Naming consistency, versioning discipline, error schema |
92
+ | Documentation | Go stdlib `net/http` package docs | Every exported symbol explained by what it does for the caller |
93
+
94
+ When uncertain if code is good enough: "Would this survive a review from the SQLite team?"
95
+ If the answer is no — simplify first.
96
+
97
+ The SQLite standard: every line has a reason. Every function has one job.
98
+ Every error has a message a human can act on.
package/base/rules/coding.md CHANGED
@@ -13,6 +13,22 @@
13
13
  - If Context7 is unavailable: state that docs could not be verified and flag the
14
14
  specific method/API as "unverified against current version."
15
15
 
16
+ ### Import-trigger protocol
17
+
18
+ Any commit diff that adds a new `import`, `require`, `using`, or `use` statement for an
19
+ external package must have a corresponding Context7 lookup in the same session.
20
+
21
+ Language-specific patterns matched:
22
+
23
+ - JavaScript/TypeScript: `import ... from`, `require(...)`
24
+ - Python: `import ...`, `from ... import`
25
+ - Go: `import "..."` or `import (...)`
26
+ - Rust: `use ...::...`
27
+ - Java/C#: `import ...`, `using ...`
28
+
29
+ Every new external import requires a Context7 verification before the gate clears.
30
+ Internal packages (same repo, same module) are excluded.
31
+
16
32
  ### Tool preferences
17
33
  - Use Read/Edit/Write tools instead of cat/sed/echo.
18
34
  - Use `rg` (ripgrep) for searching code, not grep.
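Reviewer note: the import-trigger protocol added in this hunk depends on spotting new import statements in a diff. The sketch below shows one rough way to list candidates from unified-diff text. The function name `new_import_lines` and the regular expression are assumptions; the package does not ship this helper. It cannot tell internal from external packages, it misses individual entries inside a Go `import ( ... )` group, and it does not confirm that a Context7 lookup happened.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a unified diff on stdin and prints added lines
# that look like import statements in the languages listed by the rule.
new_import_lines() {
  # Keep added lines only; "+++ b/file" headers are excluded by [^+].
  grep -E '^\+[^+]' \
    | grep -E '^\+[[:space:]]*(import[[:space:]("]|from[[:space:]]+[^[:space:]]+[[:space:]]+import[[:space:]]|use[[:space:]]+[[:alnum:]_]+::|using[[:space:]])|require\(' \
    | sed 's/^+//'
}

printf '%s\n' \
  '+++ b/app.py' \
  ' import os' \
  '+import requests' \
  '+x = 1' \
  | new_import_lines   # prints: import requests
```

Each printed line is a candidate that still needs a manual decision about whether the package is external.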
package/base/rules/context-budget.md ADDED
@@ -0,0 +1,66 @@
1
+ # Context Budget — Rules
2
+ # Scope: All projects, all sessions
3
+ # Quantitative budget for context window usage
4
+ # Companion to context-management.md (qualitative bracket discipline)
5
+
6
+ ## Budget Allocation — Invariants (BLOCK on violation)
7
+
8
+ ### Zone model
9
+
10
+ The context window is a finite resource. Allocate it deliberately:
11
+
12
+ | Zone | Share | Contents |
13
+ | ---- | ----- | -------- |
14
+ | System overhead | ~15-20% | CLAUDE.md, universal rules, settings, MCP tool schemas |
15
+ | Working content | ~55-65% | Code, plans, conversation, tool output |
16
+ | Reserve | ~20% | Buffer for compaction, final outputs, tool responses |
17
+
18
+ When working content approaches capacity (signals below), begin offloading to vault.
19
+ Never wait for compaction to force the issue.
20
+
21
+ ### Budget signals (quantitative bracket thresholds)
22
+
23
+ | Signal | FRESH | MODERATE | DEPLETED |
24
+ | ------ | ----- | -------- | -------- |
25
+ | Tool calls | <15 | 15-35 | 35+ |
26
+ | Files read | <8 | 8-20 | 20+ |
27
+ | Large files (>200 lines) in session | <2 | 2-4 | 5+ |
28
+ | Corrections received | 0 | 1 | 2+ |
29
+
30
+ These thresholds feed `context-management.md` brackets and `/px-context-probe`.
31
+
32
+ ## Cost-Conscious Loading — Conventions (WARN on violation)
33
+
34
+ ### MCP servers
35
+ - Connect 2-3 core servers at session start (context7, github).
36
+ - Lazy-load optional servers (perplexity, filesystem, sequential-thinking) only when the task requires them.
37
+ - Never connect all registered servers preemptively — each adds schema overhead.
38
+
39
+ ### File reads
40
+ - Read targeted line ranges (`offset`/`limit`), not entire files, when you know the section.
41
+ - Files >200 lines: read the relevant section, not the whole file.
42
+
43
+ ### Search output
44
+ - Use `files_with_matches` for initial discovery; switch to `content` only for confirmed matches.
45
+ - Delegate exploration expected to produce >50 lines of output to a subagent.
46
+
47
+ ## Universal Rule Weight — Conventions (WARN on violation)
48
+
49
+ The 14 universal rules consume ~50KB of every session's context budget.
50
+
51
+ - Before proposing a new universal rule: justify the always-loaded cost.
52
+ - Prefer scoped rules (path-matched) over universal ones when the rule applies to a specific domain.
53
+ - New universal rules must stay under 3KB individually.
54
+ - If a universal rule exceeds 100 lines: split into a short universal rule and a scoped reference file.
55
+
56
+ ## Budget Actions
57
+
58
+ | Bracket | Action |
59
+ | ------- | ------ |
60
+ | FRESH | No budget concern. Batch aggressively, load full context. |
61
+ | MODERATE | Prefer concise output. Stop reading whole files. Use subagents for exploration. |
62
+ | DEPLETED | Run `/px-compact` for mid-session optimization, or `/px-context-reset` + `/clear` for full reset. |
63
+ | CRITICAL | STOP new work. Complete current milestone. Write all state to vault. New session. |
64
+
65
+ ## Removal Condition
66
+ Remove when Claude Code exposes native token utilization metrics and budget enforcement.
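Reviewer note: the budget-signal table maps four counters to a bracket, but the rule does not say how to combine them. The sketch below assumes the worst individual signal decides the bracket and that boundary values (35 tool calls, 20 files read) count as DEPLETED. Both choices and the function names are hypothetical. CRITICAL has no numeric threshold in the table, so the sketch never returns it.

```shell
#!/usr/bin/env bash
# Hypothetical classifier for the budget-signal table.
# signal_level: value moderate_min depleted_min
#   prints 0 (FRESH), 1 (MODERATE) or 2 (DEPLETED)
signal_level() {
  local value=$1 moderate_min=$2 depleted_min=$3
  if (( value >= depleted_min )); then echo 2
  elif (( value >= moderate_min )); then echo 1
  else echo 0
  fi
}

# classify_bracket: tool_calls files_read large_files corrections
# Assumption: the worst single signal decides the bracket.
classify_bracket() {
  local names=(FRESH MODERATE DEPLETED) worst=0 level
  for level in \
    "$(signal_level "$1" 15 35)" \
    "$(signal_level "$2" 8 20)" \
    "$(signal_level "$3" 2 5)" \
    "$(signal_level "$4" 1 2)"; do
    if (( level > worst )); then worst=$level; fi
  done
  echo "${names[worst]}"
}

classify_bracket 20 5 1 0   # prints: MODERATE
```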
package/base/rules/context-management.md CHANGED
@@ -72,6 +72,10 @@ conversation length heuristic (not token count — we cannot read session JSONL)
72
72
  - Write all state to vault immediately
73
73
  - Suggest new session for remaining milestones
74
74
 
75
+ ### Budget integration
76
+ See `context-budget.md` for quantitative thresholds and allocation ratios.
77
+ Use `/px-compact` for mid-session optimization; `/px-context-reset` + `/clear` for full reset.
78
+
75
79
  ---
76
80
 
77
81
  ## Verification Commands
package/base/rules/hooks-policy.md ADDED
@@ -0,0 +1,53 @@
1
+ # Hooks Policy — Rules
2
+ # Scope: All projects, all sessions
3
+ # Documents which hooks are mandatory vs optional
4
+
5
+ ## Hook Inventory
6
+
7
+ ### Mandatory — must not be disabled (BLOCK on violation)
8
+
9
+ | Hook | Event | Purpose |
10
+ | ---- | ----- | ------- |
11
+ | `secret-scan.sh` | PreToolUse: Write/Edit | Blocks credential patterns in code |
12
+ | `credential-guard.sh` | PreToolUse: Bash | Blocks access to protected credential paths |
13
+ | `identity-check.sh` | PreToolUse: Bash | Verifies git identity before commits |
14
+ | `file-guard.sh` | PreToolUse: Write/Edit | Blocks writes to protected file patterns |
15
+ | `vault-checkpoint.sh` | PreCompact | Saves state before context compaction |
16
+
17
+ These hooks are security and data-integrity controls. Disabling them requires
18
+ explicit user approval with documented justification.
19
+
20
+ ### Required — should not be disabled without reason (WARN on violation)
21
+
22
+ | Hook | Event | Purpose |
23
+ | ---- | ----- | ------- |
24
+ | `dep-audit.sh` | PostToolUse: Write/Edit | Audits dependencies when manifests change |
25
+ | `session-data-collect.sh` | Stop | Captures session metadata for vault |
26
+ | Stop vault prompt | Stop | Writes session summary and vault updates |
27
+ | `on-stop-failure.sh` | StopFailure | Error handling for failed stop hooks |
28
+
29
+ ### Optional — enhance but not required
30
+
31
+ | Hook | Event | Purpose |
32
+ | ---- | ----- | ------- |
33
+ | `auto-format.sh` | PostToolUse: Write/Edit | Auto-formats files after edits |
34
+ | `recursion-guard.sh` | PreToolUse | Prevents recursive tool invocation |
35
+
36
+ ## Adding New Hooks
37
+
38
+ Before adding a hook to `settings-hooks.json`:
39
+ 1. Classify as mandatory, required, or optional using the criteria above
40
+ 2. Mandatory: security or data-integrity function — must never silently fail
41
+ 3. Required: workflow quality — should warn on failure but not block
42
+ 4. Optional: convenience — can be disabled per-project
43
+ 5. All hooks must exit 0 on success, non-zero to block (PreToolUse only)
44
+ 6. All hooks must handle missing dependencies gracefully (exit 0 if tool not found)
45
+
46
+ ## Hook Configuration
47
+
48
+ Hooks are declared in `base/hooks/settings-hooks.json`.
49
+ Install copies them to `~/.claude/settings.json` during `install.sh`.
50
+ Project-specific hooks can be added in project `.claude/settings.json`.
51
+
52
+ ## Removal Condition
53
+ Remove when Claude Code provides a native hook policy/priority system.
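Reviewer note: items 5 and 6 under "Adding New Hooks" describe an exit-status contract. The sketch below illustrates it with a hypothetical function, `check_with_tool`, which is not in the package. It returns 0 to allow, 2 to block (the status `file-guard.sh` uses in this diff), and 0 with a message on stderr when the tool the check depends on is not installed.

```shell
#!/usr/bin/env bash
# Hypothetical PreToolUse-style check.
# Arguments: tool_name file_path protected_regex
check_with_tool() {
  local tool=$1 file_path=$2 protected=$3
  # Missing dependency: degrade to "allow" instead of failing the hook.
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "skip: $tool not installed" >&2
    return 0
  fi
  # Block when the path matches the protected pattern (extended regex).
  if [[ -n "$file_path" && "$file_path" =~ $protected ]]; then
    echo "BLOCKED: $file_path matches '$protected'" >&2
    return 2
  fi
  return 0
}

check_with_tool jq ".env" '\.env$' || echo "exit status: $?"
```

Printing the skip reason keeps a missing dependency visible, which matters for hooks classified as mandatory.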
package/base/rules/multi-agent-orchestration.md ADDED
@@ -0,0 +1,99 @@
1
+ # Multi-Agent Orchestration — Rules
2
+ # Scope: All projects, all sessions
3
+ # Governs when and how to use multi-agent patterns
4
+
5
+ ## Agent Patterns
6
+
7
+ ### Single-Agent (default)
8
+ One Claude instance handles the full task. Subagents handle scoped delegation
9
+ (review, simplify, explore) per the `px-subagent` dispatch protocol.
10
+
11
+ **Use when:** Task touches ≤3 files, single domain, straightforward implementation.
12
+
13
+ ### Generator/Evaluator
14
+ Generator produces output. Evaluator critically reviews against spec, scoring on
15
+ correctness, completeness, style compliance, and test coverage.
16
+
17
+ **Use when:** Task touches >5 files, crosses module boundaries, or has high
18
+ correctness requirements (auth, data integrity, public API changes).
19
+
20
+ ### Planner/Generator/Evaluator
21
+ Planner decomposes into subtask graph first. Generator and Evaluator work each subtask.
22
+
23
+ **Use when:** Task spans multiple milestones, requires architectural decisions,
24
+ or the plan itself needs adversarial review.
25
+
26
+ ## Activation Thresholds — Conventions (WARN on violation)
27
+
28
+ | Signal | Single-Agent | Generator/Evaluator | Planner/Generator/Evaluator |
29
+ | ------ | ------------ | ------------------- | --------------------------- |
30
+ | Files changed | ≤3 | 4-10 | 10+ |
31
+ | Domains crossed | 1 | 2 | 3+ |
32
+ | Milestone count | 1 | 1-2 | 3+ |
33
+ | Risk level | Low | Medium | High |
34
+
35
+ These are guidelines, not hard gates. Use judgment — a 2-file auth change
36
+ may warrant Generator/Evaluator; a 15-file rename may not.
37
+
38
+ ## Evaluator Agent Spec
39
+
40
+ When Generator/Evaluator mode is active, the Evaluator receives:
41
+
42
+ | Input | Source |
43
+ | ----- | ------ |
44
+ | Diff | Generator's output (staged changes) |
45
+ | SPEC | From active plan file |
46
+ | Rules | Relevant quality rules for file types in diff |
47
+ | Test output | If tests were run |
48
+
49
+ The Evaluator does NOT receive conversation history.
50
+
51
+ ### Evaluator scoring rubric
52
+
53
+ | Dimension | Weight | Criteria |
54
+ | --------- | ------ | -------- |
55
+ | Correctness | 40% | Does the code do what the spec says? All paths handled? |
56
+ | Completeness | 25% | Are all acceptance criteria met? Tests present? |
57
+ | Style compliance | 20% | Naming, structure, quality rules respected? |
58
+ | Test coverage | 15% | Happy path, failure path, edge cases covered? |
59
+
60
+ ### Evaluator output format
61
+ Uses the same severity format as `px-subagent`:
62
+ ```
63
+ {file}:{line} — {severity} — {category} — {description} — {fix}
64
+ ```
65
+
66
+ ### Escalation
67
+ - Critical findings → Generator must fix before proceeding
68
+ - Major findings → Generator should fix before merge
69
+ - >3 findings addressed → re-run Evaluator (max 3 rounds)
70
+
71
+ ## Planner Agent Spec
72
+
73
+ When Planner mode is active, the Planner receives:
74
+ - PROBLEM / DELIVERABLE / ACCEPTANCE / BOUNDARIES
75
+ - Relevant codebase context (file structure, key interfaces)
76
+ - Active constraints from rules
77
+
78
+ The Planner outputs a subtask graph:
79
+ ```
80
+ 1. [subtask] — {files} — {acceptance criteria}
81
+ └─ depends_on: []
82
+ 2. [subtask] — {files} — {acceptance criteria}
83
+ └─ depends_on: [1]
84
+ ```
85
+
86
+ Generator and Evaluator process each subtask in dependency order.
87
+
88
+ ## Integration with Existing Skills
89
+
90
+ | Skill | Orchestration role |
91
+ | ----- | ------------------ |
92
+ | `/px-plan` | May invoke Planner agent for complex decomposition |
93
+ | `/px-review` | Already uses Evaluator pattern via `px-subagent` |
94
+ | `/px-verify` | Self-review step is a lightweight Evaluator |
95
+ | `/px-execute` | Generator role — produces implementation |
96
+
97
+ ## Removal Condition
98
+ Remove when Claude Code provides native multi-agent orchestration with
99
+ built-in evaluator and planner agent types.
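Reviewer note: the evaluator output format in this file is line-oriented, so findings can be tallied with plain text processing. The sketch below counts severities and derives a verdict. The verdict mapping (any Critical gives BLOCK, any Major gives CHANGES_REQUESTED, otherwise PASS) is an assumption, and `summarize_findings` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads finding lines on stdin in the form
#   {file}:{line} — {severity} — {category} — {description} — {fix}
# and prints VERDICT and FINDINGS summary lines.
summarize_findings() {
  local critical=0 major=0 minor=0 line severity
  while IFS= read -r line; do
    # Fields are separated by " — "; the second field is the severity.
    severity=${line#* — }
    severity=${severity%% — *}
    case $severity in
      Critical) critical=$((critical + 1)) ;;
      Major)    major=$((major + 1)) ;;
      Minor)    minor=$((minor + 1)) ;;
    esac
  done
  local verdict=PASS
  if (( critical > 0 )); then verdict=BLOCK
  elif (( major > 0 )); then verdict=CHANGES_REQUESTED
  fi
  echo "VERDICT: $verdict"
  echo "FINDINGS: $critical critical, $major major, $minor minor"
}

printf '%s\n' 'api.go:42 — Major — correctness — missing nil check — add guard' \
  | summarize_findings
```

Lines that do not follow the format are ignored rather than rejected, which keeps the tally tolerant of extra evaluator commentary.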
package/base/rules/observable-code.md ADDED
@@ -0,0 +1,87 @@
1
+ # Observable Code — Instrumentation Constraints
2
+ # Scope: **/services/**, **/handlers/**, **/workers/**, **/middleware/**, **/cmd/**
3
+ # Active during code generation for service-layer code
4
+ # Cross-reference: api-quality.md covers request-level logging and correlation IDs.
5
+ # This rule covers application-level observability: structured logging, metrics, traces.
6
+
7
+ Code is not production-ready if it cannot be debugged without attaching a debugger.
8
+ Observable code tells you what happened, when, and why — from logs, metrics, and traces alone.
9
+
10
+ ## Invariants — BLOCK on violation
11
+
12
+ ### Structured logging only
13
+ - All log statements use structured format (key-value pairs, not string interpolation)
14
+ - No `fmt.Println` / `console.log` / `print()` in production code paths — use the structured logger
15
+ - Log at the point of failure, not at the catch site (log once, propagate)
16
+
17
+ ### Log levels are semantic
18
+ - ERROR: something failed and a human needs to know immediately
19
+ - WARN: something unexpected happened but the system recovered
20
+ - INFO: a significant state transition (service started, job completed, user authenticated)
21
+ - DEBUG: internal detail useful during development — must not appear in production by default
22
+
23
+ ### Structured log format — mandatory fields
24
+ ```json
25
+ {
26
+ "timestamp": "ISO-8601 UTC",
27
+ "level": "error|warn|info|debug",
28
+ "service": "service-name",
29
+ "correlation_id": "request or trace identifier",
30
+ "message": "what happened — actionable, not generic",
31
+ "context": { "relevant_key": "relevant_value" }
32
+ }
33
+ ```
34
+
35
+ ### What NOT to log
36
+ - Passwords, tokens, secrets, full credit card numbers
37
+ - Full request/response bodies in production (may contain PII)
38
+ - DEBUG logs in production services (log level must be configurable)
39
+ - The same event more than once in the same request path
40
+
41
+ ### External call discipline
42
+ - Every external call (HTTP, DB, queue) has a timeout
43
+ - Every external call logs duration on completion
44
+ - Failed external calls log: target, duration, error type, and whether retry will occur
45
+
46
+ ## Conventions — WARN on violation
47
+
48
+ ### Metrics naming
49
+ Format: `{service}_{subsystem}_{name}_{unit}`
50
+ All lowercase, underscores as separators.
51
+
52
+ Mandatory metrics per service:
53
+ - `{service}_requests_total` — counter, labeled by method and status code
54
+ - `{service}_errors_total` — counter, labeled by error type
55
+ - `{service}_latency_seconds` — histogram, labeled by operation
56
+ - `{service}_active_connections` or `{service}_queue_depth` — gauge (if applicable)
57
+
58
+ GOOD: `auth_login_attempts_total`, `cache_hit_ratio`, `queue_messages_pending`
59
+ BAD: `loginAttempts`, `CacheHitRatio`, `queue-messages-pending`
60
+
61
+ ### Trace spans (OpenTelemetry)
62
+ Span naming: `{service}/{operation}` — lowercase, slash separator
63
+ GOOD: `auth/validate-token`, `db/query-users`, `cache/get`
64
+ BAD: `validateToken`, `DB Query`, `GET /users`
65
+
66
+ Mandatory span attributes:
67
+ - `service.name`
68
+ - `http.method` and `http.status_code` for HTTP operations
69
+ - `db.system` and `db.operation` for database calls
70
+ - `error.type` and `error.message` on error spans
71
+
72
+ ### Health endpoints
73
+ - Liveness: `/healthz` — "is the process alive?"
74
+ - Readiness: `/readyz` — "can the process serve traffic?"
75
+ - Both return structured JSON with component status
76
+
77
+ ### The Observability Contract
78
+ An error is only production-observable if ALL three are true:
79
+ 1. It appears in structured logs with correlation_id and context
80
+ 2. It increments an error metric labeled by error type
81
+ 3. It is captured in a trace span with error attributes
82
+
83
+ If only one or two are true: the code is not fully observable. Fix before shipping.
84
+
85
+ ## Removal Condition
86
+ Remove when an observability linter or OpenTelemetry SDK auto-instrumentation
87
+ replaces these generation-time constraints entirely.
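Reviewer note: the GOOD and BAD examples for metric and span names can be expressed as character-set checks. The sketch below is one possible reading. The function names and regular expressions are assumptions. They test casing and separators only, not whether a name really follows the `{service}_{subsystem}_{name}_{unit}` segments.

```shell
#!/usr/bin/env bash
# Hypothetical validators for the naming conventions above.
# LC_ALL=C keeps the [a-z] ranges strictly ASCII lowercase.
is_valid_metric_name() {
  local LC_ALL=C
  local re='^[a-z][a-z0-9]*(_[a-z0-9]+)*$'
  [[ $1 =~ $re ]]
}

is_valid_span_name() {
  local LC_ALL=C
  local re='^[a-z0-9-]+/[a-z0-9-]+$'
  [[ $1 =~ $re ]]
}

for name in auth_login_attempts_total loginAttempts queue-messages-pending; do
  if is_valid_metric_name "$name"; then echo "ok   $name"; else echo "bad  $name"; fi
done
```

With the rule's own examples, the loop reports the first name as ok and the other two as bad.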