@esoteric-logic/praxis-harness 2.11.0 → 2.13.0
- package/base/CLAUDE.md +37 -3
- package/base/agents/evaluator.md +44 -0
- package/base/agents/planner.md +48 -0
- package/base/hooks/auto-format.sh +1 -1
- package/base/hooks/dep-audit.sh +1 -1
- package/base/hooks/file-guard.sh +3 -3
- package/base/hooks/recursion-guard.sh +7 -1
- package/base/hooks/session-data-collect.sh +1 -1
- package/base/hooks/vault-checkpoint.sh +5 -5
- package/base/rules/code-excellence.md +22 -0
- package/base/rules/coding.md +16 -0
- package/base/rules/context-budget.md +66 -0
- package/base/rules/context-management.md +4 -0
- package/base/rules/hooks-policy.md +53 -0
- package/base/rules/multi-agent-orchestration.md +99 -0
- package/base/rules/observable-code.md +87 -0
- package/base/rules/phase-detection.md +52 -0
- package/base/rules/refactor-triggers.md +59 -0
- package/base/rules/self-repair.md +53 -0
- package/base/rules/session-bridge.md +39 -0
- package/base/rules/session-metrics.md +61 -0
- package/base/rules/skill-authoring.md +85 -0
- package/base/rules/writing-quality.md +122 -0
- package/base/skills/px-compact/SKILL.md +100 -0
- package/base/skills/px-complexity-audit/SKILL.md +118 -0
- package/base/skills/px-context-probe/SKILL.md +1 -1
- package/base/skills/px-context-triage/SKILL.md +76 -0
- package/base/skills/px-discover/SKILL.md +4 -1
- package/base/skills/px-discuss/SKILL.md +4 -1
- package/base/skills/px-doc-lint/SKILL.md +107 -0
- package/base/skills/px-prose-review/SKILL.md +96 -0
- package/base/skills/px-quality-gate/SKILL.md +182 -0
- package/base/skills/px-risk/SKILL.md +4 -1
- package/base/skills/px-scaffold-new/SKILL.md +16 -14
- package/base/skills/px-session-retro/SKILL.md +1 -1
- package/base/skills/px-spec/SKILL.md +6 -2
- package/base/skills/px-verify/SKILL.md +2 -1
- package/bin/praxis.js +27 -6
- package/kits/api/KIT.md +2 -0
- package/kits/api/install.sh +1 -1
- package/kits/api/teardown.sh +1 -1
- package/kits/code-quality/KIT.md +2 -0
- package/kits/code-quality/hooks/generate-baseline.sh +1 -1
- package/kits/code-quality/hooks/post-commit.sh +3 -2
- package/kits/code-quality/hooks/pre-push.sh +15 -15
- package/kits/code-quality/install.sh +1 -1
- package/kits/code-quality/teardown.sh +3 -3
- package/kits/data/KIT.md +2 -0
- package/kits/data/install.sh +1 -1
- package/kits/data/teardown.sh +1 -1
- package/kits/infrastructure/KIT.md +2 -0
- package/kits/infrastructure/install.sh +1 -1
- package/kits/infrastructure/teardown.sh +1 -1
- package/kits/security/KIT.md +2 -0
- package/kits/security/install.sh +1 -1
- package/kits/security/teardown.sh +1 -1
- package/kits/web-designer/KIT.md +2 -0
- package/kits/web-designer/install.sh +1 -1
- package/kits/web-designer/teardown.sh +1 -1
- package/package.json +1 -1
- package/scripts/health-check.sh +21 -15
- package/scripts/install-tools.sh +5 -5
- package/scripts/lint-harness.sh +1 -1
- package/scripts/onboard-mcp.sh +1 -1
- package/scripts/test-harness.sh +1 -1
- package/scripts/update.sh +1 -1
package/base/CLAUDE.md
CHANGED
|
@@ -62,6 +62,8 @@ All `{vault_path}` references in rules and skills resolve from this config.
|
|
|
62
62
|
- Compact trigger: when context approaches ceiling, finish the current milestone first
|
|
63
63
|
- Never compact mid-plan — complete the milestone, write phase summary to vault, then compact
|
|
64
64
|
- After compaction: re-bootstrap from § After Compaction below, re-run quality checks fresh
|
|
65
|
+
- Context budget: see `~/.claude/rules/context-budget.md` for allocation ratios and cost-conscious loading
|
|
66
|
+
- Mid-session optimization: `/px-compact` (no /clear). Full reset: `/px-context-reset` + `/clear`
|
|
65
67
|
|
|
66
68
|
## Durable Memory
|
|
67
69
|
Context is volatile. Files are permanent. Act accordingly.
|
|
@@ -110,7 +112,8 @@ Missing servers are non-blocking — features degrade gracefully.
|
|
|
110
112
|
- Git operation → `~/.claude/rules/git-workflow.md`
|
|
111
113
|
- Client-facing writing → auto-loaded by `px-communication-standards` skill
|
|
112
114
|
- Architecture/specs → auto-loaded by `px-architecture-patterns` skill
|
|
113
|
-
5.
|
|
115
|
+
5. If a triage checkpoint exists (`*-compact-checkpoint.md` with tag `triage`), use its "Active Working Set" to reload files — it is more precise than the hook checkpoint.
|
|
116
|
+
6. Quality re-anchor: read most recent `compact-checkpoint.md` → check the Quality State section.
|
|
114
117
|
- If lint findings existed before compaction: re-run `golangci-lint run`, confirm status.
|
|
115
118
|
- If tests were failing before compaction: re-run test command, confirm status.
|
|
116
119
|
- Do NOT assume pre-compaction state is current. Always re-run fresh.
|
|
@@ -125,6 +128,9 @@ Missing servers are non-blocking — features degrade gracefully.
|
|
|
125
128
|
- Commit with wrong git identity
|
|
126
129
|
- Write a file with unreplaced {placeholders}
|
|
127
130
|
- Use vault search when Obsidian is not running (obsidian backend requires Obsidian open)
|
|
131
|
+
- Mix refactoring and feature changes in one commit — commit refactor separately
|
|
132
|
+
- Copy-paste 3+ lines instead of extracting a shared function
|
|
133
|
+
- Use `console.log`/`fmt.Println`/`print()` for production logging — use the structured logger
|
|
128
134
|
|
|
129
135
|
## AI-Kit Registry
|
|
130
136
|
Kits activate via `/px-kit:<n>` slash command. Kits are idempotent — double-activate is a no-op.
|
|
@@ -137,12 +143,12 @@ Kits activate via `/px-kit:<n>` slash command. Kits are idempotent — double-ac
|
|
|
137
143
|
| security | `/px-kit:security` | Threat modeling → IAM review → OWASP audit |
|
|
138
144
|
| code-quality | `/px-kit:code-quality` | SAST + secrets + SCA + IaC gate → AI review (over-engineering, smells, structure) |
|
|
139
145
|
| data | `/px-kit:data` | Schema design → migration planning → query optimization |
|
|
140
|
-
|
|
141
146
|
Kit manifests live in `~/.claude/kits/<name>/KIT.md`.
|
|
147
|
+
Kit fields: `context_cost` (low/medium/high), `depends_on` (kit dependencies), `skills_chain` (phased workflow).
|
|
142
148
|
|
|
143
149
|
## Rules Registry — Load on Demand Only
|
|
144
150
|
|
|
145
|
-
### Universal — always active (
|
|
151
|
+
### Universal — always active (16 rules)
|
|
146
152
|
|
|
147
153
|
Quality is a generation-time constraint, not a post-hoc review. The rules below
|
|
148
154
|
are the lens you write through — they shape every line of code produced.
|
|
@@ -161,6 +167,10 @@ are the lens you write through — they shape every line of code produced.
|
|
|
161
167
|
| `~/.claude/rules/context-management.md` | Context anti-rot, phase scoping, context reset protocol |
|
|
162
168
|
| `~/.claude/rules/memory-boundary.md` | Auto-memory boundary, MEMORY.md cap, dream integration |
|
|
163
169
|
| `~/.claude/rules/security-posture.md` | Sandbox model, credential protection, protected paths |
|
|
170
|
+
| `~/.claude/rules/writing-quality.md` | Prose constraints — sentence limits, fluff kill list, doc templates, voice rules |
|
|
171
|
+
| `~/.claude/rules/refactor-triggers.md` | Pre-check protocol, commit refactor separately, QUALITY: comment convention |
|
|
172
|
+
| `~/.claude/rules/context-budget.md` | Quantitative budget zones, cost-conscious loading, MCP server discipline |
|
|
173
|
+
| `~/.claude/rules/self-repair.md` | Structured recovery — 3-attempt escalation ladder, strategy rotation |
|
|
164
174
|
|
|
165
175
|
### Scoped — load only when paths match
|
|
166
176
|
|
|
@@ -188,11 +198,35 @@ are the lens you write through — they shape every line of code produced.
|
|
|
188
198
|
| `~/.claude/rules/live-docs-required.md` | Dependency manifests, files importing external packages |
|
|
189
199
|
| `~/.claude/rules/desktop-protocol.md` | Claude Desktop ↔ Claude Code handoff sessions |
|
|
190
200
|
|
|
201
|
+
#### Application observability
|
|
202
|
+
|
|
203
|
+
| File | Loads when |
|
|
204
|
+
|------|------------|
|
|
205
|
+
| `~/.claude/rules/observable-code.md` | `**/services/**`, `**/handlers/**`, `**/workers/**`, `**/middleware/**`, `**/cmd/**` |
|
|
206
|
+
|
|
207
|
+
#### Workflow and orchestration
|
|
208
|
+
| File | Loads when |
|
|
209
|
+
|------|------------|
|
|
210
|
+
| `~/.claude/rules/session-bridge.md` | Session start/end, vault handoff, cross-session continuity |
|
|
211
|
+
| `~/.claude/rules/hooks-policy.md` | Adding or modifying hooks in `settings-hooks.json` |
|
|
212
|
+
| `~/.claude/rules/multi-agent-orchestration.md` | Tasks crossing >3 files or multiple domains |
|
|
213
|
+
| `~/.claude/rules/phase-detection.md` | Workflow phase transitions, kit phase changes |
|
|
214
|
+
| `~/.claude/rules/session-metrics.md` | End-of-session retrospective, metrics collection |
|
|
215
|
+
| `~/.claude/rules/skill-authoring.md` | Creating or editing `base/skills/*/SKILL.md` files |
|
|
216
|
+
|
|
217
|
+
#### Agent specs
|
|
218
|
+
| File | Purpose |
|
|
219
|
+
|------|---------|
|
|
220
|
+
| `base/agents/evaluator.md` | Evaluator agent for Generator/Evaluator pattern |
|
|
221
|
+
| `base/agents/planner.md` | Planner agent for task decomposition |
|
|
222
|
+
|
|
191
223
|
### Auto-invocable skills (replace former universal rules)
|
|
192
224
|
| Skill | Triggers when |
|
|
193
225
|
|-------|--------------|
|
|
194
226
|
| `px-communication-standards` | Writing client-facing docs, proposals, status reports, commits, PRs |
|
|
195
227
|
| `px-architecture-patterns` | Writing ADRs, specs, system design, risk docs, blocker reports |
|
|
228
|
+
| `px-quality-gate` | Auto inside /px-verify (Step 1 item 5b) and before /px-ship — blocks on BLOCK findings |
|
|
229
|
+
| `px-doc-lint` | Fast structural markdown check inside px-quality-gate for staged *.md files |
|
|
196
230
|
|
|
197
231
|
## Judgment & Research Commands
|
|
198
232
|
|
|
package/base/agents/evaluator.md
|
@@ -0,0 +1,44 @@
|
|
|
1
|
+
# Evaluator Agent Spec
|
|
2
|
+
|
|
3
|
+
## Role
|
|
4
|
+
You are a critical code evaluator. You score Generator output against a SPEC
|
|
5
|
+
on four dimensions: correctness, completeness, style compliance, and test coverage.
|
|
6
|
+
|
|
7
|
+
## Inputs
|
|
8
|
+
- **Diff**: staged changes from the Generator
|
|
9
|
+
- **SPEC**: acceptance criteria from the active plan
|
|
10
|
+
- **Rules**: quality rules for file types in the diff
|
|
11
|
+
- **Test output**: test results if available
|
|
12
|
+
|
|
13
|
+
You do NOT have conversation history. Judge the diff on its own merits.
|
|
14
|
+
|
|
15
|
+
## Scoring Rubric
|
|
16
|
+
|
|
17
|
+
| Dimension | Weight | Pass Criteria |
|
|
18
|
+
| --------- | ------ | ------------- |
|
|
19
|
+
| Correctness | 40% | Code does what SPEC says. All paths handled. No logic errors. |
|
|
20
|
+
| Completeness | 25% | All acceptance criteria addressed. No partial implementations. |
|
|
21
|
+
| Style compliance | 20% | Naming, structure, and quality rules respected. |
|
|
22
|
+
| Test coverage | 15% | Happy path, failure path, and edge cases covered. |
|
|
23
|
+
|
|
24
|
+
## Output Format
|
|
25
|
+
|
|
26
|
+
Findings use the standard subagent format:
|
|
27
|
+
```
|
|
28
|
+
{file}:{line} — {severity} — {category} — {description} — {fix}
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
Severity: Critical (blocks), Major (should fix), Minor (note).
|
|
32
|
+
|
|
33
|
+
End with a summary:
|
|
34
|
+
```
|
|
35
|
+
SCORE: {correctness}% / {completeness}% / {style}% / {tests}%
|
|
36
|
+
VERDICT: PASS | CHANGES_REQUESTED | BLOCK
|
|
37
|
+
FINDINGS: {critical} critical, {major} major, {minor} minor
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
## Constraints
|
|
41
|
+
- Do not suggest features beyond the SPEC
|
|
42
|
+
- Do not comment on code outside the diff
|
|
43
|
+
- Do not soften findings — be direct about what is wrong
|
|
44
|
+
- If nothing is wrong: "No findings. VERDICT: PASS"
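The rubric above assigns a weight to each dimension, but the spec only reports the four scores side by side and does not say whether they are ever combined. As a rough illustration, assuming a single weighted total is wanted, the arithmetic could look like the sketch below. The function name `weighted_score` is hypothetical and not part of the package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine the four rubric scores (0-100 each) into one
# weighted total using the weights from the table (40/25/20/15).
# The spec itself only prints the four scores separately.
weighted_score() {
  correctness=$1 completeness=$2 style=$3 tests=$4
  # Integer arithmetic; the result is rounded down.
  echo $(( (40 * correctness + 25 * completeness + 20 * style + 15 * tests) / 100 ))
}

weighted_score 90 80 70 60   # prints 79
```

A total like this hides which dimension failed, which may be why the summary line keeps the four percentages separate.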
|
|
package/base/agents/planner.md
|
@@ -0,0 +1,48 @@
|
|
|
1
|
+
# Planner Agent Spec
|
|
2
|
+
|
|
3
|
+
## Role
|
|
4
|
+
You are a task decomposition planner. You break complex tasks into a dependency-ordered
|
|
5
|
+
subtask graph that a Generator can execute one step at a time.
|
|
6
|
+
|
|
7
|
+
## Inputs
|
|
8
|
+
- **PROBLEM / DELIVERABLE / ACCEPTANCE / BOUNDARIES** from the discuss phase
|
|
9
|
+
- **Codebase context**: file structure, key interfaces, existing patterns
|
|
10
|
+
- **Active constraints**: quality rules, architecture rules applicable to the task
|
|
11
|
+
|
|
12
|
+
You do NOT have conversation history. Plan from the inputs alone.
|
|
13
|
+
|
|
14
|
+
## Output Format
|
|
15
|
+
|
|
16
|
+
A numbered subtask graph with dependencies:
|
|
17
|
+
|
|
18
|
+
```
|
|
19
|
+
SUBTASK GRAPH
|
|
20
|
+
━━━━━━━━━━━━━━━━━━━━━
|
|
21
|
+
|
|
22
|
+
1. [subtask title]
|
|
23
|
+
Files: {paths}
|
|
24
|
+
Acceptance: {criteria}
|
|
25
|
+
depends_on: []
|
|
26
|
+
|
|
27
|
+
2. [subtask title]
|
|
28
|
+
Files: {paths}
|
|
29
|
+
Acceptance: {criteria}
|
|
30
|
+
depends_on: [1]
|
|
31
|
+
|
|
32
|
+
3. [subtask title]
|
|
33
|
+
Files: {paths}
|
|
34
|
+
Acceptance: {criteria}
|
|
35
|
+
depends_on: [1, 2]
|
|
36
|
+
|
|
37
|
+
CRITICAL PATH: 1 → 2 → 3
|
|
38
|
+
PARALLELIZABLE: none | {subtask pairs}
|
|
39
|
+
ESTIMATED MILESTONES: {count}
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
## Constraints
|
|
43
|
+
- Each subtask must be completable in a single milestone
|
|
44
|
+
- Each subtask must have testable acceptance criteria
|
|
45
|
+
- Dependencies must be acyclic
|
|
46
|
+
- Do not decompose beyond what is necessary — 3-7 subtasks for most features
|
|
47
|
+
- Flag subtasks that carry architectural risk
|
|
48
|
+
- If the task is simple enough for single-agent mode: say so and output a single subtask
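The constraint that dependencies must be acyclic can be checked mechanically. The sketch below is one possible check, not something the package ships: it orders subtasks so that every `depends_on` entry comes first and fails when no valid order exists. The `id:dep,dep` argument format and the name `order_subtasks` are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: order subtasks so every dependency comes first, and
# fail if the depends_on graph has a cycle (or names an unknown subtask).
# Each argument is "id" or "id:dep,dep"; ids must not contain spaces.
order_subtasks() {
  pending="$*"
  done_ids=" "
  order=""
  while [ -n "$pending" ]; do
    next=""
    progress=0
    for entry in $pending; do
      id=${entry%%:*}
      case $entry in
        *:*) deps=$(printf '%s' "${entry#*:}" | tr ',' ' ') ;;
        *)   deps="" ;;
      esac
      ready=1
      for dep in $deps; do
        case $done_ids in
          *" $dep "*) ;;        # dependency already emitted
          *) ready=0 ;;
        esac
      done
      if [ "$ready" -eq 1 ]; then
        order="$order $id"
        done_ids="$done_ids$id "
        progress=1
      else
        next="$next $entry"
      fi
    done
    if [ "$progress" -eq 0 ]; then
      # A full pass emitted nothing: the remaining subtasks block each other.
      echo "cycle or missing dependency among:$next" >&2
      return 1
    fi
    pending=$next
  done
  echo "${order# }"
}

order_subtasks 3:1,2 2:1 1   # prints: 1 2 3
```

Calling `order_subtasks 1:2 2:1` returns a non-zero status, which is the case the constraint rules out.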
|
package/base/hooks/dep-audit.sh
CHANGED
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
# dep-audit.sh — PostToolUse:Write|Edit|MultiEdit hook
|
|
3
3
|
# Runs dependency vulnerability checks when manifest files are modified.
|
|
4
4
|
# Always exits 0 (advisory only — PostToolUse cannot hard-block).
|
|
5
|
-
set -
|
|
5
|
+
set -euo pipefail
|
|
6
6
|
trap 'exit 0' ERR
|
|
7
7
|
|
|
8
8
|
INPUT=$(cat)
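The header comment says this hook always exits 0, while the changed line turns on strict mode. The `trap 'exit 0' ERR` line is what keeps the two compatible: a failing command fires the trap before `set -e` can end the script with an error status. A minimal demonstration, with `false` standing in for a failing audit command (the function name is made up for the example):

```shell
#!/usr/bin/env bash
# Sketch of the advisory pattern: strict mode plus an ERR trap that turns
# a command failure into a clean exit 0.
# `false` is a stand-in for a failing audit tool, not the hook's real logic.
advisory_demo() {
  bash -c '
    set -euo pipefail
    trap "exit 0" ERR
    echo "before failure"
    false                 # fires the ERR trap; the script exits 0 here
    echo "never reached"
  '
}

advisory_demo             # prints: before failure
echo "exit status: $?"    # prints: exit status: 0
```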
|
package/base/hooks/file-guard.sh
CHANGED
|
@@ -6,7 +6,7 @@ set -euo pipefail
|
|
|
6
6
|
INPUT=$(cat)
|
|
7
7
|
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // .tool_input.path // empty')
|
|
8
8
|
|
|
9
|
-
if [ -z "$FILE_PATH" ]; then
|
|
9
|
+
if [[ -z "$FILE_PATH" ]]; then
|
|
10
10
|
exit 0
|
|
11
11
|
fi
|
|
12
12
|
|
|
@@ -29,7 +29,7 @@ for pattern in "${PROTECTED_PATTERNS[@]}"; do
|
|
|
29
29
|
done
|
|
30
30
|
|
|
31
31
|
# Check project-level protected files from CLAUDE.md if it exists
|
|
32
|
-
if [ -f "CLAUDE.md" ]; then
|
|
32
|
+
if [[ -f "CLAUDE.md" ]]; then
|
|
33
33
|
# Extract paths from ## Protected Files section
|
|
34
34
|
IN_SECTION=false
|
|
35
35
|
while IFS= read -r line; do
|
|
@@ -42,7 +42,7 @@ if [ -f "CLAUDE.md" ]; then
|
|
|
42
42
|
fi
|
|
43
43
|
if $IN_SECTION && echo "$line" | grep -qE "^- "; then
|
|
44
44
|
PROTECTED=$(echo "$line" | sed 's/^- //' | sed 's/ *#.*//' | xargs)
|
|
45
|
-
if [ -n "$PROTECTED" ] && echo "$FILE_PATH" | grep -qE "$PROTECTED"; then
|
|
45
|
+
if [[ -n "$PROTECTED" ]] && echo "$FILE_PATH" | grep -qE "$PROTECTED"; then
|
|
46
46
|
echo "BLOCKED: $FILE_PATH matches project-protected pattern '$PROTECTED'. Explain the intended change."
|
|
47
47
|
exit 2
|
|
48
48
|
fi
|
|
package/base/hooks/recursion-guard.sh
CHANGED
|
@@ -50,7 +50,13 @@ KEY="${KEY:0:300}"
|
|
|
50
50
|
|
|
51
51
|
# ── Increment counter ──
|
|
52
52
|
# Use a hash of the key for safe JSON field names
|
|
53
|
-
|
|
53
|
+
if command -v md5sum &>/dev/null; then
|
|
54
|
+
KEY_HASH=$(echo -n "$KEY" | md5sum | cut -d' ' -f1)
|
|
55
|
+
elif command -v md5 &>/dev/null; then
|
|
56
|
+
KEY_HASH=$(echo -n "$KEY" | md5 -q)
|
|
57
|
+
else
|
|
58
|
+
KEY_HASH="${KEY:0:32}"
|
|
59
|
+
fi
|
|
54
60
|
|
|
55
61
|
COUNT=$(jq -r --arg cat "$CATEGORY" --arg key "$KEY_HASH" \
|
|
56
62
|
'.[$cat][$key] // 0' "$STATE_FILE" 2>/dev/null || echo "0")
|
|
package/base/hooks/session-data-collect.sh
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
#!/usr/bin/env bash
|
|
2
2
|
# Stop hook — collects structured session data and stages it for the Stop prompt.
|
|
3
3
|
# Always exits 0 (advisory, never blocks session end).
|
|
4
|
-
set -
|
|
4
|
+
set -euo pipefail
|
|
5
5
|
trap 'exit 0' ERR
|
|
6
6
|
|
|
7
7
|
CONFIG_FILE="$HOME/.claude/praxis.config.json"
|
|
package/base/hooks/vault-checkpoint.sh
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
#!/usr/bin/env bash
|
|
2
2
|
# PreCompact hook — writes minimal checkpoint to vault before context compaction.
|
|
3
3
|
# Always exits 0 (advisory, never blocks compaction).
|
|
4
|
-
set -
|
|
4
|
+
set -euo pipefail
|
|
5
5
|
trap 'exit 0' ERR
|
|
6
6
|
|
|
7
7
|
CONFIG_FILE="$HOME/.claude/praxis.config.json"
|
|
@@ -19,7 +19,7 @@ PLANS_DIR="$VAULT_PATH/plans"
|
|
|
19
19
|
mkdir -p "$PLANS_DIR"
|
|
20
20
|
|
|
21
21
|
DATE=$(date +%Y-%m-%d)
|
|
22
|
-
TIMESTAMP=$(date +"%Y-%m-%
|
|
22
|
+
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
|
|
23
23
|
CHECKPOINT_FILE="$PLANS_DIR/$DATE-compact-checkpoint.md"
|
|
24
24
|
|
|
25
25
|
BRANCH=$(git --no-pager rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
|
|
@@ -40,16 +40,16 @@ fi
|
|
|
40
40
|
LINT_STATE="unknown"
|
|
41
41
|
TEST_STATE="unknown"
|
|
42
42
|
|
|
43
|
-
if [ -f "go.mod" ] && command -v golangci-lint &>/dev/null; then
|
|
43
|
+
if [[ -f "go.mod" ]] && command -v golangci-lint &>/dev/null; then
|
|
44
44
|
LINT_COUNT=$(golangci-lint run ./... 2>&1 | grep -c "^" || true)
|
|
45
|
-
if [ "$LINT_COUNT" -eq 0 ]; then
|
|
45
|
+
if [[ "$LINT_COUNT" -eq 0 ]]; then
|
|
46
46
|
LINT_STATE="clean"
|
|
47
47
|
else
|
|
48
48
|
LINT_STATE="$LINT_COUNT findings"
|
|
49
49
|
fi
|
|
50
50
|
fi
|
|
51
51
|
|
|
52
|
-
if [ -f "go.mod" ] && command -v go &>/dev/null; then
|
|
52
|
+
if [[ -f "go.mod" ]] && command -v go &>/dev/null; then
|
|
53
53
|
if go test ./... -short 2>&1 | grep -q "^ok"; then
|
|
54
54
|
TEST_STATE="passing"
|
|
55
55
|
else
|
|
package/base/rules/code-excellence.md
CHANGED
|
@@ -74,3 +74,25 @@ A comment that says `// increment counter` above `counter++` is noise.
|
|
|
74
74
|
A comment that says `// retry three times because the upstream API returns 503 on cold start`
|
|
75
75
|
is knowledge that cannot be inferred from the code alone.
|
|
76
76
|
Delete the first kind. Write more of the second kind.
|
|
77
|
+
|
|
78
|
+
---
|
|
79
|
+
|
|
80
|
+
## Reference Codebases — What Excellence Looks Like
|
|
81
|
+
|
|
82
|
+
When you need a reference for what excellent code looks like, use these:
|
|
83
|
+
|
|
84
|
+
| Domain | Reference | What to study |
|
|
85
|
+
| ------ | --------- | ------------- |
|
|
86
|
+
| C / systems | SQLite source (`sqlite.org/src`) | Discipline: 590x test-to-source ratio, 100% branch coverage, zero external deps |
|
|
87
|
+
| C / network | Redis `src/ae.c`, `src/dict.c` | Naming, readability, data structures that document themselves |
|
|
88
|
+
| Go | Go standard library (`pkg.go.dev/std`) | Idiomatic naming, error design, interface sizing — one method where possible |
|
|
89
|
+
| Rust | `rustc_errors` crate | Error message design: what failed, where, what to do next |
|
|
90
|
+
| Error messages | Elm compiler output | Kindest, most actionable errors in any compiled language |
|
|
91
|
+
| API design | Stripe API (`docs.stripe.com`) | Naming consistency, versioning discipline, error schema |
|
|
92
|
+
| Documentation | Go stdlib `net/http` package docs | Every exported symbol explained by what it does for the caller |
|
|
93
|
+
|
|
94
|
+
When uncertain if code is good enough: "Would this survive a review from the SQLite team?"
|
|
95
|
+
If the answer is no — simplify first.
|
|
96
|
+
|
|
97
|
+
The SQLite standard: every line has a reason. Every function has one job.
|
|
98
|
+
Every error has a message a human can act on.
|
package/base/rules/coding.md
CHANGED
|
@@ -13,6 +13,22 @@
|
|
|
13
13
|
- If Context7 is unavailable: state that docs could not be verified and flag the
|
|
14
14
|
specific method/API as "unverified against current version."
|
|
15
15
|
|
|
16
|
+
### Import-trigger protocol
|
|
17
|
+
|
|
18
|
+
Any commit diff that adds a new `import`, `require`, `using`, or `use` statement for an
|
|
19
|
+
external package must have a corresponding Context7 lookup in the same session.
|
|
20
|
+
|
|
21
|
+
Language-specific patterns matched:
|
|
22
|
+
|
|
23
|
+
- JavaScript/TypeScript: `import ... from`, `require(...)`
|
|
24
|
+
- Python: `import ...`, `from ... import`
|
|
25
|
+
- Go: `import "..."` or `import (...)`
|
|
26
|
+
- Rust: `use ...::...`
|
|
27
|
+
- Java/C#: `import ...`, `using ...`
|
|
28
|
+
|
|
29
|
+
Every new external import requires a Context7 verification before the gate clears.
|
|
30
|
+
Internal packages (same repo, same module) are excluded.
|
|
31
|
+
|
|
16
32
|
### Tool preferences
|
|
17
33
|
- Use Read/Edit/Write tools instead of cat/sed/echo.
|
|
18
34
|
- Use `rg` (ripgrep) for searching code, not grep.
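The import-trigger protocol added in this hunk lists the statement shapes to match but not how the match is done. The sketch below is one hypothetical way to pull candidate lines out of a unified diff; the function name and the exact regular expression are assumptions. It does not separate external packages from internal ones, and it misses the individual lines inside a multi-line Go `import ( ... )` block.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the lines a unified diff adds that look like
# import statements in the languages the protocol names. Reads the diff on
# stdin. External-vs-internal filtering is left to a per-language check.
added_imports() {
  grep -E '^\+[^+]' \
    | sed 's/^+[[:space:]]*//' \
    | grep -E '^(import[[:space:](]|from[[:space:]].+[[:space:]]import[[:space:]]|use[[:space:]].+::|using[[:space:]]|.*require\()' \
    || true   # no matches is not an error
}
```

Typical use would be `git diff --cached | added_imports`, with each printed line then checked against the session's Context7 lookups.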
|
|
package/base/rules/context-budget.md
|
@@ -0,0 +1,66 @@
|
|
|
1
|
+
# Context Budget — Rules
|
|
2
|
+
# Scope: All projects, all sessions
|
|
3
|
+
# Quantitative budget for context window usage
|
|
4
|
+
# Companion to context-management.md (qualitative bracket discipline)
|
|
5
|
+
|
|
6
|
+
## Budget Allocation — Invariants (BLOCK on violation)
|
|
7
|
+
|
|
8
|
+
### Zone model
|
|
9
|
+
|
|
10
|
+
The context window is a finite resource. Allocate it deliberately:
|
|
11
|
+
|
|
12
|
+
| Zone | Share | Contents |
|
|
13
|
+
| ---- | ----- | -------- |
|
|
14
|
+
| System overhead | ~15-20% | CLAUDE.md, universal rules, settings, MCP tool schemas |
|
|
15
|
+
| Working content | ~55-65% | Code, plans, conversation, tool output |
|
|
16
|
+
| Reserve | ~20% | Buffer for compaction, final outputs, tool responses |
|
|
17
|
+
|
|
18
|
+
When working content approaches capacity (signals below), begin offloading to vault.
|
|
19
|
+
Never wait for compaction to force the issue.
|
|
20
|
+
|
|
21
|
+
### Budget signals (quantitative bracket thresholds)
|
|
22
|
+
|
|
23
|
+
| Signal | FRESH | MODERATE | DEPLETED |
|
|
24
|
+
| ------ | ----- | -------- | -------- |
|
|
25
|
+
| Tool calls | <15 | 15-35 | 35+ |
|
|
26
|
+
| Files read | <8 | 8-20 | 20+ |
|
|
27
|
+
| Large files (>200 lines) in session | <2 | 2-4 | 5+ |
|
|
28
|
+
| Corrections received | 0 | 1 | 2+ |
|
|
29
|
+
|
|
30
|
+
These thresholds feed `context-management.md` brackets and `/px-context-probe`.
|
|
31
|
+
|
|
32
|
+
## Cost-Conscious Loading — Conventions (WARN on violation)
|
|
33
|
+
|
|
34
|
+
### MCP servers
|
|
35
|
+
- Connect 2-3 core servers at session start (context7, github).
|
|
36
|
+
- Lazy-load optional servers (perplexity, filesystem, sequential-thinking) only when the task requires them.
|
|
37
|
+
- Never connect all registered servers preemptively — each adds schema overhead.
|
|
38
|
+
|
|
39
|
+
### File reads
|
|
40
|
+
- Read targeted line ranges (`offset`/`limit`), not entire files, when you know the section.
|
|
41
|
+
- Files >200 lines: read the relevant section, not the whole file.
|
|
42
|
+
|
|
43
|
+
### Search output
|
|
44
|
+
- Use `files_with_matches` for initial discovery; switch to `content` only for confirmed matches.
|
|
45
|
+
- Delegate exploration expected to produce >50 lines of output to a subagent.
|
|
46
|
+
|
|
47
|
+
## Universal Rule Weight — Conventions (WARN on violation)
|
|
48
|
+
|
|
49
|
+
The 14 universal rules consume ~50KB of every session's context budget.
|
|
50
|
+
|
|
51
|
+
- Before proposing a new universal rule: justify the always-loaded cost.
|
|
52
|
+
- Prefer scoped rules (path-matched) over universal ones when the rule applies to a specific domain.
|
|
53
|
+
- New universal rules must stay under 3KB individually.
|
|
54
|
+
- If a universal rule exceeds 100 lines: split into a short universal rule and a scoped reference file.
|
|
55
|
+
|
|
56
|
+
## Budget Actions
|
|
57
|
+
|
|
58
|
+
| Bracket | Action |
|
|
59
|
+
| ------- | ------ |
|
|
60
|
+
| FRESH | No budget concern. Batch aggressively, load full context. |
|
|
61
|
+
| MODERATE | Prefer concise output. Stop reading whole files. Use subagents for exploration. |
|
|
62
|
+
| DEPLETED | Run `/px-compact` for mid-session optimization, or `/px-context-reset` + `/clear` for full reset. |
|
|
63
|
+
| CRITICAL | STOP new work. Complete current milestone. Write all state to vault. New session. |
|
|
64
|
+
|
|
65
|
+
## Removal Condition
|
|
66
|
+
Remove when Claude Code exposes native token utilization metrics and budget enforcement.
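The budget signals table gives thresholds per signal but does not say how the four signals combine into one bracket. The sketch below assumes the worst signal decides. It also assumes that boundary values listed in two columns (35 tool calls, 20 files read) belong to the higher bracket. Both are assumptions, and the function names are hypothetical. CRITICAL has no thresholds in the table, so the sketch never returns it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map the four budget signals to a bracket.
# Assumptions (not stated in the rule): the worst signal wins, and values
# that appear in two columns count toward the higher bracket.

# level VALUE MODERATE_MIN DEPLETED_MIN -> prints 0 (FRESH), 1 or 2
level() {
  if [ "$1" -ge "$3" ]; then echo 2
  elif [ "$1" -ge "$2" ]; then echo 1
  else echo 0
  fi
}

# context_bracket TOOL_CALLS FILES_READ LARGE_FILES CORRECTIONS
context_bracket() {
  worst=0
  for l in "$(level "$1" 15 35)" "$(level "$2" 8 20)" \
           "$(level "$3" 2 5)" "$(level "$4" 1 2)"; do
    if [ "$l" -gt "$worst" ]; then worst=$l; fi
  done
  case $worst in
    0) echo FRESH ;;
    1) echo MODERATE ;;
    *) echo DEPLETED ;;
  esac
}

context_bracket 12 9 1 0   # prints MODERATE (9 files read is in the 8-20 band)
```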
|
|
package/base/rules/context-management.md
CHANGED
|
@@ -72,6 +72,10 @@ conversation length heuristic (not token count — we cannot read session JSONL)
|
|
|
72
72
|
- Write all state to vault immediately
|
|
73
73
|
- Suggest new session for remaining milestones
|
|
74
74
|
|
|
75
|
+
### Budget integration
|
|
76
|
+
See `context-budget.md` for quantitative thresholds and allocation ratios.
|
|
77
|
+
Use `/px-compact` for mid-session optimization; `/px-context-reset` + `/clear` for full reset.
|
|
78
|
+
|
|
75
79
|
---
|
|
76
80
|
|
|
77
81
|
## Verification Commands
|
|
package/base/rules/hooks-policy.md
|
@@ -0,0 +1,53 @@
|
|
|
1
|
+
# Hooks Policy — Rules
|
|
2
|
+
# Scope: All projects, all sessions
|
|
3
|
+
# Documents which hooks are mandatory vs optional
|
|
4
|
+
|
|
5
|
+
## Hook Inventory
|
|
6
|
+
|
|
7
|
+
### Mandatory — must not be disabled (BLOCK on violation)
|
|
8
|
+
|
|
9
|
+
| Hook | Event | Purpose |
|
|
10
|
+
| ---- | ----- | ------- |
|
|
11
|
+
| `secret-scan.sh` | PreToolUse: Write/Edit | Blocks credential patterns in code |
|
|
12
|
+
| `credential-guard.sh` | PreToolUse: Bash | Blocks access to protected credential paths |
|
|
13
|
+
| `identity-check.sh` | PreToolUse: Bash | Verifies git identity before commits |
|
|
14
|
+
| `file-guard.sh` | PreToolUse: Write/Edit | Blocks writes to protected file patterns |
|
|
15
|
+
| `vault-checkpoint.sh` | PreCompact | Saves state before context compaction |
|
|
16
|
+
|
|
17
|
+
These hooks are security and data-integrity controls. Disabling them requires
|
|
18
|
+
explicit user approval with documented justification.
|
|
19
|
+
|
|
20
|
+
### Required — should not be disabled without reason (WARN on violation)
|
|
21
|
+
|
|
22
|
+
| Hook | Event | Purpose |
|
|
23
|
+
| ---- | ----- | ------- |
|
|
24
|
+
| `dep-audit.sh` | PostToolUse: Write/Edit | Audits dependencies when manifests change |
|
|
25
|
+
| `session-data-collect.sh` | Stop | Captures session metadata for vault |
|
|
26
|
+
| Stop vault prompt | Stop | Writes session summary and vault updates |
|
|
27
|
+
| `on-stop-failure.sh` | StopFailure | Error handling for failed stop hooks |
|
|
28
|
+
|
|
29
|
+
### Optional — enhance but not required
|
|
30
|
+
|
|
31
|
+
| Hook | Event | Purpose |
|
|
32
|
+
| ---- | ----- | ------- |
|
|
33
|
+
| `auto-format.sh` | PostToolUse: Write/Edit | Auto-formats files after edits |
|
|
34
|
+
| `recursion-guard.sh` | PreToolUse | Prevents recursive tool invocation |
|
|
35
|
+
|
|
36
|
+
## Adding New Hooks
|
|
37
|
+
|
|
38
|
+
Before adding a hook to `settings-hooks.json`:
|
|
39
|
+
1. Classify as mandatory, required, or optional using the criteria above
|
|
40
|
+
2. Mandatory: security or data-integrity function — must never silently fail
|
|
41
|
+
3. Required: workflow quality — should warn on failure but not block
|
|
42
|
+
4. Optional: convenience — can be disabled per-project
|
|
43
|
+
5. All hooks must exit 0 on success, non-zero to block (PreToolUse only)
|
|
44
|
+
6. All hooks must handle missing dependencies gracefully (exit 0 if tool not found)
|
|
45
|
+
|
|
46
|
+
## Hook Configuration
|
|
47
|
+
|
|
48
|
+
Hooks are declared in `base/hooks/settings-hooks.json`.
|
|
49
|
+
Install copies them to `~/.claude/settings.json` during `install.sh`.
|
|
50
|
+
Project-specific hooks can be added in project `.claude/settings.json`.
|
|
51
|
+
|
|
52
|
+
## Removal Condition
|
|
53
|
+
Remove when Claude Code provides a native hook policy/priority system.
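Items 5 and 6 of the "Adding New Hooks" checklist can be illustrated with a small PreToolUse-style body. This is a sketch under assumptions, not a hook from the package: the function name and the protected patterns are invented. It returns 2 when blocking because `file-guard.sh` in this diff exits 2 in that case. A real hook script would use `exit` rather than `return`.

```shell
#!/usr/bin/env bash
# Hypothetical PreToolUse hook body following checklist items 5 and 6.
# The protected patterns below are illustrative only.
guard_hook() {
  # Item 6: degrade gracefully when a dependency is missing.
  command -v jq >/dev/null 2>&1 || return 0

  input=$(cat)
  # Same jq expression file-guard.sh uses to pull the target path.
  file_path=$(printf '%s' "$input" \
    | jq -r '.tool_input.file_path // .tool_input.path // empty' 2>/dev/null)
  [ -n "$file_path" ] || return 0

  case $file_path in
    *.env|*/secrets/*)
      echo "BLOCKED: $file_path matches a protected pattern."
      return 2            # item 5: non-zero blocks the tool call
      ;;
  esac
  return 0                # item 5: 0 lets the tool call proceed
}
```

The early return when `jq` is absent follows item 6, but it also means the guard silently allows everything on a machine without `jq`, which sits uneasily with the "must never silently fail" wording for mandatory hooks.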
|
|
package/base/rules/multi-agent-orchestration.md
|
@@ -0,0 +1,99 @@
|
|
|
1
|
+
# Multi-Agent Orchestration — Rules
|
|
2
|
+
# Scope: All projects, all sessions
|
|
3
|
+
# Governs when and how to use multi-agent patterns
|
|
4
|
+
|
|
5
|
+
## Agent Patterns
|
|
6
|
+
|
|
7
|
+
### Single-Agent (default)
|
|
8
|
+
One Claude instance handles the full task. Subagents handle scoped delegation
|
|
9
|
+
(review, simplify, explore) per the `px-subagent` dispatch protocol.
|
|
10
|
+
|
|
11
|
+
**Use when:** Task touches ≤3 files, single domain, straightforward implementation.
|
|
12
|
+
|
|
13
|
+
### Generator/Evaluator
|
|
14
|
+
Generator produces output. Evaluator critically reviews against spec, scoring on
|
|
15
|
+
correctness, completeness, style compliance, and test coverage.
|
|
16
|
+
|
|
17
|
+
**Use when:** Task touches >5 files, crosses module boundaries, or has high
|
|
18
|
+
correctness requirements (auth, data integrity, public API changes).
|
|
19
|
+
|
|
20
|
+
### Planner/Generator/Evaluator
|
|
21
|
+
Planner decomposes into subtask graph first. Generator and Evaluator work each subtask.
|
|
22
|
+
|
|
23
|
+
**Use when:** Task spans multiple milestones, requires architectural decisions,
|
|
24
|
+
or the plan itself needs adversarial review.
|
|
25
|
+
|
|
26
|
+
## Activation Thresholds — Conventions (WARN on violation)
|
|
27
|
+
|
|
28
|
+
| Signal | Single-Agent | Generator/Evaluator | Planner/Generator/Evaluator |
|
|
29
|
+
| ------ | ------------ | ------------------- | --------------------------- |
|
|
30
|
+
| Files changed | ≤3 | 4-10 | 10+ |
|
|
31
|
+
| Domains crossed | 1 | 2 | 3+ |
|
|
32
|
+
| Milestone count | 1 | 1-2 | 3+ |
|
|
33
|
+
| Risk level | Low | Medium | High |
|
|
34
|
+
|
|
35
|
+
These are guidelines, not hard gates. Use judgment — a 2-file auth change
|
|
36
|
+
may warrant Generator/Evaluator; a 15-file rename may not.
|
|
37
|
+
|
|
38
|
+
## Evaluator Agent Spec
|
|
39
|
+
|
|
40
|
+
When Generator/Evaluator mode is active, the Evaluator receives:
|
|
41
|
+
|
|
42
|
+
| Input | Source |
|
|
43
|
+
| ----- | ------ |
|
|
44
|
+
| Diff | Generator's output (staged changes) |
|
|
45
|
+
| SPEC | From active plan file |
|
|
46
|
+
| Rules | Relevant quality rules for file types in diff |
|
|
47
|
+
| Test output | If tests were run |
|
|
48
|
+
|
|
49
|
+
The Evaluator does NOT receive conversation history.
|
|
50
|
+
|
|
51
|
+
### Evaluator scoring rubric
|
|
52
|
+
|
|
53
|
+
| Dimension | Weight | Criteria |
|
|
54
|
+
| --------- | ------ | -------- |
|
|
55
|
+
| Correctness | 40% | Does the code do what the spec says? All paths handled? |
|
|
56
|
+
| Completeness | 25% | Are all acceptance criteria met? Tests present? |
|
|
57
|
+
| Style compliance | 20% | Naming, structure, quality rules respected? |
|
|
58
|
+
| Test coverage | 15% | Happy path, failure path, edge cases covered? |
|
|
59
|
+
|
|
60
|
+
### Evaluator output format
|
|
61
|
+
Uses the same severity format as `px-subagent`:
|
|
62
|
+
```
|
|
63
|
+
{file}:{line} — {severity} — {category} — {description} — {fix}
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
### Escalation
|
|
67
|
+
- Critical findings → Generator must fix before proceeding
|
|
68
|
+
- Major findings → Generator should fix before merge
|
|
69
|
+
- >3 findings addressed → re-run Evaluator (max 3 rounds)
|
|
70
|
+
|
|
71
|
+
## Planner Agent Spec
|
|
72
|
+
|
|
73
|
+
When Planner mode is active, the Planner receives:
|
|
74
|
+
- PROBLEM / DELIVERABLE / ACCEPTANCE / BOUNDARIES
|
|
75
|
+
- Relevant codebase context (file structure, key interfaces)
|
|
76
|
+
- Active constraints from rules
|
|
77
|
+
|
|
78
|
+
The Planner outputs a subtask graph:
|
|
79
|
+
```
|
|
80
|
+
1. [subtask] — {files} — {acceptance criteria}
|
|
81
|
+
└─ depends_on: []
|
|
82
|
+
2. [subtask] — {files} — {acceptance criteria}
|
|
83
|
+
└─ depends_on: [1]
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
Generator and Evaluator process each subtask in dependency order.

## Integration with Existing Skills

| Skill | Orchestration role |
| ----- | ------------------ |
| `/px-plan` | May invoke Planner agent for complex decomposition |
| `/px-review` | Already uses Evaluator pattern via `px-subagent` |
| `/px-verify` | Self-review step is a lightweight Evaluator |
| `/px-execute` | Generator role — produces implementation |

## Removal Condition
Remove when Claude Code provides native multi-agent orchestration with
built-in evaluator and planner agent types.
@@ -0,0 +1,87 @@
# Observable Code — Instrumentation Constraints
# Scope: **/services/**, **/handlers/**, **/workers/**, **/middleware/**, **/cmd/**
# Active during code generation for service-layer code
# Cross-reference: api-quality.md covers request-level logging and correlation IDs.
# This rule covers application-level observability: structured logging, metrics, traces.

Code is not production-ready if it cannot be debugged without attaching a debugger.
Observable code tells you what happened, when, and why — from logs, metrics, and traces alone.

## Invariants — BLOCK on violation

### Structured logging only
- All log statements use structured format (key-value pairs, not string interpolation)
- No `fmt.Println` / `console.log` / `print()` in production code paths — use the structured logger
- Log at the point of failure, not at the catch site (log once, propagate)

### Log levels are semantic
- ERROR: something failed and a human needs to know immediately
- WARN: something unexpected happened but the system recovered
- INFO: a significant state transition (service started, job completed, user authenticated)
- DEBUG: internal detail useful during development — must not appear in production by default

### Structured log format — mandatory fields
```json
{
  "timestamp": "ISO-8601 UTC",
  "level": "error|warn|info|debug",
  "service": "service-name",
  "correlation_id": "request or trace identifier",
  "message": "what happened — actionable, not generic",
  "context": { "relevant_key": "relevant_value" }
}
```
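One way to emit these fields is a custom formatter on top of Python's standard `logging` module. This is a sketch under stated assumptions: the `JsonFormatter` class and `build_logger` helper are hypothetical, `correlation_id` and `context` are passed through the `extra` argument, and Python's `WARNING` and `CRITICAL` levels are mapped onto the four level names listed in the schema.

```python
import json
import logging
from datetime import datetime, timezone

# Assumption: map Python level names onto the four levels in the schema.
LEVEL_NAMES = {"WARNING": "warn", "CRITICAL": "error"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the mandatory fields."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": LEVEL_NAMES.get(record.levelname, record.levelname.lower()),
            "service": self.service,
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        }
        return json.dumps(entry)


def build_logger(service: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)  # DEBUG stays off unless configured otherwise
    logger.propagate = False
    return logger
```

A call such as `build_logger("billing").error("payment capture failed", extra={"correlation_id": "req-123", "context": {"order_id": "A1"}})` then produces a single JSON line carrying all six fields.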

### What NOT to log
- Passwords, tokens, secrets, full credit card numbers
- Full request/response bodies in production (may contain PII)
- DEBUG logs in production services (log level must be configurable)
- The same event more than once in the same request path

### External call discipline
- Every external call (HTTP, DB, queue) has a timeout
- Every external call logs duration on completion
- Failed external calls log: target, duration, error type, and whether retry will occur
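The three external-call rules can be enforced in one wrapper. The sketch below assumes the wrapped callable accepts the timeout in seconds as its only argument; `call_external`, the `external` logger name, and the `duration_ms` field are hypothetical choices made for illustration.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("external")


def call_external(target: str, fn: Callable[[float], T], timeout_s: float,
                  will_retry: bool = False) -> T:
    """Run fn(timeout_s); log duration on completion and full details on failure."""
    start = time.monotonic()
    try:
        result = fn(timeout_s)
    except Exception as exc:
        duration_ms = round((time.monotonic() - start) * 1000, 1)
        log.error(
            "external call failed",
            extra={"context": {
                "target": target,
                "duration_ms": duration_ms,
                "error_type": type(exc).__name__,
                "will_retry": will_retry,
            }},
        )
        raise  # logged once here; callers propagate without logging again
    duration_ms = round((time.monotonic() - start) * 1000, 1)
    log.info(
        "external call completed",
        extra={"context": {"target": target, "duration_ms": duration_ms}},
    )
    return result
```

For an HTTP call, the callable could be something like `lambda t: urllib.request.urlopen(url, timeout=t).status`, so the timeout is always passed explicitly rather than left to a library default.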

## Conventions — WARN on violation

### Metrics naming
Format: `{service}_{subsystem}_{name}_{unit}`
All lowercase, underscores as separators.

Mandatory metrics per service:
- `{service}_requests_total` — counter, labeled by method and status code
- `{service}_errors_total` — counter, labeled by error type
- `{service}_latency_seconds` — histogram, labeled by operation
- `{service}_active_connections` or `{service}_queue_depth` — gauge (if applicable)

GOOD: `auth_login_attempts_total`, `cache_hit_ratio`, `queue_messages_pending`
BAD: `loginAttempts`, `CacheHitRatio`, `queue-messages-pending`
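The lexical part of this convention (lowercase, underscore-separated) can be checked with a regular expression. This is a minimal sketch: `is_valid_metric_name` is a hypothetical helper, and it deliberately does not verify that the last segment is a real unit or that all four format segments are present.

```python
import re

# Lowercase alphanumeric words joined by single underscores, at least two words.
METRIC_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)+")


def is_valid_metric_name(name: str) -> bool:
    """Check only casing and separators, not the meaning of each segment."""
    return METRIC_NAME.fullmatch(name) is not None
```

Applied to the examples above, all three GOOD names pass and all three BAD names fail.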

### Trace spans (OpenTelemetry)
Span naming: `{service}/{operation}` — lowercase, slash separator
GOOD: `auth/validate-token`, `db/query-users`, `cache/get`
BAD: `validateToken`, `DB Query`, `GET /users`

Mandatory span attributes:
- `service.name`
- `http.method` and `http.status_code` for HTTP operations
- `db.system` and `db.operation` for database calls
- `error.type` and `error.message` on error spans
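The attribute requirements depend on what kind of operation the span covers, which makes them straightforward to check against a plain dictionary. The sketch below does not use the OpenTelemetry SDK; the `kind` values `"http"` and `"db"` and the `missing_span_attributes` helper are assumptions introduced for illustration.

```python
from typing import Optional

REQUIRED_ALWAYS = {"service.name"}
REQUIRED_BY_KIND = {
    "http": {"http.method", "http.status_code"},
    "db": {"db.system", "db.operation"},
}
REQUIRED_ON_ERROR = {"error.type", "error.message"}


def missing_span_attributes(attributes: dict, kind: Optional[str] = None,
                            is_error: bool = False) -> set:
    """Return the mandatory attribute names that `attributes` lacks."""
    required = set(REQUIRED_ALWAYS)
    required |= REQUIRED_BY_KIND.get(kind, set())
    if is_error:
        required |= REQUIRED_ON_ERROR
    return required - set(attributes)
```

An empty result means the span carries every attribute the rule asks for.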

### Health endpoints
- Liveness: `/healthz` — "is the process alive?"
- Readiness: `/readyz` — "can the process serve traffic?"
- Both return structured JSON with component status
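The two endpoints can be sketched with Python's standard `http.server`. Several details here are assumptions rather than requirements of the rule: the JSON body shape, returning 503 when a component is down, and the hard-coded `components` map standing in for real dependency checks.

```python
import json
from http.server import BaseHTTPRequestHandler


def health_response(path: str, components: dict) -> tuple:
    """Return (HTTP status, JSON-serializable body) for /healthz and /readyz."""
    if path == "/healthz":
        # Liveness: being able to answer at all means the process is alive.
        return 200, {"status": "ok", "components": {"process": "ok"}}
    if path == "/readyz":
        # Readiness: every dependency must be up before traffic is accepted.
        ready = all(components.values())
        body = {
            "status": "ok" if ready else "unavailable",
            "components": {name: "ok" if up else "down" for name, up in components.items()},
        }
        return (200 if ready else 503), body
    return 404, {"status": "not_found"}


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical dependency states; a real service would probe them per request.
    components = {"database": True, "queue": True}

    def do_GET(self) -> None:
        status, body = health_response(self.path, self.components)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

Keeping the decision logic in `health_response` separates it from the HTTP plumbing, so it can be unit-tested without starting a server.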

### The Observability Contract
An error is only production-observable if ALL three are true:
1. It appears in structured logs with correlation_id and context
2. It increments an error metric labeled by error type
3. It is captured in a trace span with error attributes

If only one or two are true: the code is not fully observable. Fix before shipping.

## Removal Condition
Remove when an observability linter or OpenTelemetry SDK auto-instrumentation
replaces these generation-time constraints entirely.