@axis-bootstrap/cli 0.1.0

Files changed (38)
  1. package/README.md +90 -0
  2. package/package.json +42 -0
  3. package/src/commands/audit.js +53 -0
  4. package/src/commands/cleanup.js +42 -0
  5. package/src/commands/doctor.js +137 -0
  6. package/src/commands/init.js +297 -0
  7. package/src/commands/link.js +31 -0
  8. package/src/commands/spdd.js +139 -0
  9. package/src/commands/state.js +21 -0
  10. package/src/index.js +113 -0
  11. package/src/lib/copy.js +19 -0
  12. package/src/lib/detect.js +70 -0
  13. package/src/lib/i18n.js +147 -0
  14. package/src/lib/paths.js +45 -0
  15. package/src/lib/ui.js +29 -0
  16. package/templates/CANVAS.md +48 -0
  17. package/templates/CONVENTIONS.md +43 -0
  18. package/templates/INSTRUCTIONS.md +49 -0
  19. package/templates/STATE.md +27 -0
  20. package/templates/bootstrap-skill/PLANNER.md +221 -0
  21. package/templates/bootstrap-skill/PROMPT-TEMPLATE.md +128 -0
  22. package/templates/bootstrap-skill/SKILL.md +56 -0
  23. package/templates/bootstrap-skill/references/CANVAS-REASONS.md +111 -0
  24. package/templates/bootstrap-skill/references/PATTERNS.md +372 -0
  25. package/templates/bootstrap-skill/references/PHASE-1-DISCOVERY.md +120 -0
  26. package/templates/bootstrap-skill/references/PHASE-2-SPEC.md +250 -0
  27. package/templates/bootstrap-skill/references/PHASE-3-HARNESS.md +331 -0
  28. package/templates/bootstrap-skill/references/PHASE-4-MEMORY.md +187 -0
  29. package/templates/bootstrap-skill/references/PHASE-5-VALIDATION.md +194 -0
  30. package/templates/bootstrap-skill/references/QUICKSTART.md +144 -0
  31. package/templates/bootstrap-skill/references/TEMPLATES.md +602 -0
  32. package/templates/bootstrap-skill/references/UNIVERSAL-MAP.md +216 -0
  33. package/templates/settings.json +29 -0
  34. package/templates/setup-ide-links.sh +33 -0
  35. package/templates/skills/abstraction-first.md +55 -0
  36. package/templates/skills/alignment.md +53 -0
  37. package/templates/skills/iterative-review.md +55 -0
  38. package/templates/skills/story-decompose.md +54 -0
package/templates/bootstrap-skill/references/PHASE-2-SPEC.md
@@ -0,0 +1,250 @@
1
+ # Phase 2 — Spec Layer Generation
2
+
3
+ **Goal:** generate the project's single source of knowledge (`.ai/`).
4
+
5
+ **Input:** Project Profile validated in Phase 1.
6
+
7
+ **Output:** `.ai/` structure populated with `INSTRUCTIONS.md`, skill skeletons, initial rules, and doc stubs.
8
+
9
+ ---
10
+
11
+ ## Generation Order (do not reverse)
12
+
13
+ ```text
14
+ 1. Create folder structure
15
+ 2. Generate INSTRUCTIONS.md
16
+ 3. Generate skill skeletons (one per identified domain)
17
+ 4. Generate initial rules
18
+ 5. Generate doc stubs
19
+ 6. Present to user and validate
20
+ ```
21
+
22
+ ---
23
+
24
+ ## Step 1 — Folder Structure
25
+
26
+ ```bash
27
+ mkdir -p .ai/{skills,rules,docs,docs/stories}
28
+ ```
29
+
30
+ For non-technical projects, still create the structure — `rules/` can be used for "protocols", `docs/` for domain references. Homogeneity simplifies maintenance.
31
+
32
+ ---
33
+
34
+ ## Step 2 — INSTRUCTIONS.md
35
+
36
+ Use the template from [TEMPLATES.md → INSTRUCTIONS.md](TEMPLATES.md#instructionsmd).
37
+
38
+ **Section order (consultation frequency, not logical importance):**
39
+
40
+ 1. **Purpose** (1-2 sentences — what it does, for whom, why)
41
+ 2. **Stack or Tools** (with relevant versions)
42
+ 3. **How to Run / How to Start** (exact commands or first steps)
43
+ 4. **Architecture** (table: component → responsibility → technology → location)
44
+ 5. **Design Principles** (3-7 bullets with short rationale)
45
+ 6. **Conventions** (summary — details in rules)
46
+ 7. **Available Skills** (table: skill → when to use)
47
+ 8. **Links** (to detailed docs)
48
+
49
+ **Target size:** 100-180 lines. Below 100 tends to be superficial; 181-200 is tolerable but worth trimming; above 200 loses focus.
50
+
51
+ **Critical insight — describe decisions, not just facts:**
52
+
53
+ ```markdown
54
+ # Bad
55
+ - ORM: TypeORM
56
+
57
+ # Good
58
+ - ORM: TypeORM with Repository pattern — never access `Repository<T>` directly in services,
59
+ encapsulate in `*Repository` classes to make mocking in tests easier
60
+ ```
61
+
62
+ The second form saves the developer a question and prevents code that breaks the pattern.
63
+
64
+ **For non-technical projects**, replace:
65
+
66
+ - "Stack" → "Tools and platforms"
67
+ - "How to run" → "How to start / workflow"
68
+ - "Architecture" → "Project components"
69
+ - "Conventions" → "Quality standards"
70
+
71
+ ---
72
+
73
+ ## Step 3 — Skill Skeletons
74
+
75
+ For each domain identified in Phase 1, create:
76
+
77
+ ```text
78
+ .ai/skills/<name>/
79
+ └── SKILL.md (40-60 lines, without references/ yet)
80
+ ```
81
+
82
+ Use template [SKILL.md in TEMPLATES.md](TEMPLATES.md#skillmd-index).
83
+
84
+ **The frontmatter `description` is the most critical element** because it determines whether the skill will be loaded. Checklist:
85
+
86
+ - [ ] 2-4 lines (1 line is vague, 5+ is excessive)
87
+ - [ ] In third person, with no "I" or "you", and with an explicit trigger phrase ("Use when implementing...")
88
+ - [ ] Mentions domain terms that act as triggers
89
+ - [ ] Lists 3-5 usage scenarios
90
+ - [ ] A new dev would understand when to use the skill just by reading the description
91
+
92
+ ```yaml
93
+ # Weak
94
+ description: Reference for the payments API integration.
95
+
96
+ # Strong
97
+ description: Complete reference for the Payments API integration.
98
+   Use when implementing API calls (endpoints, auth, payload format),
99
+   debugging API responses (error codes, rate limits),
100
+   or understanding the retry strategy and idempotency rules.
101
+ ```
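The mechanical parts of this checklist can be linted. The sketch below is only an illustration under stated assumptions, not part of the package: `description_lines` and `check_description` are hypothetical helpers, and they assume the description is written as a plain YAML scalar with indented continuation lines. It checks the line count and the presence of a "Use when" trigger; the remaining checklist items still need a human reader.

```bash
#!/usr/bin/env bash
# Hypothetical lint for the SKILL.md frontmatter description (illustrative only).

# Prints the description block from the first frontmatter section:
# the `description:` line plus any indented continuation lines.
description_lines() {
  awk '
    /^---[ \t]*$/ { fm++; next }                 # frontmatter delimiters
    fm != 1 { next }                             # ignore everything outside it
    /^description:/ { in_desc = 1; print; next }
    in_desc && /^[ \t]+[^ \t]/ { print; next }   # indented continuation line
    { in_desc = 0 }                              # any other key ends the block
  ' "$1"
}

# Returns 0 and prints "ok" when the description has 2-4 lines and a trigger.
check_description() {
  local file="$1" count
  count="$(description_lines "$file" | wc -l | tr -d ' ')"
  if [ "$count" -lt 2 ] || [ "$count" -gt 4 ]; then
    echo "description has $count line(s); aim for 2-4"
    return 1
  fi
  if ! description_lines "$file" | grep -q 'Use when'; then
    echo 'description has no "Use when" trigger'
    return 1
  fi
  echo "ok"
}
```

Usage, assuming a skill named `payments` exists: `check_description .ai/skills/payments/SKILL.md`.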
102
+
103
+ **Do not populate references/ yet.** This phase delivers the index. References are filled in subsequent sessions as knowledge accumulates.
104
+
105
+ ### Granularity — when to create new skill vs expand existing
106
+
107
+ **Create new when:**
108
+ - Domain has >5 specific concepts
109
+ - Has its own workflow
110
+ - Usage scenario is distinct
111
+
112
+ **Expand existing when:**
113
+ - Information is complementary
114
+ - SKILL.md still <60 lines after addition
115
+ - Same usage scenario
116
+
117
+ **Use `docs/` instead of skill when:**
118
+ - It is pure reference documentation (schema, contracts)
119
+ - Does not involve workflow
120
+ - Will be referenced by multiple skills
121
+
122
+ ---
123
+
124
+ ## Step 4 — Initial Rules
125
+
126
+ For software projects, create 3-7 rules in `.ai/rules/`. Use template [Rule in TEMPLATES.md](TEMPLATES.md#code-rule).
127
+
128
+ **Recommended default structure:**
129
+
130
+ ```text
131
+ .ai/rules/
132
+ ├── code-style.md (naming, formatting, imports)
133
+ ├── architecture-patterns.md (DI, modules, framework patterns)
134
+ ├── database.md (ORM, migrations, queries)
135
+ ├── testing.md (test structure, mocks)
136
+ └── cli.md (commands and scripts)
137
+ ```
138
+
139
+ **Frontmatter for scope:**
140
+
141
+ ```yaml
142
+ ---
143
+ applyTo: "**/*.{ext}"
144
+ paths:
145
+ - "src/**"
146
+ ---
147
+ ```
148
+
149
+ **How to write an effective rule:**
150
+
151
+ ```markdown
152
+ # Bad — too generic
153
+ - Use meaningful variable names
154
+ - Keep functions small
155
+
156
+ # Good — specific and actionable
157
+ - Use constants or enums for all fixed domain values (e.g., `Status`, `Origin`)
158
+ — never use string literals like `'pending'` scattered in the code
159
+ - Batch operations: prefer native ORM/DB bulk inserts/updates, never per-row loops
160
+ (a loop issues one query per row, so cost grows linearly with table size)
161
+ ```
162
+
163
+ **Three elements of an effective rule:** what to do, how to do it, and why (when not obvious).
164
+
165
+ **For non-technical projects**, replace rules with **protocols** (same structure, without `applyTo`):
166
+
167
+ ```text
168
+ .ai/rules/ (or .ai/protocols/)
169
+ ├── tone-of-voice.md
170
+ ├── article-structure.md
171
+ ├── review-checklist.md
172
+ └── citation-standards.md
173
+ ```
174
+
175
+ ---
176
+
177
+ ## Step 5 — Doc Stubs
178
+
179
+ Create files with headers and empty sections, ready for future population.
180
+
181
+ ### For software
182
+
183
+ ```text
184
+ .ai/docs/
185
+ ├── architecture.md (system overview + decisions)
186
+ ├── database-schema.md (tables + business rules + indexes)
187
+ ├── api-contracts.md (internal and external contracts)
188
+ ├── data-flows.md (optional — for non-obvious flows)
189
+ └── monitoring.md (optional — observability)
190
+ ```
191
+
192
+ `architecture.md` should include a **Key Architectural Decisions** section in this format:
193
+
194
+ ```markdown
195
+ - **Why <decision>:** <short rationale>
196
+ ```
197
+
198
+ `database-schema.md` should include **business rules** alongside the schema (not elsewhere):
199
+
200
+ ```markdown
201
+ **Business rules:**
202
+ - `deleted_at IS NULL` in all queries (soft delete)
203
+ - `retry_count` incremented on each failed attempt, max 3
204
+ ```
205
+
206
+ ### For specialized domains (non-software)
207
+
208
+ ```text
209
+ .ai/docs/
210
+ ├── glossary.md (domain terms with specific meaning)
211
+ ├── workflows.md (work flows)
212
+ └── references.md (external links, official sources)
213
+ ```
214
+
215
+ ---
216
+
217
+ ## Step 6 — Validation and Gate
218
+
219
+ Present to user:
220
+
221
+ ```markdown
222
+ ## Spec Layer Generated
223
+
224
+ ### Structure
225
+ .ai/INSTRUCTIONS.md (N lines)
226
+ .ai/skills/<skill1>/SKILL.md (N lines)
227
+ .ai/skills/<skill2>/SKILL.md (N lines)
228
+ ...
229
+ .ai/rules/<rule1>.md
230
+ .ai/docs/architecture.md (stub)
231
+ ...
232
+
233
+ ### Total
234
+ - N skills initialized
235
+ - N rules created
236
+ - N doc stubs
237
+
238
+ ### Question
239
+ Any critical domain I missed? Any file that doesn't make sense for this project?
240
+ Any name to adjust?
241
+ ```
242
+
243
+ **Wait for confirmation before Phase 3.**
244
+
245
+ ---
246
+
247
+ ## Quality Principles
248
+
249
+ - **Density > length** — every line must carry useful information
250
+ - **Decisions > facts** — explain the "why", not just the "what"
package/templates/bootstrap-skill/references/PHASE-3-HARNESS.md
@@ -0,0 +1,331 @@
1
+ # Phase 3 — Harness Layer Configuration
2
+
3
+ **Goal:** install the behavioral infrastructure that makes the agent safe, consistent, and productive regardless of what the model "decides" in the moment.
4
+
5
+ **Input:** Spec Layer from Phase 2 validated + project type.
6
+
7
+ **Output:** versioned `settings.json`, configured hooks, declared sub-agents, distributed symlinks.
8
+
9
+ ---
10
+
11
+ ## Why the Harness Exists
12
+
13
+ The spec defines what the agent knows. **But production reliability depends more on the harness than on the model.** Without it:
14
+
15
+ - Inconsistent formatting accumulates in dirty diffs
16
+ - Destructive commands go through (`rm -rf`, `DROP TABLE`)
17
+ - Tests don't run at the end, regressions escape
18
+ - AI behavior differs from one developer's machine to the next
19
+
20
+ The harness eliminates each of these by construction, not by discipline.
21
+
22
+ ---
23
+
24
+ ## The Five Subsystems
25
+
26
+ | Subsystem | Function | Applicability |
27
+ | ------------------------- | ------------------------------------ | --------------------------------- |
28
+ | **Permission Harness** | Versioned `settings.json` | Universal |
29
+ | **Execution Harness** | Hooks (Pre/Post/Stop) | Software (with adaptations) |
30
+ | **Orchestration Harness** | Sub-agents | Universal |
31
+ | **Context Harness** | Token budget, Progressive Disclosure | Universal (already in Phase 2) |
32
+ | **Verification Harness** | Quality gates in skills | Universal (already in Phase 2) |
33
+
34
+ This phase implements the first three (the other two are already in the Phase 2 design).
35
+
36
+ ---
37
+
38
+ ## Step 1 — `settings.json`
39
+
40
+ Use the template in [TEMPLATES.md → settings.json](TEMPLATES.md#settingsjson). Adapt it to the stack using the table below:
41
+
42
+ | Stack | Replace `<build-tool>` with |
43
+ | ----------- | ------------------------------------------------- |
44
+ | Node.js | `Bash(npm *)`, `Bash(npx *)` |
45
+ | Python | `Bash(pip *)`, `Bash(pytest *)`, `Bash(poetry *)` |
46
+ | Go | `Bash(go *)` |
47
+ | Java/Maven | `Bash(mvn *)` |
48
+ | Java/Gradle | `Bash(gradle *)`, `Bash(./gradlew *)` |
49
+ | Ruby | `Bash(bundle *)`, `Bash(rake *)` |
50
+ | PHP | `Bash(composer *)` |
51
+ | Rust | `Bash(cargo *)` |
52
+ | .NET | `Bash(dotnet *)` |
53
+
54
+ **Minimum structure:**
55
+
56
+ ```json
57
+ {
58
+   "permissions": {
59
+     "allow": ["Read", "Bash(git *)", "<build-tool>", "Edit(/src/**)", "Edit(/.ai/**)"],
60
+     "deny": ["Bash(rm -rf *)", "Bash(git push --force*)"],
61
+     "ask": ["Bash(git push *)", "Edit(/.env*)"]
62
+   }
63
+ }
64
+ ```
65
+
66
+ **Universal (non-software):** keep `Read`, `Bash(git *)`, `Edit(/.ai/**)`, and adapt `Edit` to the project layout. Skip stack entries.
67
+
68
+ **Versioning in git** is mandatory. Without it, behavior varies per machine and bugs are hard to reproduce.
69
+
70
+ ---
71
+
72
+ ## Step 2 — Hooks
73
+
74
+ Hooks execute shell commands in response to agent events. **Three are indispensable** when applicable:
75
+
76
+ ### Hook A — `PostToolUse` (automatic formatting)
77
+
78
+ **Applicable to:** software with a formatter.
79
+
80
+ ```json
81
+ {
82
+   "hooks": {
83
+     "PostToolUse": [
84
+       {
85
+         "matcher": "Edit|Write",
86
+         "hooks": [
87
+           {
88
+             "type": "command",
89
+             "command": "bash scripts/format-file.sh \"$CLAUDE_TOOL_INPUT_FILE_PATH\""
90
+           }
91
+         ]
92
+       }
93
+     ]
94
+   }
95
+ }
96
+ ```
97
+
98
+ The `format-file.sh` script is in [TEMPLATES.md → format-file.sh](TEMPLATES.md#format-filesh). It selects the formatter for the stack with a `case` statement and always ends with `exit 0`, so a missing formatter does not block the agent.
99
+
100
+ **Why indispensable:** without it, diffs get polluted with style changes, increasing code review cost.
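A minimal sketch of what a stack-aware, never-failing formatter script could look like. It is not the packaged `format-file.sh`: the function names and the formatter chosen per extension (Prettier, Black, gofmt, rustfmt) are assumptions for illustration, and the file path is taken as the first argument.

```bash
#!/usr/bin/env bash
# Hypothetical formatter dispatch; tool choices per extension are assumptions.

# Prints the formatter command for a file, or nothing when none applies.
formatter_for() {
  case "$1" in
    *.ts|*.tsx|*.js|*.jsx|*.json) echo "npx prettier --write" ;;
    *.py)                         echo "black --quiet" ;;
    *.go)                         echo "gofmt -w" ;;
    *.rs)                         echo "rustfmt" ;;
    *)                            echo "" ;;
  esac
}

# Formats one file; always returns 0 so a missing or failing formatter
# never blocks the agent.
format_file() {
  local file="$1" cmd
  cmd="$(formatter_for "$file")"
  [ -n "$cmd" ] || return 0
  # Skip silently when the formatter's executable is not installed.
  command -v "${cmd%% *}" >/dev/null 2>&1 || return 0
  # Word splitting of $cmd is intentional: it holds a command plus flags.
  $cmd "$file" >/dev/null 2>&1 || true
  return 0
}

# Entry point when called as: bash format-file.sh <path>
if [ "$#" -gt 0 ]; then
  format_file "$1"
fi
```

Swallowing every error is a deliberate trade-off in this sketch: a formatting problem shows up later as an unformatted diff rather than as a blocked edit.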
101
+
102
+ ### Hook B — `PreToolUse` (destructive blocking)
103
+
104
+ **Applicable to:** universal. Always install.
105
+
106
+ ```json
107
+ {
107
+   "hooks": {
108
+     "PreToolUse": [
109
+       {
110
+         "matcher": "Bash",
111
+         "hooks": [{ "type": "command", "command": "bash scripts/validate-bash.sh" }]
112
+       }
113
+     ]
114
+   }
115
+ }
117
+ ```
118
+
119
+ The `validate-bash.sh` script is in [TEMPLATES.md → validate-bash.sh](TEMPLATES.md#validate-bashsh). It blocks these patterns: `rm -rf /`, `DROP TABLE`, `TRUNCATE`, and `DELETE FROM` without a `WHERE` clause.
120
+
121
+ **Why indispensable:** the agent occasionally infers that it needs to "clean up" files. Without protection, a single mistake caused by missing or wrong context can be irreversible. The hook does not block normal work, only the dangerous cases.
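The pattern check itself can be sketched in a few lines. This is an illustration, not the packaged `validate-bash.sh`: `is_destructive` is a hypothetical helper, the regular expressions are one possible reading of the four patterns listed above, and the command is assumed to arrive as the first argument (how the agent runtime actually passes it, and which exit code it treats as "block", should be checked against that runtime's hook documentation).

```bash
#!/usr/bin/env bash
# Hypothetical destructive-command check (illustrative patterns only).

# Returns 0 (true) when the command matches one of the blocked patterns.
is_destructive() {
  local cmd="$1"
  # rm with -rf or -fr aimed at the filesystem root itself
  if printf '%s\n' "$cmd" |
     grep -Eq '(^|[;&|[:space:]])rm[[:space:]]+-(rf|fr)[[:space:]]+/([[:space:]]|$)'; then
    return 0
  fi
  # DROP TABLE or TRUNCATE, case-insensitive. Note: this also matches the
  # coreutils `truncate` command; narrow the pattern if that is a problem.
  if printf '%s\n' "$cmd" | grep -Eiq '(drop[[:space:]]+table|truncate[[:space:]])'; then
    return 0
  fi
  # DELETE FROM with no WHERE clause anywhere in the command
  if printf '%s\n' "$cmd" | grep -Eiq 'delete[[:space:]]+from' &&
     ! printf '%s\n' "$cmd" | grep -Eiq '[[:space:]]where[[:space:]]'; then
    return 0
  fi
  return 1
}

# Entry point when called as: bash validate-bash.sh "<command>"
# A non-zero exit tells the caller to refuse the command.
if [ "$#" -gt 0 ] && is_destructive "$1"; then
  echo "Blocked destructive command: $1" >&2
  exit 2
fi
```

Pattern matching of this kind is a safety net, not a sandbox: a command that hides the same operation behind a script or an alias passes through, which is why the `deny` list in `settings.json` stays in place alongside it.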
122
+
123
+ ### Hook C — `Stop` (tests on finish)
124
+
125
+ **Applicable to:** software with a test runner.
126
+
127
+ ```json
128
+ {
128
+   "hooks": {
129
+     "Stop": [
130
+       {
131
+         "hooks": [
132
+           {
133
+             "type": "command",
134
+             "command": "bash scripts/run-tests-if-changed.sh"
135
+           }
136
+         ]
137
+       }
138
+     ]
139
+   }
140
+ }
142
+ ```
143
+
144
+ The `run-tests-if-changed.sh` script is in [TEMPLATES.md → run-tests-if-changed.sh](TEMPLATES.md#run-tests-if-changedsh). It detects the file extensions changed in the diff and runs only the applicable test runner.
145
+
146
+ **Why indispensable:** it closes the feedback loop. The agent does not just make changes; it validates them, so regressions are caught in the same session.
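The "changed extensions to test runner" mapping can be sketched as follows. This is not the packaged `run-tests-if-changed.sh`: the helper names and the runner chosen per extension (`npm test`, `pytest`, `go test ./...`) are assumptions, and the change list is assumed to come from `git diff --name-only HEAD`.

```bash
#!/usr/bin/env bash
# Hypothetical extension-to-runner mapping (runner choices are assumptions).

# Reads changed file paths on stdin, one per line, and prints each
# applicable test command once, however many files match it.
runners_for_changes() {
  local file
  while IFS= read -r file; do
    case "$file" in
      *.ts|*.tsx|*.js|*.jsx) echo "npm test" ;;
      *.py)                  echo "pytest" ;;
      *.go)                  echo "go test ./..." ;;
    esac
  done | sort -u
}

# Runs every applicable runner for the working-tree changes.
# Returns non-zero if any runner fails; does nothing when no code changed.
run_tests_if_changed() {
  local cmd status=0
  while IFS= read -r cmd; do
    [ -n "$cmd" ] || continue
    echo "Running: $cmd" >&2
    # Detach stdin so the runner cannot consume the remaining command list.
    $cmd </dev/null || status=1
  done <<EOF
$(git diff --name-only HEAD | runners_for_changes)
EOF
  return "$status"
}
```

A `Stop` hook script built on this sketch would end with a call to `run_tests_if_changed`. Documentation-only changes produce no runner, so the hook costs nothing in that case.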
147
+
148
+ ### For non-technical projects
149
+
150
+ - **Hook A:** skip (no formatter)
151
+ - **Hook B:** **keep** (universal protection)
152
+ - **Hook C:** skip or replace with output validation (e.g., spell check, markdown lint)
153
+
154
+ ---
155
+
156
+ ## Step 3 — Sub-agents
157
+
158
+ Sub-agents enable smart delegation. The main agent **orchestrates**, sub-agents **execute**.
159
+
160
+ ### `Explore` (built-in, always enable)
161
+
162
+ - Read-only access: `Glob`, `Grep`, `Read`, `WebFetch`, `WebSearch`
163
+ - Cannot edit files during research
164
+ - More efficient: does not load unused write tools
165
+ - For large codebases, use level `"very thorough"`
166
+
167
+ ### When to delegate vs execute
168
+
169
+ | Task | Delegate? | Why |
170
+ | --------------------------------- | --------- | ------------------------------------ |
171
+ | Research / exploration | **Yes** | Bulky output; only the summary matters |
172
+ | Task implementation | **Yes** | File reads/edits consume context |
173
+ | Independent parallel tasks | **Yes** | Only way to parallelize |
174
+ | Sequential tasks without dependencies | **Yes** | Keeps main context clean |
175
+ | Planning and task creation | **No** | Requires accumulated context |
176
+ | Validation and final reports | **No** | Needs session history |
177
+ | Quick fixes (≤3 files) | **No** | Overhead > task |
178
+
179
+ ### Sub-agent contract
180
+
181
+ **Receives:**
182
+ - Task definition (what to do, where, completion criteria)
183
+ - Relevant rules and conventions
184
+ - Spec/design the task references
185
+
186
+ **Does not receive:**
187
+ - Definitions of other tasks
188
+ - Accumulated chat history
189
+ - `STATE.md` (unless recording a specific decision/blocker)
190
+
191
+ **Returns:**
192
+ - Status: Complete | Blocked | Partial
193
+ - Changed files
194
+ - Test/validation result
195
+ - Issues found
196
+
197
+ ---
198
+
199
+ ## Step 4 — Symlinks by IDE
200
+
201
+ For each IDE declared in Phase 1, create symlinks. Use the script [setup-ide-links.sh in TEMPLATES.md](TEMPLATES.md#setup-ide-linkssh).
202
+
203
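+ **Principle:** the script is **idempotent**: `ln -sf` replaces an existing link without error, so it can run as many times as needed. For links that point to directories, prefer `ln -sfn`; without `-n`, a second run follows the existing link and creates a new link inside the target directory.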
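The idempotency claim is easy to demonstrate in a throwaway directory. The sketch below is illustrative only: `link_context` is a hypothetical helper, and the link names and targets are examples, not the mapping used by `setup-ide-links.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical demo of idempotent symlink creation (names are illustrative).

# Creates or refreshes a symlink. -n keeps ln from following an existing
# link to a directory, which would otherwise nest a new link inside it.
link_context() {
  local target="$1" link="$2"
  ln -sfn "$target" "$link"
}

# Demo: create the same links twice in a temporary directory.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/.ai/skills"
touch "$demo_dir/.ai/INSTRUCTIONS.md"
for run in 1 2; do
  (
    cd "$demo_dir" || exit 1
    link_context .ai/INSTRUCTIONS.md CLAUDE.md   # file link
    link_context .ai/skills .agents              # directory link
  )
done
readlink "$demo_dir/CLAUDE.md"   # prints .ai/INSTRUCTIONS.md
readlink "$demo_dir/.agents"     # prints .ai/skills
# Remove the demo directory with: rm -rf "$demo_dir"
```

After the second pass both links still point at their original targets and nothing extra has appeared under `.ai/skills`, which is the property the smoke test relies on.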
204
+
205
+ **Target folder by IDE:**
206
+
207
+ | IDE | Where it looks for context |
208
+ | -------------- | ---------------------------------------------------------- |
209
+ | Claude Code | `.claude/`, `CLAUDE.md` |
210
+ | Cursor | `.cursor/rules/`, `.cursor/skills/`, `AGENTS.md` |
211
+ | GitHub Copilot | `.github/copilot-instructions.md`, `.github/instructions/` |
212
+ | Windsurf | `AGENTS.md`, `.agents/` |
213
+
214
+ **Skip** symlinks for IDEs that the user said they do not use. This reduces noise in `git status`.
215
+
216
+ **Smoke test after creating:**
217
+
218
+ ```bash
219
+ ls -la CLAUDE.md AGENTS.md .claude/ .cursor/ .agents/ .github/
220
+ ```
221
+
222
+ Each symlink should show `-> .ai/...` (or `-> ../.ai/...` for links inside a subfolder).
223
+
224
+ ### Windows
225
+
226
+ Symlinks on Windows require administrator permission or Developer Mode enabled. If the team uses Windows:
227
+
228
+ - Document in `INSTRUCTIONS.md` or `CONVENTIONS.md`
229
+ - Recommend `core.symlinks = true` in Git for Windows
230
+ - Alternative: use `mklink /D` in elevated terminal
231
+
232
+ ---
233
+
234
+ ## Step 5 — Smoke Test and Gate
235
+
236
+ ```bash
237
+ # 1. Verify settings.json
238
+ jq . .claude/settings.json # fails on invalid JSON; use plain cat if jq is not installed
239
+
240
+ # 2. Verify symlinks resolve
241
+ ls -la CLAUDE.md AGENTS.md
242
+
243
+ # 3. Verify the format script runs (calls it directly on a dummy file, not through the hook)
244
+ echo "test" > /tmp/test.ts && bash scripts/format-file.sh /tmp/test.ts
245
+ ```
246
+
247
+ Present to user:
248
+
249
+ ```markdown
250
+ ## Harness Layer Configured
251
+
252
+ ### Permissions
253
+ - N entries in allow, N in deny, N in ask
254
+
255
+ ### Hooks installed
256
+ - PostToolUse: format-file.sh (Node/Python/Go per stack)
257
+ - PreToolUse: validate-bash.sh (destructive protection — universal)
258
+ - Stop: run-tests-if-changed.sh
259
+
260
+ ### Symlinks created
261
+ [visual tree]
262
+
263
+ ### Smoke test
264
+ [output of the 3 commands above]
265
+
266
+ ### Question
267
+ Any additional destructive patterns to block? Any missing IDE?
268
+ ```
269
+
270
+ **Wait for confirmation before Phase 4.**
271
+
272
+ ---
273
+
274
+ ## Step 6 — Failure Attribution
275
+
276
+ > **Context:** ReliabilityBench (arxiv 2601.06112) demonstrated that pass@1 overestimates reliability by 20-40%. AgentProp-Bench (arxiv 2604.16706) showed that most benchmarks report only pass/fail, without locating where in the pipeline the failure occurred. AXIS instruments the harness for attribution.
277
+
278
+ **Configure structured logging in `settings.json`:**
279
+
280
+ ```json
281
+ {
281
+   "hooks": {
282
+     "PreToolUse": [
283
+       {
284
+         "matcher": ".*",
285
+         "hooks": [
286
+           {
287
+             "type": "command",
288
+             "command": "echo \"{\\\"event\\\":\\\"pre\\\",\\\"tool\\\":\\\"$CLAUDE_TOOL_NAME\\\",\\\"ts\\\":\\\"$(date -Iseconds)\\\"}\" >> .ai/logs/harness.jsonl 2>/dev/null || true"
289
+           }
290
+         ]
291
+       }
292
+     ],
293
+     "PostToolUse": [
294
+       {
295
+         "matcher": ".*",
296
+         "hooks": [
297
+           {
298
+             "type": "command",
299
+             "command": "echo \"{\\\"event\\\":\\\"post\\\",\\\"tool\\\":\\\"$CLAUDE_TOOL_NAME\\\",\\\"exit\\\":\\\"$CLAUDE_TOOL_EXIT_CODE\\\",\\\"ts\\\":\\\"$(date -Iseconds)\\\"}\" >> .ai/logs/harness.jsonl 2>/dev/null || true"
300
+           }
301
+         ]
302
+       }
303
+     ]
304
+   }
305
+ }
307
+ ```
308
+
309
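+ Create the directory first (`mkdir -p .ai/logs`): `>>` does not create missing parent directories, so without it the hooks above log nothing and, because of `|| true`, never report the failure. Then add `.ai/logs/` to `.gitignore`; these are runtime logs and should not be versioned.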
310
+
311
+ **Failure attribution table:**
312
+
313
+ | Category | Symptom | Signal in log | Action |
314
+ | ------------- | ----------------------------------------- | --------------------------------------- | ------------------------------------------------------------------ |
315
+ | **Planning** | Agent attempts to execute without clear criteria | PreToolUse without corresponding spec task | Review `INSTRUCTIONS.md`; add acceptance criteria to skill |
316
+ | **Execution** | Tool call fails repeatedly | PostToolUse with `exit != 0` in loop | Review `settings.json`; adjust allow/deny |
317
+ | **Response** | Output generated but wrong format | Phase 5 gate rejects | Add output example to skill template |
318
+
319
+ **Add to Phase 5 checklist:**
320
+ - [ ] `harness.jsonl` exists and records events after smoke test
321
+ - [ ] No tool call loop with exit != 0 detected
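The second checklist item can be checked mechanically. The sketch below is an assumption-based illustration: `failure_loops` is a hypothetical helper, it relies on the exact one-line JSON shape written by the logging hooks above, and the threshold of three consecutive failures is an arbitrary default, not a value from this document.

```bash
#!/usr/bin/env bash
# Hypothetical scan of harness.jsonl for repeated tool failures.

# Prints each tool whose "post" events failed N times in a row (default 3).
# Streaks are tracked per tool, so interleaved calls to other tools do not
# break them. An empty exit value resets the streak instead of counting.
failure_loops() {
  local log="$1" threshold="${2:-3}"
  grep '"event":"post"' "$log" |
    sed -E 's/.*"tool":"([^"]*)".*"exit":"([^"]*)".*/\1 \2/' |
    awk -v n="$threshold" '
      $2 != "0" && $2 != "" { streak[$1]++; if (streak[$1] == n) print $1; next }
      { streak[$1] = 0 }
    '
}
```

Usage: `failure_loops .ai/logs/harness.jsonl` prints nothing on a healthy log; any tool name it prints is a candidate for the **Execution** row of the attribution table.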
322
+
323
+ ---
324
+
325
+ ## Unifying Principle
326
+
327
+ The gain from hooks is that they **remove the dependency on manual discipline.** The formatter runs because the hook exists, not because the developer remembered. Tests run because `Stop` was configured, not because the agent decided to. Destructive commands are blocked because the rule exists, not because the agent "was careful".
328
+
329
+ **Production failures are not opaque:** the instrumented harness shows whether the problem is in planning (vague spec), execution (invalid tool call), or response (wrong format). This replaces much of the trial and error in debugging.
330
+
331
+ Spec defines what the agent knows. Harness ensures it acts consistently, safely, and traceably — regardless of the conversation context.