@axis-bootstrap/cli 0.1.0
- package/README.md +90 -0
- package/package.json +42 -0
- package/src/commands/audit.js +53 -0
- package/src/commands/cleanup.js +42 -0
- package/src/commands/doctor.js +137 -0
- package/src/commands/init.js +297 -0
- package/src/commands/link.js +31 -0
- package/src/commands/spdd.js +139 -0
- package/src/commands/state.js +21 -0
- package/src/index.js +113 -0
- package/src/lib/copy.js +19 -0
- package/src/lib/detect.js +70 -0
- package/src/lib/i18n.js +147 -0
- package/src/lib/paths.js +45 -0
- package/src/lib/ui.js +29 -0
- package/templates/CANVAS.md +48 -0
- package/templates/CONVENTIONS.md +43 -0
- package/templates/INSTRUCTIONS.md +49 -0
- package/templates/STATE.md +27 -0
- package/templates/bootstrap-skill/PLANNER.md +221 -0
- package/templates/bootstrap-skill/PROMPT-TEMPLATE.md +128 -0
- package/templates/bootstrap-skill/SKILL.md +56 -0
- package/templates/bootstrap-skill/references/CANVAS-REASONS.md +111 -0
- package/templates/bootstrap-skill/references/PATTERNS.md +372 -0
- package/templates/bootstrap-skill/references/PHASE-1-DISCOVERY.md +120 -0
- package/templates/bootstrap-skill/references/PHASE-2-SPEC.md +250 -0
- package/templates/bootstrap-skill/references/PHASE-3-HARNESS.md +331 -0
- package/templates/bootstrap-skill/references/PHASE-4-MEMORY.md +187 -0
- package/templates/bootstrap-skill/references/PHASE-5-VALIDATION.md +194 -0
- package/templates/bootstrap-skill/references/QUICKSTART.md +144 -0
- package/templates/bootstrap-skill/references/TEMPLATES.md +602 -0
- package/templates/bootstrap-skill/references/UNIVERSAL-MAP.md +216 -0
- package/templates/settings.json +29 -0
- package/templates/setup-ide-links.sh +33 -0
- package/templates/skills/abstraction-first.md +55 -0
- package/templates/skills/alignment.md +53 -0
- package/templates/skills/iterative-review.md +55 -0
- package/templates/skills/story-decompose.md +54 -0
@@ -0,0 +1,250 @@

# Phase 2 — Spec Layer Generation

**Goal:** generate the project's single source of knowledge (`.ai/`).

**Input:** Project Profile validated in Phase 1.

**Output:** `.ai/` structure populated with `INSTRUCTIONS.md`, skill skeletons, initial rules, and doc stubs.

---

## Generation Order (do not reverse)

```text
1. Create folder structure
2. Generate INSTRUCTIONS.md
3. Generate skill skeletons (one per identified domain)
4. Generate initial rules
5. Generate doc stubs
6. Present to user and validate
```

---

## Step 1 — Folder Structure

```bash
mkdir -p .ai/{skills,rules,docs,docs/stories}
```

For non-technical projects, still create the structure — `rules/` can be used for "protocols", `docs/` for domain references. Homogeneity simplifies maintenance.

---

## Step 2 — INSTRUCTIONS.md

Use the template from [TEMPLATES.md → INSTRUCTIONS.md](TEMPLATES.md#instructionsmd).

**Section order (by consultation frequency, not logical importance):**

1. **Purpose** (1-2 sentences — what it does, for whom, why)
2. **Stack or Tools** (with relevant versions)
3. **How to Run / How to Start** (exact commands or first steps)
4. **Architecture** (table: component → responsibility → technology → location)
5. **Design Principles** (3-7 bullets with short rationale)
6. **Conventions** (summary — details in rules)
7. **Available Skills** (table: skill → when to use)
8. **Links** (to detailed docs)

**Target size:** 100-180 lines. Below 100 the file tends to be superficial; above 200 it loses focus, so treat anything past 180 as a signal to trim.
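
The size target can be checked mechanically. The following is a minimal sketch, not part of the package: `check_size` is a hypothetical helper name, and its defaults simply mirror the 100-180 range stated above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not shipped with the CLI):
# report whether a file's line count falls inside the target range.
check_size() {
  local file="$1" min="${2:-100}" max="${3:-180}" lines
  # wc -l pads its output on some platforms, so strip the spaces.
  lines="$(wc -l < "$file" | tr -d ' ')"
  if [ "$lines" -lt "$min" ]; then
    echo "too short ($lines lines)"
  elif [ "$lines" -gt "$max" ]; then
    echo "too long ($lines lines)"
  else
    echo "ok ($lines lines)"
  fi
}
```

A call such as `check_size .ai/INSTRUCTIONS.md` could run as part of the Step 6 gate, with different bounds passed as the second and third arguments if a project chooses other limits.
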

**Critical insight — describe decisions, not just facts:**

```markdown
# Bad
- ORM: TypeORM

# Good
- ORM: TypeORM with Repository pattern — never access `Repository<T>` directly in services,
  encapsulate in `*Repository` classes to make mocking in tests easier
```

The second form spares the developer a question and prevents code that breaks the pattern.

**For non-technical projects**, replace:

- "Stack" → "Tools and platforms"
- "How to run" → "How to start / workflow"
- "Architecture" → "Project components"
- "Conventions" → "Quality standards"

---

## Step 3 — Skill Skeletons

For each domain identified in Phase 1, create:

```text
.ai/skills/<name>/
└── SKILL.md    (40-60 lines, without references/ yet)
```

Use the [SKILL.md template in TEMPLATES.md](TEMPLATES.md#skillmd-index).

**The frontmatter `description` is the most critical element**: it determines whether the skill will be loaded. Checklist:

- [ ] 2-4 lines (1 line is vague, 5+ is excessive)
- [ ] In third person ("Use when implementing...")
- [ ] Mentions domain terms that act as triggers
- [ ] Lists 3-5 usage scenarios
- [ ] A new dev would understand when to use the skill just by reading the description

```yaml
# Weak
description: Reference for the payments API integration.

# Strong
description: Complete reference for the Payments API integration.
  Use when implementing API calls (endpoints, auth, payload format),
  debugging API responses (error codes, rate limits),
  or understanding the retry strategy and idempotency rules.
```
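
The "2-4 lines" item of the checklist lends itself to a quick automated check. The sketch below is an illustration under stated assumptions: `description_lines` is a hypothetical helper, the frontmatter is delimited by `---` lines, and continuation lines of the description are indented, as in the strong example above. It is not a YAML parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the lines a frontmatter `description` occupies.
# Assumes `---` delimiters and indented continuation lines; a block-scalar
# marker line such as `description: >-` is not counted as text.
description_lines() {
  awk '
    /^---[ \t]*$/ { fence++; next }          # frontmatter delimiters
    fence != 1    { next }                   # ignore everything outside it
    /^description:/ {
      in_desc = 1
      if ($0 !~ /^description:[ \t]*[>|][-+]?[ \t]*$/) n++
      next
    }
    in_desc && /^[ \t]+[^ \t]/ { n++; next } # indented continuation line
    { in_desc = 0 }                          # any other key ends the value
    END { print n + 0 }
  ' "$1"
}
```

A result outside the 2-4 range is only a prompt to reread the description; the other checklist items still need human judgment.
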

**Do not populate `references/` yet.** This phase delivers the index. References are filled in during subsequent sessions as knowledge accumulates.

### Granularity — when to create a new skill vs. expand an existing one

**Create new when:**

- Domain has >5 specific concepts
- Has its own workflow
- Usage scenario is distinct

**Expand existing when:**

- Information is complementary
- SKILL.md still <60 lines after addition
- Same usage scenario

**Use `docs/` instead of a skill when:**

- It is pure reference documentation (schema, contracts)
- Does not involve workflow
- Will be referenced by multiple skills

---

## Step 4 — Initial Rules

For software projects, create 3-7 rules in `.ai/rules/`. Use template [Rule in TEMPLATES.md](TEMPLATES.md#code-rule).

**Recommended default structure:**

```text
.ai/rules/
├── code-style.md              (naming, formatting, imports)
├── architecture-patterns.md   (DI, modules, framework patterns)
├── database.md                (ORM, migrations, queries)
├── testing.md                 (test structure, mocks)
└── cli.md                     (commands and scripts)
```

**Frontmatter for scope:**

```yaml
---
applyTo: "**/*.{ext}"
paths:
  - "src/**"
---
```

**How to write an effective rule:**

```markdown
# Bad — too generic
- Use meaningful variable names
- Keep functions small

# Good — specific and actionable
- Use constants or enums for all fixed domain values (e.g., `Status`, `Origin`)
  — never use string literals like `'pending'` scattered in the code
- Batch operations: prefer native ORM/DB bulk inserts/updates — never loops
  (N+1 queries cost one round trip per row, which adds up quickly on large tables)
```

**Three elements of an effective rule:** what to do, how to do it, and why (when not obvious).

**For non-technical projects**, replace rules with **protocols** (same structure, without `applyTo`):

```text
.ai/rules/    (or .ai/protocols/)
├── tone-of-voice.md
├── article-structure.md
├── review-checklist.md
└── citation-standards.md
```

---

## Step 5 — Doc Stubs

Create files with headers and empty sections, ready for future population.

### For software

```text
.ai/docs/
├── architecture.md      (system overview + decisions)
├── database-schema.md   (tables + business rules + indexes)
├── api-contracts.md     (internal and external contracts)
├── data-flows.md        (optional — for non-obvious flows)
└── monitoring.md        (optional — observability)
```

`architecture.md` should include a **Key Architectural Decisions** section with format:

```markdown
- **Why <decision>:** <short rationale>
```

`database-schema.md` should include **business rules** alongside the schema (not elsewhere):

```markdown
**Business rules:**
- `deleted_at IS NULL` in all queries (soft delete)
- `retry_count` incremented on each failed attempt, max 3
```

### For specialized domains (non-software)

```text
.ai/docs/
├── glossary.md     (domain terms with specific meaning)
├── workflows.md    (work flows)
└── references.md   (external links, official sources)
```
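
Stub creation as described in this step can be sketched in a few lines of bash. This is an illustration only: `create_stub` is a hypothetical helper, and the placeholder comment it writes is an assumption rather than the package's actual stub format. It leaves existing files untouched so that a rerun never overwrites content added later.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a stub with a title and empty sections.
# Usage: create_stub <path> <title> [section...]
create_stub() {
  local path="$1" title="$2" section
  shift 2
  # Never overwrite a file that already exists.
  if [ -e "$path" ]; then
    return 0
  fi
  mkdir -p "$(dirname "$path")"
  {
    printf '# %s\n' "$title"
    for section in "$@"; do
      printf '\n## %s\n\n<!-- to be filled in -->\n' "$section"
    done
  } > "$path"
}
```

For example, `create_stub .ai/docs/architecture.md "Architecture" "System Overview" "Key Architectural Decisions"` would produce a stub that already carries the decisions section required above.
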

---

## Step 6 — Validation and Gate

Present to user:

```markdown
## Spec Layer Generated

### Structure
.ai/INSTRUCTIONS.md              (N lines)
.ai/skills/<skill1>/SKILL.md     (N lines)
.ai/skills/<skill2>/SKILL.md     (N lines)
...
.ai/rules/<rule1>.md
.ai/docs/architecture.md         (stub)
...

### Total
- N skills initialized
- N rules created
- N doc stubs

### Question
Any critical domain I missed? Any file that doesn't make sense for this project?
Any name to adjust?
```

**Wait for confirmation before Phase 3.**
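
The counts in the "Total" block of the report can be derived from the generated tree instead of being tallied by hand. A minimal sketch, assuming the `.ai/` layout from Step 1; `spec_summary` is a hypothetical helper name, and counting only top-level `.md` files in `rules/` and `docs/` is a simplifying assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "Total" lines of the Phase 2 report.
# Usage: spec_summary [root]   (root defaults to .ai)
spec_summary() {
  local root="${1:-.ai}" skills rules docs
  # One SKILL.md per skill folder.
  skills="$(find "$root/skills" -type f -name SKILL.md 2>/dev/null | wc -l | tr -d ' ')"
  # Only top-level files count as rules and doc stubs (docs/stories is skipped).
  rules="$(find "$root/rules" -maxdepth 1 -type f -name '*.md' 2>/dev/null | wc -l | tr -d ' ')"
  docs="$(find "$root/docs" -maxdepth 1 -type f -name '*.md' 2>/dev/null | wc -l | tr -d ' ')"
  printf -- '- %s skills initialized\n- %s rules created\n- %s doc stubs\n' \
    "$skills" "$rules" "$docs"
}
```
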

---

## Quality Principles

- **Density > length** — every line must carry useful information
- **Decisions > facts** — explain the "why", not just the "what"

@@ -0,0 +1,331 @@

# Phase 3 — Harness Layer Configuration

**Goal:** install the behavioral infrastructure that makes the agent safe, consistent, and productive regardless of what the model "decides" in the moment.

**Input:** validated Spec Layer from Phase 2 + project type.

**Output:** versioned `settings.json`, configured hooks, declared sub-agents, distributed symlinks.

---

## Why the Harness Exists

The spec defines what the agent knows. **But production reliability depends more on the harness than on the model.** Without it:

- Inconsistent formatting accumulates in dirty diffs
- Destructive commands go through (`rm -rf`, `DROP TABLE`)
- Tests don't run at the end, so regressions escape
- AI behavior differs from one developer's machine to the next

The harness eliminates each of these by construction, not by discipline.

---

## The Five Subsystems

| Subsystem                 | Function                             | Applicability                  |
| ------------------------- | ------------------------------------ | ------------------------------ |
| **Permission Harness**    | Versioned `settings.json`            | Universal                      |
| **Execution Harness**     | Hooks (Pre/Post/Stop)                | Software (with adaptations)    |
| **Orchestration Harness** | Sub-agents                           | Universal                      |
| **Context Harness**       | Token budget, Progressive Disclosure | Universal (already in Phase 2) |
| **Verification Harness**  | Quality gates in skills              | Universal (already in Phase 2) |

This phase implements the first three (the other two are already in the Phase 2 design).

---

## Step 1 — `settings.json`

Use the template in [TEMPLATES.md → settings.json](TEMPLATES.md#settingsjson). Adapt it to the stack using this table:

| Stack       | Replace `<build-tool>` with                       |
| ----------- | ------------------------------------------------- |
| Node.js     | `Bash(npm *)`, `Bash(npx *)`                      |
| Python      | `Bash(pip *)`, `Bash(pytest *)`, `Bash(poetry *)` |
| Go          | `Bash(go *)`                                      |
| Java/Maven  | `Bash(mvn *)`                                     |
| Java/Gradle | `Bash(gradle *)`, `Bash(./gradlew *)`             |
| Ruby        | `Bash(bundle *)`, `Bash(rake *)`                  |
| PHP         | `Bash(composer *)`                                |
| Rust        | `Bash(cargo *)`                                   |
| .NET        | `Bash(dotnet *)`                                  |

**Minimum structure** (`<stack>` stands for the stack entries from the table above):

```json
{
  "permissions": {
    "allow": ["Read", "Bash(git *)", "<stack>", "Edit(/src/**)", "Edit(/.ai/**)"],
    "deny": ["Bash(rm -rf *)", "Bash(git push --force*)"],
    "ask": ["Bash(git push *)", "Edit(/.env*)"]
  }
}
```

**Universal (non-software):** keep `Read`, `Bash(git *)`, `Edit(/.ai/**)`, and adapt `Edit` to the project layout. Skip stack entries.

**Versioning in git** is mandatory. Without it, behavior varies per machine and bugs are hard to reproduce.

---

## Step 2 — Hooks

Hooks execute shell commands in response to agent events. **Three are indispensable** when applicable:

### Hook A — `PostToolUse` (automatic formatting)

**Applicable to:** software with a formatter.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "bash scripts/format-file.sh \"$CLAUDE_TOOL_INPUT_FILE_PATH\""
          }
        ]
      }
    ]
  }
}
```

The `format-file.sh` script is in [TEMPLATES.md → format-file.sh](TEMPLATES.md#format-filesh). It is stack-aware via `case` and never fails (`exit 0`), so a missing formatter does not block the agent.
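
The real script lives in TEMPLATES.md and is not shown here. As an illustration of the two properties named above (dispatch on extension via `case`, and an unconditional success status), a sketch might look like the following. The function name `format_file` and the choice of formatters (`prettier`, `black`, `gofmt`, `rustfmt`) are assumptions for the example, not the package's actual list.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the shipped format-file.sh may differ.
# Formats one file with whatever formatter is installed and always returns 0.
format_file() {
  local file="$1"
  # Nothing to do for missing files (for example, a file the agent deleted).
  [ -f "$file" ] || return 0
  case "$file" in
    *.js|*.jsx|*.ts|*.tsx|*.json)
      if command -v prettier >/dev/null 2>&1; then
        prettier --write "$file" >/dev/null 2>&1
      fi
      ;;
    *.py)
      if command -v black >/dev/null 2>&1; then
        black --quiet "$file" >/dev/null 2>&1
      fi
      ;;
    *.go)
      if command -v gofmt >/dev/null 2>&1; then
        gofmt -w "$file" >/dev/null 2>&1
      fi
      ;;
    *.rs)
      if command -v rustfmt >/dev/null 2>&1; then
        rustfmt "$file" >/dev/null 2>&1
      fi
      ;;
  esac
  # Formatter errors and unknown extensions never block the agent.
  return 0
}

# When saved as a script and invoked with a path, format that file.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  format_file "$1"
fi
```

Checking `command -v` before each call is what keeps a missing formatter from turning into a hook failure.
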

**Why indispensable:** without it, diffs get polluted with style changes, increasing code review cost.

### Hook B — `PreToolUse` (destructive blocking)

**Applicable to:** universal. Always install.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{ "type": "command", "command": "bash scripts/validate-bash.sh" }]
      }
    ]
  }
}
```

The `validate-bash.sh` script is in [TEMPLATES.md → validate-bash.sh](TEMPLATES.md#validate-bashsh). It blocks these patterns: `rm -rf /`, `DROP TABLE`, `TRUNCATE`, and `DELETE FROM` without `WHERE`.
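
The pattern matching behind such a guard can be sketched as follows. This is an illustration, not the shipped script: `is_destructive` is a hypothetical function, it assumes the command text has already been extracted from the hook input and is passed as its first argument, and the regular expressions are one possible reading of the four patterns listed above.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the shipped validate-bash.sh may differ.
# Returns 0 (true) when the command matches a destructive pattern.
is_destructive() {
  local cmd="$1"
  # rm -rf / (also -fr), targeting the filesystem root itself.
  if printf '%s\n' "$cmd" | grep -Eiq 'rm[[:space:]]+-(rf|fr)[[:space:]]+/([[:space:]]|\*|$)'; then
    return 0
  fi
  # DROP TABLE in any letter case.
  if printf '%s\n' "$cmd" | grep -Eiq 'drop[[:space:]]+table'; then
    return 0
  fi
  # TRUNCATE; this also matches the coreutils `truncate` command,
  # so narrow the pattern if that tool is part of normal work.
  if printf '%s\n' "$cmd" | grep -Eiq 'truncate[[:space:]]'; then
    return 0
  fi
  # DELETE FROM with no WHERE clause anywhere in the command.
  if printf '%s\n' "$cmd" | grep -Eiq 'delete[[:space:]]+from' &&
     ! printf '%s\n' "$cmd" | grep -Eiq 'where'; then
    return 0
  fi
  return 1
}
```

A wrapper would call `is_destructive` and exit with whatever non-zero status the agent runtime treats as "block this call"; that status and the way the command reaches the script depend on the runtime and are not specified here.
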

**Why indispensable:** the agent occasionally infers it needs to "clean up" files. Without protection, one such context error can be irreversible. The hook does not block normal work, only the dangerous cases.

### Hook C — `Stop` (tests on finish)

**Applicable to:** software with a test runner.

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "bash scripts/run-tests-if-changed.sh"
          }
        ]
      }
    ]
  }
}
```

The `run-tests-if-changed.sh` script is in [TEMPLATES.md → run-tests-if-changed.sh](TEMPLATES.md#run-tests-if-changedsh). It detects the changed extensions in the diff and runs only the applicable test runner.
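
The selection step, mapping changed files to test runners, can be sketched as below. This is an assumption-laden illustration rather than the shipped script: `runners_for_changes` is a hypothetical function, and the extension-to-runner mapping (`npm test`, `pytest`, `go test ./...`, `cargo test`) is an example.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the shipped run-tests-if-changed.sh may differ.
# Reads changed file names on stdin and prints each applicable runner once.
runners_for_changes() {
  local file
  while IFS= read -r file; do
    case "$file" in
      *.js|*.jsx|*.ts|*.tsx) echo "npm test" ;;
      *.py)                  echo "pytest" ;;
      *.go)                  echo "go test ./..." ;;
      *.rs)                  echo "cargo test" ;;
    esac
  done | sort -u
}
```

In a hook, the input would typically come from `git diff --name-only`, and each printed line would then be executed; files with no matching runner, such as documentation, produce no output and therefore no test run.
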

**Why indispensable:** it closes the feedback loop. The agent not only acts, it validates what it did, so regressions are caught in the same session.

### For non-technical projects

- **Hook A:** skip (no formatter)
- **Hook B:** **keep** (universal protection)
- **Hook C:** skip or replace with output validation (e.g., spell check, markdown lint)

---

## Step 3 — Sub-agents

Sub-agents enable smart delegation. The main agent **orchestrates**, sub-agents **execute**.

### `Explore` (built-in, always enable)

- Read-only access: `Glob`, `Grep`, `Read`, `WebFetch`, `WebSearch`
- Cannot edit files during research
- More efficient: does not load unused write tools
- For large codebases, use level `"very thorough"`

### When to delegate vs execute

| Task                                  | Delegate? | Why                                    |
| ------------------------------------- | --------- | -------------------------------------- |
| Research / exploration                | **Yes**   | Bulky output; only the summary matters |
| Task implementation                   | **Yes**   | File reads/edits consume context       |
| Independent parallel tasks            | **Yes**   | Only way to parallelize                |
| Sequential tasks without dependencies | **Yes**   | Keeps main context clean               |
| Planning and task creation            | **No**    | Requires accumulated context           |
| Validation and final reports          | **No**    | Needs session history                  |
| Quick fixes (≤3 files)                | **No**    | Overhead > task                        |

### Sub-agent contract

**Receives:**

- Task definition (what to do, where, completion criteria)
- Relevant rules and conventions
- Spec/design the task references

**Does not receive:**

- Definitions of other tasks
- Accumulated chat history
- `STATE.md` (unless recording a specific decision/blocker)

**Returns:**

- Status: Complete | Blocked | Partial
- Changed files
- Test/validation result
- Issues found

---

## Step 4 — Symlinks by IDE

For each IDE declared in Phase 1, create symlinks. Use the script [setup-ide-links.sh in TEMPLATES.md](TEMPLATES.md#setup-ide-linkssh).

**Principle:** the script is **idempotent** (`ln -sf` replaces an existing link without error), so it can run as many times as needed. For links that point at directories, `-n` is also required (`ln -sfn`); otherwise a rerun follows the existing link and creates a nested link inside the target directory.
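
The idempotency principle can be illustrated with a small helper. This is a sketch, not the shipped `setup-ide-links.sh`: `link_context` is a hypothetical function, and it uses `ln -sfn` so that rerunning it on a link that points at a directory replaces the link instead of descending into the directory.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the shipped setup-ide-links.sh may differ.
# Usage: link_context <target> <link>
# The target is stored as given, so relative targets are resolved
# relative to the directory that contains the link.
link_context() {
  local target="$1" link="$2"
  mkdir -p "$(dirname "$link")"
  # -s symbolic, -f replace an existing link, -n do not follow an
  # existing link that points at a directory.
  ln -sfn "$target" "$link"
}
```

From the project root, `link_context .ai/INSTRUCTIONS.md CLAUDE.md` and `link_context ../.ai/skills .claude/skills` would create one file link and one directory link; the specific paths are examples, and running either command again leaves the result unchanged.
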

**Target folder by IDE:**

| IDE            | Where it looks for context                                 |
| -------------- | ---------------------------------------------------------- |
| Claude Code    | `.claude/`, `CLAUDE.md`                                    |
| Cursor         | `.cursor/rules/`, `.cursor/skills/`, `AGENTS.md`           |
| GitHub Copilot | `.github/copilot-instructions.md`, `.github/instructions/` |
| Windsurf       | `AGENTS.md`, `.agents/`                                    |

**Skip** symlinks for IDEs the user said they do not use, which reduces noise in `git status`.

**Smoke test after creating:**

```bash
ls -la CLAUDE.md AGENTS.md .claude/ .cursor/ .agents/ .github/
```

Each symlink should show `-> .ai/...`, `-> ../.ai/...`, or similar.

### Windows

Symlinks on Windows require administrator permission or Developer Mode enabled. If the team uses Windows:

- Document this in `INSTRUCTIONS.md` or `CONVENTIONS.md`
- Recommend `core.symlinks = true` in Git for Windows
- Alternative: use `mklink /D` in an elevated terminal

---

## Step 5 — Smoke Test and Gate

```bash
# 1. Verify settings.json parses
jq . .claude/settings.json   # or: cat .claude/settings.json if jq is not installed

# 2. Verify symlinks resolve
ls -la CLAUDE.md AGENTS.md

# 3. Verify hooks execute (create a dummy file and watch the formatter run)
echo "test" > /tmp/test.ts && bash scripts/format-file.sh /tmp/test.ts
```

Present to user:

```markdown
## Harness Layer Configured

### Permissions
- N entries in allow, N in deny, N in ask

### Hooks installed
- PostToolUse: format-file.sh (Node/Python/Go per stack)
- PreToolUse: validate-bash.sh (destructive protection — universal)
- Stop: run-tests-if-changed.sh

### Symlinks created
[visual tree]

### Smoke test
[output of the 3 commands above]

### Question
Any additional destructive patterns to block? Any missing IDE?
```

**Wait for confirmation before Phase 4.**

---

## Step 6 — Failure Attribution

> **Context:** ReliabilityBench (arXiv 2601.06112) demonstrated that pass@1 overestimates reliability by 20-40%. AgentProp-Bench (arXiv 2604.16706) showed that most benchmarks report only pass/fail, without locating where in the pipeline the failure occurred. AXIS instruments the harness for attribution.

**Configure structured logging in `settings.json`:**

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": ".*",
        "hooks": [
          {
            "type": "command",
            "command": "echo \"{\\\"event\\\":\\\"pre\\\",\\\"tool\\\":\\\"$CLAUDE_TOOL_NAME\\\",\\\"ts\\\":\\\"$(date -Iseconds)\\\"}\" >> .ai/logs/harness.jsonl 2>/dev/null || true"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": ".*",
        "hooks": [
          {
            "type": "command",
            "command": "echo \"{\\\"event\\\":\\\"post\\\",\\\"tool\\\":\\\"$CLAUDE_TOOL_NAME\\\",\\\"exit\\\":\\\"$CLAUDE_TOOL_EXIT_CODE\\\",\\\"ts\\\":\\\"$(date -Iseconds)\\\"}\" >> .ai/logs/harness.jsonl 2>/dev/null || true"
          }
        ]
      }
    ]
  }
}
```

Create `.ai/logs/` before the first run (the commands above do not create it, and `|| true` hides the failed write), then add it to `.gitignore`: these are runtime logs, not versioned.
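
The inline `echo` commands are hard to read because of the nested escaping. One alternative is to move the logging into a small script that the hooks call. The sketch below is an assumption, not part of the package: `log_event` and the `HARNESS_LOG` override are hypothetical names, the record fields mirror the ones in the configuration above, and values are written without JSON escaping on the assumption that tool names are plain identifiers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append one JSON line per hook event.
# Usage: log_event <pre|post> <tool> [exit_code]
log_event() {
  local event="$1" tool="$2" exit_code="${3:-}"
  local log="${HARNESS_LOG:-.ai/logs/harness.jsonl}" ts
  # Create the log directory on demand; never fail the hook over logging.
  mkdir -p "$(dirname "$log")" 2>/dev/null || return 0
  # Portable UTC timestamp (date -Iseconds is not available everywhere).
  ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  if [ -n "$exit_code" ]; then
    printf '{"event":"%s","tool":"%s","exit":"%s","ts":"%s"}\n' \
      "$event" "$tool" "$exit_code" "$ts" >> "$log" 2>/dev/null || true
  else
    printf '{"event":"%s","tool":"%s","ts":"%s"}\n' \
      "$event" "$tool" "$ts" >> "$log" 2>/dev/null || true
  fi
  return 0
}
```

Creating the directory inside the helper removes the silent-failure case in which the log path does not exist yet.
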

**Failure attribution table:**

| Category      | Symptom                                          | Signal in log                              | Action                                                     |
| ------------- | ------------------------------------------------ | ------------------------------------------ | ---------------------------------------------------------- |
| **Planning**  | Agent attempts to execute without clear criteria | PreToolUse without corresponding spec task | Review `INSTRUCTIONS.md`; add acceptance criteria to skill |
| **Execution** | Tool call fails repeatedly                       | PostToolUse with `exit != 0` in loop       | Review `settings.json`; adjust allow/deny                  |
| **Response**  | Output generated but wrong format                | Phase 5 gate rejects                       | Add output example to skill template                       |

**Add to Phase 5 checklist:**

- [ ] `harness.jsonl` exists and records events after smoke test
- [ ] No tool call loop with exit != 0 detected
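
The second checklist item can be approximated by scanning the log for failed `post` events. A minimal sketch, assuming the one-line JSON record format written by the hooks above; `failed_tool_calls` is a hypothetical helper, and it counts all failures rather than detecting consecutive repeats, which would need more state.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count post events whose exit value is not "0".
# Usage: failed_tool_calls <path-to-harness.jsonl>
failed_tool_calls() {
  local log="$1"
  # A missing log means no recorded failures (and fails the first
  # checklist item instead).
  if [ ! -f "$log" ]; then
    echo 0
    return 0
  fi
  # grep -c prints the count even when it is zero; || true keeps the
  # function's status clean in that case.
  grep '"event":"post"' "$log" | grep -vc '"exit":"0"' || true
}
```

A non-zero count points at the Execution row of the attribution table; an empty `exit` value is also counted as a failure, since it means the exit status was not recorded.
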

---

## Unifying Principle

The gain from hooks is that they **remove the dependency on manual discipline.** The formatter runs because the hook exists, not because the developer remembered. Tests run because `Stop` was configured, not because the agent decided to. Destructive commands are blocked because the rule exists, not because the agent "was careful".

**Production failures are not opaque** — the instrumented harness locates whether the problem is in planning (vague spec), execution (invalid tool call), or response (wrong format). This reduces trial-and-error debugging.

Spec defines what the agent knows. Harness ensures it acts consistently, safely, and traceably — regardless of the conversation context.