@tianhai/pi-workflow-kit 0.8.4 → 0.10.0
- package/README.md +3 -1
- package/docs/developer-usage-guide.md +10 -0
- package/docs/plans/completed/2026-04-28-executing-tasks-redesign-design.md +171 -0
- package/docs/plans/completed/2026-04-28-executing-tasks-redesign-implementation.md +208 -0
- package/docs/plans/completed/2026-04-28-executing-tasks-redesign-progress.md +14 -0
- package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-design.md +154 -0
- package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-implementation.md +315 -0
- package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-progress.md +15 -0
- package/docs/workflow-phases.md +13 -1
- package/package.json +1 -1
- package/skills/brainstorming/SKILL.md +17 -1
- package/skills/diagnose/SKILL.md +56 -0
- package/skills/executing-tasks/SKILL.md +160 -16
- package/skills/finalizing/SKILL.md +26 -2
- package/skills/writing-plans/SKILL.md +41 -0
|
@@ -0,0 +1,315 @@
|
|
|
1
|
+
# Implementation Plan: Incorporate mattpocock/skills Ideas
|
|
2
|
+
|
|
3
|
+
Design doc: `docs/plans/2026-05-01-incorporate-mattpocock-skills-design.md`
|
|
4
|
+
|
|
5
|
+
## Task 1: Update brainstorming skill — design it twice + ADRs
|
|
6
|
+
|
|
7
|
+
<!-- tdd: trivial -->
|
|
8
|
+
<!-- checkpoint: none -->
|
|
9
|
+
|
|
10
|
+
Edit `skills/brainstorming/SKILL.md`:
|
|
11
|
+
|
|
12
|
+
**Step 3** — change from:
|
|
13
|
+
|
|
14
|
+
```
|
|
15
|
+
3. **Explore approaches** — propose 2-3 approaches with trade-offs. Lead with your recommendation.
|
|
16
|
+
```
|
|
17
|
+
|
|
18
|
+
to:
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
3. **Explore approaches** — propose 2-3 approaches. For each approach, sketch the concrete interface (types, method signatures, example caller code) so the comparison is grounded in actual code, not abstract descriptions. Lead with your recommendation.
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
**Step 4** — change from:
|
|
25
|
+
|
|
26
|
+
```
|
|
27
|
+
4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
to:
|
|
31
|
+
|
|
32
|
+
```
|
|
33
|
+
4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
|
|
34
|
+
|
|
35
|
+
When a significant architectural decision is identified, offer to write a lightweight ADR to `docs/plans/adr/`. Only write an ADR when all three are true:
|
|
36
|
+
|
|
37
|
+
1. **Hard to reverse** — changing your mind later has meaningful cost
|
|
38
|
+
2. **Surprising without context** — a future reader will wonder "why?"
|
|
39
|
+
3. **A real trade-off** — there were genuine alternatives
|
|
40
|
+
|
|
41
|
+
ADR format — a title and 1-3 sentences covering context, decision, and why:
|
|
42
|
+
|
|
43
|
+
```markdown
|
|
44
|
+
# <Short title of the decision>
|
|
45
|
+
|
|
46
|
+
<1-3 sentences: context, decision, and why.>
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
ADRs live under `docs/plans/adr/` and are archived during finalizing alongside the design doc.
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
```bash
|
|
53
|
+
git commit -m "feat(brainstorming): add design-it-twice interface sketches and ADR output"
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
---
|
|
57
|
+
|
|
58
|
+
## Task 2: Update writing-plans skill — vertical slices
|
|
59
|
+
|
|
60
|
+
<!-- tdd: trivial -->
|
|
61
|
+
<!-- checkpoint: none -->
|
|
62
|
+
|
|
63
|
+
Edit `skills/writing-plans/SKILL.md` — add a new section after "## Task format" and before "## TDD in the plan":
|
|
64
|
+
|
|
65
|
+
```markdown
|
|
66
|
+
## Vertical slices
|
|
67
|
+
|
|
68
|
+
Each task should be a **vertical slice** — a thin path through ALL relevant layers end-to-end, delivering one complete piece of observable behavior.
|
|
69
|
+
|
|
70
|
+
```
|
|
71
|
+
WRONG (horizontal):
|
|
72
|
+
Task 1: Create database schema for users
|
|
73
|
+
Task 2: Write user API endpoints
|
|
74
|
+
Task 3: Build user UI components
|
|
75
|
+
Task 4: Wire everything together
|
|
76
|
+
|
|
77
|
+
RIGHT (vertical):
|
|
78
|
+
Task 1: User can sign up (model + endpoint + validation + test)
|
|
79
|
+
Task 2: User can log in (auth check + token + test)
|
|
80
|
+
Task 3: User can view profile (query + endpoint + test)
|
|
81
|
+
```
|
|
82
|
+
|
|
83
|
+
Vertical slices ensure every committed task leaves the codebase in a testable state and reduce the blast radius of a bad task.
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
```bash
|
|
87
|
+
git commit -m "feat(writing-plans): add vertical slice guidance with anti-pattern example"
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
## Task 3: Update executing-tasks skill — deep modules refactoring
|
|
93
|
+
|
|
94
|
+
<!-- tdd: trivial -->
|
|
95
|
+
<!-- checkpoint: none -->
|
|
96
|
+
|
|
97
|
+
Edit `skills/executing-tasks/SKILL.md` — add a new section after "## TDD discipline":
|
|
98
|
+
|
|
99
|
+
```markdown
|
|
100
|
+
## Refactoring
|
|
101
|
+
|
|
102
|
+
After all tests pass for a task, check for refactoring opportunities:
|
|
103
|
+
|
|
104
|
+
- **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
|
|
105
|
+
- **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
|
|
106
|
+
- **Duplication** — extract repeated patterns
|
|
107
|
+
- **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
|
|
108
|
+
|
|
109
|
+
Run tests after each refactor step. Never refactor while tests are failing.
|
|
110
|
+
|
|
111
|
+
Key vocabulary: **depth** (lots of behavior behind a small interface), **seam** (where behavior can be altered without editing in place), **locality** (change concentrated in one place).
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
```bash
|
|
115
|
+
git commit -m "feat(executing-tasks): add refactoring checklist with deep modules vocabulary"
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
---
|
|
119
|
+
|
|
120
|
+
## Task 4: Create diagnose skill
|
|
121
|
+
|
|
122
|
+
<!-- tdd: trivial -->
|
|
123
|
+
<!-- checkpoint: none -->
|
|
124
|
+
|
|
125
|
+
Create `skills/diagnose/SKILL.md`:
|
|
126
|
+
|
|
127
|
+
```markdown
|
|
128
|
+
---
|
|
129
|
+
name: diagnose
|
|
130
|
+
description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken."
|
|
131
|
+
---
|
|
132
|
+
|
|
133
|
+
# Diagnose
|
|
134
|
+
|
|
135
|
+
A 6-phase debugging discipline. Phase 1 is the skill — spend disproportionate effort here.
|
|
136
|
+
|
|
137
|
+
## Phase 1 — Build a feedback loop
|
|
138
|
+
|
|
139
|
+
Create a fast, deterministic, agent-runnable pass/fail signal for the bug before doing anything else. Try in this order: failing test, curl script, CLI invocation, headless browser script.
|
|
140
|
+
|
|
141
|
+
The loop must produce the failure mode the **user** described — not a nearby but different failure. Iterate on the loop itself: can you make it faster? Sharper? More deterministic?
|
|
142
|
+
|
|
143
|
+
If you genuinely cannot build a loop, stop and say so. List what you tried. Ask for access to a reproducing environment or a captured artifact.
|
|
144
|
+
|
|
145
|
+
Do not proceed until you have a loop you believe in.
|
|
146
|
+
|
|
147
|
+
## Phase 2 — Reproduce
|
|
148
|
+
|
|
149
|
+
Run the loop. Confirm:
|
|
150
|
+
- The failure matches the user's reported symptom
|
|
151
|
+
- The failure is reproducible across multiple runs
|
|
152
|
+
- You've captured the exact symptom (error message, wrong output, slow timing)
|
|
153
|
+
|
|
154
|
+
## Phase 3 — Hypothesise
|
|
155
|
+
|
|
156
|
+
Generate 3-5 ranked hypotheses. Each must be falsifiable:
|
|
157
|
+
|
|
158
|
+
> "If `<X>` is the cause, then `<changing Y>` will make the bug disappear / `<changing Z>` will make it worse."
|
|
159
|
+
|
|
160
|
+
Show the ranked list to the user before testing. They often have domain knowledge that re-ranks instantly.
|
|
161
|
+
|
|
162
|
+
## Phase 4 — Instrument
|
|
163
|
+
|
|
164
|
+
Each probe must map to a specific hypothesis. Change one variable at a time. Tag every debug log with a unique prefix (e.g. `[DEBUG-a4f2]`) for easy cleanup later. Prefer a debugger breakpoint over logs when available.
|
|
165
|
+
|
|
166
|
+
## Phase 5 — Fix + regression test
|
|
167
|
+
|
|
168
|
+
Write the regression test **before** the fix — but only if there's a correct seam (one that exercises the real bug pattern at the call site). If no correct seam exists, note it — the codebase architecture is preventing the bug from being locked down.
|
|
169
|
+
|
|
170
|
+
## Phase 6 — Cleanup
|
|
171
|
+
|
|
172
|
+
Required before declaring done:
|
|
173
|
+
- Original repro no longer triggers
|
|
174
|
+
- Regression test passes (or absence of seam is documented)
|
|
175
|
+
- All `[DEBUG-...]` instrumentation removed
|
|
176
|
+
- Ask: what would have prevented this bug?
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
```bash
|
|
180
|
+
git commit -m "feat(diagnose): add standalone debugging skill with 6-phase loop"
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
---
|
|
184
|
+
|
|
185
|
+
## Task 5: Update finalizing skill — archive ADRs
|
|
186
|
+
|
|
187
|
+
<!-- tdd: trivial -->
|
|
188
|
+
<!-- checkpoint: none -->
|
|
189
|
+
|
|
190
|
+
Edit `skills/finalizing/SKILL.md` — update step 1 from:
|
|
191
|
+
|
|
192
|
+
```
|
|
193
|
+
1. **Move planning docs** — archive the design, implementation, and progress docs, then commit:
|
|
194
|
+
```
|
|
195
|
+
mkdir -p docs/plans/completed
|
|
196
|
+
mv docs/plans/*-design.md docs/plans/completed/
|
|
197
|
+
mv docs/plans/*-implementation.md docs/plans/completed/
|
|
198
|
+
mv docs/plans/*-progress.md docs/plans/completed/
|
|
199
|
+
git add docs/plans/ && git commit -m "chore: archive planning docs"
|
|
200
|
+
```
|
|
201
|
+
```
|
|
202
|
+
|
|
203
|
+
to:
|
|
204
|
+
|
|
205
|
+
```
|
|
206
|
+
1. **Move planning docs** — archive the design, implementation, progress docs, and ADRs (if any), then commit:
|
|
207
|
+
```
|
|
208
|
+
mkdir -p docs/plans/completed
|
|
209
|
+
mkdir -p docs/plans/completed/adr
|
|
210
|
+
mv docs/plans/*-design.md docs/plans/completed/
|
|
211
|
+
mv docs/plans/*-implementation.md docs/plans/completed/
|
|
212
|
+
mv docs/plans/*-progress.md docs/plans/completed/
|
|
213
|
+
mv docs/plans/adr/*.md docs/plans/completed/adr/ 2>/dev/null || true
|
|
214
|
+
rmdir docs/plans/adr 2>/dev/null || true
|
|
215
|
+
git add docs/plans/ && git commit -m "chore: archive planning docs"
|
|
216
|
+
```
|
|
217
|
+
```
|
|
218
|
+
|
|
219
|
+
```bash
|
|
220
|
+
git commit -m "feat(finalizing): archive ADRs alongside planning docs"
|
|
221
|
+
```
|
|
222
|
+
|
|
223
|
+
---
|
|
224
|
+
|
|
225
|
+
## Task 6: Update documentation
|
|
226
|
+
|
|
227
|
+
<!-- tdd: trivial -->
|
|
228
|
+
<!-- checkpoint: none -->
|
|
229
|
+
|
|
230
|
+
### README.md
|
|
231
|
+
|
|
232
|
+
Update the intro line from:
|
|
233
|
+
|
|
234
|
+
```
|
|
235
|
+
**4 workflow skills** that guide the agent through a structured development process:
|
|
236
|
+
```
|
|
237
|
+
|
|
238
|
+
to:
|
|
239
|
+
|
|
240
|
+
```
|
|
241
|
+
**4 workflow skills** and **1 utility skill** that guide the agent through a structured development process:
|
|
242
|
+
```
|
|
243
|
+
|
|
244
|
+
Update the pipeline diagram from:
|
|
245
|
+
|
|
246
|
+
```
|
|
247
|
+
brainstorm → plan → execute → finalize
|
|
248
|
+
```
|
|
249
|
+
|
|
250
|
+
to:
|
|
251
|
+
|
|
252
|
+
```
|
|
253
|
+
brainstorm → plan → execute → finalize
|
|
254
|
+
↕
|
|
255
|
+
diagnose (on demand)
|
|
256
|
+
```
|
|
257
|
+
|
|
258
|
+
Add `diagnose` to the skills table:
|
|
259
|
+
|
|
260
|
+
```
|
|
261
|
+
| `diagnose` | ~35 | 6-phase debugging loop: build feedback loop, reproduce, hypothesise, instrument, fix, cleanup |
|
|
262
|
+
```
|
|
263
|
+
|
|
264
|
+
Update the Architecture section to include `diagnose/`:
|
|
265
|
+
|
|
266
|
+
```
|
|
267
|
+
├── skills/
|
|
268
|
+
│ ├── brainstorming/SKILL.md
|
|
269
|
+
│ ├── writing-plans/SKILL.md
|
|
270
|
+
│ ├── executing-tasks/SKILL.md
|
|
271
|
+
│ ├── finalizing/SKILL.md
|
|
272
|
+
│ └── diagnose/SKILL.md
|
|
273
|
+
```
|
|
274
|
+
|
|
275
|
+
### docs/developer-usage-guide.md
|
|
276
|
+
|
|
277
|
+
Add to the brainstorm section (after "Outcome"):
|
|
278
|
+
|
|
279
|
+
```
|
|
280
|
+
- Optionally writes ADRs to `docs/plans/adr/` for significant architectural decisions
|
|
281
|
+
```
|
|
282
|
+
|
|
283
|
+
Add a new section after the 4 workflow phases:
|
|
284
|
+
|
|
285
|
+
```markdown
|
|
286
|
+
### 5. Diagnose (on demand)
|
|
287
|
+
|
|
288
|
+
```
|
|
289
|
+
/skill:diagnose
|
|
290
|
+
```
|
|
291
|
+
|
|
292
|
+
A 6-phase debugging loop you invoke when something is broken. Build a feedback loop first, then reproduce, hypothesise, instrument, fix, and cleanup. Not a pipeline phase — use whenever needed.
|
|
293
|
+
```
|
|
294
|
+
|
|
295
|
+
### docs/workflow-phases.md
|
|
296
|
+
|
|
297
|
+
Add a new section at the end:
|
|
298
|
+
|
|
299
|
+
```markdown
|
|
300
|
+
## diagnose
|
|
301
|
+
|
|
302
|
+
```
|
|
303
|
+
/skill:diagnose
|
|
304
|
+
```
|
|
305
|
+
|
|
306
|
+
Not a pipeline phase. A utility skill invoked on demand when debugging is needed.
|
|
307
|
+
|
|
308
|
+
- Build a feedback loop (failing test, curl script, etc.)
|
|
309
|
+
- Reproduce, hypothesise, instrument, fix, cleanup
|
|
310
|
+
- No write restrictions (used during execute/finalize, or outside the pipeline)
|
|
311
|
+
```
|
|
312
|
+
|
|
313
|
+
```bash
|
|
314
|
+
git commit -m "docs: update README, usage guide, and workflow phases for new skills"
|
|
315
|
+
```
|
|
@@ -0,0 +1,15 @@
|
|
|
1
|
+
# Progress: incorporate-mattpocock-skills
|
|
2
|
+
|
|
3
|
+
Plan: docs/plans/2026-05-01-incorporate-mattpocock-skills-implementation.md
|
|
4
|
+
Branch: incorporate-mattpocock-skills
|
|
5
|
+
Started: 2026-05-01T00:00:00Z
|
|
6
|
+
Last updated: 2026-05-01T00:00:00Z
|
|
7
|
+
|
|
8
|
+
| # | Status | Task | Commit |
|
|
9
|
+
|---|--------|------|--------|
|
|
10
|
+
| 1 | ✅ done | Update brainstorming skill — design it twice + ADRs | 0231b84 |
|
|
11
|
+
| 2 | ✅ done | Update writing-plans skill — vertical slices | 22a46df |
|
|
12
|
+
| 3 | ✅ done | Update executing-tasks skill — deep modules refactoring | c405634 |
|
|
13
|
+
| 4 | ✅ done | Create diagnose skill | 5e39e2d |
|
|
14
|
+
| 5 | ✅ done | Update finalizing skill — archive ADRs | e31a1af |
|
|
15
|
+
| 6 | ✅ done | Update documentation (README, usage guide, workflow phases) | 8c1c4eb |
|
package/docs/workflow-phases.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Workflow Phases
|
|
2
2
|
|
|
3
|
-
`pi-workflow-kit` has 4 phases. You invoke each one explicitly with `/skill:`.
|
|
3
|
+
`pi-workflow-kit` has 4 phases and 1 utility skill. You invoke each one explicitly with `/skill:`.
|
|
4
4
|
|
|
5
5
|
```
|
|
6
6
|
brainstorm → plan → execute → finalize
|
|
@@ -55,3 +55,15 @@ No write restrictions. All tools available.
|
|
|
55
55
|
- Clean up worktree if one was used
|
|
56
56
|
|
|
57
57
|
No write restrictions. All tools available.
|
|
58
|
+
|
|
59
|
+
## diagnose
|
|
60
|
+
|
|
61
|
+
```
|
|
62
|
+
/skill:diagnose
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
Not a pipeline phase. A utility skill invoked on demand when debugging is needed.
|
|
66
|
+
|
|
67
|
+
- Build a feedback loop (failing test, curl script, etc.)
|
|
68
|
+
- Reproduce, hypothesise, instrument, fix, cleanup
|
|
69
|
+
- No write restrictions (used during execute/finalize, or outside the pipeline)
|
package/package.json
CHANGED
package/skills/brainstorming/SKILL.md
CHANGED
|
@@ -11,8 +11,24 @@ Read-only exploration. You may **not** edit or create any files except under `do
|
|
|
11
11
|
|
|
12
12
|
1. **Check git state** — run `git status` and `git log --oneline -5`. If there's uncommitted work, ask the user what to do with it first.
|
|
13
13
|
2. **Understand the idea** — read existing code, docs, and recent commits. Ask questions one at a time to refine the idea. Prefer multiple choice when possible.
|
|
14
|
-
3. **Explore approaches** — propose 2-3 approaches with trade-offs. Lead with your recommendation.
|
|
14
|
+
3. **Explore approaches** — propose 2-3 approaches. For each approach, sketch the concrete interface (types, method signatures, example caller code) so the comparison is grounded in actual code, not abstract descriptions. Lead with your recommendation.
|
|
15
15
|
4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
|
|
16
|
+
|
|
17
|
+
When a significant architectural decision is identified, offer to write a lightweight ADR to `docs/plans/adr/`. Only write an ADR when all three are true:
|
|
18
|
+
|
|
19
|
+
1. **Hard to reverse** — changing your mind later has meaningful cost
|
|
20
|
+
2. **Surprising without context** — a future reader will wonder "why?"
|
|
21
|
+
3. **A real trade-off** — there were genuine alternatives
|
|
22
|
+
|
|
23
|
+
ADR format — a title and 1-3 sentences covering context, decision, and why:
|
|
24
|
+
|
|
25
|
+
```markdown
|
|
26
|
+
# <Short title of the decision>
|
|
27
|
+
|
|
28
|
+
<1-3 sentences: context, decision, and why.>
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
ADRs live under `docs/plans/adr/` and are archived during finalizing alongside the design doc.
|
|
16
32
|
5. **Write the design doc** — save it to `docs/plans/YYYY-MM-DD-<topic>-design.md`. Ask the user to commit it. Branch creation and worktree setup should be deferred to the execution phase (`/skill:executing-tasks`).
|
|
17
33
|
|
|
18
34
|
## Principles
|
|
@@ -0,0 +1,56 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: diagnose
|
|
3
|
+
description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Diagnose
|
|
7
|
+
|
|
8
|
+
A 6-phase debugging discipline. Phase 1 is the skill — spend disproportionate effort here.
|
|
9
|
+
|
|
10
|
+
## Phase 1 — Build a feedback loop
|
|
11
|
+
|
|
12
|
+
Create a fast, deterministic, agent-runnable pass/fail signal for the bug before doing anything else. Try in this order: failing test, curl script, CLI invocation, headless browser script.
|
|
13
|
+
|
|
14
|
+
Other strategies when the basics don't work:
|
|
15
|
+
- **Bisection** — bug appeared between two known states? Automate "boot at state X, check, repeat" to bisect
|
|
16
|
+
- **Replay** — save a real network request or event log to disk, replay it through the code path in isolation
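The bisection strategy above can be sketched as a small search over an ordered list of states. This is an illustration only, not part of the package: `first_bad_state` and `is_bad` are hypothetical names, and the sketch assumes the first state is known good, the last is known bad, and the bug stays present once introduced.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")


def first_bad_state(states: Sequence[State], is_bad: Callable[[State], bool]) -> State:
    """Return the earliest state for which is_bad() is true.

    Assumes states[0] is good, states[-1] is bad, and every state after
    the first bad one is also bad (hypothetical helper, not a package API).
    """
    if len(states) < 2:
        raise ValueError("need at least one known-good and one known-bad state")
    lo, hi = 0, len(states) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(states[mid]):
            hi = mid  # bug already present at mid: look earlier
        else:
            lo = mid  # still good at mid: look later
    return states[hi]
```

Here `is_bad` stands in for the automated "boot at state X, check" step, so each call should run the feedback loop built in this phase.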
|
|
17
|
+
|
|
18
|
+
The loop must produce the failure mode the **user** described — not a nearby but different failure. Iterate on the loop itself: can you make it faster? Sharper? More deterministic?
|
|
19
|
+
|
|
20
|
+
If you genuinely cannot build a loop, stop and say so. List what you tried. Ask for access to a reproducing environment or a captured artifact.
|
|
21
|
+
|
|
22
|
+
Do not proceed until you have a loop you believe in.
|
|
23
|
+
|
|
24
|
+
## Phase 2 — Reproduce
|
|
25
|
+
|
|
26
|
+
Run the loop. Confirm:
|
|
27
|
+
- The failure matches the user's reported symptom
|
|
28
|
+
- The failure is reproducible across multiple runs
|
|
29
|
+
- You've captured the exact symptom (error message, wrong output, slow timing)
|
|
30
|
+
|
|
31
|
+
Then **minimize the repro** — strip it down to the smallest input, shortest path, or fewest steps that still triggers the bug. A minimized repro dramatically narrows the hypothesis space.
|
|
32
|
+
|
|
33
|
+
## Phase 3 — Hypothesise
|
|
34
|
+
|
|
35
|
+
Generate 3-5 ranked hypotheses. Each must be falsifiable:
|
|
36
|
+
|
|
37
|
+
> "If `<X>` is the cause, then `<changing Y>` will make the bug disappear / `<changing Z>` will make it worse."
|
|
38
|
+
|
|
39
|
+
Show the ranked list to the user before testing. They often have domain knowledge that re-ranks instantly.
|
|
40
|
+
|
|
41
|
+
## Phase 4 — Instrument
|
|
42
|
+
|
|
43
|
+
Each probe must map to a specific hypothesis. Change one variable at a time. Tag every debug log with a unique prefix (e.g. `[DEBUG-a4f2]`) for easy cleanup later. Prefer a debugger breakpoint over logs when available.
|
|
44
|
+
|
|
45
|
+
## Phase 5 — Fix + regression test
|
|
46
|
+
|
|
47
|
+
Write the regression test **before** the fix — but only if there's a correct seam (one that exercises the real bug pattern at the call site). If no correct seam exists, note it — the codebase architecture is preventing the bug from being locked down.
|
|
48
|
+
|
|
49
|
+
## Phase 6 — Cleanup
|
|
50
|
+
|
|
51
|
+
Required before declaring done:
|
|
52
|
+
- Original repro no longer triggers
|
|
53
|
+
- Regression test passes (or absence of seam is documented)
|
|
54
|
+
- All `[DEBUG-...]` instrumentation removed
|
|
55
|
+
- Ask: what would have prevented this bug?
|
|
56
|
+
- If the bug was caused by an architectural problem (no good test seam, tangled callers, hidden coupling), suggest writing an ADR to `docs/plans/adr/` capturing that insight
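The "all `[DEBUG-...]` instrumentation removed" check in the list above could be automated along these lines. This is a minimal sketch, not something the package ships: `debug_leftovers` is a hypothetical helper, and it assumes every tag looks like `[DEBUG-<letters or digits>]`, as in the `[DEBUG-a4f2]` example from Phase 4.

```python
import re

# Assumed tag shape: "[DEBUG-" + letters/digits + "]", e.g. [DEBUG-a4f2].
DEBUG_TAG = re.compile(r"\[DEBUG-[0-9A-Za-z]+\]")


def debug_leftovers(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line still carrying a debug tag."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if DEBUG_TAG.search(line)
    ]
```

An empty result for every touched file is the signal that cleanup is complete.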
|
package/skills/executing-tasks/SKILL.md
CHANGED
|
@@ -5,12 +5,33 @@ description: "Use this to implement an approved plan task-by-task. Run after wri
|
|
|
5
5
|
|
|
6
6
|
# Executing Tasks
|
|
7
7
|
|
|
8
|
-
Implement the plan from `docs/plans/*-implementation.md` task by task.
|
|
8
|
+
Implement the plan from `docs/plans/*-implementation.md` task by task, with file-based progress tracking and session-aware context management.
|
|
9
9
|
|
|
10
10
|
## Before you start
|
|
11
11
|
|
|
12
12
|
1. **Check git state** — run `git status` and `git log --oneline -5`. Note any uncommitted changes.
|
|
13
|
-
2. **
|
|
13
|
+
2. **Find the plan** — look for `docs/plans/*-implementation.md`. If multiple exist, ask the user which one to execute.
|
|
14
|
+
3. **Check for existing progress** — look for `docs/plans/*-progress.md`. If one exists matching the plan, this is a **resume** (see [Resume](#resume)). If not, this is a **first run** (see [First run](#first-run)).
|
|
15
|
+
|
|
16
|
+
## First run
|
|
17
|
+
|
|
18
|
+
1. **Parse the implementation plan** — read the plan and extract all `## Task N:` headings. Build the progress table with all tasks as `⬜ pending`.
|
|
19
|
+
2. **Create the progress file** — save to `docs/plans/<plan-name>-progress.md` (replace `-implementation` with `-progress` in the plan filename):
|
|
20
|
+
|
|
21
|
+
```markdown
|
|
22
|
+
# Progress: <topic>
|
|
23
|
+
|
|
24
|
+
Plan: docs/plans/YYYY-MM-DD-<topic>-implementation.md
|
|
25
|
+
Branch: <current-branch>
|
|
26
|
+
Started: <ISO timestamp>
|
|
27
|
+
Last updated: <ISO timestamp>
|
|
28
|
+
|
|
29
|
+
| # | Status | Task | Commit |
|
|
30
|
+
|---|--------|------|--------|
|
|
31
|
+
| 1 | ⬜ pending | Task description (preserve checkpoint labels) | — |
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
3. **Suggest workspace isolation** — if the user isn't already on a feature branch or worktree, present the options:
|
|
14
35
|
|
|
15
36
|
- **Branch** (smaller changes):
|
|
16
37
|
```
|
|
@@ -23,12 +44,63 @@ Implement the plan from `docs/plans/*-implementation.md` task by task.
|
|
|
23
44
|
|
|
24
45
|
Derive `<feature-name>` from the plan doc (e.g. `docs/plans/2026-04-16-auth-design.md` → `auth`). Ask the user which they prefer, then wait for confirmation before proceeding.
|
|
25
46
|
|
|
26
|
-
|
|
47
|
+
4. **Commit the plan docs** — if `docs/plans/` has uncommitted files, commit them on the new branch:
|
|
27
48
|
```
|
|
28
49
|
git add docs/plans/ && git commit -m "docs: add design and implementation plan"
|
|
29
50
|
```
|
|
30
51
|
|
|
31
|
-
|
|
52
|
+
5. **Begin task execution** — start with task 1 (see [Per-task execution](#per-task-execution)).
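Steps 1 and 2 above (parse the `## Task N:` headings, then write the progress file) can be sketched as follows. The skill leaves this to the agent, so everything here is an assumption for illustration: `progress_path` and `initial_progress` are hypothetical names, headings inside fenced code blocks are not skipped, and checkpoint labels are not extracted.

```python
import re
from datetime import datetime, timezone

TASK_HEADING = re.compile(r"^## Task (\d+): (.+?)[ \t]*$", re.MULTILINE)
NO_COMMIT = "\u2014"  # the dash placeholder the template uses in the Commit column


def progress_path(plan_path: str) -> str:
    """Replace `-implementation` with `-progress` in the plan filename."""
    suffix = "-implementation.md"
    if not plan_path.endswith(suffix):
        raise ValueError(f"not an implementation plan: {plan_path}")
    return plan_path[: -len(suffix)] + "-progress.md"


def initial_progress(plan_text, plan_path, topic, branch, now=None):
    """Render a progress file with every task marked as pending."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        f"# Progress: {topic}",
        "",
        f"Plan: {plan_path}",
        f"Branch: {branch}",
        f"Started: {stamp}",
        f"Last updated: {stamp}",
        "",
        "| # | Status | Task | Commit |",
        "|---|--------|------|--------|",
    ]
    for num, title in TASK_HEADING.findall(plan_text):
        lines.append(f"| {num} | ⬜ pending | {title} | {NO_COMMIT} |")
    return "\n".join(lines) + "\n"
```

A task title containing a `|` character would break the table, so a real implementation would need to escape it.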
|
|
53
|
+
|
|
54
|
+
## Resume
|
|
55
|
+
|
|
56
|
+
1. **Read the progress file** — find the first task with status `⬜ pending`, `❌ failed`, or `🔄 in-progress`.
|
|
57
|
+
2. **Handle in-progress task** — if a task is `🔄 in-progress` (mid-task crash):
|
|
58
|
+
- Check `git log --oneline` since the last `✅ done` task's commit
|
|
59
|
+
- If commits exist: ask the user — "Task N was in progress and commits were made. Continue from here, or reset it to pending?"
|
|
60
|
+
- If no commits: restart the task (reset to `🔄 in-progress` and begin)
|
|
61
|
+
3. **Handle failed task** — if a task is `❌ failed`:
|
|
62
|
+
- Show the failure reason from the progress file
|
|
63
|
+
- Ask: "Retry, skip, or abort?"
|
|
64
|
+
4. **Handle pending task** — proceed normally
|
|
65
|
+
5. **All done** — if no `⬜ pending` or `❌ failed` tasks remain, show summary and suggest `/skill:finalizing`
|
|
66
|
+
6. **Begin task execution** — proceed from the identified task
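Step 1 of the resume flow above, finding the first task that still needs work, could look like the sketch below. It is illustrative only: `parse_rows` and `first_resumable` are hypothetical names, and it assumes the four-column table from the progress-file template, with any `Failed: <reason>` text appended after the status.

```python
RESUMABLE = ("⬜ pending", "❌ failed", "🔄 in-progress")


def parse_rows(progress_text: str) -> list[dict]:
    """Parse task rows of the progress table, skipping header and separator rows."""
    rows = []
    for line in progress_text.splitlines():
        if not line.lstrip().startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) == 4 and cells[0].isdigit():
            rows.append(
                {"num": int(cells[0]), "status": cells[1], "task": cells[2], "commit": cells[3]}
            )
    return rows


def first_resumable(progress_text: str):
    """Return the first pending, failed, or in-progress row, or None if all are settled."""
    for row in parse_rows(progress_text):
        if row["status"].startswith(RESUMABLE):
            return row
    return None
```

A `None` result corresponds to the "All done" case in step 5, where the agent shows the summary and suggests `/skill:finalizing`.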
|
|
67
|
+
|
|
68
|
+
## Progress file
|
|
69
|
+
|
|
70
|
+
**Path:** `docs/plans/<plan-name>-progress.md`
|
|
71
|
+
|
|
72
|
+
**Status values:**
|
|
73
|
+
|
|
74
|
+
| Status | Meaning |
|
|
75
|
+
|--------|---------|
|
|
76
|
+
| `⬜ pending` | Not started |
|
|
77
|
+
| `🔄 in-progress` | Currently being worked on |
|
|
78
|
+
| `✅ done` | Committed successfully |
|
|
79
|
+
| `❌ failed` | Could not complete (append `Failed: <reason>`) |
|
|
80
|
+
| `⏭ skipped` | User chose to skip |
|
|
81
|
+
|
|
82
|
+
**Update rules:**
|
|
83
|
+
- Mark `🔄 in-progress` immediately when starting a task
|
|
84
|
+
- Mark `✅ done` + record commit hash only after successful `git commit`
|
|
85
|
+
- Mark `❌ failed` + append reason when the agent can't proceed after retrying
|
|
86
|
+
- Mark `⏭ skipped` when the user says "skip"
|
|
87
|
+
- Update `Last updated` timestamp on every change
|
|
88
|
+
- Preserve checkpoint labels in the task description column
|
|
89
|
+
|
|
90
|
+
## Per-task execution
|
|
91
|
+
|
|
92
|
+
For each task the agent works on:
|
|
93
|
+
|
|
94
|
+
1. **Mark in-progress** — update the progress file: `🔄 in-progress`
|
|
95
|
+
2. **Read only the relevant task** — grep/jump to `## Task N:` in the implementation plan. Do not read the entire plan.
|
|
96
|
+
3. **Implement** — follow the TDD discipline (see [TDD discipline](#tdd-discipline)) and checkpoint flow (see [Checkpoints](#checkpoints))
|
|
97
|
+
4. **Commit** — `git add` the relevant files and commit with a clear message
|
|
98
|
+
5. **Update progress** — mark `✅ done` + record the commit hash
|
|
99
|
+
6. **Check next task** — look at the next task in the progress file:
|
|
100
|
+
- **Has checkpoint** → pause for review (see [Checkpoint review](#checkpoint-review))
|
|
101
|
+
- **No checkpoint** → continue to the next task
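Step 2 above asks the agent to read only the relevant task rather than the whole plan. One way to do that is to slice the plan between consecutive `## Task N:` headings. A minimal sketch follows, with `task_section` as a hypothetical helper name; it assumes no `## Task N:` line appears inside a fenced code block.

```python
import re


def task_section(plan_text: str, n: int) -> str:
    """Return the text from `## Task n:` up to (not including) the next task heading."""
    lines = plan_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if re.match(rf"## Task {n}:", line):
            start = i  # heading of the requested task
        elif start is not None and re.match(r"## Task \d+:", line):
            return "\n".join(lines[start:i]).rstrip()  # stop at the next task heading
    if start is None:
        raise KeyError(f"no '## Task {n}:' heading in plan")
    return "\n".join(lines[start:]).rstrip()  # requested task was the last one
```

Other `##` headings are kept inside the section on purpose, because plan tasks often embed Markdown snippets that contain their own headings.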
|
|
102
|
+
|
|
103
|
+
## Checkpoints
|
|
32
104
|
|
|
33
105
|
Check each task for a `checkpoint` label and follow the appropriate flow:
|
|
34
106
|
|
|
@@ -54,16 +126,6 @@ Check each task for a `checkpoint` label and follow the appropriate flow:
|
|
|
54
126
|
4. **Pause for review** — show what was done and the diff, then wait for human input
|
|
55
127
|
5. **Commit** — `git add` the relevant files and commit with a clear message
|
|
56
128
|
|
|
57
|
-
## TDD discipline
|
|
58
|
-
|
|
59
|
-
Follow the TDD scenario from the plan:
|
|
60
|
-
|
|
61
|
-
- **New feature**: write the test first, see it fail, then implement
|
|
62
|
-
- **Modifying tested code**: run existing tests before and after
|
|
63
|
-
- **Trivial change**: use judgment
|
|
64
|
-
|
|
65
|
-
Don't skip tests because "it's obvious." The test is the contract.
|
|
66
|
-
|
|
67
129
|
## Checkpoint review
|
|
68
130
|
|
|
69
131
|
When pausing at a checkpoint, present:
|
|
@@ -83,6 +145,76 @@ Wait for the human to respond. They may:
|
|
|
83
145
|
- Ask to revert the task
|
|
84
146
|
- Adjust the remaining plan
|
|
85
147
|
|
|
148
|
+
## TDD discipline
|
|
149
|
+
|
|
150
|
+
Follow the TDD scenario from the plan:
|
|
151
|
+
|
|
152
|
+
- **New feature**: write the test first, see it fail, then implement
|
|
153
|
+
- **Modifying tested code**: run existing tests before and after
|
|
154
|
+
- **Trivial change**: use judgment
|
|
155
|
+
|
|
156
|
+
Don't skip tests because "it's obvious." The test is the contract.
|
|
157
|
+
|
|
158
|
+
## Refactoring
|
|
159
|
+
|
|
160
|
+
After all tests pass for a task, check for refactoring opportunities:
|
|
161
|
+
|
|
162
|
+
- **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
|
|
163
|
+
- **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
|
|
164
|
+
- **Duplication** — extract repeated patterns
|
|
165
|
+
- **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
|
|
166
|
+
|
|
167
|
+
Run tests after each refactor step. Never refactor while tests are failing.
|
|
168
|
+
|
|
169
|
+
Key vocabulary: **depth** (lots of behavior behind a small interface), **seam** (where behavior can be altered without editing in place), **locality** (change concentrated in one place).
|
|
170
|
+
|
|
171
|
+
## Batching and session management
|
|
172
|
+
|
|
173
|
+
The agent suggests a fresh session at natural break points to minimize token accumulation. After completing ~3-5 non-checkpoint tasks in the same session, suggest:
|
|
174
|
+
|
|
175
|
+
```
|
|
176
|
+
✅ Tasks 3-5 done (commits: a1b2, e4f5, i7j8)
|
|
177
|
+
|
|
178
|
+
Progress: 5/10 tasks done
|
|
179
|
+
|
|
180
|
+
⏭ Next: Task 6 — Add auth middleware (no checkpoint)
|
|
181
|
+
|
|
182
|
+
💡 Context is building up. For clean context on remaining tasks:
|
|
183
|
+
/new then /skill:executing-tasks
|
|
184
|
+
(or just say "continue" to keep going here)
|
|
185
|
+
```
|
|
186
|
+
|
|
187
|
+
The user can say "continue" to keep going in the same session. Respect their choice.
|
|
188
|
+
|
|
189
|
+
Also suggest `/new` at checkpoint review pauses when multiple tasks have been completed since the last session break.
|
|
190
|
+
|
|
191
|
+
## Progress file updates (automated)
|
|
192
|
+
|
|
193
|
+
During execution, the agent should update the progress file in place. Example workflow:
|
|
194
|
+
|
|
195
|
+
```bash
|
|
196
|
+
# Before task 2 starts:
|
|
197
|
+
sed -i 's/| 2 | ⬜ pending/| 2 | 🔄 in-progress/' 'docs/plans/<plan-name>-progress.md'
|
|
198
|
+
# After successful commit a1b2c3d:
|
|
199
|
+
sed -i 's/| 2 | 🔄 in-progress/| 2 | ✅ done/' 'docs/plans/<plan-name>-progress.md'
|
|
200
|
+
sed -i 's/^\(| 2 | ✅ done |[^|]*|\)[^|]*|/\1 a1b2c3d |/' 'docs/plans/<plan-name>-progress.md'
|
|
201
|
+
# Update timestamp:
|
|
202
|
+
sed -i "s/Last updated:.*/Last updated: $(date -u +%Y-%m-%dT%H:%M:%SZ)/" 'docs/plans/<plan-name>-progress.md'
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
Note: The agent should use proper markdown table parsing (not naive sed in production) to avoid corrupting the file — ensure the replacement targets the correct row.
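As an alternative to sed, the row update can be done by splitting the target row into cells and rewriting only the status and commit columns. The sketch below is one possible approach, not the package's implementation: `mark_done` is a hypothetical name, and it assumes the four-column table and the `Last updated:` line from the progress-file template.

```python
from datetime import datetime, timezone


def mark_done(progress_text: str, task_num: int, commit: str, now=None) -> str:
    """Mark one task row as done, record its commit hash, and refresh the timestamp."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    out, found = [], False
    for line in progress_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        is_row = line.lstrip().startswith("|") and len(cells) == 4
        if is_row and cells[0] == str(task_num):
            cells[1], cells[3] = "✅ done", commit  # status and commit columns only
            line = "| " + " | ".join(cells) + " |"
            found = True
        elif line.startswith("Last updated:"):
            line = f"Last updated: {stamp}"
        out.append(line)
    if not found:
        raise ValueError(f"task {task_num} not found in progress table")
    return "\n".join(out) + "\n"
```

Matching on the parsed task number, rather than on a text prefix, keeps task 2 from being confused with task 20 and leaves the task description untouched.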
|
|
206
|
+
|
|
207
|
+
## User override commands
|
|
208
|
+
|
|
209
|
+
The user can issue these commands at any time during execution:
|
|
210
|
+
|
|
211
|
+
| User says | Agent does |
|
|
212
|
+
|-----------|-----------|
|
|
213
|
+
| `skip` | Mark current task `⏭ skipped`, move to next |
|
|
214
|
+
| `status` | Show the progress table |
|
|
215
|
+
| `stop` | Mark current task back to `⬜ pending`, suggest `/new` |
|
|
216
|
+
| `retry` | Re-read current task section, start over |
|
|
217
|
+
|
|
86
218
|
## Receiving code review
|
|
87
219
|
|
|
88
220
|
When the user shares code review feedback:
|
|
@@ -94,10 +226,22 @@ When the user shares code review feedback:
|
|
|
94
226
|
|
|
95
227
|
## If you're stuck
|
|
96
228
|
|
|
97
|
-
- Re-read the plan — you may have drifted from the spec
|
|
229
|
+
- Re-read the current task section from the plan — you may have drifted from the spec
|
|
98
230
|
- Check git log — recent commits may reveal context
|
|
99
231
|
- Ask the user — it's better to clarify than to guess wrong
|
|
100
232
|
|
|
101
233
|
## After all tasks
|
|
102
234
|
|
|
103
|
-
|
|
235
|
+
When no `⬜ pending` or `❌ failed` tasks remain, show a summary:
|
|
236
|
+
|
|
237
|
+
```
|
|
238
|
+
✅ All tasks complete!
|
|
239
|
+
|
|
240
|
+
| # | Status | Task |
|
|
241
|
+
|---|--------|------|
|
|
242
|
+
| 1 | ✅ done | Create User model |
|
|
243
|
+
| 2 | ✅ done | Write User model tests |
|
|
244
|
+
| 3 | ⏭ skipped | Add auth middleware |
|
|
245
|
+
|
|
246
|
+
Ready to ship? Run `/skill:finalizing`
|
|
247
|
+
```
|