@tianhai/pi-workflow-kit 0.8.4 → 0.10.0

@@ -0,0 +1,315 @@
1
+ # Implementation Plan: Incorporate mattpocock/skills Ideas
2
+
3
+ Design doc: `docs/plans/2026-05-01-incorporate-mattpocock-skills-design.md`
4
+
5
+ ## Task 1: Update brainstorming skill — design it twice + ADRs
6
+
7
+ <!-- tdd: trivial -->
8
+ <!-- checkpoint: none -->
9
+
10
+ Edit `skills/brainstorming/SKILL.md`:
11
+
12
+ **Step 3** — change from:
13
+
14
+ ```
15
+ 3. **Explore approaches** — propose 2-3 approaches with trade-offs. Lead with your recommendation.
16
+ ```
17
+
18
+ to:
19
+
20
+ ```
21
+ 3. **Explore approaches** — propose 2-3 approaches. For each approach, sketch the concrete interface (types, method signatures, example caller code) so the comparison is grounded in actual code, not abstract descriptions. Lead with your recommendation.
22
+ ```
23
+
24
+ **Step 4** — change from:
25
+
26
+ ```
27
+ 4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
28
+ ```
29
+
30
+ to:
31
+
32
+ ```
33
+ 4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
34
+
35
+ When a significant architectural decision is identified, offer to write a lightweight ADR to `docs/plans/adr/`. Only write an ADR when all three are true:
36
+
37
+ 1. **Hard to reverse** — changing your mind later has meaningful cost
38
+ 2. **Surprising without context** — a future reader will wonder "why?"
39
+ 3. **A real trade-off** — there were genuine alternatives
40
+
41
+ ADR format — a title and 1-3 sentences covering context, decision, and why:
42
+
43
+ ```markdown
44
+ # <Short title of the decision>
45
+
46
+ <1-3 sentences: context, decision, and why.>
47
+ ```
48
+
49
+ ADRs live under `docs/plans/adr/` and are archived during finalizing alongside the design doc.
50
+ ```
51
+
52
+ ```bash
53
+ git commit -m "feat(brainstorming): add design-it-twice interface sketches and ADR output"
54
+ ```
55
+
56
+ ---
57
+
58
+ ## Task 2: Update writing-plans skill — vertical slices
59
+
60
+ <!-- tdd: trivial -->
61
+ <!-- checkpoint: none -->
62
+
63
+ Edit `skills/writing-plans/SKILL.md` — add a new section after "## Task format" and before "## TDD in the plan":
64
+
65
+ ```markdown
66
+ ## Vertical slices
67
+
68
+ Each task should be a **vertical slice** — a thin path through ALL relevant layers end-to-end, delivering one complete piece of observable behavior.
69
+
70
+ ```
71
+ WRONG (horizontal):
72
+ Task 1: Create database schema for users
73
+ Task 2: Write user API endpoints
74
+ Task 3: Build user UI components
75
+ Task 4: Wire everything together
76
+
77
+ RIGHT (vertical):
78
+ Task 1: User can sign up (model + endpoint + validation + test)
79
+ Task 2: User can log in (auth check + token + test)
80
+ Task 3: User can view profile (query + endpoint + test)
81
+ ```
82
+
83
+ Vertical slices ensure every committed task leaves the codebase in a testable state and reduce the blast radius of a bad task.
84
+ ```
85
+
86
+ ```bash
87
+ git commit -m "feat(writing-plans): add vertical slice guidance with anti-pattern example"
88
+ ```
89
+
90
+ ---
91
+
92
+ ## Task 3: Update executing-tasks skill — deep modules refactoring
93
+
94
+ <!-- tdd: trivial -->
95
+ <!-- checkpoint: none -->
96
+
97
+ Edit `skills/executing-tasks/SKILL.md` — add a new section after "## TDD discipline":
98
+
99
+ ```markdown
100
+ ## Refactoring
101
+
102
+ After all tests pass for a task, check for refactoring opportunities:
103
+
104
+ - **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
105
+ - **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
106
+ - **Duplication** — extract repeated patterns
107
+ - **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
108
+
109
+ Run tests after each refactor step. Never refactor while tests are failing.
110
+
111
+ Key vocabulary: **depth** (lots of behavior behind a small interface), **seam** (where behavior can be altered without editing in place), **locality** (change concentrated in one place).
112
+ ```
113
+
114
+ ```bash
115
+ git commit -m "feat(executing-tasks): add refactoring checklist with deep modules vocabulary"
116
+ ```
117
+
118
+ ---
119
+
120
+ ## Task 4: Create diagnose skill
121
+
122
+ <!-- tdd: trivial -->
123
+ <!-- checkpoint: none -->
124
+
125
+ Create `skills/diagnose/SKILL.md`:
126
+
127
+ ```markdown
128
+ ---
129
+ name: diagnose
130
+ description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken."
131
+ ---
132
+
133
+ # Diagnose
134
+
135
+ A 6-phase debugging discipline. Phase 1 is the skill — spend disproportionate effort here.
136
+
137
+ ## Phase 1 — Build a feedback loop
138
+
139
+ Create a fast, deterministic, agent-runnable pass/fail signal for the bug before doing anything else. Try in this order: failing test, curl script, CLI invocation, headless browser script.
140
+
141
+ The loop must produce the failure mode the **user** described — not a nearby but different failure. Iterate on the loop itself: can you make it faster? Sharper? More deterministic?
142
+
143
+ If you genuinely cannot build a loop, stop and say so. List what you tried. Ask for access to a reproducing environment or a captured artifact.
144
+
145
+ Do not proceed until you have a loop you believe in.
146
+
147
+ ## Phase 2 — Reproduce
148
+
149
+ Run the loop. Confirm:
150
+ - The failure matches the user's reported symptom
151
+ - The failure is reproducible across multiple runs
152
+ - You've captured the exact symptom (error message, wrong output, slow timing)
153
+
154
+ ## Phase 3 — Hypothesise
155
+
156
+ Generate 3-5 ranked hypotheses. Each must be falsifiable:
157
+
158
+ > "If `<X>` is the cause, then `<changing Y>` will make the bug disappear / `<changing Z>` will make it worse."
159
+
160
+ Show the ranked list to the user before testing. They often have domain knowledge that re-ranks instantly.
161
+
162
+ ## Phase 4 — Instrument
163
+
164
+ Each probe must map to a specific hypothesis. Change one variable at a time. Tag every debug log with a unique prefix (e.g. `[DEBUG-a4f2]`) for easy cleanup later. Prefer a debugger breakpoint over logs when available.
165
+
166
+ ## Phase 5 — Fix + regression test
167
+
168
+ Write the regression test **before** the fix — but only if there's a correct seam (one that exercises the real bug pattern at the call site). If no correct seam exists, note it — the codebase architecture is preventing the bug from being locked down.
169
+
170
+ ## Phase 6 — Cleanup
171
+
172
+ Required before declaring done:
173
+ - Original repro no longer triggers
174
+ - Regression test passes (or absence of seam is documented)
175
+ - All `[DEBUG-...]` instrumentation removed
176
+ - Ask: what would have prevented this bug?
177
+ ```
178
+
179
+ ```bash
180
+ git commit -m "feat(diagnose): add standalone debugging skill with 6-phase loop"
181
+ ```
182
+
183
+ ---
184
+
185
+ ## Task 5: Update finalizing skill — archive ADRs
186
+
187
+ <!-- tdd: trivial -->
188
+ <!-- checkpoint: none -->
189
+
190
+ Edit `skills/finalizing/SKILL.md` — update step 1 from:
191
+
192
+ ```
193
+ 1. **Move planning docs** — archive the design, implementation, and progress docs, then commit:
194
+ ```
195
+ mkdir -p docs/plans/completed
196
+ mv docs/plans/*-design.md docs/plans/completed/
197
+ mv docs/plans/*-implementation.md docs/plans/completed/
198
+ mv docs/plans/*-progress.md docs/plans/completed/
199
+ git add docs/plans/ && git commit -m "chore: archive planning docs"
200
+ ```
201
+ ```
202
+
203
+ to:
204
+
205
+ ```
206
+ 1. **Move planning docs** — archive the design, implementation, progress docs, and ADRs (if any), then commit:
207
+ ```
208
+ mkdir -p docs/plans/completed
209
+ mkdir -p docs/plans/completed/adr
210
+ mv docs/plans/*-design.md docs/plans/completed/
211
+ mv docs/plans/*-implementation.md docs/plans/completed/
212
+ mv docs/plans/*-progress.md docs/plans/completed/
213
+ mv docs/plans/adr/*.md docs/plans/completed/adr/ 2>/dev/null || true
214
+ rmdir docs/plans/adr 2>/dev/null || true
215
+ git add docs/plans/ && git commit -m "chore: archive planning docs"
216
+ ```
217
+ ```
218
+
219
+ ```bash
220
+ git commit -m "feat(finalizing): archive ADRs alongside planning docs"
221
+ ```
222
+
223
+ ---
224
+
225
+ ## Task 6: Update documentation
226
+
227
+ <!-- tdd: trivial -->
228
+ <!-- checkpoint: none -->
229
+
230
+ ### README.md
231
+
232
+ Update the intro line from:
233
+
234
+ ```
235
+ **4 workflow skills** that guide the agent through a structured development process:
236
+ ```
237
+
238
+ to:
239
+
240
+ ```
241
+ **4 workflow skills** and **1 utility skill** that guide the agent through a structured development process:
242
+ ```
243
+
244
+ Update the pipeline diagram from:
245
+
246
+ ```
247
+ brainstorm → plan → execute → finalize
248
+ ```
249
+
250
+ to:
251
+
252
+ ```
253
+ brainstorm → plan → execute → finalize
254
+
255
+ diagnose (on demand)
256
+ ```
257
+
258
+ Add `diagnose` to the skills table:
259
+
260
+ ```
261
+ | `diagnose` | ~35 | 6-phase debugging loop: build feedback loop, reproduce, hypothesise, instrument, fix, cleanup |
262
+ ```
263
+
264
+ Update the Architecture section to include `diagnose/`:
265
+
266
+ ```
267
+ ├── skills/
268
+ │ ├── brainstorming/SKILL.md
269
+ │ ├── writing-plans/SKILL.md
270
+ │ ├── executing-tasks/SKILL.md
271
+ │ ├── finalizing/SKILL.md
272
+ │ └── diagnose/SKILL.md
273
+ ```
274
+
275
+ ### docs/developer-usage-guide.md
276
+
277
+ Add to the brainstorm section (after "Outcome"):
278
+
279
+ ```
280
+ - Optionally writes ADRs to `docs/plans/adr/` for significant architectural decisions
281
+ ```
282
+
283
+ Add a new section after the 4 workflow phases:
284
+
285
+ ```markdown
286
+ ### 5. Diagnose (on demand)
287
+
288
+ ```
289
+ /skill:diagnose
290
+ ```
291
+
292
+ A 6-phase debugging loop you invoke when something is broken. Build a feedback loop first, then reproduce, hypothesise, instrument, fix, and clean up. It is not a pipeline phase; use it whenever needed.
293
+ ```
294
+
295
+ ### docs/workflow-phases.md
296
+
297
+ Add a new section at the end:
298
+
299
+ ```markdown
300
+ ## diagnose
301
+
302
+ ```
303
+ /skill:diagnose
304
+ ```
305
+
306
+ Not a pipeline phase. A utility skill invoked on demand when debugging is needed.
307
+
308
+ - Build a feedback loop (failing test, curl script, etc.)
309
+ - Reproduce, hypothesise, instrument, fix, cleanup
310
+ - No write restrictions (used during execute/finalize, or outside the pipeline)
311
+ ```
312
+
313
+ ```bash
314
+ git commit -m "docs: update README, usage guide, and workflow phases for new skills"
315
+ ```
@@ -0,0 +1,15 @@
1
+ # Progress: incorporate-mattpocock-skills
2
+
3
+ Plan: docs/plans/2026-05-01-incorporate-mattpocock-skills-implementation.md
4
+ Branch: incorporate-mattpocock-skills
5
+ Started: 2026-05-01T00:00:00Z
6
+ Last updated: 2026-05-01T00:00:00Z
7
+
8
+ | # | Status | Task | Commit |
9
+ |---|--------|------|--------|
10
+ | 1 | ✅ done | Update brainstorming skill — design it twice + ADRs | 0231b84 |
11
+ | 2 | ✅ done | Update writing-plans skill — vertical slices | 22a46df |
12
+ | 3 | ✅ done | Update executing-tasks skill — deep modules refactoring | c405634 |
13
+ | 4 | ✅ done | Create diagnose skill | 5e39e2d |
14
+ | 5 | ✅ done | Update finalizing skill — archive ADRs | e31a1af |
15
+ | 6 | ✅ done | Update documentation (README, usage guide, workflow phases) | 8c1c4eb |
@@ -1,6 +1,6 @@
1
1
  # Workflow Phases
2
2
 
3
- `pi-workflow-kit` has 4 phases. You invoke each one explicitly with `/skill:`.
3
+ `pi-workflow-kit` has 4 phases and 1 utility skill. You invoke each one explicitly with `/skill:`.
4
4
 
5
5
  ```
6
6
  brainstorm → plan → execute → finalize
@@ -55,3 +55,15 @@ No write restrictions. All tools available.
55
55
  - Clean up worktree if one was used
56
56
 
57
57
  No write restrictions. All tools available.
58
+
59
+ ## diagnose
60
+
61
+ ```
62
+ /skill:diagnose
63
+ ```
64
+
65
+ Not a pipeline phase. A utility skill invoked on demand when debugging is needed.
66
+
67
+ - Build a feedback loop (failing test, curl script, etc.)
68
+ - Reproduce, hypothesise, instrument, fix, cleanup
69
+ - No write restrictions (used during execute/finalize, or outside the pipeline)
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@tianhai/pi-workflow-kit",
3
- "version": "0.8.4",
3
+ "version": "0.10.0",
4
4
  "description": "Workflow skills and enforcement extensions for pi",
5
5
  "keywords": [
6
6
  "pi-package"
@@ -11,8 +11,24 @@ Read-only exploration. You may **not** edit or create any files except under `do
11
11
 
12
12
  1. **Check git state** — run `git status` and `git log --oneline -5`. If there's uncommitted work, ask the user what to do with it first.
13
13
  2. **Understand the idea** — read existing code, docs, and recent commits. Ask questions one at a time to refine the idea. Prefer multiple choice when possible.
14
- 3. **Explore approaches** — propose 2-3 approaches with trade-offs. Lead with your recommendation.
14
+ 3. **Explore approaches** — propose 2-3 approaches. For each approach, sketch the concrete interface (types, method signatures, example caller code) so the comparison is grounded in actual code, not abstract descriptions. Lead with your recommendation.
15
15
  4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
16
+
17
+ When a significant architectural decision is identified, offer to write a lightweight ADR to `docs/plans/adr/`. Only write an ADR when all three are true:
18
+
19
+ 1. **Hard to reverse** — changing your mind later has meaningful cost
20
+ 2. **Surprising without context** — a future reader will wonder "why?"
21
+ 3. **A real trade-off** — there were genuine alternatives
22
+
23
+ ADR format — a title and 1-3 sentences covering context, decision, and why:
24
+
25
+ ```markdown
26
+ # <Short title of the decision>
27
+
28
+ <1-3 sentences: context, decision, and why.>
29
+ ```
30
+
31
+ ADRs live under `docs/plans/adr/` and are archived during finalizing alongside the design doc.
16
32
  5. **Write the design doc** — save it to `docs/plans/YYYY-MM-DD-<topic>-design.md`. Ask the user to commit it. Branch creation and worktree setup should be deferred to the execution phase (`/skill:executing-tasks`).
17
33
 
18
34
  ## Principles
@@ -0,0 +1,56 @@
1
+ ---
2
+ name: diagnose
3
+ description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken."
4
+ ---
5
+
6
+ # Diagnose
7
+
8
+ A 6-phase debugging discipline. Phase 1 is the skill — spend disproportionate effort here.
9
+
10
+ ## Phase 1 — Build a feedback loop
11
+
12
+ Create a fast, deterministic, agent-runnable pass/fail signal for the bug before doing anything else. Try in this order: failing test, curl script, CLI invocation, headless browser script.
13
+
14
+ Other strategies when the basics don't work:
15
+ - **Bisection** — bug appeared between two known states? Automate "boot at state X, check, repeat" to bisect
16
+ - **Replay** — save a real network request or event log to disk, replay it through the code path in isolation
17
+
18
+ The loop must produce the failure mode the **user** described — not a nearby but different failure. Iterate on the loop itself: can you make it faster? Sharper? More deterministic?
19
+
20
+ If you genuinely cannot build a loop, stop and say so. List what you tried. Ask for access to a reproducing environment or a captured artifact.
21
+
22
+ Do not proceed until you have a loop you believe in.
23
+
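The bisection strategy above can be sketched as a binary search over an ordered list of states. This is a minimal sketch, not part of the skill: it assumes the first state is known good, the last is known bad, and every state after the first bad one is also bad. `first_bad`, `is_bad`, and the sample commit names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_bad(states: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest state for which is_bad() is true.

    Assumes states[0] is good, states[-1] is bad, and badness never
    reverts once it appears.
    """
    if len(states) < 2:
        raise ValueError("need at least one good and one bad state")
    good, bad = 0, len(states) - 1  # invariant: states[good] good, states[bad] bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(states[mid]):
            bad = mid
        else:
            good = mid
    return states[bad]


# Hypothetical example: eight states, with the bug first appearing in c5.
commits = [f"c{i}" for i in range(8)]
checked: List[str] = []


def loop_fails(commit: str) -> bool:
    checked.append(commit)  # record which states the loop actually ran on
    return int(commit[1:]) >= 5


culprit = first_bad(commits, loop_fails)
```

In practice `is_bad` would boot or check out the state and run the Phase 1 loop. When the states are commits, `git bisect run` performs the same search.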
24
+ ## Phase 2 — Reproduce
25
+
26
+ Run the loop. Confirm:
27
+ - The failure matches the user's reported symptom
28
+ - The failure is reproducible across multiple runs
29
+ - You've captured the exact symptom (error message, wrong output, slow timing)
30
+
31
+ Then **minimize the repro** — strip it down to the smallest input, shortest path, or fewest steps that still triggers the bug. A minimized repro dramatically narrows the hypothesis space.
32
+
33
+ ## Phase 3 — Hypothesise
34
+
35
+ Generate 3-5 ranked hypotheses. Each must be falsifiable:
36
+
37
+ > "If `<X>` is the cause, then `<changing Y>` will make the bug disappear / `<changing Z>` will make it worse."
38
+
39
+ Show the ranked list to the user before testing. They often have domain knowledge that re-ranks instantly.
40
+
41
+ ## Phase 4 — Instrument
42
+
43
+ Each probe must map to a specific hypothesis. Change one variable at a time. Tag every debug log with a unique prefix (e.g. `[DEBUG-a4f2]`) for easy cleanup later. Prefer a debugger breakpoint over logs when available.
44
+
45
+ ## Phase 5 — Fix + regression test
46
+
47
+ Write the regression test **before** the fix — but only if there's a correct seam (one that exercises the real bug pattern at the call site). If no correct seam exists, note it — the codebase architecture is preventing the bug from being locked down.
48
+
49
+ ## Phase 6 — Cleanup
50
+
51
+ Required before declaring done:
52
+ - Original repro no longer triggers
53
+ - Regression test passes (or absence of seam is documented)
54
+ - All `[DEBUG-...]` instrumentation removed
55
+ - Ask: what would have prevented this bug?
56
+ - If the bug was caused by an architectural problem (no good test seam, tangled callers, hidden coupling), suggest writing an ADR to `docs/plans/adr/` capturing that insight
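The "instrumentation removed" check in the cleanup list can be done mechanically rather than by eye. The sketch below is an illustration under assumptions, not part of the skill: tags are taken to be four lowercase hex digits, following the `[DEBUG-a4f2]` example in Phase 4, and `new_debug_tag` and `leftover_tags` are hypothetical helper names.

```python
import re
import secrets
from typing import Dict, List, Tuple

# Matches tags such as [DEBUG-a4f2]; the 4-hex-digit shape is an assumption
# based on the example in Phase 4.
DEBUG_TAG = re.compile(r"\[DEBUG-[0-9a-f]{4}\]")


def new_debug_tag() -> str:
    """Create a fresh tag for one debugging session."""
    return f"[DEBUG-{secrets.token_hex(2)}]"


def leftover_tags(sources: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (path, line number, line) for every line still carrying a tag."""
    hits = []
    for path, text in sources.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if DEBUG_TAG.search(line):
                hits.append((path, number, line.strip()))
    return hits


tag = new_debug_tag()
files = {
    "app.py": f"x = load()\nprint('{tag} x =', x)\nreturn x\n",
    "util.py": "def helper():\n    pass\n",
}
remaining = leftover_tags(files)
```

An empty result from `leftover_tags` is the signal that the third cleanup item is satisfied.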
@@ -5,12 +5,33 @@ description: "Use this to implement an approved plan task-by-task. Run after wri
5
5
 
6
6
  # Executing Tasks
7
7
 
8
- Implement the plan from `docs/plans/*-implementation.md` task by task.
8
+ Implement the plan from `docs/plans/*-implementation.md` task by task, with file-based progress tracking and session-aware context management.
9
9
 
10
10
  ## Before you start
11
11
 
12
12
  1. **Check git state** — run `git status` and `git log --oneline -5`. Note any uncommitted changes.
13
- 2. **Suggest workspace isolation** — if the user isn't already on a feature branch or worktree, present the options:
13
+ 2. **Find the plan** — look for `docs/plans/*-implementation.md`. If multiple exist, ask the user which one to execute.
14
+ 3. **Check for existing progress** — look for `docs/plans/*-progress.md`. If one exists matching the plan, this is a **resume** (see [Resume](#resume)). If not, this is a **first run** (see [First run](#first-run)).
15
+
16
+ ## First run
17
+
18
+ 1. **Parse the implementation plan** — read the plan and extract all `## Task N:` headings. Build the progress table with all tasks as `⬜ pending`.
19
+ 2. **Create the progress file** — save to `docs/plans/<plan-name>-progress.md` (replace `-implementation` with `-progress` in the plan filename):
20
+
21
+ ```markdown
22
+ # Progress: <topic>
23
+
24
+ Plan: docs/plans/YYYY-MM-DD-<topic>-implementation.md
25
+ Branch: <current-branch>
26
+ Started: <ISO timestamp>
27
+ Last updated: <ISO timestamp>
28
+
29
+ | # | Status | Task | Commit |
30
+ |---|--------|------|--------|
31
+ | 1 | ⬜ pending | Task description (preserve checkpoint labels) | — |
32
+ ```
33
+
34
+ 3. **Suggest workspace isolation** — if the user isn't already on a feature branch or worktree, present the options:
14
35
 
15
36
  - **Branch** (smaller changes):
16
37
  ```
@@ -23,12 +44,63 @@ Implement the plan from `docs/plans/*-implementation.md` task by task.
23
44
 
24
45
  Derive `<feature-name>` from the plan doc (e.g. `docs/plans/2026-04-16-auth-design.md` → `auth`). Ask the user which they prefer, then wait for confirmation before proceeding.
25
46
 
26
- 3. **Commit the plan docs** — if `docs/plans/` has uncommitted files, commit them on the new branch:
47
+ 4. **Commit the plan docs** — if `docs/plans/` has uncommitted files, commit them on the new branch:
27
48
  ```
28
49
  git add docs/plans/ && git commit -m "docs: add design and implementation plan"
29
50
  ```
30
51
 
31
- ## Per-task lifecycle
52
+ 5. **Begin task execution** — start with task 1 (see [Per-task execution](#per-task-execution)).
53
+
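The first-run steps above (extract the `## Task N:` headings, derive the progress filename, write the table) can be sketched as follows. This is a rough illustration, not the skill's implementation: `progress_path` and `build_progress` are hypothetical names, and the regex would also match task headings that appear inside fenced code blocks in the plan.

```python
import re
from datetime import datetime, timezone

TASK_HEADING = re.compile(r"^## Task (\d+): (.+)$", re.MULTILINE)
NO_COMMIT = "\u2014"  # placeholder used in the template's Commit column


def progress_path(plan_path: str) -> str:
    """Swap the -implementation suffix for -progress, as the skill describes."""
    return plan_path.replace("-implementation.md", "-progress.md")


def build_progress(plan_path: str, plan_text: str, topic: str, branch: str) -> str:
    """Render the initial progress file with every task marked pending."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        f"# Progress: {topic}",
        "",
        f"Plan: {plan_path}",
        f"Branch: {branch}",
        f"Started: {now}",
        f"Last updated: {now}",
        "",
        "| # | Status | Task | Commit |",
        "|---|--------|------|--------|",
    ]
    for number, title in TASK_HEADING.findall(plan_text):
        lines.append(f"| {number} | ⬜ pending | {title.strip()} | {NO_COMMIT} |")
    return "\n".join(lines) + "\n"


# Hypothetical plan with two tasks.
plan = "# Plan\n\n## Task 1: User can sign up\n\nbody\n\n## Task 2: User can log in\n"
plan_file = "docs/plans/2026-05-01-auth-implementation.md"
doc = build_progress(plan_file, plan, "auth", "auth")
```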
54
+ ## Resume
55
+
56
+ 1. **Read the progress file** — find the first task with status `⬜ pending`, `❌ failed`, or `🔄 in-progress`.
57
+ 2. **Handle in-progress task** — if a task is `🔄 in-progress` (mid-task crash):
58
+ - Check `git log --oneline` since the last `✅ done` task's commit
59
+ - If commits exist: ask the user — "Task N was in progress and commits were made. Continue from here, or reset it to pending?"
60
+ - If no commits: restart the task (reset to `🔄 in-progress` and begin)
61
+ 3. **Handle failed task** — if a task is `❌ failed`:
62
+ - Show the failure reason from the progress file
63
+ - Ask: "Retry, skip, or abort?"
64
+ 4. **Handle pending task** — proceed normally
65
+ 5. **All done** — if no `⬜ pending` or `❌ failed` tasks remain, show summary and suggest `/skill:finalizing`
66
+ 6. **Begin task execution** — proceed from the identified task
67
+
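The resume steps above reduce to reading the table and picking the first row that still needs attention. A minimal sketch, assuming task descriptions contain no `|` characters; `Row`, `parse_rows`, and `next_task` are hypothetical names, not part of the skill.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    number: int
    status: str
    task: str
    commit: str


def parse_rows(progress_text: str) -> List[Row]:
    """Read the task rows out of the progress table, skipping header lines."""
    rows = []
    for line in progress_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) == 4 and cells[0].isdigit():
            rows.append(Row(int(cells[0]), cells[1], cells[2], cells[3]))
    return rows


ACTIONABLE = ("pending", "failed", "in-progress")


def next_task(rows: List[Row]) -> Optional[Row]:
    """Return the first task that is pending, failed, or in progress."""
    for row in rows:
        if any(word in row.status for word in ACTIONABLE):
            return row
    return None  # nothing left: show the summary and suggest finalizing


text = """\
| # | Status | Task | Commit |
|---|--------|------|--------|
| 1 | ✅ done | User can sign up | 0231b84 |
| 2 | 🔄 in-progress | User can log in | \u2014 |
| 3 | ⬜ pending | User can view profile | \u2014 |
"""
rows = parse_rows(text)
resume_at = next_task(rows)
```

Here `resume_at` is task 2, which would trigger the in-progress handling in step 2 of the list above.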
68
+ ## Progress file
69
+
70
+ **Path:** `docs/plans/<plan-name>-progress.md`
71
+
72
+ **Status values:**
73
+
74
+ | Status | Meaning |
75
+ |--------|---------|
76
+ | `⬜ pending` | Not started |
77
+ | `🔄 in-progress` | Currently being worked on |
78
+ | `✅ done` | Committed successfully |
79
+ | `❌ failed` | Could not complete (append `Failed: <reason>`) |
80
+ | `⏭ skipped` | User chose to skip |
81
+
82
+ **Update rules:**
83
+ - Mark `🔄 in-progress` immediately when starting a task
84
+ - Mark `✅ done` + record commit hash only after successful `git commit`
85
+ - Mark `❌ failed` + append reason when the agent can't proceed after retrying
86
+ - Mark `⏭ skipped` when the user says "skip"
87
+ - Update `Last updated` timestamp on every change
88
+ - Preserve checkpoint labels in the task description column
89
+
90
+ ## Per-task execution
91
+
92
+ For each task the agent works on:
93
+
94
+ 1. **Mark in-progress** — update the progress file: `🔄 in-progress`
95
+ 2. **Read only the relevant task** — grep/jump to `## Task N:` in the implementation plan. Do not read the entire plan.
96
+ 3. **Implement** — follow the TDD discipline (see [TDD discipline](#tdd-discipline)) and checkpoint flow (see [Checkpoints](#checkpoints))
97
+ 4. **Commit** — `git add` the relevant files and commit with a clear message
98
+ 5. **Update progress** — mark `✅ done` + record the commit hash
99
+ 6. **Check next task** — look at the next task in the progress file:
100
+ - **Has checkpoint** → pause for review (see [Checkpoint review](#checkpoint-review))
101
+ - **No checkpoint** → continue to the next task
102
+
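Step 2 above (read only the relevant task) amounts to slicing the plan between one `## Task N:` heading and the next. A small sketch under the assumption that task headings do not also appear inside fenced code blocks; `task_section` is a hypothetical helper, not part of the skill.

```python
import re


def task_section(plan_text: str, number: int) -> str:
    """Return the text of `## Task <number>:` up to the next task heading."""
    start = re.search(rf"^## Task {number}:.*$", plan_text, re.MULTILINE)
    if start is None:
        raise KeyError(f"Task {number} not found in plan")
    following = re.search(r"^## Task \d+:", plan_text[start.end():], re.MULTILINE)
    end = start.end() + following.start() if following else len(plan_text)
    return plan_text[start.start():end].rstrip() + "\n"


# Hypothetical two-task plan.
plan = "# Plan\n\n## Task 1: Sign up\n\nstep a\n\n## Task 2: Log in\n\nstep b\n"
first = task_section(plan, 1)
second = task_section(plan, 2)
```

Because the pattern requires the colon directly after the number, asking for task 1 does not match a `## Task 10:` heading.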
103
+ ## Checkpoints
32
104
 
33
105
  Check each task for a `checkpoint` label and follow the appropriate flow:
34
106
 
@@ -54,16 +126,6 @@ Check each task for a `checkpoint` label and follow the appropriate flow:
54
126
  4. **Pause for review** — show what was done and the diff, then wait for human input
55
127
  5. **Commit** — `git add` the relevant files and commit with a clear message
56
128
 
57
- ## TDD discipline
58
-
59
- Follow the TDD scenario from the plan:
60
-
61
- - **New feature**: write the test first, see it fail, then implement
62
- - **Modifying tested code**: run existing tests before and after
63
- - **Trivial change**: use judgment
64
-
65
- Don't skip tests because "it's obvious." The test is the contract.
66
-
67
129
  ## Checkpoint review
68
130
 
69
131
  When pausing at a checkpoint, present:
@@ -83,6 +145,76 @@ Wait for the human to respond. They may:
83
145
  - Ask to revert the task
84
146
  - Adjust the remaining plan
85
147
 
148
+ ## TDD discipline
149
+
150
+ Follow the TDD scenario from the plan:
151
+
152
+ - **New feature**: write the test first, see it fail, then implement
153
+ - **Modifying tested code**: run existing tests before and after
154
+ - **Trivial change**: use judgment
155
+
156
+ Don't skip tests because "it's obvious." The test is the contract.
157
+
158
+ ## Refactoring
159
+
160
+ After all tests pass for a task, check for refactoring opportunities:
161
+
162
+ - **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
163
+ - **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
164
+ - **Duplication** — extract repeated patterns
165
+ - **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
166
+
167
+ Run tests after each refactor step. Never refactor while tests are failing.
168
+
169
+ Key vocabulary: **depth** (lots of behavior behind a small interface), **seam** (where behavior can be altered without editing in place), **locality** (change concentrated in one place).
170
+
171
+ ## Batching and session management
172
+
173
+ The agent suggests a fresh session at natural break points to minimize token accumulation. After completing ~3-5 non-checkpoint tasks in the same session, suggest:
174
+
175
+ ```
176
+ ✅ Tasks 3-5 done (commits: a1b2, e4f5, i7j8)

177
+
178
+ Progress: 5/10 tasks done
179
+
180
+ ⏭ Next: Task 6 — Add auth middleware (no checkpoint)
181
+
182
+ 💡 Context is building up. For clean context on remaining tasks:
183
+ /new then /skill:executing-tasks
184
+ (or just say "continue" to keep going here)
185
+ ```
186
+
187
+ The user can say "continue" to keep going in the same session. Respect their choice.
188
+
189
+ Also suggest `/new` at checkpoint review pauses when multiple tasks have been completed since the last session break.
190
+
191
+ ## Progress file updates (automated)
192
+
193
+ During execution, the agent should update the progress file in place. Example workflow:
194
+
195
+ ```bash
196
+ # Before task 2 starts (PROGRESS is assumed to hold the progress file path):
197
+ sed -i 's/| 2 | ⬜ pending/| 2 | 🔄 in-progress/' "$PROGRESS"
198
+ # After successful commit a1b2c3d:
199
+ sed -i 's/| 2 | 🔄 in-progress/| 2 | ✅ done/' "$PROGRESS"
200
+ sed -i '/^| 2 | ✅ done /s/| [^|]* |$/| a1b2c3d |/' "$PROGRESS"
201
+ # Update timestamp:
202
+ sed -i "s/Last updated:.*/Last updated: $(date -u +%Y-%m-%dT%H:%M:%SZ)/" "$PROGRESS"
203
+ ```
204
+
205
+ Note: The agent should use proper markdown table parsing (not naive sed in production) to avoid corrupting the file — ensure the replacement targets the correct row.
206
+
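The note above asks for table parsing rather than pattern substitution. One way that could look is sketched below; it is an illustration under assumptions, not the package's code. `update_task` is a hypothetical name, and the sketch assumes four-column rows whose cells contain no `|` characters.

```python
from datetime import datetime, timezone
from typing import Optional


def update_task(progress_text: str, number: int, status: str,
                commit: Optional[str] = None) -> str:
    """Set one row's status (and optionally commit) and refresh the timestamp."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    out, found = [], False
    for line in progress_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 4 and cells[0] == str(number):
            cells[1] = status
            if commit is not None:
                cells[3] = commit
            line = "| " + " | ".join(cells) + " |"
            found = True
        elif line.startswith("Last updated:"):
            line = f"Last updated: {now}"
        out.append(line)
    if not found:
        raise KeyError(f"task {number} not in progress table")
    return "\n".join(out) + "\n"


before = (
    "Last updated: 2026-05-01T00:00:00Z\n"
    "\n"
    "| # | Status | Task | Commit |\n"
    "|---|--------|------|--------|\n"
    "| 2 | 🔄 in-progress | User can log in | \u2014 |\n"
    "| 12 | ⬜ pending | Other task | \u2014 |\n"
)
after = update_task(before, 2, "✅ done", "a1b2c3d")
```

Matching on the parsed first cell means only the requested row changes, and the commit hash lands in the Commit column rather than being spliced in after the status.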
207
+ ## User override commands
208
+
209
+ The user can issue these commands at any time during execution:
210
+
211
+ | User says | Agent does |
212
+ |-----------|-----------|
213
+ | `skip` | Mark current task `⏭ skipped`, move to next |
214
+ | `status` | Show the progress table |
215
+ | `stop` | Mark current task back to `⬜ pending`, suggest `/new` |
216
+ | `retry` | Re-read current task section, start over |
217
+
86
218
  ## Receiving code review
87
219
 
88
220
  When the user shares code review feedback:
@@ -94,10 +226,22 @@ When the user shares code review feedback:
94
226
 
95
227
  ## If you're stuck
96
228
 
97
- - Re-read the plan — you may have drifted from the spec
229
+ - Re-read the current task section from the plan — you may have drifted from the spec
98
230
  - Check git log — recent commits may reveal context
99
231
  - Ask the user — it's better to clarify than to guess wrong
100
232
 
101
233
  ## After all tasks
102
234
 
103
- Ask: "All tasks done? Run `/skill:finalizing` to ship."
235
+ When no `⬜ pending` or `❌ failed` tasks remain, show a summary:
236
+
237
+ ```
238
+ ✅ All tasks complete!
239
+
240
+ | # | Status | Task |
241
+ |---|--------|------|
242
+ | 1 | ✅ done | Create User model |
243
+ | 2 | ✅ done | Write User model tests |
244
+ | 3 | ⏭ skipped | Add auth middleware |
245
+
246
+ Ready to ship? Run `/skill:finalizing`
247
+ ```