@tianhai/pi-workflow-kit 0.13.1 → 0.14.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -47,7 +47,7 @@ brainstorm → plan → execute → finalize
  |-------|---------|--------------|
  | **Brainstorm** | `/skill:brainstorming` | Explore approaches, debate tradeoffs, produce a design doc |
  | **Plan** | `/skill:writing-plans` | Break design into bite-sized TDD tasks with file paths and acceptance criteria |
- | **Execute** | `/skill:executing-tasks` | Implement tasks one-by-one with TDD discipline and optional checkpoint review gates |
+ | **Execute** | `/skill:executing-tasks` | Implement tasks one-by-one with TDD discipline and pre-commit checkpoint review gates |
  | **Finalize** | `/skill:finalizing` | Archive plan docs, update README/CHANGELOG, create PR |
  | **Diagnose** | `/skill:diagnose` | 6-phase debugging loop: reproduce → hypothesize → instrument → fix → verify |
 
@@ -0,0 +1,235 @@
+ # Design: Checkpoint gates and pre-commit discipline
+
+ ## Problem
+
+ The `executing-tasks` skill instructs the agent to pause at checkpoints for human review before committing. In practice, the agent commits first then presents the review, defeating the purpose of checkpoints.
+
+ Root causes in executing-tasks:
+ 1. "PAUSE if" reads as optional — the agent interprets it as "if you remember"
+ 2. Steps 6-12 flow together as implement→commit — the pause gets swallowed
+ 3. The diff format asks for committed state — nudges the agent to commit first
+
+ Root causes in writing-plans:
+ 4. Task format says `git commit` after each task — the agent sees the commit line past the checkpoint and skips to it
+ 5. Refactor and lessons are optional-sounding steps at the end of a long list — the agent skips them
+ 6. The plan body has no structural enforcement — everything is just text the agent reads at once
+
+ Secondary issue: the agent skips steps 9 (Refactor if needed) and 10 (Learn from mistakes) because they're optional-sounding steps at the end of a long list.
+
+ ## Key insight
+
+ The agent follows numbered steps and skips loose sections. **Output requirements** (things the agent has to produce) are stronger than instructions (things the agent is told to do). The checkpoint review format forces the agent to report refactoring and lessons — that's the enforcement mechanism.
+
+ No-checkpoint tasks are simple enough that refactor/lessons genuinely aren't needed — the task author chose no checkpoint because the task is trivial.
+
+ ## Solution
+
+ - **writing-plans**: Generate task bodies with numbered steps (including refactor/lessons for checkpointed tasks) and checkpoint gates. Never include `git commit` in the plan.
+ - **executing-tasks**: Simplified runner — follow the plan step by step, pause at checkpoint gates, commit after approval.
+ - **Progress file**: Use Status column to enforce checkpoint gates. Agent can't go from `🔄 in-progress` → `✅ done` if the task has a checkpoint — must go through `⏸ test-review` or `⏸ done-review` first.
+
+ ## Design
+
+ ### Writing-plans: task format
+
+ The plan never includes `git commit`. That's the executing-tasks skill's responsibility.
+
+ **No-checkpoint task:**
+
+ ```markdown
+ ## Task 1: Create User model
+
+ <!-- tdd: new-feature -->
+ <!-- checkpoint: none -->
+
+ Files:
+ - `src/user/model.ts`
+ - `src/user/model.test.ts`
+
+ Steps:
+ 1. Write failing test for User model creation
+ 2. Run test — confirm it fails
+ 3. Implement User model
+ 4. Run test — confirm it passes
+ ```
+
+ **Checkpoint: test task:**
+
+ ```markdown
+ ## Task 2: Write auth tests
+
+ <!-- tdd: new-feature -->
+ <!-- checkpoint: test -->
+
+ Files:
+ - `src/auth/login.test.ts`
+
+ Steps:
+ 1. Write failing test for login with valid credentials
+ 2. Run test — confirm it fails
+
+ ⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.
+
+ 3. Implement login handler
+ 4. Run test — confirm it passes
+ 5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
+ 6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
+ ```
+
+ **Checkpoint: done task:**
+
+ ```markdown
+ ## Task 3: Add login endpoint
+
+ <!-- tdd: new-feature -->
+ <!-- checkpoint: done -->
+
+ Files:
+ - `src/auth/login.ts`
+ - `src/auth/login.test.ts`
+
+ Steps:
+ 1. Write failing test for login with valid credentials
+ 2. Run test — confirm it fails
+ 3. Implement login handler
+ 4. Run test — confirm it passes
+ 5. Add edge case tests (invalid password, missing email)
+ 6. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
+ 7. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
+
+ ⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
+ ```
+
+ **Task with both checkpoints:**
+
+ ```markdown
+ ## Task 4: Complex auth flow
+
+ <!-- tdd: new-feature -->
+ <!-- checkpoint: test -->
+ <!-- checkpoint: done -->
+
+ Steps:
+ 1. Write failing test for auth flow
+ 2. Run test — confirm it fails
+
+ ⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.
+
+ 3. Implement auth flow
+ 4. Run test — confirm it passes
+ 5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
+ 6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
+
+ ⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
+ ```
+
+ ### Writing-plans: checkpoint labels table
+
+ | Checkpoint | When to use | What the plan should say |
+ |---|---|---|
+ | *(none)* | Trivial tasks, well-understood changes | Numbered steps only |
+ | **`checkpoint: test`** | Test design matters | Steps up to test → `⏸ CHECKPOINT: test` → implement steps (including refactor/lessons) |
+ | **`checkpoint: done`** | Implementation review matters | Steps (including refactor/lessons) → `⏸ CHECKPOINT: done` |
+ | Both | Non-obvious tests AND complex logic | Steps up to test → `⏸ CHECKPOINT: test` → implement steps (including refactor/lessons) → `⏸ CHECKPOINT: done` |
+
+ ### Executing-tasks: simplified runner
+
+ The per-task execution becomes:
+
+ 1. Mark `🔄 in-progress` in progress file
+ 2. Read the current task from the plan
+ 3. Execute each numbered step in order
+ 4. When hitting `⏸ CHECKPOINT` in the plan:
+ - Update progress to `⏸ test-review` or `⏸ done-review`
+ - Present the checkpoint review (see format below)
+ - Wait for human approval
+ - On approval, update progress back to `🔄 in-progress`
+ - Continue with the next step
+ 5. After all steps are done (and, for checkpointed tasks, after the final checkpoint is approved):
+ - `git add` and commit with a clear message
+ - Update progress to `✅ done` + record commit hash
+
+ ### Progress file: Status-enforced gates
+
+ Status values:
+
+ | Status | Meaning |
+ |--------|---------|
+ | `⬜ pending` | Not started |
+ | `🔄 in-progress` | Currently executing plan steps |
+ | `⏸ test-review` | Paused at checkpoint: test, waiting for human approval |
+ | `⏸ done-review` | Paused at checkpoint: done, waiting for human approval |
+ | `✅ done` | Committed successfully |
+ | `❌ failed` | Could not complete |
+ | `⏭ skipped` | User chose to skip |
+
+ Enforcement rules:
+ - Agent cannot go from `🔄 in-progress` → `✅ done` if the task has a checkpoint
+ - Must go through `⏸ test-review` or `⏸ done-review` first
+ - Can only return to `🔄 in-progress` after human says "approve"
+ - Can only go to `✅ done` after commit
+
+ Example progress file:
+
+ ```markdown
+ # Progress: Auth feature
+
+ Plan: docs/plans/2026-05-08-auth-implementation.md
+ Branch: auth-feature
+ Started: 2026-05-08T10:00:00Z
+ Last updated: 2026-05-08T10:05:00Z
+
+ | # | Status | Task | Commit |
+ |---|--------|------|--------|
+ | 1 | ✅ done | Create User model | abc123 |
+ | 2 | ⏸ done-review | Add login endpoint (checkpoint: done) | — |
+ | 3 | ⬜ pending | Add auth middleware (checkpoint: done) | — |
+ ```
+
+ ### Checkpoint review format
+
+ For `checkpoint: test`:
+
+ ```
+ ⏸ Paused at checkpoint: test for task [N]
+
+ **Test written:** [show test code]
+ **Expected behavior:** [what this validates]
+ **Next:** Continue implementing after approval
+
+ **Available actions:**
+ - **Approve** — continue to implementation
+ - **Request changes** — describe what to change
+ - **Revert** — undo this task and mark it back to pending
+ - `skip` — skip this task
+ - `stop` — pause here, resume later with `/skill:executing-tasks`
+ ```
+
+ For `checkpoint: done`:
+
+ ```
+ ⏸ Paused at checkpoint: done for task [N]
+
+ **What was done:** [brief summary]
+ **Refactoring done:** [what changed, or "none needed — [reason]"]
+ **Lessons learned:** [new rule added, or "none"]
+ **Diff:** [run `git diff --cached` or `git diff` — do NOT commit first]
+ **Next:** Commit after approval
+
+ **Available actions:**
+ - **Approve** — commit and move to next task
+ - **Request changes** — describe what to change
+ - **Revert** — undo this task and mark it back to pending
+ - `skip` — skip this task
+ - `stop` — pause here, resume later with `/skill:executing-tasks`
+ ```
+
+ ## Files to change
+
+ - `skills/writing-plans/SKILL.md` — update task format template and checkpoint labels section
+ - `skills/executing-tasks/SKILL.md` — simplify per-task execution to plan-following runner, update progress file status values
+
+ ## What stays the same
+
+ - executing-tasks: Before you start, First run, Resume, User override commands, Receiving code review, If you're stuck, After all tasks — all unchanged
+ - writing-plans: Process steps, vertical slices, TDD section — all unchanged
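The status-enforced gates in this design amount to a small state machine. The TypeScript sketch below is one hypothetical way to encode them; the `Status` labels (emoji dropped), the `TransitionContext` fields and `canTransition` are illustrative names, not APIs that pi-workflow-kit is known to ship.

```typescript
// Hypothetical encoding of the progress-file gates; names are illustrative.
type Status =
  | "pending"
  | "in-progress"
  | "test-review"
  | "done-review"
  | "done"
  | "failed"
  | "skipped";

interface TransitionContext {
  hasCheckpoint: boolean; // task declares checkpoint: test and/or done
  reviewedAll: boolean; // every checkpoint in the task has been approved
  humanApproved: boolean; // the human's latest reply was "approve"
  committed: boolean; // git commit for this task has succeeded
}

// Edges the design permits; "pending" from a review status models "revert".
const edges: Record<Status, readonly Status[]> = {
  pending: ["in-progress", "skipped"],
  "in-progress": ["test-review", "done-review", "done", "failed", "skipped"],
  "test-review": ["in-progress", "pending", "skipped"],
  "done-review": ["in-progress", "pending", "skipped"],
  done: [],
  failed: [],
  skipped: [],
};

function canTransition(from: Status, to: Status, ctx: TransitionContext): boolean {
  if (!edges[from].includes(to)) return false;
  const leavingReview = from === "test-review" || from === "done-review";
  // A review status is only left for more work after an explicit "approve".
  if (leavingReview && to === "in-progress") return ctx.humanApproved;
  if (to === "done") {
    // Done needs a commit, and checkpointed tasks must have passed every review.
    return ctx.committed && (!ctx.hasCheckpoint || ctx.reviewedAll);
  }
  return true;
}
```

Routing `done-review` back through `in-progress` mirrors the runner's "on approval, update progress back to `🔄 in-progress`" step, so the only path to done is approval, then commit.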
@@ -0,0 +1,83 @@
+ # Implementation: Checkpoint gates and pre-commit discipline
+
+ Design: `docs/plans/2026-05-08-checkpoint-gates-design.md`
+
+ ## Overview
+
+ Update two skill files so that:
+ 1. `writing-plans` generates task bodies with checkpoint gates and numbered refactor/lessons steps, never includes `git commit`
+ 2. `executing-tasks` becomes a simplified plan-following runner with status-enforced checkpoint gates
+
+ ## Task 1: Update writing-plans task format and checkpoint labels
+
+ <!-- tdd: trivial -->
+ <!-- checkpoint: none -->
+
+ Files:
+ - `skills/writing-plans/SKILL.md`
+
+ Changes:
+
+ ### Task format section
+
+ Replace the task format section. Key changes:
+ - Remove `git commit` from bullet points (commit is the executing-tasks skill's responsibility)
+ - Remove `<!-- checkpoint: none -->` from the default template (omit when no checkpoint)
+ - Add task body examples for each checkpoint type (none, test, done, both)
+ - For checkpointed tasks, include numbered refactor and lessons steps
+ - For checkpointed tasks, include `⏸ CHECKPOINT` gate in the task body
+ - No task body should include `git commit`
+
+ ### Checkpoint labels section
+
+ Replace the checkpoint labels table. Change the last column from "What happens during execution" to "What the plan should include", showing the gate structure for each checkpoint type.
+
+ ### TDD section
+
+ Remove "→ commit" from the Instructions column — commit is not part of the plan.
+
+ ## Task 2: Update executing-tasks per-task execution and progress file
+
+ <!-- tdd: trivial -->
+ <!-- checkpoint: done -->
+
+ Files:
+ - `skills/executing-tasks/SKILL.md`
+
+ Changes:
+
+ ### Per-task execution section
+
+ Replace the current 15-step list with a simplified plan-following runner:
+
+ 1. Mark `🔄 in-progress` in progress file
+ 2. Read the current task from the plan
+ 3. Execute each numbered step in order
+ 4. When hitting `⏸ CHECKPOINT` in the plan:
+ - Update progress to `⏸ test-review` or `⏸ done-review`
+ - Present the checkpoint review
+ - Wait for human approval
+ - On approval, update progress back to `🔄 in-progress`
+ - Continue with the next step
+ 5. After all steps done:
+ - `git add` and commit with a clear message
+ - Update progress to `✅ done` + record commit hash
+
+ Remove the inline refactor/lessons steps — they're now in the plan for checkpointed tasks.
+
+ ### Progress file section
+
+ Add `⏸ test-review` and `⏸ done-review` status values. Add enforcement rule: agent cannot go from `🔄 in-progress` → `✅ done` if task has a checkpoint.
+
+ ### Checkpoint review section
+
+ Update `checkpoint: done` review to include:
+ - **Refactoring done:** field
+ - **Lessons learned:** field
+ - **Diff:** uses `git diff --cached` or `git diff`, with "do NOT commit first"
+
+ Simplify available actions (remove "Adjust plan" since the plan drives execution).
+
+ ### Keep unchanged
+
+ - Before you start, First run, Resume, User override commands, Receiving code review, If you're stuck, After all tasks
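Task 1 changes the plan format so each checkpointed task carries both an `<!-- checkpoint: ... -->` label and a matching `⏸ **CHECKPOINT: ...**` gate, and no task body mentions `git commit`. A lint for that invariant might look like the following sketch. `inspectTask` and `isWellFormed` are hypothetical helpers, and the regexes assume the marker spellings shown in the design doc.

```typescript
// Hypothetical lint for plan task bodies; marker spellings follow the design doc.
type Checkpoint = "test" | "done";

interface TaskCheck {
  declared: Checkpoint[]; // from <!-- checkpoint: ... --> comments
  gates: Checkpoint[]; // from ⏸ **CHECKPOINT: ...** lines, in body order
  mentionsCommit: boolean; // plans should leave `git commit` to executing-tasks
}

// Collects capture group 1 of every match; `re` must carry the g flag.
function collect(re: RegExp, body: string): string[] {
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(body)) !== null) found.push(m[1]);
  return found;
}

function inspectTask(body: string): TaskCheck {
  const labels = collect(/<!--\s*checkpoint:\s*(test|done|none)\s*-->/g, body);
  return {
    declared: labels.filter((l) => l !== "none") as Checkpoint[],
    // The optional \uFE0F allows for an emoji variation selector after ⏸.
    gates: collect(/⏸\uFE0F?\s*(?:\*\*)?CHECKPOINT:\s*(test|done)/g, body) as Checkpoint[],
    mentionsCommit: /git commit/.test(body),
  };
}

function isWellFormed(c: TaskCheck): boolean {
  // Every declared checkpoint needs a gate in the body, and vice versa.
  const sameSet = [...c.declared].sort().join(",") === [...c.gates].sort().join(",");
  const t = c.gates.indexOf("test");
  const d = c.gates.indexOf("done");
  // The test gate must come before the done gate when both are present.
  const ordered = t === -1 || d === -1 || t < d;
  return sameSet && ordered && !c.mentionsCommit;
}
```

Checking gates against labels catches the failure mode the design describes: a plan that declares a checkpoint but gives the agent no stop line to hit.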
@@ -0,0 +1,11 @@
+ # Progress: Checkpoint gates and pre-commit discipline
+
+ Plan: docs/plans/2026-05-08-checkpoint-gates-implementation.md
+ Branch: main
+ Started: 2026-05-08T20:00:00Z
+ Last updated: 2026-05-08T20:12:00Z
+
+ | # | Status | Task | Commit |
+ |---|--------|------|--------|
+ | 1 | ✅ done | Update writing-plans task format and checkpoint labels | d39510c |
+ | 2 | ✅ done | Update executing-tasks per-task execution and progress file (checkpoint: done) | 7c84f59 |
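The progress file is a plain Markdown table, so a runner can recover task state from it. The parser below is a sketch that assumes the four-column `| # | Status | Task | Commit |` layout used above and an em dash placeholder for tasks without a commit. `ProgressRow`, `parseProgress`, `nextPending` and `awaitingReview` are hypothetical names.

```typescript
// Hypothetical reader for the progress table; assumes the 4-column layout.
interface ProgressRow {
  num: number;
  status: string; // full cell, e.g. "⏸ done-review"
  state: string; // status with the leading emoji stripped, e.g. "done-review"
  task: string;
  commit: string | null; // null when the cell holds the U+2014 placeholder
}

const ROW = /^\|\s*(\d+)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|$/;

function parseProgress(md: string): ProgressRow[] {
  const rows: ProgressRow[] = [];
  for (const line of md.split("\n")) {
    const m = ROW.exec(line.trim()); // header and separator rows fail the \d+ test
    if (!m) continue;
    rows.push({
      num: Number(m[1]),
      status: m[2],
      state: m[2].replace(/^\S+\s+/, ""),
      task: m[3],
      commit: m[4] === "\u2014" ? null : m[4],
    });
  }
  return rows;
}

// Runner step "Loop": the next task to pick up is the first pending one.
function nextPending(rows: ProgressRow[]): ProgressRow | undefined {
  return rows.find((r) => r.state === "pending");
}

// Tasks currently paused at a checkpoint gate, waiting for the human.
function awaitingReview(rows: ProgressRow[]): number[] {
  return rows.filter((r) => r.state.endsWith("-review")).map((r) => r.num);
}
```

Stripping the emoji into a separate `state` field keeps comparisons independent of how an editor renders the status glyphs.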
@@ -0,0 +1,39 @@
+ # Migrate from @mariozechner to @earendil-works
+
+ ## Context
+
+ pi has moved from `@mariozechner` to `@earendil-works` on GitHub and npm. The old `@mariozechner/pi-coding-agent@0.73.1` is deprecated. This package has two unused peer deps (`pi-ai`, `pi-tui`) that should be cleaned up.
+
+ ## Changes
+
+ ### 1. `extensions/workflow-guard.ts` — update import
+
+ ```diff
+ -import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
+ +import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
+ ```
+
+ ### 2. `package.json` — update peerDependencies
+
+ ```diff
+ "peerDependencies": {
+ - "@mariozechner/pi-ai": "*",
+ - "@mariozechner/pi-coding-agent": "*",
+ - "@mariozechner/pi-tui": "*",
+ + "@earendil-works/pi-coding-agent": "*",
+ "@sinclair/typebox": "*"
+ },
+ ```
+
+ - Rename `@mariozechner/pi-coding-agent` to `@earendil-works/pi-coding-agent`
+ - Remove `@mariozechner/pi-ai` (unused)
+ - Remove `@mariozechner/pi-tui` (unused)
+
+ ## Verification
+
+ - `ExtensionAPI` is exported identically from both old and new packages (same export map, same `.d.ts` path)
+ - No other imports from `@mariozechner/*` exist in the codebase
+
+ ## Impact
+
+ Users on old `@mariozechner/pi-coding-agent` will get a peer dependency resolution error — they must update pi. The old package is explicitly deprecated pointing to the new one.
@@ -0,0 +1,45 @@
+ # Implementation Plan: Migrate from @mariozechner to @earendil-works
+
+ ## Task 1: Update package scope and clean up peer dependencies
+
+ <!-- tdd: trivial -->
+ <!-- checkpoint: done -->
+
+ Migrate the sole import and peerDependencies from `@mariozechner/*` to `@earendil-works/pi-coding-agent`, dropping the two unused deps (`pi-ai`, `pi-tui`).
+
+ ### Files to modify
+
+ 1. **`extensions/workflow-guard.ts`** — line 2:
+
+ ```diff
+ -import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
+ +import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
+ ```
+
+ 2. **`package.json`** — `peerDependencies`:
+
+ ```diff
+ "peerDependencies": {
+ - "@mariozechner/pi-ai": "*",
+ - "@mariozechner/pi-coding-agent": "*",
+ - "@mariozechner/pi-tui": "*",
+ + "@earendil-works/pi-coding-agent": "*",
+ "@sinclair/typebox": "*"
+ },
+ ```
+
+ ### Verify
+
+ ```bash
+ grep -r "@mariozechner" extensions/ package.json
+ # Expected: no output
+
+ npm run check
+ # Expected: passes (lint + tests)
+ ```
+
+ ### Commit
+
+ ```
+ chore: migrate from @mariozechner to @earendil-works, drop unused peer deps
+ ```
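The plan's `grep` verification can also be expressed against the parsed manifest, which avoids false hits in comments or lockfiles. This is a hypothetical sketch: `staleScopedDeps` and `isMigrated` are illustrative names, and only the three standard dependency sections are checked.

```typescript
// Hypothetical manifest check mirroring the plan's grep step.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
}

// Lists every dependency name still under the old scope.
function staleScopedDeps(pkg: PackageJson, oldScope = "@mariozechner/"): string[] {
  const sections = [pkg.dependencies, pkg.devDependencies, pkg.peerDependencies];
  const stale: string[] = [];
  for (const section of sections) {
    for (const name of Object.keys(section ?? {})) {
      if (name.startsWith(oldScope)) stale.push(name);
    }
  }
  return stale;
}

// Migrated means nothing is left under the old scope and the renamed peer is declared.
function isMigrated(pkg: PackageJson): boolean {
  const peers = pkg.peerDependencies ?? {};
  return staleScopedDeps(pkg).length === 0 && "@earendil-works/pi-coding-agent" in peers;
}
```

In practice the object would come from `JSON.parse(readFileSync("package.json", "utf8"))` using Node's `fs` module.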
@@ -0,0 +1,10 @@
+ # Progress: Migrate from @mariozechner to @earendil-works
+
+ Plan: docs/plans/2026-05-08-migrate-earendil-works-implementation.md
+ Branch: migrate-earendil-works
+ Started: 2026-05-08T00:00:00Z
+ Last updated: 2026-05-08T00:02:00Z
+
+ | # | Status | Task | Commit |
+ |---|--------|------|--------|
+ | 1 | ✅ done | Update package scope and clean up peer dependencies (checkpoint: done) | 0a29af0 |
@@ -1,5 +1,5 @@
  import { resolve } from "node:path";
- import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
+ import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
 
  /**
  * Workflow Guard extension.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@tianhai/pi-workflow-kit",
- "version": "0.13.1",
+ "version": "0.14.0",
  "description": "Enforce structured brainstorm→plan→execute→finalize workflow with TDD discipline in AI coding agents",
  "keywords": [
  "pi-package",
@@ -40,13 +40,12 @@
  ]
  },
  "peerDependencies": {
- "@mariozechner/pi-ai": "*",
- "@mariozechner/pi-coding-agent": "*",
- "@mariozechner/pi-tui": "*",
+ "@earendil-works/pi-coding-agent": "*",
  "@sinclair/typebox": "*"
  },
  "devDependencies": {
  "@biomejs/biome": "^2.3.15",
+ "@earendil-works/pi-coding-agent": "*",
  "vitest": "^4.0.18"
  }
  }
@@ -1,6 +1,6 @@
  ---
  name: brainstorming
- description: "Use this before any creative work — creating features, building components, adding functionality, or modifying behavior. Explores intent and design before implementation."
+ description: "Use this before any creative work — creating features, building components, adding functionality, or modifying behavior. Explores intent and design before implementation. Use this skill whenever the user describes something they want to build, change, or improve, even if they don't say 'brainstorm' — phrases like 'I want to add X', 'let's build Y', 'we need a way to Z', or 'help me design' all apply."
  ---
 
  # Brainstorming
@@ -1,6 +1,6 @@
  ---
  name: diagnose
- description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken."
+ description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken. Use this skill whenever the user reports a bug, says 'this doesn't work', 'something's wrong', 'help me debug', or when tests fail for unclear reasons. Works at any point in the workflow — brainstorm, execute, or standalone."
  ---
 
  # Diagnose
@@ -131,14 +131,20 @@ Implement the plan from `docs/plans/*-implementation.md` task by task, with file
  | Status | Meaning |
  |--------|---------|
  | `⬜ pending` | Not started |
- | `🔄 in-progress` | Currently being worked on |
+ | `🔄 in-progress` | Currently executing plan steps |
+ | `⏸ test-review` | Paused at checkpoint: test, waiting for human approval |
+ | `⏸ done-review` | Paused at checkpoint: done, waiting for human approval |
  | `✅ done` | Committed successfully |
  | `❌ failed` | Could not complete (append `Failed: <reason>`) |
  | `⏭ skipped` | User chose to skip |
 
  **Update rules:**
  - Mark `🔄 in-progress` immediately when starting a task
+ - Mark `⏸ test-review` or `⏸ done-review` when the agent reaches a `⏸ CHECKPOINT` gate in the plan — this must happen BEFORE any `git add` or `git commit`
+ - Can only return to `🔄 in-progress` after the human explicitly says "approve"
  - Mark `✅ done` + record commit hash only after successful `git commit`
+ - Cannot go from `🔄 in-progress` → `✅ done` if the task has a checkpoint — must go through the review status first
+ - `git add` and `git commit` happen AFTER human approval, never before
  - Mark `❌ failed` + append reason when the agent can't proceed after retrying
  - Mark `⏭ skipped` when the user says "skip"
  - Update `Last updated` timestamp on every change
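The update rules tie every status change to one row edit plus a fresh `Last updated` timestamp, and refuse `✅ done` without a commit hash. A minimal sketch of that write path, assuming the table layout from the progress-file examples; `setStatus` is a hypothetical name, and its naive cell split would break on a task title that contains `|`.

```typescript
// Hypothetical writer for one status change in the progress file.
function setStatus(md: string, num: number, status: string, now: Date, commit?: string): string {
  // Update rule: ✅ done is recorded only together with a commit hash.
  if (status.endsWith(" done") && !commit) {
    throw new Error("a done status needs the hash of a successful git commit");
  }
  // Match the examples' second-resolution ISO timestamps (no milliseconds).
  const stamp = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  return md
    .split("\n")
    .map((line) => {
      if (line.startsWith("Last updated:")) return `Last updated: ${stamp}`;
      // "| 2 | status | task | commit |" splits into 6 cells, empty at both ends.
      const cells = line.split("|");
      if (cells.length !== 6 || cells[1].trim() !== String(num)) return line;
      cells[2] = ` ${status} `;
      if (commit) cells[4] = ` ${commit} `;
      return cells.join("|");
    })
    .join("\n");
}
```

Rejecting a commit-less `✅ done` at write time turns the "only after successful `git commit`" rule into a hard failure instead of a convention.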
@@ -146,97 +152,134 @@ Implement the plan from `docs/plans/*-implementation.md` task by task, with file
146
152
 
147
153
  ## Per-task execution
148
154
 
149
- For each task the agent works on:
155
+ For each task:
150
156
 
151
157
  1. **Mark in-progress** — update the progress file: `🔄 in-progress`
152
- 2. **Read the plan selectively** — read the plan's overview section (everything before `## Task 1:`). Skim all `## Task N:` headings for dependency awareness. Then read the current task's body in full. **Read `docs/lessons.md` if it exists** — follow all rules listed there while working on this task.
153
- 3. **Write the test** — for `new-feature`: write a failing test. For `modifying-tested-code`: run existing tests first. For `trivial`: skip steps 3-5, go to step 6.
154
- 4. **Run the test** — confirm it fails (new-feature) or passes (modifying-tested-code). Fix if needed.
155
- 5. **⏸ PAUSE if `checkpoint: test`** present the [checkpoint review](#checkpoint-review) below. Wait for human input. On changes, update and re-present at this same pause.
156
- 6. **Implement** — write the code to make the test pass.
157
- 7. **Run tests** — verify everything passes. If tests fail and you cannot fix them after retrying, see [If you're stuck](#if-youre-stuck). If still stuck, mark the task `❌ failed` with the reason in the progress file and move to the next task.
158
- 8. **Verify against task description** — re-read the task from the plan. Does the implementation satisfy every requirement in the description? If not, fix before proceeding.
159
- 9. **Refactor if needed** — after all tests pass, check for refactoring opportunities:
158
+ 2. **Read the plan** — read the plan's overview section (everything before `## Task 1:`). Skim all `## Task N:` headings for dependency awareness. Then read the current task's body in full. **Read `docs/lessons.md` if it exists** — follow all rules listed there while working on this task.
159
+ 3. **Execute the plan steps** — follow each numbered step in the task body, in order. Stop at any `⏸ CHECKPOINT` gate (see [Checkpoint gates](#checkpoint-gates--when-the-plan-says-stop)).
160
+ 4. **Verify against task description** — re-read the task from the plan. Does the implementation satisfy every requirement listed? If not, fix before proceeding.
161
+ 5. **Refactor**after all tests pass, look for:
160
162
  - **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
161
163
  - **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
162
164
  - **Duplication** — extract repeated patterns
163
165
  - **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
164
166
 
165
167
  Run tests after each refactor step. Never refactor while tests are failing.
166
- 10. **Learn from mistakes** — if you caught yourself making a mistake during this task that you've made before or that would apply to future tasks, append a rule to `docs/lessons.md`. Only add rules that would change future behavior. If the file doesn't exist, create it with the standard format (see below). Do not add one-off errors or things you self-corrected immediately.
167
-
168
- **`docs/lessons.md` format:**
169
- ```markdown
170
- # Lessons Learned
171
-
172
- <!--
173
- Agent: read this at the start of each task during executing-tasks.
174
- Follow every rule. Add new rules when you catch yourself making repeat mistakes.
175
- Retire rules that no longer apply during finalizing.
176
- -->
177
-
178
- ## Rules
179
-
180
- - <new rule here>
181
- ```
182
- 11. **⏸ PAUSE if `checkpoint: done`** — present the [checkpoint review](#checkpoint-review) below. Wait for human input. On changes, update and re-present at this same pause.
183
- 12. **Commit** — `git add` the relevant files and commit with a clear message.
184
- 13. **Update progress** — mark `✅ done` + record the commit hash.
185
- 14. **Suggest session break if needed** — after completing ~3-5 tasks since the last break, suggest:
186
- ```
187
- Tasks N-M done (commits: abc, def)
188
- Progress: X/Y tasks done
189
- ⏭ Next: Task [N+1] [description]
190
- 💡 Context is building up. For clean context on remaining tasks:
191
- /new then /skill:executing-tasks
192
- (or just say "continue" to keep going here)
193
- ```
194
- Also suggest at checkpoint review pauses when multiple tasks have been completed since the last break. Respect the user's choice if they say "continue".
195
- 15. **Loop** — go back to step 1 for the next `⬜ pending` task, or see [After all tasks](#after-all-tasks) if none remain.
168
+ 6. **Learn from mistakes** — if you caught yourself making a mistake during this task that you've made before or that would apply to future tasks, append a rule to `docs/lessons.md`. Only add rules that would change future behavior. If the file doesn't exist, create it with the standard format (see below).
169
+ 7. **Commit** — after all steps are done (no checkpoint gates remain in the task), `git add` the relevant files and commit with a clear message.
170
+ 8. **Update progress** — mark `✅ done` + record the commit hash.
171
+ 9. **Suggest session break if needed** — after completing ~3-5 tasks since the last break, suggest:
172
+ ```
173
+ ✅ Tasks N-M done (commits: abc, def)
174
+ Progress: X/Y tasks done
175
+ ⏭ Next: Task [N+1] [description]
176
+ 💡 Context is building up. For clean context on remaining tasks:
177
+ /new then /skill:executing-tasks
178
+ (or just say "continue" to keep going here)
179
+ ```
180
+ Also suggest at checkpoint review pauses when multiple tasks have been completed since the last break. Respect the user's choice if they say "continue".
181
+ 10. **Loop** — go back to step 1 for the next `⬜ pending` task, or see [After all tasks](#after-all-tasks) if none remain.
182
+
183
+ ### `docs/lessons.md` format
184
+
185
+ ```markdown
186
+ # Lessons Learned
187
+
188
+ <!--
189
+ Agent: read this at the start of each task during executing-tasks.
190
+ Follow every rule. Add new rules when you catch yourself making repeat mistakes.
191
+ Retire rules that no longer apply during finalizing.
192
+ -->
193
+
194
+ ## Rules
195
+
196
+ - <new rule here>
197
+ ```
198
+
199
+ ### Checkpoint gates — when the plan says STOP
200
+
201
+ The plan marks certain steps with `⏸ **CHECKPOINT: test**` or `⏸ **CHECKPOINT: done**`. These are hard stop points. When you reach one:
202
+
203
+ 1. **Stop executing immediately.** Do not proceed to the next step in the task. Do not pass go.
204
+ 2. **Do NOT run `git add` or `git commit`.** The code stays uncommitted until the human approves.
205
+ 3. Update the progress file to `⏸ test-review` or `⏸ done-review`.
206
+ 4. Present the checkpoint review (see below).
207
+ 5. **Wait for the human to respond.** Do not continue executing steps, do not commit, do not move to the next task.
208
+ 6. On approval, update progress back to `🔄 in-progress` and continue with the next step in the task.
209
+
210
+ The whole point of checkpoints is that the human reviews code at critical moments before the agent proceeds further. If you skip past a checkpoint without waiting, you defeat this purpose.
211
+
212
+ | Checkpoint type | What the agent has done at this point | What needs human approval |
213
+ |---|---|---|
214
+ | `checkpoint: test` | Written failing tests, confirmed they fail | The test design — are the right things being tested? |
215
+ | `checkpoint: done` | Implemented, refactored, written lessons | The implementation approach, the refactoring choices |
216
+
217
+ **For `checkpoint: test`:** Only the test file should exist at this point. No implementation code yet. The human reviews the test to confirm the right behavior is being specified.
218
+
219
+ **For `checkpoint: done`:** All code changes are made but NOT committed. Run `git diff` (not `git diff --cached` — nothing should be staged) to show the human what changed. The human reviews before anything is committed.
196
220
 
197
221
  ## Checkpoint review
198
222
 
199
- When pausing at a `checkpoint: test`, present the test code first:
223
+ When you hit a checkpoint gate, present a review to the human and **stop all execution** until they respond.
224
+
225
+ ### At `checkpoint: test`
200
226
 
227
+ You have written the failing tests and confirmed they fail. No implementation code exists yet.
228
+
229
+ Present:
201
230
  ```
202
231
  ⏸ Paused at checkpoint: test for task [N]
203
232
 
204
- **Test written:**
205
- [show the test code]
233
+ **Test file:** `path/to/test.ts`
234
+
235
+ **Test code:**
236
+ [show the full test code]
237
+
238
+ **Test results:** [paste the failing test output showing which tests fail and why]
206
239
 
207
- **Expected behavior:** [what this test validates]
208
- **Next:** Task [N+1] [description]
240
+ **What this validates:** [summarize the behavior these tests specify]
241
+ **Next step after approval:** Write the implementation to make these tests pass
209
242
 
210
- **Available actions:**
211
- - **Approve** — continue to implementation (step 6)
212
- - **Request changes** — describe what to change, I'll update and re-present
213
- - **Revert** — undo this task and mark it back to pending
214
- - **Adjust plan** — modify the remaining tasks in the implementation plan
215
- - `skip`skip this task and move on
216
- - `stop`pause here, start a fresh session later with `/skill:executing-tasks`
217
- - `status` — show the full progress table
243
+ What would you like to do?
244
+ - **approve** — I'll implement to make these tests pass
245
+ - **request changes** — tell me what to change in the tests
246
+ - **revert** — undo this task and go back to pending
247
+ - **skip** — skip this task entirely
248
+ - **stop**pause here, resume later with /skill:executing-tasks
249
+ - **status**show the full progress table
218
250
  ```
219
251
 
220
- When pausing at a `checkpoint: done`, present the implementation review:
252
+ ### At `checkpoint: done`
221
253
 
254
+ You have implemented the code, run the refactor step, and written any lessons. Nothing is committed yet.
255
+
256
+ Present:
222
257
  ```
223
258
  ⏸ Paused at checkpoint: done for task [N]
224
259
 
225
- **What was done:** [brief summary]
226
- **Diff:** [show relevant diff]
227
- **Next:** Task [N+1] [description]
228
-
229
- **Available actions:**
230
- - **Approve** — continue to the next task
231
- - **Request changes** — describe what to change, I'll update and re-present
232
- - **Revert** — undo this task and mark it back to pending
233
- - **Adjust plan** — modify the remaining tasks in the implementation plan
234
- - `skip` — skip this task and move on
235
- - `stop` — pause here, start a fresh session later with `/skill:executing-tasks`
236
- - `status` — show the full progress table
260
+ **What was done:** [brief summary — what feature/fix was implemented]
261
+
262
+ **Test results:** [run tests now, paste the passing output]
263
+
264
+ **Diff:** [run `git diff` — the unstaged changes are what this task produced]
265
+ [paste the full diff]
266
+
267
+ **Refactoring done:** [what changed during refactor, or "none needed — [reason]"]
268
+ **Lessons learned:** [new rule added to docs/lessons.md, or "none"]
269
+ **Next step after approval:** git add, commit, and move to next task
270
+
271
+ What would you like to do?
272
+ - **approve** — I'll commit and move to the next task
273
+ - **request changes** — tell me what to change, I'll update and re-present
274
+ - **revert** — undo this task and go back to pending
275
+ - **skip** — skip this task entirely
276
+ - **stop** — pause here, resume later with /skill:executing-tasks
277
+ - **status** — show the full progress table
237
278
  ```
238
279
 
239
- Wait for the human to respond. On **request changes**, make the edits, then re-present at the same checkpoint. Repeat until approved.
280
+ **Do not commit before the human approves.** The diff you show at `checkpoint: done` is the uncommitted work. If the human requests changes, make the edits, re-run tests, and re-present the updated diff at the same checkpoint. Repeat until they say "approve".
281
+
282
+ Only after approval: `git add` the relevant files, commit, and mark the task `✅ done`.
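The revised gate rules form a small state machine: a task reaches its commit only by passing every gate it declares, **request changes** loops back to the same gate, and any other answer leaves the task uncommitted. The TypeScript below is a minimal sketch of that ordering under stated assumptions: `runTask`, `awaitGate`, `leaveTask`, the step labels, and the queue of scripted answers are hypothetical stand-ins for the interactive session, not pi-workflow-kit APIs.

```typescript
// Hypothetical model of the checkpoint gates; not an API of pi-workflow-kit.
type Checkpoint = "test" | "done";
type Response = "approve" | "request changes" | "revert" | "skip" | "stop" | "status";
type Outcome = "done" | "pending" | "skipped" | "stopped" | "paused";

interface TaskRun {
  log: string[];
  outcome: Outcome;
}

// Presents the review at a gate and waits. "status" and "request changes"
// keep the agent at the same gate; any other answer ends the wait.
// Returns undefined when no answer is left, i.e. the human has not replied yet.
function awaitGate(gate: Checkpoint, answers: Response[], log: string[]): Response | undefined {
  log.push(`present:${gate}`);
  for (let answer = answers.shift(); answer !== undefined; answer = answers.shift()) {
    if (answer === "status") {
      log.push("show progress table");
    } else if (answer === "request changes") {
      // Edit, re-run tests, then re-present at the same checkpoint.
      log.push(`revise:${gate}`, `present:${gate}`);
    } else {
      return answer;
    }
  }
  return undefined;
}

// Maps a non-approval answer (or no answer at all) to the task's resulting state.
function leaveTask(log: string[], answer: Response | undefined): TaskRun {
  switch (answer) {
    case "revert":
      log.push("revert changes");
      return { log, outcome: "pending" };
    case "skip":
      return { log, outcome: "skipped" };
    case "stop":
      return { log, outcome: "stopped" };
    default:
      return { log, outcome: "paused" }; // still waiting: nothing is committed
  }
}

function runTask(checkpoints: ReadonlySet<Checkpoint>, answers: Response[]): TaskRun {
  const queue = [...answers];
  const log = ["write failing test", "confirm it fails"];
  if (checkpoints.has("test")) {
    const answer = awaitGate("test", queue, log);
    if (answer !== "approve") return leaveTask(log, answer);
  }
  log.push("implement", "confirm it passes", "refactor", "lessons");
  if (checkpoints.has("done")) {
    const answer = awaitGate("done", queue, log);
    if (answer !== "approve") return leaveTask(log, answer);
  }
  log.push("git add + commit"); // the only path that reaches a commit
  return { log, outcome: "done" };
}
```

In this model, `runTask(new Set<Checkpoint>(["test", "done"]), ["approve"])` ends with outcome `"paused"` at the done gate and no commit in its log, which is the behavior the revised skill asks for.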
240
283
 
241
284
  ## Progress file updates
242
285
 
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: writing-plans
3
- description: "Use this to break a design into an implementation plan with bite-sized TDD tasks. Works with or without a prior brainstorm."
3
+ description: "Use this to break a design into an implementation plan with bite-sized TDD tasks. Works with or without a prior brainstorm. Use this skill when the user says 'let's plan', 'break this down', 'write a plan', 'create tasks', or after a brainstorm session when they want to move to implementation. Also use when the user has a clear idea and wants to jump straight to a structured plan."
4
4
  ---
5
5
 
6
6
  # Writing Plans
@@ -15,22 +15,20 @@ You may only create or edit files under `docs/plans/`. Do not modify source code
15
15
 
16
16
  ## Task format
17
17
 
18
- Each task should produce one committed, testable change:
18
+ Each task should produce one testable change. The executing-tasks skill handles committing — do not include `git commit` in the task body.
19
19
 
20
+ Each task must include:
20
21
  - Exact file paths to create/modify
21
- - Complete code (not "add validation"). For tasks that depend on types or utilities from earlier tasks, reference them explicitly (e.g., `import { User } from Task 2`) and include only the new code
22
- - Exact commands with expected output
23
- - `git commit` after each task
24
- - Optional `checkpoint: test` or `checkpoint: done` label
25
- - Each task's tests should cover the happy path and at least one edge case or error path
22
+ - **Concrete code** — include the actual implementation, not a summary. Write out SQL schemas, type definitions, function signatures with bodies, route handler code, and test assertions. A developer should be able to copy-paste from the plan and have working code. For tasks that depend on types or utilities from earlier tasks, reference them explicitly (e.g., `import { User } from Task 2`) and include only the new code
23
+ - Exact commands with expected output (e.g., `npx vitest run src/user/model.test.ts` → shows 1 test passing)
24
+ - Each task's tests should cover the happy path and at least one edge case or error path, with concrete assertions
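Since committing now belongs to executing-tasks, a stray `git commit` line in a task body brings back the problem this release addresses (the agent reaching the commit line before the checkpoint). A quick check for that is sketched below; `findCommitSteps` and its `## Task N:` heading regex are hypothetical, and it will also flag prose that merely mentions `git commit`.

```typescript
// Hypothetical lint: lists tasks whose body still tells the executor to commit.
function findCommitSteps(plan: string): number[] {
  const offenders: number[] = [];
  let current: number | undefined; // task number of the section being scanned
  for (const line of plan.split("\n")) {
    const heading = /^## Task (\d+):/.exec(line.trim());
    if (heading) {
      current = Number(heading[1]);
    } else if (current !== undefined && /\bgit commit\b/.test(line) && !offenders.includes(current)) {
      offenders.push(current);
    }
  }
  return offenders;
}
```

A plan written under the new rules should make `findCommitSteps(plan)` return an empty array.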
26
25
 
27
- Each task must use a numbered heading:
26
+ Each task must use a numbered heading with optional metadata comments:
28
27
 
29
28
  ```markdown
30
29
  ## Task N: <description>
31
30
 
32
31
  <!-- tdd: new-feature -->
33
- <!-- checkpoint: none -->
34
32
  ```
35
33
 
36
34
  ...where N starts at 1 and incrementally numbers each task in the plan.
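Because the executing skill works from the plan file alone, these heading and metadata conventions are simple enough to check mechanically. The sketch below is a hypothetical validator (`parsePlan` is not part of pi-workflow-kit). It assumes one `## Task N: ...` heading per task, metadata comments on their own lines beneath it, and the `tdd` and `checkpoint` values this skill lists as valid; it does not skip code fences, so a heading quoted inside a fenced example would be miscounted.

```typescript
// Hypothetical plan validator; names are illustrative, not pi-workflow-kit APIs.
const TDD_VALUES = ["new-feature", "modifying-tested-code", "trivial"] as const;
const CHECKPOINT_VALUES = ["none", "test", "done"] as const;

type Tdd = (typeof TDD_VALUES)[number];
type CheckpointValue = (typeof CHECKPOINT_VALUES)[number];

interface PlanTask {
  number: number;
  title: string;
  tdd?: Tdd;
  checkpoints: CheckpointValue[];
}

// Type guard: narrows a raw string to one of the allowed literal values.
function isOneOf<T extends string>(values: readonly T[], value: string): value is T {
  return (values as readonly string[]).includes(value);
}

function parsePlan(markdown: string): PlanTask[] {
  const tasks: PlanTask[] = [];
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    const heading = /^## Task (\d+): (.+)$/.exec(line);
    if (heading) {
      const number = Number(heading[1]);
      // Tasks start at 1 and increase by one.
      if (number !== tasks.length + 1) {
        throw new Error(`expected Task ${tasks.length + 1}, found Task ${number}`);
      }
      tasks.push({ number, title: heading[2], checkpoints: [] });
      continue;
    }
    const meta = /^<!--\s*(tdd|checkpoint):\s*([\w-]+)\s*-->$/.exec(line);
    const task = tasks[tasks.length - 1];
    if (!meta || !task) continue; // ordinary step text, or text before the first task
    const [, key, value] = meta;
    if (key === "tdd") {
      if (!isOneOf(TDD_VALUES, value)) throw new Error(`Task ${task.number}: invalid tdd value "${value}"`);
      task.tdd = value;
    } else {
      if (!isOneOf(CHECKPOINT_VALUES, value)) {
        throw new Error(`Task ${task.number}: invalid checkpoint value "${value}"`);
      }
      task.checkpoints.push(value);
    }
  }
  return tasks;
}
```

A task carrying both `<!-- checkpoint: test -->` and `<!-- checkpoint: done -->` comes back with `checkpoints: ["test", "done"]`.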
@@ -41,6 +39,140 @@ Valid TDD values: `new-feature`, `modifying-tested-code`, `trivial`
41
39
 
42
40
  Valid checkpoint values: `none`, `test`, `done`
43
41
 
42
+ ### Level of detail
43
+
44
+ This is the #1 thing to get right. The plan is not a high-level outline — it's a detailed recipe that the executing-tasks skill will follow step by step. If you write "implement login handler" without showing the code, the executing agent has to guess, and that defeats the purpose of the plan.
45
+
46
+ Think of it this way: the plan author (you, now) has the full design context, the domain model, and the architecture in mind. The plan executor (a future agent session) will have none of that context — just the plan file. Write accordingly.
47
+
48
+ **What "concrete code" means in practice:**
49
+ - SQL: `CREATE TABLE` statements with all columns, types, and constraints
50
+ - Types/interfaces: full type definitions with fields
51
+ - Functions: signature + body (the logic, not just the name)
52
+ - Tests: concrete assertions (`expect(result.status).toBe(409)`) not descriptions ("test that it returns an error")
53
+ - Routes: the actual handler code with validation, error handling, and response format
54
+ - Config: exact values, not "configure appropriately"
55
+
56
+ **Bad** (too vague — the executor must guess):
57
+ ```
58
+ 3. Implement bookmark model
59
+ ```
60
+
61
+ **Good** (executor can copy-paste):
62
+ ```
63
+ 3. Implement `src/db/bookmarks.ts`:
64
+
65
+ ```ts
66
+ import db from '../db.js';
67
+
68
+ export function createBookmarksTable() {
69
+   db.exec(`
70
+     CREATE TABLE IF NOT EXISTS bookmarks (
71
+       id TEXT PRIMARY KEY,
72
+       userId TEXT NOT NULL,
73
+       messageId TEXT NOT NULL,
74
+       createdAt TEXT DEFAULT (datetime('now')),
75
+       UNIQUE(userId, messageId)
76
+     )
77
+   `);
78
+ }
79
+
80
+ export function insertBookmark(userId: string, messageId: string) {
81
+   const id = crypto.randomUUID();
82
+   db.prepare('INSERT INTO bookmarks (id, userId, messageId) VALUES (?, ?, ?)').run(id, userId, messageId);
83
+   return { id, userId, messageId };
84
+ }
85
+ ```
86
+ ```
87
+
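The same level of detail applies to the test that accompanies a step like this: concrete assertions on the returned row and on the `UNIQUE(userId, messageId)` constraint, not "test that duplicates fail". The sketch below is hypothetical. `BookmarkStore` is an in-memory stand-in for `src/db/bookmarks.ts` so the example runs without a database, and the `check` helper stands in for whatever test runner (for example vitest's `expect`) a real plan would name.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical in-memory stand-in for src/db/bookmarks.ts; it mirrors the
// UNIQUE(userId, messageId) constraint by throwing instead of relying on SQLite.
class BookmarkStore {
  private readonly rows = new Map<string, { id: string; userId: string; messageId: string }>();

  insertBookmark(userId: string, messageId: string) {
    const key = `${userId}\u0000${messageId}`; // composite key for the unique pair
    if (this.rows.has(key)) {
      throw new Error(`duplicate bookmark for ${userId}/${messageId}`);
    }
    const row = { id: randomUUID(), userId, messageId };
    this.rows.set(key, row);
    return row;
  }
}

// Minimal assertion helper used in place of a test runner.
function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(`assertion failed: ${message}`);
}

const store = new BookmarkStore();

// Happy path: the inserted row comes back with the given ids and a UUID.
const first = store.insertBookmark("u1", "m1");
check(first.userId === "u1" && first.messageId === "m1", "returns the inserted row");
check(/^[0-9a-f-]{36}$/.test(first.id), "id is a UUID");

// Error path: the same user bookmarking the same message twice is rejected.
let duplicateRejected = false;
try {
  store.insertBookmark("u1", "m1");
} catch {
  duplicateRejected = true;
}
check(duplicateRejected, "duplicate (userId, messageId) is rejected");

// Edge case: a different user may bookmark the same message.
check(store.insertBookmark("u2", "m1").userId === "u2", "other users are unaffected");
```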
88
+ ### Task body structure
89
+
90
+ The examples below show the structure — headings, metadata comments, checkpoints, and step numbering. For the code content within steps, follow the detail level described above.
91
+
92
+ **No checkpoint** — numbered steps only:
93
+ ```markdown
94
+ ## Task 1: Create User model
95
+
96
+ <!-- tdd: new-feature -->
97
+
98
+ Files:
99
+ - `src/user/model.ts`
100
+ - `src/user/model.test.ts`
101
+
102
+ Steps:
103
+ 1. Write failing test for User model creation
104
+ 2. Run test — confirm it fails
105
+ 3. Implement User model
106
+ 4. Run test — confirm it passes
107
+ ```
108
+
109
+ **`checkpoint: test`** — gate after test, before implementing:
110
+ ```markdown
111
+ ## Task 2: Write auth tests
112
+
113
+ <!-- tdd: new-feature -->
114
+ <!-- checkpoint: test -->
115
+
116
+ Files:
117
+ - `src/auth/login.test.ts`
118
+
119
+ Steps:
120
+ 1. Write failing test for login with valid credentials
121
+ 2. Run test — confirm it fails
122
+
123
+ ⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.
124
+
125
+ 3. Implement login handler
126
+ 4. Run test — confirm it passes
127
+ 5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
128
+ 6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
129
+ ```
130
+
131
+ **`checkpoint: done`** — gate after all steps including refactor/lessons:
132
+ ```markdown
133
+ ## Task 3: Add login endpoint
134
+
135
+ <!-- tdd: new-feature -->
136
+ <!-- checkpoint: done -->
137
+
138
+ Files:
139
+ - `src/auth/login.ts`
140
+ - `src/auth/login.test.ts`
141
+
142
+ Steps:
143
+ 1. Write failing test for login with valid credentials
144
+ 2. Run test — confirm it fails
145
+ 3. Implement login handler
146
+ 4. Run test — confirm it passes
147
+ 5. Add edge case tests (invalid password, missing email)
148
+ 6. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
149
+ 7. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
150
+
151
+ ⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
152
+ ```
153
+
154
+ **Both checkpoints** — gate after test, then gate after refactor/lessons:
155
+ ```markdown
156
+ ## Task 4: Complex auth flow
157
+
158
+ <!-- tdd: new-feature -->
159
+ <!-- checkpoint: test -->
160
+ <!-- checkpoint: done -->
161
+
162
+ Steps:
163
+ 1. Write failing test for auth flow
164
+ 2. Run test — confirm it fails
165
+
166
+ ⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.
167
+
168
+ 3. Implement auth flow
169
+ 4. Run test — confirm it passes
170
+ 5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
171
+ 6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
172
+
173
+ ⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
174
+ ```
175
+
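The four shapes above differ only in where the `⏸` gate lines fall relative to the numbered steps. The hypothetical generator below (`renderTaskSkeleton` and its `TaskSpec` fields are illustrative, not part of the kit) makes that placement rule explicit: the test gate follows the steps that write and fail the tests, the done gate follows everything else, and step numbering continues across gates. It omits the `Files:` list and the instruction text after each gate marker.

```typescript
// Hypothetical generator for the task shapes shown above; not part of pi-workflow-kit.
type Gate = "test" | "done";

interface TaskSpec {
  number: number;
  title: string;
  tdd: "new-feature" | "modifying-tested-code" | "trivial";
  gates: Gate[];
  testSteps: string[]; // steps up to and including the failing test run
  implementSteps: string[]; // implement, verify, refactor, lessons
}

function renderTaskSkeleton(spec: TaskSpec): string {
  const out: string[] = [`## Task ${spec.number}: ${spec.title}`, "", `<!-- tdd: ${spec.tdd} -->`];
  for (const gate of spec.gates) out.push(`<!-- checkpoint: ${gate} -->`);
  out.push("", "Steps:");

  // Step numbers continue across gates, as in the examples above.
  let n = 0;
  const addSteps = (steps: string[]): void => {
    for (const step of steps) out.push(`${++n}. ${step}`);
  };
  // A gate line sits on its own, separated from the steps by blank lines.
  const addGate = (gate: Gate): void => {
    if (spec.gates.includes(gate)) out.push("", `⏸ **CHECKPOINT: ${gate}**`, "");
  };

  addSteps(spec.testSteps);
  addGate("test"); // before any implementation code exists
  addSteps(spec.implementSteps);
  addGate("done"); // after refactor/lessons, before the executor commits
  return out.join("\n").trimEnd();
}
```

With `gates: []` the output is the plain numbered-steps shape; with both gates it reproduces the ordering of the "Both checkpoints" example.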
44
176
 
45
177
  ## Vertical slices
46
178
 
@@ -69,19 +201,20 @@ Label each task with its TDD scenario:
69
201
 
70
202
  | Scenario | When | Instructions in the task |
71
203
  |---|---|---|
72
- | **New feature** | Adding new behavior | Write failing test → run it → implement → run it → commit |
73
- | **Modifying tested code** | Changing existing behavior | Run existing tests first → modify → verify they pass → commit |
74
- | **Trivial** | Config, docs, naming | Use judgment, commit when done |
204
+ | **New feature** | Adding new behavior | Write failing test → run it → implement → run it |
205
+ | **Modifying tested code** | Changing existing behavior | Run existing tests first → modify → verify they pass |
206
+ | **Trivial** | Config, docs, naming | Use judgment |
75
207
 
76
208
  ## Checkpoint labels
77
209
 
78
- Optionally label each task with a `checkpoint` to require human review before proceeding:
210
+ Label each task with a `checkpoint` to require human review before proceeding. The checkpoint gate (`⏸ CHECKPOINT`) goes in the task body — the agent follows the plan step by step and pauses when it reaches the gate.
79
211
 
80
- | Checkpoint | When to use | What happens during execution |
212
+ | Checkpoint | When to use | What the plan should include |
81
213
  |---|---|---|
82
- | *(none)* | Trivial tasks, well-understood changes | Auto-advance, no pause |
83
- | **`checkpoint: test`** | Test design matters (API contracts, edge cases, complex behavior) | Pause after writing the failing test, before implementing |
84
- | **`checkpoint: done`** | Implementation review matters (complex logic, security, performance) | Pause after implementation + tests pass, before committing |
214
+ | *(none)* | Trivial tasks, well-understood changes | Numbered steps only |
215
+ | **`checkpoint: test`** | Test design matters (API contracts, edge cases, complex behavior) | Steps up to test → `⏸ CHECKPOINT: test` → implement steps (including refactor/lessons) |
216
+ | **`checkpoint: done`** | Implementation review matters (complex logic, security, performance) | Steps (including refactor/lessons) → `⏸ CHECKPOINT: done` |
217
+ | Both | Non-obvious tests AND complex logic | Steps up to test → `⏸ CHECKPOINT: test` → implement steps (including refactor/lessons) → `⏸ CHECKPOINT: done` |
85
218
 
86
219
  Use judgment when assigning checkpoints. Prefer `checkpoint: test` for new features with non-obvious test design. Prefer `checkpoint: done` for tasks where the implementation approach is debatable. Most tasks should not need a checkpoint. The user can adjust checkpoints when reviewing the plan.
87
220