npm - @tianhai/pi-workflow-kit - Versions diffs - 0.13.1 → 0.14.0 - Mend

@tianhai/pi-workflow-kit 0.13.1 → 0.14.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13) hide show

package/README.md CHANGED Viewed

@@ -47,7 +47,7 @@ brainstorm → plan → execute → finalize
 |-------|---------|--------------|
 | **Brainstorm** | `/skill:brainstorming` | Explore approaches, debate tradeoffs, produce a design doc |
 | **Plan** | `/skill:writing-plans` | Break design into bite-sized TDD tasks with file paths and acceptance criteria |
-| **Execute** | `/skill:executing-tasks` | Implement tasks one-by-one with TDD discipline and optional checkpoint review gates |
+| **Execute** | `/skill:executing-tasks` | Implement tasks one-by-one with TDD discipline and pre-commit checkpoint review gates |
 | **Finalize** | `/skill:finalizing` | Archive plan docs, update README/CHANGELOG, create PR |
 | **Diagnose** | `/skill:diagnose` | 6-phase debugging loop: reproduce → hypothesize → instrument → fix → verify |

package/docs/plans/completed/2026-05-08-checkpoint-gates-design.md ADDED Viewed

@@ -0,0 +1,235 @@
+# Design: Checkpoint gates and pre-commit discipline
+## Problem
+The `executing-tasks` skill instructs the agent to pause at checkpoints for human review before committing. In practice, the agent commits first then presents the review, defeating the purpose of checkpoints.
+Root causes in executing-tasks:
+1. "PAUSE if" reads as optional — the agent interprets it as "if you remember"
+2. Steps 6-12 flow together as implement→commit — the pause gets swallowed
+3. The diff format asks for committed state — nudges the agent to commit first
+Root causes in writing-plans:
+4. Task format says `git commit` after each task — the agent sees the commit line past the checkpoint and skips to it
+5. Refactor and lessons are optional-sounding steps at the end of a long list — the agent skips them
+6. The plan body has no structural enforcement — everything is just text the agent reads at once
+Secondary issue: the agent skips steps 9 (Refactor if needed) and 10 (Learn from mistakes) because they're optional-sounding steps at the end of a long list.
+## Key insight
+The agent follows numbered steps and skips loose sections. **Output requirements** (things the agent has to produce) are stronger than instructions (things the agent is told to do). The checkpoint review format forces the agent to report refactoring and lessons — that's the enforcement mechanism.
+No-checkpoint tasks are simple enough that refactor/lessons genuinely aren't needed — the task author chose no checkpoint because the task is trivial.
+## Solution
+- **writing-plans**: Generate task bodies with numbered steps (including refactor/lessons for checkpointed tasks) and checkpoint gates. Never include `git commit` in the plan.
+- **executing-tasks**: Simplified runner — follow the plan step by step, pause at checkpoint gates, commit after approval.
+- **Progress file**: Use Status column to enforce checkpoint gates. Agent can't go from `🔄 in-progress` → `✅ done` if the task has a checkpoint — must go through `⏸ test-review` or `⏸ done-review` first.
+## Design
+### Writing-plans: task format
+The plan never includes `git commit`. That's the executing-tasks skill's responsibility.
+**No-checkpoint task:**
+```markdown
+## Task 1: Create User model
+<!-- tdd: new-feature -->
+<!-- checkpoint: none -->
+Files:
+- `src/user/model.ts`
+- `src/user/model.test.ts`
+Steps:
+1. Write failing test for User model creation
+2. Run test — confirm it fails
+3. Implement User model
+4. Run test — confirm it passes
+```
+**Checkpoint: test task:**
+```markdown
+## Task 2: Write auth tests
+<!-- tdd: new-feature -->
+<!-- checkpoint: test -->
+Files:
+- `src/auth/login.test.ts`
+Steps:
+1. Write failing test for login with valid credentials
+2. Run test — confirm it fails
+⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.
+3. Implement login handler
+4. Run test — confirm it passes
+5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
+6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
+```
+**Checkpoint: done task:**
+```markdown
+## Task 3: Add login endpoint
+<!-- tdd: new-feature -->
+<!-- checkpoint: done -->
+Files:
+- `src/auth/login.ts`
+- `src/auth/login.test.ts`
+Steps:
+1. Write failing test for login with valid credentials
+2. Run test — confirm it fails
+3. Implement login handler
+4. Run test — confirm it passes
+5. Add edge case tests (invalid password, missing email)
+6. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
+7. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
+⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
+```
+**Task with both checkpoints:**
+```markdown
+## Task 4: Complex auth flow
+<!-- tdd: new-feature -->
+<!-- checkpoint: test -->
+<!-- checkpoint: done -->
+Steps:
+1. Write failing test for auth flow
+2. Run test — confirm it fails
+⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.
+3. Implement auth flow
+4. Run test — confirm it passes
+5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
+6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
+⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
+```
+### Writing-plans: checkpoint labels table
+| Checkpoint | When to use | What the plan should say |
+|---|---|---|
+| *(none)* | Trivial tasks, well-understood changes | Numbered steps only |
+| **`checkpoint: test`** | Test design matters | Steps up to test → `⏸ CHECKPOINT: test` → implement steps (including refactor/lessons) |
+| **`checkpoint: done`** | Implementation review matters | Steps (including refactor/lessons) → `⏸ CHECKPOINT: done` |
+| Both | Non-obvious tests AND complex logic | Steps up to test → `⏸ CHECKPOINT: test` → implement steps (including refactor/lessons) → `⏸ CHECKPOINT: done` |
+### Executing-tasks: simplified runner
+The per-task execution becomes:
+1. Mark `🔄 in-progress` in progress file
+2. Read the current task from the plan
+3. Execute each numbered step in order
+4. When hitting `⏸ CHECKPOINT` in the plan:
+   - Update progress to `⏸ test-review` or `⏸ done-review`
+   - Present the checkpoint review (see format below)
+   - Wait for human approval
+   - On approval, update progress back to `🔄 in-progress`
+   - Continue with the next step
+5. After all steps are done (or for no-checkpoint tasks, after steps + commit):
+   - `git add` and commit with a clear message
+   - Update progress to `✅ done` + record commit hash
+### Progress file: Status-enforced gates
+Status values:
+| Status | Meaning |
+|--------|---------|
+| `⬜ pending` | Not started |
+| `🔄 in-progress` | Currently executing plan steps |
+| `⏸ test-review` | Paused at checkpoint: test, waiting for human approval |
+| `⏸ done-review` | Paused at checkpoint: done, waiting for human approval |
+| `✅ done` | Committed successfully |
+| `❌ failed` | Could not complete |
+| `⏭ skipped` | User chose to skip |
+Enforcement rules:
+- Agent cannot go from `🔄 in-progress` → `✅ done` if the task has a checkpoint
+- Must go through `⏸ test-review` or `⏸ done-review` first
+- Can only return to `🔄 in-progress` after human says "approve"
+- Can only go to `✅ done` after commit
+Example progress file:
+```markdown
+# Progress: Auth feature
+Plan: docs/plans/2026-05-08-auth-implementation.md
+Branch: auth-feature
+Started: 2026-05-08T10:00:00Z
+Last updated: 2026-05-08T10:05:00Z
+| # | Status | Task | Commit |
+|---|--------|------|--------|
+| 1 | ✅ done | Create User model | abc123 |
+| 2 | ⏸ done-review | Add login endpoint (checkpoint: done) | — |
+| 3 | ⬜ pending | Add auth middleware (checkpoint: done) | — |
+```
+### Checkpoint review format
+For `checkpoint: test`:
+```
+⏸ Paused at checkpoint: test for task [N]
+**Test written:** [show test code]
+**Expected behavior:** [what this validates]
+**Next:** Continue implementing after approval
+**Available actions:**
+- **Approve** — continue to implementation
+- **Request changes** — describe what to change
+- **Revert** — undo this task and mark it back to pending
+- `skip` — skip this task
+- `stop` — pause here, resume later with `/skill:executing-tasks`
+```
+For `checkpoint: done`:
+```
+⏸ Paused at checkpoint: done for task [N]
+**What was done:** [brief summary]
+**Refactoring done:** [what changed, or "none needed — [reason]"]
+**Lessons learned:** [new rule added, or "none"]
+**Diff:** [run `git diff --cached` or `git diff` — do NOT commit first]
+**Next:** Commit after approval
+**Available actions:**
+- **Approve** — commit and move to next task
+- **Request changes** — describe what to change
+- **Revert** — undo this task and mark it back to pending
+- `skip` — skip this task
+- `stop` — pause here, resume later with `/skill:executing-tasks`
+```
+## Files to change
+- `skills/writing-plans/SKILL.md` — update task format template and checkpoint labels section
+- `skills/executing-tasks/SKILL.md` — simplify per-task execution to plan-following runner, update progress file status values
+## What stays the same
+- executing-tasks: Before you start, First run, Resume, User override commands, Receiving code review, If you're stuck, After all tasks — all unchanged
+- writing-plans: Process steps, vertical slices, TDD section — all unchanged

package/docs/plans/completed/2026-05-08-checkpoint-gates-implementation.md ADDED Viewed

@@ -0,0 +1,83 @@
+# Implementation: Checkpoint gates and pre-commit discipline
+Design: `docs/plans/2026-05-08-checkpoint-gates-design.md`
+## Overview
+Update two skill files so that:
+1. `writing-plans` generates task bodies with checkpoint gates and numbered refactor/lessons steps, never includes `git commit`
+2. `executing-tasks` becomes a simplified plan-following runner with status-enforced checkpoint gates
+## Task 1: Update writing-plans task format and checkpoint labels
+<!-- tdd: trivial -->
+<!-- checkpoint: none -->
+Files:
+- `skills/writing-plans/SKILL.md`
+Changes:
+### Task format section
+Replace the task format section. Key changes:
+- Remove `git commit` from bullet points (commit is the executing-tasks skill's responsibility)
+- Remove `<!-- checkpoint: none -->` from the default template (omit when no checkpoint)
+- Add task body examples for each checkpoint type (none, test, done, both)
+- For checkpointed tasks, include numbered refactor and lessons steps
+- For checkpointed tasks, include `⏸ CHECKPOINT` gate in the task body
+- No task body should include `git commit`
+### Checkpoint labels section
+Replace the checkpoint labels table. Change the last column from "What happens during execution" to "What the plan should include", showing the gate structure for each checkpoint type.
+### TDD section
+Remove "→ commit" from the Instructions column — commit is not part of the plan.
+## Task 2: Update executing-tasks per-task execution and progress file
+<!-- tdd: trivial -->
+<!-- checkpoint: done -->
+Files:
+- `skills/executing-tasks/SKILL.md`
+Changes:
+### Per-task execution section
+Replace the current 15-step list with a simplified plan-following runner:
+1. Mark `🔄 in-progress` in progress file
+2. Read the current task from the plan
+3. Execute each numbered step in order
+4. When hitting `⏸ CHECKPOINT` in the plan:
+   - Update progress to `⏸ test-review` or `⏸ done-review`
+   - Present the checkpoint review
+   - Wait for human approval
+   - On approval, update progress back to `🔄 in-progress`
+   - Continue with the next step
+5. After all steps done:
+   - `git add` and commit with a clear message
+   - Update progress to `✅ done` + record commit hash
+Remove the inline refactor/lessons steps — they're now in the plan for checkpointed tasks.
+### Progress file section
+Add `⏸ test-review` and `⏸ done-review` status values. Add enforcement rule: agent cannot go from `🔄 in-progress` → `✅ done` if task has a checkpoint.
+### Checkpoint review section
+Update `checkpoint: done` review to include:
+- **Refactoring done:** field
+- **Lessons learned:** field
+- **Diff:** uses `git diff --cached` or `git diff`, with "do NOT commit first"
+Simplify available actions (remove "Adjust plan" since the plan drives execution).
+### Keep unchanged
+- Before you start, First run, Resume, User override commands, Receiving code review, If you're stuck, After all tasks

package/docs/plans/completed/2026-05-08-checkpoint-gates-progress.md ADDED Viewed

@@ -0,0 +1,11 @@
+# Progress: Checkpoint gates and pre-commit discipline
+Plan: docs/plans/2026-05-08-checkpoint-gates-implementation.md
+Branch: main
+Started: 2026-05-08T20:00:00Z
+Last updated: 2026-05-08T20:12:00Z
+| # | Status | Task | Commit |
+|---|--------|------|--------|
+| 1 | ✅ done | Update writing-plans task format and checkpoint labels | d39510c |
+| 2 | ✅ done | Update executing-tasks per-task execution and progress file (checkpoint: done) | 7c84f59 |

package/docs/plans/completed/2026-05-08-migrate-earendil-works-design.md ADDED Viewed

@@ -0,0 +1,39 @@
+# Migrate from @mariozechner to @earendil-works
+## Context
+pi has moved from `@mariozechner` to `@earendil-works` on GitHub and npm. The old `@mariozechner/pi-coding-agent@0.73.1` is deprecated. This package has two unused peer deps (`pi-ai`, `pi-tui`) that should be cleaned up.
+## Changes
+### 1. `extensions/workflow-guard.ts` — update import
+```diff
+-import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
++import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
+```
+### 2. `package.json` — update peerDependencies
+```diff
+ "peerDependencies": {
+-    "@mariozechner/pi-ai": "*",
+-    "@mariozechner/pi-coding-agent": "*",
+-    "@mariozechner/pi-tui": "*",
++    "@earendil-works/pi-coding-agent": "*",
+     "@sinclair/typebox": "*"
+ },
+```
+- Rename `pi-coding-agent` to `@earendil-works/pi-coding-agent`
+- Remove `@mariozechner/pi-ai` (unused)
+- Remove `@mariozechner/pi-tui` (unused)
+## Verification
+- `ExtensionAPI` is exported identically from both old and new packages (same export map, same `.d.ts` path)
+- No other imports from `@mariozechner/*` exist in the codebase
+## Impact
+Users on old `@mariozechner/pi-coding-agent` will get a peer dependency resolution error — they must update pi. The old package is explicitly deprecated pointing to the new one.

package/docs/plans/completed/2026-05-08-migrate-earendil-works-implementation.md ADDED Viewed

@@ -0,0 +1,45 @@
+# Implementation Plan: Migrate from @mariozechner to @earendil-works
+## Task 1: Update package scope and clean up peer dependencies
+<!-- tdd: trivial -->
+<!-- checkpoint: done -->
+Migrate the sole import and peerDependencies from `@mariozechner/*` to `@earendil-works/pi-coding-agent`, dropping the two unused deps (`pi-ai`, `pi-tui`).
+### Files to modify
+1. **`extensions/workflow-guard.ts`** — line 2:
+```diff
+-import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
++import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
+```
+2. **`package.json`** — `peerDependencies`:
+```diff
+ "peerDependencies": {
+-    "@mariozechner/pi-ai": "*",
+-    "@mariozechner/pi-coding-agent": "*",
+-    "@mariozechner/pi-tui": "*",
++    "@earendil-works/pi-coding-agent": "*",
+     "@sinclair/typebox": "*"
+ },
+```
+### Verify
+```bash
+grep -r "@mariozechner" extensions/ package.json
+# Expected: no output
+npm run check
+# Expected: passes (lint + tests)
+```
+### Commit
+```
+chore: migrate from @mariozechner to @earendil-works, drop unused peer deps
+```

package/docs/plans/completed/2026-05-08-migrate-earendil-works-progress.md ADDED Viewed

@@ -0,0 +1,10 @@
+# Progress: Migrate from @mariozechner to @earendil-works
+Plan: docs/plans/2026-05-08-migrate-earendil-works-implementation.md
+Branch: migrate-earendil-works
+Started: 2026-05-08T00:00:00Z
+Last updated: 2026-05-08T00:02:00Z
+| # | Status | Task | Commit |
+|---|--------|------|--------|
+| 1 | ✅ done | Update package scope and clean up peer dependencies (checkpoint: done) | 0a29af0 |

package/extensions/workflow-guard.ts CHANGED Viewed

@@ -1,5 +1,5 @@
 import { resolve } from "node:path";
-import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
+import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
 /**
  * Workflow Guard extension.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@tianhai/pi-workflow-kit",
-  "version": "0.13.1",
+  "version": "0.14.0",
   "description": "Enforce structured brainstorm→plan→execute→finalize workflow with TDD discipline in AI coding agents",
   "keywords": [
     "pi-package",
@@ -40,13 +40,12 @@
     ]
   },
   "peerDependencies": {
-    "@mariozechner/pi-ai": "*",
-    "@mariozechner/pi-coding-agent": "*",
-    "@mariozechner/pi-tui": "*",
+    "@earendil-works/pi-coding-agent": "*",
     "@sinclair/typebox": "*"
   },
   "devDependencies": {
     "@biomejs/biome": "^2.3.15",
+    "@earendil-works/pi-coding-agent": "*",
     "vitest": "^4.0.18"
   }
 }

package/skills/brainstorming/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: brainstorming
-description: "Use this before any creative work — creating features, building components, adding functionality, or modifying behavior. Explores intent and design before implementation."
+description: "Use this before any creative work — creating features, building components, adding functionality, or modifying behavior. Explores intent and design before implementation. Use this skill whenever the user describes something they want to build, change, or improve, even if they don't say 'brainstorm' — phrases like 'I want to add X', 'let's build Y', 'we need a way to Z', or 'help me design' all apply."
 ---
 # Brainstorming

package/skills/diagnose/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: diagnose
-description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken."
+description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken. Use this skill whenever the user reports a bug, says 'this doesn't work', 'something's wrong', 'help me debug', or when tests fail for unclear reasons. Works at any point in the workflow — brainstorm, execute, or standalone."
 ---
 # Diagnose

package/skills/executing-tasks/SKILL.md CHANGED Viewed

@@ -131,14 +131,20 @@ Implement the plan from `docs/plans/*-implementation.md` task by task, with file
 | Status | Meaning |
 |--------|---------|
 | `⬜ pending` | Not started |
-| `🔄 in-progress` | Currently being worked on |
+| `🔄 in-progress` | Currently executing plan steps |
+| `⏸ test-review` | Paused at checkpoint: test, waiting for human approval |
+| `⏸ done-review` | Paused at checkpoint: done, waiting for human approval |
 | `✅ done` | Committed successfully |
 | `❌ failed` | Could not complete (append `Failed: <reason>`) |
 | `⏭ skipped` | User chose to skip |
 **Update rules:**
 - Mark `🔄 in-progress` immediately when starting a task
+- Mark `⏸ test-review` or `⏸ done-review` when the agent reaches a `⏸ CHECKPOINT` gate in the plan — this must happen BEFORE any `git add` or `git commit`
+- Can only return to `🔄 in-progress` after the human explicitly says "approve"
 - Mark `✅ done` + record commit hash only after successful `git commit`
+- Cannot go from `🔄 in-progress` → `✅ done` if the task has a checkpoint — must go through the review status first
+- `git add` and `git commit` happen AFTER human approval, never before
 - Mark `❌ failed` + append reason when the agent can't proceed after retrying
 - Mark `⏭ skipped` when the user says "skip"
 - Update `Last updated` timestamp on every change
@@ -146,97 +152,134 @@ Implement the plan from `docs/plans/*-implementation.md` task by task, with file
 ## Per-task execution
-For each task the agent works on:
+For each task:
 1. **Mark in-progress** — update the progress file: `🔄 in-progress`
-2. **Read the plan selectively** — read the plan's overview section (everything before `## Task 1:`). Skim all `## Task N:` headings for dependency awareness. Then read the current task's body in full. **Read `docs/lessons.md` if it exists** — follow all rules listed there while working on this task.
-3. **Write the test** — for `new-feature`: write a failing test. For `modifying-tested-code`: run existing tests first. For `trivial`: skip steps 3-5, go to step 6.
-4. **Run the test** — confirm it fails (new-feature) or passes (modifying-tested-code). Fix if needed.
-5. **⏸ PAUSE if `checkpoint: test`** — present the [checkpoint review](#checkpoint-review) below. Wait for human input. On changes, update and re-present at this same pause.
-6. **Implement** — write the code to make the test pass.
-7. **Run tests** — verify everything passes. If tests fail and you cannot fix them after retrying, see [If you're stuck](#if-youre-stuck). If still stuck, mark the task `❌ failed` with the reason in the progress file and move to the next task.
-8. **Verify against task description** — re-read the task from the plan. Does the implementation satisfy every requirement in the description? If not, fix before proceeding.
-9. **Refactor if needed** — after all tests pass, check for refactoring opportunities:
+2. **Read the plan** — read the plan's overview section (everything before `## Task 1:`). Skim all `## Task N:` headings for dependency awareness. Then read the current task's body in full. **Read `docs/lessons.md` if it exists** — follow all rules listed there while working on this task.
+3. **Execute the plan steps** — follow each numbered step in the task body, in order. Stop at any `⏸ CHECKPOINT` gate (see [Checkpoint gates](#checkpoint-gates--when-the-plan-says-stop)).
+4. **Verify against task description** — re-read the task from the plan. Does the implementation satisfy every requirement listed? If not, fix before proceeding.
+5. **Refactor** — after all tests pass, look for:
    - **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
    - **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
    - **Duplication** — extract repeated patterns
    - **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
    Run tests after each refactor step. Never refactor while tests are failing.
-10. **Learn from mistakes** — if you caught yourself making a mistake during this task that you've made before or that would apply to future tasks, append a rule to `docs/lessons.md`. Only add rules that would change future behavior. If the file doesn't exist, create it with the standard format (see below). Do not add one-off errors or things you self-corrected immediately.
-    **`docs/lessons.md` format:**
-    ```markdown
-    # Lessons Learned
-    <!--
-    Agent: read this at the start of each task during executing-tasks.
-    Follow every rule. Add new rules when you catch yourself making repeat mistakes.
-    Retire rules that no longer apply during finalizing.
-    -->
-    ## Rules
-    - <new rule here>
-    ```
-11. **⏸ PAUSE if `checkpoint: done`** — present the [checkpoint review](#checkpoint-review) below. Wait for human input. On changes, update and re-present at this same pause.
-12. **Commit** — `git add` the relevant files and commit with a clear message.
-13. **Update progress** — mark `✅ done` + record the commit hash.
-14. **Suggest session break if needed** — after completing ~3-5 tasks since the last break, suggest:
-    ```
-    ✅ Tasks N-M done (commits: abc, def)
-    Progress: X/Y tasks done
-    ⏭  Next: Task [N+1] — [description]
-    💡 Context is building up. For clean context on remaining tasks:
-       /new  then  /skill:executing-tasks
-       (or just say "continue" to keep going here)
-    ```
-    Also suggest at checkpoint review pauses when multiple tasks have been completed since the last break. Respect the user's choice if they say "continue".
-15. **Loop** — go back to step 1 for the next `⬜ pending` task, or see [After all tasks](#after-all-tasks) if none remain.
+6. **Learn from mistakes** — if you caught yourself making a mistake during this task that you've made before or that would apply to future tasks, append a rule to `docs/lessons.md`. Only add rules that would change future behavior. If the file doesn't exist, create it with the standard format (see below).
+7. **Commit** — after all steps are done (no checkpoint gates remain in the task), `git add` the relevant files and commit with a clear message.
+8. **Update progress** — mark `✅ done` + record the commit hash.
+9. **Suggest session break if needed** — after completing ~3-5 tasks since the last break, suggest:
+   ```
+   ✅ Tasks N-M done (commits: abc, def)
+   Progress: X/Y tasks done
+   ⏭  Next: Task [N+1] — [description]
+   💡 Context is building up. For clean context on remaining tasks:
+      /new  then  /skill:executing-tasks
+      (or just say "continue" to keep going here)
+   ```
+   Also suggest at checkpoint review pauses when multiple tasks have been completed since the last break. Respect the user's choice if they say "continue".
+10. **Loop** — go back to step 1 for the next `⬜ pending` task, or see [After all tasks](#after-all-tasks) if none remain.
+### `docs/lessons.md` format
+```markdown
+# Lessons Learned
+<!--
+Agent: read this at the start of each task during executing-tasks.
+Follow every rule. Add new rules when you catch yourself making repeat mistakes.
+Retire rules that no longer apply during finalizing.
+-->
+## Rules
+- <new rule here>
+```
+### Checkpoint gates — when the plan says STOP
+The plan marks certain steps with `⏸ **CHECKPOINT: test**` or `⏸ **CHECKPOINT: done**`. These are hard stop points. When you reach one:
+1. **Stop executing immediately.** Do not proceed to the next step in the task. Do not pass go.
+2. **Do NOT run `git add` or `git commit`.** The code stays uncommitted until the human approves.
+3. Update the progress file to `⏸ test-review` or `⏸ done-review`.
+4. Present the checkpoint review (see below).
+5. **Wait for the human to respond.** Do not continue executing steps, do not commit, do not move to the next task.
+6. On approval, update progress back to `🔄 in-progress` and continue with the next step in the task.
+The whole point of checkpoints is that the human reviews code at critical moments before the agent proceeds further. If you skip past a checkpoint without waiting, you defeat this purpose.
+| Checkpoint type | What the agent has done at this point | What needs human approval |
+|---|---|---|
+| `checkpoint: test` | Written failing tests, confirmed they fail | The test design — are the right things being tested? |
+| `checkpoint: done` | Implemented, refactored, written lessons | The implementation approach, the refactoring choices |
+**For `checkpoint: test`:** Only the test file should exist at this point. No implementation code yet. The human reviews the test to confirm the right behavior is being specified.
+**For `checkpoint: done`:** All code changes are made but NOT committed. Run `git diff` (not `git diff --cached` — nothing should be staged) to show the human what changed. The human reviews before anything is committed.
 ## Checkpoint review
-When pausing at a `checkpoint: test`, present the test code first:
+When you hit a checkpoint gate, present a review to the human and **stop all execution** until they respond.
+### At `checkpoint: test`
+You have written the failing tests and confirmed they fail. No implementation code exists yet.
+Present:
 ```
 ⏸ Paused at checkpoint: test for task [N]
-**Test written:**
-[show the test code]
+**Test file:** `path/to/test.ts`
+**Test code:**
+[show the full test code]
+**Test results:** [paste the failing test output showing which tests fail and why]
-**Expected behavior:** [what this test validates]
-**Next:** Task [N+1] — [description]
+**What this validates:** [summarize the behavior these tests specify]
+**Next step after approval:** Write the implementation to make these tests pass
-**Available actions:**
-- **Approve** — continue to implementation (step 6)
-- **Request changes** — describe what to change, I'll update and re-present
-- **Revert** — undo this task and mark it back to pending
-- **Adjust plan** — modify the remaining tasks in the implementation plan
-- `skip` — skip this task and move on
-- `stop` — pause here, start a fresh session later with `/skill:executing-tasks`
-- `status` — show the full progress table
+What would you like to do?
+- **approve** — I'll implement to make these tests pass
+- **request changes** — tell me what to change in the tests
+- **revert** — undo this task and go back to pending
+- **skip** — skip this task entirely
+- **stop** — pause here, resume later with /skill:executing-tasks
+- **status** — show the full progress table
 ```
-When pausing at a `checkpoint: done`, present the implementation review:
+### At `checkpoint: done`
+You have implemented the code, run the refactor step, and written any lessons. Nothing is committed yet.
+Present:
 ```
 ⏸ Paused at checkpoint: done for task [N]
-**What was done:** [brief summary]
-**Diff:** [show relevant diff]
-**Next:** Task [N+1] — [description]
-**Available actions:**
-- **Approve** — continue to the next task
-- **Request changes** — describe what to change, I'll update and re-present
-- **Revert** — undo this task and mark it back to pending
-- **Adjust plan** — modify the remaining tasks in the implementation plan
-- `skip` — skip this task and move on
-- `stop` — pause here, start a fresh session later with `/skill:executing-tasks`
-- `status` — show the full progress table
+**What was done:** [brief summary — what feature/fix was implemented]
+**Test results:** [run tests now, paste the passing output]
+**Diff:** [run `git diff` — the unstaged changes are what this task produced]
+[paste the full diff]
+**Refactoring done:** [what changed during refactor, or "none needed — [reason]"]
+**Lessons learned:** [new rule added to docs/lessons.md, or "none"]
+**Next step after approval:** git add, commit, and move to next task
+What would you like to do?
+- **approve** — I'll commit and move to the next task
+- **request changes** — tell me what to change, I'll update and re-present
+- **revert** — undo this task and go back to pending
+- **skip** — skip this task entirely
+- **stop** — pause here, resume later with /skill:executing-tasks
+- **status** — show the full progress table
 ```
-Wait for the human to respond. On **request changes**, make the edits, then re-present at the same checkpoint. Repeat until approved.
+**Do not commit before the human approves.** The diff you show at `checkpoint: done` is the uncommitted work. If the human requests changes, make the edits, re-run tests, and re-present the updated diff at the same checkpoint. Repeat until they say "approve".
+Only after approval: `git add` the relevant files, commit, and mark the task `✅ done`.
 ## Progress file updates

package/skills/writing-plans/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: writing-plans
-description: "Use this to break a design into an implementation plan with bite-sized TDD tasks. Works with or without a prior brainstorm."
+description: "Use this to break a design into an implementation plan with bite-sized TDD tasks. Works with or without a prior brainstorm. Use this skill when the user says 'let's plan', 'break this down', 'write a plan', 'create tasks', or after a brainstorm session when they want to move to implementation. Also use when the user has a clear idea and wants to jump straight to a structured plan."
 ---
 # Writing Plans
@@ -15,22 +15,20 @@ You may only create or edit files under `docs/plans/`. Do not modify source code
 ## Task format
-Each task should produce one committed, testable change:
+Each task should produce one testable change. The executing-tasks skill handles committing — do not include `git commit` in the task body.
+Each task must include:
 - Exact file paths to create/modify
-- Complete code (not "add validation"). For tasks that depend on types or utilities from earlier tasks, reference them explicitly (e.g., `import { User } from Task 2`) and include only the new code
-- Exact commands with expected output
-- `git commit` after each task
-- Optional `checkpoint: test` or `checkpoint: done` label
-- Each task's tests should cover the happy path and at least one edge case or error path
+- **Concrete code** — include the actual implementation, not a summary. Write out SQL schemas, type definitions, function signatures with bodies, route handler code, and test assertions. A developer should be able to copy-paste from the plan and have working code. For tasks that depend on types or utilities from earlier tasks, reference them explicitly (e.g., `import { User } from Task 2`) and include only the new code
+- Exact commands with expected output (e.g., `npx vitest run src/user/model.test.ts` → shows 1 test passing)
+- Each task's tests should cover the happy path and at least one edge case or error path, with concrete assertions
-Each task must use a numbered heading:
+Each task must use a numbered heading with optional metadata comments:
 ```markdown
 ## Task N: <description>
 <!-- tdd: new-feature -->
-<!-- checkpoint: none -->
 ```
 ...where N starts at 1 and incrementally numbers each task in the plan.
@@ -41,6 +39,140 @@ Valid TDD values: `new-feature`, `modifying-tested-code`, `trivial`
 Valid checkpoint values: `none`, `test`, `done`
+### Level of detail
+This is the #1 thing to get right. The plan is not a high-level outline — it's a detailed recipe that the executing-tasks skill will follow step by step. If you write "implement login handler" without showing the code, the executing agent has to guess, and that defeats the purpose of the plan.
+Think of it this way: the plan author (you, now) has the full design context, the domain model, and the architecture in mind. The plan executor (a future agent session) will have none of that context — just the plan file. Write accordingly.
+**What "concrete code" means in practice:**
+- SQL: `CREATE TABLE` statements with all columns, types, and constraints
+- Types/interfaces: full type definitions with fields
+- Functions: signature + body (the logic, not just the name)
+- Tests: concrete assertions (`expect(result.status).toBe(409)`) not descriptions ("test that it returns an error")
+- Routes: the actual handler code with validation, error handling, and response format
+- Config: exact values, not "configure appropriately"
+**Bad** (too vague — the executor must guess):
+```
+3. Implement bookmark model
+```
+**Good** (executor can copy-paste):
+```
+3. Implement `src/db/bookmarks.ts`:
+```ts
+import db from '../db.js';
+export function createBookmarksTable() {
+  db.exec(`
+    CREATE TABLE IF NOT EXISTS bookmarks (
+      id TEXT PRIMARY KEY,
+      userId TEXT NOT NULL,
+      messageId TEXT NOT NULL,
+      createdAt TEXT DEFAULT (datetime('now')),
+      UNIQUE(userId, messageId)
+    )
+  `);
+}
+export function insertBookmark(userId: string, messageId: string) {
+  const id = crypto.randomUUID();
+  db.prepare('INSERT INTO bookmarks (id, userId, messageId) VALUES (?, ?, ?)').run(id, userId, messageId);
+  return { id, userId, messageId };
+}
+```
+```
+### Task body structure
+The examples below show the structure — headings, metadata comments, checkpoints, and step numbering. For the code content within steps, follow the detail level described above.
+**No checkpoint** — numbered steps only:
+```markdown
+## Task 1: Create User model
+<!-- tdd: new-feature -->
+Files:
+- `src/user/model.ts`
+- `src/user/model.test.ts`
+Steps:
+1. Write failing test for User model creation
+2. Run test — confirm it fails
+3. Implement User model
+4. Run test — confirm it passes
+```
+**`checkpoint: test`** — gate after test, before implementing:
+```markdown
+## Task 2: Write auth tests
+<!-- tdd: new-feature -->
+<!-- checkpoint: test -->
+Files:
+- `src/auth/login.test.ts`
+Steps:
+1. Write failing test for login with valid credentials
+2. Run test — confirm it fails
+⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.
+3. Implement login handler
+4. Run test — confirm it passes
+5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
+6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
+```
+**`checkpoint: done`** — gate after all steps including refactor/lessons:
+```markdown
+## Task 3: Add login endpoint
+<!-- tdd: new-feature -->
+<!-- checkpoint: done -->
+Files:
+- `src/auth/login.ts`
+- `src/auth/login.test.ts`
+Steps:
+1. Write failing test for login with valid credentials
+2. Run test — confirm it fails
+3. Implement login handler
+4. Run test — confirm it passes
+5. Add edge case tests (invalid password, missing email)
+6. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
+7. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
+⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
+```
+**Both checkpoints** — gate after test, then gate after refactor/lessons:
+```markdown
+## Task 4: Complex auth flow
+<!-- tdd: new-feature -->
+<!-- checkpoint: test -->
+<!-- checkpoint: done -->
+Steps:
+1. Write failing test for auth flow
+2. Run test — confirm it fails
+⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.
+3. Implement auth flow
+4. Run test — confirm it passes
+5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
+6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
+⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
+```
 ## Vertical slices
@@ -69,19 +201,20 @@ Label each task with its TDD scenario:
 | Scenario | When | Instructions in the task |
 |---|---|---|
-| **New feature** | Adding new behavior | Write failing test → run it → implement → run it → commit |
-| **Modifying tested code** | Changing existing behavior | Run existing tests first → modify → verify they pass → commit |
-| **Trivial** | Config, docs, naming | Use judgment, commit when done |
+| **New feature** | Adding new behavior | Write failing test → run it → implement → run it |
+| **Modifying tested code** | Changing existing behavior | Run existing tests first → modify → verify they pass |
+| **Trivial** | Config, docs, naming | Use judgment |
 ## Checkpoint labels
-Optionally label each task with a `checkpoint` to require human review before proceeding:
+Label each task with a `checkpoint` to require human review before proceeding. The checkpoint gate (`⏸ CHECKPOINT`) goes in the task body — the agent follows the plan step by step and pauses when it reaches the gate.
-| Checkpoint | When to use | What happens during execution |
+| Checkpoint | When to use | What the plan should include |
 |---|---|---|
-| *(none)* | Trivial tasks, well-understood changes | Auto-advance, no pause |
-| **`checkpoint: test`** | Test design matters (API contracts, edge cases, complex behavior) | Pause after writing the failing test, before implementing |
-| **`checkpoint: done`** | Implementation review matters (complex logic, security, performance) | Pause after implementation + tests pass, before committing |
+| *(none)* | Trivial tasks, well-understood changes | Numbered steps only |
+| **`checkpoint: test`** | Test design matters (API contracts, edge cases, complex behavior) | Steps up to test → `⏸ CHECKPOINT: test` → implement steps (including refactor/lessons) |
+| **`checkpoint: done`** | Implementation review matters (complex logic, security, performance) | Steps (including refactor/lessons) → `⏸ CHECKPOINT: done` |
+| Both | Non-obvious tests AND complex logic | Steps up to test → `⏸ CHECKPOINT: test` → implement steps (including refactor/lessons) → `⏸ CHECKPOINT: done` |
 Use judgment when assigning checkpoints. Prefer `checkpoint: test` for new features with non-obvious test design. Prefer `checkpoint: done` for tasks where the implementation approach is debatable. Most tasks should not need a checkpoint. The user can adjust checkpoints when reviewing the plan.