@tianhai/pi-workflow-kit 0.8.4 → 0.10.0

Files changed (15)

package/README.md +3 -1
package/docs/developer-usage-guide.md +10 -0
package/docs/plans/completed/2026-04-28-executing-tasks-redesign-design.md +171 -0
package/docs/plans/completed/2026-04-28-executing-tasks-redesign-implementation.md +208 -0
package/docs/plans/completed/2026-04-28-executing-tasks-redesign-progress.md +14 -0
package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-design.md +154 -0
package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-implementation.md +315 -0
package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-progress.md +15 -0
package/docs/workflow-phases.md +13 -1
package/package.json +1 -1
package/skills/brainstorming/SKILL.md +17 -1
package/skills/diagnose/SKILL.md +56 -0
package/skills/executing-tasks/SKILL.md +160 -16
package/skills/finalizing/SKILL.md +26 -2
package/skills/writing-plans/SKILL.md +41 -0

package/README.md CHANGED

@@ -44,6 +44,7 @@ During brainstorm and plan, the extension blocks `write`/`edit` outside `docs/pl
 | `writing-plans` | ~35 | Break design into tasks with TDD scenarios, set up branch/worktree |
 | `executing-tasks` | ~50 | Implement tasks with TDD discipline, checkpoint review gates, handle code review |
 | `finalizing` | ~20 | Archive docs, update changelog, create PR, clean up |
+| `diagnose` | ~35 | 6-phase debugging loop: build feedback loop, reproduce, hypothesise, instrument, fix, cleanup |
 ### TDD Three-Scenario Model
@@ -75,7 +76,8 @@ pi-workflow-kit/
 │   ├── brainstorming/SKILL.md
 │   ├── writing-plans/SKILL.md
 │   ├── executing-tasks/SKILL.md
-│   └── finalizing/SKILL.md
+│   ├── finalizing/SKILL.md
+│   └── diagnose/SKILL.md
 ├── tests/
 │   └── workflow-guard.test.ts
 ├── package.json

package/docs/developer-usage-guide.md CHANGED

@@ -47,6 +47,8 @@ Explore the idea through collaborative dialogue. The agent reads code, asks ques
 Outcome: `docs/plans/YYYY-MM-DD-<topic>-design.md`
+Optionally writes ADRs to `docs/plans/adr/` for significant architectural decisions.
 ### 2. Plan
 ```
@@ -73,6 +75,14 @@ Implement the plan task-by-task. Each task: implement → run tests → fix if n
 Archive plan docs, update CHANGELOG/README, create PR, clean up worktree.
+### 5. Diagnose (on demand)
+```
+/skill:diagnose
+```
+A 6-phase debugging loop you invoke when something is broken. Build a feedback loop first, then reproduce, hypothesise, instrument, fix, and cleanup. Not a pipeline phase — use whenever needed.
 ## What the extension does
 The `workflow-guard` extension watches `write` and `edit` tool calls:

package/docs/plans/completed/2026-04-28-executing-tasks-redesign-design.md ADDED

@@ -0,0 +1,171 @@
+# Design: Executing Tasks Redesign
+**Date:** 2026-04-28
+**Status:** Approved
+## Problem
+The current `executing-tasks` skill has three issues:
+1. **No progress tracking** — tasks are iterated in-memory with no file-based state. If the session crashes or the user starts a new session, all progress is lost.
+2. **High token consumption** — the entire plan, all implementation work, and accumulated tool outputs stay in a single session. Even with auto-compaction, the LLM re-reads the full plan repeatedly.
+3. **No context separation** — one monolithic thread handles everything. Early tasks' tool outputs bleed into later tasks' context.
+## Solution Overview
+Introduce a **progress file** as the single source of truth for task state, and design the skill to work naturally across **multiple sessions** with fresh context.
+### Core Principles
+- The progress file is the state — not the session, not git history
+- Each task is an isolated unit of work — the agent reads only what it needs
+- The agent suggests `/new` (fresh session) at natural break points
+- Resume is trivial — re-invoke the skill, it reads the progress file and picks up
+## Progress File
+**Path:** `docs/plans/YYYY-MM-DD-<topic>-progress.md`
+Created by `executing-tasks` on first run by parsing the implementation plan.
+**Format:**
+```markdown
+# Progress: auth
+Plan: docs/plans/2026-04-28-auth-implementation.md
+Branch: auth
+Started: 2026-04-28T10:00:00Z
+Last updated: 2026-04-28T10:45:00Z
+| # | Status | Task | Commit |
+|---|--------|------|--------|
+| 1 | ✅ done | Create User model | a1b2c3d |
+| 2 | ✅ done | Write User model tests | e4f5g6h |
+| 3 | 🔄 in-progress | Add login endpoint | — |
+| 4 | ⬜ pending | Write login tests | — |
+| 5 | ⏭ skipped | checkpoint: test — Add auth middleware | — |
+```
+**Status values:**
+| Status | Meaning |
+|--------|---------|
+| `⬜ pending` | Not started |
+| `🔄 in-progress` | Currently being worked on |
+| `✅ done` | Committed successfully |
+| `❌ failed` | Could not complete (with reason appended) |
+| `⏭ skipped` | User chose to skip |
+**Rules:**
+- Mark `🔄 in-progress` immediately when starting a task
+- Mark `✅ done` + record commit hash only after successful `git commit`
+- Mark `❌ failed` + append `Failed: <reason>` when the agent can't proceed after retrying
+- Mark `⏭ skipped` when the user says "skip"
+- Update `Last updated` timestamp on every change
+- Preserve checkpoint labels from the plan in the task description
+## Implementation Plan Format
+No file splitting. Keep one `implementation.md` but enforce a strict heading format:
+```markdown
+## Task 1: Create User model
+<!-- tdd: new-feature -->
+<!-- checkpoint: none -->
+- Create `src/models/user.ts`...
+```
+The agent reads the progress file to find the current task number, then reads only that task's section from the implementation plan (via grep/jump to heading).
+## Session Lifecycle
+### First Run
+1. Read progress file → doesn't exist
+2. Parse implementation.md, create progress file with all tasks as `⬜ pending`
+3. Ensure on correct branch / worktree (same as current skill)
+4. Read task 1 section, begin work
+### Continuing in Same Session
+After completing a non-checkpoint task:
+1. Update progress file: current task → `✅ done`
+2. Peek at next task:
+   - **Has checkpoint** → pause for review (stay in session)
+   - **No checkpoint** → continue working on next task
+3. After ~3-5 non-checkpoint tasks, suggest `/new`:
+```
+✅ Tasks 3-5 done (commits: a1b2, e4f5, i7j8)
+Progress: 5/10 tasks done
+⏭  Next: Task 6 — Add auth middleware (no checkpoint)
+💡 Context is building up. For clean context on remaining tasks:
+   /new  then  /skill:executing-tasks
+   (or just say "continue" to keep going here)
+```
+### Resuming in a New Session
+1. Read progress file → find first `⬜ pending` or `❌ failed` task
+2. Read that task's section from implementation.md
+3. Continue work — no re-reading of earlier tasks
+### Checkpoint Review
+Same as current skill — show what was done, show the diff, wait for user approval:
+```
+⏸ Paused at checkpoint: test for task 4
+**What was done:** [brief summary]
+**Diff:** [show relevant diff]
+Review and let me know how to proceed.
+```
+## Resume & Failure Recovery
+| Scenario | What the agent sees | What it does |
+|----------|-------------------|--------------|
+| **Clean resume** | Next task is `⬜ pending` | Read task section, start working |
+| **Mid-task crash** | A task is `🔄 in-progress` | Check git log since last done task. If commits exist → ask user to verify. If no commits → restart the task |
+| **Failed task** | A task is `❌ failed` | Show failure reason, ask: retry, skip, or abort? |
+| **All done** | No `⬜ pending` or `❌ failed` | Show summary, suggest `/skill:finalizing` |
+| **No progress file** | File doesn't exist | Parse implementation.md, create progress file, start from task 1 |
+| **Skipped tasks remain** | `⏭ skipped` tasks exist | Noted in finalizing, no action during execution |
+## User Override Commands
+Available at any time during execution:
+| User says | Agent does |
+|-----------|-----------|
+| `skip` | Mark current task `⏭ skipped`, move to next |
+| `status` | Show the progress table |
+| `stop` | Mark current task back to `⬜ pending`, suggest `/new` |
+| `retry` | Re-read current task section, start over |
+## Changes to Other Skills
+### writing-plans (minor)
+- Enforce `## Task N: <description>` heading format
+- Optional metadata comments: `<!-- tdd: ... -->` and `<!-- checkpoint: ... -->`
+- Everything else stays the same
+### finalizing (minor)
+- Warn on skipped tasks before archiving: "Tasks 4 and 7 were skipped. Continue with finalizing, or go back?"
+- Archive the progress file to `docs/plans/completed/`
+- Use progress file for PR/commit summaries instead of re-reading the full plan
+### brainstorming
+- No changes
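
The progress-file format and the resume table in the design doc above can be read as a small state lookup. The following is a minimal sketch of that reading, not code from the package: every name in it (`TaskRow`, `parseProgress`, `nextAction`, the `kind` labels) is hypothetical, and it assumes task descriptions never contain a `|` character. It takes the first task that is neither done nor skipped, which follows the wording of Task 1 in the implementation plan.

```typescript
// Illustrative sketch only: these names are hypothetical, not pi-workflow-kit APIs.
type TaskStatus = "pending" | "in-progress" | "done" | "failed" | "skipped";

interface TaskRow {
  num: number;
  status: TaskStatus;
  task: string;
  commit: string | null;
}

type ResumeAction =
  | { kind: "start"; task: TaskRow } // clean resume
  | { kind: "verify-or-restart"; task: TaskRow } // mid-task crash
  | { kind: "ask-retry-skip-abort"; task: TaskRow } // failed task
  | { kind: "all-done" };

const STATUSES: TaskStatus[] = ["pending", "in-progress", "done", "failed", "skipped"];
const EMPTY_COMMIT = "\u2014"; // the dash the table uses for "no commit yet"

// Parse table rows; header and separator rows are skipped because
// their first cell is not a number.
function parseProgress(markdown: string): TaskRow[] {
  const rows: TaskRow[] = [];
  for (const line of markdown.split("\n")) {
    const cells = line.trim().split("|").slice(1, -1).map((c) => c.trim());
    if (cells.length !== 4 || !/^\d+$/.test(cells[0])) continue;
    const status = STATUSES.find((s) => cells[1].includes(s));
    if (status === undefined) continue;
    rows.push({
      num: Number(cells[0]),
      status,
      task: cells[2],
      commit: cells[3] === EMPTY_COMMIT ? null : cells[3],
    });
  }
  return rows;
}

// Pick the first task that is neither done nor skipped and map its
// status to the behaviour listed in the "Resume & Failure Recovery" table.
function nextAction(rows: TaskRow[]): ResumeAction {
  const task = rows.find((r) => r.status !== "done" && r.status !== "skipped");
  if (task === undefined) return { kind: "all-done" };
  if (task.status === "in-progress") return { kind: "verify-or-restart", task };
  if (task.status === "failed") return { kind: "ask-retry-skip-abort", task };
  return { kind: "start", task };
}

const sample = [
  "| # | Status | Task | Commit |",
  "|---|--------|------|--------|",
  "| 1 | ✅ done | Create User model | a1b2c3d |",
  "| 2 | ✅ done | Write User model tests | e4f5g6h |",
  `| 3 | 🔄 in-progress | Add login endpoint | ${EMPTY_COMMIT} |`,
  `| 4 | ⬜ pending | Write login tests | ${EMPTY_COMMIT} |`,
  `| 5 | ⏭ skipped | checkpoint: test | ${EMPTY_COMMIT} |`,
].join("\n");

console.log(nextAction(parseProgress(sample)).kind); // "verify-or-restart"
```

Returning a tagged action rather than acting directly keeps the decision separate from the git checks and user prompts that the design leaves to the agent.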

package/docs/plans/completed/2026-04-28-executing-tasks-redesign-implementation.md ADDED

@@ -0,0 +1,208 @@
+# Implementation Plan: Executing Tasks Redesign
+**Design:** `docs/plans/2026-04-28-executing-tasks-redesign-design.md`
+## Task 1: Rewrite executing-tasks skill — progress file and startup flow
+<!-- tdd: modifying-tested-code -->
+<!-- checkpoint: done -->
+Rewrite `skills/executing-tasks/SKILL.md` with the new startup and resume logic.
+**File to modify:** `/Users/yinlootan/.nvm/versions/node/v22.16.0/lib/node_modules/@tianhai/pi-workflow-kit/skills/executing-tasks/SKILL.md`
+Replace the entire file with the new skill content. The new skill has these sections:
+1. **Startup flow** — check git state, find the implementation plan (glob `docs/plans/*-implementation.md`), check for existing progress file (`docs/plans/*-progress.md`)
+2. **First run** — parse the implementation plan for `## Task N:` headings, create a progress file with all tasks as `⬜ pending`, then proceed to workspace isolation and task execution
+3. **Resume** — read the progress file, find the first `⬜ pending`, `❌ failed`, or `🔄 in-progress` task, and continue from there
+4. **Mid-task crash recovery** — if a task is `🔄 in-progress`, check `git log` since the last `✅ done` task's commit. If commits exist, ask the user to verify. If no commits, restart the task
+5. **Workspace isolation** — keep the existing branch/worktree suggestion logic (unchanged from current skill)
+6. **Commit plan docs** — keep the existing logic to commit uncommitted plan files on the new branch
+The frontmatter stays:
+```
+---
+name: executing-tasks
+description: "Use this to implement an approved plan task-by-task. Run after writing-plans, before finalizing."
+---
+```
+The progress file format section should include the full table structure and all 5 status values (`⬜ pending`, `🔄 in-progress`, `✅ done`, `❌ failed`, `⏭ skipped`).
+After editing, verify by reading the file back. No tests needed — this is a markdown skill file.
+```
+git add skills/executing-tasks/SKILL.md
+git commit -m "rewrite(executing-tasks): progress file, startup flow, and resume logic"
+```
+## Task 2: Add per-task execution, batching, and session management to executing-tasks
+<!-- tdd: modifying-tested-code -->
+<!-- checkpoint: done -->
+Continue building on the rewritten skill file. Add the per-task execution sections.
+**File to modify:** `/Users/yinlootan/.nvm/versions/node/v22.16.0/lib/node_modules/@tianhai/pi-workflow-kit/skills/executing-tasks/SKILL.md`
+Add these sections after the startup flow (append or integrate into the existing file from task 1):
+### Per-task execution
+For each task the agent works on:
+1. Mark task `🔄 in-progress` in the progress file
+2. Read only the relevant `## Task N:` section from the implementation plan (not the whole file)
+3. Implement following the existing TDD discipline and checkpoint logic (keep the current `checkpoint: test` and `checkpoint: done` flows verbatim)
+4. After commit: update progress file with `✅ done` + commit hash
+5. Check the next task:
+   - **Has checkpoint** → pause for review
+   - **No checkpoint** → continue to the next task in the same session
+### Batching and /new suggestions
+After completing ~3-5 non-checkpoint tasks in the same session, the agent should suggest a fresh session with this output format:
+```
+✅ Tasks 3-5 done (commits: a1b2, e4f5, i7j8)
+Progress: 5/10 tasks done
+⏭  Next: Task 6 — Add auth middleware (no checkpoint)
+💡 Context is building up. For clean context on remaining tasks:
+   /new  then  /skill:executing-tasks
+   (or just say "continue" to keep going here)
+```
+The user can say "continue" to keep going in the same session.
+### User override commands
+Add a section for commands the user can issue at any time:
+| User says | Agent does |
+|-----------|-----------|
+| `skip` | Mark current task `⏭ skipped`, move to next |
+| `status` | Show the progress table |
+| `stop` | Mark current task back to `⬜ pending`, suggest `/new` |
+| `retry` | Re-read current task section, start over |
+### After all tasks
+When no `⬜ pending` or `❌ failed` tasks remain, show a summary and suggest `/skill:finalizing`.
+Keep the existing "Receiving code review" and "If you're stuck" sections from the current skill — they're still useful.
+After editing, verify by reading the file back.
+```
+git add skills/executing-tasks/SKILL.md
+git commit -m "feat(executing-tasks): add per-task batching, session management, and user commands"
+```
+## Task 3: Update writing-plans skill — enforce task heading format
+<!-- tdd: modifying-tested-code -->
+Minor update to `writing-plans` to enforce the `## Task N:` heading format and metadata comments.
+**File to modify:** `/Users/yinlootan/.nvm/versions/node/v22.16.0/lib/node_modules/@tianhai/pi-workflow-kit/skills/writing-plans/SKILL.md`
+In the **Task format** section, add:
+> Each task must use a numbered heading: `## Task N: <description>` where N starts at 1.
+>
+> Optionally include metadata comments on the line after the heading:
+> ```
+> ## Task 1: Create User model
+>
+> <!-- tdd: new-feature -->
+> <!-- checkpoint: none -->
+> ```
+>
+> Valid TDD values: `new-feature`, `modifying-tested-code`, `trivial`
+>
+> Valid checkpoint values: `none`, `test`, `done`
+>
+> These comments are optional — if omitted, the agent infers TDD scenario and checkpoint from context.
+Also update the checkpoint labels table to reference the `<!-- checkpoint: ... -->` comment format as the canonical way to specify checkpoints (while still supporting the inline label format as fallback).
+After editing, verify by reading the file back.
+```
+git add skills/writing-plans/SKILL.md
+git commit -m "docs(writing-plans): enforce Task N heading format with metadata comments"
+```
+## Task 4: Update finalizing skill — archive progress file and warn on skipped tasks
+<!-- tdd: modifying-tested-code -->
+Minor update to `finalizing` to handle the progress file.
+**File to modify:** `/Users/yinlootan/.nvm/versions/node/v22.16.0/lib/node_modules/@tianhai/pi-workflow-kit/skills/finalizing/SKILL.md`
+### Change 1: Archive progress file
+In step 1 ("Move planning docs"), add the progress file to the archive command:
+```
+mv docs/plans/*-progress.md docs/plans/completed/
+```
+### Change 2: Warn on skipped tasks
+Before step 1, add a new pre-check:
+> **Check for skipped tasks** — if a progress file exists (`docs/plans/*-progress.md`), read it and check for any `⏭ skipped` tasks. If found, warn:
+>
+> ```
+> ⚠️ Tasks 4 and 7 were skipped. Continue with finalizing, or go back?
+> ```
+>
+> Wait for the user to confirm before proceeding.
+### Change 3: Use progress file for summaries
+In step 3 ("Choose a merge strategy"), when generating PR descriptions or squash commit messages, read the progress file to build a task-by-task summary:
+> Use the progress file to generate the summary. Convert the task table to a bulleted list:
+> ```
+> - ✅ Create User model
+> - ✅ Write User model tests
+> - ⏭ Add auth middleware (skipped)
+> - ✅ Add login endpoint
+> ```
+After editing, verify by reading the file back.
+```
+git add skills/finalizing/SKILL.md
+git commit -m "feat(finalizing): archive progress file, warn on skipped tasks"
+```
+## Task 5: End-to-end review — read all four skill files and verify consistency
+<!-- tdd: trivial -->
+<!-- checkpoint: done -->
+Read all four skill files and verify they form a coherent workflow:
+1. `skills/writing-plans/SKILL.md` — produces `*implementation.md` with `## Task N:` headings
+2. `skills/executing-tasks/SKILL.md` — reads the plan, creates/maintains `*progress.md`, works across sessions
+3. `skills/finalizing/SKILL.md` — archives `*progress.md`, warns on skipped tasks
+Check for:
+- [ ] Terminology is consistent across all three skills (status names, file paths, checkpoint labels)
+- [ ] `executing-tasks` correctly describes how to parse the `## Task N:` format that `writing-plans` enforces
+- [ ] `finalizing` correctly references the progress file path that `executing-tasks` creates
+- [ ] No orphaned references to old behavior (e.g., no references to in-memory task tracking)
+- [ ] The user override commands in `executing-tasks` are complete and non-contradictory
+Fix any inconsistencies found. This is a checkpoint: done task — present the review findings and wait for approval before committing.
+```
+git add skills/
+git commit -m "chore: consistency review across workflow skills"
+```
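
The plan above asks the agent to read only the relevant `## Task N:` section and to honour the optional `tdd` and `checkpoint` comments. As a rough illustration of what that lookup involves, here is a sketch with hypothetical names (`TaskSection`, `readTaskSection`); the package itself is markdown instructions and contains no such function. It assumes task headings start at column 0, so a heading quoted inside a blockquote is ignored, but one inside a fenced block would be mistaken for a real task.

```typescript
// Illustrative sketch only: TaskSection and readTaskSection are hypothetical names.
interface TaskSection {
  num: number;
  title: string;
  tdd: string | null; // e.g. "new-feature", or null when the comment is omitted
  checkpoint: string | null; // e.g. "none", "test", "done", or null when omitted
  body: string;
}

const TASK_HEADING = /^## Task (\d+): (.+)$/;

function readTaskSection(plan: string, n: number): TaskSection | null {
  const lines = plan.split("\n");

  // Locate the heading for task n.
  let start = -1;
  let title = "";
  for (let i = 0; i < lines.length; i++) {
    const m = TASK_HEADING.exec(lines[i]);
    if (m !== null && Number(m[1]) === n) {
      start = i;
      title = m[2];
      break;
    }
  }
  if (start === -1) return null;

  // The section ends at the next task heading or at end of file.
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (TASK_HEADING.test(lines[i])) {
      end = i;
      break;
    }
  }
  const body = lines.slice(start + 1, end).join("\n").trim();

  // Read an optional metadata comment such as <!-- checkpoint: done -->.
  const meta = (key: string): string | null => {
    const m = new RegExp("<!--\\s*" + key + ":\\s*([\\w-]+)\\s*-->").exec(body);
    return m !== null ? m[1] : null;
  };

  return { num: n, title, tdd: meta("tdd"), checkpoint: meta("checkpoint"), body };
}

const planText = [
  "# Implementation Plan: auth",
  "## Task 1: Create User model",
  "<!-- tdd: new-feature -->",
  "<!-- checkpoint: none -->",
  "- Create `src/models/user.ts`",
  "## Task 2: Add login endpoint",
  "<!-- checkpoint: test -->",
  "- Add the route handler",
].join("\n");

const second = readTaskSection(planText, 2);
console.log(second?.title, second?.checkpoint);
```

A `null` for `tdd` or `checkpoint` corresponds to the case where the comments are omitted and the agent infers the value from context.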

package/docs/plans/completed/2026-04-28-executing-tasks-redesign-progress.md ADDED

@@ -0,0 +1,14 @@
+# Progress: executing-tasks-redesign
+Plan: docs/plans/2026-04-28-executing-tasks-redesign-implementation.md
+Branch: executing-tasks-redesign
+Started: 2026-04-28T12:00:00Z
+Last updated: 2026-04-28T12:04:00Z
+| # | Status | Task | Commit |
+|---|--------|------|--------|
+| 1 | ✅ done | Rewrite executing-tasks skill — progress file and startup flow | a1b2c3d |
+| 2 | ✅ done | Add per-task execution, batching, and session management to executing-tasks | d4e5f6a |
+| 3 | ✅ done | Update writing-plans skill — enforce task heading format | b7c8d9e |
+| 4 | ✅ done | Update finalizing skill — archive progress file and warn on skipped tasks | f0a1b2c |
+| 5 | ✅ done | checkpoint: done — End-to-end review — read all four skill files and verify consistency | b0c1d2e |
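
A progress table like the one above is what the finalizing changes propose turning into a bulleted PR or squash-commit summary. The sketch below shows one way that conversion might look. `summarizeProgress` is a hypothetical name, and the output shape is taken from the example in Task 4 of the implementation plan; it assumes each status cell is an icon followed by a space and the status word.

```typescript
// Illustrative sketch only: summarizeProgress is a hypothetical helper.
function summarizeProgress(progress: string): string {
  const bullets: string[] = [];
  for (const line of progress.split("\n")) {
    const cells = line.trim().split("|").slice(1, -1).map((c) => c.trim());
    // Skip prose lines, the header row, and the separator row.
    if (cells.length !== 4 || !/^\d+$/.test(cells[0])) continue;
    const [, status, task] = cells;
    const icon = status.split(" ")[0]; // leading status icon, e.g. the check mark
    const suffix = status.includes("skipped") ? " (skipped)" : "";
    bullets.push(`- ${icon} ${task}${suffix}`);
  }
  return bullets.join("\n");
}

const progressText = [
  "# Progress: auth",
  "| # | Status | Task | Commit |",
  "|---|--------|------|--------|",
  "| 1 | ✅ done | Create User model | a1b2c3d |",
  "| 2 | ✅ done | Write User model tests | d4e5f6a |",
  "| 3 | ⏭ skipped | Add auth middleware | \u2014 |",
  "| 4 | ✅ done | Add login endpoint | b7c8d9e |",
].join("\n");

console.log(summarizeProgress(progressText));
```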

package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-design.md ADDED

@@ -0,0 +1,154 @@
+# Incorporate mattpocock/skills Ideas into pi-workflow-kit
+## Source
+Incorporates engineering best practices from [mattpocock/skills](https://github.com/mattpocock/skills) into pi-workflow-kit's existing workflow. The ideas are adapted to fit pi-workflow-kit's tight plan→execute pipeline and artifact lifecycle philosophy.
+## Design principles
+- **No new forever-documents.** No CONTEXT.md, no persistent glossary. Every artifact has a clear birth and death within the brainstorm→finalize lifecycle.
+- **No new external dependencies.** No issue tracker integration, no sub-agent infrastructure.
+- **Small, precise edits.** Each change is a few lines in the right place in the existing skill files.
+## Changes
+### 1. "Design it twice" in brainstorming
+**File:** `skills/brainstorming/SKILL.md`
+**Current behavior (step 3):**
+> Explore approaches — propose 2-3 approaches with trade-offs. Lead with your recommendation.
+**New behavior:** Each approach includes a concrete interface sketch — types, method signatures, and example caller code — so the comparison is grounded in actual code, not abstract descriptions.
+**Rationale:** Without a concrete interface sketch, the agent can describe two approaches that sound different but collapse to the same implementation. Showing the actual caller code makes the trade-offs visible and forces the agent to think about the interface, not just the architecture.
+**Reference:** Matt's `grill-with-docs` and `improve-codebase-architecture` skills both require concrete interface sketches before any discussion proceeds.
+---
+### 2. ADRs in brainstorming
+**File:** `skills/brainstorming/SKILL.md`
+**Current behavior:** The brainstorming skill produces a design doc. No mechanism for recording *why* decisions were made.
+**New behavior:** During the design presentation (step 4), when a significant architectural decision is identified, the agent offers to write a lightweight ADR to `docs/plans/adr/`. ADRs are short (one paragraph), and only written when all three conditions are met:
+1. **Hard to reverse** — changing your mind later has meaningful cost
+2. **Surprising without context** — a future reader will wonder "why?"
+3. **A real trade-off** — there were genuine alternatives
+ADR format:
+```markdown
+# <Short title of the decision>
+<1-3 sentences: context, decision, and why.>
+```
+**Lifecycle:** ADRs live under `docs/plans/adr/` (writable during brainstorm phase). They get archived to `docs/plans/completed/adr/` during finalizing alongside the design doc. No forever-document.
+**Rationale:** The design doc captures *what* was decided. ADRs capture *why*. This matters when someone (human or agent) encounters the code later and wonders about a surprising implementation choice. The strict gating (all 3 conditions) prevents ADR bloat.
+**Reference:** Matt's ADR format (single paragraph, optional sections only when genuinely useful). The key adaptation: ADRs live inside `docs/plans/` so they're subject to the same lifecycle as design docs, rather than being a permanent `docs/adr/` directory.
+---
+### 3. Vertical slices in planning
+**File:** `skills/writing-plans/SKILL.md`
+**Current behavior:** Tasks are 2-5 minutes of work with exact file paths and code. No guidance on task *structure* — horizontal slices (all DB tasks, then all API tasks) are as valid as vertical slices.
+**New behavior:** Each task should be a **vertical slice** — a thin path through ALL relevant layers end-to-end, delivering one complete piece of observable behavior. The plan should explicitly call out horizontal slicing as an anti-pattern.
+```
+WRONG (horizontal):
+  Task 1: Create database schema for users
+  Task 2: Write user API endpoints
+  Task 3: Build user UI components
+  Task 4: Wire everything together
+RIGHT (vertical):
+  Task 1: User can sign up (model + endpoint + validation + test)
+  Task 2: User can log in (auth check + token + test)
+  Task 3: User can view profile (query + endpoint + test)
+```
+**Rationale:** Horizontal slicing produces plans where tasks don't compile or run in isolation — you can't test the schema without the API, you can't test the API without the schema. Vertical slices mean every committed task leaves the codebase in a testable state. This also reduces the blast radius of a bad task — rolling back one vertical slice doesn't break unrelated layers.
+**Reference:** Matt's TDD skill ("Anti-Pattern: Horizontal Slices") and `to-issues` skill ("vertical slice rules"). The key adaptation: in pi-workflow-kit, vertical slices are guidance in the planning skill, not a separate skill with issue tracker integration.
+---
+### 4. Deep modules in TDD refactoring
+**File:** `skills/executing-tasks/SKILL.md`
+**Current behavior:** TDD discipline is: write test first → see it fail → implement → see it pass. No refactoring guidance after tests pass.
+**New behavior:** After all tests pass for a task, add a refactoring check:
+- Look for shallow modules (interface nearly as complex as implementation) — can complexity be hidden behind a simpler interface?
+- Apply the deletion test: if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
+- Extract duplication
+- Ensure one adapter = hypothetical seam, two adapters = real seam (don't introduce abstraction unless something actually varies across it)
+Key vocabulary to use: **depth** (lots of behavior behind a small interface), **seam** (where behavior can be altered without editing in place), **locality** (change concentrated in one place).
+**Rationale:** Without a refactoring pass, the agent treats "tests pass" as "done." Over many tasks, this accumulates shallow modules — thin wrappers that add indirection without hiding complexity. A lightweight refactoring checklist prevents this accumulation.
+**Reference:** Matt's TDD skill (refactoring checklist) and `improve-codebase-architecture` skill (depth, seam, locality vocabulary). The key adaptation: a brief checklist in the existing TDD section, not a full architecture review skill.
+---
+### 5. Diagnose skill (new standalone skill)
+**File:** `skills/diagnose/SKILL.md` (new)
+**What it is:** A 6-phase debugging discipline the user invokes when a test fails, a bug is found, or something is broken during execution. It sits outside the brainstorm→finalize pipeline — a utility skill used on demand.
+**Phases:**
+1. **Build a feedback loop** — the core insight. Before doing anything else, create a fast, deterministic, agent-runnable pass/fail signal for the bug (failing test, curl script, CLI invocation, etc.). "Build the right feedback loop, and the bug is 90% fixed."
+2. **Reproduce** — run the loop, confirm it matches the user's reported symptom
+3. **Hypothesise** — generate 3-5 ranked falsifiable hypotheses. Show the list to the user before testing — they often have domain knowledge that re-ranks instantly
+4. **Instrument** — add targeted debug logs with unique tags (e.g. `[DEBUG-a4f2]`) for easy cleanup. One variable at a time
+5. **Fix + regression test** — write the regression test before the fix, if a correct test seam exists
+6. **Cleanup** — remove all debug logs, verify original repro no longer triggers, document what would have prevented the bug
+**Design decisions:**
+- **No extension changes needed.** Diagnose is a standalone skill invoked explicitly by the user. It doesn't need workflow-guard integration — if the user invokes it during execution, the workflow-guard already allows all tools.
+- **Not a pipeline phase.** It's a utility skill like a wrench — you pick it up when needed. The brainstorm→plan→execute→finalize pipeline remains unchanged.
+- **Phase 1 is the skill.** The other phases are mechanical. The skill should emphasize that spending disproportionate effort on building the feedback loop is the correct strategy.
+**Rationale:** Pi-workflow-kit currently has no debugging flow. When a test fails during execution, the agent is unguided — it might stare at code, add random logs, or try shotgun debugging. A disciplined loop prevents wasted time and ensures bugs are properly locked down with regression tests.
+**Reference:** Matt's `diagnose` skill. The key adaptation: simplified to fit pi-workflow-kit's concise skill style (~30-40 lines instead of ~120 lines with supporting files). No CONTEXT.md dependency — the agent uses the codebase's own terminology.
+---
+## What's NOT being incorporated
+| Idea | Why not |
+|------|---------|
+| CONTEXT.md | Accumulates without bound, rots over time, doesn't scale to monorepos |
+| Triage / to-issues / to-prd | Tightly coupled to GitHub/GitLab issue trackers, outside pi-workflow-kit's scope |
+| setup skill | Scaffolding for issue tracker config — not relevant without issue tracker integration |
+| caveman / grill-me | Fun but orthogonal to workflow |
+| Full improve-codebase-architecture | Too heavy (~100+ lines, multiple reference files, depends on CONTEXT.md) |
+| Parallel sub-agents | Not available in pi currently |
+## Files changed
+| File | Change |
+|------|--------|
+| `skills/brainstorming/SKILL.md` | Add "design it twice" interface sketches, add ADR output |
+| `skills/writing-plans/SKILL.md` | Add vertical slice guidance with anti-pattern example |
+| `skills/executing-tasks/SKILL.md` | Add refactoring checklist with deep modules vocabulary |
+| `skills/diagnose/SKILL.md` | New skill file (6-phase debugging loop) |
+| `docs/developer-usage-guide.md` | Mention diagnose skill and ADR output |
+| `docs/workflow-phases.md` | Mention diagnose as a utility skill |
+| `README.md` | Add diagnose to skills table |
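
Phases 4 and 6 of the diagnose skill described above rely on debug logs carrying a unique tag such as `[DEBUG-a4f2]` so they can be found and removed later. The following sketch shows how such a tag could be generated and how leftovers could be detected during cleanup. Both function names are hypothetical, and the four-hex-digit suffix is only inferred from the example tag in the design doc.

```typescript
// Illustrative sketch only: makeDebugTag and findLeftoverTags are hypothetical helpers.

// Build a tag like "[DEBUG-a4f2]" with four random hex digits.
function makeDebugTag(): string {
  const hex = (0x10000 + Math.floor(Math.random() * 0x10000)).toString(16).slice(1);
  return `[DEBUG-${hex}]`;
}

// Return the 1-based line numbers in `source` that still contain `tag`.
function findLeftoverTags(source: string, tag: string): number[] {
  const hits: number[] = [];
  source.split("\n").forEach((line, i) => {
    if (line.includes(tag)) hits.push(i + 1);
  });
  return hits;
}

const tag = makeDebugTag();
const instrumented = [
  "function login(user: string) {",
  `  console.log("${tag} login called with", user);`,
  "  return user.length > 0;",
  "}",
].join("\n");

console.log(tag, findLeftoverTags(instrumented, tag));
```

An empty result from `findLeftoverTags` across the changed files would be one cheap signal that the cleanup phase removed every instrumented log.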