@tianhai/pi-workflow-kit 0.9.0 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/README.md +3 -1
package/docs/developer-usage-guide.md +10 -0
package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-design.md +154 -0
package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-implementation.md +315 -0
package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-progress.md +15 -0
package/docs/workflow-phases.md +13 -1
package/package.json +1 -1
package/skills/brainstorming/SKILL.md +17 -1
package/skills/diagnose/SKILL.md +56 -0
package/skills/executing-tasks/SKILL.md +13 -0
package/skills/finalizing/SKILL.md +4 -1
package/skills/writing-plans/SKILL.md +19 -0

package/README.md CHANGED

@@ -44,6 +44,7 @@ During brainstorm and plan, the extension blocks `write`/`edit` outside `docs/pl
 | `writing-plans` | ~35 | Break design into tasks with TDD scenarios, set up branch/worktree |
 | `executing-tasks` | ~50 | Implement tasks with TDD discipline, checkpoint review gates, handle code review |
 | `finalizing` | ~20 | Archive docs, update changelog, create PR, clean up |
+| `diagnose` | ~35 | 6-phase debugging loop: build feedback loop, reproduce, hypothesise, instrument, fix, cleanup |
 ### TDD Three-Scenario Model
@@ -75,7 +76,8 @@ pi-workflow-kit/
 │   ├── brainstorming/SKILL.md
 │   ├── writing-plans/SKILL.md
 │   ├── executing-tasks/SKILL.md
-│   └── finalizing/SKILL.md
+│   ├── finalizing/SKILL.md
+│   └── diagnose/SKILL.md
 ├── tests/
 │   └── workflow-guard.test.ts
 ├── package.json

package/docs/developer-usage-guide.md CHANGED

@@ -47,6 +47,8 @@ Explore the idea through collaborative dialogue. The agent reads code, asks ques
 Outcome: `docs/plans/YYYY-MM-DD-<topic>-design.md`
+Optionally writes ADRs to `docs/plans/adr/` for significant architectural decisions.
 ### 2. Plan
 ```
@@ -73,6 +75,14 @@ Implement the plan task-by-task. Each task: implement → run tests → fix if n
 Archive plan docs, update CHANGELOG/README, create PR, clean up worktree.
+### 5. Diagnose (on demand)
+```
+/skill:diagnose
+```
+A 6-phase debugging loop you invoke when something is broken. Build a feedback loop first, then reproduce, hypothesise, instrument, fix, and cleanup. Not a pipeline phase — use whenever needed.
 ## What the extension does
 The `workflow-guard` extension watches `write` and `edit` tool calls:

package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-design.md ADDED

@@ -0,0 +1,154 @@
+# Incorporate mattpocock/skills Ideas into pi-workflow-kit
+## Source
+Incorporates engineering best practices from [mattpocock/skills](https://github.com/mattpocock/skills) into pi-workflow-kit's existing workflow. The ideas are adapted to fit pi-workflow-kit's tight plan→execute pipeline and artifact lifecycle philosophy.
+## Design principles
+- **No new forever-documents.** No CONTEXT.md, no persistent glossary. Every artifact has a clear birth and death within the brainstorm→finalize lifecycle.
+- **No new external dependencies.** No issue tracker integration, no sub-agent infrastructure.
+- **Small, precise edits.** Each change is a few lines in the right place in the existing skill files.
+## Changes
+### 1. "Design it twice" in brainstorming
+**File:** `skills/brainstorming/SKILL.md`
+**Current behavior (step 3):**
+> Explore approaches — propose 2-3 approaches with trade-offs. Lead with your recommendation.
+**New behavior:** Each approach includes a concrete interface sketch — types, method signatures, and example caller code — so the comparison is grounded in actual code, not abstract descriptions.
+**Rationale:** Without a concrete interface sketch, the agent can describe two approaches that sound different but collapse to the same implementation. Showing the actual caller code makes the trade-offs visible and forces the agent to think about the interface, not just the architecture.
+**Reference:** Matt's `grill-with-docs` and `improve-codebase-architecture` skills both require concrete interface sketches before any discussion proceeds.
+---
+### 2. ADRs in brainstorming
+**File:** `skills/brainstorming/SKILL.md`
+**Current behavior:** The brainstorming skill produces a design doc. No mechanism for recording *why* decisions were made.
+**New behavior:** During the design presentation (step 4), when a significant architectural decision is identified, the agent offers to write a lightweight ADR to `docs/plans/adr/`. ADRs are short (one paragraph), and only written when all three conditions are met:
+1. **Hard to reverse** — changing your mind later has meaningful cost
+2. **Surprising without context** — a future reader will wonder "why?"
+3. **A real trade-off** — there were genuine alternatives
+ADR format:
+```markdown
+# <Short title of the decision>
+<1-3 sentences: context, decision, and why.>
+```
+**Lifecycle:** ADRs live under `docs/plans/adr/` (writable during brainstorm phase). They get archived to `docs/plans/completed/adr/` during finalizing alongside the design doc. No forever-document.
+**Rationale:** The design doc captures *what* was decided. ADRs capture *why*. This matters when someone (human or agent) encounters the code later and wonders about a surprising implementation choice. The strict gating (all 3 conditions) prevents ADR bloat.
+**Reference:** Matt's ADR format (single paragraph, optional sections only when genuinely useful). The key adaptation: ADRs live inside `docs/plans/` so they're subject to the same lifecycle as design docs, rather than being a permanent `docs/adr/` directory.
+---
+### 3. Vertical slices in planning
+**File:** `skills/writing-plans/SKILL.md`
+**Current behavior:** Tasks are 2-5 minutes of work with exact file paths and code. No guidance on task *structure* — horizontal slices (all DB tasks, then all API tasks) are as valid as vertical slices.
+**New behavior:** Each task should be a **vertical slice** — a thin path through ALL relevant layers end-to-end, delivering one complete piece of observable behavior. The plan should explicitly call out horizontal slicing as an anti-pattern.
+```
+WRONG (horizontal):
+  Task 1: Create database schema for users
+  Task 2: Write user API endpoints
+  Task 3: Build user UI components
+  Task 4: Wire everything together
+RIGHT (vertical):
+  Task 1: User can sign up (model + endpoint + validation + test)
+  Task 2: User can log in (auth check + token + test)
+  Task 3: User can view profile (query + endpoint + test)
+```
+**Rationale:** Horizontal slicing produces plans where tasks don't compile or run in isolation — you can't test the schema without the API, you can't test the API without the schema. Vertical slices mean every committed task leaves the codebase in a testable state. This also reduces the blast radius of a bad task — rolling back one vertical slice doesn't break unrelated layers.
+**Reference:** Matt's TDD skill ("Anti-Pattern: Horizontal Slices") and `to-issues` skill ("vertical slice rules"). The key adaptation: in pi-workflow-kit, vertical slices are guidance in the planning skill, not a separate skill with issue tracker integration.
+---
+### 4. Deep modules in TDD refactoring
+**File:** `skills/executing-tasks/SKILL.md`
+**Current behavior:** TDD discipline is: write test first → see it fail → implement → see it pass. No refactoring guidance after tests pass.
+**New behavior:** After all tests pass for a task, add a refactoring check:
+- Look for shallow modules (interface nearly as complex as implementation) — can complexity be hidden behind a simpler interface?
+- Apply the deletion test: if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
+- Extract duplication
+- Apply seam discipline: one adapter is a hypothetical seam, two adapters are a real seam (don't introduce abstraction unless something actually varies across it)
+Key vocabulary to use: **depth** (lots of behavior behind a small interface), **seam** (where behavior can be altered without editing in place), **locality** (change concentrated in one place).
+**Rationale:** Without a refactoring pass, the agent treats "tests pass" as "done." Over many tasks, this accumulates shallow modules — thin wrappers that add indirection without hiding complexity. A lightweight refactoring checklist prevents this accumulation.
+**Reference:** Matt's TDD skill (refactoring checklist) and `improve-codebase-architecture` skill (depth, seam, locality vocabulary). The key adaptation: a brief checklist in the existing TDD section, not a full architecture review skill.
+---
+### 5. Diagnose skill (new standalone skill)
+**File:** `skills/diagnose/SKILL.md` (new)
+**What it is:** A 6-phase debugging discipline the user invokes when a test fails, a bug is found, or something is broken during execution. It sits outside the brainstorm→finalize pipeline — a utility skill used on demand.
+**Phases:**
+1. **Build a feedback loop** — the core insight. Before doing anything else, create a fast, deterministic, agent-runnable pass/fail signal for the bug (failing test, curl script, CLI invocation, etc.). "Build the right feedback loop, and the bug is 90% fixed."
+2. **Reproduce** — run the loop, confirm it matches the user's reported symptom
+3. **Hypothesise** — generate 3-5 ranked falsifiable hypotheses. Show the list to the user before testing — they often have domain knowledge that re-ranks instantly
+4. **Instrument** — add targeted debug logs with unique tags (e.g. `[DEBUG-a4f2]`) for easy cleanup. One variable at a time
+5. **Fix + regression test** — write the regression test before the fix, if a correct test seam exists
+6. **Cleanup** — remove all debug logs, verify original repro no longer triggers, document what would have prevented the bug
+**Design decisions:**
+- **No extension changes needed.** Diagnose is a standalone skill invoked explicitly by the user. It doesn't need workflow-guard integration — if the user invokes it during execution, the workflow-guard already allows all tools.
+- **Not a pipeline phase.** It's a utility skill like a wrench — you pick it up when needed. The brainstorm→plan→execute→finalize pipeline remains unchanged.
+- **Phase 1 is the skill.** The other phases are mechanical. The skill should emphasize that spending disproportionate effort on building the feedback loop is the correct strategy.
+**Rationale:** Pi-workflow-kit currently has no debugging flow. When a test fails during execution, the agent is unguided — it might stare at code, add random logs, or try shotgun debugging. A disciplined loop prevents wasted time and ensures bugs are properly locked down with regression tests.
+**Reference:** Matt's `diagnose` skill. The key adaptation: simplified to fit pi-workflow-kit's concise skill style (~30-40 lines instead of ~120 lines with supporting files). No CONTEXT.md dependency — the agent uses the codebase's own terminology.
+---
+## What's NOT being incorporated
+| Idea | Why not |
+|------|---------|
+| CONTEXT.md | Accumulates without bound, rots over time, doesn't scale to monorepos |
+| Triage / to-issues / to-prd | Tightly coupled to GitHub/GitLab issue trackers, outside pi-workflow-kit's scope |
+| setup skill | Scaffolding for issue tracker config — not relevant without issue tracker integration |
+| caveman / grill-me | Fun but orthogonal to workflow |
+| Full improve-codebase-architecture | Too heavy (~100+ lines, multiple reference files, depends on CONTEXT.md) |
+| Parallel sub-agents | Not available in pi currently |
+## Files changed
+| File | Change |
+|------|--------|
+| `skills/brainstorming/SKILL.md` | Add "design it twice" interface sketches, add ADR output |
+| `skills/writing-plans/SKILL.md` | Add vertical slice guidance with anti-pattern example |
+| `skills/executing-tasks/SKILL.md` | Add refactoring checklist with deep modules vocabulary |
+| `skills/diagnose/SKILL.md` | New skill file (6-phase debugging loop) |
+| `docs/developer-usage-guide.md` | Mention diagnose skill and ADR output |
+| `docs/workflow-phases.md` | Mention diagnose as a utility skill |
+| `README.md` | Add diagnose to skills table |
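
To make the "design it twice" change in the design doc above concrete, here is a minimal sketch of two approaches compared through interface sketches and caller code. The caching feature and every name in it (`CacheA`, `CacheB`, `createCacheA`, `createCacheB`, `loadUser`, `userViaA`, `userViaB`) are hypothetical and chosen only for illustration; none of them come from the package.

```typescript
// Hypothetical feature used only for illustration: caching an expensive lookup.
// None of these names come from pi-workflow-kit.

// Approach A: a thin get/set interface; every caller handles the miss itself.
interface CacheA<V> {
  get(key: string): V | undefined;
  set(key: string, value: V): void;
}

// Approach B: a single method; the cache handles the miss.
interface CacheB<V> {
  getOrCompute(key: string, compute: () => V): V;
}

function createCacheA<V>(): CacheA<V> {
  const store: { [key: string]: V } = {};
  return {
    get: (key) =>
      Object.prototype.hasOwnProperty.call(store, key) ? store[key] : undefined,
    set: (key, value) => {
      store[key] = value;
    },
  };
}

function createCacheB<V>(): CacheB<V> {
  const inner = createCacheA<V>();
  return {
    getOrCompute: (key, compute) => {
      // A stored `undefined` is treated as a miss in this sketch.
      const hit = inner.get(key);
      if (hit !== undefined) {
        return hit;
      }
      const value = compute();
      inner.set(key, value);
      return value;
    },
  };
}

// Stand-in for an expensive lookup; counts how often it runs.
let loads = 0;
function loadUser(id: string): string {
  loads += 1;
  return `user:${id}`;
}

// Caller code under approach A: three steps repeated at every call site.
function userViaA(cache: CacheA<string>, id: string): string {
  let user = cache.get(id);
  if (user === undefined) {
    user = loadUser(id);
    cache.set(id, user);
  }
  return user;
}

// Caller code under approach B: one call, miss handling hidden in the cache.
function userViaB(cache: CacheB<string>, id: string): string {
  return cache.getOrCompute(id, () => loadUser(id));
}
```

Placing `userViaA` next to `userViaB` shows the trade-off at the call site: approach A gives callers more control, approach B keeps the miss logic in one place.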

package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-implementation.md ADDED

@@ -0,0 +1,315 @@
+# Implementation Plan: Incorporate mattpocock/skills Ideas
+Design doc: `docs/plans/2026-05-01-incorporate-mattpocock-skills-design.md`
+## Task 1: Update brainstorming skill — design it twice + ADRs
+<!-- tdd: trivial -->
+<!-- checkpoint: none -->
+Edit `skills/brainstorming/SKILL.md`:
+**Step 3** — change from:
+```
+3. **Explore approaches** — propose 2-3 approaches with trade-offs. Lead with your recommendation.
+```
+to:
+```
+3. **Explore approaches** — propose 2-3 approaches. For each approach, sketch the concrete interface (types, method signatures, example caller code) so the comparison is grounded in actual code, not abstract descriptions. Lead with your recommendation.
+```
+**Step 4** — change from:
+```
+4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
+```
+to:
+```
+4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
+   When a significant architectural decision is identified, offer to write a lightweight ADR to `docs/plans/adr/`. Only write an ADR when all three are true:
+   1. **Hard to reverse** — changing your mind later has meaningful cost
+   2. **Surprising without context** — a future reader will wonder "why?"
+   3. **A real trade-off** — there were genuine alternatives
+   ADR format — a title and 1-3 sentences covering context, decision, and why:
+   ```markdown
+   # <Short title of the decision>
+   <1-3 sentences: context, decision, and why.>
+   ```
+   ADRs live under `docs/plans/adr/` and are archived during finalizing alongside the design doc.
+```
+```bash
+git commit -m "feat(brainstorming): add design-it-twice interface sketches and ADR output"
+```
+---
+## Task 2: Update writing-plans skill — vertical slices
+<!-- tdd: trivial -->
+<!-- checkpoint: none -->
+Edit `skills/writing-plans/SKILL.md` — add a new section after "## Task format" and before "## TDD in the plan":
+```markdown
+## Vertical slices
+Each task should be a **vertical slice** — a thin path through ALL relevant layers end-to-end, delivering one complete piece of observable behavior.
+```
+WRONG (horizontal):
+  Task 1: Create database schema for users
+  Task 2: Write user API endpoints
+  Task 3: Build user UI components
+  Task 4: Wire everything together
+RIGHT (vertical):
+  Task 1: User can sign up (model + endpoint + validation + test)
+  Task 2: User can log in (auth check + token + test)
+  Task 3: User can view profile (query + endpoint + test)
+```
+Vertical slices ensure every committed task leaves the codebase in a testable state and reduce the blast radius of a bad task.
+```
+```bash
+git commit -m "feat(writing-plans): add vertical slice guidance with anti-pattern example"
+```
+---
+## Task 3: Update executing-tasks skill — deep modules refactoring
+<!-- tdd: trivial -->
+<!-- checkpoint: none -->
+Edit `skills/executing-tasks/SKILL.md` — add a new section after "## TDD discipline":
+```markdown
+## Refactoring
+After all tests pass for a task, check for refactoring opportunities:
+- **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
+- **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
+- **Duplication** — extract repeated patterns
+- **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
+Run tests after each refactor step. Never refactor while tests are failing.
+Key vocabulary: **depth** (lots of behavior behind a small interface), **seam** (where behavior can be altered without editing in place), **locality** (change concentrated in one place).
+```
+```bash
+git commit -m "feat(executing-tasks): add refactoring checklist with deep modules vocabulary"
+```
+---
+## Task 4: Create diagnose skill
+<!-- tdd: trivial -->
+<!-- checkpoint: none -->
+Create `skills/diagnose/SKILL.md`:
+```markdown
+---
+name: diagnose
+description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken."
+---
+# Diagnose
+A 6-phase debugging discipline. Phase 1 is the skill — spend disproportionate effort here.
+## Phase 1 — Build a feedback loop
+Create a fast, deterministic, agent-runnable pass/fail signal for the bug before doing anything else. Try in this order: failing test, curl script, CLI invocation, headless browser script.
+The loop must produce the failure mode the **user** described — not a nearby but different failure. Iterate on the loop itself: can you make it faster? Sharper? More deterministic?
+If you genuinely cannot build a loop, stop and say so. List what you tried. Ask for access to a reproducing environment or a captured artifact.
+Do not proceed until you have a loop you believe in.
+## Phase 2 — Reproduce
+Run the loop. Confirm:
+- The failure matches the user's reported symptom
+- The failure is reproducible across multiple runs
+- You've captured the exact symptom (error message, wrong output, slow timing)
+## Phase 3 — Hypothesise
+Generate 3-5 ranked hypotheses. Each must be falsifiable:
+> "If `<X>` is the cause, then `<changing Y>` will make the bug disappear / `<changing Z>` will make it worse."
+Show the ranked list to the user before testing. They often have domain knowledge that re-ranks instantly.
+## Phase 4 — Instrument
+Each probe must map to a specific hypothesis. Change one variable at a time. Tag every debug log with a unique prefix (e.g. `[DEBUG-a4f2]`) for easy cleanup later. Prefer a debugger breakpoint over logs when available.
+## Phase 5 — Fix + regression test
+Write the regression test **before** the fix — but only if there's a correct seam (one that exercises the real bug pattern at the call site). If no correct seam exists, note it — the codebase architecture is preventing the bug from being locked down.
+## Phase 6 — Cleanup
+Required before declaring done:
+- Original repro no longer triggers
+- Regression test passes (or absence of seam is documented)
+- All `[DEBUG-...]` instrumentation removed
+- Ask: what would have prevented this bug?
+```
+```bash
+git commit -m "feat(diagnose): add standalone debugging skill with 6-phase loop"
+```
+---
+## Task 5: Update finalizing skill — archive ADRs
+<!-- tdd: trivial -->
+<!-- checkpoint: none -->
+Edit `skills/finalizing/SKILL.md` — update step 1 from:
+```
+1. **Move planning docs** — archive the design, implementation, and progress docs, then commit:
+   ```
+   mkdir -p docs/plans/completed
+   mv docs/plans/*-design.md docs/plans/completed/
+   mv docs/plans/*-implementation.md docs/plans/completed/
+   mv docs/plans/*-progress.md docs/plans/completed/
+   git add docs/plans/ && git commit -m "chore: archive planning docs"
+   ```
+```
+to:
+```
+1. **Move planning docs** — archive the design, implementation, progress docs, and ADRs (if any), then commit:
+   ```
+   mkdir -p docs/plans/completed
+   mkdir -p docs/plans/completed/adr
+   mv docs/plans/*-design.md docs/plans/completed/
+   mv docs/plans/*-implementation.md docs/plans/completed/
+   mv docs/plans/*-progress.md docs/plans/completed/
+   mv docs/plans/adr/*.md docs/plans/completed/adr/ 2>/dev/null || true
+   rmdir docs/plans/adr 2>/dev/null || true
+   git add docs/plans/ && git commit -m "chore: archive planning docs"
+   ```
+```
+```bash
+git commit -m "feat(finalizing): archive ADRs alongside planning docs"
+```
+---
+## Task 6: Update documentation
+<!-- tdd: trivial -->
+<!-- checkpoint: none -->
+### README.md
+Update the intro line from:
+```
+**4 workflow skills** that guide the agent through a structured development process:
+```
+to:
+```
+**4 workflow skills** and **1 utility skill** that guide the agent through a structured development process:
+```
+Update the pipeline diagram from:
+```
+brainstorm → plan → execute → finalize
+```
+to:
+```
+brainstorm → plan → execute → finalize
+                          ↕
+                      diagnose (on demand)
+```
+Add `diagnose` to the skills table:
+```
+| `diagnose` | ~35 | 6-phase debugging loop: build feedback loop, reproduce, hypothesise, instrument, fix, cleanup |
+```
+Update the Architecture section to include `diagnose/`:
+```
+├── skills/
+│   ├── brainstorming/SKILL.md
+│   ├── writing-plans/SKILL.md
+│   ├── executing-tasks/SKILL.md
+│   ├── finalizing/SKILL.md
+│   └── diagnose/SKILL.md
+```
+### docs/developer-usage-guide.md
+Add to the brainstorm section (after "Outcome"):
+```
+- Optionally writes ADRs to `docs/plans/adr/` for significant architectural decisions
+```
+Add a new section after the 4 workflow phases:
+```markdown
+### 5. Diagnose (on demand)
+```
+/skill:diagnose
+```
+A 6-phase debugging loop you invoke when something is broken. Build a feedback loop first, then reproduce, hypothesise, instrument, fix, and cleanup. Not a pipeline phase — use whenever needed.
+```
+### docs/workflow-phases.md
+Add a new section at the end:
+```markdown
+## diagnose
+```
+/skill:diagnose
+```
+Not a pipeline phase. A utility skill invoked on demand when debugging is needed.
+- Build a feedback loop (failing test, curl script, etc.)
+- Reproduce, hypothesise, instrument, fix, cleanup
+- No write restrictions (used during execute/finalize, or outside the pipeline)
+```
+```bash
+git commit -m "docs: update README, usage guide, and workflow phases for new skills"
+```

package/docs/plans/completed/2026-05-01-incorporate-mattpocock-skills-progress.md ADDED

@@ -0,0 +1,15 @@
+# Progress: incorporate-mattpocock-skills
+Plan: docs/plans/2026-05-01-incorporate-mattpocock-skills-implementation.md
+Branch: incorporate-mattpocock-skills
+Started: 2026-05-01T00:00:00Z
+Last updated: 2026-05-01T00:00:00Z
+| # | Status | Task | Commit |
+|---|--------|------|--------|
+| 1 | ✅ done | Update brainstorming skill — design it twice + ADRs | 0231b84 |
+| 2 | ✅ done | Update writing-plans skill — vertical slices | 22a46df |
+| 3 | ✅ done | Update executing-tasks skill — deep modules refactoring | c405634 |
+| 4 | ✅ done | Create diagnose skill | 5e39e2d |
+| 5 | ✅ done | Update finalizing skill — archive ADRs | e31a1af |
+| 6 | ✅ done | Update documentation (README, usage guide, workflow phases) | 8c1c4eb |

package/docs/workflow-phases.md CHANGED

@@ -1,6 +1,6 @@
 # Workflow Phases
-`pi-workflow-kit` has 4 phases. You invoke each one explicitly with `/skill:`.
+`pi-workflow-kit` has 4 phases and 1 utility skill. You invoke each one explicitly with `/skill:`.
 ```
 brainstorm → plan → execute → finalize
@@ -55,3 +55,15 @@ No write restrictions. All tools available.
 - Clean up worktree if one was used
 No write restrictions. All tools available.
+## diagnose
+```
+/skill:diagnose
+```
+Not a pipeline phase. A utility skill invoked on demand when debugging is needed.
+- Build a feedback loop (failing test, curl script, etc.)
+- Reproduce, hypothesise, instrument, fix, cleanup
+- No write restrictions (used during execute/finalize, or outside the pipeline)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@tianhai/pi-workflow-kit",
-  "version": "0.9.0",
+  "version": "0.10.0",
   "description": "Workflow skills and enforcement extensions for pi",
   "keywords": [
     "pi-package"

package/skills/brainstorming/SKILL.md CHANGED

@@ -11,8 +11,24 @@ Read-only exploration. You may **not** edit or create any files except under `do
 1. **Check git state** — run `git status` and `git log --oneline -5`. If there's uncommitted work, ask the user what to do with it first.
 2. **Understand the idea** — read existing code, docs, and recent commits. Ask questions one at a time to refine the idea. Prefer multiple choice when possible.
-3. **Explore approaches** — propose 2-3 approaches with trade-offs. Lead with your recommendation.
+3. **Explore approaches** — propose 2-3 approaches. For each approach, sketch the concrete interface (types, method signatures, example caller code) so the comparison is grounded in actual code, not abstract descriptions. Lead with your recommendation.
 4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
+   When a significant architectural decision is identified, offer to write a lightweight ADR to `docs/plans/adr/`. Only write an ADR when all three are true:
+   1. **Hard to reverse** — changing your mind later has meaningful cost
+   2. **Surprising without context** — a future reader will wonder "why?"
+   3. **A real trade-off** — there were genuine alternatives
+   ADR format — a title and 1-3 sentences covering context, decision, and why:
+   ```markdown
+   # <Short title of the decision>
+   <1-3 sentences: context, decision, and why.>
+   ```
+   ADRs live under `docs/plans/adr/` and are archived during finalizing alongside the design doc.
 5. **Write the design doc** — save it to `docs/plans/YYYY-MM-DD-<topic>-design.md`. Ask the user to commit it. Branch creation and worktree setup should be deferred to the execution phase (`/skill:executing-tasks`).
 ## Principles
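
The ADR rule added to step 4 in the hunk above is a three-way conjunction followed by a two-part file format, which can be sketched roughly as below. The skill states the rule in prose only, so the `Decision` type, the function names, and the slug-based file name are all assumptions made for this illustration.

```typescript
// Hypothetical types and helpers; the skill describes these rules in prose only.
interface Decision {
  title: string;
  summary: string; // 1-3 sentences: context, decision, and why
  hardToReverse: boolean;
  surprisingWithoutContext: boolean;
  realTradeOff: boolean;
}

// An ADR is warranted only when all three conditions hold.
function shouldWriteAdr(d: Decision): boolean {
  return d.hardToReverse && d.surprisingWithoutContext && d.realTradeOff;
}

// Render the ADR format from the skill: a title line and a short paragraph.
function renderAdr(d: Decision): string {
  return `# ${d.title}\n${d.summary}\n`;
}

// The file-name scheme is an assumption; the skill only fixes the directory.
function adrPath(d: Decision): string {
  const slug = d.title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `docs/plans/adr/${slug}.md`;
}
```

Requiring all three flags is what keeps the ADR directory small: a decision that is merely surprising, or merely hard to reverse, stays in the design doc.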

package/skills/diagnose/SKILL.md ADDED

@@ -0,0 +1,56 @@
+---
+name: diagnose
+description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken."
+---
+# Diagnose
+A 6-phase debugging discipline. Phase 1 is the skill — spend disproportionate effort here.
+## Phase 1 — Build a feedback loop
+Create a fast, deterministic, agent-runnable pass/fail signal for the bug before doing anything else. Try in this order: failing test, curl script, CLI invocation, headless browser script.
+Other strategies when the basics don't work:
+- **Bisection** — bug appeared between two known states? Automate "boot at state X, check, repeat" to bisect
+- **Replay** — save a real network request or event log to disk, replay it through the code path in isolation
+The loop must produce the failure mode the **user** described — not a nearby but different failure. Iterate on the loop itself: can you make it faster? Sharper? More deterministic?
+If you genuinely cannot build a loop, stop and say so. List what you tried. Ask for access to a reproducing environment or a captured artifact.
+Do not proceed until you have a loop you believe in.
+## Phase 2 — Reproduce
+Run the loop. Confirm:
+- The failure matches the user's reported symptom
+- The failure is reproducible across multiple runs
+- You've captured the exact symptom (error message, wrong output, slow timing)
+Then **minimize the repro** — strip it down to the smallest input, shortest path, or fewest steps that still triggers the bug. A minimized repro dramatically narrows the hypothesis space.
+## Phase 3 — Hypothesise
+Generate 3-5 ranked hypotheses. Each must be falsifiable:
+> "If `<X>` is the cause, then `<changing Y>` will make the bug disappear / `<changing Z>` will make it worse."
+Show the ranked list to the user before testing. They often have domain knowledge that re-ranks instantly.
+## Phase 4 — Instrument
+Each probe must map to a specific hypothesis. Change one variable at a time. Tag every debug log with a unique prefix (e.g. `[DEBUG-a4f2]`) for easy cleanup later. Prefer a debugger breakpoint over logs when available.
+## Phase 5 — Fix + regression test
+Write the regression test **before** the fix — but only if there's a correct seam (one that exercises the real bug pattern at the call site). If no correct seam exists, note it — the codebase architecture is preventing the bug from being locked down.
+## Phase 6 — Cleanup
+Required before declaring done:
+- Original repro no longer triggers
+- Regression test passes (or absence of seam is documented)
+- All `[DEBUG-...]` instrumentation removed
+- Ask: what would have prevented this bug?
+- If the bug was caused by an architectural problem (no good test seam, tangled callers, hidden coupling), suggest writing an ADR to `docs/plans/adr/` capturing that insight
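
Phase 4 of the skill above tags every debug log so that Phase 6 can confirm all of them are gone. A minimal sketch of that cleanup check follows, assuming file contents are already loaded into memory. The skill ships no such checker; `Leftover`, `DEBUG_TAG`, and `findLeftoverDebugTags` are hypothetical names, and the tag pattern is inferred from the `[DEBUG-a4f2]` example.

```typescript
// Hypothetical helper: the skill asks for tagged debug logs but does not ship a checker.
interface Leftover {
  file: string;
  line: number; // 1-based line number
  text: string;
}

// Matches tags such as [DEBUG-a4f2]. Accepting any non-space suffix is an
// assumption based on the skill's single example.
const DEBUG_TAG = /\[DEBUG-[^\]\s]+\]/;

// Scan in-memory file contents (path -> text) and report every line that
// still carries a debug tag.
function findLeftoverDebugTags(files: { [path: string]: string }): Leftover[] {
  const leftovers: Leftover[] = [];
  Object.keys(files).forEach((file) => {
    files[file].split("\n").forEach((text, index) => {
      if (DEBUG_TAG.test(text)) {
        leftovers.push({ file, line: index + 1, text: text.trim() });
      }
    });
  });
  return leftovers;
}
```

An empty result corresponds to the "All `[DEBUG-...]` instrumentation removed" item in the Phase 6 checklist; the other items still need to be checked by hand or by the feedback loop.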

package/skills/executing-tasks/SKILL.md CHANGED

@@ -155,6 +155,19 @@ Follow the TDD scenario from the plan:
 Don't skip tests because "it's obvious." The test is the contract.
+## Refactoring
+After all tests pass for a task, check for refactoring opportunities:
+- **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
+- **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
+- **Duplication** — extract repeated patterns
+- **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
+Run tests after each refactor step. Never refactor while tests are failing.
+Key vocabulary: **depth** (lots of behavior behind a small interface), **seam** (where behavior can be altered without editing in place), **locality** (change concentrated in one place).
 ## Batching and session management
 The agent suggests a fresh session at natural break points to minimize token accumulation. After completing ~3-5 non-checkpoint tasks in the same session, suggest:
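
The shallow-module check and the deletion test from the Refactoring section added above are easier to judge with a concrete pair in view. The sketch below is hypothetical throughout: `joinWithSeparator` and `parseDurationMs` are invented for this illustration and are not part of the package.

```typescript
// Hypothetical code for illustration; nothing here comes from pi-workflow-kit.

// Shallow: the interface is as complex as the implementation. Deleting it
// changes nothing for callers except the name they call (a pass-through).
function joinWithSeparator(parts: string[], separator: string): string {
  return parts.join(separator);
}

// Deep: a one-argument interface hiding tokenizing, unit conversion, and
// validation. Deleting it would push that logic back into every caller.
function parseDurationMs(input: string): number {
  const units: { [unit: string]: number } = { ms: 1, s: 1000, m: 60000, h: 3600000 };
  const pattern = /(\d+)(ms|s|m|h)/g;
  let total = 0;
  let consumed = 0;
  let match = pattern.exec(input);
  while (match !== null) {
    total += Number(match[1]) * units[match[2]];
    consumed += match[0].length;
    match = pattern.exec(input);
  }
  // Reject empty input and any characters that are not part of a number-unit pair.
  if (input.length === 0 || consumed !== input.length) {
    throw new Error(`Invalid duration: "${input}"`);
  }
  return total;
}
```

Under the deletion test, `joinWithSeparator` is a candidate for removal, while `parseDurationMs` earns its keep: without it, each caller would reimplement the parsing and validation.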

package/skills/finalizing/SKILL.md CHANGED

@@ -21,12 +21,15 @@ Wait for the user to confirm before proceeding.
 ## Process
-1. **Move planning docs** — archive the design, implementation, and progress docs, then commit:
+1. **Move planning docs** — archive the design, implementation, progress docs, and ADRs (if any), then commit:
    ```
    mkdir -p docs/plans/completed
+   mkdir -p docs/plans/completed/adr
    mv docs/plans/*-design.md docs/plans/completed/
    mv docs/plans/*-implementation.md docs/plans/completed/
    mv docs/plans/*-progress.md docs/plans/completed/
+   mv docs/plans/adr/*.md docs/plans/completed/adr/ 2>/dev/null || true
+   rmdir docs/plans/adr 2>/dev/null || true
    git add docs/plans/ && git commit -m "chore: archive planning docs"
    ```
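
The archive step above moves four kinds of files, with ADRs going to their own subdirectory. As a rough sketch of that mapping, the function below returns where a given path would end up. `archiveDestination` is a hypothetical helper, not something the skill provides, and its patterns only approximate the shell globs (for example, they do not replicate how globs treat dotfiles).

```typescript
// Hypothetical helper mirroring the `mv` commands in the finalizing skill.
// Returns the archive destination for a planning file, or null if the
// commands would leave the file in place.
function archiveDestination(path: string): string | null {
  // docs/plans/*-design.md, *-implementation.md, *-progress.md
  const planDoc = /^docs\/plans\/([^/]*-(?:design|implementation|progress)\.md)$/.exec(path);
  if (planDoc !== null) {
    return `docs/plans/completed/${planDoc[1]}`;
  }
  // docs/plans/adr/*.md
  const adr = /^docs\/plans\/adr\/([^/]+\.md)$/.exec(path);
  if (adr !== null) {
    return `docs/plans/completed/adr/${adr[1]}`;
  }
  return null;
}
```

Files already under `docs/plans/completed/` map to `null`, which matches the shell version: the globs there do not descend into subdirectories, so running the step twice does not move archived docs again.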

package/skills/writing-plans/SKILL.md CHANGED

@@ -44,6 +44,25 @@ These comments are optional — if omitted, the agent infers TDD scenario and ch
 Also use the `<!-- tdd: ... -->` and `<!-- checkpoint: ... -->` metadata comments to specify options explicitly. The inline `checkpoint: test` / `checkpoint: done` label format (e.g. in a task list) is also supported as a fallback, but the metadata comment is the canonical source.
+## Vertical slices
+Each task should be a **vertical slice** — a thin path through ALL relevant layers end-to-end, delivering one complete piece of observable behavior.
+```
+WRONG (horizontal):
+  Task 1: Create database schema for users
+  Task 2: Write user API endpoints
+  Task 3: Build user UI components
+  Task 4: Wire everything together
+RIGHT (vertical):
+  Task 1: User can sign up (model + endpoint + validation + test)
+  Task 2: User can log in (auth check + token + test)
+  Task 3: User can view profile (query + endpoint + test)
+```
+Vertical slices ensure every committed task leaves the codebase in a testable state and reduce the blast radius of a bad task.
 ## TDD in the plan
 Label each task with its TDD scenario: