@tianhai/pi-workflow-kit 0.9.0 → 0.10.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -44,6 +44,7 @@ During brainstorm and plan, the extension blocks `write`/`edit` outside `docs/pl
  | `writing-plans` | ~35 | Break design into tasks with TDD scenarios, set up branch/worktree |
  | `executing-tasks` | ~50 | Implement tasks with TDD discipline, checkpoint review gates, handle code review |
  | `finalizing` | ~20 | Archive docs, update changelog, create PR, clean up |
+ | `diagnose` | ~35 | 6-phase debugging loop: build feedback loop, reproduce, hypothesise, instrument, fix, cleanup |
 
  ### TDD Three-Scenario Model
 
@@ -75,7 +76,8 @@ pi-workflow-kit/
  │ ├── brainstorming/SKILL.md
  │ ├── writing-plans/SKILL.md
  │ ├── executing-tasks/SKILL.md
- │ └── finalizing/SKILL.md
+ │ ├── finalizing/SKILL.md
+ │ └── diagnose/SKILL.md
  ├── tests/
  │ └── workflow-guard.test.ts
  ├── package.json
@@ -47,6 +47,8 @@ Explore the idea through collaborative dialogue. The agent reads code, asks ques
 
  Outcome: `docs/plans/YYYY-MM-DD-<topic>-design.md`
 
+ Optionally writes ADRs to `docs/plans/adr/` for significant architectural decisions.
+
  ### 2. Plan
 
  ```
@@ -73,6 +75,14 @@ Implement the plan task-by-task. Each task: implement → run tests → fix if n
 
  Archive plan docs, update CHANGELOG/README, create PR, clean up worktree.
 
+ ### 5. Diagnose (on demand)
+
+ ```
+ /skill:diagnose
+ ```
+
+ A 6-phase debugging loop you invoke when something is broken. Build a feedback loop first, then reproduce, hypothesise, instrument, fix, and cleanup. Not a pipeline phase — use whenever needed.
+
  ## What the extension does
 
  The `workflow-guard` extension watches `write` and `edit` tool calls:
@@ -0,0 +1,154 @@
+ # Incorporate mattpocock/skills Ideas into pi-workflow-kit
+
+ ## Source
+
+ Incorporates engineering best practices from [mattpocock/skills](https://github.com/mattpocock/skills) into pi-workflow-kit's existing workflow. The ideas are adapted to fit pi-workflow-kit's tight plan→execute pipeline and artifact lifecycle philosophy.
+
+ ## Design principles
+
+ - **No new forever-documents.** No CONTEXT.md, no persistent glossary. Every artifact has a clear birth and death within the brainstorm→finalize lifecycle.
+ - **No new external dependencies.** No issue tracker integration, no sub-agent infrastructure.
+ - **Small, precise edits.** Each change is a few lines in the right place in the existing skill files.
+
+ ## Changes
+
+ ### 1. "Design it twice" in brainstorming
+
+ **File:** `skills/brainstorming/SKILL.md`
+
+ **Current behavior (step 3):**
+ > Explore approaches — propose 2-3 approaches with trade-offs. Lead with your recommendation.
+
+ **New behavior:** Each approach includes a concrete interface sketch — types, method signatures, and example caller code — so the comparison is grounded in actual code, not abstract descriptions.
+
+ **Rationale:** Without a concrete interface sketch, the agent can describe two approaches that sound different but collapse to the same implementation. Showing the actual caller code makes the trade-offs visible and forces the agent to think about the interface, not just the architecture.
+
+ **Reference:** Matt's `grill-with-docs` and `improve-codebase-architecture` skills both require concrete interface sketches before any discussion proceeds.
+
+ ---
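The interface-sketch requirement is easier to judge with an example. The TypeScript below is a minimal illustration, not part of the package: it assumes a hypothetical "limit calls to a remote API" feature, and every name in it (`RateLimiter`, `FixedWindowLimiter`, `callWithLimiterA`, `throttled`) is invented. It puts two approaches side by side with their types and caller code.

```typescript
// Hypothetical feature used only to illustrate "design it twice":
// limit how often some remote API is called.

// Approach A: an explicit limiter object that callers consult.
interface RateLimiter {
  tryAcquire(nowMs: number): boolean;
}

class FixedWindowLimiter implements RateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windowStart = 0;
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.windowMs) {
      // The current window has elapsed: start a new one.
      this.windowStart = nowMs;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false;
  }
}

// Caller code for A: every call site must remember to check the limiter.
function callWithLimiterA<T>(limiter: RateLimiter, nowMs: number, call: () => T): T | null {
  return limiter.tryAcquire(nowMs) ? call() : null;
}

// Approach B: wrap the call once; callers never see a limiter.
function throttled<T>(limit: number, windowMs: number, call: () => T): (nowMs: number) => T | null {
  const limiter = new FixedWindowLimiter(limit, windowMs);
  return (nowMs) => (limiter.tryAcquire(nowMs) ? call() : null);
}

// Caller code for B.
const fetchUser = throttled(2, 1000, () => "user");
console.log(fetchUser(0), fetchUser(10), fetchUser(20), fetchUser(1000)); // user user null user
```

Written out as caller code, the trade-off becomes visible: approach A lets several endpoints share one limiter, but every call site has to remember the check; approach B hides the limiter behind one function, at the cost of making a shared budget harder to express.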
+
+ ### 2. ADRs in brainstorming
+
+ **File:** `skills/brainstorming/SKILL.md`
+
+ **Current behavior:** The brainstorming skill produces a design doc. No mechanism for recording *why* decisions were made.
+
+ **New behavior:** During the design presentation (step 4), when a significant architectural decision is identified, the agent offers to write a lightweight ADR to `docs/plans/adr/`. ADRs are short (one paragraph), and only written when all three conditions are met:
+
+ 1. **Hard to reverse** — changing your mind later has meaningful cost
+ 2. **Surprising without context** — a future reader will wonder "why?"
+ 3. **A real trade-off** — there were genuine alternatives
+
+ ADR format:
+
+ ```markdown
+ # <Short title of the decision>
+
+ <1-3 sentences: context, decision, and why.>
+ ```
+
+ **Lifecycle:** ADRs live under `docs/plans/adr/` (writable during brainstorm phase). They get archived to `docs/plans/completed/adr/` during finalizing alongside the design doc. No forever-document.
+
+ **Rationale:** The design doc captures *what* was decided. ADRs capture *why*. This matters when someone (human or agent) encounters the code later and wonders about a surprising implementation choice. The strict gating (all 3 conditions) prevents ADR bloat.
+
+ **Reference:** Matt's ADR format (single paragraph, optional sections only when genuinely useful). The key adaptation: ADRs live inside `docs/plans/` so they're subject to the same lifecycle as design docs, rather than being a permanent `docs/adr/` directory.
+
+ ---
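The three-condition gate and the title-plus-paragraph format can be expressed as a small predicate and renderer. This is a sketch under stated assumptions: the `DecisionCandidate` shape is hypothetical (the skill leaves this judgment to the agent), and the slug-based file name under `docs/plans/adr/` is an assumed convention, since the design does not say how ADR files are named.

```typescript
// Hypothetical shape for a decision surfaced while presenting a design.
interface DecisionCandidate {
  title: string;
  summary: string; // 1-3 sentences: context, decision, and why
  hardToReverse: boolean;
  surprisingWithoutContext: boolean;
  hadRealAlternatives: boolean;
}

// All three conditions must hold; a single "no" means no ADR.
function shouldWriteAdr(d: DecisionCandidate): boolean {
  return d.hardToReverse && d.surprisingWithoutContext && d.hadRealAlternatives;
}

// A title plus one short paragraph, as in the ADR format described above.
function renderAdr(d: DecisionCandidate): string {
  return `# ${d.title}\n\n${d.summary}\n`;
}

// Assumed naming convention: a slug of the title under docs/plans/adr/.
function adrPath(d: DecisionCandidate): string {
  const slug = d.title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one dash
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
  return `docs/plans/adr/${slug}.md`;
}

// Hypothetical decision, for illustration only.
const decision: DecisionCandidate = {
  title: "Store sessions in SQLite",
  summary:
    "Sessions must survive restarts without a separate server. We chose SQLite over Redis because it ships with the app.",
  hardToReverse: true,
  surprisingWithoutContext: true,
  hadRealAlternatives: true,
};

if (shouldWriteAdr(decision)) {
  console.log(adrPath(decision)); // docs/plans/adr/store-sessions-in-sqlite.md
  console.log(renderAdr(decision));
}
```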
+
+ ### 3. Vertical slices in planning
+
+ **File:** `skills/writing-plans/SKILL.md`
+
+ **Current behavior:** Tasks are 2-5 minutes of work with exact file paths and code. No guidance on task *structure* — horizontal slices (all DB tasks, then all API tasks) are as valid as vertical slices.
+
+ **New behavior:** Each task should be a **vertical slice** — a thin path through ALL relevant layers end-to-end, delivering one complete piece of observable behavior. The plan should explicitly call out horizontal slicing as an anti-pattern.
+
+ ```
+ WRONG (horizontal):
+ Task 1: Create database schema for users
+ Task 2: Write user API endpoints
+ Task 3: Build user UI components
+ Task 4: Wire everything together
+
+ RIGHT (vertical):
+ Task 1: User can sign up (model + endpoint + validation + test)
+ Task 2: User can log in (auth check + token + test)
+ Task 3: User can view profile (query + endpoint + test)
+ ```
+
+ **Rationale:** Horizontal slicing produces plans where tasks don't compile or run in isolation — you can't test the schema without the API, you can't test the API without the schema. Vertical slices mean every committed task leaves the codebase in a testable state. This also reduces the blast radius of a bad task — rolling back one vertical slice doesn't break unrelated layers.
+
+ **Reference:** Matt's TDD skill ("Anti-Pattern: Horizontal Slices") and `to-issues` skill ("vertical slice rules"). The key adaptation: in pi-workflow-kit, vertical slices are guidance in the planning skill, not a separate skill with issue tracker integration.
+
+ ---
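Here is a sketch of what the first RIGHT task ("User can sign up") might contain as one vertical slice, with model, validation, endpoint, and test together. It is purely illustrative: an in-memory `UserStore` stands in for the database and a plain `signUp` function stands in for an HTTP endpoint; none of these names come from the package.

```typescript
// Hypothetical vertical slice for "User can sign up".

// Model layer: an in-memory map stands in for the database here.
interface User {
  id: number;
  email: string;
}

class UserStore {
  private users = new Map<string, User>();
  private nextId = 1;

  has(email: string): boolean {
    return this.users.has(email);
  }

  insert(email: string): User {
    const user: User = { id: this.nextId++, email };
    this.users.set(email, user);
    return user;
  }
}

// Validation layer.
function validateEmail(email: string): string | null {
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email) ? null : "invalid email";
}

// Endpoint layer: a plain function stands in for an HTTP handler.
interface SignUpResult {
  status: number;
  body: User | { error: string };
}

function signUp(store: UserStore, email: string): SignUpResult {
  const error = validateEmail(email);
  if (error !== null) return { status: 400, body: { error } };
  if (store.has(email)) return { status: 409, body: { error: "already registered" } };
  return { status: 201, body: store.insert(email) };
}

// The slice's own test: observable behavior, end to end.
const store = new UserStore();
console.log(signUp(store, "ada@example.com").status); // 201
console.log(signUp(store, "ada@example.com").status); // 409
console.log(signUp(store, "not-an-email").status); // 400
```

Each layer is thin, but the commit is runnable and testable on its own; in the horizontal plan, a schema-only task would have nothing exercising it until a later task wires it up.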
+
+ ### 4. Deep modules in TDD refactoring
+
+ **File:** `skills/executing-tasks/SKILL.md`
+
+ **Current behavior:** TDD discipline is: write test first → see it fail → implement → see it pass. No refactoring guidance after tests pass.
+
+ **New behavior:** After all tests pass for a task, add a refactoring check:
+
+ - Look for shallow modules (interface nearly as complex as implementation) — can complexity be hidden behind a simpler interface?
+ - Apply the deletion test: if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
+ - Extract duplication
+ - Ensure one adapter = hypothetical seam, two adapters = real seam (don't introduce abstraction unless something actually varies across it)
+
+ Key vocabulary to use: **depth** (lots of behavior behind a small interface), **seam** (where behavior can be altered without editing in place), **locality** (change concentrated in one place).
+
+ **Rationale:** Without a refactoring pass, the agent treats "tests pass" as "done." Over many tasks, this accumulates shallow modules — thin wrappers that add indirection without hiding complexity. A lightweight refactoring checklist prevents this accumulation.
+
+ **Reference:** Matt's TDD skill (refactoring checklist) and `improve-codebase-architecture` skill (depth, seam, locality vocabulary). The key adaptation: a brief checklist in the existing TDD section, not a full architecture review skill.
+
+ ---
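To pin down "shallow module" and the deletion test, the two hypothetical modules below sit at opposite extremes. `ConfigReader` only forwards to a `Map`, so deleting it makes no complexity reappear anywhere. `parseConfig` hides comment stripping, defaults, and validation behind one function, so deleting it would push that logic into every caller. The config format is invented for the example.

```typescript
// Shallow: the interface mirrors the implementation one-to-one.
// Deletion test: remove it and callers use a Map directly; nothing reappears.
class ConfigReader {
  private values = new Map<string, string>();
  set(key: string, value: string): void {
    this.values.set(key, value);
  }
  get(key: string): string | undefined {
    return this.values.get(key);
  }
}

// Deep: one small entry point hides comment stripping, trimming,
// defaults, and validation. Deletion test: remove it and every caller
// has to re-implement this logic.
interface AppConfig {
  port: number;
  host: string;
  debug: boolean;
}

function parseConfig(text: string): AppConfig {
  const config: AppConfig = { port: 8080, host: "localhost", debug: false };
  for (const rawLine of text.split("\n")) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    if (line === "") continue;
    const eq = line.indexOf("=");
    if (eq === -1) throw new Error(`malformed line: ${rawLine}`);
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (key === "port") {
      const port = Number(value);
      if (!Number.isInteger(port) || port <= 0) throw new Error(`bad port: ${value}`);
      config.port = port;
    } else if (key === "host") {
      config.host = value;
    } else if (key === "debug") {
      config.debug = value === "true";
    }
    // Unknown keys are ignored in this sketch.
  }
  return config;
}

console.log(parseConfig("port = 3000 # dev\ndebug=true"));
```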
+
+ ### 5. Diagnose skill (new standalone skill)
+
+ **File:** `skills/diagnose/SKILL.md` (new)
+
+ **What it is:** A 6-phase debugging discipline the user invokes when a test fails, a bug is found, or something is broken during execution. It sits outside the brainstorm→finalize pipeline — a utility skill used on demand.
+
+ **Phases:**
+
+ 1. **Build a feedback loop** — the core insight. Before doing anything else, create a fast, deterministic, agent-runnable pass/fail signal for the bug (failing test, curl script, CLI invocation, etc.). "Build the right feedback loop, and the bug is 90% fixed."
+ 2. **Reproduce** — run the loop, confirm it matches the user's reported symptom
+ 3. **Hypothesise** — generate 3-5 ranked falsifiable hypotheses. Show the list to the user before testing — they often have domain knowledge that re-ranks instantly
+ 4. **Instrument** — add targeted debug logs with unique tags (e.g. `[DEBUG-a4f2]`) for easy cleanup. One variable at a time
+ 5. **Fix + regression test** — write the regression test before the fix, if a correct test seam exists
+ 6. **Cleanup** — remove all debug logs, verify original repro no longer triggers, document what would have prevented the bug
+
+ **Design decisions:**
+
+ - **No extension changes needed.** Diagnose is a standalone skill invoked explicitly by the user. It doesn't need workflow-guard integration — if the user invokes it during execution, the workflow-guard already allows all tools.
+ - **Not a pipeline phase.** It's a utility skill like a wrench — you pick it up when needed. The brainstorm→plan→execute→finalize pipeline remains unchanged.
+ - **Phase 1 is the skill.** The other phases are mechanical. The skill should emphasize that spending disproportionate effort on building the feedback loop is the correct strategy.
+
+ **Rationale:** Pi-workflow-kit currently has no debugging flow. When a test fails during execution, the agent is unguided — it might stare at code, add random logs, or try shotgun debugging. A disciplined loop prevents wasted time and ensures bugs are properly locked down with regression tests.
+
+ **Reference:** Matt's `diagnose` skill. The key adaptation: simplified to fit pi-workflow-kit's concise skill style (~30-40 lines instead of ~120 lines with supporting files). No CONTEXT.md dependency — the agent uses the codebase's own terminology.
+
+ ---
+
+ ## What's NOT being incorporated
+
+ | Idea | Why not |
+ |------|---------|
+ | CONTEXT.md | Accumulates without bound, rots over time, doesn't scale to monorepos |
+ | Triage / to-issues / to-prd | Tightly coupled to GitHub/GitLab issue trackers, outside pi-workflow-kit's scope |
+ | setup skill | Scaffolding for issue tracker config — not relevant without issue tracker integration |
+ | caveman / grill-me | Fun but orthogonal to workflow |
+ | Full improve-codebase-architecture | Too heavy (~100+ lines, multiple reference files, depends on CONTEXT.md) |
+ | Parallel sub-agents | Not available in pi currently |
+
+ ## Files changed
+
+ | File | Change |
+ |------|--------|
+ | `skills/brainstorming/SKILL.md` | Add "design it twice" interface sketches, add ADR output |
+ | `skills/writing-plans/SKILL.md` | Add vertical slice guidance with anti-pattern example |
+ | `skills/executing-tasks/SKILL.md` | Add refactoring checklist with deep modules vocabulary |
+ | `skills/diagnose/SKILL.md` | New skill file (6-phase debugging loop) |
+ | `docs/developer-usage-guide.md` | Mention diagnose skill and ADR output |
+ | `docs/workflow-phases.md` | Mention diagnose as a utility skill |
+ | `README.md` | Add diagnose to skills table |
@@ -0,0 +1,315 @@
+ # Implementation Plan: Incorporate mattpocock/skills Ideas
+
+ Design doc: `docs/plans/2026-05-01-incorporate-mattpocock-skills-design.md`
+
+ ## Task 1: Update brainstorming skill — design it twice + ADRs
+
+ <!-- tdd: trivial -->
+ <!-- checkpoint: none -->
+
+ Edit `skills/brainstorming/SKILL.md`:
+
+ **Step 3** — change from:
+
+ ```
+ 3. **Explore approaches** — propose 2-3 approaches with trade-offs. Lead with your recommendation.
+ ```
+
+ to:
+
+ ```
+ 3. **Explore approaches** — propose 2-3 approaches. For each approach, sketch the concrete interface (types, method signatures, example caller code) so the comparison is grounded in actual code, not abstract descriptions. Lead with your recommendation.
+ ```
+
+ **Step 4** — change from:
+
+ ```
+ 4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
+ ```
+
+ to:
+
+ ```
+ 4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
+
+ When a significant architectural decision is identified, offer to write a lightweight ADR to `docs/plans/adr/`. Only write an ADR when all three are true:
+
+ 1. **Hard to reverse** — changing your mind later has meaningful cost
+ 2. **Surprising without context** — a future reader will wonder "why?"
+ 3. **A real trade-off** — there were genuine alternatives
+
+ ADR format — a title and 1-3 sentences covering context, decision, and why:
+
+ ```markdown
+ # <Short title of the decision>
+
+ <1-3 sentences: context, decision, and why.>
+ ```
+
+ ADRs live under `docs/plans/adr/` and are archived during finalizing alongside the design doc.
+ ```
+
+ ```bash
+ git commit -m "feat(brainstorming): add design-it-twice interface sketches and ADR output"
+ ```
+
+ ---
+
+ ## Task 2: Update writing-plans skill — vertical slices
+
+ <!-- tdd: trivial -->
+ <!-- checkpoint: none -->
+
+ Edit `skills/writing-plans/SKILL.md` — add a new section after "## Task format" and before "## TDD in the plan":
+
+ ```markdown
+ ## Vertical slices
+
+ Each task should be a **vertical slice** — a thin path through ALL relevant layers end-to-end, delivering one complete piece of observable behavior.
+
+ ```
+ WRONG (horizontal):
+ Task 1: Create database schema for users
+ Task 2: Write user API endpoints
+ Task 3: Build user UI components
+ Task 4: Wire everything together
+
+ RIGHT (vertical):
+ Task 1: User can sign up (model + endpoint + validation + test)
+ Task 2: User can log in (auth check + token + test)
+ Task 3: User can view profile (query + endpoint + test)
+ ```
+
+ Vertical slices ensure every committed task leaves the codebase in a testable state and reduces the blast radius of a bad task.
+ ```
+
+ ```bash
+ git commit -m "feat(writing-plans): add vertical slice guidance with anti-pattern example"
+ ```
+
+ ---
+
+ ## Task 3: Update executing-tasks skill — deep modules refactoring
+
+ <!-- tdd: trivial -->
+ <!-- checkpoint: none -->
+
+ Edit `skills/executing-tasks/SKILL.md` — add a new section after "## TDD discipline":
+
+ ```markdown
+ ## Refactoring
+
+ After all tests pass for a task, check for refactoring opportunities:
+
+ - **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
+ - **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
+ - **Duplication** — extract repeated patterns
+ - **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
+
+ Run tests after each refactor step. Never refactor while tests are failing.
+
+ Key vocabulary: **depth** (lots of behavior behind a small interface), **seam** (where behavior can be altered without editing in place), **locality** (change concentrated in one place).
+ ```
+
+ ```bash
+ git commit -m "feat(executing-tasks): add refactoring checklist with deep modules vocabulary"
+ ```
+
+ ---
+
+ ## Task 4: Create diagnose skill
+
+ <!-- tdd: trivial -->
+ <!-- checkpoint: none -->
+
+ Create `skills/diagnose/SKILL.md`:
+
+ ```markdown
+ ---
+ name: diagnose
+ description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken."
+ ---
+
+ # Diagnose
+
+ A 6-phase debugging discipline. Phase 1 is the skill — spend disproportionate effort here.
+
+ ## Phase 1 — Build a feedback loop
+
+ Create a fast, deterministic, agent-runnable pass/fail signal for the bug before doing anything else. Try in this order: failing test, curl script, CLI invocation, headless browser script.
+
+ The loop must produce the failure mode the **user** described — not a nearby but different failure. Iterate on the loop itself: can you make it faster? Sharper? More deterministic?
+
+ If you genuinely cannot build a loop, stop and say so. List what you tried. Ask for access to a reproducing environment or a captured artifact.
+
+ Do not proceed until you have a loop you believe in.
+
+ ## Phase 2 — Reproduce
+
+ Run the loop. Confirm:
+ - The failure matches the user's reported symptom
+ - The failure is reproducible across multiple runs
+ - You've captured the exact symptom (error message, wrong output, slow timing)
+
+ ## Phase 3 — Hypothesise
+
+ Generate 3-5 ranked hypotheses. Each must be falsifiable:
+
+ > "If `<X>` is the cause, then `<changing Y>` will make the bug disappear / `<changing Z>` will make it worse."
+
+ Show the ranked list to the user before testing. They often have domain knowledge that re-ranks instantly.
+
+ ## Phase 4 — Instrument
+
+ Each probe must map to a specific hypothesis. Change one variable at a time. Tag every debug log with a unique prefix (e.g. `[DEBUG-a4f2]`) for easy cleanup later. Prefer a debugger breakpoint over logs when available.
+
+ ## Phase 5 — Fix + regression test
+
+ Write the regression test **before** the fix — but only if there's a correct seam (one that exercises the real bug pattern at the call site). If no correct seam exists, note it — the codebase architecture is preventing the bug from being locked down.
+
+ ## Phase 6 — Cleanup
+
+ Required before declaring done:
+ - Original repro no longer triggers
+ - Regression test passes (or absence of seam is documented)
+ - All `[DEBUG-...]` instrumentation removed
+ - Ask: what would have prevented this bug?
+ ```
+
+ ```bash
+ git commit -m "feat(diagnose): add standalone debugging skill with 6-phase loop"
+ ```
+
+ ---
+
+ ## Task 5: Update finalizing skill — archive ADRs
+
+ <!-- tdd: trivial -->
+ <!-- checkpoint: none -->
+
+ Edit `skills/finalizing/SKILL.md` — update step 1 from:
+
+ ```
+ 1. **Move planning docs** — archive the design, implementation, and progress docs, then commit:
+ ```
+ mkdir -p docs/plans/completed
+ mv docs/plans/*-design.md docs/plans/completed/
+ mv docs/plans/*-implementation.md docs/plans/completed/
+ mv docs/plans/*-progress.md docs/plans/completed/
+ git add docs/plans/ && git commit -m "chore: archive planning docs"
+ ```
+ ```
+
+ to:
+
+ ```
+ 1. **Move planning docs** — archive the design, implementation, progress docs, and ADRs (if any), then commit:
+ ```
+ mkdir -p docs/plans/completed
+ mkdir -p docs/plans/completed/adr
+ mv docs/plans/*-design.md docs/plans/completed/
+ mv docs/plans/*-implementation.md docs/plans/completed/
+ mv docs/plans/*-progress.md docs/plans/completed/
+ mv docs/plans/adr/*.md docs/plans/completed/adr/ 2>/dev/null || true
+ rmdir docs/plans/adr 2>/dev/null || true
+ git add docs/plans/ && git commit -m "chore: archive planning docs"
+ ```
+ ```
+
+ ```bash
+ git commit -m "feat(finalizing): archive ADRs alongside planning docs"
+ ```
+
+ ---
+
+ ## Task 6: Update documentation
+
+ <!-- tdd: trivial -->
+ <!-- checkpoint: none -->
+
+ ### README.md
+
+ Update the intro line from:
+
+ ```
+ **4 workflow skills** that guide the agent through a structured development process:
+ ```
+
+ to:
+
+ ```
+ **4 workflow skills** and **1 utility skill** that guide the agent through a structured development process:
+ ```
+
+ Update the pipeline diagram from:
+
+ ```
+ brainstorm → plan → execute → finalize
+ ```
+
+ to:
+
+ ```
+ brainstorm → plan → execute → finalize
+
+ diagnose (on demand)
+ ```
+
+ Add `diagnose` to the skills table:
+
+ ```
+ | `diagnose` | ~35 | 6-phase debugging loop: build feedback loop, reproduce, hypothesise, instrument, fix, cleanup |
+ ```
+
+ Update the Architecture section to include `diagnose/`:
+
+ ```
+ ├── skills/
+ │ ├── brainstorming/SKILL.md
+ │ ├── writing-plans/SKILL.md
+ │ ├── executing-tasks/SKILL.md
+ │ ├── finalizing/SKILL.md
+ │ └── diagnose/SKILL.md
+ ```
+
+ ### docs/developer-usage-guide.md
+
+ Add to the brainstorm section (after "Outcome"):
+
+ ```
+ - Optionally writes ADRs to `docs/plans/adr/` for significant architectural decisions
+ ```
+
+ Add a new section after the 4 workflow phases:
+
+ ```markdown
+ ### 5. Diagnose (on demand)
+
+ ```
+ /skill:diagnose
+ ```
+
+ A 6-phase debugging loop you invoke when something is broken. Build a feedback loop first, then reproduce, hypothesise, instrument, fix, and cleanup. Not a pipeline phase — use whenever needed.
+ ```
+
+ ### docs/workflow-phases.md
+
+ Add a new section at the end:
+
+ ```markdown
+ ## diagnose
+
+ ```
+ /skill:diagnose
+ ```
+
+ Not a pipeline phase. A utility skill invoked on demand when debugging is needed.
+
+ - Build a feedback loop (failing test, curl script, etc.)
+ - Reproduce, hypothesise, instrument, fix, cleanup
+ - No write restrictions (used during execute/finalize, or outside the pipeline)
+ ```
+
+ ```bash
+ git commit -m "docs: update README, usage guide, and workflow phases for new skills"
+ ```
@@ -0,0 +1,15 @@
+ # Progress: incorporate-mattpocock-skills
+
+ Plan: docs/plans/2026-05-01-incorporate-mattpocock-skills-implementation.md
+ Branch: incorporate-mattpocock-skills
+ Started: 2026-05-01T00:00:00Z
+ Last updated: 2026-05-01T00:00:00Z
+
+ | # | Status | Task | Commit |
+ |---|--------|------|--------|
+ | 1 | ✅ done | Update brainstorming skill — design it twice + ADRs | 0231b84 |
+ | 2 | ✅ done | Update writing-plans skill — vertical slices | 22a46df |
+ | 3 | ✅ done | Update executing-tasks skill — deep modules refactoring | c405634 |
+ | 4 | ✅ done | Create diagnose skill | 5e39e2d |
+ | 5 | ✅ done | Update finalizing skill — archive ADRs | e31a1af |
+ | 6 | ✅ done | Update documentation (README, usage guide, workflow phases) | 8c1c4eb |
@@ -1,6 +1,6 @@
  # Workflow Phases
 
- `pi-workflow-kit` has 4 phases. You invoke each one explicitly with `/skill:`.
+ `pi-workflow-kit` has 4 phases and 1 utility skill. You invoke each one explicitly with `/skill:`.
 
  ```
  brainstorm → plan → execute → finalize
@@ -55,3 +55,15 @@ No write restrictions. All tools available.
  - Clean up worktree if one was used
 
  No write restrictions. All tools available.
+
+ ## diagnose
+
+ ```
+ /skill:diagnose
+ ```
+
+ Not a pipeline phase. A utility skill invoked on demand when debugging is needed.
+
+ - Build a feedback loop (failing test, curl script, etc.)
+ - Reproduce, hypothesise, instrument, fix, cleanup
+ - No write restrictions (used during execute/finalize, or outside the pipeline)
@@ -45,7 +45,7 @@ const DESTRUCTIVE_PATTERNS = [
  /\bshutdown\b/i,
  /\bsystemctl\s+(start|stop|restart|enable|disable)/i,
  /\bservice\s+\S+\s+(start|stop|restart)/i,
- /\b(vim?|nano|emacs|code|subl)\b/i,
+ /^\s*(vim?|nano|emacs|code|subl)\b/i,
  ];
 
  const SAFE_PATTERNS = [
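Why the anchor matters: with `\b` on both sides, the old editor pattern matched an editor name anywhere in a command, so a harmless command that merely mentions the word `code` or `nano` was flagged as destructive. The new pattern only matches when the editor is the command itself (optionally after leading whitespace). The sketch runs both patterns, copied from the hunk above, against a few illustrative sample commands.

```typescript
// Both patterns are copied from the hunk above.
const OLD_EDITOR = /\b(vim?|nano|emacs|code|subl)\b/i;
const NEW_EDITOR = /^\s*(vim?|nano|emacs|code|subl)\b/i;

// Illustrative sample commands.
const samples = [
  "vim src/index.ts",
  "  code .",
  'git commit -m "fix code review comments"',
  "grep -rn nano docs/",
  "cat notes.txt",
];

for (const cmd of samples) {
  console.log(`${JSON.stringify(cmd)} old=${OLD_EDITOR.test(cmd)} new=${NEW_EDITOR.test(cmd)}`);
}
// "vim src/index.ts" old=true new=true
// "  code ." old=true new=true
// "git commit -m \"fix code review comments\"" old=true new=false
// "grep -rn nano docs/" old=true new=false
// "cat notes.txt" old=false new=false
```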
@@ -117,7 +117,8 @@ const SAFE_PATTERNS = [
  ];
 
  /** Split a compound command into individual sub-commands.
- * Handles &&, ||, ;, and | (pipe) operators, ignoring leading whitespace.
+ * Splits on &&, ||, and ; operators, ignoring leading whitespace.
+ * Does NOT split on | (pipe) to allow piping (e.g. `git log | head`).
  */
  function splitCompoundCommand(command: string): string[] {
  // Match sub-commands separated by &&, ||, ; (with optional whitespace)
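The body of `splitCompoundCommand` is cut off in this hunk, so the following is not the package's implementation, only a minimal sketch of the documented behavior: split on `&&`, `||`, and `;`, trim each part, and leave pipes alone. Unlike a real shell parser it ignores quoting, so a `;` inside a quoted string would also split; `splitCompoundCommandSketch` is a hypothetical name.

```typescript
// Sketch only (the package's function body is not shown in this hunk):
// split on &&, ||, and ; but leave | (pipe) alone.
// Naive about quoting: a separator inside a quoted string also splits.
function splitCompoundCommandSketch(command: string): string[] {
  return command
    .split(/\s*(?:&&|\|\||;)\s*/) // "||" must be two pipes; a single "|" never matches
    .map((part) => part.trim())
    .filter((part) => part.length > 0); // drop empty parts such as a leading ";"
}

console.log(JSON.stringify(splitCompoundCommandSketch("npm test && git add . ; git commit || echo failed")));
// ["npm test","git add .","git commit","echo failed"]
console.log(JSON.stringify(splitCompoundCommandSketch("git log | head")));
// ["git log | head"]
```

Assuming the guard tests each sub-command against `DESTRUCTIVE_PATTERNS`, one consequence of not splitting on pipes, combined with the new `^\s*` editor anchor, is that an editor at the end of a pipeline (for example `cat notes.txt | vim -`) is no longer matched by the editor pattern.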
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@tianhai/pi-workflow-kit",
- "version": "0.9.0",
+ "version": "0.10.1",
  "description": "Workflow skills and enforcement extensions for pi",
  "keywords": [
  "pi-package"
@@ -11,8 +11,24 @@ Read-only exploration. You may **not** edit or create any files except under `do
 
  1. **Check git state** — run `git status` and `git log --oneline -5`. If there's uncommitted work, ask the user what to do with it first.
  2. **Understand the idea** — read existing code, docs, and recent commits. Ask questions one at a time to refine the idea. Prefer multiple choice when possible.
- 3. **Explore approaches** — propose 2-3 approaches with trade-offs. Lead with your recommendation.
+ 3. **Explore approaches** — propose 2-3 approaches. For each approach, sketch the concrete interface (types, method signatures, example caller code) so the comparison is grounded in actual code, not abstract descriptions. Lead with your recommendation.
  4. **Present the design** — break it into sections of 200-300 words. Check after each section whether it looks right. Cover: architecture, components, data flow, error handling, testing.
+
+ When a significant architectural decision is identified, offer to write a lightweight ADR to `docs/plans/adr/`. Only write an ADR when all three are true:
+
+ 1. **Hard to reverse** — changing your mind later has meaningful cost
+ 2. **Surprising without context** — a future reader will wonder "why?"
+ 3. **A real trade-off** — there were genuine alternatives
+
+ ADR format — a title and 1-3 sentences covering context, decision, and why:
+
+ ```markdown
+ # <Short title of the decision>
+
+ <1-3 sentences: context, decision, and why.>
+ ```
+
+ ADRs live under `docs/plans/adr/` and are archived during finalizing alongside the design doc.
  5. **Write the design doc** — save it to `docs/plans/YYYY-MM-DD-<topic>-design.md`. Ask the user to commit it. Branch creation and worktree setup should be deferred to the execution phase (`/skill:executing-tasks`).
 
  ## Principles
@@ -0,0 +1,56 @@
+ ---
+ name: diagnose
+ description: "Disciplined debugging loop for hard bugs and performance regressions. Use when a test fails unexpectedly, a bug is found during execution, or something is broken."
+ ---
+
+ # Diagnose
+
+ A 6-phase debugging discipline. Phase 1 is the skill — spend disproportionate effort here.
+
+ ## Phase 1 — Build a feedback loop
+
+ Create a fast, deterministic, agent-runnable pass/fail signal for the bug before doing anything else. Try in this order: failing test, curl script, CLI invocation, headless browser script.
+
+ Other strategies when the basics don't work:
+ - **Bisection** — bug appeared between two known states? Automate "boot at state X, check, repeat" to bisect
+ - **Replay** — save a real network request or event log to disk, replay it through the code path in isolation
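The bisection strategy reduces to a binary search over ordered states. For git history, `git bisect run <cmd>` already automates this, using the command's exit code to mark each commit good or bad. The sketch below is the general form, assuming a hypothetical `isBad` check (for example, build at that state and run the feedback loop) and a bug that stays present once introduced.

```typescript
// Find the first "bad" state by binary search.
// Assumes states[0] is known good, the last state is known bad, and the
// bug stays present once introduced (monotonic). The two endpoints are
// taken on trust rather than re-checked.
function firstBadIndex<T>(states: readonly T[], isBad: (state: T) => boolean): number {
  if (states.length < 2) throw new Error("need a known-good and a known-bad state");
  let good = 0; // invariant: states[good] is good
  let bad = states.length - 1; // invariant: states[bad] is bad
  while (bad - good > 1) {
    const mid = good + Math.floor((bad - good) / 2);
    if (isBad(states[mid])) {
      bad = mid;
    } else {
      good = mid;
    }
  }
  return bad;
}

// Hypothetical history: the bug was introduced at c5.
const history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"];
let checks = 0;
const culprit = firstBadIndex(history, (commit) => {
  checks++;
  return Number(commit.slice(1)) >= 5; // stands in for "build and run the loop"
});
console.log(history[culprit], checks); // c5 3
```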
+
+ The loop must produce the failure mode the **user** described — not a nearby but different failure. Iterate on the loop itself: can you make it faster? Sharper? More deterministic?
+
+ If you genuinely cannot build a loop, stop and say so. List what you tried. Ask for access to a reproducing environment or a captured artifact.
+
+ Do not proceed until you have a loop you believe in.
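One way to check that a loop is one "you believe in" is to run it repeatedly and classify the result. The sketch assumes the repro has been wrapped in a function that returns `true` when the bug shows up; `evaluateLoop`, `Verdict`, and the `average` bug are hypothetical examples.

```typescript
// One run of the feedback loop: returns true when the bug shows up.
type Repro = () => boolean;

type Verdict = "reproduces" | "does-not-reproduce" | "flaky";

// Run the loop several times. A loop worth believing in gives the same
// answer every run; mixed results mean the loop itself needs work.
function evaluateLoop(repro: Repro, runs = 10): { verdict: Verdict; hits: number } {
  if (runs < 1) throw new Error("runs must be at least 1");
  let hits = 0;
  for (let i = 0; i < runs; i++) {
    if (repro()) hits++;
  }
  const verdict: Verdict =
    hits === runs ? "reproduces" : hits === 0 ? "does-not-reproduce" : "flaky";
  return { verdict, hits };
}

// Hypothetical bug report: "average of an empty list shows NaN instead of 0".
function average(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// The loop targets the reported symptom exactly: NaN for [].
const emptyListLoop: Repro = () => Number.isNaN(average([]));
console.log(evaluateLoop(emptyListLoop).verdict); // reproduces
```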
+
+ ## Phase 2 — Reproduce
+
+ Run the loop. Confirm:
+ - The failure matches the user's reported symptom
+ - The failure is reproducible across multiple runs
+ - You've captured the exact symptom (error message, wrong output, slow timing)
+
+ Then **minimize the repro** — strip it down to the smallest input, shortest path, or fewest steps that still triggers the bug. A minimized repro dramatically narrows the hypothesis space.
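When the repro takes an input (a file, a request body, a list of events), minimization can be automated. This is a simplified greedy reducer in the spirit of delta debugging, not the full ddmin algorithm: it tries dropping chunks of decreasing size and keeps any smaller input that still fails. `stillFails` is a hypothetical predicate that would wrap the feedback loop.

```typescript
// Greedily shrink an input while the bug still reproduces.
// Simplified take on delta debugging (not the full ddmin algorithm):
// try removing chunks, halving the chunk size when nothing can go.
function minimize<T>(input: readonly T[], stillFails: (candidate: T[]) => boolean): T[] {
  let current = [...input];
  let chunk = Math.max(1, Math.floor(current.length / 2));
  for (;;) {
    let removed = false;
    for (let start = 0; start < current.length; start += chunk) {
      // Each candidate is strictly shorter, so the loop always terminates.
      const candidate = [...current.slice(0, start), ...current.slice(start + chunk)];
      if (stillFails(candidate)) {
        current = candidate;
        removed = true;
        break; // rescan the smaller input with the same chunk size
      }
    }
    if (removed) continue;
    if (chunk === 1) return current;
    chunk = Math.floor(chunk / 2);
  }
}

// Hypothetical bug: a parser crashes only when the input contains both "X" and "Y".
const crashes = (chars: string[]) => chars.includes("X") && chars.includes("Y");
console.log(minimize("abcXdefYghi".split(""), crashes).join("")); // XY
```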
+
+ ## Phase 3 — Hypothesise
+
+ Generate 3-5 ranked hypotheses. Each must be falsifiable:
+
+ > "If `<X>` is the cause, then `<changing Y>` will make the bug disappear / `<changing Z>` will make it worse."
+
+ Show the ranked list to the user before testing. They often have domain knowledge that re-ranks instantly.
+
+ ## Phase 4 — Instrument
+
+ Each probe must map to a specific hypothesis. Change one variable at a time. Tag every debug log with a unique prefix (e.g. `[DEBUG-a4f2]`) for easy cleanup later. Prefer a debugger breakpoint over logs when available.
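A small helper can keep the tag consistent for a whole debugging session and make each probe name its hypothesis. The helper and the `H1`/`H2` labels are hypothetical; only the `[DEBUG-xxxx]` shape comes from the skill text.

```typescript
// Create a logger whose every line carries the same session tag,
// e.g. "[DEBUG-a4f2] H2 cache hit key=user:42".
function makeDebugLogger(
  write: (line: string) => void = console.log,
  // Default tag: 4 random hex digits, matching the "[DEBUG-a4f2]" example.
  tag: string = `DEBUG-${Math.floor(Math.random() * 0x10000).toString(16).padStart(4, "0")}`,
) {
  return {
    tag,
    // Each probe names the hypothesis it is meant to confirm or refute.
    probe(hypothesis: string, message: string): void {
      write(`[${tag}] ${hypothesis} ${message}`);
    },
  };
}

const debugLog = makeDebugLogger();
debugLog.probe("H1", "entering retry loop");
debugLog.probe("H2", "cache hit key=user:42");
```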
+
+ ## Phase 5 — Fix + regression test
+
+ Write the regression test **before** the fix — but only if there's a correct seam (one that exercises the real bug pattern at the call site). If no correct seam exists, note it — the codebase architecture is preventing the bug from being locked down.
+
+ ## Phase 6 — Cleanup
+
+ Required before declaring done:
+ - Original repro no longer triggers
+ - Regression test passes (or absence of seam is documented)
+ - All `[DEBUG-...]` instrumentation removed
+ - Ask: what would have prevented this bug?
+ - If the bug was caused by an architectural problem (no good test seam, tangled callers, hidden coupling), suggest writing an ADR to `docs/plans/adr/` capturing that insight
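The "all instrumentation removed" check is mechanical enough to script. The sketch below is a pure function over file contents; how the files are gathered (for example from `git ls-files`) is left to the caller, and the regex assumes hex tags like the `a4f2` example. From the shell, `grep -rnF "[DEBUG-" .` does a similar job.

```typescript
interface LeftoverProbe {
  file: string;
  line: number; // 1-based
  text: string;
}

// Report every line that still carries a "[DEBUG-<hex>]" tag.
function findLeftoverProbes(files: Record<string, string>): LeftoverProbe[] {
  const tag = /\[DEBUG-[0-9a-f]+\]/i;
  const leftovers: LeftoverProbe[] = [];
  for (const [file, content] of Object.entries(files)) {
    content.split("\n").forEach((text, i) => {
      if (tag.test(text)) leftovers.push({ file, line: i + 1, text: text.trim() });
    });
  }
  return leftovers;
}

// Hypothetical file contents.
const found = findLeftoverProbes({
  "src/cache.ts": 'const hit = lookup(key);\nconsole.log("[DEBUG-a4f2] H2 hit", hit);\n',
  "src/index.ts": "main();\n",
});
for (const p of found) console.log(`${p.file}:${p.line}: ${p.text}`);
// src/cache.ts:2: console.log("[DEBUG-a4f2] H2 hit", hit);
```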
@@ -155,6 +155,19 @@ Follow the TDD scenario from the plan:
 
  Don't skip tests because "it's obvious." The test is the contract.
 
+ ## Refactoring
+
+ After all tests pass for a task, check for refactoring opportunities:
+
+ - **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
+ - **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
+ - **Duplication** — extract repeated patterns
+ - **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
+
+ Run tests after each refactor step. Never refactor while tests are failing.
+
+ Key vocabulary: **depth** (lots of behavior behind a small interface), **seam** (where behavior can be altered without editing in place), **locality** (change concentrated in one place).
+
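The seam rule ("one adapter = hypothetical seam, two adapters = real seam") in miniature: a hypothetical `Clock` interface earns its place only because there are two implementations, one for production and one for tests. The names are illustrative, not from the package.

```typescript
// The seam: what "now" means can change without editing callers.
interface Clock {
  now(): number; // milliseconds since the epoch
}

// Adapter 1: production.
class SystemClock implements Clock {
  now(): number {
    return Date.now();
  }
}

// Adapter 2: tests. Having a second adapter is what makes the seam real.
class FixedClock implements Clock {
  private current: number;
  constructor(start: number) {
    this.current = start;
  }
  now(): number {
    return this.current;
  }
  advance(ms: number): void {
    this.current += ms;
  }
}

// Depends on the seam instead of calling Date.now() directly.
function isExpired(clock: Clock, issuedAtMs: number, ttlMs: number): boolean {
  return clock.now() - issuedAtMs >= ttlMs;
}

const testClock = new FixedClock(1_000);
console.log(isExpired(testClock, 1_000, 500)); // false
testClock.advance(500);
console.log(isExpired(testClock, 1_000, 500)); // true
console.log(isExpired(new SystemClock(), 0, 500)); // true
```

With only `SystemClock`, the interface would fail the deletion test and `isExpired` could call `Date.now()` directly; the second adapter is what lets a test alter "now" without editing `isExpired` in place.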
  ## Batching and session management
 
  The agent suggests a fresh session at natural break points to minimize token accumulation. After completing ~3-5 non-checkpoint tasks in the same session, suggest:
@@ -21,12 +21,15 @@ Wait for the user to confirm before proceeding.
 
  ## Process
 
- 1. **Move planning docs** — archive the design, implementation, and progress docs, then commit:
+ 1. **Move planning docs** — archive the design, implementation, progress docs, and ADRs (if any), then commit:
  ```
  mkdir -p docs/plans/completed
+ mkdir -p docs/plans/completed/adr
  mv docs/plans/*-design.md docs/plans/completed/
  mv docs/plans/*-implementation.md docs/plans/completed/
  mv docs/plans/*-progress.md docs/plans/completed/
+ mv docs/plans/adr/*.md docs/plans/completed/adr/ 2>/dev/null || true
+ rmdir docs/plans/adr 2>/dev/null || true
  git add docs/plans/ && git commit -m "chore: archive planning docs"
  ```
 
@@ -44,6 +44,25 @@ These comments are optional — if omitted, the agent infers TDD scenario and ch
  Also use the `<!-- tdd: ... -->` and `<!-- checkpoint: ... -->` metadata comments to specify options explicitly. The inline `checkpoint: test` / `checkpoint: done` label format (e.g. in a task list) is also supported as a fallback, but the metadata comment is the canonical source.
 
 
+ ## Vertical slices
+
+ Each task should be a **vertical slice** — a thin path through ALL relevant layers end-to-end, delivering one complete piece of observable behavior.
+
+ ```
+ WRONG (horizontal):
+ Task 1: Create database schema for users
+ Task 2: Write user API endpoints
+ Task 3: Build user UI components
+ Task 4: Wire everything together
+
+ RIGHT (vertical):
+ Task 1: User can sign up (model + endpoint + validation + test)
+ Task 2: User can log in (auth check + token + test)
+ Task 3: User can view profile (query + endpoint + test)
+ ```
+
+ Vertical slices ensure every committed task leaves the codebase in a testable state and reduces the blast radius of a bad task.
+
  ## TDD in the plan
 
  Label each task with its TDD scenario: