@tianhai/pi-workflow-kit 0.16.0 → 0.18.1

Files changed (27)
  1. package/README.md +51 -39
  2. package/docs/developer-usage-guide.md +32 -14
  3. package/docs/lessons.md +18 -0
  4. package/docs/oversight-model.md +11 -5
  5. package/docs/plans/2026-06-03-karpathy-guidelines-ab-comparison.md +166 -0
  6. package/docs/plans/completed/2026-06-03-add-verify-skill-design.md +51 -0
  7. package/docs/plans/completed/2026-06-03-add-verify-skill-implementation.md +111 -0
  8. package/docs/plans/completed/2026-06-03-add-verify-skill-progress.md +11 -0
  9. package/docs/plans/completed/2026-06-03-verify-skill-design.md +176 -0
  10. package/docs/plans/completed/2026-06-09-code-review-fixes-implementation.md +74 -0
  11. package/docs/plans/completed/2026-06-09-code-review-fixes-progress.md +14 -0
  12. package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-design.md +186 -0
  13. package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-implementation.md +675 -0
  14. package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-progress.md +18 -0
  15. package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-verification-report.md +81 -0
  16. package/docs/plans/completed/2026-06-09-verification-fixes-implementation.md +69 -0
  17. package/docs/plans/completed/2026-06-09-verification-fixes-progress.md +14 -0
  18. package/docs/workflow-phases.md +19 -13
  19. package/extensions/workflow-guard.ts +10 -9
  20. package/package.json +2 -1
  21. package/skills/{brainstorming → pwk-brainstorming}/SKILL.md +17 -6
  22. package/skills/{design-review → pwk-design-review}/SKILL.md +11 -9
  23. package/skills/{diagnose → pwk-diagnose}/SKILL.md +1 -1
  24. package/skills/{executing-tasks → pwk-executing-tasks}/SKILL.md +46 -16
  25. package/skills/{finalizing → pwk-finalizing}/SKILL.md +9 -2
  26. package/skills/pwk-verify/SKILL.md +170 -0
  27. package/skills/{writing-plans → pwk-writing-plans}/SKILL.md +72 -6
package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # pi-workflow-kit
2
2
 
3
- > Stop AI agents from rushing to code. Enforce a structured brainstorm→plan→execute→finalize workflow with TDD discipline.
3
+ > Stop AI agents from rushing to code. Enforce a structured brainstorm→plan→execute→verify→finalize workflow with TDD discipline.
4
4
 
5
5
  AI coding agents tend to skip design and jump straight into implementation, producing over-engineered or misaligned code. **pi-workflow-kit** solves this by hard-blocking write operations during brainstorm and planning phases — the agent *literally cannot modify your source files* until you approve the design.
6
6
 
@@ -28,29 +28,30 @@ Enforces phase-appropriate tool access — not just guidelines, but hard blocks:
28
28
 
29
29
  | Phase | `write` / `edit` | `bash` |
30
30
  |-------|:-:|:-:|
31
- | **Brainstorm** / **Plan** | 🔒 Blocked outside `docs/plans/` | 🔒 Read-only only (grep, find, cat, git status, curl…) |
31
+ | **Brainstorm** / **Plan** / **Verify** | 🔒 Blocked outside `docs/plans/` | 🔒 Read-only only (grep, find, cat, git status, curl…) |
32
32
  | **Execute** / **Finalize** | ✅ Full access | ✅ Full access |
33
33
 
34
34
  The agent can read code and discuss design with you during brainstorm/plan, but it physically cannot modify source files or run mutating commands.
35
35
 
36
- ### 🧠 6 Workflow Skills
36
+ ### 🧠 7 Workflow Skills
37
37
 
38
38
  Guide the agent through a disciplined development process:
39
39
 
40
- ```
41
- brainstorm → design-review → plan → execute → finalize
42
-
43
- diagnose (anytime)
44
- ```
40
+ brainstorm → plan → [design-review?] → execute → [verify?] → finalize
41
+
42
+ diagnose (anytime)
43
+
44
+ For multi-feature designs, the plan→execute loop repeats per feature.
45
45
 
46
46
  | Phase | Trigger | What Happens |
47
47
  |-------|---------|--------------|
48
- | **Brainstorm** | `/skill:brainstorming` | Explore approaches, debate tradeoffs, produce a design doc |
49
- | **Design Review** | `/skill:design-review` | Audit design for production risks (security, scalability, fault tolerance) |
50
- | **Plan** | `/skill:writing-plans` | Break design into bite-sized TDD tasks with acceptance criteria and concrete code |
51
- | **Execute** | `/skill:executing-tasks` | Implement tasks one-by-one with TDD discipline and pre-commit checkpoint review gates |
52
- | **Finalize** | `/skill:finalizing` | Archive plan docs, update README/CHANGELOG, create PR |
53
- | **Diagnose** | `/skill:diagnose` | 6-phase debugging loop: reproduce → hypothesize → instrument → fix → verify |
48
+ | **Brainstorm** | `/skill:pwk-brainstorming` | Explore approaches, debate tradeoffs, produce a design doc with a Features table |
49
+ | **Design Review** | `/skill:pwk-design-review` | Audit plan and design for production risks (security, scalability, fault tolerance) |
50
+ | **Plan** | `/skill:pwk-writing-plans` | Plan one feature at a time from the Features table — bite-sized TDD tasks with acceptance criteria |
51
+ | **Execute** | `/skill:pwk-executing-tasks` | Implement tasks one-by-one with TDD discipline and pre-commit checkpoint review gates |
52
+ | **Verify** | `/skill:pwk-verify` | Three expert review passes (security, optimization, traceability) on implemented code |
53
+ | **Finalize** | `/skill:pwk-finalizing` | Archive plan docs, update README/CHANGELOG, create PR |
54
+ | **Diagnose** | `/skill:pwk-diagnose` | 6-phase debugging loop: reproduce → hypothesize → instrument → fix → verify |
54
55
 
55
56
  ## The Workflow in Detail
56
57
 
@@ -58,13 +59,24 @@ brainstorm → design-review → plan → execute → finalize
58
59
 
59
60
  You control each phase — the agent never advances on its own. Invoke a skill to move forward:
60
61
 
61
- ```
62
- /skill:brainstorming → discuss and design
63
- /skill:design-review → audit for production risks (non-trivial designs)
64
- /skill:writing-plans → break into tasks
65
- /skill:executing-tasks → implement with TDD
66
- /skill:finalizing → ship it
67
- ```
62
+ /skill:pwk-brainstorming → discuss and design (names features)
63
+ /skill:pwk-writing-plans → plan next feature from the Features table
64
+ /skill:pwk-design-review → audit for production risks (on demand)
65
+ /skill:pwk-executing-tasks → implement with TDD
66
+ /skill:pwk-verify → review code for security, optimization, and traceability
67
+ /skill:pwk-finalizing → ship it
68
+
69
+ ### Feature-Based Planning
70
+
71
+ Design docs include a `## Features` table that tracks each feature's status:
72
+
73
+ | # | Feature | Status | Notes |
74
+ |---|---------|--------|-------|
75
+ | 1 | User signup | ✅ done | |
76
+ | 2 | Email verification | 🔄 planned | Plan: docs/plans/...-email-verification-implementation.md |
77
+ | 3 | Password reset | ⬜ pending | |
78
+
79
+ This enables incremental development — plan and execute one feature at a time, then loop back for the next.
68
80
 
69
81
  ### TDD Three-Scenario Model
70
82
 
@@ -112,24 +124,23 @@ Optionally label tasks with a `checkpoint` to pause for human review. At each ch
112
124
  pi install npm:@tianhai/pi-workflow-kit
113
125
 
114
126
  # Start a new feature
115
- > /skill:brainstorming
127
+ > /skill:pwk-brainstorming
116
128
  > I want to add OAuth2 login to our API
117
129
 
118
- # (agent explores approaches, writes design doc)
130
+ # (agent explores approaches, writes design doc with Features table)
119
131
  # (write/edit are blocked — your code is safe)
120
132
 
121
- > /skill:design-review
133
+ > /skill:pwk-writing-plans
122
134
 
123
- # (agent audits for security, scalability, fault tolerance)
124
- # (trivial changes can skip this step)
125
-
126
- > /skill:writing-plans
127
-
128
- # (agent breaks design into TDD tasks with acceptance criteria)
129
- > /skill:executing-tasks
135
+ # (agent picks next feature, breaks into TDD tasks)
136
+ # (triggers design review for non-trivial features)
137
+ > /skill:pwk-executing-tasks
130
138
 
131
139
  # (agent implements with TDD, cognitive persona shifts, all tools unlocked)
132
- > /skill:finalizing
140
+ > /skill:pwk-verify
141
+
142
+ # (agent runs security, optimization, and traceability reviews on implemented code)
143
+ > /skill:pwk-finalizing
133
144
 
134
145
  # (agent archives docs, curates lessons, creates PR)
135
146
  ```
@@ -146,14 +157,15 @@ pi install npm:@tianhai/pi-workflow-kit
146
157
  ```
147
158
  pi-workflow-kit/
148
159
  ├── extensions/
149
- │ └── workflow-guard.ts # Write blocker during brainstorm/plan
160
+ │ └── workflow-guard.ts # Write blocker during brainstorm/plan/verify
150
161
  ├── skills/
151
- │ ├── brainstorming/SKILL.md
152
- │ ├── design-review/SKILL.md
153
- │ ├── writing-plans/SKILL.md
154
- │ ├── executing-tasks/SKILL.md
155
- │ ├── finalizing/SKILL.md
156
- │ └── diagnose/SKILL.md
162
+ │ ├── pwk-brainstorming/SKILL.md
163
+ │ ├── pwk-design-review/SKILL.md
164
+ │ ├── pwk-writing-plans/SKILL.md
165
+ │ ├── pwk-executing-tasks/SKILL.md
166
+ │ ├── pwk-verify/SKILL.md
167
+ │ ├── pwk-finalizing/SKILL.md
168
+ │ └── pwk-diagnose/SKILL.md
157
169
  ├── tests/
158
170
  │ └── workflow-guard.test.ts
159
171
  ├── package.json
@@ -4,8 +4,9 @@ How to install and use `pi-workflow-kit` with the Pi coding agent.
4
4
 
5
5
  ## What you get
6
6
 
7
- - **4 skills** that guide the agent through a structured workflow
8
- - **1 extension** that hard-blocks source writes during brainstorm and plan phases
7
+ - **4 workflow skills** that guide the agent through a structured feature-based workflow
8
+ - **3 on-demand skills** for design review, verification, and debugging
9
+ - **1 extension** that hard-blocks source writes during brainstorm, plan, and verify phases
9
10
 
10
11
  ## Installation
11
12
 
@@ -31,54 +32,70 @@ Or in `.pi/settings.json` / `~/.pi/agent/config.json`:
31
32
 
32
33
  ## The workflow
33
34
 
34
- You control each phase by invoking the skill:
35
+ You control each phase by invoking the skill. For multi-feature designs, the plan→execute loop repeats per feature:
35
36
 
36
37
  ```
37
- /skill:brainstorming → /skill:writing-plans → /skill:executing-tasks → /skill:finalizing
38
+ /skill:pwk-brainstorming → /skill:pwk-writing-plans → /skill:pwk-executing-tasks → loop or /skill:pwk-finalizing
38
39
  ```
39
40
 
40
41
  ### 1. Brainstorm
41
42
 
42
43
  ```
43
- /skill:brainstorming
44
+ /skill:pwk-brainstorming
44
45
  ```
45
46
 
46
47
  Explore the idea through collaborative dialogue. The agent reads code, asks questions one at a time, proposes 2-3 approaches, and presents the design in sections for your review.
47
48
 
48
- Outcome: `docs/plans/YYYY-MM-DD-<topic>-design.md`
49
+ Outcome: `docs/plans/YYYY-MM-DD-<topic>-design.md` with a `## Features` table
49
50
 
50
51
  Optionally writes ADRs to `docs/plans/adr/` for significant architectural decisions.
51
52
 
52
53
  ### 2. Plan
53
54
 
54
55
  ```
55
- /skill:writing-plans
56
+ /skill:pwk-writing-plans
56
57
  ```
57
58
 
58
- Read the design doc and break it into bite-sized tasks with exact file paths, complete code, and TDD scenarios. Optionally set up a branch or worktree.
59
+ Read the design doc's Features table, pick the next `⬜ pending` feature, and create a per-feature implementation plan with exact file paths, complete code, and TDD scenarios. Optionally set up a branch or worktree.
59
60
 
60
- Outcome: `docs/plans/YYYY-MM-DD-<topic>-implementation.md`
61
+ Outcome: `docs/plans/YYYY-MM-DD-<topic>-<feature-name>-implementation.md`
61
62
 
62
63
  ### 3. Execute
63
64
 
64
65
  ```
65
- /skill:executing-tasks
66
+ /skill:pwk-executing-tasks
66
67
  ```
67
68
 
68
- Implement the plan task-by-task. Each task: implement → run tests → fix if needed → commit.
69
+ Implement the plan task-by-task. Each task: implement → run tests → fix if needed → commit. When the feature is done, marks it `✅ done` in the design doc and suggests planning the next feature.
69
70
 
70
71
  ### 4. Finalize
71
72
 
72
73
  ```
73
- /skill:finalizing
74
+ /skill:pwk-finalizing
74
75
  ```
75
76
 
76
77
  Archive plan docs, update CHANGELOG/README, create PR, clean up worktree.
77
78
 
78
- ### 5. Diagnose (on demand)
79
+ ### 5. Design Review (on demand)
79
80
 
80
81
  ```
81
- /skill:diagnose
82
+ /skill:pwk-design-review
83
+ ```
84
+
85
+ Audit a plan doc for production risks — security, scalability, fault tolerance, and operational hazards. Triggered by writing-plans for non-trivial features. Review findings append to the plan doc, not the design doc.
86
+
87
+ ### 6. Verify (on demand)
88
+
89
+ ```
90
+ /skill:pwk-verify
91
+ ```
92
+
93
+ Post-implementation verification with three expert passes — security, optimization, and traceability. Run after executing a feature or before finalizing.
94
+
95
+ ### 7. Diagnose (on demand)
96
+
97
+ ```
98
+ /skill:pwk-diagnose
82
99
  ```
83
100
 
84
101
  A 6-phase debugging loop you invoke when something is broken. Build a feedback loop first, then reproduce, hypothesise, instrument, fix, and clean up. Not a pipeline phase — use whenever needed.
@@ -88,6 +105,7 @@ A 6-phase debugging loop you invoke when something is broken. Build a feedback l
88
105
  The `workflow-guard` extension watches `write` and `edit` tool calls:
89
106
 
90
107
  - **During brainstorm and plan**: blocks writes outside `docs/plans/`. The agent can read code and use bash, but cannot modify source files.
108
+ - **During verify**: same read-only enforcement — the agent can inspect code but not modify it.
91
109
  - **During execute and finalize**: no restrictions. All tools available.
92
110
 
93
111
  No configuration needed. It activates automatically after install.
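The Plan step in the guide above reads the design doc's `## Features` table and picks the next `⬜ pending` feature. The sketch below shows one way that selection could work in TypeScript. The names `FeatureRow`, `parseFeaturesTable`, and `nextPendingFeature` are hypothetical and do not come from the package; in the kit itself the selection is done by the agent following the skill's instructions, not by code like this.

```typescript
// Hypothetical helper types and functions, for illustration only.
type FeatureStatus = "done" | "planned" | "pending";

interface FeatureRow {
  index: number;
  name: string;
  status: FeatureStatus;
  notes: string;
}

// Parse the data rows of a `## Features` markdown table.
// Assumes the four columns shown in the README: #, Feature, Status, Notes.
function parseFeaturesTable(markdown: string): FeatureRow[] {
  const statuses = ["done", "planned", "pending"] as const;
  const rows: FeatureRow[] = [];
  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("|")) continue;
    // Drop the outer pipes, then split the remainder into cells.
    const cells = trimmed
      .replace(/^\|/, "")
      .replace(/\|$/, "")
      .split("|")
      .map((cell) => cell.trim());
    if (cells.length < 4) continue;
    const index = Number.parseInt(cells[0], 10);
    if (Number.isNaN(index)) continue; // header and separator rows
    const status = statuses.find((s) => cells[2].includes(s));
    if (status === undefined) continue; // unknown status text
    rows.push({ index, name: cells[1], status, notes: cells[3] });
  }
  return rows;
}

// Return the first feature still marked pending, if any.
function nextPendingFeature(rows: FeatureRow[]): FeatureRow | undefined {
  return rows.find((row) => row.status === "pending");
}
```

Given the example table from the README, `nextPendingFeature` would return the "Password reset" row, since it is the first one still marked `⬜ pending`. The parser does not handle pipes inside the Notes column; a real implementation would need to.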
@@ -0,0 +1,18 @@
1
+ # Lessons Learned
2
+
3
+ <!--
4
+ Agent: read this at the start of each task during executing-tasks.
5
+ Follow every rule. Add new rules when you catch yourself making repeat mistakes.
6
+ Rules must be generic patterns applicable to any domain or feature — not specific to one service, entity, or use case.
7
+ Retire rules that no longer apply during finalizing.
8
+ -->
9
+
10
+ ## Cross-Skill Consistency
11
+
12
+ - When adding instructions that reference artifacts from another skill (e.g., "extract metadata from plan doc"), always add a guard for when that artifact doesn't exist — not all workflows use all artifacts
13
+ - When reordering instructions within a step, verify all conditional branches still reference the correct context (e.g., hazard checks that say "this feature" must run after feature identification)
14
+
15
+ ## Documentation
16
+
17
+ - When adding a new phase to an extension, update ALL comments and error messages — stale comments in one place create confusion about the actual behavior
18
+ - When renaming skills with a prefix, check for `/skill:` references in prose and code blocks separately — backtick-enclosed references in code examples may use a different pattern than prose references
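The last lesson above, about checking `/skill:` references after a prefix rename, can be partly automated. The following is a minimal sketch, assuming the six old skill names listed as renamed in this diff. `findStaleSkillRefs` and `OLD_SKILL_NAMES` are hypothetical names, not part of the package.

```typescript
// Old skill names, taken from the rename entries in this diff's file list.
const OLD_SKILL_NAMES: ReadonlySet<string> = new Set([
  "brainstorming",
  "design-review",
  "diagnose",
  "executing-tasks",
  "finalizing",
  "writing-plans",
]);

interface StaleRef {
  line: number; // 1-based line number within the scanned text
  ref: string; // the matched reference, e.g. "/skill:finalizing"
}

// Scan text for `/skill:<name>` references that still use an old,
// unprefixed name. Prose and backtick-enclosed references are treated
// alike because the pattern ignores the surrounding characters.
function findStaleSkillRefs(text: string): StaleRef[] {
  const hits: StaleRef[] = [];
  text.split("\n").forEach((lineText, i) => {
    const pattern = /\/skill:([a-z][a-z-]*)/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(lineText)) !== null) {
      if (OLD_SKILL_NAMES.has(match[1])) {
        hits.push({ line: i + 1, ref: match[0] });
      }
    }
  });
  return hits;
}
```

Archived plan docs under `docs/plans/completed/` intentionally keep the old names as a historical record, so a check like this would probably be limited to live documentation and skill files.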
@@ -6,10 +6,16 @@
6
6
 
7
7
  Skills teach the agent the workflow. There are 4:
8
8
 
9
- - **brainstorming** — explore ideas, produce a design doc
10
- - **writing-plans** — break design into TDD tasks
11
- - **executing-tasks** — implement tasks, handle code review
12
- - **finalizing** — archive docs, create PR
9
+ - **pwk-brainstorming** — explore ideas, produce a design doc with a Features table
10
+ - **pwk-writing-plans** — plan one feature at a time from the Features table
11
+ - **pwk-executing-tasks** — implement tasks, mark features done, loop to next feature
12
+ - **pwk-finalizing** — archive docs, create PR
13
+
14
+ Plus 3 on-demand skills:
15
+
16
+ - **pwk-design-review** — audit a plan doc for production risks (triggered by writing-plans)
17
+ - **pwk-verify** — post-implementation verification with security, optimization, and traceability passes
18
+ - **pwk-diagnose** — 6-phase debugging loop
13
19
 
14
20
  They explain *what* to do and *when* to do it.
15
21
 
@@ -17,7 +23,7 @@ They explain *what* to do and *when* to do it.
17
23
 
18
24
  The `workflow-guard` extension enforces one rule:
19
25
 
20
- > During brainstorm and plan phases, `write` and `edit` are **hard-blocked** outside `docs/plans/`.
26
+ > During brainstorm, plan, and verify phases, `write` and `edit` are **hard-blocked** outside `docs/plans/`.
21
27
 
22
28
  The agent can still use `read` and `bash` for investigation. It literally cannot call `write` or `edit` on source files — the tools are blocked at the extension level.
23
29
 
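The enforcement rule quoted above can be illustrated with a short TypeScript sketch. This is not the contents of `workflow-guard.ts`, which this diff shows only as a changed file; the `Phase` and `ToolName` types, the `isWriteAllowed` function, and the path handling are assumptions made for illustration. It assumes project-relative paths and models only the `write`/`edit` rule, not the read-only `bash` filtering mentioned in the README.

```typescript
// Hypothetical model of the guard rule; names are illustrative only.
type Phase = "brainstorm" | "plan" | "verify" | "execute" | "finalize";
type ToolName = "read" | "bash" | "write" | "edit";

const READ_ONLY_PHASES: ReadonlySet<Phase> = new Set<Phase>([
  "brainstorm",
  "plan",
  "verify",
]);
const ALLOWED_PREFIX = "docs/plans/";

// Resolve "." and ".." segments of a project-relative path.
// Returns null for absolute paths or paths that climb above the root.
function normalizeRelative(path: string): string | null {
  const unified = path.replace(/\\/g, "/");
  if (unified.startsWith("/")) return null;
  const out: string[] = [];
  for (const segment of unified.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (out.length === 0) return null; // escapes the project root
      out.pop();
      continue;
    }
    out.push(segment);
  }
  return out.join("/");
}

// Decide whether a tool call may proceed in the given phase.
function isWriteAllowed(phase: Phase, tool: ToolName, filePath: string): boolean {
  if (tool !== "write" && tool !== "edit") return true; // only writes are gated here
  if (!READ_ONLY_PHASES.has(phase)) return true; // execute and finalize are unrestricted
  const normalized = normalizeRelative(filePath);
  return normalized !== null && normalized.startsWith(ALLOWED_PREFIX);
}
```

Under these assumptions, a `write` to `docs/plans/notes.md` during the plan phase is allowed, while a `write` to `src/index.ts`, or to `docs/plans/../../src/index.ts`, is rejected. Normalizing before the prefix check is what stops `..` segments from bypassing the rule.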
@@ -0,0 +1,166 @@
1
+ # A/B Comparison: Writing Plans — Karpathy Behavioral Guidelines
2
+
3
+ ## Setup
4
+ - **Same design doc** (bookmarks: CRUD + search)
5
+ - **Same Go project scaffold**
6
+ - **Same prompt** (no questions, full plan with concrete code)
7
+ - **Variant A** (WITHOUT guidelines): 292-line SKILL.md — original writing-plans skill
8
+ - **Variant B** (WITH guidelines): 354-line SKILL.md — with Behavioral Guidelines section appended
9
+
10
+ ---
11
+
12
+ ## Structural Comparison
13
+
14
+ | Dimension | A (Without) | B (With) |
15
+ |---|---|---|
16
+ | **Total tasks** | 4 | 6 |
17
+ | **Lines in plan** | ~1,054 | ~1,019 |
18
+ | **New files per plan** | 7 files in Task 1 alone | 1-2 files per task |
19
+ | **External dependency** | None (stdlib only) | `github.com/google/uuid` |
20
+
21
+ ---
22
+
23
+ ## Task Decomposition
24
+
25
+ ### A (Without) — 4 tasks
26
+ | Task | Scope | Files touched |
27
+ |---|---|---|
28
+ | 1 | Bookmark + ALL infrastructure (model, store interface, mem store with full CRUD, service, handler, errors, route, tests) | 7 files |
29
+ | 2 | Delete bookmark | 3 files |
30
+ | 3 | List bookmarks (paginated, cursor) | 3 files |
31
+ | 4 | Search bookmarks (keyword + pagination) | 3 files |
32
+
33
+ ### B (With) — 6 tasks
34
+ | Task | Scope | Files touched |
35
+ |---|---|---|
36
+ | 1 | Scaffold (go.mod + model only) | 2 files |
37
+ | 2 | Bookmark a message (store + handler + test + route) | 4 files |
38
+ | 3 | List bookmarks (offset/limit pagination) | 4 files |
39
+ | 4 | Remove a bookmark | 4 files |
40
+ | 5 | Search bookmarks (keyword) | 4 files |
41
+ | 6 | Final wiring + integration lifecycle test | 2 files |
42
+
43
+ ---
44
+
45
+ ## Detailed Analysis by Guideline
46
+
47
+ ### Simplicity First
48
+
49
+ **A (Without):** ⚠️ **Overbuilt in Task 1.** Task 1 creates a `BookmarkStore` interface with 4 methods (Create, Delete, ListByUser, SearchByUser) — methods that won't be used until Tasks 2-4. It also creates the full `MemoryStore` implementation with all 4 methods, an `errors.go` file, a `Service` struct, AND the handler — all in a single task. The store interface is the full contract upfront before any task exercises most of it.
50
+
51
+ **B (With):** ✅ **Minimal per task.** Task 1 only creates `go.mod` + the `Bookmark` struct. Task 2 introduces `Store` with only `Create`, and `MemStore` with only `Create`. `List` is added to the interface in Task 3, `Delete` in Task 4, `Search` in Task 5 — each method appears when it's needed, not before.
52
+
53
+ **Verdict:** Guidelines had a clear positive effect. Plan B builds only what each task needs.
54
+
55
+ ### Surgical Changes
56
+
57
+ **A (Without):** ⚠️ Task 1 touches 7 files in one go (model, store interface, store mem, errors, service, handler, main.go). The Task 1 description says "create the full vertical slice" which bundles infrastructure that isn't tested yet.
58
+
59
+ **B (With):** ✅ Each task touches 1-2 files for new code. Task 1 creates 2 files (go.mod, model.go). Task 2 adds 3 new files + modifies main.go. No task creates more than 4 files.
60
+
61
+ **Verdict:** Guidelines had a clear positive effect. Plan B has tighter blast radius per task.
62
+
63
+ ### Think Before Coding (surface assumptions)
64
+
65
+ **A (Without):** ❌ Silent assumptions throughout:
66
+ - Used cursor-based pagination without noting the design just said "paginated" — didn't surface that offset-based vs cursor-based is a choice
67
+ - Added `sync.RWMutex` and concurrent safety without the design mentioning concurrency
68
+ - Created a `Service` layer between handler and store without justification
69
+
70
+ **B (With):** ⚠️ Still has assumptions but more defensible:
71
+ - Used offset/limit pagination (simpler, matches "paginated" literally)
72
+ - No concurrency concerns added (store uses `sync.Mutex` only, no RWMutex overhead)
73
+ - No `Service` layer — handler calls store directly
74
+ - Did add `github.com/google/uuid` dependency without asking — minor assumption
75
+
76
+ **Verdict:** Marginal positive effect. Plan B is less presumptuous but both plans made assumptions. Neither explicitly surfaced tradeoffs to the user.
77
+
78
+ ### Goal-Driven Execution
79
+
80
+ **A (Without):** ✅ Good acceptance criteria with Given/When/Then. Has a `checkpoint: test` on 3/4 tasks and `checkpoint: done` on the last task.
81
+
82
+ **B (With):** ✅ Good acceptance criteria. Has `checkpoint: test` on 3 tasks, `checkpoint: done` on 1, and no checkpoint on 2 simpler tasks. Added a full lifecycle integration test in Task 6 that wasn't in A.
83
+
84
+ **Verdict:** Roughly equivalent. Both plans have strong acceptance criteria (required by the base skill). The lifecycle test in B is a nice bonus that catches integration issues.
85
+
86
+ ---
87
+
88
+ ## Unrelated Observations (noise, not guidelines)
89
+
90
+ | Observation | A (Without) | B (With) |
91
+ |---|---|---|
92
+ | Pagination style | Cursor-based (more complex) | Offset-based (simpler) |
93
+ | External deps | None | `google/uuid` |
94
+ | Handler method naming | `Create`, `Delete`, `List`, `Search` | `CreateBookmark`, `DeleteBookmark`, `ListBookmarks`, `SearchBookmarks` |
95
+ | Test structure | Single `TestXxx` with `t.Run` subtests | Separate top-level test functions |
96
+ | `make([]T, 0, len)` usage | Yes (mem store candidates) | Yes (list handler, search handler) |
97
+
98
+ ---
99
+
100
+ ## Overall Assessment
101
+
102
+ | Guideline | Effect | Evidence |
103
+ |---|---|---|
104
+ | **Simplicity First** | ✅ Strong positive | B builds incrementally; A front-loads the full store interface |
105
+ | **Surgical Changes** | ✅ Positive | B touches fewer files per task (1-4 vs 7 in Task 1) |
106
+ | **Think Before Coding** | ⚠️ Marginal | B made fewer silent assumptions but neither surfaced tradeoffs explicitly |
107
+ | **Goal-Driven Execution** | ≈ Neutral | Both strong; base skill already enforces acceptance criteria |
108
+
109
+ **Bottom line (iteration 1):** The guidelines measurably improved the plan. The biggest win is **Simplicity First** — Plan B's incremental interface growth (adding methods to `Store` as each task needs them) is clearly better than Plan A's upfront full-contract approach. This is exactly the kind of thing "no abstractions for single-use code" catches.
110
+
111
+ **Weakness:** Neither plan explicitly called out assumptions or asked clarifying questions — the "Think Before Coding" guideline had the weakest signal. The guidelines alone may not be enough to overcome the model's tendency to fill gaps silently.
112
+
113
+ ---
114
+
115
+ ## Iteration 2: Revised Guidelines
116
+
117
+ ### What changed
118
+
119
+ The guidelines were reworked from 4 generic coding rules to 3 planning-specific principles:
120
+
121
+ | v1 (Generic) | v2 (Planning-Specific) | Why |
122
+ |---|---|---|
123
+ | Think Before Coding | **Surface Assumptions** | v1 said "ask" — the agent ignores this when told not to ask. v2 says "annotate in the plan" with a concrete `> **Assumption:** ...` format and examples of what to annotate. |
124
+ | Simplicity First | **Build Only What Each Task Needs** | Kept the same core principle but added the specific anti-pattern from the v1 A/B test: "don't define interface methods that no task exercises yet." |
125
+ | Surgical Changes | **One Task, One Change** | Reframed from "don't touch adjacent code" to "each task should trace to exactly one user-facing behavior" with a concrete guardrail (max 4 new files). |
126
+ | Goal-Driven Execution | *(removed)* | Redundant — the base skill already enforces Given/When/Then acceptance criteria. |
127
+
128
+ ### Iteration 2 Plan (v2 guidelines) vs Iteration 1 Plans
129
+
130
+ | Dimension | A (No guidelines) | B1 (v1 guidelines) | B2 (v2 guidelines) |
131
+ |---|---|---|---|
132
+ | **Total tasks** | 4 | 6 | 4 |
133
+ | **Max files/task** | 7 (Task 1) | 4 | 4 |
134
+ | **Assumptions annotated** | 0 | 0 | **4** (header below) |
135
+ | **External deps** | None | `google/uuid` | None |
136
+ | **Store interface** | 4 methods upfront in Task 1 | 1 method per task | 1 method per task |
137
+ | **Service layer** | Yes (unjustified) | No | No |
138
+
139
+ ### The big win: Surface Assumptions
140
+
141
+ Plan B2 opens with four explicit assumption annotations:
142
+
143
+ ```
144
+ > **Assumption:** User identification via X-User-ID request header since
145
+ > no auth system exists in the project.
146
+
147
+ > **Assumption:** Bookmarks include a Note field so users can annotate
148
+ > bookmarks. The design says "search by keyword" but doesn't specify
149
+ > the field.
150
+
151
+ > **Assumption:** Offset/limit pagination (not cursor-based).
152
+
153
+ > **Assumption:** In-memory store behind a Store interface.
154
+ ```
155
+
156
+ None of the previous plans (A or B1) did this. The v1 "Think Before Coding" guideline was completely invisible in output. The v2 "Surface Assumptions" guideline produced visible, reviewable annotations on the first run.
157
+
158
+ ### Iteration 2 Assessment
159
+
160
+ | Guideline | v1 Effect | v2 Effect | Improvement |
161
+ |---|---|---|---|
162
+ | **Surface Assumptions** (was Think Before Coding) | ⚠️ Invisible | ✅ 4 explicit annotations | Complete turnaround — concrete format + examples fixed the weakest signal |
163
+ | **Build Only What's Needed** (was Simplicity First) | ✅ Strong | ✅ Strong | Maintained — interface still grows incrementally |
164
+ | **One Task, One Change** (was Surgical Changes) | ✅ Positive | ✅ Positive | Maintained — max 4 files/task |
165
+
166
+ **Bottom line (iteration 2):** The v2 guidelines fixed the weakest signal from v1. "Surface Assumptions" went from invisible to producing 4 explicit, reviewable annotations. The other two principles maintained their positive effect. The removal of "Goal-Driven Execution" (redundant) reduced noise without losing signal.
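The "Assumptions annotated" count in the comparison above could be measured mechanically, because the v2 guideline fixes a concrete `> **Assumption:** ...` format. The sketch below is one way to extract those annotations from a plan document. `extractAssumptions` is a hypothetical helper, and the comparison does not say how its counts were actually obtained.

```typescript
// Collect the text of every `> **Assumption:** ...` blockquote in a plan.
// Consecutive `>` lines after the marker are joined into one annotation.
function extractAssumptions(plan: string): string[] {
  const marker = "> **Assumption:**";
  const found: string[] = [];
  let current: string | null = null;

  for (const raw of plan.split("\n")) {
    const line = raw.trim();
    if (line.startsWith(marker)) {
      // A new annotation starts; flush the previous one if still open.
      if (current !== null) found.push(current);
      current = line.slice(marker.length).trim();
    } else if (current !== null && line.startsWith(">")) {
      // Continuation line of the same blockquote.
      const continuation = line.slice(1).trim();
      if (continuation !== "") current = `${current} ${continuation}`;
    } else {
      // Any other line ends the open annotation.
      if (current !== null) found.push(current);
      current = null;
    }
  }
  if (current !== null) found.push(current);
  return found;
}
```

Applied to the four annotations quoted from Plan B2, this returns four strings, the first being "User identification via X-User-ID request header since no auth system exists in the project."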
@@ -0,0 +1,51 @@
1
+ # Add Verify Skill — Design Doc
2
+
3
+ ## Context
4
+
5
+ Based on [Chris LeMa's "The Last Prompt"](https://chrislema.com/the-last-prompt-you-need-when-building-software-with-ai), we need a post-implementation code verification phase in pi-workflow-kit. The existing `design-review` skill validates architecture *intentions* at the design-doc level, but there's no review of the *actual implemented code*. This is where the most dangerous bugs hide: signature mismatches between layers, dead code, duplicated logic, and security holes that pass tests but break in production.
6
+
7
+ ## Decision
8
+
9
+ ### Add a `verify` skill (new)
10
+
11
+ A single skill triggered by `/skill:verify` that runs three sequential expert review passes over implemented code:
12
+
13
+ 1. **Security** 🔴 — adversarial review as if a junior wrote it and the best security expert is auditing
14
+ 2. **Optimization** 🟡 — dead code, duplication, over/under-engineering, performance
15
+ 3. **Traceability** 🔵 — end-to-end call chain verification across every layer boundary
16
+
17
+ Output: structured markdown report at `docs/plans/*-verification-report.md` with findings and actionable task list.
18
+
19
+ ### Keep `design-review` unchanged
20
+
21
+ `design-review` stays between brainstorm and plan — it validates architecture before task breakdown. Moving it would lose the cheap "catch it before you build it" value.
22
+
23
+ ### Update README
24
+
25
+ Add `verify` to the workflow diagram, skill table, and quick start. The pipeline becomes:
26
+
27
+ ```
28
+ brainstorm → design-review → plan → execute → verify → finalize
29
+ ```
30
+
31
+ ## Workflow Integration
32
+
33
+ ```
34
+ brainstorm → design-review (optional) → plan → execute → verify → finalize
35
+ ↑ ↑
36
+ existing new
37
+ ```
38
+
39
+ - `verify` runs after `executing-tasks` and before `finalizing`
40
+ - It's optional — trivial changes can skip it
41
+ - The report's remediation task list feeds directly into a follow-up `/skill:writing-plans` if fixes are needed
42
+ - Read-only: can write to `docs/plans/` only, cannot modify source code
43
+
44
+ ## Files to Change
45
+
46
+ 1. **`skills/verify/SKILL.md`** — new skill (full content in `docs/plans/2026-06-03-verify-skill-design.md`)
47
+ 2. **`README.md`** — update workflow diagram, skill table, quick start, and project structure
48
+
49
+ ## Production Risks
50
+
51
+ Simple change — no design review needed. We're adding a new SKILL.md and updating documentation. No code execution, no external integrations, no security surface.
@@ -0,0 +1,111 @@
1
+ # Implementation Plan: Add Verify Skill
2
+
3
+ Design: `docs/plans/2026-06-03-add-verify-skill-design.md`
4
+
5
+ ## Overview
6
+
7
+ Add a `verify` skill to pi-workflow-kit — a post-implementation code verification phase that runs three expert review passes (security, optimization, traceability) over implemented code. Also update the README to reflect the expanded workflow pipeline.
8
+
9
+ Full SKILL.md content is in `docs/plans/2026-06-03-verify-skill-design.md` (lines 7-176, inside the code fence).
10
+
11
+ ## Task 1: Create the verify skill
12
+
13
+ <!-- tdd: trivial -->
14
+
15
+ Acceptance Criteria (QA Engineer Hat):
16
+ - **Happy Path**:
17
+ - Given: No `skills/verify/` directory exists
18
+ - When: `skills/verify/SKILL.md` is created
19
+ - Then: The file contains valid YAML frontmatter with `name: verify` and a description mentioning security, optimization, and traceability. The file body contains all three review pass sections, the report format template, and the principles section.
20
+ - **Edge Case (skill already exists)**:
21
+ - Given: `skills/verify/SKILL.md` already exists
22
+ - When: Task runs
23
+ - Then: The existing file is overwritten with the new content
24
+
25
+ Files:
26
+ - `skills/verify/SKILL.md`
27
+
28
+ Steps:
29
+ 1. Create the directory `skills/verify/`
30
+ 2. Create `skills/verify/SKILL.md` with the full content from the design draft. The content is the markdown inside the code fence in `docs/plans/2026-06-03-verify-skill-design.md` (lines 8-176). Copy it exactly — it includes:
31
+ - YAML frontmatter with name and description
32
+ - # Verify heading and intro paragraph
33
+ - ## Process section (5 steps)
34
+ - ## Pass 1 — Security Review 🔴 (framing, what to look for, severity table)
35
+ - ## Pass 2 — Optimization Review 🟡 (framing, what to look for, priority table)
36
+ - ## Pass 3 — Traceability Review 🔵 (framing, what to look for 4 sub-items, severity table)
37
+ - ## Report Format section (full template with summary table, findings sections, remediation task list)
38
+ - ## Principles section (5 bullets)
39
+
40
+ ## Task 2: Update README with verify skill
41
+
42
+ <!-- tdd: trivial -->
43
+
44
+ Acceptance Criteria (QA Engineer Hat):
45
+ - **Happy Path**:
46
+ - Given: README.md has the current workflow (brainstorm → design-review → plan → execute → finalize)
47
+ - When: README is updated
48
+ - Then: All six sections are updated — tagline, workflow diagram, skill table, phase control, quick start, and project structure — to include `verify` between execute and finalize.
49
+ - **Edge Case (verify already in README)**:
50
+ - Given: README already contains verify references
51
+ - When: Task runs
52
+ - Then: No duplicate entries are introduced
53
+
54
+ Files:
55
+ - `README.md`
56
+
57
+ Steps:
58
+
59
+ 1. Update the tagline (line 3) — change `brainstorm→plan→execute→finalize` to `brainstorm→plan→execute→verify→finalize`:
60
+ ```
61
+ > Stop AI agents from rushing to code. Enforce a structured brainstorm→plan→execute→verify→finalize workflow with TDD discipline.
62
+ ```
63
+
64
+ 2. Update the "🧠 6 Workflow Skills" heading (line 36) to "🧠 7 Workflow Skills"
65
+
66
+ 3. Update the workflow diagram (lines 40-44) to:
67
+ ```
68
+ brainstorm → design-review → plan → execute → verify → finalize
69
+
70
+ diagnose (anytime)
71
+ ```
72
+
73
+ 4. Add verify to the skill table (after the Execute row, before Finalize):
74
+ ```
75
+ | **Verify** | `/skill:verify` | Three expert review passes (security, optimization, traceability) on implemented code |
76
+ ```
77
+
78
+ 5. Update the phase control section (lines 61-67) to add verify:
79
+ ```
80
+ /skill:brainstorming → discuss and design
81
+ /skill:design-review → audit for production risks (non-trivial designs)
82
+ /skill:writing-plans → break into tasks
83
+ /skill:executing-tasks → implement with TDD
84
+ /skill:verify → review code for security, optimization, and traceability issues
85
+ /skill:finalizing → ship it
86
+ ```
87
+
88
+ 6. Update the quick start section (lines 110-135) to add verify between executing-tasks and finalizing:
89
+ ```
90
+ > /skill:executing-tasks
91
+
92
+ # (agent implements with TDD, cognitive persona shifts, all tools unlocked)
93
+ > /skill:verify
94
+
95
+ # (agent runs security, optimization, and traceability reviews on implemented code)
96
+ > /skill:finalizing
97
+
98
+ # (agent archives docs, curates lessons, creates PR)
99
+ ```
100
+
101
+ 7. Update the project structure (lines 146-161) to add verify:
102
+ ```
103
+ ├── skills/
104
+ │ ├── brainstorming/SKILL.md
105
+ │ ├── design-review/SKILL.md
106
+ │ ├── writing-plans/SKILL.md
107
+ │ ├── executing-tasks/SKILL.md
108
+ │ ├── verify/SKILL.md
109
+ │ ├── finalizing/SKILL.md
110
+ │ └── diagnose/SKILL.md
111
+ ```
@@ -0,0 +1,11 @@
1
+ # Progress: Add Verify Skill
2
+
3
+ Plan: docs/plans/2026-06-03-add-verify-skill-implementation.md
4
+ Branch: add-verify-skill
5
+ Started: 2026-06-03T13:00:00Z
6
+ Last updated: 2026-06-03T13:00:00Z
7
+
8
+ | # | Status | Task | Commit |
9
+ |---|--------|------|--------|
10
+ | 1 | ✅ done | Create the verify skill | c48d47a |
11
+ | 2 | ✅ done | Update README with verify skill | ea37ea8 |