@tianhai/pi-workflow-kit 0.16.0 → 0.18.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +51 -39
- package/docs/developer-usage-guide.md +32 -14
- package/docs/lessons.md +18 -0
- package/docs/oversight-model.md +11 -5
- package/docs/plans/2026-06-03-karpathy-guidelines-ab-comparison.md +166 -0
- package/docs/plans/completed/2026-06-03-add-verify-skill-design.md +51 -0
- package/docs/plans/completed/2026-06-03-add-verify-skill-implementation.md +111 -0
- package/docs/plans/completed/2026-06-03-add-verify-skill-progress.md +11 -0
- package/docs/plans/completed/2026-06-03-verify-skill-design.md +176 -0
- package/docs/plans/completed/2026-06-09-code-review-fixes-implementation.md +74 -0
- package/docs/plans/completed/2026-06-09-code-review-fixes-progress.md +14 -0
- package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-design.md +186 -0
- package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-implementation.md +675 -0
- package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-progress.md +18 -0
- package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-verification-report.md +81 -0
- package/docs/plans/completed/2026-06-09-verification-fixes-implementation.md +69 -0
- package/docs/plans/completed/2026-06-09-verification-fixes-progress.md +14 -0
- package/docs/workflow-phases.md +19 -13
- package/extensions/workflow-guard.ts +10 -9
- package/package.json +2 -1
- package/skills/{brainstorming → pwk-brainstorming}/SKILL.md +17 -6
- package/skills/{design-review → pwk-design-review}/SKILL.md +11 -9
- package/skills/{diagnose → pwk-diagnose}/SKILL.md +1 -1
- package/skills/{executing-tasks → pwk-executing-tasks}/SKILL.md +46 -16
- package/skills/{finalizing → pwk-finalizing}/SKILL.md +9 -2
- package/skills/pwk-verify/SKILL.md +170 -0
- package/skills/{writing-plans → pwk-writing-plans}/SKILL.md +72 -6
package/README.md
CHANGED
@@ -1,6 +1,6 @@
 # pi-workflow-kit
 
-> Stop AI agents from rushing to code. Enforce a structured brainstorm→plan→execute→finalize workflow with TDD discipline.
+> Stop AI agents from rushing to code. Enforce a structured brainstorm→plan→execute→verify→finalize workflow with TDD discipline.
 
 AI coding agents tend to skip design and jump straight into implementation, producing over-engineered or misaligned code. **pi-workflow-kit** solves this by hard-blocking write operations during brainstorm and planning phases — the agent *literally cannot modify your source files* until you approve the design.
 
@@ -28,29 +28,30 @@ Enforces phase-appropriate tool access — not just guidelines, but hard blocks:
 
 | Phase | `write` / `edit` | `bash` |
 |-------|:-:|:-:|
-| **Brainstorm** / **Plan** | 🔒 Blocked outside `docs/plans/` | 🔒 Read-only only (grep, find, cat, git status, curl…) |
+| **Brainstorm** / **Plan** / **Verify** | 🔒 Blocked outside `docs/plans/` | 🔒 Read-only only (grep, find, cat, git status, curl…) |
 | **Execute** / **Finalize** | ✅ Full access | ✅ Full access |
 
 The agent can read code and discuss design with you during brainstorm/plan, but it physically cannot modify source files or run mutating commands.
 
-### 🧠
+### 🧠 7 Workflow Skills
 
 Guide the agent through a disciplined development process:
 
-
-
-
-
-
+brainstorm → plan → [design-review?] → execute → [verify?] → finalize
+↕
+diagnose (anytime)
+
+For multi-feature designs, the plan→execute loop repeats per feature.
 
 | Phase | Trigger | What Happens |
 |-------|---------|--------------|
-| **Brainstorm** | `/skill:brainstorming` | Explore approaches, debate tradeoffs, produce a design doc |
-| **Design Review** | `/skill:design-review` | Audit design for production risks (security, scalability, fault tolerance) |
-| **Plan** | `/skill:writing-plans` |
-| **Execute** | `/skill:executing-tasks` | Implement tasks one-by-one with TDD discipline and pre-commit checkpoint review gates |
-| **
-| **
+| **Brainstorm** | `/skill:pwk-brainstorming` | Explore approaches, debate tradeoffs, produce a design doc with a Features table |
+| **Design Review** | `/skill:pwk-design-review` | Audit plan and design for production risks (security, scalability, fault tolerance) |
+| **Plan** | `/skill:pwk-writing-plans` | Plan one feature at a time from the Features table — bite-sized TDD tasks with acceptance criteria |
+| **Execute** | `/skill:pwk-executing-tasks` | Implement tasks one-by-one with TDD discipline and pre-commit checkpoint review gates |
+| **Verify** | `/skill:pwk-verify` | Three expert review passes (security, optimization, traceability) on implemented code |
+| **Finalize** | `/skill:pwk-finalizing` | Archive plan docs, update README/CHANGELOG, create PR |
+| **Diagnose** | `/skill:pwk-diagnose` | 6-phase debugging loop: reproduce → hypothesize → instrument → fix → verify |
 
 ## The Workflow in Detail
 
@@ -58,13 +59,24 @@ brainstorm → design-review → plan → execute → finalize
 
 You control each phase — the agent never advances on its own. Invoke a skill to move forward:
 
-
-/skill:
-/skill:design-review → audit for production risks (
-/skill:
-/skill:
-/skill:finalizing
-
+/skill:pwk-brainstorming → discuss and design (names features)
+/skill:pwk-writing-plans → plan next feature from the Features table
+/skill:pwk-design-review → audit for production risks (on demand)
+/skill:pwk-executing-tasks → implement with TDD
+/skill:pwk-verify → review code for security, optimization, and traceability
+/skill:pwk-finalizing → ship it
+
+### Feature-Based Planning
+
+Design docs include a `## Features` table that tracks each feature's status:
+
+| # | Feature | Status | Notes |
+|---|---------|--------|-------|
+| 1 | User signup | ✅ done | |
+| 2 | Email verification | 🔄 planned | Plan: docs/plans/...-email-verification-implementation.md |
+| 3 | Password reset | ⬜ pending | |
+
+This enables incremental development — plan and execute one feature at a time, then loop back for the next.
 
 ### TDD Three-Scenario Model
 
@@ -112,24 +124,23 @@ Optionally label tasks with a `checkpoint` to pause for human review. At each ch
 pi install npm:@tianhai/pi-workflow-kit
 
 # Start a new feature
-> /skill:brainstorming
+> /skill:pwk-brainstorming
 > I want to add OAuth2 login to our API
 
-# (agent explores approaches, writes design doc)
+# (agent explores approaches, writes design doc with Features table)
 # (write/edit are blocked — your code is safe)
 
-> /skill:
+> /skill:pwk-writing-plans
 
-# (agent
-# (
-
-> /skill:writing-plans
-
-# (agent breaks design into TDD tasks with acceptance criteria)
-> /skill:executing-tasks
+# (agent picks next feature, breaks into TDD tasks)
+# (triggers design review for non-trivial features)
+> /skill:pwk-executing-tasks
 
 # (agent implements with TDD, cognitive persona shifts, all tools unlocked)
-> /skill:
+> /skill:pwk-verify
+
+# (agent runs security, optimization, and traceability reviews on implemented code)
+> /skill:pwk-finalizing
 
 # (agent archives docs, curates lessons, creates PR)
 ```
@@ -146,14 +157,15 @@ pi install npm:@tianhai/pi-workflow-kit
 ```
 pi-workflow-kit/
 ├── extensions/
-│ └── workflow-guard.ts # Write blocker during brainstorm/plan
+│ └── workflow-guard.ts # Write blocker during brainstorm/plan/verify
 ├── skills/
-│ ├── brainstorming/SKILL.md
-│ ├── design-review/SKILL.md
-│ ├── writing-plans/SKILL.md
-│ ├── executing-tasks/SKILL.md
-│ ├──
-│
+│ ├── pwk-brainstorming/SKILL.md
+│ ├── pwk-design-review/SKILL.md
+│ ├── pwk-writing-plans/SKILL.md
+│ ├── pwk-executing-tasks/SKILL.md
+│ ├── pwk-verify/SKILL.md
+│ ├── pwk-finalizing/SKILL.md
+│ └── pwk-diagnose/SKILL.md
 ├── tests/
 │ └── workflow-guard.test.ts
 ├── package.json
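The README table says `bash` is limited to read-only commands during brainstorm, plan, and verify, but this diff does not show how `workflow-guard.ts` decides what counts as read-only. The sketch below is one hypothetical way to do it, assuming an allowlist seeded from the README's own examples (grep, find, cat, git status, curl) plus a few flag checks. The names `isBashAllowed`, `READ_ONLY_COMMANDS` and `MUTATING_FLAGS` are invented for illustration and are not taken from the package.

```typescript
// Hypothetical sketch: workflow-guard.ts's real read-only check is not shown in this diff.
type Phase = "brainstorm" | "plan" | "verify" | "execute" | "finalize";

const READ_ONLY_PHASES: ReadonlySet<Phase> = new Set<Phase>(["brainstorm", "plan", "verify"]);

// Assumed allowlist, seeded from the README's examples.
const READ_ONLY_COMMANDS: ReadonlySet<string> = new Set(["grep", "find", "cat", "curl"]);
const READ_ONLY_GIT_SUBCOMMANDS: ReadonlySet<string> = new Set(["status"]);

// Flags that make an otherwise read-only tool write to disk (illustrative, not exhaustive).
const MUTATING_FLAGS: Record<string, readonly string[]> = {
  find: ["-delete", "-exec", "-execdir", "-fprint"],
  curl: ["-o", "--output", "-O", "--remote-name"],
};

function isBashAllowed(phase: Phase, command: string): boolean {
  if (!READ_ONLY_PHASES.has(phase)) return true; // execute/finalize: full access

  // Reject chaining, pipes, redirection and substitution so a read-only
  // prefix cannot smuggle in a mutating command.
  if (/[;&|<>`$]/.test(command)) return false;

  const [program = "", ...args] = command.trim().split(/\s+/);
  if (program === "git") return READ_ONLY_GIT_SUBCOMMANDS.has(args[0] ?? "");
  if (!READ_ONLY_COMMANDS.has(program)) return false;

  const denied = MUTATING_FLAGS[program] ?? [];
  return !args.some((arg) => denied.includes(arg));
}
```

A name-only allowlist like this is deliberately conservative: it refuses any pipe or redirection even when harmless, and its flag lists are incomplete (a combined flag such as `curl -so out.html` would still get through), so a real guard would need a proper argument parser.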
package/docs/developer-usage-guide.md
CHANGED
@@ -4,8 +4,9 @@ How to install and use `pi-workflow-kit` with the Pi coding agent.
 
 ## What you get
 
-- **4 skills** that guide the agent through a structured workflow
-- **
+- **4 workflow skills** that guide the agent through a structured feature-based workflow
+- **3 on-demand skills** for design review, verification, and debugging
+- **1 extension** that hard-blocks source writes during brainstorm, plan, and verify phases
 
 ## Installation
 
@@ -31,54 +32,70 @@ Or in `.pi/settings.json` / `~/.pi/agent/config.json`:
 
 ## The workflow
 
-You control each phase by invoking the skill:
+You control each phase by invoking the skill. For multi-feature designs, the plan→execute loop repeats per feature:
 
 ```
-/skill:brainstorming → /skill:writing-plans → /skill:executing-tasks → /skill:finalizing
+/skill:pwk-brainstorming → /skill:pwk-writing-plans → /skill:pwk-executing-tasks → loop or /skill:pwk-finalizing
 ```
 
 ### 1. Brainstorm
 
 ```
-/skill:brainstorming
+/skill:pwk-brainstorming
 ```
 
 Explore the idea through collaborative dialogue. The agent reads code, asks questions one at a time, proposes 2-3 approaches, and presents the design in sections for your review.
 
-Outcome: `docs/plans/YYYY-MM-DD-<topic>-design.md`
+Outcome: `docs/plans/YYYY-MM-DD-<topic>-design.md` with a `## Features` table
 
 Optionally writes ADRs to `docs/plans/adr/` for significant architectural decisions.
 
 ### 2. Plan
 
 ```
-/skill:writing-plans
+/skill:pwk-writing-plans
 ```
 
-Read the design doc and
+Read the design doc's Features table, pick the next `⬜ pending` feature, and create a per-feature implementation plan with exact file paths, complete code, and TDD scenarios. Optionally set up a branch or worktree.
 
-Outcome: `docs/plans/YYYY-MM-DD-<topic>-implementation.md`
+Outcome: `docs/plans/YYYY-MM-DD-<topic>-<feature-name>-implementation.md`
 
 ### 3. Execute
 
 ```
-/skill:executing-tasks
+/skill:pwk-executing-tasks
 ```
 
-Implement the plan task-by-task. Each task: implement → run tests → fix if needed → commit.
+Implement the plan task-by-task. Each task: implement → run tests → fix if needed → commit. When the feature is done, marks it `✅ done` in the design doc and suggests planning the next feature.
 
 ### 4. Finalize
 
 ```
-/skill:finalizing
+/skill:pwk-finalizing
 ```
 
 Archive plan docs, update CHANGELOG/README, create PR, clean up worktree.
 
-### 5.
+### 5. Design Review (on demand)
 
 ```
-/skill:
+/skill:pwk-design-review
+```
+
+Audit a plan doc for production risks — security, scalability, fault tolerance, and operational hazards. Triggered by writing-plans for non-trivial features. Review findings append to the plan doc, not the design doc.
+
+### 6. Verify (on demand)
+
+```
+/skill:pwk-verify
+```
+
+Post-implementation verification with three expert passes — security, optimization, and traceability. Run after executing a feature or before finalizing.
+
+### 7. Diagnose (on demand)
+
+```
+/skill:pwk-diagnose
 ```
 
 A 6-phase debugging loop you invoke when something is broken. Build a feedback loop first, then reproduce, hypothesise, instrument, fix, and cleanup. Not a pipeline phase — use whenever needed.
@@ -88,6 +105,7 @@ A 6-phase debugging loop you invoke when something is broken. Build a feedback l
 The `workflow-guard` extension watches `write` and `edit` tool calls:
 
 - **During brainstorm and plan**: blocks writes outside `docs/plans/`. The agent can read code and use bash, but cannot modify source files.
+- **During verify**: same read-only enforcement — the agent can inspect code but not modify it.
 - **During execute and finalize**: no restrictions. All tools available.
 
 No configuration needed. It activates automatically after install.
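Step 2 of the usage guide has `pwk-writing-plans` read the Features table, pick the next `⬜ pending` row, and write `docs/plans/YYYY-MM-DD-<topic>-<feature-name>-implementation.md`. The skill does this through agent instructions, so the TypeScript below only illustrates that selection logic. It assumes the table layout from the README example (`| # | Feature | Status | Notes |`) and a kebab-case slug for `<feature-name>`; `parseFeatures`, `nextPendingFeature` and `implementationPlanPath` are hypothetical names.

```typescript
// Hypothetical sketch: pwk-writing-plans selects features through agent
// instructions, not through code like this.
type FeatureStatus = "done" | "planned" | "pending";

interface Feature {
  index: number;
  name: string;
  status: FeatureStatus;
}

// Status markers as they appear in the README's example Features table.
const STATUS_MARKERS: ReadonlyArray<readonly [string, FeatureStatus]> = [
  ["✅", "done"],
  ["🔄", "planned"],
  ["⬜", "pending"],
];

function parseFeatures(designDoc: string): Feature[] {
  // Everything after the "## Features" heading, up to the next "## " heading.
  const section = designDoc.split(/^## Features[ \t]*$/m)[1] ?? "";
  const features: Feature[] = [];
  for (const line of section.split("\n")) {
    if (line.startsWith("## ")) break;
    // A data row looks like: | 1 | User signup | ✅ done | |
    const cells = line.split("|").map((cell) => cell.trim());
    const index = Number(cells[1]);
    if (!Number.isInteger(index) || index <= 0) continue; // header, separator, prose
    const status = STATUS_MARKERS.find(([marker]) => (cells[3] ?? "").startsWith(marker))?.[1];
    if (status) features.push({ index, name: cells[2] ?? "", status });
  }
  return features;
}

function nextPendingFeature(features: readonly Feature[]): Feature | undefined {
  return features.find((feature) => feature.status === "pending");
}

// Follows docs/plans/YYYY-MM-DD-<topic>-<feature-name>-implementation.md;
// the kebab-case slug rule is an assumption.
function implementationPlanPath(date: string, topic: string, feature: Feature): string {
  const slug = feature.name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `docs/plans/${date}-${topic}-${slug}-implementation.md`;
}
```

Given the README's example table, this would pick "Password reset" and derive a path such as `docs/plans/2026-06-09-auth-password-reset-implementation.md` (date and topic chosen arbitrarily here).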
package/docs/lessons.md
ADDED
@@ -0,0 +1,18 @@
+# Lessons Learned
+
+<!--
+Agent: read this at the start of each task during executing-tasks.
+Follow every rule. Add new rules when you catch yourself making repeat mistakes.
+Rules must be generic patterns applicable to any domain or feature — not specific to one service, entity, or use case.
+Retire rules that no longer apply during finalizing.
+-->
+
+## Cross-Skill Consistency
+
+- When adding instructions that reference artifacts from another skill (e.g., "extract metadata from plan doc"), always add a guard for when that artifact doesn't exist — not all workflows use all artifacts
+- When reordering instructions within a step, verify all conditional branches still reference the correct context (e.g., hazard checks that say "this feature" must run after feature identification)
+
+## Documentation
+
+- When adding a new phase to an extension, update ALL comments and error messages — stale comments in one place create confusion about the actual behavior
+- When renaming skills with a prefix, check for `/skill:` references in prose and code blocks separately — backtick-enclosed references in code examples may use a different pattern than prose references
package/docs/oversight-model.md
CHANGED
@@ -6,10 +6,16 @@
 
 Skills teach the agent the workflow. There are 4:
 
-- **brainstorming** — explore ideas, produce a design doc
-- **writing-plans** —
-- **executing-tasks** — implement tasks,
-- **finalizing** — archive docs, create PR
+- **pwk-brainstorming** — explore ideas, produce a design doc with a Features table
+- **pwk-writing-plans** — plan one feature at a time from the Features table
+- **pwk-executing-tasks** — implement tasks, mark features done, loop to next feature
+- **pwk-finalizing** — archive docs, create PR
+
+Plus 3 on-demand skills:
+
+- **pwk-design-review** — audit a plan doc for production risks (triggered by writing-plans)
+- **pwk-verify** — post-implementation verification with security, optimization, and traceability passes
+- **pwk-diagnose** — 6-phase debugging loop
 
 They explain *what* to do and *when* to do it.
 
@@ -17,7 +23,7 @@ They explain *what* to do and *when* to do it.
 
 The `workflow-guard` extension enforces one rule:
 
-> During brainstorm and
+> During brainstorm, plan, and verify phases, `write` and `edit` are **hard-blocked** outside `docs/plans/`.
 
 The agent can still use `read` and `bash` for investigation. It literally cannot call `write` or `edit` on source files — the tools are blocked at the extension level.
 
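The one rule in the oversight model reduces to a per-phase path check. Below is a minimal TypeScript sketch of that check, assuming paths arrive relative to the project root; `isWriteAllowed` and the `Phase` union are illustrative names, not the extension's actual API. It normalizes `.`/`..` segments and compares whole path segments, because a naive `startsWith("docs/plans")` would accept both `docs/plans/../src/index.ts` and `docs/plansx/evil.ts`.

```typescript
// Hypothetical sketch of the documented rule; not the extension's actual code.
type Phase = "brainstorm" | "plan" | "verify" | "execute" | "finalize";

function isWriteAllowed(phase: Phase, filePath: string): boolean {
  if (phase === "execute" || phase === "finalize") return true; // no restrictions

  // Assumes paths relative to the project root; refuse absolute paths outright.
  const normalized = filePath.replace(/\\/g, "/");
  if (normalized.startsWith("/") || /^[A-Za-z]:/.test(normalized)) return false;

  // Resolve "." and ".." so "docs/plans/../src/x.ts" cannot escape the plans dir.
  const segments: string[] = [];
  for (const segment of normalized.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      if (segments.length === 0) return false; // climbs above the project root
      segments.pop();
    } else {
      segments.push(segment);
    }
  }

  // Compare whole segments: "docs/plansx/..." must not match "docs/plans".
  return segments.length > 2 && segments[0] === "docs" && segments[1] === "plans";
}
```

A production guard would more likely resolve the path against the working directory with a path library instead of rejecting absolute paths, but the segment-wise comparison is the part that prevents prefix tricks.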
package/docs/plans/2026-06-03-karpathy-guidelines-ab-comparison.md
ADDED
@@ -0,0 +1,166 @@
+# A/B Comparison: Writing Plans — Karpathy Behavioral Guidelines
+
+## Setup
+- **Same design doc** (bookmarks: CRUD + search)
+- **Same Go project scaffold**
+- **Same prompt** (no questions, full plan with concrete code)
+- **Variant A** (WITHOUT guidelines): 292-line SKILL.md — original writing-plans skill
+- **Variant B** (WITH guidelines): 354-line SKILL.md — with Behavioral Guidelines section appended
+
+---
+
+## Structural Comparison
+
+| Dimension | A (Without) | B (With) |
+|---|---|---|
+| **Total tasks** | 4 | 6 |
+| **Lines in plan** | ~1,054 | ~1,019 |
+| **New files per plan** | 7 files in Task 1 alone | 1-2 files per task |
+| **External dependency** | None (stdlib only) | `github.com/google/uuid` |
+
+---
+
+## Task Decomposition
+
+### A (Without) — 4 tasks
+| Task | Scope | Files touched |
+|---|---|---|
+| 1 | Bookmark + ALL infrastructure (model, store interface, mem store with full CRUD, service, handler, errors, route, tests) | 7 files |
+| 2 | Delete bookmark | 3 files |
+| 3 | List bookmarks (paginated, cursor) | 3 files |
+| 4 | Search bookmarks (keyword + pagination) | 3 files |
+
+### B (With) — 6 tasks
+| Task | Scope | Files touched |
+|---|---|---|
+| 1 | Scaffold (go.mod + model only) | 2 files |
+| 2 | Bookmark a message (store + handler + test + route) | 4 files |
+| 3 | List bookmarks (offset/limit pagination) | 4 files |
+| 4 | Remove a bookmark | 4 files |
+| 5 | Search bookmarks (keyword) | 4 files |
+| 6 | Final wiring + integration lifecycle test | 2 files |
+
+---
+
+## Detailed Analysis by Guideline
+
+### Simplicity First
+
+**A (Without):** ⚠️ **Overbuilt in Task 1.** Task 1 creates a `BookmarkStore` interface with 4 methods (Create, Delete, ListByUser, SearchByUser) — methods that won't be used until Tasks 2-4. It also creates the full `MemoryStore` implementation with all 4 methods, an `errors.go` file, a `Service` struct, AND the handler — all in a single task. The store interface is the full contract upfront before any task exercises most of it.
+
+**B (With):** ✅ **Minimal per task.** Task 1 only creates `go.mod` + the `Bookmark` struct. Task 2 introduces `Store` with only `Create`, and `MemStore` with only `Create`. `List` is added to the interface in Task 3, `Delete` in Task 4, `Search` in Task 5 — each method appears when it's needed, not before.
+
+**Verdict:** Guidelines had a clear positive effect. Plan B builds only what each task needs.
+
+### Surgical Changes
+
+**A (Without):** ⚠️ Task 1 touches 7 files in one go (model, store interface, store mem, errors, service, handler, main.go). The Task 1 description says "create the full vertical slice" which bundles infrastructure that isn't tested yet.
+
+**B (With):** ✅ Each task touches 1-2 files for new code. Task 1 creates 2 files (go.mod, model.go). Task 2 adds 3 new files + modifies main.go. No task creates more than 4 files.
+
+**Verdict:** Guidelines had a clear positive effect. Plan B has tighter blast radius per task.
+
+### Think Before Coding (surface assumptions)
+
+**A (Without):** ❌ Silent assumptions throughout:
+- Used cursor-based pagination without noting the design just said "paginated" — didn't surface that offset-based vs cursor-based is a choice
+- Added `sync.RWMutex` and concurrent safety without the design mentioning concurrency
+- Created a `Service` layer between handler and store without justification
+
+**B (With):** ⚠️ Still has assumptions but more defensible:
+- Used offset/limit pagination (simpler, matches "paginated" literally)
+- No concurrency concerns added (store uses `sync.Mutex` only, no RWMutex overhead)
+- No `Service` layer — handler calls store directly
+- Did add `github.com/google/uuid` dependency without asking — minor assumption
+
+**Verdict:** Marginal positive effect. Plan B is less presumptuous but both plans made assumptions. Neither explicitly surfaced tradeoffs to the user.
+
+### Goal-Driven Execution
+
+**A (Without):** ✅ Good acceptance criteria with Given/When/Then. Has a `checkpoint: test` on 3/4 tasks and `checkpoint: done` on the last task.
+
+**B (With):** ✅ Good acceptance criteria. Has `checkpoint: test` on 3 tasks, `checkpoint: done` on 1, and no checkpoint on 2 simpler tasks. Added a full lifecycle integration test in Task 6 that wasn't in A.
+
+**Verdict:** Roughly equivalent. Both plans have strong acceptance criteria (required by the base skill). The lifecycle test in B is a nice bonus that catches integration issues.
+
+---
+
+## Unrelated Observations (noise, not guidelines)
+
+| Observation | A (Without) | B (With) |
+|---|---|---|
+| Pagination style | Cursor-based (more complex) | Offset-based (simpler) |
+| External deps | None | `google/uuid` |
+| Handler method naming | `Create`, `Delete`, `List`, `Search` | `CreateBookmark`, `DeleteBookmark`, `ListBookmarks`, `SearchBookmarks` |
+| Test structure | Single `TestXxx` with `t.Run` subtests | Separate top-level test functions |
+| `make([]T, 0, len)` usage | Yes (mem store candidates) | Yes (list handler, search handler) |
+
+---
+
+## Overall Assessment
+
+| Guideline | Effect | Evidence |
+|---|---|---|
+| **Simplicity First** | ✅ Strong positive | B builds incrementally; A front-loads the full store interface |
+| **Surgical Changes** | ✅ Positive | B touches fewer files per task (1-4 vs 7 in Task 1) |
+| **Think Before Coding** | ⚠️ Marginal | B made fewer silent assumptions but neither surfaced tradeoffs explicitly |
+| **Goal-Driven Execution** | ≈ Neutral | Both strong; base skill already enforces acceptance criteria |
+
+**Bottom line (iteration 1):** The guidelines measurably improved the plan. The biggest win is **Simplicity First** — Plan B's incremental interface growth (adding methods to `Store` as each task needs them) is clearly better than Plan A's upfront full-contract approach. This is exactly the kind of thing "no abstractions for single-use code" catches.
+
+**Weakness:** Neither plan explicitly called out assumptions or asked clarifying questions — the "Think Before Coding" guideline had the weakest signal. The guidelines alone may not be enough to overcome the model's tendency to fill gaps silently.
+
+---
+
+## Iteration 2: Revised Guidelines
+
+### What changed
+
+The guidelines were reworked from 4 generic coding rules to 3 planning-specific principles:
+
+| v1 (Generic) | v2 (Planning-Specific) | Why |
+|---|---|---|
+| Think Before Coding | **Surface Assumptions** | v1 said "ask" — the agent ignores this when told not to ask. v2 says "annotate in the plan" with a concrete `> **Assumption:** ...` format and examples of what to annotate. |
+| Simplicity First | **Build Only What Each Task Needs** | Kept the same core principle but added the specific anti-pattern from the v1 A/B test: "don't define interface methods that no task exercises yet." |
+| Surgical Changes | **One Task, One Change** | Reframed from "don't touch adjacent code" to "each task should trace to exactly one user-facing behavior" with a concrete guardrail (max 4 new files). |
+| Goal-Driven Execution | *(removed)* | Redundant — the base skill already enforces Given/When/Then acceptance criteria. |
+
+### Iteration 2 Plan (v2 guidelines) vs Iteration 1 Plans
+
+| Dimension | A (No guidelines) | B1 (v1 guidelines) | B2 (v2 guidelines) |
+|---|---|---|---|
+| **Total tasks** | 4 | 6 | 4 |
+| **Max files/task** | 7 (Task 1) | 4 | 4 |
+| **Assumptions annotated** | 0 | 0 | **4** (header below) |
+| **External deps** | None | `google/uuid` | None |
+| **Store interface** | 4 methods upfront in Task 1 | 1 method per task | 1 method per task |
+| **Service layer** | Yes (unjustified) | No | No |
+
+### The big win: Surface Assumptions
+
+Plan B2 opens with four explicit assumption annotations:
+
+```
+> **Assumption:** User identification via X-User-ID request header since
+> no auth system exists in the project.
+
+> **Assumption:** Bookmarks include a Note field so users can annotate
+> bookmarks. The design says "search by keyword" but doesn't specify
+> the field.
+
+> **Assumption:** Offset/limit pagination (not cursor-based).
+
+> **Assumption:** In-memory store behind a Store interface.
+```
+
+None of the previous plans (A or B1) did this. The v1 "Think Before Coding" guideline was completely invisible in output. The v2 "Surface Assumptions" guideline produced visible, reviewable annotations on the first run.
+
+### Iteration 2 Assessment
+
+| Guideline | v1 Effect | v2 Effect | Improvement |
+|---|---|---|---|
+| **Surface Assumptions** (was Think Before Coding) | ⚠️ Invisible | ✅ 4 explicit annotations | Complete turnaround — concrete format + examples fixed the weakest signal |
+| **Build Only What's Needed** (was Simplicity First) | ✅ Strong | ✅ Strong | Maintained — interface still grows incrementally |
+| **One Task, One Change** (was Surgical Changes) | ✅ Positive | ✅ Positive | Maintained — max 4 files/task |
+
+**Bottom line (iteration 2):** The v2 guidelines fixed the weakest signal from v1. "Surface Assumptions" went from invisible to producing 4 explicit, reviewable annotations. The other two principles maintained their positive effect. The removal of "Goal-Driven Execution" (redundant) reduced noise without losing signal.
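The comparison's key iteration-2 metric, the number of `> **Assumption:** ...` annotations in each plan, is easy to check mechanically. The TypeScript sketch below (the function name `extractAssumptions` is hypothetical) collects each annotation and joins its `>` continuation lines; run on the B2 header quoted above, it should return four entries.

```typescript
// Hypothetical helper: collects each "> **Assumption:** ..." annotation from a
// plan doc, joining its "> " continuation lines into one string.
function extractAssumptions(planDoc: string): string[] {
  const assumptions: string[] = [];
  let current: string[] = []; // lines of the annotation being collected; empty = none open

  for (const rawLine of planDoc.split("\n")) {
    const line = rawLine.trim();
    const start = /^>\s*\*\*Assumption:\*\*\s*(.*)$/.exec(line);
    if (start) {
      if (current.length > 0) assumptions.push(current.join(" "));
      current = [start[1] ?? ""];
    } else if (current.length > 0 && line.startsWith(">") && line.length > 1) {
      current.push(line.slice(1).trim()); // continuation of the open annotation
    } else if (current.length > 0) {
      assumptions.push(current.join(" ")); // blank line or ordinary text closes it
      current = [];
    }
  }
  if (current.length > 0) assumptions.push(current.join(" "));
  return assumptions;
}
```

Other blockquotes (for example `> **Note:** ...`) are ignored, so a count of zero across plans A and B1 versus four for B2 can be reproduced without reading the plans by hand.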
package/docs/plans/completed/2026-06-03-add-verify-skill-design.md
ADDED
@@ -0,0 +1,51 @@
+# Add Verify Skill — Design Doc
+
+## Context
+
+Based on [Chris LeMa's "The Last Prompt"](https://chrislema.com/the-last-prompt-you-need-when-building-software-with-ai), we need a post-implementation code verification phase in pi-workflow-kit. The existing `design-review` skill validates architecture *intentions* at the design-doc level, but there's no review of the *actual implemented code*. This is where the most dangerous bugs hide: signature mismatches between layers, dead code, duplicated logic, and security holes that pass tests but break in production.
+
+## Decision
+
+### Add a `verify` skill (new)
+
+A single skill triggered by `/skill:verify` that runs three sequential expert review passes over implemented code:
+
+1. **Security** 🔴 — adversarial review as if a junior wrote it and the best security expert is auditing
+2. **Optimization** 🟡 — dead code, duplication, over/under-engineering, performance
+3. **Traceability** 🔵 — end-to-end call chain verification across every layer boundary
+
+Output: structured markdown report at `docs/plans/*-verification-report.md` with findings and actionable task list.
+
+### Keep `design-review` unchanged
+
+`design-review` stays between brainstorm and plan — it validates architecture before task breakdown. Moving it would lose the cheap "catch it before you build it" value.
+
+### Update README
+
+Add `verify` to the workflow diagram, skill table, and quick start. The pipeline becomes:
+
+```
+brainstorm → design-review → plan → execute → verify → finalize
+```
+
+## Workflow Integration
+
+```
+brainstorm → design-review (optional) → plan → execute → verify → finalize
+↑ ↑
+existing new
+```
+
+- `verify` runs after `executing-tasks` and before `finalizing`
+- It's optional — trivial changes can skip it
+- The report's remediation task list feeds directly into a follow-up `/skill:writing-plans` if fixes are needed
+- Read-only: can write to `docs/plans/` only, cannot modify source code
+
+## Files to Change
+
+1. **`skills/verify/SKILL.md`** — new skill (full content in `docs/plans/2026-06-03-verify-skill-design.md`)
+2. **`README.md`** — update workflow diagram, skill table, quick start, and project structure
+
+## Production Risks
+
+Simple change — no design review needed. We're adding a new SKILL.md and updating documentation. No code execution, no external integrations, no security surface.
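The design fixes three passes (Security 🔴, Optimization 🟡, Traceability 🔵) and a report containing findings plus an actionable task list, but the actual report template lives in the skill file, which is not part of this excerpt. As a rough illustration only, here is a TypeScript sketch that renders a per-pass summary table and a severity-ordered remediation checklist. The `Finding` shape, the four-level severity scale and the `renderReport` name are all assumptions, not the skill's format.

```typescript
// Hypothetical sketch: the Finding shape and severity scale are assumptions,
// not the template defined in the verify skill.
type Pass = "Security" | "Optimization" | "Traceability";
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  pass: Pass;
  severity: Severity;
  title: string;
  location: string; // e.g. "src/handler.ts:42"
}

// Passes run in this order; icons match the design doc.
const PASSES: readonly Pass[] = ["Security", "Optimization", "Traceability"];
const PASS_ICONS: Record<Pass, string> = { Security: "🔴", Optimization: "🟡", Traceability: "🔵" };
const SEVERITY_RANK: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

function renderReport(findings: readonly Finding[]): string {
  const lines = ["## Summary", "", "| Pass | Findings |", "|---|---|"];
  for (const pass of PASSES) {
    const count = findings.filter((f) => f.pass === pass).length;
    lines.push(`| ${PASS_ICONS[pass]} ${pass} | ${count} |`);
  }

  // Remediation checklist, most severe first (Array.prototype.sort is stable).
  lines.push("", "## Remediation Tasks", "");
  const ordered = [...findings].sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
  for (const f of ordered) {
    lines.push(`- [ ] [${f.severity}] ${f.pass}: ${f.title} (${f.location})`);
  }
  return lines.join("\n");
}
```

Emitting the remediation items as a plain checklist keeps them easy to paste into a follow-up `/skill:writing-plans` run, which is how the design expects fixes to be picked up.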
@@ -0,0 +1,111 @@
# Implementation Plan: Add Verify Skill

Design: `docs/plans/2026-06-03-add-verify-skill-design.md`

## Overview

Add a `verify` skill to pi-workflow-kit — a post-implementation code verification phase that runs three expert review passes (security, optimization, traceability) over implemented code. Also update the README to reflect the expanded workflow pipeline.

Full SKILL.md content is in `docs/plans/2026-06-03-verify-skill-design.md` (lines 7-176, inside the code fence).

## Task 1: Create the verify skill

<!-- tdd: trivial -->

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: No `skills/verify/` directory exists
  - When: `skills/verify/SKILL.md` is created
  - Then: The file contains valid YAML frontmatter with `name: verify` and a description mentioning security, optimization, and traceability. The file body contains all three review pass sections, the report format template, and the principles section.
- **Edge Case (skill already exists)**:
  - Given: `skills/verify/SKILL.md` already exists
  - When: Task runs
  - Then: The existing file is overwritten with the new content
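
The happy-path criteria can be checked mechanically. The sketch below works on the file's text, so it needs no filesystem access. It assumes a single-line `description:` value; a folded or multi-line YAML description would need a real YAML parser. `checkVerifySkill` is a hypothetical helper, not part of the kit.

```typescript
// Hypothetical acceptance check for Task 1. It works on the file's text, so
// it needs no filesystem access, and it is not a full YAML parser.
interface CheckResult {
  ok: boolean;
  problems: string[];
}

function checkVerifySkill(text: string): CheckResult {
  // Frontmatter: a leading block delimited by "---" lines.
  const fm = /^---\r?\n([\s\S]*?)\r?\n---\r?\n/.exec(text);
  if (!fm) return { ok: false, problems: ["missing YAML frontmatter"] };

  const problems: string[] = [];
  const frontmatter = fm[1] ?? "";
  if (!/^name:\s*verify\s*$/m.test(frontmatter)) {
    problems.push("frontmatter lacks `name: verify`");
  }

  // Assumes a single-line `description:` value.
  const description =
    /^description:\s*(.*)$/m.exec(frontmatter)?.[1]?.toLowerCase() ?? "";
  for (const topic of ["security", "optimization", "traceability"]) {
    if (!description.includes(topic)) {
      problems.push(`description does not mention ${topic}`);
    }
  }

  // Body: the sections the plan requires (three passes, report, principles).
  const body = text.slice(fm[0].length);
  const sections = [
    /^## Pass 1\b/m,
    /^## Pass 2\b/m,
    /^## Pass 3\b/m,
    /^## Report Format\b/m,
    /^## Principles\b/m,
  ];
  for (const re of sections) {
    if (!re.test(body)) problems.push(`missing section ${re.source}`);
  }

  return { ok: problems.length === 0, problems };
}
```

Returning every problem at once, rather than failing on the first, gives one actionable list per run.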

Files:
- `skills/verify/SKILL.md`

Steps:
1. Create the directory `skills/verify/`
2. Create `skills/verify/SKILL.md` with the full content from the design draft. The content is the markdown inside the code fence in `docs/plans/2026-06-03-verify-skill-design.md` (lines 8-176). Copy it exactly — it includes:
   - YAML frontmatter with name and description
   - `# Verify` heading and intro paragraph
   - `## Process` section (5 steps)
   - `## Pass 1 — Security Review 🔴` (framing, what to look for, severity table)
   - `## Pass 2 — Optimization Review 🟡` (framing, what to look for, priority table)
   - `## Pass 3 — Traceability Review 🔵` (framing, what to look for with 4 sub-items, severity table)
   - `## Report Format` section (full template with summary table, findings sections, remediation task list)
   - `## Principles` section (5 bullets)
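
Line-number ranges drift whenever the design doc is edited, so the draft can instead be pulled out by its fence. The sketch below assumes the SKILL.md draft is the first fenced block in the design doc. `extractFirstFence` is a hypothetical helper that follows the CommonMark fence rules for the opening and closing lines.

```typescript
// Hypothetical helper: returns the text inside the first fenced code block of
// a Markdown document, or null if there is none. Fences follow CommonMark:
// three or more backticks or tildes, indented at most three spaces. A closing
// fence must use the same character and be at least as long as the opener, so
// an outer ```` fence can safely wrap inner ``` blocks (e.g. a report template).
function extractFirstFence(markdown: string): string | null {
  const lines = markdown.split(/\r?\n/);
  let fence: { char: string; length: number; bodyStart: number } | null = null;

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (fence === null) {
      // Opening fence; any info string after it (e.g. "markdown") is ignored.
      const open = /^ {0,3}(`{3,}|~{3,})/.exec(line);
      if (open) fence = { char: open[1][0], length: open[1].length, bodyStart: i + 1 };
      continue;
    }
    // Closing fence: only the fence characters plus optional trailing spaces.
    const close = /^ {0,3}(`{3,}|~{3,})\s*$/.exec(line);
    if (close && close[1][0] === fence.char && close[1].length >= fence.length) {
      return lines.slice(fence.bodyStart, i).join("\n");
    }
  }
  // Unlike CommonMark (where an unclosed fence runs to the end of the
  // document), this sketch treats a missing closing fence as "nothing to copy".
  return null;
}
```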

## Task 2: Update README with verify skill

<!-- tdd: trivial -->

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: README.md has the current workflow (brainstorm → design-review → plan → execute → finalize)
  - When: README is updated
  - Then: All six sections (tagline, workflow diagram, skill table, phase control, quick start, and project structure) are updated to include `verify` between execute and finalize.
- **Edge Case (verify already in README)**:
  - Given: README already contains verify references
  - When: Task runs
  - Then: No duplicate entries are introduced
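
The sketch below checks a subset of these criteria: tagline, skill-count heading, diagram, and a single Verify row. It assumes the README line formats given in this task's steps, namely the `→`-separated diagram line starting with `brainstorm →` and table rows starting with `| **Verify** |`. `checkReadme` is hypothetical.

```typescript
// Hypothetical acceptance check for Task 2; returns a list of problems
// (empty when the README satisfies the checked criteria).
function checkReadme(readme: string): string[] {
  const problems: string[] = [];

  // Tagline and skill-count heading.
  if (!readme.includes("brainstorm→plan→execute→verify→finalize")) {
    problems.push("tagline does not include verify");
  }
  if (!readme.includes("7 Workflow Skills")) {
    problems.push("skill-count heading not updated");
  }

  // Workflow diagram: verify must sit between execute and finalize.
  const diagram = readme
    .split(/\r?\n/)
    .find((line) => line.trim().startsWith("brainstorm →"));
  if (diagram === undefined) {
    problems.push("workflow diagram not found");
  } else {
    const steps = diagram.split("→").map((s) => s.trim());
    const v = steps.indexOf("verify");
    if (v < 1 || steps[v - 1] !== "execute" || steps[v + 1] !== "finalize") {
      problems.push("diagram does not place verify between execute and finalize");
    }
  }

  // Skill table: exactly one Verify row, so re-runs add no duplicates.
  const rows = readme.match(/^\| \*\*Verify\*\* \|/gm) ?? [];
  if (rows.length !== 1) {
    problems.push(`expected 1 Verify row, found ${rows.length}`);
  }
  return problems;
}
```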

Files:
- `README.md`

Steps:

1. Update the tagline (line 3) — change `brainstorm→plan→execute→finalize` to `brainstorm→plan→execute→verify→finalize`:
   ```
   > Stop AI agents from rushing to code. Enforce a structured brainstorm→plan→execute→verify→finalize workflow with TDD discipline.
   ```

2. Update the "🧠 6 Workflow Skills" heading (line 36) to "🧠 7 Workflow Skills"

3. Update the workflow diagram (lines 40-44) to:
   ```
   brainstorm → design-review → plan → execute → verify → finalize
   ↕
   diagnose (anytime)
   ```

4. Add verify to the skill table (after the Execute row, before Finalize):
   ```
   | **Verify** | `/skill:verify` | Three expert review passes (security, optimization, traceability) on implemented code |
   ```

5. Update the phase control section (lines 61-67) to add verify:
   ```
   /skill:brainstorming → discuss and design
   /skill:design-review → audit for production risks (non-trivial designs)
   /skill:writing-plans → break into tasks
   /skill:executing-tasks → implement with TDD
   /skill:verify → review code for security, optimization, and traceability issues
   /skill:finalizing → ship it
   ```

6. Update the quick start section (lines 110-135) to add verify between executing-tasks and finalizing:
   ```
   > /skill:executing-tasks

   # (agent implements with TDD, cognitive persona shifts, all tools unlocked)
   > /skill:verify

   # (agent runs security, optimization, and traceability reviews on implemented code)
   > /skill:finalizing

   # (agent archives docs, curates lessons, creates PR)
   ```

7. Update the project structure (lines 146-161) to add verify:
   ```
   ├── skills/
   │ ├── brainstorming/SKILL.md
   │ ├── design-review/SKILL.md
   │ ├── writing-plans/SKILL.md
   │ ├── executing-tasks/SKILL.md
   │ ├── verify/SKILL.md
   │ ├── finalizing/SKILL.md
   │ └── diagnose/SKILL.md
   ```
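
Steps 1 and 4 can be written to be idempotent, which also covers the "verify already in README" edge case. The sketch below assumes the existing Execute row starts with `| **Execute** |`. `updateTagline`, `addVerifyRow`, and `applyTask2Edits` are hypothetical helpers; the other steps would follow the same check-then-edit pattern.

```typescript
// Hypothetical sketch of two Task 2 edits, written to be idempotent so a
// re-run cannot introduce duplicates.
const VERIFY_ROW =
  "| **Verify** | `/skill:verify` | Three expert review passes (security, optimization, traceability) on implemented code |";

// Step 1: tagline.
function updateTagline(readme: string): string {
  if (readme.includes("execute→verify→finalize")) return readme; // already updated
  // String.prototype.replace with a string pattern changes the first match only.
  return readme.replace(
    "brainstorm→plan→execute→finalize",
    "brainstorm→plan→execute→verify→finalize",
  );
}

// Step 4: skill table row, inserted directly after the Execute row.
function addVerifyRow(readme: string): string {
  const lines = readme.split("\n");
  if (lines.some((line) => line.startsWith("| **Verify** |"))) return readme; // no duplicate
  const executeRow = lines.findIndex((line) => line.startsWith("| **Execute** |"));
  if (executeRow === -1) throw new Error("Execute row not found in the skill table");
  lines.splice(executeRow + 1, 0, VERIFY_ROW);
  return lines.join("\n");
}

function applyTask2Edits(readme: string): string {
  return addVerifyRow(updateTagline(readme));
}
```

Because each function returns its input unchanged when the edit is already present, `applyTask2Edits(applyTask2Edits(x))` equals `applyTask2Edits(x)`.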
@@ -0,0 +1,11 @@
# Progress: Add Verify Skill

Plan: docs/plans/2026-06-03-add-verify-skill-implementation.md
Branch: add-verify-skill
Started: 2026-06-03T13:00:00Z
Last updated: 2026-06-03T13:00:00Z

| # | Status | Task | Commit |
|---|--------|------|--------|
| 1 | ✅ done | Create the verify skill | c48d47a |
| 2 | ✅ done | Update README with verify skill | ea37ea8 |
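
If tooling needs to read this progress file, a parser for the table above could look like the sketch below. It assumes task names never contain `|` and that a finished task's status contains the word `done`. `parseProgress` and `allTasksDone` are hypothetical.

```typescript
// Hypothetical parser for the progress table format shown above.
interface ProgressRow {
  id: number;
  status: string;
  task: string;
  commit: string;
}

function parseProgress(markdown: string): ProgressRow[] {
  const rows: ProgressRow[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    // Data rows look like "| 1 | ✅ done | Task name | c48d47a |".
    const m = /^\|\s*(\d+)\s*\|([^|]*)\|([^|]*)\|([^|]*)\|\s*$/.exec(line);
    if (!m) continue; // skips the header, separator, and non-table lines
    rows.push({
      id: Number(m[1]),
      status: m[2].trim(),
      task: m[3].trim(),
      commit: m[4].trim(),
    });
  }
  return rows;
}

// Done means: at least one row, and every row is marked done with a commit.
function allTasksDone(rows: ProgressRow[]): boolean {
  return rows.length > 0 && rows.every((r) => r.status.includes("done") && r.commit !== "");
}
```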