@tianhai/pi-workflow-kit 0.15.0 → 0.17.1
- package/README.md +23 -13
- package/docs/plans/2026-06-03-karpathy-guidelines-ab-comparison.md +166 -0
- package/docs/plans/completed/2026-05-22-agentic-agile-enhancements-design.md +77 -0
- package/docs/plans/completed/2026-05-22-agentic-agile-enhancements-implementation.md +473 -0
- package/docs/plans/completed/2026-05-25-design-review-split-implementation.md +622 -0
- package/docs/plans/completed/2026-05-25-design-review-split-progress.md +16 -0
- package/docs/plans/completed/2026-05-25-pr5-improvements-implementation.md +273 -0
- package/docs/plans/completed/2026-05-25-pr5-improvements-progress.md +17 -0
- package/docs/plans/completed/2026-06-03-add-verify-skill-design.md +51 -0
- package/docs/plans/completed/2026-06-03-add-verify-skill-implementation.md +111 -0
- package/docs/plans/completed/2026-06-03-add-verify-skill-progress.md +11 -0
- package/docs/plans/completed/2026-06-03-verify-skill-design.md +176 -0
- package/extensions/workflow-guard.ts +174 -185
- package/package.json +1 -1
- package/skills/brainstorming/SKILL.md +6 -1
- package/skills/design-review/SKILL.md +113 -0
- package/skills/executing-tasks/SKILL.md +17 -8
- package/skills/finalizing/SKILL.md +5 -3
- package/skills/verify/SKILL.md +170 -0
- package/skills/writing-plans/SKILL.md +121 -1
@@ -0,0 +1,273 @@
# Implementation Plan: PR #5 Improvements

Fixes CI failures, tightens consistency across the workflow chain, and improves the design-review integration so every skill speaks with one voice.

---

## Task 1: Fix biome lint and format errors in workflow-guard

<!-- tdd: modifying-tested-code -->

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: The biome linter runs on `extensions/workflow-guard.ts` and `tests/workflow-guard.test.ts`
  - When: `npx biome check` is executed
  - Then: Zero errors and zero warnings are emitted
- **Edge Case (no functional regression)**:
  - Given: The existing test suite for workflow-guard
  - When: `npx vitest run` is executed
  - Then: All 27 existing tests still pass

Files:
- `extensions/workflow-guard.ts`
- `tests/workflow-guard.test.ts`

Steps:
1. Fix `extensions/workflow-guard.ts` line 163 — replace string concatenation with template literal:
```ts
// Before:
return !absolute.startsWith(plansDir + "/");
// After:
return !absolute.startsWith(`${plansDir}/`);
```
2. Fix `tests/workflow-guard.test.ts` line 1 — remove unused `beforeEach` import:
```ts
// Before:
import { describe, it, expect, beforeEach } from "vitest";
// After:
import { describe, it, expect } from "vitest";
```
3. Fix `tests/workflow-guard.test.ts` line 19 — remove unused `getCurrentPhase` import:
```ts
// Before:
import { getCurrentPhase, isSafeCommand, shouldBlockFilePath } from "../extensions/workflow-guard";
// After:
import { isSafeCommand, shouldBlockFilePath } from "../extensions/workflow-guard";
```
4. Run `npx biome check extensions/ tests/` — confirm zero errors.
5. Run `npx vitest run` — confirm all tests pass.
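
Step 1 is a syntax-only change, but the trailing slash in that check matters for correctness. A minimal sketch, assuming a helper named `isOutsidePlansDir` (hypothetical; the real function around line 163 of `workflow-guard.ts` is not shown in this plan) and already-normalized absolute POSIX paths:

```ts
// Hypothetical helper illustrating the prefix check from step 1.
// Assumes `absolute` and `plansDir` are normalized absolute POSIX paths.
function isOutsidePlansDir(absolute: string, plansDir: string): boolean {
  // The trailing "/" keeps sibling directories such as "docs/plans-old"
  // from being treated as children of "docs/plans".
  return !absolute.startsWith(`${plansDir}/`);
}

const examplePlansDir = "/repo/docs/plans";
console.log(isOutsidePlansDir("/repo/docs/plans/design.md", examplePlansDir)); // false
console.log(isOutsidePlansDir("/repo/docs/plans-old/design.md", examplePlansDir)); // true
console.log(isOutsidePlansDir("/repo/src/index.ts", examplePlansDir)); // true
```

The template literal and the concatenation produce the same string, so the existing tests should not need to change.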

---

## Task 2: Fix CHANGELOG `[Unreleased]` link

<!-- tdd: trivial -->

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: The CHANGELOG.md file with version link references
  - When: A reader clicks the `[Unreleased]` link
  - Then: It shows changes between v0.16.0 and HEAD (not v0.14.0)

Files:
- `CHANGELOG.md`

Steps:
1. Update the `[Unreleased]` link at the bottom of CHANGELOG.md:
```markdown
// Before:
[Unreleased]: https://github.com/yinloo-ola/pi-workflow-kit/compare/v0.14.0...HEAD
// After:
[Unreleased]: https://github.com/yinloo-ola/pi-workflow-kit/compare/v0.16.0...HEAD
```

---

## Task 3: Align skill descriptions — trim design-review, match tone

<!-- tdd: trivial -->

All six skills should follow the same description pattern: a one-sentence summary of what the skill does, followed by trigger guidance. The `design-review` description currently front-loads usage instructions that belong in the skill body.

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: The `description` frontmatter of `skills/design-review/SKILL.md`
  - When: Compared to the other five skill descriptions
  - Then: It follows the same pattern — concise purpose first, trigger guidance second
- **Edge Case (no information loss)**:
  - Given: The trimmed description
  - When: An agent reads it for skill matching
  - Then: It still contains enough signal to trigger correctly (keywords: audit, design, production risks, security, scalability)

Files:
- `skills/design-review/SKILL.md`

Steps:
1. Replace the description in `skills/design-review/SKILL.md` frontmatter:
```yaml
// Before:
description: "Audit a design doc for production risks — security, scalability, fault tolerance, and operational hazards. Run after brainstorming, before writing-plans. Use when the brainstorm flags a non-trivial design, or when you want to stress-test a design for production readiness."
// After:
description: "Audit a design doc for production risks — security, scalability, fault tolerance, and operational hazards. Use after brainstorming for non-trivial designs, or when you want to stress-test a design for production readiness."
```
This trims the redundant "Run after brainstorming, before writing-plans" (the workflow order is documented in README) while keeping the trigger guidance.
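
The "no information loss" edge case can be checked mechanically. The sketch below is illustrative only: `missingKeywords` is a hypothetical helper, not part of the kit, and the keyword list is copied from the acceptance criterion above.

```ts
// Keywords taken from the edge-case criterion above.
const REQUIRED_KEYWORDS: string[] = ["audit", "design", "production risks", "security", "scalability"];

// Hypothetical helper: returns the keywords that a description no longer contains.
function missingKeywords(description: string, keywords: string[] = REQUIRED_KEYWORDS): string[] {
  const haystack = description.toLowerCase();
  return keywords.filter((keyword) => !haystack.includes(keyword));
}

// The "After" description from step 1, copied verbatim.
const trimmedDescription =
  "Audit a design doc for production risks — security, scalability, fault tolerance, and operational hazards. " +
  "Use after brainstorming for non-trivial designs, or when you want to stress-test a design for production readiness.";

console.log(missingKeywords(trimmedDescription)); // []
```

An empty result means every trigger keyword survived the trim. Substring matching is a rough proxy for how an agent matches skills, so treat it as a sanity check rather than a guarantee.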

---

## Task 4: Remove redundant user confirmation for trivial design-review

<!-- tdd: modifying-tested-code -->

The brainstorming skill already asked the user to classify trivial vs non-trivial. When the design doc says "Simple change — no design review needed", the user already made that decision. Asking again in design-review step 2 adds friction without value.

Acceptance Criteria (QA Engineer Hat):
- **Happy Path (trivial skip)**:
  - Given: A design doc with "Simple change — no design review needed"
  - When: `/skill:design-review` is run
  - Then: The agent automatically appends the "Skipped — trivial change" section and moves on, without asking the user to confirm
- **Edge Case (non-trivial proceeds normally)**:
  - Given: A design doc without the trivial marker
  - When: `/skill:design-review` is run
  - Then: The full audit proceeds as before (no behavior change)

Files:
- `skills/design-review/SKILL.md`

Steps:
1. In step 2 ("Check triviality"), replace the interactive confirmation with an automatic skip:
```markdown
// Before:
2. **Check triviality** — if the design doc notes "Simple change — no design review needed", confirm with the user: "This looks like a trivial change. Skip the full audit?" If yes, append a brief section:

// After:
2. **Check triviality** — if the design doc notes "Simple change — no design review needed", append a brief section:
```
2. Remove the "If yes," conditional — the append and stop is now unconditional for trivial docs.
3. Verify the file reads cleanly — the flow is now: find doc → check trivial → (if trivial: append + stop, else: continue to step 3).
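
The resulting flow from step 3 can be sketched as a single decision with no user prompt. This is an illustration, not the skill's implementation: the skill is prose instructions for an agent, and `decideReviewAction` and the `ReviewAction` names are hypothetical.

```ts
// Marker text quoted from the design doc convention above; everything else is illustrative.
const TRIVIAL_MARKER = "Simple change — no design review needed";

type ReviewAction = "append-skip-section" | "run-full-audit";

// Hypothetical decision function for step 2 ("Check triviality").
// There is no confirmation branch: the marker alone decides the outcome.
function decideReviewAction(designDoc: string): ReviewAction {
  return designDoc.includes(TRIVIAL_MARKER) ? "append-skip-section" : "run-full-audit";
}

console.log(decideReviewAction("## Production Risks\n\nSimple change — no design review needed."));
console.log(decideReviewAction("## Design\n\nAdds a new payments service."));
```

The first call takes the skip path and the second falls through to the full audit, which mirrors the two acceptance criteria.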

---

## Task 5: Evaluate design for review need regardless of design doc presence

<!-- tdd: trivial -->

The brainstorming skill flags "database changes, external services, auth, concurrency, large data flows" for design-review. The writing-plans safety net checks for a slightly different list and only when a design doc exists. When writing-plans is used standalone (no design doc), the safety net never fires — so a non-trivial standalone design skips design-review entirely.

The fix: always evaluate whether the design involves high-risk patterns, regardless of source. A design doc with no `## Architectural Review` section and a standalone user description both deserve the same scrutiny.

Acceptance Criteria (QA Engineer Hat):
- **Happy Path (design doc, no review section)**:
  - Given: A design doc exists without an `## Architectural Review` section, and the design involves database schema changes
  - When: Writing-plans step 1 runs
  - Then: The agent prompts the user to run `/skill:design-review` or type 'proceed'
- **Happy Path (standalone, non-trivial)**:
  - Given: No design doc exists, and the user describes a feature involving authentication and external API integrations
  - When: Writing-plans gathers context
  - Then: The agent prompts: "This design involves [auth, external APIs] but hasn't been reviewed for production risks. Run `/skill:design-review` first, or type 'proceed' to skip."
- **Edge Case (design doc already reviewed)**:
  - Given: A design doc with an `## Architectural Review` section
  - When: Writing-plans step 1 runs
  - Then: No prompt — the review already happened
- **Edge Case (trivial)**:
  - Given: A trivial design (config rename, simple field addition) with or without a design doc
  - When: Writing-plans evaluates the design
  - Then: No prompt — no trigger categories matched

Files:
- `skills/brainstorming/SKILL.md`
- `skills/writing-plans/SKILL.md`

Steps:
1. Update `skills/brainstorming/SKILL.md` step 4 to match writing-plans' more specific list:
```markdown
// Before:
For non-trivial designs, note any areas that may need production-risk review (database changes, external services, auth, concurrency, large data flows). You don't need to audit them here — just flag them for the design-review stage.

// After:
For non-trivial designs, note any areas that may need production-risk review (database schema changes, authentication or authorization, external API integrations, concurrency or batch processing, file uploads or large data flows, Redis/caching/message queues). You don't need to audit them here — just flag them for the design-review stage.
```
2. Update `skills/writing-plans/SKILL.md` step 1 — consolidate the safety net into one check that applies regardless of whether a design doc exists. Replace the current conditional with:
```markdown
// Before (current text — only checks design docs):
Then check whether the design doc has an `## Architectural Review` section. If it doesn't, and the design involves any of the following, prompt the user...

// After (unified check):
Then evaluate whether the design — whether from the design doc or from the user's description and codebase exploration — involves any of the following:

- Database schema changes or migrations
- Authentication or authorization logic
- External API or service integrations
- Concurrency or batch processing
- File uploads or large data flows
- Redis, caching, or message queues

If any apply AND the design doc does not already have an `## Architectural Review` section, prompt the user: "This design involves [list what you found] but hasn't been reviewed for production risks. Run `/skill:design-review` first, or type 'proceed' to skip."

If the design doc explicitly notes "Simple change — no design review needed", skip this check.
```
3. Verify the safety net fires for both design-doc and standalone paths, and skips when the review already exists.
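
For step 3, it can help to see the unified check as plain logic. The sketch below is an assumption-heavy illustration: in the kit the agent judges the categories itself, so the `TRIGGERS` keyword patterns, the `DesignInput` shape, and the `reviewPrompt` name are all hypothetical. Only the category list, the two skip conditions, and the prompt wording come from the text above.

```ts
interface DesignInput {
  designDoc?: string; // contents of the design doc, if one exists
  description?: string; // the user's description when writing-plans runs standalone
}

// Illustrative keyword patterns (assumptions); labels mirror the acceptance criteria.
const TRIGGERS: Array<[string, RegExp]> = [
  ["database schema changes", /\b(schema|migration)s?\b/i],
  ["auth", /\bauth(entication|orization)?\b/i],
  ["external APIs", /\bexternal (api|service)s?\b/i],
  ["concurrency or batch processing", /\b(concurrency|concurrent|batch)\b/i],
  ["file uploads or large data flows", /\b(file uploads?|large data)\b/i],
  ["caching or message queues", /\b(redis|cache|caching|message queues?)\b/i],
];

// Returns the prompt to show the user, or null when no prompt is needed.
function reviewPrompt(input: DesignInput): string | null {
  const doc = input.designDoc ?? "";
  // Skip 1: the design doc was explicitly marked trivial.
  if (doc.includes("Simple change — no design review needed")) return null;
  // Skip 2: the review already happened.
  if (/^## Architectural Review\b/m.test(doc)) return null;
  // Same evaluation for both sources: design doc and standalone description.
  const text = `${doc}\n${input.description ?? ""}`;
  const found = TRIGGERS.filter(([, pattern]) => pattern.test(text)).map(([label]) => label);
  if (found.length === 0) return null;
  return `This design involves [${found.join(", ")}] but hasn't been reviewed for production risks. Run \`/skill:design-review\` first, or type 'proceed' to skip.`;
}

console.log(reviewPrompt({ description: "Add authentication and external API integrations" }));
```

The point of the sketch is that the design doc and the standalone description feed the same evaluation, so the standalone path can no longer bypass the safety net.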

---

## Task 6: Generalize `NODE_ENV` reference in executing-tasks

<!-- tdd: trivial -->

The QA Test frame references `NODE_ENV` which is Node.js-specific. Since the workflow kit is used across languages (the examples reference SQL, Go, etc.), this should be generalized.

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: The QA Test frame in `skills/executing-tasks/SKILL.md`
  - When: A developer working in Python or Go reads it
  - Then: The guidance makes sense without Node.js context
- **Edge Case (Node.js users still understand)**:
  - Given: A Node.js developer reading the same text
  - When: They see the generalized phrasing
  - Then: They understand it means `NODE_ENV=test` or equivalent

Files:
- `skills/executing-tasks/SKILL.md`

Steps:
1. In the QA Test frame, replace the `NODE_ENV` reference:
```markdown
// Before:
External dependencies must be mocked or stubbed. `NODE_ENV` must be `test` (or equivalent).

// After:
External dependencies must be mocked or stubbed. Ensure the test environment is isolated (e.g., `NODE_ENV=test`, `GO_ENV=test`, or equivalent for your stack).
```

---

## Task 7: Deduplicate test coverage requirement in writing-plans task format

<!-- tdd: trivial -->

The "Each task must include" section has two overlapping bullets about test coverage:
1. The new Acceptance Criteria block (Happy Path + Edge Cases)
2. The old "Each task's tests should cover the happy path and at least one edge case" bullet

The Acceptance Criteria block supersedes the old bullet. Keeping both is redundant and confusing.

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: The task format section in `skills/writing-plans/SKILL.md`
  - When: Reading the "Each task must include" bullets
  - Then: Test coverage is specified exactly once (in the Acceptance Criteria block), not duplicated

Files:
- `skills/writing-plans/SKILL.md`

Steps:
1. Remove the redundant bullet from "Each task must include":
```markdown
// Remove this line (now covered by Acceptance Criteria):
- Each task's tests should cover the happy path and at least one edge case or error path, with concrete assertions
```
2. Verify the Acceptance Criteria bullet already covers this requirement with its "Happy Path" and "Edge Cases & Error Paths" sub-bullets.

---

## Task 8: Run tests and verify CI passes

<!-- tdd: trivial -->

Files:
- None (verification only)

Steps:
1. Run `npx biome check extensions/ tests/` — confirm zero errors.
2. Run `npx vitest run` — confirm all existing tests pass.
3. Verify the CHANGELOG link renders correctly.
@@ -0,0 +1,17 @@
# Progress: PR #5 Improvements

Plan: docs/plans/2026-05-25-pr5-improvements-implementation.md
Branch: design-review-split
Started: 2026-05-25T12:00:00Z
Last updated: 2026-05-25T12:26:00Z

| # | Status | Task | Commit |
|---|--------|------|--------|
| 1 | ✅ done | Fix biome lint and format errors in workflow-guard | 6f2eb8c |
| 2 | ✅ done | Fix CHANGELOG `[Unreleased]` link | 4866c25 |
| 3 | ✅ done | Align skill descriptions — trim design-review, match tone | 541ea9b |
| 4 | ✅ done | Remove redundant user confirmation for trivial design-review | 953b6d6 |
| 5 | ✅ done | Evaluate design for review need regardless of design doc presence | 20ea47e |
| 6 | ✅ done | Generalize `NODE_ENV` reference in executing-tasks | b117a44 |
| 7 | ✅ done | Deduplicate test coverage requirement in writing-plans task format | 3a34266 |
| 8 | ✅ done | Run tests and verify CI passes | — |
@@ -0,0 +1,51 @@
# Add Verify Skill — Design Doc

## Context

Based on [Chris Lema's "The Last Prompt"](https://chrislema.com/the-last-prompt-you-need-when-building-software-with-ai), we need a post-implementation code verification phase in pi-workflow-kit. The existing `design-review` skill validates architecture *intentions* at the design-doc level, but there's no review of the *actual implemented code*. This is where the most dangerous bugs hide: signature mismatches between layers, dead code, duplicated logic, and security holes that pass tests but break in production.

## Decision

### Add a `verify` skill (new)

A single skill triggered by `/skill:verify` that runs three sequential expert review passes over implemented code:

1. **Security** 🔴 — adversarial review as if a junior wrote it and the best security expert is auditing
2. **Optimization** 🟡 — dead code, duplication, over/under-engineering, performance
3. **Traceability** 🔵 — end-to-end call chain verification across every layer boundary

Output: structured markdown report at `docs/plans/*-verification-report.md` with findings and actionable task list.

### Keep `design-review` unchanged

`design-review` stays between brainstorm and plan — it validates architecture before task breakdown. Moving it would lose the cheap "catch it before you build it" value.

### Update README

Add `verify` to the workflow diagram, skill table, and quick start. The pipeline becomes:

```
brainstorm → design-review → plan → execute → verify → finalize
```

## Workflow Integration

```
brainstorm → design-review (optional) → plan → execute → verify → finalize
                   ↑                                       ↑
               existing                                   new
```

- `verify` runs after `executing-tasks` and before `finalizing`
- It's optional — trivial changes can skip it
- The report's remediation task list feeds directly into a follow-up `/skill:writing-plans` if fixes are needed
- Read-only: can write to `docs/plans/` only, cannot modify source code
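
The ordering and optionality described in this section can be written down as data. This is a sketch for illustration only: the kit expresses the workflow through skill documents, and `PIPELINE`, `Phase`, and `phasesFor` are hypothetical names.

```ts
// Phase names follow the pipeline diagram; `optional` reflects the bullets in this section.
interface Phase {
  name: string;
  optional: boolean;
}

const PIPELINE: Phase[] = [
  { name: "brainstorm", optional: false },
  { name: "design-review", optional: true },
  { name: "plan", optional: false },
  { name: "execute", optional: false },
  { name: "verify", optional: true },
  { name: "finalize", optional: false },
];

// Hypothetical helper: the phases that run, skipping optional ones for trivial changes.
function phasesFor(trivialChange: boolean): string[] {
  return PIPELINE.filter((phase) => !(trivialChange && phase.optional)).map((phase) => phase.name);
}

console.log(phasesFor(false).join(" → "));
console.log(phasesFor(true).join(" → "));
```

For a trivial change both optional gates drop out, leaving brainstorm, plan, execute, and finalize.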

## Files to Change

1. **`skills/verify/SKILL.md`** — new skill (full content in `docs/plans/2026-06-03-verify-skill-design.md`)
2. **`README.md`** — update workflow diagram, skill table, quick start, and project structure

## Production Risks

Simple change — no design review needed. We're adding a new SKILL.md and updating documentation. No code execution, no external integrations, no security surface.
@@ -0,0 +1,111 @@
# Implementation Plan: Add Verify Skill

Design: `docs/plans/2026-06-03-add-verify-skill-design.md`

## Overview

Add a `verify` skill to pi-workflow-kit — a post-implementation code verification phase that runs three expert review passes (security, optimization, traceability) over implemented code. Also update the README to reflect the expanded workflow pipeline.

Full SKILL.md content is in `docs/plans/2026-06-03-verify-skill-design.md` (lines 7-176, inside the code fence).

## Task 1: Create the verify skill

<!-- tdd: trivial -->

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: No `skills/verify/` directory exists
  - When: `skills/verify/SKILL.md` is created
  - Then: The file contains valid YAML frontmatter with `name: verify` and a description mentioning security, optimization, and traceability. The file body contains all three review pass sections, the report format template, and the principles section.
- **Edge Case (skill already exists)**:
  - Given: `skills/verify/SKILL.md` already exists
  - When: Task runs
  - Then: The existing file is overwritten with the new content

Files:
- `skills/verify/SKILL.md`

Steps:
1. Create the directory `skills/verify/`
2. Create `skills/verify/SKILL.md` with the full content from the design draft. The content is the markdown inside the code fence in `docs/plans/2026-06-03-verify-skill-design.md` (lines 8-176). Copy it exactly — it includes:
   - YAML frontmatter with name and description
   - # Verify heading and intro paragraph
   - ## Process section (5 steps)
   - ## Pass 1 — Security Review 🔴 (framing, what to look for, severity table)
   - ## Pass 2 — Optimization Review 🟡 (framing, what to look for, priority table)
   - ## Pass 3 — Traceability Review 🔵 (framing, what to look for 4 sub-items, severity table)
   - ## Report Format section (full template with summary table, findings sections, remediation task list)
   - ## Principles section (5 bullets)

## Task 2: Update README with verify skill

<!-- tdd: trivial -->

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: README.md has the current workflow (brainstorm → design-review → plan → execute → finalize)
  - When: README is updated
  - Then: All six sections are updated — tagline, workflow diagram, skill table, phase control, quick start, and project structure — to include `verify` between execute and finalize.
- **Edge Case (verify already in README)**:
  - Given: README already contains verify references
  - When: Task runs
  - Then: No duplicate entries are introduced

Files:
- `README.md`

Steps:

1. Update the tagline (line 3) — change `brainstorm→plan→execute→finalize` to `brainstorm→plan→execute→verify→finalize`:
```
> Stop AI agents from rushing to code. Enforce a structured brainstorm→plan→execute→verify→finalize workflow with TDD discipline.
```

2. Update the "🧠 6 Workflow Skills" heading (line 36) to "🧠 7 Workflow Skills"

3. Update the workflow diagram (lines 40-44) to:
```
brainstorm → design-review → plan → execute → verify → finalize
↕
diagnose (anytime)
```

4. Add verify to the skill table (after the Execute row, before Finalize):
```
| **Verify** | `/skill:verify` | Three expert review passes (security, optimization, traceability) on implemented code |
```

5. Update the phase control section (lines 61-67) to add verify:
```
/skill:brainstorming → discuss and design
/skill:design-review → audit for production risks (non-trivial designs)
/skill:writing-plans → break into tasks
/skill:executing-tasks → implement with TDD
/skill:verify → review code for security, optimization, and traceability issues
/skill:finalizing → ship it
```

6. Update the quick start section (lines 110-135) to add verify between executing-tasks and finalizing:
```
> /skill:executing-tasks

# (agent implements with TDD, cognitive persona shifts, all tools unlocked)
> /skill:verify

# (agent runs security, optimization, and traceability reviews on implemented code)
> /skill:finalizing

# (agent archives docs, curates lessons, creates PR)
```

7. Update the project structure (lines 146-161) to add verify:
```
├── skills/
│   ├── brainstorming/SKILL.md
│   ├── design-review/SKILL.md
│   ├── writing-plans/SKILL.md
│   ├── executing-tasks/SKILL.md
│   ├── verify/SKILL.md
│   ├── finalizing/SKILL.md
│   └── diagnose/SKILL.md
```
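
The Task 2 edge case ("No duplicate entries are introduced") amounts to making the skill-table edit idempotent. A minimal sketch, assuming the Finalize row starts with `| **Finalize**` by analogy with the Verify row quoted in step 4; the README's actual row text is not shown in this plan, and `addVerifyRow` is a hypothetical helper.

```ts
// Row text copied from step 4 of Task 2.
const VERIFY_ROW =
  "| **Verify** | `/skill:verify` | Three expert review passes (security, optimization, traceability) on implemented code |";

// Hypothetical helper: inserts the Verify row before the Finalize row, at most once.
function addVerifyRow(readme: string): string {
  // Already present: return the input unchanged so reruns add nothing.
  if (readme.includes("`/skill:verify`")) return readme;
  const lines = readme.split("\n");
  // Assumption: the Finalize row begins with "| **Finalize**".
  const finalizeIndex = lines.findIndex((line) => line.startsWith("| **Finalize**"));
  // Table layout not recognized: leave the file untouched rather than guess.
  if (finalizeIndex === -1) return readme;
  lines.splice(finalizeIndex, 0, VERIFY_ROW);
  return lines.join("\n");
}
```

Running the helper twice yields the same text as running it once, which is the property the edge case asks for.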
@@ -0,0 +1,11 @@
# Progress: Add Verify Skill

Plan: docs/plans/2026-06-03-add-verify-skill-implementation.md
Branch: add-verify-skill
Started: 2026-06-03T13:00:00Z
Last updated: 2026-06-03T13:00:00Z

| # | Status | Task | Commit |
|---|--------|------|--------|
| 1 | ✅ done | Create the verify skill | c48d47a |
| 2 | ✅ done | Update README with verify skill | ea37ea8 |
@@ -0,0 +1,176 @@
# Verify Skill — Draft SKILL.md

> **Target path:** `skills/verify/SKILL.md` (to be created during executing-tasks)

---

```markdown
---
name: verify
description: "Post-implementation code verification with three expert review passes — security, optimization, and traceability. Use after executing-tasks and before finalizing to catch issues that pass tests but break in production. Runs the 'last prompt' pattern: adversarial security review, dead code and duplication audit, and end-to-end contract verification across every layer. Use this skill whenever the user says 'verify', 'review the code', 'check for issues', 'security review', 'the last prompt', 'audit', or when code has been implemented and needs a quality gate before shipping."
---

# Verify

Three expert review passes over the implemented codebase. Read-only — you **may** write the verification report to `docs/plans/`, but you **may not** modify source code.

The core insight: code that passes tests is not code that's ready. Working code can have security holes, dead branches, duplicated logic, and broken contracts between layers — especially when AI generates across many files without maintaining a single mental model of the whole system. This skill catches what tests miss.

## Process

1. **Check what's been done** — run `git log --oneline` and `git diff --stat` to understand the scope of recent changes. If nothing has been implemented, say "No code changes found. Run `/skill:executing-tasks` first." and stop.

2. **Identify the project's layers** — before reviewing, map the codebase's architecture. Look for layer boundaries: UI/handlers/routes → services/business logic → repositories/data access → database/models. Note the patterns: does the project use controllers, handlers, or routes? Services or use cases? Repositories or DAOs? This map drives the traceability pass.

3. **Run three expert review passes** — each pass adopts a distinct adversarial framing. Do them sequentially. For each pass, read the relevant code deeply — don't skim. Then write findings.

4. **Compile the report** — write all findings to `docs/plans/*-verification-report.md`. Present the report to the user and wait for feedback.

5. **Offer to create a remediation plan** — after the report, ask: "Want me to create a fix plan from these findings? Run `/skill:writing-plans` to turn the task list into executable tasks."

## Pass 1 — Security Review 🔴

**Framing:** A junior developer wrote this code. Now the best security expert on the team is reviewing it — adversarial, suspicious of everything. Trust nothing.

**What to look for:**

- **Input validation** — every external input (HTTP params, form data, headers, query strings, environment variables) must be validated and sanitized. Unvalidated input is a critical finding.
- **Authentication & authorization** — every endpoint that handles user data must have auth checks. Are there endpoints that skip auth? Can one user access another user's data by changing an ID?
- **Injection** — SQL queries built by string concatenation, unsanitized shell commands, template injection, XSS in HTML output. Any raw variable interpolated into a query or command is critical.
- **Secrets** — API keys, passwords, tokens hardcoded in source files. Check environment variable loading — are defaults set to empty or to actual secrets?
- **Data exposure** — are sensitive fields (passwords, tokens, PII) logged, returned in API responses, or stored unencrypted?
- **Dependency risks** — known-vulnerable packages (if `package.json`/`go.mod`/`requirements.txt` is present).

**Severity classification:**

| Severity | Definition |
|----------|-----------|
| Critical | Exploitable right now — auth bypass, injection, data leak |
| High | Likely exploitable — missing validation on sensitive endpoint, weak auth |
| Medium | Harder to exploit but real risk — verbose error messages leaking internals, missing rate limits |
| Low | Best practice violations — missing CSP headers, no HSTS, long session timeouts |

## Pass 2 — Optimization Review 🟡

**Framing:** A code quality expert looking for waste — things that make the codebase harder to maintain, slower to run, or more confusing than necessary.

**What to look for:**

- **Dead code** — functions, methods, types, or exports that are never called anywhere in the codebase. Search for definitions and verify they have callers.
- **Duplication** — the same logic implemented in slightly different ways across multiple files. AI-generated code is especially prone to this — if context was lost between sessions, the AI solved the same sub-problem differently in two places. Flag each pair with file paths and line numbers.
- **Over-engineering** — abstractions, interfaces, or layers that add complexity without earning their keep (only one implementation, no real variation across the seam).
- **Under-engineering** — god functions, 200-line blocks, deeply nested conditionals that should be extracted.
- **Performance concerns** — N+1 queries, unbounded loops, unnecessary copies of large data structures, missing pagination on list endpoints.

**Priority classification:**

| Priority | Definition |
|----------|-----------|
| P0 | Dead code in a critical path or duplicated logic that will diverge |
| P1 | Significant duplication or over-engineering that increases maintenance cost |
| P2 | Minor cleanups — long functions, missing pagination, style inconsistencies |
|
|
72
|
+
|
|
73
|
+
## Pass 3 — Traceability Review 🔵
|
|
74
|
+
|
|
75
|
+
**Framing:** An integration expert tracing every user-facing action end-to-end — from UI to database and back. The AI generates code file-by-file, and the seams between files are where bugs hide.
|
|
76
|
+
|
|
77
|
+
**What to look for:**
|
|
78
|
+
|
|
79
|
+
1. **Map every entry point** — list all handlers, routes, controllers, or event listeners that receive external input.
|
|
80
|
+
2. **Trace each call chain** — for each entry point, follow the call: handler → service → repository → database. At each boundary, verify:
|
|
81
|
+
- **Function name** — does the caller use the exact function name the callee exposes?
|
|
82
|
+
- **Argument names** — does the caller pass `userId` when the function expects `user_id`? Does `id` mean the same thing in both layers?
|
|
83
|
+
- **Argument types** — is a string passed where an integer is expected? Is an object shape different from what the next layer destructures?
|
|
84
|
+
- **Return shape** — does the caller expect fields that the callee actually returns? Are response DTOs consistent across layers?
|
|
85
|
+
3. **Check error propagation** — when a database query returns no results, does the service layer handle it? Does the handler return 404 or 500? Do errors propagate cleanly or get swallowed silently?
|
|
86
|
+
4. **Verify the round-trip** — if the UI calls `getUser(id)` and displays `user.name`, trace that `name` actually exists in the DB schema, gets selected by the query, mapped by the repository, passed through the service, included in the response, and rendered by the UI.
|
|
87
|
+
|
|
88
|
+
**This is the pass that catches the most bugs.** AI-generated code will often have a frontend calling `getUserProfile(userId)` and a backend exposing `get_user_profile(user_id)` — both work in isolation, neither works together.
|
|
89
|
+
|
|
90
|
+
**Severity classification:**
|
|
91
|
+
|
|
92
|
+
| Severity | Definition |
|
|
93
|
+
|----------|-----------|
|
|
94
|
+
| Critical | Call chain is completely broken — function doesn't exist or signature is fundamentally wrong |
|
|
95
|
+
| High | Signature mismatch — wrong arg names, wrong types, missing required fields |
|
|
96
|
+
| Medium | Silent error handling — errors swallowed without logging or user feedback |
|
|
97
|
+
| Low | Inconsistent naming conventions that could confuse future developers |
|
|
98
|
+
|
|
99
|
+
## Report Format
|
|
100
|
+
|
|
101
|
+
Write findings to `docs/plans/*-verification-report.md` using this structure:
|
|
102
|
+
|
|
103
|
+
# Verification Report: <feature/topic>
|
|
104
|
+
|
|
105
|
+
**Date:** <ISO date>
|
|
106
|
+
**Scope:** <summary of what was reviewed>
|
|
107
|
+
**Reviewer:** AI verify skill (security + optimization + traceability)
|
|
108
|
+
|
|
109
|
+
## Summary
|
|
110
|
+
|
|
111
|
+
| Pass | Critical | High | Medium | Low |
|
|
112
|
+
|------|----------|------|--------|-----|
|
|
113
|
+
| Security | X | X | X | X |
|
|
114
|
+
| Optimization | — | X (P0) | X (P1) | X (P2) |
|
|
115
|
+
| Traceability | X | X | X | X |
|
|
116
|
+
| **Total** | **X** | **X** | **X** | **X** |
|
|
117
|
+
|
|
118
|
+
## 🔴 Security Findings
|
|
119
|
+
|
|
120
|
+
### [S-001] Critical — <short title>
|
|
121
|
+
|
|
122
|
+
**Location:** `path/to/file.ts:line`
|
|
123
|
+
|
|
124
|
+
**Issue:** <what's wrong and why it matters>
|
|
125
|
+
|
|
126
|
+
**Fix:** <concrete remediation step>
|
|
127
|
+
|
|
128
|
+
### [S-002] High — <short title>
|
|
129
|
+
...
|
|
130
|
+
|
|
131
|
+
## 🟡 Optimization Findings
|
|
132
|
+
|
|
133
|
+
### [O-001] P0 — <short title>
|
|
134
|
+
|
|
135
|
+
**Location:** `path/to/file.ts:line` and `path/to/other.ts:line`
|
|
136
|
+
|
|
137
|
+
**Issue:** <what's wrong>
|
|
138
|
+
|
|
139
|
+
**Fix:** <concrete remediation step>
|
|
140
|
+
|
|
141
|
+
### [O-002] P1 — <short title>
|
|
142
|
+
...
|
|
143
|
+
|
|
144
|
+
## 🔵 Traceability Findings
|
|
145
|
+
|
|
146
|
+
### [T-001] Critical — <short title>
|
|
147
|
+
|
|
148
|
+
**Entry point:** `path/to/handler.ts:line`
|
|
149
|
+
**Call chain:** handler → service → repository → DB
|
|
150
|
+
**Broken at:** <which boundary>
|
|
151
|
+
**Issue:** <what's wrong — e.g., handler passes `userId` but service expects `user_id`>
|
|
152
|
+
|
|
153
|
+
**Fix:** <concrete remediation step>
|
|
154
|
+
|
|
155
|
+
### [T-002] High — <short title>
|
|
156
|
+
...
|
|
157
|
+
|
|
158
|
+
## Remediation Task List
|
|
159
|
+
|
|
160
|
+
Convert findings into actionable tasks:
|
|
161
|
+
|
|
162
|
+
| ID | Priority | Finding | Estimated Effort |
|
|
163
|
+
|----|----------|---------|-----------------|
|
|
164
|
+
| S-001 | Critical | <one-liner> | <small/medium/large> |
|
|
165
|
+
| T-001 | Critical | <one-liner> | <small/medium/large> |
|
|
166
|
+
| O-001 | P0 | <one-liner> | <small/medium/large> |
|
|
167
|
+
| ... | ... | ... | ... |
|
|
168
|
+
|
|
169
|
+
## Principles
|
|
170
|
+
|
|
171
|
+
- **Be specific** — every finding must include a file path and line reference. "There might be security issues" is useless.
|
|
172
|
+
- **Be adversarial** — actively look for problems. If you don't find any, say so — but don't phone it in.
|
|
173
|
+
- **Be proportional** — a small config change doesn't need the same depth as a new API endpoint. Adjust your review depth to the scope of changes.
|
|
174
|
+
- **Don't fix anything** — this is read-only. Find and report. The user decides what to fix and when.
|
|
175
|
+
- **Focus on seams** — the traceability pass is where the most value lives. Code within a single file is usually coherent; the bugs hide between files.
|
|
176
|
+
```
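The boundary check described in the traceability pass (a caller passing `userId` where the callee expects `user_id`) can be sketched in TypeScript. This is a minimal illustration only: `Boundary`, `Mismatch`, `findMismatches`, and the sample chain are hypothetical names invented for the example and are not part of the skill or the package.

```typescript
// Hypothetical description of one call boundary in a traced chain.
interface Boundary {
  caller: string;     // layer making the call, e.g. "handler"
  callee: string;     // layer receiving the call, e.g. "service"
  passed: string[];   // argument names the caller supplies
  expected: string[]; // argument names the callee requires
}

// Hypothetical finding produced for a boundary whose names do not line up.
interface Mismatch {
  boundary: string;
  missing: string[];    // expected by the callee but never passed
  unexpected: string[]; // passed by the caller but not accepted
}

// Compare argument names at each boundary and collect the disagreements.
function findMismatches(chain: Boundary[]): Mismatch[] {
  const result: Mismatch[] = [];
  for (const b of chain) {
    const missing = b.expected.filter((name) => !b.passed.includes(name));
    const unexpected = b.passed.filter((name) => !b.expected.includes(name));
    if (missing.length > 0 || unexpected.length > 0) {
      result.push({ boundary: `${b.caller} -> ${b.callee}`, missing, unexpected });
    }
  }
  return result;
}

// Assumed sample chain: the handler uses camelCase, the service snake_case.
const chain: Boundary[] = [
  { caller: "handler", callee: "service", passed: ["userId"], expected: ["user_id"] },
  { caller: "service", callee: "repository", passed: ["user_id"], expected: ["user_id"] },
];

const mismatches = findMismatches(chain);
console.log(JSON.stringify(mismatches, null, 2));
```

Each reported mismatch would correspond to one `[T-xxx]` finding, with the `boundary` value filling the "Broken at" field. A real review also has to compare argument types and return shapes, which this name-only sketch does not attempt.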
|