@tianhai/pi-workflow-kit 0.14.0 → 0.16.0
- package/README.md +16 -8
- package/docs/plans/completed/2026-05-20-generic-lessons-design.md +70 -0
- package/docs/plans/completed/2026-05-20-generic-lessons-implementation.md +114 -0
- package/docs/plans/completed/2026-05-20-generic-lessons-progress.md +11 -0
- package/docs/plans/completed/2026-05-22-agentic-agile-enhancements-design.md +77 -0
- package/docs/plans/completed/2026-05-22-agentic-agile-enhancements-implementation.md +473 -0
- package/docs/plans/completed/2026-05-25-design-review-split-implementation.md +622 -0
- package/docs/plans/completed/2026-05-25-design-review-split-progress.md +16 -0
- package/docs/plans/completed/2026-05-25-pr5-improvements-implementation.md +273 -0
- package/docs/plans/completed/2026-05-25-pr5-improvements-progress.md +17 -0
- package/extensions/workflow-guard.ts +174 -185
- package/package.json +1 -1
- package/skills/brainstorming/SKILL.md +6 -1
- package/skills/design-review/SKILL.md +113 -0
- package/skills/executing-tasks/SKILL.md +26 -8
- package/skills/finalizing/SKILL.md +7 -3
- package/skills/writing-plans/SKILL.md +70 -1
@@ -0,0 +1,622 @@
# Implementation Plan: Split Design Review into a Separate Skill

Extracts the architectural/security review from `brainstorming` into a dedicated `design-review` skill. Updates the workflow chain to: brainstorm → design-review → writing-plans → executing-tasks → finalizing.

---

## Task 1: Update `skills/brainstorming/SKILL.md` — Remove Security Review, Add Trivial Gate

<!-- tdd: modifying-tested-code -->

Files:
- `skills/brainstorming/SKILL.md`

Acceptance Criteria (QA Engineer Hat):
- **Happy Path (non-trivial)**:
  - Given: A user runs `/skill:brainstorming` and a non-trivial design is proposed
  - When: The agent finishes writing the design doc
  - Then: The "After the design" section suggests running `/skill:design-review` before planning
- **Happy Path (trivial)**:
  - Given: A user runs `/skill:brainstorming` for a trivial change (e.g., renaming a column)
  - When: The agent finishes writing the design doc
  - Then: The "After the design" section says to skip design review and go straight to planning
- **Edge Path (security content removed)**:
  - Given: The current brainstorming SKILL.md contains inline 6 Pillars / 8 Hazards / 3 Socratic Heuristics
  - When: This task is complete
  - Then: None of that security content remains in brainstorming — it lives in the new design-review skill

Steps:
1. Read `skills/brainstorming/SKILL.md` in full
2. In step 4 ("Present the design"), add a brief trivial/non-trivial gate. Insert after the ADR guidance block (after the closing ` ``` ` of the ADR format) and before step 5:

```markdown
For non-trivial designs, note any areas that may need production-risk review (database changes, external services, auth, concurrency, large data flows). You don't need to audit them here — just flag them for the design-review stage.

For trivial changes (config, naming, simple field additions), note "Simple change — no design review needed" in the design doc.
```

3. Update the `## After the design` section. Replace:

```markdown
## After the design

Ask: "Ready to plan? Run `/skill:writing-plans`"
```

With:

```markdown
## After the design

- **Non-trivial design**: Ask: "Design looks good. Run `/skill:design-review` to check for production risks before planning."
- **Trivial change**: Ask: "Simple change — skip design review. Ready to plan? Run `/skill:writing-plans`"
```

4. Verify the file reads cleanly — no security/hazard/Socratic content remains in brainstorming.
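
The final verification step above could be partly mechanized. The following is a minimal sketch, not part of the package: the helper name `findLeftoverReviewContent` and the pattern list are assumptions derived from the wording of the acceptance criteria (pillars, hazards, Socratic heuristics).

```typescript
// Hypothetical check: the patterns below are assumptions taken from the
// acceptance criteria, not a list shipped with the package.
const REVIEW_PATTERNS: RegExp[] = [/\bpillars?\b/i, /\bhazards?\b/i, /\bsocratic\b/i];

// Returns every line of the skill file that still mentions review content.
function findLeftoverReviewContent(markdown: string): string[] {
  return markdown
    .split("\n")
    .filter((line) => REVIEW_PATTERNS.some((pattern) => pattern.test(line)));
}
```

An empty result suggests the review content has moved out of brainstorming; any returned lines still need a human read, since a keyword match alone does not prove the content is a leftover.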

---

## Task 2: Create `skills/design-review/SKILL.md` — New Skill

<!-- tdd: new-feature -->

Files:
- `skills/design-review/SKILL.md`

Acceptance Criteria (QA Engineer Hat):
- **Happy Path (hazards found)**:
  - Given: A non-trivial design doc at `docs/plans/*-design.md` involving Redis key deletion and concurrent API calls
  - When: `/skill:design-review` is run
  - Then: The agent audits against 6 pillars, 8 hazards, and 3 Socratic heuristics, flags hazards #1 and #3 as `[TRIGGERED]`, and appends a `## ⚠️ High-Risk Operations & Mitigations` section to the design doc
- **Happy Path (all clear)**:
  - Given: A non-trivial design doc with no high-risk patterns
  - When: `/skill:design-review` is run
  - Then: The agent appends a `## Architectural Review` section with `✅ No high-risk hazards detected` and brief pillar summaries
- **Edge Path (no design doc)**:
  - Given: No `docs/plans/*-design.md` exists
  - When: `/skill:design-review` is run
  - Then: The agent says "No design doc found. Run `/skill:brainstorming` first." and stops
- **Edge Path (trivial design)**:
  - Given: A design doc that already notes "Simple change — no design review needed"
  - When: `/skill:design-review` is run
  - Then: The agent confirms triviality and skips the full audit, appending a brief `## Architectural Review: Skipped (trivial change)` note
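
The four acceptance paths above amount to a small routing decision. The sketch below illustrates that decision only; the type `ReviewRoute` and the function `routeDesignReview` are hypothetical names, and the skill itself is prose guidance for an agent, not code. The marker string is quoted from the plan.

```typescript
// Hypothetical routing sketch; names are illustrative, not from the package.
type ReviewRoute =
  | { kind: "no-design-doc"; message: string }
  | { kind: "trivial" }
  | { kind: "full-audit" };

// Marker text quoted from the plan's brainstorming gate.
const TRIVIAL_NOTE = "Simple change — no design review needed";

// `designDoc` is the design doc's text, or null when no doc was found.
function routeDesignReview(designDoc: string | null): ReviewRoute {
  if (designDoc === null) {
    return {
      kind: "no-design-doc",
      message: "No design doc found. Run `/skill:brainstorming` first.",
    };
  }
  if (designDoc.includes(TRIVIAL_NOTE)) {
    // The skill still confirms with the user before skipping the audit.
    return { kind: "trivial" };
  }
  return { kind: "full-audit" };
}
```

Both happy paths map to `full-audit`; whether hazards are found is decided later, during the audit itself.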

Steps:
1. Create the directory:
```
mkdir -p skills/design-review
```

2. Create `skills/design-review/SKILL.md` with the following content:

```markdown
---
name: design-review
description: "Audit a design doc for production risks — security, scalability, fault tolerance, and operational hazards. Run after brainstorming, before writing-plans. Use when the brainstorm flags a non-trivial design, or when you want to stress-test a design for production readiness."
---

# Design Review

Read-only exploration of the design doc. You **may** edit the design doc to append review findings. You may **not** edit source code or configuration.

## Process

1. **Find the design doc** — look for `docs/plans/*-design.md`. If none exists, say "No design doc found. Run `/skill:brainstorming` first." and stop.

2. **Check triviality** — if the design doc notes "Simple change — no design review needed", confirm with the user: "This looks like a trivial change. Skip the full audit?" If yes, append a brief section:

```markdown
## Architectural Review

**Status**: Skipped — trivial change. No high-risk operations detected.
```

Then say: "Ready to plan? Run `/skill:writing-plans`" and stop.

3. **Read the design doc in full** — understand the architecture, data flow, components, and error handling proposed.

4. **🏛️ Architectural Pillars Review** — evaluate the design against the 6 Pillars of Production-Grade Design:

   1. **Robustness & Fault Tolerance**: How expected failures are handled, subsystem isolation, graceful degradation.
   2. **Atomicity & Consistency**: Database transactions, state rollback on error, endpoint idempotency.
   3. **Security & Access Control**: Input validation/sanitization, authorization checks at the boundary.
   4. **Scalability & Performance**: Connection pooling, closing resource leaks, preventing N+1 queries.
   5. **Backwards Compatibility**: Schema migration safety, zero-downtime deployment, API versioning.
   6. **Testability**: Injection seams for external dependencies (APIs, system clocks, randomizers) to keep tests 100% deterministic.

For each pillar, write a 1-2 sentence assessment. Flag any concerns.

5. **⚠️ High-Risk Hazard Audit** — evaluate the design against the 8 High-Risk Production Hazards. For each hazard, write either `[SAFE]` (with a 1-sentence justification) or `[TRIGGERED]` (detailing the mitigation):

   1. **Unbounded Redis Deletions / Operations**: Multi-key deletion or scans (e.g. `KEYS` or raw `SCAN` loops) that block single-threaded performance.
   2. **In-Memory OOM Loops**: Fetching complete database datasets into server memory (e.g., raw `select *`) to filter, sort, or map in runtime heap.
   3. **Unbounded Concurrency Spikes**: Running concurrent network requests (e.g. unthrottled `Promise.all`) without strict batch limits.
   4. **Missing High-Frequency Indexes**: Running queries on unindexed columns, forcing expensive table-scans under load.
   5. **Nested/Long-Running Transactions**: Holding database connections and locks open while awaiting slow external HTTP, disk, or cryptographic tasks.
   6. **Unrestricted Uploads & Temp Flooding**: Writing uploaded data directly to local temporary paths without validation limits or explicit `finally` cleanup blocks.
   7. **Raw Query String Interpolation**: Merging raw variables into SQL queries or shell command inputs (susceptible to injection).
   8. **Silent Swallowing Loops**: Background workers or cron tasks silently catching and suppressing exceptions without logging, back-offs, or alerts.

6. **🔍 Socratic Risk Discovery** — put on your **SRE Hat** and audit the proposed logic against 3 heuristics to identify novel or domain-specific risks:

   - **The "Scale to 100x" Heuristic**: If this operation is run 100x/sec or on 100k items, what breaks? (Memory, CPU, Disk I/O, sockets, database connection limits).
   - **The "Hostile World" Heuristic**: If a malicious actor has complete control over these inputs (headers, payloads, IDs), how can they exploit, crash, or extract data?
   - **The "Silent Error" Heuristic**: If this downstream dependency or query hangs or fails silently, how does our server react? Is there a timeout, a back-off, or logging?

For each heuristic, note any risks discovered. If a risk overlaps with a triggered hazard, cross-reference it.

7. **Present findings** — show the full review to the user. For each triggered hazard or Socratic risk, propose a concrete mitigation. Wait for user feedback and incorporate changes.

8. **Append to design doc** — add a `## Architectural Review` section to the design doc. Two cases:

**All clear** (no hazards triggered, no Socratic risks):
```markdown
## Architectural Review

**Status**: ✅ No high-risk hazards detected.

**Pillars reviewed**: All 6 — no concerns.
**Hazards audited**: All 8 [SAFE].
**Socratic risks**: None identified.
```

**Hazards or risks found**:
```markdown
## Architectural Review

**Status**: ⚠️ High-risk operations detected — see mitigations below.

### Pillar Assessments
- **Robustness**: [assessment]
- **Atomicity**: [assessment]
- **Security**: [assessment]
- **Scalability**: [assessment]
- **Backwards Compatibility**: [assessment]
- **Testability**: [assessment]

### Hazard Audit
- 1. Unbounded Redis: [SAFE / TRIGGERED — mitigation]
- 2. In-Memory OOM: [SAFE / TRIGGERED — mitigation]
- 3. Unbounded Concurrency: [SAFE / TRIGGERED — mitigation]
- 4. Missing Indexes: [SAFE / TRIGGERED — mitigation]
- 5. Long-Running Transactions: [SAFE / TRIGGERED — mitigation]
- 6. Unrestricted Uploads: [SAFE / TRIGGERED — mitigation]
- 7. Query Interpolation: [SAFE / TRIGGERED — mitigation]
- 8. Silent Swallowing: [SAFE / TRIGGERED — mitigation]

### ⚠️ High-Risk Operations & Mitigations
[Detailed mitigation for each TRIGGERED hazard and Socratic risk]

### Socratic Risks
- **Scale to 100x**: [finding or "none identified"]
- **Hostile World**: [finding or "none identified"]
- **Silent Error**: [finding or "none identified"]
```

## Principles

- Be specific — every `[TRIGGERED]` hazard must include a concrete mitigation, not just "be careful"
- Be honest — if the design is risky and the risk can't be mitigated easily, say so
- Be proportional — a simple CRUD endpoint doesn't need the same depth as a batch processing pipeline
- Don't redesign — flag risks and propose mitigations, but the design owner decides

## After the review

Ask: "Ready to plan? Run `/skill:writing-plans`"
```

3. Verify the file reads cleanly — the skill should be self-contained with no references to brainstorming's internals.
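
Hazard 3 in the skill text above names unthrottled `Promise.all` and asks for strict batch limits. As an illustration of the kind of mitigation a reviewer might propose, here is a minimal sketch; the helper `mapInBatches` is a hypothetical name and is not something the package provides.

```typescript
// Illustrative mitigation sketch for hazard 3 (assumed helper, not package code).
// Runs `worker` over `items` in fixed-size batches, so at most `batchSize`
// promises are in flight at any time.
async function mapInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Each batch must settle before the next one starts.
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}
```

Fixed batches are the simplest bound: a slow item holds up its whole batch, so a sliding-window pool would use the limit more evenly at the cost of more code. Results come back in input order either way.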

---

## Task 3: Update `skills/writing-plans/SKILL.md` — QA Hat, Acceptance Criteria, Plan Audit with Design-Review Awareness

<!-- tdd: modifying-tested-code -->

Files:
- `skills/writing-plans/SKILL.md`

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: A user runs `/skill:writing-plans` with a design doc that has an `## Architectural Review` section with triggered hazards
  - When: The implementation plan is generated
  - Then: Every task has a structured `Acceptance Criteria` block with `Given/When/Then`, and tasks corresponding to triggered hazards have `checkpoint: done` and a `Hazard Mitigation Verification` section
- **Edge Path (design doc has no review, non-trivial)**:
  - Given: A design doc with no `## Architectural Review` section but clearly non-trivial (database, auth, external services)
  - When: Writing-plans starts
  - Then: The agent prompts: "This design involves [database/auth/...] but hasn't been reviewed for production risks. Run `/skill:design-review` first, or confirm you want to proceed without."
- **Edge Path (design doc has no review, trivial)**:
  - Given: A trivial design doc with "Simple change — no design review needed"
  - When: Writing-plans starts
  - Then: The agent proceeds without prompting for design-review
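
The safety net described by these criteria can be pictured as a simple gate over the design doc's text. The sketch below is only an illustration: `checkReviewGate` and the keyword patterns in `RISK_SIGNALS` are assumptions, and in the skill the agent judges these categories by reading the doc rather than by keyword matching.

```typescript
// Hypothetical gate sketch; keyword patterns are illustrative assumptions.
type Gate = { action: "proceed" } | { action: "prompt"; found: string[] };

// Marker text quoted from the plan.
const TRIVIAL_NOTE = "Simple change — no design review needed";

const RISK_SIGNALS: Array<[string, RegExp]> = [
  ["database changes", /\b(schema|migrations?)\b/i],
  ["authentication", /\b(auth|authentication|authorization)\b/i],
  ["external services", /\bexternal (api|service)s?\b/i],
  ["concurrency", /\b(concurren\w*|batch)\b/i],
  ["large data flows", /\b(uploads?|large data)\b/i],
  ["caching or queues", /\b(redis|cach\w*|queues?)\b/i],
];

function checkReviewGate(designDoc: string): Gate {
  const reviewed = /^## Architectural Review\b/m.test(designDoc);
  const trivial = designDoc.includes(TRIVIAL_NOTE);
  if (reviewed || trivial) return { action: "proceed" };
  // List only the categories that actually appear, for the prompt text.
  const found = RISK_SIGNALS.filter(([, pattern]) => pattern.test(designDoc)).map(
    ([label]) => label,
  );
  return found.length > 0 ? { action: "prompt", found } : { action: "proceed" };
}
```

The `found` list corresponds to the bracketed part of the prompt ("This design involves [...]"), so the user sees why the review is being suggested.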

Steps:
1. Read `skills/writing-plans/SKILL.md` in full

2. In step 1 ("Check for a design doc"), add the design-review safety net after reading the design doc. Insert after "Read `docs/lessons.md` if it exists":

```markdown
Then check whether the design doc has an `## Architectural Review` section. If it doesn't, and the design involves any of the following, prompt the user: "This design involves [list what you found: database changes, authentication, external services, concurrency, large data flows] but hasn't been reviewed for production risks. Run `/skill:design-review` first, or type 'proceed' to skip."

- Database schema changes or migrations
- Authentication or authorization logic
- External API or service integrations
- Concurrency or batch processing
- File uploads or large data flows
- Redis, caching, or message queues

If the design doc explicitly notes "Simple change — no design review needed", skip this check.
```

3. In the "Task format" section, add the QA Engineer Hat and Acceptance Criteria requirements. Replace:

```markdown
Each task must include:
- Exact file paths to create/modify
- **Concrete code** — include the actual implementation, not a summary. Write out SQL schemas, type definitions, function signatures with bodies, route handler code, and test assertions. A developer should be able to copy-paste from the plan and have working code. For tasks that depend on types or utilities from earlier tasks, reference them explicitly (e.g., `import { User } from Task 2`) and include only the new code
- Exact commands with expected output (e.g., `npx vitest run src/user/model.test.ts` → shows 1 test passing)
- Each task's tests should cover the happy path and at least one edge case or error path, with concrete assertions
```

With:

```markdown
Each task must include:
- Exact file paths to create/modify
- **Acceptance Criteria (QA Engineer Hat)** — Put on your **QA Engineer Hat** to design exhaustive test coverage. Explicitly define:
  - **Happy Path**: Expected behavior under normal operations.
  - **Edge Cases & Error Paths**: What happens with empty inputs, limits exceeded, authentication failures, or error states.
  Ensure every criteria block specifies the expected state and returned results using `Given/When/Then` behavioral blocks.
- **Concrete code** — include the actual implementation, not a summary. Write out SQL schemas, type definitions, function signatures with bodies, route handler code, and test assertions. A developer should be able to copy-paste from the plan and have working code. For tasks that depend on types or utilities from earlier tasks, reference them explicitly (e.g., `import { User } from Task 2`) and include only the new code
- Exact commands with expected output (e.g., `npx vitest run src/user/model.test.ts` → shows 1 test passing)
- Each task's tests should cover the happy path and at least one edge case or error path, with concrete assertions
```

4. In the "Task body structure" section, update each example task template to include an `Acceptance Criteria` block. Update the "No checkpoint" example to:

```markdown
## Task 1: Create User model

<!-- tdd: new-feature -->

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: Valid user data with name and email
  - When: The User model is created
  - Then: The model contains the correct fields and a generated ID
- **Edge Case (duplicate email)**:
  - Given: A user with email "test@example.com" already exists
  - When: Another user is created with the same email
  - Then: Creation fails with a unique constraint error

Files:
- `src/user/model.ts`
- `src/user/model.test.ts`

Steps:
1. Write failing test for User model creation
2. Run test — confirm it fails
3. Implement User model
4. Run test — confirm it passes
```

Update the `checkpoint: test` example to include acceptance criteria:

```markdown
## Task 2: Write auth tests

<!-- tdd: new-feature -->
<!-- checkpoint: test -->

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: A user with valid credentials exists
  - When: Login is attempted
  - Then: A valid session token is returned
- **Edge Case (wrong password)**:
  - Given: A user exists but password is incorrect
  - When: Login is attempted
  - Then: An authentication error is returned

Files:
- `src/auth/login.test.ts`

Steps:
1. Write failing test for login with valid credentials
2. Run test — confirm it fails

⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.

3. Implement login handler
4. Run test — confirm it passes
5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
```

Update the `checkpoint: done` example to include acceptance criteria:

```markdown
## Task 3: Add login endpoint

<!-- tdd: new-feature -->
<!-- checkpoint: done -->

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: A user with email "user@example.com" and password "secure123" exists
  - When: A POST request with those credentials is sent to `/api/login`
  - Then: Response returns `200 OK` with a signed JWT token
- **Edge Case (invalid password)**:
  - Given: A user exists but the password sent is "wrong-pass"
  - When: A POST request is sent to `/api/login`
  - Then: Response returns `401 Unauthorized`
- **Edge Case (rate limiting)**:
  - Given: 5 failed login attempts from the same IP
  - When: A 6th attempt is sent
  - Then: Response returns `429 Too Many Requests`

Files:
- `src/auth/login.ts`
- `src/auth/login.test.ts`

Steps:
1. Write failing test for login with valid credentials
2. Run test — confirm it fails
3. Implement login handler
4. Run test — confirm it passes
5. Add edge case tests (invalid password, missing email)
6. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
7. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.

⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
```

Update the "Both checkpoints" example to include acceptance criteria:

```markdown
## Task 4: Complex auth flow

<!-- tdd: new-feature -->
<!-- checkpoint: test -->
<!-- checkpoint: done -->

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: A valid OAuth2 authorization code
  - When: The auth callback is invoked
  - Then: A user session is created and the user is redirected to the dashboard
- **Edge Case (expired code)**:
  - Given: An expired or invalid authorization code
  - When: The auth callback is invoked
  - Then: The user is redirected to login with an error message

Steps:
1. Write failing test for auth flow
2. Run test — confirm it fails

⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.

3. Implement auth flow
4. Run test — confirm it passes
5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.

⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
```

5. In step 3 ("Present the plan"), add the **Plan Acceptance Audit** sub-step after "show the complete plan to the human":

```markdown
Before presenting, run the **Plan Acceptance Audit**:
- **Vertical Slices**: Is every task a complete vertical slice (not horizontal)?
- **Task Sizing**: Is any single task too large or covering multiple complex behaviors? If so, split it.
- **QA Coverage**: Does every task have both a Happy Path and at least one Edge Case in its Acceptance Criteria?
- **Checkpoint Alignment**: Are `checkpoint: test` and `checkpoint: done` gates placed on the most critical or risky tasks?
- **Risk Enforcement**: If the design doc's Architectural Review section flagged any hazards as `[TRIGGERED]`, verify the corresponding tasks have `checkpoint: done` and a `Hazard Mitigation Verification` section.

If any check fails, fix the plan before presenting.
```

6. Verify the file reads cleanly.
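
Two of the Plan Acceptance Audit checks from step 5 (QA Coverage and Risk Enforcement) are mechanical enough to sketch in code. The `PlanTask` shape and the `auditPlan` function below are hypothetical and assume the plan has already been parsed into structured tasks; the remaining checks (vertical slices, task sizing, checkpoint alignment) are judgment calls and are left out.

```typescript
// Hypothetical structured view of a plan task; not a type from the package.
interface PlanTask {
  title: string;
  criteria: { happyPaths: number; edgeCases: number };
  checkpoints: Array<"test" | "done">;
  mitigatesTriggeredHazard: boolean;
  hasHazardVerification: boolean;
}

// Returns one message per failed check; an empty array means both checks pass.
function auditPlan(tasks: PlanTask[]): string[] {
  const failures: string[] = [];
  for (const task of tasks) {
    // QA Coverage: a happy path plus at least one edge case.
    if (task.criteria.happyPaths < 1 || task.criteria.edgeCases < 1) {
      failures.push(`${task.title}: QA Coverage`);
    }
    // Risk Enforcement: triggered hazards need `checkpoint: done` and a
    // Hazard Mitigation Verification section.
    if (
      task.mitigatesTriggeredHazard &&
      !(task.checkpoints.includes("done") && task.hasHazardVerification)
    ) {
      failures.push(`${task.title}: Risk Enforcement`);
    }
  }
  return failures;
}
```

Returning a list of failures rather than a boolean matches the instruction to fix the plan before presenting: each entry points at the task and the check that needs attention.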

---

## Task 4: Update `skills/executing-tasks/SKILL.md` — Cognitive Persona Shifts & Defensive Sandboxing

<!-- tdd: modifying-tested-code -->

Files:
- `skills/executing-tasks/SKILL.md`

Acceptance Criteria (QA Engineer Hat):
- **Happy Path**:
  - Given: An implementation plan with tasks containing Given/When/Then acceptance criteria and numbered steps
  - When: `/skill:executing-tasks` runs through a task
  - Then: The agent follows the plan's numbered steps while applying three cognitive frames:
    1. **QA Test frame** (when writing/running tests): Focus on translating Given/When/Then specs, verify sandboxed environment
    2. **Pragmatic Developer frame** (when implementing): Focus on simplest code to green tests
    3. **Senior Refactoring frame** (when refactoring): Evaluate craftsmanship (shallow modules, deletion test, duplication, seam discipline)
- **Edge Path (Sandbox Verification)**:
  - Given: A test file that would connect to a real database
  - When: The agent is in the QA Test frame
  - Then: The agent verifies the test uses mocks/stubs and no live connections before running
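
Part of the sandbox verification in the edge path above can be expressed as a pre-flight check on the environment. This is a minimal sketch under stated assumptions: the function `sandboxViolations` and the variable names in `LIVE_CONNECTION_VARS` are hypothetical, and a set connection variable is only a hint of a live connection, not proof. Confirming that external dependencies are mocked still requires reading the test file.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical list: variables whose presence hints at a live connection.
// Adjust to the project; these names are assumptions, not from the package.
const LIVE_CONNECTION_VARS = ["DATABASE_URL", "REDIS_URL"];

// Returns a list of problems; an empty array means no violation was detected.
function sandboxViolations(env: Env): string[] {
  const problems: string[] = [];
  if (env["NODE_ENV"] !== "test") {
    problems.push("NODE_ENV is not `test`");
  }
  for (const name of LIVE_CONNECTION_VARS) {
    if (env[name]) {
      problems.push(`${name} is set`);
    }
  }
  return problems;
}
```

Taking the environment as a parameter instead of reading a global keeps the check itself deterministic and easy to test, in line with the Testability pillar's injection seams.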

Steps:
1. Read `skills/executing-tasks/SKILL.md` in full

2. In the "Per-task execution" section, replace step 3 with meta-framed persona shifts that preserve the plan-step-following behavior. Replace:

```markdown
3. **Execute the plan steps** — follow each numbered step in the task body, in order. Stop at any `⏸ CHECKPOINT` gate (see [Checkpoint gates](#checkpoint-gates--when-the-plan-says-stop)).
4. **Verify against task description** — re-read the task from the plan. Does the implementation satisfy every requirement listed? If not, fix before proceeding.
5. **Refactor** — after all tests pass, look for:
   - **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
   - **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
   - **Duplication** — extract repeated patterns
   - **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam

Run tests after each refactor step. Never refactor while tests are failing.
```

With:

```markdown
3. **Execute the plan steps** — follow each numbered step in the task body, in order. As you work, shift your cognitive focus through three frames:

**QA Test frame** (when writing/running tests): Focus entirely on translating the task's `Given/When/Then` Acceptance Criteria into precise failing tests. Before running tests, verify the test environment is sandboxed — no real database connections, API calls, or live services. External dependencies must be mocked or stubbed. `NODE_ENV` must be `test` (or equivalent).

**Pragmatic Developer frame** (when implementing): Focus on the simplest possible code to make the tests green. Do not over-engineer or add code for future requirements. Keep complexity to a bare minimum.

**Senior Refactoring frame** (when refactoring): Evaluate the craftsmanship of the code. Check for:
   - **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
   - **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
   - **Duplication** — extract repeated patterns
   - **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam

Run tests after each refactor step. Never refactor while tests are failing.

Stop at any `⏸ CHECKPOINT` gate (see [Checkpoint gates](#checkpoint-gates--when-the-plan-says-stop)).
4. **Verify against task description** — re-read the task from the plan. Does the implementation satisfy every requirement listed? If not, fix before proceeding.
```

Note: The old step 5 (Refactor) is folded into step 3's "Senior Refactoring frame" so step 4 remains "Verify against task description". The remaining steps (old 6→5, old 7→6, old 8→7, old 9→8, old 10→9) need to be renumbered.

3. Renumber the remaining steps after the new step 4:
   - Old step 6 ("Learn from mistakes") → new step 5
   - Old step 7 ("Commit") → new step 6
   - Old step 8 ("Update progress") → new step 7
   - Old step 9 ("Suggest session break") → new step 8
   - Old step 10 ("Loop") → new step 9

4. Verify the file reads cleanly — the cognitive frames are meta-guidance applied while following the plan's numbered steps, not a replacement for them.
|
|
492
|
+
|
|
493
|
+
---
+
+## Task 5: Update `skills/finalizing/SKILL.md` — Lessons Curation with Scrum Master Hat
+
+<!-- tdd: modifying-tested-code -->
+
+Files:
+- `skills/finalizing/SKILL.md`
+
+Acceptance Criteria (QA Engineer Hat):
+- **Happy Path**:
+- Given: A sprint is completed with some rules in `docs/lessons.md`
+- When: `/skill:finalizing` is executed
+- Then: The agent puts on the **Agile Scrum Master Hat** to de-duplicate, generalize, and categorize all rules under structured markdown headers
+- **Edge Path (No lessons exist)**:
+- Given: No `docs/lessons.md` exists and no lessons were learned
+- When: `/skill:finalizing` is executed
+- Then: The step is skipped gracefully (existing behavior preserved)
+- **Edge Path (Lessons format after categorization)**:
+- Given: `docs/lessons.md` was categorized into headers like `## Tool Usage` and `## Testing Patterns` by a previous finalizing run
+- When: A new execution phase appends a rule under `## Rules`
+- Then: The rule lands in the correct location (the `## Rules` section still exists for new entries, and finalizing re-categorizes later)
+
+Steps:
+1. Read `skills/finalizing/SKILL.md` in full
+
+2. In step 2 ("Review lessons learned"), replace the existing instruction with the enhanced Scrum Master Hat curation. Replace:
+
+```markdown
+2. **Review lessons learned** — if `docs/lessons.md` exists, review it:
+- Add any lessons from this session that were missed during execution
+- **Generalize domain-specific rules** — if a rule names a specific service, entity, or feature, either rewrite it as a generic pattern or remove it if no generic form exists
+- Retire rules that no longer apply (remove the bullet)
+- If no changes are needed, leave it as-is
+```
+
+With:
+
+```markdown
+2. **Review & Polish Lessons (Agile Scrum Master Hat)** — if `docs/lessons.md` exists, put on your **Agile Scrum Master Hat** to curate and optimize it for future sprints:
+- **Add missed lessons** — capture any lessons from this session that weren't written during execution
+- **Generalize domain-specific rules** — if a rule names a specific service, entity, or feature, either rewrite it as a generic pattern or remove it if no generic form exists
+- **De-duplicate** — combine overlapping or redundant rules into single, sharper entries
+- **Categorize** — group the rules under clear, structured markdown headers (e.g., `## Tool Usage`, `## Testing Patterns`, `## Architecture Rules`) to make the document highly scannable for future sessions. Keep the `## Rules` section as the append target for new entries during execution — categorization moves rules out of `## Rules` into the appropriate category headers.
+- **Retire stale rules** — remove bullets that no longer apply
+- If no changes are needed, leave it as-is
+```
+
+3. Verify the file reads cleanly.
+
+---
+
+## Task 6: Update `docs/lessons.md` format template in `skills/executing-tasks/SKILL.md`
+
+<!-- tdd: modifying-tested-code -->
+
+Files:
+- `skills/executing-tasks/SKILL.md`
+
+Acceptance Criteria (QA Engineer Hat):
+- **Happy Path**:
+- Given: The agent catches a repeat mistake during task execution
+- When: It appends a new rule to `docs/lessons.md`
+- Then: The rule is appended under `## Rules` (the standard append target), regardless of whether category headers exist from a previous finalizing run
+- **Edge Path (After categorization)**:
+- Given: `docs/lessons.md` has been reorganized by finalizing with category headers like `## Tool Usage`
+- When: The agent needs to append a new rule during execution
+- Then: The agent appends to `## Rules` (which finalizing ensures always exists as the catch-all section)
+
+Steps:
+1. Read the `docs/lessons.md` format template section in `skills/executing-tasks/SKILL.md`
+
+2. Update the format template comment to clarify the append convention. Replace:
+
+```markdown
+### `docs/lessons.md` format
+
+```markdown
+# Lessons Learned
+
+<!--
+Agent: read this at the start of each task during executing-tasks.
+Follow every rule. Add new rules when you catch yourself making repeat mistakes.
+Rules must be generic patterns applicable to any domain or feature — not specific to one service, entity, or use case.
+Retire rules that no longer apply during finalizing.
+-->
+
+## Rules
+
+- <new rule here>
+```
+```
+
+With:
+
+```markdown
+### `docs/lessons.md` format
+
+```markdown
+# Lessons Learned
+
+<!--
+Agent: read this at the start of each task during executing-tasks.
+Follow every rule. Add new rules when you catch yourself making repeat mistakes.
+Rules must be generic patterns applicable to any domain or feature — not specific to one service, entity, or use case.
+Retire rules that no longer apply during finalizing.
+-->
+
+## Rules
+
+- <new rule here>
+```
+
+When adding a new rule during execution, always append it under `## Rules`. The categorization into specific headers (e.g., `## Tool Usage`, `## Testing Patterns`) is done during finalizing — never during execution.
+```
+
+3. Verify the file reads cleanly.
+
+---
+
+## Task 7: Run tests and verify existing suite passes
+
+<!-- tdd: trivial -->
+
+Files:
+- None (verification only)
+
+Steps:
+1. Run `npm test` — confirm all existing tests pass without side-effects
+2. Verify no `docs/lessons.md` was created or modified by the test run
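Editorial aside on the plan above: Tasks 5 and 6 describe an append convention in which new rules always land under `## Rules`, even after finalizing has moved older rules under category headers such as `## Tool Usage`. The convention is stated as prose instructions for an agent, and nothing in this diff shows it implemented as code. The following is only a minimal TypeScript sketch of that convention, assuming the lessons file is handled as a plain string. The names `appendRule` and `TEMPLATE` are hypothetical and are not part of `@tianhai/pi-workflow-kit`.

```typescript
// Hypothetical illustration only: not an API of @tianhai/pi-workflow-kit.
// TEMPLATE is a trimmed-down stand-in for the lessons template in the plan.
const TEMPLATE = "# Lessons Learned\n\n## Rules\n";

/**
 * Returns the lessons text with `rule` appended as the last bullet of the
 * `## Rules` section. Category sections are left untouched, and a missing
 * `## Rules` section is recreated at the end of the file (an assumption;
 * the plan only says finalizing keeps that section in place).
 */
function appendRule(lessons: string, rule: string): string {
  const source = lessons.trim() === "" ? TEMPLATE : lessons;
  const lines = source.replace(/\s+$/, "").split("\n");

  // Locate the catch-all section, recreating it if it is absent.
  let start = lines.findIndex((line) => line.trim() === "## Rules");
  if (start === -1) {
    lines.push("", "## Rules");
    start = lines.length - 1;
  }

  // The section ends at the next level-2 heading or at end of file.
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (/^## /.test(lines[i])) {
      end = i;
      break;
    }
  }

  // Step back over blank lines that trail the section.
  let insertAt = end;
  while (insertAt > start + 1 && lines[insertAt - 1].trim() === "") {
    insertAt--;
  }

  const bullet = `- ${rule.trim()}`;
  // An empty section needs a blank line between the heading and the bullet.
  const block = insertAt === start + 1 ? ["", bullet] : [bullet];
  // Keep one blank line before the heading that follows, if there is one.
  if (end < lines.length) {
    block.push("");
  }

  lines.splice(insertAt, end - insertAt, ...block);
  return lines.join("\n") + "\n";
}
```

In this sketch, calling `appendRule(text, "Run tests before committing")` on a file that already has category headers leaves those sections as they are and adds the bullet under `## Rules`. Moving rules into categories stays a finalizing concern, which matches the split the plan describes.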
@@ -0,0 +1,16 @@
+# Progress: Design Review Split
+
+Plan: docs/plans/2026-05-25-design-review-split-implementation.md
+Branch: design-review-split
+Started: 2026-05-25T12:00:00Z
+Last updated: 2026-05-25T12:00:00Z
+
+| # | Status | Task | Commit |
+|---|--------|------|--------|
+| 1 | ✅ done | Update brainstorming — remove security review, add trivial gate | 5055adb |
+| 2 | ✅ done | Create design-review skill | 0e59552 |
+| 3 | ✅ done | Update writing-plans — QA hat, acceptance criteria, plan audit with design-review awareness | f429397 |
+| 4 | ✅ done | Update executing-tasks — cognitive persona shifts & defensive sandboxing | 8343a5b |
+| 5 | ✅ done | Update finalizing — lessons curation with Scrum Master hat | d90fa2b |
+| 6 | ✅ done | Update lessons format template in executing-tasks | 9feac99 |
+| 7 | ✅ done | Run tests and verify existing suite passes | — |