@tianhai/pi-workflow-kit 0.15.0 → 0.17.1

@@ -0,0 +1,473 @@
1
+ # Implementation Plan: Agentic Agile & Architectural Rigor
2
+
3
+ Updates 4 skill files to introduce behavioral acceptance criteria, SRE hazard checks, cognitive persona shifts, architectural design reviews, and automated lessons curation.
4
+
5
+ ---
6
+
7
+ ## Task 1: Update `skills/brainstorming/SKILL.md` — 6 Pillars, 8 Hazards, 3 Socratic Heuristics
8
+
9
+ <!-- tdd: modifying-tested-code -->
10
+
11
+ Files:
12
+ - `skills/brainstorming/SKILL.md`
13
+
14
+ Acceptance Criteria (QA Engineer Hat):
15
+ - **Happy Path**:
16
+ - Given: A user runs `/skill:brainstorming` and a non-trivial design is proposed
17
+ - When: The agent presents the design for approval
18
+ - Then: The design includes a dedicated `🏛️ Architectural Pillars Review` section covering all 6 pillars (Robustness, Atomicity, Security, Scalability, Compatibility, Testability)
19
+ - **Edge Path (Trivial Feature)**:
20
+ - Given: A user runs `/skill:brainstorming` for a trivial change (e.g., renaming a column)
21
+ - When: The agent reaches the architectural review step
22
+ - Then: The agent writes a brief statement like "Simple change — no architectural review needed" and skips the full audit
23
+ - **Edge Path (Hazard Detection)**:
24
+ - Given: The proposed design involves Redis key deletion
25
+ - When: The agent audits against the 8 High-Risk Hazards
26
+ - Then: The design flags it as `[TRIGGERED]` under hazard #1 and includes a mitigation in a `⚠️ High-Risk Operations & Mitigations` section
27
+ - **Edge Path (Socratic Discovery)**:
28
+ - Given: The proposed design has a novel batch-processing loop not covered by the 8 hazards
29
+ - When: The agent applies the 3 Socratic Heuristics
30
+ - Then: The design flags the discovered risk and proposes mitigation
31
+
32
+ Steps:
33
+ 1. Read `skills/brainstorming/SKILL.md` in full
34
+ 2. In step 4 ("Present the design"), add a new mandatory sub-step before writing the design doc: **Architectural Review & Risk Detection**. Insert the following inline guidelines:
35
+
36
+ ```markdown
37
+ #### 🏛️ Architectural Pillars Review
38
+
39
+ For non-trivial designs, evaluate the proposed design against the **6 Pillars of Production-Grade Design**. Include a dedicated section in the design doc addressing each:
40
+
41
+ 1. **Robustness & Fault Tolerance**: How expected failures are handled, subsystem isolation, graceful degradation.
42
+ 2. **Atomicity & Consistency**: Database transactions, state rollback on error, endpoint idempotency.
43
+ 3. **Security & Access Control**: Input validation/sanitization, authorization checks at the boundary.
44
+ 4. **Scalability & Performance**: Connection pooling, closing resource leaks, preventing N+1 queries.
45
+ 5. **Backwards Compatibility**: Schema migration safety, zero-downtime deployment, API versioning.
46
+ 6. **Testability**: Injection seams for external dependencies (APIs, system clocks, randomizers) to keep tests 100% deterministic.
47
+
48
+ For trivial changes (config, naming, simple field additions), a brief statement like "Simple change — no architectural review needed" suffices.
49
+
50
+ #### ⚠️ High-Risk Hazard Audit
51
+
52
+ For non-trivial designs, you MUST evaluate the design against the **8 High-Risk Production Hazards**. For each hazard, write either `[SAFE]` (with a 1-sentence justification of why it doesn't apply) or `[TRIGGERED]` (detailing the mitigation):
53
+
54
+ - **1. Unbounded Redis Deletions / Operations**: Multi-key deletions or scans (e.g., `KEYS` or raw `SCAN` loops) that block Redis's single-threaded command processing.
55
+ - **2. In-Memory OOM Loops**: Fetching complete database datasets into server memory (e.g., raw `select *`) to filter, sort, or map in runtime heap.
56
+ - **3. Unbounded Concurrency Spikes**: Running concurrent network requests (e.g. unthrottled `Promise.all`) without strict batch limits.
57
+ - **4. Missing High-Frequency Indexes**: Running queries on unindexed columns, forcing expensive table-scans under load.
58
+ - **5. Nested/Long-Running Transactions**: Holding database connections and locks open while awaiting slow external HTTP, disk, or cryptographic tasks.
59
+ - **6. Unrestricted Uploads & Temp Flooding**: Writing uploaded data directly to local temporary paths without validation limits or explicit `finally` cleanup blocks.
60
+ - **7. Raw Query String Interpolation**: Merging raw variables into SQL queries or shell command inputs (susceptible to injection).
61
+ - **8. Silent Swallowing Loops**: Background workers or cron tasks silently catching and suppressing exceptions without logging, back-offs, or alerts.
62
+
63
+ For trivial changes, skip this audit.
64
+
65
+ #### 🔍 Socratic Risk Discovery
66
+
67
+ For non-trivial designs, put on your **SRE Hat** and audit the proposed logic against the **3 Socratic Heuristics** to identify novel or domain-specific risks:
68
+
69
+ - **The "Scale to 100x" Heuristic**: If this operation is run 100x/sec or on 100k items, what breaks? (Memory, CPU, Disk I/O, sockets, database connection limits).
70
+ - **The "Hostile World" Heuristic**: If a malicious actor has complete control over these inputs (headers, payloads, IDs), how can they exploit, crash, or extract data?
71
+ - **The "Silent Error" Heuristic**: If this downstream dependency or query hangs or fails silently, how does our server react? Is there a timeout, a back-off, or logging?
72
+
73
+ For trivial changes, skip this audit.
74
+
75
+ If any hazard is `[TRIGGERED]` or any Socratic risk is identified, the design document **must** include a dedicated `⚠️ High-Risk Operations & Mitigations` section detailing the exact safety protocols applied.
76
+ ```
77
+
78
+ 3. Verify the file reads cleanly — the new sections should slot naturally between the existing ADR guidance and step 5 ("Write the design doc").
79
+
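The audit rule in Task 1 (one verdict for each of the 8 hazards, plus a mitigation section whenever anything is `[TRIGGERED]`) can be checked mechanically. The following is a minimal sketch under stated assumptions, not part of the package: `HazardVerdict`, `AuditResult`, and `checkHazardAudit` are hypothetical names, and the check only looks for the section title as a substring of the design doc.

```typescript
// Hypothetical helper: none of these names come from the package itself.
type HazardVerdict = { hazard: number; status: "SAFE" | "TRIGGERED"; note: string };

interface AuditResult {
  missing: number[];               // hazard numbers that have no verdict
  needsMitigationSection: boolean; // something is TRIGGERED but the section is absent
  ok: boolean;
}

const HAZARD_COUNT = 8;
const MITIGATION_HEADING = "High-Risk Operations & Mitigations";

function checkHazardAudit(designDoc: string, verdicts: HazardVerdict[]): AuditResult {
  // Every hazard from 1 to 8 needs an explicit verdict.
  const seen = new Set(verdicts.map((v) => v.hazard));
  const missing: number[] = [];
  for (let n = 1; n <= HAZARD_COUNT; n++) {
    if (!seen.has(n)) missing.push(n);
  }

  // A TRIGGERED verdict requires the dedicated mitigation section.
  const anyTriggered = verdicts.some((v) => v.status === "TRIGGERED");
  const hasSection = designDoc.includes(MITIGATION_HEADING);
  const needsMitigationSection = anyTriggered && !hasSection;

  return { missing, needsMitigationSection, ok: missing.length === 0 && !needsMitigationSection };
}
```

This sketch covers only the 8 listed hazards. Risks found through the Socratic heuristics would still need a human or agent judgment call before they count as triggering the mitigation section.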
80
+ ---
81
+
82
+ ## Task 2: Update `skills/writing-plans/SKILL.md` — QA Hat, Given/When/Then, Plan Acceptance Audit
83
+
84
+ <!-- tdd: modifying-tested-code -->
85
+
86
+ Files:
87
+ - `skills/writing-plans/SKILL.md`
88
+
89
+ Acceptance Criteria (QA Engineer Hat):
90
+ - **Happy Path**:
91
+ - Given: A user runs `/skill:writing-plans`
92
+ - When: The implementation plan is generated
93
+ - Then: Every task has a structured `Acceptance Criteria` block with `Given/When/Then` happy-path and edge-case behaviors
94
+ - **Edge Path (Risk Enforcement)**:
95
+ - Given: A task involves any of the 8 production hazards or Socratic risks flagged in the design
96
+ - When: The plan audit runs
97
+ - Then: That task is automatically gated with `checkpoint: done` and includes a `Hazard Mitigation Verification` section
98
+
99
+ Steps:
100
+ 1. Read `skills/writing-plans/SKILL.md` in full
101
+ 2. In the "Task format" section, add the QA Engineer Hat and Acceptance Criteria requirements. Replace:
102
+
103
+ ```markdown
104
+ Each task must include:
105
+ - Exact file paths to create/modify
106
+ - **Concrete code** — include the actual implementation, not a summary. Write out SQL schemas, type definitions, function signatures with bodies, route handler code, and test assertions. A developer should be able to copy-paste from the plan and have working code. For tasks that depend on types or utilities from earlier tasks, reference them explicitly (e.g., `import { User } from Task 2`) and include only the new code
107
+ - Exact commands with expected output (e.g., `npx vitest run src/user/model.test.ts` → shows 1 test passing)
108
+ - Each task's tests should cover the happy path and at least one edge case or error path, with concrete assertions
109
+ ```
110
+
111
+ With:
112
+
113
+ ```markdown
114
+ Each task must include:
115
+ - Exact file paths to create/modify
116
+ - **Acceptance Criteria (QA Engineer Hat)** — Put on your **QA Engineer Hat** to design exhaustive test coverage. Explicitly define:
117
+ - **Happy Path**: Expected behavior under normal operations.
118
+ - **Edge Cases & Error Paths**: What happens with empty inputs, limits exceeded, authentication failures, or error states.
119
+ Ensure every criteria block specifies the expected state and returned results using `Given/When/Then` behavioral blocks.
120
+ - **Concrete code** — include the actual implementation, not a summary. Write out SQL schemas, type definitions, function signatures with bodies, route handler code, and test assertions. A developer should be able to copy-paste from the plan and have working code. For tasks that depend on types or utilities from earlier tasks, reference them explicitly (e.g., `import { User } from Task 2`) and include only the new code
121
+ - Exact commands with expected output (e.g., `npx vitest run src/user/model.test.ts` → shows 1 test passing)
122
+ - Each task's tests should cover the happy path and at least one edge case or error path, with concrete assertions
123
+ ```
124
+
125
+ 3. In the "Task body structure" section, update each example task template to include an `Acceptance Criteria` block. Update the "No checkpoint" example to:
126
+
127
+ ```markdown
128
+ ## Task 1: Create User model
129
+
130
+ <!-- tdd: new-feature -->
131
+
132
+ Acceptance Criteria (QA Engineer Hat):
133
+ - **Happy Path**:
134
+ - Given: Valid user data with name and email
135
+ - When: The User model is created
136
+ - Then: The model contains the correct fields and a generated ID
137
+ - **Edge Case (duplicate email)**:
138
+ - Given: A user with email "test@example.com" already exists
139
+ - When: Another user is created with the same email
140
+ - Then: Creation fails with a unique constraint error
141
+
142
+ Files:
143
+ - `src/user/model.ts`
144
+ - `src/user/model.test.ts`
145
+
146
+ Steps:
147
+ 1. Write failing test for User model creation
148
+ 2. Run test — confirm it fails
149
+ 3. Implement User model
150
+ 4. Run test — confirm it passes
151
+ ```
152
+
153
+ Update the `checkpoint: test` example to include acceptance criteria:
154
+
155
+ ```markdown
156
+ ## Task 2: Write auth tests
157
+
158
+ <!-- tdd: new-feature -->
159
+ <!-- checkpoint: test -->
160
+
161
+ Acceptance Criteria (QA Engineer Hat):
162
+ - **Happy Path**:
163
+ - Given: A user with valid credentials exists
164
+ - When: Login is attempted
165
+ - Then: A valid session token is returned
166
+ - **Edge Case (wrong password)**:
167
+ - Given: A user exists but password is incorrect
168
+ - When: Login is attempted
169
+ - Then: An authentication error is returned
170
+
171
+ Files:
172
+ - `src/auth/login.test.ts`
173
+
174
+ Steps:
175
+ 1. Write failing test for login with valid credentials
176
+ 2. Run test — confirm it fails
177
+
178
+ ⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.
179
+
180
+ 3. Implement login handler
181
+ 4. Run test — confirm it passes
182
+ 5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
183
+ 6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
184
+ ```
185
+
186
+ Update the `checkpoint: done` example to include acceptance criteria:
187
+
188
+ ```markdown
189
+ ## Task 3: Add login endpoint
190
+
191
+ <!-- tdd: new-feature -->
192
+ <!-- checkpoint: done -->
193
+
194
+ Acceptance Criteria (QA Engineer Hat):
195
+ - **Happy Path**:
196
+ - Given: A user with email "user@example.com" and password "secure123" exists
197
+ - When: A POST request with those credentials is sent to `/api/login`
198
+ - Then: Response returns `200 OK` with a signed JWT token
199
+ - **Edge Case (invalid password)**:
200
+ - Given: A user exists but the password sent is "wrong-pass"
201
+ - When: A POST request is sent to `/api/login`
202
+ - Then: Response returns `401 Unauthorized`
203
+ - **Edge Case (rate limiting)**:
204
+ - Given: 5 failed login attempts from the same IP
205
+ - When: A 6th attempt is sent
206
+ - Then: Response returns `429 Too Many Requests`
207
+
208
+ Files:
209
+ - `src/auth/login.ts`
210
+ - `src/auth/login.test.ts`
211
+
212
+ Steps:
213
+ 1. Write failing test for login with valid credentials
214
+ 2. Run test — confirm it fails
215
+ 3. Implement login handler
216
+ 4. Run test — confirm it passes
217
+ 5. Add edge case tests (invalid password, rate limiting)
218
+ 6. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
219
+ 7. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
220
+
221
+ ⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
222
+ ```
223
+
224
+ Update the "Both checkpoints" example to include acceptance criteria:
225
+
226
+ ```markdown
227
+ ## Task 4: Complex auth flow
228
+
229
+ <!-- tdd: new-feature -->
230
+ <!-- checkpoint: test -->
231
+ <!-- checkpoint: done -->
232
+
233
+ Acceptance Criteria (QA Engineer Hat):
234
+ - **Happy Path**:
235
+ - Given: A valid OAuth2 authorization code
236
+ - When: The auth callback is invoked
237
+ - Then: A user session is created and the user is redirected to the dashboard
238
+ - **Edge Case (expired code)**:
239
+ - Given: An expired or invalid authorization code
240
+ - When: The auth callback is invoked
241
+ - Then: The user is redirected to login with an error message
242
+
243
+ Steps:
244
+ 1. Write failing test for auth flow
245
+ 2. Run test — confirm it fails
246
+
247
+ ⏸ **CHECKPOINT: test** — present test review. Wait for human approval before implementing.
248
+
249
+ 3. Implement auth flow
250
+ 4. Run test — confirm it passes
251
+ 5. Refactor — check for shallow modules, duplication, seam discipline. Run tests after changes.
252
+ 6. Lessons — caught a mistake that applies to future tasks? Add rule to `docs/lessons.md`.
253
+
254
+ ⏸ **CHECKPOINT: done** — present implementation review. Wait for human approval before committing.
255
+ ```
256
+
257
+ 4. In step 3 ("Present the plan"), add the **Plan Acceptance Audit** sub-step after "show the complete plan to the human":
258
+
259
+ ```markdown
260
+ Before presenting, run the **Plan Acceptance Audit**:
261
+ - **Vertical Slices**: Is every task a complete vertical slice (not horizontal)?
262
+ - **Task Sizing**: Is any single task too large or covering multiple complex behaviors? If so, split it.
263
+ - **QA Coverage**: Does every task have both a Happy Path and at least one Edge Case in its Acceptance Criteria?
264
+ - **Checkpoint Alignment**: Are `checkpoint: test` and `checkpoint: done` gates placed on the most critical or risky tasks?
265
+ - **Risk Enforcement**: If the design doc flagged any hazards as `[TRIGGERED]`, verify the corresponding tasks have `checkpoint: done` and a `Hazard Mitigation Verification` section.
266
+
267
+ If any check fails, fix the plan before presenting.
268
+ ```
269
+
270
+ 5. Verify the file reads cleanly.
271
+
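The **QA Coverage** check in the Plan Acceptance Audit could be approximated with a small parser over a task's Acceptance Criteria block. This is a rough sketch, assuming the bullet layout used in the example tasks above (`- **Happy Path**:` headers followed by `- Given:`, `- When:`, `- Then:` lines); `CoverageReport` and `checkQaCoverage` are hypothetical names and are not part of the package.

```typescript
// Hypothetical helper, not part of the package: checks one task's markdown body.
interface CoverageReport {
  hasHappyPath: boolean;
  edgeCaseCount: number;
  completeBlocks: boolean; // every scenario has Given, When, and Then
  passes: boolean;
}

function checkQaCoverage(taskBody: string): CoverageReport {
  const lines = taskBody.split("\n").map((l) => l.trim());
  // Scenario headers look like "- **Happy Path**:" or "- **Edge Case (duplicate email)**:".
  const headerRe = /^- \*\*(Happy Path|Edge (?:Case|Path))[^*]*\*\*:/;
  const stepRe = /^- (Given|When|Then):/;
  const scenarios: { kind: string; keys: Set<string> }[] = [];

  for (const line of lines) {
    const header = headerRe.exec(line);
    if (header) {
      scenarios.push({ kind: header[1], keys: new Set<string>() });
      continue;
    }
    const step = stepRe.exec(line);
    if (step && scenarios.length > 0) {
      // Attach the step to the most recent scenario header.
      scenarios[scenarios.length - 1].keys.add(step[1]);
    }
  }

  const hasHappyPath = scenarios.some((s) => s.kind === "Happy Path");
  const edgeCaseCount = scenarios.filter((s) => s.kind !== "Happy Path").length;
  const completeBlocks = scenarios.length > 0 && scenarios.every((s) => s.keys.size === 3);
  return { hasHappyPath, edgeCaseCount, completeBlocks, passes: hasHappyPath && edgeCaseCount >= 1 && completeBlocks };
}
```

A task with only a Happy Path block would come back with `passes: false`, which corresponds to the "fix the plan before presenting" branch of the audit.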
272
+ ---
273
+
274
+ ## Task 3: Update `skills/executing-tasks/SKILL.md` — Cognitive Persona Shifts & Defensive Sandboxing
275
+
276
+ <!-- tdd: modifying-tested-code -->
277
+
278
+ Files:
279
+ - `skills/executing-tasks/SKILL.md`
280
+
281
+ Acceptance Criteria (QA Engineer Hat):
282
+ - **Happy Path**:
283
+ - Given: An implementation plan with tasks containing Given/When/Then acceptance criteria and numbered steps
284
+ - When: `/skill:executing-tasks` runs through a task
285
+ - Then: The agent follows the plan's numbered steps while applying three cognitive frames:
286
+ 1. **QA Test frame** (when writing/running tests): Focus on translating Given/When/Then specs, verify sandboxed environment
287
+ 2. **Pragmatic Developer frame** (when implementing): Focus on simplest code to green tests
288
+ 3. **Senior Refactoring frame** (when refactoring): Evaluate craftsmanship (shallow modules, deletion test, duplication, seam discipline)
289
+ - **Edge Path (Sandbox Verification)**:
290
+ - Given: A test file that would connect to a real database
291
+ - When: The agent is in the QA Test frame
292
+ - Then: The agent verifies the test uses mocks/stubs and no live connections before running
293
+
294
+ Steps:
295
+ 1. Read `skills/executing-tasks/SKILL.md` in full
296
+ 2. In the "Per-task execution" section, replace steps 3–5 with meta-framed persona shifts that preserve the plan-step-following behavior. Replace:
297
+
298
+ ```markdown
299
+ 3. **Execute the plan steps** — follow each numbered step in the task body, in order. Stop at any `⏸ CHECKPOINT` gate (see [Checkpoint gates](#checkpoint-gates--when-the-plan-says-stop)).
300
+ 4. **Verify against task description** — re-read the task from the plan. Does the implementation satisfy every requirement listed? If not, fix before proceeding.
301
+ 5. **Refactor** — after all tests pass, look for:
302
+ - **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
303
+ - **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
304
+ - **Duplication** — extract repeated patterns
305
+ - **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
306
+
307
+ Run tests after each refactor step. Never refactor while tests are failing.
308
+ ```
309
+
310
+ With:
311
+
312
+ ```markdown
313
+ 3. **Execute the plan steps** — follow each numbered step in the task body, in order. As you work, shift your cognitive focus through three frames:
314
+
315
+ **QA Test frame** (when writing/running tests): Focus entirely on translating the task's `Given/When/Then` Acceptance Criteria into precise failing tests. Before running tests, verify the test environment is sandboxed — no real database connections, API calls, or live services. External dependencies must be mocked or stubbed. `NODE_ENV` must be `test` (or equivalent).
316
+
317
+ **Pragmatic Developer frame** (when implementing): Focus on the simplest possible code to make the tests green. Do not over-engineer or add code for future requirements. Keep complexity to a bare minimum.
318
+
319
+ **Senior Refactoring frame** (when refactoring): Evaluate the craftsmanship of the code. Check for:
320
+ - **Shallow modules** — is the interface nearly as complex as the implementation? Can complexity be hidden behind a simpler interface?
321
+ - **Deletion test** — if you deleted this module, would complexity vanish (pass-through) or reappear across callers (earning its keep)?
322
+ - **Duplication** — extract repeated patterns
323
+ - **Seam discipline** — don't introduce abstraction unless something actually varies across it. One adapter = hypothetical seam. Two adapters = real seam
324
+
325
+ Run tests after each refactor step. Never refactor while tests are failing.
326
+
327
+ Stop at any `⏸ CHECKPOINT` gate (see [Checkpoint gates](#checkpoint-gates--when-the-plan-says-stop)).
328
+ 4. **Verify against task description** — re-read the task from the plan. Does the implementation satisfy every requirement listed? If not, fix before proceeding.
329
+ ```
330
+
331
+ Note: The old step 5 (Refactor) is folded into step 3's "Senior Refactoring frame" so step 4 remains "Verify against task description". The remaining steps (old 6→5, old 7→6, old 8→7, old 9→8, old 10→9) need to be renumbered.
332
+
333
+ 3. Renumber the remaining steps after the new step 4:
334
+ - Old step 6 ("Learn from mistakes") → new step 5
335
+ - Old step 7 ("Commit") → new step 6
336
+ - Old step 8 ("Update progress") → new step 7
337
+ - Old step 9 ("Suggest session break") → new step 8
338
+ - Old step 10 ("Loop") → new step 9
339
+
340
+ 4. Verify the file reads cleanly — the cognitive frames are meta-guidance applied while following the plan's numbered steps, not a replacement for them.
341
+
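The sandbox verification in the QA Test frame could be backed by a small pre-flight guard. The sketch below is illustrative only: `findSandboxViolations` and the default variable names `DATABASE_URL` and `REDIS_URL` are assumptions, not something the skill file defines. It treats any set connection string as a violation, on the assumption that mocked dependencies do not need one.

```typescript
// Hypothetical guard, not part of the package.
type Env = Record<string, string | undefined>;

function findSandboxViolations(
  env: Env,
  // Assumption: these are the variables that would point at live services.
  connectionVars: string[] = ["DATABASE_URL", "REDIS_URL"],
): string[] {
  const violations: string[] = [];

  // The plan requires NODE_ENV to be "test" (or an equivalent marker).
  if (env["NODE_ENV"] !== "test") {
    violations.push(`NODE_ENV is "${env["NODE_ENV"] ?? "unset"}", expected "test"`);
  }

  // With dependencies mocked or stubbed, no connection string should be set.
  for (const name of connectionVars) {
    if (env[name] !== undefined && env[name] !== "") {
      violations.push(`${name} is set; external dependencies should be mocked`);
    }
  }
  return violations;
}
```

In a Node test setup this would typically be called with `process.env` before the suite runs, aborting if the returned list is non-empty. It cannot detect hard-coded hosts inside test files, so it supplements the agent's review of the test rather than replacing it.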
342
+ ---
343
+
344
+ ## Task 4: Update `skills/finalizing/SKILL.md` — Lessons Curation with Scrum Master Hat
345
+
346
+ <!-- tdd: modifying-tested-code -->
347
+
348
+ Files:
349
+ - `skills/finalizing/SKILL.md`
350
+
351
+ Acceptance Criteria (QA Engineer Hat):
352
+ - **Happy Path**:
353
+ - Given: A sprint is completed with some rules in `docs/lessons.md`
354
+ - When: `/skill:finalizing` is executed
355
+ - Then: The agent puts on the **Agile Scrum Master Hat** to de-duplicate, generalize, and categorize all rules under structured markdown headers
356
+ - **Edge Path (No lessons exist)**:
357
+ - Given: No `docs/lessons.md` exists and no lessons were learned
358
+ - When: `/skill:finalizing` is executed
359
+ - Then: The step is skipped gracefully (existing behavior preserved)
360
+ - **Edge Path (Lessons format after categorization)**:
361
+ - Given: `docs/lessons.md` was categorized into headers like `## Tool Usage` and `## Testing Patterns` by a previous finalizing run
362
+ - When: A new execution phase appends a rule under `## Rules`
363
+ - Then: The rule lands in the correct location (the `## Rules` section still exists for new entries, and finalizing re-categorizes later)
364
+
365
+ Steps:
366
+ 1. Read `skills/finalizing/SKILL.md` in full
367
+ 2. In step 2 ("Review lessons learned"), replace the existing instruction with the enhanced Scrum Master Hat curation:
368
+
369
+ Replace:
370
+
371
+ ```markdown
372
+ 2. **Review lessons learned** — if `docs/lessons.md` exists, review it:
373
+ - Add any lessons from this session that were missed during execution
374
+ - **Generalize domain-specific rules** — if a rule names a specific service, entity, or feature, either rewrite it as a generic pattern or remove it if no generic form exists
375
+ - Retire rules that no longer apply (remove the bullet)
376
+ - If no changes are needed, leave it as-is
377
+ ```
378
+
379
+ With:
380
+
381
+ ```markdown
382
+ 2. **Review & Polish Lessons (Agile Scrum Master Hat)** — if `docs/lessons.md` exists, put on your **Agile Scrum Master Hat** to curate and optimize it for future sprints:
383
+ - **Add missed lessons** — capture any lessons from this session that weren't written during execution
384
+ - **Generalize domain-specific rules** — if a rule names a specific service, entity, or feature, either rewrite it as a generic pattern or remove it if no generic form exists
385
+ - **De-duplicate** — combine overlapping or redundant rules into single, sharper entries
386
+ - **Categorize** — group the rules under clear, structured markdown headers (e.g., `## Tool Usage`, `## Testing Patterns`, `## Architecture Rules`) to make the document highly scannable for future sessions. Keep the `## Rules` section as the append target for new entries during execution — categorization moves rules out of `## Rules` into the appropriate category headers.
387
+ - **Retire stale rules** — remove bullets that no longer apply
388
+ - If no changes are needed, leave it as-is
389
+ ```
390
+
391
+ 3. Verify the file reads cleanly.
392
+
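The **De-duplicate** step is mostly a judgment call, but the trivial cases can be handled mechanically. As a rough sketch, assuming rules are stored as one bullet per line, the hypothetical `dedupeRules` below (not part of the package) drops bullets that differ only in case, spacing, or trailing punctuation and keeps the first wording it sees.

```typescript
// Hypothetical helper, not part of the package.
function dedupeRules(bullets: string[]): string[] {
  const seen = new Set<string>();
  const kept: string[] = [];

  for (const bullet of bullets) {
    // Normalize case, whitespace, and trailing punctuation so trivial variants collide.
    const key = bullet
      .trim()
      .toLowerCase()
      .replace(/\s+/g, " ")
      .replace(/[.!]+$/, "");
    if (key === "" || seen.has(key)) continue;
    seen.add(key);
    kept.push(bullet.trim());
  }
  return kept;
}
```

Rules that overlap in meaning but not in wording are out of reach for this kind of normalization. Merging those into "single, sharper entries" still falls to the agent wearing the Scrum Master Hat.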
393
+ ---
394
+
395
+ ## Task 5: Update `docs/lessons.md` format template in `skills/executing-tasks/SKILL.md`
396
+
397
+ <!-- tdd: modifying-tested-code -->
398
+
399
+ Files:
400
+ - `skills/executing-tasks/SKILL.md`
401
+
402
+ Acceptance Criteria (QA Engineer Hat):
403
+ - **Happy Path**:
404
+ - Given: The agent catches a repeat mistake during task execution
405
+ - When: It appends a new rule to `docs/lessons.md`
406
+ - Then: The rule is appended under `## Rules` (the standard append target), regardless of whether category headers exist from a previous finalizing run
407
+ - **Edge Path (After categorization)**:
408
+ - Given: `docs/lessons.md` has been reorganized by finalizing with category headers like `## Tool Usage`
409
+ - When: The agent needs to append a new rule during execution
410
+ - Then: The agent appends to `## Rules` (which finalizing ensures always exists as the catch-all section)
411
+
412
+ Steps:
413
+ 1. Read the `docs/lessons.md` format template section in `skills/executing-tasks/SKILL.md`
414
+ 2. Add a note after the format template to clarify the append convention (the template itself is unchanged):
415
+
416
+ Replace:
417
+
418
+ ````markdown
419
+ ### `docs/lessons.md` format
420
+
421
+ ```markdown
422
+ # Lessons Learned
423
+
424
+ <!--
425
+ Agent: read this at the start of each task during executing-tasks.
426
+ Follow every rule. Add new rules when you catch yourself making repeat mistakes.
427
+ Rules must be generic patterns applicable to any domain or feature — not specific to one service, entity, or use case.
428
+ Retire rules that no longer apply during finalizing.
429
+ -->
430
+
431
+ ## Rules
432
+
433
+ - <new rule here>
434
+ ```
435
+ ````
436
+
437
+ With:
438
+
439
+ ````markdown
440
+ ### `docs/lessons.md` format
441
+
442
+ ```markdown
443
+ # Lessons Learned
444
+
445
+ <!--
446
+ Agent: read this at the start of each task during executing-tasks.
447
+ Follow every rule. Add new rules when you catch yourself making repeat mistakes.
448
+ Rules must be generic patterns applicable to any domain or feature — not specific to one service, entity, or use case.
449
+ Retire rules that no longer apply during finalizing.
450
+ -->
451
+
452
+ ## Rules
453
+
454
+ - <new rule here>
455
+ ```
456
+
457
+ When adding a new rule during execution, always append it under `## Rules`. The categorization into specific headers (e.g., `## Tool Usage`, `## Testing Patterns`) is done during finalizing — never during execution.
458
+ ````
459
+
460
+ 3. Verify the file reads cleanly.
461
+
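The append convention from Task 5 (new rules always land under `## Rules`, even after finalizing has added category headers) can be sketched as a pure string transformation. `appendRule` is a hypothetical name and is not part of the package. Creating a missing `## Rules` section at the end of the file is an assumption of this sketch, since the plan expects finalizing to keep that section in place.

```typescript
// Hypothetical helper, not part of the package.
function appendRule(markdown: string, rule: string): string {
  const bullet = `- ${rule}`;
  const lines = markdown.split("\n");
  const start = lines.findIndex((l) => l.trim() === "## Rules");

  if (start === -1) {
    // Assumption: if the catch-all section is missing, recreate it at the end.
    return `${markdown.trimEnd()}\n\n## Rules\n\n${bullet}\n`;
  }

  // The section ends at the next level-2 heading or at end of file.
  let end = lines.findIndex((l, i) => i > start && l.startsWith("## "));
  if (end === -1) end = lines.length;

  // Insert after the last non-blank line of the section.
  let insertAt = end;
  while (insertAt > start + 1 && lines[insertAt - 1].trim() === "") insertAt--;
  lines.splice(insertAt, 0, bullet);
  return lines.join("\n");
}
```

Because the function never touches the category sections, a rule appended during execution stays in `## Rules` until the next finalizing run moves it.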
462
+ ---
463
+
464
+ ## Task 6: Run tests and verify existing suite passes
465
+
466
+ <!-- tdd: trivial -->
467
+
468
+ Files:
469
+ - None (verification only)
470
+
471
+ Steps:
472
+ 1. Run `npm test` — confirm all existing tests pass without side effects
473
+ 2. Verify no `docs/lessons.md` was created or modified by the test run