@tianhai/pi-workflow-kit 0.14.0 → 0.16.0

package/README.md CHANGED
@@ -33,20 +33,21 @@ Enforces phase-appropriate tool access — not just guidelines, but hard blocks:
33
33
 
34
34
  The agent can read code and discuss design with you during brainstorm/plan, but it physically cannot modify source files or run mutating commands.
35
35
 
36
- ### 🧠 5 Workflow Skills
36
+ ### 🧠 6 Workflow Skills
37
37
 
38
38
  Guide the agent through a disciplined development process:
39
39
 
40
40
  ```
41
- brainstorm → plan → execute → finalize
42
-
43
- diagnose (anytime)
41
+ brainstorm → design-review → plan → execute → finalize
42
+
43
+ diagnose (anytime)
44
44
  ```
45
45
 
46
46
  | Phase | Trigger | What Happens |
47
47
  |-------|---------|--------------|
48
48
  | **Brainstorm** | `/skill:brainstorming` | Explore approaches, debate tradeoffs, produce a design doc |
49
- | **Plan** | `/skill:writing-plans` | Break design into bite-sized TDD tasks with file paths and acceptance criteria |
49
+ | **Design Review** | `/skill:design-review` | Audit design for production risks (security, scalability, fault tolerance) |
50
+ | **Plan** | `/skill:writing-plans` | Break design into bite-sized TDD tasks with acceptance criteria and concrete code |
50
51
  | **Execute** | `/skill:executing-tasks` | Implement tasks one-by-one with TDD discipline and pre-commit checkpoint review gates |
51
52
  | **Finalize** | `/skill:finalizing` | Archive plan docs, update README/CHANGELOG, create PR |
52
53
  | **Diagnose** | `/skill:diagnose` | 6-phase debugging loop: reproduce → hypothesize → instrument → fix → verify |
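As a rough illustration of the hard-block idea in this README, phase-appropriate write access could be modeled as a small predicate. This is a minimal sketch, not the actual `workflow-guard.ts`: the `Phase` type, the `READ_ONLY_PHASES` set, the `isWriteAllowed` function, the choice to treat design-review as read-only, and the rule that `docs/` stays writable so the design doc can be saved are all assumptions made for illustration.

```typescript
// Hypothetical sketch; none of these names come from the package itself.
type Phase = "brainstorm" | "design-review" | "plan" | "execute" | "finalize";

// Assumption: source files are read-only until the execute phase.
const READ_ONLY_PHASES: ReadonlySet<Phase> = new Set<Phase>([
  "brainstorm",
  "design-review",
  "plan",
]);

// Returns true when a write to `path` should be let through in `phase`.
function isWriteAllowed(phase: Phase, path: string): boolean {
  if (!READ_ONLY_PHASES.has(phase)) return true;
  // Assumption: design and plan documents live under docs/ and remain writable.
  return path.startsWith("docs/");
}

console.log(isWriteAllowed("brainstorm", "src/index.ts")); // false
console.log(isWriteAllowed("brainstorm", "docs/plans/design.md")); // true
console.log(isWriteAllowed("execute", "src/index.ts")); // true
```

A deny-by-default predicate like this keeps the policy in one place, which is what makes it a block rather than a guideline.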
@@ -59,6 +60,7 @@ You control each phase — the agent never advances on its own. Invoke a skill t
59
60
 
60
61
  ```
61
62
  /skill:brainstorming → discuss and design
63
+ /skill:design-review → audit for production risks (non-trivial designs)
62
64
  /skill:writing-plans → break into tasks
63
65
  /skill:executing-tasks → implement with TDD
64
66
  /skill:finalizing → ship it
@@ -116,15 +118,20 @@ pi install npm:@tianhai/pi-workflow-kit
116
118
  # (agent explores approaches, writes design doc)
117
119
  # (write/edit are blocked — your code is safe)
118
120
 
121
+ > /skill:design-review
122
+
123
+ # (agent audits for security, scalability, fault tolerance)
124
+ # (trivial changes can skip this step)
125
+
119
126
  > /skill:writing-plans
120
127
 
121
- # (agent breaks design into TDD tasks)
128
+ # (agent breaks design into TDD tasks with acceptance criteria)
122
129
  > /skill:executing-tasks
123
130
 
124
- # (agent implements with TDD, all tools unlocked)
131
+ # (agent implements with TDD, cognitive persona shifts, all tools unlocked)
125
132
  > /skill:finalizing
126
133
 
127
- # (agent archives docs, updates changelog, creates PR)
134
+ # (agent archives docs, curates lessons, creates PR)
128
135
  ```
129
136
 
130
137
  ## Why?
@@ -142,6 +149,7 @@ pi-workflow-kit/
142
149
  │ └── workflow-guard.ts # Write blocker during brainstorm/plan
143
150
  ├── skills/
144
151
  │ ├── brainstorming/SKILL.md
152
+ │ ├── design-review/SKILL.md
145
153
  │ ├── writing-plans/SKILL.md
146
154
  │ ├── executing-tasks/SKILL.md
147
155
  │ ├── finalizing/SKILL.md
@@ -0,0 +1,70 @@
1
+ # Design: Enforce Generic Lessons in `docs/lessons.md`
2
+
3
+ ## Problem
4
+
5
+ During `executing-tasks`, the agent writes lessons to `docs/lessons.md` scoped to the task at hand. In a monorepo, this produces domain-specific rules that are only useful within one feature or sprint — for example:
6
+
7
+ > "Always validate `userId` before calling `UserProfile.Get`"
8
+
9
+ The real lesson — applicable across any domain — would be:
10
+
11
+ > "Always validate required ID fields at the service boundary — missing IDs should return 400, not 500"
12
+
13
+ Domain-specific rules decay immediately after the feature is done and pollute the lessons file for future work.
14
+
15
+ ## Goal
16
+
17
+ Rules in `docs/lessons.md` should be generic patterns applicable to any domain or feature in the repo, not instances of a pattern tied to one service or entity.
18
+
19
+ ## Affected files
20
+
21
+ - `skills/executing-tasks/SKILL.md`
22
+ - `skills/finalizing/SKILL.md`
23
+
24
+ ## Changes
25
+
26
+ ### 1. `executing-tasks` — Step 6 "Learn from mistakes"
27
+
28
+ Add a **generalization test** after "Only add rules that would change future behavior."
29
+
30
+ ```
31
+ Before writing, apply the **generalization test**: would this rule apply equally to a
32
+ completely different feature or domain in this repo? If not, rewrite it — strip out
33
+ specific service names, entity types, and domain concepts, and express the underlying
34
+ pattern instead. If you can't express a generic form, don't write the rule.
35
+
36
+ ❌ Domain-specific (only survives this sprint):
37
+ "Always validate `userId` before calling `UserProfile.Get`"
38
+
39
+ ✅ Generic (applies across the whole repo):
40
+ "Always validate required ID fields at the service boundary — missing IDs should
41
+ return 400, not 500"
42
+ ```
43
+
44
+ ### 2. `executing-tasks` — `docs/lessons.md` format template comment
45
+
46
+ Add one line to the comment block so the constraint is visible every time the agent opens the file:
47
+
48
+ ```
49
+ Rules must be generic patterns applicable to any domain or feature — not specific to
50
+ one service, entity, or use case.
51
+ ```
52
+
53
+ ### 3. `finalizing` — Step 2 "Review lessons learned"
54
+
55
+ Add a generalization audit bullet between "Add any lessons..." and "Retire rules...":
56
+
57
+ ```
58
+ - Generalize domain-specific rules — if a rule names a specific service, entity, or
59
+ feature, either rewrite it as a generic pattern or remove it if no generic form exists
60
+ ```
61
+
62
+ ### 4. `finalizing` — `docs/lessons.md` format template comment
63
+
64
+ Same addition as change 2 — keep both template definitions consistent.
65
+
66
+ ## Slice summary
67
+
68
+ One end-to-end slice:
69
+
70
+ > **Lessons stay generic** — at write-time (executing-tasks step 6) the agent is required to generalize before writing; the file's own comment reinforces the constraint; at finalization the agent audits and cleans up anything that slipped through.
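The generalization test above is a judgment call for the agent, but a crude mechanical pre-check can flag likely offenders. The sketch below is an assumption-laden heuristic, not something the package ships: `looksDomainSpecific` is a hypothetical name, and it simply treats inline-code identifiers and `Type.Member` references as signs of domain-specific wording.

```typescript
// Heuristic only: a flagged rule is a prompt for review, not a verdict.
// A generic rule may legitimately contain inline code such as `finally`.
function looksDomainSpecific(rule: string): boolean {
  // Inline code spans usually name a concrete field, service, or method.
  const hasInlineCode = /`[^`]+`/.test(rule);
  // PascalCase.member references such as UserProfile.Get.
  const hasMemberRef = /\b[A-Z][A-Za-z0-9]*\.[A-Za-z][A-Za-z0-9]*\b/.test(rule);
  return hasInlineCode || hasMemberRef;
}

const specific = "Always validate `userId` before calling `UserProfile.Get`";
const generic = "Always validate required ID fields at the service boundary";

console.log(looksDomainSpecific(specific)); // true
console.log(looksDomainSpecific(generic)); // false
```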
@@ -0,0 +1,114 @@
1
+ # Implementation Plan: Enforce Generic Lessons in `docs/lessons.md`
2
+
3
+ Design: docs/plans/2026-05-20-generic-lessons-design.md
4
+
5
+ Two skill files need four text edits total. No tests (markdown-only changes). Both
6
+ tasks are trivial — exact old/new text is specified so the executor can apply them
7
+ without guessing.
8
+
9
+ ---
10
+
11
+ ## Task 1: Add generalization test + update format comment in `executing-tasks`
12
+
13
+ <!-- tdd: trivial -->
14
+
15
+ File: `skills/executing-tasks/SKILL.md`
16
+
17
+ Two edits in one file:
18
+
19
+ ### Edit A — Step 6 "Learn from mistakes"
20
+
21
+ Replace:
22
+ ```
23
+ 6. **Learn from mistakes** — if you caught yourself making a mistake during this task that you've made before or that would apply to future tasks, append a rule to `docs/lessons.md`. Only add rules that would change future behavior. If the file doesn't exist, create it with the standard format (see below).
24
+ ```
25
+
26
+ With:
27
+ ```
28
+ 6. **Learn from mistakes** — if you caught yourself making a mistake during this task that you've made before or that would apply to future tasks, append a rule to `docs/lessons.md`. Only add rules that would change future behavior. If the file doesn't exist, create it with the standard format (see below).
29
+
30
+ Before writing, apply the **generalization test**: would this rule apply equally to a completely different feature or domain in this repo? If not, rewrite it — strip out specific service names, entity types, and domain concepts, and express the underlying pattern instead. If you can't express a generic form, don't write the rule.
31
+
32
+ ❌ **Domain-specific** (only survives this sprint):
33
+ > "Always validate `userId` before calling `UserProfile.Get`"
34
+
35
+ ✅ **Generic** (applies across the whole repo):
36
+ > "Always validate required ID fields at the service boundary — missing IDs should return 400, not 500"
37
+ ```
38
+
39
+ ### Edit B — `docs/lessons.md` format template comment
40
+
41
+ Replace:
42
+ ```
43
+ <!--
44
+ Agent: read this at the start of each task during executing-tasks.
45
+ Follow every rule. Add new rules when you catch yourself making repeat mistakes.
46
+ Retire rules that no longer apply during finalizing.
47
+ -->
48
+ ```
49
+
50
+ With:
51
+ ```
52
+ <!--
53
+ Agent: read this at the start of each task during executing-tasks.
54
+ Follow every rule. Add new rules when you catch yourself making repeat mistakes.
55
+ Rules must be generic patterns applicable to any domain or feature — not specific to one service, entity, or use case.
56
+ Retire rules that no longer apply during finalizing.
57
+ -->
58
+ ```
59
+
60
+ Steps:
61
+ 1. Apply Edit A to `skills/executing-tasks/SKILL.md`
62
+ 2. Apply Edit B to `skills/executing-tasks/SKILL.md`
63
+ 3. Verify: open the file and confirm both edits are present and the surrounding text is intact
64
+
65
+ ---
66
+
67
+ ## Task 2: Add generalization audit bullet + update format comment in `finalizing`
68
+
69
+ <!-- tdd: trivial -->
70
+
71
+ File: `skills/finalizing/SKILL.md`
72
+
73
+ Two edits in one file:
74
+
75
+ ### Edit A — Step 2 "Review lessons learned"
76
+
77
+ Replace:
78
+ ```
79
+ - Add any lessons from this session that were missed during execution
80
+ - Retire rules that no longer apply (remove the bullet)
81
+ ```
82
+
83
+ With:
84
+ ```
85
+ - Add any lessons from this session that were missed during execution
86
+ - **Generalize domain-specific rules** — if a rule names a specific service, entity, or feature, either rewrite it as a generic pattern or remove it if no generic form exists
87
+ - Retire rules that no longer apply (remove the bullet)
88
+ ```
89
+
90
+ ### Edit B — `docs/lessons.md` format template comment
91
+
92
+ Replace:
93
+ ```
94
+ <!--
95
+ Agent: read this at the start of each task during executing-tasks.
96
+ Follow every rule. Add new rules when you catch yourself making repeat mistakes.
97
+ Retire rules that no longer apply during finalizing.
98
+ -->
99
+ ```
100
+
101
+ With:
102
+ ```
103
+ <!--
104
+ Agent: read this at the start of each task during executing-tasks.
105
+ Follow every rule. Add new rules when you catch yourself making repeat mistakes.
106
+ Rules must be generic patterns applicable to any domain or feature — not specific to one service, entity, or use case.
107
+ Retire rules that no longer apply during finalizing.
108
+ -->
109
+ ```
110
+
111
+ Steps:
112
+ 1. Apply Edit A to `skills/finalizing/SKILL.md`
113
+ 2. Apply Edit B to `skills/finalizing/SKILL.md`
114
+ 3. Verify: open the file and confirm both edits are present and the surrounding text is intact
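The "Replace / With" edits in this plan rely on the old text matching exactly once. A minimal sketch of that contract follows, assuming a hypothetical helper named `applyExactEdit`; the plan itself does not prescribe any tooling, so treat this only as one way to make the "Verify" step mechanical.

```typescript
// Replaces `oldText` with `newText` only when `oldText` occurs exactly once.
// Throws instead of guessing when the match is missing or ambiguous.
function applyExactEdit(content: string, oldText: string, newText: string): string {
  if (oldText === "") throw new Error("old text must not be empty");
  const first = content.indexOf(oldText);
  if (first === -1) throw new Error("old text not found");
  if (content.indexOf(oldText, first + 1) !== -1) {
    throw new Error("old text matches more than once");
  }
  return content.slice(0, first) + newText + content.slice(first + oldText.length);
}

const before = [
  "- Add any lessons from this session that were missed during execution",
  "- Retire rules that no longer apply (remove the bullet)",
].join("\n");

const after = applyExactEdit(
  before,
  "- Retire rules",
  "- **Generalize domain-specific rules**\n- Retire rules",
);
console.log(after);
```

Failing loudly on zero or multiple matches is the design choice here: a silent no-op would let the later verification step pass over an edit that never landed.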
@@ -0,0 +1,11 @@
1
+ # Progress: generic-lessons
2
+
3
+ Plan: docs/plans/2026-05-20-generic-lessons-implementation.md
4
+ Branch: generic-lessons
5
+ Started: 2026-05-20T00:00:00Z
6
+ Last updated: 2026-05-20T00:00:00Z
7
+
8
+ | # | Status | Task | Commit |
9
+ |---|--------|------|--------|
10
+ | 1 | ✅ done | Add generalization test + update format comment in `executing-tasks` | 96010f7 |
11
+ | 2 | ✅ done | Add generalization audit bullet + update format comment in `finalizing` | 72f088d |
@@ -0,0 +1,77 @@
1
+ # Design: Agentic Agile & Architectural Rigor Enhancements
2
+
3
+ Enforcing rigorous Agile engineering discipline within `pi-workflow-kit` by introducing Behavioral Acceptance Criteria, Cognitive Persona Shifts, automated Lessons Curation, strict Multi-Pillar Architectural Reviews, and High-Risk Operation Safeguards.
4
+
5
+ ## Context & Objectives
6
+ Industry practice and modern agentic development templates (such as Microsoft's Agentic Agile model) suggest that autonomous coding agents succeed most when operating under tight behavioral boundaries, specialized cognitive roles, and continuous retrospective/learning loops.
7
+
8
+ We are enhancing `pi-workflow-kit` by building distinct engineering "Hats" and rigorous check gates directly into our existing phase-based skills, without adding repository clutter or introducing flaky external file lookups:
9
+ 1. **The QA Engineer Hat** (in `writing-plans`): Defines rigid, testable `Given/When/Then` Acceptance Criteria for both happy and edge paths during planning.
10
+ 2. **The Pragmatic Developer & Senior Refactorer Hats** (in `executing-tasks`): Guide the execution loop through clear cognitive phases (Green Light → Polish / Software Craftsmanship).
11
+ 3. **The Agile Scrum Master Hat** (in `finalizing`): Cleans up, de-duplicates, and categorizes persistent lessons to prevent context-bloat and maximize the utility of future sprints.
12
+ 4. **Architectural Review & Audit Gates**: Formally audits both the design (brainstorming) and the plan (writing-plans) against the 6 core pillars of production-grade software (Robustness, Atomicity, Security, Scalability, Compatibility, and Testability) before allowing the agent to move forward.
13
+ 5. **High-Risk Operation Safeguards**: Auto-detects critical execution hazards (unbounded Redis scans, in-memory OOM loops, unthrottled concurrency, long-running transactions, etc.) and mandates strict mitigation steps and verification checkpoints.
14
+
15
+ ---
16
+
17
+ ## Architecture & Detailed Design
18
+
19
+ Because agent workspaces resolve tool execution and file reads relative to the user's project directory, external files bundled in globally installed npm modules are not reliably reachable. Therefore, all guidelines are **inlined directly within the respective `SKILL.md` prompts**. This removes the path-lookup failure mode, adds no files to the user's repository, and requires no extra file reads at runtime.
20
+
21
+ ### Slice 1: Multi-Pillar Design Review & Risk Detection (`brainstorming`)
22
+ Before concluding a brainstorm and generating a design doc, the agent must put on its **Architect Hat** and evaluate the proposed system against the **6 Pillars of Production-Grade Design**:
23
+ 1. **Robustness & Fault Tolerance**: How expected failures are handled, subsystem isolation, and graceful degradation.
24
+ 2. **Atomicity & Consistency**: Database transactions, state rollback on error, and endpoint idempotency.
25
+ 3. **Security & Access Control**: Input validation/sanitization and authorization checks at the boundary.
26
+ 4. **Scalability & Performance**: Connection pooling, closing resource leaks, and preventing N+1 queries.
27
+ 5. **Backwards Compatibility**: Schema migration safety, zero-downtime deployment, and API versioning.
28
+ 6. **Testability**: Injection seams for external dependencies (APIs, system clocks, randomizers) to keep tests 100% deterministic.
29
+
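The Testability pillar's "injection seam" can be made concrete with a system clock. The sketch below is illustrative only: `Clock`, `Session`, and `isExpired` are hypothetical names, and the point is simply that the time source is a parameter with a production default, so tests can substitute a fixed value.

```typescript
// A clock is any function returning epoch milliseconds.
type Clock = () => number;

interface Session {
  expiresAtMs: number;
}

// Production callers omit `now` and get the real clock; tests pass a fake.
function isExpired(session: Session, now: Clock = Date.now): boolean {
  return now() >= session.expiresAtMs;
}

// Deterministic check: the fake clock always reads 1000 ms.
const fixedClock: Clock = () => 1000;
console.log(isExpired({ expiresAtMs: 999 }, fixedClock)); // true
console.log(isExpired({ expiresAtMs: 1001 }, fixedClock)); // false
```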
30
+ #### ⚠️ High-Risk Hazard Auditing
31
+ The agent must proactively audit the design for the **8 High-Risk Production Hazards**:
32
+ 1. **Unbounded Redis Deletions / Operations**: Multi-key deletions or scans (e.g., `KEYS` or raw `SCAN` loops) that stall Redis's single-threaded command processing.
33
+ 2. **In-Memory OOM Loops**: Fetching complete database datasets into server memory (e.g., raw `select *`) to filter, sort, or map them on the runtime heap.
34
+ 3. **Unbounded Concurrency Spikes**: Running concurrent network requests (e.g., unthrottled `Promise.all`) without strict batch limits (e.g., via `p-limit`).
35
+ 4. **Missing High-Frequency Indexes**: Running queries on unindexed columns, forcing expensive table-scans under load.
36
+ 5. **Nested/Long-Running Transactions**: Holding database connections and locks open while awaiting slow external HTTP, disk, or cryptographic tasks.
37
+ 6. **Unrestricted Uploads & Temp Flooding**: Writing uploaded data directly to local temporary paths without validation limits or explicit `finally` cleanup blocks.
38
+ 7. **Raw Query String Interpolation**: Merging raw variables into SQL queries or shell command inputs (susceptible to injection).
39
+ 8. **Silent Swallowing Loops**: Background workers or cron tasks silently catching and suppressing exceptions without logging, back-offs, or alerts.
40
+
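To make hazard 3 concrete, the sketch below shows one dependency-free way to cap in-flight work instead of handing every item to `Promise.all` at once. It is an illustration under stated assumptions, not the mitigation the design mandates: `mapWithLimit` is a hypothetical helper, and a library such as `p-limit` (named in the list above) would serve the same purpose.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight.
// Results keep the input order.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims items one at a time until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Usage: at most two simulated requests run at the same time.
mapWithLimit([1, 2, 3, 4], 2, async (n) => n * 10).then((out) => console.log(out));
```

If one worker rejects, the returned promise rejects, while runners that are already mid-item finish their current call; callers that need full cancellation would have to add it themselves.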
41
+ #### 🔍 Discovering Unknown & Contextual Risks (Socratic Heuristics)
42
+ To identify novel or domain-specific risks that fall outside the standard checklist, the agent must put on its **SRE Hat** and audit the proposed logic against the **3 Socratic Heuristics**:
43
+ * **The "Scale to 100x" Heuristic (Resource Exhaustion)**: If this operation is run 100x/sec or on 100k items, what breaks? (Memory, CPU, Disk I/O, sockets, database connection limits).
44
+ * **The "Hostile World" Heuristic (Security & Malice)**: If a malicious actor has complete control over these inputs (headers, payloads, IDs), how can they exploit, crash, or extract data?
45
+ * **The "Silent Error" Heuristic (Observability & Partitioning)**: If this downstream dependency or query hangs or fails silently, how does our server react? Is there a timeout, a back-off, or logging?
46
+
47
+ If any of the standard hazards or Socratic risks are identified, the design document **must** include a dedicated `⚠️ High-Risk Operations & Mitigations` section detailing the exact safety protocols applied.
48
+
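The "Silent Error" heuristic asks whether a hanging dependency has a timeout. A minimal sketch of one answer follows, assuming a hypothetical `withTimeout` helper built on `Promise.race`. Note the limitation: it stops waiting but does not cancel the underlying work, which would need something like an `AbortSignal` passed into the call itself.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}

// Usage: a dependency that never answers is surfaced as a logged error.
const neverSettles = new Promise<string>(() => {});
withTimeout(neverSettles, 10, "profile lookup").catch((err: Error) => {
  console.error(err.message);
});
```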
49
+ ### Slice 2: Behavioral Acceptance Criteria & Plan Audit (`writing-plans`)
50
+ The planning process is enhanced to mandate behavior-driven specifications and an automated plan verification step.
51
+
52
+ - **Role**: QA Engineer Hat.
53
+ - **Specification Format**: Mandatory `Given/When/Then` blocks covering the Happy Path and Edge/Error Paths.
54
+ - **Plan Acceptance Audit**: Before presenting the plan to the user, the agent must verify:
55
+ - Every task is a complete vertical slice.
56
+ - Sizing is correct (no monolithic tasks).
57
+ - Checkpoint gates are placed on the most critical/risky tasks.
58
+ - **Risk Enforcement**: Any task that involves one of the **8 High-Risk Hazards** or a risk surfaced by the **Socratic Heuristics** must have a `checkpoint: done` gate and explicit verification guidelines.
59
+
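The plan acceptance audit above is performed by the agent from the prompt, but its two most mechanical checks can be sketched in code. Everything here is hypothetical: the `PlannedTask` shape, the `highRisk` flag, and the `auditPlan` function are assumptions for illustration, not the plan file format the skill uses.

```typescript
// Hypothetical in-memory view of one task from an implementation plan.
interface PlannedTask {
  title: string;
  acceptanceCriteria: string; // free text expected to contain Given/When/Then
  highRisk: boolean; // set when a hazard or Socratic risk applies
  checkpoint?: "done";
}

// Returns a list of human-readable problems; an empty list means the checks passed.
function auditPlan(tasks: PlannedTask[]): string[] {
  const problems: string[] = [];
  for (const task of tasks) {
    const missing = ["Given", "When", "Then"].filter(
      (kw) => !new RegExp(`\\b${kw}\\b`).test(task.acceptanceCriteria),
    );
    if (missing.length > 0) {
      problems.push(`${task.title}: missing ${missing.join("/")}`);
    }
    if (task.highRisk && task.checkpoint !== "done") {
      problems.push(`${task.title}: high-risk task needs a checkpoint: done gate`);
    }
  }
  return problems;
}

console.log(
  auditPlan([
    {
      title: "Bulk delete cache keys",
      acceptanceCriteria: "Given 10k keys, When purge runs, Then deletes happen in batches",
      highRisk: true,
    },
  ]),
);
```

Slice completeness and task sizing are left out on purpose, since those checks need judgment rather than pattern matching.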
60
+ ### Slice 3: Cognitive Persona Shifts (`executing-tasks`)
61
+ The implementation execution loop is updated to divide the cognitive workload of a single task into three distinct phases.
62
+
63
+ - **Phase 1: QA Test Phase**: Translate the Given/When/Then specs into failing test cases.
64
+ - **Phase 2: Pragmatic Developer Phase**: Implement the simplest code that makes the tests pass.
65
+ - **Phase 3: Senior Refactoring Phase**: Refactor and polish using software craftsmanship principles (Shallow Modules, Deletion Test, Duplication, Seam Discipline).
66
+
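The first two phases can be illustrated with a single Given/When/Then criterion. The sketch below is a toy under stated assumptions: `Req`, `handle`, and `check` are hypothetical names, and a plain throwing helper stands in for the project's Vitest suite so the example has no dependencies.

```typescript
// Spec: Given a request without an id, When it is handled, Then the status is 400.
interface Req {
  id?: string;
}

// Stand-in for a test runner assertion.
function check(name: string, actual: number, expected: number): void {
  if (actual !== expected) {
    throw new Error(`${name}: expected ${expected}, got ${actual}`);
  }
  console.log(`ok: ${name}`);
}

// Phase 2 (Pragmatic Developer): the simplest code that satisfies the checks.
// Phase 3 would refactor this without changing the observable behavior.
function handle(req: Req): { status: number } {
  if (req.id === undefined || req.id.trim() === "") return { status: 400 };
  return { status: 200 };
}

// Phase 1 (QA): these checks are written first and fail until `handle` exists.
check("missing id returns 400", handle({}).status, 400);
check("blank id returns 400", handle({ id: "  " }).status, 400);
check("valid id returns 200", handle({ id: "42" }).status, 200);
```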
67
+ ### Slice 4: Lessons Curation & Caching (`finalizing`)
68
+ The finalizing phase is upgraded to run a structured retrospective on our persistent learning files.
69
+
70
+ - **Role**: Agile Scrum Master Hat.
71
+ - **Curating Rules**: De-duplicate, validate against the Generalization Test, and categorize rules under distinct headers (e.g., `# Tool Usage`, `# Testing Patterns`, `# Architecture Rules`).
72
+
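The de-duplicate and categorize steps of the curation pass can be sketched as a pure function. This is a hedged illustration, not the skill's behavior: `Lesson` and `curateLessons` are hypothetical names, duplicates are detected only by case and whitespace normalization, and the Generalization Test is left to the agent.

```typescript
interface Lesson {
  category: string; // e.g. "Tool Usage", "Testing Patterns"
  text: string;
}

// Groups lessons under `# Category` headers and drops near-identical repeats.
function curateLessons(lessons: Lesson[]): string {
  const byCategory: Record<string, string[]> = {};
  const seen: Record<string, true> = {};
  for (const lesson of lessons) {
    const trimmed = lesson.text.trim();
    // Normalize case and inner whitespace so trivial variants collapse.
    const key = trimmed.toLowerCase().replace(/\s+/g, " ");
    if (key === "" || seen[key]) continue;
    seen[key] = true;
    (byCategory[lesson.category] ??= []).push(trimmed);
  }
  return Object.keys(byCategory)
    .map((category) => {
      const bullets = byCategory[category].map((rule) => `- ${rule}`).join("\n");
      return `# ${category}\n${bullets}`;
    })
    .join("\n\n");
}

console.log(
  curateLessons([
    { category: "Testing Patterns", text: "Inject clocks into time-dependent code" },
    { category: "Testing Patterns", text: "  inject clocks into   time-dependent code " },
    { category: "Tool Usage", text: "Prefer exact-text edits over regex rewrites" },
  ]),
);
```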
73
+ ---
74
+
75
+ ## Verification & Testing Plan
76
+ - **Manual Verification**: Run a mock `/skill:writing-plans` and `/skill:executing-tasks` to verify the generated implementation plan matches our QA template and the task-running agent correctly segments its progress through the three cognitive hats.
77
+ - **Automated Tests**: Confirm existing Vitest suites run successfully without side-effects.