@tianhai/pi-workflow-kit 0.16.0 → 0.18.1
- package/README.md +51 -39
- package/docs/developer-usage-guide.md +32 -14
- package/docs/lessons.md +18 -0
- package/docs/oversight-model.md +11 -5
- package/docs/plans/2026-06-03-karpathy-guidelines-ab-comparison.md +166 -0
- package/docs/plans/completed/2026-06-03-add-verify-skill-design.md +51 -0
- package/docs/plans/completed/2026-06-03-add-verify-skill-implementation.md +111 -0
- package/docs/plans/completed/2026-06-03-add-verify-skill-progress.md +11 -0
- package/docs/plans/completed/2026-06-03-verify-skill-design.md +176 -0
- package/docs/plans/completed/2026-06-09-code-review-fixes-implementation.md +74 -0
- package/docs/plans/completed/2026-06-09-code-review-fixes-progress.md +14 -0
- package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-design.md +186 -0
- package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-implementation.md +675 -0
- package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-progress.md +18 -0
- package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-verification-report.md +81 -0
- package/docs/plans/completed/2026-06-09-verification-fixes-implementation.md +69 -0
- package/docs/plans/completed/2026-06-09-verification-fixes-progress.md +14 -0
- package/docs/workflow-phases.md +19 -13
- package/extensions/workflow-guard.ts +10 -9
- package/package.json +2 -1
- package/skills/{brainstorming → pwk-brainstorming}/SKILL.md +17 -6
- package/skills/{design-review → pwk-design-review}/SKILL.md +11 -9
- package/skills/{diagnose → pwk-diagnose}/SKILL.md +1 -1
- package/skills/{executing-tasks → pwk-executing-tasks}/SKILL.md +46 -16
- package/skills/{finalizing → pwk-finalizing}/SKILL.md +9 -2
- package/skills/pwk-verify/SKILL.md +170 -0
- package/skills/{writing-plans → pwk-writing-plans}/SKILL.md +72 -6
package/skills/pwk-verify/SKILL.md
@@ -0,0 +1,170 @@
+---
+name: pwk-verify
+description: "Post-implementation code verification with three expert review passes — security, optimization, and traceability. Use after executing-tasks and before finalizing to catch issues that pass tests but break in production. Runs the 'last prompt' pattern: adversarial security review, dead code and duplication audit, and end-to-end contract verification across every layer. Use this skill whenever the user says 'verify', 'review the code', 'check for issues', 'security review', 'the last prompt', 'audit', or when code has been implemented and needs a quality gate before shipping."
+---
+
+# Verify
+
+Three expert review passes over the implemented codebase. Read-only — you **may** write the verification report to `docs/plans/`, but you **may not** modify source code.
+
+The core insight: code that passes tests is not code that's ready. Working code can have security holes, dead branches, duplicated logic, and broken contracts between layers — especially when AI generates across many files without maintaining a single mental model of the whole system. This skill catches what tests miss.
+
+## Process
+
+1. **Check what's been done** — run `git log --oneline` and `git diff --stat` to understand the scope of recent changes. If nothing has been implemented, say "No code changes found. Run `/skill:pwk-executing-tasks` first." and stop.
+
+2. **Identify the project's layers** — before reviewing, map the codebase's architecture. Look for layer boundaries: UI/handlers/routes → services/business logic → repositories/data access → database/models. Note the patterns: does the project use controllers, handlers, or routes? Services or use cases? Repositories or DAOs? This map drives the traceability pass.
+
+3. **Run three expert review passes** — each pass adopts a distinct adversarial framing. Do them sequentially. For each pass, read the relevant code deeply — don't skim. Then write findings.
+
+4. **Compile the report** — write all findings to `docs/plans/*-verification-report.md`. Present the report to the user and wait for feedback.
+
+5. **Offer to create a remediation plan** — after the report, ask: "Want me to create a fix plan from these findings? Run `/skill:pwk-writing-plans` to turn the task list into executable tasks."
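
Step 1 of the process above can be sketched as a small scope gate. This is a minimal illustration, not code from the package: `parseDiffStat` and `scopeGate` are hypothetical names, and the sketch assumes the text of `git diff --stat` has already been captured as a string.

```typescript
// Hypothetical helpers; neither name comes from the package.
interface ChangedFile {
  path: string;
  summary: string;
}

// In `git diff --stat` output, each changed file is a line of the form
// "path | N ++--"; the closing summary line contains no "|".
function parseDiffStat(diffStat: string): ChangedFile[] {
  return diffStat
    .split("\n")
    .filter((line) => line.includes("|"))
    .map((line) => {
      const idx = line.lastIndexOf("|");
      return {
        path: line.slice(0, idx).trim(),
        summary: line.slice(idx + 1).trim(),
      };
    });
}

// Returns the stop message from step 1 when nothing changed,
// otherwise a one-line description of the review scope.
function scopeGate(diffStat: string): string {
  const files = parseDiffStat(diffStat);
  if (files.length === 0) {
    return "No code changes found. Run `/skill:pwk-executing-tasks` first.";
  }
  const paths = files.map((f) => f.path).join(", ");
  return `Reviewing ${files.length} changed file(s): ${paths}`;
}
```

Keeping the parsing separate from the `git` invocation makes the gate easy to exercise with canned output.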
+
+## Pass 1 — Security Review 🔴
+
+**Framing:** A junior developer wrote this code. Now the best security expert on the team is reviewing it — adversarial, suspicious of everything. Trust nothing.
+
+**What to look for:**
+
+- **Input validation** — every external input (HTTP params, form data, headers, query strings, environment variables) must be validated and sanitized. Unvalidated input is a critical finding.
+- **Authentication & authorization** — every endpoint that handles user data must have auth checks. Are there endpoints that skip auth? Can one user access another user's data by changing an ID?
+- **Injection** — SQL queries built by string concatenation, unsanitized shell commands, template injection, XSS in HTML output. Any raw variable interpolated into a query or command is critical.
+- **Secrets** — API keys, passwords, tokens hardcoded in source files. Check environment variable loading — are defaults set to empty or to actual secrets?
+- **Data exposure** — are sensitive fields (passwords, tokens, PII) logged, returned in API responses, or stored unencrypted?
+- **Dependency risks** — known-vulnerable packages (if `package.json`/`go.mod`/`requirements.txt` is present).
+
+**Severity classification:**
+
+| Severity | Definition |
+|----------|-----------|
+| Critical | Exploitable right now — auth bypass, injection, data leak |
+| High | Likely exploitable — missing validation on sensitive endpoint, weak auth |
+| Medium | Harder to exploit but real risk — verbose error messages leaking internals, missing rate limits |
+| Low | Best practice violations — missing CSP headers, no HSTS, long session timeouts |
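
To make the **Injection** bullet of Pass 1 concrete, the sketch below contrasts a query built by string interpolation with one that keeps the SQL text constant and passes input as a bound value. The `users` table, the `Query` shape, and the `$1` placeholder style (as used by PostgreSQL clients) are assumptions for illustration; no database library is involved.

```typescript
// Hypothetical query shape: SQL text plus separately bound values.
interface Query {
  text: string;
  values: unknown[];
}

// Critical finding: the raw variable is interpolated into the SQL text,
// so input can change the structure of the statement.
function unsafeFindUser(email: string): Query {
  return {
    text: `SELECT id, name FROM users WHERE email = '${email}'`,
    values: [],
  };
}

// Safer shape: the SQL text never changes; the input travels as data.
function safeFindUser(email: string): Query {
  return {
    text: "SELECT id, name FROM users WHERE email = $1",
    values: [email],
  };
}

const hostile = "x' OR '1'='1";
console.log(unsafeFindUser(hostile).text); // the WHERE clause now contains an OR
console.log(safeFindUser(hostile).text); // unchanged, hostile string is only a value
```

A reviewer following this pass would report the first pattern with its file path and line, and suggest the second as the fix.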
+
+## Pass 2 — Optimization Review 🟡
+
+**Framing:** A code quality expert looking for waste — things that make the codebase harder to maintain, slower to run, or more confusing than necessary.
+
+**What to look for:**
+
+- **Dead code** — functions, methods, types, or exports that are never called anywhere in the codebase. Search for definitions and verify they have callers.
+- **Duplication** — the same logic implemented in slightly different ways across multiple files. AI-generated code is especially prone to this — if context was lost between sessions, the AI solved the same sub-problem differently in two places. Flag each pair with file paths and line numbers.
+- **Over-engineering** — abstractions, interfaces, or layers that add complexity without earning their keep (only one implementation, no real variation across the seam).
+- **Under-engineering** — god functions, 200-line blocks, deeply nested conditionals that should be extracted.
+- **Performance concerns** — N+1 queries, unbounded loops, unnecessary copies of large data structures, missing pagination on list endpoints.
+
+**Priority classification:**
+
+| Priority | Definition |
+|----------|-----------|
+| P0 | Dead code in a critical path or duplicated logic that will diverge |
+| P1 | Significant duplication or over-engineering that increases maintenance cost |
+| P2 | Minor cleanups — long functions, missing pagination, style inconsistencies |
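
The **Dead code** check in Pass 2 ("search for definitions and verify they have callers") can be approximated with a rough heuristic like the one below. It is a sketch under strong assumptions: `findUnreferencedFunctions` is a hypothetical name, sources are passed in as an in-memory map, and only `function name` declarations are recognised. Arrow functions, methods, and dynamic calls are missed, so results are leads for manual review rather than findings.

```typescript
// Hypothetical helper: flags function declarations whose name appears
// only once across all sources (the definition itself).
function findUnreferencedFunctions(files: Record<string, string>): string[] {
  const definitions: { name: string; file: string }[] = [];
  const defPattern = /\bfunction\s+([A-Za-z_]\w*)/g;

  for (const [file, source] of Object.entries(files)) {
    let match: RegExpExecArray | null;
    while ((match = defPattern.exec(source)) !== null) {
      definitions.push({ name: match[1], file });
    }
  }

  const allSource = Object.values(files).join("\n");
  return definitions
    .filter(({ name }) => {
      const uses = allSource.match(new RegExp("\\b" + name + "\\b", "g")) ?? [];
      return uses.length <= 1; // no occurrence besides the definition
    })
    .map(({ name, file }) => `${file}: ${name}`);
}
```

Each hit still needs a human or agent to confirm that the function is not reached through re-exports, reflection, or framework conventions before it is reported.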
+
+## Pass 3 — Traceability Review 🔵
+
+**Framing:** An integration expert tracing every user-facing action end-to-end — from UI to database and back. The AI generates code file-by-file, and the seams between files are where bugs hide.
+
+**What to look for:**
+
+1. **Map every entry point** — list all handlers, routes, controllers, or event listeners that receive external input.
+2. **Trace each call chain** — for each entry point, follow the call: handler → service → repository → database. At each boundary, verify:
+- **Function name** — does the caller use the exact function name the callee exposes?
+- **Argument names** — does the caller pass `userId` when the function expects `user_id`? Does `id` mean the same thing in both layers?
+- **Argument types** — is a string passed where an integer is expected? Is an object shape different from what the next layer destructures?
+- **Return shape** — does the caller expect fields that the callee actually returns? Are response DTOs consistent across layers?
+3. **Check error propagation** — when a database query returns no results, does the service layer handle it? Does the handler return 404 or 500? Do errors propagate cleanly or get swallowed silently?
+4. **Verify the round-trip** — if the UI calls `getUser(id)` and displays `user.name`, trace that `name` actually exists in the DB schema, gets selected by the query, mapped by the repository, passed through the service, included in the response, and rendered by the UI.
+
+**This is the pass that catches the most bugs.** AI-generated code will often have a frontend calling `getUserProfile(userId)` and a backend exposing `get_user_profile(user_id)` — both work in isolation, neither works together.
+
+**Severity classification:**
+
+| Severity | Definition |
+|----------|-----------|
+| Critical | Call chain is completely broken — function doesn't exist or signature is fundamentally wrong |
+| High | Signature mismatch — wrong arg names, wrong types, missing required fields |
+| Medium | Silent error handling — errors swallowed without logging or user feedback |
+| Low | Inconsistent naming conventions that could confuse future developers |
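
The argument-name check from Pass 3 can be illustrated with a tiny boundary comparison. This is a hedged sketch: `checkBoundary` and `BoundaryReport` are hypothetical names, and the expected field list stands in for whatever the callee actually destructures.

```typescript
// Hypothetical report shape for one layer boundary.
interface BoundaryReport {
  missing: string[]; // fields the callee expects but the caller did not pass
  unexpected: string[]; // fields the caller passed but the callee ignores
}

function checkBoundary(
  passed: Record<string, unknown>,
  expected: string[],
): BoundaryReport {
  return {
    missing: expected.filter((key) => !(key in passed)),
    unexpected: Object.keys(passed).filter((key) => !expected.includes(key)),
  };
}

// Handler passes camelCase while the (assumed) service expects snake_case.
const report = checkBoundary({ userId: 42 }, ["user_id"]);
console.log(report);
```

Here `user_id` ends up in `missing` and `userId` in `unexpected`, which matches the High severity row above (signature mismatch, wrong arg names).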
+
+## Report Format
+
+Write findings to `docs/plans/*-verification-report.md` using this structure:
+
+```markdown
+# Verification Report: <feature/topic>
+
+**Date:** <ISO date>
+**Scope:** <summary of what was reviewed>
+**Reviewer:** AI verify skill (security + optimization + traceability)
+
+## Summary
+
+| Pass | Critical | High | Medium | Low |
+|------|----------|------|--------|-----|
+| Security | X | X | X | X |
+| Optimization | — | X | X | X |
+| Traceability | X | X | X | X |
+| **Total** | **X** | **X** | **X** | **X** |
+
+## 🔴 Security Findings
+
+### [S-001] Critical — <short title>
+
+**Location:** `path/to/file.ts:line`
+
+**Issue:** <what's wrong and why it matters>
+
+**Fix:** <concrete remediation step>
+
+### [S-002] High — <short title>
+...
+
+## 🟡 Optimization Findings
+
+### [O-001] P0 — <short title>
+
+**Location:** `path/to/file.ts:line` and `path/to/other.ts:line`
+
+**Issue:** <what's wrong>
+
+**Fix:** <concrete remediation step>
+
+### [O-002] P1 — <short title>
+...
+
+## 🔵 Traceability Findings
+
+### [T-001] Critical — <short title>
+
+**Entry point:** `path/to/handler.ts:line`
+**Call chain:** handler → service → repository → DB
+**Broken at:** <which boundary>
+**Issue:** <what's wrong — e.g., handler passes `userId` but service expects `user_id`>
+
+**Fix:** <concrete remediation step>
+
+### [T-002] High — <short title>
+...
+
+## Remediation Task List
+
+Convert findings into actionable tasks:
+
+| ID | Priority | Finding | Estimated Effort |
+|----|----------|---------|-----------------|
+| S-001 | Critical | <one-liner> | <small/medium/large> |
+| T-001 | Critical | <one-liner> | <small/medium/large> |
+| O-001 | P0 | <one-liner> | <small/medium/large> |
+| ...
+```
+
+## Principles
+
+- **Be specific** — every finding must include a file path and line reference. "There might be security issues" is useless.
+- **Be adversarial** — actively look for problems. If you don't find any, say so — but don't phone it in.
+- **Be proportional** — a small config change doesn't need the same depth as a new API endpoint. Adjust your review depth to the scope of changes.
+- **Don't fix anything** — this is read-only. Find and report. The user decides what to fix and when.
+- **Focus on seams** — the traceability pass is where the most value lives. Code within a single file is usually coherent; the bugs hide between files.
package/skills/{writing-plans → pwk-writing-plans}/SKILL.md
@@ -1,5 +1,5 @@
 ---
-name: writing-plans
+name: pwk-writing-plans
 description: "Use this to break a design into an implementation plan with bite-sized TDD tasks. Works with or without a prior brainstorm. Use this skill when the user says 'let's plan', 'break this down', 'write a plan', 'create tasks', or after a brainstorm session when they want to move to implementation. Also use when the user has a clear idea and wants to jump straight to a structured plan."
 ---
 
@@ -11,7 +11,9 @@ You may only create or edit files under `docs/plans/`. Do not modify source code
 
 1. **Check for a design doc** — look for `docs/plans/*-design.md`. If one exists, use it as the basis for the plan. If the design doc is incomplete, fill gaps by asking the human. If no design doc exists, ask the user to describe what they want to build and read relevant code. **Read `docs/lessons.md`** if it exists — incorporate known patterns into the task breakdown (e.g., if a lesson says "always run lint before commit," include that in relevant task instructions).
 
-
+If the design doc has a `## Features` table, read it to identify the next feature with status `⬜ pending`. Mark that feature as `🔄 planned` by editing the design doc. This plan will cover only that one feature. If the design doc has no Features table, plan the entire design as before.
+
+Then evaluate whether the feature — whether from the design doc or from the user's description and codebase exploration — involves any of the following:
 
 - Database schema changes or migrations
 - Authentication or authorization logic
@@ -20,10 +22,21 @@ You may only create or edit files under `docs/plans/`. Do not modify source code
 - File uploads or large data flows
 - Redis, caching, or message queues
 
-If any apply
+If any apply, prompt the user: "This feature involves [list what you found] but hasn't been reviewed for production risks. Run `/skill:pwk-design-review` first, or type 'proceed' to skip."
 
 If the design doc explicitly notes "Simple change — no design review needed", skip this check.
-2. **Write the implementation plan** — break the
+2. **Write the implementation plan** — break the feature into tasks. Save to `docs/plans/YYYY-MM-DD-<topic>-<feature-name>-implementation.md` (derive `<feature-name>` from the feature's name in the table, slugified). If the design doc has no Features table, use `docs/plans/YYYY-MM-DD-<topic>-implementation.md`. Include metadata at the top of the plan doc so the executor can find the design doc and feature row:
+
+```markdown
+# Implementation Plan: <feature name>
+
+## Overview
+
+Design: docs/plans/YYYY-MM-DD-<topic>-design.md
+Feature: <feature name> (row N in Features table)
+```
+
+If the design is too large for ~15 tasks for a single feature, flag this to the human and ask whether to reduce scope or proceed with the full plan.
 3. **Present the plan** — show the complete plan to the human. Wait for approval before suggesting execution.
 
 Before presenting, run the **Plan Acceptance Audit**:
@@ -31,7 +44,7 @@ You may only create or edit files under `docs/plans/`. Do not modify source code
 - **Task Sizing**: Is any single task too large or covering multiple complex behaviors? If so, split it.
 - **QA Coverage**: Does every task have both a Happy Path and at least one Edge Case in its Acceptance Criteria?
 - **Checkpoint Alignment**: Are `checkpoint: test` and `checkpoint: done` gates placed on the most critical or risky tasks?
-- **Risk Enforcement**: If
+- **Risk Enforcement**: If this plan doc's Architectural Review section flagged any hazards as `[TRIGGERED]`, verify the corresponding tasks have `checkpoint: done` and a `Hazard Mitigation Verification` section.
 
 If any check fails, fix the plan before presenting.
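
The feature selection in step 1 and the file naming in step 2 can be sketched together. The diff does not show the layout of the `## Features` table, so the `| # | Feature | Status |` columns below are an assumption, as are the names `parseFeatures`, `slugify`, and `planPath`. Only the `⬜ pending` status and the two path patterns come from the text above.

```typescript
interface Feature {
  row: number;
  name: string;
  status: string;
}

// Parses data rows of an assumed `| # | Feature | Status |` table.
function parseFeatures(markdown: string): Feature[] {
  const features: Feature[] = [];
  for (const line of markdown.split("\n")) {
    const cells = line.split("|").map((cell) => cell.trim());
    // A data row splits into ["", "2", "Password reset", "⬜ pending", ""].
    if (cells.length >= 5 && /^\d+$/.test(cells[1])) {
      features.push({ row: Number(cells[1]), name: cells[2], status: cells[3] });
    }
  }
  return features;
}

function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Returns the plan path for the next pending feature, the whole-design
// path when there is no Features table, or null when nothing is pending.
function planPath(designDoc: string, date: string, topic: string): string | null {
  const features = parseFeatures(designDoc);
  const base = `docs/plans/${date}-${slugify(topic)}`;
  if (features.length === 0) {
    return `${base}-implementation.md`;
  }
  const next = features.find((f) => f.status.includes("pending"));
  return next ? `${base}-${slugify(next.name)}-implementation.md` : null;
}
```

Returning `null` for a table with no pending rows is a choice made for this sketch; the skill text does not say what should happen in that case.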
 
@@ -289,4 +302,57 @@ Use judgment when assigning checkpoints. Prefer `checkpoint: test` for new featu
 
 ## After the plan
 
-Ask: "Ready to execute? Run `/skill:executing-tasks`"
+Ask: "Ready to execute? Run `/skill:pwk-executing-tasks`"
+
+> After executing this feature, the executor will check for more `⬜ pending` features and suggest planning the next one.
+
+## Behavioral Guidelines
+
+Guidelines to reduce overcomplication and hidden assumptions in plans. Derived from [Andrej Karpathy's observations](https://x.com/karpathy/status/2015883857489522876) on LLM coding pitfalls, adapted for the planning context.
+
+**Tradeoff:** These guidelines bias toward caution over speed. For trivial plans (1-2 tasks), use judgment.
+
+### Surface Assumptions
+
+**When the design is ambiguous, annotate — don't silently pick.**
+
+When writing a plan, you'll encounter gaps: the design says "paginated" but doesn't specify how, says "validate input" but doesn't say which fields, or leaves the data layer unspecified. Your instinct will be to fill the gap and keep writing. Resist that.
+
+Instead, add a brief `> **Assumption:** ...` note in the plan at the point where you made the call:
+
+```
+> **Assumption:** Using offset/limit pagination because the design just says
+> "paginated". Cursor-based would be better for large datasets.
+```
+
+```
+> **Assumption:** No service layer — handler calls store directly. Add one
+> if cross-cutting concerns (logging, auth checks) emerge later.
+```
+
+This lets the reviewer see what you chose and why, without blocking progress. Common gaps worth annotating:
+- Pagination style, error handling strategy, concurrency model
+- Whether to add a service/middleware layer
+- Whether to add external dependencies
+- Naming conventions when the design doesn't specify
+
+### Build Only What Each Task Needs
+
+**Minimum code to deliver the task's observable behavior. Nothing more.**
+
+- No interface methods that no task exercises yet. If Task 2 creates a `Store` interface, it should have only the methods Task 2 calls. Add methods in the task that first needs them.
+- No layers (service, middleware, repository) unless the design explicitly requires them.
+- No error types, helper files, or shared packages until a task actually uses them.
+- No external dependencies when stdlib suffices. Every `go get` or `npm install` is a choice — default to no.
+- No "flexible" or "configurable" code that wasn't requested.
+
+If you find yourself writing a store with 4 methods where only 1 is used in this task, stop. Write 1 method. Add the rest when the tasks that need them arrive.
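
As an illustration of the "write 1 method" rule, the sketch below shows a `Store` interface that exposes only what a single read task calls, with the handler talking to the store directly and no service layer. The `User` shape, `MemoryStore`, and `handleGetUser` are hypothetical and exist only to make the example runnable.

```typescript
interface User {
  id: string;
  name: string;
}

// The task only reads a user, so the interface has exactly one method.
// create/update/delete would be added by the tasks that first need them.
interface Store {
  getUser(id: string): User | undefined;
}

// In-memory implementation, assumed here so the sketch runs without a database.
class MemoryStore implements Store {
  private users = new Map<string, User>();

  constructor(seed: User[]) {
    for (const user of seed) {
      this.users.set(user.id, user);
    }
  }

  getUser(id: string): User | undefined {
    return this.users.get(id);
  }
}

// Handler calls the store directly; no service layer is introduced.
function handleGetUser(
  store: Store,
  id: string,
): { status: number; body: User | { error: string } } {
  const user = store.getUser(id);
  return user
    ? { status: 200, body: user }
    : { status: 404, body: { error: "not found" } };
}
```

A later task that needs to create users would extend `Store` at that point, together with the first code that calls the new method.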
+
+### One Task, One Change
+
+**Each task should trace to exactly one user-facing behavior.**
+
+- If a task creates more than 4 new files, it's probably doing too much — split it.
+- If a task modifies existing files unrelated to its acceptance criteria, trim the scope.
+- Infrastructure (types, interfaces, module scaffolding) should live in the same task as the first code that uses it, not in a separate "setup" task — unless the infrastructure alone is complex enough to warrant its own task.
+- Every file listed in a task's `Files:` section should be directly necessary for that task's acceptance criteria to pass.
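
The file-count rule above lends itself to a mechanical check during the plan audit. The sketch below assumes each task has already been reduced to a title and a list of files it creates; `PlannedTask` and `oversizedTasks` are hypothetical names, and the threshold of 4 is taken from the first bullet.

```typescript
// Hypothetical, simplified view of a task from the plan document.
interface PlannedTask {
  title: string;
  newFiles: string[];
}

const MAX_NEW_FILES = 4;

// Returns one warning per task that creates more than MAX_NEW_FILES files.
function oversizedTasks(tasks: PlannedTask[]): string[] {
  return tasks
    .filter((task) => task.newFiles.length > MAX_NEW_FILES)
    .map(
      (task) =>
        `${task.title}: ${task.newFiles.length} new files, consider splitting`,
    );
}
```

The check only counts files; whether a flagged task really covers more than one user-facing behavior remains a judgment call for the planner.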