
@tianhai/pi-workflow-kit 0.16.0 → 0.18.1


Files changed (27)

package/docs/plans/completed/2026-06-03-verify-skill-design.md ADDED

@@ -0,0 +1,176 @@
+# Verify Skill — Draft SKILL.md
+> **Target path:** `skills/verify/SKILL.md` (to be created during executing-tasks)
+---
+```markdown
+---
+name: verify
+description: "Post-implementation code verification with three expert review passes — security, optimization, and traceability. Use after executing-tasks and before finalizing to catch issues that pass tests but break in production. Runs the 'last prompt' pattern: adversarial security review, dead code and duplication audit, and end-to-end contract verification across every layer. Use this skill whenever the user says 'verify', 'review the code', 'check for issues', 'security review', 'the last prompt', 'audit', or when code has been implemented and needs a quality gate before shipping."
+---
+# Verify
+Three expert review passes over the implemented codebase. Read-only — you **may** write the verification report to `docs/plans/`, but you **may not** modify source code.
+The core insight: code that passes tests is not code that's ready. Working code can have security holes, dead branches, duplicated logic, and broken contracts between layers — especially when AI generates across many files without maintaining a single mental model of the whole system. This skill catches what tests miss.
+## Process
+1. **Check what's been done** — run `git log --oneline` and `git diff --stat` to understand the scope of recent changes. If nothing has been implemented, say "No code changes found. Run `/skill:executing-tasks` first." and stop.
+2. **Identify the project's layers** — before reviewing, map the codebase's architecture. Look for layer boundaries: UI/handlers/routes → services/business logic → repositories/data access → database/models. Note the patterns: does the project use controllers, handlers, or routes? Services or use cases? Repositories or DAOs? This map drives the traceability pass.
+3. **Run three expert review passes** — each pass adopts a distinct adversarial framing. Do them sequentially. For each pass, read the relevant code deeply — don't skim. Then write findings.
+4. **Compile the report** — write all findings to `docs/plans/*-verification-report.md`. Present the report to the user and wait for feedback.
+5. **Offer to create a remediation plan** — after the report, ask: "Want me to create a fix plan from these findings? Run `/skill:writing-plans` to turn the task list into executable tasks."
+## Pass 1 — Security Review 🔴
+**Framing:** A junior developer wrote this code. Now the best security expert on the team is reviewing it — adversarial, suspicious of everything. Trust nothing.
+**What to look for:**
+- **Input validation** — every external input (HTTP params, form data, headers, query strings, environment variables) must be validated and sanitized. Unvalidated input is a critical finding.
+- **Authentication & authorization** — every endpoint that handles user data must have auth checks. Are there endpoints that skip auth? Can one user access another user's data by changing an ID?
+- **Injection** — SQL queries built by string concatenation, unsanitized shell commands, template injection, XSS in HTML output. Any raw variable interpolated into a query or command is critical.
+- **Secrets** — API keys, passwords, tokens hardcoded in source files. Check environment variable loading — are defaults set to empty or to actual secrets?
+- **Data exposure** — are sensitive fields (passwords, tokens, PII) logged, returned in API responses, or stored unencrypted?
+- **Dependency risks** — known-vulnerable packages (if `package.json`/`go.mod`/`requirements.txt` is present).
+**Severity classification:**
+| Severity | Definition |
+|----------|-----------|
+| Critical | Exploitable right now — auth bypass, injection, data leak |
+| High | Likely exploitable — missing validation on sensitive endpoint, weak auth |
+| Medium | Harder to exploit but real risk — verbose error messages leaking internals, missing rate limits |
+| Low | Best practice violations — missing CSP headers, no HSTS, long session timeouts |
+## Pass 2 — Optimization Review 🟡
+**Framing:** A code quality expert looking for waste — things that make the codebase harder to maintain, slower to run, or more confusing than necessary.
+**What to look for:**
+- **Dead code** — functions, methods, types, or exports that are never called anywhere in the codebase. Search for definitions and verify they have callers.
+- **Duplication** — the same logic implemented in slightly different ways across multiple files. AI-generated code is especially prone to this — if context was lost between sessions, the AI solved the same sub-problem differently in two places. Flag each pair with file paths and line numbers.
+- **Over-engineering** — abstractions, interfaces, or layers that add complexity without earning their keep (only one implementation, no real variation across the seam).
+- **Under-engineering** — god functions, 200-line blocks, deeply nested conditionals that should be extracted.
+- **Performance concerns** — N+1 queries, unbounded loops, unnecessary copies of large data structures, missing pagination on list endpoints.
+**Priority classification:**
+| Priority | Definition |
+|----------|-----------|
+| P0 | Dead code in a critical path or duplicated logic that will diverge |
+| P1 | Significant duplication or over-engineering that increases maintenance cost |
+| P2 | Minor cleanups — long functions, missing pagination, style inconsistencies |
+## Pass 3 — Traceability Review 🔵
+**Framing:** An integration expert tracing every user-facing action end-to-end — from UI to database and back. The AI generates code file-by-file, and the seams between files are where bugs hide.
+**What to look for:**
+1. **Map every entry point** — list all handlers, routes, controllers, or event listeners that receive external input.
+2. **Trace each call chain** — for each entry point, follow the call: handler → service → repository → database. At each boundary, verify:
+   - **Function name** — does the caller use the exact function name the callee exposes?
+   - **Argument names** — does the caller pass `userId` when the function expects `user_id`? Does `id` mean the same thing in both layers?
+   - **Argument types** — is a string passed where an integer is expected? Is an object shape different from what the next layer destructures?
+   - **Return shape** — does the caller expect fields that the callee actually returns? Are response DTOs consistent across layers?
+3. **Check error propagation** — when a database query returns no results, does the service layer handle it? Does the handler return 404 or 500? Do errors propagate cleanly or get swallowed silently?
+4. **Verify the round-trip** — if the UI calls `getUser(id)` and displays `user.name`, trace that `name` actually exists in the DB schema, gets selected by the query, mapped by the repository, passed through the service, included in the response, and rendered by the UI.
+**This is the pass that catches the most bugs.** AI-generated code will often have a frontend calling `getUserProfile(userId)` and a backend exposing `get_user_profile(user_id)` — both work in isolation, neither works together.
+**Severity classification:**
+| Severity | Definition |
+|----------|-----------|
+| Critical | Call chain is completely broken — function doesn't exist or signature is fundamentally wrong |
+| High | Signature mismatch — wrong arg names, wrong types, missing required fields |
+| Medium | Silent error handling — errors swallowed without logging or user feedback |
+| Low | Inconsistent naming conventions that could confuse future developers |
+## Report Format
+Write findings to `docs/plans/*-verification-report.md` using this structure:
+    # Verification Report: <feature/topic>
+    **Date:** <ISO date>
+    **Scope:** <summary of what was reviewed>
+    **Reviewer:** AI verify skill (security + optimization + traceability)
+    ## Summary
+    | Pass | Critical | High | Medium | Low |
+    |------|----------|------|--------|-----|
+    | Security | X | X | X | X |
+    | Optimization | — | X | X | X |
+    | Traceability | X | X | X | X |
+    | **Total** | **X** | **X** | **X** | **X** |
+    ## 🔴 Security Findings
+    ### [S-001] Critical — <short title>
+    **Location:** `path/to/file.ts:line`
+    **Issue:** <what's wrong and why it matters>
+    **Fix:** <concrete remediation step>
+    ### [S-002] High — <short title>
+    ...
+    ## 🟡 Optimization Findings
+    ### [O-001] P0 — <short title>
+    **Location:** `path/to/file.ts:line` and `path/to/other.ts:line`
+    **Issue:** <what's wrong>
+    **Fix:** <concrete remediation step>
+    ### [O-002] P1 — <short title>
+    ...
+    ## 🔵 Traceability Findings
+    ### [T-001] Critical — <short title>
+    **Entry point:** `path/to/handler.ts:line`
+    **Call chain:** handler → service → repository → DB
+    **Broken at:** <which boundary>
+    **Issue:** <what's wrong — e.g., handler passes `userId` but service expects `user_id`>
+    **Fix:** <concrete remediation step>
+    ### [T-002] High — <short title>
+    ...
+    ## Remediation Task List
+    Convert findings into actionable tasks:
+    | ID | Priority | Finding | Estimated Effort |
+    |----|----------|---------|-----------------|
+    | S-001 | Critical | <one-liner> | <small/medium/large> |
+    | T-001 | Critical | <one-liner> | <small/medium/large> |
+    | O-001 | P0 | <one-liner> | <small/medium/large> |
+    | ...
+## Principles
+- **Be specific** — every finding must include a file path and line reference. "There might be security issues" is useless.
+- **Be adversarial** — actively look for problems. If you don't find any, say so — but don't phone it in.
+- **Be proportional** — a small config change doesn't need the same depth as a new API endpoint. Adjust your review depth to the scope of changes.
+- **Don't fix anything** — this is read-only. Find and report. The user decides what to fix and when.
+- **Focus on seams** — the traceability pass is where the most value lives. Code within a single file is usually coherent; the bugs hide between files.
+```
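
The boundary checks in Pass 3 can be illustrated with a small sketch. It is not part of the skill, which is a prompt rather than a program: `CallSite`, `ExposedFunction`, and `checkBoundary` are hypothetical names, and comparing argument names by position is a simplifying assumption. The severities follow the Pass 3 table (missing function is Critical, name mismatch is High).

```typescript
// Hypothetical shapes for illustration; the skill does not define these types.
interface CallSite {
  caller: string; // layer making the call, e.g. "frontend" or "handler"
  functionName: string;
  argNames: string[];
}

interface ExposedFunction {
  name: string;
  paramNames: string[];
}

interface BoundaryFinding {
  severity: "Critical" | "High";
  issue: string;
}

// Check one caller/callee boundary. Only the arguments the caller passes are
// compared; extra callee parameters are not reported in this sketch.
function checkBoundary(call: CallSite, exposed: ExposedFunction[]): BoundaryFinding[] {
  const callee = exposed.find((fn) => fn.name === call.functionName);
  if (callee === undefined) {
    return [{
      severity: "Critical",
      issue: `${call.caller} calls ${call.functionName}, which the next layer does not expose`,
    }];
  }
  const findings: BoundaryFinding[] = [];
  call.argNames.forEach((arg, i) => {
    const expected: string | undefined = callee.paramNames[i];
    if (expected !== arg) {
      findings.push({
        severity: "High",
        issue: `${call.caller} passes ${arg} but ${callee.name} expects ${expected ?? "no argument at this position"}`,
      });
    }
  });
  return findings;
}
```

Applied to the example from the draft, a frontend call to `getUserProfile(userId)` against a backend that exposes only `get_user_profile(user_id)` yields a single Critical finding, because the function name itself does not resolve.
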

package/docs/plans/completed/2026-06-09-code-review-fixes-implementation.md ADDED

@@ -0,0 +1,74 @@
+# Implementation Plan: Code review findings fix
+## Overview
+Fix 5 findings from the cross-skill code review. All are small — documentation clarity and minor correctness guards.
+## Task 1: Add explicit fallback documentation in executing-tasks "Find the plan"
+<!-- tdd: trivial -->
+Fix finding #1. The primary path expects a design doc with Features table, but the fallback for plans without one is implicit.
+Files:
+- `skills/pwk-executing-tasks/SKILL.md`
+Steps:
+1. In step 2 ("Find the plan"), after the fallback sentence, append: "This covers plans created without a brainstorm session (no design doc or Features table)."
+## Task 2: Add metadata-missing guard in executing-tasks per-task step 2
+<!-- tdd: trivial -->
+Fix finding #2. When a plan has no `Design:` / `Feature:` metadata (no Features table), the "Extract metadata" instruction is dangling.
+Files:
+- `skills/pwk-executing-tasks/SKILL.md`
+Steps:
+1. In per-task step 2, after "Extract the `Design:` and `Feature:` metadata to know which design doc and feature row this execution covers.", add: "If no `Design:` or `Feature:` metadata is present, the plan covers the entire design (no feature table). Skip design doc reading and proceed directly to task execution."
+## Task 3: Document worktree handoff behavior for multi-feature plans
+<!-- tdd: trivial -->
+Fix finding #3. The worktree glob moves all plan docs, which is correct but should be documented explicitly.
+Files:
+- `skills/pwk-executing-tasks/SKILL.md`
+Steps:
+1. In step 3b ("Move plan docs into the worktree"), add a note before the mv commands:
+   > When using the feature table, all plan docs for this design move together — completed feature plans, the current feature's plan, and the design doc. This is intentional: the worktree works on one design at a time.
+## Task 4: Add unstarted-features guard to finalizing
+<!-- tdd: trivial -->
+Fix finding #4. Archiving the design doc while features are still `⬜ pending` makes them invisible to future planning.
+Files:
+- `skills/pwk-finalizing/SKILL.md`
+Steps:
+1. In step 1 ("Move planning docs"), before the archive commands, add a check:
+   > If the design doc has a `## Features` table with any `⬜ pending` or `🔄 planned` features, warn:
+   > ```
+   > ⚠️ Design doc has N unfinished features. Archive anyway, or go back and complete them?
+   > ```
+   > Wait for the user to confirm before proceeding.
+## Task 5: Add verification report to finalizing archive step
+<!-- tdd: trivial -->
+Fix finding #5. Verification reports are left behind in `docs/plans/` after finalizing.
+Files:
+- `skills/pwk-finalizing/SKILL.md`
+Steps:
+1. In step 1, after the existing `mv` commands, add:
+   ```
+   mv docs/plans/*-verification-report.md docs/plans/completed/ 2>/dev/null || true
+   ```
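
The guard from Task 4 can be sketched as a small function over the design doc text. This is a sketch under stated assumptions, not the skill's implementation (the skill is a prompt): `countUnfinishedFeatures` and `shouldWarnBeforeArchive` are hypothetical names, and the parser assumes the four-column `| # | Feature | Status | Observable Behavior |` layout with Status in the third column.

```typescript
// Statuses that the Task 4 guard treats as not yet finished.
const UNFINISHED_STATUSES = ["⬜ pending", "🔄 planned"];

// Count rows of the `## Features` table whose Status cell is pending or planned.
function countUnfinishedFeatures(designDoc: string): number {
  const lines = designDoc.split("\n");
  const start = lines.findIndex((line) => line.trim() === "## Features");
  if (start === -1) return 0; // no Features table, so there is nothing to guard

  let count = 0;
  for (const line of lines.slice(start + 1)) {
    const trimmed = line.trim();
    if (trimmed.startsWith("## ")) break; // the next section ends the table
    if (!trimmed.startsWith("|")) continue;
    // Splitting "| a | b | c |" leaves an empty string before the first pipe,
    // so the third column (Status) is at index 3.
    const cells = trimmed.split("|").map((cell) => cell.trim());
    const status = cells[3];
    if (status !== undefined && UNFINISHED_STATUSES.includes(status)) count += 1;
  }
  return count;
}

// True when the archive step should pause and ask the user to confirm.
function shouldWarnBeforeArchive(designDoc: string): boolean {
  return countUnfinishedFeatures(designDoc) > 0;
}
```

The header and separator rows fall through naturally because their third cell never matches a status value.
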

package/docs/plans/completed/2026-06-09-code-review-fixes-progress.md ADDED

@@ -0,0 +1,14 @@
+# Progress: Code review fixes
+Plan: docs/plans/2026-06-09-code-review-fixes-implementation.md
+Branch: incremental-workflow-and-rename
+Started: 2026-06-09T23:15:00
+Last updated: 2026-06-09T23:15:00
+| # | Status | Task | Commit |
+|---|--------|------|--------|
+| 1 | ✅ done | Add explicit fallback docs in executing-tasks "Find the plan" | 1ab7825 |
+| 2 | ✅ done | Add metadata-missing guard in executing-tasks per-task step 2 | 1ab7825 |
+| 3 | ✅ done | Document worktree handoff for multi-feature plans | 1ab7825 |
+| 4 | ✅ done | Add unstarted-features guard to finalizing | 1ab7825 |
+| 5 | ✅ done | Add verification report to finalizing archive step | 1ab7825 |

package/docs/plans/completed/2026-06-09-incremental-workflow-and-rename-design.md ADDED

@@ -0,0 +1,186 @@
+# Incremental Workflow & Skill Rename
+## Problem
+Three issues with the current workflow:
+1. **Design review runs too early.** The 8 hazard checks evaluate concrete code (missing indexes, raw SQL interpolation, unbounded concurrency). A design doc is too vague to audit effectively — most hazards are invisible until the plan has actual code.
+2. **Large features create context pressure.** Planning all tasks upfront, then executing them all, means a 20-task plan accumulates massive context. Lessons learned mid-execution can't reshape later tasks. The plan goes stale.
+3. **Skills aren't namespaced.** All 7 skills live in a flat namespace, making it hard to discover which skills belong to pi-workflow-kit versus third-party or user-installed skills.
+## Decision
+1. **Move design review after writing-plans.** The plan doc has concrete code, making hazard checks meaningful. Writing-plans already flags high-risk areas — when flagged, suggest design-review before executing.
+2. **Incremental feature-based workflow.** Brainstorm names features. Each feature gets its own plan → execute → optional verify cycle. The brainstorm doc's `## Features` table tracks overall progress.
+3. **Rename all skills with `pwk-` prefix.** All 7 skills and all cross-references updated.
+## Workflow Change
+### Before
+```
+brainstorm → [design-review?] → plan (all tasks) → execute (all tasks) → [verify?] → finalize
+```
+### After
+```
+brainstorm (name features)
+  → plan next feature
+    → [design-review if hazards flagged]
+    → execute feature
+    → [verify this feature? (optional)]
+    → more features? → loop back to plan
+    → all done?
+      → [verify everything? (optional)]
+      → finalize
+```
+## Feature Table
+The brainstorm doc adds a `## Features` table. A simple feature takes one row; a complex one is split across several rows. The table is the feature-level state machine: it answers "what's next?" at any point.
+```markdown
+## Features
+| # | Feature | Status | Observable Behavior |
+|---|---------|--------|---------------------|
+| 1 | User signup | ✅ done | User can create account with email+password |
+| 2 | Email verification | 🔄 planned | User receives and confirms verification email |
+| 3 | Password reset | ⬜ pending | User can reset password via email link |
+```
+Status values: `⬜ pending`, `🔄 planned`, `✅ done`, `⏭ skipped`.
+### Table maintenance
+- **Brainstorm** creates the table with all rows as `⬜ pending`
+- **Writing-plans** marks the next feature as `🔄 planned` when it creates a plan for it
+- **Executing-tasks** marks the feature as `✅ done` (or `⏭ skipped`) after completing all its tasks
+- **Any skill can add rows** if a new feature is discovered mid-implementation (human decides, not agent)
+- The table is a living document. If features need merging, splitting, or reordering, the human directs changes during execution
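
The status transitions described above can be sketched as two pure functions. The names `Feature`, `planNextFeature`, and `finishFeature` are hypothetical and chosen for illustration; the skills themselves edit the Markdown table directly.

```typescript
type FeatureStatus = "⬜ pending" | "🔄 planned" | "✅ done" | "⏭ skipped";

interface Feature {
  id: number;
  name: string;
  status: FeatureStatus;
}

// Writing-plans step: pick the first pending feature and mark it planned.
// Returns a new array (the input is not mutated) plus the chosen feature,
// which is absent when nothing is pending.
function planNextFeature(features: Feature[]): { features: Feature[]; planned?: Feature } {
  const next = features.find((f) => f.status === "⬜ pending");
  if (next === undefined) return { features };
  const planned: Feature = { ...next, status: "🔄 planned" };
  return {
    features: features.map((f) => (f.id === next.id ? planned : f)),
    planned,
  };
}

// Executing-tasks step: mark a feature done (or skipped) once its tasks are complete.
function finishFeature(
  features: Feature[],
  id: number,
  outcome: "✅ done" | "⏭ skipped" = "✅ done",
): Feature[] {
  return features.map((f) => (f.id === id ? { ...f, status: outcome } : f));
}
```

Keeping both steps free of side effects makes the "what's next?" question a single lookup: when `planNextFeature` returns no `planned` feature, nothing is left to plan.
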
+## Design Review Changes
+### Timing
+Design review moves from after-brainstorm to after-writing-plans.
+**Trigger mechanism (already exists in writing-plans step 1):** Writing-plans checks for hazards (DB schema changes, auth, external APIs, concurrency, uploads, Redis/MQ). If any apply AND no architectural review section exists → prompt user to run `/skill:pwk-design-review` or type 'proceed' to skip.
+If the plan doc notes "Simple change — no design review needed" → skip.
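
As a rough sketch, the trigger reduces to one boolean decision. The `Hazard` labels and the `PlanSummary` shape are hypothetical; the design only lists the hazard categories in prose.

```typescript
// Hazard categories named in the design; the identifiers are illustrative labels.
type Hazard = "db-schema" | "auth" | "external-api" | "concurrency" | "uploads" | "redis-mq";

interface PlanSummary {
  hazards: Hazard[];               // hazards that writing-plans detected
  hasArchitecturalReview: boolean; // an architectural review section already exists
  markedSimpleChange: boolean;     // the plan doc carries the "Simple change" note
}

// True when writing-plans should prompt the user to run design review
// (the user can still type 'proceed' to skip).
function shouldSuggestDesignReview(plan: PlanSummary): boolean {
  if (plan.markedSimpleChange) return false;
  return plan.hazards.length > 0 && !plan.hasArchitecturalReview;
}
```
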
+### Review input
+Design review reads both the plan doc (concrete code) and the design doc (architectural context). This is better than the current flow — concrete code makes hazards visible.
+### No mandatory per-feature review
+Design review is suggested, not mandatory, for each feature. The writing-plans hazard check gates it. Low-risk features skip it entirely.
+## Verify Changes
+### Two modes
+| | Per-feature verify | Full verify |
+|---|---|---|
+| **Scope** | One feature's code | All feature code together |
+| **Catches** | Security, dead code, traceability within the feature | Cross-feature integration, duplicated patterns across features, overall consistency |
+| **When** | After each feature (optional, human-initiated) | After all features (optional, human-initiated) |
+| **Cost** | Low — small code surface | Higher — full codebase |
+Neither is mandatory. The human decides based on risk and complexity.
+### Executor prompts
+After completing a feature:
+```
+✅ Feature "<name>" complete.
+⏭  Next: "<next feature name>"
+💡 Options:
+   - Plan next feature: /skill:pwk-writing-plans
+   - Verify this feature first: /skill:pwk-verify
+   - Or just say "continue"
+```
+After all features complete:
+```
+✅ All features complete!
+   - Verify everything: /skill:pwk-verify
+   - Ship: /skill:pwk-finalizing
+```
+## Per-Feature Execution Model
+### Plan docs
+One plan doc per feature: `docs/plans/YYYY-MM-DD-<topic>-<feature-name>-implementation.md`
+### Progress docs
+One progress doc per feature: `docs/plans/YYYY-MM-DD-<topic>-<feature-name>-progress.md`
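
The two naming patterns can be sketched as a single helper. The slug rule (lowercase, hyphen-separated) and the function names are assumptions; the design fixes only the overall `YYYY-MM-DD-<topic>-<feature-name>-…` pattern.

```typescript
// Turn free text into a lowercase, hyphen-separated slug.
// The exact slug rule is an assumption, not something the design specifies.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
}

// Build the plan and progress doc paths for one feature.
function featureDocPaths(date: Date, topic: string, featureName: string): { plan: string; progress: string } {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  const base = `docs/plans/${day}-${slugify(topic)}-${slugify(featureName)}`;
  return {
    plan: `${base}-implementation.md`,
    progress: `${base}-progress.md`,
  };
}
```

Sharing one `base` between the two paths keeps a feature's plan and progress docs adjacent when the directory is sorted by name.
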
+### Feature loop
+1. **Writing-plans** reads design doc, identifies next `⬜ pending` feature, marks it `🔄 planned`, writes plan doc for that feature
+2. **Design review** (if triggered) reviews plan doc + design doc
+3. **Executing-tasks** executes the feature's plan, marks feature `✅ done` in design doc table
+4. **Verify** (optional) reviews just the feature's code
+5. **Loop** back to step 1 if more `⬜ pending` features exist
+### Session boundaries
+Each plan → execute cycle is a natural session break. The executor suggests `/new` between features for clean context, as it already does for long task runs.
+## Skill Rename
+All 7 skills renamed with `pwk-` prefix:
+| Current | New |
+|---------|-----|
+| brainstorming | pwk-brainstorming |
+| writing-plans | pwk-writing-plans |
+| executing-tasks | pwk-executing-tasks |
+| design-review | pwk-design-review |
+| verify | pwk-verify |
+| finalizing | pwk-finalizing |
+| diagnose | pwk-diagnose |
+All cross-references between skills updated (e.g. `Run /skill:executing-tasks` → `Run /skill:pwk-executing-tasks`).
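
A minimal sketch of the cross-reference rewrite, assuming references always take the form `/skill:<name>`. The function name is hypothetical, and the design does not say the rename was done with a script.

```typescript
// The seven skill names from the rename table, before prefixing.
const SKILL_NAMES = [
  "brainstorming",
  "writing-plans",
  "executing-tasks",
  "design-review",
  "verify",
  "finalizing",
  "diagnose",
];

// Rewrite `/skill:<name>` to `/skill:pwk-<name>` for the seven known skills.
// The negative lookahead keeps longer names such as `/skill:verify-extra`
// untouched, and already-prefixed references do not match, so running the
// function twice gives the same result as running it once.
function prefixSkillReferences(text: string): string {
  const pattern = new RegExp(`/skill:(${SKILL_NAMES.join("|")})(?![\\w-])`, "g");
  return text.replace(pattern, "/skill:pwk-$1");
}
```
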
+## Files Changed
+### Skill files (rename + content updates)
+- `skills/brainstorming/SKILL.md` → `skills/pwk-brainstorming/SKILL.md`
+- `skills/writing-plans/SKILL.md` → `skills/pwk-writing-plans/SKILL.md`
+- `skills/executing-tasks/SKILL.md` → `skills/pwk-executing-tasks/SKILL.md`
+- `skills/design-review/SKILL.md` → `skills/pwk-design-review/SKILL.md`
+- `skills/verify/SKILL.md` → `skills/pwk-verify/SKILL.md`
+- `skills/finalizing/SKILL.md` → `skills/pwk-finalizing/SKILL.md`
+- `skills/diagnose/SKILL.md` → `skills/pwk-diagnose/SKILL.md`
+### Documentation
+- `docs/workflow-phases.md` — update skill names and flow diagram
+- `docs/oversight-model.md` — update skill names
+- `docs/developer-usage-guide.md` — update skill names
+### Extension
+- `extensions/workflow-guard.ts` — update skill name references:
+  - `SKILL_TO_PHASE` keys: `brainstorming` → `pwk-brainstorming`, `writing-plans` → `pwk-writing-plans`
+  - Phase-clearing triggers: `/skill:executing-tasks` → `/skill:pwk-executing-tasks`, `/skill:finalizing` → `/skill:pwk-finalizing`
+  - Add `"pwk-verify": "verify"` to `SKILL_TO_PHASE` — enforce write restriction (only `docs/plans/`) during verify phase, matching the skill's read-only claim
+- `tests/workflow-guard.test.ts` — update all skill name references and add test cases for `pwk-verify` phase
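
The verify-phase write restriction might look roughly like the sketch below. It is not the code in `extensions/workflow-guard.ts`, which this diff does not show: only the `"pwk-verify": "verify"` entry comes from the design, while `normalizeRelative` and `canWrite` are hypothetical helpers.

```typescript
const SKILL_TO_PHASE: Record<string, string> = {
  // Entries for pwk-brainstorming and pwk-writing-plans are omitted here:
  // the design names those keys but not their phase values.
  "pwk-verify": "verify",
};

// Collapse "." and ".." segments of a repo-relative path.
// Returns null when the path would escape the repository root.
function normalizeRelative(relativePath: string): string | null {
  const parts: string[] = [];
  for (const part of relativePath.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (parts.length === 0) return null;
      parts.pop();
    } else {
      parts.push(part);
    }
  }
  return parts.join("/");
}

// During the verify phase, writes are allowed only under docs/plans/.
// Other phases are not modeled in this sketch and are treated as unrestricted.
function canWrite(skill: string, relativePath: string): boolean {
  if (SKILL_TO_PHASE[skill] !== "verify") return true;
  const normalized = normalizeRelative(relativePath);
  return normalized !== null && normalized.startsWith("docs/plans/");
}
```

Normalizing before the prefix check matters: a path such as `docs/plans/../../src/index.ts` starts with the allowed prefix as a string but resolves outside it.
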
+## Features (implementation slices)
+1. **Rename skill directories and files** — move all 7 skill folders to `pwk-` prefix names, update frontmatter names
+2. **Update cross-references in all skills** — find-and-replace all `/skill:` references across all 7 skill files
+3. **Add feature table to brainstorming** — update brainstorming skill to produce a `## Features` table in design doc
+4. **Update writing-plans for feature-at-a-time** — detect next `⬜ pending` feature, plan only that feature, mark `🔄 planned`, update file naming to per-feature
+5. **Update executing-tasks for feature loop** — mark feature `✅ done` after all tasks, suggest next feature or verify/finalize
+6. **Move design review trigger** — remove brainstorm's "after design" review suggestion, confirm writing-plans already has the trigger (it does)
+7. **Update executor end-of-feature prompts** — new prompt format with verify option and next feature
+8. **Update workflow-guard extension** — rename skill references, add `pwk-verify` to phase map
+9. **Update documentation** — workflow-phases.md, oversight-model.md, developer-usage-guide.md
+Simple change — no design review needed.