npm - @tianhai/pi-workflow-kit - Versions diffs - 0.4.1 - Mend

@tianhai/pi-workflow-kit 0.4.1

Files changed (54)

package/LICENSE +22 -0
package/README.md +509 -0
package/ROADMAP.md +16 -0
package/agents/code-reviewer.md +18 -0
package/agents/config.ts +5 -0
package/agents/implementer.md +26 -0
package/agents/spec-reviewer.md +13 -0
package/agents/worker.md +17 -0
package/banner.jpg +0 -0
package/docs/developer-usage-guide.md +463 -0
package/docs/oversight-model.md +49 -0
package/docs/workflow-phases.md +71 -0
package/extensions/constants.ts +9 -0
package/extensions/lib/logging.ts +138 -0
package/extensions/plan-tracker.ts +496 -0
package/extensions/subagent/agents.ts +144 -0
package/extensions/subagent/concurrency.ts +52 -0
package/extensions/subagent/env.ts +47 -0
package/extensions/subagent/index.ts +1116 -0
package/extensions/subagent/lifecycle.ts +25 -0
package/extensions/subagent/timeout.ts +13 -0
package/extensions/workflow-monitor/debug-monitor.ts +98 -0
package/extensions/workflow-monitor/git.ts +31 -0
package/extensions/workflow-monitor/heuristics.ts +58 -0
package/extensions/workflow-monitor/investigation.ts +52 -0
package/extensions/workflow-monitor/reference-tool.ts +42 -0
package/extensions/workflow-monitor/skip-confirmation.ts +19 -0
package/extensions/workflow-monitor/tdd-monitor.ts +137 -0
package/extensions/workflow-monitor/test-runner.ts +37 -0
package/extensions/workflow-monitor/verification-monitor.ts +61 -0
package/extensions/workflow-monitor/warnings.ts +81 -0
package/extensions/workflow-monitor/workflow-handler.ts +358 -0
package/extensions/workflow-monitor/workflow-tracker.ts +231 -0
package/extensions/workflow-monitor/workflow-transitions.ts +55 -0
package/extensions/workflow-monitor.ts +885 -0
package/package.json +49 -0
package/skills/brainstorming/SKILL.md +70 -0
package/skills/dispatching-parallel-agents/SKILL.md +194 -0
package/skills/executing-tasks/SKILL.md +247 -0
package/skills/receiving-code-review/SKILL.md +196 -0
package/skills/systematic-debugging/SKILL.md +170 -0
package/skills/systematic-debugging/condition-based-waiting-example.ts +158 -0
package/skills/systematic-debugging/condition-based-waiting.md +115 -0
package/skills/systematic-debugging/defense-in-depth.md +122 -0
package/skills/systematic-debugging/find-polluter.sh +63 -0
package/skills/systematic-debugging/reference/rationalizations.md +61 -0
package/skills/systematic-debugging/root-cause-tracing.md +169 -0
package/skills/test-driven-development/SKILL.md +266 -0
package/skills/test-driven-development/reference/examples.md +101 -0
package/skills/test-driven-development/reference/rationalizations.md +67 -0
package/skills/test-driven-development/reference/when-stuck.md +33 -0
package/skills/test-driven-development/testing-anti-patterns.md +299 -0
package/skills/using-git-worktrees/SKILL.md +231 -0
package/skills/writing-plans/SKILL.md +149 -0

package/package.json ADDED

@@ -0,0 +1,49 @@
+{
+  "name": "@tianhai/pi-workflow-kit",
+  "version": "0.4.1",
+  "description": "Workflow skills and enforcement extensions for pi",
+  "keywords": [
+    "pi-package"
+  ],
+  "scripts": {
+    "test": "vitest run",
+    "lint": "biome check .",
+    "check": "biome check . && vitest run"
+  },
+  "license": "MIT",
+  "author": "yinloo-ola",
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/yinloo-ola/pi-workflow-kit.git"
+  },
+  "files": [
+    "extensions/",
+    "agents/",
+    "skills/",
+    "docs/",
+    "banner.jpg",
+    "LICENSE",
+    "README.md",
+    "ROADMAP.md"
+  ],
+  "pi": {
+    "extensions": [
+      "extensions/plan-tracker.ts",
+      "extensions/workflow-monitor.ts",
+      "extensions/subagent/index.ts"
+    ],
+    "skills": [
+      "skills"
+    ]
+  },
+  "peerDependencies": {
+    "@mariozechner/pi-ai": "*",
+    "@mariozechner/pi-coding-agent": "*",
+    "@mariozechner/pi-tui": "*",
+    "@sinclair/typebox": "*"
+  },
+  "devDependencies": {
+    "@biomejs/biome": "^2.3.15",
+    "vitest": "^4.0.18"
+  }
+}

package/skills/brainstorming/SKILL.md ADDED

@@ -0,0 +1,70 @@
+---
+name: brainstorming
+description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation."
+---
+> **Related skills:** Consider `/skill:using-git-worktrees` to set up an isolated workspace, then `/skill:writing-plans` for implementation planning.
+# Brainstorming Ideas Into Designs
+## Overview
+Help turn ideas into fully formed designs and specs through natural collaborative dialogue.
+Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design in small sections (200-300 words), checking after each section whether it looks right so far.
+## Boundaries
+- Read code and docs: yes
+- Write to docs/plans/: yes
+- Edit or create any other files: no
+## The Process
+**Before anything else — check git state:**
+- Run `git status` and `git log --oneline -5`
+- If on a feature branch with uncommitted or unmerged work, ask the user:
+  - "You're on `<branch>` with uncommitted changes. Want to finish/merge that first, stash it, or continue here?"
+- Require exactly one of: finish prior work, stash, or explicitly continue here
+- If the topic is new, suggest creating a new branch before brainstorming
+**Understanding the idea:**
+- Review the current project state first (files, docs, recent commits)
+- Check if the codebase or ecosystem already solves this before designing from scratch
+- Ask questions one at a time to refine the idea
+- Prefer multiple choice questions when possible, but open-ended is fine too
+- Only one question per message - if a topic needs more exploration, break it into multiple questions
+- Focus on understanding: purpose, constraints, success criteria
+**Exploring approaches:**
+- Propose 2-3 different approaches with trade-offs
+- Present options conversationally with your recommendation and reasoning
+- Lead with your recommended option and explain why
+**Presenting the design:**
+- Once you believe you understand what you're building, present the design
+- Break it into sections of 200-300 words
+- Ask after each section whether it looks right so far
+- Cover: architecture, components, data flow, error handling, testing
+- Be ready to go back and clarify if something doesn't make sense
+## After the Design
+**Documentation:**
+- Write the validated design to `docs/plans/YYYY-MM-DD-<topic>-design.md`
+- Commit the design document to git
+- The workflow monitor automatically tracks phase transitions when you invoke skills
+**Implementation (if continuing):**
+- Ask: "Ready to set up for implementation?"
+- Set up isolated workspace — `/skill:using-git-worktrees` for larger work, or just create a branch for small changes
+- Use `/skill:writing-plans` to create detailed implementation plan
+## Key Principles
+- **One question at a time** - Don't overwhelm with multiple questions
+- **Multiple choice preferred** - Easier to answer than open-ended when possible
+- **YAGNI ruthlessly** - Remove unnecessary features from all designs
+- **Design for testability** - Favor approaches with clear boundaries that are easy to verify with TDD
+- **Explore alternatives** - Always propose 2-3 approaches before settling
+- **Incremental validation** - Present design in sections, validate each
+- **Be flexible** - Go back and clarify when something doesn't make sense
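
Reviewer sketch (not part of the package): the skill above asks for the validated design to be written to `docs/plans/YYYY-MM-DD-<topic>-design.md`. One way to build that path consistently is shown below. The helper name `designDocPath`, the slug rules, and the choice of UTC for the date are all assumptions made for illustration; the package does not define them.

```ts
// Hypothetical helper, not taken from @tianhai/pi-workflow-kit.
// Builds a path of the form docs/plans/YYYY-MM-DD-<topic>-design.md.
function designDocPath(topic: string, date: Date = new Date()): string {
  // Assumed slug rule: lowercase, runs of other characters become one hyphen.
  const slug = topic
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  if (slug === "") {
    throw new Error("topic must contain at least one letter or digit");
  }
  // toISOString() always reports UTC; its first 10 characters are YYYY-MM-DD.
  const day = date.toISOString().slice(0, 10);
  return "docs/plans/" + day + "-" + slug + "-design.md";
}

// Example: a fixed date keeps the result reproducible.
const examplePath = designDocPath("Retry Budget!", new Date("2025-01-15T12:00:00Z"));
console.log(examplePath); // docs/plans/2025-01-15-retry-budget-design.md
```

Using UTC avoids the file name changing with the machine's time zone, at the cost of occasionally differing from the local calendar date.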

package/skills/dispatching-parallel-agents/SKILL.md ADDED

@@ -0,0 +1,194 @@
+---
+name: dispatching-parallel-agents
+description: Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies
+---
+> **Related skills:** Debug each problem with `/skill:systematic-debugging`. Verify all fixes with `/skill:executing-tasks`.
+# Dispatching Parallel Agents
+## Overview
+When you have multiple unrelated failures (different test files, different subsystems, different bugs), investigating them sequentially wastes time. Each investigation is independent and can happen in parallel.
+**Core principle:** Dispatch one agent per independent problem domain. Let them work concurrently.
+## When to Use
+```dot
+digraph when_to_use {
+    "Multiple failures?" [shape=diamond];
+    "Are they independent?" [shape=diamond];
+    "Single agent investigates all" [shape=box];
+    "One agent per problem domain" [shape=box];
+    "Can they work in parallel?" [shape=diamond];
+    "Sequential agents" [shape=box];
+    "Parallel dispatch" [shape=box];
+    "Multiple failures?" -> "Are they independent?" [label="yes"];
+    "Are they independent?" -> "Single agent investigates all" [label="no - related"];
+    "Are they independent?" -> "Can they work in parallel?" [label="yes"];
+    "Can they work in parallel?" -> "Parallel dispatch" [label="yes"];
+    "Can they work in parallel?" -> "Sequential agents" [label="no - shared state"];
+}
+```
+**Use when:**
+- 3+ test files failing with different root causes
+- Multiple subsystems broken independently
+- Each problem can be understood without context from others
+- No shared state between investigations
+**Don't use when:**
+- Failures are related (fixing one might fix others)
+- Need to understand full system state
+- Agents would interfere with each other
+## The Pattern
+### 1. Identify Independent Domains
+Group failures by what's broken:
+- File A tests: Tool approval flow
+- File B tests: Batch completion behavior
+- File C tests: Abort functionality
+Each domain is independent - fixing tool approval doesn't affect abort tests.
+### 2. Create Focused Agent Tasks
+Each agent gets:
+- **Specific scope:** One test file or subsystem
+- **Clear goal:** Make these tests pass
+- **Constraints:** Don't change other code
+- **Expected output:** Summary of what you found and fixed
+### 3. Dispatch in Parallel
+**How to dispatch:**
+Use the `subagent` tool in parallel mode:
+> **Agent scope:** The built-in agents (`worker`, `implementer`, `code-reviewer`, `spec-reviewer`)
+> are bundled with this package. To use them, set `agentScope: "both"`. The default scope `"user"`
+> only loads agents from `~/.pi/agent/agents/`.
+```ts
+subagent({
+  agentScope: "both",   // include bundled agents (worker, implementer, etc.)
+  tasks: [
+    { agent: "worker", task: "Fix agent-tool-abort.test.ts failures" },
+    { agent: "worker", task: "Fix batch-completion-behavior.test.ts failures" },
+    { agent: "worker", task: "Fix tool-approval-race-conditions.test.ts failures" },
+  ],
+})
+```
+### 4. Review and Integrate
+When agents return:
+- Read each summary
+- Verify fixes don't conflict
+- Run full test suite
+- Integrate all changes
+**If agents edited the same files:** Review manually. Pick the correct version per hunk, or re-run one agent with the other's changes as context. Don't blindly merge.
+**If some agents failed:** Integrate successful agents first (commit their work). Then retry the failed agent with fresh context that includes the integrated changes.
+## Agent Prompt Structure
+Good agent prompts are:
+1. **Focused** - One clear problem domain
+2. **Self-contained** - All context needed to understand the problem
+3. **Specific about output** - What should the agent return?
+```markdown
+Fix the 3 failing tests in src/agents/agent-tool-abort.test.ts:
+1. "should abort tool with partial output capture" - expects 'interrupted at' in message
+2. "should handle mixed completed and aborted tools" - fast tool aborted instead of completed
+3. "should properly track pendingToolCount" - expects 3 results but gets 0
+These are timing/race condition issues. Your task:
+1. Read the test file and understand what each test verifies
+2. Identify root cause - timing issues or actual bugs?
+3. Fix by:
+   - Replacing arbitrary timeouts with event-based waiting
+   - Fixing bugs in abort implementation if found
+   - Adjusting test expectations if testing changed behavior
+Do NOT just increase timeouts - find the real issue.
+Return: Summary of what you found and what you fixed.
+```
+## Common Mistakes
+**❌ Too broad:** "Fix all the tests" - agent gets lost
+**✅ Specific:** "Fix agent-tool-abort.test.ts" - focused scope
+**❌ No context:** "Fix the race condition" - agent doesn't know where
+**✅ Context:** Paste the error messages and test names
+**❌ No constraints:** Agent might refactor everything
+**✅ Constraints:** "Do NOT change production code" or "Fix tests only"
+**❌ Vague output:** "Fix it" - you don't know what changed
+**✅ Specific:** "Return summary of root cause and changes"
+## When NOT to Use
+**Related failures:** Fixing one might fix others - investigate together first
+**Need full context:** Understanding requires seeing entire system
+**Exploratory debugging:** You don't know what's broken yet
+**Shared state:** Agents would interfere (editing same files, using same resources)
+## Real Example from Session
+**Scenario:** 6 test failures across 3 files after major refactoring
+**Failures:**
+- agent-tool-abort.test.ts: 3 failures (timing issues)
+- batch-completion-behavior.test.ts: 2 failures (tools not executing)
+- tool-approval-race-conditions.test.ts: 1 failure (execution count = 0)
+**Decision:** Independent domains - abort logic separate from batch completion separate from race conditions
+**Dispatch:**
+```ts
+subagent({
+  agentScope: "both",   // include bundled agents (worker, implementer, etc.)
+  tasks: [
+    { agent: "worker", task: "Fix agent-tool-abort.test.ts" },
+    { agent: "worker", task: "Fix batch-completion-behavior.test.ts" },
+    { agent: "worker", task: "Fix tool-approval-race-conditions.test.ts" },
+  ],
+})
+```
+**Results:**
+- Agent 1: Replaced timeouts with event-based waiting
+- Agent 2: Fixed event structure bug (threadId in wrong place)
+- Agent 3: Added wait for async tool execution to complete
+**Integration:** All fixes independent, no conflicts, full suite green
+**Time saved:** 3 problems solved in parallel vs sequentially
+## Verification
+After agents return:
+1. **Review each summary** - Understand what changed
+2. **Check for conflicts** - Did agents edit same code?
+3. **Run full suite** - Verify all fixes work together
+4. **Spot check** - Agents can make systematic errors
+> **Integration-mode note:** When integrating parallel agent results, run `git stash` if
+> needed before the integration test run, so that leftover uncommitted changes are not
+> mistaken for true failures. If tests fail during integration, rule out merge conflicts
+> first before treating the failure as a new bug. Only invoke
+> `workflow_reference debug-rationalizations` once you have confirmed that the failure
+> does not come from a merge conflict.
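
Reviewer sketch (not part of the package): the decision diagram and the "Check for conflicts" step in the skill above can be expressed as two small functions. The names `chooseDispatch` and `findOverlappingFiles`, the `Situation` shape, and the `"no-dispatch"` result are assumptions for illustration. The diagram has no branch for the case without multiple failures, so that outcome is a guess, and nothing here calls the real `subagent` tool.

```ts
// Hypothetical types and helpers, not taken from @tianhai/pi-workflow-kit.
type DispatchMode = "no-dispatch" | "single-agent" | "sequential" | "parallel";

interface Situation {
  multipleFailures: boolean;
  independent: boolean;      // failures have different root causes
  canRunInParallel: boolean; // false when investigations share state
}

// Mirrors the labelled edges of the "when_to_use" diagram.
function chooseDispatch(s: Situation): DispatchMode {
  if (!s.multipleFailures) return "no-dispatch"; // assumed: diagram omits this branch
  if (!s.independent) return "single-agent";     // "no - related"
  return s.canRunInParallel ? "parallel" : "sequential";
}

// Maps each file edited by more than one agent to the agents that edited it.
function findOverlappingFiles(
  changes: { [agent: string]: string[] }
): { [file: string]: string[] } {
  const editors: { [file: string]: string[] } = Object.create(null);
  for (const agent of Object.keys(changes)) {
    for (const file of changes[agent]) {
      if (!editors[file]) editors[file] = [];
      if (editors[file].indexOf(agent) === -1) editors[file].push(agent);
    }
  }
  const overlaps: { [file: string]: string[] } = Object.create(null);
  for (const file of Object.keys(editors)) {
    if (editors[file].length > 1) overlaps[file] = editors[file];
  }
  return overlaps;
}

// Example: two agents touched src/abort.ts, so it needs manual review.
const overlaps = findOverlappingFiles({
  "agent-1": ["src/abort.ts", "src/abort.test.ts"],
  "agent-2": ["src/batch.ts", "src/abort.ts"],
  "agent-3": ["src/approval.ts"],
});
console.log(Object.keys(overlaps)); // [ 'src/abort.ts' ]
```

An empty result from `findOverlappingFiles` only rules out file-level overlap; the full test suite still has to run, since independent files can interact at runtime.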

package/skills/executing-tasks/SKILL.md ADDED

@@ -0,0 +1,247 @@
+---
+name: executing-tasks
+description: Use when you have an approved implementation plan to execute task-by-task with human gates and bounded retries
+---
+# Executing Tasks
+## Overview
+Execute an implementation plan task-by-task using a per-task lifecycle with human gates and bounded retry loops. Each task goes through: **define → approve → execute → verify → review → fix**.
+**Announce at start:** "I'm using the executing-tasks skill to implement the plan."
+## Prerequisites
+Before starting, verify:
+- [ ] On the correct branch/worktree
+- [ ] Plan file exists at `docs/plans/YYYY-MM-DD-<name>.md`
+- [ ] Plan has been reviewed and approved
+## Initialization
+1. Read the plan file and extract all tasks, including each task's `Type:` field
+2. Initialize plan_tracker with structured task metadata:
+   ```
+   plan_tracker({
+     action: "init",
+     tasks: [
+       { name: "Task 1 name", type: "code" },
+       { name: "Task 2 name", type: "non-code" },
+     ],
+   })
+   ```
+3. Mark the execute phase as active
+## Per-Task Lifecycle
+For each task in the plan:
+### 1. Define
+**Code task →** Write actual test file(s) with assertions:
+- Create test files that exercise the new/modified behavior
+- Tests must be specific, deterministic, and fail before implementation
+- Include edge cases and error conditions
+- Apply TDD-specific guidance only to code tasks
+**Non-code task →** Reuse and refine the plan's acceptance criteria:
+- List specific, measurable conditions
+- Each criterion must be independently verifiable
+- Treat these criteria as the basis for approval and verification
+Update plan_tracker:
+```
+plan_tracker({ action: "update", index: N, phase: "define" })
+```
+### 2. Approve (Human Gate)
+Present the test cases or acceptance criteria to the human:
+**For code tasks:**
+- Show the test files to be written
+- Explain what each test verifies
+- Ask: "Do these test cases cover the requirements? Approve, revise, or reject?"
+**For non-code tasks:**
+- Show the acceptance criteria list from the plan
+- Ask: "Do these criteria capture the intent? Approve, revise, or reject?"
+**No execution begins until approved.**
+If revised → return to Define step.
+If rejected → skip task and mark as blocked.
+```
+plan_tracker({ action: "update", index: N, phase: "approve" })
+```
+### 3. Execute (max 3 attempts)
+Implement the task following the plan's steps.
+For each attempt:
+1. Write/modify code as specified in the plan
+2. Run tests or verify against acceptance criteria
+3. If all pass → move to Verify
+4. If failures:
+   - Analyze the failures
+   - Fix the implementation
+   - Increment executeAttempts
+   - If executeAttempts reaches 3 → **escalate to human**
+```
+plan_tracker({ action: "update", index: N, phase: "execute" })
+plan_tracker({ action: "update", index: N, attempts: 1 })  // after each attempt (routes to executeAttempts based on phase)
+```
+**Escalation on budget exhaustion:**
+> "I've attempted this task 3 times without success. Options:
+> 1. Revise the scope or approach
+> 2. Adjust the test cases / acceptance criteria
+> 3. Abandon this task and move on
+>
+> What would you like to do?"
+### 4. Verify
+Re-run all tests or check all acceptance criteria.
+Report results to the human:
+- ✅ Condition 1: passed
+- ✅ Condition 2: passed
+- ❌ Condition 3: failed — [description of failure]
+**Does not auto-fix.** Flags failures to human for decision.
+```
+plan_tracker({ action: "update", index: N, phase: "verify" })
+```
+If failures detected:
+> "Verification found issues. Options:
+> 1. Go back to Execute for another attempt
+> 2. Revise the tests/criteria
+> 3. Accept as-is (mark partial)
+>
+> What would you like to do?"
+### 5. Review (two layers)
+**Layer 1 — Subagent review:**
+- Dispatch a subagent to review the implementation against the task spec
+- Subagent checks: correctness, edge cases, code quality, test coverage
+- Subagent reports findings
+  Use `agentScope: "both"` to access the bundled `code-reviewer` agent:
+  ```
+  subagent({ agent: "code-reviewer", task: "Review implementation of task N against spec", agentScope: "both" })
+  ```
+**Layer 2 — Human sign-off:**
+- Present the subagent review + test results to the human
+- Summarize what was done, what passed, any concerns
+- Ask: "Does this look good? Approve or request changes?"
+```
+plan_tracker({ action: "update", index: N, phase: "review" })
+```
+If issues found → move to Fix.
+### 6. Fix (max 3 loops, re-enters Verify → Review)
+1. Address the review feedback
+2. Re-enter Verify → Review cycle
+3. Increment fixAttempts after each fix round
+4. If fixAttempts reaches 3 → **escalate to human**
+```
+plan_tracker({ action: "update", index: N, phase: "fix" })
+plan_tracker({ action: "update", index: N, attempts: 1 })  // routes to fixAttempts based on phase
+```
+**Escalation on budget exhaustion:**
+> "I've attempted fixes 3 times. Options:
+> 1. Proceed as-is despite remaining issues
+> 2. Keep fixing (at your own risk)
+> 3. Abandon this task and move on
+>
+> What would you like to do?"
+### Task Complete
+When both reviewers (subagent and human) are satisfied and all conditions pass:
+```
+plan_tracker({ action: "update", index: N, status: "complete" })
+```
+Commit the task:
+```bash
+git add <relevant files>
+git commit -m "feat(task N): <description>"
+```
+## Escalation Rules
+| Event | Action |
+|-------|--------|
+| Execute 3 attempts exhausted | Escalate to human — never auto-skip |
+| Fix loop 3 attempts exhausted | Escalate to human — never auto-skip |
+| Verify fails | Flag to human — human decides next step |
+**No silent skipping. Consistent escalation everywhere.**
+## Finalize Phase
+After all tasks complete (or are explicitly accepted by human):
+### 1. Final Review
+- Dispatch subagent to review the entire implementation holistically
+- Check for integration issues, consistency across tasks, documentation gaps
+### 2. Create PR
+```bash
+git push origin <branch>
+gh pr create --title "feat: <feature summary>" --body "<task summary>"
+```
+### 3. Archive Planning Docs
+```bash
+mkdir -p docs/plans/completed
+mv docs/plans/<plan-file> docs/plans/completed/
+```
+### 4. Update Repo Docs
+- Update CHANGELOG with feature summary
+- Update README if API/surface changed
+- Update inline documentation as needed
+### 5. Update Project Documentation
+- Update README if project overview has changed
+- Update CONTRIBUTING or architecture docs if structure changed
+- Note any new patterns or conventions introduced
+### 6. Clean Up
+- Remove worktree if one was used
+- Mark finalize phase complete
+## Boundaries
+- Read code, docs, and tests: yes
+- Write tests and implementation code: yes (within current task scope)
+- Write to docs/plans/completed/: yes (during finalize)
+- Edit files outside task scope: no (unless human explicitly approves)
+## Remember
+- Always present test cases/criteria for human approval before executing
+- Extract each task's `Type:` from the plan and preserve it in `plan_tracker`
+- Track per-task phase and attempts in plan_tracker
+- Code tasks use TDD; non-code tasks use acceptance criteria during define, approve, and verify
+- Escalate immediately on budget exhaustion — never silently skip or continue
+- Verify does not auto-fix — always flag to human
+- Review has two layers (subagent first, then human)
+- Fix loops re-enter verify → review (max 3 fix loops)
+- Execute has separate budget (max 3 attempts)
+- Total max cycles per task: 3 execute + 3 fix = 6
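
Reviewer sketch (not part of the package): the skill above describes two separate retry budgets, with attempts routed to `executeAttempts` or `fixAttempts` depending on the current phase and an escalation to the human once either reaches 3. The code below models that bookkeeping only. The `TaskState` shape, the `recordAttempt` function, and the idea that each recorded attempt increments the counter by one are assumptions; the actual `plan_tracker` extension is not shown in this excerpt and may behave differently.

```ts
// Hypothetical model of the per-task budgets, not the real plan_tracker.
type Phase = "define" | "approve" | "execute" | "verify" | "review" | "fix";

interface TaskState {
  phase: Phase;
  executeAttempts: number;
  fixAttempts: number;
}

const MAX_ATTEMPTS = 3; // "max 3 attempts" for execute, "max 3 loops" for fix

// Records one attempt against the budget that matches the current phase
// and reports whether that budget is now exhausted.
function recordAttempt(task: TaskState): { task: TaskState; escalate: boolean } {
  if (task.phase === "execute") {
    const executeAttempts = task.executeAttempts + 1;
    return {
      task: { ...task, executeAttempts },
      escalate: executeAttempts >= MAX_ATTEMPTS,
    };
  }
  if (task.phase === "fix") {
    const fixAttempts = task.fixAttempts + 1;
    return {
      task: { ...task, fixAttempts },
      escalate: fixAttempts >= MAX_ATTEMPTS,
    };
  }
  // Assumed: attempts are only meaningful in the execute and fix phases.
  throw new Error("attempts are not tracked in phase: " + task.phase);
}

// Example: the third execute attempt exhausts the execute budget.
let state: TaskState = { phase: "execute", executeAttempts: 0, fixAttempts: 0 };
const flags: boolean[] = [];
for (let i = 0; i < 3; i++) {
  const result = recordAttempt(state);
  state = result.task;
  flags.push(result.escalate);
}
console.log(flags); // [ false, false, true ]
```

Keeping the two counters separate is what yields the stated ceiling of 3 execute + 3 fix = 6 cycles per task; a single shared counter would cap the total at 3.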