npm - workerssuper - Versions diffs - 5.0.4 - Mend

workerssuper 5.0.4


Files changed (135)

package/.claude-plugin/marketplace.json +20 -0
package/.claude-plugin/plugin.json +13 -0
package/.codex/INSTALL.md +67 -0
package/.cursor-plugin/plugin.json +18 -0
package/.gitattributes +18 -0
package/.github/FUNDING.yml +3 -0
package/.github/ISSUE_TEMPLATE/bug_report.md +52 -0
package/.github/ISSUE_TEMPLATE/config.yml +5 -0
package/.github/ISSUE_TEMPLATE/feature_request.md +34 -0
package/.github/ISSUE_TEMPLATE/platform_support.md +23 -0
package/.github/PULL_REQUEST_TEMPLATE.md +87 -0
package/.opencode/INSTALL.md +83 -0
package/.opencode/plugins/superpowers.js +107 -0
package/CHANGELOG.md +13 -0
package/CODE_OF_CONDUCT.md +128 -0
package/GEMINI.md +2 -0
package/LICENSE +21 -0
package/README.md +187 -0
package/RELEASE-NOTES.md +1057 -0
package/agents/code-reviewer.md +48 -0
package/commands/brainstorm.md +5 -0
package/commands/execute-plan.md +5 -0
package/commands/write-plan.md +5 -0
package/docs/README.codex.md +126 -0
package/docs/README.opencode.md +130 -0
package/docs/plans/2025-11-22-opencode-support-design.md +294 -0
package/docs/plans/2025-11-22-opencode-support-implementation.md +1095 -0
package/docs/plans/2025-11-28-skills-improvements-from-user-feedback.md +711 -0
package/docs/plans/2026-01-17-visual-brainstorming.md +571 -0
package/docs/superpowers/plans/2026-01-22-document-review-system.md +301 -0
package/docs/superpowers/plans/2026-02-19-visual-brainstorming-refactor.md +523 -0
package/docs/superpowers/plans/2026-03-11-zero-dep-brainstorm-server.md +479 -0
package/docs/superpowers/specs/2026-01-22-document-review-system-design.md +136 -0
package/docs/superpowers/specs/2026-02-19-visual-brainstorming-refactor-design.md +162 -0
package/docs/superpowers/specs/2026-03-11-zero-dep-brainstorm-server-design.md +118 -0
package/docs/testing.md +303 -0
package/docs/windows/polyglot-hooks.md +212 -0
package/gemini-extension.json +6 -0
package/hooks/hooks-cursor.json +10 -0
package/hooks/hooks.json +16 -0
package/hooks/run-hook.cmd +46 -0
package/hooks/session-start +57 -0
package/package.json +5 -0
package/skills/brainstorming/SKILL.md +164 -0
package/skills/brainstorming/scripts/frame-template.html +214 -0
package/skills/brainstorming/scripts/helper.js +88 -0
package/skills/brainstorming/scripts/server.cjs +338 -0
package/skills/brainstorming/scripts/start-server.sh +153 -0
package/skills/brainstorming/scripts/stop-server.sh +55 -0
package/skills/brainstorming/spec-document-reviewer-prompt.md +49 -0
package/skills/brainstorming/visual-companion.md +286 -0
package/skills/dispatching-parallel-agents/SKILL.md +182 -0
package/skills/executing-plans/SKILL.md +70 -0
package/skills/finishing-a-development-branch/SKILL.md +200 -0
package/skills/receiving-code-review/SKILL.md +213 -0
package/skills/requesting-code-review/SKILL.md +105 -0
package/skills/requesting-code-review/code-reviewer.md +146 -0
package/skills/subagent-driven-development/SKILL.md +277 -0
package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
package/skills/subagent-driven-development/implementer-prompt.md +113 -0
package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
package/skills/systematic-debugging/CREATION-LOG.md +119 -0
package/skills/systematic-debugging/SKILL.md +296 -0
package/skills/systematic-debugging/condition-based-waiting-example.ts +158 -0
package/skills/systematic-debugging/condition-based-waiting.md +115 -0
package/skills/systematic-debugging/defense-in-depth.md +122 -0
package/skills/systematic-debugging/find-polluter.sh +63 -0
package/skills/systematic-debugging/root-cause-tracing.md +169 -0
package/skills/systematic-debugging/test-academic.md +14 -0
package/skills/systematic-debugging/test-pressure-1.md +58 -0
package/skills/systematic-debugging/test-pressure-2.md +68 -0
package/skills/systematic-debugging/test-pressure-3.md +69 -0
package/skills/test-driven-development/SKILL.md +371 -0
package/skills/test-driven-development/testing-anti-patterns.md +299 -0
package/skills/using-git-worktrees/SKILL.md +218 -0
package/skills/using-superpowers/SKILL.md +115 -0
package/skills/using-superpowers/references/codex-tools.md +25 -0
package/skills/using-superpowers/references/gemini-tools.md +33 -0
package/skills/verification-before-completion/SKILL.md +139 -0
package/skills/writing-plans/SKILL.md +145 -0
package/skills/writing-plans/plan-document-reviewer-prompt.md +49 -0
package/skills/writing-skills/SKILL.md +655 -0
package/skills/writing-skills/anthropic-best-practices.md +1150 -0
package/skills/writing-skills/examples/CLAUDE_MD_TESTING.md +189 -0
package/skills/writing-skills/graphviz-conventions.dot +172 -0
package/skills/writing-skills/persuasion-principles.md +187 -0
package/skills/writing-skills/render-graphs.js +168 -0
package/skills/writing-skills/testing-skills-with-subagents.md +384 -0
package/tests/brainstorm-server/package-lock.json +36 -0
package/tests/brainstorm-server/package.json +10 -0
package/tests/brainstorm-server/server.test.js +424 -0
package/tests/brainstorm-server/windows-lifecycle.test.sh +351 -0
package/tests/brainstorm-server/ws-protocol.test.js +392 -0
package/tests/claude-code/README.md +158 -0
package/tests/claude-code/analyze-token-usage.py +168 -0
package/tests/claude-code/run-skill-tests.sh +187 -0
package/tests/claude-code/test-document-review-system.sh +177 -0
package/tests/claude-code/test-helpers.sh +202 -0
package/tests/claude-code/test-subagent-driven-development-integration.sh +314 -0
package/tests/claude-code/test-subagent-driven-development.sh +165 -0
package/tests/explicit-skill-requests/prompts/action-oriented.txt +3 -0
package/tests/explicit-skill-requests/prompts/after-planning-flow.txt +17 -0
package/tests/explicit-skill-requests/prompts/claude-suggested-it.txt +11 -0
package/tests/explicit-skill-requests/prompts/i-know-what-sdd-means.txt +8 -0
package/tests/explicit-skill-requests/prompts/mid-conversation-execute-plan.txt +3 -0
package/tests/explicit-skill-requests/prompts/please-use-brainstorming.txt +1 -0
package/tests/explicit-skill-requests/prompts/skip-formalities.txt +3 -0
package/tests/explicit-skill-requests/prompts/subagent-driven-development-please.txt +1 -0
package/tests/explicit-skill-requests/prompts/use-systematic-debugging.txt +1 -0
package/tests/explicit-skill-requests/run-all.sh +70 -0
package/tests/explicit-skill-requests/run-claude-describes-sdd.sh +100 -0
package/tests/explicit-skill-requests/run-extended-multiturn-test.sh +113 -0
package/tests/explicit-skill-requests/run-haiku-test.sh +144 -0
package/tests/explicit-skill-requests/run-multiturn-test.sh +143 -0
package/tests/explicit-skill-requests/run-test.sh +136 -0
package/tests/opencode/run-tests.sh +163 -0
package/tests/opencode/setup.sh +73 -0
package/tests/opencode/test-plugin-loading.sh +72 -0
package/tests/opencode/test-priority.sh +198 -0
package/tests/opencode/test-tools.sh +104 -0
package/tests/skill-triggering/prompts/dispatching-parallel-agents.txt +8 -0
package/tests/skill-triggering/prompts/executing-plans.txt +1 -0
package/tests/skill-triggering/prompts/requesting-code-review.txt +3 -0
package/tests/skill-triggering/prompts/systematic-debugging.txt +11 -0
package/tests/skill-triggering/prompts/test-driven-development.txt +7 -0
package/tests/skill-triggering/prompts/writing-plans.txt +10 -0
package/tests/skill-triggering/run-all.sh +60 -0
package/tests/skill-triggering/run-test.sh +88 -0
package/tests/subagent-driven-dev/go-fractals/design.md +81 -0
package/tests/subagent-driven-dev/go-fractals/plan.md +172 -0
package/tests/subagent-driven-dev/go-fractals/scaffold.sh +45 -0
package/tests/subagent-driven-dev/run-test.sh +106 -0
package/tests/subagent-driven-dev/svelte-todo/design.md +70 -0
package/tests/subagent-driven-dev/svelte-todo/plan.md +222 -0
package/tests/subagent-driven-dev/svelte-todo/scaffold.sh +46 -0

package/docs/superpowers/specs/2026-02-19-visual-brainstorming-refactor-design.md ADDED

@@ -0,0 +1,162 @@
+# Visual Brainstorming Refactor: Browser Displays, Terminal Commands
+**Date:** 2026-02-19
+**Status:** Approved
+**Scope:** `lib/brainstorm-server/`, `skills/brainstorming/visual-companion.md`, `tests/brainstorm-server/`
+## Problem
+During visual brainstorming, Claude runs `wait-for-feedback.sh` as a background task and blocks on `TaskOutput(block=true, timeout=600s)`. This seizes the TUI entirely — the user cannot type to Claude while visual brainstorming is running. The browser becomes the only input channel.
+Claude Code's execution model is turn-based. There is no way for Claude to listen on two channels simultaneously within a single turn. The blocking `TaskOutput` pattern was the wrong primitive — it simulates event-driven behavior the platform doesn't support.
+## Design
+### Core Model
+**Browser = interactive display.** Shows mockups, lets the user click to select options. Selections are recorded server-side.
+**Terminal = conversation channel.** Always unblocked, always available. The user talks to Claude here.
+### The Loop
+1. Claude writes an HTML file to the session directory
+2. Server detects it via chokidar, pushes WebSocket reload to the browser (unchanged)
+3. Claude ends its turn — tells the user to check the browser and respond in the terminal
+4. User looks at browser, optionally clicks to select an option, then types feedback in the terminal
+5. On the next turn, Claude reads `$SCREEN_DIR/.events` for the browser interaction stream (clicks, selections), merges with the terminal text
+6. Iterate or advance
+No background tasks. No `TaskOutput` blocking. No polling scripts.
+### Key Deletion: `wait-for-feedback.sh`
+Deleted entirely. Its purpose was to bridge "server logs events to stdout" and "Claude needs to receive those events." The `.events` file replaces this — the server writes user interaction events directly, and Claude reads them with whatever file-reading mechanism the platform provides.
+### Key Addition: `.events` File (Per-Screen Event Stream)
+The server writes all user interaction events to `$SCREEN_DIR/.events`, one JSON object per line. This gives Claude the full interaction stream for the current screen — not just the final selection, but the user's exploration path (clicked A, then B, settled on C).
+Example contents after a user explores options:
+```jsonl
+{"type":"click","choice":"a","text":"Option A - Preset-First Wizard","timestamp":1706000101}
+{"type":"click","choice":"c","text":"Option C - Manual Config","timestamp":1706000108}
+{"type":"click","choice":"b","text":"Option B - Hybrid Approach","timestamp":1706000115}
+```
+- Append-only within a screen. Each user event is appended as a new line.
+- The file is cleared (deleted) when chokidar detects a new HTML file (new screen pushed), preventing stale events from carrying over.
+- If the file doesn't exist when Claude reads it, no browser interaction occurred — Claude uses only the terminal text.
+- The file contains only user events (`click`, etc.) — not server lifecycle events (`server-started`, `screen-added`). This keeps it small and focused.
+- Claude can read the full stream to understand the user's exploration pattern, or just look at the `choice` field of the last event for the final selection.
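A minimal sketch of how a script might consume this stream, assuming the one-object-per-line format shown in the example. The helper names `parseEvents`, `finalChoice`, and `readEvents` are hypothetical and are not part of the package.

```javascript
// Hypothetical helpers for reading the per-screen event stream.
// Assumes one JSON object per line, as in the example above.
const fs = require('node:fs');

// Turns the raw file text into an array of event objects.
function parseEvents(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Returns the `choice` of the last event that carries one, or null.
function finalChoice(events) {
  for (let i = events.length - 1; i >= 0; i--) {
    if (events[i].choice !== undefined) return events[i].choice;
  }
  return null;
}

// A missing file means no browser interaction occurred, so return no events.
function readEvents(eventsPath) {
  if (!fs.existsSync(eventsPath)) return [];
  return parseEvents(fs.readFileSync(eventsPath, 'utf8'));
}
```

With the three example lines above, `finalChoice` would return `"b"`, while the full array preserves the a, c, b exploration order.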
+## Changes by File
+### `index.js` (server)
+**A. Write user events to `.events` file.**
+In the WebSocket `message` handler, after logging the event to stdout: append the event as a JSON line to `$SCREEN_DIR/.events` via `fs.appendFileSync`. Only write user interaction events (those with `source: 'user-event'`), not server lifecycle events.
+**B. Clear `.events` on new screen.**
+In the chokidar `add` handler (new `.html` file detected), delete `$SCREEN_DIR/.events` if it exists. This is the definitive "new screen" signal — better than clearing on GET `/` which fires on every reload.
+**C. Replace `wrapInFrame` content injection.**
+The current regex anchors on `<div class="feedback-footer">`, which is being removed. Replace with a comment placeholder: remove the existing default content inside `#claude-content` (the `<h2>Visual Brainstorming</h2>` and subtitle paragraph) and replace with a single `<!-- CONTENT -->` marker. Content injection becomes `frameTemplate.replace('<!-- CONTENT -->', content)`. Simpler and won't break if template formatting changes.
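The placeholder-based injection could look roughly like the sketch below. It is an illustration rather than the package's code: the function signature is assumed, and it swaps the plain string replacement for a replacer function so that sequences such as `$&` inside the content are inserted literally instead of being expanded by `String.prototype.replace`.

```javascript
// Hypothetical sketch of placeholder-based content injection.
const PLACEHOLDER = '<!-- CONTENT -->';

function wrapInFrame(frameTemplate, content) {
  // A replacer function keeps `$&`, `$1`, etc. in `content` literal;
  // a plain string replacement would treat them as special patterns.
  return frameTemplate.replace(PLACEHOLDER, () => content);
}
```

Only the first occurrence of the marker is replaced, which matches a template that contains a single `<!-- CONTENT -->` inside `#claude-content`.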
+### `frame-template.html` (UI frame)
+**Remove:**
+- The `feedback-footer` div (textarea, Send button, label, `.feedback-row`)
+- Associated CSS (`.feedback-footer`, `.feedback-footer label`, `.feedback-row`, textarea and button styles within it)
+**Add:**
+- `<!-- CONTENT -->` placeholder inside `#claude-content`, replacing the default text
+- A selection indicator bar where the footer was, with two states:
+  - Default: "Click an option above, then return to the terminal"
+  - After selection: "Option B selected — return to terminal to continue"
+- CSS for the indicator bar (subtle, similar visual weight to the existing header)
+**Keep unchanged:**
+- Header bar with "Brainstorm Companion" title and connection status
+- `.main` wrapper and `#claude-content` container
+- All component CSS (`.options`, `.cards`, `.mockup`, `.split`, `.pros-cons`, placeholders, mock elements)
+- Dark/light theme variables and media query
+### `helper.js` (client-side script)
+**Remove:**
+- `sendToClaude()` function and the "Sent to Claude" page takeover
+- `window.send()` function (was tied to the removed Send button)
+- Form submission handler — no purpose without the feedback textarea, adds log noise
+- Input change handler — same reason
+- `pageshow` event listener (was added to fix textarea persistence — no textarea anymore)
+**Keep:**
+- WebSocket connection, reconnect logic, event queue
+- Reload handler (`window.location.reload()` on server push)
+- `window.toggleSelect()` for selection highlighting
+- `window.selectedChoice` tracking
+- `window.brainstorm.send()` and `window.brainstorm.choice()` — these are distinct from the removed `window.send()`. They call `sendEvent` which logs to the server via WebSocket. Useful for custom full-document pages.
+**Narrow:**
+- Click handler: capture only `[data-choice]` clicks, not all buttons/links. The broad capture was needed when the browser was a feedback channel; now it's just for selection tracking.
+**Add:**
+- On `data-choice` click, update the selection indicator bar text to show which option was selected.
+**Remove from `window.brainstorm` API:**
+- `brainstorm.sendToClaude` — no longer exists
+### `visual-companion.md` (skill instructions)
+**Rewrite "The Loop" section** to the non-blocking flow described above. Remove all references to:
+- `wait-for-feedback.sh`
+- `TaskOutput` blocking
+- Timeout/retry logic (600s timeout, 30-minute cap)
+- "User Feedback Format" section describing `send-to-claude` JSON
+**Replace with:**
+- The new loop (write HTML → end turn → user responds in terminal → read `.events` → iterate)
+- `.events` file format documentation
+- Guidance that the terminal message is the primary feedback; `.events` provides the full browser interaction stream for additional context
+**Keep:**
+- Server startup/shutdown instructions
+- Content fragment vs full document guidance
+- CSS class reference and available components
+- Design tips (scale fidelity to the question, 2-4 options per screen, etc.)
+### `wait-for-feedback.sh`
+**Deleted entirely.**
+### `tests/brainstorm-server/server.test.js`
+Tests that need updating:
+- Test asserting `feedback-footer` presence in fragment responses — update to assert the selection indicator bar or `<!-- CONTENT -->` replacement
+- Test asserting `helper.js` contains `send` — update to reflect narrowed API
+- Test asserting `sendToClaude` CSS variable usage — remove (function no longer exists)
+## Platform Compatibility
+The server code (`index.js`, `helper.js`, `frame-template.html`) is fully platform-agnostic — pure Node.js and browser JavaScript. No Claude Code-specific references. Already proven to work on Codex via background terminal interaction.
+The skill instructions (`visual-companion.md`) are the platform-adaptive layer. Each platform's Claude uses its own tools to start the server, read `.events`, etc. The non-blocking model works naturally across platforms since it doesn't depend on any platform-specific blocking primitive.
+## What This Enables
+- **TUI always responsive** during visual brainstorming
+- **Mixed input** — click in browser + type in terminal, naturally merged
+- **Graceful degradation** — browser down or user doesn't open it? Terminal still works
+- **Simpler architecture** — no background tasks, no polling scripts, no timeout management
+- **Cross-platform** — same server code works on Claude Code, Codex, and any future platform
+## What This Drops
+- **Pure-browser feedback workflow** — user must return to the terminal to continue. The selection indicator bar guides them, but it's one extra step compared to the old click-Send-and-wait flow.
+- **Inline text feedback from browser** — the textarea is gone. All text feedback goes through the terminal. This is intentional — the terminal is a better text input channel than a small textarea in a frame.
+- **Immediate response on browser Send** — the old system had Claude respond the moment the user clicked Send. Now there's a gap while the user switches to the terminal. In practice this is seconds, and the user gets to add context in their terminal message.

package/docs/superpowers/specs/2026-03-11-zero-dep-brainstorm-server-design.md ADDED

@@ -0,0 +1,118 @@
+# Zero-Dependency Brainstorm Server
+Replace the brainstorm companion server's vendored node_modules (express, ws, chokidar — 714 tracked files) with a single zero-dependency `server.js` using only Node.js built-ins.
+## Motivation
+Vendoring node_modules into the git repo creates a supply chain risk: frozen dependencies don't get security patches, 714 files of third-party code are committed without audit, and modifications to vendored code look like normal commits. While the actual risk is low (localhost-only dev server), eliminating it is straightforward.
+## Architecture
+A single `server.js` file (~250-300 lines) using `http`, `crypto`, `fs`, and `path`. The file serves two roles:
+- **When run directly** (`node server.js`): starts the HTTP/WebSocket server
+- **When required** (`require('./server.js')`): exports WebSocket protocol functions for unit testing
+### WebSocket Protocol
+Implements RFC 6455 for text frames only:
+**Handshake:** Compute `Sec-WebSocket-Accept` from client's `Sec-WebSocket-Key` using SHA-1 + the RFC 6455 magic GUID. Return 101 Switching Protocols.
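As an illustration of the handshake computation, the sketch below derives the accept value with Node's built-in `crypto` module. The function name `computeAcceptKey` is an assumption; the GUID is the fixed value defined by RFC 6455.

```javascript
const crypto = require('node:crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Hypothetical helper: SHA-1 of (client key + GUID), base64-encoded.
function computeAcceptKey(clientKey) {
  return crypto
    .createHash('sha1')
    .update(clientKey + WS_GUID)
    .digest('base64');
}
```

For the sample key in RFC 6455, `dGhlIHNhbXBsZSBub25jZQ==`, this yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.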
+**Frame decoding (client to server):** Handle three masked length encodings:
+- Small: payload < 126 bytes
+- Medium: 126-65535 bytes (16-bit extended)
+- Large: > 65535 bytes (64-bit extended)
+XOR-unmask payload using 4-byte mask key. Return `{ opcode, payload, bytesConsumed }` or `null` for incomplete buffers. Reject unmasked frames.
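A decoder matching this description might look like the sketch below. It is a hedged illustration, not the package's implementation: throwing on unmasked frames and on lengths above `Number.MAX_SAFE_INTEGER` are assumptions about how rejection could be signalled.

```javascript
// Hypothetical decoder for a single client-to-server frame.
// Returns { opcode, payload, bytesConsumed }, or null if the buffer is incomplete.
function decodeFrame(buffer) {
  if (buffer.length < 2) return null; // header not yet complete
  const opcode = buffer[0] & 0x0f;
  const masked = (buffer[1] & 0x80) !== 0;
  if (!masked) throw new Error('Client frames must be masked');

  let length = buffer[1] & 0x7f;
  let offset = 2;
  if (length === 126) {
    // Medium: 16-bit extended length
    if (buffer.length < 4) return null;
    length = buffer.readUInt16BE(2);
    offset = 4;
  } else if (length === 127) {
    // Large: 64-bit extended length
    if (buffer.length < 10) return null;
    const big = buffer.readBigUInt64BE(2);
    if (big > BigInt(Number.MAX_SAFE_INTEGER)) throw new Error('Frame too large');
    length = Number(big);
    offset = 10;
  }

  // Need the 4-byte mask key plus the whole payload before decoding.
  if (buffer.length < offset + 4 + length) return null;
  const mask = buffer.subarray(offset, offset + 4);
  offset += 4;

  const payload = Buffer.alloc(length);
  for (let i = 0; i < length; i++) {
    payload[i] = buffer[offset + i] ^ mask[i % 4];
  }
  return { opcode, payload, bytesConsumed: offset + length };
}
```

Returning `bytesConsumed` lets the caller slice the consumed bytes off its connection buffer and call the decoder again for any following frame.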
+**Frame encoding (server to client):** Unmasked frames with the same three length encodings.
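The encoding side can be sketched in the same way. The function name and signature are assumptions; the sketch always sets the FIN bit because fragmented messages are out of scope for this design.

```javascript
// Hypothetical encoder for an unmasked server-to-client frame.
// `payload` is expected to be a Buffer.
function encodeFrame(opcode, payload) {
  const length = payload.length;
  let header;
  if (length < 126) {
    // Small: length fits in the 7-bit field
    header = Buffer.from([0x80 | opcode, length]);
  } else if (length <= 0xffff) {
    // Medium: marker 126 followed by a 16-bit length
    header = Buffer.alloc(4);
    header[0] = 0x80 | opcode;
    header[1] = 126;
    header.writeUInt16BE(length, 2);
  } else {
    // Large: marker 127 followed by a 64-bit length
    header = Buffer.alloc(10);
    header[0] = 0x80 | opcode;
    header[1] = 127;
    header.writeBigUInt64BE(BigInt(length), 2);
  }
  return Buffer.concat([header, payload]);
}
```

Encoding the text `Hello` with opcode `0x01` produces the bytes `81 05 48 65 6c 6c 6f`, the unmasked example from RFC 6455.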
+**Opcodes handled:** TEXT (0x01), CLOSE (0x08), PING (0x09), PONG (0x0A). Unrecognized opcodes get a close frame with status 1003 (Unsupported Data).
+**Deliberately skipped:** Binary frames, fragmented messages, extensions (permessage-deflate), subprotocols. These are unnecessary for small JSON text messages between localhost clients. Extensions and subprotocols are negotiated in the handshake — by not advertising them, they are never active.
+**Buffer accumulation:** Each connection maintains a buffer. On `data`, append and loop `decodeFrame` until it returns null or buffer is empty.
+### HTTP Server
+Three routes:
+1. **`GET /`** — Serve newest `.html` from screen directory by mtime. Detect full documents vs fragments, wrap fragments in frame template, inject helper.js. Return `text/html`. When no `.html` files exist, serve a hardcoded waiting page ("Waiting for Claude to push a screen...") with helper.js injected.
+2. **`GET /files/*`** — Serve static files from screen directory with MIME type lookup from a hardcoded extension map (html, css, js, png, jpg, gif, svg, json). Return 404 if not found.
+3. **Everything else** — 404.
+WebSocket upgrade handled via the `'upgrade'` event on the HTTP server, separate from the request handler.
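For the `/files/*` route, a sketch of the extension lookup and path resolution is shown below. The helper names, the exact MIME strings, the `application/octet-stream` fallback, and the directory-escape check are all assumptions added for illustration; the spec itself only lists the extensions and the 404 behavior.

```javascript
const path = require('node:path');

// Extension map covering the types listed above; MIME strings are assumed.
const MIME_TYPES = {
  '.html': 'text/html',
  '.css': 'text/css',
  '.js': 'application/javascript',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.gif': 'image/gif',
  '.svg': 'image/svg+xml',
  '.json': 'application/json',
};

function mimeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return MIME_TYPES[ext] || 'application/octet-stream';
}

// Maps a request path such as /files/a.png to a file inside screenDir.
// Returns null when the path is not under /files/ or would leave screenDir,
// in which case the caller would answer 404.
function resolveStaticPath(screenDir, urlPath) {
  const prefix = '/files/';
  if (!urlPath.startsWith(prefix)) return null;
  let relative;
  try {
    relative = decodeURIComponent(urlPath.slice(prefix.length));
  } catch {
    return null; // malformed percent-encoding
  }
  const root = path.resolve(screenDir);
  const resolved = path.resolve(root, relative);
  if (!resolved.startsWith(root + path.sep)) return null;
  return resolved;
}
```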
+### Configuration
+Environment variables (all optional):
+- `BRAINSTORM_PORT` — port to bind (default: random high port 49152-65535)
+- `BRAINSTORM_HOST` — interface to bind (default: `127.0.0.1`)
+- `BRAINSTORM_URL_HOST` — hostname for the URL in startup JSON (default: `localhost` when host is `127.0.0.1`, otherwise same as host)
+- `BRAINSTORM_DIR` — screen directory path (default: `/tmp/brainstorm`)
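The defaults above could be resolved as in this sketch. The function name `resolveConfig` and the shape of the returned object are hypothetical; only the variable names and default values come from the list.

```javascript
// Hypothetical resolver for the environment variables listed above.
function resolveConfig(env = process.env) {
  const host = env.BRAINSTORM_HOST || '127.0.0.1';
  // Pick a random port in 49152-65535 when none is configured.
  const port = env.BRAINSTORM_PORT
    ? Number(env.BRAINSTORM_PORT)
    : 49152 + Math.floor(Math.random() * (65535 - 49152 + 1));
  // `localhost` is used in URLs only for the loopback default.
  const urlHost =
    env.BRAINSTORM_URL_HOST || (host === '127.0.0.1' ? 'localhost' : host);
  const screenDir = env.BRAINSTORM_DIR || '/tmp/brainstorm';
  return { host, port, urlHost, screenDir };
}
```

Passing the environment as a parameter keeps the function easy to exercise without mutating `process.env`.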
+### Startup Sequence
+1. Create `SCREEN_DIR` if it doesn't exist (`mkdirSync` recursive)
+2. Load frame template and helper.js from `__dirname`
+3. Start HTTP server on configured host/port
+4. Start `fs.watch` on `SCREEN_DIR`
+5. On successful listen, log `server-started` JSON to stdout: `{ type, port, host, url_host, url, screen_dir }`
+6. Write the same JSON to `SCREEN_DIR/.server-info` so agents can find connection details when stdout is hidden (background execution)
+### Application-Level WebSocket Messages
+When a TEXT frame arrives from a client:
+1. Parse as JSON. If parsing fails, log to stderr and continue.
+2. Log to stdout as `{ source: 'user-event', ...event }`.
+3. If the event contains a `choice` property, append the JSON to `SCREEN_DIR/.events` (one line per event).
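The three steps above can be sketched as a single handler. This is an assumption-laden illustration: the function name, the injectable loggers, and the choice to append the raw event (rather than the logged object with `source`) are not confirmed by the spec.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical handler for the text payload of one TEXT frame.
function handleTextMessage(text, screenDir, log = console.log, logError = console.error) {
  let event;
  try {
    event = JSON.parse(text);
  } catch (err) {
    // Step 1: malformed JSON is reported and otherwise ignored.
    logError(`Invalid JSON from client: ${err.message}`);
    return;
  }
  // Step 2: every parsed event is logged to stdout, tagged as a user event.
  log(JSON.stringify({ source: 'user-event', ...event }));
  // Step 3: only events carrying a `choice` are persisted, one per line.
  if (event !== null && typeof event === 'object' && 'choice' in event) {
    fs.appendFileSync(path.join(screenDir, '.events'), JSON.stringify(event) + '\n');
  }
}
```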
+### File Watching
+`fs.watch(SCREEN_DIR)` replaces chokidar. On HTML file events:
+- On new file (`rename` event for a file that exists): delete `.events` file if present (`unlinkSync`), log `screen-added` to stdout as JSON
+- On file change (`change` event): log `screen-updated` to stdout as JSON (do NOT clear `.events`)
+- Both events: send `{ type: 'reload' }` to all connected WebSocket clients
+Debounce per-filename with ~100ms timeout to prevent duplicate events (common on macOS and Linux).
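One way to debounce per filename is sketched below. It is a trailing debounce (the callback fires once the file has been quiet for the delay), which is an assumption about the intended behavior; the timer functions are injectable only to make the sketch easy to test.

```javascript
// Hypothetical per-filename debouncer.
function createDebouncer(delayMs = 100, setTimer = setTimeout, clearTimer = clearTimeout) {
  const pending = new Map(); // filename -> timer handle
  return function debounce(filename, callback) {
    // A newer event for the same file replaces the pending one.
    if (pending.has(filename)) clearTimer(pending.get(filename));
    const handle = setTimer(() => {
      pending.delete(filename);
      callback(filename);
    }, delayMs);
    pending.set(filename, handle);
  };
}
```

In a watcher, the returned function would be called from the `fs.watch(SCREEN_DIR, (eventType, filename) => ...)` listener for filenames ending in `.html`, so that bursts of duplicate notifications collapse into a single reload per file.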
+### Error Handling
+- Malformed JSON from WebSocket clients: log to stderr, continue
+- Unhandled opcodes: close with status 1003
+- Client disconnects: remove from broadcast set
+- `fs.watch` errors: log to stderr, continue
+- No graceful shutdown logic — shell scripts handle process lifecycle via SIGTERM
+## What Changes
+| Before | After |
+|---|---|
+| `index.js` + `package.json` + `package-lock.json` + 714 `node_modules` files | `server.js` (single file) |
+| express, ws, chokidar dependencies | none |
+| No static file serving | `/files/*` serves from screen directory |
+## What Stays the Same
+- `helper.js` — no changes
+- `frame-template.html` — no changes
+- `start-server.sh` — one-line update: `index.js` to `server.js`
+- `stop-server.sh` — no changes
+- `visual-companion.md` — no changes
+- All existing server behavior and external contract
+## Platform Compatibility
+- `server.js` uses only cross-platform Node built-ins
+- `fs.watch` is reliable for single flat directories on macOS, Linux, and Windows
+- Shell scripts require bash (Git Bash on Windows, which is required for Claude Code)
+## Testing
+**Unit tests** (`ws-protocol.test.js`): Test WebSocket frame encoding/decoding, handshake computation, and protocol edge cases directly by requiring `server.js` exports.
+**Integration tests** (`server.test.js`): Test full server behavior — HTTP serving, WebSocket communication, file watching, brainstorming workflow. Uses `ws` npm package as a test-only client dependency (not shipped to end users).

package/docs/testing.md ADDED

@@ -0,0 +1,303 @@
+# Testing Superpowers Skills
+This document describes how to test Superpowers skills, particularly the integration tests for complex skills like `subagent-driven-development`.
+## Overview
+Testing skills that involve subagents, workflows, and complex interactions requires running actual Claude Code sessions in headless mode and verifying their behavior through session transcripts.
+## Test Structure
+```
+tests/
+├── claude-code/
+│   ├── test-helpers.sh                    # Shared test utilities
+│   ├── test-subagent-driven-development-integration.sh
+│   ├── analyze-token-usage.py             # Token analysis tool
+│   └── run-skill-tests.sh                 # Test runner (if exists)
+```
+## Running Tests
+### Integration Tests
+Integration tests execute real Claude Code sessions with actual skills:
+```bash
+# Run the subagent-driven-development integration test
+cd tests/claude-code
+./test-subagent-driven-development-integration.sh
+```
+**Note:** Integration tests can take 10-30 minutes as they execute real implementation plans with multiple subagents.
+### Requirements
+- Must run from the **superpowers plugin directory** (not from temp directories)
+- Claude Code must be installed and available as `claude` command
+- Local dev marketplace must be enabled: `"superpowers@superpowers-dev": true` in `~/.claude/settings.json`
+## Integration Test: subagent-driven-development
+### What It Tests
+The integration test verifies the `subagent-driven-development` skill correctly:
+1. **Plan Loading**: Reads the plan once at the beginning
+2. **Full Task Text**: Provides complete task descriptions to subagents (doesn't make them read files)
+3. **Self-Review**: Ensures subagents perform self-review before reporting
+4. **Review Order**: Runs spec compliance review before code quality review
+5. **Review Loops**: Uses review loops when issues are found
+6. **Independent Verification**: Spec reviewer reads code independently, doesn't trust implementer reports
+### How It Works
+1. **Setup**: Creates a temporary Node.js project with a minimal implementation plan
+2. **Execution**: Runs Claude Code in headless mode with the skill
+3. **Verification**: Parses the session transcript (`.jsonl` file) to verify:
+   - Skill tool was invoked
+   - Subagents were dispatched (Task tool)
+   - TodoWrite was used for tracking
+   - Implementation files were created
+   - Tests pass
+   - Git commits show proper workflow
+4. **Token Analysis**: Shows token usage breakdown by subagent
+### Test Output
+```
+========================================
+ Integration Test: subagent-driven-development
+========================================
+Test project: /tmp/tmp.xyz123
+=== Verification Tests ===
+Test 1: Skill tool invoked...
+  [PASS] subagent-driven-development skill was invoked
+Test 2: Subagents dispatched...
+  [PASS] 7 subagents dispatched
+Test 3: Task tracking...
+  [PASS] TodoWrite used 5 time(s)
+Test 6: Implementation verification...
+  [PASS] src/math.js created
+  [PASS] add function exists
+  [PASS] multiply function exists
+  [PASS] test/math.test.js created
+  [PASS] Tests pass
+Test 7: Git commit history...
+  [PASS] Multiple commits created (3 total)
+Test 8: No extra features added...
+  [PASS] No extra features added
+=========================================
+ Token Usage Analysis
+=========================================
+Usage Breakdown:
+----------------------------------------------------------------------------------------------------
+Agent           Description                          Msgs      Input     Output      Cache     Cost
+----------------------------------------------------------------------------------------------------
+main            Main session (coordinator)             34         27      3,996  1,213,703 $   4.09
+3380c209        implementing Task 1: Create Add Function     1          2        787     24,989 $   0.09
+34b00fde        implementing Task 2: Create Multiply Function     1          4        644     25,114 $   0.09
+3801a732        reviewing whether an implementation matches...   1          5        703     25,742 $   0.09
+4c142934        doing a final code review...                    1          6        854     25,319 $   0.09
+5f017a42        a code reviewer. Review Task 2...               1          6        504     22,949 $   0.08
+a6b7fbe4        a code reviewer. Review Task 1...               1          6        515     22,534 $   0.08
+f15837c0        reviewing whether an implementation matches...   1          6        416     22,485 $   0.07
+----------------------------------------------------------------------------------------------------
+TOTALS:
+  Total messages:         41
+  Input tokens:           62
+  Output tokens:          8,419
+  Cache creation tokens:  132,742
+  Cache read tokens:      1,382,835
+  Total input (incl cache): 1,515,639
+  Total tokens:             1,524,058
+  Estimated cost: $4.67
+  (at $3/$15 per M tokens for input/output)
+========================================
+ Test Summary
+========================================
+STATUS: PASSED
+```
+## Token Analysis Tool
+### Usage
+Analyze token usage from any Claude Code session:
+```bash
+python3 tests/claude-code/analyze-token-usage.py ~/.claude/projects/<project-dir>/<session-id>.jsonl
+```
+### Finding Session Files
+Session transcripts are stored in `~/.claude/projects/` with the working directory path encoded:
+```bash
+# Example for /Users/jesse/Documents/GitHub/superpowers/superpowers
+SESSION_DIR="$HOME/.claude/projects/-Users-jesse-Documents-GitHub-superpowers-superpowers"
+# Find recent sessions
+ls -lt "$SESSION_DIR"/*.jsonl | head -5
+```
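The same lookup can be sketched in Node.js. The encoding rule used here (every `/` in the absolute path becomes `-`) is an assumption inferred from the example above; other characters may be transformed as well, and `sessionDirFor` is a hypothetical name.

```javascript
const os = require('node:os');
const path = require('node:path');

// Assumption: each "/" in the absolute working directory becomes "-",
// which reproduces the example directory name shown above.
function sessionDirFor(workingDir, home = os.homedir()) {
  const encoded = path.resolve(workingDir).replace(/\//g, '-');
  return path.join(home, '.claude', 'projects', encoded);
}
```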
+### What It Shows
+- **Main session usage**: Token usage by the coordinator (you or main Claude instance)
+- **Per-subagent breakdown**: Each Task invocation with:
+  - Agent ID
+  - Description (extracted from prompt)
+  - Message count
+  - Input/output tokens
+  - Cache usage
+  - Estimated cost
+- **Totals**: Overall token usage and cost estimate
+### Understanding the Output
+- **High cache reads**: Good - means prompt caching is working
+- **High input tokens on main**: Expected - coordinator has full context
+- **Similar costs per subagent**: Expected - each gets similar task complexity
+- **Cost per task**: Typical range is $0.05-$0.15 per subagent depending on task
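The estimated total in the sample output can be reproduced with a few lines of arithmetic. The sketch assumes that cache creation and cache read tokens are priced at the same $3 per million rate as regular input, which is what matches the $4.67 figure; the real analysis script may compute it differently.

```javascript
// Rates from the sample output: $3 per million input, $15 per million output.
const INPUT_RATE = 3 / 1e6;
const OUTPUT_RATE = 15 / 1e6;

// Assumption: cache creation and cache read tokens count as input tokens.
function estimateCost({ input, cacheCreation, cacheRead, output }) {
  const totalInput = input + cacheCreation + cacheRead;
  return totalInput * INPUT_RATE + output * OUTPUT_RATE;
}

// Totals from the sample output above.
const sampleTotals = {
  input: 62,
  cacheCreation: 132742,
  cacheRead: 1382835,
  output: 8419,
};
console.log(`$${estimateCost(sampleTotals).toFixed(2)}`); // prints "$4.67"
```

Here 1,515,639 input-side tokens contribute about $4.55 and 8,419 output tokens about $0.13.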
+## Troubleshooting
+### Skills Not Loading
+**Problem**: Skill not found when running headless tests
+**Solutions**:
+1. Ensure you're running FROM the superpowers directory: `cd /path/to/superpowers && tests/...`
+2. Check `~/.claude/settings.json` has `"superpowers@superpowers-dev": true` in `enabledPlugins`
+3. Verify skill exists in `skills/` directory
+### Permission Errors
+**Problem**: Claude blocked from writing files or accessing directories
+**Solutions**:
+1. Use `--permission-mode bypassPermissions` flag
+2. Use `--add-dir /path/to/temp/dir` to grant access to test directories
+3. Check file permissions on test directories
+### Test Timeouts
+**Problem**: Test takes too long and times out
+**Solutions**:
+1. Increase timeout: `timeout 1800 claude ...` (30 minutes)
+2. Check for infinite loops in skill logic
+3. Review subagent task complexity
+### Session File Not Found
+**Problem**: Can't find session transcript after test run
+**Solutions**:
+1. Check the correct project directory in `~/.claude/projects/`
+2. Use `find ~/.claude/projects -name "*.jsonl" -mmin -60` to find recent sessions
+3. Verify test actually ran (check for errors in test output)
+## Writing New Integration Tests
+### Template
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+source "$SCRIPT_DIR/test-helpers.sh"
+# Create test project
+TEST_PROJECT=$(create_test_project)
+trap "cleanup_test_project $TEST_PROJECT" EXIT
+# Set up test files...
+cd "$TEST_PROJECT"
+# Run Claude with skill
+PROMPT="Your test prompt here"
+cd "$SCRIPT_DIR/../.." && timeout 1800 claude -p "$PROMPT" \
+  --allowed-tools=all \
+  --add-dir "$TEST_PROJECT" \
+  --permission-mode bypassPermissions \
+  2>&1 | tee output.txt
+# Find and analyze session
+WORKING_DIR_ESCAPED=$(cd "$SCRIPT_DIR/../.." && pwd | sed 's|/|-|g')
+SESSION_DIR="$HOME/.claude/projects/$WORKING_DIR_ESCAPED"
+SESSION_FILE=$(find "$SESSION_DIR" -name "*.jsonl" -type f -mmin -60 | sort -r | head -1)
+# Verify behavior by parsing session transcript
+if grep -q '"name":"Skill".*"skill":"your-skill-name"' "$SESSION_FILE"; then
+    echo "[PASS] Skill was invoked"
+fi
+# Show token analysis
+python3 "$SCRIPT_DIR/analyze-token-usage.py" "$SESSION_FILE"
+```
+### Best Practices
+1. **Always clean up**: Use `trap` to clean up temp directories
+2. **Parse transcripts**: Don't grep user-facing output - parse the `.jsonl` session file
+3. **Grant permissions**: Use `--permission-mode bypassPermissions` and `--add-dir`
+4. **Run from plugin dir**: Skills only load when running from the superpowers directory
+5. **Show token usage**: Always include token analysis for cost visibility
+6. **Test real behavior**: Verify actual files created, tests passing, commits made
+## Session Transcript Format
+Session transcripts are JSONL (JSON Lines) files where each line is a JSON object representing a message or tool result.
+### Key Fields
+```json
+{
+  "type": "assistant",
+  "message": {
+    "content": [...],
+    "usage": {
+      "input_tokens": 27,
+      "output_tokens": 3996,
+      "cache_read_input_tokens": 1213703
+    }
+  }
+}
+```
+### Tool Results
+```json
+{
+  "type": "user",
+  "toolUseResult": {
+    "agentId": "3380c209",
+    "usage": {
+      "input_tokens": 2,
+      "output_tokens": 787,
+      "cache_read_input_tokens": 24989
+    },
+    "prompt": "You are implementing Task 1...",
+    "content": [{"type": "text", "text": "..."}]
+  }
+}
+```
+The `agentId` field links to subagent sessions, and the `usage` field contains token usage for that specific subagent invocation.
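A transcript in this shape can be summarized with a short script. The sketch below relies only on the fields shown in the two examples above; `summarizeUsage` and `addUsage` are hypothetical names, and real transcripts may contain additional entry types that this sketch simply skips.

```javascript
// Adds one `usage` object to a running total; missing fields count as 0.
function addUsage(target, usage) {
  target.input += usage.input_tokens || 0;
  target.output += usage.output_tokens || 0;
  target.cacheRead += usage.cache_read_input_tokens || 0;
}

// Hypothetical summary: coordinator totals plus one bucket per agentId.
function summarizeUsage(jsonlText) {
  const totals = {
    main: { input: 0, output: 0, cacheRead: 0 },
    subagents: {},
  };
  for (const line of jsonlText.split('\n')) {
    if (line.trim() === '') continue;
    const entry = JSON.parse(line);
    if (entry.type === 'assistant' && entry.message && entry.message.usage) {
      addUsage(totals.main, entry.message.usage);
    } else if (entry.type === 'user' && entry.toolUseResult && entry.toolUseResult.usage) {
      const id = entry.toolUseResult.agentId;
      if (!totals.subagents[id]) {
        totals.subagents[id] = { input: 0, output: 0, cacheRead: 0 };
      }
      addUsage(totals.subagents[id], entry.toolUseResult.usage);
    }
  }
  return totals;
}
```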