workerssuper 5.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (135)
  1. package/.claude-plugin/marketplace.json +20 -0
  2. package/.claude-plugin/plugin.json +13 -0
  3. package/.codex/INSTALL.md +67 -0
  4. package/.cursor-plugin/plugin.json +18 -0
  5. package/.gitattributes +18 -0
  6. package/.github/FUNDING.yml +3 -0
  7. package/.github/ISSUE_TEMPLATE/bug_report.md +52 -0
  8. package/.github/ISSUE_TEMPLATE/config.yml +5 -0
  9. package/.github/ISSUE_TEMPLATE/feature_request.md +34 -0
  10. package/.github/ISSUE_TEMPLATE/platform_support.md +23 -0
  11. package/.github/PULL_REQUEST_TEMPLATE.md +87 -0
  12. package/.opencode/INSTALL.md +83 -0
  13. package/.opencode/plugins/superpowers.js +107 -0
  14. package/CHANGELOG.md +13 -0
  15. package/CODE_OF_CONDUCT.md +128 -0
  16. package/GEMINI.md +2 -0
  17. package/LICENSE +21 -0
  18. package/README.md +187 -0
  19. package/RELEASE-NOTES.md +1057 -0
  20. package/agents/code-reviewer.md +48 -0
  21. package/commands/brainstorm.md +5 -0
  22. package/commands/execute-plan.md +5 -0
  23. package/commands/write-plan.md +5 -0
  24. package/docs/README.codex.md +126 -0
  25. package/docs/README.opencode.md +130 -0
  26. package/docs/plans/2025-11-22-opencode-support-design.md +294 -0
  27. package/docs/plans/2025-11-22-opencode-support-implementation.md +1095 -0
  28. package/docs/plans/2025-11-28-skills-improvements-from-user-feedback.md +711 -0
  29. package/docs/plans/2026-01-17-visual-brainstorming.md +571 -0
  30. package/docs/superpowers/plans/2026-01-22-document-review-system.md +301 -0
  31. package/docs/superpowers/plans/2026-02-19-visual-brainstorming-refactor.md +523 -0
  32. package/docs/superpowers/plans/2026-03-11-zero-dep-brainstorm-server.md +479 -0
  33. package/docs/superpowers/specs/2026-01-22-document-review-system-design.md +136 -0
  34. package/docs/superpowers/specs/2026-02-19-visual-brainstorming-refactor-design.md +162 -0
  35. package/docs/superpowers/specs/2026-03-11-zero-dep-brainstorm-server-design.md +118 -0
  36. package/docs/testing.md +303 -0
  37. package/docs/windows/polyglot-hooks.md +212 -0
  38. package/gemini-extension.json +6 -0
  39. package/hooks/hooks-cursor.json +10 -0
  40. package/hooks/hooks.json +16 -0
  41. package/hooks/run-hook.cmd +46 -0
  42. package/hooks/session-start +57 -0
  43. package/package.json +5 -0
  44. package/skills/brainstorming/SKILL.md +164 -0
  45. package/skills/brainstorming/scripts/frame-template.html +214 -0
  46. package/skills/brainstorming/scripts/helper.js +88 -0
  47. package/skills/brainstorming/scripts/server.cjs +338 -0
  48. package/skills/brainstorming/scripts/start-server.sh +153 -0
  49. package/skills/brainstorming/scripts/stop-server.sh +55 -0
  50. package/skills/brainstorming/spec-document-reviewer-prompt.md +49 -0
  51. package/skills/brainstorming/visual-companion.md +286 -0
  52. package/skills/dispatching-parallel-agents/SKILL.md +182 -0
  53. package/skills/executing-plans/SKILL.md +70 -0
  54. package/skills/finishing-a-development-branch/SKILL.md +200 -0
  55. package/skills/receiving-code-review/SKILL.md +213 -0
  56. package/skills/requesting-code-review/SKILL.md +105 -0
  57. package/skills/requesting-code-review/code-reviewer.md +146 -0
  58. package/skills/subagent-driven-development/SKILL.md +277 -0
  59. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
  60. package/skills/subagent-driven-development/implementer-prompt.md +113 -0
  61. package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
  62. package/skills/systematic-debugging/CREATION-LOG.md +119 -0
  63. package/skills/systematic-debugging/SKILL.md +296 -0
  64. package/skills/systematic-debugging/condition-based-waiting-example.ts +158 -0
  65. package/skills/systematic-debugging/condition-based-waiting.md +115 -0
  66. package/skills/systematic-debugging/defense-in-depth.md +122 -0
  67. package/skills/systematic-debugging/find-polluter.sh +63 -0
  68. package/skills/systematic-debugging/root-cause-tracing.md +169 -0
  69. package/skills/systematic-debugging/test-academic.md +14 -0
  70. package/skills/systematic-debugging/test-pressure-1.md +58 -0
  71. package/skills/systematic-debugging/test-pressure-2.md +68 -0
  72. package/skills/systematic-debugging/test-pressure-3.md +69 -0
  73. package/skills/test-driven-development/SKILL.md +371 -0
  74. package/skills/test-driven-development/testing-anti-patterns.md +299 -0
  75. package/skills/using-git-worktrees/SKILL.md +218 -0
  76. package/skills/using-superpowers/SKILL.md +115 -0
  77. package/skills/using-superpowers/references/codex-tools.md +25 -0
  78. package/skills/using-superpowers/references/gemini-tools.md +33 -0
  79. package/skills/verification-before-completion/SKILL.md +139 -0
  80. package/skills/writing-plans/SKILL.md +145 -0
  81. package/skills/writing-plans/plan-document-reviewer-prompt.md +49 -0
  82. package/skills/writing-skills/SKILL.md +655 -0
  83. package/skills/writing-skills/anthropic-best-practices.md +1150 -0
  84. package/skills/writing-skills/examples/CLAUDE_MD_TESTING.md +189 -0
  85. package/skills/writing-skills/graphviz-conventions.dot +172 -0
  86. package/skills/writing-skills/persuasion-principles.md +187 -0
  87. package/skills/writing-skills/render-graphs.js +168 -0
  88. package/skills/writing-skills/testing-skills-with-subagents.md +384 -0
  89. package/tests/brainstorm-server/package-lock.json +36 -0
  90. package/tests/brainstorm-server/package.json +10 -0
  91. package/tests/brainstorm-server/server.test.js +424 -0
  92. package/tests/brainstorm-server/windows-lifecycle.test.sh +351 -0
  93. package/tests/brainstorm-server/ws-protocol.test.js +392 -0
  94. package/tests/claude-code/README.md +158 -0
  95. package/tests/claude-code/analyze-token-usage.py +168 -0
  96. package/tests/claude-code/run-skill-tests.sh +187 -0
  97. package/tests/claude-code/test-document-review-system.sh +177 -0
  98. package/tests/claude-code/test-helpers.sh +202 -0
  99. package/tests/claude-code/test-subagent-driven-development-integration.sh +314 -0
  100. package/tests/claude-code/test-subagent-driven-development.sh +165 -0
  101. package/tests/explicit-skill-requests/prompts/action-oriented.txt +3 -0
  102. package/tests/explicit-skill-requests/prompts/after-planning-flow.txt +17 -0
  103. package/tests/explicit-skill-requests/prompts/claude-suggested-it.txt +11 -0
  104. package/tests/explicit-skill-requests/prompts/i-know-what-sdd-means.txt +8 -0
  105. package/tests/explicit-skill-requests/prompts/mid-conversation-execute-plan.txt +3 -0
  106. package/tests/explicit-skill-requests/prompts/please-use-brainstorming.txt +1 -0
  107. package/tests/explicit-skill-requests/prompts/skip-formalities.txt +3 -0
  108. package/tests/explicit-skill-requests/prompts/subagent-driven-development-please.txt +1 -0
  109. package/tests/explicit-skill-requests/prompts/use-systematic-debugging.txt +1 -0
  110. package/tests/explicit-skill-requests/run-all.sh +70 -0
  111. package/tests/explicit-skill-requests/run-claude-describes-sdd.sh +100 -0
  112. package/tests/explicit-skill-requests/run-extended-multiturn-test.sh +113 -0
  113. package/tests/explicit-skill-requests/run-haiku-test.sh +144 -0
  114. package/tests/explicit-skill-requests/run-multiturn-test.sh +143 -0
  115. package/tests/explicit-skill-requests/run-test.sh +136 -0
  116. package/tests/opencode/run-tests.sh +163 -0
  117. package/tests/opencode/setup.sh +73 -0
  118. package/tests/opencode/test-plugin-loading.sh +72 -0
  119. package/tests/opencode/test-priority.sh +198 -0
  120. package/tests/opencode/test-tools.sh +104 -0
  121. package/tests/skill-triggering/prompts/dispatching-parallel-agents.txt +8 -0
  122. package/tests/skill-triggering/prompts/executing-plans.txt +1 -0
  123. package/tests/skill-triggering/prompts/requesting-code-review.txt +3 -0
  124. package/tests/skill-triggering/prompts/systematic-debugging.txt +11 -0
  125. package/tests/skill-triggering/prompts/test-driven-development.txt +7 -0
  126. package/tests/skill-triggering/prompts/writing-plans.txt +10 -0
  127. package/tests/skill-triggering/run-all.sh +60 -0
  128. package/tests/skill-triggering/run-test.sh +88 -0
  129. package/tests/subagent-driven-dev/go-fractals/design.md +81 -0
  130. package/tests/subagent-driven-dev/go-fractals/plan.md +172 -0
  131. package/tests/subagent-driven-dev/go-fractals/scaffold.sh +45 -0
  132. package/tests/subagent-driven-dev/run-test.sh +106 -0
  133. package/tests/subagent-driven-dev/svelte-todo/design.md +70 -0
  134. package/tests/subagent-driven-dev/svelte-todo/plan.md +222 -0
  135. package/tests/subagent-driven-dev/svelte-todo/scaffold.sh +46 -0
@@ -0,0 +1,162 @@
1
+ # Visual Brainstorming Refactor: Browser Displays, Terminal Commands
2
+
3
+ **Date:** 2026-02-19
4
+ **Status:** Approved
5
+ **Scope:** `lib/brainstorm-server/`, `skills/brainstorming/visual-companion.md`, `tests/brainstorm-server/`
6
+
7
+ ## Problem
8
+
9
+ During visual brainstorming, Claude runs `wait-for-feedback.sh` as a background task and blocks on `TaskOutput(block=true, timeout=600s)`. This seizes the TUI entirely — the user cannot type to Claude while visual brainstorming is running. The browser becomes the only input channel.
10
+
11
+ Claude Code's execution model is turn-based. There is no way for Claude to listen on two channels simultaneously within a single turn. The blocking `TaskOutput` pattern was the wrong primitive — it simulates event-driven behavior the platform doesn't support.
12
+
13
+ ## Design
14
+
15
+ ### Core Model
16
+
17
+ **Browser = interactive display.** Shows mockups, lets the user click to select options. Selections are recorded server-side.
18
+
19
+ **Terminal = conversation channel.** Always unblocked, always available. The user talks to Claude here.
20
+
21
+ ### The Loop
22
+
23
+ 1. Claude writes an HTML file to the session directory
24
+ 2. Server detects it via chokidar, pushes WebSocket reload to the browser (unchanged)
25
+ 3. Claude ends its turn — tells the user to check the browser and respond in the terminal
26
+ 4. User looks at browser, optionally clicks to select an option, then types feedback in the terminal
27
+ 5. On the next turn, Claude reads `$SCREEN_DIR/.events` for the browser interaction stream (clicks, selections), merges with the terminal text
28
+ 6. Iterate or advance
29
+
30
+ No background tasks. No `TaskOutput` blocking. No polling scripts.
31
+
32
+ ### Key Deletion: `wait-for-feedback.sh`
33
+
34
+ Deleted entirely. Its purpose was to bridge "server logs events to stdout" and "Claude needs to receive those events." The `.events` file replaces this — the server writes user interaction events directly, and Claude reads them with whatever file-reading mechanism the platform provides.
35
+
36
+ ### Key Addition: `.events` File (Per-Screen Event Stream)
37
+
38
+ The server writes all user interaction events to `$SCREEN_DIR/.events`, one JSON object per line. This gives Claude the full interaction stream for the current screen — not just the final selection, but the user's exploration path (clicked A, then B, settled on C).
39
+
40
+ Example contents after a user explores options:
41
+
42
+ ```jsonl
43
+ {"type":"click","choice":"a","text":"Option A - Preset-First Wizard","timestamp":1706000101}
44
+ {"type":"click","choice":"c","text":"Option C - Manual Config","timestamp":1706000108}
45
+ {"type":"click","choice":"b","text":"Option B - Hybrid Approach","timestamp":1706000115}
46
+ ```
47
+
48
+ - Append-only within a screen. Each user event is appended as a new line.
49
+ - The file is cleared (deleted) when chokidar detects a new HTML file (new screen pushed), preventing stale events from carrying over.
50
+ - If the file doesn't exist when Claude reads it, no browser interaction occurred — Claude uses only the terminal text.
51
+ - The file contains only user events (`click`, etc.) — not server lifecycle events (`server-started`, `screen-added`). This keeps it small and focused.
52
+ - Claude can read the full stream to understand the user's exploration pattern, or just look at the last `choice` event for the final selection.
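The read side of the `.events` contract described above can be sketched roughly as follows. The helper names `readEvents` and `lastChoice` are hypothetical and do not come from the package; the sketch only assumes the format stated here (one JSON object per line, and a missing file meaning no browser interaction).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: parse a .events file, one JSON object per line.
// A missing file is treated as "no browser interaction" and yields [].
function readEvents(eventsPath) {
  let raw;
  try {
    raw = fs.readFileSync(eventsPath, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') return [];
    throw err;
  }
  return raw
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Hypothetical helper: the final selection is the last event with a `choice`.
function lastChoice(events) {
  for (let i = events.length - 1; i >= 0; i--) {
    if (events[i].choice !== undefined) return events[i].choice;
  }
  return null;
}

// Usage: write the sample stream to a temporary directory and read it back.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'events-demo-'));
const file = path.join(dir, '.events');
fs.writeFileSync(file, [
  '{"type":"click","choice":"a","text":"Option A - Preset-First Wizard","timestamp":1706000101}',
  '{"type":"click","choice":"c","text":"Option C - Manual Config","timestamp":1706000108}',
  '{"type":"click","choice":"b","text":"Option B - Hybrid Approach","timestamp":1706000115}',
].join('\n') + '\n');

const events = readEvents(file);
const explorationPath = events.map((e) => e.choice); // order the user clicked in
const finalChoice = lastChoice(events);              // what the user settled on
```

Reading the whole stream and reading only the last `choice` are both cheap here, since the file is reset whenever a new screen is pushed.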
53
+
54
+ ## Changes by File
55
+
56
+ ### `index.js` (server)
57
+
58
+ **A. Write user events to `.events` file.**
59
+
60
+ In the WebSocket `message` handler, after logging the event to stdout: append the event as a JSON line to `$SCREEN_DIR/.events` via `fs.appendFileSync`. Only write user interaction events (those with `source: 'user-event'`), not server lifecycle events.
61
+
62
+ **B. Clear `.events` on new screen.**
63
+
64
+ In the chokidar `add` handler (new `.html` file detected), delete `$SCREEN_DIR/.events` if it exists. This is the definitive "new screen" signal — better than clearing on GET `/` which fires on every reload.
65
+
66
+ **C. Replace `wrapInFrame` content injection.**
67
+
68
+ The current regex anchors on `<div class="feedback-footer">`, which is being removed. Replace with a comment placeholder: remove the existing default content inside `#claude-content` (the `<h2>Visual Brainstorming</h2>` and subtitle paragraph) and replace with a single `<!-- CONTENT -->` marker. Content injection becomes `frameTemplate.replace('<!-- CONTENT -->', content)`. Simpler and won't break if template formatting changes.
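A minimal sketch of the placeholder-based injection, assuming a stand-in template string rather than the real `frame-template.html`. It is a small variant of the call quoted above: it passes a replacer function instead of a string, because `String.prototype.replace` expands sequences such as `$&` in a string replacement, and pushed HTML content could plausibly contain them.

```javascript
// Hypothetical stand-in for the real frame template file.
const frameTemplate = '<div id="claude-content"><!-- CONTENT --></div>';

function wrapInFrame(content) {
  // A function replacer inserts `content` literally. With a plain string,
  // "$&" inside the content would be replaced by the matched placeholder.
  return frameTemplate.replace('<!-- CONTENT -->', () => content);
}

const wrapped = wrapInFrame('<p>Price: $& up</p>');
```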
69
+
70
+ ### `frame-template.html` (UI frame)
71
+
72
+ **Remove:**
73
+ - The `feedback-footer` div (textarea, Send button, label, `.feedback-row`)
74
+ - Associated CSS (`.feedback-footer`, `.feedback-footer label`, `.feedback-row`, textarea and button styles within it)
75
+
76
+ **Add:**
77
+ - `<!-- CONTENT -->` placeholder inside `#claude-content`, replacing the default text
78
+ - A selection indicator bar where the footer was, with two states:
79
+ - Default: "Click an option above, then return to the terminal"
80
+ - After selection: "Option B selected — return to terminal to continue"
81
+ - CSS for the indicator bar (subtle, similar visual weight to the existing header)
82
+
83
+ **Keep unchanged:**
84
+ - Header bar with "Brainstorm Companion" title and connection status
85
+ - `.main` wrapper and `#claude-content` container
86
+ - All component CSS (`.options`, `.cards`, `.mockup`, `.split`, `.pros-cons`, placeholders, mock elements)
87
+ - Dark/light theme variables and media query
88
+
89
+ ### `helper.js` (client-side script)
90
+
91
+ **Remove:**
92
+ - `sendToClaude()` function and the "Sent to Claude" page takeover
93
+ - `window.send()` function (was tied to the removed Send button)
94
+ - Form submission handler — no purpose without the feedback textarea, adds log noise
95
+ - Input change handler — same reason
96
+ - `pageshow` event listener (was added to fix textarea persistence — no textarea anymore)
97
+
98
+ **Keep:**
99
+ - WebSocket connection, reconnect logic, event queue
100
+ - Reload handler (`window.location.reload()` on server push)
101
+ - `window.toggleSelect()` for selection highlighting
102
+ - `window.selectedChoice` tracking
103
+ - `window.brainstorm.send()` and `window.brainstorm.choice()` — these are distinct from the removed `window.send()`. They call `sendEvent` which logs to the server via WebSocket. Useful for custom full-document pages.
104
+
105
+ **Narrow:**
106
+ - Click handler: capture only `[data-choice]` clicks, not all buttons/links. The broad capture was needed when the browser was a feedback channel; now it's just for selection tracking.
107
+
108
+ **Add:**
109
+ - On `data-choice` click, update the selection indicator bar text to show which option was selected.
110
+
111
+ **Remove from `window.brainstorm` API:**
112
+ - `brainstorm.sendToClaude` — no longer exists
113
+
114
+ ### `visual-companion.md` (skill instructions)
115
+
116
+ **Rewrite "The Loop" section** to the non-blocking flow described above. Remove all references to:
117
+ - `wait-for-feedback.sh`
118
+ - `TaskOutput` blocking
119
+ - Timeout/retry logic (600s timeout, 30-minute cap)
120
+ - "User Feedback Format" section describing `send-to-claude` JSON
121
+
122
+ **Replace with:**
123
+ - The new loop (write HTML → end turn → user responds in terminal → read `.events` → iterate)
124
+ - `.events` file format documentation
125
+ - Guidance that the terminal message is the primary feedback; `.events` provides the full browser interaction stream for additional context
126
+
127
+ **Keep:**
128
+ - Server startup/shutdown instructions
129
+ - Content fragment vs full document guidance
130
+ - CSS class reference and available components
131
+ - Design tips (scale fidelity to the question, 2-4 options per screen, etc.)
132
+
133
+ ### `wait-for-feedback.sh`
134
+
135
+ **Deleted entirely.**
136
+
137
+ ### `tests/brainstorm-server/server.test.js`
138
+
139
+ Tests that need updating:
140
+ - Test asserting `feedback-footer` presence in fragment responses — update to assert the selection indicator bar or `<!-- CONTENT -->` replacement
141
+ - Test asserting `helper.js` contains `send` — update to reflect narrowed API
142
+ - Test asserting `sendToClaude` CSS variable usage — remove (function no longer exists)
143
+
144
+ ## Platform Compatibility
145
+
146
+ The server code (`index.js`, `helper.js`, `frame-template.html`) is fully platform-agnostic — pure Node.js and browser JavaScript. No Claude Code-specific references. Already proven to work on Codex via background terminal interaction.
147
+
148
+ The skill instructions (`visual-companion.md`) are the platform-adaptive layer. Each platform's Claude uses its own tools to start the server, read `.events`, etc. The non-blocking model works naturally across platforms since it doesn't depend on any platform-specific blocking primitive.
149
+
150
+ ## What This Enables
151
+
152
+ - **TUI always responsive** during visual brainstorming
153
+ - **Mixed input** — click in browser + type in terminal, naturally merged
154
+ - **Graceful degradation** — browser down or user doesn't open it? Terminal still works
155
+ - **Simpler architecture** — no background tasks, no polling scripts, no timeout management
156
+ - **Cross-platform** — same server code works on Claude Code, Codex, and any future platform
157
+
158
+ ## What This Drops
159
+
160
+ - **Pure-browser feedback workflow** — user must return to the terminal to continue. The selection indicator bar guides them, but it's one extra step compared to the old click-Send-and-wait flow.
161
+ - **Inline text feedback from browser** — the textarea is gone. All text feedback goes through the terminal. This is intentional — the terminal is a better text input channel than a small textarea in a frame.
162
+ - **Immediate response on browser Send** — the old system had Claude respond the moment the user clicked Send. Now there's a gap while the user switches to the terminal. In practice this is seconds, and the user gets to add context in their terminal message.
@@ -0,0 +1,118 @@
1
+ # Zero-Dependency Brainstorm Server
2
+
3
+ Replace the brainstorm companion server's vendored node_modules (express, ws, chokidar — 714 tracked files) with a single zero-dependency `server.js` using only Node.js built-ins.
4
+
5
+ ## Motivation
6
+
7
+ Vendoring node_modules into the git repo creates a supply chain risk: frozen dependencies don't get security patches, 714 files of third-party code are committed without audit, and modifications to vendored code look like normal commits. While the actual risk is low (localhost-only dev server), eliminating it is straightforward.
8
+
9
+ ## Architecture
10
+
11
+ A single `server.js` file (~250-300 lines) using `http`, `crypto`, `fs`, and `path`. The file serves two roles:
12
+
13
+ - **When run directly** (`node server.js`): starts the HTTP/WebSocket server
14
+ - **When required** (`require('./server.js')`): exports WebSocket protocol functions for unit testing
15
+
16
+ ### WebSocket Protocol
17
+
18
+ Implements RFC 6455 for text frames only:
19
+
20
+ **Handshake:** Compute `Sec-WebSocket-Accept` from client's `Sec-WebSocket-Key` using SHA-1 + the RFC 6455 magic GUID. Return 101 Switching Protocols.
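The accept-key computation can be sketched with Node's built-in `crypto` module. The function name `computeAcceptKey` is an assumption for illustration; the GUID and the sample key and result are the ones given in RFC 6455.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Hypothetical name: SHA-1 of (client key + GUID), base64-encoded.
function computeAcceptKey(clientKey) {
  return crypto.createHash('sha1').update(clientKey + WS_GUID).digest('base64');
}

// Sample key from RFC 6455; the expected value is s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
const accept = computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ==');
```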
21
+
22
+ **Frame decoding (client to server):** Handle three masked length encodings:
23
+ - Small: payload < 126 bytes
24
+ - Medium: 126-65535 bytes (16-bit extended)
25
+ - Large: > 65535 bytes (64-bit extended)
26
+
27
+ XOR-unmask payload using 4-byte mask key. Return `{ opcode, payload, bytesConsumed }` or `null` for incomplete buffers. Reject unmasked frames.
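The decoding rules above might look roughly like this. It is a sketch under stated assumptions, not the shipped `server.js`: in particular, rejecting an unmasked frame by throwing is an assumption, since the spec only says such frames are rejected.

```javascript
// Sketch of a decoder for one masked client-to-server frame.
// Returns null when the buffer does not yet hold a complete frame.
function decodeFrame(buffer) {
  if (buffer.length < 2) return null;
  const opcode = buffer[0] & 0x0f;
  const masked = (buffer[1] & 0x80) !== 0;
  if (!masked) throw new Error('Client frames must be masked'); // assumed rejection style

  let payloadLength = buffer[1] & 0x7f;
  let offset = 2;
  if (payloadLength === 126) {
    // Medium: 16-bit extended length
    if (buffer.length < 4) return null;
    payloadLength = buffer.readUInt16BE(2);
    offset = 4;
  } else if (payloadLength === 127) {
    // Large: 64-bit extended length
    if (buffer.length < 10) return null;
    payloadLength = Number(buffer.readBigUInt64BE(2));
    offset = 10;
  }

  const dataStart = offset + 4; // the 4-byte mask key precedes the payload
  const frameEnd = dataStart + payloadLength;
  if (buffer.length < frameEnd) return null;

  const mask = buffer.subarray(offset, dataStart);
  const payload = Buffer.alloc(payloadLength);
  for (let i = 0; i < payloadLength; i++) {
    payload[i] = buffer[dataStart + i] ^ mask[i % 4];
  }
  return { opcode, payload, bytesConsumed: frameEnd };
}

// Usage: the masked "Hello" text frame from the RFC 6455 examples.
const helloFrame = Buffer.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]);
const decoded = decodeFrame(helloFrame);
```

Returning `bytesConsumed` lets a caller slice the consumed bytes off its connection buffer and call the decoder again on the remainder.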
28
+
29
+ **Frame encoding (server to client):** Unmasked frames with the same three length encodings.
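For the server-to-client direction, a sketch of an encoder covering the same three length encodings. The name and signature of `encodeFrame` are assumptions; every frame is sent as a single final frame (FIN bit set), which matches the decision to skip fragmentation.

```javascript
// Sketch of an encoder for one unmasked server-to-client frame.
function encodeFrame(opcode, payload) {
  const length = payload.length;
  let header;
  if (length < 126) {
    // Small: length fits in the second header byte
    header = Buffer.from([0x80 | opcode, length]);
  } else if (length <= 0xffff) {
    // Medium: marker 126 followed by a 16-bit length
    header = Buffer.alloc(4);
    header[0] = 0x80 | opcode;
    header[1] = 126;
    header.writeUInt16BE(length, 2);
  } else {
    // Large: marker 127 followed by a 64-bit length
    header = Buffer.alloc(10);
    header[0] = 0x80 | opcode;
    header[1] = 127;
    header.writeBigUInt64BE(BigInt(length), 2);
  }
  return Buffer.concat([header, payload]);
}

// Usage: a small text frame and a medium one.
const small = encodeFrame(0x1, Buffer.from('Hello'));
const medium = encodeFrame(0x1, Buffer.alloc(300));
```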
30
+
31
+ **Opcodes handled:** TEXT (0x01), CLOSE (0x08), PING (0x09), PONG (0x0A). Unrecognized opcodes get a close frame with status 1003 (Unsupported Data).
32
+
33
+ **Deliberately skipped:** Binary frames, fragmented messages, extensions (permessage-deflate), subprotocols. These are unnecessary for small JSON text messages between localhost clients. Extensions and subprotocols are negotiated in the handshake; because the server never advertises them, they are never active.
34
+
35
+ **Buffer accumulation:** Each connection maintains a buffer. On `data`, append and loop `decodeFrame` until it returns null or buffer is empty.
36
+
37
+ ### HTTP Server
38
+
39
+ Three routes:
40
+
41
+ 1. **`GET /`** — Serve newest `.html` from screen directory by mtime. Detect full documents vs fragments, wrap fragments in frame template, inject helper.js. Return `text/html`. When no `.html` files exist, serve a hardcoded waiting page ("Waiting for Claude to push a screen...") with helper.js injected.
42
+ 2. **`GET /files/*`** — Serve static files from screen directory with MIME type lookup from a hardcoded extension map (html, css, js, png, jpg, gif, svg, json). Return 404 if not found.
43
+ 3. **Everything else** — 404.
44
+
45
+ WebSocket upgrade handled via the `'upgrade'` event on the HTTP server, separate from the request handler.
46
+
47
+ ### Configuration
48
+
49
+ Environment variables (all optional):
50
+
51
+ - `BRAINSTORM_PORT` — port to bind (default: random high port 49152-65535)
52
+ - `BRAINSTORM_HOST` — interface to bind (default: `127.0.0.1`)
53
+ - `BRAINSTORM_URL_HOST` — hostname for the URL in startup JSON (default: `localhost` when host is `127.0.0.1`, otherwise same as host)
54
+ - `BRAINSTORM_DIR` — screen directory path (default: `/tmp/brainstorm`)
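One way the defaults listed above could be resolved, as a sketch. The function name `loadConfig` and the shape of the returned object are assumptions; only the variable names and default values come from the list.

```javascript
// Hypothetical helper: resolve configuration from an environment-like object.
function loadConfig(env) {
  const host = env.BRAINSTORM_HOST || '127.0.0.1';
  // With no port configured, pick a random one in 49152-65535 (inclusive).
  const port = env.BRAINSTORM_PORT
    ? Number(env.BRAINSTORM_PORT)
    : 49152 + Math.floor(Math.random() * (65535 - 49152 + 1));
  // The URL host defaults to "localhost" only for the loopback default.
  const urlHost = env.BRAINSTORM_URL_HOST || (host === '127.0.0.1' ? 'localhost' : host);
  const screenDir = env.BRAINSTORM_DIR || '/tmp/brainstorm';
  return { host, port, urlHost, screenDir };
}

// Usage: pass process.env in a real server; plain objects are used here.
const defaults = loadConfig({});
const custom = loadConfig({ BRAINSTORM_PORT: '8080', BRAINSTORM_HOST: '0.0.0.0' });
```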
55
+
56
+ ### Startup Sequence
57
+
58
+ 1. Create `SCREEN_DIR` if it doesn't exist (`mkdirSync` recursive)
59
+ 2. Load frame template and helper.js from `__dirname`
60
+ 3. Start HTTP server on configured host/port
61
+ 4. Start `fs.watch` on `SCREEN_DIR`
62
+ 5. On successful listen, log `server-started` JSON to stdout: `{ type, port, host, url_host, url, screen_dir }`
63
+ 6. Write the same JSON to `SCREEN_DIR/.server-info` so agents can find connection details when stdout is hidden (background execution)
64
+
65
+ ### Application-Level WebSocket Messages
66
+
67
+ When a TEXT frame arrives from a client:
68
+
69
+ 1. Parse as JSON. If parsing fails, log to stderr and continue.
70
+ 2. Log to stdout as `{ source: 'user-event', ...event }`.
71
+ 3. If the event contains a `choice` property, append the JSON to `SCREEN_DIR/.events` (one line per event).
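The three steps above can be sketched as a single handler. `handleTextMessage` and its boolean return value are hypothetical; the sketch assumes that "append the JSON" means appending the parsed event re-serialized on one line.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical handler for the text payload of one TEXT frame.
// Returns false when the payload is not valid JSON, true otherwise.
function handleTextMessage(text, screenDir) {
  let event;
  try {
    event = JSON.parse(text);
  } catch (err) {
    console.error('Ignoring malformed WebSocket message:', err.message);
    return false;
  }
  // Every parsed event is logged to stdout, tagged as a user event.
  console.log(JSON.stringify({ source: 'user-event', ...event }));
  // Only events carrying a `choice` property reach the .events file.
  if (event !== null && typeof event === 'object' && 'choice' in event) {
    fs.appendFileSync(path.join(screenDir, '.events'), JSON.stringify(event) + '\n');
  }
  return true;
}

// Usage against a temporary screen directory.
const screenDir = fs.mkdtempSync(path.join(os.tmpdir(), 'brainstorm-demo-'));
const okChoice = handleTextMessage('{"type":"click","choice":"a"}', screenDir);
const okOther = handleTextMessage('{"type":"hover"}', screenDir);
const bad = handleTextMessage('not json', screenDir);
const written = fs.readFileSync(path.join(screenDir, '.events'), 'utf8');
```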
72
+
73
+ ### File Watching
74
+
75
+ `fs.watch(SCREEN_DIR)` replaces chokidar. On HTML file events:
76
+
77
+ - On new file (`rename` event for a file that exists): delete `.events` file if present (`unlinkSync`), log `screen-added` to stdout as JSON
78
+ - On file change (`change` event): log `screen-updated` to stdout as JSON (do NOT clear `.events`)
79
+ - Both events: send `{ type: 'reload' }` to all connected WebSocket clients
80
+
81
+ Debounce per-filename with ~100ms timeout to prevent duplicate events (common on macOS and Linux).
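Per-filename debouncing could be sketched as below. `createDebouncer` is a hypothetical helper, and the injectable `schedule` and `cancel` parameters are an assumption added so the behavior can be exercised without real timers; they default to `setTimeout` and `clearTimeout`.

```javascript
// Hypothetical helper: collapse bursts of events for the same filename
// into a single callback after `delayMs` of quiet.
function createDebouncer(delayMs, onFire, schedule = setTimeout, cancel = clearTimeout) {
  const timers = new Map(); // filename -> pending timer handle
  return function trigger(filename) {
    if (timers.has(filename)) cancel(timers.get(filename));
    const handle = schedule(() => {
      timers.delete(filename);
      onFire(filename);
    }, delayMs);
    timers.set(filename, handle);
  };
}

// Usage with a fake scheduler: two quick events for a.html, one for b.html.
const pending = [];
const fakeSchedule = (fn) => {
  const handle = { fn, cancelled: false };
  pending.push(handle);
  return handle;
};
const fakeCancel = (handle) => { handle.cancelled = true; };

const fired = [];
const trigger = createDebouncer(100, (name) => fired.push(name), fakeSchedule, fakeCancel);
trigger('a.html');
trigger('a.html');
trigger('b.html');
// Simulate the delay elapsing: run every timer that was not cancelled.
pending.filter((h) => !h.cancelled).forEach((h) => h.fn());
```

Keying the timers by filename means a burst of events for one screen does not delay or swallow an event for a different file.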
82
+
83
+ ### Error Handling
84
+
85
+ - Malformed JSON from WebSocket clients: log to stderr, continue
86
+ - Unhandled opcodes: close with status 1003
87
+ - Client disconnects: remove from broadcast set
88
+ - `fs.watch` errors: log to stderr, continue
89
+ - No graceful shutdown logic — shell scripts handle process lifecycle via SIGTERM
90
+
91
+ ## What Changes
92
+
93
+ | Before | After |
94
+ |---|---|
95
+ | `index.js` + `package.json` + `package-lock.json` + 714 `node_modules` files | `server.js` (single file) |
96
+ | express, ws, chokidar dependencies | none |
97
+ | No static file serving | `/files/*` serves from screen directory |
98
+
99
+ ## What Stays the Same
100
+
101
+ - `helper.js` — no changes
102
+ - `frame-template.html` — no changes
103
+ - `start-server.sh` — one-line update: `index.js` to `server.js`
104
+ - `stop-server.sh` — no changes
105
+ - `visual-companion.md` — no changes
106
+ - All existing server behavior and external contract
107
+
108
+ ## Platform Compatibility
109
+
110
+ - `server.js` uses only cross-platform Node built-ins
111
+ - `fs.watch` is reliable for single flat directories on macOS, Linux, and Windows
112
+ - Shell scripts require bash (Git Bash on Windows, which is required for Claude Code)
113
+
114
+ ## Testing
115
+
116
+ **Unit tests** (`ws-protocol.test.js`): Test WebSocket frame encoding/decoding, handshake computation, and protocol edge cases directly by requiring `server.js` exports.
117
+
118
+ **Integration tests** (`server.test.js`): Test full server behavior — HTTP serving, WebSocket communication, file watching, brainstorming workflow. Uses `ws` npm package as a test-only client dependency (not shipped to end users).
@@ -0,0 +1,303 @@
1
+ # Testing Superpowers Skills
2
+
3
+ This document describes how to test Superpowers skills, particularly the integration tests for complex skills like `subagent-driven-development`.
4
+
5
+ ## Overview
6
+
7
+ Testing skills that involve subagents, workflows, and complex interactions requires running actual Claude Code sessions in headless mode and verifying their behavior through session transcripts.
8
+
9
+ ## Test Structure
10
+
11
+ ```
12
+ tests/
13
+ ├── claude-code/
14
+ │   ├── test-helpers.sh         # Shared test utilities
15
+ │   ├── test-subagent-driven-development-integration.sh
16
+ │   ├── analyze-token-usage.py  # Token analysis tool
17
+ │   └── run-skill-tests.sh      # Test runner (if exists)
18
+ ```
19
+
20
+ ## Running Tests
21
+
22
+ ### Integration Tests
23
+
24
+ Integration tests execute real Claude Code sessions with actual skills:
25
+
26
+ ```bash
27
+ # Run the subagent-driven-development integration test
28
+ cd tests/claude-code
29
+ ./test-subagent-driven-development-integration.sh
30
+ ```
31
+
32
+ **Note:** Integration tests can take 10-30 minutes as they execute real implementation plans with multiple subagents.
33
+
34
+ ### Requirements
35
+
36
+ - Must run from the **superpowers plugin directory** (not from temp directories)
37
+ - Claude Code must be installed and available as the `claude` command
38
+ - Local dev marketplace must be enabled: `"superpowers@superpowers-dev": true` in `~/.claude/settings.json`
39
+
40
+ ## Integration Test: subagent-driven-development
41
+
42
+ ### What It Tests
43
+
44
+ The integration test verifies the `subagent-driven-development` skill correctly:
45
+
46
+ 1. **Plan Loading**: Reads the plan once at the beginning
47
+ 2. **Full Task Text**: Provides complete task descriptions to subagents (doesn't make them read files)
48
+ 3. **Self-Review**: Ensures subagents perform self-review before reporting
49
+ 4. **Review Order**: Runs spec compliance review before code quality review
50
+ 5. **Review Loops**: Uses review loops when issues are found
51
+ 6. **Independent Verification**: Spec reviewer reads code independently, doesn't trust implementer reports
52
+
53
+ ### How It Works
54
+
55
+ 1. **Setup**: Creates a temporary Node.js project with a minimal implementation plan
56
+ 2. **Execution**: Runs Claude Code in headless mode with the skill
57
+ 3. **Verification**: Parses the session transcript (`.jsonl` file) to verify:
58
+ - Skill tool was invoked
59
+ - Subagents were dispatched (Task tool)
60
+ - TodoWrite was used for tracking
61
+ - Implementation files were created
62
+ - Tests pass
63
+ - Git commits show proper workflow
64
+ 4. **Token Analysis**: Shows token usage breakdown by subagent
65
+
66
+ ### Test Output
67
+
68
+ ```
69
+ ========================================
70
+ Integration Test: subagent-driven-development
71
+ ========================================
72
+
73
+ Test project: /tmp/tmp.xyz123
74
+
75
+ === Verification Tests ===
76
+
77
+ Test 1: Skill tool invoked...
78
+ [PASS] subagent-driven-development skill was invoked
79
+
80
+ Test 2: Subagents dispatched...
81
+ [PASS] 7 subagents dispatched
82
+
83
+ Test 3: Task tracking...
84
+ [PASS] TodoWrite used 5 time(s)
85
+
86
+ Test 6: Implementation verification...
87
+ [PASS] src/math.js created
88
+ [PASS] add function exists
89
+ [PASS] multiply function exists
90
+ [PASS] test/math.test.js created
91
+ [PASS] Tests pass
92
+
93
+ Test 7: Git commit history...
94
+ [PASS] Multiple commits created (3 total)
95
+
96
+ Test 8: No extra features added...
97
+ [PASS] No extra features added
98
+
99
+ =========================================
100
+ Token Usage Analysis
101
+ =========================================
102
+
103
+ Usage Breakdown:
104
+ ----------------------------------------------------------------------------------------------------
105
+ Agent     Description                                      Msgs   Input  Output       Cache     Cost
106
+ ----------------------------------------------------------------------------------------------------
107
+ main      Main session (coordinator)                         34      27   3,996   1,213,703   $ 4.09
108
+ 3380c209  implementing Task 1: Create Add Function            1       2     787      24,989   $ 0.09
109
+ 34b00fde  implementing Task 2: Create Multiply Function       1       4     644      25,114   $ 0.09
110
+ 3801a732  reviewing whether an implementation matches...      1       5     703      25,742   $ 0.09
111
+ 4c142934  doing a final code review...                        1       6     854      25,319   $ 0.09
112
+ 5f017a42  a code reviewer. Review Task 2...                   1       6     504      22,949   $ 0.08
113
+ a6b7fbe4  a code reviewer. Review Task 1...                   1       6     515      22,534   $ 0.08
114
+ f15837c0  reviewing whether an implementation matches...      1       6     416      22,485   $ 0.07
115
+ ----------------------------------------------------------------------------------------------------
116
+
117
+ TOTALS:
118
+ Total messages: 41
119
+ Input tokens: 62
120
+ Output tokens: 8,419
121
+ Cache creation tokens: 132,742
122
+ Cache read tokens: 1,382,835
123
+
124
+ Total input (incl cache): 1,515,639
125
+ Total tokens: 1,524,058
126
+
127
+ Estimated cost: $4.67
128
+ (at $3/$15 per M tokens for input/output)
129
+
130
+ ========================================
131
+ Test Summary
132
+ ========================================
133
+
134
+ STATUS: PASSED
135
+ ```
136
+
137
+ ## Token Analysis Tool
138
+
139
+ ### Usage
140
+
141
+ Analyze token usage from any Claude Code session:
142
+
143
+ ```bash
144
+ python3 tests/claude-code/analyze-token-usage.py ~/.claude/projects/<project-dir>/<session-id>.jsonl
145
+ ```
146
+
147
+ ### Finding Session Files
148
+
149
+ Session transcripts are stored in `~/.claude/projects/` with the working directory path encoded:
150
+
151
+ ```bash
152
+ # Example for /Users/jesse/Documents/GitHub/superpowers/superpowers
153
+ SESSION_DIR="$HOME/.claude/projects/-Users-jesse-Documents-GitHub-superpowers-superpowers"
154
+
155
+ # Find recent sessions
156
+ ls -lt "$SESSION_DIR"/*.jsonl | head -5
157
+ ```
158
+
159
+ ### What It Shows
160
+
161
+ - **Main session usage**: Token usage by the coordinator (you or the main Claude instance)
162
+ - **Per-subagent breakdown**: Each Task invocation with:
163
+ - Agent ID
164
+ - Description (extracted from prompt)
165
+ - Message count
166
+ - Input/output tokens
167
+ - Cache usage
168
+ - Estimated cost
169
+ - **Totals**: Overall token usage and cost estimate
170
+
171
+ ### Understanding the Output
172
+
173
+ - **High cache reads**: Good - means prompt caching is working
174
+ - **High input tokens on main**: Expected - coordinator has full context
175
+ - **Similar costs per subagent**: Expected - each subagent handles a task of similar complexity
176
+ - **Cost per task**: Typical range is $0.05-$0.15 per subagent depending on task
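The estimated cost in the sample TOTALS block can be reproduced from its token counts. The sketch below assumes that cache creation and cache read tokens are billed at the same flat input rate as ordinary input tokens, which is what the "$3/$15 per M tokens" note and the sample figures imply. `estimate_cost` is a hypothetical name, and the analysis script may compute its estimate differently.

```python
INPUT_RATE_PER_M = 3.00    # USD per million input tokens (from the sample output)
OUTPUT_RATE_PER_M = 15.00  # USD per million output tokens (from the sample output)


def estimate_cost(input_tokens, cache_creation, cache_read, output_tokens):
    """Estimate cost in USD from the four token counts in the TOTALS block."""
    # Assumption: cache tokens are charged at the plain input rate.
    total_input = input_tokens + cache_creation + cache_read
    return (total_input * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000


# Token counts copied from the sample TOTALS block.
cost = estimate_cost(62, 132_742, 1_382_835, 8_419)
print(f"${cost:.2f}")  # prints $4.67
```

Because cache reads dominate the input total (1,382,835 of 1,515,639 tokens), the estimate is most sensitive to the rate applied to cached tokens.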
177
+
178
+ ## Troubleshooting
179
+
180
+ ### Skills Not Loading
181
+
182
+ **Problem**: Skill not found when running headless tests
183
+
184
+ **Solutions**:
185
+ 1. Ensure you're running FROM the superpowers directory: `cd /path/to/superpowers && tests/...`
186
+ 2. Check `~/.claude/settings.json` has `"superpowers@superpowers-dev": true` in `enabledPlugins`
187
+ 3. Verify skill exists in `skills/` directory
188
+
189
+ ### Permission Errors
190
+
191
+ **Problem**: Claude blocked from writing files or accessing directories
192
+
193
+ **Solutions**:
194
+ 1. Use `--permission-mode bypassPermissions` flag
195
+ 2. Use `--add-dir /path/to/temp/dir` to grant access to test directories
196
+ 3. Check file permissions on test directories
197
+
198
+ ### Test Timeouts
199
+
200
+ **Problem**: Test takes too long and times out
201
+
202
+ **Solutions**:
203
+ 1. Increase timeout: `timeout 1800 claude ...` (30 minutes)
204
+ 2. Check for infinite loops in skill logic
205
+ 3. Review subagent task complexity
206
+
207
+ ### Session File Not Found
208
+
209
+ **Problem**: Can't find session transcript after test run
210
+
211
+ **Solutions**:
212
+ 1. Check the correct project directory in `~/.claude/projects/`
213
+ 2. Use `find ~/.claude/projects -name "*.jsonl" -mmin -60` to find recent sessions
214
+ 3. Verify test actually ran (check for errors in test output)
215
+
216
+ ## Writing New Integration Tests
217
+
218
+ ### Template
219
+
220
+ ```bash
221
+ #!/usr/bin/env bash
222
+ set -euo pipefail
223
+
224
+ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
225
+ source "$SCRIPT_DIR/test-helpers.sh"
226
+
227
+ # Create test project
228
+ TEST_PROJECT=$(create_test_project)
229
+ trap 'cleanup_test_project "$TEST_PROJECT"' EXIT
230
+
231
+ # Set up test files...
232
+ cd "$TEST_PROJECT"
233
+
234
+ # Run Claude with skill
235
+ PROMPT="Your test prompt here"
236
+ cd "$SCRIPT_DIR/../.." && timeout 1800 claude -p "$PROMPT" \
237
+ --allowed-tools=all \
238
+ --add-dir "$TEST_PROJECT" \
239
+ --permission-mode bypassPermissions \
240
+ 2>&1 | tee output.txt
241
+
242
+ # Find and analyze session (project dir name: resolved plugin path with "/" replaced by "-")
243
+ WORKING_DIR_ESCAPED=$(cd "$SCRIPT_DIR/../.." && pwd | sed 's/\//-/g')
244
+ SESSION_DIR="$HOME/.claude/projects/$WORKING_DIR_ESCAPED"
245
+ SESSION_FILE=$(ls -t "$SESSION_DIR"/*.jsonl | head -1)  # newest transcript by modification time
246
+
247
+ # Verify behavior by parsing session transcript
248
+ if grep -q '"name":"Skill".*"skill":"your-skill-name"' "$SESSION_FILE"; then
249
+ echo "[PASS] Skill was invoked"
250
+ else echo "[FAIL] Skill was not invoked" >&2; exit 1; fi
251
+
252
+ # Show token analysis
253
+ python3 "$SCRIPT_DIR/analyze-token-usage.py" "$SESSION_FILE"
254
+ ```
255
+
256
+ ### Best Practices
257
+
258
+ 1. **Always cleanup**: Use trap to cleanup temp directories
259
+ 2. **Parse transcripts**: Don't grep user-facing output - parse the `.jsonl` session file
260
+ 3. **Grant permissions**: Use `--permission-mode bypassPermissions` and `--add-dir`
261
+ 4. **Run from plugin dir**: Skills only load when running from the superpowers directory
262
+ 5. **Show token usage**: Always include token analysis for cost visibility
263
+ 6. **Test real behavior**: Verify actual files created, tests passing, commits made
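As an illustration of practice 2, the check for a skill invocation can be done by parsing each transcript line as JSON rather than matching text. This is a sketch only. It assumes tool calls appear in assistant messages as content blocks of the form `{"type": "tool_use", "name": "Skill", "input": {"skill": ...}}`, a shape inferred from the grep pattern in the template and not confirmed by this document. `skill_was_invoked` is a hypothetical helper.

```python
import json


def skill_was_invoked(lines, skill_name):
    """Return True if any assistant message contains a matching Skill tool call."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if not isinstance(record, dict) or record.get("type") != "assistant":
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            # Assumed block shape: tool_use / name "Skill" / input.skill
            if (isinstance(block, dict)
                    and block.get("type") == "tool_use"
                    and block.get("name") == "Skill"
                    and isinstance(block.get("input"), dict)
                    and block["input"].get("skill") == skill_name):
                return True
    return False
```

A test could call it as `skill_was_invoked(open(session_file), "your-skill-name")` and fail when it returns `False`. Unlike a single-line regex, this does not depend on key order or whitespace in the serialized JSON.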
264
+
265
+ ## Session Transcript Format
266
+
267
+ Session transcripts are JSONL (JSON Lines) files where each line is a JSON object representing a message or tool result.
268
+
269
+ ### Key Fields
270
+
271
+ ```json
272
+ {
273
+ "type": "assistant",
274
+ "message": {
275
+ "content": [...],
276
+ "usage": {
277
+ "input_tokens": 27,
278
+ "output_tokens": 3996,
279
+ "cache_read_input_tokens": 1213703
280
+ }
281
+ }
282
+ }
283
+ ```
284
+
285
+ ### Tool Results
286
+
287
+ ```json
288
+ {
289
+ "type": "user",
290
+ "toolUseResult": {
291
+ "agentId": "3380c209",
292
+ "usage": {
293
+ "input_tokens": 2,
294
+ "output_tokens": 787,
295
+ "cache_read_input_tokens": 24989
296
+ },
297
+ "prompt": "You are implementing Task 1...",
298
+ "content": [{"type": "text", "text": "..."}]
299
+ }
300
+ }
301
+ ```
302
+
303
+ The `agentId` field links to subagent sessions, and the `usage` field contains token usage for that specific subagent invocation.
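The two record shapes above are enough to sketch a per-agent usage summary. The following is a minimal sketch, not the logic of `analyze-token-usage.py`. It assumes only the fields shown in the examples and attributes every assistant record to a single `"main"` bucket. `summarize_usage` is a hypothetical name.

```python
import json
from collections import defaultdict

USAGE_KEYS = ("input_tokens", "output_tokens", "cache_read_input_tokens")


def summarize_usage(lines):
    """Sum token usage for the main session and for each subagent agentId."""
    totals = defaultdict(lambda: dict.fromkeys(USAGE_KEYS, 0))
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        result = record.get("toolUseResult")
        if record.get("type") == "assistant":
            # Main-session usage lives under message.usage.
            owner = "main"
            usage = (record.get("message") or {}).get("usage")
        elif isinstance(result, dict) and "agentId" in result:
            # Subagent usage lives under toolUseResult.usage.
            owner = result["agentId"]
            usage = result.get("usage")
        else:
            continue
        if not isinstance(usage, dict):
            continue
        for key in USAGE_KEYS:
            totals[owner][key] += usage.get(key, 0)
    return dict(totals)
```

Calling `summarize_usage(open(session_file))` returns a dictionary keyed by `"main"` and by each `agentId`, which can then be fed into a cost estimate.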