ai-or-die 0.1.22 → 0.1.23

@@ -52,7 +52,7 @@ jobs:
52
52
  playwright-report/
53
53
  retention-days: 14
54
54
 
55
- test-browser-functional:
55
+ test-browser-functional-core:
56
56
  runs-on: ${{ matrix.os }}
57
57
  strategy:
58
58
  matrix:
@@ -65,13 +65,38 @@ jobs:
65
65
  - run: npm ci
66
66
  - name: Install Playwright browsers
67
67
  run: npx playwright install chromium --with-deps
68
- - name: Run functional browser tests
69
- run: npx playwright test --config e2e/playwright.config.js --project functional
68
+ - name: Run functional core tests
69
+ run: npx playwright test --config e2e/playwright.config.js --project functional-core
70
70
  - name: Upload Playwright report
71
71
  uses: actions/upload-artifact@v4
72
72
  if: ${{ !cancelled() }}
73
73
  with:
74
- name: playwright-functional-${{ matrix.os }}
74
+ name: playwright-functional-core-${{ matrix.os }}
75
+ path: |
76
+ e2e/test-results/
77
+ playwright-report/
78
+ retention-days: 14
79
+
80
+ test-browser-functional-extended:
81
+ runs-on: ${{ matrix.os }}
82
+ strategy:
83
+ matrix:
84
+ os: [ubuntu-latest, windows-latest]
85
+ steps:
86
+ - uses: actions/checkout@v4
87
+ - uses: actions/setup-node@v4
88
+ with:
89
+ node-version: '22'
90
+ - run: npm ci
91
+ - name: Install Playwright browsers
92
+ run: npx playwright install chromium --with-deps
93
+ - name: Run functional extended tests
94
+ run: npx playwright test --config e2e/playwright.config.js --project functional-extended
95
+ - name: Upload Playwright report
96
+ uses: actions/upload-artifact@v4
97
+ if: ${{ !cancelled() }}
98
+ with:
99
+ name: playwright-functional-extended-${{ matrix.os }}
75
100
  path: |
76
101
  e2e/test-results/
77
102
  playwright-report/
package/CLAUDE.md CHANGED
@@ -19,17 +19,29 @@ Available agents: **Architect**, **Engineer**, **QA Reviewer**, **Troubleshooter
19
19
 
20
20
  ### Documentation-Driven Workflow
21
21
  Before starting any task, consult the relevant documentation:
22
- - `docs/agent-instructions/` -- Philosophy, research guidelines, testing standards, tooling conventions
22
+ - `docs/agent-instructions/` -- Agent workflow guides:
23
+ - `00-philosophy.md` -- Core principles
24
+ - `01-research-and-web.md` -- Research guidelines
25
+ - `02-testing-and-validation.md` -- Testing standards
26
+ - `03-tooling-and-pipelines.md` -- Tooling conventions
27
+ - `04-handoff-protocol.md` -- How to leave the repo clean for the next agent
28
+ - `05-defensive-coding.md` -- Error prevention, cross-platform traps
29
+ - `06-ci-first-testing.md` -- CI-only testing, E2E debugging, performance budget
30
+ - `07-docs-hygiene.md` -- Keeping documentation in sync
31
+ - `08-multi-agent-consultation.md` -- When and how to consult expert subagents
23
32
  - `docs/adrs/` -- Architecture Decision Records (check before proposing new patterns)
24
33
  - `docs/specs/` -- Component specifications (read before implementing, update after changing behavior)
25
34
  - `docs/architecture/` -- System diagrams and component overviews
26
- - `docs/history/` -- Incident post-mortems and debugging notes
35
+ - `docs/history/` -- Solved problems and debugging notes (check before debugging any issue)
27
36
 
28
37
  ### Mandatory Rules
29
38
  1. **Spec updates with code changes**: When code behavior changes, the corresponding spec in `docs/specs/` must be updated in the same commit or PR.
30
39
  2. **ADR compliance**: Never contradict an accepted ADR. To change direction, write a new ADR that supersedes the old one.
31
40
  3. **Cross-platform support**: All code must work on both Windows and Linux. Use `path.join()` for file paths, provide `.sh` and `.ps1` script variants, and test on both platforms in CI.
32
41
  4. **Test coverage**: Every feature and bug fix requires tests. No exceptions.
42
+ 5. **CI-only testing**: All testing happens on GitHub Actions runners. Never test locally. E2E tests are the only true validation. Push → draft PR → CI → iterate.
43
+ 6. **Document what you solve**: Every solved problem goes in `docs/history/`. LLMs don't carry memories — written docs are the only institutional memory.
44
+ 7. **Consult before committing**: For significant decisions, spawn expert subagents (architect, principal engineer, lead QA, PM, designer, user researcher) in parallel. See `docs/agent-instructions/08-multi-agent-consultation.md`.
33
45
 
34
46
  ## Common Commands
35
47
 
@@ -0,0 +1,65 @@
1
+ # ADR-0008: E2E Test Parallelization Strategy
2
+
3
+ ## Status
4
+
5
+ **Accepted**
6
+
7
+ ## Date
8
+
9
+ 2026-02-07
10
+
11
+ ## Context
12
+
13
+ The E2E test suite has grown to 16 spec files across 6 Playwright projects. The `functional` project — containing tests 02-07, 09-image-paste, and 09-background-notifications — runs approximately 30 tests sequentially with `workers: 1`. On GitHub Actions runners, this takes 7-15 minutes per platform, exceeding the 7-minute performance budget for CI feedback loops.
14
+
15
+ Fast CI feedback is critical because all testing happens exclusively on GitHub runners (no local testing). The push → CI → fix → push cycle must be fast enough that agents can iterate efficiently.
16
+
17
+ ## Decision
18
+
19
+ Split the functional test group into two sub-groups and enable parallel workers in CI:
20
+
21
+ ### Test Split
22
+ - **`functional-core`**: Tests `02-terminal-io`, `03-clipboard`, `04-context-menu`, `05-tab-switching` (core terminal interaction features)
23
+ - **`functional-extended`**: Tests `06-large-paste`, `07-vim-and-session`, `09-image-paste`, `09-background-notifications` (extended features and cross-cutting concerns)
24
+
25
+ ### Parallel Workers
26
+ - Set `workers: process.env.CI ? 2 : 1` in `e2e/playwright.config.js`
27
+ - CI runs 2 Playwright workers per job for parallel test execution
28
+ - Local development retains 1 worker for debugging simplicity (though local testing is not the primary workflow)
29
+
30
+ ### CI Pipeline Changes
31
+ - Replace single `test-browser-functional` job with two: `test-browser-functional-core` and `test-browser-functional-extended`
32
+ - Each runs independently and in parallel with all other browser test jobs
33
+ - Each uploads artifacts with distinct names for failure diagnosis
34
+
35
+ ### Why this works
36
+ - Each test already creates its own server instance via `createServer()` with an ephemeral port (port 0)
37
+ - Sessions are per-server, eliminating cross-test state contamination
38
+ - Playwright provides browser context isolation between parallel tests
39
+ - No shared filesystem resources detected in the test suite
40
+
41
+ ## Consequences
42
+
43
+ ### Positive
44
+
45
+ - No CI job exceeds 7 minutes — faster feedback for the push-fix-push workflow
46
+ - More granular job names in CI (functional-core vs functional-extended) aid debugging — agents can immediately see which category of tests failed
47
+ - Parallel workers within jobs further reduce wall-clock time
48
+ - Sets a pattern for future test group splits as the suite grows
49
+
50
+ ### Negative
51
+
52
+ - More CI jobs to monitor (6 browser test job types instead of 5, plus unit tests and build-binary)
53
+ - Artifact names become longer and more numerous
54
+ - If test isolation assumptions prove wrong, parallel execution could introduce flakiness (mitigated by the existing ephemeral-port pattern)
55
+
56
+ ### Neutral
57
+
58
+ - Existing test files require no code changes — only configuration and CI workflow updates
59
+ - The `workers: 2` setting is conservative and can be increased if runners have sufficient resources
60
+
61
+ ## Notes
62
+
63
+ - When any job approaches 6 minutes consistently, split it further
64
+ - When the test suite exceeds 80 tests, re-evaluate the overall split strategy
65
+ - Monitor for flaky tests that may indicate parallel execution issues
@@ -26,14 +26,26 @@ Write tests alongside implementation, not after. The workflow:
26
26
  - Use temp directories for file system tests (see `session-store.test.js` pattern)
27
27
  - Test cross-platform behavior: path construction, command resolution, shell detection
28
28
 
29
+ ## CI-Only Testing
30
+
31
+ All testing happens on GitHub Actions runners. No local test runs. Ever.
32
+
33
+ - Local environments are unreliable: missing native modules, stale state, platform differences
34
+ - CI provides fresh, reproducible, cross-platform results every time
35
+ - E2E tests are the only true validation — if they pass on CI, the feature works
36
+
37
+ The workflow: write code → push to branch → open draft PR → CI runs → read results → fix → push again.
38
+
39
+ See `docs/agent-instructions/06-ci-first-testing.md` for the complete CI workflow guide, job map, and debugging playbook.
40
+
29
41
  ## Self-Validation
30
42
 
31
43
  Before committing, every agent must:
32
44
 
33
- 1. Run `npm test` -- all tests pass
34
- 2. Run `npm start` -- server boots without errors
35
- 3. Run `scripts/validate.sh` (Linux) or `scripts/validate.ps1` (Windows)
36
- 4. Verify the change doesn't break existing functionality
45
+ 1. Push to branch and open a draft PR to trigger CI
46
+ 2. Verify all CI jobs pass on both ubuntu-latest and windows-latest
47
+ 3. Check `docs/history/` for known issues if any job fails
48
+ 4. Verify the change doesn't break existing functionality (CI confirms this)
37
49
 
38
50
  ## What to Test
39
51
 
@@ -50,9 +62,9 @@ Before committing, every agent must:
50
62
  - Auth middleware behavior
51
63
 
52
64
  ### For Client Changes
53
- - Manual browser testing (create session, select tool, verify output)
54
- - Check mobile responsiveness
55
- - Verify WebSocket reconnection
65
+ - E2E tests via Playwright (verified on CI, never locally)
66
+ - Mobile viewport tests via mobile-iphone and mobile-pixel Playwright projects
67
+ - WebSocket reconnection covered by E2E functional tests
56
68
 
57
69
  ## When Tests Fail
58
70
 
@@ -14,14 +14,13 @@ If you perform a verification task twice, script it. All scripts live in the `sc
14
14
 
15
15
  ### GitHub Actions
16
16
 
17
- The CI pipeline (`.github/workflows/ci.yml`) runs on every push and PR:
18
-
19
- 1. **Matrix**: Runs on both `ubuntu-latest` and `windows-latest`
20
- 2. **Install**: `npm ci`
21
- 3. **Lint**: ESLint check
22
- 4. **Test**: `npm test` with coverage reporting
23
- 5. **Audit**: `npm audit` for security vulnerabilities
24
- 6. **Docs Check**: Verify docs/ structure exists
17
+ The CI pipeline (`.github/workflows/ci.yml`) runs on every push and PR. It runs 8 job types in parallel across ubuntu-latest and windows-latest (16 total jobs):
18
+
19
+ - **Unit tests**: `npm test` + `npm audit`
20
+ - **Browser E2E tests**: 6 Playwright job types (golden-path, functional-core, functional-extended, mobile, visual-regression, new-features)
21
+ - **Binary build**: SEA binary compilation + smoke tests
22
+
23
+ See `06-ci-first-testing.md` for the full CI job map, artifact details, and debugging workflow. CI is the only authority on whether code works (see ADR-0008 for the parallelization strategy).
25
24
 
26
25
  ### Release Pipeline
27
26
 
@@ -0,0 +1,63 @@
1
+ # Handoff Protocol
2
+
3
+ ## The Golden Rule
4
+
5
+ Every session ends with a cleaner repo than it started. If you touched it, you documented it. If you broke it, you fixed it. If you couldn't finish, you left a trail.
6
+
7
+ ## Pre-Handoff Checklist
8
+
9
+ Before ending any work session, verify:
10
+
11
+ 1. **All CI jobs pass.** Push to your branch and check GitHub Actions. Both `ubuntu-latest` and `windows-latest` must be green. Do not hand off a red build.
12
+ 2. **Documentation is updated.** Specs in `docs/specs/` match the current code. ADRs are written for any architectural decisions made during the session.
13
+ 3. **No orphaned work-in-progress.** No half-implemented features sitting uncommitted. Everything is either committed and pushed, or explicitly tracked in a GitHub issue.
14
+ 4. **Commit messages explain "why", not just "what".** A future agent reading the git log should understand the reasoning without opening the diff.
15
+ 5. **New patterns and conventions are documented.** If you introduced a new coding pattern, utility, or convention, write it down in the relevant spec or instruction doc.
16
+
17
+ ## Work-in-Progress Protocol
18
+
19
+ When you cannot finish a task:
20
+
21
+ - Create a GitHub issue with full context: what was attempted, where it stopped, what blockers exist, and what the next steps are.
22
+ - Use `[WIP]` prefix in commit messages for incomplete work.
23
+ - List which files are mid-change and what state they are in.
24
+ - Reference relevant specs, ADRs, and CI run links.
25
+ - Never leave broken tests on main. If your work breaks tests, either fix them or revert before ending.
26
+
27
+ ## Clean Commit Hygiene
28
+
29
+ - Follow Conventional Commits: `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `refactor:`.
30
+ - One concern per commit. Do not mix a bug fix with a feature addition.
31
+ - Reference GitHub issues in the message: `fix: resolve WebSocket race in image upload (#42)`.
32
+ - Commit messages should be self-contained. Another agent reading the git log should understand what happened and why without reading the diff.
33
+
34
+ ## Session Context Dump
35
+
36
+ What to leave behind for the next agent:
37
+
38
+ - Updated specs in `docs/specs/` reflecting any behavior changes.
39
+ - Research findings documented in the relevant ADR or spec.
40
+ - Error patterns discovered during debugging added to `docs/history/`.
41
+ - Decisions made and their rationale recorded in ADRs.
42
+ - If you modified the CI pipeline, document what changed and why.
43
+
44
+ ## Log What You Solved
45
+
46
+ When you encounter and solve a problem, document it in `docs/history/`. LLMs do not carry memories between sessions -- written docs are the only institutional memory. Every solved problem that is not documented is a problem that will be solved again.
47
+
48
+ See `07-docs-hygiene.md` for the history entry format and full guidelines. Before debugging any issue, always check `docs/history/` first.
49
+
50
+ ## Anti-Patterns
51
+
52
+ Do NOT do any of these:
53
+
54
+ - Leave vague commit messages like "Made some changes" or "Updated stuff".
55
+ - Leave uncommitted or unstaged work behind at handoff.
56
+ - Leave broken tests and move on.
57
+ - Make architectural decisions without writing an ADR.
58
+ - Solve a problem without documenting the solution.
59
+ - Skip spec updates when behavior changes.
60
+ - Assume the next agent will "figure it out".
61
+ - Delete or disable tests to make CI pass.
62
+ - Commit secrets, API keys, tokens, or `.env` files. Check `git diff --staged` for sensitive data before every commit.
63
+ - Expand scope beyond what was asked without explicit approval. If you discover adjacent issues, file them as separate GitHub issues.
@@ -0,0 +1,170 @@
1
+ # Defensive Coding
2
+
3
+ ## Validate at Boundaries
4
+
5
+ Trust nothing that crosses a system boundary. Every REST endpoint, WebSocket handler, and bridge method should validate its inputs before processing.
6
+
7
+ Where boundaries exist in this codebase:
8
+
9
+ - REST API handlers in `src/server.js` -- validate request params, body, headers
10
+ - WebSocket message handlers -- validate `type` field, required fields per message type
11
+ - Bridge methods (`startSession`, `sendInput`, `resize`) -- validate sessionId exists, dimensions are positive integers
12
+ - Client-to-server messages -- validate session ownership, check session is active
13
+
14
+ Pattern:
15
+
16
+ ```javascript
17
+ // Bad
18
+ handleMessage(wsId, message) {
19
+ const session = this.sessions.get(message.sessionId);
20
+ session.bridge.sendInput(message.data); // crashes if session doesn't exist
21
+ }
22
+
23
+ // Good
24
+ handleMessage(wsId, message) {
25
+ if (!message.sessionId) {
26
+ return this.sendError(wsId, 'Missing sessionId');
27
+ }
28
+ const session = this.sessions.get(message.sessionId);
29
+ if (!session) {
30
+ return this.sendError(wsId, `Session '${message.sessionId}' not found`);
31
+ }
32
+ if (!session.active) {
33
+ return this.sendError(wsId, `Session '${message.sessionId}' is not active`);
34
+ }
35
+ session.bridge.sendInput(message.data);
36
+ }
37
+ ```
38
+
39
+ ## Error Messages Are UI
40
+
41
+ Error messages are read by other agents trying to debug. Make them actionable.
42
+
43
+ Every error message should answer three questions:
44
+
45
+ 1. What went wrong?
46
+ 2. What was expected?
47
+ 3. What should be done about it?
48
+
49
+ ```javascript
50
+ // Bad
51
+ throw new Error('Invalid');
52
+ throw new Error('Not found');
53
+ throw new Error('Failed');
54
+
55
+ // Good
56
+ throw new Error(`Session '${sessionId}' not found. Available sessions: [${[...sessions.keys()].join(', ')}]`);
57
+ throw new Error(`Bridge '${toolId}' is not available. Run 'which ${command}' to verify installation. Searched paths: ${searchPaths.join(', ')}`);
58
+ throw new Error(`WebSocket message missing required field 'type'. Received: ${JSON.stringify(message)}`);
59
+ ```
60
+
61
+ ## Cross-Platform Landmines
62
+
63
+ This codebase runs on both Windows and Linux. Every line of code that touches the filesystem, spawns a process, or handles paths must account for both.
64
+
65
+ ### Paths
66
+
67
+ - ALWAYS use `path.join()`, never string concatenation with `/` or `\\`
68
+ - Use `os.homedir()`, never `process.env.HOME` (undefined on Windows)
69
+ - File paths are case-insensitive on Windows, case-sensitive on Linux
70
+ - Use `path.resolve()` to normalize paths before comparison
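
The path-comparison rule above can be sketched as follows. This is a minimal illustration rather than code from the package: `samePath` and its `platform` parameter are hypothetical names, and folding case on Windows is an assumption based on the case-insensitivity noted in the list.

```javascript
const path = require('node:path');

// Hypothetical helper: compare two paths the way the target platform would.
// `platform` defaults to the current one; passing it explicitly lets the
// same check be exercised for Windows and Linux from any machine.
function samePath(a, b, platform = process.platform) {
  const impl = platform === 'win32' ? path.win32 : path.posix;
  // resolve() makes both paths absolute and collapses `..`, `.`,
  // duplicate separators, and trailing separators.
  const left = impl.resolve(a);
  const right = impl.resolve(b);
  // Assumption: Windows paths are compared case-insensitively.
  if (platform === 'win32') {
    return left.toLowerCase() === right.toLowerCase();
  }
  return left === right;
}
```

Passing the platform explicitly is a design choice for testability; production code would normally rely on the `process.platform` default.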
71
+
72
+ ### Process Spawning
73
+
74
+ - `where` on Windows, `which` on Linux -- check `process.platform`
75
+ - Windows uses ConPTY, Linux uses standard PTY -- different buffering behavior
76
+ - Executable extensions: `.exe`, `.cmd` on Windows, none on Linux
77
+ - Shell: `cmd.exe` or `powershell.exe` on Windows, `bash` or `sh` on Linux
78
+
79
+ ### Line Endings
80
+
81
+ - Never match output with exact strings -- use `.includes()` or `.trim()`
82
+ - Windows may inject `\r\n` where Linux gives `\n`
83
+ - PTY output may contain ANSI escape sequences -- strip them before comparing
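
One way to apply the three line-ending rules above is to normalize PTY output before asserting on it. The sketch below is an assumption-laden illustration: `normalizeOutput` and `outputContains` are hypothetical helpers, and the regular expression covers common CSI sequences (colors, cursor movement) rather than every terminal control code.

```javascript
// Matches CSI escape sequences: ESC [ parameters, optional intermediates,
// and a final byte. Assumption: this handles the common cases only.
const ANSI_CSI = /\x1b\[[0-9;?]*[ -\/]*[@-~]/g;

// Strip escape sequences and convert Windows line endings to `\n`.
function normalizeOutput(raw) {
  return raw.replace(ANSI_CSI, '').replace(/\r\n/g, '\n');
}

// Substring match on normalized output instead of exact string equality.
function outputContains(raw, expected) {
  return normalizeOutput(raw).includes(expected);
}
```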
84
+
85
+ ### The ConPTY Quirks
86
+
87
+ - Writes larger than 4096 bytes can overflow the ConPTY buffer on Windows
88
+ - Solution: chunked writes with delays (see `base-bridge.js` chunked write pattern)
89
+ - ConPTY may echo input back -- don't assume output is only from the spawned process
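
The chunked-write idea above can be sketched as follows. This is not the `base-bridge.js` implementation; `splitIntoChunks`, `writeChunked`, the 1024-unit chunk size, and the 10 ms delay are all assumptions chosen for illustration. A chunk of 1024 UTF-16 code units encodes to at most 3072 UTF-8 bytes, which stays under the 4096-byte limit mentioned above.

```javascript
// Split text into chunks of at most `maxUnits` UTF-16 code units without
// cutting a surrogate pair (e.g. an emoji) in half.
function splitIntoChunks(text, maxUnits = 1024) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxUnits, text.length);
    const last = text.charCodeAt(end - 1);
    const endsOnHighSurrogate = last >= 0xd800 && last <= 0xdbff;
    // Back off by one unit so the pair moves to the next chunk together.
    if (end < text.length && endsOnHighSurrogate && end - start > 1) {
      end -= 1;
    }
    chunks.push(text.slice(start, end));
    start = end;
  }
  return chunks;
}

// Write each chunk through `write` (e.g. a PTY write function), pausing
// between chunks so the receiving buffer can drain.
async function writeChunked(write, text, { maxUnits = 1024, delayMs = 10 } = {}) {
  for (const chunk of splitIntoChunks(text, maxUnits)) {
    write(chunk);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```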
90
+
91
+ ## Async Safety
92
+
93
+ Node.js is async-first. Since Node.js 15, an unhandled promise rejection terminates the process by default.
94
+
95
+ Rules:
96
+
97
+ - Every `async` function must have try-catch at the top level
98
+ - Every `.then()` chain must have a `.catch()`
99
+ - Event handlers that call async code must wrap in try-catch
100
+ - Use the spawn watchdog pattern from `base-bridge.js`: set a timer when spawning a process, kill it if no output arrives within 30 seconds
101
+
102
+ ```javascript
103
+ // Bad -- unhandled rejection if startSession throws
104
+ ws.on('message', (data) => {
105
+ const msg = JSON.parse(data);
106
+ this.startSession(msg.sessionId);
107
+ });
108
+
109
+ // Good
110
+ ws.on('message', (data) => {
111
+ try {
112
+ const msg = JSON.parse(data);
113
+ this.startSession(msg.sessionId).catch(err => {
114
+ console.error(`Failed to start session ${msg.sessionId}:`, err);
115
+ this.sendError(wsId, err.message);
116
+ });
117
+ } catch (err) {
118
+ console.error('Failed to parse WebSocket message:', err);
119
+ }
120
+ });
121
+ ```
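
The spawn watchdog rule listed above can be illustrated with a small timer wrapper. This is a sketch under stated assumptions, not the pattern as written in `base-bridge.js`: `startWatchdog` is a hypothetical name, and the caller is assumed to cancel it when the first output arrives.

```javascript
// Hypothetical helper: invoke `onTimeout` after `timeoutMs` unless
// cancel() is called first (for example, when the process emits output).
function startWatchdog(timeoutMs, onTimeout) {
  let timer = setTimeout(() => {
    timer = null;
    onTimeout();
  }, timeoutMs);
  return {
    // Safe to call more than once.
    cancel() {
      if (timer !== null) {
        clearTimeout(timer);
        timer = null;
      }
    },
    // True while the timeout is still pending.
    isActive() {
      return timer !== null;
    },
  };
}
```

A caller would typically create the watchdog right after spawning with a 30000 ms timeout, kill the process in `onTimeout`, and call `cancel()` from the first output handler.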
122
+
123
+ ## Fail Fast, Fail Loud
124
+
125
+ Silent failures are the worst kind. They create bugs that surface hours or sessions later, with no trail.
126
+
127
+ - Assert preconditions at function entry -- don't wait until line 50 to discover the input was invalid
128
+ - Log errors with full context before re-throwing: what function, what inputs, what state
129
+ - Never `catch` and silently swallow: `catch (err) { /* ignore */ }` -- this is forbidden
130
+ - If something "shouldn't happen," make it throw, not silently return null
131
+
132
+ ```javascript
133
+ // Bad -- silent null propagation
134
+ function getSession(id) {
135
+ return sessions.get(id); // returns undefined silently
136
+ }
137
+
138
+ // Good -- fail fast with context
139
+ function getSession(id) {
140
+ const session = sessions.get(id);
141
+ if (!session) {
142
+ throw new Error(`getSession: no session with id '${id}'. Active sessions: ${sessions.size}`);
143
+ }
144
+ return session;
145
+ }
146
+ ```
147
+
148
+ ## The "Fresh Machine" Test
149
+
150
+ Before considering any code complete, ask yourself: "Would this work on a brand new GitHub Actions runner with nothing pre-installed except Node.js 22?"
151
+
152
+ This means:
153
+
154
+ - No reliance on globally installed tools (unless you check for them and give a clear error)
155
+ - No hardcoded paths that only exist on your dev machine
156
+ - No cached `node_modules` assumptions -- `npm ci` installs from scratch
157
+ - No file system state left over from previous runs
158
+ - No environment variables that aren't set in CI
159
+
160
+ If the answer is "maybe," add a runtime check:
161
+
162
+ ```javascript
163
+ const commandPath = await this.findCommandAsync();
164
+ if (!commandPath) {
165
+ throw new Error(
166
+ `${this.toolName} CLI not found. Searched: ${this.searchPaths.join(', ')}. ` +
167
+ `Install ${this.toolName} or add it to PATH.`
168
+ );
169
+ }
170
+ ```
@@ -0,0 +1,268 @@
1
+ # CI-First Testing
2
+
3
+ ## E2E Tests Are the Source of Truth
4
+
5
+ End-to-end tests are the only true way to validate that the system works. Unit tests verify isolated logic. E2E tests prove the whole system -- server, WebSocket, terminal, browser UI -- actually functions as a user would experience it.
6
+
7
+ A feature is not done until its E2E tests pass on GitHub runners. If unit tests pass but E2E fails, the feature is broken. Period. No exceptions. No "it works on my machine." The GitHub runner is the only machine that matters.
8
+
9
+ Every new feature must have E2E test coverage. Every bug fix must have a regression E2E test. The E2E suite is the contract that tells the next agent "this is what working looks like."
10
+
11
+ ### Long E2E waits indicate bugs
12
+
13
+ If an E2E test requires long waits or generous timeouts to pass, that is a signal of a bug in the product code, not a test timing issue. No real user is going to wait 30 seconds for a terminal to respond or 10 seconds for a WebSocket to connect. If the test needs that much patience, the code is too slow and must be fixed. Tightening test timeouts is a legitimate way to catch performance regressions -- the test should reflect realistic user expectations, not compensate for sluggish code.
14
+
15
+ ## The Rule: CI Only
16
+
17
+ CI is the only authority on whether code works. Never consider a feature done based on local results alone.
18
+
19
+ Why:
20
+
21
+ - Local environments accumulate stale state, cached modules, and leftover config
22
+ - Native modules like `@lydell/node-pty` may not compile correctly locally
23
+ - Playwright browsers may be outdated or misconfigured locally
24
+ - Local testing only proves it works on one machine, one platform
25
+ - CI runs on both ubuntu-latest AND windows-latest -- that is the real test
26
+ - CI gives fresh, reproducible, cross-platform results every single time
27
+
28
+ You may run quick local checks for rapid iteration (e.g., syntax checks, single-file linting), but a feature is not done until CI passes. The GitHub runner is the only environment whose results count.
29
+
30
+ ## The Workflow
31
+
32
+ ```
33
+ Write code
34
+ |
35
+ v
36
+ Push to branch
37
+ |
38
+ v
39
+ Open draft PR (triggers CI automatically)
40
+ |
41
+ v
42
+ Wait for CI results (~5-7 minutes)
43
+ |
44
+ v
45
+ Read results: all green? --> Done
46
+ |
47
+ v (if red)
48
+ Download failure artifacts
49
+ |
50
+ v
51
+ Read traces, screenshots, terminal buffers
52
+ |
53
+ v
54
+ Fix the issue
55
+ |
56
+ v
57
+ Push again --> CI runs again --> repeat until green
58
+ ```
59
+
60
+ Use `gh pr create --draft` to trigger CI without requesting review. Use `gh run watch` to monitor CI progress from the terminal.
61
+
62
+ ## CI Job Map
63
+
64
+ The CI pipeline is defined in `.github/workflows/ci.yml`. It runs these jobs in parallel, each on both ubuntu-latest and windows-latest:
65
+
66
+ | Job | What it tests | Playwright Project | Tests |
67
+ |-----|--------------|-------------------|-------|
68
+ | `test` | Unit tests (Mocha) | N/A | `test/*.test.js` |
69
+ | `test-browser-golden` | Fresh user flow with real CLI | `golden-path` | `01-golden-path.spec.js` |
70
+ | `test-browser-functional-core` | Core terminal features | `functional-core` | `02-terminal-io`, `03-clipboard`, `04-context-menu`, `05-tab-switching` |
71
+ | `test-browser-functional-extended` | Extended features | `functional-extended` | `06-large-paste`, `07-vim-and-session`, `09-image-paste`, `09-background-notifications` |
72
+ | `test-browser-mobile` | Mobile viewport behavior | `mobile-iphone`, `mobile-pixel` | `08-mobile-portrait.spec.js` |
73
+ | `test-browser-visual` | Screenshot regression | `visual-regression` | `09-visual-regression.spec.js` |
74
+ | `test-browser-new-features` | Latest features | `new-features` | `10-command-palette` through `14-nerd-font-rendering` |
75
+ | `build-binary` | SEA binary build + smoke test | N/A | `scripts/smoke-test-binary.js` |
76
+
77
+ Total: 16 parallel job executions (8 job types x 2 platforms). All must pass for a green CI.
78
+
79
+ ### Playwright Project Configuration
80
+
81
+ The Playwright config at `e2e/playwright.config.js` defines how test files map to projects:
82
+
83
+ - `golden-path` matches `01-golden-path.spec.js`
84
+ - `functional-core` matches `/0[2-5]-.*\.spec\.js/`
85
+ - `functional-extended` matches `/0[6-7]-.*\.spec\.js|09-image-paste\.spec\.js|09-background-.*\.spec\.js/`
86
+ - `mobile-iphone` and `mobile-pixel` both match `08-mobile-portrait.spec.js` (with device-specific viewports)
87
+ - `visual-regression` matches `09-visual-regression.spec.js`
88
+ - `new-features` matches `/1[0-4]-.*\.spec\.js/`
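
To check which project a spec file lands in, the patterns listed above can be applied directly. The sketch below assumes the config expresses each mapping as a `testMatch` regular expression on a project entry; the `projects` array is an illustrative reconstruction from the list, not a copy of `e2e/playwright.config.js`, and `projectsFor` is a hypothetical helper.

```javascript
// Assumed shape: mirrors Playwright `projects` entries (name + testMatch).
const projects = [
  { name: 'golden-path', testMatch: /01-golden-path\.spec\.js/ },
  { name: 'functional-core', testMatch: /0[2-5]-.*\.spec\.js/ },
  {
    name: 'functional-extended',
    testMatch: /0[6-7]-.*\.spec\.js|09-image-paste\.spec\.js|09-background-.*\.spec\.js/,
  },
  { name: 'mobile-iphone', testMatch: /08-mobile-portrait\.spec\.js/ },
  { name: 'mobile-pixel', testMatch: /08-mobile-portrait\.spec\.js/ },
  { name: 'visual-regression', testMatch: /09-visual-regression\.spec\.js/ },
  { name: 'new-features', testMatch: /1[0-4]-.*\.spec\.js/ },
];

// Return the names of every project whose pattern matches the spec file.
function projectsFor(specFile) {
  return projects
    .filter((project) => project.testMatch.test(specFile))
    .map((project) => project.name);
}
```

Under these patterns a file numbered `15-` or higher matches no project, so the `new-features` pattern would need widening before such a test could run in CI.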
89
+
90
+ ## Reading CI Failures
91
+
92
+ When CI fails:
93
+
94
+ 1. **Go to the Actions tab** on the PR. Find the failed run.
95
+ 2. **Identify the failing job.** Note which platform (ubuntu vs windows).
96
+ 3. **Read the job log.** Expand the failed step, look for the error message.
97
+ 4. **Download artifacts.** Each browser test job uploads artifacts (the functional jobs upload whenever the run is not cancelled, via `if: ${{ !cancelled() }}`):
98
+ - `playwright-{job}-{os}.zip` -- contains test results, screenshots, traces
99
+ - `screenshot-baselines-{os}` -- visual regression baselines (visual job only)
100
+ - `screenshot-diffs-{os}` -- visual diff images (visual job only, on failure)
101
+
102
+ ### What the artifacts contain
103
+
104
+ - **Screenshots**: Captured on failure -- shows what the browser actually rendered
105
+ - **Traces**: Playwright trace files -- DOM snapshots, network requests, console logs at each test step (captured on first retry via `trace: 'on-first-retry'`)
106
+ - **Terminal buffer**: The xterm.js buffer content at failure time -- shows what the terminal displayed
107
+ - **WebSocket logs**: Messages exchanged between client and server
108
+ - **Console logs**: Browser console output captured by `setupPageCapture()`
109
+
110
+ ### Platform-specific failures
111
+
112
+ - **Fails on Windows only**: Usually path handling (`\\` vs `/`), shell command differences (`where` vs `which`), ConPTY buffering, or line ending issues
113
+ - **Fails on Linux only**: Usually permission issues, case-sensitive file names, or missing system dependencies
114
+ - **Fails on both**: Real bug in application logic
115
+
116
+ ## Using Playwright Traces
117
+
118
+ Download the trace from CI artifacts and view it:
119
+
120
+ ```bash
121
+ # Download artifacts (use gh CLI)
122
+ gh run download <run-id> -n playwright-functional-core-ubuntu-latest
123
+
124
+ # View trace in browser
125
+ npx playwright show-trace e2e/test-results/path-to-trace.zip
126
+ ```
127
+
128
+ The trace viewer shows:
129
+
130
+ - Step-by-step test execution with timestamps
131
+ - DOM snapshot at each step (inspectable)
132
+ - Network requests and responses
133
+ - Console log entries
134
+ - Screenshots before and after each action
135
+
136
+ This is the most powerful debugging tool available. Use it.
137
+
138
+ ## Check History Before Debugging
139
+
140
+ Before investigating any CI failure, check `docs/history/` for known issues and prior solutions. The problem may already be solved. If it's new, document the solution after fixing (see `07-docs-hygiene.md` for format).
141
+
142
+ ## E2E Tests as Debugging Tools
143
+
144
+ E2E tests serve a dual purpose: validation and documentation.
145
+
146
+ ### Understanding expected behavior
147
+
148
+ Each spec file demonstrates how a feature should work. Before modifying a feature, read its test first -- it shows the intended behavior more precisely than any spec document.
149
+
150
+ ### When a test fails, consider both sides
151
+
152
+ A failing test means something is wrong, but the bug could live in either place:
153
+
154
+ - **Product code bug** -- The code doesn't work as intended. Fix the code, not the test (see ADR-0006).
155
+ - **Test mistake** -- The test has incorrect assertions, wrong selectors, bad timing, or flawed assumptions about expected behavior.
156
+
157
+ Always investigate both possibilities before committing a fix. Read the test carefully -- does it actually test the right thing? Then read the product code -- does it actually do what the spec says? Fixing the wrong side creates a false sense of security.
158
+
159
+ ### Reproducing bugs
160
+
161
+ 1. Find the closest existing test to the reported behavior
162
+ 2. Modify it (or add a new test case) to reproduce the issue
163
+ 3. Push to CI -- if the test fails, you have confirmed the bug
164
+ 4. Determine whether the bug is in the code or the test
165
+ 5. Fix the correct side
166
+ 6. Push again -- test should pass, confirming the fix
167
+
168
+ ### Adding regression tests
169
+
170
+ Every bug fix must include an E2E test that would have caught the bug. This prevents regression and documents the fix for future agents.
171
+
172
+ ## Creating New E2E Tests
173
+
174
+ ### Naming Convention
175
+
176
+ Tests are numbered by category:
177
+
178
+ - `01-*` -- Golden path (fresh user flow)
179
+ - `02-05` -- Core functional features (functional-core project)
180
+ - `06-07` -- Extended functional features (functional-extended project)
181
+ - `08-*` -- Mobile-specific
182
+ - `09-*` -- Cross-cutting: `09-image-paste` and `09-background-notifications` in functional-extended, `09-visual-regression` in visual-regression project
183
+ - `10-14` -- New features
184
+
185
+ Add new tests with the next available number in the appropriate range. Currently the highest number is `14-nerd-font-rendering.spec.js`.
186
+
187
+ ### Test Structure
188
+
189
+ ```javascript
190
+ const { test, expect } = require('@playwright/test');
191
+ const { createServer, createSessionViaApi } = require('../helpers/server-factory');
192
+ const {
193
+ waitForAppReady,
194
+ waitForTerminalCanvas,
195
+ typeInTerminal,
196
+ waitForTerminalText,
197
+ setupPageCapture,
198
+ attachFailureArtifacts,
199
+ joinSessionAndStartTerminal,
200
+ } = require('../helpers/terminal-helpers');
201
+
202
+ test.describe('Feature Name', () => {
203
+ let server, port, url;
204
+
205
+ test.beforeAll(async () => {
206
+ ({ server, port, url } = await createServer());
207
+ });
208
+
209
+ test.afterAll(async () => {
210
+ if (server) server.close();
211
+ });
212
+
213
+ test.afterEach(async ({ page }, testInfo) => {
214
+ await attachFailureArtifacts(page, testInfo);
215
+ });
216
+
217
+ test('should do the expected thing', async ({ page }) => {
218
+ setupPageCapture(page);
219
+ const sessionId = await createSessionViaApi(port, `Test_${Date.now()}`);
220
+ await page.goto(url);
221
+ await waitForAppReady(page);
222
+ await waitForTerminalCanvas(page);
223
+ await joinSessionAndStartTerminal(page, sessionId);
224
+ // ... test logic using terminal helpers
225
+ });
226
+ });
227
+ ```
228
+
229
+ ### Available Helpers
230
+
231
+ From `e2e/helpers/terminal-helpers.js`:
232
+
233
+ - `waitForAppReady(page)` -- Wait for app to fully initialize
234
+ - `waitForTerminalCanvas(page)` -- Wait for xterm.js container to render
235
+ - `focusTerminal(page)` -- Focus the terminal textarea for keyboard input
236
+ - `typeInTerminal(page, text)` -- Type text into the terminal with per-character delay
237
+ - `pressKey(page, key)` -- Press a key or key combination (e.g. `'Enter'`, `'Control+c'`)
238
+ - `readTerminalContent(page)` -- Read current terminal buffer via xterm.js API
239
+ - `waitForTerminalText(page, text, timeout)` -- Wait for specific text to appear in terminal
240
+ - `getTerminalDimensions(page)` -- Get terminal cols and rows
241
+ - `setupPageCapture(page)` -- Capture WebSocket messages and console logs (call before `page.goto()`)
242
+ - `attachFailureArtifacts(page, testInfo)` -- Attach debug artifacts on test failure (call in `afterEach`)
243
+ - `waitForWebSocket(page)` -- Wait for WebSocket connection to be open
244
+ - `joinSessionAndStartTerminal(page, sessionId)` -- Full session setup: join session and start terminal tool
245
+
246
+ From `e2e/helpers/server-factory.js`:
247
+
248
+ - `createServer()` -- Start a test server instance, returns `{ server, port, url }`
249
+ - `createSessionViaApi(port, name)` -- Create a session via REST API, returns sessionId
250
+
251
+ ### Registering in Playwright Config
252
+
253
+ Add new tests to the appropriate project in `e2e/playwright.config.js` by updating the `testMatch` pattern. Then update the corresponding CI job in `.github/workflows/ci.yml` if the new test does not already match an existing project regex.
254
+
255
+ New feature tests numbered 10-14 automatically match the `new-features` project regex `/1[0-4]-.*\.spec\.js/`. If you need number 15 or higher, update the regex.
256
+
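To make the project routing concrete, here is a minimal sketch that checks which project a spec file name would land in. The three patterns are copied from the text above and from `e2e/playwright.config.js`; the `matchingProjects` helper and the `15-hypothetical.spec.js` file name are illustrative assumptions, not part of the repository. The golden path, mobile, and visual regression projects are left out.

```javascript
// Patterns copied from the documentation and e2e/playwright.config.js.
const projects = {
  'functional-core': /0[2-5]-.*\.spec\.js/,
  'functional-extended': /0[6-7]-.*\.spec\.js|09-image-paste\.spec\.js|09-background-.*\.spec\.js/,
  'new-features': /1[0-4]-.*\.spec\.js/,
};

// Hypothetical helper: returns the names of all projects whose pattern
// matches the given spec file name.
function matchingProjects(fileName) {
  return Object.entries(projects)
    .filter(([, pattern]) => pattern.test(fileName))
    .map(([name]) => name);
}

console.log(matchingProjects('14-nerd-font-rendering.spec.js'));
console.log(matchingProjects('09-image-paste.spec.js'));
// Hypothetical file name: number 15 is outside the 10-14 range.
console.log(matchingProjects('15-hypothetical.spec.js'));
```

A file that comes back with an empty list, such as a test numbered 15, would not run in any of these projects until a pattern is widened, for example to `/1[0-5]-.*\.spec\.js/`.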
257
+ ## Performance Budget
258
+
259
+ No single CI job should take more than 7 minutes. This is a hard limit.
260
+
261
+ Fast CI feedback is critical for the push-fix-push workflow. If a job exceeds 7 minutes:
262
+
263
+ 1. Check if the job has too many tests -- split into sub-groups
264
+ 2. Check for tests with excessive waits or timeouts that could be tightened
265
+ 3. Consider splitting the job into multiple CI matrix entries
266
+ 4. Open an issue to track and fix the performance regression
267
+
268
+ Monitor job times after adding new E2E tests. Growth is expected, but the 7-minute budget must hold.
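One way to monitor the budget is a small check over recorded job durations. The sketch below is an assumption about how such a check could look: the `jobsOverBudget` helper is hypothetical, and the durations are made-up sample values, not measurements from this project's CI.

```javascript
// Budget from the section above: 7 minutes per CI job.
const BUDGET_SECONDS = 7 * 60;

// Hypothetical helper: takes { jobName: durationInSeconds } and returns
// the jobs that exceed the budget, slowest first.
function jobsOverBudget(durations) {
  return Object.entries(durations)
    .filter(([, seconds]) => seconds > BUDGET_SECONDS)
    .sort(([, a], [, b]) => b - a)
    .map(([job, seconds]) => ({ job, overBySeconds: seconds - BUDGET_SECONDS }));
}

// Sample input with invented durations, for illustration only.
const sample = {
  'test-browser-functional-core': 395,
  'test-browser-functional-extended': 451,
};

console.log(jobsOverBudget(sample));
```

Any job that shows up in the result is a candidate for the splitting steps listed above.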
@@ -0,0 +1,124 @@
1
+ # Documentation Hygiene
2
+
3
+ ## The Spec-Code Contract
4
+
5
+ Every component in this codebase has a specification in `docs/specs/`. This is a binding contract:
6
+
7
+ - If behavior changes, the spec MUST be updated in the same commit. Not the next commit. Not the next PR. The same commit.
8
+ - If the spec says X and the code says Y, the code is wrong -- until the spec is deliberately updated.
9
+ - Pull requests that change behavior without updating specs are incomplete and should not be merged.
10
+
11
+ This is not bureaucracy. This is how agents that don't share memory stay in sync. The spec is the source of truth that persists across sessions.
12
+
13
+ ## When to Update What
14
+
15
+ | You did this | Update this |
16
+ |---|---|
17
+ | Added a new feature | Write or update spec in `docs/specs/` + write ADR if architectural decision was made |
18
+ | Fixed a bug | Update spec if behavior changed + add entry to `docs/history/` with root cause and fix |
19
+ | Refactored code | Write ADR if pattern changed + update spec if API surface changed |
20
+ | Added a dependency | Write ADR with research findings (version, license, CVE check, alternatives considered) |
21
+ | Changed the CI pipeline | Update `docs/agent-instructions/03-tooling-and-pipelines.md` and `06-ci-first-testing.md` |
22
+ | Changed WebSocket protocol | Update `docs/architecture/websocket-protocol.md` + update server spec |
23
+ | Added a new bridge | Update `docs/specs/bridges.md` + update `docs/architecture/bridge-pattern.md` |
24
+ | Changed E2E test structure | Update `docs/specs/e2e-testing.md` + update `06-ci-first-testing.md` CI job map |
25
+
26
+ When in doubt: update the docs. Over-documentation is always better than under-documentation in an AI-agent-driven codebase.
27
+
28
+ ## ADR Lifecycle
29
+
30
+ Architecture Decision Records are permanent artifacts. They capture the context, reasoning, and trade-offs of a decision at the time it was made.
31
+
32
+ ### Creating a new ADR
33
+
34
+ - Use the template at `docs/adrs/0000-template.md`
35
+ - Number sequentially: find the highest existing number and increment
36
+ - Status: "Accepted" with today's date
37
+ - Include: Context (why this decision was needed), Decision (what was chosen), Consequences (positive and negative)
38
+
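The "find the highest existing number and increment" step can be sketched as a small function. It assumes ADR files follow the four-digit prefix seen in `0000-template.md`; the `nextAdrNumber` helper and the sample file names other than the template are hypothetical.

```javascript
// Hypothetical helper: given the file names in docs/adrs/, return the next
// sequential ADR number as a zero-padded four-digit string.
function nextAdrNumber(fileNames) {
  const numbers = fileNames
    .map((name) => /^(\d{4})-/.exec(name)) // keep only names with a 4-digit prefix
    .filter(Boolean)
    .map((match) => Number(match[1]));
  const highest = numbers.length > 0 ? Math.max(...numbers) : 0;
  return String(highest + 1).padStart(4, '0');
}

// Sample listing; only 0000-template.md is named in the documentation.
console.log(nextAdrNumber([
  '0000-template.md',
  '0005-example-decision.md',
  '0006-another-example.md',
  'README.md',
]));
```

The template's number `0000` never wins the comparison once a real ADR exists, so it needs no special case.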
39
+ ### Changing a decision
40
+
41
+ Never rewrite the content of an accepted ADR; the only permitted change is the supersession note described below. The original context and reasoning are historically valuable.
42
+
43
+ Instead:
44
+
45
+ 1. Create a new ADR that supersedes the old one
46
+ 2. In the new ADR, reference the old one: "Supersedes ADR-XXXX"
47
+ 3. In the old ADR, add a note: "Superseded by ADR-YYYY" with the date
48
+ 4. Keep the old ADR's original content intact
49
+
50
+ ### When an ADR is required
51
+
52
+ - Choosing between architectural approaches (e.g., ADR-0001: bridge base class)
53
+ - Adding or removing dependencies (e.g., ADR-0002: devtunnels over ngrok)
54
+ - Changing system topology (e.g., ADR-0003: multi-tool architecture)
55
+ - Platform-specific decisions (e.g., ADR-0004: cross-platform support)
56
+ - Distribution changes (e.g., ADR-0005: single binary distribution)
57
+ - Process decisions (e.g., ADR-0006: test-driven bug fixes)
58
+
59
+ ## History as Institutional Memory
60
+
61
+ `docs/history/` is the most important directory for autonomous AI agents. It's where lessons live.
62
+
63
+ LLMs don't carry memories between sessions. Every new session starts from zero context. The ONLY way to learn from past mistakes, debugging sessions, and hard-won insights is to write them down in `docs/history/`.
64
+
65
+ ### What to document
66
+
67
+ - Non-trivial bug fixes (especially platform-specific ones)
68
+ - CI failure patterns and their solutions
69
+ - Cross-platform gotchas discovered during development
70
+ - Debugging sessions that took significant effort
71
+ - Performance issues and how they were resolved
72
+ - Dependency conflicts and their resolutions
73
+
74
+ ### Format
75
+
76
+ File name: `YYYY-MM-DD-short-description.md`
77
+
78
+ Content structure:
79
+
80
+ ```markdown
81
+ # Short Description
82
+
83
+ ## What Happened
84
+ [The symptom or error observed. Include error messages, CI job names, platforms affected.]
85
+
86
+ ## Root Cause
87
+ [What actually caused the issue. Be specific -- which file, which line, which assumption was wrong.]
88
+
89
+ ## Fix
90
+ [What was changed and why. Reference commit hashes or PR numbers.]
91
+
92
+ ## Watch For
93
+ [Conditions that might trigger the same issue again. What future agents should be careful about.]
94
+ ```
95
+
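As an illustration of the `YYYY-MM-DD-short-description.md` naming rule, here is a minimal sketch that derives a file name from a date and a title. The `historyFileName` helper and its slug rules (lowercase, runs of non-alphanumeric characters collapsed to a single hyphen) are assumptions; the documentation only fixes the overall shape of the name.

```javascript
// Hypothetical helper: builds a docs/history/ file name.
// The date part uses UTC, because toISOString() always reports UTC.
function historyFileName(date, description) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  const slug = description
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything else into one hyphen
    .replace(/^-+|-+$/g, ''); // trim leading and trailing hyphens
  return `${day}-${slug}.md`;
}

// Invented example title, for illustration only.
console.log(historyFileName(new Date('2024-01-15T12:00:00Z'), 'Windows PTY resize race'));
// prints "2024-01-15-windows-pty-resize-race.md"
```

Date-first names sort chronologically in a directory listing, which helps when scanning `docs/history/` for earlier occurrences of a failure.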
96
+ ### The rule
97
+
98
+ Before debugging any failure, check `docs/history/` first. If the problem has been solved before, the answer is already there. If it hasn't, document your solution after fixing it.
99
+
100
+ A solved problem that isn't documented is a problem that will be solved again.
101
+
102
+ ## Stale Docs Are Bugs
103
+
104
+ Outdated documentation is not a low-priority cleanup task. It's a bug. It actively misleads the next agent, causing incorrect implementations, wasted CI cycles, and rework.
105
+
106
+ Treat stale docs with the same urgency as a failing test:
107
+
108
+ - If you notice a spec that doesn't match current behavior, update it immediately
109
+ - If you find an ADR that references deleted code, note it
110
+ - If a history entry has incorrect information, correct it
111
+ - If agent instructions reference outdated patterns, fix them
112
+
113
+ ## Pre-Commit Documentation Checklist
114
+
115
+ Before every commit, ask yourself these 6 questions:
116
+
117
+ 1. **Did I change behavior?** -- Update the relevant spec in `docs/specs/`
118
+ 2. **Did I make an architectural decision?** -- Write an ADR in `docs/adrs/`
119
+ 3. **Did I fix a bug?** -- Add a history entry in `docs/history/`
120
+ 4. **Did I solve a non-obvious problem?** -- Add a history entry in `docs/history/`
121
+ 5. **Did I change an API surface?** -- Update method signatures in the spec
122
+ 6. **Did I introduce a new pattern?** -- Document it in `docs/architecture/`
123
+
124
+ If the answer to any of these is yes and you haven't updated docs, your commit is incomplete.
@@ -0,0 +1,168 @@
1
+ # Multi-Agent Consultation
2
+
3
+ ## When In Doubt, Consult
4
+
5
+ Don't guess at architecture. Don't guess at testing strategy. Don't guess at UX. Don't guess at requirements.
6
+
7
+ When facing a decision that could go multiple ways, spawn specialized subagents to get expert perspectives before committing to an approach. The cost of consulting is a few minutes. The cost of guessing wrong is hours of rework, broken CI, and confused future agents.
8
+
9
+ This is not optional for significant decisions. This is how a well-run engineering org operates -- you get input from experts before making calls that affect the whole system.
10
+
11
+ If your runtime does not support spawning subagents, adopt the expert role yourself: explicitly state "Thinking as a Principal Engineer..." and reason from that perspective before proceeding. The goal is the expert thinking, not the subagent mechanism.
12
+
13
+ ## Available Expert Roles
14
+
15
+ Beyond the 5 team agents (Architect, Engineer, QA Reviewer, Troubleshooter, Researcher), consult these senior perspectives when the situation calls for it:
16
+
17
+ ### Architect
18
+
19
+ **When to consult:** System design, component boundaries, data flow, protocol changes.
20
+ **Ask for:** Design review, alternative approaches, risk assessment.
21
+
22
+ ### Principal Engineer
23
+
24
+ **When to consult:** Deep technical decisions, performance trade-offs, system reliability, concurrency issues, platform-specific behavior.
25
+ **Ask for:** Technical feasibility assessment, performance implications, edge case analysis.
26
+
27
+ ### Lead QA
28
+
29
+ **When to consult:** Test strategy, coverage gaps, regression risk, E2E test design, CI pipeline changes.
30
+ **Ask for:** Test plan review, risk assessment, coverage recommendations.
31
+
32
+ ### Principal Program Manager
33
+
34
+ **When to consult:** Requirements clarity, scope decisions, feature prioritization, user-facing changes, backwards compatibility.
35
+ **Ask for:** Requirements validation, scope check, impact analysis.
36
+
37
+ ### Designer
38
+
39
+ **When to consult:** UI/UX decisions, interaction patterns, accessibility, visual consistency, mobile behavior.
40
+ **Ask for:** Interaction review, accessibility audit, visual consistency check.
41
+
42
+ ### Lead User Researcher
43
+
44
+ **When to consult:** User impact assessment, usability concerns, workflow analysis, onboarding experience.
45
+ **Ask for:** User impact assessment, usability review, workflow validation.
46
+
47
+ ## Parallel Consultation
48
+
49
+ When facing a complex decision, spawn multiple expert subagents in parallel. Don't consult one at a time -- that wastes time.
50
+
51
+ ### Example: Changing the WebSocket protocol
52
+
53
+ This affects architecture, implementation, testing, and user experience. Consult simultaneously:
54
+
55
+ - **Architect** -- Is the protocol change consistent with existing patterns? What are the migration concerns?
56
+ - **Principal Engineer** -- What are the performance implications? Are there concurrency edge cases?
57
+ - **Lead QA** -- What tests need to change? What regression risks exist?
58
+
59
+ All three can run in parallel and return independent assessments.
60
+
61
+ ### Example: Adding a new UI component
62
+
63
+ - **Designer** -- Does it fit the existing design language? Is it accessible?
64
+ - **Lead User Researcher** -- Will users understand it? Does it fit the workflow?
65
+ - **Engineer** -- What's the implementation approach? What existing patterns apply?
66
+
67
+ ### Example: Debugging a platform-specific CI failure
68
+
69
+ - **Troubleshooter** -- What's the root cause? What's the minimal fix?
70
+ - **Principal Engineer** -- Is there a deeper architectural issue? Will this recur?
71
+ - **Lead QA** -- What test coverage is missing? How do we prevent regression?
72
+
73
+ ## How to Frame a Consultation
74
+
75
+ Give each subagent full context. A vague question gets a vague answer.
76
+
77
+ ### What to include in every consultation request
78
+
79
+ 1. **What you're trying to do** -- The goal, not just the task
80
+ 2. **What you've considered** -- Options you've thought about and why you're unsure
81
+ 3. **What constraints exist** -- Cross-platform requirements, performance budgets, backwards compatibility needs
82
+ 4. **What you need back** -- A specific deliverable: recommendation, risk assessment, alternative approaches, code review
83
+
84
+ ### Good consultation prompt
85
+
86
+ ```
87
+ I need to add chunked file upload support to the WebSocket protocol.
88
+
89
+ Context: Currently image uploads send the entire base64 payload in one message.
90
+ For files over 1MB this causes WebSocket frame size issues on some browsers.
91
+
92
+ Options I'm considering:
93
+ 1. Split into multiple WebSocket messages with sequence numbers
94
+ 2. Use a separate HTTP upload endpoint
95
+ 3. Use WebSocket binary frames with streaming
96
+
97
+ Constraints:
98
+ - Must work on both desktop and mobile browsers
99
+ - Must not break existing image paste flow
100
+ - Server must handle concurrent uploads from multiple sessions
101
+
102
+ Please assess each option for: implementation complexity, reliability,
103
+ cross-browser compatibility, and impact on existing code.
104
+ ```
105
+
106
+ ### Bad consultation prompt
107
+
108
+ ```
109
+ How should I handle file uploads?
110
+ ```
111
+
112
+ ## When to Consult
113
+
114
+ Always consult for:
115
+
116
+ - **Architectural changes** -- New patterns, component restructuring, protocol changes
117
+ - **Breaking API changes** -- WebSocket message format, REST endpoint changes
118
+ - **New dependencies** -- Any npm package addition (Researcher for vetting, Architect for fit)
119
+ - **UX-visible changes** -- Anything a user would notice (Designer + User Researcher)
120
+ - **Test strategy changes** -- New testing patterns, CI pipeline changes (Lead QA)
121
+ - **Performance-critical code** -- Anything in the hot path (Principal Engineer)
122
+ - **Security-sensitive code** -- Auth, input validation, path traversal (Principal Engineer + QA)
123
+
124
+ Skip consultation for:
125
+
126
+ - Typo fixes
127
+ - Comment updates
128
+ - Straightforward bug fixes where the root cause is clear
129
+ - Documentation-only changes
130
+
131
+ ## Synthesizing Advice
132
+
133
+ When experts disagree (and they will), handle it systematically:
134
+
135
+ 1. **Document the disagreement** -- What does each expert recommend and why?
136
+ 2. **Identify the core tension** -- Is it between performance and simplicity? Between speed and correctness?
137
+ 3. **Make a decision** -- You can't wait for consensus. Weigh the arguments and choose.
138
+ 4. **Record it in an ADR** -- Document what was decided, what alternatives were considered, and why you chose this path.
139
+ 5. **Move forward** -- Don't second-guess. If the decision proves wrong later, a future agent can write a new ADR that supersedes.
140
+
141
+ Disagreement between experts is a signal that the decision is important, not that it's impossible.
142
+
143
+ ## Post-Completion Review Is Mandatory
144
+
145
+ After completing any non-trivial work, spawn a reviewer subagent to review what you did before considering the task done. This is not optional.
146
+
147
+ Self-review is unreliable -- the same blind spots that led to mistakes in implementation will exist during self-review. An independent reviewer subagent operates with fresh context and catches issues you missed.
148
+
149
+ ### What the reviewer should check
150
+
151
+ - Code correctness and edge cases
152
+ - Cross-platform compatibility (Windows + Linux)
153
+ - Test coverage completeness
154
+ - Documentation updates (specs, ADRs, history)
155
+ - Adherence to coding conventions
156
+ - Security concerns (input validation, path traversal, injection)
157
+ - Performance implications
158
+
159
+ ### How to run the review
160
+
161
+ Spawn a QA Reviewer or Lead QA subagent with:
162
+
163
+ 1. A summary of what was changed and why
164
+ 2. The list of files modified
165
+ 3. The relevant spec and ADR references
166
+ 4. A request to verify: correctness, test coverage, doc completeness, cross-platform safety
167
+
168
+ The reviewer's findings should be addressed before marking the task as done. If the reviewer identifies issues, fix them and re-review. No work is complete until it has been independently reviewed.
@@ -6,7 +6,7 @@ module.exports = defineConfig({
6
6
  fullyParallel: false,
7
7
  forbidOnly: !!process.env.CI,
8
8
  retries: process.env.CI ? 1 : 0,
9
- workers: 1,
9
+ workers: process.env.CI ? 2 : 1,
10
10
  timeout: 60000,
11
11
  expect: {
12
12
  timeout: 15000,
@@ -32,8 +32,12 @@ module.exports = defineConfig({
32
32
  testMatch: '01-golden-path.spec.js',
33
33
  },
34
34
  {
35
- name: 'functional',
36
- testMatch: /0[2-7]-.*\.spec\.js|09-image-paste\.spec\.js|09-background-.*\.spec\.js/,
35
+ name: 'functional-core',
36
+ testMatch: /0[2-5]-.*\.spec\.js/,
37
+ },
38
+ {
39
+ name: 'functional-extended',
40
+ testMatch: /0[6-7]-.*\.spec\.js|09-image-paste\.spec\.js|09-background-.*\.spec\.js/,
37
41
  },
38
42
  {
39
43
  name: 'mobile-iphone',
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "ai-or-die",
3
- "version": "0.1.22",
3
+ "version": "0.1.23",
4
4
  "description": "Universal AI coding terminal — Claude, Copilot, Gemini & more in your browser",
5
5
  "main": "src/server.js",
6
6
  "bin": {