npm - ai-or-die - Versions diffs - 0.1.43 → 0.1.44 - Mend

ai-or-die 0.1.43 → 0.1.44

Files changed (22)

package/.github/workflows/ci.yml +175 -13
package/docs/agent-instructions/02-testing-and-validation.md +55 -0
package/docs/agent-instructions/06-ci-first-testing.md +14 -0
package/docs/agent-instructions/09-copilot-agent-testing.md +523 -0
package/docs/audits/SUMMARY.md +151 -0
package/docs/history/mobile-ux-overhaul-deferrals.md +107 -0
package/docs/planning/qol2-handoff.md +119 -0
package/e2e/playwright.config.js +16 -1
package/package.json +1 -1
package/src/base-bridge.js +21 -1
package/src/public/app.js +378 -70
package/src/public/base.css +2 -0
package/src/public/components/bottom-nav.css +1 -1
package/src/public/components/buttons.css +6 -0
package/src/public/components/extra-keys.css +31 -15
package/src/public/components/menus.css +13 -1
package/src/public/components/modals.css +14 -0
package/src/public/components/tabs.css +16 -4
package/src/public/extra-keys.js +148 -18
package/src/public/index.html +4 -4
package/src/public/mobile.css +40 -10
package/src/server.js +37 -6

package/.github/workflows/ci.yml CHANGED

@@ -7,14 +7,15 @@ on:
     branches: [main]
 concurrency:
-  group: ci-${{ github.event.pull_request.number || github.sha }}
+  group: ci-main-${{ github.event.pull_request.number || github.sha }}
   cancel-in-progress: true
 jobs:
   test:
     runs-on: ${{ matrix.os }}
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
+      fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest]
         node-version: [22]
@@ -23,6 +24,7 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: ${{ matrix.node-version }}
+          cache: 'npm'
       - run: npm ci
       - run: npm test
       - run: npm audit --audit-level=moderate
@@ -30,8 +32,9 @@ jobs:
   test-browser-golden:
     runs-on: ${{ matrix.os }}
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
+      fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest]
     steps:
@@ -39,7 +42,15 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '22'
+          cache: 'npm'
       - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
       - name: Install Playwright browsers
         run: npx playwright install chromium --with-deps
       - name: Run golden path test
@@ -56,8 +67,9 @@ jobs:
   test-browser-functional-core:
     runs-on: ${{ matrix.os }}
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
+      fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest]
     steps:
@@ -65,7 +77,15 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '22'
+          cache: 'npm'
       - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
       - name: Install Playwright browsers
         run: npx playwright install chromium --with-deps
       - name: Run functional core tests
@@ -82,8 +102,9 @@ jobs:
   test-browser-functional-extended:
     runs-on: ${{ matrix.os }}
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
+      fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest]
     steps:
@@ -91,7 +112,15 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '22'
+          cache: 'npm'
       - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
       - name: Install Playwright browsers
         run: npx playwright install chromium --with-deps
       - name: Run functional extended tests
@@ -108,8 +137,9 @@ jobs:
   test-browser-mobile:
     runs-on: ${{ matrix.os }}
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
+      fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest]
     steps:
@@ -117,7 +147,15 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '22'
+          cache: 'npm'
       - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
       - name: Install Playwright browsers
         run: npx playwright install chromium --with-deps
       - name: Run mobile portrait tests (iPhone 14)
@@ -136,7 +174,7 @@ jobs:
   test-browser-visual:
     runs-on: ${{ matrix.os }}
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
       fail-fast: false
       matrix:
@@ -146,7 +184,15 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '22'
+          cache: 'npm'
       - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
       - name: Install Playwright browsers
         run: npx playwright install chromium --with-deps
       - name: Run visual regression tests
@@ -177,8 +223,9 @@ jobs:
   test-browser-new-features:
     runs-on: ${{ matrix.os }}
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
+      fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest]
     steps:
@@ -186,7 +233,15 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '22'
+          cache: 'npm'
       - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
       - name: Install Playwright browsers
         run: npx playwright install chromium --with-deps
       - name: Run new feature tests
@@ -203,8 +258,9 @@ jobs:
   test-browser-integrations:
     runs-on: ${{ matrix.os }}
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
+      fail-fast: false
       matrix:
         os: [ubuntu-latest, windows-latest]
     steps:
@@ -212,7 +268,15 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '22'
+          cache: 'npm'
       - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
       - name: Install Playwright browsers
         run: npx playwright install chromium --with-deps
       - name: Run integration tests
@@ -230,7 +294,7 @@ jobs:
   test-browser-power-user:
     runs-on: ${{ matrix.os }}
     needs: test
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
       fail-fast: false
       matrix:
@@ -240,7 +304,15 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '22'
+          cache: 'npm'
       - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
       - name: Install Playwright browsers
         run: npx playwright install chromium --with-deps
       - name: Run power user flow tests
@@ -258,7 +330,7 @@ jobs:
   test-browser-mobile-flows:
     runs-on: ${{ matrix.os }}
     needs: test
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
       fail-fast: false
       matrix:
@@ -268,7 +340,15 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '22'
+          cache: 'npm'
       - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
       - name: Install Playwright browsers
         run: npx playwright install chromium --with-deps
       - name: Run mobile flow tests
@@ -286,7 +366,7 @@ jobs:
   test-browser-ui-features:
     runs-on: ${{ matrix.os }}
     needs: test
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
       fail-fast: false
       matrix:
@@ -296,7 +376,15 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '22'
+          cache: 'npm'
       - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
       - name: Install Playwright browsers
         run: npx playwright install chromium --with-deps
       - name: Run UI feature tests
@@ -311,10 +399,83 @@ jobs:
             playwright-report/
           retention-days: 14
+  test-browser-mobile-sprint1:
+    runs-on: ${{ matrix.os }}
+    needs: test
+    timeout-minutes: 12
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, windows-latest]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-node@v4
+        with:
+          node-version: '22'
+          cache: 'npm'
+      - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
+      - name: Install Playwright browsers
+        run: npx playwright install chromium --with-deps
+      - name: Run mobile sprint1 tests
+        run: npx playwright test --config e2e/playwright.config.js --project mobile-sprint1
+      - name: Upload Playwright report
+        uses: actions/upload-artifact@v4
+        if: ${{ !cancelled() }}
+        with:
+          name: playwright-mobile-sprint1-${{ matrix.os }}
+          path: |
+            e2e/test-results/
+            playwright-report/
+          retention-days: 14
+  test-browser-mobile-sprint23:
+    runs-on: ${{ matrix.os }}
+    needs: test
+    timeout-minutes: 12
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, windows-latest]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-node@v4
+        with:
+          node-version: '22'
+          cache: 'npm'
+      - run: npm ci
+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: |
+            ~/.cache/ms-playwright
+            ~/AppData/Local/ms-playwright
+          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
+      - name: Install Playwright browsers
+        run: npx playwright install chromium --with-deps
+      - name: Run mobile sprint23 tests
+        run: npx playwright test --config e2e/playwright.config.js --project mobile-sprint23
+      - name: Upload Playwright report
+        uses: actions/upload-artifact@v4
+        if: ${{ !cancelled() }}
+        with:
+          name: playwright-mobile-sprint23-${{ matrix.os }}
+          path: |
+            e2e/test-results/
+            playwright-report/
+          retention-days: 14
   build-binary:
     runs-on: ${{ matrix.os }}
-    timeout-minutes: 9
+    timeout-minutes: 12
     strategy:
+      fail-fast: false
       matrix:
         include:
           - os: ubuntu-latest
@@ -328,6 +489,7 @@ jobs:
       - uses: actions/setup-node@v4
         with:
           node-version: '22'
+          cache: 'npm'
       - run: npm ci
       - name: Build SEA binary
         run: node scripts/build-sea.js
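
Note on the "Cache Playwright browsers" step added above: it lists two paths, which correspond to Playwright's default browser download locations on Linux (`~/.cache/ms-playwright`) and Windows (`~/AppData/Local/ms-playwright`), presumably so that one step definition can serve both matrix operating systems. The following is a minimal sketch of that per-platform lookup. The helper name `playwrightCacheDir` and the example home directories are hypothetical and are not part of ai-or-die or Playwright.

```javascript
// Sketch: resolve Playwright's default browser cache directory per platform.
// `playwrightCacheDir` is a hypothetical helper used only for illustration.
const os = require('node:os');
const path = require('node:path');

function playwrightCacheDir(platform = process.platform, home = os.homedir()) {
  // Use the path flavor of the target platform so separators come out right
  // even when this runs on a different OS than the one being described.
  const p = platform === 'win32' ? path.win32 : path.posix;
  switch (platform) {
    case 'win32':
      return p.join(home, 'AppData', 'Local', 'ms-playwright');
    case 'darwin':
      return p.join(home, 'Library', 'Caches', 'ms-playwright');
    default:
      // Linux and other POSIX-like platforms
      return p.join(home, '.cache', 'ms-playwright');
  }
}

// Example home directories below are assumptions, not values from the diff.
console.log(playwrightCacheDir('linux', '/home/runner'));
// → /home/runner/.cache/ms-playwright
console.log(playwrightCacheDir('win32', 'C:\\Users\\runneradmin'));
// → C:\Users\runneradmin\AppData\Local\ms-playwright
```

The workflow does not list a macOS path, which matches its matrix of `ubuntu-latest` and `windows-latest` only.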

package/docs/agent-instructions/02-testing-and-validation.md CHANGED

@@ -1,5 +1,60 @@
 # Testing and Validation
+## Core Philosophy
+Validate like a user would use the product. Every test — from unit to E2E to exploratory — must ultimately answer the question: "Does this work the way a real person expects?" Tests that verify internal implementation details without connecting to user-observable behavior are maintenance liabilities, not safety nets.
+## Testing Hierarchy
+The project uses three testing tiers. Each tier serves a distinct purpose. Using the wrong tier for a given problem wastes time or misses bugs.
+### Tier 1: True E2E Tests (Deterministic, CI)
+Playwright tests that run on every PR across both Ubuntu and Windows. These are the source of truth for whether the product works. They simulate real user actions — clicking, typing, navigating — against the full running system (server, WebSocket, terminal, browser UI).
+- **Run frequency**: Every commit, every PR
+- **Authority**: If E2E passes on CI, the feature works. If it fails, the feature is broken.
+- **Finds**: Regressions in known behavior, cross-platform breakage, integration failures
+- **Owns**: The regression contract. Once an E2E test exists for a behavior, that behavior cannot break without CI catching it.
+Every new feature requires E2E coverage. Every bug fix requires a regression E2E test. No exceptions.
+See `docs/agent-instructions/06-ci-first-testing.md` for the complete CI workflow, job map, and debugging playbook.
+### Tier 2: Copilot Agent Exploratory Testing (LLM, Periodic)
+Copilot coding agents with Playwright MCP acting as human-like testers. They browse the app, interact with it at various viewports, and produce structured audit reports. This is a "bug bash" — run per feature, per release, or per major UI change. Not on every commit.
+- **Run frequency**: Per feature or per release (~50 minutes per run)
+- **Authority**: Findings require expert validation before action (~15% false-positive rate from emulation gaps)
+- **Finds**: Unknown-unknowns, UX issues, accessibility gaps, mobile layout problems, edge cases nobody anticipated
+- **Owns**: Discovery. These tests find the things you forgot to test.
+Validated findings become fix tasks. Fixes include Tier 1 E2E regression tests that prevent recurrence.
+See `docs/agent-instructions/09-copilot-agent-testing.md` for the full setup, issue templates, and validation process.
+### Tier 3: Manual Device Testing (Real Hardware, Edge Cases)
+Real devices, real keyboards, real network conditions. For issues that Playwright emulation cannot catch.
+- **Run frequency**: As needed, for findings flagged "Needs Real Device Testing" during Tier 2 validation
+- **Authority**: Final word on device-specific behavior
+- **Finds**: `visualViewport` timing, `pointer: coarse` media query behavior, virtual keyboard overlays, PWA install flows, touch physics, real network latency
+- **Owns**: The gap between emulation and reality
+Any Tier 2 finding that depends on real device behavior must be verified on Tier 3 before the fix ships.
+### How the Tiers Work Together
+1. **Tier 2 discovers issues** during feature development or before a release
+2. **Expert validation** removes false positives and confirms real bugs
+3. **Tier 3 verifies** any finding that depends on real device behavior
+4. **Fixes ship with Tier 1 E2E regression tests** that run on every future commit
+5. **Tier 1 prevents recurrence** permanently
+The tiers are complementary, not competing. Tier 1 catches what you know about. Tier 2 finds what you missed. Tier 3 confirms what emulation cannot.
 ## Coverage Target
 Target 90% code coverage for all new code. This is not optional for new features or refactors. Existing code without tests should be covered when modified.
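
Note on the "How the Tiers Work Together" steps added above: the hand-off between tiers can be read as a small decision procedure applied to each Tier 2 finding. The sketch below is one possible reading. The function name `nextStep`, the field names on the finding object, and the returned labels are all hypothetical and do not appear in the package.

```javascript
// Sketch: route a Tier 2 finding to its next step in the three-tier flow.
// All field names and return labels are illustrative assumptions.
function nextStep(finding) {
  // Step 2: expert validation comes first and filters out false positives.
  if (!finding.validatedByExpert) return 'expert-validation';
  if (finding.falsePositive) return 'discard';
  // Step 3: device-dependent findings must be confirmed on real hardware.
  if (finding.needsRealDevice && !finding.verifiedOnDevice) {
    return 'tier3-device-check';
  }
  // Step 4: the fix ships together with a Tier 1 E2E regression test.
  if (!finding.hasRegressionTest) return 'fix-with-tier1-e2e-test';
  // Step 5: Tier 1 now guards the behavior on every commit.
  return 'done';
}

console.log(nextStep({ validatedByExpert: false }));
console.log(nextStep({ validatedByExpert: true, needsRealDevice: true }));
```

A finding passes through these states in order, so calling `nextStep` again after each stage completes walks it from discovery to a permanent regression test.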

package/docs/agent-instructions/06-ci-first-testing.md CHANGED

@@ -1,5 +1,7 @@
 # CI-First Testing
+This document covers Tier 1 (True E2E Tests) of the project's testing hierarchy. E2E tests on CI are the source of truth for regression. For the full three-tier testing hierarchy — E2E, Copilot agent exploratory testing, and manual device testing — see `docs/agent-instructions/02-testing-and-validation.md`.
 ## E2E Tests Are the Source of Truth
 End-to-end tests are the only true way to validate that the system works. Unit tests verify isolated logic. E2E tests prove the whole system -- server, WebSocket, terminal, browser UI -- actually functions as a user would experience it.
@@ -8,6 +10,18 @@ A feature is not done until its E2E tests pass on GitHub runners. If unit tests
 Every new feature must have E2E test coverage. Every bug fix must have a regression E2E test. The E2E suite is the contract that tells the next agent "this is what working looks like."
+### Performance budget: 5-minute target, 7-minute max
+The entire CI pipeline must complete within 5 minutes wall-clock time. 7 minutes is the absolute maximum acceptable. The per-job timeout is set to 9 minutes as a safety net for runner queue delays, but any job consistently hitting 7+ minutes must be investigated and optimized.
+To hit this budget:
+- **Parallelize aggressively**: All independent Playwright projects run in separate parallel jobs. Never run projects sequentially within a single job unless they share expensive state.
+- **Minimize setup overhead**: Each CI job spends 2-3 minutes on checkout, npm ci, and Playwright install. Consolidate small test projects into fewer jobs to reduce redundant setup.
+- **No unnecessary dependencies**: Do not add `needs:` between jobs unless one job consumes artifacts from another. Unit tests and browser tests run in parallel from the start.
+- **Increase Playwright workers**: Use `--workers=2` or more within each job for parallel test execution.
+When adding new E2E tests, verify the pipeline still completes under 5 minutes. If it doesn't, split the slowest job or consolidate the smallest ones.
 ### Long E2E waits indicate bugs
 If an E2E test requires long waits or generous timeouts to pass, that is a signal of a bug in the product code, not a test timing issue. No real user is going to wait 30 seconds for a terminal to respond or 10 seconds for a WebSocket to connect. If the test needs that much patience, the code is too slow and must be fixed. Tightening test timeouts is a legitimate way to catch performance regressions -- the test should reflect realistic user expectations, not compensate for sluggish code.
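
Note on the performance budget added above (5-minute target, 7-minute maximum): a rough way to check a pipeline run against it is to treat wall-clock time as the duration of the slowest parallel job. The sketch below makes that simplification, so it ignores `needs:` chains and runner queue time. The function name `checkBudget` and the sample durations are hypothetical; only the job names and the two thresholds come from the diff.

```javascript
// Sketch: classify a CI run against the 5-minute target / 7-minute maximum.
// Assumes all jobs run in parallel, so wall-clock ≈ the slowest job.
const TARGET_MIN = 5;
const MAX_MIN = 7;

function checkBudget(jobMinutes) {
  const entries = Object.entries(jobMinutes);
  if (entries.length === 0) {
    return { slowestJob: null, wallClock: 0, status: 'ok' };
  }
  // Find the job with the longest duration.
  const [slowestJob, wallClock] = entries.reduce((a, b) => (b[1] > a[1] ? b : a));
  let status = 'ok';
  if (wallClock > MAX_MIN) status = 'over-max';
  else if (wallClock > TARGET_MIN) status = 'over-target';
  return { slowestJob, wallClock, status };
}

// Durations are made-up sample values, not measurements from the project.
console.log(checkBudget({
  'test': 2.5,
  'test-browser-golden': 4.1,
  'test-browser-mobile': 6.2,
}));
```

With these sample values the slowest job is `test-browser-mobile`, which lands between the target and the maximum, the range where the document suggests splitting the slowest job or consolidating the smallest ones.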