npm - @athenaflow/plugin-e2e-test-builder - Versions diffs - 2.0.9 - Mend

@athenaflow/plugin-e2e-test-builder 2.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (190)

package/dist/2.0.9/codex/plugin/skills/write-test-code/references/anti-patterns.md ADDED

@@ -0,0 +1,88 @@
+# Anti-Patterns: Detailed Explanations and Fix Strategies
+## 1. Raw CSS selectors
+Use semantic locators (`getByRole`, `getByLabel`, `getByTestId`) instead of CSS selectors. CSS selectors are brittle and break when markup changes.
+**Why:** A class rename, component refactor, or CSS-in-JS migration breaks every CSS-based selector overnight. Semantic locators survive these changes because they target accessible roles and labels, not implementation details.
+**Fix:** Replace `page.locator('.submit-btn')` with `page.getByRole('button', { name: /submit/i })`. Follow the locator strategy hierarchy in the main skill.
+## 2. `waitForTimeout()`
+Use proper assertions and event-driven waits. `waitForTimeout()` adds arbitrary delays that slow tests and mask timing issues.
+**Why:** A 2-second sleep that works locally may be too short in CI (slower machines) or too long everywhere (wasting time). It also hides the real question: "what am I actually waiting for?"
+**Fix:** Replace with `waitForResponse` for API-dependent UI, `expect(el).toBeVisible()` for element appearance, or `expect(spinner).toBeHidden()` for loading states.
+## 3. Fragile `.nth()` / `.first()`
+Scope locators to a container instead of relying on position. If unavoidable, add a comment explaining why.
+**Why:** Element order can change when the page adds a banner, reorders a list, or renders asynchronously. `.first()` silently picks the wrong element, causing false passes or mysterious failures.
+**Fix:** Use `.filter({ hasText: 'Specific Item' })` or scope to a container: `page.locator('[data-testid="cart"]').getByRole('button')`.
+## 4. Exact long text matches
+Use regex with key words instead of matching entire strings. Marketing copy and UI text change frequently.
+**Why:** A copywriter changes "Sign up for free today!" to "Create your free account" and every test matching the full string breaks, even though the feature works fine.
+**Fix:** Use `page.getByText(/sign up/i)` or `page.getByRole('button', { name: /free/i })` — match the stable semantic keywords.
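A quick way to pick a stable keyword is to test candidate patterns against every wording the copy might plausibly take. The sketch below does this for the two wordings quoted above; `survivesCopyChange` is a hypothetical helper for illustration, not a Playwright API.

```typescript
// Hypothetical helper: true only if the pattern matches every copy variant.
// Avoid the `g` flag here: global regexes keep state between .test() calls.
function survivesCopyChange(pattern: RegExp, variants: string[]): boolean {
  return variants.every((text) => pattern.test(text));
}

// The two wordings quoted in the section above.
const copyVariants = ['Sign up for free today!', 'Create your free account'];

const freeIsStable = survivesCopyChange(/free/i, copyVariants);
const signUpIsStable = survivesCopyChange(/sign up/i, copyVariants);
```

With these two wordings, `/free/i` matches both while `/sign up/i` matches only the first, so the keyword worth anchoring on is the one the copy is most likely to keep.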
+## 5. Unscoped locators
+Scope locators to `main`, `nav`, `dialog`, or other containers when possible. Global locators match unintended elements.
+**Why:** A page-wide `getByRole('button', { name: /submit/i })` may match a submit button in the header, footer, or a hidden modal — not just the one in your form. This causes the wrong click or ambiguous locator errors.
+**Fix:** Scope first: `page.locator('main').getByRole('button', { name: /submit/i })` or `page.locator('[role="dialog"]').getByRole('button')`.
+## 6. Login via UI in every test
+Use `storageState` or API-based auth setup. UI login in every test wastes time and creates coupling to the login flow.
+**Why:** If every test clicks through the login form, a single login page change breaks the entire suite. It also adds 3-10 seconds per test — multiplied across hundreds of tests, this becomes significant CI time.
+**Fix:** Log in once in `globalSetup`, save `storageState` to a JSON file, and reuse it. See `references/auth-patterns.md` for the four auth strategies.
+## 7. UI clicks to set up test data
+Use API requests for data seeding. UI setup is 10-50x slower and more fragile than API calls.
+**Why:** Creating a product via the admin UI takes 15+ seconds and 10+ actions. An API call takes 200ms and one line. UI setup also couples your test to two features instead of one — if the admin form breaks, your unrelated cart test fails too.
+**Fix:** Use the `request` fixture: `await request.post('/api/products', { data: { ... } })`. See `references/api-setup-teardown.md`.
+## 8. No error path tests
+Every feature needs at least one failure scenario test. Cover server errors (500), network timeouts, and empty states.
+**Why:** Happy-path-only suites give false confidence. The app may crash on a 500, show a blank screen on empty data, or hang on a timeout — none of which are caught without explicit error path tests.
+**Fix:** Use `page.route()` to mock failures. At minimum: one 500 response, one timeout (`route.abort('timedout')`), one empty state (`route.fulfill({ json: { items: [] } })`). See `references/network-interception.md`.
+## 9. Hardcoded test data
+NEVER embed real entity IDs (`'ACC-SUB-2026-00025'`), real user names (`'Anas Client 73'`), real monetary amounts, or environment-specific strings in test code. Instead:
+- Create data via API in `beforeEach` and capture the returned ID
+- Use `Date.now()` or `crypto.randomUUID()` suffixes for uniqueness
+- Read values from `process.env` or a test data module
+- For read-only assertions on existing data, use pattern matchers (`expect(text).toMatch(/ACC-SUB-\d{4}-\d{5}/)`) instead of exact values
+If you find yourself typing a specific ID or name into test code, STOP — that is a hardcoded value.
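The alternatives listed above can be sketched as two small helpers: one that produces unique, obviously synthetic names for seeded data, and one that checks an ID's format without pinning its value. Both function names are hypothetical; the ID shape is taken from the example in this section.

```typescript
// Hypothetical helper: builds a unique, obviously synthetic name for seeded data.
let sequence = 0;
function uniqueName(prefix: string, now: number = Date.now()): string {
  // The timestamp separates runs; the counter separates calls within one millisecond.
  return `${prefix}-${now}-${sequence++}`;
}

// Format check for read-only assertions on existing data.
const SUBSCRIPTION_ID_PATTERN = /^ACC-SUB-\d{4}-\d{5}$/;
function looksLikeSubscriptionId(text: string): boolean {
  return SUBSCRIPTION_ID_PATTERN.test(text.trim());
}
```

In a test, `uniqueName('Ticket')` would feed the API payload in `beforeEach`, and `looksLikeSubscriptionId` would replace an exact-value comparison on text read from the page.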
+## 10. Tests depending on execution order
+Each test must be independently runnable. Never rely on state left by a previous test.
+## 11. `expect(await el.isVisible()).toBe(true)`
+Use `await expect(el).toBeVisible()` instead. The Playwright assertion auto-retries, while the manual pattern checks once and fails on timing.
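The difference between the two styles can be modeled in a few lines. This is a simplified, synchronous sketch for illustration only: Playwright's real assertions poll asynchronously until a timeout, and all names here are hypothetical.

```typescript
// A probe answers "is the element visible right now?"
type Probe = () => boolean;

// One-shot: whatever the probe returns at this instant is the final answer.
function checkOnce(isVisible: Probe): boolean {
  return isVisible();
}

// Retrying: keep probing until it succeeds or the attempt budget runs out.
function checkWithRetry(isVisible: Probe, maxAttempts: number): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (isVisible()) return true;
  }
  return false;
}

// Hypothetical element that becomes visible on the third probe.
function appearsOnThirdProbe(): Probe {
  let probes = 0;
  return () => ++probes >= 3;
}
```

For an element that appears slightly late, `checkOnce` reports a failure while `checkWithRetry` with a sufficient budget succeeds, which is the timing gap the manual pattern falls into.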
+## 12. `{ force: true }` on clicks/checks
+`{ force: true }` hides real interaction problems (overlapping elements, elements not scrolled into view, disabled state). Diagnose the root cause instead: use `scrollIntoViewIfNeeded()`, wait for the overlay to disappear, or wait for the element to be enabled. It is acceptable only when interacting with a custom widget that Playwright cannot natively trigger; document why in a comment.
+## 13. `waitForLoadState('networkidle')` as default strategy
+`networkidle` waits for 500ms of no network activity, which breaks on long-polling, WebSockets, analytics beacons, or chat widgets. Use it ONLY for initial full-page loads where no streaming/polling exists. For post-action waits, use `waitForResponse` targeting the specific API call, or assert directly on the resulting UI state (Playwright auto-retries).
+## 14. CSS utility class selectors (Tailwind, Bootstrap, etc.)
+`button.rounded-l-lg`, `.flex.items-center`, `.bg-primary` are styling concerns that change during refactors. Treat ALL utility framework classes as volatile — never use them as selectors. If no semantic locator works, request a `data-testid` from the dev team.
+## 15. Asserting exact server-computed values
+`expect(revenue).toHaveText('12450')` will break when data changes. For dashboard counters, totals, and aggregates:
+- Assert the element exists and contains a number (`toMatch(/\$[\d,]+/)`)
+- Assert non-zero or within a range
+- Assert format correctness (`/^\d{1,3}(,\d{3})*$/`)
+- If exact value matters, seed the data via API first so you control the expected value
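The format and range checks above might be combined as follows. This is a minimal sketch with hypothetical helper names; the grouped-digit pattern is the one quoted in the list.

```typescript
// Format taken from the list above: digits grouped in threes with commas.
const GROUPED_INTEGER = /^\d{1,3}(,\d{3})*$/;

// Hypothetical helper: validates the format, then returns the numeric value.
function parseGroupedInteger(text: string): number {
  const trimmed = text.trim();
  if (!GROUPED_INTEGER.test(trimmed)) {
    throw new Error(`Unexpected number format: "${trimmed}"`);
  }
  return Number(trimmed.replace(/,/g, ''));
}

// Range check for values the test does not control exactly.
function isWithinRange(value: number, min: number, max: number): boolean {
  return value >= min && value <= max;
}
```

In a test, the input would be text read from the page (for example via `locator.innerText()`), and the bounds would come from what the seeded data makes plausible.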

package/dist/2.0.9/codex/plugin/skills/write-test-code/references/api-setup-teardown.md ADDED

@@ -0,0 +1,83 @@
+# API-Driven Test Setup and Teardown
+## API-Driven Test Setup
+Use API calls to set up test data instead of clicking through UI. This is 10-50x faster and more reliable.
+**When to use API setup:** Creating test users, products, orders, seed data. Setting feature flags. Resetting state between tests.
+**When to use UI setup:** Only when the creation flow IS the test being verified.
+```typescript
+import { test, expect } from '@playwright/test';
+test('TC-CART-001: User sees items in cart', async ({ page, request }) => {
+  // Arrange: seed data via API (fast, deterministic)
+  await request.post('/api/cart/items', {
+    data: { productId: 'SKU-123', quantity: 2 }
+  });
+  // Act: navigate to verify UI
+  await page.goto('/cart');
+  // Assert
+  await expect(page.getByRole('listitem')).toHaveCount(2);
+});
+```
+**Reusable API fixture pattern:**
+```typescript
+import { test as base, type APIRequestContext } from '@playwright/test';
+export const test = base.extend<{ apiClient: APIRequestContext }>({
+  apiClient: async ({ playwright }, use) => {
+    const ctx = await playwright.request.newContext({
+      baseURL: process.env.API_BASE_URL,
+      extraHTTPHeaders: { Authorization: `Bearer ${process.env.API_TOKEN}` },
+    });
+    await use(ctx);
+    await ctx.dispose();
+  },
+});
+```
+## Test Data Teardown
+Tests that create persistent data (database records, uploaded files, user accounts) MUST clean up after themselves. Leaked test data accumulates across runs and causes false positives/negatives in other tests (pagination counts drift, filter results change, list assertions break).
+### Strategy 1: API teardown in afterEach (Recommended)
+```typescript
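+import { test, expect } from '@playwright/test';
+let createdTicketId: string | undefined;
+test.beforeEach(async ({ request }) => {
+  const resp = await request.post('/api/tickets', {
+    data: { title: `Test ${Date.now()}` }
+  });
+  expect(resp.ok()).toBeTruthy(); // fail fast if seeding did not succeed
+  createdTicketId = (await resp.json()).id;
+});
+test.afterEach(async ({ request }) => {
+  if (createdTicketId) {
+    await request.delete(`/api/tickets/${createdTicketId}`);
+    createdTicketId = undefined; // do not reuse a stale ID if the next setup fails
+  }
+});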
+```
+### Strategy 2: Fixture with automatic cleanup
+```typescript
+import { test as base } from '@playwright/test';
+export const test = base.extend<{ testTicket: { id: string; title: string } }>({
+  testTicket: async ({ request }, use) => {
+    const resp = await request.post('/api/tickets', {
+      data: { title: `Test ${Date.now()}` }
+    });
+    const ticket = await resp.json();
+    await use(ticket);
+    // cleanup runs automatically when test finishes
+    await request.delete(`/api/tickets/${ticket.id}`);
+  },
+});
+```
+### Strategy 3: Bulk cleanup in globalTeardown
+For environments where individual deletion is impractical, tag test data (e.g., `title LIKE 'Test %'`) and delete in batch during `globalTeardown.ts`.
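For the bulk strategy, the selection and batching logic can be kept separate from whatever deletion mechanism the environment offers. The sketch below covers only that logic; the record shape and function names are hypothetical, and no delete endpoint is assumed.

```typescript
// Hypothetical record shape; real fields depend on the system under test.
interface SeededRecord {
  id: string;
  title: string;
}

// Selects records carrying the test tag (a title prefix), mirroring `title LIKE 'Test %'`.
function selectTestRecords(records: SeededRecord[], prefix: string = 'Test '): SeededRecord[] {
  return records.filter((record) => record.title.startsWith(prefix));
}

// Splits IDs into batches so a teardown script can delete them in bounded chunks.
function toBatches(ids: string[], size: number): string[][] {
  if (size < 1) throw new Error('Batch size must be at least 1');
  const batches: string[][] = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}
```

A `globalTeardown.ts` script would list records, pass them through `selectTestRecords`, and hand each batch to the project's verified cleanup call.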
+If the cleanup API endpoint is unknown, do not invent one. Leave a clear `TODO` with the missing endpoint details, document the cleanup gap in the test file or tracker, and prefer fixture-scoped or environment reset strategies that you can verify. If cleanup is genuinely impossible (no API, no database access), document this as a known limitation in the test file header AND add an `afterEach` that logs a warning.

package/dist/2.0.9/codex/plugin/skills/write-test-code/references/auth-patterns.md ADDED

@@ -0,0 +1,63 @@
+# Authentication Setup Patterns
+Choose the right auth strategy based on the project's needs.
+## Strategy 1: storageState (Recommended for most projects)
+Log in once in global setup, save cookies/localStorage to a JSON file, and reuse across all tests:
+```typescript
+// global-setup.ts
+import { chromium, FullConfig } from '@playwright/test';
+async function globalSetup(config: FullConfig) {
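+  // Global setup does not apply `use.baseURL` automatically; pass it so relative URLs resolve.
+  const { baseURL } = config.projects[0].use;
+  const browser = await chromium.launch();
+  const page = await browser.newPage({ baseURL });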
+  await page.goto('/login');
+  await page.getByLabel(/email/i).fill(process.env.TEST_USER_EMAIL!);
+  await page.getByLabel(/password/i).fill(process.env.TEST_USER_PASSWORD!);
+  await page.getByRole('button', { name: /sign in/i }).click();
+  await page.waitForURL('/dashboard');
+  await page.context().storageState({ path: 'tests/.auth/user.json' });
+  await browser.close();
+}
+export default globalSetup;
+```
+Reference in config: `use: { storageState: 'tests/.auth/user.json' }`
+## Strategy 2: Worker-scoped fixture (for parallel workers needing separate accounts)
+```typescript
+import { test as base } from '@playwright/test';
+export const test = base.extend<{}, { workerStorageState: string }>({
+  storageState: ({ workerStorageState }, use) => use(workerStorageState),
+  // Worker-scoped fixtures receive WorkerInfo (not TestInfo) as the third argument.
+  workerStorageState: [async ({ browser }, use, workerInfo) => {
+    const page = await browser.newPage({ storageState: undefined });
+    // Login with worker-specific account...
+    const path = `tests/.auth/worker-${workerInfo.parallelIndex}.json`;
+    await page.context().storageState({ path });
+    await use(path);
+    await page.close();
+  }, { scope: 'worker' }],
+});
+```
+## Strategy 3: Multi-role testing (admin + user in same test)
+```typescript
+import { test } from '@playwright/test';
+test('TC-ADMIN-001: Admin sees user profile', async ({ browser }) => {
+  const adminContext = await browser.newContext({ storageState: 'tests/.auth/admin.json' });
+  const userContext = await browser.newContext({ storageState: 'tests/.auth/user.json' });
+  const adminPage = await adminContext.newPage();
+  const userPage = await userContext.newPage();
+  // Interact with both pages...
+  await adminContext.close();
+  await userContext.close();
+});
+```
+## Strategy 4: Per-test login
+Only when testing login itself or permission-specific scenarios.
+Never hardcode tokens. Use environment variables or `.env.test`.
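One way to make missing credentials fail loudly, rather than relying on non-null assertions such as `process.env.TEST_USER_EMAIL!`, is a small guard. `requireEnv` is a hypothetical helper, not part of Playwright; the environment is passed in as a parameter so the function is easy to test.

```typescript
// Hypothetical helper: reads a required variable and throws when it is missing or blank.
// Pass `process.env` (or any string map) as the first argument.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In a setup file this would read as `requireEnv(process.env, 'TEST_USER_EMAIL')`, so a misconfigured CI job stops with a clear message instead of typing `undefined` into the login form.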

package/dist/2.0.9/codex/plugin/skills/write-test-code/references/mapping-tables.md ADDED

@@ -0,0 +1,56 @@
+# Mapping Tables
+Standard translations for converting journey specs and exploration results to Playwright API calls.
+## Scope-to-Locator
+| Journey Scope | Playwright Scoping |
+|---------------|-------------------|
+| `page` | No scoping needed |
+| `header` | `page.locator('header')` |
+| `main` | `page.locator('main')` |
+| `nav` | `page.locator('nav')` |
+| `dialog` | `page.locator('[role="dialog"]')` |
+## Action-to-Playwright
+| Journey Action | Playwright Code |
+|----------------|-----------------|
+| `goto` | `await page.goto(url)` |
+| `click` | `await locator.click()` |
+| `fill` | `await locator.fill(value)` |
+| `select` | `await locator.selectOption(value)` |
+| `assert` | `await expect(locator).toBeVisible()` |
+## Assertion Mapping
+| Observed Effect | Playwright Assertion |
+|----------------|---------------------|
+| `url changed to /cart` | `await expect(page).toHaveURL(/cart/)` |
+| `text 'Added' visible` | `await expect(page.getByText(/added/i)).toBeVisible()` |
+| `radio 256GB checked` | `await expect(locator).toBeChecked()` |
+| `button now enabled` | `await expect(locator).toBeEnabled()` |
+## Target Kind to Locator
+| Target Kind | Value Pattern | Playwright Locator |
+|-------------|--------------|-------------------|
+| `role` | `button name~Add to Bag` | `getByRole('button', { name: /add to bag/i })` |
+| `role` | `radio name~256GB` | `getByRole('radio', { name: /256gb/i })` |
+| `label` | `Email address` | `getByLabel(/email address/i)` |
+| `testid` | `checkout-button` | `getByTestId('checkout-button')` |
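The table above can be read as a small code generator. The sketch below reproduces its four rows; it assumes role values always use the `<role> name~<accessible name>` shape shown in the table, and `toLocatorSource` is a hypothetical name.

```typescript
type TargetKind = 'role' | 'label' | 'testid';

// Escapes regex metacharacters so arbitrary names are matched literally.
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
}

// Hypothetical generator: returns Playwright locator source text for one table row.
function toLocatorSource(kind: TargetKind, value: string): string {
  if (kind === 'testid') {
    return `getByTestId('${value}')`;
  }
  if (kind === 'label') {
    return `getByLabel(/${escapeRegex(value.toLowerCase())}/i)`;
  }
  // kind === 'role': split "<role> name~<accessible name>".
  const [role, name] = value.split(' name~');
  if (name === undefined) {
    return `getByRole('${role}')`;
  }
  return `getByRole('${role}', { name: /${escapeRegex(name.toLowerCase())}/i })`;
}
```

Lowercasing the name is cosmetic, since the generated pattern carries the `i` flag; it simply keeps the output identical to the table.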
+## Low Confidence Handling (<0.7)
+When journey step confidence is low:
+1. Add extra assertions to verify state
+2. Include fallback locators as comments
+3. Consider retry logic for flaky interactions
+```typescript
+// Primary locator
+const storageRadio = page.getByRole('radio', { name: /256gb/i });
+// Fallback: page.getByLabel(/256gb/i)
+await storageRadio.click();
+await expect(storageRadio).toBeChecked({ timeout: 5000 });
+```

package/dist/2.0.9/codex/plugin/skills/write-test-code/references/network-interception.md ADDED

@@ -0,0 +1,56 @@
+# Network Interception and Error Path Testing
+## Network Interception
+Use `page.route()` to intercept and mock network requests for deterministic error testing.
+**Mock server errors:**
+```typescript
+await page.route('**/api/checkout', route =>
+  route.fulfill({ status: 500, json: { error: 'Payment declined' } })
+);
+```
+**Patch real responses (modify, don't replace):**
+```typescript
+await page.route('**/api/products', async route => {
+  const response = await route.fetch();
+  const json = await response.json();
+  json.results = json.results.slice(0, 1); // reduce to 1 item
+  await route.fulfill({ response, json });
+});
+```
+**Assert backend was called:**
+```typescript
+const [response] = await Promise.all([
+  page.waitForResponse(resp =>
+    resp.url().includes('/api/order') && resp.status() === 201
+  ),
+  page.getByRole('button', { name: /place order/i }).click(),
+]);
+expect(response.status()).toBe(201);
+```
+**Block heavy resources to speed up tests:**
+```typescript
+await page.route('**/*.{png,jpg,jpeg,gif,svg}', route => route.abort());
+```
+## Error Path Testing
+Every feature needs error path tests. Use network interception (see above) to simulate failures. At minimum, every feature test suite should cover:
+- **Server error** — `route.fulfill({ status: 500, ... })` — verify error UI appears
+- **Network timeout** — `route.abort('timedout')` — verify retry option or error message
+- **Empty state** — `route.fulfill({ status: 200, json: { items: [] } })` — verify empty state UI
+```typescript
+import { test, expect } from '@playwright/test';
+test('TC-DASHBOARD-005: Shows empty state when no data', async ({ page }) => {
+  await page.route('**/api/items', route =>
+    route.fulfill({ status: 200, json: { items: [] } })
+  );
+  await page.goto('/dashboard');
+  await expect(page.getByText(/no items/i)).toBeVisible();
+});
+```

package/dist/2.0.9/release.json ADDED

@@ -0,0 +1,18 @@
+{
+  "schemaVersion": 1,
+  "pluginRef": "e2e-test-builder@athena-workflow-marketplace",
+  "pluginName": "e2e-test-builder",
+  "marketplaceName": "athena-workflow-marketplace",
+  "version": "2.0.9",
+  "artifacts": {
+    "claude": {
+      "type": "directory",
+      "path": "./claude/plugin"
+    },
+    "codex": {
+      "type": "marketplace",
+      "marketplacePath": "./.agents/plugins/marketplace.json",
+      "pluginPath": "./codex/plugin"
+    }
+  }
+}

package/package.json ADDED

@@ -0,0 +1,13 @@
+{
+  "name": "@athenaflow/plugin-e2e-test-builder",
+  "version": "2.0.9",
+  "description": "Full-pipeline Playwright E2E test generation \u2014 explores your live site via browser, detects existing test conventions, plans coverage gaps, produces reviewed test specs, writes production-grade test code with quality gates, and stabilizes flaky tests",
+  "license": "MIT",
+  "publishConfig": {
+    "access": "public"
+  },
+  "scripts": {
+    "build:artifacts": "node ../../scripts/build-plugin-artifacts.mjs .",
+    "prepack": "npm run build:artifacts"
+  }
+}

package/skills/add-e2e-tests/SKILL.md ADDED

@@ -0,0 +1,215 @@
+---
+name: add-e2e-tests
+description: >
+  THE DEFAULT ENTRY POINT for all Playwright / E2E test work. This skill should be used FIRST
+  whenever the user wants to add, create, or set up end-to-end tests for any feature, page, or
+  application. Runs the full pipeline: analyze codebase, explore the live site, plan coverage,
+  generate TC-ID specs, run quality-gate reviews, write production-grade test code, and execute.
+  Delegates to sub-skills (analyze-test-codebase, plan-test-coverage, generate-test-cases,
+  review-test-cases, write-test-code, review-test-code, fix-flaky-tests) internally — do NOT
+  skip to sub-skills directly unless the user explicitly requests a narrow activity.
+  Iterative and resumable via tracker file. Uses subagent delegation to save context.
+allowed-tools: Read Write Edit Glob Grep Bash Task
+---
+# Add E2E Tests
+Go from zero to passing Playwright tests for the target feature in one interactive session.
+## Input
+Parse the target URL and feature description from: $ARGUMENTS
+Derive a **feature slug** from the feature description (e.g., "Login flow" → `login`, "Checkout with payment" → `checkout`). Use this slug for file naming throughout.
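The skill does not spell out how the slug is derived. One possible heuristic that reproduces both examples above is to take the first word of the description; the function below is a hypothetical sketch under that assumption, and real descriptions may need a smarter rule.

```typescript
// Hypothetical heuristic: lowercase the first word and keep only letters, digits, and hyphens.
function deriveFeatureSlug(description: string): string {
  const firstWord = description.trim().split(/\s+/)[0] ?? '';
  const slug = firstWord.toLowerCase().replace(/[^a-z0-9-]/g, '');
  if (slug === '') {
    throw new Error('Cannot derive a slug from an empty description');
  }
  return slug;
}
```

The resulting slug would then name artifacts such as `test-cases/<feature>.md`.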
+## Session Protocol
+### 1. Orient: Understand the Project, the Product, and Your Capabilities
+Before planning any work, build deep situational awareness. This step determines the quality of everything that follows — rushed orientation leads to missed test cases and wasted effort.
+**Check for existing progress:**
+- If `e2e-tracker.md` exists in the project root, read it and resume from where you left off — skip to **step 2 (Plan)** with the remaining work.
+- If no tracker exists, this is a fresh start. Proceed with orientation below.
+#### First: create initial tasks and tracker
+As soon as you parse the user's request:
+1. **Create the tracker** — write `e2e-tracker.md` with the goal (URL, feature, slug) and a skeleton plan.
+2. **Create high-level tasks** for the work ahead — analyze codebase, explore the product, plan coverage, generate test specs, write tests, verify tests.
+These are your starting skeleton. As you work through orientation and discover the actual shape of the work, refine both the tasks and the tracker — break tasks into granular sub-tasks, add new ones, remove ones that don't apply.
+Treat the task list as a visible milestone log. Keep it concise, but update it continuously. Do not leave broad tasks open until the end and then mark everything complete in one batch.
+#### 1a. Understand the codebase
+- Does a Playwright config exist (`playwright.config.{ts,js,mjs}`)? If not, you will need to scaffold one (see Scaffolding section).
+- Are there existing tests? What conventions do they follow — naming, locators, fixtures, page objects, auth?
+- Load the `analyze-test-codebase` skill and follow its methodology.
+#### 1b. Understand the product
+This is the most important part of orientation. You cannot write good tests for a product you don't understand.
+- **Read existing test cases** — if `test-cases/*.md` files exist, read them to understand what journeys have been mapped. Look at what's covered AND what's missing.
+- **Browse the actual product** — load the `agent-web-interface-guide` skill and use the browser MCP tools to walk through the feature you're testing. Don't just skim the page — interact with it as a user would: fill forms, click buttons, trigger validation, navigate between pages, check error states.
+- **Map the user journey in detail** — understand the complete flow: entry points, happy paths, error paths, edge cases, what happens with invalid input, what happens when the user goes back, what conditional UI exists.
+Why this matters: absent explicit exploration, agents tend to write tests based on assumptions about how a product works rather than how it actually works. The result is tests that target imaginary behavior or miss critical real behavior. Spending time here prevents both.
+#### 1c. Know your skills
+You have access to specialized skills that contain deep domain knowledge. Load the relevant skill before performing each activity — skills prevent improvisation and encode best practices.
+| Activity | Skill |
+|----------|-------|
+| Analyzing test setup, config, conventions | `analyze-test-codebase` |
+| Deciding what to test, coverage gaps, priorities | `plan-test-coverage` |
+| Opening a URL, browsing, using browser MCP tools | `agent-web-interface-guide` |
+| Creating TC-ID specs from site exploration | `generate-test-cases` |
+| Reviewing TC-ID specs before implementation | `review-test-cases` |
+| Writing, editing, or refactoring test code | `write-test-code` |
+| Reviewing test code before execution signoff | `review-test-code` |
+| Debugging test failures, checking stability | `fix-flaky-tests` |
+Before doing a substantial activity, load the skill that covers that activity so you can follow its workflow rather than improvising.
+#### 1d. Update the tracker with orientation findings
+After orienting, update the tracker with what you learned about the codebase and product, conventions discovered, and your refined plan. The tracker must always answer these four questions for anyone reading it cold:
+1. What is the goal?
+2. What has been done?
+3. What is remaining?
+4. What should I do next?
+See [references/tracker-template.md](references/tracker-template.md) for a concrete template.
+### 2. Plan: Refine Tasks Into Granular Checkpoints
+By now you have initial tasks and a tracker from step 1. Refine tasks into granular checkpoints. The plan should flow from what you learned during orientation, not from a fixed template.
+#### Task granularity
+Think in small checkpoints, not big phases. Each task should represent a concrete, verifiable unit of progress.
+Too coarse: "Analyze codebase", "Write tests", "Verify tests"
+Right granularity:
+- "Read playwright.config.ts — extract baseURL, testDir, projects"
+- "Read 2 existing test files — identify locator strategy and naming pattern"
+- "Write conventions report to e2e-plan/conventions.md"
+- "Navigate to /login — catalog all form fields, buttons, and validation messages"
+- "Submit login form empty — record all validation error messages and their positions"
+- "Submit login with invalid email format — record inline validation behavior"
+- "Write TC-LOGIN-001: happy path login with valid credentials"
+- "Write TC-LOGIN-002: login with empty email shows required field error"
+- "Run login.spec.ts and record full output"
+- "Fix TC-LOGIN-003: selector not found — browse page and re-extract selector"
+- "Re-run login.spec.ts — verify fix didn't break other tests"
+- "Check all TC-IDs from spec are present in test files"
+**Never be conservative.** More tasks are better than fewer. If you discover new work mid-session (a test fails, a selector changed, a form has unexpected validation), add tasks dynamically. The task list is a living document that reflects the real state of the work.
+Create tasks for verification steps too (running tests, checking coverage, browsing to confirm selectors), not just implementation.
+Update task status as each checkpoint completes. A good pattern is: finish exploration and mark it complete, finish coverage/spec work and mark it complete, finish implementation and mark it complete, then finish review/execution and mark it complete. Do not keep all milestones open until session end.
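A checkpoint such as "Check all TC-IDs from spec are present in test files" can be mechanized with a short script. This is a sketch only: the ID pattern is inferred from IDs that appear in this package (for example `TC-LOGIN-001`), and the function names are hypothetical.

```typescript
// TC-ID shape inferred from the examples: TC-<FEATURE>-<three digits>.
const TC_ID = /TC-[A-Z]+-\d{3}/g;

// Collects the distinct TC-IDs mentioned anywhere in a piece of text.
function extractTcIds(text: string): Set<string> {
  return new Set(text.match(TC_ID) ?? []);
}

// Returns spec IDs that no test source mentions, sorted for stable reporting.
function missingTcIds(specText: string, testSources: string[]): string[] {
  const implemented = extractTcIds(testSources.join('\n'));
  return Array.from(extractTcIds(specText))
    .filter((id) => !implemented.has(id))
    .sort();
}
```

An empty result means every specified case has at least one test carrying its ID; anything else becomes a new task in the tracker.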
+### 3. Execute
+Work through your tasks. Load the relevant skill before each activity.
+#### Planning uses the browser heavily
+When planning what to test (coverage planning, test case generation), use the browser extensively. Don't just catalog elements — interact with the product to discover:
+- What validation messages appear for each field?
+- What happens when you submit with missing data?
+- What error states exist (network errors, empty states, permission errors)?
+- What does the flow look like end-to-end, not just page-by-page?
+- What edge cases exist (special characters, long inputs, rapid clicks)?
+- What UI changes conditionally (loading states, disabled buttons, progressive disclosure)?
+Every test case you generate should trace back to something you actually observed or deliberately triggered in the browser. This is how you avoid introducing useless test cases (testing imaginary behavior) and avoid missing important ones (behavior you didn't think to check).
+#### Subagent delegation
+Delegate heavy browser exploration and test writing to subagents when that saves context for orchestration, verification, and debugging. When delegating:
+- Pass the relevant file paths (conventions, coverage plan, test specs)
+- Instruct the subagent to invoke the appropriate skill (subagents inherit access to plugin skills)
+- Specify concrete output expectations (file path, format, TC-ID conventions)
+#### Quality gates
+Two review gates and a test execution checkpoint are mandatory during execution. The review gates are review-only — they produce findings but do not modify files.
+**Gate 1: Review test case specs** (after `generate-test-cases`, before `write-test-code`)
+1. Load the `review-test-cases` skill and run it against `test-cases/<feature>.md`
+2. If verdict is **NEEDS REVISION** — address all blockers in the spec before proceeding to implementation
+3. If verdict is **PASS WITH WARNINGS** — address warnings if quick, otherwise note them and proceed
+4. Record the review verdict in the tracker
+**Gate 2: Review test code** (after `write-test-code`, before final test execution)
+1. Load the `review-test-code` skill and run it against the implemented test files
+2. If verdict is **NEEDS REVISION** — fix all blockers before running tests for signoff
+3. If verdict is **PASS WITH WARNINGS** — fix warnings that affect stability, proceed with execution
+4. Record the review verdict in the tracker
+**Checkpoint: Test execution**
+1. Run the tests: `npx playwright test <file> --reporter=list 2>&1`
+2. Record full output — green test output is the only proof of correctness
+3. If tests fail, load the `fix-flaky-tests` skill and follow its structured diagnostic approach. Do not guess-and-retry.
+4. Maximum 3 fix-and-rerun cycles per test. If stuck after 3 cycles, record the diagnostic output in the tracker and move on.
+**Test execution and coverage checks must never be delegated to subagents.** Run `npx playwright test` directly and record the output.
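The three-cycle cap from the checkpoint can be expressed as a bounded loop. In this sketch, `runTests` and `attemptFix` are hypothetical callbacks supplied by the caller (for instance, a wrapper around `npx playwright test` and a diagnostic step); the cap is applied per invocation, which is one reading of "per test".

```typescript
interface RunResult {
  passed: boolean;
  output: string;
}

interface CycleReport {
  passed: boolean;
  fixCycles: number;
  outputs: string[]; // full output of every run, for the tracker
}

// Runs once, then performs at most `maxFixCycles` fix-and-rerun cycles.
function runWithFixCycles(
  runTests: () => RunResult,
  attemptFix: (failure: RunResult) => void,
  maxFixCycles: number = 3,
): CycleReport {
  const outputs: string[] = [];
  let result = runTests();
  outputs.push(result.output);
  let fixCycles = 0;
  while (!result.passed && fixCycles < maxFixCycles) {
    attemptFix(result);
    fixCycles++;
    result = runTests();
    outputs.push(result.output);
  }
  return { passed: result.passed, fixCycles, outputs };
}
```

If the report comes back with `passed: false` after three cycles, the collected outputs are what gets recorded in the tracker before moving on.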
+#### Update the tracker as you work
+Do not wait until session end. After each meaningful chunk of progress (completing a step, discovering a blocker, producing an artifact), update the tracker. If your context window resets, only what's in the tracker survives.
+Keep the tracker and task list synchronized. If you record progress in the tracker, update the corresponding task status in the same phase of work.
+#### Error recovery
+If infrastructure failures occur (browser MCP unavailable, clone failures, npm install errors), see [references/error-recovery.md](references/error-recovery.md) for diagnostic steps. General pattern: diagnose, attempt one known fix, if still stuck record in tracker and ask the user.
+### 4. End of Session
+Before exiting:
+1. Ensure the tracker reflects all progress, discoveries, and blockers from this session
+2. Write clear instructions for what the next session should do
+3. If all work is complete and all tests pass with full TC-ID coverage: write `<!-- E2E_COMPLETE -->` as the last line of the tracker
+4. If an unrecoverable blocker prevents progress: write `<!-- E2E_BLOCKED: reason -->` as the last line
+Do not write terminal markers prematurely. Only after you are confident the work is truly done or truly stuck.
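A resuming session, or a wrapper script, can detect these markers mechanically. The following is a minimal sketch with hypothetical names; it reads the last non-empty line so that a trailing newline does not hide the marker, which is a slight relaxation of "last line".

```typescript
type TrackerStatus =
  | { state: 'complete' }
  | { state: 'blocked'; reason: string }
  | { state: 'in-progress' };

// Reads the terminal marker from the last non-empty line of the tracker, if any.
function readTrackerStatus(trackerContent: string): TrackerStatus {
  const lines = trackerContent
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '');
  const last = lines[lines.length - 1] ?? '';
  if (last === '<!-- E2E_COMPLETE -->') {
    return { state: 'complete' };
  }
  const blocked = /^<!-- E2E_BLOCKED: (.+) -->$/.exec(last);
  if (blocked) {
    return { state: 'blocked', reason: blocked[1] ?? '' };
  }
  return { state: 'in-progress' };
}
```

A marker that appears earlier in the file is deliberately ignored, so quoting one in the tracker notes does not end the workflow.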
+## Scaffolding
+If Playwright is not set up in the target project, follow the procedure in [references/scaffolding.md](references/scaffolding.md) to clone the boilerplate, merge configuration, and install dependencies. Log all scaffolding steps in the tracker.
+## Authentication
+If the target feature requires login, follow [references/authentication.md](references/authentication.md). Key rule: never hardcode credentials — use environment variables or `storageState`.
+## Principles
+- **Skills carry the knowledge** — load the relevant skill before each activity; do not improvise
+- **Subagent-driven** — delegate heavy browser and writing work to subagents to save context
+- **Follow existing conventions** — match the project's test style, not a generic template
+- **Traceable** — every test links back to a TC-ID from the spec
+- **Use what the project provides** — if the scaffolded boilerplate includes Page Object Models (BasePage, pages/), path aliases (tsconfig paths), or utility modules, USE them in generated tests. Do not ship infrastructure that tests ignore. If a boilerplate file is unused after test generation, either integrate it or remove it — dead code in test infrastructure causes confusion.
+- **No arbitrary waits** — use Playwright's built-in auto-wait and event-driven waits
+- **API before UI for setup** — use API calls (`request` fixture) for test data; reserve UI for what you are verifying
+- **Test failures, not just success** — every feature needs error path coverage
+- **Artifacts live in standard locations** — `e2e-plan/` for analysis, `test-cases/` for specs, project test dir for test files
+## Example Usage
+```
+Claude Code: /add-e2e-tests https://myapp.com/checkout Checkout flow with cart, shipping, and payment
+Codex: $add-e2e-tests https://myapp.com/checkout Checkout flow with cart, shipping, and payment
+Claude Code: /add-e2e-tests https://myapp.com/login User authentication including social login
+Codex: $add-e2e-tests https://myapp.com/login User authentication including social login
+```

package/skills/add-e2e-tests/agents/claude.yaml ADDED

@@ -0,0 +1,3 @@
+frontmatter:
+  argument-hint: "<url> <feature to test>"
+  user-invocable: true

package/skills/add-e2e-tests/agents/openai.yaml ADDED

@@ -0,0 +1,10 @@
+interface:
+  display_name: "Run Full E2E Workflow"
+  short_description: "Orchestrate browser-led E2E work from exploration to verified tests"
+  default_prompt: "Run the full E2E workflow for this feature: maintain a concise progress tracker, explore the live product first, then plan, spec, review, implement, and verify the Playwright tests."
+dependencies:
+  tools:
+    - type: "mcp"
+      value: "agent-web-interface"
+      description: "Browser automation MCP used to explore the site and verify flows"

package/skills/add-e2e-tests/references/authentication.md ADDED

@@ -0,0 +1,8 @@
+# Authentication Handling for E2E Tests
+If the target feature requires login or any form of authentication:
+1. **Check existing infrastructure** — look for existing test fixtures, environment variables, or auth setup files that already handle authentication. Load the `analyze-test-codebase` skill to find auth patterns.
+2. **Ask the user if no auth setup exists** — request credentials or an auth strategy (stored auth state, API tokens, test accounts). Do not proceed with tests that require login until auth is resolved.
+3. **Never hardcode credentials** — use environment variables, Playwright's `storageState`, or the project's existing auth fixture pattern.
+4. **Handle mid-session auth discovery** — if you discover auth is needed mid-session (e.g., a page redirects to login), ask the user immediately and add auth setup as a prerequisite task.