
@athenaflow/plugin-e2e-test-builder 2.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (190)

package/dist/2.0.9/claude/plugin/skills/analyze-test-codebase/SKILL.md ADDED

@@ -0,0 +1,142 @@
+---
+name: analyze-test-codebase
+description: >
+  Scans and reports on an existing Playwright test codebase. This skill should be used to inspect
+  the current Playwright setup before writing, reviewing, or fixing tests, and should be loaded
+  early when working in a new project to understand existing patterns. Covers: Playwright config,
+  page objects, fixtures, helpers, auth/globalSetup patterns, test conventions, and directory
+  structure. Triggers: "understand", "check", "show me", "inspect", or "analyze"
+  the current test setup, config, infrastructure, patterns, or conventions. In the full add-tests
+  workflow, this skill serves as an early read-only analysis sub-step before planning detailed
+  coverage or writing code. This skill examines existing files and outputs a structured report.
+  It does NOT write or fix tests, install Playwright, or explore live websites — for live site
+  exploration, use agent-web-interface-guide instead.
+allowed-tools: Read Glob Grep Bash
+---
+# Analyze Test Codebase
+Scan and analyze an existing Playwright test codebase to understand its conventions, configuration, and patterns before writing new tests.
+## Workflow
+1. **Locate Playwright configuration** — search for `playwright.config.ts`, `playwright.config.js`, or `playwright.config.mjs` in the project root and common subdirectories.
+2. **Extract configuration details**:
+   - `baseURL` — the target application URL
+   - `testDir` — where tests live
+   - `projects` — browser/device configurations
+   - `use` — default options (viewport, headless, trace, screenshot)
+   - `webServer` — if Playwright starts the app
+   - `globalSetup` / `globalTeardown` — auth or data seeding scripts
+   - `reporter` — configured reporters
+3. **Scan test directory structure**:
+   ```
+   Glob for **/*.spec.ts, **/*.test.ts, **/*.spec.js, **/*.test.js
+   ```
+   - Count total test files
+   - Identify naming conventions (`.spec.ts` vs `.test.ts`)
+   - Map directory organization (by feature? by page? flat?)
+4. **Detect patterns in existing tests** — read 2-3 representative test files and identify:
+   - **Locator strategy**: `getByRole`, `getByTestId`, CSS selectors, XPath
+   - **Wait strategy**: auto-waits, explicit `waitFor*`, any `waitForTimeout` (flag these)
+   - **Test structure**: AAA pattern, describe/test nesting, use of `test.step`
+   - **Data management**: fixtures, test data files, factory functions
+   - **Page Objects**: POM pattern usage, fixture-based injection
+   - **Auth handling**: `storageState`, global setup, per-test login
+   - **Custom fixtures**: extended `test` with custom fixtures
+   - **Helper utilities**: shared functions, custom assertions
+   - **Network mocking**: `page.route()` usage, HAR recording (`routeFromHAR`), API interceptors
+   - **Visual regression**: `toHaveScreenshot()`, `toMatchSnapshot()`, snapshot directories
+   - **Accessibility testing**: `@axe-core/playwright` usage, custom a11y assertions
+   - **Cross-browser config**: multiple projects in playwright.config (chromium, firefox, webkit)
+   - **Retry configuration**: `retries` count, trace settings (`on-first-retry`)
+   - **Parallelism**: `fullyParallel`, `workers` count, test isolation strategy
+5. **Check for supporting infrastructure**:
+   - `fixtures/` or `helpers/` directories
+   - `pages/` or `pom/` (Page Object Model)
+   - `.env` or `.env.test` files (test-specific environment configuration)
+   - CI configuration (`.github/workflows/`, `Jenkinsfile`, etc.)
+   - `package.json` scripts for running tests
+   - `*.har` files or HAR directories (recorded API responses for mocking)
+   - Snapshot directories (`__snapshots__`, `screenshots/`) for visual regression baselines
+   - Docker or `docker-compose` for test environment setup
+   - Global setup/teardown scripts — what they do (auth, seeding, cleanup)
+6. **Generate report** — output a structured summary:
+```markdown
+## Test Codebase Analysis
+### Framework
+- **Playwright** version X.Y.Z
+- Config: `playwright.config.ts`
+- Test directory: `tests/`
+### Conventions
+- File naming: `*.spec.ts`
+- Locator preference: getByRole > getByTestId
+- Structure: describe blocks by feature
+- Page Objects: Yes, in `tests/pages/`
+- Custom fixtures: Yes, in `tests/fixtures/`
+### Auth
+- Method: storageState via global setup
+- Auth file: `tests/.auth/user.json`
+### Existing Coverage
+- Total test files: N
+- Feature areas covered: [list]
+- Test count: ~N tests
+### Network Mocking
+- Method: page.route() / HAR / none
+- Patterns: [list any existing mocking patterns]
+### Visual Testing
+- Enabled: Yes/No
+- Tool: toHaveScreenshot() / third-party / none
+- Baseline directory: [path if exists]
+### Accessibility Testing
+- Enabled: Yes/No
+- Tool: @axe-core/playwright / custom assertions / none
+### Cross-Browser & CI
+- Browser projects: [list from config]
+- Retries: N (CI) / N (local)
+- Workers: N / fullyParallel: Yes/No
+- CI platform: GitHub Actions / Jenkins / none detected
+- Trace: [setting from config]
+### Recommendations for New Tests
+- Follow existing `*.spec.ts` naming
+- Use page objects from `tests/pages/`
+- Import test data from `tests/fixtures/testData.ts`
+- Use `baseURL` for navigation (relative paths)
+- Auth: reuse existing storageState setup
+```
+## Out of Scope
+This skill only reads and reports on the codebase. For related tasks, use the appropriate skill:
+| Task | Skill |
+|------|-------|
+| Browsing a live site, interacting with UI elements | `agent-web-interface-guide` |
+| Deciding what to test, coverage gaps, priorities | `plan-test-coverage` |
+| Writing or modifying executable test code | `write-test-code` |
+| Diagnosing flaky or failing tests | `fix-flaky-tests` |
+## Example Usage
+```
+Claude Code: /analyze-test-codebase
+Codex: $analyze-test-codebase
+Claude Code: /analyze-test-codebase ./my-app
+Codex: $analyze-test-codebase ./my-app
+```
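
The naming-convention part of workflow step 3 in the skill above can be sketched in a few lines. This is only an illustration, not code shipped in the package: `summarizeNaming`, `NamingSummary`, and the sample paths are hypothetical, and the path list is assumed to come from a glob such as `**/*.spec.ts` run elsewhere.

```typescript
// Hypothetical helper (not part of the package): summarize test-file naming
// conventions from a list of paths that a glob has already collected.
type NamingSummary = {
  total: number;
  bySuffix: Record<string, number>;
  dominant: string | null;
};

// The four suffixes named in the skill's glob step.
const SUFFIXES = [".spec.ts", ".test.ts", ".spec.js", ".test.js"];

function summarizeNaming(paths: string[]): NamingSummary {
  const bySuffix: Record<string, number> = {};
  let total = 0;
  let dominant: string | null = null;
  let dominantCount = 0;
  for (const path of paths) {
    const suffix = SUFFIXES.find((s) => path.endsWith(s));
    if (suffix === undefined) continue; // not a test file under these conventions
    const count = (bySuffix[suffix] ?? 0) + 1;
    bySuffix[suffix] = count;
    total += 1;
    if (count > dominantCount) {
      dominant = suffix;
      dominantCount = count;
    }
  }
  return { total, bySuffix, dominant };
}

// Sample input (made up for illustration).
const namingSummary = summarizeNaming([
  "tests/login.spec.ts",
  "tests/cart.spec.ts",
  "tests/legacy/search.test.js",
  "tests/pages/LoginPage.ts",
]);
// namingSummary.total is 3 and namingSummary.dominant is ".spec.ts"
```

The dominant suffix is what the report's "File naming" line would record; mixed counts hint that the codebase has more than one convention.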

package/dist/2.0.9/claude/plugin/skills/analyze-test-codebase/agents/claude.yaml ADDED

@@ -0,0 +1,3 @@
+frontmatter:
+  argument-hint: "[optional: path to project root]"
+  user-invocable: true

package/dist/2.0.9/claude/plugin/skills/analyze-test-codebase/agents/openai.yaml ADDED

@@ -0,0 +1,4 @@
+interface:
+  display_name: "Analyze Existing Test Setup"
+  short_description: "Inspect Playwright conventions before planning or writing tests"
+  default_prompt: "Analyze this Playwright codebase and summarize the current setup as an early workflow step before planning detailed coverage or writing test code."

package/dist/2.0.9/claude/plugin/skills/fix-flaky-tests/SKILL.md ADDED

@@ -0,0 +1,160 @@
+---
+name: fix-flaky-tests
+description: >
+  This skill should be used when a Playwright test is failing, flaky, timing out, or behaving
+  inconsistently. It provides structured root cause analysis for: stabilizing intermittent tests,
+  debugging timeouts ("Test timeout of 30000ms exceeded"), fixing race conditions, investigating
+  local-vs-CI divergence, running repeated stability checks (--repeat-each).
+  IMPORTANT: If running tests with --repeat-each, --retries, or multiple times to check stability,
+  STOP and load this skill first — it has structured root cause analysis that prevents brute-force
+  approaches. Triggers: "stabilize", "intermittent", "flaky", "keeps failing", "fails in CI",
+  "timeout on", "race condition", "run N times to check stability", "verify tests are stable".
+  NOT for writing new tests (use write-test-code) or analyzing setup (use analyze-test-codebase).
+allowed-tools: Read Write Edit Bash Glob Grep Task
+---
+# Fix Flaky Tests
+Systematically diagnose and fix intermittent Playwright test failures using root cause analysis. A flaky test is worse than no test — it trains teams to ignore failures.
+## Input
+Parse the test file path or test name from: $ARGUMENTS
+If no argument provided, ask: "Which test file or test name is flaky?"
+## Workflow
+### Step 1: Reproduce and Classify
+1. **Read the test file** to understand what it tests and how
+2. **Run the test multiple times** to observe the failure pattern:
+   ```bash
+   npx playwright test <file> --repeat-each=5 --reporter=list 2>&1
+   ```
+3. **Run the full suite** if it passed above, since it may only fail alongside other tests:
+   ```bash
+   npx playwright test --reporter=list 2>&1
+   ```
+4. **Classify the failure** into one of these root cause categories:
+| Category | Symptoms |
+|----------|----------|
+| **Timing** | Timeout errors, "element not found", "not visible yet" |
+| **State leakage** | Passes alone, fails when run with other tests |
+| **Data dependency** | Fails when expected data doesn't exist or has changed |
+| **Race condition** | Action fires before page is ready (hydration, animation) |
+| **Selector fragility** | Element found but wrong one, or `.first()` picks different element |
+| **Environment** | Passes locally, fails in CI (viewport, speed, resources) |
+### Step 2: Root Cause Analysis
+Investigate based on the classification:
+**Timing issues:**
+- Look for assertions immediately after actions with no wait for the resulting state change
+- Check if the test asserts before an API response arrives — search for missing `waitForResponse`
+- Look for animations/transitions that affect element state (CSS transitions, skeleton screens)
+- Check for `waitForTimeout` being used as a "fix" — this is a symptom, not a cure
+- Check if `networkidle` or `load` waitUntil would help for navigation
+**State leakage:**
+- Run the failing test alone: `npx playwright test --grep "<test name>"`
+- Check if tests share mutable state: global variables, database rows, cookies, localStorage
+- Look for missing cleanup in `afterEach`/`afterAll`
+- Check if `storageState` bleeds between tests or test files
+- Check for test data created by one test that another test depends on
+**Race conditions:**
+- Identify the race: what two things are happening concurrently?
+- Check for click handlers that fire before JavaScript hydration completes
+- Look for optimistic UI updates that revert on API response
+- Check for actions during navigation transitions (click during page load)
+- Look for double-clicks or rapid interactions that trigger duplicate actions
+**Selector fragility:**
+- Navigate to the page in the browser and verify the selector currently matches the intended element
+- Check if the selector matches multiple elements — `.first()` or `.nth()` is a smell
+- Look for dynamically generated IDs, classes, or attributes
+- Check for conditional rendering that changes element order or presence
+- Verify locators against current DOM structure using `find` and `get_element`
+**Environment issues:**
+- Compare CI viewport size vs local — element may be off-screen in CI
+- Check for timezone-dependent assertions (dates, timestamps)
+- Check for locale-dependent formatting (numbers, currency)
+- Check if CI has slower network/CPU affecting timing
+- Look for third-party scripts (analytics, chat widgets) that load differently in CI
+### Step 3: Apply the Correct Fix
+Use the right fix pattern for the diagnosed root cause. **Never apply a fix without understanding the cause.** See [references/fix-patterns.md](references/fix-patterns.md) for full code examples.
+| Category | Principle |
+|----------|-----------|
+| **Timing** | Replace sleeps with event-driven waits (`waitForResponse`, auto-retrying assertions) |
+| **State isolation** | Unique data per test, API-based reset in `beforeEach`, no shared mutable state |
+| **Race condition** | Use `Promise.all` for action + expected response; wait for hydration before interaction |
+| **Selector** | Scope locators to containers with unique content; avoid `.first()` and position-dependent selectors |
+| **Environment** | Explicit viewport, timezone-agnostic assertions, block interfering third-party scripts |
+### Step 4: Verify the Fix
+1. **Run the test 5+ times** to confirm stability:
+   ```bash
+   npx playwright test <file> --repeat-each=5 --reporter=list 2>&1
+   ```
+2. **Run with the full test suite** to verify no state leakage:
+   ```bash
+   npx playwright test --reporter=list 2>&1
+   ```
+3. If still flaky → return to Step 2 with the new failure output. The initial classification may have been wrong.
+4. **Maximum 3 fix-and-rerun cycles.** If the test is still flaky after 3 attempts, stop and report the diagnostic findings (root cause hypothesis, fixes attempted, remaining failure output) so the user can decide next steps. Do not continue looping.
+### Step 5: Summarize
+Report:
+1. **Root cause** — what made the test flaky and why
+2. **Fix applied** — what changed and why this fix addresses the root cause
+3. **Verification** — how many consecutive runs passed
+4. **Prevention** — what pattern to follow in future tests to avoid this class of flakiness
+## Flakiness Checklist (Less Obvious Causes)
+When the standard categories don't fit, check these:
+- [ ] **Viewport size** — element off-screen in CI (smaller viewport)
+- [ ] **Font rendering** — text matching fails due to font differences across OS
+- [ ] **Timezone** — date/time assertions fail in different timezones
+- [ ] **Locale** — number/currency formatting differs (1,000 vs 1.000)
+- [ ] **Third-party scripts** — analytics/chat widgets change DOM or block clicks
+- [ ] **Cookie consent banners** — overlay blocks click targets
+- [ ] **Feature flags** — different features enabled in different environments
+- [ ] **Database state** — shared test database with stale or conflicting data
+- [ ] **Parallel execution** — tests interfere when run in parallel workers
+- [ ] **Browser caching** — cached responses differ from fresh ones
+- [ ] **Service workers** — intercepting requests differently than expected
+- [ ] **Lazy loading** — elements not yet in DOM when test tries to interact
+## Anti-Patterns: What is NOT a Fix
+These mask the problem. Never apply them without a real fix:
+| "Fix" | Why It's Wrong | Real Fix |
+|-------|---------------|----------|
+| `waitForTimeout(3000)` | Hides timing race, will break under load | Wait for the specific event |
+| `.first()` added | Hides selector ambiguity | Narrow the selector |
+| Increased timeout to 30s | Hides missing wait or slow setup | Find what you're actually waiting for |
+| `test.skip()` | Ignoring the problem | Diagnose and fix |
+| `retries: 3` without fix | Masks real failures, wastes CI time | Fix the root cause, then keep retries as safety net |
+| `{ force: true }` | Bypasses actionability checks, hides overlapping elements or disabled state | Find and fix the actionability issue: wait for overlay to disappear, scroll element into view, or wait for enabled state |
+| `try/catch` swallowing errors | Test passes but doesn't verify anything | Fix the assertion |
+## Multiple Flaky Tests
+When a suite has several flaky tests:
+1. **Triage first.** Run the full suite once and group failures by root cause category (timing, state leakage, etc.). Shared root causes (broken fixture, leaking state) should be fixed once, not per-test.
+2. **Fix shared infrastructure issues first.** A bad `beforeEach`, a leaking `storageState`, or a missing cleanup can cause many tests to fail. One fix resolves many failures.
+3. **Split independent fixes across subagents** when the fix scopes do not overlap (different test files, no shared fixtures). Pass each subagent the test file path, this diagnostic workflow, and the root cause classification table.
+4. The 3 fix-and-rerun cycle limit applies **per test**, not globally.
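
The classification table in Step 1 of the skill above can be read as a small decision procedure. The sketch below is one possible encoding under stated assumptions: `Observation`, `classifyFailure`, and the order of the checks are hypothetical, and the message matching is a rough heuristic that only yields a starting hypothesis for Step 2, not a diagnosis. It covers four of the six categories; data dependency and race conditions usually need the manual investigation the skill describes.

```typescript
// Hypothetical sketch (not part of the package): map observed symptoms to one
// of the root cause categories from the Step 1 table.
type Category =
  | "Timing"
  | "State leakage"
  | "Selector fragility"
  | "Environment"
  | "Unclassified";

interface Observation {
  passesAlone: boolean; // passes when run by itself
  failsInSuite: boolean; // fails when run with other tests
  passesLocally: boolean;
  failsInCI: boolean;
  errorMessage: string; // first line of the failure output
}

function classifyFailure(o: Observation): Category {
  const msg = o.errorMessage.toLowerCase();
  // Behavioural signals are checked first; they are stronger than message text.
  if (o.passesAlone && o.failsInSuite) return "State leakage";
  if (o.passesLocally && o.failsInCI) return "Environment";
  // Playwright reports "strict mode violation" when a locator matches several elements.
  if (msg.includes("strict mode violation")) return "Selector fragility";
  if (msg.includes("timeout") || msg.includes("not visible")) return "Timing";
  return "Unclassified";
}

const timeoutCase = classifyFailure({
  passesAlone: false,
  failsInSuite: true,
  passesLocally: false,
  failsInCI: true,
  errorMessage: "Test timeout of 30000ms exceeded",
});
// timeoutCase is "Timing"
```

Anything that lands in `Unclassified` is a cue to work through the less obvious causes in the flakiness checklist.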

package/dist/2.0.9/claude/plugin/skills/fix-flaky-tests/agents/claude.yaml ADDED

@@ -0,0 +1,3 @@
+frontmatter:
+  argument-hint: "<path to flaky test file or test name>"
+  user-invocable: true

package/dist/2.0.9/claude/plugin/skills/fix-flaky-tests/agents/openai.yaml ADDED

@@ -0,0 +1,10 @@
+interface:
+  display_name: "Fix Flaky Playwright Tests"
+  short_description: "Diagnose unstable Playwright tests and fix the root cause"
+  default_prompt: "Diagnose this flaky Playwright test, reproduce the failure, and fix the root cause."
+dependencies:
+  tools:
+    - type: "mcp"
+      value: "agent-web-interface"
+      description: "Browser automation MCP used to reproduce and inspect unstable flows"

package/dist/2.0.9/claude/plugin/skills/fix-flaky-tests/references/fix-patterns.md ADDED

@@ -0,0 +1,91 @@
+# Fix Patterns by Root Cause
+Code examples for each root cause category. Apply only after diagnosing the cause in Step 2.
+## Timing Fixes — Replace Sleeps with Event-Driven Waits
+```typescript
+// BAD: arbitrary sleep
+await page.waitForTimeout(2000);
+await expect(element).toBeVisible();
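+// GOOD: wait for the network event that loads the content
+// (start waiting before the triggering action so a fast response is not missed)
+const responsePromise = page.waitForResponse(resp => resp.url().includes('/api/data'));
+await trigger.click(); // `trigger` is a hypothetical locator for the action that fires the request
+await responsePromise;
+await expect(element).toBeVisible();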
+// GOOD: wait for loading indicator to disappear
+await expect(page.getByRole('progressbar')).toBeHidden();
+await expect(element).toBeVisible();
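+// ACCEPTABLE as a fallback: wait for navigation to settle (Playwright's docs discourage 'networkidle'; prefer asserting on the content itself)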
+await page.goto('/page', { waitUntil: 'networkidle' });
+// GOOD: use auto-retrying assertion (retries until timeout)
+await expect(page.getByText(/loaded/i)).toBeVisible({ timeout: 10000 });
+```
+## State Isolation Fixes
+```typescript
+// Unique data per test
+const uniqueEmail = `test-${Date.now()}@example.com`;
+// Reset state via API before each test
+test.beforeEach(async ({ request }) => {
+  await request.post('/api/test/reset');
+});
+// Use fresh browser context (default in Playwright, but verify)
+// Do NOT share page or context between tests
+```
+## Race Condition Fixes
+```typescript
+// Wait for hydration/framework readiness
+await page.waitForFunction(() =>
+  document.querySelector('[data-hydrated="true"]')
+);
+// Use Promise.all for action + expected response
+const [response] = await Promise.all([
+  page.waitForResponse('**/api/submit'),
+  submitButton.click(),
+]);
+// Wait for animation/transition to complete
+await expect(modal).toBeVisible();
+await page.waitForFunction(() =>
+  !document.querySelector('.modal-animating')
+);
+```
+## Selector Fixes
+```typescript
+// BAD: position-dependent, matches wrong element if order changes
+page.locator('.item').first();
+// GOOD: scoped to container with unique content
+page.getByRole('listitem').filter({ hasText: 'Specific Item' });
+// GOOD: use test IDs for ambiguous elements
+page.getByTestId('cart-item-sku-123');
+// GOOD: scope to a region first, then find within
+page.locator('main').getByRole('button', { name: /submit/i });
+```
+## Environment Fixes
+```typescript
+// Set explicit viewport in test or config
+test.use({ viewport: { width: 1280, height: 720 } });
+// Use timezone-agnostic assertions
+await expect(dateElement).toContainText(/\d{4}/); // year, not full date string
+// Block third-party scripts that interfere
+await page.route('**/analytics/**', route => route.abort());
+await page.route('**/chat-widget/**', route => route.abort());
+```
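
The "unique data per test" pattern in the state isolation section above relies on `Date.now()`, which can repeat when two tests create data in the same millisecond, for example across parallel workers. A minimal sketch of a more collision-resistant variant follows. `uniqueEmail` and its parameters are hypothetical; the assumption is that the caller passes a per-worker number such as Playwright's `testInfo.workerIndex`, and that each worker is its own process, so the module-level counter is private to that worker.

```typescript
// Hypothetical helper (not part of the package): build test emails that stay
// unique across workers and across calls within the same millisecond.
let emailSequence = 0; // per-process counter

function uniqueEmail(workerIndex: number, now: number = Date.now()): string {
  emailSequence += 1;
  return `test-w${workerIndex}-${now}-${emailSequence}@example.com`;
}

// Same worker and same timestamp still give different addresses.
const emailA = uniqueEmail(0, 1700000000000);
const emailB = uniqueEmail(0, 1700000000000);
// A different worker differs in the worker segment.
const emailC = uniqueEmail(1, 1700000000000);
```

The timestamp is kept so leftover rows in a shared test database can still be traced back to roughly when they were created.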

package/dist/2.0.9/claude/plugin/skills/generate-test-cases/SKILL.md ADDED

@@ -0,0 +1,184 @@
+---
+name: generate-test-cases
+description: >
+  Use when the user wants detailed TC-ID test case specifications for a web app feature, not executable code. It explores the target flow, covers happy paths, validation failures, edge cases, and required error states, then writes structured specs under `test-cases/`. Use it after coverage planning or when the user explicitly asks for test cases or TC-IDs. It does not write Playwright code.
+allowed-tools: Read Write Bash Glob Grep Task
+---
+# Generate Test Cases
+Generate comprehensive, structured test case specifications for a web application by exploring it live in a browser.
+## Input
+Parse the target URL and user journey description from: $ARGUMENTS
+## Workflow
+### Step 1: Understand the Journey and Existing Coverage
+Parse the journey description to identify:
+- **Base URL** and target feature area
+- **Primary user goal** (what the happy path achieves)
+- **Key interaction points** (forms, buttons, navigation, selections)
+- **Implicit requirements** (validation, authentication, authorization)
+Check for existing test coverage before exploring:
+- Search for existing test files related to this feature (`Grep` for feature keywords in `**/*.spec.ts`, `**/*.test.ts`)
+- Read any existing `test-cases/*.md` spec files for this feature
+- Note existing TC-IDs to avoid conflicts — continue numbering from the highest existing ID
+- Focus on gaps in existing coverage
+### Step 2: Explore the Happy Path
+Use a subagent for browser exploration when that saves context. Pass it:
+- The URL and journey description
+- Instructions to walk through each step using `find`, `get_form`, `get_field`
+- Instructions to catalog all interactive elements, form fields, navigation options
+- Instructions to use `get_element` on key elements to capture the best Playwright selector
+- Instructions to return results in this structured format for each step:
+```
+Step: <what was done>
+URL: <current URL after action>
+Elements found:
+  - Submit button: getByRole('button', { name: /submit/i })
+  - Email field: getByLabel(/email/i)
+  - Error message: getByText(/required/i)
+Observations: <what appeared, validation messages, state changes>
+```
+This structured output ensures selectors survive the handoff to the spec file and ultimately to `write-test-code`.
+### Step 3: Explore Alternative and Failure Paths
+Launch another subagent, or continue in the main thread if the flow is small, to systematically probe beyond the happy path:
+**Validation & Error Handling:**
+- Submit forms with empty required fields
+- Enter invalid formats (wrong email, short passwords, letters in number fields)
+- Exceed field length limits, use special characters and Unicode
+**Boundary Conditions:**
+- Min/max values for numeric fields
+- Single character and max length strings
+- Zero quantities, negative numbers, date boundaries
+**State & Navigation:**
+- Browser back/forward during multi-step flows
+- Page refresh mid-flow
+- Accessing later steps directly via URL
+**UI & Interaction:**
+- Rapid repeated clicks on submit buttons
+- Dropdown default values, empty options
+- Loading states, disabled states, conditional visibility
+**Access & Authorization (observe only):**
+- Redirect behavior for unauthenticated users
+- Permission-related error messages
+### Step 4: Reason About Additional Scenarios
+After exploration, reason about scenarios that could not be directly triggered but must be covered:
+- **Network & Performance** — failure modes, slow responses, large data sets, offline behavior
+- **Accessibility (WCAG 2.1 AA)** — keyboard navigation, screen reader support, focus management, contrast
+- **Visual Consistency** — layout stability, responsive breakpoints, dark mode
+- **Cross-browser** — Safari/Firefox/mobile-specific behavioral differences
+- **Concurrent & Session** — session expiry, multi-tab conflicts, race conditions
+See [references/scenario-categories.md](references/scenario-categories.md) for detailed checklists within each category.
+### Step 5: Generate Test Case Specifications
+Write structured test cases to `test-cases/<feature-name>.md`.
+## Output Specification
+### Test Case Format
+```markdown
+### TC-<FEATURE>-<NUMBER>: <Descriptive title>
+**Priority:** Critical | High | Medium | Low
+**Category:** Happy Path | Validation | Error Handling | Edge Case | Boundary | Security | Accessibility | Visual | Performance | Network Error | UX
+**Preconditions:**
+- <What must be true before this test>
+**Steps:**
+1. <Action the tester performs>
+2. <Next action>
+**Expected Result:**
+- <What should happen>
+**Selectors observed:**
+- <element>: `getByRole('button', { name: /submit/i })` or `getByLabel(/email/i)` — from `get_element`/`find` during exploration
+- (Include selectors for key interactive elements so `write-test-code` doesn't have to rediscover them)
+**Notes:**
+- <Additional context discovered during exploration>
+```
+### Output Organization
+```markdown
+# Test Cases: <Feature Name>
+**URL:** <base URL>
+**Generated:** <date>
+**Journey:** <brief description>
+## Summary
+- Total test cases: <count>
+- Critical: <count> | High: <count> | Medium: <count> | Low: <count>
+## Happy Path
+## Validation & Error Handling
+## Edge Cases
+## Boundary Conditions
+## Security & Access
+## Network Error Scenarios
+## Visual & Responsive
+## Performance & Loading
+## Accessibility & UX
+```
+### File Naming Convention
+When generating specs that span multiple roles or test categories, recommend role-based file naming (`*.admin.spec.ts`, `*.user.spec.ts`) or Playwright tag annotations (`@admin`, `@smoke`) in the spec. This enables selective execution via `--grep @admin` or glob patterns instead of fragile `testIgnore` regex in playwright.config.ts. NEVER recommend a `testIgnore` regex that must be updated for every new test file.
+### TC-ID Convention
+- Format: `TC-<FEATURE>-<NNN>` where NNN is zero-padded to 3 digits
+- Feature abbreviation: short and clear (LOGIN, CHECKOUT, SEARCH, SIGNUP)
+- Start at 001, sequential, unique within the document
+- Happy path first, then validation, then edge cases
+## Quality Standards
+- Every test case must be **independently executable** — no hidden dependencies
+- Steps must be **concrete and unambiguous** — "click the Submit button" not "submit the form"
+- Expected results must be **observable and verifiable** — include actual error messages observed
+- Priority must be **justified** — Critical = blocks core journey, High = significant, Medium = secondary, Low = cosmetic
+- Every feature spec MUST include at minimum: one network error scenario (500/timeout), one empty state scenario, and one session/auth edge case (if the feature requires auth). These are non-negotiable — omitting them is a BLOCKER in the review-test-cases quality gate.
+- **Test case count guidance:** Aim for 15-30 test cases per feature area as a baseline. Fewer than 10 suggests missing error paths or edge cases. More than 40 suggests the feature should be split into sub-features with separate spec files. Prioritize breadth of category coverage over depth within a single category.
+## Blocking Conditions
+Report and work around:
+- **Login/auth walls**: Document as precondition, test observable behavior
+- **CAPTCHA**: Report, skip, note in preconditions
+- **Payment gateways**: Don't enter real data, document flow up to that point
+- **Rate limiting**: Slow down, note rate limit behavior as a test case
+## Example Usage
+```
+Claude Code: /generate-test-cases https://example.com/login User logs in with email and password, sees dashboard
+Codex: $generate-test-cases https://example.com/login User logs in with email and password, sees dashboard
+Claude Code: /generate-test-cases https://shop.example.com User searches for product, adds to cart, proceeds to checkout
+Codex: $generate-test-cases https://shop.example.com User searches for product, adds to cart, proceeds to checkout
+```
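
The TC-ID convention in the skill above (zero-padded to 3 digits, continuing from the highest existing ID) is easy to get subtly wrong, so a short sketch may help. `nextTcId` is a hypothetical helper, not something the package ships; it assumes existing IDs have already been collected from the `test-cases/*.md` files.

```typescript
// Hypothetical helper (not part of the package): compute the next TC-ID for a
// feature, continuing from the highest number already in use.
function nextTcId(feature: string, existingIds: string[]): string {
  const prefix = `TC-${feature.toUpperCase()}-`;
  let highest = 0;
  for (const id of existingIds) {
    if (!id.startsWith(prefix)) continue; // belongs to another feature
    const n = Number.parseInt(id.slice(prefix.length), 10);
    if (Number.isInteger(n) && n > highest) highest = n; // ignores malformed IDs
  }
  return prefix + String(highest + 1).padStart(3, "0");
}

const firstId = nextTcId("login", []);
// firstId is "TC-LOGIN-001"
const continuedId = nextTcId("LOGIN", [
  "TC-LOGIN-001",
  "TC-LOGIN-007",
  "TC-CHECKOUT-012",
]);
// continuedId is "TC-LOGIN-008"
```

Taking the maximum rather than the count of existing IDs avoids reusing a number when earlier test cases were deleted.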

package/dist/2.0.9/claude/plugin/skills/generate-test-cases/agents/claude.yaml ADDED

@@ -0,0 +1,3 @@
+frontmatter:
+  argument-hint: "<url> <user journey description>"
+  user-invocable: true

package/dist/2.0.9/claude/plugin/skills/generate-test-cases/agents/openai.yaml ADDED

@@ -0,0 +1,10 @@
+interface:
+  display_name: "Write TC-ID Specs"
+  short_description: "Explore the feature and write detailed TC-ID test case specs"
+  default_prompt: "Explore this feature and generate detailed TC-ID test case specifications."
+dependencies:
+  tools:
+    - type: "mcp"
+      value: "agent-web-interface"
+      description: "Browser automation MCP used to inspect the live flow before writing specs"

package/dist/2.0.9/claude/plugin/skills/generate-test-cases/references/scenario-categories.md ADDED

@@ -0,0 +1,36 @@
+# Scenario Categories — Detailed Checklists
+These checklists support Step 4 of the generate-test-cases skill. Each category covers scenarios that may not be directly triggerable during browser exploration but must be included in comprehensive test specifications.
+## Network & Performance
+- Network failure during form submission (mock 500, timeout)
+- Slow API response (loading states, skeleton screens, spinners)
+- Large data sets (pagination, infinite scroll, 100+ items)
+- Offline behavior (if PWA or service worker is present)
+## Accessibility (WCAG 2.1 AA)
+- Keyboard-only navigation through the entire flow (Tab, Enter, Escape)
+- Screen reader announcements for dynamic content (ARIA live regions)
+- Focus management after modal open/close, page transitions
+- Color contrast for error states and disabled elements
+- Form error association (`aria-describedby` linking errors to fields)
+## Visual Consistency
+- Layout stability (no unexpected content shifts after load)
+- Responsive behavior at standard breakpoints (mobile 375px, tablet 768px, desktop 1280px)
+- Dark mode rendering if supported
+## Cross-browser Considerations
+- Safari-specific behavior (date inputs, smooth scrolling, storage quirks)
+- Firefox form validation differences
+- Mobile browser touch targets and gestures
+## Concurrent & Session
+- Session expiry mid-flow (cookie cleared during multi-step)
+- Concurrent access (two tabs, same user)
+- Race conditions (double-click submit, rapid navigation)
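
Since the checklists above exist to catch scenarios that exploration misses, one way to use them is as a coverage check over a drafted spec. The sketch below assumes each test case has been tagged with one of the five checklist headings; `SpecEntry`, `missingCategories`, and the sample entries are hypothetical and not part of the package.

```typescript
// Hypothetical sketch (not part of the package): report which scenario
// categories from the checklists have no test case yet.
const SCENARIO_CATEGORIES = [
  "Network & Performance",
  "Accessibility",
  "Visual Consistency",
  "Cross-browser",
  "Concurrent & Session",
] as const;

type ScenarioCategory = (typeof SCENARIO_CATEGORIES)[number];

interface SpecEntry {
  id: string; // TC-ID of the test case
  scenario: ScenarioCategory | null; // null for cases found by direct exploration
}

function missingCategories(entries: SpecEntry[]): ScenarioCategory[] {
  const covered = new Set(entries.map((e) => e.scenario));
  return SCENARIO_CATEGORIES.filter((c) => !covered.has(c));
}

// Sample draft spec (made up for illustration).
const gaps = missingCategories([
  { id: "TC-LOGIN-001", scenario: null },
  { id: "TC-LOGIN-014", scenario: "Network & Performance" },
  { id: "TC-LOGIN-015", scenario: "Accessibility" },
  { id: "TC-LOGIN-016", scenario: "Concurrent & Session" },
]);
// gaps is ["Visual Consistency", "Cross-browser"]
```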