@athenaflow/plugin-e2e-test-builder 2.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (190)

package/dist/2.0.9/claude/plugin/skills/plan-test-coverage/SKILL.md ADDED

@@ -0,0 +1,117 @@
+---
+name: plan-test-coverage
+description: >
+  Use before writing specs or test code to decide what E2E coverage is needed first. It scans existing tests, inspects the target flow, finds coverage gaps, and produces a prioritized P0/P1/P2 plan with TC-IDs. Use it for requests like "what tests do I need", "coverage gaps", or "what TC-IDs are missing". It does not write detailed specs or executable tests.
+allowed-tools: Read Glob Grep Task
+---
+# Plan Test Coverage
+Plan what E2E tests to write for a feature by analyzing existing test coverage and, when browser tooling is available in the current context, doing a quick site inspection.
+## Workflow
+1. **Parse input** — extract the target URL and feature area from: $ARGUMENTS
+2. **Check existing test coverage**:
+   - Search for existing test files related to the feature:
+     ```
+     Grep for feature keywords in **/*.spec.ts, **/*.test.ts
+     ```
+   - Identify what's already covered and what's missing
+   - Note existing TC-IDs for the feature area to avoid conflicts
+3. **Quick site inspection** (lightweight, not full exploration, optional if browser tooling is unavailable):
+   - If the current context has browser tools, follow the `agent-web-interface-guide` skill's browsing patterns (orient before acting, use `list_pages` for session awareness, close only pages you opened)
+   - Navigate to the URL in a dedicated page
+   - Use `find` to catalog the main interactive elements
+   - Use `get_form` or `get_field` if the page has forms worth covering
+   - Identify the key user flows visible on the page
+   - Close only the page you opened when done; do not rely on a session-wide close
+   - If browser tooling is unavailable, infer flows from the URL, existing tests, route names, component names, and user-provided context, and record that the plan was produced without live inspection
+4. **Identify test categories** — for the feature, determine tests needed across:
+   - **Critical path** — core happy path that must never break
+   - **Input validation** — form fields, required fields, format constraints
+   - **Error states** — network errors, server errors, empty states
+   - **Edge cases** — boundary values, special characters, concurrent actions
+   - **Cross-feature** — interactions with other features (e.g., auth + checkout)
+   - **Accessibility** — keyboard navigation, screen reader support, focus management
+   - **Visual regression** — layout consistency, responsive breakpoints (375px, 768px, 1280px)
+   - **Performance** — loading states, lazy loading, large data sets
+   - **Network errors** — server 500s, timeouts, offline behavior
+   Not all categories apply to every project. Include Accessibility, Visual Regression, and Cross-Browser sections only when the project has explicit requirements, tooling, or configuration for them. Omit them from the output plan if not relevant — a focused plan is more useful than a padded one.
+5. **Prioritize** — rank tests by:
+   - **P0 (Must have)**: Core user journey, auth flows, data corruption prevention. Blocks revenue/signups if broken.
+   - **P1 (Should have)**: Input validation, common error paths, accessibility basics (keyboard navigation, form labels)
+   - **P2 (Nice to have)**: Edge cases, visual regression, performance scenarios, cross-browser specifics, rare error paths
+6. **Output test plan**:
+```markdown
+## Test Coverage Plan: <Feature>
+**URL:** <url>
+**Date:** <date>
+**Existing coverage:** <N tests already exist / none>
+### Already Covered
+- TC-FEATURE-001: <description> (in `tests/feature.spec.ts`)
+- ...
+### Proposed New Tests
+#### P0 — Critical Path
+| TC-ID | Description | Why Critical |
+|-------|-------------|-------------|
+| TC-FEATURE-010 | Happy path: user completes full flow | Core revenue path |
+#### P1 — Validation & Errors
+| TC-ID | Description | Why Important |
+|-------|-------------|--------------|
+| TC-FEATURE-020 | Submit with empty required fields | Common user error |
+#### P2 — Edge Cases
+| TC-ID | Description | Notes |
+|-------|-------------|-------|
+| TC-FEATURE-030 | Special characters in search input | Unicode handling |
+#### Accessibility (include if project has accessibility requirements or WCAG compliance goals)
+| TC-ID | Description | WCAG Criterion |
+|-------|-------------|----------------|
+| TC-FEATURE-A01 | Keyboard-only navigation through flow | 2.1.1 Keyboard |
+| TC-FEATURE-A02 | Form errors announced to screen readers | 1.3.1 Info and Relationships |
+#### Visual Regression (if project has visual testing setup)
+| TC-ID | Description | Viewport |
+|-------|-------------|----------|
+| TC-FEATURE-V01 | Layout consistency at mobile width | 375x812 |
+#### Cross-Browser Matrix (include if project runs tests across multiple browsers)
+| Browser | Priority | Reason |
+|---------|----------|--------|
+| Chromium | P0 | Primary target |
+| Firefox | P1 | Second largest desktop share |
+| WebKit/Safari | P1 | Required for iOS users |
+### Recommended Order
+1. Write P0 tests first (N tests)
+2. Then P1 validation + accessibility basics (N tests)
+3. P2 edge cases, visual regression, and performance as time allows
+### Next Steps
+- Invoke the `generate-test-cases` skill with the target URL and journey for detailed test specs
+- Invoke the `write-test-code` skill to implement the tests
+```
+## Example Usage
+```
+Claude Code: /plan-test-coverage https://myapp.com/checkout Checkout flow
+Codex: $plan-test-coverage https://myapp.com/checkout Checkout flow
+Claude Code: /plan-test-coverage https://myapp.com/login Authentication
+Codex: $plan-test-coverage https://myapp.com/login Authentication
+```
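
Step 2 of the workflow above asks the planner to note existing TC-IDs so that new ones do not conflict. A minimal sketch of that bookkeeping follows. The helper names `collectTcNumbers` and `nextFreeTcId` are hypothetical and not part of the package; the sketch assumes IDs follow the `TC-<FEATURE>-<NNN>` shape from the plan template and that the feature name is plain alphanumeric text.

```typescript
// Hypothetical helpers (not part of the package): find TC-ID numbers already
// used for a feature and suggest the next free one.

/** Return the sorted, de-duplicated numeric suffixes of TC-<FEATURE>-<NNN> IDs. */
function collectTcNumbers(feature: string, sources: string[]): number[] {
  // Assumes `feature` is alphanumeric, so it needs no regex escaping.
  // Letter-prefixed IDs such as TC-FEATURE-A01 are deliberately ignored here.
  const pattern = new RegExp("\\bTC-" + feature.toUpperCase() + "-(\\d{3})\\b", "g");
  const seen: Record<number, boolean> = {};
  const found: number[] = [];
  for (const text of sources) {
    pattern.lastIndex = 0; // reset the global regex before scanning each text
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      const n = Number(match[1]);
      if (!seen[n]) {
        seen[n] = true;
        found.push(n);
      }
    }
  }
  return found.sort((a, b) => a - b);
}

/** Suggest the first unused ID at or above `start` (10 for P0, 20 for P1 in the template). */
function nextFreeTcId(feature: string, used: number[], start: number): string {
  let n = start;
  while (used.indexOf(n) !== -1) n += 1;
  const digits = String(n);
  const padded = digits.length >= 3 ? digits : ("000" + digits).slice(-3);
  return "TC-" + feature.toUpperCase() + "-" + padded;
}

// Usage: file contents would come from the Grep/Read step; these strings are made up.
const existingSpecs = [
  "test('TC-CHECKOUT-001: guest can open cart', async () => {});",
  "test('TC-CHECKOUT-010: user completes full flow', async () => {});",
];
const usedNumbers = collectTcNumbers("checkout", existingSpecs);
console.log(nextFreeTcId("checkout", usedNumbers, 10)); // TC-CHECKOUT-011
```

Starting the search at a per-priority offset keeps the P0/P1/P2 numbering bands from the template intact while still skipping IDs that are already taken.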

package/dist/2.0.9/claude/plugin/skills/plan-test-coverage/agents/claude.yaml ADDED

@@ -0,0 +1,3 @@
+frontmatter:
+  argument-hint: "<url> <feature or area to test>"
+  user-invocable: true

package/dist/2.0.9/claude/plugin/skills/plan-test-coverage/agents/openai.yaml ADDED

@@ -0,0 +1,10 @@
+interface:
+  display_name: "Plan Coverage Priorities"
+  short_description: "Identify coverage gaps and prioritize what to test first"
+  default_prompt: "Review this feature and produce a prioritized E2E coverage plan without writing specs or code."
+dependencies:
+  tools:
+    - type: "mcp"
+      value: "agent-web-interface"
+      description: "Browser automation MCP used for lightweight site inspection"

package/dist/2.0.9/claude/plugin/skills/review-test-cases/SKILL.md ADDED

@@ -0,0 +1,147 @@
+---
+name: review-test-cases
+description: >
+  This skill should be used when a quality review of TC-ID test case specifications is needed before writing executable
+  test code. It reviews the spec artifact only; it does not implement or rewrite tests.
+  Triggers: "review test cases", "check test specs", "review TC-IDs", "audit test coverage",
+  "are my test cases good", "validate test specs", "review test-cases/*.md",
+  "check for gaps in test cases", "review before writing tests", "quality check test specs".
+  Inserted as a quality gate between generate-test-cases and write-test-code — catches
+  gaps, duplication, weak assertions, missing error paths, and invented scenarios before they get
+  encoded into test code. Review-only — does NOT modify the spec file, does NOT write test code.
+  The write-test-code skill should be used for implementation.
+allowed-tools: Read Glob Grep Task
+---
+# Review Test Cases
+Review TC-ID test case specifications for completeness, accuracy, and quality before they are implemented as executable Playwright tests. This is a quality gate — catch problems in the spec, not in the code.
+## Input
+Parse the spec file path from: $ARGUMENTS
+If no argument provided, search for `test-cases/*.md` files and review the most recently modified one.
+## Workflow
+### Step 1: Load the Spec and Context
+1. Read the test case spec file
+2. Read any related files for context:
+   - `e2e-plan/conventions.md` or `e2e-plan/coverage-plan.md` if they exist
+   - `e2e-tracker.md` if it exists (to understand what was explored)
+3. Extract the target URL from the spec header
+### Step 2: Run the Review Checklist
+Evaluate every test case against each criterion. Track findings by severity:
+- **BLOCKER** — must fix before writing tests (missing critical paths, invented behavior, wrong URL)
+- **WARNING** — should fix, will cause problems in implementation (vague steps, weak assertions, duplication)
+- **SUGGESTION** — optional improvement (priority adjustment, better categorization, additional edge case)
+#### 2a. Coverage Completeness
+| Check | What to Look For |
+|-------|-----------------|
+| Happy path present | At least one Critical-priority test covers the primary success flow end-to-end |
+| Error paths covered (MINIMUM) | Every spec MUST have at least: (1) one server error test (500), (2) one network failure test (timeout/offline), (3) one empty state test. If auth is involved: (4) one session expiry test. Missing any of these is a BLOCKER, not a suggestion |
+| Boundary conditions | Min/max values, empty inputs, special characters, long strings |
+| Authentication edge cases | Session expiry, unauthorized access, role-based differences (if applicable) |
+| Navigation edge cases | Back/forward, direct URL access, refresh mid-flow |
+| Missing user actions | Every interactive element on the page should appear in at least one test case |
+#### 2b. Specification Quality
+| Check | What to Look For |
+|-------|-----------------|
+| Steps are concrete | "Click the Submit button" not "submit the form"; "Enter 'test@example.com' in Email field" not "enter email" |
+| Expected results are observable | Specific text, URL change, element state — not "page updates" or "works correctly" |
+| Preconditions are explicit | Auth state, test data, feature flags, starting URL — nothing assumed |
+| TC-IDs are sequential | No gaps, no duplicates, correct feature prefix |
+| Priority is justified | Critical = blocks core journey; not everything is Critical |
+| Categories are accurate | Happy Path vs Validation vs Edge Case — correctly classified |
+#### 2c. Invented vs Observed
+This is the most important check. Test cases should trace back to behavior that was actually observed or deliberately triggered during exploration, not assumed.
+Red flags for invented scenarios:
+- Specific error message text that wasn't observed (e.g., "Please enter a valid email" when the actual message might differ)
+- Assumptions about validation rules without exploration evidence (e.g., "minimum 8 characters" without trying it)
+- Test cases for UI elements that may not exist (e.g., "retry button" on error page without visiting the error page)
+- Server-side behavior assumptions (e.g., "rate limit after 5 attempts" without evidence)
+When suspicious: delegate a spot-check to a subagent with browser access (Task tool). Pass it the target URL, the specific TC-IDs under suspicion, and the claims to verify (element existence, error message text, validation behavior). The subagent should return structured evidence: what it found, what matched, what differed.
+#### 2d. Duplication and Overlap
+- Flag test cases that test the same behavior with trivially different inputs
+- Flag test cases where the steps are identical but expected results differ only cosmetically
+- Merging candidates: cases that could be combined into a single parameterized test without losing coverage
+#### 2e. Implementability
+- Flag steps that cannot be automated with Playwright (e.g., "verify email arrives", "check database directly")
+- Flag preconditions that require manual setup with no automation path
+- Flag assertions that require visual comparison without specifying tolerance
+- Flag test cases that depend on third-party services (payment processors, OAuth providers) without a mock strategy
+### Step 3: Produce the Review Report
+Output a structured review with this format:
+```markdown
+# Test Case Review: <feature>
+**Spec file:** <path>
+**Total test cases:** <count>
+**Review date:** <date>
+## Verdict: PASS | PASS WITH WARNINGS | NEEDS REVISION
+## Blockers (<count>)
+- **TC-<ID>**: <issue description>
+## Warnings (<count>)
+- **TC-<ID>**: <issue description>
+## Suggestions (<count>)
+- **TC-<ID>**: <issue description>
+## Coverage Gaps
+- <Missing scenario that should be added>
+## Duplication
+- **TC-<ID>** and **TC-<ID>**: <overlap description>
+## Summary
+<2-3 sentences on overall spec quality and what to address before implementation>
+```
+### Step 4: Verdict Rules
+- **PASS** — no blockers, 2 or fewer warnings. Proceed to write-test-code.
+- **PASS WITH WARNINGS** — no blockers, 3+ warnings. Can proceed but should address warnings.
+- **NEEDS REVISION** — 1+ blockers. Do not proceed to write-test-code until blockers are resolved.
+Example: 0 blockers + 2 warnings = PASS. 0 blockers + 3 warnings = PASS WITH WARNINGS. 1+ blockers = NEEDS REVISION regardless of warning count.
+## Principles
+- **Review-only** — never modify the spec file; report findings for the author to act on
+- **Evidence over opinion** — cite specific TC-IDs and quote specific steps/assertions when flagging issues
+- **Spot-check against live site** — delegate to a subagent with browser access to verify 2-3 suspicious claims rather than trusting all text at face value
+- **Bounded output** — the review report should be actionable and finite, not an exhaustive rewrite
+- **Severity matters** — distinguish blockers from suggestions; not every imperfection is worth fixing before implementation
+## Example Usage
+```
+Claude Code: /review-test-cases test-cases/login.md
+Codex: $review-test-cases test-cases/login.md
+Claude Code: /review-test-cases test-cases/checkout.md
+Codex: $review-test-cases test-cases/checkout.md
+```
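
The verdict rules in Step 4 above reduce to a small decision function. The sketch below is illustrative only; the `FindingCounts` type and the `decideVerdict` name are assumptions, not identifiers from the package. The thresholds are the ones stated in the skill text.

```typescript
// Illustrative sketch of the Step 4 verdict rules; names are hypothetical.
type Verdict = "PASS" | "PASS WITH WARNINGS" | "NEEDS REVISION";

interface FindingCounts {
  blockers: number;
  warnings: number;
  suggestions: number; // reported, but never affects the verdict
}

function decideVerdict(counts: FindingCounts): Verdict {
  // Any blocker forces a revision, regardless of the warning count.
  if (counts.blockers >= 1) return "NEEDS REVISION";
  // No blockers: three or more warnings downgrade a PASS.
  if (counts.warnings >= 3) return "PASS WITH WARNINGS";
  return "PASS";
}

console.log(decideVerdict({ blockers: 0, warnings: 2, suggestions: 5 })); // PASS
console.log(decideVerdict({ blockers: 0, warnings: 3, suggestions: 0 })); // PASS WITH WARNINGS
console.log(decideVerdict({ blockers: 1, warnings: 0, suggestions: 0 })); // NEEDS REVISION
```

Checking blockers first mirrors the skill's own example: one blocker yields NEEDS REVISION no matter how few warnings there are.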

package/dist/2.0.9/claude/plugin/skills/review-test-cases/agents/claude.yaml ADDED

@@ -0,0 +1,3 @@
+frontmatter:
+  argument-hint: "<path to test-cases/*.md spec file>"
+  user-invocable: true

package/dist/2.0.9/claude/plugin/skills/review-test-cases/agents/openai.yaml ADDED

@@ -0,0 +1,10 @@
+interface:
+  display_name: "Review TC-ID Specs"
+  short_description: "Review TC-ID specs for gaps, duplication, and weak assertions"
+  default_prompt: "Review these TC-ID test case specifications before implementation and flag quality issues."
+dependencies:
+  tools:
+    - type: "mcp"
+      value: "agent-web-interface"
+      description: "Browser automation MCP used to spot-check observed behavior claims against live site"

package/dist/2.0.9/claude/plugin/skills/review-test-code/SKILL.md ADDED

@@ -0,0 +1,189 @@
+---
+name: review-test-code
+description: >
+  Quality review of Playwright test code before final execution signoff. This skill should be used
+  when implementation review of executable Playwright tests is needed, not for diagnosis of runtime flakiness.
+  Triggers: "review test code", "review Playwright tests", "check test quality",
+  "audit test implementation", "review my tests before merging", "check test code for issues",
+  "review e2e tests", "code review Playwright", "are my tests stable",
+  "check for brittle selectors", "review before running tests". Quality gate
+  after write-test-code — catches brittle selectors, force:true misuse, networkidle overuse, Tailwind
+  utility class selectors, exact numeric assertions, missing teardown, parallel-unsafe mutations,
+  hardcoded data, missing assertions, test coupling, and convention divergence. Review-only — does NOT
+  rewrite tests, does NOT run tests. Use fix-flaky-tests for fixing, write-test-code for rewriting.
+allowed-tools: Read Glob Grep Task
+---
+# Review Test Code
+Review Playwright test code for stability, correctness, and adherence to project conventions before final execution signoff. This is a quality gate — catch structural issues in code before running tests, not after flaky failures.
+## Input
+Parse the test file path or directory from: $ARGUMENTS
+If no argument provided, search for recently modified `*.spec.ts` or `*.test.ts` files and review those.
+## Workflow
+### Step 1: Load Context
+1. Read the test file(s) to review
+2. Read project conventions for comparison:
+   - `playwright.config.ts` or `playwright.config.js` — extract `baseURL`, `testDir`, projects, timeouts, `fullyParallel`, `workers`
+   - 2-3 existing test files (not the ones under review) to establish the project's conventions
+   - `e2e-plan/conventions.md` if it exists
+3. Read the corresponding test case spec (`test-cases/<feature>.md`) if it exists — needed for traceability check
+4. Note the project's locator strategy, fixture patterns, auth approach, and naming conventions
+### Step 2: Run the Review Checklist
+Evaluate the test code against each criterion. Track findings by severity:
+- **BLOCKER** — will cause test failures or false passes (missing assertions, wrong selectors, broken isolation)
+- **WARNING** — will cause flakiness or maintenance burden (brittle selectors, arbitrary waits, poor structure)
+- **SUGGESTION** — style or convention improvement (naming, organization, minor readability)
+#### 2a. Locator Quality
+| Check | What to Look For |
+|-------|-----------------|
+| Semantic locators preferred | `getByRole`, `getByLabel`, `getByPlaceholder` over CSS selectors |
+| No fragile positional selectors | `.first()`, `.nth()`, `.last()` without documented justification |
+| No dynamic IDs or classes | Selectors containing generated hashes, UUIDs, or auto-incremented values |
+| No utility framework classes | Selectors must not contain Tailwind (`rounded-lg`, `flex`, `bg-*`), Bootstrap (`btn-primary`, `col-md-*`), or similar utility classes — these are styling, not identity |
+| Scoped to containers | Locators narrowed to `main`, `nav`, `[role="dialog"]` where needed |
+| No exact long text matches | Use regex with key words instead of full marketing copy |
+When a locator appears suspicious, delegate verification to a subagent (Task tool): instruct it to open the target URL, locate the element using the browser MCP tools (`find`, `get_element`), and report back whether the element exists and is unique.
+#### 2b. Waiting and Timing
+| Check | What to Look For |
+|-------|-----------------|
+| No `waitForTimeout()` | Arbitrary sleeps mask real timing issues |
+| Proper action-response waits | `waitForResponse` before asserting API-dependent UI |
+| Auto-retrying assertions used | `await expect(el).toBeVisible()` not `expect(await el.isVisible()).toBe(true)` |
+| Reasonable explicit timeouts | Custom timeouts (`{ timeout: 10000 }`) have a comment explaining why |
+| No `networkidle` overuse | `networkidle` is fragile; prefer specific response waits |
+#### 2c. Assertions
+| Check | What to Look For |
+|-------|-----------------|
+| Every test has assertions | No test blocks without `expect()` calls |
+| Assertions test user outcomes | Visible text, URL changes, element states — not internal state or CSS classes |
+| Assertions are specific | `toHaveText('Welcome, John')` not just `toBeVisible()` |
+| Error paths have assertions | Error scenario tests verify the error message, not just that "something happened" |
+| No exact server-computed values | Dashboard counters, totals, and aggregates must not assert exact numbers — use patterns, ranges, or seed data first |
+| No `toBeTruthy()` on locators | Use Playwright-specific matchers (`toBeVisible`, `toBeEnabled`, `toHaveText`) |
+#### 2d. Test Isolation and Structure
+| Check | What to Look For |
+|-------|-----------------|
+| No shared mutable state | Tests do not depend on execution order or modify shared variables |
+| Proper setup/teardown | `beforeEach`/`afterEach` for shared setup, not duplicated in each test |
+| AAA structure | Clear Arrange → Act → Assert sections (comments optional but structure required) |
+| No test coupling | Test B does not depend on side effects from Test A |
+| Auth handled correctly | `storageState` or fixture, not UI login in every test (unless testing login itself) |
+| Test data is unique | Uses `Date.now()`, factories, or unique IDs — not hardcoded shared data |
+| Parallel-safe (if `fullyParallel: true` or `workers` > 1 found in config) | Tests that create data must not assert on unscoped lists or counts — filter assertions to the specific data created. If parallelism is disabled, note as SUGGESTION rather than WARNING |
+| Data cleanup present | Tests that create persistent records (API POST/PUT) must have corresponding teardown (`afterEach`, fixture cleanup, or `globalTeardown`) |
+#### 2e. Convention Adherence
+| Check | What to Look For |
+|-------|-----------------|
+| TC-ID in test title | Every test has `TC-<FEATURE>-<NNN>: Description` format |
+| File naming matches project | Follows existing `*.spec.ts` or `*.test.ts` convention |
+| Import style matches project | Imports from project fixtures file if one exists, not raw `@playwright/test` |
+| baseURL used | `page.goto('/')` not `page.goto('https://example.com/')` |
+| POM pattern followed (if used) | Page objects for interactions, tests for assertions |
+| Consistent locator strategy | Same locator approach as existing tests |
+#### 2f. TC-ID Traceability
+If a test case spec file exists for this feature:
+- Verify every TC-ID from the spec has a corresponding test
+- Flag TC-IDs in the spec with no implementation
+- Flag tests with TC-IDs not present in the spec (orphaned tests)
+- Note: not every spec TC-ID must be implemented — but missing ones should be acknowledged
+#### 2g. Anti-Pattern Detection
+Flag any instances of these known anti-patterns:
+1. Raw CSS selectors where semantic locators would work
+2. `waitForTimeout()` used as a fix
+3. `.first()` / `.nth()` without justification
+4. Exact long text matches (fragile to copy changes)
+5. Login via UI in every test (should use storageState)
+6. UI clicks to set up test data (should use API)
+7. No error path tests in the suite
+8. Hardcoded test data
+9. Tests depending on execution order
+10. `expect(await el.isVisible()).toBe(true)` instead of `await expect(el).toBeVisible()`
+11. Missing `await` on Playwright calls (easy to miss, causes silent failures)
+12. `{ force: true }` on interactions without documented justification (masks actionability issues — overlapping elements, disabled state, not scrolled into view)
+13. `waitForLoadState('networkidle')` as default wait strategy — breaks on long-polling, WebSockets, analytics beacons; use specific `waitForResponse` or UI assertions instead
+14. CSS utility class selectors (Tailwind `rounded-lg`, `flex`, Bootstrap `btn-primary`, `col-md-*`) — styling classes are volatile, never use as selectors
+15. Asserting exact server-computed values (`toHaveText('12450')`) — use pattern matchers, ranges, or seed data to control expected values
+### Step 3: Produce the Review Report
+Output a structured review with this format:
+```markdown
+# Test Code Review: <file or feature>
+**Files reviewed:** <list>
+**Total tests:** <count>
+**Review date:** <date>
+## Verdict: PASS | PASS WITH WARNINGS | NEEDS REVISION
+## Blockers (<count>)
+- **<file>:<line>** `<test name>`: <issue description>
+## Warnings (<count>)
+- **<file>:<line>** `<test name>`: <issue description>
+## Suggestions (<count>)
+- **<file>:<line>**: <issue description>
+## Convention Divergences
+- <How this code differs from the project's established patterns>
+## TC-ID Traceability
+- **Implemented:** <count> / <total in spec>
+- **Missing from implementation:** <list of TC-IDs>
+- **Orphaned (no spec):** <list of TC-IDs>
+## Summary
+<2-3 sentences on overall code quality and what to address before test execution>
+```
+### Step 4: Verdict Rules
+- **PASS** — no blockers, 2 or fewer warnings. Proceed to test execution.
+- **PASS WITH WARNINGS** — no blockers, 3+ warnings. Can proceed but should address warnings for long-term stability.
+- **NEEDS REVISION** — 1+ blockers. Do not run tests expecting stable results until blockers are resolved.
+## Principles
+- **Review-only** — never modify test files; report findings for the author to act on
+- **Evidence over opinion** — cite specific file paths, line numbers, and code snippets when flagging issues
+- **Spot-check selectors** — delegate to a subagent with browser access to verify 2-3 suspicious locators against the live site
+- **Convention-first** — compare against the project's existing test patterns, not an abstract ideal
+- **Bounded output** — the review should be actionable and finite, not a full rewrite specification
+- **Severity matters** — a missing `await` is a blocker; a naming style preference is a suggestion
+## Example Usage
+```
+Claude Code: /review-test-code tests/e2e/login.spec.ts
+Codex: $review-test-code tests/e2e/login.spec.ts
+Claude Code: /review-test-code tests/e2e/
+Codex: $review-test-code tests/e2e/
+```
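
Several of the anti-patterns listed in section 2g above are plain textual signatures, so a reviewer could pre-screen a file with a line-by-line scan before reading it closely. The sketch below covers only items 2, 10, 12, and 13 from that list. The `ANTI_PATTERNS` table and `scanTestSource` function are hypothetical, the regular expressions are rough approximations, and matches inside comments or strings will also be flagged.

```typescript
// Hypothetical pre-screen for a few of the 2g anti-patterns. The `id` field
// refers to the item number in the skill's anti-pattern list.
interface AntiPattern {
  id: number;
  label: string;
  pattern: RegExp;
}

interface Finding {
  line: number; // 1-based line number in the scanned source
  id: number;
  label: string;
}

const ANTI_PATTERNS: AntiPattern[] = [
  { id: 2, label: "waitForTimeout() used as a wait", pattern: /\.waitForTimeout\s*\(/ },
  {
    id: 10,
    label: "non-retrying visibility check instead of await expect(el).toBeVisible()",
    pattern: /expect\(\s*await\s+.*?\.isVisible\(\)\s*\)\s*\.toBe\(\s*true\s*\)/,
  },
  { id: 12, label: "{ force: true } on an interaction", pattern: /force\s*:\s*true/ },
  { id: 13, label: "waitForLoadState('networkidle')", pattern: /waitForLoadState\(\s*['"]networkidle['"]\s*\)/ },
];

/** Scan test source line by line and report which anti-patterns appear where. */
function scanTestSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const ap of ANTI_PATTERNS) {
      // The patterns carry no `g` flag, so test() keeps no state between lines.
      if (ap.pattern.test(text)) {
        findings.push({ line: index + 1, id: ap.id, label: ap.label });
      }
    }
  });
  return findings;
}

// Usage with a made-up snippet: lines 1 and 2 are flagged, line 3 is not.
const sampleSource = [
  "await page.waitForTimeout(3000);",
  "await page.getByRole('button', { name: 'Save' }).click({ force: true });",
  "await expect(page.getByText('Saved')).toBeVisible();",
].join("\n");
console.log(scanTestSource(sampleSource));
```

A scan like this only narrows where to look. Whether a flagged `force: true` has a documented justification, for example, still needs a human or agent to read the surrounding code.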

package/dist/2.0.9/claude/plugin/skills/review-test-code/agents/claude.yaml ADDED

@@ -0,0 +1,3 @@
+frontmatter:
+  argument-hint: "<path to test file or directory>"
+  user-invocable: true

package/dist/2.0.9/claude/plugin/skills/review-test-code/agents/openai.yaml ADDED

@@ -0,0 +1,10 @@
+interface:
+  display_name: "Review Playwright Test Code"
+  short_description: "Review Playwright test implementation for stability and correctness"
+  default_prompt: "Review this Playwright test implementation before execution and flag quality issues."
+dependencies:
+  tools:
+    - type: "mcp"
+      value: "agent-web-interface"
+      description: "Browser automation MCP used to verify selectors against the live site"
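
The TC-ID traceability check in section 2f of the review-test-code skill (implemented, missing, and orphaned IDs) can be sketched as a set comparison between the spec text and the test source. The function names and the ID regex below are assumptions; the regex accepts the `TC-<FEATURE>-<NNN>` and letter-prefixed (`A01`, `V01`) shapes seen in the plan template and may need adjusting for other projects.

```typescript
// Hypothetical sketch of the 2f traceability check; not part of the package.
interface Traceability {
  implemented: string[]; // in the spec and in the tests
  missing: string[];     // in the spec, no matching test
  orphaned: string[];    // in the tests, not in the spec
}

/** Extract unique TC-IDs such as TC-LOGIN-001 or TC-LOGIN-A01, in order of first appearance. */
function extractTcIds(text: string): string[] {
  const matches = text.match(/\bTC-[A-Z0-9]+(?:-[A-Z0-9]+)*-[A-Z]?\d+\b/g) || [];
  const unique: string[] = [];
  for (const id of matches) {
    if (unique.indexOf(id) === -1) unique.push(id);
  }
  return unique;
}

/** Compare the IDs found in a spec file with those found in test source. */
function traceTcIds(specText: string, testText: string): Traceability {
  const specIds = extractTcIds(specText);
  const testIds = extractTcIds(testText);
  return {
    implemented: specIds.filter((id) => testIds.indexOf(id) !== -1),
    missing: specIds.filter((id) => testIds.indexOf(id) === -1),
    orphaned: testIds.filter((id) => specIds.indexOf(id) === -1),
  };
}

// Usage with made-up spec and test contents.
const specSample = "TC-LOGIN-001 valid login\nTC-LOGIN-002 wrong password\nTC-LOGIN-A01 keyboard only";
const testSample = "test('TC-LOGIN-001: valid login', ...)\ntest('TC-LOGIN-009: remember me', ...)";
console.log(traceTcIds(specSample, testSample));
```

The three arrays map directly onto the "Implemented", "Missing from implementation", and "Orphaned (no spec)" lines of the review report template.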