npm - @harness-engineering/cli - Versions diffs - 1.13.0 → 1.13.1 - Mend

Files changed (267)

package/dist/agents/skills/claude-code/harness-design/skill.yaml CHANGED

@@ -33,6 +33,7 @@ mcp:
     skill: harness-design
     path: string
 type: flexible
+tier: 3
 phases:
   - name: intent
     description: Capture aesthetic intent, style, tone, and differentiator

package/dist/agents/skills/claude-code/harness-design-mobile/skill.yaml CHANGED

@@ -31,6 +31,7 @@ mcp:
     skill: harness-design-mobile
     path: string
 type: rigid
+tier: 3
 phases:
   - name: scaffold
     description: Read tokens and design intent, detect mobile platform, plan component structure with platform-specific rules

package/dist/agents/skills/claude-code/harness-design-system/skill.yaml CHANGED

@@ -31,6 +31,7 @@ mcp:
     skill: harness-design-system
     path: string
 type: rigid
+tier: 3
 phases:
   - name: discover
     description: Detect existing design system, tokens, frameworks, and project context

package/dist/agents/skills/claude-code/harness-design-web/skill.yaml CHANGED

@@ -34,6 +34,7 @@ mcp:
     skill: harness-design-web
     path: string
 type: rigid
+tier: 3
 phases:
   - name: scaffold
     description: Read tokens and design intent, detect framework and CSS strategy, plan component structure

package/dist/agents/skills/claude-code/harness-diagnostics/skill.yaml CHANGED

@@ -30,6 +30,7 @@ mcp:
     skill: harness-diagnostics
     path: string
 type: rigid
+tier: 3
 phases:
   - name: classify
     description: Categorize the error into one of 7 taxonomy categories

package/dist/agents/skills/claude-code/harness-docs-pipeline/skill.yaml CHANGED

@@ -39,6 +39,7 @@ mcp:
     skill: harness-docs-pipeline
     path: string
 type: rigid
+tier: 2
 phases:
   - name: freshen
     description: Check graph freshness, detect AGENTS.md, trigger bootstrap if needed

package/dist/agents/skills/claude-code/harness-dx/SKILL.md ADDED

@@ -0,0 +1,276 @@
+# Harness DX
+> Audit developer experience artifacts -- README quality, API documentation coverage, getting-started guides, and example code validation. Produces a structured DX scorecard with specific improvements and scaffolds missing documentation.
+## When to Use
+- When preparing a library, SDK, or open-source project for release and developer adoption matters
+- When reviewing a PR that changes public API surface and documentation should match
+- When onboarding friction is high and you need to identify where developers get stuck
+- NOT for internal architecture documentation (use harness-docs-pipeline)
+- NOT for user-facing product copy (use harness-ux-copy)
+- NOT for API design decisions like REST vs GraphQL (use harness-api-design)
+## Process
+### Phase 1: AUDIT -- Evaluate Documentation Quality
+1. **Resolve project root.** Use provided path or cwd.
+2. **Locate documentation artifacts.** Search for:
+   - README files: `README.md`, `README.rst`, `readme.md`
+   - Getting started: `GETTING_STARTED.md`, `QUICKSTART.md`, `docs/getting-started.md`
+   - API docs: `docs/api/`, `API.md`, generated docs in `docs/`, `site/`
+   - Examples: `examples/`, `demos/`, `samples/`, code blocks in README
+   - Changelog: `CHANGELOG.md`, `CHANGES.md`, `HISTORY.md`
+   - Contributing: `CONTRIBUTING.md`, `.github/CONTRIBUTING.md`
+3. **Score README completeness.** Check for the presence and quality of each section:
+   - **Title and description** (what is this project?) -- 0-2 points
+   - **Installation/setup** (how do I get it?) -- 0-3 points
+   - **Quick example** (show me it working in under 30 seconds) -- 0-3 points
+   - **API reference or link** (where is the full documentation?) -- 0-2 points
+   - **Contributing guide or link** -- 0-1 point
+   - **License** -- 0-1 point
+   - Total: score out of 12, grade A (10+), B (7-9), C (4-6), D (0-3)
+4. **Evaluate installation instructions.** Check:
+   - Are all package managers covered? (npm, yarn, pnpm for JS; pip, poetry for Python; cargo for Rust)
+   - Are prerequisites listed? (Node version, OS requirements, system dependencies)
+   - Is there a one-liner to get started? (copy-paste friendly)
+   - Do the instructions work on all documented platforms?
+5. **Assess API documentation coverage.** For every exported function, class, or endpoint:
+   - Is it documented?
+   - Does it have parameter descriptions?
+   - Does it have a usage example?
+   - Does it have return type documentation?
+   - Calculate coverage percentage: `documented / total * 100`
+6. **Check for time-to-hello-world.** Estimate the number of steps from `git clone` to seeing the project work. Fewer than 5 steps is good. More than 10 is a problem.
+---
+### Phase 2: EXTRACT -- Identify and Validate Examples
+1. **Extract code examples from documentation.** Parse all markdown files for fenced code blocks with language annotations. Track:
+   - File location and line number
+   - Language (js, ts, python, bash, etc.)
+   - Whether it is a complete runnable example or a fragment
+2. **Extract standalone examples.** Scan `examples/`, `demos/`, `samples/` for:
+   - Example projects with their own package.json/requirements.txt
+   - Single-file examples
+   - Example README files explaining what each example demonstrates
+3. **Validate example syntax.** For each extracted code example:
+   - Check for syntax errors (missing imports, unclosed brackets, invalid syntax)
+   - Check for references to APIs that no longer exist (stale examples)
+   - Check that import paths match the actual package name and exports
+4. **Run executable examples.** When `--validate-examples` is set:
+   - For JavaScript/TypeScript: attempt `node` or `tsx` execution
+   - For Python: attempt `python` execution
+   - For shell commands: validate they reference real scripts and flags
+   - Record pass/fail for each example with error output
+5. **Check example freshness.** Compare examples against the current API surface:
+   - Are there deprecated APIs used in examples?
+   - Are there new APIs with no examples?
+   - When was each example file last modified relative to the source it demonstrates?
+6. **Build coverage map.** Map examples to the APIs they demonstrate. Identify APIs with zero examples (documentation gaps).
+---
+### Phase 3: SCAFFOLD -- Generate Missing Documentation
+1. **Generate README sections.** For any missing README section identified in Phase 1:
+   - Draft installation instructions by reading `package.json`, `setup.py`, `Cargo.toml`, or equivalent
+   - Draft a quick-start example using the project's main export
+   - Draft a features list from the project's exports and test descriptions
+2. **Generate API documentation stubs.** For undocumented exports:
+   - Extract function signatures, parameter types, and return types from source
+   - Generate JSDoc/docstring stubs with parameter descriptions inferred from type names
+   - Include a usage example skeleton derived from test files when available
+3. **Generate example files.** For APIs with no examples:
+   - Create a minimal working example in `examples/`
+   - Include comments explaining each step
+   - Ensure the example is self-contained (includes imports, setup, and cleanup)
+4. **Generate getting-started guide.** If no quickstart exists:
+   - Write a step-by-step guide from installation through first meaningful use
+   - Include expected output at each step
+   - Target under 5 minutes to complete
+5. **Propose documentation structure.** If documentation is scattered or missing:
+   - Recommend a `docs/` directory structure
+   - Map content to sections (guides, reference, examples, tutorials)
+   - Suggest a documentation site generator if the project is large enough (Docusaurus, MkDocs, mdBook)
+---
+### Phase 4: VALIDATE -- Verify Documentation Accuracy
+1. **Check link integrity.** Verify all links in documentation:
+   - Internal links: do referenced files and anchors exist?
+   - External links: are they well-formed? (do not make HTTP requests)
+   - Badge URLs: are shields.io and similar badge URLs using the correct repo/package name?
+2. **Check version consistency.** Verify documentation matches the current version:
+   - Does the installation section reference the correct package version?
+   - Do API examples use the current function signatures?
+   - Is the changelog up to date with the latest release?
+3. **Check cross-references.** Verify README links to detailed docs, and detailed docs link back to the README and to each other where appropriate.
+4. **Output DX scorecard.** Present the complete audit results:
+   ```
+   DX Scorecard: [GRADE]
+   README: [score]/12 ([grade])
+   API Coverage: [N]% ([documented]/[total] exports)
+   Examples: [working]/[total] passing
+   Time to Hello World: ~[N] steps
+   Links: [valid]/[total] verified
+   GAPS:
+   - Missing: getting-started guide
+   - Missing: 12 undocumented exports
+   - Broken: examples/advanced.ts references removed API
+   GENERATED:
+   - docs/getting-started.md (draft)
+   - 4 API documentation stubs added
+   - examples/basic-usage.ts created
+   ```
+5. **Verify scaffolded content compiles.** If documentation was generated, verify:
+   - Generated code examples have valid syntax
+   - Generated markdown renders correctly (no broken formatting)
+   - Generated files are placed in the correct directories
+---
+## Harness Integration
+- **`harness skill run harness-dx`** -- Primary command for running the DX audit.
+- **`harness validate`** -- Run after scaffolding documentation to verify project health.
+- **`Glob`** -- Used to locate README files, documentation directories, example folders, and API docs.
+- **`Grep`** -- Used to extract exported symbols, find documentation comments, and locate code examples in markdown.
+- **`Read`** -- Used to read documentation files, package manifests, and source files for API extraction.
+- **`Write`** -- Used to scaffold missing documentation, generate example files, and create getting-started guides.
+- **`Bash`** -- Used to run example validation, check link targets, and execute code snippets.
+- **`emit_interaction`** -- Used to present the DX scorecard and request confirmation before generating scaffolded files.
+## Success Criteria
+- README is scored against all 6 completeness criteria with specific gap identification
+- API documentation coverage percentage is calculated against actual exported surface
+- All code examples in documentation are syntax-checked
+- Executable examples pass when `--validate-examples` is set
+- Missing documentation is scaffolded with accurate, runnable content
+- DX scorecard provides an at-a-glance quality grade
+- Time-to-hello-world is estimated and actionable if too high
+## Examples
+### Example: Node.js SDK with Sparse Documentation
+```
+Phase 1: AUDIT
+  README score: 5/12 (C)
+    Present: title, description, license
+    Missing: installation, quick example, API reference link, contributing
+  API coverage: 23% (7/30 exports documented)
+  Time to hello world: ~14 steps (too many, target: <5)
+Phase 2: EXTRACT
+  Code examples found: 3 (all in README)
+  examples/ directory: empty
+  Validation: 2/3 examples pass syntax check
+  Broken: README line 45 references `sdk.connect()` -- renamed to `sdk.init()` in v2.0
+Phase 3: SCAFFOLD
+  Generated: docs/getting-started.md (5-step quickstart)
+  Generated: examples/basic-usage.ts (demonstrates init, query, cleanup)
+  Generated: 23 JSDoc stubs from TypeScript signatures
+  README patches: added installation section, updated broken example
+Phase 4: VALIDATE
+  Links: 8/10 valid (2 broken anchors in README)
+  Generated examples: syntax valid
+  DX Scorecard: C -> B (projected after applying changes)
+```
+### Example: Python Library with Comprehensive Docs (Sphinx)
+```
+Phase 1: AUDIT
+  README score: 11/12 (A)
+    Missing only: contributing guide link
+  API coverage: 89% (142/160 functions documented)
+  Sphinx docs at docs/_build/html: present, 45 pages
+  Time to hello world: ~4 steps (good)
+Phase 2: EXTRACT
+  Code examples: 28 in docs, 12 in examples/
+  Validation: 37/40 pass (3 use deprecated pandas.append)
+  Stale examples: 3 files last modified 8 months ago, source changed since
+Phase 3: SCAFFOLD
+  Generated: 18 docstring stubs for undocumented functions
+  Updated: 3 stale examples to use pandas.concat
+  Added: CONTRIBUTING.md link to README
+Phase 4: VALIDATE
+  Links: 52/52 valid
+  DX Scorecard: A (maintained, minor freshness issues resolved)
+```
+### Example: Rust CLI Tool Missing Getting Started
+```
+Phase 1: AUDIT
+  README score: 7/12 (B)
+    Present: title, description, installation (cargo install), license, API link
+    Missing: quick example showing actual CLI usage, contributing
+  API coverage: N/A (CLI tool, not library)
+  CLI help text: present via clap derive
+  Time to hello world: ~6 steps
+Phase 2: EXTRACT
+  Code examples: 2 in README (both installation commands)
+  examples/ directory: 1 example config file, no runnable examples
+  Missing: actual usage examples showing command output
+Phase 3: SCAFFOLD
+  Generated: docs/getting-started.md with:
+    1. cargo install myctl
+    2. myctl init
+    3. myctl run --config example.toml
+    (with expected output at each step)
+  Generated: examples/basic-config.toml with annotated comments
+  Generated: README quick-example section with terminal output
+Phase 4: VALIDATE
+  CLI help flags match documented flags: YES
+  Config example matches current schema: YES
+  DX Scorecard: B -> A (projected after applying changes)
+```
+## Gates
+- **No scaffolding without human confirmation.** Generated documentation is always presented as a draft for review. Do not commit generated files automatically. Use `emit_interaction` to present scaffolded content and wait for approval.
+- **No overwriting existing documentation.** If a README section already exists, do not replace it. Only fill gaps. Existing content may have been carefully written and should not be clobbered.
+- **No fabricating API behavior.** Generated documentation and examples must be derived from actual source code (type signatures, test files, existing docs). Do not guess what an undocumented function does based on its name alone.
+- **No marking stale examples as passing.** If an example references a renamed or removed API, it is broken regardless of whether it happens to still parse syntactically.
+## Escalation
+- **When API documentation requires domain expertise:** If function behavior cannot be inferred from types and tests alone, flag it: "These 5 functions need developer-written documentation -- their behavior is domain-specific and cannot be reliably inferred."
+- **When examples require external services:** If running an example requires a database, API key, or external service, flag the dependency rather than failing: "This example requires a running PostgreSQL instance. Consider adding a Docker Compose file for example dependencies."
+- **When documentation tooling is broken:** If Sphinx, TypeDoc, or other doc generators fail to build, report the error but do not attempt to fix the toolchain. That is outside this skill's scope.
+- **When README and API docs contradict each other:** Flag the contradiction with both sources quoted. Do not choose which one is correct -- the developer must resolve the conflict: "README says `init()` accepts a string, but the TypeDoc shows it accepts `InitConfig`. Which is current?"
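
Note (not part of the diff): the README rubric and the coverage formula in Phase 1 of the added SKILL.md can be sketched as below. This is a minimal illustration, not code from the package. The names `ReadmeSection`, `MAX_POINTS`, `scoreReadme`, `gradeReadme`, and `coveragePercent` are hypothetical; only the point ranges, the grade bands, and the `documented / total * 100` formula come from the file.

```typescript
// Hypothetical names; the point caps and grade bands are taken from Phase 1, step 3.
type ReadmeSection =
  | "titleAndDescription"
  | "installation"
  | "quickExample"
  | "apiReference"
  | "contributing"
  | "license";

// Maximum points per section (sums to 12).
const MAX_POINTS: Record<ReadmeSection, number> = {
  titleAndDescription: 2,
  installation: 3,
  quickExample: 3,
  apiReference: 2,
  contributing: 1,
  license: 1,
};

type Grade = "A" | "B" | "C" | "D";

// Sum the awarded points, clamping each section to the range [0, max].
// Sections that are not mentioned count as 0 (missing).
function scoreReadme(awarded: Partial<Record<ReadmeSection, number>>): number {
  let total = 0;
  for (const section of Object.keys(MAX_POINTS) as ReadmeSection[]) {
    const points = awarded[section] ?? 0;
    total += Math.min(Math.max(points, 0), MAX_POINTS[section]);
  }
  return total;
}

// Grade bands: A (10+), B (7-9), C (4-6), D (0-3).
function gradeReadme(score: number): Grade {
  if (score >= 10) return "A";
  if (score >= 7) return "B";
  if (score >= 4) return "C";
  return "D";
}

// documented / total * 100, rounded to a whole percent.
// Returns null when there is no exported surface (assumption: treated as "N/A").
function coveragePercent(documented: number, total: number): number | null {
  if (total === 0) return null;
  return Math.round((documented / total) * 100);
}
```

For instance, `gradeReadme(scoreReadme({ titleAndDescription: 2, installation: 3, license: 1, apiReference: 1 }))` yields a B under this reading, and `coveragePercent(7, 30)` rounds to 23.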

package/dist/agents/skills/claude-code/harness-dx/skill.yaml ADDED

@@ -0,0 +1,76 @@
+name: harness-dx
+version: "1.0.0"
+description: Developer experience auditing — README quality, API documentation, getting-started guides, and example validation
+cognitive_mode: advisory-guide
+triggers:
+  - manual
+  - on_milestone
+  - on_pr
+platforms:
+  - claude-code
+  - gemini-cli
+tools:
+  - Bash
+  - Read
+  - Write
+  - Edit
+  - Glob
+  - Grep
+  - emit_interaction
+cli:
+  command: harness skill run harness-dx
+  args:
+    - name: path
+      description: Project root path
+      required: false
+    - name: focus
+      description: "Audit focus: readme, api-docs, examples, quickstart, all. Defaults to all."
+      required: false
+    - name: validate-examples
+      description: Run example code snippets to verify they execute successfully
+      required: false
+mcp:
+  tool: run_skill
+  input:
+    skill: harness-dx
+    path: string
+type: rigid
+tier: 3
+internal: false
+keywords:
+  - developer experience
+  - DX
+  - README
+  - API docs
+  - getting started
+  - examples
+  - SDK
+  - onboarding
+  - documentation quality
+  - quickstart
+  - tutorials
+stack_signals:
+  - "README.md"
+  - "docs/"
+  - "examples/"
+  - "GETTING_STARTED.md"
+  - "QUICKSTART.md"
+  - "docs/api/"
+  - "sdk/"
+phases:
+  - name: audit
+    description: Evaluate README completeness, API doc coverage, and getting-started guide quality
+    required: true
+  - name: extract
+    description: Identify code examples, API references, and installation instructions for validation
+    required: true
+  - name: scaffold
+    description: Generate missing documentation, fix broken examples, and fill coverage gaps
+    required: true
+  - name: validate
+    description: Verify examples run, links resolve, and documentation matches current API surface
+    required: true
+state:
+  persistent: false
+  files: []
+depends_on: []
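
Note (not part of the diff): the `focus` argument above documents five accepted values and a default of `all`. A caller might normalize it roughly as follows. This is a sketch under assumptions: `FOCUS_VALUES` and `parseFocus` are hypothetical names, and the diff does not show how the CLI itself parses or rejects this argument.

```typescript
// Accepted values copied from the `focus` arg description in skill.yaml.
const FOCUS_VALUES = ["readme", "api-docs", "examples", "quickstart", "all"] as const;
type Focus = (typeof FOCUS_VALUES)[number];

// Hypothetical helper: returns the documented default when the argument is
// absent, and rejects anything outside the documented list.
function parseFocus(raw?: string): Focus {
  if (raw === undefined || raw === "") return "all";
  for (const value of FOCUS_VALUES) {
    if (value === raw) return value;
  }
  throw new Error(
    `Unknown focus "${raw}"; expected one of: ${FOCUS_VALUES.join(", ")}`
  );
}
```

Rejecting unknown values is a design choice of this sketch; silently falling back to `all` would be an equally plausible reading of the description.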

package/dist/agents/skills/claude-code/harness-e2e/SKILL.md ADDED

@@ -0,0 +1,245 @@
+# Harness E2E
+> End-to-end browser testing with Playwright, Cypress, or Selenium. Covers page object scaffolding, critical-path test implementation, and systematic flakiness remediation.
+## When to Use
+- Writing browser-level tests for critical user flows (login, checkout, onboarding)
+- Adding E2E coverage for a new feature that touches the UI
+- Diagnosing and remediating flaky E2E tests that block CI pipelines
+- NOT when testing API-only behavior with no UI (use harness-integration-test instead)
+- NOT when testing individual component rendering in isolation (use unit tests or harness-tdd instead)
+- NOT when performing visual screenshot comparison (use harness-visual-regression instead)
+## Process
+### Phase 1: DETECT -- Identify Framework and Application Structure
+1. **Scan for E2E configuration.** Search for `playwright.config.ts`, `playwright.config.js`, `cypress.config.ts`, `cypress.config.js`, `wdio.conf.js`, or `selenium` directories. If multiple frameworks are present, prefer the one with the most existing tests.
+2. **Catalog existing E2E tests.** Glob for `*.spec.ts`, `*.e2e.ts`, `*.cy.ts`, `*.test.ts` within E2E directories. Count tests per file and identify patterns: naming conventions, folder structure, shared utilities.
+3. **Map application entry points.** Identify the base URL, authentication flow, and route structure. Check for:
+   - Environment configuration (`.env.test`, `playwright.config.ts` baseURL)
+   - Authentication fixtures (stored auth state, login helpers)
+   - Route definitions (Next.js pages, React Router config, Express routes)
+4. **Identify the test execution environment.** Determine whether tests run against a dev server, a preview deployment, or a Docker Compose stack. Check `package.json` scripts for `e2e`, `test:e2e`, or `playwright test` commands.
+5. **Report findings.** Summarize: framework detected, number of existing tests, coverage gaps relative to application routes, and any configuration issues (missing base URL, no auth setup).
+### Phase 2: SCAFFOLD -- Generate Page Objects and Test Infrastructure
+1. **Create the page object directory.** Follow the project's existing conventions. If no convention exists, use `e2e/pages/` for Playwright or `cypress/pages/` for Cypress.
+2. **Generate page objects for target flows.** Each page object encapsulates:
+   - Locator definitions using stable selectors (`data-testid`, `role`, `aria-label`) -- never CSS classes or XPath positional selectors
+   - Action methods (click, fill, navigate) that return the next page object for chaining
+   - Assertion helpers that verify page state without exposing DOM internals
+3. **Create shared fixtures and helpers.** Generate:
+   - Authentication fixture that stores and reuses auth state across tests
+   - Test data factory integration (API calls or database seeds for prerequisite data)
+   - Custom assertions for domain-specific validations
+4. **Configure test parallelization.** Set up:
+   - Worker count based on CI environment capabilities
+   - Test isolation (each test gets its own browser context)
+   - Retry configuration (1 retry for CI, 0 for local development)
+   - Screenshot and trace capture on failure
+5. **Verify scaffold compiles.** Run the test command with `--list` or `--dry-run` to confirm all imports resolve and page objects instantiate without errors.
+### Phase 3: IMPLEMENT -- Write E2E Tests for Critical Paths
+1. **Prioritize user flows by business impact.** Order test implementation:
+   - Smoke tests: application loads, critical pages render
+   - Authentication: login, logout, session persistence
+   - Primary flows: the 3-5 flows that represent 80% of user value
+   - Error paths: form validation, 404 handling, permission denied
+2. **Write each test following the Arrange-Act-Assert pattern.**
+   - Arrange: set up test data via API calls or fixtures (never through the UI for setup)
+   - Act: perform the user flow through page object methods
+   - Assert: verify the expected outcome using page object assertion helpers
+3. **Use explicit waits, never arbitrary timeouts.** Where the framework provides an auto-waiting mechanism (Playwright `expect` with auto-retry, Cypress implicit waits), rely on it. Where explicit waits are needed, wait for specific network responses, DOM mutations, or URL changes -- never `page.waitForTimeout()`.
+4. **Isolate tests from each other.** Each test must:
+   - Create its own test data (no shared mutable state between tests)
+   - Clean up after itself or rely on test isolation (separate browser context)
+   - Pass when run individually and when run in any order
+5. **Tag tests by scope.** Apply tags or annotations:
+   - `@smoke` for tests that must pass on every deployment
+   - `@critical-path` for primary business flow coverage
+   - `@slow` for tests that exceed 30 seconds
+6. **Run the full E2E suite.** Verify all tests pass locally before proceeding to validation.
+### Phase 4: VALIDATE -- Execute, Detect Flakiness, and Remediate
+1. **Run the suite 3 times consecutively.** Track pass/fail per test across runs. Any test that fails in at least one run but passes in another is flagged as flaky.
+2. **Classify flaky tests by root cause.** Common categories:
+   - **Race condition:** test asserts before async operation completes. Fix: add explicit wait for the specific condition.
+   - **Shared state:** test depends on data from a previous test. Fix: make test data independent.
+   - **Animation/transition:** assertion fires during CSS transition. Fix: wait for animation to complete or disable animations in test mode.
+   - **Network timing:** API response arrives before or after expected. Fix: intercept and mock the network request, or wait for the specific response.
+3. **Apply remediation for each flaky test.** Do not simply add retries -- fix the root cause. Retries mask problems. After remediation, rerun the previously-flaky test 5 times to confirm stability.
+4. **Run `harness validate`.** Confirm the project passes all harness checks with the new E2E tests in place.
+5. **Generate a coverage summary.** Report:
+   - Number of user flows covered vs. total identified
+   - Flaky tests found and remediated
+   - Remaining coverage gaps with recommended next steps
+### Graph Refresh
+If a knowledge graph exists at `.harness/graph/`, refresh it after code changes to keep graph queries accurate:
+```
+harness scan [path]
+```
+## Harness Integration
+- **`harness validate`** -- Run in VALIDATE phase after all tests are implemented. Confirms project health including new E2E infrastructure.
+- **`harness check-deps`** -- Run after SCAFFOLD phase to verify E2E test dependencies do not introduce forbidden imports into production code.
+- **`emit_interaction`** -- Used at checkpoints to present flakiness findings and remediation options to the human for approval.
+- **Glob** -- Used in DETECT phase to catalog existing test files and page objects.
+- **Grep** -- Used to search for selector patterns, wait strategies, and anti-patterns in existing tests.
+## Success Criteria
+- Every critical user flow identified in DETECT has a corresponding E2E test
+- All E2E tests pass on 3 consecutive local runs with zero flaky failures
+- Page objects use stable selectors (`data-testid`, ARIA roles) -- no CSS class selectors or XPath positional selectors
+- No test uses arbitrary timeouts (`waitForTimeout`, `cy.wait(N)`, `Thread.sleep`)
+- Test data is created via API or fixtures, not through UI interactions during setup
+- Each test is independent and passes when run in isolation
+- `harness validate` passes after the full suite is in place
+## Examples
+### Example: Playwright E2E for a SaaS Dashboard
+**DETECT output:**
+```
+Framework: Playwright 1.42 (playwright.config.ts found)
+Existing tests: 12 specs in e2e/tests/
+Base URL: http://localhost:3000
+Auth: Cookie-based, no stored auth state found
+Coverage gaps: settings page, billing flow, team invitation
+```
+**SCAFFOLD -- Page object for dashboard:**
+```typescript
+// e2e/pages/dashboard.page.ts
+import { type Page, type Locator, expect } from '@playwright/test';
+export class DashboardPage {
+  readonly page: Page;
+  readonly heading: Locator;
+  readonly projectList: Locator;
+  readonly createButton: Locator;
+  constructor(page: Page) {
+    this.page = page;
+    this.heading = page.getByRole('heading', { name: 'Dashboard' });
+    this.projectList = page.getByTestId('project-list');
+    this.createButton = page.getByRole('button', { name: 'New Project' });
+  }
+  async goto() {
+    await this.page.goto('/dashboard');
+    await expect(this.heading).toBeVisible();
+  }
+  async createProject(name: string) {
+    await this.createButton.click();
+    await this.page.getByLabel('Project name').fill(name);
+    await this.page.getByRole('button', { name: 'Create' }).click();
+    await expect(this.page.getByText(name)).toBeVisible();
+  }
+  async expectProjectCount(count: number) {
+    await expect(this.projectList.getByRole('listitem')).toHaveCount(count);
+  }
+}
+```
+**IMPLEMENT -- Critical path test:**
+```typescript
+// e2e/tests/project-creation.spec.ts
+import { test, expect } from '@playwright/test';
+import { DashboardPage } from '../pages/dashboard.page';
+import { LoginPage } from '../pages/login.page';
+test.describe('Project creation', () => {
+  test('user can create a project from the dashboard', async ({ page }) => {
+    // Arrange: authenticate via stored state
+    const loginPage = new LoginPage(page);
+    await loginPage.loginAs('test-user@example.com');
+    // Act: create a new project
+    const dashboard = new DashboardPage(page);
+    await dashboard.goto();
+    await dashboard.createProject('My Test Project');
+    // Assert: project appears in the list
+    await expect(page.getByText('My Test Project')).toBeVisible();
+    await expect(page).toHaveURL(/\/projects\/[\w-]+/);
+  });
+});
+```
+### Example: Cypress E2E for an E-Commerce Checkout
+**IMPLEMENT -- Checkout flow with network interception:**
+```typescript
+// cypress/e2e/checkout.cy.ts
+describe('Checkout flow', () => {
+  beforeEach(() => {
+    cy.intercept('POST', '/api/orders', { fixture: 'order-success.json' }).as('createOrder');
+    cy.loginByApi('customer@shop.com', 'testpass123');
+  });
+  it('completes checkout with valid payment', () => {
+    cy.visit('/cart');
+    cy.findByTestId('cart-item').should('have.length.at.least', 1);
+    cy.findByRole('button', { name: 'Proceed to Checkout' }).click();
+    cy.url().should('include', '/checkout');
+    cy.findByLabelText('Card number').type('4242424242424242');
+    cy.findByLabelText('Expiry').type('12/28');
+    cy.findByLabelText('CVC').type('123');
+    cy.findByRole('button', { name: 'Place Order' }).click();
+    cy.wait('@createOrder');
+    cy.findByRole('heading', { name: 'Order Confirmed' }).should('be.visible');
+  });
+});
+```
+## Gates
+- **No CSS class selectors in page objects.** If a locator uses `.btn-primary` or `[class*="header"]`, the test is brittle. Use `data-testid`, ARIA roles, or accessible labels. Rewrite before merging.
+- **No arbitrary waits.** If any test contains `waitForTimeout`, `cy.wait(number)`, or `Thread.sleep`, it is not ready. Replace with explicit condition waits.
+- **No shared mutable state.** If test B depends on data created by test A, both tests are broken. Each test must create its own data. Fix before proceeding.
+- **Flaky tests block merge.** Any test that fails intermittently on 3 consecutive runs must be remediated or quarantined with a tracking issue before the suite is considered complete.
+## Escalation
+- **When the application requires a complex auth flow (OAuth, SSO, MFA):** Do not automate the full OAuth redirect in the browser. Use API-based authentication to obtain tokens, then inject them as cookies or headers. If API auth is not available, escalate to the team to expose a test-only auth bypass.
+- **When tests require infrastructure not available locally (third-party APIs, payment processors):** Mock the external dependency at the network layer using Playwright route interception or Cypress intercept. If the mock is insufficient for confidence, escalate for a staging environment with sandbox credentials.
+- **When flakiness persists after 3 remediation attempts on the same test:** The test may be exposing a real race condition in the application. Escalate the finding as a potential production bug rather than continuing to patch the test.
+- **When the E2E suite exceeds 10 minutes on CI:** Triage tests into smoke (must run on every commit) and full (runs on PR merge or nightly). Do not simply accept slow pipelines.
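
Note (not part of the diff): Phase 4, step 1 of the added SKILL.md flags a test as flaky when it fails in at least one run and passes in another. That rule can be sketched as below. The types `RunResult` and `Stability` and the function `classifyRuns` are hypothetical, and the input format (one pass/fail map per run) is an assumption; real Playwright or Cypress reporter output would need to be converted into it first.

```typescript
// One suite run: test name -> whether it passed. Hypothetical input format.
type RunResult = Record<string, boolean>;

type Stability = "stable-pass" | "stable-fail" | "flaky";

// Tally passes and failures per test across runs, then apply the rule:
// mixed outcomes mean flaky, otherwise the test is consistently passing or failing.
function classifyRuns(runs: RunResult[]): Record<string, Stability> {
  const tally: Record<string, { passed: number; failed: number }> = {};
  for (const run of runs) {
    for (const name of Object.keys(run)) {
      const counts = tally[name] ?? (tally[name] = { passed: 0, failed: 0 });
      if (run[name]) counts.passed += 1;
      else counts.failed += 1;
    }
  }

  const result: Record<string, Stability> = {};
  for (const name of Object.keys(tally)) {
    const { passed, failed } = tally[name];
    if (passed > 0 && failed > 0) result[name] = "flaky";
    else result[name] = passed > 0 ? "stable-pass" : "stable-fail";
  }
  return result;
}
```

Feeding it three runs in which `checkout` fails only in the second run marks `checkout` as `flaky`, while a test that fails in every run is reported as `stable-fail` and falls outside the flakiness remediation path.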