npm - qa-ai-repo - Versions diffs - 0.1.0 → 0.3.0 - Mend

qa-ai-repo 0.1.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

package/README.md +17 -2
package/api-contract-testing/agents/api-contract-author.md +38 -0
package/api-contract-testing/objective.json +4 -0
package/api-contract-testing/skills/api-contract-testing/SKILL.md +67 -0
package/api-contract-testing/skills/api-contract-testing/tooling.md +37 -0
package/bin/qa-ai.js +0 -0
package/package.json +1 -1
package/qa-strategy/agents/qa-strategist.md +34 -0
package/qa-strategy/objective.json +4 -0
package/qa-strategy/skills/qa-strategy/SKILL.md +44 -0
package/qa-strategy/skills/qa-strategy/intake.md +43 -0
package/qa-strategy/skills/qa-strategy/strategy-template.md +66 -0
package/test-pyramid/agents/test-architect.md +44 -0
package/test-pyramid/objective.json +4 -0
package/test-pyramid/skills/test-pyramid/SKILL.md +72 -0
package/test-pyramid/skills/test-pyramid/layer-test-matrix.md +50 -0
package/test-pyramid/skills/test-pyramid/plan-template.md +58 -0

package/README.md CHANGED

@@ -65,5 +65,20 @@ npm link                            # then `qa-ai list` works anywhere
 ## Publishing
-Bump `version` in `package.json`, then `npm publish`. Objective folders ship
-automatically (see `.npmignore`); no need to enumerate them.
+Objective folders ship automatically (see `.npmignore`); no need to enumerate
+them.
+**Automated (recommended).** A GitHub Actions workflow
+(`.github/workflows/release.yml`) publishes to npm whenever you cut a Release:
+1. Add a repo secret `NPM_TOKEN` (an npm *Automation* access token):
+   Settings → Secrets and variables → Actions → New repository secret.
+2. Bump `version` in `package.json` and commit.
+3. Create a GitHub Release (tag e.g. `v0.1.0`). The workflow smoke-tests the
+   CLI and publishes; it skips automatically if that version is already on npm.
+The workflow publishes with npm **provenance** (verified build attestation),
+enabled by `--provenance` + the `id-token: write` permission. Provenance
+requires a public repo.
+**Manual.** `npm login` then `npm publish` from the repo root.
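
The README says the release workflow skips publishing when the version is already on npm, but this diff does not include `.github/workflows/release.yml`, so the actual check is not visible here. A minimal sketch of that kind of guard, assuming the list of published versions has already been fetched from the registry (the function name `should_publish` and the sample data are hypothetical):

```python
import json


def should_publish(package_json_text: str, published_versions: list[str]) -> bool:
    """Return True when the version in package.json is not yet on the registry."""
    version = json.loads(package_json_text)["version"]
    return version not in published_versions


# Hypothetical inputs: a trimmed manifest and two possible registry states.
manifest = '{"name": "qa-ai-repo", "version": "0.3.0"}'

print(should_publish(manifest, ["0.1.0"]))           # True: 0.3.0 is not published yet
print(should_publish(manifest, ["0.1.0", "0.3.0"]))  # False: skip, version already exists
```

Comparing exact version strings keeps the guard idempotent: re-running the release for the same tag does nothing instead of failing on a duplicate publish.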

package/api-contract-testing/agents/api-contract-author.md ADDED

@@ -0,0 +1,38 @@
+---
+name: api-contract-author
+description: Use to add or extend API contract tests for a service. It detects the stack and interface (OpenAPI/GraphQL/Pact), recommends consumer-driven vs spec-first, scaffolds the tests, and wires can-i-deploy / breaking-change gates into CI.
+tools: Read, Grep, Glob, Edit, Write, Bash
+---
+You are a senior API quality engineer specializing in contract testing.
+## Process
+1. **Discover the interface.** Look for an OpenAPI/AsyncAPI spec, GraphQL schema,
+   route definitions, existing HTTP clients, and any current Pact/contract
+   setup. Identify providers and their consumers.
+2. **Recommend an approach** (state your reasoning briefly):
+   - Internal, you own the consumers → **consumer-driven (Pact)**.
+   - Public/many consumers with a spec → **spec-first (OpenAPI + Schemathesis +
+     oasdiff)**.
+   - Both → do both.
+3. **Scaffold the tests** in the project's language/framework:
+   - Consumer tests generating pacts with type/shape matchers (not exact values).
+   - Provider verification with provider states for setup.
+   - Or spec conformance (Schemathesis/Dredd) + a spec lint (Spectral).
+4. **Add the gates.** Wire `can-i-deploy` (Pact) or a breaking-change diff
+   (`oasdiff` / GraphQL Inspector) into CI so incompatible changes block deploy —
+   not just report.
+5. **Run what you can** locally and iterate until green; note anything that needs
+   a broker/credentials the environment lacks.
+## Principles
+- Contract ≠ end-to-end: verify interface shape and semantics, keep it small.
+- Match on types/shape; version anything backward-incompatible.
+- Every contract is tied to a service version + git sha.
+## Report
+The files added/changed, approach chosen and why, the CI gates wired in, and any
+follow-ups requiring a Pact Broker / PactFlow or spec that doesn't exist yet.
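
The `can-i-deploy` gate this agent wires in boils down to a lookup: for every contract the service takes part in, has this exact pair of versions been verified? The sketch below illustrates that idea only. It is not the Pact Broker's implementation or API, and the data structures and service names are hypothetical.

```python
def can_i_deploy(service, version, environment, relationships, verified):
    """Return the unverified version pairs that should block the deploy.

    environment:   dict of service name -> version currently in the target env
    relationships: set of (consumer, provider) pairs that share a contract
    verified:      set of (consumer, consumer_version, provider, provider_version)
                   tuples whose provider verification passed
    """
    blockers = []
    for consumer, provider in sorted(relationships):
        if service not in (consumer, provider):
            continue  # this contract does not involve the service being deployed
        other = provider if service == consumer else consumer
        if other not in environment:
            continue  # the counterpart is not deployed in this environment
        if service == consumer:
            pair = (consumer, version, provider, environment[other])
        else:
            pair = (consumer, environment[other], provider, version)
        if pair not in verified:
            blockers.append(pair)
    return blockers


# Hypothetical state: orders-svc 2.1.0 is live, web-app 1.4.0 was verified against it.
relationships = {("web-app", "orders-svc")}
environment = {"orders-svc": "2.1.0"}
verified = {("web-app", "1.4.0", "orders-svc", "2.1.0")}

print(can_i_deploy("web-app", "1.4.0", environment, relationships, verified))  # []
print(can_i_deploy("web-app", "1.5.0", environment, relationships, verified))
```

An empty list means the deploy may proceed. Anything else should fail the pipeline step, which is what makes the gate block the deploy instead of only reporting.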

package/api-contract-testing/objective.json ADDED

@@ -0,0 +1,4 @@
+{
+  "title": "API Contract Testing",
+  "description": "Catch breaking API changes before they ship — consumer-driven contracts (Pact) and spec-first validation (OpenAPI/Schemathesis/Dredd), with provider verification and can-i-deploy gates wired into CI."
+}

package/api-contract-testing/skills/api-contract-testing/SKILL.md ADDED

@@ -0,0 +1,67 @@
+---
+name: api-contract-testing
+description: Design and implement API contract tests so a provider can't break its consumers. Use when adding contract tests, choosing between consumer-driven (Pact) and spec-first (OpenAPI) approaches, verifying providers, or gating deploys on compatibility. Covers REST, GraphQL, and event/message contracts.
+---
+# API Contract Testing
+Contract testing verifies that two services **agree on the interface** without
+standing up both in a slow, flaky end-to-end environment. It catches breaking
+changes at the boundary — the highest-value, lowest-cost API tests.
+## Pick the approach (see `tooling.md` for tool choices)
+- **Consumer-driven contracts (Pact)** — when *you own the consumers* and want
+  each consumer to declare exactly what it needs. Consumers generate a contract;
+  the provider verifies against all of them. Best for internal microservices.
+- **Spec-first / provider contracts (OpenAPI, JSON Schema, AsyncAPI)** — when a
+  spec is the source of truth (public API, many/unknown consumers). Validate
+  that real traffic conforms to the spec in both directions.
+- Use **both** when you publish a spec *and* have known internal consumers.
+## Consumer-driven (Pact) workflow
+1. **Consumer test**: write an interaction (request → expected response) against
+   a Pact mock; run the consumer's real client code against it. This generates a
+   pact file — assert on *shape/types*, not exact values (use matchers).
+2. **Publish** the pact (with consumer version + git sha + branch/tag) to a
+   Pact Broker / PactFlow.
+3. **Provider verification**: the provider replays every consumer interaction
+   against its real implementation, using provider states to set up data.
+4. **`can-i-deploy`**: before releasing either side, query the broker to confirm
+   the version is compatible with everything it will meet in the target env.
+   Block the deploy if not.
+## Spec-first workflow
+1. Treat the **OpenAPI/AsyncAPI** spec as the contract; lint it (Spectral) in CI.
+2. **Provider side**: assert responses conform to the spec — property-based
+   fuzzing (Schemathesis) or replaying the spec's examples (Dredd).
+3. **Consumer side**: run against a spec-driven **mock** (Prism) so consumers
+   develop against the contract, not a live service.
+4. **Detect breaking changes**: diff the spec against the last released version
+   (e.g. `oasdiff`) and fail CI on backward-incompatible changes.
+## Principles
+- **Contract ≠ end-to-end.** Verify the interface shape and semantics, not full
+  business flows. Keep each interaction small and deterministic.
+- **Match on type/shape, not brittle exact values** (except enums/status codes
+  that are genuinely part of the contract).
+- **Version everything** — pacts and specs are tied to a service version + sha
+  so `can-i-deploy` can reason about environments.
+- **Backward compatibility is the rule**: additive changes are safe; removing a
+  field, tightening a type, or changing status codes is breaking — version it.
+- **Provider states** replace shared fixtures — each interaction declares the
+  state it needs; keep them cheap and isolated.
+- **Gate deploys**, don't just report. A contract test that doesn't block a bad
+  release is documentation, not a test.
+## CI wiring
+- Consumer PR → run consumer tests → publish pact (tagged with branch).
+- Provider PR → verify against `main`-tagged pacts → publish results.
+- Pre-deploy → `can-i-deploy --to <env>` (or spec breaking-change diff) as a gate.
+- Nightly → verify all consumers against provider `main` to catch drift early.
+See `tooling.md` for language-specific tools and when to use each.
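
The "match on type/shape, not exact values" principle in this file can be illustrated without any contract-testing library. The sketch below is a hand-rolled matcher, not Pact's matcher API. The `matches_shape` helper and the sample order payload are hypothetical.

```python
def matches_shape(shape, value) -> bool:
    """Check that `value` has the structure described by `shape`.

    shape may be:
      - a dict: every listed key must exist and match (extra keys are allowed)
      - a one-element list: every item of the value must match that element
      - a Python type: the value must be an instance of it
      - any other literal: exact match (for enums and status codes)
    """
    if isinstance(shape, dict):
        return isinstance(value, dict) and all(
            key in value and matches_shape(sub, value[key])
            for key, sub in shape.items()
        )
    if isinstance(shape, list):
        return isinstance(value, list) and all(
            matches_shape(shape[0], item) for item in value
        )
    if isinstance(shape, type):
        # bool is a subclass of int in Python, so reject it unless asked for.
        if isinstance(value, bool) and shape is not bool:
            return False
        return isinstance(value, shape)
    return value == shape


# "status" is pinned to a literal because it is part of the contract.
order_shape = {"id": int, "status": "PAID", "items": [{"sku": str, "qty": int}]}

good = {"id": 42, "status": "PAID", "items": [{"sku": "A-1", "qty": 2}], "note": "extra"}
bad = {"id": "42", "status": "PAID", "items": []}

print(matches_shape(order_shape, good))  # True
print(matches_shape(order_shape, bad))   # False: id is a string, not an int
```

Allowing extra keys is what keeps additive provider changes from failing the consumer, while a removed field or a changed type fails the check.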

package/api-contract-testing/skills/api-contract-testing/tooling.md ADDED

@@ -0,0 +1,37 @@
+# API Contract Testing — Tooling Guide
+Choose by *who owns the consumers* and *what the source of truth is*.
+## Consumer-driven contracts
+- **Pact** — the standard for consumer-driven contracts. SDKs for JS/TS, Java,
+  .NET, Go, Python, Ruby, PHP, Rust. Use with a **Pact Broker** or **PactFlow**
+  for storing contracts, `can-i-deploy`, and webhooks.
+- Use when: internal microservices, you control the consumers, HTTP or messages.
+## Spec-first (OpenAPI / REST)
+- **Spectral** — lint the OpenAPI spec (style + governance) in CI.
+- **Schemathesis** — property-based fuzzing that checks responses conform to the
+  OpenAPI schema; great at finding edge-case violations.
+- **Dredd** — validate an API against its OpenAPI/API Blueprint examples.
+- **Prism** — spin up a mock server from the spec so consumers develop against
+  the contract; also does request/response validation as a proxy.
+- **oasdiff** — diff two OpenAPI specs and fail CI on breaking changes.
+## GraphQL
+- **GraphQL Inspector** — schema diffing and breaking-change detection against
+  the previous schema; **graphql-schema-linter** for linting schema style rules.
+- Apollo **Rover** + schema checks if using a registry/federation.
+## Async / event-driven
+- **AsyncAPI** as the contract for Kafka/AMQP/WebSocket messages.
+- **Pact** message pacts for consumer-driven event contracts.
+## General HTTP assertions (lighter weight)
+- **Postman/Newman**, **Karate**, **REST Assured** (Java), **Tavern** (Python)
+  for schema assertions when full contract tooling is overkill.
+## Rule of thumb
+- Internal services, you own both sides → **Pact + Broker**.
+- Public/partner API with a published spec → **OpenAPI + Spectral + Schemathesis
+  + oasdiff**.
+- Both → publish the spec *and* run Pact for known internal consumers.
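
To show what a breaking-change diff looks for, here is a deliberately small sketch over flat response schemas (field name to type name). It is not how `oasdiff` works internally and covers none of its rule set. The function name and sample schemas are hypothetical, and it conservatively treats any type change as breaking.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List backward-incompatible differences between two response schemas.

    Added fields are ignored because additive changes are safe for consumers.
    """
    problems = []
    for field, old_type in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != old_type:
            problems.append(f"changed type of {field}: {old_type} -> {new[field]}")
    return problems


# Hypothetical released and proposed schemas for one response body.
released = {"id": "integer", "email": "string", "nickname": "string"}
proposed = {"id": "string", "email": "string", "created_at": "string"}

for problem in breaking_changes(released, proposed):
    print(problem)
# changed type of id: integer -> string
# removed field: nickname
```

In CI, a non-empty result would be turned into a non-zero exit code so the change cannot merge without a version bump.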

package/bin/qa-ai.js CHANGED

File without changes

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "qa-ai-repo",
-  "version": "0.1.0",
+  "version": "0.3.0",
   "description": "Install reusable QA skills, agents, and MCP servers into Claude Code, Cursor, Windsurf, and other AI coding tools with one command.",
   "type": "module",
   "bin": {

package/qa-strategy/agents/qa-strategist.md ADDED

@@ -0,0 +1,34 @@
+---
+name: qa-strategist
+description: Use to create a tailored QA strategy for a team or project. It runs a structured intake (tech stack, team size, release cadence, current maturity, risk/compliance), then produces a risk-based strategy with an automation plan, quality gates, tooling, and a phased roadmap.
+tools: Read, Grep, Glob, Write
+---
+You are a pragmatic QA strategy consultant. Your job is to produce a QA strategy
+that fits the team's reality — right-sized to their risk, stack, and capacity.
+## Process
+1. **Inspect first.** If pointed at a codebase, detect languages, frameworks,
+   test directories, CI config, and coverage. Use findings to pre-fill the
+   intake and confirm rather than ask.
+2. **Run the intake interview.** Work through the six sections (Product, Tech
+   stack, Team & process, Current state, Non-functional & risk, Goals &
+   constraints). Ask one section at a time, in plain questions. Never dump all
+   questions at once. If the user answers tersely, proceed — don't interrogate.
+3. **Don't invent facts.** Mark unknowns as `TBD` and state the assumption you'll
+   proceed with so the user can correct it.
+4. **Write the strategy** to `QA-STRATEGY.md` using the standard section
+   structure. Every recommendation must trace to an input.
+5. **Be decisive and specific.** Recommend concrete tools, gates, and first
+   steps — not "consider adding tests." Prioritize by risk (likelihood ×
+   impact). Favor a healthy test pyramid and fast CI feedback.
+## Output
+A `QA-STRATEGY.md` covering: context snapshot, goals & metrics, risk-based
+prioritization, test levels & types, automation strategy, CI/CD quality gates,
+roles & ownership, tooling, and a phased Now/Next/Later roadmap with owners and
+success metrics. End with open questions and assumptions to validate.
+Keep it concise enough that the team will actually read and act on it.

package/qa-strategy/objective.json ADDED

@@ -0,0 +1,4 @@
+{
+  "title": "QA Strategy",
+  "description": "Interview a team about its product, tech stack, size, and release cadence, then generate a tailored, risk-based QA strategy with an automation plan, quality gates, tooling, and a phased rollout."
+}

package/qa-strategy/skills/qa-strategy/SKILL.md ADDED

@@ -0,0 +1,44 @@
+---
+name: qa-strategy
+description: Produce a tailored, risk-based QA strategy for a team or project. Use when asked to "create a QA strategy", "assess our testing approach", "build a test plan/roadmap", or decide what and how much to automate. First gathers a defined set of inputs (tech stack, team size, release cadence, current maturity, risk/compliance), then writes the strategy.
+---
+# QA Strategy
+Generate a QA strategy that fits *this* team — not a generic checklist. The
+strategy is only as good as its inputs, so **always gather the intake first**,
+then produce the strategy against a consistent template.
+## How to run
+1. **Collect the intake.** Ask the questions in `intake.md`. Ask them in
+   batches (grouped by section), not all at once. If the user has already
+   supplied some answers (in the prompt, a repo, a doc), pre-fill those and only
+   ask what's missing or ambiguous. Do not invent answers — if something is
+   unknown, mark it `TBD` and note the assumption you'll proceed with.
+2. **Infer what you can from the codebase** when available: languages,
+   frameworks, existing test dirs, CI config, coverage — confirm rather than ask.
+3. **Write the strategy** using `strategy-template.md`. Every recommendation
+   must trace back to an input (e.g. "daily deploys → block merges on a fast
+   smoke suite"). Tailor depth to team size and maturity.
+4. **Prioritize by risk.** Rank areas by likelihood × impact; put automation
+   and coverage where failures hurt most, not uniformly.
+5. **Make it actionable.** End with a phased roadmap (Now / Next / Later) with
+   concrete first steps, owners, and success metrics — not aspirations.
+## Principles
+- **Right-size it.** A 3-person startup shipping daily and a 50-person org with
+  compliance needs get very different strategies. Match rigor to risk and team
+  capacity.
+- **Test pyramid, not ice-cream cone.** Favor many fast unit/integration tests,
+  fewer E2E; call out where the current shape is inverted.
+- **Automate the repetitive and high-risk; keep humans for exploratory.**
+- **Quality gates over quality theater.** Tie recommendations to CI gates and
+  measurable signals (escape rate, flake rate, lead time), not vanity coverage %.
+- **Start where they are.** Recommend the next 2–3 improvements, not a rewrite.
+## Inputs and outputs
+- `intake.md` — the question set to collect before writing anything.
+- `strategy-template.md` — the structure of the delivered strategy document.
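
The "likelihood × impact" ranking this skill asks for is simple arithmetic, sketched below. The 1 to 5 scales, the `Area` class, and the sample product areas are assumptions for illustration. The skill itself does not prescribe a scale.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (cosmetic) to 5 (money, safety, data loss)

    @property
    def risk(self) -> int:
        return self.likelihood * self.impact


def rank_by_risk(areas: list[Area]) -> list[Area]:
    """Highest risk first; ties keep their input order because sorted() is stable."""
    return sorted(areas, key=lambda area: area.risk, reverse=True)


# Hypothetical intake results for three product areas.
areas = [
    Area("profile settings", likelihood=2, impact=2),
    Area("checkout", likelihood=3, impact=5),
    Area("search", likelihood=4, impact=3),
]

for area in rank_by_risk(areas):
    print(f"{area.name}: {area.risk}")
# checkout: 15
# search: 12
# profile settings: 4
```

The top of the list is where automation and coverage go first. The bottom may warrant nothing beyond a unit test or occasional exploratory session.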

package/qa-strategy/skills/qa-strategy/intake.md ADDED

@@ -0,0 +1,43 @@
+# QA Strategy — Intake Questionnaire
+Collect these before writing the strategy. Ask by section, pre-fill anything
+already known, and mark unknowns `TBD` with a stated assumption. Bold items are
+the minimum needed to produce a useful first draft.
+## 1. Product & scope
+- **What is the product?** (web app, mobile app, API/backend service, desktop, CLI, embedded, data/ML pipeline)
+- **What platforms must you support?** (browsers, iOS/Android versions, OS)
+- Who are the users and what's the scale? (internal tool vs. public; approx. traffic/DAU)
+- What are the most critical user journeys / revenue-bearing flows?
+## 2. Tech stack
+- **Languages & frameworks** (frontend, backend, mobile)
+- Data stores, queues, and major third-party integrations
+- **Existing test frameworks/tools** (e.g. Jest, Pytest, Playwright, Cypress, Selenium, JUnit, k6)
+- Repo layout: monorepo vs. polyrepo; number of services
+## 3. Team & process
+- **Team size and roles** (# engineers, # dedicated QA/SDET, PM, designers)
+- Who owns quality today? (devs test their own work? separate QA? none?)
+- **Release cadence & deployment** (per-commit / daily / weekly / monthly; CI/CD maturity)
+- Branching & review model (trunk-based, PR reviews, feature branches)
+- Ways of working (Scrum/Kanban, sprint length)
+## 4. Current quality state
+- **What testing exists today?** (unit / integration / E2E / manual / none) and rough coverage
+- How is testing run — locally, in CI, both? Which CI system?
+- Known pain points (flaky tests, slow suites, prod escapes, long release cycles)
+- Bug/defect tracking tool and current escape/severity trends if known
+## 5. Non-functional & risk requirements
+- **Compliance/regulatory needs** (HIPAA, PCI-DSS, SOC 2, GDPR, accessibility/WCAG, none)
+- Performance/load expectations and SLAs/SLOs
+- Security testing needs (SAST/DAST, pentest cadence)
+- Accessibility, i18n/l10n, offline, or device-specific requirements
+- Areas where a failure would be most damaging (safety, money, data loss, reputation)
+## 6. Goals & constraints
+- **Primary goal for the next quarter** (ship faster, reduce escapes, cut flake, hit coverage/compliance)
+- Success metrics you care about (escape rate, MTTR, lead time, coverage, flake rate)
+- Constraints: budget for tooling/headcount, timeline, hard deadlines
+- Appetite for change (incremental improvements vs. willing to invest in a bigger shift)

package/qa-strategy/skills/qa-strategy/strategy-template.md ADDED

@@ -0,0 +1,66 @@
+# QA Strategy — <Product / Team Name>
+> Generated <date> · Owner: <name> · Status: Draft
+## 0. Context snapshot
+One paragraph summarizing the intake: product, stack, team size, cadence, and
+current quality state. List key assumptions and any `TBD` inputs.
+## 1. Quality goals & metrics
+- Primary goal(s) this quarter (tie to the team's stated goal).
+- Target metrics with current → target, e.g.:
+  | Metric | Now | Target |
+  |--------|-----|--------|
+  | Prod escape rate | ? | ↓ |
+  | E2E flake rate | ? | < 1% |
+  | CI feedback time | ? | < 10 min |
+  | Critical-path coverage | ? | 100% |
+## 2. Risk-based test prioritization
+Rank features/flows by **likelihood × impact**. Concentrate effort on the top
+tier. A short table: area → risk → what coverage it warrants.
+## 3. Test scope & levels
+Recommended mix across the pyramid, justified by stack and team size:
+- **Unit** — where, framework, target.
+- **Integration / contract** — service boundaries, APIs, DB.
+- **End-to-end** — only critical journeys; keep the count small.
+- **Manual / exploratory** — what stays human (usability, edge exploration).
+Call out if the current shape is inverted and how to rebalance.
+## 4. Test types beyond functional
+Only those the intake justifies:
+- Performance/load · Security (SAST/DAST/pentest) · Accessibility (WCAG)
+- Compatibility (browsers/devices) · i18n/l10n · Resilience/chaos
+- Compliance-driven testing (HIPAA/PCI/SOC 2) with required evidence.
+## 5. Automation strategy
+- What to automate first (high-risk + high-repetition) and what not to.
+- Recommended frameworks/tools (respect existing stack; justify changes).
+- Test data & environment management approach.
+- Standards: naming, structure, stable locators, no fixed sleeps, isolation.
+## 6. CI/CD integration & quality gates
+- Which suites run at which stage (pre-commit / PR / merge / nightly / release).
+- **Merge gates**: what must pass to merge (fast smoke + unit/integration).
+- Handling flake (quarantine, retry policy) and keeping the suite fast.
+- Release checklist and rollback signals.
+## 7. Roles & ownership
+- Who writes, reviews, and maintains tests (dev-owned vs. QA/SDET).
+- Bug triage flow, severity definitions, and SLAs.
+- A Definition of Done for stories that includes quality criteria.
+## 8. Tooling recommendations
+Concrete tools for: test frameworks, CI, reporting/dashboards, coverage,
+performance, security, accessibility, bug tracking. Note cost/effort and
+whether each is adopt-now or later.
+## 9. Rollout roadmap
+Phased, with owners and success criteria — not a wish list.
+- **Now (0–4 weeks):** 2–3 concrete first steps.
+- **Next (1–2 months):** build-out.
+- **Later (quarter+):** maturity, harder NFRs, scale.
+## 10. Risks & open questions
+Assumptions to validate, `TBD` inputs to resolve, and dependencies/blockers.

package/test-pyramid/agents/test-architect.md ADDED

@@ -0,0 +1,44 @@
+---
+name: test-architect
+description: Use to analyze a full-stack application (frontend, backend, middleware) and produce a complete test pyramid strategy — what to test in the FE, what in the BE, what at the middleware/seams, and at which level. It inspects the codebase, maps the layers, and writes a per-layer test plan with tooling and CI wiring.
+tools: Read, Grep, Glob, Bash, Write
+---
+You are a test architect. You design testing for an application as a whole
+system of layers, so every behavior is verified at the lowest effective level
+and seams are covered by contracts rather than heavy end-to-end tests.
+## Process
+1. **Discover the architecture.** Inspect the repo to identify each layer and its
+   stack: frontend framework/state/routing; backend services, API style, data
+   stores; middleware (gateway/BFF, auth, queues/event bus, cache, workers,
+   third-party integrations). Read package manifests, framework configs, `src/`
+   layout, infra/compose files, and CI. Confirm findings; don't assume.
+2. **Inventory current tests** and their pyramid shape (unit vs integration vs
+   E2E). Flag if it's inverted (mostly slow E2E).
+3. **List behaviors per layer**, then assign each to the **lowest** level that
+   can prove it:
+   - pure logic/rendering/validation → unit
+   - needs a real collaborator (DB, rendered tree + network, queue) → integration/component
+   - agreement across a boundary → contract (Pact / OpenAPI / AsyncAPI)
+   - only a full critical journey → E2E (keep to a handful)
+4. **Cover every seam with a contract** instead of re-testing both sides via E2E.
+5. **Write `TEST-PYRAMID.md`** with: architecture map; FE / BE / middleware test
+   plans (concrete tests + tools); a seams→contracts table; the few E2E journeys;
+   target proportions vs. current gap; tooling summary; and a Now/Next/Later
+   build order with owners.
+## Principles
+- Push tests down; contracts replace integration E2E.
+- Test behavior, not implementation — especially in the FE (assert what the user
+  sees, not internal state), and never drive backend rules through the UI.
+- Concentrate depth on revenue/safety-critical flows.
+- Be specific: name the actual modules/endpoints/queues to cover and the tool for
+  each, not generic advice.
+## Report
+The path to `TEST-PYRAMID.md`, the layers found, the biggest coverage gaps, and
+the top 3 tests to add first.

package/test-pyramid/objective.json ADDED

@@ -0,0 +1,4 @@
+{
+  "title": "Full-Stack Test Pyramid Strategy",
+  "description": "Analyze an application end to end — frontend, backend, and middleware — and produce a complete test pyramid: exactly what to test at each layer and each level (unit, integration/component, contract, E2E), where each behavior belongs, and which tools to use."
+}

package/test-pyramid/skills/test-pyramid/SKILL.md ADDED

@@ -0,0 +1,72 @@
+---
+name: test-pyramid
+description: Analyze a full-stack application (frontend, backend, middleware) and design a complete test pyramid — deciding exactly which tests belong in the FE, which in the BE, which at middleware/seams, and at what level (unit, integration/component, contract, E2E). Use when asked to "design a testing strategy for the whole app", "what should we test where", "build a test pyramid", or to rebalance an inverted (E2E-heavy) suite.
+---
+# Full-Stack Test Pyramid
+Design testing for an application **as a whole system of layers**, not one suite.
+The goal: every behavior is tested at the **lowest level that can meaningfully
+verify it**, seams are covered by **contract tests**, and only a handful of
+journeys reach **end-to-end**. See `layer-test-matrix.md` for the full
+layer × level grid of what to test and which tools to use.
+## Method
+1. **Map the architecture.** Identify each layer and its technology:
+   - **Frontend** — SPA/SSR framework, state management, design system, routes.
+   - **Backend** — services, APIs (REST/GraphQL/gRPC), domain logic, data stores.
+   - **Middleware** — API gateway/BFF, auth, message queues/event bus, caching,
+     service mesh, background workers, third-party integrations.
+   Inspect the repo (package manifests, framework configs, `src/` layout, CI) and
+   confirm rather than assume.
+2. **List behaviors per layer**, then **assign each to the lowest suitable
+   level** using this decision order:
+   - Pure logic / rendering / validation → **unit**.
+   - Behavior that needs a real collaborator (DB, rendered tree + network, queue)
+     → **integration / component**.
+   - Agreement across a boundary (FE↔API, service↔service, producer↔consumer)
+     → **contract** (Pact / OpenAPI / AsyncAPI) — this is what lets you keep E2E small.
+   - Only a full critical user journey that no lower level can prove → **E2E**.
+3. **Cover the seams, not the internals twice.** Where two layers meet, use a
+   contract test once instead of re-testing both sides through E2E.
+4. **Set the shape.** Aim for a true pyramid, roughly:
+   - ~70% unit · ~20% integration/component · ~10% contract+E2E (E2E a small
+     handful). If the current suite is an inverted "ice-cream cone" (mostly E2E),
+     call it out and give the rebalancing plan.
+5. **Output the plan** using `plan-template.md`: per-layer test lists, the seam
+   contracts, the few E2E journeys, target proportions, tooling, CI wiring, and
+   what to build first.
+## What goes where (summary — detail in `layer-test-matrix.md`)
+- **Frontend:** unit test component logic/hooks/reducers/utils; component-test
+  rendered UI with mocked network (Testing Library + MSW), accessibility (axe),
+  and visual regression; **consumer contract** tests against the API; a few E2E
+  journeys. Do NOT drive backend business rules through the UI.
+- **Backend:** unit test domain/business logic and validators; integration-test
+  repositories and route handlers against a real (containerized) DB and real
+  adapters; **provider contract** verification + OpenAPI/schema conformance;
+  service/component tests with downstreams mocked.
+- **Middleware:** unit test routing/transformation/auth/rate-limit logic;
+  integration-test queue producers/consumers, gateway routing, and cache
+  behavior with real infra (Testcontainers); **message/event contracts**
+  (AsyncAPI, Pact message pacts); resilience tests for retries, timeouts,
+  circuit breakers, and idempotency.
+- **Cross-cutting (top):** a small set of full E2E journeys, plus performance/
+  load (k6) and security (SAST/DAST) as their own tracks.
+## Principles
+- **Push tests down.** A bug catchable by a unit test should not need an E2E.
+- **Contracts replace integration E2E.** Seams verified by contracts let you
+  delete most cross-service E2E.
+- **Test behavior, not implementation.** Especially in the FE — assert what the
+  user sees, not internal state.
+- **Isolation + speed at the base**, realism concentrated at the seams, breadth
+  only at the tip.
+- **Right-size to risk:** put the extra depth on revenue/safety-critical flows.
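
The decision order in the Method section can be read as a small function: find the strongest requirement a behavior has, and that requirement fixes the lowest level able to prove it. The sketch below is one interpretation under that assumption. The three boolean flags and the example behaviors are hypothetical.

```python
def lowest_level(needs_real_collaborator: bool,
                 crosses_boundary: bool,
                 needs_full_journey: bool) -> str:
    """Return the lowest pyramid level that can still prove the behavior."""
    if needs_full_journey:
        return "E2E"                    # no lower level can prove it
    if crosses_boundary:
        return "contract"               # agreement between two sides of a seam
    if needs_real_collaborator:
        return "integration/component"  # needs a real DB, rendered tree, or queue
    return "unit"                       # pure logic, rendering, or validation


# Hypothetical behaviors and the flags a reviewer might assign them.
behaviors = {
    "order total is rounded to two decimals": (False, False, False),
    "repository persists an order": (True, False, False),
    "FE expects `total` in the orders response": (False, True, False),
    "guest can check out and receive a receipt": (True, True, True),
}

for name, flags in behaviors.items():
    print(f"{name} -> {lowest_level(*flags)}")
```

Checking the most demanding requirement first means a behavior lands at E2E only when nothing lower can cover it, which is how the top of the pyramid stays small.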

package/test-pyramid/skills/test-pyramid/layer-test-matrix.md ADDED

@@ -0,0 +1,50 @@
+# Layer × Level Test Matrix
+For each layer, what to test at each pyramid level and typical tools. Assign each
+behavior to the **lowest** level that can prove it.
+## Frontend (UI)
+| Level | What to test in the FE | Tools |
+|-------|------------------------|-------|
+| Unit | Pure logic: hooks, reducers/stores, selectors, formatters, validation, utility fns | Jest / Vitest |
+| Component / integration | Rendered components & flows with **network mocked**; forms, conditional UI, routing; accessibility; visual regression | Testing Library, MSW, jest-axe, Storybook + Playwright/Chromatic snapshots |
+| Contract (consumer) | The shape/behavior the FE expects from each API it calls | Pact (consumer), or types generated from OpenAPI/GraphQL schema |
+| E2E | A few critical user journeys through the real app | Playwright / Cypress |
+**Don't** test backend business rules or data validation *through* the UI — mock
+the API and test those rules in the BE.
+## Backend (services / APIs)
+| Level | What to test in the BE | Tools |
+|-------|------------------------|-------|
+| Unit | Domain/business logic, calculations, state machines, validators, mappers — no I/O | Jest/Vitest, Pytest, JUnit, Go test, RSpec |
+| Integration | Repositories/ORM against a **real DB**, route/controller handlers, external adapters, migrations | Testcontainers, Supertest, test DB, WireMock for third parties |
+| Contract (provider) | Verify the provider satisfies every consumer contract; conform to the published OpenAPI/GraphQL schema | Pact (provider verification), Schemathesis, Dredd, oasdiff |
+| Component / service | The whole service in isolation with downstreams stubbed | in-process HTTP + mocked deps |
+## Middleware (gateway / queues / auth / cache / workers)
+| Level | What to test in middleware | Tools |
+|-------|----------------------------|-------|
+| Unit | Routing/transformation rules, auth/authorization middleware, rate limiting, serialization | framework test runner |
+| Integration | Queue producers/consumers, event handlers, gateway routing/BFF aggregation, cache read/write/invalidation | Testcontainers (Kafka/RabbitMQ/Redis), LocalStack |
+| Contract (message/event) | Event & message schemas between producers and consumers | AsyncAPI validation, Pact message pacts |
+| Resilience | Retries, timeouts, circuit breakers, idempotency, dead-letter handling, backpressure | Toxiproxy, fault-injection, chaos tests |
+## Cross-cutting (top of the pyramid — keep small)
+| Concern | What | Tools |
+|---------|------|-------|
+| E2E journeys | A handful of full-stack critical paths only | Playwright / Cypress |
+| Performance / load | Throughput, latency, soak, spike | k6, Gatling, Locust |
+| Security | SAST, dependency scan, DAST | Semgrep/CodeQL, Snyk/Dependabot, OWASP ZAP |
+| Accessibility | End-to-end a11y on key flows | axe, Lighthouse CI |
+## Target shape
+- ~70% unit · ~20% integration/component · ~7% contract · ~3% E2E (a small,
+  fixed set). Contracts do the heavy lifting at seams so E2E stays tiny.
+- Inverted suite (mostly slow E2E)? Push each E2E down: replace with a component
+  test (FE), an integration test (BE), or a contract test (seam) wherever possible.

package/test-pyramid/skills/test-pyramid/plan-template.md ADDED

@@ -0,0 +1,58 @@
+# Test Pyramid Strategy — <Application Name>
+> Generated <date> · Scope: frontend + backend + middleware
+## 0. Architecture map
+Layers detected and their tech:
+- **Frontend:** framework, state, routing, design system.
+- **Backend:** services, API style (REST/GraphQL/gRPC), data stores.
+- **Middleware:** gateway/BFF, auth, queues/event bus, cache, workers, 3rd parties.
+Diagram or bullet list of how requests/events flow across layers.
+## 1. Frontend test plan
+- **Unit:** <hooks/reducers/utils/validators to cover> — tool.
+- **Component/integration:** <rendered flows, forms, a11y, visual> — tool.
+- **Consumer contracts:** <each API the FE consumes> — tool.
+- Explicitly **not** in the FE: <backend rules to push down>.
+## 2. Backend test plan
+- **Unit:** <domain logic, calculations, validators>.
+- **Integration:** <repositories, handlers, adapters, migrations> — real DB via Testcontainers.
+- **Provider contracts / schema conformance:** <APIs to verify>.
+- **Service/component:** <services to test in isolation>.
+## 3. Middleware test plan
+- **Unit:** <routing/auth/rate-limit/transform logic>.
+- **Integration:** <queues, gateway, cache> with real infra.
+- **Message/event contracts:** <topics/queues and their schemas>.
+- **Resilience:** <retries, timeouts, circuit breakers, idempotency>.
+## 4. Seams & contracts (the glue)
+Table of every boundary → contract that covers it, so E2E can stay small.
+| Seam | Consumer | Provider | Contract |
+|------|----------|----------|----------|
+| FE ↔ Orders API | web app | orders-svc | Pact / OpenAPI |
+| orders-svc ↔ payments | orders-svc | payments-svc | Pact |
+| orders-svc → events | producer | notif-worker | AsyncAPI / message pact |
+## 5. End-to-end journeys (keep to a handful)
+List the few critical full-stack paths that justify E2E, and why each can't be
+covered lower down.
+## 6. Cross-cutting
+Performance/load, security (SAST/DAST/deps), accessibility — owners and cadence.
+## 7. Target proportions & current gap
+| Level | Target | Now | Action |
+|-------|--------|-----|--------|
+| Unit | ~70% | ? | |
+| Integration/component | ~20% | ? | |
+| Contract | ~7% | ? | |
+| E2E | ~3% | ? | |
+Note if the current suite is inverted and the rebalancing moves.
+## 8. Tooling summary
+Per layer: chosen runners, mocking, contract tooling, CI reporting.
+## 9. Build order (Now / Next / Later)
+Concrete first tests to add, then build-out, then hardening — with owners.
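
The "Target proportions & current gap" table in this template can be filled in from raw test counts. The sketch below uses the ~70/20/7/3 targets from the template. The `shape_gap` and `is_inverted` helpers, the definition of "inverted" as more E2E than unit tests, and the sample counts are assumptions.

```python
# Target percentages taken from the template's table.
TARGET = {"unit": 70, "integration/component": 20, "contract": 7, "e2e": 3}


def shape_gap(counts: dict[str, int]) -> dict[str, tuple[int, int, int]]:
    """Map each level to (target %, current %, current minus target)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tests counted")
    rows = {}
    for level, target in TARGET.items():
        now = round(100 * counts.get(level, 0) / total)
        rows[level] = (target, now, now - target)
    return rows


def is_inverted(counts: dict[str, int]) -> bool:
    """Assumed definition: the suite has more E2E tests than unit tests."""
    return counts.get("e2e", 0) > counts.get("unit", 0)


# Hypothetical inventory of an E2E-heavy suite with 200 tests.
counts = {"unit": 40, "integration/component": 30, "contract": 0, "e2e": 130}

for level, (target, now, gap) in shape_gap(counts).items():
    print(f"{level}: target ~{target}%, now {now}%, gap {gap:+d}")
print("inverted:", is_inverted(counts))
```

Test counts are a rough proxy for shape, since one E2E test usually costs far more runtime than one unit test. Weighting by runtime would be a reasonable variation.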