npm - ccqa - Versions diffs - 0.6.0 → 0.8.0 - Mend

ccqa 0.6.0 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7) hide show

package/README.md +104 -25
package/dist/bin/ccqa.mjs +7972 -4680
package/dist/package.json +1 -1
package/dist/runtime/test-helpers.d.mts +11 -1
package/dist/runtime/test-helpers.mjs +120 -2
package/dist/{spawn-ab-DjRh1-4T.mjs → spawn-ab-Ja8NRRab.mjs} +14 -1
package/package.json +1 -1

package/README.md CHANGED Viewed

@@ -2,7 +2,14 @@
 **Your Claude subscription already includes a QA engineer.**
-ccqa turns Claude Code into a browser test recorder. Write a spec in YAML, run `ccqa trace`, and Claude drives your app via [agent-browser](https://github.com/vercel-labs/agent-browser). Every action is recorded and compiled into a deterministic test script you can run in CI. No extra API key. Just `claude`.
+ccqa turns Claude Code into a browser test recorder. Write a spec in YAML, declare in the spec whether it should run **deterministic** or **live**, then `ccqa run` does the right thing per spec:
+- **Deterministic** (`mode: deterministic`, default): record once with `ccqa record`. Claude drives the browser, ccqa compiles every action into a `test.spec.ts` you can replay in CI under vitest — no LLM at run time. Cheapest and most stable.
+- **Live** (`mode: live`): no codegen. `ccqa run` sends each step to Claude every time, Claude drives `agent-browser` directly, judges pass/fail against the step's `expected`, and saves a before/after screenshot. More flexible for fragile UIs.
+A single project mixes both: each spec.yaml picks its own mode, and `ccqa run` reads the field and dispatches. The HTML report covers both in one page.
+No extra API key. Just `claude`.
 [日本語版 README](./docs/README.ja.md)
@@ -10,12 +17,15 @@ ccqa turns Claude Code into a browser test recorder. Write a spec in YAML, run `
 ```mermaid
 flowchart LR
-    A["Write spec\n(spec.yaml)"] --> B["ccqa trace\n(Claude drives browser)"]
-    B --> C["ccqa generate\n(LLM → test script)"]
-    C --> D["ccqa run\n(deterministic replay)"]
+    A["Write spec\n(spec.yaml + mode:)"] --> B{mode}
+    B -- deterministic --> C["ccqa record\n(Claude → test.spec.ts)"]
+    C --> D["ccqa run\n(vitest replay, no LLM)"]
+    B -- live --> E["ccqa run\n(Claude drives every time,\nper-step pass/fail)"]
 ```
-`trace` invokes Claude Code with your spec. Claude drives the browser step by step, recording every action as structured data. `generate` compiles that data into a vitest-compatible script. `run` replays it deterministically — no LLM involved.
+For deterministic specs, `record` invokes Claude Code with your spec, Claude drives the browser step by step, every action is recorded, and a vitest-compatible script is generated. `run` then replays it without involving an LLM.
+For live specs, `record` is not needed. `run` directly sends each step to Claude, which drives the browser through `agent-browser`, judges whether the step's `expected` clause holds, and writes a PNG before and after each step. Useful when codegen is fragile (timing-dependent UIs, rich-text editors, dynamic selectors).
 ## Install
@@ -27,11 +37,12 @@ Requires Node.js **20+**. [agent-browser](https://github.com/vercel-labs/agent-b
 ## Quick start
-**1. Write a spec** — by hand, or interactively with [`ccqa draft`](./docs/draft.md)
+**1. Write a spec** — by hand, or interactively with [`ccqa draft`](./docs/draft.md). Declare the mode in the spec itself.
 ```yaml
 # .ccqa/features/tasks/test-cases/create-and-complete/spec.yaml
 title: Create a task and mark it complete
+mode: deterministic   # or: live. Omit for deterministic (the default).
 steps:
   - instruction: |
@@ -45,28 +56,26 @@ steps:
 URLs live inside `instruction` strings — either verbatim or via `${ENV_VAR}` references for environment-specific values.
-**2. Trace** — Claude drives the browser and records every action
+**2a. For `mode: deterministic` — record once, then replay**
 ```bash
-ccqa trace tasks/create-and-complete
+ccqa record tasks/create-and-complete   # Claude drives the browser; generates test.spec.ts
+ccqa run tasks/create-and-complete      # vitest replays test.spec.ts; no LLM
 ```
-**3. Generate** — convert recorded actions into a replayable test
+**2b. For `mode: live` — skip codegen, run directly**
 ```bash
-ccqa generate tasks/create-and-complete
+ccqa run tasks/create-and-complete      # Claude drives the browser every time
 ```
-**4. Run** — replay deterministically, no LLM involved
-```bash
-ccqa run tasks/create-and-complete
-```
+By default deterministic runs write step-boundary screenshots and metadata to `ccqa-report/evidence/<feature>/<spec>/` so a reviewer can confirm a passing spec actually reached the states its `expected` clauses describe. Disable with `--no-evidence`.
-In CI you can opt in to drift analysis on test failures by passing `--drift` — Claude will explain the failure by comparing the spec against the current codebase. Requires `ANTHROPIC_API_KEY` or a local Claude login.
+In CI you can opt in to an HTML run report by passing `--report` — every failing spec gets a drift audit plus a root-cause call (TEST_DRIFT / SPEC_CHANGE / PRODUCT_BUG) using the branch's git diff as context, and the report lets a human grade those calls to measure their accuracy. Requires `ANTHROPIC_API_KEY` or a local Claude login for the analysis part. Opt out with `--no-failure-analysis` (which also implicitly skips the drift audit — the audit is rendered as evidence under the classification, so without the classification the cost has nowhere to land). Use `--no-drift-audit` to keep the classification but skip the audit. See [Run report](./docs/report.md).
 ```bash
-ccqa run tasks/create-and-complete --drift --format github
+ccqa run tasks/create-and-complete --report --base origin/main
+ccqa run --changed --report                    # only specs whose relatedPaths touch the diff
 ```
 ## Features
@@ -78,20 +87,35 @@ ccqa run tasks/create-and-complete --drift --format github
 | Assertion helper functions | [Assertions](./docs/assertions.md) |
 | Auto-fix failing tests | [Auto-fix](./docs/auto-fix.md) |
 | Detect spec/code drift in CI | [Drift](./docs/drift.md) |
+| HTML run report with failure root-cause calls | [Run report](./docs/report.md) |
 | Inventory existing test coverage | [Perspectives](./docs/perspectives.md) |
+| Architecture decision records (why it is built this way) | [ADR](./docs/adr/README.md) |
 ## Commands
 ```
 ccqa draft [feature/spec]          Co-author a test spec with Claude
-ccqa trace <feature/spec>          Record browser actions for a spec (inlines any included blocks)
-ccqa generate <feature/spec>       Generate test script from recorded actions
-ccqa run [feature/spec]            Execute generated test scripts (add --drift to analyze failures)
-ccqa drift [feature/spec]          Standalone spec ↔ codebase drift audit (for scheduled jobs)
 ccqa perspectives                  Inventory existing test coverage into .ccqa/perspectives.yaml
+ccqa record <feature/spec>         (deterministic specs only) Trace browser actions + generate test.spec.ts
+ccqa run [feature/spec]            Execute specs. Per spec, the spec.yaml `mode:` field selects deterministic
+                                   (vitest replay) or live (Claude drives every time). One run can mix both;
+                                   `--report` writes one unified HTML.
+ccqa drift [feature/spec]          Standalone spec ↔ codebase static audit (for PR checks)
 ```
-All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus` | `haiku`, or a full model ID). The flag overrides `CCQA_MODEL`; when both are unset, the Claude Code CLI default is used. They also accept `--language <bcp47>` (e.g. `ja`, `en`) to set the language of human-readable output; the default `auto` follows the language of the spec/codebase. Interactive commands authenticate via your local Claude Code login; commands that talk to Claude in CI (`ccqa run --drift`, `ccqa drift`) additionally honor `ANTHROPIC_API_KEY`.
+`ccqa run` flags:
+- `--report [dir]` — write a self-contained HTML run report (default dir: `ccqa-report/`)
+- `--changed` — restrict execution to specs whose `relatedPaths` intersect `git diff <base>...HEAD`. Mutually exclusive with an explicit spec id.
+- `--base <ref>` — base ref for the git diff (default: `$GITHUB_BASE_REF`, then `origin/main`)
+- `--no-failure-analysis` — skip the per-failure root-cause classification (also skips the drift audit, since the audit only shows under the classification)
+- `--no-drift-audit` — skip the spec ↔ code drift audit while keeping the classification
+- `--no-evidence` — (deterministic specs only) skip step-boundary PNG capture
+- `--retry <n>` — (live specs only) retry each failing step up to N more times
+- `--format <fmt>` — `text` (default), `json` (report.json), `github` (Actions annotations)
+- `--out <dir>` — (live specs only, single-spec invocations) override the per-run artifact directory
+All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus` | `haiku`, or a full model ID). The flag overrides `CCQA_MODEL`; when both are unset, the Claude Code CLI default is used. They also accept `--language <bcp47>` (e.g. `ja`, `en`) to set the language of human-readable output; the default `auto` follows the language of the spec/codebase. `--cwd <path>` works on `record` / `run` / `drift` so you can target a subpackage inside a monorepo from the repo root. Interactive commands authenticate via your local Claude Code login; commands that talk to Claude in CI (`ccqa run --report`, `ccqa drift`) additionally honor `ANTHROPIC_API_KEY`.
 `<feature/spec>` is a 2-segment alias for the on-disk path `.ccqa/features/<feature>/test-cases/<spec>/`.
@@ -101,6 +125,9 @@ All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus`
 .ccqa/
   perspectives.yaml              # Inventory of existing coverage (machine-readable, canonical)
   perspectives.md                # Category index, regenerated from the YAML
+  prompts/
+    trace.user.md                # Project-specific guidance appended to `ccqa record` (trace phase)
+    run-nd.user.md               # Project-specific guidance appended to `ccqa run` (live specs)
   blocks/
     login/
       spec.yaml                  # Reusable block (params + steps)
@@ -109,11 +136,63 @@ All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus`
       perspectives.md            # Per-category detail tables (one per case)
       test-cases/
         create-and-complete/
-          spec.yaml              # Test definition
-          actions.json           # Recorded actions from trace
-          test.spec.ts           # Generated test script
+          spec.yaml              # Test definition, with `mode: deterministic | live`
+          actions.json           # (deterministic only) Recorded actions from `ccqa record`
+          test.spec.ts           # (deterministic only) Generated vitest script
+          runs/
+            2026-06-14T10-00-00-000Z/  # (live only) one `ccqa run` invocation
+              run.json                  # Machine-readable summary
+              run.md                    # Human-readable per-step log
+              steps/
+                step-01.before.png      # Before-step screenshot
+                step-01.after.png       # After-step screenshot
+                step-01.log.txt         # Claude's full transcript for the step
 ```
+Add `.ccqa/features/*/test-cases/*/runs/` to `.gitignore` — these are per-run artefacts that should not be committed. Likewise `ccqa-report*/`.
+## Live specs (`mode: live`)
+For specs declared `mode: live` in their spec.yaml, `ccqa run` skips codegen entirely: Claude executes each spec step against `agent-browser` directly, judges whether the step's `expected` outcome holds, and saves a PNG screenshot before and after every step. Use this mode when:
+- you want to validate a spec but don't yet need a replayable, recorded test
+- the codegen output for a spec is fragile (heavily timing-dependent UIs, rich-text editors, dynamic selectors)
+- you want a visual audit trail of what the page looked like at every step
+```bash
+# Run a single live spec
+ccqa run tasks/create-and-complete
+# Run every spec under a feature (mixes deterministic + live as declared)
+ccqa run tasks
+# Run every spec in the project, into a unified HTML report
+ccqa run --report
+# Retry each failing step up to 2 more times (live specs only)
+ccqa run --retry 2 tasks/create-and-complete
+```
+Constraints on selectors / `agent-browser` subcommands that apply during `ccqa record` (no `eval`, no `@ref`, no bare-tag positional `find`, no chained agent-browser calls) are **relaxed** for live specs — Claude can use any subcommand and any selector style because there is no replay contract to honour.
+### Per-project guidance (`.ccqa/prompts/run-nd.user.md`)
+ccqa's live-mode system prompt is deliberately product-agnostic. Anything specific to **your** project — staging URLs, login flow quirks, rich-editor types, common access-denied wording — belongs in `.ccqa/prompts/run-nd.user.md`. The file is read once per invocation and appended to the system prompt under a "Project-specific guidance" heading.
+Keep it short. A page or two of focused notes beats a long handbook — Claude has the spec's `expected` text to work from, the file is for the *non-obvious* product knowledge that isn't in any single spec. Examples of what's useful here:
+- "the rich text editor is `[contenteditable='true']` — use `fill`, not keystrokes"
+- "login redirects through an IDP service-selection screen; you can skip it by opening the destination URL directly"
+- "access-denied is signalled by a specific in-app message string — name it here so the model asserts on it"
+Examples of what does **not** belong:
+- per-spec details (those belong in the spec's `instruction` / `expected`)
+- restating the STEP_RESULT contract (already in the system prompt)
+- copy-pasted style guidelines from `trace.user.md` (the relaxed-constraint mode doesn't need them)
+The file is capped at 32 KiB; anything beyond that is truncated with a warning.
 ## License
 MIT