ccqa 0.7.0 → 0.8.1

package/README.md CHANGED
@@ -2,7 +2,14 @@
 
  **Your Claude subscription already includes a QA engineer.**
 
- ccqa turns Claude Code into a browser test recorder. Write a spec in YAML, run `ccqa trace`, and Claude drives your app via [agent-browser](https://github.com/vercel-labs/agent-browser). Every action is recorded and compiled into a deterministic test script you can run in CI. No extra API key. Just `claude`.
+ ccqa turns Claude Code into a browser test recorder. Write a spec in YAML, declare in the spec whether it should run **deterministic** or **live**, then `ccqa run` does the right thing per spec:
+
+ - **Deterministic** (`mode: deterministic`, default): record once with `ccqa record`. Claude drives the browser, ccqa compiles every action into a `test.spec.ts` you can replay in CI under vitest — no LLM at run time. Cheapest and most stable.
+ - **Live** (`mode: live`): no codegen. `ccqa run` sends each step to Claude every time, Claude drives `agent-browser` directly, judges pass/fail against the step's `expected`, and saves a before/after screenshot. More flexible for fragile UIs.
+
+ A single project mixes both: each spec.yaml picks its own mode, and `ccqa run` reads the field and dispatches. The HTML report covers both in one page.
+
+ No extra API key. Just `claude`.
 
  [日本語版 README](./docs/README.ja.md)
 
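The per-spec dispatch described in this hunk can be pictured with a small sketch. It is illustrative only and not ccqa's actual code: the `Spec` shape and the `resolveMode` and `planRun` helpers are hypothetical names, and throwing on an unknown mode is an assumption, since the README does not say how an invalid `mode:` value is handled. The only behaviour taken from the README is that an omitted `mode` means deterministic and that one run can mix both modes.

```typescript
// Hypothetical types and helpers for illustration; not ccqa's internals.
type Mode = "deterministic" | "live";

interface Spec {
  title: string;
  mode?: string; // raw value of the optional `mode:` field in spec.yaml
}

// An omitted mode falls back to "deterministic", the documented default.
function resolveMode(spec: Spec): Mode {
  const mode = spec.mode ?? "deterministic";
  if (mode === "deterministic" || mode === "live") {
    return mode;
  }
  // Assumption: unknown values are rejected rather than silently defaulted.
  throw new Error(`Unknown mode "${mode}" in spec "${spec.title}"`);
}

// Group spec titles by the runner that would handle them in a mixed run.
function planRun(specs: Spec[]): Record<Mode, string[]> {
  const plan: Record<Mode, string[]> = { deterministic: [], live: [] };
  for (const spec of specs) {
    plan[resolveMode(spec)].push(spec.title);
  }
  return plan;
}
```

In this sketch, the `deterministic` bucket would go to the vitest replay and the `live` bucket to the Claude-driven runner.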
@@ -10,12 +17,15 @@ ccqa turns Claude Code into a browser test recorder. Write a spec in YAML, run `
 
  ```mermaid
  flowchart LR
- A["Write spec\n(spec.yaml)"] --> B["ccqa trace\n(Claude drives browser)"]
- B --> C["ccqa generate\n(LLM → test script)"]
- C --> D["ccqa run\n(deterministic replay)"]
+ A["Write spec\n(spec.yaml + mode:)"] --> B{mode}
+ B -- deterministic --> C["ccqa record\n(Claude → test.spec.ts)"]
+ C --> D["ccqa run\n(vitest replay, no LLM)"]
+ B -- live --> E["ccqa run\n(Claude drives every time,\nper-step pass/fail)"]
  ```
 
- `trace` invokes Claude Code with your spec. Claude drives the browser step by step, recording every action as structured data. `generate` compiles that data into a vitest-compatible script. `run` replays it deterministically, no LLM involved.
+ For deterministic specs, `record` invokes Claude Code with your spec: Claude drives the browser step by step, every action is recorded, and a vitest-compatible script is generated. `run` then replays it without involving an LLM.
+
+ For live specs, `record` is not needed. `run` directly sends each step to Claude, which drives the browser through `agent-browser`, judges whether the step's `expected` clause holds, and writes a PNG before and after each step. Useful when codegen is fragile (timing-dependent UIs, rich-text editors, dynamic selectors).
 
  ## Install
 
@@ -27,11 +37,12 @@ Requires Node.js **20+**. [agent-browser](https://github.com/vercel-labs/agent-b
 
  ## Quick start
 
- **1. Write a spec** — by hand, or interactively with [`ccqa draft`](./docs/draft.md)
+ **1. Write a spec** — by hand, or interactively with [`ccqa draft`](./docs/draft.md). Declare the mode in the spec itself.
 
  ```yaml
  # .ccqa/features/tasks/test-cases/create-and-complete/spec.yaml
  title: Create a task and mark it complete
+ mode: deterministic # or: live. Omit for deterministic (the default).
 
  steps:
  - instruction: |
@@ -45,28 +56,26 @@ steps:
 
  URLs live inside `instruction` strings — either verbatim or via `${ENV_VAR}` references for environment-specific values.
 
- **2. Trace** Claude drives the browser and records every action
+ **2a. For `mode: deterministic`, record once, then replay**
 
  ```bash
- ccqa trace tasks/create-and-complete
+ ccqa record tasks/create-and-complete   # Claude drives the browser; generates test.spec.ts
+ ccqa run tasks/create-and-complete      # vitest replays test.spec.ts; no LLM
  ```
 
- **3. Generate** convert recorded actions into a replayable test
+ **2b. For `mode: live`, skip codegen and run directly**
 
  ```bash
- ccqa generate tasks/create-and-complete
+ ccqa run tasks/create-and-complete      # Claude drives the browser every time
  ```
 
- **4. Run** replay deterministically, no LLM involved
-
- ```bash
- ccqa run tasks/create-and-complete
- ```
+ By default, deterministic runs write step-boundary screenshots and metadata to `ccqa-report/evidence/<feature>/<spec>/` so a reviewer can confirm a passing spec actually reached the states its `expected` clauses describe. Disable with `--no-evidence`.
 
- In CI you can opt in to an HTML run report by passing `--drift-report` — every failing spec gets a drift audit plus a root-cause call (TEST_DRIFT / SPEC_CHANGE / PRODUCT_BUG) using the PR diff as context, and the report lets a human grade those calls to measure their accuracy. Requires `ANTHROPIC_API_KEY` or a local Claude login for the analysis part. See [Run report](./docs/report.md).
+ In CI you can opt in to an HTML run report by passing `--report` — every failing spec gets a drift audit plus a root-cause call (TEST_DRIFT / SPEC_CHANGE / PRODUCT_BUG) using the branch's git diff as context, and the report lets a human grade those calls to measure their accuracy. Requires `ANTHROPIC_API_KEY` or a local Claude login for the analysis part. Opt out with `--no-failure-analysis` (which also implicitly skips the drift audit — the audit is rendered as evidence under the classification, so without the classification its output has nowhere to appear and its cost would be wasted). Use `--no-drift-audit` to keep the classification but skip the audit. See [Run report](./docs/report.md).
 
  ```bash
- ccqa run tasks/create-and-complete --drift-report --drift-base origin/main
+ ccqa run tasks/create-and-complete --report --base origin/main
+ ccqa run --changed --report   # only specs whose relatedPaths touch the diff
  ```
 
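The `--changed` selection mentioned above (only specs whose `relatedPaths` touch the diff) can be sketched as a simple intersection. This is a guess at the general shape, not ccqa's implementation: `SpecEntry`, `isUnder`, and `selectChanged` are hypothetical names, and treating each `relatedPaths` entry as a file or directory prefix is an assumption, since the README does not say whether globs are supported. The list of changed files is assumed to come from something like `git diff --name-only <base>...HEAD`.

```typescript
// Hypothetical shapes for illustration; the real spec schema may differ.
interface SpecEntry {
  id: string; // e.g. "tasks/create-and-complete"
  relatedPaths: string[]; // assumed to be file or directory prefixes, not globs
}

// True when `file` is the related path itself or lives underneath it.
function isUnder(file: string, related: string): boolean {
  const dir = related.endsWith("/") ? related : related + "/";
  return file === related || file.startsWith(dir);
}

// Keep only the specs with at least one related path touched by the diff.
function selectChanged(specs: SpecEntry[], changedFiles: string[]): string[] {
  return specs
    .filter((spec) =>
      spec.relatedPaths.some((related) =>
        changedFiles.some((file) => isUnder(file, related)),
      ),
    )
    .map((spec) => spec.id);
}
```

Matching on a trailing slash keeps `src/tasks` from also matching an unrelated sibling such as `src/tasks-archive/`.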
  ## Features
@@ -86,14 +95,27 @@ ccqa run tasks/create-and-complete --drift-report --drift-base origin/main
 
  ```
  ccqa draft [feature/spec]      Co-author a test spec with Claude
- ccqa trace <feature/spec>      Record browser actions for a spec (inlines any included blocks)
- ccqa generate <feature/spec>   Generate test script from recorded actions
- ccqa run [feature/spec]        Execute generated test scripts (add --drift-report for an HTML report with failure analysis)
- ccqa drift [feature/spec]      Standalone spec ↔ codebase drift audit (for scheduled jobs)
  ccqa perspectives              Inventory existing test coverage into .ccqa/perspectives.yaml
+ ccqa record <feature/spec>     (deterministic specs only) Trace browser actions + generate test.spec.ts
+ ccqa run [feature/spec]        Execute specs. Per spec, the spec.yaml `mode:` field selects deterministic
+                                (vitest replay) or live (Claude drives every time). One run can mix both;
+                                `--report` writes one unified HTML.
+ ccqa drift [feature/spec]      Standalone spec ↔ codebase static audit (for PR checks)
  ```
 
- All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus` | `haiku`, or a full model ID). The flag overrides `CCQA_MODEL`; when both are unset, the Claude Code CLI default is used. They also accept `--language <bcp47>` (e.g. `ja`, `en`) to set the language of human-readable output; the default `auto` follows the language of the spec/codebase. Interactive commands authenticate via your local Claude Code login; commands that talk to Claude in CI (`ccqa run --drift-report`, `ccqa drift`) additionally honor `ANTHROPIC_API_KEY`.
+ `ccqa run` flags:
+
+ - `--report [dir]` — write a self-contained HTML run report (default dir: `ccqa-report/`)
+ - `--changed` — restrict execution to specs whose `relatedPaths` intersect `git diff <base>...HEAD`. Mutually exclusive with an explicit spec id.
+ - `--base <ref>` — base ref for the git diff (default: `$GITHUB_BASE_REF`, then `origin/main`)
+ - `--no-failure-analysis` — skip the per-failure root-cause classification (also skips the drift audit, since the audit only shows under the classification)
+ - `--no-drift-audit` — skip the spec ↔ code drift audit while keeping the classification
+ - `--no-evidence` — (deterministic specs only) skip step-boundary PNG capture
+ - `--retry <n>` — (live specs only) retry each failing step up to `<n>` more times
+ - `--format <fmt>` — `text` (default), `json` (report.json), `github` (Actions annotations)
+ - `--out <dir>` — (live specs only, single-spec invocations) override the per-run artifact directory
+
+ All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus` | `haiku`, or a full model ID). The flag overrides `CCQA_MODEL`; when both are unset, the Claude Code CLI default is used. They also accept `--language <bcp47>` (e.g. `ja`, `en`) to set the language of human-readable output; the default `auto` follows the language of the spec/codebase. `--cwd <path>` works on `record` / `run` / `drift` so you can target a subpackage inside a monorepo from the repo root. Interactive commands authenticate via your local Claude Code login; commands that talk to Claude in CI (`ccqa run --report`, `ccqa drift`) additionally honor `ANTHROPIC_API_KEY`.
 
  `<feature/spec>` is a 2-segment alias for the on-disk path `.ccqa/features/<feature>/test-cases/<spec>/`.
 
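Two of the rules in this hunk are mechanical enough to sketch: the `<feature/spec>` alias expanding to an on-disk path, and the fallback order for `--base`. The helper names `specDir` and `resolveBase` are hypothetical and this is not ccqa's code. Rejecting aliases that are not exactly two segments and treating an empty `GITHUB_BASE_REF` as unset are both assumptions. The sketch also returns `GITHUB_BASE_REF` unchanged, although the real tool may normalise it (for example by prefixing a remote name).

```typescript
// Expand a two-segment alias into the documented on-disk directory.
function specDir(alias: string): string {
  const parts = alias.split("/");
  // Assumption: anything other than exactly two non-empty segments is an error.
  if (parts.length !== 2 || parts.some((part) => part.length === 0)) {
    throw new Error(`Expected <feature>/<spec>, got "${alias}"`);
  }
  const [feature, spec] = parts;
  return `.ccqa/features/${feature}/test-cases/${spec}/`;
}

// Documented order: explicit --base flag, then $GITHUB_BASE_REF, then origin/main.
function resolveBase(
  flag: string | undefined,
  env: Record<string, string | undefined>,
): string {
  // `||` (not `??`) so that empty strings fall through to the next candidate.
  return flag || env["GITHUB_BASE_REF"] || "origin/main";
}
```

For example, `specDir("tasks/create-and-complete")` yields `.ccqa/features/tasks/test-cases/create-and-complete/`, matching the path in the Quick start spec comment.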
@@ -103,6 +125,9 @@ All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus`
  .ccqa/
    perspectives.yaml  # Inventory of existing coverage (machine-readable, canonical)
    perspectives.md  # Category index, regenerated from the YAML
+   prompts/
+     trace.user.md  # Project-specific guidance appended to `ccqa record` (trace phase)
+     run-nd.user.md  # Project-specific guidance appended to `ccqa run` (live specs)
    blocks/
      login/
        spec.yaml  # Reusable block (params + steps)
@@ -111,11 +136,63 @@ All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus`
        perspectives.md  # Per-category detail tables (one per case)
        test-cases/
          create-and-complete/
-           spec.yaml  # Test definition
-           actions.json  # Recorded actions from trace
-           test.spec.ts  # Generated test script
+           spec.yaml  # Test definition, with `mode: deterministic | live`
+           actions.json  # (deterministic only) Recorded actions from `ccqa record`
+           test.spec.ts  # (deterministic only) Generated vitest script
+           runs/
+             2026-06-14T10-00-00-000Z/  # (live only) one `ccqa run` invocation
+               run.json  # Machine-readable summary
+               run.md  # Human-readable per-step log
+               steps/
+                 step-01.before.png  # Before-step screenshot
+                 step-01.after.png  # After-step screenshot
+                 step-01.log.txt  # Claude's full transcript for the step
  ```
 
+ Add `.ccqa/features/*/test-cases/*/runs/` to `.gitignore` — these are per-run artefacts that should not be committed. Likewise `ccqa-report*/`.
+
+ ## Live specs (`mode: live`)
+
+ For specs declared `mode: live` in their spec.yaml, `ccqa run` skips codegen entirely: Claude executes each spec step against `agent-browser` directly, judges whether the step's `expected` outcome holds, and saves a PNG screenshot before and after every step. Use this mode when:
+
+ - you want to validate a spec but don't yet need a replayable, recorded test
+ - the codegen output for a spec is fragile (heavily timing-dependent UIs, rich-text editors, dynamic selectors)
+ - you want a visual audit trail of what the page looked like at every step
+
+ ```bash
+ # Run a single live spec
+ ccqa run tasks/create-and-complete
+
+ # Run every spec under a feature (mixes deterministic + live as declared)
+ ccqa run tasks
+
+ # Run every spec in the project, into a unified HTML report
+ ccqa run --report
+
+ # Retry each failing step up to 2 more times (live specs only)
+ ccqa run --retry 2 tasks/create-and-complete
+ ```
+
+ Constraints on selectors / `agent-browser` subcommands that apply during `ccqa record` (no `eval`, no `@ref`, no bare-tag positional `find`, no chained agent-browser calls) are **relaxed** for live specs — Claude can use any subcommand and any selector style because there is no replay contract to honour.
+
+ ### Per-project guidance (`.ccqa/prompts/run-nd.user.md`)
+
+ ccqa's live-mode system prompt is deliberately product-agnostic. Anything specific to **your** project — staging URLs, login flow quirks, rich-editor types, common access-denied wording — belongs in `.ccqa/prompts/run-nd.user.md`. The file is read once per invocation and appended to the system prompt under a "Project-specific guidance" heading.
+
+ Keep it short. A page or two of focused notes beats a long handbook — Claude has the spec's `expected` text to work from; the file is for the *non-obvious* product knowledge that isn't in any single spec. Examples of what's useful here:
+
+ - "the rich text editor is `[contenteditable='true']` — use `fill`, not keystrokes"
+ - "login redirects through an IDP service-selection screen; you can skip it by opening the destination URL directly"
+ - "access-denied is signalled by a specific in-app message string — name it here so the model asserts on it"
+
+ Examples of what does **not** belong:
+
+ - per-spec details (those belong in the spec's `instruction` / `expected`)
+ - restating the STEP_RESULT contract (already in the system prompt)
+ - copy-pasted style guidelines from `trace.user.md` (the relaxed-constraint mode doesn't need them)
+
+ The file is capped at 32 KiB; anything beyond that is truncated with a warning.
+
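The 32 KiB cap and the append step described above might look roughly like the following. This is a sketch under stated assumptions rather than ccqa's implementation: `capGuidance` and `appendGuidance` are hypothetical names, the cap is assumed to be measured in UTF-8 bytes, and the exact Markdown used for the "Project-specific guidance" heading is a guess. Reading the file from disk and printing the warning are left to the caller.

```typescript
const MAX_GUIDANCE_BYTES = 32 * 1024; // the 32 KiB cap stated in the README

interface GuidanceResult {
  text: string;
  truncated: boolean; // caller would print a warning when this is true
}

// Cut the guidance to at most `maxBytes` UTF-8 bytes without splitting a character.
function capGuidance(raw: string, maxBytes: number = MAX_GUIDANCE_BYTES): GuidanceResult {
  const bytes = new TextEncoder().encode(raw);
  if (bytes.length <= maxBytes) {
    return { text: raw, truncated: false };
  }
  // Back up while the first excluded byte is a UTF-8 continuation byte (10xxxxxx).
  let end = maxBytes;
  while (end > 0 && ((bytes[end] ?? 0) & 0xc0) === 0x80) {
    end--;
  }
  return { text: new TextDecoder().decode(bytes.subarray(0, end)), truncated: true };
}

// Assumption: the heading is rendered as a level-2 Markdown heading.
function appendGuidance(systemPrompt: string, guidance: string): string {
  return `${systemPrompt}\n\n## Project-specific guidance\n\n${guidance}`;
}
```

Backing up to a character boundary means the truncated text can be slightly shorter than the cap, but it never ends in a partial multi-byte character.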
  ## License
 
  MIT