ccqa 0.3.9 → 0.4.0

package/README.md CHANGED
@@ -2,375 +2,113 @@
2
2
 
3
3
  **Your Claude subscription already includes a QA engineer.**
4
4
 
5
- ccqa turns Claude Code into a browser test recorder.
5
+ ccqa turns Claude Code into a browser test recorder. Write a spec in YAML, run `ccqa trace`, and Claude drives your app via [agent-browser](https://github.com/vercel-labs/agent-browser). Every action is recorded and compiled into a deterministic test script you can run in CI. No extra API key. Just `claude`.
6
6
 
7
- Write a spec in Markdown, run `ccqa trace`, and Claude drives your app via [agent-browser](https://github.com/vercel-labs/agent-browser) — a lightweight headless browser CLI that runs anywhere without a browser driver or Playwright setup. Because the agent controls the browser through a simple CLI interface, it can handle login flows, intermediate screens, and dynamic UI the same way a human would.
8
-
9
- Every action is recorded as structured data and compiled into a deterministic test script you can run in CI. No extra API key. Just `claude`.
7
+ [日本語版 README](./docs/README.ja.md)
10
8
 
11
9
  ## How it works
12
10
 
13
11
  ```mermaid
14
12
  flowchart LR
15
- A["Write spec\n(test-spec.md)"] --> B["ccqa trace\n(Claude drives browser)"]
13
+ A["Write spec\n(spec.yaml)"] --> B["ccqa trace\n(Claude drives browser)"]
16
14
  B --> C["ccqa generate\n(LLM → test script)"]
17
15
  C --> D["ccqa run\n(deterministic replay)"]
18
16
  ```
19
17
 
20
- `trace` invokes Claude Code with your spec. Claude drives the browser step by step via [agent-browser](https://github.com/vercel-labs/agent-browser), recording every action as structured data. `generate` compiles that data into a vitest-compatible script. `run` replays it deterministically — no LLM involved.
18
+ `trace` invokes Claude Code with your spec. Claude drives the browser step by step, recording every action as structured data. `generate` compiles that data into a vitest-compatible script. `run` replays it deterministically — no LLM involved.
21
19
 
22
20
  ## Install
23
21
 
24
- Add ccqa as a dev dependency in your project:
25
-
26
- ```bash
27
- pnpm add -D ccqa vitest
28
- # or
29
- npm install -D ccqa vitest
30
- ```
31
-
32
- Then invoke the CLI via your package runner:
33
-
34
22
  ```bash
35
- pnpm exec ccqa trace tasks/create-and-complete
36
- # or
37
- npx ccqa trace tasks/create-and-complete
23
+ pnpm add -D ccqa vitest agent-browser
38
24
  ```
39
25
 
40
- ccqa requires Node.js **20+** at runtime. The peer dependency [agent-browser](https://github.com/vercel-labs/agent-browser) must also be installed:
26
+ Requires Node.js **20+**. [agent-browser](https://github.com/vercel-labs/agent-browser) is a peer dependency.
41
27
 
42
- ```bash
43
- pnpm add -D agent-browser
44
- ```
28
+ ## Quick start
45
29
 
46
- ## Usage
30
+ **1. Write a spec** — by hand, or interactively with [`ccqa draft`](./docs/draft.md)
47
31
 
48
- **1. Write a spec** — by hand, or interactively with [`ccqa draft`](#draft--co-author-test-specmd-with-claude)
49
-
50
- ```markdown
51
- <!-- .ccqa/features/tasks/test-cases/create-and-complete/test-spec.md -->
52
- ---
32
+ ```yaml
33
+ # .ccqa/features/tasks/test-cases/create-and-complete/spec.yaml
53
34
  title: Create a task and mark it complete
54
- baseUrl: http://localhost:3000
55
- ---
56
-
57
- ## Steps
58
35
 
59
- ### Step 1: Log in
60
- - **Instruction**: Fill in email and password, submit the form
61
- - **Expected**: Redirected to /dashboard, user avatar visible in the header
36
+ steps:
37
+ - instruction: |
38
+ Open ${APP_URL}/login. Fill in email and password, submit the form.
39
+ expected: Redirected to /dashboard, user avatar visible in the header
62
40
 
63
- ### Step 2: Create a new task
64
- - **Instruction**: Click "New Task", fill in the title "Fix login bug", set priority to High, save
65
- - **Expected**: Task appears in the task list with status "Open"
66
-
67
- ### Step 3: Mark the task as complete
68
- - **Instruction**: Open the task "Fix login bug", click "Mark as complete"
69
- - **Expected**: Task status changes to "Done", task moves to the completed section
41
+ - instruction: |
42
+ Click "New Task", fill in the title "Fix login bug", set priority to High, save.
43
+ expected: Task appears in the task list with status "Open"
70
44
  ```
71
45
 
72
- **2. Trace: Claude drives the browser and records every action**
46
+ URLs live inside `instruction` strings, either verbatim or as `${ENV_VAR}` references for environment-specific values.
47
+
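The `${ENV_VAR}` convention described above can be pictured with a small sketch. The helper name `expandEnvRefs` and the choice to throw on an unset variable are assumptions made for illustration; the README does not say how ccqa resolves these references internally.

```typescript
// Hypothetical helper, not part of ccqa: expands ${NAME} references in an
// instruction string using a plain environment map.
function expandEnvRefs(
  instruction: string,
  env: Record<string, string | undefined>,
): string {
  return instruction.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match, name: string) => {
      const value = env[name];
      if (value === undefined) {
        // Assumption: fail loudly. The README does not specify ccqa's behavior here.
        throw new Error(`Environment variable ${name} is not set`);
      }
      return value;
    },
  );
}

// Example: the login step from the spec above, with APP_URL supplied explicitly.
const expanded = expandEnvRefs("Open ${APP_URL}/login.", {
  APP_URL: "http://localhost:3000",
});
console.log(expanded);
```

Keeping the host in an environment variable lets the same spec point at a local server during development and at a staging host in CI.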
48
+ **2. Trace** — Claude drives the browser and records every action
73
49
 
74
50
  ```bash
75
51
  ccqa trace tasks/create-and-complete
76
52
  ```
77
53
 
78
- ```
79
- ▶ trace tasks/create-and-complete
80
- spec Create a task and mark it complete
81
- url http://localhost:3000
82
- steps 3
83
-
84
- Running agent-browser session...
85
- ● step-01 Log in
86
- ● step-02 Create a new task
87
- ● step-03 Mark the task as complete
88
-
89
- trace .ccqa/features/tasks/test-cases/create-and-complete/actions.json
90
- actions 24
91
- status PASSED
92
- ```
93
-
94
- **3. Generate — convert recorded actions into a replayable test**
54
+ **3. Generate** — convert recorded actions into a replayable test
95
55
 
96
56
  ```bash
97
57
  ccqa generate tasks/create-and-complete
98
58
  ```
99
59
 
100
- **4. Run — replay deterministically, no LLM involved**
60
+ **4. Run** — replay deterministically, no LLM involved
101
61
 
102
62
  ```bash
103
63
  ccqa run tasks/create-and-complete
104
64
  ```
105
65
 
106
- ## Draft: co-author test-spec.md with Claude
107
-
108
- Writing a `test-spec.md` from scratch means digging into your codebase to find the right aria-labels, URLs, and button text. `ccqa draft` puts Claude in the loop: you describe what you want to test in plain language, Claude reads the relevant code, and you refine the spec interactively.
66
+ In CI you can opt in to drift analysis on test failures by passing `--drift`. Claude will then explain the failure by comparing the spec against the current codebase. Requires `ANTHROPIC_API_KEY` or a local Claude login.
109
67
 
110
68
  ```bash
111
- ccqa draft
112
- ```
113
-
114
- The first run asks for your intent, proposes a `feature/spec` name, and writes a draft. Each subsequent invocation lets you give a refinement instruction — empty input means "just re-check the current spec against the code." Press `y` at the final "Are you done with this draft?" prompt to end the session.
115
-
116
- ```
117
- ccqa draft
118
-
119
- What do you want to test? > Select a category on the AI Maintenance page and run a check
120
- Proposing a feature/spec name based on your intent...
121
- proposed: ai-maintenance/run-check-with-category
122
- Use this name? [y/N/edit] > y
123
-
124
- Reading codebase and drafting spec...
125
- ✓ 5 Read, 3 Grep, 2 Glob (4.2s)
126
-
127
- ── Review (1 warning, 3 passed) ───────────────────────────────────
128
-
129
- WARNINGS (1)
130
- Assertability step-05
131
- Result row may still show "running" right after the click
132
- └ ContentQualityCheck.tsx polls every 5s; the status starts at
133
- IN_PROGRESS and only flips to SUCCEEDED later.
134
-
135
- PASSED (3)
136
- Setup references, Step granularity, Unimplemented checks
137
-
138
- ────────────────────────────────────────────────────────────────────
139
-
140
- --- proposed changes ---
141
- + ---
142
- + title: "AI Maintenance — content quality check"
143
- ...
144
-
145
- Apply this patch? [y/N] y
146
- saved: .ccqa/features/ai-maintenance/test-cases/run-check-with-category/test-spec.md
147
-
148
- How would you like to refine? (empty = re-validate) >
69
+ ccqa run tasks/create-and-complete --drift --format github
149
70
  ```
150
71
 
151
- You can also edit `test-spec.md` directly in your editor between turns — `ccqa draft` re-reads the file each iteration.
152
-
153
- ### What gets reviewed
72
+ ## Features
154
73
 
155
- Every turn Claude grades the spec on four axes and reports issues:
156
-
157
- | Check | What it verifies |
74
+ | Feature | Docs |
158
75
  |---|---|
159
- | **Assertability** | Each step's **Expected** references concrete, observable signals (visible text, URL pattern, element state) that actually exist in the code. Flags timestamps, exact counts, and session-specific values that won't be stable across runs. |
160
- | **Setup references** | Every `setups[].name` in the frontmatter resolves to an existing `.ccqa/setups/<name>/setup-spec.md`, and every `params` key matches that setup's `placeholders`. See [Setup Specs](#setup-specs--reusable-shared-procedures). |
161
- | **Step granularity** | Steps aren't too coarse (multiple actions in one) or too fine (snapshot-only filler), and the order is logical. |
162
- | **Unimplemented checks** | Anything the spec describes that Claude couldn't find in the codebase — a hint that you may be specifying behavior that doesn't exist yet. |
163
-
164
- Findings with severity `WARN` or `ERROR` are shown in full; `OK` checks collapse to a one-line summary.
165
-
166
- ### Flags
167
-
168
- ```
169
- ccqa draft [feature/spec] # arg is optional; Claude proposes a name if omitted
170
- --instruction <text> # single-shot, non-interactive
171
- --apply # auto-apply patches without [y/N] confirmation
172
- ```
173
-
174
- ## Setup Specs — Reusable shared procedures
175
-
176
- Setup specs let you define reusable procedures (login, data preparation, etc.) that run before your test steps. Define once, use across multiple test specs.
177
-
178
- ### 1. Write a setup spec
179
-
180
- ```markdown
181
- <!-- .ccqa/setups/login/setup-spec.md -->
182
- ---
183
- title: "Login"
184
- placeholders:
185
- loginUrl:
186
- dummy: "http://localhost:3000/login"
187
- description: "Login page URL"
188
- email:
189
- dummy: "user@example.com"
190
- description: "Email address"
191
- password:
192
- dummy: "secret"
193
- description: "Password"
194
- ---
195
-
196
- ## Steps
197
-
198
- ### Step 1: Open login page
199
- - **Instruction**: Navigate to {{loginUrl}}
200
- - **Expected**: Login form is displayed
201
-
202
- ### Step 2: Enter credentials and log in
203
- - **Instruction**: Enter email {{email}} and password {{password}}, then submit
204
- - **Expected**: Login succeeds
205
- ```
206
-
207
- The `placeholders` section defines variables with `dummy` values. During `trace-setup`, the dummy values are used for actual browser operation. During `generate-setup`, they are reverse-replaced with `{{key}}` placeholders.
208
-
209
- ### 2. Trace the setup
210
-
211
- ```bash
212
- ccqa trace-setup login
213
- ```
214
-
215
- ### 3. Generate and validate the setup
216
-
217
- ```bash
218
- ccqa generate-setup login
219
- ```
220
-
221
- This generates `test.dummy.spec.ts` with dummy values, runs vitest to validate, and applies auto-fix. On success, it reverse-replaces dummy values with placeholders and saves `test.spec.ts`.
222
-
223
- If auto-fix fails, edit `test.dummy.spec.ts` manually and re-run:
224
-
225
- ```bash
226
- ccqa generate-setup login --from-dummy
227
- ```
228
-
229
- ### 4. Reference from test specs
230
-
231
- ```markdown
232
- ---
233
- title: Create a task
234
- baseUrl: http://localhost:3000
235
- setups:
236
- - name: login
237
- params:
238
- loginUrl: "http://localhost:3000/login"
239
- email: "admin@example.com"
240
- password: "AdminPass123"
241
- ---
242
-
243
- ## Steps
244
- ### Step 1: Create a new task
245
- ...
246
- ```
247
-
248
- When you run `ccqa trace` or `ccqa generate`, the setup's test body is loaded, placeholders are replaced with `params` values, and it runs before your test steps — sharing the same browser session.
249
-
250
- ## What gets generated
251
-
252
- `ab()` is a thin wrapper around [agent-browser](https://github.com/vercel-labs/agent-browser) — a headless browser CLI. Each call spawns `agent-browser <command>` as a subprocess and throws if it exits non-zero. No browser driver setup, no async/await, no `.waitFor()`.
253
-
254
- ```typescript
255
- // .ccqa/features/tasks/test-cases/create-and-complete/test.spec.ts
256
- import { test } from "vitest";
257
- import { ab, abWait, abAssertUrl, abAssertTextVisible, abAssertEnabled } from "ccqa/test-helpers";
258
-
259
- process.env.AGENT_BROWSER_SESSION = `ccqa-run-${Date.now()}`;
260
-
261
- test("setup: login", () => {
262
- ab("cookies", "clear");
263
- ab("open", "http://localhost:3000/login");
264
- ab("fill", "[placeholder='Email']", "admin@example.com");
265
- ab("fill", "[type='password']", "AdminPass123");
266
- ab("press", "Enter");
267
- }, 3 * 60 * 1000);
268
-
269
- test("Create a task", () => {
270
- ab("open", "http://localhost:3000");
271
-
272
- // Create a new task
273
- ab("click", "[aria-label='New Task']");
274
- ab("fill", "[placeholder='Task title']", "Fix login bug");
275
- ab("select", "[aria-label='Priority']", "High");
276
- ab("click", "[aria-label='Save']");
277
- abAssertTextVisible("Fix login bug");
278
- abAssertTextVisible("Open");
279
- }, 5 * 60 * 1000);
280
- ```
281
-
282
- Setup and test share the same `AGENT_BROWSER_SESSION` — login state carries over. Each run starts with `cookies clear` to ensure a clean session.
283
-
284
- ## Assertions
285
-
286
- During `trace`, Claude verifies each step with at least two independent signals and emits structured assertions. These become typed helper calls in the generated script:
287
-
288
- | Assert | What it checks |
289
- |--------|---------------|
290
- | `abAssertTextVisible(text)` | Text appears on page (waits up to 30s) |
291
- | `abAssertUrl(pattern)` | Current URL contains pattern |
292
- | `abAssertEnabled(selector)` | Button/input is enabled |
293
- | `abAssertDisabled(selector)` | Button/input is disabled |
294
- | `abAssertVisible(selector)` | Element is visible |
295
- | `abAssertNotVisible(selector)` | Element is hidden |
296
- | `abAssertChecked(selector)` | Checkbox is checked |
297
- | `abAssertUnchecked(selector)` | Checkbox is unchecked |
298
-
299
- Assertions are stability-aware: Claude skips timestamps, session IDs, and exact counts that vary between runs.
300
-
301
- ## Auto-fix
302
-
303
- If the generated script fails, `generate` invokes an LLM to diagnose the failure and propose a fix. The diagnosis is one of:
304
-
305
- - **TIMING_ISSUE** — insert or extend `sleep` so the page has time to settle.
306
- - **OVER_ASSERTION** — remove `abAssert*` lines that the spec doesn't actually require.
307
- - **SELECTOR_DRIFT** — replace a renamed selector with the new one. The diagnose LLM is allowed to `Grep` / `Read` your repository (read-only) to find the actual `aria-label` / `placeholder` / `data-testid` / i18n string in the app source, so renames in the UI code are caught even when the failure log only says "selector not visible".
308
- - **DATA_MISSING** / **UNKNOWN** — not auto-fixable; the loop bails and reports the diagnosis.
309
-
310
- Each diagnosis has a `confidence` score. By default high-confidence fixes are applied automatically; low-confidence fixes drop into an interactive `[a]pply / [s]kip / [m]anual / [q]uit` prompt.
311
-
312
- ```bash
313
- ccqa generate tasks/create-and-complete # default: interactive on low confidence
314
- ccqa generate tasks/create-and-complete --auto # CI: always auto-apply
315
- ccqa generate tasks/create-and-complete --no-interactive # CI: auto-apply on high confidence, give up otherwise
316
- ccqa generate tasks/create-and-complete --max-retries 5
317
- ```
318
-
319
- > **Note**: `generate` regenerates `test.spec.ts` from `actions.json` on every run. Manual edits to `test.spec.ts` are lost on the next `generate`. When an existing `test.spec.ts` is detected, `generate` always asks for `y/N` confirmation before overwriting (even with `--auto` / `--no-interactive`). To skip the prompt in CI, pass `--force`. To persist a fix, re-run `trace` so `actions.json` reflects the new flow.
76
+ | Write specs interactively with Claude | [Draft](./docs/draft.md) |
77
+ | Reuse login and other shared step sequences | [Blocks](./docs/blocks.md) |
78
+ | Assertion helper functions | [Assertions](./docs/assertions.md) |
79
+ | Auto-fix failing tests | [Auto-fix](./docs/auto-fix.md) |
80
+ | Detect spec/code drift in CI | [Drift](./docs/drift.md) |
320
81
 
321
82
  ## Commands
322
83
 
323
84
  ```
324
85
  ccqa draft [feature/spec] Co-author a test spec with Claude
325
- --instruction <text> Single-shot, non-interactive
326
- --apply Auto-apply patches without [y/N] confirmation
327
-
328
- ccqa trace <feature/spec> Record browser actions for a test spec
86
+ ccqa trace <feature/spec> Record browser actions for a spec (inlines any included blocks)
329
87
  ccqa generate <feature/spec> Generate test script from recorded actions
330
- --auto Apply auto-fixes without confirmation (CI)
331
- --no-interactive Auto-apply only on high confidence; never prompt
332
- --force Overwrite an existing test.spec.ts without prompting
333
- --max-retries <n> Default: 3
334
- ccqa run [feature/spec] Execute generated test scripts
335
-
336
- ccqa trace-setup <name> Record browser actions for a setup spec
337
- ccqa generate-setup <name> Generate and validate setup test script
338
- --from-dummy Resume from manually edited test.dummy.spec.ts
339
- --auto / --no-interactive Same semantics as `generate`
88
+ ccqa run [feature/spec] Execute generated test scripts (add --drift to analyze failures)
89
+ ccqa drift [feature/spec] Standalone spec/codebase drift audit (for scheduled jobs)
340
90
  ```
341
91
 
342
- All Claude-driven commands (`trace`, `trace-setup`, `generate`, `generate-setup`) accept `-m, --model <name>` to select the Claude model: pass an alias (`sonnet` | `opus` | `haiku`) or a full model ID (e.g. `claude-opus-4-7`). The flag overrides the `CCQA_MODEL` environment variable; when both are unset, the Claude Code CLI default is used. Authentication is handled by your local Claude Code login; no `ANTHROPIC_API_KEY` is required.
92
+ All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus` | `haiku`, or a full model ID). The flag overrides `CCQA_MODEL`; when both are unset, the Claude Code CLI default is used. Interactive commands authenticate via your local Claude Code login; commands that talk to Claude in CI (`ccqa run --drift`, `ccqa drift`) additionally honor `ANTHROPIC_API_KEY`.
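
The precedence stated above (flag first, then `CCQA_MODEL`, then the Claude Code CLI default) can be sketched as follows. The function name `resolveModel` is hypothetical, and treating an empty string as unset is an assumption, not documented ccqa behavior.

```typescript
// Hypothetical sketch of the documented precedence, not ccqa's actual code.
// Returns undefined when neither source is set, meaning "let the Claude Code
// CLI pick its default model".
function resolveModel(
  flag: string | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  // Assumption: an empty string counts as "unset".
  if (flag !== undefined && flag !== "") {
    return flag; // -m / --model wins
  }
  const fromEnv = env["CCQA_MODEL"];
  if (fromEnv !== undefined && fromEnv !== "") {
    return fromEnv; // fall back to the environment variable
  }
  return undefined; // defer to the CLI default
}

console.log(resolveModel("opus", { CCQA_MODEL: "haiku" }));
console.log(resolveModel(undefined, { CCQA_MODEL: "haiku" }));
console.log(resolveModel(undefined, {}));
```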
343
93
 
344
- `<feature/spec>` is a 2-segment alias for the on-disk path `.ccqa/features/<feature>/test-cases/<spec>/`. Pass the alias, not the full directory path.
94
+ `<feature/spec>` is a 2-segment alias for the on-disk path `.ccqa/features/<feature>/test-cases/<spec>/`.
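
As an illustration of that mapping, here is a minimal sketch that turns an alias into the directory path given above. The function name `specDir` and the validation rules are assumptions; only the path template comes from the README.

```typescript
// Hypothetical helper, not part of ccqa: maps a "<feature>/<spec>" alias to
// the on-disk directory described in the README.
function specDir(alias: string): string {
  const parts = alias.split("/");
  // Assumption: exactly two non-empty segments are required.
  if (parts.length !== 2 || parts.some((part) => part.length === 0)) {
    throw new Error(`Expected "<feature>/<spec>", got "${alias}"`);
  }
  const [feature, spec] = parts;
  return `.ccqa/features/${feature}/test-cases/${spec}/`;
}

console.log(specDir("tasks/create-and-complete"));
```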
345
95
 
346
96
  ## File structure
347
97
 
348
98
  ```
349
99
  .ccqa/
350
- setups/
100
+ blocks/
351
101
  login/
352
- setup-spec.md # Setup definition with placeholders
353
- test.spec.ts # Generated setup script (with {{placeholders}})
102
+ spec.yaml # Reusable block (params + steps)
354
103
  features/
355
104
  tasks/
356
105
  test-cases/
357
106
  create-and-complete/
358
- test-spec.md # Test definition (references setups)
107
+ spec.yaml # Test definition
359
108
  actions.json # Recorded actions from trace
360
109
  test.spec.ts # Generated test script
361
110
  ```
362
111
 
363
- ## Why not write Playwright tests by hand?
364
-
365
- | | ccqa | Hand-written Playwright |
366
- |---|---|---|
367
- | Write selectors | Claude picks them from ARIA snapshots | You inspect the DOM |
368
- | Handle timing | Recorded wait commands, auto-fix sleep | `waitFor`, `expect().toBeVisible()` |
369
- | Assertions | Auto-generated from verified signals | Written manually |
370
- | Login / setup | Shared setup specs with placeholders | Custom fixtures per project |
371
- | Update after UI change | Re-run `trace` | Find and update every affected locator |
372
- | Runs in CI | Yes (deterministic replay, no LLM) | Yes |
373
-
374
112
  ## License
375
113
 
376
114
  MIT