ccqa 0.3.8 → 0.3.9

package/README.md CHANGED
@@ -45,7 +45,7 @@ pnpm add -D agent-browser
45
45
 
46
46
  ## Usage
47
47
 
48
- **1. Write a spec**
48
+ **1. Write a spec** — by hand, or interactively with [`ccqa draft`](#draft--co-author-test-specmd-with-claude)
49
49
 
50
50
  ```markdown
51
51
  <!-- .ccqa/features/tasks/test-cases/create-and-complete/test-spec.md -->
@@ -103,6 +103,74 @@ ccqa generate tasks/create-and-complete
103
103
  ccqa run tasks/create-and-complete
104
104
  ```
105
105
 
106
+ ## Draft — co-author test-spec.md with Claude
107
+
108
+ Writing a `test-spec.md` from scratch means digging into your codebase to find the right aria-labels, URLs, and button text. `ccqa draft` puts Claude in the loop: you describe what you want to test in plain language, Claude reads the relevant code, and you refine the spec interactively.
109
+
110
+ ```bash
111
+ ccqa draft
112
+ ```
113
+
114
+ The first run asks for your intent, proposes a `feature/spec` name, and writes a draft. Each subsequent invocation lets you give a refinement instruction — empty input means "just re-check the current spec against the code." Press `y` at the final "Are you done with this draft?" prompt to end the session.
115
+
116
+ ```
117
+ ccqa draft
118
+
119
+ What do you want to test? > Select a category on the AI Maintenance page and run a check
120
+ Proposing a feature/spec name based on your intent...
121
+ proposed: ai-maintenance/run-check-with-category
122
+ Use this name? [y/N/edit] > y
123
+
124
+ Reading codebase and drafting spec...
125
+ ✓ 5 Read, 3 Grep, 2 Glob (4.2s)
126
+
127
+ ── Review (1 warning, 3 passed) ───────────────────────────────────
128
+
129
+ WARNINGS (1)
130
+ Assertability step-05
131
+ Result row may still show "running" right after the click
132
+ └ ContentQualityCheck.tsx polls every 5s; the status starts at
133
+ IN_PROGRESS and only flips to SUCCEEDED later.
134
+
135
+ PASSED (3)
136
+ Setup references, Step granularity, Unimplemented checks
137
+
138
+ ────────────────────────────────────────────────────────────────────
139
+
140
+ --- proposed changes ---
141
+ + ---
142
+ + title: "AI Maintenance — content quality check"
143
+ ...
144
+
145
+ Apply this patch? [y/N] y
146
+ saved: .ccqa/features/ai-maintenance/test-cases/run-check-with-category/test-spec.md
147
+
148
+ How would you like to refine? (empty = re-validate) >
149
+ ```
150
+
151
+ You can also edit `test-spec.md` directly in your editor between turns — `ccqa draft` re-reads the file each iteration.
152
+
153
+ ### What gets reviewed
154
+
155
+ Every turn Claude grades the spec on four axes and reports issues:
156
+
157
+ | Check | What it verifies |
158
+ |---|---|
159
+ | **Assertability** | Each step's **Expected** references concrete, observable signals (visible text, URL pattern, element state) that actually exist in the code. Flags timestamps, exact counts, and session-specific values that won't be stable across runs. |
160
+ | **Setup references** | Every `setups[].name` in the frontmatter resolves to an existing `.ccqa/setups/<name>/setup-spec.md`, and every `params` key matches that setup's `placeholders`. See [Setup Specs](#setup-specs--reusable-shared-procedures). |
161
+ | **Step granularity** | Steps aren't too coarse (multiple actions in one) or too fine (snapshot-only filler), and the order is logical. |
162
+ | **Unimplemented checks** | Anything the spec describes that Claude couldn't find in the codebase — a hint that you may be specifying behavior that doesn't exist yet. |
163
+
164
+ Findings with severity `WARN` or `ERROR` are shown in full; `OK` checks collapse to a one-line summary.
165
+
166
+ ### Flags
167
+
168
+ ```
169
+ ccqa draft [feature/spec] # arg is optional; Claude proposes a name if omitted
170
+ --instruction <text> # single-shot, non-interactive
171
+ --apply # auto-apply patches without [y/N] confirmation
172
+ ```
173
+
106
174
  ## Setup Specs — Reusable shared procedures
107
175
 
108
176
  Setup specs let you define reusable procedures (login, data preparation, etc.) that run before your test steps. Define once, use across multiple test specs.
@@ -253,6 +321,10 @@ ccqa generate tasks/create-and-complete --max-retries 5
253
321
  ## Commands
254
322
 
255
323
  ```
324
+ ccqa draft [feature/spec] Co-author a test spec with Claude
325
+ --instruction <text> Single-shot, non-interactive
326
+ --apply Auto-apply patches without [y/N] confirmation
327
+
256
328
  ccqa trace <feature/spec> Record browser actions for a test spec
257
329
  ccqa generate <feature/spec> Generate test script from recorded actions
258
330
  --auto Apply auto-fixes without confirmation (CI)
package/dist/bin/ccqa.mjs CHANGED
@@ -1,7 +1,7 @@
1
1
  #!/usr/bin/env node
2
2
  import { createRequire } from "node:module";
3
3
  import { Command } from "commander";
4
- import { accessSync, readFileSync } from "node:fs";
4
+ import { accessSync, readFileSync, statSync } from "node:fs";
5
5
  import { fileURLToPath } from "node:url";
6
6
  import { access, mkdir, mkdtemp, readFile, readdir, rm, stat, unlink, writeFile } from "node:fs/promises";
7
7
  import { delimiter, dirname, join, resolve } from "node:path";
@@ -10,6 +10,8 @@ import matter from "gray-matter";
10
10
  import { spawn } from "node:child_process";
11
11
  import { createInterface } from "node:readline";
12
12
  import { tmpdir } from "node:os";
13
+ import { createInterface as createInterface$1 } from "node:readline/promises";
14
+ import { z } from "zod";
13
15
  //#region src/prompts/trace.ts
14
16
  function generateSessionName() {
15
17
  return `ccqa-trace-${(/* @__PURE__ */ new Date()).toISOString().replace(/[:.]/g, "-")}`;
@@ -106,6 +108,7 @@ For each step:
106
108
  - Do NOT retry a selector without taking a fresh snapshot first
107
109
  - Do NOT work around blockers (login walls, missing data, captchas) — stop and report
108
110
  - **Do NOT suppress errors** — never use \`2>/dev/null\`, \`|| true\`, \`; other-command\`, or any other technique that hides agent-browser failures. Each \`agent-browser\` command must run standalone so failures are properly detected and recorded.
111
+ - **If \`agent-browser\` is not found, stop immediately.** Do not run \`which\`, \`find\`, \`npm ls\`, \`npm install\`, \`npx\`, \`brew\`, or any other discovery / installation command. Do not try alternate paths. The ccqa host already validates the binary before launching you, so if you see \`command not found\` it is a host-environment problem you cannot fix from inside the test run. Emit one line and terminate: \`ASSERTION_FAILED|step-XX|agent-browser binary not available in PATH\`.
109
112
 
110
113
  ## Source Code Reference
111
114
 
@@ -345,7 +348,7 @@ function resolveModel(explicit) {
345
348
  return envModel && envModel.length > 0 ? envModel : void 0;
346
349
  }
347
350
  async function invokeClaudeStreaming(options, onEvent) {
348
- const { prompt, systemPrompt, allowedTools, disableBuiltinTools = false, maxTurns, env, model, onAbAction, onAbActionFailed } = options;
351
+ const { prompt, systemPrompt, allowedTools, disableBuiltinTools = false, maxTurns, env, model, onAbAction, onAbActionFailed, silenceBashLog = false } = options;
349
352
  const resolvedModel = resolveModel(model);
350
353
  let lastAbToolUseId = null;
351
354
  const sdkOptions = {
@@ -397,7 +400,7 @@ async function invokeClaudeStreaming(options, onEvent) {
397
400
  const q = await buildMessageStream(prompt, sdkOptions);
398
401
  for await (const msg of q) {
399
402
  onEvent(msg);
400
- if (msg.type === "assistant") {
403
+ if (msg.type === "assistant" && !silenceBashLog) {
401
404
  for (const block of msg.message.content ?? []) if (block.type === "tool_use" && block.name === "Bash") {
402
405
  const cmd = block.input?.["command"];
403
406
  if (typeof cmd === "string") bash(cmd);
@@ -560,6 +563,16 @@ async function readSpecFile(featureName, specName, cwd) {
560
563
  throw new Error(`Spec file not found: ${specPath}`);
561
564
  });
562
565
  }
566
+ async function tryReadSpecFile(featureName, specName, cwd) {
567
+ return readFile(join(getSpecDir(featureName, specName, cwd), "test-spec.md"), "utf-8").catch(() => null);
568
+ }
569
+ async function saveSpecFile(featureName, specName, content, cwd) {
570
+ const specDir = getSpecDir(featureName, specName, cwd);
571
+ await mkdir(specDir, { recursive: true });
572
+ const specPath = join(specDir, "test-spec.md");
573
+ await writeFile(specPath, content.endsWith("\n") ? content : content + "\n", "utf-8");
574
+ return specPath;
575
+ }
563
576
  async function saveRoute(featureName, specName, route, cwd) {
564
577
  const specDir = getSpecDir(featureName, specName, cwd);
565
578
  await mkdir(specDir, { recursive: true });
@@ -645,6 +658,34 @@ async function listAllSpecs(cwd) {
645
658
  async function listSpecsForFeature(featureName, cwd) {
646
659
  return readdir(join(getFeatureDir(featureName, cwd), "test-cases")).catch(() => []);
647
660
  }
661
+ /**
662
+ * Lists every feature/spec dir under .ccqa/features/, regardless of whether
663
+ * the spec is fully drafted yet. Used by `ccqa draft` to suggest non-colliding
664
+ * feature/spec names that fit the existing structure.
665
+ */
666
+ async function listFeatureTree(cwd) {
667
+ const featuresDir = join(getCcqaDir(cwd), "features");
668
+ const featureDirs = await readdir(featuresDir).catch(() => []);
669
+ return Promise.all(featureDirs.map(async (featureName) => {
670
+ const testCasesDir = join(featuresDir, featureName, "test-cases");
671
+ const specDirs = await readdir(testCasesDir).catch(() => []);
672
+ return {
673
+ featureName,
674
+ specs: await Promise.all(specDirs.map(async (specName) => {
675
+ const content = await readFile(join(testCasesDir, specName, "test-spec.md"), "utf-8").catch(() => null);
676
+ if (content === null) return {
677
+ specName,
678
+ hasSpecFile: false
679
+ };
680
+ return {
681
+ specName,
682
+ hasSpecFile: true,
683
+ title: content.match(/^title:\s*"?([^"\n]+)"?/m)?.[1]?.trim()
684
+ };
685
+ }))
686
+ };
687
+ }));
688
+ }
648
689
  function routeToMarkdown(route) {
649
690
  const lines = [
650
691
  "---",
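Note on the `listFeatureTree` hunk above: the `title` lookup relies on a single regular expression rather than a YAML parse. Below is a minimal sketch of what that expression extracts, assuming ordinary single-line frontmatter. The helper name `extractTitle` and the sample strings are hypothetical and not part of ccqa.

```javascript
// Hypothetical helper wrapping the regex that listFeatureTree uses.
function extractTitle(content) {
  // The "m" flag lets ^ match at the start of any line, so the "---"
  // delimiter on the first line does not prevent a match.
  return content.match(/^title:\s*"?([^"\n]+)"?/m)?.[1]?.trim();
}

// Sample frontmatter strings (made up for illustration).
const quoted = '---\ntitle: "Run check with category"\nbaseUrl: http://localhost:3000\n---\n';
const bare = "---\ntitle: Create and complete a task  \n---\n";
const missing = "---\nbaseUrl: http://localhost:3000\n---\n";

console.log(extractTitle(quoted));  // Run check with category
console.log(extractTitle(bare));    // Create and complete a task
console.log(extractTitle(missing)); // undefined
```

Because the pattern only strips double quotes, a single-quoted YAML title would come back with its quotes intact. That is probably acceptable here, since the title is only used as a hint in the naming prompt.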
@@ -834,22 +875,46 @@ function waitExit(child) {
834
875
  //#endregion
835
876
  //#region src/runtime/agent-browser-bin.ts
836
877
  const require$1 = createRequire(import.meta.url);
878
+ function hasAgentBrowserShim(dir) {
879
+ try {
880
+ statSync(join(dir, "agent-browser"));
881
+ return true;
882
+ } catch {
883
+ return false;
884
+ }
885
+ }
886
+ /**
887
+ * Walks up from `start` looking for a `node_modules/.bin/agent-browser` shim.
888
+ * Returns the .bin directory containing the shim, or null if none is found.
889
+ */
890
+ function findNodeModulesBin(start) {
891
+ let cur = start;
892
+ while (true) {
893
+ const candidate = join(cur, "node_modules", ".bin");
894
+ if (hasAgentBrowserShim(candidate)) return candidate;
895
+ const parent = dirname(cur);
896
+ if (parent === cur) return null;
897
+ cur = parent;
898
+ }
899
+ }
837
900
  /**
838
901
  * Resolves the directory containing the `agent-browser` shim that npm/pnpm
839
902
  * exposes on PATH for the peer-installed package. Used by `ccqa trace` to
840
903
  * prepend this directory to PATH so the Claude subprocess can invoke
841
904
  * `agent-browser ...` without requiring a global install.
842
905
  *
843
- * Returns null if agent-browser cannot be resolved (peer not installed).
906
+ * Returns null if agent-browser cannot be located.
844
907
  */
845
908
  function resolveAgentBrowserBinDir() {
846
- let pkgJsonPath;
909
+ const fromCwd = findNodeModulesBin(process.cwd());
910
+ if (fromCwd) return fromCwd;
911
+ const fromSelf = findNodeModulesBin(dirname(require$1.resolve("agent-browser/package.json")));
912
+ if (fromSelf) return fromSelf;
847
913
  try {
848
- pkgJsonPath = require$1.resolve("agent-browser/package.json");
849
- } catch {
850
- return null;
851
- }
852
- return join(dirname(pkgJsonPath), "node_modules", ".bin");
914
+ const candidate = join(dirname(require$1.resolve("agent-browser/package.json")), "node_modules", ".bin");
915
+ if (hasAgentBrowserShim(candidate)) return candidate;
916
+ } catch {}
917
+ return null;
853
918
  }
854
919
  /**
855
920
  * Returns a PATH string with the agent-browser shim directory prepended,
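Note on the hunk above: the new lookup order (walk up from the working directory, then from the resolved package location) rests on one upward directory walk. Below is a minimal sketch of that walk, assuming POSIX-style paths and an injected existence check in place of `statSync`. The names `findUp` and `parentOf` and the sample paths are hypothetical.

```javascript
// Returns the parent of a POSIX-style absolute path; the root is its own parent.
function parentOf(p) {
  const i = p.lastIndexOf("/");
  return i <= 0 ? "/" : p.slice(0, i);
}

// Walks from `start` toward the root and returns the first
// node_modules/.bin directory for which `exists(<dir>/agent-browser)` is true.
// `exists` is an assumption standing in for a real filesystem check.
function findUp(start, exists) {
  let cur = start;
  while (true) {
    const candidate = (cur === "/" ? "" : cur) + "/node_modules/.bin";
    if (exists(candidate + "/agent-browser")) return candidate;
    const parent = parentOf(cur);
    if (parent === cur) return null; // reached the root without a hit
    cur = parent;
  }
}

// Pretend filesystem: the shim is hoisted to the workspace root.
const files = new Set(["/repo/node_modules/.bin/agent-browser"]);
const exists = (p) => files.has(p);

console.log(findUp("/repo/packages/app", exists)); // /repo/node_modules/.bin
console.log(findUp("/elsewhere/project", exists)); // null
```

Walking upward is what lets a hoisted shim in a monorepo root be found when the command is run from a nested package directory.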
@@ -863,6 +928,48 @@ function pathWithAgentBrowserShim(currentPath) {
863
928
  if (path.split(delimiter).includes(dir)) return path;
864
929
  return dir + delimiter + path;
865
930
  }
931
+ /**
932
+ * Confirms before launching Claude that an `agent-browser` shim is reachable
933
+ * via PATH. We do this up front so a missing peer dependency fails fast with
934
+ * a clear message, instead of Claude burning tokens probing the system with
935
+ * `which`, `find`, `npm install`, etc.
936
+ *
937
+ * The `resolver` argument is for tests; production calls take no args.
938
+ */
939
+ function assertAgentBrowserAvailable(resolver = resolveAgentBrowserBinDir) {
940
+ const dir = resolver();
941
+ if (!dir) throw new AgentBrowserUnavailableError();
942
+ const shim = join(dir, "agent-browser");
943
+ try {
944
+ const s = statSync(shim);
945
+ if (!s.isFile() && !s.isSymbolicLink()) throw new AgentBrowserUnavailableError();
946
+ } catch {
947
+ throw new AgentBrowserUnavailableError();
948
+ }
949
+ return dir;
950
+ }
951
+ var AgentBrowserUnavailableError = class extends Error {
952
+ constructor() {
953
+ super("agent-browser binary not found on PATH");
954
+ this.name = "AgentBrowserUnavailableError";
955
+ }
956
+ };
957
+ /** Human-readable explanation shown to the user when the guard fires. */
958
+ function formatAgentBrowserUnavailableMessage() {
959
+ return [
960
+ "agent-browser is not installed or not on PATH.",
961
+ "",
962
+ "ccqa drives the browser via the peer-installed `agent-browser` package.",
963
+ "Install it in this project:",
964
+ "",
965
+ " pnpm add -D agent-browser",
966
+ " # or",
967
+ " npm install -D agent-browser",
968
+ "",
969
+ "If it is already installed, make sure you are running ccqa from the",
970
+ "project root (or via your package runner, e.g. `pnpm exec ccqa ...`)."
971
+ ].join("\n");
972
+ }
866
973
  //#endregion
867
974
  //#region src/runtime/env-vars.ts
868
975
  const ENV_VAR_RE = /\$\{([A-Z_][A-Z0-9_]*)\}|\$([A-Z_][A-Z0-9_]*)/g;
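Note on the hunk above: the guard takes an optional `resolver` argument so tests can exercise it without a real install. Below is a simplified sketch of that pattern, with the file check injected as well so that it runs without touching the filesystem. `ToolUnavailableError`, `assertAvailable`, and `check` are hypothetical stand-ins, not the names used in ccqa.

```javascript
// Simplified stand-in for the dedicated error class in the diff.
class ToolUnavailableError extends Error {
  constructor() {
    super("agent-browser binary not found on PATH");
    this.name = "ToolUnavailableError";
  }
}

// `resolver` returns a directory or null; `isFile` replaces the statSync check.
function assertAvailable(resolver, isFile) {
  const dir = resolver();
  if (!dir) throw new ToolUnavailableError();
  if (!isFile(`${dir}/agent-browser`)) throw new ToolUnavailableError();
  return dir;
}

// A caller translates only the expected error into a user-facing result
// and rethrows anything else, so unrelated failures are not masked.
function check(resolver, isFile) {
  try {
    return { ok: true, dir: assertAvailable(resolver, isFile) };
  } catch (e) {
    if (e instanceof ToolUnavailableError) return { ok: false, message: e.message };
    throw e;
  }
}

const binDir = "/repo/node_modules/.bin";
console.log(check(() => binDir, (p) => p === `${binDir}/agent-browser`));
console.log(check(() => null, () => true));
```

Using a dedicated error class keeps the "missing dependency" case distinguishable from other failures, which is what allows a caller to print installation guidance for that case only.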
@@ -921,6 +1028,15 @@ const traceCommand = new Command("trace").argument("<feature/spec>", "Spec id in
921
1028
  });
922
1029
  async function runTrace(featureName, specName, model) {
923
1030
  header("trace", `${featureName}/${specName}`);
1031
+ try {
1032
+ meta("agent-browser", assertAgentBrowserAvailable());
1033
+ } catch (e) {
1034
+ if (e instanceof AgentBrowserUnavailableError) {
1035
+ error(formatAgentBrowserUnavailableMessage());
1036
+ process.exit(1);
1037
+ }
1038
+ throw e;
1039
+ }
924
1040
  await ensureCcqaDir();
925
1041
  const spec = parseTestSpec(await readSpecFile(featureName, specName));
926
1042
  const hasSetups = (spec.setups?.length ?? 0) > 0;
@@ -1567,7 +1683,7 @@ async function diagnose(input, options = {}) {
1567
1683
  reason: "diagnose returned no parseable diagnosis JSON"
1568
1684
  },
1569
1685
  confidence: 0,
1570
- reasoning: truncate$1(raw, 1e3)
1686
+ reasoning: truncate$2(raw, 1e3)
1571
1687
  },
1572
1688
  raw,
1573
1689
  sdkError: false
@@ -1624,7 +1740,7 @@ function extractJsonCandidates(raw) {
1624
1740
  }
1625
1741
  return out;
1626
1742
  }
1627
- function truncate$1(s, max) {
1743
+ function truncate$2(s, max) {
1628
1744
  return s.length <= max ? s : `${s.slice(0, max)}... [truncated, ${s.length - max} more chars]`;
1629
1745
  }
1630
1746
  function stripFence(raw) {
@@ -1852,11 +1968,11 @@ async function captureSnapshot(sessionName) {
1852
1968
  resolve(null);
1853
1969
  return;
1854
1970
  }
1855
- resolve(truncate(trimmed, MAX_OUTPUT_BYTES));
1971
+ resolve(truncate$1(trimmed, MAX_OUTPUT_BYTES));
1856
1972
  });
1857
1973
  });
1858
1974
  }
1859
- function truncate(s, maxBytes) {
1975
+ function truncate$1(s, maxBytes) {
1860
1976
  if (s.length <= maxBytes) return s;
1861
1977
  return `${s.slice(0, maxBytes)}\n... [truncated, ${s.length - maxBytes} more chars]`;
1862
1978
  }
@@ -2426,6 +2542,15 @@ const traceSetupCommand = new Command("trace-setup").argument("<name>", "Setup n
2426
2542
  });
2427
2543
  async function runTraceSetup(name, model) {
2428
2544
  header("trace-setup", name);
2545
+ try {
2546
+ meta("agent-browser", assertAgentBrowserAvailable());
2547
+ } catch (e) {
2548
+ if (e instanceof AgentBrowserUnavailableError) {
2549
+ error(formatAgentBrowserUnavailableMessage());
2550
+ process.exit(1);
2551
+ }
2552
+ throw e;
2553
+ }
2429
2554
  await ensureCcqaDir();
2430
2555
  const spec = parseSetupSpec(await readSetupSpecFile(name));
2431
2556
  const resolvedSpec = replacePlaceholdersWithDummies(spec);
@@ -2695,6 +2820,607 @@ async function cleanupActions(actions, model) {
2695
2820
  return actions;
2696
2821
  }
2697
2822
  //#endregion
2823
+ //#region src/prompts/draft.ts
2824
+ function buildNamingSystemPrompt() {
2825
+ return `You name a new ccqa test case based on the user's intent and the existing feature tree.
2826
+
2827
+ ccqa test cases live under \`.ccqa/features/<featureName>/test-cases/<specName>/test-spec.md\`.
2828
+
2829
+ ## Naming rules
2830
+
2831
+ - featureName and specName are kebab-case ASCII (lowercase, words separated by '-').
2832
+ - featureName: a broad area (e.g. "tasks", "auth", "billing", "search").
2833
+ - specName: a short scenario name (e.g. "create-and-complete", "login-with-email", "search-by-tag").
2834
+ - Reuse existing featureName when the user's intent fits an existing area. Only invent a new featureName when the existing tree clearly does not cover the area.
2835
+ - specName must NOT collide with an existing spec under the chosen feature. If the natural name collides, pick a different one that distinguishes the new scenario from the existing ones.
2836
+ - Use the codebase (Read/Grep/Glob) sparingly to confirm domain vocabulary if helpful. Do not over-explore.
2837
+
2838
+ ## Output (STRICT)
2839
+
2840
+ Output ONE fenced \`\`\`json block, nothing else outside it:
2841
+
2842
+ {
2843
+ "featureName": "<kebab-case>",
2844
+ "specName": "<kebab-case>",
2845
+ "reason": "<one short sentence: why this name and how it relates to existing specs>"
2846
+ }
2847
+ `;
2848
+ }
2849
+ function buildNamingPrompt(intent, tree) {
2850
+ return `## User intent
2851
+
2852
+ ${intent}
2853
+
2854
+ ## Existing feature tree
2855
+
2856
+ ${tree.length === 0 ? "(no existing features yet)" : tree.map((f) => {
2857
+ const specLines = f.specs.length === 0 ? " (no specs yet)" : f.specs.map((s) => ` - ${s.specName}${s.title ? ` — ${s.title}` : ""}`).join("\n");
2858
+ return `- ${f.featureName}/\n${specLines}`;
2859
+ }).join("\n")}
2860
+
2861
+ ## Task
2862
+
2863
+ Pick featureName and specName for the new test case. Follow the naming rules. Avoid colliding with any existing specName under the chosen feature.
2864
+ `;
2865
+ }
2866
+ function buildDraftSystemPrompt() {
2867
+ return `You are a QA engineer drafting and refining a ccqa test-spec.md.
2868
+
2869
+ The CLI runs you in a loop: each turn the user gives an intent (first run) or a refinement instruction (later runs). You read the codebase, validate the spec, and return a single JSON report. The CLI displays a diff and asks the user whether to apply.
2870
+
2871
+ ## test-spec.md format (STRICT)
2872
+
2873
+ YAML frontmatter + Markdown body.
2874
+
2875
+ Frontmatter fields:
2876
+ - title: string (required)
2877
+ - baseUrl: string (required, e.g. http://localhost:3000)
2878
+ - prerequisites: string (optional, free text)
2879
+ - setups: array of { name: string, params?: Record<string,string> } (optional)
2880
+
2881
+ Body must contain a \`## Steps\` section followed by step blocks:
2882
+
2883
+ \`\`\`
2884
+ ### Step 1: <short title>
2885
+ - **Instruction**: <imperative, one sentence>
2886
+ - **Expected**: <observable outcome>
2887
+
2888
+ ### Step 2: <short title>
2889
+ ...
2890
+ \`\`\`
2891
+
2892
+ ## Quality rules
2893
+
2894
+ - One user-facing action per step (login, click, fill, navigate, ...).
2895
+ - **Expected** must be assertion-friendly: visible text, URL pattern, element state.
2896
+ - Forbidden in **Expected**: timestamps, exact counts, session IDs, internal state.
2897
+ - 3–8 steps is typical. Fewer means too coarse; more means too fine.
2898
+
2899
+ ## Workflow (use Read / Grep / Glob extensively)
2900
+
2901
+ 1. Read the codebase under cwd to find concrete strings: routes, button labels, aria-labels, page titles, placeholders. Use those exact strings in **Expected**.
2902
+ 2. If the spec references setups, Read \`.ccqa/setups/<name>/setup-spec.md\` and verify each \`params\` key matches the setup's \`placeholders\`.
2903
+ 3. Validate the (current or proposed) spec on four axes — emit one issue per finding:
2904
+ - **assertable**: each Expected can be verified against a string/URL/state that exists in code.
2905
+ - **setups**: referenced setup exists; params keys match placeholders.
2906
+ - **granularity**: not too coarse (multiple actions per step) nor too fine (snapshot-only steps); order is logical.
2907
+ - **unimplemented**: any feature mentioned in the spec that you cannot find in code.
2908
+
2909
+ ## Output contract (STRICT)
2910
+
2911
+ Output exactly ONE fenced \`\`\`json code block, and nothing else outside it. No prose before or after.
2912
+
2913
+ Schema:
2914
+
2915
+ \`\`\`json
2916
+ {
2917
+ "issues": [
2918
+ {
2919
+ "severity": "OK" | "WARN" | "ERROR",
2920
+ "category": "assertable" | "setups" | "granularity" | "unimplemented",
2921
+ "stepId": "step-01" | null,
2922
+ "message": "<one-line summary>",
2923
+ "detail": "<optional, multiline explanation>"
2924
+ }
2925
+ ],
2926
+ "patch": "<COMPLETE rewritten test-spec.md, or empty string if no changes>"
2927
+ }
2928
+ \`\`\`
2929
+
2930
+ ## Patch rules
2931
+
2932
+ - \`patch\` must be the COMPLETE file content if non-empty (never a diff fragment).
2933
+ - The CLI replaces the file atomically with \`patch\`.
2934
+ - For **create** mode: produce a fresh spec from the user intent.
2935
+ - For **refine** mode with a non-empty user instruction: apply the user's request, plus fix any issues it introduces. Preserve the user's wording elsewhere.
2936
+ - For **refine** mode with an empty user instruction: only fix issues you find against the current spec; if everything is fine, return \`patch: ""\`.
2937
+ - If \`patch\` is the same as the current spec, return \`patch: ""\` instead.
2938
+ `;
2939
+ }
2940
+ function buildDraftPrompt(input) {
2941
+ const { mode, existing, userInput } = input;
2942
+ if (mode === "create") return `## Mode
2943
+
2944
+ create — no spec exists yet at the target path. Produce a fresh test-spec.md.
2945
+
2946
+ ## User intent
2947
+
2948
+ ${userInput}
2949
+
2950
+ ## Task
2951
+
2952
+ Read the codebase under cwd. Discover concrete strings (routes, labels, titles). Produce a complete test-spec.md as the \`patch\` field, plus any issues you'd flag about your own draft.
2953
+ `;
2954
+ return `## Mode
2955
+
2956
+ refine — a spec already exists. Apply the user's instruction (if any) and validate against the codebase.
2957
+
2958
+ ## Current spec
2959
+
2960
+ \`\`\`markdown
2961
+ ${existing}\`\`\`
2962
+
2963
+ ${userInput ? `## User refinement instruction\n\n${userInput}\n` : `## User refinement instruction\n\n(empty — re-validate the current spec against the codebase; only emit a non-empty patch if something is actually wrong)\n`}
2964
+ ## Task
2965
+
2966
+ 1. Read the codebase under cwd and any referenced setups.
2967
+ 2. If the user's instruction is non-empty, apply it to the spec.
2968
+ 3. Validate the resulting spec on the four axes. Emit issues.
2969
+ 4. Return the complete updated spec as \`patch\`. If no changes are needed, return \`patch: ""\`.
2970
+ `;
2971
+ }
2972
+ //#endregion
2973
+ //#region src/types.ts
2974
+ const TestStepSchema = z.object({
2975
+ id: z.string(),
2976
+ title: z.string(),
2977
+ instruction: z.string(),
2978
+ expected: z.string()
2979
+ });
2980
+ const SetupRefSchema = z.object({
2981
+ name: z.string(),
2982
+ params: z.record(z.string(), z.string()).optional()
2983
+ });
2984
+ z.object({
2985
+ title: z.string(),
2986
+ baseUrl: z.string(),
2987
+ prerequisites: z.string().optional(),
2988
+ setups: z.array(SetupRefSchema).optional(),
2989
+ steps: z.array(TestStepSchema)
2990
+ });
2991
+ const PlaceholderDefSchema = z.object({
2992
+ dummy: z.string(),
2993
+ description: z.string().optional()
2994
+ });
2995
+ z.object({
2996
+ title: z.string(),
2997
+ placeholders: z.record(z.string(), PlaceholderDefSchema).optional(),
2998
+ steps: z.array(TestStepSchema)
2999
+ });
3000
+ const RouteStepSchema = z.object({
3001
+ title: z.string(),
3002
+ action: z.string(),
3003
+ observation: z.string(),
3004
+ status: z.enum([
3005
+ "PASSED",
3006
+ "FAILED",
3007
+ "SKIPPED"
3008
+ ]),
3009
+ reason: z.string().optional()
3010
+ });
3011
+ z.object({
3012
+ specName: z.string(),
3013
+ timestamp: z.string(),
3014
+ status: z.enum(["passed", "failed"]),
3015
+ steps: z.array(RouteStepSchema)
3016
+ });
3017
+ const DraftIssueSchema = z.object({
3018
+ severity: z.enum([
3019
+ "OK",
3020
+ "WARN",
3021
+ "ERROR"
3022
+ ]),
3023
+ category: z.enum([
3024
+ "assertable",
3025
+ "setups",
3026
+ "granularity",
3027
+ "unimplemented"
3028
+ ]),
3029
+ stepId: z.string().nullable(),
3030
+ message: z.string(),
3031
+ detail: z.string().optional()
3032
+ });
3033
+ const DraftReportSchema = z.object({
3034
+ issues: z.array(DraftIssueSchema),
3035
+ patch: z.string()
3036
+ });
3037
+ const DraftNamingSchema = z.object({
3038
+ featureName: z.string().min(1),
3039
+ specName: z.string().min(1),
3040
+ reason: z.string().optional()
3041
+ });
3042
+ //#endregion
3043
+ //#region src/cli/draft.ts
3044
+ const CATEGORY_LABEL = {
3045
+ assertable: "Assertability",
3046
+ setups: "Setup references",
3047
+ granularity: "Step granularity",
3048
+ unimplemented: "Unimplemented checks"
3049
+ };
3050
+ const draftCommand = new Command("draft").argument("[feature/spec]", "Optional spec path (e.g. tasks/create-and-complete). If omitted, Claude proposes one from your intent.").description("Interactively draft and refine a test-spec.md with Claude Code").option("--instruction <text>", "Non-interactive single-shot instruction (skips the interactive loop)").option("--apply", "Auto-apply each generated patch without [y/N] confirmation", false).action(async (specPath, opts) => {
3051
+ await ensureCcqaDir();
3052
+ let featureName;
3053
+ let specName;
3054
+ let prefilledIntent = null;
3055
+ if (specPath) ({featureName, specName} = parseSpecPath(specPath));
3056
+ else {
3057
+ const { naming, intent } = await proposeNaming(opts);
3058
+ featureName = naming.featureName;
3059
+ specName = naming.specName;
3060
+ prefilledIntent = intent;
3061
+ }
3062
+ await runDraft(featureName, specName, opts, prefilledIntent);
3063
+ });
3064
+ async function runDraft(featureName, specName, opts, prefilledIntent) {
3065
+ header("draft", `${featureName}/${specName}`);
3066
+ const oneShot = opts.instruction !== void 0;
3067
+ let useIntentOnce = prefilledIntent !== null && !oneShot;
3068
+ while (true) {
3069
+ const existing = await tryReadSpecFile(featureName, specName);
3070
+ const isFirstRun = existing === null;
3071
+ let userInput;
3072
+ if (oneShot) userInput = opts.instruction ?? "";
3073
+ else if (useIntentOnce && isFirstRun) {
3074
+ userInput = prefilledIntent ?? "";
3075
+ useIntentOnce = false;
3076
+ } else userInput = await prompt(isFirstRun ? "What do you want to test? > " : "How would you like to refine? (empty = re-validate) > ");
3077
+ if (isFirstRun && !userInput.trim()) {
3078
+ error("intent required for the first draft (no spec exists yet)");
3079
+ process.exit(1);
3080
+ }
3081
+ const turnResult = await runOneTurn({
3082
+ featureName,
3083
+ specName,
3084
+ existing,
3085
+ userInput: userInput.trim(),
3086
+ autoApply: opts.apply === true
3087
+ });
3088
+ if (oneShot) process.exit(turnResult.hasError && !turnResult.applied ? 1 : 0);
3089
+ blank();
3090
+ if (/^y/i.test(await prompt("Are you done with this draft? [y/N] "))) {
3091
+ info("draft session complete.");
3092
+ hint(`run 'ccqa trace ${featureName}/${specName}' to record actions`);
3093
+ process.exit(0);
3094
+ }
3095
+ }
3096
+ }
3097
+ async function runOneTurn(input) {
3098
+ const { featureName, specName, existing, userInput, autoApply } = input;
3099
+ const isFirstRun = existing === null;
3100
+ const systemPrompt = buildDraftSystemPrompt();
3101
+ const userPrompt = buildDraftPrompt({
3102
+ mode: isFirstRun ? "create" : "refine",
3103
+ existing: existing ?? "",
3104
+ userInput
3105
+ });
3106
+ info(isFirstRun ? "Reading codebase and drafting spec..." : "Re-validating spec against codebase...");
3107
+ const toolCounts = {};
3108
+ const startedAt = Date.now();
3109
+ const { result, isError } = await invokeClaudeStreaming({
3110
+ prompt: userPrompt,
3111
+ systemPrompt,
3112
+ allowedTools: [
3113
+ "Read",
3114
+ "Grep",
3115
+ "Glob"
3116
+ ],
3117
+ silenceBashLog: true
3118
+ }, (msg) => {
3119
+ if (msg.type !== "assistant") return;
3120
+ for (const block of msg.message.content ?? []) if (block.type === "tool_use") toolCounts[block.name] = (toolCounts[block.name] ?? 0) + 1;
3121
+ });
3122
+ printToolSummary(toolCounts, Date.now() - startedAt);
3123
+ if (isError) {
3124
+ error("Claude returned an error result");
3125
+ return {
3126
+ hasError: true,
3127
+ applied: false
3128
+ };
3129
+ }
3130
+ const json = extractJsonBlock(result);
3131
+ if (!json) {
3132
+ error("Claude did not return a json block");
3133
+ warn(`raw tail: ${truncate(result, 200)}`);
3134
+ return {
3135
+ hasError: true,
3136
+ applied: false
3137
+ };
3138
+ }
3139
+ let report;
3140
+ try {
3141
+ report = DraftReportSchema.parse(JSON.parse(json));
3142
+ } catch (e) {
3143
+ error(`failed to parse draft report: ${e.message}`);
3144
+ return {
3145
+ hasError: true,
3146
+ applied: false
3147
+ };
3148
+ }
3149
+ const hasError = printReviewBlock(report.issues);
3150
+ const original = existing ?? "";
3151
+ if (!report.patch || report.patch === original) {
3152
+ blank();
3153
+ info("no changes proposed.");
3154
+ return {
3155
+ hasError,
3156
+ applied: false
3157
+ };
3158
+ }
3159
+ blank();
3160
+ info("--- proposed changes ---");
3161
+ printUnifiedDiff(original, report.patch);
3162
+ blank();
3163
+ if (!(autoApply ? true : /^y/i.test(await prompt("Apply this patch? [y/N] ")))) {
3164
+ info("aborted — no changes applied.");
3165
+ return {
3166
+ hasError,
3167
+ applied: false
3168
+ };
3169
+ }
3170
+ try {
3171
+ parseTestSpec(report.patch);
3172
+ } catch (e) {
3173
+ error(`refused to apply: patch failed validation (${e.message})`);
3174
+ return {
3175
+ hasError: true,
3176
+ applied: false
3177
+ };
3178
+ }
3179
+ meta("saved", await saveSpecFile(featureName, specName, report.patch));
3180
+ return {
3181
+ hasError,
3182
+ applied: true
3183
+ };
3184
+ }
3185
+ async function prompt(question) {
3186
+ const rl = createInterface$1({
3187
+ input: process.stdin,
3188
+ output: process.stdout
3189
+ });
3190
+ rl.on("SIGINT", () => {
3191
+ rl.close();
3192
+ process.exit(130);
3193
+ });
3194
+ try {
3195
+ return (await rl.question(question)).trim();
3196
+ } finally {
3197
+ rl.close();
3198
+ }
3199
+ }
+ /** Aggregated tool-call counts shown after each Claude turn. */
+ function formatToolSummary(counts, elapsedMs) {
+   const entries = Object.entries(counts).filter(([, n]) => n > 0).sort((a, b) => b[1] - a[1]).map(([name, n]) => `${n} ${name}`);
+   return ` ✓ ${entries.length === 0 ? "no tool calls" : entries.join(", ")} (${(elapsedMs / 1e3).toFixed(1)}s)`;
+ }
+ function printToolSummary(counts, elapsedMs) {
+   process.stdout.write(`${formatToolSummary(counts, elapsedMs)}\n`);
+ }
+ /**
+  * Renders the review report as a visually separated block, grouped by
+  * severity. ERROR and WARN findings get full detail; OK findings collapse
+  * to a one-line summary of category names. Returns whether any ERROR
+  * severity was emitted.
+  */
+ function printReviewBlock(issues) {
+   const RULE = "─".repeat(67);
+   const errors = issues.filter((i) => i.severity === "ERROR");
+   const warnings = issues.filter((i) => i.severity === "WARN");
+   const passed = issues.filter((i) => i.severity === "OK");
+   const headerParts = [];
+   if (errors.length) headerParts.push(`${errors.length} error${errors.length > 1 ? "s" : ""}`);
+   if (warnings.length) headerParts.push(`${warnings.length} warning${warnings.length > 1 ? "s" : ""}`);
+   if (passed.length) headerParts.push(`${passed.length} passed`);
+   const headerSuffix = headerParts.length ? ` (${headerParts.join(", ")})` : "";
+   const ruleLen = Math.max(0, 60 - headerSuffix.length);
+   process.stdout.write(`\n── Review${headerSuffix} ${"─".repeat(ruleLen)}\n\n`);
+   if (issues.length === 0) {
+     process.stdout.write(" (no findings)\n");
+     process.stdout.write(`\n${RULE}\n\n`);
+     return false;
+   }
+   if (errors.length) {
+     process.stdout.write(` ERRORS (${errors.length})\n`);
+     for (const issue of errors) writeFinding(issue);
+     process.stdout.write("\n");
+   }
+   if (warnings.length) {
+     process.stdout.write(` WARNINGS (${warnings.length})\n`);
+     for (const issue of warnings) writeFinding(issue);
+     process.stdout.write("\n");
+   }
+   if (passed.length) {
+     const names = passed.map((i) => CATEGORY_LABEL[i.category]).join(", ");
+     process.stdout.write(` PASSED (${passed.length})\n ${names}\n`);
+   }
+   process.stdout.write(`\n${RULE}\n\n`);
+   return errors.length > 0;
+ }
+ function writeFinding(issue) {
+   const stepPart = issue.stepId ? ` ${issue.stepId}` : "";
+   process.stdout.write(` ${CATEGORY_LABEL[issue.category]}${stepPart}\n`);
+   process.stdout.write(` ${issue.message}\n`);
+   if (issue.detail) process.stdout.write(` └ ${issue.detail.replace(/\n/g, "\n ")}\n`);
+ }
+ async function proposeNaming(opts) {
+   const oneShot = opts.instruction !== void 0;
+   const intent = oneShot ? opts.instruction ?? "" : await prompt("What do you want to test? > ");
+   if (!intent.trim()) {
+     error("intent required to propose a feature/spec name");
+     process.exit(1);
+   }
+   const tree = await listFeatureTree();
+   const treeForPrompt = tree.map((f) => ({
+     featureName: f.featureName,
+     specs: f.specs.map((s) => ({
+       specName: s.specName,
+       ...s.title ? { title: s.title } : {}
+     }))
+   }));
+   info("Proposing a feature/spec name based on your intent...");
+   const { result, isError } = await invokeClaudeStreaming({
+     silenceBashLog: true,
+     prompt: buildNamingPrompt(intent.trim(), treeForPrompt),
+     systemPrompt: buildNamingSystemPrompt(),
+     allowedTools: [
+       "Read",
+       "Grep",
+       "Glob"
+     ]
+   }, () => {});
+   if (isError) {
+     error("Claude failed during naming");
+     process.exit(1);
+   }
+   const json = extractJsonBlock(result);
+   if (!json) {
+     error("Claude did not return a json block for naming");
+     process.exit(1);
+   }
+   let proposed;
+   try {
+     proposed = DraftNamingSchema.parse(JSON.parse(json));
+   } catch (e) {
+     error(`failed to parse naming response: ${e.message}`);
+     process.exit(1);
+   }
+   const sanitized = {
+     featureName: sanitizeNamePart(proposed.featureName),
+     specName: sanitizeNamePart(proposed.specName)
+   };
+   if (!sanitized.featureName || !sanitized.specName) {
+     error(`Claude returned an invalid name: ${proposed.featureName}/${proposed.specName}`);
+     process.exit(1);
+   }
+   const final = ensureUnique(tree, sanitized.featureName, sanitized.specName);
+   meta("proposed", `${final.featureName}/${final.specName}`);
+   if (proposed.reason) meta("reason", proposed.reason);
+   if (oneShot || opts.apply === true) return {
+     naming: final,
+     intent: intent.trim()
+   };
+   const answer = await prompt(`Use this name? [y/N/edit] > `);
+   if (/^y/i.test(answer)) return {
+     naming: final,
+     intent: intent.trim()
+   };
+   if (/^e/i.test(answer)) {
+     const manual = await prompt("Enter feature/spec (e.g. tasks/create-and-complete) > ");
+     const parts = manual.split("/");
+     if (parts.length !== 2 || !parts[0] || !parts[1]) {
+       error(`invalid spec path: "${manual}". Expected "<feature>/<spec>"`);
+       process.exit(1);
+     }
+     const featureName = sanitizeNamePart(parts[0]);
+     const specName = sanitizeNamePart(parts[1]);
+     if (!featureName || !specName) {
+       error(`invalid characters in name: ${parts[0]}/${parts[1]}`);
+       process.exit(1);
+     }
+     return {
+       naming: {
+         featureName,
+         specName
+       },
+       intent: intent.trim()
+     };
+   }
+   info("aborted — no draft created.");
+   process.exit(0);
+ }
+ /**
+  * Restrict to kebab-case-friendly characters: lowercase letters, digits, hyphen.
+  * Anything else is dropped or replaced with '-'. Collapses repeated/edge hyphens.
+  */
+ function sanitizeNamePart(raw) {
+   return raw.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "").slice(0, 60);
+ }
+ function ensureUnique(tree, featureName, specName) {
+   const feature = tree.find((f) => f.featureName === featureName);
+   if (!feature) return {
+     featureName,
+     specName
+   };
+   const taken = new Set(feature.specs.map((s) => s.specName));
+   if (!taken.has(specName)) return {
+     featureName,
+     specName
+   };
+   for (let i = 2; i < 100; i++) {
+     const candidate = `${specName}-${i}`;
+     if (!taken.has(candidate)) return {
+       featureName,
+       specName: candidate
+     };
+   }
+   return {
+     featureName,
+     specName: `${specName}-${Date.now()}`
+   };
+ }
+ function extractJsonBlock(text) {
+   const fenced = text.match(/```(?:json)?\s*\n([\s\S]*?)\n```/);
+   if (fenced && fenced[1]) return fenced[1].trim();
+   const trimmed = text.trim();
+   if (trimmed.startsWith("{") && trimmed.endsWith("}")) return trimmed;
+   return null;
+ }
+ function printUnifiedDiff(before, after) {
+   const lines = computeLineDiff(before.split("\n"), after.split("\n"));
+   for (const line of lines) process.stdout.write(line + "\n");
+ }
+ function computeLineDiff(a, b) {
+   const n = a.length;
+   const m = b.length;
+   const dp = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
+   for (let i = n - 1; i >= 0; i--) for (let j = m - 1; j >= 0; j--) dp[i][j] = a[i] === b[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
+   const out = [];
+   let i = 0;
+   let j = 0;
+   while (i < n && j < m) if (a[i] === b[j]) {
+     out.push({
+       kind: "ctx",
+       text: a[i]
+     });
+     i++;
+     j++;
+   } else if (dp[i + 1][j] >= dp[i][j + 1]) {
+     out.push({
+       kind: "del",
+       text: a[i]
+     });
+     i++;
+   } else {
+     out.push({
+       kind: "add",
+       text: b[j]
+     });
+     j++;
+   }
+   while (i < n) out.push({
+     kind: "del",
+     text: a[i++]
+   });
+   while (j < m) out.push({
+     kind: "add",
+     text: b[j++]
+   });
+   return out.map((l) => l.kind === "add" ? `+ ${l.text}` : l.kind === "del" ? `- ${l.text}` : ` ${l.text}`);
+ }
+ function truncate(s, n) {
+   if (s.length <= n) return s;
+   return s.slice(s.length - n);
+ }
+ //#endregion
  //#region src/cli/index.ts
  const packageJsonPath = resolvePackageJson();
  const { version } = JSON.parse(readFileSync(packageJsonPath, "utf8"));
@@ -2710,6 +3436,7 @@ function resolvePackageJson() {
  }
  const program = new Command();
  program.name("ccqa").description("E2E test CLI using Claude Code + agent-browser").version(version);
+ program.addCommand(draftCommand);
  program.addCommand(traceCommand);
  program.addCommand(generateCommand);
  program.addCommand(runCommand);
package/dist/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "ccqa",
-   "version": "0.3.8",
+   "version": "0.3.9",
    "type": "module",
    "description": "Browser test recorder powered by Claude Code and agent-browser",
    "repository": {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "ccqa",
-   "version": "0.3.8",
+   "version": "0.3.9",
    "type": "module",
    "description": "Browser test recorder powered by Claude Code and agent-browser",
    "repository": {