
ccqa 0.8.3 → 0.10.0


Files changed (6)

package/README.md +115 -12
package/dist/bin/ccqa.mjs +869 -303
package/dist/package.json +1 -1
package/dist/runtime/test-helpers.d.mts +8 -1
package/dist/runtime/test-helpers.mjs +28 -3
package/package.json +1 -1

package/README.md

@@ -69,6 +69,8 @@ ccqa run tasks/create-and-complete      # vitest replays test.spec.ts; no LLM
 ccqa run tasks/create-and-complete      # Claude drives the browser every time
 ```
+Live specs can start already-signed-in by pointing `statePath:` at a saved agent-browser state file (cookies + localStorage). Run an interactive login locally once, save the state with `agent-browser state save .ccqa/sessions/<name>.json`, then commit the path (not the file) — see [Pre-authenticated state](#pre-authenticated-state-statepath) below for the local bootstrap and the CI restore pattern.
 By default deterministic runs write step-boundary screenshots and metadata to `ccqa-report/evidence/<feature>/<spec>/` so a reviewer can confirm a passing spec actually reached the states its `expected` clauses describe. Disable with `--no-evidence`.
 In CI you can opt in to an HTML run report by passing `--report` — every failing spec gets a drift audit plus a root-cause call (TEST_DRIFT / SPEC_CHANGE / PRODUCT_BUG) using the branch's git diff as context, and the report lets a human grade those calls to measure their accuracy. Requires `ANTHROPIC_API_KEY` or a local Claude login for the analysis part. Opt out with `--no-failure-analysis` (which also implicitly skips the drift audit — the audit is rendered as evidence under the classification, so without the classification the cost has nowhere to land). Use `--no-drift-audit` to keep the classification but skip the audit. See [Run report](./docs/report.md).
@@ -84,6 +86,7 @@ ccqa run --changed --report                    # only specs whose relatedPaths t
 |---|---|
 | Write specs interactively with Claude | [Draft](./docs/draft.md) |
 | Reuse login and other shared step sequences | [Blocks](./docs/blocks.md) |
+| Drive `<input type="file">` without an OS picker | [File upload](./docs/file-upload.md) |
 | Assertion helper functions | [Assertions](./docs/assertions.md) |
 | Auto-fix failing tests | [Auto-fix](./docs/auto-fix.md) |
 | Detect spec/code drift in CI | [Drift](./docs/drift.md) |
@@ -94,19 +97,22 @@ ccqa run --changed --report                    # only specs whose relatedPaths t
 ## Commands
 ```
+ccqa init                          Scaffold .ccqa/prompts/{live,record}.{user,agent}.md templates
 ccqa draft [feature/spec]          Co-author a test spec with Claude
 ccqa perspectives                  Inventory existing test coverage into .ccqa/perspectives.yaml
 ccqa record <feature/spec>         (deterministic specs only) Trace browser actions + generate test.spec.ts
-ccqa run [feature/spec]            Execute specs. Per spec, the spec.yaml `mode:` field selects deterministic
+ccqa run [feature/spec...]         Execute specs. Per spec, the spec.yaml `mode:` field selects deterministic
                                    (vitest replay) or live (Claude drives every time). One run can mix both;
-                                   `--report` writes one unified HTML.
+                                   `--report` writes one unified HTML. Pass multiple targets space-separated.
 ccqa drift [feature/spec]          Standalone spec ↔ codebase static audit (for PR checks)
 ```
 `ccqa run` flags:
 - `--report [dir]` — write a self-contained HTML run report (default dir: `ccqa-report/`)
-- `--changed` — restrict execution to specs whose `relatedPaths` intersect `git diff <base>...HEAD`. Mutually exclusive with an explicit spec id.
+- `--profile <name>` — load `.ccqa/profiles/<name>.env` into the environment before resolving spec `${VAR}` references, so one spec targets dev/stg/prd without per-environment copies. See [Profiles](#profiles---profile).
+- `--changed` — restrict execution to specs whose `relatedPaths` intersect `git diff <base>...HEAD`. Mutually exclusive with explicit spec targets.
+- `--concurrency <n>` — run up to N specs in parallel **within each mode** (deterministic specs run as one phase, live specs as the next; parallelism is within a phase, not across). Default `1` (sequential, identical to before). Above 1, each spec's output is buffered and flushed as a labelled block so parallel logs stay legible. Live specs each launch their own headed Chrome, so high values spawn many browser instances.
 - `--base <ref>` — base ref for the git diff (default: `$GITHUB_BASE_REF`, then `origin/main`)
 - `--no-failure-analysis` — skip the per-failure root-cause classification (also skips the drift audit, since the audit only shows under the classification)
 - `--no-drift-audit` — skip the spec ↔ code drift audit while keeping the classification
@@ -114,10 +120,11 @@ ccqa drift [feature/spec]          Standalone spec ↔ codebase static audit (fo
 - `--retry <n>` — (live specs only) retry each failing step up to N more times
 - `--format <fmt>` — `text` (default), `json` (report.json), `github` (Actions annotations)
 - `--out <dir>` — (live specs only, single-spec invocations) override the per-run artifact directory
+- `--update-agent-prompt` — (live specs only) after the run, summarise it back to Claude and rewrite `.ccqa/prompts/live.agent.md` so the next run inherits the lessons learned. `ccqa record` ships the same flag, refreshing `record.agent.md` from the trace summary.
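
The `--concurrency` behaviour described above (parallelism within a phase, each spec's output flushed as one labelled block) can be sketched as a small worker pool. This is a minimal illustration, not taken from the ccqa source: `SpecTask`, `runPhase`, and `runAll` are hypothetical names, and the sketch buffers output at every concurrency level, which may differ from what ccqa does at the default of `1`.

```typescript
// Hypothetical types and names; a sketch of bounded per-phase parallelism.
type SpecTask = {
  id: string;
  run: (log: (line: string) => void) => Promise<boolean>;
};

// Run one phase with at most `concurrency` specs in flight.
// Each spec's log lines are buffered and emitted as a single labelled block.
async function runPhase(
  tasks: SpecTask[],
  concurrency: number,
  emit: (block: string) => void,
): Promise<Map<string, boolean>> {
  const results = new Map<string, boolean>();
  let next = 0; // shared cursor; safe because JS runs workers on one thread

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const task = tasks[next++];
      if (task === undefined) break;
      const lines: string[] = [];
      const ok = await task.run((line) => lines.push(line));
      results.set(task.id, ok);
      const header = `=== ${task.id} (${ok ? "pass" : "fail"}) ===`;
      emit([header, ...lines].join("\n"));
    }
  }

  const workers = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workers }, () => worker()));
  return results;
}

// Deterministic specs run as one phase, live specs as the next.
async function runAll(
  deterministic: SpecTask[],
  live: SpecTask[],
  concurrency: number,
  emit: (block: string) => void,
): Promise<Map<string, boolean>> {
  const first = await runPhase(deterministic, concurrency, emit);
  const second = await runPhase(live, concurrency, emit);
  return new Map([...Array.from(first), ...Array.from(second)]);
}
```

Because the phases are awaited one after the other, a live spec never overlaps with a deterministic one, which matches the "within a phase, not across" wording.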
 All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus` | `haiku`, or a full model ID). The flag overrides `CCQA_MODEL`; when both are unset, the Claude Code CLI default is used. They also accept `--language <bcp47>` (e.g. `ja`, `en`) to set the language of human-readable output; the default `auto` follows the language of the spec/codebase. `--cwd <path>` works on `record` / `run` / `drift` so you can target a subpackage inside a monorepo from the repo root. Interactive commands authenticate via your local Claude Code login; commands that talk to Claude in CI (`ccqa run --report`, `ccqa drift`) additionally honor `ANTHROPIC_API_KEY`.
-`<feature/spec>` is a 2-segment alias for the on-disk path `.ccqa/features/<feature>/test-cases/<spec>/`.
+`<feature/spec>` is a 2-segment alias for the on-disk path `.ccqa/features/<feature>/test-cases/<spec>/`. `ccqa run` accepts several targets space-separated (each a `<feature>/<spec>`, a bare `<feature>` for all its specs, or omitted for everything); duplicates are de-duped and `--changed` cannot be combined with explicit targets.
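
The target rules above (alias to on-disk path, bare feature, no target, de-duplication, and the `--changed` exclusion) can be sketched as follows. Only the path layout comes from the README text; `parseTarget`, `targetPath`, and `resolveTargets` are hypothetical helpers, and the sketch de-dupes exact repeats only, so how ccqa treats a bare feature that overlaps one of its own specs is left open.

```typescript
// Hypothetical helpers; not ccqa's actual implementation.
type Target =
  | { kind: "all" }
  | { kind: "feature"; feature: string }
  | { kind: "spec"; feature: string; spec: string };

// Accepts "<feature>" or "<feature>/<spec>".
function parseTarget(raw: string): Target {
  const [feature, spec, ...rest] = raw.split("/");
  if (!feature || spec === "" || rest.length > 0) {
    throw new Error(`expected <feature> or <feature>/<spec>, got "${raw}"`);
  }
  return spec === undefined
    ? { kind: "feature", feature }
    : { kind: "spec", feature, spec };
}

// Map a target to the directory it stands for.
function targetPath(t: Target): string {
  switch (t.kind) {
    case "all":
      return ".ccqa/features/";
    case "feature":
      return `.ccqa/features/${t.feature}/test-cases/`;
    case "spec":
      return `.ccqa/features/${t.feature}/test-cases/${t.spec}/`;
  }
}

// Resolve CLI arguments to a de-duplicated list of paths.
function resolveTargets(args: string[], changed: boolean): string[] {
  if (changed && args.length > 0) {
    throw new Error("--changed cannot be combined with explicit targets");
  }
  if (args.length === 0) return [targetPath({ kind: "all" })];
  return Array.from(new Set(args.map((a) => targetPath(parseTarget(a)))));
}
```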
 ## File structure
@@ -125,9 +132,14 @@ All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus`
 .ccqa/
   perspectives.yaml              # Inventory of existing coverage (machine-readable, canonical)
   perspectives.md                # Category index, regenerated from the YAML
-  prompts/
-    trace.user.md                # Project-specific guidance appended to `ccqa record` (trace phase)
-    run-nd.user.md               # Project-specific guidance appended to `ccqa run` (live specs)
+  profiles/                      # `--profile <name>` env files
+    stg.env                      # URLs + credential refs; commit if it uses secret-manager refs, gitignore if it holds plaintext secrets
+    prd.env
+  prompts/                       # Run `ccqa init` to scaffold these
+    record.user.md               # Human-maintained guidance appended to `ccqa record` (trace phase)
+    record.agent.md              # Auto-updated by `ccqa record --update-agent-prompt`
+    live.user.md                 # Human-maintained guidance appended to `ccqa run` (live specs)
+    live.agent.md                # Auto-updated by `ccqa run --update-agent-prompt`
   blocks/
     login/
       spec.yaml                  # Reusable block (params + steps)
@@ -151,6 +163,26 @@ All Claude-driven commands accept `-m, --model <name>` (alias `sonnet` | `opus`
 Add `.ccqa/features/*/test-cases/*/runs/` to `.gitignore` — these are per-run artefacts that should not be committed. Likewise `ccqa-report*/`.
+## Profiles (`--profile`)
+Keep environment-specific values out of specs as `${VAR}` references and supply them per environment from a **profile**, an env file at `.ccqa/profiles/<name>.env`. `ccqa run --profile <name>` (or `ccqa record --profile <name>`) merges it into the environment before resolving `${VAR}`, so one spec runs anywhere.
+```bash
+# .ccqa/profiles/stg.env
+APP_BASE_URL=https://<your-app-host>
+TEST_USER_EMAIL=<stg-test-account>
+TEST_USER_PASSWORD=...
+```
+```bash
+ccqa run auth/login --profile stg    # same spec, stg values
+```
+- Name is free-form (`stg`/`prd` are conventions); a path separator / `..` / leading dot is rejected, and a missing profile exits 2. Only the name is logged, never values.
+- Format is a small `.env` subset (`KEY=value`, `#` comments, `export`, quotes). Profile values **override** the inherited environment.
+- Without `--profile`, ccqa auto-loads `<cwd>/.env` if present (like dotenv); with neither, `${VAR}` resolves against the existing `process.env` (e.g. `direnv`).
+**Secrets:** gitignore any profile that holds plaintext secrets. ccqa only parses `.env` files — it doesn't resolve secret-manager references — so to keep secrets off disk, drop `--profile` and run ccqa under your secret manager instead (e.g. `op run --env-file=.ccqa/profiles/stg.env -- ccqa run ...`), which injects the resolved values into `process.env` for ccqa to read.
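
To make the profile rules concrete, here is a minimal sketch of name validation, `.env`-subset parsing, override order, and `${VAR}` resolution. It is an illustration under assumptions, not ccqa's parser: all four function names are hypothetical, inline trailing comments are not handled, and throwing on an unset variable is a choice made for the sketch (the README does not say what ccqa does in that case).

```typescript
// Illustrative only: names and parsing details are assumptions.
type Env = Record<string, string | undefined>;

// Reject names containing a path separator, "..", or a leading dot.
function isSafeProfileName(name: string): boolean {
  return (
    name.length > 0 &&
    !name.startsWith(".") &&
    !name.includes("..") &&
    !/[\\/]/.test(name)
  );
}

// Parse a small .env subset: KEY=value, # comments, optional `export`, quotes.
function parseEnvFile(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const m = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/.exec(line);
    if (m === null) continue; // skip lines outside the subset
    const key = m[1] ?? "";
    let value = (m[2] ?? "").trim();
    const first = value.charAt(0);
    if (value.length >= 2 && (first === '"' || first === "'") && value.endsWith(first)) {
      value = value.slice(1, -1); // strip one pair of matching quotes
    }
    out[key] = value;
  }
  return out;
}

// Profile values override the inherited environment.
function mergeEnv(inherited: Env, profile: Record<string, string>): Env {
  return { ...inherited, ...profile };
}

// Substitute ${VAR} references; this sketch throws when VAR is unset.
function resolveVars(template: string, env: Env): string {
  return template.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const value = env[name];
    if (value === undefined) throw new Error(`unset variable: ${name}`);
    return value;
  });
}
```

Passing the environment in as a plain object, rather than reading `process.env` directly, keeps the override order easy to test in isolation.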
 ## Live specs (`mode: live`)
 For specs declared `mode: live` in their spec.yaml, `ccqa run` skips codegen entirely: Claude executes each spec step against `agent-browser` directly, judges whether the step's `expected` outcome holds, and saves a PNG screenshot before and after every step. Use this mode when:
@@ -175,11 +207,82 @@ ccqa run --retry 2 tasks/create-and-complete
 Constraints on selectors / `agent-browser` subcommands that apply during `ccqa record` (no `eval`, no `@ref`, no bare-tag positional `find`, no chained agent-browser calls) are **relaxed** for live specs — Claude can use any subcommand and any selector style because there is no replay contract to honour.
-### Per-project guidance (`.ccqa/prompts/run-nd.user.md`)
+### Pre-authenticated state (`statePath:`)
+By default each `ccqa run` of a live spec spins up a fresh `agent-browser` session and starts signed-out. That keeps runs hermetic but forces every device-trust gate (Slack "we don't recognize this browser", Google's unfamiliar-device prompt, MFA challenges, …) to fire on every run.
+To skip them, save an authenticated browser state to a JSON file once locally and point the spec at it:
+```yaml
+title: Slack App Home — non-admin access denied
+mode: live
+statePath: .ccqa/sessions/slack-stg.json   # cookies + localStorage to restore
+steps:
+  - ...
+```
+ccqa resolves the path against the project root and passes `--state <path>` to every `agent-browser` invocation in the run (including ccqa's own screenshot calls). The file is **read-only** — `--state` loads it but never writes back to it. Re-running locally or in CI does not mutate it.
+Bootstrap once locally:
+```bash
+# 1. Log in interactively in a headed browser.
+agent-browser --headed open https://app.slack.com
+# …complete login + device-trust prompts by hand…
+# 2. Snapshot cookies + localStorage to the path the spec references.
+mkdir -p .ccqa/sessions
+agent-browser state save .ccqa/sessions/slack-stg.json
+agent-browser close
+# 3. ccqa run reuses the saved state — no login prompt.
+ccqa run slack/app-home-non-admin-access-denied
+```
+Add `.ccqa/sessions/` to `.gitignore` — these files contain live auth cookies and must never be committed.
+#### CI: bring the state file with you
+The state file that `statePath:` references lives entirely inside `.ccqa/` and never touches `~/`. CI reuses the state by writing the file to the same path the spec already references:
+```bash
+# Locally, after the interactive bootstrap above:
+base64 -i .ccqa/sessions/slack-stg.json | pbcopy
+# paste into your CI secret store as CCQA_SLACK_STG_STATE_B64
+```
+```yaml
+# .github/workflows/ccqa.yml (sketch)
+- name: Restore agent-browser state
+  env:
+    CCQA_SLACK_STG_STATE_B64: ${{ secrets.CCQA_SLACK_STG_STATE_B64 }}
+  run: |
+    mkdir -p .ccqa/sessions
+    printf '%s' "$CCQA_SLACK_STG_STATE_B64" | base64 -d \
+      > .ccqa/sessions/slack-stg.json
+- name: Run live specs
+  env:
+    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+  run: pnpm ccqa run --report
+```
+Caveats:
+- **Expiry.** Whatever the upstream service's "remember this device" window is (Slack ≈ 30 days, others vary), the cookies in the state file eventually expire and CI starts failing on the device-trust gate again. Re-bootstrap locally and rotate the secret.
+- **Treat the file as a credential.** It contains live auth cookies. Store it in your CI secret manager (GitHub Actions encrypted secrets, Vault, …) and never commit it.
+- **Deterministic specs ignore `statePath:`.** Today it only affects `mode: live`; vitest-replayed specs always run isolated.
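
A truncated or mis-pasted secret is another way the restore step can fail, separate from cookie expiry. As a hedged sketch, a pre-flight check can confirm that the base64 secret decodes to valid JSON before anything is written to `.ccqa/sessions/`. `checkStateSecret` is a hypothetical helper, and the only property it relies on is that the state file is JSON, as stated above; it assumes nothing about the keys inside.

```typescript
// Hypothetical pre-flight check; not part of ccqa or agent-browser.
type CheckResult = { ok: true; value: unknown } | { ok: false; reason: string };

function checkStateSecret(b64: string): CheckResult {
  try {
    // Secrets pasted from `base64` output may contain line wraps; drop them.
    const binary = atob(b64.replace(/\s+/g, ""));
    // atob yields one char per byte; rebuild the bytes and decode as UTF-8.
    const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
    const text = new TextDecoder().decode(bytes);
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, reason };
  }
}
```

The check reports only whether decoding and parsing succeeded, so it never needs to print cookie values to the CI log.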
+### Per-project guidance (`.ccqa/prompts/live.user.md` + `live.agent.md`)
+ccqa's live-mode system prompt is deliberately product-agnostic. Anything specific to **your** project — staging URLs, login flow quirks, rich-editor types, common access-denied wording — belongs in two sibling files (run `ccqa init` to scaffold both):
+- `.ccqa/prompts/live.user.md` — human-maintained stable guidance.
+- `.ccqa/prompts/live.agent.md` — auto-updated by `ccqa run --update-agent-prompt` from each run's summary. You can hand-edit it, but the next `--update-agent-prompt` run may rewrite the whole file; durable rules should live in `live.user.md`.
-ccqa's live-mode system prompt is deliberately product-agnostic. Anything specific to **your** project — staging URLs, login flow quirks, rich-editor types, common access-denied wording — belongs in `.ccqa/prompts/run-nd.user.md`. The file is read once per invocation and appended to the system prompt under a "Project-specific guidance" heading.
+Both files (when present) are read once per invocation and appended to the system prompt under "Project-specific guidance". The `ccqa record` (trace) side has the same split: `record.user.md` + `record.agent.md`, refreshed by `ccqa record --update-agent-prompt`.
-Keep it short. A page or two of focused notes beats a long handbook — Claude has the spec's `expected` text to work from, the file is for the *non-obvious* product knowledge that isn't in any single spec. Examples of what's useful here:
+Keep them short. A page or two of focused notes beats a long handbook — Claude has the spec's `expected` text to work from, these files are for the *non-obvious* product knowledge that isn't in any single spec. Examples of what's useful here:
 - "the rich text editor is `[contenteditable='true']` — use `fill`, not keystrokes"
 - "login redirects through an IDP service-selection screen; you can skip it by opening the destination URL directly"
@@ -189,9 +292,9 @@ Examples of what does **not** belong:
 - per-spec details (those belong in the spec's `instruction` / `expected`)
 - restating the STEP_RESULT contract (already in the system prompt)
-- copy-pasted style guidelines from `trace.user.md` (the relaxed-constraint mode doesn't need them)
+- copy-pasted style guidelines from `record.user.md` (the relaxed-constraint mode doesn't need them)
-The file is capped at 32 KiB; anything beyond that is truncated with a warning.
+The combined bundle is capped at 32 KiB; anything beyond that is truncated with a warning.
 ## License