prose-qa 0.1.0 → 0.2.0

package/README.md CHANGED
@@ -1,570 +1,133 @@
1
1
  # Prose-QA
2
2
 
3
- Agent harness for **end-to-end regression testing** of web apps. Scenarios are written in natural language with explicit verification checkpoints. An LLM agent executes them using [Vercel `agent-browser`](https://github.com/vercel-labs/agent-browser) via bash, with no browser wrapper in TypeScript.
3
+ Write what you want to test in plain text, and let Prose-QA do the rest. This autonomous, LLM-driven testing engine executes complex web workflows and validation checkpoints without the overhead of heavy browser wrappers, bringing frictionless QA to modern development.
4
4
 
5
- ## How-to (step by step)
5
+ Requires **Node.js 24+**, `PQA_LLM_API_KEY`, and `llm.provider` / `llm.model` in config.
6
6
 
7
- See **[docs/HOWTO.md](docs/HOWTO.md)** for a progressive guide (scenario format → agent-browser → debug/run → CI → auth → MCP → record → cache → healing → analyze).
8
-
9
- ## Features
10
-
11
- - **Natural language scenarios** with `# Goal`, `# Steps`, and `# Then` checkpoints
12
- - **Agent Skills** ([agentskills.io](https://agentskills.io/)) — Anthropic-compatible `SKILL.md` format
13
- - **Pinned agent-browser skill** vendored at `skills/agent-browser/` (installed via `postinstall` on `npm ci` / `npm install`)
14
- - **CI + local debug** modes with HTML/JSON reports
15
-
16
- ## Install
7
+ ## Quick start
17
8
 
18
9
  ```bash
19
- npm install -g prose-qa
20
- # or in a project:
21
10
  npm install prose-qa
22
- npx pqa --help
23
- ```
24
-
25
- Requires Node.js 20+ and an LLM API key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `FIREWORKS_API_KEY`, `OPENROUTER_API_KEY`, etc. depending on config).
26
-
27
- On first install, `agent-browser` downloads its browser binary via `postinstall`. In CI, run:
28
-
29
- ```bash
30
- npx agent-browser install --with-deps
31
- ```
32
11
 
33
- ## New project setup
34
-
35
- 1. Install the package in your app repo (or globally).
36
- 2. Create `pqa.config.json` in your project root (or use `pqa config <key> <value>` to set values incrementally):
37
-
38
- ```bash
39
12
  pqa config llm.provider anthropic
40
13
  pqa config llm.model claude-sonnet-4-20250514
41
- ```
42
-
43
- Supported config filenames (first match wins): `pqa.config.json`, `pqa.config.mjs`, `pqa.config.js`, `pqa.config.ts`.
44
-
45
- 3. Create scenarios under `scenarios/` (see [0_hello-world.md](scenarios/0_hello-world.md)).
46
- 4. Copy [`.env.example`](.env.example) to `.env.development.local` (or set env vars in CI) and fill in secrets.
47
- 5. Run:
48
14
 
49
- ```bash
50
- export ANTHROPIC_API_KEY=...
15
+ export PQA_LLM_API_KEY=...
51
16
  pqa run scenarios/**/*.md --tags smoke
52
17
  ```
53
18
 
54
- Bundled harness assets (`prompt/`, `skills/`) ship inside the npm package. Your project only needs `pqa.config.*`, `scenarios/`, and optional `.agents/skills/` overrides.
55
-
56
- ## Development (this repo)
57
-
58
- ```bash
59
- git clone https://github.com/FreakDev/Prose-QA.git
60
- cd Prose-QA
61
- npm ci
62
- npm run build
63
-
64
- export ANTHROPIC_API_KEY=...
65
-
66
- # Bundled scenarios target http://127.0.0.1:8080/ — start the demo server first (separate terminal or background)
67
- npm run demo:server
68
-
69
- # CI mode
70
- npm run dev -- run scenarios/**/*.md --tags example
71
-
72
- # Debug single scenario
73
- npm run dev -- debug scenarios/0_hello-world.md --verbose
74
-
75
- # Auth demo (demo server with hardcoded credentials)
76
- export PQA_TEST_EMAIL=demo@pqa.local PQA_TEST_PASSWORD=demo-password
77
- npm run dev -- debug scenarios/1_example-authenticated.md --verbose
78
- ```
79
-
80
- The demo server (`npm run demo:server` → `scripts/demo-server.mjs`) serves `/` (Hello World), `/login`, and protected `/projects`. Credentials match `.env.example`.
81
-
82
- See [CONTRIBUTING.md](CONTRIBUTING.md) for pull request guidelines.
19
+ **New project checklist**
83
20
 
84
- ## MCP server (Cursor, Claude Desktop, …)
21
+ 1. Install the package in your app repo (or globally with `npm install -g prose-qa`).
22
+ 2. Create `pqa.config.json` — use `pqa config <key> <value>` or copy the [minimal example](docs/CONFIG.md#minimal-example).
23
+ 3. Add scenarios under `scenarios/` (see [0_hello-world.md](scenarios/0_hello-world.md)).
24
+ 4. Copy [`.env.example`](.env.example) to `.env.development.local` (or set env vars in CI).
25
+ 5. Run `pqa run` or `pqa debug`.
85
26
 
86
- Start the Prose-QA MCP server over stdio so clients can read the **create-pqa-scenario** skill and run scenarios from inline markdown (same format as `scenarios/*.md`):
27
+ On first install, `agent-browser` downloads its browser binary via `postinstall`. In CI:
87
28
 
88
29
  ```bash
89
- pqa mcp
90
- # or from this repo:
91
- npm run mcp
92
- ```
93
-
94
- **Cursor** (`.cursor/mcp.json` in your app repo — `cwd` must be the project with `pqa.config` and env vars):
95
-
96
- ```json
97
- {
98
- "mcpServers": {
99
- "prose-qa": {
100
- "command": "npx",
101
- "args": ["-y", "prose-qa", "mcp"]
102
- }
103
- }
104
- }
30
+ npx agent-browser install --with-deps
105
31
  ```
106
32
 
107
- After `npm run build` in this repo, use `"command": "node"` and `"args": ["dist/cli/index.js", "mcp"]` with `cwd` set to the Prose-QA repo root.
33
+ Bundled harness assets (`prompt/`, `skills/`) ship inside the npm package. Your project only needs `pqa.config.*`, `scenarios/`, and optional `.agents/skills/` overrides.
108
34
 
109
- | Surface | Purpose |
110
- | -------- | -------- |
111
- | Resource `pqa://skill/create-pqa-scenario` | Full create-pqa-scenario `SKILL.md` |
112
- | Tool `get_create_pqa_scenario_skill` | Same skill as text |
113
- | Tool `validate_scenario` | Parse `content` without running the browser |
114
- | Tool `run_scenario` | Execute `content` (requires LLM + browser env) |
115
- | Prompt `author_pqa_scenario` | Template that includes the skill |
35
+ ## What you get
116
36
 
117
- ## Scenario format
37
+ - **Natural language scenarios** — `# Goal`, `# Steps`, and `# Then` checkpoints ([format guide](docs/HOWTO.md#1-scenario-format-goal--steps--then--frontmatter))
38
+ - **Agent Skills** ([agentskills.io](https://agentskills.io/)) — Anthropic-compatible `SKILL.md` format
39
+ - **Pinned agent-browser skill** vendored at `skills/agent-browser/` (installed via `postinstall`)
40
+ - **CI + local debug** modes with HTML/JSON reports
41
+ - **Auth, cache, healing, recording, and analysis** — see [HOWTO](docs/HOWTO.md)
118
42
 
119
- See [prompt/references/scenario-format.md](prompt/references/scenario-format.md).
43
+ ## Documentation
120
44
 
121
- ```markdown
122
- ---
123
- name: checkout-happy-path
124
- tags: [smoke]
125
- auth: admin
126
- url: https://app.example.com
127
- ---
128
45
 
129
- # Goal
130
- As a user, complete checkout.
46
+ | Doc | Purpose |
47
+ | ------------------------------------------------------------ | ------------------------------------------------------------------------------------------ |
48
+ | [docs/HOWTO.md](docs/HOWTO.md) | Step-by-step guide: scenarios → run → CI → auth → MCP → record → cache → healing → analyze |
49
+ | [docs/CONFIG.md](docs/CONFIG.md) | Full configuration reference |
50
+ | [CONTRIBUTING.md](CONTRIBUTING.md) | Pull request guidelines |
51
+ | [SECURITY.md](SECURITY.md) | Vulnerability reporting, secrets, and run artifacts |
52
+ | [recorder-extension/README.md](recorder-extension/README.md) | Chrome extension recorder (WIP) |
131
53
 
132
- # Steps
133
- 1. Add item to cart and proceed to checkout.
134
- 2. Complete payment with test card.
135
54
 
136
- # Then
137
- - url contains "/order-confirmation"
138
- - page shows "Thank you"
139
- ```
55
+ ## CLI
140
56
 
141
- ## Configuration
142
57
 
143
- Prose-QA loads configuration from the bundled defaults ([`pqa.config.ts`](pqa.config.ts) in the npm package), then merges your local overrides. Only keys you set need to appear in your project file.
58
+ | Command | Description |
59
+ | --------------------------------------------------- | -------------------------------------------------- |
60
+ | `pqa config <key> <value>` | Set a value in `pqa.config.json` |
61
+ | `pqa run [globs]` | Run scenarios (headless by default) |
62
+ | `pqa debug [globs]` | Verbose debug run (headed by default) |
63
+ | `pqa clear-cache [scenario]` | Clear scenario replay cache |
64
+ | `pqa auth list` / `clear` / `save` | Manage cached auth profiles |
65
+ | `pqa analyze [run...]` | Post-run analysis and flaky detection (`--last N`) |
66
+ | `pqa record start` / `note` / `checkpoint` / `stop` | Record browser actions → scenario markdown |
67
+ | `pqa skills list` / `show` / `sync` | Discover and inspect agent skills |
68
+ | `pqa mcp` | Start MCP server (Cursor, Claude Desktop, …) |
144
69
 
145
- **Local config files** (first match in the project root wins): `pqa.config.json`, `pqa.config.mjs`, `pqa.config.js`, `pqa.config.ts`.
146
70
 
147
- **CLI helper:** create or update `pqa.config.json` without editing by hand (dot notation for nested keys):
71
+ Tag filters, auth refresh, retries, and cache flags: see [HOWTO §3–§4](docs/HOWTO.md#3-debug-vs-run) and [HOWTO §11](docs/HOWTO.md#11-healing--retries).
148
72
 
149
- ```bash
150
- pqa config llm.provider anthropic
151
- pqa config browser.headed true
152
- pqa config envVars '["PQA_TEST_EMAIL","PQA_TEST_PASSWORD"]'
153
- ```
73
+ **Exit codes:** `0` pass · `1` failure · `2` config/harness error
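The exit codes above lend themselves to a small CI wrapper. The following is a minimal sketch, not part of the `pqa` CLI: the function name `describe_pqa_exit` is hypothetical, and only the three documented codes are mapped.

```bash
# Hypothetical helper (an assumption, not shipped with prose-qa):
# translate a pqa exit status into the meaning documented above.
describe_pqa_exit() {
  case "$1" in
    0) echo "pass" ;;
    1) echo "failure" ;;
    2) echo "config/harness error" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Possible usage in a CI step (assumes `pqa` is on PATH):
#   pqa run scenarios/**/*.md --tags smoke; describe_pqa_exit "$?"
```

Separating `1` from `2` lets a pipeline report a failing scenario differently from a broken setup, for example a missing API key.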
154
74
 
155
- Unknown keys are rejected; only properties that exist in the bundled reference config are allowed.
75
+ ## Configuration
156
76
 
157
- ### Minimal example
77
+ Supported filenames (first match wins): `pqa.config.json`, `pqa.config.mjs`, `pqa.config.js`, `pqa.config.ts`.
158
78
 
159
79
  ```json
160
80
  {
161
81
  "envVars": ["PQA_TEST_EMAIL", "PQA_TEST_PASSWORD"],
162
- "sensitiveEnvVars": ["PQA_TEST_EMAIL", "PQA_TEST_PASSWORD"],
163
82
  "llm": {
164
83
  "provider": "anthropic",
165
84
  "model": "claude-sonnet-4-20250514"
166
- },
167
- "auth": {
168
- "admin": {
169
- "scenario": "login-admin",
170
- "statePath": ".pqa/auth/admin.json"
171
- }
172
85
  }
173
86
  }
174
87
  ```
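The "first match wins" lookup for config filenames can be sketched in shell. This only illustrates the documented precedence order and is not Prose-QA's actual loader; the function name `find_pqa_config` is hypothetical.

```bash
# Hypothetical helper (not part of the pqa CLI): print the first config file
# found in a directory, using the documented precedence order.
find_pqa_config() {
  local dir="${1:-.}"
  local name
  for name in pqa.config.json pqa.config.mjs pqa.config.js pqa.config.ts; do
    if [ -f "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1  # no config file present in this directory
}
```

Under this reading, a project containing both `pqa.config.json` and `pqa.config.ts` would use the JSON file.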
175
88
 
176
- ### Environment variables
177
-
178
- | Variable | Description |
179
- | --- | --- |
180
- | `ANTHROPIC_API_KEY` | Required when `llm.provider` is `anthropic` |
181
- | `OPENAI_API_KEY` | Required when `llm.provider` is `openai` |
182
- | `FIREWORKS_API_KEY` | Required when `llm.provider` is `fireworks` |
183
- | `GOOGLE_GENERATIVE_AI_API_KEY` | Required when `llm.provider` is `google` |
184
- | `OPENROUTER_API_KEY` | Required when `llm.provider` is `openrouter` |
185
- | `PQA_LLM_PROVIDER` | Overrides bundled default `llm.provider` (dev / CI shortcut) |
186
- | `PQA_LLM_MODEL` | Overrides bundled default `llm.model` |
187
-
188
- Ollama does not require an API key env var. Any name listed in `envVars` must be set before a run starts.
189
-
190
- ### All options
191
-
192
- #### `scenariosDir` (string)
193
-
194
- Root directory for scenario markdown files. Set directly in `pqa.config.json`.
195
-
196
- | | |
197
- | --- | --- |
198
- | **Default** | `scenarios`, or `pqa/` when that directory exists and `scenarios/` does not |
199
-
200
- #### `systemPromptPath` (string)
201
-
202
- Path to the agent system prompt markdown file. Relative paths resolve against the project cwd first, then bundled package assets.
203
-
204
- | | |
205
- | --- | --- |
206
- | **Default** | `prompt/SYSTEM.md` (bundled) |
207
-
208
- #### `envVars` (string[])
209
-
210
- Environment variable **names** the agent should know about. Injected into the system prompt at runtime (set / not-set status only — never values). Validated before each run.
211
-
212
- | | |
213
- | --- | --- |
214
- | **Default** | `[]` |
215
-
216
- #### `sensitiveEnvVars` (string[])
217
-
218
- Env var names whose **values** are redacted from transcripts, verdicts, reports, and verbose logs (replaced with `${VAR_NAME}`). If omitted, defaults to `envVars`. The LLM API key for the configured provider is always redacted.
219
-
220
- | | |
221
- | --- | --- |
222
- | **Default** | same as `envVars` |
223
-
224
- ---
225
-
226
- #### `llm` (object)
227
-
228
- LLM provider and model used for test runs, recording generation, and analysis.
229
-
230
- | Key | Type | Default | Description |
231
- | --- | --- | --- | --- |
232
- | `provider` | `"anthropic"` \| `"openai"` \| `"fireworks"` \| `"ollama"` \| `"google"` \| `"openrouter"` | `"anthropic"` | LLM backend |
233
- | `model` | string | `"claude-sonnet-4-20250514"` | Model identifier for the chosen provider |
234
-
235
- ##### `llm.thinking` (object, optional)
236
-
237
- Extended thinking / reasoning. Provider support varies.
238
-
239
- | Key | Type | Default | Description |
240
- | --- | --- | --- | --- |
241
- | `enabled` | boolean | `true` | Enable extended thinking |
242
- | `budgetTokens` | number | `10000` | Thinking token budget (Anthropic, Fireworks, Google, OpenRouter) |
243
- | `reasoningEffort` | `"none"` \| `"minimal"` \| `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` | — | OpenAI reasoning effort; mapped to Anthropic effort, Google thinking level, and OpenRouter reasoning effort. Ollama uses `think` mode only (other fields ignored) |
244
-
245
- ---
246
-
247
- #### `browser` (object)
248
-
249
- Default browser behavior for scenario runs (overridable per run with `--headed` / `--no-headed`).
250
-
251
- | Key | Type | Default | Description |
252
- | --- | --- | --- | --- |
253
- | `headed` | boolean | `false` | Run browser in visible (headed) mode |
254
- | `sessionName` | string | `"pqa"` | agent-browser session name |
255
- | `defaultTimeout` | number | `25000` | Default timeout in milliseconds |
256
-
257
- ---
258
-
259
- #### `skills` (object)
260
-
261
- Agent skill discovery and preloading ([agentskills.io](https://agentskills.io/) `SKILL.md` format).
262
-
263
- | Key | Type | Default | Description |
264
- | --- | --- | --- | --- |
265
- | `dirs` | string[] | `["skills", ".agents/skills"]` | Directories scanned for skills. Relative paths resolve like bundled assets |
266
- | `preloads` | string[] | `["core"]` | Skill names always appended to the system prompt (`core` = vendored agent-browser skill) |
267
-
268
- ---
269
-
270
- #### `agent` (object)
271
89
 
272
- Agent loop limits.
90
+ | Variable | Required when |
91
+ | ------------------ | ---------------------------------------- |
92
+ | `PQA_LLM_API_KEY` | Any cloud `llm.provider` (not `ollama`) |
93
+ | `PQA_LLM_PROVIDER` | Optional env shortcut for `llm.provider` |
94
+ | `PQA_LLM_MODEL` | Optional env shortcut for `llm.model` |
273
95
 
274
- | Key | Type | Default | Description |
275
- | --- | --- | --- | --- |
276
- | `maxTurns` | number | `200` | Maximum agent turns per scenario |
277
- | `bashTimeoutMs` | number | `120000` | Timeout for each bash (agent-browser) command in milliseconds |
278
96
 
279
- ---
97
+ All options, env vars, and a full example: **[docs/CONFIG.md](docs/CONFIG.md)**.
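The key requirement in the table above can be checked before a run. This is a minimal preflight sketch under the assumption that the provider name is passed in by the caller; `check_llm_api_key` is a hypothetical name, and Prose-QA's own validation may differ.

```bash
# Hypothetical preflight (an assumption, not pqa's own check):
# succeed when the documented key requirement is met.
# $1 = value of llm.provider, e.g. "anthropic" or "ollama".
check_llm_api_key() {
  local provider="$1"
  if [ "$provider" = "ollama" ]; then
    return 0  # ollama is listed as not needing PQA_LLM_API_KEY
  fi
  if [ -z "${PQA_LLM_API_KEY:-}" ]; then
    echo "PQA_LLM_API_KEY must be set for provider '$provider'" >&2
    return 1
  fi
  return 0
}
```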
280
98
 
281
- #### `auth` (object)
99
+ ## MCP (Cursor)
282
100
 
283
- Map of auth profile names to login scenario configuration. Consumer scenarios reference a profile via frontmatter `auth: <name>`.
284
-
285
- Each profile key (e.g. `admin`) supports:
286
-
287
- | Key | Type | Default | Description |
288
- | --- | --- | --- | --- |
289
- | `scenario` | string | — | `frontmatter.name` of the on-demand auth scenario (e.g. `"login-admin"`) |
290
- | `statePath` | string | `.pqa/auth/<profile>.json` | agent-browser state file path |
291
-
292
- When a scenario uses `auth: admin`, the harness loads cached state from `statePath` or runs the auth scenario once, saves browser state, then continues. See [Auth (hybrid authStore)](#auth-hybrid-authstore).
293
-
294
- ---
295
-
296
- #### `healing` (object, optional)
297
-
298
- Conservative self-healing: in-run recovery and transient-only scenario retries. See [Self-healing](#self-healing-conservative).
299
-
300
- | Key | Type | Default | Description |
301
- | --- | --- | --- | --- |
302
- | `enabled` | boolean | `true` | Master switch for in-run recovery and transient retry gating |
303
- | `maxRecoveryTurns` | number | `2` | Extra agent turns after a failed verdict (same browser session) |
304
- | `recoverOnUnknown` | boolean | `false` | Allow recovery when failure class is unknown but bash output looks transient |
305
- | `transientPatterns` | string[] | see below | Substrings matched against bash output and checkpoint reasons to classify transient failures |
306
-
307
- Default `transientPatterns`: `timeout`, `timed out`, `not found`, `waiting for`, `navigation`, `net::`, `target closed`, `detached`, `stale`, `interrupted`.
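Substring-based classification with the default patterns might look like the sketch below. The function name `is_transient_failure` is hypothetical, and case-insensitive matching is an assumption; the README only says the patterns are matched as substrings.

```bash
# Hypothetical classifier (not pqa's implementation): succeed when the text
# contains any default transient pattern. Lowercasing the input first is an
# assumption about case handling.
is_transient_failure() {
  local text
  local pattern
  text="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  for pattern in "timeout" "timed out" "not found" "waiting for" "navigation" \
                 "net::" "target closed" "detached" "stale" "interrupted"; do
    case "$text" in
      *"$pattern"*) return 0 ;;
    esac
  done
  return 1
}
```

A message such as `net::ERR_CONNECTION_RESET` would count as transient here, while a plain assertion mismatch would not.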
308
-
309
- CLI equivalents: `--no-healing`, `--retries N`, `--retries-policy transient|always`, `--no-cache`.
310
-
311
- ---
312
-
313
- #### `cache` (object, optional)
314
-
315
- Scenario replay cache settings. See [Scenario replay cache](#scenario-replay-cache).
316
-
317
- | Key | Type | Default | Description |
318
- | --- | --- | --- | --- |
319
- | `dir` | string | `".pqa/cache"` | Directory for per-scenario replay hints |
320
- | `enabled` | boolean | `true` | Master switch (opt-out via `--no-cache`) |
321
-
322
- ---
323
-
324
- #### `recorder` (object, optional)
325
-
326
- Settings for `pqa record`. See [Recording scenarios](#recording-scenarios).
327
-
328
- | Key | Type | Default | Description |
329
- | --- | --- | --- | --- |
330
- | `bridgePort` | number | `17321` | Local HTTP port for the recording event bridge |
331
- | `outputDir` | string | `".pqa/recordings"` | Directory for saved recording sessions |
332
- | `defaultTags` | string[] | `["recorded"]` | Tags added to generated scenario frontmatter |
333
-
334
- ---
335
-
336
- ### Full reference example
101
+ Add to `.cursor/mcp.json` in your app repo (`cwd` must be the project with `pqa.config` and env vars):
337
102
 
338
103
  ```json
339
104
  {
340
- "scenariosDir": "pqa",
341
- "systemPromptPath": "prompt/SYSTEM.md",
342
- "envVars": ["PQA_TEST_EMAIL", "PQA_TEST_PASSWORD"],
343
- "sensitiveEnvVars": ["PQA_TEST_EMAIL", "PQA_TEST_PASSWORD"],
344
- "llm": {
345
- "provider": "anthropic",
346
- "model": "claude-sonnet-4-20250514",
347
- "thinking": {
348
- "enabled": true,
349
- "budgetTokens": 10000,
350
- "reasoningEffort": "high"
351
- }
352
- },
353
- "browser": {
354
- "headed": false,
355
- "sessionName": "pqa",
356
- "defaultTimeout": 25000
357
- },
358
- "skills": {
359
- "dirs": ["skills", ".agents/skills"],
360
- "preloads": ["core"]
361
- },
362
- "agent": {
363
- "maxTurns": 200,
364
- "bashTimeoutMs": 120000
365
- },
366
- "auth": {
367
- "admin": {
368
- "scenario": "login-admin",
369
- "statePath": ".pqa/auth/admin.json"
105
+ "mcpServers": {
106
+ "prose-qa": {
107
+ "command": "npx",
108
+ "args": ["-y", "prose-qa", "mcp"]
370
109
  }
371
- },
372
- "healing": {
373
- "enabled": true,
374
- "maxRecoveryTurns": 2,
375
- "recoverOnUnknown": false,
376
- "transientPatterns": [
377
- "timeout",
378
- "timed out",
379
- "not found",
380
- "waiting for",
381
- "navigation",
382
- "net::",
383
- "target closed",
384
- "detached",
385
- "stale",
386
- "interrupted"
387
- ]
388
- },
389
- "recorder": {
390
- "bridgePort": 17321,
391
- "outputDir": ".pqa/recordings",
392
- "defaultTags": ["recorded"]
393
- },
394
- "cache": {
395
- "dir": ".pqa/cache",
396
- "enabled": true
397
- }
398
- }
399
- ```
400
-
401
- ## CLI
402
-
403
- | Command | Description |
404
- | --- | --- |
405
- | `pqa config <key> <value>` | Set a value in `pqa.config.json` |
406
- | `pqa run [globs]` | Run scenarios (headless by default) |
407
- | `pqa clear-cache [scenario]` | Clear scenario replay cache |
408
- | `pqa debug [globs]` | Verbose debug run (headed by default, supports `--tag` / `--tags`) |
409
- | `pqa skills list` | List discovered skills |
410
- | `pqa skills show <name>` | Print skill body |
411
- | `pqa skills sync` | Re-vendor agent-browser skill (dev repo only) |
412
- | `pqa auth list` | List cached auth profiles in the auth store |
413
- | `pqa auth clear [profile]` | Clear cached auth state |
414
- | `pqa auth save <name>` | Run the configured auth scenario and save state |
415
- | `pqa analyze [run...]` | Heuristic + LLM analysis, interactive patch review (REPL); multi-run flaky detection with `--last N` |
416
- | `pqa record start` | Start headed recording session (browser + event bridge) |
417
- | `pqa record note <text>` | Add a comment to the active recording |
418
- | `pqa record checkpoint <text>` | Add a Then-section hint |
419
- | `pqa record stop` | Stop recording and generate `scenarios/recorded/*.md` via LLM |
420
- | `pqa record generate <dir>` | Regenerate scenario markdown from a saved recording |
421
-
422
- Tag filters on `run` and `debug` can express AND/OR/NOT combinations:
423
-
424
- ```bash
425
- # AND: scenario must have both tags
426
- pqa run scenarios/**/*.md --tags smoke,checkout
427
-
428
- # AND with NOT: scenario must have p0 and must not have smoke
429
- pqa run scenarios/**/*.md --tags p0,!smoke
430
-
431
- # OR: scenario may have either tag
432
- pqa run scenarios/**/*.md --tag smoke --tag checkout
433
-
434
- # OR with NOT: scenario either lacks p0 or has smoke
435
- pqa run scenarios/**/*.md --tag !p0 --tag smoke
436
-
437
- # Combined: (smoke AND checkout) OR auth
438
- pqa run scenarios/**/*.md --tags smoke,checkout --tag auth
439
- ```
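The tag semantics in the examples above (each `--tags` or `--tag` value is an AND-group, `!` negates a term, and groups are OR-ed) can be modeled in a few shell functions. This is a sketch of the documented behavior only; the function names are hypothetical and this is not how `pqa` parses its flags.

```bash
# Hypothetical helpers illustrating the documented semantics; not pqa's code.

# has_tag TAG TAGS...: succeed when TAG is among the remaining arguments.
has_tag() {
  local wanted="$1"
  local tag
  shift
  for tag in "$@"; do
    if [ "$tag" = "$wanted" ]; then return 0; fi
  done
  return 1
}

# matches_group GROUP TAGS...: GROUP is one flag value such as "p0,!smoke".
# Every term must hold: plain terms must be present, "!" terms absent.
matches_group() {
  local group="$1"
  local term
  shift
  for term in $(printf '%s' "$group" | tr ',' ' '); do
    case "$term" in
      '!'*)
        # Negated term: strip the leading "!" and require absence.
        if has_tag "${term#?}" "$@"; then return 1; fi
        ;;
      *)
        if ! has_tag "$term" "$@"; then return 1; fi
        ;;
    esac
  done
  return 0
}

# scenario_selected "TAGS" GROUP...: TAGS is a space-separated tag list.
# The scenario is selected when at least one group matches (OR).
scenario_selected() {
  local tags="$1"
  local group
  shift
  for group in "$@"; do
    # $tags is intentionally unquoted so it splits into one word per tag.
    if matches_group "$group" $tags; then return 0; fi
  done
  return 1
}
```

For example, `scenario_selected "p0 smoke" 'p0,!smoke'` fails because `smoke` is present. In an interactive bash session an unquoted `!` can trigger history expansion, so single-quoting values such as `'p0,!smoke'` is the safer habit.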
440
-
441
- Use `--auth-refresh` on `run` / `debug` to re-run auth scenarios and refresh the store.
442
-
443
- ## Scenario replay cache
444
-
445
- After a scenario passes, PQA runs a secondary LLM pass on the run transcript to produce **replay hints** under `.pqa/cache/<scenario-name>/` (`hints.md` + `meta.json`). On the next run, those hints are injected into the agent system prompt (like a skill) so the agent can follow proven `agent-browser` paths and avoid repeating costly recovery loops.
446
-
447
- ```bash
448
- # First run: agent executes; hints are generated on pass
449
- pqa run scenarios/lapresse/homepage-smoke.md
450
-
451
- # Second run: agent runs with cached hints (if scenario content unchanged)
452
- pqa run scenarios/lapresse/homepage-smoke.md
453
-
454
- # Skip cache read/write
455
- pqa run scenarios/**/*.md --no-cache
456
-
457
- # Clear one or all caches
458
- pqa clear-cache lapresse-homepage-smoke
459
- pqa clear-cache
460
- ```
461
-
462
- Cache is **invalidated** when the effective scenario content changes (Goal, Steps, Then, frontmatter, and expanded includes — detected via content hash). Hints are **merged and refined** on each subsequent pass. Failed runs do not update the cache.
463
-
464
- Config (optional):
465
-
466
- ```json
467
- {
468
- "cache": {
469
- "dir": ".pqa/cache",
470
- "enabled": true
471
110
  }
472
111
  }
473
112
  ```
474
113
 
475
- ## Recording scenarios
476
-
477
- Record user actions and generate a draft scenario markdown file:
478
-
479
- ```bash
480
- pqa record start --url http://localhost:3000/projects
481
- pqa record note "intentionally invalid date"
482
- # interact in the browser
483
- pqa record checkpoint 'page shows "Projects"'
484
- pqa record stop --name my-flow
485
- pqa debug scenarios/recorded/my-flow.md --verbose --headed
486
- ```
487
-
488
- Events are stored under `.pqa/recordings/<timestamp>/events.jsonl`. On each interaction, the bridge runs `agent-browser snapshot -i`, matches the target to a snapshot ref (`snapshot.ref`, `snapshot.description`), and saves the tree under `snapshots/<ts>.json`. A background bridge process keeps receiving browser events until `pqa record stop` (so you can run `record note` / `record checkpoint` in another terminal while clicking in the browser). Generation uses the same LLM config as test runs (`prompt/RECORD.md`). Recorder options: see [`recorder`](#recorder-object-optional) in Configuration.
489
-
490
- **Chrome extension (WIP):** load unpacked from [recorder-extension/](recorder-extension/README.md), run `pqa record start --connect 9222`, and use the popup for notes/checkpoints.
491
-
492
- **Exit codes:** `0` pass · `1` failure · `2` config/harness error
493
-
494
- ## System prompt & skills
495
-
496
- | File / skill | Role |
497
- | --- | --- |
498
- | [prompt/SYSTEM.md](prompt/SYSTEM.md) | Agent system prompt (workflow, verdict schema, rules) |
499
- | `core` | Vendored agent-browser skill at `skills/agent-browser/` (bundled with the package) |
500
-
501
- `prompt/SYSTEM.md` is loaded as the system prompt; `core` is appended as a supplemental skill. Browser control stays in bash — the agent runs `agent-browser` commands directly.
502
-
503
- The system prompt enforces an **Observe-Act-Verify loop**: snapshot before each UI interaction, one interaction command per bash call, re-snapshot after page changes, and targeted reasoning only at ambiguous refs, failures, or before the final verdict. See [prompt/SYSTEM.md](prompt/SYSTEM.md) for details.
504
-
505
- ## Auth (hybrid authStore)
506
-
507
- Map auth profiles to on-demand login scenarios via the [`auth`](#auth-object) config block. See [scenario format — Auth](prompt/references/scenario-format.md#auth-hybrid-authstore).
508
-
509
- When a consumer scenario uses `auth: admin`, the harness loads cached state from `.pqa/auth/` or runs `login-admin` once, saves browser state, then continues.
510
-
511
- ```bash
512
- # Inspect / invalidate cache
513
- pqa auth list
514
- pqa auth clear admin
515
-
516
- # Force fresh login
517
- pqa run scenarios/**/*.md --auth-refresh
518
- ```
519
-
520
- **CI:** pass test credentials as GitHub Secrets → env vars (`PQA_TEST_EMAIL`, etc.) referenced in auth scenario Steps. Optionally pre-seed state from a base64 secret before the run.
521
-
522
- Legacy manual capture (runs the configured auth scenario):
523
-
524
- ```bash
525
- pqa auth save admin
526
- ```
527
-
528
- ## Self-healing (conservative)
529
-
530
- When [`healing.enabled`](#healing-object-optional) is `true` (default), Prose-QA can:
531
-
532
- 1. **In-run recovery** — after a failed verdict, retry verification of failed checkpoints only (same browser session), for **transient** failures (timeouts, stale refs).
533
- 2. **Scenario retries** — `--retries N` with `--retries-policy transient` (default) re-runs the whole scenario only when the failure is classified transient. Use `--no-healing` for legacy behavior (any failure retries).
114
+ Tools: `validate_scenario`, `run_scenario`, `get_create_pqa_scenario_skill`. Details: [HOWTO §8](docs/HOWTO.md#8-mcp--author-skill).
534
115
 
535
- Checkpoints are never relaxed automatically. Passes after recovery are marked `healing.used: true` in reports.
116
+ ## Development (this repo)
536
117
 
537
118
  ```bash
538
- # CI: one retry for flakes only
539
- pqa run scenarios/**/*.md --retries 1 --retries-policy transient
119
+ git clone https://github.com/FreakDev/Prose-QA.git
120
+ cd Prose-QA
121
+ npm ci && npm run build
540
122
 
541
- # Analyze the latest run (interactive REPL)
542
- pqa analyze
123
+ export PQA_LLM_API_KEY=...
543
124
 
544
- # Compare the 10 most recent runs for flaky scenarios
545
- pqa analyze --last 10
125
+ npm run demo:server # terminal 1, serves http://127.0.0.1:8080/
126
+ npm run dev -- debug scenarios/0_hello-world.md --verbose
546
127
  ```
547
128
 
548
- All healing options: see [`healing`](#healing-object-optional) in Configuration.
549
-
550
- ## Reports
551
-
552
- Runs write artifacts to `.pqa/runs/<runId>/`:
553
-
554
- - `report.json` / `report.html` — summary
555
- - `analyze.json` / `analyze-llm.json` — written by `pqa analyze` (single run)
556
- - `.pqa/analyze/<timestamp>/analyze-flaky.json` / `analyze-llm.json` — multi-run flaky analysis
557
- - `<scenario>/transcript.json` — bash commands + agent messages
558
- - `<scenario>/verdict.json` — structured pass/fail
559
-
560
- ## CI
561
-
562
- See [.github/workflows/smoke_tests.yml](.github/workflows/smoke_tests.yml). Unit tests run on every push. Optional smoke PQA runs require `ANTHROPIC_API_KEY` (or configure another provider via env).
563
-
564
- ## Security
565
-
566
- See [SECURITY.md](SECURITY.md) for vulnerability reporting and guidance on run artifacts and credentials.
129
+ See [CONTRIBUTING.md](CONTRIBUTING.md) and [docs/HOWTO.md](docs/HOWTO.md) for the full walkthrough.
567
130
 
568
131
  ## License
569
132
 
570
- MIT — see [LICENSE](LICENSE).
133
+ MIT — see [LICENSE](LICENSE).
@@ -1 +1 @@
1
- {"version":3,"file":"llm-model.d.ts","sourceRoot":"","sources":["../../src/agent/llm-model.ts"],"names":[],"mappings":"AAMA,OAAO,KAAK,EAAE,aAAa,EAAE,MAAM,IAAI,CAAC;AACxC,OAAO,KAAK,EAAE,SAAS,EAAE,MAAM,oBAAoB,CAAC;AAEpD,gFAAgF;AAChF,wBAAgB,cAAc,CAAC,MAAM,EAAE,SAAS,GAAG,aAAa,CAoB/D"}
1
+ {"version":3,"file":"llm-model.d.ts","sourceRoot":"","sources":["../../src/agent/llm-model.ts"],"names":[],"mappings":"AAMA,OAAO,KAAK,EAAE,aAAa,EAAE,MAAM,IAAI,CAAC;AAExC,OAAO,KAAK,EAAE,SAAS,EAAE,MAAM,oBAAoB,CAAC;AASpD,gFAAgF;AAChF,wBAAgB,cAAc,CAAC,MAAM,EAAE,SAAS,GAAG,aAAa,CAwB/D"}