npm - @exodus/xqa - Versions diffs - 1.3.0 → 1.5.0 - Mend

@exodus/xqa 1.3.0 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md +150 -96
package/dist/skills/xqa-spec/AGENTS.md +20 -15
package/dist/skills/xqa-spec/SKILL.md +73 -18
package/dist/xqa.cjs +9365 -2158
package/package.json +12 -11

package/README.md CHANGED

@@ -1,166 +1,220 @@
 # @exodus/xqa
-CLI for running AI-powered QA agents against Exodus mobile apps on iOS.
+AI-powered QA agent CLI for Exodus applications.
-## Prerequisites
+## Overview
-- Node >= 22
-- pnpm
-- An Anthropic API key
+`xqa` automates mobile app QA by connecting to physical devices or emulators and running intelligent exploration and spec-based testing. The CLI orchestrates the pipeline that spawns agents to interact with your app, capture screenshots, and generate findings based on user-defined specs or breadth-first exploration.
-## Installation
+The tool manages configuration, project initialization, session state tracking, and interactive review workflows for triaging findings.
-From the monorepo root:
+## Commands
-```bash
-pnpm install
-```
+### init
-Then build and link the CLI globally:
+Initialize a new xqa project in the current directory.
+Creates a `.xqa/` directory with templates and subdirectories for specs, designs, and suites. Installs the `xqa-spec` skill for creating test specs.
 ```bash
-pnpm build:link   # build + link `xqa` into PATH
+xqa init
 ```
-For active development:
+### explore [prompt]
+Run the explorer agent; omit prompt for a full breadth-first sweep.
+Optional focus hint for the explorer agent. Omit to explore the entire app from the starting state. Generates a findings JSON file in `.xqa/output/` and prints the path upon completion.
 ```bash
-pnpm dev:link     # build, link, and watch for changes
+xqa explore                          # breadth-first exploration
+xqa explore "test the login flow"    # focused exploration
+xqa explore -v prompt,screen         # verbose output for categories
+xqa explore -v                       # verbose output for all categories
 ```
-## Setup
+Flag: `-v, --verbose [categories]` — Log categories (prompt, tools, screen, memory). Default: all if flag is present without value.
-Copy the example env file and fill in your values:
+### spec [spec-file]
+Run the explorer agent against a spec file.
+Loads a spec markdown file from `.xqa/specs/` (or an absolute path) and executes the agent against it. Spec files define entry points, steps, and optional timeouts. Omit the argument to pick from available specs interactively.
 ```bash
-cp .env.example .env.local
+xqa spec                                      # interactive spec picker
+xqa spec .xqa/specs/authentication.test.md   # explicit spec file
+xqa spec -v tools,memory                      # verbose output
 ```
-`.env.local` is loaded automatically at startup.
+Flag: `-v, --verbose [categories]` — Same as explore.
-## Environment Variables
+Spec file format (YAML frontmatter + markdown):
-| Variable                       | Required | Default          | Description                                                                                 |
-| ------------------------------ | -------- | ---------------- | ------------------------------------------------------------------------------------------- |
-| `ANTHROPIC_API_KEY`            | Yes      | —                | Anthropic API key                                                                           |
-| `GOOGLE_GENERATIVE_AI_API_KEY` | No       | —                | Gemini key — enables video analysis; required for `xqa analyse`                             |
-| `QA_RUN_ID`                    | No       | auto-generated   | Fixed run ID; auto-incremented when omitted                                                 |
-| `QA_EXPLORE_TIMEOUT_SECONDS`   | No       | —                | Max wall-clock time for an explore or spec run                                              |
-| `QA_WALLET_MNEMONIC`           | No       | —                | Wallet mnemonic; agent restores wallet before exploring when set                            |
-| `QA_BUILD_ENV`                 | No       | `prod`           | `dev` or `prod`; `dev` mode ignores debug overlays                                          |
-| `QA_STARTUP_STATE`             | No       | —                | `portfolio`, `new-wallet`, or `restore-wallet`; unset means app starts in its current state |
-| `QA_DESIGNS_DIR`               | No       | `./.xqa/designs` | Design artboards directory; enables visual regression checks when set                       |
+```markdown
+---
+feature: 'Feature Name'
+entry: 'Screen name or navigation path'
+timeout: 300
+---
-## Commands
+# Spec content
+```
+### review [findings-path]
-### `xqa explore [prompt]`
+Review findings and mark false positives.
-Runs the explorer agent against the live simulator. Without a prompt the agent sweeps the entire app. With a prompt it focuses on the described flow.
+Interactive session for triaging findings generated by explore or spec runs. Displays findings with confidence scores, steps, and screenshots. Mark findings as false positives (with optional reason) or undo previous dismissals. Saves dismissals to `.xqa/dismissals.json`. Defaults to the last findings path if omitted.
 ```bash
-xqa explore
-xqa explore "Try to send Bitcoin to an external address"
-xqa explore --verbose
+xqa review                                      # use last findings file
+xqa review .xqa/output/findings-abc123.json    # explicit path
 ```
-Startup state (`QA_STARTUP_STATE`) controls what the agent sees on launch:
+### analyse [video-path]
+Analyse a session recording with Gemini.
-- `portfolio` — main assets screen (default)
-- `new-wallet` — onboarding screen; agent taps through setup
-- `restore-wallet` — onboarding screen; agent restores wallet using `QA_WALLET_MNEMONIC`
+Requires `GOOGLE_GENERATIVE_AI_API_KEY` in environment. Analyzes a video file recorded during exploration and outputs findings as JSON.
+```bash
+xqa analyse /path/to/video.mp4
+```
-When `GOOGLE_GENERATIVE_AI_API_KEY` is set, a Gemini video analyser runs automatically after the explorer finishes.
+### completion <shell>
-### `xqa spec <spec-file>`
+Output shell completion script.
-Runs the explorer against a markdown spec file. The agent navigates to the entry point defined in the frontmatter and verifies each described step.
+Generate completion script for bash or zsh. Pipe output to shell config file to enable tab completion.
 ```bash
-xqa spec path/to/send-flow.md
-xqa spec path/to/send-flow.md --verbose
+xqa completion bash  # generate bash completions
+xqa completion zsh   # generate zsh completions
 ```
-Spec file format:
+## Configuration
-```markdown
----
-feature: Send Flow
-entry: Assets list
-max_steps: 40
----
+Configuration is loaded from environment variables and `.env.local`:
-Steps describing the flow to verify...
-```
+- `ANTHROPIC_API_KEY` (required) — Anthropic Claude API key for agent reasoning
+- `GOOGLE_GENERATIVE_AI_API_KEY` (optional) — Google Generative AI key for video analysis
+- `QA_RUN_ID` (optional) — Custom run identifier; defaults to auto-generated
+- `QA_EXPLORE_TIMEOUT_SECONDS` (optional) — Exploration timeout in seconds
+- `QA_BUILD_ENV` (optional) — Build environment: `dev` or `prod` (default: prod)
+## Architecture
-| Field       | Required | Description                                        |
-| ----------- | -------- | -------------------------------------------------- |
-| `feature`   | Yes      | Human-readable feature name                        |
-| `entry`     | Yes      | Screen name the agent navigates to before starting |
-| `max_steps` | No       | Maximum number of agent steps                      |
+Key files and directories:
-### `xqa analyse <video-path>`
+- `src/index.ts` — CLI entry point; wires commander commands and manages graceful shutdown via process locks
+- `src/commands/` — Command implementations (init, explore, spec, review, analyse, completion)
+- `src/core/` — Pure functions: spec parsing, completion generation, verbose option parsing, last-path tracking
+- `src/shell/` — I/O wrappers: file reading, device discovery, app context loading
+- `src/config.ts`, `src/config-schema.ts` — Configuration loading and validation with Zod
+- `src/review-session.ts` — Interactive finding review loop with dismissal tracking
+- `src/spec-frontmatter.ts` — Spec markdown frontmatter parsing (YAML)
+- `src/spec-slug.ts` — Spec filename to slug derivation for output organization
+- `src/pid-lock.ts` — Process-level mutual exclusion to prevent concurrent runs
-Analyses a session recording with Gemini. Requires `GOOGLE_GENERATIVE_AI_API_KEY`. Prints findings as JSON to stdout.
+## Error Types
+Core error discriminated unions:
+- `ConfigError` — Configuration validation failed (INVALID_CONFIG)
+- `AppContextError` — Failed to read app.md or explore.md (READ_FAILED)
+- `XqaDirectoryError` — No .xqa directory found (XQA_NOT_INITIALIZED)
+- `SpecFrontmatterError` — Malformed spec markdown (MISSING_FRONTMATTER, MISSING_FIELD, PARSE_ERROR)
+- `LastPathError` — No findings path provided and no prior session (NO_ARG_AND_NO_STATE)
+## Development
+Install dependencies:
 ```bash
-xqa analyse .xqa/output/2026-04-10/0001/recording.mp4
+pnpm install
 ```
-### `xqa review [findings-path]`
+Build the CLI:
-Interactive terminal session for reviewing findings and marking false positives. Requires a TTY. Dismissals are persisted to a dismissals store and excluded from future runs.
+```bash
+pnpm run build
+```
+Run tests:
 ```bash
-xqa review .xqa/output/2026-04-10/0001/findings.json
+pnpm run test
+```
+Type check:
-# re-open the last reviewed findings file
-xqa review
+```bash
+pnpm run typecheck
 ```
-### `xqa completion <shell>`
+Lint and format:
-Outputs a shell completion script.
+```bash
+pnpm run lint
+pnpm run lint:fix
+```
+Full quality check (lint, typecheck, test):
 ```bash
-xqa completion zsh >> ~/.zshrc
-xqa completion bash >> ~/.bashrc
+pnpm run check
+pnpm run check:fix
 ```
-## Process Behaviour
+Watch mode (build + re-run on file changes):
-Only one `xqa` instance runs at a time (PID lock). A second invocation while a run is active will exit immediately with an error.
+```bash
+pnpm run dev
+```
-- `Ctrl+C` once: graceful shutdown — the current agent step completes, findings are written, then the process exits
-- `Ctrl+C` twice: force exit
+Link binary globally (symlinks dist/xqa.cjs to ~/.local/bin/xqa):
-## Development
+```bash
+pnpm run build:link
+```
+Unlink binary:
 ```bash
-pnpm dev          # watch build
-pnpm build        # production build
-pnpm build:link   # build + link `xqa` globally
-pnpm dev:link     # watch build + link
-pnpm test         # run Vitest test suite
-pnpm typecheck    # TypeScript type check
-pnpm lint         # ESLint + Prettier check
-pnpm lint:fix     # ESLint + Prettier auto-fix
-pnpm check        # lint + typecheck + test (affected only)
-pnpm check:fix    # lint:fix + typecheck + test (affected only)
+pnpm run build:unlink
 ```
-## Architecture
+## Project Structure
 ```
 src/
-  index.ts                # CLI entry — registers all commands
-  config-schema.ts        # Zod schema for all environment variables
+  index.ts                    # CLI entry point
+  config.ts                   # Config loading and types
+  config-schema.ts            # Zod schema for env vars
+  constants.ts                # Tool lists and timeouts
+  pid-lock.ts                 # Process exclusion lock
+  spec-slug.ts                # Spec file to slug conversion
+  spec-frontmatter.ts         # Spec YAML parsing
+  review-session.ts           # Interactive finding review loop
   commands/
-    explore-command.ts    # xqa explore
-    spec-command.ts       # xqa spec
-    analyse-command.ts    # xqa analyse
-    review-command.ts     # xqa review
-    completion-command.ts # xqa completion
-  prompt-builder.ts       # builds the explorer system prompt from config
+    init-command.ts           # Project initialization
+    explore-command.ts        # Breadth-first exploration
+    spec-command.ts           # Spec-based exploration
+    review-command.ts         # Finding triage workflow
+    analyse-command.ts        # Video analysis
+    completion-command.ts     # Shell completion generation
+  core/
+    parse-verbose.ts          # Verbose flag parsing
+    completion-generator.ts   # Bash/zsh completion script generation
+    last-path.ts              # Last findings path tracking
+  shell/
+    app-context.ts            # Read app.md and explore.md
+    xqa-directory.ts          # Locate .xqa directory
+  __tests__/
+    *.test.ts                 # Test files co-located with src/
 ```
-The CLI is a thin shell over `@qa-agents/pipeline`. It parses env vars, builds a `PipelineConfig`, and calls `runPipeline()`.
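
The new README describes `-v, --verbose [categories]` as accepting the categories `prompt`, `tools`, `screen`, and `memory`, with all categories enabled when the flag is given without a value. A minimal sketch of that parsing rule follows. The function name, the result shape, and the rejection of unknown categories are assumptions made for illustration; the diff does not show the contents of `src/core/parse-verbose.ts`.

```typescript
// Hypothetical sketch: only the four category names come from the README diff.
const VERBOSE_CATEGORIES = ["prompt", "tools", "screen", "memory"] as const;
type VerboseCategory = (typeof VERBOSE_CATEGORIES)[number];

type VerboseResult =
  | { ok: true; categories: VerboseCategory[] }
  | { ok: false; unknown: string[] };

function isVerboseCategory(value: string): value is VerboseCategory {
  return (VERBOSE_CATEGORIES as readonly string[]).includes(value);
}

// `raw` models what a CLI parser commonly yields for an option with an
// optional value: undefined (flag absent), true (flag without a value),
// or the string that followed the flag.
function parseVerbose(raw: string | boolean | undefined): VerboseResult {
  if (raw === undefined || raw === false) return { ok: true, categories: [] };
  if (raw === true) return { ok: true, categories: [...VERBOSE_CATEGORIES] };

  const requested = raw
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);

  // Assumption: unknown categories are reported rather than silently ignored.
  const unknown = requested.filter((part) => !isVerboseCategory(part));
  if (unknown.length > 0) return { ok: false, unknown };

  // Deduplicate and keep the canonical category order.
  const categories = VERBOSE_CATEGORIES.filter((category) =>
    requested.includes(category),
  );
  return { ok: true, categories };
}
```

Under these assumptions, `xqa explore -v` would map to `parseVerbose(true)` and `xqa explore -v prompt,screen` to `parseVerbose("prompt,screen")`.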

package/dist/skills/xqa-spec/AGENTS.md CHANGED

@@ -23,7 +23,7 @@ Silently scan `.xqa/specs/*.test.md`. Learn:
 - Tag vocabulary
 - Level of detail and step granularity
-Also read `.xqa/instructions.md` if it exists for app context.
+Also read `.xqa/app.md` if it exists for app context.
 ### 2. Detect mode
@@ -40,17 +40,20 @@ Ask one question at a time. Wait for the answer before asking the next. Prefer m
 **Question sequence:**
-1. **What flow?** — Confirm what's being tested if not already clear. Suggest a filename.
-2. **Starting state** — "Where does the app start for this test? What's already set up?" → becomes `## Setup`
-3. **Steps** — "Walk me through the steps, one at a time. I'll ask for the next when you're done." → collect each step, then ask "What should happen?" for the assertion (optional)
-4. **Global assertions** — "Anything non-obvious that should hold at the end?" → becomes `## Assertions`; skip if none. Never suggest trivial examples (no errors shown, page loaded) — only capture meaningful, app-specific checks.
-5. **Metadata** — "Any tags or a custom timeout?" (offer to skip)
+1. **What flow?** — Confirm what's being tested if not already clear. Suggest a filename and `feature` name.
+2. **Entry point** — "What's the navigation path to reach this flow?" (e.g., `App launch`, `Home > Wallet`) → becomes `entry:` frontmatter
+3. **Starting state** — "What's already set up? What state is the device/app in?" → becomes `## Setup`
+4. **Steps** — "Walk me through the steps, one at a time. I'll ask for the next when you're done." → collect each step, then ask "What should happen?" for the assertion (optional)
+5. **Global assertions** — "Any overall things that should be true at the end of the flow?" → becomes `## Assertions` (skip if none)
+6. **Timeout** — "Set a timeout in seconds? (optional, for long-running specs)" → becomes `timeout:` frontmatter (offer to skip)
 IMPORTANT: Ask each question in its own message. Never batch questions.
 ### 4. Draft
-Assemble the spec from interview answers — don't invent steps or assertions the user didn't describe. Present the full draft for review.
+Assemble using ONLY these frontmatter fields: `feature`, `entry`, `timeout`. Do not add any other frontmatter field. `feature` MUST be present. `timeout` MUST be a positive number (seconds) if included.
+Steps and assertions come from the user — never invent them. Present the full draft for review.
 ### 5. Review
@@ -66,28 +69,30 @@ Save to `.xqa/specs/<name>.test.md` only after explicit approval.
 ```md
 ---
-description: optional one-liner
-tags: [optional, tags]
-timeout: 120
+feature: <string>
+entry: <string>
+timeout: <seconds>
 ---
 ## Setup
-Starting screen and preconditions. Required.
+<preconditions and starting state>
 ## Steps
-1. Action → expected outcome (optional inline assertion)
-2. Next action
+1. <action> → <expected outcome>
+2. <action>
 ## Assertions
-- Global flow-level check (optional section)
+- <global flow-level check>
 ```
+Omit `entry` and `timeout` lines if not provided. Omit `## Assertions` section if none.
 ## Rules
-- `## Setup` and `## Steps` are required; frontmatter and `## Assertions` are optional
+- `## Setup` and `## Steps` are required; `## Assertions` is optional
 - Inline assertion syntax: `action → outcome` using the → character
 - Steps come from the user — never invent them
 - Write file only after explicit approval
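
The AGENTS.md changes restrict spec frontmatter to `feature`, `entry`, and `timeout`, and the README diff lists `MISSING_FRONTMATTER`, `MISSING_FIELD`, and `PARSE_ERROR` as the `SpecFrontmatterError` codes. The sketch below shows one way those rules could be enforced. Only the field names and error codes come from the diff. The type names, the `type` discriminant, the decision to reject unknown fields, and the line-based `key: value` parsing (standing in for a real YAML parser) are all assumptions.

```typescript
// Hypothetical sketch; not the package's src/spec-frontmatter.ts.
type SpecFrontmatter = { feature: string; entry?: string; timeout?: number };

type SpecFrontmatterError =
  | { type: "MISSING_FRONTMATTER" }
  | { type: "MISSING_FIELD"; field: string }
  | { type: "PARSE_ERROR"; message: string };

type ParseResult =
  | { ok: true; value: SpecFrontmatter }
  | { ok: false; error: SpecFrontmatterError };

const ALLOWED_FIELDS = ["feature", "entry", "timeout"];

function fail(error: SpecFrontmatterError): ParseResult {
  return { ok: false, error };
}

function parseSpecFrontmatter(markdown: string): ParseResult {
  // The frontmatter block must open the file and sit between two "---" lines.
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(markdown);
  if (match === null) return fail({ type: "MISSING_FRONTMATTER" });

  const fields = new Map<string, string>();
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    if (line.trim() === "") continue;
    const separator = line.indexOf(":");
    if (separator === -1) {
      return fail({ type: "PARSE_ERROR", message: `Expected "key: value", got "${line}"` });
    }
    const key = line.slice(0, separator).trim();
    if (!ALLOWED_FIELDS.includes(key)) {
      // Assumption: fields outside the permitted set are treated as errors.
      return fail({ type: "PARSE_ERROR", message: `Unknown field "${key}"` });
    }
    // Strip one pair of matching surrounding quotes, as in feature: 'Feature Name'.
    const value = line.slice(separator + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    fields.set(key, value);
  }

  const feature = fields.get("feature");
  if (feature === undefined || feature === "") {
    return fail({ type: "MISSING_FIELD", field: "feature" });
  }

  const result: SpecFrontmatter = { feature };
  const entry = fields.get("entry");
  if (entry !== undefined && entry !== "") result.entry = entry;

  const timeoutText = fields.get("timeout");
  if (timeoutText !== undefined) {
    const timeout = Number(timeoutText);
    if (timeoutText === "" || !Number.isFinite(timeout) || timeout <= 0) {
      return fail({ type: "PARSE_ERROR", message: "timeout must be a positive number of seconds" });
    }
    result.timeout = timeout;
  }
  return { ok: true, value: result };
}
```

Returning a tagged result instead of throwing keeps each failure mode explicit at the call site, which matches the README's description of the error types as discriminated unions.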

package/dist/skills/xqa-spec/SKILL.md CHANGED

@@ -26,10 +26,9 @@ IMPORTANT: Never generate a draft before the interview is complete. The user des
 Silently scan `.xqa/specs/*.test.md`. Learn:
 - Naming conventions
-- Tag vocabulary
 - Level of detail and step granularity
-Also read `.xqa/instructions.md` if it exists for app context.
+Also read `.xqa/app.md` if it exists for app context.
 ### 2. Detect mode
@@ -46,17 +45,19 @@ Ask one question at a time. Wait for the answer before asking the next. Prefer m
 **Question sequence:**
-1. **What flow?** — Confirm what's being tested if not already clear. Suggest a filename.
-2. **Starting state** — "Where does the app start for this test? What's already set up?" → becomes `## Setup`
-3. **Steps** — "Walk me through the steps, one at a time. I'll ask for the next when you're done." → collect each step, then ask "What should happen?" for the assertion (optional)
-4. **Global assertions** — "Anything non-obvious that should hold at the end?" → becomes `## Assertions`; skip if none. Never suggest trivial examples (no errors shown, page loaded) — only capture meaningful, app-specific checks.
-5. **Metadata** — "Any tags or a custom timeout?" (offer to skip)
+1. **What flow?** — Confirm what's being tested if not already clear. Suggest a filename and `feature` name.
+2. **Starting state** — "What's already set up? What state is the device/app in?" → becomes `## Setup`
+3. **Steps** — "Walk me through the steps, one at a time. For each step, describe the intent (what the user is trying to do), and optionally the expected outcome and a hint about the current label or wording." → collect each step. Prompt for outcome only when user mentions something to verify. Prompt for hint only when current label wording is useful.
+4. **Global assertions** — "Any overall things that should be true at the end of the flow?" → becomes `## Assertions` (skip if none)
+5. **Timeout** — "Set a timeout in seconds? (optional, for long-running specs)" → becomes `timeout:` frontmatter (offer to skip)
 IMPORTANT: Ask each question in its own message. Never batch questions.
 ### 4. Draft
-Assemble the spec from interview answers — don't invent steps or assertions the user didn't describe. Present the full draft for review.
+Assemble using ONLY these frontmatter fields: `feature`, `timeout`. Do not add any other frontmatter field. `feature` MUST be present. `timeout` MUST be a positive number (seconds) if included.
+Steps come from the user — never invent them. Use intent-first language: describe the user's goal, not literal widget taps. Present the full draft for review.
 ### 5. Review
@@ -66,35 +67,89 @@ Iterate until approved. One round of changes per message.
 ### 6. Write
-Save to `.xqa/specs/<name>.test.md` only after explicit approval.
+Before writing, verify the draft passes all checks:
+- [ ] `feature` is present and non-empty
+- [ ] frontmatter contains only permitted fields: `feature`, `timeout`
+- [ ] `timeout` if present is a positive number in seconds (not a string, not zero)
+- [ ] `## Setup` section is present
+- [ ] `## Steps` section is present
+- [ ] No step begins with `Tap "<literal label>"` or equivalent literal-widget references — steps use intent-first language
+- [ ] No forbidden fields: `tags`, `max_steps`, `priority`, `type`, `description`, `id`, `author`, `version`, `entry`
+Fix any failure before writing. Save to `.xqa/specs/<name>.test.md` only after explicit approval.
+## Step Grammar
+```
+<intent phrase> [→ <outcome state>] [hint: <advisory text>]
+```
+- **Intent phrase** — imperative, action-oriented, describes the user's goal. No literal widget labels. Include a domain noun (e.g., "agreement", "asset", "settings").
+- **Outcome state** — optional. Include only when the step has something to verify. Observable screen state or content predicate.
+- **Hint** — optional. Free-form natural text describing expected label or qualities. Agent infers role (primary/secondary/dismissal) from context. Never use structured `role=` syntax.
+### Valid examples
+```md
+1. Accept the terms and conditions → Portfolio screen visible [hint: "Agree and continue"]
+2. Open the settings menu
+3. Dismiss the backup reminder [hint: skip or maybe later button]
+4. Enter the recovery phrase words in order → All 12 fields filled, Confirm becomes active
+```
+### Anti-patterns
+```md
+# Literal label as selector — breaks on rename
+1. Tap the "Agree and continue" button
+# Vague intent — agent has nothing to anchor on
+2. Do the onboarding thing
+```
 ## File format
+FRONTMATTER SCHEMA — exact fields, exact types, no others:
+```
+feature    string           REQUIRED
+timeout    positive number (seconds) OPTIONAL — omit if not provided
+```
+FORBIDDEN frontmatter fields — never generate these: `tags`, `max_steps`, `priority`, `type`, `description`, `id`, `author`, `version`, `entry`
+CANONICAL OUTPUT FORMAT:
 ```md
 ---
-description: optional one-liner
-tags: [optional, tags]
-timeout: 120
+feature: <string>
+timeout: <seconds>
 ---
 ## Setup
-Starting screen and preconditions. Required.
+<preconditions and starting state>
 ## Steps
-1. Action → expected outcome (optional inline assertion)
-2. Next action
+1. <intent> → <outcome> [hint: <advisory>]
+2. <intent>
 ## Assertions
-- Global flow-level check (optional section)
+- <global flow-level check>
 ```
+Omit `timeout` line if not provided. Omit `## Assertions` section if none. Omit `→ <outcome>` or `[hint: ...]` per-step when not applicable.
 ## Rules
-- `## Setup` and `## Steps` are required; frontmatter and `## Assertions` are optional
-- Inline assertion syntax: `action → outcome` using the → character
+- Step grammar: `<intent> [→ <outcome>] [hint: <advisory>]` using the → character
+- Outcome and hint are optional per step; include only when useful
 - Steps come from the user — never invent them
+- Intent-first language only; no literal `Tap "<label>"` references
 - Write file only after explicit approval
 - In edit mode, ask before touching anything
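
The step grammar added to SKILL.md, `<intent phrase> [→ <outcome state>] [hint: <advisory text>]`, can be illustrated with a small parser. This is a sketch under stated assumptions: the diff defines the grammar only as guidance for the skill, and nothing in it shows that the package parses steps this way. `SpecStep`, `parseStep`, and `usesLiteralLabel` are hypothetical names, and the literal-label check is a rough heuristic for the `Tap "<label>"` anti-pattern.

```typescript
// Hypothetical sketch of the step grammar from SKILL.md.
type SpecStep = { intent: string; outcome?: string; hint?: string };

function parseStep(line: string): SpecStep | null {
  // Drop a leading list number such as "1. ".
  let rest = line.trim().replace(/^\d+\.\s*/, "");

  // A trailing "[hint: ...]" is optional and kept as free-form text.
  let hint: string | undefined;
  const hintMatch = /\s*\[hint:\s*(.*?)\s*\]$/.exec(rest);
  if (hintMatch !== null) {
    hint = hintMatch[1];
    rest = rest.slice(0, hintMatch.index);
  }

  // Everything after the first "→" is the optional outcome.
  const arrow = rest.indexOf("→");
  const intent = (arrow === -1 ? rest : rest.slice(0, arrow)).trim();
  const outcome = arrow === -1 ? undefined : rest.slice(arrow + 1).trim();
  if (intent === "") return null;

  const step: SpecStep = { intent };
  if (outcome !== undefined && outcome !== "") step.outcome = outcome;
  if (hint !== undefined && hint !== "") step.hint = hint;
  return step;
}

// Heuristic for the anti-pattern: an intent that starts with "Tap" and
// quotes a label is treated as a literal-widget reference.
function usesLiteralLabel(step: SpecStep): boolean {
  return /^tap\b.*["“”]/i.test(step.intent);
}
```

Applied to the valid examples in the diff, the first step would split into an intent, an outcome, and a hint, while `Open the settings menu` would yield an intent only.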