npm - maestro-flow - Versions diffs - 0.5.3 → 0.5.31 - Mend

maestro-flow 0.5.3 → 0.5.31

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (259)

package/.agents/skills/learn-follow/SKILL.md +114 -114
package/.agents/skills/learn-investigate/SKILL.md +138 -139
package/.agents/skills/learn-second-opinion/SKILL.md +105 -109
package/.agents/skills/maestro/SKILL.md +2 -10
package/.agents/skills/maestro-amend/SKILL.md +152 -152
package/.agents/skills/maestro-analyze/SKILL.md +201 -252
package/.agents/skills/maestro-blueprint/SKILL.md +175 -190
package/.agents/skills/maestro-brainstorm/SKILL.md +196 -200
package/.agents/skills/maestro-collab/SKILL.md +159 -159
package/.agents/skills/maestro-companion/SKILL.md +517 -517
package/.agents/skills/maestro-composer/SKILL.md +173 -164
package/.agents/skills/maestro-execute/SKILL.md +169 -170
package/.agents/skills/maestro-fork/SKILL.md +97 -96
package/.agents/skills/maestro-grill/SKILL.md +161 -162
package/.agents/skills/maestro-guard/SKILL.md +93 -92
package/.agents/skills/maestro-impeccable/SKILL.md +296 -253
package/.agents/skills/maestro-init/SKILL.md +117 -118
package/.agents/skills/maestro-merge/SKILL.md +73 -66
package/.agents/skills/maestro-milestone-audit/SKILL.md +4 -10
package/.agents/skills/maestro-milestone-complete/SKILL.md +6 -7
package/.agents/skills/maestro-milestone-release/SKILL.md +122 -131
package/.agents/skills/maestro-next/SKILL.md +241 -245
package/.agents/skills/maestro-overlay/SKILL.md +176 -166
package/.agents/skills/maestro-plan/SKILL.md +211 -197
package/.agents/skills/maestro-player/SKILL.md +167 -167
package/.agents/skills/maestro-quick/SKILL.md +69 -63
package/.agents/skills/maestro-ralph/SKILL.md +2 -36
package/.agents/skills/maestro-ralph-beta/SKILL.md +861 -872
package/.agents/skills/maestro-ralph-execute/SKILL.md +234 -234
package/.agents/skills/maestro-roadmap/SKILL.md +159 -172
package/.agents/skills/maestro-swarm-workflow/SKILL.md +229 -250
package/.agents/skills/maestro-tools-execute/SKILL.md +108 -103
package/.agents/skills/maestro-tools-register/SKILL.md +148 -143
package/.agents/skills/maestro-ui-codify/SKILL.md +103 -86
package/.agents/skills/maestro-universal-workflow/SKILL.md +534 -547
package/.agents/skills/maestro-update/SKILL.md +109 -106
package/.agents/skills/manage-codebase-rebuild/SKILL.md +73 -71
package/.agents/skills/manage-harvest/SKILL.md +83 -81
package/.agents/skills/manage-issue/SKILL.md +59 -60
package/.agents/skills/manage-issue-discover/SKILL.md +70 -68
package/.agents/skills/manage-kg-extractors/SKILL.md +130 -0
package/.agents/skills/manage-knowhow/SKILL.md +70 -66
package/.agents/skills/manage-knowhow-capture/SKILL.md +79 -69
package/.agents/skills/manage-knowledge-audit/SKILL.md +91 -74
package/.agents/skills/manage-status/SKILL.md +52 -42
package/.agents/skills/manage-wiki/SKILL.md +69 -58
package/.agents/skills/odyssey-debug/SKILL.md +445 -459
package/.agents/skills/odyssey-improve/SKILL.md +477 -491
package/.agents/skills/odyssey-planex/SKILL.md +576 -587
package/.agents/skills/odyssey-review-test-fix/SKILL.md +400 -413
package/.agents/skills/odyssey-ui/SKILL.md +431 -448
package/.agents/skills/quality-auto-test/SKILL.md +140 -123
package/.agents/skills/quality-debug/SKILL.md +145 -106
package/.agents/skills/quality-refactor/SKILL.md +91 -53
package/.agents/skills/quality-retrospective/SKILL.md +109 -63
package/.agents/skills/quality-review/SKILL.md +141 -114
package/.agents/skills/quality-sync/SKILL.md +74 -38
package/.agents/skills/quality-test/SKILL.md +133 -103
package/.agents/skills/security-audit/SKILL.md +217 -166
package/.agents/skills/spec-add/SKILL.md +66 -59
package/.agents/skills/spec-load/SKILL.md +68 -68
package/.agents/skills/spec-remove/SKILL.md +42 -42
package/.agents/skills/spec-setup/SKILL.md +38 -41
package/.agy/skills/learn-follow/SKILL.md +114 -114
package/.agy/skills/learn-investigate/SKILL.md +138 -139
package/.agy/skills/learn-second-opinion/SKILL.md +105 -109
package/.agy/skills/maestro/SKILL.md +2 -10
package/.agy/skills/maestro-amend/SKILL.md +152 -152
package/.agy/skills/maestro-analyze/SKILL.md +201 -252
package/.agy/skills/maestro-blueprint/SKILL.md +175 -190
package/.agy/skills/maestro-brainstorm/SKILL.md +196 -200
package/.agy/skills/maestro-collab/SKILL.md +159 -159
package/.agy/skills/maestro-companion/SKILL.md +517 -517
package/.agy/skills/maestro-composer/SKILL.md +173 -164
package/.agy/skills/maestro-execute/SKILL.md +169 -170
package/.agy/skills/maestro-fork/SKILL.md +97 -96
package/.agy/skills/maestro-grill/SKILL.md +161 -162
package/.agy/skills/maestro-guard/SKILL.md +93 -92
package/.agy/skills/maestro-impeccable/SKILL.md +296 -253
package/.agy/skills/maestro-init/SKILL.md +117 -118
package/.agy/skills/maestro-merge/SKILL.md +73 -66
package/.agy/skills/maestro-milestone-audit/SKILL.md +4 -10
package/.agy/skills/maestro-milestone-complete/SKILL.md +6 -7
package/.agy/skills/maestro-milestone-release/SKILL.md +122 -131
package/.agy/skills/maestro-next/SKILL.md +241 -245
package/.agy/skills/maestro-overlay/SKILL.md +176 -166
package/.agy/skills/maestro-plan/SKILL.md +211 -197
package/.agy/skills/maestro-player/SKILL.md +167 -167
package/.agy/skills/maestro-quick/SKILL.md +69 -63
package/.agy/skills/maestro-ralph/SKILL.md +2 -36
package/.agy/skills/maestro-ralph-beta/SKILL.md +861 -872
package/.agy/skills/maestro-ralph-execute/SKILL.md +234 -234
package/.agy/skills/maestro-roadmap/SKILL.md +159 -172
package/.agy/skills/maestro-swarm-workflow/SKILL.md +229 -250
package/.agy/skills/maestro-tools-execute/SKILL.md +108 -103
package/.agy/skills/maestro-tools-register/SKILL.md +148 -143
package/.agy/skills/maestro-ui-codify/SKILL.md +103 -86
package/.agy/skills/maestro-universal-workflow/SKILL.md +534 -547
package/.agy/skills/maestro-update/SKILL.md +109 -106
package/.agy/skills/manage-codebase-rebuild/SKILL.md +73 -71
package/.agy/skills/manage-harvest/SKILL.md +83 -81
package/.agy/skills/manage-issue/SKILL.md +59 -60
package/.agy/skills/manage-issue-discover/SKILL.md +70 -68
package/.agy/skills/manage-kg-extractors/SKILL.md +130 -0
package/.agy/skills/manage-knowhow/SKILL.md +70 -66
package/.agy/skills/manage-knowhow-capture/SKILL.md +79 -69
package/.agy/skills/manage-knowledge-audit/SKILL.md +91 -74
package/.agy/skills/manage-status/SKILL.md +52 -42
package/.agy/skills/manage-wiki/SKILL.md +69 -58
package/.agy/skills/odyssey-debug/SKILL.md +445 -459
package/.agy/skills/odyssey-improve/SKILL.md +477 -491
package/.agy/skills/odyssey-planex/SKILL.md +576 -587
package/.agy/skills/odyssey-review-test-fix/SKILL.md +400 -413
package/.agy/skills/odyssey-ui/SKILL.md +431 -448
package/.agy/skills/quality-auto-test/SKILL.md +140 -123
package/.agy/skills/quality-debug/SKILL.md +145 -106
package/.agy/skills/quality-refactor/SKILL.md +91 -53
package/.agy/skills/quality-retrospective/SKILL.md +109 -63
package/.agy/skills/quality-review/SKILL.md +141 -114
package/.agy/skills/quality-sync/SKILL.md +74 -38
package/.agy/skills/quality-test/SKILL.md +133 -103
package/.agy/skills/security-audit/SKILL.md +217 -166
package/.agy/skills/spec-add/SKILL.md +66 -59
package/.agy/skills/spec-load/SKILL.md +68 -68
package/.agy/skills/spec-remove/SKILL.md +42 -42
package/.agy/skills/spec-setup/SKILL.md +38 -41
package/.claude/commands/learn-follow.md +127 -127
package/.claude/commands/learn-investigate.md +151 -152
package/.claude/commands/learn-second-opinion.md +118 -122
package/.claude/commands/maestro-amend.md +164 -164
package/.claude/commands/maestro-analyze.md +215 -266
package/.claude/commands/maestro-blueprint.md +189 -204
package/.claude/commands/maestro-brainstorm.md +209 -213
package/.claude/commands/maestro-collab.md +172 -172
package/.claude/commands/maestro-companion.md +531 -531
package/.claude/commands/maestro-composer.md +188 -179
package/.claude/commands/maestro-execute.md +183 -184
package/.claude/commands/maestro-fork.md +111 -110
package/.claude/commands/maestro-grill.md +175 -176
package/.claude/commands/maestro-guard.md +103 -102
package/.claude/commands/maestro-impeccable.md +311 -268
package/.claude/commands/maestro-init.md +130 -131
package/.claude/commands/maestro-merge.md +87 -80
package/.claude/commands/maestro-milestone-audit.md +4 -10
package/.claude/commands/maestro-milestone-complete.md +6 -7
package/.claude/commands/maestro-milestone-release.md +136 -145
package/.claude/commands/maestro-next.md +253 -257
package/.claude/commands/maestro-overlay.md +188 -178
package/.claude/commands/maestro-plan.md +225 -211
package/.claude/commands/maestro-player.md +182 -182
package/.claude/commands/maestro-quick.md +83 -77
package/.claude/commands/maestro-ralph-beta.md +875 -886
package/.claude/commands/maestro-ralph-execute.md +247 -247
package/.claude/commands/maestro-ralph.md +2 -36
package/.claude/commands/maestro-roadmap.md +173 -186
package/.claude/commands/maestro-swarm-workflow.md +243 -264
package/.claude/commands/maestro-tools-execute.md +122 -117
package/.claude/commands/maestro-tools-register.md +162 -157
package/.claude/commands/maestro-ui-codify.md +117 -100
package/.claude/commands/maestro-universal-workflow.md +548 -561
package/.claude/commands/maestro-update.md +122 -119
package/.claude/commands/maestro.md +2 -10
package/.claude/commands/manage-codebase-rebuild.md +87 -85
package/.claude/commands/manage-harvest.md +97 -95
package/.claude/commands/manage-issue-discover.md +83 -81
package/.claude/commands/manage-issue.md +72 -73
package/.claude/commands/manage-kg-extractors.md +128 -0
package/.claude/commands/manage-knowhow-capture.md +92 -82
package/.claude/commands/manage-knowhow.md +83 -79
package/.claude/commands/manage-knowledge-audit.md +105 -88
package/.claude/commands/manage-status.md +62 -52
package/.claude/commands/manage-wiki.md +82 -71
package/.claude/commands/odyssey-debug.md +459 -473
package/.claude/commands/odyssey-improve.md +491 -505
package/.claude/commands/odyssey-planex.md +590 -601
package/.claude/commands/odyssey-review-test-fix.md +414 -427
package/.claude/commands/odyssey-ui.md +445 -462
package/.claude/commands/quality-auto-test.md +153 -136
package/.claude/commands/quality-debug.md +159 -120
package/.claude/commands/quality-refactor.md +105 -67
package/.claude/commands/quality-retrospective.md +123 -77
package/.claude/commands/quality-review.md +155 -128
package/.claude/commands/quality-sync.md +88 -52
package/.claude/commands/quality-test.md +147 -117
package/.claude/commands/security-audit.md +230 -179
package/.claude/commands/spec-add.md +77 -70
package/.claude/commands/spec-load.md +78 -78
package/.claude/commands/spec-remove.md +55 -55
package/.claude/commands/spec-setup.md +49 -52
package/dist/src/cli.js +1 -1
package/dist/src/cli.js.map +1 -1
package/dist/src/commands/kg.d.ts.map +1 -1
package/dist/src/commands/kg.js +11 -5
package/dist/src/commands/kg.js.map +1 -1
package/dist/src/graph/kg/extraction/code/code-extractor.d.ts +2 -0
package/dist/src/graph/kg/extraction/code/code-extractor.d.ts.map +1 -1
package/dist/src/graph/kg/extraction/code/code-extractor.js +32 -3
package/dist/src/graph/kg/extraction/code/code-extractor.js.map +1 -1
package/dist/src/graph/kg/extraction/code/plugin-engine.d.ts +35 -0
package/dist/src/graph/kg/extraction/code/plugin-engine.d.ts.map +1 -0
package/dist/src/graph/kg/extraction/code/plugin-engine.js +573 -0
package/dist/src/graph/kg/extraction/code/plugin-engine.js.map +1 -0
package/dist/src/graph/kg/extraction/code/plugin-types.d.ts +95 -0
package/dist/src/graph/kg/extraction/code/plugin-types.d.ts.map +1 -0
package/dist/src/graph/kg/extraction/code/plugin-types.js +5 -0
package/dist/src/graph/kg/extraction/code/plugin-types.js.map +1 -0
package/dist/src/graph/kg/extraction/orchestrator.d.ts.map +1 -1
package/dist/src/graph/kg/extraction/orchestrator.js +17 -5
package/dist/src/graph/kg/extraction/orchestrator.js.map +1 -1
package/dist/src/graph/kg/schema.sql +16 -11
package/dist/src/graph/kg/surface/cli.d.ts.map +1 -1
package/dist/src/graph/kg/surface/cli.js +153 -56
package/dist/src/graph/kg/surface/cli.js.map +1 -1
package/dist/src/hooks/workspace.d.ts +4 -2
package/dist/src/hooks/workspace.d.ts.map +1 -1
package/dist/src/hooks/workspace.js +6 -2
package/dist/src/hooks/workspace.js.map +1 -1
package/package.json +91 -91
package/workflows/analyze.md +25 -49
package/workflows/auto-test.md +699 -699
package/workflows/blueprint.md +403 -431
package/workflows/brainstorm.md +54 -195
package/workflows/business-test.md +570 -570
package/workflows/claude-instructions.md +23 -51
package/workflows/codex-instructions.md +27 -77
package/workflows/coding-philosophy.md +69 -69
package/workflows/command-authoring.md +823 -823
package/workflows/debug.md +43 -98
package/workflows/delegate-usage.md +39 -241
package/workflows/execute.md +4 -53
package/workflows/grill.md +12 -56
package/workflows/harvest.md +22 -68
package/workflows/init.md +148 -148
package/workflows/instruction-authoring-guide.md +97 -0
package/workflows/issue-execute.md +110 -110
package/workflows/issue-gaps-analyze.codex.md +260 -260
package/workflows/issue-gaps-analyze.md +216 -216
package/workflows/issue-plan.md +110 -110
package/workflows/issue.md +338 -346
package/workflows/knowhow.md +0 -32
package/workflows/learn.md +277 -277
package/workflows/maestro-chain-execute.md +20 -20
package/workflows/refactor.md +22 -44
package/workflows/retrospective.md +16 -65
package/workflows/review.md +446 -486
package/workflows/roadmap.md +35 -132
package/workflows/skill-authoring.md +265 -265
package/workflows/spec-generate.md +470 -470
package/workflows/specs-remove.md +104 -104
package/workflows/sync.md +11 -41
package/workflows/test-gen.md +226 -226
package/workflows/test.md +385 -475
package/workflows/ui-design.md +391 -391
package/workflows/ui-style.md +199 -199
package/workflows/wiki-connect.md +151 -151
package/workflows/wiki-digest.md +178 -178
package/workflows/wiki-manage.md +109 -109
package/workflows/cli-tools-usage.md +0 -252
package/workflows/delegate-protocol.codex.md +0 -65

package/workflows/test.md CHANGED

@@ -1,475 +1,385 @@
-# Test Workflow (UAT)
-Validate built features through conversational UAT testing with persistent state, auto-diagnosis via parallel debug agents, and gap-fix closure loop.
-User tests, Claude records. One test at a time. Plain text responses.
-Severity inferred from natural language -- never ask "how severe is this?"
-**Philosophy: Show expected, ask if reality matches.**
-Claude presents what SHOULD happen. User confirms or describes what's different.
-- "yes" / "y" / "next" / empty / "pass" -> pass
-- "skip" / "can't test" / "n/a" -> skipped
-- Anything else -> logged as issue, severity inferred
-No Pass/Fail buttons. No severity questions. Just: "Here's what should happen. Does it?"
----
-### Step 1: Resolve Target
-Determine test target from $ARGUMENTS:
-**If phase number provided** (e.g., "3"):
-- Set `$TARGET_TYPE = "phase"`
-- Resolve phase dir: look up `phaseNum` in `.workflow/state.json` artifacts (type=execute), derive `PHASE_DIR = ".workflow/" + art.path`. Error if not found.
-- Load `$PHASE_DIR/index.json` for context
-**If scratch task ID provided:**
-- Set `$TARGET_TYPE = "scratch"`
-- Set `$SCRATCH_DIR = ".workflow/scratch/{id}/"`
-- Load `$SCRATCH_DIR/index.json` for context
-**If nothing provided:**
-- Check for active UAT sessions (see Step 2)
-- If none found, prompt user for phase number or scratch task
-**Flags:**
-- `--smoke` -- Run cold-start smoke tests before UAT
-- `--auto-fix` -- Auto-trigger gap-fix loop on failures
-Validate target exists and has been verified (verification.json present). (E002)
----
-### Step 2: Check Active Sessions
-```bash
-# Check scratch dirs (resolved via artifact registry) for active UAT sessions
-find .workflow/scratch -name "uat.md" -type f 2>/dev/null | head -5
-```
-Read each file's frontmatter (status, target) and Current Test section.
-**If active sessions exist AND no $ARGUMENTS:**
-Display inline:
-```
-## Active UAT Sessions
-| # | Target | Status | Current Test | Progress |
-|---|--------|--------|--------------|----------|
-| 1 | 04-comments | testing | 3. Reply to Comment | 2/6 |
-| 2 | quick-fix-nav | testing | 1. Nav Links | 0/4 |
-Reply with a number to resume, or provide a phase/task to start new.
-```
-Wait for user response.
-- Number -> resume that session (go to Step 9: Resume From File)
-- Phase/task ID -> new session (go to Step 4: Find Testables)
-**If active sessions exist AND $ARGUMENTS provided:**
-Check if session exists for that target. If yes, offer resume or restart.
-**If no active sessions AND no $ARGUMENTS:**
-Prompt: "No active UAT sessions. Provide a phase number or scratch task ID to start testing."
-**If no active sessions AND $ARGUMENTS:**
-Continue to Step 3 or Step 4.
----
-### Step 3: Run Smoke Tests (if --smoke)
-Skip if --smoke not set.
-Inject basic sanity tests BEFORE UAT scenarios:
-| Smoke Test | Check | Method |
-|------------|-------|--------|
-| App starts | Process runs without crash | `bash: start command, check exit code` |
-| Routes respond | Key endpoints return non-error | `bash: curl/fetch main routes` |
-| Build clean | No build errors | `bash: build command succeeds` |
-| Dependencies | No missing deps | `bash: install check` |
-Record smoke results in uat.md under `## Smoke Tests` section.
-If any smoke test fails: abort UAT, report as blocker, suggest Skill({ skill: "quality-debug" }). (E003)
----
-### Step 4: Load Verification Context
-Read from target directory:
-- verification.json -- must_haves with truth/artifact/wiring status
-- validation.json -- requirement-to-test mapping
-- index.json -- success_criteria
-- plan.json -- task overview
-- All `.summaries/TASK-*.md` -- execution results
-```bash
-ls "$OUTPUT_DIR/.summaries/"*summary*.md 2>/dev/null
-```
-Build testable list: user-observable outcomes from success_criteria + must_haves + task accomplishments.
----
-### Step 5: Design Test Scenarios
-For each testable item, create a scenario:
-- **id**: T-001, T-002, ...
-- **name**: Brief test name
-- **category**: "e2e" | "integration" | "unit"
-- **expected**: Specific observable behavior (what user should see)
-- **requirement_ref**: Which success criterion this covers
-Write test-plan.json to `.tests/`:
-```json
-{
-  "target": "{phase or scratch ID}",
-  "generated_at": "{ISO timestamp}",
-  "tests": [...],
-  "coverage": {
-    "requirements_mapped": ["SC-001"],
-    "requirements_unmapped": ["SC-003"]
-  }
-}
-```
-```bash
-mkdir -p "$OUTPUT_DIR/.tests"
-```
-Focus on USER-OBSERVABLE outcomes, not implementation details.
-Skip internal/non-observable items (refactors, type changes).
----
-### Step 6: Create UAT File
-**Archive previous UAT artifacts** before writing: if `$OUTPUT_DIR/uat.md` exists, move it to `$OUTPUT_DIR/.history/uat-{YYYY-MM-DDTHH-mm-ss}.md`.
-Build test list from test-plan.json. Create file at `$OUTPUT_DIR/uat.md`:
-```markdown
----
-status: testing
-target: {phase slug or scratch ID}
-source: [list of summary files]
-started: {ISO timestamp}
-updated: {ISO timestamp}
----
-## Current Test
-<!-- OVERWRITE each test - shows where we are -->
-number: 1
-name: {first test name}
-expected: |
-  {what user should observe}
-awaiting: user response
-## Smoke Tests
-{results if ran, otherwise omitted}
-## Tests
-### 1. {Test Name}
-expected: {observable behavior}
-result: [pending]
-### 2. {Test Name}
-expected: {observable behavior}
-result: [pending]
-...
-## Summary
-total: {N}
-passed: 0
-issues: 0
-pending: {N}
-skipped: 0
-## Gaps
-[none yet]
-```
-Proceed to Step 7.
----
-### Step 7: Present Test
-Present current test to user (one at a time):
-Read Current Test section from uat.md.
-Display:
-```
-------------------------------------------------------------
-  TEST {number}/{total}: {name}
-------------------------------------------------------------
-Expected behavior:
-{expected}
-------------------------------------------------------------
-> Type "pass" or describe what's wrong
-------------------------------------------------------------
-```
-Wait for user response (plain text, no AskUserQuestion).
----
-### Step 8: Process Response
-**If response indicates pass:**
-- Empty response, "yes", "y", "ok", "pass", "next"
-**If response indicates skip:**
-- "skip", "can't test", "n/a"
-**If response is anything else (issue):**
-- Treat as issue description
-- Infer severity from description (see Severity Inference section)
-For issues, update Tests section:
-```yaml
-### {N}. {name}
-expected: {expected}
-result: issue
-reported: "{verbatim user response}"
-severity: {inferred}
-```
-Append to Gaps section:
-```yaml
-- test: {N}
-  truth: "{expected behavior}"
-  status: failed
-  reason: "User reported: {verbatim}"
-  severity: {inferred}
-  requirement_ref: {if mapped}
-```
-**Auto-create Issue from UAT Gap:**
-When result is "issue", create an issue in `.workflow/issues/issues.jsonl`:
-- **ID**: `ISS-{YYYYMMDD}-{NNN}` (auto-increment per day from existing entries)
-- **Fields**: `id`, `title` ("UAT: {test.name} - {response}" truncated 100 chars), `status: "registered"`, `priority` (from severity), `severity`, `source: "uat"`, `phase_ref` (if phase-scoped), `gap_ref: test.id`, `description` (expected vs reported), `fix_direction: ""`, `context` (with requirement_ref), `tags: ["uat"]`, `affected_components: []`, `feedback: []`, `issue_history: []`, timestamps, `resolved_at: null`, `resolution: null`
-- Back-reference: set `gap.issue_id = issue_id` in the gap YAML entry
-**Batched writes for efficiency:**
-Keep results in memory. Write to file only when:
-1. **Issue found** -- Preserve the problem immediately
-2. **Session complete** -- Final write before artifacts
-3. **Checkpoint** -- Every 5 passed tests (safety net for context reset)
-If more tests remain -> Update Current Test, go to Step 7
-If no more tests -> Go to Step 10
----
-### Step 9: Resume From File
-Read the full uat.md file.
-Find first test with `result: [pending]`.
-Announce progress and continue from pending test.
-Update Current Test section with the pending test.
-Proceed to Step 7.
----
-### Step 10: Complete Session
-Update uat.md frontmatter: status -> "complete", updated timestamp.
-**Archive previous test result artifacts** before writing: if `test-results.json` or `coverage-report.json` exist in `$OUTPUT_DIR/.tests/`, move them to `$OUTPUT_DIR/.history/{name}-{YYYY-MM-DDTHH-mm-ss}.{ext}`.
-Write `.tests/test-results.json`:
-```json
-{
-  "target": "{phase or scratch ID}",
-  "completed_at": "{ISO timestamp}",
-  "results": [
-    { "id": "T-001", "name": "...", "status": "pass|issue|skipped", "details": "..." }
-  ],
-  "summary": { "total": N, "passed": N, "issues": N, "skipped": N }
-}
-```
-Write `.tests/coverage-report.json`:
-```json
-{
-  "target": "{phase or scratch ID}",
-  "generated_at": "{ISO timestamp}",
-  "requirements_covered": ["SC-001"],
-  "requirements_uncovered": ["SC-003"],
-  "coverage_percentage": 66.7
-}
-```
-Update index.json with uat results:
-```json
-{
-  "uat": {
-    "status": "passed|gaps_found",
-    "test_count": N,
-    "passed": N,
-    "gaps": [...]
-  }
-}
-```
-If issues == 0 -> go to Step 13 (report, all pass).
-If issues > 0 -> go to Step 11.
----
-### Step 11: Auto-Diagnose
-**Spawn parallel debug agents for gap clusters.**
-1. **Cluster related gaps**: Group issues by affected component/area.
-   - Same file/module -> one cluster
-   - Same feature/flow -> one cluster
-   - Unrelated -> separate clusters
-2. **Spawn one debug agent per cluster** (parallel):
-For each cluster, spawn a general-purpose agent with pre-filled symptoms (test ID, expected, reported, severity). Agent investigates source files and returns per gap: `root_cause`, `fix_direction`, `affected_files`, `evidence` (file:line refs). Mode: `symptoms_prefilled`, goal: `find_root_cause`. `run_in_background: false`.
-3. **Collect results** from all agents.
-**Pass issue_ids to debug context:** gather `issue_id` from each gap in the cluster and include in agent prompt so debug agents can reference/update corresponding issues.
-4. **Update uat.md** gaps with diagnosis:
-```yaml
-- test: {N}
-  truth: "..."
-  status: failed
-  reason: "..."
-  severity: {inferred}
-  root_cause: "{diagnosed cause}"
-  fix_direction: "{suggested approach}"
-  affected_files: ["{file1}", "{file2}"]
-```
-Proceed to Step 12.
----
-### Step 12: Gap Closure Decision
-If AUTO_FIX is set:
-- Skip user prompt, go directly to gap-fix loop.
-If AUTO_FIX is not set:
-- Present diagnosis summary and offer options:
-```
-### Diagnosis Complete
-| Gap | Severity | Root Cause | Fix Direction |
-|-----|----------|------------|---------------|
-| T-3 | major    | Missing null check | Add guard clause |
-| T-5 | blocker  | Event not cleaned  | Add cleanup logic |
-Options:
-1. Auto-fix -- Plan and execute fixes, then re-verify
-2. Debug deep -- Skill({ skill: "quality-debug" }) per issue
-3. Plan fixes -- Skill({ skill: "maestro-plan", args: "{phase} --gaps" })
-4. Manual fix -- Address issues yourself
-```
-| Choice | Action |
-|--------|--------|
-| 1 / "auto-fix" | Go to gap-fix loop |
-| 2 / "debug" | Suggest Skill({ skill: "quality-debug" }) |
-| 3 / "plan" | Suggest Skill({ skill: "maestro-plan", args: "{phase} --gaps" }) |
-| 4 / "manual" | Done, report results |
-**Gap-fix closure loop:**
-Execute the loop: plan --gaps -> execute -> re-verify.
-1. Run Skill({ skill: "maestro-plan", args: "{phase} --gaps" }) -- generates fix tasks from gaps
-2. Run Skill({ skill: "maestro-execute", args: "{phase}" }) -- executes fix tasks
-3. Run Skill({ skill: "maestro-execute", args: "{phase}" }) -- re-verify via verification gate
-If re-verify passes: update uat.md gaps as resolved, report success.
-If re-verify still has gaps: report remaining gaps, suggest manual intervention.
-**Issue lifecycle updates during gap-fix loop:**
-- Before plan --gaps: transition issues `registered` -> `planning`
-- Before execute: transition `planning` -> `executing`
-- After re-verify: resolved gaps -> `completed` (with resolution "auto-fixed via gap-fix loop"), unresolved -> `failed`
-**Loop limit**: Maximum 2 iterations to prevent infinite loops.
----
-### Step 12.5: UAT Confidence Scoring
-Dimensions (4): scenario_coverage, diagnostic_depth, observation_quality, closure_completeness. Factors (weights): requirements_mapped(.30), observation_specificity(.25), user_validation(.20), diagnostic_depth(.15), consistency(.10). Score at: init (Step 5), per user response (Step 8), after gap-fix loop (Step 12).
-Quality mechanisms: Pressure Pass — >80% pass → ask user to try edge case. Devil's Advocate — >70% first-try pass → challenge scenario difficulty. Stall Detection — 2 gap-fix iterations without improvement → stop.
-Readiness Gate (blocks Step 13): scenario_coverage < 40% | blocker gap without diagnosis | no pressure pass (if >80%) | unresolved gaps without acknowledgment. Append confidence summary to uat.md.
----
-### Step 13: Report
-```
-=== UAT RESULTS ===
-Target:      {target}
-Smoke Tests: {smoke_count} run, {smoke_pass} passed (if ran)
-UAT Tests:   {total} total
-  Passed:    {passed}
-  Issues:    {issues} ({blocker_count} blockers, {major_count} major)
-  Skipped:   {skipped}
-Diagnosis:   {diagnosed_count}/{issues} gaps diagnosed
-Auto-fix:    {fixed_count} gaps resolved (if ran)
-Files:
-  {target_dir}/uat.md
-  {target_dir}/.tests/test-results.json
-  {target_dir}/.tests/coverage-report.json
-Next steps:
-  {suggested_next_command}
-```
-**Next step routing:**
-| Result | Suggestion |
-|--------|------------|
-| All passed, no gaps | Skill({ skill: "maestro-milestone-audit" }) |
-| Gaps auto-fixed | Skill({ skill: "maestro-milestone-audit" }) |
-| Gaps remain, diagnosed | Skill({ skill: "quality-debug" }) or Skill({ skill: "maestro-plan", args: "--gaps" }) |
-| Low coverage | Skill({ skill: "quality-auto-test", args: "{phase}" }) to generate missing tests |
----
-## Severity Inference
-Infer severity from user's natural language:
-| User says | Infer |
-|-----------|-------|
-| "crashes", "error", "exception", "fails completely", "can't use" | blocker |
-| "doesn't work", "nothing happens", "wrong behavior", "broken" | major |
-| "works but...", "slow", "weird", "minor issue", "inconsistent" | minor |
-| "color", "spacing", "alignment", "looks off", "typo" | cosmetic |
-Default to **major** if unclear. Never ask "how severe is this?" -- just infer and move on.
+# Test Workflow (UAT)
+Conversational UAT testing with persistent state, auto-diagnosis, and gap-fix closure loop.
+**Core**: Show expected behavior, ask if reality matches. One test at a time.
+- "yes" / "y" / "next" / empty / "pass" → pass
+- "skip" / "can't test" / "n/a" → skipped
+- Anything else → logged as issue, severity auto-inferred
+NEVER ask "how severe is this?"
+---
+### Step 1: Resolve Target
+| Input | Action |
+|-------|--------|
+| Phase number (e.g., "3") | `TARGET_TYPE=phase`, resolve from `state.json` artifacts |
+| Scratch task ID | `TARGET_TYPE=scratch`, `SCRATCH_DIR=.workflow/scratch/{id}/` |
+| Nothing | Check active UAT sessions (Step 2), else prompt user |
+**Flags:** `--smoke` (cold-start smoke tests before UAT), `--auto-fix` (auto gap-fix loop on failures)
+Validate target exists and has verification.json (E002).
+---
+### Step 2: Check Active Sessions
+```bash
+# Check scratch dirs (resolved via artifact registry) for active UAT sessions
+find .workflow/scratch -name "uat.md" -type f 2>/dev/null | head -5
+```
+Read each file's frontmatter (status, target) and Current Test section.
+**If active sessions exist AND no $ARGUMENTS:**
+Display inline:
+```
+## Active UAT Sessions
+| # | Target | Status | Current Test | Progress |
+|---|--------|--------|--------------|----------|
+| 1 | 04-comments | testing | 3. Reply to Comment | 2/6 |
+| 2 | quick-fix-nav | testing | 1. Nav Links | 0/4 |
+Reply with a number to resume, or provide a phase/task to start new.
+```
+Wait for user response.
+- Number -> resume that session (go to Step 9: Resume From File)
+- Phase/task ID -> new session (go to Step 4: Find Testables)
+**If active sessions exist AND $ARGUMENTS provided:**
+Check if session exists for that target. If yes, offer resume or restart.
+**If no active sessions AND no $ARGUMENTS:**
+Prompt: "No active UAT sessions. Provide a phase number or scratch task ID to start testing."
+**If no active sessions AND $ARGUMENTS:**
+Continue to Step 3 or Step 4.
+---
+### Step 3: Run Smoke Tests (if --smoke)
+Skip if --smoke not set.
+Inject basic sanity tests BEFORE UAT scenarios:
+| Smoke Test | Check | Method |
+|------------|-------|--------|
+| App starts | Process runs without crash | `bash: start command, check exit code` |
+| Routes respond | Key endpoints return non-error | `bash: curl/fetch main routes` |
+| Build clean | No build errors | `bash: build command succeeds` |
+| Dependencies | No missing deps | `bash: install check` |
+Record smoke results in uat.md under `## Smoke Tests` section.
+If any smoke test fails: abort UAT, report as blocker, suggest Skill({ skill: "quality-debug" }). (E003)
+---
+### Step 4: Load Verification Context
+Read from target directory: `verification.json`, `validation.json`, `index.json`, `plan.json`, `.summaries/TASK-*.md`.
+Build testable list from success_criteria + must_haves + task accomplishments (user-observable outcomes only).
+---
+### Step 5: Design Test Scenarios
+For each testable item, create a scenario:
+- **id**: T-001, T-002, ...
+- **name**: Brief test name
+- **category**: "e2e" | "integration" | "unit"
+- **expected**: Specific observable behavior (what user should see)
+- **requirement_ref**: Which success criterion this covers
+Write test-plan.json to `.tests/`:
+```json
+{
+  "target": "{phase or scratch ID}",
+  "generated_at": "{ISO timestamp}",
+  "tests": [...],
+  "coverage": {
+    "requirements_mapped": ["SC-001"],
+    "requirements_unmapped": ["SC-003"]
+  }
+}
+```
+```bash
+mkdir -p "$OUTPUT_DIR/.tests"
+```
+Skip internal/non-observable items (refactors, type changes).
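One way to derive the `coverage` block of test-plan.json is to compare each scenario's `requirement_ref` against the list of success criteria. The sketch below assumes one `requirement_ref` per scenario and a flat list of criterion IDs; `build_test_plan` is a hypothetical helper, not part of the skill.

```python
from datetime import datetime, timezone


def build_test_plan(target: str, tests: list[dict], success_criteria: list[str]) -> dict:
    """Assemble a test plan and split criteria into mapped and unmapped."""
    referenced = {t["requirement_ref"] for t in tests if t.get("requirement_ref")}
    mapped = [sc for sc in success_criteria if sc in referenced]
    unmapped = [sc for sc in success_criteria if sc not in referenced]
    return {
        "target": target,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "tests": tests,
        "coverage": {
            "requirements_mapped": mapped,
            "requirements_unmapped": unmapped,
        },
    }


plan = build_test_plan(
    "04-comments",
    [{
        "id": "T-001",
        "name": "Reply to Comment",
        "category": "e2e",
        "expected": "Reply appears under the parent comment",
        "requirement_ref": "SC-001",
    }],
    ["SC-001", "SC-003"],
)
```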
+---
+### Step 6: Create UAT File
+Archive existing `uat.md` → `$OUTPUT_DIR/.history/uat-{YYYY-MM-DDTHH-mm-ss}.md`.
+Create `$OUTPUT_DIR/uat.md`:
+```markdown
+---
+status: testing
+target: {phase slug or scratch ID}
+source: [list of summary files]
+started: {ISO timestamp}
+updated: {ISO timestamp}
+---
+## Current Test
+<!-- OVERWRITE each test - shows where we are -->
+number: 1
+name: {first test name}
+expected: |
+  {what user should observe}
+awaiting: user response
+## Smoke Tests
+{results if run, otherwise omitted}
+## Tests
+### 1. {Test Name}
+expected: {observable behavior}
+result: [pending]
+### 2. {Test Name}
+expected: {observable behavior}
+result: [pending]
+...
+## Summary
+total: {N}
+passed: 0
+issues: 0
+pending: {N}
+skipped: 0
+## Gaps
+[none yet]
+```
+→ Step 7.
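The archive step at the start of Step 6 could look roughly like this. It is a sketch under the assumption that the timestamp is taken from the local clock and that nothing is archived when no `uat.md` exists yet; `archive_uat` is a hypothetical name.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_uat(output_dir: Path, now: datetime) -> Optional[Path]:
    """Move an existing uat.md into .history/ with a timestamped name."""
    current = output_dir / "uat.md"
    if not current.exists():
        return None  # nothing to archive on a first run
    stamp = now.strftime("%Y-%m-%dT%H-%M-%S")  # uat-{YYYY-MM-DDTHH-mm-ss}.md
    destination = output_dir / ".history" / f"uat-{stamp}.md"
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(current), str(destination))
    return destination
```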
+---
+### Step 7: Present Test
+Display:
+```
+------------------------------------------------------------
+  TEST {number}/{total}: {name}
+------------------------------------------------------------
+Expected behavior:
+{expected}
+------------------------------------------------------------
+> Type "pass" or describe what's wrong
+------------------------------------------------------------
+```
+Wait for user response (plain text).
+---
+### Step 8: Process Response
+| Response | Action |
+|----------|--------|
+| empty / "yes" / "y" / "ok" / "pass" / "next" | Pass |
+| "skip" / "can't test" / "n/a" | Skipped |
+| Anything else | Issue (severity auto-inferred) |
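The response table can be read as a three-way classifier. The sketch below assumes matching is exact after trimming whitespace and lowercasing; the table itself does not say how strictly the keywords are matched.

```python
PASS_WORDS = {"", "yes", "y", "ok", "pass", "next"}
SKIP_WORDS = {"skip", "can't test", "n/a"}


def classify_response(response: str) -> str:
    """Map a plain-text reply to pass, skipped, or issue."""
    normalized = response.strip().lower()
    if normalized in PASS_WORDS:
        return "pass"
    if normalized in SKIP_WORDS:
        return "skipped"
    return "issue"  # anything else is treated as a problem report
```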
+For issues, update Tests section:
+```yaml
+### {N}. {name}
+expected: {expected}
+result: issue
+reported: "{verbatim user response}"
+severity: {inferred}
+```
+Append to Gaps section:
+```yaml
+- test: {N}
+  truth: "{expected behavior}"
+  status: failed
+  reason: "User reported: {verbatim}"
+  severity: {inferred}
+  requirement_ref: {if mapped}
+```
+**Auto-create Issue from UAT Gap:**
+Append to `.workflow/issues/issues.jsonl`: `ISS-{YYYYMMDD}-{NNN}`, title "UAT: {test.name} - {response}" (max 100 chars), `source: "uat"`, severity/priority from inference. Back-reference: set `gap.issue_id` in gap YAML.
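A sketch of the issue record described above. The field names (`id`, `title`, `source`, `severity`) and the zero-padded three-digit sequence are assumptions for illustration; only the ID pattern, the title format, the 100-character limit, and `source: "uat"` come from the text.

```python
import json
from datetime import date


def make_issue(test_name: str, response: str, severity: str,
               on_date: date, sequence: int) -> dict:
    """Build one issue record for a UAT gap."""
    issue_id = f"ISS-{on_date:%Y%m%d}-{sequence:03d}"
    title = f"UAT: {test_name} - {response}"[:100]  # max 100 chars
    return {"id": issue_id, "title": title, "source": "uat", "severity": severity}


def append_issue(path: str, issue: dict) -> None:
    """Append the record as one JSON line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(issue) + "\n")
```

The returned `id` is what would be written back to the gap as `issue_id`.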
+**Write triggers:** 1) Issue found 2) Session complete 3) Every 5 passed tests (checkpoint).
+More tests → Step 7. No more → Step 10.
+---
+### Step 9: Resume From File
+Read uat.md → find first `result: [pending]` → update Current Test → Step 7.
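Finding the resume point amounts to scanning uat.md for the first test whose result is still pending. This sketch assumes the `### {N}. {Test Name}` heading format from the Step 6 template; `first_pending_test` is a hypothetical helper.

```python
import re
from typing import Optional


def first_pending_test(uat_text: str) -> Optional[tuple[int, str]]:
    """Return (number, name) of the first test with `result: [pending]`."""
    current = None
    for line in uat_text.splitlines():
        heading = re.match(r"^### (\d+)\. (.+)$", line)
        if heading:
            current = (int(heading.group(1)), heading.group(2))
        elif line.strip() == "result: [pending]" and current is not None:
            return current
    return None  # nothing pending: the session is ready to complete
```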
+---
+### Step 10: Complete Session
+Update uat.md: `status: complete`. Archive existing test artifacts → `.history/`.
+Write `.tests/test-results.json`:
+```json
+{
+  "target": "{phase or scratch ID}",
+  "completed_at": "{ISO timestamp}",
+  "results": [
+    { "id": "T-001", "name": "...", "status": "pass|issue|skipped", "details": "..." }
+  ],
+  "summary": { "total": N, "passed": N, "issues": N, "skipped": N }
+}
+```
+Write `.tests/coverage-report.json`:
+```json
+{
+  "target": "{phase or scratch ID}",
+  "generated_at": "{ISO timestamp}",
+  "requirements_covered": ["SC-001", "SC-002"],
+  "requirements_uncovered": ["SC-003"],
+  "coverage_percentage": 66.7
+}
+```
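The percentage in the coverage report is presumably covered requirements over all requirements. The sketch below makes that assumption explicit, along with rounding to one decimal place and reporting 0.0 when there are no requirements at all.

```python
def coverage_report(target: str, covered: list[str], uncovered: list[str],
                    generated_at: str) -> dict:
    """Build the coverage report; percentage = covered / (covered + uncovered)."""
    total = len(covered) + len(uncovered)
    percentage = round(100 * len(covered) / total, 1) if total else 0.0
    return {
        "target": target,
        "generated_at": generated_at,
        "requirements_covered": covered,
        "requirements_uncovered": uncovered,
        "coverage_percentage": percentage,
    }
```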
+Update index.json with uat results (`status`, `test_count`, `passed`, `gaps`).
+issues == 0 → Step 13. issues > 0 → Step 11.
+---
+### Step 11: Auto-Diagnose
+1. **Cluster gaps** by component/area (same file/module → one cluster, same flow → one cluster)
+2. **Spawn one debug agent per cluster** (parallel, `run_in_background: false`): pre-filled symptoms, `goal: find_root_cause`. Include `issue_id` refs.
+3. **Collect results**, update uat.md gaps:
+```yaml
+- test: {N}
+  truth: "..."
+  status: failed
+  reason: "..."
+  severity: {inferred}
+  root_cause: "{diagnosed cause}"
+  fix_direction: "{suggested approach}"
+  affected_files: ["{file1}", "{file2}"]
+```
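Clustering gaps before spawning debug agents can be sketched as a simple grouping. The `area` key is hypothetical: the gap YAML does not define a component field before diagnosis, so how a gap is assigned to a file, module, or flow is left as an assumption here.

```python
from collections import defaultdict


def cluster_gaps(gaps: list[dict]) -> dict[str, list[int]]:
    """Group gap test numbers by a (hypothetical) `area` label."""
    clusters = defaultdict(list)
    for gap in gaps:
        clusters[gap.get("area", "unknown")].append(gap["test"])
    return dict(clusters)
```

Each resulting cluster would then receive one debug agent.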
+---
+### Step 12: Gap Closure Decision
+`AUTO_FIX` set → skip prompt, go to gap-fix loop. Otherwise present:
+```
+### Diagnosis Complete
+| Gap | Severity | Root Cause | Fix Direction |
+|-----|----------|------------|---------------|
+| T-3 | major    | Missing null check | Add guard clause |
+| T-5 | blocker  | Event not cleaned  | Add cleanup logic |
+Options:
+1. Auto-fix -- Plan and execute fixes, then re-verify
+2. Debug deep -- Skill({ skill: "quality-debug" }) per issue
+3. Plan fixes -- Skill({ skill: "maestro-plan", args: "{phase} --gaps" })
+4. Manual fix -- Address issues yourself
+```
+| Choice | Action |
+|--------|--------|
+| 1 / "auto-fix" | Go to gap-fix loop |
+| 2 / "debug" | Suggest Skill({ skill: "quality-debug" }) |
+| 3 / "plan" | Suggest Skill({ skill: "maestro-plan", args: "{phase} --gaps" }) |
+| 4 / "manual" | Done, report results |
+**Gap-fix closure loop** (max 2 iterations):
+1. `maestro-plan {phase} --gaps` → fix tasks
+2. `maestro-execute {phase}` → execute fixes
+3. `maestro-execute {phase}` → re-verify
+Issue lifecycle: `registered` → `planning` → `executing` → `completed` | `failed`.
+Pass → update uat.md gaps as resolved. Still gaps → report remaining, suggest manual intervention.
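The bounded loop can be sketched as follows. `run_iteration` is a hypothetical callback standing in for one plan, execute, and re-verify pass; it is assumed to return the number of gaps still open.

```python
from typing import Callable

MAX_ITERATIONS = 2  # "max 2 iterations"


def gap_fix_loop(open_gaps: int, run_iteration: Callable[[], int]) -> tuple[int, int]:
    """Repeat the fix cycle until no gaps remain or the limit is reached.

    Returns (remaining_gaps, iterations_used).
    """
    iterations = 0
    while open_gaps > 0 and iterations < MAX_ITERATIONS:
        open_gaps = run_iteration()
        iterations += 1
    return open_gaps, iterations
```

A non-zero first value after the loop corresponds to the "report remaining, suggest manual intervention" branch.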
+---
+### Step 12.5: UAT Confidence Scoring
+**Dimensions (4):** scenario_coverage, diagnostic_depth, observation_quality, closure_completeness.
+**Factors (weights):** requirements_mapped (.30), observation_specificity (.25), user_validation (.20), diagnostic_depth (.15), consistency (.10).
+**Score at:** init (Step 5), per user response (Step 8), after gap-fix loop (Step 12).
+**Quality mechanisms:**
+- Pressure Pass: >80% pass → ask user to try an edge case.
+- Devil's Advocate: >70% first-try pass → challenge scenario difficulty.
+- Stall Detection: 2 gap-fix iterations without improvement → stop.
+**Readiness Gate** (blocks Step 13 if any of these holds):
+- scenario_coverage < 40%
+- blocker gap without diagnosis
+- no pressure pass (if >80%)
+- unresolved gaps without acknowledgment
+Append confidence summary to uat.md.
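The text lists factor weights but not how they combine. A plain weighted sum over factor values in the range 0 to 1 is one plausible reading, sketched below together with the readiness gate; the function names, parameter names, and the sum itself are assumptions.

```python
WEIGHTS = {
    "requirements_mapped": 0.30,
    "observation_specificity": 0.25,
    "user_validation": 0.20,
    "diagnostic_depth": 0.15,
    "consistency": 0.10,
}


def confidence_score(factors: dict[str, float]) -> float:
    """Weighted sum of factor values, each assumed to lie in [0, 1]."""
    return round(sum(WEIGHTS[name] * factors[name] for name in WEIGHTS), 4)


def readiness_blockers(scenario_coverage: float, undiagnosed_blockers: int,
                       pass_rate: float, pressure_pass_done: bool,
                       unacknowledged_gaps: int) -> list[str]:
    """Return the gate conditions that currently block the final report."""
    blockers = []
    if scenario_coverage < 0.40:
        blockers.append("scenario_coverage < 40%")
    if undiagnosed_blockers > 0:
        blockers.append("blocker gap without diagnosis")
    if pass_rate > 0.80 and not pressure_pass_done:
        blockers.append("no pressure pass")
    if unacknowledged_gaps > 0:
        blockers.append("unresolved gaps without acknowledgment")
    return blockers
```

An empty list from `readiness_blockers` means the report step may proceed.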
+---
+### Step 13: Report
+```
+=== UAT RESULTS ===
+Target:      {target}
+Smoke Tests: {smoke_count} run, {smoke_pass} passed (if run)
+UAT Tests:   {total} total
+  Passed:    {passed}
+  Issues:    {issues} ({blocker_count} blockers, {major_count} major)
+  Skipped:   {skipped}
+Diagnosis:   {diagnosed_count}/{issues} gaps diagnosed
+Auto-fix:    {fixed_count} gaps resolved (if run)
+Files:
+  {target_dir}/uat.md
+  {target_dir}/.tests/test-results.json
+  {target_dir}/.tests/coverage-report.json
+Next steps:
+  {suggested_next_command}
+```
+**Next step routing:**
+| Result | Suggestion |
+|--------|------------|
+| All passed, no gaps | Skill({ skill: "maestro-milestone-audit" }) |
+| Gaps auto-fixed | Skill({ skill: "maestro-milestone-audit" }) |
+| Gaps remain, diagnosed | Skill({ skill: "quality-debug" }) or Skill({ skill: "maestro-plan", args: "--gaps" }) |
+| Low coverage | Skill({ skill: "quality-auto-test", args: "{phase}" }) to generate missing tests |
+---
+## Severity Inference
+| User says | Infer |
+|-----------|-------|
+| "crashes", "error", "exception", "fails completely", "can't use" | blocker |
+| "doesn't work", "nothing happens", "wrong behavior", "broken" | major |
+| "works but...", "slow", "weird", "minor issue", "inconsistent" | minor |
+| "color", "spacing", "alignment", "looks off", "typo" | cosmetic |
+Default: **major**. NEVER ask severity — infer and move on.
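The inference table can be sketched as a keyword lookup. Two details are assumptions: matching is done by case-insensitive substring, and when several rows match, the most severe row wins because rows are checked from blocker down to cosmetic.

```python
SEVERITY_KEYWORDS = [
    ("blocker", ["crashes", "error", "exception", "fails completely", "can't use"]),
    ("major", ["doesn't work", "nothing happens", "wrong behavior", "broken"]),
    ("minor", ["works but", "slow", "weird", "minor issue", "inconsistent"]),
    ("cosmetic", ["color", "spacing", "alignment", "looks off", "typo"]),
]


def infer_severity(report: str) -> str:
    """Infer severity from the user's wording; never ask, default to major."""
    text = report.lower()
    for severity, keywords in SEVERITY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return severity
    return "major"
```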