npm - maestro-flow - Versions diffs - 0.4.10 → 0.4.11 - Mend

maestro-flow 0.4.10 → 0.4.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (226)

package/.agents/skills/team-testing/roles/coordinator/commands/analyze.md ADDED

@@ -0,0 +1,70 @@
+# Analyze Task
+Parse user task -> detect testing capabilities -> select pipeline -> design roles.
+**CONSTRAINT**: Text-level analysis only. NO source code reading, NO codebase exploration.
+## Signal Detection
+| Keywords | Capability | Prefix |
+|----------|------------|--------|
+| strategy, plan, layers, scope | strategist | STRATEGY |
+| generate tests, write tests, create tests | generator | TESTGEN |
+| run tests, execute, coverage | executor | TESTRUN |
+| analyze, report, quality, defects | analyst | TESTANA |
+## Pipeline Mode Detection
+| Condition | Pipeline |
+|-----------|----------|
+| fileCount <= 3 AND moduleCount <= 1 | targeted |
+| fileCount <= 10 AND moduleCount <= 3 | standard |
+| Otherwise | comprehensive |
+## Dependency Graph
+Natural ordering for testing pipeline:
+- Tier 0: strategist (change analysis, no upstream dependency)
+- Tier 1: generator (requires strategy)
+- Tier 2: executor (requires generated tests; GC loop with generator)
+- Tier 3: analyst (requires execution results)
+## Pipeline Definitions
+```
+Targeted:      STRATEGY -> TESTGEN(L1) -> TESTRUN(L1)
+Standard:      STRATEGY -> TESTGEN(L1) -> TESTRUN(L1) -> TESTGEN(L2) -> TESTRUN(L2) -> TESTANA
+Comprehensive: STRATEGY -> [TESTGEN(L1) || TESTGEN(L2)] -> [TESTRUN(L1) || TESTRUN(L2)] -> TESTGEN(L3) -> TESTRUN(L3) -> TESTANA
+```
+## Complexity Scoring
+| Factor | Points |
+|--------|--------|
+| Per test layer | +1 |
+| Parallel tracks | +1 per track |
+| GC loop enabled | +1 |
+| Serial depth > 3 | +1 |
+Results: 1-2 Low, 3-5 Medium, 6+ High
+## Role Minimization
+- Cap at 5 roles (coordinator + 4 workers)
+- GC loop: generator <-> executor iterate up to 3 rounds per layer
+## Output
+Write <session>/task-analysis.json:
+```json
+{
+  "task_description": "<original>",
+  "pipeline_mode": "<targeted|standard|comprehensive>",
+  "capabilities": [{ "name": "<cap>", "prefix": "<PREFIX>", "keywords": ["..."] }],
+  "dependency_graph": { "<TASK-ID>": { "role": "<role>", "blockedBy": ["..."], "layer": "L1|L2|L3" } },
+  "roles": [{ "name": "<role>", "prefix": "<PREFIX>", "inner_loop": true }],
+  "complexity": { "score": 0, "level": "Low|Medium|High" },
+  "coverage_targets": { "L1": 80, "L2": 60, "L3": 40 },
+  "gc_loop_enabled": true
+}
+```
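
The pipeline mode table and the complexity scoring in analyze.md can be read as two small pure functions. The following is a minimal sketch, not code from the package: `selectPipeline` and `scoreComplexity` are hypothetical names, the input field names are assumptions, and treating a score below 1 as "Low" is also an assumption, since the source table starts at 1.

```javascript
// Hypothetical helper: maps change-scope counts to a pipeline mode,
// following the "Pipeline Mode Detection" table row by row.
function selectPipeline(fileCount, moduleCount) {
  if (fileCount <= 3 && moduleCount <= 1) return "targeted";
  if (fileCount <= 10 && moduleCount <= 3) return "standard";
  return "comprehensive";
}

// Hypothetical helper: sums the factors from the "Complexity Scoring" table.
// Assumed inputs: number of test layers, number of parallel tracks,
// whether the GC loop is enabled, and the serial depth of the chain.
function scoreComplexity({ layers, parallelTracks, gcLoopEnabled, serialDepth }) {
  let score = layers + parallelTracks; // +1 per layer, +1 per parallel track
  if (gcLoopEnabled) score += 1;       // +1 when the GC loop is enabled
  if (serialDepth > 3) score += 1;     // +1 when serial depth exceeds 3
  // 1-2 Low, 3-5 Medium, 6+ High (scores below 1 are assumed to be Low)
  const level = score <= 2 ? "Low" : score <= 5 ? "Medium" : "High";
  return { score, level };
}
```

Under these assumptions, a change touching 3 files in 2 modules fails the first row on `moduleCount` and falls through to `standard`.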

package/.agents/skills/team-testing/roles/coordinator/commands/dispatch.md ADDED

@@ -0,0 +1,108 @@
+# Dispatch Tasks
+Create testing task chains with correct dependencies. Supports targeted, standard, and comprehensive pipelines.
+## Workflow
+1. Read task-analysis.json -> extract pipeline_mode and dependency_graph
+2. Read specs/pipelines.md -> get task registry for selected pipeline
+3. Topological sort tasks (respect blockedBy)
+4. Validate all owners exist in role registry (SKILL.md)
+5. For each task (in order):
+   - create_task with structured description (see template below)
+   - update_task with blockedBy + owner assignment
+6. Update session.json with pipeline.tasks_total
+7. Validate chain (no orphans, no cycles, all refs valid)
+## Task Description Template
+```
+PURPOSE: <goal> | Success: <criteria>
+TASK:
+  - <step 1>
+  - <step 2>
+CONTEXT:
+  - Session: <session-folder>
+  - Scope: <scope>
+  - Layer: <L1-unit|L2-integration|L3-e2e>
+  - Upstream artifacts: <artifact-1>, <artifact-2>
+  - Shared memory: <session>/wisdom/.msg/meta.json
+EXPECTED: <deliverable path> + <quality criteria>
+CONSTRAINTS: <scope limits, focus areas>
+---
+InnerLoop: <true|false>
+RoleSpec: ~  or <project>/.claude/skills/team-testing/roles/<role>/role.md
+```
+## Pipeline Task Registry
+### Targeted Pipeline
+```
+STRATEGY-001 (strategist): Analyze change scope, define test strategy
+  blockedBy: []
+TESTGEN-001 (generator): Generate L1 unit tests
+  blockedBy: [STRATEGY-001], meta: layer=L1-unit
+TESTRUN-001 (executor): Execute L1 tests, collect coverage
+  blockedBy: [TESTGEN-001], inner_loop: true, meta: layer=L1-unit, coverage_target=80%
+```
+### Standard Pipeline
+```
+STRATEGY-001 (strategist): Analyze change scope, define test strategy
+  blockedBy: []
+TESTGEN-001 (generator): Generate L1 unit tests
+  blockedBy: [STRATEGY-001], meta: layer=L1-unit
+TESTRUN-001 (executor): Execute L1 tests, collect coverage
+  blockedBy: [TESTGEN-001], inner_loop: true, meta: layer=L1-unit, coverage_target=80%
+TESTGEN-002 (generator): Generate L2 integration tests
+  blockedBy: [TESTRUN-001], meta: layer=L2-integration
+TESTRUN-002 (executor): Execute L2 tests, collect coverage
+  blockedBy: [TESTGEN-002], inner_loop: true, meta: layer=L2-integration, coverage_target=60%
+TESTANA-001 (analyst): Defect pattern analysis, quality report
+  blockedBy: [TESTRUN-002]
+```
+### Comprehensive Pipeline
+```
+STRATEGY-001 (strategist): Analyze change scope, define test strategy
+  blockedBy: []
+TESTGEN-001 (generator-1): Generate L1 unit tests
+  blockedBy: [STRATEGY-001], meta: layer=L1-unit
+TESTGEN-002 (generator-2): Generate L2 integration tests
+  blockedBy: [STRATEGY-001], meta: layer=L2-integration
+TESTRUN-001 (executor-1): Execute L1 tests, collect coverage
+  blockedBy: [TESTGEN-001], inner_loop: true, meta: layer=L1-unit, coverage_target=80%
+TESTRUN-002 (executor-2): Execute L2 tests, collect coverage
+  blockedBy: [TESTGEN-002], inner_loop: true, meta: layer=L2-integration, coverage_target=60%
+TESTGEN-003 (generator): Generate L3 E2E tests
+  blockedBy: [TESTRUN-001, TESTRUN-002], meta: layer=L3-e2e
+TESTRUN-003 (executor): Execute L3 tests, collect coverage
+  blockedBy: [TESTGEN-003], inner_loop: true, meta: layer=L3-e2e, coverage_target=40%
+TESTANA-001 (analyst): Defect pattern analysis, quality report
+  blockedBy: [TESTRUN-003]
+```
+## InnerLoop Flag Rules
+- true: generator, executor roles (GC loop iterations)
+- false: strategist, analyst roles
+## Dependency Validation
+- No orphan tasks (all tasks have valid owner)
+- No circular dependencies
+- All blockedBy references exist
+- Session reference in every task description
+- RoleSpec reference in every task description
+## Log After Creation
+```
+mcp__ccw-tools__team_msg({
+  operation: "log",
+  session_id: <session-id>,
+  from: "coordinator",
+  type: "pipeline_selected",
+  data: { pipeline: "<mode>", task_count: <N> }
+})
+```
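
Workflow steps 3 and 7 of dispatch.md (topological sort, then validation that every blockedBy reference exists and that there are no cycles) can be illustrated with Kahn's algorithm. This is a sketch under stated assumptions, not the package's implementation: `topoSort` is a hypothetical name, the registry is modeled as a plain object keyed by task ID, and each `blockedBy` list is assumed to contain no duplicates. The sample data mirrors the comprehensive pipeline registry.

```javascript
// Hypothetical helper: orders tasks so that each one appears after
// everything listed in its blockedBy array (Kahn's algorithm).
function topoSort(tasks) {
  const ids = Object.keys(tasks);
  const has = (id) => Object.prototype.hasOwnProperty.call(tasks, id);

  // Validation: every blockedBy reference must point at a known task.
  for (const id of ids) {
    for (const dep of tasks[id].blockedBy) {
      if (!has(dep)) throw new Error(`Unknown blockedBy reference: ${id} -> ${dep}`);
    }
  }

  // Count unresolved dependencies per task; tasks with none are ready.
  const remaining = new Map(ids.map((id) => [id, tasks[id].blockedBy.length]));
  const ready = ids.filter((id) => remaining.get(id) === 0);
  const order = [];

  while (ready.length > 0) {
    const id = ready.shift();
    order.push(id);
    for (const other of ids) {
      if (tasks[other].blockedBy.includes(id)) {
        const left = remaining.get(other) - 1;
        remaining.set(other, left);
        if (left === 0) ready.push(other);
      }
    }
  }

  // Any task never reached is part of a cycle.
  if (order.length !== ids.length) throw new Error("Circular dependency detected");
  return order;
}

// Comprehensive pipeline, transcribed from the registry above.
const comprehensive = {
  "STRATEGY-001": { blockedBy: [] },
  "TESTGEN-001": { blockedBy: ["STRATEGY-001"] },
  "TESTGEN-002": { blockedBy: ["STRATEGY-001"] },
  "TESTRUN-001": { blockedBy: ["TESTGEN-001"] },
  "TESTRUN-002": { blockedBy: ["TESTGEN-002"] },
  "TESTGEN-003": { blockedBy: ["TESTRUN-001", "TESTRUN-002"] },
  "TESTRUN-003": { blockedBy: ["TESTGEN-003"] },
  "TESTANA-001": { blockedBy: ["TESTRUN-003"] },
};
const creationOrder = topoSort(comprehensive);
```

The sort only fixes a valid creation order. Which tasks actually run in parallel is still decided at spawn time from the blockedBy state.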

package/.agents/skills/team-testing/roles/coordinator/commands/monitor.md ADDED

@@ -0,0 +1,257 @@
+# Monitor Pipeline
+Event-driven pipeline coordination. Beat model: coordinator wake -> process -> spawn -> STOP.
+## Constants
+- SPAWN_MODE: background
+- ONE_STEP_PER_INVOCATION: true
+- FAST_ADVANCE_AWARE: true
+- WORKER_AGENT: team-worker
+- MAX_GC_ROUNDS: 3
+## Handler Router
+| Source | Handler |
+|--------|---------|
+| Message contains [strategist], [generator], [executor], [analyst] | handleCallback |
+| "capability_gap" | handleAdapt |
+| "check" or "status" | handleCheck |
+| "resume" or "continue" | handleResume |
+| All tasks completed | handleComplete |
+| Default | handleSpawnNext |
+## Role-Worker Map
+| Prefix | Role | Role Spec | inner_loop |
+|--------|------|-----------|------------|
+| STRATEGY-* | strategist | `~  or <project>/.claude/skills/team-testing/roles/strategist/role.md` | false |
+| TESTGEN-* | generator | `~  or <project>/.claude/skills/team-testing/roles/generator/role.md` | true |
+| TESTRUN-* | executor | `~  or <project>/.claude/skills/team-testing/roles/executor/role.md` | true |
+| TESTANA-* | analyst | `~  or <project>/.claude/skills/team-testing/roles/analyst/role.md` | false |
+## handleCallback
+Worker completed. Process and advance.
+1. Parse message to identify role and task ID:
+| Message Pattern | Role Detection |
+|----------------|---------------|
+| `[strategist]` or task ID `STRATEGY-*` | strategist |
+| `[generator]` or task ID `TESTGEN-*` | generator |
+| `[executor]` or task ID `TESTRUN-*` | executor |
+| `[analyst]` or task ID `TESTANA-*` | analyst |
+2. Check if progress update (inner loop) or final completion
+3. Progress -> update session state, STOP
+4. Completion -> mark task done via update_task(status="completed"), remove from active_workers
+5. Check for checkpoints:
+   - TESTRUN-* completes -> read meta.json for executor.pass_rate and executor.coverage:
+     - (pass_rate >= 0.95 AND coverage >= target) OR gc_rounds[layer] >= MAX_GC_ROUNDS -> proceed to handleSpawnNext
+     - (pass_rate < 0.95 OR coverage < target) AND gc_rounds[layer] < MAX_GC_ROUNDS -> create GC fix tasks, increment gc_rounds[layer]
+**GC Fix Task Creation** (when pass rate or coverage is below target and GC rounds remain):
+```
+create_task({
+  subject: "TESTGEN-<layer>-fix-<round>: Revise <layer> tests (GC #<round>)",
+  description: "PURPOSE: Revise tests to fix failures and improve coverage | Success: pass_rate >= 0.95 AND coverage >= target
+TASK:
+  - Read previous test results and failure details
+  - Revise tests to address failures
+  - Improve coverage for uncovered areas
+CONTEXT:
+  - Session: <session-folder>
+  - Layer: <layer>
+  - Previous results: <session>/results/run-<N>.json
+EXPECTED: Revised test files in <session>/tests/<layer>/
+CONSTRAINTS: Only modify test files
+---
+InnerLoop: true
+RoleSpec: ~  or <project>/.claude/skills/team-testing/roles/generator/role.md"
+})
+create_task({
+  subject: "TESTRUN-<layer>-fix-<round>: Re-execute <layer> (GC #<round>)",
+  description: "PURPOSE: Re-execute tests after revision | Success: pass_rate >= 0.95
+CONTEXT:
+  - Session: <session-folder>
+  - Layer: <layer>
+  - Input: tests/<layer>
+EXPECTED: <session>/results/run-<N>-gc.json
+---
+InnerLoop: true
+RoleSpec: ~  or <project>/.claude/skills/team-testing/roles/executor/role.md",
+  blockedBy: ["TESTGEN-<layer>-fix-<round>"]
+})
+```
+Update session.gc_rounds[layer]++
+6. -> handleSpawnNext
+## handleCheck
+Read-only status report, then STOP.
+**Worker Progress** (from message bus):
+Before generating status output, read worker milestones:
+```javascript
+const progressMsgs = mcp__ccw-tools__team_msg({
+  operation: "list", session_id: sessionId, type: "progress", last: 50
+})
+const blockerMsgs = mcp__ccw-tools__team_msg({
+  operation: "list", session_id: sessionId, type: "blocker", last: 10
+})
+// Aggregate latest milestone per task
+const taskProgress = {}
+for (const msg of (progressMsgs.result?.messages || [])) {
+  const tid = msg.data?.task_id
+  if (tid && (!taskProgress[tid] || msg.ts > taskProgress[tid].ts)) {
+    taskProgress[tid] = { phase: msg.data.phase, pct: msg.data.progress_pct, ts: msg.ts }
+  }
+}
+```
+Include in status output:
+- Per-worker latest milestone (phase + progress_pct) next to task status
+- Active blockers section (if any blockerMsgs found)
+Output:
+```
+[coordinator] Testing Pipeline Status
+[coordinator] Mode: <pipeline_mode>
+[coordinator] Progress: <done>/<total> (<pct>%)
+[coordinator] GC Rounds: L1: <n>/3, L2: <n>/3
+[coordinator] Pipeline Graph:
+  STRATEGY-001: <done|run|wait> test-strategy.md
+  TESTGEN-001:  <done|run|wait> generating L1...
+  TESTRUN-001:  <done|run|wait> blocked by TESTGEN-001
+  TESTGEN-002:  <done|run|wait> blocked by TESTRUN-001
+  TESTRUN-002:  <done|run|wait> blocked by TESTGEN-002
+  TESTANA-001:  <done|run|wait> blocked by TESTRUN-*
+[coordinator] Active Workers: <list with elapsed time>
+[coordinator] Ready: <pending tasks with resolved deps>
+[coordinator] Commands: 'resume' to advance | 'check' to refresh
+```
+Then STOP.
+## handleResume
+1. No active workers -> handleSpawnNext
+2. Has active -> check each status
+   - completed -> mark done via update_task
+   - in_progress -> still running
+3. Some completed -> handleSpawnNext
+4. All running -> report status, STOP
+## handleSpawnNext
+Find ready tasks, spawn workers, STOP.
+1. Collect from list_tasks():
+   - completedSubjects: status = completed
+   - inProgressSubjects: status = in_progress
+   - readySubjects: status = pending AND all blockedBy in completedSubjects
+2. No ready + work in progress -> report waiting, STOP
+3. No ready + nothing in progress -> handleComplete
+4. Has ready -> for each ready task:
+   a. Determine role from prefix (use Role-Worker Map)
+   b. Check if inner loop role (generator/executor) with active worker -> skip (worker picks up next task)
+   c. update_task -> in_progress
+   d. team_msg log -> task_unblocked
+   e. Spawn team-worker:
+```
+delegate_subagent({
+  subagent_type: "team-worker",
+  description: "Spawn <role> worker for <subject>",
+  team_name: "testing",
+  name: "<role>",
+  run_in_background: true,
+  prompt: `## Role Assignment
+role: <role>
+role_spec: ~  or <project>/.claude/skills/team-testing/roles/<role>/role.md
+session: <session-folder>
+session_id: <session-id>
+team_name: testing
+requirement: <task-description>
+inner_loop: <true|false>
+## Current Task
+- Task ID: <task-id>
+- Task: <subject>
+## Progress Milestones
+session_id: <session-id>
+Report progress via team_msg at natural phase boundaries (context loaded -> core work done -> verification).
+Report blockers immediately via team_msg type="blocker".
+Report completion via team_msg type="task_complete" after final send_message.
+Read role_spec file to load Phase 2-4 domain instructions.
+Execute built-in Phase 1 (task discovery) -> role Phase 2-4 -> built-in Phase 5 (report).`
+})
+```
+   f. Add to active_workers
+5. **Parallel spawn** (comprehensive pipeline):
+   - TESTGEN-001 + TESTGEN-002 both unblocked -> spawn both in parallel (name: "generator-1", "generator-2")
+   - TESTRUN-001 + TESTRUN-002 both unblocked -> spawn both in parallel (name: "executor-1", "executor-2")
+6. Update session.json, output summary, STOP
+## handleComplete
+Pipeline done. Generate report and completion action.
+1. Verify all tasks (including any GC fix tasks) have status "completed" or "deleted"
+2. If any tasks incomplete -> return to handleSpawnNext
+3. If all complete:
+   - Read final state from meta.json (analyst.quality_score, executor.coverage, gc_rounds)
+   - Generate summary (deliverables, task count, GC rounds, coverage metrics)
+4. Read session.completion_action:
+   - interactive -> ask_user (Archive/Keep/Deepen Coverage)
+   - auto_archive -> Archive & Clean (status=completed, delete_team)
+   - auto_keep -> Keep Active (status=paused)
+## handleAdapt
+Capability gap reported mid-pipeline.
+1. Parse gap description
+2. Check if existing role covers it -> redirect
+3. Role count < 5 -> generate dynamic role-spec in <session>/role-specs/
+4. Create new task, spawn worker
+5. Role count >= 5 -> merge or pause
+## Fast-Advance Reconciliation
+On every coordinator wake:
+1. Read team_msg entries with type="fast_advance"
+2. Sync active_workers with spawned successors
+3. No duplicate spawns
+## Phase 4: State Persistence
+After every handler execution:
+1. Reconcile active_workers with actual list_tasks states
+2. Remove entries for completed/deleted tasks
+3. Write updated session.json
+4. STOP (wait for next callback)
+## Error Handling
+| Scenario | Resolution |
+|----------|------------|
+| Session file not found | Error, suggest re-initialization |
+| Worker callback from unknown role | Log info, scan for other completions |
+| GC loop exceeded (3 rounds) | Accept current coverage with warning, proceed |
+| Pipeline stall | Check blockedBy chains, report to user |
+| Coverage tool unavailable | Degrade to pass rate judgment |
+| Worker crash | Reset task to pending, respawn |
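
Two decisions in monitor.md are simple enough to state as predicates: the GC checkpoint in handleCallback and the ready-task selection in handleSpawnNext. The sketch below is illustrative only. `gcDecision` and `findReadyTasks` are hypothetical names, the task shape (`id`, `status`, `blockedBy`) is an assumption, and `coverage` and `target` are assumed to use the same unit (for example both in percent).

```javascript
const MAX_GC_ROUNDS = 3;          // from the Constants section
const PASS_RATE_THRESHOLD = 0.95; // from the handleCallback checkpoint

// Hypothetical helper: decides what happens after a TESTRUN-* task completes.
// Returns "spawn_next" when the quality gate passes or the round budget for
// this layer is used up, otherwise "create_gc_fix_tasks".
function gcDecision({ passRate, coverage, target, gcRounds }) {
  const gatePassed = passRate >= PASS_RATE_THRESHOLD && coverage >= target;
  if (gatePassed || gcRounds >= MAX_GC_ROUNDS) return "spawn_next";
  return "create_gc_fix_tasks";
}

// Hypothetical helper: a task is ready when it is pending and every entry
// in its blockedBy list refers to a completed task.
function findReadyTasks(tasks) {
  const completed = new Set(
    tasks.filter((t) => t.status === "completed").map((t) => t.id)
  );
  return tasks
    .filter((t) => t.status === "pending" && t.blockedBy.every((dep) => completed.has(dep)))
    .map((t) => t.id);
}
```

Because the two branches of the checkpoint are exact complements, a single boolean is enough. The third-round case proceeds even with a failing gate, which matches the "accept current coverage with warning" row in the error table.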

package/.agents/skills/team-testing/roles/coordinator/role.md ADDED

@@ -0,0 +1,134 @@
+# Coordinator Role
+Orchestrate team-testing: analyze -> dispatch -> spawn -> monitor -> report.
+## Identity
+- Name: coordinator | Tag: [coordinator]
+- Responsibility: Change scope analysis -> Create team -> Dispatch tasks -> Monitor progress -> Report results
+## Boundaries
+### MUST
+- Use `team-worker` agent type for all worker spawns (NOT `general-purpose`)
+- Follow Command Execution Protocol for dispatch and monitor commands
+- Respect pipeline stage dependencies (blockedBy)
+- Stop after spawning workers -- wait for callbacks
+- Handle Generator-Critic cycles with max 3 iterations per layer
+- Execute completion action in Phase 5
+### MUST NOT
+- Implement domain logic (test generation, execution, analysis) -- workers handle this
+- Spawn workers without creating tasks first
+- Skip quality gates when coverage is below target
+- Modify test files or source code directly -- delegate to workers
+- Force-advance pipeline past failed GC loops
+## Command Execution Protocol
+When coordinator needs to execute a specific phase:
+1. Read `commands/<command>.md`
+2. Follow the workflow defined in the command
+3. Commands are inline execution guides, NOT separate agents
+4. Execute synchronously, complete before proceeding
+## Entry Router
+| Detection | Condition | Handler |
+|-----------|-----------|---------|
+| Worker callback | Message contains [strategist], [generator], [executor], [analyst] | -> handleCallback (monitor.md) |
+| Status check | Args contain "check" or "status" | -> handleCheck (monitor.md) |
+| Manual resume | Args contain "resume" or "continue" | -> handleResume (monitor.md) |
+| Capability gap | Message contains "capability_gap" | -> handleAdapt (monitor.md) |
+| Pipeline complete | All tasks completed | -> handleComplete (monitor.md) |
+| Interrupted session | Active session in .workflow/.team/TST-* | -> Phase 0 |
+| New session | None of above | -> Phase 1 |
+For callback/check/resume/adapt/complete: load @commands/monitor.md, execute handler, STOP.
+## Phase 0: Session Resume Check
+1. Scan .workflow/.team/TST-*/session.json for active/paused sessions
+2. No sessions -> Phase 1
+3. Single session -> reconcile (audit list_tasks, reset in_progress->pending, rebuild team, kick first ready task)
+4. Multiple -> ask_user for selection
+## Phase 1: Requirement Clarification
+TEXT-LEVEL ONLY. No source code reading.
+1. Parse task description from $ARGUMENTS
+2. Analyze change scope:
+   ```
+   shell("git diff --name-only HEAD~1 2>/dev/null || git diff --name-only --cached")
+   ```
+3. Select pipeline:
+| Condition | Pipeline |
+|-----------|----------|
+| fileCount <= 3 AND moduleCount <= 1 | targeted |
+| fileCount <= 10 AND moduleCount <= 3 | standard |
+| Otherwise | comprehensive |
+4. Clarify if ambiguous (ask_user for scope)
+5. Delegate to @commands/analyze.md
+6. Output: task-analysis.json
+7. CRITICAL: Always proceed to Phase 2, never skip team workflow
+## Phase 2: Create Team + Initialize Session
+1. Resolve workspace paths (MUST do first):
+   - `project_root` = result of `shell({ command: "pwd" })`
+   - `skill_root` = `<project_root>/.claude/skills/team-testing`
+2. Generate session ID: TST-<slug>-<date>
+3. Create session folder structure (strategy/, tests/L1-unit/, tests/L2-integration/, tests/L3-e2e/, results/, analysis/, wisdom/)
+4. create_team with team name "testing"
+5. Read specs/pipelines.md -> select pipeline based on mode
+6. Initialize pipeline via team_msg state_update:
+   ```
+   mcp__ccw-tools__team_msg({
+     operation: "log", session_id: "<id>", from: "coordinator",
+     type: "state_update", summary: "Session initialized",
+     data: {
+       pipeline_mode: "<targeted|standard|comprehensive>",
+       pipeline_stages: ["strategist", "generator", "executor", "analyst"],
+       team_name: "testing",
+       coverage_targets: { "L1": 80, "L2": 60, "L3": 40 },
+       gc_rounds: {}
+     }
+   })
+   ```
+7. Write session.json
+## Phase 3: Create Task Chain
+Delegate to @commands/dispatch.md:
+1. Read specs/pipelines.md for selected pipeline's task registry
+2. Topological sort tasks
+3. Create tasks via create_task with blockedBy
+4. Update session.json
+## Phase 4: Spawn-and-Stop
+Delegate to @commands/monitor.md#handleSpawnNext:
+1. Find ready tasks (pending + blockedBy resolved)
+2. Spawn team-worker agents (see SKILL.md Spawn Template)
+3. Output status summary
+4. STOP
+## Phase 5: Report + Completion Action
+1. Generate summary (deliverables, pipeline stats, GC rounds, coverage metrics)
+2. Execute completion action per session.completion_action:
+   - interactive -> ask_user (Archive/Keep/Deepen Coverage)
+   - auto_archive -> Archive & Clean
+   - auto_keep -> Keep Active
+## Error Handling
+| Error | Resolution |
+|-------|------------|
+| Task too vague | ask_user for clarification |
+| Session corruption | Attempt recovery, fallback to manual |
+| Worker crash | Reset task to pending, respawn |
+| Dependency cycle | Detect in analysis, halt |
+| GC loop exceeded (3 rounds) | Accept current coverage, log to wisdom, proceed |
+| Coverage tool unavailable | Degrade to pass rate judgment |
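
The Entry Router table in role.md can be sketched as a single dispatch function. Two things here are assumptions rather than facts from the source: that the rows are evaluated top to bottom with the first match winning, and that "contain" means a whole-word match on the arguments. `routeEntry` and its input fields are hypothetical names.

```javascript
// Hypothetical helper: returns the handler or phase for a coordinator wake.
// Assumes first-match-wins in the order the Entry Router table lists its rows.
function routeEntry({
  message = "",
  args = "",
  allTasksCompleted = false,
  hasActiveSession = false,
} = {}) {
  // Worker callback: message carries a role tag such as [executor].
  if (/\[(strategist|generator|executor|analyst)\]/.test(message)) return "handleCallback";
  // Status check and manual resume are driven by the invocation arguments.
  if (/\b(check|status)\b/.test(args)) return "handleCheck";
  if (/\b(resume|continue)\b/.test(args)) return "handleResume";
  // Capability gap reported by a worker.
  if (message.includes("capability_gap")) return "handleAdapt";
  if (allTasksCompleted) return "handleComplete";
  // An active or paused TST-* session exists: go through the resume check.
  if (hasActiveSession) return "phase0";
  return "phase1";
}
```

If the real precedence differs (for example, a `capability_gap` message that also carries a role tag), the order of the checks would need to change accordingly.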

package/.agents/skills/team-testing/roles/executor/role.md ADDED

@@ -0,0 +1,101 @@
+---
+role: executor
+prefix: TESTRUN
+inner_loop: true
+message_types:
+  success: tests_passed
+  failure: tests_failed
+  coverage: coverage_report
+  error: error
+---
+<!-- Open-standard mirror generated by scripts/build-agents-standard.mjs — do not edit; re-run after editing .claude/ source. -->
+# Test Executor
+Execute tests, collect coverage, attempt auto-fix for failures. Acts as the Critic in the Generator-Critic loop. Reports pass rate and coverage for coordinator GC decisions.
+## Phase 2: Context Loading
+| Input | Source | Required |
+|-------|--------|----------|
+| Task description | From task subject/description | Yes |
+| Session path | Extracted from task description | Yes |
+| Test directory | Task description (Input: <path>) | Yes |
+| Coverage target | Task description (default: 80%) | Yes |
+| .msg/meta.json | <session>/wisdom/.msg/meta.json | No |
+1. Extract session path and test directory from task description
+2. Load test specs: Run `ccw spec load --category test` for test framework conventions and coverage targets
+3. Extract coverage target (default: 80%)
+4. Read .msg/meta.json for framework info (from strategist namespace)
+5. Determine test framework:
+| Framework | Run Command |
+|-----------|-------------|
+| Jest | `npx jest --coverage --json --outputFile=<session>/results/jest-output.json` |
+| Pytest | `python -m pytest --cov --cov-report=json:<session>/results/coverage.json -v` |
+| Vitest | `npx vitest run --coverage --reporter=json` |
+6. Find test files to execute:
+```
+find_files("<session>/<test-dir>/**/*")
+```
+## Phase 3: Test Execution + Fix Cycle
+**Iterative test-fix cycle** (max 3 iterations):
+| Step | Action |
+|------|--------|
+| 1 | Run test command |
+| 2 | Parse results: pass rate + coverage |
+| 3 | pass_rate >= 0.95 AND coverage >= target -> success, exit |
+| 4 | Extract failing test details |
+| 5 | Delegate fix to CLI tool (gemini write mode) |
+| 6 | Increment iteration; >= 3 -> exit with failures |
+```
+shell("<test-command> 2>&1 || true")
+```
+**Auto-fix delegation** (on failure):
+```
+shell({
+  command: `maestro delegate "PURPOSE: Fix test failures to achieve pass rate >= 0.95; success = all tests pass
+TASK: • Analyze test failure output • Identify root causes • Fix test code only (not source) • Preserve test intent
+MODE: write
+CONTEXT: @<session>/<test-dir>/**/* | Memory: Test framework: <framework>, iteration <N>/3
+EXPECTED: Fixed test files with: corrected assertions, proper async handling, fixed imports, maintained coverage
+CONSTRAINTS: Only modify test files | Preserve test structure | No source code changes
+Test failures:
+<test-output>" --tool gemini --mode write --cd <session>`,
+  run_in_background: false
+})
+```
+**Save results**: `<session>/results/run-<N>.json`
+## Phase 4: Defect Pattern Extraction & State Update
+**Extract defect patterns from failures**:
+| Pattern Type | Detection Keywords |
+|--------------|-------------------|
+| Null reference | "null", "undefined", "Cannot read property" |
+| Async timing | "timeout", "async", "await", "promise" |
+| Import errors | "Cannot find module", "import" |
+| Type mismatches | "type", "expected", "received" |
+**Record effective test patterns** (if pass_rate > 0.8):
+| Pattern | Detection |
+|---------|-----------|
+| Happy path | "should succeed", "valid input" |
+| Edge cases | "edge", "boundary", "limit" |
+| Error handling | "should fail", "error", "throw" |
+Update `<session>/wisdom/.msg/meta.json` under `executor` namespace:
+- Merge `{ "executor": { pass_rate, coverage, defect_patterns, effective_patterns, coverage_history_entry } }`