npm - maestro-flow-one - Versions diffs - 0.1.0 - Mend

maestro-flow-one 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (106)

package/LICENSE +21 -0
package/README.md +173 -0
package/bin/maestro-flow.js +730 -0
package/claude/maestro-flow/SKILL.md +239 -0
package/claude/maestro-flow/chains/templates.json +256 -0
package/claude/maestro-flow/commands/learn/decompose.md +176 -0
package/claude/maestro-flow/commands/learn/follow.md +167 -0
package/claude/maestro-flow/commands/learn/investigate.md +221 -0
package/claude/maestro-flow/commands/learn/retro.md +303 -0
package/claude/maestro-flow/commands/learn/second-opinion.md +167 -0
package/claude/maestro-flow/commands/lifecycle/amend.md +300 -0
package/claude/maestro-flow/commands/lifecycle/analyze.md +126 -0
package/claude/maestro-flow/commands/lifecycle/brainstorm.md +100 -0
package/claude/maestro-flow/commands/lifecycle/composer.md +354 -0
package/claude/maestro-flow/commands/lifecycle/execute.md +114 -0
package/claude/maestro-flow/commands/lifecycle/fork.md +86 -0
package/claude/maestro-flow/commands/lifecycle/init.md +78 -0
package/claude/maestro-flow/commands/lifecycle/learn.md +140 -0
package/claude/maestro-flow/commands/lifecycle/link-coordinate.md +71 -0
package/claude/maestro-flow/commands/lifecycle/merge.md +61 -0
package/claude/maestro-flow/commands/lifecycle/overlay.md +178 -0
package/claude/maestro-flow/commands/lifecycle/plan.md +138 -0
package/claude/maestro-flow/commands/lifecycle/player.md +404 -0
package/claude/maestro-flow/commands/lifecycle/quick.md +56 -0
package/claude/maestro-flow/commands/lifecycle/roadmap.md +164 -0
package/claude/maestro-flow/commands/lifecycle/ui-design.md +93 -0
package/claude/maestro-flow/commands/lifecycle/update.md +176 -0
package/claude/maestro-flow/commands/lifecycle/verify.md +90 -0
package/claude/maestro-flow/commands/manage/codebase-rebuild.md +75 -0
package/claude/maestro-flow/commands/manage/codebase-refresh.md +57 -0
package/claude/maestro-flow/commands/manage/harvest.md +94 -0
package/claude/maestro-flow/commands/manage/issue-discover.md +77 -0
package/claude/maestro-flow/commands/manage/issue.md +73 -0
package/claude/maestro-flow/commands/manage/knowhow-capture.md +193 -0
package/claude/maestro-flow/commands/manage/knowhow.md +77 -0
package/claude/maestro-flow/commands/manage/learn.md +67 -0
package/claude/maestro-flow/commands/manage/status.md +51 -0
package/claude/maestro-flow/commands/manage/wiki.md +62 -0
package/claude/maestro-flow/commands/milestone/audit.md +68 -0
package/claude/maestro-flow/commands/milestone/complete.md +75 -0
package/claude/maestro-flow/commands/milestone/release.md +96 -0
package/claude/maestro-flow/commands/quality/auto-test.md +124 -0
package/claude/maestro-flow/commands/quality/debug.md +115 -0
package/claude/maestro-flow/commands/quality/refactor.md +55 -0
package/claude/maestro-flow/commands/quality/retrospective.md +78 -0
package/claude/maestro-flow/commands/quality/review.md +108 -0
package/claude/maestro-flow/commands/quality/sync.md +51 -0
package/claude/maestro-flow/commands/quality/test.md +103 -0
package/claude/maestro-flow/commands/spec/add.md +49 -0
package/claude/maestro-flow/commands/spec/load.md +51 -0
package/claude/maestro-flow/commands/spec/remove.md +51 -0
package/claude/maestro-flow/commands/spec/setup.md +51 -0
package/claude/maestro-flow/commands/wiki/connect.md +62 -0
package/claude/maestro-flow/commands/wiki/digest.md +69 -0
package/codex/maestro-flow/SKILL.md +505 -0
package/codex/maestro-flow/chains/templates.json +256 -0
package/codex/maestro-flow/commands/learn/decompose.md +113 -0
package/codex/maestro-flow/commands/learn/follow.md +83 -0
package/codex/maestro-flow/commands/learn/investigate.md +83 -0
package/codex/maestro-flow/commands/learn/retro.md +83 -0
package/codex/maestro-flow/commands/learn/second-opinion.md +86 -0
package/codex/maestro-flow/commands/lifecycle/amend.md +300 -0
package/codex/maestro-flow/commands/lifecycle/analyze.md +483 -0
package/codex/maestro-flow/commands/lifecycle/brainstorm.md +397 -0
package/codex/maestro-flow/commands/lifecycle/composer.md +213 -0
package/codex/maestro-flow/commands/lifecycle/execute.md +318 -0
package/codex/maestro-flow/commands/lifecycle/fork.md +98 -0
package/codex/maestro-flow/commands/lifecycle/init.md +134 -0
package/codex/maestro-flow/commands/lifecycle/learn.md +80 -0
package/codex/maestro-flow/commands/lifecycle/link-coordinate.md +257 -0
package/codex/maestro-flow/commands/lifecycle/merge.md +69 -0
package/codex/maestro-flow/commands/lifecycle/overlay.md +119 -0
package/codex/maestro-flow/commands/lifecycle/plan.md +460 -0
package/codex/maestro-flow/commands/lifecycle/player.md +323 -0
package/codex/maestro-flow/commands/lifecycle/quick.md +124 -0
package/codex/maestro-flow/commands/lifecycle/roadmap.md +468 -0
package/codex/maestro-flow/commands/lifecycle/ui-design.md +135 -0
package/codex/maestro-flow/commands/lifecycle/update.md +176 -0
package/codex/maestro-flow/commands/lifecycle/verify.md +468 -0
package/codex/maestro-flow/commands/manage/codebase-rebuild.md +347 -0
package/codex/maestro-flow/commands/manage/codebase-refresh.md +66 -0
package/codex/maestro-flow/commands/manage/harvest.md +91 -0
package/codex/maestro-flow/commands/manage/issue-discover.md +431 -0
package/codex/maestro-flow/commands/manage/issue.md +75 -0
package/codex/maestro-flow/commands/manage/knowhow-capture.md +110 -0
package/codex/maestro-flow/commands/manage/knowhow.md +95 -0
package/codex/maestro-flow/commands/manage/learn.md +137 -0
package/codex/maestro-flow/commands/manage/status.md +76 -0
package/codex/maestro-flow/commands/manage/wiki.md +55 -0
package/codex/maestro-flow/commands/milestone/audit.md +87 -0
package/codex/maestro-flow/commands/milestone/complete.md +91 -0
package/codex/maestro-flow/commands/milestone/release.md +70 -0
package/codex/maestro-flow/commands/quality/auto-test.md +547 -0
package/codex/maestro-flow/commands/quality/debug.md +334 -0
package/codex/maestro-flow/commands/quality/refactor.md +151 -0
package/codex/maestro-flow/commands/quality/retrospective.md +292 -0
package/codex/maestro-flow/commands/quality/review.md +364 -0
package/codex/maestro-flow/commands/quality/sync.md +111 -0
package/codex/maestro-flow/commands/quality/test.md +498 -0
package/codex/maestro-flow/commands/spec/add.md +101 -0
package/codex/maestro-flow/commands/spec/load.md +77 -0
package/codex/maestro-flow/commands/spec/remove.md +69 -0
package/codex/maestro-flow/commands/spec/setup.md +75 -0
package/codex/maestro-flow/commands/wiki/connect.md +73 -0
package/codex/maestro-flow/commands/wiki/digest.md +87 -0
package/package.json +24 -0

package/codex/maestro-flow/commands/quality/debug.md ADDED

@@ -0,0 +1,334 @@
+---
+name: quality-debug
+description: Hypothesis-driven debugging via CSV wave pipeline. Wave 1 generates parallel hypotheses, Wave 2 attempts parallel fixes on confirmed hypotheses. Replaces quality-debug command.
+argument-hint: "[-y|--yes] [-c|--concurrency N] [--continue] \"[bug description] [--from-uat <phase>] [--parallel]\""
+allowed-tools: spawn_agents_on_csv, Read, Write, Edit, Bash, Glob, Grep, AskUserQuestion
+---
+<purpose>
+Wave-based hypothesis-driven debugging using `spawn_agents_on_csv`. Wave 1 explores hypotheses in parallel, Wave 2 attempts fixes on confirmed hypotheses in parallel.
+**Core workflow**: Gather Symptoms -> Generate Hypotheses -> Parallel Investigation -> Parallel Fix Attempts -> Unify Results
+```
++---------------------------------------------------------------------------+
+|                    DEBUG CSV WAVE WORKFLOW                                 |
++---------------------------------------------------------------------------+
+|                                                                           |
+|  Phase 1: Input Resolution -> CSV                                         |
+|     +-- Parse mode: standalone / --from-uat / --parallel                  |
+|     +-- Gather symptoms (interactive) or load UAT gaps (pre-filled)       |
+|     +-- Cluster gaps by component (if from-uat)                           |
+|     +-- Generate 3-5 hypotheses per cluster/issue                         |
+|     +-- Generate tasks.csv with one row per hypothesis                    |
+|     +-- User validates hypothesis breakdown (skip if -y)                  |
+|                                                                           |
+|  Phase 2: Wave Execution Engine                                           |
+|     +-- Wave 1: Hypothesis Investigation (parallel)                       |
+|     |   +-- Each agent investigates one hypothesis                        |
+|     |   +-- Agent searches code, logs evidence, confirms/refutes          |
+|     |   +-- Discoveries shared via board (code patterns, root causes)     |
+|     |   +-- Results: evidence_for + evidence_against per hypothesis       |
+|     +-- Wave 2: Fix Attempts (parallel, confirmed hypotheses only)        |
+|     |   +-- Filter: only hypotheses with status=confirmed from wave 1     |
+|     |   +-- Each agent attempts fix for its confirmed root cause          |
+|     |   +-- Agent applies fix, runs verification, logs result             |
+|     |   +-- Results: fix_applied + verified per fix task                  |
+|     +-- discoveries.ndjson shared across all waves (append-only)          |
+|                                                                           |
+|  Phase 3: Results Aggregation                                             |
+|     +-- Export results.csv with all investigation + fix outcomes           |
+|     +-- Generate context.md with diagnosis summary                        |
+|     +-- Update UAT gaps with diagnosis (if --from-uat)                    |
+|     +-- Update issues.jsonl with diagnosis results                        |
+|     +-- Display summary with next steps                                   |
+|                                                                           |
++---------------------------------------------------------------------------+
+```
+</purpose>
+<context>
+```bash
+$quality-debug "Login button throws 500 error on click"
+$quality-debug -y "JWT token not refreshed --from-uat 3"
+$quality-debug -c 4 "Navigation crash --from-uat 3 --parallel"
+$quality-debug -y "--from-auto-test 3"
+$quality-debug --continue "20260318-debug-P3-jwt-expiry"
+```
+**Flags**:
+- `-y, --yes`: Skip all confirmations (auto mode)
+- `-c, --concurrency N`: Max concurrent agents within each wave (default: 5)
+- `--continue`: Resume existing session
+- `--from-uat <phase>`: Load gaps from UAT uat.md as pre-filled symptoms
+- `--from-auto-test <phase>`: Load code_defect failures from auto-test report.json as pre-filled symptoms
+- `--parallel`: One agent per gap cluster (implies from-uat or from-auto-test)
+When `--yes` or `-y`: Auto-confirm hypothesis selection, skip interactive symptom gathering (require bug description in args), use defaults for mode detection.
+**Output Directory**: `.workflow/.csv-wave/{session-id}/`
+**Core Output**: `tasks.csv` (master state) + `results.csv` (final) + `discoveries.ndjson` (shared exploration) + `context.md` (human-readable report)
+</context>
+<csv_schema>
+### tasks.csv (Master State)
+```csv
+id,title,description,hypothesis,evidence_for,evidence_against,deps,context_from,wave,status,findings,fix_applied,verified,error
+"H1","Null pointer in login handler","Investigate whether login handler crashes due to null user object after failed DB lookup","User object is null when DB returns empty result; login.ts:42 dereferences without null check","","","","","1","","","","",""
+"H2","Missing error boundary","Investigate whether unhandled promise rejection in auth middleware propagates to 500","Auth middleware catches DB errors but not validation errors; middleware.ts:78 has no catch block","","","","","1","","","","",""
+"H3","Stale session token","Investigate whether expired session tokens bypass refresh logic","Session refresh only triggers on 403 but server returns 401 for expired tokens; session.ts:15","","","","","1","","","","",""
+"FIX-H1","Fix null pointer in login","Apply null check before user object dereference in login handler","","","","H1","H1","2","","","","",""
+"FIX-H3","Fix session token refresh","Update refresh trigger to also handle 401 status codes","","","","H3","H3","2","","","","",""
+```
+**Columns**:
+| Column | Phase | Description |
+|--------|-------|-------------|
+| `id` | Input | Unique task identifier: `H{N}` for hypotheses (wave 1), `FIX-H{N}` for fixes (wave 2) |
+| `title` | Input | Short hypothesis or fix title |
+| `description` | Input | Detailed investigation/fix instructions |
+| `hypothesis` | Input | The hypothesis being tested (wave 1) or empty (wave 2) |
+| `evidence_for` | Output | Evidence supporting the hypothesis |
+| `evidence_against` | Output | Evidence refuting the hypothesis |
+| `deps` | Input | Semicolon-separated dependency task IDs (wave 2 depends on wave 1) |
+| `context_from` | Input | Semicolon-separated task IDs whose findings this task needs |
+| `wave` | Computed | Wave number (1 = investigation, 2 = fix attempt) |
+| `status` | Output | `pending` -> `confirmed` / `refuted` / `inconclusive` / `fixed` / `fix_failed` / `skipped` / `failed` (agent error, see `error`) |
+| `findings` | Output | Key findings summary (max 500 chars) |
+| `fix_applied` | Output | Description of fix applied (wave 2 only) |
+| `verified` | Output | `true` / `false` -- whether fix was verified to work (wave 2 only) |
+| `error` | Output | Error message if failed |
+### Per-Wave CSV (Temporary)
+Each wave generates `wave-{N}.csv` with extra `prev_context` column.
+### Output Artifacts
+| File | Purpose | Lifecycle |
+|------|---------|-----------|
+| `tasks.csv` | Master state -- all tasks with status/findings | Updated after each wave |
+| `wave-{N}.csv` | Per-wave input (temporary) | Created before wave, deleted after |
+| `results.csv` | Final export of all task results | Created in Phase 3 |
+| `discoveries.ndjson` | Shared exploration board | Append-only, carries across waves |
+| `context.md` | Human-readable diagnosis report | Created in Phase 3 |
+### Session Structure
+```
+.workflow/.csv-wave/{YYYYMMDD}-debug-P{N}-{slug}/
++-- tasks.csv
++-- results.csv
++-- discoveries.ndjson
++-- context.md
++-- wave-{N}.csv (temporary)
+```
+</csv_schema>
+<invariants>
+1. **Start Immediately**: First action is session initialization, then Phase 1
+2. **Wave Order is Sacred**: Never execute wave 2 before wave 1 completes and results are merged
+3. **CSV is Source of Truth**: Master tasks.csv holds all state
+4. **Context Propagation**: prev_context built from master CSV, not from memory
+5. **Discovery Board is Append-Only**: Never clear, modify, or recreate discoveries.ndjson
+6. **Skip on Refuted**: Wave 2 fix tasks skip if their hypothesis was refuted or inconclusive
+7. **Cleanup Temp Files**: Remove wave-{N}.csv after results are merged
+8. **DO NOT STOP**: Continuous execution until all waves complete
+</invariants>
+<execution>
+### Session Initialization
+```
+Parse from $ARGUMENTS:
+  AUTO_YES       ← --yes | -y
+  continueMode   ← --continue
+  maxConcurrency ← --concurrency | -c N  (default: 5)
+  fromUat        ← --from-uat <phase>  (default: null)
+  fromAutoTest   ← --from-auto-test <phase>  (default: null)
+  parallelMode   ← --parallel
+  bugDescription ← remaining text after flag removal
+Derive:
+  phaseRef       ← fromUat || fromAutoTest || null
+  sourceType     ← fromAutoTest ? "auto-test" : fromUat ? "uat" : "standalone"
+  slug           ← bugDescription kebab-cased, max 40 chars
+  dateStr        ← UTC+8 YYYYMMDD
+  sessionId      ← phaseRef ? "{dateStr}-debug-P{phaseRef}-{slug}" : "{dateStr}-debug-{slug}"
+  sessionFolder  ← ".workflow/.csv-wave/{sessionId}"
+mkdir -p {sessionFolder}
+```
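The derivation rules above can be sketched in JavaScript as follows. This is a minimal illustration, not the package's implementation: the helper names `kebabSlug`, `utc8DateStr`, and `deriveSession` are hypothetical, and the exact slug rules (which characters are dropped, how truncation behaves) are assumptions beyond "kebab-cased, max 40 chars".

```javascript
// Hypothetical helpers; only the derivation rules come from the pseudocode above.
function kebabSlug(text, maxLen = 40) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, "")     // trim leading and trailing hyphens
    .slice(0, maxLen)
    .replace(/-+$/, "");         // avoid a trailing hyphen after truncation
}

function utc8DateStr(now = new Date()) {
  // Shift the instant by +8h, then read the UTC calendar date as YYYYMMDD.
  const shifted = new Date(now.getTime() + 8 * 60 * 60 * 1000);
  return shifted.toISOString().slice(0, 10).replace(/-/g, "");
}

function deriveSession({ bugDescription, fromUat = null, fromAutoTest = null }, now = new Date()) {
  const phaseRef = fromUat || fromAutoTest || null;
  const sourceType = fromAutoTest ? "auto-test" : fromUat ? "uat" : "standalone";
  const slug = kebabSlug(bugDescription);
  const dateStr = utc8DateStr(now);
  const sessionId = phaseRef
    ? `${dateStr}-debug-P${phaseRef}-${slug}`
    : `${dateStr}-debug-${slug}`;
  return { phaseRef, sourceType, sessionId, sessionFolder: `.workflow/.csv-wave/${sessionId}` };
}
```

Passing `now` explicitly keeps the date derivation deterministic, which is convenient when resuming or testing a session near midnight UTC+8.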
+### Phase 1: Input Resolution -> CSV
+**Objective**: Parse mode, gather symptoms or load UAT gaps, generate hypotheses, build tasks.csv.
+**Decomposition Rules**:
+1. **Mode detection**:
+| Condition | Mode |
+|-----------|------|
+| `--from-uat` flag present | from-uat (load gaps from uat.md) |
+| `--from-auto-test` flag present | from-auto-test (load code_defects from report.json) |
+| `--parallel` flag present | parallel (implies from-uat or from-auto-test, one agent per gap cluster) |
+| None of these flags | standalone (gather symptoms interactively) |
+2. **Related session discovery**: Query `state.json.artifacts[]` for matching phase+milestone. Extract relevant outputs by type: execute -> .summaries/.task/, review -> review.json (guide hypotheses), debug -> understanding.md (avoid re-investigation), test -> uat.md + .tests/auto-test/report.json.
+3. **Symptom collection**:
+| Mode | Source | Action |
+|------|--------|--------|
+| standalone | User input | Ask 5 questions: expected, actual, errors, timeline, reproduction |
+| from-uat | test artifact's uat.md (via registry) | Parse Gaps section, cluster by component |
+| from-auto-test | test artifact's `.tests/auto-test/report.json` (via registry) | Parse `failures[]` where `classification == "code_defect"`, cluster by target module |
+| parallel | test artifact's uat.md or report.json (via registry) | Same as from-uat/from-auto-test, one investigation per cluster |
+**from-auto-test specifics**: Each `code_defect` failure provides: `scenario_id`, `req_ref`, `description`, `expected`, `actual`, `fix_suggestion.file`, `fix_suggestion.line`, `fix_suggestion.direction`. Map these to symptoms: expected=failure.expected, actual=failure.actual, location=fix_suggestion.file:line, context=fix_suggestion.direction.
+4. **Hypothesis generation**: Per symptom cluster, analyze affected code and generate 3-5 ranked hypotheses (each becomes a wave 1 row).
+5. **Fix task generation**: Pre-generate wave 2 fix row per hypothesis (`deps`/`context_from` -> hypothesis ID). Only executes if hypothesis confirmed.
+6. **CSV generation**: Hypothesis rows (wave 1) + fix rows (wave 2).
+**Wave computation**: Simple 2-wave -- all hypothesis tasks = wave 1, all fix tasks = wave 2.
+**User validation**: Display hypothesis breakdown (skip if AUTO_YES).
+### Phase 2: Wave Execution Engine
+**Objective**: Investigate hypotheses wave-by-wave via spawn_agents_on_csv.
+#### Wave 1: Hypothesis Investigation (Parallel)
+1. Extract wave 1 pending rows from master `tasks.csv` into `wave-1.csv` (no prev_context needed)
+2. Execute:
+```javascript
+spawn_agents_on_csv({
+  csv_path: `${sessionFolder}/wave-1.csv`,
+  id_column: "id",
+  instruction: buildInvestigationInstruction(sessionFolder),
+  max_concurrency: maxConcurrency, max_runtime_seconds: 3600,
+  output_csv_path: `${sessionFolder}/wave-1-results.csv`,
+  output_schema: { id, status: [confirmed|refuted|inconclusive|failed], findings, evidence_for, evidence_against, error }
+})
+```
+3. Merge results into master `tasks.csv`, delete `wave-1.csv`
+4. **Filter for wave 2**: Mark fix tasks as `skipped` if their hypothesis was `refuted` or `inconclusive`
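The wave 2 filter in step 4 might look like the sketch below. It assumes rows have already been parsed into plain objects keyed by the `tasks.csv` column names (all values as strings), and the function name `markSkippedFixes` is hypothetical. As a further assumption, any dependency status other than `confirmed` (including `failed`) causes the fix task to be skipped; the text above only names `refuted` and `inconclusive`.

```javascript
// Rows are plain objects keyed by tasks.csv column names; values are strings.
function markSkippedFixes(rows) {
  const statusById = new Map(rows.map((r) => [r.id, r.status]));
  return rows.map((row) => {
    if (row.wave !== "2") return row; // only fix tasks are filtered
    const deps = row.deps.split(";").filter(Boolean); // semicolon-separated IDs
    const allConfirmed = deps.every((id) => statusById.get(id) === "confirmed");
    // Assumption: anything other than "confirmed" on a dependency skips the fix.
    return allConfirmed ? row : { ...row, status: "skipped" };
  });
}
```

Returning new row objects instead of mutating in place keeps the master state easy to diff before it is written back to `tasks.csv`.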
+#### Wave 2: Fix Attempts (Parallel, Confirmed Only)
+1. If no confirmed hypotheses remain, skip wave 2 entirely
+2. Extract wave 2 pending rows, build `prev_context` from confirmed wave 1 findings
+3. Write `wave-2.csv`, then execute:
+```javascript
+spawn_agents_on_csv({
+  csv_path: `${sessionFolder}/wave-2.csv`,
+  id_column: "id",
+  instruction: buildFixInstruction(sessionFolder),
+  max_concurrency: maxConcurrency, max_runtime_seconds: 3600,
+  output_csv_path: `${sessionFolder}/wave-2-results.csv`,
+  output_schema: { id, status: [fixed|fix_failed|failed], findings, fix_applied, verified, error }
+})
+```
+4. Merge results into master `tasks.csv`, delete `wave-2.csv`
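Building `prev_context` from the master CSV (step 2 of wave 2, and invariant 4) could be sketched as below. The row shape (objects keyed by column name), the function names, and the one-line-per-source text format of `prev_context` are all assumptions; the source only states that the context comes from confirmed wave 1 findings referenced by `context_from`.

```javascript
// Collect findings from the confirmed tasks listed in a row's context_from column.
function buildPrevContext(row, rows) {
  const byId = new Map(rows.map((r) => [r.id, r]));
  return row.context_from
    .split(";")
    .filter(Boolean)
    .map((id) => byId.get(id))
    .filter((src) => src && src.status === "confirmed")
    .map((src) => `[${src.id}] ${src.title}: ${src.findings}`) // format is an assumption
    .join("\n");
}

// Select pending wave 2 rows and attach the extra prev_context column.
function buildWave2Rows(rows) {
  return rows
    .filter((r) => r.wave === "2" && (r.status === "" || r.status === "pending"))
    .map((r) => ({ ...r, prev_context: buildPrevContext(r, rows) }));
}
```

Because the context is recomputed from the rows each time, a resumed session (`--continue`) would produce the same `wave-2.csv` as an uninterrupted run.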
+### Phase 3: Results Aggregation
+**Objective**: Generate final results and human-readable report.
+1. Export final `tasks.csv` as `results.csv`
+2. **Generate context.md**: Debug report with summary (mode, hypothesis/confirmed/fixed/verified counts), per-hypothesis results (hypothesis, evidence for/against, findings, status), per-fix results (fix applied, verified, findings), aggregated root causes, and next steps.
+3. **UAT update** (if --from-uat): Update `uat.md` gaps with `root_cause`, `fix_direction`, `affected_files` for confirmed hypotheses.
+4. **Issue update**: If `issues.jsonl` exists, update matching issues with status `diagnosed`, add `context.suggested_fix` and `context.notes`.
+5. **Register artifact** (phase-scoped only): Append to `state.json.artifacts[]` with `type: "debug"`, `id: DBG-NNN`, `depends_on: triggering_review_id || exec_art.id`.
+6. **Post-debug Knowledge Inquiry**: Prompt user to capture knowledge when:
+   - Recurring root cause pattern detected -> `/spec-add debug`
+   - Non-obvious fix strategy used -> `/spec-add learning`
+   - Architectural gap identified -> `/spec-add arch`
+7. **Next step routing**:
+| Result | Suggestion |
+|--------|------------|
+| All fixes verified | Run tests: `Skill({ skill: "maestro-flow", args: "--cmd quality-test {phase}" })` |
+| Fixes applied, not verified | Re-verify: `Skill({ skill: "maestro-flow", args: "--cmd maestro-verify {phase}" })` |
+| Confirmed but no fix | Plan fixes: `Skill({ skill: "maestro-flow", args: "--cmd maestro-plan {phase} --gaps" })` |
+| All inconclusive | Resume with more context or manual investigation |
+| From UAT, all diagnosed | `Skill({ skill: "maestro-flow", args: "--cmd quality-test {phase} --auto-fix" })` |
+8. Display summary.
+### Shared Discovery Board Protocol
+#### Standard Discovery Types
+| Type | Dedup Key | Data Schema | Description |
+|------|-----------|-------------|-------------|
+| `code_pattern` | `data.name` | `{name, file, description}` | Reusable code pattern found |
+| `integration_point` | `data.file` | `{file, description, exports[]}` | Module connection point |
+| `convention` | singleton | `{naming, imports, formatting}` | Project code conventions |
+| `blocker` | `data.issue` | `{issue, severity, impact}` | Blocking issue found |
+| `tech_stack` | singleton | `{framework, language, tools[]}` | Technology stack info |
+#### Domain Discovery Types
+| Type | Dedup Key | Data Schema | Description |
+|------|-----------|-------------|-------------|
+| `root_cause` | `data.location` | `{location, cause, severity, confidence}` | Confirmed root cause |
+| `hypothesis_evidence` | `data.hypothesis+data.location` | `{hypothesis, location, type, conclusion}` | Evidence for/against hypothesis |
+| `affected_component` | `data.component` | `{component, files[], impact}` | Component affected by bug |
+| `reproduction_path` | `data.trigger` | `{trigger, steps[], frequency}` | Bug reproduction path |
+#### Protocol
+Read `discoveries.ndjson` before investigation. Append-only: dedup by type+key before writing, never modify/delete.
+```bash
+echo '{"ts":"<ISO>","worker":"{id}","type":"root_cause","data":{"location":"src/auth/login.ts:42","cause":"null_dereference","severity":"high","confidence":"confirmed"}}' >> {session_folder}/discoveries.ndjson
+```
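A Node.js sketch of the protocol above: read the board, skip malformed lines, dedup by type plus key, then append. The dedup keys are taken from the two tables; the function names are hypothetical, and treating unknown types as keyed by their full `data` payload is an assumption. The sketch does not guard against two agents racing between the read and the append, which a real implementation would need to consider.

```javascript
const fs = require("node:fs");

// Dedup key per discovery type, from the tables above; null marks a singleton type.
const DEDUP_KEYS = {
  code_pattern: (d) => d.name,
  integration_point: (d) => d.file,
  convention: () => null,
  blocker: (d) => d.issue,
  tech_stack: () => null,
  root_cause: (d) => d.location,
  hypothesis_evidence: (d) => `${d.hypothesis}+${d.location}`,
  affected_component: (d) => d.component,
  reproduction_path: (d) => d.trigger,
};

function dedupId(entry) {
  const keyFn = DEDUP_KEYS[entry.type];
  // Assumption: unknown types are deduplicated on their whole data payload.
  const key = keyFn ? keyFn(entry.data) : JSON.stringify(entry.data);
  return key === null ? entry.type : `${entry.type}:${key}`;
}

function readBoard(path) {
  if (!fs.existsSync(path)) return [];
  const entries = [];
  for (const line of fs.readFileSync(path, "utf8").split("\n")) {
    if (!line.trim()) continue;
    try {
      const parsed = JSON.parse(line);
      const valid = parsed && typeof parsed.type === "string" &&
        parsed.data && typeof parsed.data === "object";
      if (valid) entries.push(parsed);
    } catch {
      // Corrupt board lines are ignored, matching the error table's guidance.
    }
  }
  return entries;
}

// Returns true if the entry was appended, false if it was a duplicate.
function appendDiscovery(path, worker, type, data) {
  const entry = { ts: new Date().toISOString(), worker, type, data };
  const seen = new Set(readBoard(path).map(dedupId));
  if (seen.has(dedupId(entry))) return false;
  fs.appendFileSync(path, JSON.stringify(entry) + "\n"); // append-only: never rewrite
  return true;
}
```

Existing lines are never rewritten, so malformed lines stay in the file and are simply skipped on every read.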
+</execution>
+<error_codes>
+| Error | Resolution |
+|-------|------------|
+| No bug description and no --from-uat/--from-auto-test | Abort with error: "Issue description required" |
+| UAT file not found for --from-uat phase | Abort with error: "uat.md not found for phase {N}" |
+| Auto-test report not found for --from-auto-test phase | Abort with error: "report.json not found for phase {N}" |
+| No gaps in UAT file / no code_defects in report | Abort with error: "No failed gaps/defects found" |
+| Hypothesis agent timeout | Mark as inconclusive, continue with remaining |
+| All hypotheses refuted | Skip wave 2, suggest manual investigation |
+| Fix agent timeout | Mark as fix_failed, report partial results |
+| CSV parse error | Validate format, show line number |
+| discoveries.ndjson corrupt | Ignore malformed lines |
+| Continue mode: no session found | List available sessions |
+| Existing debug session found | Offer resume (skip if AUTO_YES) |
+</error_codes>
+<success_criteria>
+- [ ] Session folder created with valid tasks.csv
+- [ ] Wave 1 hypotheses investigated in parallel
+- [ ] Refuted/inconclusive hypotheses correctly skip wave 2 fix tasks
+- [ ] Wave 2 fixes attempted only for confirmed hypotheses
+- [ ] context.md produced with diagnosis summary
+- [ ] UAT gaps updated (if --from-uat)
+- [ ] Issues updated with diagnosis results
+- [ ] discoveries.ndjson append-only throughout
+</success_criteria>

package/codex/maestro-flow/commands/quality/refactor.md ADDED

@@ -0,0 +1,151 @@
+---
+name: quality-refactor
+description: Tech debt reduction with reflection-driven iteration. Analyze scope, plan refactoring, execute with test verification, reflect on strategy per round.
+argument-hint: "<phase|--dir path> [--max-iterations N]"
+allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Agent, AskUserQuestion
+---
+<purpose>
+Iterative refactoring cycle: analyze scope for tech debt -> plan refactoring tasks -> execute each with test verification -> reflect on strategy per round -> repeat if needed. Every change is verified against existing tests. Failed changes are reverted and retried with adjusted strategy.
+</purpose>
+<context>
+$ARGUMENTS -- module path, feature area, or "all", plus optional flags.
+**Usage**:
+```bash
+$quality-refactor "src/auth"                    # module path scope
+$quality-refactor "authentication"              # feature area scope
+$quality-refactor "all"                         # full codebase scan
+$quality-refactor "src/api --max-iterations 5"  # limit iteration rounds
+$quality-refactor "--dir .workflow/scratch/refactor-auth-2026-03-18"  # resume existing
+```
+**Flags**:
+- `<phase|scope>`: Module path, feature area, or "all"
+- `--dir path`: Resume existing refactor scratch directory
+- `--max-iterations N`: Max refactoring rounds (default: 3)
+**Output**: `.workflow/scratch/refactor-{slug}-{date}/` with index.json, plan.json, reflection-log.md, .task/, .summaries/
+</context>
+<invariants>
+1. **Test after every change** -- zero regressions tolerated
+2. **Revert on failure** -- never leave broken state
+3. **Max 2 retries per task** with strategy adjustment
+4. **Reflection-driven** -- every round records strategy, outcome, adjustment
+5. **User approval required** in Step 4, before execution begins in Step 5
+6. **Quick wins first** -- order by risk (low first) and dependency
+7. **Agent calls use `run_in_background: false`** for synchronous execution
+8. **Incremental safety** -- each task is independently safe to apply or revert
+</invariants>
+<execution>
+### Step 1: Parse Scope
+1. Parse `$ARGUMENTS` for scope and flags
+2. If `--dir` provided: resume existing scratch directory (skip to Step 5)
+3. Scope types:
+   - Module path (e.g., "src/auth") -> scan that directory
+   - Feature area (e.g., "authentication") -> search for related files
+   - "all" -> full codebase scan
+4. If empty: prompt user via AskUserQuestion with options (Module path / Feature area / Full codebase)
+5. Detect `--max-iterations N` (default: 3)
+### Step 2: Create Scratch Directory
+Create `.workflow/scratch/refactor-{slug}-{date}/` with `.task/` and `.summaries/` subdirectories. Write `index.json` with type "refactor", scope, status "active", plan/execution/reflection counters.
+### Step 3: Scope Analysis
+Load project specs if available (`maestro spec load --category coding`).
+Analyze scope for tech debt categories:
+| Category | What to Look For |
+|----------|-----------------|
+| Duplication | Repeated code blocks, copy-paste patterns |
+| Complexity | Long functions, deep nesting, high cyclomatic complexity |
+| Naming | Inconsistent naming, unclear identifiers |
+| Dependencies | Circular deps, tight coupling, god objects |
+| Dead code | Unused functions, unreachable branches |
+| Pattern violations | Inconsistent with project conventions |
+Present analysis summary table with category, count, severity.
+Confirm with user before proceeding.
+### Step 4: Plan Refactoring
+1. Write `plan.json` with scope, total_tasks, strategy ("incremental -- each task independently safe")
+2. For each identified issue, create `.task/TASK-{NNN}.json`:
+   - id, title, status (pending), type (refactor), category
+   - description, read_first files, files with action/target/change
+   - convergence.criteria (grep-verifiable), verification command
+   - implementation steps, risk level
+3. Order: high risk last, dependencies respected, quick wins first
+4. Update `index.json` plan fields
+5. Present plan to user via AskUserQuestion -- show affected files, risk areas, ask for approval
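The ordering rule in item 3 (dependencies respected, low risk first, high risk last) can be illustrated with a small dependency-aware sort. This is a sketch under assumptions: the `depends_on` field name and the `low` / `medium` / `high` risk values are hypothetical, since the task JSON layout is only described loosely above.

```javascript
// Assumed risk levels; the source only mentions a "risk level" per task.
const RISK_RANK = { low: 0, medium: 1, high: 2 };

// Repeatedly pick the lowest-risk task whose dependencies are already scheduled.
function orderTasks(tasks) {
  const remaining = new Map(tasks.map((t) => [t.id, t]));
  const done = new Set();
  const ordered = [];
  while (remaining.size > 0) {
    const ready = [...remaining.values()].filter((t) =>
      (t.depends_on || []).every((dep) => done.has(dep))
    );
    if (ready.length === 0) {
      throw new Error("Dependency cycle or missing task among: " + [...remaining.keys()].join(", "));
    }
    // Lowest risk first; ties broken by id for a stable, repeatable order.
    ready.sort((a, b) => RISK_RANK[a.risk] - RISK_RANK[b.risk] || a.id.localeCompare(b.id));
    const next = ready[0];
    ordered.push(next);
    done.add(next.id);
    remaining.delete(next.id);
  }
  return ordered;
}
```

A low-risk task that depends on a high-risk one still runs after it, so dependency order takes precedence over the quick-wins preference.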
+### Step 5: Execute with Reflection
+Initialize `reflection-log.md` if not exists.
+For each task in order:
+**5a. Execute refactoring:** Spawn Agent to implement the refactoring -- read `read_first` files, apply changes to targets, follow convergence criteria exactly.
+**5b. Run test suite** (npm test / pytest / go test as appropriate).
+**5c. Record in reflection-log.md:** Round number, task title, strategy, result (pass/fail), test outcome, adjustment for next round, files changed.
+**5d. Handle test failures:**
+1. Revert the change
+2. Record failure + strategy adjustment in reflection-log.md
+3. Retry with adjusted strategy (max 2 retries per task)
+4. If still failing: mark task "blocked", continue to next
+**5e. Update state:**
+- `.task/TASK-{NNN}.json` status -> "completed" or "blocked"
+- `.summaries/TASK-{NNN}-summary.md` written
+- `index.json` execution and reflection fields updated
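Steps 5a through 5e amount to an apply, test, revert, retry loop. The sketch below shows that control flow only. The hooks `applyChange`, `runTests`, `revert`, and `record` are hypothetical stand-ins for the agent call, the test runner, the revert mechanism, and the `reflection-log.md` writer; they are synchronous here for brevity, and the `adjusted-N` strategy label is a placeholder for a real strategy adjustment.

```javascript
// One task: the initial attempt plus up to maxRetries retries, reverting after each failure.
function executeTask(task, hooks, maxRetries = 2) {
  const { applyChange, runTests, revert, record } = hooks;
  let strategy = "initial";
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    applyChange(task, strategy);                  // 5a: implement the refactoring
    const passed = runTests();                    // 5b: run the test suite
    record({ taskId: task.id, round: attempt + 1, strategy, result: passed ? "pass" : "fail" }); // 5c
    if (passed) return { ...task, status: "completed" };
    revert(task);                                 // 5d: never leave a broken state
    strategy = `adjusted-${attempt + 1}`;         // placeholder for a real adjustment
  }
  return { ...task, status: "blocked" };          // still failing after the retries
}
```

With the default `maxRetries = 2`, a task is attempted at most three times before being marked `blocked`, and the caller moves on to the next task.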
+### Step 6: Final Verification
+Run full test suite. Record final state in reflection-log.md: test result, tasks completed/total, tasks blocked, key learnings.
+### Step 7: Complete and Report
+Update `index.json`: status -> "completed", final execution/reflection counts.
+Display report: scope, tasks completed/blocked, reflection rounds, strategy adjustments, test status, key learnings from reflection-log.md, artifact paths (`{REFACTOR_DIR}/reflection-log.md`, `{REFACTOR_DIR}/.summaries/`).
+**Next-step routing:**
+| Result | Next Step |
+|--------|-----------|
+| All tests pass, refactoring complete | `$quality-sync` (update codebase docs) |
+| Test failures remain after refactor | `$quality-debug "{scope}"` |
+| No test suite available for scope | `$quality-auto-test "{phase}"` |
+| Partial completion (some blocked) | `$quality-debug "{scope}"` for blocked tasks |
+</execution>
+<error_codes>
+| Code | Severity | Condition | Recovery |
+|------|----------|-----------|----------|
+| E001 | error | Scope/description required | Prompt user for module path, feature area, or "all" |
+| E002 | error | Test suite not available | Suggest creating tests first, or proceed with manual verification |
+| W001 | warning | Partial test coverage | Note uncovered areas, proceed with extra caution |
+</error_codes>
+<success_criteria>
+- [ ] Scope resolved and scratch directory created
+- [ ] Tech debt analysis completed with categorized findings
+- [ ] Refactoring plan approved by user
+- [ ] Each task executed with test verification
+- [ ] Failed changes reverted, retried with adjusted strategy
+- [ ] Reflection log records every round's strategy and outcome
+- [ ] Final test suite passes with zero regressions
+- [ ] Completion report with key learnings displayed
+</success_criteria>