npm - maestro-flow - Versions diffs - 0.4.5 → 0.4.7 - Mend

maestro-flow 0.4.5 → 0.4.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (52) hide show

package/.codex/skills/maestro-collab/SKILL.md CHANGED Viewed

@@ -2,83 +2,86 @@
 name: maestro-collab
 description: Use when a question needs cross-verification from multiple CLI tools or diverse analytical perspectives
 argument-hint: "\"<requirement>\" [--tools gemini,qwen,claude] [--mode analysis|write] [--rule <template>] [-y]"
-allowed-tools: spawn_agents_on_csv, Read, Write, Edit, Bash, Glob, Grep, request_user_input
+allowed-tools: Read, Write, Edit, Glob, Grep, request_user_input
 ---
 <purpose>
-Wave-based multi-CLI collaboration via `spawn_agents_on_csv`. Diamond topology:
-Wave 1 (parallel CLI fan-out) → Wave 2 (cross-verify) → Wave 3 (synthesis).
+Direct CLI fan-out collaboration via `exec_command`. Diamond topology:
+Fan-out (parallel `exec_command` → `maestro delegate --to <tool>`) → Cross-verify (coordinator) → Synthesize (coordinator).
-Each CLI tool independently analyzes the requirement. Results cross-verified for consensus/conflicts, synthesized into unified report.
+Each CLI tool independently analyzes the requirement via `maestro delegate` shell call.
+Coordinator polls ALL CLI results to completion via delegate-protocol.codex.md,
+then cross-verifies for consensus/conflicts and synthesizes into unified report.
+NO spawn_agents_on_csv. NO spawn_agent. ALL CLI calls directly by coordinator via exec_command.
 </purpose>
 <context>
 $ARGUMENTS — requirement text and optional flags.
 **Flags**:
 - `--tools <list>`: Comma-separated CLI tools (default: first 3 enabled)
 - `--mode analysis|write`: Delegate mode (default: analysis)
-- `--rule <template>`: Shared rule template
+- `--rule <template>`: Shared rule template for all delegates (see Rule Reference below)
 - `-y`: Skip confirmations
-- `-c N`: Max concurrency per wave (default: 5)
+**Rule Reference** — common `--rule` values for collab scenarios:
+| Scenario | Rule | Description |
+|----------|------|-------------|
+| Code quality review | `analysis-review-code-quality` | 代码质量多维评审 |
+| Architecture review | `analysis-review-architecture` | 架构设计评审 |
+| Bug root cause | `analysis-diagnose-bug-root-cause` | Bug 根因诊断 |
+| Security assessment | `analysis-assess-security-risks` | 安全风险评估 |
+| Performance analysis | `analysis-analyze-performance` | 性能瓶颈分析 |
+| Code pattern analysis | `analysis-analyze-code-patterns` | 代码模式/反模式识别 |
+| Architecture design | `planning-plan-architecture-design` | 架构方案设计 |
+| Task breakdown | `planning-breakdown-task-steps` | 任务分解规划 |
+| Migration strategy | `planning-plan-migration-strategy` | 迁移策略制定 |
+| Rigorous style | `universal-universal-rigorous-style` | 严谨风格（通用） |
 **Auto-select** (no --tools): read `~/.maestro/cli-tools.json` → filter enabled + eligible → first 3. Exclude api-endpoint when --mode write. Minimum 2 required.
-**Session**: `.workflow/.csv-wave/{YYYYMMDD}-collab-{slug}/`
+**Session**: `.workflow/.maestro/{YYYYMMDD}-collab-{slug}/`
 **Scratch**: `.workflow/scratch/{YYYYMMDD}-collab-{slug}/`
 **Output files**:
 - `collab-report.md` — merged findings (Consensus/Conflicts/Unique/Recommendations)
 - `context.md` — Locked/Free/Deferred decisions (plan compatible)
 - `conclusions.json` — session_id, tools[], consensus_level, recommendation, confidence, dimensions[], decisions[]
-- `per-tool/{tool}-output.md` — raw outputs
-</context>
-<csv_schema>
-### tasks.csv
-```csv
-id,title,description,tool,role,prompt,mode,rule,deps,context_from,wave,status,findings,recommendations,confidence,error
-"1","CLI: gemini","...","gemini","analyze","<prompt>","analysis","","","","1","","","","",""
-"2","CLI: claude","...","claude","analyze","<prompt>","analysis","","","","1","","","","",""
-"3","Cross-Verify","Compare CLI outputs: CONSENSUS/CONFLICT/UNIQUE","","","","","","1;2","1;2","2","","","","",""
-"4","Synthesis","Merge verified findings → collab-report.md + context.md + conclusions.json","","","","","","3","3","3","","","","",""
-```
+- `per-tool/{tool}-output.md` — raw CLI outputs
-Input columns: id, title, description, tool, role, prompt, mode, rule, deps, context_from, wave.
-Output columns: status (pending→completed/failed), findings, recommendations, confidence, error.
-### Downstream Compatibility
+**Downstream compatibility**:
 | Consumer | Artifact |
 |----------|----------|
 | maestro-plan | context.md + conclusions.json (via --dir) |
 | maestro-analyze | context.md as prior context (via state.json) |
 | maestro-ralph | artifact chain lookup (type=collab) |
-</csv_schema>
+</context>
 <invariants>
-1. **Wave order sacred**: Never execute wave N+1 before wave N completes
-2. **CSV is source of truth**: Master tasks.csv holds all state
-3. **Same prompt, different tool**: Wave 1 agents all use same base prompt, only --to differs
-4. **Minimum 2 tools**: Abort if fewer eligible
-5. **Delegate protocol**: All exec_command calls follow delegate-protocol.codex.md (yield_time + poll)
-6. **Partial degradation**: If 1+ tool fails in wave 1, continue with remaining
-7. **Discovery board append-only**: Never modify/delete discoveries.ndjson
+1. **ALL analysis via exec_command → maestro delegate** — coordinator NEVER performs analysis internally, NEVER spawns agents for analysis
+2. **exec_command is the execution mechanism** — every delegate call: `exec_command({ cmd: "maestro delegate ..." })`
+3. **delegate-protocol.codex.md governs lifecycle** — MUST follow exec_command → poll write_stdin → parse for every delegate
+4. **NEVER fire-and-forget** — every exec_command MUST be polled to completion via write_stdin, result consumed before proceeding
+5. **NEVER substitute internal reasoning** — if CLI fails, report failure; do NOT generate analysis yourself as replacement
+6. **Indefinite wait** — polling has NO max timeout; continue polling until CLI returns regardless of elapsed time; NEVER abandon a running session
+6. **Same prompt, different --to** — fan-out delegates all use identical base prompt, only `--to <tool>` differs
+7. **Minimum 2 tools** — abort if fewer eligible
+8. **Partial degradation** — 1 tool fails → continue with remaining (minimum 2 results for cross-verify)
 </invariants>
 <state_machine>
 <states>
-S_PARSE      — 解析参数、发现工具                       PERSIST: —
-S_CONFIRM    — 展示计划、用户确认（-y 跳过）            PERSIST: —
-S_CSV_GEN    — 生成 tasks.csv                           PERSIST: tasks.csv
-S_WAVE_1     — CLI Fan-Out (parallel spawn)              PERSIST: per-tool outputs + tasks.csv
-S_WAVE_2     — Cross-Verify (single agent spawn)         PERSIST: tasks.csv
-S_WAVE_3     — Synthesis (single agent spawn)            PERSIST: reports + tasks.csv
-S_AGGREGATE  — 注册 artifact、输出摘要                   PERSIST: state.json + results.csv
+S_PARSE          — 解析参数、发现工具                       PERSIST: —
+S_CONFIRM        — 展示计划、用户确认（-y 跳过）            PERSIST: —
+S_FAN_OUT        — 并行 exec_command fan-out + 轮询等待      PERSIST: per-tool outputs
+S_CROSS_VERIFY   — 交叉验证：共识/冲突/独特分类              PERSIST: cross-verify.md
+S_SYNTHESIZE     — 生成最终报告                              PERSIST: reports
+S_AGGREGATE      — 注册 artifact、输出摘要                   PERSIST: state.json
 </states>
 <transitions>
@@ -88,25 +91,22 @@ S_PARSE:
   → ERROR(E002)  WHEN: eligible tools < 2
 S_CONFIRM:
-  → S_CSV_GEN    WHEN: -y OR user confirms "执行"
-  → S_PARSE      WHEN: user modifies tools               DO: re-select, validate >= 2
+  → S_FAN_OUT    WHEN: -y OR user confirms
+  → S_PARSE      WHEN: user modifies tools
   → END          WHEN: user cancels
-S_CSV_GEN:
-  → S_WAVE_1     DO: A_GENERATE_CSV (N tool rows wave 1 + 1 verify wave 2 + 1 synthesis wave 3)
-S_WAVE_1:
-  → S_WAVE_2     WHEN: 1+ agents completed               DO: A_SPAWN_WAVE_1
-  → ERROR(E004)  WHEN: all failed
+S_FAN_OUT:
+  → S_CROSS_VERIFY  WHEN: 2+ delegates completed        DO: A_FAN_OUT_DELEGATES
+  → ERROR(E004)     WHEN: all failed OR fewer than 2 completed
-S_WAVE_2:
-  → S_WAVE_3     DO: A_SPAWN_WAVE_2
+S_CROSS_VERIFY:
+  → S_SYNTHESIZE    DO: A_CROSS_VERIFY
-S_WAVE_3:
-  → S_AGGREGATE  DO: A_SPAWN_WAVE_3
+S_SYNTHESIZE:
+  → S_AGGREGATE     DO: A_SYNTHESIZE
 S_AGGREGATE:
-  → END          DO: A_AGGREGATE_RESULTS
+  → END             DO: A_AGGREGATE_RESULTS
 </transitions>
@@ -114,111 +114,212 @@ S_AGGREGATE:
 ### A_PARSE_AND_DISCOVER
-1. Parse flags: requirement, tools, mode, rule, autoYes, concurrency
-2. Read cli-tools.json → build eligible tool list
+1. Parse flags: requirement, tools, mode, rule, autoYes
+2. Read `~/.maestro/cli-tools.json` → build eligible tool list
 3. Auto-select if no --tools: first 3 eligible in config order
-4. Load context: project.md + `maestro spec load --category arch` + `maestro wiki list --category arch`
-5. Build shared delegate prompt (6-field format: PURPOSE/TASK/MODE/CONTEXT/EXPECTED/CONSTRAINTS)
+4. Build shared delegate prompt (6-field format):
+   ```
+   PURPOSE: {requirement} + cross-verification analysis
+   TASK: {specific analysis tasks from requirement}
+   MODE: {mode}
+   CONTEXT: @**/* | {project context if available}
+   EXPECTED: Structured findings with evidence, confidence per dimension
+   CONSTRAINTS: {scope limits}
+   ```
+5. Create session + scratch dirs
+6. `update_plan` with all phases pending
-### A_GENERATE_CSV
+### A_FAN_OUT_DELEGATES
-Create session + scratch dirs. Write tasks.csv:
-- Wave 1: one row per selected tool (parallel)
-- Wave 2: cross-verify row (deps on all wave 1 IDs)
-- Wave 3: synthesis row (deps on wave 2 ID)
+#### Phase 1: Parallel Launch
-### A_SPAWN_WAVE_1
-Filter wave==1 from CSV → write wave-1.csv.
+Launch ALL delegate commands simultaneously via `multi_tool_use.parallel`:
 ```
-spawn_agents_on_csv({ csv_path: "wave-1.csv", max_concurrency: N })
+multi_tool_use.parallel({
+  tool_uses: [
+    {
+      recipient_name: "functions.exec_command",
+      parameters: {
+        cmd: "maestro delegate \"<shared_prompt>\" --to gemini --mode <mode> [--rule <rule>]",
+        yield_time_ms: 30000,
+        max_output_tokens: 6000
+      }
+    },
+    {
+      recipient_name: "functions.exec_command",
+      parameters: {
+        cmd: "maestro delegate \"<shared_prompt>\" --to claude --mode <mode> [--rule <rule>]",
+        yield_time_ms: 30000,
+        max_output_tokens: 6000
+      }
+    }
+    // ... one entry per selected tool
+  ]
+})
 ```
-**Agent instruction**: Execute `maestro delegate "<prompt>" --to <tool> --mode <mode>` via exec_command (delegate-protocol.codex.md). Write output to per-tool/{tool}-output.md. Extract findings/recommendations/confidence. Append discoveries.ndjson.
+#### Phase 2: Block Until ALL Complete
-Merge results → master tasks.csv.
+For each result from Phase 1, check completion status:
-### A_SPAWN_WAVE_2
+- **Completed** (no session_id) → save output directly to `{scratchDir}/per-tool/{tool}-output.md`
+- **Running** (session_id returned) → add to `pending_sessions[]`
-Filter wave==2 → write wave-2.csv. Build prev_context from wave 1 findings.
+**Blocking poll loop — runs until pending_sessions is empty:**
 ```
-spawn_agents_on_csv({ csv_path: "wave-2.csv", max_concurrency: 1 })
+pending_sessions = [{ tool, session_id }, ...]
+WHILE pending_sessions.length > 0:
+  FOR EACH session IN pending_sessions:
+    result = write_stdin({
+      session_id: session.session_id,
+      chars: "",
+      yield_time_ms: 60000,          // 60s per poll — no rush, wait for real output
+      max_output_tokens: 6000
+    })
+    IF result indicates completed:
+      save output → {scratchDir}/per-tool/{session.tool}-output.md
+      REMOVE session FROM pending_sessions
+      completed_count += 1
+    IF result indicates failed/error:
+      log error for session.tool
+      REMOVE session FROM pending_sessions
+      failed_count += 1
+    // still running → stays in pending_sessions, poll again next round
 ```
-**Agent instruction**: Read all per-tool outputs + discoveries.ndjson. Classify each finding:
+**Blocking guarantees:**
+- `yield_time_ms: 60000` — each poll waits up to 60s for output, no short-circuit
+- NO max retry count — loop continues indefinitely until CLI returns
+- NO timeout escalation — delegate can run as long as needed (30s to 10min+)
+- NO early exit — even if tool 1 and 2 are done, keep polling tool 3 until it completes
+- Round-robin ensures fair polling across all pending sessions
+#### Phase 3: Validate
+- Count completed tools
+- completed < 2 → ERROR(E004)
+- 1 tool failed but 2+ succeeded → W001, log failure, continue
+**Iron rules**:
+- NEVER skip polling — every session_id MUST be polled to completion
+- NEVER proceed to S_CROSS_VERIFY while pending_sessions is non-empty
+- NEVER set a max timeout or max retry count on the poll loop
+- NEVER generate analysis internally as substitute for CLI output
+- NEVER summarize or paraphrase — save raw CLI output verbatim
+### A_CROSS_VERIFY
+Coordinator reads ALL per-tool outputs from `{scratchDir}/per-tool/` and classifies each finding:
 | Condition | Tag |
 |-----------|-----|
-| 2+ tools agree | CONSENSUS |
-| Tools disagree | CONFLICT |
-| 1 tool only | UNIQUE |
+| 2+ tools agree on same finding | CONSENSUS |
+| Tools have contradictory findings | CONFLICT |
+| Only 1 tool identified | UNIQUE |
-Compute consensus_level = consensus_count / total * 100.
+For each CONFLICT: note which tools disagree, their evidence, and confidence levels.
-Merge results → master tasks.csv.
+Compute: `consensus_level = consensus_count / total_findings * 100`
-### A_SPAWN_WAVE_3
+Write results to `{scratchDir}/cross-verify.md`.
-Filter wave==3 → write wave-3.csv. Build prev_context from wave 2 findings.
+### A_SYNTHESIZE
-```
-spawn_agents_on_csv({ csv_path: "wave-3.csv", max_concurrency: 1 })
-```
+Generate 3 output files from cross-verify results:
-**Agent instruction**: Resolve conflicts via evidence-weighted voting (higher confidence wins, specific evidence > general). Generate 3 files:
-1. **collab-report.md**: Summary, Consensus Findings, Resolved Conflicts, Unresolved Items, Unique Insights, Recommendations, Per-Tool Confidence table
-2. **context.md**: Locked (CONSENSUS), Free (UNIQUE w/ strong evidence), Deferred (UNRESOLVED). Standard Locked/Free/Deferred format.
-3. **conclusions.json**: session_id, subject, mode, tools[], consensus_level, recommendation (Go/No-Go/Conditional), confidence, dimensions[], decisions[]
+1. **collab-report.md**:
+   ```markdown
+   # Collaborative Analysis: {requirement}
-Merge results → master tasks.csv.
+   ## Summary
+   Tools: {tool list} | Consensus: {consensus_level}%
-### A_AGGREGATE_RESULTS
+   ## Consensus Findings
+   {findings agreed by 2+ tools, with evidence}
-1. Export tasks.csv → results.csv
-2. Verify outputs exist (fallback: build minimal from available findings)
-3. Copy collab-report.md + context.md + conclusions.json → scratchDir
-4. Register CLB artifact in state.json (type: collab, scope: adhoc)
-5. Spec enrichment: for each Locked decision → `maestro spec add arch`
-6. Display summary (requirement, tools, consensus_level, per-tool status, artifact ID, next steps)
+   ## Conflicts
+   {contradictory findings with per-tool positions and evidence}
-</actions>
+   ## Unique Insights
+   {single-tool findings worth noting}
-</state_machine>
+   ## Recommendations
+   {actionable recommendations, prioritized}
-<discovery_board>
+   ## Per-Tool Confidence
+   | Tool | Confidence | Key Contribution |
+   |------|-----------|-----------------|
+   ```
-| Type | Dedup Key | Data |
-|------|-----------|------|
-| cli_finding | tool+dimension | {tool, dimension, finding, confidence, evidence} |
-| consensus | area | {area, tools[], finding, confidence} |
-| conflict | area | {area, positions[{tool, stance, evidence}], resolution} |
-| unique_insight | tool+finding | {tool, finding, significance, actionable} |
+2. **context.md**: Locked (CONSENSUS) / Free (UNIQUE w/ strong evidence) / Deferred (CONFLICT unresolved)
-Protocol: read before analysis, append-only, dedup by type+key.
-</discovery_board>
+3. **conclusions.json**:
+   ```json
+   {
+     "session_id": "", "subject": "", "mode": "",
+     "tools": [], "consensus_level": 0,
+     "recommendation": "Go|No-Go|Conditional",
+     "confidence": 0,
+     "dimensions": [{ "name": "", "consensus": "", "details": "" }],
+     "decisions": [{ "area": "", "status": "locked|free|deferred", "rationale": "" }]
+   }
+   ```
+### A_AGGREGATE_RESULTS
+1. Copy outputs to scratchDir
+2. Register CLB artifact in state.json (type: collab, scope: adhoc)
+3. Spec enrichment: for each Locked decision → `maestro spec add arch`
+4. `update_plan` all steps completed
+5. Display summary:
+   ```
+   == Collab Analysis Complete ==
+   Requirement: {requirement}
+   Tools: {tool list with status}
+   Consensus Level: {consensus_level}%
+   Key Findings:
+     CONSENSUS: {count}
+     CONFLICT:  {count}
+     UNIQUE:    {count}
+   Reports: {scratchDir}/collab-report.md
+   Next: $maestro-plan --dir {scratchDir}
+   ```
+</actions>
+</state_machine>
 <error_codes>
 | Code | Condition | Recovery |
 |------|-----------|----------|
-| E002 | Fewer than 2 eligible tools | Check cli-tools.json |
-| E004 | All wave 1 delegates failed | Abort with per-tool error details |
-| W001 | One tool failed wave 1 | Continue with remaining |
-| W003 | Synthesis failed | Use cross-verify output as fallback |
-| W004 | consensus_level < 40% | Flag in summary |
+| E002 | Fewer than 2 eligible tools | Check cli-tools.json, specify --tools |
+| E004 | All delegates failed or < 2 completed | Show per-tool errors, abort |
+| W001 | One tool failed | Continue with remaining |
+| W004 | consensus_level < 40% | Flag in summary as low-confidence |
 </error_codes>
 <success_criteria>
-- [ ] Wave 1: all delegates via delegate-protocol.codex.md, per-tool outputs written
-- [ ] Wave 2: consensus/conflict/unique classified, consensus_level computed
-- [ ] Wave 3: collab-report.md + context.md + conclusions.json produced
-- [ ] CLB artifact registered, outputs copied to scratchDir
-- [ ] Partial degradation: continued if 1+ tools succeeded
+- [ ] ALL analysis performed via exec_command → maestro delegate — zero internal analysis
+- [ ] multi_tool_use.parallel used for fan-out launch
+- [ ] Every exec_command polled to completion via write_stdin — no timeout cap, no max retries
+- [ ] Blocking poll loop ran until pending_sessions empty — no early exit
+- [ ] Per-tool raw outputs saved to {scratchDir}/per-tool/
+- [ ] Cross-verify: CONSENSUS/CONFLICT/UNIQUE classified, consensus_level computed
+- [ ] collab-report.md + context.md + conclusions.json produced
+- [ ] CLB artifact registered in state.json
+- [ ] Partial degradation: continued if 2+ tools succeeded
 </success_criteria>
 <next_step_routing>
 - Deep feasibility analysis → `$maestro-analyze "{topic}"`
-- Plan from conclusions → `$maestro-plan --dir {dir}`
+- Plan from conclusions → `$maestro-plan --dir {scratchDir}`
 - Expand exploration → `$maestro-brainstorm "{topic}"`
 </next_step_routing>

package/.codex/skills/maestro-execute/SKILL.md CHANGED Viewed

@@ -94,12 +94,12 @@ When `--yes` or `-y`: Auto-confirm task breakdown, skip blocked-task prompts, au
 ### tasks.csv (Master State)
 ```csv
-id,title,description,scope,convergence_criteria,hints,execution_directives,deps,context_from,wave,status,findings,files_modified,tests_passed,error
-"TASK-001","Setup auth module","Create authentication module with JWT token generation and verification. Export verifyToken and generateToken functions.","src/auth/","auth.ts contains export function verifyToken(; auth.ts contains export function generateToken(","Reference existing middleware pattern in src/middleware/auth.ts","npm test -- --grep auth","","","1","","","","",""
-"TASK-002","Create user model","Define User interface and database schema with email, passwordHash, role fields. Use existing Result type pattern.","src/models/","user.ts contains export interface User; user.ts contains email: string","See src/models/session.ts for existing model pattern","npm test -- --grep user","","","1","","","","",""
-"TASK-003","Auth middleware","Create Express middleware that validates JWT from Authorization header. Use verifyToken from auth module. Return 401 on invalid token.","src/middleware/","auth-middleware.ts contains export function authMiddleware(; auth-middleware.ts contains verifyToken","Follows existing middleware pattern in src/middleware/logging.ts","npm test -- --grep middleware","TASK-001","TASK-001","2","","","","",""
-"TASK-004","Login endpoint","Implement POST /api/login endpoint. Validate credentials against user model, return JWT on success. Use generateToken from auth module.","src/routes/","login.ts contains router.post('/api/login'; login.ts contains generateToken(","Wire into existing Express app in src/app.ts","curl -X POST localhost:3000/api/login","TASK-001;TASK-002","TASK-001;TASK-002","2","","","","",""
-"TASK-005","Integration tests","Write integration tests for full auth flow: register, login, access protected route, token refresh.","tests/","tests/auth.test.ts exists; npm test exits with code 0","Use existing test setup in tests/setup.ts","npm test","TASK-003;TASK-004","TASK-003;TASK-004","3","","","","",""
+id,title,description,scope,convergence_criteria,hints,execution_directives,deps,context_from,wave
+"TASK-001","Setup auth module","Create authentication module with JWT token generation and verification. Export verifyToken and generateToken functions.","src/auth/","auth.ts contains export function verifyToken(; auth.ts contains export function generateToken(","Reference existing middleware pattern in src/middleware/auth.ts","npm test -- --grep auth","","","1"
+"TASK-002","Create user model","Define User interface and database schema with email, passwordHash, role fields. Use existing Result type pattern.","src/models/","user.ts contains export interface User; user.ts contains email: string","See src/models/session.ts for existing model pattern","npm test -- --grep user","","","1"
+"TASK-003","Auth middleware","Create Express middleware that validates JWT from Authorization header. Use verifyToken from auth module. Return 401 on invalid token.","src/middleware/","auth-middleware.ts contains export function authMiddleware(; auth-middleware.ts contains verifyToken","Follows existing middleware pattern in src/middleware/logging.ts","npm test -- --grep middleware","TASK-001","TASK-001","2"
+"TASK-004","Login endpoint","Implement POST /api/login endpoint. Validate credentials against user model, return JWT on success. Use generateToken from auth module.","src/routes/","login.ts contains router.post('/api/login'; login.ts contains generateToken(","Wire into existing Express app in src/app.ts","curl -X POST localhost:3000/api/login","TASK-001;TASK-002","TASK-001;TASK-002","2"
+"TASK-005","Integration tests","Write integration tests for full auth flow: register, login, access protected route, token refresh.","tests/","tests/auth.test.ts exists; npm test exits with code 0","Use existing test setup in tests/setup.ts","npm test","TASK-003;TASK-004","TASK-003;TASK-004","3"
 ```
 **Columns**:
@@ -116,12 +116,14 @@ id,title,description,scope,convergence_criteria,hints,execution_directives,deps,
 | `deps` | Input | Semicolon-separated dependency task IDs |
 | `context_from` | Input | Semicolon-separated task IDs whose findings this task needs |
 | `wave` | Computed | Wave number from plan.json wave assignment |
-| `status` | Output | `pending` -> `completed` / `failed` / `blocked` / `skipped` |
+| `status` | Output | `pending` -> `completed` / `failed` / `blocked` / `skipped` (mapped from output_schema `result_status`) |
 | `findings` | Output | Implementation notes and observations (max 500 chars) |
 | `files_modified` | Output | Semicolon-separated list of created/modified files |
 | `tests_passed` | Output | Test pass/fail status from execution_directives |
 | `error` | Output | Error message if failed or blocked |
+**Column separation rule**: Wave CSV (input to spawn_agents_on_csv) and output_schema MUST NOT share column names. Wave CSV only contains Input columns + prev_context. Output columns are returned exclusively via output_schema (using `result_status`, not `status`). During merge, `result_status` maps back to the master CSV's `status` column.
 ### Per-Wave CSV (Temporary)
 Each wave generates `wave-{N}.csv` with extra `prev_context` column populated from predecessor task findings.
@@ -132,7 +134,7 @@ Each wave generates `wave-{N}.csv` with extra `prev_context` column populated fr
 |------|---------|-----------|
 | `tasks.csv` | Master state -- all tasks with status/findings | Updated after each wave |
 | `wave-{N}.csv` | Per-wave input (temporary) | Created before wave, deleted after |
-| `wave-{N}-results.csv` | Per-wave output | Created by spawn_agents_on_csv |
+| `wave-{N}-results.csv` | Per-wave output (uses `result_status`) | Created by spawn_agents_on_csv, deleted after merge |
 | `results.csv` | Final export of all task results | Created in Phase 3 |
 | `discoveries.ndjson` | Shared exploration board | Append-only, carries across waves |
 | `context.md` | Human-readable execution report | Created in Phase 3 |
@@ -158,7 +160,7 @@ Each wave generates `wave-{N}.csv` with extra `prev_context` column populated fr
 4. **Context Propagation**: prev_context built from master CSV findings, not from memory
 5. **Discovery Board is Append-Only**: Never clear, modify, or recreate discoveries.ndjson
 6. **Cascading Skip on Failure**: If a task fails/blocks, all dependent tasks are skipped
-7. **Cleanup Temp Files**: Remove wave-{N}.csv after results are merged
+7. **Cleanup Temp Files**: Remove `wave-{N}.csv` AND `wave-{N}-results.csv` after results are merged
 8. **Max 3 Fix Attempts**: Per task, auto-fix convergence failures up to 3 times, then mark blocked
 9. **Breakpoint Resume**: Always detect completed tasks and skip them on re-run
 10. **DO NOT STOP**: Continuous execution until all waves complete or user explicitly stops
@@ -251,11 +253,11 @@ spawn_agents_on_csv({
   instruction: buildExecutorInstruction(sessionFolder, phaseDir, autoCommit, specsContent),  // agent: ~/.codex/agents/workflow-executor.toml
   max_concurrency: maxConcurrency, max_runtime_seconds: 3600,
   output_csv_path: `${sessionFolder}/wave-${N}-results.csv`,
-  output_schema: { id, status: [completed|failed|blocked], findings, files_modified, tests_passed, error }
+  output_schema: { id, result_status: [completed|failed|blocked], findings, files_modified, tests_passed, error }
 })
 ```
-4. Merge results into master `tasks.csv`, delete `wave-{N}.csv`
+4. Merge results into master `tasks.csv`: map `result_status` from `wave-{N}-results.csv` to the `status` column in master CSV. Delete `wave-{N}.csv` AND `wave-{N}-results.csv` after merge.
 #### Blocked Task Handling

package/.codex/skills/maestro-milestone-audit/SKILL.md CHANGED Viewed

@@ -28,9 +28,9 @@ $maestro-milestone-audit "M1"
 ### tasks.csv (Master State)
 ```csv
-id,title,description,scope,check_targets,deps,wave,status,findings,gaps_found,severity,error
-"integ-1","Interface & dependency chains","Verify shared interfaces are consistent across phases: re-exports match, dependency chains unbroken, no circular imports between phase outputs","cross-phase imports, shared types, re-exports","grep for shared type names across phase output dirs; verify export/import consistency","","1","","","","",""
-"integ-2","Data contracts & API consistency","Verify request/response schemas match across phases: API signatures consistent, error codes aligned, no contract drift","request/response schemas, API signatures, error codes","diff API type definitions across phases; check error code enum consistency","","1","","","","",""
+id,title,description,scope,check_targets,deps,wave
+"integ-1","Interface & dependency chains","Verify shared interfaces are consistent across phases: re-exports match, dependency chains unbroken, no circular imports between phase outputs","cross-phase imports, shared types, re-exports","grep for shared type names across phase output dirs; verify export/import consistency","","1"
+"integ-2","Data contracts & API consistency","Verify request/response schemas match across phases: API signatures consistent, error codes aligned, no contract drift","request/response schemas, API signatures, error codes","diff API type definitions across phases; check error code enum consistency","","1"
 ```
 **Columns**:
@@ -44,19 +44,21 @@ id,title,description,scope,check_targets,deps,wave,status,findings,gaps_found,se
 | `check_targets` | Input | Specific verification commands/grep patterns |
 | `deps` | Input | Dependencies (empty — all wave 1) |
 | `wave` | Computed | Wave number (always 1 — single parallel wave) |
-| `status` | Output | `pending` -> `pass` / `fail` / `warning` |
+| `result_status` | Output | `pass` / `fail` / `warning` |
 | `findings` | Output | Detailed findings per dimension (max 500 chars) |
 | `gaps_found` | Output | Semicolon-separated list of integration gaps |
 | `severity` | Output | `critical` / `warning` / `info` per gap |
 | `error` | Output | Error message if check failed |
+**Column separation rule**: Input columns and Output columns MUST NOT share names. Wave CSV only contains Input columns. Output columns are returned exclusively via output_schema.
 ### Session Structure
 ```
 .workflow/.csv-wave/{YYYYMMDD}-audit-{milestone}/
 +-- tasks.csv
-+-- wave-1.csv (temporary)
-+-- wave-1-results.csv
++-- wave-1.csv (temporary, deleted after merge)
++-- wave-1-results.csv (temporary, deleted after merge)
 ```
 </csv_schema>
@@ -95,16 +97,16 @@ Verify all adhoc-scoped artifacts completed. For each execute artifact, verify a
 spawn_agents_on_csv({
   csv_path: `${sessionFolder}/wave-1.csv`,
   id_column: "id",
-  instruction: `You are an integration checker for milestone ${milestone}. For each row, examine the scope and check_targets. Search the codebase for inconsistencies, contract drift, and broken dependencies across phase outputs. Report findings with file:line references. Set status to pass/fail/warning. List specific gaps in gaps_found (semicolon-separated).`,
+  instruction: `You are an integration checker for milestone ${milestone}. For each row, examine the scope and check_targets. Search the codebase for inconsistencies, contract drift, and broken dependencies across phase outputs. Report findings with file:line references. Set result_status to pass/fail/warning. List specific gaps in gaps_found (semicolon-separated).`,
   max_concurrency: 2, max_runtime_seconds: 600,
   output_csv_path: `${sessionFolder}/wave-1-results.csv`,
-  output_schema: { id, status: [pass|fail|warning], findings, gaps_found, severity, error }
+  output_schema: { id, result_status: [pass|fail|warning], findings, gaps_found, severity, error }
 })
 ```
-4. Merge results into master `tasks.csv`
+4. Merge results into master `tasks.csv`: map `result_status` → master `status` column, copy `findings`, `gaps_found`, `severity`, `error`. Delete temporary files (`wave-1.csv`, `wave-1-results.csv`) after merge.
 5. Parse `gaps_found` from all workers — aggregate into `.workflow/milestones/{milestone}/audit-report.md`
-6. Any worker with `status == fail` and `severity == critical` → milestone verdict = FAIL
+6. Any worker with `result_status == fail` and `severity == critical` → milestone verdict = FAIL
 ### Step 6: Verdict

package/.codex/skills/maestro-ralph/SKILL.md CHANGED Viewed

@@ -99,8 +99,8 @@ S_BUILD_CHAIN:
   -> S_CREATE_SESSION DO: A_BUILD_STEPS
 S_CREATE_SESSION:
-  -> S_CONFIRM       WHEN: not auto_mode
-  -> S_LOAD_NEXT     WHEN: auto_mode
+  -> S_CONFIRM       WHEN: not auto_mode                       DO: A_CREATE_SESSION
+  -> S_LOAD_NEXT     WHEN: auto_mode                           DO: A_CREATE_SESSION
 S_CONFIRM:
   -> S_LOAD_NEXT     WHEN: "Proceed"
@@ -220,6 +220,13 @@ Priority: regex from intent `phase\s*(\d+)` -> latest in-progress artifact's pha
 6. Args use placeholders `{phase}`, `{intent}`, `{dirs}` — resolved at wave execution time
 7. Append `-y` to all skill args when `auto_mode` is true (see -y propagation table in context)
+### A_CREATE_SESSION
+1. Write `.workflow/.maestro/ralph-{YYYYMMDD-HHmmss}/status.json` (see Session JSON Schema)
+2. Initialize tracking:
+   - `create_goal({ objective: "Ralph lifecycle: {quality_mode} mode, {N} steps from {lifecycle_position}" })`
+   - `update_plan({ plan: steps.map(step => { step, status: "pending" }) })`
 ### A_BUILD_AND_SPAWN_WAVE
 1. Conditional step eval: check_coverage -> read validation.json, skip if >= threshold
@@ -255,11 +262,16 @@ Update session: milestone, phase, reset passed_gates. Re-infer quality_mode. Bui
 ### A_FINALIZE
-Set status = "completed". Sync update_plan. Release goal. Display completion report.
+1. Set `session.status = "completed"`, write status.json
+2. Sync update_plan: all steps → "completed"
+3. `update_goal({ status: "complete" })` — release goal constraint
+4. Display completion report
 ### A_PAUSE_SESSION
-Set status = "paused". Do NOT release goal. Display: use $maestro-ralph execute to continue.
+1. Set `session.status = "paused"`, write status.json
+2. Do NOT call `update_goal` — goal stays for `execute`/`continue` resume
+3. Display: use `$maestro-ralph execute` to continue
 </actions>