maestro-flow 0.3.39 → 0.3.41

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -108,6 +108,10 @@ Full mode:
  - [ ] analysis.md written with all 6 dimensions scored with evidence
  - [ ] conclusions.json created with recommendations and decision trail
  - [ ] Intent Coverage tracked and verified (no unresolved ❌ items)
+ - [ ] Confidence tracking initialized (Step 4.6) and re-scored each round (Step 5.8)
+ - [ ] Readiness gate checked before synthesis (Step 5.10)
+ - [ ] Pressure pass completed ≥ 1 time before Step 6
+ - [ ] Confidence summary with factor decomposition written to analysis.md
 
  Gaps mode:
  - [ ] Issues loaded from issues.jsonl (all open/registered, or single ISS-ID)
@@ -91,6 +91,10 @@ Single role mode:
  - [ ] Final Output Gate passed (Step 5.5) or `--yes` bypassed
  - [ ] All user decisions captured with Decision Recording Protocol
  - [ ] Session metadata updated with completion status
+ - [ ] Confidence scored per role completion and after cross-role analysis
+ - [ ] Readiness gate checked before spec generation
+ - [ ] Pressure pass completed on at least 1 feature spec
+ - [ ] Confidence summary appended to synthesis-changelog.md
 
  **Single role mode**:
  - [ ] analysis.md written to `{output_dir}/{role}/`
@@ -0,0 +1,333 @@
+ ---
+ name: maestro-collab
+ description: Multi-CLI collaborative analysis -- fan-out to multiple CLI tools, cross-verify, synthesize
+ argument-hint: "\"<requirement>\" [--tools gemini,qwen,claude] [--mode analysis|write] [--rule <template>] [-y]"
+ allowed-tools:
+ - Read
+ - Write
+ - Bash
+ - Glob
+ - Grep
+ - Agent
+ - AskUserQuestion
+ ---
+
+ <purpose>
+ Multi-CLI collaboration: fan-out the same requirement to multiple CLI tools in parallel, cross-verify outputs for consensus/conflicts, then synthesize into a unified report with standard downstream artifacts (context.md + conclusions.json).
+
+ Each CLI tool independently analyzes the requirement. Results are compared and merged via evidence-weighted synthesis.
+ </purpose>
+
+ <context>
+ $ARGUMENTS — requirement text and optional flags.
+
+ ```bash
+ /maestro-collab "analyze the auth module for security vulnerabilities"
+ /maestro-collab "design a caching strategy" --tools gemini,qwen,claude
+ /maestro-collab -y "review error handling patterns"
+ /maestro-collab "refactor user service" --mode write --tools gemini,claude
+ ```
+
+ **Flags**:
+ - `--tools <list>`: Comma-separated CLI tools (default: auto-select first 3 enabled)
+ - `--mode analysis|write`: Delegate mode (default: analysis)
+ - `--rule <template>`: Shared rule template for all delegates
+ - `-y` / `--yes`: Skip plan confirmation
+
+ **Output**: `.workflow/scratch/{YYYYMMDD}-collab-{slug}/`
+ - `collab-report.md` — full collaboration report
+ - `context.md` — standard Locked/Free/Deferred decisions (plan/analyze compatible)
+ - `conclusions.json` — structured conclusions (plan fast-track compatible)
+ - `per-tool/{tool}-output.md` — raw per-tool outputs
+ </context>
+
+ <execution>
+
+ ### Step 1: Parse Arguments
+
+ Extract from `$ARGUMENTS`:
+ - `requirement` — remaining text after flag removal (error if empty)
+ - `--tools` → `selectedTools` (comma-split)
+ - `--mode` → `delegateMode` (default: `analysis`)
+ - `--rule` → `ruleTemplate`
+ - `-y` / `--yes` → `autoYes`
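One way to implement Step 1 is sketched below in JavaScript, the language of the document's own `Bash({...})` snippets. It assumes `$ARGUMENTS` arrives as a single string and that the requirement may be double-quoted. `parseCollabArgs` is a hypothetical helper. Rejecting unknown `--mode` values is an added assumption, not documented behavior.

```javascript
// Hypothetical Step 1 parser; flag names follow the command's argument-hint.
function parseCollabArgs(raw) {
  // Tokenize on whitespace, keeping "double-quoted segments" together.
  const tokens = raw.match(/"[^"]*"|\S+/g) || [];
  const opts = { selectedTools: null, delegateMode: "analysis", ruleTemplate: null, autoYes: false };
  const rest = [];
  for (let i = 0; i < tokens.length; i++) {
    const t = tokens[i];
    if (t === "--tools") opts.selectedTools = (tokens[++i] || "").split(",").filter(Boolean);
    else if (t === "--mode") opts.delegateMode = tokens[++i];
    else if (t === "--rule") opts.ruleTemplate = tokens[++i] ?? null;
    else if (t === "-y" || t === "--yes") opts.autoYes = true;
    else rest.push(t.replace(/^"|"$/g, "")); // part of the requirement text
  }
  const requirement = rest.join(" ").trim();
  if (!requirement) throw new Error("E001: requirement argument missing");
  // Assumption: reject modes other than the two documented ones.
  if (!["analysis", "write"].includes(opts.delegateMode)) {
    throw new Error(`invalid --mode: ${opts.delegateMode}`);
  }
  return { requirement, ...opts };
}
```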
+
+ ### Step 2: Discover Available CLI Tools
+
+ ```bash
+ Bash("maestro tools list --json 2>/dev/null || cat ~/.maestro/cli-tools.json")
+ ```
+
+ Parse tool entries. Build eligible list:
+ - `enabled == true`
+ - If `--mode write`: exclude `type == "api-endpoint"`
+
+ Auto-select (when `--tools` omitted): first 3 eligible in config order.
+ Validate: minimum 2 eligible tools (abort if fewer).
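The following is a minimal sketch of the Step 2 eligibility filter and auto-selection. The entry shape `{ name, enabled, type }` is assumed; the real `cli-tools.json` schema may differ. `selectTools` is a hypothetical name, and its error strings reuse the E002/E003 codes from the command's error table.

```javascript
// Hypothetical entry shape { name, enabled, type }; the real cli-tools.json may differ.
function selectTools(entries, { selectedTools = null, delegateMode = "analysis" } = {}) {
  const eligible = entries
    .filter((t) => t.enabled === true)
    .filter((t) => !(delegateMode === "write" && t.type === "api-endpoint"))
    .map((t) => t.name);
  if (eligible.length < 2) throw new Error("E002: fewer than 2 CLI tools eligible");

  if (!selectedTools) return eligible.slice(0, 3); // first 3 eligible, config order

  const unknown = selectedTools.filter((name) => !eligible.includes(name));
  if (unknown.length) throw new Error(`E003: not found/enabled: ${unknown.join(", ")}`);
  if (selectedTools.length < 2) throw new Error("at least 2 tools must be selected");
  return selectedTools;
}
```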
+
+ ### Step 3: Present Collaboration Plan
+
+ **(Skip if `-y`)**
+
+ Display plan, then ask user:
+
+ ```
+ ============================================================
+ COLLABORATION PLAN
+ ============================================================
+ Requirement: {requirement}
+ Mode: {delegateMode}
+ Rule: {ruleTemplate || "none"}
+
+ Available CLI Tools (from cli-tools.json):
+ [✓] gemini — gemini-3.1-pro-preview [fullstack, frontend]
+ [✓] claude — claude-sonnet-4-6 [fullstack]
+ [✓] codex — gpt-5.5 [fullstack, backend]
+ [ ] opencode — (no model) [fullstack]
+
+ Selected: gemini, claude, codex (3 tools)
+
+ Pipeline:
+ 1. Fan-out → parallel delegate to each tool
+ 2. Cross-verification → consensus/conflict analysis
+ 3. Synthesis → context.md + conclusions.json
+ ============================================================
+ ```
+
+ Use `AskUserQuestion` with options:
+ - **执行** (Execute) — proceed with selected tools
+ - **修改工具选择** (Modify tool selection) — let the user specify a different tool combination
+ - **取消** (Cancel) — abort
+
+ If **修改工具选择** (Modify tool selection): ask the user which tools to use (show eligible list), validate ≥ 2, re-display plan.
+
+ ### Step 4: Setup Session
+
+ ```
+ slug = requirement kebab-cased, max 40 chars
+ outputDir = .workflow/scratch/{YYYYMMDD}-collab-{slug}/
+ ```
+
+ Create `outputDir` + `outputDir/per-tool/`.
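Here is a sketch of the Step 4 naming rule. It assumes "kebab-cased" means lowercase ASCII letters and digits joined by single hyphens. `collabSlug` and `collabOutputDir` are hypothetical helpers; creating the directories is left out.

```javascript
// Hypothetical reading of "kebab-cased, max 40 chars": lowercase ASCII letters and
// digits, runs of anything else collapsed to one hyphen, truncated to 40 characters.
function collabSlug(requirement, maxLen = 40) {
  const kebab = requirement
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return kebab.slice(0, maxLen).replace(/-+$/, ""); // no trailing hyphen after truncation
}

// Builds .workflow/scratch/{YYYYMMDD}-collab-{slug}/ using the local date.
function collabOutputDir(requirement, date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const ymd = `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}`;
  return `.workflow/scratch/${ymd}-collab-${collabSlug(requirement)}/`;
}
```

Under this reading, a requirement written entirely in non-ASCII text (for example, Chinese) produces an empty slug. A real implementation would likely need a fallback such as a short hash.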
+
+ ### Step 5: Build Delegate Prompt
+
+ Shared prompt for all tools:
+
+ ```
+ PURPOSE: {requirement}; success = actionable findings with evidence
+ TASK: {auto-decomposed into 3-5 specific verbs}
+ MODE: {delegateMode}
+ CONTEXT: @**/*
+ EXPECTED: Structured findings with file:line references, confidence score (0-100), prioritized recommendations. Sections: ## Findings, ## Recommendations, ## Confidence
+ CONSTRAINTS: {extracted from requirement}
+ ```
+
+ ### Step 6: Parallel Fan-Out
+
+ Launch ALL delegate calls simultaneously using multiple `Bash(run_in_background: true)` in a **single message**:
+
+ ```
+ // Launch all in ONE message — do NOT wait between calls
+ Bash({
+ command: `maestro delegate "${prompt}" --to gemini --mode ${mode} ${rule}`,
+ run_in_background: true
+ })
+ Bash({
+ command: `maestro delegate "${prompt}" --to claude --mode ${mode} ${rule}`,
+ run_in_background: true
+ })
+ Bash({
+ command: `maestro delegate "${prompt}" --to codex --mode ${mode} ${rule}`,
+ run_in_background: true
+ })
+ ```
+
+ **After launching all calls → STOP immediately. Do not output anything. Wait for background completion callbacks.**
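The Step 6 snippet interpolates `${prompt}` inside double quotes. A prompt containing `"`, `$` or backticks would therefore be altered by the shell or break the command. The sketch below builds one command per tool using POSIX single-quoting instead. `buildDelegateCommands` is hypothetical, and it assumes `${rule}` stands for `--rule <template>` when a rule template is set.

```javascript
// POSIX shell single-quoting: wrap in '...' and rewrite each embedded ' as '\''.
function shQuote(s) {
  return `'${String(s).replace(/'/g, "'\\''")}'`;
}

// Hypothetical builder for the Step 6 commands; assumes ${rule} expands to
// `--rule <template>` when a rule template was given, and to nothing otherwise.
function buildDelegateCommands(prompt, tools, mode, ruleTemplate = null) {
  const rule = ruleTemplate ? ` --rule ${shQuote(ruleTemplate)}` : "";
  return tools.map((tool) => ({
    tool,
    command: `maestro delegate ${shQuote(prompt)} --to ${tool} --mode ${mode}${rule}`,
  }));
}
```

Each returned `command` would go into its own `Bash({ command, run_in_background: true })` call, with all calls issued in the same message.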
+
+ ### Step 7: Collect Results
+
+ As each background callback arrives:
+ 1. Extract exec ID from output (`[MAESTRO_EXEC_ID=...]`)
+ 2. Run `maestro delegate output <id>` to get full result
+ 3. Write raw output to `per-tool/{tool}-output.md`
+
+ **Wait until ALL callbacks have arrived before proceeding.**
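This sketch covers the Step 7 bookkeeping: it pulls the ID out of the `[MAESTRO_EXEC_ID=...]` marker and tracks which tools are still pending. `FanOutCollector` is hypothetical. Treating a callback without an exec ID as a failed tool is an assumption. The status strings map onto W001 and E004 from the error table.

```javascript
// Pulls the exec ID out of a callback's output; null when the marker is missing.
function extractExecId(output) {
  const m = /\[MAESTRO_EXEC_ID=([^\]\s]+)\]/.exec(output);
  return m ? m[1] : null;
}

// Hypothetical tracker for the fan-out: one entry per tool, filled as callbacks arrive.
class FanOutCollector {
  constructor(tools) {
    this.pending = new Set(tools);
    this.results = new Map(); // tool -> { execId, ok }
  }

  onCallback(tool, output) {
    const execId = extractExecId(output);
    // Assumption: a callback without an exec ID means that tool failed.
    this.results.set(tool, { execId, ok: execId !== null });
    this.pending.delete(tool);
  }

  done() {
    return this.pending.size === 0; // proceed to Step 8 only when true
  }

  status() {
    if (!this.done()) return "waiting";
    const okCount = [...this.results.values()].filter((r) => r.ok).length;
    if (okCount === 0) return "E004"; // all delegates failed: abort
    if (okCount < this.results.size) return "W001"; // continue with remaining tools
    return "ok";
  }
}
```

For each successful tool, the exec ID would then be passed to `maestro delegate output <id>`, and the result written to `per-tool/{tool}-output.md`.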
+
+ ### Step 8: Cross-Verify
+
+ Read all `per-tool/{tool}-output.md` files. Compare findings across tools:
+
+ For each finding, classify:
+ - **[CONSENSUS]**: 2+ tools agree on same finding/recommendation
+ - **[CONFLICT]**: Tools disagree on approach or assessment
+ - **[UNIQUE]**: Finding from only one tool
+
+ Compute `consensus_level = (consensus_count / total_findings) * 100`.
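This sketch implements the Step 8 classification and the `consensus_level` formula. It assumes findings have already been matched across tools and reduced to a shared `key` plus a `position`; in practice that matching is the hard part. The sketch also adopts one reading of the rules: a finding is `[CONSENSUS]` only when every tool that reports it takes the same position, so a 2-vs-1 split counts as `[CONFLICT]`. The W002/W003 thresholds come from the error table. Rounding the level to an integer is an assumption.

```javascript
// Hypothetical input: { tool: [{ key, position }] }. `key` names the topic of a
// finding (cross-tool matching is assumed to be done already); `position` is the
// tool's stance on it.
function crossVerify(perTool) {
  const byKey = new Map(); // key -> Map(tool -> position)
  for (const [tool, findings] of Object.entries(perTool)) {
    for (const { key, position } of findings) {
      if (!byKey.has(key)) byKey.set(key, new Map());
      byKey.get(key).set(tool, position);
    }
  }

  const classified = [];
  for (const [key, stances] of byKey) {
    const distinct = new Set(stances.values()).size;
    let label = "CONFLICT"; // reporters disagree (includes 2-vs-1 splits)
    if (stances.size === 1) label = "UNIQUE";
    else if (distinct === 1) label = "CONSENSUS"; // 2+ tools, same position
    classified.push({ key, label, tools: [...stances.keys()] });
  }

  const total = classified.length;
  const count = (l) => classified.filter((f) => f.label === l).length;
  const consensusLevel = total ? Math.round((count("CONSENSUS") / total) * 100) : 0;
  const warnings = [];
  if (total && count("CONFLICT") / total > 0.5) warnings.push("W002");
  if (consensusLevel < 40) warnings.push("W003");
  return { classified, consensusLevel, warnings };
}
```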
+
+ ### Step 9: Synthesize Outputs
+
+ Resolve conflicts via evidence-weighted voting:
+ - Higher confidence tool's position wins
+ - More specific evidence (file:line refs) wins over general statements
+ - If tied: mark as `[SUGGESTED]`
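This sketch implements the Step 9 evidence-weighted voting. The text does not say which rule takes precedence, so this version applies them in the order listed: reported confidence first, then the number of `file:line` references. An exact tie on both yields `[SUGGESTED]`. The side shape `{ tool, position, confidence, refs }` and `resolveConflict` are hypothetical.

```javascript
// Hypothetical shape for one side of a conflict:
// { tool, position, confidence: 0-100, refs: ["src/auth.ts:42", ...] }
const FILE_LINE = /^[^\s:]+:\d+$/; // counts only concrete file:line references

function evidenceKey(side) {
  return [side.confidence, side.refs.filter((r) => FILE_LINE.test(r)).length];
}

function resolveConflict(sides) {
  // Rank by confidence first, then by number of file:line references.
  const ranked = [...sides].sort((a, b) => {
    const [ca, ra] = evidenceKey(a);
    const [cb, rb] = evidenceKey(b);
    return cb - ca || rb - ra;
  });
  const [best, next] = ranked;
  if (next) {
    const [c1, r1] = evidenceKey(best);
    const [c2, r2] = evidenceKey(next);
    if (c1 === c2 && r1 === r2) {
      return { status: "SUGGESTED", candidates: ranked.map((s) => s.tool) };
    }
  }
  return { status: "RESOLVED", winner: best.tool, position: best.position };
}
```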
+
+ Generate three output files:
+
+ #### collab-report.md
+
+ ```markdown
+ # Multi-CLI Collaboration Report — {requirement}
+
+ ## Summary
+ - Tools: {tool_list}
+ - Consensus level: {N}%
+ - Key finding: {top finding}
+
+ ## Consensus Findings
+ {findings agreed by 2+ tools}
+
+ ## Resolved Conflicts
+ {conflicts resolved with rationale and winning tool}
+
+ ## Unresolved Items
+ {items requiring human judgment}
+
+ ## Unique Insights
+ {valuable unique findings with source tool attribution}
+
+ ## Recommendations
+ {prioritized, merged recommendations}
+
+ ## Per-Tool Confidence
+ | Tool | Confidence | Key Strength |
+ |------|-----------|--------------|
+ ```
+
+ #### context.md (standard downstream format)
+
+ ```markdown
+ # Context: {requirement}
+
+ **Date**: {date}
+ **Mode**: collab ({tool_list})
+ **Consensus Level**: {N}%
+
+ ## Decisions
+
+ ### Decision N: {TITLE}
+ - **Context**: {what and why}
+ - **Options**: 1. {opt1} 2. {opt2}
+ - **Chosen**: {selected}
+ - **Reason**: {rationale — which tools agreed/disagreed}
+
+ ## Constraints
+
+ ### Locked
+ {[CONSENSUS] items — treat as confirmed decisions}
+
+ ### Free
+ {[UNIQUE] items with strong evidence — implementer discretion}
+
+ ### Deferred
+ {[UNRESOLVED] conflicts — require human judgment}
+
+ ## Code Context
+ {file:line references from per-tool findings}
+ ```
+
+ #### conclusions.json
+
+ ```json
+ {
+ "session_id": "{sessionId}",
+ "subject": "{requirement}",
+ "mode": "collab",
+ "tools": ["gemini", "claude", "codex"],
+ "consensus_level": 85,
+ "recommendation": "Go|No-Go|Conditional",
+ "confidence": "high|medium|low",
+ "dimensions": [
+ { "name": "gemini", "score": 80, "findings": "...", "recommendations": "..." }
+ ],
+ "decisions": [
+ { "title": "...", "classification": "locked|free|deferred", "source_tools": [], "rationale": "..." }
+ ],
+ "timestamp": "<ISO>"
+ }
+ ```
+
+ ### Step 10: Register Artifact
+
+ Append to `.workflow/state.json`:
+
+ ```json
+ {
+ "id": "CLB-{next_id}",
+ "type": "collab",
+ "milestone": "{current_milestone}",
+ "phase": null,
+ "scope": "adhoc",
+ "path": "scratch/{YYYYMMDD}-collab-{slug}",
+ "status": "completed",
+ "depends_on": null,
+ "harvested": false,
+ "created_at": "<ISO>",
+ "completed_at": "<ISO>"
+ }
+ ```
+
+ ### Step 11: Display Summary
+
+ ```
+ ============================================================
+ MULTI-CLI COLLABORATION COMPLETE
+ ============================================================
+ Requirement: {requirement}
+ Tools: {tool_list}
+ Consensus Level: {N}%
+
+ Per-Tool:
+ gemini: completed (confidence: {N}%)
+ claude: completed (confidence: {N}%)
+ codex: completed (confidence: {N}%)
+
+ Artifact: CLB-{id}
+ Output: {outputDir}/
+
+ Next steps:
+ /maestro-analyze "{topic}" — Deep feasibility analysis
+ /maestro-plan "{phase} --dir {dir}" — Plan from collab conclusions
+ /maestro-brainstorm "{topic}" — Expand with multi-role brainstorm
+ ============================================================
+ ```
+
+ </execution>
+
+ <error_codes>
+
+ | Code | Severity | Condition | Recovery |
+ |------|----------|-----------|----------|
+ | E001 | error | Requirement argument missing | Prompt for requirement |
+ | E002 | error | Fewer than 2 CLI tools eligible | Check cli-tools.json, enable more tools |
+ | E003 | error | Specified tool not found/enabled | Show available tools |
+ | E004 | error | All delegates failed | Abort with per-tool error details |
+ | W001 | warning | One tool failed | Continue with remaining tools |
+ | W002 | warning | >50% conflicts in cross-verify | Highlight in report, recommend manual review |
+ | W003 | warning | Low consensus level (<40%) | Flag in summary |
+
+ </error_codes>
+
+ <success_criteria>
+ - [ ] Available tools discovered from cli-tools.json with eligibility filtering
+ - [ ] Plan presented via AskUserQuestion with tool modification option (unless -y)
+ - [ ] All delegates launched in parallel via Bash(run_in_background: true)
+ - [ ] Execution stopped after launch — waited for all callbacks
+ - [ ] Per-tool outputs written to per-tool/{tool}-output.md
+ - [ ] Cross-verification: consensus/conflict/unique classification complete
+ - [ ] collab-report.md produced with merged findings
+ - [ ] context.md produced in Locked/Free/Deferred format (downstream compatible)
+ - [ ] conclusions.json produced (plan fast-track compatible)
+ - [ ] CLB artifact registered in state.json
+ - [ ] Partial degradation: continued if 1+ tools succeeded
+ </success_criteria>
@@ -143,8 +143,12 @@ Follow workflow plan.md § "Revise Mode" and § "Check Mode" respectively. These
  - [ ] Every task has `read_first[]` with at least the file being modified + source of truth files
  - [ ] Every task has `convergence.criteria[]` with grep-verifiable conditions (no subjective language)
  - [ ] Every task `action` and `implementation` contain concrete values (no "align X with Y")
+ - [ ] Plan confidence scored in P4 with 5-dimension factor model
+ - [ ] Plan readiness gate checked before P4.5 collision detection
+ - [ ] Pressure pass completed on highest-complexity task
+ - [ ] plan.json includes confidence section (overall, dimensions, pressure_pass)
  - [ ] Collision detection executed against same-milestone plans (non-blocking)
  - [ ] Plan-checker passed (or minor issues acknowledged)
- - [ ] User confirmation captured (execute/modify/cancel)
+ - [ ] User confirmation captured (execute/modify/cancel) with confidence displayed
  - [ ] Artifact registered in state.json with correct scope/milestone/phase/depends_on
  </success_criteria>
@@ -326,10 +326,28 @@ For quality-gate decisions (post-verify, post-business-test, post-review, post-t
  | post-review | `{artifact_dir}/review.json` |
  | post-test | `{artifact_dir}/uat.md`, `{artifact_dir}/.tests/test-results.json` |
 
+ **Confidence-aware evaluation**:
+
+ Before delegating, check whether the artifact contains a confidence section (added by downstream commands):
+ - `verification.json` → `confidence.overall` (from maestro-verify)
+ - `report.json` → `confidence.overall` (from quality-auto-test)
+ - `review.json` → may contain dimension confidence (from quality-review)
+ - `uat.md` → confidence summary section (from quality-test)
+
+ If confidence data is found, include it in the delegate prompt as an additional signal:
+ ```
+ Existing confidence assessment: overall {overall}%, weakest dimension: {weakest} ({score}%)
+ ```
+
+ **Confidence-based verdict bias**: When artifact confidence is available:
+ - confidence < 60% → bias toward "fix" even if surface status looks clean (hidden quality gaps)
+ - confidence 60-95% → use delegate verdict as-is
+ - confidence > 95% → bias toward "proceed" (strong evidence of quality)
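This sketch reads the artifact confidence and maps it to the bias bands above. It handles only the JSON artifacts whose field is named in the list (`confidence.overall` in `verification.json` and `report.json`), and it assumes the value is a 0-100 percentage. Parsing the `review.json` dimensions or the `uat.md` summary is left out. Both function names are hypothetical.

```javascript
// Reads confidence.overall from a JSON artifact's text (verification.json or
// report.json). Returns null if the text is not JSON or the field is absent.
// Assumption: the value is a 0-100 percentage.
function readOverallConfidence(jsonText) {
  try {
    const overall = JSON.parse(jsonText)?.confidence?.overall;
    return typeof overall === "number" ? overall : null;
  } catch {
    return null;
  }
}

// Bias bands: < 60 -> "fix", 60-95 -> no bias, > 95 -> "proceed".
// null means "use the delegate verdict as-is".
function verdictBias(confidence) {
  if (confidence === null) return null;
  if (confidence < 60) return "fix";
  if (confidence > 95) return "proceed";
  return null;
}
```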
+
  ```
  Bash({
  command: `maestro delegate "PURPOSE: Evaluate the ${meta.decision} quality-gate results and decide whether they pass
- TASK: Read result files | Analyze pass/fail status | Assess issue severity | Recommend next steps
+ TASK: Read result files | Analyze pass/fail status | Assess issue severity | Check confidence scores | Recommend next steps
  MODE: analysis
  CONTEXT: @${result_files}
  EXPECTED: Output strictly in the following format:
@@ -338,8 +356,10 @@ STATUS: proceed | fix | escalate
  REASON: one-sentence explanation
  GAP_SUMMARY: specific problem description (fill in only for fix/escalate; passed to quality-debug)
  CONFIDENCE: high | medium | low
+ CONFIDENCE_SCORE: 0-100 (read the confidence score from the result files; estimate if absent)
+ WEAKEST_DIMENSION: name of the weakest dimension
  ---END---
- CONSTRAINTS: Evaluate only, do not modify | STATUS must be exactly one of the three | If retry ${meta.retry_count}/${meta.max_retries} has hit the limit and problems remain, STATUS must be escalate" --role analyze --mode analysis`,
+ CONSTRAINTS: Evaluate only, do not modify | STATUS must be exactly one of the three | Confidence < 60% leans toward fix | If retry ${meta.retry_count}/${meta.max_retries} has hit the limit and problems remain, STATUS must be escalate" --role analyze --mode analysis`,
  run_in_background: true
  })
  STOP — wait for callback.
@@ -352,12 +372,20 @@ STOP — wait for callback.
  Parse structured response:
  ```
  Extract between ---VERDICT--- and ---END---:
- verdict.status = "proceed" | "fix" | "escalate"
- verdict.reason = string
- verdict.gap_summary = string (context for quality-debug)
- verdict.confidence = "high" | "medium" | "low"
+ verdict.status = "proceed" | "fix" | "escalate"
+ verdict.reason = string
+ verdict.gap_summary = string (context for quality-debug)
+ verdict.confidence = "high" | "medium" | "low"
+ verdict.confidence_score = 0-100 (numeric, from artifact or estimated)
+ verdict.weakest_dimension = string (weakest confidence dimension)
 
  If parse fails → fallback: treat as "fix" with generic gap_summary
+
+ Confidence-based verdict adjustment (after parse, before apply):
+ If verdict.confidence_score < 60 AND verdict.status == "proceed":
+ → Override to "fix", reason += " (insufficient confidence: {score}%, {weakest_dimension} needs strengthening)"
+ If verdict.confidence_score > 95 AND verdict.status == "fix" AND retry_count > 0:
+ → Suggest "proceed" override, reason += " (sufficient confidence: {score}%, recommend proceeding)"
  ```
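The parse-then-adjust sequence is sketched below in JavaScript. The sketch extracts the `KEY: value` lines between `---VERDICT---` and `---END---`, falls back to `"fix"` when the block or a valid STATUS is missing, and then applies the two confidence rules. `parseVerdict`, `adjustVerdict`, the fallback `gap_summary` text, and the `suggested_status` field are all hypothetical. `suggested_status` exists because the > 95 rule only suggests proceeding.

```javascript
// Fallback used when the block or a valid STATUS is missing (gap text is a placeholder).
const FALLBACK_VERDICT = {
  status: "fix",
  reason: "delegate output could not be parsed",
  gap_summary: "generic: re-check the quality-gate result files",
  confidence: "low",
  confidence_score: null,
  weakest_dimension: null,
};

function parseVerdict(text) {
  const block = /---VERDICT---([\s\S]*?)---END---/.exec(text);
  if (!block) return { ...FALLBACK_VERDICT };
  // [ \t]* rather than \s* so an empty field cannot swallow the next line.
  const field = (name) => {
    const m = new RegExp(`^[ \\t]*${name}:[ \\t]*(.*)$`, "m").exec(block[1]);
    return m ? m[1].trim() : null;
  };
  const status = field("STATUS");
  if (!["proceed", "fix", "escalate"].includes(status)) return { ...FALLBACK_VERDICT };
  const score = Number.parseInt(field("CONFIDENCE_SCORE") ?? "", 10);
  return {
    status,
    reason: field("REASON") ?? "",
    gap_summary: field("GAP_SUMMARY") ?? "",
    confidence: field("CONFIDENCE"),
    confidence_score: Number.isNaN(score) ? null : score,
    weakest_dimension: field("WEAKEST_DIMENSION"),
  };
}

// Confidence-based adjustment, applied after parsing and before the verdict is used.
function adjustVerdict(verdict, retryCount) {
  const out = { ...verdict };
  const score = verdict.confidence_score;
  if (score === null) return out;
  if (score < 60 && verdict.status === "proceed") {
    out.status = "fix";
    out.reason += ` (insufficient confidence: ${score}%, ${verdict.weakest_dimension} needs strengthening)`;
  } else if (score > 95 && verdict.status === "fix" && retryCount > 0) {
    out.suggested_status = "proceed"; // a suggestion only; status is left unchanged
    out.reason += ` (sufficient confidence: ${score}%, recommend proceeding)`;
  }
  return out;
}
```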
 
  **Apply verdict:**
@@ -503,9 +531,11 @@ End.
  - [ ] Full quality pipeline generated: verify → business-test → review → test-gen → test
  - [ ] Decision nodes inserted after: post-verify, post-business-test, post-review, post-test, post-milestone
  - [ ] Quality-gate decisions delegated via `maestro delegate --role analyze --mode analysis`
- - [ ] Delegate verdict parsed: STATUS / REASON / GAP_SUMMARY / CONFIDENCE
- - [ ] `-y` mode: auto-follow delegate verdict without user confirmation
- - [ ] Interactive mode: display recommendation + AskUserQuestion with override options
+ - [ ] Delegate verdict parsed: STATUS / REASON / GAP_SUMMARY / CONFIDENCE / CONFIDENCE_SCORE / WEAKEST_DIMENSION
+ - [ ] Confidence-based verdict adjustment applied (< 60% bias fix, > 95% bias proceed)
+ - [ ] Artifact confidence sections read when available (verification.json, report.json, uat.md)
+ - [ ] `-y` mode: auto-follow adjusted verdict without user confirmation
+ - [ ] Interactive mode: display recommendation with confidence score + AskUserQuestion with override options
  - [ ] Delegate failure fallback: treat as "fix" verdict
  - [ ] gap_summary from delegate passed to quality-debug as context
  - [ ] Fix-loop templates applied per decision type with retry_count increment
@@ -116,6 +116,10 @@ Append to state.json.artifacts[]:
  - [ ] Tests executed progressively (L0→L3) with fail-fast on critical
  - [ ] Iteration engine ran (inner: test_defect fix, outer: strategy adjust)
  - [ ] state.json, report.json, reflection-log.md written
+ - [ ] Test confidence scored per iteration (Step 7.5) with 5-dimension factor model
+ - [ ] Convergence check includes confidence >= 60% alongside pass_rate threshold
+ - [ ] Pressure pass completed on highest-pass-rate layer before completion
+ - [ ] report.json includes confidence section
  - [ ] index.json updated with auto_test section
  - [ ] If spec source: traceability matrix built, traceability.md written
  - [ ] If failures: issues auto-created in issues.jsonl
@@ -115,7 +115,11 @@ If user confirms, invoke `Skill({ skill: "spec-add", args: "<category> <content>
  - [ ] evidence.ndjson written with structured NDJSON entries
  - [ ] understanding.md tracks evolving understanding per cluster
  - [ ] Root causes collected with fix_direction and affected_files
+ - [ ] Multi-factor confidence scored per gap (Step 7.0) replacing simple high/medium/low
+ - [ ] Readiness gate checked before ROOT CAUSE declaration
+ - [ ] Pressure pass completed on confirmed hypothesis
+ - [ ] Confidence table appended to understanding.md
  - [ ] If --from-uat: uat.md gaps updated with diagnosis artifacts
- - [ ] Results unified into diagnosis summary
+ - [ ] Results unified into diagnosis summary with confidence section
  - [ ] Next step routed (plan --gaps + execute if fix needed, verify if fix applied, resume if inconclusive)
  </success_criteria>
@@ -95,6 +95,10 @@ Append to state.json.artifacts[]:
  - [ ] Severity inferred from natural language (never asked)
  - [ ] Batched writes: on issue, every 5 passes, or completion
  - [ ] test-results.json and coverage-report.json written
+ - [ ] UAT confidence scored with 4-dimension factor model
+ - [ ] Readiness gate checked before final report
+ - [ ] Pressure pass completed if > 80% pass rate
+ - [ ] Confidence summary appended to uat.md
  - [ ] index.json uat fields updated
  - [ ] If issues: parallel debug agents spawned per gap cluster
  - [ ] Gaps updated with root_cause, fix_direction, affected_files
@@ -139,9 +139,15 @@ After each barrier skill completes, read its artifacts and update `state.context
  }
  ```
 
- 7. **Initialize plan tracking** (dual-track: status.json + update_plan):
+ 7. **Initialize tracking** (goal constraint + plan sub-items):
 
  ```
+ // Goal = outer constraint — ensures entire chain completes
+ functions.create_goal({
+ objective: `Maestro ${chain_name}: ${steps.length} steps [${steps.map(s => s.skill).join(' → ')}]`
+ })
+
+ // Plan = inner tracking — sub-step progress
  functions.update_plan({
  plan: steps.map((step, i) => ({
  id: `step-${i}`,
@@ -233,9 +239,12 @@ Object with all fields required: `status` ("completed"|"failed"), `skill_call` (
 
  ### Phase 3: Completion Report
 
- Finalize dual tracking:
+ Finalize tracking:
  - status.json: `state.status = 'completed'`
  - update_plan: all steps → `"completed"` (skipped steps also marked completed)
+ - **update_goal**: `functions.update_goal({ status: "complete" })` — release goal constraint
+
+ **Note**: Abort path (Phase 2 step 7) does NOT call `update_goal` — goal stays running for `--continue` resume.
 
  ```
  === COORDINATE COMPLETE ===
@@ -112,6 +112,7 @@ id,title,description,dimension,analysis_type,deps,context_from,wave,status,findi
  | `findings` | Output | Key findings summary (max 500 chars) |
  | `score` | Output | Dimension score (0-100 for scoring tasks, empty for explore/decide) |
  | `recommendations` | Output | Dimension-specific recommendations |
+ | `confidence_score` | Output | Per-dimension confidence score (0-100) from factor-based assessment |
  | `error` | Output | Error message if failed |
 
  ### Per-Wave CSV (Temporary)
@@ -356,6 +357,10 @@ Write wave CSV with `prev_context`, execute `spawn_agents_on_csv` for synthesis
  {prioritized recommendations with rationale}
  ```
 
+ 3b. **Confidence scoring** (full mode only):
+
+ Factors (weights): findings_depth(.30), evidence_strength(.25), coverage_breadth(.20), user_validation(.15, 0 in CSV mode), consistency(.10). Overall = average of dimension scores. Thresholds: <60% deeper | 60-80% optional | 80-95% converging | >95% converge. Append confidence summary to `analysis.md` and `conclusions.json`.
+
  4. Build `context.md` (both modes):
 
  ```markdown
@@ -479,6 +484,8 @@ echo '{"ts":"<ISO>","worker":"{id}","type":"exploration_finding","data":{"file":
  - [ ] analysis.md + conclusions.json produced (full mode only)
  - [ ] Deferred items auto-created as issues
  - [ ] Artifact registered in state.json
+ - [ ] Confidence scored per dimension with factor-based model (full mode only)
+ - [ ] Confidence summary appended to analysis.md and conclusions.json
  - [ ] Final outputs copied to scratchDir
  - [ ] discoveries.ndjson append-only throughout
  </success_criteria>
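This sketch covers the arithmetic of the confidence-scoring step (3b) in the hunk above. Each dimension's confidence is a weighted sum of 0-100 factor scores, the overall score is the mean of the dimension scores, and that result maps to a threshold band. Two readings are assumptions. First, "0 in CSV mode" is taken to mean that the user_validation factor is scored 0, not that its weight is redistributed. Second, a score of exactly 80 or 95 is placed in the "converging" band.

```javascript
// Factor weights from step 3b (they sum to 1.0).
const FACTOR_WEIGHTS = {
  findings_depth: 0.3,
  evidence_strength: 0.25,
  coverage_breadth: 0.2,
  user_validation: 0.15,
  consistency: 0.1,
};

// Weighted sum of 0-100 factor scores for one dimension. Assumption: in CSV mode
// the user_validation factor is scored 0 and the other weights are not rescaled.
function dimensionConfidence(factors, { csvMode = false } = {}) {
  let total = 0;
  for (const [name, weight] of Object.entries(FACTOR_WEIGHTS)) {
    const value = (csvMode && name === "user_validation") ? 0 : (factors[name] ?? 0);
    total += weight * value;
  }
  return total;
}

// Overall = plain average of the dimension scores.
function overallConfidence(dimensionScores) {
  if (dimensionScores.length === 0) return null;
  return dimensionScores.reduce((sum, s) => sum + s, 0) / dimensionScores.length;
}

// Threshold bands; placing exactly 80 and 95 in "converging" is an assumption.
function confidenceBand(score) {
  if (score < 60) return "deeper";
  if (score < 80) return "optional";
  if (score <= 95) return "converging";
  return "converge";
}
```

Under this reading, a CSV-mode dimension tops out at 85. It can therefore reach "converging" but never the > 95% "converge" band. If that is not the intent, the remaining weights would need renormalizing instead.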
@@ -364,9 +364,15 @@ spawn_agents_on_csv({
  - Skill: maestro-roadmap --mode full -- Generate full spec package from brainstorm
  ```
 
- 4. Copy artifacts to output `.brainstorming/` directory (phase mode or scratch mode target)
- 5. Update phase `index.json` with brainstorm status (if phase mode)
- 6. **Next-Step Routing** (skip if AUTO_YES default to first applicable):
+ 4. **Brainstorm confidence scoring**:
+
+ Dimensions (5): role_coverage, cross_role_consistency, feature_completeness, spec_quality, design_feasibility. Factors (weights): analysis_depth(.30), evidence_strength(.25), coverage_breadth(.20), user_validation(.15, 0 if --yes), consistency(.10). Append confidence summary to `synthesis-changelog.md`.
+
+ **Conflict-based quality gate**: >3 `[UNRESOLVED]` conflicts → warn before artifact registration.
+
+ 5. Copy artifacts to output `.brainstorming/` directory (phase mode or scratch mode target)
+ 6. Update phase `index.json` with brainstorm status (if phase mode)
+ 7. **Next-Step Routing** (skip if AUTO_YES — default to first applicable):
  - Detect UI features: scan feature-index.json for UI/frontend-related features (keywords: ui, interface, page, component, dashboard, form, layout)
  - `request_user_input` (include UI Design option only when UI features detected):
  ```json
@@ -439,4 +445,7 @@ echo '{"ts":"<ISO>","worker":"{id}","type":"terminology","data":{"term":"CRDT","
  - [ ] context.md produced with full brainstorm report
  - [ ] Artifacts copied to target .brainstorming/ directory
  - [ ] discoveries.ndjson append-only throughout
+ - [ ] Confidence scored per role and after cross-role synthesis
+ - [ ] Conflict-based quality gate evaluated (> 3 unresolved = warning)
+ - [ ] Confidence summary appended to synthesis-changelog.md
  </success_criteria>
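The brainstorm conflict-based quality gate reduces to counting markers. This minimal sketch assumes that unresolved conflicts appear literally as `[UNRESOLVED]` in the synthesis text being checked. `conflictGate` is hypothetical. The text only says to warn, so the sketch returns a flag rather than aborting registration.

```javascript
// Counts literal [UNRESOLVED] markers; more than `limit` (3) triggers the warning.
function conflictGate(synthesisText, limit = 3) {
  const unresolved = (synthesisText.match(/\[UNRESOLVED\]/g) || []).length;
  return { unresolved, warn: unresolved > limit };
}
```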