npm - maestro-flow - Versions diffs - 0.3.39 → 0.3.41 - Mend

maestro-flow 0.3.39 → 0.3.41

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25)

package/.claude/commands/maestro-analyze.md +4 -0
package/.claude/commands/maestro-brainstorm.md +4 -0
package/.claude/commands/maestro-collab.md +333 -0
package/.claude/commands/maestro-plan.md +5 -1
package/.claude/commands/maestro-ralph.md +39 -9
package/.claude/commands/quality-auto-test.md +4 -0
package/.claude/commands/quality-debug.md +5 -1
package/.claude/commands/quality-test.md +4 -0
package/.codex/skills/maestro/SKILL.md +11 -2
package/.codex/skills/maestro-analyze/SKILL.md +7 -0
package/.codex/skills/maestro-brainstorm/SKILL.md +12 -3
package/.codex/skills/maestro-collab/SKILL.md +631 -0
package/.codex/skills/maestro-plan/SKILL.md +10 -0
package/.codex/skills/maestro-player/SKILL.md +10 -0
package/.codex/skills/maestro-ralph/SKILL.md +32 -9
package/.codex/skills/quality-auto-test/SKILL.md +6 -0
package/.codex/skills/quality-debug/SKILL.md +7 -1
package/.codex/skills/quality-test/SKILL.md +9 -0
package/package.json +1 -1
package/workflows/analyze.md +24 -2
package/workflows/auto-test.md +12 -0
package/workflows/brainstorm.md +11 -1
package/workflows/debug.md +13 -4
package/workflows/plan.md +14 -4
package/workflows/test.md +10 -0

package/.claude/commands/maestro-analyze.md CHANGED

@@ -108,6 +108,10 @@ Full mode:
 - [ ] analysis.md written with all 6 dimensions scored with evidence
 - [ ] conclusions.json created with recommendations and decision trail
 - [ ] Intent Coverage tracked and verified (no unresolved ❌ items)
+- [ ] Confidence tracking initialized (Step 4.6) and re-scored each round (Step 5.8)
+- [ ] Readiness gate checked before synthesis (Step 5.10)
+- [ ] Pressure pass completed ≥ 1 time before Step 6
+- [ ] Confidence summary with factor decomposition written to analysis.md
 Gaps mode:
 - [ ] Issues loaded from issues.jsonl (all open/registered, or single ISS-ID)

package/.claude/commands/maestro-brainstorm.md CHANGED

@@ -91,6 +91,10 @@ Single role mode:
 - [ ] Final Output Gate passed (Step 5.5) or `--yes` bypassed
 - [ ] All user decisions captured with Decision Recording Protocol
 - [ ] Session metadata updated with completion status
+- [ ] Confidence scored per role completion and after cross-role analysis
+- [ ] Readiness gate checked before spec generation
+- [ ] Pressure pass completed on at least 1 feature spec
+- [ ] Confidence summary appended to synthesis-changelog.md
 **Single role mode**:
 - [ ] analysis.md written to `{output_dir}/{role}/`

package/.claude/commands/maestro-collab.md ADDED

@@ -0,0 +1,333 @@
+---
+name: maestro-collab
+description: Multi-CLI collaborative analysis -- fan-out to multiple CLI tools, cross-verify, synthesize
+argument-hint: "\"<requirement>\" [--tools gemini,qwen,claude] [--mode analysis|write] [--rule <template>] [-y]"
+allowed-tools:
+  - Read
+  - Write
+  - Bash
+  - Glob
+  - Grep
+  - Agent
+  - AskUserQuestion
+---
+<purpose>
+Multi-CLI collaboration: fan-out the same requirement to multiple CLI tools in parallel, cross-verify outputs for consensus/conflicts, then synthesize into a unified report with standard downstream artifacts (context.md + conclusions.json).
+Each CLI tool independently analyzes the requirement. Results are compared and merged via evidence-weighted synthesis.
+</purpose>
+<context>
+$ARGUMENTS — requirement text and optional flags.
+```bash
+/maestro-collab "analyze the auth module for security vulnerabilities"
+/maestro-collab "design a caching strategy" --tools gemini,qwen,claude
+/maestro-collab -y "review error handling patterns"
+/maestro-collab "refactor user service" --mode write --tools gemini,claude
+```
+**Flags**:
+- `--tools <list>`: Comma-separated CLI tools (default: auto-select first 3 enabled)
+- `--mode analysis|write`: Delegate mode (default: analysis)
+- `--rule <template>`: Shared rule template for all delegates
+- `-y` / `--yes`: Skip plan confirmation
+**Output**: `.workflow/scratch/{YYYYMMDD}-collab-{slug}/`
+- `collab-report.md` — full collaboration report
+- `context.md` — standard Locked/Free/Deferred decisions (plan/analyze compatible)
+- `conclusions.json` — structured conclusions (plan fast-track compatible)
+- `per-tool/{tool}-output.md` — raw per-tool outputs
+</context>
+<execution>
+### Step 1: Parse Arguments
+Extract from `$ARGUMENTS`:
+- `requirement` — remaining text after flag removal (error if empty)
+- `--tools` → `selectedTools` (comma-split)
+- `--mode` → `delegateMode` (default: `analysis`)
+- `--rule` → `ruleTemplate`
+- `-y` / `--yes` → `autoYes`
+### Step 2: Discover Available CLI Tools
+```bash
+Bash("maestro tools list --json 2>/dev/null || cat ~/.maestro/cli-tools.json")
+```
+Parse tool entries. Build eligible list:
+- `enabled == true`
+- If `--mode write`: exclude `type == "api-endpoint"`
+Auto-select (when `--tools` omitted): first 3 eligible in config order.
+Validate: minimum 2 eligible tools (abort if fewer).
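Step 2's filtering, auto-selection, and validation can be sketched in TypeScript. This is a minimal sketch, not the command's implementation: it assumes the tool list has already been parsed from JSON into objects with `name`, `enabled`, and optional `type` fields (`name` and the `ToolEntry` type are assumptions; only `enabled` and `type` are named in the step), and it maps failures onto the E002/E003 codes from the error table.

```typescript
// Assumed shape of one cli-tools.json entry; Step 2 only names `enabled` and `type`.
interface ToolEntry {
  name: string;
  enabled: boolean;
  type?: string;
}

type DelegateMode = "analysis" | "write";

// Returns the tools to fan out to, or throws with an error code from the table.
function selectTools(
  entries: ToolEntry[],
  mode: DelegateMode,
  requested?: string[],
): string[] {
  // Eligible: enabled, and not an API endpoint when delegates may write files.
  const eligible = entries
    .filter((t) => t.enabled && !(mode === "write" && t.type === "api-endpoint"))
    .map((t) => t.name);

  if (eligible.length < 2) {
    throw new Error(`E002: only ${eligible.length} eligible tool(s)`);
  }
  if (!requested || requested.length === 0) {
    // Auto-select: first 3 eligible tools in config order.
    return eligible.slice(0, 3);
  }
  const missing = requested.filter((n) => eligible.indexOf(n) < 0);
  if (missing.length > 0) {
    throw new Error(`E003: not found/enabled: ${missing.join(", ")}`);
  }
  if (requested.length < 2) {
    throw new Error("at least 2 tools are required");
  }
  return requested;
}
```

Checking eligibility before honoring `--tools` means a typo or a disabled tool surfaces as E003 instead of silently shrinking the fan-out.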
+### Step 3: Present Collaboration Plan
+**(Skip if `-y`)**
+Display plan, then ask user:
+```
+============================================================
+  COLLABORATION PLAN
+============================================================
+  Requirement: {requirement}
+  Mode:        {delegateMode}
+  Rule:        {ruleTemplate || "none"}
+  Available CLI Tools (from cli-tools.json):
+    [✓] gemini    — gemini-3.1-pro-preview     [fullstack, frontend]
+    [✓] claude    — claude-sonnet-4-6          [fullstack]
+    [✓] codex     — gpt-5.5                    [fullstack, backend]
+    [ ] opencode  — (no model)                 [fullstack]
+  Selected: gemini, claude, codex (3 tools)
+  Pipeline:
+    1. Fan-out → parallel delegate to each tool
+    2. Cross-verification → consensus/conflict analysis
+    3. Synthesis → context.md + conclusions.json
+============================================================
+```
+Use `AskUserQuestion` with options:
+- **Execute** — proceed with selected tools
+- **Modify tool selection** — let user specify a different tool combination
+- **Cancel** — abort
+If **Modify tool selection**: ask the user which tools to use (show eligible list), validate ≥ 2, re-display plan.
+### Step 4: Setup Session
+```
+slug = requirement kebab-cased, max 40 chars
+outputDir = .workflow/scratch/{YYYYMMDD}-collab-{slug}/
+```
+Create `outputDir` + `outputDir/per-tool/`.
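The slug and output-path rules from Step 4 might look like this. The sketch assumes ASCII-only slugs (any run of other characters becomes a single dash) and uses a hypothetical `untitled` fallback when nothing survives, for example for a requirement written entirely in Chinese; neither choice is specified by the command.

```typescript
// Kebab-case the requirement and cap it at maxLen characters.
// Assumption: only ASCII letters and digits are kept; every other run becomes one dash.
function slugify(requirement: string, maxLen: number = 40): string {
  const slug = requirement
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, maxLen)
    .replace(/-+$/, ""); // truncation can leave a trailing dash
  return slug || "untitled"; // hypothetical fallback, e.g. for non-ASCII requirements
}

function pad2(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

// YYYYMMDD in local time.
function yyyymmdd(d: Date): string {
  return `${d.getFullYear()}${pad2(d.getMonth() + 1)}${pad2(d.getDate())}`;
}

function collabOutputDir(requirement: string, now: Date = new Date()): string {
  return `.workflow/scratch/${yyyymmdd(now)}-collab-${slugify(requirement)}/`;
}
```

For example, `collabOutputDir("review error handling", new Date(2025, 0, 5))` yields `.workflow/scratch/20250105-collab-review-error-handling/`.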
+### Step 5: Build Delegate Prompt
+Shared prompt for all tools:
+```
+PURPOSE: {requirement}; success = actionable findings with evidence
+TASK: {auto-decomposed into 3-5 specific verbs}
+MODE: {delegateMode}
+CONTEXT: @**/*
+EXPECTED: Structured findings with file:line references, confidence score (0-100), prioritized recommendations. Sections: ## Findings, ## Recommendations, ## Confidence
+CONSTRAINTS: {extracted from requirement}
+```
+### Step 6: Parallel Fan-Out
+Launch ALL delegate calls simultaneously using multiple `Bash(run_in_background: true)` in a **single message**:
+```
+// Launch all in ONE message — do NOT wait between calls
+Bash({
+  command: `maestro delegate "${prompt}" --to gemini --mode ${mode} ${rule}`,
+  run_in_background: true
+})
+Bash({
+  command: `maestro delegate "${prompt}" --to claude --mode ${mode} ${rule}`,
+  run_in_background: true
+})
+Bash({
+  command: `maestro delegate "${prompt}" --to codex --mode ${mode} ${rule}`,
+  run_in_background: true
+})
+```
+**After launching all calls → STOP immediately. Do not output anything. Wait for background completion callbacks.**
+### Step 7: Collect Results
+As each background callback arrives:
+1. Extract exec ID from output (`[MAESTRO_EXEC_ID=...]`)
+2. Run `maestro delegate output <id>` to get full result
+3. Write raw output to `per-tool/{tool}-output.md`
+**Wait until ALL callbacks have arrived before proceeding.**
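Step 7's bookkeeping can be sketched as below. It assumes the exec ID marker appears literally as `[MAESTRO_EXEC_ID=<id>]` with no whitespace inside the ID. `extractExecId`, `CallbackCollector`, and `outputPathFor` are hypothetical helpers, and the sketch only builds the `maestro delegate output <id>` command string rather than running it.

```typescript
// Assumes the marker appears literally as [MAESTRO_EXEC_ID=<id>] with no whitespace in <id>.
function extractExecId(callbackOutput: string): string | null {
  const m = /\[MAESTRO_EXEC_ID=([^\]\s]+)\]/.exec(callbackOutput);
  return m ? m[1] : null;
}

// Where a tool's raw output is written.
function outputPathFor(tool: string): string {
  return `per-tool/${tool}-output.md`;
}

// Tracks outstanding tools so cross-verification starts only after every callback.
class CallbackCollector {
  private pending: string[];
  readonly execIds: Record<string, string> = {};

  constructor(tools: string[]) {
    this.pending = tools.slice();
  }

  // Records one callback and returns the command that fetches the full result.
  record(tool: string, callbackOutput: string): string {
    const id = extractExecId(callbackOutput);
    if (id === null) {
      throw new Error(`no exec ID in callback from ${tool}`);
    }
    this.execIds[tool] = id;
    this.pending = this.pending.filter((t) => t !== tool);
    return `maestro delegate output ${id}`;
  }

  allArrived(): boolean {
    return this.pending.length === 0;
  }
}
```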
+### Step 8: Cross-Verify
+Read all `per-tool/{tool}-output.md` files. Compare findings across tools:
+For each finding, classify:
+- **[CONSENSUS]**: 2+ tools agree on same finding/recommendation
+- **[CONFLICT]**: Tools disagree on approach or assessment
+- **[UNIQUE]**: Finding from only one tool
+Compute `consensus_level = (consensus_count / total_findings) * 100`.
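A sketch of Step 8's classification and the `consensus_level` formula. Deciding whether two tools are describing "the same finding" is the hard part and is not shown: the sketch assumes each finding has already been normalized into a topic `key` and a short `position` (both hypothetical fields). Under that assumption, a topic reported by 2+ tools with one shared position is CONSENSUS, a topic with differing positions is CONFLICT, a single-tool topic is UNIQUE, and each topic counts once toward `total_findings`.

```typescript
// Assumption: findings are pre-normalized into a topic key and a short position.
interface Finding {
  tool: string;
  key: string;      // what the finding is about
  position: string; // what the tool concludes about it
}

type Label = "CONSENSUS" | "CONFLICT" | "UNIQUE";

function classify(findings: Finding[]): { labels: Record<string, Label>; consensusLevel: number } {
  // Group findings by topic key.
  const byKey: Record<string, Finding[]> = {};
  for (const f of findings) {
    if (!byKey[f.key]) byKey[f.key] = [];
    byKey[f.key].push(f);
  }

  const labels: Record<string, Label> = {};
  let consensusCount = 0;
  const keys = Object.keys(byKey);
  for (const key of keys) {
    // Distinct tools and distinct positions for this topic.
    const tools: string[] = [];
    const positions: string[] = [];
    for (const f of byKey[key]) {
      if (tools.indexOf(f.tool) < 0) tools.push(f.tool);
      if (positions.indexOf(f.position) < 0) positions.push(f.position);
    }
    let label: Label;
    if (tools.length < 2) label = "UNIQUE";
    else if (positions.length === 1) label = "CONSENSUS";
    else label = "CONFLICT";
    labels[key] = label;
    if (label === "CONSENSUS") consensusCount++;
  }

  // consensus_level = (consensus_count / total_findings) * 100, rounded.
  const consensusLevel = keys.length === 0 ? 0 : Math.round((consensusCount / keys.length) * 100);
  return { labels, consensusLevel };
}
```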
+### Step 9: Synthesize Outputs
+Resolve conflicts via evidence-weighted voting:
+- Higher confidence tool's position wins
+- More specific evidence (file:line refs) wins over general statements
+- If tied: mark as `[SUGGESTED]`
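The evidence-weighted voting rules can be read as a two-key ranking. This sketch assumes confidence is compared first and evidence specificity second (the order the rules are listed in), approximates "specific evidence" by counting `path.ext:line` references with a regex, and returns every top-ranked stance as a `[SUGGESTED]` candidate set on a tie. `Stance` and `resolveConflict` are hypothetical names.

```typescript
interface Stance {
  tool: string;
  position: string;
  confidence: number; // the 0-100 score from the tool's "## Confidence" section
  evidence: string;
}

type Resolution =
  | { kind: "resolved"; winner: Stance }
  | { kind: "suggested"; candidates: Stance[] };

// Rough proxy for specificity: count references like src/auth/login.ts:42.
function countFileLineRefs(text: string): number {
  const matches = text.match(/[\w./-]+\.\w+:\d+/g);
  return matches ? matches.length : 0;
}

function resolveConflict(stances: Stance[]): Resolution {
  if (stances.length === 0) throw new Error("no stances to resolve");
  // Higher confidence first; among equal confidence, more file:line refs first.
  const rank = (a: Stance, b: Stance): number =>
    b.confidence - a.confidence ||
    countFileLineRefs(b.evidence) - countFileLineRefs(a.evidence);
  const sorted = stances.slice().sort(rank);
  // Everything ranked equal to the leader is still in contention.
  const top = sorted.filter((s) => rank(s, sorted[0]) === 0);
  return top.length === 1
    ? { kind: "resolved", winner: top[0] }
    : { kind: "suggested", candidates: top };
}
```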
+Generate three output files:
+#### collab-report.md
+```markdown
+# Multi-CLI Collaboration Report — {requirement}
+## Summary
+- Tools: {tool_list}
+- Consensus level: {N}%
+- Key finding: {top finding}
+## Consensus Findings
+{findings agreed by 2+ tools}
+## Resolved Conflicts
+{conflicts resolved with rationale and winning tool}
+## Unresolved Items
+{items requiring human judgment}
+## Unique Insights
+{valuable unique findings with source tool attribution}
+## Recommendations
+{prioritized, merged recommendations}
+## Per-Tool Confidence
+| Tool | Confidence | Key Strength |
+|------|-----------|--------------|
+```
+#### context.md (standard downstream format)
+```markdown
+# Context: {requirement}
+**Date**: {date}
+**Mode**: collab ({tool_list})
+**Consensus Level**: {N}%
+## Decisions
+### Decision N: {TITLE}
+- **Context**: {what and why}
+- **Options**: 1. {opt1} 2. {opt2}
+- **Chosen**: {selected}
+- **Reason**: {rationale — which tools agreed/disagreed}
+## Constraints
+### Locked
+{[CONSENSUS] items — treat as confirmed decisions}
+### Free
+{[UNIQUE] items with strong evidence — implementer discretion}
+### Deferred
+{[UNRESOLVED] conflicts — require human judgment}
+## Code Context
+{file:line references from per-tool findings}
+```
+#### conclusions.json
+```json
+{
+  "session_id": "{sessionId}",
+  "subject": "{requirement}",
+  "mode": "collab",
+  "tools": ["gemini", "claude", "codex"],
+  "consensus_level": 85,
+  "recommendation": "Go|No-Go|Conditional",
+  "confidence": "high|medium|low",
+  "dimensions": [
+    { "name": "gemini", "score": 80, "findings": "...", "recommendations": "..." }
+  ],
+  "decisions": [
+    { "title": "...", "classification": "locked|free|deferred", "source_tools": [], "rationale": "..." }
+  ],
+  "timestamp": "<ISO>"
+}
+```
+### Step 10: Register Artifact
+Append to `.workflow/state.json`:
+```json
+{
+  "id": "CLB-{next_id}",
+  "type": "collab",
+  "milestone": "{current_milestone}",
+  "phase": null,
+  "scope": "adhoc",
+  "path": "scratch/{YYYYMMDD}-collab-{slug}",
+  "status": "completed",
+  "depends_on": null,
+  "harvested": false,
+  "created_at": "<ISO>",
+  "completed_at": "<ISO>"
+}
+```
+### Step 11: Display Summary
+```
+============================================================
+  MULTI-CLI COLLABORATION COMPLETE
+============================================================
+  Requirement:     {requirement}
+  Tools:           {tool_list}
+  Consensus Level: {N}%
+  Per-Tool:
+    gemini:  completed (confidence: {N}%)
+    claude:  completed (confidence: {N}%)
+    codex:   completed (confidence: {N}%)
+  Artifact: CLB-{id}
+  Output:   {outputDir}/
+  Next steps:
+    /maestro-analyze "{topic}"            — Deep feasibility analysis
+    /maestro-plan "{phase} --dir {dir}"   — Plan from collab conclusions
+    /maestro-brainstorm "{topic}"         — Expand with multi-role brainstorm
+============================================================
+```
+</execution>
+<error_codes>
+| Code | Severity | Condition | Recovery |
+|------|----------|-----------|----------|
+| E001 | error | Requirement argument missing | Prompt for requirement |
+| E002 | error | Fewer than 2 CLI tools eligible | Check cli-tools.json, enable more tools |
+| E003 | error | Specified tool not found/enabled | Show available tools |
+| E004 | error | All delegates failed | Abort with per-tool error details |
+| W001 | warning | One tool failed | Continue with remaining tools |
+| W002 | warning | >50% conflicts in cross-verify | Highlight in report, recommend manual review |
+| W003 | warning | Low consensus level (<40%) | Flag in summary |
+</error_codes>
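The error and warning conditions in this table can be derived from a finished run. A minimal sketch with hypothetical field names; it emits W001 once if any tool failed (the table does not say whether it repeats per tool) and checks W002/W003 against the table's thresholds.

```typescript
interface RunOutcome {
  succeeded: string[];
  failed: string[];
  conflictCount: number;
  totalFindings: number;
  consensusLevel: number; // 0-100
}

// Returns the codes from the table that apply to a finished fan-out.
function collabDiagnostics(o: RunOutcome): string[] {
  if (o.succeeded.length === 0) return ["E004"]; // all delegates failed: abort
  const codes: string[] = [];
  if (o.failed.length > 0) codes.push("W001"); // continue with remaining tools
  if (o.totalFindings > 0 && o.conflictCount / o.totalFindings > 0.5) codes.push("W002");
  if (o.consensusLevel < 40) codes.push("W003");
  return codes;
}
```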
+<success_criteria>
+- [ ] Available tools discovered from cli-tools.json with eligibility filtering
+- [ ] Plan presented via AskUserQuestion with tool modification option (unless -y)
+- [ ] All delegates launched in parallel via Bash(run_in_background: true)
+- [ ] Execution stopped after launch — waited for all callbacks
+- [ ] Per-tool outputs written to per-tool/{tool}-output.md
+- [ ] Cross-verification: consensus/conflict/unique classification complete
+- [ ] collab-report.md produced with merged findings
+- [ ] context.md produced in Locked/Free/Deferred format (downstream compatible)
+- [ ] conclusions.json produced (plan fast-track compatible)
+- [ ] CLB artifact registered in state.json
+- [ ] Partial degradation: continued if 1+ tools succeeded
+</success_criteria>

package/.claude/commands/maestro-plan.md CHANGED

@@ -143,8 +143,12 @@ Follow workflow plan.md § "Revise Mode" and § "Check Mode" respectively. These
 - [ ] Every task has `read_first[]` with at least the file being modified + source of truth files
 - [ ] Every task has `convergence.criteria[]` with grep-verifiable conditions (no subjective language)
 - [ ] Every task `action` and `implementation` contain concrete values (no "align X with Y")
+- [ ] Plan confidence scored in P4 with 5-dimension factor model
+- [ ] Plan readiness gate checked before P4.5 collision detection
+- [ ] Pressure pass completed on highest-complexity task
+- [ ] plan.json includes confidence section (overall, dimensions, pressure_pass)
 - [ ] Collision detection executed against same-milestone plans (non-blocking)
 - [ ] Plan-checker passed (or minor issues acknowledged)
-- [ ] User confirmation captured (execute/modify/cancel)
+- [ ] User confirmation captured (execute/modify/cancel) with confidence displayed
 - [ ] Artifact registered in state.json with correct scope/milestone/phase/depends_on
 </success_criteria>

package/.claude/commands/maestro-ralph.md CHANGED

@@ -326,10 +326,28 @@ For quality-gate decisions (post-verify, post-business-test, post-review, post-t
 | post-review | `{artifact_dir}/review.json` |
 | post-test | `{artifact_dir}/uat.md`, `{artifact_dir}/.tests/test-results.json` |
+**Confidence-aware evaluation**:
+Before delegating, check if artifact contains a confidence section (added by downstream commands):
+- `verification.json` → `confidence.overall` (from maestro-verify)
+- `report.json` → `confidence.overall` (from quality-auto-test)
+- `review.json` → may contain dimension confidence (from quality-review)
+- `uat.md` → confidence summary section (from quality-test)
+If confidence data found, include in delegate prompt as additional signal:
+```
+Existing confidence assessment: overall {overall}%, weakest dimension: {weakest} ({score}%)
+```
+**Confidence-based verdict bias**: When artifact confidence is available:
+- confidence < 60% → bias toward "fix" even if surface status looks clean (hidden quality gaps)
+- confidence 60-95% → use delegate verdict as-is
+- confidence > 95% → bias toward "proceed" (strong evidence of quality)
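The verdict-bias thresholds and the prompt hint line can be sketched as two small functions. The sketch assumes an artifact's confidence section has the shape `{ overall, dimensions: { name: score } }`; the source only names `confidence.overall`, so the dimension layout is an assumption, and the hint is an English rendering of the hint line.

```typescript
type Bias = "fix" | "as-is" | "proceed";

// <60 biases toward fix, 60-95 keeps the delegate verdict, >95 biases toward proceed.
function verdictBias(overall: number): Bias {
  if (overall < 60) return "fix";
  if (overall > 95) return "proceed";
  return "as-is";
}

// Assumed shape of an artifact's confidence section.
interface ArtifactConfidence {
  overall: number;
  dimensions: Record<string, number>;
}

// English rendering of the prompt hint line; picks the lowest-scoring dimension.
function confidenceHint(c: ArtifactConfidence): string {
  let weakest = "";
  let lowest = Infinity;
  for (const name of Object.keys(c.dimensions)) {
    if (c.dimensions[name] < lowest) {
      weakest = name;
      lowest = c.dimensions[name];
    }
  }
  const base = `Existing confidence assessment: overall ${c.overall}%`;
  return weakest ? `${base}, weakest dimension: ${weakest} (${lowest}%)` : base;
}
```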
 ```
 Bash({
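   command: `maestro delegate "PURPOSE: Evaluate the ${meta.decision} quality-gate result and decide whether it passes
-TASK: Read result files | Analyze pass/fail status | Assess issue severity | Give next-step recommendation
+TASK: Read result files | Analyze pass/fail status | Assess issue severity | Check confidence score | Give next-step recommendation
 MODE: analysis
 CONTEXT: @${result_files}
 EXPECTED: Output strictly in the following format:
@@ -338,8 +356,10 @@ STATUS: proceed | fix | escalate
 REASON: one-sentence explanation
 GAP_SUMMARY: concrete problem description (fill in only for fix/escalate; passed to quality-debug)
 CONFIDENCE: high | medium | low
+CONFIDENCE_SCORE: 0-100 (read the confidence score from the result files; estimate if none is present)
+WEAKEST_DIMENSION: name of the weakest dimension
 ---END---
-CONSTRAINTS: Evaluate only, do not modify | STATUS must be exactly one of the three | If retry ${meta.retry_count}/${meta.max_retries} has reached the limit and issues remain, must escalate" --role analyze --mode analysis`,
+CONSTRAINTS: Evaluate only, do not modify | STATUS must be exactly one of the three | Confidence < 60% leans toward fix | If retry ${meta.retry_count}/${meta.max_retries} has reached the limit and issues remain, must escalate" --role analyze --mode analysis`,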
   run_in_background: true
 })
 STOP — wait for callback.
@@ -352,12 +372,20 @@ STOP — wait for callback.
 Parse structured response:
 ```
 Extract between ---VERDICT--- and ---END---:
-  verdict.status   = "proceed" | "fix" | "escalate"
-  verdict.reason   = string
-  verdict.gap_summary = string (context for quality-debug)
-  verdict.confidence = "high" | "medium" | "low"
+  verdict.status           = "proceed" | "fix" | "escalate"
+  verdict.reason           = string
+  verdict.gap_summary      = string (context for quality-debug)
+  verdict.confidence       = "high" | "medium" | "low"
+  verdict.confidence_score = 0-100 (numeric, from artifact or estimated)
+  verdict.weakest_dimension = string (weakest confidence dimension)
 If parse fails → fallback: treat as "fix" with generic gap_summary
+Confidence-based verdict adjustment (after parse, before apply):
+  If verdict.confidence_score < 60 AND verdict.status == "proceed":
+    → Override to "fix", reason += " (insufficient confidence: {score}%, {weakest_dimension} needs strengthening)"
+  If verdict.confidence_score > 95 AND verdict.status == "fix" AND retry_count > 0:
+    → Suggest "proceed" override, reason += " (sufficient confidence: {score}%, recommend proceeding)"
 ```
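The parse-then-adjust sequence above can be sketched in TypeScript. Field names mirror the verdict keys; the fallback wording, `parseVerdict`, `adjustVerdict`, and the `suggestProceed` flag are assumptions. Because the source only says to *suggest* a `proceed` override above 95%, the sketch reports that as a flag instead of changing the status.

```typescript
type Status = "proceed" | "fix" | "escalate";

interface Verdict {
  status: Status;
  reason: string;
  gapSummary: string;
  confidence: string;
  confidenceScore: number | null;
  weakestDimension: string;
}

function isStatus(s: string): s is Status {
  return s === "proceed" || s === "fix" || s === "escalate";
}

// Extracts KEY: value pairs between ---VERDICT--- and ---END---; falls back to "fix".
function parseVerdict(output: string): Verdict {
  const fallback: Verdict = {
    status: "fix",
    reason: "verdict could not be parsed",
    gapSummary: "Delegate output unparseable; re-check the quality-gate artifacts",
    confidence: "low",
    confidenceScore: null,
    weakestDimension: "",
  };
  const startTag = "---VERDICT---";
  const start = output.indexOf(startTag);
  const end = start < 0 ? -1 : output.indexOf("---END---", start);
  if (end < 0) return fallback;

  const fields: Record<string, string> = {};
  for (const line of output.slice(start + startTag.length, end).split("\n")) {
    const m = /^([A-Z_]+):\s*(.*)$/.exec(line.trim());
    if (m) fields[m[1]] = m[2].trim();
  }
  const status = fields["STATUS"] || "";
  if (!isStatus(status)) return fallback;
  const score = parseInt(fields["CONFIDENCE_SCORE"] || "", 10);
  return {
    status,
    reason: fields["REASON"] || "",
    gapSummary: fields["GAP_SUMMARY"] || "",
    confidence: fields["CONFIDENCE"] || "",
    confidenceScore: isNaN(score) ? null : score,
    weakestDimension: fields["WEAKEST_DIMENSION"] || "",
  };
}

// Applies the confidence rules after parsing and before the verdict is acted on.
function adjustVerdict(v: Verdict, retryCount: number): { verdict: Verdict; suggestProceed: boolean } {
  const s = v.confidenceScore;
  if (s !== null && s < 60 && v.status === "proceed") {
    const reason = `${v.reason} (insufficient confidence: ${s}%, ${v.weakestDimension} needs strengthening)`;
    return { verdict: { ...v, status: "fix", reason }, suggestProceed: false };
  }
  if (s !== null && s > 95 && v.status === "fix" && retryCount > 0) {
    const reason = `${v.reason} (sufficient confidence: ${s}%, recommend proceeding)`;
    return { verdict: { ...v, reason }, suggestProceed: true };
  }
  return { verdict: v, suggestProceed: false };
}
```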
 **Apply verdict:**
@@ -503,9 +531,11 @@ End.
 - [ ] Full quality pipeline generated: verify → business-test → review → test-gen → test
 - [ ] Decision nodes inserted after: post-verify, post-business-test, post-review, post-test, post-milestone
 - [ ] Quality-gate decisions delegated via `maestro delegate --role analyze --mode analysis`
-- [ ] Delegate verdict parsed: STATUS / REASON / GAP_SUMMARY / CONFIDENCE
-- [ ] `-y` mode: auto-follow delegate verdict without user confirmation
-- [ ] Interactive mode: display recommendation + AskUserQuestion with override options
+- [ ] Delegate verdict parsed: STATUS / REASON / GAP_SUMMARY / CONFIDENCE / CONFIDENCE_SCORE / WEAKEST_DIMENSION
+- [ ] Confidence-based verdict adjustment applied (< 60% bias fix, > 95% bias proceed)
+- [ ] Artifact confidence sections read when available (verification.json, report.json, uat.md)
+- [ ] `-y` mode: auto-follow adjusted verdict without user confirmation
+- [ ] Interactive mode: display recommendation with confidence score + AskUserQuestion with override options
 - [ ] Delegate failure fallback: treat as "fix" verdict
 - [ ] gap_summary from delegate passed to quality-debug as context
 - [ ] Fix-loop templates applied per decision type with retry_count increment

package/.claude/commands/quality-auto-test.md CHANGED

@@ -116,6 +116,10 @@ Append to state.json.artifacts[]:
 - [ ] Tests executed progressively (L0→L3) with fail-fast on critical
 - [ ] Iteration engine ran (inner: test_defect fix, outer: strategy adjust)
 - [ ] state.json, report.json, reflection-log.md written
+- [ ] Test confidence scored per iteration (Step 7.5) with 5-dimension factor model
+- [ ] Convergence check includes confidence >= 60% alongside pass_rate threshold
+- [ ] Pressure pass completed on highest-pass-rate layer before completion
+- [ ] report.json includes confidence section
 - [ ] index.json updated with auto_test section
 - [ ] If spec source: traceability matrix built, traceability.md written
 - [ ] If failures: issues auto-created in issues.jsonl

package/.claude/commands/quality-debug.md CHANGED

@@ -115,7 +115,11 @@ If user confirms, invoke `Skill({ skill: "spec-add", args: "<category> <content>
 - [ ] evidence.ndjson written with structured NDJSON entries
 - [ ] understanding.md tracks evolving understanding per cluster
 - [ ] Root causes collected with fix_direction and affected_files
+- [ ] Multi-factor confidence scored per gap (Step 7.0) replacing simple high/medium/low
+- [ ] Readiness gate checked before ROOT CAUSE declaration
+- [ ] Pressure pass completed on confirmed hypothesis
+- [ ] Confidence table appended to understanding.md
 - [ ] If --from-uat: uat.md gaps updated with diagnosis artifacts
-- [ ] Results unified into diagnosis summary
+- [ ] Results unified into diagnosis summary with confidence section
 - [ ] Next step routed (plan --gaps + execute if fix needed, verify if fix applied, resume if inconclusive)
 </success_criteria>

package/.claude/commands/quality-test.md CHANGED

@@ -95,6 +95,10 @@ Append to state.json.artifacts[]:
 - [ ] Severity inferred from natural language (never asked)
 - [ ] Batched writes: on issue, every 5 passes, or completion
 - [ ] test-results.json and coverage-report.json written
+- [ ] UAT confidence scored with 4-dimension factor model
+- [ ] Readiness gate checked before final report
+- [ ] Pressure pass completed if > 80% pass rate
+- [ ] Confidence summary appended to uat.md
 - [ ] index.json uat fields updated
 - [ ] If issues: parallel debug agents spawned per gap cluster
 - [ ] Gaps updated with root_cause, fix_direction, affected_files

package/.codex/skills/maestro/SKILL.md CHANGED

@@ -139,9 +139,15 @@ After each barrier skill completes, read its artifacts and update `state.context
 }
 ```
-7. **Initialize plan tracking** (dual-track: status.json + update_plan):
+7. **Initialize tracking** (goal constraint → plan sub-items):
 ```
+// Goal = outer constraint — ensures entire chain completes
+functions.create_goal({
+  objective: `Maestro ${chain_name}: ${steps.length} steps [${steps.map(s => s.skill).join(' → ')}]`
+})
+// Plan = inner tracking — sub-step progress
 functions.update_plan({
   plan: steps.map((step, i) => ({
     id: `step-${i}`,
@@ -233,9 +239,12 @@ Object with all fields required: `status` ("completed"|"failed"), `skill_call` (
 ### Phase 3: Completion Report
-Finalize dual tracking:
+Finalize tracking:
 - status.json: `state.status = 'completed'`
 - update_plan: all steps → `"completed"` (skipped steps also marked completed)
+- **update_goal**: `functions.update_goal({ status: "complete" })` — release goal constraint
+**Note**: Abort path (Phase 2 step 7) does NOT call `update_goal` — goal stays running for `--continue` resume.
 ```
 === COORDINATE COMPLETE ===

package/.codex/skills/maestro-analyze/SKILL.md CHANGED

@@ -112,6 +112,7 @@ id,title,description,dimension,analysis_type,deps,context_from,wave,status,findi
 | `findings` | Output | Key findings summary (max 500 chars) |
 | `score` | Output | Dimension score (0-100 for scoring tasks, empty for explore/decide) |
 | `recommendations` | Output | Dimension-specific recommendations |
+| `confidence_score` | Output | Per-dimension confidence score (0-100) from factor-based assessment |
 | `error` | Output | Error message if failed |
 ### Per-Wave CSV (Temporary)
@@ -356,6 +357,10 @@ Write wave CSV with `prev_context`, execute `spawn_agents_on_csv` for synthesis
 {prioritized recommendations with rationale}
 ```
+3b. **Confidence scoring** (full mode only):
+   Factors (weights): findings_depth(.30), evidence_strength(.25), coverage_breadth(.20), user_validation(.15, 0 in CSV mode), consistency(.10). Overall = average of dimension scores. Thresholds: <60% deeper | 60-80% optional | 80-95% converging | >95% converge. Append confidence summary to `analysis.md` and `conclusions.json`.
 4. Build `context.md` (both modes):
 ```markdown
@@ -479,6 +484,8 @@ echo '{"ts":"<ISO>","worker":"{id}","type":"exploration_finding","data":{"file":
 - [ ] analysis.md + conclusions.json produced (full mode only)
 - [ ] Deferred items auto-created as issues
 - [ ] Artifact registered in state.json
+- [ ] Confidence scored per dimension with factor-based model (full mode only)
+- [ ] Confidence summary appended to analysis.md and conclusions.json
 - [ ] Final outputs copied to scratchDir
 - [ ] discoveries.ndjson append-only throughout
 </success_criteria>
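The confidence scoring in step 3b of the maestro-analyze skill reduces to a weighted sum per dimension, an average across dimensions, and a threshold lookup. This sketch assumes every factor is scored 0-100, that "0 in CSV mode" means the `user_validation` factor is forced to 0 while its 0.15 weight stays (capping a CSV-mode dimension at 85), and that 60 and 80 open their bands while exactly 95 still counts as converging; the source does not specify boundary handling.

```typescript
// Factor scores for one dimension, each assumed to be 0-100.
interface FactorScores {
  findings_depth: number;
  evidence_strength: number;
  coverage_breadth: number;
  user_validation: number;
  consistency: number;
}

// Weights from step 3b; they sum to 1.0.
const WEIGHTS: FactorScores = {
  findings_depth: 0.3,
  evidence_strength: 0.25,
  coverage_breadth: 0.2,
  user_validation: 0.15,
  consistency: 0.1,
};

function dimensionConfidence(f: FactorScores, csvMode: boolean): number {
  // Assumption: CSV mode zeroes the user_validation factor, not its weight.
  const factors: FactorScores = csvMode ? { ...f, user_validation: 0 } : f;
  let total = 0;
  for (const k of Object.keys(WEIGHTS) as (keyof FactorScores)[]) {
    total += WEIGHTS[k] * factors[k];
  }
  return Math.round(total);
}

// Overall = average of dimension scores.
function overallConfidence(dimensionScores: number[]): number {
  if (dimensionScores.length === 0) return 0;
  return dimensionScores.reduce((a, b) => a + b, 0) / dimensionScores.length;
}

type Action = "deeper" | "optional" | "converging" | "converge";

function thresholdAction(overall: number): Action {
  if (overall < 60) return "deeper";
  if (overall < 80) return "optional";
  if (overall <= 95) return "converging";
  return "converge";
}
```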

package/.codex/skills/maestro-brainstorm/SKILL.md CHANGED

@@ -364,9 +364,15 @@ spawn_agents_on_csv({
 - Skill: maestro-roadmap --mode full -- Generate full spec package from brainstorm
 ```
-4. Copy artifacts to output `.brainstorming/` directory (phase mode or scratch mode target)
-5. Update phase `index.json` with brainstorm status (if phase mode)
-6. **Next-Step Routing** (skip if AUTO_YES — default to first applicable):
+4. **Brainstorm confidence scoring**:
+   Dimensions (5): role_coverage, cross_role_consistency, feature_completeness, spec_quality, design_feasibility. Factors (weights): analysis_depth(.30), evidence_strength(.25), coverage_breadth(.20), user_validation(.15, 0 if --yes), consistency(.10). Append confidence summary to `synthesis-changelog.md`.
+   **Conflict-based quality gate**: >3 `[UNRESOLVED]` conflicts → warn before artifact registration.
+5. Copy artifacts to output `.brainstorming/` directory (phase mode or scratch mode target)
+6. Update phase `index.json` with brainstorm status (if phase mode)
+7. **Next-Step Routing** (skip if AUTO_YES — default to first applicable):
    - Detect UI features: scan feature-index.json for UI/frontend-related features (keywords: ui, interface, page, component, dashboard, form, layout)
    - `request_user_input` (include UI Design option only when UI features detected):
      ```json
@@ -439,4 +445,7 @@ echo '{"ts":"<ISO>","worker":"{id}","type":"terminology","data":{"term":"CRDT","
 - [ ] context.md produced with full brainstorm report
 - [ ] Artifacts copied to target .brainstorming/ directory
 - [ ] discoveries.ndjson append-only throughout
+- [ ] Confidence scored per role and after cross-role synthesis
+- [ ] Conflict-based quality gate evaluated (> 3 unresolved = warning)
+- [ ] Confidence summary appended to synthesis-changelog.md
 </success_criteria>
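The brainstorm confidence step and its conflict-based quality gate can be sketched the same way. The five dimension names come from step 4 of the maestro-brainstorm skill; treating the overall score as their plain average, and counting literal `[UNRESOLVED]` markers in the synthesis text, are assumptions. The gate only warns, matching "warn before artifact registration".

```typescript
// The five brainstorm dimensions, each assumed to hold a 0-100 score.
interface BrainstormDimensions {
  role_coverage: number;
  cross_role_consistency: number;
  feature_completeness: number;
  spec_quality: number;
  design_feasibility: number;
}

// Count [UNRESOLVED] markers, assumed to appear literally in the synthesis text.
function countUnresolved(synthesis: string): number {
  const m = synthesis.match(/\[UNRESOLVED\]/g);
  return m ? m.length : 0;
}

// More than 3 unresolved conflicts -> warn (non-blocking) before registration.
function brainstormGate(
  dims: BrainstormDimensions,
  synthesis: string,
): { overall: number; weakest: keyof BrainstormDimensions; warn: boolean } {
  const names = Object.keys(dims) as (keyof BrainstormDimensions)[];
  let sum = 0;
  let weakest = names[0];
  for (const n of names) {
    sum += dims[n];
    if (dims[n] < dims[weakest]) weakest = n;
  }
  return {
    overall: Math.round(sum / names.length),
    weakest,
    warn: countUnresolved(synthesis) > 3,
  };
}
```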