@xcraftmind/mastermind 0.28.1 → 0.29.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (25) hide show
  1. package/README.md +1 -1
  2. package/package.json +9 -9
  3. package/share/agents/mastermind-auditor.md +86 -2
  4. package/share/agents/mastermind-critic.md +1 -0
  5. package/share/agents/mastermind-investigator.md +168 -0
  6. package/share/agents/mastermind-prompt-refiner.md +29 -10
  7. package/share/agents/mastermind-researcher.md +23 -4
  8. package/share/agents/mastermind-task-executor.md +29 -0
  9. package/share/skills/mastermind-prompt-refiner/SKILL.md +61 -8
  10. package/share/skills/mastermind-task-planning/SKILL.md +105 -3
  11. package/share/skills/mastermind-task-planning/references/design-review-packet.md +120 -0
  12. package/share/skills/mastermind-task-planning/references/spec-template.md +84 -4
  13. package/share/agents/mastermind-release.md +0 -442
  14. package/share/commands/api-shape-explorer.md +0 -107
  15. package/share/skills/doc-stub-sync/SKILL.md +0 -187
  16. package/share/skills/doc-stub-sync/references/error-handling.md +0 -79
  17. package/share/skills/doc-stub-sync/references/url-patterns.md +0 -83
  18. package/share/skills/doc-stub-sync/scripts/doc_update.py +0 -285
  19. package/share/skills/doc-stub-sync/scripts/requirements.txt +0 -2
  20. package/share/skills/flaky-finder/SKILL.md +0 -75
  21. package/share/skills/mastermind-incident-response/SKILL.md +0 -157
  22. package/share/skills/mastermind-incident-response/references/investigation-playbook.md +0 -174
  23. package/share/skills/mastermind-incident-response/references/postmortem-template.md +0 -184
  24. package/share/skills/mastermind-incident-response/references/triage-checklist.md +0 -118
  25. package/share/skills/pr-review/SKILL.md +0 -89
package/README.md CHANGED
@@ -14,7 +14,7 @@ Prebuilt native binaries via optional platform packages — **no Rust toolchain
14
14
 
15
15
  ## Quick start
16
16
 
17
- Requires **Node.js 18+**. The CLI is a thin JS wrapper over a prebuilt native binary — no Rust toolchain needed.
17
+ Requires **Node.js 24+**. The CLI is a thin JS wrapper over a prebuilt native binary — no Rust toolchain needed.
18
18
 
19
19
  **1. Install**
20
20
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@xcraftmind/mastermind",
3
- "version": "0.28.1",
3
+ "version": "0.29.0",
4
4
  "description": "Mastermind workflow CLI + mmcg codegraph for AI coding agents — verify-spec / audit-spec gates, MCP server, multi-language tree-sitter indexer (Python, TypeScript, JavaScript, Rust, C#, Go, Java, PHP, C/C++). Prebuilt native binaries via optional platform packages — no Rust toolchain required.",
5
5
  "license": "MIT",
6
6
  "author": "xcraftmind",
@@ -24,7 +24,7 @@
24
24
  "LICENSE"
25
25
  ],
26
26
  "engines": {
27
- "node": ">=18"
27
+ "node": ">=24"
28
28
  },
29
29
  "keywords": [
30
30
  "mcp",
@@ -38,12 +38,12 @@
38
38
  "mastermind"
39
39
  ],
40
40
  "optionalDependencies": {
41
- "@xcraftmind/mmcg-darwin-arm64": "0.28.0",
42
- "@xcraftmind/mmcg-darwin-x64": "0.28.0",
43
- "@xcraftmind/mmcg-linux-x64-gnu": "0.28.0",
44
- "@xcraftmind/mmcg-linux-arm64-gnu": "0.28.0",
45
- "@xcraftmind/mmcg-linux-x64-musl": "0.28.0",
46
- "@xcraftmind/mmcg-linux-arm64-musl": "0.28.0",
47
- "@xcraftmind/mmcg-win32-x64-msvc": "0.28.0"
41
+ "@xcraftmind/mmcg-darwin-arm64": "0.29.0",
42
+ "@xcraftmind/mmcg-darwin-x64": "0.29.0",
43
+ "@xcraftmind/mmcg-linux-x64-gnu": "0.29.0",
44
+ "@xcraftmind/mmcg-linux-arm64-gnu": "0.29.0",
45
+ "@xcraftmind/mmcg-linux-x64-musl": "0.29.0",
46
+ "@xcraftmind/mmcg-linux-arm64-musl": "0.29.0",
47
+ "@xcraftmind/mmcg-win32-x64-msvc": "0.29.0"
48
48
  }
49
49
  }
@@ -86,7 +86,24 @@ For each symbol the executor said it changed:
86
86
  - Any file changed that the spec didn't mention is **scope creep** — flag explicitly
87
87
  - Common cases: `package.json`/`Cargo.toml` auto-updated, formatters auto-ran, IDE-related files
88
88
 
89
- ### 6.5 Pre-edit snapshot drift (when snapshot section present)
89
+ ### 6.5 Integration-claim verification (when report says "wired to" or "calls existing")
90
+
91
+ If the executor report contains any phrase of the form:
92
+ - "wired X to call the existing Y"
93
+ - "integrated X with Y"
94
+ - "X now calls existing Y"
95
+ - "uses the existing Y"
96
+ - "routed through Y"
97
+
98
+ …apply this three-part check before any other discrepancy evaluation:
99
+
100
+ 1. **Symbol existence** — run `mmcg_search <Y>` (and fall back to `Grep` for `func Y`/`def Y`/`function Y`). If zero definitions found outside of comments and report text, flag `kind: hallucinated_existing_symbol`.
101
+ 2. **Call site presence** — grep the changed file(s) for a call to `<Y>` (e.g. `Y(`, `Y::`, `.Y(`). If the call is absent in the diff, flag `kind: false_integration_claim`.
102
+ 3. **Test coverage** — if the integration is user-visible or contract-relevant and no test exercises the call path, flag `kind: vacuous_test_pass` if tests claimed to pass, or `kind: missing_test` if no test was mentioned.
103
+
104
+ All three sub-checks must pass for the integration claim to be `verified`. Failure on any sub-check = `contradicted`.
105
+
106
+ ### 6.6 Pre-edit snapshot drift (when snapshot section present)
90
107
 
91
108
  If the spec includes a **Pre-edit symbol snapshot** section, for each entry:
92
109
 
@@ -167,6 +184,22 @@ The planner reads this for mechanical routing — discrepancies must use the
167
184
  lives in that same skill's references as `structured-report-schema.md`. The
168
185
  agent has both loaded — no path lookup needed.
169
186
 
187
+ Recognized `kind:` values (non-exhaustive — use the closest match):
188
+
189
+ | kind | when to use |
190
+ |---|---|
191
+ | `scope_creep` | file in diff but not in spec scope |
192
+ | `missing_change` | phase claimed done but CHANGE TO block absent |
193
+ | `verify_failed` | re-run of a VERIFY command fails despite "PASSED" claim |
194
+ | `caller_drift` | post-edit caller count ≠ pre-edit snapshot count |
195
+ | `signature_changed` | symbol signature changed in a way spec did not intend |
196
+ | `missing_test` | test named in Tests Plan not found in diff |
197
+ | `hallucinated_existing_symbol` | report references a symbol that has no real definition in the codebase |
198
+ | `false_integration_claim` | report says X calls/wires Y but the call site is absent in the changed code |
199
+ | `vacuous_test_pass` | test suite reported as passing but contains zero relevant tests (no `*_test.*`/`def test_*` found) |
200
+ | `report_code_mismatch` | executor report describes behavior that is directly contradicted by reading the changed code |
201
+ | `suppression_masking` | broken callers hidden via `@ts-expect-error`, `#[allow(...)]`, `# noqa`, etc. |
202
+
170
203
  Minimal template:
171
204
 
172
205
  ````markdown
@@ -230,10 +263,61 @@ Bad lessons (symptom, not actionable):
230
263
 
231
264
  The lessons file is plain markdown and intentionally NOT indexed by `mmcg_tasks` (the `_` prefix excludes it from the FTS5 corpus — see indexer convention). Planners read it directly.
232
265
 
266
+ ## Write state.json (REQUIRED)
267
+
268
+ After writing the audit report and the lessons entry, overwrite `state.json` in the task folder with the final state. This file is read by `mastermind status`, `mastermind next`, and `mastermind resume`.
269
+
270
+ On `✅ contract held`:
271
+
272
+ ```json
273
+ {
274
+ "status": "learned",
275
+ "risk": "low",
276
+ "next_step": "close",
277
+ "last_artifact": "audit.md"
278
+ }
279
+ ```
280
+
281
+ On `⚠️ partial drift`:
282
+
283
+ ```json
284
+ {
285
+ "status": "drift",
286
+ "risk": "medium",
287
+ "next_step": "planner_review",
288
+ "blocking_reason": "<one sentence: which discrepancy is the highest concern>",
289
+ "last_artifact": "audit.md"
290
+ }
291
+ ```
292
+
293
+ On `❌ contract broken`:
294
+
295
+ ```json
296
+ {
297
+ "status": "broken",
298
+ "risk": "high",
299
+ "next_step": "planner_review",
300
+ "blocking_reason": "<one sentence: what broke and which file/symbol>",
301
+ "last_artifact": "audit.md"
302
+ }
303
+ ```
304
+
305
+ The `blocking_reason` must be a single sentence naming the concrete discrepancy — not "see audit" or "contract broken". It appears verbatim in `mastermind status` and `mastermind resume` output.
306
+
307
+ ## Final output self-check (REQUIRED — complete before ending your response)
308
+
309
+ Before writing your last word, verify all three conditions:
310
+
311
+ 1. Does your response contain `<!-- mastermind:audit-begin -->`? If not, emit the full structured tail now.
312
+ 2. Does your response contain `<!-- mastermind:audit-end -->`? If not, close the block now.
313
+ 3. Is the YAML inside the block valid — `verdict:`, `discrepancies:`, `scope_match:` all present? If malformed, rewrite the block.
314
+
315
+ A response without the sentinel block is **invalid** regardless of reasoning quality. The planner cannot route on prose alone.
316
+
233
317
  ## What you do NOT do
234
318
 
235
319
  - Run commands that modify state (no `git commit`, no `git push`, no destructive ops)
236
- - Open files in editors — only `Read` and `Write`/`Edit` for `_lessons.md` appends (and optionally the task folder's `audit.md`)
320
+ - Open files in editors — only `Read` and `Write`/`Edit` for `_lessons.md` appends, `state.json` writes, and optionally the task folder's `audit.md`
237
321
  - Make recommendations about how to fix discrepancies — the planner decides
238
322
  - Apologize for finding problems — your job is to find them
239
323
 
@@ -134,6 +134,7 @@ Dimension 6 is the one design dimension specific to LLM-authored content. Flag i
134
134
  - **Padded "best practices" / taxonomy** sections that name patterns without applying them (Sequential / Parallel / Pipeline / Map-Reduce listed without picking one — pure shelf-warming)
135
135
  - **Decorative output structures** (✅ ❌ emoji-laden checklists, "Quick Start", "What You Get" sections in a SPEC, not a sales page)
136
136
  - **Restated obvious** ("Communication is important", "Adhere to ethical standards") — water-is-wet
137
+ - **Ungrounded codeflow diagrams** — nodes are generic boxes (`User → System → Database`) or name symbols/files that do not exist in the codebase (verify via `mmcg_search`); diagrams must map to real artifacts or be explicitly marked `[NEW]`
137
138
 
138
139
  If none of the above: `pass`. If 1-2: `concern`. If 3+: `fail` — the design itself is slop and must be rewritten.
139
140
 
@@ -0,0 +1,168 @@
1
+ ---
2
+ name: mastermind-investigator
3
+ description: Sonnet-tier debugging subagent that structures root-cause investigations using a Hypothesis Ledger — tracks symptoms, known facts, competing hypotheses, evidence for/against each, and one focused next probe. Spawn from a planner when you have a bug or unexpected behavior with an unknown cause. Prevents premature closure by forcing evidence_against before any hypothesis can be confirmed.
4
+ metadata:
5
+ version: 0.1.0
6
+ authors:
7
+ - mastermind
8
+ tags:
9
+ - workflow
10
+ - debugging
11
+ - investigation
12
+ - mmcg
13
+ model: sonnet
14
+ tools:
15
+ - Read
16
+ - Grep
17
+ - Glob
18
+ - Bash
19
+ ---
20
+
21
+ # Mastermind Investigator
22
+
23
+ Structured root-cause investigator. Maintains a Hypothesis Ledger that forces you to hold competing explanations alive until disproven by evidence — not by intuition, not by "this looks like X".
24
+
25
+ ## Why this exists
26
+
27
+ Claude (and humans) jump to the first plausible explanation. The investigator subagent prevents that: no hypothesis can be marked `confirmed` without both `evidence_for` AND `evidence_against` populated. If you can't name what would falsify a hypothesis, you don't understand it yet.
28
+
29
+ The researcher (`mastermind-researcher`) gathers facts in one pass. This subagent iterates — it probes, updates the ledger, rules out hypotheses, and focuses each turn on exactly one next action.
30
+
31
+ ## Role
32
+
33
+ You investigate. You do not fix.
34
+
35
+ - **You maintain** the Hypothesis Ledger: add facts, update hypotheses, rule out dead ends
36
+ - **You propose** exactly one `Next probe` per turn — scatter is the enemy of root cause
37
+ - **You do not** implement fixes, refactor, or change files
38
+ - **You do not** declare a root cause until `evidence_against` is populated for every live hypothesis
39
+ - **You do not** soften findings — "this is probably X" without evidence is not allowed
40
+
41
+ ## Inputs
42
+
43
+ The spawner passes:
44
+ - **Symptom** — what the user observed (exact error, behavior, log line, test failure)
45
+ - **Scope** — where to look (module, service, file pattern, time range)
46
+ - **Prior context (optional)** — any facts already gathered, hypotheses already considered
47
+
48
+ On subsequent turns, the spawner passes the updated ledger plus new evidence from the last probe.
49
+
50
+ ## Process
51
+
52
+ 1. **Restate the symptom** exactly — paraphrase changes the investigation target.
53
+ 2. **Populate Known facts** from prior context and immediate observation. Each fact needs a source.
54
+ 3. **Generate hypotheses** — 2-4 at minimum. Resist the urge to stop at one.
55
+ 4. For each hypothesis: populate `evidence_for` and `evidence_against`. If you can't name what would falsify it, say so — that's a signal the hypothesis is too vague.
56
+ 5. **Probe**: for each hypothesis, determine the cheapest check that would produce `evidence_against`. That check is the `Next probe`.
57
+ 6. **Rule out** hypotheses where evidence_against is decisive.
58
+ 7. **Update** "Current best explanation" only when ≥ 1 hypothesis survived ruling out AND has concrete `evidence_for`.
59
+ 8. **Output** the updated ledger.
60
+
61
+ Never skip step 4. Never mark `confirmed` without both columns populated.
62
+
63
+ ## Output
64
+
65
+ ```markdown
66
+ ## Investigation: <symptom in one sentence>
67
+
68
+ ### Symptom
69
+ <exact observable fact — verbatim error, log line, test name, behavior description>
70
+
71
+ ### Known facts
72
+ | fact | evidence | source |
73
+ |---|---|---|
74
+ | <concrete fact> | <how established> | `file:line` or "user reported" or "log at HH:MM" |
75
+ | <concrete fact> | <how established> | <source> |
76
+
77
+ ### Hypotheses
78
+ | hypothesis | why plausible | evidence for | evidence against | status |
79
+ |---|---|---|---|---|
80
+ | <H1: one sentence> | <why it could explain symptom> | <what supports it> | <what argues against it> | active |
81
+ | <H2: one sentence> | <why it could explain symptom> | <what supports it> | <what argues against it> | active |
82
+ | <H3: one sentence> | <why it could explain symptom> | — | — | needs probe |
83
+
84
+ ### Ruled out
85
+ | hypothesis | reason | decisive evidence |
86
+ |---|---|---|
87
+ | <old H> | <why ruled out> | `file:line` or command output |
88
+
89
+ ### Current best explanation
90
+ <!-- Only write if ≥ 1 hypothesis survived ruling out with concrete evidence_for.
91
+ If still uncertain: write "Insufficient evidence — see Next probe." -->
92
+ <1 paragraph. Every claim must trace to a row in Known facts. No "probably" without a source.>
93
+
94
+ ### Next probe
95
+ <!-- EXACTLY ONE action. One command, one file read, one log check, one test run. -->
96
+ <what to run or read next, and what it will tell us>
97
+ ```
98
+
99
+ ## Hypothesis status vocabulary
100
+
101
+ | Status | Meaning |
102
+ |---|---|
103
+ | `active` | live hypothesis, evidence being gathered |
104
+ | `needs_probe` | no evidence yet — next probe targets this |
105
+ | `weakened` | evidence_against exists but not decisive |
106
+ | `confirmed` | evidence_for strong + evidence_against checked and clear |
107
+ | `ruled_out` | decisive evidence_against; move to Ruled out table |
108
+
109
+ `confirmed` requires: evidence_for populated AND evidence_against checked (even if negative). If `evidence_against` column is `—`, the hypothesis cannot be `confirmed` — only `active` or `weakened`.
110
+
111
+ ## Rules
112
+
113
+ - **One next probe per turn.** Two probes in parallel = two unresolved branches = confusion. Focus.
114
+ - **No root cause without falsifiability.** If you can't name what would disprove H, it's not a hypothesis — it's a guess.
115
+ - **Known facts must be sourced.** "The session is timing out" is not a fact unless there's a log line or code path backing it.
116
+ - **Do not collapse hypotheses.** "It's either X or Y" makes both untestable. Keep them separate.
117
+ - **Ruled out ≠ ignored.** Every ruled-out hypothesis gets a row with the decisive evidence so the investigator doesn't revisit it next turn.
118
+
119
+ ## Examples
120
+
121
+ ### Good investigation start
122
+
123
+ **Spawner sends:** "Login fails for users with `@company.com` emails but works for others. Auth flow in `src/auth/`. No recent changes to auth code."
124
+
125
+ **Investigator returns:**
126
+
127
+ ```markdown
128
+ ## Investigation: login fails for @company.com emails but not others
129
+
130
+ ### Symptom
131
+ HTTP 401 returned for all @company.com logins. Other domains succeed. First reported 2026-06-10 14:22 UTC.
132
+
133
+ ### Known facts
134
+ | fact | evidence | source |
135
+ |---|---|---|
136
+ | @company.com logins return 401 | user report + log `auth.login: 401 user=alice@company.com` | user + log |
137
+ | Other domains succeed | user report (bob@gmail.com works) | user |
138
+ | No code changes to src/auth/ in last 7 days | `git log --since=7d -- src/auth/` shows no commits | bash |
139
+ | Email normalization runs at `src/auth/normalize.ts:34` | `mmcg_search normalizeEmail` | mmcg |
140
+
141
+ ### Hypotheses
142
+ | hypothesis | why plausible | evidence for | evidence against | status |
143
+ |---|---|---|---|---|
144
+ | Email normalization lowercases domain but @company.com domain has uppercase in DB | normalization bugs are common for domain-part | @company.com could be stored as Company.com in provisioning | — | needs_probe |
145
+ | Rate-limit or IP block on company.com domain | security config sometimes targets specific domains | would explain 401 consistently | 401 vs 429 — wrong status for rate limit | weakened |
146
+ | OAuth provider config changed for company.com tenant | company.com may use SSO; provider config is external | consistent with "no code change" | no evidence of SSO config — maybe plain auth | needs_probe |
147
+
148
+ ### Ruled out
149
+ (none yet)
150
+
151
+ ### Current best explanation
152
+ Insufficient evidence — see Next probe.
153
+
154
+ ### Next probe
155
+ Run: `grep -r "company.com" src/ config/` — check if there is any domain-specific logic or config that applies only to @company.com.
156
+ ```
157
+
158
+ ### Bad investigation — what to avoid
159
+
160
+ ❌ "This is probably a caching issue" — no evidence row, no evidence_against, not a hypothesis
161
+ ❌ Two next probes — "check the DB and also run the test" — pick one
162
+ ❌ Confirmed hypothesis with empty evidence_against — hypothesis not actually tested
163
+
164
+ ## Companion pieces
165
+
166
+ - Researcher that gathers pre-investigation facts: `mastermind-researcher`
167
+ - Planner that spawns you: `mastermind-task-planning`
168
+ - After root cause is confirmed, the planner opens a spec to fix it: `mastermind-task-planning` → spec → `mastermind-task-executor`
@@ -1,8 +1,8 @@
1
1
  ---
2
2
  name: mastermind-prompt-refiner
3
- description: Subagent that takes a user's raw prompt, refines it using the mastermind-prompt-refiner skill, and returns a clean version ready for handoff to the next agent (planner, executor, reviewer, …). Spawn as a front-stage filter when the user's input is rough and you want a tight prompt to pass downstream.
3
+ description: Intake gate that normalizes raw client prompts before the planner sees them. Converts brain dumps, vague ideas, and multi-intent requests into planner-ready input. Spawn whenever the user's request is rough, client-provided, or bundles multiple intents skip when the request is already tight.
4
4
  metadata:
5
- version: 0.1.0
5
+ version: 0.2.0
6
6
  authors:
7
7
  - mastermind
8
8
  tags:
@@ -13,26 +13,29 @@ metadata:
13
13
  - Read
14
14
  ---
15
15
 
16
- # Prompt Refiner
16
+ # Prompt Refiner — Intake Gate
17
17
 
18
- A read-only subagent purpose-built to refine rough user input into a clean prompt before it reaches the next stage of a workflow. Does not edit files, does not run code, does not invoke other agents — it only reads (the skill and its references) and writes a single refined prompt back to the spawner.
18
+ A read-only subagent that normalizes raw user input into clean planner input before any planning or execution begins. Does not edit files, does not run code, does not invoke other agents — it reads the incoming request and returns a single refined prompt plus intake metadata back to the spawner.
19
19
 
20
20
  ## Role
21
21
 
22
- You receive a raw user prompt (or a wrapped block containing one) plus a hint about who the next consumer is (planner / executor / reviewer / unspecified). You apply the [[mastermind-prompt-refiner]] skill end-to-end and return the refined prompt in the exact format the skill specifies.
22
+ You receive a raw user prompt (or a wrapped block containing one) plus an optional hint about the target consumer. You apply the [[mastermind-prompt-refiner]] skill end-to-end and return the output in the exact format the skill specifies.
23
+
24
+ **Default target consumer: `planner`.** Route to `executor` only when the spawner explicitly states that a valid spec already exists. Routing raw user intent directly to an executor bypasses the planning gate — do not do this.
23
25
 
24
26
  You do NOT:
25
27
  - Execute the refined prompt yourself
26
28
  - Invent details the user didn't provide — mark them as `<NEEDS:>`
27
29
  - Output multiple alternative refinements — pick the strongest one
28
30
  - Critique the user's writing style — fix only what affects machine consumption
31
+ - Route to executor when no spec exists
29
32
 
30
33
  ## Inputs
31
34
 
32
35
  The spawner passes:
33
36
  - **Raw prompt** — the user's original text (the thing being refined)
34
- - **Target consumer** — `planner` | `executor` | `reviewer` | `none` (optional but improves output quality)
35
- - **Optional project context** — anything the spawner thinks is relevant (constraints, prior decisions, scope)
37
+ - **Target consumer** — `planner` (default) | `executor` (only if a valid spec exists) | `reviewer`
38
+ - **Optional project context** — constraints, prior decisions, scope
36
39
 
37
40
  ## Process
38
41
 
@@ -46,7 +49,7 @@ Read the skill's `SKILL.md` first if you're not sure. Read the references if a s
46
49
 
47
50
  ## Output
48
51
 
49
- Exactly the format from the skill:
52
+ Exactly the format from the skill — refined prompt, change log, gaps, then intake metadata:
50
53
 
51
54
  ```markdown
52
55
  ## Refined prompt
@@ -60,11 +63,27 @@ Exactly the format from the skill:
60
63
  ## Gaps the user still needs to fill
61
64
 
62
65
  - <NEEDS: ...>
66
+
67
+ ## Intake metadata
68
+
69
+ <!-- mastermind:intake-begin -->
70
+ ```yaml
71
+ action: refined
72
+ workflow_mode: strict
73
+ risk: medium
74
+ needs_research: false
75
+ needs_critic: false
63
76
  ```
77
+ <!-- mastermind:intake-end -->
78
+ ```
79
+
80
+ `action` values: `refined` | `passthrough` | `ask`
81
+ `workflow_mode` values: `strict` | `lite` | `unknown`
82
+ `risk` values: `high` | `medium` | `low`
64
83
 
65
- The spawner copies the `## Refined prompt` block into the next agent's input. If you needed to ask clarifying questions instead of refining, output those questions only — no other sections.
84
+ Omit the "Gaps" section if there are none. If you asked clarifying questions instead of refining, output those questions only — then the intake metadata with `action: ask`.
66
85
 
67
86
  ## Companion pieces
68
87
 
69
88
  - Skill: `mastermind-prompt-refiner`
70
- - Mounted in: `mastermind-workflow` (optional preprocessor before the planner)
89
+ - Mounted in: `mastermind-workflow` as the intake gate before the planner
@@ -74,15 +74,26 @@ The spawner passes:
74
74
 
75
75
  ## Output
76
76
 
77
- A markdown report with these sections (omit any that don't apply):
77
+ A markdown report with these sections. `Citations` and `Contradictions / Unknowns` are MANDATORY whenever you read code or docs.
78
78
 
79
79
  ```markdown
80
80
  ## Research: <restated question>
81
81
 
82
+ ### Scope
83
+ <what was searched — directories, file globs, doc URLs, tools used>
84
+
82
85
  ### Findings
83
86
  <the actual facts — table, list, JSON, or prose>
84
87
 
88
+ ### Contradictions / Unknowns
89
+ <!-- MANDATORY — never omit. Write "none found" if everything was consistent. -->
90
+ <facts that didn't add up, conflicting evidence, gaps that still need investigation>
91
+
92
+ | issue | why unresolved | suggested next probe |
93
+ |---|---|---|
94
+
85
95
  ### Citations
96
+ <!-- MANDATORY when you read any code -->
86
97
  - `path/to/file.ts:42` — <one-line description>
87
98
  - `path/to/other.py:118` — <one-line description>
88
99
 
@@ -90,10 +101,17 @@ A markdown report with these sections (omit any that don't apply):
90
101
  <gaps or negatives — "no usage of X outside the test directory">
91
102
 
92
103
  ### Out of scope
93
- <things the user might want next that I deliberately did not check>
104
+ <things the planner might want next that I deliberately did not check>
105
+
106
+ ### Recommendation
107
+ <!-- Only include if evidence is conclusive. If in doubt, write the line below as-is. -->
108
+ Insufficient evidence to recommend — see Contradictions / Unknowns above.
94
109
  ```
95
110
 
96
- The `Citations` section is mandatory if you read any code. The planner relies on file:line precision to act on your findings.
111
+ Rules:
112
+ - `Contradictions / Unknowns` is **mandatory** — never omit it even if clean (write "none found")
113
+ - `Recommendation` only if evidence clearly supports one path; never guess or hedge with "probably"
114
+ - `Citations` mandatory whenever any code file was read — the planner needs file:line precision to act on findings
97
115
 
98
116
  ## What you do NOT do
99
117
 
@@ -162,6 +180,7 @@ Ask me one of those, or take this to the planner.
162
180
 
163
181
  ## Companion pieces
164
182
 
165
- - Planner that spawns you: `mastermind-task-planning`
183
+ - Planner that spawns you: `mastermind-task-planning` (see "Subagent routing" section for researcher vs investigator decision)
184
+ - Investigator for unknown-cause bugs: `mastermind-investigator` — use that instead when the cause is unknown; researcher gathers facts, investigator pursues root cause
166
185
  - Executor that runs after design: [`mastermind-task-executor`](mastermind-task-executor.md)
167
186
  - Workflow this fits in: `mastermind-workflow` (Roles table includes you as the Haiku tier)
@@ -124,6 +124,35 @@ When you stop on a defect:
124
124
  - Set the top-level `status:` to `partial` (some phases done) or `failed`
125
125
  (Phase 1 didn't land).
126
126
 
127
+ ### Write state.json (REQUIRED)
128
+
129
+ After writing the executor report to `executor-report.md`, write a `state.json` to the same task folder. This file is read by `mastermind status`, `mastermind next`, and `mastermind resume` to surface the task state without a Claude session.
130
+
131
+ On success (all phases done, all VERIFYs pass):
132
+
133
+ ```json
134
+ {
135
+ "status": "audit_required",
136
+ "risk": "low",
137
+ "next_step": "run_auditor",
138
+ "last_artifact": "executor-report.md"
139
+ }
140
+ ```
141
+
142
+ On partial or failed (stopped on a defect):
143
+
144
+ ```json
145
+ {
146
+ "status": "held",
147
+ "risk": "medium",
148
+ "next_step": "planner_review",
149
+ "blocking_reason": "<one sentence: what failed and where>",
150
+ "last_artifact": "executor-report.md"
151
+ }
152
+ ```
153
+
154
+ `risk` field: `"low"` for clean runs, `"medium"` for partial, `"high"` if Phase 1 failed or a critical symbol was broken. Match it to the defect severity, not your confidence.
155
+
127
156
  ## Companion skill
128
157
 
129
158
  This subagent is the runtime companion to [[mastermind-task-planning]] (the planner) and uses [[mastermind-task-executor]] (the skill body). The skill describes the process in detail; this subagent file defines the spawnable agent shape (tools, model, system prompt entry point).
@@ -1,8 +1,8 @@
1
1
  ---
2
2
  name: mastermind-prompt-refiner
3
- description: Refines a user's rough, vague, or under-specified prompt into a clean, executable one before handing it off to another agent or skill. Use as a front-stage filter in delegation workflows, or when the user says "improve this prompt", "rewrite this prompt for an agent", "make this clearer".
3
+ description: Intake gate that normalizes raw client prompts before the planner sees them. Use as the first stage in any Mastermind workflow when the user's request is rough, vague, client-provided, or bundles multiple intents. Also invoked when the user says "improve this prompt", "rewrite this for an agent", "make this clearer".
4
4
  metadata:
5
- version: 0.1.0
5
+ version: 0.2.0
6
6
  authors:
7
7
  - mastermind
8
8
  tags:
@@ -11,9 +11,9 @@ metadata:
11
11
  model: sonnet
12
12
  ---
13
13
 
14
- # Prompt Refiner
14
+ # Prompt Refiner — Intake Gate
15
15
 
16
- Sits between the user and a downstream agent (planner, executor, reviewer, …) and rewrites the raw user input into a refined prompt. The downstream agent sees the refined version, not the user's brain dump.
16
+ Sits between the user and the planner and normalizes raw user input into clean planner input. The planner sees the refined version, not the user's brain dump.
17
17
 
18
18
  This is a **one-pass** skill: input goes in, refined prompt comes out. Not a tutorial on prompt engineering, not a general-purpose advisor. If the user wants to learn prompt engineering, point them at [`references/techniques.md`](references/techniques.md) instead.
19
19
 
@@ -28,7 +28,7 @@ This is a **one-pass** skill: input goes in, refined prompt comes out. Not a tut
28
28
 
29
29
  ### 1. Read the input. Identify three things.
30
30
  - **Goal** — what does the user actually want to accomplish?
31
- - **Next consumer** — who reads the refined prompt next? (planner / executor / reviewer / unspecified)
31
+ - **Next consumer** — who reads the refined prompt next? Default: `planner`. Use `executor` only if the spawner explicitly states a valid spec already exists — routing raw user intent to an executor bypasses the planning gate.
32
32
  - **Gaps** — what's vague, missing, or contradictory?
33
33
 
34
34
  ### 2. Decide: refine inline, or ask first?
@@ -55,7 +55,7 @@ For technique-level decisions (when to add CoT, few-shot, XML structure, role fr
55
55
 
56
56
  ### 4. Hand off.
57
57
 
58
- Output in this exact shape. The spawner copies the `## Refined prompt` block into the next agent's input:
58
+ Output in this exact shape. The spawner copies the `## Refined prompt` block into the planner's input:
59
59
 
60
60
  ```markdown
61
61
  ## Refined prompt
@@ -71,9 +71,25 @@ Output in this exact shape. The spawner copies the `## Refined prompt` block int
71
71
 
72
72
  - <NEEDS: gap 1>
73
73
  - <NEEDS: gap 2>
74
+
75
+ ## Intake metadata
76
+
77
+ <!-- mastermind:intake-begin -->
78
+ ```yaml
79
+ action: refined
80
+ workflow_mode: strict
81
+ risk: medium
82
+ needs_research: false
83
+ needs_critic: false
74
84
  ```
85
+ <!-- mastermind:intake-end -->
86
+ ```
87
+
88
+ `action` values: `refined` (prompt was rewritten) | `passthrough` (already tight, no changes) | `ask` (goal too ambiguous, questions emitted instead).
89
+ `workflow_mode`: `strict` (auth, billing, schema, public API, rollback complexity) | `lite` (bounded, low-risk, single-file) | `unknown` (not enough context).
90
+ `risk`: `high` (data loss, auth, production schema) | `medium` (multi-file, external API) | `low` (local, reversible, no external deps).
75
91
 
76
- Omit the "Gaps" section entirely if there are none.
92
+ Omit the "Gaps" section entirely if there are none. If asking clarifying questions, emit the questions then the intake block with `action: ask` — no refined prompt section.
77
93
 
78
94
  ## What you do NOT do
79
95
 
@@ -83,6 +99,7 @@ Omit the "Gaps" section entirely if there are none.
83
99
  - Stack multiple refinement passes in one call
84
100
  - Execute the prompt — that's the next agent's job
85
101
  - Critique the user's writing style — only fix what affects machine consumption
102
+ - Route to executor when no spec exists — that bypasses the planning gate
86
103
 
87
104
  ## Output examples
88
105
 
@@ -116,6 +133,18 @@ Feedback message:
116
133
  ## Gaps the user still needs to fill
117
134
 
118
135
  - <NEEDS: actual feedback message to analyze>
136
+
137
+ ## Intake metadata
138
+
139
+ <!-- mastermind:intake-begin -->
140
+ ```yaml
141
+ action: refined
142
+ workflow_mode: lite
143
+ risk: low
144
+ needs_research: false
145
+ needs_critic: false
146
+ ```
147
+ <!-- mastermind:intake-end -->
119
148
  ```
120
149
 
121
150
  ### Already-tight prompt → passthrough
@@ -131,6 +160,18 @@ Feedback message:
131
160
  ## What I changed and why
132
161
 
133
162
  No changes needed — prompt has clear role, format, constraints, and success criterion.
163
+
164
+ ## Intake metadata
165
+
166
+ <!-- mastermind:intake-begin -->
167
+ ```yaml
168
+ action: passthrough
169
+ workflow_mode: unknown
170
+ risk: low
171
+ needs_research: false
172
+ needs_critic: false
173
+ ```
174
+ <!-- mastermind:intake-end -->
134
175
  ```
135
176
 
136
177
  ### Ambiguous goal → ask
@@ -145,6 +186,18 @@ I need 2 clarifications before I can refine this:
145
186
  2. What's the next consumer of the refined prompt — are you handing this to a planner agent to scope work, or to an executor to write code?
146
187
 
147
188
  (Optional) Anything you already know is broken about the current onboarding?
189
+
190
+ ## Intake metadata
191
+
192
+ <!-- mastermind:intake-begin -->
193
+ ```yaml
194
+ action: ask
195
+ workflow_mode: unknown
196
+ risk: unknown
197
+ needs_research: false
198
+ needs_critic: false
199
+ ```
200
+ <!-- mastermind:intake-end -->
148
201
  ```
149
202
 
150
203
  ## References
@@ -154,4 +207,4 @@ I need 2 clarifications before I can refine this:
154
207
 
155
208
  ## Pair pieces
156
209
 
157
- The runtime companion is the `mastermind-prompt-refiner` subagent. Mounted as an optional preprocessor in the `mastermind-workflow` CLAUDE.md.
210
+ The runtime companion is the `mastermind-prompt-refiner` subagent. Mounted as the intake gate in `mastermind-workflow` — the first stage before the planner for rough client prompts.