@xcraftmind/mastermind 0.28.0 → 0.28.2
- package/README.md +4 -4
- package/package.json +9 -9
- package/share/agents/mastermind-auditor.md +76 -2
- package/share/agents/mastermind-critic.md +1 -0
- package/share/agents/mastermind-investigator.md +168 -0
- package/share/agents/mastermind-prompt-refiner.md +29 -10
- package/share/agents/mastermind-researcher.md +23 -4
- package/share/agents/mastermind-task-executor.md +29 -0
- package/share/skills/mastermind-prompt-refiner/SKILL.md +61 -8
- package/share/skills/mastermind-task-planning/SKILL.md +105 -3
- package/share/skills/mastermind-task-planning/references/design-review-packet.md +120 -0
- package/share/skills/mastermind-task-planning/references/spec-template.md +84 -4
- package/share/agents/mastermind-release.md +0 -442
- package/share/commands/api-shape-explorer.md +0 -107
- package/share/skills/doc-stub-sync/SKILL.md +0 -187
- package/share/skills/doc-stub-sync/references/error-handling.md +0 -79
- package/share/skills/doc-stub-sync/references/url-patterns.md +0 -83
- package/share/skills/doc-stub-sync/scripts/doc_update.py +0 -285
- package/share/skills/doc-stub-sync/scripts/requirements.txt +0 -2
- package/share/skills/flaky-finder/SKILL.md +0 -75
- package/share/skills/mastermind-incident-response/SKILL.md +0 -157
- package/share/skills/mastermind-incident-response/references/investigation-playbook.md +0 -174
- package/share/skills/mastermind-incident-response/references/postmortem-template.md +0 -184
- package/share/skills/mastermind-incident-response/references/triage-checklist.md +0 -118
- package/share/skills/pr-review/SKILL.md +0 -89
|
@@ -1,8 +1,8 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: mastermind-prompt-refiner
|
|
3
|
-
description:
|
|
3
|
+
description: Intake gate that normalizes raw client prompts before the planner sees them. Use as the first stage in any Mastermind workflow when the user's request is rough, vague, client-provided, or bundles multiple intents. Also invoked when the user says "improve this prompt", "rewrite this for an agent", "make this clearer".
|
|
4
4
|
metadata:
|
|
5
|
-
version: 0.
|
|
5
|
+
version: 0.2.0
|
|
6
6
|
authors:
|
|
7
7
|
- mastermind
|
|
8
8
|
tags:
|
|
@@ -11,9 +11,9 @@ metadata:
|
|
|
11
11
|
model: sonnet
|
|
12
12
|
---
|
|
13
13
|
|
|
14
|
-
# Prompt Refiner
|
|
14
|
+
# Prompt Refiner — Intake Gate
|
|
15
15
|
|
|
16
|
-
Sits between the user and
|
|
16
|
+
Sits between the user and the planner and normalizes raw user input into clean planner input. The planner sees the refined version, not the user's brain dump.
|
|
17
17
|
|
|
18
18
|
This is a **one-pass** skill: input goes in, refined prompt comes out. Not a tutorial on prompt engineering, not a general-purpose advisor. If the user wants to learn prompt engineering, point them at [`references/techniques.md`](references/techniques.md) instead.
|
|
19
19
|
|
|
@@ -28,7 +28,7 @@ This is a **one-pass** skill: input goes in, refined prompt comes out. Not a tut
|
|
|
28
28
|
|
|
29
29
|
### 1. Read the input. Identify three things.
|
|
30
30
|
- **Goal** — what does the user actually want to accomplish?
|
|
31
|
-
- **Next consumer** — who reads the refined prompt next?
|
|
31
|
+
- **Next consumer** — who reads the refined prompt next? Default: `planner`. Use `executor` only if the spawner explicitly states a valid spec already exists — routing raw user intent to an executor bypasses the planning gate.
|
|
32
32
|
- **Gaps** — what's vague, missing, or contradictory?
|
|
33
33
|
|
|
34
34
|
### 2. Decide: refine inline, or ask first?
|
|
@@ -55,7 +55,7 @@ For technique-level decisions (when to add CoT, few-shot, XML structure, role fr
|
|
|
55
55
|
|
|
56
56
|
### 4. Hand off.
|
|
57
57
|
|
|
58
|
-
Output in this exact shape. The spawner copies the `## Refined prompt` block into the
|
|
58
|
+
Output in this exact shape. The spawner copies the `## Refined prompt` block into the planner's input:
|
|
59
59
|
|
|
60
60
|
```markdown
|
|
61
61
|
## Refined prompt
|
|
@@ -71,9 +71,25 @@ Output in this exact shape. The spawner copies the `## Refined prompt` block int
|
|
|
71
71
|
|
|
72
72
|
- <NEEDS: gap 1>
|
|
73
73
|
- <NEEDS: gap 2>
|
|
74
|
+
|
|
75
|
+
## Intake metadata
|
|
76
|
+
|
|
77
|
+
<!-- mastermind:intake-begin -->
|
|
78
|
+
```yaml
|
|
79
|
+
action: refined
|
|
80
|
+
workflow_mode: strict
|
|
81
|
+
risk: medium
|
|
82
|
+
needs_research: false
|
|
83
|
+
needs_critic: false
|
|
74
84
|
```
|
|
85
|
+
<!-- mastermind:intake-end -->
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
`action` values: `refined` (prompt was rewritten) | `passthrough` (already tight, no changes) | `ask` (goal too ambiguous, questions emitted instead).
|
|
89
|
+
`workflow_mode`: `strict` (auth, billing, schema, public API, rollback complexity) | `lite` (bounded, low-risk, single-file) | `unknown` (not enough context).
|
|
90
|
+
`risk`: `high` (data loss, auth, production schema) | `medium` (multi-file, external API) | `low` (local, reversible, no external deps).
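A spawner that consumes this output has to locate the intake block and check its values. The following is a minimal sketch of that step, not the package's own parser: the function name `parse_intake`, the flat line-by-line parsing, and the boolean handling are assumptions made for illustration. The allowed values come from the lists above, with one addition: `unknown` is accepted for `risk` because the "ask" example in this file emits `risk: unknown`, even though the value list does not name it.

```python
import re

# The fence marker is built indirectly so this example nests cleanly in Markdown.
FENCE = "`" * 3

# Allowed values as listed in the skill text. "unknown" for risk is an
# assumption based on the "ask" example, which emits `risk: unknown`.
ALLOWED = {
    "action": {"refined", "passthrough", "ask"},
    "workflow_mode": {"strict", "lite", "unknown"},
    "risk": {"high", "medium", "low", "unknown"},
    "needs_research": {"true", "false"},
    "needs_critic": {"true", "false"},
}

BLOCK_RE = re.compile(
    r"<!-- mastermind:intake-begin -->\s*" + FENCE + r"yaml\s*\n(.*?)\n" + FENCE
    + r"\s*<!-- mastermind:intake-end -->",
    re.DOTALL,
)


def parse_intake(output: str) -> dict[str, str]:
    """Extract and validate the flat key: value pairs between the intake markers."""
    match = BLOCK_RE.search(output)
    if match is None:
        raise ValueError("no intake metadata block found")
    fields: dict[str, str] = {}
    for line in match.group(1).splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    for key, allowed in ALLOWED.items():
        if fields.get(key) not in allowed:
            raise ValueError(f"{key}={fields.get(key)!r} is not one of {sorted(allowed)}")
    return fields
```

The block is deliberately parsed without a YAML library, since every field shown in the source is a single scalar on its own line. A real consumer may prefer a proper YAML parser.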
|
|
75
91
|
|
|
76
|
-
Omit the "Gaps" section entirely if there are none.
|
|
92
|
+
Omit the "Gaps" section entirely if there are none. If asking clarifying questions, emit the questions then the intake block with `action: ask` — no refined prompt section.
|
|
77
93
|
|
|
78
94
|
## What you do NOT do
|
|
79
95
|
|
|
@@ -83,6 +99,7 @@ Omit the "Gaps" section entirely if there are none.
|
|
|
83
99
|
- Stack multiple refinement passes in one call
|
|
84
100
|
- Execute the prompt — that's the next agent's job
|
|
85
101
|
- Critique the user's writing style — only fix what affects machine consumption
|
|
102
|
+
- Route to executor when no spec exists — that bypasses the planning gate
|
|
86
103
|
|
|
87
104
|
## Output examples
|
|
88
105
|
|
|
@@ -116,6 +133,18 @@ Feedback message:
|
|
|
116
133
|
## Gaps the user still needs to fill
|
|
117
134
|
|
|
118
135
|
- <NEEDS: actual feedback message to analyze>
|
|
136
|
+
|
|
137
|
+
## Intake metadata
|
|
138
|
+
|
|
139
|
+
<!-- mastermind:intake-begin -->
|
|
140
|
+
```yaml
|
|
141
|
+
action: refined
|
|
142
|
+
workflow_mode: lite
|
|
143
|
+
risk: low
|
|
144
|
+
needs_research: false
|
|
145
|
+
needs_critic: false
|
|
146
|
+
```
|
|
147
|
+
<!-- mastermind:intake-end -->
|
|
119
148
|
```
|
|
120
149
|
|
|
121
150
|
### Already-tight prompt → passthrough
|
|
@@ -131,6 +160,18 @@ Feedback message:
|
|
|
131
160
|
## What I changed and why
|
|
132
161
|
|
|
133
162
|
No changes needed — prompt has clear role, format, constraints, and success criterion.
|
|
163
|
+
|
|
164
|
+
## Intake metadata
|
|
165
|
+
|
|
166
|
+
<!-- mastermind:intake-begin -->
|
|
167
|
+
```yaml
|
|
168
|
+
action: passthrough
|
|
169
|
+
workflow_mode: unknown
|
|
170
|
+
risk: low
|
|
171
|
+
needs_research: false
|
|
172
|
+
needs_critic: false
|
|
173
|
+
```
|
|
174
|
+
<!-- mastermind:intake-end -->
|
|
134
175
|
```
|
|
135
176
|
|
|
136
177
|
### Ambiguous goal → ask
|
|
@@ -145,6 +186,18 @@ I need 2 clarifications before I can refine this:
|
|
|
145
186
|
2. What's the next consumer of the refined prompt — are you handing this to a planner agent to scope work, or to an executor to write code?
|
|
146
187
|
|
|
147
188
|
(Optional) Anything you already know is broken about the current onboarding?
|
|
189
|
+
|
|
190
|
+
## Intake metadata
|
|
191
|
+
|
|
192
|
+
<!-- mastermind:intake-begin -->
|
|
193
|
+
```yaml
|
|
194
|
+
action: ask
|
|
195
|
+
workflow_mode: unknown
|
|
196
|
+
risk: unknown
|
|
197
|
+
needs_research: false
|
|
198
|
+
needs_critic: false
|
|
199
|
+
```
|
|
200
|
+
<!-- mastermind:intake-end -->
|
|
148
201
|
```
|
|
149
202
|
|
|
150
203
|
## References
|
|
@@ -154,4 +207,4 @@ I need 2 clarifications before I can refine this:
|
|
|
154
207
|
|
|
155
208
|
## Pair pieces
|
|
156
209
|
|
|
157
|
-
The runtime companion is the `mastermind-prompt-refiner` subagent. Mounted as
|
|
210
|
+
The runtime companion is the `mastermind-prompt-refiner` subagent. Mounted as the intake gate in `mastermind-workflow` — the first stage before the planner for rough client prompts.
|
|
@@ -57,7 +57,33 @@ If you find yourself **guessing** at a function signature, a file path, or the n
|
|
|
57
57
|
|
|
58
58
|
If mmcg is not configured (no `mmcg_status` response), say so to the user and ask whether to proceed without truth grounding or wait until the index is set up. Do not silently work blind.
|
|
59
59
|
|
|
60
|
-
|
|
60
|
+
### Subagent routing — researcher vs investigator vs self
|
|
61
|
+
|
|
62
|
+
Before designing, pick the right fact-gathering tool:
|
|
63
|
+
|
|
64
|
+
| Situation | Use |
|
|
65
|
+
|---|---|
|
|
66
|
+
| You need to batch mmcg lookups (callsites, imports, blast radius, config values) before drafting | `mastermind-researcher` (Haiku — cheap, read-only, returns structured facts) |
|
|
67
|
+
| User reports a bug / unexpected behavior and you do **not know the cause** | `mastermind-investigator` (Sonnet — iterative, maintains Hypothesis Ledger, one probe per turn) |
|
|
68
|
+
| Simple one-symbol lookup, 1-2 quick mmcg queries | Do it yourself inline — spawning a subagent for trivial lookups wastes tokens |
|
|
69
|
+
|
|
70
|
+
**Researcher** gathers facts in one pass and returns. You do NOT iterate with the researcher — one question, one structured report, done.
|
|
71
|
+
|
|
72
|
+
**Investigator** iterates. You spawn it with a symptom, it returns an updated ledger + one next probe. You run the probe (or hand it to the user), pass the result back, it updates the ledger again. Repeat until one hypothesis is `confirmed`. Then you open a spec.
|
|
73
|
+
|
|
74
|
+
### Workflow modes — pick before drafting
|
|
75
|
+
|
|
76
|
+
Every task runs in one of three modes. Pick the mode first; it determines which spec sections are required. Do NOT use `strict` ceremony for a one-liner.
|
|
77
|
+
|
|
78
|
+
| Mode | When to use | Required spec sections |
|
|
79
|
+
|---|---|---|
|
|
80
|
+
| **lite** | One-file or trivial change, no auth/billing/migration | Goal, Scope, FIND/CHANGE TO, VERIFY |
|
|
81
|
+
| **standard** | Normal feature or fix — multi-file, no sensitive areas | Everything in lite + Alternatives Considered, Codeflow, Decision Matrix, Tests Plan, Docs Plan, Observability, Performance |
|
|
82
|
+
| **strict** | Auth, billing, migration, public API, data-loss risk, blast-radius ≥ 20 | Everything in standard + Evidence Ledger, Risk Register, 3-lens critic panel, Rollback Plan |
|
|
83
|
+
|
|
84
|
+
`mastermind new-spec` defaults to `lite`. Pass `--mode standard` or `--mode strict` to get the richer template.
|
|
85
|
+
|
|
86
|
+
For `strict` mode, the critic panel (3 parallel lenses) is mandatory before drafting the spec. For `standard`, one critic spawn is recommended but not mandatory. For `lite`, skip the critic.
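The mode table and critic rule above can be read as a small decision function. This is a hedged sketch rather than how `mastermind new-spec` actually chooses a mode: the `Task` fields, the area tag names, and the reading of "multi-file" as more than one file touched are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field

# Hypothetical tags for the sensitive categories named in the strict row.
SENSITIVE_AREAS = {"auth", "billing", "migration", "public-api", "data-loss"}

LITE = ["Goal", "Scope", "FIND/CHANGE TO", "VERIFY"]
STANDARD = LITE + [
    "Alternatives Considered", "Codeflow", "Decision Matrix",
    "Tests Plan", "Docs Plan", "Observability", "Performance",
]
STRICT = STANDARD + [
    "Evidence Ledger", "Risk Register", "3-lens critic panel", "Rollback Plan",
]
REQUIRED_SECTIONS = {"lite": LITE, "standard": STANDARD, "strict": STRICT}

# Critic policy per mode, as stated in the paragraph above.
CRITIC_POLICY = {
    "lite": "skip",
    "standard": "one critic spawn, recommended",
    "strict": "3-lens panel, mandatory",
}


@dataclass
class Task:
    files_touched: int
    blast_radius: int                              # e.g. a transitive caller count
    areas: set[str] = field(default_factory=set)   # hypothetical area tags


def pick_mode(task: Task) -> str:
    """Return 'lite', 'standard', or 'strict' following the mode table."""
    if task.areas & SENSITIVE_AREAS or task.blast_radius >= 20:
        return "strict"
    if task.files_touched > 1:
        return "standard"
    return "lite"
```

Checking the strict conditions first matters: a one-file change that touches auth is still `strict`, so file count alone cannot be the first test.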
|
|
61
87
|
|
|
62
88
|
### Auto-fill the Pre-edit symbol snapshot
|
|
63
89
|
|
|
@@ -91,6 +117,68 @@ If a *single* assumption is load-bearing (e.g. "the user means PostgreSQL, not g
|
|
|
91
117
|
|
|
92
118
|
The bar is concrete: if you can imagine a reasonable user reading the spec and saying "that's not what I meant", verbalize the fork upfront. The cost of a 2-line "Interpretation" note is negligible; the cost of an executor implementing the wrong interpretation is a full re-spec cycle.
|
|
93
119
|
|
|
120
|
+
## Debug-time investigation — spawn the investigator
|
|
121
|
+
|
|
122
|
+
When the user reports a bug, test failure, or unexpected behavior and the root cause is **not already known**, spawn `mastermind-investigator` before opening a spec. Opening a spec on a misdiagnosed bug wastes an entire executor + auditor cycle.
|
|
123
|
+
|
|
124
|
+
### When to spawn the investigator — mandatory
|
|
125
|
+
|
|
126
|
+
Spawn when:
|
|
127
|
+
- User says "X is broken" but doesn't say why.
|
|
128
|
+
- A test fails and the stack trace points to ≥ 2 plausible causes.
|
|
129
|
+
- A behavior changed and no obvious commit explains it.
|
|
130
|
+
- You find yourself guessing the cause ("probably the cache", "likely a race condition") without evidence.
|
|
131
|
+
|
|
132
|
+
Do **not** spawn for:
|
|
133
|
+
- Bugs where the cause is already known (a typo, a wrong constant, a confirmed missing import) — go straight to a spec.
|
|
134
|
+
- Feature requests — the investigator is for unknown failures only.
|
|
135
|
+
- Trivial one-liner fixes where the change is self-evident.
|
|
136
|
+
|
|
137
|
+
### What to pass the investigator
|
|
138
|
+
|
|
139
|
+
```markdown
|
|
140
|
+
**Symptom:** <exact observable fact — verbatim error, log line, test name, behavior>
|
|
141
|
+
|
|
142
|
+
**Scope:** <directory, file pattern, module, or service to search in>
|
|
143
|
+
|
|
144
|
+
**Prior context (optional):** <any facts already known, hypotheses already considered>
|
|
145
|
+
```
|
|
146
|
+
|
|
147
|
+
Do not send a wall of context. The investigator needs a clean, cold start — your prior reasoning about the cause is bias, not fact.
|
|
148
|
+
|
|
149
|
+
### The iteration protocol
|
|
150
|
+
|
|
151
|
+
The investigator returns an updated **Hypothesis Ledger** and exactly one **Next probe**. Your job as planner:
|
|
152
|
+
|
|
153
|
+
1. Read the ledger. Do not form your own opinion about which hypothesis is correct.
|
|
154
|
+
2. Execute (or ask the user to run) the one Next probe.
|
|
155
|
+
3. Pass the result back to the investigator as a follow-up with the updated ledger.
|
|
156
|
+
4. Repeat until the ledger has one hypothesis in `confirmed` status.
|
|
157
|
+
|
|
158
|
+
A `confirmed` hypothesis requires: `evidence_for` populated AND `evidence_against` checked. If the investigator returns a hypothesis as `confirmed` without an `evidence_against` entry, push back — that's premature closure.
|
|
159
|
+
|
|
160
|
+
Do not run additional probes beyond the one the investigator specified. Uncoordinated parallel probes create conflicting evidence that the ledger can't cleanly incorporate.
|
|
161
|
+
|
|
162
|
+
### Termination — when to stop investigating and open a spec
|
|
163
|
+
|
|
164
|
+
Stop the investigation loop when:
|
|
165
|
+
- Exactly one hypothesis is `confirmed` (evidence_for + evidence_against both populated).
|
|
166
|
+
- The investigator's "Current best explanation" names a concrete code location, config value, or external dependency.
|
|
167
|
+
|
|
168
|
+
At that point:
|
|
169
|
+
1. Copy the "Current best explanation" into the spec's **Goal** section as the diagnosed root cause.
|
|
170
|
+
2. Copy the Ruled out table into the spec's **Notes** — the executor should not re-investigate.
|
|
171
|
+
3. Open the spec normally (pick mode, draft, critic if needed).
|
|
172
|
+
|
|
173
|
+
If the investigator exhausts probes without confirming a cause: escalate to the user with the full ledger and ruled-out table. Do not guess a root cause. Do not open a spec on an unconfirmed hypothesis.
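The iteration and termination rules above amount to a small state check on the ledger. The sketch below is illustrative only: the source does not show the ledger's real schema, so the `Hypothesis` class, the list-based evidence fields, and the returned action names are assumptions. It checks the "exactly one `confirmed`" condition and does not inspect the "Current best explanation" text.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    claim: str
    status: str  # "confirmed" and "weakened" appear in the source; other values are assumed
    evidence_for: list[str] = field(default_factory=list)
    evidence_against: list[str] = field(default_factory=list)


def is_premature(h: Hypothesis) -> bool:
    """A 'confirmed' status without both evidence lists populated is premature closure."""
    return h.status == "confirmed" and not (h.evidence_for and h.evidence_against)


def next_step(ledger: list[Hypothesis], probes_exhausted: bool = False) -> str:
    """Decide what the planner does after reading the returned ledger."""
    confirmed = [h for h in ledger if h.status == "confirmed"]
    if any(is_premature(h) for h in confirmed):
        return "push_back"            # ask the investigator to check evidence_against
    if len(confirmed) == 1:
        return "open_spec"            # exactly one properly confirmed hypothesis
    if probes_exhausted:
        return "escalate_to_user"     # hand over the full ledger, do not guess
    return "run_next_probe"           # run only the one probe the investigator named
```

Note that a `weakened` hypothesis, or a lone survivor that was never confirmed, falls through to `run_next_probe` or `escalate_to_user`, which matches the anti-patterns that follow.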
|
|
174
|
+
|
|
175
|
+
### Anti-patterns
|
|
176
|
+
|
|
177
|
+
- **Skipping the investigator because you think you know the answer.** "I'm pretty sure it's X" is exactly the cognitive bias the investigator exists to prevent. If you can't populate `evidence_against` for your hypothesis, you don't know — you guess.
|
|
178
|
+
- **Running probes yourself in parallel with the investigator.** The investigator's probe sequence is deliberate: each result informs the next. Parallel probes create noise.
|
|
179
|
+
- **Opening a spec on a `weakened` hypothesis.** Weakened ≠ confirmed. Wait for confirmation.
|
|
180
|
+
- **Treating "no other hypothesis survived" as evidence_for.** Ruling out alternatives doesn't confirm the survivor — it may mean all hypotheses are wrong.
|
|
181
|
+
|
|
94
182
|
## Design-time challenge — spawn the critic
|
|
95
183
|
|
|
96
184
|
Before drafting the spec, you decide on an approach. **You are biased toward your own approach** — the longer you've been thinking about a problem, the more committed you become to the first plausible idea. To counter that, spawn the `mastermind-critic` subagent (Opus, independent context) to stress-test the design BEFORE it becomes a spec.
|
|
@@ -149,7 +237,7 @@ Spawn when:
|
|
|
149
237
|
|
|
150
238
|
### What to send the critic
|
|
151
239
|
|
|
152
|
-
A
|
|
240
|
+
A structured design brief. **Must include `mmcg` evidence** — without it the critic flags dimension #7 as `fail`. Use the canonical format in [`references/design-review-packet.md`](references/design-review-packet.md):
|
|
153
241
|
|
|
154
242
|
```markdown
|
|
155
243
|
**Problem:** <1-2 sentences on what we're solving>
|
|
@@ -160,6 +248,13 @@ A focused brief — 1-2 paragraphs. **Must include `mmcg` evidence** — without
|
|
|
160
248
|
- <Alt 1>: rejected because <concrete reason>
|
|
161
249
|
- <Alt 2>: rejected because <concrete reason>
|
|
162
250
|
|
|
251
|
+
**Decision Matrix:**
|
|
252
|
+
| Option | Correctness | Complexity | Blast radius | Migration risk | Observability | Reversibility | Verdict |
|
|
253
|
+
|---|---|---|---|---|---|---|---|
|
|
254
|
+
| A | pass | low | low | none | good | easy | reject |
|
|
255
|
+
| B | concern | medium | high | medium | weak | hard | reject |
|
|
256
|
+
| C | pass | medium | low | none | good | easy | chosen |
|
|
257
|
+
|
|
163
258
|
**Constraints:** <hard limits — language, deadline, compatibility, ops>
|
|
164
259
|
|
|
165
260
|
**mmcg snapshot:** <the relevant mmcg_search / mmcg_callers / mmcg_impact results
|
|
@@ -174,7 +269,14 @@ Do not send the critic the whole brainstorming conversation — that imports you
|
|
|
174
269
|
|
|
175
270
|
**Alternatives mandate.** For any non-trivial change (multi-file, anything in sensitive areas, anything where the critic would be mandatory), the brief MUST include ≥ 2 rejected alternatives with concrete reasons. The critic checks this. The spec template's "Alternatives Considered" section is mandatory for the same reason — it's the audit trail for "we did think about other options".
|
|
176
271
|
|
|
177
|
-
For green-field interface / API / module-boundary design,
|
|
272
|
+
For green-field interface / API / module-boundary design, generate 3 qualitatively different shapes (not 3 variants of one idea) and pick one with a defended rationale. The two unpicked become the rejected alternatives in the brief and in the spec's "Alternatives Considered" section. Skip when modifying an existing API where the shape is already fixed.
|
|
273
|
+
|
|
274
|
+
**Codeflow diagrams.** For each non-trivial alternative (auth, billing, data-flow, multi-module refactor, API boundary, migration, anything touching ≥ 3 files), include a small Mermaid `flowchart TD` diagram alongside the alternative. Rules:
|
|
275
|
+
|
|
276
|
+
- **≤ 8 nodes per diagram.**
|
|
277
|
+
- Every node must be a real file, symbol, module, or external boundary — verified via `mmcg_search` or explicitly marked `[NEW]`.
|
|
278
|
+
- No generic box (`User → System → Database`) — that is AI slop and the critic will flag it.
|
|
279
|
+
- Omit diagrams for trivial changes (one-line fix, docs, simple test, mechanical rename).
|
|
178
280
|
|
|
179
281
|
### How to read the critic's verdict
|
|
180
282
|
|
|
@@ -0,0 +1,120 @@
|
|
|
1
|
+
# Design Review Packet
|
|
2
|
+
|
|
3
|
+
Canonical input format for spawning `mastermind-critic`. Copy this template, fill every section, and pass it as the critic's input. Do not send raw brainstorming — cold context is the point.
|
|
4
|
+
|
|
5
|
+
## When to use
|
|
6
|
+
|
|
7
|
+
- Before drafting a standard or strict spec.
|
|
8
|
+
- Whenever you are unsure which of several approaches to pick.
|
|
9
|
+
- Always for mandatory critic categories (auth, billing, migration, public-API, blast-radius ≥ 20).
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Template
|
|
14
|
+
|
|
15
|
+
```markdown
|
|
16
|
+
# Design Review: <short title>
|
|
17
|
+
|
|
18
|
+
## Problem
|
|
19
|
+
|
|
20
|
+
<1-2 sentences — what is broken, missing, or required. Concrete, not abstract.>
|
|
21
|
+
|
|
22
|
+
## Proposed design
|
|
23
|
+
|
|
24
|
+
<The approach you intend to implement. 1-3 paragraphs. Concrete enough to critique — name files, symbols, modules. Vague prose like "improve the architecture" gives the critic nothing to work with.>
|
|
25
|
+
|
|
26
|
+
## Alternatives considered
|
|
27
|
+
|
|
28
|
+
List ≥ 2 rejected alternatives for non-trivial changes. For each:
|
|
29
|
+
|
|
30
|
+
### Alternative A — <name>
|
|
31
|
+
|
|
32
|
+
```mermaid
|
|
33
|
+
flowchart TD
|
|
34
|
+
<real_symbol_or_file> --> <real_symbol_or_file>
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
- **Rejected because:** <concrete reason tied to mmcg findings or project constraint>
|
|
38
|
+
|
|
39
|
+
### Alternative B — <name>
|
|
40
|
+
|
|
41
|
+
```mermaid
|
|
42
|
+
flowchart TD
|
|
43
|
+
<real_symbol_or_file> --> <real_symbol_or_file>
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
- **Rejected because:** <reason>
|
|
47
|
+
|
|
48
|
+
### Picked approach — <name>
|
|
49
|
+
|
|
50
|
+
```mermaid
|
|
51
|
+
flowchart TD
|
|
52
|
+
<real_symbol_or_file> --> <real_symbol_or_file>
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
- **Chosen because:** <concrete reason>
|
|
56
|
+
|
|
57
|
+
## Decision Matrix
|
|
58
|
+
|
|
59
|
+
Crystallizes the alternatives comparison. Fill one row per option above.
|
|
60
|
+
|
|
61
|
+
| Option | Correctness | Complexity | Blast radius | Migration risk | Observability | Reversibility | Verdict |
|
|
62
|
+
|---|---|---|---|---|---|---|---|
|
|
63
|
+
| A — <name> | pass | low | low | none | good | easy | reject |
|
|
64
|
+
| B — <name> | concern | medium | high | medium | weak | hard | reject |
|
|
65
|
+
| C — <name> | pass | medium | low | none | good | easy | chosen |
|
|
66
|
+
|
|
67
|
+
Column values: `pass / concern / fail` for Correctness; `low / medium / high` for complexity/blast/migration; `good / weak / none` for observability; `easy / medium / hard` for reversibility. Exactly one row gets `chosen`.
|
|
68
|
+
|
|
69
|
+
## Constraints
|
|
70
|
+
|
|
71
|
+
<Hard limits — programming language, runtime version, deadline, backward-compatibility requirements, ops constraints. The critic uses these to distinguish intentional tradeoffs from oversights.>
|
|
72
|
+
|
|
73
|
+
## mmcg snapshot
|
|
74
|
+
|
|
75
|
+
<Paste the mmcg evidence that grounds the design. Without this, the critic cannot verify claims and will flag dimension #7 as `fail`. Include at minimum:>
|
|
76
|
+
|
|
77
|
+
- `mmcg_search <primary_symbol>` → `<file:line>` (<brief description>)
|
|
78
|
+
- `mmcg_callers <primary_symbol>` → `<N> callers` (<impact summary>)
|
|
79
|
+
- `mmcg_impact <primary_symbol> --depth 3` → `<M> transitive` (if relevant)
|
|
80
|
+
- `mmcg_search <secondary_symbol>` → `<file:line>` (if touching ≥ 2 symbols)
|
|
81
|
+
|
|
82
|
+
For pure doc / config changes with no code symbols: write "no code symbols — mmcg not applicable".
|
|
83
|
+
|
|
84
|
+
## Risk surface
|
|
85
|
+
|
|
86
|
+
<Known unknowns and failure modes for the proposed design. The critic will probe these. Being explicit here prevents the critic from marking concerns you already know about as `fail`.>
|
|
87
|
+
|
|
88
|
+
- <risk 1>: <what could go wrong>
|
|
89
|
+
- <risk 2>: <what could go wrong>
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
---
|
|
93
|
+
|
|
94
|
+
## Completeness rules
|
|
95
|
+
|
|
96
|
+
- **mmcg snapshot is mandatory** for any change touching code symbols.
|
|
97
|
+
- **Decision Matrix is mandatory** for standard and strict specs.
|
|
98
|
+
- **Codeflow diagrams** are required for non-trivial alternatives (multi-module, auth/billing/data-flow, API boundary, migration, ≥ 3 files). All nodes must be real files, symbols, modules, or external boundaries — verified via `mmcg_search` or explicitly marked `[NEW]`. Generic boxes (`User → System → Database`) are AI slop and will cause critic `fail` on dimension #6.
|
|
99
|
+
- **≥ 2 alternatives** for non-trivial changes. For trivial changes (one-line fix, doc edit, mechanical rename), write "trivial change — single approach".
|
|
100
|
+
- **Constraints section must name the actual hard limits** — not vague preferences. If there are none, write "none beyond standard project conventions".
|
|
101
|
+
|
|
102
|
+
## What NOT to include
|
|
103
|
+
|
|
104
|
+
- The full brainstorming conversation — that imports your bias.
|
|
105
|
+
- Speculative alternatives you haven't thought through — shallow rejection reasons give the critic nothing to work with.
|
|
106
|
+
- Implementation details that belong in the spec's Scope section — this packet is for design validation, not execution instructions.
|
|
107
|
+
|
|
108
|
+
---
|
|
109
|
+
|
|
110
|
+
## For 3-lens critic panels (strict mode)
|
|
111
|
+
|
|
112
|
+
When spawning three critics in parallel, prepend a lens directive to the same packet:
|
|
113
|
+
|
|
114
|
+
**Security lens:** "Review primarily for authz/authn holes, secret exposure, injection, and privilege escalation. Other dimensions are secondary."
|
|
115
|
+
|
|
116
|
+
**Performance lens:** "Review primarily for latency, throughput, memory pressure, and lock contention. Other dimensions are secondary."
|
|
117
|
+
|
|
118
|
+
**Simplicity lens:** "Review primarily for YAGNI violations, unnecessary abstraction, complexity creep, and AI slop. Other dimensions are secondary."
|
|
119
|
+
|
|
120
|
+
Same packet body, same mmcg snapshot, same alternatives — only the lens directive differs. Spawn all three in one message so they run concurrently.
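As a rough illustration of "same packet, different lens", the three critic inputs can be built from one packet string. The function name `build_panel_inputs` and the choice to separate directive and packet with a blank line are assumptions; the directive texts are copied from the lens list above.

```python
# Lens directives quoted from the packet reference.
LENSES = {
    "security": (
        "Review primarily for authz/authn holes, secret exposure, injection, "
        "and privilege escalation. Other dimensions are secondary."
    ),
    "performance": (
        "Review primarily for latency, throughput, memory pressure, "
        "and lock contention. Other dimensions are secondary."
    ),
    "simplicity": (
        "Review primarily for YAGNI violations, unnecessary abstraction, "
        "complexity creep, and AI slop. Other dimensions are secondary."
    ),
}


def build_panel_inputs(packet: str) -> dict[str, str]:
    """Prepend each lens directive to the identical packet body."""
    return {name: f"{directive}\n\n{packet}" for name, directive in LENSES.items()}
```

Building all three inputs up front makes it straightforward to spawn the critics in one message, since nothing in one input depends on another critic's result.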
|
|
@@ -93,11 +93,69 @@ You are <doing X> to achieve <Y, the goal in one sentence>.
|
|
|
93
93
|
|
|
94
94
|
The planner must enumerate ≥ 2 plausible approaches and explain why each was rejected. For trivial changes (one-line fix, doc edit, throwaway exploration), write "trivial change — single approach". The critic uses this section to avoid re-suggesting rejected options.
|
|
95
95
|
|
|
96
|
-
For
|
|
96
|
+
For non-trivial alternatives (multi-module, auth/billing/data-flow, API boundary, migration, anything touching ≥ 3 files), add a Mermaid codeflow diagram per alternative. Nodes must be real files, symbols, modules, or external boundaries — verified via `mmcg_search` or explicitly marked `[NEW]`. Keep each diagram ≤ 8 nodes. Generic boxes (`User → System → Database`) are AI slop and will be flagged by the critic.
|
|
97
97
|
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
98
|
+
### Alternative A — <name>
|
|
99
|
+
|
|
100
|
+
```mermaid
|
|
101
|
+
flowchart TD
|
|
102
|
+
<real_symbol_or_file> --> <real_symbol_or_file>
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
- **Grounding:** `mmcg_search <symbol>` → `<file:line>`, `mmcg_callers <symbol>` → `<N> callers`
|
|
106
|
+
- **Tradeoff:** <concrete>
|
|
107
|
+
- **Rejected because:** <concrete reason tied to mmcg findings or project constraint>
|
|
108
|
+
|
|
109
|
+
### Alternative B — <name>
|
|
110
|
+
|
|
111
|
+
```mermaid
|
|
112
|
+
flowchart TD
|
|
113
|
+
<real_symbol_or_file> --> <real_symbol_or_file>
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
- **Grounding:** `mmcg_search <symbol>` → `<file:line>`
|
|
117
|
+
- **Tradeoff:** <concrete>
|
|
118
|
+
- **Rejected because:** <reason>
|
|
119
|
+
|
|
120
|
+
### Picked approach — <name>
|
|
121
|
+
|
|
122
|
+
```mermaid
|
|
123
|
+
flowchart TD
|
|
124
|
+
<real_symbol_or_file> --> <real_symbol_or_file>
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
- **Grounding:** <mmcg evidence>
|
|
128
|
+
- **Chosen because:** <concrete reason>
|
|
129
|
+
|
|
130
|
+
---
|
|
131
|
+
|
|
132
|
+
## Decision Matrix *(required for standard/strict; skip for lite)*
|
|
133
|
+
|
|
134
|
+
Crystallizes the alternatives comparison into an objective grid. Fill one row per option in Alternatives Considered above.
|
|
135
|
+
|
|
136
|
+
| Option | Correctness | Complexity | Blast radius | Migration risk | Observability | Reversibility | Verdict |
|
|
137
|
+
|---|---|---|---|---|---|---|---|
|
|
138
|
+
| A — <name> | pass | low | low | low | good | easy | reject |
|
|
139
|
+
| B — <name> | concern | low | high | medium | weak | hard | reject |
|
|
140
|
+
| C — <name> | pass | medium | low | none | good | easy | **chosen** |
|
|
141
|
+
|
|
142
|
+
Column values: `pass / concern / fail` for Correctness; `low / medium / high` for complexity/blast/migration; `good / weak / none` for observability; `easy / medium / hard` for reversibility.
|
|
143
|
+
|
|
144
|
+
The `Verdict` column must be one of: `chosen`, `reject`, `candidate` (deferred alternative). Exactly one row gets `chosen`.
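The column rules above are mechanical enough to check before a spec goes to the critic. This is a sketch under assumptions: the dict-per-row representation, the key names, and `validate_matrix` itself are hypothetical, and `none` is accepted for migration risk only because the sample row C uses it even though the stated value list is `low / medium / high`.

```python
LEVEL = {"low", "medium", "high"}

# Allowed values per column, following the "Column values" line.
COLUMNS = {
    "correctness": {"pass", "concern", "fail"},
    "complexity": LEVEL,
    "blast_radius": LEVEL,
    "migration_risk": LEVEL | {"none"},  # "none" appears in the sample rows
    "observability": {"good", "weak", "none"},
    "reversibility": {"easy", "medium", "hard"},
    "verdict": {"chosen", "reject", "candidate"},
}


def validate_matrix(rows: list[dict[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means the matrix is well-formed."""
    problems = []
    for row in rows:
        for column, allowed in COLUMNS.items():
            if row.get(column) not in allowed:
                problems.append(f"{row.get('option', '?')}: {column}={row.get(column)!r}")
    chosen = sum(1 for row in rows if row.get("verdict") == "chosen")
    if chosen != 1:
        problems.append(f"expected exactly one 'chosen' row, found {chosen}")
    return problems
```

Returning every problem instead of raising on the first one lets the planner fix the whole table in a single pass.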
|
|
145
|
+
|
|
146
|
+
---
|
|
147
|
+
|
|
148
|
+
## Risk Register *(required for strict specs; skip for trivial/lite)*
|
|
149
|
+
|
|
150
|
+
Known risks of the chosen approach. Each risk must have mitigation assigned to a phase. If a risk has no mitigation, say so — don't omit it.
|
|
151
|
+
|
|
152
|
+
| Risk | Probability | Impact | Evidence | Mitigation | Owner phase |
|
|
153
|
+
|---|---|---|---|---|---|
|
|
154
|
+
| breaks existing callers | medium | high | `mmcg_callers X → 45` | preserve signature, add compat wrapper | Phase 1 |
|
|
155
|
+
| migration leaves stale data | low | high | schema diff shows nullable column | add backfill script, gate on count > 0 | Phase 2 |
|
|
156
|
+
| no prod observability | low | medium | assumption — new code path | add log line + metric in Phase 3 | Phase 3 |
|
|
157
|
+
|
|
158
|
+
A risk with `impact: high` and no mitigation = automatic critic `fail` on dimension #2 (Performance & scale) or #1 (Correctness).
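The automatic-fail rule above can be pre-checked by the planner. The sketch below assumes a register held as simple records mirroring the table columns; the `Risk` type, the convention that an empty mitigation string means "no mitigation", and the function name are illustrative, not part of the package.

```python
from typing import NamedTuple


class Risk(NamedTuple):
    risk: str
    probability: str
    impact: str
    evidence: str
    mitigation: str    # empty string records "no mitigation" explicitly
    owner_phase: str


def unmitigated_high_impact(register: list[Risk]) -> list[str]:
    """Names of risks that would trigger the automatic critic fail."""
    return [r.risk for r in register if r.impact == "high" and not r.mitigation.strip()]
```

A risk without mitigation still stays in the register, as the section requires; the check only surfaces the rows the critic is expected to fail.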
|
|
101
159
|
|
|
102
160
|
---
|
|
103
161
|
|
|
@@ -112,6 +170,24 @@ Auditor will re-run `mmcg_callers` / `mmcg_search` post-execution and surface an
|
|
|
112
170
|
|
|
113
171
|
---
|
|
114
172
|
|
|
173
|
+
## Evidence Ledger *(required for strict specs; skip for trivial/lite)*
|
|
174
|
+
|
|
175
|
+
Every non-trivial claim in this spec must be backed by one of: mmcg evidence, file evidence, a runnable command, user-provided input, or an explicit assumption. If a claim has no backing, it's a guess — name it as an assumption so the critic and auditor can flag it.
|
|
176
|
+
|
|
177
|
+
| Claim | Evidence type | Evidence | Confidence |
|
|
178
|
+
|---|---|---|---|
|
|
179
|
+
| `<symbol>` has N callers | mmcg | `mmcg_callers <symbol> → N` | high |
|
|
180
|
+
| `<file>` contains `<pattern>` | file | `grep '<pattern>' <file>` | high |
|
|
181
|
+
| no prod runtime | assumption | internal build script only; confirmed with user | medium |
|
|
182
|
+
| `<claim>` | user-provided | user stated in session on <date> | medium |
|
|
183
|
+
|
|
184
|
+
Rules:
|
|
185
|
+
- No `"this should be safe"` without an evidence row
|
|
186
|
+
- No `"existing callers are fine"` without a `mmcg_callers` count
|
|
187
|
+
- Assumptions are allowed, but must be explicit — they become critic `concern` targets
|
|
188
|
+
|
|
189
|
+
---
|
|
190
|
+
|
|
115
191
|
## Phase 1: <First logical step — name it by outcome, not process>
|
|
116
192
|
|
|
117
193
|
### 1.1 <Specific action — one verb, one location>
|
|
@@ -260,6 +336,10 @@ Explicit anti-patterns specific to this task. Distinct from the global Rules abo
|
|
|
260
336
|
- [ ] `mmcg_impact` on each symbol-to-be-changed agrees with this spec's stated scope
|
|
261
337
|
- [ ] `VERIFY:` commands look executable for this project
|
|
262
338
|
- [ ] **Alternatives Considered has ≥ 2 entries** (or "trivial change" justification)
|
|
339
|
+
- [ ] **Codeflow diagrams** present for every non-trivial alternative, each node mmcg-verified or marked `[NEW]` (or section explicitly skipped as trivial)
|
|
340
|
+
- [ ] **Decision Matrix** filled for standard/strict specs, or explicitly skipped (write "lite — no decision matrix")
|
|
341
|
+
- [ ] **Risk Register** filled for strict specs, or explicitly skipped (write "lite/standard — no risk register")
|
|
342
|
+
- [ ] **Evidence Ledger** — every non-trivial claim has a row, assumptions are explicit (or section skipped for lite)
|
|
263
343
|
- [ ] **Pre-edit symbol snapshot** filled via mmcg for every edited function/method (or section deleted if no code symbols touched)
|
|
264
344
|
- [ ] **Tests Plan is concrete** (per-test what's covered)
|
|
265
345
|
- [ ] **Documentation Plan** lists every doc touched
|