@event4u/agent-config 1.21.0 → 1.22.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (53) hide show
  1. package/.agent-src/commands/bug-fix.md +1 -0
  2. package/.agent-src/commands/bug-investigate.md +1 -0
  3. package/.agent-src/commands/challenge-me/vision.md +348 -0
  4. package/.agent-src/commands/challenge-me/with-docs.md +333 -0
  5. package/.agent-src/commands/challenge-me.md +61 -0
  6. package/.agent-src/commands/council/default.md +64 -17
  7. package/.agent-src/commands/create-pr.md +7 -3
  8. package/.agent-src/commands/grill-me.md +38 -0
  9. package/.agent-src/commands/judge/steps.md +1 -1
  10. package/.agent-src/commands/roadmap/ai-council.md +183 -0
  11. package/.agent-src/commands/roadmap/create.md +6 -1
  12. package/.agent-src/commands/roadmap/process-full.md +58 -0
  13. package/.agent-src/commands/roadmap/process-phase.md +69 -0
  14. package/.agent-src/commands/roadmap/process-step.md +57 -0
  15. package/.agent-src/commands/roadmap.md +44 -16
  16. package/.agent-src/commands/threat-model.md +1 -0
  17. package/.agent-src/contexts/augment-infrastructure.md +1 -1
  18. package/.agent-src/contexts/communication/rules-auto/slash-command-routing-policy-mechanics.md +53 -18
  19. package/.agent-src/contexts/execution/roadmap-process-loop.md +125 -0
  20. package/.agent-src/contexts/skills-and-commands.md +1 -1
  21. package/.agent-src/rules/improve-before-implement.md +1 -0
  22. package/.agent-src/rules/invite-challenge.md +71 -0
  23. package/.agent-src/skills/adversarial-review/SKILL.md +1 -0
  24. package/.agent-src/skills/ai-council/SKILL.md +132 -8
  25. package/.agent-src/skills/bug-analyzer/SKILL.md +1 -0
  26. package/.agent-src/skills/roadmap-management/SKILL.md +7 -7
  27. package/.agent-src/skills/systematic-debugging/SKILL.md +22 -2
  28. package/.agent-src/skills/technical-specification/SKILL.md +58 -1
  29. package/.agent-src/skills/threat-modeling/SKILL.md +1 -0
  30. package/.agent-src/templates/agent-settings.md +14 -3
  31. package/.agent-src/templates/command.md +17 -1
  32. package/.agent-src/templates/roadmaps.md +10 -2
  33. package/.agent-src/templates/rule.md +2 -0
  34. package/.agent-src/templates/skill.md +17 -0
  35. package/.claude-plugin/marketplace.json +9 -2
  36. package/AGENTS.md +2 -2
  37. package/CHANGELOG.md +38 -0
  38. package/README.md +1 -1
  39. package/config/agent-settings.template.yml +22 -0
  40. package/docs/architecture.md +2 -2
  41. package/docs/contracts/command-clusters.md +45 -1
  42. package/docs/decisions/ADR-003-flat-cluster-subs-and-colon-syntax.md +126 -0
  43. package/docs/getting-started.md +1 -1
  44. package/docs/guidelines/agent-infra/naming.md +1 -1
  45. package/package.json +1 -1
  46. package/scripts/_phase2_shim_helper.py +1 -1
  47. package/scripts/council_cli.py +127 -10
  48. package/scripts/migrate_command_suggestions.py +2 -2
  49. package/scripts/schemas/command.schema.json +5 -0
  50. package/scripts/schemas/rule.schema.json +5 -0
  51. package/scripts/schemas/skill.schema.json +5 -0
  52. package/scripts/skill_linter.py +1 -1
  53. package/.agent-src/commands/roadmap/execute.md +0 -109
@@ -0,0 +1,125 @@
1
+ # Roadmap-Process Loop
2
+
3
+ Loaded by [`/roadmap:process-step`](../../commands/roadmap/process-step.md),
4
+ [`/roadmap:process-phase`](../../commands/roadmap/process-phase.md), and
5
+ [`/roadmap:process-full`](../../commands/roadmap/process-full.md). Holds
6
+ the canonical autonomous-execution loop, roadmap discovery, cadence
7
+ resolution, commit-step pre-scan, halt conditions, and archival check.
8
+ The three command files are thin wrappers that bind only the **scope
9
+ delta**.
10
+
11
+ **Size budget:** ≤ 4,000 chars.
12
+
13
+ ## 1. Resolve roadmap
14
+
15
+ Search both locations:
16
+
17
+ - `agents/roadmaps/*.md` (project root)
18
+ - `app/Modules/*/agents/roadmaps/*.md` (module-scoped)
19
+
20
+ **Exclude** `template.md`, `archive/`, and `skipped/`.
21
+
22
+ - User named one (path, partial name, title) → use it.
23
+ - None named, single active roadmap (`count_open > 0`) → use it.
24
+ - None named, multiple active → default = **most recently modified**;
25
+ surface alternatives in the pre-run summary.
26
+ - None active → tell the user; suggest [`/roadmap:create`](../../commands/roadmap/create.md).
27
+
28
+ ## 2. Pre-run summary — single confirmation gate
29
+
30
+ Before the loop runs, show the resolved config in the user's language:
31
+
32
+ > Roadmap: `<resolved-path>`
33
+ > Phase 1: `<name>` — 3/5 done
34
+ > Phase 2: `<name>` — 0/4
35
+ > Next open step: `<description>`
36
+ > Scope: **step | phase | full**
37
+ > AI council: **on | off** (`<member list or "no members configured">`)
38
+ > Quality cadence: **end_of_roadmap | per_phase | per_step**
39
+ > Commit steps in roadmap: **N** (see § 3)
40
+ >
41
+ > 1. Go — start processing autonomously
42
+ > 2. Different roadmap · 3. Different scope · 4. Toggle council · 5. Abort
43
+
44
+ Skip the gate when scope, roadmap, and council are all unambiguous in
45
+ the invocation (e.g. `/roadmap:process-phase road-to-X.md with council`).
46
+
47
+ ## 3. Commit-step pre-scan — one upfront ask
48
+
49
+ Before step 4, scan the roadmap for explicit commit steps (lines
50
+ matching `commit:` / `git commit` / `Commit phase` patterns).
51
+
52
+ - **No commit steps** → nothing to ask. Never commit, never re-ask
53
+ per [`commit-policy`](../../rules/commit-policy.md).
54
+ - **Commit steps present, autonomous mode** (`personal.autonomy: on`,
55
+ or `auto` after opt-in) → ask **once** upfront:
56
+ > "Roadmap contains N commit steps. Authorize all of them for this
57
+ > run? (yes / no / list them)"
58
+ Cache the answer for the whole run; do **not** re-ask per step.
59
+ Hard-Floor diffs (bulk deletions, infra) still trigger the
60
+ per-commit gate from [`commit-mechanics`](../authority/commit-mechanics.md).
61
+ - **Commit steps present, non-autonomous** → ask before each commit
62
+ step inside the loop.
63
+
64
+ ## 4. Resolve quality cadence
65
+
66
+ Read `roadmap.quality_cadence` from `.agent-settings.yml` once:
67
+
68
+ | Value | Pipeline runs |
69
+ |---|---|
70
+ | `end_of_roadmap` (default) | Once, before archival (§ 6) |
71
+ | `per_phase` | At every phase boundary + § 6 |
72
+ | `per_step` | After every step + § 6 |
73
+
74
+ Missing / unreadable / unknown → fall back to `end_of_roadmap`.
75
+ The Iron Law [`verify-before-complete`](../../rules/verify-before-complete.md)
76
+ still mandates fresh quality output before any "complete" claim.
77
+
78
+ ## 5. Step loop
79
+
80
+ For each open step in the working set (scope-bound — see wrapper):
81
+
82
+ 1. Read the step description and inline notes.
83
+ 2. Analyze the codebase for what the step requires.
84
+ 3. Decide and act — implement. **No "should I implement this?" prompt.**
85
+ 4. **Open question handling:**
86
+ - **Council on** → invoke per [`ai-council`](../../skills/ai-council/SKILL.md),
87
+ integrate convergence, proceed. Token spend was opted in.
88
+ - **Council off** → halt, surface once, wait. Resume on next turn.
89
+ 5. Mark the checkbox: `[x]` done · `[~]` partial · `[-]` skipped.
90
+ 6. Regenerate the dashboard — `./agent-config roadmap:progress` — in
91
+ the **same response** per [`roadmap-progress-sync`](../../rules/roadmap-progress-sync.md).
92
+ 7. Run quality pipeline if cadence is `per_step`.
93
+
94
+ ### Halt conditions
95
+
96
+ - Hard-Floor trigger ([`non-destructive-by-default`](../../rules/non-destructive-by-default.md))
97
+ - Security-sensitive path ([`security-sensitive-stop`](../../rules/security-sensitive-stop.md))
98
+ - Step reveals work outside the roadmap's scope
99
+ - Test failure or quality red on `per_step`
100
+ - Council off + true ambiguity
101
+
102
+ On halt: stop, surface state, do **not** auto-fix outside the failing step.
103
+
104
+ ## 6. Final report and archival
105
+
106
+ - Summary: scope-bound (steps/phases done in this run), council
107
+ consultations count (if on), steps remaining, halts.
108
+ - Final dashboard regen.
109
+ - **If the entire roadmap reached `count_open == 0`** → run the full
110
+ project quality pipeline. On green → archival via the
111
+ [`roadmap-management`](../../skills/roadmap-management/SKILL.md) skill
112
+ (`git mv` to `agents/roadmaps/archive/`, regenerate dashboard). On
113
+ red → stop, surface failures, do **not** archive.
114
+
115
+ ## Scope deltas — what each wrapper binds
116
+
117
+ | Wrapper | Working set | Stop after |
118
+ |---|---|---|
119
+ | `process-step` | Single first open step | One iteration of § 5 |
120
+ | `process-phase` | All open steps in first phase with `count_open > 0` | Phase boundary; per-phase quality if cadence ≠ `end_of_roadmap` |
121
+ | `process-full` | Every open step across every phase, in order | Roadmap fully closed (or halt) |
122
+
123
+ `process-full` runs the per-phase quality pipeline at every phase
124
+ boundary when cadence is `per_phase` or `per_step`; on red it halts
125
+ before the next phase.
@@ -30,7 +30,7 @@ Commands often chain together. Here are the main workflows:
30
30
  ### Feature Development
31
31
 
32
32
  ```
33
- /feature-explore → /feature-plan → /feature-roadmap → /roadmap-execute
33
+ /feature-explore → /feature-plan → /feature-roadmap → /roadmap:process-phase
34
34
  ↓ ↓ ↓ ↓
35
35
  Brainstorm Structure Phases/Steps Implement
36
36
  ```
@@ -4,6 +4,7 @@ tier: "2b"
4
4
  description: "Before implementing features or architectural changes — validate the request against existing code, challenge weak requirements, and suggest improvements"
5
5
  alwaysApply: false
6
6
  source: package
7
+ council_depth: deep
7
8
  triggers:
8
9
  - intent: "implement feature"
9
10
  - intent: "architectural change"
@@ -0,0 +1,71 @@
1
+ ---
2
+ type: "auto"
3
+ tier: "2b"
4
+ description: "Before executing a complex plan or non-trivial design — proactively ask 'am I solving the right problem?' and pause for user confirmation, even when no ambiguity is detected"
5
+ alwaysApply: false
6
+ source: package
7
+ council_depth: deep
8
+ triggers:
9
+ - intent: "complex plan"
10
+ - intent: "design decision"
11
+ - intent: "architectural plan"
12
+ - intent: "multi-step implementation"
13
+ - keyword: "plan"
14
+ - keyword: "design"
15
+ - keyword: "architecture"
16
+ - keyword: "approach"
17
+ ---
18
+
19
+ # invite-challenge
20
+
21
+ ## The Iron Law
22
+
23
+ ```
24
+ BEFORE EXECUTING A COMPLEX PLAN — ASK ONCE:
25
+ "AM I SOLVING THE RIGHT PROBLEM?"
26
+ PAUSE. WAIT FOR CONFIRMATION. THEN PROCEED.
27
+ ```
28
+
29
+ Proactive confirmation checkpoint. Distinct from [`direct-answers`](direct-answers.md) (reactive — fires after misdirection is detected) and [`ask-when-uncertain`](ask-when-uncertain.md) (info-gathering — fires when context is missing). This rule fires when context is **sufficient** and direction **looks correct** but the plan is heavy enough that a silent misread would be expensive.
30
+
31
+ ## When to activate
32
+
33
+ Two or more of:
34
+
35
+ - Touches ≥ 3 files or ≥ 2 modules
36
+ - New abstraction, pattern, or library
37
+ - Security / billing / tenant / data-boundary path
38
+ - Migration, schema change, or backwards-incompatible API change
39
+ - Estimated effort > 30 min of agent work
40
+ - High-level goal ("rebuild the dashboard") rather than a bounded task ("rename this method")
41
+
42
+ **Does NOT activate for:** evidenced bug fixes · trivial edits · user-fenced tasks ("just do it") · steps inside an already-confirmed plan (ask once per plan, not per step) · cases handled by [`improve-before-implement`](improve-before-implement.md) Phase 1.
43
+
44
+ ## How to challenge
45
+
46
+ One question, one turn. Numbered options per [`user-interaction`](user-interaction.md):
47
+
48
+ ```
49
+ > Before I start, one check:
50
+ >
51
+ > Goal as I read it: {1-sentence restatement}
52
+ > Plan: {2-3 bullet shape, not full implementation}
53
+ > Risk if I misread: {what would be wasted}
54
+ >
55
+ > 1. Goal + plan match — proceed
56
+ > 2. Goal right, plan needs adjustment — say what
57
+ > 3. Goal wrong — let me restate
58
+ ```
59
+
60
+ The restatement is the load-bearing part. Pick 1 → execute. Pick 2 or 3 → fold the correction in, do not re-ask.
61
+
62
+ ## Scope limits
63
+
64
+ - **One challenge per plan**, not per step.
65
+ - **Skip when the user already restated the goal** in the same turn.
66
+ - **Skip when [`scope-control § fenced step`](scope-control.md) applies** — deliver the plan, hand back, no confirmation question appended.
67
+ - **Never argue the goal twice** — user's restatement is final for the turn.
68
+
69
+ ## Interactions
70
+
71
+ [`direct-answers`](direct-answers.md) fires after misdirection · [`ask-when-uncertain`](ask-when-uncertain.md) fires on missing context · [`improve-before-implement`](improve-before-implement.md) Phase 1 already runs a clarity check (do not double-ask) · [`scope-control § fenced step`](scope-control.md) handback wins · [`no-cheap-questions`](no-cheap-questions.md) — checkpoint must carry a real goal restatement, not a content-free "ready?".
@@ -4,6 +4,7 @@ description: "ONLY when user explicitly requests adversarial review, devil's adv
4
4
  personas:
5
5
  - critical-challenger
6
6
  source: package
7
+ council_depth: deep
7
8
  ---
8
9
 
9
10
  # Adversarial Review
@@ -43,6 +43,7 @@ prompt that asks them to think on their own merits.
43
43
  THE COUNCIL DOES NOT SEE THE HOST AGENT'S ANALYSIS.
44
44
  THE COUNCIL DOES NOT SEE PRIOR REPLIES.
45
45
  THE COUNCIL SEES THE ARTEFACT + THE NEUTRAL SYSTEM PROMPT. NOTHING ELSE.
46
+ THE HOST AGENT IS THE CONVENER, NEVER A REVIEWER.
46
47
  ```
47
48
 
48
49
  If you find yourself wanting to "frame" the artefact for the council,
@@ -50,6 +51,13 @@ stop. Framing is exactly what kills the second-opinion value. Use the
50
51
  unbiased system prompts in `scripts/ai_council/prompts.py`; do not
51
52
  roll your own.
52
53
 
54
+ The host runs the council and synthesises convergence — it is the
55
+ convener, not a reviewer. The reviewer-ban is structural: the host
56
+ wrote (or framed) the artefact and cannot critique it independently.
57
+ Anonymising the host as "Reviewer C" is worse than excluding it — the
58
+ user is told they got an outside vote when they did not. Externals
59
+ down → surface and skip; never substitute the host as a reviewer.
60
+
53
61
  ### Neutrality — context-handoff
54
62
 
55
63
  External reviewers do better critique when they know **what the
@@ -120,6 +128,27 @@ token counts (from the manual-paste length heuristic or the
120
128
  provider's reply, when available) for observability. Mixed runs
121
129
  (one manual + one api) gate only the api members.
122
130
 
131
+ ## Degradation modes
132
+
133
+ How the council behaves when fewer than two billable members are
134
+ reachable. The orchestrator never silently substitutes — degradation
135
+ is visible to the user.
136
+
137
+ | Reachable | Behaviour | Independence |
138
+ |---|---|---|
139
+ | **2+** | Full fan-out, multi-round debate. Default. | High — cross-provider diversity. |
140
+ | **1** | Single-voice critique with a degraded-run warning. Multi-round mode lets the model see its own anonymised reply, but convergence ≠ correctness. | Low — shared blind spots. |
141
+ | **0** | Council skipped. Surface the failure, proceed without external review. **Never** substitute the host or an unrequested manual pass. | None. |
142
+
143
+ Rejected anti-patterns (council convergence, 2026-05-06): persona
144
+ prompts (same model, same blind spots, more cost), temperature
145
+ spread (noise, not signal), host-as-fallback (Iron Law breach).
146
+ Supported single-provider strategy is **sibling models on the same
147
+ provider** (e.g. Sonnet ↔ Opus, gpt-4o ↔ o1) — different training
148
+ cutoffs / reasoning architectures within one provider family. Cost
149
+ is real (siblings price-tier higher); explicit opt-in per invocation,
150
+ not a default.
151
+
123
152
  ## Procedure
124
153
 
125
154
  1. **Resolve target.** Identify the artefact mode (`prompt`, `roadmap`,
@@ -140,9 +169,61 @@ provider's reply, when available) for observability. Mixed runs
140
169
  into the host agent's voice.
141
170
  6. **Summarise.** Write a `Convergence / Divergence` block listing
142
171
  agreements, disagreements, and unique insights — provider-attributed.
143
- 7. **Translate to options.** Convert actionable council suggestions
144
- into concrete numbered options for the user. The user decides;
145
- the council advises.
172
+ 7. **Critically evaluate** every finding before it leaves the host
173
+ (see *Critical evaluation* below). The host is the convener **and**
174
+ the skeptic — never a reviewer of the artefact itself, but always a
175
+ reviewer of the **council's output**.
176
+ 8. **Translate validated findings to options.** Convert each finding
177
+ the host accepts (or accepts with modification) into a concrete
178
+ numbered option for the user. Tag every option with the host's
179
+ verdict so the user sees the agent's reasoned position, not the
180
+ council's raw output. The user decides; the council advises; the
181
+ host filters.
182
+
183
+ ## Critical evaluation — convener-skeptic stance
184
+
185
+ ```
186
+ COUNCIL CONVERGENCE IS NOT CORRECTNESS.
187
+ DO NOT BLINDLY ACCEPT FINDINGS. DO NOT BLINDLY REJECT THEM.
188
+ EVERY FINDING GETS A REASONED VERDICT BEFORE IT REACHES THE USER.
189
+ ```
190
+
191
+ The council is **uninformed about the codebase, ADRs, locked
192
+ contracts, prior decisions, and project history** — it sees only the
193
+ artefact + neutrality preamble. That is the source of its diversity
194
+ **and** its blind spots. Convergence between members can mean shared
195
+ generic best-practice priors, not project-specific correctness.
196
+
197
+ The host applies a critical lens to **every finding** (convergence
198
+ **and** divergence) before surfacing it as a numbered option:
199
+
200
+ | Check | Question | Tool |
201
+ |---|---|---|
202
+ | **Codebase fit** | Does the finding match the actual code, files, signatures, conventions? | `view` / `codebase-retrieval` / `grep` |
203
+ | **Locked-decision conflict** | Does it contradict an ADR, kernel rule, contract under `docs/contracts/`, or `docs/decisions/`? | `view` |
204
+ | **Already addressed** | Is it a generic best-practice already covered by an existing rule, skill, or test? | `view` / `grep` |
205
+ | **Cost / benefit** | Is the change worth the diff size, churn, and review cost vs. the marginal benefit? | reasoning |
206
+ | **Hallucination** | Does the finding cite a file, function, or behavior that does not exist? | `view` |
207
+
208
+ Each finding receives one of three verdicts:
209
+
210
+ - **`accept`** — codebase fits, no locked-decision conflict, benefit clears cost. Surface as a normal numbered option.
211
+ - **`accept-with-modification`** — core insight valid, but the proposed shape needs adjusting (wrong file, contradicts ADR detail, scope creep). Surface with the **modified** patch and a one-line note.
212
+ - **`reject`** — finding is wrong (hallucinated reference, contradicts a locked decision, already addressed, generic noise). Surface as a **Rejected by host** entry with a one-line reason. Still visible — the user can override.
213
+
214
+ The verdict is the host's **own** reasoning, not the council's.
215
+ Pretending convergence equals correctness, or paraphrasing council
216
+ output as host analysis, both breach the [`direct-answers`](../../rules/direct-answers.md)
217
+ no-invented-facts rule. When the host cannot reach a confident
218
+ verdict on a finding (mixed evidence, ambiguous scope), it surfaces
219
+ the finding as `needs-input` with the open question — the user
220
+ decides, the host does not guess.
221
+
222
+ ### What this is NOT
223
+
224
+ - **Not a re-review by the host.** The host did not write the artefact independently and cannot critique it independently — that boundary still holds.
225
+ - **Not a vote against the council.** Rejecting a finding requires evidence (file, line, contract reference), not preference.
226
+ - **Not silent filtering.** Every finding reaches the user with its verdict and reason. The user can pick a `reject` option and override the host.
146
227
 
147
228
  ## Output path convention
148
229
 
@@ -195,16 +276,26 @@ Every council reply MUST contain, in this order:
195
276
  containing the member's verbatim output.
196
277
  3. **Convergence / Divergence summary** — bullet list, every claim
197
278
  attributed by provider name.
198
- 4. **User-facing options** — numbered block per `user-interaction`,
199
- with "discard council input" always present as an option.
200
-
201
- The host agent NEVER ships council output as its own reasoning.
202
- Provider attribution stays visible in every render.
279
+ 4. **Host verdict per finding** — one row per finding with `accept`
280
+ / `accept-with-modification` / `reject` / `needs-input` plus a
281
+ one-line reason citing host evidence (file:line, ADR, contract).
282
+ See *Critical evaluation* above.
283
+ 5. **User-facing options** numbered block per `user-interaction`,
284
+ carrying the host verdict in each option, with "discard council
285
+ input" always present as an option.
286
+
287
+ The host agent NEVER ships council output as its own reasoning, and
288
+ NEVER ships the host verdict as council output. Provider attribution
289
+ stays visible in the per-member sections; host verdicts stay
290
+ attributed to the host.
203
291
 
204
292
  ## Do NOT
205
293
 
206
294
  - Do NOT paraphrase council output into the host agent's voice — strip
207
295
  attribution and you've stripped the value.
296
+ - Do NOT surface council findings to the user without a host verdict
297
+ — convergence ≠ correctness, and the user deserves the agent's
298
+ reasoned filter, not a raw forward.
208
299
  - Do NOT pre-warm the council with the host agent's analysis or
209
300
  identity — that primes the reviewer and collapses diversity.
210
301
  - Do NOT silently truncate a too-large bundle — surface the size and
@@ -236,6 +327,9 @@ Real failure modes seen in the wild:
236
327
  | Silently truncate a too-large bundle. | Misleads the reviewer into thinking they saw the whole thing. | Bundler raises `BundleTooLarge`; surface and ask for narrower scope. |
237
328
  | Reuse the same SDK client across calls without re-loading the key. | Leaks the key in long-lived process state. | Each invocation builds fresh clients from `load_*_key()`. |
238
329
  | Auto-spend tokens under `personal.autonomy: on`. | Autonomy ≠ permission to spend money. | Always ask before consultation, even under autonomy. |
330
+ | Forward council convergence to the user as numbered options without a host verdict. | Convergence ≠ correctness; the council never saw the codebase. | Apply the *Critical evaluation* lens; tag every finding `accept` / `accept-with-modification` / `reject` / `needs-input` with one-line reason. |
331
+ | Reject a finding on preference, not evidence. | "I don't like this" is not a verdict. | Cite the file, line, ADR, or contract that justifies the rejection — or surface as `needs-input`. |
332
+ | Paraphrase council output into the host's own analysis to defend a verdict. | Strips attribution, breaches `direct-answers` no-invented-facts. | Verdict cites host evidence (file:line); council output stays attributed in the per-member sections. |
239
333
 
240
334
  ## Redaction expectations
241
335
 
@@ -343,9 +437,39 @@ at least once before convergence). The host agent does **not** ask
343
437
  the settings owner already made that decision. Ask only when a
344
438
  genuinely complex artefact justifies more depth than the default.
345
439
 
440
+ ### Deep-reasoning tier (`council_depth: deep`)
441
+
442
+ Architecture review, refactoring proposals, and bug-diagnosis runs
443
+ benefit from an extra critique round. The deep tier is opt-in per
444
+ artefact:
445
+
446
+ 1. The consuming **rule, skill, or command** declares
447
+ `council_depth: deep` in its frontmatter. The schema accepts
448
+ **only `deep`** — `standard` is the implicit default and is
449
+ rejected by the linter (every frontmatter byte costs context
450
+ window; see `scripts/schemas/{rule,skill,command}.schema.json`).
451
+ To return an artefact to default depth, **delete the key**.
452
+ 2. The **host agent** reads that frontmatter when it dispatches the
453
+ council and passes `--depth deep` to `council_cli`. If multiple
454
+ active artefacts disagree, **deep wins** (max policy).
455
+ 3. The **CLI** floors the round count at
456
+ `max(ai_council.deep_min_rounds, ai_council.min_rounds)` —
457
+ defaults to `3` and `2` respectively. Lowering `deep_min_rounds`
458
+ below `min_rounds` has no effect (defensive max).
459
+
460
+ The CLI itself has no knowledge of frontmatter; the contract is the
461
+ flag. Resolution chain (highest priority first):
462
+
463
+ ```
464
+ --rounds N → explicit, any value (user override)
465
+ --depth deep → max(deep_min_rounds, min_rounds)
466
+ (no flag) → min_rounds (default 2)
467
+ ```
468
+
346
469
  | Property | Behaviour |
347
470
  |---|---|
348
471
  | Default count | `ai_council.min_rounds` (default `2`). Override per-invocation with `rounds:N` (or `--rounds N` to the CLI). |
472
+ | Deep floor | `ai_council.deep_min_rounds` (default `3`). Activated by `council_depth: deep` in artefact frontmatter (host translates to `--depth deep`) or explicit `--depth deep` on the CLI. Floored at `min_rounds`. |
349
473
  | Anonymisation | Provider/model identity is stripped. Reviewers are labelled `Reviewer A / B / C…` in input order. |
350
474
  | Errored prior responses | Skipped — they reveal nothing useful and can leak provider error formats. |
351
475
  | Cost budget | Accumulates across rounds. A round-2 call that breaches the cap fires `on_overrun` exactly like a round-1 breach. |
@@ -2,6 +2,7 @@
2
2
  name: bug-analyzer
3
3
  description: "Use when the user shares a Sentry error, Jira bug ticket, or error description and wants root cause analysis. Also for proactive bug hunting and code audits for hidden bugs."
4
4
  source: package
5
+ council_depth: deep
5
6
  ---
6
7
 
7
8
  # bug-analyzer
@@ -9,8 +9,8 @@ source: package
9
9
  ## When to use
10
10
 
11
11
  Use this skill when:
12
- - Creating a new roadmap (`roadmap-create` command)
13
- - Executing a roadmap (`roadmap-execute` command)
12
+ - Creating a new roadmap (`/roadmap:create` command)
13
+ - Executing a roadmap (`/roadmap:process-step|phase|full` commands)
14
14
  - Checking roadmap progress
15
15
  - Updating roadmap status after completing work
16
16
 
@@ -119,8 +119,8 @@ Every roadmap follows this structure:
119
119
 
120
120
  Every roadmap implicitly includes the project's quality pipeline
121
121
  (static analysis, autofixes, tests). What's configurable is **when**
122
- the pipeline runs during `/roadmap execute`, controlled by
123
- `roadmap.quality_cadence` in `.agent-settings.yml`:
122
+ the pipeline runs during `/roadmap:process-step|phase|full`,
123
+ controlled by `roadmap.quality_cadence` in `.agent-settings.yml`:
124
124
 
125
125
  | Cadence | Pipeline runs | Trade-off |
126
126
  |---|---|---|
@@ -164,7 +164,7 @@ unfamiliar codebases.
164
164
  evidence-based reason (e.g. risky migration benefits from a spike).
165
165
  Never include release versions, deprecation dates, or git tags in
166
166
  the roadmap text. If the user declines, do **not** re-propose during
167
- `roadmap-execute`. Decline = silence. See [`scope-control`](../../rules/scope-control.md#decline--silence--no-re-asking-on-the-same-task).
167
+ `/roadmap:process-*`. Decline = silence. See [`scope-control`](../../rules/scope-control.md#decline--silence--no-re-asking-on-the-same-task).
168
168
  5. Save with a kebab-case filename (e.g. `optimize-webhook-jobs.md`).
169
169
  **Before writing**, scan the entire roadmap namespace for a
170
170
  collision — active, `archive/`, `skipped/`, and nested subdirs —
@@ -289,8 +289,8 @@ is rewritten by `.augment/scripts/update_roadmap_progress.py`.
289
289
  **Always regenerate in the SAME response** after any of the following
290
290
  (enforced by [`roadmap-progress-sync`](../../rules/roadmap-progress-sync.md)):
291
291
 
292
- - Creating a new roadmap (`roadmap-create`)
293
- - Marking a step `[x]`, `[~]`, or `[-]` during `roadmap-execute`
292
+ - Creating a new roadmap (`/roadmap:create`)
293
+ - Marking a step `[x]`, `[~]`, or `[-]` during `/roadmap:process-*`
294
294
  - Archiving or moving a roadmap to `skipped/`
295
295
  - Adding, renaming, or removing a phase
296
296
 
@@ -2,6 +2,7 @@
2
2
  name: systematic-debugging
3
3
  description: "Use when hitting a bug, test failure, crash, or unexpected behavior — enforces reproduce → isolate → hypothesize → verify before any fix — even when the user just says 'this is broken' or 'quick fix'."
4
4
  source: package
5
+ council_depth: deep
5
6
  ---
6
7
 
7
8
  # systematic-debugging
@@ -35,10 +36,28 @@ papers over an unknown cause is a regression waiting to happen.
35
36
 
36
37
  ```
37
38
  NO FIX WITHOUT ROOT CAUSE. NO ROOT CAUSE WITHOUT EVIDENCE.
39
+ NO BUG MARKED FIXED WITHOUT A REGRESSION TEST.
38
40
  ```
39
41
 
40
42
  "I think it's probably X" is not evidence. A log line, a stack trace, a
41
- diff, a reproduced failure — those are evidence.
43
+ diff, a reproduced failure — those are evidence. A green run after a
44
+ manual edit is not a regression test — a test that fails without the fix
45
+ and passes with it, is.
46
+
47
+ ## The 6-phase loop (the spine)
48
+
49
+ Every debug session walks these six phases in order. Treat them as a
50
+ checklist — tick each box before claiming the bug fixed:
51
+
52
+ - [ ] **1. Reproduce** — bug triggers on demand, smallest possible setup _(Phase 1)_
53
+ - [ ] **2. Minimize** — smallest failing case isolated, irrelevant context stripped _(Phase 1, step 2)_
54
+ - [ ] **3. Hypothesize** — one testable theory stated in one sentence _(Phase 3)_
55
+ - [ ] **4. Instrument** — log / breakpoint / trace at the boundary where expected ≠ actual _(Phase 2)_
56
+ - [ ] **5. Fix** — single, minimal change targeting the root cause _(Phase 4, step 2)_
57
+ - [ ] **6. Regression-test** — failing test added that catches the bug returning **(MANDATORY — no exception)** _(Phase 4, step 1 + Validation checklist)_
58
+
59
+ Skipping a box (especially #2 or #6) is the single biggest cause of
60
+ wasted debug time and re-opened bugs.
42
61
 
43
62
  ## Procedure
44
63
 
@@ -242,10 +261,11 @@ When reporting debug findings to the user:
242
261
 
243
262
  Before declaring a bug fixed:
244
263
 
264
+ * [ ] **6-phase loop** — all six boxes (Reproduce → Minimize → Hypothesize → Instrument → Fix → Regression-test) ticked
245
265
  * [ ] The failure was reproduced before any code changed
246
266
  * [ ] The root cause is named explicitly, not "probably"
247
267
  * [ ] Evidence (log, trace, diff) supports the named root cause
248
- * [ ] A failing test reproducing the bug was added or updated
268
+ * [ ] **Regression test added — MANDATORY**: a test that fails without the fix and passes with it. No exception. "Manual reproduction confirmed gone" is not a regression test
249
269
  * [ ] The fix is minimal and targets the root cause, not the symptom
250
270
  * [ ] The regression test now passes
251
271
  * [ ] Adjacent tests still pass
@@ -1,7 +1,8 @@
1
1
  ---
2
2
  name: technical-specification
3
- description: "Use when the user says "write a spec", "create RFC", or "document this decision". Writes technical specifications, RFCs, and ADRs with clear structure."
3
+ description: "Use when the user says "write a spec", "create RFC", "write a PRD", or "document this decision". Writes technical specifications, PRDs, RFCs, and ADRs with clear structure."
4
4
  source: package
5
+ council_depth: deep
5
6
  ---
6
7
 
7
8
  # technical-specification
@@ -10,6 +11,7 @@ source: package
10
11
 
11
12
  Use this skill when:
12
13
  - Writing a technical specification for a new feature or system
14
+ - Writing a Product Requirements Document (PRD) when user pain leads and solution shape follows
13
15
  - Creating an architecture decision record (ADR)
14
16
  - Documenting a technical RFC (Request for Comments)
15
17
  - Planning a significant technical change that needs team review
@@ -69,6 +71,59 @@ For complex features or systems. Stored in `agents/features/` or module `agents/
69
71
  - {Links to related docs, tickets, or external resources}
70
72
  ```
71
73
 
74
+ ### Product Requirements Document (PRD)
75
+
76
+ For features whose **shape is product-driven** rather than implementation-driven —
77
+ when "what users get" matters more than "how the code is wired". Use a PRD
78
+ when a Technical Specification would over-index on internals and under-index
79
+ on user value. Stored in `agents/features/` next to the technical spec when
80
+ both exist.
81
+
82
+ ```markdown
83
+ # PRD: {Title}
84
+
85
+ ## Status
86
+ { Draft | In Review | Approved | Shipped | Parked }
87
+
88
+ ## Problem
89
+ {The user-visible pain. One paragraph. Cite evidence — support tickets,
90
+ analytics, user quotes — not assumptions.}
91
+
92
+ ## Success Criteria
93
+ - {Measurable outcome with a number and a timeframe}
94
+ - {Another measurable outcome}
95
+
96
+ ## Non-Goals
97
+ - {What this PRD explicitly does NOT cover — protect scope}
98
+
99
+ ## User Stories
100
+ - As a {role}, I want to {action}, so that {outcome}.
101
+ - As a {role}, I want to {action}, so that {outcome}.
102
+
103
+ ## Major Modules
104
+ {The 3-7 functional building blocks the feature decomposes into. One
105
+ sentence each — what it does, not how. Implementation lives in the
106
+ technical spec, not here.}
107
+
108
+ - **{Module name}** — {one-sentence responsibility}
109
+ - **{Module name}** — {one-sentence responsibility}
110
+
111
+ ## Open Questions
112
+ - [ ] {Unresolved product question — pricing, permission, copy, edge case}
113
+
114
+ ## References
115
+ - {Links to research, related PRDs, technical spec when it exists}
116
+ ```
117
+
118
+ **PRD vs Technical Specification — when to pick which:**
119
+
120
+ - **PRD first** when the user pain is clear but the solution shape is not
121
+ ("users can't share dashboards" → PRD scopes the problem; spec follows).
122
+ - **Spec first** when the solution shape is clear but the implementation
123
+ is non-trivial ("rebuild auth on OAuth2" → spec leads; PRD often skipped).
124
+ - **Both** when the feature is large enough that product and engineering
125
+ decisions belong in different artifacts.
126
+
72
127
  ### Architecture Decision Record (ADR)
73
128
 
74
129
  For significant technical decisions. Stored in `agents/decisions/`.
@@ -161,6 +216,8 @@ If a developer reads only this document, they should be able to build it.
161
216
  ## Auto-trigger keywords
162
217
 
163
218
  - technical spec
219
+ - PRD
220
+ - product requirements
164
221
  - RFC
165
222
  - ADR
166
223
  - architecture decision
@@ -2,6 +2,7 @@
2
2
  name: threat-modeling
3
3
  description: "Use when adding auth, webhooks, uploads, queues, secrets, tenant boundaries, or public endpoints — produces trust boundaries + abuse cases mapped to files, BEFORE implementation."
4
4
  source: package
5
+ council_depth: deep
5
6
  ---
6
7
 
7
8
  # threat-modeling