@event4u/agent-config 1.21.0 → 1.22.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.agent-src/commands/bug-fix.md +1 -0
- package/.agent-src/commands/bug-investigate.md +1 -0
- package/.agent-src/commands/challenge-me/vision.md +348 -0
- package/.agent-src/commands/challenge-me/with-docs.md +333 -0
- package/.agent-src/commands/challenge-me.md +61 -0
- package/.agent-src/commands/council/default.md +64 -17
- package/.agent-src/commands/create-pr.md +7 -3
- package/.agent-src/commands/grill-me.md +38 -0
- package/.agent-src/commands/judge/steps.md +1 -1
- package/.agent-src/commands/roadmap/ai-council.md +183 -0
- package/.agent-src/commands/roadmap/create.md +6 -1
- package/.agent-src/commands/roadmap/process-full.md +58 -0
- package/.agent-src/commands/roadmap/process-phase.md +69 -0
- package/.agent-src/commands/roadmap/process-step.md +57 -0
- package/.agent-src/commands/roadmap.md +44 -16
- package/.agent-src/commands/threat-model.md +1 -0
- package/.agent-src/contexts/augment-infrastructure.md +1 -1
- package/.agent-src/contexts/communication/rules-auto/slash-command-routing-policy-mechanics.md +53 -18
- package/.agent-src/contexts/execution/roadmap-process-loop.md +125 -0
- package/.agent-src/contexts/skills-and-commands.md +1 -1
- package/.agent-src/rules/improve-before-implement.md +1 -0
- package/.agent-src/rules/invite-challenge.md +71 -0
- package/.agent-src/skills/adversarial-review/SKILL.md +1 -0
- package/.agent-src/skills/ai-council/SKILL.md +132 -8
- package/.agent-src/skills/bug-analyzer/SKILL.md +1 -0
- package/.agent-src/skills/roadmap-management/SKILL.md +7 -7
- package/.agent-src/skills/systematic-debugging/SKILL.md +22 -2
- package/.agent-src/skills/technical-specification/SKILL.md +58 -1
- package/.agent-src/skills/threat-modeling/SKILL.md +1 -0
- package/.agent-src/templates/agent-settings.md +14 -3
- package/.agent-src/templates/command.md +17 -1
- package/.agent-src/templates/roadmaps.md +10 -2
- package/.agent-src/templates/rule.md +2 -0
- package/.agent-src/templates/skill.md +17 -0
- package/.claude-plugin/marketplace.json +9 -2
- package/AGENTS.md +2 -2
- package/CHANGELOG.md +38 -0
- package/README.md +1 -1
- package/config/agent-settings.template.yml +22 -0
- package/docs/architecture.md +2 -2
- package/docs/contracts/command-clusters.md +45 -1
- package/docs/decisions/ADR-003-flat-cluster-subs-and-colon-syntax.md +126 -0
- package/docs/getting-started.md +1 -1
- package/docs/guidelines/agent-infra/naming.md +1 -1
- package/package.json +1 -1
- package/scripts/_phase2_shim_helper.py +1 -1
- package/scripts/council_cli.py +127 -10
- package/scripts/migrate_command_suggestions.py +2 -2
- package/scripts/schemas/command.schema.json +5 -0
- package/scripts/schemas/rule.schema.json +5 -0
- package/scripts/schemas/skill.schema.json +5 -0
- package/scripts/skill_linter.py +1 -1
- package/.agent-src/commands/roadmap/execute.md +0 -109
|
@@ -0,0 +1,125 @@
|
|
|
1
|
+
# Roadmap-Process Loop
|
|
2
|
+
|
|
3
|
+
Loaded by [`/roadmap:process-step`](../../commands/roadmap/process-step.md),
|
|
4
|
+
[`/roadmap:process-phase`](../../commands/roadmap/process-phase.md), and
|
|
5
|
+
[`/roadmap:process-full`](../../commands/roadmap/process-full.md). Holds
|
|
6
|
+
the canonical autonomous-execution loop, roadmap discovery, cadence
|
|
7
|
+
resolution, commit-step pre-scan, halt conditions, and archival check.
|
|
8
|
+
The three command files are thin wrappers that bind only the **scope
|
|
9
|
+
delta**.
|
|
10
|
+
|
|
11
|
+
**Size budget:** ≤ 4,000 chars.
|
|
12
|
+
|
|
13
|
+
## 1. Resolve roadmap
|
|
14
|
+
|
|
15
|
+
Search both locations:
|
|
16
|
+
|
|
17
|
+
- `agents/roadmaps/*.md` (project root)
|
|
18
|
+
- `app/Modules/*/agents/roadmaps/*.md` (module-scoped)
|
|
19
|
+
|
|
20
|
+
**Exclude** `template.md`, `archive/`, and `skipped/`.
|
|
21
|
+
|
|
22
|
+
- User named one (path, partial name, title) → use it.
|
|
23
|
+
- None named, single active roadmap (`count_open > 0`) → use it.
|
|
24
|
+
- None named, multiple active → default = **most recently modified**;
|
|
25
|
+
surface alternatives in the pre-run summary.
|
|
26
|
+
- None active → tell the user; suggest [`/roadmap:create`](../../commands/roadmap/create.md).
|
|
27
|
+
|
|
28
|
+
## 2. Pre-run summary — single confirmation gate
|
|
29
|
+
|
|
30
|
+
Before the loop runs, show the resolved config in the user's language:
|
|
31
|
+
|
|
32
|
+
> Roadmap: `<resolved-path>`
|
|
33
|
+
> Phase 1: `<name>` — 3/5 done
|
|
34
|
+
> Phase 2: `<name>` — 0/4
|
|
35
|
+
> Next open step: `<description>`
|
|
36
|
+
> Scope: **step | phase | full**
|
|
37
|
+
> AI council: **on | off** (`<member list or "no members configured">`)
|
|
38
|
+
> Quality cadence: **end_of_roadmap | per_phase | per_step**
|
|
39
|
+
> Commit steps in roadmap: **N** (see § 3)
|
|
40
|
+
>
|
|
41
|
+
> 1. Go — start processing autonomously
|
|
42
|
+
> 2. Different roadmap · 3. Different scope · 4. Toggle council · 5. Abort
|
|
43
|
+
|
|
44
|
+
Skip the gate when scope, roadmap, and council are all unambiguous in
|
|
45
|
+
the invocation (e.g. `/roadmap:process-phase road-to-X.md with council`).
|
|
46
|
+
|
|
47
|
+
## 3. Commit-step pre-scan — one upfront ask
|
|
48
|
+
|
|
49
|
+
Before step 4, scan the roadmap for explicit commit steps (lines
|
|
50
|
+
matching `commit:` / `git commit` / `Commit phase` patterns).
|
|
51
|
+
|
|
52
|
+
- **No commit steps** → nothing to ask. Never commit, never re-ask
|
|
53
|
+
per [`commit-policy`](../../rules/commit-policy.md).
|
|
54
|
+
- **Commit steps present, autonomous mode** (`personal.autonomy: on`,
|
|
55
|
+
or `auto` after opt-in) → ask **once** upfront:
|
|
56
|
+
> "Roadmap contains N commit steps. Authorize all of them for this
|
|
57
|
+
> run? (yes / no / list them)"
|
|
58
|
+
Cache the answer for the whole run; do **not** re-ask per step.
|
|
59
|
+
Hard-Floor diffs (bulk deletions, infra) still trigger the
|
|
60
|
+
per-commit gate from [`commit-mechanics`](../authority/commit-mechanics.md).
|
|
61
|
+
- **Commit steps present, non-autonomous** → ask before each commit
|
|
62
|
+
step inside the loop.
|
|
63
|
+
|
|
64
|
+
## 4. Resolve quality cadence
|
|
65
|
+
|
|
66
|
+
Read `roadmap.quality_cadence` from `.agent-settings.yml` once:
|
|
67
|
+
|
|
68
|
+
| Value | Pipeline runs |
|
|
69
|
+
|---|---|
|
|
70
|
+
| `end_of_roadmap` (default) | Once, before archival (§ 6) |
|
|
71
|
+
| `per_phase` | At every phase boundary + § 6 |
|
|
72
|
+
| `per_step` | After every step + § 6 |
|
|
73
|
+
|
|
74
|
+
Missing / unreadable / unknown → fall back to `end_of_roadmap`.
|
|
75
|
+
The Iron Law [`verify-before-complete`](../../rules/verify-before-complete.md)
|
|
76
|
+
still mandates fresh quality output before any "complete" claim.
|
|
77
|
+
|
|
78
|
+
## 5. Step loop
|
|
79
|
+
|
|
80
|
+
For each open step in the working set (scope-bound — see wrapper):
|
|
81
|
+
|
|
82
|
+
1. Read the step description and inline notes.
|
|
83
|
+
2. Analyze the codebase for what the step requires.
|
|
84
|
+
3. Decide and act — implement. **No "should I implement this?" prompt.**
|
|
85
|
+
4. **Open question handling:**
|
|
86
|
+
- **Council on** → invoke per [`ai-council`](../../skills/ai-council/SKILL.md),
|
|
87
|
+
integrate convergence, proceed. Token spend was opted in.
|
|
88
|
+
- **Council off** → halt, surface once, wait. Resume on next turn.
|
|
89
|
+
5. Mark the checkbox: `[x]` done · `[~]` partial · `[-]` skipped.
|
|
90
|
+
6. Regenerate the dashboard — `./agent-config roadmap:progress` — in
|
|
91
|
+
the **same response** per [`roadmap-progress-sync`](../../rules/roadmap-progress-sync.md).
|
|
92
|
+
7. Run quality pipeline if cadence is `per_step`.
|
|
93
|
+
|
|
94
|
+
### Halt conditions
|
|
95
|
+
|
|
96
|
+
- Hard-Floor trigger ([`non-destructive-by-default`](../../rules/non-destructive-by-default.md))
|
|
97
|
+
- Security-sensitive path ([`security-sensitive-stop`](../../rules/security-sensitive-stop.md))
|
|
98
|
+
- Step reveals work outside the roadmap's scope
|
|
99
|
+
- Test failure or quality red on `per_step`
|
|
100
|
+
- Council off + true ambiguity
|
|
101
|
+
|
|
102
|
+
On halt: stop, surface state, do **not** auto-fix outside the failing step.
|
|
103
|
+
|
|
104
|
+
## 6. Final report and archival
|
|
105
|
+
|
|
106
|
+
- Summary: scope-bound (steps/phases done in this run), council
|
|
107
|
+
consultations count (if on), steps remaining, halts.
|
|
108
|
+
- Final dashboard regen.
|
|
109
|
+
- **If the entire roadmap reached `count_open == 0`** → run the full
|
|
110
|
+
project quality pipeline. On green → archival via the
|
|
111
|
+
[`roadmap-management`](../../skills/roadmap-management/SKILL.md) skill
|
|
112
|
+
(`git mv` to `agents/roadmaps/archive/`, regenerate dashboard). On
|
|
113
|
+
red → stop, surface failures, do **not** archive.
|
|
114
|
+
|
|
115
|
+
## Scope deltas — what each wrapper binds
|
|
116
|
+
|
|
117
|
+
| Wrapper | Working set | Stop after |
|
|
118
|
+
|---|---|---|
|
|
119
|
+
| `process-step` | Single first open step | One iteration of § 5 |
|
|
120
|
+
| `process-phase` | All open steps in first phase with `count_open > 0` | Phase boundary; per-phase quality if cadence ≠ `end_of_roadmap` |
|
|
121
|
+
| `process-full` | Every open step across every phase, in order | Roadmap fully closed (or halt) |
|
|
122
|
+
|
|
123
|
+
`process-full` runs the per-phase quality pipeline at every phase
|
|
124
|
+
boundary when cadence is `per_phase` or `per_step`; on red it halts
|
|
125
|
+
before the next phase.
|
|
@@ -30,7 +30,7 @@ Commands often chain together. Here are the main workflows:
|
|
|
30
30
|
### Feature Development
|
|
31
31
|
|
|
32
32
|
```
|
|
33
|
-
/feature-explore → /feature-plan → /feature-roadmap → /roadmap-
|
|
33
|
+
/feature-explore → /feature-plan → /feature-roadmap → /roadmap:process-phase
|
|
34
34
|
↓ ↓ ↓ ↓
|
|
35
35
|
Brainstorm Structure Phases/Steps Implement
|
|
36
36
|
```
|
|
@@ -4,6 +4,7 @@ tier: "2b"
|
|
|
4
4
|
description: "Before implementing features or architectural changes — validate the request against existing code, challenge weak requirements, and suggest improvements"
|
|
5
5
|
alwaysApply: false
|
|
6
6
|
source: package
|
|
7
|
+
council_depth: deep
|
|
7
8
|
triggers:
|
|
8
9
|
- intent: "implement feature"
|
|
9
10
|
- intent: "architectural change"
|
|
@@ -0,0 +1,71 @@
|
|
|
1
|
+
---
|
|
2
|
+
type: "auto"
|
|
3
|
+
tier: "2b"
|
|
4
|
+
description: "Before executing a complex plan or non-trivial design — proactively ask 'am I solving the right problem?' and pause for user confirmation, even when no ambiguity is detected"
|
|
5
|
+
alwaysApply: false
|
|
6
|
+
source: package
|
|
7
|
+
council_depth: deep
|
|
8
|
+
triggers:
|
|
9
|
+
- intent: "complex plan"
|
|
10
|
+
- intent: "design decision"
|
|
11
|
+
- intent: "architectural plan"
|
|
12
|
+
- intent: "multi-step implementation"
|
|
13
|
+
- keyword: "plan"
|
|
14
|
+
- keyword: "design"
|
|
15
|
+
- keyword: "architecture"
|
|
16
|
+
- keyword: "approach"
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
# invite-challenge
|
|
20
|
+
|
|
21
|
+
## The Iron Law
|
|
22
|
+
|
|
23
|
+
```
|
|
24
|
+
BEFORE EXECUTING A COMPLEX PLAN — ASK ONCE:
|
|
25
|
+
"AM I SOLVING THE RIGHT PROBLEM?"
|
|
26
|
+
PAUSE. WAIT FOR CONFIRMATION. THEN PROCEED.
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
Proactive confirmation checkpoint. Distinct from [`direct-answers`](direct-answers.md) (reactive — fires after misdirection is detected) and [`ask-when-uncertain`](ask-when-uncertain.md) (info-gathering — fires when context is missing). This rule fires when context is **sufficient** and direction **looks correct** but the plan is heavy enough that a silent misread would be expensive.
|
|
30
|
+
|
|
31
|
+
## When to activate
|
|
32
|
+
|
|
33
|
+
Two or more of:
|
|
34
|
+
|
|
35
|
+
- Touches ≥ 3 files or ≥ 2 modules
|
|
36
|
+
- New abstraction, pattern, or library
|
|
37
|
+
- Security / billing / tenant / data-boundary path
|
|
38
|
+
- Migration, schema change, or backwards-incompatible API change
|
|
39
|
+
- Estimated effort > 30 min of agent work
|
|
40
|
+
- High-level goal ("rebuild the dashboard") rather than a bounded task ("rename this method")
|
|
41
|
+
|
|
42
|
+
**Does NOT activate for:** evidenced bug fixes · trivial edits · user-fenced tasks ("just do it") · steps inside an already-confirmed plan (ask once per plan, not per step) · cases handled by [`improve-before-implement`](improve-before-implement.md) Phase 1.
|
|
43
|
+
|
|
44
|
+
## How to challenge
|
|
45
|
+
|
|
46
|
+
One question, one turn. Numbered options per [`user-interaction`](user-interaction.md):
|
|
47
|
+
|
|
48
|
+
```
|
|
49
|
+
> Before I start, one check:
|
|
50
|
+
>
|
|
51
|
+
> Goal as I read it: {1-sentence restatement}
|
|
52
|
+
> Plan: {2-3 bullet shape, not full implementation}
|
|
53
|
+
> Risk if I misread: {what would be wasted}
|
|
54
|
+
>
|
|
55
|
+
> 1. Goal + plan match — proceed
|
|
56
|
+
> 2. Goal right, plan needs adjustment — say what
|
|
57
|
+
> 3. Goal wrong — let me restate
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
The restatement is the load-bearing part. Pick 1 → execute. Pick 2 or 3 → fold the correction in, do not re-ask.
|
|
61
|
+
|
|
62
|
+
## Scope limits
|
|
63
|
+
|
|
64
|
+
- **One challenge per plan**, not per step.
|
|
65
|
+
- **Skip when the user already restated the goal** in the same turn.
|
|
66
|
+
- **Skip when [`scope-control § fenced step`](scope-control.md) applies** — deliver the plan, hand back, no confirmation question appended.
|
|
67
|
+
- **Never argue the goal twice** — user's restatement is final for the turn.
|
|
68
|
+
|
|
69
|
+
## Interactions
|
|
70
|
+
|
|
71
|
+
[`direct-answers`](direct-answers.md) fires after misdirection · [`ask-when-uncertain`](ask-when-uncertain.md) fires on missing context · [`improve-before-implement`](improve-before-implement.md) Phase 1 already runs a clarity check (do not double-ask) · [`scope-control § fenced step`](scope-control.md) handback wins · [`no-cheap-questions`](no-cheap-questions.md) — checkpoint must carry a real goal restatement, not a content-free "ready?".
|
|
@@ -43,6 +43,7 @@ prompt that asks them to think on their own merits.
|
|
|
43
43
|
THE COUNCIL DOES NOT SEE THE HOST AGENT'S ANALYSIS.
|
|
44
44
|
THE COUNCIL DOES NOT SEE PRIOR REPLIES.
|
|
45
45
|
THE COUNCIL SEES THE ARTEFACT + THE NEUTRAL SYSTEM PROMPT. NOTHING ELSE.
|
|
46
|
+
THE HOST AGENT IS THE CONVENER, NEVER A REVIEWER.
|
|
46
47
|
```
|
|
47
48
|
|
|
48
49
|
If you find yourself wanting to "frame" the artefact for the council,
|
|
@@ -50,6 +51,13 @@ stop. Framing is exactly what kills the second-opinion value. Use the
|
|
|
50
51
|
unbiased system prompts in `scripts/ai_council/prompts.py`; do not
|
|
51
52
|
roll your own.
|
|
52
53
|
|
|
54
|
+
The host runs the council and synthesises convergence — it is the
|
|
55
|
+
convener, not a reviewer. The reviewer-ban is structural: the host
|
|
56
|
+
wrote (or framed) the artefact and cannot critique it independently.
|
|
57
|
+
Anonymising the host as "Reviewer C" is worse than excluding it — the
|
|
58
|
+
user is told they got an outside vote when they did not. Externals
|
|
59
|
+
down → surface and skip; never substitute the host as a reviewer.
|
|
60
|
+
|
|
53
61
|
### Neutrality — context-handoff
|
|
54
62
|
|
|
55
63
|
External reviewers do better critique when they know **what the
|
|
@@ -120,6 +128,27 @@ token counts (from the manual-paste length heuristic or the
|
|
|
120
128
|
provider's reply, when available) for observability. Mixed runs
|
|
121
129
|
(one manual + one api) gate only the api members.
|
|
122
130
|
|
|
131
|
+
## Degradation modes
|
|
132
|
+
|
|
133
|
+
How the council behaves when fewer than two billable members are
|
|
134
|
+
reachable. The orchestrator never silently substitutes — degradation
|
|
135
|
+
is visible to the user.
|
|
136
|
+
|
|
137
|
+
| Reachable | Behaviour | Independence |
|
|
138
|
+
|---|---|---|
|
|
139
|
+
| **2+** | Full fan-out, multi-round debate. Default. | High — cross-provider diversity. |
|
|
140
|
+
| **1** | Single-voice critique with a degraded-run warning. Multi-round mode lets the model see its own anonymised reply, but convergence ≠ correctness. | Low — shared blind spots. |
|
|
141
|
+
| **0** | Council skipped. Surface the failure, proceed without external review. **Never** substitute the host or an unrequested manual pass. | None. |
|
|
142
|
+
|
|
143
|
+
Rejected anti-patterns (council convergence, 2026-05-06): persona
|
|
144
|
+
prompts (same model, same blind spots, more cost), temperature
|
|
145
|
+
spread (noise, not signal), host-as-fallback (Iron Law breach).
|
|
146
|
+
Supported single-provider strategy is **sibling models on the same
|
|
147
|
+
provider** (e.g. Sonnet ↔ Opus, gpt-4o ↔ o1) — different training
|
|
148
|
+
cutoffs / reasoning architectures within one provider family. Cost
|
|
149
|
+
is real (siblings price-tier higher); explicit opt-in per invocation,
|
|
150
|
+
not a default.
|
|
151
|
+
|
|
123
152
|
## Procedure
|
|
124
153
|
|
|
125
154
|
1. **Resolve target.** Identify the artefact mode (`prompt`, `roadmap`,
|
|
@@ -140,9 +169,61 @@ provider's reply, when available) for observability. Mixed runs
|
|
|
140
169
|
into the host agent's voice.
|
|
141
170
|
6. **Summarise.** Write a `Convergence / Divergence` block listing
|
|
142
171
|
agreements, disagreements, and unique insights — provider-attributed.
|
|
143
|
-
7. **
|
|
144
|
-
|
|
145
|
-
the
|
|
172
|
+
7. **Critically evaluate** every finding before it leaves the host
|
|
173
|
+
(see *Critical evaluation* below). The host is the convener **and**
|
|
174
|
+
the skeptic — never a reviewer of the artefact itself, but always a
|
|
175
|
+
reviewer of the **council's output**.
|
|
176
|
+
8. **Translate validated findings to options.** Convert each finding
|
|
177
|
+
the host accepts (or accepts with modification) into a concrete
|
|
178
|
+
numbered option for the user. Tag every option with the host's
|
|
179
|
+
verdict so the user sees the agent's reasoned position, not the
|
|
180
|
+
council's raw output. The user decides; the council advises; the
|
|
181
|
+
host filters.
|
|
182
|
+
|
|
183
|
+
## Critical evaluation — convener-skeptic stance
|
|
184
|
+
|
|
185
|
+
```
|
|
186
|
+
COUNCIL CONVERGENCE IS NOT CORRECTNESS.
|
|
187
|
+
DO NOT BLINDLY ACCEPT FINDINGS. DO NOT BLINDLY REJECT THEM.
|
|
188
|
+
EVERY FINDING GETS A REASONED VERDICT BEFORE IT REACHES THE USER.
|
|
189
|
+
```
|
|
190
|
+
|
|
191
|
+
The council is **uninformed about the codebase, ADRs, locked
|
|
192
|
+
contracts, prior decisions, and project history** — it sees only the
|
|
193
|
+
artefact + neutrality preamble. That is the source of its diversity
|
|
194
|
+
**and** its blind spots. Convergence between members can mean shared
|
|
195
|
+
generic best-practice priors, not project-specific correctness.
|
|
196
|
+
|
|
197
|
+
The host applies a critical lens to **every finding** (convergence
|
|
198
|
+
**and** divergence) before surfacing it as a numbered option:
|
|
199
|
+
|
|
200
|
+
| Check | Question | Tool |
|
|
201
|
+
|---|---|---|
|
|
202
|
+
| **Codebase fit** | Does the finding match the actual code, files, signatures, conventions? | `view` / `codebase-retrieval` / `grep` |
|
|
203
|
+
| **Locked-decision conflict** | Does it contradict an ADR, kernel rule, contract under `docs/contracts/`, or `docs/decisions/`? | `view` |
|
|
204
|
+
| **Already addressed** | Is it a generic best-practice already covered by an existing rule, skill, or test? | `view` / `grep` |
|
|
205
|
+
| **Cost / benefit** | Is the change worth the diff size, churn, and review cost vs. the marginal benefit? | reasoning |
|
|
206
|
+
| **Hallucination** | Does the finding cite a file, function, or behavior that does not exist? | `view` |
|
|
207
|
+
|
|
208
|
+
Each finding receives one of three verdicts:
|
|
209
|
+
|
|
210
|
+
- **`accept`** — codebase fits, no locked-decision conflict, benefit clears cost. Surface as a normal numbered option.
|
|
211
|
+
- **`accept-with-modification`** — core insight valid, but the proposed shape needs adjusting (wrong file, contradicts ADR detail, scope creep). Surface with the **modified** patch and a one-line note.
|
|
212
|
+
- **`reject`** — finding is wrong (hallucinated reference, contradicts a locked decision, already addressed, generic noise). Surface as a **Rejected by host** entry with a one-line reason. Still visible — the user can override.
|
|
213
|
+
|
|
214
|
+
The verdict is the host's **own** reasoning, not the council's.
|
|
215
|
+
Pretending convergence equals correctness, or paraphrasing council
|
|
216
|
+
output as host analysis, both breach the [`direct-answers`](../../rules/direct-answers.md)
|
|
217
|
+
no-invented-facts rule. When the host cannot reach a confident
|
|
218
|
+
verdict on a finding (mixed evidence, ambiguous scope), it surfaces
|
|
219
|
+
the finding as `needs-input` with the open question — the user
|
|
220
|
+
decides, the host does not guess.
|
|
221
|
+
|
|
222
|
+
### What this is NOT
|
|
223
|
+
|
|
224
|
+
- **Not a re-review by the host.** The host did not write the artefact independently and cannot critique it independently — that boundary still holds.
|
|
225
|
+
- **Not a vote against the council.** Rejecting a finding requires evidence (file, line, contract reference), not preference.
|
|
226
|
+
- **Not silent filtering.** Every finding reaches the user with its verdict and reason. The user can pick a `reject` option and override the host.
|
|
146
227
|
|
|
147
228
|
## Output path convention
|
|
148
229
|
|
|
@@ -195,16 +276,26 @@ Every council reply MUST contain, in this order:
|
|
|
195
276
|
containing the member's verbatim output.
|
|
196
277
|
3. **Convergence / Divergence summary** — bullet list, every claim
|
|
197
278
|
attributed by provider name.
|
|
198
|
-
4. **
|
|
199
|
-
with
|
|
200
|
-
|
|
201
|
-
|
|
202
|
-
|
|
279
|
+
4. **Host verdict per finding** — one row per finding with `accept`
|
|
280
|
+
/ `accept-with-modification` / `reject` / `needs-input` plus a
|
|
281
|
+
one-line reason citing host evidence (file:line, ADR, contract).
|
|
282
|
+
See *Critical evaluation* above.
|
|
283
|
+
5. **User-facing options** — numbered block per `user-interaction`,
|
|
284
|
+
carrying the host verdict in each option, with "discard council
|
|
285
|
+
input" always present as an option.
|
|
286
|
+
|
|
287
|
+
The host agent NEVER ships council output as its own reasoning, and
|
|
288
|
+
NEVER ships the host verdict as council output. Provider attribution
|
|
289
|
+
stays visible in the per-member sections; host verdicts stay
|
|
290
|
+
attributed to the host.
|
|
203
291
|
|
|
204
292
|
## Do NOT
|
|
205
293
|
|
|
206
294
|
- Do NOT paraphrase council output into the host agent's voice — strip
|
|
207
295
|
attribution and you've stripped the value.
|
|
296
|
+
- Do NOT surface council findings to the user without a host verdict
|
|
297
|
+
— convergence ≠ correctness, and the user deserves the agent's
|
|
298
|
+
reasoned filter, not a raw forward.
|
|
208
299
|
- Do NOT pre-warm the council with the host agent's analysis or
|
|
209
300
|
identity — that primes the reviewer and collapses diversity.
|
|
210
301
|
- Do NOT silently truncate a too-large bundle — surface the size and
|
|
@@ -236,6 +327,9 @@ Real failure modes seen in the wild:
|
|
|
236
327
|
| Silently truncate a too-large bundle. | Misleads the reviewer into thinking they saw the whole thing. | Bundler raises `BundleTooLarge`; surface and ask for narrower scope. |
|
|
237
328
|
| Reuse the same SDK client across calls without re-loading the key. | Leaks the key in long-lived process state. | Each invocation builds fresh clients from `load_*_key()`. |
|
|
238
329
|
| Auto-spend tokens under `personal.autonomy: on`. | Autonomy ≠ permission to spend money. | Always ask before consultation, even under autonomy. |
|
|
330
|
+
| Forward council convergence to the user as numbered options without a host verdict. | Convergence ≠ correctness; the council never saw the codebase. | Apply the *Critical evaluation* lens; tag every finding `accept` / `accept-with-modification` / `reject` / `needs-input` with one-line reason. |
|
|
331
|
+
| Reject a finding on preference, not evidence. | "I don't like this" is not a verdict. | Cite the file, line, ADR, or contract that justifies the rejection — or surface as `needs-input`. |
|
|
332
|
+
| Paraphrase council output into the host's own analysis to defend a verdict. | Strips attribution, breaches `direct-answers` no-invented-facts. | Verdict cites host evidence (file:line); council output stays attributed in the per-member sections. |
|
|
239
333
|
|
|
240
334
|
## Redaction expectations
|
|
241
335
|
|
|
@@ -343,9 +437,39 @@ at least once before convergence). The host agent does **not** ask
|
|
|
343
437
|
the settings owner already made that decision. Ask only when a
|
|
344
438
|
genuinely complex artefact justifies more depth than the default.
|
|
345
439
|
|
|
440
|
+
### Deep-reasoning tier (`council_depth: deep`)
|
|
441
|
+
|
|
442
|
+
Architecture review, refactoring proposals, and bug-diagnosis runs
|
|
443
|
+
benefit from an extra critique round. The deep tier is opt-in per
|
|
444
|
+
artefact:
|
|
445
|
+
|
|
446
|
+
1. The consuming **rule, skill, or command** declares
|
|
447
|
+
`council_depth: deep` in its frontmatter. The schema accepts
|
|
448
|
+
**only `deep`** — `standard` is the implicit default and is
|
|
449
|
+
rejected by the linter (every frontmatter byte costs context
|
|
450
|
+
window; see `scripts/schemas/{rule,skill,command}.schema.json`).
|
|
451
|
+
To return an artefact to default depth, **delete the key**.
|
|
452
|
+
2. The **host agent** reads that frontmatter when it dispatches the
|
|
453
|
+
council and passes `--depth deep` to `council_cli`. If multiple
|
|
454
|
+
active artefacts disagree, **deep wins** (max policy).
|
|
455
|
+
3. The **CLI** floors the round count at
|
|
456
|
+
`max(ai_council.deep_min_rounds, ai_council.min_rounds)` —
|
|
457
|
+
defaults to `3` and `2` respectively. Lowering `deep_min_rounds`
|
|
458
|
+
below `min_rounds` has no effect (defensive max).
|
|
459
|
+
|
|
460
|
+
The CLI itself has no knowledge of frontmatter; the contract is the
|
|
461
|
+
flag. Resolution chain (highest priority first):
|
|
462
|
+
|
|
463
|
+
```
|
|
464
|
+
--rounds N → explicit, any value (user override)
|
|
465
|
+
--depth deep → max(deep_min_rounds, min_rounds)
|
|
466
|
+
(no flag) → min_rounds (default 2)
|
|
467
|
+
```
|
|
468
|
+
|
|
346
469
|
| Property | Behaviour |
|
|
347
470
|
|---|---|
|
|
348
471
|
| Default count | `ai_council.min_rounds` (default `2`). Override per-invocation with `rounds:N` (or `--rounds N` to the CLI). |
|
|
472
|
+
| Deep floor | `ai_council.deep_min_rounds` (default `3`). Activated by `council_depth: deep` in artefact frontmatter (host translates to `--depth deep`) or explicit `--depth deep` on the CLI. Floored at `min_rounds`. |
|
|
349
473
|
| Anonymisation | Provider/model identity is stripped. Reviewers are labelled `Reviewer A / B / C…` in input order. |
|
|
350
474
|
| Errored prior responses | Skipped — they reveal nothing useful and can leak provider error formats. |
|
|
351
475
|
| Cost budget | Accumulates across rounds. A round-2 call that breaches the cap fires `on_overrun` exactly like a round-1 breach. |
|
|
@@ -2,6 +2,7 @@
|
|
|
2
2
|
name: bug-analyzer
|
|
3
3
|
description: "Use when the user shares a Sentry error, Jira bug ticket, or error description and wants root cause analysis. Also for proactive bug hunting and code audits for hidden bugs."
|
|
4
4
|
source: package
|
|
5
|
+
council_depth: deep
|
|
5
6
|
---
|
|
6
7
|
|
|
7
8
|
# bug-analyzer
|
|
@@ -9,8 +9,8 @@ source: package
|
|
|
9
9
|
## When to use
|
|
10
10
|
|
|
11
11
|
Use this skill when:
|
|
12
|
-
- Creating a new roadmap (
|
|
13
|
-
- Executing a roadmap (
|
|
12
|
+
- Creating a new roadmap (`/roadmap:create` command)
|
|
13
|
+
- Executing a roadmap (`/roadmap:process-step|phase|full` commands)
|
|
14
14
|
- Checking roadmap progress
|
|
15
15
|
- Updating roadmap status after completing work
|
|
16
16
|
|
|
@@ -119,8 +119,8 @@ Every roadmap follows this structure:
|
|
|
119
119
|
|
|
120
120
|
Every roadmap implicitly includes the project's quality pipeline
|
|
121
121
|
(static analysis, autofixes, tests). What's configurable is **when**
|
|
122
|
-
the pipeline runs during `/roadmap
|
|
123
|
-
`roadmap.quality_cadence` in `.agent-settings.yml`:
|
|
122
|
+
the pipeline runs during `/roadmap:process-step|phase|full`,
|
|
123
|
+
controlled by `roadmap.quality_cadence` in `.agent-settings.yml`:
|
|
124
124
|
|
|
125
125
|
| Cadence | Pipeline runs | Trade-off |
|
|
126
126
|
|---|---|---|
|
|
@@ -164,7 +164,7 @@ unfamiliar codebases.
|
|
|
164
164
|
evidence-based reason (e.g. risky migration benefits from a spike).
|
|
165
165
|
Never include release versions, deprecation dates, or git tags in
|
|
166
166
|
the roadmap text. If the user declines, do **not** re-propose during
|
|
167
|
-
|
|
167
|
+
`/roadmap:process-*`. Decline = silence. See [`scope-control`](../../rules/scope-control.md#decline--silence--no-re-asking-on-the-same-task).
|
|
168
168
|
5. Save with a kebab-case filename (e.g. `optimize-webhook-jobs.md`).
|
|
169
169
|
**Before writing**, scan the entire roadmap namespace for a
|
|
170
170
|
collision — active, `archive/`, `skipped/`, and nested subdirs —
|
|
@@ -289,8 +289,8 @@ is rewritten by `.augment/scripts/update_roadmap_progress.py`.
|
|
|
289
289
|
**Always regenerate in the SAME response** after any of the following
|
|
290
290
|
(enforced by [`roadmap-progress-sync`](../../rules/roadmap-progress-sync.md)):
|
|
291
291
|
|
|
292
|
-
- Creating a new roadmap (
|
|
293
|
-
- Marking a step `[x]`, `[~]`, or `[-]` during
|
|
292
|
+
- Creating a new roadmap (`/roadmap:create`)
|
|
293
|
+
- Marking a step `[x]`, `[~]`, or `[-]` during `/roadmap:process-*`
|
|
294
294
|
- Archiving or moving a roadmap to `skipped/`
|
|
295
295
|
- Adding, renaming, or removing a phase
|
|
296
296
|
|
|
@@ -2,6 +2,7 @@
|
|
|
2
2
|
name: systematic-debugging
|
|
3
3
|
description: "Use when hitting a bug, test failure, crash, or unexpected behavior — enforces reproduce → isolate → hypothesize → verify before any fix — even when the user just says 'this is broken' or 'quick fix'."
|
|
4
4
|
source: package
|
|
5
|
+
council_depth: deep
|
|
5
6
|
---
|
|
6
7
|
|
|
7
8
|
# systematic-debugging
|
|
@@ -35,10 +36,28 @@ papers over an unknown cause is a regression waiting to happen.
|
|
|
35
36
|
|
|
36
37
|
```
|
|
37
38
|
NO FIX WITHOUT ROOT CAUSE. NO ROOT CAUSE WITHOUT EVIDENCE.
|
|
39
|
+
NO BUG MARKED FIXED WITHOUT A REGRESSION TEST.
|
|
38
40
|
```
|
|
39
41
|
|
|
40
42
|
"I think it's probably X" is not evidence. A log line, a stack trace, a
|
|
41
|
-
diff, a reproduced failure — those are evidence.
|
|
43
|
+
diff, a reproduced failure — those are evidence. A green run after a
|
|
44
|
+
manual edit is not a regression test — a test that fails without the fix
|
|
45
|
+
and passes with it, is.
|
|
46
|
+
|
|
47
|
+
## The 6-phase loop (the spine)
|
|
48
|
+
|
|
49
|
+
Every debug session walks these six phases in order. Treat them as a
|
|
50
|
+
checklist — tick each box before claiming the bug fixed:
|
|
51
|
+
|
|
52
|
+
- [ ] **1. Reproduce** — bug triggers on demand, smallest possible setup _(Phase 1)_
|
|
53
|
+
- [ ] **2. Minimize** — smallest failing case isolated, irrelevant context stripped _(Phase 1, step 2)_
|
|
54
|
+
- [ ] **3. Hypothesize** — one testable theory stated in one sentence _(Phase 3)_
|
|
55
|
+
- [ ] **4. Instrument** — log / breakpoint / trace at the boundary where expected ≠ actual _(Phase 2)_
|
|
56
|
+
- [ ] **5. Fix** — single, minimal change targeting the root cause _(Phase 4, step 2)_
|
|
57
|
+
- [ ] **6. Regression-test** — failing test added that catches the bug returning **(MANDATORY — no exception)** _(Phase 4, step 1 + Validation checklist)_
|
|
58
|
+
|
|
59
|
+
Skipping a box (especially #2 or #6) is the single biggest cause of
|
|
60
|
+
wasted debug time and re-opened bugs.
|
|
42
61
|
|
|
43
62
|
## Procedure
|
|
44
63
|
|
|
@@ -242,10 +261,11 @@ When reporting debug findings to the user:
|
|
|
242
261
|
|
|
243
262
|
Before declaring a bug fixed:
|
|
244
263
|
|
|
264
|
+
* [ ] **6-phase loop** — all six boxes (Reproduce → Minimize → Hypothesize → Instrument → Fix → Regression-test) ticked
|
|
245
265
|
* [ ] The failure was reproduced before any code changed
|
|
246
266
|
* [ ] The root cause is named explicitly, not "probably"
|
|
247
267
|
* [ ] Evidence (log, trace, diff) supports the named root cause
|
|
248
|
-
* [ ]
|
|
268
|
+
* [ ] **Regression test added — MANDATORY**: a test that fails without the fix and passes with it. No exception. "Manual reproduction confirmed gone" is not a regression test
|
|
249
269
|
* [ ] The fix is minimal and targets the root cause, not the symptom
|
|
250
270
|
* [ ] The regression test now passes
|
|
251
271
|
* [ ] Adjacent tests still pass
|
|
@@ -1,7 +1,8 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: technical-specification
|
|
3
|
-
description: "Use when the user says "write a spec", "create RFC", or "document this decision". Writes technical specifications, RFCs, and ADRs with clear structure."
|
|
3
|
+
description: "Use when the user says "write a spec", "create RFC", "write a PRD", or "document this decision". Writes technical specifications, PRDs, RFCs, and ADRs with clear structure."
|
|
4
4
|
source: package
|
|
5
|
+
council_depth: deep
|
|
5
6
|
---
|
|
6
7
|
|
|
7
8
|
# technical-specification
|
|
@@ -10,6 +11,7 @@ source: package
|
|
|
10
11
|
|
|
11
12
|
Use this skill when:
|
|
12
13
|
- Writing a technical specification for a new feature or system
|
|
14
|
+
- Writing a Product Requirements Document (PRD) when user pain leads and solution shape follows
|
|
13
15
|
- Creating an architecture decision record (ADR)
|
|
14
16
|
- Documenting a technical RFC (Request for Comments)
|
|
15
17
|
- Planning a significant technical change that needs team review
|
|
@@ -69,6 +71,59 @@ For complex features or systems. Stored in `agents/features/` or module `agents/
|
|
|
69
71
|
- {Links to related docs, tickets, or external resources}
|
|
70
72
|
```
|
|
71
73
|
|
|
74
|
+
### Product Requirements Document (PRD)
|
|
75
|
+
|
|
76
|
+
For features whose **shape is product-driven** rather than implementation-driven —
|
|
77
|
+
when "what users get" matters more than "how the code is wired". Use a PRD
|
|
78
|
+
when a Technical Specification would over-index on internals and under-index
|
|
79
|
+
on user value. Stored in `agents/features/` next to the technical spec when
|
|
80
|
+
both exist.
|
|
81
|
+
|
|
82
|
+
```markdown
|
|
83
|
+
# PRD: {Title}
|
|
84
|
+
|
|
85
|
+
## Status
|
|
86
|
+
{ Draft | In Review | Approved | Shipped | Parked }
|
|
87
|
+
|
|
88
|
+
## Problem
|
|
89
|
+
{The user-visible pain. One paragraph. Cite evidence — support tickets,
|
|
90
|
+
analytics, user quotes — not assumptions.}
|
|
91
|
+
|
|
92
|
+
## Success Criteria
|
|
93
|
+
- {Measurable outcome with a number and a timeframe}
|
|
94
|
+
- {Another measurable outcome}
|
|
95
|
+
|
|
96
|
+
## Non-Goals
|
|
97
|
+
- {What this PRD explicitly does NOT cover — protect scope}
|
|
98
|
+
|
|
99
|
+
## User Stories
|
|
100
|
+
- As a {role}, I want to {action}, so that {outcome}.
|
|
101
|
+
- As a {role}, I want to {action}, so that {outcome}.
|
|
102
|
+
|
|
103
|
+
## Major Modules
|
|
104
|
+
{The 3-7 functional building blocks the feature decomposes into. One
|
|
105
|
+
sentence each — what it does, not how. Implementation lives in the
|
|
106
|
+
technical spec, not here.}
|
|
107
|
+
|
|
108
|
+
- **{Module name}** — {one-sentence responsibility}
|
|
109
|
+
- **{Module name}** — {one-sentence responsibility}
|
|
110
|
+
|
|
111
|
+
## Open Questions
|
|
112
|
+
- [ ] {Unresolved product question — pricing, permission, copy, edge case}
|
|
113
|
+
|
|
114
|
+
## References
|
|
115
|
+
- {Links to research, related PRDs, technical spec when it exists}
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
**PRD vs Technical Specification — when to pick which:**
|
|
119
|
+
|
|
120
|
+
- **PRD first** when the user pain is clear but the solution shape is not
|
|
121
|
+
("users can't share dashboards" → PRD scopes the problem; spec follows).
|
|
122
|
+
- **Spec first** when the solution shape is clear but the implementation
|
|
123
|
+
is non-trivial ("rebuild auth on OAuth2" → spec leads; PRD often skipped).
|
|
124
|
+
- **Both** when the feature is large enough that product and engineering
|
|
125
|
+
decisions belong in different artifacts.
|
|
126
|
+
|
|
72
127
|
### Architecture Decision Record (ADR)
|
|
73
128
|
|
|
74
129
|
For significant technical decisions. Stored in `agents/decisions/`.
|
|
@@ -161,6 +216,8 @@ If a developer reads only this document, they should be able to build it.
|
|
|
161
216
|
## Auto-trigger keywords
|
|
162
217
|
|
|
163
218
|
- technical spec
|
|
219
|
+
- PRD
|
|
220
|
+
- product requirements
|
|
164
221
|
- RFC
|
|
165
222
|
- ADR
|
|
166
223
|
- architecture decision
|
|
@@ -2,6 +2,7 @@
|
|
|
2
2
|
name: threat-modeling
|
|
3
3
|
description: "Use when adding auth, webhooks, uploads, queues, secrets, tenant boundaries, or public endpoints — produces trust boundaries + abuse cases mapped to files, BEFORE implementation."
|
|
4
4
|
source: package
|
|
5
|
+
council_depth: deep
|
|
5
6
|
---
|
|
6
7
|
|
|
7
8
|
# threat-modeling
|