npm - @event4u/agent-config - Versions diffs - 1.21.0 → 1.22.0 - Mend

@event4u/agent-config 1.21.0 → 1.22.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (53) hide show

package/.agent-src/commands/bug-fix.md +1 -0
package/.agent-src/commands/bug-investigate.md +1 -0
package/.agent-src/commands/challenge-me/vision.md +348 -0
package/.agent-src/commands/challenge-me/with-docs.md +333 -0
package/.agent-src/commands/challenge-me.md +61 -0
package/.agent-src/commands/council/default.md +64 -17
package/.agent-src/commands/create-pr.md +7 -3
package/.agent-src/commands/grill-me.md +38 -0
package/.agent-src/commands/judge/steps.md +1 -1
package/.agent-src/commands/roadmap/ai-council.md +183 -0
package/.agent-src/commands/roadmap/create.md +6 -1
package/.agent-src/commands/roadmap/process-full.md +58 -0
package/.agent-src/commands/roadmap/process-phase.md +69 -0
package/.agent-src/commands/roadmap/process-step.md +57 -0
package/.agent-src/commands/roadmap.md +44 -16
package/.agent-src/commands/threat-model.md +1 -0
package/.agent-src/contexts/augment-infrastructure.md +1 -1
package/.agent-src/contexts/communication/rules-auto/slash-command-routing-policy-mechanics.md +53 -18
package/.agent-src/contexts/execution/roadmap-process-loop.md +125 -0
package/.agent-src/contexts/skills-and-commands.md +1 -1
package/.agent-src/rules/improve-before-implement.md +1 -0
package/.agent-src/rules/invite-challenge.md +71 -0
package/.agent-src/skills/adversarial-review/SKILL.md +1 -0
package/.agent-src/skills/ai-council/SKILL.md +132 -8
package/.agent-src/skills/bug-analyzer/SKILL.md +1 -0
package/.agent-src/skills/roadmap-management/SKILL.md +7 -7
package/.agent-src/skills/systematic-debugging/SKILL.md +22 -2
package/.agent-src/skills/technical-specification/SKILL.md +58 -1
package/.agent-src/skills/threat-modeling/SKILL.md +1 -0
package/.agent-src/templates/agent-settings.md +14 -3
package/.agent-src/templates/command.md +17 -1
package/.agent-src/templates/roadmaps.md +10 -2
package/.agent-src/templates/rule.md +2 -0
package/.agent-src/templates/skill.md +17 -0
package/.claude-plugin/marketplace.json +9 -2
package/AGENTS.md +2 -2
package/CHANGELOG.md +38 -0
package/README.md +1 -1
package/config/agent-settings.template.yml +22 -0
package/docs/architecture.md +2 -2
package/docs/contracts/command-clusters.md +45 -1
package/docs/decisions/ADR-003-flat-cluster-subs-and-colon-syntax.md +126 -0
package/docs/getting-started.md +1 -1
package/docs/guidelines/agent-infra/naming.md +1 -1
package/package.json +1 -1
package/scripts/_phase2_shim_helper.py +1 -1
package/scripts/council_cli.py +127 -10
package/scripts/migrate_command_suggestions.py +2 -2
package/scripts/schemas/command.schema.json +5 -0
package/scripts/schemas/rule.schema.json +5 -0
package/scripts/schemas/skill.schema.json +5 -0
package/scripts/skill_linter.py +1 -1
package/.agent-src/commands/roadmap/execute.md +0 -109

package/.agent-src/contexts/execution/roadmap-process-loop.md ADDED Viewed

@@ -0,0 +1,125 @@
+# Roadmap-Process Loop
+Loaded by [`/roadmap:process-step`](../../commands/roadmap/process-step.md),
+[`/roadmap:process-phase`](../../commands/roadmap/process-phase.md), and
+[`/roadmap:process-full`](../../commands/roadmap/process-full.md). Holds
+the canonical autonomous-execution loop, roadmap discovery, cadence
+resolution, commit-step pre-scan, halt conditions, and archival check.
+The three command files are thin wrappers that bind only the **scope
+delta**.
+**Size budget:** ≤ 4,000 chars.
+## 1. Resolve roadmap
+Search both locations:
+- `agents/roadmaps/*.md` (project root)
+- `app/Modules/*/agents/roadmaps/*.md` (module-scoped)
+**Exclude** `template.md`, `archive/`, and `skipped/`.
+- User named one (path, partial name, title) → use it.
+- None named, single active roadmap (`count_open > 0`) → use it.
+- None named, multiple active → default = **most recently modified**;
+  surface alternatives in the pre-run summary.
+- None active → tell the user; suggest [`/roadmap:create`](../../commands/roadmap/create.md).
+## 2. Pre-run summary — single confirmation gate
+Before the loop runs, show the resolved config in the user's language:
+> Roadmap: `<resolved-path>`
+> Phase 1: `<name>` — 3/5 done
+> Phase 2: `<name>` — 0/4
+> Next open step: `<description>`
+> Scope: **step | phase | full**
+> AI council: **on | off** (`<member list or "no members configured">`)
+> Quality cadence: **end_of_roadmap | per_phase | per_step**
+> Commit steps in roadmap: **N** (see § 3)
+>
+> 1. Go — start processing autonomously
+> 2. Different roadmap · 3. Different scope · 4. Toggle council · 5. Abort
+Skip the gate when scope, roadmap, and council are all unambiguous in
+the invocation (e.g. `/roadmap:process-phase road-to-X.md with council`).
+## 3. Commit-step pre-scan — one upfront ask
+Before step 4, scan the roadmap for explicit commit steps (lines
+matching `commit:` / `git commit` / `Commit phase` patterns).
+- **No commit steps** → nothing to ask. Never commit, never re-ask
+  per [`commit-policy`](../../rules/commit-policy.md).
+- **Commit steps present, autonomous mode** (`personal.autonomy: on`,
+  or `auto` after opt-in) → ask **once** upfront:
+  > "Roadmap contains N commit steps. Authorize all of them for this
+  > run? (yes / no / list them)"
+  Cache the answer for the whole run; do **not** re-ask per step.
+  Hard-Floor diffs (bulk deletions, infra) still trigger the
+  per-commit gate from [`commit-mechanics`](../authority/commit-mechanics.md).
+- **Commit steps present, non-autonomous** → ask before each commit
+  step inside the loop.
+## 4. Resolve quality cadence
+Read `roadmap.quality_cadence` from `.agent-settings.yml` once:
+| Value | Pipeline runs |
+|---|---|
+| `end_of_roadmap` (default) | Once, before archival (§ 6) |
+| `per_phase` | At every phase boundary + § 6 |
+| `per_step` | After every step + § 6 |
+Missing / unreadable / unknown → fall back to `end_of_roadmap`.
+The Iron Law [`verify-before-complete`](../../rules/verify-before-complete.md)
+still mandates fresh quality output before any "complete" claim.
+## 5. Step loop
+For each open step in the working set (scope-bound — see wrapper):
+1. Read the step description and inline notes.
+2. Analyze the codebase for what the step requires.
+3. Decide and act — implement. **No "should I implement this?" prompt.**
+4. **Open question handling:**
+   - **Council on** → invoke per [`ai-council`](../../skills/ai-council/SKILL.md),
+     integrate convergence, proceed. Token spend was opted in.
+   - **Council off** → halt, surface once, wait. Resume on next turn.
+5. Mark the checkbox: `[x]` done · `[~]` partial · `[-]` skipped.
+6. Regenerate the dashboard — `./agent-config roadmap:progress` — in
+   the **same response** per [`roadmap-progress-sync`](../../rules/roadmap-progress-sync.md).
+7. Run quality pipeline if cadence is `per_step`.
+### Halt conditions
+- Hard-Floor trigger ([`non-destructive-by-default`](../../rules/non-destructive-by-default.md))
+- Security-sensitive path ([`security-sensitive-stop`](../../rules/security-sensitive-stop.md))
+- Step reveals work outside the roadmap's scope
+- Test failure or quality red on `per_step`
+- Council off + true ambiguity
+On halt: stop, surface state, do **not** auto-fix outside the failing step.
+## 6. Final report and archival
+- Summary: scope-bound (steps/phases done in this run), council
+  consultations count (if on), steps remaining, halts.
+- Final dashboard regen.
+- **If the entire roadmap reached `count_open == 0`** → run the full
+  project quality pipeline. On green → archival via the
+  [`roadmap-management`](../../skills/roadmap-management/SKILL.md) skill
+  (`git mv` to `agents/roadmaps/archive/`, regenerate dashboard). On
+  red → stop, surface failures, do **not** archive.
+## Scope deltas — what each wrapper binds
+| Wrapper | Working set | Stop after |
+|---|---|---|
+| `process-step` | Single first open step | One iteration of § 5 |
+| `process-phase` | All open steps in first phase with `count_open > 0` | Phase boundary; per-phase quality if cadence ≠ `end_of_roadmap` |
+| `process-full` | Every open step across every phase, in order | Roadmap fully closed (or halt) |
+`process-full` runs the per-phase quality pipeline at every phase
+boundary when cadence is `per_phase` or `per_step`; on red it halts
+before the next phase.

package/.agent-src/contexts/skills-and-commands.md CHANGED Viewed

@@ -30,7 +30,7 @@ Commands often chain together. Here are the main workflows:
 ### Feature Development
 ```
-/feature-explore → /feature-plan → /feature-roadmap → /roadmap-execute
+/feature-explore → /feature-plan → /feature-roadmap → /roadmap:process-phase
      ↓                  ↓                ↓                    ↓
   Brainstorm        Structure         Phases/Steps        Implement
 ```

package/.agent-src/rules/improve-before-implement.md CHANGED Viewed

@@ -4,6 +4,7 @@ tier: "2b"
 description: "Before implementing features or architectural changes — validate the request against existing code, challenge weak requirements, and suggest improvements"
 alwaysApply: false
 source: package
+council_depth: deep
 triggers:
   - intent: "implement feature"
   - intent: "architectural change"

package/.agent-src/rules/invite-challenge.md ADDED Viewed

@@ -0,0 +1,71 @@
+---
+type: "auto"
+tier: "2b"
+description: "Before executing a complex plan or non-trivial design — proactively ask 'am I solving the right problem?' and pause for user confirmation, even when no ambiguity is detected"
+alwaysApply: false
+source: package
+council_depth: deep
+triggers:
+  - intent: "complex plan"
+  - intent: "design decision"
+  - intent: "architectural plan"
+  - intent: "multi-step implementation"
+  - keyword: "plan"
+  - keyword: "design"
+  - keyword: "architecture"
+  - keyword: "approach"
+---
+# invite-challenge
+## The Iron Law
+```
+BEFORE EXECUTING A COMPLEX PLAN — ASK ONCE:
+"AM I SOLVING THE RIGHT PROBLEM?"
+PAUSE. WAIT FOR CONFIRMATION. THEN PROCEED.
+```
+Proactive confirmation checkpoint. Distinct from [`direct-answers`](direct-answers.md) (reactive — fires after misdirection is detected) and [`ask-when-uncertain`](ask-when-uncertain.md) (info-gathering — fires when context is missing). This rule fires when context is **sufficient** and direction **looks correct** but the plan is heavy enough that a silent misread would be expensive.
+## When to activate
+Two or more of:
+- Touches ≥ 3 files or ≥ 2 modules
+- New abstraction, pattern, or library
+- Security / billing / tenant / data-boundary path
+- Migration, schema change, or backwards-incompatible API change
+- Estimated effort > 30 min of agent work
+- High-level goal ("rebuild the dashboard") rather than a bounded task ("rename this method")
+**Does NOT activate for:** evidenced bug fixes · trivial edits · user-fenced tasks ("just do it") · steps inside an already-confirmed plan (ask once per plan, not per step) · cases handled by [`improve-before-implement`](improve-before-implement.md) Phase 1.
+## How to challenge
+One question, one turn. Numbered options per [`user-interaction`](user-interaction.md):
+```
+> Before I start, one check:
+>
+> Goal as I read it: {1-sentence restatement}
+> Plan: {2-3 bullet shape, not full implementation}
+> Risk if I misread: {what would be wasted}
+>
+> 1. Goal + plan match — proceed
+> 2. Goal right, plan needs adjustment — say what
+> 3. Goal wrong — let me restate
+```
+The restatement is the load-bearing part. Pick 1 → execute. Pick 2 or 3 → fold the correction in, do not re-ask.
+## Scope limits
+- **One challenge per plan**, not per step.
+- **Skip when the user already restated the goal** in the same turn.
+- **Skip when [`scope-control § fenced step`](scope-control.md) applies** — deliver the plan, hand back, no confirmation question appended.
+- **Never argue the goal twice** — user's restatement is final for the turn.
+## Interactions
+[`direct-answers`](direct-answers.md) fires after misdirection · [`ask-when-uncertain`](ask-when-uncertain.md) fires on missing context · [`improve-before-implement`](improve-before-implement.md) Phase 1 already runs a clarity check (do not double-ask) · [`scope-control § fenced step`](scope-control.md) handback wins · [`no-cheap-questions`](no-cheap-questions.md) — checkpoint must carry a real goal restatement, not a content-free "ready?".

package/.agent-src/skills/adversarial-review/SKILL.md CHANGED Viewed

@@ -4,6 +4,7 @@ description: "ONLY when user explicitly requests adversarial review, devil's adv
 personas:
   - critical-challenger
 source: package
+council_depth: deep
 ---
 # Adversarial Review

package/.agent-src/skills/ai-council/SKILL.md CHANGED Viewed

@@ -43,6 +43,7 @@ prompt that asks them to think on their own merits.
 THE COUNCIL DOES NOT SEE THE HOST AGENT'S ANALYSIS.
 THE COUNCIL DOES NOT SEE PRIOR REPLIES.
 THE COUNCIL SEES THE ARTEFACT + THE NEUTRAL SYSTEM PROMPT. NOTHING ELSE.
+THE HOST AGENT IS THE CONVENER, NEVER A REVIEWER.
 ```
 If you find yourself wanting to "frame" the artefact for the council,
@@ -50,6 +51,13 @@ stop. Framing is exactly what kills the second-opinion value. Use the
 unbiased system prompts in `scripts/ai_council/prompts.py`; do not
 roll your own.
+The host runs the council and synthesises convergence — it is the
+convener, not a reviewer. The reviewer-ban is structural: the host
+wrote (or framed) the artefact and cannot critique it independently.
+Anonymising the host as "Reviewer C" is worse than excluding it — the
+user is told they got an outside vote when they did not. Externals
+down → surface and skip; never substitute the host as a reviewer.
 ### Neutrality — context-handoff
 External reviewers do better critique when they know **what the
@@ -120,6 +128,27 @@ token counts (from the manual-paste length heuristic or the
 provider's reply, when available) for observability. Mixed runs
 (one manual + one api) gate only the api members.
+## Degradation modes
+How the council behaves when fewer than two billable members are
+reachable. The orchestrator never silently substitutes — degradation
+is visible to the user.
+| Reachable | Behaviour | Independence |
+|---|---|---|
+| **2+** | Full fan-out, multi-round debate. Default. | High — cross-provider diversity. |
+| **1** | Single-voice critique with a degraded-run warning. Multi-round mode lets the model see its own anonymised reply, but convergence ≠ correctness. | Low — shared blind spots. |
+| **0** | Council skipped. Surface the failure, proceed without external review. **Never** substitute the host or an unrequested manual pass. | None. |
+Rejected anti-patterns (council convergence, 2026-05-06): persona
+prompts (same model, same blind spots, more cost), temperature
+spread (noise, not signal), host-as-fallback (Iron Law breach).
+Supported single-provider strategy is **sibling models on the same
+provider** (e.g. Sonnet ↔ Opus, gpt-4o ↔ o1) — different training
+cutoffs / reasoning architectures within one provider family. Cost
+is real (siblings price-tier higher); explicit opt-in per invocation,
+not a default.
 ## Procedure
 1. **Resolve target.** Identify the artefact mode (`prompt`, `roadmap`,
@@ -140,9 +169,61 @@ provider's reply, when available) for observability. Mixed runs
    into the host agent's voice.
 6. **Summarise.** Write a `Convergence / Divergence` block listing
    agreements, disagreements, and unique insights — provider-attributed.
-7. **Translate to options.** Convert actionable council suggestions
-   into concrete numbered options for the user. The user decides;
-   the council advises.
+7. **Critically evaluate** every finding before it leaves the host
+   (see *Critical evaluation* below). The host is the convener **and**
+   the skeptic — never a reviewer of the artefact itself, but always a
+   reviewer of the **council's output**.
+8. **Translate validated findings to options.** Convert each finding
+   the host accepts (or accepts with modification) into a concrete
+   numbered option for the user. Tag every option with the host's
+   verdict so the user sees the agent's reasoned position, not the
+   council's raw output. The user decides; the council advises; the
+   host filters.
+## Critical evaluation — convener-skeptic stance
+```
+COUNCIL CONVERGENCE IS NOT CORRECTNESS.
+DO NOT BLINDLY ACCEPT FINDINGS. DO NOT BLINDLY REJECT THEM.
+EVERY FINDING GETS A REASONED VERDICT BEFORE IT REACHES THE USER.
+```
+The council is **uninformed about the codebase, ADRs, locked
+contracts, prior decisions, and project history** — it sees only the
+artefact + neutrality preamble. That is the source of its diversity
+**and** its blind spots. Convergence between members can mean shared
+generic best-practice priors, not project-specific correctness.
+The host applies a critical lens to **every finding** (convergence
+**and** divergence) before surfacing it as a numbered option:
+| Check | Question | Tool |
+|---|---|---|
+| **Codebase fit** | Does the finding match the actual code, files, signatures, conventions? | `view` / `codebase-retrieval` / `grep` |
+| **Locked-decision conflict** | Does it contradict an ADR, kernel rule, contract under `docs/contracts/`, or `docs/decisions/`? | `view` |
+| **Already addressed** | Is it a generic best-practice already covered by an existing rule, skill, or test? | `view` / `grep` |
+| **Cost / benefit** | Is the change worth the diff size, churn, and review cost vs. the marginal benefit? | reasoning |
+| **Hallucination** | Does the finding cite a file, function, or behavior that does not exist? | `view` |
+Each finding receives one of three verdicts:
+- **`accept`** — codebase fits, no locked-decision conflict, benefit clears cost. Surface as a normal numbered option.
+- **`accept-with-modification`** — core insight valid, but the proposed shape needs adjusting (wrong file, contradicts ADR detail, scope creep). Surface with the **modified** patch and a one-line note.
+- **`reject`** — finding is wrong (hallucinated reference, contradicts a locked decision, already addressed, generic noise). Surface as a **Rejected by host** entry with a one-line reason. Still visible — the user can override.
+The verdict is the host's **own** reasoning, not the council's.
+Pretending convergence equals correctness, or paraphrasing council
+output as host analysis, both breach the [`direct-answers`](../../rules/direct-answers.md)
+no-invented-facts rule. When the host cannot reach a confident
+verdict on a finding (mixed evidence, ambiguous scope), it surfaces
+the finding as `needs-input` with the open question — the user
+decides, the host does not guess.
+### What this is NOT
+- **Not a re-review by the host.** The host did not write the artefact independently and cannot critique it independently — that boundary still holds.
+- **Not a vote against the council.** Rejecting a finding requires evidence (file, line, contract reference), not preference.
+- **Not silent filtering.** Every finding reaches the user with its verdict and reason. The user can pick a `reject` option and override the host.
 ## Output path convention
@@ -195,16 +276,26 @@ Every council reply MUST contain, in this order:
    containing the member's verbatim output.
 3. **Convergence / Divergence summary** — bullet list, every claim
    attributed by provider name.
-4. **User-facing options** — numbered block per `user-interaction`,
-   with "discard council input" always present as an option.
-The host agent NEVER ships council output as its own reasoning.
-Provider attribution stays visible in every render.
+4. **Host verdict per finding** — one row per finding with `accept`
+   / `accept-with-modification` / `reject` / `needs-input` plus a
+   one-line reason citing host evidence (file:line, ADR, contract).
+   See *Critical evaluation* above.
+5. **User-facing options** — numbered block per `user-interaction`,
+   carrying the host verdict in each option, with "discard council
+   input" always present as an option.
+The host agent NEVER ships council output as its own reasoning, and
+NEVER ships the host verdict as council output. Provider attribution
+stays visible in the per-member sections; host verdicts stay
+attributed to the host.
 ## Do NOT
 - Do NOT paraphrase council output into the host agent's voice — strip
   attribution and you've stripped the value.
+- Do NOT surface council findings to the user without a host verdict
+  — convergence ≠ correctness, and the user deserves the agent's
+  reasoned filter, not a raw forward.
 - Do NOT pre-warm the council with the host agent's analysis or
   identity — that primes the reviewer and collapses diversity.
 - Do NOT silently truncate a too-large bundle — surface the size and
@@ -236,6 +327,9 @@ Real failure modes seen in the wild:
 | Silently truncate a too-large bundle. | Misleads the reviewer into thinking they saw the whole thing. | Bundler raises `BundleTooLarge`; surface and ask for narrower scope. |
 | Reuse the same SDK client across calls without re-loading the key. | Leaks the key in long-lived process state. | Each invocation builds fresh clients from `load_*_key()`. |
 | Auto-spend tokens under `personal.autonomy: on`. | Autonomy ≠ permission to spend money. | Always ask before consultation, even under autonomy. |
+| Forward council convergence to the user as numbered options without a host verdict. | Convergence ≠ correctness; the council never saw the codebase. | Apply the *Critical evaluation* lens; tag every finding `accept` / `accept-with-modification` / `reject` / `needs-input` with one-line reason. |
+| Reject a finding on preference, not evidence. | "I don't like this" is not a verdict. | Cite the file, line, ADR, or contract that justifies the rejection — or surface as `needs-input`. |
+| Paraphrase council output into the host's own analysis to defend a verdict. | Strips attribution, breaches `direct-answers` no-invented-facts. | Verdict cites host evidence (file:line); council output stays attributed in the per-member sections. |
 ## Redaction expectations
@@ -343,9 +437,39 @@ at least once before convergence). The host agent does **not** ask
 the settings owner already made that decision. Ask only when a
 genuinely complex artefact justifies more depth than the default.
+### Deep-reasoning tier (`council_depth: deep`)
+Architecture review, refactoring proposals, and bug-diagnosis runs
+benefit from an extra critique round. The deep tier is opt-in per
+artefact:
+1. The consuming **rule, skill, or command** declares
+   `council_depth: deep` in its frontmatter. The schema accepts
+   **only `deep`** — `standard` is the implicit default and is
+   rejected by the linter (every frontmatter byte costs context
+   window; see `scripts/schemas/{rule,skill,command}.schema.json`).
+   To return an artefact to default depth, **delete the key**.
+2. The **host agent** reads that frontmatter when it dispatches the
+   council and passes `--depth deep` to `council_cli`. If multiple
+   active artefacts disagree, **deep wins** (max policy).
+3. The **CLI** floors the round count at
+   `max(ai_council.deep_min_rounds, ai_council.min_rounds)` —
+   defaults to `3` and `2` respectively. Lowering `deep_min_rounds`
+   below `min_rounds` has no effect (defensive max).
+The CLI itself has no knowledge of frontmatter; the contract is the
+flag. Resolution chain (highest priority first):
+```
+--rounds N           → explicit, any value (user override)
+--depth deep         → max(deep_min_rounds, min_rounds)
+(no flag)            → min_rounds (default 2)
+```
 | Property | Behaviour |
 |---|---|
 | Default count | `ai_council.min_rounds` (default `2`). Override per-invocation with `rounds:N` (or `--rounds N` to the CLI). |
+| Deep floor | `ai_council.deep_min_rounds` (default `3`). Activated by `council_depth: deep` in artefact frontmatter (host translates to `--depth deep`) or explicit `--depth deep` on the CLI. Floored at `min_rounds`. |
 | Anonymisation | Provider/model identity is stripped. Reviewers are labelled `Reviewer A / B / C…` in input order. |
 | Errored prior responses | Skipped — they reveal nothing useful and can leak provider error formats. |
 | Cost budget | Accumulates across rounds. A round-2 call that breaches the cap fires `on_overrun` exactly like a round-1 breach. |

package/.agent-src/skills/bug-analyzer/SKILL.md CHANGED Viewed

@@ -2,6 +2,7 @@
 name: bug-analyzer
 description: "Use when the user shares a Sentry error, Jira bug ticket, or error description and wants root cause analysis. Also for proactive bug hunting and code audits for hidden bugs."
 source: package
+council_depth: deep
 ---
 # bug-analyzer

package/.agent-src/skills/roadmap-management/SKILL.md CHANGED Viewed

@@ -9,8 +9,8 @@ source: package
 ## When to use
 Use this skill when:
-- Creating a new roadmap (`roadmap-create` command)
-- Executing a roadmap (`roadmap-execute` command)
+- Creating a new roadmap (`/roadmap:create` command)
+- Executing a roadmap (`/roadmap:process-step|phase|full` commands)
 - Checking roadmap progress
 - Updating roadmap status after completing work
@@ -119,8 +119,8 @@ Every roadmap follows this structure:
 Every roadmap implicitly includes the project's quality pipeline
 (static analysis, autofixes, tests). What's configurable is **when**
-the pipeline runs during `/roadmap execute`, controlled by
-`roadmap.quality_cadence` in `.agent-settings.yml`:
+the pipeline runs during `/roadmap:process-step|phase|full`,
+controlled by `roadmap.quality_cadence` in `.agent-settings.yml`:
 | Cadence | Pipeline runs | Trade-off |
 |---|---|---|
@@ -164,7 +164,7 @@ unfamiliar codebases.
    evidence-based reason (e.g. risky migration benefits from a spike).
    Never include release versions, deprecation dates, or git tags in
    the roadmap text. If the user declines, do **not** re-propose during
-   `roadmap-execute`. Decline = silence. See [`scope-control`](../../rules/scope-control.md#decline--silence--no-re-asking-on-the-same-task).
+   `/roadmap:process-*`. Decline = silence. See [`scope-control`](../../rules/scope-control.md#decline--silence--no-re-asking-on-the-same-task).
 5. Save with a kebab-case filename (e.g. `optimize-webhook-jobs.md`).
    **Before writing**, scan the entire roadmap namespace for a
    collision — active, `archive/`, `skipped/`, and nested subdirs —
@@ -289,8 +289,8 @@ is rewritten by `.augment/scripts/update_roadmap_progress.py`.
 **Always regenerate in the SAME response** after any of the following
 (enforced by [`roadmap-progress-sync`](../../rules/roadmap-progress-sync.md)):
-- Creating a new roadmap (`roadmap-create`)
-- Marking a step `[x]`, `[~]`, or `[-]` during `roadmap-execute`
+- Creating a new roadmap (`/roadmap:create`)
+- Marking a step `[x]`, `[~]`, or `[-]` during `/roadmap:process-*`
 - Archiving or moving a roadmap to `skipped/`
 - Adding, renaming, or removing a phase

package/.agent-src/skills/systematic-debugging/SKILL.md CHANGED Viewed

@@ -2,6 +2,7 @@
 name: systematic-debugging
 description: "Use when hitting a bug, test failure, crash, or unexpected behavior — enforces reproduce → isolate → hypothesize → verify before any fix — even when the user just says 'this is broken' or 'quick fix'."
 source: package
+council_depth: deep
 ---
 # systematic-debugging
@@ -35,10 +36,28 @@ papers over an unknown cause is a regression waiting to happen.
 ```
 NO FIX WITHOUT ROOT CAUSE. NO ROOT CAUSE WITHOUT EVIDENCE.
+NO BUG MARKED FIXED WITHOUT A REGRESSION TEST.
 ```
 "I think it's probably X" is not evidence. A log line, a stack trace, a
-diff, a reproduced failure — those are evidence.
+diff, a reproduced failure — those are evidence. A green run after a
+manual edit is not a regression test — a test that fails without the fix
+and passes with it, is.
+## The 6-phase loop (the spine)
+Every debug session walks these six phases in order. Treat them as a
+checklist — tick each box before claiming the bug fixed:
+- [ ] **1. Reproduce** — bug triggers on demand, smallest possible setup _(Phase 1)_
+- [ ] **2. Minimize** — smallest failing case isolated, irrelevant context stripped _(Phase 1, step 2)_
+- [ ] **3. Hypothesize** — one testable theory stated in one sentence _(Phase 3)_
+- [ ] **4. Instrument** — log / breakpoint / trace at the boundary where expected ≠ actual _(Phase 2)_
+- [ ] **5. Fix** — single, minimal change targeting the root cause _(Phase 4, step 2)_
+- [ ] **6. Regression-test** — failing test added that catches the bug returning **(MANDATORY — no exception)** _(Phase 4, step 1 + Validation checklist)_
+Skipping a box (especially #2 or #6) is the single biggest cause of
+wasted debug time and re-opened bugs.
 ## Procedure
@@ -242,10 +261,11 @@ When reporting debug findings to the user:
 Before declaring a bug fixed:
+* [ ] **6-phase loop** — all six boxes (Reproduce → Minimize → Hypothesize → Instrument → Fix → Regression-test) ticked
 * [ ] The failure was reproduced before any code changed
 * [ ] The root cause is named explicitly, not "probably"
 * [ ] Evidence (log, trace, diff) supports the named root cause
-* [ ] A failing test reproducing the bug was added or updated
+* [ ] **Regression test added — MANDATORY**: a test that fails without the fix and passes with it. No exception. "Manual reproduction confirmed gone" is not a regression test
 * [ ] The fix is minimal and targets the root cause, not the symptom
 * [ ] The regression test now passes
 * [ ] Adjacent tests still pass

package/.agent-src/skills/technical-specification/SKILL.md CHANGED Viewed

@@ -1,7 +1,8 @@
 ---
 name: technical-specification
-description: "Use when the user says "write a spec", "create RFC", or "document this decision". Writes technical specifications, RFCs, and ADRs with clear structure."
+description: "Use when the user says "write a spec", "create RFC", "write a PRD", or "document this decision". Writes technical specifications, PRDs, RFCs, and ADRs with clear structure."
 source: package
+council_depth: deep
 ---
 # technical-specification
@@ -10,6 +11,7 @@ source: package
 Use this skill when:
 - Writing a technical specification for a new feature or system
+- Writing a Product Requirements Document (PRD) when user pain leads and solution shape follows
 - Creating an architecture decision record (ADR)
 - Documenting a technical RFC (Request for Comments)
 - Planning a significant technical change that needs team review
@@ -69,6 +71,59 @@ For complex features or systems. Stored in `agents/features/` or module `agents/
 - {Links to related docs, tickets, or external resources}
 ```
+### Product Requirements Document (PRD)
+For features whose **shape is product-driven** rather than implementation-driven —
+when "what users get" matters more than "how the code is wired". Use a PRD
+when a Technical Specification would over-index on internals and under-index
+on user value. Stored in `agents/features/` next to the technical spec when
+both exist.
+```markdown
+# PRD: {Title}
+## Status
+{ Draft | In Review | Approved | Shipped | Parked }
+## Problem
+{The user-visible pain. One paragraph. Cite evidence — support tickets,
+analytics, user quotes — not assumptions.}
+## Success Criteria
+- {Measurable outcome with a number and a timeframe}
+- {Another measurable outcome}
+## Non-Goals
+- {What this PRD explicitly does NOT cover — protect scope}
+## User Stories
+- As a {role}, I want to {action}, so that {outcome}.
+- As a {role}, I want to {action}, so that {outcome}.
+## Major Modules
+{The 3-7 functional building blocks the feature decomposes into. One
+sentence each — what it does, not how. Implementation lives in the
+technical spec, not here.}
+- **{Module name}** — {one-sentence responsibility}
+- **{Module name}** — {one-sentence responsibility}
+## Open Questions
+- [ ] {Unresolved product question — pricing, permission, copy, edge case}
+## References
+- {Links to research, related PRDs, technical spec when it exists}
+```
+**PRD vs Technical Specification — when to pick which:**
+- **PRD first** when the user pain is clear but the solution shape is not
+  ("users can't share dashboards" → PRD scopes the problem; spec follows).
+- **Spec first** when the solution shape is clear but the implementation
+  is non-trivial ("rebuild auth on OAuth2" → spec leads; PRD often skipped).
+- **Both** when the feature is large enough that product and engineering
+  decisions belong in different artifacts.
 ### Architecture Decision Record (ADR)
 For significant technical decisions. Stored in `agents/decisions/`.
@@ -161,6 +216,8 @@ If a developer reads only this document, they should be able to build it.
 ## Auto-trigger keywords
 - technical spec
+- PRD
+- product requirements
 - RFC
 - ADR
 - architecture decision

package/.agent-src/skills/threat-modeling/SKILL.md CHANGED Viewed

@@ -2,6 +2,7 @@
 name: threat-modeling
 description: "Use when adding auth, webhooks, uploads, queues, secrets, tenant boundaries, or public endpoints — produces trust boundaries + abuse cases mapped to files, BEFORE implementation."
 source: package
+council_depth: deep
 ---
 # threat-modeling