npm - projecta-rrr - Versions diffs - 1.24.5 → 1.24.8 - Mend

projecta-rrr 1.24.5 → 1.24.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/AGENTS.md +1 -1
package/CHANGELOG.md +55 -0
package/agents/rrr-auditor.md +41 -0
package/agents/rrr-codebase-mapper.md +41 -0
package/agents/rrr-integration-checker.md +45 -0
package/agents/rrr-plan-checker.md +112 -0
package/agents/rrr-verifier.md +127 -0
package/commands/rrr/add-todo.md +30 -1
package/commands/rrr/audit-milestone.md +75 -0
package/commands/rrr/check-todos.md +24 -7
package/commands/rrr/discuss-milestone.md +54 -1
package/commands/rrr/plan-phase.md +88 -0
package/commands/rrr/ship.md +35 -0
package/commands/rrr/verify-work.md +47 -0
package/package.json +6 -3
package/rrr/lib/team-mode/manager.js +136 -5
package/rrr/references/design-review-checklist.md +150 -0
package/rrr/references/review-checklist.md +203 -0
package/scripts/rrr-team-mode.js +10 -1

package/AGENTS.md CHANGED

@@ -96,4 +96,4 @@ The RRR loop follows five steps:
 - Use `$rrr-help` for full skill catalogue
 - Trigger phrases are case-sensitive: `$rrr-plan-phase` not `$rrr-PlanPhase`
-<!-- generated: 2026-05-13T01:06:46.790Z | source: commands/rrr/*.md | count: 59 skills -->
+<!-- generated: 2026-05-13T03:49:42.852Z | source: commands/rrr/*.md | count: 59 skills -->

package/CHANGELOG.md CHANGED

@@ -4,6 +4,61 @@ All notable changes to RRR will be documented in this file.
 Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
+## [1.24.8] - 2026-05-13
+### Security (Phase 91 — close remaining Pass 1 audit findings)
+Audit report at `.planning/audits/2026-05-13-coordinator-pass1.md` (gitignored, local-only). All findings now mitigated or accepted-with-documentation; deployed to Sprite team-platform and verified.
+- **`handleWebhook` input validation (server.mjs)** — every webhook input that flows into a shell-string is now regex-validated BEFORE script construction: `repository.full_name` against `^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$`; `ref` against `^refs/heads/[A-Za-z0-9._/-]+$`; branch substring against `^[A-Za-z0-9._/-]+$`; `pull_request.number` coerced via `Number()` and rejected if not a positive integer. Mismatch returns `400 {error,field,value}` with value-preview sanitised to printable ASCII, ≤ 80 chars. Defense-in-depth — GitHub blocks dangerous chars in repo names today, but a single future GitHub behavior change would have made the unvalidated path exploitable. Set `RRR_VALIDATE_SELF_CHECK=1` to run the validator's built-in test cases on startup.
+- **Untrusted-content delimiters in reviewer + synthesizer prompts (scripts.mjs)** — every reviewer prompt now begins with an `## INPUT TRUST BOUNDARY` preamble explicitly listing PR diff, PR body, and repo contents as attacker-controllable. Reviewer outputs feeding the synthesizer are now wrapped in `<reviewer name="...">…</reviewer>` blocks with the same anti-injection clause applied to LLM-generated reviewer content. Closes the prompt-injection vector that was reachable from any external PR.
+- **Debounce state persistence (server.mjs)** — `DEBOUNCE` Map writes to `/tmp/rrr-debounce.json` on every mutation (atomic `tmp + rename`); on boot, the Map is rehydrated and any in-window pending entries get a fresh trailing-edge timer with the remaining time. Survives `sprite-env services restart` without dropping coalesced events.
+- **Jobs persistence (server.mjs)** — finished jobs append to `/tmp/rrr-jobs.ndjson` (NDJSON, append-only). On boot, the last 100 lines are replayed into the in-memory `JOBS` Map so `/jobs/<id>` queries return historical jobs after restart. Output/error bodies are not persisted (size-bounded in-process only).
+- **Non-pass-through webhook events recorded (server.mjs)** — events that fall through (unhandled event type, base-not-main, branch-not-main-or-integration, validation failures) now create a `kind: "webhook-ignored"` job stub with the event name + action + reason. Visible via `/jobs` — operator gets visibility into misconfigured subscriptions without changing the 200 response shape.
+- **Service spawn: `bash -lc` → `bash -c` + explicit loader source (server.mjs)** — `spawnReview` no longer uses login-shell autosource. Instead `bash -c 'set -e; [ -r "$HOME/.rrr-load.sh" ] && source "$HOME/.rrr-load.sh"; bash <scriptPath>'`. Reduces blast radius if any future vector achieves env-var injection — only the narrow loader file is sourced, not whatever `~/.profile` chain pulls in.
+- **`/status` surfaces persistence paths** — adds `debounce_file` and `jobs_file` fields so operators can quickly find on-disk state during incident response.
+- **`cf-coordinator/src/index.ts` SECURITY-DEPRECATED banner** — prominent header at the top of the file naming the 7 CRITICAL findings and the do-not-revive condition. The Worker itself remains deleted from Cloudflare (no infra change in this release).
+### Notes
+- **No externally-exploitable findings remain** as of v1.24.8 deploy. Shell-injection CRITICALs were never externally exploitable (GitHub blocks dangerous chars); they are now defense-in-depth-validated. Prompt-injection via PR bodies WAS exploitable; now mitigated by reviewer Bash-tool removal (v1.24.7 hotfix) AND untrusted-content delimiters (this release).
+- **Validator self-check is opt-in** — leaving it off in production keeps boot fast; CI can flip `RRR_VALIDATE_SELF_CHECK=1` to assert the regex assertions hold.
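Reviewer note: the `handleWebhook` validation described in the 1.24.8 entry can be pictured with the sketch below. `server.mjs` is not part of this diff, so this is not the shipped code. Only the two regular expressions, the `Number()` coercion, the `400 {error,field,value}` shape, and the 80-character printable-ASCII preview come from the changelog. The names `validateWebhook`, `reject`, and `preview`, the error string, and the choice to replace non-printable characters with `?` are assumptions made for illustration.

```javascript
// Patterns quoted from the 1.24.8 changelog entry; everything else is illustrative.
const REPO_RE = /^[A-Za-z0-9._-]+\/[A-Za-z0-9._-]+$/;
const REF_RE = /^refs\/heads\/[A-Za-z0-9._\/-]+$/;

// Reduce a rejected value to printable ASCII, at most 80 characters.
// Replacing other characters with "?" is an assumption.
function preview(value) {
  return String(value).replace(/[^\x20-\x7E]/g, "?").slice(0, 80);
}

// Build a 400-shaped rejection for one offending field (hypothetical helper).
function reject(field, value) {
  return {
    ok: false,
    status: 400,
    body: { error: "invalid webhook input", field, value: preview(value) },
  };
}

// Hypothetical validator: returns { ok: true } or a rejection object.
// It runs before any value is placed into a shell string.
function validateWebhook(payload) {
  const repo = payload?.repository?.full_name;
  if (typeof repo !== "string" || !REPO_RE.test(repo)) {
    return reject("repository.full_name", repo);
  }

  const ref = payload?.ref;
  if (ref !== undefined && (typeof ref !== "string" || !REF_RE.test(ref))) {
    return reject("ref", ref);
  }

  if (payload?.pull_request !== undefined) {
    const raw = payload.pull_request?.number;
    const n = Number(raw);
    if (!Number.isInteger(n) || n <= 0) {
      return reject("pull_request.number", raw);
    }
  }
  return { ok: true };
}
```

Validating against an allowlist pattern before building the script keeps the check independent of whatever GitHub permits in repository or branch names, which appears to be the defense-in-depth point the entry makes.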
+## [1.24.7] - 2026-05-13
+### Added — gstack borrow integration (Phase 90, no new commands — all enhancements to existing commands/agents)
+- **TODO priority tiers (P0–P4) + effort tiers (S/M/L/XL) + `why` field** — `commands/rrr/add-todo.md` prompts for both via `AskUserQuestion` (User Sovereignty: never auto-decide); `commands/rrr/check-todos.md` groups display by priority with P0 visually separated as 🔴 BLOCKING. Adapted from gstack [`TODOS.md` format](https://github.com/garrytan/gstack).
+- **ASCII coverage-diagram check in `rrr-plan-checker`** — new dimension 5.5: plan must render a coverage diagram with `★★★` / `★★` / `★` / `[GAP]` markers OR explicitly set `coverage_diagram_not_applicable: true` with reason. Warning (does not block). Adapted from gstack [`plan-eng-review/SKILL.md`](https://github.com/garrytan/gstack).
+- **Overbuilt-plan detector in `rrr-plan-checker`** — new dimension 5.6: flags scope inflation (files_modified ≥ 8, ≥ 2 new classes, new top-level dir, new dep, "framework/abstraction/generic" in title) and prompts the operator for a minimal alternative via `AskUserQuestion`. Adapted from gstack `plan-eng-review/SKILL.md` "overbuilt plans" rule.
+- **Optional dual-voice second planner in `/rrr:plan-phase`** — step 9.5 auto-triggers on risky-domain keywords (`auth`/`webhook`/`payment`/`llm`/`oauth`/`migration`/`secret`/`pii`/`rbac`/`prompt-injection`) or `--dual-voice` flag. Spawns a blind `rrr-planner` (no RESEARCH.md, no first-planner output) and builds a divergence table; disagreements surface via `AskUserQuestion`. Documentation includes the gstack Codex filesystem-boundary preamble for future second-model integration. Adapted from gstack [`autoplan/SKILL.md`](https://github.com/garrytan/gstack) + [`codex/SKILL.md`](https://github.com/garrytan/gstack).
+- **Strategy Scope Challenge in `/rrr:discuss-milestone`** — new step 5: four scope-mode passes (EXPANSION / SELECTIVE / HOLD / REDUCTION) with inversion-reflex prompts before locking milestone scope. Every move surfaced via `AskUserQuestion`. Adapted from gstack [`plan-ceo-review/SKILL.md`](https://github.com/garrytan/gstack).
+- **Console-error capture + Runtime Health Score in `/rrr:verify-work --uat`** — UAT now calls `browser_console_messages` after each navigate (errors fail the step, warnings advise); computes a 0–100 health score weighted across console / links / visual / functional / perf with a persisted baseline at `.planning/artifacts/qa-baseline.json`. Below 60 → UAT fails. Adapted from gstack [`qa/SKILL.md`](https://github.com/garrytan/gstack).
+- **Design Review (audit-only) in `/rrr:verify-work --design-review`** — 80-item / 10-category checklist with Design Score (A–F) + AI-Slop Score (0–10). RRR does **not** borrow gstack's auto-fix loop (conflicts with planner→executor separation); findings become `### Design Review` items in VERIFICATION.md that the operator turns into todos / new phases. New reference file: `rrr/references/design-review-checklist.md`. Adapted from gstack [`design-review/SKILL.md`](https://github.com/garrytan/gstack).
+- **Retro Metrics in `/rrr:audit-milestone`** — new step 6.5 emits commits-per-phase, fix-to-feature ratio (> 50% triggers `REVIEW_QUALITY_WARNING`), file-churn hotspots, session detection, and net LOC delta into the milestone audit report. Informational; doesn't block archive. Adapted from gstack [`retro/SKILL.md`](https://github.com/garrytan/gstack).
+- **Composite Health-Score preflight in `/rrr:ship`** — new step 1.7: runs typecheck (25 pts) + lint (20) + tests (30) + dead-code (15) + shellcheck (10) where defined; renormalises if any tool is absent. Score < 50 hard-blocks ship; 50–69 warns via `AskUserQuestion`; ≥ 70 continues. Trend logged to `.planning/artifacts/health-history.jsonl`. Adapted from gstack [`health/SKILL.md`](https://github.com/garrytan/gstack).
+- **Learnings persistence (`.planning/learnings.jsonl`)** — `rrr-verifier` appends a one-line entry per gap on `gaps_found: true`; `/rrr:plan-phase` step 3.7b queries the file and surfaces relevant entries via `AskUserQuestion` before planning. Complements (does NOT replace) `decision-recall.js` — that tracks deliberate decisions; this tracks observed patterns. Adapted from gstack [`learn/SKILL.md`](https://github.com/garrytan/gstack).
+- **Codex filesystem-boundary preamble (documented for future second-opinion paths)** — `Do NOT read files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/.` Embedded in `commands/rrr/plan-phase.md` step 9.5 so any future Codex-based second voice doesn't audit the meta-system. From gstack [`codex/SKILL.md`](https://github.com/garrytan/gstack).
+### Notes
+- **No new commands were created.** Every gstack-recommended new command (`rrr:health`, `rrr:retro`, `rrr:design-review`, `rrr:strategy-review`, `rrr:design-system`) was mapped onto an enhancement of the closest existing RRR command, preserving the existing command surface.
+- **User Sovereignty preserved everywhere.** All non-mechanical decisions surfaced via `AskUserQuestion` — RRR never silently picks a side on contested questions.
+- **Planner→Executor separation preserved.** The verifier remains read-only; gstack's auto-fix loops in `/qa` and `/design-review` were **not** borrowed.
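Reviewer note: the composite Health-Score preflight in the 1.24.7 entry combines weighted checks and renormalises when a tool is absent. The sketch below shows one way that arithmetic could work. The weights and the 50 / 70 thresholds are taken from the changelog. Treating each tool as all-or-nothing, the names `healthScore` and `shipDecision`, and the handling of the "no tools defined" case are assumptions, since `ship.md` is not shown in this part of the diff.

```javascript
// Weights from the changelog entry for /rrr:ship step 1.7.
const WEIGHTS = { typecheck: 25, lint: 20, tests: 30, deadCode: 15, shellcheck: 10 };

// results maps tool name -> boolean pass/fail. A tool missing from the
// object is treated as "not defined for this repo" and its weight is dropped.
// All-or-nothing scoring per tool is an assumption.
function healthScore(results) {
  let earned = 0;
  let available = 0;
  for (const [tool, weight] of Object.entries(WEIGHTS)) {
    if (!(tool in results)) continue;
    available += weight;
    if (results[tool]) earned += weight;
  }
  if (available === 0) return null; // nothing to measure
  return Math.round((earned / available) * 100);
}

// Map a score onto the three outcomes named in the changelog.
function shipDecision(score) {
  if (score === null) return "ask"; // assumption: defer to the operator
  if (score < 50) return "block";
  if (score < 70) return "ask";
  return "continue";
}
```

For example, a repo that defines only `tests` (passing) and `lint` (failing) earns 30 of 50 available points, which renormalises to 60 and lands in the ask-the-operator band.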
+## [1.24.6] - 2026-05-13
+### Added
+- **Security pass in rrr-verifier** — every phase verification now runs a Pass 1 (CRITICAL) security scan on phase-modified files: SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Shell Injection, and Enum & Value Completeness. Findings appear as a `### Security Pass` subsection in VERIFICATION.md. A Pass 1 finding blocks phase status only when it is on a file the phase claims to have completed; legacy file findings are recorded as `SECURITY_ADVISORY` without blocking. Pass 2 (INFORMATIONAL, 8 categories) is opt-in via `mode: deep-review`. Checklist content adapted from [garrytan/gstack `review/checklist.md`](https://github.com/garrytan/gstack) (gstack ETHOS: "User Sovereignty — AI recommends, human decides").
+- **`rrr/references/review-checklist.md`** — standalone reference containing both passes with grep patterns, severity classification, Fix-First heuristic, and suppression rules. Updatable independently of the verifier prompt.
+- **Infisical machine-identity auth mode** — `$rrr-provision-coordinator --auth-mode infisical-machine-identity --infisical-project-id <id> --infisical-env <env>` writes a coordinator contract that fetches secrets at boot from Infisical via Universal Auth. Required runtime secrets become `RRR_INFI_CID` + `RRR_INFI_CS` only; Anthropic and GitHub tokens live in Infisical and rotate centrally.
+- **Infisical Sprite bootstrap** — `$rrr-provision-sprites --apply --infisical-project-id <id>` installs Infisical CLI inside each Sprite, writes machine-identity creds, and wires `~/.profile` to fetch `ANTHROPIC_API_KEY` on every login shell. Uses `RRR_INFI_CID` + `RRR_INFI_CS` from the provisioning host's environment.
+### Changed
+- **Default coordinator auth mode** — `infisical-machine-identity` is now the default (was `claude-code-subscription`). The subscription path remains available via `--auth-mode claude-code-subscription`.
+- **Coordinator README** — generated README emits the Infisical install + login + secret-fetch boot snippet when in Infisical mode; falls back to credentials-file injection or AI Gateway when in their respective modes.
+- **`$rrr-coordinator-status`** — checks for `RRR_INFI_CID`/`RRR_INFI_CS` env and `infisical --version` when in Infisical mode.
 ## [1.24.5] - 2026-05-13
 ### Added

package/agents/rrr-auditor.md CHANGED

@@ -24,6 +24,47 @@ A "deprecated" doc might contain the only record of a critical decision. An "out
 **Canonical Truth Rule:** `.planning/*` is canonical. Non-.planning markdown is reference-only unless explicitly imported.
 </core_principle>
+<architectural_guards>
+## Step 0 (REQUIRED FIRST) — Load Architectural Guards
+Before ANY verification, read the project's locked architectural context. Findings that contradict a locked decision are NOT code gaps — they are **operator review items**, and you must flag them as such.
+**Files to read (in order, mandatory):**
+1. `.planning/NORTH-STAR.md` if it exists — quote any rule that touches the area you're auditing.
+2. Any project root `AGENTS.md` / `CLAUDE.md` — the project may name additional locked docs.
+3. `.planning/milestones/<active>/ROADMAP.md` — current milestone scope + sequencing.
+4. Memory files matching glob `*-locked*` or `*_locked*` in `~/.claude/projects/<project>/memory/` if accessible — quote any that touch the area you're auditing.
+**Use the loaded context as guards:**
+When you find a "gap" / "missing connection" / "unverified requirement" / "architectural concern":
+- Does a locked memory or NORTH-STAR rule explicitly say this should NOT exist?
+  → If yes, this is intentional. Do NOT flag as gap. Note as "intentional per <quoted-source>".
+- Does a locked rule prescribe a DIFFERENT fix than the obvious one?
+  → Recommend the locked-rule fix, not the obvious one. Quote the source.
+- Is the architecture genuinely ambiguous and could be resolved in multiple ways?
+  → Flag as **ARCHITECTURAL DECISION QUESTION (ADQ)** for operator review. Do NOT pick a side. State the options.
+**Report format addition:**
+Open your final report with a section:
+```markdown
+## Architectural Context Loaded
+- Locked source: <path> — <quoted relevant clause>
+- (or) No locked sources found in this repo.
+```
+If a finding's verdict depended on a locked rule, cite the rule inline next to the finding.
+**If you skip Step 0 and your fix contradicts a locked rule, the operator's trust in the audit collapses.** This step is not optional.
+</architectural_guards>
 <scan_ignored_paths>
 **NEVER scan these paths:**
 - GSDWatcher/**

package/agents/rrr-codebase-mapper.md CHANGED

@@ -26,6 +26,47 @@ You are spawned by `/rrr:map-codebase` with one of four focus areas:
 Your job: Explore thoroughly, then write document(s) directly. Return confirmation only.
 </role>
+<architectural_guards>
+## Step 0 (REQUIRED FIRST) — Load Architectural Guards
+Before ANY verification, read the project's locked architectural context. Findings that contradict a locked decision are NOT code gaps — they are **operator review items**, and you must flag them as such.
+**Files to read (in order, mandatory):**
+1. `.planning/NORTH-STAR.md` if it exists — quote any rule that touches the area you're auditing.
+2. Any project root `AGENTS.md` / `CLAUDE.md` — the project may name additional locked docs.
+3. `.planning/milestones/<active>/ROADMAP.md` — current milestone scope + sequencing.
+4. Memory files matching glob `*-locked*` or `*_locked*` in `~/.claude/projects/<project>/memory/` if accessible — quote any that touch the area you're auditing.
+**Use the loaded context as guards:**
+When you find a "gap" / "missing connection" / "unverified requirement" / "architectural concern":
+- Does a locked memory or NORTH-STAR rule explicitly say this should NOT exist?
+  → If yes, this is intentional. Do NOT flag as gap. Note as "intentional per <quoted-source>".
+- Does a locked rule prescribe a DIFFERENT fix than the obvious one?
+  → Recommend the locked-rule fix, not the obvious one. Quote the source.
+- Is the architecture genuinely ambiguous and could be resolved in multiple ways?
+  → Flag as **ARCHITECTURAL DECISION QUESTION (ADQ)** for operator review. Do NOT pick a side. State the options.
+**Report format addition:**
+Open your final report with a section:
+```markdown
+## Architectural Context Loaded
+- Locked source: <path> — <quoted relevant clause>
+- (or) No locked sources found in this repo.
+```
+If a finding's verdict depended on a locked rule, cite the rule inline next to the finding.
+**If you skip Step 0 and your fix contradicts a locked rule, the operator's trust in the audit collapses.** This step is not optional.
+</architectural_guards>
 <why_this_matters>
 **These documents are consumed by other RRR commands:**

package/agents/rrr-integration-checker.md CHANGED

@@ -25,8 +25,53 @@ Integration verification checks connections:
 4. **Data → Display** — Database has data, UI renders it?
 A "complete" codebase with broken wiring is a broken product.
+**Critical second principle: Integration ≠ Correct Architecture**
+A perfectly wired system can still violate the locked architectural decisions of the project. Before recommending any "missing wiring" or "broken flow" fix, you MUST verify the proposed fix does not contradict an architectural lock. See Step 0 below.
 </core_principle>
+<architectural_guards>
+## Step 0 (REQUIRED FIRST) — Load Architectural Guards
+Before ANY verification, read the project's locked architectural context. Findings that contradict a locked decision are NOT code gaps — they are **operator review items**, and you must flag them as such.
+**Files to read (in order, mandatory):**
+1. `.planning/NORTH-STAR.md` if it exists — quote any rule that touches the area you're auditing.
+2. Any project root `AGENTS.md` / `CLAUDE.md` — the project may name additional locked docs.
+3. `.planning/milestones/<active>/ROADMAP.md` — current milestone scope + sequencing.
+4. Memory files matching glob `*-locked*` or `*_locked*` in `~/.claude/projects/<project>/memory/` if accessible — quote any that touch the area you're auditing.
+**Use the loaded context as guards:**
+When you find a "gap" or "missing connection," ask:
+- Does a locked memory or NORTH-STAR rule explicitly say this should NOT exist?
+  → If yes, this is intentional. Do NOT flag as gap. Note as "intentional per <quoted-source>".
+- Does a locked rule prescribe a DIFFERENT fix than the obvious one?
+  → Recommend the locked-rule fix, not the obvious one. Quote the source.
+- Is the architecture genuinely ambiguous and could be resolved in multiple ways?
+  → Flag as **ARCHITECTURAL DECISION QUESTION (ADQ)** for operator review. Do NOT pick a side. State the options.
+**Report format addition:**
+Open your final report with a section:
+```markdown
+## Architectural Context Loaded
+- Locked source: <path> — <quoted relevant clause>
+- Locked source: <path> — <quoted relevant clause>
+- (or) No locked sources found in this repo.
+```
+If a finding's verdict depended on a locked rule, cite the rule inline next to the finding.
+**If you skip Step 0 and your fix contradicts a locked rule, the operator's trust in the audit collapses.** This step is not optional.
+</architectural_guards>
 <inputs>
 ## Required Context (provided by milestone auditor)

package/agents/rrr-plan-checker.md CHANGED

@@ -48,6 +48,47 @@ Then verify each level against the actual plan files.
 Same methodology (goal-backward), different timing, different subject matter.
 </core_principle>
+<architectural_guards>
+## Step 0 (REQUIRED FIRST) — Load Architectural Guards
+Before ANY verification, read the project's locked architectural context. Findings that contradict a locked decision are NOT code gaps — they are **operator review items**, and you must flag them as such.
+**Files to read (in order, mandatory):**
+1. `.planning/NORTH-STAR.md` if it exists — quote any rule that touches the area you're auditing.
+2. Any project root `AGENTS.md` / `CLAUDE.md` — the project may name additional locked docs.
+3. `.planning/milestones/<active>/ROADMAP.md` — current milestone scope + sequencing.
+4. Memory files matching glob `*-locked*` or `*_locked*` in `~/.claude/projects/<project>/memory/` if accessible — quote any that touch the area you're auditing.
+**Use the loaded context as guards:**
+When you find a "gap" / "missing connection" / "unverified requirement" / "architectural concern":
+- Does a locked memory or NORTH-STAR rule explicitly say this should NOT exist?
+  → If yes, this is intentional. Do NOT flag as gap. Note as "intentional per <quoted-source>".
+- Does a locked rule prescribe a DIFFERENT fix than the obvious one?
+  → Recommend the locked-rule fix, not the obvious one. Quote the source.
+- Is the architecture genuinely ambiguous and could be resolved in multiple ways?
+  → Flag as **ARCHITECTURAL DECISION QUESTION (ADQ)** for operator review. Do NOT pick a side. State the options.
+**Report format addition:**
+Open your final report with a section:
+```markdown
+## Architectural Context Loaded
+- Locked source: <path> — <quoted relevant clause>
+- (or) No locked sources found in this repo.
+```
+If a finding's verdict depended on a locked rule, cite the rule inline next to the finding.
+**If you skip Step 0 and your fix contradicts a locked rule, the operator's trust in the audit collapses.** This step is not optional.
+</architectural_guards>
 <verification_dimensions>
 ## Dimension 1: Requirement Coverage
@@ -207,6 +248,77 @@ issue:
   fix_hint: "Split into 2 plans: foundation (01) and integration (02)"
 ```
+## Dimension 5.5: ASCII Coverage Diagram (gstack-borrowed)
+**Question:** Does the plan declare a coverage diagram for each code path / user flow?
+Borrowed from gstack's [`plan-eng-review/SKILL.md`](https://github.com/garrytan/gstack). Forces planners to think through which paths are well-covered vs. thinly-covered vs. uncovered BEFORE the executor starts. This catches "we wrote the plan but the diff would leave Path X completely untested" early.
+**Process:**
+1. Scan each PLAN.md body for an ASCII diagram (lines containing `★` or `[GAP]` markers, OR a fenced block titled `coverage`/`flow`).
+2. If absent AND the plan frontmatter does NOT set `coverage_diagram_not_applicable: true` with a one-line `coverage_diagram_skip_reason`, flag `MISSING_COVERAGE_DIAGRAM`.
+**Markers used in the diagram:**
+| Marker | Meaning |
+|--------|---------|
+| `★★★`  | Fully covered (unit + integration + e2e or equivalent) |
+| `★★`   | Partially covered |
+| `★`    | Thinly covered |
+| `[GAP]`| Uncovered — explicit acknowledgment |
+**When to skip:** Docs-only changes (`surface: docs_only`), config bumps, version churn. The plan declares this in frontmatter, not silently.
+**Severity:** Warning (does NOT block execution by itself — it forces a planner-level reflection). Operator can override after seeing the warning.
+**Example issue:**
+```yaml
+issue:
+  dimension: coverage_diagram
+  severity: warning
+  description: "Plan 01 has 3 user-flow truths but no coverage diagram"
+  plan: "01"
+  fix_hint: "Add a 5-line ASCII flow with ★/★★/★★★/[GAP] markers OR set coverage_diagram_not_applicable: true with reason"
+```
+## Dimension 5.6: Overbuilt Plan Detector (gstack-borrowed)
+**Question:** Is this plan doing more than the phase needs?
+Borrowed from gstack's [`plan-eng-review/SKILL.md`](https://github.com/garrytan/gstack) "overbuilt plans" rule. Flags scope inflation early. Recommends a minimal alternative as a QUESTION to the operator — never auto-shrinks (User Sovereignty).
+**Trigger signals (ANY = flag):**
+- `files_modified` count ≥ 8 in a single plan
+- ≥ 2 new classes introduced (grep `class \w+` in proposed diffs section)
+- ≥ 1 new top-level directory (e.g., `src/foo/` where `foo` didn't exist)
+- ≥ 1 new package dependency in `package.json` / `Cargo.toml` / `pyproject.toml`
+- A plan title containing `framework`, `abstraction`, `generic`, `extensible`, `reusable` (common over-engineering tells)
+**Severity:** Warning (informational). The output MUST include a one-question prompt to the operator using `AskUserQuestion`:
+```
+Plan 01 has 11 files_modified and adds 2 new classes (UserAdapter, SessionAdapter).
+Minimal alternative considered: extend the existing AuthFacade.evaluate() with two
+new branches (~3 files). Which fits the phase goal?
+  1. Ship as planned (justified by [reason])
+  2. Reduce to the minimal alternative
+  3. Skip — none of these (let me explain)
+```
+**Example issue:**
+```yaml
+issue:
+  dimension: overbuilt_detector
+  severity: warning
+  description: "Plan 01 hits 3 of 5 overbuilt signals (file count, class count, new dependency)"
+  plan: "01"
+  signals:
+    files_modified: 11
+    new_classes: 2
+    new_deps: ["jose"]
+  minimal_alternative: "Reuse AuthFacade with two new evaluate() branches (~3 files)"
+  fix_hint: "Ask operator before executing"
+```
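Reviewer note: Dimension 5.6 is written as instructions to an agent, not as code, but its trigger signals are mechanical enough to sketch. The function below is a hypothetical illustration of the "ANY = flag" rule. The thresholds and title keywords are copied from the hunk above. The plan object shape (`filesModified`, `newClasses`, `newTopLevelDirs`, `newDeps`, `title`) and the signal labels are assumptions and do not appear in the package.

```javascript
// Title keywords listed under "Trigger signals" in Dimension 5.6.
const TITLE_TELLS = ["framework", "abstraction", "generic", "extensible", "reusable"];

// Return the list of overbuilt signals a plan hits (empty list = not flagged).
// The plan shape used here is hypothetical.
function overbuiltSignals(plan) {
  const signals = [];
  if ((plan.filesModified ?? []).length >= 8) signals.push("file_count");
  if ((plan.newClasses ?? []).length >= 2) signals.push("class_count");
  if ((plan.newTopLevelDirs ?? []).length >= 1) signals.push("new_top_level_dir");
  if ((plan.newDeps ?? []).length >= 1) signals.push("new_dependency");
  const title = (plan.title ?? "").toLowerCase();
  if (TITLE_TELLS.some((word) => title.includes(word))) signals.push("title_tell");
  return signals;
}
```

Applied to the example issue above (11 modified files, two new classes, one new dependency), this would report three signals, matching the "3 of 5" in the description. Any non-empty result would lead to the `AskUserQuestion` prompt and never to an automatic reduction of the plan.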
 ## Dimension 6: Verification Derivation
 **Question:** Do must_haves trace back to phase goal?

package/agents/rrr-verifier.md CHANGED

@@ -30,6 +30,47 @@ Goal-backward verification starts from the outcome and works backwards:
 Then verify each level against the actual codebase.
 </core_principle>
+<architectural_guards>
+## Step 0 (REQUIRED FIRST) — Load Architectural Guards
+Before ANY verification, read the project's locked architectural context. Findings that contradict a locked decision are NOT code gaps — they are **operator review items**, and you must flag them as such.
+**Files to read (in order, mandatory):**
+1. `.planning/NORTH-STAR.md` if it exists — quote any rule that touches the area you're auditing.
+2. Any project root `AGENTS.md` / `CLAUDE.md` — the project may name additional locked docs.
+3. `.planning/milestones/<active>/ROADMAP.md` — current milestone scope + sequencing.
+4. Memory files matching glob `*-locked*` or `*_locked*` in `~/.claude/projects/<project>/memory/` if accessible — quote any that touch the area you're auditing.
+**Use the loaded context as guards:**
+When you find a "gap" / "missing connection" / "unverified requirement" / "architectural concern":
+- Does a locked memory or NORTH-STAR rule explicitly say this should NOT exist?
+  → If yes, this is intentional. Do NOT flag as gap. Note as "intentional per <quoted-source>".
+- Does a locked rule prescribe a DIFFERENT fix than the obvious one?
+  → Recommend the locked-rule fix, not the obvious one. Quote the source.
+- Is the architecture genuinely ambiguous and could be resolved in multiple ways?
+  → Flag as **ARCHITECTURAL DECISION QUESTION (ADQ)** for operator review. Do NOT pick a side. State the options.
+**Report format addition:**
+Open your final report with a section:
+```markdown
+## Architectural Context Loaded
+- Locked source: <path> — <quoted relevant clause>
+- (or) No locked sources found in this repo.
+```
+If a finding's verdict depended on a locked rule, cite the rule inline next to the finding.
+**If you skip Step 0 and your fix contradicts a locked rule, the operator's trust in the audit collapses.** This step is not optional.
+</architectural_guards>
 <verification_process>
 ## Step 0: Check for Previous Verification
@@ -429,6 +470,57 @@ Categorize findings:
 - ⚠️ Warning: Indicates incomplete (TODO comments, console.log)
 - ℹ️ Info: Notable but not problematic
+## Step 7.5: Security Pass (Pass 1 — CRITICAL)
+Run after the anti-pattern scan, before human verification needs. Adapted from [garrytan/gstack `review/checklist.md`](https://github.com/garrytan/gstack) under the gstack ETHOS principle: "User Sovereignty — AI recommends, human decides."
+Full category definitions and grep patterns: `rrr/references/review-checklist.md`
+**Scope:** Apply Pass 1 categories ONLY to files modified in the current phase. Identify modified files via:
+```bash
+# From SUMMARY.md (preferred — respects phase scope)
+grep -E "^\- \`" "$PHASE_DIR"/*-SUMMARY.md 2>/dev/null | sed 's/.*`\([^`]*\)`.*/\1/' | sort -u
+# Fallback: git diff against phase baseline (last commit before this phase)
+git diff --name-only HEAD~1 2>/dev/null
+```
+**Pass 1 categories to check (CRITICAL):**
+1. **SQL & Data Safety** — string interpolation in queries, TOCTOU races, validation bypass, N+1 queries
+2. **Race Conditions & Concurrency** — read-check-write without uniqueness constraint, find-or-create without unique index, unsafe HTML rendering (XSS)
+3. **LLM Output Trust Boundary** — unvalidated LLM values to DB, untyped structured output, SSRF via LLM-provided URLs, prompt injection in vector stores
+4. **Shell Injection** — `subprocess(shell=True)` + interpolation, `os.system()` with variables, unguarded `eval()`/`exec()`
+5. **Enum & Value Completeness** — trace new enum values through all consumers, check allowlists, check `case`/`if-elsif` chains
+**Pass 2 (INFORMATIONAL):** Opt-in only. Run when invoked with `mode: deep-review` or when the phase explicitly touches security-sensitive domains (auth, webhooks, payments, LLM pipelines). Categories: Async/Sync Mixing, Column/Field Name Safety, LLM Prompt Issues, Completeness Gaps, Time Window Safety, Type Coercion at Boundaries, View/Frontend, Distribution & CI/CD Pipeline. See `rrr/references/review-checklist.md` Pass 2 for details.
+**Severity gating:**
+A Pass 1 finding BLOCKS phase verification (contributes to `gaps_found` status) ONLY if:
+- The finding is on a file the phase claims to have completed (listed in SUMMARY.md artifacts or PLAN.md `must_haves`), AND
+- The severity is CRITICAL (Pass 1 category)
+A Pass 1 finding on an unrelated legacy file that the phase incidentally touched does NOT fail the phase — record it as `SECURITY_ADVISORY` (informational, operator-review only).
+Pass 2 findings NEVER block phase status. They are always `ADVISORY`.
+**Output in VERIFICATION.md:** Add a `### Security Pass` subsection under `### Anti-Patterns Found`:
+```markdown
+### Security Pass (gstack Pass 1 — CRITICAL)
+| File | Line | Category | Finding | Severity | Blocks Phase |
+| ---- | ---- | -------- | ------- | -------- | ------------ |
+| `path/to/file.py` | 42 | LLM Output Trust Boundary | LLM URL fetched without allowlist (SSRF risk) | CRITICAL | Yes — phase artifact |
+| `legacy/old.rb` | 12 | SQL & Data Safety | String interpolation in query | CRITICAL | No — not a phase artifact |
+_Pass 2 (INFORMATIONAL) not run. Invoke with `mode: deep-review` to enable._
+```
+If no findings: `Security Pass: No Pass 1 issues found in phase-modified files.`
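Reviewer note: the severity gating in Step 7.5 reduces to a small decision rule. The sketch below restates it under assumptions: the finding shape (`file`, `pass`), the function name `classifyFinding`, and the use of a `Set` of phase-artifact paths are all hypothetical. The three outcomes (blocking CRITICAL, `SECURITY_ADVISORY`, `ADVISORY`) are the ones named in the hunk.

```javascript
// Decide how one security finding affects phase status.
// finding: { file: string, pass: 1 | 2 }  (hypothetical shape)
// phaseArtifacts: Set of paths the phase claims to have completed
function classifyFinding(finding, phaseArtifacts) {
  // Pass 2 findings never block; they are always advisory.
  if (finding.pass === 2) {
    return { label: "ADVISORY", blocksPhase: false };
  }
  // Pass 1 blocks only when the file is a declared phase artifact.
  if (phaseArtifacts.has(finding.file)) {
    return { label: "CRITICAL", blocksPhase: true };
  }
  // Pass 1 on an incidentally touched legacy file is recorded, not blocking.
  return { label: "SECURITY_ADVISORY", blocksPhase: false };
}
```

With `phaseArtifacts` containing `path/to/file.py`, the two rows of the example table above would come out as blocking and non-blocking respectively.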
 ## Step 8: Identify Human Verification Needs
 Some things can't be verified programmatically:
@@ -764,6 +856,39 @@ return <div>No messages</div>  // Always shows "no messages"
 </stub_detection_patterns>
+<learnings_persistence>
+## Learnings Persistence (gstack-borrowed)
+Adapted from gstack [`learn/SKILL.md`](https://github.com/garrytan/gstack) `learnings.jsonl` pattern. When verification reports `gaps_found: true`, append a one-line entry per distinct gap to `.planning/learnings.jsonl` so the next phase planner can query and surface relevant prior patterns.
+```bash
+# After writing VERIFICATION.md, for each gap of severity >= warning:
+mkdir -p .planning
+cat >> .planning/learnings.jsonl <<JSONL
+{"timestamp":"$(date -u +%Y-%m-%dT%H:%M:%SZ)","phase":"<phase>","key":"<gap-category-slug>","insight":"<one-line>","source":"verifier","severity":"<warning|critical>"}
+JSONL
+```
+Schema:
+| Field | Notes |
+|-------|-------|
+| `timestamp` | ISO-8601 UTC |
+| `phase` | The phase number / slug this gap surfaced in |
+| `key` | Lowercase kebab-case category slug (e.g., `chat-component-stub`, `unverified-webhook-handler`, `missing-error-state`) — same key across phases means same pattern |
+| `insight` | Single sentence, < 200 chars |
+| `source` | `verifier` here; other writers may use `plan-checker`, `integration-checker`, `auditor` |
+| `severity` | `warning` or `critical` (mirrors VERIFICATION.md classification) |
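A writer that builds the line with `JSON.stringify` avoids broken JSON when an insight contains quotes or backslashes, which a raw heredoc would not escape. The sketch below is a hypothetical helper (`buildLearningEntry` is not part of RRR) that also enforces the schema constraints listed in the table.

```javascript
// Hypothetical helper: builds one learnings.jsonl line that follows the schema above.
function buildLearningEntry({ phase, key, insight, severity }, now = new Date()) {
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(key)) {
    throw new Error(`key must be lowercase kebab-case: ${key}`);
  }
  if (insight.length >= 200 || insight.includes("\n")) {
    throw new Error("insight must be a single sentence under 200 chars");
  }
  if (!["warning", "critical"].includes(severity)) {
    throw new Error(`severity must be warning or critical: ${severity}`);
  }
  const entry = {
    // ISO-8601 UTC with second precision, matching the date format in the shell snippet.
    timestamp: now.toISOString().replace(/\.\d{3}Z$/, "Z"),
    phase: String(phase),
    key,
    insight,
    source: "verifier",
    severity,
  };
  // JSON.stringify escapes quotes and backslashes inside the insight.
  return JSON.stringify(entry);
}
```

The returned string could then be appended with `fs.appendFileSync(".planning/learnings.jsonl", line + "\n")`.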
+**Dedup:** Entries with the same `key` are not deduplicated at write time (append-only is intentional; frequency itself is signal). Readers (e.g., `plan-phase.md` step 3.7b) apply latest-wins per key for display.
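A reader-side sketch of the latest-wins rule, assuming the file is read into a string first. The `latestLearningsByKey` name and the added `count` field are illustrative assumptions; the count keeps the frequency signal that append-only storage is meant to preserve.

```javascript
// Hypothetical reader: collapses append-only JSONL into one entry per key.
// Later lines overwrite earlier ones (latest-wins); count records how often the key appeared.
function latestLearningsByKey(jsonlText) {
  const byKey = new Map();
  for (const raw of jsonlText.split("\n")) {
    const line = raw.trim();
    if (!line) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of failing the whole read
    }
    if (!entry || typeof entry.key !== "string") continue;
    const seen = byKey.get(entry.key);
    byKey.set(entry.key, { ...entry, count: (seen ? seen.count : 0) + 1 });
  }
  return [...byKey.values()];
}
```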
+**Separation from decision-recall.** This is NOT the same as RRR's existing `decision-store.js` — that tracks deliberate decisions ("we chose Postgres over SQLite because..."). Learnings track *observed patterns* the planner can use to warn next time ("stub-pattern showed up in this surface twice — watch for it again"). Both coexist.
+Attribution: gstack `learn/SKILL.md` + `learnings.jsonl` schema.
+</learnings_persistence>
 <success_criteria>
 - [ ] Previous VERIFICATION.md checked (Step 0)
@@ -774,6 +899,8 @@ return <div>No messages</div>  // Always shows "no messages"
 - [ ] All key links verified
 - [ ] Requirements coverage assessed (if applicable)
 - [ ] Anti-patterns scanned and categorized
+- [ ] Security pass (Pass 1 CRITICAL) run on phase-modified files; findings in VERIFICATION.md
+- [ ] On gaps_found: one learning entry per distinct gap appended to `.planning/learnings.jsonl`
 - [ ] Human verification items identified
 - [ ] Overall status determined
 - [ ] Gaps structured in YAML frontmatter (if gaps_found)

package/commands/rrr/add-todo.md CHANGED

@@ -87,6 +87,31 @@ If overlapping, use AskUserQuestion:
   - "Add anyway" — create as separate todo
 </step>
+<step name="prioritize">
+Ask the operator for priority and effort using `AskUserQuestion` (User Sovereignty — never auto-decide). Priority tiers borrowed from gstack `TODOS.md` format:
+| Tier | Meaning |
+|------|---------|
+| P0 | Blocking — fix now, before any other work |
+| P1 | Critical this cycle — required for the active milestone |
+| P2 | After urgent work (default if unsure) |
+| P3 | Revisit with usage data — gate on a signal (telemetry, user report) |
+| P4 | Future consideration — backlog reference only |
+Effort tiers borrowed from gstack:
+| Tier | Estimated effort |
+|------|------------------|
+| S | 4–8 hours |
+| M | ~1 day |
+| L | 2–3 days |
+| XL | 4+ days |
+Surface BOTH questions in a single `AskUserQuestion` call (multiSelect: false on each). Use the inferred area + problem statement to suggest a default in the option labels (e.g. "P2 (Recommended)" for general defensive work). The operator's selection becomes the `priority` and `effort` frontmatter values.
+Also collect a single-line `why` from the operator (or compress from the problem statement) — this becomes the `why:` frontmatter.
+</step>
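Once the operator has answered, the three values can be checked before they are written to frontmatter. The sketch below is a hypothetical validator (`validateTodoMeta` is not an RRR function). It only reports problems and never substitutes a default, in keeping with the never-auto-decide rule.

```javascript
// Hypothetical validator for the operator-supplied frontmatter values.
const PRIORITIES = ["P0", "P1", "P2", "P3", "P4"];
const EFFORTS = ["S", "M", "L", "XL"];

// Returns a list of problems; an empty list means the values are usable as-is.
function validateTodoMeta({ priority, effort, why }) {
  const errors = [];
  if (!PRIORITIES.includes(priority)) {
    errors.push(`priority must be one of ${PRIORITIES.join(", ")}`);
  }
  if (!EFFORTS.includes(effort)) {
    errors.push(`effort must be one of ${EFFORTS.join(", ")}`);
  }
  if (typeof why !== "string" || why.trim() === "" || why.includes("\n")) {
    errors.push("why must be a single non-empty line");
  }
  return errors;
}
```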
 <step name="create_file">
 ```bash
 timestamp=$(date "+%Y-%m-%dT%H:%M")
@@ -102,6 +127,9 @@ Write to `.planning/todos/pending/${date_prefix}-${slug}.md`:
 created: [timestamp]
 title: [title]
 area: [area]
+priority: [P0|P1|P2|P3|P4]      # gstack-borrowed tier
+effort: [S|M|L|XL]                # gstack-borrowed tier
+why: [one-line motivation]
 files:
   - [file:lines]
 ---
@@ -173,10 +201,11 @@ Would you like to:
 <success_criteria>
 - [ ] Directory structure exists
-- [ ] Todo file created with valid frontmatter
+- [ ] Todo file created with valid frontmatter (including `priority` + `effort` + `why`)
 - [ ] Problem section has enough context for future Claude
 - [ ] No duplicates (checked and resolved)
 - [ ] Area consistent with existing todos
 - [ ] STATE.md updated if exists
 - [ ] Todo and state committed to git
+- [ ] Priority + effort prompted via AskUserQuestion (never auto-decided) — gstack User Sovereignty principle
 </success_criteria>

package/commands/rrr/audit-milestone.md CHANGED

@@ -138,6 +138,81 @@ Plus full markdown report with tables for requirements, phases, integration, tec
 - `gaps_found` — critical blockers exist
 - `tech_debt` — no blockers but accumulated deferred items need review
+## 6.5. Retro Metrics (gstack-borrowed)
+Adapted from gstack [`retro/SKILL.md`](https://github.com/garrytan/gstack). RRR has no built-in retrospective today — the audit verifies what shipped vs. what was intended, but not the *how* (velocity, quality signals, churn). These metrics are INFORMATIONAL — they do not block milestone archive. One signal (fix-to-feature ratio) raises a REVIEW_QUALITY_WARNING to the operator before archiving.
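+Compute over `git log` covering the milestone's commit range (from the first commit after the previous milestone's archive tag to HEAD). The commands below use `v1.23..HEAD` as an example range; substitute the previous milestone's archive tag: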
+### Commits per phase
+Group commits by phase number prefix in commit messages (e.g., `feat(phase-90):` → phase 90). Surface as a table.
+```bash
+git log v1.23..HEAD --pretty=format:"%s" 2>/dev/null \
+  | grep -oE "phase[- ]?[0-9]+" | sort | uniq -c | sort -rn
+```
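The same grouping can be sketched in JavaScript when the subjects are already in memory. The `commitsPerPhase` helper is hypothetical. Unlike the shell pipeline, it keys on the phase number, so `phase-90` and `phase 90` land in the same row, and it counts at most one phase per commit subject.

```javascript
// Hypothetical helper: counts commits per phase number from commit subject lines.
function commitsPerPhase(subjects) {
  const counts = new Map();
  for (const subject of subjects) {
    // Same pattern as the grep above, with the number captured.
    const match = /phase[- ]?(\d+)/.exec(subject);
    if (!match) continue;
    const phase = Number(match[1]);
    counts.set(phase, (counts.get(phase) || 0) + 1);
  }
  // Most commits first; ties broken by ascending phase number.
  return [...counts.entries()].sort((a, b) => b[1] - a[1] || a[0] - b[0]);
}
```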
+### Fix-to-feature ratio
+```bash
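+fix=$(git log v1.23..HEAD --pretty=format:"%s" | grep -cE "^fix[(:]")
+feat=$(git log v1.23..HEAD --pretty=format:"%s" | grep -cE "^feat[(:]")
+total=$((fix + feat))
+# Percentage of fix commits among fix + feat commits; skip the division when the range has neither
+[ "$total" -gt 0 ] && echo "scale=2; $fix * 100 / $total" | bc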
+```
+- Below 30% → healthy
+- 30–50% → ADVISORY (note in audit report)
+- Above 50% → **REVIEW_QUALITY_WARNING** — surface via `AskUserQuestion` before archiving: "Fix-ratio is {N}% — likely review-gap signal. Continue archiving?" → Yes / Pause for retro phase / Let me investigate
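The thresholds above map onto a small classifier. This is a sketch under assumptions: the `classifyFixRatio` name and the `NO_DATA` status for a range with no `fix` or `feat` commits are not defined by the source.

```javascript
// Hypothetical classifier for the fix-to-feature thresholds listed above.
function classifyFixRatio(fixCount, featCount) {
  const total = fixCount + featCount;
  if (total === 0) {
    // Assumption: the source does not say what to report for an empty range.
    return { percent: null, status: "NO_DATA" };
  }
  const percent = (fixCount * 100) / total;
  if (percent < 30) return { percent, status: "healthy" };
  if (percent <= 50) return { percent, status: "ADVISORY" };
  return { percent, status: "REVIEW_QUALITY_WARNING" };
}
```

Only the `REVIEW_QUALITY_WARNING` status would lead to the `AskUserQuestion` prompt before archiving; the other statuses are reported in the audit output.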
+### File churn hotspots
+Top 5 files modified across the most phases (signals coupling / instability).
+```bash
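+# Counts commits touching each path (a proxy for phases); an empty format drops hash lines and keeps uppercase paths
+git log v1.23..HEAD --name-only --pretty=format: 2>/dev/null \
+  | grep -v '^$' | sort | uniq -c | sort -rn | head -5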
+```
+### Session detection
+Group commits by 45-minute gaps; report session count + average duration.
+```bash
+# %at prints the author date as a Unix timestamp; --reverse lists oldest first so gaps are positive
+git log v1.23..HEAD --reverse --pretty=format:"%at" \
+  | awk '{ gap = $1 - prev
+           if (prev && gap > 2700) n++                          # a gap over 45 min starts a new session
+           if (prev && gap <= 2700) { dur += gap; g++ }         # gap between commits inside one session
+           prev = $1 }
+         END { if (NR) print "sessions: " (n+1) ", avg gap: " (g ? dur/g/60 : 0) "m" }'
+```
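For the "average duration" half of the report, a sketch that groups timestamps into sessions and averages each session's length. The `detectSessions` name, the Unix-seconds input, and the returned object shape are assumptions for illustration.

```javascript
// Hypothetical helper: groups commit timestamps (Unix seconds) into sessions.
// A gap longer than maxGapSeconds (45 minutes by default) starts a new session.
function detectSessions(timestamps, maxGapSeconds = 45 * 60) {
  const sorted = [...timestamps].sort((a, b) => a - b);
  const sessions = [];
  for (const ts of sorted) {
    const current = sessions[sessions.length - 1];
    if (current && ts - current.end <= maxGapSeconds) {
      current.end = ts; // extend the running session
    } else {
      sessions.push({ start: ts, end: ts });
    }
  }
  const totalDuration = sessions.reduce((sum, s) => sum + (s.end - s.start), 0);
  return {
    count: sessions.length,
    averageDurationSeconds: sessions.length ? totalDuration / sessions.length : 0,
  };
}
```

A single-commit session has a duration of zero under this definition, which pulls the average down; whether that is the intended reading of "average duration" is not specified in the source.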
+### Total LOC delta
+```bash
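+# Match by label: a commit may report only insertions or only deletions, so fixed columns miscount
+git log v1.23..HEAD --shortstat --pretty=tformat: 2>/dev/null \
+  | awk '{ for (i = 2; i <= NF; i++) { if ($i ~ /^insertion/) ins += $(i-1); if ($i ~ /^deletion/) del += $(i-1) } }
+         END { print "+" (ins+0) " -" (del+0) " net " (ins-del) }'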
+```
+### Output
+Append a `### Retro Metrics` subsection to `v{version}-MILESTONE-AUDIT.md`:
+```markdown
+### Retro Metrics (gstack-borrowed)
+**Commits per phase:**
+| Phase | Commits |
+|-------|---------|
+| 87    | 8       |
+| 88    | 6       |
+| ...   | ...     |
+**Fix-to-feature ratio:** 24% — healthy (< 30%)
+**Top churn files:** package.json, CHANGELOG.md, agents/rrr-verifier.md, ...
+**Sessions:** 12 detected, avg gap 8.4 min within sessions
+**Net LOC delta:** +4,820 / -1,330 (net +3,490)
+```
+Attribution: gstack `retro/SKILL.md` metrics framework.
 ## 7. Present Results
 Route by status (see `<offer_next>`).