projecta-rrr 1.24.5 → 1.24.8

package/AGENTS.md CHANGED
@@ -96,4 +96,4 @@ The RRR loop follows five steps:
96
96
  - Use `$rrr-help` for full skill catalogue
97
97
  - Trigger phrases are case-sensitive: `$rrr-plan-phase` not `$rrr-PlanPhase`
98
98
 
99
- <!-- generated: 2026-05-13T01:06:46.790Z | source: commands/rrr/*.md | count: 59 skills -->
99
+ <!-- generated: 2026-05-13T03:49:42.852Z | source: commands/rrr/*.md | count: 59 skills -->
package/CHANGELOG.md CHANGED
@@ -4,6 +4,61 @@ All notable changes to RRR will be documented in this file.
4
4
 
5
5
  Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
6
6
 
7
+ ## [1.24.8] - 2026-05-13
8
+
9
+ ### Security (Phase 91 — close remaining Pass 1 audit findings)
10
+
11
+ Audit report at `.planning/audits/2026-05-13-coordinator-pass1.md` (gitignored, local-only). All findings now mitigated or accepted-with-documentation; deployed to Sprite team-platform and verified.
12
+
13
+ - **`handleWebhook` input validation (server.mjs)** — every webhook input that flows into a shell-string is now regex-validated BEFORE script construction: `repository.full_name` against `^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$`; `ref` against `^refs/heads/[A-Za-z0-9._/-]+$`; branch substring against `^[A-Za-z0-9._/-]+$`; `pull_request.number` coerced via `Number()` and rejected if not a positive integer. Mismatch returns `400 {error,field,value}` with value-preview sanitised to printable ASCII, ≤ 80 chars. Defense-in-depth — GitHub blocks dangerous chars in repo names today, but a single future GitHub behavior change would have made the unvalidated path exploitable. Set `RRR_VALIDATE_SELF_CHECK=1` to run the validator's built-in test cases on startup.
14
+ - **Untrusted-content delimiters in reviewer + synthesizer prompts (scripts.mjs)** — every reviewer prompt now begins with an `## INPUT TRUST BOUNDARY` preamble explicitly listing PR diff, PR body, and repo contents as attacker-controllable. Reviewer outputs feeding the synthesizer are now wrapped in `<reviewer name="...">…</reviewer>` blocks with the same anti-injection clause applied to LLM-generated reviewer content. Closes the prompt-injection vector that was reachable from any external PR.
15
+ - **Debounce state persistence (server.mjs)** — `DEBOUNCE` Map writes to `/tmp/rrr-debounce.json` on every mutation (atomic `tmp + rename`); on boot, the Map is rehydrated and any in-window pending entries get a fresh trailing-edge timer with the remaining time. Survives `sprite-env services restart` without dropping coalesced events.
16
+ - **Jobs persistence (server.mjs)** — finished jobs append to `/tmp/rrr-jobs.ndjson` (NDJSON, append-only). On boot, the last 100 lines are replayed into the in-memory `JOBS` Map so `/jobs/<id>` queries return historical jobs after restart. Output/error bodies are not persisted (size-bounded in-process only).
17
+ - **Non-pass-through webhook events recorded (server.mjs)** — events that fall through (unhandled event type, base-not-main, branch-not-main-or-integration, validation failures) now create a `kind: "webhook-ignored"` job stub with the event name + action + reason. Visible via `/jobs` — operator gets visibility into misconfigured subscriptions without changing the 200 response shape.
18
+ - **Service spawn: `bash -lc` → `bash -c` + explicit loader source (server.mjs)** — `spawnReview` no longer uses login-shell autosource. Instead `bash -c 'set -e; [ -r "$HOME/.rrr-load.sh" ] && source "$HOME/.rrr-load.sh"; bash <scriptPath>'`. Reduces blast radius if any future vector achieves env-var injection — only the narrow loader file is sourced, not whatever `~/.profile` chain pulls in.
19
+ - **`/status` surfaces persistence paths** — adds `debounce_file` and `jobs_file` fields so operators can quickly find on-disk state during incident response.
20
+ - **`cf-coordinator/src/index.ts` SECURITY-DEPRECATED banner** — prominent header at the top of the file naming the 7 CRITICAL findings and the do-not-revive condition. The Worker itself remains deleted from Cloudflare (no infra change in this release).
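The `handleWebhook` validation entry above can be pictured with a minimal sketch. The regexes, the `400 {error,field,value}` shape, and the 80-character printable-ASCII preview come from the entry; the function names, the error message text, and the decision to skip absent `ref` / `pull_request` fields are assumptions for illustration, not the actual `server.mjs` code.

```javascript
// Illustrative sketch only: names are hypothetical, not taken from server.mjs.
// The three patterns are the ones quoted in the changelog entry.
const REPO_RE = /^[A-Za-z0-9._-]+\/[A-Za-z0-9._-]+$/;
const REF_RE = /^refs\/heads\/[A-Za-z0-9._\/-]+$/;
const BRANCH_RE = /^[A-Za-z0-9._\/-]+$/;

// Keep only printable ASCII and cap the preview at 80 characters.
function previewValue(value) {
  return String(value).replace(/[^\x20-\x7E]/g, "").slice(0, 80);
}

// Returns null when the payload is acceptable, otherwise the body of a 400 response.
function validateWebhook(payload) {
  const reject = (field, value) => ({
    error: "invalid webhook input", // message text is an assumption
    field,
    value: previewValue(value),
  });

  const repo = payload.repository?.full_name;
  if (typeof repo !== "string" || !REPO_RE.test(repo)) {
    return reject("repository.full_name", repo);
  }

  // Assumption: fields that are absent from the event are simply not checked.
  if (payload.ref != null) {
    const ref = payload.ref;
    if (typeof ref !== "string" || !REF_RE.test(ref)) return reject("ref", ref);
    const branch = ref.slice("refs/heads/".length);
    if (!BRANCH_RE.test(branch)) return reject("branch", branch);
  }

  if (payload.pull_request != null) {
    const n = Number(payload.pull_request.number);
    if (!Number.isInteger(n) || n <= 0) {
      return reject("pull_request.number", payload.pull_request.number);
    }
  }
  return null;
}
```

Validation runs before any shell string is built, so a rejected payload never reaches script construction.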
21
+
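For the debounce rehydration described above, one way to compute the "remaining time" for each pending entry is sketched here. The entry shape (`dueAt` as an epoch-millisecond deadline) and the function name are hypothetical; the changelog does not describe the on-disk format, and what happens to entries whose window has already closed is not stated, so this sketch simply leaves them out.

```javascript
// Illustrative sketch: `entries` stands for [key, { dueAt }] pairs as they might be
// read back from the persisted JSON file. The shape is an assumption.
function pendingTimers(entries, now) {
  const timers = [];
  for (const [key, entry] of entries) {
    const remainingMs = entry.dueAt - now;
    // Only in-window entries get a fresh trailing-edge timer.
    if (remainingMs > 0) timers.push({ key, remainingMs });
  }
  return timers;
}
```

A caller would then schedule one `setTimeout` per returned item with `remainingMs` as the delay.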
22
+ ### Notes
23
+
24
+ - **No externally-exploitable findings remain** as of v1.24.8 deploy. Shell-injection CRITICALs were never externally exploitable (GitHub blocks dangerous chars); they are now defense-in-depth-validated. Prompt-injection via PR bodies WAS exploitable; now mitigated by reviewer Bash-tool removal (v1.24.7 hotfix) AND untrusted-content delimiters (this release).
25
+ - **Validator self-check is opt-in.** Leaving it off in production keeps boot fast; CI can set `RRR_VALIDATE_SELF_CHECK=1` to run the validator's built-in test cases on startup and confirm the regexes still behave as expected.
26
+
27
+ ## [1.24.7] - 2026-05-13
28
+
29
+ ### Added — gstack borrow integration (Phase 90, no new commands — all enhancements to existing commands/agents)
30
+
31
+ - **TODO priority tiers (P0–P4) + effort tiers (S/M/L/XL) + `why` field** — `commands/rrr/add-todo.md` prompts for both via `AskUserQuestion` (User Sovereignty: never auto-decide); `commands/rrr/check-todos.md` groups display by priority with P0 visually separated as 🔴 BLOCKING. Adapted from gstack [`TODOS.md` format](https://github.com/garrytan/gstack).
32
+ - **ASCII coverage-diagram check in `rrr-plan-checker`** — new dimension 5.5: plan must render a coverage diagram with `★★★` / `★★` / `★` / `[GAP]` markers OR explicitly set `coverage_diagram_not_applicable: true` with reason. Warning (does not block). Adapted from gstack [`plan-eng-review/SKILL.md`](https://github.com/garrytan/gstack).
33
+ - **Overbuilt-plan detector in `rrr-plan-checker`** — new dimension 5.6: flags scope inflation (files_modified ≥ 8, ≥ 2 new classes, new top-level dir, new dep, "framework/abstraction/generic" in title) and prompts the operator for a minimal alternative via `AskUserQuestion`. Adapted from gstack `plan-eng-review/SKILL.md` "overbuilt plans" rule.
34
+ - **Optional dual-voice second planner in `/rrr:plan-phase`** — step 9.5 auto-triggers on risky-domain keywords (`auth`/`webhook`/`payment`/`llm`/`oauth`/`migration`/`secret`/`pii`/`rbac`/`prompt-injection`) or `--dual-voice` flag. Spawns a blind `rrr-planner` (no RESEARCH.md, no first-planner output) and builds a divergence table; disagreements surface via `AskUserQuestion`. Documentation includes the gstack Codex filesystem-boundary preamble for future second-model integration. Adapted from gstack [`autoplan/SKILL.md`](https://github.com/garrytan/gstack) + [`codex/SKILL.md`](https://github.com/garrytan/gstack).
35
+ - **Strategy Scope Challenge in `/rrr:discuss-milestone`** — new step 5: four scope-mode passes (EXPANSION / SELECTIVE / HOLD / REDUCTION) with inversion-reflex prompts before locking milestone scope. Every move surfaced via `AskUserQuestion`. Adapted from gstack [`plan-ceo-review/SKILL.md`](https://github.com/garrytan/gstack).
36
+ - **Console-error capture + Runtime Health Score in `/rrr:verify-work --uat`** — UAT now calls `browser_console_messages` after each navigate (errors fail the step, warnings advise); computes a 0–100 health score weighted across console / links / visual / functional / perf with a persisted baseline at `.planning/artifacts/qa-baseline.json`. Below 60 → UAT fails. Adapted from gstack [`qa/SKILL.md`](https://github.com/garrytan/gstack).
37
+ - **Design Review (audit-only) in `/rrr:verify-work --design-review`** — 80-item / 10-category checklist with Design Score (A–F) + AI-Slop Score (0–10). RRR does **not** borrow gstack's auto-fix loop (conflicts with planner→executor separation); findings become `### Design Review` items in VERIFICATION.md that the operator turns into todos / new phases. New reference file: `rrr/references/design-review-checklist.md`. Adapted from gstack [`design-review/SKILL.md`](https://github.com/garrytan/gstack).
38
+ - **Retro Metrics in `/rrr:audit-milestone`** — new step 6.5 emits commits-per-phase, fix-to-feature ratio (> 50% triggers `REVIEW_QUALITY_WARNING`), file-churn hotspots, session detection, and net LOC delta into the milestone audit report. Informational; doesn't block archive. Adapted from gstack [`retro/SKILL.md`](https://github.com/garrytan/gstack).
39
+ - **Composite Health-Score preflight in `/rrr:ship`** — new step 1.7: runs typecheck (25 pts) + lint (20) + tests (30) + dead-code (15) + shellcheck (10) where defined; renormalises if any tool is absent. Score < 50 hard-blocks ship; 50–69 warns via `AskUserQuestion`; ≥ 70 continues. Trend logged to `.planning/artifacts/health-history.jsonl`. Adapted from gstack [`health/SKILL.md`](https://github.com/garrytan/gstack).
40
+ - **Learnings persistence (`.planning/learnings.jsonl`)** — `rrr-verifier` appends a one-line entry per gap on `gaps_found: true`; `/rrr:plan-phase` step 3.7b queries the file and surfaces relevant entries via `AskUserQuestion` before planning. Complements (does NOT replace) `decision-recall.js` — that tracks deliberate decisions; this tracks observed patterns. Adapted from gstack [`learn/SKILL.md`](https://github.com/garrytan/gstack).
41
+ - **Codex filesystem-boundary preamble (documented for future second-opinion paths)** — `Do NOT read files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/.` Embedded in `commands/rrr/plan-phase.md` step 9.5 so any future Codex-based second voice doesn't audit the meta-system. From gstack [`codex/SKILL.md`](https://github.com/garrytan/gstack).
42
+
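The composite health-score preflight for `/rrr:ship` lends itself to a small worked sketch. The weights and the 50 / 70 thresholds are taken from the entry above; how each tool's result maps to points is not stated, so this sketch assumes each tool reports a fraction between 0 and 1, and treats "no tools defined" as an undecided case. All names are hypothetical.

```javascript
// Weights from the changelog entry; key names are illustrative.
const WEIGHTS = { typecheck: 25, lint: 20, tests: 30, deadCode: 15, shellcheck: 10 };

// `results` maps tool name -> fraction of points earned in [0, 1] (assumption).
// Tools missing from `results` are treated as absent and the score is renormalised.
function healthScore(results) {
  let earned = 0;
  let available = 0;
  for (const [tool, weight] of Object.entries(WEIGHTS)) {
    if (!(tool in results)) continue;
    available += weight;
    earned += weight * results[tool];
  }
  if (available === 0) return null; // nothing to measure
  return Math.round((earned * 100) / available);
}

// Thresholds from the entry: < 50 blocks, 50-69 asks the operator, >= 70 continues.
function shipDecision(score) {
  if (score === null) return "unknown"; // assumption: left to the caller
  if (score < 50) return "block";
  if (score < 70) return "ask";
  return "continue";
}
```

For example, a project with only lint (fully passing) and tests (half passing) earns 35 of 50 available points, which renormalises to 70 and continues.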
43
+ ### Notes
44
+
45
+ - **No new commands were created.** Every gstack-recommended new command (`rrr:health`, `rrr:retro`, `rrr:design-review`, `rrr:strategy-review`, `rrr:design-system`) was mapped onto an enhancement of the closest existing RRR command, preserving the existing command surface.
46
+ - **User Sovereignty preserved everywhere.** All non-mechanical decisions surfaced via `AskUserQuestion` — RRR never silently picks a side on contested questions.
47
+ - **Planner→Executor separation preserved.** The verifier remains read-only; gstack's auto-fix loops in `/qa` and `/design-review` were **not** borrowed.
48
+
49
+ ## [1.24.6] - 2026-05-13
50
+
51
+ ### Added
52
+ - **Security pass in rrr-verifier** — every phase verification now runs a Pass 1 (CRITICAL) security scan on phase-modified files: SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Shell Injection, and Enum & Value Completeness. Findings appear as a `### Security Pass` subsection in VERIFICATION.md. A Pass 1 finding blocks phase status only when it is on a file the phase claims to have completed; legacy file findings are recorded as `SECURITY_ADVISORY` without blocking. Pass 2 (INFORMATIONAL, 8 categories) is opt-in via `mode: deep-review`. Checklist content adapted from [garrytan/gstack `review/checklist.md`](https://github.com/garrytan/gstack) (gstack ETHOS: "User Sovereignty — AI recommends, human decides").
53
+ - **`rrr/references/review-checklist.md`** — standalone reference containing both passes with grep patterns, severity classification, Fix-First heuristic, and suppression rules. Updatable independently of the verifier prompt.
54
+ - **Infisical machine-identity auth mode** — `$rrr-provision-coordinator --auth-mode infisical-machine-identity --infisical-project-id <id> --infisical-env <env>` writes a coordinator contract that fetches secrets at boot from Infisical via Universal Auth. Required runtime secrets become `RRR_INFI_CID` + `RRR_INFI_CS` only; Anthropic and GitHub tokens live in Infisical and rotate centrally.
55
+ - **Infisical Sprite bootstrap** — `$rrr-provision-sprites --apply --infisical-project-id <id>` installs Infisical CLI inside each Sprite, writes machine-identity creds, and wires `~/.profile` to fetch `ANTHROPIC_API_KEY` on every login shell. Uses `RRR_INFI_CID` + `RRR_INFI_CS` from the provisioning host's environment.
56
+
57
+ ### Changed
58
+ - **Default coordinator auth mode** — `infisical-machine-identity` is now the default (was `claude-code-subscription`). The subscription path remains available via `--auth-mode claude-code-subscription`.
59
+ - **Coordinator README** — generated README emits the Infisical install + login + secret-fetch boot snippet when in Infisical mode; falls back to credentials-file injection or AI Gateway when in their respective modes.
60
+ - **`$rrr-coordinator-status`** — checks for `RRR_INFI_CID`/`RRR_INFI_CS` env and `infisical --version` when in Infisical mode.
61
+
7
62
  ## [1.24.5] - 2026-05-13
8
63
 
9
64
  ### Added
@@ -24,6 +24,47 @@ A "deprecated" doc might contain the only record of a critical decision. An "out
24
24
  **Canonical Truth Rule:** `.planning/*` is canonical. Non-.planning markdown is reference-only unless explicitly imported.
25
25
  </core_principle>
26
26
 
27
+ <architectural_guards>
28
+
29
+ ## Step 0 (REQUIRED FIRST) — Load Architectural Guards
30
+
31
+ Before ANY verification, read the project's locked architectural context. Findings that contradict a locked decision are NOT code gaps — they are **operator review items**, and you must flag them as such.
32
+
33
+ **Files to read (in order, mandatory):**
34
+
35
+ 1. `.planning/NORTH-STAR.md` if it exists — quote any rule that touches the area you're auditing.
36
+ 2. Any project root `AGENTS.md` / `CLAUDE.md` — the project may name additional locked docs.
37
+ 3. `.planning/milestones/<active>/ROADMAP.md` — current milestone scope + sequencing.
38
+ 4. Memory files matching glob `*-locked*` or `*_locked*` in `~/.claude/projects/<project>/memory/` if accessible — quote any that touch the area you're auditing.
39
+
40
+ **Use the loaded context as guards:**
41
+
42
+ When you find a "gap" / "missing connection" / "unverified requirement" / "architectural concern":
43
+ - Does a locked memory or NORTH-STAR rule explicitly say this should NOT exist?
44
+ → If yes, this is intentional. Do NOT flag as gap. Note as "intentional per <quoted-source>".
45
+ - Does a locked rule prescribe a DIFFERENT fix than the obvious one?
46
+ → Recommend the locked-rule fix, not the obvious one. Quote the source.
47
+ - Is the architecture genuinely ambiguous and could be resolved in multiple ways?
48
+ → Flag as **ARCHITECTURAL DECISION QUESTION (ADQ)** for operator review. Do NOT pick a side. State the options.
49
+
50
+ **Report format addition:**
51
+
52
+ Open your final report with a section:
53
+
54
+ ```markdown
55
+ ## Architectural Context Loaded
56
+
57
+ - Locked source: <path> — <quoted relevant clause>
58
+ - (or) No locked sources found in this repo.
59
+ ```
60
+
61
+ If a finding's verdict depended on a locked rule, cite the rule inline next to the finding.
62
+
63
+ **If you skip Step 0 and your fix contradicts a locked rule, the operator's trust in the audit collapses.** This step is not optional.
64
+
65
+ </architectural_guards>
66
+
67
+
27
68
  <scan_ignored_paths>
28
69
  **NEVER scan these paths:**
29
70
  - GSDWatcher/**
@@ -26,6 +26,47 @@ You are spawned by `/rrr:map-codebase` with one of four focus areas:
26
26
  Your job: Explore thoroughly, then write document(s) directly. Return confirmation only.
27
27
  </role>
28
28
 
29
+ <architectural_guards>
30
+
31
+ ## Step 0 (REQUIRED FIRST) — Load Architectural Guards
32
+
33
+ Before ANY verification, read the project's locked architectural context. Findings that contradict a locked decision are NOT code gaps — they are **operator review items**, and you must flag them as such.
34
+
35
+ **Files to read (in order, mandatory):**
36
+
37
+ 1. `.planning/NORTH-STAR.md` if it exists — quote any rule that touches the area you're auditing.
38
+ 2. Any project root `AGENTS.md` / `CLAUDE.md` — the project may name additional locked docs.
39
+ 3. `.planning/milestones/<active>/ROADMAP.md` — current milestone scope + sequencing.
40
+ 4. Memory files matching glob `*-locked*` or `*_locked*` in `~/.claude/projects/<project>/memory/` if accessible — quote any that touch the area you're auditing.
41
+
42
+ **Use the loaded context as guards:**
43
+
44
+ When you find a "gap" / "missing connection" / "unverified requirement" / "architectural concern":
45
+ - Does a locked memory or NORTH-STAR rule explicitly say this should NOT exist?
46
+ → If yes, this is intentional. Do NOT flag as gap. Note as "intentional per <quoted-source>".
47
+ - Does a locked rule prescribe a DIFFERENT fix than the obvious one?
48
+ → Recommend the locked-rule fix, not the obvious one. Quote the source.
49
+ - Is the architecture genuinely ambiguous and could be resolved in multiple ways?
50
+ → Flag as **ARCHITECTURAL DECISION QUESTION (ADQ)** for operator review. Do NOT pick a side. State the options.
51
+
52
+ **Report format addition:**
53
+
54
+ Open your final report with a section:
55
+
56
+ ```markdown
57
+ ## Architectural Context Loaded
58
+
59
+ - Locked source: <path> — <quoted relevant clause>
60
+ - (or) No locked sources found in this repo.
61
+ ```
62
+
63
+ If a finding's verdict depended on a locked rule, cite the rule inline next to the finding.
64
+
65
+ **If you skip Step 0 and your fix contradicts a locked rule, the operator's trust in the audit collapses.** This step is not optional.
66
+
67
+ </architectural_guards>
68
+
69
+
29
70
  <why_this_matters>
30
71
  **These documents are consumed by other RRR commands:**
31
72
 
@@ -25,8 +25,53 @@ Integration verification checks connections:
25
25
  4. **Data → Display** — Database has data, UI renders it?
26
26
 
27
27
  A "complete" codebase with broken wiring is a broken product.
28
+
29
+ **Critical second principle: Integration ≠ Correct Architecture**
30
+
31
+ A perfectly wired system can still violate the locked architectural decisions of the project. Before recommending any "missing wiring" or "broken flow" fix, you MUST verify the proposed fix does not contradict an architectural lock. See Step 0 below.
28
32
  </core_principle>
29
33
 
34
+ <architectural_guards>
35
+
36
+ ## Step 0 (REQUIRED FIRST) — Load Architectural Guards
37
+
38
+ Before ANY verification, read the project's locked architectural context. Findings that contradict a locked decision are NOT code gaps — they are **operator review items**, and you must flag them as such.
39
+
40
+ **Files to read (in order, mandatory):**
41
+
42
+ 1. `.planning/NORTH-STAR.md` if it exists — quote any rule that touches the area you're auditing.
43
+ 2. Any project root `AGENTS.md` / `CLAUDE.md` — the project may name additional locked docs.
44
+ 3. `.planning/milestones/<active>/ROADMAP.md` — current milestone scope + sequencing.
45
+ 4. Memory files matching glob `*-locked*` or `*_locked*` in `~/.claude/projects/<project>/memory/` if accessible — quote any that touch the area you're auditing.
46
+
47
+ **Use the loaded context as guards:**
48
+
49
+ When you find a "gap" or "missing connection," ask:
50
+ - Does a locked memory or NORTH-STAR rule explicitly say this should NOT exist?
51
+ → If yes, this is intentional. Do NOT flag as gap. Note as "intentional per <quoted-source>".
52
+ - Does a locked rule prescribe a DIFFERENT fix than the obvious one?
53
+ → Recommend the locked-rule fix, not the obvious one. Quote the source.
54
+ - Is the architecture genuinely ambiguous and could be resolved in multiple ways?
55
+ → Flag as **ARCHITECTURAL DECISION QUESTION (ADQ)** for operator review. Do NOT pick a side. State the options.
56
+
57
+ **Report format addition:**
58
+
59
+ Open your final report with a section:
60
+
61
+ ```markdown
62
+ ## Architectural Context Loaded
63
+
64
+ - Locked source: <path> — <quoted relevant clause>
65
+ - Locked source: <path> — <quoted relevant clause>
66
+ - (or) No locked sources found in this repo.
67
+ ```
68
+
69
+ If a finding's verdict depended on a locked rule, cite the rule inline next to the finding.
70
+
71
+ **If you skip Step 0 and your fix contradicts a locked rule, the operator's trust in the audit collapses.** This step is not optional.
72
+
73
+ </architectural_guards>
74
+
30
75
  <inputs>
31
76
  ## Required Context (provided by milestone auditor)
32
77
 
@@ -48,6 +48,47 @@ Then verify each level against the actual plan files.
48
48
  Same methodology (goal-backward), different timing, different subject matter.
49
49
  </core_principle>
50
50
 
51
+ <architectural_guards>
52
+
53
+ ## Step 0 (REQUIRED FIRST) — Load Architectural Guards
54
+
55
+ Before ANY verification, read the project's locked architectural context. Findings that contradict a locked decision are NOT code gaps — they are **operator review items**, and you must flag them as such.
56
+
57
+ **Files to read (in order, mandatory):**
58
+
59
+ 1. `.planning/NORTH-STAR.md` if it exists — quote any rule that touches the area you're auditing.
60
+ 2. Any project root `AGENTS.md` / `CLAUDE.md` — the project may name additional locked docs.
61
+ 3. `.planning/milestones/<active>/ROADMAP.md` — current milestone scope + sequencing.
62
+ 4. Memory files matching glob `*-locked*` or `*_locked*` in `~/.claude/projects/<project>/memory/` if accessible — quote any that touch the area you're auditing.
63
+
64
+ **Use the loaded context as guards:**
65
+
66
+ When you find a "gap" / "missing connection" / "unverified requirement" / "architectural concern":
67
+ - Does a locked memory or NORTH-STAR rule explicitly say this should NOT exist?
68
+ → If yes, this is intentional. Do NOT flag as gap. Note as "intentional per <quoted-source>".
69
+ - Does a locked rule prescribe a DIFFERENT fix than the obvious one?
70
+ → Recommend the locked-rule fix, not the obvious one. Quote the source.
71
+ - Is the architecture genuinely ambiguous and could be resolved in multiple ways?
72
+ → Flag as **ARCHITECTURAL DECISION QUESTION (ADQ)** for operator review. Do NOT pick a side. State the options.
73
+
74
+ **Report format addition:**
75
+
76
+ Open your final report with a section:
77
+
78
+ ```markdown
79
+ ## Architectural Context Loaded
80
+
81
+ - Locked source: <path> — <quoted relevant clause>
82
+ - (or) No locked sources found in this repo.
83
+ ```
84
+
85
+ If a finding's verdict depended on a locked rule, cite the rule inline next to the finding.
86
+
87
+ **If you skip Step 0 and your fix contradicts a locked rule, the operator's trust in the audit collapses.** This step is not optional.
88
+
89
+ </architectural_guards>
90
+
91
+
51
92
  <verification_dimensions>
52
93
 
53
94
  ## Dimension 1: Requirement Coverage
@@ -207,6 +248,77 @@ issue:
207
248
  fix_hint: "Split into 2 plans: foundation (01) and integration (02)"
208
249
  ```
209
250
 
251
+ ## Dimension 5.5: ASCII Coverage Diagram (gstack-borrowed)
252
+
253
+ **Question:** Does the plan declare a coverage diagram for each code path / user flow?
254
+
255
+ Borrowed from gstack's [`plan-eng-review/SKILL.md`](https://github.com/garrytan/gstack). Forces planners to think through which paths are well-covered vs. thinly-covered vs. uncovered BEFORE the executor starts. This catches "we wrote the plan but the diff would leave Path X completely untested" early.
256
+
257
+ **Process:**
258
+ 1. Scan each PLAN.md body for an ASCII diagram (lines containing `★` or `[GAP]` markers, OR a fenced block titled `coverage`/`flow`).
259
+ 2. If absent AND the plan frontmatter does NOT set `coverage_diagram_not_applicable: true` with a one-line `coverage_diagram_skip_reason`, flag `MISSING_COVERAGE_DIAGRAM`.
260
+
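The two-step process above could be checked mechanically along these lines. This is a sketch under assumptions: the frontmatter is taken as an already-parsed object, a "fenced block titled `coverage`/`flow`" is read as a fence whose info string starts with one of those words, and the function name is hypothetical.

```javascript
// Returns null when the plan satisfies Dimension 5.5, else the flag name.
function checkCoverageDiagram(frontmatter, body) {
  // Step 1: look for diagram markers or a fence titled coverage/flow.
  const hasMarkers = /★|\[GAP\]/.test(body);
  const hasTitledFence = new RegExp("^`{3}\\s*(coverage|flow)\\b", "m").test(body);
  if (hasMarkers || hasTitledFence) return null;

  // Step 2: an explicit opt-out needs both the flag and a one-line reason.
  const reason = frontmatter.coverage_diagram_skip_reason;
  const skipped =
    frontmatter.coverage_diagram_not_applicable === true &&
    typeof reason === "string" &&
    reason.trim() !== "" &&
    !reason.includes("\n");
  return skipped ? null : "MISSING_COVERAGE_DIAGRAM";
}
```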
261
+ **Markers used in the diagram:**
262
+ | Marker | Meaning |
263
+ |--------|---------|
264
+ | `★★★` | Fully covered (unit + integration + e2e or equivalent) |
265
+ | `★★` | Partially covered |
266
+ | `★` | Thinly covered |
267
+ | `[GAP]`| Uncovered — explicit acknowledgment |
268
+
269
+ **When to skip:** Docs-only changes (`surface: docs_only`), config bumps, version churn. The plan must declare the skip explicitly in frontmatter rather than silently omitting the diagram.
270
+
271
+ **Severity:** Warning (does NOT block execution by itself — it forces a planner-level reflection). Operator can override after seeing the warning.
272
+
273
+ **Example issue:**
274
+ ```yaml
275
+ issue:
276
+ dimension: coverage_diagram
277
+ severity: warning
278
+ description: "Plan 01 has 3 user-flow truths but no coverage diagram"
279
+ plan: "01"
280
+ fix_hint: "Add a 5-line ASCII flow with ★/★★/★★★/[GAP] markers OR set coverage_diagram_not_applicable: true with reason"
281
+ ```
282
+
283
+ ## Dimension 5.6: Overbuilt Plan Detector (gstack-borrowed)
284
+
285
+ **Question:** Is this plan doing more than the phase needs?
286
+
287
+ Borrowed from gstack's [`plan-eng-review/SKILL.md`](https://github.com/garrytan/gstack) "overbuilt plans" rule. Flags scope inflation early. Recommends a minimal alternative as a QUESTION to the operator — never auto-shrinks (User Sovereignty).
288
+
289
+ **Trigger signals (ANY = flag):**
290
+ - `files_modified` count ≥ 8 in a single plan
291
+ - ≥ 2 new classes introduced (grep `class \w+` in proposed diffs section)
292
+ - ≥ 1 new top-level directory (e.g., `src/foo/` where `foo` didn't exist)
293
+ - ≥ 1 new package dependency in `package.json` / `Cargo.toml` / `pyproject.toml`
294
+ - A plan title containing `framework`, `abstraction`, `generic`, `extensible`, `reusable` (common over-engineering tells)
295
+
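As a rough illustration of the trigger list above, the signals can be evaluated over a summary of the plan. The input shape (`filesModified`, `newClasses`, `newTopLevelDirs`, `newDeps`, `title`) and the signal names are hypothetical, and case-insensitive title matching is an assumption; the thresholds and keywords are the ones listed.

```javascript
// Title keywords listed as over-engineering tells; substring match, case-insensitive (assumption).
const TITLE_TELLS = /framework|abstraction|generic|extensible|reusable/i;

// Returns the names of the signals that fired; any non-empty result means "flag".
function overbuiltSignals(plan) {
  const signals = [];
  if (plan.filesModified.length >= 8) signals.push("file_count");
  if (plan.newClasses.length >= 2) signals.push("class_count");
  if (plan.newTopLevelDirs.length >= 1) signals.push("new_top_level_dir");
  if (plan.newDeps.length >= 1) signals.push("new_dependency");
  if (TITLE_TELLS.test(plan.title)) signals.push("title_keyword");
  return signals;
}
```

A plan with 11 modified files, two new classes, and one new dependency would fire three of the five signals, and the checker would then put the question to the operator rather than shrink the plan itself.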
296
+ **Severity:** Warning (informational). The output MUST include a one-question prompt to the operator using `AskUserQuestion`:
297
+
298
+ ```
299
+ Plan 01 has 11 files_modified and adds 2 new classes (UserAdapter, SessionAdapter).
300
+ Minimal alternative considered: extend the existing AuthFacade.evaluate() with two
301
+ new branches (~3 files). Which fits the phase goal?
302
+ 1. Ship as planned (justified by [reason])
303
+ 2. Reduce to the minimal alternative
304
+ 3. Skip — none of these (let me explain)
305
+ ```
306
+
307
+ **Example issue:**
308
+ ```yaml
309
+ issue:
310
+ dimension: overbuilt_detector
311
+ severity: warning
312
+ description: "Plan 01 hits 3 of 5 overbuilt signals (file count, class count, new dependency)"
313
+ plan: "01"
314
+ signals:
315
+ files_modified: 11
316
+ new_classes: 2
317
+ new_deps: ["jose"]
318
+ minimal_alternative: "Reuse AuthFacade with two new evaluate() branches (~3 files)"
319
+ fix_hint: "Ask operator before executing"
320
+ ```
321
+
210
322
  ## Dimension 6: Verification Derivation
211
323
 
212
324
  **Question:** Do must_haves trace back to phase goal?
@@ -30,6 +30,47 @@ Goal-backward verification starts from the outcome and works backwards:
30
30
  Then verify each level against the actual codebase.
31
31
  </core_principle>
32
32
 
33
+ <architectural_guards>
34
+
35
+ ## Step 0 (REQUIRED FIRST) — Load Architectural Guards
36
+
37
+ Before ANY verification, read the project's locked architectural context. Findings that contradict a locked decision are NOT code gaps — they are **operator review items**, and you must flag them as such.
38
+
39
+ **Files to read (in order, mandatory):**
40
+
41
+ 1. `.planning/NORTH-STAR.md` if it exists — quote any rule that touches the area you're auditing.
42
+ 2. Any project root `AGENTS.md` / `CLAUDE.md` — the project may name additional locked docs.
43
+ 3. `.planning/milestones/<active>/ROADMAP.md` — current milestone scope + sequencing.
44
+ 4. Memory files matching glob `*-locked*` or `*_locked*` in `~/.claude/projects/<project>/memory/` if accessible — quote any that touch the area you're auditing.
45
+
46
+ **Use the loaded context as guards:**
47
+
48
+ When you find a "gap" / "missing connection" / "unverified requirement" / "architectural concern":
49
+ - Does a locked memory or NORTH-STAR rule explicitly say this should NOT exist?
50
+ → If yes, this is intentional. Do NOT flag as gap. Note as "intentional per <quoted-source>".
51
+ - Does a locked rule prescribe a DIFFERENT fix than the obvious one?
52
+ → Recommend the locked-rule fix, not the obvious one. Quote the source.
53
+ - Is the architecture genuinely ambiguous and could be resolved in multiple ways?
54
+ → Flag as **ARCHITECTURAL DECISION QUESTION (ADQ)** for operator review. Do NOT pick a side. State the options.
55
+
56
+ **Report format addition:**
57
+
58
+ Open your final report with a section:
59
+
60
+ ```markdown
61
+ ## Architectural Context Loaded
62
+
63
+ - Locked source: <path> — <quoted relevant clause>
64
+ - (or) No locked sources found in this repo.
65
+ ```
66
+
67
+ If a finding's verdict depended on a locked rule, cite the rule inline next to the finding.
68
+
69
+ **If you skip Step 0 and your fix contradicts a locked rule, the operator's trust in the audit collapses.** This step is not optional.
70
+
71
+ </architectural_guards>
72
+
73
+
33
74
  <verification_process>
34
75
 
35
76
  ## Step 0: Check for Previous Verification
@@ -429,6 +470,57 @@ Categorize findings:
429
470
  - ⚠️ Warning: Indicates incomplete (TODO comments, console.log)
430
471
  - ℹ️ Info: Notable but not problematic
431
472
 
473
+ ## Step 7.5: Security Pass (Pass 1 — CRITICAL)
474
+
475
+ Run after the anti-pattern scan, before human verification needs. Adapted from [garrytan/gstack `review/checklist.md`](https://github.com/garrytan/gstack) under the gstack ETHOS principle: "User Sovereignty — AI recommends, human decides."
476
+
477
+ Full category definitions and grep patterns: `rrr/references/review-checklist.md`
478
+
479
+ **Scope:** Apply Pass 1 categories ONLY to files modified in the current phase. Identify modified files via:
480
+
481
+ ```bash
482
+ # From SUMMARY.md (preferred — respects phase scope)
483
+ grep -E "^\- \`" "$PHASE_DIR"/*-SUMMARY.md 2>/dev/null | sed 's/.*`\([^`]*\)`.*/\1/' | sort -u
484
+
485
+ # Fallback: files changed by the most recent commit only (HEAD~1..HEAD); widen the range if the phase spans several commits
486
+ git diff --name-only HEAD~1 2>/dev/null
487
+ ```
488
+
489
+ **Pass 1 categories to check (CRITICAL):**
490
+
491
+ 1. **SQL & Data Safety** — string interpolation in queries, TOCTOU races, validation bypass, N+1 queries
492
+ 2. **Race Conditions & Concurrency** — read-check-write without uniqueness constraint, find-or-create without unique index, unsafe HTML rendering (XSS)
493
+ 3. **LLM Output Trust Boundary** — unvalidated LLM values to DB, untyped structured output, SSRF via LLM-provided URLs, prompt injection in vector stores
494
+ 4. **Shell Injection** — `subprocess(shell=True)` + interpolation, `os.system()` with variables, unguarded `eval()`/`exec()`
495
+ 5. **Enum & Value Completeness** — trace new enum values through all consumers, check allowlists, check `case`/`if-elsif` chains
496
+
497
+ **Pass 2 (INFORMATIONAL):** Opt-in only. Run when invoked with `mode: deep-review` or when the phase explicitly touches security-sensitive domains (auth, webhooks, payments, LLM pipelines). Categories: Async/Sync Mixing, Column/Field Name Safety, LLM Prompt Issues, Completeness Gaps, Time Window Safety, Type Coercion at Boundaries, View/Frontend, Distribution & CI/CD Pipeline. See `rrr/references/review-checklist.md` Pass 2 for details.
498
+
499
+ **Severity gating:**
500
+
501
+ A Pass 1 finding BLOCKS phase verification (contributes to `gaps_found` status) ONLY if:
502
+ - The finding is on a file the phase claims to have completed (listed in SUMMARY.md artifacts or PLAN.md `must_haves`), AND
503
+ - The severity is CRITICAL (Pass 1 category)
504
+
505
+ A Pass 1 finding on an unrelated legacy file that the phase incidentally touched does NOT fail the phase — record it as `SECURITY_ADVISORY` (informational, operator-review only).
506
+
507
+ Pass 2 findings NEVER block phase status. They are always `ADVISORY`.
508
+
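The gating rules above reduce to a small decision function. This is a sketch only: the finding shape (`pass`, `file`) and the function name are hypothetical, and the set of phase artifacts is assumed to have been collected already from SUMMARY.md artifacts and PLAN.md `must_haves`.

```javascript
// Classify one finding according to the severity-gating rules.
// `phaseArtifacts` is a Set of file paths the phase claims to have completed.
function classifyFinding(finding, phaseArtifacts) {
  // Pass 2 findings never block; they are always advisory.
  if (finding.pass === 2) return { label: "ADVISORY", blocksPhase: false };

  // Pass 1 findings are CRITICAL by category, but only block on phase artifacts.
  return phaseArtifacts.has(finding.file)
    ? { label: "CRITICAL", blocksPhase: true }
    : { label: "SECURITY_ADVISORY", blocksPhase: false };
}
```

Verification would report `gaps_found` if any classified finding has `blocksPhase: true`.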
509
+ **Output in VERIFICATION.md:** Add a `### Security Pass` subsection under `### Anti-Patterns Found`:
510
+
511
+ ```markdown
512
+ ### Security Pass (gstack Pass 1 — CRITICAL)
513
+
514
+ | File | Line | Category | Finding | Severity | Blocks Phase |
515
+ | ---- | ---- | -------- | ------- | -------- | ------------ |
516
+ | `path/to/file.py` | 42 | LLM Output Trust Boundary | LLM URL fetched without allowlist (SSRF risk) | CRITICAL | Yes — phase artifact |
517
+ | `legacy/old.rb` | 12 | SQL & Data Safety | String interpolation in query | CRITICAL | No — not a phase artifact |
518
+
519
+ _Pass 2 (INFORMATIONAL) not run. Invoke with `mode: deep-review` to enable._
520
+ ```
521
+
522
+ If no findings: `Security Pass: No Pass 1 issues found in phase-modified files.`
523
+
432
524
  ## Step 8: Identify Human Verification Needs
433
525
 
434
526
  Some things can't be verified programmatically:
@@ -764,6 +856,39 @@ return <div>No messages</div> // Always shows "no messages"
764
856
 
765
857
  </stub_detection_patterns>
766
858
 
859
+ <learnings_persistence>
860
+
861
+ ## Learnings Persistence (gstack-borrowed)
862
+
863
+ Adapted from gstack [`learn/SKILL.md`](https://github.com/garrytan/gstack) `learnings.jsonl` pattern. When verification reports `gaps_found: true`, append a one-line entry per distinct gap to `.planning/learnings.jsonl` so the next phase planner can query and surface relevant prior patterns.
864
+
865
+ ```bash
866
+ # After writing VERIFICATION.md, for each gap of severity >= warning:
867
+ mkdir -p .planning
868
+ cat >> .planning/learnings.jsonl <<JSONL
869
+ {"timestamp":"$(date -u +%Y-%m-%dT%H:%M:%SZ)","phase":"<phase>","key":"<gap-category-slug>","insight":"<one-line>","source":"verifier","severity":"<warning|critical>"}
870
+ JSONL
871
+ ```
872
+
873
+ Schema:
874
+
875
+ | Field | Notes |
876
+ |-------|-------|
877
+ | `timestamp` | ISO-8601 UTC |
878
+ | `phase` | The phase number / slug this gap surfaced in |
879
+ | `key` | Lowercase kebab-case category slug (e.g., `chat-component-stub`, `unverified-webhook-handler`, `missing-error-state`) — same key across phases means same pattern |
880
+ | `insight` | Single sentence, < 200 chars |
881
+ | `source` | `verifier` here; other writers may use `plan-checker`, `integration-checker`, `auditor` |
882
+ | `severity` | `warning` or `critical` (mirrors VERIFICATION.md classification) |
883
+
884
+ **Dedup:** Entries with the same `key` are not deduplicated at write time (append-only is intentional, since frequency itself is signal). Readers (e.g., `plan-phase.md` step 3.7b) apply latest-wins per key for display.
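A reader-side latest-wins pass could look roughly like the sketch below. `latest_learnings` is a hypothetical helper, not the actual logic in `plan-phase.md`. It assumes each entry is a single flat JSON line written exactly in the schema above (no whitespace around the `"key"` colon); a real JSON parser would be more robust. It prints one line per key in first-seen order: the occurrence count, a tab, then the most recent entry.

```bash
#!/usr/bin/env bash
# Hypothetical reader sketch: latest-wins per key, keeping frequency as a signal.
# Usage: latest_learnings <path-to-learnings.jsonl>
latest_learnings() {
  awk '
    match($0, /"key":"[^"]*"/) {
      # Strip the 7-character prefix "key":" and the closing quote.
      key = substr($0, RSTART + 7, RLENGTH - 8)
      if (!(key in count)) order[++n] = key   # remember first-seen order
      count[key]++                            # how often the pattern recurred
      latest[key] = $0                        # later lines overwrite earlier ones
    }
    END {
      for (i = 1; i <= n; i++) print count[order[i]] "\t" latest[order[i]]
    }
  ' "$1"
}
```

Keeping the count next to the latest entry preserves the point of the append-only design: a key that recurs across phases is visible as such even though only one entry is displayed.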
885
+
886
+ **Separation from decision-recall.** This is NOT the same as RRR's existing `decision-store.js` — that tracks deliberate decisions ("we chose Postgres over SQLite because..."). Learnings track *observed patterns* the planner can use to warn next time ("stub-pattern showed up in this surface twice — watch for it again"). Both coexist.
887
+
888
+ Attribution: gstack `learn/SKILL.md` + `learnings.jsonl` schema.
889
+
890
+ </learnings_persistence>
891
+
767
892
  <success_criteria>
768
893
 
769
894
  - [ ] Previous VERIFICATION.md checked (Step 0)
@@ -774,6 +899,8 @@ return <div>No messages</div> // Always shows "no messages"
774
899
  - [ ] All key links verified
775
900
  - [ ] Requirements coverage assessed (if applicable)
776
901
  - [ ] Anti-patterns scanned and categorized
902
+ - [ ] Security pass (Pass 1 CRITICAL) run on phase-modified files; findings in VERIFICATION.md
903
+ - [ ] On gaps_found: one learning entry per distinct gap appended to `.planning/learnings.jsonl`
777
904
  - [ ] Human verification items identified
778
905
  - [ ] Overall status determined
779
906
  - [ ] Gaps structured in YAML frontmatter (if gaps_found)
@@ -87,6 +87,31 @@ If overlapping, use AskUserQuestion:
87
87
  - "Add anyway" — create as separate todo
88
88
  </step>
89
89
 
90
+ <step name="prioritize">
91
+ Ask the operator for priority and effort using `AskUserQuestion` (User Sovereignty — never auto-decide). Priority tiers borrowed from gstack `TODOS.md` format:
92
+
93
+ | Tier | Meaning |
94
+ |------|---------|
95
+ | P0 | Blocking — fix now, before any other work |
96
+ | P1 | Critical this cycle — required for the active milestone |
97
+ | P2 | After urgent work (the default if unsure) |
98
+ | P3 | Revisit with usage data — gate on a signal (telemetry, user report) |
99
+ | P4 | Future consideration — backlog reference only |
100
+
101
+ Effort tiers borrowed from gstack:
102
+
103
+ | Tier | Estimated effort |
104
+ |------|-------|
105
+ | S | 4–8 hours |
106
+ | M | ~1 day |
107
+ | L | 2–3 days |
108
+ | XL | 4+ days |
109
+
110
+ Surface BOTH questions in a single `AskUserQuestion` call (multiSelect: false on each). Use the inferred area + problem statement to suggest a default in the option labels (e.g. "P2 (Recommended)" for general defensive work). The operator's selection becomes the `priority` and `effort` frontmatter values.
111
+
112
+ Also collect a single-line `why` from the operator (or compress from the problem statement) — this becomes the `why:` frontmatter.
113
+ </step>
114
+
90
115
  <step name="create_file">
91
116
  ```bash
92
117
  timestamp=$(date "+%Y-%m-%dT%H:%M")
@@ -102,6 +127,9 @@ Write to `.planning/todos/pending/${date_prefix}-${slug}.md`:
102
127
  created: [timestamp]
103
128
  title: [title]
104
129
  area: [area]
130
+ priority: [P0|P1|P2|P3|P4] # gstack-borrowed tier
131
+ effort: [S|M|L|XL] # gstack-borrowed tier
132
+ why: [one-line motivation]
105
133
  files:
106
134
  - [file:lines]
107
135
  ---
@@ -173,10 +201,11 @@ Would you like to:
173
201
 
174
202
  <success_criteria>
175
203
  - [ ] Directory structure exists
176
- - [ ] Todo file created with valid frontmatter
204
+ - [ ] Todo file created with valid frontmatter (including `priority` + `effort` + `why`)
177
205
  - [ ] Problem section has enough context for future Claude
178
206
  - [ ] No duplicates (checked and resolved)
179
207
  - [ ] Area consistent with existing todos
180
208
  - [ ] STATE.md updated if exists
181
209
  - [ ] Todo and state committed to git
210
+ - [ ] Priority + effort prompted via AskUserQuestion (never auto-decided) — gstack User Sovereignty principle
182
211
  </success_criteria>
@@ -138,6 +138,81 @@ Plus full markdown report with tables for requirements, phases, integration, tec
138
138
  - `gaps_found` — critical blockers exist
139
139
  - `tech_debt` — no blockers but accumulated deferred items need review
140
140
 
141
+ ## 6.5. Retro Metrics (gstack-borrowed)
142
+
143
+ Adapted from gstack [`retro/SKILL.md`](https://github.com/garrytan/gstack). RRR has no built-in retrospective today — the audit verifies what shipped vs. what was intended, but not the *how* (velocity, quality signals, churn). These metrics are INFORMATIONAL — they do not block milestone archive. One signal (fix-to-feature ratio) raises a REVIEW_QUALITY_WARNING to the operator before archiving.
144
+
145
+ Compute over `git log` covering the milestone's commit range (from the first commit after the previous milestone's archive tag to HEAD):
146
+
147
+ ### Commits per phase
148
+ Group commits by phase number prefix in commit messages (e.g., `feat(phase-90):` → phase 90). Surface as a table.
149
+
150
+ ```bash
151
+ git log v1.23..HEAD --pretty=format:"%s" 2>/dev/null \
152
+ | grep -oE "phase[- ]?[0-9]+" | sort | uniq -c | sort -rn
153
+ ```
154
+
155
+ ### Fix-to-feature ratio
156
+
157
+ ```bash
158
+ fix=$(git log v1.23..HEAD --pretty=format:"%s" | grep -cE "^fix[(:]")
159
+ feat=$(git log v1.23..HEAD --pretty=format:"%s" | grep -cE "^feat[(:]")
160
+ total=$((fix + feat)); if [ "$total" -gt 0 ]; then echo "scale=2; $fix * 100 / $total" | bc; else echo "n/a"; fi
161
+ ```
162
+
163
+ - Below 30% → healthy
164
+ - 30–50% → ADVISORY (note in audit report)
165
+ - Above 50% → **REVIEW_QUALITY_WARNING** — surface via `AskUserQuestion` before archiving: "Fix-ratio is {N}% — likely review-gap signal. Continue archiving?" → Yes / Pause for retro phase / Let me investigate
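The three bands can be applied without floating-point arithmetic. The sketch below is illustrative: `classify_fix_ratio` is a hypothetical name, and it assumes that exactly 50% falls in the ADVISORY band and that a range with no `fix` or `feat` commits yields `n/a`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the fix-to-feature thresholds; not part of RRR.
# Usage: classify_fix_ratio <fix_count> <feat_count>
classify_fix_ratio() {
  local fix="$1" feat="$2" total
  total=$((fix + feat))
  if [ "$total" -eq 0 ]; then
    echo "n/a"                                    # ratio undefined, avoid dividing by zero
  elif [ $((fix * 100)) -lt $((total * 30)) ]; then
    echo "healthy"                                # below 30%
  elif [ $((fix * 100)) -le $((total * 50)) ]; then
    echo "ADVISORY"                               # 30-50% inclusive (assumption for exactly 50%)
  else
    echo "REVIEW_QUALITY_WARNING"                 # above 50%
  fi
}
```

Comparing `fix * 100` against `total * 30` and `total * 50` avoids the rounding that an integer percentage would introduce near the band edges.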
166
+
167
+ ### File churn hotspots
168
+ Top 5 files by number of commits touching them in the range, used as a proxy for files modified across the most phases (signals coupling / instability).
169
+
170
+ ```bash
171
+ git log v1.23..HEAD --name-only --pretty=format: 2>/dev/null \
171
+ | grep -v '^$' | sort | uniq -c | sort -rn | head -5
173
+ ```
174
+
175
+ ### Session detection
176
+ Group commits by 45-minute gaps; report session count + average duration.
177
+
178
+ ```bash
179
+ git log v1.23..HEAD --reverse --pretty=format:"%at" \
180
+ | awk 'BEGIN{prev=0; n=0; dur=0}
181
+ { ts = $1 # author date as Unix epoch seconds; --reverse lists oldest first
182
+ if (prev && ts - prev > 2700) { n++ } # a gap over 45 minutes starts a new session
183
+ else if (prev) dur += ts - prev # count in-session time only
184
+ prev = ts }
185
+ END { if (NR) print "sessions: " (n+1) ", avg duration: " (dur/(n+1))/60 "m" }'
186
+ ```
187
+
188
+ ### Total LOC delta
189
+ ```bash
190
+ git log v1.23..HEAD --shortstat --pretty=tformat: 2>/dev/null \
191
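+ | awk '{ for (i = 2; i <= NF; i++) { if ($i ~ /^insertion/) { ins += $(i-1) } else if ($i ~ /^deletion/) { del += $(i-1) } } } END { print "+" (ins+0) " -" (del+0) " net " (ins-del) }'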
192
+ ```
193
+
194
+ ### Output
195
+
196
+ Append a `### Retro Metrics` subsection to `v{version}-MILESTONE-AUDIT.md`:
197
+
198
+ ```markdown
199
+ ### Retro Metrics (gstack-borrowed)
200
+
201
+ **Commits per phase:**
202
+ | Phase | Commits |
203
+ |-------|---------|
204
+ | 87 | 8 |
205
+ | 88 | 6 |
206
+ | ... | ... |
207
+
208
+ **Fix-to-feature ratio:** 24% — healthy (< 30%)
209
+ **Top churn files:** package.json, CHANGELOG.md, agents/rrr-verifier.md, ...
210
+ **Sessions:** 12 detected, avg duration 8.4 min
211
+ **Net LOC delta:** +4,820 / -1,330 (net +3,490)
212
+ ```
213
+
214
+ Attribution: gstack `retro/SKILL.md` metrics framework.
215
+
141
216
  ## 7. Present Results
142
217
 
143
218
  Route by status (see `<offer_next>`).