@fro.bot/systematic 2.6.0 → 2.7.0

Files changed (35)
  1. package/agents/review/api-contract-reviewer.md +1 -1
  2. package/agents/review/correctness-reviewer.md +1 -1
  3. package/agents/review/data-migrations-reviewer.md +1 -1
  4. package/agents/review/dhh-rails-reviewer.md +1 -1
  5. package/agents/review/julik-frontend-races-reviewer.md +1 -1
  6. package/agents/review/kieran-python-reviewer.md +1 -1
  7. package/agents/review/kieran-rails-reviewer.md +1 -1
  8. package/agents/review/kieran-typescript-reviewer.md +1 -1
  9. package/agents/review/maintainability-reviewer.md +1 -1
  10. package/agents/review/performance-reviewer.md +1 -1
  11. package/agents/review/reliability-reviewer.md +1 -1
  12. package/agents/review/security-reviewer.md +1 -1
  13. package/agents/workflow/bug-reproduction-validator.md +1 -1
  14. package/dist/cli.js +1 -1
  15. package/dist/{index-3h7kpmfa.js → index-k9tdxh0p.js} +1 -1
  16. package/dist/index.d.ts +1 -1
  17. package/dist/index.js +2 -3
  18. package/dist/lib/skills.d.ts +1 -0
  19. package/package.json +1 -1
  20. package/skills/ce-brainstorm/references/handoff.md +127 -0
  21. package/skills/ce-brainstorm/references/requirements-capture.md +243 -0
  22. package/skills/ce-brainstorm/references/universal-brainstorming.md +63 -0
  23. package/skills/ce-ideate/references/post-ideation-workflow.md +240 -0
  24. package/skills/ce-plan/references/deepening-workflow.md +249 -0
  25. package/skills/ce-plan/references/plan-handoff.md +96 -0
  26. package/skills/ce-plan/references/universal-planning.md +114 -0
  27. package/skills/ce-plan/references/visual-communication.md +31 -0
  28. package/skills/ce-work/references/shipping-workflow.md +129 -0
  29. package/skills/ce-work-beta/references/codex-delegation-workflow.md +327 -0
  30. package/skills/ce-work-beta/references/shipping-workflow.md +129 -0
  31. package/skills/compound-docs/SKILL.md +2 -3
  32. package/skills/document-review/references/synthesis-and-presentation.md +406 -0
  33. package/skills/proof/references/hitl-review.md +368 -0
  34. package/skills/writing-systematic-skills/SKILL.md +115 -0
  35. package/skills/writing-systematic-skills/references/foundation-conventions.md +143 -0
# Phases 3-5: Synthesis, Presentation, and Next Action

## Phase 3: Synthesize Findings

Process findings from all agents through this pipeline. Order matters — each step depends on the previous. The pipeline implements the finding-lifecycle state machine: **Raised → (Confidence Gate | FYI-eligible | Dropped) → Deduplicated → Classified → SafeAuto | GatedAuto | Manual | FYI**. Re-evaluate state at each step boundary; do not carry forward assumptions from earlier steps as prose-level shortcuts.

### 3.1 Validate

Check each agent's returned JSON against the findings schema:

- Drop findings missing any required field defined in the schema
- Drop findings with invalid enum values (including the pre-rename `auto` / `present` values from older personas — treat those as malformed until all persona output has been regenerated)
- Note the agent name for any malformed output in the Coverage section
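A minimal sketch of this validation pass in TypeScript. The field names and enum lists below are illustrative assumptions — the authoritative definitions live in `references/findings-schema.json`:

```typescript
// Validation sketch (3.1). Field and enum lists are assumed for illustration;
// the schema file is the source of truth.
type Finding = Record<string, unknown>;

const REQUIRED = ["section", "title", "severity", "confidence", "autofix_class", "finding_type"];
const SEVERITIES = ["P0", "P1", "P2", "P3"];
const ANCHORS = [0, 25, 50, 75, 100];
const CLASSES = ["safe_auto", "gated_auto", "manual"]; // pre-rename `auto` / `present` fail this check
const TYPES = ["error", "omission"];

function validateFindings(findings: Finding[], agent: string, malformed: Set<string>): Finding[] {
  return findings.filter((f) => {
    const ok =
      REQUIRED.every((k) => f[k] !== undefined && f[k] !== null) &&
      SEVERITIES.includes(f.severity as string) &&
      ANCHORS.includes(f.confidence as number) &&
      CLASSES.includes(f.autofix_class as string) &&
      TYPES.includes(f.finding_type as string);
    if (!ok) malformed.add(agent); // surfaces only as a Coverage-row annotation, never as narration
    return ok;
  });
}
```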
**Do not narrate remap / validation diagnostics to the user.** Schema-drift notes ("persona X returned unknown enum Y, remapped to Z"), persona-prompt-drift commentary, and other validator-internal diagnostics are maintainer-facing information. They do not belong in the Phase 4 output the user reads. If a persona's output is malformed, the only user-visible consequence is a Coverage-row annotation (e.g., the persona shows fewer findings or a `malformed` marker). Everything else stays internal.

### 3.2 Confidence Gate (Anchor-Based)

Gate findings by their `confidence` anchor value. Anchors are discrete integers (`0`, `25`, `50`, `75`, `100`) with behavioral definitions documented in `references/findings-schema.json` and embedded in the persona rubric (`references/subagent-template.md`). This replaces the prior continuous 0.0-1.0 scale and its per-severity gates — doc-review economics do not warrant threshold gradation by severity, and coarse anchors prevent false-precision gaming.

| Anchor | Meaning | Route |
|--------|---------|-------|
| `0` | False positive or pre-existing issue | Drop silently |
| `25` | Might be real but could not verify | Drop silently |
| `50` | Verified real but nitpick / advisory / not very important | Surface in FYI subsection |
| `75` | Double-checked, will hit in practice, directly impacts correctness | Enter actionable tier (classify by `autofix_class`) |
| `100` | Evidence directly confirms; will happen frequently | Enter actionable tier (classify by `autofix_class`) |

- **Dropped silently** (anchors `0` and `25`): these do not surface in any output bucket — not as findings, not as FYI observations, not as residual concerns. Record the total drop count as a Coverage footnote line when non-zero: `Dropped: N (anchors 0/25 suppressed)`. The footnote appears below the Coverage table, alongside the `Chains:` footnote when both apply. This is the canonical location for drop-count reporting — not the summary line and not a per-persona Coverage column. Omit the footnote when N is zero.
- **FYI-subsection** (anchor `50`): surface in the presentation layer's FYI subsection regardless of `autofix_class`. These do not enter the walk-through or any bulk action — observational value without forcing a decision. Advisory observations ("nothing breaks, but...") naturally land here.
- **Actionable** (anchors `75` and `100`): enter the classification pipeline. Route by `autofix_class` (see 3.7).

**Why this threshold, not Anthropic's ≥ 80 code-review threshold:** Document review has opposite economics from code review. There is no linter backstop — the review IS the backstop. Premise-level concerns (product-lens, adversarial) naturally cap at anchors 50-75 because "is the motivation valid?" cannot be verified against ground truth. The routing menu already makes dismissal cheap (Skip, Append to Open Questions), so surfaced-and-skipped is a low-cost outcome while missed-and-shipped derails downstream implementation. Filter low (`≥ 50`) and let the routing menu handle volume.
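The gate reduces to a small total function over the anchor values — a sketch, with route names chosen here for illustration:

```typescript
// Anchor-gate sketch (3.2): maps a confidence anchor to its surface.
type GateRoute = "drop" | "fyi" | "actionable";

function gate(anchor: 0 | 25 | 50 | 75 | 100): GateRoute {
  if (anchor <= 25) return "drop"; // suppressed silently; counted only in the Coverage footnote
  if (anchor === 50) return "fyi"; // FYI subsection, regardless of autofix_class
  return "actionable";             // 75 / 100: classified by autofix_class in 3.7
}
```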
### 3.3 Deduplicate

Fingerprint each finding using `normalize(section) + normalize(title)`. Normalization: lowercase, strip punctuation, collapse whitespace.

When fingerprints match across personas:

- If the findings recommend opposing actions (e.g., one says cut, the other says keep), do not merge — preserve both for contradiction resolution in 3.5
- Otherwise merge: keep the highest severity, keep the highest confidence anchor (if tied, keep the finding appearing first in document order — deterministic, not probabilistic), union all evidence arrays, note all agreeing reviewers (e.g., "coherence, feasibility")
- **Coverage attribution:** Attribute the merged finding to the persona with the highest confidence anchor. If anchors tie, attribute to the persona whose entry appeared first in document order. Decrement the losing persona's Findings count and the corresponding route bucket so totals stay exact.
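The fingerprint can be sketched directly from the normalization rule above — the ASCII punctuation class is a simplifying assumption:

```typescript
// Fingerprint sketch (3.3): lowercase, strip punctuation, collapse whitespace.
function normalize(s: string): string {
  return s
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "") // strip punctuation (ASCII approximation for illustration)
    .replace(/\s+/g, " ")        // collapse whitespace
    .trim();
}

function fingerprint(section: string, title: string): string {
  return `${normalize(section)}|${normalize(title)}`;
}
```

Two personas filing "Rename premise, unsupported" against "Problem Frame!" and "problem frame" collide on the same key and enter the merge branch.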
### 3.3b Same-Persona Premise Redundancy Collapse

A single persona sometimes files multiple findings that share the same root premise expressed at different sections or wrapped in different framing (e.g., product-lens firing five variants of "motivation is weak" attached to Motivation, Unit 4b, Key Technical Decisions, and two other sections). Cross-persona dedup (3.3) does not catch this — it fingerprints on section+title, which differ even when the underlying concern is the same. Surfacing all N variants over-weights one persona's perspective relative to the other five and inflates the P2 Decisions tier with near-duplicate signal.

For each persona, cluster that persona's surviving findings by shared root premise. A cluster forms when 3 or more findings from the same persona share:

- The same `finding_type` (error or omission)
- Substantially overlapping `why_it_matters` phrasing (same key nouns/verbs signaling the same concern, e.g., "motivation", "justification", "premise unsupported", "scope creep")
- Fixes that would all be obviated by the same upstream decision (e.g., "add the triggering incident" would moot all five motivation-weakness findings)

For each cluster of size N ≥ 3:

- Keep the single finding with the strongest evidence (highest confidence anchor, or if tied, the one citing the most concrete document reference)
- Demote the remaining N-1 findings to FYI-subsection status (anchor `50`), regardless of their original anchor
- On the kept finding, note in the Reviewer column that the persona raised N-1 related variants (e.g., `product-lens (+4 related variants demoted to FYI)`)
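The collapse rule can be sketched as follows. `evidenceRefs` is a hypothetical stand-in for "most concrete document reference" — clustering itself (the fuzzy premise match) is left outside the sketch:

```typescript
// Collapse sketch (3.3b): given one persona's cluster, keep the strongest
// finding and demote the rest to FYI (anchor 50).
interface ClusteredFinding {
  anchor: number;       // confidence anchor
  evidenceRefs: number; // hypothetical concreteness proxy for tie-breaking
  demotedTo?: number;
  note?: string;
}

function collapseCluster(cluster: ClusteredFinding[]): ClusteredFinding[] {
  if (cluster.length < 3) return cluster; // below the cluster threshold: no collapse
  const sorted = [...cluster].sort(
    (a, b) => b.anchor - a.anchor || b.evidenceRefs - a.evidenceRefs
  );
  const [kept, ...rest] = sorted;
  rest.forEach((f) => (f.demotedTo = 50)); // FYI status regardless of original anchor
  kept.note = `(+${rest.length} related variants demoted to FYI)`;
  return [kept, ...rest];
}
```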
This runs per-persona before the 3.4 cross-persona boost. Cross-persona agreement on the *kept* finding still qualifies for the anchor-step promotion in 3.4; demoted variants do not participate in cross-persona promotion (they are observational only after collapse).

Do NOT collapse across personas at this step — different personas surfacing the same concern is exactly the independence signal the cross-persona boost rewards. Collapse applies within one persona's output only.

### 3.4 Cross-Persona Agreement Promotion

When 2+ independent personas flagged the same merged finding (from 3.3), promote the merged finding's anchor by one step: `50 → 75`, `75 → 100`. Anchor `100` does not promote further (already at the ceiling). Findings at anchors `0` or `25` do not reach this step (they were dropped in 3.2).
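The promotion step is a one-rung ladder with a ceiling — a direct sketch of the rule above:

```typescript
// One-step anchor promotion sketch (3.4), applied when 2+ personas flagged
// the same merged finding. Anchors 0/25 never reach this step.
function promoteAnchor(anchor: 50 | 75 | 100): 75 | 100 {
  if (anchor === 50) return 75;
  return 100; // 75 -> 100; 100 stays at the ceiling
}
```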
Independent corroboration is strong signal — multiple reviewers converging on the same issue is more reliable than any single reviewer's anchor. Promoting by one anchor step is semantically meaningful (a "verified but nitpick" finding that two personas independently surface is plausibly "will hit in practice"). This replaces the prior `+0.10` boost — the magic-number bump was calibrated to the continuous scale and no longer applies.

Note the promotion in the Reviewer column of the output (e.g., `coherence, feasibility (+1 anchor)`).

This replaces the earlier residual-concern promotion step. Findings at anchors `0` / `25` are not promoted back into the review surface; they appear only as drop counts in Coverage. If a dropped finding is genuinely important, the reviewer should raise their anchor to `50` or higher through stronger evidence rather than relying on a promotion rule.

### 3.5 Resolve Contradictions

When personas disagree on the same section:

- Create a combined finding presenting both perspectives
- Set `autofix_class: manual` (contradictions are by definition judgment calls)
- Set `finding_type: error` (contradictions are about conflicting things the document says, not things it omits)
- Frame as a tradeoff, not a verdict

Specific conflict patterns:

- Coherence says "keep for consistency" + scope-guardian says "cut for simplicity" → combined finding, let user decide
- Feasibility says "this is impossible" + product-lens says "this is essential" → P1 finding framed as a tradeoff
- Multiple personas flag the same issue (no disagreement) → handled in 3.3 merge, not here

### 3.5b Deterministic Recommended-Action Tie-Break

Every merged finding carries exactly one `recommended_action` field consumed by the walk-through (`references/walkthrough.md`) to mark the `(recommended)` option, by the best-judgment path (`references/bulk-preview.md`) to choose what to execute in bulk, and by the stem's yes/no framing. When a merged finding was flagged by multiple personas who implied different actions, synthesis picks the recommended action deterministically so identical review artifacts produce identical walk-through and best-judgment behavior across runs.

**Tie-break order (most conservative first):** `Skip > Defer > Apply`. The first action that at least one contributing persona implied wins, scanning in that order.

- If any contributing persona implied Skip → `recommended_action: Skip`
- Else if any contributing persona implied Defer → `recommended_action: Defer`
- Else → `recommended_action: Apply`

**Persona-to-action mapping.** A persona implies an action through its classification:

- `safe_auto` or `gated_auto` → implies Apply
- `manual` with a concrete `suggested_fix` and a recommended resolution → implies Apply (the persona has an opinion about what to do)
- `manual` flagged as a tradeoff or scope question with no recommended resolution → implies Defer (worth revisiting, not worth acting now)
- Any persona flagging the finding as low-confidence or suppression-eligible via residual concerns → implies Skip
- Persona in the contradiction set (3.5) implying "keep as-is / do not change" → implies Skip

If the contributing personas are all silent on action (e.g., a merged `manual` finding from personas that all flagged it as observation without recommendation), pick the default based on whether the merged finding carries an executable `suggested_fix`:

- `suggested_fix` present → `recommended_action: Apply` as the pragmatic default.
- `suggested_fix` absent → `recommended_action: Defer` (the walk-through and best-judgment path cannot execute Apply without a fix; routing an actionless finding to Defer surfaces it in Open Questions where the user can decide what to do with it).

This gate holds for every branch of the tie-break: if the winning action is `Apply` but the merged finding has no `suggested_fix` after 3.6 (Promote) and 3.7 (Route) have run, downgrade to `Defer`. The walk-through still lets the user pick any of the four options; this rule only governs the agent's default recommendation so the best-judgment path and bulk-preview never schedule a non-executable Apply.
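The tie-break, the silent-persona default, and the executable-Apply gate compose into one deterministic function — a sketch:

```typescript
// Tie-break sketch (3.5b): most conservative implied action wins, with the
// suggested_fix gate applied on every branch.
type Action = "Skip" | "Defer" | "Apply";

function recommendAction(implied: Action[], hasSuggestedFix: boolean): Action {
  let pick: Action;
  if (implied.includes("Skip")) pick = "Skip";
  else if (implied.includes("Defer")) pick = "Defer";
  else if (implied.length > 0) pick = "Apply";
  else pick = hasSuggestedFix ? "Apply" : "Defer"; // all contributing personas silent on action
  // Gate: a non-executable Apply downgrades to Defer on every branch.
  if (pick === "Apply" && !hasSuggestedFix) pick = "Defer";
  return pick;
}
```

Because the scan order is fixed, the same review artifact always yields the same recommendation — the reproducibility the downstream invariant depends on.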
**Conflict-context surface.** When the tie-break fires (contributing personas implied different actions), record a one-line conflict-context string on the merged finding. The walk-through renders this on the R15 conflict-context line (see `references/walkthrough.md`). Example: `Coherence recommends Apply; scope-guardian recommends Skip. Agent's recommendation: Skip.`

**Downstream invariant.** The walk-through and bulk-preview never recompute the recommendation — they read `recommended_action` and render `(recommended)` on the matching option. Best-judgment-the-rest and routing option B execute the `recommended_action` across the scoped finding set in bulk. This keeps best-judgment outcomes reproducible and auditable: the same review artifact always produces the same bulk plan.

### 3.5c Premise-Dependency Chain Linking

Document reviews often produce fanout: a single premise challenge ("is this work justified?") generates downstream findings that all evaporate if the premise is rejected ("alias unjustified", "abstraction overkill", "migration lacks rollback", "naming forecloses future"). Surfacing each as an independent decision forces the user to re-litigate the same root question N times. This step links dependent findings to their root so presentation can group them and the walk-through can cascade a single root decision across the chain.

Run this step after 3.5b (recommended_action normalized) and before 3.6 (auto-promotion), operating on the merged finding set.

**Step 1: Identify roots.** A finding is a candidate root when ALL of the following hold:

- Severity is `P0` or `P1` (premise-level issues carry high priority by nature)
- `autofix_class` is `manual` (the root itself requires judgment — a safe/gated root is acted on, not cascaded)
- `why_it_matters` or `title` challenges a foundational premise, not a detail. Signal phrases (shape, not vocabulary): "premise unsupported", "justification missing", "do-nothing baseline not evaluated", "is X justified", "unsupported by evidence", "is the proposed solution the right approach"
- The finding's `section` is framing-level (Problem Frame, Summary, Overview, Why, Motivation, Goals — `Summary` is the new ce-plan / ce-brainstorm template heading; `Overview` retained as legacy) OR the finding explicitly questions whether a named component should exist

If multiple candidates match the criteria, elevate ALL of them. The criteria above (P0/P1, manual, framing-level section, premise-challenge signal phrases) are restrictive enough that this list will be short for any well-formed document; do not impose a further numerical cap. Picking only one root when two valid roots exist leaves the second root's natural dependents stranded as independent manual findings — the exact UX problem chains are meant to solve.

**Peer vs nested test.** Two candidate roots are peers when accepting root A's proposed fix would not resolve root B's concern (and vice versa). They are nested when one root's fix would moot the other — in which case the subsumed candidate becomes a dependent of the surviving root, not a peer root. Apply the test symmetrically: check both directions before deciding.

**Surviving-root selection under asymmetric subsumption.** When nested, the surviving root is the one whose fix moots the other — **not** the one with higher confidence. If accepting Root A's fix moots Root B's concern, but accepting Root B's fix leaves Root A's concern standing, A is the surviving root and B becomes its dependent, regardless of which candidate scored higher confidence. The subsumption direction determines scope (broader premise wins); confidence determines strength, not scope. Confidence is used for tie-breaking *among peers*, not for deciding which of two nested candidates dominates.

**Sanity diagnostic.** If more than 3 candidates match, reconsider whether the criteria are being applied correctly — it is unusual for a single document to contain more than 3 genuinely distinct premise-level challenges. Do not silently drop candidates; either confirm each one independently meets the criteria (and surface them all), or tighten the application of the criteria. If the count is legitimately high, surfacing all of them is more useful than hiding any.

If none match, skip the rest of this step — no chains exist.

**Dependent assignment under multiple roots.** When multiple roots exist and a candidate dependent could plausibly link to more than one, assign it to the root whose rejection most directly dissolves the dependent's concern. If ambiguity remains, assign to the root with the higher confidence anchor; if anchors tie, assign to the root appearing first in document order. A dependent never links to more than one root — a single `depends_on` value.

**Step 2: Identify dependents.** For each candidate root, scan the remaining findings for dependents. The predicate must match the cascade trigger in `references/walkthrough.md` — dependents cascade when the user rejects (Skip/Defer) the root, so dependency is defined on the rejection branch, not the acceptance branch. A finding is a dependent of a root when:

- The root challenges a foundational premise about a named component — questioning whether it should exist, whether the proposed approach is correct, or whether the work is justified. Shapes to recognize (not a vocabulary list — map to whatever the document's domain actually uses): a compatibility layer whose necessity is challenged, a planned feature whose justification is in doubt, an abstraction whose warrant is questioned, a proposed change whose scope is disputed, a migration target whose choice is contested, an architectural commitment whose basis is unsupported
- The candidate's `suggested_fix` modifies, adds detail to, or constrains that same component
- The candidate's concern would dissolve if the root's premise is rejected — meaning: if the user rejects the root (Skip/Defer), the component the dependent targets is no longer a settled part of the plan, so the dependent's fix has nothing stable to act on and batch-rejects with the root

Test with the substitution check: "If the user rejects the root (Skip/Defer), does the dependent's finding still describe an actionable concern the user would want to engage with this round?" If no — the dependent's premise dissolves alongside the root's — it is a dependent. If yes (the finding identifies a problem that survives root rejection), it is not.

**Step 3: Independence safeguard.** Even when a finding's target component is addressed by the root, do NOT link if:

- The dependent identifies a problem that would exist regardless of the root's resolution. A migration's rollback plan, a module's error handling, a feature's test coverage — these are operational obligations that don't evaporate when the premise changes. They describe how a component must behave if it exists at all.
- The dependent's `why_it_matters` cites evidence (codebase fact, framework convention, production data) that stands on its own, not conditioned on the premise
- The dependent is `safe_auto` — it has one clear correct fix and should apply regardless of the root's resolution

When uncertain, default to NOT linking. A mis-linked chain hides a real issue; leaving a finding unlinked only costs one extra decision.

**Step 4: Annotate.** On each dependent, record `depends_on: <root_finding_id>` (use section + normalized title as the id). On each root, record `dependents: [<dependent_ids>]`. Cap `dependents` at 6 entries per root — if more than 6 candidates link to the same root, keep the top 6 by severity, then confidence anchor (descending), then document order as the deterministic final tiebreak; leave the rest unlinked (over-aggressive chaining risks obscuring independent concerns).
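The Step 4 cap and its deterministic tiebreak can be sketched as a single sort-and-slice:

```typescript
// Dependents cap sketch (3.5c Step 4): when more than 6 dependents link to one
// root, keep the top 6 by severity, then anchor (descending), then doc order.
interface Dep {
  severity: "P0" | "P1" | "P2" | "P3";
  anchor: number;   // confidence anchor
  docOrder: number; // section position, the deterministic final tiebreak
}

const SEV_RANK = { P0: 0, P1: 1, P2: 2, P3: 3 };

function capDependents(deps: Dep[], cap = 6): Dep[] {
  return [...deps]
    .sort(
      (a, b) =>
        SEV_RANK[a.severity] - SEV_RANK[b.severity] ||
        b.anchor - a.anchor ||
        a.docOrder - b.docOrder
    )
    .slice(0, cap); // the rest stay unlinked
}
```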
Do NOT reclassify, re-route, or change the confidence anchor of any finding in this step. Linking is purely annotative; the walk-through and presentation use the annotation, synthesis proper does not.

**Step 5: Report in Coverage.** Add a line to the coverage summary: `Chains: N root(s) with M total dependents`. When N = 0, omit the line.

**Count invariant (critical — do not violate).** `M` in the coverage line is the number of findings with `depends_on` set after Step 4 completes — i.e., the final linked count after steps 2 (candidacy), 3 (independence safeguard), and 4 (cap). It is NOT the number of candidates considered in Step 2. The same `dependents` array is the source of truth for both coverage counting AND rendering the `Dependents (...)` sub-block. If a finding appears in a root's `dependents` array, it MUST appear nested under that root in the presentation and MUST NOT appear at its own severity position. If a finding does NOT appear in any root's `dependents` array, it MUST appear at its own severity position and MUST NOT appear nested anywhere. Coverage count and rendering drift apart only if the orchestrator is using two different source-of-truth values — there is exactly one, the post-Step-4 `dependents` array on each root.

**Worked example A (rename-shape).** Review of a refactor plan surfaces 11 findings. One is P0 manual "Rename premise unsupported by user-facing evidence" in Problem Frame — a candidate root. Scanning the other 10:

- P1 manual "Alias mechanism unjustified scope" — root proposes scoping down to a pure alias-free rename; dependent's fix proposes dropping alias infrastructure. Linked.
- P2 manual "AliasedCommand abstraction overkill" — abstraction exists to support the alias; if alias dropped, abstraction dissolves. Linked.
- P2 manual "Rename forecloses dual-mode future" — concern only exists if rename proceeds. Linked.
- P2 manual "Identity drift: command vs artifact names" — naming asymmetry only exists if rename proceeds. Linked.
- P1 manual "Migration lacks rollback strategy" — migration needs rollback regardless of scope. NOT linked (independence safeguard).
- P0 gated_auto "Deployment-ordering between migration and code" — concrete fix user confirms regardless. NOT linked (safeguard: gated_auto with own resolution path).

Result: 1 root + 4 dependents. User sees the root first; rejecting it cascades the 4 dependents to auto-resolved. Manual engagement drops from 11 → 7 (6 unlinked + 1 visible root).

**Worked example B (auth-shape).** Review of a plan to introduce a new session-management middleware. One finding is P1 manual "Middleware rewrite premise unsupported — existing session handling has no reported reliability issues" in Problem Frame. Scanning the other findings:

- P2 manual "Middleware abstraction boundary unclear vs existing request context" — the boundary only matters if the middleware is built. Linked.
- P2 manual "Rollout strategy for new session store not specified" — the rollout only matters if the new store ships. Linked.
- P1 gated_auto "CSRF token regeneration missing on session rotation" — a real security gap in the plan's written design, independent of whether the middleware is the right approach. NOT linked (safeguard: gated_auto, concrete fix applies regardless).
- P2 manual "Existing session timeout behavior not captured in tests" — this is a pre-existing test coverage gap. It exists in the current code regardless of whether the rewrite happens. NOT linked (independence safeguard).

Result: 1 root + 2 dependents. The shape is the same as Example A — different vocabulary, different domain — which is the pattern to recognize.

### 3.6 Promote Auto-Eligible Findings

Scan `manual` findings for promotion to `safe_auto` or `gated_auto`. Promote when the finding meets one of the consolidated auto-promotion patterns:

- **Codebase-pattern-resolved.** `why_it_matters` cites a specific existing codebase pattern (concrete file/function/usage reference, not just "best practice" or "convention"), and `suggested_fix` follows that pattern. Promote to `gated_auto` — the user still confirms, but the codebase evidence resolves ambiguity.
- **Factually incorrect behavior.** The document describes behavior that is factually wrong, and the correct behavior is derivable from context or the codebase. Promote to `gated_auto`.
- **Missing standard security/reliability controls.** The omission is clearly a gap (not a legitimate design choice for the system described), and the fix follows established practice (HTTPS enforcement, checksum verification, input sanitization, fallback-with-deprecation-warning on renames). Promote to `gated_auto`.
- **Framework-native-API substitutions.** A hand-rolled implementation duplicates first-class framework behavior, and the framework API is cited. Promote to `gated_auto`.
- **Mechanically-implied completeness additions.** The missing content follows mechanically from the document's own explicit, concrete decisions (not high-level goals). Promote to `safe_auto` when there is genuinely one correct addition; `gated_auto` when the addition is substantive.

Do not promote if the finding involves scope or priority changes where the author may have weighed tradeoffs invisible to the reviewer.

**Strawman-downgrade safeguard.** If a `safe_auto` finding names dismissed alternatives in `why_it_matters` (per the subagent template's strawman rule), verify the alternatives are genuinely strawmen. If any alternative is a plausible design choice that the persona dismissed too aggressively, downgrade to `gated_auto` so the user sees the tradeoff before the fix applies.

### 3.7 Route by Autofix Class

**Severity and autofix_class are independent.** A P1 finding can be `safe_auto` if the correct fix is obvious. The test is not "how important?" but "is there one clear correct fix, or does this require judgment?"

**Anchor and autofix_class are also independent.** Anchor gates the finding into a surface (FYI vs actionable); `autofix_class` decides what the actionable surface does with it. Both are consulted in this step.

Findings reaching 3.7 have already been gated to anchors `50`, `75`, or `100` by 3.2 (anchors `0` and `25` were dropped).

| Anchor | Autofix Class | Route |
|--------|---------------|-------|
| `100` | `safe_auto` | Apply silently in Phase 4. Requires `suggested_fix`. Demote to `gated_auto` if missing. |
| `100` | `gated_auto` | Enter the per-finding walk-through with Apply marked (recommended). Requires `suggested_fix`. Demote to `manual` if missing. |
| `100` | `manual` | Enter the per-finding walk-through with user-judgment framing. `suggested_fix` is optional. |
| `75` | `safe_auto` | Demote to `gated_auto` before routing — silent apply is reserved for anchor `100` findings where evidence directly confirms the fix. Enter the walk-through with Apply marked (recommended). |
| `75` | `gated_auto` | Enter the per-finding walk-through with Apply marked (recommended). Requires `suggested_fix`. Demote to `manual` if missing. |
| `75` | `manual` | Enter the per-finding walk-through with user-judgment framing. `suggested_fix` is optional. |
| `50` | any | Surface in the FYI subsection regardless of `autofix_class`. Do not enter the walk-through or any bulk action. These are observations, not decisions. |
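The routing table reduces to a small function once the demotion rules are chained — a sketch, with route names chosen here for illustration (note a `safe_auto` finding missing its fix demotes to `gated_auto`, which in turn demotes to `manual` for the same missing fix):

```typescript
// Routing sketch (3.7): anchor picks the surface, autofix_class picks the
// behavior, and a missing suggested_fix demotes one rung at a time.
type Cls = "safe_auto" | "gated_auto" | "manual";
type Route = "fyi" | "apply-silently" | "walkthrough-apply-recommended" | "walkthrough-manual";

function route(anchor: 50 | 75 | 100, cls: Cls, hasFix: boolean): Route {
  if (anchor === 50) return "fyi"; // FYI regardless of autofix_class
  // Silent apply is reserved for anchor 100 with an executable fix.
  if (cls === "safe_auto" && (anchor === 75 || !hasFix)) cls = "gated_auto";
  if (cls === "gated_auto" && !hasFix) cls = "manual";
  if (cls === "safe_auto") return "apply-silently";
  if (cls === "gated_auto") return "walkthrough-apply-recommended";
  return "walkthrough-manual";
}
```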
**Auto-eligible patterns for safe_auto:** summary/detail mismatch (body authoritative over overview), wrong counts, missing list entries derivable from elsewhere in the document, stale internal cross-references, terminology drift, prose/diagram contradictions where prose is more detailed, missing steps mechanically implied by other content, unstated thresholds implied by surrounding context.

**Auto-eligible patterns for gated_auto:** codebase-pattern-resolved fixes, factually incorrect behavior, missing standard security/reliability controls, framework-native-API substitutions, substantive completeness additions mechanically implied by explicit decisions.

### 3.8 Sort

Sort findings for presentation: P0 → P1 → P2 → P3, then by finding type (errors before omissions), then by confidence anchor (descending: `100` first, then `75`, then `50`), then by document order (section position) as the deterministic final tiebreak.
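The four sort keys compose into one comparator — a sketch:

```typescript
// Presentation sort sketch (3.8): severity, then errors before omissions,
// then anchor descending, then document order.
interface Sortable {
  severity: "P0" | "P1" | "P2" | "P3";
  findingType: "error" | "omission";
  anchor: number;
  docOrder: number;
}

const SEV = { P0: 0, P1: 1, P2: 2, P3: 3 };
const TYPE = { error: 0, omission: 1 };

function presentationSort(fs: Sortable[]): Sortable[] {
  return [...fs].sort(
    (a, b) =>
      SEV[a.severity] - SEV[b.severity] ||
      TYPE[a.findingType] - TYPE[b.findingType] ||
      b.anchor - a.anchor ||
      a.docOrder - b.docOrder
  );
}
```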
227
+
228
+ ### 3.9 Suppress Restatements in Residual Concerns and Deferred Questions
229
+
230
+ Persona outputs carry `residual_risks` and `deferred_questions` arrays alongside `findings`. After the actionable-tier set is finalized (post-3.7 routing), personas often re-surface the same substance in their residual/deferred arrays — the persona's own finding and the persona's own residual concern are about the same issue. Rendering both sections verbatim inflates the output with restatements that carry no new signal.
231
+
232
+ For every `residual_risk` and `deferred_question` across all persona outputs, check against the finalized actionable-finding set (findings at confidence anchor `75` or `100`, plus FYI-subsection findings at anchor `50`). Drop the residual/deferred item if either of these holds:
233
+
234
+ - **Section-and-substance overlap.** The residual/deferred item names the same section as an actionable finding AND its substance fuzzy-matches the finding's `title` or `why_it_matters` (shared key nouns/verbs indicating the same concern).
235
+ - **Question form of an actionable finding.** A deferred question whose subject is directly answered by or obviated by an actionable finding's recommendation. Example: actionable finding "Motivation cites no real incident" → deferred question "Is there a concrete triggering event?" — the finding already raised this; the question restates it interrogatively.
236
+
237
+ Do NOT drop residual/deferred items that introduce genuinely new signal (a concern or question the actionable findings do not touch). When in doubt, keep — this pass is for obvious restatements, not borderline calls.
238
+
239
+ Run this pass on the merged set across all personas. Record the count dropped as a Coverage footnote line when non-zero: `Restated: N (residual/deferred items suppressed as duplicates of actionable findings)`. Ordering: footnotes appear in the sequence `Dropped:`, `Chains:`, `Restated:` below the Coverage table, each on its own line. Omit any footnote whose count is zero.
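The overlap test above can be sketched as follows. The spec deliberately leaves "fuzzy-matches" loose; the shared-key-token approach and the two-token threshold below are illustrative assumptions, not part of the rule:

```python
def _tokens(text):
    # key nouns/verbs approximated as lowercase words longer than 3 chars
    return {t for t in text.lower().split() if len(t) > 3}

def restates(item, finding):
    """True when a residual/deferred item merely restates an actionable finding."""
    same_section = item["section"].strip().lower() == finding["section"].strip().lower()
    shared = _tokens(item["text"]) & (_tokens(finding["title"]) | _tokens(finding["why_it_matters"]))
    return same_section and len(shared) >= 2  # illustrative threshold

def suppress_restatements(items, actionable):
    kept = [i for i in items if not any(restates(i, f) for f in actionable)]
    dropped = len(items) - len(kept)
    return kept, dropped  # `dropped` feeds the `Restated: N` footnote when non-zero
```

Note the conservative bias: an item survives unless it matches on both section and substance, which matches the "when in doubt, keep" instruction.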
240
+
241
+ ## Phase 4: Apply and Present
242
+
243
+ **User-facing vocabulary rule (applies to ALL user-visible output in Phase 4, not just the rendered template).** Internal enum values — `safe_auto`, `gated_auto`, `manual`, `FYI` — stay inside the schema and synthesis prose. Every word the user sees in Phase 4 output, including free-text narration between sections, transition preambles, status lines, and confirmation messages, MUST use user-facing vocabulary: "fixes" (for `safe_auto`), "proposed fixes" (for `gated_auto`), "decisions" (for `manual` findings at anchor `75` or `100`), "FYI observations" (for any finding at anchor `50`). The only exception is the `Tier` column in rendered tables, which is explicitly documented as surfacing the internal enum for transparency. Do NOT emit narration like "safe_auto fixes applied" or "N safe_auto findings" — write "fixes applied" or "N fixes" instead.
244
+
245
+ ### Apply safe_auto fixes
246
+
247
+ Apply only `safe_auto` findings **at confidence anchor `100`** to the document in a single pass. This matches the 3.7 routing table: anchor `100` + `safe_auto` silent-applies; anchor `75` + `safe_auto` was demoted to `gated_auto` in 3.7 and enters the walk-through instead; anchor `50` + any `autofix_class` routes to FYI and must never auto-apply.
248
+
249
+ - Edit the document inline using the platform's edit tool
250
+ - Track what was changed for the "Applied fixes" section in the rendered output (`safe_auto` is the internal enum; the rendered section header reads "Applied fixes")
251
+ - Do not ask for approval — these have one clear correct fix AND evidence directly confirms (anchor `100`)
252
+ - Do NOT silent-apply any `safe_auto` finding at anchor `75` or `50`. If a finding reaches this step with `autofix_class: safe_auto` and anchor below `100`, the 3.7 routing rule was not applied correctly; re-run 3.7 for that finding before continuing.
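The guard is small enough to state directly. A minimal sketch, assuming dict-shaped findings with `autofix_class` and `confidence_anchor` keys:

```python
def may_silent_apply(finding):
    """Only safe_auto at anchor 100 may be applied without asking."""
    return (finding["autofix_class"] == "safe_auto"
            and finding["confidence_anchor"] == 100)
```

Everything else either enters the walk-through (anchor `75`) or surfaces as FYI (anchor `50`), per the 3.7 routing table.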
253
+
254
+ List every applied fix in the output summary so the user can see what changed. Use enough detail to convey the substance of each fix (section, what was changed, reviewer attribution). This is especially important for fixes that add content or touch document meaning — the user should not have to diff the document to understand what the review did.
255
+
256
+ ### Route Remaining Findings
257
+
258
+ After the safe_auto fixes are applied, the remaining findings split into buckets:
259
+
260
+ - `gated_auto` and `manual` findings at confidence anchor `75` or `100` → enter the routing question (see Unit 5 / `references/walkthrough.md`)
261
+ - FYI-subsection findings → surface in the presentation only, no routing
262
+ - Zero actionable findings remaining → skip the routing question; flow directly to Phase 5 terminal question
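The three-way split can be sketched as a pure routing function. Field names are illustrative assumptions, and the sketch presumes the safe_auto/anchor-100 pass has already run:

```python
def route_remaining(finding):
    if finding["confidence_anchor"] == 50:
        return "fyi"                # presentation only, no routing
    if finding["autofix_class"] in ("gated_auto", "manual"):
        return "routing_question"   # anchor 75 or 100
    return "already_applied"        # safe_auto at 100 landed in the prior step

def has_actionable(findings):
    # zero actionable -> skip the routing question, go to the Phase 5 terminal question
    return any(route_remaining(f) == "routing_question" for f in findings)
```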
263
+
264
+ **Headless mode:** Do not use interactive question tools. Output all findings as a structured text envelope the caller can parse. Internal enum values (`safe_auto`, `gated_auto`, `manual`, `FYI`) stay in the schema and synthesis prose; the envelope below uses user-facing vocabulary — "fixes", "Proposed fixes", "Decisions", "FYI observations" — so headless output reads the same way interactive output does.
265
+
266
+ ```
267
+ Document review complete (headless mode).
268
+
269
+ Applied N fixes:
270
+ - <section>: <what was changed> (<reviewer>)
271
+ - <section>: <what was changed> (<reviewer>)
272
+
273
+ Proposed fixes (concrete fix, requires user confirmation):
274
+
275
+ [P0] Section: <section> — <title> (<reviewer>, confidence <anchor>)
276
+ Why: <why_it_matters>
277
+ Suggested fix: <suggested_fix>
278
+
279
+ Decisions (requires user judgment):
280
+
281
+ [P1] Section: <section> — <title> (<reviewer>, confidence <anchor>)
282
+ Why: <why_it_matters>
283
+ Suggested fix: <suggested_fix or "none">
284
+
285
+ Dependents (would resolve if this root is rejected):
286
+ [P2] Section: <section> — <title> (<reviewer>, confidence <anchor>)
287
+ Why: <why_it_matters>
288
+ [P2] Section: <section> — <title> (<reviewer>, confidence <anchor>)
289
+ Why: <why_it_matters>
290
+
291
+ FYI observations (anchor 50, no decision required):
292
+
293
+ [P3] Section: <section> — <title> (<reviewer>, confidence <anchor>)
294
+ Why: <why_it_matters>
295
+
296
+ Residual concerns:
297
+ - <concern> (<source>)
298
+
299
+ Deferred questions:
300
+ - <question> (<source>)
301
+
302
+ Dropped: N (anchors 0/25 suppressed)
303
+ Chains: N root(s) with M dependents
304
+ Restated: N (residual/deferred items suppressed as duplicates of actionable findings)
305
+
306
+ Review complete
307
+ ```
308
+
309
+ Omit any section with zero items. The section headers reflect user-facing vocabulary: the "Proposed fixes" bucket carries `gated_auto` findings at anchor `75` or `100` (the persona has a concrete fix; the user confirms), "Decisions" carries `manual` findings at anchor `75` or `100` (judgment calls), and "FYI observations" carries any finding at anchor `50` regardless of `autofix_class`. When a root has dependents, render the root at its normal position in the severity-sorted list and nest its dependents as an indented `Dependents (...)` sub-block immediately below. Do not re-list dependents at their own severity position — they appear only under their root. End with "Review complete" as the terminal signal so callers can detect completion.
310
+
311
+ **Compact rendering for FYI observations, residual concerns, and deferred questions (high-count mode).** When the combined count of these three buckets is 5 or more, collapse each to a one-line count followed by a tight bullet list without per-item `Why` expansion. Actionable buckets (Proposed fixes / Decisions) remain fully rendered regardless. This mirrors the interactive-mode rule in `references/review-output-template.md` so both modes produce the same shape.
312
+
313
+ **Interactive mode:**
314
+
315
+ Present findings using the review output template (read `references/review-output-template.md`). Within each severity level, separate findings by type:
316
+
317
+ - Errors (design tensions, contradictions, incorrect statements) first — these need resolution
318
+ - Omissions (missing steps, absent details, forgotten entries) second — these need additions
319
+
320
+ Brief summary at the top: "Applied N fixes. K items need attention (X errors, Y omissions). Z FYI observations."
321
+
322
+ Include the Coverage table, applied fixes, FYI observations (as a distinct subsection), residual concerns, and deferred questions.
323
+
324
+ **All tables MUST be pipe-delimited markdown (`| col | col |`). Do NOT use ASCII box-drawing characters (`┌ ┬ ┐ ├ ┼ ┤ └ ┴ ┘ │ ─`) under any circumstances, including for the Coverage table.** This rule restates the template's formatting requirement at the point of rendering so it cannot drift. Pipe-delimited tables render correctly across all target harnesses; box-drawing characters break rendering in some and violate the repo convention documented in root `AGENTS.md`.
325
+
326
+ ### R29 Rejected-Finding Suppression (Round 2+)
327
+
328
+ When the orchestrator is running round 2+ on the same document in the same session, the decision primer (see `SKILL.md` — Decision primer) carries forward every prior-round Skipped, Deferred, and Acknowledged finding. Synthesis suppresses re-raised rejected findings rather than re-surfacing them to the user. Acknowledged is treated as a rejected-class decision here: the user saw the finding, chose not to act on it (no Apply, no Defer append), and wants it on record — equivalent to Skip for suppression purposes.
329
+
330
+ For each current-round finding, compare against the primer's rejected list:
331
+
332
+ - **Matching predicate:** same as R30 — `normalize(section) + normalize(title)` fingerprint augmented with evidence-substring overlap check (>50%). If a current-round finding matches a prior-round rejected finding on fingerprint AND evidence overlap, drop the current-round finding.
333
+ - **Materially-different exception:** if the current document state has changed around the finding's section since the prior round (e.g., the section was edited and the evidence quote no longer appears in the current text), treat the finding as new — the underlying context shifted and the concern may be genuinely different now. The persona's evidence itself reveals this: a quote that doesn't appear in the current document is a signal the prior-round rejection no longer applies.
334
+ - **On suppression:** record the drop in Coverage with a "previously rejected, re-raised this round" note so the user can see what was suppressed. The user can explicitly escalate by invoking the review again in a different context if they believe the suppression was wrong.
335
+
336
+ This rule runs at synthesis time, not at the persona level. Personas have a soft instruction via the subagent template's `{decision_primer}` variable to avoid re-raising rejected findings, but the orchestrator is the authoritative gate — if a persona re-raises despite the primer, synthesis drops the finding.
337
+
338
+ ### R30 Fix-Landed Matching Predicate
339
+
340
+ When the orchestrator is running round 2+ on the same document (see Unit 7 multi-round memory), synthesis verifies that prior-round Applied findings actually landed. For each prior-round Applied finding:
341
+
342
+ - **Matching predicate:** `normalize(section) + normalize(title)` (same fingerprint as 3.3 dedup) augmented with an evidence-substring overlap check. If any current-round persona raises a finding whose fingerprint matches a prior-round Applied finding AND shares >50% of its evidence substring with the prior-round evidence, treat it as a fix-landed regression.
343
+ - **Section renames count as different locations.** If the section name has changed between rounds (edit introduced a heading rename), treat the new section as a different location and the current-round finding as new.
344
+ - **On match:** flag the finding as "fix did not land" in the report rather than surfacing it as a new finding. Include the prior-round finding's title and the current-round persona's evidence so the user can see why the verification flagged it.
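A sketch of the predicate. The spec fixes the fingerprint but not the overlap measure ("evidence-substring overlap >50%"), so the longest-common-substring ratio below is one reasonable reading, not the required implementation:

```python
from difflib import SequenceMatcher

def normalize(s):
    return " ".join(s.lower().split())

def fingerprint(finding):
    return normalize(finding["section"]) + "|" + normalize(finding["title"])

def evidence_overlap(a, b):
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return m.size / min(len(a), len(b))

def fix_regressed(current, prior_applied):
    """True when a current-round finding signals a prior Applied fix did not land."""
    return (fingerprint(current) == fingerprint(prior_applied)
            and evidence_overlap(current["evidence"], prior_applied["evidence"]) > 0.5)
```

Because the section name participates in the fingerprint, a heading rename changes the fingerprint and the current-round finding is treated as new, matching the section-rename rule above.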
345
+
346
+ ### Protected Artifacts
347
+
348
+ During synthesis, discard any finding that recommends deleting or removing files in:
349
+
350
+ - `docs/brainstorms/`
351
+ - `docs/plans/`
352
+ - `docs/solutions/`
353
+
354
+ These are pipeline artifacts and must not be flagged for removal.
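The discard check reduces to a path-prefix test. A minimal sketch; `removal_target` is a hypothetical field name for the path a deletion-recommending finding points at:

```python
PROTECTED_PREFIXES = ("docs/brainstorms/", "docs/plans/", "docs/solutions/")

def targets_protected_artifact(finding):
    path = finding.get("removal_target") or ""
    return any(path.replace("\\", "/").startswith(p) for p in PROTECTED_PREFIXES)
```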
355
+
356
+ ## Phase 5: Next Action — Terminal Question
357
+
358
+ **Headless mode:** Return "Review complete" immediately. Do not ask questions. The caller receives the text envelope from Phase 4 and handles any remaining findings.
359
+
360
+ **Interactive mode:** fire the terminal question using the platform's blocking question tool (`question` in OpenCode, `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension)). In OpenCode the tool should already be loaded from the Interactive-mode pre-load step in `SKILL.md` — if it isn't, call `ToolSearch` with `select:question` now. Fall back to numbered options in chat only when no blocking tool exists in the harness or the call errors (e.g., Codex edit modes) — not because a schema load is required. Never silently skip the question. This question is distinct from the mid-flow routing question (`references/walkthrough.md`) — the routing question chooses *how* to engage with findings, this one chooses *what to do next* once engagement is complete. Do not merge them.
361
+
362
+ **Stem:** `Apply decisions and what next?`
363
+
364
+ **Options (three by default; two in the zero-actionable case):**
365
+
366
+ When `fixes_applied_count > 0` (at least one safe_auto or Apply decision has landed this session):
367
+
368
+ ```
369
+ A. Apply decisions and proceed to <next stage>
370
+ B. Apply decisions and re-review
371
+ C. Exit without further action
372
+ ```
373
+
374
+ When `fixes_applied_count == 0` (zero-actionable case, or the user took routing option D / every walk-through decision was Skip):
375
+
376
+ ```
377
+ A. Proceed to <next stage>
378
+ B. Exit without further action
379
+ ```
380
+
381
+ The `<next stage>` substitution uses the document type from Phase 1:
382
+
383
+ - Requirements document → `ce-plan`
384
+ - Plan document → `ce-work`
385
+
386
+ **Label adaptation:** when no decisions are queued to apply, the primary option drops the `Apply decisions and` prefix — the label should match what the system is doing. `Apply decisions and proceed` when fixes are queued; `Proceed` when nothing is queued.
387
+
388
+ **Caller-context handling (implicit):** the terminal question's "Proceed to <next stage>" option is interpreted contextually by the agent from the visible conversation state. When ce-doc-review is invoked from inside another skill's flow (e.g., ce-brainstorm Phase 4 re-review, ce-plan phase 5.3.8), the agent does not fire a nested `/ce-plan` or `/ce-work` dispatch — it returns control to the caller's flow which continues its own logic. When invoked standalone, "Proceed" dispatches the appropriate next skill. No explicit caller-hint argument is required; if this implicit handling proves unreliable in practice, an explicit `nested:true` flag can be added as a follow-up.
389
+
390
+ ### Iteration limit
391
+
392
+ After 2 refinement passes, recommend completion — diminishing returns are likely. But if the user wants to continue, allow it; the primer carries all prior-round decisions so later rounds suppress repeat findings cleanly.
393
+
394
+ Return "Review complete" as the terminal signal for callers, regardless of which option the user picked.
395
+
396
+ ## What NOT to Do
397
+
398
+ - Do not rewrite the entire document
399
+ - Do not add new sections or requirements the user didn't discuss
400
+ - Do not over-engineer or add complexity
401
+ - Do not create separate review files or add metadata sections
402
+ - Do not modify caller skills (ce-brainstorm, ce-plan, or external plugin skills that invoke ce-doc-review)
403
+
404
+ ## Iteration Guidance
405
+
406
+ On subsequent passes, re-dispatch personas with the multi-round decision primer (see Unit 7) and re-synthesize. Fixed findings self-suppress because their evidence is gone from the current doc; rejected findings are handled by the R29 pattern-match suppression rule; applied-fix verification uses the R30 matching predicate above. If findings are repetitive across passes after these mechanisms run, recommend completion.