@fro.bot/systematic 2.6.0 → 2.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. package/agents/review/api-contract-reviewer.md +1 -1
  2. package/agents/review/correctness-reviewer.md +1 -1
  3. package/agents/review/data-migrations-reviewer.md +1 -1
  4. package/agents/review/dhh-rails-reviewer.md +1 -1
  5. package/agents/review/julik-frontend-races-reviewer.md +1 -1
  6. package/agents/review/kieran-python-reviewer.md +1 -1
  7. package/agents/review/kieran-rails-reviewer.md +1 -1
  8. package/agents/review/kieran-typescript-reviewer.md +1 -1
  9. package/agents/review/maintainability-reviewer.md +1 -1
  10. package/agents/review/performance-reviewer.md +1 -1
  11. package/agents/review/reliability-reviewer.md +1 -1
  12. package/agents/review/security-reviewer.md +1 -1
  13. package/agents/workflow/bug-reproduction-validator.md +1 -1
  14. package/dist/cli.js +1 -1
  15. package/dist/{index-3h7kpmfa.js → index-k9tdxh0p.js} +1 -1
  16. package/dist/index.d.ts +1 -1
  17. package/dist/index.js +2 -3
  18. package/dist/lib/skills.d.ts +1 -0
  19. package/package.json +1 -1
  20. package/skills/ce-brainstorm/references/handoff.md +127 -0
  21. package/skills/ce-brainstorm/references/requirements-capture.md +243 -0
  22. package/skills/ce-brainstorm/references/universal-brainstorming.md +63 -0
  23. package/skills/ce-ideate/references/post-ideation-workflow.md +240 -0
  24. package/skills/ce-plan/references/deepening-workflow.md +249 -0
  25. package/skills/ce-plan/references/plan-handoff.md +96 -0
  26. package/skills/ce-plan/references/universal-planning.md +114 -0
  27. package/skills/ce-plan/references/visual-communication.md +31 -0
  28. package/skills/ce-work/references/shipping-workflow.md +129 -0
  29. package/skills/ce-work-beta/references/codex-delegation-workflow.md +327 -0
  30. package/skills/ce-work-beta/references/shipping-workflow.md +129 -0
  31. package/skills/compound-docs/SKILL.md +2 -3
  32. package/skills/document-review/references/synthesis-and-presentation.md +406 -0
  33. package/skills/proof/references/hitl-review.md +368 -0
  34. package/skills/writing-systematic-skills/SKILL.md +115 -0
  35. package/skills/writing-systematic-skills/references/foundation-conventions.md +143 -0
@@ -0,0 +1,63 @@ package/skills/ce-brainstorm/references/universal-brainstorming.md
# Universal Brainstorming Facilitator

This file is loaded when ce-brainstorm detects a non-software task (Phase 0). It replaces the software-specific brainstorming phases (Phases 0.2 through 4) with facilitation principles for any domain. The Core Principles and **Interaction Rules** in the parent `ce-brainstorm/SKILL.md` still apply unchanged — including one-question-per-turn and the default to the platform's blocking question tool. This file extends those rules with universal-domain facilitation guidance; it does not relax them.

---

## Your role

Be a thinking partner, not an answer machine. The user came here because they're stuck or exploring — they want to think WITH someone, not receive a deliverable. Resist the urge to generate a complete solution immediately. A premature answer anchors the conversation and kills exploration.

**Match the tone to the stakes.** For personal or life decisions (career changes, housing, relationships, family), lead with values and feelings before frameworks and analysis. Ask what matters to them, not just what the options are. For lighter or creative tasks (podcast topics, event ideas, side projects), energy and enthusiasm are more useful than caution.

## Asking questions

"Thinking partner" framing does not mean "conversational prose." The parent skill's Interaction Rules apply in full: one question per turn, and default to the platform's blocking question tool (with its free-text fallback) even for opening and elicitation.

"What's prompting this?", "What matters most here?", and "What have you ruled out?" feel open-ended and conversational, but that's not a reason to skip the tool. The free-text option preserves flexibility while a well-crafted option set teaches the user the dimensions they might not have separated. Pick-plus-optional-note is lower activation energy than composing prose from scratch — especially for emotional or values-laden topics where prose can feel like an essay prompt.

Drop to prose only when (a) the answer is inherently narrative ("walk me through how you got here"), (b) the question is diagnostic or introspective and presented options would leak your priors and bias the answer, or (c) you cannot write 3-4 genuinely distinct, plausibly-correct options that cover the space without padding. If you'd be straining to fill the option slots, the question is open — use prose.

## How to start

**Assess scope first.** Not every brainstorm needs deep exploration:
- **Quick** (user has a clear goal, just needs a sounding board): Confirm understanding, offer a few targeted suggestions or reactions, done in 2-3 exchanges.
- **Standard** (some unknowns, needs to explore options): 4-6 exchanges, generate and compare options, help decide.
- **Full** (vague goal, lots of uncertainty, or high-stakes decision): Deep exploration, many exchanges, structured convergence.

**Ask what they're already thinking.** Before offering ideas, find out what the user has considered, tried, or rejected. This prevents fixation on AI-generated ideas and surfaces hidden constraints.

**When the user represents a group** (couple, family, team) — surface whose preferences are in play and where they diverge. The brainstorm shifts from "help you decide" to "help you find alignment." Ask about each person's priorities, not just the speaker's.

**Understand before generating.** Spend time on the problem before jumping to solutions. "What would success look like?" and "What have you already ruled out?" reveal more than "Here are 10 ideas."

## How to explore and generate

**Use diverse angles to avoid repetitive ideas.** When generating options, vary your approach across exchanges:
- Inversion: "What if you did the opposite of the obvious choice?"
- Constraints as creative tools: "What if budget/time/distance were no issue?" then "What if you had to do it for free?"
- Analogy: "How does someone in a completely different context solve a similar problem?"
- What the user hasn't considered: introduce lateral ideas from unexpected directions

**Separate generation from evaluation.** When exploring options, don't critique them in the same breath. Generate first, evaluate later. Make the transition explicit when it's time to narrow.

**Offer options to react to when the user is stuck.** People who can't generate from scratch can often evaluate presented options. Use multi-select questions to gather preferences efficiently. Always include a skip option for users who want to move faster.

**Keep presented options to 3-5 at any decision point.** More causes analysis paralysis.

## How to converge

When the conversation has enough material to narrow — reflect back what you've heard. Name the user's priorities as they've emerged through the conversation (what excited them, what they rejected, what they asked about). Propose a frontrunner with reasoning tied to their criteria, and invite pushback. Keep final options to 3-5 max. Don't force a final decision if the user isn't there yet — clarity on direction is a valid outcome.

## When to wrap up

**Always synthesize a summary in the chat.** Before offering any next steps, reflect back what emerged: key decisions, the direction chosen, open threads, and any assumptions made. This is the primary output of the brainstorm — the user should be able to read the summary and know what they landed on.

**Then offer next steps** using the platform's blocking question tool: `question` in OpenCode (call `ToolSearch` with `select:question` first if its schema isn't loaded), `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension). Fall back to numbered options in chat only when no blocking tool exists in the harness or the call errors (e.g., Codex edit modes) — not because a schema load is required. Never silently skip the question.

**Question:** "Brainstorm wrapped. What would you like to do next?"

- **Create a plan** → hand off to `/ce-plan` with the decided goal and constraints
- **Save summary to disk** → write the summary as a markdown file in the current working directory
- **Open in Proof (web app) — review and comment to iterate with the agent** → load the `ce-proof` skill to open the doc in Every's Proof editor, iterate with the agent via comments, or copy a link to share with others
- **Done** → the conversation was the value, no artifact needed
@@ -0,0 +1,240 @@ package/skills/ce-ideate/references/post-ideation-workflow.md
# Post-Ideation Workflow

Read this file after Phase 2 ideation agents return and the orchestrator has merged and deduped their outputs into a master candidate list. Do not load before Phase 2 completes.

## Phase 3: Adversarial Filtering

Review every candidate idea critically. The orchestrator performs this filtering directly — do not dispatch sub-agents for critique.

Do not generate replacement ideas in this phase unless explicitly refining.

For each rejected idea, write a one-line reason.

Rejection criteria:
- too vague
- not actionable
- duplicates a stronger idea
- not grounded in the stated context
- too expensive relative to likely value
- already covered by existing workflows or docs
- interesting but better handled as a brainstorm variant, not a product improvement
- **unjustified — no articulated warrant** (sub-agent failed to provide `direct:`, `external:`, or `reasoned:` justification, or the stated warrant does not actually support the claimed move)
- **below ambition floor** (fails the meeting-test: would not warrant team discussion — except when Phase 0.5 detected tactical focus signals, in which case this criterion is waived)
- **subject-replacement** (abandons or replaces the subject of ideation rather than operating on it — e.g., "pivot to an unrelated domain," "become a different organization")

Score survivors using a consistent rubric weighing: groundedness in stated context, **warrant strength** (`direct:` > `external:` > `reasoned:`; none excluded, but direct-evidence ideas score higher all else equal), expected value, novelty, pragmatism, leverage on future work, implementation burden, and overlap with stronger ideas.

Target output:
- keep 5-7 survivors by default
- if too many survive, run a second stricter pass
- if fewer than 5 survive, report that honestly rather than lowering the bar
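The rubric above can be made concrete as a scoring function. This is a minimal sketch only: the skill specifies the dimensions and the warrant ordering, not numeric weights, so the weights and multipliers below are illustrative assumptions.

```python
# Warrant strength ordering from the rubric: direct > external > reasoned.
# The multiplier values are assumptions, not specified by the skill.
WARRANT_STRENGTH = {"direct": 1.0, "external": 0.8, "reasoned": 0.6}

def score_survivor(idea):
    """Score one surviving idea; each dimension is assumed rated 0-5.

    Positive dimensions add, implementation burden and overlap subtract,
    and warrant strength scales the total.
    """
    weighted = (
        2.0 * idea["groundedness"]
        + 2.0 * idea["expected_value"]
        + 1.0 * idea["novelty"]
        + 1.0 * idea["pragmatism"]
        + 1.0 * idea["leverage"]
        - 1.0 * idea["implementation_burden"]
        - 1.0 * idea["overlap"]
    )
    return weighted * WARRANT_STRENGTH[idea["warrant"]]
```

The key property to preserve from the text is ordinal: all else equal, a `direct:`-warranted idea outscores an `external:` one, which outscores a `reasoned:` one.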

## Phase 4: Present the Survivors

**Checkpoint B (V17).** Before presenting, write `<scratch-dir>/survivors.md` (using the absolute path captured in Phase 1) containing the survivor list plus key context (focus hint, grounding summary, rejection summary). This protects the post-critique state before the user reaches the persistence menu. Best-effort: if the write fails (disk full, permissions), log a warning and proceed; the checkpoint is not load-bearing. Reuses the same `<run-id>` and `<scratch-dir>` generated in Phase 1; not cleaned up at the end of the run (the run directory is preserved so the V15 cache remains reusable across run-ids in the same session — see Phase 6).
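The best-effort checkpoint write can be sketched as below. The function name and logger are hypothetical; only the never-fail-the-run behavior (log a warning and proceed) comes from the text above.

```python
import logging
from pathlib import Path

log = logging.getLogger("ideation")

def write_checkpoint_b(scratch_dir, survivors_md):
    """Best-effort write of <scratch-dir>/survivors.md; never raises.

    The checkpoint is not load-bearing: on failure (disk full,
    permissions) we log a warning and let the run proceed.
    """
    try:
        path = Path(scratch_dir) / "survivors.md"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(survivors_md, encoding="utf-8")
        return True
    except OSError as exc:
        log.warning("Checkpoint B write failed (non-fatal): %s", exc)
        return False
```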

Present the surviving ideas to the user. The terminal review loop is a complete ideation cycle in itself — persistence is opt-in (Phase 5), and refinement happens in conversation with no file or network cost (Phase 6).

Present only the surviving ideas in structured form:

- title
- description
- **warrant** (tagged `direct:` / `external:` / `reasoned:`, with the quoted evidence, cited source, or written-out argument)
- rationale (how the warrant connects to the move's significance)
- downsides
- confidence score
- estimated complexity

Then include a brief rejection summary so the user can see what was considered and cut.

Keep the presentation concise. Allow brief follow-up questions and lightweight clarification.

## Phase 5: Persistence (Opt-In, Mode-Aware)

Persistence is opt-in. The terminal review loop is a complete ideation cycle. Refinement loops happen in conversation with no file or network cost. Persistence triggers only when the user explicitly chooses to save, share, or hand off (selected in Phase 6).

When the user picks an option in Phase 6 that requires a durable record (Open and iterate in Proof, Brainstorm, Save and end), ensure a record exists first. When the user chooses to keep refining, no record is needed unless the user asks.

**Mode-determined defaults:**

| Action | Repo mode default | Elsewhere mode default |
|---|---|---|
| Save | `docs/ideation/YYYY-MM-DD-<topic>-ideation.md` | Proof |
| Share | Proof (additional) | Proof (primary) |
| Brainstorm handoff | `ce-brainstorm` | `ce-brainstorm` (universal-brainstorming) |
| End | Conversation only is fine | Conversation only is fine |

Either mode can also use the other destination on explicit request ("save to Proof even though this is repo mode", "save to a local file even though this is elsewhere"). Honor such overrides directly.

### 5.1 File Save (default for repo mode; on request for elsewhere mode)

1. Ensure `docs/ideation/` exists
2. Choose the file path:
   - `docs/ideation/YYYY-MM-DD-<topic>-ideation.md`
   - `docs/ideation/YYYY-MM-DD-open-ideation.md` when no focus exists
3. Write or update the ideation document
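Step 2's path choice can be sketched as a small helper. The path shape comes from the table above; the kebab-case slug logic is an assumption (the skill does not specify how a topic becomes a slug).

```python
import re
from datetime import date

def ideation_path(topic=None, today=None):
    """Build the repo-mode save path.

    docs/ideation/YYYY-MM-DD-<topic>-ideation.md, or the
    open-ideation fallback when no focus exists.
    """
    d = (today or date.today()).isoformat()
    if not topic:
        return f"docs/ideation/{d}-open-ideation.md"
    # Assumed slug rule: lowercase, non-alphanumeric runs collapse to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"docs/ideation/{d}-{slug}-ideation.md"
```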

Use this structure, omitting clearly irrelevant fields only when necessary:

```markdown
---
date: YYYY-MM-DD
topic: <kebab-case-topic>
focus: <optional focus hint>
mode: <repo-grounded | elsewhere-software | elsewhere-non-software>
---

# Ideation: <Title>

## Grounding Context
[Grounding summary from Phase 1 — labeled "Codebase Context" in repo mode, "Topic Context" in elsewhere mode]

## Ranked Ideas

### 1. <Idea Title>
**Description:** [Concrete explanation]
**Warrant:** [`direct:` / `external:` / `reasoned:` — the actual basis, quoted or cited]
**Rationale:** [How the warrant connects to the move's significance]
**Downsides:** [Tradeoffs or costs]
**Confidence:** [0-100%]
**Complexity:** [Low / Medium / High]
**Status:** [Unexplored / Explored]

## Rejection Summary

| # | Idea | Reason Rejected |
|---|------|-----------------|
| 1 | <Idea> | <Reason rejected> |
```

If resuming:
- update the existing file in place
- preserve explored markers

### 5.2 Proof Save (default for elsewhere mode; on request for repo mode)

Hand off the ideation content to the `ce-proof` skill in HITL review mode. This uploads the doc, runs an iterative review loop (user annotates in Proof, agent ingests feedback and applies tracked edits), and (in repo mode) syncs the reviewed markdown back to `docs/ideation/`.

Load the `ce-proof` skill in HITL-review mode with:

- **source content:** the survivors and rejection summary from Phase 4 (in repo mode, this is the file written in 5.1; in elsewhere mode, render to a temp file as the source for upload)
- **doc title:** `Ideation: <topic>` or the H1 of the ideation doc
- **identity:** `ai:systematic` / `Systematic`
- **recommended next step:** `/ce-brainstorm` (shown in the proof skill's final terminal output)

The Proof failure ladder in Phase 6.5 governs what happens when this hand-off fails.

**Caller-aware return.** The return-rule bullets below describe the default control flow, but the next step depends on which Phase 6 option invoked the Proof save. Apply the right branch for the caller:

- **§6.2 Open and iterate in Proof.** Behavior is mode-aware:
  - *Repo mode:* return to the Phase 6 menu on every status. The Proof-reviewed content is now synced locally, and the user typically has a follow-up action in the repo (brainstorm toward a plan, save and end, or keep refining).
  - *Elsewhere mode:* on a successful Proof return (`proceeded` or `done_for_now`), exit cleanly — narrate that the artifact lives at `docUrl` (including any stale-local note if applicable) and stop. Proof iteration is often the terminal act in elsewhere mode; forcing another menu choice after the user already got what they came for produces decision fatigue. Only the `aborted` branch returns to the Phase 6 menu so the user can retry or pick another path.
- **§6.3 Brainstorm a selected idea.** On a successful Proof return (`proceeded` or `done_for_now`), do **not** stop at the Phase 6 menu — after applying the per-status handling below (including any stale-local pull offer), continue into §6.3's remaining bullets (mark the chosen idea as `Explored`, then load `ce-brainstorm`). Only the `aborted` branch returns to the Phase 6 menu, since no durable record was written.
- **§6.4 Save and end.** On a successful Proof return (`proceeded` or `done_for_now`), exit cleanly: narrate that the ideation was saved, surface the `docUrl` (and the local-path note if applicable), and stop. Do **not** re-ask the Phase 6 question — the user already chose to end. Only the `aborted` branch returns to the Phase 6 menu so the user can retry or pick a different path.

When the proof skill returns control:

- `status: proceeded` with `localSynced: true` → the ideation doc on disk now reflects the review. Apply the caller-aware return rule above for the invoking branch.
- `status: proceeded` with `localSynced: false` → the reviewed version lives in Proof at `docUrl` but the local copy is stale. Offer to pull the Proof doc to `localPath` using the proof skill's Pull workflow. Apply the caller-aware return rule above; if the pull was declined, include a one-line note that `<localPath>` is stale vs. Proof so the next handoff (or final exit narration) doesn't read the old content silently. Placement: above the Phase 6 menu when the caller-aware rule returns to it, in the handoff preamble to `ce-brainstorm` for §6.3, or alongside the final save/exit narration for §6.2 elsewhere / §6.4.
- `status: done_for_now` → the doc on disk may be stale if the user edited in Proof before leaving. Offer to pull the Proof doc to `localPath` so the local ideation artifact stays in sync, then apply the caller-aware return rule above. `done_for_now` means the user stopped the HITL loop — it does not mean they ended the whole ideation session unless the caller-aware rule exits (§6.2 elsewhere mode or §6.4). If the pull was declined, include the stale-local note at the placement described in the previous bullet.
- `status: aborted` → fall back to the Phase 6 menu without changes, regardless of caller. No durable record was written, so §6.3 must not proceed with the brainstorm handoff and §6.4 must not end — the menu lets the user retry or pick another path.
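The caller-aware return rule above reduces to a small dispatch. This sketch mirrors the bullets; the function and action names are assumptions introduced for illustration, not part of any skill contract.

```python
def next_step(caller, mode, status):
    """Resolve the caller-aware return rule after a Proof save.

    caller: "6.2" (open and iterate), "6.3" (brainstorm), "6.4" (save and end)
    mode:   "repo" or "elsewhere"
    status: "proceeded", "done_for_now", or "aborted"
    """
    if status == "aborted":
        # Every caller: no durable record, return to the menu.
        return "phase6_menu"
    if caller == "6.2":
        # Mode-aware: repo returns to the menu, elsewhere exits cleanly.
        return "phase6_menu" if mode == "repo" else "exit_with_docurl"
    if caller == "6.3":
        # Continue into the brainstorm handoff, not back to the menu.
        return "mark_explored_then_brainstorm"
    if caller == "6.4":
        # The user already chose to end.
        return "exit_with_docurl"
    raise ValueError(f"unknown caller: {caller}")
```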

## Phase 6: Refine or Hand Off

Ask what should happen next using the platform's blocking question tool: `question` in OpenCode (call `ToolSearch` with `select:question` first if its schema isn't loaded), `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension). Fall back to numbered options in chat only when no blocking tool exists in the harness or the call errors (e.g., Codex edit modes) — not because a schema load is required. Never silently skip the question.

**Question:** "What should the agent do next?"

Offer these four options (labels are self-contained with the distinguishing word front-loaded so options stay distinct when truncated):

1. **Refine the ideation in conversation (or stop here — no save)** — add ideas, re-evaluate, or deepen analysis. No file or network side effects; ending the conversation at any point after this pick is a valid no-save exit.
2. **Open and iterate in Proof** — save the ideation to Proof and enter the proof skill's HITL review loop: iterate via comments in the Proof editor; reviewed edits sync back to `docs/ideation/` in repo mode.
3. **Brainstorm a selected idea** — load `ce-brainstorm` with the chosen idea as the seed. The orchestrator first writes a durable record using the mode default in Phase 5.
4. **Save and end** — persist the ideation using the mode default (file in repo mode, Proof in elsewhere mode), then end.

No-save exit is supported without a dedicated menu option. Pick option 1 and stop the conversation, or use the question tool's free-text escape to say so directly — persistence is opt-in and the terminal review loop is already a complete ideation cycle.

Do not delete the run's scratch directory (`<scratch-dir>` resolved in Phase 1) on completion. The V15 web-research cache is session-scoped and reused across run-ids by later ideation invocations in the same session (see `references/web-research-cache.md`); per-run cleanup would defeat that reuse. Checkpoint A (`raw-candidates.md`) and Checkpoint B (`survivors.md`) are cheap to leave behind and follow the repo's Scratch Space cross-invocation-reusable convention — OS handles eventual cleanup.

### 6.1 Refine the Ideation in Conversation

Route refinement by intent:

- `add more ideas` or `explore new angles` -> return to Phase 2
- `re-evaluate` or `raise the bar` -> return to Phase 3
- `dig deeper on idea #N` -> expand only that idea's analysis

No persistence triggers during refinement. The user can choose Save and end (or Brainstorm, or Open and iterate in Proof) when they are ready to persist.

Ending after refinement — or without any refinement at all — is a valid no-save exit. There is no required next step; stopping the conversation here leaves no durable artifact, which matches the opt-in persistence contract.

### 6.2 Open and Iterate in Proof

Invoke the Proof HITL review path via §5.2 with §6.2 as the caller. In repo mode, ensure the local file exists first (run §5.1) so the HITL sync-back has a target; in elsewhere mode, §5.2 renders to a temp file as usual. Honor Phase 5's "ensure a record exists first" contract either way.

Apply §5.2's caller-aware return rule for the §6.2 branch — behavior is mode-aware. In repo mode, return to the Phase 6 menu on every status so the user can pick a follow-up (brainstorm toward a plan, save-and-end, or keep refining) now that the Proof review is reflected in the local file. In elsewhere mode, exit cleanly on a successful Proof return since Proof iteration is often the terminal act — the artifact lives at `docUrl` and is the canonical record; only the `aborted` status returns to the menu.

If the Proof handoff fails, the §6.5 Proof Failure Ladder governs recovery.

### 6.3 Brainstorm a Selected Idea

- Write or update the durable record per the mode default in Phase 5 (file in repo mode, Proof in elsewhere mode). When this routes through §5.2 Proof Save, apply §5.2's caller-aware return rule: continue into the next bullet on a successful Proof return instead of bouncing back to the Phase 6 menu. If Proof returned `aborted` (no durable record written), go back to the Phase 6 menu and do **not** proceed with the brainstorm handoff.
- Mark the chosen idea as `Explored` in the saved record
- Load the `ce-brainstorm` skill with the chosen idea as the seed

**Repo mode only:** do **not** skip brainstorming and go straight to `ce-plan` from ideation output — `ce-plan` wants brainstorm-grounded requirements. In elsewhere modes, ideation (or ideation + Proof iteration) is a legitimate terminal state; brainstorming is optional deeper development of one idea, not a required next rung on an implementation ladder that does not exist in these modes.

### 6.4 Save and End

Persist via the mode default (5.1 in repo mode, 5.2 in elsewhere mode), then end. If the user instead asked to use the non-default destination, honor that explicit request.

When the path lands in a Proof save (5.2), apply §5.2's caller-aware return rule for the §6.4 branch: on a successful Proof return, exit cleanly — narrate the save, surface the `docUrl` (and any stale-local note if the pull was declined), and stop. Do **not** loop back to the Phase 6 menu; the user already chose to end. Only a `status: aborted` from Proof returns to the menu so the user can retry or pick another path (file save, custom path, or keep refining). The §6.5 Proof Failure Ladder still governs persistent Proof failures and ends at the Phase 6 menu — that failure-recovery path is distinct from the successful-save exit described here.

When the path lands in a file save (5.1):

- offer to commit only the ideation doc
- do not create a branch
- do not push
- if the user declines, leave the file uncommitted

After the file save (and optional commit), end the session — do not return to the Phase 6 menu.

### 6.5 Proof Failure Ladder

The `ce-proof` skill retries once internally on transient failures (`STALE_BASE`, `BASE_TOKEN_REQUIRED`) before surfacing failure. The proof skill's return contract does not expose typed error classes to callers — the orchestrator cannot distinguish retryable vs terminal failures from outside.

**Orchestrator-side retry harness (intentionally minimal):** wrap the proof skill invocation in **one** additional best-effort retry with a short pause (~2 seconds). The proof skill already retried internally, so this catches transient races at the orchestrator boundary without compounding latency. Do not classify error types from outside the skill — no detection mechanism exists.

Distinguish create-failure from ops-failure by inspecting whether the proof skill returned a `docUrl` before failing:

- **Create-failure** (no `docUrl` returned): retry the create.
- **Ops-failure** (a `docUrl` was returned, but a later operation failed): retry only the failing operation. **Do not recreate** the document.
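The single-retry harness with the create-vs-ops distinction can be sketched as below. The `create_doc`/`run_ops` split is a hypothetical decomposition for illustration (the real proof skill exposes no such callables or typed errors to the orchestrator); the one-retry, ~2-second pause, and do-not-recreate behaviors come from the text above.

```python
import time

def proof_with_retry(create_doc, run_ops, narrate):
    """One best-effort retry at the orchestrator boundary.

    create_doc() -> dict with "docUrl" on success, or raises.
    run_ops(doc_url) -> result dict, or raises.
    """
    doc_url = None
    for attempt in (1, 2):
        try:
            if doc_url is None:
                doc_url = create_doc()["docUrl"]  # create-failure: retried
            return run_ops(doc_url)               # ops-failure: only this retried
        except Exception:
            if attempt == 2:
                narrate("Proof retry exhausted; showing fallback menu.")
                # Surface any partial docUrl alongside the fallback options.
                return {"status": "failed", "docUrl": doc_url}
            narrate("Retrying Proof... attempt 2/2")
            time.sleep(2)
```

Because `doc_url` survives the first failure, a second attempt after an ops-failure skips `create_doc` entirely and never recreates the document.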

**Failure narration.** Narrate the single retry to the terminal so the pause does not look like a hang ("Retrying Proof... attempt 2/2"). On persistent failure, narrate that the retry was exhausted before showing the fallback menu.

**Fallback menu after persistent failure.** Use the platform's blocking question tool. Present these options (omit the first option if no repo exists at CWD):

- "Save to `docs/ideation/` instead" (repo-mode default destination, available when CWD is inside a git repo)
- "Save to a custom path the user provides" (validate writable; create parent dirs)
- "Skip save and keep the ideation in conversation" (no persistence)

If proof returned a partial `docUrl` before failing, surface that URL alongside the fallback options so the user can recover or share the partial record.

After the fallback completes (any path), continue back to the Phase 6 menu so the user can still refine, iterate in Proof, brainstorm, or save and end.

## Quality Bar

Before finishing, check:

- the idea set is grounded in the stated context (codebase in repo mode; user-supplied context in elsewhere mode)
- **every surviving idea has articulated warrant** (`direct:`, `external:`, or `reasoned:`) that actually supports the claimed move — speculation dressed as ambition was rejected, with reasons
- **every surviving idea passes the meeting-test** unless Phase 0.5 detected tactical focus signals that waived the floor
- **no surviving idea replaces the subject** rather than operating on it
- the candidate list was generated before filtering
- the original many-ideas -> critique -> survivors mechanism was preserved
- if sub-agents were used, they improved diversity without replacing the core workflow
- every rejected idea has a reason
- survivors are materially better than a naive "give me ideas" list
- persistence followed user choice — terminal-only sessions did not write a file or call Proof
- when persistence did trigger, the mode default was respected unless the user explicitly overrode it
- acting on an idea routes to `ce-brainstorm`, not directly to implementation
@@ -0,0 +1,249 @@ package/skills/ce-plan/references/deepening-workflow.md
# Deepening Workflow

This file contains the confidence-check execution path (5.3.3-5.3.7). Load it only when the deepening gate at 5.3.2 determines that deepening is warranted.

## 5.3.3 Score Confidence Gaps

Use a checklist-first, risk-weighted scoring pass.

For each section, compute:
- **Trigger count** — number of checklist problems that apply
- **Risk bonus** — add 1 if the topic is high-risk and this section is materially relevant to that risk
- **Critical-section bonus** — add 1 for `Key Technical Decisions`, `Implementation Units`, `System-Wide Impact`, `Risks & Dependencies`, or `Open Questions` in `Standard` or `Deep` plans

Treat a section as a candidate if:
- it hits **2+ total points**, or
- it hits **1+ point** in a high-risk domain and the section is materially important

Choose only the top **2-5** sections by score. If deepening a lightweight plan (high-risk exception), cap at **1-2** sections.
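The scoring rule above can be sketched as a small function. The section names and thresholds come from the text; the function signature and boolean flags are illustrative assumptions.

```python
CRITICAL_SECTIONS = {
    "Key Technical Decisions", "Implementation Units",
    "System-Wide Impact", "Risks & Dependencies", "Open Questions",
}

def score_section(section, triggers, high_risk, materially_relevant,
                  depth="Standard"):
    """Return (score, is_candidate) for one plan section.

    triggers: number of checklist problems that apply.
    """
    score = triggers
    if high_risk and materially_relevant:
        score += 1  # risk bonus
    if section in CRITICAL_SECTIONS and depth in ("Standard", "Deep"):
        score += 1  # critical-section bonus
    candidate = score >= 2 or (
        score >= 1 and high_risk and materially_relevant
    )
    return score, candidate
```

After scoring every section, sort by score and keep the top 2-5 (or 1-2 for a lightweight-plan exception).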
19
+
20
+ If the plan already has a `deepened:` date:
21
+ - Prefer sections that have not yet been substantially strengthened, if their scores are comparable
22
+ - Revisit an already-deepened section only when it still scores clearly higher than alternatives

**Section Checklists:**

**Requirements**
- Requirements are vague or disconnected from implementation units
- Success criteria are missing or not reflected downstream
- Units do not clearly advance the traced requirements
- Origin requirements are not clearly carried forward
- Origin A/F/AE IDs (when supplied by the upstream brainstorm) are not preserved where planning decisions touch them, or are referenced inconsistently across Requirements, units, and test scenarios

**Context & Research / Sources & References**
- Relevant repo patterns are named but never used in decisions or implementation units
- Cited learnings or references do not materially shape the plan
- High-risk work lacks appropriate external or internal grounding
- Research is generic instead of tied to this repo or this plan

**Key Technical Decisions**
- A decision is stated without rationale
- Rationale does not explain tradeoffs or rejected alternatives
- The decision does not connect back to scope, requirements, or origin context
- An obvious design fork exists but the plan never addresses why one path won

**Open Questions**
- Product blockers are hidden as assumptions
- Planning-owned questions are incorrectly deferred to implementation
- Resolved questions have no clear basis in repo context, research, or origin decisions
- Deferred items are too vague to be useful later

**High-Level Technical Design (when present)**
- The sketch uses the wrong medium for the work
- The sketch contains implementation code rather than pseudo-code
- The non-prescriptive framing is missing or weak
- The sketch does not connect to the key technical decisions or implementation units

**High-Level Technical Design (when absent)** *(Standard or Deep plans only)*
- The work involves DSL design, API surface design, multi-component integration, complex data flow, or state-heavy lifecycle
- Key technical decisions would be easier to validate with a visual or pseudo-code representation
- The approach section of implementation units is thin and a higher-level technical design would provide context

**Implementation Units**
- Dependency order is unclear or likely wrong
- File paths or test file paths are missing where they should be explicit
- Units are too large, too vague, or broken into micro-steps
- Approach notes are thin or do not name the pattern to follow
- Test scenarios are vague (don't name inputs and expected outcomes), skip applicable categories (e.g., no error paths for a unit with failure modes, no integration scenarios for a unit crossing layers), or are disproportionate to the unit's complexity
- Feature-bearing units have blank or missing test scenarios (feature-bearing units require actual test scenarios; the `Test expectation: none` annotation is only valid for non-feature-bearing units)
- Verification outcomes are vague or not expressed as observable results
- Existing U-IDs were renumbered after a unit was reordered, split, or deleted (U-IDs are stable: never renumber existing IDs; gaps from deletions are preserved; new units take the next unused number)
- A unit realizing an origin Key Flow does not cite the F-ID, or a unit enforcing an origin Acceptance Example does not cite the AE-ID, when origin supplies them

**System-Wide Impact**
- Affected interfaces, callbacks, middleware, entry points, or parity surfaces are missing
- Failure propagation is underexplored
- State lifecycle, caching, or data integrity risks are absent where relevant
- Integration coverage is weak for cross-layer work

**Risks & Dependencies / Documentation / Operational Notes**
- Risks are listed without mitigation
- Rollout, monitoring, migration, or support implications are missing when warranted
- External dependency assumptions are weak or unstated
- Security, privacy, performance, or data risks are absent where they obviously apply

Use the plan's own `Context & Research` and `Sources & References` as evidence. If those sections cite a pattern, learning, or risk that never affects decisions, implementation units, or verification, treat that as a confidence gap.

## 5.3.4 Report and Dispatch Targeted Research

Before dispatching agents, report which sections are being strengthened and why:

```text
Strengthening [section names] — [brief reason for each, e.g., "decision rationale is thin", "cross-boundary effects aren't mapped"]
```

For each selected section, choose the smallest useful agent set. Do **not** run every agent. Use at most **1-3 agents per section** and usually no more than **8 agents total**.

Use fully-qualified agent names inside Task calls.

**Deterministic Section-to-Agent Mapping:**

**Requirements / Open Questions classification**
- `ce-spec-flow-analyzer` for missing user flows, edge cases, and handoff gaps
- `ce-repo-research-analyst` (Scope: `architecture, patterns`) for repo-grounded patterns, conventions, and implementation reality checks

**Context & Research / Sources & References gaps**
- `ce-learnings-researcher` for institutional knowledge and past solved problems
- `ce-framework-docs-researcher` for official framework or library behavior
- `ce-best-practices-researcher` for current external patterns and industry guidance
- Add `ce-git-history-analyzer` only when historical rationale or prior art is materially missing

**Key Technical Decisions**
- `ce-architecture-strategist` for design integrity, boundaries, and architectural tradeoffs
- Add `ce-framework-docs-researcher` or `ce-best-practices-researcher` when the decision needs external grounding beyond repo evidence

**High-Level Technical Design**
- `ce-architecture-strategist` for validating that the technical design accurately represents the intended approach and identifying gaps
- `ce-repo-research-analyst` (Scope: `architecture, patterns`) for grounding the technical design in existing repo patterns and conventions
- Add `ce-best-practices-researcher` when the technical design involves a DSL, API surface, or pattern that benefits from external validation

**Implementation Units / Verification**
- `ce-repo-research-analyst` (Scope: `patterns`) for concrete file targets, patterns to follow, and repo-specific sequencing clues
- `ce-pattern-recognition-specialist` for consistency, duplication risks, and alignment with existing patterns
- Add `ce-spec-flow-analyzer` when sequencing depends on user flow or handoff completeness

**System-Wide Impact**
- `ce-architecture-strategist` for cross-boundary effects, interface surfaces, and architectural knock-on impact
- Add the specific specialist that matches the risk:
  - `ce-performance-oracle` for scalability, latency, throughput, and resource-risk analysis
  - `ce-security-sentinel` for auth, validation, exploit surfaces, and security boundary review
  - `ce-data-integrity-guardian` for migrations, persistent state safety, consistency, and data lifecycle risks

**Risks & Dependencies / Operational Notes**
- Use the specialist that matches the actual risk:
  - `ce-security-sentinel` for security, auth, privacy, and exploit risk
  - `ce-data-integrity-guardian` for persistent data safety, constraints, and transaction boundaries
  - `ce-data-migration-expert` for migration realism, backfills, and production data transformation risk
  - `ce-deployment-verification-agent` for rollout checklists, rollback planning, and launch verification
  - `ce-performance-oracle` for capacity, latency, and scaling concerns

**Agent Prompt Shape:**

For each selected section, pass:
- The scope prefix from the mapping above when the agent supports scoped invocation
- A short plan summary
- The exact section text
- Why the section was selected, including which checklist triggers fired
- The plan depth and risk profile
- A specific question to answer

Instruct the agent to return:
- findings that change planning quality
- stronger rationale, sequencing, verification, risk treatment, or references
- no implementation code
- no shell commands
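Put together, a dispatch prompt following this shape might read as follows. Every detail here (the plan, section, triggers, and question) is invented purely for illustration:

```text
Scope: architecture, patterns
Plan summary: add a bulk CSV import pipeline for customer records.
Section: Key Technical Decisions (full text pasted below).
Why selected: checklist triggers "decision stated without rationale" and
"obvious design fork never addressed" fired for the streaming-vs-batch choice.
Depth/risk: Standard plan, high-risk (writes to persistent customer data).
Question: does row-by-row streaming fit this repo's existing import
patterns, or is batch insert the established convention?

Return findings that change planning quality, with stronger rationale or
references. Return no implementation code and no shell commands.
```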

## 5.3.5 Choose Research Execution Mode

Use the lightest mode that will work:

- **Direct mode** - Default. Use when the selected section set is small and the parent can safely read the agent outputs inline.
- **Artifact-backed mode** - Use only when the selected research scope is large enough that inline returns would create unnecessary context pressure.

Signals that justify artifact-backed mode:
- More than 5 agents are likely to return meaningful findings
- The selected section excerpts are long enough that repeating them in multiple agent outputs would be wasteful
- The topic is high-risk and likely to attract bulky source-backed analysis

If artifact-backed mode is not clearly warranted, stay in direct mode.

Artifact-backed mode uses a per-run OS-temp scratch directory. Create it once before dispatching sub-agents and capture its **absolute path** — pass that absolute path to each sub-agent so they write to it directly. Do not use `.context/`; the artifacts are per-run throwaways that are cleaned up when deepening ends (see 5.3.6b), matching the repo Scratch Space convention for one-shot artifacts. Do not pass unresolved shell-variable strings to sub-agents; they need the resolved absolute path.

```bash
SCRATCH_DIR="$(mktemp -d -t ce-plan-deepen-XXXXXX)"
echo "$SCRATCH_DIR"
```

Refer to the echoed absolute path as `<scratch-dir>` throughout the rest of this workflow.

## 5.3.6 Run Targeted Research

Launch the selected agents in parallel using the execution mode chosen above. If the current platform does not support parallel dispatch, run them sequentially instead. Omit the `mode` parameter when dispatching so the user's configured permission settings apply.

Prefer local repo and institutional evidence first. Use external research only when the gap cannot be closed responsibly from repo context or already-cited sources.

If a selected section can be improved by reading the origin document more carefully, do that before dispatching external agents.

**Direct mode:** Have each selected agent return its findings directly to the parent. Keep the return payload focused: strongest findings only, the evidence or sources that matter, the concrete planning improvement implied by the finding.

**Artifact-backed mode:** For each selected agent, pass the absolute `<scratch-dir>` path captured earlier and instruct the agent to write one compact artifact file inside that directory, then return only a short completion summary. Each artifact should contain: target section, why selected, 3-7 findings, source-backed rationale, the specific plan change implied by each finding. No implementation code, no shell commands.
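A compact artifact following that shape might look like this. The section, findings, file paths, and sources shown are all invented for illustration:

```text
# Artifact: Key Technical Decisions
Why selected: decision rationale is thin; no rejected alternatives recorded.

Findings:
1. The repo already batches inserts through an ImportBatcher service;
   streaming row-by-row would diverge from the established pattern.
   Plan change: cite the batching pattern in the decision rationale and
   record streaming as the rejected alternative.
2. The framework's bulk-insert API caps payloads per call.
   Plan change: note the chunking constraint in the affected unit's approach.

Sources: app/services/import_batcher.rb, framework bulk-insert guide
```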

If an artifact is missing or clearly malformed, re-run that agent or fall back to direct-mode reasoning for that section.

If agent outputs conflict:
- Prefer repo-grounded and origin-grounded evidence over generic advice
- Prefer official framework documentation over secondary best-practice summaries when the conflict is about library behavior
- If a real tradeoff remains, record it explicitly in the plan

## 5.3.6b Interactive Finding Review (Interactive Mode Only)

Skip this step in auto mode — proceed directly to 5.3.7.

In interactive mode, present each agent's findings to the user before integration. For each agent that returned findings:

1. **Summarize the agent and its target section** — e.g., "The ce-architecture-strategist reviewed Key Technical Decisions and found:"
2. **Present the findings concisely** — bullet the key points, not the raw agent output. Include enough context for the user to evaluate: what the agent found, what evidence supports it, and what plan change it implies.
3. **Ask the user** using the platform's blocking question tool when available (see Interaction Method):
   - **Accept** — integrate these findings into the plan
   - **Reject** — discard these findings entirely
   - **Discuss** — the user wants to talk through the findings before deciding
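Concretely, a single agent's review might be presented like this (the agent choice and findings are illustrative only):

```text
The ce-security-sentinel reviewed Risks & Dependencies and found:
- The planned upload endpoint has no authentication treatment
  (evidence: no auth middleware is named anywhere in the plan;
  implied change: add an auth check to the relevant unit and list
  the exposure under Risks)

Accept these findings, reject them, or discuss them first?
```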

If the user chooses "Discuss", engage in brief dialogue about the findings and then re-ask with only accept/reject (no discuss option on the second ask). The user makes a deliberate choice either way.

When presenting findings from multiple agents targeting the same section, present them one agent at a time so the user can make independent decisions. Do not merge findings from different agents before showing them.

After all agents have been reviewed, carry only the accepted findings forward to 5.3.7.

If the user accepted no findings, report "No findings accepted — plan unchanged." Then proceed directly to Phase 5.4 (skip document-review and synthesis — the plan was not modified). This interactive-mode-only skip does not apply in auto mode; auto mode always proceeds through 5.3.7 and 5.3.8. No explicit scratch cleanup is needed — `$SCRATCH_DIR` lives in OS temp and the OS will clean it up; leaving it in place preserves the rejected agent artifacts for debugging.

If findings were accepted and the plan was modified, proceed through 5.3.7 and 5.3.8 as normal — document-review acts as a quality gate on the changes.

## 5.3.7 Synthesize and Update the Plan

Strengthen only the selected sections. Keep the plan coherent and preserve its overall structure.

**In interactive mode:** Only integrate findings the user accepted in 5.3.6b. If findings from different agents touch the same section, reconcile them coherently but do not reintroduce rejected findings.

Allowed changes:
- Clarify or strengthen decision rationale
- Tighten requirements trace or origin fidelity
- Reorder or split implementation units when sequencing is weak — but **never renumber existing U-IDs**. Reordering preserves U-IDs in their new order (e.g., U1, U3, U5 reordered is correct; renumbering to U1, U2, U3 is not). Splitting keeps the original U-ID on the original concept and assigns the next unused number to the new unit. Renumbering breaks ce-work blocker and verification references that were written against the original IDs
- Add missing pattern references, file/test paths, or verification outcomes
- Expand system-wide impact, risks, or rollout treatment where justified
- Reclassify open questions between `Resolved During Planning` and `Deferred to Implementation` when evidence supports the change
- Strengthen, replace, or add a High-Level Technical Design section when the work warrants it and the current representation is weak
- Strengthen or add per-unit technical design fields where the unit's approach is non-obvious
- Add or update `deepened: YYYY-MM-DD` in frontmatter when the plan was substantively improved
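As a concrete illustration of the U-ID stability rule (unit names invented): splitting the middle unit of a three-unit plan leaves every existing ID untouched:

```text
Before split:  U1 schema migration -> U2 build importer -> U3 wire UI
After split:   U1 schema migration -> U2 parse upload -> U4 persist rows -> U3 wire UI
```

U2 keeps its ID on the original concept, the new unit takes U4 (the next unused number), and U3 is never renumbered even though the sequence now reads U1, U2, U4, U3.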

Do **not**:
- Add implementation code — no imports, exact method signatures, or framework-specific syntax. Pseudo-code sketches and DSL grammars are allowed
- Add git commands, commit choreography, or exact test command recipes
- Add generic `Research Insights` subsections everywhere
- Rewrite the entire plan from scratch
- Invent new product requirements, scope changes, or success criteria without surfacing them explicitly
- Renumber existing U-IDs as part of reordering, splitting, deletion, or "tidying" the unit list. Deepening is the most likely accidental-renumber vector — preserve U-IDs even when the new order would look cleaner with sequential numbering

If research reveals a product-level ambiguity that should change behavior or scope:
- Do not silently decide it here
- Record it under `Open Questions`
- Recommend `ce-brainstorm` if the gap is truly product-defining