@fro.bot/systematic 2.6.0 → 2.7.0
- package/agents/review/api-contract-reviewer.md +1 -1
- package/agents/review/correctness-reviewer.md +1 -1
- package/agents/review/data-migrations-reviewer.md +1 -1
- package/agents/review/dhh-rails-reviewer.md +1 -1
- package/agents/review/julik-frontend-races-reviewer.md +1 -1
- package/agents/review/kieran-python-reviewer.md +1 -1
- package/agents/review/kieran-rails-reviewer.md +1 -1
- package/agents/review/kieran-typescript-reviewer.md +1 -1
- package/agents/review/maintainability-reviewer.md +1 -1
- package/agents/review/performance-reviewer.md +1 -1
- package/agents/review/reliability-reviewer.md +1 -1
- package/agents/review/security-reviewer.md +1 -1
- package/agents/workflow/bug-reproduction-validator.md +1 -1
- package/dist/cli.js +1 -1
- package/dist/{index-3h7kpmfa.js → index-k9tdxh0p.js} +1 -1
- package/dist/index.d.ts +1 -1
- package/dist/index.js +2 -3
- package/dist/lib/skills.d.ts +1 -0
- package/package.json +1 -1
- package/skills/ce-brainstorm/references/handoff.md +127 -0
- package/skills/ce-brainstorm/references/requirements-capture.md +243 -0
- package/skills/ce-brainstorm/references/universal-brainstorming.md +63 -0
- package/skills/ce-ideate/references/post-ideation-workflow.md +240 -0
- package/skills/ce-plan/references/deepening-workflow.md +249 -0
- package/skills/ce-plan/references/plan-handoff.md +96 -0
- package/skills/ce-plan/references/universal-planning.md +114 -0
- package/skills/ce-plan/references/visual-communication.md +31 -0
- package/skills/ce-work/references/shipping-workflow.md +129 -0
- package/skills/ce-work-beta/references/codex-delegation-workflow.md +327 -0
- package/skills/ce-work-beta/references/shipping-workflow.md +129 -0
- package/skills/compound-docs/SKILL.md +2 -3
- package/skills/document-review/references/synthesis-and-presentation.md +406 -0
- package/skills/proof/references/hitl-review.md +368 -0
- package/skills/writing-systematic-skills/SKILL.md +115 -0
- package/skills/writing-systematic-skills/references/foundation-conventions.md +143 -0

@@ -0,0 +1,96 @@
# Plan Handoff

This file contains post-plan-writing instructions: document review, post-generation options, and issue creation. Load it after the plan file has been written and the confidence check (5.3.1-5.3.7) is complete.

## 5.3.8 Document Review

After the confidence check (and any deepening), run the `ce-doc-review` skill on the plan file. Pass the plan path as the argument. When this step is reached, it is mandatory — do not skip it because the confidence check already ran. The two tools catch different classes of issues.

The confidence check and ce-doc-review are complementary:
- The confidence check strengthens rationale, sequencing, risk treatment, and grounding
- Document-review checks coherence, feasibility, and scope alignment, and surfaces role-specific issues

If ce-doc-review returns findings that were auto-applied, note them briefly when presenting handoff options. If residual P0/P1 findings were surfaced, mention them so the user can decide whether to address them before proceeding.

When ce-doc-review returns "Review complete", proceed to Final Checks.

**Pipeline mode:** If invoked from an automated workflow such as LFG or any `disable-model-invocation` context, run `ce-doc-review` with `mode:headless` and the plan path. Headless mode applies auto-fixes silently and returns structured findings without interactive prompts. Address any P0/P1 findings before returning control to the caller.

## 5.3.9 Final Checks and Cleanup

Before proceeding to post-generation options:
- Confirm the plan is stronger in specific ways, not merely longer
- Confirm the planning boundary is intact
- Confirm origin decisions were preserved when an origin document exists

If artifact-backed mode was used:
- Clean up the temporary scratch directory after the plan is safely updated
- If cleanup is not practical on the current platform, note where the artifacts were left

## 5.4 Post-Generation Options

**Pipeline mode:** If invoked from an automated workflow such as LFG or any `disable-model-invocation` context, skip the interactive menu below and return control to the caller immediately. The plan file has already been written, the confidence check has already run, and ce-doc-review has already run — the caller (e.g., lfg) determines the next step.

After document-review completes, present the options using the platform's blocking question tool: `question` in OpenCode (call `ToolSearch` with `select:question` first if its schema isn't loaded), `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension). Fall back to numbered options in chat only when no blocking tool exists in the harness or the call errors (e.g., Codex edit modes) — not because a schema load is required. Never silently skip the question.

**Path format:** Use absolute paths for chat-output file references — relative paths are not auto-linked as clickable in most terminals.

**Question:** "Plan ready at `<absolute path to plan>`. What would you like to do next?"

**Options:**
1. **Start `/ce-work`** (recommended) - Begin implementing this plan in the current session
2. **Create Issue** - Create a tracked issue from this plan in your configured issue tracker (GitHub or Linear)
3. **Open in Proof (web app) — review and comment to iterate with the agent** - Open the doc in Every's Proof editor, iterate with the agent via comments, or copy a link to share with others
4. **Done for now** - Pause; the plan file is saved and can be resumed later

**Surface additional document review contextually, not as a menu fixture:** When the prior document-review pass surfaced residual P0/P1 findings that the user has not addressed, mention them adjacent to the menu and offer another review pass in prose (e.g., "Document review flagged 2 P1 findings you may want to address — want me to run another pass before you pick?"). Do not add it to the option list.

Based on selection:
- **Start `/ce-work`** -> Call `/ce-work` with the plan path
- **Create Issue** -> Follow the Issue Creation section below
- **Open in Proof (web app) — review and comment to iterate with the agent** -> Load the `ce-proof` skill in HITL-review mode with:
  - source file: `docs/plans/<plan_filename>.md`
  - doc title: `Plan: <plan title from frontmatter>`
  - identity: `ai:systematic` / `Systematic`
  - recommended next step: `/ce-work` (shown in the ce-proof skill's final terminal output)

  Follow `references/hitl-review.md` in the ce-proof skill. It uploads the plan, prompts the user for review in Proof's web UI, ingests each thread by reading it fresh and replying in-thread, applies agreed edits as tracked suggestions, and syncs the final markdown back to the plan file atomically on proceed.

  When the ce-proof skill returns:
  - `status: proceeded` with `localSynced: true` -> the plan on disk now reflects the review. Re-run `ce-doc-review` on the updated plan before re-rendering the menu — HITL can materially rewrite the plan body, so the prior ce-doc-review pass no longer covers the current file and section 5.3.8 requires a review before any handoff option is offered. Then return to the post-generation options with the refreshed residual findings.
  - `status: proceeded` with `localSynced: false` -> the reviewed version lives in Proof at `docUrl` but the local copy is stale. Offer to pull the Proof doc to `localPath` using the ce-proof skill's Pull workflow. If the pull happened, re-run `ce-doc-review` on the pulled file before re-rendering the options (same 5.3.8 rationale — the local plan was materially updated by the pull). If the pull was declined, include a one-line note above the menu that `<localPath>` is stale vs. Proof — otherwise `Start /ce-work` or `Create Issue` will silently use the pre-review copy.
  - `status: done_for_now` -> the plan on disk may be stale if the user edited in Proof before leaving. Offer to pull the Proof doc to `localPath` so the local plan file stays in sync. If the pull happened, re-run `ce-doc-review` on the pulled file before re-rendering the options (same 5.3.8 rationale). If the pull was declined, include the stale-local note above the menu. `done_for_now` means the user stopped the HITL loop — it does not mean they ended the whole plan session; they may still want to start work or create an issue.
  - `status: aborted` -> fall back to the options without changes.

  If the initial upload fails (network error, Proof API down), retry once after a short wait. If it still fails, tell the user the upload didn't succeed and briefly explain why, then return to the options — don't leave them wondering why the option did nothing.
- **Done for now** -> Display a brief confirmation that the plan file is saved and end the turn
- **If the user asks for another document review** (either from the contextual prompt when P0/P1 findings remain, or by free-form request) -> Load the `ce-doc-review` skill with the plan path for another pass, then return to the options
- **Other** -> Accept free text for revisions and loop back to options

## Issue Creation

When the user selects "Create Issue", detect their project tracker:

1. Read `AGENTS.md` at the repo root and look for `project_tracker: github` or `project_tracker: linear`.
2. If `project_tracker: github`:

   ```bash
   gh issue create --title "<type>: <title>" --body-file <plan_path>
   ```

3. If `project_tracker: linear`:

   ```bash
   linear issue create --title "<title>" --description "$(cat <plan_path>)"
   ```

4. If no tracker is configured, ask the user which tracker they use with the platform's blocking question tool: `question` in OpenCode (call `ToolSearch` with `select:question` first if its schema isn't loaded), `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension). Fall back to asking in chat only when no blocking tool exists or the call errors (e.g., Codex edit modes) — not because a schema load is required. Never silently skip. Options: `GitHub`, `Linear`, `Skip`. Then:
   - Proceed with the chosen tracker's command above
   - Offer to persist the choice by adding `project_tracker: <value>` to `AGENTS.md`, where `<value>` is the lowercase tracker key (`github` or `linear`) — not the display label — so future runs match the detector in step 1 and skip this prompt
   - If `Skip`, return to the options without creating an issue

5. If the detected tracker's CLI is not installed or not authenticated, surface a clear error (e.g., "`gh` CLI not found — install it or create the issue manually") and return to the options.
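
The tracker detection in step 1 and the CLI check in step 5 can be sketched as small shell helpers. This is a sketch only; the function names are illustrative, and the skill's actual logic lives in the agent rather than a script.

```shell
# Sketch: detect the configured tracker key from AGENTS.md
# (illustrative helper names; not part of the skill itself).
detect_tracker() {
  grep -oE 'project_tracker: *(github|linear)' "$1" 2>/dev/null \
    | head -n 1 | sed 's/.*: *//'
}

# Sketch: verify the matching CLI is installed (and, for gh, authenticated)
# so step 5 can surface a clear error instead of failing mid-command.
check_tracker_cli() {
  case "$1" in
    github)
      command -v gh >/dev/null 2>&1 \
        || { echo "gh CLI not found -- install it or create the issue manually"; return 1; }
      gh auth status >/dev/null 2>&1 \
        || { echo "gh CLI not authenticated -- run 'gh auth login'"; return 1; }
      ;;
    linear)
      command -v linear >/dev/null 2>&1 \
        || { echo "linear CLI not found"; return 1; }
      ;;
    *)
      echo "no tracker configured"; return 2 ;;
  esac
}
```

Calling `detect_tracker AGENTS.md` and then `check_tracker_cli "$tracker"` mirrors steps 1 and 5 above.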

After issue creation:
- Display the issue URL
- Ask whether to proceed to `/ce-work` using the platform's blocking question tool

@@ -0,0 +1,114 @@
# Universal Planning Workflow

This file is loaded when ce-plan detects a non-software task (Phase 0.1b). It replaces the software-specific phases (0.2 through 5.1) with a domain-agnostic planning workflow.

## Before starting: verify classification

The detection stub in SKILL.md routes here for anything that isn't clearly software. Verify the classification is correct before proceeding:

- **Is this actually a software task?** The key distinction is task type, not topic domain. A study guide about Rust is non-software (producing educational content). A Rust library refactor is software (modifying code). If this is actually software, return to Phase 0.2 in the main SKILL.md.
- **Is this a quick-help request, not a planning task?** Error messages, factual questions, and single-step tasks don't need a plan. Respond directly and exit. Examples: "zsh: command not found: brew", "what's the capital of France."
- **Pipeline mode?** If invoked from LFG or any `disable-model-invocation` context: output "This is a non-software task. The LFG pipeline requires ce-work, which only supports software tasks. Use `/ce-plan` directly for non-software planning." and stop.

Once past these checks, commit to producing a plan. Do not exit because the task looks like a "lookup" or "research question" — the user invoked `ce-plan` because they want a structured output.

---

## Step 1: Assess Ambiguity and Research Need

Evaluate two things before planning:

**Would 1-3 quick questions meaningfully improve this plan?**

- **Default: ask 1-3 questions** via Step 1b when the answers would change the plan's structure or content. Always include a final option like "Skip — just make the plan with reasonable assumptions" so the user can opt out instantly.
- **Skip questions entirely** only when the request already specifies all major variables or the task is simple enough that reasonable assumptions cover it well.

**Research need — does this plan depend on facts that change faster than training data?**

| Research need | Signals | Action |
|--------------|---------|--------|
| **None** | Generic, timeless, or conceptual plan (study curriculum methodology, project management approach, personal goal breakdown) | Skip research. Model knowledge is sufficient. After structuring the plan, offer: "I based this on general knowledge. Want me to search for [specific thing research would improve]?" — e.g., sourced recipes, current product recommendations, expert frameworks. Search only if the user accepts. |
| **Recommended** | Plan references specific locations, venues, dates, prices, schedules, seasonal availability, or current events — anything where stale information would break the plan (closed restaurants, changed prices, cancelled events, wrong seasonal dates). | Research before planning. Decompose into 2-5 focused research questions and dispatch parallel web searches. In OpenCode, use the Agent tool with `model: "haiku"` for each search to reduce cost. Collate findings before structuring the plan. |

When research is recommended, do it — don't just offer. Stale recommendations (closed restaurants, rethemed attractions, outdated prices) are worse than no recommendations. The user invoked `/ce-plan` because they want a good plan, not a disclaimer about training data.

**Research decomposition pattern:**
1. Identify 2-5 independent research questions based on the task. Good questions target facts the model is least confident about: current prices, hours, availability, recent changes, seasonal specifics.
2. Dispatch parallel research. Prefer user-named surfaces first per Core Principle 8 in SKILL.md; fall back to web search for questions those surfaces don't cover.
3. Collate findings into a brief research summary before proceeding to planning.

Example for "plan a date night in Seattle this Saturday":
- "Best restaurants open late Saturday in Capitol Hill Seattle 2026"
- "Events happening in Seattle [specific date]"
- "Seattle waterfront current status and hours"

## Step 1b: Focused Q&A

Ask up to 3 questions targeting the unknowns that would most change the plan. Use the platform's blocking question tool: `question` in OpenCode (call `ToolSearch` with `select:question` first if its schema isn't loaded), `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension). Fall back to numbered options in chat only when no blocking tool exists or the call errors (e.g., Codex edit modes) — not because a schema load is required. Never silently skip the question.

**How to ask well:**
- Offer informed options, not open-ended blanks. Instead of "When are you going?", try "Mid-week visits have 30-40% shorter lines — are you flexible on timing?" The question should give the user a frame of reference, not just extract information.
- Use multi-select when several independent choices can be captured in one question. This is compact and respects the user's time.
- Always include a final option like **"Skip — just make the plan with reasonable assumptions"** so the user can opt out at any point.

Focus on the unknowns specific to this task that would change what the plan recommends or how it's structured. Do not ask more than 3 — after that, proceed with assumptions for anything remaining.

## Step 2: Structure the Plan

Create a structured plan guided by these quality principles. Do NOT use the software plan template (implementation units, test scenarios, file paths, etc.).

### Format: when to prescribe vs. present options

Not every plan should be a single linear path. Match the format to the task:

| Task type | Best format | Why |
|-----------|------------|-----|
| **High personal preference** (food, entertainment, activities, gifts) | Curated options per category — present 2-3 choices and let the user compose | Preferences vary; a single pick may miss. Options respect the user's taste. |
| **Logical sequence** (study plan, project timeline, multi-day trip logistics) | Single prescriptive path with clear ordering | Sequencing matters; options at each step create decision paralysis. |
| **Hybrid** (event with fixed structure but variable details) | Fixed structure with choice points marked | The skeleton is set but specific vendors/venues/activities are options. |

Example: A date night plan should present 2-3 restaurant options, 2-3 activity options, and a suggested flow — not pick one restaurant and build the whole evening around it. A study plan should prescribe a single weekly progression — not present 3 different curricula to choose from.

### Formatting: bullets over prose

- Prefer bullets and tables for actionable content (steps, options, logistics, budgets)
- Use prose only for context, rationale, or explanations that connect the dots
- Plans are for scanning and executing, not reading cover-to-cover

### Quality principles

- **Actionable steps**: Each step is specific enough to execute without further research
- **Sequenced by dependency**: Steps are in the right order, with dependencies noted
- **Time-aware**: When relevant, include timing, durations, deadlines, or phases
- **Resource-identified**: Specify what's needed — tools, materials, people, budget, locations
- **Contingency-aware**: For important decisions, note alternatives or what to do if plans change
- **Appropriately detailed**: Match detail to task complexity. A weekend trip needs less structure than a 3-month curriculum. A dinner plan should be concise, not a 200-line document.
- **Domain-appropriate format**: Choose a structure that fits the domain:
  - Itinerary for travel (day-by-day, with times and locations)
  - Syllabus or curriculum for study plans (topics, resources, milestones)
  - Runbook for events (timeline, responsibilities, logistics)
  - Project plan for business or operational tasks (phases, owners, deliverables)
  - Research plan for investigations (questions, methods, sources)
  - Options menu for preference-driven tasks (curated picks per category)

## Step 3: Save or Share

After structuring the plan, ask the user how they want to receive it using the platform's blocking question tool: `question` in OpenCode (call `ToolSearch` with `select:question` first if its schema isn't loaded), `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension). Fall back to numbered options in chat only when no blocking tool exists or the call errors (e.g., Codex edit modes) — not because a schema load is required. Never silently skip the question.

**Question:** "Plan ready. How would you like to receive it?"

**Options:**

1. **Save to disk** — Write the plan as a markdown file. Ask where:
   - `docs/plans/` (only show if this directory exists)
   - Current working directory
   - `/tmp`
   - A custom path
   - Use filename convention: `YYYY-MM-DD-<descriptive-name>-plan.md`
   - Start the document with a `# Title` heading, followed by `Created: YYYY-MM-DD` on the next line. No YAML frontmatter.
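
The filename convention and heading rule compose mechanically; a minimal shell sketch, where the slug and title are invented example values:

```shell
# Sketch: build the plan filename from today's date and a descriptive slug
# (the slug and title here are invented examples).
slug="weekend-trip"
today=$(date +%F)                    # YYYY-MM-DD
filename="${today}-${slug}-plan.md"

# Title heading first, then the Created line; no YAML frontmatter.
printf '# %s\nCreated: %s\n' "Weekend Trip Plan" "$today" > "/tmp/$filename"
```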

2. **Open in Proof (web app) — review and comment to iterate with the agent** — Open the doc in Every's Proof editor, iterate with the agent via comments, or copy a link to share with others. Load the `ce-proof` skill to create and open the document.

3. **Save to disk AND open in Proof** — Do both: write the markdown file to disk and open the doc in Proof for review.

Do not offer `/ce-work` (software-only) or issue creation (not applicable to non-software plans).

@@ -0,0 +1,31 @@
# Visual Communication in Plan Documents

Section 3.4 covers diagrams about the *solution being planned* (pseudo-code, mermaid sequences, state diagrams). The existing Section 4.3 mermaid rule encourages those solution-design diagrams within Technical Design and per-unit fields. This guidance covers a different concern: visual aids that help readers *navigate and comprehend the plan document itself* -- dependency graphs, interaction diagrams, and comparison tables that make plan structure scannable.

Visual aids are conditional on content patterns, not on plan depth classification -- a Lightweight plan about a complex multi-unit workflow may warrant a dependency graph; a Deep plan about a straightforward feature may not.

**When to include:**

| Plan describes... | Visual aid | Placement |
|---|---|---|
| 4+ implementation units with non-linear dependencies (parallelism, diamonds, fan-in/fan-out) | Mermaid dependency graph | Before or after the Implementation Units heading |
| System-Wide Impact naming 3+ interacting surfaces or cross-layer effects | Mermaid interaction or component diagram | Within the System-Wide Impact section |
| Summary or Problem Frame involving 3+ behavioral modes, states, or variants | Markdown comparison table | Within Summary or Problem Frame (legacy plans may still use `Overview`) |
| Key Technical Decisions with 3+ interacting decisions, or Alternative Approaches with 3+ alternatives | Markdown comparison table | Within the relevant section |

**When to skip:**
- The plan has 3 or fewer units in a straight dependency chain -- the Dependencies field on each unit is sufficient
- Prose already communicates the relationships clearly
- The visual would duplicate what the High-Level Technical Design section already shows
- The visual describes code-level detail (specific method names, SQL columns, API field lists)

**Format selection:**
- **Mermaid** (default) for dependency graphs and interaction diagrams -- 5-15 nodes, no in-box annotations, standard flowchart shapes. Use `TB` (top-to-bottom) direction so diagrams stay narrow in both rendered and source form. The source should be readable as a fallback in diff views and terminals.
- **ASCII/box-drawing diagrams** for annotated flows that need rich in-box content -- file path layouts, decision logic branches, multi-column spatial arrangements. More expressive than mermaid when the diagram's value comes from annotations within nodes. Follow the 80-column max for code blocks; use vertical stacking.
- **Markdown tables** for mode/variant comparisons and decision/approach comparisons.
- Keep diagrams proportionate to the plan. A 6-unit linear chain gets a simple 6-node graph. A complex dependency graph with fan-out and fan-in may need 10-15 nodes -- that is fine if every node earns its place.
- Place visuals inline at the point of relevance, not in a separate section.
- Plan-structure level only -- unit dependencies, component interactions, mode comparisons, impact surfaces. Not implementation architecture, data schemas, or code structure (those belong in Section 3.4).
- Prose is authoritative: when a visual aid and its surrounding prose disagree, the prose governs.
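
As an illustration of the default mermaid form, a hypothetical five-unit plan with a diamond dependency could be drawn like this (the unit names are invented for the sketch):

```mermaid
flowchart TB
  U1[Unit 1: data model] --> U2[Unit 2: backfill]
  U1 --> U3[Unit 3: service layer]
  U2 --> U4[Unit 4: UI wiring]
  U3 --> U4
  U4 --> U5[Unit 5: cleanup]
```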

After generating a visual aid, verify it accurately represents the plan sections it illustrates -- correct dependency edges, no missing surfaces, no merged units.

@@ -0,0 +1,129 @@
# Shipping Workflow

This file contains the shipping workflow (Phases 3-4). Load it only when all Phase 2 tasks are complete and execution transitions to quality check.

## Phase 3: Quality Check

1. **Run Core Quality Checks**

   Always run before submitting:

   ```bash
   # Run the full test suite (use the project's test command)
   # Examples: bin/rails test, npm test, pytest, go test, etc.

   # Run linting (per AGENTS.md)
   # Use linting-agent before pushing to origin
   ```
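
Where no test command is documented, detection from common project markers might look like the sketch below. The marker-to-command mapping is an assumption for illustration, not the skill's actual logic; the command named in AGENTS.md always wins.

```shell
# Sketch: infer a plausible test command from common project markers.
# Always prefer the command documented in AGENTS.md when one exists.
detect_test_command() {
  if   [ -f package.json ]; then echo "npm test"
  elif [ -f Gemfile ];      then echo "bin/rails test"
  elif [ -f pyproject.toml ] || [ -f pytest.ini ]; then echo "pytest"
  elif [ -f go.mod ];       then echo "go test ./..."
  else echo ""
  fi
}
```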

2. **Code Review** (REQUIRED)

   Every change gets reviewed before shipping. The depth scales with the change's risk profile, but review itself is never skipped.

   **Tier 2: Full review (default)** -- REQUIRED unless the Tier 1 criteria are explicitly met. Invoke the `ce-code-review` skill with `mode:autofix` to run specialized reviewer agents, auto-apply safe fixes, and record residual downstream work in the per-run artifact. When the plan file path is known, pass it as `plan:<path>`. This is the mandatory default -- proceed to Tier 1 only after confirming every criterion below.

   **Tier 1: Inline self-review** -- A lighter alternative permitted only when **all four** criteria are true. Before choosing Tier 1, explicitly state which criteria apply and why. If any criterion is uncertain, use Tier 2.
   - Purely additive (new files only, no existing behavior modified)
   - Single concern (one skill, one component -- not cross-cutting)
   - Pattern-following (implementation mirrors an existing example with no novel logic)
   - Plan-faithful (no scope growth, no deferred questions resolved with surprising answers)

3. **Residual Work Gate** (REQUIRED when Tier 2 ran)

   After Tier 2 code review completes, inspect the Residual Actionable Work summary it returned (or read the run artifact directly if the summary was not emitted). If one or more residual `downstream-resolver` findings remain, do not proceed to Final Validation until the user decides how to handle them.

   Ask the user using the platform's blocking question tool (`question` in OpenCode with `ToolSearch select:question` pre-loaded if needed, `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi, which requires the `pi-ask-user` extension). Fall back to numbered options in chat only when the harness genuinely lacks a blocking tool. Never silently skip the gate.

   Stem: `Code review found N residual finding(s) the skill did not auto-fix. How should the agent proceed?`

   Options (four or fewer, self-contained labels):
   - `Apply/fix now` — loop back into review with focused fixes; the agent investigates each finding, applies changes where safe, and re-runs review.
   - `File tickets via project tracker` — load `references/tracker-defer.md` in Interactive mode; the agent files tickets in the project's detected tracker (or the `gh` fallback, or leaves them in the report if no sink exists) and proceeds to Final Validation.
   - `Accept and proceed` — record the residual findings verbatim in a durable "Known Residuals" sink before shipping. If a PR will be created or updated in Phase 4, include them in the PR description's "Known Residuals" section (the agent owns this when calling `ce-commit-push-pr`). If the user later chooses the no-PR `ce-commit` path, create `docs/residual-review-findings/<branch-or-head-sha>.md`, include the accepted findings and source review-run context, stage it with the implementation commit, and mention the file path in the final summary. The user has acknowledged the risk, but the findings must not live only in the transient session.
   - `Stop — do not ship` — abort the shipping workflow. The user will handle findings manually before re-invoking.

   Skip this gate entirely when the review reported `Residual actionable work: none.` or when only Tier 1 (inline self-review) was used. On an `Accept and proceed` decision, do not proceed past this gate until the agent has recorded whether the durable sink is `PR Known Residuals` or `docs/residual-review-findings/<branch-or-head-sha>.md`.
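
For the no-PR `ce-commit` path, the residuals filename can be derived from git state. A minimal sketch, where the heredoc body is placeholder content rather than the real findings format:

```shell
# Sketch: name the residuals file after the branch, falling back to the
# short HEAD sha on a detached head; slashes become dashes for a flat path.
ref=$(git symbolic-ref --quiet --short HEAD 2>/dev/null \
      || git rev-parse --short HEAD 2>/dev/null \
      || echo "unknown-head")
safe_ref=$(printf '%s' "$ref" | tr '/' '-')
mkdir -p docs/residual-review-findings
file="docs/residual-review-findings/${safe_ref}.md"

# Placeholder body -- record the accepted findings verbatim here.
cat > "$file" <<'EOF'
# Known Residuals
Accepted at the Residual Work Gate; see the source review run for context.
EOF
git add "$file" 2>/dev/null || true
```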
|
|
46
|
+
|
|
47
|
+
4. **Final Validation**
|
|
48
|
+
- All tasks marked completed
|
|
49
|
+
- Testing addressed -- tests pass and new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
|
|
50
|
+
- Linting passes
|
|
51
|
+
- Code follows existing patterns
|
|
52
|
+
- Figma designs match (if applicable)
|
|
53
|
+
- No console errors or warnings
|
|
54
|
+
- If the plan has a `Requirements` section (or legacy `Requirements Trace`), verify each requirement is satisfied by the completed work
|
|
55
|
+
- If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution

5. **Prepare Operational Validation Plan** (REQUIRED)
- Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
- Include concrete:
  - Log queries/search terms
  - Metrics or dashboards to watch
  - Expected healthy signals
  - Failure signals and rollback/mitigation triggers
  - Validation window and owner
- If there is truly no production/runtime impact, still include the section with: `No additional operational monitoring required` and a one-line reason.
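
A minimal sketch of what that section might look like (the service name, queries, and thresholds are invented for illustration):

```markdown
## Post-Deploy Monitoring & Validation
- Log query: `service:payments level:error` in the log aggregator
- Dashboard: payments latency overview (p95 per endpoint)
- Healthy: error rate flat against the pre-deploy baseline, p95 under 300 ms
- Failure signal: sustained 5xx spike for 10 minutes triggers rollback to the previous release
- Window/owner: 24 hours post-deploy, owned by the shipping engineer
```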

## Phase 4: Ship It

1. **Prepare Evidence Context**

Do not invoke `ce-demo-reel` directly in this step. Evidence capture belongs to the PR creation or PR description update flow, where the final PR diff and description context are available.

Note whether the completed work has observable behavior (UI rendering, CLI output, API/library behavior with a runnable example, generated artifacts, or workflow output). The `ce-commit-push-pr` skill will ask whether to capture evidence only when evidence is possible.

2. **Update Plan Status**

If the input document has YAML frontmatter with a `status` field, update it to `completed`:
```
status: active -> status: completed
```
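
One way to flip that field from the command line; a sketch only, assuming GNU `sed` and an invented plan path:

```shell
# Create a stand-in plan file with YAML frontmatter (path is illustrative).
plan="/tmp/example-plan.md"
printf -- '---\nstatus: active\n---\n# Example Plan\n' > "$plan"

# Flip the frontmatter status field in place (GNU sed; BSD sed needs `-i ''`).
sed -i 's/^status: active$/status: completed/' "$plan"

grep '^status:' "$plan"
```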

3. **Commit and Create Pull Request**

Load the `ce-commit-push-pr` skill to handle committing, pushing, and PR creation. The skill handles convention detection, branch safety, logical commit splitting, adaptive PR descriptions, and attribution badges.

When providing context for the PR description, include:
- The plan's summary and key decisions
- Testing notes (tests added/modified, manual testing performed)
- Evidence context from step 1, so `ce-commit-push-pr` can decide whether to ask about capturing evidence
- Figma design link (if applicable)
- The Post-Deploy Monitoring & Validation section (see Phase 3 Step 5)
- Any "Known Residuals" accepted in the Phase 3 Residual Work Gate, rendered as a dedicated section in the PR body with severity, file:line, and title per finding
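
A sketch of how such a Known Residuals section might render in the PR body (the findings are invented):

```markdown
## Known Residuals
- **minor** `lib/retry.ts:57` -- backoff jitter is hardcoded rather than configurable
- **minor** `src/ui/toast.tsx:23` -- toast dismissal timer not cleared on unmount
```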

If the user prefers to commit without creating a PR, load the `ce-commit` skill instead.

4. **Notify User**
- Summarize what was completed
- Link to PR (if one was created)
- Note any follow-up work needed
- Suggest next steps if applicable

## Quality Checklist

Before creating the PR, verify:

- [ ] All clarifying questions asked and answered
- [ ] All tasks marked completed
- [ ] Testing addressed -- tests pass AND new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
- [ ] Linting passes (use linting-agent)
- [ ] Code follows existing patterns
- [ ] Figma designs match implementation (if applicable)
- [ ] Evidence decision handled by `ce-commit-push-pr` when the change has observable behavior
- [ ] Commit messages follow conventional format
- [ ] PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
- [ ] Code review completed (inline self-review or full `ce-code-review`)
- [ ] PR description includes summary, testing notes, and evidence when captured
- [ ] PR description includes Compound Engineered badge with accurate model and harness

## Code Review Tiers

Every change gets reviewed. The tier determines depth, not whether review happens.

**Tier 2 (full review)** -- REQUIRED default. Invoke `ce-code-review mode:autofix` with `plan:<path>` when available. Safe fixes are applied automatically; residual work is recorded in the run artifact for downstream routing. Always use this tier unless all four Tier 1 criteria are explicitly confirmed.

**Tier 1 (inline self-review)** -- permitted only when all four criteria are true (state each explicitly before choosing):
- Purely additive (new files only, no existing behavior modified)
- Single concern (one skill, one component -- not cross-cutting)
- Pattern-following (mirrors an existing example, no novel logic)
- Plan-faithful (no scope growth, no surprising deferred-question resolutions)
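
The four criteria form an AND gate: any unconfirmed criterion forces Tier 2. An illustrative sketch (the function name and yes/no convention are invented):

```shell
# Pick the review tier from the four Tier 1 criteria, each passed as yes/no:
# purely-additive, single-concern, pattern-following, plan-faithful.
choose_review_tier() {
  if [ "$1" = yes ] && [ "$2" = yes ] && [ "$3" = yes ] && [ "$4" = yes ]; then
    echo "Tier 1 (inline self-review)"
  else
    echo "Tier 2 (full review)"
  fi
}

# Purely additive, single concern, pattern-following, but not plan-faithful:
choose_review_tier yes yes yes no
```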
|