npm - @fro.bot/systematic - Versions diffs - 2.3.3 → 2.4.1 - Mend

@fro.bot/systematic 2.3.3 → 2.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (72)

package/README.md +12 -13
package/agents/design/design-implementation-reviewer.md +2 -19
package/agents/design/design-iterator.md +2 -31
package/agents/design/figma-design-sync.md +2 -22
package/agents/docs/ankane-readme-writer.md +2 -19
package/agents/document-review/adversarial-document-reviewer.md +3 -2
package/agents/document-review/coherence-reviewer.md +5 -7
package/agents/document-review/design-lens-reviewer.md +3 -4
package/agents/document-review/feasibility-reviewer.md +3 -4
package/agents/document-review/product-lens-reviewer.md +25 -6
package/agents/document-review/scope-guardian-reviewer.md +3 -4
package/agents/document-review/security-lens-reviewer.md +3 -4
package/agents/research/best-practices-researcher.md +4 -21
package/agents/research/framework-docs-researcher.md +2 -19
package/agents/research/git-history-analyzer.md +2 -19
package/agents/research/issue-intelligence-analyst.md +2 -24
package/agents/research/learnings-researcher.md +7 -28
package/agents/research/repo-research-analyst.md +3 -32
package/agents/research/slack-researcher.md +128 -0
package/agents/review/agent-native-reviewer.md +109 -195
package/agents/review/architecture-strategist.md +3 -19
package/agents/review/cli-agent-readiness-reviewer.md +1 -27
package/agents/review/code-simplicity-reviewer.md +5 -19
package/agents/review/data-integrity-guardian.md +3 -19
package/agents/review/data-migration-expert.md +3 -19
package/agents/review/deployment-verification-agent.md +3 -19
package/agents/review/pattern-recognition-specialist.md +4 -20
package/agents/review/performance-oracle.md +3 -31
package/agents/review/project-standards-reviewer.md +5 -5
package/agents/review/schema-drift-detector.md +3 -19
package/agents/review/security-sentinel.md +3 -25
package/agents/review/testing-reviewer.md +3 -3
package/agents/workflow/lint.md +1 -2
package/agents/workflow/pr-comment-resolver.md +54 -22
package/agents/workflow/spec-flow-analyzer.md +2 -25
package/package.json +1 -1
package/skills/agent-native-architecture/SKILL.md +28 -27
package/skills/agent-native-architecture/references/agent-execution-patterns.md +3 -3
package/skills/agent-native-architecture/references/agent-native-testing.md +1 -1
package/skills/agent-native-architecture/references/mobile-patterns.md +1 -1
package/skills/andrew-kane-gem-writer/SKILL.md +5 -5
package/skills/ce-brainstorm/SKILL.md +43 -181
package/skills/ce-compound/SKILL.md +143 -89
package/skills/ce-compound-refresh/SKILL.md +48 -5
package/skills/ce-ideate/SKILL.md +27 -242
package/skills/ce-plan/SKILL.md +165 -81
package/skills/ce-review/SKILL.md +348 -125
package/skills/ce-review/references/findings-schema.json +5 -0
package/skills/ce-review/references/persona-catalog.md +2 -2
package/skills/ce-review/references/resolve-base.sh +5 -2
package/skills/ce-review/references/subagent-template.md +25 -3
package/skills/ce-work/SKILL.md +95 -242
package/skills/ce-work-beta/SKILL.md +154 -301
package/skills/dhh-rails-style/SKILL.md +13 -12
package/skills/document-review/SKILL.md +56 -109
package/skills/document-review/references/findings-schema.json +0 -23
package/skills/document-review/references/subagent-template.md +13 -18
package/skills/dspy-ruby/SKILL.md +8 -8
package/skills/every-style-editor/SKILL.md +3 -2
package/skills/frontend-design/SKILL.md +2 -3
package/skills/git-commit/SKILL.md +1 -1
package/skills/git-commit-push-pr/SKILL.md +81 -265
package/skills/git-worktree/SKILL.md +20 -21
package/skills/lfg/SKILL.md +10 -17
package/skills/onboarding/SKILL.md +2 -2
package/skills/onboarding/scripts/inventory.mjs +31 -7
package/skills/proof/SKILL.md +134 -28
package/skills/resolve-pr-feedback/SKILL.md +7 -2
package/skills/setup/SKILL.md +1 -1
package/skills/test-browser/SKILL.md +10 -11
package/skills/test-xcode/SKILL.md +6 -3
package/dist/lib/manifest.d.ts +0 -39

package/skills/ce-review/references/findings-schema.json CHANGED

@@ -124,6 +124,11 @@
       "downstream-resolver": "Turn this into residual work for later resolution.",
       "human": "A person must make a judgment call before code changes should continue.",
       "release": "Operational or rollout follow-up; do not convert into code-fix work automatically."
+    },
+    "return_tiers": {
+      "description": "Finding fields are split into two tiers. The full schema (with all required fields) applies to the artifact file on disk. The compact return to the orchestrator omits detail-tier fields. Both are valid uses of this schema in different contexts.",
+      "merge_tier": "Returned to orchestrator: title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing, suggested_fix (optional). Plus top-level reviewer, residual_risks, testing_gaps.",
+      "detail_tier": "Required in artifact file, omitted from compact return: why_it_matters, evidence. The artifact file must pass full schema validation including all required fields. Headless output depends on why_it_matters and evidence being present in the artifact."
     }
   }
 }
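
The two tiers described in `return_tiers` can be illustrated with a short Python sketch. This is not code from the package: the function names and the `findings` key are assumptions made for illustration, while the field lists are copied from the `merge_tier` and `detail_tier` descriptions above.

```python
# Field names copied from the "merge_tier" and "detail_tier" descriptions.
MERGE_TIER_FIELDS = (
    "title", "severity", "file", "line", "confidence", "autofix_class",
    "owner", "requires_verification", "pre_existing", "suggested_fix",
)
DETAIL_TIER_FIELDS = ("why_it_matters", "evidence")


def compact_finding(full_finding: dict) -> dict:
    """Keep only merge-tier fields; suggested_fix is optional, so skip it if absent."""
    return {k: full_finding[k] for k in MERGE_TIER_FIELDS if k in full_finding}


def compact_return(full_output: dict) -> dict:
    """Build the compact return from the full artifact content.

    ASSUMPTION: the per-finding list lives under a "findings" key; the
    schema excerpt above does not show that key name.
    """
    return {
        "reviewer": full_output["reviewer"],
        "findings": [compact_finding(f) for f in full_output.get("findings", [])],
        "residual_risks": full_output.get("residual_risks", []),
        "testing_gaps": full_output.get("testing_gaps", []),
    }
```

The full dictionary would still be what gets validated and written to disk; only the result of `compact_return` would go back to the orchestrator.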

package/skills/ce-review/references/persona-catalog.md CHANGED

@@ -13,7 +13,7 @@ Spawned on every review regardless of diff content.
 | `correctness` | `systematic:review:correctness-reviewer` | Logic errors, edge cases, state bugs, error propagation, intent compliance |
 | `testing` | `systematic:review:testing-reviewer` | Coverage gaps, weak assertions, brittle tests, missing edge case tests |
 | `maintainability` | `systematic:review:maintainability-reviewer` | Coupling, complexity, naming, dead code, premature abstraction |
-| `project-standards` | `systematic:review:project-standards-reviewer` | AGENTS.md and AGENTS.md compliance -- frontmatter, references, naming, cross-platform portability, tool selection |
+| `project-standards` | `systematic:review:project-standards-reviewer` | AGENTS.md compliance -- frontmatter, references, naming, cross-platform portability, tool selection |
 **CE agents (unstructured output, synthesized separately):**
@@ -33,7 +33,7 @@ Spawned when the orchestrator identifies relevant patterns in the diff. The orch
 | `api-contract` | `systematic:review:api-contract-reviewer` | Route definitions, serializer/interface changes, event schemas, exported type signatures, API versioning |
 | `data-migrations` | `systematic:review:data-migrations-reviewer` | Migration files, schema changes, backfill scripts, data transformations |
 | `reliability` | `systematic:review:reliability-reviewer` | Error handling, retry logic, circuit breakers, timeouts, background jobs, async handlers, health checks |
-| `adversarial` | `systematic:review:adversarial-reviewer` | Diff has >=50 changed non-test, non-generated, non-lockfile lines, OR touches auth, payments, data mutations, external API integrations, or other high-risk domains |
+| `adversarial` | `systematic:review:adversarial-reviewer` | Diff has >=50 changed lines of executable code (not prose/instruction Markdown, JSON schemas, or config), OR touches auth, payments, data mutations, external API integrations, or other high-risk domains regardless of file type |
 | `cli-readiness` | `systematic:review:cli-readiness-reviewer` | CLI command definitions, argument parsing, CLI framework usage, command handler implementations |
 | `previous-comments` | `systematic:review:previous-comments-reviewer` | **PR-only.** Reviewing a PR that has existing review comments or review threads from prior review rounds. Skip entirely when no PR metadata was gathered in Stage 1. |
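
The revised `adversarial` trigger might be approximated as follows. This is a rough sketch, not the orchestrator's actual logic: the catalog leaves the judgment to the orchestrator, so the suffix list used to exclude prose, schemas, and config is an assumption, as are the function names. The high-risk-domain decision is taken as an input because the catalog does not define how it is detected.

```python
from pathlib import PurePosixPath

# ASSUMPTION: illustrative suffixes for "prose/instruction Markdown, JSON
# schemas, or config". The catalog does not enumerate file types.
NON_EXECUTABLE_SUFFIXES = {".md", ".json", ".yaml", ".yml", ".toml"}
ADVERSARIAL_LINE_THRESHOLD = 50  # ">=50 changed lines of executable code"


def executable_changed_lines(changed_lines_by_file: dict) -> int:
    """Sum changed lines, ignoring files treated as non-executable."""
    return sum(
        count
        for path, count in changed_lines_by_file.items()
        if PurePosixPath(path).suffix.lower() not in NON_EXECUTABLE_SUFFIXES
    )


def should_spawn_adversarial(changed_lines_by_file: dict,
                             touches_high_risk_domain: bool) -> bool:
    """High-risk domains trigger the reviewer regardless of file type."""
    if touches_high_risk_domain:
        return True
    return executable_changed_lines(changed_lines_by_file) >= ADVERSARIAL_LINE_THRESHOLD
```

Under these assumptions a diff that only rewrites a long `SKILL.md` would no longer trigger the reviewer, which matches the intent of the wording change.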

package/skills/ce-review/references/resolve-base.sh CHANGED

@@ -52,7 +52,9 @@ if [ -n "$REVIEW_BASE_BRANCH" ]; then
   if [ -n "$PR_BASE_REPO" ]; then
     PR_BASE_REMOTE=$(git remote -v | awk "index(\$2, \"github.com:$PR_BASE_REPO\") || index(\$2, \"github.com/$PR_BASE_REPO\") {print \$1; exit}")
     if [ -n "$PR_BASE_REMOTE" ]; then
-      git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH:refs/remotes/$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH" 2>/dev/null || true
+      # Always fetch — a locally cached ref may be stale, producing a
+      # merge-base that predates squash-merged work and inflating the diff.
+      git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH:refs/remotes/$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH" 2>/dev/null || true
       BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
     fi
   fi
@@ -60,7 +62,8 @@ if [ -n "$REVIEW_BASE_BRANCH" ]; then
     # Only try origin if it exists as a remote; otherwise skip to avoid
     # confusing errors in fork setups where origin points at the user's fork.
     if git remote get-url origin >/dev/null 2>&1; then
-      git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags origin "$REVIEW_BASE_BRANCH:refs/remotes/origin/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags origin "$REVIEW_BASE_BRANCH" 2>/dev/null || true
+      # Always fetch — same rationale as the fork-safe path above.
+      git fetch --no-tags origin "$REVIEW_BASE_BRANCH:refs/remotes/origin/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags origin "$REVIEW_BASE_BRANCH" 2>/dev/null || true
       BASE_REF=$(git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
     fi
     # Fall back to a bare local ref only if remote resolution failed

package/skills/ce-review/references/subagent-template.md CHANGED

@@ -18,7 +18,23 @@ You are a specialist code reviewer.
 </scope-rules>
 <output-contract>
-Return ONLY valid JSON matching the findings schema below. No prose, no markdown, no explanation outside the JSON object.
+You produce up to two outputs depending on whether a run ID was provided:
+1. **Artifact file (when run ID is present).** If a Run ID appears in <review-context> below, WRITE your full analysis (all schema fields, including why_it_matters, evidence, and suggested_fix) as JSON to:
+   .context/systematic/ce-review/{run_id}/{reviewer_name}.json
+   This is the ONE write operation you are permitted to make. Use the platform's file-write tool.
+   If the write fails, continue -- the compact return still provides everything the merge needs.
+   If no Run ID is provided (the field is empty or absent), skip this step entirely -- do not attempt any file write.
+2. **Compact return (always).** RETURN compact JSON to the parent with ONLY merge-tier fields per finding:
+   title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing, suggested_fix.
+   Do NOT include why_it_matters or evidence in the returned JSON.
+   Include reviewer, residual_risks, and testing_gaps at the top level.
+The full file preserves detail for downstream consumers (headless output, debugging).
+The compact return keeps the orchestrator's context lean for merge and synthesis.
+The schema below describes the **full artifact file format** (all fields required). For the compact return, follow the field list above -- omit why_it_matters and evidence even though the schema marks them as required.
 {schema}
@@ -41,9 +57,10 @@ False-positive categories to actively suppress:
 - Generic "consider adding" advice without a concrete failure mode
 Rules:
-- Every finding MUST include at least one evidence item grounded in the actual code.
+- You are a leaf reviewer inside an already-running systematic review workflow. Do not invoke systematic skills or agents unless this template explicitly instructs you to. Perform your analysis directly and return findings in the required output format only.
+- Every finding in the full artifact file MUST include at least one evidence item grounded in the actual code. The compact return omits evidence -- the evidence requirement applies to the disk artifact only.
 - Set pre_existing to true ONLY for issues in unchanged code that are unrelated to this diff. If the diff makes the issue newly relevant, it is NOT pre-existing.
-- You are operationally read-only. You may use non-mutating inspection commands, including read-oriented `git` / `gh` commands, to gather evidence. Do not edit files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
+- You are operationally read-only. The one permitted exception is writing your full analysis to the `.context/` artifact path when a run ID is provided. You may also use non-mutating inspection commands, including read-oriented `git` / `gh` commands, to gather evidence. Do not edit project files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
 - Set `autofix_class` accurately -- not every finding is `advisory`. Use this decision guide:
   - `safe_auto`: The fix is local and deterministic — the fixer can apply it mechanically without design judgment. Examples: extracting a duplicated helper, adding a missing nil/null check, fixing an off-by-one, adding a missing test for an untested code path, removing dead code.
   - `gated_auto`: A concrete fix exists but it changes contracts, permissions, or crosses a module boundary in a way that deserves explicit approval. Examples: adding authentication to an unprotected endpoint, changing a public API response shape, switching from soft-delete to hard-delete.
@@ -62,6 +79,9 @@ Rules:
 </pr-context>
 <review-context>
+Run ID: {run_id}
+Reviewer name: {reviewer_name}
 Intent: {intent_summary}
 Changed files: {file_list}
@@ -82,3 +102,5 @@ Diff:
 | `{pr_metadata}` | Stage 1 output | PR title, body, and URL when reviewing a PR. Empty string when reviewing a branch or standalone checkout |
 | `{file_list}` | Stage 1 output | List of changed files from the scope step |
 | `{diff}` | Stage 1 output | The actual diff content to review |
+| `{run_id}` | Stage 4 output | Unique review run identifier for the artifact directory |
+| `{reviewer_name}` | Stage 3 output | Persona or agent name used as the artifact filename stem |
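
The artifact-write rule in the new output contract can be sketched in Python. The template expects the reviewer to use the platform's file-write tool, so this is only an illustration of the rule itself; the helper names and the `root` parameter are hypothetical. The path layout is taken from the template text.

```python
import json
from pathlib import Path
from typing import Optional


def artifact_path(run_id: str, reviewer_name: str, root: Path = Path(".")) -> Path:
    """Path layout from the template: .context/systematic/ce-review/{run_id}/{reviewer_name}.json"""
    return root / ".context" / "systematic" / "ce-review" / run_id / f"{reviewer_name}.json"


def write_artifact_if_run_id(full_output: dict, run_id: Optional[str],
                             reviewer_name: str, root: Path = Path(".")) -> Optional[Path]:
    """Write the full analysis only when a run ID is present.

    Returns the written path, or None when the write was skipped or failed.
    A failed write is not fatal: the compact return still goes to the parent.
    """
    if not run_id:  # empty or absent run ID: skip the write entirely
        return None
    path = artifact_path(run_id, reviewer_name, root)
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(full_output, indent=2), encoding="utf-8")
    except OSError:
        return None
    return path
```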

package/skills/ce-work/SKILL.md CHANGED

@@ -1,16 +1,16 @@
 ---
 name: ce:work
-description: Execute work plans efficiently while maintaining quality and finishing features
-argument-hint: '[plan file, specification, or todo file path]'
+description: Execute work efficiently while maintaining quality and finishing features
+argument-hint: "[Plan doc path or description of work. Blank to auto use latest plan doc]"
 ---
-# Work Plan Execution Command
+# Work Execution Command
-Execute a work plan efficiently while maintaining quality and finishing features.
+Execute work efficiently while maintaining quality and finishing features.
 ## Introduction
-This command takes a work document (plan, specification, or todo file) and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
+This command takes a work document (plan, specification, or todo file) or a bare prompt describing the work, and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
 ## Input Document
@@ -18,9 +18,33 @@ This command takes a work document (plan, specification, or todo file) and execu
 ## Execution Workflow
+### Phase 0: Input Triage
+Determine how to proceed based on what was provided in `<input_document>`.
+**Plan document** (input is a file path to an existing plan, specification, or todo file) → skip to Phase 1.
+**Bare prompt** (input is a description of work, not a file path):
+1. **Scan the work area**
+   - Identify files likely to change based on the prompt
+   - Find existing test files for those areas (search for test/spec files that import, reference, or share names with the implementation files)
+   - Note local patterns and conventions in the affected areas
+2. **Assess complexity and route**
+   | Complexity | Signals | Action |
+   |-----------|---------|--------|
+   | **Trivial** | 1-2 files, no behavioral change (typo, config, rename) | Proceed to Phase 1 step 2 (environment setup), then implement directly — no task list, no execution loop. Apply Test Discovery if the change touches behavior-bearing code |
+   | **Small / Medium** | Clear scope, under ~10 files | Build a task list from discovery. Proceed to Phase 1 step 2 |
+   | **Large** | Cross-cutting, architectural decisions, 10+ files, touches auth/payments/migrations | Inform the user this would benefit from `/ce:brainstorm` or `/ce:plan` to surface edge cases and scope boundaries. Honor their choice. If proceeding, build a task list and continue to Phase 1 step 2 |
+---
 ### Phase 1: Quick Start
-1. **Read Plan and Clarify**
+1. **Read Plan and Clarify** _(skip if arriving from Phase 0 with a bare prompt)_
    - Read the work document completely
    - Treat the plan as a decision artifact, not an execution script
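
The Phase 0 routing table reads roughly like the following decision function. This is a simplified sketch: the skill describes signals for an agent to weigh rather than hard thresholds (note the "~10 files"), and the three inputs and all names here are assumptions chosen to mirror the table's columns.

```python
from enum import Enum


class Complexity(Enum):
    TRIVIAL = "trivial"
    SMALL_MEDIUM = "small_medium"
    LARGE = "large"


def assess_complexity(file_count: int, behavioral_change: bool,
                      high_risk_or_cross_cutting: bool) -> Complexity:
    """Map the table's signals onto a route.

    high_risk_or_cross_cutting stands in for "cross-cutting, architectural
    decisions, touches auth/payments/migrations".
    """
    if high_risk_or_cross_cutting or file_count >= 10:
        return Complexity.LARGE
    if file_count <= 2 and not behavioral_change:
        return Complexity.TRIVIAL
    return Complexity.SMALL_MEDIUM
```

A `LARGE` result would not block the work: per the table, the user is told that `/ce:brainstorm` or `/ce:plan` would help, and their choice is honored.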
@@ -49,8 +73,17 @@ This command takes a work document (plan, specification, or todo file) and execu
    ```
    **If already on a feature branch** (not the default branch):
-   - Ask: "Continue working on `[current_branch]`, or create a new branch?"
-   - If continuing, proceed to step 3
+   First, check whether the branch name is **meaningful** — a name like `feat/crowd-sniff` or `fix/email-validation` tells future readers what the work is about. Auto-generated worktree names (e.g., `worktree-jolly-beaming-raven`) or other opaque names do not.
+   If the branch name is meaningless or auto-generated, suggest renaming it before continuing:
+   ```bash
+   git branch -m <meaningful-name>
+   ```
+   Derive the new name from the plan title or work description (e.g., `feat/crowd-sniff`). Present the rename as a recommended option alongside continuing as-is.
+   Then ask: "Continue working on `[current_branch]`, or create a new branch?"
+   - If continuing (with or without rename), proceed to step 3
    - If creating new, follow Option A or B below
    **If on the default branch**, choose how to proceed:
@@ -78,7 +111,7 @@ This command takes a work document (plan, specification, or todo file) and execu
    - You want to keep the default branch clean while experimenting
    - You plan to switch between branches frequently
-3. **Create Todo List**
+3. **Create Todo List** _(skip if Phase 0 already built one, or if Phase 0 routed as Trivial)_
    - Use your available task tracking tool (e.g., todowrite, task lists) to break the plan into actionable tasks
    - Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
    - Carry each unit's `Execution note` into the task when present
@@ -96,18 +129,44 @@ This command takes a work document (plan, specification, or todo file) and execu
    | Strategy | When to use |
    |----------|-------------|
-   | **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight |
-   | **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks |
-   | **Parallel subagents** | 3+ tasks where some units have no shared dependencies and touch non-overlapping files. Dispatch independent units simultaneously, run dependent units after their prerequisites complete |
+   | **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight. **Default for bare-prompt work** — bare prompts rarely produce enough structured context to justify subagent dispatch |
+   | **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks. Requires plan-unit metadata (Goal, Files, Approach, Test scenarios) |
+   | **Parallel subagents** | 3+ tasks that pass the Parallel Safety Check (below). Dispatch independent units simultaneously, run dependent units after their prerequisites complete. Requires plan-unit metadata |
+   **Parallel Safety Check** — required before choosing parallel dispatch:
+   1. Build a file-to-unit mapping from every candidate unit's `Files:` section (Create, Modify, and Test paths)
+   2. Check for intersection — any file path appearing in 2+ units means overlap
+   3. If any overlap is found, downgrade to serial subagents. Log the reason (e.g., "Units 2 and 4 share `config/routes.rb` — using serial dispatch"). Serial subagents still provide context-window isolation without shared-directory risks
+   Even with no file overlap, parallel subagents sharing a working directory face git index contention (concurrent staging/committing corrupts the index) and test interference (concurrent test runs pick up each other's in-progress changes). The parallel subagent constraints below mitigate these.
    **Subagent dispatch** uses your available subagent or task spawning mechanism. For each unit, give the subagent:
    - The full plan file path (for overall context)
    - The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
    - Any resolved deferred questions relevant to that unit
+   - Instruction to check whether the unit's test scenarios cover all applicable categories (happy paths, edge cases, error paths, integration) and supplement gaps before writing tests
+   **Parallel subagent constraints** — when dispatching units in parallel (not serial or inline):
+   - Instruct each subagent: "Do not stage files (`git add`), create commits, or run the project test suite. The orchestrator handles testing, staging, and committing after all parallel units complete."
+   - These constraints prevent git index contention and test interference between concurrent subagents
+   **Permission mode:** Omit the `mode` parameter when dispatching subagents so the user's configured permission settings apply. Do not pass `mode: "auto"` — it overrides user-level settings like `bypassPermissions`.
-   After each subagent completes, update the plan checkboxes and task list before dispatching the next dependent unit.
+   **After each subagent completes (serial mode):**
+   1. Review the subagent's diff — verify changes match the unit's scope and `Files:` list
+   2. Run the relevant test suite to confirm the tree is healthy
+   3. If tests fail, diagnose and fix before proceeding — do not dispatch dependent units on a broken tree
+   4. Update the plan checkboxes and task list
+   5. Dispatch the next unit
-   For genuinely large plans needing persistent inter-agent communication (agents challenging each other's approaches, shared coordination across 10+ tasks), see Swarm Mode below which uses Agent Teams.
+   **After all parallel subagents in a batch complete:**
+   1. Wait for every subagent in the current parallel batch to finish before acting on any of their results
+   2. Cross-check for discovered file collisions: compare the actual files modified by all subagents in the batch (not just their declared `Files:` lists). Subagents may create or modify files not anticipated during planning — this is expected, since plans describe *what* not *how*. A collision only matters when 2+ subagents in the same batch modified the same file. In a shared working directory, only the last writer's version survives — the other unit's changes to that file are lost. If a collision is detected: commit all non-colliding files from all units first, then re-run the affected units serially for the shared file so each builds on the other's committed work
+   3. For each completed unit, in dependency order: review the diff, run the relevant test suite, stage only that unit's files, and commit with a conventional message derived from the unit's Goal
+   4. If tests fail after committing a unit's changes, diagnose and fix before committing the next unit
+   5. Update the plan checkboxes and task list
+   6. Dispatch the next batch of independent units, or the next dependent unit
 ### Phase 2: Execute
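
The Parallel Safety Check added in this hunk amounts to an intersection test over each unit's `Files:` list. A minimal Python sketch, assuming the `Files:` sections have already been parsed into a mapping from unit name to file paths (the function names and data shape are hypothetical):

```python
from collections import defaultdict


def find_file_overlaps(unit_files: dict) -> dict:
    """Return {path: [units]} for every path declared by 2+ units."""
    owners = defaultdict(list)
    for unit, paths in unit_files.items():
        for path in set(paths):  # ignore duplicates within a single unit
            owners[path].append(unit)
    return {path: sorted(units) for path, units in owners.items() if len(units) >= 2}


def choose_dispatch(unit_files: dict) -> tuple:
    """Pick parallel only when no file appears in more than one unit."""
    overlaps = find_file_overlaps(unit_files)
    if not overlaps:
        return ("parallel", "no shared files")
    path, units = sorted(overlaps.items())[0]
    reason = f"Units {' and '.join(units)} share `{path}`; using serial dispatch"
    return ("serial", reason)
```

The same `find_file_overlaps` call could serve the post-batch collision cross-check if it is fed the files each subagent actually modified instead of the declared lists. The sketch leaves out the 3+ task precondition and the plan-unit metadata requirement.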
@@ -118,12 +177,14 @@ This command takes a work document (plan, specification, or todo file) and execu
    ```
    while (tasks remain):
      - Mark task as in-progress
-     - Read any referenced files from the plan
+     - Read any referenced files from the plan or discovered during Phase 0
      - Look for similar patterns in codebase
+     - Find existing test files for implementation files being changed (Test Discovery — see below)
      - Implement following existing conventions
-     - Write tests for new functionality
+     - Add, update, or remove tests to match implementation changes (see Test Discovery below)
      - Run System-Wide Test Check (see below)
      - Run tests after changes
+     - Assess testing coverage: did this task change behavior? If yes, were tests written or updated? If no tests were added, is the justification deliberate (e.g., pure config, no behavioral change)?
      - Mark task as completed
      - Evaluate for incremental commit (see below)
    ```
@@ -136,6 +197,17 @@ This command takes a work document (plan, specification, or todo file) and execu
    - Do not over-implement beyond the current behavior slice when working test-first
    - Skip test-first discipline for trivial renames, pure configuration, and pure styling work
+   **Test Discovery** — Before implementing changes to a file, find its existing test files (search for test/spec files that import, reference, or share naming patterns with the implementation file). When a plan specifies test scenarios or test files, start there, then check for additional test coverage the plan may not have enumerated. Changes to implementation files should be accompanied by corresponding test updates — new tests for new behavior, modified tests for changed behavior, removed or updated tests for deleted behavior.
+   **Test Scenario Completeness** — Before writing tests for a feature-bearing unit, check whether the plan's `Test scenarios` cover all categories that apply to this unit. If a category is missing or scenarios are vague (e.g., "validates correctly" without naming inputs and expected outcomes), supplement from the unit's own context before writing tests:
+   | Category | When it applies | How to derive if missing |
+   |----------|----------------|------------------------|
+   | **Happy path** | Always for feature-bearing units | Read the unit's Goal and Approach for core input/output pairs |
+   | **Edge cases** | When the unit has meaningful boundaries (inputs, state, concurrency) | Identify boundary values, empty/nil inputs, and concurrent access patterns |
+   | **Error/failure paths** | When the unit has failure modes (validation, external calls, permissions) | Enumerate invalid inputs the unit should reject, permission/auth denials it should enforce, and downstream failures it should handle |
+   | **Integration** | When the unit crosses layers (callbacks, middleware, multi-service) | Identify the cross-layer chain and write a scenario that exercises it without mocks |
    **System-Wide Test Check** — Before marking a task done, pause and ask:
    | Question | What to do |
@@ -182,6 +254,8 @@ This command takes a work document (plan, specification, or todo file) and execu
    **Note:** Incremental commits use clean conventional messages without attribution footers. The final Phase 4 commit/PR includes the full attribution.
+   **Parallel subagent mode:** When units run as parallel subagents, the subagents do not commit — the orchestrator handles staging and committing after the entire parallel batch completes (see Parallel subagent constraints in Phase 1 Step 4). The commit guidance in this section applies to inline and serial execution, and to the orchestrator's commit decisions after parallel batch completion.
 3. **Follow Existing Patterns**
    - The plan should reference similar code - read those files first
@@ -195,7 +269,7 @@ This command takes a work document (plan, specification, or todo file) and execu
    - Run relevant tests after each significant change
    - Don't wait until the end to test
    - Fix failures immediately
-   - Add new tests for new functionality
+   - Add new tests for new behavior, update tests for changed behavior, remove tests for deleted behavior
    - **Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together.** If your change touches callbacks, middleware, or error handling — you need both.
 5. **Simplify as You Go**
@@ -221,201 +295,9 @@ This command takes a work document (plan, specification, or todo file) and execu
    - Create new tasks if scope expands
    - Keep user informed of major milestones
-### Phase 3: Quality Check
-1. **Run Core Quality Checks**
-   Always run before submitting:
-   ```bash
-   # Run full test suite (use project's test command)
-   # Examples: bin/rails test, npm test, pytest, go test, etc.
-   # Run linting (per AGENTS.md)
-   # Use linting-agent before pushing to origin
-   ```
-2. **Consider Code Review** (Optional)
-   Use for complex, risky, or large changes. Load the `ce:review` skill with `mode:autofix` to fix safe issues and flag the rest before shipping.
-3. **Final Validation**
-   - All tasks marked completed
-   - All tests pass
-   - Linting passes
-   - Code follows existing patterns
-   - Figma designs match (if applicable)
-   - No console errors or warnings
-   - If the plan has a `Requirements Trace`, verify each requirement is satisfied by the completed work
-   - If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution
-4. **Prepare Operational Validation Plan** (REQUIRED)
-   - Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
-   - Include concrete:
-     - Log queries/search terms
-     - Metrics or dashboards to watch
-     - Expected healthy signals
-     - Failure signals and rollback/mitigation trigger
-     - Validation window and owner
-   - If there is truly no production/runtime impact, still include the section with: `No additional operational monitoring required` and a one-line reason.
-### Phase 4: Ship It
-1. **Create Commit**
-   ```bash
-   git add .
-   git status  # Review what's being committed
-   git diff --staged  # Check the changes
-   # Commit with conventional format
-   git commit -m "$(cat <<'EOF'
-   feat(scope): description of what and why
-   Brief explanation if needed.
-   🤖 Generated with [MODEL] via [HARNESS](HARNESS_URL) + Systematic v[VERSION]
-   Co-Authored-By: [MODEL] ([CONTEXT] context, [THINKING]) <noreply@anthropic.com>
-   EOF
-   )"
-   ```
-   **Fill in at commit/PR time:**
-   | Placeholder | Value | Example |
-   |-------------|-------|---------|
-   | Placeholder | Value | Example |
-   |-------------|-------|---------|
-   | `[MODEL]` | Model name | Claude Opus 4.6, GPT-5.4 |
-   | `[CONTEXT]` | Context window (if known) | 200K, 1M |
-   | `[THINKING]` | Thinking level (if known) | extended thinking |
-   | `[HARNESS]` | Tool running you | OpenCode, Codex, Gemini CLI |
-   | `[HARNESS_URL]` | Link to that tool | `https://opencode.ai` |
-   | `[VERSION]` | `plugin.json` → `version` | 2.40.0 |
-   Subagents creating commits/PRs are equally responsible for accurate attribution.
-2. **Capture and Upload Screenshots for UI Changes** (REQUIRED for any UI work)
-   For **any** design changes, new views, or UI modifications, you MUST capture and upload screenshots:
-   **Step 1: Start dev server** (if not running)
-   ```bash
-   bin/dev  # Run in background
-   ```
-   **Step 2: Capture screenshots with agent-browser CLI**
-   ```bash
-   agent-browser open http://localhost:3000/[route]
-   agent-browser snapshot -i
-   agent-browser screenshot output.png
-   ```
-   See the `agent-browser` skill for detailed usage.
-   **Step 3: Upload using imgup skill**
-   ```bash
-   skill: imgup
-   # Then upload each screenshot:
-   imgup -h pixhost screenshot.png  # pixhost works without API key
-   # Alternative hosts: catbox, imagebin, beeimg
-   ```
-   **What to capture:**
-   - **New screens**: Screenshot of the new UI
-   - **Modified screens**: Before AND after screenshots
-   - **Design implementation**: Screenshot showing Figma design match
-   **IMPORTANT**: Always include uploaded image URLs in PR description. This provides visual context for reviewers and documents the change.
-3. **Create Pull Request**
-   ```bash
-   git push -u origin feature-branch-name
-   gh pr create --title "Feature: [Description]" --body "$(cat <<'EOF'
-   ## Summary
-   - What was built
-   - Why it was needed
-   - Key decisions made
-   ## Testing
-   - Tests added/modified
-   - Manual testing performed
-   ## Post-Deploy Monitoring & Validation
-   - **What to monitor/search**
-     - Logs:
-     - Metrics/Dashboards:
-   - **Validation checks (queries/commands)**
-     - `command or query here`
-   - **Expected healthy behavior**
-     - Expected signal(s)
-   - **Failure signal(s) / rollback trigger**
-     - Trigger + immediate action
-   - **Validation window & owner**
-     - Window:
-     - Owner:
-   - **If no operational impact**
-     - `No additional operational monitoring required: <reason>`
-   ## Before / After Screenshots
-   | Before | After |
-   |--------|-------|
-   | ![before](URL) | ![after](URL) |
-   ## Figma Design
-   [Link if applicable]
-   ---
-   [![Systematic v[VERSION]](https://img.shields.io/badge/Systematic-v[VERSION]-6366f1)](https://github.com/marcusrbrown/systematic)
-   🤖 Generated with [MODEL] ([CONTEXT] context, [THINKING]) via [HARNESS](HARNESS_URL)
-   EOF
-   )"
-   ```
-4. **Update Plan Status**
-   If the input document has YAML frontmatter with a `status` field, update it to `completed`:
-   ```
-   status: active  →  status: completed
-   ```
-5. **Notify User**
-   - Summarize what was completed
-   - Link to PR
-   - Note any follow-up work needed
-   - Suggest next steps if applicable
----
-## Swarm Mode with Agent Teams (Optional)
-For genuinely large plans where agents need to communicate with each other, challenge approaches, or coordinate across 10+ tasks with persistent specialized roles, use agent team capabilities if available (e.g., Agent Teams in OpenCode, multi-agent workflows in Codex).
-**Agent teams are typically experimental and require opt-in.** Do not attempt to use agent teams unless the user explicitly requests swarm mode or agent teams, and the platform supports it.
-### When to Use Agent Teams vs Subagents
-| Agent Teams | Subagents (standard mode) |
-|-------------|---------------------------|
-| Agents need to discuss and challenge each other's approaches | Each task is independent — only the result matters |
-| Persistent specialized roles (e.g., dedicated tester running continuously) | Workers report back and finish |
-| 10+ tasks with complex cross-cutting coordination | 3-8 tasks with clear dependency chains |
-| User explicitly requests "swarm mode" or "agent teams" | Default for most plans |
-Most plans should use subagent dispatch from standard mode. Agent teams add significant token cost and coordination overhead — use them when the inter-agent communication genuinely improves the outcome.
+### Phase 3-4: Quality Check and Ship It
-### Agent Teams Workflow
-1. **Create team** — use your available team creation mechanism
-2. **Create task list** — parse Implementation Units into tasks with dependency relationships
-3. **Spawn teammates** — assign specialized roles (implementer, tester, reviewer) based on the plan's needs. Give each teammate the plan file path and their specific task assignments
-4. **Coordinate** — the lead monitors task completion, reassigns work if someone gets stuck, and spawns additional workers as phases unblock
-5. **Cleanup** — shut down all teammates, then clean up the team resources
----
+When all Phase 2 tasks are complete and execution transitions to quality check, read `references/shipping-workflow.md` for the full shipping workflow: quality checks, code review, final validation, PR creation, and notification.
 ## Key Principles
@@ -442,7 +324,7 @@ Most plans should use subagent dispatch from standard mode. Agent teams add sign
 - Follow existing patterns
 - Write tests for new code
 - Run linting before pushing
-- Use reviewer agents for complex/risky changes only
+- Review every change — inline for simple additive work, full review for everything else
 ### Ship Complete Features
@@ -450,34 +332,6 @@ Most plans should use subagent dispatch from standard mode. Agent teams add sign
 - Don't leave features 80% done
 - A finished feature that ships beats a perfect feature that doesn't
-## Quality Checklist
-Before creating PR, verify:
-- [ ] All clarifying questions asked and answered
-- [ ] All tasks marked completed
-- [ ] Tests pass (run project's test command)
-- [ ] Linting passes (use linting-agent)
-- [ ] Code follows existing patterns
-- [ ] Figma designs match implementation (if applicable)
-- [ ] Before/after screenshots captured and uploaded (for UI changes)
-- [ ] Commit messages follow conventional format
-- [ ] PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
-- [ ] PR description includes summary, testing notes, and screenshots
-- [ ] PR description includes Compound Engineered badge with accurate model, harness, and version
-## When to Use Reviewer Agents
-**Don't use by default.** Use reviewer agents only when:
-- Large refactor affecting many files (10+)
-- Security-sensitive changes (authentication, permissions, data access)
-- Performance-critical code paths
-- Complex algorithms or business logic
-- User explicitly requests thorough review
-For most features: tests + linting + following patterns is sufficient.
 ## Common Pitfalls to Avoid
 - **Analysis paralysis** - Don't overthink, read the plan and execute
@@ -486,5 +340,4 @@ For most features: tests + linting + following patterns is sufficient.
 - **Testing at the end** - Test continuously or suffer later
 - **Forgetting to track progress** - Update task status as you go or lose track of what's done
 - **80% done syndrome** - Finish the feature, don't move on early
-- **Over-reviewing simple changes** - Save reviewer agents for complex work
+- **Skipping review** - Every change gets reviewed; only the depth varies