npm - @fro.bot/systematic - Versions diffs - 2.3.2 → 2.4.0 - Mend

@fro.bot/systematic 2.3.2 → 2.4.0

Files changed (71)

package/README.md +12 -13
package/agents/design/design-implementation-reviewer.md +2 -19
package/agents/design/design-iterator.md +2 -31
package/agents/design/figma-design-sync.md +2 -22
package/agents/docs/ankane-readme-writer.md +2 -19
package/agents/document-review/adversarial-document-reviewer.md +3 -2
package/agents/document-review/coherence-reviewer.md +5 -7
package/agents/document-review/design-lens-reviewer.md +3 -4
package/agents/document-review/feasibility-reviewer.md +3 -4
package/agents/document-review/product-lens-reviewer.md +25 -6
package/agents/document-review/scope-guardian-reviewer.md +3 -4
package/agents/document-review/security-lens-reviewer.md +3 -4
package/agents/research/best-practices-researcher.md +4 -21
package/agents/research/framework-docs-researcher.md +2 -19
package/agents/research/git-history-analyzer.md +2 -19
package/agents/research/issue-intelligence-analyst.md +2 -24
package/agents/research/learnings-researcher.md +7 -28
package/agents/research/repo-research-analyst.md +3 -32
package/agents/research/slack-researcher.md +128 -0
package/agents/review/agent-native-reviewer.md +109 -195
package/agents/review/architecture-strategist.md +3 -19
package/agents/review/cli-agent-readiness-reviewer.md +1 -27
package/agents/review/code-simplicity-reviewer.md +5 -19
package/agents/review/data-integrity-guardian.md +3 -19
package/agents/review/data-migration-expert.md +3 -19
package/agents/review/deployment-verification-agent.md +3 -19
package/agents/review/pattern-recognition-specialist.md +4 -20
package/agents/review/performance-oracle.md +3 -31
package/agents/review/project-standards-reviewer.md +5 -5
package/agents/review/schema-drift-detector.md +3 -19
package/agents/review/security-sentinel.md +3 -25
package/agents/review/testing-reviewer.md +3 -3
package/agents/workflow/pr-comment-resolver.md +54 -22
package/agents/workflow/spec-flow-analyzer.md +2 -25
package/package.json +1 -1
package/skills/agent-native-architecture/SKILL.md +28 -27
package/skills/agent-native-architecture/references/agent-execution-patterns.md +3 -3
package/skills/agent-native-architecture/references/agent-native-testing.md +1 -1
package/skills/agent-native-architecture/references/mobile-patterns.md +1 -1
package/skills/andrew-kane-gem-writer/SKILL.md +5 -5
package/skills/ce-brainstorm/SKILL.md +43 -181
package/skills/ce-compound/SKILL.md +143 -89
package/skills/ce-compound-refresh/SKILL.md +48 -5
package/skills/ce-ideate/SKILL.md +27 -242
package/skills/ce-plan/SKILL.md +165 -81
package/skills/ce-review/SKILL.md +348 -125
package/skills/ce-review/references/findings-schema.json +5 -0
package/skills/ce-review/references/persona-catalog.md +2 -2
package/skills/ce-review/references/resolve-base.sh +5 -2
package/skills/ce-review/references/subagent-template.md +25 -3
package/skills/ce-work/SKILL.md +95 -242
package/skills/ce-work-beta/SKILL.md +154 -301
package/skills/dhh-rails-style/SKILL.md +13 -12
package/skills/document-review/SKILL.md +56 -109
package/skills/document-review/references/findings-schema.json +0 -23
package/skills/document-review/references/subagent-template.md +13 -18
package/skills/dspy-ruby/SKILL.md +8 -8
package/skills/every-style-editor/SKILL.md +3 -2
package/skills/frontend-design/SKILL.md +2 -3
package/skills/git-commit/SKILL.md +1 -1
package/skills/git-commit-push-pr/SKILL.md +81 -265
package/skills/git-worktree/SKILL.md +20 -21
package/skills/lfg/SKILL.md +10 -17
package/skills/onboarding/SKILL.md +2 -2
package/skills/onboarding/scripts/inventory.mjs +31 -7
package/skills/proof/SKILL.md +134 -28
package/skills/resolve-pr-feedback/SKILL.md +7 -2
package/skills/setup/SKILL.md +1 -1
package/skills/test-browser/SKILL.md +10 -11
package/skills/test-xcode/SKILL.md +6 -3
package/dist/lib/manifest.d.ts +0 -39

package/skills/ce-work-beta/SKILL.md CHANGED

@@ -1,27 +1,103 @@
 ---
 name: ce:work-beta
-description: '[BETA] Execute work plans with external delegate support. Same as ce:work but includes experimental Codex delegation mode for token-conserving code implementation.'
-argument-hint: '[plan file, specification, or todo file path]'
+description: "[BETA] Execute work with external delegate support. Same as ce:work but includes experimental Codex delegation mode for token-conserving code implementation."
 disable-model-invocation: true
+argument-hint: "[Plan doc path or description of work. Blank to auto use latest plan doc] [delegate:codex]"
 ---
-# Work Plan Execution Command
+# Work Execution Command
-Execute a work plan efficiently while maintaining quality and finishing features.
+Execute work efficiently while maintaining quality and finishing features.
 ## Introduction
-This command takes a work document (plan, specification, or todo file) and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
+This command takes a work document (plan, specification, or todo file) or a bare prompt describing the work, and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
+**Beta rollout note:** Invoke `ce:work-beta` manually when you want to trial Codex delegation. During the beta period, planning and workflow handoffs remain pointed at stable `ce:work` to avoid dual-path orchestration complexity.
 ## Input Document
 <input_document> #$ARGUMENTS </input_document>
+## Argument Parsing
+Parse `$ARGUMENTS` for the following optional tokens. Strip each recognized token before interpreting the remainder as the plan file path or bare prompt.
+| Token | Example | Effect |
+|-------|---------|--------|
+| `delegate:codex` | `delegate:codex` | Activate Codex delegation mode for plan execution |
+| `delegate:local` | `delegate:local` | Deactivate delegation even if enabled in config |
+All tokens are optional. When absent, fall back to the resolution chain below.
+**Fuzzy activation:** Also recognize imperative delegation-intent phrases such as "use codex", "delegate to codex", "codex mode", or "delegate mode" as equivalent to `delegate:codex`. A bare mention of "codex" in a prompt (e.g., "fix codex converter bugs") must NOT activate delegation -- only clear delegation intent triggers it.
+**Fuzzy deactivation:** Also recognize phrases such as "no codex", "local mode", "standard mode" as equivalent to `delegate:local`.
+### Settings Resolution Chain
+After extracting tokens from arguments, resolve the delegation state using this precedence chain:
+1. **Argument flag** -- `delegate:codex` or `delegate:local` from the current invocation (highest priority)
+2. **Config file** -- extract settings from the config block below. Value `codex` for `work_delegate` activates delegation; `false` deactivates.
+3. **Hard default** -- `false` (delegation off)
+**Config (pre-resolved):**
+!`cat "$(git rev-parse --show-toplevel 2>/dev/null)/.systematic/config.local.yaml" 2>/dev/null || cat "$(dirname "$(git rev-parse --path-format=absolute --git-common-dir 2>/dev/null)")/.systematic/config.local.yaml" 2>/dev/null || echo '__NO_CONFIG__'`
+If the block above contains YAML key-value pairs, extract values for the keys listed below.
+If it shows `__NO_CONFIG__`, the file does not exist — all settings fall through to defaults.
+If it shows an unresolved command string, read `.systematic/config.local.yaml` from the repo root using the native file-read tool (e.g., Read in OpenCode, read_file in Codex). If the file does not exist, all settings fall through to defaults.
+If any setting has an unrecognized value, fall through to the hard default for that setting.
+Config keys:
+- `work_delegate` -- `codex` or default `false`
+- `work_delegate_consent` -- `true` or default `false`
+- `work_delegate_sandbox` -- `yolo` (default) or `full-auto`
+- `work_delegate_decision` -- `auto` (default) or `ask`
+- `work_delegate_model` -- Codex model to use (default `gpt-5.4`). Passthrough — any valid model name accepted.
+- `work_delegate_effort` -- `minimal`, `low`, `medium`, `high` (default), or `xhigh`
+Store the resolved state for downstream consumption:
+- `delegation_active` -- boolean, whether delegation mode is on
+- `delegation_source` -- `argument` or `config` or `default` -- how delegation was resolved (used by environment guard to decide notification verbosity)
+- `sandbox_mode` -- `yolo` or `full-auto` (from config or default `yolo`)
+- `consent_granted` -- boolean (from config `work_delegate_consent`)
+- `delegate_model` -- string (from config or default `gpt-5.4`)
+- `delegate_effort` -- string (from config or default `high`)
+---
 ## Execution Workflow
+### Phase 0: Input Triage
+Determine how to proceed based on what was provided in `<input_document>`.
+**Plan document** (input is a file path to an existing plan, specification, or todo file) → skip to Phase 1.
+**Bare prompt** (input is a description of work, not a file path):
+1. **Scan the work area**
+   - Identify files likely to change based on the prompt
+   - Find existing test files for those areas (search for test/spec files that import, reference, or share names with the implementation files)
+   - Note local patterns and conventions in the affected areas
+2. **Assess complexity and route**
+   | Complexity | Signals | Action |
+   |-----------|---------|--------|
+   | **Trivial** | 1-2 files, no behavioral change (typo, config, rename) | Proceed to Phase 1 step 2 (environment setup), then implement directly — no task list, no execution loop. Apply Test Discovery if the change touches behavior-bearing code |
+   | **Small / Medium** | Clear scope, under ~10 files | Build a task list from discovery. Proceed to Phase 1 step 2 |
+   | **Large** | Cross-cutting, architectural decisions, 10+ files, touches auth/payments/migrations | Inform the user this would benefit from `/ce:brainstorm` or `/ce:plan` to surface edge cases and scope boundaries. Honor their choice. If proceeding, build a task list and continue to Phase 1 step 2 |
+---
 ### Phase 1: Quick Start
-1. **Read Plan and Clarify**
+1. **Read Plan and Clarify** _(skip if arriving from Phase 0 with a bare prompt)_
    - Read the work document completely
    - Treat the plan as a decision artifact, not an execution script
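Reviewer sketch for the hunk above: the Argument Parsing and Settings Resolution Chain sections describe a three-level precedence (argument flag, then config, then hard default). The following is a minimal illustration of that chain, not code shipped in the package. The function names, the regex-based phrase matching, the "last explicit token wins" tie-break, and the choice to pass the config as an already-parsed dict are all assumptions made for the example; only the tokens, phrases, and key names come from the diff.

```python
import re

# Exact tokens from the Argument Parsing table in the diff.
TOKENS = {"delegate:codex": True, "delegate:local": False}
# Fuzzy phrases quoted in the diff; how they are matched is an assumption.
ACTIVATE_PHRASES = ("use codex", "delegate to codex", "codex mode", "delegate mode")
DEACTIVATE_PHRASES = ("no codex", "local mode", "standard mode")


def _mentions(text, phrases):
    """True if any phrase occurs in text as whole words, ignoring case."""
    return any(
        re.search(r"\b" + re.escape(p) + r"\b", text, re.IGNORECASE)
        for p in phrases
    )


def parse_arguments(arguments):
    """Return (flag, remainder). flag is True, False, or None when absent."""
    flag = None
    kept = []
    for word in arguments.split():
        if word in TOKENS:
            flag = TOKENS[word]  # assumption: the last explicit token wins
        else:
            kept.append(word)
    remainder = " ".join(kept)
    if flag is None:
        # Deactivation is checked first so "no codex mode" is not read as "codex mode".
        if _mentions(remainder, DEACTIVATE_PHRASES):
            flag = False
        elif _mentions(remainder, ACTIVATE_PHRASES):
            flag = True
    return flag, remainder


def resolve_delegation(arguments, config):
    """Apply the precedence chain: argument flag, then config, then default."""
    flag, remainder = parse_arguments(arguments)
    if flag is not None:
        source, active = "argument", flag
    elif config.get("work_delegate") == "codex":
        source, active = "config", True
    elif config.get("work_delegate") in (False, "false"):
        source, active = "config", False
    else:
        # Missing or unrecognized value falls through to the hard default.
        source, active = "default", False
    return {
        "delegation_active": active,
        "delegation_source": source,
        "remainder": remainder,
    }
```

A bare mention such as `fix codex converter bugs` leaves the flag unset here because none of the listed phrases match, which is the behavior the Fuzzy activation paragraph requires.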
@@ -50,8 +126,17 @@ This command takes a work document (plan, specification, or todo file) and execu
    ```
    **If already on a feature branch** (not the default branch):
-   - Ask: "Continue working on `[current_branch]`, or create a new branch?"
-   - If continuing, proceed to step 3
+   First, check whether the branch name is **meaningful** — a name like `feat/crowd-sniff` or `fix/email-validation` tells future readers what the work is about. Auto-generated worktree names (e.g., `worktree-jolly-beaming-raven`) or other opaque names do not.
+   If the branch name is meaningless or auto-generated, suggest renaming it before continuing:
+   ```bash
+   git branch -m <meaningful-name>
+   ```
+   Derive the new name from the plan title or work description (e.g., `feat/crowd-sniff`). Present the rename as a recommended option alongside continuing as-is.
+   Then ask: "Continue working on `[current_branch]`, or create a new branch?"
+   - If continuing (with or without rename), proceed to step 3
    - If creating new, follow Option A or B below
    **If on the default branch**, choose how to proceed:
@@ -79,7 +164,7 @@ This command takes a work document (plan, specification, or todo file) and execu
    - You want to keep the default branch clean while experimenting
    - You plan to switch between branches frequently
-3. **Create Todo List**
+3. **Create Todo List** _(skip if Phase 0 already built one, or if Phase 0 routed as Trivial)_
    - Use your available task tracking tool (e.g., todowrite, task lists) to break the plan into actionable tasks
    - Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
    - Carry each unit's `Execution note` into the task when present
@@ -93,22 +178,50 @@ This command takes a work document (plan, specification, or todo file) and execu
 4. **Choose Execution Strategy**
+   **Delegation routing gate:** If `delegation_active` is true AND the input is a plan file (not a bare prompt), read `references/codex-delegation-workflow.md` and follow its Pre-Delegation Checks and Delegation Decision flow. If all checks pass and delegation proceeds, force **serial execution** and proceed directly to Phase 2 using the workflow's batched execution loop. If any check disables delegation, fall through to the standard strategy table below. If delegation is active but the input is a bare prompt (no plan file), set `delegation_active` to false with a brief note: "Codex delegation requires a plan file -- using standard mode." and continue with the standard strategy selection below.
    After creating the task list, decide how to execute based on the plan's size and dependency structure:
    | Strategy | When to use |
    |----------|-------------|
-   | **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight |
-   | **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks |
-   | **Parallel subagents** | 3+ tasks where some units have no shared dependencies and touch non-overlapping files. Dispatch independent units simultaneously, run dependent units after their prerequisites complete |
+   | **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight. **Default for bare-prompt work** — bare prompts rarely produce enough structured context to justify subagent dispatch |
+   | **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks. Requires plan-unit metadata (Goal, Files, Approach, Test scenarios) |
+   | **Parallel subagents** | 3+ tasks that pass the Parallel Safety Check (below). Dispatch independent units simultaneously, run dependent units after their prerequisites complete. Requires plan-unit metadata |
+   **Parallel Safety Check** — required before choosing parallel dispatch:
+   1. Build a file-to-unit mapping from every candidate unit's `Files:` section (Create, Modify, and Test paths)
+   2. Check for intersection — any file path appearing in 2+ units means overlap
+   3. If any overlap is found, downgrade to serial subagents. Log the reason (e.g., "Units 2 and 4 share `config/routes.rb` — using serial dispatch"). Serial subagents still provide context-window isolation without shared-directory risks
+   Even with no file overlap, parallel subagents sharing a working directory face git index contention (concurrent staging/committing corrupts the index) and test interference (concurrent test runs pick up each other's in-progress changes). The parallel subagent constraints below mitigate these.
    **Subagent dispatch** uses your available subagent or task spawning mechanism. For each unit, give the subagent:
    - The full plan file path (for overall context)
    - The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
    - Any resolved deferred questions relevant to that unit
+   - Instruction to check whether the unit's test scenarios cover all applicable categories (happy paths, edge cases, error paths, integration) and supplement gaps before writing tests
-   After each subagent completes, update the plan checkboxes and task list before dispatching the next dependent unit.
+   **Parallel subagent constraints** — when dispatching units in parallel (not serial or inline):
+   - Instruct each subagent: "Do not stage files (`git add`), create commits, or run the project test suite. The orchestrator handles testing, staging, and committing after all parallel units complete."
+   - These constraints prevent git index contention and test interference between concurrent subagents
-   For genuinely large plans needing persistent inter-agent communication (agents challenging each other's approaches, shared coordination across 10+ tasks), see Swarm Mode below which uses Agent Teams.
+   **Permission mode:** Omit the `mode` parameter when dispatching subagents so the user's configured permission settings apply. Do not pass `mode: "auto"` — it overrides user-level settings like `bypassPermissions`.
+   **After each subagent completes (serial mode):**
+   1. Review the subagent's diff — verify changes match the unit's scope and `Files:` list
+   2. Run the relevant test suite to confirm the tree is healthy
+   3. If tests fail, diagnose and fix before proceeding — do not dispatch dependent units on a broken tree
+   4. Update the plan checkboxes and task list
+   5. Dispatch the next unit
+   **After all parallel subagents in a batch complete:**
+   1. Wait for every subagent in the current parallel batch to finish before acting on any of their results
+   2. Cross-check for discovered file collisions: compare the actual files modified by all subagents in the batch (not just their declared `Files:` lists). Subagents may create or modify files not anticipated during planning — this is expected, since plans describe *what* not *how*. A collision only matters when 2+ subagents in the same batch modified the same file. In a shared working directory, only the last writer's version survives — the other unit's changes to that file are lost. If a collision is detected: commit all non-colliding files from all units first, then re-run the affected units serially for the shared file so each builds on the other's committed work
+   3. For each completed unit, in dependency order: review the diff, run the relevant test suite, stage only that unit's files, and commit with a conventional message derived from the unit's Goal
+   4. If tests fail after committing a unit's changes, diagnose and fix before committing the next unit
+   5. Update the plan checkboxes and task list
+   6. Dispatch the next batch of independent units, or the next dependent unit
 ### Phase 2: Execute
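Reviewer sketch for the hunk above: the Parallel Safety Check amounts to building a file-to-unit mapping and looking for any path claimed by two or more units. A minimal illustration follows, assuming each unit's `Files:` section has already been flattened into a list of paths; `find_overlaps` and `choose_dispatch` are hypothetical names, not part of the package.

```python
from collections import defaultdict


def find_overlaps(unit_files):
    """Map each path claimed by 2+ units to the sorted list of those units.

    unit_files: dict of unit id -> iterable of paths (Create, Modify, Test).
    """
    owners = defaultdict(set)
    for unit, paths in unit_files.items():
        for path in paths:
            owners[path].add(unit)
    return {
        path: sorted(units)
        for path, units in owners.items()
        if len(units) > 1
    }


def choose_dispatch(unit_files):
    """Return ("parallel", None), or ("serial", reason) when any path overlaps."""
    overlaps = find_overlaps(unit_files)
    if not overlaps:
        return "parallel", None
    # Report the first overlapping path (alphabetically) as the logged reason.
    path, units = sorted(overlaps.items())[0]
    reason = f"Units {' and '.join(units)} share `{path}`, using serial dispatch"
    return "serial", reason
```

The same `find_overlaps` helper could be reused for the post-batch collision cross-check by passing the files each subagent actually modified instead of the declared `Files:` lists.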
@@ -119,12 +232,16 @@ This command takes a work document (plan, specification, or todo file) and execu
    ```
    while (tasks remain):
      - Mark task as in-progress
-     - Read any referenced files from the plan
+     - Read any referenced files from the plan or discovered during Phase 0
      - Look for similar patterns in codebase
-     - Implement following existing conventions
-     - Write tests for new functionality
+     - Find existing test files for implementation files being changed (Test Discovery — see below)
+     - If delegation_active: branch to the Codex Delegation Execution Loop
+       (see `references/codex-delegation-workflow.md`)
+     - Otherwise: implement following existing conventions
+     - Add, update, or remove tests to match implementation changes (see Test Discovery below)
      - Run System-Wide Test Check (see below)
      - Run tests after changes
+     - Assess testing coverage: did this task change behavior? If yes, were tests written or updated? If no tests were added, is the justification deliberate (e.g., pure config, no behavioral change)?
      - Mark task as completed
      - Evaluate for incremental commit (see below)
    ```
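Reviewer sketch for the loop above: the new Test Discovery step asks for test or spec files that share naming patterns with the implementation file. The snippet below illustrates only that name-matching part, under the assumption that tests follow one of a few common conventions (`test_<name>`, `<name>_test`, `<name>_spec`, `<name>.test`, `<name>.spec`). The diff does not enumerate conventions, and it also calls for checking imports and references, which this sketch does not attempt. `discover_tests` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def discover_tests(impl_path, repo_files):
    """Return repo files whose names look like tests for impl_path.

    The naming conventions checked here are assumptions, not taken from the diff.
    """
    stem = PurePosixPath(impl_path).stem
    wanted = {
        f"test_{stem}",   # e.g. tests/test_parser.py
        f"{stem}_test",   # e.g. parser_test.go
        f"{stem}_spec",   # e.g. spec/parser_spec.rb
        f"{stem}.test",   # e.g. parser.test.ts
        f"{stem}.spec",   # e.g. parser.spec.ts
    }
    return sorted(p for p in repo_files if PurePosixPath(p).stem in wanted)
```

An empty result is itself a useful signal for the coverage assessment step at the end of the loop: the changed file may have no tests yet.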
@@ -137,6 +254,17 @@ This command takes a work document (plan, specification, or todo file) and execu
    - Do not over-implement beyond the current behavior slice when working test-first
    - Skip test-first discipline for trivial renames, pure configuration, and pure styling work
+   **Test Discovery** — Before implementing changes to a file, find its existing test files (search for test/spec files that import, reference, or share naming patterns with the implementation file). When a plan specifies test scenarios or test files, start there, then check for additional test coverage the plan may not have enumerated. Changes to implementation files should be accompanied by corresponding test updates — new tests for new behavior, modified tests for changed behavior, removed or updated tests for deleted behavior.
+   **Test Scenario Completeness** — Before writing tests for a feature-bearing unit, check whether the plan's `Test scenarios` cover all categories that apply to this unit. If a category is missing or scenarios are vague (e.g., "validates correctly" without naming inputs and expected outcomes), supplement from the unit's own context before writing tests:
+   | Category | When it applies | How to derive if missing |
+   |----------|----------------|------------------------|
+   | **Happy path** | Always for feature-bearing units | Read the unit's Goal and Approach for core input/output pairs |
+   | **Edge cases** | When the unit has meaningful boundaries (inputs, state, concurrency) | Identify boundary values, empty/nil inputs, and concurrent access patterns |
+   | **Error/failure paths** | When the unit has failure modes (validation, external calls, permissions) | Enumerate invalid inputs the unit should reject, permission/auth denials it should enforce, and downstream failures it should handle |
+   | **Integration** | When the unit crosses layers (callbacks, middleware, multi-service) | Identify the cross-layer chain and write a scenario that exercises it without mocks |
    **System-Wide Test Check** — Before marking a task done, pause and ask:
    | Question | What to do |
@@ -183,6 +311,8 @@ This command takes a work document (plan, specification, or todo file) and execu
    **Note:** Incremental commits use clean conventional messages without attribution footers. The final Phase 4 commit/PR includes the full attribution.
+   **Parallel subagent mode:** When units run as parallel subagents, the subagents do not commit — the orchestrator handles staging and committing after the entire parallel batch completes (see Parallel subagent constraints in Phase 1 Step 4). The commit guidance in this section applies to inline and serial execution, and to the orchestrator's commit decisions after parallel batch completion.
 3. **Follow Existing Patterns**
    - The plan should reference similar code - read those files first
@@ -196,7 +326,7 @@ This command takes a work document (plan, specification, or todo file) and execu
    - Run relevant tests after each significant change
    - Don't wait until the end to test
    - Fix failures immediately
-   - Add new tests for new functionality
+   - Add new tests for new behavior, update tests for changed behavior, remove tests for deleted behavior
    - **Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together.** If your change touches callbacks, middleware, or error handling — you need both.
 5. **Simplify as You Go**
@@ -230,263 +360,15 @@ This command takes a work document (plan, specification, or todo file) and execu
    - Create new tasks if scope expands
    - Keep user informed of major milestones
-### Phase 3: Quality Check
-1. **Run Core Quality Checks**
-   Always run before submitting:
-   ```bash
-   # Run full test suite (use project's test command)
-   # Examples: bin/rails test, npm test, pytest, go test, etc.
-   # Run linting (per AGENTS.md)
-   # Use linting-agent before pushing to origin
-   ```
-2. **Consider Code Review** (Optional)
-   Use for complex, risky, or large changes. Load the `ce:review` skill with `mode:autofix` to fix safe issues and flag the rest before shipping.
-3. **Final Validation**
-   - All tasks marked completed
-   - All tests pass
-   - Linting passes
-   - Code follows existing patterns
-   - Figma designs match (if applicable)
-   - No console errors or warnings
-   - If the plan has a `Requirements Trace`, verify each requirement is satisfied by the completed work
-   - If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution
-4. **Prepare Operational Validation Plan** (REQUIRED)
-   - Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
-   - Include concrete:
-     - Log queries/search terms
-     - Metrics or dashboards to watch
-     - Expected healthy signals
-     - Failure signals and rollback/mitigation trigger
-     - Validation window and owner
-   - If there is truly no production/runtime impact, still include the section with: `No additional operational monitoring required` and a one-line reason.
-### Phase 4: Ship It
-1. **Create Commit**
-   ```bash
-   git add .
-   git status  # Review what's being committed
-   git diff --staged  # Check the changes
-   # Commit with conventional format
-   git commit -m "$(cat <<'EOF'
-   feat(scope): description of what and why
-   Brief explanation if needed.
-   🤖 Generated with [MODEL] via [HARNESS](HARNESS_URL) + Systematic v[VERSION]
-   Co-Authored-By: [MODEL] ([CONTEXT] context, [THINKING]) <noreply@anthropic.com>
-   EOF
-   )"
-   ```
-   **Fill in at commit/PR time:**
-   | Placeholder | Value | Example |
-   |-------------|-------|---------|
-   | Placeholder | Value | Example |
-   |-------------|-------|---------|
-   | `[MODEL]` | Model name | Claude Opus 4.6, GPT-5.4 |
-   | `[CONTEXT]` | Context window (if known) | 200K, 1M |
-   | `[THINKING]` | Thinking level (if known) | extended thinking |
-   | `[HARNESS]` | Tool running you | OpenCode, Codex, Gemini CLI |
-   | `[HARNESS_URL]` | Link to that tool | `https://opencode.ai` |
-   | `[VERSION]` | `plugin.json` → `version` | 2.40.0 |
+### Phase 3-4: Quality Check and Ship It
-   Subagents creating commits/PRs are equally responsible for accurate attribution.
-2. **Capture and Upload Screenshots for UI Changes** (REQUIRED for any UI work)
-   For **any** design changes, new views, or UI modifications, you MUST capture and upload screenshots:
-   **Step 1: Start dev server** (if not running)
-   ```bash
-   bin/dev  # Run in background
-   ```
-   **Step 2: Capture screenshots with agent-browser CLI**
-   ```bash
-   agent-browser open http://localhost:3000/[route]
-   agent-browser snapshot -i
-   agent-browser screenshot output.png
-   ```
-   See the `agent-browser` skill for detailed usage.
-   **Step 3: Upload using imgup skill**
-   ```bash
-   skill: imgup
-   # Then upload each screenshot:
-   imgup -h pixhost screenshot.png  # pixhost works without API key
-   # Alternative hosts: catbox, imagebin, beeimg
-   ```
-   **What to capture:**
-   - **New screens**: Screenshot of the new UI
-   - **Modified screens**: Before AND after screenshots
-   - **Design implementation**: Screenshot showing Figma design match
-   **IMPORTANT**: Always include uploaded image URLs in PR description. This provides visual context for reviewers and documents the change.
-3. **Create Pull Request**
-   ```bash
-   git push -u origin feature-branch-name
-   gh pr create --title "Feature: [Description]" --body "$(cat <<'EOF'
-   ## Summary
-   - What was built
-   - Why it was needed
-   - Key decisions made
-   ## Testing
-   - Tests added/modified
-   - Manual testing performed
-   ## Post-Deploy Monitoring & Validation
-   - **What to monitor/search**
-     - Logs:
-     - Metrics/Dashboards:
-   - **Validation checks (queries/commands)**
-     - `command or query here`
-   - **Expected healthy behavior**
-     - Expected signal(s)
-   - **Failure signal(s) / rollback trigger**
-     - Trigger + immediate action
-   - **Validation window & owner**
-     - Window:
-     - Owner:
-   - **If no operational impact**
-     - `No additional operational monitoring required: <reason>`
-   ## Before / After Screenshots
-   | Before | After |
-   |--------|-------|
-   | ![before](URL) | ![after](URL) |
-   ## Figma Design
-   [Link if applicable]
-   ---
-   [![Systematic v[VERSION]](https://img.shields.io/badge/Systematic-v[VERSION]-6366f1)](https://github.com/marcusrbrown/systematic)
-   🤖 Generated with [MODEL] ([CONTEXT] context, [THINKING]) via [HARNESS](HARNESS_URL)
-   EOF
-   )"
-   ```
-4. **Update Plan Status**
-   If the input document has YAML frontmatter with a `status` field, update it to `completed`:
-   ```
-   status: active  →  status: completed
-   ```
-5. **Notify User**
-   - Summarize what was completed
-   - Link to PR
-   - Note any follow-up work needed
-   - Suggest next steps if applicable
+When all Phase 2 tasks are complete and execution transitions to quality check, read `references/shipping-workflow.md` for the full shipping workflow: quality checks, code review, final validation, PR creation, and notification.
 ---
-## Swarm Mode with Agent Teams (Optional)
-For genuinely large plans where agents need to communicate with each other, challenge approaches, or coordinate across 10+ tasks with persistent specialized roles, use agent team capabilities if available (e.g., Agent Teams in OpenCode, multi-agent workflows in Codex).
-**Agent teams are typically experimental and require opt-in.** Do not attempt to use agent teams unless the user explicitly requests swarm mode or agent teams, and the platform supports it.
+## Codex Delegation Mode
-### When to Use Agent Teams vs Subagents
-| Agent Teams | Subagents (standard mode) |
-|-------------|---------------------------|
-| Agents need to discuss and challenge each other's approaches | Each task is independent — only the result matters |
-| Persistent specialized roles (e.g., dedicated tester running continuously) | Workers report back and finish |
-| 10+ tasks with complex cross-cutting coordination | 3-8 tasks with clear dependency chains |
-| User explicitly requests "swarm mode" or "agent teams" | Default for most plans |
-Most plans should use subagent dispatch from standard mode. Agent teams add significant token cost and coordination overhead — use them when the inter-agent communication genuinely improves the outcome.
-### Agent Teams Workflow
-1. **Create team** — use your available team creation mechanism
-2. **Create task list** — parse Implementation Units into tasks with dependency relationships
-3. **Spawn teammates** — assign specialized roles (implementer, tester, reviewer) based on the plan's needs. Give each teammate the plan file path and their specific task assignments
-4. **Coordinate** — the lead monitors task completion, reassigns work if someone gets stuck, and spawns additional workers as phases unblock
-5. **Cleanup** — shut down all teammates, then clean up the team resources
----
-## External Delegate Mode (Optional)
-For plans where token conservation matters, delegate code implementation to an external delegate (currently Codex CLI) while keeping planning, review, and git operations in the current agent.
-This mode integrates with the existing Phase 1 Step 4 strategy selection as a **task-level modifier** - the strategy (inline/serial/parallel) still applies, but the implementation step within each tagged task delegates to the external tool instead of executing directly.
-### When to Use External Delegation
-| External Delegation | Standard Mode |
-|---------------------|---------------|
-| Task is pure code implementation | Task requires research or exploration |
-| Plan has clear acceptance criteria | Task is ambiguous or needs iteration |
-| Token conservation matters (e.g., Max20 plan) | Unlimited plan or small task |
-| Files to change are well-scoped | Changes span many interconnected files |
-### Enabling External Delegation
-External delegation activates when any of these conditions are met:
-- The user says "use codex for this work", "delegate to codex", or "delegate mode"
-- A plan implementation unit contains `Execution target: external-delegate` in its Execution note (set by ce:plan)
-The specific delegate tool is resolved at execution time. Currently the only supported delegate is Codex CLI. Future delegates can be added without changing plan files.
-### Environment Guard
-Before attempting delegation, check whether the current agent is already running inside a delegate's sandbox. Delegation from within a sandbox will fail silently or recurse.
-Check for known sandbox indicators:
-- `CODEX_SANDBOX` environment variable is set
-- `CODEX_SESSION_ID` environment variable is set
-- The filesystem is read-only at `.git/` (Codex sandbox blocks git writes)
-If any indicator is detected, print "Already running inside a delegate sandbox - using standard mode." and proceed with standard execution for that task.
-### External Delegation Workflow
-When external delegation is active, follow this workflow for each tagged task. Do not skip delegation because a task seems "small", "simple", or "faster inline". The user or plan explicitly requested delegation.
-1. **Check availability**
-   Verify the delegate CLI is installed. If not found, print "Delegate CLI not installed - continuing with standard mode." and proceed normally.
-2. **Build prompt** — For each task, assemble a prompt from the plan's implementation unit (Goal, Files, Approach, Conventions from project AGENTS.md/AGENTS.md). Include rules: no git commits, no PRs, run `git status` and `git diff --stat` when done. Never embed credentials or tokens in the prompt - pass auth through environment variables.
-3. **Write prompt to file** — Save the assembled prompt to a unique temporary file to avoid shell quoting issues and cross-task races. Use a unique filename per task.
-4. **Delegate** — Run the delegate CLI, piping the prompt file via stdin (not argv expansion, which hits `ARG_MAX` on large prompts). Omit the model flag to use the delegate's default model, which stays current without manual updates.
-5. **Review diff** — After the delegate finishes, verify the diff is non-empty and in-scope. Run the project's test/lint commands. If the diff is empty or out-of-scope, fall back to standard mode for that task.
-6. **Commit** — The current agent handles all git operations. The delegate's sandbox blocks `.git/index.lock` writes, so the delegate cannot commit. Stage changes and commit with a conventional message.
-7. **Error handling** — On any delegate failure (rate limit, error, empty diff), fall back to standard mode for that task. Track consecutive failures - after 3 consecutive failures, disable delegation for remaining tasks and print "Delegate disabled after 3 consecutive failures - completing remaining tasks in standard mode."
-### Mixed-Model Attribution
-When some tasks are executed by the delegate and others by the current agent, use the following attribution in Phase 4:
-- If all tasks used the delegate: attribute to the delegate model
-- If all tasks used standard mode: attribute to the current agent's model
-- If mixed: use `Generated with [CURRENT_MODEL] + [DELEGATE_MODEL] via [HARNESS]` and note which tasks were delegated in the PR description
+When `delegation_active` is true after argument parsing, read `references/codex-delegation-workflow.md` for the complete delegation workflow: pre-checks, batching, prompt template, execution loop, and result classification.
 ---
@@ -515,7 +397,7 @@ When some tasks are executed by the delegate and others by the current agent, us
 - Follow existing patterns
 - Write tests for new code
 - Run linting before pushing
-- Use reviewer agents for complex/risky changes only
+- Review every change — inline for simple additive work, full review for everything else
 ### Ship Complete Features
@@ -523,34 +405,6 @@ When some tasks are executed by the delegate and others by the current agent, us
 - Don't leave features 80% done
 - A finished feature that ships beats a perfect feature that doesn't
-## Quality Checklist
-Before creating PR, verify:
-- [ ] All clarifying questions asked and answered
-- [ ] All tasks marked completed
-- [ ] Tests pass (run project's test command)
-- [ ] Linting passes (use linting-agent)
-- [ ] Code follows existing patterns
-- [ ] Figma designs match implementation (if applicable)
-- [ ] Before/after screenshots captured and uploaded (for UI changes)
-- [ ] Commit messages follow conventional format
-- [ ] PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
-- [ ] PR description includes summary, testing notes, and screenshots
-- [ ] PR description includes Compound Engineered badge with accurate model, harness, and version
-## When to Use Reviewer Agents
-**Don't use by default.** Use reviewer agents only when:
-- Large refactor affecting many files (10+)
-- Security-sensitive changes (authentication, permissions, data access)
-- Performance-critical code paths
-- Complex algorithms or business logic
-- User explicitly requests thorough review
-For most features: tests + linting + following patterns is sufficient.
 ## Common Pitfalls to Avoid
 - **Analysis paralysis** - Don't overthink, read the plan and execute
@@ -559,5 +413,4 @@ For most features: tests + linting + following patterns is sufficient.
 - **Testing at the end** - Test continuously or suffer later
 - **Forgetting to track progress** - Update task status as you go or lose track of what's done
 - **80% done syndrome** - Finish the feature, don't move on early
-- **Over-reviewing simple changes** - Save reviewer agents for complex work
+- **Skipping review** - Every change gets reviewed; only the depth varies
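
The removed External Delegate Mode section above describes two mechanical rules: an environment guard that checks for sandbox indicators before delegating, and a fallback that disables delegation after 3 consecutive failures. A minimal Python sketch of both follows. The function and class names are hypothetical illustrations, not part of the package, and the `os.access` check is only an approximation of "the filesystem is read-only at `.git/`".

```python
import os
from dataclasses import dataclass

# Indicator names taken from the removed "Environment Guard" text.
SANDBOX_ENV_VARS = ("CODEX_SANDBOX", "CODEX_SESSION_ID")
# Threshold taken from the removed "Error handling" step.
MAX_CONSECUTIVE_FAILURES = 3


def inside_delegate_sandbox(env, git_dir=".git"):
    """Return True if any sandbox indicator is present (hypothetical helper)."""
    # A variable counts as "set" even when its value is empty.
    if any(name in env for name in SANDBOX_ENV_VARS):
        return True
    # Approximation of the read-only .git/ indicator; skipped if .git/ is absent.
    if os.path.isdir(git_dir) and not os.access(git_dir, os.W_OK):
        return True
    return False


@dataclass
class DelegationState:
    """Tracks consecutive delegate failures (hypothetical helper)."""

    consecutive_failures: int = 0
    enabled: bool = True

    def record(self, succeeded: bool) -> None:
        # A success breaks the streak; only consecutive failures count.
        if succeeded:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
            self.enabled = False


if __name__ == "__main__":
    if inside_delegate_sandbox(os.environ):
        print("Already running inside a delegate sandbox - using standard mode.")
    state = DelegationState()
    for outcome in (False, False, False):
        state.record(outcome)
    if not state.enabled:
        print(
            "Delegate disabled after 3 consecutive failures - "
            "completing remaining tasks in standard mode."
        )
```

Resetting the counter on success is an assumption drawn from the word "consecutive"; the removed text does not spell out the reset behavior.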

package/skills/dhh-rails-style/SKILL.md CHANGED

@@ -57,12 +57,12 @@ What are you working on?
 | Response | Reference to Read |
 |----------|-------------------|
-| 1, controller | [controllers.md](./references/controllers.md) |
-| 2, model | [models.md](./references/models.md) |
-| 3, view, frontend, turbo, stimulus, css | [frontend.md](./references/frontend.md) |
-| 4, architecture, routing, auth, job, cache | [architecture.md](./references/architecture.md) |
-| 5, test, testing, minitest, fixture | [testing.md](./references/testing.md) |
-| 6, gem, dependency, library | [gems.md](./references/gems.md) |
+| 1, controller | `references/controllers.md` |
+| 2, model | `references/models.md` |
+| 3, view, frontend, turbo, stimulus, css | `references/frontend.md` |
+| 4, architecture, routing, auth, job, cache | `references/architecture.md` |
+| 5, test, testing, minitest, fixture | `references/testing.md` |
+| 6, gem, dependency, library | `references/gems.md` |
 | 7, review | Read all references, then review code |
 | 8, general task | Read relevant references based on context |
@@ -153,12 +153,12 @@ All detailed patterns in `references/`:
 | File | Topics |
 |------|--------|
-| [controllers.md](./references/controllers.md) | REST mapping, concerns, Turbo responses, API patterns, HTTP caching |
-| [models.md](./references/models.md) | Concerns, state records, callbacks, scopes, POROs, authorization, broadcasting |
-| [frontend.md](./references/frontend.md) | Turbo Streams, Stimulus controllers, CSS layers, OKLCH colors, partials |
-| [architecture.md](./references/architecture.md) | Routing, authentication, jobs, Current attributes, caching, database patterns |
-| [testing.md](./references/testing.md) | Minitest, fixtures, unit/integration/system tests, testing patterns |
-| [gems.md](./references/gems.md) | What they use vs avoid, decision framework, Gemfile examples |
+| `references/controllers.md` | REST mapping, concerns, Turbo responses, API patterns, HTTP caching |
+| `references/models.md` | Concerns, state records, callbacks, scopes, POROs, authorization, broadcasting |
+| `references/frontend.md` | Turbo Streams, Stimulus controllers, CSS layers, OKLCH colors, partials |
+| `references/architecture.md` | Routing, authentication, jobs, Current attributes, caching, database patterns |
+| `references/testing.md` | Minitest, fixtures, unit/integration/system tests, testing patterns |
+| `references/gems.md` | What they use vs avoid, decision framework, Gemfile examples |
 </reference_index>
 <success_criteria>
@@ -183,3 +183,4 @@ Based on [The Unofficial 37signals/DHH Rails Style Guide](https://github.com/mar
 - Code examples from Fizzy are licensed under the O'Saasy License
 - Not affiliated with or endorsed by 37signals
 </credits>