@fro.bot/systematic 2.3.2 → 2.4.0

Files changed (71)
  1. package/README.md +12 -13
  2. package/agents/design/design-implementation-reviewer.md +2 -19
  3. package/agents/design/design-iterator.md +2 -31
  4. package/agents/design/figma-design-sync.md +2 -22
  5. package/agents/docs/ankane-readme-writer.md +2 -19
  6. package/agents/document-review/adversarial-document-reviewer.md +3 -2
  7. package/agents/document-review/coherence-reviewer.md +5 -7
  8. package/agents/document-review/design-lens-reviewer.md +3 -4
  9. package/agents/document-review/feasibility-reviewer.md +3 -4
  10. package/agents/document-review/product-lens-reviewer.md +25 -6
  11. package/agents/document-review/scope-guardian-reviewer.md +3 -4
  12. package/agents/document-review/security-lens-reviewer.md +3 -4
  13. package/agents/research/best-practices-researcher.md +4 -21
  14. package/agents/research/framework-docs-researcher.md +2 -19
  15. package/agents/research/git-history-analyzer.md +2 -19
  16. package/agents/research/issue-intelligence-analyst.md +2 -24
  17. package/agents/research/learnings-researcher.md +7 -28
  18. package/agents/research/repo-research-analyst.md +3 -32
  19. package/agents/research/slack-researcher.md +128 -0
  20. package/agents/review/agent-native-reviewer.md +109 -195
  21. package/agents/review/architecture-strategist.md +3 -19
  22. package/agents/review/cli-agent-readiness-reviewer.md +1 -27
  23. package/agents/review/code-simplicity-reviewer.md +5 -19
  24. package/agents/review/data-integrity-guardian.md +3 -19
  25. package/agents/review/data-migration-expert.md +3 -19
  26. package/agents/review/deployment-verification-agent.md +3 -19
  27. package/agents/review/pattern-recognition-specialist.md +4 -20
  28. package/agents/review/performance-oracle.md +3 -31
  29. package/agents/review/project-standards-reviewer.md +5 -5
  30. package/agents/review/schema-drift-detector.md +3 -19
  31. package/agents/review/security-sentinel.md +3 -25
  32. package/agents/review/testing-reviewer.md +3 -3
  33. package/agents/workflow/pr-comment-resolver.md +54 -22
  34. package/agents/workflow/spec-flow-analyzer.md +2 -25
  35. package/package.json +1 -1
  36. package/skills/agent-native-architecture/SKILL.md +28 -27
  37. package/skills/agent-native-architecture/references/agent-execution-patterns.md +3 -3
  38. package/skills/agent-native-architecture/references/agent-native-testing.md +1 -1
  39. package/skills/agent-native-architecture/references/mobile-patterns.md +1 -1
  40. package/skills/andrew-kane-gem-writer/SKILL.md +5 -5
  41. package/skills/ce-brainstorm/SKILL.md +43 -181
  42. package/skills/ce-compound/SKILL.md +143 -89
  43. package/skills/ce-compound-refresh/SKILL.md +48 -5
  44. package/skills/ce-ideate/SKILL.md +27 -242
  45. package/skills/ce-plan/SKILL.md +165 -81
  46. package/skills/ce-review/SKILL.md +348 -125
  47. package/skills/ce-review/references/findings-schema.json +5 -0
  48. package/skills/ce-review/references/persona-catalog.md +2 -2
  49. package/skills/ce-review/references/resolve-base.sh +5 -2
  50. package/skills/ce-review/references/subagent-template.md +25 -3
  51. package/skills/ce-work/SKILL.md +95 -242
  52. package/skills/ce-work-beta/SKILL.md +154 -301
  53. package/skills/dhh-rails-style/SKILL.md +13 -12
  54. package/skills/document-review/SKILL.md +56 -109
  55. package/skills/document-review/references/findings-schema.json +0 -23
  56. package/skills/document-review/references/subagent-template.md +13 -18
  57. package/skills/dspy-ruby/SKILL.md +8 -8
  58. package/skills/every-style-editor/SKILL.md +3 -2
  59. package/skills/frontend-design/SKILL.md +2 -3
  60. package/skills/git-commit/SKILL.md +1 -1
  61. package/skills/git-commit-push-pr/SKILL.md +81 -265
  62. package/skills/git-worktree/SKILL.md +20 -21
  63. package/skills/lfg/SKILL.md +10 -17
  64. package/skills/onboarding/SKILL.md +2 -2
  65. package/skills/onboarding/scripts/inventory.mjs +31 -7
  66. package/skills/proof/SKILL.md +134 -28
  67. package/skills/resolve-pr-feedback/SKILL.md +7 -2
  68. package/skills/setup/SKILL.md +1 -1
  69. package/skills/test-browser/SKILL.md +10 -11
  70. package/skills/test-xcode/SKILL.md +6 -3
  71. package/dist/lib/manifest.d.ts +0 -39
@@ -1,27 +1,103 @@
1
1
  ---
2
2
  name: ce:work-beta
3
- description: '[BETA] Execute work plans with external delegate support. Same as ce:work but includes experimental Codex delegation mode for token-conserving code implementation.'
4
- argument-hint: '[plan file, specification, or todo file path]'
3
+ description: "[BETA] Execute work with external delegate support. Same as ce:work but includes experimental Codex delegation mode for token-conserving code implementation."
5
4
  disable-model-invocation: true
5
+ argument-hint: "[Plan doc path or description of work. Blank to auto use latest plan doc] [delegate:codex]"
6
6
  ---
7
7
 
8
- # Work Plan Execution Command
8
+ # Work Execution Command
9
9
 
10
- Execute a work plan efficiently while maintaining quality and finishing features.
10
+ Execute work efficiently while maintaining quality and finishing features.
11
11
 
12
12
  ## Introduction
13
13
 
14
- This command takes a work document (plan, specification, or todo file) and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
14
+ This command takes a work document (plan, specification, or todo file) or a bare prompt describing the work, and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
15
+
16
+ **Beta rollout note:** Invoke `ce:work-beta` manually when you want to trial Codex delegation. During the beta period, planning and workflow handoffs remain pointed at stable `ce:work` to avoid dual-path orchestration complexity.
15
17
 
16
18
  ## Input Document
17
19
 
18
20
  <input_document> #$ARGUMENTS </input_document>
19
21
 
22
+ ## Argument Parsing
23
+
24
+ Parse `$ARGUMENTS` for the following optional tokens. Strip each recognized token before interpreting the remainder as the plan file path or bare prompt.
25
+
26
+ | Token | Example | Effect |
27
+ |-------|---------|--------|
28
+ | `delegate:codex` | `delegate:codex` | Activate Codex delegation mode for plan execution |
29
+ | `delegate:local` | `delegate:local` | Deactivate delegation even if enabled in config |
30
+
31
+ All tokens are optional. When absent, fall back to the resolution chain below.
32
+
33
+ **Fuzzy activation:** Also recognize imperative delegation-intent phrases such as "use codex", "delegate to codex", "codex mode", or "delegate mode" as equivalent to `delegate:codex`. A bare mention of "codex" in a prompt (e.g., "fix codex converter bugs") must NOT activate delegation -- only clear delegation intent triggers it.
34
+
35
+ **Fuzzy deactivation:** Also recognize phrases such as "no codex", "local mode", "standard mode" as equivalent to `delegate:local`.
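The token and phrase rules above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `parse_delegate_tokens`, the choice to let deactivation phrases win over activation phrases, and the choice to leave fuzzy phrases in the remainder are all assumptions. Only the token names and phrase lists come from the text.

```python
import re
from typing import Optional, Tuple

# Phrase lists copied from the skill text; the matching strategy is an assumption.
ACTIVATE_PHRASES = ("use codex", "delegate to codex", "codex mode", "delegate mode")
DEACTIVATE_PHRASES = ("no codex", "local mode", "standard mode")


def _has_phrase(text: str, phrase: str) -> bool:
    """True when the phrase appears as whole words in the lowercased text."""
    return re.search(rf"\b{re.escape(phrase)}\b", text) is not None


def parse_delegate_tokens(arguments: str) -> Tuple[Optional[str], str]:
    """Return (flag, remainder); flag is 'codex', 'local', or None (hypothetical helper)."""
    flag = None
    remainder = arguments

    # Explicit tokens are stripped so the remainder is only the plan path or prompt.
    for token, value in (("delegate:codex", "codex"), ("delegate:local", "local")):
        pattern = re.compile(rf"(?<!\S){re.escape(token)}(?!\S)")
        if pattern.search(remainder):
            flag = value
            remainder = pattern.sub(" ", remainder)

    # Fuzzy phrases are consulted only when no explicit token was given (assumption).
    if flag is None:
        lowered = remainder.lower()
        if any(_has_phrase(lowered, p) for p in DEACTIVATE_PHRASES):
            flag = "local"
        elif any(_has_phrase(lowered, p) for p in ACTIVATE_PHRASES):
            flag = "codex"

    # Collapse the whitespace left behind by stripped tokens.
    return flag, " ".join(remainder.split())
```

A bare mention such as "fix codex converter bugs" matches none of the phrases, so the flag stays `None` and resolution falls through to the config file.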
36
+
37
+ ### Settings Resolution Chain
38
+
39
+ After extracting tokens from arguments, resolve the delegation state using this precedence chain:
40
+
41
+ 1. **Argument flag** -- `delegate:codex` or `delegate:local` from the current invocation (highest priority)
42
+ 2. **Config file** -- extract settings from the config block below. Value `codex` for `work_delegate` activates delegation; `false` deactivates.
43
+ 3. **Hard default** -- `false` (delegation off)
44
+
45
+ **Config (pre-resolved):**
46
+ !`cat "$(git rev-parse --show-toplevel 2>/dev/null)/.systematic/config.local.yaml" 2>/dev/null || cat "$(dirname "$(git rev-parse --path-format=absolute --git-common-dir 2>/dev/null)")/.systematic/config.local.yaml" 2>/dev/null || echo '__NO_CONFIG__'`
47
+
48
+ If the block above contains YAML key-value pairs, extract values for the keys listed below.
49
+ If it shows `__NO_CONFIG__`, the file does not exist — all settings fall through to defaults.
50
+ If it shows an unresolved command string, read `.systematic/config.local.yaml` from the repo root using the native file-read tool (e.g., Read in OpenCode, read_file in Codex). If the file does not exist, all settings fall through to defaults.
51
+
52
+ If any setting has an unrecognized value, fall through to the hard default for that setting.
53
+
54
+ Config keys:
55
+ - `work_delegate` -- `codex` or default `false`
56
+ - `work_delegate_consent` -- `true` or default `false`
57
+ - `work_delegate_sandbox` -- `yolo` (default) or `full-auto`
58
+ - `work_delegate_decision` -- `auto` (default) or `ask`
59
+ - `work_delegate_model` -- Codex model to use (default `gpt-5.4`). Passthrough — any valid model name accepted.
60
+ - `work_delegate_effort` -- `minimal`, `low`, `medium`, `high` (default), or `xhigh`
61
+
62
+ Store the resolved state for downstream consumption:
63
+ - `delegation_active` -- boolean, whether delegation mode is on
64
+ - `delegation_source` -- `argument` or `config` or `default` -- how delegation was resolved (used by environment guard to decide notification verbosity)
65
+ - `sandbox_mode` -- `yolo` or `full-auto` (from config or default `yolo`)
66
+ - `consent_granted` -- boolean (from config `work_delegate_consent`)
67
+ - `delegate_model` -- string (from config or default `gpt-5.4`)
68
+ - `delegate_effort` -- string (from config or default `high`)
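One way to picture the precedence chain and the stored state is the sketch below. It assumes the config file has already been parsed into a plain dict (with YAML booleans as Python `bool`), and the names `resolve_delegation` and `_pick` are hypothetical. The keys, allowed values, and defaults are taken from the lists above.

```python
from typing import Any, Dict, Optional

SANDBOX_MODES = ("yolo", "full-auto")
EFFORT_LEVELS = ("minimal", "low", "medium", "high", "xhigh")


def _pick(config: Dict[str, Any], key: str, allowed: tuple, default: str) -> str:
    """Return the configured value, or the hard default when it is missing or unrecognized."""
    value = config.get(key)
    return value if value in allowed else default


def resolve_delegation(arg_flag: Optional[str], config: Dict[str, Any]) -> Dict[str, Any]:
    """arg_flag is 'codex', 'local', or None; config is {} when no file exists."""
    delegate = config.get("work_delegate")
    if arg_flag == "codex":            # 1. argument flag (highest priority)
        active, source = True, "argument"
    elif arg_flag == "local":
        active, source = False, "argument"
    elif delegate == "codex":          # 2. config file
        active, source = True, "config"
    elif delegate is False:
        active, source = False, "config"
    else:                              # 3. hard default, also used for unrecognized values
        active, source = False, "default"

    model = config.get("work_delegate_model")
    if not isinstance(model, str) or not model:
        model = "gpt-5.4"              # passthrough otherwise: any model name is accepted

    return {
        "delegation_active": active,
        "delegation_source": source,
        "sandbox_mode": _pick(config, "work_delegate_sandbox", SANDBOX_MODES, "yolo"),
        "consent_granted": config.get("work_delegate_consent") is True,
        "delegate_model": model,
        "delegate_effort": _pick(config, "work_delegate_effort", EFFORT_LEVELS, "high"),
    }
```

For example, `resolve_delegation("local", {"work_delegate": "codex"})` reports delegation as off with source `argument`, because the invocation flag outranks the config file.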
69
+
70
+ ---
71
+
20
72
  ## Execution Workflow
21
73
 
74
+ ### Phase 0: Input Triage
75
+
76
+ Determine how to proceed based on what was provided in `<input_document>`.
77
+
78
+ **Plan document** (input is a file path to an existing plan, specification, or todo file) → skip to Phase 1.
79
+
80
+ **Bare prompt** (input is a description of work, not a file path):
81
+
82
+ 1. **Scan the work area**
83
+
84
+ - Identify files likely to change based on the prompt
85
+ - Find existing test files for those areas (search for test/spec files that import, reference, or share names with the implementation files)
86
+ - Note local patterns and conventions in the affected areas
87
+
88
+ 2. **Assess complexity and route**
89
+
90
+ | Complexity | Signals | Action |
91
+ |-----------|---------|--------|
92
+ | **Trivial** | 1-2 files, no behavioral change (typo, config, rename) | Proceed to Phase 1 step 2 (environment setup), then implement directly — no task list, no execution loop. Apply Test Discovery if the change touches behavior-bearing code |
93
+ | **Small / Medium** | Clear scope, under ~10 files | Build a task list from discovery. Proceed to Phase 1 step 2 |
94
+ | **Large** | Cross-cutting, architectural decisions, 10+ files, touches auth/payments/migrations | Inform the user this would benefit from `/ce:brainstorm` or `/ce:plan` to surface edge cases and scope boundaries. Honor their choice. If proceeding, build a task list and continue to Phase 1 step 2 |
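The routing table is a judgment call for the agent, but its signals can be approximated in code. The sketch below is only a rough encoding under stated assumptions: the function `route_complexity`, its parameters, and the idea of passing in a tuple of touched areas are hypothetical, and "architectural decisions" is folded into the `cross_cutting` flag.

```python
from typing import Iterable

# Areas the table names as signals of a Large change.
SENSITIVE_AREAS = ("auth", "payments", "migrations")


def route_complexity(
    file_count: int,
    behavioral_change: bool,
    cross_cutting: bool = False,
    touched_areas: Iterable[str] = (),
) -> str:
    """Map the table's signals to 'trivial', 'small_medium', or 'large' (hypothetical helper)."""
    # Large signals are checked first so a small edit to auth code is not treated as trivial.
    if cross_cutting or file_count >= 10 or any(a in SENSITIVE_AREAS for a in touched_areas):
        return "large"
    # Trivial requires both a tiny footprint and no behavioral change.
    if file_count <= 2 and not behavioral_change:
        return "trivial"
    return "small_medium"
```

Checking the Large signals before the Trivial ones is a design choice of this sketch; the table itself does not state an evaluation order.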
95
+
96
+ ---
97
+
22
98
  ### Phase 1: Quick Start
23
99
 
24
- 1. **Read Plan and Clarify**
100
+ 1. **Read Plan and Clarify** _(skip if arriving from Phase 0 with a bare prompt)_
25
101
 
26
102
  - Read the work document completely
27
103
  - Treat the plan as a decision artifact, not an execution script
@@ -50,8 +126,17 @@ This command takes a work document (plan, specification, or todo file) and execu
50
126
  ```
51
127
 
52
128
  **If already on a feature branch** (not the default branch):
53
- - Ask: "Continue working on `[current_branch]`, or create a new branch?"
54
- - If continuing, proceed to step 3
129
+
130
+ First, check whether the branch name is **meaningful** — a name like `feat/crowd-sniff` or `fix/email-validation` tells future readers what the work is about. Auto-generated worktree names (e.g., `worktree-jolly-beaming-raven`) or other opaque names do not.
131
+
132
+ If the branch name is meaningless or auto-generated, suggest renaming it before continuing:
133
+ ```bash
134
+ git branch -m <meaningful-name>
135
+ ```
136
+ Derive the new name from the plan title or work description (e.g., `feat/crowd-sniff`). Present the rename as a recommended option alongside continuing as-is.
137
+
138
+ Then ask: "Continue working on `[current_branch]`, or create a new branch?"
139
+ - If continuing (with or without rename), proceed to step 3
55
140
  - If creating new, follow Option A or B below
56
141
 
57
142
  **If on the default branch**, choose how to proceed:
@@ -79,7 +164,7 @@ This command takes a work document (plan, specification, or todo file) and execu
79
164
  - You want to keep the default branch clean while experimenting
80
165
  - You plan to switch between branches frequently
81
166
 
82
- 3. **Create Todo List**
167
+ 3. **Create Todo List** _(skip if Phase 0 already built one, or if Phase 0 routed as Trivial)_
83
168
  - Use your available task tracking tool (e.g., todowrite, task lists) to break the plan into actionable tasks
84
169
  - Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
85
170
  - Carry each unit's `Execution note` into the task when present
@@ -93,22 +178,50 @@ This command takes a work document (plan, specification, or todo file) and execu
93
178
 
94
179
  4. **Choose Execution Strategy**
95
180
 
181
+ **Delegation routing gate:** If `delegation_active` is true AND the input is a plan file (not a bare prompt), read `references/codex-delegation-workflow.md` and follow its Pre-Delegation Checks and Delegation Decision flow. If all checks pass and delegation proceeds, force **serial execution** and proceed directly to Phase 2 using the workflow's batched execution loop. If any check disables delegation, fall through to the standard strategy table below. If delegation is active but the input is a bare prompt (no plan file), set `delegation_active` to false with a brief note: "Codex delegation requires a plan file -- using standard mode." and continue with the standard strategy selection below.
182
+
96
183
  After creating the task list, decide how to execute based on the plan's size and dependency structure:
97
184
 
98
185
  | Strategy | When to use |
99
186
  |----------|-------------|
100
- | **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight |
101
- | **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks |
102
- | **Parallel subagents** | 3+ tasks where some units have no shared dependencies and touch non-overlapping files. Dispatch independent units simultaneously, run dependent units after their prerequisites complete |
187
+ | **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight. **Default for bare-prompt work** — bare prompts rarely produce enough structured context to justify subagent dispatch |
188
+ | **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks. Requires plan-unit metadata (Goal, Files, Approach, Test scenarios) |
189
+ | **Parallel subagents** | 3+ tasks that pass the Parallel Safety Check (below). Dispatch independent units simultaneously, run dependent units after their prerequisites complete. Requires plan-unit metadata |
190
+
191
+ **Parallel Safety Check** — required before choosing parallel dispatch:
192
+
193
+ 1. Build a file-to-unit mapping from every candidate unit's `Files:` section (Create, Modify, and Test paths)
194
+ 2. Check for intersection — any file path appearing in 2+ units means overlap
195
+ 3. If any overlap is found, downgrade to serial subagents. Log the reason (e.g., "Units 2 and 4 share `config/routes.rb` — using serial dispatch"). Serial subagents still provide context-window isolation without shared-directory risks
196
+
197
+ Even with no file overlap, parallel subagents sharing a working directory face git index contention (concurrent staging/committing corrupts the index) and test interference (concurrent test runs pick up each other's in-progress changes). The parallel subagent constraints below mitigate these.
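The three steps of the Parallel Safety Check amount to inverting a unit-to-files mapping and looking for paths with more than one owner. A minimal sketch, assuming each unit's `Files:` section has already been parsed into normalized repo-relative paths (the function names and the input shape are hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def find_file_overlaps(units: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return {path: [unit names]} for every path listed by two or more units."""
    owners = defaultdict(set)
    for unit, paths in units.items():
        for path in paths:  # Create, Modify, and Test paths all count
            owners[path].add(unit)
    return {path: sorted(names) for path, names in owners.items() if len(names) >= 2}


def choose_dispatch(units: Dict[str, Iterable[str]]) -> Tuple[str, Optional[str]]:
    """Downgrade to serial dispatch on any overlap and return a loggable reason."""
    overlaps = find_file_overlaps(units)
    if overlaps:
        path, names = sorted(overlaps.items())[0]
        reason = f"Units {' and '.join(names)} share `{path}`, using serial dispatch"
        return "serial", reason
    return "parallel", None
```

With units `{"2": ["config/routes.rb"], "4": ["config/routes.rb"]}` the check returns serial dispatch and a reason naming both units. A clean result only clears the declared file lists; it says nothing about the index contention and test interference described above.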
103
198
 
104
199
  **Subagent dispatch** uses your available subagent or task spawning mechanism. For each unit, give the subagent:
105
200
  - The full plan file path (for overall context)
106
201
  - The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
107
202
  - Any resolved deferred questions relevant to that unit
203
+ - Instruction to check whether the unit's test scenarios cover all applicable categories (happy paths, edge cases, error paths, integration) and supplement gaps before writing tests
108
204
 
109
- After each subagent completes, update the plan checkboxes and task list before dispatching the next dependent unit.
205
+ **Parallel subagent constraints** when dispatching units in parallel (not serial or inline):
206
+ - Instruct each subagent: "Do not stage files (`git add`), create commits, or run the project test suite. The orchestrator handles testing, staging, and committing after all parallel units complete."
207
+ - These constraints prevent git index contention and test interference between concurrent subagents
110
208
 
111
- For genuinely large plans needing persistent inter-agent communication (agents challenging each other's approaches, shared coordination across 10+ tasks), see Swarm Mode below which uses Agent Teams.
209
+ **Permission mode:** Omit the `mode` parameter when dispatching subagents so the user's configured permission settings apply. Do not pass `mode: "auto"`, because it overrides user-level settings like `bypassPermissions`.
210
+
211
+ **After each subagent completes (serial mode):**
212
+ 1. Review the subagent's diff — verify changes match the unit's scope and `Files:` list
213
+ 2. Run the relevant test suite to confirm the tree is healthy
214
+ 3. If tests fail, diagnose and fix before proceeding — do not dispatch dependent units on a broken tree
215
+ 4. Update the plan checkboxes and task list
216
+ 5. Dispatch the next unit
217
+
218
+ **After all parallel subagents in a batch complete:**
219
+ 1. Wait for every subagent in the current parallel batch to finish before acting on any of their results
220
+ 2. Cross-check for discovered file collisions: compare the actual files modified by all subagents in the batch (not just their declared `Files:` lists). Subagents may create or modify files not anticipated during planning — this is expected, since plans describe *what* not *how*. A collision only matters when 2+ subagents in the same batch modified the same file. In a shared working directory, only the last writer's version survives — the other unit's changes to that file are lost. If a collision is detected: commit all non-colliding files from all units first, then re-run the affected units serially for the shared file so each builds on the other's committed work
221
+ 3. For each completed unit, in dependency order: review the diff, run the relevant test suite, stage only that unit's files, and commit with a conventional message derived from the unit's Goal
222
+ 4. If tests fail after committing a unit's changes, diagnose and fix before committing the next unit
223
+ 5. Update the plan checkboxes and task list
224
+ 6. Dispatch the next batch of independent units, or the next dependent unit
112
225
 
113
226
  ### Phase 2: Execute
114
227
 
@@ -119,12 +232,16 @@ This command takes a work document (plan, specification, or todo file) and execu
119
232
  ```
120
233
  while (tasks remain):
121
234
  - Mark task as in-progress
122
- - Read any referenced files from the plan
235
+ - Read any referenced files from the plan or discovered during Phase 0
123
236
  - Look for similar patterns in codebase
124
- - Implement following existing conventions
125
- - Write tests for new functionality
237
+ - Find existing test files for implementation files being changed (Test Discovery — see below)
238
+ - If delegation_active: branch to the Codex Delegation Execution Loop
239
+ (see `references/codex-delegation-workflow.md`)
240
+ - Otherwise: implement following existing conventions
241
+ - Add, update, or remove tests to match implementation changes (see Test Discovery below)
126
242
  - Run System-Wide Test Check (see below)
127
243
  - Run tests after changes
244
+ - Assess testing coverage: did this task change behavior? If yes, were tests written or updated? If no tests were added, is the justification deliberate (e.g., pure config, no behavioral change)?
128
245
  - Mark task as completed
129
246
  - Evaluate for incremental commit (see below)
130
247
  ```
@@ -137,6 +254,17 @@ This command takes a work document (plan, specification, or todo file) and execu
137
254
  - Do not over-implement beyond the current behavior slice when working test-first
138
255
  - Skip test-first discipline for trivial renames, pure configuration, and pure styling work
139
256
 
257
+ **Test Discovery** — Before implementing changes to a file, find its existing test files (search for test/spec files that import, reference, or share naming patterns with the implementation file). When a plan specifies test scenarios or test files, start there, then check for additional test coverage the plan may not have enumerated. Changes to implementation files should be accompanied by corresponding test updates — new tests for new behavior, modified tests for changed behavior, removed or updated tests for deleted behavior.
258
+
259
+ **Test Scenario Completeness** — Before writing tests for a feature-bearing unit, check whether the plan's `Test scenarios` cover all categories that apply to this unit. If a category is missing or scenarios are vague (e.g., "validates correctly" without naming inputs and expected outcomes), supplement from the unit's own context before writing tests:
260
+
261
+ | Category | When it applies | How to derive if missing |
262
+ |----------|----------------|------------------------|
263
+ | **Happy path** | Always for feature-bearing units | Read the unit's Goal and Approach for core input/output pairs |
264
+ | **Edge cases** | When the unit has meaningful boundaries (inputs, state, concurrency) | Identify boundary values, empty/nil inputs, and concurrent access patterns |
265
+ | **Error/failure paths** | When the unit has failure modes (validation, external calls, permissions) | Enumerate invalid inputs the unit should reject, permission/auth denials it should enforce, and downstream failures it should handle |
266
+ | **Integration** | When the unit crosses layers (callbacks, middleware, multi-service) | Identify the cross-layer chain and write a scenario that exercises it without mocks |
267
+
140
268
  **System-Wide Test Check** — Before marking a task done, pause and ask:
141
269
 
142
270
  | Question | What to do |
@@ -183,6 +311,8 @@ This command takes a work document (plan, specification, or todo file) and execu
183
311
 
184
312
  **Note:** Incremental commits use clean conventional messages without attribution footers. The final Phase 4 commit/PR includes the full attribution.
185
313
 
314
+ **Parallel subagent mode:** When units run as parallel subagents, the subagents do not commit — the orchestrator handles staging and committing after the entire parallel batch completes (see Parallel subagent constraints in Phase 1 Step 4). The commit guidance in this section applies to inline and serial execution, and to the orchestrator's commit decisions after parallel batch completion.
315
+
186
316
  3. **Follow Existing Patterns**
187
317
 
188
318
  - The plan should reference similar code - read those files first
@@ -196,7 +326,7 @@ This command takes a work document (plan, specification, or todo file) and execu
196
326
  - Run relevant tests after each significant change
197
327
  - Don't wait until the end to test
198
328
  - Fix failures immediately
199
- - Add new tests for new functionality
329
+ - Add new tests for new behavior, update tests for changed behavior, remove tests for deleted behavior
200
330
  - **Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together.** If your change touches callbacks, middleware, or error handling — you need both.
201
331
 
202
332
  5. **Simplify as You Go**
@@ -230,263 +360,15 @@ This command takes a work document (plan, specification, or todo file) and execu
230
360
  - Create new tasks if scope expands
231
361
  - Keep user informed of major milestones
232
362
 
233
- ### Phase 3: Quality Check
234
-
235
- 1. **Run Core Quality Checks**
236
-
237
- Always run before submitting:
238
-
239
- ```bash
240
- # Run full test suite (use project's test command)
241
- # Examples: bin/rails test, npm test, pytest, go test, etc.
242
-
243
- # Run linting (per AGENTS.md)
244
- # Use linting-agent before pushing to origin
245
- ```
246
-
247
- 2. **Consider Code Review** (Optional)
248
-
249
- Use for complex, risky, or large changes. Load the `ce:review` skill with `mode:autofix` to fix safe issues and flag the rest before shipping.
250
-
251
- 3. **Final Validation**
252
- - All tasks marked completed
253
- - All tests pass
254
- - Linting passes
255
- - Code follows existing patterns
256
- - Figma designs match (if applicable)
257
- - No console errors or warnings
258
- - If the plan has a `Requirements Trace`, verify each requirement is satisfied by the completed work
259
- - If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution
260
-
261
- 4. **Prepare Operational Validation Plan** (REQUIRED)
262
- - Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
263
- - Include concrete:
264
- - Log queries/search terms
265
- - Metrics or dashboards to watch
266
- - Expected healthy signals
267
- - Failure signals and rollback/mitigation trigger
268
- - Validation window and owner
269
- - If there is truly no production/runtime impact, still include the section with: `No additional operational monitoring required` and a one-line reason.
270
-
271
- ### Phase 4: Ship It
272
-
273
- 1. **Create Commit**
274
-
275
- ```bash
276
- git add .
277
- git status # Review what's being committed
278
- git diff --staged # Check the changes
279
-
280
- # Commit with conventional format
281
- git commit -m "$(cat <<'EOF'
282
- feat(scope): description of what and why
283
-
284
- Brief explanation if needed.
285
-
286
- 🤖 Generated with [MODEL] via [HARNESS](HARNESS_URL) + Systematic v[VERSION]
287
-
288
- Co-Authored-By: [MODEL] ([CONTEXT] context, [THINKING]) <noreply@anthropic.com>
289
- EOF
290
- )"
291
- ```
292
-
293
- **Fill in at commit/PR time:**
294
-
295
- | Placeholder | Value | Example |
296
- |-------------|-------|---------|
297
- | Placeholder | Value | Example |
298
- |-------------|-------|---------|
299
- | `[MODEL]` | Model name | Claude Opus 4.6, GPT-5.4 |
300
- | `[CONTEXT]` | Context window (if known) | 200K, 1M |
301
- | `[THINKING]` | Thinking level (if known) | extended thinking |
302
- | `[HARNESS]` | Tool running you | OpenCode, Codex, Gemini CLI |
303
- | `[HARNESS_URL]` | Link to that tool | `https://opencode.ai` |
304
- | `[VERSION]` | `plugin.json` → `version` | 2.40.0 |
363
+ ### Phase 3-4: Quality Check and Ship It
305
364
 
306
- Subagents creating commits/PRs are equally responsible for accurate attribution.
307
-
308
- 2. **Capture and Upload Screenshots for UI Changes** (REQUIRED for any UI work)
309
-
310
- For **any** design changes, new views, or UI modifications, you MUST capture and upload screenshots:
311
-
312
- **Step 1: Start dev server** (if not running)
313
- ```bash
314
- bin/dev # Run in background
315
- ```
316
-
317
- **Step 2: Capture screenshots with agent-browser CLI**
318
- ```bash
319
- agent-browser open http://localhost:3000/[route]
320
- agent-browser snapshot -i
321
- agent-browser screenshot output.png
322
- ```
323
- See the `agent-browser` skill for detailed usage.
324
-
325
- **Step 3: Upload using imgup skill**
326
- ```bash
327
- skill: imgup
328
- # Then upload each screenshot:
329
- imgup -h pixhost screenshot.png # pixhost works without API key
330
- # Alternative hosts: catbox, imagebin, beeimg
331
- ```
332
-
333
- **What to capture:**
334
- - **New screens**: Screenshot of the new UI
335
- - **Modified screens**: Before AND after screenshots
336
- - **Design implementation**: Screenshot showing Figma design match
337
-
338
- **IMPORTANT**: Always include uploaded image URLs in PR description. This provides visual context for reviewers and documents the change.
339
-
340
- 3. **Create Pull Request**
341
-
342
- ```bash
343
- git push -u origin feature-branch-name
344
-
345
- gh pr create --title "Feature: [Description]" --body "$(cat <<'EOF'
346
- ## Summary
347
- - What was built
348
- - Why it was needed
349
- - Key decisions made
350
-
351
- ## Testing
352
- - Tests added/modified
353
- - Manual testing performed
354
-
355
- ## Post-Deploy Monitoring & Validation
356
- - **What to monitor/search**
357
- - Logs:
358
- - Metrics/Dashboards:
359
- - **Validation checks (queries/commands)**
360
- - `command or query here`
361
- - **Expected healthy behavior**
362
- - Expected signal(s)
363
- - **Failure signal(s) / rollback trigger**
364
- - Trigger + immediate action
365
- - **Validation window & owner**
366
- - Window:
367
- - Owner:
368
- - **If no operational impact**
369
- - `No additional operational monitoring required: <reason>`
370
-
371
- ## Before / After Screenshots
372
- | Before | After |
373
- |--------|-------|
374
- | ![before](URL) | ![after](URL) |
375
-
376
- ## Figma Design
377
- [Link if applicable]
378
-
379
- ---
380
-
381
- [![Systematic v[VERSION]](https://img.shields.io/badge/Systematic-v[VERSION]-6366f1)](https://github.com/marcusrbrown/systematic)
382
- 🤖 Generated with [MODEL] ([CONTEXT] context, [THINKING]) via [HARNESS](HARNESS_URL)
383
- EOF
384
- )"
385
- ```
386
-
387
- 4. **Update Plan Status**
388
-
389
- If the input document has YAML frontmatter with a `status` field, update it to `completed`:
390
- ```
391
- status: active → status: completed
392
- ```
393
-
394
- 5. **Notify User**
395
- - Summarize what was completed
396
- - Link to PR
397
- - Note any follow-up work needed
398
- - Suggest next steps if applicable
365
+ When all Phase 2 tasks are complete and execution transitions to quality check, read `references/shipping-workflow.md` for the full shipping workflow: quality checks, code review, final validation, PR creation, and notification.
399
366
 
400
367
  ---
401
368
 
402
- ## Swarm Mode with Agent Teams (Optional)
403
-
404
- For genuinely large plans where agents need to communicate with each other, challenge approaches, or coordinate across 10+ tasks with persistent specialized roles, use agent team capabilities if available (e.g., Agent Teams in OpenCode, multi-agent workflows in Codex).
405
-
406
- **Agent teams are typically experimental and require opt-in.** Do not attempt to use agent teams unless the user explicitly requests swarm mode or agent teams, and the platform supports it.
369
+ ## Codex Delegation Mode
407
370
 
408
- ### When to Use Agent Teams vs Subagents
409
-
410
- | Agent Teams | Subagents (standard mode) |
411
- |-------------|---------------------------|
412
- | Agents need to discuss and challenge each other's approaches | Each task is independent — only the result matters |
413
- | Persistent specialized roles (e.g., dedicated tester running continuously) | Workers report back and finish |
414
- | 10+ tasks with complex cross-cutting coordination | 3-8 tasks with clear dependency chains |
415
- | User explicitly requests "swarm mode" or "agent teams" | Default for most plans |
416
-
417
- Most plans should use subagent dispatch from standard mode. Agent teams add significant token cost and coordination overhead — use them when the inter-agent communication genuinely improves the outcome.
418
-
419
- ### Agent Teams Workflow
420
-
421
- 1. **Create team** — use your available team creation mechanism
422
- 2. **Create task list** — parse Implementation Units into tasks with dependency relationships
423
- 3. **Spawn teammates** — assign specialized roles (implementer, tester, reviewer) based on the plan's needs. Give each teammate the plan file path and their specific task assignments
424
- 4. **Coordinate** — the lead monitors task completion, reassigns work if someone gets stuck, and spawns additional workers as phases unblock
425
- 5. **Cleanup** — shut down all teammates, then clean up the team resources
426
-
427
- ---
428
-
429
- ## External Delegate Mode (Optional)
430
-
431
- For plans where token conservation matters, delegate code implementation to an external delegate (currently Codex CLI) while keeping planning, review, and git operations in the current agent.
432
-
433
- This mode integrates with the existing Phase 1 Step 4 strategy selection as a **task-level modifier** - the strategy (inline/serial/parallel) still applies, but the implementation step within each tagged task delegates to the external tool instead of executing directly.
434
-
435
- ### When to Use External Delegation
436
-
437
- | External Delegation | Standard Mode |
438
- |---------------------|---------------|
439
- | Task is pure code implementation | Task requires research or exploration |
440
- | Plan has clear acceptance criteria | Task is ambiguous or needs iteration |
441
- | Token conservation matters (e.g., Max20 plan) | Unlimited plan or small task |
442
- | Files to change are well-scoped | Changes span many interconnected files |
443
-
444
- ### Enabling External Delegation
445
-
446
- External delegation activates when any of these conditions are met:
447
- - The user says "use codex for this work", "delegate to codex", or "delegate mode"
448
- - A plan implementation unit contains `Execution target: external-delegate` in its Execution note (set by ce:plan)
449
-
450
- The specific delegate tool is resolved at execution time. Currently the only supported delegate is Codex CLI. Future delegates can be added without changing plan files.
451
-
452
- ### Environment Guard
453
-
454
- Before attempting delegation, check whether the current agent is already running inside a delegate's sandbox. Delegation from within a sandbox will fail silently or recurse.
455
-
456
- Check for known sandbox indicators:
457
- - `CODEX_SANDBOX` environment variable is set
458
- - `CODEX_SESSION_ID` environment variable is set
459
- - The filesystem is read-only at `.git/` (Codex sandbox blocks git writes)
460
-
461
- If any indicator is detected, print "Already running inside a delegate sandbox - using standard mode." and proceed with standard execution for that task.
462
-
463
- ### External Delegation Workflow
464
-
465
- When external delegation is active, follow this workflow for each tagged task. Do not skip delegation because a task seems "small", "simple", or "faster inline". The user or plan explicitly requested delegation.
466
-
467
- 1. **Check availability**
468
-
469
- Verify the delegate CLI is installed. If not found, print "Delegate CLI not installed - continuing with standard mode." and proceed normally.
470
-
471
- 2. **Build prompt** — For each task, assemble a prompt from the plan's implementation unit (Goal, Files, Approach, Conventions from project AGENTS.md/AGENTS.md). Include rules: no git commits, no PRs, run `git status` and `git diff --stat` when done. Never embed credentials or tokens in the prompt - pass auth through environment variables.
472
-
473
- 3. **Write prompt to file** — Save the assembled prompt to a unique temporary file to avoid shell quoting issues and cross-task races. Use a unique filename per task.
474
-
475
- 4. **Delegate** — Run the delegate CLI, piping the prompt file via stdin (not argv expansion, which hits `ARG_MAX` on large prompts). Omit the model flag to use the delegate's default model, which stays current without manual updates.
476
-
477
- 5. **Review diff** — After the delegate finishes, verify the diff is non-empty and in-scope. Run the project's test/lint commands. If the diff is empty or out-of-scope, fall back to standard mode for that task.
478
-
479
- 6. **Commit** — The current agent handles all git operations. The delegate's sandbox blocks `.git/index.lock` writes, so the delegate cannot commit. Stage changes and commit with a conventional message.
480
-
481
- 7. **Error handling** — On any delegate failure (rate limit, error, empty diff), fall back to standard mode for that task. Track consecutive failures - after 3 consecutive failures, disable delegation for remaining tasks and print "Delegate disabled after 3 consecutive failures - completing remaining tasks in standard mode."
482
-
483
- ### Mixed-Model Attribution
484
-
485
- When some tasks are executed by the delegate and others by the current agent, use the following attribution in Phase 4:
486
-
487
- - If all tasks used the delegate: attribute to the delegate model
488
- - If all tasks used standard mode: attribute to the current agent's model
489
- - If mixed: use `Generated with [CURRENT_MODEL] + [DELEGATE_MODEL] via [HARNESS]` and note which tasks were delegated in the PR description
371
+ When `delegation_active` is true after argument parsing, read `references/codex-delegation-workflow.md` for the complete delegation workflow: pre-checks, batching, prompt template, execution loop, and result classification.
490
372
 
491
373
  ---
492
374
 
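Reviewer note: the removed "Environment Guard" and "Error handling" steps in the hunk above describe a fallback policy that is easier to check against `references/codex-delegation-workflow.md` when written out. The following is a minimal sketch under stated assumptions, not the package's implementation. The function names, the `delegate`/`standard` callables, and the return shape are hypothetical. Only the two environment variable names and the three-failure threshold come from the removed text. The read-only `.git/` check is omitted.

```python
import os
from typing import Callable, Iterable, List, Mapping, Tuple

# Indicator variables named in the removed "Environment Guard" section.
SANDBOX_ENV_VARS = ("CODEX_SANDBOX", "CODEX_SESSION_ID")
# Threshold named in the removed "Error handling" step.
MAX_CONSECUTIVE_FAILURES = 3


def inside_delegate_sandbox(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if any known sandbox indicator variable is set."""
    return any(name in env for name in SANDBOX_ENV_VARS)


def run_tasks(
    tasks: Iterable[str],
    delegate: Callable[[str], bool],   # hypothetical: returns True on success
    standard: Callable[[str], None],   # hypothetical: standard-mode execution
    env: Mapping[str, str] = os.environ,
) -> List[Tuple[str, str]]:
    """Run each task, falling back to standard mode when delegation fails.

    Returns (task, mode) pairs, where mode is "delegate" or "standard".
    """
    delegation_enabled = not inside_delegate_sandbox(env)
    consecutive_failures = 0
    results: List[Tuple[str, str]] = []

    for task in tasks:
        if delegation_enabled and delegate(task):
            consecutive_failures = 0  # a success resets the streak
            results.append((task, "delegate"))
            continue

        if delegation_enabled:
            # The delegate was tried and failed for this task.
            consecutive_failures += 1
            if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                delegation_enabled = False  # stop delegating remaining tasks

        standard(task)
        results.append((task, "standard"))

    return results
```

In this sketch a failed task is always completed in standard mode, and the delegate is never invoked again once three failures occur in a row, which matches the wording of the removed step 7.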
@@ -515,7 +397,7 @@ When some tasks are executed by the delegate and others by the current agent, us
515
397
  - Follow existing patterns
516
398
  - Write tests for new code
517
399
  - Run linting before pushing
518
- - Use reviewer agents for complex/risky changes only
400
+ - Review every change — inline for simple additive work, full review for everything else
519
401
 
520
402
  ### Ship Complete Features
521
403
 
@@ -523,34 +405,6 @@ When some tasks are executed by the delegate and others by the current agent, us
523
405
  - Don't leave features 80% done
524
406
  - A finished feature that ships beats a perfect feature that doesn't
525
407
 
526
- ## Quality Checklist
527
-
528
- Before creating PR, verify:
529
-
530
- - [ ] All clarifying questions asked and answered
531
- - [ ] All tasks marked completed
532
- - [ ] Tests pass (run project's test command)
533
- - [ ] Linting passes (use linting-agent)
534
- - [ ] Code follows existing patterns
535
- - [ ] Figma designs match implementation (if applicable)
536
- - [ ] Before/after screenshots captured and uploaded (for UI changes)
537
- - [ ] Commit messages follow conventional format
538
- - [ ] PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
539
- - [ ] PR description includes summary, testing notes, and screenshots
540
- - [ ] PR description includes Compound Engineered badge with accurate model, harness, and version
541
-
542
- ## When to Use Reviewer Agents
543
-
544
- **Don't use by default.** Use reviewer agents only when:
545
-
546
- - Large refactor affecting many files (10+)
547
- - Security-sensitive changes (authentication, permissions, data access)
548
- - Performance-critical code paths
549
- - Complex algorithms or business logic
550
- - User explicitly requests thorough review
551
-
552
- For most features: tests + linting + following patterns is sufficient.
553
-
554
408
  ## Common Pitfalls to Avoid
555
409
 
556
410
  - **Analysis paralysis** - Don't overthink, read the plan and execute
@@ -559,5 +413,4 @@ For most features: tests + linting + following patterns is sufficient.
559
413
  - **Testing at the end** - Test continuously or suffer later
560
414
  - **Forgetting to track progress** - Update task status as you go or lose track of what's done
561
415
  - **80% done syndrome** - Finish the feature, don't move on early
562
- - **Over-reviewing simple changes** - Save reviewer agents for complex work
563
-
416
+ - **Skipping review** - Every change gets reviewed; only the depth varies
@@ -57,12 +57,12 @@ What are you working on?
57
57
 
58
58
  | Response | Reference to Read |
59
59
  |----------|-------------------|
60
- | 1, controller | [controllers.md](./references/controllers.md) |
61
- | 2, model | [models.md](./references/models.md) |
62
- | 3, view, frontend, turbo, stimulus, css | [frontend.md](./references/frontend.md) |
63
- | 4, architecture, routing, auth, job, cache | [architecture.md](./references/architecture.md) |
64
- | 5, test, testing, minitest, fixture | [testing.md](./references/testing.md) |
65
- | 6, gem, dependency, library | [gems.md](./references/gems.md) |
60
+ | 1, controller | `references/controllers.md` |
61
+ | 2, model | `references/models.md` |
62
+ | 3, view, frontend, turbo, stimulus, css | `references/frontend.md` |
63
+ | 4, architecture, routing, auth, job, cache | `references/architecture.md` |
64
+ | 5, test, testing, minitest, fixture | `references/testing.md` |
65
+ | 6, gem, dependency, library | `references/gems.md` |
66
66
  | 7, review | Read all references, then review code |
67
67
  | 8, general task | Read relevant references based on context |
68
68
 
@@ -153,12 +153,12 @@ All detailed patterns in `references/`:
153
153
 
154
154
  | File | Topics |
155
155
  |------|--------|
156
- | [controllers.md](./references/controllers.md) | REST mapping, concerns, Turbo responses, API patterns, HTTP caching |
157
- | [models.md](./references/models.md) | Concerns, state records, callbacks, scopes, POROs, authorization, broadcasting |
158
- | [frontend.md](./references/frontend.md) | Turbo Streams, Stimulus controllers, CSS layers, OKLCH colors, partials |
159
- | [architecture.md](./references/architecture.md) | Routing, authentication, jobs, Current attributes, caching, database patterns |
160
- | [testing.md](./references/testing.md) | Minitest, fixtures, unit/integration/system tests, testing patterns |
161
- | [gems.md](./references/gems.md) | What they use vs avoid, decision framework, Gemfile examples |
156
+ | `references/controllers.md` | REST mapping, concerns, Turbo responses, API patterns, HTTP caching |
157
+ | `references/models.md` | Concerns, state records, callbacks, scopes, POROs, authorization, broadcasting |
158
+ | `references/frontend.md` | Turbo Streams, Stimulus controllers, CSS layers, OKLCH colors, partials |
159
+ | `references/architecture.md` | Routing, authentication, jobs, Current attributes, caching, database patterns |
160
+ | `references/testing.md` | Minitest, fixtures, unit/integration/system tests, testing patterns |
161
+ | `references/gems.md` | What they use vs avoid, decision framework, Gemfile examples |
162
162
  </reference_index>
163
163
 
164
164
  <success_criteria>
@@ -183,3 +183,4 @@ Based on [The Unofficial 37signals/DHH Rails Style Guide](https://github.com/mar
183
183
  - Code examples from Fizzy are licensed under the O'Saasy License
184
184
  - Not affiliated with or endorsed by 37signals
185
185
  </credits>
186
+