@fro.bot/systematic 2.3.2 → 2.4.0

Files changed (71)
  1. package/README.md +12 -13
  2. package/agents/design/design-implementation-reviewer.md +2 -19
  3. package/agents/design/design-iterator.md +2 -31
  4. package/agents/design/figma-design-sync.md +2 -22
  5. package/agents/docs/ankane-readme-writer.md +2 -19
  6. package/agents/document-review/adversarial-document-reviewer.md +3 -2
  7. package/agents/document-review/coherence-reviewer.md +5 -7
  8. package/agents/document-review/design-lens-reviewer.md +3 -4
  9. package/agents/document-review/feasibility-reviewer.md +3 -4
  10. package/agents/document-review/product-lens-reviewer.md +25 -6
  11. package/agents/document-review/scope-guardian-reviewer.md +3 -4
  12. package/agents/document-review/security-lens-reviewer.md +3 -4
  13. package/agents/research/best-practices-researcher.md +4 -21
  14. package/agents/research/framework-docs-researcher.md +2 -19
  15. package/agents/research/git-history-analyzer.md +2 -19
  16. package/agents/research/issue-intelligence-analyst.md +2 -24
  17. package/agents/research/learnings-researcher.md +7 -28
  18. package/agents/research/repo-research-analyst.md +3 -32
  19. package/agents/research/slack-researcher.md +128 -0
  20. package/agents/review/agent-native-reviewer.md +109 -195
  21. package/agents/review/architecture-strategist.md +3 -19
  22. package/agents/review/cli-agent-readiness-reviewer.md +1 -27
  23. package/agents/review/code-simplicity-reviewer.md +5 -19
  24. package/agents/review/data-integrity-guardian.md +3 -19
  25. package/agents/review/data-migration-expert.md +3 -19
  26. package/agents/review/deployment-verification-agent.md +3 -19
  27. package/agents/review/pattern-recognition-specialist.md +4 -20
  28. package/agents/review/performance-oracle.md +3 -31
  29. package/agents/review/project-standards-reviewer.md +5 -5
  30. package/agents/review/schema-drift-detector.md +3 -19
  31. package/agents/review/security-sentinel.md +3 -25
  32. package/agents/review/testing-reviewer.md +3 -3
  33. package/agents/workflow/pr-comment-resolver.md +54 -22
  34. package/agents/workflow/spec-flow-analyzer.md +2 -25
  35. package/package.json +1 -1
  36. package/skills/agent-native-architecture/SKILL.md +28 -27
  37. package/skills/agent-native-architecture/references/agent-execution-patterns.md +3 -3
  38. package/skills/agent-native-architecture/references/agent-native-testing.md +1 -1
  39. package/skills/agent-native-architecture/references/mobile-patterns.md +1 -1
  40. package/skills/andrew-kane-gem-writer/SKILL.md +5 -5
  41. package/skills/ce-brainstorm/SKILL.md +43 -181
  42. package/skills/ce-compound/SKILL.md +143 -89
  43. package/skills/ce-compound-refresh/SKILL.md +48 -5
  44. package/skills/ce-ideate/SKILL.md +27 -242
  45. package/skills/ce-plan/SKILL.md +165 -81
  46. package/skills/ce-review/SKILL.md +348 -125
  47. package/skills/ce-review/references/findings-schema.json +5 -0
  48. package/skills/ce-review/references/persona-catalog.md +2 -2
  49. package/skills/ce-review/references/resolve-base.sh +5 -2
  50. package/skills/ce-review/references/subagent-template.md +25 -3
  51. package/skills/ce-work/SKILL.md +95 -242
  52. package/skills/ce-work-beta/SKILL.md +154 -301
  53. package/skills/dhh-rails-style/SKILL.md +13 -12
  54. package/skills/document-review/SKILL.md +56 -109
  55. package/skills/document-review/references/findings-schema.json +0 -23
  56. package/skills/document-review/references/subagent-template.md +13 -18
  57. package/skills/dspy-ruby/SKILL.md +8 -8
  58. package/skills/every-style-editor/SKILL.md +3 -2
  59. package/skills/frontend-design/SKILL.md +2 -3
  60. package/skills/git-commit/SKILL.md +1 -1
  61. package/skills/git-commit-push-pr/SKILL.md +81 -265
  62. package/skills/git-worktree/SKILL.md +20 -21
  63. package/skills/lfg/SKILL.md +10 -17
  64. package/skills/onboarding/SKILL.md +2 -2
  65. package/skills/onboarding/scripts/inventory.mjs +31 -7
  66. package/skills/proof/SKILL.md +134 -28
  67. package/skills/resolve-pr-feedback/SKILL.md +7 -2
  68. package/skills/setup/SKILL.md +1 -1
  69. package/skills/test-browser/SKILL.md +10 -11
  70. package/skills/test-xcode/SKILL.md +6 -3
  71. package/dist/lib/manifest.d.ts +0 -39
@@ -124,6 +124,11 @@
124
124
  "downstream-resolver": "Turn this into residual work for later resolution.",
125
125
  "human": "A person must make a judgment call before code changes should continue.",
126
126
  "release": "Operational or rollout follow-up; do not convert into code-fix work automatically."
127
+ },
128
+ "return_tiers": {
129
+ "description": "Finding fields are split into two tiers. The full schema (with all required fields) applies to the artifact file on disk. The compact return to the orchestrator omits detail-tier fields. Both are valid uses of this schema in different contexts.",
130
+ "merge_tier": "Returned to orchestrator: title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing, suggested_fix (optional). Plus top-level reviewer, residual_risks, testing_gaps.",
131
+ "detail_tier": "Required in artifact file, omitted from compact return: why_it_matters, evidence. The artifact file must pass full schema validation including all required fields. Headless output depends on why_it_matters and evidence being present in the artifact."
127
132
  }
128
133
  }
129
134
  }
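The tier split added in this hunk can be read as a projection from the full artifact record onto the merge-tier field list. The following is a minimal Python sketch under that reading, not the package's implementation: `to_compact_return` and the sample finding values are hypothetical, and only the field names are taken from the `return_tiers` text above.

```python
# Merge-tier fields as listed in "return_tiers.merge_tier". The detail-tier
# fields (why_it_matters, evidence) stay in the on-disk artifact only.
MERGE_TIER_FIELDS = (
    "title", "severity", "file", "line", "confidence", "autofix_class",
    "owner", "requires_verification", "pre_existing", "suggested_fix",
)


def to_compact_return(finding: dict) -> dict:
    """Project a full finding onto the merge tier (hypothetical helper)."""
    return {key: finding[key] for key in MERGE_TIER_FIELDS if key in finding}


# Illustrative finding; every value here is made up for the example.
full_finding = {
    "title": "Missing nil check",
    "severity": "P2",
    "file": "app/models/user.rb",
    "line": 42,
    "confidence": 0.8,
    "autofix_class": "safe_auto",
    "owner": "downstream-resolver",
    "requires_verification": False,
    "pre_existing": False,
    "why_it_matters": "A nil email raises at render time.",
    "evidence": ["user.email.downcase is called without a guard"],
}

compact = to_compact_return(full_finding)
```

Because `suggested_fix` is optional, the projection simply skips it when the full finding does not carry one.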
@@ -13,7 +13,7 @@ Spawned on every review regardless of diff content.
13
13
  | `correctness` | `systematic:review:correctness-reviewer` | Logic errors, edge cases, state bugs, error propagation, intent compliance |
14
14
  | `testing` | `systematic:review:testing-reviewer` | Coverage gaps, weak assertions, brittle tests, missing edge case tests |
15
15
  | `maintainability` | `systematic:review:maintainability-reviewer` | Coupling, complexity, naming, dead code, premature abstraction |
16
- | `project-standards` | `systematic:review:project-standards-reviewer` | AGENTS.md and AGENTS.md compliance -- frontmatter, references, naming, cross-platform portability, tool selection |
16
+ | `project-standards` | `systematic:review:project-standards-reviewer` | AGENTS.md compliance -- frontmatter, references, naming, cross-platform portability, tool selection |
17
17
 
18
18
  **CE agents (unstructured output, synthesized separately):**
19
19
 
@@ -33,7 +33,7 @@ Spawned when the orchestrator identifies relevant patterns in the diff. The orch
33
33
  | `api-contract` | `systematic:review:api-contract-reviewer` | Route definitions, serializer/interface changes, event schemas, exported type signatures, API versioning |
34
34
  | `data-migrations` | `systematic:review:data-migrations-reviewer` | Migration files, schema changes, backfill scripts, data transformations |
35
35
  | `reliability` | `systematic:review:reliability-reviewer` | Error handling, retry logic, circuit breakers, timeouts, background jobs, async handlers, health checks |
36
- | `adversarial` | `systematic:review:adversarial-reviewer` | Diff has >=50 changed non-test, non-generated, non-lockfile lines, OR touches auth, payments, data mutations, external API integrations, or other high-risk domains |
36
+ | `adversarial` | `systematic:review:adversarial-reviewer` | Diff has >=50 changed lines of executable code (not prose/instruction Markdown, JSON schemas, or config), OR touches auth, payments, data mutations, external API integrations, or other high-risk domains regardless of file type |
37
37
  | `cli-readiness` | `systematic:review:cli-readiness-reviewer` | CLI command definitions, argument parsing, CLI framework usage, command handler implementations |
38
38
  | `previous-comments` | `systematic:review:previous-comments-reviewer` | **PR-only.** Reviewing a PR that has existing review comments or review threads from prior review rounds. Skip entirely when no PR metadata was gathered in Stage 1. |
39
39
 
@@ -52,7 +52,9 @@ if [ -n "$REVIEW_BASE_BRANCH" ]; then
52
52
  if [ -n "$PR_BASE_REPO" ]; then
53
53
  PR_BASE_REMOTE=$(git remote -v | awk "index(\$2, \"github.com:$PR_BASE_REPO\") || index(\$2, \"github.com/$PR_BASE_REPO\") {print \$1; exit}")
54
54
  if [ -n "$PR_BASE_REMOTE" ]; then
55
- git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH:refs/remotes/$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH" 2>/dev/null || true
55
+ # Always fetch: a locally cached ref may be stale, producing a
56
+ # merge-base that predates squash-merged work and inflating the diff.
57
+ git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH:refs/remotes/$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH" 2>/dev/null || true
56
58
  BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
57
59
  fi
58
60
  fi
@@ -60,7 +62,8 @@ if [ -n "$REVIEW_BASE_BRANCH" ]; then
60
62
  # Only try origin if it exists as a remote; otherwise skip to avoid
61
63
  # confusing errors in fork setups where origin points at the user's fork.
62
64
  if git remote get-url origin >/dev/null 2>&1; then
63
- git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags origin "$REVIEW_BASE_BRANCH:refs/remotes/origin/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags origin "$REVIEW_BASE_BRANCH" 2>/dev/null || true
65
+ # Always fetch; same rationale as the fork-safe path above.
66
+ git fetch --no-tags origin "$REVIEW_BASE_BRANCH:refs/remotes/origin/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags origin "$REVIEW_BASE_BRANCH" 2>/dev/null || true
64
67
  BASE_REF=$(git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
65
68
  fi
66
69
  # Fall back to a bare local ref only if remote resolution failed
@@ -18,7 +18,23 @@ You are a specialist code reviewer.
18
18
  </scope-rules>
19
19
 
20
20
  <output-contract>
21
- Return ONLY valid JSON matching the findings schema below. No prose, no markdown, no explanation outside the JSON object.
21
+ You produce up to two outputs depending on whether a run ID was provided:
22
+
23
+ 1. **Artifact file (when run ID is present).** If a Run ID appears in <review-context> below, WRITE your full analysis (all schema fields, including why_it_matters, evidence, and suggested_fix) as JSON to:
24
+ .context/systematic/ce-review/{run_id}/{reviewer_name}.json
25
+ This is the ONE write operation you are permitted to make. Use the platform's file-write tool.
26
+ If the write fails, continue -- the compact return still provides everything the merge needs.
27
+ If no Run ID is provided (the field is empty or absent), skip this step entirely -- do not attempt any file write.
28
+
29
+ 2. **Compact return (always).** RETURN compact JSON to the parent with ONLY merge-tier fields per finding:
30
+ title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing, suggested_fix.
31
+ Do NOT include why_it_matters or evidence in the returned JSON.
32
+ Include reviewer, residual_risks, and testing_gaps at the top level.
33
+
34
+ The full file preserves detail for downstream consumers (headless output, debugging).
35
+ The compact return keeps the orchestrator's context lean for merge and synthesis.
36
+
37
+ The schema below describes the **full artifact file format** (all fields required). For the compact return, follow the field list above -- omit why_it_matters and evidence even though the schema marks them as required.
22
38
 
23
39
  {schema}
24
40
 
@@ -41,9 +57,10 @@ False-positive categories to actively suppress:
41
57
  - Generic "consider adding" advice without a concrete failure mode
42
58
 
43
59
  Rules:
44
- - Every finding MUST include at least one evidence item grounded in the actual code.
60
+ - You are a leaf reviewer inside an already-running systematic review workflow. Do not invoke systematic skills or agents unless this template explicitly instructs you to. Perform your analysis directly and return findings in the required output format only.
61
+ - Every finding in the full artifact file MUST include at least one evidence item grounded in the actual code. The compact return omits evidence -- the evidence requirement applies to the disk artifact only.
45
62
  - Set pre_existing to true ONLY for issues in unchanged code that are unrelated to this diff. If the diff makes the issue newly relevant, it is NOT pre-existing.
46
- - You are operationally read-only. You may use non-mutating inspection commands, including read-oriented `git` / `gh` commands, to gather evidence. Do not edit files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
63
+ - You are operationally read-only. The one permitted exception is writing your full analysis to the `.context/` artifact path when a run ID is provided. You may also use non-mutating inspection commands, including read-oriented `git` / `gh` commands, to gather evidence. Do not edit project files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
47
64
  - Set `autofix_class` accurately -- not every finding is `advisory`. Use this decision guide:
48
65
  - `safe_auto`: The fix is local and deterministic — the fixer can apply it mechanically without design judgment. Examples: extracting a duplicated helper, adding a missing nil/null check, fixing an off-by-one, adding a missing test for an untested code path, removing dead code.
49
66
  - `gated_auto`: A concrete fix exists but it changes contracts, permissions, or crosses a module boundary in a way that deserves explicit approval. Examples: adding authentication to an unprotected endpoint, changing a public API response shape, switching from soft-delete to hard-delete.
@@ -62,6 +79,9 @@ Rules:
62
79
  </pr-context>
63
80
 
64
81
  <review-context>
82
+ Run ID: {run_id}
83
+ Reviewer name: {reviewer_name}
84
+
65
85
  Intent: {intent_summary}
66
86
 
67
87
  Changed files: {file_list}
@@ -82,3 +102,5 @@ Diff:
82
102
  | `{pr_metadata}` | Stage 1 output | PR title, body, and URL when reviewing a PR. Empty string when reviewing a branch or standalone checkout |
83
103
  | `{file_list}` | Stage 1 output | List of changed files from the scope step |
84
104
  | `{diff}` | Stage 1 output | The actual diff content to review |
105
+ | `{run_id}` | Stage 4 output | Unique review run identifier for the artifact directory |
106
+ | `{reviewer_name}` | Stage 3 output | Persona or agent name used as the artifact filename stem |
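The output contract in this template has one conditional write (only when a run ID is present) that is allowed to fail without aborting the review. A minimal Python sketch of that behavior follows. The helper names `artifact_path` and `write_artifact`, the `root` parameter, and the sample payload are assumptions for illustration; only the path layout `.context/systematic/ce-review/{run_id}/{reviewer_name}.json` comes from the template.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def artifact_path(run_id: Optional[str], reviewer_name: str) -> Optional[Path]:
    """Return the template's artifact path, or None when no run ID is given."""
    if not run_id or not run_id.strip():
        return None  # contract: skip the file write entirely
    return Path(".context") / "systematic" / "ce-review" / run_id / f"{reviewer_name}.json"


def write_artifact(full_analysis: dict, run_id: Optional[str],
                   reviewer_name: str, root: Path) -> bool:
    """Write the full analysis; report False instead of raising on failure."""
    relative = artifact_path(run_id, reviewer_name)
    if relative is None:
        return False
    try:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(full_analysis, indent=2))
        return True
    except OSError:
        # contract: continue, the compact return still carries the merge tier
        return False


# Usage with a throwaway directory standing in for the checkout root.
payload = {"reviewer": "correctness", "findings": []}
with tempfile.TemporaryDirectory() as tmp:
    wrote = write_artifact(payload, "run-123", "correctness", Path(tmp))
    skipped = write_artifact(payload, "", "correctness", Path(tmp))
```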
@@ -1,16 +1,16 @@
1
1
  ---
2
2
  name: ce:work
3
- description: Execute work plans efficiently while maintaining quality and finishing features
4
- argument-hint: '[plan file, specification, or todo file path]'
3
+ description: Execute work efficiently while maintaining quality and finishing features
4
+ argument-hint: "[Plan doc path or description of work. Blank to auto use latest plan doc]"
5
5
  ---
6
6
 
7
- # Work Plan Execution Command
7
+ # Work Execution Command
8
8
 
9
- Execute a work plan efficiently while maintaining quality and finishing features.
9
+ Execute work efficiently while maintaining quality and finishing features.
10
10
 
11
11
  ## Introduction
12
12
 
13
- This command takes a work document (plan, specification, or todo file) and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
13
+ This command takes a work document (plan, specification, or todo file) or a bare prompt describing the work, and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
14
14
 
15
15
  ## Input Document
16
16
 
@@ -18,9 +18,33 @@ This command takes a work document (plan, specification, or todo file) and execu
18
18
 
19
19
  ## Execution Workflow
20
20
 
21
+ ### Phase 0: Input Triage
22
+
23
+ Determine how to proceed based on what was provided in `<input_document>`.
24
+
25
+ **Plan document** (input is a file path to an existing plan, specification, or todo file) → skip to Phase 1.
26
+
27
+ **Bare prompt** (input is a description of work, not a file path):
28
+
29
+ 1. **Scan the work area**
30
+
31
+ - Identify files likely to change based on the prompt
32
+ - Find existing test files for those areas (search for test/spec files that import, reference, or share names with the implementation files)
33
+ - Note local patterns and conventions in the affected areas
34
+
35
+ 2. **Assess complexity and route**
36
+
37
+ | Complexity | Signals | Action |
38
+ |-----------|---------|--------|
39
+ | **Trivial** | 1-2 files, no behavioral change (typo, config, rename) | Proceed to Phase 1 step 2 (environment setup), then implement directly — no task list, no execution loop. Apply Test Discovery if the change touches behavior-bearing code |
40
+ | **Small / Medium** | Clear scope, under ~10 files | Build a task list from discovery. Proceed to Phase 1 step 2 |
41
+ | **Large** | Cross-cutting, architectural decisions, 10+ files, touches auth/payments/migrations | Inform the user this would benefit from `/ce:brainstorm` or `/ce:plan` to surface edge cases and scope boundaries. Honor their choice. If proceeding, build a task list and continue to Phase 1 step 2 |
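The routing table above can be approximated as a small decision function. This is a hedged sketch only: the skill describes the signals as judgment calls ("under ~10 files"), so the hard thresholds, the `WorkScan` fields, and the `route` function are assumptions made for the example, not part of the package.

```python
from dataclasses import dataclass


@dataclass
class WorkScan:
    """Hypothetical summary of the Phase 0 work-area scan."""
    file_count: int
    behavioral_change: bool
    cross_cutting: bool = False  # architectural decisions, many subsystems
    high_risk: bool = False      # touches auth / payments / migrations


def route(scan: WorkScan) -> str:
    """Map scan signals to a complexity tier from the table."""
    # Large wins first: any high-risk or cross-cutting signal overrides size.
    if scan.cross_cutting or scan.high_risk or scan.file_count >= 10:
        return "large"
    # Trivial needs both a tiny footprint and no behavioral change.
    if scan.file_count <= 2 and not scan.behavioral_change:
        return "trivial"
    return "small_medium"
```

Checking the Large signals first is a deliberate choice in this sketch: a two-file change that touches payments should not be routed as Trivial just because it is small.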
42
+
43
+ ---
44
+
21
45
  ### Phase 1: Quick Start
22
46
 
23
- 1. **Read Plan and Clarify**
47
+ 1. **Read Plan and Clarify** _(skip if arriving from Phase 0 with a bare prompt)_
24
48
 
25
49
  - Read the work document completely
26
50
  - Treat the plan as a decision artifact, not an execution script
@@ -49,8 +73,17 @@ This command takes a work document (plan, specification, or todo file) and execu
49
73
  ```
50
74
 
51
75
  **If already on a feature branch** (not the default branch):
52
- - Ask: "Continue working on `[current_branch]`, or create a new branch?"
53
- - If continuing, proceed to step 3
76
+
77
+ First, check whether the branch name is **meaningful** — a name like `feat/crowd-sniff` or `fix/email-validation` tells future readers what the work is about. Auto-generated worktree names (e.g., `worktree-jolly-beaming-raven`) or other opaque names do not.
78
+
79
+ If the branch name is meaningless or auto-generated, suggest renaming it before continuing:
80
+ ```bash
81
+ git branch -m <meaningful-name>
82
+ ```
83
+ Derive the new name from the plan title or work description (e.g., `feat/crowd-sniff`). Present the rename as a recommended option alongside continuing as-is.
84
+
85
+ Then ask: "Continue working on `[current_branch]`, or create a new branch?"
86
+ - If continuing (with or without rename), proceed to step 3
54
87
  - If creating new, follow Option A or B below
55
88
 
56
89
  **If on the default branch**, choose how to proceed:
@@ -78,7 +111,7 @@ This command takes a work document (plan, specification, or todo file) and execu
78
111
  - You want to keep the default branch clean while experimenting
79
112
  - You plan to switch between branches frequently
80
113
 
81
- 3. **Create Todo List**
114
+ 3. **Create Todo List** _(skip if Phase 0 already built one, or if Phase 0 routed as Trivial)_
82
115
  - Use your available task tracking tool (e.g., todowrite, task lists) to break the plan into actionable tasks
83
116
  - Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
84
117
  - Carry each unit's `Execution note` into the task when present
@@ -96,18 +129,44 @@ This command takes a work document (plan, specification, or todo file) and execu
96
129
 
97
130
  | Strategy | When to use |
98
131
  |----------|-------------|
99
- | **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight |
100
- | **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks |
101
- | **Parallel subagents** | 3+ tasks where some units have no shared dependencies and touch non-overlapping files. Dispatch independent units simultaneously, run dependent units after their prerequisites complete |
132
+ | **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight. **Default for bare-prompt work** — bare prompts rarely produce enough structured context to justify subagent dispatch |
133
+ | **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks. Requires plan-unit metadata (Goal, Files, Approach, Test scenarios) |
134
+ | **Parallel subagents** | 3+ tasks that pass the Parallel Safety Check (below). Dispatch independent units simultaneously, run dependent units after their prerequisites complete. Requires plan-unit metadata |
135
+
136
+ **Parallel Safety Check** — required before choosing parallel dispatch:
137
+
138
+ 1. Build a file-to-unit mapping from every candidate unit's `Files:` section (Create, Modify, and Test paths)
139
+ 2. Check for intersection — any file path appearing in 2+ units means overlap
140
+ 3. If any overlap is found, downgrade to serial subagents. Log the reason (e.g., "Units 2 and 4 share `config/routes.rb` — using serial dispatch"). Serial subagents still provide context-window isolation without shared-directory risks
141
+
142
+ Even with no file overlap, parallel subagents sharing a working directory face git index contention (concurrent staging/committing corrupts the index) and test interference (concurrent test runs pick up each other's in-progress changes). The parallel subagent constraints below mitigate these.
102
143
 
103
144
  **Subagent dispatch** uses your available subagent or task spawning mechanism. For each unit, give the subagent:
104
145
  - The full plan file path (for overall context)
105
146
  - The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
106
147
  - Any resolved deferred questions relevant to that unit
148
+ - Instruction to check whether the unit's test scenarios cover all applicable categories (happy paths, edge cases, error paths, integration) and supplement gaps before writing tests
149
+
150
+ **Parallel subagent constraints** — when dispatching units in parallel (not serial or inline):
151
+ - Instruct each subagent: "Do not stage files (`git add`), create commits, or run the project test suite. The orchestrator handles testing, staging, and committing after all parallel units complete."
152
+ - These constraints prevent git index contention and test interference between concurrent subagents
153
+
154
+ **Permission mode:** Omit the `mode` parameter when dispatching subagents so the user's configured permission settings apply. Do not pass `mode: "auto"` — it overrides user-level settings like `bypassPermissions`.
107
155
 
108
- After each subagent completes, update the plan checkboxes and task list before dispatching the next dependent unit.
156
+ **After each subagent completes (serial mode):**
157
+ 1. Review the subagent's diff — verify changes match the unit's scope and `Files:` list
158
+ 2. Run the relevant test suite to confirm the tree is healthy
159
+ 3. If tests fail, diagnose and fix before proceeding — do not dispatch dependent units on a broken tree
160
+ 4. Update the plan checkboxes and task list
161
+ 5. Dispatch the next unit
109
162
 
110
- For genuinely large plans needing persistent inter-agent communication (agents challenging each other's approaches, shared coordination across 10+ tasks), see Swarm Mode below which uses Agent Teams.
163
+ **After all parallel subagents in a batch complete:**
164
+ 1. Wait for every subagent in the current parallel batch to finish before acting on any of their results
165
+ 2. Cross-check for discovered file collisions: compare the actual files modified by all subagents in the batch (not just their declared `Files:` lists). Subagents may create or modify files not anticipated during planning — this is expected, since plans describe *what* not *how*. A collision only matters when 2+ subagents in the same batch modified the same file. In a shared working directory, only the last writer's version survives — the other unit's changes to that file are lost. If a collision is detected: commit all non-colliding files from all units first, then re-run the affected units serially for the shared file so each builds on the other's committed work
166
+ 3. For each completed unit, in dependency order: review the diff, run the relevant test suite, stage only that unit's files, and commit with a conventional message derived from the unit's Goal
167
+ 4. If tests fail after committing a unit's changes, diagnose and fix before committing the next unit
168
+ 5. Update the plan checkboxes and task list
169
+ 6. Dispatch the next batch of independent units, or the next dependent unit
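Both the Parallel Safety Check (declared `Files:` lists, before dispatch) and the post-batch collision cross-check (files actually modified) reduce to the same question: does any path belong to two or more units? The sketch below shows one way to answer it in Python. The function names, the unit labels, and the exact log wording are illustrative assumptions, not the package's code.

```python
from collections import defaultdict


def find_overlaps(unit_files: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each path claimed by 2+ units to the sorted list of those units."""
    owners: dict[str, list[str]] = defaultdict(list)
    for unit, files in unit_files.items():
        for path in files:
            owners[path].append(unit)
    return {path: sorted(units) for path, units in owners.items() if len(units) > 1}


def choose_dispatch(unit_files: dict[str, set[str]]) -> tuple[str, list[str]]:
    """Return ("parallel", []) or ("serial", reasons) per the safety check."""
    overlaps = find_overlaps(unit_files)
    if not overlaps:
        return "parallel", []
    reasons = [
        f"{' and '.join(units)} share {path}; using serial dispatch"
        for path, units in sorted(overlaps.items())
    ]
    return "serial", reasons


# Declared Files: lists (Create, Modify, and Test paths) for three units.
declared = {
    "Unit 2": {"config/routes.rb", "app/models/order.rb"},
    "Unit 3": {"app/models/invoice.rb"},
    "Unit 4": {"config/routes.rb", "test/routes_test.rb"},
}
strategy, reasons = choose_dispatch(declared)
```

After a parallel batch completes, the same `find_overlaps` call can be fed the files each subagent actually modified to surface the collisions described in step 2.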
111
170
 
112
171
  ### Phase 2: Execute
113
172
 
@@ -118,12 +177,14 @@ This command takes a work document (plan, specification, or todo file) and execu
118
177
  ```
119
178
  while (tasks remain):
120
179
  - Mark task as in-progress
121
- - Read any referenced files from the plan
180
+ - Read any referenced files from the plan or discovered during Phase 0
122
181
  - Look for similar patterns in codebase
182
+ - Find existing test files for implementation files being changed (Test Discovery — see below)
123
183
  - Implement following existing conventions
124
- - Write tests for new functionality
184
+ - Add, update, or remove tests to match implementation changes (see Test Discovery below)
125
185
  - Run System-Wide Test Check (see below)
126
186
  - Run tests after changes
187
+ - Assess testing coverage: did this task change behavior? If yes, were tests written or updated? If no tests were added, is the justification deliberate (e.g., pure config, no behavioral change)?
127
188
  - Mark task as completed
128
189
  - Evaluate for incremental commit (see below)
129
190
  ```
@@ -136,6 +197,17 @@ This command takes a work document (plan, specification, or todo file) and execu
136
197
  - Do not over-implement beyond the current behavior slice when working test-first
137
198
  - Skip test-first discipline for trivial renames, pure configuration, and pure styling work
138
199
 
200
+ **Test Discovery** — Before implementing changes to a file, find its existing test files (search for test/spec files that import, reference, or share naming patterns with the implementation file). When a plan specifies test scenarios or test files, start there, then check for additional test coverage the plan may not have enumerated. Changes to implementation files should be accompanied by corresponding test updates — new tests for new behavior, modified tests for changed behavior, removed or updated tests for deleted behavior.
201
+
202
+ **Test Scenario Completeness** — Before writing tests for a feature-bearing unit, check whether the plan's `Test scenarios` cover all categories that apply to this unit. If a category is missing or scenarios are vague (e.g., "validates correctly" without naming inputs and expected outcomes), supplement from the unit's own context before writing tests:
203
+
204
+ | Category | When it applies | How to derive if missing |
205
+ |----------|----------------|------------------------|
206
+ | **Happy path** | Always for feature-bearing units | Read the unit's Goal and Approach for core input/output pairs |
207
+ | **Edge cases** | When the unit has meaningful boundaries (inputs, state, concurrency) | Identify boundary values, empty/nil inputs, and concurrent access patterns |
208
+ | **Error/failure paths** | When the unit has failure modes (validation, external calls, permissions) | Enumerate invalid inputs the unit should reject, permission/auth denials it should enforce, and downstream failures it should handle |
209
+ | **Integration** | When the unit crosses layers (callbacks, middleware, multi-service) | Identify the cross-layer chain and write a scenario that exercises it without mocks |
210
+
139
211
  **System-Wide Test Check** — Before marking a task done, pause and ask:
140
212
 
141
213
  | Question | What to do |
@@ -182,6 +254,8 @@ This command takes a work document (plan, specification, or todo file) and execu
182
254
 
183
255
  **Note:** Incremental commits use clean conventional messages without attribution footers. The final Phase 4 commit/PR includes the full attribution.
184
256
 
257
+ **Parallel subagent mode:** When units run as parallel subagents, the subagents do not commit — the orchestrator handles staging and committing after the entire parallel batch completes (see Parallel subagent constraints in Phase 1 Step 4). The commit guidance in this section applies to inline and serial execution, and to the orchestrator's commit decisions after parallel batch completion.
258
+
185
259
  3. **Follow Existing Patterns**
186
260
 
187
261
  - The plan should reference similar code - read those files first
@@ -195,7 +269,7 @@ This command takes a work document (plan, specification, or todo file) and execu
195
269
  - Run relevant tests after each significant change
196
270
  - Don't wait until the end to test
197
271
  - Fix failures immediately
198
- - Add new tests for new functionality
272
+ - Add new tests for new behavior, update tests for changed behavior, remove tests for deleted behavior
199
273
  - **Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together.** If your change touches callbacks, middleware, or error handling — you need both.
200
274
 
201
275
  5. **Simplify as You Go**
@@ -221,201 +295,9 @@ This command takes a work document (plan, specification, or todo file) and execu
  - Create new tasks if scope expands
  - Keep user informed of major milestones
 
- ### Phase 3: Quality Check
-
- 1. **Run Core Quality Checks**
-
- Always run before submitting:
-
- ```bash
- # Run full test suite (use project's test command)
- # Examples: bin/rails test, npm test, pytest, go test, etc.
-
- # Run linting (per AGENTS.md)
- # Use linting-agent before pushing to origin
- ```
-
- 2. **Consider Code Review** (Optional)
-
- Use for complex, risky, or large changes. Load the `ce:review` skill with `mode:autofix` to fix safe issues and flag the rest before shipping.
-
- 3. **Final Validation**
- - All tasks marked completed
- - All tests pass
- - Linting passes
- - Code follows existing patterns
- - Figma designs match (if applicable)
- - No console errors or warnings
- - If the plan has a `Requirements Trace`, verify each requirement is satisfied by the completed work
- - If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution
-
- 4. **Prepare Operational Validation Plan** (REQUIRED)
- - Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
- - Include concrete:
- - Log queries/search terms
- - Metrics or dashboards to watch
- - Expected healthy signals
- - Failure signals and rollback/mitigation trigger
- - Validation window and owner
- - If there is truly no production/runtime impact, still include the section with: `No additional operational monitoring required` and a one-line reason.
-
- ### Phase 4: Ship It
-
- 1. **Create Commit**
-
- ```bash
- git add .
- git status # Review what's being committed
- git diff --staged # Check the changes
-
- # Commit with conventional format
- git commit -m "$(cat <<'EOF'
- feat(scope): description of what and why
-
- Brief explanation if needed.
-
- 🤖 Generated with [MODEL] via [HARNESS](HARNESS_URL) + Systematic v[VERSION]
-
- Co-Authored-By: [MODEL] ([CONTEXT] context, [THINKING]) <noreply@anthropic.com>
- EOF
- )"
- ```
-
- **Fill in at commit/PR time:**
-
- | Placeholder | Value | Example |
- |-------------|-------|---------|
- | Placeholder | Value | Example |
- |-------------|-------|---------|
- | `[MODEL]` | Model name | Claude Opus 4.6, GPT-5.4 |
- | `[CONTEXT]` | Context window (if known) | 200K, 1M |
- | `[THINKING]` | Thinking level (if known) | extended thinking |
- | `[HARNESS]` | Tool running you | OpenCode, Codex, Gemini CLI |
- | `[HARNESS_URL]` | Link to that tool | `https://opencode.ai` |
- | `[VERSION]` | `plugin.json` → `version` | 2.40.0 |
-
- Subagents creating commits/PRs are equally responsible for accurate attribution.
-
- 2. **Capture and Upload Screenshots for UI Changes** (REQUIRED for any UI work)
-
- For **any** design changes, new views, or UI modifications, you MUST capture and upload screenshots:
-
- **Step 1: Start dev server** (if not running)
- ```bash
- bin/dev # Run in background
- ```
-
- **Step 2: Capture screenshots with agent-browser CLI**
- ```bash
- agent-browser open http://localhost:3000/[route]
- agent-browser snapshot -i
- agent-browser screenshot output.png
- ```
- See the `agent-browser` skill for detailed usage.
-
- **Step 3: Upload using imgup skill**
- ```bash
- skill: imgup
- # Then upload each screenshot:
- imgup -h pixhost screenshot.png # pixhost works without API key
- # Alternative hosts: catbox, imagebin, beeimg
- ```
-
- **What to capture:**
- - **New screens**: Screenshot of the new UI
- - **Modified screens**: Before AND after screenshots
- - **Design implementation**: Screenshot showing Figma design match
-
- **IMPORTANT**: Always include uploaded image URLs in PR description. This provides visual context for reviewers and documents the change.
-
- 3. **Create Pull Request**
-
- ```bash
- git push -u origin feature-branch-name
-
- gh pr create --title "Feature: [Description]" --body "$(cat <<'EOF'
- ## Summary
- - What was built
- - Why it was needed
- - Key decisions made
-
- ## Testing
- - Tests added/modified
- - Manual testing performed
-
- ## Post-Deploy Monitoring & Validation
- - **What to monitor/search**
- - Logs:
- - Metrics/Dashboards:
- - **Validation checks (queries/commands)**
- - `command or query here`
- - **Expected healthy behavior**
- - Expected signal(s)
- - **Failure signal(s) / rollback trigger**
- - Trigger + immediate action
- - **Validation window & owner**
- - Window:
- - Owner:
- - **If no operational impact**
- - `No additional operational monitoring required: <reason>`
-
- ## Before / After Screenshots
- | Before | After |
- |--------|-------|
- | ![before](URL) | ![after](URL) |
-
- ## Figma Design
- [Link if applicable]
-
- ---
-
- [![Systematic v[VERSION]](https://img.shields.io/badge/Systematic-v[VERSION]-6366f1)](https://github.com/marcusrbrown/systematic)
- 🤖 Generated with [MODEL] ([CONTEXT] context, [THINKING]) via [HARNESS](HARNESS_URL)
- EOF
- )"
- ```
-
- 4. **Update Plan Status**
-
- If the input document has YAML frontmatter with a `status` field, update it to `completed`:
- ```
- status: active → status: completed
- ```
-
- 5. **Notify User**
- - Summarize what was completed
- - Link to PR
- - Note any follow-up work needed
- - Suggest next steps if applicable
-
- ---
-
- ## Swarm Mode with Agent Teams (Optional)
-
- For genuinely large plans where agents need to communicate with each other, challenge approaches, or coordinate across 10+ tasks with persistent specialized roles, use agent team capabilities if available (e.g., Agent Teams in OpenCode, multi-agent workflows in Codex).
-
- **Agent teams are typically experimental and require opt-in.** Do not attempt to use agent teams unless the user explicitly requests swarm mode or agent teams, and the platform supports it.
-
- ### When to Use Agent Teams vs Subagents
-
- | Agent Teams | Subagents (standard mode) |
- |-------------|---------------------------|
- | Agents need to discuss and challenge each other's approaches | Each task is independent — only the result matters |
- | Persistent specialized roles (e.g., dedicated tester running continuously) | Workers report back and finish |
- | 10+ tasks with complex cross-cutting coordination | 3-8 tasks with clear dependency chains |
- | User explicitly requests "swarm mode" or "agent teams" | Default for most plans |
-
- Most plans should use subagent dispatch from standard mode. Agent teams add significant token cost and coordination overhead — use them when the inter-agent communication genuinely improves the outcome.
+ ### Phase 3-4: Quality Check and Ship It
 
- ### Agent Teams Workflow
-
- 1. **Create team** — use your available team creation mechanism
- 2. **Create task list** — parse Implementation Units into tasks with dependency relationships
- 3. **Spawn teammates** — assign specialized roles (implementer, tester, reviewer) based on the plan's needs. Give each teammate the plan file path and their specific task assignments
- 4. **Coordinate** — the lead monitors task completion, reassigns work if someone gets stuck, and spawns additional workers as phases unblock
- 5. **Cleanup** — shut down all teammates, then clean up the team resources
-
- ---
+ When all Phase 2 tasks are complete and execution transitions to quality check, read `references/shipping-workflow.md` for the full shipping workflow: quality checks, code review, final validation, PR creation, and notification.
 
  ## Key Principles
 
@@ -442,7 +324,7 @@ Most plans should use subagent dispatch from standard mode. Agent teams add sign
  - Follow existing patterns
  - Write tests for new code
  - Run linting before pushing
- - Use reviewer agents for complex/risky changes only
+ - Review every change — inline for simple additive work, full review for everything else
 
  ### Ship Complete Features
 
@@ -450,34 +332,6 @@ Most plans should use subagent dispatch from standard mode. Agent teams add sign
  - Don't leave features 80% done
  - A finished feature that ships beats a perfect feature that doesn't
 
- ## Quality Checklist
-
- Before creating PR, verify:
-
- - [ ] All clarifying questions asked and answered
- - [ ] All tasks marked completed
- - [ ] Tests pass (run project's test command)
- - [ ] Linting passes (use linting-agent)
- - [ ] Code follows existing patterns
- - [ ] Figma designs match implementation (if applicable)
- - [ ] Before/after screenshots captured and uploaded (for UI changes)
- - [ ] Commit messages follow conventional format
- - [ ] PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
- - [ ] PR description includes summary, testing notes, and screenshots
- - [ ] PR description includes Compound Engineered badge with accurate model, harness, and version
-
- ## When to Use Reviewer Agents
-
- **Don't use by default.** Use reviewer agents only when:
-
- - Large refactor affecting many files (10+)
- - Security-sensitive changes (authentication, permissions, data access)
- - Performance-critical code paths
- - Complex algorithms or business logic
- - User explicitly requests thorough review
-
- For most features: tests + linting + following patterns is sufficient.
-
  ## Common Pitfalls to Avoid
 
  - **Analysis paralysis** - Don't overthink, read the plan and execute
@@ -486,5 +340,4 @@ For most features: tests + linting + following patterns is sufficient.
  - **Testing at the end** - Test continuously or suffer later
  - **Forgetting to track progress** - Update task status as you go or lose track of what's done
  - **80% done syndrome** - Finish the feature, don't move on early
- - **Over-reviewing simple changes** - Save reviewer agents for complex work
-
+ - **Skipping review** - Every change gets reviewed; only the depth varies