@fro.bot/systematic 2.7.3 → 2.8.0
- package/agents/workflow/systematic-implementer.md +39 -0
- package/package.json +1 -1
- package/skills/ce-work/SKILL.md +12 -1
- package/skills/ce-work-beta/SKILL.md +0 -416
- package/skills/ce-work-beta/references/codex-delegation-workflow.md +0 -327
- package/skills/ce-work-beta/references/shipping-workflow.md +0 -129
@@ -0,0 +1,39 @@
+---
+name: systematic-implementer
+description: Implements one plan unit in a fresh subagent context and reports bounded changes back to the orchestrator.
+mode: subagent
+---
+
+You are a focused implementer dispatched by a parent OpenCode session orchestrating a multi-unit plan. You implement one unit's worth of changes and report back to the orchestrator.
+
+## Role
+
+Execute exactly the assigned slice of the plan. Treat the parent session as the owner of sequencing, final verification, git operations, and release decisions.
+
+## Constraints
+
+- Do not stage files (`git add`), create commits, or run the project test suite. The orchestrator handles testing, staging, and committing after all parallel units complete.
+- Do not push to remote.
+- Do not edit the plan document, task list, checkboxes, or orchestration state. Report completion status to the orchestrator instead.
+- Stay within the assigned unit's scope unless the implementation cannot be completed safely without touching adjacent files. If that happens, make the smallest necessary change and call it out explicitly in your final response.
+- Do not start unrelated cleanup or refactors. If you see follow-up work, report it instead of doing it.
+
+## Approach
+
+1. Read the plan path and unit details provided by the orchestrator.
+2. Identify the unit's Goal, Files, Approach, Execution note, Patterns to follow, Test scenarios, and Verification criteria.
+3. Inspect the referenced files and local patterns before editing.
+4. Make the bounded file edits needed for the assigned unit.
+5. Run the narrowest affected test or check command relevant to the files you touched. If no targeted check exists, say so. Do not run the full project test suite.
+6. If the unit changes behavior, ensure the provided test scenarios cover happy paths, edge cases, error paths, and integration boundaries where applicable. Add or adjust targeted tests only when the assigned unit calls for it.
+7. Stop and report if the plan is ambiguous, the declared file scope is wrong, or a required dependency is missing.
+
+## Output
+
+Your final response to the orchestrator must include:
+
+- Summary of changes made.
+- Files modified.
+- Targeted checks run and their results.
+- Any deviations from the unit's declared Files list.
+- Any unresolved questions, blocked work, or issues requiring orchestrator attention before the next dispatch.
package/package.json
CHANGED
package/skills/ce-work/SKILL.md
CHANGED
@@ -141,11 +141,22 @@ Determine how to proceed based on what was provided in `<input_document>`.
 
 Even with no file overlap, parallel subagents sharing a working directory face git index contention (concurrent staging/committing corrupts the index) and test interference (concurrent test runs pick up each other's in-progress changes). The parallel subagent constraints below mitigate these.
 
-**Subagent dispatch** uses
+**Subagent dispatch** uses OpenCode's `task` tool with the bundled `systematic-implementer` subagent:
+
+```typescript
+task({
+  subagent_type: "systematic-implementer",
+  description: <unit goal>,
+  prompt: <unit prompt body>,
+})
+```
+
+`description` carries the one-line unit Goal. `prompt` carries the plan path plus the unit's Files, Approach, Execution note, Patterns, Test scenarios, Verification, and relevant resolved deferred questions. For each unit, give the subagent:
 - The full plan file path (for overall context)
 - The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
 - Any resolved deferred questions relevant to that unit
 - Instruction to check whether the unit's test scenarios cover all applicable categories (happy paths, edge cases, error paths, integration) and supplement gaps before writing tests
+- Instruction not to edit the plan document, task list, checkboxes, or orchestration state. Subagents report completion status instead
 
 **Parallel subagent constraints** — when dispatching units in parallel (not serial or inline):
 - Instruct each subagent: "Do not stage files (`git add`), create commits, or run the project test suite. The orchestrator handles testing, staging, and committing after all parallel units complete."
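The dispatch described in the hunk above might be assembled roughly as follows. This is a minimal sketch, not the package's implementation: only the `subagent_type`, `description`, and `prompt` fields come from the diff, while the `PlanUnit` shape, the `buildTaskArgs` helper, and the prompt layout are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape for one plan unit; field names mirror the labels in the skill text.
interface PlanUnit {
  goal: string;
  files: string[];
  approach: string;
  executionNote?: string;
  patterns: string[];
  testScenarios: string[];
  verification: string;
  resolvedQuestions: string[];
}

// The three fields shown in the `task({...})` example in the diff.
interface TaskArgs {
  subagent_type: string;
  description: string;
  prompt: string;
}

// Render a labelled bullet list, or "none" when the list is empty.
function section(label: string, items: string[]): string {
  const body = items.length > 0 ? items.map((item) => `- ${item}`).join("\n") : "- none";
  return `${label}:\n${body}`;
}

// Assumed helper: builds the argument object for one unit.
function buildTaskArgs(planPath: string, unit: PlanUnit): TaskArgs {
  const prompt = [
    `Plan: ${planPath}`,
    section("Files", unit.files),
    `Approach: ${unit.approach}`,
    `Execution note: ${unit.executionNote ?? "none"}`,
    section("Patterns", unit.patterns),
    section("Test scenarios", unit.testScenarios),
    `Verification: ${unit.verification}`,
    section("Resolved deferred questions", unit.resolvedQuestions),
    "Do not edit the plan document, task list, checkboxes, or orchestration state. Report completion status instead.",
  ].join("\n\n");

  return {
    subagent_type: "systematic-implementer",
    description: unit.goal, // one-line unit Goal
    prompt,
  };
}
```

Keeping the Goal in `description` and everything else in `prompt` follows the split the updated skill text describes; how the real orchestrator formats the prompt body is not shown in the diff.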
@@ -1,416 +0,0 @@
----
-name: ce:work-beta
-description: "[BETA] Execute work with external delegate support. Same as ce:work but includes experimental Codex delegation mode for token-conserving code implementation."
-disable-model-invocation: true
-argument-hint: "[Plan doc path or description of work. Blank to auto use latest plan doc] [delegate:codex]"
----
-
-# Work Execution Command
-
-Execute work efficiently while maintaining quality and finishing features.
-
-## Introduction
-
-This command takes a work document (plan, specification, or todo file) or a bare prompt describing the work, and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
-
-**Beta rollout note:** Invoke `ce:work-beta` manually when you want to trial Codex delegation. During the beta period, planning and workflow handoffs remain pointed at stable `ce:work` to avoid dual-path orchestration complexity.
-
-## Input Document
-
-<input_document> #$ARGUMENTS </input_document>
-
-## Argument Parsing
-
-Parse `$ARGUMENTS` for the following optional tokens. Strip each recognized token before interpreting the remainder as the plan file path or bare prompt.
-
-| Token | Example | Effect |
-|-------|---------|--------|
-| `delegate:codex` | `delegate:codex` | Activate Codex delegation mode for plan execution |
-| `delegate:local` | `delegate:local` | Deactivate delegation even if enabled in config |
-
-All tokens are optional. When absent, fall back to the resolution chain below.
-
-**Fuzzy activation:** Also recognize imperative delegation-intent phrases such as "use codex", "delegate to codex", "codex mode", or "delegate mode" as equivalent to `delegate:codex`. A bare mention of "codex" in a prompt (e.g., "fix codex converter bugs") must NOT activate delegation -- only clear delegation intent triggers it.
-
-**Fuzzy deactivation:** Also recognize phrases such as "no codex", "local mode", "standard mode" as equivalent to `delegate:local`.
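The token and phrase rules in this removed skill could be approximated as below. This is a sketch under stated assumptions: the skill is written for a model to interpret, not a parser, so the `parseDelegation` helper, its return shape, and the use of plain substring matching are all hypothetical. The phrase lists are the examples quoted in the text, not an exhaustive set.

```typescript
type DelegateFlag = "codex" | "local" | null;

interface ParsedArgs {
  flag: DelegateFlag; // null means no delegation intent was found
  rest: string; // remainder treated as plan path or bare prompt
}

// Example phrases quoted in the skill text.
const ACTIVATE_PHRASES = ["use codex", "delegate to codex", "codex mode", "delegate mode"];
const DEACTIVATE_PHRASES = ["no codex", "local mode", "standard mode"];

// Assumed helper: explicit tokens win, then fuzzy phrases; a bare "codex" does nothing.
function parseDelegation(args: string): ParsedArgs {
  const token = /\bdelegate:(codex|local)\b/.exec(args);
  if (token) {
    // Strip the recognized token before interpreting the remainder.
    const rest = args.replace(token[0], "").replace(/\s+/g, " ").trim();
    return { flag: token[1] === "codex" ? "codex" : "local", rest };
  }
  const lower = args.toLowerCase();
  // Check deactivation first so "no codex mode" is not read as activation.
  if (DEACTIVATE_PHRASES.some((phrase) => lower.includes(phrase))) {
    return { flag: "local", rest: args.trim() };
  }
  if (ACTIVATE_PHRASES.some((phrase) => lower.includes(phrase))) {
    return { flag: "codex", rest: args.trim() };
  }
  return { flag: null, rest: args.trim() };
}
```

Checking deactivation before activation is a design choice of this sketch; the skill text does not state which should win when both kinds of phrase appear.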
-
-### Settings Resolution Chain
-
-After extracting tokens from arguments, resolve the delegation state using this precedence chain:
-
-1. **Argument flag** -- `delegate:codex` or `delegate:local` from the current invocation (highest priority)
-2. **Config file** -- extract settings from the config block below. Value `codex` for `work_delegate` activates delegation; `false` deactivates.
-3. **Hard default** -- `false` (delegation off)
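The three-step precedence above can be illustrated with a short sketch. It assumes the config has already been parsed into a plain object; the `resolveDelegation` helper and its types are hypothetical, and accepting the string `"false"` alongside the boolean is an assumption about how a YAML value might arrive.

```typescript
type DelegationSource = "argument" | "config" | "default";

interface DelegationState {
  delegation_active: boolean;
  delegation_source: DelegationSource;
}

// Assumed helper: argument flag beats config, config beats the hard default.
function resolveDelegation(
  flag: "codex" | "local" | null,
  config: Record<string, unknown>,
): DelegationState {
  // 1. Argument flag from the current invocation (highest priority).
  if (flag !== null) {
    return { delegation_active: flag === "codex", delegation_source: "argument" };
  }
  // 2. Config file value for `work_delegate`.
  const value = config["work_delegate"];
  if (value === "codex") {
    return { delegation_active: true, delegation_source: "config" };
  }
  if (value === false || value === "false") {
    return { delegation_active: false, delegation_source: "config" };
  }
  // 3. Hard default: delegation off. Missing or unrecognized values land here.
  return { delegation_active: false, delegation_source: "default" };
}
```

For example, `resolveDelegation("local", { work_delegate: "codex" })` yields an inactive state sourced from the argument, since the invocation flag overrides the config.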
-
-**Config (pre-resolved):**
-!`cat "$(git rev-parse --show-toplevel 2>/dev/null)/.systematic/config.local.yaml" 2>/dev/null || cat "$(dirname "$(git rev-parse --path-format=absolute --git-common-dir 2>/dev/null)")/.systematic/config.local.yaml" 2>/dev/null || echo '__NO_CONFIG__'`
-
-If the block above contains YAML key-value pairs, extract values for the keys listed below.
-If it shows `__NO_CONFIG__`, the file does not exist — all settings fall through to defaults.
-If it shows an unresolved command string, read `.systematic/config.local.yaml` from the repo root using the native file-read tool (e.g., Read in OpenCode, read_file in Codex). If the file does not exist, all settings fall through to defaults.
-
-If any setting has an unrecognized value, fall through to the hard default for that setting.
-
-Config keys:
-- `work_delegate` -- `codex` or default `false`
-- `work_delegate_consent` -- `true` or default `false`
-- `work_delegate_sandbox` -- `yolo` (default) or `full-auto`
-- `work_delegate_decision` -- `auto` (default) or `ask`
-- `work_delegate_model` -- Codex model to use (default `gpt-5.4`). Passthrough — any valid model name accepted.
-- `work_delegate_effort` -- `minimal`, `low`, `medium`, `high` (default), or `xhigh`
-
-Store the resolved state for downstream consumption:
-- `delegation_active` -- boolean, whether delegation mode is on
-- `delegation_source` -- `argument` or `config` or `default` -- how delegation was resolved (used by environment guard to decide notification verbosity)
-- `sandbox_mode` -- `yolo` or `full-auto` (from config or default `yolo`)
-- `consent_granted` -- boolean (from config `work_delegate_consent`)
-- `delegate_model` -- string (from config or default `gpt-5.4`)
-- `delegate_effort` -- string (from config or default `high`)
-
----
-
-## Execution Workflow
-
-### Phase 0: Input Triage
-
-Determine how to proceed based on what was provided in `<input_document>`.
-
-**Plan document** (input is a file path to an existing plan, specification, or todo file) → skip to Phase 1.
-
-**Bare prompt** (input is a description of work, not a file path):
-
-1. **Scan the work area**
-
-- Identify files likely to change based on the prompt
-- Find existing test files for those areas (search for test/spec files that import, reference, or share names with the implementation files)
-- Note local patterns and conventions in the affected areas
-
-2. **Assess complexity and route**
-
-| Complexity | Signals | Action |
-|-----------|---------|--------|
-| **Trivial** | 1-2 files, no behavioral change (typo, config, rename) | Proceed to Phase 1 step 2 (environment setup), then implement directly — no task list, no execution loop. Apply Test Discovery if the change touches behavior-bearing code |
-| **Small / Medium** | Clear scope, under ~10 files | Build a task list from discovery. Proceed to Phase 1 step 2 |
-| **Large** | Cross-cutting, architectural decisions, 10+ files, touches auth/payments/migrations | Inform the user this would benefit from `/ce:brainstorm` or `/ce:plan` to surface edge cases and scope boundaries. Honor their choice. If proceeding, build a task list and continue to Phase 1 step 2 |
-
----
-
-### Phase 1: Quick Start
-
-1. **Read Plan and Clarify** _(skip if arriving from Phase 0 with a bare prompt)_
-
-- Read the work document completely
-- Treat the plan as a decision artifact, not an execution script
-- If the plan includes sections such as `Implementation Units`, `Work Breakdown`, `Requirements Trace`, `Files`, `Test Scenarios`, or `Verification`, use those as the primary source material for execution
-- Check for `Execution note` on each implementation unit — these carry the plan's execution posture signal for that unit (for example, test-first or characterization-first). Note them when creating tasks.
-- Check for a `Deferred to Implementation` or `Implementation-Time Unknowns` section — these are questions the planner intentionally left for you to resolve during execution. Note them before starting so they inform your approach rather than surprising you mid-task
-- Check for a `Scope Boundaries` section — these are explicit non-goals. Refer back to them if implementation starts pulling you toward adjacent work
-- Review any references or links provided in the plan
-- If the user explicitly asks for TDD, test-first, or characterization-first execution in this session, honor that request even if the plan has no `Execution note`
-- If anything is unclear or ambiguous, ask clarifying questions now
-- Get user approval to proceed
-- **Do not skip this** - better to ask questions now than build the wrong thing
-
-2. **Setup Environment**
-
-First, check the current branch:
-
-```bash
-current_branch=$(git branch --show-current)
-default_branch=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')
-
-# Fallback if remote HEAD isn't set
-if [ -z "$default_branch" ]; then
-  default_branch=$(git rev-parse --verify origin/main >/dev/null 2>&1 && echo "main" || echo "master")
-fi
-```
-
-**If already on a feature branch** (not the default branch):
-
-First, check whether the branch name is **meaningful** — a name like `feat/crowd-sniff` or `fix/email-validation` tells future readers what the work is about. Auto-generated worktree names (e.g., `worktree-jolly-beaming-raven`) or other opaque names do not.
-
-If the branch name is meaningless or auto-generated, suggest renaming it before continuing:
-```bash
-git branch -m <meaningful-name>
-```
-Derive the new name from the plan title or work description (e.g., `feat/crowd-sniff`). Present the rename as a recommended option alongside continuing as-is.
-
-Then ask: "Continue working on `[current_branch]`, or create a new branch?"
-- If continuing (with or without rename), proceed to step 3
-- If creating new, follow Option A or B below
-
-**If on the default branch**, choose how to proceed:
-
-**Option A: Create a new branch**
-```bash
-git pull origin [default_branch]
-git checkout -b feature-branch-name
-```
-Use a meaningful name based on the work (e.g., `feat/user-authentication`, `fix/email-validation`).
-
-**Option B: Use a worktree (recommended for parallel development)**
-```bash
-skill: git-worktree
-# The skill will create a new branch from the default branch in an isolated worktree
-```
-
-**Option C: Continue on the default branch**
-- Requires explicit user confirmation
-- Only proceed after user explicitly says "yes, commit to [default_branch]"
-- Never commit directly to the default branch without explicit permission
-
-**Recommendation**: Use worktree if:
-- You want to work on multiple features simultaneously
-- You want to keep the default branch clean while experimenting
-- You plan to switch between branches frequently
-
-3. **Create Todo List** _(skip if Phase 0 already built one, or if Phase 0 routed as Trivial)_
-- Use your available task tracking tool (e.g., todowrite, task lists) to break the plan into actionable tasks
-- Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
-- Carry each unit's `Execution note` into the task when present
-- For each unit, read the `Patterns to follow` field before implementing — these point to specific files or conventions to mirror
-- Use each unit's `Verification` field as the primary "done" signal for that task
-- Do not expect the plan to contain implementation code, micro-step TDD instructions, or exact shell commands
-- Include dependencies between tasks
-- Prioritize based on what needs to be done first
-- Include testing and quality check tasks
-- Keep tasks specific and completable
-
-4. **Choose Execution Strategy**
-
-**Delegation routing gate:** If `delegation_active` is true AND the input is a plan file (not a bare prompt), read `references/codex-delegation-workflow.md` and follow its Pre-Delegation Checks and Delegation Decision flow. If all checks pass and delegation proceeds, force **serial execution** and proceed directly to Phase 2 using the workflow's batched execution loop. If any check disables delegation, fall through to the standard strategy table below. If delegation is active but the input is a bare prompt (no plan file), set `delegation_active` to false with a brief note: "Codex delegation requires a plan file -- using standard mode." and continue with the standard strategy selection below.
-
-After creating the task list, decide how to execute based on the plan's size and dependency structure:
-
-| Strategy | When to use |
-|----------|-------------|
-| **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight. **Default for bare-prompt work** — bare prompts rarely produce enough structured context to justify subagent dispatch |
-| **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks. Requires plan-unit metadata (Goal, Files, Approach, Test scenarios) |
-| **Parallel subagents** | 3+ tasks that pass the Parallel Safety Check (below). Dispatch independent units simultaneously, run dependent units after their prerequisites complete. Requires plan-unit metadata |
-
-**Parallel Safety Check** — required before choosing parallel dispatch:
-
-1. Build a file-to-unit mapping from every candidate unit's `Files:` section (Create, Modify, and Test paths)
-2. Check for intersection — any file path appearing in 2+ units means overlap
-3. If any overlap is found, downgrade to serial subagents. Log the reason (e.g., "Units 2 and 4 share `config/routes.rb` — using serial dispatch"). Serial subagents still provide context-window isolation without shared-directory risks
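The three steps of the Parallel Safety Check can be sketched as a file-to-unit map followed by an overlap scan. The `UnitFiles` shape and the `findOverlaps` and `chooseDispatch` helpers are hypothetical names for illustration; the skill itself describes the check in prose only.

```typescript
// Hypothetical input: one entry per candidate unit, with its Create, Modify, and Test paths.
interface UnitFiles {
  name: string;
  files: string[];
}

// Steps 1 and 2: map each file to the units that declare it, keep files claimed by 2+ units.
function findOverlaps(units: UnitFiles[]): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const unit of units) {
    // De-duplicate within a unit so one unit listing a file twice is not an overlap.
    for (const file of Array.from(new Set(unit.files))) {
      const names = owners.get(file) ?? [];
      names.push(unit.name);
      owners.set(file, names);
    }
  }
  const overlaps = new Map<string, string[]>();
  owners.forEach((names, file) => {
    if (names.length > 1) overlaps.set(file, names);
  });
  return overlaps;
}

// Step 3: any overlap downgrades the batch to serial dispatch.
function chooseDispatch(units: UnitFiles[]): "parallel" | "serial" {
  return findOverlaps(units).size === 0 ? "parallel" : "serial";
}
```

The map returned by `findOverlaps` carries enough detail to log the reason for a downgrade, such as which units share which path.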
-
-Even with no file overlap, parallel subagents sharing a working directory face git index contention (concurrent staging/committing corrupts the index) and test interference (concurrent test runs pick up each other's in-progress changes). The parallel subagent constraints below mitigate these.
-
-**Subagent dispatch** uses your available subagent or task spawning mechanism. For each unit, give the subagent:
-- The full plan file path (for overall context)
-- The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
-- Any resolved deferred questions relevant to that unit
-- Instruction to check whether the unit's test scenarios cover all applicable categories (happy paths, edge cases, error paths, integration) and supplement gaps before writing tests
-
-**Parallel subagent constraints** — when dispatching units in parallel (not serial or inline):
-- Instruct each subagent: "Do not stage files (`git add`), create commits, or run the project test suite. The orchestrator handles testing, staging, and committing after all parallel units complete."
-- These constraints prevent git index contention and test interference between concurrent subagents
-
-**Permission mode:** Omit the `mode` parameter when dispatching subagents so the user's configured permission settings apply. Do not pass `mode: "auto"` — it overrides user-level settings like `bypassPermissions`.
-
-**After each subagent completes (serial mode):**
-1. Review the subagent's diff — verify changes match the unit's scope and `Files:` list
-2. Run the relevant test suite to confirm the tree is healthy
-3. If tests fail, diagnose and fix before proceeding — do not dispatch dependent units on a broken tree
-4. Update the plan checkboxes and task list
-5. Dispatch the next unit
-
-**After all parallel subagents in a batch complete:**
-1. Wait for every subagent in the current parallel batch to finish before acting on any of their results
-2. Cross-check for discovered file collisions: compare the actual files modified by all subagents in the batch (not just their declared `Files:` lists). Subagents may create or modify files not anticipated during planning — this is expected, since plans describe *what* not *how*. A collision only matters when 2+ subagents in the same batch modified the same file. In a shared working directory, only the last writer's version survives — the other unit's changes to that file are lost. If a collision is detected: commit all non-colliding files from all units first, then re-run the affected units serially for the shared file so each builds on the other's committed work
-3. For each completed unit, in dependency order: review the diff, run the relevant test suite, stage only that unit's files, and commit with a conventional message derived from the unit's Goal
-4. If tests fail after committing a unit's changes, diagnose and fix before committing the next unit
-5. Update the plan checkboxes and task list
-6. Dispatch the next batch of independent units, or the next dependent unit
-
-### Phase 2: Execute
-
-1. **Task Execution Loop**
-
-For each task in priority order:
-
-```
-while (tasks remain):
-  - Mark task as in-progress
-  - Read any referenced files from the plan or discovered during Phase 0
-  - Look for similar patterns in codebase
-  - Find existing test files for implementation files being changed (Test Discovery — see below)
-  - If delegation_active: branch to the Codex Delegation Execution Loop
-    (see `references/codex-delegation-workflow.md`)
-  - Otherwise: implement following existing conventions
-  - Add, update, or remove tests to match implementation changes (see Test Discovery below)
-  - Run System-Wide Test Check (see below)
-  - Run tests after changes
-  - Assess testing coverage: did this task change behavior? If yes, were tests written or updated? If no tests were added, is the justification deliberate (e.g., pure config, no behavioral change)?
-  - Mark task as completed
-  - Evaluate for incremental commit (see below)
-```
-
-When a unit carries an `Execution note`, honor it. For test-first units, write the failing test before implementation for that unit. For characterization-first units, capture existing behavior before changing it. For units without an `Execution note`, proceed pragmatically.
-
-Guardrails for execution posture:
-- Do not write the test and implementation in the same step when working test-first
-- Do not skip verifying that a new test fails before implementing the fix or feature
-- Do not over-implement beyond the current behavior slice when working test-first
-- Skip test-first discipline for trivial renames, pure configuration, and pure styling work
-
-**Test Discovery** — Before implementing changes to a file, find its existing test files (search for test/spec files that import, reference, or share naming patterns with the implementation file). When a plan specifies test scenarios or test files, start there, then check for additional test coverage the plan may not have enumerated. Changes to implementation files should be accompanied by corresponding test updates — new tests for new behavior, modified tests for changed behavior, removed or updated tests for deleted behavior.
-
-**Test Scenario Completeness** — Before writing tests for a feature-bearing unit, check whether the plan's `Test scenarios` cover all categories that apply to this unit. If a category is missing or scenarios are vague (e.g., "validates correctly" without naming inputs and expected outcomes), supplement from the unit's own context before writing tests:
-
-| Category | When it applies | How to derive if missing |
-|----------|----------------|------------------------|
-| **Happy path** | Always for feature-bearing units | Read the unit's Goal and Approach for core input/output pairs |
-| **Edge cases** | When the unit has meaningful boundaries (inputs, state, concurrency) | Identify boundary values, empty/nil inputs, and concurrent access patterns |
-| **Error/failure paths** | When the unit has failure modes (validation, external calls, permissions) | Enumerate invalid inputs the unit should reject, permission/auth denials it should enforce, and downstream failures it should handle |
-| **Integration** | When the unit crosses layers (callbacks, middleware, multi-service) | Identify the cross-layer chain and write a scenario that exercises it without mocks |
-
-**System-Wide Test Check** — Before marking a task done, pause and ask:
-
-| Question | What to do |
-|----------|------------|
-| **What fires when this runs?** Callbacks, middleware, observers, event handlers — trace two levels out from your change. | Read the actual code (not docs) for callbacks on models you touch, middleware in the request chain, `after_*` hooks. |
-| **Do my tests exercise the real chain?** If every dependency is mocked, the test proves your logic works *in isolation* — it says nothing about the interaction. | Write at least one integration test that uses real objects through the full callback/middleware chain. No mocks for the layers that interact. |
-| **Can failure leave orphaned state?** If your code persists state (DB row, cache, file) before calling an external service, what happens when the service fails? Does retry create duplicates? | Trace the failure path with real objects. If state is created before the risky call, test that failure cleans up or that retry is idempotent. |
-| **What other interfaces expose this?** Mixins, DSLs, alternative entry points (Agent vs Chat vs ChatMethods). | Grep for the method/behavior in related classes. If parity is needed, add it now — not as a follow-up. |
-| **Do error strategies align across layers?** Retry middleware + application fallback + framework error handling — do they conflict or create double execution? | List the specific error classes at each layer. Verify your rescue list matches what the lower layer actually raises. |
-
-**When to skip:** Leaf-node changes with no callbacks, no state persistence, no parallel interfaces. If the change is purely additive (new helper method, new view partial), the check takes 10 seconds and the answer is "nothing fires, skip."
-
-**When this matters most:** Any change that touches models with callbacks, error handling with fallback/retry, or functionality exposed through multiple interfaces.
-
-
-2. **Incremental Commits**
-
-After completing each task, evaluate whether to create an incremental commit:
-
-| Commit when... | Don't commit when... |
-|----------------|---------------------|
-| Logical unit complete (model, service, component) | Small part of a larger unit |
-| Tests pass + meaningful progress | Tests failing |
-| About to switch contexts (backend → frontend) | Purely scaffolding with no behavior |
-| About to attempt risky/uncertain changes | Would need a "WIP" commit message |
-
-**Heuristic:** "Can I write a commit message that describes a complete, valuable change? If yes, commit. If the message would be 'WIP' or 'partial X', wait."
-
-If the plan has Implementation Units, use them as a starting guide for commit boundaries — but adapt based on what you find during implementation. A unit might need multiple commits if it's larger than expected, or small related units might land together. Use each unit's Goal to inform the commit message.
-
-**Commit workflow:**
-```bash
-# 1. Verify tests pass (use project's test command)
-# Examples: bin/rails test, npm test, pytest, go test, etc.
-
-# 2. Stage only files related to this logical unit (not `git add .`)
-git add <files related to this logical unit>
-
-# 3. Commit with conventional message
-git commit -m "feat(scope): description of this unit"
-```
-
-**Handling merge conflicts:** If conflicts arise during rebasing or merging, resolve them immediately. Incremental commits make conflict resolution easier since each commit is small and focused.
-
-**Note:** Incremental commits use clean conventional messages without attribution footers. The final Phase 4 commit/PR includes the full attribution.
-
-**Parallel subagent mode:** When units run as parallel subagents, the subagents do not commit — the orchestrator handles staging and committing after the entire parallel batch completes (see Parallel subagent constraints in Phase 1 Step 4). The commit guidance in this section applies to inline and serial execution, and to the orchestrator's commit decisions after parallel batch completion.
-
-3. **Follow Existing Patterns**
-
-- The plan should reference similar code - read those files first
-- Match naming conventions exactly
-- Reuse existing components where possible
-- Follow project coding standards (see AGENTS.md; use AGENTS.md only if the repo still keeps a compatibility shim)
-- When in doubt, grep for similar implementations
-
-4. **Test Continuously**
-
-- Run relevant tests after each significant change
-- Don't wait until the end to test
-- Fix failures immediately
-- Add new tests for new behavior, update tests for changed behavior, remove tests for deleted behavior
-- **Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together.** If your change touches callbacks, middleware, or error handling — you need both.
-
-5. **Simplify as You Go**
-
-After completing a cluster of related implementation units (or every 2-3 units), review recently changed files for simplification opportunities — consolidate duplicated patterns, extract shared helpers, and improve code reuse and efficiency. This is especially valuable when using subagents, since each agent works with isolated context and can't see patterns emerging across units.
-
-Don't simplify after every single unit — early patterns may look duplicated but diverge intentionally in later units. Wait for a natural phase boundary or when you notice accumulated complexity.
-
-If a `/simplify` skill or equivalent is available, use it. Otherwise, review the changed files yourself for reuse and consolidation opportunities.
-
-6. **Figma Design Sync** (if applicable)
-
-For UI work with Figma designs:
-
-- Implement components following design specs
-- Use figma-design-sync agent iteratively to compare
-- Fix visual differences identified
-- Repeat until implementation matches design
-
-7. **Frontend Design Guidance** (if applicable)
-
-For UI tasks without a Figma design -- where the implementation touches view, template, component, layout, or page files, creates user-visible routes, or the plan contains explicit UI/frontend/design language:
-
|
|
353
|
-
- Load the `frontend-design` skill before implementing
|
|
354
|
-
- Follow its detection, guidance, and verification flow
|
|
355
|
-
- If the skill produced a verification screenshot, it satisfies Phase 4's screenshot requirement -- no need to capture separately. If the skill fell back to mental review (no browser access), Phase 4's screenshot capture still applies
|
|
356
|
-
|
|
357
|
-
8. **Track Progress**
|
|
358
|
-
- Keep the task list updated as you complete tasks
|
|
359
|
-
- Note any blockers or unexpected discoveries
|
|
360
|
-
- Create new tasks if scope expands
|
|
361
|
-
- Keep user informed of major milestones
|
|
362
|
-
|
|
363
|
-
### Phase 3-4: Quality Check and Ship It
|
|
364
|
-
|
|
365
|
-
When all Phase 2 tasks are complete and execution transitions to quality check, read `references/shipping-workflow.md` for the full shipping workflow: quality checks, code review, final validation, PR creation, and notification.
|
|
366
|
-
|
|
367
|
-
---
|
|
368
|
-
|
|
369
|
-
## Codex Delegation Mode
|
|
370
|
-
|
|
371
|
-
When `delegation_active` is true after argument parsing, read `references/codex-delegation-workflow.md` for the complete delegation workflow: pre-checks, batching, prompt template, execution loop, and result classification.
|
|
372
|
-
|
|
373
|
-
---
|
|
374
|
-
|
|
375
|
-
## Key Principles
|
|
376
|
-
|
|
377
|
-
### Start Fast, Execute Faster
|
|
378
|
-
|
|
379
|
-
- Get clarification once at the start, then execute
|
|
380
|
-
- Don't wait for perfect understanding - ask questions and move
|
|
381
|
-
- The goal is to **finish the feature**, not create perfect process
|
|
382
|
-
|
|
383
|
-
### The Plan is Your Guide
|
|
384
|
-
|
|
385
|
-
- Work documents should reference similar code and patterns
|
|
386
|
-
- Load those references and follow them
|
|
387
|
-
- Don't reinvent - match what exists
|
|
388
|
-
|
|
389
|
-
### Test As You Go
|
|
390
|
-
|
|
391
|
-
- Run tests after each change, not at the end
|
|
392
|
-
- Fix failures immediately
|
|
393
|
-
- Continuous testing prevents big surprises
|
|
394
|
-
|
|
395
|
-
### Quality is Built In
|
|
396
|
-
|
|
397
|
-
- Follow existing patterns
|
|
398
|
-
- Write tests for new code
|
|
399
|
-
- Run linting before pushing
|
|
400
|
-
- Review every change — inline for simple additive work, full review for everything else
|
|
401
|
-
|
|
402
|
-
### Ship Complete Features
|
|
403
|
-
|
|
404
|
-
- Mark all tasks completed before moving on
|
|
405
|
-
- Don't leave features 80% done
|
|
406
|
-
- A finished feature that ships beats a perfect feature that doesn't
|
|
407
|
-
|
|
408
|
-
## Common Pitfalls to Avoid
|
|
409
|
-
|
|
410
|
-
- **Analysis paralysis** - Don't overthink, read the plan and execute
|
|
411
|
-
- **Skipping clarifying questions** - Ask now, not after building the wrong thing
|
|
412
|
-
- **Ignoring plan references** - The plan has links for a reason
|
|
413
|
-
- **Testing at the end** - Test continuously or suffer later
|
|
414
|
-
- **Forgetting to track progress** - Update task status as you go or lose track of what's done
|
|
415
|
-
- **80% done syndrome** - Finish the feature, don't move on early
|
|
416
|
-
- **Skipping review** - Every change gets reviewed; only the depth varies
|
|
@@ -1,327 +0,0 @@
|
|
|
1
|
-
# Codex Delegation Workflow
|
|
2
|
-
|
|
3
|
-
When `delegation_active` is true, code implementation is delegated to the Codex CLI (`codex exec`) instead of being implemented directly. The orchestrating OpenCode agent retains control of planning, review, git operations, and orchestration.
|
|
4
|
-
|
|
5
|
-
## Delegation Decision
|
|
6
|
-
|
|
7
|
-
If `work_delegate_decision` is `ask`, present the recommendation and wait for the user's choice before proceeding.
|
|
8
|
-
|
|
9
|
-
**When recommending Codex delegation:**
|
|
10
|
-
|
|
11
|
-
> "Codex delegation active. [N] implementation units -- delegating in one batch."
|
|
12
|
-
> 1. Delegate to Codex *(recommended)*
|
|
13
|
-
> 2. Execute with OpenCode instead
|
|
14
|
-
|
|
15
|
-
**When recommending Codex delegation, multiple batches:**
|
|
16
|
-
|
|
17
|
-
> "Codex delegation active. [N] implementation units -- delegating in [X] batches."
|
|
18
|
-
> 1. Delegate to Codex *(recommended)*
|
|
19
|
-
> 2. Execute with OpenCode instead
|
|
20
|
-
|
|
21
|
-
**When recommending OpenCode (all units are trivial):**
|
|
22
|
-
|
|
23
|
-
> "Codex delegation active, but these are small changes where the cost of delegating outweighs having OpenCode do them."
|
|
24
|
-
> 1. Execute with OpenCode *(recommended)*
|
|
25
|
-
> 2. Delegate to Codex anyway
|
|
26
|
-
|
|
27
|
-
If the user chooses the delegation option, proceed to Pre-Delegation Checks below. If the user chooses the OpenCode option, set `delegation_active` to false and return to standard execution in the parent skill.
|
|
28
|
-
|
|
29
|
-
If `work_delegate_decision` is `auto` (the default), state the execution plan in one line and proceed without waiting: "Codex delegation active. Delegating [N] units in [X] batch(es)." If all units are trivial, set `delegation_active` to false and proceed: "Codex delegation active. All units are trivial -- executing with OpenCode."
|
|
30
|
-
|
|
31
|
-
## Pre-Delegation Checks
|
|
32
|
-
|
|
33
|
-
Run these checks **once before the first batch**. If any check fails, fall back to standard mode for the remainder of the plan execution. Do not re-run on subsequent batches.
|
|
34
|
-
|
|
35
|
-
**0. Platform Gate**
|
|
36
|
-
|
|
37
|
-
Codex delegation is only supported when the orchestrating agent is running in OpenCode. If the current session is Codex, Gemini CLI, or any other platform, set `delegation_active` to false and proceed in standard mode.
|
|
38
|
-
|
|
39
|
-
**1. Environment Guard**
|
|
40
|
-
|
|
41
|
-
Check whether the current agent is already running inside a Codex sandbox:
|
|
42
|
-
|
|
43
|
-
```bash
|
|
44
|
-
if [ -n "$CODEX_SANDBOX" ] || [ -n "$CODEX_SESSION_ID" ]; then
|
|
45
|
-
echo "inside_sandbox=true"
|
|
46
|
-
else
|
|
47
|
-
echo "inside_sandbox=false"
|
|
48
|
-
fi
|
|
49
|
-
```
|
|
50
|
-
|
|
51
|
-
If `inside_sandbox` is true, delegation would recurse or fail.
|
|
52
|
-
|
|
53
|
-
- If `delegation_source` is `argument`: emit "Already inside Codex sandbox -- using standard mode." and set `delegation_active` to false.
|
|
54
|
-
- If `delegation_source` is `config` or `default`: set `delegation_active` to false silently.
|
|
55
|
-
|
|
56
|
-
**2. Availability Check**
|
|
57
|
-
|
|
58
|
-
**Codex availability (pre-resolved):**
|
|
59
|
-
!`command -v codex >/dev/null 2>&1 && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_FOUND"`
|
|
60
|
-
|
|
61
|
-
If the line above shows `CODEX_AVAILABLE`, proceed to the next check.
|
|
62
|
-
If it shows `CODEX_NOT_FOUND`, the Codex CLI is not installed. Emit "Codex CLI not found (install via `npm install -g @openai/codex` or `brew install codex`) -- using standard mode." and set `delegation_active` to false.
|
|
63
|
-
If it shows an unresolved command string, run `command -v codex` using a shell tool. If the command prints a path, proceed. If it fails or prints nothing, emit the same message and set `delegation_active` to false.
|
|
64
|
-
|
|
65
|
-
**3. Consent Flow**
|
|
66
|
-
|
|
67
|
-
If `consent_granted` is not true (from config `work_delegate_consent`):
|
|
68
|
-
|
|
69
|
-
Present a one-time consent warning using the platform's blocking question tool (`question` in OpenCode, `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension)). The consent warning explains:
|
|
70
|
-
- Delegation sends implementation units to `codex exec` as a structured prompt
|
|
71
|
-
- **yolo mode** (`--yolo`): Full system access including network. Required for verification steps that run tests or install dependencies. **Recommended.**
|
|
72
|
-
- **full-auto mode** (`--full-auto`): Workspace-write sandbox, no network access.
|
|
73
|
-
|
|
74
|
-
Present the sandbox mode choice: (1) yolo (recommended), (2) full-auto.
|
|
75
|
-
|
|
76
|
-
On acceptance:
|
|
77
|
-
- Resolve the repo root: `git rev-parse --show-toplevel`. Write `work_delegate_consent: true` and `work_delegate_sandbox: <chosen-mode>` to `<repo-root>/.systematic/config.local.yaml`
|
|
78
|
-
- To write: (1) if file or directory does not exist, create `<repo-root>/.systematic/` and write the YAML file; (2) if file exists, merge new keys preserving existing keys
|
|
79
|
-
- Update `consent_granted` and `sandbox_mode` in the resolved state
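
The "merge new keys preserving existing keys" step can be sketched as below. This is a minimal illustration, not the skill's actual implementation: `set_config_key` is a hypothetical helper name, and it assumes the config file holds only flat top-level `key: value` lines.

```bash
# Hypothetical helper (not part of the skill): set KEY to VALUE in a flat
# "key: value" YAML file, creating the parent directory and file if needed
# and leaving every other key untouched.
set_config_key() {
  config_file="$1"; key="$2"; value="$3"
  mkdir -p "$(dirname "$config_file")"
  touch "$config_file"
  tmp_file="$(mktemp)"
  # Copy every line except an existing entry for this key, then append the new value.
  grep -v "^${key}:" "$config_file" > "$tmp_file" || true
  printf '%s: %s\n' "$key" "$value" >> "$tmp_file"
  mv "$tmp_file" "$config_file"
}
```

For example, `set_config_key "<repo-root>/.systematic/config.local.yaml" work_delegate_consent true` would record consent without disturbing other keys already in the file. Nested YAML would need a real YAML parser instead.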
|
|
80
|
-
|
|
81
|
-
On decline:
|
|
82
|
-
- Ask whether to disable delegation entirely for this project
|
|
83
|
-
- If yes: write `work_delegate: false` to `<repo-root>/.systematic/config.local.yaml` (using the same repo root resolved above). To write: (1) if file or directory does not exist, create `<repo-root>/.systematic/` and write the YAML file; (2) if file exists, merge new keys preserving existing keys. Set `delegation_active` to false, proceed in standard mode
|
|
84
|
-
- If no: set `delegation_active` to false for this invocation only, proceed in standard mode
|
|
85
|
-
|
|
86
|
-
**Headless consent:** If running in a headless or non-interactive context, delegation proceeds only if `work_delegate_consent` is already `true` in the config file. If consent is not recorded, set `delegation_active` to false silently.
|
|
87
|
-
|
|
88
|
-
## Batching
|
|
89
|
-
|
|
90
|
-
Delegate all units in one batch. If the plan exceeds 5 units, split into batches at the plan's own phase boundaries, or in groups of roughly 5 -- never splitting units that share files. Skip delegation entirely if every unit is trivial.
|
|
91
|
-
|
|
92
|
-
## Prompt Template
|
|
93
|
-
|
|
94
|
-
At the start of delegated execution, create a per-run OS-temp scratch directory via `mktemp -d` and capture its **absolute path** for all downstream use. All scratch files for this invocation live under that directory. Do not use `.context/`: these scratch files are per-run throwaways that the OS temp directory eventually reclaims (see Scratch cleanup below), matching the repo Scratch Space convention for one-shot artifacts. Do not pass unresolved shell-variable strings to non-shell tools (Write, Read); use the absolute path returned by `mktemp -d`.
|
|
95
|
-
|
|
96
|
-
```bash
|
|
97
|
-
SCRATCH_DIR="$(mktemp -d -t ce-work-codex-XXXXXX)"
|
|
98
|
-
echo "$SCRATCH_DIR"
|
|
99
|
-
```
|
|
100
|
-
|
|
101
|
-
Refer to the echoed absolute path as `<scratch-dir>` throughout the rest of this workflow.
|
|
102
|
-
|
|
103
|
-
Before each batch, write a prompt file to `<scratch-dir>/prompt-batch-<batch-num>.md`.
|
|
104
|
-
|
|
105
|
-
Build the prompt from the batch's implementation units using these XML-tagged sections:
|
|
106
|
-
|
|
107
|
-
```xml
|
|
108
|
-
<task>
|
|
109
|
-
[For a single-unit batch: Goal from the implementation unit.
|
|
110
|
-
For a multi-unit batch: list each unit with its Goal, stating the concrete
|
|
111
|
-
job, repository context, and expected end state for each.]
|
|
112
|
-
</task>
|
|
113
|
-
|
|
114
|
-
<files>
|
|
115
|
-
[Combined file list from all units in the batch -- files to create, modify, or read.]
|
|
116
|
-
</files>
|
|
117
|
-
|
|
118
|
-
<patterns>
|
|
119
|
-
[File paths from all units' "Patterns to follow" fields. If no patterns:
|
|
120
|
-
"No explicit patterns referenced -- follow existing conventions in the
|
|
121
|
-
modified files."]
|
|
122
|
-
</patterns>
|
|
123
|
-
|
|
124
|
-
<approach>
|
|
125
|
-
[For a single-unit batch: Approach from the unit.
|
|
126
|
-
For a multi-unit batch: list each unit's approach, noting dependencies
|
|
127
|
-
and suggested ordering.]
|
|
128
|
-
</approach>
|
|
129
|
-
|
|
130
|
-
<constraints>
|
|
131
|
-
- Do NOT run git commit, git push, or create PRs -- the orchestrating agent handles all git operations
|
|
132
|
-
- Restrict all modifications to files within the repository root
|
|
133
|
-
- Keep changes tightly scoped to the stated task -- avoid unrelated refactors, renames, or cleanup
|
|
134
|
-
- Resolve the task fully before stopping -- do not stop at the first plausible answer
|
|
135
|
-
- If you discover mid-execution that you need to modify files outside the repo root, complete what you can within the repo and report what you could not do via the result schema issues field
|
|
136
|
-
</constraints>
|
|
137
|
-
|
|
138
|
-
<testing>
|
|
139
|
-
Before writing tests, check whether the plan's test scenarios cover all
|
|
140
|
-
categories that apply to each unit. Supplement gaps before writing tests:
|
|
141
|
-
- Happy path: core input/output pairs from each unit's goal
|
|
142
|
-
- Edge cases: boundary values, empty/nil inputs, type mismatches
|
|
143
|
-
- Error/failure paths: invalid inputs, permission denials, downstream failures
|
|
144
|
-
- Integration: cross-layer scenarios that mocks alone won't prove
|
|
145
|
-
|
|
146
|
-
Write tests that name specific inputs and expected outcomes. If your changes
|
|
147
|
-
touch code with callbacks, middleware, or event handlers, verify the
|
|
148
|
-
interaction chain works end-to-end.
|
|
149
|
-
</testing>
|
|
150
|
-
|
|
151
|
-
<verify>
|
|
152
|
-
After implementing, run ALL test files together in a single command (not
|
|
153
|
-
per-file). Cross-file contamination (e.g., mocked globals leaking between
|
|
154
|
-
test files) only surfaces when tests run in the same process. If tests
|
|
155
|
-
fail, fix the issues and re-run until they pass. Do not report status
|
|
156
|
-
"completed" unless verification passes. This is your responsibility --
|
|
157
|
-
the orchestrator will not re-run verification independently.
|
|
158
|
-
|
|
159
|
-
[Test and lint commands from the project. Use the union of all units'
|
|
160
|
-
verification commands as a single combined invocation.]
|
|
161
|
-
</verify>
|
|
162
|
-
|
|
163
|
-
<output_contract>
|
|
164
|
-
Report your result via the --output-schema mechanism. Fill in every field:
|
|
165
|
-
- status: "completed" ONLY if all changes were made AND verification passes,
|
|
166
|
-
"partial" if incomplete, "failed" if no meaningful progress
|
|
167
|
-
- files_modified: array of file paths you changed
|
|
168
|
-
- issues: array of strings describing any problems, gaps, or out-of-scope
|
|
169
|
-
work discovered
|
|
170
|
-
- summary: one-paragraph description of what was done
|
|
171
|
-
- verification_summary: what you ran to verify (command and outcome).
|
|
172
|
-
Example: "Ran `bun test` -- 14 tests passed, 0 failed."
|
|
173
|
-
If no verification was possible, say why.
|
|
174
|
-
</output_contract>
|
|
175
|
-
```
|
|
176
|
-
|
|
177
|
-
## Result Schema
|
|
178
|
-
|
|
179
|
-
Write the result schema to `<scratch-dir>/result-schema.json` (using the absolute path captured at the start) once at the start of delegated execution:
|
|
180
|
-
|
|
181
|
-
```json
|
|
182
|
-
{
|
|
183
|
-
"type": "object",
|
|
184
|
-
"properties": {
|
|
185
|
-
"status": { "enum": ["completed", "partial", "failed"] },
|
|
186
|
-
"files_modified": { "type": "array", "items": { "type": "string" } },
|
|
187
|
-
"issues": { "type": "array", "items": { "type": "string" } },
|
|
188
|
-
"summary": { "type": "string" },
|
|
189
|
-
"verification_summary": { "type": "string" }
|
|
190
|
-
},
|
|
191
|
-
"required": ["status", "files_modified", "issues", "summary", "verification_summary"],
|
|
192
|
-
"additionalProperties": false
|
|
193
|
-
}
|
|
194
|
-
```
|
|
195
|
-
|
|
196
|
-
Each batch's result is written to `<scratch-dir>/result-batch-<batch-num>.json` via the `-o` flag. On plan failure, files are left in place for debugging.
|
|
197
|
-
|
|
198
|
-
If the result JSON is absent or malformed after a successful exit code, classify as task failure.
|
|
199
|
-
|
|
200
|
-
## Execution Loop
|
|
201
|
-
|
|
202
|
-
Initialize a `consecutive_failures` counter at 0 before the first batch.
|
|
203
|
-
|
|
204
|
-
**Clean-baseline preflight:** Before the first batch, verify there are no uncommitted changes to tracked files:
|
|
205
|
-
|
|
206
|
-
```bash
|
|
207
|
-
git diff --quiet HEAD
|
|
208
|
-
```
|
|
209
|
-
|
|
210
|
-
This intentionally ignores untracked files. Only staged or unstaged modifications to tracked files make rollback unsafe. However, if untracked files exist at paths in the batch's planned Files list, rollback (`git clean -fd -- <paths>`) would delete them. If such overlaps are detected, warn the user and recommend committing or stashing those files before proceeding.
|
|
211
|
-
|
|
212
|
-
If tracked files are dirty, stop and present options: (1) commit current changes, (2) stash explicitly (`git stash push -m "pre-delegation"`), (3) continue in standard mode (sets `delegation_active` to false). Do not auto-stash user changes.
|
|
213
|
-
|
|
214
|
-
**Delegation invocation:** For each batch, execute these as **separate Bash tool calls** (not combined into one):
|
|
215
|
-
|
|
216
|
-
**Step A — Launch (background, separate Bash call):**
|
|
217
|
-
|
|
218
|
-
Write the prompt file, then make a single Bash tool call with `run_in_background: true` set on the tool parameter. This call returns immediately and has no timeout ceiling.
|
|
219
|
-
|
|
220
|
-
Substitute the literal absolute path captured at setup for every `<scratch-dir>` below. Each Bash tool call starts a fresh shell, so the `$SCRATCH_DIR` variable from the setup snippet is not preserved — an unresolved `$SCRATCH_DIR` would expand empty and break result detection.
|
|
221
|
-
|
|
222
|
-
```bash
|
|
223
|
-
# Substitute the resolved sandbox_mode value (yolo or full-auto) from the skill state
|
|
224
|
-
SANDBOX_MODE="<sandbox_mode>"
|
|
225
|
-
|
|
226
|
-
# Resolve sandbox flag
|
|
227
|
-
if [ "$SANDBOX_MODE" = "full-auto" ]; then
|
|
228
|
-
SANDBOX_FLAG="--full-auto"
|
|
229
|
-
else
|
|
230
|
-
SANDBOX_FLAG="--dangerously-bypass-approvals-and-sandbox"
|
|
231
|
-
fi
|
|
232
|
-
|
|
233
|
-
codex exec \
|
|
234
|
-
$SANDBOX_FLAG \
|
|
235
|
-
--output-schema "<scratch-dir>/result-schema.json" \
|
|
236
|
-
-o "<scratch-dir>/result-batch-<batch-num>.json" \
|
|
237
|
-
- < "<scratch-dir>/prompt-batch-<batch-num>.md"
|
|
238
|
-
```
|
|
239
|
-
|
|
240
|
-
**Conditional flags** — only include each line when the corresponding skill-state value is set:
|
|
241
|
-
|
|
242
|
-
- If `delegate_model` is set, insert ` -m "<delegate_model>" \` as a line before `$SANDBOX_FLAG`.
|
|
243
|
-
- If `delegate_effort` is set, insert ` -c 'model_reasoning_effort="<delegate_effort>"' \` as a line before `$SANDBOX_FLAG`.
|
|
244
|
-
|
|
245
|
-
When either value is unset, omit its line entirely — Codex resolves the default from the user's `~/.codex/config.toml` (and ultimately the CLI's own built-in default). Do not substitute a placeholder string for unset values.
|
|
246
|
-
|
|
247
|
-
Critical: `run_in_background: true` must be set as a **Bash tool parameter**, not as a shell `&` suffix. The tool parameter is what removes the timeout ceiling. A shell `&` inside a foreground Bash call still hits the 2-minute default timeout.
|
|
248
|
-
|
|
249
|
-
Quoting is critical for the `-c` flag when present: use single quotes around the entire key=value and double quotes around the TOML string value inside. Example: `-c 'model_reasoning_effort="high"'`.
|
|
250
|
-
|
|
251
|
-
Do not improvise CLI flags or modify this invocation template beyond the documented conditional insertions.
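
As a rough sketch of how the conditional insertions compose, the hypothetical helper below assembles the same argument list as the template and prints it one argument per line instead of running it. The function name and its argument order are assumptions for illustration. Every flag comes from the template above.

```bash
# Hypothetical helper (illustration only): print the codex argv, one argument
# per line. Args: sandbox_mode, scratch_dir, batch_num, delegate_model (may be
# empty), delegate_effort (may be empty).
build_codex_args() {
  sandbox_mode="$1"; scratch_dir="$2"; batch_num="$3"
  delegate_model="$4"; delegate_effort="$5"

  set -- exec
  # Optional flags are added only when the corresponding value is set.
  if [ -n "$delegate_model" ]; then
    set -- "$@" -m "$delegate_model"
  fi
  if [ -n "$delegate_effort" ]; then
    # The TOML string value keeps its inner double quotes.
    set -- "$@" -c "model_reasoning_effort=\"$delegate_effort\""
  fi
  if [ "$sandbox_mode" = "full-auto" ]; then
    set -- "$@" --full-auto
  else
    set -- "$@" --dangerously-bypass-approvals-and-sandbox
  fi
  set -- "$@" --output-schema "$scratch_dir/result-schema.json" \
    -o "$scratch_dir/result-batch-$batch_num.json" -
  printf '%s\n' "$@"
}
```

Printing the arguments makes it easy to check that an unset model or effort leaves no placeholder behind. The real invocation still reads the prompt file on stdin, as in the template.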
|
|
252
|
-
|
|
253
|
-
**Step B — Poll (foreground, separate Bash calls):**
|
|
254
|
-
|
|
255
|
-
After the launch call returns, make a **new, separate** foreground Bash tool call that polls for the result file. This keeps the agent's turn active so the user cannot interfere with the working tree.
|
|
256
|
-
|
|
257
|
-
Substitute the literal absolute path captured at setup for `<scratch-dir>`. The shell variable from Step A does not survive across separate Bash tool calls.
|
|
258
|
-
|
|
259
|
-
```bash
|
|
260
|
-
RESULT_FILE="<scratch-dir>/result-batch-<batch-num>.json"
|
|
261
|
-
for i in $(seq 1 6); do
|
|
262
|
-
test -s "$RESULT_FILE" && echo "DONE" && exit 0
|
|
263
|
-
sleep 10
|
|
264
|
-
done
|
|
265
|
-
echo "Waiting for Codex..."
|
|
266
|
-
```
|
|
267
|
-
|
|
268
|
-
If the output is "Waiting for Codex...", issue the same polling command again as another separate Bash call. Repeat until the output is "DONE", then read the result file and proceed to classification.
|
|
269
|
-
|
|
270
|
-
**Polling termination conditions:** Stop polling when any of these conditions is met:
|
|
271
|
-
|
|
272
|
-
- **Result file appears** (output is "DONE") -- proceed to result classification normally.
|
|
273
|
-
- **Background process exits with non-zero code** -- classify as CLI failure (row 1). Rollback and fall back to standard mode.
|
|
274
|
-
- **Background process exits with zero code but result file is absent** -- classify as task failure (row 2: exit 0, result JSON missing). Rollback and increment `consecutive_failures`.
|
|
275
|
-
- **5 polling rounds** elapse (~5 minutes) without the result file appearing and without a background process notification -- treat as a hung process. Classify as CLI failure (row 1). Rollback and fall back to standard mode.
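
The round cap in the last condition can be sketched as follows. This is an illustration under stated assumptions: `poll_round` and `poll_with_cap` are hypothetical names, and the outer loop runs in one shell only to show the counting. In the workflow itself each round is a separate Bash tool call, so the orchestrator keeps the round count.

```bash
# Hypothetical helper: one polling round. Checks for a non-empty result file
# up to $2 times, sleeping $3 seconds after each miss. Prints DONE or WAITING.
poll_round() {
  result_file="$1"; tries="$2"; interval="$3"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ -s "$result_file" ]; then
      echo "DONE"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "WAITING"
  return 1
}

# Illustration only: stop after max_rounds rounds and report HUNG, which the
# workflow would then classify as a CLI failure.
poll_with_cap() {
  result_file="$1"; max_rounds="$2"; tries="$3"; interval="$4"
  round=0
  while [ "$round" -lt "$max_rounds" ]; do
    if poll_round "$result_file" "$tries" "$interval" >/dev/null; then
      echo "DONE"
      return 0
    fi
    round=$((round + 1))
  done
  echo "HUNG"
  return 1
}
```

With the values from this section, the call would be `poll_with_cap "<scratch-dir>/result-batch-<batch-num>.json" 5 6 10`. The sketch does not watch the background process exit code, which the second and third conditions also require.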
|
|
276
|
-
|
|
277
|
-
**Result classification:** Codex is responsible for running verification internally and fixing failures before reporting -- the orchestrator does not re-run verification independently.
|
|
278
|
-
|
|
279
|
-
| # | Signal | Classification | Action |
|
|
280
|
-
|---|--------|---------------|--------|
|
|
281
|
-
| 1 | Exit code != 0 | CLI failure | Rollback to HEAD. Fall back to standard mode for ALL remaining work. |
|
|
282
|
-
| 2 | Exit code 0, result JSON missing or malformed | Task failure | Rollback to HEAD. Increment `consecutive_failures`. |
|
|
283
|
-
| 3 | Exit code 0, `status: "failed"` | Task failure | Rollback to HEAD. Increment `consecutive_failures`. |
|
|
284
|
-
| 4 | Exit code 0, `status: "partial"` | Partial success | Keep the diff. Complete remaining work locally, verify, and commit. Increment `consecutive_failures`. |
|
|
285
|
-
| 5 | Exit code 0, `status: "completed"` | Success | Commit changes. Reset `consecutive_failures` to 0. |
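
The table can be read as a small decision function. The sketch below is one possible encoding, assuming the `status` field has already been extracted from the result JSON and that an empty string stands in for a missing or malformed file. `classify_result` and its output labels are hypothetical names.

```bash
# Hypothetical helper: map (exit code, status field) to a classification.
# An empty or unrecognized status is treated like a missing or malformed
# result JSON.
classify_result() {
  exit_code="$1"
  status="$2"
  if [ "$exit_code" -ne 0 ]; then
    echo "cli_failure"                      # row 1
    return
  fi
  case "$status" in
    completed) echo "success" ;;            # row 5
    partial)   echo "partial_success" ;;    # row 4
    *)         echo "task_failure" ;;       # rows 2 and 3
  esac
}
```

Checking the exit code first mirrors the table: a non-zero exit is a CLI failure regardless of what the result file says.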
|
|
286
|
-
|
|
287
|
-
**Result handoff — surface to user:** After reading the result JSON and before committing or rolling back, display a summary so the user sees what happened. Format:
|
|
288
|
-
|
|
289
|
-
> **Codex batch <batch-num> — <classification>**
|
|
290
|
-
> <summary from result JSON>
|
|
291
|
-
>
|
|
292
|
-
> **Files:** <comma-separated list from files_modified>
|
|
293
|
-
> **Verification:** <verification_summary from result JSON>
|
|
294
|
-
> **Issues:** <issues list, or "None">
|
|
295
|
-
|
|
296
|
-
On failure or partial results, include the classification reason (e.g., "status: failed", "result JSON missing") so the user understands why the orchestrator is rolling back or completing locally.
|
|
297
|
-
|
|
298
|
-
Keep this brief — the goal is transparency, not a wall of text. One short block per batch.
|
|
299
|
-
|
|
300
|
-
**Rollback procedure:**
|
|
301
|
-
|
|
302
|
-
```bash
|
|
303
|
-
git checkout -- .
|
|
304
|
-
git clean -fd -- <paths from the batch's combined Files list>
|
|
305
|
-
```
|
|
306
|
-
|
|
307
|
-
Do NOT use bare `git clean -fd` without path arguments.
|
|
308
|
-
|
|
309
|
-
**Commit on success:**
|
|
310
|
-
|
|
311
|
-
```bash
|
|
312
|
-
# Stage all tracked changes and new untracked files (safe for paths containing spaces)
git add -A
|
|
313
|
-
git commit -m "feat(<scope>): <batch summary>"
|
|
314
|
-
```
|
|
315
|
-
|
|
316
|
-
**Between batches** (plans split into multiple batches): Report what completed, test results, and what's next. Continue immediately unless the user intervenes -- the checkpoint exists so the user *can* steer, not so they *must*.
|
|
317
|
-
|
|
318
|
-
**Circuit breaker:** After 3 consecutive failures, set `delegation_active` to false and emit: "Codex delegation disabled after 3 consecutive failures -- completing remaining units in standard mode."
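
A minimal sketch of the counter behind the circuit breaker, assuming each batch outcome arrives as a short label. The label names and both helper names are hypothetical. Only the threshold of 3 and the reset and increment rules come from this workflow.

```bash
# Hypothetical helper: print the updated consecutive_failures count for one
# batch outcome.
next_failure_count() {
  count="$1"
  classification="$2"
  case "$classification" in
    success) echo 0 ;;                                    # reset on success
    task_failure|partial_success) echo $((count + 1)) ;;  # both increment
    *) echo "$count" ;;   # e.g. a CLI failure, which falls back to standard mode instead
  esac
}

# Succeeds (exit status 0) once the breaker should trip.
breaker_tripped() {
  [ "$1" -ge 3 ]
}
```

Because a success resets the count to 0, only an unbroken run of three failing or partial batches disables delegation.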
|
|
319
|
-
|
|
320
|
-
**Scratch cleanup:** No explicit cleanup needed — OS temp handles eventual cleanup (macOS `$TMPDIR` periodic purge; Linux/WSL `/tmp` reboot or periodic cleanup). Leaving `<scratch-dir>` in place after the run also preserves intermediate artifacts for debugging if anything went wrong.
|
|
321
|
-
|
|
322
|
-
## Mixed-Model Attribution
|
|
323
|
-
|
|
324
|
-
When some units are executed by Codex and others locally:
|
|
325
|
-
- If all units used delegation: attribute to the Codex model
|
|
326
|
-
- If all units used standard mode: attribute to the current agent's model
|
|
327
|
-
- If mixed: note which units were delegated in the PR description and credit both models
|
|
@@ -1,129 +0,0 @@
|
|
|
1
|
-
# Shipping Workflow
|
|
2
|
-
|
|
3
|
-
This file contains the shipping workflow (Phase 3-4). Load it only when all Phase 2 tasks are complete and execution transitions to quality check.
|
|
4
|
-
|
|
5
|
-
## Phase 3: Quality Check
|
|
6
|
-
|
|
7
|
-
1. **Run Core Quality Checks**
|
|
8
|
-
|
|
9
|
-
Always run before submitting:
|
|
10
|
-
|
|
11
|
-
```bash
|
|
12
|
-
# Run full test suite (use project's test command)
|
|
13
|
-
# Examples: bin/rails test, npm test, pytest, go test, etc.
|
|
14
|
-
|
|
15
|
-
# Run linting (per AGENTS.md)
|
|
16
|
-
# Use linting-agent before pushing to origin
|
|
17
|
-
```
|
|
18
|
-
|
|
19
|
-
2. **Code Review** (REQUIRED)
|
|
20
|
-
|
|
21
|
-
Every change gets reviewed before shipping. The depth scales with the change's risk profile, but review itself is never skipped.
|
|
22
|
-
|
|
23
|
-
**Tier 2: Full review (default)** -- REQUIRED unless Tier 1 criteria are explicitly met. Invoke the `ce-code-review` skill with `mode:autofix` to run specialized reviewer agents, auto-apply safe fixes, and record residual downstream work in the per-run artifact. When the plan file path is known, pass it as `plan:<path>`. This is the mandatory default -- proceed to Tier 1 only after confirming every criterion below.
|
|
24
|
-
|
|
25
|
-
**Tier 1: Inline self-review** -- A lighter alternative permitted only when **all four** criteria are true. Before choosing Tier 1, explicitly state which criteria apply and why. If any criterion is uncertain, use Tier 2.
|
|
26
|
-
- Purely additive (new files only, no existing behavior modified)
|
|
27
|
-
- Single concern (one skill, one component -- not cross-cutting)
|
|
28
|
-
- Pattern-following (implementation mirrors an existing example with no novel logic)
|
|
29
|
-
- Plan-faithful (no scope growth, no deferred questions resolved with surprising answers)
|
|
30
|
-
|
|
31
|
-
3. **Residual Work Gate** (REQUIRED when Tier 2 ran)
|
|
32
|
-
|
|
33
|
-
After Tier 2 code review completes, inspect the Residual Actionable Work summary it returned (or read the run artifact directly if the summary was not emitted). If one or more residual `downstream-resolver` findings remain, do not proceed to Final Validation until the user decides how to handle them.
|
|
34
|
-
|
|
35
|
-
Ask the user using the platform's blocking question tool (`question` in OpenCode with `ToolSearch select:question` pre-loaded if needed, `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension)). Fall back to numbered options in chat only when the harness genuinely lacks a blocking tool. Never silently skip the gate.
|
|
36
|
-
|
|
37
|
-
Stem: `Code review found N residual finding(s) the skill did not auto-fix. How should the agent proceed?`
|
|
38
|
-
|
|
39
|
-
Options (four or fewer, self-contained labels):
|
|
40
|
-
- `Apply/fix now` — loop back into review with focused fixes; the agent investigates each finding, applies changes where safe, and re-runs review.
|
|
41
|
-
- `File tickets via project tracker` — load `references/tracker-defer.md` in Interactive mode; the agent files tickets in the project's detected tracker (or `gh` fallback, or leaves them in the report if no sink exists) and proceeds to Final Validation.
|
|
42
|
-
- `Accept and proceed` — record the residual findings verbatim in a durable "Known Residuals" sink before shipping. If a PR will be created or updated in Phase 4, include them in the PR description's "Known Residuals" section (the agent owns this when calling `ce-commit-push-pr`). If the user later chooses the no-PR `ce-commit` path, create `docs/residual-review-findings/<branch-or-head-sha>.md`, include the accepted findings and source review-run context, stage it with the implementation commit, and mention the file path in the final summary. The user has acknowledged the risk, but the findings must not live only in the transient session.
|
|
43
|
-
- `Stop — do not ship` — abort the shipping workflow. The user will handle findings manually before re-invoking.
|
|
44
|
-
|
|
45
|
-
Skip this gate entirely when the review reported `Residual actionable work: none.` or when only Tier 1 (inline self-review) was used. Do not proceed past this gate on an `Accept and proceed` decision until the agent has recorded whether the durable sink is `PR Known Residuals` or `docs/residual-review-findings/<branch-or-head-sha>.md`.
|
|
46
|
-
|
|
47
|
-
4. **Final Validation**
|
|
48
|
-
- All tasks marked completed
|
|
49
|
-
- Testing addressed -- tests pass and new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
|
|
50
|
-
- Linting passes
|
|
51
|
-
- Code follows existing patterns
|
|
52
|
-
- Figma designs match (if applicable)
|
|
53
|
-
- No console errors or warnings
|
|
54
|
-
- If the plan has a `Requirements` section (or legacy `Requirements Trace`), verify each requirement is satisfied by the completed work
|
|
55
|
-
- If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution
|
|
56
|
-
|
|
57
|
-
5. **Prepare Operational Validation Plan** (REQUIRED)
|
|
58
|
-
- Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
|
|
59
|
-
- Include concrete:
|
|
60
|
-
- Log queries/search terms
|
|
61
|
-
- Metrics or dashboards to watch
|
|
62
|
-
- Expected healthy signals
|
|
63
|
-
- Failure signals and rollback/mitigation trigger
|
|
64
|
-
- Validation window and owner
|
|
65
|
-
- If there is truly no production/runtime impact, still include the section with: `No additional operational monitoring required` and a one-line reason.
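As a rough sketch of what the required section might look like when assembled programmatically, the helper below renders the five items listed above, or the no-impact fallback. The function name, its parameters, and the exact bullet layout are assumptions made for illustration; the skill only prescribes the heading and the kinds of content.

```python
from typing import Optional, Sequence

HEADING = "## Post-Deploy Monitoring & Validation"


def render_monitoring_section(
    log_queries: Sequence[str] = (),
    metrics: Sequence[str] = (),
    healthy_signals: Sequence[str] = (),
    failure_signals: Sequence[str] = (),
    window_and_owner: str = "",
    no_impact_reason: Optional[str] = None,
) -> str:
    """Render the PR description section (hypothetical layout)."""
    if no_impact_reason is not None:
        # The section is still present, with the fixed phrase and a one-line reason.
        return (
            f"{HEADING}\n\n"
            f"No additional operational monitoring required: {no_impact_reason}\n"
        )
    groups = [
        ("Log queries/search terms", list(log_queries)),
        ("Metrics or dashboards to watch", list(metrics)),
        ("Expected healthy signals", list(healthy_signals)),
        ("Failure signals and rollback/mitigation trigger", list(failure_signals)),
        ("Validation window and owner", [window_and_owner] if window_and_owner else []),
    ]
    lines = [HEADING, ""]
    for label, items in groups:
        lines.append(f"- {label}:")
        lines.extend(f"  - {item}" for item in items)
    return "\n".join(lines) + "\n"
```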
|
|
66
|
-
|
|
67
|
-
## Phase 4: Ship It
|
|
68
|
-
|
|
69
|
-
1. **Prepare Evidence Context**
|
|
70
|
-
|
|
71
|
-
Do not invoke `ce-demo-reel` directly in this step. Evidence capture belongs to the PR creation or PR description update flow, where the final PR diff and description context are available.
|
|
72
|
-
|
|
73
|
-
Note whether the completed work has observable behavior (UI rendering, CLI output, API/library behavior with a runnable example, generated artifacts, or workflow output). The `ce-commit-push-pr` skill will ask whether to capture evidence only when evidence is possible.
|
|
74
|
-
|
|
75
|
-
2. **Update Plan Status**
|
|
76
|
-
|
|
77
|
-
If the input document has YAML frontmatter with a `status` field, update it to `completed`:
|
|
78
|
-
```
|
|
79
|
-
status: active -> status: completed
|
|
80
|
-
```
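One way to apply this update without touching anything outside the frontmatter is sketched below. It is a minimal, assumption-laden example: `mark_plan_completed` is a hypothetical helper, it uses a plain regular expression instead of a YAML parser, and it assumes LF line endings and a top-level `status` key.

```python
import re

# Frontmatter is the block between a leading "---" line and the next "---" line.
FRONTMATTER = re.compile(r"\A---\n(.*?\n)---\n", re.DOTALL)
STATUS_LINE = re.compile(r"^status:[ \t]*.*$", re.MULTILINE)


def mark_plan_completed(text: str) -> str:
    """Set `status: completed` only if the frontmatter already has a status field."""
    match = FRONTMATTER.match(text)
    if match is None:
        return text  # no YAML frontmatter: leave the document untouched
    block = match.group(1)
    new_block, count = STATUS_LINE.subn("status: completed", block, count=1)
    if count == 0:
        return text  # frontmatter without a status field: nothing to update
    return text[: match.start(1)] + new_block + text[match.end(1):]
```

Restricting the substitution to the captured frontmatter block avoids rewriting a `status:` line that happens to appear in the plan body.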
|
|
81
|
-
|
|
82
|
-
3. **Commit and Create Pull Request**
|
|
83
|
-
|
|
84
|
-
Load the `ce-commit-push-pr` skill to handle committing, pushing, and PR creation. The skill handles convention detection, branch safety, logical commit splitting, adaptive PR descriptions, and attribution badges.
|
|
85
|
-
|
|
86
|
-
When providing context for the PR description, include:
|
|
87
|
-
- The plan's summary and key decisions
|
|
88
|
-
- Testing notes (tests added/modified, manual testing performed)
|
|
89
|
-
- Evidence context from step 1, so `ce-commit-push-pr` can decide whether to ask about capturing evidence
|
|
90
|
-
- Figma design link (if applicable)
|
|
91
|
-
- The Post-Deploy Monitoring & Validation section (see Phase 3 Step 5)
|
|
92
|
-
- Any "Known Residuals" accepted in the Phase 3 Residual Work Gate, rendered as a dedicated section in the PR body with severity, file:line, and title per finding
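The "Known Residuals" section described in the last item could be rendered along these lines. This is only a sketch: the `Finding` dataclass, the `render_known_residuals` helper, the heading level, and the bullet format are all hypothetical; the source specifies only that each finding carries severity, file:line, and title.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Finding:
    # Hypothetical record for one accepted residual finding.
    severity: str
    file: str
    line: int
    title: str


def render_known_residuals(findings: Sequence[Finding]) -> str:
    """Render accepted findings as a dedicated PR body section."""
    if not findings:
        return ""  # nothing was accepted, so no section is emitted
    lines = ["## Known Residuals", ""]
    for finding in findings:
        lines.append(
            f"- [{finding.severity}] {finding.file}:{finding.line} {finding.title}"
        )
    return "\n".join(lines) + "\n"
```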
|
|
93
|
-
|
|
94
|
-
If the user prefers to commit without creating a PR, load the `ce-commit` skill instead.
|
|
95
|
-
|
|
96
|
-
4. **Notify User**
|
|
97
|
-
- Summarize what was completed
|
|
98
|
-
- Link to PR (if one was created)
|
|
99
|
-
- Note any follow-up work needed
|
|
100
|
-
- Suggest next steps if applicable
|
|
101
|
-
|
|
102
|
-
## Quality Checklist
|
|
103
|
-
|
|
104
|
-
Before creating a PR, verify:
|
|
105
|
-
|
|
106
|
-
- [ ] All clarifying questions asked and answered
|
|
107
|
-
- [ ] All tasks marked completed
|
|
108
|
-
- [ ] Testing addressed -- tests pass AND new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
|
|
109
|
-
- [ ] Linting passes (use linting-agent)
|
|
110
|
-
- [ ] Code follows existing patterns
|
|
111
|
-
- [ ] Figma designs match implementation (if applicable)
|
|
112
|
-
- [ ] Evidence decision handled by `ce-commit-push-pr` when the change has observable behavior
|
|
113
|
-
- [ ] Commit messages follow conventional format
|
|
114
|
-
- [ ] PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
|
|
115
|
-
- [ ] Code review completed (inline self-review or full `ce-code-review`)
|
|
116
|
-
- [ ] PR description includes summary, testing notes, and evidence when captured
|
|
117
|
-
- [ ] PR description includes Compound Engineered badge with accurate model and harness
|
|
118
|
-
|
|
119
|
-
## Code Review Tiers
|
|
120
|
-
|
|
121
|
-
Every change gets reviewed. The tier determines depth, not whether review happens.
|
|
122
|
-
|
|
123
|
-
**Tier 2 (full review)** -- REQUIRED default. Invoke `ce-code-review mode:autofix` with `plan:<path>` when available. Safe fixes are applied automatically; residual work is recorded in the run artifact for downstream routing. Always use this tier unless all four Tier 1 criteria are explicitly confirmed.
|
|
124
|
-
|
|
125
|
-
**Tier 1 (inline self-review)** -- permitted only when all four criteria are true (state each explicitly before choosing):
|
|
126
|
-
- Purely additive (new files only, no existing behavior modified)
|
|
127
|
-
- Single concern (one skill, one component -- not cross-cutting)
|
|
128
|
-
- Pattern-following (mirrors an existing example, no novel logic)
|
|
129
|
-
- Plan-faithful (no scope growth, no surprising deferred-question resolutions)
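The tier rule above amounts to "Tier 2 unless every Tier 1 criterion is explicitly confirmed". A minimal sketch of that decision follows; `TierOneCriteria` and `choose_review_tier` are hypothetical names used for illustration, not part of the skill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TierOneCriteria:
    # Each flag must be stated explicitly; there are no defaults on purpose.
    purely_additive: bool
    single_concern: bool
    pattern_following: bool
    plan_faithful: bool


def choose_review_tier(criteria: Optional[TierOneCriteria] = None) -> int:
    """Return 1 only when all four criteria are confirmed; otherwise 2."""
    if criteria is None:
        return 2  # nothing was confirmed, so the required default applies
    confirmed = all(
        (
            criteria.purely_additive,
            criteria.single_concern,
            criteria.pattern_following,
            criteria.plan_faithful,
        )
    )
    return 1 if confirmed else 2
```

Leaving the dataclass fields without defaults mirrors the requirement to state each criterion before choosing Tier 1.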
|