oh-my-codex 0.18.8 → 0.18.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (221)
  1. package/Cargo.lock +12 -12
  2. package/Cargo.toml +1 -1
  3. package/README.md +4 -0
  4. package/dist/autopilot/__tests__/deep-interview-gate.test.d.ts +2 -0
  5. package/dist/autopilot/__tests__/deep-interview-gate.test.d.ts.map +1 -0
  6. package/dist/autopilot/__tests__/deep-interview-gate.test.js +215 -0
  7. package/dist/autopilot/__tests__/deep-interview-gate.test.js.map +1 -0
  8. package/dist/autopilot/__tests__/fsm.test.js +3 -0
  9. package/dist/autopilot/__tests__/fsm.test.js.map +1 -1
  10. package/dist/autopilot/__tests__/ralplan-gate.test.js +148 -0
  11. package/dist/autopilot/__tests__/ralplan-gate.test.js.map +1 -1
  12. package/dist/autopilot/deep-interview-gate.d.ts.map +1 -1
  13. package/dist/autopilot/deep-interview-gate.js +140 -0
  14. package/dist/autopilot/deep-interview-gate.js.map +1 -1
  15. package/dist/autopilot/fsm.js +2 -2
  16. package/dist/autopilot/fsm.js.map +1 -1
  17. package/dist/cli/__tests__/auth.test.js +37 -2
  18. package/dist/cli/__tests__/auth.test.js.map +1 -1
  19. package/dist/cli/__tests__/codex-feature-probe.test.d.ts +2 -0
  20. package/dist/cli/__tests__/codex-feature-probe.test.d.ts.map +1 -0
  21. package/dist/cli/__tests__/codex-feature-probe.test.js +46 -0
  22. package/dist/cli/__tests__/codex-feature-probe.test.js.map +1 -0
  23. package/dist/cli/__tests__/codex-plugin-layout.test.js +1 -1
  24. package/dist/cli/__tests__/codex-plugin-layout.test.js.map +1 -1
  25. package/dist/cli/__tests__/doctor-warning-copy.test.js +2 -0
  26. package/dist/cli/__tests__/doctor-warning-copy.test.js.map +1 -1
  27. package/dist/cli/__tests__/index.test.js +288 -6
  28. package/dist/cli/__tests__/index.test.js.map +1 -1
  29. package/dist/cli/__tests__/launch-fallback.test.js +19 -5
  30. package/dist/cli/__tests__/launch-fallback.test.js.map +1 -1
  31. package/dist/cli/__tests__/package-bin-contract.test.js +39 -10
  32. package/dist/cli/__tests__/package-bin-contract.test.js.map +1 -1
  33. package/dist/cli/__tests__/question.test.js +26 -9
  34. package/dist/cli/__tests__/question.test.js.map +1 -1
  35. package/dist/cli/__tests__/resume.test.js +50 -1
  36. package/dist/cli/__tests__/resume.test.js.map +1 -1
  37. package/dist/cli/__tests__/setup-refresh.test.js +6 -2
  38. package/dist/cli/__tests__/setup-refresh.test.js.map +1 -1
  39. package/dist/cli/__tests__/sparkshell-packaging.test.js +45 -2
  40. package/dist/cli/__tests__/sparkshell-packaging.test.js.map +1 -1
  41. package/dist/cli/__tests__/team-decompose.test.js +10 -5
  42. package/dist/cli/__tests__/team-decompose.test.js.map +1 -1
  43. package/dist/cli/__tests__/team.test.js +45 -1
  44. package/dist/cli/__tests__/team.test.js.map +1 -1
  45. package/dist/cli/__tests__/ultragoal.test.js +75 -0
  46. package/dist/cli/__tests__/ultragoal.test.js.map +1 -1
  47. package/dist/cli/__tests__/update.test.js +214 -17
  48. package/dist/cli/__tests__/update.test.js.map +1 -1
  49. package/dist/cli/__tests__/windows-popup-loop-contract.test.js +1 -1
  50. package/dist/cli/auth.d.ts.map +1 -1
  51. package/dist/cli/auth.js +25 -1
  52. package/dist/cli/auth.js.map +1 -1
  53. package/dist/cli/codex-feature-probe.d.ts +5 -2
  54. package/dist/cli/codex-feature-probe.d.ts.map +1 -1
  55. package/dist/cli/codex-feature-probe.js +25 -9
  56. package/dist/cli/codex-feature-probe.js.map +1 -1
  57. package/dist/cli/index.d.ts +39 -5
  58. package/dist/cli/index.d.ts.map +1 -1
  59. package/dist/cli/index.js +184 -101
  60. package/dist/cli/index.js.map +1 -1
  61. package/dist/cli/setup.d.ts.map +1 -1
  62. package/dist/cli/setup.js +9 -1
  63. package/dist/cli/setup.js.map +1 -1
  64. package/dist/cli/team.d.ts +4 -0
  65. package/dist/cli/team.d.ts.map +1 -1
  66. package/dist/cli/team.js +43 -4
  67. package/dist/cli/team.js.map +1 -1
  68. package/dist/cli/ultragoal.d.ts.map +1 -1
  69. package/dist/cli/ultragoal.js +29 -0
  70. package/dist/cli/ultragoal.js.map +1 -1
  71. package/dist/cli/update.d.ts +20 -3
  72. package/dist/cli/update.d.ts.map +1 -1
  73. package/dist/cli/update.js +265 -23
  74. package/dist/cli/update.js.map +1 -1
  75. package/dist/cli/version.d.ts.map +1 -1
  76. package/dist/cli/version.js +5 -9
  77. package/dist/cli/version.js.map +1 -1
  78. package/dist/compat/__tests__/doctor-contract.test.js +12 -1
  79. package/dist/compat/__tests__/doctor-contract.test.js.map +1 -1
  80. package/dist/hooks/__tests__/agents-overlay.test.js +1 -0
  81. package/dist/hooks/__tests__/agents-overlay.test.js.map +1 -1
  82. package/dist/hooks/__tests__/autopilot-skill-contract.test.js +15 -0
  83. package/dist/hooks/__tests__/autopilot-skill-contract.test.js.map +1 -1
  84. package/dist/hooks/__tests__/code-review-skill-contract.test.js +7 -3
  85. package/dist/hooks/__tests__/code-review-skill-contract.test.js.map +1 -1
  86. package/dist/hooks/__tests__/deep-interview-contract.test.js +46 -1
  87. package/dist/hooks/__tests__/deep-interview-contract.test.js.map +1 -1
  88. package/dist/hooks/__tests__/skill-guidance-contract.test.js +14 -5
  89. package/dist/hooks/__tests__/skill-guidance-contract.test.js.map +1 -1
  90. package/dist/hooks/agents-overlay.d.ts.map +1 -1
  91. package/dist/hooks/agents-overlay.js +2 -1
  92. package/dist/hooks/agents-overlay.js.map +1 -1
  93. package/dist/hooks/extensibility/__tests__/plugin-runner.test.js +112 -1
  94. package/dist/hooks/extensibility/__tests__/plugin-runner.test.js.map +1 -1
  95. package/dist/hooks/extensibility/plugin-runner-stdin.d.ts +2 -0
  96. package/dist/hooks/extensibility/plugin-runner-stdin.d.ts.map +1 -0
  97. package/dist/hooks/extensibility/plugin-runner-stdin.js +16 -0
  98. package/dist/hooks/extensibility/plugin-runner-stdin.js.map +1 -0
  99. package/dist/hooks/extensibility/plugin-runner.js +2 -4
  100. package/dist/hooks/extensibility/plugin-runner.js.map +1 -1
  101. package/dist/hud/__tests__/index.test.js +23 -2
  102. package/dist/hud/__tests__/index.test.js.map +1 -1
  103. package/dist/hud/__tests__/reconcile.test.js +387 -0
  104. package/dist/hud/__tests__/reconcile.test.js.map +1 -1
  105. package/dist/hud/__tests__/state.test.js +28 -0
  106. package/dist/hud/__tests__/state.test.js.map +1 -1
  107. package/dist/hud/__tests__/tmux.test.js +118 -7
  108. package/dist/hud/__tests__/tmux.test.js.map +1 -1
  109. package/dist/hud/index.d.ts +6 -1
  110. package/dist/hud/index.d.ts.map +1 -1
  111. package/dist/hud/index.js +12 -3
  112. package/dist/hud/index.js.map +1 -1
  113. package/dist/hud/reconcile.d.ts +6 -2
  114. package/dist/hud/reconcile.d.ts.map +1 -1
  115. package/dist/hud/reconcile.js +58 -28
  116. package/dist/hud/reconcile.js.map +1 -1
  117. package/dist/hud/state.d.ts.map +1 -1
  118. package/dist/hud/state.js +4 -18
  119. package/dist/hud/state.js.map +1 -1
  120. package/dist/hud/tmux.d.ts +14 -1
  121. package/dist/hud/tmux.d.ts.map +1 -1
  122. package/dist/hud/tmux.js +129 -15
  123. package/dist/hud/tmux.js.map +1 -1
  124. package/dist/question/__tests__/renderer.test.js +566 -1
  125. package/dist/question/__tests__/renderer.test.js.map +1 -1
  126. package/dist/question/renderer.d.ts +9 -1
  127. package/dist/question/renderer.d.ts.map +1 -1
  128. package/dist/question/renderer.js +246 -70
  129. package/dist/question/renderer.js.map +1 -1
  130. package/dist/ralplan/consensus-gate.js +9 -1
  131. package/dist/ralplan/consensus-gate.js.map +1 -1
  132. package/dist/scripts/__tests__/codex-native-hook.test.js +322 -15
  133. package/dist/scripts/__tests__/codex-native-hook.test.js.map +1 -1
  134. package/dist/scripts/__tests__/run-test-files.test.js +115 -1
  135. package/dist/scripts/__tests__/run-test-files.test.js.map +1 -1
  136. package/dist/scripts/codex-native-hook.d.ts.map +1 -1
  137. package/dist/scripts/codex-native-hook.js +94 -20
  138. package/dist/scripts/codex-native-hook.js.map +1 -1
  139. package/dist/scripts/notify-hook/team-worker-stop.d.ts.map +1 -1
  140. package/dist/scripts/notify-hook/team-worker-stop.js +54 -21
  141. package/dist/scripts/notify-hook/team-worker-stop.js.map +1 -1
  142. package/dist/scripts/run-test-files.js +218 -160
  143. package/dist/scripts/run-test-files.js.map +1 -1
  144. package/dist/state/__tests__/operations.test.js +463 -0
  145. package/dist/state/__tests__/operations.test.js.map +1 -1
  146. package/dist/team/__tests__/delivery-log.test.js +18 -0
  147. package/dist/team/__tests__/delivery-log.test.js.map +1 -1
  148. package/dist/team/__tests__/runtime.test.js +48 -0
  149. package/dist/team/__tests__/runtime.test.js.map +1 -1
  150. package/dist/team/__tests__/tmux-session.test.js +107 -0
  151. package/dist/team/__tests__/tmux-session.test.js.map +1 -1
  152. package/dist/team/__tests__/tmux-test-fixture.d.ts.map +1 -1
  153. package/dist/team/__tests__/tmux-test-fixture.js +14 -2
  154. package/dist/team/__tests__/tmux-test-fixture.js.map +1 -1
  155. package/dist/team/__tests__/tmux-test-fixture.test.js +1 -0
  156. package/dist/team/__tests__/tmux-test-fixture.test.js.map +1 -1
  157. package/dist/team/__tests__/worker-bootstrap.test.js +54 -1
  158. package/dist/team/__tests__/worker-bootstrap.test.js.map +1 -1
  159. package/dist/team/delivery-log.d.ts +1 -1
  160. package/dist/team/delivery-log.d.ts.map +1 -1
  161. package/dist/team/delivery-log.js.map +1 -1
  162. package/dist/team/repo-aware-decomposition.d.ts +4 -0
  163. package/dist/team/repo-aware-decomposition.d.ts.map +1 -1
  164. package/dist/team/repo-aware-decomposition.js.map +1 -1
  165. package/dist/team/runtime.d.ts.map +1 -1
  166. package/dist/team/runtime.js +78 -9
  167. package/dist/team/runtime.js.map +1 -1
  168. package/dist/team/tmux-session.d.ts +1 -0
  169. package/dist/team/tmux-session.d.ts.map +1 -1
  170. package/dist/team/tmux-session.js +16 -5
  171. package/dist/team/tmux-session.js.map +1 -1
  172. package/dist/team/ultragoal-context.d.ts +12 -0
  173. package/dist/team/ultragoal-context.d.ts.map +1 -1
  174. package/dist/team/ultragoal-context.js +32 -8
  175. package/dist/team/ultragoal-context.js.map +1 -1
  176. package/dist/utils/__tests__/paths.test.js +23 -0
  177. package/dist/utils/__tests__/paths.test.js.map +1 -1
  178. package/dist/utils/__tests__/platform-command.test.js +16 -1
  179. package/dist/utils/__tests__/platform-command.test.js.map +1 -1
  180. package/dist/utils/__tests__/version.test.d.ts +2 -0
  181. package/dist/utils/__tests__/version.test.d.ts.map +1 -0
  182. package/dist/utils/__tests__/version.test.js +51 -0
  183. package/dist/utils/__tests__/version.test.js.map +1 -0
  184. package/dist/utils/paths.d.ts +8 -1
  185. package/dist/utils/paths.d.ts.map +1 -1
  186. package/dist/utils/paths.js +20 -6
  187. package/dist/utils/paths.js.map +1 -1
  188. package/dist/utils/platform-command.d.ts +9 -0
  189. package/dist/utils/platform-command.d.ts.map +1 -1
  190. package/dist/utils/platform-command.js +15 -0
  191. package/dist/utils/platform-command.js.map +1 -1
  192. package/dist/utils/toml.d.ts +4 -0
  193. package/dist/utils/toml.d.ts.map +1 -0
  194. package/dist/utils/toml.js +75 -0
  195. package/dist/utils/toml.js.map +1 -0
  196. package/dist/utils/version.d.ts +7 -0
  197. package/dist/utils/version.d.ts.map +1 -0
  198. package/dist/utils/version.js +67 -0
  199. package/dist/utils/version.js.map +1 -0
  200. package/dist/verification/__tests__/ci-rust-gates.test.js +8 -0
  201. package/dist/verification/__tests__/ci-rust-gates.test.js.map +1 -1
  202. package/dist/verification/__tests__/dev-merge-issue-close-workflow.test.js +16 -2
  203. package/dist/verification/__tests__/dev-merge-issue-close-workflow.test.js.map +1 -1
  204. package/package.json +4 -3
  205. package/plugins/oh-my-codex/.codex-plugin/plugin.json +1 -1
  206. package/plugins/oh-my-codex/skills/autopilot/SKILL.md +3 -0
  207. package/plugins/oh-my-codex/skills/code-review/SKILL.md +2 -2
  208. package/plugins/oh-my-codex/skills/deep-interview/SKILL.md +85 -11
  209. package/plugins/oh-my-codex/skills/ultrawork/SKILL.md +32 -17
  210. package/skills/autopilot/SKILL.md +3 -0
  211. package/skills/code-review/SKILL.md +2 -2
  212. package/skills/deep-interview/SKILL.md +85 -11
  213. package/skills/ultrawork/SKILL.md +32 -17
  214. package/src/scripts/__tests__/codex-native-hook.test.ts +391 -26
  215. package/src/scripts/__tests__/run-test-files.test.ts +138 -2
  216. package/src/scripts/codex-native-hook.ts +99 -17
  217. package/src/scripts/notify-hook/team-worker-stop.ts +58 -18
  218. package/src/scripts/prepare-build.js +83 -0
  219. package/src/scripts/run-test-files.ts +229 -150
  220. package/templates/AGENTS.md +40 -199
  221. package/src/scripts/postinstall-bootstrap.js +0 -23
@@ -51,6 +51,11 @@ If no flag is provided, use **Standard**.
  - Gather codebase facts via `explore` before asking user about internals
  - `omx explore` is deprecated. Use normal repository inspection tools/subagents for simple read-only brownfield fact gathering; use `omx sparkshell` only for explicit shell-native read-only evidence, and keep ambiguous or non-shell-only investigation on the richer normal path.
  - Always run a preflight context intake before the first interview question
+ - For brownfield work, preflight must include doc/context grounding before user-facing questions: inspect applicable `AGENTS.md` files, README/getting-started docs, relevant `docs/` contracts/plans/ADRs, existing `.omx/context/` snapshots, and any project-local glossary/context files such as `CONTEXT.md` or `CONTEXT-MAP.md` when present.
+ - Treat existing repo language as evidence, not authority: if the user uses a fuzzy, overloaded, or conflicting term, surface the specific doc/code wording and ask which meaning should govern before implementation.
+ - Cross-check user claims about current behavior against code or documented contracts when discoverable. If docs and code disagree, ask a confirmation question that names both sources instead of silently choosing one.
+ - Use scenario-based edge-case grilling when relationships, boundaries, or handoff behavior are unclear: invent one concrete scenario that stresses the ambiguous boundary, then ask one focused question about the expected outcome.
+ - Durable docs, glossary, ADR, or memory updates are opt-in and public-safe only. Deep-interview may recommend such updates in the handoff summary, but must not automatically create or dump public docs from interview transcripts unless the user explicitly chooses that as in-scope.
  - If initial context is oversized or would exceed the prompt budget, do not paste or forward the raw payload into interview prompts; request and record a prompt-safe initial-context summary first
  - The oversized initial-context summary gate is blocking: wait for the concise summary before ambiguity scoring, crystallizing artifacts, or any downstream execution handoff
  - The summary must preserve goals, constraints, success criteria, non-goals, decision boundaries, and references to any full source documents so downstream consumers receive a prompt-safe but faithful context
@@ -97,8 +102,15 @@ If no flag is provided, use **Standard**.
  - Unknowns/open questions
  - Decision-boundary unknowns
  - Likely codebase touchpoints
+ - Relevant repo docs/rules/context inspected
+ - Terminology or doc/code conflicts found
  - Prompt-safe initial-context summary status (`not_needed`, `needed`, or `recorded`)
- 5. Save snapshot to `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) and reference it in mode state.
+ 5. For brownfield tasks, inspect the applicable documentation/rule surface before the first user-facing round. Prefer exact, nearby sources over broad scans:
+ - governing `AGENTS.md` files and template/runtime instruction surfaces that apply to the touched paths
+ - README/getting-started docs and relevant docs under `docs/`, especially contracts, plans, ADR-like records, and workflow docs
+ - existing `.omx/context/` snapshots, `.omx/specs/`, and planning artifacts relevant to the slug
+ - project-local glossary/context files such as `CONTEXT.md`, `CONTEXT-MAP.md`, or context-specific docs when they exist
+ 6. Save snapshot to `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) and reference it in mode state.
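
Illustration (not part of the package diff): the snapshot step names files `.omx/context/{slug}-{timestamp}.md` with a UTC `YYYYMMDDTHHMMSSZ` stamp. A minimal sketch of that naming rule follows. The helper names `utcStamp` and `contextSnapshotPath` are hypothetical and are not taken from oh-my-codex.

```typescript
// Hypothetical helper: formats a Date as a compact UTC stamp (YYYYMMDDTHHMMSSZ).
function utcStamp(date: Date): string {
  // toISOString() is always UTC, e.g. "2024-01-02T03:04:05.000Z".
  // Drop the milliseconds, then strip the "-" and ":" separators.
  return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[-:]/g, "");
}

// Hypothetical helper: builds the snapshot path pattern quoted in the step above.
function contextSnapshotPath(slug: string, date: Date = new Date()): string {
  return `.omx/context/${slug}-${utcStamp(date)}.md`;
}

const example = contextSnapshotPath(
  "login-flow",
  new Date(Date.UTC(2024, 0, 2, 3, 4, 5)),
);
console.log(example); // prints ".omx/context/login-flow-20240102T030405Z.md"
```

Deriving the stamp from `toISOString()` keeps the result in UTC regardless of the machine's local time zone.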
 
  ## Phase 1: Initialize
 
@@ -137,13 +149,14 @@ If no flag is provided, use **Standard**.
  Repeat until ambiguity `<= threshold`, the pressure pass is complete, the readiness gates are explicit, the user exits with warning, or max rounds are reached. This is a stop condition: below threshold, do not open a new ordinary interview branch.
 
  ### 2a) Generate next question
- If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
+ If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
 
  Use:
  - Original idea
  - Prior Q&A rounds
  - Current dimension scores
  - Brownfield context (if any)
+ - Doc/context grounding notes, including existing terminology, governing rules, and any doc/code mismatch
  - Activated challenge mode injection (Phase 3)
 
  Target the lowest-scoring dimension, but respect stage priority:
@@ -155,12 +168,21 @@ Follow-up pressure ladder after each answer:
  1. Ask for a concrete example, counterexample, or evidence signal behind the latest claim
  2. Probe the hidden assumption, dependency, or belief that makes the claim true
  3. Force a boundary or tradeoff: what would you explicitly not do, defer, or reject?
- 4. If the answer still describes symptoms, reframe toward essence / root cause before moving on
+ 4. Challenge fuzzy or conflicting terms against the repo's documented language and current code behavior
+ 5. Stress-test the boundary with one concrete scenario or edge case when a relationship or handoff remains ambiguous
+ 6. If the answer still describes symptoms, reframe toward essence / root cause before moving on
 
  Prefer staying on the same thread for multiple rounds when it has the highest leverage. Breadth without pressure is not progress.
 
  Maintain a **Breadth Ledger** across independent ambiguity tracks: scope, constraints, outputs, verification, brownfield integration, and any user-mentioned deliverable tracks. The ledger is a guard, not a mandatory rotation rule: stay deep on the current thread until it has been pressure-tested, then zoom out only when another material track remains unresolved and would change execution.
 
+ Maintain a **Docs/Terminology Ledger** for brownfield interviews:
+ - repo docs/rules/context sources inspected, with path references
+ - canonical terms already used by the repo and terms to avoid or disambiguate
+ - user terms that conflict with docs or current code behavior
+ - doc/code mismatches that require a human decision before implementation
+ - optional durable-doc follow-ups that are safe to propose but not auto-apply
+
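
Illustration (not part of the package diff): the Docs/Terminology Ledger added above is described only as a bullet list. One possible shape for it is sketched below, assuming a plain in-memory record. Every type and field name here is hypothetical, since the skill text does not define a schema.

```typescript
// Hypothetical ledger shape; field names are illustrative, not from oh-my-codex.
interface DocCodeMismatch {
  docSource: string;   // path of the doc that states one behavior
  codeSource: string;  // path of the code that behaves differently
  resolved: boolean;   // true once a human has decided which one governs
}

interface DocsTerminologyLedger {
  sourcesInspected: string[];      // repo docs/rules/context paths
  canonicalTerms: string[];        // terms the repo already uses
  termsToDisambiguate: string[];   // terms to avoid or clarify
  userTermConflicts: string[];     // user terms that clash with docs or code
  docCodeMismatches: DocCodeMismatch[];
  optionalDocFollowUps: string[];  // safe to propose, never auto-applied
}

// Lists the mismatches that still need a human decision before implementation.
function pendingHumanDecisions(ledger: DocsTerminologyLedger): string[] {
  return ledger.docCodeMismatches
    .filter((m) => !m.resolved)
    .map((m) => `${m.docSource} vs ${m.codeSource}`);
}

const ledger: DocsTerminologyLedger = {
  sourcesInspected: ["AGENTS.md", "docs/contracts/team.md"],
  canonicalTerms: ["worker", "leader"],
  termsToDisambiguate: ["agent"],
  userTermConflicts: ["session"],
  docCodeMismatches: [
    { docSource: "docs/contracts/team.md", codeSource: "src/team/runtime.ts", resolved: false },
    { docSource: "README.md", codeSource: "src/cli/index.ts", resolved: true },
  ],
  optionalDocFollowUps: [],
};

console.log(pendingHumanDecisions(ledger));
```

The paths in the sample ledger are made up for the example. Keeping mismatches as structured entries makes the "requires a human decision" rule easy to check before a handoff.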
  Detailed dimensions:
  - Intent Clarity — why the user wants this
  - Outcome Clarity — what end state they want
@@ -306,6 +328,7 @@ Append round result and updated scores via `omx state write --input '<json>' --j
  Use each mode once when applicable. These are normal escalation tools, not rare rescue moves:
 
  - **Contrarian** (round 2+ or immediately when an answer rests on an untested assumption): challenge core assumptions
+ - **Terminologist** (brownfield, whenever a key term is fuzzy, overloaded, or conflicts with repo docs/code): force a canonical meaning against existing project language before implementation
  - **Simplifier** (round 4+ or when scope expands faster than outcome clarity): probe minimal viable scope
  - **Ontologist** (round 5+ and ambiguity > 0.25, or when the user keeps describing symptoms): ask for essence-level reframing
 
@@ -336,6 +359,9 @@ Spec should include:
  - Assumptions exposed + resolutions
  - Pressure-pass findings (which answer was revisited, and what changed)
  - Brownfield evidence vs inference notes for any repository-grounded confirmation questions
+ - Docs/Terminology Ledger with inspected repo docs/rules/context, term conflicts, and any doc/code mismatch decisions
+ - Scenario/edge-case pressure findings that materially shaped scope or acceptance criteria
+ - Optional durable documentation recommendations, explicitly marked opt-in and public-safe; do not include raw private transcript dumps
  - Technical context findings
  - Full or condensed transcript
 
@@ -365,11 +391,45 @@ When the clarified task is specifically about `$autoresearch`, or the skill is i
 
  ## Phase 5: Execution Bridge
 
- Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, and any residual-risk warnings across the handoff.
+ Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, and any residual-risk warnings across the handoff.
+
+ ### Optional execution contract foundation
+
+ When an Autopilot/deep-interview handoff explicitly requires a stride contract, emit it as structured data rather than prose. This is a validation foundation, not a broadness-inference feature: do not infer stride from task length, phase labels, snapshots, or freeform wording.
+
+ Canonical location under Autopilot state:
+
+ ```json
+ {
+   "handoff_artifacts": {
+     "deep_interview": {
+       "execution_contract_required": true,
+       "execution_contract": {
+         "version": 1,
+         "execution_stride": "task",
+         "source": "deep-interview",
+         "selected_by": "user",
+         "allow_task_shrink": true,
+         "completion_unit": "One focused task",
+         "stop_condition": "Stop after that task is implemented and verified",
+         "acceptance_coverage_scope": "task",
+         "shrink_policy": "allowed"
+       }
+     }
+   }
+ }
+ ```
+
+ Stride meanings:
+ - `task`: conservative, small-step execution; `allow_task_shrink:true`, `acceptance_coverage_scope:"task"`, `shrink_policy:"allowed"`.
+ - `deliverable`: finish the named deliverable before stopping; `allow_task_shrink:false`, `acceptance_coverage_scope:"deliverable"`, `shrink_policy:"ask_before_shrink"`.
+ - `milestone`: finish the larger approved milestone unless blocked; `allow_task_shrink:false`, `acceptance_coverage_scope:"milestone"`, `shrink_policy:"deny_unless_blocked"`.
+
+ Only set `execution_contract_required:true` when the selected downstream workflow needs this explicit stride/stop-condition guard. New artifacts must write the canonical snake_case schema shown above under `handoff_artifacts.deep_interview`; runtime readers may accept legacy camelCase field/marker aliases and direct/nested `execution_contract` locations only as compatibility input. If `execution_contract_required` is absent or false, downstream Autopilot compatibility behavior is unchanged.
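
Illustration (not part of the package diff): the "Stride meanings" list above ties each `execution_stride` value to fixed values of three other fields. A minimal consistency check for that mapping could look like the sketch below. The field names and values come from the diff, but the type names and the `strideViolations` function are hypothetical and are not the package's own validator in `deep-interview-gate.js`.

```typescript
type Stride = "task" | "deliverable" | "milestone";

// Field names mirror the canonical snake_case schema shown in the diff.
interface ExecutionContract {
  version: number;
  execution_stride: Stride;
  source: string;
  selected_by: string;
  allow_task_shrink: boolean;
  completion_unit: string;
  stop_condition: string;
  acceptance_coverage_scope: string;
  shrink_policy: string;
}

type StrideRule = Pick<
  ExecutionContract,
  "allow_task_shrink" | "acceptance_coverage_scope" | "shrink_policy"
>;

// One row per stride, copied from the "Stride meanings" list.
const STRIDE_RULES: Record<Stride, StrideRule> = {
  task: { allow_task_shrink: true, acceptance_coverage_scope: "task", shrink_policy: "allowed" },
  deliverable: { allow_task_shrink: false, acceptance_coverage_scope: "deliverable", shrink_policy: "ask_before_shrink" },
  milestone: { allow_task_shrink: false, acceptance_coverage_scope: "milestone", shrink_policy: "deny_unless_blocked" },
};

// Hypothetical check: returns the fields that disagree with the declared stride.
function strideViolations(contract: ExecutionContract): string[] {
  const expected = STRIDE_RULES[contract.execution_stride];
  const problems: string[] = [];
  if (contract.allow_task_shrink !== expected.allow_task_shrink) {
    problems.push("allow_task_shrink");
  }
  if (contract.acceptance_coverage_scope !== expected.acceptance_coverage_scope) {
    problems.push("acceptance_coverage_scope");
  }
  if (contract.shrink_policy !== expected.shrink_policy) {
    problems.push("shrink_policy");
  }
  return problems;
}

const taskContract: ExecutionContract = {
  version: 1,
  execution_stride: "task",
  source: "deep-interview",
  selected_by: "user",
  allow_task_shrink: true,
  completion_unit: "One focused task",
  stop_condition: "Stop after that task is implemented and verified",
  acceptance_coverage_scope: "task",
  shrink_policy: "allowed",
};

console.log(strideViolations(taskContract)); // prints []
```

The sketch reads the stride only from the explicit `execution_stride` field, in line with the rule above that stride must not be inferred from freeform wording.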
 
  ### Goal-mode follow-ups
 
- Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
+ Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
 
  - **`$ultragoal`** — default goal-mode follow-up for implementation or general goal-oriented follow-up specs that should be converted into durable Codex/OMX goals with sequential completion tracking.
  - **`$autoresearch-goal`** — use when the clarified context is a research project: a research question, reference/literature gathering, evaluator-backed analysis, or professor/critic-style deliverable.
@@ -377,7 +437,16 @@ Include these product-facing suggestions when they fit the clarified spec, witho
 
  Recommend `$ultragoal` as the default durable goal-mode follow-up because it supersedes Ralph for goal tracking. Preserve `$team` for coordinated parallel implementation and keep `$ralph` only as an explicit fallback for persistent single-owner execution/verification when the user specifically selects it.
 
- ### 1. **`$ralplan` (Recommended)**
+ ### 1. **`$ultragoal` (Default durable execution follow-up)**
+ - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
+ - **Invocation:** `$ultragoal create-goals --brief-file <spec-path>` followed by `$ultragoal complete-goals` in the active execution lane
+ - **Consumer Behavior:** Convert the clarified spec into durable goal-mode work. Preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, scenario-pressure findings, and residual-risk warnings as binding story constraints.
+ - **Skipped / Already-Satisfied Stages:** Requirement interview, ambiguity clarification, doc/context preflight, and early intent-boundary elicitation
+ - **Expected Output:** `.omx/ultragoal/brief.md`, `.omx/ultragoal/goals.json`, `.omx/ultragoal/ledger.jsonl`, implementation evidence, verification evidence, and final cleanup/review-gate evidence
+ - **Best When:** The clarified spec is execution-ready or the user explicitly wants durable goal tracking as the next step
+ - **Next Recommended Step:** Run the Ultragoal completion loop; launch `$team` only inside an active Ultragoal story when parallel lanes are warranted, and use `$ralph` only as an explicit fallback when the user asks for that legacy persistence mode
+
+ ### 2. **`$ralplan` (Recommended when architecture/test-shape review is still needed)**
  - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
  - **Invocation:** `$plan --consensus --direct <spec-path>`
  - **Consumer Behavior:** Treat the deep-interview spec as the requirements source of truth. Do not repeat the interview by default; refine architecture/feasibility around the clarified intent and boundaries instead.
@@ -386,7 +455,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
  - **Best When:** Requirements are clear enough to stop interviewing, but architectural validation / consensus planning is still desirable
  - **Next Recommended Step:** Use the approved planning artifacts with `$ultragoal` as the default durable goal-mode follow-up (optionally with `$team` for parallel lanes); choose `$autoresearch-goal` for research validation or `$performance-goal` for measurable optimization, and use `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
 
- ### 2. **`$autopilot`**
+ ### 3. **`$autopilot`**
  - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
  - **Invocation:** `$autopilot <spec-path>`
  - **Consumer Behavior:** Use the deep-interview spec as the clarified execution brief. Preserve intent, non-goals, decision boundaries, and acceptance criteria as binding context for planning/execution.
@@ -395,7 +464,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
  - **Best When:** The clarified spec is already strong enough for direct planning + execution without an additional consensus gate
  - **Next Recommended Step:** Continue through autopilot's execution/QA/validation flow; if coordination-heavy execution emerges, prefer `$team` under a leader-owned `$ultragoal` ledger, using `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
 
- ### 3. **`$ralph` (Explicit fallback only)**
+ ### 4. **`$ralph` (Explicit fallback only)**
  - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
  - **Invocation:** `$ralph <spec-path>`
  - **Consumer Behavior:** Use the spec's acceptance criteria and boundary constraints as the persistence target. Do not reopen requirements discovery unless the user explicitly asks to refine further.
@@ -404,7 +473,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
  - **Best When:** The user explicitly asks for Ralph's persistent sequential completion pressure; otherwise use `$ultragoal` for durable goal tracking and completion checkpoints
  - **Next Recommended Step:** If this explicit fallback is selected, continue Ralph's persistence loop; if work expands into coordination-heavy lanes, hand off to `$team` under `$ultragoal` checkpointing rather than promoting Ralph as the next default
 
- ### 4. **`$team`**
+ ### 5. **`$team`**
  - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
  - **Invocation:** `$team <spec-path>`
  - **Consumer Behavior:** Treat the spec as shared execution context for coordinated parallel work. Preserve the clarified intent, non-goals, decision boundaries, and acceptance criteria as common lane constraints.
@@ -413,7 +482,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
  - **Best When:** The task is large, multi-lane, or blocker-sensitive enough to justify coordinated parallel execution instead of a single persistent loop
  - **Next Recommended Step:** Follow the team verification path when the coordinated execution phase finishes; checkpoint completion through `$ultragoal` by default, escalating to a separate Ralph loop only when the user explicitly asks for that persistent verification/fix owner
 
- ### 5. **Refine further**
+ ### 6. **Refine further**
  - **Input Artifact:** Existing transcript, context snapshot, and current spec draft
  - **Invocation:** Continue the interview loop
  - **Consumer Behavior:** Re-enter questioning to resolve the highest-leverage remaining uncertainty
@@ -437,6 +506,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
437
506
  - Use `omx state write/read --input '<json>' --json` for resumable mode state; `state_write` / `state_read` are explicit MCP compatibility fallbacks only
438
507
  - If the interview cannot ask a required `omx question` round, persist the blocker as terminal state with `active: false` and `current_phase: "blocked"`; do not write a terminal blocked phase with `active: true`
439
508
  - Read/write context snapshots under `.omx/context/`
509
+ - Read applicable repo docs/rules/context during preflight; write durable docs, glossary, ADR, or memory updates only when the user explicitly opts in and the content is public-safe
440
510
  - Record whether the oversized-context summary gate is not needed, pending, or satisfied before any scoring or handoff step
441
511
  - Save transcript/spec artifacts under `.omx/interviews/` and `.omx/specs/`
442
512
  </Tool_Usage>
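
The blocked-state rule in the tool-usage notes above can be sketched as a small payload builder. This is only an illustration, not the package's implementation: the helper names, the `"deep-interview"` mode value, and the `blocked_reason` key are assumptions, while `active`, `current_phase`, and the `omx state write --input '<json>' --json` form come from the text.

```python
import json
import shlex


def blocked_state(mode: str, reason: str) -> dict:
    """Build a terminal blocked state; a blocked phase is never active."""
    return {
        "mode": mode,
        "active": False,
        "current_phase": "blocked",
        "blocked_reason": reason,  # hypothetical key, used here for illustration
    }


def is_valid_terminal_state(state: dict) -> bool:
    """Reject the forbidden combination: blocked phase with active=True."""
    if state.get("current_phase") == "blocked":
        return state.get("active") is False
    return True


def state_write_command(state: dict) -> str:
    """Render the CLI call quoted in the text, with the JSON shell-quoted."""
    payload = json.dumps(state, separators=(",", ":"))
    return f"omx state write --input {shlex.quote(payload)} --json"
```

Running `is_valid_terminal_state` before emitting the command gives a cheap guard against writing `current_phase: "blocked"` together with `active: true`.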
@@ -460,7 +530,11 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
460
530
  - [ ] Transcript written to `.omx/interviews/{slug}-{timestamp}.md`
461
531
  - [ ] Spec written to `.omx/specs/deep-interview-{slug}.md`
462
532
  - [ ] Brownfield questions use evidence-backed confirmation when applicable
463
- - [ ] Handoff options provided (`$ralplan`, `$autopilot`, `$ralph`, `$team`) plus context-sensitive goal-mode suggestions (`$ultragoal`, `$autoresearch-goal`, `$performance-goal`) when applicable
533
+ - [ ] Brownfield preflight inspected applicable repo docs/rules/context before user-facing questions
534
+ - [ ] Fuzzy or conflicting terminology was challenged against repo language/current code behavior when applicable
535
+ - [ ] Scenario-based edge-case grilling was used when boundary ambiguity would materially affect implementation
536
+ - [ ] Durable docs/ADR/memory updates, if any, were explicitly opted into and public-safe
537
+ - [ ] Handoff options provided (`$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, `$team`) plus context-sensitive goal-mode suggestions (`$autoresearch-goal`, `$performance-goal`) when applicable
464
538
  - [ ] No direct implementation performed in this mode
465
539
  </Final_Checklist>
466
540
 
@@ -4,22 +4,23 @@ description: Parallel execution engine for high-throughput task completion
4
4
  ---
5
5
 
6
6
  <Purpose>
7
- Ultrawork is a parallel execution engine for high-throughput task completion. It is a component, not a standalone persistence mode: it provides parallelism, context discipline, and smart delegation guidance, but not Ralph's persistence loop, architect sign-off, or long-running completion guarantees.
7
+ Ultrawork is a parallel execution engine for high-throughput task completion. It is a component, not a standalone persistence or verification mode: it provides parallelism, context discipline, and smart delegation guidance, but not durable goal tracking, Team's tmux worker lifecycle, Ralph's legacy persistence loop, architect sign-off, or long-running completion guarantees.
8
8
  </Purpose>
9
9
 
10
10
  <Use_When>
11
11
  - Multiple independent tasks can run simultaneously
12
12
  - User says "ulw", "ultrawork", or explicitly wants parallel execution
13
13
  - Task benefits from concurrent execution plus lightweight evidence before wrap-up
14
- - You need a direct-tool lane plus optional background evidence lanes without entering Ralph
14
+ - You need a direct-tool lane plus optional background evidence lanes without entering Team or a durable goal workflow
15
15
  </Use_When>
16
16
 
17
17
  <Do_Not_Use_When>
18
- - Task requires guaranteed completion with persistence, architect verification, or deslop/reverification -- use `ralph` instead (Ralph includes ultrawork)
19
- - Task requires a full autonomous pipeline -- use `autopilot` instead (autopilot defaults to Ultragoal, with Team/parallel execution used only when needed)
20
- - There is only one sequential task with no parallelism opportunity -- execute directly or delegate to a single `executor`
18
+ - Task needs durable goal tracking, ledger checkpoints, or resume across stories -- use `ultragoal` instead
19
+ - Task needs coordinated tmux workers, shared task state, mailbox/dispatch coordination, or long-running parallel execution -- use `team` instead
20
+ - Task requires a full autonomous pipeline -- use `autopilot` instead (default loop: `deep-interview -> ralplan -> ultragoal`, with `team` only when needed)
21
+ - Task intentionally requires the legacy persistent single-owner completion/verification loop -- use `ralph` explicitly; do not present it as the default durable path
22
+ - There is only one sequential task with no parallelism opportunity -- execute directly, use `ultragoal` for durable tracking, or delegate to a single `executor`
21
23
  - The request is still in plan-consensus mode -- keep planning artifacts in `ralplan` until execution is explicitly authorized
22
- - User needs session persistence for resume -- use `ralph`, which adds persistence on top of ultrawork
23
24
  </Do_Not_Use_When>
24
25
 
25
26
  <Why_This_Exists>
@@ -138,8 +139,12 @@ Why bad: No verification output, no acceptance evidence, and no manual QA note w
138
139
  </Examples>
139
140
 
140
141
  <Escalation_And_Stop_Conditions>
141
- - When ultrawork is invoked directly (not via Ralph), apply lightweight verification only -- build/typecheck passes when relevant, affected tests pass, and manual QA notes are captured when needed.
142
- - Ralph owns persistence, architect verification, deslop, and the full verified-completion promise. Do not claim those guarantees from direct ultrawork alone.
142
+ - When ultrawork is invoked directly, apply lightweight verification only -- build/typecheck passes when relevant, affected tests pass, and manual QA notes are captured when needed.
143
+ - Ultrawork does not own persistence, durable ledgers, architect verification, deslop, full QA, or the full verified-completion promise. Do not claim those guarantees from direct ultrawork alone.
144
+ - Escalate to `ultragoal` when the work needs durable goal state, story checkpoints, or resume across implementation steps.
145
+ - Escalate to `team` when the work needs coordinated tmux workers, shared task state, or durable multi-worker lifecycle control.
146
+ - Escalate to explicitly requested `ralph` only for the supported legacy single-owner persistence/verification fallback.
147
+ - Ralph owns persistence, architect verification, deslop, and the full verified-completion promise only when explicitly selected as the supported legacy fallback; direct ultrawork does not own those guarantees.
143
148
  - If a task fails repeatedly across retries, report the issue rather than retrying indefinitely.
144
149
  - Escalate to the user when tasks have unclear dependencies, conflicting requirements, or a materially branching acceptance target.
145
150
  </Escalation_And_Stop_Conditions>
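
The routing guidance above can be read as a simple decision function. The sketch below is one possible reading, not the skill's actual logic: the `Needs` fields, the `route` name, and the order in which the checks run are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Needs:
    """Hypothetical flags summarising what the task requires."""
    explicit_ralph_request: bool = False
    full_autonomous_pipeline: bool = False
    durable_goal_state: bool = False
    coordinated_tmux_workers: bool = False
    parallelizable: bool = True


def route(needs: Needs) -> str:
    """Pick a mode name; ultrawork is the choice only when nothing heavier applies."""
    if needs.explicit_ralph_request:
        return "ralph"  # legacy fallback, only when explicitly selected
    if needs.full_autonomous_pipeline:
        return "autopilot"
    if needs.durable_goal_state:
        return "ultragoal"
    if needs.coordinated_tmux_workers:
        return "team"
    if not needs.parallelizable:
        return "executor"  # single sequential task, no parallelism to exploit
    return "ultrawork"
```

Note that `ralph` is reachable only through the explicit-request flag, which mirrors the instruction not to present it as the default durable path.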
@@ -159,17 +164,27 @@ Why bad: No verification output, no acceptance evidence, and no manual QA note w
159
164
  ## Relationship to Other Modes
160
165
 
161
166
  ```
162
- ralph (persistence + verified completion wrapper)
163
- \-- includes: ultrawork (this skill)
164
- \-- provides: high-throughput execution + lightweight evidence
167
+ ultrawork (this skill)
168
+ \-- provides: in-session parallel execution discipline + lightweight evidence
165
169
 
166
- autopilot (autonomous execution)
167
- \-- includes: ralph
168
- \-- includes: ultrawork (this skill)
170
+ ultragoal (durable goal execution)
171
+ \-- owns: goal ledger, checkpoints, resume across stories, final gate discipline
172
+ \-- may use: team for parallel lanes when a story benefits from coordinated workers
169
173
 
170
- ecomode (token efficiency)
171
- \-- modifies: ultrawork's model selection
174
+ team (tmux coordinated execution)
175
+ \-- owns: worker panes, shared task state, mailbox/dispatch, lifecycle control
176
+ \-- can return: checkpoint-ready evidence to an Ultragoal leader
177
+
178
+ autopilot (strict autonomous delivery loop)
179
+ \-- default flow: deep-interview -> ralplan -> ultragoal -> code-review -> ultraqa
180
+ \-- may use: team only when an Ultragoal story needs parallel execution
181
+
182
+ ralph (supported legacy explicit fallback)
183
+ \-- owns: single-owner persistence loop + architect verification when intentionally selected
184
+
185
+ ecomode (deprecated compatibility-only)
186
+ \-- do not route users there from ultrawork; it is not the current model-selection path
172
187
  ```
173
188
 
174
- Ultrawork is the parallelism and execution-discipline layer. Ralph adds persistence, architect verification, deslop, and retry-until-done behavior. Autopilot adds the broader autonomous lifecycle pipeline. Ecomode adjusts ultrawork's model routing to favor cheaper models.
189
+ Ultrawork is the parallelism and execution-discipline layer. Ultragoal is the current default durable goal/ledger follow-up. Team is the coordinated tmux parallel runtime, often nested under an Ultragoal story when durable work needs multiple lanes. Autopilot orchestrates the full default lifecycle through deep-interview, ralplan, ultragoal, code-review, and ultraqa. Ralph remains active as an explicit legacy fallback for persistent single-owner verification, but it is not the recommended default durable path. Ecomode is deprecated compatibility-only and should not be advertised as the ultrawork model-selection route.
175
190
  </Advanced>
@@ -133,6 +133,9 @@ Required fields:
133
133
 
134
134
  - **On start**: `omx state write --input '{"mode":"autopilot","active":true,"current_phase":"deep-interview","iteration":1,"review_cycle":0,"state":{"phase_cycle":["deep-interview","ralplan","ultragoal","code-review","ultraqa"],"handoff_artifacts":{"context_snapshot_path":"<snapshot-path>","deep_interview":null,"ralplan":null,"ralplan_consensus_gate":{"required":true,"sequence":["architect-review","critic-review"],"planning_artifacts_are_not_consensus":true,"required_review_roles":["architect","critic"],"ralplan_architect_review":null,"ralplan_critic_review":null,"complete":false},"ultragoal":null,"code_review":null,"ultraqa":null},"review_verdict":null,"qa_verdict":null,"return_to_ralplan_reason":null}}' --json`
135
135
  - **On deep-interview -> ralplan**: only after a separate gate proves the interview chain is explicitly complete or the user explicitly authorized a skip. For completion, persist `deep_interview_gate:{"status":"complete","rationale":"<why requirements are complete>","handoff_summary":"<summary>"}` (or equivalent non-empty rationale/summary) plus the clarified spec/requirements under `handoff_artifacts.deep_interview`; if a final `omx question` was involved, keep its same-session answered record linked by `question_id`/`satisfied_at`. For skip, persist `deep_interview_gate:{"status":"skipped","skip_authorized_by_user":true,"skip_reason":"<user-authorized reason>","skipped_at":"<timestamp>","source":"user","session_id":"<session>"}`. Do not leave deep-interview merely because the first `omx question` was answered or cleared.
136
+ - **Optional execution contract foundation**: when a downstream handoff explicitly sets `execution_contract_required:true`, persist a complete structured `execution_contract` under `handoff_artifacts.deep_interview` before leaving deep-interview. The canonical schema is `version:1`, `execution_stride:"task"|"deliverable"|"milestone"`, `source:"deep-interview"`, `selected_by:"user"|"default"`, `allow_task_shrink:<boolean>`, non-empty `completion_unit`, non-empty `stop_condition`, `acceptance_coverage_scope:"task"|"deliverable"|"milestone"`, and `shrink_policy:"allowed"|"ask_before_shrink"|"deny_unless_blocked"`.
137
+ - Stride semantics are binding only when `execution_contract_required:true`: `task` means `allow_task_shrink:true`, `acceptance_coverage_scope:"task"`, `shrink_policy:"allowed"`; `deliverable` means `allow_task_shrink:false`, `acceptance_coverage_scope:"deliverable"`, `shrink_policy:"ask_before_shrink"`; `milestone` means `allow_task_shrink:false`, `acceptance_coverage_scope:"milestone"`, `shrink_policy:"deny_unless_blocked"`.
138
+ - Preserve legacy behavior when `execution_contract_required` is absent or false. Do not infer stride from prose, broadness, phase names, snapshots, or task size; this foundation only validates an explicit structured contract and deliberately uses `milestone` rather than `phase`. New artifacts must write canonical snake_case keys under `handoff_artifacts.deep_interview`; the runtime may read legacy camelCase field/marker aliases and direct/nested `execution_contract` locations only as compatibility input.
136
139
  - **On ralplan -> ultragoal**: only after `ralplan_consensus_gate.complete:true`, with tracker-backed native-subagent `ralplan_architect_review.agent_role:"architect"` and `ralplan_architect_review.verdict:"approve"` recorded before tracker-backed native-subagent `ralplan_critic_review.agent_role:"critic"` and `ralplan_critic_review.verdict:"approve"`; `codex_exec` or artifact-only approvals are trace evidence but not native lane proof. Set `current_phase:"ultragoal"` and persist the plan/test-spec paths under `handoff_artifacts.ralplan`.
137
140
  - **On missing ralplan consensus evidence**: keep `current_phase:"ralplan"`, persist `ralplan_consensus_gate.complete:false` with `blocked_reason`, and report an explicit blocker or max-iteration outcome instead of handing off to execution.
138
141
  - **On ultragoal -> code-review**: set `current_phase:"code-review"`, persist implementation/test/ledger evidence under `handoff_artifacts.ultragoal`.
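
As an illustration of the deep-interview -> ralplan rule above, the following sketch checks a `deep_interview_gate` object before leaving the interview phase. It is an assumption-laden example rather than the runtime's gate: the function name is hypothetical, and treating every listed skip field as mandatory is a stricter reading than the text necessarily requires.

```python
def may_leave_deep_interview(gate: dict) -> bool:
    """Return True only for an explicit completion or a user-authorized skip."""

    def filled(key: str) -> bool:
        value = gate.get(key)
        return isinstance(value, str) and value.strip() != ""

    status = gate.get("status")
    if status == "complete":
        # Completion needs a non-empty rationale and handoff summary.
        return filled("rationale") and filled("handoff_summary")
    if status == "skipped":
        # A skip must be authorized by the user, never inferred.
        return (
            gate.get("skip_authorized_by_user") is True
            and gate.get("source") == "user"
            and all(filled(k) for k in ("skip_reason", "skipped_at", "session_id"))
        )
    # Anything else, including "first question answered", is not a handoff signal.
    return False
```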
@@ -71,10 +71,11 @@ Delegates to the `code-reviewer` and `architect` agents in parallel for a two-la
71
71
 
72
72
  Do not self-review as a fallback. If the `code-reviewer` or `architect` agent path is missing, unavailable, skipped, or fails, emit a clear unavailable-review result and block approval until the independent lane evidence exists.
73
73
 
74
+ Respect the user's current model and reasoning/effort selection when launching review lanes. Do not pass `model` or `reasoning_effort` overrides in the review-lane task calls unless the user explicitly asks for review-specific overrides; omitting them lets native subagents inherit the active session settings.
75
+
74
76
  ```
75
77
  task(
76
78
  agent_type="code-reviewer",
77
- reasoning_effort="xhigh",
78
79
  prompt="CODE REVIEW TASK
79
80
 
80
81
  Review code changes for quality, security, and maintainability.
@@ -100,7 +101,6 @@ Output: Code review report with:
100
101
 
101
102
  task(
102
103
  agent_type="architect",
103
- reasoning_effort="xhigh",
104
104
  prompt="ARCHITECTURE / DEVIL'S-ADVOCATE REVIEW TASK
105
105
 
106
106
  Review the same code changes from the architecture/tradeoff perspective.
@@ -51,6 +51,11 @@ If no flag is provided, use **Standard**.
51
51
  - Gather codebase facts via `explore` before asking user about internals
52
52
  - `omx explore` is deprecated. Use normal repository inspection tools/subagents for simple read-only brownfield fact gathering; use `omx sparkshell` only for explicit shell-native read-only evidence, and keep ambiguous or non-shell-only investigation on the richer normal path.
53
53
  - Always run a preflight context intake before the first interview question
54
+ - For brownfield work, preflight must include doc/context grounding before user-facing questions: inspect applicable `AGENTS.md` files, README/getting-started docs, relevant `docs/` contracts/plans/ADRs, existing `.omx/context/` snapshots, and any project-local glossary/context files such as `CONTEXT.md` or `CONTEXT-MAP.md` when present.
55
+ - Treat existing repo language as evidence, not authority: if the user uses a fuzzy, overloaded, or conflicting term, surface the specific doc/code wording and ask which meaning should govern before implementation.
56
+ - Cross-check user claims about current behavior against code or documented contracts when discoverable. If docs and code disagree, ask a confirmation question that names both sources instead of silently choosing one.
57
+ - Use scenario-based edge-case grilling when relationships, boundaries, or handoff behavior are unclear: invent one concrete scenario that stresses the ambiguous boundary, then ask one focused question about the expected outcome.
58
+ - Durable docs, glossary, ADR, or memory updates are opt-in and public-safe only. Deep-interview may recommend such updates in the handoff summary, but must not automatically create or dump public docs from interview transcripts unless the user explicitly chooses that as in-scope.
54
59
  - If initial context is oversized or would exceed the prompt budget, do not paste or forward the raw payload into interview prompts; request and record a prompt-safe initial-context summary first
55
60
  - The oversized initial-context summary gate is blocking: wait for the concise summary before ambiguity scoring, crystallizing artifacts, or any downstream execution handoff
56
61
  - The summary must preserve goals, constraints, success criteria, non-goals, decision boundaries, and references to any full source documents so downstream consumers receive a prompt-safe but faithful context
@@ -97,8 +102,15 @@ If no flag is provided, use **Standard**.
97
102
  - Unknowns/open questions
98
103
  - Decision-boundary unknowns
99
104
  - Likely codebase touchpoints
105
+ - Relevant repo docs/rules/context inspected
106
+ - Terminology or doc/code conflicts found
100
107
  - Prompt-safe initial-context summary status (`not_needed`, `needed`, or `recorded`)
101
- 5. Save snapshot to `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) and reference it in mode state.
108
+ 5. For brownfield tasks, inspect the applicable documentation/rule surface before the first user-facing round. Prefer exact, nearby sources over broad scans:
109
+ - governing `AGENTS.md` files and template/runtime instruction surfaces that apply to the touched paths
110
+ - README/getting-started docs and relevant docs under `docs/`, especially contracts, plans, ADR-like records, and workflow docs
111
+ - existing `.omx/context/` snapshots, `.omx/specs/`, and planning artifacts relevant to the slug
112
+ - project-local glossary/context files such as `CONTEXT.md`, `CONTEXT-MAP.md`, or context-specific docs when they exist
113
+ 6. Save snapshot to `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) and reference it in mode state.
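
The snapshot naming in the last step can be sketched as follows. The function name and the example slug are hypothetical; only the `.omx/context/{slug}-{timestamp}.md` pattern and the UTC `YYYYMMDDTHHMMSSZ` format come from the text.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional


def snapshot_path(slug: str, now: Optional[datetime] = None) -> str:
    """Build `.omx/context/{slug}-{timestamp}.md` with a UTC timestamp."""
    # Pass a timezone-aware datetime; the default is the current UTC time.
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = moment.strftime("%Y%m%dT%H%M%SZ")
    return str(PurePosixPath(".omx/context") / f"{slug}-{stamp}.md")


example = snapshot_path(
    "auth-refactor", datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
)
# example == ".omx/context/auth-refactor-20250102T030405Z.md"
```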
102
114
 
103
115
  ## Phase 1: Initialize
104
116
 
@@ -137,13 +149,14 @@ If no flag is provided, use **Standard**.
137
149
  Repeat until ambiguity `<= threshold`, the pressure pass is complete, the readiness gates are explicit, the user exits with warning, or max rounds are reached. This is a stop condition: below threshold, do not open a new ordinary interview branch.
138
150
 
139
151
  ### 2a) Generate next question
140
- If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
152
+ If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
141
153
 
142
154
  Use:
143
155
  - Original idea
144
156
  - Prior Q&A rounds
145
157
  - Current dimension scores
146
158
  - Brownfield context (if any)
159
+ - Doc/context grounding notes, including existing terminology, governing rules, and any doc/code mismatch
147
160
  - Activated challenge mode injection (Phase 3)
148
161
 
149
162
  Target the lowest-scoring dimension, but respect stage priority:
@@ -155,12 +168,21 @@ Follow-up pressure ladder after each answer:
155
168
  1. Ask for a concrete example, counterexample, or evidence signal behind the latest claim
156
169
  2. Probe the hidden assumption, dependency, or belief that makes the claim true
157
170
  3. Force a boundary or tradeoff: what would you explicitly not do, defer, or reject?
158
- 4. If the answer still describes symptoms, reframe toward essence / root cause before moving on
171
+ 4. Challenge fuzzy or conflicting terms against the repo's documented language and current code behavior
172
+ 5. Stress-test the boundary with one concrete scenario or edge case when a relationship or handoff remains ambiguous
173
+ 6. If the answer still describes symptoms, reframe toward essence / root cause before moving on
159
174
 
160
175
  Prefer staying on the same thread for multiple rounds when it has the highest leverage. Breadth without pressure is not progress.
161
176
 
162
177
  Maintain a **Breadth Ledger** across independent ambiguity tracks: scope, constraints, outputs, verification, brownfield integration, and any user-mentioned deliverable tracks. The ledger is a guard, not a mandatory rotation rule: stay deep on the current thread until it has been pressure-tested, then zoom out only when another material track remains unresolved and would change execution.
163
178
 
179
+ Maintain a **Docs/Terminology Ledger** for brownfield interviews:
180
+ - repo docs/rules/context sources inspected, with path references
181
+ - canonical terms already used by the repo and terms to avoid or disambiguate
182
+ - user terms that conflict with docs or current code behavior
183
+ - doc/code mismatches that require a human decision before implementation
184
+ - optional durable-doc follow-ups that are safe to propose but not auto-apply
185
+
164
186
  Detailed dimensions:
165
187
  - Intent Clarity — why the user wants this
166
188
  - Outcome Clarity — what end state they want
@@ -306,6 +328,7 @@ Append round result and updated scores via `omx state write --input '<json>' --j
306
328
  Use each mode once when applicable. These are normal escalation tools, not rare rescue moves:
307
329
 
308
330
  - **Contrarian** (round 2+ or immediately when an answer rests on an untested assumption): challenge core assumptions
331
+ - **Terminologist** (brownfield, whenever a key term is fuzzy, overloaded, or conflicts with repo docs/code): force a canonical meaning against existing project language before implementation
309
332
  - **Simplifier** (round 4+ or when scope expands faster than outcome clarity): probe minimal viable scope
310
333
  - **Ontologist** (round 5+ and ambiguity > 0.25, or when the user keeps describing symptoms): ask for essence-level reframing
311
334
 
@@ -336,6 +359,9 @@ Spec should include:
336
359
  - Assumptions exposed + resolutions
337
360
  - Pressure-pass findings (which answer was revisited, and what changed)
338
361
  - Brownfield evidence vs inference notes for any repository-grounded confirmation questions
362
+ - Docs/Terminology Ledger with inspected repo docs/rules/context, term conflicts, and any doc/code mismatch decisions
363
+ - Scenario/edge-case pressure findings that materially shaped scope or acceptance criteria
364
+ - Optional durable documentation recommendations, explicitly marked opt-in and public-safe; do not include raw private transcript dumps
339
365
  - Technical context findings
340
366
  - Full or condensed transcript
341
367
 
@@ -365,11 +391,45 @@ When the clarified task is specifically about `$autoresearch`, or the skill is i
365
391
 
366
392
  ## Phase 5: Execution Bridge
367
393
 
368
- Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, and any residual-risk warnings across the handoff.
394
+ Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, and any residual-risk warnings across the handoff.
395
+
396
+ ### Optional execution contract foundation
397
+
398
+ When an Autopilot/deep-interview handoff explicitly requires a stride contract, emit it as structured data rather than prose. This is a validation foundation, not a broadness-inference feature: do not infer stride from task length, phase labels, snapshots, or freeform wording.
399
+
400
+ Canonical location under Autopilot state:
401
+
402
+ ```json
403
+ {
404
+ "handoff_artifacts": {
405
+ "deep_interview": {
406
+ "execution_contract_required": true,
407
+ "execution_contract": {
408
+ "version": 1,
409
+ "execution_stride": "task",
410
+ "source": "deep-interview",
411
+ "selected_by": "user",
412
+ "allow_task_shrink": true,
413
+ "completion_unit": "One focused task",
414
+ "stop_condition": "Stop after that task is implemented and verified",
415
+ "acceptance_coverage_scope": "task",
416
+ "shrink_policy": "allowed"
417
+ }
418
+ }
419
+ }
420
+ }
421
+ ```
422
+
423
+ Stride meanings:
424
+ - `task`: conservative, small-step execution; `allow_task_shrink:true`, `acceptance_coverage_scope:"task"`, `shrink_policy:"allowed"`.
425
+ - `deliverable`: finish the named deliverable before stopping; `allow_task_shrink:false`, `acceptance_coverage_scope:"deliverable"`, `shrink_policy:"ask_before_shrink"`.
426
+ - `milestone`: finish the larger approved milestone unless blocked; `allow_task_shrink:false`, `acceptance_coverage_scope:"milestone"`, `shrink_policy:"deny_unless_blocked"`.
427
+
428
+ Only set `execution_contract_required:true` when the selected downstream workflow needs this explicit stride/stop-condition guard. New artifacts must write the canonical snake_case schema shown above under `handoff_artifacts.deep_interview`; runtime readers may accept legacy camelCase field/marker aliases and direct/nested `execution_contract` locations only as compatibility input. If `execution_contract_required` is absent or false, downstream Autopilot compatibility behavior is unchanged.
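
A minimal sketch of how a reader might validate the canonical contract described above. It is not the package's validator: the function name and error strings are hypothetical, and legacy camelCase aliases are deliberately ignored here. The field names and the stride-to-policy mapping are taken from the text.

```python
# Stride -> (allow_task_shrink, acceptance_coverage_scope, shrink_policy)
STRIDE_RULES = {
    "task": (True, "task", "allowed"),
    "deliverable": (False, "deliverable", "ask_before_shrink"),
    "milestone": (False, "milestone", "deny_unless_blocked"),
}
DERIVED_KEYS = ("allow_task_shrink", "acceptance_coverage_scope", "shrink_policy")


def contract_errors(deep_interview: dict) -> list:
    """Return a list of problems; empty means the contract is acceptable."""
    if deep_interview.get("execution_contract_required") is not True:
        return []  # absent or false: legacy behavior, nothing to validate
    contract = deep_interview.get("execution_contract")
    if not isinstance(contract, dict):
        return ["execution_contract is missing"]

    errors = []
    if type(contract.get("version")) is not int or contract["version"] != 1:
        errors.append("version must be 1")
    if contract.get("source") != "deep-interview":
        errors.append("source must be 'deep-interview'")
    if contract.get("selected_by") not in ("user", "default"):
        errors.append("selected_by must be 'user' or 'default'")
    for key in ("completion_unit", "stop_condition"):
        value = contract.get(key)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{key} must be a non-empty string")

    stride = contract.get("execution_stride")
    rule = STRIDE_RULES.get(stride) if isinstance(stride, str) else None
    if rule is None:
        errors.append("execution_stride must be task, deliverable, or milestone")
        return errors
    for key, want in zip(DERIVED_KEYS, rule):
        got = contract.get(key)
        if type(got) is not type(want) or got != want:
            errors.append(f"{key} must be {want!r} for stride {stride!r}")
    return errors
```

The validator never guesses a stride from other fields; an unknown or missing `execution_stride` is reported as an error, in line with the rule that stride is not inferred from prose or task size.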
369
429
 
370
430
  ### Goal-mode follow-ups
371
431
 
372
- Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
432
+ Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
373
433
 
374
434
  - **`$ultragoal`** — default goal-mode follow-up for implementation or general goal-oriented follow-up specs that should be converted into durable Codex/OMX goals with sequential completion tracking.
375
435
  - **`$autoresearch-goal`** — use when the clarified context is a research project: a research question, reference/literature gathering, evaluator-backed analysis, or professor/critic-style deliverable.
@@ -377,7 +437,16 @@ Include these product-facing suggestions when they fit the clarified spec, witho
377
437
 
378
438
  Recommend `$ultragoal` as the default durable goal-mode follow-up because it supersedes Ralph for goal tracking. Preserve `$team` for coordinated parallel implementation and keep `$ralph` only as an explicit fallback for persistent single-owner execution/verification when the user specifically selects it.
379
439
 
380
- ### 1. **`$ralplan` (Recommended)**
440
+ ### 1. **`$ultragoal` (Default durable execution follow-up)**
441
+ - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
442
+ - **Invocation:** `$ultragoal create-goals --brief-file <spec-path>` followed by `$ultragoal complete-goals` in the active execution lane
443
+ - **Consumer Behavior:** Convert the clarified spec into durable goal-mode work. Preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, scenario-pressure findings, and residual-risk warnings as binding story constraints.
444
+ - **Skipped / Already-Satisfied Stages:** Requirement interview, ambiguity clarification, doc/context preflight, and early intent-boundary elicitation
445
+ - **Expected Output:** `.omx/ultragoal/brief.md`, `.omx/ultragoal/goals.json`, `.omx/ultragoal/ledger.jsonl`, implementation evidence, verification evidence, and final cleanup/review-gate evidence
446
+ - **Best When:** The clarified spec is execution-ready or the user explicitly wants durable goal tracking as the next step
447
+ - **Next Recommended Step:** Run the Ultragoal completion loop; launch `$team` only inside an active Ultragoal story when parallel lanes are warranted, and use `$ralph` only as an explicit fallback when the user asks for that legacy persistence mode
448
+
449
+ ### 2. **`$ralplan` (Recommended when architecture/test-shape review is still needed)**
381
450
  - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
382
451
  - **Invocation:** `$plan --consensus --direct <spec-path>`
383
452
  - **Consumer Behavior:** Treat the deep-interview spec as the requirements source of truth. Do not repeat the interview by default; refine architecture/feasibility around the clarified intent and boundaries instead.
@@ -386,7 +455,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
386
455
  - **Best When:** Requirements are clear enough to stop interviewing, but architectural validation / consensus planning is still desirable
387
456
  - **Next Recommended Step:** Use the approved planning artifacts with `$ultragoal` as the default durable goal-mode follow-up (optionally with `$team` for parallel lanes); choose `$autoresearch-goal` for research validation or `$performance-goal` for measurable optimization, and use `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
388
457
 
389
- ### 2. **`$autopilot`**
458
+ ### 3. **`$autopilot`**
390
459
  - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
391
460
  - **Invocation:** `$autopilot <spec-path>`
392
461
  - **Consumer Behavior:** Use the deep-interview spec as the clarified execution brief. Preserve intent, non-goals, decision boundaries, and acceptance criteria as binding context for planning/execution.
@@ -395,7 +464,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
395
464
  - **Best When:** The clarified spec is already strong enough for direct planning + execution without an additional consensus gate
396
465
  - **Next Recommended Step:** Continue through autopilot's execution/QA/validation flow; if coordination-heavy execution emerges, prefer `$team` under a leader-owned `$ultragoal` ledger, using `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
397
466
 
398
- ### 3. **`$ralph` (Explicit fallback only)**
467
+ ### 4. **`$ralph` (Explicit fallback only)**
399
468
  - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
400
469
  - **Invocation:** `$ralph <spec-path>`
401
470
  - **Consumer Behavior:** Use the spec's acceptance criteria and boundary constraints as the persistence target. Do not reopen requirements discovery unless the user explicitly asks to refine further.
@@ -404,7 +473,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
404
473
  - **Best When:** The user explicitly asks for Ralph's persistent sequential completion pressure; otherwise use `$ultragoal` for durable goal tracking and completion checkpoints
405
474
  - **Next Recommended Step:** If this explicit fallback is selected, continue Ralph's persistence loop; if work expands into coordination-heavy lanes, hand off to `$team` under `$ultragoal` checkpointing rather than promoting Ralph as the next default
406
475
 
407
- ### 4. **`$team`**
476
+ ### 5. **`$team`**
408
477
  - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
409
478
  - **Invocation:** `$team <spec-path>`
410
479
  - **Consumer Behavior:** Treat the spec as shared execution context for coordinated parallel work. Preserve the clarified intent, non-goals, decision boundaries, and acceptance criteria as common lane constraints.
@@ -413,7 +482,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
  - **Best When:** The task is large, multi-lane, or blocker-sensitive enough to justify coordinated parallel execution instead of a single persistent loop
  - **Next Recommended Step:** Follow the team verification path when the coordinated execution phase finishes; checkpoint completion through `$ultragoal` by default, escalating to a separate Ralph loop only when the user explicitly asks for that persistent verification/fix owner
 
- ### 5. **Refine further**
+ ### 6. **Refine further**
  - **Input Artifact:** Existing transcript, context snapshot, and current spec draft
  - **Invocation:** Continue the interview loop
  - **Consumer Behavior:** Re-enter questioning to resolve the highest-leverage remaining uncertainty
@@ -437,6 +506,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
  - Use `omx state write/read --input '<json>' --json` for resumable mode state; `state_write` / `state_read` are explicit MCP compatibility fallbacks only
  - If the interview cannot ask a required `omx question` round, persist the blocker as terminal state with `active: false` and `current_phase: "blocked"`; do not write a terminal blocked phase with `active: true`
  - Read/write context snapshots under `.omx/context/`
+ - Read applicable repo docs/rules/context during preflight; write durable docs, glossary, ADR, or memory updates only when the user explicitly opts in and the content is public-safe
  - Record whether the oversized-context summary gate is not needed, pending, or satisfied before any scoring or handoff step
  - Save transcript/spec artifacts under `.omx/interviews/` and `.omx/specs/`
  </Tool_Usage>
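The blocked-state rule in the hunk above can be sketched as a small guard. This is only an illustration under stated assumptions: the `ModeState` type, the `reason` field, and the helper names are hypothetical and do not come from the package; only `active`, `current_phase: "blocked"`, and the `omx state write --input '<json>' --json` form appear in the skill text.

```typescript
// Hypothetical shape: only `active` and `current_phase` are named in the skill text.
interface ModeState {
  active: boolean;
  current_phase: string;
  reason?: string; // illustrative field, not confirmed by the source
}

// Build the terminal state for an interview that could not ask a required question round.
function buildBlockedState(reason: string): ModeState {
  return { active: false, current_phase: "blocked", reason };
}

// Reject the combination the skill forbids: a blocked phase that still claims to be active.
function isValidState(state: ModeState): boolean {
  return !(state.current_phase === "blocked" && state.active);
}

// Serialize a state so it could be passed as the `--input` JSON of `omx state write`.
function toStateInput(state: ModeState): string {
  if (!isValidState(state)) {
    throw new Error("a blocked phase must be written with active: false");
  }
  return JSON.stringify(state);
}
```

A caller would pass the string returned by `toStateInput(buildBlockedState("..."))` as the JSON payload; checking before serialization keeps an `active: true` blocked state from ever being persisted.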
@@ -460,7 +530,11 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
  - [ ] Transcript written to `.omx/interviews/{slug}-{timestamp}.md`
  - [ ] Spec written to `.omx/specs/deep-interview-{slug}.md`
  - [ ] Brownfield questions use evidence-backed confirmation when applicable
- - [ ] Handoff options provided (`$ralplan`, `$autopilot`, `$ralph`, `$team`) plus context-sensitive goal-mode suggestions (`$ultragoal`, `$autoresearch-goal`, `$performance-goal`) when applicable
+ - [ ] Brownfield preflight inspected applicable repo docs/rules/context before user-facing questions
+ - [ ] Fuzzy or conflicting terminology was challenged against repo language/current code behavior when applicable
+ - [ ] Scenario-based edge-case grilling was used when boundary ambiguity would materially affect implementation
+ - [ ] Durable docs/ADR/memory updates, if any, were explicitly opted into and public-safe
+ - [ ] Handoff options provided (`$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, `$team`) plus context-sensitive goal-mode suggestions (`$autoresearch-goal`, `$performance-goal`) when applicable
  - [ ] No direct implementation performed in this mode
  </Final_Checklist>
466
540