npm - oh-my-codex - Versions diffs - 0.18.8 → 0.18.10 - Mend

oh-my-codex 0.18.8 → 0.18.10

This diff reflects the contents of publicly available package versions released to one of the supported registries, showing the changes between those versions as they appear in their respective public registries. It is provided for informational purposes only.

Files changed (221)

package/Cargo.lock +12 -12
package/Cargo.toml +1 -1
package/README.md +4 -0
package/dist/autopilot/__tests__/deep-interview-gate.test.d.ts +2 -0
package/dist/autopilot/__tests__/deep-interview-gate.test.d.ts.map +1 -0
package/dist/autopilot/__tests__/deep-interview-gate.test.js +215 -0
package/dist/autopilot/__tests__/deep-interview-gate.test.js.map +1 -0
package/dist/autopilot/__tests__/fsm.test.js +3 -0
package/dist/autopilot/__tests__/fsm.test.js.map +1 -1
package/dist/autopilot/__tests__/ralplan-gate.test.js +148 -0
package/dist/autopilot/__tests__/ralplan-gate.test.js.map +1 -1
package/dist/autopilot/deep-interview-gate.d.ts.map +1 -1
package/dist/autopilot/deep-interview-gate.js +140 -0
package/dist/autopilot/deep-interview-gate.js.map +1 -1
package/dist/autopilot/fsm.js +2 -2
package/dist/autopilot/fsm.js.map +1 -1
package/dist/cli/__tests__/auth.test.js +37 -2
package/dist/cli/__tests__/auth.test.js.map +1 -1
package/dist/cli/__tests__/codex-feature-probe.test.d.ts +2 -0
package/dist/cli/__tests__/codex-feature-probe.test.d.ts.map +1 -0
package/dist/cli/__tests__/codex-feature-probe.test.js +46 -0
package/dist/cli/__tests__/codex-feature-probe.test.js.map +1 -0
package/dist/cli/__tests__/codex-plugin-layout.test.js +1 -1
package/dist/cli/__tests__/codex-plugin-layout.test.js.map +1 -1
package/dist/cli/__tests__/doctor-warning-copy.test.js +2 -0
package/dist/cli/__tests__/doctor-warning-copy.test.js.map +1 -1
package/dist/cli/__tests__/index.test.js +288 -6
package/dist/cli/__tests__/index.test.js.map +1 -1
package/dist/cli/__tests__/launch-fallback.test.js +19 -5
package/dist/cli/__tests__/launch-fallback.test.js.map +1 -1
package/dist/cli/__tests__/package-bin-contract.test.js +39 -10
package/dist/cli/__tests__/package-bin-contract.test.js.map +1 -1
package/dist/cli/__tests__/question.test.js +26 -9
package/dist/cli/__tests__/question.test.js.map +1 -1
package/dist/cli/__tests__/resume.test.js +50 -1
package/dist/cli/__tests__/resume.test.js.map +1 -1
package/dist/cli/__tests__/setup-refresh.test.js +6 -2
package/dist/cli/__tests__/setup-refresh.test.js.map +1 -1
package/dist/cli/__tests__/sparkshell-packaging.test.js +45 -2
package/dist/cli/__tests__/sparkshell-packaging.test.js.map +1 -1
package/dist/cli/__tests__/team-decompose.test.js +10 -5
package/dist/cli/__tests__/team-decompose.test.js.map +1 -1
package/dist/cli/__tests__/team.test.js +45 -1
package/dist/cli/__tests__/team.test.js.map +1 -1
package/dist/cli/__tests__/ultragoal.test.js +75 -0
package/dist/cli/__tests__/ultragoal.test.js.map +1 -1
package/dist/cli/__tests__/update.test.js +214 -17
package/dist/cli/__tests__/update.test.js.map +1 -1
package/dist/cli/__tests__/windows-popup-loop-contract.test.js +1 -1
package/dist/cli/auth.d.ts.map +1 -1
package/dist/cli/auth.js +25 -1
package/dist/cli/auth.js.map +1 -1
package/dist/cli/codex-feature-probe.d.ts +5 -2
package/dist/cli/codex-feature-probe.d.ts.map +1 -1
package/dist/cli/codex-feature-probe.js +25 -9
package/dist/cli/codex-feature-probe.js.map +1 -1
package/dist/cli/index.d.ts +39 -5
package/dist/cli/index.d.ts.map +1 -1
package/dist/cli/index.js +184 -101
package/dist/cli/index.js.map +1 -1
package/dist/cli/setup.d.ts.map +1 -1
package/dist/cli/setup.js +9 -1
package/dist/cli/setup.js.map +1 -1
package/dist/cli/team.d.ts +4 -0
package/dist/cli/team.d.ts.map +1 -1
package/dist/cli/team.js +43 -4
package/dist/cli/team.js.map +1 -1
package/dist/cli/ultragoal.d.ts.map +1 -1
package/dist/cli/ultragoal.js +29 -0
package/dist/cli/ultragoal.js.map +1 -1
package/dist/cli/update.d.ts +20 -3
package/dist/cli/update.d.ts.map +1 -1
package/dist/cli/update.js +265 -23
package/dist/cli/update.js.map +1 -1
package/dist/cli/version.d.ts.map +1 -1
package/dist/cli/version.js +5 -9
package/dist/cli/version.js.map +1 -1
package/dist/compat/__tests__/doctor-contract.test.js +12 -1
package/dist/compat/__tests__/doctor-contract.test.js.map +1 -1
package/dist/hooks/__tests__/agents-overlay.test.js +1 -0
package/dist/hooks/__tests__/agents-overlay.test.js.map +1 -1
package/dist/hooks/__tests__/autopilot-skill-contract.test.js +15 -0
package/dist/hooks/__tests__/autopilot-skill-contract.test.js.map +1 -1
package/dist/hooks/__tests__/code-review-skill-contract.test.js +7 -3
package/dist/hooks/__tests__/code-review-skill-contract.test.js.map +1 -1
package/dist/hooks/__tests__/deep-interview-contract.test.js +46 -1
package/dist/hooks/__tests__/deep-interview-contract.test.js.map +1 -1
package/dist/hooks/__tests__/skill-guidance-contract.test.js +14 -5
package/dist/hooks/__tests__/skill-guidance-contract.test.js.map +1 -1
package/dist/hooks/agents-overlay.d.ts.map +1 -1
package/dist/hooks/agents-overlay.js +2 -1
package/dist/hooks/agents-overlay.js.map +1 -1
package/dist/hooks/extensibility/__tests__/plugin-runner.test.js +112 -1
package/dist/hooks/extensibility/__tests__/plugin-runner.test.js.map +1 -1
package/dist/hooks/extensibility/plugin-runner-stdin.d.ts +2 -0
package/dist/hooks/extensibility/plugin-runner-stdin.d.ts.map +1 -0
package/dist/hooks/extensibility/plugin-runner-stdin.js +16 -0
package/dist/hooks/extensibility/plugin-runner-stdin.js.map +1 -0
package/dist/hooks/extensibility/plugin-runner.js +2 -4
package/dist/hooks/extensibility/plugin-runner.js.map +1 -1
package/dist/hud/__tests__/index.test.js +23 -2
package/dist/hud/__tests__/index.test.js.map +1 -1
package/dist/hud/__tests__/reconcile.test.js +387 -0
package/dist/hud/__tests__/reconcile.test.js.map +1 -1
package/dist/hud/__tests__/state.test.js +28 -0
package/dist/hud/__tests__/state.test.js.map +1 -1
package/dist/hud/__tests__/tmux.test.js +118 -7
package/dist/hud/__tests__/tmux.test.js.map +1 -1
package/dist/hud/index.d.ts +6 -1
package/dist/hud/index.d.ts.map +1 -1
package/dist/hud/index.js +12 -3
package/dist/hud/index.js.map +1 -1
package/dist/hud/reconcile.d.ts +6 -2
package/dist/hud/reconcile.d.ts.map +1 -1
package/dist/hud/reconcile.js +58 -28
package/dist/hud/reconcile.js.map +1 -1
package/dist/hud/state.d.ts.map +1 -1
package/dist/hud/state.js +4 -18
package/dist/hud/state.js.map +1 -1
package/dist/hud/tmux.d.ts +14 -1
package/dist/hud/tmux.d.ts.map +1 -1
package/dist/hud/tmux.js +129 -15
package/dist/hud/tmux.js.map +1 -1
package/dist/question/__tests__/renderer.test.js +566 -1
package/dist/question/__tests__/renderer.test.js.map +1 -1
package/dist/question/renderer.d.ts +9 -1
package/dist/question/renderer.d.ts.map +1 -1
package/dist/question/renderer.js +246 -70
package/dist/question/renderer.js.map +1 -1
package/dist/ralplan/consensus-gate.js +9 -1
package/dist/ralplan/consensus-gate.js.map +1 -1
package/dist/scripts/__tests__/codex-native-hook.test.js +322 -15
package/dist/scripts/__tests__/codex-native-hook.test.js.map +1 -1
package/dist/scripts/__tests__/run-test-files.test.js +115 -1
package/dist/scripts/__tests__/run-test-files.test.js.map +1 -1
package/dist/scripts/codex-native-hook.d.ts.map +1 -1
package/dist/scripts/codex-native-hook.js +94 -20
package/dist/scripts/codex-native-hook.js.map +1 -1
package/dist/scripts/notify-hook/team-worker-stop.d.ts.map +1 -1
package/dist/scripts/notify-hook/team-worker-stop.js +54 -21
package/dist/scripts/notify-hook/team-worker-stop.js.map +1 -1
package/dist/scripts/run-test-files.js +218 -160
package/dist/scripts/run-test-files.js.map +1 -1
package/dist/state/__tests__/operations.test.js +463 -0
package/dist/state/__tests__/operations.test.js.map +1 -1
package/dist/team/__tests__/delivery-log.test.js +18 -0
package/dist/team/__tests__/delivery-log.test.js.map +1 -1
package/dist/team/__tests__/runtime.test.js +48 -0
package/dist/team/__tests__/runtime.test.js.map +1 -1
package/dist/team/__tests__/tmux-session.test.js +107 -0
package/dist/team/__tests__/tmux-session.test.js.map +1 -1
package/dist/team/__tests__/tmux-test-fixture.d.ts.map +1 -1
package/dist/team/__tests__/tmux-test-fixture.js +14 -2
package/dist/team/__tests__/tmux-test-fixture.js.map +1 -1
package/dist/team/__tests__/tmux-test-fixture.test.js +1 -0
package/dist/team/__tests__/tmux-test-fixture.test.js.map +1 -1
package/dist/team/__tests__/worker-bootstrap.test.js +54 -1
package/dist/team/__tests__/worker-bootstrap.test.js.map +1 -1
package/dist/team/delivery-log.d.ts +1 -1
package/dist/team/delivery-log.d.ts.map +1 -1
package/dist/team/delivery-log.js.map +1 -1
package/dist/team/repo-aware-decomposition.d.ts +4 -0
package/dist/team/repo-aware-decomposition.d.ts.map +1 -1
package/dist/team/repo-aware-decomposition.js.map +1 -1
package/dist/team/runtime.d.ts.map +1 -1
package/dist/team/runtime.js +78 -9
package/dist/team/runtime.js.map +1 -1
package/dist/team/tmux-session.d.ts +1 -0
package/dist/team/tmux-session.d.ts.map +1 -1
package/dist/team/tmux-session.js +16 -5
package/dist/team/tmux-session.js.map +1 -1
package/dist/team/ultragoal-context.d.ts +12 -0
package/dist/team/ultragoal-context.d.ts.map +1 -1
package/dist/team/ultragoal-context.js +32 -8
package/dist/team/ultragoal-context.js.map +1 -1
package/dist/utils/__tests__/paths.test.js +23 -0
package/dist/utils/__tests__/paths.test.js.map +1 -1
package/dist/utils/__tests__/platform-command.test.js +16 -1
package/dist/utils/__tests__/platform-command.test.js.map +1 -1
package/dist/utils/__tests__/version.test.d.ts +2 -0
package/dist/utils/__tests__/version.test.d.ts.map +1 -0
package/dist/utils/__tests__/version.test.js +51 -0
package/dist/utils/__tests__/version.test.js.map +1 -0
package/dist/utils/paths.d.ts +8 -1
package/dist/utils/paths.d.ts.map +1 -1
package/dist/utils/paths.js +20 -6
package/dist/utils/paths.js.map +1 -1
package/dist/utils/platform-command.d.ts +9 -0
package/dist/utils/platform-command.d.ts.map +1 -1
package/dist/utils/platform-command.js +15 -0
package/dist/utils/platform-command.js.map +1 -1
package/dist/utils/toml.d.ts +4 -0
package/dist/utils/toml.d.ts.map +1 -0
package/dist/utils/toml.js +75 -0
package/dist/utils/toml.js.map +1 -0
package/dist/utils/version.d.ts +7 -0
package/dist/utils/version.d.ts.map +1 -0
package/dist/utils/version.js +67 -0
package/dist/utils/version.js.map +1 -0
package/dist/verification/__tests__/ci-rust-gates.test.js +8 -0
package/dist/verification/__tests__/ci-rust-gates.test.js.map +1 -1
package/dist/verification/__tests__/dev-merge-issue-close-workflow.test.js +16 -2
package/dist/verification/__tests__/dev-merge-issue-close-workflow.test.js.map +1 -1
package/package.json +4 -3
package/plugins/oh-my-codex/.codex-plugin/plugin.json +1 -1
package/plugins/oh-my-codex/skills/autopilot/SKILL.md +3 -0
package/plugins/oh-my-codex/skills/code-review/SKILL.md +2 -2
package/plugins/oh-my-codex/skills/deep-interview/SKILL.md +85 -11
package/plugins/oh-my-codex/skills/ultrawork/SKILL.md +32 -17
package/skills/autopilot/SKILL.md +3 -0
package/skills/code-review/SKILL.md +2 -2
package/skills/deep-interview/SKILL.md +85 -11
package/skills/ultrawork/SKILL.md +32 -17
package/src/scripts/__tests__/codex-native-hook.test.ts +391 -26
package/src/scripts/__tests__/run-test-files.test.ts +138 -2
package/src/scripts/codex-native-hook.ts +99 -17
package/src/scripts/notify-hook/team-worker-stop.ts +58 -18
package/src/scripts/prepare-build.js +83 -0
package/src/scripts/run-test-files.ts +229 -150
package/templates/AGENTS.md +40 -199
package/src/scripts/postinstall-bootstrap.js +0 -23

package/plugins/oh-my-codex/skills/deep-interview/SKILL.md CHANGED

@@ -51,6 +51,11 @@ If no flag is provided, use **Standard**.
 - Gather codebase facts via `explore` before asking user about internals
 - `omx explore` is deprecated. Use normal repository inspection tools/subagents for simple read-only brownfield fact gathering; use `omx sparkshell` only for explicit shell-native read-only evidence, and keep ambiguous or non-shell-only investigation on the richer normal path.
 - Always run a preflight context intake before the first interview question
+- For brownfield work, preflight must include doc/context grounding before user-facing questions: inspect applicable `AGENTS.md` files, README/getting-started docs, relevant `docs/` contracts/plans/ADRs, existing `.omx/context/` snapshots, and any project-local glossary/context files such as `CONTEXT.md` or `CONTEXT-MAP.md` when present.
+- Treat existing repo language as evidence, not authority: if the user uses a fuzzy, overloaded, or conflicting term, surface the specific doc/code wording and ask which meaning should govern before implementation.
+- Cross-check user claims about current behavior against code or documented contracts when discoverable. If docs and code disagree, ask a confirmation question that names both sources instead of silently choosing one.
+- Use scenario-based edge-case grilling when relationships, boundaries, or handoff behavior are unclear: invent one concrete scenario that stresses the ambiguous boundary, then ask one focused question about the expected outcome.
+- Durable docs, glossary, ADR, or memory updates are opt-in and public-safe only. Deep-interview may recommend such updates in the handoff summary, but must not automatically create or dump public docs from interview transcripts unless the user explicitly chooses that as in-scope.
 - If initial context is oversized or would exceed the prompt budget, do not paste or forward the raw payload into interview prompts; request and record a prompt-safe initial-context summary first
 - The oversized initial-context summary gate is blocking: wait for the concise summary before ambiguity scoring, crystallizing artifacts, or any downstream execution handoff
 - The summary must preserve goals, constraints, success criteria, non-goals, decision boundaries, and references to any full source documents so downstream consumers receive a prompt-safe but faithful context
@@ -97,8 +102,15 @@ If no flag is provided, use **Standard**.
    - Unknowns/open questions
    - Decision-boundary unknowns
    - Likely codebase touchpoints
+   - Relevant repo docs/rules/context inspected
+   - Terminology or doc/code conflicts found
    - Prompt-safe initial-context summary status (`not_needed`, `needed`, or `recorded`)
-5. Save snapshot to `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) and reference it in mode state.
+5. For brownfield tasks, inspect the applicable documentation/rule surface before the first user-facing round. Prefer exact, nearby sources over broad scans:
+   - governing `AGENTS.md` files and template/runtime instruction surfaces that apply to the touched paths
+   - README/getting-started docs and relevant docs under `docs/`, especially contracts, plans, ADR-like records, and workflow docs
+   - existing `.omx/context/` snapshots, `.omx/specs/`, and planning artifacts relevant to the slug
+   - project-local glossary/context files such as `CONTEXT.md`, `CONTEXT-MAP.md`, or context-specific docs when they exist
+6. Save snapshot to `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) and reference it in mode state.
 ## Phase 1: Initialize
@@ -137,13 +149,14 @@ If no flag is provided, use **Standard**.
 Repeat until ambiguity `<= threshold`, the pressure pass is complete, the readiness gates are explicit, the user exits with warning, or max rounds are reached. This is a stop condition: below threshold, do not open a new ordinary interview branch.
 ### 2a) Generate next question
-If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
+If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
 Use:
 - Original idea
 - Prior Q&A rounds
 - Current dimension scores
 - Brownfield context (if any)
+- Doc/context grounding notes, including existing terminology, governing rules, and any doc/code mismatch
 - Activated challenge mode injection (Phase 3)
 Target the lowest-scoring dimension, but respect stage priority:
@@ -155,12 +168,21 @@ Follow-up pressure ladder after each answer:
 1. Ask for a concrete example, counterexample, or evidence signal behind the latest claim
 2. Probe the hidden assumption, dependency, or belief that makes the claim true
 3. Force a boundary or tradeoff: what would you explicitly not do, defer, or reject?
-4. If the answer still describes symptoms, reframe toward essence / root cause before moving on
+4. Challenge fuzzy or conflicting terms against the repo's documented language and current code behavior
+5. Stress-test the boundary with one concrete scenario or edge case when a relationship or handoff remains ambiguous
+6. If the answer still describes symptoms, reframe toward essence / root cause before moving on
 Prefer staying on the same thread for multiple rounds when it has the highest leverage. Breadth without pressure is not progress.
 Maintain a **Breadth Ledger** across independent ambiguity tracks: scope, constraints, outputs, verification, brownfield integration, and any user-mentioned deliverable tracks. The ledger is a guard, not a mandatory rotation rule: stay deep on the current thread until it has been pressure-tested, then zoom out only when another material track remains unresolved and would change execution.
+Maintain a **Docs/Terminology Ledger** for brownfield interviews:
+- repo docs/rules/context sources inspected, with path references
+- canonical terms already used by the repo and terms to avoid or disambiguate
+- user terms that conflict with docs or current code behavior
+- doc/code mismatches that require a human decision before implementation
+- optional durable-doc follow-ups that are safe to propose but not auto-apply
 Detailed dimensions:
 - Intent Clarity — why the user wants this
 - Outcome Clarity — what end state they want
@@ -306,6 +328,7 @@ Append round result and updated scores via `omx state write --input '<json>' --j
 Use each mode once when applicable. These are normal escalation tools, not rare rescue moves:
 - **Contrarian** (round 2+ or immediately when an answer rests on an untested assumption): challenge core assumptions
+- **Terminologist** (brownfield, whenever a key term is fuzzy, overloaded, or conflicts with repo docs/code): force a canonical meaning against existing project language before implementation
 - **Simplifier** (round 4+ or when scope expands faster than outcome clarity): probe minimal viable scope
 - **Ontologist** (round 5+ and ambiguity > 0.25, or when the user keeps describing symptoms): ask for essence-level reframing
@@ -336,6 +359,9 @@ Spec should include:
 - Assumptions exposed + resolutions
 - Pressure-pass findings (which answer was revisited, and what changed)
 - Brownfield evidence vs inference notes for any repository-grounded confirmation questions
+- Docs/Terminology Ledger with inspected repo docs/rules/context, term conflicts, and any doc/code mismatch decisions
+- Scenario/edge-case pressure findings that materially shaped scope or acceptance criteria
+- Optional durable documentation recommendations, explicitly marked opt-in and public-safe; do not include raw private transcript dumps
 - Technical context findings
 - Full or condensed transcript
@@ -365,11 +391,45 @@ When the clarified task is specifically about `$autoresearch`, or the skill is i
 ## Phase 5: Execution Bridge
-Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, and any residual-risk warnings across the handoff.
+Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, and any residual-risk warnings across the handoff.
+### Optional execution contract foundation
+When an Autopilot/deep-interview handoff explicitly requires a stride contract, emit it as structured data rather than prose. This is a validation foundation, not a broadness-inference feature: do not infer stride from task length, phase labels, snapshots, or freeform wording.
+Canonical location under Autopilot state:
+```json
+{
+  "handoff_artifacts": {
+    "deep_interview": {
+      "execution_contract_required": true,
+      "execution_contract": {
+        "version": 1,
+        "execution_stride": "task",
+        "source": "deep-interview",
+        "selected_by": "user",
+        "allow_task_shrink": true,
+        "completion_unit": "One focused task",
+        "stop_condition": "Stop after that task is implemented and verified",
+        "acceptance_coverage_scope": "task",
+        "shrink_policy": "allowed"
+      }
+    }
+  }
+}
+```
+Stride meanings:
+- `task`: conservative, small-step execution; `allow_task_shrink:true`, `acceptance_coverage_scope:"task"`, `shrink_policy:"allowed"`.
+- `deliverable`: finish the named deliverable before stopping; `allow_task_shrink:false`, `acceptance_coverage_scope:"deliverable"`, `shrink_policy:"ask_before_shrink"`.
+- `milestone`: finish the larger approved milestone unless blocked; `allow_task_shrink:false`, `acceptance_coverage_scope:"milestone"`, `shrink_policy:"deny_unless_blocked"`.
+Only set `execution_contract_required:true` when the selected downstream workflow needs this explicit stride/stop-condition guard. New artifacts must write the canonical snake_case schema shown above under `handoff_artifacts.deep_interview`; runtime readers may accept legacy camelCase field/marker aliases and direct/nested `execution_contract` locations only as compatibility input. If `execution_contract_required` is absent or false, downstream Autopilot compatibility behavior is unchanged.
 ### Goal-mode follow-ups
-Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
+Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
 - **`$ultragoal`** — default goal-mode follow-up for implementation or general goal-oriented follow-up specs that should be converted into durable Codex/OMX goals with sequential completion tracking.
 - **`$autoresearch-goal`** — use when the clarified context is a research project: a research question, reference/literature gathering, evaluator-backed analysis, or professor/critic-style deliverable.
@@ -377,7 +437,16 @@ Include these product-facing suggestions when they fit the clarified spec, witho
 Recommend `$ultragoal` as the default durable goal-mode follow-up because it supersedes Ralph for goal tracking. Preserve `$team` for coordinated parallel implementation and keep `$ralph` only as an explicit fallback for persistent single-owner execution/verification when the user specifically selects it.
-### 1. **`$ralplan` (Recommended)**
+### 1. **`$ultragoal` (Default durable execution follow-up)**
+- **Input Artifact:** `.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
+- **Invocation:** `$ultragoal create-goals --brief-file <spec-path>` followed by `$ultragoal complete-goals` in the active execution lane
+- **Consumer Behavior:** Convert the clarified spec into durable goal-mode work. Preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, scenario-pressure findings, and residual-risk warnings as binding story constraints.
+- **Skipped / Already-Satisfied Stages:** Requirement interview, ambiguity clarification, doc/context preflight, and early intent-boundary elicitation
+- **Expected Output:** `.omx/ultragoal/brief.md`, `.omx/ultragoal/goals.json`, `.omx/ultragoal/ledger.jsonl`, implementation evidence, verification evidence, and final cleanup/review-gate evidence
+- **Best When:** The clarified spec is execution-ready or the user explicitly wants durable goal tracking as the next step
+- **Next Recommended Step:** Run the Ultragoal completion loop; launch `$team` only inside an active Ultragoal story when parallel lanes are warranted, and use `$ralph` only as an explicit fallback when the user asks for that legacy persistence mode
+### 2. **`$ralplan` (Recommended when architecture/test-shape review is still needed)**
 - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
 - **Invocation:** `$plan --consensus --direct <spec-path>`
 - **Consumer Behavior:** Treat the deep-interview spec as the requirements source of truth. Do not repeat the interview by default; refine architecture/feasibility around the clarified intent and boundaries instead.
@@ -386,7 +455,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - **Best When:** Requirements are clear enough to stop interviewing, but architectural validation / consensus planning is still desirable
 - **Next Recommended Step:** Use the approved planning artifacts with `$ultragoal` as the default durable goal-mode follow-up (optionally with `$team` for parallel lanes); choose `$autoresearch-goal` for research validation or `$performance-goal` for measurable optimization, and use `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
-### 2. **`$autopilot`**
+### 3. **`$autopilot`**
 - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
 - **Invocation:** `$autopilot <spec-path>`
 - **Consumer Behavior:** Use the deep-interview spec as the clarified execution brief. Preserve intent, non-goals, decision boundaries, and acceptance criteria as binding context for planning/execution.
@@ -395,7 +464,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - **Best When:** The clarified spec is already strong enough for direct planning + execution without an additional consensus gate
 - **Next Recommended Step:** Continue through autopilot's execution/QA/validation flow; if coordination-heavy execution emerges, prefer `$team` under a leader-owned `$ultragoal` ledger, using `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
-### 3. **`$ralph` (Explicit fallback only)**
+### 4. **`$ralph` (Explicit fallback only)**
 - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
 - **Invocation:** `$ralph <spec-path>`
 - **Consumer Behavior:** Use the spec's acceptance criteria and boundary constraints as the persistence target. Do not reopen requirements discovery unless the user explicitly asks to refine further.
@@ -404,7 +473,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - **Best When:** The user explicitly asks for Ralph's persistent sequential completion pressure; otherwise use `$ultragoal` for durable goal tracking and completion checkpoints
 - **Next Recommended Step:** If this explicit fallback is selected, continue Ralph's persistence loop; if work expands into coordination-heavy lanes, hand off to `$team` under `$ultragoal` checkpointing rather than promoting Ralph as the next default
-### 4. **`$team`**
+### 5. **`$team`**
 - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
 - **Invocation:** `$team <spec-path>`
 - **Consumer Behavior:** Treat the spec as shared execution context for coordinated parallel work. Preserve the clarified intent, non-goals, decision boundaries, and acceptance criteria as common lane constraints.
@@ -413,7 +482,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - **Best When:** The task is large, multi-lane, or blocker-sensitive enough to justify coordinated parallel execution instead of a single persistent loop
 - **Next Recommended Step:** Follow the team verification path when the coordinated execution phase finishes; checkpoint completion through `$ultragoal` by default, escalating to a separate Ralph loop only when the user explicitly asks for that persistent verification/fix owner
-### 5. **Refine further**
+### 6. **Refine further**
 - **Input Artifact:** Existing transcript, context snapshot, and current spec draft
 - **Invocation:** Continue the interview loop
 - **Consumer Behavior:** Re-enter questioning to resolve the highest-leverage remaining uncertainty
@@ -437,6 +506,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - Use `omx state write/read --input '<json>' --json` for resumable mode state; `state_write` / `state_read` are explicit MCP compatibility fallbacks only
 - If the interview cannot ask a required `omx question` round, persist the blocker as terminal state with `active: false` and `current_phase: "blocked"`; do not write a terminal blocked phase with `active: true`
 - Read/write context snapshots under `.omx/context/`
+- Read applicable repo docs/rules/context during preflight; write durable docs, glossary, ADR, or memory updates only when the user explicitly opts in and the content is public-safe
 - Record whether the oversized-context summary gate is not needed, pending, or satisfied before any scoring or handoff step
 - Save transcript/spec artifacts under `.omx/interviews/` and `.omx/specs/`
 </Tool_Usage>
@@ -460,7 +530,11 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - [ ] Transcript written to `.omx/interviews/{slug}-{timestamp}.md`
 - [ ] Spec written to `.omx/specs/deep-interview-{slug}.md`
 - [ ] Brownfield questions use evidence-backed confirmation when applicable
-- [ ] Handoff options provided (`$ralplan`, `$autopilot`, `$ralph`, `$team`) plus context-sensitive goal-mode suggestions (`$ultragoal`, `$autoresearch-goal`, `$performance-goal`) when applicable
+- [ ] Brownfield preflight inspected applicable repo docs/rules/context before user-facing questions
+- [ ] Fuzzy or conflicting terminology was challenged against repo language/current code behavior when applicable
+- [ ] Scenario-based edge-case grilling was used when boundary ambiguity would materially affect implementation
+- [ ] Durable docs/ADR/memory updates, if any, were explicitly opted into and public-safe
+- [ ] Handoff options provided (`$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, `$team`) plus context-sensitive goal-mode suggestions (`$autoresearch-goal`, `$performance-goal`) when applicable
 - [ ] No direct implementation performed in this mode
 </Final_Checklist>
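Three of the preflight and state rules added in this diff are mechanical enough to express as small pure checks: the UTC `YYYYMMDDTHHMMSSZ` snapshot filename under `.omx/context/`, the blocking oversized-context summary gate (statuses `not_needed`, `needed`, `recorded`), and the rule that a `blocked` phase must be terminal (`active: false`). The sketch is hypothetical: `utcStamp`, `snapshotPath`, `mayScoreOrHandOff`, `isConsistentBlockedState`, and the minimal `ModeState` shape are illustrative names, not oh-my-codex APIs, and the real `omx state` payload likely carries more fields.

```typescript
// Hypothetical helpers mirroring deep-interview rules; not part of oh-my-codex.
type SummaryStatus = "not_needed" | "needed" | "recorded";

// Format a Date as a UTC timestamp in YYYYMMDDTHHMMSSZ form.
function utcStamp(d: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return (
    `${d.getUTCFullYear()}${pad(d.getUTCMonth() + 1)}${pad(d.getUTCDate())}` +
    `T${pad(d.getUTCHours())}${pad(d.getUTCMinutes())}${pad(d.getUTCSeconds())}Z`
  );
}

// Build the snapshot path .omx/context/{slug}-{timestamp}.md
function snapshotPath(slug: string, d: Date): string {
  return `.omx/context/${slug}-${utcStamp(d)}.md`;
}

// The oversized-context gate is blocking: scoring, readiness gates, and any
// handoff wait while a prompt-safe summary is still needed (pending).
function mayScoreOrHandOff(status: SummaryStatus): boolean {
  return status !== "needed";
}

// Assumed minimal shape of the resumable mode state.
interface ModeState {
  active: boolean;
  current_phase: string;
}

// A blocked interview must be terminal: "blocked" requires active === false.
function isConsistentBlockedState(state: ModeState): boolean {
  return state.current_phase !== "blocked" || state.active === false;
}

// Usage with a hypothetical slug:
console.log(snapshotPath("auth-refresh", new Date(Date.UTC(2025, 0, 2, 3, 4, 5))));
// prints ".omx/context/auth-refresh-20250102T030405Z.md"
```

Keeping these as pure functions makes each rule testable in isolation, independent of how the state is persisted.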

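The optional execution contract added to the Phase 5 bridge ties each `execution_stride` to fixed values for `allow_task_shrink`, `acceptance_coverage_scope`, and `shrink_policy`, and applies only when `execution_contract_required` is `true`. A validator for that mapping might look like the following hedged sketch. The JSON field names and per-stride values come from the diff; the type names, `STRIDE_RULES`, and `validateHandoff` are hypothetical, and the sketch checks only the canonical snake_case shape at `handoff_artifacts.deep_interview`, deliberately omitting the legacy camelCase aliases the skill mentions (their exact names are not shown here).

```typescript
// Hypothetical model of the canonical stride contract; not oh-my-codex code.
type ExecutionStride = "task" | "deliverable" | "milestone";
type ShrinkPolicy = "allowed" | "ask_before_shrink" | "deny_unless_blocked";

interface ExecutionContract {
  version: number;
  execution_stride: ExecutionStride;
  source: string;
  selected_by: string;
  allow_task_shrink: boolean;
  completion_unit: string;
  stop_condition: string;
  acceptance_coverage_scope: ExecutionStride;
  shrink_policy: ShrinkPolicy;
}

interface DeepInterviewHandoff {
  execution_contract_required?: boolean;
  execution_contract?: ExecutionContract;
}

type StrideInvariants = Pick<
  ExecutionContract,
  "allow_task_shrink" | "acceptance_coverage_scope" | "shrink_policy"
>;

// Per-stride invariants taken from the "Stride meanings" list.
const STRIDE_RULES: Record<ExecutionStride, StrideInvariants> = {
  task: { allow_task_shrink: true, acceptance_coverage_scope: "task", shrink_policy: "allowed" },
  deliverable: { allow_task_shrink: false, acceptance_coverage_scope: "deliverable", shrink_policy: "ask_before_shrink" },
  milestone: { allow_task_shrink: false, acceptance_coverage_scope: "milestone", shrink_policy: "deny_unless_blocked" },
};

// Returns a list of problems; an empty list means the handoff passes.
function validateHandoff(handoff: DeepInterviewHandoff): string[] {
  // Absent or false: the guard is off and Autopilot behavior is unchanged.
  if (handoff.execution_contract_required !== true) return [];
  const contract = handoff.execution_contract;
  if (!contract) return ["execution_contract is required but missing"];
  const rules = STRIDE_RULES[contract.execution_stride];
  if (!rules) return [`unknown execution_stride: ${String(contract.execution_stride)}`];
  const problems: string[] = [];
  for (const key of Object.keys(rules) as (keyof StrideInvariants)[]) {
    if (contract[key] !== rules[key]) {
      problems.push(`${key} must be ${JSON.stringify(rules[key])} for stride "${contract.execution_stride}"`);
    }
  }
  return problems;
}

// Usage: the "task" example from the skill's JSON passes validation.
const taskContract: ExecutionContract = {
  version: 1,
  execution_stride: "task",
  source: "deep-interview",
  selected_by: "user",
  allow_task_shrink: true,
  completion_unit: "One focused task",
  stop_condition: "Stop after that task is implemented and verified",
  acceptance_coverage_scope: "task",
  shrink_policy: "allowed",
};
console.log(validateHandoff({ execution_contract_required: true, execution_contract: taskContract })); // []
```

Deriving the allowed values from one table keeps the stride definitions and the validator from drifting apart; relabeling the task example as `deliverable` without updating the other three fields would yield three problems.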
package/plugins/oh-my-codex/skills/ultrawork/SKILL.md CHANGED

@@ -4,22 +4,23 @@ description: Parallel execution engine for high-throughput task completion
 ---
 <Purpose>
-Ultrawork is a parallel execution engine for high-throughput task completion. It is a component, not a standalone persistence mode: it provides parallelism, context discipline, and smart delegation guidance, but not Ralph's persistence loop, architect sign-off, or long-running completion guarantees.
+Ultrawork is a parallel execution engine for high-throughput task completion. It is a component, not a standalone persistence or verification mode: it provides parallelism, context discipline, and smart delegation guidance, but not durable goal tracking, Team's tmux worker lifecycle, Ralph's legacy persistence loop, architect sign-off, or long-running completion guarantees.
 </Purpose>
 <Use_When>
 - Multiple independent tasks can run simultaneously
 - User says "ulw", "ultrawork", or explicitly wants parallel execution
 - Task benefits from concurrent execution plus lightweight evidence before wrap-up
-- You need a direct-tool lane plus optional background evidence lanes without entering Ralph
+- You need a direct-tool lane plus optional background evidence lanes without entering Team or a durable goal workflow
 </Use_When>
 <Do_Not_Use_When>
-- Task requires guaranteed completion with persistence, architect verification, or deslop/reverification -- use `ralph` instead (Ralph includes ultrawork)
-- Task requires a full autonomous pipeline -- use `autopilot` instead (autopilot defaults to Ultragoal, with Team/parallel execution used only when needed)
-- There is only one sequential task with no parallelism opportunity -- execute directly or delegate to a single `executor`
+- Task needs durable goal tracking, ledger checkpoints, or resume across stories -- use `ultragoal` instead
+- Task needs coordinated tmux workers, shared task state, mailbox/dispatch coordination, or long-running parallel execution -- use `team` instead
+- Task requires a full autonomous pipeline -- use `autopilot` instead (default loop: `deep-interview -> ralplan -> ultragoal`, with `team` only when needed)
+- Task intentionally requires the legacy persistent single-owner completion/verification loop -- use `ralph` explicitly; do not present it as the default durable path
+- There is only one sequential task with no parallelism opportunity -- execute directly, use `ultragoal` for durable tracking, or delegate to a single `executor`
 - The request is still in plan-consensus mode -- keep planning artifacts in `ralplan` until execution is explicitly authorized
-- User needs session persistence for resume -- use `ralph`, which adds persistence on top of ultrawork
 </Do_Not_Use_When>
 <Why_This_Exists>
@@ -138,8 +139,12 @@ Why bad: No verification output, no acceptance evidence, and no manual QA note w
 </Examples>
 <Escalation_And_Stop_Conditions>
-- When ultrawork is invoked directly (not via Ralph), apply lightweight verification only -- build/typecheck passes when relevant, affected tests pass, and manual QA notes are captured when needed.
-- Ralph owns persistence, architect verification, deslop, and the full verified-completion promise. Do not claim those guarantees from direct ultrawork alone.
+- When ultrawork is invoked directly, apply lightweight verification only -- build/typecheck passes when relevant, affected tests pass, and manual QA notes are captured when needed.
+- Ultrawork does not own persistence, durable ledgers, architect verification, deslop, full QA, or the full verified-completion promise. Do not claim those guarantees from direct ultrawork alone.
+- Escalate to `ultragoal` when the work needs durable goal state, story checkpoints, or resume across implementation steps.
+- Escalate to `team` when the work needs coordinated tmux workers, shared task state, or durable multi-worker lifecycle control.
+- Escalate to explicitly requested `ralph` only for the supported legacy single-owner persistence/verification fallback.
+- Ralph owns persistence, architect verification, deslop, and the full verified-completion promise only when explicitly selected as the supported legacy fallback; direct ultrawork does not own those guarantees.
 - If a task fails repeatedly across retries, report the issue rather than retrying indefinitely.
 - Escalate to the user when tasks have unclear dependencies, conflicting requirements, or a materially branching acceptance target.
 </Escalation_And_Stop_Conditions>
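Read together, the Do_Not_Use_When list and the escalation rules describe a small routing decision. The following is a minimal TypeScript sketch of that decision, not oh-my-codex code. `WorkNeeds`, `Route`, and `chooseMode` are hypothetical names. The precedence order (an explicit Ralph request first, then durable goal state, then Team coordination) is an assumption, because the skill text lists the conditions without ranking them.

```typescript
// Hypothetical model of the ultrawork routing rules; not part of oh-my-codex.
type Mode = "direct" | "ultrawork" | "ultragoal" | "team" | "ralph";

interface WorkNeeds {
  explicitRalphRequest: boolean; // user explicitly selected the legacy Ralph loop
  durableGoalState: boolean; // ledger checkpoints or resume across stories
  coordinatedWorkers: boolean; // tmux workers, shared task state, mailbox/dispatch
  independentTasks: number; // tasks that could run in parallel right now
}

interface Route {
  mode: Mode;
  nestedTeam: boolean; // true when Team should run inside an Ultragoal story
}

function chooseMode(needs: WorkNeeds): Route {
  // Ralph is never the default; it is chosen only on explicit request.
  if (needs.explicitRalphRequest) return { mode: "ralph", nestedTeam: false };
  // Durable work goes to Ultragoal, which may launch Team for parallel lanes.
  if (needs.durableGoalState) {
    return { mode: "ultragoal", nestedTeam: needs.coordinatedWorkers };
  }
  if (needs.coordinatedWorkers) return { mode: "team", nestedTeam: false };
  // Plain in-session parallelism with lightweight evidence.
  if (needs.independentTasks > 1) return { mode: "ultrawork", nestedTeam: false };
  // A single sequential task: execute directly or delegate to one executor.
  return { mode: "direct", nestedTeam: false };
}

const route = chooseMode({
  explicitRalphRequest: false,
  durableGoalState: true,
  coordinatedWorkers: true,
  independentTasks: 4,
});
console.log(route); // { mode: 'ultragoal', nestedTeam: true }
```

Work that needs both a durable ledger and tmux workers routes to `ultragoal` with `nestedTeam: true`. This mirrors the relationship tree, where Team runs inside an Ultragoal story instead of replacing it.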
@@ -159,17 +164,27 @@ Why bad: No verification output, no acceptance evidence, and no manual QA note w
 ## Relationship to Other Modes
 ```
-ralph (persistence + verified completion wrapper)
- \-- includes: ultrawork (this skill)
-     \-- provides: high-throughput execution + lightweight evidence
+ultrawork (this skill)
+ \-- provides: in-session parallel execution discipline + lightweight evidence
-autopilot (autonomous execution)
- \-- includes: ralph
-     \-- includes: ultrawork (this skill)
+ultragoal (durable goal execution)
+ \-- owns: goal ledger, checkpoints, resume across stories, final gate discipline
+ \-- may use: team for parallel lanes when a story benefits from coordinated workers
-ecomode (token efficiency)
- \-- modifies: ultrawork's model selection
+team (tmux coordinated execution)
+ \-- owns: worker panes, shared task state, mailbox/dispatch, lifecycle control
+ \-- can return: checkpoint-ready evidence to an Ultragoal leader
+autopilot (strict autonomous delivery loop)
+ \-- default flow: deep-interview -> ralplan -> ultragoal -> code-review -> ultraqa
+ \-- may use: team only when an Ultragoal story needs parallel execution
+ralph (supported legacy explicit fallback)
+ \-- owns: single-owner persistence loop + architect verification when intentionally selected
+ecomode (deprecated compatibility-only)
+ \-- do not route users there from ultrawork; it is not the current model-selection path
 ```
-Ultrawork is the parallelism and execution-discipline layer. Ralph adds persistence, architect verification, deslop, and retry-until-done behavior. Autopilot adds the broader autonomous lifecycle pipeline. Ecomode adjusts ultrawork's model routing to favor cheaper models.
+Ultrawork is the parallelism and execution-discipline layer. Ultragoal is the current default durable goal/ledger follow-up. Team is the coordinated tmux parallel runtime, often nested under an Ultragoal story when durable work needs multiple lanes. Autopilot orchestrates the full default lifecycle through deep-interview, ralplan, ultragoal, code-review, and ultraqa. Ralph remains active as an explicit legacy fallback for persistent single-owner verification, but it is not the recommended default durable path. Ecomode is deprecated compatibility-only and should not be advertised as the ultrawork model-selection route.
 </Advanced>

package/skills/autopilot/SKILL.md CHANGED

@@ -133,6 +133,9 @@ Required fields:
 - **On start**: `omx state write --input '{"mode":"autopilot","active":true,"current_phase":"deep-interview","iteration":1,"review_cycle":0,"state":{"phase_cycle":["deep-interview","ralplan","ultragoal","code-review","ultraqa"],"handoff_artifacts":{"context_snapshot_path":"<snapshot-path>","deep_interview":null,"ralplan":null,"ralplan_consensus_gate":{"required":true,"sequence":["architect-review","critic-review"],"planning_artifacts_are_not_consensus":true,"required_review_roles":["architect","critic"],"ralplan_architect_review":null,"ralplan_critic_review":null,"complete":false},"ultragoal":null,"code_review":null,"ultraqa":null},"review_verdict":null,"qa_verdict":null,"return_to_ralplan_reason":null}}' --json`
 - **On deep-interview -> ralplan**: only after a separate gate proves the interview chain is explicitly complete or the user explicitly authorized a skip. For completion, persist `deep_interview_gate:{"status":"complete","rationale":"<why requirements are complete>","handoff_summary":"<summary>"}` (or equivalent non-empty rationale/summary) plus the clarified spec/requirements under `handoff_artifacts.deep_interview`; if a final `omx question` was involved, keep its same-session answered record linked by `question_id`/`satisfied_at`. For skip, persist `deep_interview_gate:{"status":"skipped","skip_authorized_by_user":true,"skip_reason":"<user-authorized reason>","skipped_at":"<timestamp>","source":"user","session_id":"<session>"}`. Do not leave deep-interview merely because the first `omx question` was answered or cleared.
+  - **Optional execution contract foundation**: when a downstream handoff explicitly sets `execution_contract_required:true`, persist a complete structured `execution_contract` under `handoff_artifacts.deep_interview` before leaving deep-interview. The canonical schema is `version:1`, `execution_stride:"task"|"deliverable"|"milestone"`, `source:"deep-interview"`, `selected_by:"user"|"default"`, `allow_task_shrink:<boolean>`, non-empty `completion_unit`, non-empty `stop_condition`, `acceptance_coverage_scope:"task"|"deliverable"|"milestone"`, and `shrink_policy:"allowed"|"ask_before_shrink"|"deny_unless_blocked"`.
+  - Stride semantics are binding only when `execution_contract_required:true`: `task` means `allow_task_shrink:true`, `acceptance_coverage_scope:"task"`, `shrink_policy:"allowed"`; `deliverable` means `allow_task_shrink:false`, `acceptance_coverage_scope:"deliverable"`, `shrink_policy:"ask_before_shrink"`; `milestone` means `allow_task_shrink:false`, `acceptance_coverage_scope:"milestone"`, `shrink_policy:"deny_unless_blocked"`.
+  - Preserve legacy behavior when `execution_contract_required` is absent or false. Do not infer stride from prose, broadness, phase names, snapshots, or task size; this foundation only validates an explicit structured contract and deliberately uses `milestone` rather than `phase`. New artifacts must write canonical snake_case keys under `handoff_artifacts.deep_interview`; the runtime may read legacy camelCase field/marker aliases and direct/nested `execution_contract` locations only as compatibility input.
 - **On ralplan -> ultragoal**: only after `ralplan_consensus_gate.complete:true`, with tracker-backed native-subagent `ralplan_architect_review.agent_role:"architect"` and `ralplan_architect_review.verdict:"approve"` recorded before tracker-backed native-subagent `ralplan_critic_review.agent_role:"critic"` and `ralplan_critic_review.verdict:"approve"`; `codex_exec` or artifact-only approvals are trace evidence but not native lane proof. Set `current_phase:"ultragoal"` and persist the plan/test-spec paths under `handoff_artifacts.ralplan`.
 - **On missing ralplan consensus evidence**: keep `current_phase:"ralplan"`, persist `ralplan_consensus_gate.complete:false` with `blocked_reason`, and report an explicit blocker or max-iteration outcome instead of handing off to execution.
 - **On ultragoal -> code-review**: set `current_phase:"code-review"`, persist implementation/test/ledger evidence under `handoff_artifacts.ultragoal`.
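The deep-interview and ralplan transition gates above can be checked mechanically. This sketch assumes the persisted gate objects have already been read back from mode state. The `lane` and `recorded_at` fields on `LaneReview` are hypothetical stand-ins for however the tracker records lane type and ordering. The skill text only says that the architect approval must be recorded before the critic approval, and that `codex_exec` or artifact-only approvals do not count as native lane proof.

```typescript
// Hypothetical TypeScript view of the persisted gate objects.
type DeepInterviewGate =
  | { status: "complete"; rationale: string; handoff_summary: string }
  | {
      status: "skipped";
      skip_authorized_by_user: boolean;
      skip_reason: string;
      skipped_at: string;
      source: string;
      session_id: string;
    };

interface LaneReview {
  agent_role: string;
  verdict: string;
  lane: "native-subagent" | "codex_exec" | "artifact-only"; // hypothetical field
  recorded_at: string; // hypothetical ISO-8601 timestamp field
}

const nonEmpty = (s: string): boolean => s.trim().length > 0;

// deep-interview -> ralplan: explicit completion or a user-authorized skip.
function deepInterviewGatePasses(gate: DeepInterviewGate | null): boolean {
  if (gate === null) return false; // an answered first question is not enough
  if (gate.status === "complete") {
    return nonEmpty(gate.rationale) && nonEmpty(gate.handoff_summary);
  }
  return (
    gate.skip_authorized_by_user &&
    gate.source === "user" &&
    nonEmpty(gate.skip_reason) &&
    nonEmpty(gate.skipped_at) &&
    nonEmpty(gate.session_id)
  );
}

// ralplan -> ultragoal: native architect approval recorded before native critic approval.
function ralplanConsensusComplete(architect: LaneReview | null, critic: LaneReview | null): boolean {
  if (architect === null || critic === null) return false;
  const nativeApproval = (r: LaneReview, role: string): boolean =>
    r.agent_role === role && r.verdict === "approve" && r.lane === "native-subagent";
  if (!nativeApproval(architect, "architect") || !nativeApproval(critic, "critic")) return false;
  return Date.parse(architect.recorded_at) < Date.parse(critic.recorded_at);
}
```

A `false` from `ralplanConsensusComplete` corresponds to staying in `ralplan` and persisting `ralplan_consensus_gate.complete:false` with a `blocked_reason`, as the bullet on missing consensus evidence describes.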

package/skills/code-review/SKILL.md CHANGED

@@ -71,10 +71,11 @@ Delegates to the `code-reviewer` and `architect` agents in parallel for a two-la
 Do not self-review as a fallback. If the `code-reviewer` or `architect` agent path is missing, unavailable, skipped, or fails, emit a clear unavailable-review result and block approval until the independent lane evidence exists.
+Respect the user's current model and reasoning/effort selection when launching review lanes. Do not pass `model` or `reasoning_effort` overrides in the review-lane task calls unless the user explicitly asks for review-specific overrides; omitting them lets native subagents inherit the active session settings.
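Here is a minimal sketch of the override rule, assuming a plain object is what gets passed to the native `task(...)` call. `ReviewTask` and `buildReviewTask` are hypothetical names. The only point is that the `model` and `reasoning_effort` keys are absent unless the user explicitly asked for them, so each lane inherits the session settings.

```typescript
// Hypothetical request shape mirroring the task(...) calls in this skill.
interface ReviewTask {
  agent_type: "code-reviewer" | "architect";
  prompt: string;
  model?: string;
  reasoning_effort?: string;
}

interface ExplicitOverrides {
  model?: string;
  reasoning_effort?: string;
}

function buildReviewTask(
  agentType: ReviewTask["agent_type"],
  prompt: string,
  userRequested?: ExplicitOverrides,
): ReviewTask {
  const task: ReviewTask = { agent_type: agentType, prompt };
  // Attach an override only when the user explicitly asked for it; otherwise the
  // key stays absent so the native subagent inherits the active session settings.
  if (userRequested?.model !== undefined) task.model = userRequested.model;
  if (userRequested?.reasoning_effort !== undefined) {
    task.reasoning_effort = userRequested.reasoning_effort;
  }
  return task;
}

// Both review lanes, launched in parallel, with no hard-coded effort level.
const lanes = [
  buildReviewTask("code-reviewer", "CODE REVIEW TASK ..."),
  buildReviewTask("architect", "ARCHITECTURE / DEVIL'S-ADVOCATE REVIEW TASK ..."),
];
```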
 ```
 task(
   agent_type="code-reviewer",
-  reasoning_effort="xhigh",
   prompt="CODE REVIEW TASK
 Review code changes for quality, security, and maintainability.
@@ -100,7 +101,6 @@ Output: Code review report with:
 task(
   agent_type="architect",
-  reasoning_effort="xhigh",
   prompt="ARCHITECTURE / DEVIL'S-ADVOCATE REVIEW TASK
 Review the same code changes from the architecture/tradeoff perspective.

package/skills/deep-interview/SKILL.md CHANGED

@@ -51,6 +51,11 @@ If no flag is provided, use **Standard**.
 - Gather codebase facts via `explore` before asking user about internals
 - `omx explore` is deprecated. Use normal repository inspection tools/subagents for simple read-only brownfield fact gathering; use `omx sparkshell` only for explicit shell-native read-only evidence, and keep ambiguous or non-shell-only investigation on the richer normal path.
 - Always run a preflight context intake before the first interview question
+- For brownfield work, preflight must include doc/context grounding before user-facing questions: inspect applicable `AGENTS.md` files, README/getting-started docs, relevant `docs/` contracts/plans/ADRs, existing `.omx/context/` snapshots, and any project-local glossary/context files such as `CONTEXT.md` or `CONTEXT-MAP.md` when present.
+- Treat existing repo language as evidence, not authority: if the user uses a fuzzy, overloaded, or conflicting term, surface the specific doc/code wording and ask which meaning should govern before implementation.
+- Cross-check user claims about current behavior against code or documented contracts when discoverable. If docs and code disagree, ask a confirmation question that names both sources instead of silently choosing one.
+- Use scenario-based edge-case grilling when relationships, boundaries, or handoff behavior are unclear: invent one concrete scenario that stresses the ambiguous boundary, then ask one focused question about the expected outcome.
+- Durable docs, glossary, ADR, or memory updates are opt-in and public-safe only. Deep-interview may recommend such updates in the handoff summary, but must not automatically create or dump public docs from interview transcripts unless the user explicitly chooses that as in-scope.
 - If initial context is oversized or would exceed the prompt budget, do not paste or forward the raw payload into interview prompts; request and record a prompt-safe initial-context summary first
 - The oversized initial-context summary gate is blocking: wait for the concise summary before ambiguity scoring, crystallizing artifacts, or any downstream execution handoff
 - The summary must preserve goals, constraints, success criteria, non-goals, decision boundaries, and references to any full source documents so downstream consumers receive a prompt-safe but faithful context
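The brownfield doc/context grounding step can be pictured as a filter over a repository file listing. The following sketch illustrates this under several assumptions and is not the skill's implementation. `preflightSources` and `governingDirs` are hypothetical. The sketch treats an `AGENTS.md` as governing a touched path when it sits in that path's directory or any ancestor directory. It also narrows `.omx/context/` and `.omx/specs/` artifacts to those whose path contains the interview slug, in line with preferring exact, nearby sources over broad scans.

```typescript
// Directories whose AGENTS.md could govern a path: the root and every ancestor.
function governingDirs(path: string): string[] {
  const parts = path.split("/").slice(0, -1); // drop the file name
  const dirs: string[] = [""];
  for (let i = 1; i <= parts.length; i++) dirs.push(parts.slice(0, i).join("/"));
  return dirs;
}

// Hypothetical filter over a repo-relative file listing (forward slashes assumed).
function preflightSources(repoFiles: string[], touchedPaths: string[], slug: string): string[] {
  const dirs = new Set<string>();
  for (const p of touchedPaths) for (const d of governingDirs(p)) dirs.add(d);
  return repoFiles
    .filter((f) => {
      const slash = f.lastIndexOf("/");
      const dir = slash === -1 ? "" : f.slice(0, slash);
      const base = f.slice(slash + 1);
      if (base === "AGENTS.md") return dirs.has(dir);
      if (dir === "" && /^README(\.md)?$/i.test(base)) return true;
      if (f.startsWith("docs/") && f.endsWith(".md")) return true;
      if (f.startsWith(".omx/context/") || f.startsWith(".omx/specs/")) return f.includes(slug);
      return base === "CONTEXT.md" || base === "CONTEXT-MAP.md";
    })
    .sort();
}

const sources = preflightSources(
  [
    "AGENTS.md", "src/AGENTS.md", "lib/AGENTS.md", "README.md", "docs/adr/001.md", "docs/img.png",
    ".omx/context/auth-20250101T000000Z.md", ".omx/context/billing-20250101T000000Z.md",
    "CONTEXT.md", "src/cli/index.ts",
  ],
  ["src/cli/index.ts"],
  "auth",
);
```

With `src/cli/index.ts` as the touched path, `lib/AGENTS.md` and the unrelated `billing` snapshot are left out, while the root and `src/` rule files are kept.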
@@ -97,8 +102,15 @@ If no flag is provided, use **Standard**.
    - Unknowns/open questions
    - Decision-boundary unknowns
    - Likely codebase touchpoints
+   - Relevant repo docs/rules/context inspected
+   - Terminology or doc/code conflicts found
    - Prompt-safe initial-context summary status (`not_needed`, `needed`, or `recorded`)
-5. Save snapshot to `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) and reference it in mode state.
+5. For brownfield tasks, inspect the applicable documentation/rule surface before the first user-facing round. Prefer exact, nearby sources over broad scans:
+   - governing `AGENTS.md` files and template/runtime instruction surfaces that apply to the touched paths
+   - README/getting-started docs and relevant docs under `docs/`, especially contracts, plans, ADR-like records, and workflow docs
+   - existing `.omx/context/` snapshots, `.omx/specs/`, and planning artifacts relevant to the slug
+   - project-local glossary/context files such as `CONTEXT.md`, `CONTEXT-MAP.md`, or context-specific docs when they exist
+6. Save snapshot to `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) and reference it in mode state.
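The snapshot filename in step 6 uses a compact UTC timestamp. Here is a small TypeScript sketch of that naming. It assumes `slug` is already a filesystem-safe string, and `utcStamp` and `contextSnapshotPath` are hypothetical helper names.

```typescript
// Compact UTC stamp, e.g. 2025-01-02T03:04:05.123Z -> 20250102T030405Z.
function utcStamp(at: Date): string {
  return at.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}Z$/, "Z");
}

// Hypothetical helper; assumes `slug` is already filesystem-safe.
function contextSnapshotPath(slug: string, at: Date): string {
  return `.omx/context/${slug}-${utcStamp(at)}.md`;
}

console.log(contextSnapshotPath("auth-flow", new Date(Date.UTC(2025, 0, 2, 3, 4, 5))));
// .omx/context/auth-flow-20250102T030405Z.md
```

Deriving the stamp from `toISOString()` guarantees UTC regardless of the host time zone, and dropping milliseconds keeps the `YYYYMMDDTHHMMSSZ` shape.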
 ## Phase 1: Initialize
@@ -137,13 +149,14 @@ If no flag is provided, use **Standard**.
 Repeat until ambiguity `<= threshold`, the pressure pass is complete, the readiness gates are explicit, the user exits with warning, or max rounds are reached. This is a stop condition: below threshold, do not open a new ordinary interview branch.
 ### 2a) Generate next question
-If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
+If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
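The loop condition and the blocking summary gate can be expressed as one decision function. This sketch rests on two assumptions. First, the ambiguity threshold, pressure pass, and readiness gates must all hold together for the below-threshold stop, since the loop condition lists them without saying whether they are joint. Second, a pending summary gate takes priority over every other branch. `InterviewState` and `nextAction` are hypothetical names, and the gate values reuse the `not_needed`, `needed`, and `recorded` statuses from the preflight summary.

```typescript
type SummaryGate = "not_needed" | "needed" | "recorded";

// Hypothetical snapshot of the interview loop's state.
interface InterviewState {
  summaryGate: SummaryGate;
  ambiguity: number; // 0 means fully clear
  threshold: number;
  pressurePassComplete: boolean;
  readinessGatesExplicit: boolean;
  userExited: boolean;
  round: number;
  maxRounds: number;
}

type NextAction = "ask-summary" | "ask-question" | "stop";

function nextAction(s: InterviewState): NextAction {
  // Blocking gate: no scoring, readiness gates, or handoff until the summary is recorded.
  if (s.summaryGate === "needed") return "ask-summary";
  if (s.userExited || s.round >= s.maxRounds) return "stop";
  const clarified =
    s.ambiguity <= s.threshold && s.pressurePassComplete && s.readinessGatesExplicit;
  // Below threshold with gates satisfied: stop rather than open a new ordinary branch.
  return clarified ? "stop" : "ask-question";
}
```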
 Use:
 - Original idea
 - Prior Q&A rounds
 - Current dimension scores
 - Brownfield context (if any)
+- Doc/context grounding notes, including existing terminology, governing rules, and any doc/code mismatch
 - Activated challenge mode injection (Phase 3)
 Target the lowest-scoring dimension, but respect stage priority:
@@ -155,12 +168,21 @@ Follow-up pressure ladder after each answer:
 1. Ask for a concrete example, counterexample, or evidence signal behind the latest claim
 2. Probe the hidden assumption, dependency, or belief that makes the claim true
 3. Force a boundary or tradeoff: what would you explicitly not do, defer, or reject?
-4. If the answer still describes symptoms, reframe toward essence / root cause before moving on
+4. Challenge fuzzy or conflicting terms against the repo's documented language and current code behavior
+5. Stress-test the boundary with one concrete scenario or edge case when a relationship or handoff remains ambiguous
+6. If the answer still describes symptoms, reframe toward essence / root cause before moving on
 Prefer staying on the same thread for multiple rounds when it has the highest leverage. Breadth without pressure is not progress.
 Maintain a **Breadth Ledger** across independent ambiguity tracks: scope, constraints, outputs, verification, brownfield integration, and any user-mentioned deliverable tracks. The ledger is a guard, not a mandatory rotation rule: stay deep on the current thread until it has been pressure-tested, then zoom out only when another material track remains unresolved and would change execution.
+Maintain a **Docs/Terminology Ledger** for brownfield interviews:
+- repo docs/rules/context sources inspected, with path references
+- canonical terms already used by the repo and terms to avoid or disambiguate
+- user terms that conflict with docs or current code behavior
+- doc/code mismatches that require a human decision before implementation
+- optional durable-doc follow-ups that are safe to propose but not auto-apply
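The Docs/Terminology Ledger could be kept as structured data alongside the spec. The interfaces below are a hypothetical shape, not an oh-my-codex schema. The helper lists the entries that, under the ledger rules, still need a human decision before implementation.

```typescript
// Hypothetical shape for the Docs/Terminology Ledger; not an oh-my-codex schema.
interface TermConflict {
  userTerm: string;
  repoTerm: string;
  sources: string[]; // doc/code paths where the repo term is defined
  resolved: boolean;
}

interface DocCodeMismatch {
  docSource: string;
  codeSource: string;
  decision?: string; // the human decision, once made
}

interface DocsTerminologyLedger {
  inspectedSources: string[];
  canonicalTerms: string[];
  termsToDisambiguate: string[];
  conflicts: TermConflict[];
  mismatches: DocCodeMismatch[];
  proposedDocFollowUps: string[]; // opt-in suggestions only, never auto-applied
}

// Entries that still need a human decision before implementation.
function openDecisions(ledger: DocsTerminologyLedger): string[] {
  const open: string[] = [];
  for (const c of ledger.conflicts) {
    if (!c.resolved) open.push(`term: "${c.userTerm}" vs repo "${c.repoTerm}"`);
  }
  for (const m of ledger.mismatches) {
    if (m.decision === undefined) open.push(`mismatch: ${m.docSource} vs ${m.codeSource}`);
  }
  return open;
}
```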
 Detailed dimensions:
 - Intent Clarity — why the user wants this
 - Outcome Clarity — what end state they want
@@ -306,6 +328,7 @@ Append round result and updated scores via `omx state write --input '<json>' --j
 Use each mode once when applicable. These are normal escalation tools, not rare rescue moves:
 - **Contrarian** (round 2+ or immediately when an answer rests on an untested assumption): challenge core assumptions
+- **Terminologist** (brownfield, whenever a key term is fuzzy, overloaded, or conflicts with repo docs/code): force a canonical meaning against existing project language before implementation
 - **Simplifier** (round 4+ or when scope expands faster than outcome clarity): probe minimal viable scope
 - **Ontologist** (round 5+ and ambiguity > 0.25, or when the user keeps describing symptoms): ask for essence-level reframing
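The activation rules for the four challenge modes visible in this hunk translate directly into predicates. The sketch below uses hypothetical signal names to stand in for the interviewer's judgment calls, such as whether an answer rests on an untested assumption. It models "use each mode once" as a set of modes already applied.

```typescript
type ChallengeMode = "contrarian" | "terminologist" | "simplifier" | "ontologist";

// Hypothetical signals; in practice these are the interviewer's judgment calls.
interface RoundSignals {
  round: number;
  ambiguity: number;
  brownfield: boolean;
  untestedAssumption: boolean;
  fuzzyOrConflictingTerm: boolean;
  scopeOutpacingOutcome: boolean;
  describingSymptoms: boolean;
}

function applicableModes(s: RoundSignals, used: ReadonlySet<ChallengeMode>): ChallengeMode[] {
  const rules: Array<[ChallengeMode, boolean]> = [
    ["contrarian", s.round >= 2 || s.untestedAssumption],
    ["terminologist", s.brownfield && s.fuzzyOrConflictingTerm],
    ["simplifier", s.round >= 4 || s.scopeOutpacingOutcome],
    ["ontologist", (s.round >= 5 && s.ambiguity > 0.25) || s.describingSymptoms],
  ];
  // "Use each mode once": skip any mode already applied in this interview.
  return rules.filter(([mode, active]) => active && !used.has(mode)).map(([mode]) => mode);
}
```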
@@ -336,6 +359,9 @@ Spec should include:
 - Assumptions exposed + resolutions
 - Pressure-pass findings (which answer was revisited, and what changed)
 - Brownfield evidence vs inference notes for any repository-grounded confirmation questions
+- Docs/Terminology Ledger with inspected repo docs/rules/context, term conflicts, and any doc/code mismatch decisions
+- Scenario/edge-case pressure findings that materially shaped scope or acceptance criteria
+- Optional durable documentation recommendations, explicitly marked opt-in and public-safe; do not include raw private transcript dumps
 - Technical context findings
 - Full or condensed transcript
@@ -365,11 +391,45 @@ When the clarified task is specifically about `$autoresearch`, or the skill is i
 ## Phase 5: Execution Bridge
-Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, and any residual-risk warnings across the handoff.
+Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, and any residual-risk warnings across the handoff.
+### Optional execution contract foundation
+When an Autopilot/deep-interview handoff explicitly requires a stride contract, emit it as structured data rather than prose. This is a validation foundation, not a broadness-inference feature: do not infer stride from task length, phase labels, snapshots, or freeform wording.
+Canonical location under Autopilot state:
+```json
+{
+  "handoff_artifacts": {
+    "deep_interview": {
+      "execution_contract_required": true,
+      "execution_contract": {
+        "version": 1,
+        "execution_stride": "task",
+        "source": "deep-interview",
+        "selected_by": "user",
+        "allow_task_shrink": true,
+        "completion_unit": "One focused task",
+        "stop_condition": "Stop after that task is implemented and verified",
+        "acceptance_coverage_scope": "task",
+        "shrink_policy": "allowed"
+      }
+    }
+  }
+}
+```
+Stride meanings:
+- `task`: conservative, small-step execution; `allow_task_shrink:true`, `acceptance_coverage_scope:"task"`, `shrink_policy:"allowed"`.
+- `deliverable`: finish the named deliverable before stopping; `allow_task_shrink:false`, `acceptance_coverage_scope:"deliverable"`, `shrink_policy:"ask_before_shrink"`.
+- `milestone`: finish the larger approved milestone unless blocked; `allow_task_shrink:false`, `acceptance_coverage_scope:"milestone"`, `shrink_policy:"deny_unless_blocked"`.
+Only set `execution_contract_required:true` when the selected downstream workflow needs this explicit stride/stop-condition guard. New artifacts must write the canonical snake_case schema shown above under `handoff_artifacts.deep_interview`; runtime readers may accept legacy camelCase field/marker aliases and direct/nested `execution_contract` locations only as compatibility input. If `execution_contract_required` is absent or false, downstream Autopilot compatibility behavior is unchanged.
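The contract rules above are strict enough to validate in code. This TypeScript sketch checks only the canonical snake_case schema. It deliberately does not read the legacy camelCase aliases, whose exact spellings the text does not give. `validateExecutionContract` and `STRIDE_RULES` are hypothetical names. As the text requires, the sketch infers nothing when `execution_contract_required` is absent or false.

```typescript
type Stride = "task" | "deliverable" | "milestone";
type ShrinkPolicy = "allowed" | "ask_before_shrink" | "deny_unless_blocked";

interface ExecutionContract {
  version: 1;
  execution_stride: Stride;
  source: "deep-interview";
  selected_by: "user" | "default";
  allow_task_shrink: boolean;
  completion_unit: string;
  stop_condition: string;
  acceptance_coverage_scope: Stride;
  shrink_policy: ShrinkPolicy;
}

interface DeepInterviewArtifact {
  execution_contract_required?: boolean;
  execution_contract?: ExecutionContract;
}

// Binding values per stride, taken from the stride meanings above.
const STRIDE_RULES: Record<Stride, { allow_task_shrink: boolean; shrink_policy: ShrinkPolicy }> = {
  task: { allow_task_shrink: true, shrink_policy: "allowed" },
  deliverable: { allow_task_shrink: false, shrink_policy: "ask_before_shrink" },
  milestone: { allow_task_shrink: false, shrink_policy: "deny_unless_blocked" },
};

// Returns a list of violations; an empty list means the artifact is acceptable.
function validateExecutionContract(artifact: DeepInterviewArtifact): string[] {
  // Absent or false: legacy behavior. Nothing is validated and no stride is inferred.
  if (artifact.execution_contract_required !== true) return [];
  const c = artifact.execution_contract;
  if (c === undefined) return ["execution_contract is required but missing"];

  const errors: string[] = [];
  if (c.version !== 1) errors.push("version must be 1");
  if (c.source !== "deep-interview") errors.push('source must be "deep-interview"');
  if (c.selected_by !== "user" && c.selected_by !== "default") errors.push("invalid selected_by");
  if (c.completion_unit.trim() === "") errors.push("completion_unit must be non-empty");
  if (c.stop_condition.trim() === "") errors.push("stop_condition must be non-empty");

  const rule = STRIDE_RULES[c.execution_stride];
  if (rule === undefined) return [...errors, `unknown execution_stride "${c.execution_stride}"`];
  if (c.allow_task_shrink !== rule.allow_task_shrink) errors.push("allow_task_shrink does not match stride");
  if (c.acceptance_coverage_scope !== c.execution_stride) errors.push("acceptance_coverage_scope does not match stride");
  if (c.shrink_policy !== rule.shrink_policy) errors.push("shrink_policy does not match stride");
  return errors;
}
```

Because `acceptance_coverage_scope` equals the stride in all three rows of the table, the sketch compares it to `execution_stride` directly rather than storing it in `STRIDE_RULES`.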
 ### Goal-mode follow-ups
-Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
+Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
 - **`$ultragoal`** — default goal-mode follow-up for implementation or general goal-oriented follow-up specs that should be converted into durable Codex/OMX goals with sequential completion tracking.
 - **`$autoresearch-goal`** — use when the clarified context is a research project: a research question, reference/literature gathering, evaluator-backed analysis, or professor/critic-style deliverable.
@@ -377,7 +437,16 @@ Include these product-facing suggestions when they fit the clarified spec, witho
 Recommend `$ultragoal` as the default durable goal-mode follow-up because it supersedes Ralph for goal tracking. Preserve `$team` for coordinated parallel implementation and keep `$ralph` only as an explicit fallback for persistent single-owner execution/verification when the user specifically selects it.
-### 1. **`$ralplan` (Recommended)**
+### 1. **`$ultragoal` (Default durable execution follow-up)**
+- **Input Artifact:** `.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
+- **Invocation:** `$ultragoal create-goals --brief-file <spec-path>` followed by `$ultragoal complete-goals` in the active execution lane
+- **Consumer Behavior:** Convert the clarified spec into durable goal-mode work. Preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, scenario-pressure findings, and residual-risk warnings as binding story constraints.
+- **Skipped / Already-Satisfied Stages:** Requirement interview, ambiguity clarification, doc/context preflight, and early intent-boundary elicitation
+- **Expected Output:** `.omx/ultragoal/brief.md`, `.omx/ultragoal/goals.json`, `.omx/ultragoal/ledger.jsonl`, implementation evidence, verification evidence, and final cleanup/review-gate evidence
+- **Best When:** The clarified spec is execution-ready or the user explicitly wants durable goal tracking as the next step
+- **Next Recommended Step:** Run the Ultragoal completion loop; launch `$team` only inside an active Ultragoal story when parallel lanes are warranted, and use `$ralph` only as an explicit fallback when the user asks for that legacy persistence mode
+### 2. **`$ralplan` (Recommended when architecture/test-shape review is still needed)**
 - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
 - **Invocation:** `$plan --consensus --direct <spec-path>`
 - **Consumer Behavior:** Treat the deep-interview spec as the requirements source of truth. Do not repeat the interview by default; refine architecture/feasibility around the clarified intent and boundaries instead.
@@ -386,7 +455,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - **Best When:** Requirements are clear enough to stop interviewing, but architectural validation / consensus planning is still desirable
 - **Next Recommended Step:** Use the approved planning artifacts with `$ultragoal` as the default durable goal-mode follow-up (optionally with `$team` for parallel lanes); choose `$autoresearch-goal` for research validation or `$performance-goal` for measurable optimization, and use `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
-### 2. **`$autopilot`**
+### 3. **`$autopilot`**
 - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
 - **Invocation:** `$autopilot <spec-path>`
 - **Consumer Behavior:** Use the deep-interview spec as the clarified execution brief. Preserve intent, non-goals, decision boundaries, and acceptance criteria as binding context for planning/execution.
@@ -395,7 +464,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - **Best When:** The clarified spec is already strong enough for direct planning + execution without an additional consensus gate
 - **Next Recommended Step:** Continue through autopilot's execution/QA/validation flow; if coordination-heavy execution emerges, prefer `$team` under a leader-owned `$ultragoal` ledger, using `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
-### 3. **`$ralph` (Explicit fallback only)**
+### 4. **`$ralph` (Explicit fallback only)**
 - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
 - **Invocation:** `$ralph <spec-path>`
 - **Consumer Behavior:** Use the spec's acceptance criteria and boundary constraints as the persistence target. Do not reopen requirements discovery unless the user explicitly asks to refine further.
@@ -404,7 +473,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - **Best When:** The user explicitly asks for Ralph's persistent sequential completion pressure; otherwise use `$ultragoal` for durable goal tracking and completion checkpoints
 - **Next Recommended Step:** If this explicit fallback is selected, continue Ralph's persistence loop; if work expands into coordination-heavy lanes, hand off to `$team` under `$ultragoal` checkpointing rather than promoting Ralph as the next default
-### 4. **`$team`**
+### 5. **`$team`**
 - **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
 - **Invocation:** `$team <spec-path>`
 - **Consumer Behavior:** Treat the spec as shared execution context for coordinated parallel work. Preserve the clarified intent, non-goals, decision boundaries, and acceptance criteria as common lane constraints.
@@ -413,7 +482,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - **Best When:** The task is large, multi-lane, or blocker-sensitive enough to justify coordinated parallel execution instead of a single persistent loop
 - **Next Recommended Step:** Follow the team verification path when the coordinated execution phase finishes; checkpoint completion through `$ultragoal` by default, escalating to a separate Ralph loop only when the user explicitly asks for that persistent verification/fix owner
-### 5. **Refine further**
+### 6. **Refine further**
 - **Input Artifact:** Existing transcript, context snapshot, and current spec draft
 - **Invocation:** Continue the interview loop
 - **Consumer Behavior:** Re-enter questioning to resolve the highest-leverage remaining uncertainty
@@ -437,6 +506,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - Use `omx state write/read --input '<json>' --json` for resumable mode state; `state_write` / `state_read` are explicit MCP compatibility fallbacks only
 - If the interview cannot ask a required `omx question` round, persist the blocker as terminal state with `active: false` and `current_phase: "blocked"`; do not write a terminal blocked phase with `active: true`
 - Read/write context snapshots under `.omx/context/`
+- Read applicable repo docs/rules/context during preflight; write durable docs, glossary, ADR, or memory updates only when the user explicitly opts in and the content is public-safe
 - Record whether the oversized-context summary gate is not needed, pending, or satisfied before any scoring or handoff step
 - Save transcript/spec artifacts under `.omx/interviews/` and `.omx/specs/`
 </Tool_Usage>
@@ -460,7 +530,11 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
 - [ ] Transcript written to `.omx/interviews/{slug}-{timestamp}.md`
 - [ ] Spec written to `.omx/specs/deep-interview-{slug}.md`
 - [ ] Brownfield questions use evidence-backed confirmation when applicable
-- [ ] Handoff options provided (`$ralplan`, `$autopilot`, `$ralph`, `$team`) plus context-sensitive goal-mode suggestions (`$ultragoal`, `$autoresearch-goal`, `$performance-goal`) when applicable
+- [ ] Brownfield preflight inspected applicable repo docs/rules/context before user-facing questions
+- [ ] Fuzzy or conflicting terminology was challenged against repo language/current code behavior when applicable
+- [ ] Scenario-based edge-case grilling was used when boundary ambiguity would materially affect implementation
+- [ ] Durable docs/ADR/memory updates, if any, were explicitly opted into and public-safe
+- [ ] Handoff options provided (`$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, `$team`) plus context-sensitive goal-mode suggestions (`$autoresearch-goal`, `$performance-goal`) when applicable
 - [ ] No direct implementation performed in this mode
 </Final_Checklist>