oh-my-codex 0.18.7 → 0.18.9
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/Cargo.lock +12 -12
- package/Cargo.toml +1 -1
- package/README.md +5 -5
- package/crates/omx-sparkshell/tests/execution.rs +1 -1
- package/dist/agents/__tests__/native-config.test.js +42 -1
- package/dist/agents/__tests__/native-config.test.js.map +1 -1
- package/dist/agents/definitions.d.ts +8 -0
- package/dist/agents/definitions.d.ts.map +1 -1
- package/dist/agents/definitions.js +1 -0
- package/dist/agents/definitions.js.map +1 -1
- package/dist/agents/native-config.d.ts +5 -1
- package/dist/agents/native-config.d.ts.map +1 -1
- package/dist/agents/native-config.js +17 -2
- package/dist/agents/native-config.js.map +1 -1
- package/dist/autopilot/__tests__/fsm.test.js +3 -0
- package/dist/autopilot/__tests__/fsm.test.js.map +1 -1
- package/dist/autopilot/fsm.js +2 -2
- package/dist/autopilot/fsm.js.map +1 -1
- package/dist/cli/__tests__/auth.test.js +4 -2
- package/dist/cli/__tests__/auth.test.js.map +1 -1
- package/dist/cli/__tests__/codex-plugin-layout.test.js +512 -1
- package/dist/cli/__tests__/codex-plugin-layout.test.js.map +1 -1
- package/dist/cli/__tests__/doctor-warning-copy.test.js +39 -0
- package/dist/cli/__tests__/doctor-warning-copy.test.js.map +1 -1
- package/dist/cli/__tests__/index.test.js +98 -6
- package/dist/cli/__tests__/index.test.js.map +1 -1
- package/dist/cli/__tests__/package-bin-contract.test.js +28 -8
- package/dist/cli/__tests__/package-bin-contract.test.js.map +1 -1
- package/dist/cli/__tests__/question.test.js +26 -9
- package/dist/cli/__tests__/question.test.js.map +1 -1
- package/dist/cli/__tests__/ralph-goal-mode-contract.test.js +13 -0
- package/dist/cli/__tests__/ralph-goal-mode-contract.test.js.map +1 -1
- package/dist/cli/__tests__/ralph.test.js +14 -0
- package/dist/cli/__tests__/ralph.test.js.map +1 -1
- package/dist/cli/__tests__/resume.test.js +50 -1
- package/dist/cli/__tests__/resume.test.js.map +1 -1
- package/dist/cli/__tests__/setup-install-mode.test.js +89 -0
- package/dist/cli/__tests__/setup-install-mode.test.js.map +1 -1
- package/dist/cli/__tests__/setup-refresh.test.js +65 -0
- package/dist/cli/__tests__/setup-refresh.test.js.map +1 -1
- package/dist/cli/__tests__/state.test.js +21 -0
- package/dist/cli/__tests__/state.test.js.map +1 -1
- package/dist/cli/__tests__/team.test.js +2 -2
- package/dist/cli/__tests__/update.test.js +323 -18
- package/dist/cli/__tests__/update.test.js.map +1 -1
- package/dist/cli/__tests__/windows-popup-loop-contract.test.js +1 -1
- package/dist/cli/doctor.d.ts.map +1 -1
- package/dist/cli/doctor.js +8 -1
- package/dist/cli/doctor.js.map +1 -1
- package/dist/cli/index.d.ts +21 -4
- package/dist/cli/index.d.ts.map +1 -1
- package/dist/cli/index.js +143 -28
- package/dist/cli/index.js.map +1 -1
- package/dist/cli/plugin-marketplace.d.ts +14 -2
- package/dist/cli/plugin-marketplace.d.ts.map +1 -1
- package/dist/cli/plugin-marketplace.js +62 -15
- package/dist/cli/plugin-marketplace.js.map +1 -1
- package/dist/cli/ralph.d.ts.map +1 -1
- package/dist/cli/ralph.js +3 -1
- package/dist/cli/ralph.js.map +1 -1
- package/dist/cli/setup-preferences.d.ts +2 -0
- package/dist/cli/setup-preferences.d.ts.map +1 -1
- package/dist/cli/setup-preferences.js +4 -0
- package/dist/cli/setup-preferences.js.map +1 -1
- package/dist/cli/setup.d.ts +3 -0
- package/dist/cli/setup.d.ts.map +1 -1
- package/dist/cli/setup.js +166 -27
- package/dist/cli/setup.js.map +1 -1
- package/dist/cli/state.d.ts.map +1 -1
- package/dist/cli/state.js +8 -1
- package/dist/cli/state.js.map +1 -1
- package/dist/cli/tmux-hook.d.ts.map +1 -1
- package/dist/cli/tmux-hook.js +16 -0
- package/dist/cli/tmux-hook.js.map +1 -1
- package/dist/cli/update.d.ts +22 -3
- package/dist/cli/update.d.ts.map +1 -1
- package/dist/cli/update.js +312 -26
- package/dist/cli/update.js.map +1 -1
- package/dist/cli/version.d.ts.map +1 -1
- package/dist/cli/version.js +5 -9
- package/dist/cli/version.js.map +1 -1
- package/dist/compat/__tests__/doctor-contract.test.js +12 -1
- package/dist/compat/__tests__/doctor-contract.test.js.map +1 -1
- package/dist/config/__tests__/generator-notify.test.js +1 -0
- package/dist/config/__tests__/generator-notify.test.js.map +1 -1
- package/dist/config/generator.d.ts +2 -2
- package/dist/config/generator.d.ts.map +1 -1
- package/dist/config/generator.js +2 -2
- package/dist/config/generator.js.map +1 -1
- package/dist/config/team-mode.d.ts +12 -0
- package/dist/config/team-mode.d.ts.map +1 -0
- package/dist/config/team-mode.js +91 -0
- package/dist/config/team-mode.js.map +1 -0
- package/dist/hooks/__tests__/agents-overlay.test.js +88 -0
- package/dist/hooks/__tests__/agents-overlay.test.js.map +1 -1
- package/dist/hooks/__tests__/code-review-skill-contract.test.js +12 -0
- package/dist/hooks/__tests__/code-review-skill-contract.test.js.map +1 -1
- package/dist/hooks/__tests__/deep-interview-contract.test.js +30 -1
- package/dist/hooks/__tests__/deep-interview-contract.test.js.map +1 -1
- package/dist/hooks/__tests__/keyword-detector.test.js +423 -3
- package/dist/hooks/__tests__/keyword-detector.test.js.map +1 -1
- package/dist/hooks/__tests__/notify-fallback-watcher.test.js +1 -1
- package/dist/hooks/__tests__/notify-fallback-watcher.test.js.map +1 -1
- package/dist/hooks/__tests__/notify-hook-auto-nudge.test.js +189 -0
- package/dist/hooks/__tests__/notify-hook-auto-nudge.test.js.map +1 -1
- package/dist/hooks/__tests__/notify-hook-team-leader-nudge.test.js +35 -2
- package/dist/hooks/__tests__/notify-hook-team-leader-nudge.test.js.map +1 -1
- package/dist/hooks/__tests__/notify-hook-tmux-heal.test.js +3 -3
- package/dist/hooks/__tests__/notify-hook-tmux-heal.test.js.map +1 -1
- package/dist/hooks/__tests__/skill-guidance-contract.test.js +21 -0
- package/dist/hooks/__tests__/skill-guidance-contract.test.js.map +1 -1
- package/dist/hooks/agents-overlay.d.ts.map +1 -1
- package/dist/hooks/agents-overlay.js +36 -50
- package/dist/hooks/agents-overlay.js.map +1 -1
- package/dist/hooks/extensibility/__tests__/plugin-runner.test.js +31 -0
- package/dist/hooks/extensibility/__tests__/plugin-runner.test.js.map +1 -1
- package/dist/hooks/extensibility/plugin-runner.js +17 -21
- package/dist/hooks/extensibility/plugin-runner.js.map +1 -1
- package/dist/hooks/keyword-detector.d.ts.map +1 -1
- package/dist/hooks/keyword-detector.js +258 -12
- package/dist/hooks/keyword-detector.js.map +1 -1
- package/dist/hooks/prompt-guidance-contract.d.ts.map +1 -1
- package/dist/hooks/prompt-guidance-contract.js +6 -0
- package/dist/hooks/prompt-guidance-contract.js.map +1 -1
- package/dist/hooks/session.d.ts +1 -0
- package/dist/hooks/session.d.ts.map +1 -1
- package/dist/hooks/session.js.map +1 -1
- package/dist/hud/__tests__/authority.test.js +435 -32
- package/dist/hud/__tests__/authority.test.js.map +1 -1
- package/dist/hud/__tests__/hud-tmux-injection.test.js +2 -1
- package/dist/hud/__tests__/hud-tmux-injection.test.js.map +1 -1
- package/dist/hud/__tests__/index.test.js +42 -0
- package/dist/hud/__tests__/index.test.js.map +1 -1
- package/dist/hud/__tests__/reconcile.test.js +642 -15
- package/dist/hud/__tests__/reconcile.test.js.map +1 -1
- package/dist/hud/__tests__/render.test.js +61 -0
- package/dist/hud/__tests__/render.test.js.map +1 -1
- package/dist/hud/__tests__/state.test.js +160 -4
- package/dist/hud/__tests__/state.test.js.map +1 -1
- package/dist/hud/__tests__/tmux.test.js +180 -21
- package/dist/hud/__tests__/tmux.test.js.map +1 -1
- package/dist/hud/authority.d.ts +5 -0
- package/dist/hud/authority.d.ts.map +1 -1
- package/dist/hud/authority.js +324 -28
- package/dist/hud/authority.js.map +1 -1
- package/dist/hud/index.d.ts +3 -2
- package/dist/hud/index.d.ts.map +1 -1
- package/dist/hud/index.js +42 -19
- package/dist/hud/index.js.map +1 -1
- package/dist/hud/reconcile.d.ts +3 -3
- package/dist/hud/reconcile.d.ts.map +1 -1
- package/dist/hud/reconcile.js +128 -19
- package/dist/hud/reconcile.js.map +1 -1
- package/dist/hud/render.d.ts.map +1 -1
- package/dist/hud/render.js +35 -0
- package/dist/hud/render.js.map +1 -1
- package/dist/hud/state.d.ts.map +1 -1
- package/dist/hud/state.js +65 -80
- package/dist/hud/state.js.map +1 -1
- package/dist/hud/tmux.d.ts +24 -6
- package/dist/hud/tmux.d.ts.map +1 -1
- package/dist/hud/tmux.js +136 -38
- package/dist/hud/tmux.js.map +1 -1
- package/dist/hud/types.d.ts +11 -0
- package/dist/hud/types.d.ts.map +1 -1
- package/dist/hud/types.js.map +1 -1
- package/dist/mcp/__tests__/state-paths.test.js +71 -1
- package/dist/mcp/__tests__/state-paths.test.js.map +1 -1
- package/dist/mcp/state-paths.d.ts +32 -0
- package/dist/mcp/state-paths.d.ts.map +1 -1
- package/dist/mcp/state-paths.js +113 -17
- package/dist/mcp/state-paths.js.map +1 -1
- package/dist/mcp/state-server.d.ts +4 -4
- package/dist/question/__tests__/renderer.test.js +566 -1
- package/dist/question/__tests__/renderer.test.js.map +1 -1
- package/dist/question/renderer.d.ts +9 -1
- package/dist/question/renderer.d.ts.map +1 -1
- package/dist/question/renderer.js +246 -70
- package/dist/question/renderer.js.map +1 -1
- package/dist/scripts/__tests__/codex-native-hook.test.js +837 -101
- package/dist/scripts/__tests__/codex-native-hook.test.js.map +1 -1
- package/dist/scripts/__tests__/notify-state-io.test.js +72 -1
- package/dist/scripts/__tests__/notify-state-io.test.js.map +1 -1
- package/dist/scripts/__tests__/notify-tmux-injection.test.d.ts +2 -0
- package/dist/scripts/__tests__/notify-tmux-injection.test.d.ts.map +1 -0
- package/dist/scripts/__tests__/notify-tmux-injection.test.js +57 -0
- package/dist/scripts/__tests__/notify-tmux-injection.test.js.map +1 -0
- package/dist/scripts/__tests__/run-test-files.test.js +74 -0
- package/dist/scripts/__tests__/run-test-files.test.js.map +1 -1
- package/dist/scripts/__tests__/verify-native-agents.test.js +65 -0
- package/dist/scripts/__tests__/verify-native-agents.test.js.map +1 -1
- package/dist/scripts/codex-native-hook.d.ts.map +1 -1
- package/dist/scripts/codex-native-hook.js +107 -39
- package/dist/scripts/codex-native-hook.js.map +1 -1
- package/dist/scripts/eval/eval-parity-smoke.js +1 -1
- package/dist/scripts/eval/eval-parity-smoke.js.map +1 -1
- package/dist/scripts/notify-hook/auto-nudge.d.ts.map +1 -1
- package/dist/scripts/notify-hook/auto-nudge.js +3 -1
- package/dist/scripts/notify-hook/auto-nudge.js.map +1 -1
- package/dist/scripts/notify-hook/ralph-session-resume.d.ts.map +1 -1
- package/dist/scripts/notify-hook/ralph-session-resume.js +3 -10
- package/dist/scripts/notify-hook/ralph-session-resume.js.map +1 -1
- package/dist/scripts/notify-hook/state-io.d.ts.map +1 -1
- package/dist/scripts/notify-hook/state-io.js +62 -38
- package/dist/scripts/notify-hook/state-io.js.map +1 -1
- package/dist/scripts/notify-hook/team-leader-nudge.d.ts.map +1 -1
- package/dist/scripts/notify-hook/team-leader-nudge.js +7 -0
- package/dist/scripts/notify-hook/team-leader-nudge.js.map +1 -1
- package/dist/scripts/notify-hook/tmux-injection.d.ts +7 -0
- package/dist/scripts/notify-hook/tmux-injection.d.ts.map +1 -1
- package/dist/scripts/notify-hook/tmux-injection.js +24 -18
- package/dist/scripts/notify-hook/tmux-injection.js.map +1 -1
- package/dist/scripts/notify-hook.js +75 -11
- package/dist/scripts/notify-hook.js.map +1 -1
- package/dist/scripts/run-test-files.js +193 -22
- package/dist/scripts/run-test-files.js.map +1 -1
- package/dist/scripts/sync-plugin-mirror.d.ts.map +1 -1
- package/dist/scripts/sync-plugin-mirror.js +61 -3
- package/dist/scripts/sync-plugin-mirror.js.map +1 -1
- package/dist/scripts/verify-native-agents.d.ts.map +1 -1
- package/dist/scripts/verify-native-agents.js +58 -1
- package/dist/scripts/verify-native-agents.js.map +1 -1
- package/dist/state/__tests__/operations.test.js +113 -0
- package/dist/state/__tests__/operations.test.js.map +1 -1
- package/dist/state/__tests__/skill-active.test.js +3 -16
- package/dist/state/__tests__/skill-active.test.js.map +1 -1
- package/dist/state/__tests__/workflow-transition.test.js +25 -0
- package/dist/state/__tests__/workflow-transition.test.js.map +1 -1
- package/dist/state/operations.d.ts.map +1 -1
- package/dist/state/operations.js +57 -2
- package/dist/state/operations.js.map +1 -1
- package/dist/state/skill-active.d.ts.map +1 -1
- package/dist/state/skill-active.js +7 -39
- package/dist/state/skill-active.js.map +1 -1
- package/dist/state/workflow-transition-reconcile.d.ts.map +1 -1
- package/dist/state/workflow-transition-reconcile.js +10 -14
- package/dist/state/workflow-transition-reconcile.js.map +1 -1
- package/dist/team/__tests__/runtime.test.js +1 -1
- package/dist/team/__tests__/runtime.test.js.map +1 -1
- package/dist/team/__tests__/scaling.test.js +9 -4
- package/dist/team/__tests__/scaling.test.js.map +1 -1
- package/dist/team/__tests__/tmux-session.test.js +195 -2
- package/dist/team/__tests__/tmux-session.test.js.map +1 -1
- package/dist/team/__tests__/worker-runtime-identity.test.js +4 -2
- package/dist/team/__tests__/worker-runtime-identity.test.js.map +1 -1
- package/dist/team/scaling.d.ts.map +1 -1
- package/dist/team/scaling.js +3 -2
- package/dist/team/scaling.js.map +1 -1
- package/dist/team/tmux-session.d.ts +2 -0
- package/dist/team/tmux-session.d.ts.map +1 -1
- package/dist/team/tmux-session.js +142 -12
- package/dist/team/tmux-session.js.map +1 -1
- package/dist/utils/__tests__/platform-command.test.js +16 -1
- package/dist/utils/__tests__/platform-command.test.js.map +1 -1
- package/dist/utils/__tests__/version.test.d.ts +2 -0
- package/dist/utils/__tests__/version.test.d.ts.map +1 -0
- package/dist/utils/__tests__/version.test.js +51 -0
- package/dist/utils/__tests__/version.test.js.map +1 -0
- package/dist/utils/paths.d.ts +8 -1
- package/dist/utils/paths.d.ts.map +1 -1
- package/dist/utils/paths.js +16 -4
- package/dist/utils/paths.js.map +1 -1
- package/dist/utils/platform-command.d.ts +9 -0
- package/dist/utils/platform-command.d.ts.map +1 -1
- package/dist/utils/platform-command.js +15 -0
- package/dist/utils/platform-command.js.map +1 -1
- package/dist/utils/version.d.ts +7 -0
- package/dist/utils/version.d.ts.map +1 -0
- package/dist/utils/version.js +67 -0
- package/dist/utils/version.js.map +1 -0
- package/dist/verification/__tests__/ci-rust-gates.test.js +89 -1
- package/dist/verification/__tests__/ci-rust-gates.test.js.map +1 -1
- package/dist/verification/__tests__/dev-merge-issue-close-workflow.test.js +16 -2
- package/dist/verification/__tests__/dev-merge-issue-close-workflow.test.js.map +1 -1
- package/package.json +11 -10
- package/plugins/oh-my-codex/.codex-plugin/plugin.json +1 -1
- package/plugins/oh-my-codex/hooks/codex-native-hook.mjs +334 -21
- package/plugins/oh-my-codex/hooks/hooks.json +1 -2
- package/plugins/oh-my-codex/skills/autopilot/SKILL.md +3 -1
- package/plugins/oh-my-codex/skills/code-review/SKILL.md +7 -7
- package/plugins/oh-my-codex/skills/deep-interview/SKILL.md +51 -11
- package/plugins/oh-my-codex/skills/ralph/SKILL.md +22 -22
- package/plugins/oh-my-codex/skills/ultraqa/SKILL.md +9 -0
- package/skills/autopilot/SKILL.md +3 -1
- package/skills/code-review/SKILL.md +7 -7
- package/skills/deep-interview/SKILL.md +51 -11
- package/skills/ralph/SKILL.md +22 -22
- package/skills/ultraqa/SKILL.md +9 -0
- package/src/scripts/__tests__/codex-native-hook.test.ts +946 -98
- package/src/scripts/__tests__/notify-state-io.test.ts +95 -0
- package/src/scripts/__tests__/notify-tmux-injection.test.ts +82 -0
- package/src/scripts/__tests__/run-test-files.test.ts +102 -0
- package/src/scripts/__tests__/verify-native-agents.test.ts +75 -0
- package/src/scripts/codex-native-hook.ts +123 -34
- package/src/scripts/demo-team-e2e.sh +10 -7
- package/src/scripts/eval/eval-parity-smoke.ts +1 -1
- package/src/scripts/notify-hook/auto-nudge.ts +3 -1
- package/src/scripts/notify-hook/ralph-session-resume.ts +2 -8
- package/src/scripts/notify-hook/state-io.ts +75 -37
- package/src/scripts/notify-hook/team-leader-nudge.ts +7 -0
- package/src/scripts/notify-hook/tmux-injection.ts +35 -19
- package/src/scripts/notify-hook.ts +91 -4
- package/src/scripts/prepare-build.js +83 -0
- package/src/scripts/run-test-files.ts +192 -22
- package/src/scripts/sync-plugin-mirror.ts +98 -9
- package/src/scripts/verify-native-agents.ts +65 -1
- package/src/scripts/postinstall-bootstrap.js +0 -23
@@ -26,14 +26,14 @@ Ralph is a persistence loop that keeps working on a task until it is fully compl
 </Do_Not_Use_When>
 
 <Why_This_Exists>
-Complex tasks often fail silently: partial implementations get declared "done", tests get skipped, edge cases get forgotten. Ralph prevents this by looping until work is genuinely complete, requiring fresh verification evidence before allowing completion, and using
+Complex tasks often fail silently: partial implementations get declared "done", tests get skipped, edge cases get forgotten. Ralph prevents this by looping until work is genuinely complete, requiring fresh verification evidence before allowing completion, and using explicit architect native-subagent verification to confirm quality.
 </Why_This_Exists>
 
 <Execution_Policy>
 - Fire independent agent calls simultaneously -- never wait sequentially for independent work
 - Use `run_in_background: true` for long operations (installs, builds, test suites)
-- Always
--
+- Always set `agent_type` when spawning native subagents; use `reasoning_effort` for per-dispatch intensity when needed
+- Preserve legacy Ralph tier intent through native reasoning effort: LOW -> `low`, STANDARD -> `medium`, THOROUGH -> `xhigh`
 - Deliver the full implementation: no scope reduction, no partial completion, no deleting tests to make them pass
 - Apply the shared workflow guidance pattern: outcome-first framing, concise visible updates for multi-step execution, local overrides for the active workflow branch, validation proportional to risk, explicit stop rules, and automatic continuation for safe reversible steps. Ask only for material, destructive, credentialed, external-production, or preference-dependent branches.
 - Integrate with Codex goal mode when goal tools are available: inspect the active thread goal with `get_goal`, preserve it as the top-level stop condition, and only call `update_goal({status: "complete"})` after a Ralph completion audit proves the objective is actually achieved.
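The Execution_Policy hunk above maps legacy Ralph tiers onto native `reasoning_effort` values. A minimal TypeScript sketch of that mapping as a lookup table; the `LegacyTier` and `ReasoningEffort` type names and the `effortForTier` helper are hypothetical and not part of the oh-my-codex API, only the string values come from the skill text:

```typescript
// Hypothetical type names; only the string values come from the Ralph skill text.
type LegacyTier = "LOW" | "STANDARD" | "THOROUGH";
type ReasoningEffort = "low" | "medium" | "xhigh";

// LOW -> low, STANDARD -> medium, THOROUGH -> xhigh, as listed in Execution_Policy.
const TIER_TO_EFFORT: Readonly<Record<LegacyTier, ReasoningEffort>> = {
  LOW: "low",
  STANDARD: "medium",
  THOROUGH: "xhigh",
};

function effortForTier(tier: LegacyTier): ReasoningEffort {
  return TIER_TO_EFFORT[tier];
}
```

Under this reading, a task that previously used `tier="THOROUGH"` would be dispatched with `reasoning_effort="xhigh"`.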
@@ -54,10 +54,10 @@ Complex tasks often fail silently: partial implementations get declared "done",
 - Do not begin Ralph execution work (delegation, implementation, or verification loops) until snapshot grounding exists. If forced to proceed quickly, note explicit risk tradeoffs.
 1. **Review progress**: Check TODO list and any prior iteration state
 2. **Continue from where you left off**: Pick up incomplete tasks
-3. **Delegate in parallel**: Route tasks to specialist agents
-- Simple lookups:
-- Standard work:
-- Complex analysis:
+3. **Delegate in parallel**: Route tasks to specialist native agents with explicit `agent_type` and appropriate `reasoning_effort`
+- Simple lookups: `reasoning_effort="low"` -- "What does this function return?"
+- Standard work: `reasoning_effort="medium"` -- "Add error handling to this module"
+- Complex analysis: `reasoning_effort="xhigh"` -- "Debug this race condition"
 - When Ralph is entered as a ralplan follow-up, start from the approved **available-agent-types roster** and make the delegation plan explicit: implementation lane, evidence/regression lane, and final sign-off lane using only known agent types
 4. **Run long operations in background**: Builds, installs, test suites use `run_in_background: true`
 5. **Visual task gate (when screenshot/reference images are present)**:
@@ -72,11 +72,11 @@ Complex tasks often fail silently: partial implementations get declared "done",
 b. Run verification (test, build, lint)
 c. Read the output -- confirm it actually passed
 d. Check: zero pending/in_progress TODO items
-7. **Architect verification** (
-- <5 files, <100 lines with full tests:
-- Standard changes:
-- >20 files or security/architectural changes:
-- Ralph floor: always
+7. **Architect verification** (native role):
+- <5 files, <100 lines with full tests: `task(agent_type="architect", reasoning_effort="medium", prompt="...")` minimum
+- Standard changes: `task(agent_type="architect", reasoning_effort="medium", prompt="...")`
+- >20 files or security/architectural changes: `task(agent_type="architect", reasoning_effort="xhigh", prompt="...")`
+- Ralph floor: always run an explicit `architect` native subagent, even for small changes
 7.5 **Mandatory Deslop Pass**:
 - After Step 7 passes, run `oh-my-codex:ai-slop-cleaner` on **all files changed during the Ralph session**.
 - Scope the cleaner to **changed files only**; do not widen the pass beyond Ralph-owned edits.
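The new Step 7 bullets size the architect pass by change scope: `medium` is the floor, and more than 20 files or security/architectural changes escalate to `xhigh`. A minimal TypeScript sketch of that selection rule; the `ChangeSummary` shape and both helper names are hypothetical, and `architectTaskCall` only renders the call shape from the skill text as a string rather than invoking any real API:

```typescript
// Hypothetical summary of a change set; the field names are illustrative only.
interface ChangeSummary {
  filesChanged: number;
  touchesSecurityOrArchitecture: boolean;
}

type ArchitectEffort = "medium" | "xhigh";

// Step 7: "medium" is the floor for every change (small and standard bands alike);
// more than 20 files or security/architectural changes escalate to "xhigh".
// The architect pass itself is never skipped.
function architectEffort(change: ChangeSummary): ArchitectEffort {
  if (change.filesChanged > 20 || change.touchesSecurityOrArchitecture) {
    return "xhigh";
  }
  return "medium";
}

// Renders the dispatch shape written in the skill text; a description, not an executable API.
function architectTaskCall(change: ChangeSummary, prompt: string): string {
  const effort = architectEffort(change);
  return `task(agent_type="architect", reasoning_effort="${effort}", prompt=${JSON.stringify(prompt)})`;
}
```

For a three-file change with the prompt `"verify completion"`, this renders the same call that appears in step 4 of the skill's "Correct verification before completion" example.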
@@ -87,7 +87,7 @@ Complex tasks often fail silently: partial implementations get declared "done",
 - If post-deslop regression fails, roll back cleaner changes or fix and retry. Then rerun Step 7.5 and Step 7.6 until the regression is green.
 - Do not proceed to completion until post-deslop regression is green (unless `--no-deslop` explicitly skipped the deslop pass).
 8. **On approval**: If Codex goal mode is active, call `update_goal({status: "complete"})` before `/cancel`; report final elapsed time and token-budget usage when the tool returns it. Then run `/cancel` to cleanly exit and clean up all state files.
-9. **On rejection**: Fix the issues raised, then re-verify
+9. **On rejection**: Fix the issues raised, then re-verify with the same `agent_type` and `reasoning_effort` profile
 </Steps>
 
 <Tool_Usage>
@@ -150,11 +150,11 @@ Use the CLI-first state surface for Ralph lifecycle state (`omx state write/read
 <Good>
 Correct parallel delegation:
 ```
-
-
-
+task(agent_type="executor", reasoning_effort="low", prompt="Add type export for UserConfig")
+task(agent_type="executor", reasoning_effort="medium", prompt="Implement the caching layer for API responses")
+task(agent_type="executor", reasoning_effort="xhigh", prompt="Refactor auth module to support OAuth2 flow")
 ```
-Why good: Three independent tasks fired simultaneously
+Why good: Three independent tasks fired simultaneously while explicitly selecting the installed `executor` native role, so the UI/tracker does not show default subagents; legacy tier intent is preserved through native reasoning effort (`LOW` -> `low`, `STANDARD` -> `medium`, `THOROUGH` -> `xhigh`).
 </Good>
 
 <Good>
@@ -163,7 +163,7 @@ Correct verification before completion:
 1. Run: npm test → Output: "42 passed, 0 failed"
 2. Run: npm run build → Output: "Build succeeded"
 3. Run: lsp_diagnostics → Output: 0 errors
-4.
+4. task(agent_type="architect", reasoning_effort="medium", prompt="verify completion") → Verdict: "APPROVED"
 5. Run /cancel
 ```
 Why good: Fresh evidence at each step, architect verification, then clean exit.
@@ -178,9 +178,9 @@ Why bad: Uses "should" and "look good" -- no fresh test/build output, no archite
 <Bad>
 Sequential execution of independent tasks:
 ```
-
-
-
+task(agent_type="executor", reasoning_effort="low", prompt="Add type export") → wait →
+task(agent_type="executor", reasoning_effort="medium", prompt="Implement caching") → wait →
+task(agent_type="executor", reasoning_effort="xhigh", prompt="Refactor auth")
 ```
 Why bad: These are independent tasks that should run in parallel, not sequentially.
 </Bad>
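The Good/Bad pair in the hunks above contrasts firing independent tasks together with waiting on each one in turn. A minimal TypeScript sketch of the same contrast, assuming a hypothetical async `Dispatch` function that stands in for one native `task(...)` call (it is not an oh-my-codex export):

```typescript
// Hypothetical stand-in for one native subagent dispatch; not an oh-my-codex export.
interface TaskSpec {
  agent_type: string;
  reasoning_effort: "low" | "medium" | "xhigh";
  prompt: string;
}
type Dispatch = (spec: TaskSpec) => Promise<string>;

const independentTasks: TaskSpec[] = [
  { agent_type: "executor", reasoning_effort: "low", prompt: "Add type export for UserConfig" },
  { agent_type: "executor", reasoning_effort: "medium", prompt: "Implement the caching layer for API responses" },
  { agent_type: "executor", reasoning_effort: "xhigh", prompt: "Refactor auth module to support OAuth2 flow" },
];

// Good: every dispatch starts before any result is awaited.
function runParallel(dispatch: Dispatch, tasks: readonly TaskSpec[]): Promise<string[]> {
  return Promise.all(tasks.map((task) => dispatch(task)));
}

// Bad: each independent task waits for the previous one to finish.
async function runSequential(dispatch: Dispatch, tasks: readonly TaskSpec[]): Promise<string[]> {
  const results: string[] = [];
  for (const task of tasks) {
    results.push(await dispatch(task));
  }
  return results;
}
```

With `runParallel`, all three dispatches are in flight at once; with `runSequential`, at most one is, so total latency grows with the number of tasks.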
@@ -200,7 +200,7 @@ Why bad: These are independent tasks that should run in parallel, not sequential
 - [ ] Fresh test run output shows all tests pass
 - [ ] Fresh build output shows success
 - [ ] lsp_diagnostics shows 0 errors on affected files
-- [ ] Architect verification passed (
+- [ ] Architect verification passed through explicit `task(agent_type="architect", reasoning_effort="medium"...)` minimum
 - [ ] Codex goal-mode completion audit passed, and `update_goal({status: "complete"})` was called when an active goal exists
 - [ ] ai-slop-cleaner pass completed on changed files (or --no-deslop specified)
 - [ ] Post-deslop regression tests pass
@@ -58,6 +58,15 @@ The matrix must include normal-path coverage plus adversarial dynamic e2e scenar
 - Validate exit codes and output semantics; do not trust success-looking text alone.
 - Do not delete, rewrite, or mask unrelated user work. Capture dirty-worktree evidence before and after generated harness work.
 
+### Temporary Harness Generation Guardrails
+
+Generated harnesses are part of the QA evidence chain; until setup succeeds, they are evidence about the harness apparatus, not product behavior.
+
+- **Use absolute repo imports for built artifacts.** When a harness runs from `/tmp` or another scratch directory but imports repository code, resolve the repository root explicitly from the verified repo cwd and import built modules with an absolute path or `pathToFileURL(join(repoRoot, "dist", ...)).href`. Never rely on `./dist/...` from the harness file's temporary directory.
+- **Use a safe file writer for JS/TS harness bodies.** Prefer a small Node/Python writer or another non-interpolating file-write mechanism for harness source that contains backticks, `${...}`, shell metacharacters, or prompt-injection strings. If a shell heredoc is unavoidable, quote the delimiter and verify the written file before execution; do not use interpolating heredocs for JavaScript assertions.
+- **Sanitize OMX runtime env for isolated probes.** When the scenario creates a temporary repo/state tree or intentionally checks local isolation, run the probe with `OMX_ROOT` and `OMX_STATE_ROOT` unset (for example `env -u OMX_ROOT -u OMX_STATE_ROOT ...`) so ambient boxed runtime state cannot redirect reads/writes away from the scenario fixture.
+- **Classify harness setup failures separately.** If a generated harness fails before exercising product behavior because of import paths, shell interpolation, environment leakage, or fixture construction, record it as harness debris, fix the harness, and rerun the scenario before declaring a product defect.
+
 ## Cycle Workflow
 
 ### Cycle N (Max 5)
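The first three new harness guardrails can be sketched with Node's standard `fs`, `path`, and `url` modules. This is a minimal illustration under stated assumptions: `repoRoot` is the verified repository cwd, and the helper names `distModuleUrl`, `sanitizedProbeEnv`, and `writeHarness` are hypothetical, not functions shipped by oh-my-codex:

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { pathToFileURL } from "node:url";

// Absolute file: URL for a built module under <repoRoot>/dist, so a harness in
// /tmp never resolves "./dist/..." relative to its own scratch directory.
function distModuleUrl(repoRoot: string, ...segments: string[]): string {
  return pathToFileURL(join(repoRoot, "dist", ...segments)).href;
}

// Copy of the environment with OMX_ROOT and OMX_STATE_ROOT removed, the
// programmatic equivalent of `env -u OMX_ROOT -u OMX_STATE_ROOT ...`.
function sanitizedProbeEnv(
  base: Record<string, string | undefined> = process.env,
): Record<string, string | undefined> {
  const env = { ...base };
  delete env.OMX_ROOT;
  delete env.OMX_STATE_ROOT;
  return env;
}

// Writes harness source through the fs API (no shell, so backticks and ${...}
// stay literal), then reads it back to verify the file before it is executed.
function writeHarness(path: string, source: string): void {
  writeFileSync(path, source, "utf8");
  if (readFileSync(path, "utf8") !== source) {
    throw new Error(`harness file ${path} does not match the intended source`);
  }
}
```

In use, the harness would load a built module with `await import(distModuleUrl(repoRoot, "cli", "index.js"))`, and the runner would pass `sanitizedProbeEnv()` as the `env` option of `child_process.spawnSync` when launching an isolated probe.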
@@ -68,12 +68,14 @@ Before Phase `deep-interview` or `ralplan` starts or resumes:
 1. Derive a task slug from the request.
 2. Reuse the latest relevant `.omx/context/{slug}-*.md` snapshot when available.
 3. If none exists, create `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) with:
-- task
+- activation prompt / task seed
+- original task status (`activation-prompt`, `legacy-unverified`, or `unavailable`)
 - desired outcome
 - known facts/evidence
 - constraints
 - unknowns/open questions
 - likely codebase touchpoints
+- a scope note that the seed is the Autopilot activation prompt, not guaranteed prior conversation context
 4. If brownfield facts are missing, run `explore` first before or during `$deep-interview` (`$deep-interview --quick <task>` remains acceptable for bounded low-ambiguity intake); do not skip the clarification gate merely because the task sounds actionable.
 5. Carry the snapshot path in Autopilot state and all handoff artifacts.
 </Pre-context Intake>
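The pre-context intake steps above name snapshot files `.omx/context/{slug}-{timestamp}.md` with a UTC `YYYYMMDDTHHMMSSZ` timestamp and reuse the latest one for a slug. A minimal TypeScript sketch of that naming and lookup; the slug rule in `taskSlug` is a hypothetical choice (the skill only says "derive a task slug"), and all helper names are illustrative:

```typescript
import { join } from "node:path";

// UTC timestamp in the YYYYMMDDTHHMMSSZ form required by step 3.
// toISOString() is always UTC, e.g. "2025-01-02T03:04:05.678Z".
function utcCompactTimestamp(date: Date): string {
  return date.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}Z$/, "Z");
}

// Hypothetical slug rule: lower-case, non-alphanumeric runs become "-", capped length.
function taskSlug(request: string, maxLength = 48): string {
  const slug = request
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, maxLength)
    .replace(/-+$/, "");
  return slug || "task";
}

function contextSnapshotPath(repoRoot: string, request: string, now: Date = new Date()): string {
  return join(repoRoot, ".omx", "context", `${taskSlug(request)}-${utcCompactTimestamp(now)}.md`);
}

// Step 2: newest existing snapshot for a slug produced by taskSlug (no regex
// metacharacters). The fixed-width timestamp makes lexicographic order chronological.
function latestSnapshotName(fileNames: string[], slug: string): string | undefined {
  const pattern = new RegExp(`^${slug}-\\d{8}T\\d{6}Z\\.md$`);
  return fileNames.filter((name) => pattern.test(name)).sort().pop();
}
```

Because the timestamp is fixed-width and zero-padded, a plain string sort is enough to find the most recent snapshot; no date parsing is needed.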
@@ -31,7 +31,7 @@ Delegates to the `code-reviewer` and `architect` agents in parallel for a two-la
 2. **Launch Parallel Review Lanes**
 - **`code-reviewer` lane** - owns spec compliance, security, code quality, performance, and maintainability findings
 - **`architect` lane** - owns the devil's-advocate / design-tradeoff perspective
-- Both lanes run in parallel and produce distinct outputs before final synthesis
+- Both lanes run in parallel on a clean context with explicit scope and artifacts, and produce distinct outputs before final synthesis
 - If either lane cannot be launched or does not return evidence, report `independent review unavailable`; do **not** substitute the current/authoring lane, and do **not** approve or mark the review merge-ready.
 
 3. **Review Categories**
@@ -71,10 +71,11 @@ Delegates to the `code-reviewer` and `architect` agents in parallel for a two-la
 
 Do not self-review as a fallback. If the `code-reviewer` or `architect` agent path is missing, unavailable, skipped, or fails, emit a clear unavailable-review result and block approval until the independent lane evidence exists.
 
+Respect the user's current model and reasoning/effort selection when launching review lanes. Do not pass `model` or `reasoning_effort` overrides in the review-lane task calls unless the user explicitly asks for review-specific overrides; omitting them lets native subagents inherit the active session settings.
+
 ```
-
-
-tier="THOROUGH",
+task(
+agent_type="code-reviewer",
 prompt="CODE REVIEW TASK
 
 Review code changes for quality, security, and maintainability.
@@ -98,9 +99,8 @@ Output: Code review report with:
 - Approval recommendation (APPROVE / REQUEST CHANGES / COMMENT)"
 )
 
-
-
-tier="THOROUGH",
+task(
+agent_type="architect",
 prompt="ARCHITECTURE / DEVIL'S-ADVOCATE REVIEW TASK
 
 Review the same code changes from the architecture/tradeoff perspective.
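The code-review hunks above drop `tier="THOROUGH"` and tell the skill not to pass `model` or `reasoning_effort` unless the user explicitly asks. A minimal TypeScript sketch of building both lane calls that way; the `ReviewLaneCall` shape is a hypothetical stand-in, not the real task tool schema, and the override values are plain strings chosen for illustration:

```typescript
// Hypothetical shape of a review-lane dispatch; the real task tool schema may differ.
type ReviewAgent = "code-reviewer" | "architect";

interface ReviewLaneCall {
  agent_type: ReviewAgent;
  prompt: string;
  model?: string;
  reasoning_effort?: string;
}

// Overrides the user explicitly asked for; undefined means "inherit the session".
interface ExplicitReviewOverrides {
  model?: string;
  reasoning_effort?: string;
}

function reviewLaneCall(
  agent_type: ReviewAgent,
  prompt: string,
  explicit: ExplicitReviewOverrides = {},
): ReviewLaneCall {
  const call: ReviewLaneCall = { agent_type, prompt };
  // Omit the keys entirely (rather than sending defaults) so the native
  // subagent inherits the active model and reasoning effort.
  if (explicit.model !== undefined) call.model = explicit.model;
  if (explicit.reasoning_effort !== undefined) call.reasoning_effort = explicit.reasoning_effort;
  return call;
}

// Both lanes share the same explicit scope so they can be launched in parallel.
function reviewLanes(scope: string, explicit: ExplicitReviewOverrides = {}): ReviewLaneCall[] {
  return [
    reviewLaneCall("code-reviewer", `CODE REVIEW TASK\n\n${scope}`, explicit),
    reviewLaneCall("architect", `ARCHITECTURE / DEVIL'S-ADVOCATE REVIEW TASK\n\n${scope}`, explicit),
  ];
}
```

Leaving the keys absent, rather than filling in a default such as `"medium"`, is what allows inheritance: any value the caller writes would pin the lane regardless of the session setting.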
@@ -51,6 +51,11 @@ If no flag is provided, use **Standard**.
|
|
|
51
51
|
- Gather codebase facts via `explore` before asking user about internals
|
|
52
52
|
- `omx explore` is deprecated. Use normal repository inspection tools/subagents for simple read-only brownfield fact gathering; use `omx sparkshell` only for explicit shell-native read-only evidence, and keep ambiguous or non-shell-only investigation on the richer normal path.
|
|
53
53
|
- Always run a preflight context intake before the first interview question
|
|
54
|
+
- For brownfield work, preflight must include doc/context grounding before user-facing questions: inspect applicable `AGENTS.md` files, README/getting-started docs, relevant `docs/` contracts/plans/ADRs, existing `.omx/context/` snapshots, and any project-local glossary/context files such as `CONTEXT.md` or `CONTEXT-MAP.md` when present.
|
|
55
|
+
- Treat existing repo language as evidence, not authority: if the user uses a fuzzy, overloaded, or conflicting term, surface the specific doc/code wording and ask which meaning should govern before implementation.
|
|
56
|
+
- Cross-check user claims about current behavior against code or documented contracts when discoverable. If docs and code disagree, ask a confirmation question that names both sources instead of silently choosing one.
|
|
57
|
+
- Use scenario-based edge-case grilling when relationships, boundaries, or handoff behavior are unclear: invent one concrete scenario that stresses the ambiguous boundary, then ask one focused question about the expected outcome.
|
|
58
|
+
- Durable docs, glossary, ADR, or memory updates are opt-in and public-safe only. Deep-interview may recommend such updates in the handoff summary, but must not automatically create or dump public docs from interview transcripts unless the user explicitly brings that work into scope.
|
|
54
59
|
- If the initial context is oversized or would exceed the prompt budget, do not paste or forward the raw payload into interview prompts; request and record a prompt-safe initial-context summary first
|
|
55
60
|
- The oversized initial-context summary gate is blocking: wait for the concise summary before ambiguity scoring, crystallizing artifacts, or any downstream execution handoff
|
|
56
61
|
- The summary must preserve goals, constraints, success criteria, non-goals, decision boundaries, and references to any full source documents so downstream consumers receive a prompt-safe but faithful context
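As one way to make the brownfield doc/context grounding step concrete, here is a minimal sketch that lists which of the named grounding sources exist in a repository. The function name, the `SKIP_DIRS` exclusion set, and the scan depth are assumptions made for illustration. A real preflight should still prefer the sources nearest to the touched paths over a whole-repo scan.

```python
from pathlib import Path

# Grounding surfaces named in the preflight rules; the exact set is an assumption.
ROOT_FILES = ("AGENTS.md", "README.md", "CONTEXT.md", "CONTEXT-MAP.md")
DOC_DIRS = ("docs", ".omx/context", ".omx/specs")
# Hypothetical exclusions so vendored or VCS trees are not treated as governing rules.
SKIP_DIRS = {".git", "node_modules"}


def find_grounding_sources(repo_root: Path) -> list[str]:
    """Return sorted repo-relative paths of existing doc/context sources."""
    found: set[str] = set()
    for name in ROOT_FILES:
        if (repo_root / name).is_file():
            found.add(name)
    # Nested AGENTS.md files may govern subdirectories of the touched paths.
    for agents in repo_root.rglob("AGENTS.md"):
        rel = agents.relative_to(repo_root)
        if SKIP_DIRS.isdisjoint(rel.parts):
            found.add(rel.as_posix())
    for rel_dir in DOC_DIRS:
        base = repo_root / rel_dir
        if base.is_dir():
            for doc in base.rglob("*.md"):
                found.add(doc.relative_to(repo_root).as_posix())
    return sorted(found)
```

The returned paths are the kind of path references the Docs/Terminology Ledger expects under "sources inspected".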
|
|
@@ -97,8 +102,15 @@ If no flag is provided, use **Standard**.
|
|
|
97
102
|
- Unknowns/open questions
|
|
98
103
|
- Decision-boundary unknowns
|
|
99
104
|
- Likely codebase touchpoints
|
|
105
|
+
- Relevant repo docs/rules/context inspected
|
|
106
|
+
- Terminology or doc/code conflicts found
|
|
100
107
|
- Prompt-safe initial-context summary status (`not_needed`, `needed`, or `recorded`)
|
|
101
|
-
5.
|
|
108
|
+
5. For brownfield tasks, inspect the applicable documentation/rule surface before the first user-facing round. Prefer exact, nearby sources over broad scans:
|
|
109
|
+
- governing `AGENTS.md` files and template/runtime instruction surfaces that apply to the touched paths
|
|
110
|
+
- README/getting-started docs and relevant docs under `docs/`, especially contracts, plans, ADR-like records, and workflow docs
|
|
111
|
+
- existing `.omx/context/` snapshots, `.omx/specs/`, and planning artifacts relevant to the slug
|
|
112
|
+
- project-local glossary/context files such as `CONTEXT.md`, `CONTEXT-MAP.md`, or context-specific docs when they exist
|
|
113
|
+
6. Save snapshot to `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) and reference it in mode state.
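The snapshot naming rule in step 6 amounts to a UTC timestamp in `YYYYMMDDTHHMMSSZ` form appended to the slug. The sketch below assumes nothing beyond the path pattern stated above. The `snapshot_path` name and the `auth-refresh` slug are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def snapshot_path(slug: str, now: Optional[datetime] = None) -> Path:
    """Build .omx/context/{slug}-{timestamp}.md with a UTC YYYYMMDDTHHMMSSZ stamp."""
    now = now or datetime.now(timezone.utc)
    # Normalize aware datetimes to UTC so the trailing "Z" is always truthful.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return Path(".omx/context") / f"{slug}-{stamp}.md"


example = snapshot_path("auth-refresh", datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
print(example.as_posix())  # .omx/context/auth-refresh-20250102T030405Z.md
```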
|
|
102
114
|
|
|
103
115
|
## Phase 1: Initialize
|
|
104
116
|
|
|
@@ -137,13 +149,14 @@ If no flag is provided, use **Standard**.
|
|
|
137
149
|
Repeat until ambiguity `<= threshold`, the pressure pass is complete, the readiness gates are explicit, the user exits with warning, or max rounds are reached. This is a stop condition: below threshold, do not open a new ordinary interview branch.
|
|
138
150
|
|
|
139
151
|
### 2a) Generate next question
|
|
140
|
-
If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
|
|
152
|
+
If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
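The blocking behavior of the oversized-context gate can be sketched as a check that runs before scoring, readiness gates, or any handoff. The status values are the ones recorded in the context snapshot (`not_needed`, `needed`, `recorded`). The function names and the wording of the summary request are illustrative assumptions.

```python
SUMMARY_STATUSES = {"not_needed", "needed", "recorded"}


def may_score_or_hand_off(summary_status: str) -> bool:
    """Oversized initial-context gate: scoring, readiness gates, and handoff
    stay blocked while a prompt-safe summary is still needed."""
    if summary_status not in SUMMARY_STATUSES:
        raise ValueError(f"unknown summary status: {summary_status!r}")
    return summary_status != "needed"


def next_question(summary_status: str, planned_question: str) -> str:
    """While the gate is open, the only allowed question is the summary request."""
    if not may_score_or_hand_off(summary_status):
        return (
            "Your initial context is too large to forward safely. Please send a "
            "concise summary covering goals, constraints, success criteria, "
            "non-goals, decision boundaries, and links to the full sources."
        )
    return planned_question
```

Raising on an unknown status is a deliberate design choice. A typo in mode state should stop the interview loudly instead of silently opening the gate.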
|
|
141
153
|
|
|
142
154
|
Use:
|
|
143
155
|
- Original idea
|
|
144
156
|
- Prior Q&A rounds
|
|
145
157
|
- Current dimension scores
|
|
146
158
|
- Brownfield context (if any)
|
|
159
|
+
- Doc/context grounding notes, including existing terminology, governing rules, and any doc/code mismatch
|
|
147
160
|
- Activated challenge mode injection (Phase 3)
|
|
148
161
|
|
|
149
162
|
Target the lowest-scoring dimension, but respect stage priority:
|
|
@@ -155,12 +168,21 @@ Follow-up pressure ladder after each answer:
|
|
|
155
168
|
1. Ask for a concrete example, counterexample, or evidence signal behind the latest claim
|
|
156
169
|
2. Probe the hidden assumption, dependency, or belief that makes the claim true
|
|
157
170
|
3. Force a boundary or tradeoff: what would you explicitly not do, defer, or reject?
|
|
158
|
-
4.
|
|
171
|
+
4. Challenge fuzzy or conflicting terms against the repo's documented language and current code behavior
|
|
172
|
+
5. Stress-test the boundary with one concrete scenario or edge case when a relationship or handoff remains ambiguous
|
|
173
|
+
6. If the answer still describes symptoms, reframe toward essence / root cause before moving on
|
|
159
174
|
|
|
160
175
|
Prefer staying on the same thread for multiple rounds when it has the highest leverage. Breadth without pressure is not progress.
|
|
161
176
|
|
|
162
177
|
Maintain a **Breadth Ledger** across independent ambiguity tracks: scope, constraints, outputs, verification, brownfield integration, and any user-mentioned deliverable tracks. The ledger is a guard, not a mandatory rotation rule: stay deep on the current thread until it has been pressure-tested, then zoom out only when another material track remains unresolved and would change execution.
|
|
163
178
|
|
|
179
|
+
Maintain a **Docs/Terminology Ledger** for brownfield interviews:
|
|
180
|
+
- repo docs/rules/context sources inspected, with path references
|
|
181
|
+
- canonical terms already used by the repo and terms to avoid or disambiguate
|
|
182
|
+
- user terms that conflict with docs or current code behavior
|
|
183
|
+
- doc/code mismatches that require a human decision before implementation
|
|
184
|
+
- optional durable-doc follow-ups that are safe to propose but not auto-apply
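One possible in-memory shape for the Docs/Terminology Ledger is sketched below. It is an assumption, not a schema OMX defines. Each field corresponds to one bullet above. The `blocks_implementation` check encodes the rule that term conflicts and doc/code mismatches need a human decision before implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TermConflict:
    user_term: str
    repo_term: str
    source: str                 # path of the doc or code that uses repo_term
    resolved_meaning: str = ""  # stays empty until the user picks the governing meaning


@dataclass
class DocsTerminologyLedger:
    sources_inspected: list[str] = field(default_factory=list)
    canonical_terms: dict[str, str] = field(default_factory=dict)  # term -> meaning
    terms_to_disambiguate: list[str] = field(default_factory=list)
    conflicts: list[TermConflict] = field(default_factory=list)
    # Record a doc/code mismatch here only while it still awaits a human decision.
    doc_code_mismatches: list[str] = field(default_factory=list)
    durable_doc_followups: list[str] = field(default_factory=list)  # proposals, never auto-applied

    def unresolved_conflicts(self) -> list[TermConflict]:
        return [c for c in self.conflicts if not c.resolved_meaning]

    def blocks_implementation(self) -> bool:
        return bool(self.unresolved_conflicts() or self.doc_code_mismatches)
```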
|
|
185
|
+
|
|
164
186
|
Detailed dimensions:
|
|
165
187
|
- Intent Clarity — why the user wants this
|
|
166
188
|
- Outcome Clarity — what end state they want
|
|
@@ -306,6 +328,7 @@ Append round result and updated scores via `omx state write --input '<json>' --j
|
|
|
306
328
|
Use each mode once when applicable. These are normal escalation tools, not rare rescue moves:
|
|
307
329
|
|
|
308
330
|
- **Contrarian** (round 2+ or immediately when an answer rests on an untested assumption): challenge core assumptions
|
|
331
|
+
- **Terminologist** (brownfield, whenever a key term is fuzzy, overloaded, or conflicts with repo docs/code): force a canonical meaning against existing project language before implementation
|
|
309
332
|
- **Simplifier** (round 4+ or when scope expands faster than outcome clarity): probe minimal viable scope
|
|
310
333
|
- **Ontologist** (round 5+ and ambiguity > 0.25, or when the user keeps describing symptoms): ask for essence-level reframing
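The trigger conditions for the challenge modes can be read as a small eligibility table. In the hypothetical sketch below, each boolean flag stands for the agent's own judgment about the latest answer. The function shows only the round and ambiguity thresholds stated above and the "use each mode once" rule.

```python
def eligible_challenge_modes(
    round_no: int,
    ambiguity: float,
    used: set[str],
    *,
    untested_assumption: bool = False,
    brownfield: bool = False,
    fuzzy_term: bool = False,
    scope_outpacing_outcome: bool = False,
    describing_symptoms: bool = False,
) -> list[str]:
    """Return challenge modes whose triggers fired and that were not used yet."""
    triggers = {
        "Contrarian": round_no >= 2 or untested_assumption,
        "Terminologist": brownfield and fuzzy_term,
        "Simplifier": round_no >= 4 or scope_outpacing_outcome,
        "Ontologist": (round_no >= 5 and ambiguity > 0.25) or describing_symptoms,
    }
    # Each mode is applied at most once per interview.
    return [mode for mode, fired in triggers.items() if fired and mode not in used]
```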
|
|
311
334
|
|
|
@@ -336,6 +359,9 @@ Spec should include:
|
|
|
336
359
|
- Assumptions exposed + resolutions
|
|
337
360
|
- Pressure-pass findings (which answer was revisited, and what changed)
|
|
338
361
|
- Brownfield evidence vs inference notes for any repository-grounded confirmation questions
|
|
362
|
+
- Docs/Terminology Ledger with inspected repo docs/rules/context, term conflicts, and any doc/code mismatch decisions
|
|
363
|
+
- Scenario/edge-case pressure findings that materially shaped scope or acceptance criteria
|
|
364
|
+
- Optional durable documentation recommendations, explicitly marked opt-in and public-safe; do not include raw private transcript dumps
|
|
339
365
|
- Technical context findings
|
|
340
366
|
- Full or condensed transcript
|
|
341
367
|
|
|
@@ -365,11 +391,11 @@ When the clarified task is specifically about `$autoresearch`, or the skill is i
|
|
|
365
391
|
|
|
366
392
|
## Phase 5: Execution Bridge
|
|
367
393
|
|
|
368
|
-
Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, and any residual-risk warnings across the handoff.
|
|
394
|
+
Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, and any residual-risk warnings across the handoff.
|
|
369
395
|
|
|
370
396
|
### Goal-mode follow-ups
|
|
371
397
|
|
|
372
|
-
Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
|
|
398
|
+
Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
|
|
373
399
|
|
|
374
400
|
- **`$ultragoal`** — default goal-mode follow-up for implementation or general goal-oriented follow-up specs that should be converted into durable Codex/OMX goals with sequential completion tracking.
|
|
375
401
|
- **`$autoresearch-goal`** — use when the clarified context is a research project: a research question, reference/literature gathering, evaluator-backed analysis, or professor/critic-style deliverable.
|
|
@@ -377,7 +403,16 @@ Include these product-facing suggestions when they fit the clarified spec, witho
|
|
|
377
403
|
|
|
378
404
|
Recommend `$ultragoal` as the default durable goal-mode follow-up because it supersedes Ralph for goal tracking. Preserve `$team` for coordinated parallel implementation and keep `$ralph` only as an explicit fallback for persistent single-owner execution/verification when the user specifically selects it.
|
|
379
405
|
|
|
380
|
-
### 1. **`$
|
|
406
|
+
### 1. **`$ultragoal` (Default durable execution follow-up)**
|
|
407
|
+
- **Input Artifact:** `.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
|
|
408
|
+
- **Invocation:** `$ultragoal create-goals --brief-file <spec-path>` followed by `$ultragoal complete-goals` in the active execution lane
|
|
409
|
+
- **Consumer Behavior:** Convert the clarified spec into durable goal-mode work. Preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, scenario-pressure findings, and residual-risk warnings as binding story constraints.
|
|
410
|
+
- **Skipped / Already-Satisfied Stages:** Requirement interview, ambiguity clarification, doc/context preflight, and early intent-boundary elicitation
|
|
411
|
+
- **Expected Output:** `.omx/ultragoal/brief.md`, `.omx/ultragoal/goals.json`, `.omx/ultragoal/ledger.jsonl`, implementation evidence, verification evidence, and final cleanup/review-gate evidence
|
|
412
|
+
- **Best When:** The clarified spec is execution-ready or the user explicitly wants durable goal tracking as the next step
|
|
413
|
+
- **Next Recommended Step:** Run the Ultragoal completion loop; launch `$team` only inside an active Ultragoal story when parallel lanes are warranted, and use `$ralph` only as an explicit fallback when the user asks for that legacy persistence mode
|
|
414
|
+
|
|
415
|
+
### 2. **`$ralplan` (Recommended when architecture/test-shape review is still needed)**
|
|
381
416
|
- **Input Artifact:** `.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
|
|
382
417
|
- **Invocation:** `$plan --consensus --direct <spec-path>`
|
|
383
418
|
- **Consumer Behavior:** Treat the deep-interview spec as the requirements source of truth. Do not repeat the interview by default; refine architecture/feasibility around the clarified intent and boundaries instead.
|
|
@@ -386,7 +421,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
|
|
|
386
421
|
- **Best When:** Requirements are clear enough to stop interviewing, but architectural validation / consensus planning is still desirable
|
|
387
422
|
- **Next Recommended Step:** Use the approved planning artifacts with `$ultragoal` as the default durable goal-mode follow-up (optionally with `$team` for parallel lanes); choose `$autoresearch-goal` for research validation or `$performance-goal` for measurable optimization, and use `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
|
|
388
423
|
|
|
389
|
-
###
|
|
424
|
+
### 3. **`$autopilot`**
|
|
390
425
|
- **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
|
|
391
426
|
- **Invocation:** `$autopilot <spec-path>`
|
|
392
427
|
- **Consumer Behavior:** Use the deep-interview spec as the clarified execution brief. Preserve intent, non-goals, decision boundaries, and acceptance criteria as binding context for planning/execution.
|
|
@@ -395,7 +430,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
|
|
|
395
430
|
- **Best When:** The clarified spec is already strong enough for direct planning + execution without an additional consensus gate
|
|
396
431
|
- **Next Recommended Step:** Continue through autopilot's execution/QA/validation flow; if coordination-heavy execution emerges, prefer `$team` under a leader-owned `$ultragoal` ledger, using `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
|
|
397
432
|
|
|
398
|
-
###
|
|
433
|
+
### 4. **`$ralph` (Explicit fallback only)**
|
|
399
434
|
- **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
|
|
400
435
|
- **Invocation:** `$ralph <spec-path>`
|
|
401
436
|
- **Consumer Behavior:** Use the spec's acceptance criteria and boundary constraints as the persistence target. Do not reopen requirements discovery unless the user explicitly asks to refine further.
|
|
@@ -404,7 +439,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
|
|
|
404
439
|
- **Best When:** The user explicitly asks for Ralph's persistent sequential completion pressure; otherwise use `$ultragoal` for durable goal tracking and completion checkpoints
|
|
405
440
|
- **Next Recommended Step:** If this explicit fallback is selected, continue Ralph's persistence loop; if work expands into coordination-heavy lanes, hand off to `$team` under `$ultragoal` checkpointing rather than promoting Ralph as the next default
|
|
406
441
|
|
|
407
|
-
###
|
|
442
|
+
### 5. **`$team`**
|
|
408
443
|
- **Input Artifact:** `.omx/specs/deep-interview-{slug}.md`
|
|
409
444
|
- **Invocation:** `$team <spec-path>`
|
|
410
445
|
- **Consumer Behavior:** Treat the spec as shared execution context for coordinated parallel work. Preserve the clarified intent, non-goals, decision boundaries, and acceptance criteria as common lane constraints.
|
|
@@ -413,7 +448,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
|
|
|
413
448
|
- **Best When:** The task is large, multi-lane, or blocker-sensitive enough to justify coordinated parallel execution instead of a single persistent loop
|
|
414
449
|
- **Next Recommended Step:** Follow the team verification path when the coordinated execution phase finishes; checkpoint completion through `$ultragoal` by default, escalating to a separate Ralph loop only when the user explicitly asks for that persistent verification/fix owner
|
|
415
450
|
|
|
416
|
-
###
|
|
451
|
+
### 6. **Refine further**
|
|
417
452
|
- **Input Artifact:** Existing transcript, context snapshot, and current spec draft
|
|
418
453
|
- **Invocation:** Continue the interview loop
|
|
419
454
|
- **Consumer Behavior:** Re-enter questioning to resolve the highest-leverage remaining uncertainty
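The invocation lines listed for each handoff option share one spec path. The lookup below copies those invocation strings verbatim and fills in the spec path for a given slug. `HANDOFF_INVOCATIONS` and `handoff_commands` are hypothetical names, and "Refine further" is left out because it has no command.

```python
HANDOFF_INVOCATIONS = {
    "$ultragoal": [
        "$ultragoal create-goals --brief-file {spec}",
        "$ultragoal complete-goals",
    ],
    "$ralplan": ["$plan --consensus --direct {spec}"],
    "$autopilot": ["$autopilot {spec}"],
    "$ralph": ["$ralph {spec}"],  # explicit fallback only
    "$team": ["$team {spec}"],
}


def handoff_commands(option: str, slug: str) -> list[str]:
    """Render the invocation(s) for one handoff option against the deep-interview spec."""
    spec = f".omx/specs/deep-interview-{slug}.md"
    return [template.format(spec=spec) for template in HANDOFF_INVOCATIONS[option]]
```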
|
|
@@ -437,6 +472,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
|
|
|
437
472
|
- Use `omx state write/read --input '<json>' --json` for resumable mode state; `state_write` / `state_read` are explicit MCP compatibility fallbacks only
|
|
438
473
|
- If the interview cannot ask a required `omx question` round, persist the blocker as terminal state with `active: false` and `current_phase: "blocked"`; do not write a terminal blocked phase with `active: true`
|
|
439
474
|
- Read/write context snapshots under `.omx/context/`
|
|
475
|
+
- Read applicable repo docs/rules/context during preflight; write durable docs, glossary, ADR, or memory updates only when the user explicitly opts in and the content is public-safe
|
|
440
476
|
- Record whether the oversized-context summary gate is not needed, pending, or satisfied before any scoring or handoff step
|
|
441
477
|
- Save transcript/spec artifacts under `.omx/interviews/` and `.omx/specs/`
|
|
442
478
|
</Tool_Usage>
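The blocked-state rule amounts to this: a terminal `blocked` phase is always written with `active: false`. The sketch below builds such a payload and quotes it for the `omx state write --input '<json>' --json` form named above. The `blocker` key and the helper names are hypothetical. Any other fields the real mode state requires are omitted here.

```python
import json
import shlex


def validate_terminal_state(state: dict) -> None:
    """Reject a blocked phase that is still marked active."""
    if state.get("current_phase") == "blocked" and state.get("active") is not False:
        raise ValueError("a blocked phase must be written with active: false")


def blocked_state_payload(reason: str) -> str:
    """JSON for the terminal state written when a required `omx question` round cannot run."""
    state = {"active": False, "current_phase": "blocked", "blocker": reason}
    validate_terminal_state(state)
    return json.dumps(state)


def state_write_command(payload: str) -> str:
    # shlex.quote keeps the JSON a single shell word even if it contains quotes.
    return f"omx state write --input {shlex.quote(payload)} --json"
```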
|
|
@@ -460,7 +496,11 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
|
|
|
460
496
|
- [ ] Transcript written to `.omx/interviews/{slug}-{timestamp}.md`
|
|
461
497
|
- [ ] Spec written to `.omx/specs/deep-interview-{slug}.md`
|
|
462
498
|
- [ ] Brownfield questions use evidence-backed confirmation when applicable
|
|
463
|
-
- [ ]
|
|
499
|
+
- [ ] Brownfield preflight inspected applicable repo docs/rules/context before user-facing questions
|
|
500
|
+
- [ ] Fuzzy or conflicting terminology was challenged against repo language/current code behavior when applicable
|
|
501
|
+
- [ ] Scenario-based edge-case grilling was used when boundary ambiguity would materially affect implementation
|
|
502
|
+
- [ ] Durable docs/ADR/memory updates, if any, were explicitly opted into and public-safe
|
|
503
|
+
- [ ] Handoff options provided (`$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, `$team`) plus context-sensitive goal-mode suggestions (`$autoresearch-goal`, `$performance-goal`) when applicable
|
|
464
504
|
- [ ] No direct implementation performed in this mode
|
|
465
505
|
</Final_Checklist>
|
|
466
506
|
|
package/skills/ralph/SKILL.md
CHANGED
|
@@ -26,14 +26,14 @@ Ralph is a persistence loop that keeps working on a task until it is fully compl
|
|
|
26
26
|
</Do_Not_Use_When>
|
|
27
27
|
|
|
28
28
|
<Why_This_Exists>
|
|
29
|
-
Complex tasks often fail silently: partial implementations get declared "done", tests get skipped, edge cases get forgotten. Ralph prevents this by looping until work is genuinely complete, requiring fresh verification evidence before allowing completion, and using
|
|
29
|
+
Complex tasks often fail silently: partial implementations get declared "done", tests get skipped, edge cases get forgotten. Ralph prevents this by looping until work is genuinely complete, requiring fresh verification evidence before allowing completion, and using explicit architect native-subagent verification to confirm quality.
|
|
30
30
|
</Why_This_Exists>
|
|
31
31
|
|
|
32
32
|
<Execution_Policy>
|
|
33
33
|
- Fire independent agent calls simultaneously -- never wait sequentially for independent work
|
|
34
34
|
- Use `run_in_background: true` for long operations (installs, builds, test suites)
|
|
35
|
-
- Always
|
|
36
|
-
-
|
|
35
|
+
- Always set `agent_type` when spawning native subagents; use `reasoning_effort` for per-dispatch intensity when needed
|
|
36
|
+
- Preserve legacy Ralph tier intent through native reasoning effort: LOW -> `low`, STANDARD -> `medium`, THOROUGH -> `xhigh`
|
|
37
37
|
- Deliver the full implementation: no scope reduction, no partial completion, no deleting tests to make them pass
|
|
38
38
|
- Apply the shared workflow guidance pattern: outcome-first framing, concise visible updates for multi-step execution, local overrides for the active workflow branch, validation proportional to risk, explicit stop rules, and automatic continuation for safe reversible steps. Ask only for material, destructive, credentialed, external-production, or preference-dependent branches.
|
|
39
39
|
- Integrate with Codex goal mode when goal tools are available: inspect the active thread goal with `get_goal`, preserve it as the top-level stop condition, and only call `update_goal({status: "complete"})` after a Ralph completion audit proves the objective is actually achieved.
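The legacy tier mapping in the policy above (LOW -> `low`, STANDARD -> `medium`, THOROUGH -> `xhigh`) can be written as a lookup table. The hypothetical `native_dispatch` helper below turns a legacy tier into the keyword arguments of a `task(...)` call. It also refuses to build a dispatch without an `agent_type`.

```python
LEGACY_TIER_TO_EFFORT = {"LOW": "low", "STANDARD": "medium", "THOROUGH": "xhigh"}


def native_dispatch(agent_type: str, legacy_tier: str, prompt: str) -> dict:
    """Translate a legacy Ralph tier into task(...) keyword arguments."""
    if not agent_type:
        raise ValueError("agent_type must always be set when spawning native subagents")
    return {
        "agent_type": agent_type,
        "reasoning_effort": LEGACY_TIER_TO_EFFORT[legacy_tier.upper()],
        "prompt": prompt,
    }


# Mirrors the "Refactor auth module" delegation example later in this skill.
refactor = native_dispatch("executor", "THOROUGH", "Refactor auth module to support OAuth2 flow")
```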
|
|
@@ -54,10 +54,10 @@ Complex tasks often fail silently: partial implementations get declared "done",
|
|
|
54
54
|
- Do not begin Ralph execution work (delegation, implementation, or verification loops) until snapshot grounding exists. If forced to proceed quickly, note explicit risk tradeoffs.
|
|
55
55
|
1. **Review progress**: Check TODO list and any prior iteration state
|
|
56
56
|
2. **Continue from where you left off**: Pick up incomplete tasks
|
|
57
|
-
3. **Delegate in parallel**: Route tasks to specialist agents
|
|
58
|
-
- Simple lookups:
|
|
59
|
-
- Standard work:
|
|
60
|
-
- Complex analysis:
|
|
57
|
+
3. **Delegate in parallel**: Route tasks to specialist native agents with explicit `agent_type` and appropriate `reasoning_effort`
|
|
58
|
+
- Simple lookups: `reasoning_effort="low"` -- "What does this function return?"
|
|
59
|
+
- Standard work: `reasoning_effort="medium"` -- "Add error handling to this module"
|
|
60
|
+
- Complex analysis: `reasoning_effort="xhigh"` -- "Debug this race condition"
|
|
61
61
|
- When Ralph is entered as a ralplan follow-up, start from the approved **available-agent-types roster** and make the delegation plan explicit: implementation lane, evidence/regression lane, and final sign-off lane using only known agent types
|
|
62
62
|
4. **Run long operations in background**: Builds, installs, test suites use `run_in_background: true`
|
|
63
63
|
5. **Visual task gate (when screenshot/reference images are present)**:
|
|
@@ -72,11 +72,11 @@ Complex tasks often fail silently: partial implementations get declared "done",
|
|
|
72
72
|
b. Run verification (test, build, lint)
|
|
73
73
|
c. Read the output -- confirm it actually passed
|
|
74
74
|
d. Check: zero pending/in_progress TODO items
|
|
75
|
-
7. **Architect verification** (
|
|
76
|
-
- <5 files, <100 lines with full tests:
|
|
77
|
-
- Standard changes:
|
|
78
|
-
- >20 files or security/architectural changes:
|
|
79
|
-
- Ralph floor: always
|
|
75
|
+
7. **Architect verification** (native role):
|
|
76
|
+
- <5 files, <100 lines with full tests: `task(agent_type="architect", reasoning_effort="medium", prompt="...")` at minimum
|
|
77
|
+
- Standard changes: `task(agent_type="architect", reasoning_effort="medium", prompt="...")`
|
|
78
|
+
- >20 files or security/architectural changes: `task(agent_type="architect", reasoning_effort="xhigh", prompt="...")`
|
|
79
|
+
- Ralph floor: always run an explicit `architect` native subagent, even for small changes
|
|
80
80
|
7.5 **Mandatory Deslop Pass**:
|
|
81
81
|
- After Step 7 passes, run `oh-my-codex:ai-slop-cleaner` on **all files changed during the Ralph session**.
|
|
82
82
|
- Scope the cleaner to **changed files only**; do not widen the pass beyond Ralph-owned edits.
|
|
@@ -87,7 +87,7 @@ Complex tasks often fail silently: partial implementations get declared "done",
|
|
|
87
87
|
- If post-deslop regression fails, roll back cleaner changes or fix and retry. Then rerun Step 7.5 and Step 7.6 until the regression is green.
|
|
88
88
|
- Do not proceed to completion until post-deslop regression is green (unless `--no-deslop` explicitly skipped the deslop pass).
|
|
89
89
|
8. **On approval**: If Codex goal mode is active, call `update_goal({status: "complete"})` before `/cancel`; report final elapsed time and token-budget usage when the tool returns it. Then run `/cancel` to cleanly exit and clean up all state files.
|
|
90
|
-
9. **On rejection**: Fix the issues raised, then re-verify
|
|
90
|
+
9. **On rejection**: Fix the issues raised, then re-verify with the same `agent_type` and `reasoning_effort` profile
|
|
91
91
|
</Steps>
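Step 7's sizing rules reduce to a two-way choice. Small fully tested changes and standard changes both use `medium`. More than 20 files, or any security/architectural change, uses `xhigh`. The architect lane is always dispatched. The sketch below is hypothetical and also shows step 9 reusing the same profile after a rejection.

```python
def architect_verification_kwargs(
    files_changed: int,
    security_or_architectural: bool,
    prompt: str = "verify completion",
) -> dict:
    """task(...) keyword arguments for Ralph's mandatory architect verification."""
    effort = "xhigh" if files_changed > 20 or security_or_architectural else "medium"
    # Ralph floor: the architect lane runs even for tiny diffs; medium is the minimum.
    return {"agent_type": "architect", "reasoning_effort": effort, "prompt": prompt}


first = architect_verification_kwargs(3, False)
# Step 9: after fixing rejected issues, re-verify with the same agent_type/reasoning_effort.
retry = dict(first)
```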
|
|
92
92
|
|
|
93
93
|
<Tool_Usage>
|
|
@@ -150,11 +150,11 @@ Use the CLI-first state surface for Ralph lifecycle state (`omx state write/read
|
|
|
150
150
|
<Good>
|
|
151
151
|
Correct parallel delegation:
|
|
152
152
|
```
|
|
153
|
-
|
|
154
|
-
|
|
155
|
-
|
|
153
|
+
task(agent_type="executor", reasoning_effort="low", prompt="Add type export for UserConfig")
|
|
154
|
+
task(agent_type="executor", reasoning_effort="medium", prompt="Implement the caching layer for API responses")
|
|
155
|
+
task(agent_type="executor", reasoning_effort="xhigh", prompt="Refactor auth module to support OAuth2 flow")
|
|
156
156
|
```
|
|
157
|
-
Why good: Three independent tasks fired simultaneously
|
|
157
|
+
Why good: Three independent tasks fired simultaneously while explicitly selecting the installed `executor` native role, so the UI/tracker does not show default subagents; legacy tier intent is preserved through native reasoning effort (`LOW` -> `low`, `STANDARD` -> `medium`, `THOROUGH` -> `xhigh`).
|
|
158
158
|
</Good>
|
|
159
159
|
|
|
160
160
|
<Good>
|
|
@@ -163,7 +163,7 @@ Correct verification before completion:
|
|
|
163
163
|
1. Run: npm test → Output: "42 passed, 0 failed"
|
|
164
164
|
2. Run: npm run build → Output: "Build succeeded"
|
|
165
165
|
3. Run: lsp_diagnostics → Output: 0 errors
|
|
166
|
-
4.
|
|
166
|
+
4. task(agent_type="architect", reasoning_effort="medium", prompt="verify completion") → Verdict: "APPROVED"
|
|
167
167
|
5. Run /cancel
|
|
168
168
|
```
|
|
169
169
|
Why good: Fresh evidence at each step, architect verification, then clean exit.
|
|
@@ -178,9 +178,9 @@ Why bad: Uses "should" and "look good" -- no fresh test/build output, no archite
|
|
|
178
178
|
<Bad>
|
|
179
179
|
Sequential execution of independent tasks:
|
|
180
180
|
```
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
|
|
181
|
+
task(agent_type="executor", reasoning_effort="low", prompt="Add type export") → wait →
|
|
182
|
+
task(agent_type="executor", reasoning_effort="medium", prompt="Implement caching") → wait →
|
|
183
|
+
task(agent_type="executor", reasoning_effort="xhigh", prompt="Refactor auth")
|
|
184
184
|
```
|
|
185
185
|
Why bad: These are independent tasks that should run in parallel, not sequentially.
|
|
186
186
|
</Bad>
|
|
@@ -200,7 +200,7 @@ Why bad: These are independent tasks that should run in parallel, not sequential
|
|
|
200
200
|
- [ ] Fresh test run output shows all tests pass
|
|
201
201
|
- [ ] Fresh build output shows success
|
|
202
202
|
- [ ] lsp_diagnostics shows 0 errors on affected files
|
|
203
|
-
- [ ] Architect verification passed (
|
|
203
|
+
- [ ] Architect verification passed through an explicit `task(agent_type="architect", reasoning_effort="medium", ...)` call at minimum
|
|
204
204
|
- [ ] Codex goal-mode completion audit passed, and `update_goal({status: "complete"})` was called when an active goal exists
|
|
205
205
|
- [ ] ai-slop-cleaner pass completed on changed files (or --no-deslop specified)
|
|
206
206
|
- [ ] Post-deslop regression tests pass
|
package/skills/ultraqa/SKILL.md
CHANGED
|
@@ -58,6 +58,15 @@ The matrix must include normal-path coverage plus adversarial dynamic e2e scenar
|
|
|
58
58
|
- Validate exit codes and output semantics; do not trust success-looking text alone.
|
|
59
59
|
- Do not delete, rewrite, or mask unrelated user work. Capture dirty-worktree evidence before and after generated harness work.
|
|
60
60
|
|
|
61
|
+
### Temporary Harness Generation Guardrails
|
|
62
|
+
|
|
63
|
+
Generated harnesses are part of the QA evidence chain; until setup succeeds, they are evidence about the harness apparatus, not product behavior.
|
|
64
|
+
|
|
65
|
+
- **Use absolute repo imports for built artifacts.** When a harness runs from `/tmp` or another scratch directory but imports repository code, resolve the repository root explicitly from the verified repo cwd and import built modules with an absolute path or `pathToFileURL(join(repoRoot, "dist", ...)).href`. Never rely on `./dist/...` from the harness file's temporary directory.
|
|
66
|
+
- **Use a safe file writer for JS/TS harness bodies.** Prefer a small Node/Python writer or another non-interpolating file-write mechanism for harness source that contains backticks, `${...}`, shell metacharacters, or prompt-injection strings. If a shell heredoc is unavoidable, quote the delimiter and verify the written file before execution; do not use interpolating heredocs for JavaScript assertions.
|
|
67
|
+
- **Sanitize OMX runtime env for isolated probes.** When the scenario creates a temporary repo/state tree or intentionally checks local isolation, run the probe with `OMX_ROOT` and `OMX_STATE_ROOT` unset (for example `env -u OMX_ROOT -u OMX_STATE_ROOT ...`) so ambient boxed runtime state cannot redirect reads/writes away from the scenario fixture.
|
|
68
|
+
- **Classify harness setup failures separately.** If a generated harness fails before exercising product behavior because of import paths, shell interpolation, environment leakage, or fixture construction, record it as harness debris, fix the harness, and rerun the scenario before declaring a product defect.
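The first three guardrails above are sketched below in Python. The skill's own hint uses Node's `pathToFileURL(join(repoRoot, "dist", ...)).href`, and `Path.as_uri()` is the Python analogue. The helper names and the `dist/cli/index.js` module path are hypothetical. The code only resolves an absolute module URL, writes harness source without any shell interpolation, and runs a probe with `OMX_ROOT`/`OMX_STATE_ROOT` removed.

```python
import os
import subprocess
from pathlib import Path

ISOLATION_UNSET = ("OMX_ROOT", "OMX_STATE_ROOT")


def dist_module_url(repo_root: Path, *parts: str) -> str:
    """Absolute file:// URL for a built module, independent of the harness's cwd."""
    return repo_root.resolve().joinpath("dist", *parts).as_uri()


def write_harness(path: Path, source: str) -> None:
    """Write harness source verbatim; no shell is involved, so `${...}` stays literal."""
    path.write_text(source, encoding="utf-8")
    # Verify the file before executing it, as the guardrail requires.
    if path.read_text(encoding="utf-8") != source:
        raise RuntimeError(f"harness file {path} was not written verbatim")


def isolated_env(base=None) -> dict:
    """Copy of the environment with ambient OMX runtime roots removed."""
    env = dict(os.environ if base is None else base)
    for key in ISOLATION_UNSET:
        env.pop(key, None)
    return env


def run_probe(cmd: list, cwd: str) -> subprocess.CompletedProcess:
    """Run a probe so ambient boxed state cannot redirect reads/writes."""
    return subprocess.run(cmd, cwd=cwd, env=isolated_env(), capture_output=True, text=True)
```

Classifying a failure as harness debris still depends on whether the probe reached product code, which is a judgment this sketch leaves to the caller.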
|
|
69
|
+
|
|
61
70
|
## Cycle Workflow
|
|
62
71
|
|
|
63
72
|
### Cycle N (Max 5)
|