rafcode 3.2.1 → 3.8.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/settings.local.json +3 -1
- package/CLAUDE.md +0 -1
- package/RAF/41-echo-chamber/decisions.md +13 -0
- package/RAF/41-echo-chamber/input.md +4 -0
- package/RAF/41-echo-chamber/outcomes/1-update-codex-model-defaults.md +24 -0
- package/RAF/41-echo-chamber/outcomes/2-e2e-test-codex-provider.md +74 -0
- package/RAF/41-echo-chamber/plans/1-update-codex-model-defaults.md +28 -0
- package/RAF/41-echo-chamber/plans/2-e2e-test-codex-provider.md +103 -0
- package/RAF/42-patch-parade/decisions.md +29 -0
- package/RAF/42-patch-parade/input.md +9 -0
- package/RAF/42-patch-parade/outcomes/1-fix-codex-model-resolution.md +36 -0
- package/RAF/42-patch-parade/outcomes/2-fix-provider-aware-name-generation.md +31 -0
- package/RAF/42-patch-parade/outcomes/3-fix-codex-error-event-rendering.md +32 -0
- package/RAF/42-patch-parade/outcomes/4-update-cli-help-docs.md +28 -0
- package/RAF/42-patch-parade/outcomes/5-update-default-codex-models-to-gpt-5-4.md +33 -0
- package/RAF/42-patch-parade/outcomes/6-unify-model-config-schema.md +89 -0
- package/RAF/42-patch-parade/plans/1-fix-codex-model-resolution.md +35 -0
- package/RAF/42-patch-parade/plans/2-fix-provider-aware-name-generation.md +38 -0
- package/RAF/42-patch-parade/plans/3-fix-codex-error-event-rendering.md +32 -0
- package/RAF/42-patch-parade/plans/4-update-cli-help-docs.md +31 -0
- package/RAF/42-patch-parade/plans/5-update-default-codex-models-to-gpt-5-4.md +35 -0
- package/RAF/42-patch-parade/plans/6-unify-model-config-schema.md +46 -0
- package/RAF/43-swiss-army/decisions.md +34 -0
- package/RAF/43-swiss-army/input.md +7 -0
- package/RAF/43-swiss-army/outcomes/1-fix-model-validation.md +21 -0
- package/RAF/43-swiss-army/outcomes/2-update-commit-format.md +31 -0
- package/RAF/43-swiss-army/outcomes/3-wire-reasoning-effort.md +28 -0
- package/RAF/43-swiss-army/outcomes/4-remove-provider-flag.md +27 -0
- package/RAF/43-swiss-army/outcomes/5-config-wizard-validation.md +23 -0
- package/RAF/43-swiss-army/outcomes/6-add-fast-mode.md +32 -0
- package/RAF/43-swiss-army/outcomes/7-config-preset.md +31 -0
- package/RAF/43-swiss-army/plans/1-fix-model-validation.md +38 -0
- package/RAF/43-swiss-army/plans/2-update-commit-format.md +46 -0
- package/RAF/43-swiss-army/plans/3-wire-reasoning-effort.md +39 -0
- package/RAF/43-swiss-army/plans/4-remove-provider-flag.md +43 -0
- package/RAF/43-swiss-army/plans/5-config-wizard-validation.md +42 -0
- package/RAF/43-swiss-army/plans/6-add-fast-mode.md +46 -0
- package/RAF/43-swiss-army/plans/7-config-preset.md +51 -0
- package/RAF/44-config-api-change/decisions.md +22 -0
- package/RAF/44-config-api-change/input.md +5 -0
- package/RAF/44-config-api-change/outcomes/1-restructure-config-subcommands.md +19 -0
- package/RAF/44-config-api-change/outcomes/2-move-preset-under-config.md +17 -0
- package/RAF/44-config-api-change/outcomes/3-update-existing-tests-for-config-api.md +14 -0
- package/RAF/44-config-api-change/outcomes/4-update-config-command-docs.md +11 -0
- package/RAF/44-config-api-change/outcomes/5-fix-codex-name-generation.md +18 -0
- package/RAF/44-config-api-change/plans/1-restructure-config-subcommands.md +37 -0
- package/RAF/44-config-api-change/plans/2-move-preset-under-config.md +38 -0
- package/RAF/44-config-api-change/plans/3-update-existing-tests-for-config-api.md +38 -0
- package/RAF/44-config-api-change/plans/4-update-config-command-docs.md +36 -0
- package/RAF/44-config-api-change/plans/5-fix-codex-name-generation.md +49 -0
- package/RAF/45-signal-cairn/decisions.md +7 -0
- package/RAF/45-signal-cairn/input.md +2 -0
- package/RAF/45-signal-cairn/outcomes/1-rename-provider-to-harness.md +19 -0
- package/RAF/45-signal-cairn/outcomes/2-normalize-model-display-names.md +18 -0
- package/RAF/45-signal-cairn/plans/1-rename-provider-to-harness.md +40 -0
- package/RAF/45-signal-cairn/plans/2-normalize-model-display-names.md +41 -0
- package/RAF/45-signal-lantern/decisions.md +10 -0
- package/RAF/45-signal-lantern/input.md +2 -0
- package/RAF/45-signal-lantern/outcomes/1-add-effort-and-fast-to-do-model-display.md +15 -0
- package/RAF/45-signal-lantern/outcomes/2-capture-codex-post-run-token-usage.md +15 -0
- package/RAF/45-signal-lantern/outcomes/3-show-codex-token-summaries-without-fake-cost.md +14 -0
- package/RAF/45-signal-lantern/plans/1-add-effort-and-fast-to-do-model-display.md +38 -0
- package/RAF/45-signal-lantern/plans/2-capture-codex-post-run-token-usage.md +37 -0
- package/RAF/45-signal-lantern/plans/3-show-codex-token-summaries-without-fake-cost.md +40 -0
- package/RAF/46-lantern-arc/decisions.md +19 -0
- package/RAF/46-lantern-arc/input.md +6 -0
- package/RAF/46-lantern-arc/outcomes/1-remove-spark-alias.md +16 -0
- package/RAF/46-lantern-arc/outcomes/2-clean-up-worktree-plan-command.md +30 -0
- package/RAF/46-lantern-arc/outcomes/3-fix-token-usage-accumulation.md +32 -0
- package/RAF/46-lantern-arc/outcomes/4-display-effort-in-compact-mode.md +22 -0
- package/RAF/46-lantern-arc/outcomes/5-codex-fast-mode-research.md +38 -0
- package/RAF/46-lantern-arc/outcomes/6-optimize-llm-prompts.md +39 -0
- package/RAF/46-lantern-arc/plans/1-remove-spark-alias.md +38 -0
- package/RAF/46-lantern-arc/plans/2-clean-up-worktree-plan-command.md +33 -0
- package/RAF/46-lantern-arc/plans/3-fix-token-usage-accumulation.md +33 -0
- package/RAF/46-lantern-arc/plans/4-display-effort-in-compact-mode.md +28 -0
- package/RAF/46-lantern-arc/plans/5-codex-fast-mode-research.md +34 -0
- package/RAF/46-lantern-arc/plans/6-optimize-llm-prompts.md +48 -0
- package/RAF/47-signal-trim/decisions.md +13 -0
- package/RAF/47-signal-trim/input.md +2 -0
- package/RAF/47-signal-trim/plans/1-remove-cache-from-status.md +73 -0
- package/README.md +47 -57
- package/dist/commands/config.d.ts.map +1 -1
- package/dist/commands/config.js +47 -49
- package/dist/commands/config.js.map +1 -1
- package/dist/commands/do.d.ts +2 -0
- package/dist/commands/do.d.ts.map +1 -1
- package/dist/commands/do.js +57 -44
- package/dist/commands/do.js.map +1 -1
- package/dist/commands/plan.d.ts.map +1 -1
- package/dist/commands/plan.js +36 -153
- package/dist/commands/plan.js.map +1 -1
- package/dist/commands/preset.d.ts +3 -0
- package/dist/commands/preset.d.ts.map +1 -0
- package/dist/commands/preset.js +158 -0
- package/dist/commands/preset.js.map +1 -0
- package/dist/core/claude-runner.d.ts +2 -0
- package/dist/core/claude-runner.d.ts.map +1 -1
- package/dist/core/claude-runner.js +36 -12
- package/dist/core/claude-runner.js.map +1 -1
- package/dist/core/codex-runner.d.ts +1 -0
- package/dist/core/codex-runner.d.ts.map +1 -1
- package/dist/core/codex-runner.js +26 -7
- package/dist/core/codex-runner.js.map +1 -1
- package/dist/core/failure-analyzer.js +2 -1
- package/dist/core/failure-analyzer.js.map +1 -1
- package/dist/core/git.d.ts +2 -2
- package/dist/core/git.d.ts.map +1 -1
- package/dist/core/git.js +53 -3
- package/dist/core/git.js.map +1 -1
- package/dist/core/pull-request.js +3 -3
- package/dist/core/pull-request.js.map +1 -1
- package/dist/core/runner-factory.d.ts +4 -4
- package/dist/core/runner-factory.d.ts.map +1 -1
- package/dist/core/runner-factory.js +8 -8
- package/dist/core/runner-factory.js.map +1 -1
- package/dist/core/runner-interface.d.ts +1 -1
- package/dist/core/runner-types.d.ts +17 -4
- package/dist/core/runner-types.d.ts.map +1 -1
- package/dist/parsers/codex-stream-renderer.d.ts +7 -0
- package/dist/parsers/codex-stream-renderer.d.ts.map +1 -1
- package/dist/parsers/codex-stream-renderer.js +37 -4
- package/dist/parsers/codex-stream-renderer.js.map +1 -1
- package/dist/prompts/amend.d.ts.map +1 -1
- package/dist/prompts/amend.js +29 -101
- package/dist/prompts/amend.js.map +1 -1
- package/dist/prompts/execution.d.ts.map +1 -1
- package/dist/prompts/execution.js +17 -34
- package/dist/prompts/execution.js.map +1 -1
- package/dist/prompts/planning.d.ts.map +1 -1
- package/dist/prompts/planning.js +21 -120
- package/dist/prompts/planning.js.map +1 -1
- package/dist/types/config.d.ts +33 -31
- package/dist/types/config.d.ts.map +1 -1
- package/dist/types/config.js +14 -28
- package/dist/types/config.js.map +1 -1
- package/dist/utils/config.d.ts +36 -16
- package/dist/utils/config.d.ts.map +1 -1
- package/dist/utils/config.js +209 -104
- package/dist/utils/config.js.map +1 -1
- package/dist/utils/name-generator.d.ts.map +1 -1
- package/dist/utils/name-generator.js +25 -12
- package/dist/utils/name-generator.js.map +1 -1
- package/dist/utils/terminal-symbols.d.ts +15 -2
- package/dist/utils/terminal-symbols.d.ts.map +1 -1
- package/dist/utils/terminal-symbols.js +36 -4
- package/dist/utils/terminal-symbols.js.map +1 -1
- package/dist/utils/token-tracker.d.ts +6 -1
- package/dist/utils/token-tracker.d.ts.map +1 -1
- package/dist/utils/token-tracker.js +84 -51
- package/dist/utils/token-tracker.js.map +1 -1
- package/dist/utils/validation.d.ts +1 -2
- package/dist/utils/validation.d.ts.map +1 -1
- package/dist/utils/validation.js +4 -25
- package/dist/utils/validation.js.map +1 -1
- package/package.json +1 -1
- package/src/commands/config.ts +60 -63
- package/src/commands/do.ts +63 -51
- package/src/commands/plan.ts +34 -165
- package/src/commands/preset.ts +186 -0
- package/src/core/claude-runner.ts +45 -5
- package/src/core/codex-runner.ts +32 -7
- package/src/core/failure-analyzer.ts +2 -1
- package/src/core/git.ts +57 -3
- package/src/core/pull-request.ts +3 -3
- package/src/core/runner-factory.ts +9 -9
- package/src/core/runner-interface.ts +1 -1
- package/src/core/runner-types.ts +17 -4
- package/src/parsers/codex-stream-renderer.ts +47 -4
- package/src/prompts/amend.ts +29 -101
- package/src/prompts/config-docs.md +206 -62
- package/src/prompts/execution.ts +17 -34
- package/src/prompts/planning.ts +21 -120
- package/src/types/config.ts +47 -58
- package/src/utils/config.ts +248 -115
- package/src/utils/name-generator.ts +29 -13
- package/src/utils/terminal-symbols.ts +46 -6
- package/src/utils/token-tracker.ts +96 -57
- package/src/utils/validation.ts +5 -30
- package/tests/unit/amend-prompt.test.ts +3 -2
- package/tests/unit/claude-runner-interactive.test.ts +21 -3
- package/tests/unit/claude-runner.test.ts +39 -0
- package/tests/unit/codex-runner.test.ts +163 -0
- package/tests/unit/codex-stream-renderer.test.ts +127 -0
- package/tests/unit/command-output.test.ts +57 -0
- package/tests/unit/commit-planning-artifacts-worktree.test.ts +24 -7
- package/tests/unit/commit-planning-artifacts.test.ts +26 -4
- package/tests/unit/config-command.test.ts +215 -303
- package/tests/unit/config.test.ts +319 -235
- package/tests/unit/dependency-integration.test.ts +27 -1
- package/tests/unit/do-model-display.test.ts +35 -0
- package/tests/unit/execution-prompt.test.ts +49 -19
- package/tests/unit/name-generator.test.ts +82 -12
- package/tests/unit/plan-command-auto-flag.test.ts +7 -10
- package/tests/unit/plan-command.test.ts +14 -17
- package/tests/unit/planning-prompt.test.ts +9 -8
- package/tests/unit/terminal-symbols.test.ts +94 -3
- package/tests/unit/token-tracker.test.ts +180 -1
- package/tests/unit/validation.test.ts +9 -41
- package/tests/unit/worktree-flag-override.test.ts +0 -186
package/CLAUDE.md
CHANGED
|
@@ -8,4 +8,3 @@ Node.js CLI tool that orchestrates task planning and execution via Claude Code C
|
|
|
8
8
|
|
|
9
9
|
- Keep README.md updated when adding/changing CLI commands, flags, or features
|
|
10
10
|
- This app has no users. Make whatever changes you want. This project is super greenfield. It's ok if you change the schema entirely.
|
|
11
|
-
- The role of this file is to describe common mistakes and confusion points that agents might encounter as they work in this project. If you ever encounter something in the project that surprises you, please alert the developer working with you and indicate that this is the case in the AgentMD file to help prevent future agents from having the same issue.
|
|
@@ -0,0 +1,13 @@
|
|
|
1
|
+
# Project Decisions
|
|
2
|
+
|
|
3
|
+
## Should this be a single task or split?
|
|
4
|
+
Single task to re-run all E2E test phases, plus a separate config update task first.
|
|
5
|
+
|
|
6
|
+
## Should interactive flows (raf-dev plan --provider codex) be tested?
|
|
7
|
+
Yes, try interactive too — attempt raf-dev plan --provider codex interactively via PTY, documenting any difficulties.
|
|
8
|
+
|
|
9
|
+
## Which models to test?
|
|
10
|
+
Try all available models. User's Codex CLI now has: gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2-codex, gpt-5.2, gpt-5.1-codex-max, gpt-5.1-codex-mini.
|
|
11
|
+
|
|
12
|
+
## The configured default `gpt-5.3-codex-spark` doesn't exist in available models. Update config or just document?
|
|
13
|
+
Update the default config: use `gpt-5.3-codex` for easy/spark-tier tasks (nameGeneration, failureAnalysis, effort: low). User initially said gpt-5.4-mini but corrected to gpt-5.3-codex for easy tasks too.
|
|
@@ -0,0 +1,24 @@
|
|
|
1
|
+
# Outcome: Update Codex Model Defaults
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
Replaced all references to the defunct `gpt-5.3-codex-spark` model with `gpt-5.3-codex` across the codebase.
|
|
5
|
+
|
|
6
|
+
## Changes Made
|
|
7
|
+
|
|
8
|
+
### `src/types/config.ts`
|
|
9
|
+
- `codexModels.nameGeneration`: `gpt-5.3-codex-spark` → `gpt-5.3-codex`
|
|
10
|
+
- `codexModels.failureAnalysis`: `gpt-5.3-codex-spark` → `gpt-5.3-codex`
|
|
11
|
+
- `codexEffortMapping.low`: `gpt-5.3-codex-spark` → `gpt-5.3-codex`
|
|
12
|
+
|
|
13
|
+
### `src/utils/config.ts`
|
|
14
|
+
- Updated comment examples (2 places)
|
|
15
|
+
- Updated error message examples (2 places)
|
|
16
|
+
- Removed `gpt-5.3-codex-spark` entry from `CODEX_MODEL_TIER_ORDER`; updated tier comment; `spark` and `codex` now both at tier 1
|
|
17
|
+
- Removed `gpt-5.3-codex-spark` → `'spark'` mapping from `getModelShortName`
|
|
18
|
+
- Updated `MODEL_ALIAS_TO_FULL_ID.spark` to point to `gpt-5.3-codex`
|
|
19
|
+
|
|
20
|
+
## Notes
|
|
21
|
+
- The `spark` alias is preserved but now resolves to `gpt-5.3-codex` instead of the defunct `gpt-5.3-codex-spark`
|
|
22
|
+
- Build passes with no TypeScript errors
|
|
23
|
+
|
|
24
|
+
<promise>COMPLETE</promise>
|
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
# Outcome: E2E Test Codex Provider (Post-Fix Verification)
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
Comprehensive E2E testing of the Codex provider after fixes from RAF[38:8]. All 3 previously-found critical/major issues are confirmed fixed. Full task execution (`raf-dev do --provider codex`) works end-to-end. Two new minor issues discovered.
|
|
5
|
+
|
|
6
|
+
## Test Results
|
|
7
|
+
|
|
8
|
+
### Fix #1: JSONL Stream Renderer (was CRITICAL) — PASS
|
|
9
|
+
- 10/10 unit tests pass against `renderCodexStreamEvent()`
|
|
10
|
+
- Tested: `item.completed` (agent_message, command_execution, file_change), `error`, `turn.failed`, `turn.completed` (usage), flat-format events (AgentMessage, CommandExecution), unknown events, `item.started` (skipped)
|
|
11
|
+
- All events produce correct display and textContent — the bug where events hit the default case and produced empty output is fixed
|
|
12
|
+
|
|
13
|
+
### Fix #2: `--provider` CLI Flag (was CRITICAL) — PASS
|
|
14
|
+
- 4/4 runner factory tests pass
|
|
15
|
+
- `createRunner({ provider: 'codex' })` → `CodexRunner` (not `ClaudeRunner`)
|
|
16
|
+
- `do.ts` line 200: `-p, --provider` defined; line 402: `options.provider` forwarded
|
|
17
|
+
- `plan.ts` line 77: `-p, --provider` defined; lines 287/531/709: provider forwarded to `createRunner()`
|
|
18
|
+
|
|
19
|
+
### Fix #3: Error Events (was MAJOR) — PASS
|
|
20
|
+
- Top-level `error` events render correctly: `✗ Error: <message>`
|
|
21
|
+
- `turn.failed` with `message` field renders: `✗ Failed: <message>`
|
|
22
|
+
- Tested with real Codex error output (invalid model → 400 error)
|
|
23
|
+
|
|
24
|
+
### `raf-dev do --provider codex` (Full Flow) — PASS
|
|
25
|
+
- Spawned `codex exec --full-auto --json --ephemeral -m gpt-5.3-codex <prompt>`
|
|
26
|
+
- JSONL stream rendered correctly in verbose mode (agent messages, commands, file changes, usage)
|
|
27
|
+
- Task completed successfully: code modified, outcome written, commit created
|
|
28
|
+
- Usage data captured: in: 215481, out: 3420
|
|
29
|
+
- Total execution time: ~2m 25s
|
|
30
|
+
|
|
31
|
+
### `raf-dev plan --provider codex` (Interactive) — PARTIAL (PTY limitation)
|
|
32
|
+
- `--provider codex` flag accepted and routed correctly
|
|
33
|
+
- Command starts up and reaches editor prompt
|
|
34
|
+
- Full interactive PTY testing not possible from non-TTY context (Claude Code environment)
|
|
35
|
+
- Direct `codex` interactive mode also requires real TTY (`stdin is not a terminal`)
|
|
36
|
+
- **Conclusion**: Code path is wired correctly; full interactive testing requires a real terminal session
|
|
37
|
+
|
|
38
|
+
### Model Resolution — PASS
|
|
39
|
+
- `effort: low` → `gpt-5.3-codex` (updated in task 1) ✓
|
|
40
|
+
- `effort: medium` → `gpt-5.3-codex` ✓
|
|
41
|
+
- `effort: high` → `gpt-5.4` ✓
|
|
42
|
+
- `nameGeneration` → `gpt-5.3-codex` (updated in task 1) ✓
|
|
43
|
+
- `failureAnalysis` → `gpt-5.3-codex` (updated in task 1) ✓
|
|
44
|
+
- Note: Model resolution tests pass only with the worktree build (task 1 changes). The main project dist still has `gpt-5.3-codex-spark` references until this branch merges.
|
|
45
|
+
|
|
46
|
+
### Model Availability — PASS
|
|
47
|
+
- `gpt-5.4`: works ✓
|
|
48
|
+
- `gpt-5.4-mini`: works ✓
|
|
49
|
+
- `gpt-5.3-codex`: works ✓ (used in full flow test)
|
|
50
|
+
|
|
51
|
+
## New Issues Found
|
|
52
|
+
|
|
53
|
+
### NEW-1: `item.completed` with `item.type: "error"` not rendered (MINOR)
|
|
54
|
+
- **Severity**: Minor
|
|
55
|
+
- Codex emits `{"type":"item.completed","item":{"type":"error","message":"..."}}` for some errors
|
|
56
|
+
- The `renderItemCompleted()` switch only handles `agent_message`, `command_execution`, `file_change` — `error` falls to default (empty output)
|
|
57
|
+
- **Impact**: Low — Codex also emits a separate top-level `{"type":"error"}` event which IS handled, so the error message still appears
|
|
58
|
+
|
|
59
|
+
### NEW-2: `turn.failed` with nested `error.message` uses default text (MINOR)
|
|
60
|
+
- **Severity**: Minor
|
|
61
|
+
- Real Codex output: `{"type":"turn.failed","error":{"message":"..."}}`
|
|
62
|
+
- Renderer reads `event.message` but real event has `event.error.message`
|
|
63
|
+
- Displays "Turn failed" (default) instead of the actual error message
|
|
64
|
+
- **Impact**: Low — the preceding `error` event already displays the full message
|
|
65
|
+
|
|
66
|
+
## Comparison with RAF[38:8]
|
|
67
|
+
|
|
68
|
+
| Issue | RAF[38:8] Status | Current Status |
|
|
69
|
+
|-------|-----------------|----------------|
|
|
70
|
+
| JSONL stream renderer wrong format | CRITICAL - all events empty | FIXED ✓ |
|
|
71
|
+
| `--provider` flag no-op | CRITICAL - always used Claude | FIXED ✓ |
|
|
72
|
+
| Error events silently swallowed | MAJOR - no error display | FIXED ✓ |
|
|
73
|
+
|
|
74
|
+
<promise>COMPLETE</promise>
|
|
@@ -0,0 +1,28 @@
|
|
|
1
|
+
---
|
|
2
|
+
effort: low
|
|
3
|
+
---
|
|
4
|
+
# Task: Update Codex Model Defaults
|
|
5
|
+
|
|
6
|
+
## Objective
|
|
7
|
+
Replace the defunct `gpt-5.3-codex-spark` model with `gpt-5.4-mini` in the default Codex configuration.
|
|
8
|
+
|
|
9
|
+
## Context
|
|
10
|
+
The `gpt-5.3-codex-spark` model no longer exists in the Codex CLI model list. The user's Codex CLI now offers: gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2-codex, gpt-5.2, gpt-5.1-codex-max, gpt-5.1-codex-mini. The `gpt-5.3-codex` model should replace `gpt-5.3-codex-spark` for all lightweight/spark-tier uses.
|
|
11
|
+
|
|
12
|
+
## Requirements
|
|
13
|
+
- Replace all occurrences of `gpt-5.3-codex-spark` with `gpt-5.3-codex` in `src/types/config.ts`
|
|
14
|
+
- Update the `CodexModelAlias` type: rename the `'spark'` alias or update its mapping to point to `gpt-5.3-codex`
|
|
15
|
+
- Update any model resolution/mapping code in `src/utils/config.ts` that references `gpt-5.3-codex-spark`
|
|
16
|
+
- Update README.md if it mentions the old model name
|
|
17
|
+
|
|
18
|
+
## Implementation Steps
|
|
19
|
+
1. In `src/types/config.ts`, change `DEFAULT_CONFIG.codexModels.nameGeneration` from `'gpt-5.3-codex-spark'` to `'gpt-5.3-codex'`
|
|
20
|
+
2. In `src/types/config.ts`, change `DEFAULT_CONFIG.codexModels.failureAnalysis` from `'gpt-5.3-codex-spark'` to `'gpt-5.3-codex'`
|
|
21
|
+
3. In `src/types/config.ts`, change `DEFAULT_CONFIG.codexEffortMapping.low` from `'gpt-5.3-codex-spark'` to `'gpt-5.3-codex'`
|
|
22
|
+
4. Search for any other references to `gpt-5.3-codex-spark` across the codebase and update them (e.g., in `src/utils/config.ts` model resolution maps, README.md)
|
|
23
|
+
5. Run `npm run build` to verify no type errors
|
|
24
|
+
|
|
25
|
+
## Acceptance Criteria
|
|
26
|
+
- [ ] No references to `gpt-5.3-codex-spark` remain in the codebase
|
|
27
|
+
- [ ] `gpt-5.3-codex` is used for nameGeneration, failureAnalysis, and effort: low
|
|
28
|
+
- [ ] Build passes with no errors
|
|
@@ -0,0 +1,103 @@
|
|
|
1
|
+
---
|
|
2
|
+
effort: high
|
|
3
|
+
---
|
|
4
|
+
# Task: E2E Test Codex Provider (Post-Fix Verification)
|
|
5
|
+
|
|
6
|
+
## Objective
|
|
7
|
+
Verify that all 3 issues found in RAF[38:8] are fixed and that the Codex provider works end-to-end, including interactive flows.
|
|
8
|
+
|
|
9
|
+
## Context
|
|
10
|
+
RAF[38:8] E2E testing found 3 issues:
|
|
11
|
+
1. **CRITICAL**: JSONL stream renderer parsed wrong event format → Fixed in commit `d3ad381`
|
|
12
|
+
2. **CRITICAL**: `--provider` CLI flag was a no-op → Fixed in commit `1c55657`
|
|
13
|
+
3. **MAJOR**: Error events silently swallowed → Fixed in commit `d3ad381`
|
|
14
|
+
|
|
15
|
+
This task re-runs all scenarios to confirm the fixes work with real Codex CLI output.
|
|
16
|
+
|
|
17
|
+
## Dependencies
|
|
18
|
+
1
|
|
19
|
+
|
|
20
|
+
## Requirements
|
|
21
|
+
- Use `raf-dev` (not `raf`) for all testing
|
|
22
|
+
- Test ALL major scenarios: planning, execution, config/model resolution, error handling
|
|
23
|
+
- Test interactive flows (`raf-dev plan --provider codex`) this time — document any PTY difficulties
|
|
24
|
+
- Try all available models to verify they work: gpt-5.4, gpt-5.4-mini, gpt-5.3-codex
|
|
25
|
+
- Document all results in the outcome with PASS/FAIL per scenario
|
|
26
|
+
- Do NOT auto-create fix tasks — just document issues
|
|
27
|
+
|
|
28
|
+
## Implementation Steps
|
|
29
|
+
|
|
30
|
+
### Phase 1: Set up dummy project
|
|
31
|
+
1. Create a temporary dummy Node.js project at `/tmp/raf-codex-test-project/` with:
|
|
32
|
+
- `package.json` with name and basic scripts
|
|
33
|
+
- `src/index.ts` — a small file with intentional TODOs
|
|
34
|
+
- `tsconfig.json` — basic TypeScript config
|
|
35
|
+
- Initialize git repo (`git init && git add . && git commit`)
|
|
36
|
+
|
|
37
|
+
### Phase 2: Verify Fix #1 — JSONL Stream Renderer (was CRITICAL)
|
|
38
|
+
2. Write a small Node.js script that imports and tests `codex-stream-renderer.ts` directly with real Codex event formats:
|
|
39
|
+
- `{"type":"item.completed","item":{"type":"agent_message","text":"hello"}}` → should produce display + textContent
|
|
40
|
+
- `{"type":"item.completed","item":{"type":"command_execution","command":"ls","exit_code":0}}` → should produce display
|
|
41
|
+
- `{"type":"error","message":"something failed"}` → should produce error display
|
|
42
|
+
- `{"type":"turn.failed","reason":"timeout"}` → should produce failure display
|
|
43
|
+
- `{"type":"turn.completed","usage":{"input_tokens":100,"output_tokens":50}}` → should capture usage
|
|
44
|
+
3. Verify each produces non-empty output (the bug was all events hitting the default case and producing empty output)
|
|
45
|
+
|
|
46
|
+
### Phase 3: Verify Fix #2 — `--provider` CLI Flag (was CRITICAL)
|
|
47
|
+
4. Run `raf-dev do --provider codex` on the dummy project and verify:
|
|
48
|
+
- The `--provider` flag is actually read from Commander options
|
|
49
|
+
- `createRunner()` receives `provider: 'codex'`
|
|
50
|
+
- A `CodexRunner` is instantiated (not `ClaudeRunner`)
|
|
51
|
+
- The codex CLI binary is invoked (not claude)
|
|
52
|
+
5. Check `src/commands/do.ts` and `src/commands/plan.ts` to confirm `options.provider` is read and forwarded
|
|
53
|
+
|
|
54
|
+
### Phase 4: Verify Fix #3 — Error Events (was MAJOR)
|
|
55
|
+
6. Test with an invalid/unavailable model to trigger Codex error output
|
|
56
|
+
7. Verify error messages are displayed to the user (not silently swallowed)
|
|
57
|
+
|
|
58
|
+
### Phase 5: Test `raf-dev do --provider codex` (full flow)
|
|
59
|
+
8. Create a simple plan file in the dummy project with `effort: medium`
|
|
60
|
+
9. Run `raf-dev do --provider codex` and verify:
|
|
61
|
+
- Task execution starts correctly
|
|
62
|
+
- `codex exec --full-auto --json --ephemeral -m <model>` command is constructed properly
|
|
63
|
+
- JSONL stream output displays correctly in verbose mode
|
|
64
|
+
- Task completes and produces an outcome file
|
|
65
|
+
- Any commits are created correctly
|
|
66
|
+
|
|
67
|
+
### Phase 6: Test `raf-dev plan --provider codex` (interactive)
|
|
68
|
+
10. Run `raf-dev plan --provider codex` targeting the dummy project
|
|
69
|
+
- Provide a simple input like "add input validation to the exported functions"
|
|
70
|
+
- Verify: Does the PTY spawn correctly? Does Codex receive the prompt?
|
|
71
|
+
- Verify: Are plan files generated with correct frontmatter?
|
|
72
|
+
- Document any difficulties with PTY interaction
|
|
73
|
+
|
|
74
|
+
### Phase 7: Test model resolution with available models
|
|
75
|
+
11. Test effort-based model resolution:
|
|
76
|
+
- `effort: low` → should use `gpt-5.3-codex` (updated in task 1)
|
|
77
|
+
- `effort: medium` → should use `gpt-5.3-codex`
|
|
78
|
+
- `effort: high` → should use `gpt-5.4`
|
|
79
|
+
12. Test explicit model override in plan frontmatter (e.g., `model: codex/gpt-5.4`)
|
|
80
|
+
13. Try running with different models to verify they work: gpt-5.4, gpt-5.4-mini, gpt-5.3-codex
|
|
81
|
+
|
|
82
|
+
### Phase 8: Document results
|
|
83
|
+
14. Create outcome document with:
|
|
84
|
+
- Each scenario tested and PASS/FAIL
|
|
85
|
+
- Detailed description of any failures
|
|
86
|
+
- Severity assessment for new issues
|
|
87
|
+
- Comparison with RAF[38:8] results (which issues are now fixed)
|
|
88
|
+
|
|
89
|
+
## Acceptance Criteria
|
|
90
|
+
- [ ] All 3 previously-found issues verified as fixed
|
|
91
|
+
- [ ] JSONL stream renderer correctly parses real Codex events
|
|
92
|
+
- [ ] `--provider codex` flag correctly routes to CodexRunner
|
|
93
|
+
- [ ] Error events displayed (not silently swallowed)
|
|
94
|
+
- [ ] `raf-dev do --provider codex` tested end-to-end
|
|
95
|
+
- [ ] `raf-dev plan --provider codex` interactive flow attempted and documented
|
|
96
|
+
- [ ] Model resolution tested with available models
|
|
97
|
+
- [ ] Comprehensive outcome document created
|
|
98
|
+
|
|
99
|
+
## Notes
|
|
100
|
+
- This task requires the `codex` CLI to be installed and available in PATH
|
|
101
|
+
- The key files to check: `src/core/codex-runner.ts`, `src/parsers/codex-stream-renderer.ts`, `src/core/runner-factory.ts`, `src/commands/do.ts`, `src/commands/plan.ts`
|
|
102
|
+
- Previous outcome for reference: `/Users/eremeev/projects/RAF/RAF/38-dual-wielder/outcomes/8-e2e-test-codex-provider.md`
|
|
103
|
+
- Fixes applied in commits: `d3ad381` (renderer + error handling), `1c55657` (--provider flag wiring)
|
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
# Project Decisions
|
|
2
|
+
|
|
3
|
+
## For `fix-minor-bugs`, which specific bugs do you want included in scope beyond the two concrete issues you already named?
|
|
4
|
+
Take the minor bugs from `/Users/eremeev/projects/RAF/RAF/41-echo-chamber/outcomes/2-e2e-test-codex-provider.md`, specifically the two new minor issues documented there:
|
|
5
|
+
- `item.completed` with `item.type: "error"` is not rendered
|
|
6
|
+
- `turn.failed` with nested `error.message` falls back to default text
|
|
7
|
+
|
|
8
|
+
## For `fix-provider-aware-name-generation`, should the plan include tests for both `claude` and `codex`, or is wiring plus a focused regression test enough?
|
|
9
|
+
Focused regression test is enough.
|
|
10
|
+
|
|
11
|
+
## For `fix-codex-opus-model-selection`, what should RAF do when the provider is `codex` and the resolved model is `opus`: remap to a supported Codex default, reject with a clear RAF error, or something else?
|
|
12
|
+
This should not happen. Investigate and fix the incorrect resolution/config path so Codex does not resolve to `opus` in the first place.
|
|
13
|
+
|
|
14
|
+
Investigation notes:
|
|
15
|
+
- `resolveModelOption()` falls back to `getModel(scenario)` without a provider argument
|
|
16
|
+
- `plan.ts` and `do.ts` call `resolveModelOption()` before threading `options.provider` into model resolution
|
|
17
|
+
- `src/prompts/planning.ts` and `src/prompts/amend.ts` contain hardcoded example frontmatter with `model: opus`, which can bias Codex planning output toward an unsupported model override
|
|
18
|
+
|
|
19
|
+
## For `update-cli-help-docs`, should I update only CLI help text and `README.md`, or also any prompt/docs artifacts under `src/prompts` and `RAF/*` that still mention the removed flags?
|
|
20
|
+
Update only CLI help text and `README.md`.
|
|
21
|
+
|
|
22
|
+
## For `update-default-codex-config`, should every Codex model slot use the same literal model string `gpt-5.4`, including planning, execution, name generation, and fallback/default slots?
|
|
23
|
+
Yes.
|
|
24
|
+
|
|
25
|
+
## For `separate-effort-to-reasoning-effort-config`, should the config stay provider-specific, or should it move to a provider-agnostic schema even if that is a breaking change?
|
|
26
|
+
Make it provider-agnostic and change config so each model is specified as an object like `{ model: "opus", reasoningEffort: "high", provider: "claude" }`. Remove the top-level provider field, remove separate Codex model and effort-mapping sections, and remove special model-specifying flags like `--model` and `--sonnet`.
|
|
27
|
+
|
|
28
|
+
## For `separate-effort-to-reasoning-effort-config`, should RAF reject the old config keys and removed model flags with migration errors, or just drop support and cover the new schema with tests?
|
|
29
|
+
Drop support with no migration path. Add new tests for the new config schema to cover all cases.
|
|
@@ -0,0 +1,9 @@
|
|
|
1
|
+
- [ ] fix minor bugs
|
|
2
|
+
- [ ] update cli help docs to reflect on removed --worktreee --no-worktree flags
|
|
3
|
+
- [ ] Pass the provider option through to generateProjectNames() so it spawns the correct binary (codex or claude) instead of hardcoding claude. Update callSonnetForMultipleNames and runClaudePrint to accept a provider parameter and use getProviderBinaryName(provider) for the spawn call.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
update default config so all codedx modals are gpt-5.4
|
|
8
|
+
separate mapping for effort to model resoning effort in config from
|
|
9
|
+
the task effort level (low/medium/high) or as a separate config field for codex only
|
|
@@ -0,0 +1,36 @@
|
|
|
1
|
+
# Outcome: Fix Codex Model Resolution
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
Fixed the root cause of Codex provider resolving to Claude-only models (like `opus`) by threading provider context through the model resolution pipeline.
|
|
6
|
+
|
|
7
|
+
## Changes Made
|
|
8
|
+
|
|
9
|
+
### Core fix: `src/utils/validation.ts`
|
|
10
|
+
- Added `provider` parameter to `resolveModelOption()` so the fallback path (`getModel(scenario, provider)`) uses provider-specific defaults instead of always using Claude defaults.
|
|
11
|
+
|
|
12
|
+
### Command integration: `src/commands/plan.ts`
|
|
13
|
+
- Moved provider resolution (`options.provider`) before model resolution so it's available when calling `resolveModelOption()`.
|
|
14
|
+
- Passed provider to `resolveModelOption(..., provider)`.
|
|
15
|
+
|
|
16
|
+
### Command integration: `src/commands/do.ts`
|
|
17
|
+
- Extracted provider early and passed it to `resolveModelOption(..., provider)`.
|
|
18
|
+
- Fixed `getModel('failureAnalysis')` call to pass `provider` so failure analysis also uses Codex models when appropriate.
|
|
19
|
+
|
|
20
|
+
### Prompt neutralization: `src/prompts/planning.ts` and `src/prompts/amend.ts`
|
|
21
|
+
- Changed hardcoded `model: opus` example in plan/amend prompts to `model: sonnet`, which is valid for both providers and doesn't bias Codex-generated plans toward an unsupported model.
|
|
22
|
+
|
|
23
|
+
### Regression tests: `tests/unit/validation.test.ts`
|
|
24
|
+
- Added test: codex provider returns codex-specific defaults (`gpt-5.3-codex` for plan, `gpt-5.4` for execute).
|
|
25
|
+
- Added test: claude/undefined provider returns claude defaults (`opus`).
|
|
26
|
+
- Added test: no scenario with codex provider ever resolves to `opus`.
|
|
27
|
+
|
|
28
|
+
## Acceptance Criteria
|
|
29
|
+
|
|
30
|
+
- [x] `--provider codex` no longer resolves default plan or execution models to `opus`
|
|
31
|
+
- [x] Effort-based model resolution uses `codexEffortMapping` when the provider is `codex`
|
|
32
|
+
- [x] Planning guidance no longer nudges Codex plans toward explicit `model: opus` frontmatter
|
|
33
|
+
- [x] Focused regression tests cover the provider-aware resolution path
|
|
34
|
+
- [x] All tests pass (4 pre-existing failures unrelated to this change)
|
|
35
|
+
|
|
36
|
+
<promise>COMPLETE</promise>
|
|
@@ -0,0 +1,31 @@
|
|
|
1
|
+
# Outcome: Fix Provider-Aware Name Generation
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
Threaded the provider parameter through the name generation pipeline so `raf plan --provider codex` spawns the Codex binary with the correct Codex model for project name generation.
|
|
6
|
+
|
|
7
|
+
## Changes Made
|
|
8
|
+
|
|
9
|
+
### `src/utils/name-generator.ts`
|
|
10
|
+
- Added `provider` parameter to `runClaudePrint()`, `callSonnetForName()`, `callSonnetForMultipleNames()`, `generateProjectName()`, and `generateProjectNames()`.
|
|
11
|
+
- `runClaudePrint()` now uses `getProviderBinaryName(provider)` instead of hardcoded `'claude'` for the spawn binary.
|
|
12
|
+
- `runClaudePrint()` now passes `provider` to `getModel('nameGeneration', provider)` for provider-aware model resolution.
|
|
13
|
+
- Imported `getProviderBinaryName` from `runner-factory` and `HarnessProvider` type.
|
|
14
|
+
|
|
15
|
+
### `src/commands/plan.ts`
|
|
16
|
+
- Passed `provider` to `getModel('nameGeneration', provider)` for the status log message.
|
|
17
|
+
- Passed `provider` to `generateProjectNames(cleanInput, provider)`.
|
|
18
|
+
|
|
19
|
+
### `tests/unit/name-generator.test.ts`
|
|
20
|
+
- Added test: codex provider spawns the `codex` binary with `gpt-5.3-codex` model.
|
|
21
|
+
- Added test: claude provider spawns the `claude` binary.
|
|
22
|
+
|
|
23
|
+
## Acceptance Criteria
|
|
24
|
+
|
|
25
|
+
- [x] `raf plan --provider codex` uses the Codex binary for generated project names.
|
|
26
|
+
- [x] Name generation uses the provider-appropriate configured model.
|
|
27
|
+
- [x] Claude name generation behavior remains unchanged.
|
|
28
|
+
- [x] A focused regression test covers the new provider-aware path.
|
|
29
|
+
- [x] All tests pass (4 pre-existing failures unrelated to this change).
|
|
30
|
+
|
|
31
|
+
<promise>COMPLETE</promise>
|
|
@@ -0,0 +1,32 @@
|
|
|
1
|
+
# Outcome: Fix Codex Error Event Rendering
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
Fixed two renderer gaps in `codex-stream-renderer.ts` so that real-world Codex error events produce visible output instead of empty or generic text.
|
|
6
|
+
|
|
7
|
+
## Changes Made
|
|
8
|
+
|
|
9
|
+
### `src/parsers/codex-stream-renderer.ts`
|
|
10
|
+
- Added `message?: string` to the `item` shape in `CodexEvent` so error items can carry a message.
|
|
11
|
+
- Added `error?: { message?: string }` to `CodexEvent` for the nested error object on `turn.failed` events.
|
|
12
|
+
- Added `case 'error'` in `renderItemCompleted()` that renders ` ✗ Error: <message>\n` (matching the existing top-level error style).
|
|
13
|
+
- Updated `renderTurnFailed()` to prefer `event.error?.message` (the real Codex field) before falling back to `event.message` and then the generic `'Turn failed'` text.
|
|
14
|
+
|
|
15
|
+
### `tests/unit/codex-stream-renderer.test.ts` (new file)
|
|
16
|
+
- 8 focused tests covering both bug cases and confirming existing event types are unchanged:
|
|
17
|
+
- `item.completed` with `item.type: "error"` renders error line
|
|
18
|
+
- `item.completed` error with missing message uses fallback
|
|
19
|
+
- `turn.failed` with `error.message` surfaces the real message
|
|
20
|
+
- `turn.failed` falls back to `event.message` when no error object
|
|
21
|
+
- `turn.failed` falls back to generic text when neither field present
|
|
22
|
+
- Existing: `agent_message`, `command_execution`, top-level `error` event
|
|
23
|
+
|
|
24
|
+
## Acceptance Criteria
|
|
25
|
+
|
|
26
|
+
- [x] `item.completed` with `item.type: "error"` renders a visible error line.
|
|
27
|
+
- [x] `turn.failed.error.message` is surfaced in the rendered output.
|
|
28
|
+
- [x] Existing supported Codex event rendering remains unchanged.
|
|
29
|
+
- [x] Focused renderer tests cover both real-world bug cases.
|
|
30
|
+
- [x] All tests pass (4 pre-existing failures unrelated to this change)
|
|
31
|
+
|
|
32
|
+
<promise>COMPLETE</promise>
|
|
@@ -0,0 +1,28 @@
|
|
|
1
|
+
# Outcome: Update CLI Help Docs
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
Removed `--worktree` and `--no-worktree` flag references from CLI help text and README.md.
|
|
6
|
+
|
|
7
|
+
## Changes Made
|
|
8
|
+
|
|
9
|
+
### `src/commands/plan.ts`
|
|
10
|
+
- Removed the `-w, --worktree` and `--no-worktree` Commander option declarations from `createPlanCommand()`.
|
|
11
|
+
|
|
12
|
+
### `README.md`
|
|
13
|
+
- Removed `raf plan --worktree` from the `raf plan` usage examples.
|
|
14
|
+
- Updated the Worktree Mode "Basic workflow" example to use config-based activation instead of the flag.
|
|
15
|
+
- Updated "How it works" bullets to remove `--worktree` and `--no-worktree` references.
|
|
16
|
+
- Removed the `--worktree` and `--no-worktree` rows from the Command Reference flag table.
|
|
17
|
+
|
|
18
|
+
### `tests/unit/worktree-flag-override.test.ts` (deleted)
|
|
19
|
+
- Deleted the test file that tested the removed CLI flags. The file was already partially broken (do command tests were pre-existing failures) and all remaining tests became invalid after flag removal.
|
|
20
|
+
|
|
21
|
+
## Acceptance Criteria
|
|
22
|
+
|
|
23
|
+
- [x] CLI help output no longer lists the removed `--worktree` / `--no-worktree` flags.
|
|
24
|
+
- [x] `README.md` no longer documents the removed flags.
|
|
25
|
+
- [x] No prompt docs or archived `RAF/*` artifacts are changed for this task.
|
|
26
|
+
- [x] All tests pass (3 pre-existing failures unrelated to this change)
|
|
27
|
+
|
|
28
|
+
<promise>COMPLETE</promise>
|
|
@@ -0,0 +1,33 @@
|
|
|
1
|
+
# Outcome: Update Default Codex Models to GPT-5.4
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
Updated every Codex default model entry in `DEFAULT_CONFIG` to `gpt-5.4`, replacing the previous mixed defaults (`gpt-5.3-codex` for most slots, `gpt-5.4` only for execute and effort: high).
|
|
6
|
+
|
|
7
|
+
## Changes Made
|
|
8
|
+
|
|
9
|
+
### `src/types/config.ts`
|
|
10
|
+
- `codexModels.plan`: `gpt-5.3-codex` → `gpt-5.4`
|
|
11
|
+
- `codexModels.nameGeneration`: `gpt-5.3-codex` → `gpt-5.4`
|
|
12
|
+
- `codexModels.failureAnalysis`: `gpt-5.3-codex` → `gpt-5.4`
|
|
13
|
+
- `codexModels.prGeneration`: `gpt-5.3-codex` → `gpt-5.4`
|
|
14
|
+
- `codexModels.config`: `gpt-5.3-codex` → `gpt-5.4`
|
|
15
|
+
- `codexEffortMapping.low`: `gpt-5.3-codex` → `gpt-5.4`
|
|
16
|
+
- `codexEffortMapping.medium`: `gpt-5.3-codex` → `gpt-5.4`
|
|
17
|
+
- `codexModels.execute` and `codexEffortMapping.high` were already `gpt-5.4` — unchanged.
|
|
18
|
+
|
|
19
|
+
### `tests/unit/validation.test.ts`
|
|
20
|
+
- Updated assertions for codex plan and failureAnalysis defaults from `gpt-5.3-codex` to `gpt-5.4`.
|
|
21
|
+
|
|
22
|
+
### `tests/unit/name-generator.test.ts`
|
|
23
|
+
- Updated assertion for codex name generation model from `gpt-5.3-codex` to `gpt-5.4`.
|
|
24
|
+
|
|
25
|
+
## Acceptance Criteria
|
|
26
|
+
|
|
27
|
+
- [x] `DEFAULT_CONFIG.codexModels.plan`, `.execute`, `.nameGeneration`, `.failureAnalysis`, `.prGeneration`, and `.config` are all `gpt-5.4`.
|
|
28
|
+
- [x] `DEFAULT_CONFIG.codexEffortMapping.low`, `.medium`, and `.high` are all `gpt-5.4`.
|
|
29
|
+
- [x] Claude defaults remain unchanged.
|
|
30
|
+
- [x] Any documentation or tests that mention old Codex defaults are updated.
|
|
31
|
+
- [x] All tests pass (3 pre-existing failures unrelated to this change)
|
|
32
|
+
|
|
33
|
+
<promise>COMPLETE</promise>
|
|
@@ -0,0 +1,89 @@
|
|
|
1
|
+
# Outcome: Unify Model Config Schema
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
Replaced the provider-split model configuration with a unified `ModelEntry` schema where every model entry is `{ model, provider, reasoningEffort? }`. Removed `--model` and `--sonnet` CLI flags. Removed legacy `provider`, `codexModels`, and `codexEffortMapping` config keys.
|
|
6
|
+
|
|
7
|
+
## Changes Made
|
|
8
|
+
|
|
9
|
+
### `src/types/config.ts`
|
|
10
|
+
- Added `ModelEntry` interface: `{ model: string, provider: HarnessProvider, reasoningEffort?: TaskEffortLevel }`
|
|
11
|
+
- Changed `ModelsConfig` and `EffortMappingConfig` to use `ModelEntry` instead of string values
|
|
12
|
+
- Removed `provider`, `codexModels`, `codexEffortMapping` from `RafConfig`
|
|
13
|
+
- Updated `DEFAULT_CONFIG` with `ModelEntry` objects for all model/effort entries
|
|
14
|
+
- Removed `model` and `sonnet` from `PlanCommandOptions` and `DoCommandOptions`
|
|
15
|
+
|
|
16
|
+
### `src/utils/config.ts`
|
|
17
|
+
- Added `REMOVED_KEYS` map rejecting legacy keys with helpful migration messages
|
|
18
|
+
- Added `validateModelEntry()` for validating `ModelEntry` objects
|
|
19
|
+
- `validateConfig()` validates model entries as objects, not strings
|
|
20
|
+
- `deepMerge()` uses `mergeModelEntry()` for per-entry model merging
|
|
21
|
+
- `getModel()` returns `ModelEntry` with optional `providerOverride` parameter
|
|
22
|
+
- `resolveEffortToModel()` returns `ModelEntry`
|
|
23
|
+
- `applyModelCeiling()` works with `ModelEntry` objects
|
|
24
|
+
- Added `parseModelSpec()` for parsing frontmatter model strings to derive provider
|
|
25
|
+
|
|
26
|
+
### `src/utils/validation.ts`
|
|
27
|
+
- Removed `resolveModelOption()` function entirely
|
|
28
|
+
|
|
29
|
+
### `src/commands/plan.ts`
|
|
30
|
+
- Removed `--model` and `--sonnet` Commander options
|
|
31
|
+
- Updated to use `ModelEntry` for runner creation and logging
|
|
32
|
+
|
|
33
|
+
### `src/commands/do.ts`
|
|
34
|
+
- Removed `--model` and `--sonnet` Commander options
|
|
35
|
+
- `resolveTaskModel` returns `ModelEntry` instead of string
|
|
36
|
+
- Uses `parseModelSpec()` for frontmatter model parsing
|
|
37
|
+
|
|
38
|
+
### `src/commands/config.ts`
|
|
39
|
+
- Updated to use `ModelEntry` for runner creation
|
|
40
|
+
|
|
41
|
+
### `src/utils/name-generator.ts`
|
|
42
|
+
- Removed `provider` parameter from all functions (now config-driven)
|
|
43
|
+
- Uses `getModel('nameGeneration')` to get both model and provider
|
|
44
|
+
|
|
45
|
+
### `src/core/failure-analyzer.ts`
|
|
46
|
+
- Updated `getModel('failureAnalysis')` usage for `ModelEntry`
|
|
47
|
+
|
|
48
|
+
### `src/core/pull-request.ts`
|
|
49
|
+
- Updated `getModel('prGeneration')` usage for `ModelEntry`
|
|
50
|
+
|
|
51
|
+
### `tests/unit/config.test.ts`
|
|
52
|
+
- Complete rewrite for new `ModelEntry` schema
|
|
53
|
+
- Tests for rejected legacy keys, invalid entries, mixed-provider configs
|
|
54
|
+
|
|
55
|
+
### `tests/unit/config-command.test.ts`
|
|
56
|
+
- Updated all model config examples to use `ModelEntry` objects
|
|
57
|
+
|
|
58
|
+
### `tests/unit/validation.test.ts`
|
|
59
|
+
- Removed `resolveModelOption` tests, added verification it's not exported
|
|
60
|
+
|
|
61
|
+
### `tests/unit/name-generator.test.ts`
|
|
62
|
+
- Updated for config-driven behavior (no provider parameter)
|
|
63
|
+
|
|
64
|
+
### `tests/unit/plan-command-auto-flag.test.ts`
|
|
65
|
+
- Updated to verify `--model`/`--sonnet` flags are absent
|
|
66
|
+
|
|
67
|
+
### `README.md`
|
|
68
|
+
- Updated config examples to use `ModelEntry` objects
|
|
69
|
+
- Rewrote provider configuration section
|
|
70
|
+
- Removed `--model` and `--sonnet` from command reference tables
|
|
71
|
+
|
|
72
|
+
### `src/prompts/config-docs.md`
|
|
73
|
+
- Complete rewrite for `ModelEntry` schema
|
|
74
|
+
- Updated all examples, validation rules, and config editor instructions
|
|
75
|
+
|
|
76
|
+
## Acceptance Criteria
|
|
77
|
+
|
|
78
|
+
- [x] `ModelEntry` interface with `{ model, provider, reasoningEffort? }` replaces string model values
|
|
79
|
+
- [x] `DEFAULT_CONFIG` uses `ModelEntry` objects for all `models` and `effortMapping` entries
|
|
80
|
+
- [x] Legacy `provider`, `codexModels`, `codexEffortMapping` keys are rejected with migration messages
|
|
81
|
+
- [x] `--model` and `--sonnet` CLI flags removed from `raf plan` and `raf do`
|
|
82
|
+
- [x] `--provider` CLI flag kept as override mechanism
|
|
83
|
+
- [x] `getModel()` returns `ModelEntry` (not string)
|
|
84
|
+
- [x] `resolveModelOption` removed from validation.ts
|
|
85
|
+
- [x] All consumers updated: plan, do, config, name-generator, failure-analyzer, pull-request
|
|
86
|
+
- [x] README.md and config-docs.md updated for new schema
|
|
87
|
+
- [x] All tests pass (4 pre-existing failures unrelated to this change)
|
|
88
|
+
|
|
89
|
+
<promise>COMPLETE</promise>
|
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
---
|
|
2
|
+
effort: medium
|
|
3
|
+
---
|
|
4
|
+
# Task: Fix Codex Model Resolution
|
|
5
|
+
|
|
6
|
+
## Objective
|
|
7
|
+
Ensure RAF resolves provider-specific default and effort-based models correctly so Codex flows never select the Claude-only `opus` model.
|
|
8
|
+
|
|
9
|
+
## Context
|
|
10
|
+
The current Codex path can still surface `opus`, which leads to unsupported-model errors such as `The 'opus' model is not supported when using Codex with a ChatGPT account.` Investigation found that provider context is dropped during default model resolution, and the planning prompts still include a hardcoded `model: opus` example that can bias generated plans.
|
|
11
|
+
|
|
12
|
+
## Requirements
|
|
13
|
+
- Fix the root cause instead of remapping `opus` after the fact.
|
|
14
|
+
- Update the default model resolution path so `--provider codex` uses `codexModels.*` and `codexEffortMapping.*` consistently.
|
|
15
|
+
- Make `plan` and `do` read `options.provider` before resolving default models and pass provider context through the relevant helpers.
|
|
16
|
+
- Review provider-sensitive helpers such as `resolveModelOption()` and `getModel()` call sites to ensure Codex never inherits Claude defaults implicitly.
|
|
17
|
+
- Remove or neutralize hardcoded `model: opus` prompt examples in planning-related prompts so Codex-generated plans are not steered toward unsupported explicit model overrides.
|
|
18
|
+
- Add focused regression coverage for the incorrect provider/model resolution path.
|
|
19
|
+
|
|
20
|
+
## Implementation Steps
|
|
21
|
+
1. Trace the startup model-resolution path for `raf plan` and `raf do`, including CLI flag parsing and config lookup.
|
|
22
|
+
2. Update the resolution helpers to accept provider context where needed and use provider-specific defaults.
|
|
23
|
+
3. Adjust `src/commands/plan.ts` and `src/commands/do.ts` to resolve provider before model selection and thread it through consistently.
|
|
24
|
+
4. Revise planning/amend prompt examples to avoid Codex-hostile hardcoded `model: opus` guidance.
|
|
25
|
+
5. Add or update tests that prove Codex defaults resolve to supported Codex models rather than Claude aliases.
|
|
26
|
+
|
|
27
|
+
## Acceptance Criteria
|
|
28
|
+
- [ ] `--provider codex` no longer resolves default plan or execution models to `opus`.
|
|
29
|
+
- [ ] Effort-based model resolution uses `codexEffortMapping` when the provider is `codex`.
|
|
30
|
+
- [ ] Planning guidance no longer nudges Codex plans toward explicit `model: opus` frontmatter.
|
|
31
|
+
- [ ] Focused regression tests cover the provider-aware resolution path.
|
|
32
|
+
- [ ] All tests pass
|
|
33
|
+
|
|
34
|
+
## Notes
|
|
35
|
+
The fix should preserve the existing Claude defaults and should not introduce fallback remapping that hides the underlying resolution bug.
|
|
@@ -0,0 +1,38 @@
|
|
|
1
|
+
---
|
|
2
|
+
effort: medium
|
|
3
|
+
---
|
|
4
|
+
# Task: Fix Provider-Aware Name Generation
|
|
5
|
+
|
|
6
|
+
## Objective
|
|
7
|
+
Pass the selected provider through project name generation so RAF spawns the correct CLI binary and model when suggesting project names.
|
|
8
|
+
|
|
9
|
+
## Context
|
|
10
|
+
`generateProjectNames()` currently hardcodes the Claude binary path through helper functions, so `--provider codex` still launches Claude during name generation. The user wants the provider option threaded through `generateProjectNames()`, `callSonnetForMultipleNames()`, and `runClaudePrint()` and the spawn call switched to `getProviderBinaryName(provider)`.
|
|
11
|
+
|
|
12
|
+
## Dependencies
|
|
13
|
+
1
|
|
14
|
+
|
|
15
|
+
## Requirements
|
|
16
|
+
- Pass the active provider from the planning command into `generateProjectNames()`.
|
|
17
|
+
- Update `callSonnetForMultipleNames()` and `runClaudePrint()` to accept a provider parameter.
|
|
18
|
+
- Use `getProviderBinaryName(provider)` instead of hardcoding `claude` for the spawn call.
|
|
19
|
+
- Resolve the name-generation model with provider awareness so Codex uses `codexModels.nameGeneration`.
|
|
20
|
+
- Preserve the existing fallback-name behavior when the CLI call fails.
|
|
21
|
+
- Add a focused regression test; broad dual-provider test coverage is not required for this task.
|
|
22
|
+
|
|
23
|
+
## Implementation Steps
|
|
24
|
+
1. Update the name-generation helper signatures to accept a provider argument.
|
|
25
|
+
2. Thread provider from `src/commands/plan.ts` into `generateProjectNames()` and related helpers.
|
|
26
|
+
3. Replace the hardcoded spawn binary with `getProviderBinaryName(provider)` and provider-aware model lookup.
|
|
27
|
+
4. Keep fallback sanitization and fallback name generation behavior unchanged.
|
|
28
|
+
5. Add a targeted regression test covering the provider-specific spawn/model path.
|
|
29
|
+
|
|
30
|
+
## Acceptance Criteria
|
|
31
|
+
- [ ] `raf plan --provider codex` uses the Codex binary for generated project names.
|
|
32
|
+
- [ ] Name generation uses the provider-appropriate configured model.
|
|
33
|
+
- [ ] Claude name generation behavior remains unchanged.
|
|
34
|
+
- [ ] A focused regression test covers the new provider-aware path.
|
|
35
|
+
- [ ] All tests pass
|
|
36
|
+
|
|
37
|
+
## Notes
|
|
38
|
+
Minimal changes are preferred here; rename helpers only if it materially improves clarity.
|